2d181e74b3ce32e9959b35da98979d6a.ppt
- Number of slides: 24
SaddleHill: A SCSI I/O Generator
By Owen Parry
Project Motivation
To create an application that exercises new controller firmware and hardware.
- Provide the ability to rapidly add features.
- Provide more tolerance for hardware/firmware failures.
To develop a mechanism that allows multiple hosts to randomly access and efficiently share the SATA affiliations.
- Current methods seek to avoid the Serial ATA limitation by using a single initiator in a SAS domain, limiting communication with the disk drive to only one initiator at a time.
- Need a simple and decentralized strategy for an embedded environment.
- Achieve long-term max-min fairness.
- Avoid violating I/O time limits.
Background
SCSI: targeted at the enterprise storage market. Used primarily to attach hard disk drives.
- High performance
  - RPM: 10K, 15K
  - Seek time: 3.2 to 7.4 ms
- Greater reliability
  - MTBF: 1.2M hr
- Capacity: 18 to 300 GB
- Expensive: $160 to $1400
- Multiple host support
ATA: targeted at the desktop market.
- Medium performance
  - RPM: 5400, 7200
  - Seek time: 8.9 to 9.5 ms
- Mediocre reliability
  - MTBF: 500K hr
- Capacity: 40 GB to 1 TB
- Cheap: $75 to $300
- Single host
Background
Serial Attached SCSI (SAS) is the new transport protocol replacing parallel SCSI.
SAS advantages:
- Faster data rates
  - SAS-1: 300 MB/s
  - SAS-2: 600 MB/s
- Larger drive counts
  - Typical domain size: 128
  - 16K addresses using fan-out expanders
- Increased data integrity
- Configuration flexibility
- Supports Serial ATA drives
Background
Problem with SATA in a SAS topology
- The SATA architecture only allows a single host.
- SAS uses a form of mutual exclusion called an “affiliation.” The first initiator to open a connection may own the affiliation indefinitely.
- Vendors want to simultaneously issue commands to SATA disks from multiple initiators.
Related Work
Unable to locate other works in the storage area.
Closely related research: wireless LANs
- Bandwidth sharing schemes:
  - Maxmin Fair Scheduling in Wireless Networks, Leandros Tassiulas and Saswati Sarkar.
- Channel time sharing schemes:
  - Proportional Fairness in Wireless LANs and Ad Hoc Networks, Li Bin Jiang and Soung Chang Liew.
  - Time-based Fairness Improves Performance in Multi-rate WLANs, Godfrey Tan and John Guttag.
SaddleHill Design / Implementation
SaddleHill Design / Implementation
- Built using Trolltech’s Qt 4.2.3.
- Compiled for x86_64 (64-bit) systems.
- Comprised of four logical blocks.
SaddleHill Design / Implementation
MainWindow
- Lists PCI SAS initiator devices.
- Lists SAS target devices.
- Displays live test statistics.
- Displays application messages.
- Accepts user input.
SaddleHill Design / Implementation
Management Unit
- Manages SaddleHill’s physical I/O data buffers and initiator operational buffers.
- Address conversion: virtual to physical; physical to virtual.
- Maintains a list of SAS initiators and targets.
- Maintains the application message log.
- Maintains the model objects (system device, message, statistics) that the GUI uses to gather and display information to the user.
- Distributes device configurations.
- Starts/stops I/O tests.
- Calculates I/O and throughput rates.
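
The rate calculation in the last bullet can be sketched as follows. This is a minimal illustration, not SaddleHill's code: the `Sample` type, its field names, and the 512-byte block size are assumptions made for the example.

```python
from dataclasses import dataclass

BLOCK_SIZE = 512  # assumed bytes per logical block (not stated in the slides)


@dataclass
class Sample:
    """Hypothetical counters gathered over one measurement interval."""
    ios_completed: int
    blocks_transferred: int
    elapsed_s: float


def io_rate(sample: Sample) -> float:
    """I/O rate in I/Os per second (IOPS)."""
    return sample.ios_completed / sample.elapsed_s


def throughput_mb_s(sample: Sample, block_size: int = BLOCK_SIZE) -> float:
    """Throughput in MB/s (1 MB = 10**6 bytes)."""
    return sample.blocks_transferred * block_size / sample.elapsed_s / 1e6


if __name__ == "__main__":
    s = Sample(ios_completed=4500, blocks_transferred=4500, elapsed_s=2.0)
    print(f"{io_rate(s):.0f} IOPS, {throughput_mb_s(s):.3f} MB/s")
```

For small single-block transfers the I/O rate is the more telling of the two numbers, which is why the two are kept as separate functions.
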
SaddleHill Design / Implementation
IO Engine
- Initializes SAS targets.
- Maintains disk SAS addresses and target IDs.
- Generates, issues, and completes SCSI commands, e.g. Read 10, Write And Verify 10, Inquiry, Read Capacity.
  - Comprises three threads, one for each of the above tasks.
- Maintains statistics:
  - Number of I/Os issued
  - Number of I/Os completed
  - Error count
  - Amount of data transferred
  - I/O response times
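
To make "generates SCSI commands" concrete, here is a minimal sketch of building the 10-byte command descriptor block (CDB) for Read 10. The byte layout (opcode 0x28, 4-byte big-endian LBA, 2-byte big-endian transfer length) follows the SCSI Block Commands definition of READ(10); the function name `build_read10` is hypothetical and says nothing about how SaddleHill itself builds commands.

```python
import struct

READ_10 = 0x28  # READ(10) operation code


def build_read10(lba: int, blocks: int) -> bytes:
    """Return a READ(10) CDB for `blocks` logical blocks starting at `lba`."""
    if not 0 <= lba <= 0xFFFFFFFF:
        raise ValueError("LBA must fit in 32 bits for READ(10)")
    if not 0 <= blocks <= 0xFFFF:
        raise ValueError("transfer length must fit in 16 bits for READ(10)")
    # Fields, all big-endian: opcode, flags byte (left at 0 here), LBA,
    # group number (0), transfer length, control byte (0).
    return struct.pack(">BBIBHB", READ_10, 0, lba, 0, blocks, 0)


if __name__ == "__main__":
    print(build_read10(lba=0x12345678, blocks=8).hex())
```
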
SaddleHill Design / Implementation
Hardware Abstraction Layer
- SaddleHillDriver
  - Registers with the Linux kernel as a character device.
  - Registers with the PCI core.
  - Allocates blocks of physical memory (currently 16 MB).
  - Reserves the physical memory to prevent swapping.
  - Provides the facilities to map PCI SAS I/O control registers to user space.
  - Provides the facilities to map the physical memory to user space.
  - Provides PCI device configuration information to the user-space application.
- HAL (user level)
  - Implements the MPI specification.
  - Initializes the SAS adapter.
  - Converts requests from the IO Engine to the MPI-specific format.
  - Sends requests to, and receives replies from, the initiator via the PCI control registers.
  - Processes MPI replies and completes requests to the IO Engine.
  - Manages STP affiliations.
  - Maintains test statistics:
    - Number of I/Os issued
    - Number of I/Os completed
    - Error count
    - Amount of data transferred
    - I/O response times
    - Affiliation ownership times
    - Affiliation synchronization count
Affiliation Synchronization
Uses the idea put forward in “Proportional Fairness in Wireless LANs and Ad Hoc Networks.”
- Fix the maximum transmission time.
- Contend fairly among the initiators for the mutex.
Implementation
- Affiliation acquisition
  - Started by reception of a new I/O.
  - Calculate the back-off: use a uniform-distribution random number generator to choose a back-off time within the contention window size.
  - Generate a SCSI Inquiry command.
  - Sleep for the length of the back-off.
  - Issue the Inquiry.
  - A failed synchronization attempt doubles the contention window size.
  - Start the ownership timer on successful acquisition.
- Affiliation release
  - Resource released if no I/Os are waiting to be sent.
  - Resource released after the ownership timer expires.
- There are no preemptions.
- I/Os are placed into a waiting state during the release and acquisition process.
- I/Os outstanding at the time of release are allowed to complete.
- The truncated binary exponential back-off strategy is used to calculate the back-off times.
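
The acquisition steps above can be sketched as a truncated binary exponential back-off loop. This is an illustration under stated assumptions, not the SaddleHill implementation: the window limits, the slot time, the attempt limit, resetting the window after a success, and the `try_inquiry` callable (standing in for "issue the Inquiry and report whether the affiliation was obtained") are all hypothetical.

```python
import random
import time


class TruncatedBackoff:
    """Contention window that doubles on failure up to a fixed cap."""

    def __init__(self, min_window=4, max_window=64, rng=None):
        self.min_window = min_window      # assumed initial window, in slots
        self.max_window = max_window      # assumed cap (the "truncation")
        self.window = min_window
        self.rng = rng or random.Random()

    def next_delay(self, slot_time=0.001):
        """Pick a uniformly distributed back-off inside the current window."""
        slots = self.rng.randint(0, self.window - 1)
        return slots * slot_time

    def on_failure(self):
        """A failed synchronization attempt doubles the window, up to the cap."""
        self.window = min(self.window * 2, self.max_window)

    def on_success(self):
        """Assumption: the window returns to its minimum after a success."""
        self.window = self.min_window


def acquire(try_inquiry, backoff, sleep=time.sleep, max_attempts=16):
    """Back off, then attempt acquisition; return the attempt count or None."""
    for attempt in range(1, max_attempts + 1):
        sleep(backoff.next_delay())
        if try_inquiry():        # hypothetical: True if the affiliation is ours
            backoff.on_success()
            return attempt
        backoff.on_failure()
    return None
```

Capping the window bounds the worst-case wait, which matters here because an unbounded window is what lets plain binary exponential back-off run into I/O time limits.
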
Finding the Back-Off Strategy
Considered strategies for back-off included:
- No back-off
- Fixed window
- BEB (binary exponential back-off)
- TBEB (truncated binary exponential back-off)
- Logarithmic
Test strategy:
- Read 10, Write 10 commands
- Single-block transfers
- Same LBA
- Drive caching enabled
- NCQ enabled
- Drive queue depth = 8
- 3 Gb SATA disk
- Multiple initiators
Finding the Back-Off Strategy: STP Ownership Times
Finding the Back-Off Strategy: Synchronization Requests
Finding the Back-Off Strategy: Average I/O Response
Finding the Back-Off Strategy
No back-off
- Too many synchronization attempts.
- Depending on the topology configuration, will favor some initiators.
Fixed window
- There is no way to choose the appropriate window size.
BEB
- Violates the I/O time limits in long test runs.
Logarithmic
- Achieves near-perfect max-min fairness in resource ownership in both the short and long term.
- Large number of synchronization requests; unacceptable in large topologies.
The truncated binary exponential strategy was chosen for the implementation of the synchronization algorithm.
- Closely achieves long-term max-min fairness.
- Low number of synchronization attempts.
Performance
Transaction processing profile was used.
- Small block transfers (1-16 blocks).
- Concerned with I/O rates rather than throughput.
Single initiator
- Same I/O size: ~2250 IOPS
- Random I/O sizes: ~902 IOPS
Dual initiators
- Same I/O sizes: ~2075 IOPS (8% performance decrease)
- Random I/O sizes: ~786 IOPS (12% performance decrease)
Quad initiators
- Same I/O sizes: ~1975 IOPS (14% performance decrease)
- Random I/O sizes: ~745 IOPS (17% performance decrease)
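
The performance decreases are relative to the single-initiator rate for the same workload. A small sketch of that arithmetic, using the approximate IOPS figures from the slide, is shown below; the function and dictionary names are chosen for illustration only. Because the IOPS values are rounded approximations, the recomputed percentages may differ slightly from the rounded figures quoted above.

```python
def percent_decrease(baseline: float, value: float) -> float:
    """Relative drop from `baseline` to `value`, as a percentage."""
    return (baseline - value) / baseline * 100.0


# Approximate IOPS taken from the slide.
single = {"same": 2250, "random": 902}
multi = {
    "dual": {"same": 2075, "random": 786},
    "quad": {"same": 1975, "random": 745},
}

if __name__ == "__main__":
    for name, rates in multi.items():
        for kind, iops in rates.items():
            drop = percent_decrease(single[kind], iops)
            print(f"{name} initiators, {kind} I/O sizes: {drop:.1f}% decrease")
```
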
Performance
Future Directions
Due to the challenges of SATA in enterprise storage environments, vendors are employing varying strategies to deal with the SATA problem. These include:
- Completely removing SATA from topologies.
- Building special hardware that increases the affiliation resources.
The STP resource sharing algorithm will be moved to the SAS initiator port.
- Requires a change in the mechanism that acquires and releases affiliations.
  - Utilize the SAS CLOSE (CLEAR AFFILIATION) primitive when tearing down connections.
  - Simply convert and issue host I/O.
SaddleHill
- Short term
  - Support SAS-2 initiators.
  - Support additional SBC and SPC commands.
  - Support SSC and MMC SCSI command sets.
  - FW upgrade support.
  - Initiator configuration modification.
- Long term
  - Build into an automated firmware unit test system.
Conclusion
All project goals achieved:
- User-level SCSI I/O generator.
- Synchronization algorithm that meets the simplicity, fairness, and decentralization objectives.
QUESTIONS?