

• Number of slides: 36

Graduate Computer Architecture I Lecture 13: Storage & I/O

Classical DRAM Organization
[Figure: square array of 1-T DRAM cells; a row decoder takes the row address and drives the word (row) select lines, and the bit (data) lines feed a column selector & I/O circuits addressed by the column address]
• Row and column address together select 1 bit at a time

1-T Memory Cell (DRAM)
• Write:
  1. Drive bit line
  2. Select row
• Read:
  1. Precharge bit line to Vdd/2
  2. Select row
  3. Cell and bit line share charges – very small voltage changes on the bit line
  4. Sense (fancy sense amp) – can detect changes of ~1 million electrons
  5. Write: restore the value
• Refresh:
  1. Just do a dummy read to every cell
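A back-of-the-envelope sketch of why the read needs a "fancy" sense amp: the signal available on the bit line is set by charge sharing, roughly ΔV ≈ (Vdd/2) · C_cell / (C_cell + C_bitline). The voltage and capacitance values below are illustrative assumptions, not figures from the slide.

```python
# Illustrative DRAM charge-sharing estimate; all component values are assumed.
VDD = 3.3             # supply voltage (V), assumption
C_CELL = 30e-15       # storage capacitor, ~30 fF, assumption
C_BITLINE = 600e-15   # bit-line capacitance, ~600 fF, assumption

# The bit line is precharged to VDD/2; selecting the row lets the cell and bit line share charge.
delta_v = (VDD / 2) * C_CELL / (C_CELL + C_BITLINE)
print(f"Bit-line swing: ~{delta_v * 1e3:.0f} mV")   # well under 100 mV -> hence the sense amplifier

# Signal charge stored in the cell, expressed in electrons (order-of-magnitude check).
electrons = C_CELL * (VDD / 2) / 1.602e-19
print(f"Cell signal charge: ~{electrons:,.0f} electrons")
```

With these assumed values the swing is well under 100 mV and the cell holds a few hundred thousand electrons, the same order of magnitude as the slide's "~1 million electrons".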

Main Memory Performance
[Figure: timing diagram contrasting access time with cycle time]
• DRAM (read/write) cycle time > DRAM (read/write) access time
• DRAM (read/write) cycle time:
  – How frequently can you initiate an access?
  – Analogy: a little kid can only ask his father for money on Saturday
• DRAM (read/write) access time:
  – How quickly will you get what you want once you initiate an access?
  – Analogy: as soon as he asks, his father gives him the money
• DRAM bandwidth limitation analogy:
  – What happens if he runs out of money on Wednesday?
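Because back-to-back accesses are spaced by the cycle time rather than the access time, sustained bandwidth is set by the former. A minimal sketch with assumed timing values (they are not the slide's numbers):

```python
# Why cycle time, not access time, bounds sustained DRAM bandwidth (values are assumptions).
ACCESS_TIME_NS = 40    # time until the requested data comes back (assumption)
CYCLE_TIME_NS = 90     # minimum spacing between successive accesses (assumption)
WORD_BYTES = 8

latency_bound = WORD_BYTES / (ACCESS_TIME_NS * 1e-9) / 1e6   # MB/s if access time were the limit
sustained = WORD_BYTES / (CYCLE_TIME_NS * 1e-9) / 1e6        # MB/s actually achievable back-to-back

print(f"If access time were the limit: {latency_bound:.0f} MB/s")
print(f"Limited by cycle time:         {sustained:.0f} MB/s")
```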

Increasing Bandwidth – Interleaving
[Figure: access pattern without interleaving – the CPU must wait until D1 is available from the single memory bank before starting the access for D2]
[Figure: access pattern with 4-way interleaving – accesses to Banks 0–3 are overlapped, and Bank 0 can be accessed again as soon as its cycle time has elapsed]
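The overlap works because consecutive words live in different banks. A minimal sketch of the word-interleaved mapping, assuming the 4-bank layout used in the timing example two slides later:

```python
NUM_BANKS = 4

def bank_and_row(word_address: int) -> tuple[int, int]:
    """Word-interleaved mapping: consecutive word addresses hit consecutive banks."""
    return word_address % NUM_BANKS, word_address // NUM_BANKS

# Words 0..7 round-robin across banks 0..3, so a 4-word block touches each bank exactly once.
for addr in range(8):
    bank, row = bank_and_row(addr)
    print(f"word {addr} -> bank {bank}, row {row}")
```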

Main Memory Performance
• Simple
  – CPU, Cache, Bus, Memory all the same width (32 bits)
• Wide
  – CPU/Mux 1 word; Mux/Cache, Bus, Memory N words (Alpha: 64 bits & 256 bits)
• Interleaved
  – CPU, Cache, Bus 1 word; Memory N modules (4 modules); example is word interleaved

Main Memory Performance
• Timing model
  – 1 cycle to send the address, 4 for access time, 10 for cycle time, 1 to send a word of data
  – Cache block is 4 words
• Simple M.P. = 4 × (1 + 10 + 1) = 48
• Wide M.P. = 1 + 10 + 1 = 12
• Interleaved M.P. = 1 + 10 + 1 + 3 = 15
• Word-interleaved address layout:
  – Bank 0: addresses 0, 4, 8, 12
  – Bank 1: addresses 1, 5, 9, 13
  – Bank 2: addresses 2, 6, 10, 14
  – Bank 3: addresses 3, 7, 11, 15
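A quick calculation reproducing the three miss penalties above, using exactly the cycle counts from the timing model:

```python
# Miss penalty (in cycles) for a 4-word cache block under the slide's timing model.
SEND_ADDR, DRAM_CYCLE, SEND_WORD = 1, 10, 1
BLOCK_WORDS = 4

simple = BLOCK_WORDS * (SEND_ADDR + DRAM_CYCLE + SEND_WORD)            # one narrow access per word
wide = SEND_ADDR + DRAM_CYCLE + SEND_WORD                              # whole block in one wide access
interleaved = SEND_ADDR + DRAM_CYCLE + SEND_WORD + (BLOCK_WORDS - 1)   # banks overlap, one word per cycle after the first

print(simple, wide, interleaved)   # 48 12 15
```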

DRAM History
• DRAMs: capacity +60%/yr, cost –30%/yr
  – 2.5X cells/area, 1.5X die size in 3 years
• '98 DRAM fab line costs $2B
  – DRAM only: density, leakage vs. speed
• Rely on increasing number of computers & memory per computer (60% of the market)
  – SIMM or DIMM is the replaceable unit – computers use any generation of DRAM
• Commodity, second-source industry
  – high volume, low profit, conservative
  – little organizational innovation in 20 years
• Order of importance
  – (1) Cost/bit (2) Capacity (3) Performance
  – RAMBUS: 10X BW, +30% cost – little impact

DRAM Crossroads
• After 20 years of 4X every 3 years, are we running into a wall? (64 Mb – 1 Gb)
• How can $1B fab lines be kept full if computers buy fewer DRAMs each?
• Will cost/bit still fall 30%/yr if the 4X / 3 yr scaling stops?
• What will happen to the $40B/yr DRAM industry?

Tunneling Magnetic Junction
• Tunneling Magnetic Junction RAM (TMJ-RAM)
  – 6x faster than DRAM
  – Density of DRAM, non-volatile (no refresh)
  – "Spintronics": a combination of quantum spin and electronics
  – Same technology used in high-density disk drives

Carbon Nanotube Memory
• 10,000 to 30,000 times more dense than today's Dynamic Random Access Memory (DRAM) chips, said Charles M. Lieber, professor of chemistry at Harvard and leader of the team that developed the nanotube technique.
• "It's much higher density than… will ever be attainable with silicon [wafer] technology," he said. In principle these memory arrays could also be 1,000 to 10,000 times faster than today's memory chips, he added.

Micro-Electro-Mechanical Systems (MEMS)
• Magnetic "sled" floats on an array of read/write heads
  – Approx. 250 Gbit/in²
  – Data rates: IBM: 250 MB/s with 1000 heads; CMU: 3.1 MB/s with 400 heads
• Electrostatic actuators move the media around to align it with the heads
  – Sweep the sled ±50 µm in < 0.5 µs
• Capacity estimated to be in the 1–10 GB range in 1 cm²

Storage Errors
• Motivation:
  – DRAM is dense → signals are easily disturbed
  – High capacity → higher probability of failure
• Approach: redundancy
  – Add extra information so that we can recover from errors
  – Can we do better than just keeping complete copies?

Error Correction Codes (ECC)
• Memory systems generate errors (accidentally flipped bits)
  – DRAMs store very little charge per bit
  – "Soft" errors occur occasionally when cells are struck by alpha particles or other environmental upsets
  – Less frequently, "hard" errors can occur when chips permanently fail
  – The problem gets worse as memories get denser and larger
• Where is "perfect" memory required?
  – servers, spacecraft/military computers, ebay, …
• Memories are protected against failures with ECCs
• Extra bits are added to each data word
  – used to detect and/or correct faults in the memory system
  – in general, each possible data-word value is mapped to a unique "code word"; a fault changes a valid code word to an invalid one, which can be detected

Simple Error Detection Coding: Parity Bit
• Each data value, before it is written to memory, is "tagged" with an extra bit to force the stored word to have even parity:
  b7 b6 b5 b4 b3 b2 b1 b0 p
• Each word, as it is read from memory, is "checked" by computing its parity (including the parity bit):
  b7 b6 b5 b4 b3 b2 b1 b0 p → c
• A non-zero parity indicates an error occurred:
  – two errors (on different bits) are not detected (nor is any even number of errors)
  – odd numbers of errors are detected
• What is the probability of multiple simultaneous errors?
• Extend the idea for Hamming ECC
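A minimal sketch of even-parity generation and checking as described above (plain Python over 8 data bits plus one parity bit; the function names are my own):

```python
def add_even_parity(byte: int) -> int:
    """Append a parity bit so the stored 9-bit word has an even number of 1s."""
    parity = bin(byte).count("1") & 1
    return (byte << 1) | parity            # b7..b0 followed by p

def error_detected(word9: int) -> bool:
    """Non-zero (odd) parity on the stored word means an odd number of bits flipped."""
    return bin(word9).count("1") & 1 == 1

stored = add_even_parity(0b10010011)
assert not error_detected(stored)                # clean read: parity checks out
assert error_detected(stored ^ 0b000000100)      # a single flipped bit is caught
assert not error_detected(stored ^ 0b000000110)  # two flipped bits slip through, as the slide warns
```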

Input and Output
• CPU performance: 60% per year
• I/O system performance limited by mechanical delays (disk I/O)
  – < 10% per year (I/Os per sec or MB per sec)
• Amdahl's Law: system speed-up is limited by the slowest part!
  – 10% I/O & 10x CPU => 5x performance (lose 50%)
  – 10% I/O & 100x CPU => 10x performance (lose 90%)
• I/O bottleneck:
  – Diminishing fraction of time spent in the CPU
  – Diminishing value of faster CPUs
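The two speed-up figures follow directly from Amdahl's Law with a 10% I/O fraction; a quick check (the slide rounds the results):

```python
def amdahl_speedup(io_fraction: float, cpu_speedup: float) -> float:
    """Overall speedup when only the non-I/O portion (1 - io_fraction) gets faster."""
    return 1.0 / (io_fraction + (1.0 - io_fraction) / cpu_speedup)

print(amdahl_speedup(0.10, 10))    # ~5.3x  -> the slide's "5x Performance"
print(amdahl_speedup(0.10, 100))   # ~9.2x  -> the slide's "10x Performance"
```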

I/O Systems
[Figure: processor and cache on a memory–I/O bus with main memory and I/O controllers for disks, graphics, and the network; I/O completion is signaled to the processor via interrupts]

Technology Trends
• Disk capacity now doubles every 18 months; before 1990 it doubled every 36 months
• Today: processing power doubles every 18 months
• Today: memory size doubles every 18 months (4X / 3 yr)
• Today: disk capacity doubles every 18 months
• Disk positioning rate (seek + rotate) doubles every ten years!
• The result: the I/O GAP

Historical Perspective
• 1956 IBM Ramac — early 1970s Winchester
  – Developed for mainframe computers, proprietary interfaces
  – Steady shrink in form factor: 27 in. to 14 in.
• 1970s developments
  – 5.25-inch floppy disk form factor (microcode into mainframe)
  – early emergence of industry-standard disk interfaces: ST506, SASI, SMD, ESDI
• Early 1980s
  – PCs and first-generation workstations
• Mid 1980s
  – Client/server computing
  – Centralized storage on file servers
    • accelerates disk downsizing: 8 inch to 5.25 inch
  – Mass-market disk drives become a reality
    • industry standards: SCSI, IPI, IDE
    • 5.25-inch drives for standalone PCs; end of proprietary interfaces

Disk History
• 1973: 1.7 Mbit/sq. in, 140 MBytes
• 1979: 7.7 Mbit/sq. in, 2,300 MBytes

Historical Perspective
• Late 1980s / early 1990s:
  – Laptops, notebooks, (palmtops)
  – 3.5-inch, 2.5-inch, (1.8-inch) form factors
  – Form factor plus capacity drives the market, not so much performance
• Recently, bandwidth improving at 40%/year
• Challenged by DRAM and flash RAM in PCMCIA cards
  – Still expensive; Intel promises but doesn't deliver
  – Unattractive MBytes per cubic inch
• Optical disk fails on performance
  – Finds a niche (CD-ROM)

Disk History
• 1989: 63 Mbit/sq. in, 60,000 MBytes
• 1997: 1450 Mbit/sq. in, 2300 MBytes
• 1997: 3090 Mbit/sq. in, 8100 MBytes

Disk Performance Model / Trends
• Capacity: +100%/year (2X / 1.0 yrs)
• Transfer rate (BW): +40%/year (2X / 2.0 yrs)
• Rotation + seek time: –8%/year (1/2 in 10 yrs)
• MB/$: > 100%/year (2X / < 1.5 yrs) – fewer chips + higher areal density
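The doubling and halving times in parentheses follow from the annual rates; a quick sanity check (the slide rounds the seek/rotate figure to 10 years):

```python
import math

def doubling_time(annual_growth: float) -> float:
    """Years to double at a compounded annual growth rate (0.40 means +40%/yr)."""
    return math.log(2) / math.log(1 + annual_growth)

print(f"Capacity (+100%/yr):     2x every {doubling_time(1.00):.1f} yrs")
print(f"Transfer rate (+40%/yr): 2x every {doubling_time(0.40):.1f} yrs")
# An 8%/yr improvement halves seek + rotate time in ln(2) / -ln(0.92) years.
print(f"Seek + rotate (-8%/yr):  1/2 every {math.log(2) / -math.log(0.92):.1f} yrs")
```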

Photo of Disk Head, Arm, Actuator
[Photo: disk assembly showing the spindle, arm, actuator, head, and platters (12)]

Nano-layered Disk Heads
• The special sensitivity of the disk head comes from the "Giant Magneto-Resistive effect" (GMR)
• IBM is (was) the leader in this technology
  – Same technology as the TMJ-RAM breakthrough
[Figure: GMR read element with a coil for writing]

Disk Device Terminology
[Figure: platter with inner and outer tracks divided into sectors; arm, head, and actuator]
• Several platters
  – information recorded magnetically on both surfaces (usually)
• Bits recorded in tracks, which are in turn divided into sectors (e.g., 512 bytes)
• The actuator moves the head (one per surface, at the end of an arm) over the desired track ("seek"); a surface is selected, the sector rotates under the head, and the data is read or written
• "Cylinder" – all tracks under the heads
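This terminology maps onto the usual access-time model: seek, then wait for the sector to rotate under the head, then transfer. A minimal sketch with illustrative parameter values (the drive numbers are assumptions, not from the slide):

```python
# Average time to read one sector = seek + rotational latency + transfer (parameter values assumed).
AVG_SEEK_MS = 9.0        # average seek time (assumption)
RPM = 7200               # spindle speed (assumption)
TRANSFER_MB_S = 40.0     # media transfer rate (assumption)
SECTOR_BYTES = 512

rotation_ms = 60_000 / RPM               # one full revolution in ms
avg_rotation_ms = rotation_ms / 2        # on average, wait half a revolution
transfer_ms = SECTOR_BYTES / (TRANSFER_MB_S * 1e6) * 1e3

total_ms = AVG_SEEK_MS + avg_rotation_ms + transfer_ms
print(f"~{total_ms:.2f} ms/sector (seek {AVG_SEEK_MS}, rotate {avg_rotation_ms:.2f}, transfer {transfer_ms:.3f})")
```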

Alternative Data Storage: Early 1990s

Technology                Cap (MB)    BPI      TPI    BPI*TPI (M)   Data Xfer (KByte/s)   Access Time
Conventional Tape:
  Cartridge (.25")           150     12000      104       1.2               92            minutes
  IBM 3490 (.5")             800     22860       38       0.9             3000            seconds
Helical Scan Tape:
  Video (8 mm)              4600     43200     1638        71              492            45 secs
  DAT (4 mm)                1300     61000     1870       114              183            20 secs
Magnetic & Optical Disk:
  Hard Disk (5.25")         1200     33528     1880        63             3000            18 ms
  IBM 3390 (10.5")          3800     27940     2235        62             4250            20 ms
  Sony MO (5.25")            640     24130    18796       454               88            100 ms

Current Drawbacks to Tape
• Tape wear-out:
  – 100s of passes for helical, 1000s for longitudinal
• Head wear-out:
  – 2000 hours for helical
• Both must be accounted for in the economic / reliability model
• Long rewind, eject, load, and spin-up times; not inherent, there has just been no need in the marketplace (so far)
• Designed for archival use

Memory Cost 2005

Technology               Capacity    Price    Cost per GB
Memory Modules
  SRAM                     2 MB       $26     $13000/GB
  DRAM                     1 GB       $90     $90/GB
  Flash                    1 GB       $50     $50/GB
Magnetic Disks (3.5")
  200 GB drive           200 GB      $150     $0.75/GB
  70 GB drive             70 GB      $100     $1.43/GB
Optical Disks (5.25")
  4.6 GB disc            4.6 GB        $2     $0.43/GB

Disk Arrays
• Disk product families
  – Conventional: 4 disk designs (3.5", 5.25", 10", 14"), spanning low end to high end
  – Disk array: 1 disk design (3.5")

Large Disk Array of Small Disks

                  IBM 3390 (K)    IBM 3.5" 0061      x70
Data Capacity       20 GBytes       320 MBytes     23 GBytes
Volume              97 cu. ft.      0.1 cu. ft.    11 cu. ft.
Power                 3 KW            11 W           1 KW
Data Rate            15 MB/s            —          120 MB/s
I/O Rate            600 I/Os/s       55 I/Os/s    3900 I/Os/s
MTTF                250 KHrs            —           ??? Hrs
Cost                 $250 K           $2 K          $150 K

Disk arrays have the potential for large data and I/O rates, high MB per cu. ft., and high MB per KW – but what about reliability?

Array Reliability
• Reliability of N disks = reliability of 1 disk ÷ N
  – 50,000 hours ÷ 70 disks = 700 hours
• Disk system life drops from 6 years to 1 month!
• Arrays (without redundancy) are too unreliable to be useful!
• Hot spares support reconstruction in parallel with access: very high media availability can be achieved
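The numbers above come from the simple approximation that, without redundancy, the array fails when any one disk fails, so its MTTF is the single-disk MTTF divided by the number of disks:

```python
DISK_MTTF_HOURS = 50_000
N_DISKS = 70
HOURS_PER_YEAR = 8_760

array_mttf = DISK_MTTF_HOURS / N_DISKS    # ~714 hours, i.e. roughly one month
print(f"Single disk:   ~{DISK_MTTF_HOURS / HOURS_PER_YEAR:.1f} years between failures")
print(f"70-disk array: ~{array_mttf:.0f} hours (~{array_mttf / 730:.1f} months) without redundancy")
```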

Redundant Arrays of Disks
• Files are "striped" across multiple spindles
• Redundancy yields high data availability
  – Disks will fail
  – Contents reconstructed from data redundantly stored in the array
  – Capacity penalty to store it
  – Bandwidth penalty to update
• Techniques:
  – Mirroring/shadowing (high capacity cost)
  – Horizontal Hamming codes (overkill)
  – Parity & Reed-Solomon codes
  – Failure prediction (no capacity overhead!)
    • VaxSimPlus – technique is controversial

RAID 3: Parity Disk
[Figure: a logical record (e.g., 10010011 11001101 10010011 …) is striped into physical records across the data disks, with a parity disk P holding the bitwise parity of each stripe]
• Parity is computed across the recovery group to protect against hard disk failures
  – 33% capacity cost for parity in this configuration
  – wider arrays reduce the capacity cost, but decrease expected availability and increase reconstruction time
• Arms are logically synchronized and spindles are rotationally synchronized
  – logically a single high-capacity, high-transfer-rate disk
• Targeted for high-bandwidth applications: scientific computing, image processing
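The parity disk holds the bitwise XOR of the data disks in the recovery group, which is enough to rebuild any single failed disk from the survivors. A minimal sketch (the helper name is my own):

```python
from functools import reduce

def parity_block(blocks: list[bytes]) -> bytes:
    """Bitwise XOR across the stripe: this is what the parity disk stores."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

stripe = [b"\x93", b"\xcd", b"\x93"]      # 10010011, 11001101, 10010011 from the figure
p = parity_block(stripe)

# If one data disk fails, XOR-ing the surviving disks with the parity reconstructs it.
lost = 1
survivors = [blk for i, blk in enumerate(stripe) if i != lost]
assert parity_block(survivors + [p]) == stripe[lost]
```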

RAID 5+: High I/O Rate Parity
• A logical write becomes four physical I/Os
• Independent writes are possible because of interleaved parity
• Reed-Solomon codes ("Q") for protection during reconstruction
[Figure: data blocks D0–D23 and parity blocks P rotated across the disk columns, one stripe unit per block; logical disk addresses increase across the stripes]
• Targeted for mixed applications
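The "four physical I/Os" are the classic RAID 5 small-write sequence: read the old data, read the old parity, write the new data, write the new parity, with the new parity obtained by XOR. A minimal sketch (names are my own):

```python
def raid5_new_parity(old_data: int, old_parity: int, new_data: int) -> int:
    """Small-write parity update: I/O 1 reads old_data, I/O 2 reads old_parity,
    I/O 3 writes new_data, I/O 4 writes the value returned here."""
    return old_parity ^ old_data ^ new_data

# Example stripe of three data blocks plus parity.
d = [0b10010011, 0b11001101, 0b10010011]
p = d[0] ^ d[1] ^ d[2]

new_d1 = 0b01010101
p = raid5_new_parity(d[1], p, new_d1)
d[1] = new_d1
assert p == d[0] ^ d[1] ^ d[2]    # parity still covers the whole stripe
```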

IRAM
• Combine processor with memory in one chip
  – Large bandwidth: nearly 1000 gigabits per second (32 Kbits in 50 ns)
  – A hundredfold increase over the fastest computers today
  – The fastest programs will keep most memory accesses within a single IRAM, rewarding compact representations of code and data
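The bandwidth claim is simply the width of one on-chip access divided by the access time; a quick check of the arithmetic (taking 32 Kbits as 32 × 1024 bits):

```python
bits = 32 * 1024        # one very wide on-chip access
time_s = 50e-9          # 50 ns
print(f"{bits / time_s / 1e9:.0f} Gbit/s")   # ~655 Gbit/s, the order of the slide's "nearly 1000 gigabits per second"
```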