038c9ca38d8e7a1d5650f48bbe6073fc.ppt
- Number of slides: 31
Input/Output
Professor Alvin R. Lebeck
Computer Science 220
Fall 2001
Admin
• HW #4 due November 12
• Projects
Review: VM & Complete Memory Hierarchy
• Caches: cost-effective memory hierarchy
• VM is very nice for programmers
• TLB speeds up address translation
• Know how to block diagram the entire hierarchy
– direct-mapped, 2-way, fully-associative
– where is the TLB?
– including how to get the desired word or byte from a cache block
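System Organization
[Diagram: the processor (with cache and an interrupt line) and main memory sit on the memory bus; an I/O bridge (core chip set) connects the memory bus to the I/O bus, which hosts the disk controller (disk), the graphics controller (graphics), and the network interface (network).]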
Why I/O?
• Interactive apps
• Long-term storage (files)
• Swap for VM
• Networks (next chapter)
• 10^6 difference between CPU (10^-9 s) and I/O (10^-3 s) time scales
• Response time vs. throughput
– Not always another process to execute
• Remember Amdahl's Law
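The Amdahl's Law reminder can be made concrete with a short sketch. The 90%/10% split between CPU and I/O time and the 10x CPU improvement are illustrative assumptions, not figures from the lecture.

```python
def amdahl_speedup(enhanced_fraction: float, enhancement: float) -> float:
    """Overall speedup when only `enhanced_fraction` of the run time is sped up."""
    return 1.0 / ((1.0 - enhanced_fraction) + enhanced_fraction / enhancement)


# Assumed workload: 90% of the time is CPU work, 10% is I/O.
# Only the CPU part gets faster; the I/O part is unchanged.
cpu_fraction = 0.90
overall = amdahl_speedup(cpu_fraction, 10.0)
limit = amdahl_speedup(cpu_fraction, 1e12)  # CPU made (almost) infinitely fast

print(f"10x faster CPU   -> {overall:.2f}x overall")
print(f"infinitely fast  -> {limit:.2f}x overall")
```

Under these assumed numbers the untouched I/O time caps the overall speedup at about 10x no matter how fast the CPU becomes, which is why I/O performance cannot be ignored.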
Types of Storage Devices
• Magnetic disks
• Magnetic tapes
• CD
• DVD
• Flash memory
• Juke box (automated tape library, robots)
Magnetic Disks
• Long-term nonvolatile storage
• Another slower, less expensive level of the memory hierarchy
[Diagram labels: track, sector, arm, cylinder, head, platter]
Organization of a Hard Magnetic Disk
[Diagram labels: platters, track, sector]
• Typical numbers (depending on the disk size):
– 500 to 2,000 tracks per surface
– 32 to 128 sectors per track
» A sector is the smallest unit that can be read or written
• Traditionally all tracks have the same number of sectors; this has recently been relaxed:
– Constant bit density: record more sectors on the outer tracks
– Constant bit size: speed varies with track location
Magnetic Disk Characteristics
[Diagram labels: track, sector, cylinder, head, platter]
• Cylinder: all the tracks under the heads at a given arm position, across all surfaces
• Reading or writing data is a three-stage process:
– Seek time: position the arm over the proper track
– Rotational latency: wait for the desired sector to rotate under the read/write head
– Transfer time: transfer a block of bits (sector) under the read/write head
• Average seek time as reported by the industry:
– Typically in the range of 8 ms to 12 ms
– (Sum of the times for all possible seeks) / (total number of possible seeks)
• Due to locality of disk references, the actual average seek time may:
– Only be 25% to 33% of the advertised number
Disk Access
• Access time = queue + seek + rotational + transfer + overhead
• Seek time
– move arm over track
– average is confusing (startup, slowdown, locality of accesses)
• Rotational latency
– wait for sector to rotate under head
– average = 0.5 rotation / (3600 RPM) = 8.3 ms
• Transfer time
– a function of transfer size and bandwidth (bytes/sec)
Disk Time Example
• Disk parameters:
– Transfer size is 8 KB
– Advertised average seek is 12 ms
– Disk spins at 7200 RPM
– Transfer rate is 4 MB/sec
• Controller overhead is 2 ms
• Assume that the disk is idle, so there is no queuing delay
• What is the average disk access time for a sector?
– Average seek + average rotational delay + transfer time + controller overhead
– 12 ms + 0.5 / (7200 RPM / 60) + 8 KB / (4 MB/s) + 2 ms
– 12 + 4.17 + 2 + 2 ≈ 20 ms
• Advertised seek time assumes no locality; the actual seek is typically 1/4 to 1/3 of the advertised time, so access time drops from 20 ms to about 12 ms
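The arithmetic on this slide can be checked with a small sketch. The function name and parameter names are made up for illustration, and decimal units (1 KB = 1000 bytes, 1 MB = 10^6 bytes) are assumed so that the transfer term comes out at the slide's 2 ms.

```python
def disk_access_ms(seek_ms: float, rpm: float, transfer_bytes: float,
                   bandwidth_bytes_per_s: float, overhead_ms: float,
                   queue_ms: float = 0.0) -> float:
    """Access time = queue + seek + rotational + transfer + overhead (in ms)."""
    rotations_per_s = rpm / 60.0
    rotational_ms = 0.5 / rotations_per_s * 1000.0  # half a rotation on average
    transfer_ms = transfer_bytes / bandwidth_bytes_per_s * 1000.0
    return queue_ms + seek_ms + rotational_ms + transfer_ms + overhead_ms


# Slide parameters, with decimal KB/MB assumed.
advertised = disk_access_ms(12.0, 7200, 8_000, 4_000_000, 2.0)
# With locality, seek is roughly 1/3 of the advertised 12 ms.
with_locality = disk_access_ms(12.0 / 3, 7200, 8_000, 4_000_000, 2.0)

print(f"advertised seek: {advertised:.2f} ms")
print(f"1/3 of the seek: {with_locality:.2f} ms")
```

With the advertised seek the total is a little over 20 ms; with one third of the seek it is a little over 12 ms, matching the slide's "20 ms => 12 ms".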
DRAM as Disk
• Solid state disk, Expanded Storage, NVRAM
• Disk is slow, DRAM is fast => replace disk with battery-backed DRAM
• BUT disk is cheap, much cheaper than DRAM
Alternative Storage
• CD ROM
– read only: good for distribution
• CD RW
• FLASH memory
• Magnetic tape
– Sequential access
– R-DAT (Rotating Digital Audio Tape)
» Helical scan (angle to tape, high density ~5 GB)
– Tera- to petabytes of storage (NASA EOS)
Connecting I/O Devices to CPU/Memory
• Memory bus
– Short
– Fast
– Known set of components
– Proprietary (don't release design free)
– Ultra doesn't have a traditional bus
• Separate I/O bus (e.g., PCI)
– Standard
– Accepts a variety of components (with different bandwidth and performance)
– Long
– Slow
Processor Interface Issues
• Interconnections
– Busses
• Processor interface
– Instructions
– Memory mapped I/O
• I/O control structures
– Polling
– Interrupts
– DMA
– I/O controllers
– I/O processors
• Capacity, access time, bandwidth
Device Controllers
[Diagram: a device controller sits between the bus and the device, with Command, Status, and Data 0 through Data n-1 registers; labeled status flags: Interrupt?, Busy, Done, Error]
• Controller deals with mundane control (e.g., position head, error detection/correction)
• Processor communicates with the controller
Review: Interrupts and Exceptions
• Unnatural change in control flow
• Interrupt is an external event
– devices: disk, network, keyboard, etc.
– clock for timeslicing
– these are useful events; must do something when they occur
• Exception is often a potential problem with the program
– segmentation fault
– bus error
– divide by 0
– don't want my bug to crash the entire machine
– page fault (virtual memory…)
Review: Handling an Interrupt/Exception
[Diagram: a user program (ld, add, st, mul, beq, ld, sub, bne) is interrupted; control passes to the interrupt handler and its service routines, then returns with RETT]
• Invoke specific kernel routine based on type of interrupt
– interrupt/exception handler
• Must determine what caused the interrupt
– could use software to examine each device
– PC = interrupt_handler
• Vectored interrupts
– PC = interrupt_table[i]
– kernel initializes the table at boot time
• Clear the interrupt
• May return from interrupt (RETT) to a different process (e.g., context switch)
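The difference between a single handler that examines each device and a vectored dispatch (PC = interrupt_table[i]) can be sketched in software. This is only a model: the vector numbers, handler names, and the `log` list are hypothetical, and real hardware performs the table lookup itself.

```python
from typing import Callable, Dict, List

log: List[str] = []  # records which routine ran, for illustration only


def clock_handler() -> None:
    log.append("clock: timeslice expired")


def disk_handler() -> None:
    log.append("disk: transfer complete")


def network_handler() -> None:
    log.append("network: packet arrived")


# The kernel fills in this table at boot time (vector numbers are assumptions).
interrupt_table: Dict[int, Callable[[], None]] = {
    0: clock_handler,
    1: disk_handler,
    2: network_handler,
}


def dispatch(vector: int) -> None:
    """Model of 'PC = interrupt_table[i]': jump straight to the right routine."""
    handler = interrupt_table[vector]
    handler()


dispatch(1)
dispatch(0)
print(log)
```

With a non-vectored design, `dispatch` would instead call one generic routine that queries every device in turn to find the source.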
Device Drivers
• top-half
– API (open, close, read, write, ioctl)
– I/O control (IOCTL, device-specific arguments)
• bottom-half
– interrupt handler
– communicates with device
– resumes process
• Must have access to user address space and device control registers => runs in kernel mode
I/O Interface
• Independent I/O bus
[Diagram: CPU and memory on the memory bus; interfaces and peripherals on a separate I/O bus]
– Separate I/O instructions (in, out)
• Common memory & I/O bus
[Diagram: CPU, memory, interfaces, and peripherals share one bus]
– Lines distinguish between I/O and memory transfers
– Examples: VME bus, Multibus-II, Nubus
– 40 Mbytes/sec optimistically
– A 10 MIPS processor completely saturates the bus!
Memory Mapped I/O
[Diagram 1: CPU, memory, and peripheral interfaces on a single memory & I/O bus; the physical address space is divided into ROM, RAM, and I/O]
[Diagram 2: CPU with $ and L2 $ on the memory bus with memory; a bus adaptor (bridge) connects to the I/O bus]
• Single memory & I/O bus
• No separate I/O instructions
• Issue command through a store
• Check for completion with a load
• Write-back cache / write buffer?
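A toy model can show how one load/store interface reaches both memory and a device once part of the physical address space is assigned to I/O. Everything here is hypothetical: the `IO_BASE` address, the register offsets, and a device that completes its command instantly.

```python
class StatusDevice:
    """Hypothetical device with a command register and a done flag."""
    COMMAND_OFFSET = 0
    STATUS_OFFSET = 4

    def __init__(self) -> None:
        self.command = 0
        self.done = 0

    def store(self, offset: int, value: int) -> None:
        if offset == self.COMMAND_OFFSET:
            self.command = value
            self.done = 1  # assumption: the command finishes immediately

    def load(self, offset: int) -> int:
        return self.done if offset == self.STATUS_OFFSET else self.command


class Bus:
    """Routes each address either to RAM or to the device's registers."""
    IO_BASE = 0xFFFF0000  # assumed start of the I/O region
    IO_SIZE = 8

    def __init__(self, device: StatusDevice) -> None:
        self.ram = {}
        self.device = device

    def _is_io(self, addr: int) -> bool:
        return self.IO_BASE <= addr < self.IO_BASE + self.IO_SIZE

    def store(self, addr: int, value: int) -> None:
        if self._is_io(addr):
            self.device.store(addr - self.IO_BASE, value)
        else:
            self.ram[addr] = value

    def load(self, addr: int) -> int:
        if self._is_io(addr):
            return self.device.load(addr - self.IO_BASE)
        return self.ram.get(addr, 0)


bus = Bus(StatusDevice())
bus.store(0x1000, 42)                                      # ordinary memory write
bus.store(Bus.IO_BASE + StatusDevice.COMMAND_OFFSET, 0x1)  # issue command via store
done = bus.load(Bus.IO_BASE + StatusDevice.STATUS_OFFSET)  # check completion via load
print(done, bus.load(0x1000))
```

The model has no cache or write buffer. On real hardware a cached load of the status register could return a stale value, and a buffered store could delay the command, which is the concern behind the slide's last bullet.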
Programmed I/O (Polling)
[Flowchart: the CPU asks the I/O controller (IOC) "Is the data ready?"; if no, ask again (busy wait loop); if yes, read data from the device, store data to memory, then check "done?"; if no, repeat; if yes, finish]
• The busy wait loop is not an efficient way to use the CPU unless the device is very fast!
• But checks for I/O completion can be dispersed among computationally intensive code
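The two polling styles on this slide can be contrasted with a simulated device. The `SlowDevice` class, its "ready after N polls" behavior, and the unit-of-work counter are all assumptions made for illustration.

```python
from typing import Tuple


class SlowDevice:
    """Hypothetical device that becomes ready after a fixed number of polls."""

    def __init__(self, polls_until_ready: int, data: int) -> None:
        self._remaining = polls_until_ready
        self._data = data

    def ready(self) -> bool:
        if self._remaining > 0:
            self._remaining -= 1
            return False
        return True

    def read(self) -> int:
        return self._data


def busy_wait_read(device: SlowDevice) -> Tuple[int, int]:
    """Spin until ready; returns (data, number of wasted polls)."""
    wasted = 0
    while not device.ready():
        wasted += 1  # the CPU does nothing useful here
    return device.read(), wasted


def interleaved_read(device: SlowDevice, work_units: int) -> Tuple[int, int]:
    """Check the device between slices of computation; returns (data, work done)."""
    completed = 0
    while not device.ready():
        if completed < work_units:
            completed += 1  # stand-in for one slice of useful computation
    return device.read(), completed


print(busy_wait_read(SlowDevice(5, 0xAB)))
print(interleaved_read(SlowDevice(5, 0xAB), 3))
```

Both versions poll the same number of times; the second simply fills the gaps between checks with other work while any remains.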
Interrupt Driven Data Transfer
[Diagram: CPU with $ and L2 $ on the memory bus with memory; a bus adapter connects to the I/O bus and the device controller. (1) An I/O interrupt arrives while the user program (add, sub, and, or, nop) is running; (2) save PC; (3) jump to the interrupt service address; (4) the interrupt service routine in memory runs read, store, ..., rti]
• User program progress is only halted during the actual transfer
• Interrupt overhead can dominate transfer time
• 1000 transfers of 1000 bytes each:
– 2 µsec per interrupt
– 98 µsec per interrupt service
• Device transfer rate: 10 MB/s => 0.1 µsec/byte => 0.1 ms for 1000 bytes
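The slide's numbers can be combined to see how CPU overhead compares with raw device time. The constant names are invented for this sketch, and 10 MB/s is taken as 10^7 bytes per second.

```python
TRANSFERS = 1000
BYTES_PER_TRANSFER = 1000
INTERRUPT_US = 2                      # per interrupt
SERVICE_US = 98                       # per interrupt service routine
DEVICE_RATE_BYTES_PER_S = 10_000_000  # 10 MB/s, decimal units assumed

# CPU time spent taking and servicing one interrupt per transfer.
cpu_overhead_s = TRANSFERS * (INTERRUPT_US + SERVICE_US) / 1_000_000

# Time the device itself needs to move all the bytes.
device_time_s = TRANSFERS * BYTES_PER_TRANSFER / DEVICE_RATE_BYTES_PER_S

print(f"CPU overhead: {cpu_overhead_s:.3f} s")
print(f"device time:  {device_time_s:.3f} s")
```

Under these figures the CPU spends as long handling interrupts as the device spends transferring data, which is what "interrupt overhead can dominate" means here.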
Direct Memory Access
[Diagram: CPU with $ and L2 $ on the memory bus with memory; a memory bus adapter connects to the I/O bus, where the DMA controller (DMAC) and peripherals sit; the memory-mapped address space covers ROM, RAM, peripherals, and the DMA control registers]
• CPU sends a starting address, direction, and length count to the DMAC, then issues "start"
• DMAC must "talk" to both the memory bus and the I/O bus (e.g., PCI)
• Time to do 1000 transfers of 1000 bytes each:
– 1 DMA set-up sequence @ 50 µsec
– 1 interrupt @ 2 µsec
– 1 interrupt service sequence @ 48 µsec
– Total: 100 µsec = 0.0001 second of CPU time
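A quick calculation puts the DMA cost next to the interrupt-per-transfer cost from the previous slide (2 µsec per interrupt plus 98 µsec of service, 1000 times). The constant names are invented for this sketch.

```python
# DMA: one set-up, one completion interrupt, one service routine for the whole job.
DMA_SETUP_US = 50
DMA_INTERRUPT_US = 2
DMA_SERVICE_US = 48
dma_cpu_us = DMA_SETUP_US + DMA_INTERRUPT_US + DMA_SERVICE_US

# Interrupt-driven: one interrupt and one service routine per 1000-byte transfer.
TRANSFERS = 1000
PER_TRANSFER_US = 2 + 98
interrupt_cpu_us = TRANSFERS * PER_TRANSFER_US

ratio = interrupt_cpu_us // dma_cpu_us

print(f"DMA CPU time:              {dma_cpu_us} usec")
print(f"interrupt-driven CPU time: {interrupt_cpu_us} usec")
print(f"reduction factor:          {ratio}x")
```

This counts CPU time only; the bytes still take the same time to cross the bus, and the DMAC competes with the CPU for memory cycles.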
Input/Output Processors
[Diagram: CPU, IOP, and memory on the main memory bus; devices D1, D2, ..., Dn on the I/O bus behind the IOP]
• (1) CPU issues an instruction to the IOP
– Fields: OP, Device, Address (target device, where the commands are)
• (2) IOP looks in memory for commands
– Fields: OP (what to do), Addr (where to put data), Cnt (how much), Other (special requests)
• (3) Device to/from memory transfers are controlled by the IOP directly; the IOP steals memory cycles
• (4) IOP interrupts the CPU when done
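The command block the IOP fetches from memory (OP, Addr, Cnt, Other) can be modeled as a small record. This is a sketch only: the `"read"`/`"write"` operation names, the byte-array stand-ins for memory and device, and the boolean completion "interrupt" are assumptions, not details from the lecture.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class IopCommand:
    op: str          # what to do: "read" (device -> memory) or "write" (memory -> device)
    addr: int        # where in memory to put (or fetch) the data
    cnt: int         # how many bytes to move
    other: str = ""  # special requests (unused in this sketch)


def run_iop(commands: List[IopCommand], memory: bytearray, device: bytearray) -> bool:
    """Execute each command without CPU involvement; True models the final interrupt."""
    for cmd in commands:
        if cmd.cnt > len(device) or cmd.addr + cmd.cnt > len(memory):
            raise ValueError("transfer does not fit")
        if cmd.op == "read":
            memory[cmd.addr:cmd.addr + cmd.cnt] = device[:cmd.cnt]
        elif cmd.op == "write":
            device[:cmd.cnt] = memory[cmd.addr:cmd.addr + cmd.cnt]
        else:
            raise ValueError(f"unknown op: {cmd.op}")
    return True  # step (4): interrupt the CPU when done


memory = bytearray(16)
device = bytearray(b"abcdefgh")
interrupted = run_iop([IopCommand("read", addr=4, cnt=3)], memory, device)
print(bytes(memory[4:7]), interrupted)
```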
Relationship to Processor Architecture
• I/O instructions and I/O busses connected directly to the processor have largely disappeared (memory mapped I/O)
– Some embedded processors still have them (micro-controllers)
• Interrupts:
– Stack replaced by shadow registers
– Handler saves registers and re-enables higher-priority interrupts
– Interrupt types reduced in number; handler must query the interrupt controller
Relationship to Processor Architecture (continued)
• Caches required for processor performance cause problems for I/O
– Flushing is expensive, I/O pollutes the cache
– Solution is borrowed from shared memory multiprocessors: "snooping"
• Virtual memory frustrates DMA
• Load/store architecture at odds with atomic operations
– load locked, store conditional
• Caches and write buffers
– need uncached accesses and a write buffer flush for memory mapped I/O
• Stateful processors are hard to context switch
I/O Data Flow
• Impediment to high performance: multiple copies, complex hierarchy
Communication Networks
• Performance limiter is the memory system and OS overhead, not HW protocols
• Send/receive queues in processor OS memory
• Network controller copies back and forth via DMA
• No host intervention needed
• Interrupt host when message sent or received
• Memory-to-memory copy to user space
Network Connected Devices
• High speed networks (10 Gb Ethernet soon)
• How can we eliminate overheads?
• Page flipping
– OS places aligned data into memory and remaps pages
• RDMA
– Idea is to eliminate the kernel-to-user copy (user-level messaging)
– Requires "translation" on the Network Interface (NI)
– Application registers a region with the OS
– OS stores a pointer in the NI
– On receive, the pointer says where the data should go
Next Time
• Bus designs (connecting components)
• RAID