On Managing Continuous Media Data
Edward Chang, Hector Garcia-Molina
Stanford University
Challenges
- Large Volume of Data
  - MPEG-2: a 100-minute movie takes 3-4 GBytes
- Large Data Transfer Rate
  - MPEG-2: 4 to 6 Mbps
  - HDTV: 19.2 Mbps
- Just-in-Time Data Requirement
- Simultaneous Users
... Challenges
- Traditional Optimization Objectives:
  - Maximizing Throughput!!
  - Maximizing Throughput!!!
- How about Cost?
- How about Initial Latency?
Related Work
- IBM T. J. Watson Labs (P. Yu)
- USC (S. Ghandeharizadeh)
- UCLA (R. Muntz)
- UBC (Raymond Ng)
- Bell Labs (B. Ozden)
- etc.
Outline
- Server (Single Disk)
  - Revisiting Conventional Wisdom
  - Minimizing Cost
  - Minimizing Initial Latency
- Server (Parallel Disks)
  - Balancing Workload
  - Minimizing Cost & Initial Latency
- Client
  - Handling VBR
  - Supporting VCR-like Functions
Conventional Wisdom (for Single Disk)
- Reducing Disk Latency leads to Better Disk Utilization
- Reducing Disk Latency leads to Higher Throughput
- Increasing Disk Utilization leads to Improved Cost-Effectiveness
Is Conventional Wisdom Right?
- Does Reducing Disk Latency lead to Better Disk Utilization?
- Does Reducing Disk Latency lead to Higher Throughput?
- Does Increasing Disk Utilization lead to Improved Cost-Effectiveness?
Notation:
- Tseek: Disk Latency
- TR: Disk Transfer Rate
- DR: Display Rate
- S: Segment Size (Peak Memory Use per Request)
- T: Service Cycle Time
- N: Number of Concurrent Streams
S = DR × T
T = N × (Tseek + S/TR)
Disk Utilization
Solving the two equations above for S:
S = (N × TR × DR × Tseek) / (TR − N × DR)
so S is directly proportional to Tseek.
Dutil = (S/TR) / (S/TR + Tseek) = (N × DR) / TR
Dutil is constant: it does not depend on Tseek at all!
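To make the algebra concrete, here is a minimal Python check of the two formulas above. The parameter values (a 20 MB/s disk, 0.5 MB/s streams, 30 users) are illustrative assumptions, not figures from the talk; the point is that S scales with Tseek while Dutil stays at N × DR / TR.

```python
# Check that disk utilization is independent of Tseek.
# All parameter values are assumptions chosen for illustration.

def segment_size(N, TR, DR, Tseek):
    # S = (N * TR * DR * Tseek) / (TR - N * DR)
    return N * TR * DR * Tseek / (TR - N * DR)

def disk_utilization(N, TR, DR, Tseek):
    transfer = segment_size(N, TR, DR, Tseek) / TR  # per-IO transfer time
    return transfer / (transfer + Tseek)

TR = 20e6   # disk transfer rate: 20 MB/s (assumed)
DR = 0.5e6  # display rate: 0.5 MB/s, ~4 Mbps MPEG-2 (assumed)
N  = 30     # concurrent streams

for Tseek in (0.015, 0.030):  # 15 ms vs 30 ms of seek overhead
    S = segment_size(N, TR, DR, Tseek)
    print(f"Tseek={Tseek*1e3:.0f} ms: S={S/1e6:.1f} MB, "
          f"Dutil={disk_utilization(N, TR, DR, Tseek):.2f}")
# S doubles (0.9 MB -> 1.8 MB), but Dutil stays at N*DR/TR = 0.75.
```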
Is Conventional Wisdom Right?
- Does Reducing Disk Latency lead to Better Disk Utilization? NO!
- Does Reducing Disk Latency lead to Higher Throughput?
- Does Increasing Disk Utilization lead to Improved Cost-Effectiveness?
What Affects Throughput?
[Diagram: Disk Latency via Disk Utilization (crossed out) vs. via Memory Utilization (?) as the driver of Throughput]
Memory Requirement
- We Examine the Memory Requirements of Two Disk-Scheduling Policies:
  - Sweep (Elevator Policy): Enjoys the Minimum Seek Overhead
  - Fixed-Stretch: Suffers from High Seek Overhead
Per-User Peak Memory Use
S = (N × TR × DR × Tseek) / (TR − N × DR)
Sweep (Elevator)
- Disk Latency: Minimum
- IO Time Variability: Very High
Sweep (Elevator)
- Memory Sharing: Poor
- Total Memory Requirement: 2 × N × Ssweep
Fixed-Stretch
- Disk Latency: High (because of the stretch)
- IO Time Variability: None (because the schedule is fixed)
Fixed-Stretch
- Memory Sharing: Good
- Total Memory Requirement: 1/2 × N × Sfs
Throughput*
- Sweep
  - Total Memory: 2 × N × Ssweep
  - Available Memory = 40 MBytes → N = 40
- Fixed-Stretch
  - Total Memory: 1/2 × N × Sfs
  - Available Memory = 40 MBytes → N = 42
  - Higher Throughput!
* Based on a realistic case study using Seagate disks
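The comparison can be reproduced in a few lines. This sketch searches for the largest N each policy can admit from a shared memory pool; the disk parameters and per-policy seek overheads are assumptions standing in for the Seagate case-study numbers, so the absolute counts differ from the slide, but the ordering is the same.

```python
# Hedged sketch: max streams N each policy supports under a fixed
# memory budget. Parameter values are illustrative assumptions.

def seg(N, TR, DR, Tseek):
    return N * TR * DR * Tseek / (TR - N * DR)   # per-stream peak segment

def max_streams(mem_budget, mem_factor, Tseek, TR, DR):
    # mem_factor: 2 for Sweep (2*N*S), 0.5 for Fixed-Stretch (N*S/2)
    N = 1
    while N * DR < TR and mem_factor * N * seg(N, TR, DR, Tseek) <= mem_budget:
        N += 1
    return N - 1

TR, DR = 20e6, 0.5e6           # 20 MB/s disk, 0.5 MB/s streams (assumed)
budget = 40e6                  # 40 MBytes of server memory

print("Sweep:        ", max_streams(budget, 2.0, 0.010, TR, DR))  # 30
print("Fixed-Stretch:", max_streams(budget, 0.5, 0.025, TR, DR))  # 33
# Despite its larger per-IO overhead, Fixed-Stretch's better memory
# sharing admits more streams from the same memory pool.
```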
What Affects Throughput?
[Diagram: Disk Latency via Disk Utilization (crossed out) vs. via Memory Utilization (?) as the driver of Throughput]
Is Conventional Wisdom Right?
- Does Reducing Disk Latency lead to Better Disk Utilization? NO!
- Does Reducing Disk Latency lead to Higher Throughput? NO!
- Does Increasing Disk Utilization lead to Improved Cost-Effectiveness?
Per-Stream Cost
Per-Stream Memory Cost (Cm: unit memory cost)
Cm × S = (Cm × N × TR × DR × Tseek) / (TR − N × DR)
Example
- Disk Cost: $200 a unit
- Memory Cost: $5 per MByte
- Supporting N = 40 requires 60 MBytes of memory: $200 + $300 = $500
- Supporting N = 50 requires 160 MBytes of memory: $200 + $800 = $1,000
- For the same $1,000, it is better to buy 2 disks and 120 MBytes to support N = 80 users!
- Memory Use is Critical
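The arithmetic in the example, spelled out as a runnable snippet (prices as given on the slide):

```python
# Worked version of the cost example above ($200/disk, $5/MB).

disk_cost, mem_cost_per_mb = 200, 5

def system_cost(disks, mem_mb):
    return disks * disk_cost + mem_mb * mem_cost_per_mb

print(system_cost(1, 60),  "-> 1 disk,  60 MB, N = 40")   # $500
print(system_cost(1, 160), "-> 1 disk, 160 MB, N = 50")   # $1000
print(system_cost(2, 120), "-> 2 disks, 120 MB, N = 80")  # $1000
# Pushing one disk to N = 50 costs as much as two disks at N = 40 each:
# the last few streams are paid for in memory, not disk bandwidth.
```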
Is Conventional Wisdom Right?
- Does Reducing Disk Latency lead to Better Disk Utilization? NO!
- Does Reducing Disk Latency lead to Higher Throughput? NO!
- Does Increasing Disk Utilization lead to Improved Cost-Effectiveness? NO!
So What?
Outline
- Server (Single Disk)
  - Revisiting Conventional Wisdom
  - Minimizing Cost
  - Minimizing Initial Latency
- Server (Parallel Disks)
  - Balancing Workload
  - Minimizing Cost & Initial Latency
- Client
  - Handling VBR
  - Supporting VCR-like Functions
Initial Latency
- What is it?
  - The time from when a request arrives at the server to when the data is available in the server's main memory
- Where is it important?
  - Interactive applications (e.g., video games)
  - Interactive features (e.g., fast-scan)
Sweep (Elevator)
Fixed-Stretch
- Space Out IOs
Our Contribution: BubbleUp
- Fixed-Stretch Enjoys Fine Throughput
- BubbleUp Remedies Fixed-Stretch to Minimize Initial Latency
Schedule Office Work
- 8 am: Host a Visitor
- 9 am: Do Email
- 10 am: Write Paper
- 11 am:
- Noon: Lunch
BubbleUp
BubbleUp
- Empty Slots Are Always Next in Time
- No Additional Memory Required
  - Fill the Buffer up to the Segment Size
- No Additional Disk Bandwidth Required
  - The Disk Is Idle Otherwise
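The slides do not spell out the bookkeeping, so the following is only a toy reading of the idea, not the paper's exact mechanism: keep vacated slots at the head of the fixed-stretch cycle so a newly admitted stream is served in the very next slot. The class and method names are hypothetical.

```python
# Toy sketch of the BubbleUp idea: empty slots bubble up to the front
# of the cycle, so admission latency is at most one slot time.
from collections import deque

class BubbleUpSchedule:
    def __init__(self, n_slots):
        self.cycle = deque([None] * n_slots)    # None = empty slot

    def tick(self):
        """Serve the head slot, then rotate the cycle one position."""
        stream = self.cycle[0]
        self.cycle.rotate(-1)
        return stream                            # None: the disk idles

    def admit(self, stream):
        """A new stream takes the earliest empty slot."""
        i = self.cycle.index(None)               # near the front by design
        self.cycle[i] = stream
        return i                                 # slots of initial latency

    def depart(self, stream):
        """Vacate a slot and bubble it up to the head. Per the slide,
        this costs nothing extra: surviving streams' buffers are filled
        up to the segment size, and the disk was idle otherwise."""
        del self.cycle[self.cycle.index(stream)]
        self.cycle.appendleft(None)

sched = BubbleUpSchedule(4)
print(sched.admit("A"))   # 0: served in the very next slot
print(sched.admit("B"))   # 1
sched.depart("A")
print(sched.admit("C"))   # 0 again: the freed slot bubbled up
```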
Evaluation
Fast-Scan
Data Placement Policies
- Please refer to our publications
Chunk Allocation
- Allocate Memory in Chunks
- A Chunk = k × S
- Replicate the Last Segment of a Chunk at the Beginning of the Next Chunk
- Example (k = 5):
  - Chunk 1: s1, s2, s3, s4, s5
  - Chunk 2: s5, s6, s7, s8, s9
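A small sketch that reproduces the layout in the example (the function name and output formatting are mine):

```python
def chunk_layout(n_segments, k):
    """Group segments into chunks of k, replicating each chunk's last
    segment at the head of the next chunk, as in the example above."""
    first = min(k, n_segments)
    chunks = [[f"s{i}" for i in range(1, first + 1)]]
    last = first
    while last < n_segments:
        ids = range(last, min(last + k, n_segments + 1))
        chunks.append([f"s{i}" for i in ids])   # starts with the replica
        last = ids[-1]
    return chunks

for chunk in chunk_layout(9, 5):
    print(chunk)
# ['s1', 's2', 's3', 's4', 's5']
# ['s5', 's6', 's7', 's8', 's9']
```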
Chunk Allocation
- Largest-Fit First
- Best Fit (Last Chunk)
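The talk leaves these two heuristics undefined, so the sketch below is a guess at their intent rather than the published policy: consume the largest free extents first, and choose a best-fitting extent for the final, partial chunk. All names are hypothetical.

```python
def place(segments_left, free_extents):
    """Assign a movie's segments to free extents (sizes in segments):
    Largest-Fit First for full chunks, Best Fit for the last one.
    A hypothetical reading of the slide, not the paper's algorithm."""
    plan, free = [], sorted(free_extents, reverse=True)
    while segments_left > 0 and free:
        if segments_left >= free[0]:
            ext = free.pop(0)                      # Largest-Fit First
        else:
            fits = [e for e in free if e >= segments_left]
            ext = min(fits) if fits else free[0]   # Best Fit (last chunk)
            free.remove(ext)
        take = min(ext, segments_left)
        plan.append((ext, take))
        segments_left -= take
    return plan

print(place(18, [8, 6, 5, 4, 3]))   # [(8, 8), (6, 6), (4, 4)]
```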
[Figure slides: an 18-segment placement, under Largest-Fit First and under Best Fit]
Outline
- Server (Single Disk)
  - Revisiting Conventional Wisdom
  - Minimizing Cost
  - Minimizing Initial Latency
- Server (Parallel Disks)
  - Balancing Workload
  - Minimizing Cost & Initial Latency
- Client
  - Handling VBR
  - Supporting VCR-like Functions
[Figure slides: unbalanced vs. balanced workload across parallel disks]
Per-Stream Memory Use (Using M Disks Independently)
S = (N × TR × DR × Tseek) / (TR − N × DR)
(M × N streams in total)
Per-Stream Memory Use (Using M Disks as One Disk)
(M × N streams in total)
... Continued
Independent disks:
S = (N × TR × DR × Tseek) / (TR − N × DR)
One virtual striped disk (M × N streams at transfer rate M × TR):
S' = ((N × M) × (TR × M) × DR × Tseek) / ((TR × M) − (N × M) × DR)
   = (M × N × TR × DR × Tseek) / (TR − N × DR)
   = M × S
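A quick numeric confirmation of the algebra (parameter values are assumptions, as before):

```python
# Check that fine-grained striping inflates per-stream memory M-fold.

def seg(N, TR, DR, Tseek):
    return N * TR * DR * Tseek / (TR - N * DR)

TR, DR, Tseek, N, M = 20e6, 0.5e6, 0.02, 30, 4
S  = seg(N, TR, DR, Tseek)            # M independent disks
Sp = seg(N * M, TR * M, DR, Tseek)    # one virtual striped disk
print(Sp / S)                         # -> 4.0, i.e. exactly M
```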
Challenges
- Using M Disks Independently:
  - Unbalanced Workload
  - Low Per-Stream Memory Cost
- Using M Disks as One Virtual Disk (i.e., Employing Fine-Grained Striping):
  - Balanced Workload
  - High Per-Stream Memory Cost
Our Approach (2DB)
- Use Disks Independently
  - To Minimize Cost
- Replicate Hot Movies (20% of Movies)
  - To Balance Workload
- Use BubbleUp
  - To Minimize Initial Latency
2D BubbleUp (2DB)
- Intelligent Data Placement
- Efficient Request Scheduling
- FODO, 1998
2DB Data Placement: Chunk Allocation
2DB Scheduling
- Formally, this is a Bipartite Weighted Matching problem
  - Can be solved with the Hungarian method in O(V^3), where V = NM
- We use a Greedy Method to reduce it to a Bipartite Unweighted Matching problem
  - Can be solved in O(M^2)
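A minimal sketch of such a greedy matching: each pending request names the disks holding a replica of its movie, and requests are matched to distinct free disks, most-constrained first. This is a generic greedy matcher, not necessarily the paper's exact rule.

```python
# Greedy bipartite matching of requests to disks. With hot movies
# replicated on two disks, most requests have two candidate disks.

def greedy_match(requests):
    """requests: dict request -> set of candidate disks."""
    assignment, busy = {}, set()
    # Most constrained first: requests with fewer candidate disks.
    for req in sorted(requests, key=lambda r: len(requests[r])):
        free = requests[req] - busy
        if free:
            disk = min(free)          # any free replica disk works
            assignment[req] = disk
            busy.add(disk)
    return assignment

reqs = {"r1": {0, 1}, "r2": {1}, "r3": {0, 2}}
print(greedy_match(reqs))             # {'r2': 1, 'r1': 0, 'r3': 2}
```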
Why Does 2DB Work?
Balls and urns (finite n):
- n balls into n urns, one random choice each: maximum load = ln n / ln ln n × (1 + o(1))
- n balls into n urns, two choices each: maximum load = ln ln n / ln 2 + O(1)
For m balls into n urns, m > n, with d possible destinations each (m and n large):
- maximum load = ln ln n / ln d × (1 + o(1)) + O(m/n)
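The effect is easy to see by simulation. Below, each "ball" (request) either lands in one random "urn" (disk) or picks the emptier of d = 2 random urns, mirroring a movie replicated on two disks:

```python
# Simulation of the balls-and-urns results cited above: two choices
# per ball collapse the maximum load compared with a single choice.
import random

def max_load(n_balls, n_urns, d):
    load = [0] * n_urns
    for _ in range(n_balls):
        choices = random.sample(range(n_urns), d)
        best = min(choices, key=lambda u: load[u])  # go to the emptier urn
        load[best] += 1
    return max(load)

random.seed(0)
n = 10_000
print("d=1:", max_load(n, n, 1))   # ~ ln n / ln ln n
print("d=2:", max_load(n, n, 2))   # ~ ln ln n / ln 2, much smaller
```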
What Does 2DB Cost?
- Storage Cost
  - Additional disk space equals the fraction of movies replicated as hot
  - Typically, 20% of the movies are requested 80% of the time
- Throughput is scaled back by a fraction to achieve a balanced workload
Evaluation
- 2DB Achieves a Balanced Workload with High Throughput
  - Compared to, e.g., some dynamic load-balancing schemes
- 2DB Incurs Low Additional Storage Cost
- 2DB Enjoys Minimum Initial Latency
Outline
- Server (Single Disk)
  - Revisiting Conventional Wisdom
  - Minimizing Cost
  - Minimizing Initial Latency
- Server (Parallel Disks)
  - Balancing Workload
  - Minimizing Cost & Initial Latency
- Client
  - Handling VBR
  - Supporting VCR-like Functions
Media Client
- Most Studies Assume Dumb Clients
- We Propose Smart Clients for
  - Handling VBR
  - Supporting VCR-like Functions
Handling VBR
- The Server Can Handle VBR
  - The frame rate fluctuates, but the moving average does not fluctuate as much
  - Rates even out when N is large, which is typically the case
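A quick illustration of the averaging argument with synthetic rates (no real MPEG trace is used; the numbers only show the trend):

```python
# The per-stream bit rate swings widely, but the aggregate of N streams
# fluctuates far less relative to its mean.
import random
import statistics as st

random.seed(1)

def vbr_stream(frames=1000, mean=4.0, swing=3.0):
    return [mean + random.uniform(-swing, swing) for _ in range(frames)]

streams = [vbr_stream() for _ in range(50)]      # N = 50 viewers
aggregate = [sum(f) for f in zip(*streams)]      # per-frame total rate

def cv(xs):                                      # coefficient of variation
    return st.pstdev(xs) / st.mean(xs)

print("one stream:", round(cv(streams[0]), 3))   # large swings
print("aggregate :", round(cv(aggregate), 3))    # far smoother
```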
... VBR
- But the Server Cannot Eliminate Bitrate Mismatch
  - Packetization and channel delay can change the bitrate
- The Solution Must Be at the Client Side!
Supporting VCR-like Functions
- Pause
  - Phone-call interruptions
  - Biological needs
- Fast Forward
  - Catching up with the program after a pause
- Instant Replay
How to Pause a Movie?
- Broadcast TV Cannot Be Paused
- Pausing via a Point-to-Point Link Affects the Server's Scheduling
- Caching!!!
  - Main Memory Caching? Too expensive! (19.2 Mbps × 20 min ≈ 2.9 GBytes)
Buffer Management
Challenges
- Must Ensure Arriving Bits Do Not Overflow the Network Buffer
- Must Ensure the Decoder Buffer Does Not Underflow
- Must Work with Any Off-the-Shelf Disk and CPU Box
Our Contribution: MEDIC
- MEDIC: MEmory & Disk Integrated Cache
- MEDIC Manages IOs Between Memory and Disk Efficiently
  - Only 4 MBytes of main memory needed!!!
  - Makes a set-top box affordable
- MEDIC Adapts to the Hardware Configuration
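The published design is not reproduced here; the sketch below is only a toy of the memory-and-disk integration idea as the slides describe it: a small RAM ring absorbs arriving bits, spills the overflow to disk, and feeds the decoder in order so its buffer never starves. Class and method names are hypothetical.

```python
# Toy two-tier client cache: RAM absorbs arriving blocks, overflow
# spills to disk, and the decoder is fed strictly in arrival order.
from collections import deque

class TieredCache:
    def __init__(self, ram_limit):
        self.ram, self.disk = deque(), deque()
        self.ram_limit = ram_limit            # e.g. 4 MB worth of blocks

    def write(self, block):
        """Arriving network data: RAM first; overflow spills to disk."""
        if len(self.ram) >= self.ram_limit:
            self.disk.append(self.ram.popleft())  # free RAM for new data
        self.ram.append(block)

    def read(self):
        """Feed the decoder in arrival order: disk backlog, then RAM."""
        if self.disk:
            return self.disk.popleft()
        return self.ram.popleft() if self.ram else None

cache = TieredCache(ram_limit=4)
for b in range(10):
    cache.write(b)
print([cache.read() for _ in range(10)])   # blocks come back in order
```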
Demo
- Regular Playback
- Pause
- Resume Regular Playback
- Fast Forward
- Instant Replay (not shown)
Visualize MEDIC
Conclusions (Contributions in Blue)
- Server (Single Disk)
  - Revisiting Conventional Wisdom
  - Minimizing Cost
  - Minimizing Initial Latency
- Server (Parallel Disks)
  - Balancing Workload
  - Minimizing Cost & Initial Latency
- Client
  - Handling VBR
  - Supporting VCR-like Functions
... Conclusions
- Our Server Supports
  - Low-Latency Playback and Fast Forward
- Our Client Supports
  - Pause and Low-Latency Instant Replay
- Together, We Propose a Complete End-to-End Solution for Continuous Media Data Delivery!
Future Work
- Enhancing MEDIC for Managing Heterogeneous Data from Both Broadcast & Internet Channels
- Video Panoramas
- Interactive TV
- Indexing Videos for Replay
- Video/Image Databases