- Number of slides: 29
An On-the-Fly Reference Counting Garbage Collector for Java
Erez Petrank, Technion – Israel Institute of Technology
Joint work with Yossi Levanoni, Microsoft Corporation
ACM Conference on Object Oriented Programming Systems Languages & Applications
Tampa, Florida, October 18, 2001
On-the-Fly Reference Counting Levanoni & Petrank
Garbage Collection Today
• Two classic approaches:
  – Tracing [McCarthy 1960]: trace reachable objects, reclaim objects not traced.
  – Reference counting [Collins 1960]: keep a reference count for each object, reclaim objects with count 0.
• Today’s advanced environments:
  – multiprocessors
  – huge memories
Motivation for RC
• Reference counting work is proportional to the work on object creations and pointer modifications.
  – Can tracing deal with tomorrow’s huge heaps?
• Reference counting has good locality.
• Tracing rules JVMs; is that justified?
• The challenge:
  – RC write barriers seem too expensive.
  – RC seems impossible to “parallelize”.
This Work
• An improved RC (suitable for Java):
  – Reduced overhead on the write barrier.
  – Concurrent with low overhead: on-the-fly, no synchronization operation in the write barrier, runs on a multiprocessor.
  – Thus: low latency, high performance.
• Implementation:
  – JVM: Sun’s Java Virtual Machine 1.2.2
  – Platform: 4-way IBM Netfinity 8500R server with 550 MHz Intel Pentium III Xeon processors and 2 GB memory.
Agenda
✓ Introduction
✓ Motivation
➢ The Algorithm
• Related issues
• Implementation and Measurements
• Conclusions
Terminology
Basic Reference Counting
• Each object has an RC field; new objects get o.RC := 1.
• When a pointer p that points to o1 is modified to point to o2, we do: o1.RC--, o2.RC++.
• If o1.RC == 0 afterwards:
  – Delete o1.
  – Decrement o.RC for all sons o of o1.
  – Recursively delete objects whose RC is decremented to 0.
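The three rules of basic reference counting can be sketched in C as follows. This is a minimal single-threaded illustration, not the authors' implementation: the names `rc_new`, `rc_release`, `rc_update`, the fixed fan-out `MAX_SONS` and the `live_objects` counter are all hypothetical and exist only to make the effect observable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_SONS 2  /* hypothetical fixed fan-out, for illustration only */

typedef struct Object {
    int rc;                        /* reference count */
    struct Object *sons[MAX_SONS]; /* outgoing pointers */
} Object;

int live_objects = 0;              /* bookkeeping so reclamation is visible */

/* Allocate an object; a new object starts with RC = 1 (its creator). */
Object *rc_new(void) {
    Object *o = calloc(1, sizeof *o);
    assert(o != NULL);
    o->rc = 1;
    live_objects++;
    return o;
}

/* Drop one reference; at 0, release the sons recursively and delete. */
void rc_release(Object *o) {
    if (o == NULL || --o->rc > 0) return;
    for (int i = 0; i < MAX_SONS; i++) rc_release(o->sons[i]);
    free(o);
    live_objects--;
}

/* Pointer update: *slot changes from o1 to o2, so o2.RC++ and o1.RC--. */
void rc_update(Object **slot, Object *o2) {
    Object *o1 = *slot;
    if (o2 != NULL) o2->rc++;  /* increment first, in case o1 == o2 */
    *slot = o2;
    rc_release(o1);
}
```

Clearing the only pointer to an object through `rc_update(slot, NULL)` frees that object and, through the recursion in `rc_release`, everything reachable only from it.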
Deferred Reference Counting
• Problem: the overhead of updating reference counts for program variables (locals) is too high.
• Solution [Deutsch & Bobrow]:
  – Don’t update RC for locals.
  – “Once in a while”: collect all objects with o.RC = 0 that are not referenced from local roots.
• Deferred RC reduces overhead by 80%. Used in most modern RC systems.
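A small sketch may make the deferred scheme concrete. It assumes a tiny fixed heap and one pointer field per object; `drc_new`, `drc_write_field` and `drc_collect` are hypothetical names, and scanning the whole heap stands in for whatever bookkeeping a real system would use. Because locals are not counted, a new object starts here with a count of 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define HEAP_SIZE 8   /* hypothetical tiny heap, for illustration only */

typedef struct Object {
    int rc;               /* counts references from heap slots only */
    bool allocated;
    struct Object *field; /* a single heap slot per object */
} Object;

Object heap[HEAP_SIZE];

/* Allocate with rc = 0: only an (uncounted) local refers to it so far. */
Object *drc_new(void) {
    for (int i = 0; i < HEAP_SIZE; i++)
        if (!heap[i].allocated) {
            heap[i] = (Object){ .rc = 0, .allocated = true, .field = NULL };
            return &heap[i];
        }
    return NULL; /* heap exhausted */
}

/* Heap slot update: counted. Writes to locals are plain C assignments. */
void drc_write_field(Object *holder, Object *target) {
    if (target != NULL) target->rc++;
    if (holder->field != NULL) holder->field->rc--;
    holder->field = target;
}

static bool is_root(Object *o, Object **roots, int nroots) {
    for (int i = 0; i < nroots; i++)
        if (roots[i] == o) return true;
    return false;
}

/* "Once in a while": reclaim objects with rc == 0 that no local references.
   Repeats until nothing more is freed, so chains of garbage go too.
   Returns the number of objects freed. */
int drc_collect(Object **roots, int nroots) {
    int freed = 0;
    bool progress = true;
    while (progress) {
        progress = false;
        for (int i = 0; i < HEAP_SIZE; i++) {
            Object *o = &heap[i];
            if (o->allocated && o->rc == 0 && !is_root(o, roots, nroots)) {
                if (o->field != NULL) o->field->rc--;
                o->allocated = false;
                freed++;
                progress = true;
            }
        }
    }
    return freed;
}
```

The saving comes from the mutator side: assignments to locals cost nothing, and only `drc_write_field` touches a count.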
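Multithreaded RC?
• Problem:
  – Parallel updates confuse counts (A.next initially points to B):
      Thread 1: read A.next; A.next := C; B.RC--; C.RC++
      Thread 2: read A.next; A.next := D; B.RC--; D.RC++
    Both threads read B as the old value, so B.RC is decremented twice although only one reference to B was removed, and both C.RC and D.RC are incremented although A.next ends up pointing to only one of them.
  – (And more: updating reference counts in parallel is itself a race.)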
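The interleaving on this slide can be replayed deterministically in a single thread, which makes the broken counts easy to inspect. The sketch below is an illustration only; `begin_update`, `finish_update` and `replay_race` are hypothetical names, and real threads are deliberately not used so that the outcome is reproducible.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Node {
    int rc;
    struct Node *next;
} Node;

/* The old value a thread has read but not yet acted on. */
typedef struct { Node *old; } Pending;

/* Step 1 of a naive, unsynchronized update: read A.next. */
Pending begin_update(Node *a) {
    Pending p = { a->next };
    return p;
}

/* Step 2: store the new target, then adjust both counts. */
void finish_update(Node *a, Pending p, Node *target) {
    assert(p.old != NULL && target != NULL);
    a->next = target;   /* A.next := target */
    p.old->rc--;        /* old.RC--         */
    target->rc++;       /* target.RC++      */
}

/* Replays the slide's interleaving: both "threads" read before either writes. */
void replay_race(Node *a, Node *c, Node *d) {
    Pending t1 = begin_update(a);   /* thread 1 reads B      */
    Pending t2 = begin_update(a);   /* thread 2 also reads B */
    finish_update(a, t1, c);        /* thread 1: A.next := C */
    finish_update(a, t2, d);        /* thread 2: A.next := D */
}
```

Starting from B.RC = 1 and C.RC = D.RC = 0, the replay leaves B.RC at -1 and C.RC at 1 even though nothing points to C any more.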
Multithreaded RC
• Problem:
  – Parallel updates confuse counts.
  – Updating reference counts in parallel is itself a race.
• [DeTreville]:
  – Lock the heap for each pointer modification.
  – Each thread records its updates in a buffer.
  – Once in a while (snapshot alike):
      • The GC thread reads all buffers to update ref counts.
      • It reclaims all objects with rc 0 that are not local.
To Summarize…
• Overhead on the write barrier is considered high.
  – Even with the deferred RC of Deutsch & Bobrow.
• Using reference counting concurrently with program threads seems to bear a high synchronization cost.
  – Lock or “compare & swap” for each pointer update.
Improving RC
• Consider a pointer p that takes the following values between GCs: O0, O1, O2, …, On.
• All RC algorithms perform 2n operations: O0.RC--; O1.RC++; O1.RC--; O2.RC++; O2.RC--; … ; On.RC++;
• But only two operations are needed: O0.RC-- and On.RC++
[Diagram: p pointing in turn to O0, O1, O2, O3, O4, ..., On]
Improving RC cont’d
• Don’t record all pointer modifications. Record only the first modification of each pointer between GCs (which yields O0).
• During the collection, for each recorded pointer p:
  – find O0 by checking the record,
  – find On by reading the heap during the collection.
• Apply only two operations for each such pointer: O0.RC-- and On.RC++
• This reduces the number of logging and counter updates by a factor of 100 to 1000 for normal benchmarks!
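To see why the intermediate updates cancel, the following sketch applies both strategies to the same sequence of values and counts the operations. It is an illustration under simple assumptions: `Obj`, `apply_naive` and `apply_coalesced` are hypothetical names, and `vals[0..n]` stands for O0, …, On.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int rc; } Obj;

/* Naive RC: one decrement and one increment per modification.
   vals[0..n] are the successive values of p; returns operations done. */
int apply_naive(Obj **vals, size_t n) {
    int ops = 0;
    assert(vals != NULL);
    for (size_t i = 0; i < n; i++) {
        vals[i]->rc--;       /* O(i).RC--   */
        vals[i + 1]->rc++;   /* O(i+1).RC++ */
        ops += 2;
    }
    return ops;
}

/* Coalesced: only the first old value and the final value matter. */
int apply_coalesced(Obj *first_old, Obj *current) {
    assert(first_old != NULL && current != NULL);
    first_old->rc--;         /* O0.RC--, taken from the log record      */
    current->rc++;           /* On.RC++, read from the heap at GC time  */
    return 2;
}
```

For three modifications the naive version performs six operations and the coalesced version two, and both leave every object with the same count, because each intermediate object receives exactly one increment and one decrement.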
Improving Synch. Overhead
• Simple solutions bear unacceptable overhead:
  – DeTreville uses a lock for all pointer modifications.
  – Simple alternatives require 3 compare-and-swap operations.
• Our second contribution:
  – A carefully designed write barrier (and an observation) allows elimination of all synchronization operations from the write barrier.
The Write Barrier
    Update(Object **slot, Object *new) {
        Object *old = *slot;
        if (!IsDirty(slot)) {
            log(slot, old);
            SetDirty(slot);
        }
        *slot = new;
    }
Observation: if two threads
  1. invoke the write barrier in parallel, and
  2. both log an old value,
then both record the same old value.
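The barrier can be made compilable with a few assumptions that the slide does not state: here the dirty flag sits next to the pointer in a hypothetical `Slot` struct, each thread owns a fixed-size `LogBuffer`, and the function is called `update`. The sketch exercises only the single-threaded logic and ignores memory-ordering concerns that a real multiprocessor implementation would have to address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct Object { int rc; } Object;

/* A heap slot with its dirty flag (placement of the flag is an assumption). */
typedef struct {
    Object *ptr;
    bool dirty;
} Slot;

typedef struct {
    Slot *slot;     /* which slot was modified              */
    Object *old;    /* its value before the first store     */
} LogEntry;

#define LOG_CAPACITY 64  /* hypothetical fixed size for the sketch */

/* Per-thread log buffer: each mutator appends only to its own buffer. */
typedef struct {
    LogEntry entries[LOG_CAPACITY];
    size_t count;
} LogBuffer;

/* Write barrier: read the old value first, log it only if the slot is
   not yet dirty, then store. No lock and no compare-and-swap. */
void update(LogBuffer *buf, Slot *slot, Object *new_value) {
    Object *old = slot->ptr;
    if (!slot->dirty) {
        assert(buf->count < LOG_CAPACITY);
        buf->entries[buf->count++] = (LogEntry){ slot, old };
        slot->dirty = true;
    }
    slot->ptr = new_value;
}
```

Two consecutive stores to the same slot produce a single log entry holding the value from before the first store, which is the only old value the collector needs.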
Intermediate Algorithm: Snapshot Oriented, Concurrent
• Use the write barrier with program threads.
• To collect:
  – Stop all threads.
  – Scan roots (locals).
  – Get the buffers with modified slots.
  – Clear all dirty bits.
  – Resume threads.
  – For each modified slot:
      • decrease rc for the old value (written in the buffer),
      • increase rc for the current value (“read heap”).
  – Reclaim non-local objects with rc 0.
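The collection steps listed above can be sketched as one C function. This is a simplified illustration, not the algorithm as implemented: it assumes the mutators stay stopped for the whole cycle (the slide resumes them before the counts are adjusted), it marks reclaimed objects with a `freed` flag instead of releasing memory, and it omits the recursive release of a reclaimed object's sons. All type and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct Object { int rc; bool freed; } Object;
typedef struct { Object *ptr; bool dirty; } Slot;
typedef struct { Slot *slot; Object *old; } LogEntry;

#define LOG_CAPACITY 64  /* hypothetical fixed size for the sketch */
typedef struct { LogEntry entries[LOG_CAPACITY]; size_t count; } LogBuffer;

static bool is_local(Object *o, Object **roots, size_t nroots) {
    for (size_t i = 0; i < nroots; i++)
        if (roots[i] == o) return true;
    return false;
}

/* One collection cycle over a single log buffer.
   Returns the number of objects reclaimed. */
size_t collect(LogBuffer *buf, Object **objects, size_t nobjects,
               Object **roots, size_t nroots) {
    assert(buf->count <= LOG_CAPACITY);

    /* Clear the dirty bits of all logged slots. */
    for (size_t i = 0; i < buf->count; i++)
        buf->entries[i].slot->dirty = false;

    /* For each modified slot: old value down, current value up. */
    for (size_t i = 0; i < buf->count; i++) {
        LogEntry e = buf->entries[i];
        if (e.old != NULL) e.old->rc--;              /* from the buffer */
        if (e.slot->ptr != NULL) e.slot->ptr->rc++;  /* "read heap"     */
    }
    buf->count = 0;

    /* Reclaim objects with rc 0 that no local root references. */
    size_t reclaimed = 0;
    for (size_t i = 0; i < nobjects; i++) {
        Object *o = objects[i];
        if (!o->freed && o->rc == 0 && !is_local(o, roots, nroots)) {
            o->freed = true;
            reclaimed++;
        }
    }
    return reclaimed;
}
```

An object with rc 0 that is still referenced from a local survives the cycle, which is the deferred reference counting rule applied at collection time.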
The Sliding View Algorithm: On-the-Fly
• Do all collection as threads run (the “sliding view”):
  – Read threads’ buffers (one thread at a time),
  – Clear all dirty bits,
  – Update reference counts,
  – Read roots of each thread, one at a time,
  – Reclaim (recursively) objects with rc 0.
• Note: rc’s are not correct for any specific point in time, yet, with care, most dead objects may be reclaimed!
• Borrow ideas from [Lamport et al.]
Cycles Collection
• Our solution: use a tracing algorithm infrequently.
• Currently this is the most efficient solution. Cycle collectors have high cost.
• We propose a new on-the-fly mark & sweep algorithm that works best with the same sliding view. It can also be used “on its own”.
Implementation for Java
• Based on Sun’s JDK 1.2.2 for Windows NT
• Main features:
  – 2-bit RC field per object (à la [Wise et al.])
  – A supplemental sliding view tracing algorithm
  – A custom allocator for on-the-fly RC:
      • Multi-level fine-grained locking
      • Supports sporadic reclamation of objects
      • Supports sweeping the heap
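The slide does not say how a 2-bit count behaves once it overflows. A common treatment of such small count fields, assumed here and not confirmed by the slides, is to let the count saturate at its maximum ("stuck") and leave it untouched until a tracing pass recomputes it, which would fit the supplemental tracing algorithm listed above. The header layout and the names `rc_get`, `rc_inc`, `rc_dec` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define RC_BITS  2u
#define RC_MASK  ((1u << RC_BITS) - 1u)  /* low two bits: 0b11 */
#define RC_STUCK RC_MASK                 /* saturated value: 3 */

/* Object header word; the low two bits hold the count (assumed layout). */
typedef uint32_t Header;

unsigned rc_get(Header h) { return h & RC_MASK; }

/* Increment, saturating at RC_STUCK so the other header bits stay intact. */
Header rc_inc(Header h) {
    return rc_get(h) == RC_STUCK ? h : h + 1u;
}

/* Decrement; a stuck count no longer reflects the true number of
   references, so it is left alone (assumed to be fixed up by tracing). */
Header rc_dec(Header h) {
    unsigned rc = rc_get(h);
    assert(rc > 0u);
    return rc == RC_STUCK ? h : h - 1u;
}
```

With this encoding the counts 0, 1 and 2 are exact, and any object that ever reaches 3 references can only be reclaimed by the tracing collector.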
Performance Measurements
• First multiprocessor measurements in a “normal” environment!
  – (Previously reported measurements assumed one CPU is free for GC all the time.)
• Benchmarks:
  – Server benchmarks
      • SPECjbb2000: simulates business-like transactions in a large firm
      • MTRT: a multi-threaded ray tracer
  – Client benchmarks
      • SPECjvm98: a suite of mostly single-threaded client benchmarks
Improved RC
• How many RC updates are eliminated?

Benchmark    No. of stores   No. of “first” stores   Ratio of “first” stores
jbb             71,011,357                 264,115                     1/269
Compress            64,905                      51                    1/1273
Db              33,124,780                  30,696                    1/1079
Jack           135,174,775                   1,546                   1/87435
Javac           22,042,028                 535,296                      1/41
Jess            26,258,107                  27,333                     1/961
mpegaudio        5,517,795                      51                  1/108192
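The ratio column appears to be the number of stores divided by the number of “first” stores, rounded to the nearest integer. The helper below, with the hypothetical name `first_store_ratio`, is a small sketch for checking the figures on this slide under that assumption.

```c
#include <assert.h>

/* Returns N such that the ratio of "first" stores is about 1/N,
   i.e. stores / first_stores rounded to the nearest integer. */
unsigned long first_store_ratio(unsigned long stores,
                                unsigned long first_stores) {
    assert(first_stores > 0);
    return (stores + first_stores / 2) / first_stores;
}
```

For example, jbb has 71,011,357 stores and 264,115 first stores, and 71,011,357 / 264,115 ≈ 268.9, which rounds to the listed 1/269.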
SPECjbb Latency (Max Transaction Time)
SPECjbb Throughput
MTRT Throughput
SPECjbb Heap Utilization
Client Performance
Related Work
• On-the-fly tracing:
  – Dijkstra et al. (1976), Steele (1976), Lamport (1976),
  – Kung & Song (1977), Gries (1977), Ben-Ari (1982, 1984), Huelsbergen et al. (1993, 1998)
  – Doligez-Gonthier-Leroy (1993-4), Domani-Kolodner-Petrank (2000)
• Concurrent reference counting:
  – DeTreville (1990), Martinez et al. (1990), Lins (1992)
  – Plakal & Fischer (2001), Bacon et al. (2001)
Conclusions
• A new algorithm for reference counting:
  – Low overhead on pointer modification
  – On-the-fly
• Implementation for Java
• Measurements show high throughput and low latency.
• To be out soon: a matching paper on the sliding view tracing collector.