- Number of slides: 100
Introduction to Memory Management & Garbage Collection, or How to Live with Memory Allocation Problems. OOPSLA 2000 Tutorial no. 70, Tuesday 17 October 2000. Richard Jones, Computing Laboratory, University of Kent at Canterbury. Eric Jul, DIKU, Department of Computer Science, University of Copenhagen. ©Richard Jones, Eric Jul, 2000. All rights reserved. © Richard Jones, Eric Jul, 1999 -2000 OOPSLA 2000 Tutorial: Garbage Collection 1
Overview: • Dangers of explicit deallocation … • Introduction to automatic memory management • Reference counting • BREAK • Tracing collectors • Automatic memory management in C++ • Finalisation • Advanced techniques: generational and incremental GC
Part 1: Introduction. Who are we? Who are you?
So what about you? Programming languages? • C, C++, Java, Smalltalk, Modula-3, CL • malloc/free; new/delete; GC. Garbage collection knowledge? • Little or no knowledge of the area • Some knowledge or experience. Current employment? • Industry, or • academia?
What is the problem? Memory is a scarce and precious resource. Some applications can manage with a bounded amount of memory using static allocation combined with stack allocation. Others use dynamic allocation of memory because: • Some objects live longer than the method that creates them. • Recursive data structures such as lists and trees need it. • It avoids fixed hard limits on data structure sizes. If we had unbounded amounts of memory, we’d never worry. The PROBLEM is that we don’t have unbounded memory.
What can we do? Dynamically allocate and deallocate memory, and REUSE deallocated memory. Dynamic memory allocation is available in many languages, e.g., through language features such as: • new: allocates a new object • delete X: deallocates the object X. Such features allow programmers to handle allocation themselves. Objects that are no longer needed are called garbage.
Garbage Collection: Identifying garbage and deallocating the memory it occupies is called garbage collection. We can try to handle the housekeeping chores of object allocation and deallocation ourselves. Such housekeeping can be simple, but for many applications the chores become complex and error-prone. Can we do it ourselves, or should it be automatic?
Why AUTOMATIC Garbage Collection? Because human programmers just can’t get it right. Either too little is collected, leading to memory leaks, or too much is collected, leading to broken programs.
Space leaks? Human programmers can: • Forget to delete an object when it is no longer needed. • Return a newly allocated object, but when will it be deallocated? • Fail to figure out when a shared object should be deleted. Sharing is a significant problem. It can be handled using the principle “last one to leave the room turns off the light”. However, this is easily forgotten and, worse, in a large building it can be close to impossible to detect that you are the last!
Dangling Pointers: Eager human programmers can delete objects too early, leading to dangling pointers. Consider an object that is shared between two different parts of a program, each having its own pointer to the object. If one part deletes the object through its pointer, the other pointer is left pointing to a non-existent object; we say that it is a dangling pointer. (My wife, who is effective at throwing things out, introduced me to the concept of dangling pointers quite early in our marriage.)
Sharing is a Real Problem: Memory leaks and dangling pointers are two sides of the same coin: the difficulty is managing objects in the presence of sharing. Example: consider the principle “last one to leave the room turns off the light”. However, this is easily forgotten and, worse, in a large building it can be close to impossible to detect that you are the last to leave!
Abstraction & Modularity: Besides the practical problems of explicit memory management, we also believe that explicit management conflicts with the software engineering principles of abstraction and modularity. THEREFORE this tutorial is about building automatic garbage collectors, BUT ALSO about how to survive without automatic GC.
More “why”: Don’t be hard on yourself • Don’t reinvent the wheel • Garbage collectors honed by time and much usage can offer better performance than custom memory managers. Caveat: • It’s not a silver bullet • Some memory management problems cannot be solved using automatic GC, e.g. if you forget to drop references to objects that you no longer need. • Some environments are inimical to garbage collection – embedded systems with limited memory – hard real-time systems
What to look for: In this tutorial • we’ll review how objects are allocated, including the dangers and problems related to explicit allocation • we’ll show how to do it yourself when you must live with explicit allocation and deallocation • we’ll show how automatic garbage collectors can do it for you by reviewing the basic algorithms • we’ll show how you can help the collector to work efficiently • we’ll tell you about performance issues of GC
Part 2: Object Allocation. In the following, we review how objects are allocated: • Object & Machine Model • Explicit Allocation • Dangers of Explicit Allocation
Object Model: In this tutorial, we assume that we are using a system that supports some kind of objects. Objects are represented in memory by some number of bits stored contiguously in memory. They consist of a header (which is not visible to the user program, often called the mutator) and zero or more fields. Objects can contain references to other objects. A reference to an object is usually implemented simply as the memory address of the piece of memory where the object is stored. For simplicity, we assume that there is only one thread of execution.
Machine Model: Our object system runs on some machine consisting of: • A stack, which implements the executing thread of control • A part of memory where global variables are stored • A number of registers that contain addresses of various parts of memory (or arithmetic data) • A heap, which is a part of memory that is split into many pieces, each of which either contains the representation of an object or is unused (in which case we say it is free).
Allocation means finding a free piece of memory in the heap and reserving it for the representation of an object. Deallocation means changing the status of a piece of memory from allocated to free. Liveness: An object is live as long as it is still reachable from some part of the program’s computation.
Objects in Memory: Objects are typically allocated in the heap. An Object Reference is a pointer to an object (typically merely the heap address of the start of the object). Variables contain object references (ignoring primitive data). Each object can contain a number of variables and thereby reference other objects.
Static & dynamic allocation: • Static allocation: allocation takes place when a program starts; the memory layout is basically fixed by the compiler. • Dynamic allocation: new objects are allocated while the program is executing. A simple form of dynamic allocation is stack allocation, where objects are allocated on the program stack and deallocated using a stack discipline. Heap allocation is the most general form of allocation: objects are allocated in the heap.
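The three forms can be set side by side in a minimal C++ sketch. The names `counter` and `make_on_heap` are illustrative only.

```cpp
#include <cassert>

static int counter = 0;        // static allocation: exists for the whole run of the program

int *make_on_heap(int v) {
    int local = v;             // stack allocation: released when the function returns
    return new int(local);     // heap allocation: lives until it is explicitly deleted
}
```

The heap object outlives the call that created it. This is exactly why the caller, not `make_on_heap`, must eventually `delete` it.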
Dangers of Explicit Deallocation: With explicit deallocation the programmer ends up: Doing too little • Garbage objects are not deallocated and slowly but surely clutter memory, so the program runs out of memory (such a failure to delete garbage objects is called a memory leak). Doing too much • Throwing away a non-garbage object. Subsequent use of a remaining reference to the object will cause the program to fail in inexplicable ways. Such a reference is a dangling reference. • Throwing away a garbage object twice! This is likely to break the memory manager.
The REAL bad thing about explicit deallocation: The problems of • memory leaks • dangling references • double deallocation are real and omnipresent in explicit deallocation systems, and they cause the real problem: wasting huge amounts of debugging time! Despite this, programs may still fail in mysterious ways long after being put into production. Finding and fixing MM bugs can account for 40% of debug time.
Part 3: Living without GC: Doing it yourself. How do we manage without GC? In the following, we take a look at techniques for doing it yourself or with some help from tools: • Defensive programming • Pairing Principle • The Ownership concept • Monitoring technique • Administrator technique • Tools • Living without GC in C++
Defensive programming: Defensive strategies for doing without GC. Avoid sharing: • Copy objects instead of sharing them. • This transforms a global deallocation decision into a local one. It can be wasteful of space, but can be useful. Example: everyone gets their own set of lights, and turns their own lights off when leaving. (Simple, but obviously wasteful.)
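Here is a minimal sketch of copying instead of sharing. The `Config` and `Client` classes are hypothetical. Each client holds a private copy, so each can delete its copy as a purely local decision.

```cpp
#include <cassert>
#include <string>

// Hypothetical object that several clients need.
struct Config {
    std::string name;
};

// Each client owns a private copy instead of sharing one Config.
class Client {
    Config *cfg;
public:
    explicit Client(const Config &c) : cfg(new Config(c)) {}  // copy, don't share
    ~Client() { delete cfg; }   // local decision: nobody else points at *cfg
    Client(const Client &) = delete;             // keep the sketch simple:
    Client &operator=(const Client &) = delete;  // clients themselves are not copied
    const std::string &name() const { return cfg->name; }
    void rename(const std::string &n) { cfg->name = n; }
};
```

The cost is one `Config` per client. In return, no client ever has to ask whether it is the last one using the object.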
Pairing Principle: For each new(), pair it with a delete(). Make sure there is a one-to-one correspondence: for every new(), check that the corresponding delete() is there. One way is to have the allocating object also be the deallocating object. Example: if you turned the light on, YOU turn it off.
Pairing Principle Example: Allocate in the constructor; deallocate in the destructor.
```cpp
class A {
    Xclass *x;
public:
    A() { x = new Xclass; }   // allocate in the constructor
    ~A() { delete x; }        // deallocate in the destructor
    // . . .
};
```
Ownership Concept: Observation: objects are often passed around, so it is often an object other than the creator that must do the deallocation. Ownership concept: • Initially, the allocating object is the owner of the newly allocated object. • When a reference to the allocated object is passed on, the ownership can be passed with it. (The previous owner should throw away its reference, as it may become dangling very soon!) • Only the owner is allowed to deallocate the object; the last owner does the deallocation. • Each owner either passes on the ownership rights or deallocates.
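Modern C++ (C++11 and later, so after this tutorial) expresses the ownership concept directly in `std::unique_ptr`. Moving the pointer passes ownership, and the previous owner is left holding null rather than a soon-to-dangle reference. In this sketch, `Document`, `make_document` and `Archive` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

struct Document { int id; };

// The creator is the initial owner.
std::unique_ptr<Document> make_document(int id) {
    return std::unique_ptr<Document>(new Document{id});
}

// The archive becomes the owner of every document handed to it;
// the documents are deleted when the archive is destroyed.
class Archive {
    std::vector<std::unique_ptr<Document>> docs;
public:
    void take(std::unique_ptr<Document> d) { docs.push_back(std::move(d)); }
    std::size_t size() const { return docs.size(); }
};
```

Usage: `archive.take(std::move(d));` transfers ownership, after which `d` is null. The compiler rejects an accidental copy.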
Monitoring Technique: Monitoring tool. A simple mechanism to help find bugs is to maintain a table of allocated objects. malloc() is replaced by a version that stores the address of each newly allocated object in a table. free() is replaced by a version that checks the table before freeing. Such monitoring can help find bugs: • memory leaks (the table will fill with a particular type of object) • dangling references (the table can be checked to see if a reference is valid before using it) • double deallocations (free will protest if a non-allocated object is free’d).
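Here is a minimal sketch of such a monitor. It assumes hypothetical wrapper names `monitored_malloc` and `monitored_free`, rather than literally replacing the C library’s `malloc`/`free`, and uses a `std::unordered_set` as the table.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <unordered_set>

// Table of currently allocated blocks.
static std::unordered_set<void *> allocated;

void *monitored_malloc(std::size_t size) {
    void *p = std::malloc(size);
    if (p != nullptr)
        allocated.insert(p);  // remember every live block
    return p;
}

// Protests, and does not free, if p was never allocated
// or has already been freed (a double deallocation).
bool monitored_free(void *p) {
    if (allocated.erase(p) == 0) {
        std::fprintf(stderr, "free of unallocated block %p\n", p);
        return false;
    }
    std::free(p);
    return true;
}

// Dangling-reference check: is p still an allocated block?
bool is_allocated(void *p) { return allocated.count(p) != 0; }

// Blocks still in the table at the end of a run are candidate leaks.
std::size_t outstanding_blocks() { return allocated.size(); }
```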
Multiple Owners: Shared Objects. Handling shared objects using reference counting: We could attempt to handle shared objects by keeping track of multiple owners, e.g., by extending the monitoring table with a count field. For every new owner, we must increment the count, and decrement it every time an owner is done with the object. When the last owner is done (the count goes to zero), the object is deallocated. This requires extra code for deallocation, copying of references, etc.
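Extending the table with a count field might look like the following sketch. `shared_alloc`, `add_owner`, `drop_owner` and `owner_count` are hypothetical names. Note that every hand-over of a reference must be paired with `add_owner` by hand, which is exactly the extra code mentioned above.

```cpp
#include <cassert>
#include <cstdlib>
#include <unordered_map>

// Table of allocated blocks, each with its number of owners.
static std::unordered_map<void *, long> owners;

void *shared_alloc(std::size_t size) {
    void *p = std::malloc(size);
    if (p != nullptr)
        owners[p] = 1;  // the allocator is the first owner
    return p;
}

// Call whenever a reference is handed to an additional owner.
void add_owner(void *p) { ++owners.at(p); }

// Call when an owner is done; the last owner's call frees the block.
// Returns true if the block was freed.
bool drop_owner(void *p) {
    auto it = owners.find(p);
    if (it == owners.end())
        return false;            // unknown or already freed block
    if (--it->second == 0) {
        owners.erase(it);
        std::free(p);
        return true;
    }
    return false;
}

long owner_count(void *p) {
    auto it = owners.find(p);
    return it == owners.end() ? 0 : it->second;
}
```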
Advanced Monitors: By adding more and more functionality to the monitoring of allocation of objects, we end up with a large, and potentially unwieldy, set of routines. At some point, the thought occurs that such routines should be written once and for all and then be presented as a tool. Indeed such tools are available; let’s take a look at them.
Tools: A range of tools is available to help track down memory problems: • Purify • C++ Expert • Insure++ • Electric Fence • Bounds Checker • Great Circle
What they do: Purify modifies object code to track loads and stores (hence it doesn’t need source code) to detect • uninitialised data • use of freed memory • freeing mismatched memory • memory leaks (including file descriptors) • stack overflow. It reports by • stack trace and source line. Makefile usage:
```make
program:
	purify $(CC) $(CFLAGS) -o program $(OBJS) $(LIBS)
```
What they don’t do: Tools tackle the symptom rather than the disease. They don’t • simplify interfaces • enhance reusability • cure leaks or dangling references. GC, on the other hand, ensures that certain classes of MM error simply cannot occur.
Part 4: Automatic Memory Management. Automatic memory management, including garbage collection, handles the most significant of the problems that we have tried to solve until now. Doing it yourself is cumbersome and quite error-prone. In the following, we present automatic memory management.
What is garbage? Almost all garbage collectors assume the following definition of live objects, called liveness by reachability: if you can get to an object, then it is live. More formally, an object is live if and only if: • it is referenced in a predefined variable called a root, or • it is referenced in a variable contained in a live object (i.e. it is transitively referenced from a root). Non-live objects are called dead objects, i.e. garbage.
Graphs & Roots: The objects and references can be considered a directed graph. The live objects of the graph are those reachable from a root. The process executing a computation is called the mutator because it is viewed as dynamically changing the object graph. What are the roots of a computation? Determining roots is, in general, language-dependent. In common language implementations, roots include • words in the static area • registers • words on the execution stack that point into the heap.
Why garbage collect? Language requirement • many OO languages assume GC, e.g. allocated objects may survive much longer than the method that created them. Problem requirement • the nature of the problem may make it very hard or impossible to determine when something is garbage.
Why GC is a software engineering issue: SE = management of complexity in large-scale software systems. Tools: modularity & abstraction • Explicit MM cuts against these principles • Auto MM offers increased abstraction. Relieve programmers of book-keeping detail • Time is better spent on higher-level details of design and implementation.
Reliable code is understandable code: • We should be able to understand the behaviour of a module from the module itself, or from a few neighbouring modules • The behaviour of a module should be independent of context • One module should not cause the failure of another (e.g. through an MM error)
Composing components: Modules should be reusable in different contexts • Cohesive • Loosely coupled • Communicate with as few other modules as possible and exchange as little information as possible [Meyer] • Interfaces should be simple and well-defined
But… Liveness is a global property • An object is live if it can be used by any part of the program • This cannot (in general) be determined by inspection of a single code fragment. Adding MM book-keeping clutter to interfaces • Weakens abstractions • Reduces extensibility of modules
Stack example: [Figure] When stack A is popped, can first->data be reclaimed?
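The stack figure is not recoverable, but the question can be reproduced with a sketch using assumed names (`Data`, `Node`, `Stack`). It is a linked stack whose nodes point at data that may also be referenced from elsewhere. `pop` can safely free its own node, but nothing local tells it whether `first->data` is still in use.

```cpp
#include <cassert>

struct Data { int value; };

struct Node {
    Data *data;
    Node *next;
};

class Stack {
    Node *first = nullptr;
public:
    Stack() = default;
    Stack(const Stack &) = delete;             // copying would share nodes
    Stack &operator=(const Stack &) = delete;

    void push(Data *d) { first = new Node{d, first}; }
    Data *top() const { return first ? first->data : nullptr; }

    // Frees the node, which belongs to this stack alone, but must NOT
    // delete the data: other parts of the program may still refer to it.
    Data *pop() {
        Node *old = first;
        Data *d = old->data;
        first = old->next;
        delete old;
        return d;   // the caller must decide, with no more information than pop had
    }

    ~Stack() { while (first) pop(); }
};
```

In the usage below, the same `Data` is pushed on two stacks. Deleting it in `a.pop()` would leave stack `b` with a dangling pointer.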
Liveness: An object is only live if it can affect future computation • it must be possible to load it (or a part of it) into registers • we assume well-behaved programs that do not access random addresses in memory. What data is known (and can be manipulated)? • Global data held in static areas • Local variables, parameters and compiler temporaries that may be held on the stack or in machine registers. Hence the program may also use • any objects that can be reached by way of computations on known objects.
A conservative estimate: ‘Liveness by reachability’ provides a conservative estimate of the set of live objects. • It contains all objects that could be used by a well-behaved program. • It may contain objects that will never be used again.
```java
Thing a = someComputation();
if (a.property()) E1();
else E2();
```
A reference to a may be held on the stack, and hence be considered reachable, until E1/E2 has completed. But static analysis may reveal that a could be discarded after the conditional test. [Figure: the stack holds a, which refers to a Thing, while E2() runs]
Help the collector: Tell the GC that you are finished with an object.
```java
r = new FileReader(filename);
// use the reader …
r.close();
r = null;
```
[Figure: a root refers to MyObject, whose field r refers to the FileReader] • This is a simple, local decision. • Don’t null the reference if it is about to disappear (e.g. a local variable in a method that’s about to return). • Do dispose of components when you have finished with them if your framework (e.g. AWT) requires you to, e.g. myWindow.dispose();
Cost Metrics for GC: Execution time • total execution time • distribution of GC execution time • time to allocate a new object. Memory usage • additional memory overhead • fragmentation • virtual memory and cache performance. Delay time • length of disruptive pauses • zombie times. Other important metrics • comprehensiveness • implementation simplicity and robustness
No silver bullet: GC is often not necessary for simple programs • but beware reuse of simple code. GC may be unsuitable for hard real-time code. GC doesn’t cure the problem of data structures that grow without limit • Surprisingly common, e.g. caching • Benign in small problems, bad for large or long-running ones • Java’s References model. Abstraction may hide concrete representations • E.g. a stack as an array. [Problem is that this assumes the concept of tracing GC. Put it later?]
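The “stack as an array” remark can be made concrete. In this sketch, the `ArrayStack` class is hypothetical, and reference-counted `std::shared_ptr` slots stand in for the references a collector would trace. A `pop` that only decrements the index leaves the popped object reachable from the array, so it cannot be reclaimed until its slot is overwritten. Clearing the slot fixes this. Bounds checks are omitted for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical fixed-capacity stack implemented as an array of references.
template <typename T>
class ArrayStack {
    std::vector<std::shared_ptr<T>> slots;
    std::size_t count = 0;   // logical size; slots at or above count are unused
public:
    explicit ArrayStack(std::size_t capacity) : slots(capacity) {}

    void push(std::shared_ptr<T> x) { slots[count++] = std::move(x); }

    // Leaky pop: the unused slot still refers to the popped object,
    // so it stays reachable until the slot is overwritten.
    std::shared_ptr<T> pop_leaky() { return slots[--count]; }

    // Correct pop: clear the slot so the stack no longer keeps the object alive.
    std::shared_ptr<T> pop() {
        std::shared_ptr<T> x = slots[--count];
        slots[count].reset();
        return x;
    }
};
```

The abstraction (“it’s a stack, popped items are gone”) hides the concrete array that still holds the reference. This is the kind of leak that GC alone does not cure.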
The basic algorithms • Reference counting: Keep a note on each object in your garage, indicating the number of live references to the object. If an object’s reference count goes to zero, throw the object out (it’s dead). • Mark-Sweep: Put a note on objects you need (roots). Then recursively put a note on anything needed by a live object. Afterwards, check all objects and throw out objects without notes. • Mark-Compact: Put notes on objects you need (as above). Move anything with a note on it to the back of the garage. Burn everything at the front of the garage (it’s all dead). • Copying: Move objects you need to a new garage. Then recursively move anything needed by an object in the new garage. Afterwards, burn down the old garage (any objects in it are dead)!
Reference counting: A mechanism to share ownership. Goal: • identify when you are the only owner • so that you can make the disposal decision. Basic idea: count the number of references from live objects.
Reference counting: principle. Each object has a reference count (RC) • when a reference is copied, the referent’s RC is incremented • when a reference is deleted, the referent’s RC is decremented • an object can be reclaimed when its RC = 0 [Figure: Update(left(R), S)]
Reference counting: recursive freeing. Once an object’s RC = 0, it can be freed. But the object may contain references to further objects. Before this object is freed, the RCs of its constituents should be decremented, and any constituent whose RC drops to 0 is freed in turn.
Reference counting: implementation
```
New() {
    if free_list == nil
        abort "Memory exhausted"
    newcell = allocate()
    RC(newcell) = 1
    return newcell
}

Update(R, S) {
    RC(S) = RC(S) + 1
    delete(*R)
    *R = S
}

delete(T) {
    RC(T) = RC(T) - 1
    if RC(T) == 0 {
        for U in Children(T)
            delete(*U)
        free(T)
    }
}

free(N) {
    next(N) = free_list
    free_list = N
}
```
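The pseudocode can be turned into a small runnable sketch. The representation is our own assumption: a fixed pool of cells with two child fields each, indices instead of addresses, and index 0 reserved as nil. `update` mirrors `Update(R, S)`, and `release` mirrors `delete(T)`, including the recursive freeing of children.

```cpp
#include <cassert>
#include <cstdlib>

// A toy heap of NCELLS cells, each with a reference count, two child
// fields and a free-list link. Index 0 means nil and is never allocated.
const int NCELLS = 16;
struct Cell { int rc; int child[2]; int next_free; };
static Cell heap[NCELLS];
static int free_list = 0;

void init_heap() {
    free_list = 0;
    for (int i = NCELLS - 1; i >= 1; --i) {   // thread cells 1..NCELLS-1 onto the free list
        heap[i] = Cell{0, {0, 0}, free_list};
        free_list = i;
    }
}

int new_cell() {                               // New()
    if (free_list == 0) std::abort();          // "Memory exhausted"
    int n = free_list;
    free_list = heap[n].next_free;
    heap[n] = Cell{1, {0, 0}, 0};              // RC(newcell) = 1
    return n;
}

void free_cell(int n) {                        // free(N)
    heap[n].next_free = free_list;
    free_list = n;
}

void release(int t) {                          // delete(T)
    if (t == 0) return;                        // nil
    if (--heap[t].rc == 0) {
        for (int c : heap[t].child) release(c);  // recursive freeing
        free_cell(t);
    }
}

// Update(R, S): overwrite child field `field` of cell r with a reference to s.
void update(int r, int field, int s) {
    if (s != 0) ++heap[s].rc;                  // increment first: safe if the old value is s
    release(heap[r].child[field]);
    heap[r].child[field] = s;
}

int free_count() {
    int n = 0;
    for (int i = free_list; i != 0; i = heap[i].next_free) ++n;
    return n;
}
```

Usage, assuming the caller’s local variables count as references: build root → a → b, drop the local references to `a` and `b`, and then overwrite `root`’s field with nil. The single `update` frees both cells.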
Example [Figure: reference counts before and after an Update]
Advantages of reference counting: ✓ Simple to implement ✓ Costs distributed throughout the program ✓ Good locality of reference: only touch old and new targets’ RCs ✓ Works well because few objects are shared and many are short-lived ✓ Zombie time minimized: the zombie time is the time from when an object becomes garbage until it is collected ✓ Immediate finalisation is possible (due to near-zero zombie time)
Break (85 minutes so far)
Tracing GC: idea. We can formalise our definition of reachability (writing X → N for “X holds a reference to N”):
$$\mathit{live} = \{\, N \in \mathit{Objects} \mid (\exists r \in \mathit{Roots}.\ r \to N) \lor (\exists M \in \mathit{live}.\ M \to N) \,\}$$
We can encode this definition simply: • Start at the roots; the live set is empty • Add any object that a root points at to our live set • Repeat: add any object a live object points at to the live set, until no new live objects are found • Any objects not in the live set are garbage
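The definition can be encoded directly as a worklist computation. This sketch assumes the object graph is given as an adjacency list indexed by object number. That representation is hypothetical: a real collector traces pointer fields in the heap.

```cpp
#include <cassert>
#include <vector>

// edges[n] lists the objects that object n points at.
using Graph = std::vector<std::vector<int>>;

// Returns live[n] == true iff object n is reachable from some root.
std::vector<bool> live_set(const Graph &edges, const std::vector<int> &roots) {
    std::vector<bool> live(edges.size(), false);  // the live set starts empty
    std::vector<int> work;                         // live objects not yet examined
    for (int r : roots)                            // add objects the roots point at
        if (!live[r]) { live[r] = true; work.push_back(r); }
    while (!work.empty()) {                        // repeat until nothing new is found
        int m = work.back();
        work.pop_back();
        for (int n : edges[m])                     // add objects a live object points at
            if (!live[n]) { live[n] = true; work.push_back(n); }
    }
    return live;                                   // everything still false is garbage
}
```

For example, with edges 0→1→2 and 3→2 and a root referring to object 0, objects 0, 1 and 2 are live. Object 3 is garbage even though it points at a live object.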
Mark-Sweep: Mark-sweep is such a tracing algorithm: it works by following (tracing) references from live objects to find other live objects. Implementation of the live set: each object has a mark-bit associated with it, indicating whether it is a member of the live set. There are two phases: • Mark phase: starting from the roots, the graph is traced and the mark-bit is set in each unmarked object encountered. At the end of the mark phase, unmarked objects are garbage. • Sweep phase: starting from the bottom, the heap is swept – mark-bit not set: the object is reclaimed – mark-bit set: the mark-bit is cleared
The mark-stack: The simplest solution is to implement marking recursively: • walk a spanning tree of the object graph
```
mark(N) {
    if markBit(N) == UNMARKED {
        markBit(N) = MARKED
        for M in Children(N)
            mark(*M)
    }
}
```
A more efficient method is to use a marking stack, which also avoids exhausting the call stack on deep structures such as long lists: • Repeat until the mark stack is empty: pop the top item; if it is unmarked, mark it; if it is a branch point in the graph, push any unmarked children onto the stack.
```
New() {
    if free_pool.empty {
        markHeap()
        sweep()
    }
    newobj = allocate()
    return newobj
}

markHeap() {
    markStack = empty
    for R in Roots {
        markBit(R) = MARKED
        push(markStack, R)
    }
    mark()
}

mark() {
    while markStack not empty {
        N = pop(markStack)
        for M in Children(N)
            if markBit(*M) == UNMARKED {
                markBit(*M) = MARKED
                if not atom(*M)
                    push(markStack, *M)
            }
    }
}

sweep() {
    N = Heap_bottom
    while N < Heap_top {
        if markBit(N) == UNMARKED
            free(N)
        else
            markBit(N) = UNMARKED
        N = N + size(N)
    }
}
```
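Here is a runnable sketch of the same mark-sweep scheme, under our own assumptions. Objects live in a fixed array, each with a mark bit, an in-use flag and up to two child indices (-1 meaning nil). “Freeing” an object puts it on a free list. Marking uses an explicit `std::vector` as the mark stack rather than recursion.

```cpp
#include <cassert>
#include <vector>

struct Obj {
    bool marked = false;
    bool in_use = false;
    int child[2] = {-1, -1};  // -1 means nil
};

struct Heap {
    std::vector<Obj> objs;
    std::vector<int> free_list;
    std::vector<int> roots;

    explicit Heap(int n) : objs(n) {
        for (int i = n - 1; i >= 0; --i) free_list.push_back(i);
    }

    void mark() {
        std::vector<int> mark_stack;
        for (int r : roots)
            if (r >= 0 && !objs[r].marked) { objs[r].marked = true; mark_stack.push_back(r); }
        while (!mark_stack.empty()) {
            int n = mark_stack.back();
            mark_stack.pop_back();
            for (int m : objs[n].child)
                if (m >= 0 && !objs[m].marked) {
                    objs[m].marked = true;   // mark before pushing: each object is pushed once
                    mark_stack.push_back(m);
                }
        }
    }

    void sweep() {                           // linear scan from the bottom of the heap
        for (int n = 0; n < (int)objs.size(); ++n) {
            if (!objs[n].in_use) continue;   // already free
            if (objs[n].marked) objs[n].marked = false;          // live: clear the bit
            else { objs[n] = Obj(); free_list.push_back(n); }    // garbage: reclaim
        }
    }

    // New(): collect if the free pool is empty; returns -1 if still full.
    int allocate() {
        if (free_list.empty()) { mark(); sweep(); }
        if (free_list.empty()) return -1;
        int n = free_list.back();
        free_list.pop_back();
        objs[n] = Obj();
        objs[n].in_use = true;
        return n;
    }
};
```

Objects without children play the role of the pseudocode’s atoms: they are marked but contribute nothing further to the mark stack.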
Marking exercise [Figure: object graph with nodes A to I]
Fragmentation: the inability to use available memory • External: allocated memory is scattered into blocks; free blocks cannot be coalesced • Internal: the memory manager allocated more space than actually required; common causes are headers and rounding sizes up. Fragmentation is a problem for explicit memory managers as well; free() is often not free.
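Here is a small worked example of external fragmentation, using hypothetical numbers. A heap whose free memory is split into three non-adjacent 16-byte blocks has 48 bytes free, yet a 32-byte request still fails, because a request must be met by a single contiguous block.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Block { std::size_t start, size; };

std::size_t total_free(const std::vector<Block> &free_blocks) {
    std::size_t sum = 0;
    for (const Block &b : free_blocks) sum += b.size;
    return sum;
}

// A request can only be satisfied by one contiguous free block.
bool can_allocate(const std::vector<Block> &free_blocks, std::size_t request) {
    return std::any_of(free_blocks.begin(), free_blocks.end(),
                       [request](const Block &b) { return b.size >= request; });
}
```

The free blocks here are separated by live objects (at offsets 16, 48 and so on), so they cannot be coalesced without moving those objects. Compacting and copying collectors are designed to do exactly that.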
Copying: GC is a set-partitioning problem. 1. A mark-bit is one way of defining two sets. 2. Mark-compact physically moves members of the live set to a different part of the heap • the free pointer marks the dividing line between live data and memory that can be overwritten. 3. Copying collection is a simpler solution: it picks out live objects and copies them to a ‘fresh’ heap.
Copying GC Example [Figure: final state; scan = free, so collection is complete]
```
flip() {
    Fromspace, Tospace = Tospace, Fromspace
    top_of_space = Tospace + space_size
    scan = free = Tospace
    for R in Roots { R = copy(R) }
    while scan < free {
        for P in Children(scan) { *P = copy(*P) }
        scan = scan + size(scan)
    }
}

copy(P) {
    if forwarded(P) { return forwarding_address(P) }
    else {
        addr = free
        move(P, free)
        free = free + size(P)
        forwarding_address(P) = addr
        return addr
    }
}
```
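Here is a runnable sketch of `flip` and `copy` in the style of Cheney’s algorithm. It assumes a toy heap of fixed-size cells, each with two pointer fields stored as indices into the current semispace (-1 as nil). Forwarding addresses are kept in a separate `forward` array, not written into the old object as in the pseudocode. Spaces are also swapped at the end rather than the start. Because `scan` chases `free_ptr` through Tospace, objects are copied in breadth-first order.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Toy object: two pointer fields (indices into the current space, -1 = nil)
// and one non-pointer value.
struct Cell { int field[2]; int value; };

struct SemiSpaceHeap {
    std::vector<Cell> from, to;  // Fromspace and Tospace
    std::vector<int> forward;    // forwarding address of each Fromspace cell, or -1
    std::vector<int> roots;
    int free_ptr = 0;            // next free cell in the current space

    explicit SemiSpaceHeap(int n) : from(n), to(n), forward(n, -1) {}

    // Bump allocation; assumes free_ptr < capacity (no collection here).
    int allocate(int value) {
        from[free_ptr] = Cell{{-1, -1}, value};
        return free_ptr++;
    }

    int copy(int p) {
        if (p < 0) return p;                      // nil
        if (forward[p] >= 0) return forward[p];   // already copied: preserves sharing
        int addr = free_ptr++;
        to[addr] = from[p];                       // move(P, free)
        forward[p] = addr;                        // leave a forwarding address
        return addr;
    }

    void flip() {
        free_ptr = 0;
        int scan = 0;
        std::fill(forward.begin(), forward.end(), -1);
        for (int &r : roots) r = copy(r);         // copy the objects the roots refer to
        while (scan < free_ptr) {                 // scan == free: collection complete
            for (int &f : to[scan].field) f = copy(f);
            ++scan;
        }
        std::swap(from, to);                      // Tospace becomes the current space
    }
};
```

In the usage below, object `a` holds two pointers to `b`, and an unreachable cell sits between them. After `flip`, only two cells survive, and both of `a`’s fields point at the single copy of `b`.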
The sharing problem
Forwarding addresses preserve sharing
Copying GC Example [Animated figure: copy the root A, leaving a forwarding address; scan A', B', C', D' and E', F' and G' in turn, copying B and C, D and E, F and G and leaving forwarding addresses; where a pointer refers to an object already copied, use its forwarding address (e.g. A's) and update the pointer; when scan = free there is nothing left to do, so collection is complete]
Disadvantages of copying GC: ✗ Stop-and-copy may be disruptive; it degrades with residency ✗ Requires twice the address space of other simple collectors • touch twice as many pages • trade-off against fragmentation ✗ Cost of copying large objects; long-lived data may be repeatedly copied ✗ All references must be updated; moving objects may break mutator invariants ✗ Breadth-first copying may disturb locality patterns
Complexity: caveat emptor. Claim: “Copying is always better than mark-sweep GC”. The collectors we’ve seen so far are very simple-minded. Let us compare their basic performance… • Copying is more expensive than setting a bit • Efficient implementations of mark-sweep are dominated by the cost of the mark phase: linear scanning is less expensive than tracing, and the cost of the sweep can be reduced further. Simple asymptotic complexity analyses are misleading.
Part 5: Memory management in C++. C++ does not provide automatic memory management. Techniques: • RC with smart pointers • Conservative GC using mark-sweep • Finalisation
Cleanup in destructors: Important in destructors in C++: delete all objects in pointer members.
```cpp
class Xclass {
    Yclass *p, *q;
    // . . .
public:
    Xclass() {
        p = new Yclass();
        q = new Yclass();
    }
    ~Xclass() {
        delete p;   // every object allocated for a pointer member
        delete q;   // is deleted in the destructor
    }
};
```
“Must do” for pointer members in C++: For all pointer members check: • Initialisation in each constructor • Deletion in assignment operator • Deletion in destructor • Does the copy constructor create shared objects? • Is creation paired with deletion?
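Here is a sketch of a class that passes every check in the list. `Buffer` and its `int` payload are hypothetical. The copy constructor makes a deep copy rather than sharing. The assignment operator allocates the new copy before deleting the old object, so self-assignment is safe.

```cpp
#include <cassert>

class Buffer {
    int *data;                                    // owned pointer member
public:
    explicit Buffer(int v) : data(new int(v)) {}  // initialised in each constructor
    Buffer(const Buffer &other)                   // copy constructor: no shared objects
        : data(new int(*other.data)) {}
    Buffer &operator=(const Buffer &other) {
        int *fresh = new int(*other.data);        // allocate the copy first...
        delete data;                              // ...then delete the old object
        data = fresh;
        return *this;
    }
    ~Buffer() { delete data; }                    // deletion in destructor
    int get() const { return *data; }
    void set(int v) { *data = v; }
};
```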
General advice for C++: In general: • Exploit the concept of ownership • Check for allocation failure: new (std::nothrow), and new in pre-standard compilers, returns null, whereas standard new throws std::bad_alloc • Adhere to convention, e.g., write delete if you write new • Consider passing and returning objects by value • Writing a function that returns a dereferenced pointer is a memory leak waiting to happen!
The Smart Pointer Concept: Basic idea: allow the programmer to write code that is executed every time a pointer is manipulated: • Creation • Assignment • Copy construction. Smart pointers are a powerful language idiom that can be used for many purposes, including garbage collection. The point: smart pointers can be thought of as adding a level of indirection. Instead of holding a reference to an object, you hold a smart pointer object, which executes some code every time you use the original reference. The smart pointer object contains a reference to the real object in question.
RC with smart pointers: A common C++ technique. The basic idea is that the smart pointer object maintains a reference count together with the object reference.
```cpp
template<typename T> class shared_ptr {
    T *ptr;
    long *rc;   // count shared by all smart pointers to *ptr
public:
    explicit shared_ptr(T* p) : ptr(p), rc(new long(1)) {}
    shared_ptr(const shared_ptr& r) : ptr(r.ptr), rc(r.rc) { ++*rc; }
    ~shared_ptr() {
        if (--*rc == 0) { delete ptr; delete rc; }  // last reference to the object
    }
    // (continued on the next slide)
```
Using Smart Pointers for RC
More of the smart pointer RC implementation:
    T& operator*() const { return *ptr; }
    T* operator->() const { return ptr; }
    shared_ptr& operator=(const shared_ptr& r) {
        ++*r.rc;                                    // increment the count for r first
        if (--*rc == 0) { delete ptr; delete rc; }  // last reference to the old object
        ptr = r.ptr; rc = r.rc;
        return *this;
    }
};
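Putting the two slides together gives a self-contained sketch of reference counting with a smart pointer. It is named rc_ptr here (a hypothetical name, to avoid confusion with std::shared_ptr), is not thread-safe, and, like any pure RC scheme, cannot reclaim cycles.

```cpp
#include <cassert>

// Minimal reference-counting smart pointer (hypothetical name rc_ptr); not thread-safe.
template <typename T>
class rc_ptr {
 public:
  explicit rc_ptr(T* p) : ptr_(p), rc_(new long(1)) {}
  rc_ptr(const rc_ptr& r) : ptr_(r.ptr_), rc_(r.rc_) { ++*rc_; }  // one more reference
  rc_ptr& operator=(const rc_ptr& r) {
    ++*r.rc_;   // increment first, so self-assignment cannot free the object
    release();  // drop our reference to the old object
    ptr_ = r.ptr_;
    rc_ = r.rc_;
    return *this;
  }
  ~rc_ptr() { release(); }
  T& operator*() const { return *ptr_; }
  T* operator->() const { return ptr_; }
  long use_count() const { return *rc_; }

 private:
  void release() {
    if (--*rc_ == 0) {  // last reference: reclaim the object and its count
      delete ptr_;
      delete rc_;
    }
  }
  T* ptr_;
  long* rc_;  // shared by every rc_ptr that refers to *ptr_
};
```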
Smart Pointers comments
Smart pointers are ingenious, but their actual implementation is quite gory. However, they work and can be used for many purposes, including RC garbage collection.
ADVICE: start by using a publicly available implementation such as the one given in the notes.
Conservative collectors
‘Uncooperative’ compilers (e.g., for C++) present the GC implementer with many challenges:
• For compatibility with existing compilers and libraries, we cannot alter the layout of data in the heap.
• We cannot add headers to objects.
Further constraints:
• values of words cannot be changed unless it is safe to do so
• compiler optimisations may compromise reachability invariants
• the memory manager must be library-safe
Finding live objects
Conservative collectors try hard to find garbage but, if in doubt, they
• are conservative
• declare questionable objects live.
Conservative collectors have little or no knowledge of
• where roots are to be found
• stack frame layout
• which words are pointers and which are not
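The "if in doubt, declare it live" rule can be sketched for the root-scanning step. This assumes a hypothetical toy heap of equally sized cells in one contiguous block; any root word whose value falls inside an allocated cell keeps that cell live. Accepting interior pointers is an assumption of this sketch, and a real collector such as BDW applies further validity tests and then traces transitively from the cells found.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical toy heap: kCells equally sized cells in one contiguous block.
struct ToyHeap {
  static constexpr std::size_t kCells = 16;
  static constexpr std::size_t kCellSize = 32;
  unsigned char mem[kCells * kCellSize];
  bool allocated[kCells] = {};

  std::uintptr_t lo() const { return reinterpret_cast<std::uintptr_t>(mem); }
  std::uintptr_t hi() const { return lo() + sizeof mem; }
  std::uintptr_t cell_addr(std::size_t i) const { return lo() + i * kCellSize; }
};

// Conservative root scan: the collector cannot tell pointers from integers,
// so any word that might point into an allocated cell keeps that cell live.
std::set<std::size_t> find_live_cells(const ToyHeap& h,
                                      const std::vector<std::uintptr_t>& roots) {
  std::set<std::size_t> live;
  for (std::uintptr_t w : roots) {
    if (w < h.lo() || w >= h.hi()) continue;           // not a heap address
    std::size_t i = (w - h.lo()) / ToyHeap::kCellSize;  // cell the word falls in
    if (h.allocated[i]) live.insert(i);                 // in doubt: declare it live
  }
  return live;
}
```

An integer that merely happens to equal a heap address would retain its cell in exactly the same way; that is the false-pointer leak discussed under "Problems for conservative GC".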
BDW collector: C interface
It’s this simple to use. To make a C program a garbage-collected program, just add:
#define malloc(sz) GC_malloc(sz)
#define realloc(p, sz) GC_realloc(p, sz)
#define free(p)
(the last definition turns every call to free into a no-op). To improve performance, allocate pointer-free data with
GC_malloc_atomic(sz)
GC_realloc_atomic(p, sz)
to tell the collector not to trace these objects.
BDW C++ interface
Objects derived from class gc are collectable:
class A : public gc { ... };
A* a = new A;   // a is collectable.
To collect objects of non-class types:
typedef int A[10];
int* a = new (GC) A;   // a points to a collectable array of 10 ints
Objects allocated with the built-in ::operator new are uncollectable.
Both uncollectable and collectable objects can be explicitly deleted:
• delete invokes an object's destructor and frees its storage immediately.
Performance
GC may add a little overhead to a program, but this must be weighed against the potentially large gains in reliability and programmer productivity. Modern collectors are fast, have short pause times, and are being used increasingly, e.g., in Java implementations.
Execution time (chart not reproduced)
Maximum memory usage (chart not reproduced)
Note: with the Ultrix allocator, gawk and cfrac used only 79 kb and 64 kb respectively.
Some caveats about these figures
No attempt was made to optimise for GC:
• no account was taken of atomic objects
• the tests predate ‘blacklisting’ leak-prevention technology
Programming style was ignored:
• GC'ed programs still supported malloc/free invariants
• pointers were not nulled to allow GC to release memory
• unnecessary copying may have happened
At best, these surveys provide an upper bound on the cost of this conservative collector.
Problems for conservative GC
Wrongly identifying a bit-pattern as a pointer causes a leak:
• Classes of data such as compressed bitmaps are prone to false pointers.
• If a value points into the heap but fails the validity tests, blacklist that address: do not use it for allocation.
Disguised pointers:
• Some programming practices may hide pointers, although most such practices are not ANSI compliant.
• Optimising compilers can destroy the last reference to a live object but reinstate it later.
In practice:
• Leaks are relatively uncommon.
• Defensive programming techniques prevent common scenarios.
• Optimising compilers don’t destroy all references.
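Blacklisting can be sketched with a hypothetical page-granularity heap whose addresses are plain integers starting at a made-up base. During root scanning, a word that points into a free page fails the validity test, so that page is blacklisted; the allocator then never hands it out, and the false pointer can never retain anything placed there. This is a simplified model, not the BDW collector's actual data structures.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>

// Hypothetical toy model: kPages pages starting at the made-up address kBase.
class BlacklistingHeap {
 public:
  static constexpr std::uintptr_t kBase = 0x100000;
  static constexpr std::uintptr_t kPageSize = 4096;
  static constexpr std::size_t kPages = 8;

  // Called for each word found while scanning roots.
  void note_candidate(std::uintptr_t w) {
    if (w < kBase || w >= kBase + kPages * kPageSize) return;  // not into the heap
    std::size_t page = (w - kBase) / kPageSize;
    if (in_use_.count(page) == 0) blacklist_.insert(page);  // fails validity test
  }

  // Hands out a free page that no false pointer refers to; returns 0 if none is left.
  std::uintptr_t alloc_page() {
    for (std::size_t p = 0; p < kPages; ++p) {
      if (in_use_.count(p) == 0 && blacklist_.count(p) == 0) {
        in_use_.insert(p);
        return kBase + p * kPageSize;
      }
    }
    return 0;
  }

 private:
  std::set<std::size_t> in_use_, blacklist_;
};
```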
Defensive programming
Programmers can help conservative collectors avoid leaks caused by pointer misidentification.
1. Tell the collector that an object never contains pointers:
Image im = (Image) GC_malloc_atomic(sz);
2. Use ‘cons’-style lists rather than storing spine pointers in the content objects.
Finalisation
Finalisers are methods called when an object dies:
• explicitly, when it is deleted
• implicitly, by the collector
Finalisation is commonly used to release scarce resources (e.g., to close files). Correspondingly, initialisation is where such resources are allocated.
In non-GC'ed languages, most finalisation is to reclaim memory:
• with GC, this action is unnecessary.
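In C++ without a collector, a destructor plays the finaliser's role: it runs explicitly on delete, or at the end of the enclosing scope. A minimal sketch, assuming a hypothetical File wrapper around a temporary C stream, where acquisition happens in the constructor and release in the destructor; the static counter exists only to make the release observable.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical wrapper: the constructor acquires a file, the destructor (finaliser) closes it.
class File {
 public:
  File() : f_(std::tmpfile()) {
    if (f_ != nullptr) ++open_count;
  }
  ~File() {
    if (f_ != nullptr) {
      std::fclose(f_);  // release the scarce resource
      --open_count;
    }
  }
  File(const File&) = delete;             // one owner per file handle
  File& operator=(const File&) = delete;
  bool ok() const { return f_ != nullptr; }

  static int open_count;  // number of File objects currently holding an open stream

 private:
  std::FILE* f_;
};
int File::open_count = 0;
```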
Part 6: Advanced collectors
Simple tracing collectors suffer from several drawbacks:
• disruptive delays
• repeated work on long-lived objects
• poor spatial locality
We now outline the approaches taken by sophisticated garbage collectors.
Generational GC
Weak generational hypothesis: “Most objects die young”.
It is common for 80-95% of objects to die before a further megabyte has been allocated:
• 95% of objects are ‘short-lived’ in many Java programs
• 50-90% of CL and 75-95% of Haskell objects die before they are 10 kb old
• SML/NJ reclaims 98% of any generation at each collection
• Only 1% of Cedar objects survived beyond 721 kb of allocation
Not a universal panacea
Generational GC is a successful strategy for many, but not all, programs. There are common examples of programs that do not obey the weak generational hypothesis: it is common for programs to retain most objects for a long time and then release them all at the same time.
Generational GC imposes a cost on the mutator:
• pointer writes become more expensive.
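The extra cost on pointer writes comes from a write barrier. A minimal sketch, assuming a hypothetical two-generation object model and a remembered set of old objects that may point into the young generation; a minor collection can then treat those objects as extra roots rather than scanning the whole old generation. Real collectors often use card marking or sequential store buffers instead of a hash set.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical object model with two generations.
struct Obj {
  bool old = false;          // true once promoted to the old generation
  std::vector<Obj*> fields;  // pointer slots
};

// Remembered set: old objects that may hold pointers into the young generation.
std::unordered_set<Obj*> remembered;

// Every pointer store goes through this barrier; the extra check is the mutator's cost.
void write_field(Obj* src, std::size_t i, Obj* target) {
  src->fields[i] = target;
  if (src->old && target != nullptr && !target->old) remembered.insert(src);
}
```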
Incremental garbage collection
• runs the collector in parallel with the mutator
• attempts to bound pause time
• many soft real-time solutions
• but no general hard real-time solutions yet
Making GC incremental
Sequential GC can be made incremental by interleaving collection with allocation: at each allocation, do a small amount of GC work. Tune the rate of collection to the rate of allocation to prevent the mutator from running out of memory before collection is complete.
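Interleaving marking with allocation can be sketched as follows. The class IncrementalMarker and its work_per_alloc parameter are hypothetical: the allocator calls on_allocate(), which processes at most that many grey objects. Raising work_per_alloc is the tuning knob that makes marking finish before memory runs out. This sketch ignores mutator pointer writes between steps, which is safe only if the mutator changes no pointers while marking is in progress.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Node {
  bool marked = false;
  std::vector<Node*> kids;
};

// Hypothetical incremental marker: a grey worklist drained a few objects per allocation.
class IncrementalMarker {
 public:
  explicit IncrementalMarker(std::size_t work_per_alloc) : k_(work_per_alloc) {}
  void start(const std::vector<Node*>& roots) {
    for (Node* r : roots) shade(r);
  }
  void on_allocate() { step(k_); }  // called by the allocator on every allocation
  bool done() const { return grey_.empty(); }

 private:
  void shade(Node* n) {
    if (n != nullptr && !n->marked) {
      n->marked = true;
      grey_.push_back(n);
    }
  }
  void step(std::size_t budget) {
    while (budget > 0 && !grey_.empty()) {
      Node* n = grey_.back();
      grey_.pop_back();
      for (Node* c : n->kids) shade(c);  // scanning n turns it black
      --budget;
    }
  }
  std::size_t k_;
  std::vector<Node*> grey_;
};
```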
Synchronisation
Asynchronous execution of mutator and collector introduces a coherency problem. For example, during the marking phase, the mutator and the collector might perform these actions:
Mutator: Update(right(B), right(A)); right(A) = nil; Update(right(A), right(B)); right(B) = nil
Collector: marks A; scans A; marks B; scans B
If the mutator copies a pointer to an unmarked object into an object the collector has already scanned, and then removes the other path to it, the collector never marks that object and would wrongly reclaim it.
Two ways to prevent disruption
There are two ways to prevent the mutator from interfering with a collection by writing white pointers into black objects.
1) Ensure the GC sees objects before the mutator does:
• when the mutator attempts to access a white object, the object is visited by the collector
• protect white objects with a read barrier
2) Record where the mutator writes pointers, so that the GC can (re)visit objects:
• protect objects with a write barrier
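The second approach can be sketched with a hypothetical toy tricolour marker and one made-up interleaving: the collector scans A, then the mutator performs Update(right(A), right(B)) and right(B) = nil, then the collector scans B. Without a barrier, C stays white although A still points to it. The barrier used here is one variant of "record where the mutator writes": a black object that receives a pointer is turned grey again, so the collector revisits it. Another common variant shades the stored target instead.

```cpp
#include <cassert>

enum class Colour { White, Grey, Black };

// Hypothetical toy object with a single pointer field, as in the slides' right(X).
struct Obj {
  Colour colour = Colour::White;
  Obj* right = nullptr;
};

void shade(Obj* o) {
  if (o != nullptr && o->colour == Colour::White) o->colour = Colour::Grey;
}
void scan(Obj* o) {
  shade(o->right);  // shade the child
  o->colour = Colour::Black;
}

// Runs one made-up interleaving and reports C's colour when marking ends.
Colour run(bool use_barrier) {
  Obj a, b, c;
  b.right = &c;  // roots are A and B; C is reachable only through B
  Obj* all[] = {&a, &b, &c};
  auto store_right = [use_barrier](Obj* src, Obj* target) {
    // Write barrier: a black object that receives a pointer goes back to grey.
    if (use_barrier && src->colour == Colour::Black) src->colour = Colour::Grey;
    src->right = target;
  };
  shade(&a); scan(&a);       // collector marks and scans A: A is black
  store_right(&a, b.right);  // mutator: Update(right(A), right(B))
  store_right(&b, nullptr);  // mutator: right(B) = nil
  shade(&b); scan(&b);       // collector marks and scans B: finds nothing
  for (bool again = true; again;) {  // collector finishes any remaining grey objects
    again = false;
    for (Obj* o : all) {
      if (o->colour == Colour::Grey) { scan(o); again = true; }
    }
  }
  return c.colour;  // White means C, still reachable from A, would be freed
}
```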
In conclusion
Points:
• Garbage collection is useful.
• You can live without it, albeit painfully.
• Automatic mechanisms for GC are better, even at a slight extra cost in execution time.
• Conservative collectors actually work.
• Classic algorithms reviewed.
• There are many advanced collectors available.
Final Remarks
Garbage collection is a relatively mature technology, but hard problems remain. Commercial deployment of collector technology is still at an early stage: there are few players, and they use a small set of solutions. There is no one magic solution to all problems: know your application!
Resources
• www.cs.ukc.ac.uk/people/staff/rej/gc.html
Copying GC Example (animated figure, not reproduced)
Steps shown: copy the root A to A', leaving a forwarding address; copy B and C, leaving forwarding addresses; scanning the copies copies D, E, F and G; scanning D' and E' finds nothing to do; where a copied object points to A, use A's forwarding address to update the pointer; when scan = free, collection is complete.
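The steps above follow Cheney's breadth-first copying scheme, which can be sketched as follows. The object layout (a payload plus two pointer fields) and the class Semispace are hypothetical. To-space is a vector with reserved capacity, so addresses of copies stay stable, and the region between the scan index and the end of the vector plays the role of the grey queue between scan and free.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical heap object: a payload and two pointer fields.
struct Obj {
  int value = 0;
  Obj* field[2] = {nullptr, nullptr};
  Obj* forward = nullptr;  // set in from-space once the object has been copied
};

// Cheney-style copying into a to-space with fixed capacity.
class Semispace {
 public:
  explicit Semispace(std::size_t capacity) { to_.reserve(capacity); }

  Obj* collect(Obj* root) {
    Obj* new_root = copy(root);
    for (std::size_t scan = 0; scan < to_.size(); ++scan)  // stop when scan == free
      for (Obj*& f : to_[scan].field) f = copy(f);          // update each pointer
    return new_root;
  }
  std::size_t live() const { return to_.size(); }

 private:
  Obj* copy(Obj* o) {
    if (o == nullptr) return nullptr;
    if (o->forward != nullptr) return o->forward;  // already copied: use forwarding address
    if (to_.size() == to_.capacity()) throw std::length_error("to-space full");
    to_.push_back(*o);  // copy to the free pointer
    Obj* n = &to_.back();
    n->forward = nullptr;
    o->forward = n;  // leave a forwarding address behind
    return n;
  }
  std::vector<Obj> to_;
};
```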