dfc0b607b61e477c40f507e133342958.ppt
- Number of slides: 47
Automatic Pool Allocation: Compile-Time Control Over Complete Pointer-Based Data Structures
Vikram Adve, University of Illinois at Urbana-Champaign
Joint work with: Chris Lattner, Dinakar Dhurjati, Sumant Kowshik
Thanks: NSF (CAREER, Embedded 02, NGS 00, NGS 99, OSC 99), Marco/DARPA
Why Does Data Layout Matter?
• Performance: working sets, spatial locality, temporal locality, heap allocation overheads
• Security: buffer overruns, dangling pointers, uninitialized pointers
• Software reliability: dangling pointers, checkpointing, static bug detection, static data race detection
… and complex heap-based data structures are ubiquitous.
Compiling Pointer-Intensive Codes Today
Current analyses and transformations focus on primitives:
- disambiguate individual loads and stores
- optimize individual loads and stores
- reorder, split, or merge individual data types
Q. Can compilers manipulate entire logical data structures? A list? A tree of linked lists? A hashtable? A graph?
[Figure: heap holding list 1 nodes, list 2 nodes, and tree nodes. Panels: what the program creates; what the compiler sees; what the compiler SHOULD create and see.]
Why Segregate Data Structures into Pools?
Programs are designed around data structures.
Direct benefit of segregation: better performance
- Smaller working sets
- Improved spatial locality
- Sometimes convert irregular strides to regular strides
Primary goal: better compiler information and control
- Compiler knows where (sets of) data structures live in memory
- Compiler knows the order of data in memory (in some cases)
Outline
• Automatic Pool Allocation [LA: PLDI 05]
• Using Pool Allocation to Improve Performance
  - Use 1: Improving heap locality, performance
  - Use 2: Transparent pointer compression [LA: MSP 05]
• Using Pool Allocation for Bug Detection, Security
  - Use 3: Detecting buffer overruns fast and transparently [DA: ICSE 06]
  - Use 4: Detecting all dangling pointer errors fast [DA: Submitted]
  - Use 5: SAFECode. . .
• SAFECode: A Safe Execution Environment for …
Automatic Pool Allocation
The transformation algorithm [Lattner and Adve, PLDI 2005] (Best Paper Award)
Pool Allocation: Current Approaches
Current manual pool allocation:
- Via library: by class (e.g., C++ STL), scope, or data structure
- Via language support: by scope or data structure
- Compiler has no information about pool properties
Automatic region inference for ML (Tofte & Birkedal, Aiken):
- Goal is memory management, not layout control, not data structure separation
- By lifetime only, e.g., stack of regions
- Limited destructive updates
Never automated before:
1. Imperative languages including C, C++, …
2. Pool allocation by logical data structures
Pool Allocation: The Key Insight
Partition heap objects according to the results of some pointer analysis. The pointer analysis representation we use is called a Data Structure Graph (DS Graph).
DS Graph Properties
Example program:
  int G;
  void twoLists() {
    list *X = makeList(10);
    list *Y = makeList(100);
    addGToList(X);
    addGToList(Y);
    freeList(X);
    freeList(Y);
  }
[Figure: DS graph for twoLists. X and Y each point to their own "list: HMRC" node (fields list* and int); G has an "int: GMRC" node.]
• Each pointer field has a single outgoing edge
• {G, H, S, U}: storage class
• Each node records an object type
• Field-sensitive for "type-safe" nodes
• These data structures have been proven disjoint
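To make the DS graph properties concrete, here is a minimal sketch of one possible in-memory node representation in C. The names `DSNode`, `ds_set_edge`, `ds_has`, and the `DS_*` flags are hypothetical, and reading M, R, C as "modified", "read", "complete" is an assumption about the slide's labels; this is not the DSA implementation in LLVM.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bits mirroring the node labels on the slide.
   G, H, S, U give the storage class; M, R, C are assumed to mean
   "modified", "read", and "complete". */
enum {
    DS_GLOBAL   = 1u << 0, /* G */
    DS_HEAP     = 1u << 1, /* H */
    DS_STACK    = 1u << 2, /* S */
    DS_UNKNOWN  = 1u << 3, /* U */
    DS_MOD      = 1u << 4, /* M */
    DS_READ     = 1u << 5, /* R */
    DS_COMPLETE = 1u << 6  /* C */
};

#define DS_MAX_FIELDS 4

/* One node per set of memory objects; `type` is the inferred object type. */
typedef struct DSNode {
    unsigned flags;
    const char *type;
    int nfields;
    struct DSNode *edge[DS_MAX_FIELDS]; /* one outgoing edge per field, NULL if none */
} DSNode;

/* Pointing a field at a node replaces any previous edge, so a field never
   has two targets. A real unification-based analysis would instead merge
   the old and new target nodes; that step is omitted here. */
static void ds_set_edge(DSNode *n, int field, DSNode *target) {
    assert(field >= 0 && field < n->nfields);
    n->edge[field] = target;
}

/* True if every bit in f is set on the node. */
static int ds_has(const DSNode *n, unsigned f) {
    return (n->flags & f) == f;
}
```

In the twoLists example, X and Y would each get their own `list` node whose `Next` field points back to that same node, which is what lets the analysis treat the two lists as disjoint.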
DS Graph for Olden MST Benchmark
Key insight: a "fully context-sensitive" points-to graph identifies data structure instances.
"Fully context-sensitive": identify objects by full acyclic call paths.
DS Graph for Olden EM3D Benchmark
DS Graph for Olden Power Benchmark
Olden-Power benchmark:
  build_tree()
    t = malloc(…);
    t->l = build_lateral(…);
  build_lateral()
    l = malloc(…);
    l->next = build_lateral(…);
    l->b = build_branch(…);
Automatic Pool Allocation Overview
• Segregate memory according to the points-to graph
• N graph nodes → 1 pool (default: 1-to-1)
• Retain explicit free() for objects
[Figure: a points-to graph of two disjoint linked lists and the pools it maps to (Pool 1 through Pool 4).]
Points-to Graph Assumptions
Specific assumptions:
- Separate points-to graph for each function
- Unification-based graph
- Can be used to compute escape info
[Figure: linked list example; head points to a "list: HMR" node with fields list* and int.]
Use any points-to analysis that satisfies the above.
Our implementation uses DSA [Lattner: Ph.D.]:
- Infers C type info for many objects
- Context-sensitive
- Field-sensitive analysis
- Results show that it is very fast: DSA + pool allocation time < 3% of GCC -O3 for all tested programs.
Pool Allocation: Example
  list *makeList(int Num, Pool *P) {
    list *New = poolalloc(P, sizeof(list));
    New->Next = Num ? makeList(Num-1, P) : 0;
    New->Data = Num;
    return New;
  }
  void twoLists(Pool *P2) {
    Pool P1; poolinit(&P1);
    list *X = makeList(10, &P1);
    list *Y = makeList(100, P2);
    GL = Y;
    addGToList(X);
    addGToList(Y);
    freeList(X, &P1);
    freeList(Y, P2);
    pooldestroy(&P1);
  }
• Change calls to free into calls to poolfree: retain explicit deallocation
• P1 is created and destroyed inside twoLists; P2 comes from the caller because Y escapes through the global GL
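The transformed code above needs a runtime that provides `poolinit`, `poolalloc`, `poolfree`, and `pooldestroy`. The following is a minimal sketch of such a runtime, assuming a simple header-per-object design built on `malloc`; the `Pool` layout and the body of `freeList` are hypothetical, not the runtime used in the paper.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical minimal pool runtime. Every object gets a small header
   linking it into its pool, so pooldestroy can release whatever is still
   live. The two-pointer header is assumed to give adequate alignment for
   the small structs used here. */
typedef struct Block { struct Block *prev, *next; } Block;
typedef struct Pool  { Block *head; size_t live; } Pool;

static void poolinit(Pool *P) { P->head = NULL; P->live = 0; }

static void *poolalloc(Pool *P, size_t size) {
    Block *b = malloc(sizeof(Block) + size);
    if (b == NULL) abort();
    b->prev = NULL;
    b->next = P->head;
    if (P->head) P->head->prev = b;
    P->head = b;
    P->live++;
    return b + 1;                      /* object data follows the header */
}

static void poolfree(Pool *P, void *obj) {
    Block *b = (Block *)obj - 1;
    assert(P->live > 0);
    if (b->prev) b->prev->next = b->next; else P->head = b->next;
    if (b->next) b->next->prev = b->prev;
    P->live--;
    free(b);
}

static void pooldestroy(Pool *P) {     /* frees every object still in the pool */
    while (P->head) {
        Block *next = P->head->next;
        free(P->head);
        P->head = next;
    }
    P->live = 0;
}

/* The transformed list code, with pool arguments added. */
typedef struct list { struct list *Next; int Data; } list;

static list *makeList(int Num, Pool *P) {
    list *New = poolalloc(P, sizeof(list));
    New->Next = Num ? makeList(Num - 1, P) : NULL;
    New->Data = Num;
    return New;
}

static void freeList(list *L, Pool *P) {   /* free() became poolfree() */
    while (L) {
        list *Next = L->Next;
        poolfree(P, L);
        L = Next;
    }
}
```

Because every object is linked into its pool, `pooldestroy(&P1)` at the end of `twoLists` would also reclaim anything the program did not free explicitly.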
Pool Allocation Algorithm Details
Indirect function calls:
  call fp1(arg1 … argN)    where fp1 may point to {F1, F2}
  call fp2(arg1 … argN)    where fp2 may point to {F2, F3}
• Must pass the same pool arguments to F1, F2, and F3
  - Partition functions into equivalence classes: if F1 and F2 have a common call site, they are in the same class
  - Merge points-to graphs for each equivalence class
  - Apply the previous transformation unchanged
Pools reachable from global variables:
• Such a pool descriptor is a "runtime constant," so make it global also
• See paper for details [LA: PLDI 05]
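Grouping indirect-call targets into equivalence classes is a classic union-find problem. A minimal sketch, assuming functions are numbered 0..N-1 and each indirect call site is given as the array of its possible callees; `uf_init`, `uf_find`, `uf_union`, and `merge_call_site` are hypothetical names.

```c
#include <assert.h>

#define MAX_FUNCS 16

static int parent[MAX_FUNCS];   /* union-find forest over function ids */

static void uf_init(int n) {
    assert(n <= MAX_FUNCS);
    for (int i = 0; i < n; i++) parent[i] = i;
}

/* Find the class representative, halving the path as we go. */
static int uf_find(int x) {
    while (parent[x] != x) {
        parent[x] = parent[parent[x]];
        x = parent[x];
    }
    return x;
}

static void uf_union(int a, int b) {
    int ra = uf_find(a), rb = uf_find(b);
    if (ra != rb) parent[ra] = rb;
}

/* All possible callees of one indirect call site join one class. */
static void merge_call_site(const int *targets, int n) {
    assert(n > 0);
    for (int i = 1; i < n; i++) uf_union(targets[0], targets[i]);
}
```

With fp1 in {F1, F2} and fp2 in {F2, F3}, the shared callee F2 places F1, F2, and F3 in one class, so all three must take the same pool arguments.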
Pool Allocation Properties
Strengths:
• Transparent: fully automatic for any LLVM program
• Static map: every pointer variable/field points to a unique, known pool
• Pool type information: many type-homogeneous pools
• Lifetimes: the lifetime of every pool is bounded
• Pool points-to graph: the compiler knows which pools contain pointers to every pool, and vice versa
Limitations:
1. No deallocation: no automatic deallocation of items in pools
2. Unsafe: no guarantee of memory safety
Use 1 of Pool Allocation
Improving performance of heap-intensive codes [Lattner and Adve, PLDI 2005]
Pool Allocation Speedup
• Most programs are 0% to 20% faster with pool allocation alone
• Two are 10x faster; one is almost 2x faster
• Several programs are unaffected by pool allocation
• 10-20% speedup across many pointer-intensive programs
• Some programs (ft, chomp) are an order of magnitude faster
Cache/TLB Miss Reduction
Miss rates measured with perfctr on AMD Athlon 2100+.
Sources:
- Defragmented heap
- Reduced inter-object padding
- Segregating the heap!
Chomp Access Pattern with Malloc
• Allocates three object types (red, green, blue)
• Each traversal sweeps through all of memory
• Blue nodes are interspersed with green/red nodes
• Spends most time traversing green/red nodes
Chomp Access Pattern with PoolAlloc
FT Access Pattern With Malloc
Heap segregation has a similar effect on FT:
- See Lattner's Ph.D. thesis for details
Pool-Specific Optimizations
Different data structures have different properties.
Pool allocation segregates the heap:
- Optimize using pool-specific properties
[Figure: a linked list (head points to a "list: HMR" node with fields list* and int) with build, traverse, and destroy phases and a complex allocation pattern.]
Examples of properties we look for:
- Pool is type-homogeneous
- Pool contains data that only requires 4-byte alignment
- Opportunities to reduce allocation overhead
Pool-Specific Optimizations
1. Selective pool allocation
   - Don't pool allocate when not profitable
2. PoolFree elimination
   - poolfree is redundant if followed by pooldestroy
3. "Bump-pointer" allocation if the pool has no poolfree:
   - Eliminate the per-object header
   - Eliminate free-list overhead (faster object allocation)
4. Type-safe pools infer a type for the pool:
   - Use 4-byte alignment for pools we know don't need stricter alignment
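Optimizations 3 and 4 can be pictured with a bump-pointer pool. This is a minimal sketch, assuming fixed 4 KiB chunks and objects no larger than one chunk; `BumpPool`, `bp_init`, `bp_alloc`, and `bp_destroy` are hypothetical names. There is no per-object header and no free list, and the pool's alignment is set to 4 when the compiler knows the pool's type needs no more.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define CHUNK_SIZE 4096

typedef struct Chunk {
    struct Chunk *next;
    size_t used;                                 /* bytes handed out so far */
    _Alignas(max_align_t) unsigned char data[CHUNK_SIZE];
} Chunk;

typedef struct BumpPool {
    Chunk *head;    /* current chunk; older chunks hang off ->next */
    size_t align;   /* 4 for pools known to need only 4-byte alignment, else 8 */
} BumpPool;

static void bp_init(BumpPool *p, size_t align) {
    assert(align == 4 || align == 8);
    p->head = NULL;
    p->align = align;
}

/* Allocation is just "round up, bump the offset": no header, no free list. */
static void *bp_alloc(BumpPool *p, size_t size) {
    assert(size > 0 && size <= CHUNK_SIZE);
    size_t rounded = (size + p->align - 1) & ~(p->align - 1);
    if (p->head == NULL || p->head->used + rounded > CHUNK_SIZE) {
        Chunk *c = malloc(sizeof *c);
        if (c == NULL) abort();
        c->next = p->head;
        c->used = 0;
        p->head = c;
    }
    void *r = p->head->data + p->head->used;
    p->head->used += rounded;
    return r;
}

/* The only way memory is reclaimed: the whole pool at once. */
static void bp_destroy(BumpPool *p) {
    while (p->head) {
        Chunk *next = p->head->next;
        free(p->head);
        p->head = next;
    }
}
```

The design only works because the compiler has shown that the pool has no poolfree calls; objects live until pooldestroy.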
Pool Optimization Speedup (FullPA)
[Figure: run time with pool optimizations (FullPA) relative to pool allocation alone; baseline 1.0 = run time with pool allocation.]
• Most are 5-15% faster with optimizations than with Pool Alloc alone
• One is 44% faster; another is 29% faster
• Pool optimizations help some programs that pool allocation itself doesn't
• The effect of pool optimizations can be additive with the pool allocation effect
Optimizations help all of these programs:
- Despite being very simple, they make a big impact
Use 3 of Pool Allocation
Detecting buffer overruns fast and transparently [Dhurjati and Adve, ICSE 2006, to appear]
Array Bounds Errors
• Most common reason for security attacks
  - Over 50% of attacks reported by CERT
  - 1988: first exploited … 2006: continues to get exploited
• Key problem: tracking the target object of each pointer is very expensive (without "fat pointers")
Jones-Kelly: Transparent Bounds Checking
  p = malloc(n * sizeof(int));   /* registers (p, n*4) */
  …
  q = . . . ;
  …
  ref = lookup(q);               /* finds the object containing q */
  r = q + i;
  Check(ref, r);                 /* r must stay within ref */
[Figure: a global splay tree holding entries such as (p, n*4) and (…, …); lookup(q) searches it.]
Idea: register all array objects in a global splay tree; look up on every pointer calculation.
Advantage: backwards-compatible; no wrappers needed.
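The lookup-and-check idea can be sketched as follows. For brevity this sketch replaces the splay tree with a sorted array and binary search (a swapped-in structure with the same lookup semantics, not what Jones and Kelly use); `jk_register`, `jk_lookup`, and `jk_check` are hypothetical names, and the fixed capacity is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_OBJS 64

typedef struct { uintptr_t base; size_t size; } ObjRange;

static ObjRange objs[MAX_OBJS];  /* registered objects, sorted by base */
static int nobjs;

/* Register object [base, base + size), keeping the table sorted. */
static void jk_register(const void *base, size_t size) {
    assert(nobjs < MAX_OBJS);
    uintptr_t b = (uintptr_t)base;
    int i = nobjs++;
    while (i > 0 && objs[i - 1].base > b) {
        objs[i] = objs[i - 1];
        i--;
    }
    objs[i].base = b;
    objs[i].size = size;
}

/* Find the registered object containing p, or NULL. */
static const ObjRange *jk_lookup(const void *p) {
    uintptr_t x = (uintptr_t)p;
    int lo = 0, hi = nobjs - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        if (x < objs[mid].base) hi = mid - 1;
        else if (x >= objs[mid].base + objs[mid].size) lo = mid + 1;
        else return &objs[mid];
    }
    return NULL;
}

/* Check that r, computed from a pointer into ref, stays within ref.
   One-past-the-end is accepted, as C permits forming that pointer. */
static int jk_check(const ObjRange *ref, const void *r) {
    uintptr_t x = (uintptr_t)r;
    return ref != NULL && x >= ref->base && x <= ref->base + ref->size;
}
```

Every pointer computation pays for one search over all registered objects in the program, which is the cost the per-pool variant on the next slide attacks.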
Separate Search Tree per Pool
  p = malloc(n * sizeof(int));
  …
  q = . . . ;
  …
  ref = lookup(P1, q);
  r = q + i;
  Check(ref, r);
[Figure: pools P1 and P2, each with its own search tree holding entries such as (p, n*4) and (…, …).]
3 key insights:
1. The splay tree for a pool should be (very) small. In fact, a 2-element cache works great!
2. The pool for each pointer is known!
3. In type-homogeneous pools, we can distinguish (and ignore) scalars.
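Per-pool lookup with a two-element cache might look like the sketch below. It assumes each pool keeps its own small table of objects (a linear scan stands in for the per-pool splay tree) and checks the two most recently found objects first; `PoolTable`, `pt_init`, `pt_register`, and `pt_lookup` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define POOL_MAX_OBJS 32

typedef struct { uintptr_t base; size_t size; } Range;

typedef struct {
    Range objs[POOL_MAX_OBJS];  /* stands in for the per-pool splay tree */
    int n;
    Range *cache[2];            /* two most recently found objects */
    unsigned slow_lookups;      /* cache misses, counted for illustration */
} PoolTable;

static void pt_init(PoolTable *t) {
    t->n = 0;
    t->cache[0] = t->cache[1] = NULL;
    t->slow_lookups = 0;
}

static void pt_register(PoolTable *t, const void *base, size_t size) {
    assert(t->n < POOL_MAX_OBJS);
    t->objs[t->n].base = (uintptr_t)base;
    t->objs[t->n].size = size;
    t->n++;
}

static int contains(const Range *r, uintptr_t x) {
    return r != NULL && x >= r->base && x < r->base + r->size;
}

/* Look up p in this pool only; the compiler knows which pool to pass. */
static Range *pt_lookup(PoolTable *t, const void *p) {
    uintptr_t x = (uintptr_t)p;
    if (contains(t->cache[0], x)) return t->cache[0];
    if (contains(t->cache[1], x)) {          /* hit: promote to slot 0 */
        Range *hit = t->cache[1];
        t->cache[1] = t->cache[0];
        t->cache[0] = hit;
        return hit;
    }
    t->slow_lookups++;
    for (int i = 0; i < t->n; i++) {
        if (contains(&t->objs[i], x)) {
            t->cache[1] = t->cache[0];
            t->cache[0] = &t->objs[i];
            return &t->objs[i];
        }
    }
    return NULL;
}
```

When a loop alternates between two arrays in the same pool, every lookup after the first two is answered from the cache without searching.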
Experimental Results
Dramatic improvement in lookup overheads:
- Average overhead: 12% for Olden (34%, 69% for 2 cases)
- < 4% for 2 system daemons
- Compares with 5x-6x for original Jones-Kelly, and up to 11x-12x for the Ruwase-Lam extension (which we use)
Effective in finding bugs:
- Zitser's suite: models 14 buffer overruns in sendmail (7), wu-ftpd (4), bind (3)
- All 14 detected successfully
Caveat: like J-K, doesn't work for casts from pointers to int and back.
Use 5: SAFECode
A Safe Compilation Strategy for C/C++ Programs
• Sound analysis [Dhurjati and Adve, PLDI 2006, to appear]. Formal proof of soundness is in the accompanying technical report [TR: UIUCDCS-R-2005-2657].
• Memory safety [Dhurjati et al., PLDI 2006, TECS 2005]
Safe Languages Provide Basic Guarantees
e.g., Java, C#, Modula-3, ML
1. Prevent memory access violations
2. Detect errors during development
3. Enable sound compile-time analyses (often ignored)
   - e.g., in tools for safety checking, model checking, program verification
Weakly typed languages like C and C++ do not provide any of these benefits.
Why Care About C/C++?
• Huge body of essential legacy software
• Dominant in critical domains: OS kernels, embedded systems, daemons, language runtime systems
• Example: Microsoft Longhorn (basis of Vista)?
  - Less than 25% in C# [Amitabh Srivastava, CGO 04 keynote address]
  - Mostly high-level components, e.g., windowing system
  - Performance-critical code still in C/C++
• The features that make C/C++ popular for system software are the features that make C/C++ unsafe: nested structs; stack-allocated objects; untagged unions; explicit free; custom allocators.
Current Solutions
Solution         | Source     | Overhead     | No memory violations | Error checking | Sound static analysis
Purify, Valgrind | Pure C     | Several 100x | -                    | some           | -
SafeC            | Pure C     | 5x           | -                    | some           | -
Jones-Kelly      | Pure C     | 5-6x         | -                    | some           | -
SFI              | Pure C     | Over 2x      | Y                    | -              | -
Fischer-Patil    | Pure C     | 2x-6x        | Y                    | Y              | -
Yong             | Pure C     | Over 2x      | -                    | some           | -
SAFECode         | Pure C     | 0-30%        | Y                    | some           | Y
CCured           | Modified C | Up to 1.87x  | Y                    | some           | Y
Cyclone          | Modified C | 1x-2x        | Y                    | some           | Y
SAFECode Compiler and Run-time System
• A typed assembly language (LLVM)
  - Language-independent
  - Simple, transparent runtime system
• Sound analysis and memory safety
  - Heap safety: via Automatic Pool Allocation + run-time checks
  - Stack safety: via Data Structure Analysis (DSA) + heap conversion
  - Array safety: via pool checks or precise array bounds checks
• Initially, for "type-safe" C, with restricted pointer casts [TECS 2005]
• Now, for nearly arbitrary, unmodified C programs [PLDI 2006]
Guaranteeing Static Analysis
• Many program verification tools build on alias analysis, call graph, and assumed type information
  - E.g., SLAM, ESP, BLAST
• Memory errors can invalidate these analyses
• Detecting all memory errors is expensive
  - Dangling pointer errors
  - Precise array bounds errors
Solution: enforce key analyses in the presence of some memory errors: alias analysis, call graph, type information.
What Is Alias Analysis?
A static summary of memory objects and their connectivity.
  struct List* head = makeList(20);
  int P[4];
  P[i] = …;
  struct List *Q = (struct List *)P;
  Q->val = …;
[Figure: alias graph. head points to a "struct List (TK)" node (flag H; fields next, val); P and Q point to a TU node (flags S, A; field 0).]
TK: Type Known; TU: Type Unknown
Enforcing Alias Analysis
Problem 1:
- Must ensure that tmp points to an object in this points-to set
[Figure: tmp and its points-to set in the alias graph: a TU node (S, A; field 0) and a struct List (TK) node (H; fields next, val).]
With normal allocation:
- Objects are scattered in memory
- Checking set membership at run-time is extremely expensive
Insight 1: Automatic Pool Allocation partitions the heap corresponding to nodes in the graph. These partitions are compact and can be checked efficiently!
Caveat: currently only flow-insensitive, unification-based.
Enforcing Alias Analysis
Problem 2:
- Checking every pointer access or initialization is still very expensive
Insight 2: ignoring memory errors, any pointer obtained from a TK pool already has correct aliasing behavior. Pointers obtained from other pools will be explicitly checked:
Poolcheck(PP, p, align):
• Mask the lower k bits of p and look up the result in the hash table of page addresses in PP
• Alignment check for array references into a TK pool
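A minimal sketch of the Poolcheck(PP, p, align) test, assuming 4 KiB pages (so k = 12), a small open-addressing hash set of page addresses per pool, and objects in a TK pool laid out at multiples of `align` from the start of each page. `PoolPages`, `pp_add_page`, `pp_has_page`, and `poolcheck` are hypothetical names; the actual SAFECode run-time check may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define PAGE_SHIFT 12                          /* assumed 4 KiB pages: k = 12 */
#define PAGE_MASK  (~(uintptr_t)0 << PAGE_SHIFT)
#define NBUCKETS   64

typedef struct {
    uintptr_t pages[NBUCKETS];  /* page base addresses owned by the pool; 0 = empty */
} PoolPages;

static size_t bucket(uintptr_t page) { return (page >> PAGE_SHIFT) % NBUCKETS; }

/* Record that the pool owns the page starting at `page`. */
static void pp_add_page(PoolPages *pp, uintptr_t page) {
    assert(page != 0 && (page & ~PAGE_MASK) == 0);
    size_t i = bucket(page);
    for (size_t probes = 0; probes < NBUCKETS; probes++, i = (i + 1) % NBUCKETS) {
        if (pp->pages[i] == 0 || pp->pages[i] == page) {
            pp->pages[i] = page;
            return;
        }
    }
    abort();                    /* table full */
}

static int pp_has_page(const PoolPages *pp, uintptr_t page) {
    size_t i = bucket(page);
    for (size_t probes = 0; probes < NBUCKETS; probes++, i = (i + 1) % NBUCKETS) {
        if (pp->pages[i] == page) return 1;
        if (pp->pages[i] == 0) return 0;
    }
    return 0;
}

/* 1 if p lies on a page owned by the pool and, when align != 0,
   sits at an object boundary within that page. */
static int poolcheck(const PoolPages *pp, uintptr_t p, size_t align) {
    uintptr_t page = p & PAGE_MASK;            /* mask the lower k bits */
    if (!pp_has_page(pp, page)) return 0;
    return align == 0 || (p - page) % align == 0;
}
```

The check costs a mask, a hash probe, and at most a modulo, independent of how many objects the pool holds.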
Tolerating Dangling Pointers
Problem 3:
- But memory errors (dangling pointer errors, array bounds violations) could corrupt locations in TK pools
Insight 3 (also used for "type-safe" C without GC): reallocating a freed block to a new request of the same type cannot cause any type violation or (in the same pool) aliasing violation, despite dangling pointers.
Only array references in TK pools must be checked (can optimize): Poolcheck(PP, p, align).
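The argument behind Insight 3 can be seen in a small sketch of a type-homogeneous pool that recycles freed slots only for new objects of the same type. `ListPool`, `lp_init`, `lp_alloc`, and `lp_free` are hypothetical names, and the fixed capacity is an assumption. A dangling pointer may end up referring to a different logical object, but it still refers to a well-formed `list` in the same pool.

```c
#include <assert.h>
#include <stddef.h>

typedef struct list { struct list *Next; int Data; } list;

#define TK_CAPACITY 8

/* Hypothetical type-known pool: every slot holds a `list`, and freed
   slots are handed out again only as `list` objects from this pool. */
typedef struct {
    list slots[TK_CAPACITY];
    list *free_head;    /* freed slots, chained through Next */
    int next_unused;    /* first never-allocated slot */
} ListPool;

static void lp_init(ListPool *p) {
    p->free_head = NULL;
    p->next_unused = 0;
}

static list *lp_alloc(ListPool *p) {
    if (p->free_head) {                  /* reuse a freed slot first */
        list *r = p->free_head;
        p->free_head = r->Next;
        return r;
    }
    assert(p->next_unused < TK_CAPACITY);
    return &p->slots[p->next_unused++];
}

/* The slot's memory never leaves the pool or changes type. */
static void lp_free(ListPool *p, list *n) {
    n->Next = p->free_head;
    p->free_head = n;
}
```

Reading through the dangling pointer after the slot is reused returns the new object's fields: logically wrong, but still a `list`, which is why only the TK-pool array references need a Poolcheck.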
Evaluation of Run-time Overhead
Programs: Olden, Ptrdist, 3 system daemons
• No source changes necessary
• Compared Olden with CCured
Program | SAFECode ratio | CCured ratio
bh      | 1.03           | 1.31
bisort  | 1.00           | 0.97
em3d    | 1.27           | 1.49
treeadd | 0.99           | 2.72
tsp     | 0.99           | 1.23
yacr2   | 1.30           | -
ftpd    | 1.00           | -
fingerd | 1.03           | -
Max     | 1.30           | 2.72
Summary
What Could You Do With Pool Allocation?
Embedded systems:
- Pointer compression, data compression for embedded codes
- Data partitioning for explicit local memories / buffers / tiles
- Power savings for dead / dormant pools
Dependable systems:
- Efficient checkpointing by ignoring unmodified pools
- Efficient replicated execution for servers
- Focusing instrumentation for program testing
High-performance systems:
- Data-structure-centric profiling
Summary
Automatic Pool Allocation: gives compilers information about data structure layouts, lifetimes, and points-to information.
SAFECode: a sound execution strategy for C, C++ programs: enable sound analysis, enforce memory safety.
llvm.cs.uiuc.edu