60c222d66c4f079d15864477ea4e3310.ppt
- Number of slides: 40
CGCExplorer: A Semi-Automated Search Procedure for Provably Correct Concurrent Collectors
Martin Vechev (University of Cambridge)
Eran Yahav, David Bacon (IBM T. J. Watson Research Center)
Noam Rinetzky (Tel Aviv University)
Synthesizing Concurrent Algorithms
- Designing practical and efficient concurrent algorithms is hard
  - trading off simplicity for performance
  - fine-grained coordination
- Result: sub-optimal, buggy algorithms
- Need a more structured approach to synthesize correct and optimal implementations out of coarse-grained specifications
"Some tasks are best done by machine, while others are best done by human insight; and a properly designed system will find the right balance." -- D. Knuth
Synthesizing Concurrent Collectors
- Concurrent garbage collectors
  - Widely used
  - Must be correct, but also fast and scalable
  - Many algorithms, not many formal proofs
- A challenge problem for verification and synthesis
  - Concurrency
  - Heap with no a priori bound
- Focus on a specific family of collection algorithms
  - A generalization of Dijkstra’s algorithm
  - Concurrent, Tracing, Non-moving
    - Single mutator, single collector (non-parallel)
Contributions
- Unifying framework: collection algorithms as a common skeleton with parametric functions
[Figure: skeleton with parametric functions Mutator Step (mutator), Trace Step and Expose (collector)]
Contributions
Contributions
- explored 1,600,000 collection algorithms
- found 6 correct algorithms
- specified various sets of blocks
- hundreds of variations in 10 cycles
Overview
Generation
- High-level design
  - Find algorithm outline
  - Find building blocks
- Low-level search
  - Explore algorithm space
Verification
- High-level design
  - Find a sufficient local invariant
  - Find a sufficient abstraction
- Low-level search
  - Verify local invariant
Algorithm Space - Counting Algorithms
- Track collector’s progress (wavefront)
- Count pointer installations from behind wavefront
  - Increment on install, decrement on delete
  - Up to a predetermined counting threshold
- Expose objects with count > 0 when finished tracing
[Figure labels: collector wavefront, root, object header (count 1), scanned field]
Counting Algorithms: High Level View
Mutator
- Mutator Step
  - update source field to target obj
  - check wavefront
  - if source field behind wavefront
    - update new target object count
    - update old target object count
Collector
- Trace Step
  - read field value
  - update wavefront (collector progress)
  - mark target object
- Expose
  - select objects with count > 0
  - produce new roots
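The high-level view above can be sketched in Python as follows. This is a minimal single-threaded sketch, so every step is implicitly atomic. The `Obj` class, the per-field `wf` list, the `None` checks, and the `demo` scenario are assumptions made for illustration; they are not taken from the slides.

```python
class Obj:
    """Heap object: a header (count, mark bit) plus fields with wavefront bits."""
    def __init__(self, name, nfields=1):
        self.name = name
        self.fields = [None] * nfields
        self.wf = [False] * nfields   # wf[i] is True once the collector scanned field i
        self.mc = 0                   # count of installs from behind the wavefront
        self.marked = False

def mutator_step(source, i, new, log):
    """Update source.fields[i] to new, adjusting counts if the field is behind the wavefront."""
    old = source.fields[i]
    behind = source.wf[i]             # check wavefront
    if behind and new is not None:
        new.mc += 1                   # update new target object count
        log.add(new)
    if behind and old is not None:
        old.mc -= 1                   # update old target object count
    source.fields[i] = new

def trace_step(source, i):
    """Read a field, advance the wavefront over it, and mark the target."""
    dst = source.fields[i]
    source.wf[i] = True
    if dst is not None:
        dst.marked = True

def expose(log):
    """Select logged objects with count > 0, mark them, and return them as new roots."""
    new_roots = set()
    while log:
        o = log.pop()
        if o.mc > 0:
            o.marked = True
            new_roots.add(o)
    return new_roots

def demo():
    # H starts reachable only through B; the mutator moves it behind the wavefront.
    root, a, b, h = Obj("root", 2), Obj("A"), Obj("B"), Obj("H")
    root.fields = [a, b]
    b.fields[0] = h
    log = set()
    trace_step(root, 0)               # collector scans root.fields[0], marks A
    mutator_step(root, 0, h, log)     # install H behind the wavefront: counted and logged
    mutator_step(b, 0, None, log)     # delete the only unscanned pointer to H
    trace_step(root, 1)               # marks B
    trace_step(b, 0)                  # finds nothing: H was never traced
    return h, expose(log)
```

In `demo`, the trace steps never reach H, and only the expose step recovers it through its positive count.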
Coarse-Grained to Fine-Grained Synchronization
  Mutator Step (source, field, new) {
    atomic {
      M1: old = source.field
      M2: w = source.field.WF
      M3: if (w) new.MC++
      M4: if (w) log = log U {new}
      M5: if (w) old.MC--
      M6: source.field = new
    }
  }
  Trace Step (source, field) {
    atomic {
      C1: dst = source.field
      C2: source.field.WF = true
      C3: mark dst
    }
  }
  Set Expose (log) {
    E1: o = remove element from log
    E2: mc = o.MC
    E3: if (mc > 0) mark o
    E4: if (mc > 0) V = V U {o}
    return V
  }
What now? Can we remove atomics?
Result is incorrect, may lose objects!
Coarse-Grained to Fine-Grained Synchronization
  Mutator Step (source, field, new) {
    M1: old = source.field
    M2: w = source.field.WF
    M3: if (w) new.MC++
    M4: if (w) log = log U {new}
    M5: if (w) old.MC--
    M6: source.field = new
  }
  Trace Step (source, field) {
    C1: dst = source.field
    C2: source.field.WF = true
    C3: mark dst
  }
  Set Expose (log) {
    E1: o = remove element from log
    E2: mc = o.MC
    E3: if (mc > 0) mark o
    E4: if (mc > 0) V = V U {o}
    return V
  }
What now? Can we remove atomics?
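To see why simply dropping the atomic sections is unsafe, the following sketch replays one hand-picked interleaving of the blocks M1..M6 and C1..C3 in the order shown above. The interleaving, the `Obj` class, and the single-field heap are assumptions chosen for illustration; the slides do not give this particular schedule.

```python
class Obj:
    """Single-field heap object with a wavefront bit, a count, and a mark bit."""
    def __init__(self, name):
        self.name = name
        self.field = None
        self.wf = False
        self.mc = 0
        self.marked = False

def lost_object_interleaving():
    src, a, h = Obj("src"), Obj("A"), Obj("H")
    src.field = a
    log = set()

    # Mutator starts writing H into src.field.
    old = src.field          # M1
    w = src.wf               # M2: reads False, the field is not scanned yet

    # Collector runs a whole trace step in between.
    dst = src.field          # C1: still sees A
    src.wf = True            # C2
    dst.marked = True        # C3: marks A, not H

    # Mutator resumes with its stale view of the wavefront.
    if w:
        h.mc += 1            # M3: skipped
    if w:
        log.add(h)           # M4: skipped
    if w:
        old.mc -= 1          # M5: skipped
    src.field = h            # M6: H now sits in a scanned field

    # Expose finds nothing in the log.
    exposed = set()
    while log:
        o = log.pop()        # E1
        mc = o.mc            # E2
        if mc > 0:
            o.marked = True  # E3
            exposed.add(o)   # E4
    return src, h, exposed
```

After this schedule H is referenced from a scanned field, yet it is neither marked nor counted, so a collector that relies only on marks and counts would reclaim it.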
Coarse-Grained to Fine-Grained Synchronization
  Mutator Step (source, field, new) {
    M1: old = source.field
    M2: w = source.field.WF
    M5: if (w) old.MC--
    M3: if (w) new.MC++
    M4: if (w) log = log U {new}
    M6: source.field = new
  }
  Trace Step (source, field) {
    C1: dst = source.field
    C2: source.field.WF = true
    C3: mark dst
  }
  Set Expose (log) {
    E1: o = remove element from log
    E2: mc = o.MC
    E3: if (mc > 0) mark o
    E4: if (mc > 0) V = V U {o}
    return V
  }
What now? Can we remove atomics?
"When in doubt, use brute force." -- Ken Thompson
System Input – Building Blocks
Mutator Building Blocks
  M1: old = source.field
  M2: w = source.field.WF
  M3: if (w) new.MC++
  M4: if (w) log = log U {new}
  M5: if (w) old.MC--
  M6: source.field = new
Tracing Step Building Blocks
  C1: dst = source.field
  C3: mark dst
  C2: source.field.WF = true
Expose Building Blocks
  E1: o = remove element from log
  E2: mc = o.MC
  E3: if (mc > 0) mark o
  E4: if (mc > 0) V = V U {o}
Input Constraints
- Mutator blocks: [M3, M4]
- Tracing blocks: [C1, C3]
- Expose blocks: [E1, E2, E3, E4]
- Dataflow, e.g. M2 < M3
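A rough sketch of how such input constraints prune the search space, for the mutator blocks only. It treats [M3, M4] as one unit with a fixed internal order and uses three dataflow constraints; only M2 < M3 appears on the slide, while M2 < M5 and M1 < M5 are assumptions read off the variables each block uses. It enumerates orderings only, not atomicity choices, so its counts are not comparable to the numbers reported by the system.

```python
from itertools import permutations

# [M3, M4] is kept together as a single unit (assumed fixed internal order).
UNITS = ["M1", "M2", ("M3", "M4"), "M5", "M6"]

# (a, b) means block a must come before block b.
# Only ("M2", "M3") is given on the slide; the other two are assumptions.
BEFORE = [("M2", "M3"), ("M2", "M5"), ("M1", "M5")]

def flatten(order):
    """Expand grouped units into a flat list of block labels."""
    seq = []
    for unit in order:
        seq.extend(unit if isinstance(unit, tuple) else [unit])
    return seq

def candidate_orders():
    """Return every ordering of the mutator blocks that respects BEFORE."""
    result = []
    for perm in permutations(UNITS):
        seq = flatten(perm)
        pos = {block: i for i, block in enumerate(seq)}
        if all(pos[a] < pos[b] for a, b in BEFORE):
            result.append(seq)
    return result

orders = candidate_orders()
```

Each surviving ordering would then be combined with tracing and expose variants and handed to the verifier.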
System Output – (Verified) Algorithms
- Explored 306 variations in around 2 mins
  Mutator Step (source, field, new) {
    M1: old = source.field
    M6: source.field = new
    M2: w = source.field.WF
    M3: if (w) new.MC++
    M4: if (w) log = log U {new}
    M5: if (w) old.MC--
  }
  Trace Step (source, field) {
    C1: dst = source.field
    C3: mark dst
    C2: source.field.WF = true
  }
  Set Expose (log) {
    E1: o = remove element from log
    E2: mc = o.MC
    E3: if (mc > 0) mark o
    E4: if (mc > 0) V = V U {o}
  }
- Least atomic (verified) algorithm with given blocks
But What Now?
- How do we get further improvement?
- Need more insights
- Need new building blocks
  - Example: start and end of collector reading a field
[Figure labels: Coordination Meta-data, Atomicity, Ordering]
Continuing the Search…
- We derived a non-atomic algorithm (at the granularity of blocks)
  - Non-atomic write-barrier, collector step and expose
  - System explored over 1,600,000 algorithms (took ~34 hours)
- All experiments took ~41 machine hours and ~3 human hours
CGC: Challenge for Automatic Verification
- Unbounded heap and sequence of mutations
- Checking a global invariant is hard
  - State space too big even for partial checking
  - 3 nodes can quickly consume several GB in the SPIN model checker
- Solution
  - Manually boil down to a local invariant
  - Automatically prove local invariant
    - Use abstraction: an unbounded number of concrete nodes is conservatively represented by a small, bounded number of abstract nodes
What Do We Prove?
- Want to prove collector safety
  - Retaining all live objects
- Local invariant: for every object
  - If an object is referenced from a scanned field at time of expose, it is either marked, or its count > 0
- Show for any arbitrary object, under any arbitrary sequence of mutations
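The local invariant can be written as an executable predicate over a concrete heap snapshot. This is only a sketch for checking individual states; the `Obj` dataclass and its field names are assumptions, and checking snapshots is no substitute for the abstraction-based proof the slides describe.

```python
from dataclasses import dataclass

@dataclass(eq=False)  # eq=False keeps identity-based equality and hashing
class Obj:
    fields: list          # outgoing references (None for an empty field)
    wf: list              # wf[i] is True if field i has been scanned
    mc: int = 0           # count of installs from behind the wavefront
    marked: bool = False

def local_invariant_holds(heap):
    """True if every object referenced from a scanned field is marked or has count > 0."""
    for src in heap:
        for i, dst in enumerate(src.fields):
            if dst is None or not src.wf[i]:
                continue          # only scanned, non-empty fields matter
            if not (dst.marked or dst.mc > 0):
                return False
    return True
```

A violating snapshot is exactly an object that sits in a scanned field but is invisible to both the mark bit and the count.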
Abstraction Intuition
- Select tracked representative object
- Track reference count only for the selected object
[Figure labels: wavefront, root, hidden, 2, object header, scanned field]
Abstraction Intuition
- Only up to a fixed number of pointers matter: up to the counting threshold
  - Track these precisely
  - Forget the rest
[Figure labels: wavefront, root, hidden, 2, object header, scanned field]
Recap
Generation
- High-level design
  - Find algorithm outline
  - Find building blocks
- Low-level search
  - Explore algorithm space
Verification
- High-level design
  - Find a sufficient local invariant
  - Find proof outline
  - Find proof building blocks
  - Find a sufficient abstraction
- Low-level search
  - Verify local invariant
What’s next?
- Concurrent Collector Synthesis
  - Get real algorithms
  - Mapping to real machine instructions
    - Yet another level of search
- Synthesis of other concurrent algorithms
  - In the pipeline: concurrent set algorithms
- Local abstractions for concurrent programs
Invited Questions
1) Are your algorithms practical?
2) What are the limitations of this approach? Would it work for my problem?
3) How do you prove that your algorithms terminate?
4) Can you show another algorithm?
5) How do you reduce the number of calls to the model checker?
6) You didn’t mention any related work
7) Can you give more details on experimental results?
ANSWERS FOLLOW
Where Do Building Blocks Come From?
- Read/write of heap location, and
- Collector coordination meta-data
  - e.g., collector progress, state flags
Progress Coordination Metadata
[Figure: object layouts with a header (count, marked) and fields fld_1, fld_2, ..., shown with varying amounts of per-field progress metadata (start_i / end_i bits); variants labeled 6 bits, 5 bits, ..., 1 bit, 0 bits]
Refined Input – Finer Building Blocks
Mutator Building Blocks
  M1: old = source.field
  M2s: ws = source.field.WFs
  M2e: we = source.field.WFe
  M3s: if (ws) new.MC++
  M4s: if (ws) log = log U {new}
  M5e: if (we) old.MC--
  M6: source.field = new
Collector Building Blocks
  C1: dst = source.field
  C3: mark dst
  C2s: source.field.WFs = true
  C2e: source.field.WFe = true
Expose Building Blocks
  E1: o = remove element from log
  E2: mc = o.MC
  E3: if (mc > 0) mark o
  E4: if (mc > 0) V = V U {o}
Input Constraints
- Mutator: [M3s, M4s]
- Tracing: [C1, C3], C2s < [C1, C3] < C2e
- Expose: [E1, E2, E3, E4]
- Dataflow: e.g. M2s < M3s
System Output
  Mutator Step (source, field, new) {
    M1: old = source.field
    M2e: we = source.field.WFe
    M6: source.field = new
    M2s: ws = source.field.WFs
    M3s: if (ws) new.MC++
    M4s: if (ws) log = log U {new}
    M5e: if (we) old.MC--
  }
  Trace Step (source, field) {
    C2s: source.field.WFs = true
    C1: dst = source.field
    C3: mark dst
    C2e: source.field.WFe = true
  }
  Set expose (log) {
    E1: o = remove element from log
    E2: mc = o.MC
    E3: if (mc > 0) mark o
    E4: if (mc > 0) V = V U {o}
  }
- Constraints = Insights, e.g.: M2e < M6 < M2s and C2s < [C1, C3] < C2e
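The effect of the start/end wavefront bits can be illustrated by replaying one interleaving in which a whole trace step runs between the mutator's first two blocks and its write. This is a sketch of a single hand-picked schedule, not a proof of the algorithm; the `Obj` class and the scenario are assumptions made for illustration.

```python
class Obj:
    """Single-field heap object with start/end wavefront bits, a count, and a mark bit."""
    def __init__(self, name):
        self.name = name
        self.field = None
        self.wfs = False     # WFs: collector has started scanning the field
        self.wfe = False     # WFe: collector has finished scanning the field
        self.mc = 0
        self.marked = False

def refined_interleaving():
    src, a, h = Obj("src"), Obj("A"), Obj("H")
    src.field = a
    log = set()

    # Mutator begins writing H into src.field.
    old = src.field          # M1
    we = src.wfe             # M2e: False, scan not finished

    # Collector runs a whole trace step in between.
    src.wfs = True           # C2s
    dst = src.field          # C1: still sees A
    dst.marked = True        # C3
    src.wfe = True           # C2e

    # Mutator resumes: it writes first, then reads the start bit.
    src.field = h            # M6
    ws = src.wfs             # M2s: True, scan has started
    if ws:
        h.mc += 1            # M3s
        log.add(h)           # M4s
    if we:
        old.mc -= 1          # M5e: skipped, we was read as False

    # Expose picks H up from the log.
    exposed = set()
    while log:
        o = log.pop()        # E1
        mc = o.mc            # E2
        if mc > 0:
            o.marked = True  # E3
            exposed.add(o)   # E4
    return src, a, h, exposed
```

Because the write (M6) precedes the read of the start bit (M2s), a scan that began before the write is observed and H is counted, logged, and later exposed.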
(Some) Related Work
- Superoptimizer: a look at the smallest program, Massalin, ASPLOS’87
  - Finite state, limited length of instruction sequences
- Programming by Sketching, Solar-Lezama et al., PLDI’05
  - Finite state
- Sketching with Stencils, Solar-Lezama et al., PLDI’07
- Automatic discovery of mutual exclusion algorithms, Bar David and Taubenfeld, PODC’03
  - Finite state
- Correctness-Preserving Derivation of Concurrent Garbage Collection Algorithms, PLDI’06
- CheckFence: Sebastian Burckhardt, Rajeev Alur and Milo M. K. Martin, PLDI’07
- …
Algorithm Exploration
[Figure: space of algorithm variants; labels: less atomic, different orders, more atomic]
Algorithm Exploration
[Figure: space of algorithm variants for Mutator Step, Trace Step, and Expose; labels: less atomic, different orders, more atomic]
Limitations
- Need algorithm designer insights
  - Designer needs to understand results of each phase
- Abstraction is tailor-made
  - Designing an abstraction for the next collector?
- Pushing the limits of current model checkers
  - Multiple mutators? Unbounded number of mutators?
  - Better partial-order reduction may help
Are Your Algorithms Practical?
- Are your algorithms correct?
- Honest answer: not yet
  - So far focused on correctness more than on performance
  - However, counting algorithms are of practical interest
"The moral is that for the design of multiprocessor installations we cannot rely on the traditional approach of the optimistic engineer, who, when the design looks reasonable, puts it together to see if it works." -- Edsger W. Dijkstra
Experimental Results
Run     Total      Checked   Correct   Time (min)
1       306        45        1         2
2       2744       162       2         34
3       12         7         2         1
4       592        146       14        56
5       32         26        1         1
6       3024       550       80        212
7       Timed out
8       6144       127       10        39
9       1624320    1833      6         2072
10      364032     288       0         39
TOTAL   2001206    3184      116       2456
+ About 180 minutes of human working with the system
(3.8 GHz Xeon processor and 8 GB memory running version 4 of Red Hat Linux.)
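As a quick consistency check, the per-run figures can be summed and converted to hours. The snippet below only re-adds the numbers from the table (run 7 timed out and is omitted); the variable names are illustrative.

```python
# (total, checked, correct, time in minutes) per run, copied from the table.
RUNS = {
    1: (306, 45, 1, 2),
    2: (2744, 162, 2, 34),
    3: (12, 7, 2, 1),
    4: (592, 146, 14, 56),
    5: (32, 26, 1, 1),
    6: (3024, 550, 80, 212),
    8: (6144, 127, 10, 39),
    9: (1624320, 1833, 6, 2072),
    10: (364032, 288, 0, 39),
}

# Column-wise sums over all completed runs.
totals = tuple(sum(column) for column in zip(*RUNS.values()))

machine_hours = totals[3] / 60    # all runs together
run9_hours = RUNS[9][3] / 60      # the largest single search
human_hours = 180 / 60            # human time reported with the table
```

The sums reproduce the TOTAL row, and the hour figures come out at roughly 41 machine hours overall, about 34.5 hours for run 9, and 3 human hours.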
Why Does it Work?
- Ingredients
  - Relentless optimism
  - Limited setting
- Limited Setting
  - single collector, single mutator
  - counting threshold is known
  - algorithm skeleton is fixed
  - algorithm uses a barrier before moving to the sweep phase
  - … (see paper)
Algorithm Space - Counting Algorithms
- Concurrent
  - Single mutator, single collector (not parallel)
- Tracing
  - Computes transitive reachability from roots
- Non-Moving
  - Collector does not relocate objects
How Do You Prove Termination?
- Manually
DEMONS START HERE IF NOT EARLIER
Synthesizing Concurrent Algorithms
"Some tasks are best done by machine, while others are best done by human insight; and a properly designed system will find the right balance." -- D. Knuth
"it seems unavoidable that multiprocessor installations will be built… it seems equally unavoidable that many of them will be put together by aforementioned optimistic engineer. I shudder at the thought of all the new bugs: they will only delight the Devil. Am I too pessimistic? Nobody knows the trouble I have seen..." -- Edsger W. Dijkstra