- Number of slides: 21
HIGH-LEVEL ADAPTIVE PROGRAM OPTIMIZATION WITH ADAPT
Michael J. Voss and Rudolf Eigenmann
PPoPP '01
(Presented by Kanad Sinha)
Agenda
- Motivation
- General choices for adaptive optimization
- ADAPT
  - The Architecture
  - The Language
  - An example
  - Results
Motivation
- There's only so much optimization that can be performed at compile time.
- Compilers have to generate code for generic system models, making compile-time assumptions that may be sensitive to input that is unknown until runtime.
- Convergence of technologies: it is difficult to generate a common binary that exploits each system's individual characteristics.
Motivation
Possible solution?
"Use of adaptive and dynamic optimization paradigms, where optimization is performed at runtime when complete system and input knowledge is available."
Ways to go about it…
Choose from statically generated code variants
  + Easy
  - May not result in the maximum possible optimization
  - Can result in code explosion
Parameterization
  + Single copy of the source
  - May still not result in the maximum possible optimization
Dynamic compilation
  + Complete input and system knowledge: maximum optimization possible
  - Considerable runtime overhead
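To make the first two options concrete, here is a minimal C sketch. All function names are hypothetical and not part of ADAPT. Two statically generated variants of the same loop (unroll factors 1 and 4) are selected at runtime through a function pointer. A single parameterized version instead takes its tuning knob (here, a block size) as a runtime argument.

```c
#include <stddef.h>

/* Static code variant 1: plain loop, compiled ahead of time. */
static double sum_unroll1(const double *a, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Static code variant 2: the same loop unrolled by 4. Every extra variant
   like this one adds code size, which is where code explosion comes from. */
static double sum_unroll4(const double *a, size_t n) {
    double s = 0.0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        s += a[i] + a[i + 1] + a[i + 2] + a[i + 3];
    for (; i < n; i++)          /* leftover iterations */
        s += a[i];
    return s;
}

typedef double (*sum_fn)(const double *, size_t);

/* Hypothetical runtime selector: picks one of the precompiled variants. */
sum_fn select_variant(int variant_id) {
    return variant_id == 4 ? sum_unroll4 : sum_unroll1;
}

/* Parameterization: one copy of the source whose traversal is steered by a
   runtime parameter (a hypothetical block size). */
double sum_blocked(const double *a, size_t n, size_t block) {
    double s = 0.0;
    if (block == 0)
        block = 1;              /* guard against a useless parameter value */
    for (size_t start = 0; start < n; start += block) {
        size_t end = start + block < n ? start + block : n;
        for (size_t i = start; i < end; i++)
            s += a[i];
    }
    return s;
}
```

The trade-off from the slide is visible here. Adding a variant means adding a function, while the parameterized version stays a single copy but can only explore the space its parameter exposes.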
ADAPT: Features
- Automated De-Coupled Adaptive Program Optimization
- Generic framework that leverages existing tools
- Uses a domain-specific language, AL, in which adaptive techniques can be specified
- …
ADAPT: Features (contd.)
- Supports dynamic compilation and parameterization
- Enables optimizations through "runtime sampling"
- Facilitates an iterative modification-and-search approach
ADAPT: Prelude
Three functions of a dynamic/adaptive optimization system:
- Evaluate the effectiveness of a particular optimization for the current input and system information
- Apply the optimization if it is profitable
- Re-evaluate applied optimizations and tune them according to current runtime conditions
ADAPT – The Architecture
ADAPT – The Architecture
The runtime system consists of:
- A modified version of the application
- A remote optimizer (RO), which has
  - the source code
  - a description of the target machine
  - stand-alone tools and compilers
- A local optimizer (LO), which
  - acts as the agent of the remote optimizer on the system
  - detects hot spots
  - tracks multiple interval contexts (here, loop bounds)
  - runs in a separate thread
Optimization and execution are truly asynchronous.
ADAPT – The Architecture
- The LO invokes the RO when a hot spot is detected.
- The RO tunes the interval with the available tools, according to user-specified heuristics.
- The RPC returns.
- If new code is available, it is dynamically linked into the application as the new best known or experimental version, depending on the RO's message.
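The local optimizer's bookkeeping can be sketched roughly as follows, under stated assumptions. Contexts are keyed by their loop bounds. A context counts as "hot" after a hypothetical fixed number of executions (`HOT_THRESHOLD`). The request to the remote optimizer is a stub that only counts calls, standing in for the asynchronous RPC. None of these names come from ADAPT itself.

```c
#include <stddef.h>

#define MAX_CONTEXTS 16
#define HOT_THRESHOLD 3   /* hypothetical: executions before a context is hot */

/* One interval context, identified by its runtime loop bounds. */
typedef struct {
    long lower, upper;    /* loop bounds that key this context */
    unsigned calls;       /* how often this context has executed */
    int requested;        /* 1 once the remote optimizer has been asked */
} interval_context;

typedef struct {
    interval_context ctx[MAX_CONTEXTS];
    size_t count;
    unsigned requests_sent;   /* stand-in for RPCs issued to the RO */
} local_optimizer;

/* Hypothetical stub for the LO -> RO request. A real system would issue an
   asynchronous RPC here and keep executing the current best version. */
static void request_remote_optimization(local_optimizer *lo, interval_context *c) {
    c->requested = 1;
    lo->requests_sent++;
}

/* Called on entry to an instrumented interval. Finds or creates the context
   for these loop bounds, counts the execution, and requests optimization the
   first time the context becomes hot. Returns NULL if the table is full. */
interval_context *lo_on_interval(local_optimizer *lo, long lower, long upper) {
    interval_context *c = NULL;
    for (size_t i = 0; i < lo->count; i++) {
        if (lo->ctx[i].lower == lower && lo->ctx[i].upper == upper) {
            c = &lo->ctx[i];
            break;
        }
    }
    if (c == NULL) {
        if (lo->count == MAX_CONTEXTS)
            return NULL;
        c = &lo->ctx[lo->count++];
        c->lower = lower;
        c->upper = upper;
        c->calls = 0;
        c->requested = 0;
    }
    c->calls++;
    if (c->calls >= HOT_THRESHOLD && !c->requested)
        request_remote_optimization(lo, c);
    return c;
}
```

Keying on loop bounds means the same loop executed with different bounds is tracked, and can be tuned, as a separate context.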
ADAPT – The Architecture
ADAPT – The Architecture
Candidate code sections have two control-flow paths:
- through the best known version
- through an experimental version
Each of these can be replaced dynamically; a flag indicates which version to execute.
Experimental versions of each context are monitored:
- the collected data is used as feedback
- if the experimental version is better, it is swapped with the best known version
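A minimal C sketch of the two-path scheme, assuming function pointers model the dynamically replaceable versions and that a measured cost is fed back from outside. The struct, the function names, and the two example kernels are hypothetical illustrations, not ADAPT's generated code.

```c
#include <stddef.h>

typedef void (*kernel_fn)(double *a, size_t n);

/* Hypothetical per-section state: two replaceable control-flow paths. */
typedef struct {
    kernel_fn best;          /* best known version */
    kernel_fn experimental;  /* version currently under evaluation, or NULL */
    int run_experimental;    /* flag selecting which path executes */
    double best_time;        /* measured cost of the best known version */
} code_section;

/* Dispatch through whichever path the flag selects. */
void run_section(code_section *s, double *a, size_t n) {
    if (s->run_experimental && s->experimental != NULL)
        s->experimental(a, n);
    else
        s->best(a, n);
}

/* Install a new experimental version (e.g. freshly linked code). */
void install_experimental(code_section *s, kernel_fn f) {
    s->experimental = f;
    s->run_experimental = 1;
}

/* Feedback step: promote the experimental version if its measured cost beat
   the best known version, then return to the best path. Returns 1 on swap. */
int report_experiment(code_section *s, double measured_time) {
    int swapped = 0;
    if (s->experimental != NULL && measured_time < s->best_time) {
        s->best = s->experimental;
        s->best_time = measured_time;
        swapped = 1;
    }
    s->experimental = NULL;
    s->run_experimental = 0;
    return swapped;
}

/* Two hypothetical versions of the same computation (doubling each element).
   last_version records which one ran, purely for illustration. */
static int last_version = 0;

static void kernel_v1(double *a, size_t n) {
    for (size_t i = 0; i < n; i++)
        a[i] *= 2.0;
    last_version = 1;
}

static void kernel_v2(double *a, size_t n) {   /* same result, unrolled by 2 */
    size_t i = 0;
    for (; i + 2 <= n; i += 2) {
        a[i] *= 2.0;
        a[i + 1] *= 2.0;
    }
    for (; i < n; i++)
        a[i] *= 2.0;
    last_version = 2;
}
```

Because both paths compute the same result, the application stays correct whichever flag value is set. The optimizer only decides which path is faster.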
ADAPT – The Architecture
The optimization process runs outside the critical path, decoupled from execution.
ADAPT – The Language
ADAPT Language (AL)*
Features:
- Uses an LL(1) grammar, so the parser is simple
- Domain-specific language with a C-style format
- Defines reserved words that, at runtime, contain useful input data and system information
* "A full description of ADAPT language is beyond the scope of this paper", and by extension, this presentation.
ADAPT – An example
ADAPT – An example
- Initialize some variables
- Constraints
- Interface to the tool to be used
- This block defines the heuristic
ADAPT – An example
Statement: Description
- constraint(compile-time constraint): Supplies a compile-time constraint
- apply_spec(condition, type, syntax[, params]): A description of a tool or flag
- collect(event list) execute;: Initiates the monitoring of an experimental code version
- mark_as_best: Specifies that the code variant that would be generated under the current runtime conditions is a new best known version
- end_phase: Denotes the end of an optimization phase
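As a loose analogy only, and not a statement of AL's actual semantics, the statements above roughly describe a search loop over candidate variants. The C sketch below mimics one optimization phase. A constraint filters candidates, each remaining candidate is "executed" via a measurement hook, the cheapest is kept, and the phase ends by reporting it. `variant_spec`, `run_phase`, and `fake_measure` are hypothetical, and `fake_measure` is a deterministic stand-in for real runtime monitoring.

```c
#include <stddef.h>

/* Hypothetical candidate configuration, standing in for one apply_spec
   (e.g. a compiler flag set or an unroll factor). */
typedef struct {
    const char *name;
    int unroll;
} variant_spec;

/* Hypothetical measurement hook standing in for "collect(...) execute;". */
typedef double (*measure_fn)(const variant_spec *v);

/* Deterministic fake cost model, for illustration only. */
double fake_measure(const variant_spec *v) {
    return 10.0 / v->unroll + v->unroll;
}

/* One optimization phase. Returns the index of the best candidate, or -1 if
   no candidate satisfies the constraint; *best_cost receives its cost. */
int run_phase(const variant_spec *cands, size_t n, int max_unroll,
              measure_fn measure, double *best_cost) {
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        if (cands[i].unroll > max_unroll)     /* like constraint(...) */
            continue;
        double cost = measure(&cands[i]);     /* like collect(...) execute; */
        if (best < 0 || cost < *best_cost) {  /* like mark_as_best */
            best = (int)i;
            *best_cost = cost;
        }
    }
    return best;                              /* like end_phase */
}
```

In ADAPT the measurements come from monitoring experimental versions of the running program rather than from a cost function, and the search runs on the remote optimizer, off the critical path.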
ADAPT - Results
Test machines: 6-core Sun Ultra Enterprise 4000; single-core Pentium II Linux workstation
Experiments and results:
- Useless copying: run a dynamically compiled version of the code without applying any optimization.
  Result: overhead of less than ~5%; some cases show a speed-up!
- Specialization: loop bounds replaced by constants equal to their runtime values.
  Average improvement: E4000 13.6%, Pentium 2.2%
- Flag selection: experiment with various combinations of compiler flags.
  Average improvement: E4000 35%, Pentium 9.2%; identified some non-intuitive choices
- Loop unrolling: loops unrolled by factors that evenly divide the number of iterations of the innermost loop, up to a maximum factor of 10.
  Average improvement: E4000 18%, Pentium 5%
- Loop tiling: loops deemed appropriate tiled for ½, ¼, …, 1/16 of the L2 cache size.
  Average improvement: E4000 13.5%, Pentium 9.8%
- Parallelization: loops deemed appropriate by Polaris parallelized.
  Average improvement: E4000 51.8%
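The search spaces for the unrolling and tiling experiments can be enumerated with a small C sketch. The function names are hypothetical. The unroll candidates start at 2, since a factor of 1 means no unrolling. The tile sequence is assumed to halve at each step between ½ and 1/16 of the L2 size, which is how the elided "…" is read here.

```c
#include <stddef.h>

/* Unroll factors in [2, max_factor] that evenly divide the innermost loop's
   trip count. Writes at most cap factors to out; returns how many. */
size_t unroll_candidates(long trip_count, int max_factor, int *out, size_t cap) {
    size_t k = 0;
    for (int f = 2; f <= max_factor && k < cap; f++)
        if (trip_count > 0 && trip_count % f == 0)
            out[k++] = f;
    return k;
}

/* Tile footprints in bytes: l2_bytes/2, l2_bytes/4, ... down to
   l2_bytes/min_fraction (assumed halving sequence). Returns how many. */
size_t tile_candidates(size_t l2_bytes, size_t min_fraction, size_t *out, size_t cap) {
    size_t k = 0;
    for (size_t d = 2; d <= min_fraction && k < cap; d *= 2)
        out[k++] = l2_bytes / d;
    return k;
}
```

For example, a trip count of 12 with a maximum factor of 10 yields the factors 2, 3, 4 and 6. A 512 KiB L2 cache yields four tile footprints, from 256 KiB down to 32 KiB.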
Today's Take-aways
- There is an advantage to doing runtime optimization.
- It can be applied to general-purpose programs as well.
- For full-blown runtime optimization, the optimization process needs to move outside the critical path.
if (questions("?!") == 1) delay();
THANK_YOU("Have a great weekend!");


