

  • Number of slides: 21

HIGH-LEVEL ADAPTIVE PROGRAM OPTIMIZATION WITH ADAPT
Michael J. Voss and Rudolf Eigenmann
PPoPP ’01
(Presented by Kanad Sinha)

Agenda
  • Motivation
  • General choices for adaptive optimization
  • ADAPT
  • The Architecture
  • The Language
  • An example
  • Results

Motivation
  • There’s only so much optimization that can be performed at compile time.
  • Compilers have to generate code for generic system models and make compile-time assumptions that may be sensitive to input, which is unknown until runtime.
  • Convergence of technologies makes it difficult to generate a common binary that exploits individual system characteristics.

Motivation
Possible solution?
“Use of adaptive and dynamic optimization paradigms, where optimization is performed at runtime when complete system and input knowledge is available.”

Ways to go about it…
  • Choose from statically generated code variants
      + Easy
      - May not result in max possible optimization
      - Can result in code explosion
  • Parameterization
      + Single copy of source
      - May still not result in max possible optimization
  • Dynamic compilation
      + Complete input and system knowledge: max optimization possible
      - Considerable runtime overhead
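The first option above, choosing from statically generated code variants, can be sketched in a few lines. This is only an illustration, not ADAPT's mechanism: the two variants, the `pick_fastest` helper, and the single-run timing policy are all assumptions made for the example.

```python
import time

def sum_simple(data):
    """Variant 1: straightforward loop."""
    total = 0
    for x in data:
        total += x
    return total

def sum_unrolled(data):
    """Variant 2: the same loop, manually unrolled by a factor of 4."""
    total = 0
    n = len(data)
    limit = n - n % 4
    for i in range(0, limit, 4):
        total += data[i] + data[i + 1] + data[i + 2] + data[i + 3]
    for i in range(limit, n):          # leftover iterations
        total += data[i]
    return total

def pick_fastest(variants, sample, timer=time.perf_counter):
    """Hypothetical selector: time each variant once on a sample input."""
    best, best_time = None, float("inf")
    for variant in variants:
        start = timer()
        variant(sample)
        elapsed = timer() - start
        if elapsed < best_time:
            best, best_time = variant, elapsed
    return best

VARIANTS = [sum_simple, sum_unrolled]      # all variants exist before the run
chosen = pick_fastest(VARIANTS, list(range(1000)))
result = chosen(list(range(10)))
```

Both variants ship in the binary whether or not they are ever chosen, which is where the "code explosion" drawback comes from.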

ADAPT: Features
  • Automated De-Coupled Adaptive Program Optimization
  • Generic framework, which leverages existing tools
  • Uses a domain-specific language, AL, by which adaptive techniques can be specified
  • …

ADAPT: Features (contd.)
  • Supports dynamic compilation and parameterization
  • Enables optimizations through “runtime sampling”
  • Facilitates an iterative modification and search approach

ADAPT: Prelude
3 functions of a dynamic/adaptive optimization system:
  • Evaluate the effectiveness of a particular optimization for the current input & system information
  • Apply the optimization if profitable
  • Re-evaluate applied optimizations and tune according to current runtime conditions
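The three functions can be read as one loop that is re-run whenever conditions change. The sketch below assumes a made-up cost model (`make_cost`) and made-up version names. It shows the evaluate / apply / re-evaluate cycle only, not how ADAPT actually measures profitability.

```python
def tune(active, candidates, cost):
    """Evaluate every candidate under current conditions and apply the
    cheapest one only if it beats the version that is currently active."""
    best = min(candidates, key=cost)
    return best if cost(best) < cost(active) else active

def make_cost(n):
    """Hypothetical cost model: relative run time for loop bound n."""
    table = {
        "baseline": 1.0 * n,
        "unrolled": 0.8 * n + 50,      # faster per iteration, fixed setup cost
    }
    return table.__getitem__

active = "baseline"
history = []
for n in (100, 1000, 100):             # runtime conditions change over time
    active = tune(active, ["baseline", "unrolled"], make_cost(n))
    history.append(active)
```

With these assumed costs the unrolled version is only profitable for the large bound, so the third step undoes the optimization once the bound shrinks again.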

ADAPT – The Architecture

ADAPT – The Architecture
Runtime system consists of:
  • Modified version of application
  • Remote optimizer, which has:
      • source code
      • description of target machine
      • stand-alone tools & compilers
  • Local optimizer
      • agent of the remote optimizer on the system
      • detects hot-spots
      • tracks multiple interval contexts (here, loop bounds)
      • runs in a separate thread
  • Optimization and execution are truly asynchronous

ADAPT – The Architecture
  • LO (local optimizer) invokes RO (remote optimizer) when a hot-spot is detected
  • RO tunes the interval using available tools, according to user-specified heuristics
  • RPC returns
  • If new code is available, it is dynamically linked to the application as the new best or experimental version, depending on RO’s message
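The LO/RO hand-off can be mimicked with a background thread. In this sketch the names `Interval`, `remote_optimize`, `local_optimizer`, and the hot-spot threshold are all hypothetical. The "RPC" is an ordinary function call, and "dynamic linking" is just storing a new function object, so it illustrates the control flow rather than the real system.

```python
import threading

class Interval:
    """A candidate code section whose versions can be swapped at runtime."""

    def __init__(self, generic):
        self.generic = generic        # original, unspecialized code
        self.best = {}                # context (loop bound) -> best known version
        self.calls = {}               # context -> invocation count
        self.lock = threading.Lock()

    def run(self, n):
        with self.lock:
            self.calls[n] = self.calls.get(n, 0) + 1
            version = self.best.get(n, self.generic)
        return version(n)

def remote_optimize(n):
    """Hypothetical RO: builds a version specialized for loop bound n."""
    def specialized(_n):
        total = 0
        for i in range(n):            # bound fixed when the version was built
            total += i
        return total
    return specialized

def local_optimizer(interval, hot_threshold):
    """Hypothetical LO: finds hot contexts, calls the RO, links in new code."""
    with interval.lock:
        hot = [n for n, count in interval.calls.items()
               if count >= hot_threshold and n not in interval.best]
    for n in hot:
        new_code = remote_optimize(n)     # stands in for the RPC to the RO
        if new_code is not None:
            with interval.lock:
                interval.best[n] = new_code

def generic_sum(n):
    return sum(range(n))

interval = Interval(generic_sum)
for _ in range(5):                        # the application makes the interval hot
    interval.run(10)

lo_thread = threading.Thread(target=local_optimizer, args=(interval, 3))
lo_thread.start()
during = interval.run(10)                 # execution continues while the LO works
lo_thread.join()
after = interval.run(10)                  # now served by the specialized version
```

The application thread never waits for the optimizer. It keeps calling whichever version is currently installed, which is the sense in which optimization and execution are asynchronous.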

ADAPT – The Architecture

ADAPT – The Architecture
  • Candidate code sections have 2 control flow paths
      • through the best known version
      • through the experimental version
  • Each of these can be replaced dynamically
  • Flag indicates which version to execute
  • Monitor experimental versions of each context
      • collected data used as feedback
      • if better, swap with best known version
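A minimal sketch of the two-path scheme, assuming a single timing sample per version and a hypothetical `CandidateSection` class. The scripted clock in the demo is there only to make the outcome reproducible, and ADAPT's real monitoring policy is not described in these slides.

```python
import time

class CandidateSection:
    """Code section with a best-known path and an experimental path."""

    def __init__(self, best, clock=time.perf_counter):
        self.best = best
        self.best_time = float("inf")
        self.experimental = None
        self.use_experimental = False     # the flag that selects the path
        self.clock = clock

    def install_experimental(self, func):
        self.experimental = func
        self.use_experimental = True

    def _timed(self, func, *args):
        start = self.clock()
        result = func(*args)
        return result, self.clock() - start

    def execute(self, *args):
        if self.use_experimental:
            result, elapsed = self._timed(self.experimental, *args)
            if elapsed < self.best_time:  # better: swap with best known version
                self.best, self.best_time = self.experimental, elapsed
            self.experimental, self.use_experimental = None, False
            return result
        result, elapsed = self._timed(self.best, *args)
        self.best_time = min(self.best_time, elapsed)
        return result

def version_a(n):
    return sum(range(n))

def version_b(n):
    return n * (n - 1) // 2

def version_c(n):
    return sum(i for i in range(n))

# Scripted clock readings (start, end) for three executions: 10, 4, 9 time units.
ticks = iter([0, 10, 10, 14, 20, 29])
section = CandidateSection(version_a, clock=lambda: next(ticks))
r1 = section.execute(5)                   # best-known path, measured at 10
section.install_experimental(version_b)
r2 = section.execute(5)                   # experimental, measured at 4: promoted
section.install_experimental(version_c)
r3 = section.execute(5)                   # experimental, measured at 9: discarded
```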

ADAPT – The Architecture
Optimization process is outside the critical path, decoupled from execution

ADAPT – The Language
ADAPT Language (AL)*
Features:
  • Uses an LL(1) grammar => simple parser
  • Domain-specific language with C-style format
  • Defines reserved words that at runtime contain useful input data and system information
* “A full description of ADAPT language is beyond the scope of this paper”, and by extension, this presentation.

ADAPT – An example

ADAPT – An example
  • Initialize some variables
  • Constraints
  • Interface to tool to be used
  • This block defines the heuristic

ADAPT – An example
Statement: Description
  • constraint(compile-time constraint): Supplies a compile-time constraint
  • apply_spec(condition, type, syntax[, params]): A description of a tool or flag
  • collect (event list) execute;: Initiates the monitoring of an experimental code version
  • mark_as_best: Specifies that the code variant that would be generated under the current runtime conditions is a new best known version
  • end_phase: Denotes the end of an optimization phase

ADAPT – Results
Test machines: 6-core Sun ULTRA Enterprise 4000, single-core Pentium II Linux workstation
Experiment: Result
  • Useless Copying: run a dynamically compiled version of the code without applying any optimization.
      • Less than ~5%
      • Some cases show a speed-up!
  • Specialization: loop bounds replaced by constants holding their runtime values.
      • Average improvement: E4000 13.6%, Pentium 2.2%
  • Flag Selection: experiment with various combinations of compiler flags.
      • Average improvement: E4000 35%, Pentium 9.2%
      • Identified some non-intuitive choices
  • Loop Unrolling: loops unrolled by factors that evenly divide the number of iterations of the innermost loop, up to a maximum factor of 10.
      • Average improvement: E4000 18%, Pentium 5%
  • Loop Tiling: loops deemed appropriate are tiled for ½, ¼, …, 1/16 of the L2 cache size.
      • Average improvement: E4000 13.5%, Pentium 9.8%
  • Parallelization: loops deemed appropriate by Polaris are parallelized.
      • Average improvement: E4000 51.8%
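The search spaces for the unrolling and tiling experiments follow directly from their descriptions. The helper names below are invented for illustration, and starting the unroll factors at 2 (skipping the no-op factor 1) is an assumption.

```python
def unroll_factors(iterations, max_factor=10):
    """Unroll factors from 2 to max_factor that evenly divide the trip count."""
    return [f for f in range(2, max_factor + 1) if iterations % f == 0]

def tile_sizes(l2_bytes):
    """Candidate tile footprints: 1/2, 1/4, 1/8 and 1/16 of the L2 cache size."""
    return [l2_bytes // (2 ** k) for k in range(1, 5)]

factors = unroll_factors(100)          # [2, 4, 5, 10]
tiles = tile_sizes(512 * 1024)         # [262144, 131072, 65536, 32768]
```

Both depend on values known only at runtime (the innermost loop's trip count) or only on the target machine (the L2 size), which is why they suit a runtime search.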

Today’s Take-aways
  • There’s advantage in doing runtime optimization
  • Can be applied to general-purpose programs as well
  • For full-blown runtime optimization, need to move the optimization process outside the critical path

if (questions(“?!”) == 1) delay();
THANK_YOU(“Have a great weekend!”);