- Number of slides: 37
Instruction Selection
Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University have explicit permission to make copies of these materials for their personal use.

The Problem
Writing a compiler is a lot of work
• Would like to reuse components whenever possible
• Would like to automate construction of components
[Diagram: Front End → Middle End → Back End, all resting on a shared Infrastructure]
Today's lecture: Automating Instruction Selection
• Front end construction is largely automated
• Middle is largely hand crafted
• (Parts of) back end can be automated

Definitions
Instruction selection
• Mapping IR into assembly code
• Assumes a fixed storage mapping & code shape
• Combining operations, using address modes
Instruction scheduling
• Reordering operations to hide latencies
• Assumes a fixed program (set of operations)
• Changes demand for registers
Register allocation
• Deciding which values will reside in registers
• Changes the storage mapping, may add false sharing
• Concerns about placement of data & memory operations

The Problem
Modern computers (still) have many ways to do anything
Consider register-to-register copy in ILOC
• Obvious operation is i2i ri ⇒ rj
• Many others exist
    addI    ri, 0 ⇒ rj        subI    ri, 0 ⇒ rj        lshiftI ri, 0 ⇒ rj
    multI   ri, 1 ⇒ rj        divI    ri, 1 ⇒ rj        rshiftI ri, 0 ⇒ rj
    orI     ri, 0 ⇒ rj        xorI    ri, 0 ⇒ rj        … and others …
• Human would ignore all of these
• Algorithm must look at all of them & find low-cost encoding
  - Take context into account (busy functional unit?)
And ILOC is an overly-simplified case

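The "find a low-cost encoding" step can be sketched in Python as a lookup over the alternative copy encodings listed on this slide. The cost figures, the `select_copy` name, and the `unavailable` parameter (standing in for "busy functional unit") are illustrative assumptions, not part of ILOC or of the lecture.

```python
# Hypothetical per-opcode costs in cycles; a real selector would read
# these from the target machine description.
COST = {"i2i": 1, "addI": 1, "subI": 1, "orI": 1, "xorI": 1,
        "lshiftI": 1, "rshiftI": 1, "multI": 3, "divI": 20}

# Each entry computes rj = ri; the immediate is the identity value
# for that opcode (None means the opcode takes no immediate).
COPY_ENCODINGS = [("i2i", None), ("addI", 0), ("subI", 0), ("orI", 0),
                  ("xorI", 0), ("lshiftI", 0), ("rshiftI", 0),
                  ("multI", 1), ("divI", 1)]


def select_copy(src, dst, unavailable=frozenset()):
    """Return the cheapest encoding of dst = src, skipping any opcode
    in `unavailable` (a crude stand-in for context such as a busy unit)."""
    candidates = [(op, imm) for op, imm in COPY_ENCODINGS
                  if op not in unavailable]
    # min() keeps the first of several equally cheap candidates.
    op, imm = min(candidates, key=lambda c: COST[c[0]])
    operands = src if imm is None else f"{src}, {imm}"
    return f"{op} {operands} => {dst}"


print(select_copy("ri", "rj"))
print(select_copy("ri", "rj", unavailable={"i2i"}))
```

With no context the sketch picks the obvious `i2i`; once `i2i` is ruled out it falls back to the next cheapest alternative, which is the kind of choice a human would not bother to enumerate.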
The Goal
Want to automate generation of instruction selectors
[Diagram: Front End → Middle End → Back End over a shared Infrastructure; a Machine description feeds a Back-end Generator, which produces Tables for the Pattern Matching Engine in the back end]
Description-based retargeting
Machine description should also help with scheduling & allocation

The Big Picture
Need pattern matching techniques
• Must produce good code (some metric for good)
• Must run quickly
Our treewalk code generator (Lec. 24) ran quickly
How good was the code?
[Tree: × with children IDENT <a,ARP,4> and IDENT <b,ARP,8>]
Treewalk Code
    loadI   4       ⇒ r5
    loadAO  r0, r5  ⇒ r6
    loadI   8       ⇒ r7
    loadAO  r0, r7  ⇒ r8
    mult    r6, r8  ⇒ r9
Desired Code
    loadAI  r0, 4   ⇒ r5
    loadAI  r0, 8   ⇒ r6
    mult    r5, r6  ⇒ r7
Pretty easy to fix. See 1st digression in Ch. 7

The Big Picture
Need pattern matching techniques
• Must produce good code (some metric for good)
• Must run quickly
Our treewalk code generator (Lec. 24) ran quickly
How good was the code?
[Tree: × with children IDENT and NUMBER <2>]
Treewalk Code
    loadI   4       ⇒ r5
    loadAO  r0, r5  ⇒ r6
    loadI   2       ⇒ r7
    mult    r6, r7  ⇒ r8
Desired Code
    loadAI  r0, 4   ⇒ r5
    multI   r5, 2   ⇒ r7
Must combine the loadI of 2 with the mult
This is a nonlocal problem

The Big Picture
Need pattern matching techniques
• Must produce good code (some metric for good)
• Must run quickly
Our treewalk code generator met the second criterion (Lec. 24)
How did it do on the first?
[Tree: × with an IDENT child]

How do we perform this kind of matching?
Tree-oriented IR suggests pattern matching on trees
• Tree-patterns as input, matcher as output
• Each pattern maps to a target-machine instruction sequence
• Use dynamic programming or bottom-up rewrite systems
Linear IR suggests using some sort of string matching
• Strings as input, matcher as output
• Each string maps to a target-machine instruction sequence
• Use text matching (Aho-Corasick) or peephole matching
In practice, both work well; matchers are quite different

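As a rough illustration of the tree-oriented approach, the Python sketch below picks the cheapest pattern at each node from the costs of its subtrees (a memoized, bottom-up cost computation in the spirit of dynamic programming) and then emits code for the chosen patterns. The tuple encoding of trees, the four patterns, the unit costs, and the names `best`, `emit`, and `select` are all assumptions made for this example; a generated matcher would be table-driven.

```python
from functools import lru_cache
from itertools import count

# Trees are nested tuples (an assumed encoding):
#   ("ident", offset)   value stored at offset from r0
#   ("num", value)      integer constant
#   ("mult", left, right)


@lru_cache(maxsize=None)
def best(node):
    """Return (cost, pattern) for the cheapest pattern rooted at node."""
    kind = node[0]
    if kind == "ident":
        return 1, "loadAI"
    if kind == "num":
        return 1, "loadI"
    if kind == "mult":
        _, left, right = node
        options = [(best(left)[0] + best(right)[0] + 1, "mult")]
        if right[0] == "num":
            # multI absorbs the constant, so the right subtree costs nothing
            options.append((best(left)[0] + 1, "multI"))
        return min(options)
    raise ValueError(f"no pattern for {kind!r}")


def emit(node, regs, out):
    """Append code for node to out; return the register holding its value."""
    _, pattern = best(node)
    if pattern == "mult":
        a = emit(node[1], regs, out)
        b = emit(node[2], regs, out)
        body = f"mult {a}, {b}"
    elif pattern == "multI":
        a = emit(node[1], regs, out)
        body = f"multI {a}, {node[2][1]}"
    elif pattern == "loadAI":
        body = f"loadAI r0, {node[1]}"
    else:
        body = f"loadI {node[1]}"
    dst = f"r{next(regs)}"
    out.append(f"{body} => {dst}")
    return dst


def select(tree, first_reg=5):
    out = []
    emit(tree, count(first_reg), out)
    return out


for op in select(("mult", ("ident", 4), ("num", 2))):
    print(op)
```

Because the choice at the `mult` node depends on what sits below it, the constant operand is folded into `multI` instead of being loaded into a register first.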
Peephole Matching
• Basic idea
• Compiler can discover local improvements locally
  - Look at a small set of adjacent operations
  - Move a "peephole" over code & search for improvement
• Classic example was store followed by load
  Original code
    storeAI r1      ⇒ r0, 8
    loadAI  r0, 8   ⇒ r15
  Improved code
    storeAI r1      ⇒ r0, 8
    i2i     r1      ⇒ r15
• Simple algebraic identities
  Original code
    addI    r2, 0   ⇒ r7
    mult    r4, r7  ⇒ r10
  Improved code
    mult    r4, r2  ⇒ r10
• Jump to a jump
  Original code
         jumpI → L10
    L10: jumpI → L11
  Improved code
         jumpI → L11
    L10: jumpI → L11

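The three classic patterns can be written as a small hand-coded pass in Python. The tuple encoding of operations and the `peephole` function are assumptions made for illustration. The algebraic-identity rewrite also assumes that the target of the `addI` is dead after the `mult`, which a real pass would have to verify with liveness information.

```python
# Assumed tuple encodings:
#   ("storeAI", src, base, offset)     ("loadAI", base, offset, dst)
#   ("addI", src, imm, dst)            ("mult", a, b, dst)
#   ("i2i", src, dst)   ("jumpI", label)   ("label", name)


def peephole(ops):
    """Apply three hand-coded patterns to adjacent operations."""
    out = list(ops)
    i = 0
    while i < len(out) - 1:
        a, b = out[i], out[i + 1]
        if a[0] == "storeAI" and b[0] == "loadAI" and a[2:4] == b[1:3]:
            # store then load of the same address: copy the stored register
            out[i + 1] = ("i2i", a[1], b[3])
        elif (a[0] == "addI" and a[2] == 0 and b[0] == "mult"
              and a[3] in b[1:3]):
            # x + 0 is a copy: forward x into the multiply and drop the add
            # (assumes a[3] has no later use)
            srcs = tuple(a[1] if s == a[3] else s for s in b[1:3])
            out[i:i + 2] = [("mult", *srcs, b[3])]
            continue  # re-examine the rewritten operation
        elif (a[0] == "jumpI" and b == ("label", a[1])
              and i + 2 < len(out) and out[i + 2][0] == "jumpI"):
            # jump to a jump: retarget the first jump
            out[i] = ("jumpI", out[i + 2][1])
        i += 1
    return out


print(peephole([("storeAI", "r1", "r0", 8), ("loadAI", "r0", 8, "r15")]))
print(peephole([("addI", "r2", 0, "r7"), ("mult", "r4", "r7", "r10")]))
print(peephole([("jumpI", "L10"), ("label", "L10"), ("jumpI", "L11")]))
```

A limited set of patterns like this is how early systems worked; the small window is what keeps each pass fast.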
Peephole Matching
Implementing it
• Early systems used limited set of hand-coded patterns
• Window size ensured quick processing
Modern peephole instruction selectors (Davidson)
• Break problem into three tasks
    IR → Expander (IR→LLIR) → LLIR → Simplifier (LLIR→LLIR) → LLIR → Matcher (LLIR→ASM) → ASM
• Apply symbolic interpretation & simplification systematically

Peephole Matching
Expander
• Turns IR code into a low-level IR (LLIR) such as RTL
• Operation-by-operation, template-driven rewriting
• LLIR form includes all direct effects (e.g., setting cc)
• Significant, albeit constant, expansion of size

Peephole Matching
Simplifier
• Looks at LLIR through window and rewrites it
• Uses forward substitution, algebraic simplification, local constant propagation, and dead-effect elimination
• Performs local optimization within window
• This is the heart of the peephole system
  - Benefit of peephole optimization shows up in this step

Peephole Matching
Matcher
• Compares simplified LLIR against a library of patterns
• Picks low-cost pattern that captures effects
• Must preserve LLIR effects, may add new ones (e.g., set cc)
• Generates the assembly code output

Example
Original IR Code
    OP     Arg1   Arg2   Result
    mult   2      y      t1
    sub    x      t1     w
Expand
LLIR Code
    r10 ← 2
    r11 ← @y
    r12 ← r0 + r11
    r13 ← MEM(r12)
    r14 ← r10 × r13
    r15 ← @x
    r16 ← r0 + r15
    r17 ← MEM(r16)
    r18 ← r17 - r14
    r19 ← @w
    r20 ← r0 + r19
    MEM(r20) ← r18

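The expansion step for this example can be sketched as template-driven rewriting in Python. The conventions here are assumptions chosen to reproduce the listing: integers are constants, names beginning with "t" are compiler temporaries kept in registers, every other name is a memory variable at offset `@name` from `r0`, and `<-` and `*` stand in for ← and ×. The `expand` function is hypothetical, not taken from any particular compiler.

```python
from itertools import count

SYMBOL = {"mult": "*", "sub": "-"}


def expand(quads, first_reg=10):
    """Rewrite (op, arg1, arg2, result) quadruples into LLIR strings."""
    regs = count(first_reg)
    temps = {}   # temporary name -> register that holds it
    llir = []

    def fresh():
        return f"r{next(regs)}"

    def address(name):
        # template: materialize the offset, then add the base register
        off, addr = fresh(), fresh()
        llir.append(f"{off} <- @{name}")
        llir.append(f"{addr} <- r0 + {off}")
        return addr

    def load(arg):
        if isinstance(arg, int):          # constant
            r = fresh()
            llir.append(f"{r} <- {arg}")
            return r
        if arg in temps:                  # temporary already in a register
            return temps[arg]
        addr = address(arg)               # memory variable
        r = fresh()
        llir.append(f"{r} <- MEM({addr})")
        return r

    for op, a1, a2, result in quads:
        x, y = load(a1), load(a2)
        r = fresh()
        llir.append(f"{r} <- {x} {SYMBOL[op]} {y}")
        if result.startswith("t"):        # assumed naming convention
            temps[result] = r
        else:
            llir.append(f"MEM({address(result)}) <- {r}")
    return llir


for line in expand([("mult", 2, "y", "t1"), ("sub", "x", "t1", "w")]):
    print(line)
```

Two IR operations become twelve LLIR operations, which illustrates the "significant, albeit constant" growth in size that the Simplifier is expected to undo.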
Example
LLIR Code
    r10 ← 2
    r11 ← @y
    r12 ← r0 + r11
    r13 ← MEM(r12)
    r14 ← r10 × r13
    r15 ← @x
    r16 ← r0 + r15
    r17 ← MEM(r16)
    r18 ← r17 - r14
    r19 ← @w
    r20 ← r0 + r19
    MEM(r20) ← r18
Simplify
LLIR Code
    r13 ← MEM(r0 + @y)
    r14 ← 2 × r13
    r17 ← MEM(r0 + @x)
    r18 ← r17 - r14
    MEM(r0 + @w) ← r18

Example
LLIR Code
    r13 ← MEM(r0 + @y)
    r14 ← 2 × r13
    r17 ← MEM(r0 + @x)
    r18 ← r17 - r14
    MEM(r0 + @w) ← r18
Match
ILOC Code
    loadAI  r0, @y    ⇒ r13
    multI   r13, 2    ⇒ r14
    loadAI  r0, @x    ⇒ r17
    sub     r17, r14  ⇒ r18
    storeAI r18       ⇒ r0, @w
• Introduced all memory operations & temporary names
• Turned out pretty good code

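A matcher for this example can be sketched in Python as a small library of (cost, LLIR pattern, ILOC template) entries, choosing the cheapest entry that matches each simplified operation. The regular expressions, the unit costs, and the `match_op` and `match_all` names are assumptions made for illustration; `<-`, `*`, and `=>` stand in for ←, ×, and ⇒.

```python
import re

# (cost, LLIR pattern, ILOC template); template slots index the regex groups
PATTERNS = [
    (1, re.compile(r"(r\d+) <- MEM\(r0 \+ (@\w+)\)"), "loadAI r0, {1} => {0}"),
    (1, re.compile(r"(r\d+) <- (\d+) \* (r\d+)"), "multI {2}, {1} => {0}"),
    (1, re.compile(r"(r\d+) <- (r\d+) \* (r\d+)"), "mult {1}, {2} => {0}"),
    (1, re.compile(r"(r\d+) <- (r\d+) - (r\d+)"), "sub {1}, {2} => {0}"),
    (1, re.compile(r"MEM\(r0 \+ (@\w+)\) <- (r\d+)"), "storeAI {1} => r0, {0}"),
]


def match_op(llir_op):
    """Return the ILOC text for the lowest-cost pattern matching llir_op."""
    hits = []
    for cost, pattern, template in PATTERNS:
        m = pattern.fullmatch(llir_op)
        if m:
            hits.append((cost, template.format(*m.groups())))
    if not hits:
        raise ValueError(f"no pattern matches {llir_op!r}")
    return min(hits)[1]


def match_all(llir):
    return [match_op(op) for op in llir]


simplified = [
    "r13 <- MEM(r0 + @y)",
    "r14 <- 2 * r13",
    "r17 <- MEM(r0 + @x)",
    "r18 <- r17 - r14",
    "MEM(r0 + @w) <- r18",
]
for line in match_all(simplified):
    print(line)
```

Each pattern preserves the effect of the LLIR operation it covers, so the matcher's only real decision is which matching pattern is cheapest.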
Steps of the Simplifier (3-operation window)
LLIR Code
    r10 ← 2
    r11 ← @y
    r12 ← r0 + r11
    r13 ← MEM(r12)
    r14 ← r10 × r13
    r15 ← @x
    r16 ← r0 + r15
    r17 ← MEM(r16)
    r18 ← r17 - r14
    r19 ← @w
    r20 ← r0 + r19
    MEM(r20) ← r18
Window contents, step by step
     1.  r10 ← 2              | r11 ← @y             | r12 ← r0 + r11
     2.  r10 ← 2              | r12 ← r0 + @y        | r13 ← MEM(r12)
     3.  r10 ← 2              | r13 ← MEM(r0 + @y)   | r14 ← r10 × r13
     4.  r13 ← MEM(r0 + @y)   | r14 ← 2 × r13        | r15 ← @x
     5.  r14 ← 2 × r13        | r15 ← @x             | r16 ← r0 + r15      (1st op has rolled out of window)
     6.  r14 ← 2 × r13        | r16 ← r0 + @x        | r17 ← MEM(r16)
     7.  r14 ← 2 × r13        | r17 ← MEM(r0 + @x)   | r18 ← r17 - r14
     8.  r17 ← MEM(r0 + @x)   | r18 ← r17 - r14      | r19 ← @w
     9.  r18 ← r17 - r14      | r19 ← @w             | r20 ← r0 + r19
    10.  r18 ← r17 - r14      | r20 ← r0 + @w        | MEM(r20) ← r18
    11.  r18 ← r17 - r14      | MEM(r0 + @w) ← r18

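A windowed simplifier for this example can be sketched in Python. It applies only two of the techniques named for the Simplifier: forward substitution of constants, and folding a base-plus-offset address into `MEM(...)`, with the substituted definition deleted when this was its only use. The `(target, source)` string encoding, the use counts standing in for liveness information, and the `simplify` function are assumptions made for illustration. The sketch also assumes each virtual register is defined once and that `r0` never changes. It lets the window fill lazily, so intermediate window contents can differ from a step-by-step trace on the slides even though the final code is the same.

```python
import re
from collections import Counter

CONST = re.compile(r"\d+|@\w+")        # literal or symbolic offset
ADDR = re.compile(r"r0 \+ @\w+")       # base + offset address expression


def sources(dst, src):
    """Registers read by an operation (including those in a MEM target)."""
    text = src + (" " + dst if dst.startswith("MEM(") else "")
    return re.findall(r"r\d+", text)


def simplify(ops, window=3):
    """ops: list of (target, source) strings, e.g. ("r12", "r0 + r11")."""
    uses = Counter(r for dst, src in ops for r in sources(dst, src))
    out, win = [], []
    for dst, src in ops:
        for d_dst, d_src in list(win):
            if d_dst not in sources(dst, src):
                continue
            if CONST.fullmatch(d_src):
                # forward-substitute a constant into every use in this op
                pat = rf"\b{d_dst}\b"
                src = re.sub(pat, lambda m: d_src, src)
                if dst.startswith("MEM("):
                    dst = re.sub(pat, lambda m: d_src, dst)
            elif ADDR.fullmatch(d_src) and f"MEM({d_dst})" in dst + src:
                # fold an address computation into the memory reference
                src = src.replace(f"MEM({d_dst})", f"MEM({d_src})")
                dst = dst.replace(f"MEM({d_dst})", f"MEM({d_src})")
            else:
                continue
            if uses[d_dst] == 1:       # that was the only use: now dead
                win.remove((d_dst, d_src))
        win.append((dst, src))
        if len(win) > window:
            out.append(win.pop(0))     # oldest op rolls out of the window
    return out + win


llir = [("r10", "2"), ("r11", "@y"), ("r12", "r0 + r11"), ("r13", "MEM(r12)"),
        ("r14", "r10 * r13"), ("r15", "@x"), ("r16", "r0 + r15"),
        ("r17", "MEM(r16)"), ("r18", "r17 - r14"), ("r19", "@w"),
        ("r20", "r0 + r19"), ("MEM(r20)", "r18")]
for dst, src in simplify(llir):
    print(f"{dst} <- {src}")
```

The loaded value `r13` and the product `r14` are deliberately left alone: substituting them would build expressions that no single target instruction could match.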
Making It All Work
Details
• LLIR is largely machine independent (RTL)
• Target machine described as LLIR → ASM pattern
• Actual pattern matching
  - Use a hand-coded pattern matcher (gcc)
  - Turn patterns into grammar & use LR parser (VPO)
• Several important compilers use this technology
• It seems to produce good portable instruction selectors
Key strength appears to be late low-level optimization


