  • Number of slides: 37

Instruction Selection
Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University have explicit permission to make copies of these materials for their personal use.

The Problem
Writing a compiler is a lot of work
• Would like to reuse components whenever possible
• Would like to automate construction of components
[Figure: Front End → Middle End → Back End, built on a shared Infrastructure]
Today’s lecture: Automating Instruction Selection
• Front end construction is largely automated
• Middle end is largely hand crafted
• (Parts of) back end can be automated

Definitions
Instruction selection
• Mapping IR into assembly code
• Assumes a fixed storage mapping & code shape
• Combining operations, using address modes
Instruction scheduling
• Reordering operations to hide latencies
• Assumes a fixed program (set of operations)
• Changes demand for registers
Register allocation
• Deciding which values will reside in registers
• Changes the storage mapping, may add false sharing
• Concerns about placement of data & memory operations

The Problem
Modern computers (still) have many ways to do anything
Consider register-to-register copy in ILOC
• Obvious operation is i2i ri ⇒ rj
• Many others exist
    addI    ri, 0 ⇒ rj
    subI    ri, 0 ⇒ rj
    lshiftI ri, 0 ⇒ rj
    multI   ri, 1 ⇒ rj
    divI    ri, 1 ⇒ rj
    rshiftI ri, 0 ⇒ rj
    orI     ri, 0 ⇒ rj
    xorI    ri, 0 ⇒ rj
    … and others …
• Human would ignore all of these
• Algorithm must look at all of them & find low-cost encoding
    Take context into account (busy functional unit?)
And ILOC is an overly-simplified case
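
A minimal sketch of what "look at all of them & find low-cost encoding" could mean for the copy example. The functional-unit names and latencies below are invented for illustration; they are not part of ILOC or of the lecture.

```python
# Candidate encodings of "copy ri into rj", taken from the slide.
# The unit names and cycle counts are hypothetical illustration values.
CANDIDATES = [
    # (ILOC text, functional unit, latency in cycles)
    ("i2i     ri => rj", "alu", 1),
    ("addI    ri, 0 => rj", "alu", 1),
    ("subI    ri, 0 => rj", "alu", 1),
    ("orI     ri, 0 => rj", "alu", 1),
    ("xorI    ri, 0 => rj", "alu", 1),
    ("lshiftI ri, 0 => rj", "shift", 1),
    ("rshiftI ri, 0 => rj", "shift", 1),
    ("multI   ri, 1 => rj", "mul", 3),
    ("divI    ri, 1 => rj", "div", 20),
]


def cheapest_copy(busy_units=frozenset()):
    """Return the lowest-latency candidate whose functional unit is free."""
    usable = [c for c in CANDIDATES if c[1] not in busy_units]
    if not usable:
        raise ValueError("every functional unit is busy")
    # min() keeps the first of several equally cheap candidates.
    return min(usable, key=lambda c: c[2])


if __name__ == "__main__":
    print(cheapest_copy())          # context-free choice
    print(cheapest_copy({"alu"}))   # choice when the ALU is busy
```

With no context the obvious i2i wins; once the (assumed) ALU is marked busy, a shift-based copy becomes the cheapest choice, which is the kind of context sensitivity the slide alludes to.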

The Goal
Want to automate generation of instruction selectors
[Figure: Front End → Middle End → Back End, built on a shared Infrastructure; a Machine description feeds a Back-end Generator, which produces Tables for a Pattern Matching Engine in the back end]
Description-based retargeting
Machine description should also help with scheduling & allocation

The Big Picture
Need pattern matching techniques
• Must produce good code (some metric for good)
• Must run quickly
Our treewalk code generator (Lec. 24) ran quickly
How good was the code?
Tree: x with children IDENT and IDENT
Treewalk Code
    loadI  4      ⇒ r5
    loadAO r0, r5 ⇒ r6
    loadI  8      ⇒ r7
    loadAO r0, r7 ⇒ r8
    mult   r6, r8 ⇒ r9
Desired Code
    loadAI r0, 4  ⇒ r5
    loadAI r0, 8  ⇒ r6
    mult   r5, r6 ⇒ r7

The Big Picture
Need pattern matching techniques
• Must produce good code (some metric for good)
• Must run quickly
Our treewalk code generator (Lec. 24) ran quickly
How good was the code?
Tree: x with children IDENT and IDENT
Treewalk Code
    loadI  4      ⇒ r5
    loadAO r0, r5 ⇒ r6
    loadI  8      ⇒ r7
    loadAO r0, r7 ⇒ r8
    mult   r6, r8 ⇒ r9
Desired Code
    loadAI r0, 4  ⇒ r5
    loadAI r0, 8  ⇒ r6
    mult   r5, r6 ⇒ r7
Pretty easy to fix. See 1st digression in Ch. 7

The Big Picture
Need pattern matching techniques
• Must produce good code (some metric for good)
• Must run quickly
Our treewalk code generator (Lec. 24) ran quickly
How good was the code?
Tree: x with children IDENT and NUMBER <2>
Treewalk Code
    loadI  4      ⇒ r5
    loadAO r0, r5 ⇒ r6
    loadI  2      ⇒ r7
    mult   r6, r7 ⇒ r8
Desired Code
    loadAI r0, 4  ⇒ r5
    multI  r5, 2  ⇒ r7

The Big Picture
Need pattern matching techniques
• Must produce good code (some metric for good)
• Must run quickly
Our treewalk code generator (Lec. 24) ran quickly
How good was the code?
Tree: x with children IDENT and NUMBER <2>
Treewalk Code
    loadI  4      ⇒ r5
    loadAO r0, r5 ⇒ r6
    loadI  2      ⇒ r7
    mult   r6, r7 ⇒ r8
Desired Code
    loadAI r0, 4  ⇒ r5
    multI  r5, 2  ⇒ r7
Must combine these (the loadI of 2 and the mult)
This is a nonlocal problem

The Big Picture
Need pattern matching techniques
• Must produce good code (some metric for good)
• Must run quickly
Our treewalk code generator (Lec. 24) ran quickly
How good was the code?
Tree: x with children IDENT and IDENT
Treewalk Code
    loadI  @G      ⇒ r5
    loadI  4       ⇒ r6
    loadAO r5, r6  ⇒ r7
    loadI  @H      ⇒ r8
    loadI  4       ⇒ r9
    loadAO r8, r9  ⇒ r10
    mult   r7, r10 ⇒ r11
Desired Code
    loadI  4       ⇒ r5
    loadAI r5, @G  ⇒ r6
    loadAI r5, @H  ⇒ r7
    mult   r6, r7  ⇒ r8

The Big Picture
Need pattern matching techniques
• Must produce good code (some metric for good)
• Must run quickly
Our treewalk code generator met the second criterion (Lec. 24)
How did it do on the first?
Tree: x with children IDENT and IDENT (common offset)
Treewalk Code
    loadI  @G      ⇒ r5
    loadI  4       ⇒ r6
    loadAO r5, r6  ⇒ r7
    loadI  @H      ⇒ r8
    loadI  4       ⇒ r9
    loadAO r8, r9  ⇒ r10
    mult   r7, r10 ⇒ r11
Desired Code
    loadI  4       ⇒ r5
    loadAI r5, @G  ⇒ r6
    loadAI r5, @H  ⇒ r7
    mult   r6, r7  ⇒ r8
Again, a nonlocal problem

How do we perform this kind of matching?
Tree-oriented IR suggests pattern matching on trees
• Tree-patterns as input, matcher as output
• Each pattern maps to a target-machine instruction sequence
• Use dynamic programming or bottom-up rewrite systems
Linear IR suggests using some sort of string matching
• Strings as input, matcher as output
• Each string maps to a target-machine instruction sequence
• Use text matching (Aho-Corasick) or peephole matching
In practice, both work well; matchers are quite different

Peephole Matching
• Basic idea
• Compiler can discover local improvements locally
    Look at a small set of adjacent operations
    Move a “peephole” over code & search for improvement
• Classic example was store followed by load
Original code
    storeAI r1    ⇒ r0, 8
    loadAI  r0, 8 ⇒ r15
Improved code
    storeAI r1    ⇒ r0, 8
    i2i     r1    ⇒ r15

Peephole Matching
• Basic idea
• Compiler can discover local improvements locally
    Look at a small set of adjacent operations
    Move a “peephole” over code & search for improvement
• Classic example was store followed by load
• Simple algebraic identities
Original code
    addI r2, 0  ⇒ r7
    mult r4, r7 ⇒ r10
Improved code
    mult r4, r2 ⇒ r10

Peephole Matching
• Basic idea
• Compiler can discover local improvements locally
    Look at a small set of adjacent operations
    Move a “peephole” over code & search for improvement
• Classic example was store followed by load
• Simple algebraic identities
• Jump to a jump
Original code
         jumpI → L10
    L10: jumpI → L11
Improved code
         jumpI → L11
    L10: jumpI → L11
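
The three classic patterns above can be sketched as a hand-coded, two-operation peephole pass. This is only an illustration: the `Op` tuple and the `peephole` function are hypothetical names, and the algebraic rewrite assumes the temporary defined by `addI` is dead after the `mult`, which a real optimizer would have to verify.

```python
from typing import NamedTuple, Optional, Tuple


class Op(NamedTuple):
    """One ILOC-like operation (hypothetical representation)."""
    opcode: str
    srcs: Tuple[str, ...]
    dsts: Tuple[str, ...]
    label: Optional[str] = None


def peephole(code):
    # Labels whose operation is an unconditional jump: label -> its target.
    jump_at = {op.label: op.dsts[0] for op in code
               if op.label is not None and op.opcode == "jumpI"}
    out = []
    for op in code:
        prev = out[-1] if out else None
        # Only pair op with prev if control cannot enter between them.
        adjacent = prev is not None and op.label is None
        if (adjacent and prev.opcode == "storeAI" and op.opcode == "loadAI"
                and op.srcs == prev.dsts):
            # Store followed by load of the same address: copy the register.
            op = Op("i2i", prev.srcs, op.dsts)
        elif (adjacent and prev.opcode == "addI" and prev.srcs[1] == "0"
                and op.opcode == "mult" and prev.dsts[0] in op.srcs):
            # x + 0 = x: use x directly and drop the addI.
            # ASSUMPTION: the addI result is not used after the mult.
            out.pop()
            srcs = tuple(prev.srcs[0] if s == prev.dsts[0] else s
                         for s in op.srcs)
            op = Op("mult", srcs, op.dsts, prev.label)
        elif op.opcode == "jumpI" and op.dsts[0] in jump_at:
            # Jump to a jump: branch straight to the final target.
            op = op._replace(dsts=(jump_at[op.dsts[0]],))
        out.append(op)
    return out


if __name__ == "__main__":
    for op in peephole([Op("storeAI", ("r1",), ("r0", "8")),
                        Op("loadAI", ("r0", "8"), ("r15",))]):
        print(op)
```

Early systems worked roughly this way, with a small fixed set of patterns; the limited window is what keeps the pass fast.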

Peephole Matching
Implementing it
• Early systems used limited set of hand-coded patterns
• Window size ensured quick processing
Modern peephole instruction selectors (Davidson)
• Break problem into three tasks
[Figure: IR → Expander (IR → LLIR) → LLIR → Simplifier (LLIR → LLIR) → LLIR → Matcher (LLIR → ASM) → ASM]
• Apply symbolic interpretation & simplification systematically

Peephole Matching
Expander
• Turns IR code into a low-level IR (LLIR) such as RTL
• Operation-by-operation, template-driven rewriting
• LLIR form includes all direct effects (e.g., setting cc)
• Significant, albeit constant, expansion of size
[Figure: IR → Expander (IR → LLIR) → LLIR → Simplifier (LLIR → LLIR) → LLIR → Matcher (LLIR → ASM) → ASM]

Peephole Matching
Simplifier
• Looks at LLIR through window and rewrites it
• Uses forward substitution, algebraic simplification, local constant propagation, and dead-effect elimination
• Performs local optimization within window
• This is the heart of the peephole system
Benefit of peephole optimization shows up in this step
[Figure: IR → Expander (IR → LLIR) → LLIR → Simplifier (LLIR → LLIR) → LLIR → Matcher (LLIR → ASM) → ASM]

Peephole Matching
Matcher
• Compares simplified LLIR against a library of patterns
• Picks low-cost pattern that captures effects
• Must preserve LLIR effects, may add new ones (e.g., set cc)
• Generates the assembly code output
[Figure: IR → Expander (IR → LLIR) → LLIR → Simplifier (LLIR → LLIR) → LLIR → Matcher (LLIR → ASM) → ASM]

Example
Original IR Code
    OP    Arg1  Arg2  Result
    mult  2     y     t1
    sub   x     t1    w
Expand ↓
LLIR Code
    r10 ← 2
    r11 ← @y
    r12 ← r0 + r11
    r13 ← MEM(r12)
    r14 ← r10 x r13
    r15 ← @x
    r16 ← r0 + r15
    r17 ← MEM(r16)
    r18 ← r17 - r14
    r19 ← @w
    r20 ← r0 + r19
    MEM(r20) ← r18
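
The expansion step can be sketched as operation-by-operation template rewriting. The code below is an illustration under assumptions, not the lecture's implementation: `expand` is a hypothetical name, names of the form `t<digits>` are treated as compiler temporaries that live in registers, every other name is a variable stored at offset `@name` from `r0`, and ASCII `<-` stands in for the assignment arrow.

```python
import itertools

# Operator symbols used in the LLIR text (the slides write "x" for multiply).
SYMBOL = {"mult": "x", "sub": "-", "add": "+"}


def expand(quads, first_reg=10):
    """Expand (op, arg1, arg2, result) quadruples into LLIR strings."""
    regs = (f"r{n}" for n in itertools.count(first_reg))
    temps = {}  # compiler temporary -> register that holds its value
    out = []

    def is_temp(name):
        # ASSUMPTION: temporaries are spelled t1, t2, ...
        return name[0] == "t" and name[1:].isdigit()

    def address(name):
        # Template: materialize the offset, then add the base register r0.
        off = next(regs)
        out.append(f"{off} <- @{name}")
        addr = next(regs)
        out.append(f"{addr} <- r0 + {off}")
        return addr

    def operand(arg):
        if is_temp(arg):
            return temps[arg]
        reg = None
        if arg.isdigit():
            reg = next(regs)
            out.append(f"{reg} <- {arg}")       # constant into a register
        else:
            addr = address(arg)
            reg = next(regs)
            out.append(f"{reg} <- MEM({addr})")  # load the variable
        return reg

    for op, arg1, arg2, result in quads:
        left = operand(arg1)
        right = operand(arg2)
        value = next(regs)
        out.append(f"{value} <- {left} {SYMBOL[op]} {right}")
        if is_temp(result):
            temps[result] = value
        else:
            addr = address(result)
            out.append(f"MEM({addr}) <- {value}")  # store to the variable
    return out


if __name__ == "__main__":
    for line in expand([("mult", "2", "y", "t1"), ("sub", "x", "t1", "w")]):
        print(line)
```

Starting register numbering at r10, the sketch yields the twelve LLIR operations on the slide, which shows the "significant, albeit constant" growth in size: two IR operations become twelve.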

Example
LLIR Code
    r10 ← 2
    r11 ← @y
    r12 ← r0 + r11
    r13 ← MEM(r12)
    r14 ← r10 x r13
    r15 ← @x
    r16 ← r0 + r15
    r17 ← MEM(r16)
    r18 ← r17 - r14
    r19 ← @w
    r20 ← r0 + r19
    MEM(r20) ← r18
Simplify ↓
LLIR Code
    r13 ← MEM(r0 + @y)
    r14 ← 2 x r13
    r17 ← MEM(r0 + @x)
    r18 ← r17 - r14
    MEM(r0 + @w) ← r18

Example
LLIR Code
    r13 ← MEM(r0 + @y)
    r14 ← 2 x r13
    r17 ← MEM(r0 + @x)
    r18 ← r17 - r14
    MEM(r0 + @w) ← r18
Match ↓
ILOC Code
    loadAI  r0, @y   ⇒ r13
    multI   r13, 2   ⇒ r14
    loadAI  r0, @x   ⇒ r17
    sub     r17, r14 ⇒ r18
    storeAI r18      ⇒ r0, @w
• Introduced all memory operations & temporary names
• Turned out pretty good code
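
A minimal sketch of the matching step, assuming a tiny pattern library written as regular expressions over LLIR text. The four patterns and their costs are illustrative only and cover just the shapes in this example; ASCII `<-` and `=>` stand in for the arrows.

```python
import re

REG = r"(r\d+)"
IMM = r"(@\w+|\d+)"

# (LLIR pattern, ILOC template, cost). Costs are hypothetical.
PATTERNS = [
    (re.compile(rf"{REG} <- MEM\({REG} \+ {IMM}\)"), r"loadAI \2, \3 => \1", 2),
    (re.compile(rf"{REG} <- {IMM} x {REG}"), r"multI \3, \2 => \1", 3),
    (re.compile(rf"{REG} <- {REG} - {REG}"), r"sub \2, \3 => \1", 1),
    (re.compile(rf"MEM\({REG} \+ {IMM}\) <- {REG}"), r"storeAI \3 => \1, \2", 2),
]


def match(llir):
    """Map each simplified LLIR operation to its cheapest ILOC encoding."""
    asm = []
    for op in llir:
        choices = []
        for pattern, template, cost in PATTERNS:
            m = pattern.fullmatch(op)
            if m:
                choices.append((cost, m.expand(template)))
        if not choices:
            raise ValueError(f"no pattern covers: {op}")
        asm.append(min(choices)[1])  # pick the low-cost pattern
    return asm


if __name__ == "__main__":
    simplified = [
        "r13 <- MEM(r0 + @y)",
        "r14 <- 2 x r13",
        "r17 <- MEM(r0 + @x)",
        "r18 <- r17 - r14",
        "MEM(r0 + @w) <- r18",
    ]
    for line in match(simplified):
        print(line)
```

Here each operation matches exactly one pattern, so the cost comparison is trivial; with a richer library several patterns could cover the same LLIR operation and the cost field would decide.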

Steps of the Simplifier (3-operation window)
LLIR Code
    r10 ← 2
    r11 ← @y
    r12 ← r0 + r11
    r13 ← MEM(r12)
    r14 ← r10 x r13
    r15 ← @x
    r16 ← r0 + r15
    r17 ← MEM(r16)
    r18 ← r17 - r14
    r19 ← @w
    r20 ← r0 + r19
    MEM(r20) ← r18

Steps of the Simplifier (3-operation window)
Window
    r10 ← 2
    r11 ← @y
    r12 ← r0 + r11

Steps of the Simplifier (3-operation window)
Window before
    r10 ← 2
    r11 ← @y
    r12 ← r0 + r11
Window after
    r10 ← 2
    r12 ← r0 + @y
    r13 ← MEM(r12)

Steps of the Simplifier (3-operation window)
Window before
    r10 ← 2
    r12 ← r0 + @y
    r13 ← MEM(r12)
Window after
    r10 ← 2
    r13 ← MEM(r0 + @y)
    r14 ← r10 x r13

Steps of the Simplifier (3-operation window)
Window before
    r10 ← 2
    r13 ← MEM(r0 + @y)
    r14 ← r10 x r13
Window after
    r13 ← MEM(r0 + @y)
    r14 ← 2 x r13
    r15 ← @x

Steps of the Simplifier (3-operation window)
Window before
    r13 ← MEM(r0 + @y)
    r14 ← 2 x r13
    r15 ← @x
Window after
    r14 ← 2 x r13
    r15 ← @x
    r16 ← r0 + r15
The 1st op has rolled out of the window

Steps of the Simplifier (3-operation window)
Window before
    r14 ← 2 x r13
    r15 ← @x
    r16 ← r0 + r15
Window after
    r14 ← 2 x r13
    r16 ← r0 + @x
    r17 ← MEM(r16)

Steps of the Simplifier (3-operation window)
Window before
    r14 ← 2 x r13
    r16 ← r0 + @x
    r17 ← MEM(r16)
Window after
    r14 ← 2 x r13
    r17 ← MEM(r0 + @x)
    r18 ← r17 - r14

Steps of the Simplifier (3-operation window)
Window before
    r14 ← 2 x r13
    r17 ← MEM(r0 + @x)
    r18 ← r17 - r14
Window after
    r17 ← MEM(r0 + @x)
    r18 ← r17 - r14
    r19 ← @w

Steps of the Simplifier (3-operation window)
Window before
    r17 ← MEM(r0 + @x)
    r18 ← r17 - r14
    r19 ← @w
Window after
    r18 ← r17 - r14
    r19 ← @w
    r20 ← r0 + r19

Steps of the Simplifier (3-operation window)
Window before
    r18 ← r17 - r14
    r19 ← @w
    r20 ← r0 + r19
Window after
    r18 ← r17 - r14
    r20 ← r0 + @w
    MEM(r20) ← r18

Steps of the Simplifier (3-operation window)
Window before
    r18 ← r17 - r14
    r20 ← r0 + @w
    MEM(r20) ← r18
Window after
    r18 ← r17 - r14
    MEM(r0 + @w) ← r18

Example
LLIR Code
    r10 ← 2
    r11 ← @y
    r12 ← r0 + r11
    r13 ← MEM(r12)
    r14 ← r10 x r13
    r15 ← @x
    r16 ← r0 + r15
    r17 ← MEM(r16)
    r18 ← r17 - r14
    r19 ← @w
    r20 ← r0 + r19
    MEM(r20) ← r18
Simplify ↓
LLIR Code
    r13 ← MEM(r0 + @y)
    r14 ← 2 x r13
    r17 ← MEM(r0 + @x)
    r18 ← r17 - r14
    MEM(r0 + @w) ← r18
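
The window-by-window steps can be sketched as a small forward-substitution loop. This is an illustration under stated assumptions, not the Davidson-style simplifier itself: each register is defined once, no register is live after the block, a substitution is kept only if the result still fits one of a few hypothetical "legal" LLIR shapes, and ASCII `<-` stands in for the arrow.

```python
import re

REG, IMM = r"r\d+", r"(?:@\w+|\d+)"
ADDR = rf"{REG}(?: \+ (?:{REG}|{IMM}))?"
# Shapes assumed to correspond to some target operation (hypothetical set).
LEGAL = [re.compile(p) for p in (
    rf"{REG} <- {IMM}",
    rf"{REG} <- {REG} [+\-x] (?:{REG}|{IMM})",
    rf"{REG} <- {IMM} x {REG}",
    rf"{REG} <- MEM\({ADDR}\)",
    rf"MEM\({ADDR}\) <- {REG}",
)]


def is_legal(op):
    return any(p.fullmatch(op) for p in LEGAL)


def uses(op, reg):
    """True if op reads reg (right-hand side or a MEM(...) address)."""
    lhs, rhs = op.split(" <- ")
    text = rhs + (" " + lhs if lhs.startswith("MEM") else "")
    return re.search(rf"\b{reg}\b", text) is not None


def substitute(op, reg, expr):
    lhs, rhs = op.split(" <- ")
    pat = rf"\b{reg}\b"
    rhs = re.sub(pat, lambda m: expr, rhs)
    if lhs.startswith("MEM"):
        lhs = re.sub(pat, lambda m: expr, lhs)
    return f"{lhs} <- {rhs}"


def step(win, rest):
    """Try one forward substitution inside the window; True if it changed."""
    for i, op in enumerate(win):
        dst, expr = op.split(" <- ")
        if dst.startswith("MEM"):
            continue  # a store defines memory, not a register
        users = [j for j in range(i + 1, len(win)) if uses(win[j], dst)]
        if not users:
            continue
        j = users[0]
        if any(uses(o, dst) for o in win[j + 1:] + rest):
            continue  # dst is still needed later, keep its definition
        new = substitute(win[j], dst, expr)
        if is_legal(new):
            win[j] = new
            del win[i]  # the definition is now dead
            return True
    return False


def simplify(ops, window=3):
    rest, win, out = list(ops), [], []
    while rest or win:
        while rest and len(win) < window:
            win.append(rest.pop(0))
        if not step(win, rest):
            out.append(win.pop(0))  # first op rolls out of the window
    return out


if __name__ == "__main__":
    llir = ["r10 <- 2", "r11 <- @y", "r12 <- r0 + r11", "r13 <- MEM(r12)",
            "r14 <- r10 x r13", "r15 <- @x", "r16 <- r0 + r15",
            "r17 <- MEM(r16)", "r18 <- r17 - r14", "r19 <- @w",
            "r20 <- r0 + r19", "MEM(r20) <- r18"]
    for line in simplify(llir):
        print(line)
```

The legality check is what stops the loop from folding a load into the multiply or the subtract: the combined form fits no assumed machine shape, so the operation simply rolls out of the window, as on the slides.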

Making It All Work
Details
• LLIR is largely machine independent (RTL)
• Target machine described as LLIR → ASM pattern
• Actual pattern matching
    Use a hand-coded pattern matcher (gcc)
    Turn patterns into grammar & use LR parser (VPO)
• Several important compilers use this technology
• It seems to produce good portable instruction selectors
Key strength appears to be late low-level optimization