- Number of slides: 24
RegionScout: Exploiting Coarse Grain Sharing in Snoop Coherence
www.eecg.toronto.edu/aenao
Andreas Moshovos, moshovos@eecg.toronto.edu
Moshovos © 1
Improving Snoop Coherence
[Figure: CPU, I$, D$, interconnect, Main Memory]
- Conventional Considerations: Complexity and Correctness, NOT Power/Bandwidth
- Can we: (1) Reduce Power/Bandwidth (2) Leverage snoop coherence?
  - Remains Attractive: Simple / Design Re-use
- Yes: Exploit Program Behavior to Dynamically Identify Requests that do not Need Snooping
RegionScout: Avoid Some Snoops
[Figure: CPU, I$, D$, interconnect, Main Memory]
- Frequent case: non-sharing even at a coarse level/Region
- RegionScout: Dynamically Identify Non-Shared Regions
  - First Request to a Region Identifies it as not Shared
  - Subsequent Requests do not need to be broadcast
- Uses Imprecise Information
  - Small structures
  - Layer on top of conventional coherence
  - No additional constraints
Roadmap
- Conventional Coherence:
  - The need for power-aware designs
- Potential: Program Behavior
- RegionScout: What and How
- Implementation
- Evaluation
- Summary
Coherence Basics
[Figure: CPUs and Main Memory; labels: X, snoop, snoop, hit]
- Given request for memory block X (address)
- Detect where its current value resides
Conventional Coherence not Power-Aware/Bandwidth-Effective
[Figure: CPUs, L2, Main Memory; labels: miss, miss]
- All L2 tags see all accesses
- Perf. & Complexity: Have L2 tags, why not use them
- Power: All L2 tags consume power on all accesses
- Bandwidth: broadcast all coherent requests
RegionScout Motivation: Sharing is Coarse
[Figure: Typical Memory Space Snapshot, colored by owner(s); axis: addresses]
- Region: large contiguous memory area, power of 2 size
- CPU X asks for data block in region R
  1. No one else has X
  2. No one else has any block in R
- RegionScout Exploits this Behavior
- Layered Extension over Snoop Coherence
Optimization Opportunities
[Figure: CPU, I$, D$, SWITCH, Memory]
- Power and Bandwidth
  - Originating node: avoid asking others
  - Remote node: avoid tag lookup
Potential: Region Miss Frequency
[Chart: Global Region Misses (% of all requests) vs. Region Size; "better" direction marker]
- Even with a 16K Region, ~45% of requests miss in all remote nodes
RegionScout at Work: Non-Shared Region Discovery
[Figure: two CPUs and Main Memory, steps numbered 1 to 3; labels: Region Miss, Global Region Miss, Record: Non-Shared Regions, Record: Locally Cached Regions]
- First request detects a non-shared region
RegionScout at Work: Avoiding Snoops
[Figure: two CPUs and Main Memory, steps numbered 1 and 2; labels: Global Region Miss, Record: Non-Shared Regions, Record: Locally Cached Regions]
- Subsequent request avoids snoops
RegionScout is Self-Correcting
[Figure: two CPUs and Main Memory, steps numbered 1, 2, 2; labels: Record: Non-Shared Regions, Record: Locally Cached Regions]
- Request from another node invalidates non-shared record
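The three "at Work" slides (discovery, avoiding snoops, self-correction) can be traced with a small Python sketch. It is only an illustration under simplifying assumptions that are not in the slides: every request allocates a block, nothing is ever evicted, each node tracks its cached regions exactly, and the names `Node`, `request`, and `REGION_BITS` are hypothetical.

```python
REGION_BITS = 14  # assumed 16 KB regions; any power-of-2 size works


class Node:
    """One processor node with the two records shown on the slides."""

    def __init__(self, name):
        self.name = name
        self.cached_regions = {}  # region -> number of locally cached blocks
        self.non_shared = set()   # regions discovered to be non-shared

    def snoop(self, region):
        """Remote side: a broadcast request from another node arrives."""
        # Self-correction: someone else is touching this region now.
        self.non_shared.discard(region)
        return self.cached_regions.get(region, 0) > 0


def request(node, others, addr):
    """Issue a request; return True if it was broadcast, False if avoided."""
    region = addr >> REGION_BITS
    broadcast = region not in node.non_shared
    if broadcast:
        # Every remote node must see the snoop, so build the full list
        # instead of short-circuiting with any(generator).
        hits = [other.snoop(region) for other in others]
        if not any(hits):
            node.non_shared.add(region)  # global region miss
    # Simplification: the requested block is always allocated locally.
    node.cached_regions[region] = node.cached_regions.get(region, 0) + 1
    return broadcast
```

With two nodes A and B, A's first request to a region is broadcast and records the region as non-shared, A's second request to the same region skips the broadcast, and a later request from B removes A's non-shared record so that A broadcasts again.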
Implementation: Requirements
- Requesting Node provides address:
  [Figure: address = Region Tag | offset; the offset is lg(Region Size) bits wide]
- At Originating Node (from CPU):
  - Have I discovered that this region is not shared?
- At Remote Nodes (from Interconnect):
  - Do I have a block in the region?
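The address split on this slide can be sketched in Python as follows. The 16 KB region size is just one of the sizes the slides mention, and the helper names `split_address` and `same_region` are hypothetical.

```python
REGION_SIZE = 16 * 1024                     # bytes; must be a power of 2
OFFSET_BITS = REGION_SIZE.bit_length() - 1  # lg(Region Size) = 14 here


def split_address(addr):
    """Return (region_tag, offset) for a physical address."""
    region_tag = addr >> OFFSET_BITS
    offset = addr & (REGION_SIZE - 1)
    return region_tag, offset


def same_region(addr_a, addr_b):
    """Two addresses fall in the same region iff their region tags match."""
    return (addr_a >> OFFSET_BITS) == (addr_b >> OFFSET_BITS)
```

For example, `split_address(0x12345678)` gives region tag `0x48D1` and offset `0x1678`. Both questions on the slide are then lookups keyed by the region tag alone.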
Remembering Non-Shared Regions
[Figure: address (Region Tag | offset) looked up in the Non-Shared Region Table; each entry has a valid bit; few entries, 16 x 4 in most experiments]
- Records non-shared regions
- Lookup by Region portion prior to issuing a request
- Snoop requests and invalidate
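A minimal Python sketch of such a table follows. It reads "16 x 4" as 16 sets of 4 ways, which is an assumption, and the modulo indexing and oldest-first replacement are also assumptions chosen for brevity; the slide does not specify either. The class and method names are hypothetical.

```python
class NonSharedRegionTable:
    """Small set-associative table of region tags believed to be non-shared."""

    def __init__(self, sets=16, ways=4):
        self.sets = sets
        self.ways = ways
        # Each set holds up to `ways` region tags, oldest first.
        self.table = [[] for _ in range(sets)]

    def _set_for(self, region_tag):
        return self.table[region_tag % self.sets]  # assumed index function

    def record(self, region_tag):
        """Remember a region after a global region miss."""
        entries = self._set_for(region_tag)
        if region_tag in entries:
            return
        if len(entries) == self.ways:
            entries.pop(0)  # assumed replacement policy: drop the oldest
        entries.append(region_tag)

    def is_non_shared(self, region_tag):
        """Lookup done before issuing a request; a hit means no broadcast."""
        return region_tag in self._set_for(region_tag)

    def snoop_invalidate(self, region_tag):
        """Another node requested a block in this region; forget the record."""
        entries = self._set_for(region_tag)
        if region_tag in entries:
            entries.remove(region_tag)
```

Losing an entry to replacement only costs an opportunity: the next request to that region is broadcast as in conventional snooping.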
What Regions are Locally Cached?
[Figure: address (Region Tag | offset) selecting a counter]
- If we had as many counters as regions:
  - Block Allocation: counter[region]++
  - Block Eviction: counter[region]--
  - Region cached only if counter[region] non-zero
- Not Practical:
  - E.g., 16K Regions and 4G Memory: 256K counters
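The counter arithmetic and the exact per-region scheme can be checked with a few lines of Python. This is a sketch: the function names are hypothetical, and a `Counter` keyed by region number stands in for the full array of hardware counters.

```python
from collections import Counter

REGION_BYTES = 16 * 1024     # 16K regions
MEMORY_BYTES = 4 * 1024**3   # 4G memory

# One counter per region: 2**32 / 2**14 = 2**18 = 256K counters.
counters_needed = MEMORY_BYTES // REGION_BYTES

counter = Counter()  # region number -> locally cached blocks in that region


def allocate(addr):
    counter[addr // REGION_BYTES] += 1


def evict(addr):
    counter[addr // REGION_BYTES] -= 1


def region_cached(addr):
    # Reading a missing key from a Counter returns 0 without inserting it.
    return counter[addr // REGION_BYTES] != 0
```

Here `counters_needed` evaluates to 262144, that is 256K, which is why the exact scheme is labeled not practical.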
What Regions are Locally Cached?
[Figure: the Region Tag is hashed to index the Cached Region Hash; each entry holds a p-bit and a counter; few entries, e.g., 256]
- "Counter": + on block allocation, - on block eviction
- P-bit: 1 if counter non-zero; used for lookups
- Use few Counters
  - Imprecise: Records a superset of locally cached Regions
  - False positives: lost opportunity, correctness preserved
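A minimal Python sketch of the Cached Region Hash follows. The hash function (region tag modulo the number of entries) is an assumption, since the slide does not say which hash is used, and the class and method names are hypothetical.

```python
class CachedRegionHash:
    """Few counters shared by many regions; answers 'might I cache region R?'"""

    def __init__(self, entries=256):
        self.counters = [0] * entries

    def _index(self, region_tag):
        return region_tag % len(self.counters)  # assumed hash function

    def on_block_allocation(self, region_tag):
        self.counters[self._index(region_tag)] += 1

    def on_block_eviction(self, region_tag):
        i = self._index(region_tag)
        assert self.counters[i] > 0, "eviction without matching allocation"
        self.counters[i] -= 1

    def p_bit(self, region_tag):
        """1 if the counter is non-zero; this is what snoop lookups read."""
        return 1 if self.counters[self._index(region_tag)] != 0 else 0
```

Regions 5 and 261 share an entry when there are 256 counters, so after allocating a block in region 5 the p-bit for region 261 also reads 1. That is a false positive: the node answers "maybe cached", the requester cannot mark the region as non-shared, and correctness is preserved. A p-bit of 0 is always exact, because every allocation in a region increments that region's entry.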
Roadmap
- Conventional Coherence
- Program Behavior: Region Miss Frequency
- RegionScout
- Evaluation
- Summary
Evaluation Overview
- Methodology
- Filter rates
  - Practical Filters can capture many Region Misses
- Interconnect bandwidth reduction
Methodology
- In-House simulator based on SimpleScalar
  - Execution driven
  - All instructions simulated, MIPS-like ISA
  - System calls faked by passing them to host OS
  - Synchronization using load-linked/store-conditional
  - Simple in-order processors
  - Memory requests complete instantaneously
  - MESI snoop coherence
  - 1 or 2 level memory hierarchy
  - WATTCH power models
- SPLASH II benchmarks
  - Scientific workloads
  - Feasibility study
Filter Rates
[Chart: Identified Global Region Misses vs. CRH Size; "better" direction marker]
- For small CRH better to use large regions
- Practical RegionScout filters capture a lot of the potential
Bandwidth Reduction
[Chart: Messages vs. Region Size; series label: CMP; "better" direction marker]
- Moderate Bandwidth Savings for SMP (15%-22%)
- More so for CMP (>25%)
Related Work
- RegionScout
  - Technical Report, Dec. 2003
- Jetty
  - Moshovos, Memik, Falsafi, Choudhary, HPCA 2001
- PST
  - Ekman, Dahlgren, and Stenström, ISLPED 2002
- Coarse-Grain Coherence
  - Cantin, Lipasti and Smith, ISCA 2005
Summary
- Exploit program behavior/optimize a frequent case
  - Many requests result in a global region miss
- RegionScout
  - Practical filter mechanism
  - Dynamically detect would-be region misses
  - Avoid broadcasts
  - Save tag lookup power and interconnect bandwidth
  - Small structures
  - Layered extension over existing mechanisms
  - Invisible to programmer and the OS
RegionScout and Directories
- Different information
  - Directory: block-level sharing
  - RegionScout: Region-level sharing
    - Could build Region-level directory
    - This work serves as motivation
- Directories use precise information
  - RegionScout does not have to
- Directories/Implementation
- RegionScout can approximate a directory
  - If remote nodes sent sharing info as opposed to a single bit


