August 14-15, 2006

Memory Management Internals: Allocation Strategies for High Performance
Steve Smith
Software Design Engineer, Game Technology Group, Microsoft

Presentation Overview
What this talk is about:
• Windows and Xbox 360 memory management
• How memory allocation functions work
• Performance consequences of different allocation schemes
• Common pitfalls in managing memory
What this talk is not about:
• How to write your own custom allocators

Virtual Memory
• “Virtualizes” physical memory
  • Non-contiguous memory presented as contiguous
• Windows: 4 K native page size
• Xbox 360: 4 K or 64 K native page size
• Can allocate address space without committing RAM
• Per-page control over access rights

Virtual Memory (32-bit Windows address space)
• 0x00000000-0x7FFFFFFF: 2 GB for the application (application code, DLL code, application data, stacks)
• 0x80000000-0xFFFFFFFF: 2 GB for the system

Virtual Memory (Xbox 360 address space)
• 0x00000000-0x3FFFFFFF: virtual, 4-KB page range
• 0x40000000-0x7FFFFFFF: virtual, 64-KB page range
  (2 GB of virtual address space in total)
• 0x80000000-0x8FFFFFFF: code, 64-KB range
• 0x90000000-0x9FFFFFFF: code, 4-KB range
  (512 MB of code address space in total)
• 0xA0000000-0xBFFFFFFF: physical, 64-KB range
• 0xC0000000-0xDFFFFFFF: physical, 16-MB range
• 0xE0000000-0xFFFFFFFF: physical, 4-KB range
  (1.5 GB of physical address space in total)
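To make the Xbox 360 layout concrete, here is a small lookup table transcribed from the slide's ranges, plus a function that reports which region an address falls in. The names `Region`, `kXbox360Map`, and `classify` are hypothetical; the code only restates the map and does not call any XDK API.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// One contiguous address range and what it is used for.
struct Region {
    std::uint32_t first;
    std::uint32_t last;
    const char* name;
};

// Ranges transcribed from the Xbox 360 address-space slide.
constexpr Region kXbox360Map[] = {
    {0x00000000u, 0x3FFFFFFFu, "virtual, 4-KB pages"},
    {0x40000000u, 0x7FFFFFFFu, "virtual, 64-KB pages"},
    {0x80000000u, 0x8FFFFFFFu, "code, 64-KB pages"},
    {0x90000000u, 0x9FFFFFFFu, "code, 4-KB pages"},
    {0xA0000000u, 0xBFFFFFFFu, "physical, 64-KB pages"},
    {0xC0000000u, 0xDFFFFFFFu, "physical, 16-MB pages"},
    {0xE0000000u, 0xFFFFFFFFu, "physical, 4-KB pages"},
};

// Returns the name of the region containing addr.
const char* classify(std::uint32_t addr) {
    for (const Region& r : kXbox360Map)
        if (addr >= r.first && addr <= r.last) return r.name;
    return "unmapped"; // not reached: the ranges cover every 32-bit address
}
```

Because the page size is implied by the address range, a pointer's top bits alone tell you which kind of page backs it.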

Virtual Memory Access Rights
• Avoid giving pages execute rights…unless you really need to
  • Enforced via the PAGE_EXECUTE protection flag
• Use read-only pages (PAGE_READONLY)
  • Can catch bugs
  • Performance benefit to loading asset data into memory, then marking it read-only
• Access rights are controlled using the VirtualProtect API
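A minimal sketch of the "load, then mark read-only" pattern. On Windows it uses `VirtualAlloc` and `VirtualProtect`; elsewhere it substitutes POSIX `mmap` and `mprotect` so the example also builds off Windows. The helper names `load_read_only` and `release_asset` are hypothetical. After the call, a stray write to the asset faults instead of silently corrupting it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

#ifdef _WIN32
#include <windows.h>
#else
#include <sys/mman.h>
#endif

// Copies asset bytes into fresh pages, then removes write access.
// Returns nullptr on failure; release with release_asset().
void* load_read_only(const void* bytes, std::size_t size) {
#ifdef _WIN32
    void* p = VirtualAlloc(nullptr, size, MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);
    if (!p) return nullptr;
    std::memcpy(p, bytes, size);
    DWORD oldProtect = 0;
    if (!VirtualProtect(p, size, PAGE_READONLY, &oldProtect)) {
        VirtualFree(p, 0, MEM_RELEASE);
        return nullptr;
    }
#else
    // POSIX stand-in: mmap/mprotect play the roles of VirtualAlloc/VirtualProtect.
    void* p = mmap(nullptr, size, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return nullptr;
    std::memcpy(p, bytes, size);
    if (mprotect(p, size, PROT_READ) != 0) {
        munmap(p, size);
        return nullptr;
    }
#endif
    return p;
}

// Returns the pages to the operating system.
void release_asset(void* p, std::size_t size) {
#ifdef _WIN32
    (void)size;
    VirtualFree(p, 0, MEM_RELEASE);
#else
    munmap(p, size);
#endif
}
```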

Virtual Memory: Beware!
• Do not request very large contiguous areas of virtual memory
  • Address space can be fragmented
• 32-bit VM: only 2 GB available
  • Competing with EXEs, DLLs, stacks, heaps, memory-mapped I/O, etc.
• Do not allocate more memory than is available (no paging!)
• Prefer 64 K page sizes over 4 K pages

Virtual Memory: Beware!
• Third-party DLLs are often based at an arbitrary address
  • This fragments VM space
  • Rebase the DLLs to fix this
• Some D3D9 drivers map all VRAM into VM
  • As video memory sizes increase, this takes a significant portion of VM
  • Can result in strange crashes when restoring the device
  • Make sure you are aware of this
  • Not an issue on 64-bit
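Why fragmentation matters more than total free space can be shown with a toy model: the largest single reservation you can make is bounded by the largest free gap, not by the sum of free bytes. The sketch assumes the occupied ranges (EXE, DLLs, stacks, and so on) are already known; `Range` and `largest_free_gap` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// An occupied [begin, end) span of the address space.
struct Range {
    std::uint64_t begin;
    std::uint64_t end;
};

// Size of the largest free contiguous gap inside [lo, hi).
std::uint64_t largest_free_gap(std::vector<Range> used, std::uint64_t lo, std::uint64_t hi) {
    std::sort(used.begin(), used.end(),
              [](const Range& a, const Range& b) { return a.begin < b.begin; });
    std::uint64_t cursor = lo; // end of the occupied prefix scanned so far
    std::uint64_t best = 0;
    for (const Range& r : used) {
        if (r.begin > cursor) best = std::max(best, r.begin - cursor);
        cursor = std::max(cursor, r.end); // tolerates overlapping ranges
    }
    if (hi > cursor) best = std::max(best, hi - cursor);
    return best;
}
```

For example, a single 1 MB DLL based at 0x40000000 in a 2 GB space leaves almost 2 GB free but caps the largest contiguous request at 1 GB.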

Virtual Memory Best Practices
• Be careful about VM address space fragmentation
  • Keep custom heap allocations limited to 256 MB or so…or less…
• Be careful about physical memory fragmentation

General Purpose Allocation
• Global standard unbounded heap, created automatically
  • Use GetProcessHeap to access the heap handle
• HeapAlloc alignment:
  • 8-byte aligned on 32-bit Windows
  • 16-byte aligned on 64-bit Windows
  • 16-byte aligned on Xbox 360
• Recommended for small to medium size allocations

Minimize Allocations
• Allocate what you use up front
• Avoid allocations in-frame!
• Don't process data on load (other than decompression)
  • Block-load data / flatten trees
• Avoid allocating small chunks of memory
  • Prefer growable arrays/vectors over linked lists
• …Or avoid allocations altogether… ;)
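One common way to "allocate what you use up front" and keep allocations out of the frame is a linear (frame) arena: one block is allocated at startup, per-frame requests only bump an offset, and everything is released at once at the end of the frame. A minimal sketch; `FrameArena` is a hypothetical name and the class is not thread-safe.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>

class FrameArena {
public:
    // The only real allocation: made once, up front.
    explicit FrameArena(std::size_t capacity)
        : buffer_(new unsigned char[capacity]), capacity_(capacity) {}

    // Bump allocation; alignment must be a power of two.
    // Returns nullptr when the frame budget is exhausted.
    void* allocate(std::size_t size, std::size_t alignment = alignof(std::max_align_t)) {
        std::uintptr_t base = reinterpret_cast<std::uintptr_t>(buffer_.get());
        std::uintptr_t aligned = (base + offset_ + alignment - 1) &
                                 ~static_cast<std::uintptr_t>(alignment - 1);
        std::size_t newOffset = static_cast<std::size_t>(aligned - base) + size;
        if (newOffset > capacity_) return nullptr;
        offset_ = newOffset;
        return reinterpret_cast<void*>(aligned);
    }

    // Releases everything allocated this frame in O(1).
    void reset() { offset_ = 0; }

    std::size_t used() const { return offset_; }

private:
    std::unique_ptr<unsigned char[]> buffer_;
    std::size_t capacity_;
    std::size_t offset_ = 0;
};
```

Typical use is one `reset()` at the top of each frame; nothing allocated from the arena may outlive the frame.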

“Hidden” Allocations
• STL
  • Lots of cases of allocations “under the hood”
• D3DX
  • Internal allocations
  • Not high performance
• XAudio
  • Does physical allocations under the hood
• Many other potential cases: be aware!
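One way to surface hidden allocations is to count calls to the replaceable global `operator new`. The sketch does that and compares filling a `std::vector` with and without `reserve`. The counter and helper names are hypothetical; a real engine would route the replacement to its own allocator rather than to `malloc`.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

// Incremented by every call to the global operator new.
static std::atomic<std::size_t> g_allocCount{0};

// Replacing the global allocation functions makes library allocations
// (STL containers included) visible. This version forwards to malloc/free.
void* operator new(std::size_t size) {
    g_allocCount.fetch_add(1, std::memory_order_relaxed);
    if (void* p = std::malloc(size ? size : 1)) return p;
    throw std::bad_alloc();
}
void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }

// Lets the vector's buffer escape so the optimizer keeps the allocations.
static void* volatile g_sink = nullptr;

// How many global allocations it takes to push n ints into a vector.
std::size_t allocations_to_fill(std::size_t n, bool reserveFirst) {
    const std::size_t before = g_allocCount.load();
    {
        std::vector<int> v;
        if (reserveFirst) v.reserve(n);
        for (std::size_t i = 0; i < n; ++i) v.push_back(static_cast<int>(i));
        g_sink = v.data();
    }
    return g_allocCount.load() - before;
}
```

With `reserve`, standard library implementations typically allocate once; without it, the vector reallocates repeatedly as it grows.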

Wrap It Up…
• Write a general wrapper to sit on top of platform-specific APIs
  • Removes the need to worry about implementation details
  • Lets you plug in your own custom allocation scheme without changing code
• Override new, delete, malloc, free…
• Add debug features
  • Assert in debug builds
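A minimal sketch of such a wrapper, assuming a virtual-interface design: game code calls only `MemAlloc`/`MemFree`, the backend can be swapped at runtime, and a debug decorator tracks live allocations. All names here are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Engine-level allocation interface; backends implement it.
struct Allocator {
    virtual ~Allocator() = default;
    virtual void* Alloc(std::size_t size) = 0;
    virtual void Free(void* p) = 0;
};

// Default backend: forwards to the C run-time.
struct CrtAllocator : Allocator {
    void* Alloc(std::size_t size) override { return std::malloc(size); }
    void Free(void* p) override { std::free(p); }
};

// Debug decorator: counts live allocations and asserts on unmatched frees.
struct TrackingAllocator : Allocator {
    explicit TrackingAllocator(Allocator& inner) : inner_(inner) {}
    void* Alloc(std::size_t size) override {
        void* p = inner_.Alloc(size);
        if (p) ++live_;
        return p;
    }
    void Free(void* p) override {
        if (!p) return;
        assert(live_ > 0 && "free without matching alloc");
        --live_;
        inner_.Free(p);
    }
    std::size_t live() const { return live_; }

private:
    Allocator& inner_;
    std::size_t live_ = 0;
};

static CrtAllocator g_crt;
static Allocator* g_allocator = &g_crt;

// Passing nullptr restores the default backend.
void SetAllocator(Allocator* a) { g_allocator = a ? a : &g_crt; }
void* MemAlloc(std::size_t size) { return g_allocator->Alloc(size); }
void MemFree(void* p) { g_allocator->Free(p); }
```

Overriding `new`/`delete` or `malloc`/`free` to call `MemAlloc`/`MemFree` would then route every allocation through the same switchable backend.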

Multi-core Considerations
• Synchronization overhead
• Complexity
• Memory usage efficiency

One Heap Owned per Thread
• Memory allocation managed in one place
• Heap synchronization at a higher level
  • Use HEAP_NO_SERIALIZE for heaps created with the HeapCreate(…) API
• Single heap: a simplification
• Potential problem with contention
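A sketch of the single shared heap: the heap itself does no locking (the role `HEAP_NO_SERIALIZE` plays for `HeapCreate` heaps), and one wrapper takes a lock around every call. A fixed-size pool stands in for a real heap, and all names are hypothetical. Every allocation from every thread funnels through one mutex, which is where the contention comes from.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Stand-in for an unsynchronized heap: fixed 64-byte blocks, no locking.
class UnsynchronizedPool {
public:
    explicit UnsynchronizedPool(std::size_t blocks) : storage_(blocks), free_(blocks) {
        for (std::size_t i = 0; i < blocks; ++i) free_[i] = &storage_[i];
    }
    void* Alloc() {
        if (free_.empty()) return nullptr;
        void* p = free_.back();
        free_.pop_back();
        return p;
    }
    void Free(void* p) { free_.push_back(static_cast<Block*>(p)); }
    std::size_t available() const { return free_.size(); }

private:
    struct Block { alignas(16) unsigned char bytes[64]; };
    std::vector<Block> storage_;
    std::vector<Block*> free_;
};

// All threads share one heap; synchronization happens one level up.
class SharedHeap {
public:
    explicit SharedHeap(std::size_t blocks) : pool_(blocks) {}
    void* Alloc() { std::lock_guard<std::mutex> lock(mutex_); return pool_.Alloc(); }
    void Free(void* p) { std::lock_guard<std::mutex> lock(mutex_); pool_.Free(p); }
    std::size_t available() { std::lock_guard<std::mutex> lock(mutex_); return pool_.available(); }

private:
    std::mutex mutex_;
    UnsynchronizedPool pool_;
};
```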

One Heap Per Thread
• Each thread creates its own heap
  • Reserves a chunk of virtual memory
  • Commits on demand (as the heap grows)
• Data locality!
• Each thread manages its own allocations
  • No synchronization required
  • Use HEAP_NO_SERIALIZE for heaps created with the HeapCreate(…) API
  • Assert the thread ID on allocate/deallocate
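A sketch of a per-thread heap that asserts the caller's thread ID on every allocate and free. `std::malloc` is only a placeholder for allocating from the thread's own unsynchronized heap (for example, one created with `HeapCreate` and `HEAP_NO_SERIALIZE`); `ThreadHeap` and `LocalHeap` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <thread>

class ThreadHeap {
public:
    ThreadHeap() : owner_(std::this_thread::get_id()) {}

    bool OwnedByCaller() const { return std::this_thread::get_id() == owner_; }

    void* Alloc(std::size_t size) {
        assert(OwnedByCaller() && "ThreadHeap used from a foreign thread");
        void* p = std::malloc(size); // placeholder for the thread's own heap
        if (p) ++live_;
        return p;
    }
    void Free(void* p) {
        assert(OwnedByCaller() && "ThreadHeap used from a foreign thread");
        if (p) --live_;
        std::free(p);
    }
    std::size_t live() const { return live_; }

private:
    std::thread::id owner_;
    std::size_t live_ = 0; // no lock needed: only the owner touches it
};

// Each thread lazily creates its own heap on first use.
ThreadHeap& LocalHeap() {
    thread_local ThreadHeap heap;
    return heap;
}
```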

Low Fragmentation Heap
• Built on top of an existing heap
• Windows XP and Vista only
• Modify the default heap…
• …or a heap created with HeapCreate(…)

    #include <windows.h>

    // Enables the LFH on the process default heap; returns FALSE on failure.
    BOOL EnableLfhOnDefaultHeap()
    {
        ULONG ulInfo = 2; // 2 = Low Fragmentation Heap
        HANDLE hDefHeap = GetProcessHeap();
        return HeapSetInformation(hDefHeap, HeapCompatibilityInformation,
                                  &ulInfo, sizeof(ulInfo));
    }

LFH: How It Works
• 8-byte granularity:
  • Bucket 1: 1 to 8 bytes
  • Bucket 32: 249 to 256 bytes
• 16-byte granularity:
  • Bucket 33: 257 to 272 bytes
• Largest buckets (512-byte granularity):
  • Bucket 127: 15361 to 15872 bytes
  • Bucket 128: 15873 to 16384 bytes
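The bucket mapping can be computed directly. This sketch assumes the granularity table in Microsoft's LFH documentation: buckets 1-32 step by 8 bytes, and each following group of 16 buckets doubles the step, up to 512-byte steps for buckets 113-128 (ending at 16 KB). `LfhBucket` is a hypothetical helper, not a Windows API.

```cpp
#include <cassert>
#include <cstddef>

// Returns the LFH bucket (1-128) for a request size, or 0 for sizes the
// LFH does not serve (0 bytes or more than 16 KB).
int LfhBucket(std::size_t size) {
    if (size == 0 || size > 16384) return 0;
    if (size <= 256) return static_cast<int>((size + 7) / 8); // buckets 1-32
    std::size_t groupStart = 256; // the group covers (groupStart, 2 * groupStart]
    std::size_t granularity = 16;
    int firstBucket = 33;
    while (size > groupStart * 2) { // move to the next group of 16 buckets
        groupStart *= 2;
        granularity *= 2;
        firstBucket += 16;
    }
    return firstBucket + static_cast<int>((size - groupStart - 1) / granularity);
}
```

For instance, a 257-byte request lands in bucket 33 and occupies a 272-byte slot, so rounding waste grows with the granularity of the bucket.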

The C Run-Time
• Uses the process default heap
  • Uses this heap for temporary allocations
• new and new[] throw on an out-of-memory condition by default
  • Checking for NULL doesn't buy anything
  • Can link with nothrownew.obj to turn this off
  • Watch out though: the STL will then crash on OOM
  • Can also use std::nothrow, e.g.:

        Foo *pFoo = new (std::nothrow) Foo;
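The difference between the two forms of `new` can be shown deterministically with a hypothetical type, `AlwaysOom`, whose class-specific allocation functions always fail, simulating out-of-memory without exhausting real memory. The helper names are also hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Allocation of this type always fails, to simulate out-of-memory.
struct AlwaysOom {
    static void* operator new(std::size_t) { throw std::bad_alloc(); }
    static void* operator new(std::size_t, const std::nothrow_t&) noexcept { return nullptr; }
    static void operator delete(void*) noexcept {} // never reached: allocation never succeeds
    int value = 0;
};

// Plain new never returns null: failure arrives as std::bad_alloc,
// so a NULL check after it would be dead code. Catch the exception instead.
bool TryCreateThrowing() {
    try {
        AlwaysOom* p = new AlwaysOom;
        delete p;
        return true;
    } catch (const std::bad_alloc&) {
        return false;
    }
}

// new (std::nothrow) reports failure as nullptr, so the check is meaningful.
bool TryCreateNothrow() {
    AlwaysOom* p = new (std::nothrow) AlwaysOom;
    if (p == nullptr) return false;
    delete p;
    return true;
}
```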

CRT Allocation
• Uses HeapAlloc(…) under the hood
• App compat modes to emulate older CRT
• Allocations are 8-byte aligned
  • _mm_malloc for aligned (16-byte) allocations
  • 16-byte aligned on Win64
• 8-15 bytes of overhead on Win32
• 16-31 bytes of overhead on Win64
• Overhead adjacent to the allocation block
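Aligned allocators such as `_mm_malloc` can be layered on a general-purpose heap by over-allocating and storing bookkeeping right before the returned block. The sketch shows that technique; it is an illustration, not the CRT's actual implementation, and `AlignedAlloc`/`AlignedFree` are hypothetical names. The stored header is exactly the kind of overhead adjacent to the allocation block described above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// alignment must be a power of two.
void* AlignedAlloc(std::size_t size, std::size_t alignment) {
    // Room for the payload, worst-case padding, and a header holding the raw pointer.
    std::size_t total = size + alignment - 1 + sizeof(void*);
    void* raw = std::malloc(total);
    if (!raw) return nullptr;
    std::uintptr_t start = reinterpret_cast<std::uintptr_t>(raw) + sizeof(void*);
    std::uintptr_t aligned = (start + alignment - 1) & ~static_cast<std::uintptr_t>(alignment - 1);
    // Store the original pointer immediately before the aligned block.
    std::memcpy(reinterpret_cast<void*>(aligned - sizeof(void*)), &raw, sizeof raw);
    return reinterpret_cast<void*>(aligned);
}

void AlignedFree(void* p) {
    if (!p) return;
    void* raw;
    std::memcpy(&raw, static_cast<unsigned char*>(p) - sizeof(void*), sizeof raw);
    std::free(raw);
}
```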

CRT Allocation
• Uses the global default heap
  • CRT manages allocation
• 16-byte aligned
• 16-31 bytes of overhead per allocation
• Overhead is adjacent to the allocation block
  • This can be bad…

Bad…!
Legend: cache line boundary | used memory (hot) | overhead or unused (cold)
• 16-byte allocations: 50% of the cache line unused!
Very bad…!
• More than 80% of the cache line unused!
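A rough model of the effect: when every block carries adjacent overhead and is rounded up to the allocator's alignment, only part of each slot is hot payload, and the cold remainder still occupies cache lines. `HotFraction` and its parameters are hypothetical and not tied to any specific CRT.

```cpp
#include <cassert>
#include <cstddef>

// Fraction of each packed allocation slot that is payload ("hot" bytes).
double HotFraction(std::size_t payload, std::size_t overhead, std::size_t alignment) {
    std::size_t slot = payload + overhead;
    slot = (slot + alignment - 1) / alignment * alignment; // round up to alignment
    return static_cast<double>(payload) / static_cast<double>(slot);
}
```

With 16 bytes of overhead and 16-byte alignment, `HotFraction(16, 16, 16)` is 0.5 (half of every slot is cold), and `HotFraction(4, 16, 16)` is 0.125, leaving 87.5% cold.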

Better…

Best…
• Investigate writing a custom small block allocator
• Investigate custom heap options
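A minimal sketch of a fixed-size small block allocator: blocks come from one contiguous slab, and free blocks are linked through their own storage, so there is no per-block header and consecutive blocks pack densely into cache lines. `SmallBlockPool` is a hypothetical name; the pool is not thread-safe and does not grow.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

template <std::size_t BlockSize>
class SmallBlockPool {
    static_assert(BlockSize >= sizeof(void*), "block must hold a free-list link");
    static_assert(BlockSize % alignof(void*) == 0, "block size must keep links aligned");

public:
    explicit SmallBlockPool(std::size_t count) : slab_(BlockSize * count) {
        // Push in reverse so the first Alloc() returns the lowest address.
        for (std::size_t i = count; i-- > 0;) Push(&slab_[i * BlockSize]);
    }
    void* Alloc() {
        if (!head_) return nullptr;
        Node* n = head_;
        head_ = n->next;
        return n;
    }
    void Free(void* p) { if (p) Push(p); }

private:
    struct Node { Node* next; };
    // The free-list link lives inside the free block itself.
    void Push(void* p) { head_ = ::new (p) Node{head_}; }

    std::vector<unsigned char> slab_;
    Node* head_ = nullptr;
};
```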

Platform Recommendations

Recommendations
• Use VirtualAlloc(…) to reserve custom heaps
  • Reasonably sized: <= 256 MB
  • Partition custom heaps to give asset data read-only access after load
• Memory-mapped files:
  • Keep size under control in Win32
  • Be more aggressive in Win64
• Be careful with execute rights
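A sketch of reserving a custom heap region up front and committing pages only as it grows. On Windows this uses `VirtualAlloc` with `MEM_RESERVE` and then `MEM_COMMIT`; elsewhere it substitutes POSIX `mmap` with `PROT_NONE` plus `mprotect`. `ReservedRegion` is a hypothetical name, and the allocator that would sit on top of the region is left out.

```cpp
#include <cassert>
#include <cstddef>

#ifdef _WIN32
#include <windows.h>
#else
#include <sys/mman.h>
#include <unistd.h>
#endif

class ReservedRegion {
public:
    // Reserves address space only; no pages are usable yet.
    explicit ReservedRegion(std::size_t reserveBytes) : reserved_(reserveBytes) {
#ifdef _WIN32
        base_ = static_cast<char*>(VirtualAlloc(nullptr, reserveBytes, MEM_RESERVE, PAGE_NOACCESS));
#else
        void* p = mmap(nullptr, reserveBytes, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        base_ = (p == MAP_FAILED) ? nullptr : static_cast<char*>(p);
#endif
    }
    ~ReservedRegion() {
        if (!base_) return;
#ifdef _WIN32
        VirtualFree(base_, 0, MEM_RELEASE);
#else
        munmap(base_, reserved_);
#endif
    }
    ReservedRegion(const ReservedRegion&) = delete;
    ReservedRegion& operator=(const ReservedRegion&) = delete;

    // Makes the first `bytes` of the region usable, rounded up to whole pages.
    bool CommitUpTo(std::size_t bytes) {
        if (!base_ || bytes > reserved_) return false;
        const std::size_t page = PageSize();
        std::size_t target = (bytes + page - 1) / page * page;
        if (target > reserved_) target = reserved_;
        if (target <= committed_) return true;
#ifdef _WIN32
        if (!VirtualAlloc(base_ + committed_, target - committed_, MEM_COMMIT, PAGE_READWRITE))
            return false;
#else
        if (mprotect(base_ + committed_, target - committed_, PROT_READ | PROT_WRITE) != 0)
            return false;
#endif
        committed_ = target;
        return true;
    }

    char* base() const { return base_; }
    std::size_t committed() const { return committed_; }

private:
    static std::size_t PageSize() {
#ifdef _WIN32
        SYSTEM_INFO info;
        GetSystemInfo(&info);
        return info.dwPageSize;
#else
        return static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
#endif
    }
    char* base_ = nullptr;
    std::size_t reserved_;
    std::size_t committed_ = 0;
};
```

A 256 MB reservation made this way consumes only address space; memory is committed only for the pages passed to `CommitUpTo`.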

GlobalAlloc
• Old 16-bit memory model allocation
• Included only to:
  • Emulate old Win 3.x behavior
  • Support clipboard interaction
• Don't use this in your game!

Xbox 360 Memory APIs
• XPhysicalAlloc(…)
• XMemAlloc(…)
• XPhysicalProtect(…)
  • To change memory protection on a previously allocated chunk of memory

XPhysicalAlloc
• Not recommended for performance-critical code…
  • Walks through pages in memory to find space
  • Potentially rebases used VM pages in the range (defragments)
• Recommended for infrequent allocations
  • Once only, for asset data on asset load
  • Allocation of asset data space on startup
  • Use write-combined memory

XPhysicalAlloc: The Algorithm
• Enters a spin-lock to block other allocations
• Checks range & page availability
• Linear search of pages for a valid range
• Searches each candidate range for immovable pages until a range with none is found
• Relocates all used VM pages in the range
• Flushes the virtual address list
• Flushes & invalidates processor caches
• Flushes & invalidates the TLB
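The search steps can be illustrated with a toy page model. This is a hypothetical simplification, not the XDK implementation: it finds the first run of pages containing no immovable pages and counts how many movable pages would have to be relocated, which hints at why each call can be expensive. `PageState`, `RangeResult`, and `FindRange` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class PageState { Free, Movable, Immovable };

struct RangeResult {
    bool found;
    std::size_t first;      // index of the first page in the range
    std::size_t toRelocate; // movable pages that must be moved out
};

// First run of `count` consecutive pages with no immovable page in it.
RangeResult FindRange(const std::vector<PageState>& pages, std::size_t count) {
    std::size_t runStart = 0;
    for (std::size_t i = 0; i < pages.size(); ++i) {
        if (pages[i] == PageState::Immovable) {
            runStart = i + 1; // no candidate range may include this page
            continue;
        }
        if (i + 1 - runStart == count) {
            std::size_t moves = 0;
            for (std::size_t j = runStart; j <= i; ++j)
                if (pages[j] == PageState::Movable) ++moves;
            return {true, runStart, moves};
        }
    }
    return {false, 0, 0};
}
```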

XPhysicalProtect
• Not designed for performance
• Look into using D3D macros instead:
  • GPU_CONVERT_*(), defined in d3d9gpu.h
  • Example: GPU_CONVERT_GPU_TO_CPU_ADDRESS_64KB
• Look for the MemoryViews sample (coming in the September XDK)

XMemAlloc/XMemFree
• Allows custom memory management
• Used by the XDK for many allocations
• Developers may provide their own implementation
• XMemAllocDefault(…) is the “default”…
  • Wraps XPhysicalAlloc(…) for write-combined or physical memory allocations
  • Wraps LocalAlloc(…) for heap allocations

XMemAlloc/XMemFree
• Overriding the default is recommended
  • XAudio uses XMemAlloc(…) a lot internally to allocate physical memory
  • XMemAlloc(…) is used throughout the XDK
  • By overriding, you can significantly reduce some performance bottlenecks
• To override: just implement your own XMemAlloc(…)
  • Call XMemAllocDefault(…) for cases you don't want or need to handle yourself
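The override-with-fallback pattern can be sketched portably. The names are hypothetical stand-ins: `AllocKind` plays the role of the attributes an `XMemAlloc` implementation receives, and `DefaultAlloc` plays the role of `XMemAllocDefault`; neither matches the real XDK signatures. The custom hook serves small heap requests from a region reserved up front and hands everything else to the default.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

enum class AllocKind { SmallHeap, Physical };

static std::size_t g_defaultCalls = 0;

// Stand-in for the platform's default implementation.
void* DefaultAlloc(std::size_t size, AllocKind) {
    ++g_defaultCalls;
    return std::malloc(size);
}

// Region reserved up front for small heap requests (bump allocation).
alignas(16) static unsigned char g_smallRegion[4096];
static std::size_t g_smallUsed = 0;

// Custom hook: handle the hot path here, defer everything else.
void* CustomAlloc(std::size_t size, AllocKind kind) {
    const std::size_t rounded = (size + 15) & ~static_cast<std::size_t>(15);
    if (kind == AllocKind::SmallHeap && size <= 256 &&
        g_smallUsed + rounded <= sizeof g_smallRegion) {
        void* p = g_smallRegion + g_smallUsed;
        g_smallUsed += rounded;
        return p;
    }
    return DefaultAlloc(size, kind);
}

void CustomFree(void* p) {
    const std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(p);
    const std::uintptr_t lo = reinterpret_cast<std::uintptr_t>(g_smallRegion);
    if (addr >= lo && addr < lo + sizeof g_smallRegion) return; // region is reset wholesale
    std::free(p);
}
```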

Caring for the Cache
• Keep related data together in memory
• Avoid small pages
  • The TLB contains only 1024 entries
  • Lots of 4 KB pages can cause TLB misses, which are expensive
  • Prefer 64 KB pages
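A quick way to see the cost of small pages is the TLB reach (entries × page size), using the 1024-entry figure above:

```latex
\begin{aligned}
\text{TLB reach} &= \text{entries} \times \text{page size} \\
1024 \times 4\,\text{KB} &= 4\,\text{MB} \\
1024 \times 64\,\text{KB} &= 64\,\text{MB}
\end{aligned}
```

As a rough model, a working set larger than 4 MB cannot have all of its translations resident in the TLB at once with 4 KB pages, while 64 KB pages raise that limit sixteenfold.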

Select Types Wisely
• CPU write-only data (e.g. asset data): write-combined memory, 64-KB pages
• Large read/write datasets: virtual memory, 64-KB pages
• Small/temporary allocations: application heap (CRT/HeapAlloc/custom allocator)
  • Be aware of overhead and cache issues from generic allocators

Write-Combined RAM
• Use for write-only data
  • Reading is very expensive
• Always write in order, in chunks of at least 4 bytes
• Double-check what the compiler is doing
  • It may reorder writes, which kills performance
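A sketch of a write-combined-friendly fill: strictly increasing addresses, 4-byte stores, and no reads of the destination. Making the destination `volatile` is one way (an assumption here, worth confirming in the generated code) to keep the compiler from reordering, merging, or dropping the stores; it does not constrain the hardware. `WriteSequential` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Copies count 32-bit words into dst in ascending address order.
// volatile keeps the compiler from reordering or combining these stores.
void WriteSequential(volatile std::uint32_t* dst, const std::uint32_t* src, std::size_t count) {
    for (std::size_t i = 0; i < count; ++i)
        dst[i] = src[i]; // reads come only from src, never from dst
}
```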

In Summary
• Avoid allocations in frame!
• Minimize allocations where possible
• Avoid VM/physical memory fragmentation
• Think about the overhead of standard APIs
• Consider a custom heap solution
• Prefer 64 K pages

DirectX Developer Center: http://msdn.microsoft.com/directx
Game Development MSDN Forums: http://forums.microsoft.com/msdn
Xbox 360 Central: http://xds.xbox.com/
XNA Web site: http://www.microsoft.com/xna

© 2006 Microsoft Corporation. All rights reserved. This presentation is for informational purposes only. Microsoft makes no warranties, express or implied, in this summary.