- Number of slides: 39
August 14-15, 2006
Memory Management Internals: Allocation Strategies for High Performance
Steve Smith, Software Design Engineer, Game Technology Group, Microsoft
Presentation Overview
- What this talk is about
  - Windows and Xbox 360 memory management
  - How memory allocation functions work
  - Performance consequences of different allocation schemes
  - Common pitfalls in managing memory
- What this talk is not about
  - How to write your own custom allocators
Virtual Memory
- “Virtualizes” physical memory
- Non-contiguous memory presented as contiguous
- Windows: 4 KB native page size
- Xbox 360: 4 KB or 64 KB native page size
- Can allocate without committing RAM
- Per-page control over access rights
Virtual Memory (32-bit Windows address space)
0x00000000 - 0x7FFFFFFF   2 GB   Application code, DLL code, application data, stacks
0x80000000 - 0xFFFFFFFF   2 GB   System
Virtual Memory (Xbox 360 address space)
0x00000000 - 0x3FFFFFFF   Virtual 4-KB page range     (2 GB of virtual ranges in total)
0x40000000 - 0x7FFFFFFF   Virtual 64-KB page range
0x80000000 - 0x8FFFFFFF   Code 64-KB range            (512 MB of code ranges in total)
0x90000000 - 0x9FFFFFFF   Code 4-KB range
0xA0000000 - 0xBFFFFFFF   Physical 64-KB range        (1.5 GB of physical ranges in total)
0xC0000000 - 0xDFFFFFFF   Physical 16-MB range
0xE0000000 - 0xFFFFFFFF   Physical 4-KB range
Virtual Memory Access Rights
- Avoid giving pages execution rights
  - …unless you really need to
  - Enforce via the PAGE_EXECUTE flag
- Use read-only pages (PAGE_READONLY)
  - Can catch bugs
  - Performance benefit to loading asset data into memory, then marking it as read-only
- Access rights are controlled using the VirtualProtect API
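
The "load asset data, then mark it read-only" advice can be sketched as follows. This is a minimal illustration, not code from the talk: `LoadAssetReadOnly` is a hypothetical helper, the Windows branch uses `VirtualAlloc` and `VirtualProtect`, and the `mmap`/`mprotect` branch is only a POSIX stand-in so the sketch builds elsewhere.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#if defined(_WIN32)
#include <windows.h>
#else
#include <sys/mman.h>  // POSIX stand-in, an assumption; not part of the talk
#endif

// Hypothetical helper: copies `size` bytes of asset data into freshly
// allocated pages, then marks those pages read-only so stray writes fault.
// Returns nullptr on failure. The pages are never freed in this sketch.
inline const void* LoadAssetReadOnly(const void* src, std::size_t size)
{
    assert(src != nullptr && size > 0);
#if defined(_WIN32)
    void* p = VirtualAlloc(nullptr, size, MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);
    if (!p) return nullptr;
    std::memcpy(p, src, size);
    DWORD oldProtect = 0;
    if (!VirtualProtect(p, size, PAGE_READONLY, &oldProtect)) {
        VirtualFree(p, 0, MEM_RELEASE);
        return nullptr;
    }
    return p;
#else
    void* p = mmap(nullptr, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return nullptr;
    std::memcpy(p, src, size);
    if (mprotect(p, size, PROT_READ) != 0) {
        munmap(p, size);
        return nullptr;
    }
    return p;
#endif
}
```

Protection applies to whole pages, so a real loader would likely batch many assets into one region before changing its access rights.
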
Virtual Memory: Beware!
- Do not request very large contiguous areas of virtual memory
  - Address space can be fragmented
  - 32-bit VM: only 2 GB available
  - Competing with EXEs, DLLs, stack, heaps, memory-mapped I/O, etc.
- Do not allocate more memory than is available (no paging!)
- Prefer 64 KB pages over 4 KB pages
Virtual Memory: Beware!
- Third-party DLLs are often based at an arbitrary address
  - Fragments VM space
  - Rebase DLLs to fix this
- Some D3D9 drivers map all VRAM into VM
  - Takes a significant portion of VM as video memory sizes increase
  - Can result in strange crashes when restoring the device
  - Make sure you are aware of this
  - Not an issue on 64-bit
Virtual Memory Best Practices
- Be careful about VM address space fragmentation
  - Keep custom heap allocations limited to 256 MB or so
  - …or less…
- Be careful about physical memory fragmentation
General Purpose Allocation
- Global standard unbounded heap created automatically
  - Call GetProcessHeap to access the heap handle
- Aligned allocation with HeapAlloc
  - 8-byte aligned on 32-bit Windows
  - 16-byte aligned on 64-bit Windows
  - 16-byte aligned on Xbox 360
- Recommended for small to medium-size allocations
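
Code that relies on the alignment guarantees above can verify them in debug builds. The sketch below is portable and hypothetical: `IsAligned` and `CheckedAlloc` are illustrative names, and `std::malloc` stands in for `HeapAlloc` so that the example builds on any platform.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// True if `p` is aligned to `alignment`, which must be a power of two.
inline bool IsAligned(const void* p, std::size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (reinterpret_cast<std::uintptr_t>(p) & (alignment - 1)) == 0;
}

// Debug helper: allocates, then asserts the alignment the caller depends on.
// std::malloc is a stand-in for the platform heap call (an assumption).
inline void* CheckedAlloc(std::size_t size, std::size_t requiredAlignment)
{
    void* p = std::malloc(size);
    assert(p == nullptr || IsAligned(p, requiredAlignment));
    return p;
}
```

An assertion like this fails early when code written against a 16-byte guarantee is built for a platform that only promises 8 bytes.
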
Minimize Allocations
- Allocate what you use up front
- Avoid allocations in-frame!
- Don’t process data on load (other than decompression)
  - Block load data/flatten trees
- Avoid allocating small chunks of memory
- Prefer growable arrays/vectors over linked lists
- …or avoid allocations altogether… ;)
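
One way to "allocate up front" and keep the frame loop allocation-free is to reserve a vector once and refuse to grow it afterwards. The sketch below is an illustration under assumptions: `Particle` and `FramePool` are hypothetical names, not types from the talk.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Particle { float x, y, z, life; };

// Hypothetical fixed-capacity pool: all memory is reserved at load time,
// so Spawn() and Clear() never allocate during a frame.
class FramePool {
public:
    explicit FramePool(std::size_t maxParticles) : max_(maxParticles)
    {
        assert(maxParticles > 0);
        particles_.reserve(max_);  // the only allocation
    }

    // Returns false instead of growing the vector when the pool is full.
    bool Spawn(const Particle& p)
    {
        if (particles_.size() >= max_) return false;
        particles_.push_back(p);
        return true;
    }

    void Clear() { particles_.clear(); }  // keeps the reserved capacity
    std::size_t Size() const { return particles_.size(); }
    const Particle* Data() const { return particles_.data(); }

private:
    std::size_t max_;
    std::vector<Particle> particles_;
};
```

Because the storage is contiguous and never moves after construction, this also covers the "prefer vectors over linked lists" point.
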
“Hidden” Allocations
- STL
  - Lots of cases of allocations “under the hood”
- D3DX
  - Internal allocations
  - Not high performance
- XAudio
  - Does physical allocations under the hood
- Many other potential cases: be aware!
Wrap It Up…
- Write a general wrapper to sit on top of platform-specific APIs
  - Removes the need to worry about implementation details
  - Can plug in your own custom allocation scheme without changing code
- Override new, delete, malloc, free…
- Add debug features
  - Assert on debug
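
A wrapper of this kind might look roughly like the sketch below. Everything in the `mem` namespace is hypothetical, `std::malloc`/`std::free` stand in for whichever platform or custom allocator sits underneath, and the counters are not thread-safe.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

namespace mem {

// Debug bookkeeping (not thread-safe in this sketch).
struct Stats {
    std::size_t liveAllocations = 0;
    std::size_t totalAllocations = 0;
};

inline Stats& GetStats()
{
    static Stats stats;
    return stats;
}

// Single entry point for allocation. Swap the std::malloc call for a
// platform API or a custom heap without touching any calling code.
inline void* Alloc(std::size_t size)
{
    void* p = std::malloc(size);
    assert(p != nullptr && "out of memory");
    if (p) {
        ++GetStats().liveAllocations;
        ++GetStats().totalAllocations;
    }
    return p;
}

// Single entry point for deallocation, with a simple mismatch check.
inline void Free(void* p)
{
    if (!p) return;
    assert(GetStats().liveAllocations > 0 && "free without matching alloc");
    --GetStats().liveAllocations;
    std::free(p);
}

}  // namespace mem
```

Global `operator new` and `operator delete` overrides can then forward to `mem::Alloc` and `mem::Free`, which is where leak tracking or per-frame allocation asserts would hook in.
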
Multi-core Considerations
- Synchronization
- Overhead
- Complexity
- Memory usage efficiency
One Heap Owned per Thread
- Memory allocation managed in one place
- Heap synchronization at a higher level
  - Use HEAP_NO_SERIALIZE for heaps created with the HeapCreate(…) API
- Single heap: simplification
- Potential problem with contention
One Heap Per Thread
- Each thread creates its own heap
  - Reserves a chunk of virtual memory
  - Commits on demand (as the heap grows)
  - Data locality!
- Each thread manages its own allocations
  - No synchronization required
  - Use HEAP_NO_SERIALIZE for heaps created with the HeapCreate(…) API
- Assert thread ID on allocate/deallocate
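
The "assert thread ID on allocate/deallocate" rule can be sketched portably as below. `ThreadHeap` and `LocalHeap` are hypothetical names. `std::malloc` is only a stand-in backing store: it does its own locking, whereas a real implementation would sit on a private heap created with `HeapCreate(HEAP_NO_SERIALIZE, …)`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <thread>

// Hypothetical per-thread heap: remembers which thread created it and
// asserts that only that thread allocates from or frees to it.
class ThreadHeap {
public:
    ThreadHeap() : owner_(std::this_thread::get_id()) {}

    void* Alloc(std::size_t size)
    {
        assert(OwnedByCaller() && "allocation from a foreign thread");
        return std::malloc(size);  // stand-in for an unsynchronized private heap
    }

    void Free(void* p)
    {
        assert(OwnedByCaller() && "free from a foreign thread");
        std::free(p);
    }

    bool OwnedByCaller() const { return owner_ == std::this_thread::get_id(); }

private:
    std::thread::id owner_;
};

// One heap instance per thread, created lazily on first use.
inline ThreadHeap& LocalHeap()
{
    thread_local ThreadHeap heap;
    return heap;
}
```

The ownership assert is what makes dropping serialization safe in practice: a block handed to another thread and freed there trips the check in debug builds instead of silently corrupting the heap.
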
Low Fragmentation Heap
- Built on top of the existing heap
- Windows XP and Vista only
- Modify the default heap…
- …or a heap created with HeapCreate(…)
    // Enable the low-fragmentation heap on the process default heap.
    ULONG ulInfo = 2;  // 2 selects the LFH
    HANDLE hDefHeap = GetProcessHeap();
    HeapSetInformation( hDefHeap, HeapCompatibilityInformation,
                        &ulInfo, sizeof( ulInfo ) );
LFH – How It Works
Granularity   Bucket       Allocation size
8 bytes       Bucket 1     1 byte to 8 bytes
8 bytes       Bucket 32    248 bytes to 256 bytes
16 bytes      Bucket 33    257 bytes to 262 bytes
Largest       Bucket 127   15858 bytes to 15873 bytes
Largest       Bucket 128   15874 bytes to 16384 bytes
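
The bucket idea can be illustrated by mapping a request size to a bucket index. This is a sketch under assumptions, not the actual heap implementation: it models only the first two size classes, assumes the 16-byte class runs uniformly from 257 to 512 bytes (the slide's own range for bucket 33 reads differently), and uses hypothetical function names.

```cpp
#include <cassert>
#include <cstddef>

// Returns a 1-based bucket index for `size`, or 0 when the size falls
// outside the two classes modeled here (assumption: 8-byte steps up to
// 256 bytes, then 16-byte steps up to 512 bytes).
inline unsigned BucketForSize(std::size_t size)
{
    assert(size > 0);
    if (size <= 256) return static_cast<unsigned>((size + 7) / 8);
    if (size <= 512) return 32u + static_cast<unsigned>((size - 256 + 15) / 16);
    return 0;
}

// Block size actually handed out by a bucket; the difference between this
// and the requested size is internal waste.
inline std::size_t BucketBlockSize(unsigned bucket)
{
    if (bucket >= 1 && bucket <= 32) return bucket * 8u;
    if (bucket >= 33 && bucket <= 48) return 256u + (bucket - 32u) * 16u;
    return 0;
}
```

Rounding every request up to a bucket size trades a few wasted bytes per block for freed blocks that can be reused exactly, which is what keeps fragmentation low.
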
The C Run-Time
- Uses the process default heap
  - Uses this heap for temporary allocations
- new and new[] throw on an out-of-memory condition by default
  - Checking for NULL doesn’t buy anything
  - Can link with nothrownew.obj to shut this off
  - Watch out though: STL will then crash on OOM
  - Can also use std::nothrow, e.g. Foo *pFoo = new( std::nothrow ) Foo;
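
The difference between the throwing and non-throwing forms of `new` can be sketched as follows. `Foo` is taken from the slide's example; the two factory functions are hypothetical.

```cpp
#include <cassert>
#include <new>

struct Foo { int value = 42; };

// Non-throwing form: returns nullptr on failure, so a null check is
// meaningful here.
inline Foo* CreateFooNoThrow()
{
    return new (std::nothrow) Foo;
}

// Default form: failure surfaces as std::bad_alloc, so a null check after
// plain `new` never fires. Catch the exception if null is wanted instead.
inline Foo* CreateFooOrNull()
{
    try {
        return new Foo;
    } catch (const std::bad_alloc&) {
        return nullptr;
    }
}
```

Both results are released with a plain `delete`.
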
CRT Allocation
- Uses HeapAlloc(…) under the hood
  - App compat modes to emulate older CRT
- Allocations 8-byte aligned
  - _mm_malloc for aligned allocations (16 byte)
  - 16-byte aligned on Win64
- 8-15 bytes of overhead on Win32
- 16-31 bytes of overhead on Win64
- Overhead is adjacent to the allocation block
CRT Allocation
- Uses the global default heap
- CRT manages allocation
  - 16-byte aligned
  - 16-31 bytes of overhead per allocation
  - Overhead is adjacent to the allocation block
- This can be bad…
Bad…!
[Figure: allocations laid out across cache lines. Legend: cache line boundary; used memory (hot); overhead or unused (cold).]
- 16-byte allocations: 50% of the cache line unused!
- Very bad…! More than 80% of the cache line unused!
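
The percentages on this slide can be approximated with simple arithmetic. The original figure is not recoverable, so the payload and overhead sizes below are assumptions chosen to match the stated 50% and >80% cases; the model also ignores alignment padding.

```cpp
#include <cassert>
#include <cstddef>

// Percentage of bytes fetched into the cache that are payload ("hot")
// when every allocation carries `overheadBytes` of adjacent header.
constexpr unsigned HotPercent(std::size_t payloadBytes, std::size_t overheadBytes)
{
    return static_cast<unsigned>(100 * payloadBytes / (payloadBytes + overheadBytes));
}

// 16-byte payload with 16 bytes of overhead: half of each line is cold.
static_assert(HotPercent(16, 16) == 50, "16 + 16 should be 50% hot");

// 4-byte payload with 16 to 31 bytes of overhead: 80% or more is cold.
static_assert(HotPercent(4, 16) == 20, "4 + 16 should be 20% hot");
static_assert(HotPercent(4, 31) == 11, "4 + 31 should be 11% hot");
```

The smaller the allocation, the more of each cache line is spent on bookkeeping rather than data.
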
Better…
Best…
- Investigate writing a custom small block allocator
- Investigate custom heap options
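
The talk does not cover how to write custom allocators, so the following is only a minimal sketch of what a small block allocator can look like: a fixed-size pool with an intrusive free list. `SmallBlockPool` is a hypothetical name, and the pool is not thread-safe.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Hypothetical fixed-size block pool. Free blocks store the free-list link
// inside themselves, so live blocks carry no per-allocation header and pack
// tightly into cache lines.
template <std::size_t BlockSize, std::size_t BlockCount>
class SmallBlockPool {
    static_assert(BlockSize >= sizeof(void*), "block must hold a free-list link");
    static_assert(BlockSize % alignof(std::max_align_t) == 0,
                  "block size must preserve alignment");

public:
    SmallBlockPool()
    {
        // Thread every block onto the free list.
        for (std::size_t i = 0; i < BlockCount; ++i) Push(storage_ + i * BlockSize);
    }

    // Pops a block in O(1); returns nullptr when the pool is exhausted.
    void* Alloc()
    {
        if (!free_) return nullptr;
        Node* n = free_;
        free_ = n->next;
        return n;
    }

    // Pushes a block back in O(1).
    void Free(void* p)
    {
        assert(Owns(p) && "pointer does not belong to this pool");
        Push(p);
    }

    // True if `p` is the start of a block inside this pool's storage.
    bool Owns(const void* p) const
    {
        const std::uintptr_t a = reinterpret_cast<std::uintptr_t>(p);
        const std::uintptr_t base = reinterpret_cast<std::uintptr_t>(storage_);
        return a >= base && a < base + sizeof(storage_) && (a - base) % BlockSize == 0;
    }

private:
    struct Node { Node* next; };

    void Push(void* p) { free_ = new (p) Node{free_}; }

    alignas(std::max_align_t) unsigned char storage_[BlockSize * BlockCount];
    Node* free_ = nullptr;
};
```

For example, `SmallBlockPool<16, 1024>` serves 16-byte requests from one contiguous 16 KB region.
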
Platform Recommendations
Recommendations
- Use VirtualAlloc(…) to reserve custom heaps
  - Reasonably sized: <= 256 MB
  - Partition custom heaps to give asset data read-only access after load
- Memory-mapped files:
  - Keep size under control in Win32
  - Be more aggressive in Win64
- Be careful with execute rights
GlobalAlloc
- Old 16-bit memory model allocation
- Included only for:
  - Emulating old Win 3.x behavior
  - Clipboard interaction
- Don’t use this in your game!
Xbox 360 Memory APIs
- XPhysicalAlloc(…)
- XMemAlloc(…)
- XPhysicalProtect(…)
  - To change memory protection on a previously allocated chunk of memory
XPhysicalAlloc
- Not recommended for performance-critical code…
  - Walks through pages in memory to find space
  - Potentially rebases used VM pages in range (defragments)
- Recommended for infrequent allocations
  - Once only for asset data on asset load
  - Allocation for asset data space on startup
- Use write-combined memory
XPhysicalAlloc: The Algorithm
- Enters a spin-lock to block other allocations
- Checks range and page availability
  - Linear search of pages for a valid range
  - Searches each candidate range for immovable pages until none are found
- Relocates all used VM pages in range
- Flushes the Virtual Address List
- Flushes and invalidates processor caches
- Flushes and invalidates the TLB
XPhysicalProtect
- Not designed for performance
- Look into using D3D macros instead: GPU_CONVERT_*()
  - Defined in d3d9gpu.h
  - Example: GPU_CONVERT_GPU_TO_CPU_ADDRESS_64KB
- Look for the MemoryViews sample (coming in the September XDK)
XMemAlloc/XMemFree
- Allows custom memory management
  - Used by the XDK for many allocations
  - Developers may provide their own implementation
- XMemAllocDefault(…) is the “default”…
  - Wraps XPhysicalAlloc(…) for write-combined or physical memory allocations
  - Wraps LocalAlloc(…) for heap allocations
XMemAlloc/XMemFree
- Overriding the default is recommended
  - XAudio uses XMemAlloc(…) a lot internally to allocate physical memory
  - XMemAlloc(…) is used throughout the XDK
  - By overriding, you can significantly reduce some performance bottlenecks
- To override: just implement your own XMemAlloc(…)
  - Call XMemAllocDefault(…) for cases you don’t want/need to handle yourself
Caring for the Cache
- Keep related data together in memory
- Avoid small pages
  - TLB contains only 1024 entries
  - Lots of 4 KB pages can cause TLB misses
  - TLB misses are expensive
  - Prefer 64 KB pages
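
The case for 64 KB pages follows from simple arithmetic on how much memory the TLB can map at once. The sketch below uses the 1024-entry figure from the slide; the function name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t KB = 1024;
constexpr std::uint64_t MB = 1024 * KB;

// Amount of memory addressable without a TLB miss when every entry maps
// one page of `pageBytes`.
constexpr std::uint64_t TlbReachBytes(std::uint64_t entries, std::uint64_t pageBytes)
{
    return entries * pageBytes;
}

// 1024 entries of 4 KB pages cover only 4 MB of working set...
static_assert(TlbReachBytes(1024, 4 * KB) == 4 * MB, "4 KB pages reach 4 MB");

// ...while 64 KB pages cover 64 MB with the same number of entries.
static_assert(TlbReachBytes(1024, 64 * KB) == 64 * MB, "64 KB pages reach 64 MB");
```

A working set larger than the reach starts to miss in the TLB, so 64 KB pages give sixteen times as much headroom.
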
Select Types Wisely
- CPU write-only data (e.g. asset data)
  - Write-combined memory, 64-KB pages
- Large read/write datasets
  - Virtual memory, 64-KB pages
- Small/temporary allocations
  - Application heap
  - CRT/HeapAlloc/custom allocator
  - Be aware of overhead and cache issues from generic allocators
Write-Combined RAM
- Use for write-only data
  - Reading is very expensive
- Always write in order, in chunks of at least 4 bytes
- Double-check what the compiler is doing
  - It may reorder writes, which kills perf
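
The "write in order, at least 4 bytes at a time, never read back" rule can be sketched as below. `WriteSequential32` is a hypothetical helper. The `volatile` qualifier is one assumed way to keep the compiler from reordering or merging the stores, and the destination would normally be write-combined memory obtained from a platform allocation call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Copies `count` 32-bit words to `dst` in strictly ascending address order.
// The destination is never read. Accesses through a volatile pointer must
// be emitted as written, so the compiler cannot reorder these stores.
inline void WriteSequential32(volatile std::uint32_t* dst,
                              const std::uint32_t* src,
                              std::size_t count)
{
    assert(dst != nullptr && src != nullptr);
    for (std::size_t i = 0; i < count; ++i) {
        dst[i] = src[i];  // one 4-byte store per iteration
    }
}
```

Inspecting the generated assembly remains the only reliable way to confirm the store order, as the slide advises.
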
In Summary
- Avoid allocations in frame!
- Minimize allocations where possible
- Avoid VM/physical memory fragmentation
- Think about the overhead of standard APIs
- Consider a custom heap solution
- Prefer 64 KB pages
- DirectX Developer Center: http://msdn.microsoft.com/directx
- Game Development MSDN Forums: http://forums.microsoft.com/msdn
- Xbox 360 Central: http://xds.xbox.com/
- XNA Web site: http://www.microsoft.com/xna
© 2006 Microsoft Corporation. All rights reserved. This presentation is for informational purposes only. Microsoft makes no warranties, express or implied, in this summary.


