Introduction_DX8_Vertex_Shaders.ppt
- Number of slides: 45
Introduction to Vertex Shaders
Richard Huddy, Richard.H@nvidia.com
NVIDIA Proprietary
What you guys have been asking for…
• Complete control of the transformation and lighting pipeline
• Custom vertex lighting
• Custom skinning and blending
• Custom texgen
• Custom texture matrix operations
Enter the Vertex Shader
• Assembly language interface to the transformation and lighting engine
• Instruction set to perform all vertex T&L
• Constant table to store data (matrices, light position, attenuation, etc.)
• Registers to save intermediate data
• Reads an untransformed, unlit vertex
• Creates a transformed and lit vertex
Assembly language
• Fixed, complete, very powerful SIMD instruction set
• Four operations simultaneously (argb, xyzw)
• Dynamically loaded between primitive calls
• Extensive support for vector and matrix operations (lighting, rotations, etc.)
• Capable of efficiently implementing the entire functionality of the fixed-function pipeline
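Custom Substitute for Standard T&L
[Block diagram: the vertex shader sits between vertex input and vertex output, with access to constant memory and temporary registers]
• Vertex Input: 16 entries, each 128 bits (4 floats)
• Constant Memory: 96 entries, each 128 bits (4 floats), addressed through A0
• Vertex Shader: up to 128 instructions
• Registers: 12 entries, each 128 bits (4 floats)
• Vertex Output: 13 entries, each 128 bits (4 floats)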
What does it do?
• Per vertex calculation
• Processing of:
  • Colors
  • 3D coordinates: procedural geometry, blending, morphing, deformations
  • Texture coordinates: texgens, set up for pixel shaders, tangent space bumpmap setup
  • Fog: elevation based, volume based
• Shader program accepts one input vertex, generates one output vertex
What doesn’t it do?
• Does not perform polygon based operations
  • Back face culling
  • Occlusion culling
• Can’t write to other vertices
• Does not create vertices
What is calculated?
• Create a completely specified vertex.
• Vertex position in HCLIP (homogeneous clip) space
• And, optionally:
  • Texgen/texture matrix/texture coord output
  • Lighting/color output
  • Fog
Then what happens?
• Frustum clip
• Homogeneous divide
• Viewport mapping
• Back face cull
• Rasterization
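Flexible Input Sign and Muxing
[Diagram: each source component (X, Y, Z, W) passes through an optional negation ([-1]) and a multiplexer that can route any input component to any output position; examples shown: v.xxxx and v.zxyw]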
Swizzles
Source registers can be swizzled:
    MOV R1, R2.yzwx;
before:  R1 = (0.0, 0.0, 0.0, 0.0)    R2 = (x 7.0, y 3.0, z 6.0, w 2.0)
after:   R1 = (x 3.0, y 6.0, z 2.0, w 7.0)    R2 = (x 7.0, y 3.0, z 6.0, w 2.0)
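To make the swizzle rule concrete, here is a minimal CPU-side sketch in C++. It is not part of the deck: `Vec4`, `componentIndex` and `swizzle` are hypothetical names chosen for illustration, and the code only mimics how a source register is reordered before the instruction reads it.

```cpp
#include <array>
#include <cstddef>
#include <string>

using Vec4 = std::array<float, 4>;  // components in x, y, z, w order

// Map a component letter to its index: x=0, y=1, z=2, w=3.
inline std::size_t componentIndex(char c) {
    switch (c) {
        case 'x': return 0;
        case 'y': return 1;
        case 'z': return 2;
        default:  return 3;  // treat anything else as 'w'
    }
}

// Hypothetical helper: return src reordered by a four-letter pattern such as "yzwx".
inline Vec4 swizzle(const Vec4& src, const std::string& pattern) {
    Vec4 out{};
    for (std::size_t i = 0; i < 4; ++i) {
        out[i] = src[componentIndex(pattern.at(i))];
    }
    return out;
}
```

With R2 = (7, 3, 6, 2), `swizzle(r2, "yzwx")` yields (3, 6, 2, 7), matching the slide's MOV example. The source register itself is left unchanged.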
Negations
Source registers can be negated (and swizzled):
    MOV R1, -R2.yzzx;
before:  R1 = (0.0, 0.0, 0.0, 0.0)    R2 = (x 7.0, y 3.0, z 6.0, w 2.0)
after:   R1 = (x -3.0, y -6.0, z -6.0, w -7.0)    R2 = (x 7.0, y 3.0, z 6.0, w 2.0)
Cross Product
    |  i     j     k   |
    | R0.x  R0.y  R0.z |  =  (R0.y*R1.z - R1.y*R0.z) i
    | R1.x  R1.y  R1.z |   + (R0.z*R1.x - R1.z*R0.x) j
                           + (R0.x*R1.y - R1.x*R0.y) k
• Or: (R0.yzx * R1.zxy - R1.yzx * R0.zxy)
    MUL R2, R0.yzxw, R1.zxyw;
    MAD R2, -R1.yzxw, R0.zxyw, R2
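The two-instruction cross product can be checked on the CPU. The following is a sketch under the assumption that MUL and MAD act component-wise on four floats; the helper names (`mul`, `mad`, `negate`, `yzxw`, `zxyw`, `cross`) are illustrative and not from the deck.

```cpp
#include <array>

using Vec4 = std::array<float, 4>;  // x, y, z, w

// Component-wise multiply, like MUL dest, a, b.
inline Vec4 mul(const Vec4& a, const Vec4& b) {
    return {a[0] * b[0], a[1] * b[1], a[2] * b[2], a[3] * b[3]};
}

// Component-wise multiply-add, like MAD dest, a, b, c.
inline Vec4 mad(const Vec4& a, const Vec4& b, const Vec4& c) {
    return {a[0] * b[0] + c[0], a[1] * b[1] + c[1],
            a[2] * b[2] + c[2], a[3] * b[3] + c[3]};
}

// Source modifiers: negation and the two swizzles used by the idiom.
inline Vec4 negate(const Vec4& a) { return {-a[0], -a[1], -a[2], -a[3]}; }
inline Vec4 yzxw(const Vec4& a) { return {a[1], a[2], a[0], a[3]}; }
inline Vec4 zxyw(const Vec4& a) { return {a[2], a[0], a[1], a[3]}; }

// Cross product of the xyz parts of r0 and r1, following the slide's two instructions.
inline Vec4 cross(const Vec4& r0, const Vec4& r1) {
    Vec4 r2 = mul(yzxw(r0), zxyw(r1));         // MUL R2, R0.yzxw, R1.zxyw
    r2 = mad(negate(yzxw(r1)), zxyw(r0), r2);  // MAD R2, -R1.yzxw, R0.zxyw, R2
    return r2;
}
```

For example, crossing (1, 2, 3) with (4, 5, 6) gives (-3, 6, -3) in x, y, z. The w component carries no meaning here; with both w inputs equal the two products cancel.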
Masks
Destination register can mask which components are written to…
• R1      write all components
• R1.x    write only x component
• R1.xw   write only x, w components
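A write mask can be pictured as a per-component "store enable". The sketch below is an illustration only; `writeMasked` is a hypothetical helper, and the mask is passed as a string such as "xw" purely for readability.

```cpp
#include <array>
#include <cstddef>
#include <string>

using Vec4 = std::array<float, 4>;  // x, y, z, w

// Hypothetical helper: copy src into dest only for the components named in mask.
// Components not named in the mask keep their previous value.
inline void writeMasked(Vec4& dest, const Vec4& src, const std::string& mask) {
    const std::string names = "xyzw";
    for (std::size_t i = 0; i < 4; ++i) {
        if (mask.find(names[i]) != std::string::npos) {
            dest[i] = src[i];
        }
    }
}
```

Writing (9, 9, 9, 9) into R1 = (1, 2, 3, 4) with mask "xw" leaves (9, 2, 3, 9).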
The “Constant Area”
• 96 entries, each is a Vec4.
• Typical uses:
  • Matrix data: 4 Vec4s are typically the matrix for the transform
  • Light characteristics (position, attenuation, etc.)
  • Current time
  • Vertex interpolation data
  • Procedural data
• Only one constant per instruction (but you can use it several times if you want)
  E.g. mad r1, v0, c[95], c[95]
• Note that this is how you access all constant data
Input Vertex Data
• 16 Vec4s from your own VertexBuffer(s)
• Input vertex is completely flexible
• “Weakly typed”, meaning it’s up to you to interpret it consistently
• Position, normal, texture coordinates, etc.
Vertex output data
• A well defined vertex
• HCLIP (x, y, z, w): oPos
• Diffuse color (r, g, b, a), 0.0 to +1.0: oD0
• Specular color (r, g, b, a), 0.0 to +1.0: oD1
• Up to 4 texture coordinates (each as s, t, r, q)
  • One for each physical hardware texture unit; the output registers are named oT0 to oT7
• Fog (f, *, *, *), value used in fog equation: oFog
• Outputs of shader clamped as required
Instruction format
Generally of the form:
    OpName dest, [-]s1 [, [-]s2 [, [-]s3]] ; comment
e.g.
    mov r1, r2
    mad r1, r2, r3, r4
• Destination ‘r’ can have a write-mask
• Source ‘r’ can be swizzled
e.g.
    mov r1.x, r2.y
    mov r1, r2.zxyw
• ‘[’ and ‘]’ indicate optional modifiers
What are the instructions?
• nop, mov, mul, mad, add, rsq
• dp3, dp4, dst, lit, min, max
• slt, sge, expp, log, rcp
nop, mov, mul
• nop
  • Do nothing
• mov dest, src
  • Move (with optional sign change, mask and swizzle)
• mul dest, src1, src2
  • Set dest to the product of src1 and src2
add, mad, rsq
• add dest, src1, src2
  • Add src1 to src2. [And the optional negation creates subtraction]
• mad dest, src1, src2, src3
  • Multiply src1 by src2 and add src3, writing the result into dest
• rsq dest, src
  • Source must have exactly one subscript (x, y, z or w)
  • dest.x = dest.y = dest.z = dest.w = 1/sqrt(src)
  • Reciprocal square root of src (much more useful than straight ‘square root’).
dp3, dp4
• 3 and 4 component dot products
• dp3 dest, src1, src2
  • dest.x = dest.y = dest.z = dest.w = (src1.x * src2.x) + (src1.y * src2.y) + (src1.z * src2.z)
• And dp4 does the same but includes ‘w’ in the computation
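As a quick reference, the two dot products can be written out in C++. This is an illustrative sketch: `Vec4`, `dp3` and `dp4` are names chosen here, and the functions return the scalar that the hardware would replicate into every written component of dest.

```cpp
#include <array>

using Vec4 = std::array<float, 4>;  // x, y, z, w

// dp3: dot product over x, y, z only; w is ignored.
inline float dp3(const Vec4& a, const Vec4& b) {
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

// dp4: the same sum, plus the w term.
inline float dp4(const Vec4& a, const Vec4& b) {
    return dp3(a, b) + a[3] * b[3];
}
```

For a = (1, 2, 3, 4) and b = (5, 6, 7, 8), dp3 gives 38 and dp4 gives 70.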
min, max
• min dest, src1, src2
  • Component-wise min operation
• max dest, src1, src2
  • Component-wise max operation
slt, sge
• slt dest, src1, src2
  • dest = (src1 < src2) ? 1 : 0
  • For each component…
• sge dest, src1, src2
  • dest = (src1 >= src2) ? 1 : 0
  • Which is equivalent to dest = (src1 < src2) ? 0 : 1, i.e. the exact opposite of slt
  • For each component…
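The per-component behaviour of the two comparison instructions can be sketched as follows. The function names mirror the opcodes, but the C++ itself is an illustration, not code from the deck.

```cpp
#include <array>
#include <cstddef>

using Vec4 = std::array<float, 4>;  // x, y, z, w

// slt: each component becomes 1.0 where a < b, otherwise 0.0.
inline Vec4 slt(const Vec4& a, const Vec4& b) {
    Vec4 out{};
    for (std::size_t i = 0; i < 4; ++i) out[i] = (a[i] < b[i]) ? 1.0f : 0.0f;
    return out;
}

// sge: each component becomes 1.0 where a >= b, otherwise 0.0.
inline Vec4 sge(const Vec4& a, const Vec4& b) {
    Vec4 out{};
    for (std::size_t i = 0; i < 4; ++i) out[i] = (a[i] >= b[i]) ? 1.0f : 0.0f;
    return out;
}
```

Comparing (1, 2, 3, 4) against (2, 2, 2, 2) gives (1, 0, 0, 0) for slt and (0, 1, 1, 1) for sge, so the two results are complementary for ordinary (non-NaN) inputs.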
dst
• dst dest, src1, src2
  • Calculate distance vector.
  • src1 vector is (NA, d*d, d*d, NA) and src2 is (NA, 1/d, NA, 1/d).
  • dest is set to (1, d, d*d, 1/d)
• Which is what you want for standard attenuation…
lit
• lit dest, src
  • Calculates lighting coefficients from two dot products and a power.
• src is:
  • src.x = n • l (unit normal and light vectors)
  • src.y = n • h (unit normal and half-angle vectors)
  • src.z is unused
  • src.w = power (in range +128 to -128)
• dest is set to (1.0, MAX(src.x, 0), L, 1.0)
  • If src.x > 0.0 then L = MAX(src.y, 0) ^ src.w
  • else L = 0
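The lit rules can be sketched on the CPU as below. This is an approximation for illustration: it uses `std::pow` at full float precision, whereas the precision of the hardware instruction is not stated in the deck, and the names are chosen here.

```cpp
#include <algorithm>
#include <array>
#include <cmath>

using Vec4 = std::array<float, 4>;  // x, y, z, w

// lit: src = (n.l, n.h, unused, power) -> (1, diffuse, specular, 1).
inline Vec4 lit(const Vec4& src) {
    Vec4 dest{1.0f, 0.0f, 0.0f, 1.0f};
    if (src[0] > 0.0f) {
        dest[1] = src[0];                                    // diffuse term
        dest[2] = std::pow(std::max(src[1], 0.0f), src[3]);  // specular term L
    }
    return dest;
}
```

With src = (0.5, 0.5, 0, 2) the result is (1, 0.5, 0.25, 1); with a negative n.l both lighting terms are zero. Feeding (x, x, y, y) returns x^y in the z component for x > 0, which is the basis of the "raise to the power" idiom.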
expp, log
• expp dest, src.w
  • dest.x = 2 ** floor(src.w)
  • dest.y = fractional part (src.w)
  • dest.z = 2 ** src.w
  • dest.w = 1.0
• log dest, src.w
  • dest.x = exponent(src.w)
  • dest.y = mantissa(src.w)
  • dest.z = log2(src.w)
  • dest.w = 1.0
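The component layout of the two results can be sketched in C++ as follows. Several things here are assumptions: the fractional part is taken as s - floor(s), the mantissa is assumed to lie in [1, 2), and full float precision is used where the hardware may return an approximation. The names `expp` and `logParts` are illustrative.

```cpp
#include <array>
#include <cmath>

using Vec4 = std::array<float, 4>;  // x, y, z, w

// expp-style result: (2^floor(s), s - floor(s), 2^s, 1).
inline Vec4 expp(float s) {
    const float whole = std::floor(s);
    return {std::exp2(whole), s - whole, std::exp2(s), 1.0f};
}

// log-style result for s > 0: (exponent, mantissa, log2(s), 1),
// where s == mantissa * 2^exponent and mantissa is assumed to be in [1, 2).
inline Vec4 logParts(float s) {
    const float exponent = std::logb(s);
    const float mantissa = std::scalbn(s, -static_cast<int>(exponent));
    return {exponent, mantissa, std::log2(s), 1.0f};
}
```

For s = 2.5, `expp` gives a whole-power term of 4 and a fractional part of 0.5. For s = 10, `logParts` gives exponent 3 and mantissa 1.25.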
rcp
• rcp dest, src.w
  • Source must have just one subscript (x, y, z or w)
  • dest.x = dest.y = dest.z = dest.w = 1 / src.w
• So… this is the other half of the puzzle for division
  • … you divide by doing a ‘rcp’ and then a ‘mul’
Raise to the Power
    ; compute scalar r0.z = r1.x^r1.y
    ; r1.x must be greater than zero
    LIT r0.z, r1.xxyy
Absolute Value
    ; r0 = |r1|
    MAX r0, r1, -r1
Division
    ; scalar r0.x = r1.x/r2.x
    RCP r0.x, r2.x
    MUL r0.x, r1.x, r0.x
Square Root
    ; scalar r0.x = sqrt(r1.x)
    ; using x/sqrt(x) = sqrt(x) is higher precision than 1/( 1/sqrt(x) )
    RSQ r0.x, r1.x
    MUL r0.x, r0.x, r1.x
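The division and square-root idioms can be mirrored in C++ to see the arithmetic. This is a sketch with illustrative names; `rcp` and `rsq` here are exact float operations, while the hardware versions may be approximations.

```cpp
#include <cmath>

// Scalar stand-ins for the two reciprocal instructions.
inline float rcp(float s) { return 1.0f / s; }
inline float rsq(float s) { return 1.0f / std::sqrt(s); }

// Division as RCP followed by MUL.
inline float divide(float a, float b) { return a * rcp(b); }

// Square root as RSQ followed by MUL: x * (1 / sqrt(x)) == sqrt(x) for x > 0.
inline float sqrtViaRsq(float x) { return x * rsq(x); }
```

Note that in IEEE floating point `sqrtViaRsq(0.0f)` evaluates 0 * infinity and returns NaN, so in this C++ sketch the idiom is only valid for strictly positive input.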
Set on Less or Equal
    ; r0 = (r1 <= r2) ? 1 : 0
    SGE r0, -r1, -r2
Set on Greater
    ; r0 = (r1 > r2) ? 1 : 0
    SLT r0, -r1, -r2
Set on Equal
    ; r0 = (r1 == r2) ? 1 : 0
    ; note: r2 is overwritten
    SGE r0, -r1, -r2
    SGE r2, r1, r2
    MUL r0, r0, r2
Set on Not Equal
    ; r0 = (r1 != r2) ? 1 : 0
    ; note: r2 is overwritten
    SLT r0, r1, r2
    SLT r2, -r1, -r2
    ADD r0, r0, r2
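The four comparison idioms all rest on negating both operands, which flips the direction of the test. A scalar C++ sketch (illustrative names, one component only) shows each composition:

```cpp
// Scalar stand-ins for the two comparison instructions.
inline float slt(float a, float b) { return a < b ? 1.0f : 0.0f; }
inline float sge(float a, float b) { return a >= b ? 1.0f : 0.0f; }

// a <= b  is the same test as  -a >= -b.
inline float setLessEqual(float a, float b) { return sge(-a, -b); }

// a > b  is the same test as  -a < -b.
inline float setGreater(float a, float b) { return slt(-a, -b); }

// a == b  when both a <= b and a >= b hold: multiply the two flags.
inline float setEqual(float a, float b) { return sge(-a, -b) * sge(a, b); }

// a != b  when either a < b or a > b holds: add the two flags (at most one is set).
inline float setNotEqual(float a, float b) { return slt(a, b) + slt(-a, -b); }
```

Because the flags are exactly 0.0 or 1.0, multiplication acts as a logical AND and, for mutually exclusive flags, addition acts as a logical OR.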
Clamp to [0, 1] Range
    ; compute r0 = (r0 < 0) ? 0 : (r0 > 1) ? 1 : r0
    DEF c0, 0.0f, 1.0f, 0.0f, 0.0f
    MAX r0, r0, c0.x
    MIN r0, r0, c0.y
Compute the Floor
    ; scalar r0.y = floor(r1.y)
    EXPP r0.y, r1.y
    ADD r0.y, r1.y, -r0.y
Compute the Ceiling
    ; scalar r0.y = ceiling(r1.y)
    EXPP r0.y, -r1.y
    ADD r0.y, r1.y, r0.y
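Both rounding idioms rely only on the fractional part that EXPP writes to the y component. The sketch below assumes that fractional part is s - floor(s), so it is always in [0, 1); the helper names are illustrative. `std::floor` appears only to emulate the instruction's output, not as the technique itself.

```cpp
#include <cmath>

// Emulates the y output of EXPP: the fractional part of s, assumed to be s - floor(s).
inline float exppFrac(float s) { return s - std::floor(s); }

// floor(x) = x - frac(x)        (EXPP r0.y, r1.y   then  ADD r0.y, r1.y, -r0.y)
inline float floorViaFrac(float x) { return x - exppFrac(x); }

// ceiling(x) = x + frac(-x)     (EXPP r0.y, -r1.y  then  ADD r0.y, r1.y, r0.y)
inline float ceilViaFrac(float x) { return x + exppFrac(-x); }
```

For x = 2.25, frac(-2.25) is 0.75, so the ceiling is 3. For an integer input the fractional part is 0 and both functions return x unchanged.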
How do I branch?
• No branching, no early out
• Why? Performance and predictability
  • No execution dependencies
• You can multiply by zero, and accumulate
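"Multiply by zero, and accumulate" can be sketched as a branchless select: compute both candidate results, build a 0/1 flag with a comparison, and blend. The function below is an illustration with hypothetical names, written for one scalar component.

```cpp
// Scalar stand-in for SGE: 1.0 where a >= b, otherwise 0.0.
inline float sge(float a, float b) { return a >= b ? 1.0f : 0.0f; }

// Computes (x >= threshold) ? whenTrue : whenFalse without a branch in the blend:
// the unwanted candidate is multiplied by zero and the other by one.
inline float select(float x, float threshold, float whenTrue, float whenFalse) {
    const float mask = sge(x, threshold);
    return mask * whenTrue + (1.0f - mask) * whenFalse;
}
```

Both candidates are always evaluated, so the cost is the sum of both paths. In IEEE arithmetic the rejected candidate must also be finite, since zero times infinity or NaN does not give zero.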
Example Vertex Declaration
    struct POSCOLORVERTEX
    {
        FLOAT x, y, z;
        DWORD diffColor;
    };

    // Each element: stream, byte offset, type, method, usage, usage index
    D3DVERTEXELEMENT9 dwDecl3[] =
    {
        {0, 0,  D3DDECLTYPE_FLOAT3,   D3DDECLMETHOD_DEFAULT, D3DDECLUSAGE_POSITION, 0},
        {0, 12, D3DDECLTYPE_D3DCOLOR, D3DDECLMETHOD_DEFAULT, D3DDECLUSAGE_COLOR,    0},
        D3DDECL_END()
    };

    LPDIRECT3DVERTEXDECLARATION9 m_pVertexDeclaration;
    m_pd3dDevice->CreateVertexDeclaration(dwDecl3, &m_pVertexDeclaration);
    m_pd3dDevice->SetVertexDeclaration(m_pVertexDeclaration);
Example Vertex Declaration
    struct POSCOLORVERTEX
    {
        FLOAT x, y, z;
        DWORD diffColor;
        float tu0, tv0;
    };

    // Each element: stream, byte offset, type, method, usage, usage index
    D3DVERTEXELEMENT9 dwDecl3[] =
    {
        {0, 0,  D3DDECLTYPE_FLOAT3,   D3DDECLMETHOD_DEFAULT, D3DDECLUSAGE_POSITION, 0},
        {0, 12, D3DDECLTYPE_D3DCOLOR, D3DDECLMETHOD_DEFAULT, D3DDECLUSAGE_COLOR,    0},
        {0, 16, D3DDECLTYPE_FLOAT2,   D3DDECLMETHOD_DEFAULT, D3DDECLUSAGE_TEXCOORD, 0},
        D3DDECL_END()
    };

    LPDIRECT3DVERTEXDECLARATION9 m_pVertexDeclaration;
    m_pd3dDevice->CreateVertexDeclaration(dwDecl3, &m_pVertexDeclaration);
    m_pd3dDevice->SetVertexDeclaration(m_pVertexDeclaration);
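The byte offsets in the declaration (0, 12, 16) have to match the layout of the vertex struct. One way to guard against drift is to check them at compile time. The sketch below is portable C++ with stand-in typedefs for `FLOAT` and `DWORD` (an assumption, so that it compiles without the Windows headers) and a hypothetical struct name.

```cpp
#include <cstddef>
#include <cstdint>

// Stand-ins for the Windows typedefs (assumption: 32-bit float, 32-bit unsigned).
using FLOAT = float;
using DWORD = std::uint32_t;

// Same member order as the vertex struct with position, color and one texcoord set.
struct PosColorTexVertex {
    FLOAT x, y, z;    // position: 3 floats, 12 bytes
    DWORD diffColor;  // packed diffuse color, 4 bytes
    FLOAT tu0, tv0;   // texture coordinates: 2 floats, 8 bytes
};

// These must agree with the offsets written in the vertex declaration.
static_assert(offsetof(PosColorTexVertex, x) == 0, "position offset");
static_assert(offsetof(PosColorTexVertex, diffColor) == 12, "color offset");
static_assert(offsetof(PosColorTexVertex, tu0) == 16, "texcoord offset");
static_assert(sizeof(PosColorTexVertex) == 24, "vertex stride");
```

If a member is added or reordered, the build fails at the assertion instead of the shader silently reading the wrong bytes.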
Example Vertex Shader
    vs_1_1
    dcl_position v0
    dcl_color0   v1
    dcl_texcoord v2

    #define CV_WORLDVIEWPROJ0 0
    #define CV_WORLDVIEWPROJ1 1
    #define CV_WORLDVIEWPROJ2 2
    #define CV_WORLDVIEWPROJ3 3

    def c4, 1, 1, 1, 1

    ; transform to clip space
    dp4 oPos.x, v0, c[CV_WORLDVIEWPROJ0]
    dp4 oPos.y, v0, c[CV_WORLDVIEWPROJ1]
    dp4 oPos.z, v0, c[CV_WORLDVIEWPROJ2]
    dp4 oPos.w, v0, c[CV_WORLDVIEWPROJ3]

    ; write out color
    dp3 oD0, v1, c4

    ; write texture coords
    mov oT0.xy, v2
Create Vertex Shader
    DWORD dwFlags = 0;
    dwFlags |= D3DXSHADER_DEBUG;
    LPD3DXBUFFER pCode = NULL;
    LPD3DXBUFFER pErrors = NULL;
    LPDIRECT3DVERTEXSHADER9 m_pVertexShader = NULL;

    // Assemble the shader source: no macro definitions, no include handler
    HRESULT hrErr = D3DXAssembleShaderFromFile("dx9/vshader.vsh", NULL, NULL, dwFlags, &pCode, &pErrors);
    if (pErrors)
    {
        // Send the assembler messages to the debugger output before releasing the buffer
        char* szErrors = (char*)pErrors->GetBufferPointer();
        OutputDebugStringA(szErrors);
        pErrors->Release();
    }
    if (FAILED(hrErr))
    {
        MessageBox(NULL, "vertex shader creation failed", "CRendererDX9::Create", MB_OK | MB_ICONEXCLAMATION);
        return false;
    }

    // Create the shader object from the assembled token stream
    hrErr = m_pDevice->CreateVertexShader((DWORD*)pCode->GetBufferPointer(), &m_pVertexShader);
    pCode->Release();
    if (FAILED(hrErr))
    {
        MessageBox(NULL, "CreateVertexShader failed", "CRendererDX9::Create", MB_OK | MB_ICONEXCLAMATION);
        return false;
    }
    m_pDevice->SetVertexShader(m_pVertexShader);
Set Constants
    D3DXMATRIX mtWorld;
    D3DXMATRIX mtView;
    D3DXMATRIX mtProj;
    D3DXMATRIX mtWorldView;
    D3DXMATRIX mtWorldViewProj;

    // World * View, then (WorldView * Proj) transposed so each row feeds one dp4
    D3DXMatrixMultiply(&mtWorldView, &mtWorld, &mtView);
    D3DXMatrixMultiplyTranspose(&mtWorldViewProj, &mtWorldView, &mtProj);

    // Upload 4 Vec4 constants starting at c0
    m_pDevice->SetVertexShaderConstantF(0, (float*)&mtWorldViewProj, 4);
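The reason for uploading the transposed matrix can be seen with a small CPU-side sketch. It assumes the row-vector convention (v * M) and that each constant register receives one row of the uploaded matrix; `Mat4`, `mulRowVector` and `transformWithDp4` are illustrative names, not D3DX functions.

```cpp
#include <array>
#include <cstddef>

using Vec4 = std::array<float, 4>;  // x, y, z, w
using Mat4 = std::array<Vec4, 4>;   // four rows

inline float dp4(const Vec4& a, const Vec4& b) {
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
}

inline Mat4 transpose(const Mat4& m) {
    Mat4 t{};
    for (std::size_t r = 0; r < 4; ++r)
        for (std::size_t c = 0; c < 4; ++c) t[c][r] = m[r][c];
    return t;
}

// Row-vector transform: out[c] = sum over r of v[r] * m[r][c] (a dot with column c).
inline Vec4 mulRowVector(const Vec4& v, const Mat4& m) {
    Vec4 out{};
    for (std::size_t c = 0; c < 4; ++c)
        for (std::size_t r = 0; r < 4; ++r) out[c] += v[r] * m[r][c];
    return out;
}

// What four dp4 instructions compute when registers c0..c3 hold the rows of 'constants'.
inline Vec4 transformWithDp4(const Vec4& v, const Mat4& constants) {
    return {dp4(v, constants[0]), dp4(v, constants[1]),
            dp4(v, constants[2]), dp4(v, constants[3])};
}
```

Since dp4 reads a whole register, and a register holds a row, the columns of the original matrix must become rows before upload: `transformWithDp4(v, transpose(m))` equals `mulRowVector(v, m)`.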