
Introduction_DX8_Vertex_Shaders.ppt


Introduction to Vertex Shaders
Richard Huddy
Richard.H@nvidia.com
NVIDIA Proprietary

What you guys have been asking for…
• Complete control of the transformation and lighting pipeline
• Custom vertex lighting
• Custom skinning and blending
• Custom texgen
• Custom texture matrix operations

Enter the Vertex Shader
• Assembly language interface to the transformation and lighting engine
• Instruction set to perform all vertex T&L
• Constant table to store data (matrices, light position, attenuation, etc.)
• Registers to save intermediate data
• Reads an untransformed, unlit vertex
• Creates a transformed and lit vertex

Assembly language
• Fixed, complete, very powerful SIMD instruction set
• Four operations simultaneously (argb, xyzw)
• Dynamically loaded between primitive calls
• Extensive support for vector and matrix operations (lighting, rotations, etc.)
• Capable of efficiently implementing the entire functionality of the Fixed Function Pipeline

Custom Substitute for Standard T&L
[Block diagram of the vertex shader and its storage]
• Vertex Input: 16 entries, each 128 bits (4 floats)
• Constant Memory: 96 entries, each 128 bits (4 floats), addressed through A0
• Registers: 12 entries, each 128 bits (4 floats)
• Vertex Shader: 128 instructions
• Vertex Output: 13 entries, each 128 bits (4 floats)

What does it do?
• Per vertex calculation
• Processing of:
  • Colors
  • 3D coordinates - procedural geometry, blending, morphing, deformations
  • Texture coordinates – texgens, set up for pixel shaders, tangent space bumpmap setup
  • Fog – elevation based, volume based
• Shader program accepts one input vertex, generates one output vertex

What doesn’t it do?
• Does not perform polygon based operations
  • Back face culling
  • Occlusion culling
• Can’t write to other vertices
• Does not create vertices

What is calculated?
• Create a completely specified vertex.
  • Vertex position in HCLIP space
• And, optionally:
  • Texgen/texture matrix/texture coord output
  • Lighting/color output
  • Fog

Then what happens?
• Frustum clip
• Homogeneous divide
• Viewport mapping
• Back face cull
• Rasterization

Flexible Input Sign and Muxing
[Diagram: each source component (X, Y, Z, W) can be routed to any input lane and optionally multiplied by -1; labelled examples: V.xxxx and V.zxyw]

Swizzles
Source registers can be swizzled:
    MOV R1, R2.yzwx;

                 x     y     z     w
    before  R1   0.0   0.0   0.0   0.0
            R2   7.0   3.0   6.0   2.0
    after   R1   3.0   6.0   2.0   7.0
            R2   7.0   3.0   6.0   2.0

Negations
Source registers can be negated (and swizzled):
    MOV R1, -R2.yzzx;

                 x     y     z     w
    before  R1   0.0   0.0   0.0   0.0
            R2   7.0   3.0   6.0   2.0
    after   R1  -3.0  -6.0  -6.0  -7.0
            R2   7.0   3.0   6.0   2.0

Cross Product
    | i     j     k    |
    | R0.x  R0.y  R0.z |  =  (R0.y*R1.z – R1.y*R0.z)i +
    | R1.x  R1.y  R1.z |     (R0.z*R1.x – R1.z*R0.x)j +
                             (R0.x*R1.y – R1.x*R0.y)k
• Or (R0.yzx * R1.zxy – R1.yzx * R0.zxy):
    MUL R2, R0.yzxw, R1.zxyw;
    MAD R2, -R1.yzxw, R0.zxyw, R2
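The swizzle, negation and cross product slides can be checked on the CPU. The following is a minimal C++ sketch, not shader code: `Vec4`, `swizzle`, `negate`, `mul`, `mad` and `cross` are hypothetical helper names introduced here to imitate what the MOV, MUL and MAD source modifiers do.

```cpp
#include <array>

// A 4-component register (x, y, z, w), as used throughout the slides.
using Vec4 = std::array<float, 4>;

// Source swizzle: choose any component (0=x, 1=y, 2=z, 3=w) for each lane.
Vec4 swizzle(const Vec4& r, int a, int b, int c, int d) {
    return {r[a], r[b], r[c], r[d]};
}

// Source negation: flips the sign of every lane.
Vec4 negate(const Vec4& r) {
    return {-r[0], -r[1], -r[2], -r[3]};
}

// MUL dest, s1, s2: component-wise product.
Vec4 mul(const Vec4& a, const Vec4& b) {
    return {a[0] * b[0], a[1] * b[1], a[2] * b[2], a[3] * b[3]};
}

// MAD dest, s1, s2, s3: component-wise s1 * s2 + s3.
Vec4 mad(const Vec4& a, const Vec4& b, const Vec4& c) {
    return {a[0] * b[0] + c[0], a[1] * b[1] + c[1],
            a[2] * b[2] + c[2], a[3] * b[3] + c[3]};
}

// Cross product in two instructions, as on the slide:
//   MUL R2, R0.yzxw, R1.zxyw
//   MAD R2, -R1.yzxw, R0.zxyw, R2
Vec4 cross(const Vec4& r0, const Vec4& r1) {
    Vec4 r2 = mul(swizzle(r0, 1, 2, 0, 3), swizzle(r1, 2, 0, 1, 3));
    return mad(negate(swizzle(r1, 1, 2, 0, 3)), swizzle(r0, 2, 0, 1, 3), r2);
}
```

With R2 = (7, 3, 6, 2), `swizzle(R2, 1, 2, 3, 0)` reproduces the R2.yzwx example, and `cross` of the x and y unit vectors gives the z unit vector. The w lane of the cross product always cancels to zero.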

Masks
Destination register can mask which components are written to…
    R1      write all components
    R1.x    write only x component
    R1.xw   write only x, w components

The “Constant Area”
• 96 entries, each is a Vec4. Typical uses:
  • Matrix data - 4 Vec4’s are typically the matrix for the transform
  • Light characteristics (position, attenuation, etc.)
  • Current time
  • Vertex interpolation data
  • Procedural data
• Only one constant per instruction (but you can use it several times if you want)
  • E.g. mad r1, v0, c[95]
  • Note that this is how you access all constant data

Input Vertex Data
• 16 Vec4’s from your own VertexBuffer(s)
• Input vertex is completely flexible
  • “Weakly typed” – meaning it’s up to you to interpret it consistently
  • Position, normal, texture coordinates etc.

Vertex output data
• A well defined vertex
  • HCLIP (x, y, z, w) - oPos
  • Diffuse color (r, g, b, a) -> 0.0 to +1.0 – oD0
  • Specular color (r, g, b, a) -> 0.0 to +1.0 – oD1
  • Up to 4 texture coordinates (each as s, t, r, q)
    • One for each physical hardware texture unit – oT0-oT7
  • Fog (f, *, *, *) -> value used in fog equation - oFog
• Outputs of shader clamped as required

Instruction format
Generally of the form:
    OpName dest, [-]s1 [, [-]s2 [, [-]s3]] ; comment
e.g.
    mov r1, r2
    mad r1, r2, r3, r4
Destination ‘r’ can have a write-mask
Source ‘r’ can be swizzled
e.g.
    mov r1.x, r2.y
    mov r1, r2.zxyw
‘[’ and ‘]’ indicate optional modifiers

What are the instructions?
• nop    • mov    • mul    • mad    • add    • rsq
• dp3    • dp4    • dst    • lit    • min    • max
• slt    • sge    • expp   • log    • rcp

nop, mov, mul
• nop
  • Do nothing
• mov dest, src
  • Move (with optional sign change, mask and swizzle)
• mul dest, src1, src2
  • Set dest to the product of src1 and src2

add, mad, rsq
• add dest, src1, src2
  • Add src1 to src2. [And the optional negation creates subtraction]
• mad dest, src1, src2, src3
  • Multiply src1 by src2 and add src3 - into dest
• rsq dest, src
  • Source must have one subscript…
  • dest.x = dest.y = dest.z = dest.w = 1/sqrt(src)
  • Reciprocal square root of src (much more useful than straight ‘square root’).

dp3, dp4
• 3 and 4 component dot products
• dp3 dest, src1, src2
  • dest.x = dest.y = dest.z = dest.w =
      (src1.x * src2.x) + (src1.y * src2.y) + (src1.z * src2.z)
• And dp4 does the same but includes ‘w’ in the computation

min, max
• min dest, src1, src2
  • Component-wise min operation
• max dest, src1, src2
  • Component-wise max operation

slt, sge
• slt dest, src1, src2
  • dest = (src1 < src2) ? 1 : 0
  • For each component…
• sge dest, src1, src2
  • dest = (src1 >= src2) ? 1 : 0
  • Which is equivalent to… dest = (src1 < src2) ? 0 : 1
  • i.e. the exact opposite of slt
  • For each component…

dst
• dst dest, src1, src2
  • Calculate distance vector. src1 vector is (NA, d*d, d*d, NA) and src2 is (NA, 1/d, NA, 1/d).
  • dest is set to (1, d, d*d, 1/d)
  • Which is what you want for standard attenuation…
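To see why (1, d, d*d, 1/d) suits attenuation, here is a small C++ sketch of `dst` followed by a dot product and a reciprocal. It assumes the first source carries d*d in both y and z, and it assumes the common attenuation form 1 / (a0 + a1*d + a2*d*d) with the coefficients stored in the x, y and z lanes of one constant. The helper names and that constant layout are hypothetical.

```cpp
#include <array>
#include <cmath>

using Vec4 = std::array<float, 4>;

// DP3: three-component dot product.
float dp3(const Vec4& a, const Vec4& b) {
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

// DST dest, src1, src2 with
//   src1 = (NA, d*d, d*d, NA) and src2 = (NA, 1/d, NA, 1/d),
// producing dest = (1, d, d*d, 1/d).
Vec4 dst(const Vec4& s1, const Vec4& s2) {
    return {1.0f, s1[1] * s2[1], s1[2], s2[3]};
}

// Attenuation for a vertex-to-light vector.
// atten = (a0, a1, a2, unused) is an assumed constant layout.
float attenuation(const Vec4& toLight, const Vec4& atten) {
    float d2 = dp3(toLight, toLight);       // DP3: squared distance
    float invD = 1.0f / std::sqrt(d2);      // RSQ: 1/d
    Vec4 dist = dst({0.0f, d2, d2, 0.0f},   // DST: (1, d, d*d, 1/d)
                    {0.0f, invD, 0.0f, invD});
    return 1.0f / dp3(dist, atten);         // DP3 then RCP
}
```

For a light 5 units away, coefficients (0, 0, 1) give 1/25 and coefficients (1, 0, 0) give no falloff at all. The whole computation needs no square root, only `rsq`.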

lit
• lit dest, src
  • Calculates lighting coefficients from two dot products and a power. src is:
    • src.x = n • l (unit normal and light vectors)
    • src.y = n • h (unit normal and half-angle vectors)
    • src.z is unused
    • src.w = power (in range +128 to –128)
  • dest set to (1.0, MAX(src.x, 0), L, 1.0)
    • If src.x > 0.0: L = MAX(src.y, 0) ^ src.w
    • else L = 0
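The `lit` rule can be written out as a few lines of C++. This is only a sketch of the arithmetic described on the slide: the function name and `Vec4` type are introduced here, the diffuse term is clamped at zero, and the +128 to –128 limit on the power is left to the caller. The hardware evaluates all of this as one instruction, so the `if` below is not a shader branch.

```cpp
#include <algorithm>
#include <array>
#include <cmath>

using Vec4 = std::array<float, 4>;

// LIT dest, src
//   src.x = n.l, src.y = n.h, src.w = specular power
//   dest  = (1, diffuse, specular, 1)
Vec4 lit(const Vec4& src) {
    float diffuse = std::max(src[0], 0.0f);
    float specular = 0.0f;
    if (src[0] > 0.0f) {
        // Only lit when the surface faces the light.
        specular = std::pow(std::max(src[1], 0.0f), src[3]);
    }
    return {1.0f, diffuse, specular, 1.0f};
}
```

Feeding the same value into x and y, as `LIT r0.z, r1.xxyy` does on the "Raise to the Power" slide, turns the specular lane into a general power function: `lit({x, x, 0, y})[2]` is x^y for x greater than zero.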

expp, log
• expp dest, src.w
  • dest.x = 2 ** floor(src.w)
  • dest.y = fractional part (src.w)
  • dest.z = 2 ** src.w
  • dest.w = 1.0
• log dest, src.w
  • dest.x = exponent((int)src.w)
  • dest.y = mantissa(src.w)
  • dest.z = log2(src.w)
  • dest.w = 1.0

rcp
• rcp dest, src.w
  • Source must have just one subscript (x, y, z or w)
  • dest.x = dest.y = dest.z = dest.w = 1 / src.w
  • So… this is the other half of the puzzle for division
  • … you divide by doing a ‘rcp’ and then a ‘mul’

Raise to the Power
    ; compute scalar r0.z = r1.x^r1.y
    LIT r0.z, r1.xxyy    ; r1.x must be greater than zero

Absolute Value
    ; r0 = |r1|
    MAX r0, r1, -r1

Division
    ; scalar r0.x = r1.x/r2.x
    RCP r0.x, r2.x
    MUL r0.x, r1.x, r0.x

Square Root
    ; scalar r0.x = sqrt(r1.x)
    RSQ r0.x, r1.x          ; using x/sqrt(x) = sqrt(x) is higher
    MUL r0.x, r0.x, r1.x    ; precision than 1/( 1/sqrt(x) )

Set on Less or Equal
    ; r0 = (r1 <= r2) ? 1 : 0
    SGE r0, -r1, -r2

Set on Greater
    ; r0 = (r1 > r2) ? 1 : 0
    SLT r0, -r1, -r2

Set on Equal
    ; r0 = (r1 == r2) ? 1 : 0
    SGE r0, -r1, -r2
    SGE r2, r1, r2
    MUL r0, r0, r2

Set on Not Equal
    ; r0 = (r1 != r2) ? 1 : 0
    SLT r0, r1, r2
    SLT r2, -r1, -r2
    ADD r0, r0, r2
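The four "Set on …" idioms all come from `slt` and `sge` with negated sources. The C++ sketch below emulates them per component so the identities can be checked. It reads the closing MUL and ADD as three-operand instructions that combine r0 with r2, which is an assumption, and the function names are made up for this illustration.

```cpp
#include <array>

using Vec4 = std::array<float, 4>;

Vec4 neg(const Vec4& a) { return {-a[0], -a[1], -a[2], -a[3]}; }

// SLT: 1.0 where a < b, else 0.0, per component.
Vec4 slt(const Vec4& a, const Vec4& b) {
    Vec4 r{};
    for (int i = 0; i < 4; ++i) r[i] = (a[i] < b[i]) ? 1.0f : 0.0f;
    return r;
}

// SGE: 1.0 where a >= b, else 0.0, per component.
Vec4 sge(const Vec4& a, const Vec4& b) {
    Vec4 r{};
    for (int i = 0; i < 4; ++i) r[i] = (a[i] >= b[i]) ? 1.0f : 0.0f;
    return r;
}

Vec4 mul(const Vec4& a, const Vec4& b) {
    return {a[0] * b[0], a[1] * b[1], a[2] * b[2], a[3] * b[3]};
}

Vec4 add(const Vec4& a, const Vec4& b) {
    return {a[0] + b[0], a[1] + b[1], a[2] + b[2], a[3] + b[3]};
}

// r1 <= r2  is  -r1 >= -r2
Vec4 setLessEqual(const Vec4& r1, const Vec4& r2) { return sge(neg(r1), neg(r2)); }

// r1 > r2  is  -r1 < -r2
Vec4 setGreater(const Vec4& r1, const Vec4& r2) { return slt(neg(r1), neg(r2)); }

// r1 == r2  is  (r1 <= r2) AND (r1 >= r2); AND of 0/1 masks is a multiply.
Vec4 setEqual(const Vec4& r1, const Vec4& r2) {
    return mul(sge(neg(r1), neg(r2)), sge(r1, r2));
}

// r1 != r2  is  (r1 < r2) OR (r1 > r2); the two cases exclude each other, so add.
Vec4 setNotEqual(const Vec4& r1, const Vec4& r2) {
    return add(slt(r1, r2), slt(neg(r1), neg(r2)));
}
```

Because the masks are plain 0.0 and 1.0 floats, logical AND becomes `mul` and an OR of mutually exclusive conditions becomes `add`.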

Clamp to [0, 1] Range
    ; compute r0 = (r0 < 0) ? 0 : (r0 > 1) ? 1 : r0
    DEF c0, 0.0f, 1.0f, 0.0f, 0.0f
    MAX r0, r0, c0.x
    MIN r0, r0, c0.y

Compute the Floor
    ; scalar r0.y = floor(r1.y)
    EXPP r0.y, r1.y
    ADD r0.y, r1.y, -r0.y

Compute the Ceiling
    ; scalar r0.y = ceiling(r1.y)
    EXPP r0.y, -r1.y
    ADD r0.y, r1.y, r0.y
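Both the floor and ceiling tricks rely on the y output of `expp` being the fractional part of the source. A short C++ sketch makes the arithmetic explicit. It assumes that fractional part is defined as src - floor(src), so it is never negative. The function names are hypothetical.

```cpp
#include <cmath>

// Fractional part as EXPP is assumed to write it to dest.y: src - floor(src).
float exppFrac(float src) {
    return src - std::floor(src);
}

// EXPP r0.y, r1.y
// ADD  r0.y, r1.y, -r0.y      ; x - frac(x) = floor(x)
float floorViaExpp(float x) {
    return x - exppFrac(x);
}

// EXPP r0.y, -r1.y
// ADD  r0.y, r1.y, r0.y       ; x + frac(-x) = -floor(-x) = ceil(x)
float ceilViaExpp(float x) {
    return x + exppFrac(-x);
}
```

The ceiling version works because frac(-x) equals -x - floor(-x), so adding x leaves -floor(-x), which is the ceiling of x.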

How do I branch?
• No branching, no early out
  • Why? Performance and predictability
  • No execution dependencies
• You can multiply by zero, and accumulate
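"Multiply by zero, and accumulate" can be sketched as a branch-free select: compute a 0/1 mask with `sge`, then blend two candidate results with a `mad`. The C++ below is an illustration under that reading, and `select` and the other helper names are introduced here rather than taken from the slides.

```cpp
#include <array>

using Vec4 = std::array<float, 4>;

// SGE: 1.0 where a >= b, else 0.0, per component.
Vec4 sge(const Vec4& a, const Vec4& b) {
    Vec4 r{};
    for (int i = 0; i < 4; ++i) r[i] = (a[i] >= b[i]) ? 1.0f : 0.0f;
    return r;
}

// ADD with a negated second source: a - b.
Vec4 sub(const Vec4& a, const Vec4& b) {
    return {a[0] - b[0], a[1] - b[1], a[2] - b[2], a[3] - b[3]};
}

// MAD: a * b + c, per component.
Vec4 mad(const Vec4& a, const Vec4& b, const Vec4& c) {
    return {a[0] * b[0] + c[0], a[1] * b[1] + c[1],
            a[2] * b[2] + c[2], a[3] * b[3] + c[3]};
}

// Per component: (x >= threshold) ? whenTrue : whenFalse, with no branch.
// Both candidates are always computed; the mask zeroes out the unwanted one.
Vec4 select(const Vec4& x, const Vec4& threshold,
            const Vec4& whenTrue, const Vec4& whenFalse) {
    Vec4 mask = sge(x, threshold);            // SGE
    Vec4 diff = sub(whenTrue, whenFalse);     // ADD with negation
    return mad(mask, diff, whenFalse);        // MAD: mask * diff + whenFalse
}
```

Every vertex runs the same three instructions whichever way the comparison goes. That matches the slide's point about performance and predictability.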

Example Vertex Declaration
    struct POSCOLORVERTEX
    {
        FLOAT x, y, z;
        DWORD diffColor;
    };

    // stream, offset, type, method, usage, usage index
    D3DVERTEXELEMENT9 dwDecl3[] =
    {
        {0, 0,  D3DDECLTYPE_FLOAT3,   D3DDECLMETHOD_DEFAULT, D3DDECLUSAGE_POSITION, 0},
        {0, 12, D3DDECLTYPE_D3DCOLOR, D3DDECLMETHOD_DEFAULT, D3DDECLUSAGE_COLOR,    0},
        D3DDECL_END()
    };

    LPDIRECT3DVERTEXDECLARATION9 m_pVertexDeclaration;
    m_pd3dDevice->CreateVertexDeclaration(dwDecl3, &m_pVertexDeclaration);
    m_pd3dDevice->SetVertexDeclaration(m_pVertexDeclaration);

Example Vertex Declaration (with texture coordinates)
    struct POSCOLORVERTEX
    {
        FLOAT x, y, z;
        DWORD diffColor;
        float tu0, tv0;
    };

    // stream, offset, type, method, usage, usage index
    D3DVERTEXELEMENT9 dwDecl3[] =
    {
        {0, 0,  D3DDECLTYPE_FLOAT3,   D3DDECLMETHOD_DEFAULT, D3DDECLUSAGE_POSITION, 0},
        {0, 12, D3DDECLTYPE_D3DCOLOR, D3DDECLMETHOD_DEFAULT, D3DDECLUSAGE_COLOR,    0},
        {0, 16, D3DDECLTYPE_FLOAT2,   D3DDECLMETHOD_DEFAULT, D3DDECLUSAGE_TEXCOORD, 0},
        D3DDECL_END()
    };

    LPDIRECT3DVERTEXDECLARATION9 m_pVertexDeclaration;
    m_pd3dDevice->CreateVertexDeclaration(dwDecl3, &m_pVertexDeclaration);
    m_pd3dDevice->SetVertexDeclaration(m_pVertexDeclaration);

Example Vertex Shader
    vs_1_1
    dcl_position v0
    dcl_color0 v1
    dcl_texcoord v2

    #define CV_WORLDVIEWPROJ0 0
    #define CV_WORLDVIEWPROJ1 1
    #define CV_WORLDVIEWPROJ2 2
    #define CV_WORLDVIEWPROJ3 3

    def c4, 1, 1, 1, 1

    ; transform to clip space
    dp4 oPos.x, v0, c[CV_WORLDVIEWPROJ0]
    dp4 oPos.y, v0, c[CV_WORLDVIEWPROJ1]
    dp4 oPos.z, v0, c[CV_WORLDVIEWPROJ2]
    dp4 oPos.w, v0, c[CV_WORLDVIEWPROJ3]

    ; write out color
    dp3 oD0, v1, c4

    ; write texture coords
    mov oT0.xy, v2

Create Vertex Shader
    DWORD dwFlags = 0;
    dwFlags |= D3DXSHADER_DEBUG;
    LPD3DXBUFFER pCode = NULL;
    LPD3DXBUFFER pErrors = NULL;
    LPDIRECT3DVERTEXSHADER9 m_pVertexShader = NULL;

    // Assemble the shader source file into byte code
    HRESULT hrErr = D3DXAssembleShaderFromFile("dx9/vshader.vsh", NULL, NULL, dwFlags, &pCode, &pErrors);
    if(pErrors)
    {
        char* szErrors = (char*)pErrors->GetBufferPointer();
        OutputDebugStringA(szErrors);   // report the assembler messages
        pErrors->Release();
    }
    if(FAILED(hrErr))
    {
        MessageBox(NULL, "vertex shader creation failed", "CRendererDX9::Create", MB_OK|MB_ICONEXCLAMATION);
        return false;
    }

    // Create the shader object from the byte code, then free the buffer
    char* szCode = (char*)pCode->GetBufferPointer();
    hrErr = m_pDevice->CreateVertexShader((DWORD*)pCode->GetBufferPointer(), &m_pVertexShader);
    pCode->Release();
    if(FAILED(hrErr))
    {
        MessageBox(NULL, "CreateVertexShader failed", "CRendererDX9::Create", MB_OK|MB_ICONEXCLAMATION);
        return false;
    }

    m_pDevice->SetVertexShader(m_pVertexShader);

Set Constants
    D3DXMATRIX mtWorld;
    D3DXMATRIX mtView;
    D3DXMATRIX mtProj;
    D3DXMATRIX mtWorldView;
    D3DXMATRIX mtWorldViewProj;

    D3DXMatrixMultiply(&mtWorldView, &mtWorld, &mtView);
    // transpose so that each constant register holds one column of world*view*proj
    D3DXMatrixMultiplyTranspose(&mtWorldViewProj, &mtWorldView, &mtProj);
    // upload 4 Vec4 constants starting at c0
    m_pDevice->SetVertexShaderConstantF(0, (float*)&mtWorldViewProj, 4);
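The transpose in the constant upload matters because the example shader builds each output component with one `dp4` against one constant register. The C++ sketch below checks that relationship on the CPU without using D3DX. It assumes the row-vector convention (v' = v * M), and `Mat4`, `translation`, `mulRowVector` and `transformDp4` are hypothetical names used only for this illustration.

```cpp
#include <array>

using Vec4 = std::array<float, 4>;
using Mat4 = std::array<Vec4, 4>;   // four rows of four floats

// DP4: four-component dot product.
float dp4(const Vec4& a, const Vec4& b) {
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
}

Mat4 transpose(const Mat4& m) {
    Mat4 t{};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            t[j][i] = m[i][j];
    return t;
}

// Translation matrix in row-vector form: the offset sits in the last row.
Mat4 translation(float tx, float ty, float tz) {
    Mat4 m{};
    m[0] = {1.0f, 0.0f, 0.0f, 0.0f};
    m[1] = {0.0f, 1.0f, 0.0f, 0.0f};
    m[2] = {0.0f, 0.0f, 1.0f, 0.0f};
    m[3] = {tx, ty, tz, 1.0f};
    return m;
}

// Reference result: row vector times matrix, v' = v * M.
Vec4 mulRowVector(const Vec4& v, const Mat4& m) {
    Vec4 out{};
    for (int j = 0; j < 4; ++j)
        for (int i = 0; i < 4; ++i)
            out[j] += v[i] * m[i][j];
    return out;
}

// What the shader does: c[0]..c[3] are the uploaded constants and
// each output component is one DP4 of the vertex with one constant.
Vec4 transformDp4(const Vec4& v, const Mat4& c) {
    return {dp4(v, c[0]), dp4(v, c[1]), dp4(v, c[2]), dp4(v, c[3])};
}
```

`transformDp4(v, transpose(M))` gives the same result as `mulRowVector(v, M)`. Uploading the untransposed matrix would instead dot the vertex with the rows of M, which is a different transform.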