Introduction to OpenGL
Joseph Kider, University of Pennsylvania
CIS 565 – Fall 2011 (Source: Patrick Cozzi)

Administrivia
- Assignment 2 handed out
  - Upgrade your video card drivers [NVIDIA | ATI]

Agenda
- Review Monday's GLSL material
- OpenGL
  - Shaders and uniforms
  - Vertex arrays and buffers
  - Multithreading
- Review Assignment 1

GLSL Review
- Rewrite with one if and one compare:
    if (dist < wPrime)
    {
        if (dist < closestDistance)
        {
            closestDistance = dist;
        }
    }
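One possible answer to the exercise above, sketched in C++ rather than GLSL so it can be compiled on its own. The function name updateClosestDistance is an assumption made for illustration; GLSL has the same built-in min, so the single comparison carries over unchanged.

```cpp
#include <algorithm>

// One `if`, one compare: both nested tests pass exactly when dist is smaller
// than the smaller of the two thresholds.
// In GLSL the condition would read: dist < min(wPrime, closestDistance)
void updateClosestDistance(float dist, float wPrime, float& closestDistance)
{
    if (dist < std::min(wPrime, closestDistance))
    {
        closestDistance = dist;
    }
}
```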

GLSL Review
- Implement this concisely:
    bool PointInsideAxisAlignedBoundingBox(vec3 p, vec3 b0, vec3 b1)
    {
        // ...
    }
  [Figure: point p inside the box with corners b0 and b1]
- Does your code also work for vec2?
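A sketch of one way to approach this exercise, written in C++ with std::array standing in for vec2 and vec3 (an assumption so the code compiles without GLSL). The template parameter N is what lets the same body serve both dimensions; a GLSL equivalent using built-ins is shown in the comment.

```cpp
#include <array>
#include <cstddef>

// Works for any dimension N, so the same code covers the vec2 and vec3 cases.
// A GLSL version could use built-in vector comparisons, for example:
//   return all(greaterThanEqual(p, b0)) && all(lessThanEqual(p, b1));
template <std::size_t N>
bool pointInsideAxisAlignedBoundingBox(const std::array<float, N>& p,
                                       const std::array<float, N>& b0,
                                       const std::array<float, N>& b1)
{
    for (std::size_t i = 0; i < N; ++i)
    {
        if (p[i] < b0[i] || p[i] > b1[i])
        {
            return false; // outside the box along axis i
        }
    }
    return true;
}
```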

GLSL Review
- What is the difference between a fixed function and a programmable stage?
- Vertex shader
  - What is its input? Output?
- Fragment shader
  - What is its input? Output?
  - [true | false] Fragment shaders allow you to change the xy position
- [true | false] A best practice is to roll your own functions instead of calling library functions
  - In general, build vs buy

OpenGL
- Is a C-based API
- Is cross platform
- Is run by the ARB: Architecture Review Board
- Hides the device driver details
- OpenGL vs Direct3D
  - Not going there – at least not on record

OpenGL
- We are using GL 3.3 core profile
  - No fixed function vertex and fragment shading
  - No legacy API calls:
    - glBegin()
    - glRotatef()
    - glTexEnvf() (recall the fixed function light map)
    - glAlphaFunc() (why was the alpha test removed?)
    - …

OpenGL
- Software stack:
    Application
    OpenGL API
    Device Driver
    GPU

OpenGL
- Major objects:
  - Shader Programs
  - Shader Objects
  - Framebuffers
  - Textures
  - Samplers
  - Vertex Arrays
  - Vertex Buffers
  - Index Buffers
  - Pixel Buffers
  - Fixed Function State
- We are not covering everything, just surveying the most relevant parts for writing GLSL shaders

Shaders
- Shader object: an individual vertex, fragment, etc. shader
  - Is provided shader source code as a string
  - Is compiled
- Shader program: multiple shader objects linked together

Shader Objects
- Compile a shader object:
    const char *source = // ...
    GLint sourceLength = // ...
    GLuint v = glCreateShader(GL_VERTEX_SHADER);
    glShaderSource(v, 1, &source, &sourceLength);
    glCompileShader(v);
    GLint compiled;
    glGetShaderiv(v, GL_COMPILE_STATUS, &compiled);
    // success: compiled == GL_TRUE
    // ...
    glDeleteShader(v);

Shader Objects
- Compile a shader object (same code as the previous slide), annotated at glCreateShader:
  - OpenGL functions start with gl. Why? How would you design this in C++?
  - v is an opaque object
    - What is it under the hood?
    - How would you design this in C++?

Shader Objects
- Compile a shader object (same code), annotated at glShaderSource(v, 1, &source, &sourceLength):
  - Provide the shader's source code
  - Where should the source come from?
  - Why can we pass more than one string?

Shader Objects
- Compile a shader object (same code), annotated at glCompileShader:
  - Compile, but what does the driver really do?

Shader Objects
- Compile a shader object (same code), annotated at glGetShaderiv(v, GL_COMPILE_STATUS, &compiled):
  - Good developers check for errors. Again, how would you design this in C++?
  - Calling glGet* has performance implications. Why?

Shader Objects
- Compile a shader object (same code), annotated at glDeleteShader:
  - Good developers also clean up resources

Shader Objects
- Compile a shader object (same code):
  - This process is just like compiling an OpenCL kernel. We will see that later this semester
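The slides ask twice how the create/check/delete pattern might be designed in C++. One common answer is an RAII wrapper that owns the opaque handle and releases it in its destructor. The sketch below is only an illustration: the class name ScopedHandle is hypothetical, and the deleter is injected as a std::function so the code compiles without OpenGL headers. In real code the deleter would call glDeleteShader (or glDeleteProgram).

```cpp
#include <functional>
#include <utility>

// Hypothetical RAII wrapper for an opaque integer handle such as the GLuint
// returned by glCreateShader. Cleanup happens automatically when the wrapper
// goes out of scope, so "good developers clean up" becomes the default.
class ScopedHandle
{
public:
    ScopedHandle(unsigned int handle, std::function<void(unsigned int)> deleter)
        : handle_(handle), deleter_(std::move(deleter)) {}

    // Non-copyable: exactly one owner deletes the underlying object.
    ScopedHandle(const ScopedHandle&) = delete;
    ScopedHandle& operator=(const ScopedHandle&) = delete;

    ~ScopedHandle()
    {
        if (deleter_)
        {
            deleter_(handle_);
        }
    }

    unsigned int get() const { return handle_; }

private:
    unsigned int handle_;
    std::function<void(unsigned int)> deleter_;
};
```

With OpenGL available, usage might look like `ScopedHandle shader(glCreateShader(GL_VERTEX_SHADER), [](unsigned int h) { glDeleteShader(h); });`.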

Shader Programs
- Link a shader program:
    GLuint v = glCreateShader(GL_VERTEX_SHADER);
    GLuint f = glCreateShader(GL_FRAGMENT_SHADER);
    // ...
    GLuint p = glCreateProgram();
    glAttachShader(p, v);
    glAttachShader(p, f);
    glLinkProgram(p);
    GLint linked;
    glGetProgramiv(p, GL_LINK_STATUS, &linked);
    // success: linked == GL_TRUE
    // ...
    glDeleteProgram(p);

Shader Programs
- Link a shader program (same code as the previous slide), annotated at glAttachShader:
  - A program needs at least a vertex and fragment shader

Shader Programs
- Link a shader program (same code), annotated at glGetProgramiv(p, GL_LINK_STATUS, &linked) and glDeleteProgram(p):
  - Be a good developer again

Using Shader Programs
    GLuint p = glCreateProgram();
    // ...
    glUseProgram(p);
    glDraw*(); // * because there are lots of draw functions
- glUseProgram: the program becomes part of the current state
  - How do you draw different objects with different shaders?
  - What is the cost of using multiple shaders?
  - How do you reduce the cost?
  - Hint: write more CPU code – really.
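One reading of the hint "write more CPU code" is to sort draws on the CPU so that objects sharing a shader program are drawn back to back, which reduces the number of glUseProgram changes. The sketch below is an assumption about what that could look like, not the course's implementation; DrawItem and both function names are hypothetical, and no OpenGL calls are made.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical draw record: which shader program to bind and which mesh to draw.
struct DrawItem
{
    unsigned int program;
    int mesh;
};

// Counts how many times the bound program would change while walking the list
// in order (the first bind counts as a change).
std::size_t countProgramSwitches(const std::vector<DrawItem>& items)
{
    std::size_t switches = 0;
    for (std::size_t i = 0; i < items.size(); ++i)
    {
        if (i == 0 || items[i].program != items[i - 1].program)
        {
            ++switches;
        }
    }
    return switches;
}

// Sort on the CPU so draws that share a program are adjacent. stable_sort
// keeps the original order among items that use the same program.
void sortByProgram(std::vector<DrawItem>& items)
{
    std::stable_sort(items.begin(), items.end(),
                     [](const DrawItem& a, const DrawItem& b)
                     { return a.program < b.program; });
}
```

Sorting is not free, and it interacts with other ordering constraints such as blending, so this is a tradeoff rather than a rule.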

Uniforms
    GLuint p = glCreateProgram();
    // ...
    glLinkProgram(p);
    GLint m = glGetUniformLocation(p, "u_modelViewMatrix");
    GLint l = glGetUniformLocation(p, "u_lightMap");
    glUseProgram(p);
    mat4 matrix = // ...
    glUniformMatrix4fv(m, 1, GL_FALSE, &matrix[0][0]);
    glUniform1i(l, 0);

Uniforms
- Same code as the previous slide, annotated at glGetUniformLocation:
  - Each active uniform has an integer index location.

Uniforms
- Same code, annotated at mat4 matrix:
  - mat4 is part of the C++ GLM library
  - GLM: http://www.g-truc.net/project-0016.html#menu

Uniforms
- Same code, annotated at glUniformMatrix4fv(m, 1, GL_FALSE, &matrix[0][0]) and glUniform1i(l, 0):
  - glUniform* for all sorts of datatypes
  - GL_FALSE: not transposing the matrix
  - Uniforms can be changed as often as needed, but are constant during a draw call

Uniforms
- Same code, annotated at glUseProgram(p):
  - Why not glUniform*(p, …)?

Drawing
- How do we transfer vertices from system memory to video memory?
- How do we issue draw calls?

Drawing
- It doesn't matter if we're using: [images]
- Efficiently transferring data between the CPU and GPU is critical for performance.

Drawing
- Typical pre-Nehalem Intel system
  - 4 GB/s reads and writes
  - Theoretical 128 M 32-byte vertices/second
- Separate system and video memory
- Need to transfer vertices from one to the other quickly
Image from http://arstechnica.com/hardware/news/2009/10/day-of-nvidia-chipset-reckoning-arrives.ars

Drawing
- How good is 128 M vertices/second?
  - Boeing 777 model: ~350 million polygons
Image from http://graphics.uni-sb.de/MassiveRT/boeing777.html

Drawing
- How good is 128 M vertices/second?
  - Procedurally generated model of Pompeii: ~1.4 billion polygons
Image from http://www.vision.ee.ethz.ch/~pmueller/wiki/CityEngine/Documents
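The 128 M figure on the earlier slide follows from dividing the bus bandwidth by the vertex size. The sketch below reproduces that arithmetic and applies it to the two models. Two assumptions are made here for illustration only: "4 GB/s" is taken as 4 x 2^30 bytes per second, and every polygon is an unindexed triangle costing three full 32-byte vertices.

```cpp
#include <cstdint>

// Vertices per second that fit through a bus of the given bandwidth.
constexpr std::uint64_t verticesPerSecond(std::uint64_t bytesPerSecond,
                                          std::uint64_t bytesPerVertex)
{
    return bytesPerSecond / bytesPerVertex;
}

// Seconds to transfer a model once. Assumes (for illustration only) unindexed
// triangles, so every polygon costs three full vertices.
constexpr double secondsToTransfer(std::uint64_t polygons,
                                   std::uint64_t vertsPerSecond)
{
    return static_cast<double>(polygons * 3) / static_cast<double>(vertsPerSecond);
}

constexpr std::uint64_t kBusBytesPerSecond = 4ULL << 30; // 4 GB/s, binary GB assumed
constexpr std::uint64_t kBytesPerVertex = 32;
constexpr std::uint64_t kVertsPerSecond =
    verticesPerSecond(kBusBytesPerSecond, kBytesPerVertex);

// 4 * 2^30 / 32 = 128 * 2^20, i.e. the "128 M" on the slide.
static_assert(kVertsPerSecond == (128ULL << 20), "4 GB/s / 32 B = 128 M vertices/s");
```

Under these assumptions, secondsToTransfer gives roughly 8 seconds for the ~350 million polygon Boeing 777 model and roughly half a minute for the ~1.4 billion polygon Pompeii model, so resending all vertices every frame is far from interactive.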

Drawing
- OpenGL has evolved since 1992 (GL 1.0)
  - Immediate mode
  - Display lists
  - Client-side vertex arrays
  - Vertex buffer objects (VBOs)

Drawing: Immediate Mode
    GLfloat v0[3] = { 0.0f, 0.0f, 0.0f };
    // ...
    glBegin(GL_TRIANGLES);
    glVertex3fv(v0);
    glVertex3fv(v1);
    glVertex3fv(v2);
    glVertex3fv(v3);
    glVertex3fv(v4);
    glVertex3fv(v5);
    glEnd();
- Pro: really simple
- What's the con?

Drawing: Display Lists
    GLuint dl = glGenLists(1);
    glNewList(dl, GL_COMPILE);
    glBegin(GL_TRIANGLES);
    // ...
    glEnd();
    glEndList();
    // ...
    glCallList(dl);
    // ...
    glDeleteLists(dl, 1);

Drawing: Display Lists
- Same code as the previous slide, annotated at glGenLists(1):
  - Create one display list, just like glCreateShader creates a shader

Drawing: Display Lists
- Same code, annotated at glNewList ... glEndList:
  - OpenGL commands between glNewList and glEndList are not executed immediately. Instead, they are compiled into the display list.

Drawing: Display Lists
- Same code, annotated at glCallList(dl):
  - A single function call executes the display list. You can execute the same display list many times.
- Pros
  - Little function call overhead
  - Optimized compiling: stored in video memory, perhaps vertex cache optimized, etc.
- Cons
  - Compiling is slow. How do you support dynamic data?
  - Usability: what is compiled into a display list and what isn't?

Drawing: Display Lists
- Same code, annotated at glDeleteLists(dl, 1):
  - You guys are good developers

Drawing: Client-side Vertex Arrays
- Point GL to an array in system memory
    GLfloat vertices[] = { ... }; // 2 triangles = 6 vertices = 18 floats
    glEnableClientState(GL_VERTEX_ARRAY);
    glVertexPointer(3, GL_FLOAT, 0, vertices);
    glDrawArrays(GL_TRIANGLES, 0, 6); // count is in vertices, not floats
    glDisableClientState(GL_VERTEX_ARRAY);

Drawing: Client-side Vertex Arrays
- Same code as the previous slide, annotated at GLfloat vertices[]:
  - Store vertices in an array

Drawing: Client-side Vertex Arrays
- Same code, annotated at glEnableClientState(GL_VERTEX_ARRAY):
  - Ugh, tell GL we have vertices (positions, actually)
  - Managing global state is painful

Drawing: Client-side Vertex Arrays
- Same code, annotated at the last argument of glVertexPointer(3, GL_FLOAT, 0, vertices):
  - Pointer to our vertices

Drawing: Client-side Vertex Arrays
- Same code, annotated at the third argument of glVertexPointer(3, GL_FLOAT, 0, vertices):
  - Stride, in bytes, between vertices. 0 means tightly packed.

Drawing: Client-side Vertex Arrays
- Same code, annotated at the first two arguments of glVertexPointer(3, GL_FLOAT, 0, vertices):
  - Each vertex has 3 floating point components

Drawing: Client-side Vertex Arrays
- Same code, annotated at glDrawArrays:
  - Draw in a single GL call
- Pro: little function call overhead
- Con: bus traffic

Drawing: Vertex Buffer Objects
- VBO: Vertex Buffer Object
- Like client-side vertex arrays, but:
  - Stored in driver-controlled memory, not an array in your application
  - Provide hints to the driver about how you will use the buffer
- VBOs are the only way to store vertices in GL 3.3 core profile. The others are deprecated
  - We can use textures, but let's not jump ahead

Drawing: Vertex Buffer Objects
    GLuint vbo;
    GLfloat* vertices = new GLfloat[3 * numberOfVertices];
    glGenBuffers(1, &vbo);
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    glBufferData(GL_ARRAY_BUFFER, numberOfBytes, vertices, GL_STATIC_DRAW);
    // Also check out glBufferSubData
    delete [] vertices;
    glDeleteBuffers(1, &vbo);

Drawing: Vertex Buffer Objects
- Same code as the previous slide, annotated at glBufferData:
  - Copy from application to driver-controlled memory.
  - GL_STATIC_DRAW should imply video memory.

Drawing: Vertex Buffer Objects
- Same code:
  - Does glBufferData block?
  - Does glBufferSubData block?

Drawing: Vertex Buffer Objects
- Usage Hint
  - Static: 1-to-n update-to-draw ratio
  - Dynamic: n-to-m update-to-draw ratio (n < m)
  - Stream: 1-to-1 update-to-draw ratio
- It's a hint. Do drivers take it into consideration?
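The three ratios above can be read as a simple decision rule. The sketch below is one possible interpretation, not an official rule: the enum UsageHint and the function chooseUsageHint are hypothetical, and real code would pass GL_STATIC_DRAW, GL_DYNAMIC_DRAW, or GL_STREAM_DRAW to glBufferData.

```cpp
// Hypothetical enum mirroring the three usage categories on the slide.
enum class UsageHint { Static, Dynamic, Stream };

// Picks a hint from how often a buffer is updated versus how often it is drawn.
UsageHint chooseUsageHint(unsigned int updates, unsigned int draws)
{
    if (updates >= draws)
    {
        return UsageHint::Stream;  // about one update per draw (1-to-1)
    }
    if (updates <= 1)
    {
        return UsageHint::Static;  // filled once, drawn many times (1-to-n)
    }
    return UsageHint::Dynamic;     // n updates for m draws, n < m
}
```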

Drawing: Vertex Buffer Objects
- Map a pointer to driver-controlled memory
  - Also map just a subset of the buffer
Image from http://developer.nvidia.com/object/using_VBOs.html

Drawing: Vertex Buffer Objects
- In general: [Figure labels: Immediate Mode, VBOs]
- Say no to drugs too, please.
Image from: http://upgifting.com/tmnt-pizza-poster

Vertex Array Objects
- VBOs are just buffers
  - Raw bytes
- VAOs: Vertex Array Objects
  - Interpret VBOs as actual vertices
  - Used when issuing glDraw*
- You are not responsible for the implementation details

VBO Layouts
- Separate Buffers
- Non-interleaved Buffer
Images courtesy of A K Peters, Ltd. www.virtualglobebook.com

VBO Layouts: Tradeoffs
- Separate Buffers
  - Flexibility, e.g.:
    - Combination of static and dynamic buffers
    - Multiple objects share the same buffer
- Non-interleaved Buffer
  - How is the memory coherence?
- Interleaved Buffer
  - Faster for static buffers
    - Proportional to the number of attributes
- Hybrid?
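To make the layouts concrete, the sketch below computes the stride and attribute offsets that would be handed to GL for an interleaved buffer and for a non-interleaved buffer. The vertex format (position, normal, one 2-float texture coordinate) and all names are assumptions chosen for illustration; it happens to give the 32-byte vertex used in the bandwidth estimate earlier.

```cpp
#include <cstddef>

// Hypothetical interleaved vertex: position, normal, and texture coordinates
// for one vertex sit next to each other in memory.
struct InterleavedVertex
{
    float position[3];
    float normal[3];
    float texCoord[2];
};

// Interleaved: the stride is the distance in bytes between consecutive
// vertices, and each attribute's offset is its position inside the struct.
constexpr std::size_t kInterleavedStride = sizeof(InterleavedVertex);
constexpr std::size_t kInterleavedNormalOffset = offsetof(InterleavedVertex, normal);
constexpr std::size_t kInterleavedTexCoordOffset = offsetof(InterleavedVertex, texCoord);

// Non-interleaved: all positions, then all normals, then all texture
// coordinates. Each attribute is tightly packed (stride 0 in GL terms), and
// its block starts after the blocks before it.
constexpr std::size_t nonInterleavedNormalOffset(std::size_t vertexCount)
{
    return vertexCount * 3 * sizeof(float);
}

constexpr std::size_t nonInterleavedTexCoordOffset(std::size_t vertexCount)
{
    return nonInterleavedNormalOffset(vertexCount) + vertexCount * 3 * sizeof(float);
}
```

In the interleaved case, all attributes of one vertex fall within the same 32 bytes, which is one way to think about the memory coherence question above. In the non-interleaved case, the attributes of one vertex are separated by whole blocks.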

Vertex Throughput: VBO Layouts
- 64k triangles per batch and n 4-float texture coordinates
Image from http://www.sci.utah.edu/~csilva/papers/thesis/louis-bavoil-ms-thesis.pdf

Vertex Throughput: Batching
Image from http://www.sci.utah.edu/~csilva/papers/thesis/louis-bavoil-ms-thesis.pdf

Vertex Throughput: Batching
- Making lots of glDraw* calls is slow. Why?
Image from http://www.sci.utah.edu/~csilva/papers/thesis/louis-bavoil-ms-thesis.pdf

Vertex Throughput Tips
- Optimize for the vertex caches
- Use smaller vertices
  - Use less precision, e.g., half instead of float
  - Compress, then decompress in vertex shader
  - Pack, then unpack in vertex shader
  - Derive attributes or components from other attributes
  - How many components do you need to store a normal?
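Two of the tips above can be sketched on the CPU side: packing a component into fewer bits, and deriving one component of a unit normal from the other two. The function names are hypothetical, and the 16-bit signed normalized encoding is just one reasonable choice. In a real renderer the unpack and reconstruct steps would run in the vertex shader.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Pack: quantize a component in [-1, 1] to a signed 16-bit integer,
// half the size of a 32-bit float.
std::int16_t packSnorm16(float value)
{
    const float clamped = std::max(-1.0f, std::min(1.0f, value));
    return static_cast<std::int16_t>(std::lround(clamped * 32767.0f));
}

// Unpack: map the 16-bit integer back to approximately the original value.
float unpackSnorm16(std::int16_t packed)
{
    return static_cast<float>(packed) / 32767.0f;
}

// Derive: for a unit-length normal, x*x + y*y + z*z = 1, so z can be
// recomputed from x and y. The square root loses the sign of z, so this
// sketch assumes the sign is stored separately (for example, in a spare bit).
float reconstructNormalZ(float x, float y, bool zIsNegative)
{
    const float zSquared = std::max(0.0f, 1.0f - x * x - y * y);
    const float z = std::sqrt(zSquared);
    return zIsNegative ? -z : z;
}
```

This suggests one answer to the last question: two components plus a sign can be enough for a unit normal, at the cost of extra shader arithmetic and some precision.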

Vertex Throughput Tips
- Know your architecture!
Image from http://www.sci.utah.edu/~csilva/papers/thesis/louis-bavoil-ms-thesis.pdf

Vertex Throughput Tips
- Know your architecture!
  - GL_SHORT faster on NVIDIA…
  - …slower on ATI
Image from http://www.sci.utah.edu/~csilva/papers/thesis/louis-bavoil-ms-thesis.pdf

Vertex Throughput Tips
- Know your architecture!
- GL_SHORT normals faster than GL_FLOAT on NVIDIA
  - But not ATI
  - Still true today?
Image from http://www.sci.utah.edu/~csilva/papers/thesis/louis-bavoil-ms-thesis.pdf

Vertex Throughput Tips
- Know your architecture!
- GL_BYTE normals use less memory than GL_SHORT or GL_FLOAT but are slower
  - Why?
  - Still true today?
Image from http://www.sci.utah.edu/~csilva/papers/thesis/louis-bavoil-ms-thesis.pdf

Vertex Throughput Tips
- Know your architecture!
- Do you believe me yet?

Multithreaded Rendering
- Quake 4 CPU usage
  - 41% - driver
  - 49% - engine
- Split render work into two threads: [figure]
Image from http://mrelusive.com/publications/presentations/2008_gdc/GDC%2008%20Threading%20QUAKE%204%20and%20ETQW%20Final.pdf

Multithreaded Rendering
- Tradeoffs
  - Throughput vs latency
  - Memory usage – double buffering
    - Cache pollution
  - Synchronization
  - Single core machines
    - DOOM III era

Multithreaded OpenGL Drivers
- Driver CPU overhead is moved to a separate core
- Application remains unchanged
- What happens when you call glGet*?
Image from http://developer.apple.com/library/mac/#technotes/tn2006/tn2085.html
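A toy model can help with the glGet* question. The sketch below is not how any real driver is implemented; it is a single-threaded simulation, with the hypothetical class ToyCommandQueue standing in for a driver that defers work to another core. State-changing calls are queued and return immediately, while a query has to wait for everything queued before it.

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>

// Toy model of a deferred driver: calls are queued for another core and only
// take effect when the queue is drained.
class ToyCommandQueue
{
public:
    // State-changing calls return immediately after being queued.
    void enqueue(std::function<void(int&)> command)
    {
        pending_.push_back(std::move(command));
    }

    // A glGet*-style query needs up-to-date state, so everything queued
    // before it must finish first. That wait is the stall.
    int get()
    {
        flush();
        return state_;
    }

    std::size_t pendingCount() const { return pending_.size(); }

private:
    void flush()
    {
        while (!pending_.empty())
        {
            pending_.front()(state_);
            pending_.pop_front();
        }
    }

    int state_ = 0;
    std::deque<std::function<void(int&)>> pending_;
};
```

In this model, calling get() once per frame forces the application and the queued work back into lockstep, which gives up the parallelism the separate core was meant to provide.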

Not Covered Today
- Textures
- Framebuffers
- State management
- …
Useful for GPGPU – and graphics, obviously

Class Poll
- Multithreaded graphics engine design class?
- More graphics-related classes?
- Itching to get to GPGPU and GPU computing?

OpenGL Resources
- OpenGL/GLSL Quick Reference Card
  - http://www.khronos.org/files/opengl-quick-reference-card.pdf
- OpenGL Spec
  - http://www.opengl.org/registry/doc/glspec33.core.20100311.pdf
- OpenGL Forums
  - http://www.opengl.org/discussion_boards/