# Optimizing Ray Marching through Partial Evaluation

I love ray tracing as a 3D rendering technique. I love it because in its purest form, it’s very elegant and beautiful. It’s a simulation of light and photons flowing through an environment. Because of its physically-based nature, ray tracing can produce very realistic images from very simple code. You can render impressive scenes with just a few hundred lines of C. This is in stark contrast with state of the art polygon rasterization techniques used in modern 3D engines, which make use of a huge amount of very complex hacks to get something that tries to look somewhat realistic.

Traditional Whitted-style ray tracing is done with intersection tests, that is, the scene is rendered by testing if rays coming out of the camera and going into the scene intersect with various objects. With this style of rendering, in order to be able to render a 3D object, you need to be able to write a piece of code that checks whether or not a given ray intersects with the object, and if so, where the nearest intersection occurs (there might be multiple). This is relatively straightforward if you want to render something like a sphere, but it quickly gets more complicated when rendering more complex parametric surfaces.

Recently, I fell in love when I discovered a ray tracing technique that in many ways is even simpler yet more powerful than classical ray tracing. The technique, known as ray marching, or sphere tracing, makes it possible to render a scene that is directly defined by a Signed Distance Function (SDF). This is a function that, given a point in space, returns the closest distance to any point that is part of the scene. The distance is positive if the point is outside of scene objects, zero at the boundary, and negative inside scene objects.

The beauty of this is that SDFs are extremely simple to define. For instance, the signed distance from a point to a sphere is simply the distance between the point and the center of the sphere, minus the sphere’s radius. Better yet, SDFs can be combined together in very simple ways as well. You can produce the union of two objects by computing the minimum of both of their SDFs, and the intersection by computing the maximum. You can also do funky things such as infinitely repeating objects using modulo operators. SDFs truly have a lot of expressive power due to their combinatorial nature.

The demoscene group Mercury has produced some amazing demos of 3D animations based on the rendering of SDFs on the GPU. These demos are entirely procedural and fit in a tiny 64 kilobytes! The Timeless demo, in particular, showcases the kinds of spatial transformations and deformations of space that can be done when your scene is entirely defined by a mathematical function, as opposed to polygons. There are many interesting SDF code examples on ShaderToy you’re interested in playing with them.

Hopefully, by this point, I’ve convinced you that SDFs and ray marching are insanely cool. It’s also pretty amazing that modern GPUs are now fast and flexible enough that you can render fairly interesting SDFs in real-time. Unfortunately, SDFs remain expensive enough to render that it’s still tricky to render complex scenes. I think it would be great to live in a world where SDFs and real-time ray tracing could replace polygons, but it seems we aren’t quite there yet.

I spent some time thinking about SDFs, and it got me thinking: there’s got to be a way to make these faster. If you think about it, rendering an SDF is costly because there’s quite a bit of computations going on at every pixel, many evaluations of a potentially complex SDF. This computational work, however, is highly redundant. Do you really need to evaluate the entire SDF for every single pixel? Ray marching is very much a brute force technique. There just has to be a more efficient way to go about this.

SDFs are mathematical functions, pieces of code that get repeatedly evaluated. I come from a compiler background, and so I started wondering if, potentially, compiler optimizations could be applied to these SDFs. What if, for instance, we could apply partial evaluation to optimize the said distance functions and make them run faster? I did some research, and it turns out, unsurprisingly, that I wasn’t the first one to think of such a concept. There has already been work in applying partial evaluation to ray tracing and applying partial evaluation to the optimization of shaders.

The work that’s been done on applying partial evaluation to ray tracing, in my opinion, doesn’t go far enough. The authors have essentially partially evaluated the ray tracing algorithm and specialized it in function of the objects present in a given 3D scene. What I think would be very useful, is to specialize the evaluation of SDFs in function of the camera position. What if, for instance, you could mathematically prove that certain objects in your scene are not present in the left half of the view frustum. That is, what if you could prove that most of the objects in your scene are not visible on the left half of the screen?

It seems like it would be relatively straightforward to recursively subdivide the image and view frustum into multiple quadrants and determine which objects will definitely not be visible in each. I know it can be done, in fact, directly by using the definition of SDFs, without very fancy tricks. If you could do this, then you could generate a separate, optimized SDF for smaller fractions of the image to be rendered. What I’m essentially proposing is to generate optimized pieces of shader code on the fly for smaller areas of the screen, which can be evaluated much faster than the SDF for an entire scene.

I don’t know how viable it is to compile and run fresh shader code every time the camera moves on a GPU, but I believe it might actually be possible, using this kind of optimization, to render SDFs on a multicore desktop CPU at decent frame rates.

I think an octree could be a good starting point for the optimization idea you have.

Yes, it isn’t quite the same though, traversing rays through an octree for each pixel has a cost, touches a fair amount of memory. Fitting complex objects into an octree is feasible but possibly nontrivial also.

Actually using an BSP Tree(Binary Space Partition) is a piece of cake, especially if the sphere casting works. First of all, you don’t need to traverse the rays. You just need to test weather a the point of the ray march is in this part of the tree. Just one dot product at each node of the tree as you traverse to the leaf. To create the BSP tree calculate the center of each half of the BSP sub tree. Then for each object calculate the ray march sphere, if the sphere is larger then that half of the BSP sub tree then that half of the tree doesn’t contain the object. Repeat creating sub trees until the lists in each half are the same or a certain depth is hit.

That is the simplest ray acceleration I have seen in my 3 decades of working in computer graphics.

-Kurt

No doubt that using spatial subdivision is effective, except that it involves some amount of work for each pixel (actually, for each ray traced, which is going to be more than the number of pixels), which is expensive when you have many millions of pixels.

Actually, the more rays, the faster the speed up. The performance penalty comes in two ways. The initial creation of the oct/BSP tree, and use during each ray. The basic theory is the fasster partial SDF equations at each leaf node will more then makeup for the cost of traversing the tree for each ray. Since the performance benefit is per ray, you need to run greater then N rays to make up the cost of creating the tree.

Interesting side note. Shorter wider trees should be better. Less pointer chasing. Hence faster.

I recently found a new “graphics” technique along these lines, that is literally rocket science and Oscar worthy. I have to track down all the links and videos, so I be back in a day or three.

-Kurt

smells like ~dynamic programming

You can do this kind of partial evaluation with interval arithmetic. If you know that A < B within a particular interval, you can skip evaluating B in the operation min(A, B). For details, see "Interval Arithmetic and Recursive Subdivision for Implicit Functions and Constructive Solid Geometry", Duff 1992.

I've written a few CAD systems that use SDFs and functional representations. The most polished is at http://www.mattkeeter.com/projects/antimony

When I last tried GPU acceleration, the overhead of sending the simplified programs to the GPU ended up eating all of the gains from parallellization, but I'd be interested to see how your experiments turn out.

Thanks for the link to a great paper!

I did an MSc thesis in 1986 about optimizing ray tracing by partial evaluation. In addition to specialising to the object definitions and shader functions, I also did some experiments where I additionally specialised with respect to the origin of the rays (for primary rays and rays from light sources). This gave 10-20% additional speedup, IIRC, so your idea of doing this is probably fine. I’m not familiar with ray marching, so I can’t say how well it would apply there.

Would you be willing to share your thesis? Do you have it in ps or pdf format?

You can use interval arithmetic, or some more advanced versions of it like affine arithmetic, to find zero of sfd much faster (less iterations), at the expense of a bit slower per iteration cost. But at the same time you are much more reliable to artifacts, and can use much larger steps + binary / bisection search to speed things up. It once (many times years ago) implemented affine arithmetic based solver (in D in fact), and used it for raytracing too (in D too), it was pretty cool. I didn’t use GPU tho, that would be interesting.

Curious to see what you rendered with it!

It’s possible to optimize raymarching using bounding volume hierarchies as well.

Yes. It’s a similar idea, except that with partial evaluation you’d look at the whole process of intersection testing and rendering as a computational graph that you would simplify based on which part(s) of the screen you’re casting rays into.

I’d completely forgotten about this post, but just got an email notification of new comments, so I guess I subscribed to back in 2016 😂

Anyways, I have since wrote a SIGGRAPH paper on rendering implicit surfaces on the GPU:

https://mattkeeter.com/research/mpr

It’s using a light-weight interpreter on the GPU and constructing optimized instruction tapes for spatial regions, so lots of overlap with ideas discussed here.

I’ve also seen an approach based on separate shaders (rather than a GPU-based interpreter) here:

http://zone.dog/braindump/sdf_clustering/

(source: https://github.com/Aeva/tangerine)

Very cool. Thank you for the update and congratulations on your SIGGRAPH paper!