Realtime ray tracing


I'm not really expertly qualified to write such an article, but I'll make sure that it's as concise and informative as possible. I consider this article to be a work in progress, so if you'd like to know anything about real time ray tracing, don't hesitate to ask. I'm not 100% clear on some subjects (e.g. CSG and torus intersections), but I'll do my best to find out about them and perhaps write something on the subject one day. I'm personally interested in this subject myself, so whenever I learn something cool, I'll post something about it here.

Please note that I assume you've implemented a basic ray tracer before, and wish to speed it up. If you haven't done this, you can check out some tutorials on the web about it. I really recommend FuzzyPhoton's and Tom Hammersley's introduction to ray tracing if you haven't learnt about this yet.

Enough babble, on with the tutorial.

When Whitted invented ray tracing (or at least the generalised rendering method it is today), it took quite a while before the first implementation of ray tracing was done. This is mostly because of the inherent computational complexity of the algorithm, but can also be attributed to the slow machines of the time. Already we can see two fundamental reasons why ray tracing (even today) is not nearly as popular as normal polygon-based rendering methods:

1.) Algorithmically, it has a much higher initial cost of rendering than polygon-based algorithms.

2.) A lot of computational power is required to generate the images within a sizable fraction of the human life time :)

Although these problems are very similar, they are in fact separate problems. You can only optimise the ray tracing pipeline so much, until the only way to speed it up is to:

a.) Throw more processing power at it.
b.) Be slimy, use cheats and tricks :)

Before we get to the various methods of improving ray tracing speed, we have to accept that we will have to sacrifice scene complexity in order to achieve decent frame rates (even on modern CPUs).

We can do little about the second fundamental problem I listed above (high CPU requirements), since that's on the user's side. What we can do is take FULL advantage of the CPU power that we do have, for example by using the MMX, SSE and 3DNow! instructions present on many of today's CPUs. The problem with this is that it involves a lot of low-level code specific to only one part of the rendering pipeline, which in turn is specific to one type of processor. This basically means it's a LOT of work, so it really should be left as the last optimisation step.

Examples of real time ray tracing

There are actually very few demos and intros which showcase real time ray tracing, but the most notable ones are:

"Rubicon" and "Rubicon 2" by Suburban Creations
"Heaven seven" by Exceed
"Fresnel" and "Fresnel 2" by Kolor

Heaven seven is amazing in that it looks a lot better than it is :) It doesn't shadow check the CSG objects, and it has only one light source active at any one time (seemingly; I expect to get some flames from the wonderful Exceed bunch on this :), yet it still looks massively impressive.

Rubicon and Rubicon 2 are examples of highly optimised ray tracing, without cheating. While this looks really good, it's still too slow to be considered real time (especially for the high demoscene standards). I've exchanged many emails with the intro's coder (crossbone of Suburban Creations, Eberhard Grummt, hiho mate! :) and I can assure you that this is probably the most algorithmically optimised ray tracer ever. In spite of this, it's still slow because it doesn't employ some of the cheats that Heaven seven and the Fresnel series do. What I gather from this is that you really have to cheat in order to get things running very smoothly. The Rubicon intros are primarily optimised for quality (test your CPU and run the high quality version of Rubicon 1 :).

Fresnel and Fresnel 2 go beyond cheating and actually use it as a feature: hardware-accelerated ray tracing. I know this sounds impossible, but it really is possible, and you stand to gain enormous performance improvements if you do (not to mention bilinear filtering, perspective correct texturing, sub pixel accuracy, the lot). These really are technical masterpieces, for reasons aside from the real time ray tracing (Wavelet texture compression [the textures aren't generated, they're actually hand drawn and compressed to ~2k each!], another SERIOUSLY neglected topic ripe for tutoring :). I've also exchanged some emails with Shiva of Kolor (the coder, respect!), and this is another really optimised ray tracer, and probably the best example for real time ray tracing to date. Minor flaw: the sub-sampling (cheat, feature, whatever) is done incorrectly at the edges of objects on the screen, producing a somewhat ugly effect. But at the lowest subdivision resolution, it's hardly noticeable.

So without further ado, let's see what we can do to improve the algorithmic efficiency of our ray tracer.

Algorithmic optimisations

Many ray tracers use spatial subdivision techniques (e.g. octrees, BSPs, KD-trees, etc) to speed up ray tracing, the idea being to reduce the number of intersection tests performed. While this is an absolute must for large scenes (more than ~100 objects), with small scenes such as the ones we'll be using, it's just overhead. This is a hotly debated subject; this is just my logical reasoning. I haven't performed any official tests to verify this, but I'm fairly confident that this is the case. If anyone has other results from tests, please enlighten me :)

So while spatial subdivision is good for large scenes (to the point where in really large scenes with high resolution output, scene complexity is an almost negligible performance factor, making it the future for rendering [this is my hypothesis, don't flame me please!] in years to come), it's really useless for the real time ray tracing we'll be doing for the next two years or so. Who knows after then?

So the next thing we can do falls into two sub-categories:

1.) Improve the efficiency of all intersection tests.
2.) Exploit temporal and spatial coherence.

The solution to the first problem is very straightforward: just go through all your algorithms and equations, and make sure they're absolutely optimal. This normally involves special-casing equations for certain objects (e.g. not using a general quadric intersector to intersect a sphere) and factorising equations to minimise the number of operations required to perform the intersection. Also, sometimes you only need to know if an intersection occurs, and don't really care about where exactly it occurs. That case can be really well optimised.

Now let me list some examples of exploiting spatial and temporal coherence.

Shadow caching

If you trace your screen in horizontal strips from left to right (or even better, in square blocks...), top to bottom (as most frame buffers are stored) and need to perform a shadow check for each intersection, you'll notice that if one point is in shadow, then the next point will almost certainly be too. So instead of doing a shadow check against each object in the list sequentially (and according to Murphy's law, the object that obstructs the light will be last on the list :), keep a pointer to the last obstructing object in the light's structure. A cheat (it should be listed under the cheats section, but it's applicable here) would be to only do a shadow check every other pixel, and if one is in shadow then assume the next pixel is too (it's hardly noticeable) and skip its shadow check.

This can be implemented fairly easily. When doing your shadow checking for all your lights, just check the current light's "last occluder" before any others, and if it intersects, don't bother checking any other objects. Otherwise loop through all the objects (checking if it ISN'T the light's current "last occluder" every iteration) and check for intersections. If one of those intersects, make it the last occluder.
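To make the "last occluder" idea concrete, here is a minimal sketch of it. This is not the lRay source: the Vec3, Sphere and Light types, the blocks() helper and the tests counter are all made up for illustration, and the scene contains only spheres to keep things short.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Vec3 { float x, y, z; };

float dot(const Vec3 &a, const Vec3 &b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
Vec3 sub(const Vec3 &a, const Vec3 &b) { Vec3 r = {a.x - b.x, a.y - b.y, a.z - b.z}; return r; }

struct Sphere { Vec3 centre; float radius; };

// True if the sphere blocks the segment running from `from` to `to`.
// (A segment that starts inside the sphere is treated as unblocked, for brevity.)
bool blocks(const Sphere &s, const Vec3 &from, const Vec3 &to)
{
    Vec3 d = sub(to, from);
    float len = std::sqrt(dot(d, d));
    assert(len > 0.0f);
    Vec3 dir = {d.x / len, d.y / len, d.z / len};
    Vec3 oc = sub(from, s.centre);
    float b = dot(dir, oc);
    float c = dot(oc, oc) - s.radius * s.radius;
    float disc = b * b - c;
    if (disc < 0.0f) return false;
    float t = -b - std::sqrt(disc);
    return t > 1e-4f && t < len;     // hit must lie between the point and the light
}

struct Light
{
    Vec3 position;
    int lastOccluder = -1;           // index of the object that last blocked this light
};

// Shadow check with caching. `tests` counts intersection tests (illustration only).
bool inShadow(Light &light, const std::vector<Sphere> &objects, const Vec3 &point, int &tests)
{
    // Check the light's last occluder before any other object.
    if (light.lastOccluder >= 0)
    {
        ++tests;
        if (blocks(objects[light.lastOccluder], point, light.position)) return true;
    }

    // Otherwise loop through all the other objects.
    for (int i = 0; i < (int)objects.size(); ++i)
    {
        if (i == light.lastOccluder) continue;   // already tested above
        ++tests;
        if (blocks(objects[i], point, light.position))
        {
            light.lastOccluder = i;              // remember it for the next pixel
            return true;
        }
    }
    return false;
}
```

For neighbouring pixels that sit in the same shadow, the second and later calls only cost one intersection test each, no matter how many objects are in the list.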

I've implemented this in my real time ray tracing test program, lRay, and found (surprisingly) that it didn't make as large an impact as I'd hoped for (usually around 5%), though I think I'm just using a bad example case. Update: the reason this didn't make such a big speed difference was because I had relatively few objects in my scene. Now that I have more objects in my scenes (and have used it in my soft shadowing algorithm in another renderer :), I can confirm that this really does help.

Forward differencing

If you're ray tracing implicit surfaces and do a piecewise approximation of the surface by interpolating along the grid using a quadratic or cubic interpolation scheme, you can take advantage of the fact that you're tracing along a planar surface (the view plane) and that the points you trace are equally spaced (unless you're using stochastic sampling, which you shouldn't and wouldn't be doing with real time ray tracing :) to make your intersection fast enough for real time purposes (you just need some initialisation per scan line of each block of the implicit surface's grid).

This is intentionally vague because I wanted to include a reference to the technique, while making it brief because I don't expect to see this in real time for another few years :)
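For readers who haven't met forward differencing before, the core idea can be sketched on its own. The snippet below only shows how a quadratic is evaluated at equally spaced points using additions; it is not an implicit surface intersector, and the function name and parameters are made up for this example.

```cpp
#include <cassert>
#include <vector>

// Evaluate f(x) = a*x*x + b*x + c at x0, x0 + h, x0 + 2h, ... using forward
// differences: after the set-up, each new sample costs only two additions.
std::vector<float> forwardDifferenceQuadratic(float a, float b, float c,
                                              float x0, float h, int count)
{
    assert(count >= 0);
    std::vector<float> out;
    out.reserve(count);

    float f  = (a * x0 + b) * x0 + c;                 // f(x0)
    float d1 = a * (2.0f * x0 * h + h * h) + b * h;   // f(x0 + h) - f(x0)
    float d2 = 2.0f * a * h * h;                      // constant second difference

    for (int i = 0; i < count; ++i)
    {
        out.push_back(f);
        f  += d1;    // step to the next sample
        d1 += d2;    // update the first difference
    }
    return out;
}
```

In a renderer the set-up would be done once per scan line of each block, which is the per-scan-line initialisation mentioned above. Note that rounding errors accumulate along the run, so long runs may need to be re-initialised now and then.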

Speedup tricks

Aside from improving algorithms and using the current machine more effectively via specialised instruction sets, you can also improve speed greatly by just using your head a little. If you see a part of your code that's taking a particularly long time to execute, ask yourself if maybe there isn't a way that this could either be removed, precalculated, simulated or otherwise avoided.

Some notable speedup tricks include:

First hit optimisations

If any of the values you calculate during the course of rendering involve the origin of the ray and some other value(s) that remain constant for each frame (sphere radii, object positions etc), you can precalculate those values and store them as private data members of the object's data structure, and just reuse them whenever you need to calculate stuff from primary rays (the rays spawned from the camera or eye). This only works for primary rays, because the reflected or refracted rays obviously do not propagate from the camera's position, which means you'll have to write a specialised intersection routine for primary rays (or, as you'll see later, rays with any fixed origin).

This is a real must for any real time ray tracer, as primary rays constitute a large percentage (relatively, compared to "normal" ray tracing) of the total rays fired, since the scene complexity and difficulty (erm, for want of a better adjective describing the number of reflective and refractive surfaces :) are much lower for this type of scene.

Shadow check optimisations

Shadow checks are very expensive when you have lots of objects in the scene (n objects = n shadow checks PER INTERSECTION). But since all these shadow rays propagate from a fixed origin (the point of intersection), you can precalculate the same values as mentioned above for the shadow rays' origin. This should save you an enormous amount of time.

Again, this really is a must, because it will greatly reduce the speed impact that additional objects incur when being added to the scene.

Render at 1/2 or 1/4 resolution and interpolate

This can be seen in Rubicon 1 and 2. Though it isn't really a trick, it's hardly a kosher way of improving rendering performance either :) You needn't interpolate in normal squares, I've heard of many triangular interpolation schemes (such as the one used in Spinning Kids' "I feel like I could" demo) which look considerably better than standard square-based interpolation. I'm not really sure if there's a speed hit or not when interpolating along triangles rather than squares, but I'm sure there must be (it's just simpler to interpolate along squares). Which raises the question of whether or not those cycles should be used to improve the interpolation, or reduce the amount of interpolation and use squares. Tricky, but I'm sure it wouldn't go that far... some concrete performance tests would be great, but there aren't exactly vast pools of real time ray tracing info on the net :)

Pre-calculate per-frame constants

As mentioned in the introduction, some values related to intersections are constant for the entire frame (that is, they are independent of ray orientation), and can thus be pre-calculated. Vector dot products are really good pre-calculation fodder: loading the floating point values over the slow system bus, multiplying them, summing them, then storing the result in a floating point register is no walk in the park (I'm using a Celeron [update: not anymore :)], so I try to avoid using my 83.3 MHz system bus and use the high clock speed instead :). Loading a single precalculated float from RAM is hardly straining by comparison, and it also saves you tons of CPU work.

Reciprocals are also very good to precalculate if you divide by per-frame constants a lot.

Fastest ray / sphere intersection

I present what I find to be the fastest algebraic ray / sphere intersection routine.

This builds on what I mentioned above about precalculating per-frame constants. Here is a straight rip from my lRay source showing how you can implement really fast ray / sphere intersection, additionally with pre-calculated per-frame constants.

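// Called once per frame: precalculates everything that depends only on the camera position.
void lraySphere::frameCalc(lrayCamera *c)
{
    rSquared = radius * radius;
    invRadius = 1.0f / radius;
    origin = c->p - centre;              // camera position relative to the sphere centre
    cTerm = origin * origin - rSquared;  // "*" between two vectors is the dot product
}

// Any ray origin. Assumes r->d is normalised; returns the distance to the nearest hit, or -1 on a miss.
inline float lraySphere::intersectGeneral(ray *r)
{
    float b = r->d.x * (r->o.x - centre.x) + r->d.y * (r->o.y - centre.y) + r->d.z * (r->o.z - centre.z);
    float c = (r->o.x - centre.x) * (r->o.x - centre.x) + (r->o.y - centre.y) * (r->o.y - centre.y) + (r->o.z - centre.z) * (r->o.z - centre.z) - rSquared;

    float d = b * b - c;

    if (d < 0.0f) return -1.0f;

    return -b - sqrtf(d);
}

// Rays fired from the camera only: reuses origin and cTerm from frameCalc().
inline float lraySphere::intersectFixed(ray *r)
{
    float b = r->d.x * origin.x + r->d.y * origin.y + r->d.z * origin.z;

    float d = b * b - cTerm;

    if (d < 0.0f) return -1.0f;

    return -b - sqrtf(d);
}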

I had an idea for doing really fast ray/sphere intersections that I ran by a friend of mine, and we came up with this method:

Instead of trying to solve the ray/sphere intersection problem algebraically, you can look at the problem geometrically.

[Diagram: the geometry involved in ray/sphere intersection]

Polygon mesh rendering

Have you got a nice 4k triangle cow model you'd like to import into a ray tracing scene (hi Cuban! :)? Apart from using a bounding box (and some form of spatial subdivision scheme), there are very few straight ways to improve the intersection time required for such an object. Solution: don't ray trace it. Just fill a z-buffer with the object as it would appear transformed to the eye's view, along with the intersection distance parameter ("t" as it is often called) and any other info you need (colour, transparency coefficients, normals [expensive memory-wise], anything really), using standard scan line algorithms. So you can just ray trace right over that z-buffer, and everything will be perfect.

Problems start when you need the thing to reflect. Now, if it's a convex object (it at no point mirrors itself), you can just skip the entire mesh in the reflection's intersection tests, and get REALLY cheap reflection calculation for the object! If the object is non-convex, then you'd better make sure that you have some spatial subdivision scheme implemented, otherwise things WILL get painful :) I've never seen or heard of this anywhere, it's just a crazy idea I came up with while I was in the shower some day, so I don't know how useful it'd be.

Shadow intersections aren't nice for this either. Make sure you have a bounding box (NOT axially-aligned, you want a tight fit, even at the cost of having an arbitrarily orientated bounding box) for this. Just rotate the object's bounding box with the object.

In general, polygon meshes aren't used in real time ray tracing, but with a lot of tricks and hacking, it should be possible. The biggest problem with these is that if you want shadowing, you still have to do ray / mesh intersections, which are really expensive, even if you're using some sort of subdivision scheme (unless you're rendering a polygonal sphere, which is stupid when you have a ray tracer handy). Also, you shouldn't do this with hardware acceleration unless you have zillions of triangles in the mesh, because AGP bus inefficiencies (when reading from the hardware's framebuffer) might make hardware rendering actually slower than software rendering.

Shadow casting flags

You can really bump up scene efficiency if you know beforehand which objects won't obstruct light cast onto other objects. This is really simple to implement (even when designing the scene), and really helps, so it's a must.

Unfortunately it's really difficult to automatically generate these flags (by checking all points on every surface on all time steps in the animation sequence), so the person creating the scene must set it.
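Here is a small sketch of how such a flag might be used. The SceneObject struct and the shadowCasterIndices() helper are hypothetical names for this example; the point is just that the list of shadow casters is built once, and the shadow loop then walks that shorter list instead of the whole scene.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct SceneObject
{
    const char *name;
    bool castsShadow;   // set by whoever designs the scene
};

// Build, once per scene, the indices of the objects that shadow rays must be tested against.
std::vector<std::size_t> shadowCasterIndices(const std::vector<SceneObject> &objects)
{
    std::vector<std::size_t> casters;
    for (std::size_t i = 0; i < objects.size(); ++i)
        if (objects[i].castsShadow)
            casters.push_back(i);
    return casters;
}
```

A typical candidate for castsShadow = false is a floor plane that everything else stands on: it receives shadows, but never casts one onto another object.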

Using variable precision iterative root finders

When doing ray / implicit surface or ray / displacement mapped surface intersections and you're refining your intersection point (using interval bisection or whatever root finding method you choose), a nice trick is to use fewer refinement steps for secondary rays; you won't notice the difference. Also, when doing shadow checks it's completely unnecessary to refine your intersection once you know that the implicit surface intersects the ray.
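As a sketch of the idea, here is interval bisection with a step budget that depends on the kind of ray. The RayKind enum and the step counts (16, 6 and 0) are arbitrary values picked for illustration, not measured optimums, and f stands in for your implicit function evaluated along the ray.

```cpp
#include <cassert>
#include <cmath>
#include <functional>

enum RayKind { PRIMARY_RAY, SECONDARY_RAY, SHADOW_RAY };

// Hypothetical refinement budgets: tune these by eye for your own scenes.
int refinementSteps(RayKind kind)
{
    switch (kind)
    {
        case PRIMARY_RAY:   return 16;  // full precision where it is most visible
        case SECONDARY_RAY: return 6;   // reflections / refractions: fewer steps
        default:            return 0;   // shadow rays only need a yes/no answer
    }
}

// f(t) is the implicit function along the ray; [t0, t1] must bracket a sign change.
// Returns the midpoint of the final bracket.
float refineRoot(const std::function<float(float)> &f, float t0, float t1, RayKind kind)
{
    float f0 = f(t0);
    assert(f0 * f(t1) <= 0.0f);

    int steps = refinementSteps(kind);
    for (int i = 0; i < steps; ++i)
    {
        float mid = 0.5f * (t0 + t1);
        float fm = f(mid);
        if ((fm < 0.0f) == (f0 < 0.0f)) { t0 = mid; f0 = fm; }  // root is in the upper half
        else                            { t1 = mid; }           // root is in the lower half
    }
    return 0.5f * (t0 + t1);
}
```

Each bisection step halves the bracket, so 6 steps instead of 16 saves ten function evaluations per secondary ray at the cost of a bracket that is 1024 times wider.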

Dump OOP

As wonderfully suited to ray tracing as OOP might be, it still adds massive overhead (my recent measurements with VTune really scared me) to the typical (classic?) implementation where every object is inherited from a base class defining several virtual functions which need to be implemented in every subclass. Doing ray tracing without OOP feels really dirty and "hacked", but it's certainly faster.

At the core you'd just keep a list of the various objects by type (e.g. CSphere sphere[100], CMesh mesh[100], etc) along with the number of objects of each type. The number of calls to virtual functions for each pixel is a very serious overhead which I failed to realise until recently (if each pixel has ObjectNum intersection tests for reflections, then 2x or 3x ObjectNum shadow checks, it starts adding up VERY quickly!), so whether or not to use OOP is probably the most fundamental decision to be made when setting out to code your RTRT.
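A rough sketch of what the per-type layout could look like follows. The Scene struct, the overloaded intersect() functions and nearestHit() are illustrative names rather than lRay code, and only spheres and planes are shown.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };
struct Ray { Vec3 o, d; };               // d is assumed to be normalised

float dot(const Vec3 &a, const Vec3 &b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

struct Sphere { Vec3 centre; float radius; };
struct Plane  { Vec3 normal; float d; };  // points p on the plane satisfy normal . p = d

// Plain overloaded functions: the compiler can inline these, no virtual call involved.
inline float intersect(const Sphere &s, const Ray &r)
{
    Vec3 oc = {r.o.x - s.centre.x, r.o.y - s.centre.y, r.o.z - s.centre.z};
    float b = dot(r.d, oc);
    float disc = b * b - (dot(oc, oc) - s.radius * s.radius);
    if (disc < 0.0f) return -1.0f;
    return -b - std::sqrt(disc);
}

inline float intersect(const Plane &p, const Ray &r)
{
    float denom = dot(p.normal, r.d);
    if (std::fabs(denom) < 1e-6f) return -1.0f;   // ray is parallel to the plane
    return (p.d - dot(p.normal, r.o)) / denom;
}

// One fixed array per object type, plus a count for each.
struct Scene
{
    static const int MAX_OBJECTS = 100;
    Sphere sphere[MAX_OBJECTS]; int numSpheres = 0;
    Plane  plane[MAX_OBJECTS];  int numPlanes = 0;
};

// Distance to the nearest hit in front of the ray origin, or -1 if nothing is hit.
float nearestHit(const Scene &scene, const Ray &r)
{
    float best = -1.0f;
    for (int i = 0; i < scene.numSpheres; ++i)
    {
        float t = intersect(scene.sphere[i], r);
        if (t > 0.0f && (best < 0.0f || t < best)) best = t;
    }
    for (int i = 0; i < scene.numPlanes; ++i)
    {
        float t = intersect(scene.plane[i], r);
        if (t > 0.0f && (best < 0.0f || t < best)) best = t;
    }
    return best;
}
```

The price is one loop per object type everywhere you iterate over the scene, which is exactly the "dirty" feeling mentioned above.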

Don't use spatial subdivision schemes

For normal scenes (at the time of writing, Moore's law be damned! :), spatial subdivision doesn't yield enough of a performance increase to amortise the overhead incurred by using it.

Reducing the list of objects to intersect

A nice speedup you can implement quite easily involves pruning down the list of objects you want to intersect by iterating through each row in your image and checking for intersections between every object in your scene and the plane formed by the camera position and the two endpoints of the row. Have an array which stores whether or not that row intersects all the objects. Do the same for all columns. Code-wise it looks something like this:

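struct objectList // a struct, so the array can be accessed from outside
{
    bool intersectsObject[numObjects]; // numObjects must be a compile-time constant here
};

objectList row[YRES];
objectList column[XRES];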

Then to build the list of objects to intersect, you'd do something like:

for (int y = 0; y < yResolution; y++)
{
    vector yScanline = topLeftViewPlanePoint + downStep * (float)y;

    // "^" is the cross product and "*" the dot product
    intersectionPlane.normal = (yScanline ^ camera.right).normalise();
    intersectionPlane.dTerm = intersectionPlane.normal * camera.p; // "d" term for the plane

    // find all objects that intersect the current row
    for (int i = 0; i < numObjects; i++)
        row[y].intersectsObject[i] = object[i]->intersects(&intersectionPlane);
}

And then do a similar procedure for the columns.

Then, when you trace a pixel (you have to specify x and y co-ordinates), all you have to do is loop through the list of all objects in the scene, then check if the object is intersected by BOTH the row and the column for that pixel (check the array element on both lists belonging to that object). If it is, then do intersection testing with that object, otherwise don't. In code:

for (int i = 0; i < numObjects; i++)
{
    if (row[y].intersectsObject[i] && column[x].intersectsObject[i])
    {
        // intersect object...
    }
}

Effectively what this does is construct a 2D bounding box for intersections with that object.

The biggest problem with this method is writing plane / object intersection testing routines for each object. For planes and spheres this is trivial (for spheres, just get the minimum distance between the centre of the sphere and the plane, and check if this is less than or equal to the sphere's radius), but for other types of objects it can be quite non-trivial. If you can't write an efficient plane / object intersection function for a specific object, just always return true (for the intersection) and the object will always be intersected.
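The sphere case described above can be sketched in a few lines. The struct layouts here are assumptions made for the example; the plane follows the same convention as the row planes earlier (unit-length normal, with points on the plane satisfying normal . p = dTerm).

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

float dot(const Vec3 &a, const Vec3 &b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

struct Plane  { Vec3 normal; float dTerm; };   // normal must be unit length
struct Sphere { Vec3 centre; float radius; };

// The sphere touches or crosses the plane when the distance from its centre
// to the plane is no greater than its radius.
bool sphereIntersectsPlane(const Sphere &s, const Plane &p)
{
    float signedDistance = dot(p.normal, s.centre) - p.dTerm;
    return std::fabs(signedDistance) <= s.radius;
}
```

If the normal isn't normalised, the distance comes out scaled by the normal's length, so the comparison against the radius would be wrong.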

Taking advantage of a fixed camera

A very nice trick you can do is to precalculate your primary rays for all screen pixels, and just rotate the directions using a 3x3 matrix multiply. Also, because your camera is fixed, you can precalculate the above planes and just rotate the plane using the inverse (transpose for orthogonal 3x3 matrix) of your rotation matrix. Reasoning for this can be found in an excellent explanation from Steve Hollasch.
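The ray direction half of this trick might look something like the sketch below. It assumes a camera at the origin looking down +z with a row-major 3x3 matrix; precalcDirections(), rotationY() and the Mat3 layout are made up for this example, and the plane part of the trick is not shown.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Vec3 { float x, y, z; };
struct Mat3 { float m[3][3]; };   // row-major

Vec3 mul(const Mat3 &M, const Vec3 &v)
{
    Vec3 r = {M.m[0][0] * v.x + M.m[0][1] * v.y + M.m[0][2] * v.z,
              M.m[1][0] * v.x + M.m[1][1] * v.y + M.m[1][2] * v.z,
              M.m[2][0] * v.x + M.m[2][1] * v.y + M.m[2][2] * v.z};
    return r;
}

// Done once at start-up: a normalised direction through the centre of every pixel,
// for an unrotated camera looking down +z.
std::vector<Vec3> precalcDirections(int xres, int yres, float viewPlaneDist)
{
    std::vector<Vec3> dirs;
    dirs.reserve(xres * yres);
    for (int y = 0; y < yres; ++y)
        for (int x = 0; x < xres; ++x)
        {
            float px = (x + 0.5f) - 0.5f * xres;
            float py = 0.5f * yres - (y + 0.5f);
            float len = std::sqrt(px * px + py * py + viewPlaneDist * viewPlaneDist);
            Vec3 d = {px / len, py / len, viewPlaneDist / len};
            dirs.push_back(d);
        }
    return dirs;
}

// Example camera rotation: turn about the y axis by `angle` radians.
Mat3 rotationY(float angle)
{
    float c = std::cos(angle), s = std::sin(angle);
    Mat3 R = {{{c, 0.0f, s}, {0.0f, 1.0f, 0.0f}, {-s, 0.0f, c}}};
    return R;
}
```

Per frame you then call mul(cameraRotation, dirs[y * xres + x]) for each pixel you trace. A rotation preserves length, so the square root and divide per ray are gone for good.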

Frameless rendering

Because ray tracing is pixel-driven, you can easily trace pixels from random places on the view plane. So if your camera movement isn't too fast and you don't mind some fake motion blurring, you can opt to just trace N number of pixels (where N is intelligently chosen to keep things real time :) from random places on the view plane every frame, and just draw that. The less your view changes, the better this will work.
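A bare-bones sketch of the update loop is shown below. The TracePixelFn callback is a stand-in for your real per-pixel tracer, and the use of std::mt19937 is simply one convenient way to pick random pixels; a cheaper random number generator would probably be a better fit for real time work.

```cpp
#include <cassert>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical per-pixel tracer: returns the packed colour for pixel (x, y) at a given time.
typedef std::uint32_t (*TracePixelFn)(int x, int y, float time);

// Re-trace `samples` randomly chosen pixels and leave every other pixel as it was.
void framelessUpdate(std::vector<std::uint32_t> &frameBuffer, int xres, int yres,
                     int samples, float time, TracePixelFn tracePixel, std::mt19937 &rng)
{
    assert((int)frameBuffer.size() == xres * yres);

    std::uniform_int_distribution<int> pickX(0, xres - 1);
    std::uniform_int_distribution<int> pickY(0, yres - 1);

    for (int i = 0; i < samples; ++i)
    {
        int x = pickX(rng);
        int y = pickY(rng);
        frameBuffer[y * xres + x] = tracePixel(x, y, time);   // old pixels fade out over time
    }
}
```

The frame buffer is kept between frames, so the pixels that weren't picked still show the scene as it was a few frames ago. That is where the fake motion blur comes from.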

I've been thinking of fast ways to bias this to distribute more samples where they're needed most (like object boundaries), but so far I've only come up with some slow-ish methods which are probably not going to be very feasible for real time rendering.


Cheats

Cheating or faking often produces the biggest speed improvements of all, usually at some expense to technical satisfaction and image quality :) Since the demoscene lives and breathes on doing the impossible (by not actually doing it, Phong shading via env-mapping springs to mind), this is a very popular thing to do to speed up a slow ray tracer :)

Image sub-sampling

This has to be the most popular cheat around, and is basically the reason why I'm writing this article. I think this is such a neat cheat that it really deserves to be implemented in all ray tracers everywhere. It actually has the potential to IMPROVE your image quality if you do it right. This is only really true if you're doing texturing and lots of interpolation using hardware acceleration instead of interpolating in software.

Basically, what you do is define a grid of points at a resolution much lower than your screen resolution (usually using 8x8 or 16x16 blocks). At each point on the grid, you do your normal ray tracing, but with a twist: you don't actually draw anything. You just store the intersected object's ID, the U and V texture co-ordinates for that intersection, the additive colour at that point (specular lighting + reflected colour + refracted colour) and the diffuse shade at that point (just the simple Lambertian shading co-efficient). Now, all you have to do is interpolate the texture along the square, then shade it by the interpolated diffuse co-efficient, and add the interpolated additive colour. That's it! Well, almost, there are some points to note:

Firstly, what if the object IDs aren't the same on all four corners of the square? Well, there are three things you can do at this point:

1.) The first (and much faster) method is to evaluate the complete colour (as you'd render to the frame buffer normally) for all the corners and just interpolate using standard Gouraud interpolation (or some quadratic interpolation, it's up to you really :) Fresnel 1 and 2 by Kolor use this method, and I'd like to thank Shiva of Kolor for explaining this one to me.

2.) The second (and slower, but much better looking) method is to subdivide and re-trace the square until all the subdivided squares either have the same object IDs or are one pixel in area (making sure to only trace the 5 new points :). Obviously for high resolutions, the latter case can be quite slow, but still not as slow as normal ray tracing! You can actually make this method faster than the above method by increasing the initial interpolated block size (24x24 or 32x32 even), but this starts to look really bad (especially around the sharp shadows, but this can be fixed by adding the shadow's ID to the object's ID [making sure that there aren't multiple combinations of object IDs and shadow IDs that can produce the same ID], thanks Dman for that tip!) after you really turn up the block size, and leads us to the second problem of this method. Heaven seven and just about all other real time ray tracers use this method (a notable exception being Rubicon 1 and 2).

3.) Just trace the whole square. This is the method I've implemented in lRay at the moment. Some of the principal reasons for doing this include low code complexity (hehe, I'm really lazy :) and low overhead. If you have tons of code and huge data structures (I've seen all kinds of hashing schemes that need to be cleared every frame, etc) to manage subsampling then there's only so much to be gained. The recursive subsampling helps most in high resolution situations when you have 16x16 or bigger blocks. Personally, I'd rather use a lower resolution and have higher scene complexity. Weigh the benefits yourself.

The second problem with this method is that really small objects (smaller than your block size) that are completely contained within the block and not touching any of the corners will be missed. Having implemented this scheme for 8x8 blocks, I can say it does happen here and there, but it's not that critical. Unfortunately when you're tracing CSG and implicit surfaces some objects might have very thin edges which get missed in some frames, and in others not. This looks particularly bad when animated. Oh well, that's the price you pay...
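To tie the pieces of this scheme together, here is a simplified software sketch of the easy case, where all four corners of a block hit the same object. The CornerSample struct, the single-channel floats and the function names are assumptions made to keep the example short; a real implementation would interpolate U, V, the diffuse co-efficient and each colour channel in the same way.

```cpp
#include <cassert>
#include <vector>

// What gets stored at each grid point instead of a finished pixel.
struct CornerSample
{
    int objectId;
    float u, v;        // texture co-ordinates at the intersection
    float diffuse;     // Lambertian shading co-efficient
    float additive;    // specular + reflected + refracted (one channel here)
};

// Corners are ordered top-left, top-right, bottom-left, bottom-right.
bool sameObject(const CornerSample c[4])
{
    return c[0].objectId == c[1].objectId &&
           c[0].objectId == c[2].objectId &&
           c[0].objectId == c[3].objectId;
}

// Bilinearly interpolate one value over a blockSize x blockSize block.
// Pixel (0, 0) gets the top-left value; the other corners belong to neighbouring blocks.
void interpolateBlock(float tl, float tr, float bl, float br,
                      int blockSize, std::vector<float> &out)
{
    assert(blockSize > 0);
    out.resize(blockSize * blockSize);
    float inv = 1.0f / blockSize;
    for (int y = 0; y < blockSize; ++y)
    {
        float fy = y * inv;
        float left  = tl + (bl - tl) * fy;    // value on the left edge of this row
        float right = tr + (br - tr) * fy;    // value on the right edge of this row
        for (int x = 0; x < blockSize; ++x)
            out[y * blockSize + x] = left + (right - left) * (x * inv);
    }
}

// Final pixel: texel shaded by the interpolated diffuse term, plus the additive colour.
float shade(float texel, float diffuse, float additive)
{
    return texel * diffuse + additive;
}
```

When sameObject() returns false, you fall back to whichever of the three options above you picked (interpolating full colours, subdividing, or tracing the whole square).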

So by all means, implement this. Since this interpolation is remarkably akin to what 3D cards have been evolving to do for ages (hint :), you can be smart like the Kolor bunch and use OpenGL or DirectX to draw your quads. This way, you not only massively improve your rendering speed, but you also get bilinear filtering, sub-pixel accuracy, high quality RGB Gouraud shading, blending, everything for free. Now that is really something, and should be standard with every real time ray tracing demo :) If you still decide to use software to do the interpolation, I'd highly recommend using at least MMX for this (RGB Gouraud isn't nice without MMX), especially if you use bilinear filtering in software (madness, rather turn up your detail settings and object count :)

If you're going to use hardware accelerators to draw your quads, there are some things you should take into consideration. Firstly, it really makes sense to render your quads in batches that use the same texture(s). Secondly, you'll probably have to use loads of indexed triangle strips and other tomfoolery to get things running at a decent speed. Unfortunately I don't know enough about hardware accelerated rendering to give an informed opinion of how much this affects speed, but I imagine that stuffing all those quads over the AGP bus every frame can't be fun. With software rendering it really doesn't matter... though it might still make sense to render the quads by texture as well, for memory coherency. That will require loads of sorting which touches loads of memory anyway, so it might not work out to be as fast, but if you have loads of different and small textures it might be worthwhile.

Frame interpolation

I did some tests long ago with non-linear frame interpolation, and found that if the scene doesn't change too much from frame to frame (read: SLOW camera flybys :), then this looks really good (like an effect :) and can save you tons of time. What you do is make sure you've rendered at least one frame ahead, and just interpolate between the current, previous and next frames using a cubic interpolator (b-splines work well too, but are far too computationally expensive). Just make sure that you write your interpolator in MMX :)

Another simple trick would be to render TV-style: every other scanline, and just interpolate (preferably non-linearly, it really makes a difference) between the known and unknown scan lines. Tip: for interpolation speed (cache-wise), render vertical rather than horizontal strips.

Object faking

It's apparently quite possible to fake quartic objects such as tori by using CSGd quadric objects. Gamma2 by mfx supposedly does this, but I haven't really confirmed this, and I didn't notice that it's faked, so it just goes to show how effective this can be (along with really strong blurring filters... I couldn't see much anyway :). Although not related to speeding up ray tracing, it's a cool hack if you can do it correctly.

Conclusion and links

There isn't very much info on the net for ray tracing newbies who don't have a PhD in mathematics, English and computer software engineering, which is a pity. I've had some requests for an introductory article on ray tracing, but recently I found a really excellent article online which deals with many aspects of ray tracing, apart from having a great how-to tutorial :) See the FuzzyPhoton link.

Unfortunately the mirror to Tom Hammersley's site has also disappeared, so I've decided to mirror the ray tracing article here on my site. Tom, if you're alive and this bugs you, drop me a line :)

Most of the documents I've read on high-level subjects such as ray tracing, radiosity, wavelets etc. are normally "encoded" by big shot professors so that only other big shot professors can read and understand them (e.g.: "image" = "discretely bounded uni-polar planar integer function", "blur" = "high frequency band cut-off homogenous scalar convolution in the spatial domain", etc... I should collect quotes like these, they're so damn funny), and I'm only now beginning to get used to their enigmatic "dialect".

Anyway, rant mode off. If you have any straight English ray tracing info (particularly that which is applicable to real time ray tracing), I'd really appreciate it if you could just drop me a line. In fact, email me anyway :)

Here are some links I've gathered on ray tracing, you'll find that many of them have tips on speeding the process up. Just be sure to ignore the spatial subdivision stuff :)


FuzzyPhoton (excellent introductory ray tracing article and glossary)

Ray tracing tutorials on Polygone

Tom Hammersley's ray tracing doc (local mirror)

Some very good intersection tutorials

The RTRT Realm (a good source of ray tracing demos) (the best place to get demos)


Nick Chapman's real time ray tracer

2 to the x's ray tracing tutorials

Paul Bourke's website (excellent page with all kinds of useful stuff)


Greets and credits

I'd like to thank all the friendly people who have helped me learn about real time ray tracing, especially my then-mentor, Crossbone of Suburban Creations for answering all of my really stupid and incessant questions. Respect! :) Other greets (I'm probably forgetting 99% of them, but here goes):

Quartz of Kolor (Massively helpful with everything :)
Shiva of Kolor (For help with ray tracing [esp. CSG] and wavelet image compression)
Tom Hammersley (Your doc on ray tracing got me started! Put your site back online and update it, it ruled!)
Shaun Nirenstein (For the many enlightening conversations)
Ingo Wald (Who helped me a lot with many aspects of ray tracing on his visit to Afrigraph)
IRCnet #coders (HardCoreCoderCentre, much fun and help even at 4 in the morning :)
The RTRT mailing list (Lots of really cool guys and good info here)
The South African demoscene (Keep up the local scene spirit!)