NaNs on the PS3 in GPAD (black pixels)

NaN stands for Not a Number, and is “a value or symbol that is usually produced as the result of an operation on invalid input operands, especially in floating-point calculations.” My first experience with NaNs in CG was when I was working on Scooby Doo 2. I was simulating curtains on top of live-action footage in Maya using Syflex Cloth and rendering them using mental ray. I kept getting strange black specks in my renders (only a single pixel in size), especially while rendering with heavy motion blur. After doing some digging I realized the black specks were NaNs being generated by some combination of my shading, lighting, and rendering setup.

 

NaNs are technically errors and are sometimes treated as such. However, in graphics the easiest thing to do is just return zero rather than generate an error that halts the program. NaNs that propagate through a calculation without raising an exception are called Quiet NaNs. This is what was happening to my curtain renders: the NaN results were simply being rendered as black specks (RGBA: 0, 0, 0, 0).

So, how are NaNs created?

  • All mathematical operations with a NaN as at least one operand
  • The divisions 0/0, ∞/∞, ∞/-∞, -∞/∞, and -∞/-∞
  • The multiplications 0×∞ and 0×-∞
  • The additions ∞ + (-∞), (-∞) + ∞ and equivalent subtractions.
  • Applying a function to arguments outside its domain, including taking the square root of a negative number, taking the logarithm of a negative number (the logarithm of zero gives -∞ rather than a NaN), or taking the inverse sine or cosine of a number which is less than -1 or greater than +1.
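
Here is a minimal HLSL sketch of the arithmetic cases in the list above. It assumes IEEE 754-style float behavior, which is not guaranteed on every GPU or compiler setting. MakeNaNs and its zero parameter are hypothetical names used only for illustration; zero is meant to be 0.0 supplied at run time (for example from a shader constant) so the compiler cannot fold the expressions away.

```hlsl
// Hypothetical helper, for illustration only.
// Assumes IEEE 754-style arithmetic; older shader hardware may behave differently.
// 'zero' is expected to hold 0.0 and to come from run-time data.
float4 MakeNaNs(float zero)
{
    float inf = 1.0f / zero;   // non-zero / 0 gives +infinity, not a NaN

    float a = zero / zero;     // 0 / 0
    float b = inf / inf;       // infinity / infinity
    float c = zero * inf;      // 0 x infinity
    float d = inf - inf;       // infinity + (-infinity)

    // Under the assumption above, all four components are NaN.
    return float4(a, b, c, d);
}
```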

It may seem like it’s pretty easy to avoid NaNs, but even some everyday functions can generate Quiet NaNs if you’re not careful. I ran into a Quiet NaN today that I managed to generate myself using some pretty innocuous HLSL code. It kept me occupied for a good 20 minutes before I finally realized what was happening. Here’s the offending code (explanation follows – see if you can spot the problem):

float2 Vec = normalize(float2(U, V));

Hint1: Dividing zero by zero will cause a NaN.
Hint2: The problem is a bit hidden in the normalize() function call.
Hint3: Vector Normalization

Answer: In the above code, a NaN is generated if we try to normalize a zero-length vector.
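
To see where the division hides, here is a sketch of what normalize() amounts to for a float2. Microsoft's HLSL documentation describes the intrinsic as x / length(x); NormalizeByHand is a hypothetical name, and the exact instructions a given compiler emits may differ.

```hlsl
// Hand-written equivalent of normalize() for a float2 (illustrative name).
float2 NormalizeByHand(float2 v)
{
    float len = sqrt(dot(v, v)); // same value as length(v); 0.0 when v is (0, 0)
    return v / len;              // for v = (0, 0) each component is 0 / 0, a NaN
}
```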

To get around this problem, we have to check for a zero-length vector before we make the function call. However, for reasons I haven’t quite figured out, once a variable is “dirtied” by a NaN it can’t be recovered. From what I have read, this could be a driver issue. Because of this, the following code, which guards the normalize() call with an ‘if’ statement, doesn’t work:

float U = 0.0f;
float V = 0.0f;
float2 Vec = float2(0.0f, 0.0f);
// Guard the normalize() call; in practice Vec still ended up as a NaN.
if(U + V != 0.0f) { Vec = normalize(float2(U, V)); }

Instead of putting the NaN-causing normalize() call inside the ‘if’ statement, I tested for the condition that causes the NaN first, thereby avoiding the NaN result entirely. I first checked whether my vector was zero-length and, if so, substituted a length of 1. Only then did I normalize the vector myself by dividing it by that length. This prevented a NaN from being generated at all:

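float U = 0.0f;
float V = 0.0f;
float2 Vec = float2(U, V);
float test = length(Vec);          // 0.0 for a zero-length vector
if(test == 0.0f) { test = 1.0f; }  // never divide by zero below
Vec = Vec / test;                  // manual normalize; (0, 0) stays (0, 0)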
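
The same idea can be wrapped in a reusable helper with no ‘if’ at all. This is only a sketch: SafeNormalize and the 1e-12 threshold are my own assumptions, not part of HLSL, and the threshold may need tuning for reduced-precision formats.

```hlsl
// Hypothetical branch-free helper; the name and epsilon are assumptions.
float2 SafeNormalize(float2 v)
{
    float lenSq = dot(v, v);                  // squared length, 0.0 for (0, 0)
    // Clamp away from zero so rsqrt() never receives 0.
    float invLen = rsqrt(max(lenSq, 1e-12f)); // at most 1e6, always finite
    return v * invLen;                        // (0, 0) * finite = (0, 0), no NaN
}
```

With this in place the original line becomes float2 Vec = SafeNormalize(float2(U, V));. The trade-off is that vectors shorter than about 1e-6 come back shorter than unit length instead of being normalized.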

I’m still curious as to why a NaN “dirties” any variable it is assigned to, however. It seems like the ‘if’ statement is getting flattened by the compiler and the NaN result is hanging around even if the conditions that cause it are not met. Does anyone have any insight into this?
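
One possible explanation, offered as a guess rather than confirmed behavior of any particular compiler or driver: if the flattened ‘if’ is turned into an arithmetic blend, both sides are always evaluated and the NaN survives the blend, because NaN × 0 is still NaN. FlattenedGuess below is a hypothetical illustration of that lowering, not actual compiler output.

```hlsl
// Hypothetical illustration of a flattened 'if' implemented as a blend.
// Assumes IEEE 754-style arithmetic, where NaN * 0 is NaN.
float2 FlattenedGuess(float U, float V)
{
    float2 fallback   = float2(0.0f, 0.0f);
    float2 normalized = normalize(float2(U, V));      // always evaluated; NaN for (0, 0)
    float  mask       = (U + V != 0.0f) ? 1.0f : 0.0f; // 1 when the 'if' would be taken

    // For U = V = 0: NaN * 0 + (0, 0) * 1 = NaN, so the guard has no effect.
    return normalized * mask + fallback * (1.0f - mask);
}
```

A true per-component select would discard the NaN, so this only explains the behavior if the compiler really does blend arithmetically.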