# Projection matrix

This article covers the operation of projecting a 3-dimensional scene onto a viewing plane in computer graphics; it is not to be confused with projections in linear algebra, the canonical projections of a product, or a projector matrix.

## Definition

The projection matrix is a [ilmath]4\times 4[/ilmath] real-valued matrix which represents the transformation from view coordinates to clip coordinates. It is the P in the MVP stack of matrices and is still commonly used today (whereas a single model matrix or a single view matrix is rarely used these days). The concepts behind all 3 are still present in today's (2017) graphics systems.

See classic transformation pipeline (computer graphics) for information on the other steps.

### Types

There are 2 common types of projection matrix (although really it's just a 4x4 matrix we've called "projection"):

1. Perspective projection matrix - the volume the camera views is represented by a frustum[Note 1] and points in it are mapped to [ilmath][-1,1]^4[/ilmath] - a 4-dimensional cube with sides of length 2 - called the clip coordinates
• As its name suggests, this is "perspectively correct": it makes things that are far away appear smaller.
2. Orthogonal projection matrix - rather than the "frustum" shape this transform takes a cuboid[Note 2] and maps it to clip coordinates, again [ilmath][-1,1]^4[/ilmath]
• This doesn't have the "perspective property", think for example of isometric paper - drawings on them have no concept of depth, as such far away things are the same size as close things (an object moving directly away from the camera would not change appearance in terms of geometry[Note 3])
• Orthogonal projection matrices are also used for 2-dimensional rendering tasks, allowing icons and such to be displayed on the screen using the graphics API directly. Such a matrix is very close to the identity matrix and usually differs only to account for the screen's aspect ratio.
• Historically for 2d games platforms like the SNES, this was a 3x3 matrix or even a 3-column-2-row one, and any layering to give depth in side-scrollers was done manually.
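The two types above can be sketched as matrix builders. This is a minimal illustration assuming the OpenGL-style convention (camera looking down [ilmath]-z[/ilmath], depth mapped to [ilmath][-1,1][/ilmath]); the function names and parameters are mine for illustration, not from any particular API:

```python
import math

def perspective(fov_y_deg, aspect, near, far):
    """Perspective projection for a frustum given by a vertical field of
    view, aspect ratio and near/far plane distances (OpenGL-style)."""
    f = 1.0 / math.tan(math.radians(fov_y_deg) / 2.0)
    return [
        [f / aspect, 0.0, 0.0,                          0.0],
        [0.0,        f,   0.0,                          0.0],
        [0.0,        0.0, (far + near) / (near - far),  2.0 * far * near / (near - far)],
        [0.0,        0.0, -1.0,                         0.0],  # w_clip = -z_view
    ]

def orthographic(left, right, bottom, top, near, far):
    """Orthogonal projection: maps the given cuboid to [-1, 1]^3 with
    no perspective effect (far objects stay the same size)."""
    return [
        [2.0 / (right - left), 0.0, 0.0, -(right + left) / (right - left)],
        [0.0, 2.0 / (top - bottom), 0.0, -(top + bottom) / (top - bottom)],
        [0.0, 0.0, -2.0 / (far - near), -(far + near) / (far - near)],
        [0.0, 0.0, 0.0, 1.0],
    ]
```

Note the bottom row: the perspective matrix copies [ilmath]-z[/ilmath] into the output [ilmath]w[/ilmath] (which is what later makes far things smaller), while the orthogonal matrix leaves [ilmath]w\eq 1[/ilmath]. A 2d-rendering orthogonal matrix would be `orthographic(0, width, height, 0, -1, 1)` or similar.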

## Using the projection matrix

Here [ilmath](x,y)^T[/ilmath] means that the row-vector [ilmath](x,y)[/ilmath] is to be transposed to [ilmath]\left(\begin{array}{c}x\\y\end{array}\right)[/ilmath]

Let [ilmath]P[/ilmath] be the projection matrix, and

• let [ilmath]v'\in\mathbb{R}^3[/ilmath] be given, so [ilmath]v'\eq(x,y,z)^T[/ilmath] for some [ilmath]x,y,z\in\mathbb{R} [/ilmath], now
• let [ilmath]v:\eq (x,y,z,1)^T\in\mathbb{R}^4[/ilmath] (for the use of a [ilmath]1[/ilmath] at the end there see position and direction vectors in 3d rendering)
• [ilmath]v_c:\eq Pv[/ilmath] - this gives us a position-vector [ilmath]v_c[/ilmath] in clip coordinates
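The steps above can be sketched directly; `mat_vec` and `to_clip` are hypothetical helper names (matrices as lists of rows, as in the earlier sketch):

```python
def mat_vec(M, v):
    """Multiply a 4x4 matrix (list of rows) by a 4-component column vector."""
    return [sum(M[i][j] * v[j] for j in range(4)) for i in range(4)]

def to_clip(P, x, y, z):
    """Embed v' = (x, y, z) as v = (x, y, z, 1) and apply P,
    giving the position vector v_c in clip coordinates."""
    return mat_vec(P, [x, y, z, 1.0])
```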

### Further steps

As this may be the only article that covers it, I add this information here; if the other pages exist (like classic transformation pipeline (computer graphics)) go there instead, and please tell me to remove this! It was written quickly, so forgive the poor quality!

With our clip coordinates we perform the perspective divide:

• $\text{PerspectiveDivide}\left(\begin{array}{c}x\\y\\z\\w\end{array}\right):\eq\left(\begin{array}{c}\dfrac{x}{w}\\\dfrac{y}{w}\\\dfrac{z}{w}\end{array}\right)\eq: v_n\in\mathbb{R}^3$

This new [ilmath]3[/ilmath]-vector is a point in "normalised device coordinates" (NDC), which is a cube with sides of length 2, [ilmath][-1,1]^3[/ilmath]

Note that points outside of the camera's view-volume will land outside of [ilmath][-1,1]^3[/ilmath] - this makes clipping much easier in NDC coordinates than in clip coordinates. However, clipping can also be done in clip coordinates using the [ilmath]w[/ilmath] component: a point is inside the view volume exactly when [ilmath]-w\le x\le w[/ilmath], and similarly for [ilmath]y[/ilmath] and [ilmath]z[/ilmath], which lets us reject points without ever dividing by [ilmath]w[/ilmath] (and so avoids dividing by zero).
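As a sketch of both the divide and the clip-coordinate visibility test (function names are mine, for illustration):

```python
def perspective_divide(clip):
    """(x, y, z, w) in clip coordinates -> (x/w, y/w, z/w) in NDC."""
    x, y, z, w = clip
    return [x / w, y / w, z / w]

def inside_clip_volume(clip):
    """Visibility test done in clip coordinates, before the divide:
    the point is in the view volume iff -w <= c <= w for c in x, y, z."""
    x, y, z, w = clip
    return all(-w <= c <= w for c in (x, y, z))
```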

Lastly come the so-called "screen coordinates": if our screen has width [ilmath]w[/ilmath] pixels and height [ilmath]h[/ilmath] pixels then the screen coordinates are:

• $\left(\begin{array}{c}x\\y\\z\end{array}\right)\mapsto\left(\begin{array}{c}w*\dfrac{x+1}{2}\\ h*\left(1-\dfrac{y+1}{2}\right)\\ z\end{array}\right)$

The [ilmath]\frac{r+1}{2}[/ilmath] part takes a value [ilmath]r\in[-1,1][/ilmath] and maps it to [ilmath][0,1][/ilmath] with [ilmath]-1\mapsto 0[/ilmath] and [ilmath]1\mapsto 1[/ilmath] in the obvious linear way.

We subtract this value from 1 for the [ilmath]y[/ilmath] coordinate because [ilmath]-1[/ilmath] for y's NDC coordinate corresponds to the bottom of the view, but the 0th row of pixels on a screen corresponds to the top row, so we need to flip it.
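The mapping above, including the [ilmath]y[/ilmath]-flip, as a sketch (the function name is mine):

```python
def ndc_to_screen(ndc, width, height):
    """Map NDC (x, y, z) to screen coordinates; y is flipped because
    row 0 of the screen is the top, but y = -1 is the bottom of the view."""
    x, y, z = ndc
    return [width * (x + 1.0) / 2.0,
            height * (1.0 - (y + 1.0) / 2.0),
            z]  # z passed through, e.g. for the depth buffer
```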

The z value - if significant to the program - may be transformed or simply discarded; it is often stored in a depth buffer and is crucial for the depth test in 3d graphics.

When we start a render with depth testing we'll clear the depth buffer with some value no less than the furthest depth (here, highest) value that can occur in NDC coordinates, which is [ilmath]1[/ilmath] - so we might use [ilmath]2[/ilmath]; anything at or above [ilmath]1[/ilmath] will do.

When we are rendering a point we read the depth buffer at the screen-coords [ilmath]x,y[/ilmath]; if the value we read from the buffer is lower than [ilmath]z[/ilmath], we throw away this pixel because we've already rendered something closer to the camera than it.

Otherwise we calculate its shading (to yield colour and such), set the screen coordinate [ilmath]x,y[/ilmath] pixel to the resulting colour (writing that colour to the [ilmath](x,y)[/ilmath]th location in the colour-buffer), and overwrite the larger [ilmath]z[/ilmath]-value already in the depth buffer with the point's own depth value, [ilmath]z[/ilmath], at the [ilmath](x,y)[/ilmath]th location in the depth buffer.
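The clearing and depth-test steps above can be sketched as follows; `draw_point` and the buffer layout are mine for illustration (buffers indexed `[y][x]`):

```python
def draw_point(x, y, z, colour, depth_buffer, colour_buffer):
    """Depth-tested write: skip the pixel if something nearer (smaller z)
    is already stored, else write the colour and the new depth."""
    if depth_buffer[y][x] < z:
        return False  # already rendered something closer to the camera
    colour_buffer[y][x] = colour
    depth_buffer[y][x] = z
    return True

# Clear: any value at or above the furthest NDC depth (1) works, e.g. 2.
W, H = 4, 3
depth = [[2.0] * W for _ in range(H)]
colour = [[(0, 0, 0)] * W for _ in range(H)]
```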