Written by Luka Kerr on August 12, 2019



Single Instruction, Multiple Thread (SIMT)

Each processor in the GPU has multiple threads, executing in lock step for a piece of code.

It combines single instruction, multiple data (SIMD) with multithreading.


OpenGL is a low-level 2D/3D graphics API. It is:

OpenGL is used to:


Coordinate System

By default the viewport is centred at (0, 0). The left boundary is at x = -1, the right at x = 1, the bottom at y = -1, and the top at y = 1.



We can draw polygons by splitting them up into simpler shapes (typically triangles). One simple method is to use a triangle fan.

The steps to draw a polygon using a triangle fan are as follows:

  1. Start with any vertex of the polygon and move clockwise or counter-clockwise around it
  2. The first three points form a triangle. Any new points after that form a triangle with the last point and the starting point

Tessellation works for all simple convex polygons, and some concave ones. It can be drawn with GL_TRIANGLE_FAN.

2D Transformations

Model Transformation

A model transformation describes how a local coordinate system maps to the world coordinate system. Each object in a scene has its own local coordinate system.

Coordinate Frames

Each coordinate system has a coordinate frame. A coordinate frame has an origin, and the direction and scale of the x and y axes. Different transformations can be applied to coordinate frames including:

Vectors & Matrices

The sum of a point and a vector is a point: $P + \vec{v} = Q$.

The difference between two points is a vector: $\vec{v} = Q - P$.

The sum/difference of two vectors is a vector: $(u_1, u_2)^T \pm (v_1, v_2)^T = (u_1 \pm v_1, u_2 \pm v_2)^T$.

Magnitude: $|\vec{v}| = \sqrt{v^2_1 + v^2_2 + \cdots + v^2_n}$.

Normalisation (direction): $\hat{\vec{v}} = \dfrac{\vec{v}}{|\vec{v}|}$, and $|\hat{\vec{v}}| = 1$.

Dot product: $\vec{u} \cdot \vec{v} = u_1 v_1 + u_2 v_2 + \cdots + u_n v_n$.

Cross product: $\vec{a} \times \vec{b} = (a_2 b_3 - a_3 b_2, a_3 b_1 - a_1 b_3, a_1 b_2 - a_2 b_1)^T$.

Cross product magnitude is the area of the parallelogram formed by: $|\vec{a} \times \vec{b}| = |a| |b| \sin \theta$.

Matrix multiplication: $A_{m \times n} \cdot B_{n \times p} = C_{m \times p}$ where $C_{ij} = (a_{i1} \cdot b_{1j}) + (a_{i2} \cdot b_{2j}) + \cdots + (a_{in} \cdot b_{nj})$.

Scene Trees

Scene Tree

A scene tree is used to represent a complex scene. Each node in the scene tree represents an objet, and each edge represents the transformation to get from the parent object’s coordinate system to the child’s.

Each node in a scene tree computes its own coordinate frame, draws itself within the frame, and passes the frame down to its children. As such, when a node is transformed, all of its children move with it.

def drawTree(frame):
  # Compute new frame
  draw() this object
  for child in children:

Scene Graph

A scene graph is a more general scene tree. It is a directed acyclic multi-graph.


Transformation Pipeline

We transform in 2 stages

\[P_{camera} \overset{view}{\longleftarrow} P_{world} \overset{model}{\longleftarrow} P_{local}\]

The model transform transforms points in the local coordinate system to the world coordinate system.

The view transform transforms points in the world coordinate system to the camera’s coordinate system.

If \(P_{world} = Trans(Rot(Scale(P_{camera})))\) then the view transform is \(P_{camera} = Scale^{-1}(Rot^{-1}(Trans^{-1}(P_{world}))).\)

Coordinate Frames

A 2D coordinate frame is defined by

A point on the coordinate frame can be defined as a displacement from the origin, i.e. $P = p_1 i + p_2 j + \phi$.

Affine Transformations

Transformations between coordinate frames can be represented as matrices

\[\begin{pmatrix} q_1 \\ q_2 \\ 1 \end{pmatrix} = \begin{pmatrix} i_1 & j_1 & \phi_1 \\ i_1 & j_2 & \phi_2 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} p_1 \\ p_2 \\ 1 \end{pmatrix}\]

Similarly for vectors

\[\begin{pmatrix} v_1 \\ v_2 \\ 0 \end{pmatrix} = \begin{pmatrix} i_1 & j_1 & \phi_1 \\ i_1 & j_2 & \phi_2 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} u_1 \\ u_2 \\ 0 \end{pmatrix}\]

All affine transformations can be expressed as combinations of four basic types:

2D Translation

To translate the origin to a new point $\phi$

\[\begin{pmatrix} q_1 \\ q_2 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 & \phi_1 \\ 0 & 1 & \phi_2 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} p_1 \\ p_2 \\ 1 \end{pmatrix}\]

2D Rotation

To rotate a point about the origin by $\theta$

\[\begin{pmatrix} q_1 \\ q_2 \\ 1 \end{pmatrix} = \begin{pmatrix} cos(\theta) & -sin(\theta) & 0 \\ sin(\theta) & cos(\theta) & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} p_1 \\ p_2 \\ 1 \end{pmatrix}\]

Likewise to rotate a vector by $\theta$

\[\begin{pmatrix} v_1 \\ v_2 \\ 0 \end{pmatrix} = \begin{pmatrix} cos(\theta) & -sin(\theta) & 0 \\ sin(\theta) & cos(\theta) & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} u_1 \\ u_2 \\ 0 \end{pmatrix}\]

2D Scale

To scale a point by factors $(s_x, s_y)$ about the origin

\[\begin{pmatrix} s_x p_1 \\ s_y p_2 \\ 1 \end{pmatrix} = \begin{pmatrix} s_x & 0 & 0 \\ 0 & s_y & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} p_1 \\ p_2 \\ 1 \end{pmatrix}\]

Likewise to scale a vector by $(s_x, s_y)$

\[\begin{pmatrix} s_x v_1 \\ s_y v_2 \\ 0 \end{pmatrix} = \begin{pmatrix} s_x & 0 & 0 \\ 0 & s_y & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} v_1 \\ v_2 \\ 0 \end{pmatrix}\]

2D Shear

To horizontally shear a point by a factor $h$

\[\begin{pmatrix} q_1 \\ q_2 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 & h & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} p_1 \\ p_2 \\ 1 \end{pmatrix}\]

To vertically shear a point by a factor $v$

\[\begin{pmatrix} q_1 \\ q_2 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ v & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} p_1 \\ p_2 \\ 1 \end{pmatrix}\]

Decomposing Transformations

Every 2D affine transformation can be decomposed as $M = M_{translate} M_{rotate} M_{scale} M_{shear}$.

To decompose the transform, consider the matrix form

\[\begin{pmatrix} i_1 & j_1 & \phi_1 \\ i_2 & j_2 & \phi_2 \\ 0 & 0 & 1 \end{pmatrix}\]

Then, assuming uniform scaling and no shear

\[\begin{array}{lcl} M_{translate} = \begin{pmatrix}\phi_1 \\ \phi_2 \\ 1\end{pmatrix} \\ M_{rotate} = atan2(i_2, i_1) \\ M_{scale} = |i| \end{array}\]

where $atan2(i_2, i_1)$ is $atan(i_2 / i_1)$ or $tan^{-1}(i_2 / i_1)$ adujsting for $i_1$ being 0. If $i_1 = 0$ and $i_2 \ne 0$ we get $90^{\circ}$ if $i_2 > 0$, or $-90^{\circ}$ if $i_2 < 0$.

Inverse Transformations

Inverses can be calculated as follows

\[\begin{array}{lcl} M_{translate}^{-1}(d_x, d_y) & = & M_{translate}(-d_x, -d_y) \\ M_{rotate}^{-1}(\theta) & = & M_{rotate}(-\theta) \\ M_{scale}^{-1}(s_x, s_y) & = & M_{scale}(1 / s_x, 1 / s_y) \\ M_{shear}^{-1}(h) & = & M_{shear}(-h) \\ \end{array}\]

Linear Interpolation

\[\begin{array}{lcl} lerp(P, Q, t) & = & P + t(Q - P) \\ lerp(P, Q, t) & = & P(1 - t) + tQ \\ \end{array}\]



Parametric Form

\[L(t) = P + t(Q - P)\]

Point Normal Form

\[\mathbf{n} \cdot (P - L) = 0\]

Line Intersection

Two lines

Solve the simultaneous equations $(B - A)t = (C - A) + (D - C)u$.

Polygon Intersection

To check if a point $P$ is in a polygon, we can fire a horizontal ray from $P$ and check how many times it crosses the edges of the polygon. If $n$ is the number of crosses and $n$ is even then $P$ is not in the polygon, if $n$ is odd then $P$ is in the polygon.


Shaders are programs executed on the GPU for the purpose of rendering graphics, written in glsl.

Vertex Shaders

The GPU will execute the vertex shader for every vertex we supply it.

Variable types include

Variables starting with gl_ and have special meaning.

// Incoming vertex position
in vec2 position;

// Takes the input position and returns it as is
void main() {
  gl_Position = vec4(position, 0, 1);

Fragment Shaders

The GPU will execute the fragment shader for every pixel it draws into the frame buffer.

out vec4 outputColor;

// Output black
void main() {
  outputColor = vec4(0, 0, 0, 0);


Moving to 3D is simply a matter of adding an extra dimension to our points and vectors

\[P = \begin{pmatrix} p_x \\ p_y \\ p_z \\ 1 \end{pmatrix}, \quad v = \begin{pmatrix} v_x \\ v_y \\ v_z \\ 0 \end{pmatrix}\]

3D Affine Transformations

3D affine transformations have the same structure as 2D but have an extra axis $k$

\[M = \begin{pmatrix} i_1 & j_1 & k_1 & \phi_1 \\ i_2 & j_2 & k_2 & \phi_2 \\ i_3 & j_3 & k_3 & \phi_3 \\ 0 & 0 & 0 & 1 \end{pmatrix}\]

3D Translation

To translate the origin to a new point $\phi$

\[\begin{pmatrix} q_1 \\ q_2 \\ q_3 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 & \phi_1 \\ 0 & 1 & 0 & \phi_2 \\ 0 & 0 & 1 & \phi_3 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} p_1 \\ p_2 \\ p_3 \\ 1 \end{pmatrix}\]

3D Rotation

The rotation matrix depends on the axis of rotation. We can decompose any rotation into a sequence of rotations about the $x$, $y$ and $z$ axes.

\[\begin{pmatrix} q_{x_1} \\ q_{x_2} \\ q_{x_3} \\ 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & cos(\theta) & -sin(\theta) & 0 \\ 0 & sin(\theta) & cos(\theta) & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} p_1 \\ p_2 \\ p_3 \\ 1 \end{pmatrix}\] \[\begin{pmatrix} q_{y_1} \\ q_{y_2} \\ q_{y_3} \\ 1 \end{pmatrix} = \begin{pmatrix} cos(\theta) & 0 & sin(\theta) & 0 \\ 0 & 1 & 0 & 0 \\ -sin(\theta) & 0 & cos(\theta) & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} p_1 \\ p_2 \\ p_3 \\ 1 \end{pmatrix}\] \[\begin{pmatrix} q_{z_1} \\ q_{z_2} \\ q_{z_3} \\ 1 \end{pmatrix} = \begin{pmatrix} cos(\theta) & -sin(\theta) & 0 & 0 \\ sin(\theta) & cos(\theta) & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} p_1 \\ p_2 \\ p_3 \\ 1 \end{pmatrix}\]

3D Scale

To scale a point by factors $(s_x, s_y, s_z)$ about the origin

\[\begin{pmatrix} s_x p_1 \\ s_y p_2 \\ s_z p_3 \\ 1 \end{pmatrix} = \begin{pmatrix} s_x & 0 & 0 & 0 \\ 0 & s_y & 0 & 0 \\ 0 & 0 & s_z & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} p_1 \\ p_2 \\ p_3 \\ 1 \end{pmatrix}\]

3D Shear

To horizontally shear a point by a factor $h$

\[\begin{pmatrix} q_1 \\ q_2 \\ q_3 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 & h & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} p_1 \\ p_2 \\ p_3 \\ 1 \end{pmatrix}\]


Winding Order

By default, triangles are defined with counter-clockwise vertices are processed as front-facing triangles. Clockwise are processed as back-facing triangles.

Back Face Culling

An optimisation called face culling allows non-visible triangles of closed surfaces (back faces) to be culled before rasterisation. This is based on the winding order of the triangle - front faces are CCW by default.

View Volume

A 3D camera shows a 3D view volume. This is the area of space that will be displayed in the viewport. Objects outside the view volume are clipped. The view volume is in camera coordinates.

Orthographic View Volume

The camera is located at the origin in camera co-ordinates and oriented down the negative z-axis.

Canonical View Volume

It is convenient for clipping if we scale all coordinates so that visible points lie within the range $(-1, 1)$. The origin is in the middle of the view volume and the z-axis is flipped.

Perspective Camera

We can define a different kind of projection that implements a perspective camera. The view volume is a frustum. A frustum consists of:

public Matrix4 frustum(float left, float right, float bottom, 
                       float top, float near, float far) {
  return new Matrix4(new float[] {
    2*near/(right-left), 0, 0, 0,
    0, 2*near/(top-bottom), 0, 0,
    (right+left)/(right-left), (top+bottom)/(top-bottom), -(far+near)/(far-near), -1,
    0, 0, -2*far*near/(far-near), 0

It is sometimes easier to define a frustum with a field of view, aspect ratio and near and far planes:

public Matrix4 perspective(float fovy, float aspectRatio, float near, float far) {
  float halfHeight = (float) (near * Math.tan(Math.toRadians(fovy) / 2));
  return Matrix4.frustum(
    -aspectRatio * halfHeight, aspectRatio * halfHeight, 
    -halfHeight, halfHeight, near, far

Assuming a symmetric view volume:

The Transformation Pipeline

To transform a point $P = (p_x, p_y, p_z)^T$:


Bilinear Interpolation



\[\begin{array}{lcl} f(R_1) = \dfrac{y - y_1}{y_2 - y_1} f(Q_2) + \dfrac{y_2 - y}{y_2 - y_1} f(Q_1) \\ \\ f(R_2) = \dfrac{y - y_3}{y_4 - y_3} f(Q_4) + \dfrac{y_4 - y}{y_4 - y_3} f(Q_3) \\ \\ f(P) = \dfrac{x - x_1}{x_2 - x_1} f(R_2) + \dfrac{x_2 - x}{x_2 - x_1} f(R_1) \end{array}\]

where $f(Q_n)$ gives the value $z_n$, or the depth, when computing pseudodepth.


Cohen-Sutherland Algorithm

Cyrus Beck Algorithm


3D objects can be represented as polygonal meshes. Usually polygonal meshes are made up of a collection of triangles as they are easier to work with, although require more memory.

Mesh Data Structure

It is common to represent a polygon mesh in terms of lists:

For example, given a cube we can construct a triangle mesh as follows

vertex_list = {
  { -1, -1,  1 }, // Vertex 0, with (x, y, z) coordinates
  {  1, -1,  1 }, // Vertex 1, with (x, y, z) coordinates
  {  1,  1,  1 }, // ...
  { -1 , 1,  1 },
  { -1, -1, -1 },
  {  1, -1, -1 },
  {  1,  1, -1 },
  { -1,  1, -1 }

face_list = {
  { 0, 1, 2 }, // Triangle face 0, made up of vertices 0, 1, 2
  { 2, 3, 0 }, // Triangle face 1, made up of vertices 2, 3, 4
  { 1, 5, 6 }, // ...
  { 6, 2, 1 },
  { 5, 4, 7 },
  { 7, 6, 5 },
  { 4, 0, 3 },
  { 3, 7, 4 },
  { 3, 2, 6 },
  { 6, 7, 3 },
  { 4, 5, 1 },
  { 1, 0, 4 }

.ply Format

The PLY format is a simple file format for representing polygon data.

The header specifies how many vertices and how many faces there are and what format they’re in. The body lists all the vertices and faces.



The colour of an object in a scene depends on

There are two kinds of reflection we need to deal with, diffuse and specular.

Dull or matte surfaces exhibit diffuse reflection. Light falling on the surface is reflected uniformly in all directions. It does not depend on the viewpoint.

Polished surfaces exhibit specular reflection. Light falling on the surface is reflected at the same angle. Reflections will look different from different view points.


Most objects will have both a diffuse and a specular component to their illumination. will also include an ambient component to cover lighting from indirect sources.

We will build a lighting equation $I(P) = I_{ambient}(P) + I_{diffuse}(P) + I_{specular}(P)$, where $I(P)$ is the amount of light coming from $P$ to the camera.

To calculate the lighting equation we need to know three important vectors

All three of these vectors need to be normalised.

If we have multiple lights, we add the ambient, diffuse and specular components from all of them:

\[I(P) = \sum_{l \in lights} I_{ambient}^l(P) + I_{diffuse}^l(P) + I_{specular}^l(P)\]

Diffuse Illumination

Diffuse scattering is equal in all directions so does not depend on the viewing angle. The amount of reflected light depends on the angle of the source of the light.

This can be formalised as:

\[I_{diffuse} = max(0, I_{si} \rho_d(\mathbf{s} \cdot \mathbf{m})\]


Specular Reflection

Phong Model

The Phong model is an approximate model to calculate the specular reflection value $\mathbf{r}$.

\[\begin{array}{rrcl} & \mathbf{r} & = & normalise(-\mathbf{s} + 2(\mathbf{s} \cdot \mathbf{m}) \mathbf{m}) \\ \therefore & I_{specular} & = & max(0, I_{si} \rho_{sp} (\mathbf{r} \cdot \mathbf{v})^f) \end{array}\]


Blinn Phong Model

The Blinn Phong model uses a vector halfway between the source and the viewer instead of calculating the reflection vector.

\[\begin{array}{rrcl} & \mathbf{h} & = & normalise(\dfrac{\mathbf{s} + \mathbf{v}}{2}) \\ \therefore & I_{specular} & = & max(0, I_{si} \rho_{sp} (\mathbf{h} \cdot \mathbf{m})^f) \end{array}\]


Ambient Light

Lighting with just diffuse and specular lights gives very stark shadows, whereas in reality shadows are not completely black as light is coming from all directions.

\[I_{ambient} = I_a \rho_a\]



Point Lights

Point lights have a point in the world from which we compute the source vector.

Directional Lights

Directional lights are so far away that the source vector is effectively the same everywhere.


A spotlight has a direction, a cutoff angle and a attenuation factor. The brightness can be calculated as:

\[I = I_{si} (\cos(\theta))^\epsilon\]


Face Normals

Every vertex for a given face will be given the same normal.

To calculate the normal we can use either the cross product method or Newell’s method.

Cross Product Method

Works for any convex polygon, but vertices must be in counter-clockwise order.

Pick two non-parallel adjacent edges with vertices $P_{i - 1}, P_{i + 1}$ and calculate for point $P_i$:

\[\mathbf{n} = (P_{i + 1} - P_i) \times (P_{i - 1} - P_i).\]
private Vector3 normal(Point3D p1, Point3D p2, Point3D p3) {
  Vector3 a = p2.minus(p1);
  Vector3 b = p3.minus(p1);

  return a.cross(b);
Newell’s Method

A more robust approach to computing face normals for arbitrary polygons:

\[\begin{array}{lcl} n_x & = & \sum_{i = 0}^{N - 1} (y_i - y_{i + 1})(z_i + z_{i + 1}) \\ n_y & = & \sum_{i = 0}^{N - 1} (z_i - z_{i + 1})(x_i + x_{i + 1}) \\ n_z & = & \sum_{i = 0}^{N - 1} (x_i - x_{i + 1})(y_i + y_{i + 1}) \\ \end{array}\]

where $(x_N, y_N, z_N) = (x_0, y_0, z_0)$.

Vertex Normals

We keep a separate buffer of normals for a mesh, and calculate it by a weighted average of the face normals.

private void vertexNormals() {
  // Initialise the normals to the zero vector
  for (int i = 0; i < normals.capacity(); i++) {
    normals.put(i, 0, 0, 0);

  // Add the face normals of all surrounding faces.
  for (int i = 0; i < indices.capacity() / 3; i++) {
    int index1 = indices.get(i*3);
    int index2 = indices.get(i*3 + 1);
    int index3 = indices.get(i*3 + 2);

    Point3D p1 = vertices.get(index1);
    Point3D p2 = vertices.get(index2);
    Point3D p3 = vertices.get(index3);

    Vector3 normal = normal(p1, p2, p3);

    normals.put(index1, normals.get(index1).translate(normal));
    normals.put(index2, normals.get(index2).translate(normal));
    normals.put(index3, normals.get(index3).translate(normal));


Flat Shading

Flat shading simply shades the entire face the same colour. It does this by computing $I(P)$ for some $P$ on the face (usually the first vertex) and sets every pixel on the face that colour.

Flat shading is good for

Flat shading is bad for

Gouraud Shading

The lighting equation $I(P)$ is calculated for each vertex $P$ using an associated vertex normal.

We calculate fragment colours by bilinear interpolation on neighbouring vertices:

\[\begin{array}{lcl} colour(R_1) & = & lerp(V_1, V_2, \frac{y - y_1}{y_2 - y_1}) \\ colour(R_2) & = & lerp(V_3, V_4, \frac{y - y_3}{y_4 - y_3}) \\ colour(P) & = & lerp(R_1, R_2, \frac{x - x_1}{x_2 - x_1}) \\ \end{array}\]


Gouraud shading is good for

Gouraud shading is bad for

Phong Shading

The lighting equation is calculated per fragment rather than per vertex.

void main() {
  vec3 m_unit = normalize(m);

  // Compute the s, v and r vectors
  vec3 s = normalize(view_matrix * vec4(lightPos, 1) - viewPosition).xyz;
  vec3 v = normalize(-viewPosition.xyz);
  vec3 r = normalize(reflect(-s, m_unit));

  vec3 ambient = ambientIntensity * ambientCoeff;
  vec3 diffuse = max(0, lightIntensity * diffuseCoeff * dot(m_unit, s));
  vec3 specular = vec3(0);

  // Only show specular reflections for the front face
  if (dot(m_unit, s) > 0) {
    specular = max(0, lightIntensity * specularCoeff * pow(dot(r, v), phongExp));

  vec3 intensity = ambient + diffuse + specular;

  outputColor = vec4(intensity, 1) * input_color;

Phong shading is good for

Phong shading is bad for


We implement colour by having separate red, green and blue components for light intensities $I(P), I_{ambient}(P), I_{specular}(P)$ and reflection coefficients $\rho_a, \rho_d, \rho_{sp}$. The lighting equation is then applied three times, once for each colour.


Alpha Blending

Given an $\alpha$ value and two colours, we can linearly interpolate and compute new $r^\prime, g^\prime, b^\prime, a^\prime$ values based on $\alpha$:

\[\begin{pmatrix} r^\prime \\ g^\prime \\ b^\prime \\ a^\prime \end{pmatrix} = \alpha \begin{pmatrix} r_{image} \\ g_{image} \\ b_{image} \\ a_{image} \end{pmatrix} + (1 - \alpha) \begin{pmatrix} r \\ g \\ b \\ a \end{pmatrix}\]

Alpha blending has some problems, including

In summary, transparent polygons should be drawn in back-to-front order after opaque ones.


It is generally useful to express curves in parametric form:

\[\begin{pmatrix} p_1 \\ p_2 \\ p_3 \end{pmatrix} = P(t), t \in [0, 1]\]

Linear Interpolation

Linear interpolation is good for straight lines. It is of degree 1 and involves 2 control points $P_0, P_1$.

\[P(t) = (1 - t)P_0 + t P_1\]

Quadratic Interpolation

Quadratic interpolation is good for parabolas. It is of degree 2 and involves 3 control points $P_0, P_1, P_2$.

\[P(t) = (1 - t)^2 P_0 + 2t(1 - t) P_1 + t^2 P_2\]

Cubic Interpolation

Cubic interpolation is good for a variety of curves. It is of degree 3 and involves 4 control points $P_0, P_1, P_2, P_3$.

\[P(t) = (1 - t)^3 P_0 + 3t(1 - t)^2 P_1 + 3t^2(1 - t) P_2 + t^3 P_3\]

Bezier Curves

The above family of curves are known as Bezier curves. The have the general form:

\[P(t) = \sum_{k = 0}^m B_k^m (t) P_k\]


Bernstein Polynomials

\[B_k^m(t) = \begin{pmatrix} m \\ k \end{pmatrix} t^k (1 - t)^{m - k}\]


\[\begin{pmatrix} m \\ k \end{pmatrix} = \frac{m!}{k! (m - k)!}\]


The tangent vector to the curve at parameter $t$ is given by

\[\begin{array}{rcl} \frac{d P(t)}{dt} & = & \sum_{k = 0}^m \frac{d B_k^m (t)}{dt} P_k \\ & = & m \sum_{k = 0}^{m - 1} B_k^{m - 1} (t) (P_{k + 1} - P_k) \end{array}\]



Extruded shapes are created by sweeping a 2D polygon along a line or curve.

We can extrude a polygon along a path by specifying it as a series of transformations:

\[\begin{array}{rcl} poly & = & P_0, P_1, \dots, P_k \\ path & = & M_0, M_1, \dots, M_n \end{array}\]

and at each point in the path, we calculate a cross section $poly_i = M_i P_0, M_i P_1, \dots, M_i P_k$.

Frenet Frame

We can get the curve values at various points $t_i$ and then build a polygon perpendicular to the curve at $C(t_i)$ using a Frenet frame.

We align the $k$ axis with the (normalised) tangent, and choose values of $i$ and $j$ to be perpendicular:

\[\begin{array}{rcl} \phi & = & C(t) \\ k & = & \hat{C^\prime}(t) \\ i & = & \begin{pmatrix}-k_2 \\ k_1 \\ 0\end{pmatrix} \\ j & = & k \times i \end{array}\]

To find the tangent ($k$) we can:


A surface with radial symmetry can be made by sweeping a half cross-section around an axis.


Textures are a way to add detail to models without requiring too many polygons. A texture is basically a function that maps texture coordinates to pixel values:

\[T(s, t) = \begin{pmatrix} r \\ g \\ b \\ a \end{pmatrix}\]

where $(s, t)$ is a texture coordinate.

Textures are most commonly represented by bitmaps. Pixels on a texture are called texels.

Textures & Shading

The simplest way to approach textures is to replace illumination calculations with a texture look up: $I(P) = T(s(P), t(P))$.

A more common solution is to use the texture to modulate the ambient and diffuse reflection coefficients: $I(P) = T(s, t)[I_{ambient} + I_{diffuse}] + I_{specular}$.


Normal bitmap textures have finite detail, and if we zoom in close we can see individual texels. To fix this we can:


A similar problem occurs when we zoom out, that is that multiple texels map to a single pixel. This is known as aliasing and occurs when samples are taken from an image at a lower resolution than repeating detail in the image.

MIP mapping

Mipmaps are precomputed low resolution versions of a texture. To use mipmaps, we simply choose the next smallest mipmap for the required resolution.


Trilinear filtering

Use bilinear filtering to compute pixel values based on the next highest and the next lowest mipmap resolutions, and interpolate between these values depending on the desired resolution.

Anisotropic Filtering

If a polygon is on an oblique angle away from the camera, then minification may occur much more strongly in one dimension than the other. Anisotropic filtering treats the two axes independently.

Reflection Mapping

Generating reflections is expensive, but we can make a reasonable approximation with textures:

Shadow Buffer

To take into account whether a light source is occluded by another object, we can keep a shadow buffer for each light source. It is similar to the depth buffer by recording the distance from the light source to the closest object in each direction.

Shadow rendering is usually done in multiple passes:



Light Mapping

If our light sources and large portions of the geometry are static then we can precompute the lighting equations and store the results in textures called light maps. This process is known as baked lighting.




Rasterisation is the process of converting lines and polygons represented by their vertices into fragments.

Bresenham’s Algorithm

Given two points $(x_0, y_0), (x_1, y_1)$, we can draw a line using only integer calculations:

int y = y_0;

int w = x_1 - x_0; 
int h = y_1 - y_0;
int F = 2 * h - w;

for (int x = x_0; x <= x_1; x++) {
  drawPixel(x, y);

  if (F < 0) {
    F += 2 * h;
  } else { 
    F += 2 * (h - w);

Polygon Filling

Shared Edges

Pixels on shared edges between polygons need to be drawn consistently regardless of the order the polygons are drawn, with no gaps. We adopt a rule that we do not draw rightmost or uppermost edge pixels.

Scanline Algorithm

// For every scanline
for (y = minY; y <= maxY; y++) {
  remove all edges that end at y
  for (Edge e : active) {
    e.x += e.inc;
  add all edges that start at y - keep list sorted by x
  for (int i = 0; i < active.size; i += 2) {
    fillPixels(active[i].x, active[i + 1].x, y);
Edge Table

The edge table is a lookup table indexed on the y-value of the lower vertex of the edge. Horizontal edges are not added. We store the the x-value of the lower vertex, the increment (inverse gradient i.e. $run / rise$) of the edge and the y-value of the upper vertex.

Active Edge List

We keep a list of active edges that overlap the current scanline. Edges are added to the list as we pass the bottom vertex. Edges are removed from the list as we pass the top vertex.

For each edge in the active edge list we store The x-value of its crossing with the current row (initially the bottom x-value), the increment (inverse gradient) and the y-value of the top vertex.


There are two basic approaches to eliminating aliasing:

Particle Systems

Particles are usually represented as small textured quads or point sprites (single vertices with an image attached). They are billboarded, i.e. transformed so that they are always face towards the camera.

Particles are created by an emitter object and evolve over time, usually changing their properties.


Particle systems are well suited to implementation as vertex shaders, where particles are represented as individual vertices.

Ray Tracing

In ray tracing, objects are modelled as implicit forms and each pixel is computed by casting a ray and seeing which models it intersects.

Global Lighting

In reality, the light falling on a surface comes from everywhere. Methods that take this lighting into account are called global lighting methods.

There are two main methods for global lighting:

Both methods are computationally expensive and are rarely suitable for real-time rendering.


for each pixel (x, y):
  v = P(x, y) - E
  hits = {}
  for each object obj in the scene:
    E' = M_inverse * E
    v' = M_inverse * v
    hits.add(obj.hit(E', v'))
  hit = h in hits with min time > 1
  if (hit is null):
    set (x, y) to background
    set (x, y) to hit.obj.color(R(hit.time))


At each hit point we cast a new ray towards each light source. These rays are called shadow feelers.


We can implement realistic reflections by casting further reflected rays.

Reflected rays can be reflected off many objects one after the other, and so we usually stop after a fixed number of reflections to avoid infinite recursion.


We can model transparent objects by casting a second ray that continues through the object.


The illumination equation is extended to include reflected and transmitted components, which are computed recursively:

\[I(P) = I_{ambient} + I_{diffuse} + I_{specular} + I(P_{reflection}) + I(P_{transparent}).\]


To handle transparency appropriately we need to take into account the refraction of light. Light bends as it moves from one medium to another. The change is described by Snell’s Law:

\[\frac{\sin \theta_1}{c_1} = \frac{\sin \theta_2}{c_2}\]

where $c_1, c_2$ are the speeds of light in each medium.

For maximum realism, we should calculate different paths for different colours, since different wavelengths of light move at different speeds.


Extents are bounding boxes or spheres which enclose an object. Testing for a hit against a box or sphere is fast, and if the test succeeds, then we can proceed to test against the actual object.

Binary Space Partitioning (BSP)

A BSP tree divides the world into cells, where each cell contains a small number of objects. When casting a ray, we only want to visit the leaves of the tree that the ray passes through.

Traversal Algorithm Pseudocode

visit(E, v, node): (E eye) 
  if (node is leaf):
    intersect ray with objs in leaf
    if (E on left):
      visit(E, v, left)
      other = right;
      visit(E, v, right)
      other = left
    if (ray crosses boundary):
      E' = intersect(E, v, boundary)
      visit(E', v, other)

Volumetric Ray Tracing

We can also apply ray tracing to volumetric objects like smoke or fog or fire. Such objects are transparent, but have different intensity and transparency throughout the volume.

We represent the volume as two functions:

\[\begin{array}{rcl} C(P) & = & \text{colour at point} \ P \\ \alpha(P) & = & \text{transparency at point} \ P \end{array}\]


We cast a ray from the camera through the volume and take samples at fixed intervals along the ray. We end up with $N + 1$ samples:

\[\begin{array}{rcl} P_i & = & R(t_{hit}, i \Delta t) \\ C_i & = & C(P_i) \\ \alpha_i & = & \alpha(P_i) \\ C_N & = & (r, g, b)_{background} \\ \alpha_N & = & 1 \end{array}\]
Alpha Compositing

We now combine these values into a single colour by applying the alpha-blending equation.

\[\begin{array}{ccl} C_N^N & = & C_N \\ \underbrace{C_N^i}_\text{Total colour at $i$} & = & \alpha_i \underbrace{C_i}_\text{Local colour at $i$} + (1 - \alpha_i) \underbrace{C_N^{i + 1}}_\text{Total colour at $i + 1$} \end{array}\]

We can write a closed formula for the colour from $a$ to $b$ as

\[C_b^a = \sum_{i = 1}^b \alpha_i C_i \prod_{j = a}^{i - 1} (1 - \alpha_j)\]


Colour Blending

We write $C = rR + gG + bB$ to indicate that colour $C$ is equivalent $r$ units of red, $g$ units of green and $b$ units of blue.

Complementary Colours

Colours that add to give white are called complementary colours.


Colour is expressed as the addition of red, green and blue components:

\[C(r, g, b) = rR + gG + bB.\]


CMY is a subtractive colour model, typically used in describing printed media. It uses cyan, magenta and yellow which are contrasting colours.


CMYK is similar to CMY, but has an additional component K, which stands for ‘key’ and refers to black ink.


Colour is expressed as three values:


Colour is expressed in terms of its:


The CIE standard, also known as the XYZ model, is a way of describing colours as a three dimensional vector $(x, y, z)$ with

\[\begin{array}{c} 0 \le x, y, z \le 1 \\ x + y + z = 1 \end{array}\]


Any output device has a certain range of colours it can represent, which we call its gamut. We can depict this as an area on the CIE chart.

Rendering Intents

There are four standard rendering intents which describe approaches to gamut mapping.

Absolute Colourmetric

Map $C$ to the nearest point within the gamut. This approach distorts hues and does not preserve relative saturation.

Relative Colourmetric

Desaturate $C$ until it lies in the gamut. This approach maintains hues more closely but does not preserve relative saturation.


Desaturate all colours until they all lie in the gamut. This approach maintains hues and preserves relative saturation but removes a lot of saturation.


Attempt to maintain saturated colours. This approach is not standardised.


Postprocessing is the process of adding visual affects after the scene has already been rendered.

Frame Buffer Objects

The frame buffer OpenGL gives us by default is very limited in most implementations.

We can create our own frame buffer objects with:

int[] fbos = new int[1];
gl.glGenFramebuffers(1, fbos, 0);

In order to use the frame buffer we have to attach to it at least a color buffer and optionally a depth or other buffers.

Gaussian Blur

Uses a Gaussian distribution to combine each pixel with its neighbours. The pixel itself is weighted most, and nearby pixels weighted depending on their distance from it.

Bloom Filter

Combines multiple post-processing steps:

  1. Extract bright regions
  2. Blur bright regions
  3. Draw original scene to screen
  4. Draw blurred brightness on top

High Dynamic Range

By using frame buffer objects we can render into buffers with more color detail. To render to the screen, this colour information has to be mapped to the conventional 8 bits per channel. This process is referred to as tone mapping.

Different tone mappings can do things like reduce overall brightness or enhance certain colours.

Ambient Occlusion

Approximates indirect lighting by identifying small holes, crevices and gaps and making them darker. For each pixel that has already been rendered, calculate an occlusion factor based on the depth of it and the depth of surrounding pixels.

Deferred Shading

In deferred shading we render the scene into multiple buffers containing basic information (e.g. normals, colour, depth). Actual lighting calculations can then be deferred to a post processing step.


A spline is a smooth piecewise-polynomial function. The places where the polynomials join are called knots. A joined sequence of Bezier curves is an example of a spline.

A spline provides local control in the fact that a control point only affects the curve within a limited neighbourhood.


We can generalise Bezier splines into a larger class called basis splines or B-splines. A B-spline of degree $M$ has equation:

\[P(t) = \sum_{k = 0}^L N_k^m (t) P_k\]

where $L$ is the number of control points, with $L > m$.

Uniform / Non-uniform

Rational Bezier Curves

We can create a greater variety of curve shapes if we weight the control points:

\[P(t) = \frac{\sum_{k = 0}^m w_k B_k^m (t) P_k}{\sum_{k = 0}^m w_k B_k^m (t)}\]

where a higher weight draws the curve closer to that point.

Rational Bezier curves can exactly represent all conic sections, which is not possible with normal Bezier curves.

Rational B-splines

We can also weight control points in B-splines to get rational B-splines:

\[P(t) = \frac{\sum_{k = 0}^L w_k N_k^m (t) P_k}{\sum_{k = 0}^L w_k N_k^m (t)}.\]