Written by Luka Kerr on August 12, 2019
Introduction
CPU & GPU
- CPU consists of a few cores optimized for sequential serial processing
- GPU has a massively parallel architecture consisting of smaller special purpose cores designed for parallel work
Single Instruction, Multiple Thread (SIMT)
Each processor in the GPU has multiple threads, executing in lock step for a piece of code.
It combines single instruction, multiple data (SIMD) with multithreading.
OpenGL
OpenGL is a low-level 2D/3D graphics API. It is:
- Free and open source
- Cross platform
- Highly optimised
- Designed to use GPUs
OpenGL is used to:
- Transfer data to the graphics memory
- Draw primitive shapes using data
UNSWgraph
Coordinate System
By default the viewport is centred at (0, 0). The left boundary is at x = -1, the right at x = 1, the bottom at y = -1, and the top at y = 1.
Geometry
Tessellation
We can draw polygons by splitting them up into simpler shapes (typically triangles). One simple method is to use a triangle fan.
The steps to draw a polygon using a triangle fan are as follows:
- Start with any vertex of the polygon and move clockwise or counter-clockwise around it
- The first three points form a triangle. Any new points after that form a triangle with the last point and the starting point
Tessellation works for all simple convex polygons, and some concave ones. A tessellated polygon can be drawn with GL_TRIANGLE_FAN.
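As a sketch of the fan ordering (assuming vertices are stored in order as { x, y } arrays rather than any particular UNSWgraph class), the triangles of a fan can be generated like this:
static java.util.List<double[][]> triangleFan(double[][] polygon) {
    java.util.List<double[][]> triangles = new java.util.ArrayList<>();
    for (int i = 1; i < polygon.length - 1; i++) {
        // Every triangle shares the starting vertex, as GL_TRIANGLE_FAN does
        triangles.add(new double[][] { polygon[0], polygon[i], polygon[i + 1] });
    }
    return triangles;
}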
2D Transformations
Model Transformation
A model transformation describes how a local coordinate system maps to the world coordinate system. Each object in a scene has its own local coordinate system.
Coordinate Frames
Each coordinate system has a coordinate frame. A coordinate frame has an origin, and the direction and scale of the x and y axes. Different transformations can be applied to coordinate frames including:
- translate(float x, float y): move the origin of the coordinate frame
- rotate(float degrees): rotate the coordinate frame
- scale(float x, float y): scale the coordinate frame by x and y amounts
Vectors & Matrices
The sum of a point and a vector is a point: $P + \vec{v} = Q$.
The difference between two points is a vector: $\vec{v} = Q - P$.
The sum/difference of two vectors is a vector: $(u_1, u_2)^T \pm (v_1, v_2)^T = (u_1 \pm v_1, u_2 \pm v_2)^T$.
Magnitude: $|\vec{v}| = \sqrt{v^2_1 + v^2_2 + \cdots + v^2_n}$.
Normalisation (direction): $\hat{\vec{v}} = \dfrac{\vec{v}}{|\vec{v}|}$, and $|\hat{\vec{v}}| = 1$.
Dot product: $\vec{u} \cdot \vec{v} = u_1 v_1 + u_2 v_2 + \cdots + u_n v_n$.
Cross product: $\vec{a} \times \vec{b} = (a_2 b_3 - a_3 b_2, a_3 b_1 - a_1 b_3, a_1 b_2 - a_2 b_1)^T$.
Cross product magnitude is the area of the parallelogram formed by: $|\vec{a} \times \vec{b}| = |a| |b| \sin \theta$.
Matrix multiplication: $A_{m \times n} \cdot B_{n \times p} = C_{m \times p}$ where $C_{ij} = (a_{i1} \cdot b_{1j}) + (a_{i2} \cdot b_{2j}) + \cdots + (a_{in} \cdot b_{nj})$.
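These operations are simple to implement directly; a minimal sketch for 3D vectors stored as double[] arrays (illustrative, not the UNSWgraph classes):
static double dot(double[] u, double[] v) {
    return u[0]*v[0] + u[1]*v[1] + u[2]*v[2];
}
static double magnitude(double[] v) {
    return Math.sqrt(dot(v, v));
}
// Normalisation: scale the vector to unit length
static double[] normalize(double[] v) {
    double len = magnitude(v);
    return new double[] { v[0]/len, v[1]/len, v[2]/len };
}
static double[] cross(double[] a, double[] b) {
    return new double[] {
        a[1]*b[2] - a[2]*b[1],
        a[2]*b[0] - a[0]*b[2],
        a[0]*b[1] - a[1]*b[0]
    };
}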
Scene Trees
Scene Tree
A scene tree is used to represent a complex scene. Each node in the scene tree represents an object, and each edge represents the transformation to get from the parent object’s coordinate system to the child’s.
Each node in a scene tree computes its own coordinate frame, draws itself within the frame, and passes the frame down to its children. As such, when a node is transformed, all of its children move with it.
def drawTree(frame):
    # Compute this node's frame from the parent's frame
    frame = frame.translate(...).rotate(...).scale(...)
    # Draw this object in its own frame
    draw(frame)
    # Pass the frame down to the children
    for child in children:
        child.drawTree(frame)
Scene Graph
A scene graph is a more general scene tree. It is a directed acyclic multi-graph.
- Each node can have multiple parents
- Multiple edges can go from parent to child
- Shared nodes are drawn multiple times
Camera
Transformation Pipeline
We transform in 2 stages
\[P_{camera} \overset{view}{\longleftarrow} P_{world} \overset{model}{\longleftarrow} P_{local}\]
The model transform transforms points in the local coordinate system to the world coordinate system.
The view transform transforms points in the world coordinate system to the camera’s coordinate system.
If \(P_{world} = Trans(Rot(Scale(P_{camera})))\) then the view transform is \(P_{camera} = Scale^{-1}(Rot^{-1}(Trans^{-1}(P_{world}))).\)
Coordinate Frames
A 2D coordinate frame is defined by
- An origin $\phi$
- 2 axis vectors $i$ ($x$-axis) and $j$ ($y$-axis)
A point on the coordinate frame can be defined as a displacement from the origin, i.e. $P = p_1 i + p_2 j + \phi$.
Affine Transformations
Transformations between coordinate frames can be represented as matrices
\[\begin{pmatrix} q_1 \\ q_2 \\ 1 \end{pmatrix} = \begin{pmatrix} i_1 & j_1 & \phi_1 \\ i_2 & j_2 & \phi_2 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} p_1 \\ p_2 \\ 1 \end{pmatrix}\]
Similarly for vectors
\[\begin{pmatrix} v_1 \\ v_2 \\ 0 \end{pmatrix} = \begin{pmatrix} i_1 & j_1 & \phi_1 \\ i_2 & j_2 & \phi_2 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} u_1 \\ u_2 \\ 0 \end{pmatrix}\]
All affine transformations can be expressed as combinations of four basic types:
- Translation
- Rotation
- Scale
- Shear
2D Translation
To translate the origin to a new point $\phi$
\[\begin{pmatrix} q_1 \\ q_2 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 & \phi_1 \\ 0 & 1 & \phi_2 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} p_1 \\ p_2 \\ 1 \end{pmatrix}\]
2D Rotation
To rotate a point about the origin by $\theta$
\[\begin{pmatrix} q_1 \\ q_2 \\ 1 \end{pmatrix} = \begin{pmatrix} cos(\theta) & -sin(\theta) & 0 \\ sin(\theta) & cos(\theta) & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} p_1 \\ p_2 \\ 1 \end{pmatrix}\]
Likewise to rotate a vector by $\theta$
\[\begin{pmatrix} v_1 \\ v_2 \\ 0 \end{pmatrix} = \begin{pmatrix} cos(\theta) & -sin(\theta) & 0 \\ sin(\theta) & cos(\theta) & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} u_1 \\ u_2 \\ 0 \end{pmatrix}\]
2D Scale
To scale a point by factors $(s_x, s_y)$ about the origin
\[\begin{pmatrix} s_x p_1 \\ s_y p_2 \\ 1 \end{pmatrix} = \begin{pmatrix} s_x & 0 & 0 \\ 0 & s_y & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} p_1 \\ p_2 \\ 1 \end{pmatrix}\]
Likewise to scale a vector by $(s_x, s_y)$
\[\begin{pmatrix} s_x v_1 \\ s_y v_2 \\ 0 \end{pmatrix} = \begin{pmatrix} s_x & 0 & 0 \\ 0 & s_y & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} v_1 \\ v_2 \\ 0 \end{pmatrix}\]
2D Shear
To horizontally shear a point by a factor $h$
\[\begin{pmatrix} q_1 \\ q_2 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 & h & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} p_1 \\ p_2 \\ 1 \end{pmatrix}\]
To vertically shear a point by a factor $v$
\[\begin{pmatrix} q_1 \\ q_2 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ v & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} p_1 \\ p_2 \\ 1 \end{pmatrix}\]
Decomposing Transformations
Every 2D affine transformation can be decomposed as $M = M_{translate} M_{rotate} M_{scale} M_{shear}$.
To decompose the transform, consider the matrix form
\[\begin{pmatrix} i_1 & j_1 & \phi_1 \\ i_2 & j_2 & \phi_2 \\ 0 & 0 & 1 \end{pmatrix}\]
Then, assuming uniform scaling and no shear, the components of $M$ give
\[\begin{array}{lcl} \text{translation} & = & (\phi_1, \phi_2) \\ \text{rotation angle} & = & atan2(i_2, i_1) \\ \text{scale} & = & |i| \end{array}\]
where $atan2(i_2, i_1)$ is $atan(i_2 / i_1)$ or $tan^{-1}(i_2 / i_1)$, adjusted for the case $i_1 = 0$: if $i_1 = 0$ and $i_2 \ne 0$ we get $90^{\circ}$ if $i_2 > 0$, or $-90^{\circ}$ if $i_2 < 0$.
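A minimal sketch of this decomposition, assuming the matrix is given by the column entries $i_1, i_2$ and $\phi_1, \phi_2$ used above (uniform scale, no shear):
static double[] decompose(double i1, double i2, double phi1, double phi2) {
    double angle = Math.toDegrees(Math.atan2(i2, i1)); // rotation; atan2 handles i1 == 0
    double scale = Math.hypot(i1, i2);                 // |i|
    return new double[] { phi1, phi2, angle, scale };  // translation, rotation, scale
}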
Inverse Transformations
Inverses can be calculated as follows
\[\begin{array}{lcl} M_{translate}^{-1}(d_x, d_y) & = & M_{translate}(-d_x, -d_y) \\ M_{rotate}^{-1}(\theta) & = & M_{rotate}(-\theta) \\ M_{scale}^{-1}(s_x, s_y) & = & M_{scale}(1 / s_x, 1 / s_y) \\ M_{shear}^{-1}(h) & = & M_{shear}(-h) \\ \end{array}\]
Linear Interpolation
\[\begin{array}{lcl} lerp(P, Q, t) & = & P + t(Q - P) \\ lerp(P, Q, t) & = & P(1 - t) + tQ \\ \end{array}\]
where
- $P$ and $Q$ are points
- $t \in [0, 1]$
Lines
Parametric Form
\[L(t) = P + t(Q - P)\]
Point Normal Form
\[\mathbf{n} \cdot (P - L) = 0\]
Line Intersection
Two lines
- $L_{AB} = A + (B - A)t$
- $L_{CD} = C + (D - C)u$
Solve the simultaneous equations $(B - A)t = (C - A) + (D - C)u$.
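A small sketch of solving that system using 2D cross products, assuming points are { x, y } arrays (illustrative representation); it returns null if the lines are parallel:
static double[] intersect(double[] a, double[] b, double[] c, double[] d) {
    double rx = b[0] - a[0], ry = b[1] - a[1];   // B - A
    double sx = d[0] - c[0], sy = d[1] - c[1];   // D - C
    double denom = rx * sy - ry * sx;            // 2D cross product (B - A) x (D - C)
    if (denom == 0) return null;                 // parallel (or coincident) lines
    double qx = c[0] - a[0], qy = c[1] - a[1];   // C - A
    double t = (qx * sy - qy * sx) / denom;
    double u = (qx * ry - qy * rx) / denom;
    return new double[] { t, u };
}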
Polygon Intersection
To check if a point $P$ is in a polygon, we can fire a horizontal ray from $P$ and check how many times it crosses the edges of the polygon. If $n$ is the number of crosses and $n$ is even then $P$ is not in the polygon, if $n$ is odd then $P$ is in the polygon.
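A sketch of this crossing (even-odd) test, again with { x, y } arrays for points:
static boolean inPolygon(double[] p, double[][] poly) {
    boolean inside = false;
    for (int i = 0, j = poly.length - 1; i < poly.length; j = i++) {
        double[] a = poly[i], b = poly[j];
        // Does edge a-b straddle the ray's y value?
        if ((a[1] > p[1]) != (b[1] > p[1])) {
            // x coordinate where the edge crosses the horizontal ray
            double xCross = a[0] + (p[1] - a[1]) / (b[1] - a[1]) * (b[0] - a[0]);
            if (xCross > p[0]) inside = !inside;  // odd number of crossings => inside
        }
    }
    return inside;
}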
Shaders
Shaders are programs executed on the GPU for the purpose of rendering graphics, written in glsl
.
Vertex Shaders
The GPU will execute the vertex shader for every vertex we supply it.
Variable types include
- in: inputs to the shader, different for each vertex
- uniform: inputs to the shader that are the same for every vertex
- out: outputs of the shader
- vec2, vec3, vec4: vectors of different sizes

Variables starting with gl_ have special meaning.
// Incoming vertex position
in vec2 position;
// Takes the input position and returns it as is
void main() {
gl_Position = vec4(position, 0, 1);
}
Fragment Shaders
The GPU will execute the fragment shader for every pixel it draws into the frame buffer.
out vec4 outputColor;
// Output black
void main() {
outputColor = vec4(0, 0, 0, 1);
}
3D
Moving to 3D is simply a matter of adding an extra dimension to our points and vectors
\[P = \begin{pmatrix} p_x \\ p_y \\ p_z \\ 1 \end{pmatrix}, \quad v = \begin{pmatrix} v_x \\ v_y \\ v_z \\ 0 \end{pmatrix}\]
3D Affine Transformations
3D affine transformations have the same structure as 2D but have an extra axis $k$
\[M = \begin{pmatrix} i_1 & j_1 & k_1 & \phi_1 \\ i_2 & j_2 & k_2 & \phi_2 \\ i_3 & j_3 & k_3 & \phi_3 \\ 0 & 0 & 0 & 1 \end{pmatrix}\]
3D Translation
To translate the origin to a new point $\phi$
\[\begin{pmatrix} q_1 \\ q_2 \\ q_3 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 & \phi_1 \\ 0 & 1 & 0 & \phi_2 \\ 0 & 0 & 1 & \phi_3 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} p_1 \\ p_2 \\ p_3 \\ 1 \end{pmatrix}\]
3D Rotation
The rotation matrix depends on the axis of rotation. We can decompose any rotation into a sequence of rotations about the $x$, $y$ and $z$ axes.
\[\begin{pmatrix} q_{x_1} \\ q_{x_2} \\ q_{x_3} \\ 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & cos(\theta) & -sin(\theta) & 0 \\ 0 & sin(\theta) & cos(\theta) & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} p_1 \\ p_2 \\ p_3 \\ 1 \end{pmatrix}\]
\[\begin{pmatrix} q_{y_1} \\ q_{y_2} \\ q_{y_3} \\ 1 \end{pmatrix} = \begin{pmatrix} cos(\theta) & 0 & sin(\theta) & 0 \\ 0 & 1 & 0 & 0 \\ -sin(\theta) & 0 & cos(\theta) & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} p_1 \\ p_2 \\ p_3 \\ 1 \end{pmatrix}\]
\[\begin{pmatrix} q_{z_1} \\ q_{z_2} \\ q_{z_3} \\ 1 \end{pmatrix} = \begin{pmatrix} cos(\theta) & -sin(\theta) & 0 & 0 \\ sin(\theta) & cos(\theta) & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} p_1 \\ p_2 \\ p_3 \\ 1 \end{pmatrix}\]
3D Scale
To scale a point by factors $(s_x, s_y, s_z)$ about the origin
\[\begin{pmatrix} s_x p_1 \\ s_y p_2 \\ s_z p_3 \\ 1 \end{pmatrix} = \begin{pmatrix} s_x & 0 & 0 & 0 \\ 0 & s_y & 0 & 0 \\ 0 & 0 & s_z & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} p_1 \\ p_2 \\ p_3 \\ 1 \end{pmatrix}\]
3D Shear
To horizontally shear a point by a factor $h$
\[\begin{pmatrix} q_1 \\ q_2 \\ q_3 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 & h & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} p_1 \\ p_2 \\ p_3 \\ 1 \end{pmatrix}\]
Depth
Winding Order
By default, triangles whose vertices are given in counter-clockwise order are treated as front-facing. Triangles with clockwise winding are treated as back-facing.
Back Face Culling
An optimisation called face culling allows non-visible triangles of closed surfaces (back faces) to be culled before rasterisation. This is based on the winding order of the triangle - front faces are CCW by default.
View Volume
A 3D camera shows a 3D view volume. This is the area of space that will be displayed in the viewport. Objects outside the view volume are clipped. The view volume is in camera coordinates.
Orthographic View Volume
The camera is located at the origin in camera co-ordinates and oriented down the negative z-axis.
Canonical View Volume
It is convenient for clipping if we scale all coordinates so that visible points lie within the range $(-1, 1)$. The origin is in the middle of the view volume and the z-axis is flipped.
Perspective Camera
We can define a different kind of projection that implements a perspective camera. The view volume is a frustum. A frustum consists of:
- A left, right, bottom and top
- A near plane, at the distance from the camera where the viewport sits (nothing closer is drawn)
- A far plane, at the maximum distance from the camera (nothing further away is drawn)
public Matrix4 frustum(float left, float right, float bottom,
float top, float near, float far) {
return new Matrix4(new float[] {
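// Entries are listed in column-major order (the usual OpenGL convention)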
2*near/(right-left), 0, 0, 0,
0, 2*near/(top-bottom), 0, 0,
(right+left)/(right-left), (top+bottom)/(top-bottom), -(far+near)/(far-near), -1,
0, 0, -2*far*near/(far-near), 0
});
}
It is sometimes easier to define a frustum with a field of view, aspect ratio and near and far planes:
public Matrix4 perspective(float fovy, float aspectRatio, float near, float far) {
float halfHeight = (float) (near * Math.tan(Math.toRadians(fovy) / 2));
return Matrix4.frustum(
-aspectRatio * halfHeight, aspectRatio * halfHeight,
-halfHeight, halfHeight, near, far
);
}
Assuming a symmetric view volume:
- $\tan(FOV / 2) = \dfrac{height}{2 \times near}$
- $width = height \times aspectRatio$
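As a quick worked check of these relations (hypothetical values): with $fovy = 60^{\circ}$, $aspectRatio \approx 1.78$ and $near = 1$, the perspective method above would call frustum with
\[\begin{array}{rcl} halfHeight & = & near \times \tan(FOV / 2) = \tan(30^{\circ}) \approx 0.577 \\ halfWidth & = & aspectRatio \times halfHeight \approx 1.78 \times 0.577 \approx 1.03 \end{array}\]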
The Transformation Pipeline
To transform a point $P = (p_x, p_y, p_z)^T$:
- Extend to homogeneous coordinates \(P = (p_x, p_y, p_z, 1)^T\)
- Multiply by model matrix to get world coordinates \(P_w = M_{model} P\)
- Multiply by view matrix to get camera coordinates \(P_c = M_{view} P_w\)
- Multiply by projection matrix to get CVV coordinates \(P_{cvv} = M_p P_c\)
- Clip to remove points outside CVV
- Perspective division to eliminate fourth component \(P_n = \frac{1}{p_w} P_{cvv}\)
- Viewport transformation to window coordinates \(P_v = M_{viewport} P_n\)
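Putting the steps together, a minimal sketch of the pipeline using plain column-major float[16] matrices (illustrative helpers, not the UNSWgraph Matrix4 API):
// Multiply a column-major 4x4 matrix by a homogeneous point
static float[] multiply(float[] m, float[] p) {
    float[] q = new float[4];
    for (int row = 0; row < 4; row++)
        for (int k = 0; k < 4; k++)
            q[row] += m[k * 4 + row] * p[k];
    return q;
}

// Transform a local point all the way to window coordinates
static float[] toWindow(float[] model, float[] view, float[] proj,
                        float[] p, int width, int height) {
    float[] pWorld = multiply(model, p);     // local -> world
    float[] pCam   = multiply(view, pWorld); // world -> camera
    float[] pCvv   = multiply(proj, pCam);   // camera -> CVV (clip space)
    float x = pCvv[0] / pCvv[3];             // perspective division
    float y = pCvv[1] / pCvv[3];
    // Viewport transform: map [-1, 1] to pixel coordinates
    return new float[] { (x + 1) / 2 * width, (y + 1) / 2 * height };
}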
Pseudodepth
Bilinear Interpolation
Given
- A point $P$ with $(x, y)$ values
- A point $Q_1$ with $(x_1, y_1, z_1)$, with $y_1 < y$
- A point $Q_2$ with $(x_2, y_2, z_2)$, with $y_2 > y$
- A point $Q_3$ with $(x_3, y_3, z_3)$, with $x_3 > x$
- A point $Q_4$ with $(x_4, y_4, z_4)$, with $x_4 < x$
then
\[\begin{array}{lcl} f(R_1) = \dfrac{y - y_1}{y_2 - y_1} f(Q_2) + \dfrac{y_2 - y}{y_2 - y_1} f(Q_1) \\ \\ f(R_2) = \dfrac{y - y_3}{y_4 - y_3} f(Q_4) + \dfrac{y_4 - y}{y_4 - y_3} f(Q_3) \\ \\ f(P) = \dfrac{x - x_1}{x_2 - x_1} f(R_2) + \dfrac{x_2 - x}{x_2 - x_1} f(R_1) \end{array}\]where $f(Q_n)$ gives the value $z_n$, or the depth, when computing pseudodepth.
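Independent of the exact notation above, a generic bilinear interpolation over an axis-aligned rectangle can be sketched as follows, where f00, f10, f01, f11 are the samples at the four corners (illustrative helper):
static double bilerp(double x, double y,
                     double x0, double x1, double y0, double y1,
                     double f00, double f10, double f01, double f11) {
    double tx = (x - x0) / (x1 - x0);
    double ty = (y - y0) / (y1 - y0);
    double bottom = (1 - tx) * f00 + tx * f10; // interpolate along the row y = y0
    double top    = (1 - tx) * f01 + tx * f11; // interpolate along the row y = y1
    return (1 - ty) * bottom + ty * top;       // interpolate between the two rows
}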
Clipping
Cohen-Sutherland Algorithm
- Used for line clipping on a rectangular clipping window
- Divides a 2D space into 9 regions and then determines the lines and portions of lines that are visible in the viewport
Cyrus Beck Algorithm
- Used with a convex polygon clipping window
- More efficient than Cohen-Sutherland
Meshes
3D objects can be represented as polygonal meshes. Usually polygonal meshes are made up of a collection of triangles as they are easier to work with, although they require more memory.
Mesh Data Structure
It is common to represent a polygon mesh in terms of lists:
- Vertex list: all the vertices used in the mesh
- Face list: each face’s vertices as indices into the vertex list
For example, given a cube we can construct a triangle mesh as follows
vertex_list = {
{ -1, -1, 1 }, // Vertex 0, with (x, y, z) coordinates
{ 1, -1, 1 }, // Vertex 1, with (x, y, z) coordinates
{ 1, 1, 1 }, // ...
{ -1 , 1, 1 },
{ -1, -1, -1 },
{ 1, -1, -1 },
{ 1, 1, -1 },
{ -1, 1, -1 }
};
face_list = {
{ 0, 1, 2 }, // Triangle face 0, made up of vertices 0, 1, 2
{ 2, 3, 0 }, // Triangle face 1, made up of vertices 2, 3, 0
{ 1, 5, 6 }, // ...
{ 6, 2, 1 },
{ 5, 4, 7 },
{ 7, 6, 5 },
{ 4, 0, 3 },
{ 3, 7, 4 },
{ 3, 2, 6 },
{ 6, 7, 3 },
{ 4, 5, 1 },
{ 1, 0, 4 }
}
.ply Format
The PLY format is a simple file format for representing polygon data.
The header specifies how many vertices and how many faces there are and what format they’re in. The body lists all the vertices and faces.
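A minimal hypothetical ASCII .ply file describing a single triangle might look like:
ply
format ascii 1.0
element vertex 3
property float x
property float y
property float z
element face 1
property list uchar int vertex_index
end_header
0 0 0
1 0 0
0 1 0
3 0 1 2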
Lighting
Illumination
The colour of an object in a scene depends on
- The colour and amount of light that falls on it
- The colour and reflectivity of the object
There are two kinds of reflection we need to deal with, diffuse and specular.
Dull or matte surfaces exhibit diffuse reflection. Light falling on the surface is reflected uniformly in all directions. It does not depend on the viewpoint.
Polished surfaces exhibit specular reflection. Light falling on the surface is reflected at the same angle. Reflections will look different from different view points.
Components
Most objects will have both a diffuse and a specular component to their illumination. We will also include an ambient component to cover lighting from indirect sources.
We will build a lighting equation $I(P) = I_{ambient}(P) + I_{diffuse}(P) + I_{specular}(P)$, where $I(P)$ is the amount of light coming from $P$ to the camera.
To calculate the lighting equation we need to know three important vectors
- The normal vector $\mathbf{m}$ to the surface $P$
- The view vector $\mathbf{v}$ from $P$ to the camera
- The source vector $\mathbf{s}$ from $P$ to the light source
All three of these vectors need to be normalised.
If we have multiple lights, we add the ambient, diffuse and specular components from all of them:
\[I(P) = \sum_{l \in lights} I_{ambient}^l(P) + I_{diffuse}^l(P) + I_{specular}^l(P)\]
Diffuse Illumination
Diffuse scattering is equal in all directions so does not depend on the viewing angle. The amount of reflected light depends on the angle of the source of the light.
This can be formalised as:
\[I_{diffuse} = max(0, I_{si} \rho_d(\mathbf{s} \cdot \mathbf{m}))\]
where
- $I_{si}$ is the source intensity
- $\rho_d$ is the diffuse reflection coefficient in $[0, 1]$
Specular Reflection
Phong Model
The Phong model is an approximate model that computes the reflection vector $\mathbf{r}$ and uses it to calculate the specular component.
\[\begin{array}{rrcl} & \mathbf{r} & = & normalise(-\mathbf{s} + 2(\mathbf{s} \cdot \mathbf{m}) \mathbf{m}) \\ \therefore & I_{specular} & = & max(0, I_{si} \rho_{sp} (\mathbf{r} \cdot \mathbf{v})^f) \end{array}\]where
- $I_{si}$ is the source intensity
- $\rho_{sp}$ is the specular reflection coefficient in the range $[0, 1]$
- $f$ is the phong exponent, typically in the range $[1, 128]$
Blinn Phong Model
The Blinn Phong model uses a vector halfway between the source and the viewer instead of calculating the reflection vector.
\[\begin{array}{rrcl} & \mathbf{h} & = & normalise(\dfrac{\mathbf{s} + \mathbf{v}}{2}) \\ \therefore & I_{specular} & = & max(0, I_{si} \rho_{sp} (\mathbf{h} \cdot \mathbf{m})^f) \end{array}\]where
- $I_{si}$ is the source intensity
- $\rho_{sp}$ is the specular reflection coefficient in the range $[0, 1]$
- $f$ is the phong exponent, typically in the range $[1, 128]$
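A minimal sketch of evaluating the lighting equation at a single point with the Blinn Phong specular term (illustrative helper names, vectors as double[] arrays, s, v and m assumed normalised):
static double dot3(double[] a, double[] b) {
    return a[0]*b[0] + a[1]*b[1] + a[2]*b[2];
}
static double[] normalize3(double[] v) {
    double len = Math.sqrt(dot3(v, v));
    return new double[] { v[0]/len, v[1]/len, v[2]/len };
}
// s = source vector, v = view vector, m = surface normal at P
static double illumination(double[] s, double[] v, double[] m,
                           double srcI, double ambI,
                           double rhoA, double rhoD, double rhoSp, double f) {
    double[] h = normalize3(new double[] { s[0]+v[0], s[1]+v[1], s[2]+v[2] }); // halfway vector
    double ambient  = ambI * rhoA;
    double diffuse  = Math.max(0, srcI * rhoD * dot3(s, m));
    double specular = Math.max(0, srcI * rhoSp * Math.pow(dot3(h, m), f));
    return ambient + diffuse + specular;
}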
Ambient Light
Lighting with just diffuse and specular lights gives very stark shadows, whereas in reality shadows are not completely black as light is coming from all directions.
\[I_{ambient} = I_a \rho_a\]where
- $I_a$ is the ambient light intensity
- $\rho_a$ is the ambient reflection coefficient in the range $(0, 1)$
Lights
Point Lights
Point lights have a point in the world from which we compute the source vector.
Directional Lights
Directional lights are so far away that the source vector is effectively the same everywhere.
Spotlights
A spotlight has a direction, a cutoff angle and an attenuation factor. The brightness can be calculated as:
\[I = I_{si} (\cos(\theta))^\epsilon\]where
- $I_{si}$ is the source intensity
- $\theta$ is the angle
- $\epsilon$ is the attenuation factor
Face Normals
Every vertex for a given face will be given the same normal.
To calculate the normal we can use either the cross product method or Newell’s method.
Cross Product Method
Works for any convex polygon, but vertices must be in counter-clockwise order.
Pick two non-parallel adjacent edges with vertices $P_{i - 1}, P_{i + 1}$ and calculate for point $P_i$:
\[\mathbf{n} = (P_{i + 1} - P_i) \times (P_{i - 1} - P_i).\]
private Vector3 normal(Point3D p1, Point3D p2, Point3D p3) {
Vector3 a = p2.minus(p1);
Vector3 b = p3.minus(p1);
return a.cross(b);
}
Newell’s Method
A more robust approach to computing face normals for arbitrary polygons:
\[\begin{array}{lcl} n_x & = & \sum_{i = 0}^{N - 1} (y_i - y_{i + 1})(z_i + z_{i + 1}) \\ n_y & = & \sum_{i = 0}^{N - 1} (z_i - z_{i + 1})(x_i + x_{i + 1}) \\ n_z & = & \sum_{i = 0}^{N - 1} (x_i - x_{i + 1})(y_i + y_{i + 1}) \\ \end{array}\]where $(x_N, y_N, z_N) = (x_0, y_0, z_0)$.
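A sketch of Newell's method, assuming a Point3D class with getX/getY/getZ accessors (the accessor names are an assumption, not necessarily the UNSWgraph API):
static double[] newellNormal(java.util.List<Point3D> verts) {
    double nx = 0, ny = 0, nz = 0;
    int n = verts.size();
    for (int i = 0; i < n; i++) {
        Point3D cur = verts.get(i);
        Point3D next = verts.get((i + 1) % n); // wraps so that vertex N is vertex 0
        nx += (cur.getY() - next.getY()) * (cur.getZ() + next.getZ());
        ny += (cur.getZ() - next.getZ()) * (cur.getX() + next.getX());
        nz += (cur.getX() - next.getX()) * (cur.getY() + next.getY());
    }
    return new double[] { nx, ny, nz };
}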
Vertex Normals
We keep a separate buffer of normals for a mesh, and calculate it by a weighted average of the face normals.
private void vertexNormals() {
// Initialise the normals to the zero vector
for (int i = 0; i < normals.capacity(); i++) {
normals.put(i, 0, 0, 0);
}
// Add the face normals of all surrounding faces.
for (int i = 0; i < indices.capacity() / 3; i++) {
int index1 = indices.get(i*3);
int index2 = indices.get(i*3 + 1);
int index3 = indices.get(i*3 + 2);
Point3D p1 = vertices.get(index1);
Point3D p2 = vertices.get(index2);
Point3D p3 = vertices.get(index3);
Vector3 normal = normal(p1, p2, p3);
normals.put(index1, normals.get(index1).translate(normal));
normals.put(index2, normals.get(index2).translate(normal));
normals.put(index3, normals.get(index3).translate(normal));
}
}
Shading
Flat Shading
Flat shading simply shades the entire face the same colour. It does this by computing $I(P)$ for some $P$ on the face (usually the first vertex) and sets every pixel on the face that colour.
Flat shading is good for
- Diffuse illumination
- Flat surfaces with distant light sources
- Non-realistic/retro rendering
Flat shading is bad for
- Close light sources
- Specular shading
- Curved surfaces
Gouraud Shading
The lighting equation $I(P)$ is calculated for each vertex $P$ using an associated vertex normal.
We calculate fragment colours by bilinear interpolation on neighbouring vertices:
\[\begin{array}{lcl} colour(R_1) & = & lerp(V_1, V_2, \frac{y - y_1}{y_2 - y_1}) \\ colour(R_2) & = & lerp(V_3, V_4, \frac{y - y_3}{y_4 - y_3}) \\ colour(P) & = & lerp(R_1, R_2, \frac{x - x_1}{x_2 - x_1}) \\ \end{array}\]where
- $V_1, V_2, V_3, V_4$ are four vertices surrounding $P$
- $R_1, R_2$ are the interpolated points on either side of $P$
Gouraud shading is good for
- Curved surfaces
- Close light sources
- Diffuse shading
Gouraud shading is bad for
- Specular shading
Phong Shading
The lighting equation is calculated per fragment rather than per vertex.
void main() {
vec3 m_unit = normalize(m);
// Compute the s, v and r vectors
vec3 s = normalize(view_matrix * vec4(lightPos, 1) - viewPosition).xyz;
vec3 v = normalize(-viewPosition.xyz);
vec3 r = normalize(reflect(-s, m_unit));
vec3 ambient = ambientIntensity * ambientCoeff;
vec3 diffuse = max(lightIntensity * diffuseCoeff * dot(m_unit, s), 0.0);
vec3 specular = vec3(0);
// Only show specular reflections for the front face
if (dot(m_unit, s) > 0) {
specular = max(lightIntensity * specularCoeff * pow(dot(r, v), phongExp), 0.0);
}
vec3 intensity = ambient + diffuse + specular;
outputColor = vec4(intensity, 1) * input_color;
}
Phong shading is good for
- Specular shading
- Diffuse shading
Phong shading is bad for
- Slow hardware
Colour
We implement colour by having separate red, green and blue components for light intensities $I(P), I_{ambient}(P), I_{specular}(P)$ and reflection coefficients $\rho_a, \rho_d, \rho_{sp}$. The lighting equation is then applied three times, once for each colour.
Transparency
Alpha Blending
Given an $\alpha$ value and two colours, we can linearly interpolate and compute new $r^\prime, g^\prime, b^\prime, a^\prime$ values based on $\alpha$:
\[\begin{pmatrix} r^\prime \\ g^\prime \\ b^\prime \\ a^\prime \end{pmatrix} = \alpha \begin{pmatrix} r_{image} \\ g_{image} \\ b_{image} \\ a_{image} \end{pmatrix} + (1 - \alpha) \begin{pmatrix} r \\ g \\ b \\ a \end{pmatrix}\]
Alpha blending has some problems, including
- Depends on the order that pixels are drawn
- Need to draw transparent polygons after the polygons behind them
- If using the depth buffer and you try to draw a transparent polygon before the objects behind it, the later objects will not be drawn
In summary, transparent polygons should be drawn in back-to-front order after opaque ones.
Curves
It is generally useful to express curves in parametric form:
\[\begin{pmatrix} p_1 \\ p_2 \\ p_3 \end{pmatrix} = P(t), t \in [0, 1]\]
Linear Interpolation
Linear interpolation is good for straight lines. It is of degree 1 and involves 2 control points $P_0, P_1$.
\[P(t) = (1 - t)P_0 + t P_1\]
Quadratic Interpolation
Quadratic interpolation is good for parabolas. It is of degree 2 and involves 3 control points $P_0, P_1, P_2$.
\[P(t) = (1 - t)^2 P_0 + 2t(1 - t) P_1 + t^2 P_2\]
Cubic Interpolation
Cubic interpolation is good for a variety of curves. It is of degree 3 and involves 4 control points $P_0, P_1, P_2, P_3$.
\[P(t) = (1 - t)^3 P_0 + 3t(1 - t)^2 P_1 + 3t^2(1 - t) P_2 + t^3 P_3\]
Bezier Curves
The above family of curves are known as Bezier curves. They have the general form:
\[P(t) = \sum_{k = 0}^m B_k^m (t) P_k\]where
- $m$ is the degree of the curve
- $P_0 \dots P_m$ are the control points
- $B_k^m$ is a Bernstein polynomial
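For example, a cubic Bezier curve ($m = 3$) can be evaluated directly from the degree-3 formula above; a small sketch with 2D control points stored as { x, y } arrays (illustrative representation):
static double[] cubicBezier(double[] p0, double[] p1, double[] p2, double[] p3, double t) {
    double u = 1 - t;
    // Degree-3 Bernstein weights
    double b0 = u * u * u, b1 = 3 * t * u * u, b2 = 3 * t * t * u, b3 = t * t * t;
    return new double[] {
        b0 * p0[0] + b1 * p1[0] + b2 * p2[0] + b3 * p3[0],
        b0 * p0[1] + b1 * p1[1] + b2 * p2[1] + b3 * p3[1]
    };
}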
Bernstein Polynomials
\[B_k^m(t) = \begin{pmatrix} m \\ k \end{pmatrix} t^k (1 - t)^{m - k}\]where
\[\begin{pmatrix} m \\ k \end{pmatrix} = \frac{m!}{k! (m - k)!}\]
Tangents
The tangent vector to the curve at parameter $t$ is given by
\[\begin{array}{rcl} \frac{d P(t)}{dt} & = & \sum_{k = 0}^m \frac{d B_k^m (t)}{dt} P_k \\ & = & m \sum_{k = 0}^{m - 1} B_k^{m - 1} (t) (P_{k + 1} - P_k) \end{array}\]
Modelling
Extrusion
Extruded shapes are created by sweeping a 2D polygon along a line or curve.
We can extrude a polygon along a path by specifying it as a series of transformations:
\[\begin{array}{rcl} poly & = & P_0, P_1, \dots, P_k \\ path & = & M_0, M_1, \dots, M_n \end{array}\]and at each point in the path, we calculate a cross section $poly_i = M_i P_0, M_i P_1, \dots, M_i P_k$.
Frenet Frame
We can get the curve values at various points $t_i$ and then build a polygon perpendicular to the curve at $C(t_i)$ using a Frenet frame.
We align the $k$ axis with the (normalised) tangent, and choose values of $i$ and $j$ to be perpendicular:
\[\begin{array}{rcl} \phi & = & C(t) \\ k & = & \hat{C^\prime}(t) \\ i & = & \begin{pmatrix}-k_2 \\ k_1 \\ 0\end{pmatrix} \\ j & = & k \times i \end{array}\]
To find the tangent ($k$) we can:
- Use maths \(\begin{array}{rcl} C(t) & = & (\cos(t), \sin(t), bt) \\ T(t) & = & normalise(-\sin(t), \cos(t), b) \end{array}\)
- Approximate \(T(t) = normalise(C(t + 1) - C(t - 1))\)
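A sketch of building such a frame from a curve function using the finite-difference tangent (illustrative types, not the UNSWgraph API):
interface Curve3 { double[] value(double t); }   // C(t) -> { x, y, z }

static double[] normalizeV(double[] v) {
    double len = Math.sqrt(v[0]*v[0] + v[1]*v[1] + v[2]*v[2]);
    return new double[] { v[0]/len, v[1]/len, v[2]/len };
}
static double[] crossV(double[] a, double[] b) {
    return new double[] { a[1]*b[2] - a[2]*b[1],
                          a[2]*b[0] - a[0]*b[2],
                          a[0]*b[1] - a[1]*b[0] };
}

// Returns { i, j, k, phi }: the frame axes and origin at C(t)
static double[][] frenetFrame(Curve3 c, double t, double dt) {
    double[] phi = c.value(t);
    double[] a = c.value(t + dt), b = c.value(t - dt);
    double[] k = normalizeV(new double[] { a[0]-b[0], a[1]-b[1], a[2]-b[2] }); // tangent
    double[] i = normalizeV(new double[] { -k[1], k[0], 0 });                  // perpendicular to k
    double[] j = crossV(k, i);                                                 // completes the frame
    return new double[][] { i, j, k, phi };
}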
Revolution
A surface with radial symmetry can be made by sweeping a half cross-section around an axis.
Textures
Textures are a way to add detail to models without requiring too many polygons. A texture is basically a function that maps texture coordinates to pixel values:
\[T(s, t) = \begin{pmatrix} r \\ g \\ b \\ a \end{pmatrix}\]where $(s, t)$ is a texture coordinate.
Textures are most commonly represented by bitmaps. Pixels on a texture are called texels.
Textures & Shading
The simplest way to approach textures is to replace illumination calculations with a texture look up: $I(P) = T(s(P), t(P))$.
A more common solution is to use the texture to modulate the ambient and diffuse reflection coefficients: $I(P) = T(s, t)[I_{ambient} + I_{diffuse}] + I_{specular}$.
Magnification
Normal bitmap textures have finite detail, and if we zoom in close we can see individual texels. To fix this we can:
- Choose the nearest texel
- Bilinear filter between the nearest four texels
Minification
A similar problem occurs when we zoom out, that is that multiple texels map to a single pixel. This is known as aliasing and occurs when samples are taken from an image at a lower resolution than repeating detail in the image.
MIP mapping
Mipmaps are precomputed low resolution versions of a texture. To use mipmaps, we simply choose the next smallest mipmap for the required resolution.
Filtering
Trilinear filtering
Use bilinear filtering to compute pixel values based on the next highest and the next lowest mipmap resolutions, and interpolate between these values depending on the desired resolution.
Anisotropic Filtering
If a polygon is on an oblique angle away from the camera, then minification may occur much more strongly in one dimension than the other. Anisotropic filtering treats the two axes independently.
Reflection Mapping
Generating reflections is expensive, but we can make a reasonable approximation with textures:
- Generate a cube that encloses the reflective object
- Place a camera at the centre of the cube and render the outside world onto the faces of the cube
- Use this image to texture the object
Shadow Buffer
To take into account whether a light source is occluded by another object, we can keep a shadow buffer for each light source. It is similar to the depth buffer: it records the distance from the light source to the closest object in each direction.
Shadow rendering is usually done in multiple passes:
- Render the scene from each light’s viewpoint capturing only $z$-info in shadow buffer
- Render the scene from camera’s point of view, using the previously captured shadow buffers to modulate the fragments
Pros:
- Provides realistic shadows
- No knowledge or processing of the scene geometry is required
Cons:
- More computation
- Shadow quality is limited by precision of shadow buffer
- Shadow edges are hard to compute
Light Mapping
If our light sources and large portions of the geometry are static then we can precompute the lighting equations and store the results in textures called light maps. This process is known as baked lighting.
Pros:
- Sophisticated lighting effects can be computed at compile time, where speed is less of an issue
Cons:
- Memory and loading times for many individual light maps
- Not suitable for dynamic lights or moving objects
- Potential aliasing effects
Rasterisation
Rasterisation is the process of converting lines and polygons represented by their vertices into fragments.
Bresenham’s Algorithm
Given two points $(x_0, y_0), (x_1, y_1)$ with $x_0 \le x_1$ and a gradient between 0 and 1 (other octants follow by symmetry), we can draw a line using only integer calculations:
int y = y_0;
int w = x_1 - x_0;
int h = y_1 - y_0;
int F = 2 * h - w;
for (int x = x_0; x <= x_1; x++) {
drawPixel(x, y);
if (F < 0) {
F += 2 * h;
} else {
F += 2 * (h - w);
y++;
}
}
Polygon Filling
Shared Edges
Pixels on shared edges between polygons need to be drawn consistently regardless of the order the polygons are drawn, with no gaps. We adopt a rule that we do not draw rightmost or uppermost edge pixels.
Scanline Algorithm
// For every scanline
for (y = minY; y <= maxY; y++) {
remove all edges that end at y
for (Edge e : active) {
e.x += e.inc;
}
add all edges that start at y - keep list sorted by x
for (int i = 0; i < active.size; i += 2) {
fillPixels(active[i].x, active[i + 1].x, y);
}
}
Edge Table
The edge table is a lookup table indexed on the y-value of the lower vertex of the edge. Horizontal edges are not added. We store the x-value of the lower vertex, the increment (inverse gradient, i.e. $run / rise$) of the edge and the y-value of the upper vertex.
Active Edge List
We keep a list of active edges that overlap the current scanline. Edges are added to the list as we pass the bottom vertex. Edges are removed from the list as we pass the top vertex.
For each edge in the active edge list we store the x-value of its crossing with the current row (initially the bottom x-value), the increment (inverse gradient) and the y-value of the top vertex.
Antialiasing
There are two basic approaches to eliminating aliasing:
- Prefiltering: compute the amount occupied by a pixel and set its value to that percentage
- Postfiltering: taking samples at a higher resolution and then averaging
Particle Systems
Particles are usually represented as small textured quads or point sprites (single vertices with an image attached). They are billboarded, i.e. transformed so that they always face towards the camera.
Particles are created by an emitter object and evolve over time, usually changing their properties.
GPU
Particle systems are well suited to implementation as vertex shaders, where particles are represented as individual vertices.
Ray Tracing
In ray tracing, objects are modelled as implicit forms and each pixel is computed by casting a ray and seeing which models it intersects.
Global Lighting
In reality, the light falling on a surface comes from everywhere. Methods that take this lighting into account are called global lighting methods.
There are two main methods for global lighting:
- Raytracing models specular reflection and refraction
- Radiosity models diffuse reflection
Both methods are computationally expensive and are rarely suitable for real-time rendering.
Pseudocode
for each pixel (x, y):
v = P(x, y) - E
hits = {}
for each object obj in the scene:
E' = M_inverse * E
v' = M_inverse * v
hits.add(obj.hit(E', v'))
hit = h in hits with min time > 1
if (hit is null):
set (x, y) to background
else:
set (x, y) to hit.obj.color(R(hit.time))
Shadows
At each hit point we cast a new ray towards each light source. These rays are called shadow feelers.
Reflections
We can implement realistic reflections by casting further reflected rays.
Reflected rays can be reflected off many objects one after the other, and so we usually stop after a fixed number of reflections to avoid infinite recursion.
Transparency
We can model transparent objects by casting a second ray that continues through the object.
Illumination
The illumination equation is extended to include reflected and transmitted components, which are computed recursively:
\[I(P) = I_{ambient} + I_{diffuse} + I_{specular} + I(P_{reflection}) + I(P_{transparent}).\]
Refraction
To handle transparency appropriately we need to take into account the refraction of light. Light bends as it moves from one medium to another. The change is described by Snell’s Law:
\[\frac{\sin \theta_1}{c_1} = \frac{\sin \theta_2}{c_2}\]where $c_1, c_2$ are the speeds of light in each medium.
For maximum realism, we should calculate different paths for different colours, since different wavelengths of light move at different speeds.
Extents
Extents are bounding boxes or spheres which enclose an object. Testing for a hit against a box or sphere is fast, and if the test succeeds, then we can proceed to test against the actual object.
Binary Space Partitioning (BSP)
A BSP tree divides the world into cells, where each cell contains a small number of objects. When casting a ray, we only want to visit the leaves of the tree that the ray passes through.
Traversal Algorithm Pseudocode
visit(E, v, node):   # E is the eye point, v the ray direction
if (node is leaf):
intersect ray with objs in leaf
else:
if (E on left):
visit(E, v, left)
other = right;
else:
visit(E, v, right)
other = left
endif
if (ray crosses boundary):
E' = intersect(E, v, boundary)
visit(E', v, other)
endif
endif
Volumetric Ray Tracing
We can also apply ray tracing to volumetric objects like smoke or fog or fire. Such objects are transparent, but have different intensity and transparency throughout the volume.
We represent the volume as two functions:
\[\begin{array}{rcl} C(P) & = & \text{colour at point} \ P \\ \alpha(P) & = & \text{transparency at point} \ P \end{array}\]
Sampling
We cast a ray from the camera through the volume and take samples at fixed intervals along the ray. We end up with $N + 1$ samples:
\[\begin{array}{rcl} P_i & = & R(t_{hit} + i \Delta t) \\ C_i & = & C(P_i) \\ \alpha_i & = & \alpha(P_i) \\ C_N & = & (r, g, b)_{background} \\ \alpha_N & = & 1 \end{array}\]
Alpha Compositing
We now combine these values into a single colour by applying the alpha-blending equation.
\[\begin{array}{ccl} C_N^N & = & C_N \\ \underbrace{C_N^i}_\text{Total colour at $i$} & = & \alpha_i \underbrace{C_i}_\text{Local colour at $i$} + (1 - \alpha_i) \underbrace{C_N^{i + 1}}_\text{Total colour at $i + 1$} \end{array}\]We can write a closed formula for the colour from $a$ to $b$ as
\[C_b^a = \sum_{i = a}^b \alpha_i C_i \prod_{j = a}^{i - 1} (1 - \alpha_j)\]
Colour
Colour Blending
We write $C = rR + gG + bB$ to indicate that colour $C$ is equivalent to $r$ units of red, $g$ units of green and $b$ units of blue.
Complementary Colours
Colours that add to give white are called complementary colours.
RGB
Colour is expressed as the addition of red, green and blue components:
\[C(r, g, b) = rR + gG + bB.\]
CMY
CMY is a subtractive colour model, typically used in describing printed media. It uses cyan, magenta and yellow, which are the complements of red, green and blue.
CMYK
CMYK is similar to CMY, but has an additional component K, which stands for ‘key’ and refers to black ink.
HSV
Colour is expressed as three values:
- $H$: the hue as an angle from $0^\circ$ (red) to $360^\circ$ (red)
- $S$: the saturation from 0 (grey) to 1 (full colour)
- $V$: the brightness from 0 (black) to 1 (bright colour)
HLS
Colour is expressed in terms of its:
- Hue: the colour of the dominant wavelength
- Luminance: the total power of the light
- Saturation: the purity of the light
CIE
The CIE standard, also known as the XYZ model, is a way of describing colours as a three dimensional vector $(x, y, z)$ with
\[\begin{array}{c} 0 \le x, y, z \le 1 \\ x + y + z = 1 \end{array}\]
Gamut
Any output device has a certain range of colours it can represent, which we call its gamut. We can depict this as an area on the CIE chart.
Rendering Intents
There are four standard rendering intents which describe approaches to gamut mapping.
Absolute Colorimetric
Map $C$ to the nearest point within the gamut. This approach distorts hues and does not preserve relative saturation.
Relative Colorimetric
Desaturate $C$ until it lies in the gamut. This approach maintains hues more closely but does not preserve relative saturation.
Perceptual
Desaturate all colours until they all lie in the gamut. This approach maintains hues and preserves relative saturation but removes a lot of saturation.
Saturation
Attempt to maintain saturated colours. This approach is not standardised.
Postprocessing
Postprocessing is the process of adding visual effects after the scene has already been rendered.
Frame Buffer Objects
The frame buffer OpenGL gives us by default is very limited in most implementations.
We can create our own frame buffer objects with:
int[] fbos = new int[1];
gl.glGenFramebuffers(1, fbos, 0);
In order to use the frame buffer we have to attach at least a colour buffer to it, and optionally a depth or other buffers.
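A rough sketch of attaching a colour texture to the new frame buffer, using the same JOGL-style gl object as above (width, height and the filtering parameters are illustrative):
int[] tex = new int[1];
gl.glGenTextures(1, tex, 0);
gl.glBindTexture(GL.GL_TEXTURE_2D, tex[0]);
// Allocate an empty RGBA image to render into
gl.glTexImage2D(GL.GL_TEXTURE_2D, 0, GL.GL_RGBA, width, height, 0,
                GL.GL_RGBA, GL.GL_UNSIGNED_BYTE, null);
gl.glTexParameteri(GL.GL_TEXTURE_2D, GL.GL_TEXTURE_MIN_FILTER, GL.GL_LINEAR);

gl.glBindFramebuffer(GL.GL_FRAMEBUFFER, fbos[0]);
gl.glFramebufferTexture2D(GL.GL_FRAMEBUFFER, GL.GL_COLOR_ATTACHMENT0,
                          GL.GL_TEXTURE_2D, tex[0], 0);
// Rendering now goes into the texture until frame buffer 0 is bound again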
Gaussian Blur
Uses a Gaussian distribution to combine each pixel with its neighbours. The pixel itself is weighted most, and nearby pixels weighted depending on their distance from it.
Bloom Filter
Combines multiple post-processing steps:
- Extract bright regions
- Blur bright regions
- Draw original scene to screen
- Draw blurred brightness on top
High Dynamic Range
By using frame buffer objects we can render into buffers with more color detail. To render to the screen, this colour information has to be mapped to the conventional 8 bits per channel. This process is referred to as tone mapping.
Different tone mappings can do things like reduce overall brightness or enhance certain colours.
Ambient Occlusion
Approximates indirect lighting by identifying small holes, crevices and gaps and making them darker. For each pixel that has already been rendered, calculate an occlusion factor based on the depth of it and the depth of surrounding pixels.
Deferred Shading
In deferred shading we render the scene into multiple buffers containing basic information (e.g. normals, colour, depth). Actual lighting calculations can then be deferred to a post processing step.
Splines
A spline is a smooth piecewise-polynomial function. The places where the polynomials join are called knots. A joined sequence of Bezier curves is an example of a spline.
A spline provides local control, in that a control point only affects the curve within a limited neighbourhood.
B-splines
We can generalise Bezier splines into a larger class called basis splines or B-splines. A B-spline of degree $M$ has equation:
\[P(t) = \sum_{k = 0}^L N_k^m (t) P_k\]where $L$ is the number of control points, with $L > m$.
Uniform / Non-uniform
- Uniform B-splines have equally spaced knots
- Non-uniform B-splines allow knots to be positioned arbitrarily and even repeat
- A multiple knot is a knot value that is repeated several times
Rational Bezier Curves
We can create a greater variety of curve shapes if we weight the control points:
\[P(t) = \frac{\sum_{k = 0}^m w_k B_k^m (t) P_k}{\sum_{k = 0}^m w_k B_k^m (t)}\]where a higher weight draws the curve closer to that point.
Rational Bezier curves can exactly represent all conic sections, which is not possible with normal Bezier curves.
Rational B-splines
We can also weight control points in B-splines to get rational B-splines:
\[P(t) = \frac{\sum_{k = 0}^L w_k N_k^m (t) P_k}{\sum_{k = 0}^L w_k N_k^m (t)}.\]