Graphics Programming
Cameras: Perspective Projection

Thorsten Thormählen
November 15, 2024
Part 6, Chapter 1

This is the print version of the slides.


Notation

Type	Font	Examples
Variables (scalars)	italics	$a, b, x, y$
Functions	upright	$\mathrm{f}, \mathrm{g}(x), \mathrm{max}(x)$
Vectors	bold, elements row-wise	$\mathbf{a}, \mathbf{b}= \begin{pmatrix}x\\y\end{pmatrix} = (x, y)^\top,$ $\mathbf{B}=(x, y, z)^\top$
Matrices	typewriter	$\mathtt{A}, \mathtt{B}= \begin{bmatrix}a & b\\c & d\end{bmatrix}$
Sets	calligraphic	$\mathcal{A}, \mathcal{B}=\{a, b\}, b \in \mathcal{B}$
Number systems, Coordinate spaces	double-struck	$\mathbb{N}, \mathbb{Z}, \mathbb{R}^2, \mathbb{R}^3$

Cameras

To display a three-dimensional scene as a two-dimensional image, the mapping must be described mathematically
The mapping of 3D objects onto a 2D image plane is called projection
Computer graphics uses different camera models to describe this projection

Camera Models

Effects caused by lens reflections in the camera

These camera models are often idealized and only approximate the effects that occur when we observe the world with our eyes or with a real camera
This introductory lecture covers only the geometric projections that arise when straight lines are used as projection rays

Source: Alan Weir, Creative Commons License

Camera Models

Camera models can be classified into perspective projection and parallel projection

[Figure: two panels, perspective projection and parallel projection]

Perspective Projection

Pinhole Camera

Perspective projection is very familiar to us as human beings, because our eyes produce such a projection
An important property of perspective projection, in contrast to parallel projection, is that objects farther away from the viewer or camera appear smaller
The simplest perspective projection is a mapping that uses the pinhole camera model

Pinhole Camera

A pinhole camera consists of a camera body with a very small hole through which the light can enter
The image is formed at the back of the camera body and is displayed upside-down

A larger hole has the advantage that more light can enter the camera, resulting in shorter exposure times
The disadvantage is that multiple projections overlap and the image is out of focus

[Figure: pinhole camera with the labels object, camera, pinhole, and image of the object; second panel with a larger pinhole]

Pinhole Camera

In computer graphics, an idealized camera model with an infinitely small hole is usually used
This camera model cannot simulate defocus, that is, all objects are displayed perfectly sharp
Furthermore, the image is assumed to form on an imaginary image plane in front of the center of projection, so that it is no longer upside-down

[Figure: idealized pinhole camera with the labels object, camera, pinhole, image of the object, center of projection, and image plane]

Perspective Projection

[Figure: perspective projection of a point $\mathbf{P}$ to the point $\tilde{\mathbf{P}}$ on the image plane, which lies at the focal length $f$ from the center of projection; axes $x$, $y$, $z$; in the $x$-$z$ view the projected $x$-coordinate is $f \frac{p_x}{p_z}$]

The formula for mapping a 3D point $\mathbf{P}=(p_x,p_y,p_z)^\top$ to a point $\tilde{\mathbf{P}}= (\tilde{p}_x,\tilde{p}_y,\tilde{p}_z)^\top$ on the image plane of the camera is given by:
$\tilde{\mathbf{P}}= \left( f \frac{p_x}{p_z}, f \frac{p_y}{p_z}, f \right)^\top$
This follows immediately from the figure by application of the intercept theorem, since
$\frac{\tilde{p}_x}{f} = \frac{p_x}{p_z}$ and $\frac{\tilde{p}_y}{f} = \frac{p_y}{p_z}$
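
The mapping can be sketched in a few lines of C++. The names `Vec3` and `projectPinhole` are assumptions made for this illustration and are not part of the lecture code; the sketch uses the convention of this slide (center of projection at the origin, camera looking along the positive $z$-axis).

```cpp
#include <cassert>

// Illustrative 3D point type (an assumption for this sketch).
struct Vec3 { double x, y, z; };

// Projects P onto the image plane z = f of an ideal pinhole camera
// located at the origin and looking along the positive z-axis.
Vec3 projectPinhole(const Vec3& p, double f) {
  assert(p.z != 0.0);  // points with p_z = 0 have no image
  return { f * p.x / p.z, f * p.y / p.z, f };
}
```

For $f=1$, the point $(2, 4, 4)^\top$ is mapped to $(0.5, 1, 1)^\top$. Doubling its depth to $(2, 4, 8)^\top$ halves the image coordinates to $(0.25, 0.5, 1)^\top$, which is the "distant objects appear smaller" property of the perspective projection.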

Perspective Projection

Using homogeneous coordinates the perspective projection can be written as a linear mapping using a $4 \times 4$ matrix:
$\tilde{\mathbf{P}}= \begin{pmatrix} \tilde{p}_x \\ \tilde{p}_y \\ \tilde{p}_z \end{pmatrix}= \begin{pmatrix} f \frac{p_x}{p_z}\\ f \frac{p_y}{p_z}\\ f \end{pmatrix} \in \mathbb{R}^3 \longmapsto \underline{\tilde{\mathbf{P}}}= \begin{pmatrix}f \, p_x \\f \, p_y \\ f \, p_z\\ p_z\end{pmatrix} \in \mathbb{H}^3$

$\begin{align}\underline{\tilde{\mathbf{P}}} & = \begin{pmatrix}f \, p_x \\f \, p_y \\ f \, p_z\\ p_z\end{pmatrix} = \underbrace{\begin{bmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & f & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}}_{\mathtt{A}} \begin{pmatrix}p_x \\p_y \\ p_z\\ 1\end{pmatrix}\\ \underline{\tilde{\mathbf{P}}} &=\mathtt{A}\, \underline{\mathbf{P}} \end{align}$

Perspective Projection in OpenGL

[Figure: camera coordinate system with axes $x$, $y$, $z$ and the image plane at distance $f$]

In OpenGL, the camera is pointing in the negative $z$-direction. Therefore, we have:
$\begin{align}\underline{\tilde{\mathbf{P}}} & = \begin{pmatrix}f \, p_x \\f \, p_y \\ f \, p_z\\ -p_z\end{pmatrix} = \underbrace{\begin{bmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & f & 0 \\ 0 & 0 & -1 & 0 \end{bmatrix}}_{\mathtt{A}} \begin{pmatrix}p_x \\p_y \\ p_z\\ 1\end{pmatrix}\\ \underline{\tilde{\mathbf{P}}} &=\mathtt{A}\, \underline{\mathbf{P}} \end{align}$

Perspective Projection in OpenGL

[Figure: view of the $x$-$z$ plane with the near-plane at $-z_n$ and the far-plane at $-z_f$, which bound the displayed area]

OpenGL uses a near and a far clipping plane
The near-plane and far-plane are parallel to the image plane
Points are displayed only if their $z$-coordinate lies within the range defined by the near- and far-plane
To this end, a new linear mapping is defined such that points on the near-plane satisfy:

$p_z=-z_n \quad \mapsto \quad \tilde{p}_z=-1$

and for points on the far-plane:

$p_z=-z_f \quad \mapsto \quad \tilde{p}_z=1$
In order to accomplish this, two new parameters $\alpha$ and $\beta$ are added to the linear transformation matrix

Perspective Projection in OpenGL

$\underline{\tilde{\mathbf{P}}} = \begin{pmatrix}f \, p_x \\f \, p_y \\ \alpha \, p_z + \beta \\ -p_z\end{pmatrix} = \begin{bmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & \alpha & \beta \\ 0 & 0 & -1 & 0 \end{bmatrix} \begin{pmatrix}p_x \\p_y \\ p_z\\ 1\end{pmatrix} \in \mathbb{H}^3$

Thus, the projected point in Cartesian coordinates is:
$\tilde{\mathbf{P}}=(\tilde{p}_x,\tilde{p}_y,\tilde{p}_z)^\top = \left( f \frac{p_x}{-p_z}, f \frac{p_y}{-p_z}, -\alpha \, + \frac{\beta}{-p_z} \right)^\top \in \mathbb{R}^3$
Now $\alpha$ and $\beta$ can be determined from the conditions for the mapping of the $z$-coordinate:
$\begin{align}p_z=-z_n \,&\mapsto \, \tilde{p}_z=-1 \quad \Rightarrow -\alpha \, + \frac{\beta}{z_n} = -1 \\ p_z=-z_f \, &\mapsto \, \tilde{p}_z=\ 1 \,\,\, \,\quad \Rightarrow -\alpha \, + \frac{\beta}{z_f} = 1\end{align}$

Perspective Projection in OpenGL

Solving the equation system for $\alpha$ and $\beta$ provides:
$\begin{align} \alpha &= \frac{z_f+z_n}{z_n-z_f}\\ \beta & = \frac{2 z_f \, z_n}{z_n-z_f}\end{align}$
Thus, for the new projection matrix we have:
$\begin{align} \underline{\tilde{\mathbf{P}}} & = \underbrace{\begin{bmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & \frac{z_f+z_n}{z_n-z_f} & \frac{2 z_f \, z_n}{z_n-z_f} \\ 0 & 0 & -1 & 0 \end{bmatrix}}_{\mathtt{A}} \begin{pmatrix}p_x \\p_y \\ p_z\\ 1\end{pmatrix}\\ \underline{\tilde{\mathbf{P}}} &=\mathtt{A}\, \underline{\mathbf{P}} \end{align}$
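
The result can be checked numerically. The following is a minimal sketch; the names `DepthParams`, `depthParams`, and `normalizedDepth` are assumptions introduced for this illustration.

```cpp
#include <cassert>
#include <cmath>

// Depth-mapping parameters of the projection matrix (illustrative type).
struct DepthParams { double alpha, beta; };

// zn and zf are the positive distances of the near- and far-plane.
DepthParams depthParams(double zn, double zf) {
  assert(zn > 0.0 && zf > zn);
  return { (zf + zn) / (zn - zf), 2.0 * zf * zn / (zn - zf) };
}

// Depth after perspective division for a camera-space z-coordinate pz (pz < 0):
// the third matrix row yields alpha*pz + beta, the fourth row yields w = -pz.
double normalizedDepth(double pz, const DepthParams& d) {
  return (d.alpha * pz + d.beta) / (-pz);
}
```

With $z_n = 2$ and $z_f = 20$, the point $p_z=-2$ is mapped to $-1$ and $p_z=-20$ to $1$. The mapping is not linear in $p_z$: the midpoint $p_z=-11$ is mapped to $9/11 \approx 0.82$ rather than to $0$.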

Perspective Projection in OpenGL

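[Figure: view of the $y$-$z$ plane with two configurations that share the opening angle $\Theta$: focal length A with image height A and focal length B with image height B; the value $-1$ is marked on the $y$-axis]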

Until now we have only defined the focal length $f$ as the distance between the image plane and the camera center, but nothing was stated about the size of the image plane in $x$- and $y$-direction
In the end, only the ratio between the size of the image plane and the focal length is important, which is uniquely defined by the opening angle $\Theta$. All configurations with the same opening angle result in the same image (only with scaled $x$- and $y$-coordinates).
In OpenGL, the size of the image plane is always chosen such that the resulting $x$- and $y$-coordinates are in the range $[-1; 1]$.
For a given opening angle the focal length is therefore obtained by (compare figure):
$\frac{f}{1} = \frac{\cos( 0.5 \, \Theta)}{\sin( 0.5 \, \Theta)} \Leftrightarrow f = \mathrm{cotan}( 0.5 \, \Theta)$
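
Two worked values may help to get a feeling for this relation (the angles are chosen here only for illustration):
$\Theta = 90^\circ \; \Rightarrow \; f = \mathrm{cotan}(45^\circ) = 1, \qquad \Theta = 60^\circ \; \Rightarrow \; f = \mathrm{cotan}(30^\circ) = \sqrt{3} \approx 1.732$
A smaller opening angle therefore corresponds to a larger focal length (a "zoomed-in" view).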

Transformation Matrices in OpenGL

For the projection from the camera coordinate system into the image plane the GL_PROJECTION matrix is used.
The manipulation of this matrix is activated by
```
glMatrixMode(GL_PROJECTION);
```
All matrix manipulation functions, such as glLoadIdentity, glLoadMatrix, glMultMatrix, glRotate, glScale, glTranslate, glPushMatrix, glPopMatrix, and gluPerspective, then operate on the GL_PROJECTION matrix.
The state of the GL_PROJECTION matrix affects an object only at the moment the object is drawn (OpenGL is a state machine); later changes to the matrix do not affect primitives that were already drawn

Perspective Projection in OpenGL

Creating a perspective projection matrix in OpenGL:

[Figure: the matrix $\mathtt{A}$ as a mapping between two coordinate systems, each with axes $x$, $y$, $z$]

glMatrixMode(GL_PROJECTION);
glLoadIdentity();
gluPerspective(fovy, aspect, near, far);

$ \mathtt{A} = \begin{bmatrix} \frac{f}{\mathrm{aspect}} & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & \frac{\mathrm{far}+\mathrm{near}}{\mathrm{near}-\mathrm{far}} & \frac{2 \ast \mathrm{far} \ast \mathrm{near}}{\mathrm{near}-\mathrm{far}} \\ 0 & 0 & -1 & 0 \end{bmatrix}$

with $f = \mathrm{cotan}( 0.5 \ast \mathrm{fovy})$

and $\mathrm{aspect}= \mathrm{w} / \mathrm{h}$
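
The matrix above can also be assembled by hand, which is useful for checking values. The following is a minimal sketch and not the GLU implementation; the names `Mat4` and `perspectiveMatrix` are assumptions, and the matrix is stored row-major here for readability, whereas OpenGL stores matrices column-major.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Row-major 4x4 matrix, indexed as m[row][col] (illustrative layout).
using Mat4 = std::array<std::array<double, 4>, 4>;

// Builds the matrix A from the slide; fovyDeg is the opening angle
// in y-direction in degrees, zNear and zFar are positive distances.
Mat4 perspectiveMatrix(double fovyDeg, double aspect, double zNear, double zFar) {
  assert(aspect > 0.0 && zNear > 0.0 && zFar > zNear);
  const double pi = std::acos(-1.0);
  const double f = 1.0 / std::tan(0.5 * fovyDeg * pi / 180.0);  // cotan(fovy/2)
  Mat4 m{};  // all elements zero
  m[0][0] = f / aspect;
  m[1][1] = f;
  m[2][2] = (zFar + zNear) / (zNear - zFar);
  m[2][3] = 2.0 * zFar * zNear / (zNear - zFar);
  m[3][2] = -1.0;
  return m;
}
```

For example, `perspectiveMatrix(90.0, 2.0, 1.0, 3.0)` yields the diagonal entries $0.5$, $1$, $-2$, the entry $-3$ in the third row of the last column, and $-1$ in the last row.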

Example: "Dolly Zoom" or "Vertigo Effect"

The idea of the "Dolly Zoom" effect is to compensate for a camera translation in $z$-direction ("Dolly") by a change in focal length ("Zoom")
The projection equation shows that this compensation is possible, because for the $y$-coordinate of a projected point we have:
$\tilde{p}_y = f \frac{p_y}{-p_z}$
Since there is only one focal length $f$ but typically many 3D points with different depth values $p_z$ in the scene, the compensation can only be achieved for one selected depth value. Points at all other depths change their projected size, which creates an interesting perspective effect.
A well-known example is the movie Vertigo (1958) by Alfred Hitchcock, who used this effect to convey the dizziness of the protagonist
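
The required focal length follows directly from the projection equation. As a short sketch, let $d$ and $d'$ (symbols introduced here for illustration) denote the distance $-p_z$ of the selected depth plane before and after the camera movement. Requiring the same projected $y$-coordinate gives:
$f \, \frac{p_y}{d} = f' \, \frac{p_y}{d'} \quad \Rightarrow \quad f' = f \, \frac{d'}{d}$
If the camera moves back such that the distance to the selected plane doubles, the focal length must double as well. In the example code of this chapter, the selected plane is the front face of the cube, which gives the ratio `(d0*t-1) / (d0-1)` used in `dollyZoomFovy()`.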

Example: "Dolly Zoom" or "Vertigo Effect"

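[Figure: dolly zoom in the $y$-$z$ plane. At time $t=t_0=1$ the camera has the focal length $f$ and the opening angle $\Theta$; at time $t > 1$ it has the focal length $f'$ and the opening angle $\Theta'$. The camera is at distance $3\,t$ from the object (e.g. unit cube); the points $\mathbf{P}$ and $\mathbf{P}'$ are marked, as is the value $-1$ on the $y$-axis]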

Example: Dolly Zoom in OpenGL

Source code of the example with GLUT: DollyZoom.cpp
Source code of the example with Qt: DollyZoom.cpp
Source code of the example with Java: DollyZoom.java

Example: Dolly Zoom in OpenGL

class Renderer {

public:
  float t; //time
  const float d0; // initial distance

public:
  Renderer() : t(1.0), d0(3.0), width(0), height(0) {}

public:
  void display() {
    glClearColor(0.0f, 0.0f, 0.0f, 0.0f);
    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);

    glMatrixMode(GL_PROJECTION);
    glLoadIdentity();
    gluPerspective (dollyZoomFovy(), 
                    (float)width/(float)height, 
                    0.1, 50.0);

    glMatrixMode(GL_MODELVIEW);
    glLoadIdentity();
    // move the camera back by t*d0 units (3 units at t = 1)
    glTranslatef(0.0f, 0.0f, -t*d0);

    // draw a cube in the local coordinate system
    drawCube();
    // draw random lines
    drawRandomLines();
  }

  void init() {
    glEnable(GL_DEPTH_TEST);

    // create random values between -1.0 and 1.0
    // (the loop counter must not share its name with the variable r below)
    for(unsigned i=0; i < 1000; i++) {
      int r = rand();
      randVals.push_back(2.0f*float(r)/float(RAND_MAX)-1.0f);
    }
  }

  void resize(int w, int h) {
    // ignore this for now
    glViewport(0, 0, w, h);
    width = w;
    height = h;
  }

  float dollyZoomFovy() {
    float fovyInit = 60.0f; // initial field of view
    float theta = fovyInit / 180.0f * M_PI; // degree to rad
    float f = 1.0f / tan(theta/2.0f);
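    // scale f by the ratio of the current to the initial distance of the
    // cube's front face, so that its projected size stays constant
    float fNew = f * (d0*t-1) / (d0-1);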
    float thetaNew = atan(1.0f / fNew) * 2.0f;
    float val = 180.0 * thetaNew / M_PI; //rad to degree
    return val;
  }

private:
  int width;
  int height;
  std::vector<float> randVals;

private:
  void drawCube() {

    glColor3f(1.0f, 1.0f, 1.0f);
    glLineWidth(3.0f);
    glBegin(GL_LINE_LOOP);
    glVertex3f(-1.0f, 1.0f, 1.0f);
    glVertex3f( 1.0f, 1.0f, 1.0f);
    glVertex3f( 1.0f,-1.0f, 1.0f);
    glVertex3f(-1.0f,-1.0f, 1.0f);
    glEnd();
    glBegin(GL_LINE_LOOP);
    glVertex3f(-1.0f, 1.0f,-1.0f);
    glVertex3f( 1.0f, 1.0f,-1.0f);
    glVertex3f( 1.0f,-1.0f,-1.0f);
    glVertex3f(-1.0f,-1.0f,-1.0f);
    glEnd();

    glBegin(GL_LINE_LOOP);
    glVertex3f( 1.0f, 1.0f,-1.0f);
    glVertex3f( 1.0f, 1.0f, 1.0f);
    glVertex3f( 1.0f,-1.0f, 1.0f);
    glVertex3f( 1.0f,-1.0f,-1.0f);
    glEnd();

    glBegin(GL_LINE_LOOP);
    glVertex3f(-1.0f, 1.0f,-1.0f);
    glVertex3f(-1.0f, 1.0f, 1.0f);
    glVertex3f(-1.0f,-1.0f, 1.0f);
    glVertex3f(-1.0f,-1.0f,-1.0f);
    glEnd();
    glLineWidth(1.0);
  }

  void drawRandomLines() {
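    if(randVals.size() % 5) return;
    unsigned i = 0;
    while(i < randVals.size()) {
      // read the values one at a time: the evaluation order of
      // function arguments is not fixed in C++
      float red   = fabs(randVals[i++]);
      float green = fabs(randVals[i++]);
      float blue  = fabs(randVals[i++]);
      glColor3f(red, green, blue);
      float x = randVals[i++];
      float y = randVals[i++];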
      glBegin(GL_LINES);
      glVertex3f(x, y, -1.0f);
      glVertex3f(x, y,  1.0f);
      glEnd();
    }
  }
};

Transformation of the Camera

[Figure: world coordinate system, local coordinate system, and camera coordinate system, related by the transformations $\mathtt{T}_{\mathrm{\small cam}}$ and $\mathtt{T}_{\mathrm{\small obj}}$]

Until now, it was assumed that the projection center of the camera is located at the origin of the global world coordinate system
If a transformation $\mathtt{T}_{\mathrm{\small cam}}$ is applied to the camera, the projection of a point $\mathbf{P}$ defined in a local object coordinate system is given by:
$\underline{\tilde{\mathbf{P}}} = \mathtt{A} \, \mathtt{T}_{\mathrm{\small cam}}^{-1} \, \mathtt{T}_{\mathrm{\small obj}} \, \underline{\mathbf{P}}$

Transformation of the Camera

Mapping equation for homogeneous points:

$\underline{\tilde{\mathbf{P}}} = \mathtt{A} \, \mathtt{T}_{\mathrm{\small cam}}^{-1} \, \mathtt{T}_{\mathrm{\small obj}} \, \underline{\mathbf{P}}$

where the $4 \times 4$ matrix

$\mathtt{T}_{\mathrm{\small obj}}$ describes the transformation from the local coordinate system to the world coordinate system
$\mathtt{T}_{\mathrm{\small cam}}^{-1}$ describes the transformation from the world coordinate system to the camera coordinate system
$\mathtt{A}$ describes the transformation from the camera coordinate system into the image plane

Transformation of the Camera

[Figure: world coordinate system with basis vectors $\mathbf{e}_x$, $\mathbf{e}_y$, $\mathbf{e}_z$; camera coordinate system with origin $\mathbf{C}_a$ and basis vectors $\tilde{\mathbf{a}}_x$, $\tilde{\mathbf{a}}_y$, $\tilde{\mathbf{a}}_z$; local coordinate system with origin $\mathbf{C}_b$ and basis vectors $\tilde{\mathbf{b}}_x$, $\tilde{\mathbf{b}}_y$, $\tilde{\mathbf{b}}_z$]

The transformation matrices are given by the basis vectors of the coordinate systems (as discussed in the previous chapters)
The transformation matrix $\mathtt{T}_{\mathrm{\small obj}}$ transforms a point from the local to the world coordinate system
$ \mathtt{T}_{\mathrm{\small obj}} = \begin{bmatrix}\tilde{\mathbf{b}}_x & \tilde{\mathbf{b}}_y & \tilde{\mathbf{b}}_z & \mathbf{C}_b\\0 & 0 & 0 & 1\end{bmatrix}$
The transformation matrix $\mathtt{T}_{\mathrm{\small cam}}$ transforms a point from the camera to the world coordinate system
$\begin{align} \mathtt{T}_{\mathrm{\small cam}} & = \begin{bmatrix}\tilde{\mathbf{a}}_x & \tilde{\mathbf{a}}_y & \tilde{\mathbf{a}}_z & \mathbf{C}_a\\0 & 0 & 0 & 1\end{bmatrix} \\ & = \begin{bmatrix} \mathtt{R}_a & \mathbf{C}_a\\ \mathbf{0}^\top & 1\end{bmatrix} \end{align}$

Transformation of the Camera

For the inverse transformation $\mathtt{T}_{\mathrm{\small cam}}^{-1}$ from the world into the camera coordinate system we have (with $ \mathtt{R}_a^{-1}= \mathtt{R}_a^\top$):
$\mathtt{T}_{\mathrm{\small cam}}^{-1} = \begin{bmatrix} \mathtt{R}_a & \mathbf{C}_a\\ \mathbf{0}^\top & 1\end{bmatrix}^{-1} = \begin{bmatrix} \mathtt{R}_a^{\top} & -\mathtt{R}_a^{\top} \mathbf{C}_a\\ \mathbf{0}^\top & 1\end{bmatrix} $
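
That this matrix is indeed the inverse can be verified by multiplying it with $\mathtt{T}_{\mathrm{\small cam}}$ (a short check, using $\mathtt{R}_a^\top \mathtt{R}_a = \mathtt{I}$):
$\begin{bmatrix} \mathtt{R}_a^{\top} & -\mathtt{R}_a^{\top} \mathbf{C}_a\\ \mathbf{0}^\top & 1\end{bmatrix} \begin{bmatrix} \mathtt{R}_a & \mathbf{C}_a\\ \mathbf{0}^\top & 1\end{bmatrix} = \begin{bmatrix} \mathtt{R}_a^{\top}\mathtt{R}_a & \mathtt{R}_a^{\top}\mathbf{C}_a -\mathtt{R}_a^{\top} \mathbf{C}_a\\ \mathbf{0}^\top & 1\end{bmatrix} = \begin{bmatrix} \mathtt{I} & \mathbf{0}\\ \mathbf{0}^\top & 1\end{bmatrix}$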

Transformation of the Camera in OpenGL

Mapping equation for homogeneous points:

$\underline{\tilde{\mathbf{P}}} = \mathtt{A} \, \underbrace{\mathtt{T}_{\mathrm{\small cam}}^{-1} \, \mathtt{T}_{\mathrm{\small obj}}}_{\mathtt{T}_{\mathrm{\small modelview}}} \, \underline{\mathbf{P}}$

In OpenGL, all transformations, except for the projection matrix $\mathtt{A}$, are combined into a so-called GL_MODELVIEW matrix
Thus, the GL_MODELVIEW matrix directly describes the transformation from the respective local coordinate system to the camera coordinate system
The GL_PROJECTION matrix $\mathtt{A}$ describes the mapping from the camera coordinate system into the image plane

gluLookAt

To simplify the definition of the matrix $\mathtt{T}_{\mathrm{\small cam}}^{-1}$ there is the GLU function
```
gluLookAt(eyex, eyey, eyez, refx, refy, refz, upx, upy, upz);
```
By setting up an eye point $\mathbf{C}_{\mathrm{\small eye}}$, a targeted reference point $\mathbf{P}_{\mathrm{\small ref}}$, and a vector $\mathbf{v}_{\mathrm{\small up}}$ (which defines the direction in which the $y$-coordinate of the camera is pointing) the basis vectors of the camera coordinate system can be computed:

[Figure: camera with the eye point $\mathbf{C}_{\mathrm{\small eye}}$, the reference point $\mathbf{P}_{\mathrm{\small ref}}$, the up vector $\mathbf{v}_{\mathrm{\small up}}$, and the resulting basis vectors $\tilde{\mathbf{a}}_x$, $\tilde{\mathbf{a}}_y$, $\tilde{\mathbf{a}}_z$]

$\begin{align} \mathbf{d} & = \mathbf{C}_{\mathrm{\small eye}} - \mathbf{P}_{\mathrm{\small ref}}\\ \tilde{\mathbf{a}}_z &= \frac{\mathbf{d}}{|\mathbf{d}|}, \quad \mathbf{v}' = \frac{\mathbf{v}_{\mathrm{\small up}}}{|\mathbf{v}_{\mathrm{\small up}}|} \\ \tilde{\mathbf{a}}_x &= \frac{\mathbf{v}'\times \tilde{\mathbf{a}}_z}{|\mathbf{v}'\times \tilde{\mathbf{a}}_z|} \\ \tilde{\mathbf{a}}_y &= \tilde{\mathbf{a}}_z \times \tilde{\mathbf{a}}_x\\ \mathtt{R}_{a} & = \begin{bmatrix}\tilde{\mathbf{a}}_x & \tilde{\mathbf{a}}_y & \tilde{\mathbf{a}}_z \end{bmatrix} \end{align}$

This results in:
$\mathtt{T}_{\mathrm{\small cam}}^{-1} = \begin{bmatrix} \mathtt{R}_a^{\top} & -\mathtt{R}_a^{\top} \mathbf{C}_{\mathrm{\small eye}}\\ \mathbf{0}^\top & 1\end{bmatrix}$
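
The construction can be sketched in plain C++ without OpenGL. This is an illustration under stated assumptions and not the GLU source code: the names `Vec3`, `LookAt`, `lookAt`, and `worldToCamera` are hypothetical. The sketch normalizes the cross product for $\tilde{\mathbf{a}}_x$, which is necessary whenever the up vector is not perpendicular to the viewing direction.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec3 = std::array<double, 3>;

Vec3 sub(const Vec3& a, const Vec3& b) { return {a[0]-b[0], a[1]-b[1], a[2]-b[2]}; }
double dot(const Vec3& a, const Vec3& b) { return a[0]*b[0] + a[1]*b[1] + a[2]*b[2]; }
Vec3 cross(const Vec3& a, const Vec3& b) {
  return {a[1]*b[2]-a[2]*b[1], a[2]*b[0]-a[0]*b[2], a[0]*b[1]-a[1]*b[0]};
}
Vec3 normalize(const Vec3& a) {
  const double len = std::sqrt(dot(a, a));
  assert(len > 0.0);  // fails if up is parallel to the viewing direction
  return {a[0]/len, a[1]/len, a[2]/len};
}

// Camera basis vectors (the columns of R_a) and the translation -R_a^T * C_eye.
struct LookAt { Vec3 ax, ay, az, t; };

LookAt lookAt(const Vec3& eye, const Vec3& ref, const Vec3& up) {
  LookAt r;
  r.az = normalize(sub(eye, ref));    // the camera looks along -az
  r.ax = normalize(cross(up, r.az));  // perpendicular to up and az
  r.ay = cross(r.az, r.ax);           // unit length, since az and ax are orthonormal
  r.t  = {-dot(r.ax, eye), -dot(r.ay, eye), -dot(r.az, eye)};
  return r;
}

// Applies T_cam^{-1} to a world point: R_a^T * p - R_a^T * C_eye.
Vec3 worldToCamera(const LookAt& c, const Vec3& p) {
  return {dot(c.ax, p) + c.t[0], dot(c.ay, p) + c.t[1], dot(c.az, p) + c.t[2]};
}
```

As a plausibility check, the reference point must always end up on the negative $z$-axis of the camera: for the eye point $(10, 10, 0)^\top$ looking at the origin, the origin is mapped to $(0, 0, -\sqrt{200})^\top$.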

Example: gluLookAt

Source code of the example with GLUT: LookAt.cpp
Source code of the example with Qt: LookAt.cpp
Source code of the example with Java: LookAt.java

Example: gluLookAt

class Renderer {
  ...
  void resize(int w, int h) {
    glViewport(0, 0, w, h);
    glMatrixMode(GL_PROJECTION);
    glLoadIdentity();
    gluPerspective (30.0, (float)w/(float)h, 2.0, 20.0);
  }
  void display() {    
    glClearColor(0.0f, 0.0f, 0.0f, 0.0f);
    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
    glMatrixMode(GL_MODELVIEW);
    glLoadIdentity();

    // camera orbits in the y=10 plane
    // and looks at origin
    double rad = M_PI / 180.0f * t;
    gluLookAt(10.0*cos(rad), 10.0 , 10.0*sin(rad), // eye
              0.0, 0.0, 0.0, // look at
              0.0, 1.0, 0.0); // up

    //draw cube at origin
    drawCube();

    glRotatef(45.0f, 0.0f, 0.0f, 1.0f);
    glTranslatef(2.5f, 0.0f, 0.0f );
    glScalef(0.5f, 0.5f, 0.5f);

    //draw transformed cube
    drawCube(); 
  }
  ...
};

Example: gluLookAt

Which transformations are applied to the vertices of the smaller cube?
```
glMatrixMode(GL_PROJECTION);
glLoadIdentity();
gluPerspective (...);

glMatrixMode(GL_MODELVIEW);
glLoadIdentity();
gluLookAt(...);


glRotatef(...);
glTranslatef(...);
glScalef(...);
```
State of the matrices after each call:
glLoadIdentity(): $\mathtt{T}_{\mathrm{\small projection}}= \mathtt{I}$
gluPerspective(...): $\mathtt{T}_{\mathrm{\small projection}}= \mathtt{I} \, \mathtt{A}$

glLoadIdentity(): $\mathtt{T}_{\mathrm{\small modelview}}= \mathtt{I}$
gluLookAt(...): $\mathtt{T}_{\mathrm{\small modelview}}= \mathtt{I}\,\mathtt{T}_{\mathrm{\small cam}}^{-1}$

glRotatef(...): $\mathtt{T}_{\mathrm{\small modelview}}= \mathtt{I}\,\mathtt{T}_{\mathrm{\small cam}}^{-1} \,\mathtt{T}_r$
glTranslatef(...): $\mathtt{T}_{\mathrm{\small modelview}}= \mathtt{I}\,\mathtt{T}_{\mathrm{\small cam}}^{-1} \,\mathtt{T}_r\,\mathtt{T}_t$
glScalef(...): $\mathtt{T}_{\mathrm{\small modelview}}= \mathtt{I}\,\mathtt{T}_{\mathrm{\small cam}}^{-1} \,\mathtt{T}_r\,\mathtt{T}_t\,\mathtt{T}_s $

$\begin{align} \underline{\tilde{\mathbf{P}}} &= \mathtt{T}_{\mathrm{\small projection}} \mathtt{T}_{\mathrm{\small modelview}} \, \underline{\mathbf{P}}\\ &= \mathtt{A} \, \mathtt{T}_{\mathrm{\small cam}}^{-1} \,\mathtt{T}_r\,\mathtt{T}_t\,\mathtt{T}_s \,\underline{\mathbf{P}} \end{align}$

Example: gluLookAt and chains of transformations

Source code with GLUT: LookAtLocalTrans.cpp
Source code with Qt: LookAtLocalTrans.cpp
Source code with Java: LookAtLocalTrans.java

Example: gluLookAt and chains of transformations

class Renderer {
public:
  float t;

public:
  Renderer() : t(0.0) {}

public:
  void display() {    
    glClearColor(0.0f, 0.0f, 0.0f, 0.0f);
    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);

    glMatrixMode(GL_MODELVIEW);
    glLoadIdentity();
    
    // camera orbits in the y=10 plane
    // and looks at origin
    double rad = M_PI / 180.0f * t;
    gluLookAt(10.0*cos(rad), 10.0 , 10.0*sin(rad), // eye
              0.0, 0.0, 0.0, // look at
              0.0, 1.0, 0.0); // up

    //draw model at origin
    drawCubeHierarchy(0, 4);
  }

  void init() {
    glEnable(GL_DEPTH_TEST);
  }

  void resize(int w, int h) {
    glViewport(0, 0, w, h);
    glMatrixMode(GL_PROJECTION);
    glLoadIdentity();
    gluPerspective (30.0, (float)w/(float)h, 0.1, 50.0);
  }

private:
  void drawCube() {
    ...
  }

  void drawCubeHierarchy(int depth, int neighbors) {
    drawCube(); // draw parent
    depth +=1;
    if (depth < 6){
      for (int n = 0; n < neighbors; n++){
        glPushMatrix();
        glRotatef(n*90.0f-90.0f, 0.0f, 0.0f, 1.0f);
        glTranslatef(2.5f, 0.0f, 0.0f );
        glScalef(0.5f, 0.5f, 0.5f);
        drawCubeHierarchy(depth, 3); // draw children
        glPopMatrix();
      }
    }
  }
};

Per-Vertex Operations

When using the fixed-function pipeline, the following transformations are applied to the vertex data:

OpenGL-Pipeline

Source: based on Mark Segal, Kurt Akeley, The OpenGL Graphics System: A Specification Version 2.0, 2004, Figure 2.1. Block diagram of the GL (modified)

Perspective Division

The so-called "Perspective Division" transfers the projected points in homogeneous coordinates into the Cartesian coordinate system by dividing by the last coordinate:
$\underline{\mathbf{P}} = \begin{pmatrix}p_x\\p_y\\p_z\\p_w\end{pmatrix} \in \mathbb{H}^3 \quad \longmapsto \quad \mathbf{P}= \begin{pmatrix}\frac{p_x}{p_w}\\\frac{p_y}{p_w}\\\frac{p_z}{p_w} \end{pmatrix} \in \mathbb{R}^3 $

Clipping

The projection matrix was designed such that, after projection and perspective division, all $x$-, $y$- and $z$-coordinates within the visible volume are mapped to the range $-1$ to $1$
Primitives that lie completely outside this range do not need to be drawn
By testing for the range $[-1;1]$ it would be easy to implement clipping after the perspective division step
In OpenGL, however, clipping is carried out before the perspective division. Why?

Instead of testing the range $[-1;1]$, the range $[-p_w;p_w]$ can be checked just as quickly (for $p_w > 0$):
$-p_w < p_x < p_w \quad \Longleftrightarrow \quad -1 < \frac{p_x}{p_w} < 1$
This has the advantage that
- for the case $p_w=0$ no special treatment is needed and
- the division computation for clipped coordinates is no longer needed
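
The order "clip test first, division second" can be sketched as follows. The types and function names are assumptions made for this illustration. The sketch only classifies single points, whereas a real implementation clips whole primitives against the volume, and it uses inclusive bounds as the OpenGL specification does.

```cpp
#include <cassert>

struct Vec4 { double x, y, z, w; };  // homogeneous clip coordinates
struct Vec3 { double x, y, z; };     // normalized device coordinates

// True if the point lies inside the view volume, tested before the
// perspective division: -w <= x, y, z <= w. Points with w <= 0
// (e.g. behind the camera) fail the test without any special case,
// except for the degenerate all-zero vector.
bool insideClipVolume(const Vec4& p) {
  return -p.w <= p.x && p.x <= p.w &&
         -p.w <= p.y && p.y <= p.w &&
         -p.w <= p.z && p.z <= p.w;
}

// Perspective division; intended for points that passed the clip test.
Vec3 perspectiveDivide(const Vec4& p) {
  assert(p.w != 0.0);
  return { p.x / p.w, p.y / p.w, p.z / p.w };
}
```

For example, $(1, -2, 0.5, 2)^\top$ passes the test and is divided to $(0.5, -1, 0.25)^\top$, while $(3, 0, 0, 2)^\top$ and $(1, 0, 0, 0)^\top$ are rejected before any division takes place.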

Viewport Transformation

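[Figure: normalized coordinates with axes $x$ and $y$ (the value $-1$ is marked) and their mapping to a screen of the given width and height]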

In a final transformation step, the coordinates in the range $[-1;1]$ are scaled to screen coordinates

To this end, OpenGL provides the command:

glViewport(int ix, int iy, int width, int height)

The parameters ix and iy define the lower-left corner of the viewport, and width and height define its size (all in pixels)
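
The scaling carried out for the $x$- and $y$-coordinates can be written down in a few lines. The names `Pixel` and `ndcToWindow` are assumptions for this sketch; the parameters mirror those of glViewport.

```cpp
#include <cassert>

struct Pixel { double x, y; };  // window coordinates in pixels

// Maps normalized device coordinates in [-1, 1] to window coordinates
// for a viewport with lower-left corner (ix, iy) and the given size.
Pixel ndcToWindow(double xNdc, double yNdc, int ix, int iy, int width, int height) {
  assert(width >= 0 && height >= 0);
  return { ix + 0.5 * (xNdc + 1.0) * width,
           iy + 0.5 * (yNdc + 1.0) * height };
}
```

For a viewport set with `glViewport(400, 300, 400, 300)`, the center $(0, 0)$ of the normalized coordinates is mapped to the pixel position $(600, 450)$, and the corner $(-1, -1)$ to $(400, 300)$.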

Example: glViewport

Source code of the example with GLUT: Viewport.cpp
Source code of the example with Qt: Viewport.cpp
Source code of the example with Java: Viewport.java

Example: glViewport

class Renderer {

public:
  float t;

public:
  Renderer() : t(0.0), width(0), height(0) {}

public:
  void display() {
    glClearColor(0.0f, 0.0f, 0.0f, 0.0f);
    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);

    // top right viewport (look from front)
    glViewport(width/2, height/2, width/2, height/2);
    glMatrixMode(GL_MODELVIEW);
    glLoadIdentity();
    drawFrame();
    // set camera (look from positive x-direction)
    gluLookAt(10.0, 0.0, 0.0, 
               0.0, 0.0, 0.0, 
               0.0, 0.0, 1.0);
    // draw scene
    drawSceneGrid();
    drawRotatingPyramid();

    // bottom left viewport (look from left)
    glViewport(0, 0, width/2, height/2);
    glMatrixMode(GL_MODELVIEW);
    glLoadIdentity();
    drawFrame();
    // set camera (look from negative y-direction)
    gluLookAt(0.0, -10.0, 0.0, 
              0.0,   0.0, 0.0,
              0.0,   0.0, 1.0);
    // draw scene
    drawSceneGrid();
    drawRotatingPyramid();

    // top left viewport (look from top)
    glViewport(0, height/2, width/2, height/2);
    glMatrixMode(GL_MODELVIEW);
    glLoadIdentity();
    drawFrame();
    // set camera (look from positive z-direction)
     gluLookAt(0.0, 0.0, 10.0, 
               0.0, 0.0,  0.0, 
              -1.0, 0.0,  0.0);
    // draw scene
    drawSceneGrid();
    drawRotatingPyramid();

    // bottom right viewport (perspective)
    glViewport(width/2, 0, width/2, height/2);
    glMatrixMode(GL_MODELVIEW);
    glLoadIdentity();
    drawFrame();
    // set camera
    gluLookAt(8.0, -2.0, 5.0, 
              0.0,  0.0, 0.0, 
              0.0,  0.0, 1.0);
    // draw scene
    drawSceneGrid();
    drawRotatingPyramid();
  }

  void init() {
    glEnable(GL_DEPTH_TEST);
    //glEnable(GL_CULL_FACE);
  }

  void resize(int w, int h) {
    width = w;
    height = h;
    glMatrixMode(GL_PROJECTION);
    glLoadIdentity();
    gluPerspective (30.0, 
                    (float)width/(float)height, 
                    0.1, 50.0);
  }

private:
  int width;
  int height;

private:
  void drawFrame() {
      glLineWidth(2.0);
      glMatrixMode(GL_PROJECTION);
      glPushMatrix();
      glLoadIdentity();
      glColor3f(1.0f, 1.0f, 1.0f);
      glBegin(GL_LINE_LOOP);
      glVertex3f(-1.0f, 1.0f, 0.0f);
      glVertex3f( 1.0f, 1.0f, 0.0f);
      glVertex3f( 1.0f,-1.0f, 0.0f);
      glVertex3f(-1.0f,-1.0f, 0.0f);
      glEnd();
      glPopMatrix();
      glMatrixMode(GL_MODELVIEW);
      glLineWidth(1.0);
  }

  void drawSceneGrid() {
      glColor3f(0.3f, 0.3f, 0.3f);
      glBegin(GL_LINES);
      for(unsigned i=0; i<=10; i++) {
        glVertex3f(-5.0f+i, -5.0f,   0.0f);
        glVertex3f(-5.0f+i,  5.0f,   0.0f);
        glVertex3f(-5.0f,   -5.0f+i, 0.0f);
        glVertex3f( 5.0f,   -5.0f+i, 0.0f);
      }
      glEnd();  

    glColor3f(0.0f, 0.0f, 1.0f);
    drawCoordinateAxisZ();
    glColor3f(0.0f, 1.0f, 0.0f);
    drawCoordinateAxisY();
    glColor3f(1.0f, 0.0f, 0.0f);
    drawCoordinateAxisX();
  }

  void drawCoordinateAxisZ() {
    glLineWidth(2.0);
    glBegin(GL_LINES);
    glVertex3f(0.0f, 0.0f, 0.0f); // z-axis
    glVertex3f(0.0f, 0.0f, 2.0f);
    glEnd();
    glLineWidth(1.0);

    // z-axis tip
    glBegin(GL_TRIANGLES);
    glVertex3f( 0.0f, 0.0f, 2.0f);
    glVertex3f(-0.05f, 0.05f, 1.9f);
    glVertex3f( 0.05f, 0.05f, 1.9f);
    glVertex3f( 0.0f,  0.0f, 2.0f);
    glVertex3f( 0.05f, -0.05f, 1.9f);
    glVertex3f(-0.05f, -0.05f, 1.9f);
    glVertex3f( 0.0f,  0.0f, 2.0f);
    glVertex3f( 0.05f,  0.05f, 1.9f);
    glVertex3f( 0.05f, -0.05f, 1.9f);
    glVertex3f( 0.0f,  0.0f, 2.0f);
    glVertex3f(-0.05f, -0.05f, 1.9f);
    glVertex3f(-0.05f,  0.05f, 1.9f);
    glEnd();
    glBegin(GL_POLYGON);
    glVertex3f( 0.05f, -0.05f, 1.9f);
    glVertex3f( 0.05f,  0.05f, 1.9f);
    glVertex3f(-0.05f,  0.05f, 1.9f);
    glVertex3f(-0.05f, -0.05f, 1.9f);
    glEnd();
  }

  void drawCoordinateAxisX() {
      glPushMatrix();
      glRotatef(90.0f, 0.0f, 1.0f, 0.0f);
      drawCoordinateAxisZ();
      glPopMatrix();
  }

  void drawCoordinateAxisY() {
      glPushMatrix();
      glRotatef(-90.0f, 1.0f, 0.0f, 0.0f);
      drawCoordinateAxisZ();
      glPopMatrix();
  }

  void drawRotatingPyramid() {
    glRotatef(t, 0.0f, 0.0f, 1.0f);
    drawPyramid();
  }

  void drawPyramid() {
    glColor3f(1.0,0.0,0.0);
    glBegin(GL_TRIANGLES);
    glVertex3f( 0.0f, 0.0f, 1.5f);
    glVertex3f(-1.0f, 1.0f, 0.0f);
    glVertex3f( 1.0f, 1.0f, 0.0f);
    glEnd();
    glColor3f(0.0,1.0,0.0);
    glBegin(GL_TRIANGLES);
    glVertex3f( 0.0f,  0.0f, 1.5f);
    glVertex3f( 1.0f, -1.0f, 0.0f);
    glVertex3f(-1.0f, -1.0f, 0.0f);
    glEnd();
    glColor3f(0.0,0.0,1.0);
    glBegin(GL_TRIANGLES);
    glVertex3f( 0.0f,  0.0f, 1.5f);
    glVertex3f( 1.0f,  1.0f, 0.0f);
    glVertex3f( 1.0f, -1.0f, 0.0f);
    glEnd();
    glColor3f(1.0,1.0,0.0);
    glBegin(GL_TRIANGLES);
    glVertex3f( 0.0f,  0.0f, 1.5f);
    glVertex3f(-1.0f, -1.0f, 0.0f);
    glVertex3f(-1.0f,  1.0f, 0.0f);
    glEnd();
    glColor3f(0.0,1.0,1.0);
    glBegin(GL_POLYGON);
    glVertex3f( 1.0f, -1.0f, 0.0f);
    glVertex3f( 1.0f,  1.0f, 0.0f);
    glVertex3f(-1.0f,  1.0f, 0.0f);
    glVertex3f(-1.0f, -1.0f, 0.0f);
    glEnd();
  }
};

Vanishing Points

Under a perspective projection, parallel lines in 3D space are in general mapped to non-parallel lines in the 2D image plane
The point at which these lines intersect in the image plane is called the vanishing point
Each spatial direction can have its own vanishing point (or none, if the direction is parallel to the image plane)
Depending on how many vanishing points exist, the projection is called a 1-, 2-, or 3-point perspective
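
The position of a vanishing point follows from the projection formula of the pinhole camera. As a short sketch (the symbols $\mathbf{P}_0$, $\mathbf{d}$, and $\lambda$ are introduced here for illustration), consider the points $\mathbf{P}(\lambda) = \mathbf{P}_0 + \lambda \, \mathbf{d}$ on a line with direction $\mathbf{d}=(d_x, d_y, d_z)^\top$ and $d_z \neq 0$:
$\tilde{p}_x(\lambda) = f \, \frac{p_{0,x} + \lambda \, d_x}{p_{0,z} + \lambda \, d_z} \; \xrightarrow{\lambda \rightarrow \infty} \; f \, \frac{d_x}{d_z}, \qquad \tilde{p}_y(\lambda) \; \xrightarrow{\lambda \rightarrow \infty} \; f \, \frac{d_y}{d_z}$
The limit does not depend on the starting point $\mathbf{P}_0$, so all lines with the same direction $\mathbf{d}$ share one vanishing point. For $d_z = 0$ the limit does not exist, and the projected lines remain parallel.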

Source: wikipedia.org; Author: Wolfram Gothe 2009; public domain

Z-Buffer

Depth Test

In the previous examples, glEnable(GL_DEPTH_TEST) and glClear(GL_DEPTH_BUFFER_BIT) were used without discussing their functionality
The function call glEnable(GL_DEPTH_TEST) activates the depth test in OpenGL
If the depth test is disabled, the primitives are written into the framebuffer in the order in which they are passed into the OpenGL pipeline
This means that primitives drawn later cover the ones drawn earlier
This is typically not the desired behavior
Instead, primitives that are closer to the camera should cover more distant ones, regardless of the drawing order
Ideally, this decision is made separately for each pixel, because individual primitives can penetrate each other
OpenGL employs the Z-Buffer method for this purpose

Z-Buffer Method

[Figure: the matrix $\mathtt{A}$ maps camera coordinates to normalized device coordinates; both coordinate systems are shown with axes $x$, $y$, $z$]

Although the $z$-coordinate in the camera coordinate system is the one that actually matters, the depth test can be carried out after the perspective division, since the depth order is not changed
However, in "Normalized device coordinates" the $z$-axis is reversed with respect to the camera coordinate system, i.e., more distant points have a larger $z$
(note the left-handed coordinate system here)
Points on the near-plane in the camera coordinate system now have $\tilde{p}_z=-1$, and points on the far-plane have $\tilde{p}_z=1$

Z-Buffer Method

The Z-Buffer method requires (in addition to the usual framebuffer which contains the color information) a depth buffer of the same dimensions, which contains the depth values

[Figure: framebuffer and depth buffer]

Z-Buffer Method

At the beginning of the rendering process, the depth buffer is initialized with the z-values of the far-plane. This is done in OpenGL using the command glClear(GL_DEPTH_BUFFER_BIT)
Writing a pixel in the frame- and depth-buffer occurs during the per-fragment operations in the OpenGL pipeline
The depth value for each pixel is interpolated by the rasterizer using the transformed vertex information
If the depth value for the pixel is smaller than the currently stored one in the depth buffer the color value is written into the framebuffer and the depth value into the depth buffer, otherwise both remain unchanged

FOR each primitive
  FOR each pixel of primitive at position (x,y) with colour c and depth d
    IF d < depthbuffer(x,y) 
      framebuffer(x,y) = c
      depthbuffer(x,y) = d
    END IF
  END FOR
END FOR
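
The pseudocode above can be turned into a small software depth buffer. This is a sketch for illustration, not how OpenGL implements the test internally: the names `Fragment` and `ZBuffer` are assumptions, depth values are taken from the range $[0;1]$, and the rasterizer that produces the fragments is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One rasterized pixel of a primitive (illustrative type).
struct Fragment { int x, y; double depth; std::uint32_t color; };

class ZBuffer {
public:
  ZBuffer(int width, int height)
    : width_(width), height_(height),
      color_(static_cast<std::size_t>(width) * height, 0u),
      depth_(static_cast<std::size_t>(width) * height, 1.0) {}  // cleared to the far-plane

  // Writes color and depth only if the fragment is closer than the stored depth.
  void write(const Fragment& f) {
    assert(f.x >= 0 && f.x < width_ && f.y >= 0 && f.y < height_);
    const std::size_t i = static_cast<std::size_t>(f.y) * width_ + f.x;
    if (f.depth < depth_[i]) {
      color_[i] = f.color;
      depth_[i] = f.depth;
    }
  }

  std::uint32_t colorAt(int x, int y) const {
    return color_[static_cast<std::size_t>(y) * width_ + x];
  }

private:
  int width_, height_;
  std::vector<std::uint32_t> color_;
  std::vector<double> depth_;
};
```

If three fragments with the depths 0.5, 0.8, and 0.2 are written to the same pixel in this order, the second one is discarded and the third one overwrites the first, independent of the drawing order.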

Z-Fighting

[Figure: depth resolution along the $z$-axis]

The depth buffer has only limited accuracy. Typically, it stores integer values with 16, 24 or 32 bits of precision
The interval [-1.0; 1.0] is mapped to [0.0; 1.0] and then to [0; MAX_INT], e.g., [0; 65535] for 16 bits
The value is rounded to the nearest integer
Because the "Normalized device coordinates" have already been divided by $p_w$, the depth values are not evenly distributed: rounding errors for objects close to the camera are smaller (and consequently their depth accuracy is higher)
Therefore, for distant primitives that are close together, so-called "Z-Fighting" can sometimes be observed: because of inaccuracies in the z-values, sometimes one primitive and sometimes the other is shown.
To reduce Z-fighting, it is important to choose the near- and far-plane with care, since they define the z-range over which the available integer depth values are spread
Therefore, the near- and far-plane should be placed as close together as possible, such that they just enclose the depicted 3D scene
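
The uneven distribution of the depth values can be made visible with a small computation. The following sketch assumes the mapping described above and the projection matrix derived earlier in this chapter; the function name `quantizedDepth` is hypothetical, and real hardware may round differently.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Integer depth-buffer value for a camera-space z-coordinate pz (pz < 0),
// given the near/far distances zn, zf and the number of depth bits.
std::uint32_t quantizedDepth(double pz, double zn, double zf, int bits) {
  assert(zn > 0.0 && zf > zn && bits > 0 && bits <= 32);
  const double alpha  = (zf + zn) / (zn - zf);
  const double beta   = 2.0 * zf * zn / (zn - zf);
  const double ndc    = (alpha * pz + beta) / (-pz);   // [-1; 1] inside the view volume
  const double win    = 0.5 * ndc + 0.5;               // [0; 1]
  const double maxInt = std::ldexp(1.0, bits) - 1.0;   // 2^bits - 1
  return static_cast<std::uint32_t>(std::llround(win * maxInt));
}
```

With $z_n = 0.1$, $z_f = 50$, and 16 bits, two surfaces at the distances 40 and 40.01 receive the same integer depth value, so the depth test cannot separate them. Two surfaces at the distances 1 and 1.01 still differ by about 65 steps.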

Example: Z-Fighting

Source code of the example with GLUT: ZFighting.cpp
Source code of the example with Qt: ZFighting.cpp
Source code of the example with Java: ZFighting.java

Example: Z-Fighting

class Renderer {

public:
  float t;
  int width, height;
  double nearPlane, farPlane;
  int depthBits; 

public:
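  // initialize all members, so that display() never reads indeterminate values
  Renderer() : t(0.0), width(0), height(0), nearPlane(2.0), farPlane(20.0), depthBits(0) {}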

public:

  void resize(int w, int h) {
    glViewport(0, 0, w, h);
    width = w;
    height = h;
  }

  void display() {    
    glClearColor(0.0f, 0.0f, 0.0f, 0.0f);
    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);

    glMatrixMode(GL_PROJECTION);
    glLoadIdentity();
    gluPerspective (30.0, (float)width/(float)height, nearPlane, farPlane);
    glMatrixMode(GL_MODELVIEW);
    glLoadIdentity();

    // camera orbits in the y=10 plane
    // and looks at origin
    double rad = M_PI / 180.0f * t;
    gluLookAt(10.0*cos(rad), 10.0 , 10.0*sin(rad), // eye
              0.0, 0.0, 0.0, // look at
              0.0, 1.0, 0.0); // up

    //draw cube at origin
    drawCube();
  }

  void init() {
    glEnable(GL_DEPTH_TEST);
    glGetIntegerv (GL_DEPTH_BITS, &depthBits);
  }
private:
  void drawCube() {
    ...
  }
};

Are there any questions?

Please notify me by e-mail if you have questions or suggestions for improvement, or if you have found typos
