© CMG Lee • Dec2003

Principles of three-dimensional vision

  1. Overview

    Just as television and motion pictures trick the eye into seeing continuous motion from a sequence of discrete images, 3D displays simulate a 3D scene by presenting to each of the viewer's eyes the image it would expect to see of that scene from its position.

    To do this, we need to understand the mechanisms humans use to perceive a 3D scene.

  2. Depth perception

    The viewer's brain interprets several cues, or stimuli (perhaps subconsciously), to perceive depth from what the eyes see. These cues can be divided into the following two categories [Marston,2000].

    Category       Cue                  Strength   Range
    Physiological  Stereopsis           Strong     under 1 km
                   Kineopsis            Strong     under 1 km
                   Convergence          Medium     under 10 m
                   Accommodation        Medium     under 2 m
    Psychological  Occlusion            Strong     any
                   Linear perspective   Varied     any
                   Light and shade      Varied     any
                   Retinal image size   Varied     any
                   Texture gradients    Varied     under 1 km
                   Luminance/colour     Weak       over 100 m
                   Aerial perspective   Weak       over 1 km


    1. Physiological depth cues

      Physiological depth cues arise from the physical workings of the eyes and their adaptation to the scene, for example focusing properly on the image. These cues are generally stronger than psychological ones.

      Stereopsis

      Stereopsis, or binocular disparity, is the strongest physiological depth cue. It relies on the slightly different view each eye sees because of the eyes' different locations in space. The average separation of adult human eyes (the interpupillary distance) is 6.5 cm.
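      As a rough illustration of why stereopsis weakens with distance, the sketch below computes the relative disparity between two points straight ahead of the viewer as the difference between the angles at which the two lines of sight converge on them, using the 6.5 cm eye separation above. It assumes idealised pinhole eyes and points on the midline; for small angles the result is approximately b(1/d1 - 1/d2), so a fixed depth step produces a disparity that falls off roughly with the square of the viewing distance.

```python
import math

EYE_SEPARATION_M = 0.065  # average adult eye separation quoted in the text

def vergence_angle(distance_m, eye_separation_m=EYE_SEPARATION_M):
    """Angle (radians) between the two eyes' lines of sight when both fixate
    a point straight ahead at distance_m."""
    return 2.0 * math.atan(eye_separation_m / (2.0 * distance_m))

def binocular_disparity(near_m, far_m, eye_separation_m=EYE_SEPARATION_M):
    """Relative disparity (radians) between two points straight ahead at
    near_m and far_m: the difference between their vergence angles."""
    return vergence_angle(near_m, eye_separation_m) - vergence_angle(far_m, eye_separation_m)

def to_arcsec(angle_rad):
    return math.degrees(angle_rad) * 3600.0

if __name__ == "__main__":
    # Disparity produced by the same 1 m depth step at increasing distances.
    for d in (1.0, 10.0, 100.0, 1000.0):
        disparity = to_arcsec(binocular_disparity(d, d + 1.0))
        print(f"{d:6.0f} m to {d + 1:6.0f} m: {disparity:12.4f} arcsec")
```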

      Kineopsis

      Kineopsis or motion parallax is similar to binocular disparity but depends on a shift of position of the viewer. When the viewer moves, the images of objects at different distances move at different rates. This cue is strong up to large distances.
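      The difference in rates can be made concrete with a little geometry. This sketch assumes a viewer moving in a straight line at constant speed v past stationary points: a point at distance d, seen at angle φ to the direction of travel, sweeps across the visual field at v sin(φ) / d radians per second, so nearer objects move proportionally faster. The 1.4 m/s walking speed in the usage lines is an illustrative assumption.

```python
import math

def parallax_angular_speed(viewer_speed_m_s, distance_m, angle_from_travel_deg=90.0):
    """Angular speed (rad/s) at which a stationary point appears to move across
    the visual field while the viewer moves in a straight line.

    angle_from_travel_deg is the angle between the direction of travel and the
    line of sight to the point (90 means the point is directly to the side).
    """
    return viewer_speed_m_s * math.sin(math.radians(angle_from_travel_deg)) / distance_m

if __name__ == "__main__":
    walking_speed = 1.4  # m/s, illustrative assumption
    for d in (2.0, 20.0, 200.0):
        deg_per_s = math.degrees(parallax_angular_speed(walking_speed, d))
        print(f"object {d:6.1f} m to the side: {deg_per_s:7.3f} deg/s")
```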

      Convergence

      Convergence is the toe-in (inward rotation) of the eyeballs to fuse the two separate views from the eyes into a coherent scene. It is fairly short-ranged, effective under about 10 m.
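      One way to see why convergence is short-ranged is to invert it. This sketch assumes symmetric fixation on a point straight ahead and the 6.5 cm eye separation quoted above, and converts a fixation distance to a convergence angle and back again. A hypothetical 0.1° misjudgement of the angle barely matters at half a metre, but at 10 m, where the angle is already under half a degree, it inflates the implied distance by roughly a third.

```python
import math

EYE_SEPARATION_M = 0.065  # average adult eye separation quoted in the text

def convergence_angle_deg(distance_m, b=EYE_SEPARATION_M):
    """Toe-in angle (degrees) between the two lines of sight when both eyes
    fixate a point straight ahead at distance_m."""
    return math.degrees(2.0 * math.atan(b / (2.0 * distance_m)))

def distance_from_convergence(angle_deg, b=EYE_SEPARATION_M):
    """Inverse of convergence_angle_deg: the fixation distance implied by a
    given convergence angle."""
    return b / (2.0 * math.tan(math.radians(angle_deg) / 2.0))

if __name__ == "__main__":
    error_deg = 0.1  # hypothetical misjudgement of the convergence angle
    for true_d in (0.5, 2.0, 10.0):
        angle = convergence_angle_deg(true_d)
        estimate = distance_from_convergence(angle - error_deg)
        print(f"true {true_d:5.1f} m, angle {angle:5.2f} deg, estimate {estimate:6.2f} m")
```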

      Accommodation

      Accommodation is the focusing of the image on the retina by the ciliary muscle, which changes the shape of the eye's lens. Its effect is strongest at short distances, under about 2 m.
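      Focusing effort is conventionally measured in dioptres, the reciprocal of the focusing distance in metres. The short sketch below, which idealises the relaxed eye as focused at infinity, shows why accommodation mainly helps at short range: moving from 25 cm to 50 cm changes the demand by 2 dioptres, whereas everything beyond 2 m lies within the last 0.5 dioptre.

```python
def accommodation_demand(distance_m):
    """Extra focusing power (dioptres) needed to focus at distance_m,
    relative to a relaxed eye focused at infinity."""
    return 1.0 / distance_m

if __name__ == "__main__":
    for d in (0.25, 0.5, 1.0, 2.0, 10.0, float("inf")):
        print(f"{d:>6} m: {accommodation_demand(d):.2f} D")
```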

    2. Psychological depth cues

      Psychological depth cues derive from the viewer's prior experience of 3D environments, including the real world. Unlike the physiological depth cues, they can all be simulated in a single 2D image.

      Occlusion

      Occlusion, the strongest psychological depth cue, occurs when objects appear to overlap one another. The one with the most continuous outline is assumed to be nearest.

      Linear perspective

      Linear perspective is the apparent convergence of parallel lines with distance.
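      Linear perspective falls directly out of the pinhole (central) projection that an eye or camera performs: a point at lateral offset x and distance z lands at f x / z on the image plane. In the sketch below the rail spacing and eye height are illustrative assumptions; the projected gap between two parallel rails shrinks as 1/z, so both rails head towards the same vanishing point at the image centre.

```python
def project(point, focal_length=1.0):
    """Pinhole perspective projection of a 3D point (x, y, z) onto an image
    plane focal_length in front of the eye; z > 0 is the distance ahead."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point must be in front of the eye (z > 0)")
    return (focal_length * x / z, focal_length * y / z)

if __name__ == "__main__":
    half_gauge = 0.75  # m, half the distance between the rails (assumed)
    eye_height = 1.6   # m, eye height above the rails (assumed)
    for z in (2.0, 5.0, 20.0, 100.0):
        left = project((-half_gauge, -eye_height, z))
        right = project((half_gauge, -eye_height, z))
        print(f"z = {z:5.0f} m: rails at y = {left[1]:+.4f}, gap = {right[0] - left[0]:.4f}")
```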

      Light and shade

      The light and shade cue relies on the assumption that a scene is lit from above.
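      The sketch below shows why the assumption matters, using a simple Lambertian (diffuse) shading model as an assumed stand-in for real surface reflectance. With the light placed above the viewer, the upper flank of a bump is brighter than its lower flank and a dent shows the reverse, so the same shading pattern, read under the lit-from-above assumption, tells the viewer whether a surface bulges out or recedes. The surface normals and light direction are hypothetical values chosen for illustration.

```python
import math

def normalise(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def lambert(normal, light_dir):
    """Diffuse (Lambertian) brightness in [0, 1]: the cosine of the angle
    between the surface normal and the direction towards the light,
    clamped at 0 for surfaces facing away from it."""
    n, l = normalise(normal), normalise(light_dir)
    return max(0.0, sum(a * b for a, b in zip(n, l)))

# Light from above and slightly in front (y is up, z points towards the viewer).
LIGHT_FROM_ABOVE = (0.0, 1.0, 1.0)

# On a bump facing the viewer the upper flank tilts up and the lower flank
# tilts down; on a dent the tilts are reversed.
bump = {"upper": (0.0, 0.5, 1.0), "lower": (0.0, -0.5, 1.0)}
dent = {"upper": (0.0, -0.5, 1.0), "lower": (0.0, 0.5, 1.0)}

if __name__ == "__main__":
    for name, shape in (("bump", bump), ("dent", dent)):
        upper = lambert(shape["upper"], LIGHT_FROM_ABOVE)
        lower = lambert(shape["lower"], LIGHT_FROM_ABOVE)
        print(f"{name}: upper flank {upper:.2f}, lower flank {lower:.2f}")
```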

      Retinal image size

      Retinal image size uses prior knowledge of the dimensions of an object recognised in the scene: its apparent size then suggests its approximate distance.
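      The inference can be written as simple trigonometry: an object of known size s that subtends a visual angle θ is about s / (2 tan(θ/2)) away. The sketch assumes the object is viewed face-on; the 1.7 m person and the toy car mistaken for a 4 m real car are hypothetical, the latter showing that the estimate is only as good as the viewer's recognition of the object.

```python
import math

def visual_angle_deg(size_m, distance_m):
    """Angle (degrees) subtended at the eye by an object of size_m,
    viewed face-on at distance_m."""
    return math.degrees(2.0 * math.atan(size_m / (2.0 * distance_m)))

def distance_from_known_size(assumed_size_m, visual_angle):
    """Distance implied by an object's visual angle (degrees), given the
    size the viewer believes it to have."""
    return assumed_size_m / (2.0 * math.tan(math.radians(visual_angle) / 2.0))

if __name__ == "__main__":
    angle = visual_angle_deg(1.7, 50.0)  # a 1.7 m person 50 m away
    print(f"person: {angle:.3f} deg -> {distance_from_known_size(1.7, angle):.1f} m")

    toy_car_angle = visual_angle_deg(0.4, 2.0)  # a 0.4 m toy car 2 m away...
    # ...mistaken for a 4 m real car appears ten times farther away.
    print(f"toy car read as real car: {distance_from_known_size(4.0, toy_car_angle):.1f} m")
```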

      Texture gradients

      Texture gradients are useful when surfaces have a uniform texture: the finer the texture appears (as if zoomed out), the farther away that part of the surface seems.
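      For the common case of a textured ground plane, the gradient can be computed directly. This sketch assumes an eye at height h above flat ground covered in strips of equal depth s; a strip starting at horizontal distance d spans atan(h/d) - atan(h/(d + s)) of visual angle, which shrinks rapidly with d, so distant parts of the ground appear more finely textured. The eye height and strip depth are illustrative assumptions.

```python
import math

def strip_visual_angle_deg(start_m, strip_depth_m, eye_height_m):
    """Vertical visual angle (degrees) subtended by a ground strip running
    from start_m to start_m + strip_depth_m ahead of an eye that is
    eye_height_m above flat ground."""
    near = math.atan(eye_height_m / start_m)
    far = math.atan(eye_height_m / (start_m + strip_depth_m))
    return math.degrees(near - far)

if __name__ == "__main__":
    eye_height = 1.6   # m, assumed
    strip_depth = 0.5  # m, e.g. paving slabs (assumed)
    for d in (1.0, 5.0, 20.0, 100.0):
        angle = strip_visual_angle_deg(d, strip_depth, eye_height)
        print(f"strip at {d:5.0f} m: {angle:8.4f} deg ({1.0 / angle:8.1f} strips per degree)")
```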

      Saturation and aerial perspective

      The luminance (brightness) and colour saturation of an object tend to decrease with distance.

      Aerial perspective is the haziness, or loss of contrast, of very distant objects caused by light scattering off particles in the atmosphere.
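      The loss of saturation and contrast can be sketched with a simple exponential haze model, an assumption here rather than the source's method: a fraction t = exp(-βd) of an object's light survives a path of length d, and the remainder is replaced by scattered skylight ("airlight"). Blending the object's colour towards the airlight colour in this way lowers its saturation and shrinks its contrast against the sky by the factor t. The extinction coefficient β and both colours are hypothetical values; `colorsys` is Python's standard colour-conversion module.

```python
import colorsys
import math

EXTINCTION_PER_M = 1.0 / 5000.0  # hypothetical extinction coefficient (1/m)
HAZE_RGB = (0.75, 0.8, 0.85)     # hypothetical airlight colour, RGB in [0, 1]

def apparent_colour(object_rgb, distance_m, beta=EXTINCTION_PER_M, haze_rgb=HAZE_RGB):
    """Blend the object's colour towards the haze colour: a fraction
    exp(-beta * d) of the object's light survives, and the rest is replaced
    by light scattered into the line of sight."""
    t = math.exp(-beta * distance_m)
    return tuple(t * c + (1.0 - t) * h for c, h in zip(object_rgb, haze_rgb))

def saturation(rgb):
    """HSV saturation of an RGB colour with components in [0, 1]."""
    return colorsys.rgb_to_hsv(*rgb)[1]

if __name__ == "__main__":
    forest_green = (0.1, 0.5, 0.1)
    for d in (0, 1000, 5000, 20000):
        r, g, b = apparent_colour(forest_green, d)
        print(f"{d:6d} m: rgb = ({r:.2f}, {g:.2f}, {b:.2f}), saturation {saturation((r, g, b)):.2f}")
```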

  3. Colour perception

    Human colour vision is mediated by the cones in the eyes. Most people have three types of cone, with peak sensitivities at wavelengths of 570 nm, 535 nm and 445 nm, corresponding to red, green and blue respectively [Koehler,1996]. The brain interprets the combined response of the three cone types as colour. It is therefore sufficient to separate an image into these three primary colours and to process and display each as a separate channel.
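    A minimal sketch of the channel separation, assuming an image held as rows of (r, g, b) pixel tuples: the image is split into three single-channel images, each can be processed on its own, and the results are recombined. The blue-dimming step in the usage lines is a hypothetical stand-in for whatever per-channel processing a display would need.

```python
def split_channels(image):
    """Split an image, given as rows of (r, g, b) pixels, into three
    single-channel images (red, green, blue)."""
    return tuple([[pixel[i] for pixel in row] for row in image] for i in range(3))

def merge_channels(red, green, blue):
    """Recombine three single-channel images into rows of (r, g, b) pixels."""
    return [list(zip(r_row, g_row, b_row)) for r_row, g_row, b_row in zip(red, green, blue)]

if __name__ == "__main__":
    image = [[(255, 0, 0), (0, 255, 0)],
             [(0, 0, 255), (128, 128, 128)]]
    red, green, blue = split_channels(image)
    # Hypothetical per-channel processing: halve the blue channel only.
    blue = [[min(255, round(v * 0.5)) for v in row] for row in blue]
    print(merge_channels(red, green, blue))
```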

    The display can be spatially multiplexed (the channels are optically combined or projected onto the same screen), time-multiplexed (each channel is displayed in turn, repeating at a rate above the persistence-of-vision threshold of around 24 Hz), or a hybrid of both techniques.
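    The trade-off between these approaches can be put in numbers. This sketch takes the text's figure of around 24 Hz as the minimum rate at which every channel must be refreshed, and computes how fast a display must switch sub-frames when some channels are shown simultaneously and the rest are time-multiplexed. The six-channel case (three colours for each of two eyes) is an illustrative assumption.

```python
import math

def required_switch_rate(total_channels, simultaneous_channels=1, per_channel_rate_hz=24.0):
    """Sub-frame rate (Hz) a display needs so that every channel is shown
    at least per_channel_rate_hz times per second.

    simultaneous_channels == 1               -> purely time-multiplexed
    simultaneous_channels == total_channels  -> purely spatially multiplexed
    anything in between                      -> hybrid
    """
    if not 1 <= simultaneous_channels <= total_channels:
        raise ValueError("simultaneous_channels must be between 1 and total_channels")
    time_slots = math.ceil(total_channels / simultaneous_channels)
    return time_slots * per_channel_rate_hz

if __name__ == "__main__":
    print("colour, time-multiplexed:", required_switch_rate(3), "Hz")
    print("colour x 2 eyes, time-multiplexed:", required_switch_rate(6), "Hz")
    print("colour combined spatially, eyes in turn:", required_switch_rate(6, 3), "Hz")
```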


Copyright CMG Lee & ARL Travis, Photonics and Sensors Group, Cambridge University Engineering Department