Immersion, Presence, and Performance in Virtual Environments

1. Introduction: Is VR better than a workstation?

2. Immersion, Presence and Task Performance

2.1 Immersion and Presence

In reports of earlier studies we have distinguished between immersion and presence (Slater, Usoh and Steed, 1995). Immersion refers to what is, in principle, a quantifiable description of a technology. It includes the extent to which the computer displays are extensive, surrounding, inclusive, vivid and matching. The displays are more extensive the more sensory systems they accommodate. They are surrounding to the extent that information can arrive at the person's sense organs from any (virtual) direction, and the participant can turn towards that direction and receive the appropriate directional sensory signals. The notion of surrounding also covers the degree to which the natural modes of sensory presentation are reproduced (visual and auditory stereopsis, for example). They are inclusive to the extent that all external sensory data (from physical reality) are shut out. Their vividness is a function of the variety and richness of the sensory information they can generate (Steuer, 1992); it is concerned with the richness, information content, resolution and quality of the displays. Finally, immersion requires a match between the participant's proprioceptive feedback about body movements and the information generated on the displays. A turn of the head should result in a corresponding change to the visual display and, for example, to the auditory displays, so that perceived sound direction is invariant to the orientation of the head. Matching requires body tracking, at least head tracking, but in general the greater the degree of body mapping, the more accurately the movements of the body can be reproduced.
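The matching requirement for auditory displays can be illustrated with a minimal sketch. It assumes a deliberately simplified model in which only head yaw is tracked and directions are azimuth angles in degrees; the function names are hypothetical and do not describe any particular system. The display renders the source at an angle relative to the head, so that the direction perceived in the world stays fixed as the head turns.

```python
def wrap_degrees(angle):
    """Wrap an angle in degrees into the range [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def head_relative_azimuth(source_azimuth_world, head_yaw):
    """Azimuth at which the source must be rendered relative to the head.

    Assumption: both angles are measured in the same world frame and
    only yaw (rotation about the vertical axis) is tracked.
    """
    return wrap_degrees(source_azimuth_world - head_yaw)


def perceived_world_azimuth(rendered_azimuth, head_yaw):
    """World direction the listener attributes to the rendered sound."""
    return wrap_degrees(rendered_azimuth + head_yaw)


if __name__ == "__main__":
    source = 90.0  # hypothetical source fixed at 90 degrees in the world
    for yaw in (0.0, 45.0, 90.0, -120.0):
        rendered = head_relative_azimuth(source, yaw)
        print(yaw, rendered, perceived_world_azimuth(rendered, yaw))
```

In this sketch the rendered angle changes with every head turn while the perceived world direction does not, which is the invariance the text describes. A real system would track full head orientation and position rather than yaw alone.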

Immersion also requires a self-representation in the VE - a Virtual Body (VB). The VB is both part of the perceived environment, and represents the being that is doing the perceiving. Perception in the VE is centred on the position in virtual space of the VB - e.g., visual perception from the viewpoint of the eyes in the head of the VB, an egocentric viewpoint.

Our general hypothesis is that presence is an increasing function of two orthogonal variables. The first variable is the extent of the match between the displayed sensory data and the internal representation systems and subjective world models typically employed by the participant. Although immersion increases with the vividness of the displays, as discussed above, we must also take into account the extent to which the information displayed allows individuals to construct their own internal mental models of reality. For example, a vivid visual display system might afford some individuals a sense of "presence", but be unsuited to others in the absence of sound (Slater, Usoh and Steed, 1994). The second variable is the extent of the match between proprioception and sensory data. Changes to the display should ideally be consistent with, and match through time without lag, the changes caused by the individual's movement and locomotion - whether of individual limbs or of the whole body relative to the ground.
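This hypothesis can be written compactly, as a sketch only. Let $P$ denote presence, $M_s$ the match between the displayed sensory data and the participant's internal representation systems, and $M_p$ the match between proprioception and sensory data. These symbols, the function $f$ and the assumption that $f$ is differentiable are notation introduced here for illustration; the hypothesis itself specifies no functional form.

```latex
P = f(M_s, M_p), \qquad
\frac{\partial f}{\partial M_s} > 0, \qquad
\frac{\partial f}{\partial M_p} > 0
```

Treating $M_s$ and $M_p$ as orthogonal means that each can, in principle, be varied while the other is held fixed.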

Immersion, in our view, is therefore an objective description of what any particular system provides. Presence is a state of consciousness, the (psychological) sense of being in the virtual environment, together with corresponding modes of behaviour. Participants who are highly present should experience the VE as a more engaging reality than the surrounding world, and should consider the environment specified by the displays as places visited rather than as images seen. Behaviours in the VE should be consistent with behaviours that would have occurred in everyday reality in similar circumstances.

It is important to realise that this model operates at many levels. Taking the visual display as an example, at the most basic level the important factors may be field of view, resolution, colour resolution and binocular disparity. The corresponding behaviours, from the point of view of presence, are autonomic responses governed by the visual system, such as vergence and accommodation (Ellis, 1991). At a higher level the realism of the content of the visual display may be considered - such as whether objects behave in accordance with physical laws. The corresponding behaviour with respect to presence may be observable gross involuntary behaviour - such as the looming effect (when an individual ducks in response to a flying object), or the experience of vertigo in response to a virtual visual cliff. At the highest level such features as the realism of the illumination may govern people's voluntary responses, such as evaluations of their sense of "being there", or of the "realism" of the virtual environment.

2.2 Presence and Task Performance

It is sometimes argued that it is important to study presence because of the potential relationship between presence and performance. For example, in Barfield, Sheridan, Zeltzer and Slater (1995) we find:

"Not only is it necessary to develop a theory of presence for virtual environments, it is also necessary to develop a basic research program to investigate the relationship between presence and performance using virtual environments. ... we need to determine when, and under what conditions, presence can be a benefit or a detriment to performance? ... When simulation and virtual environments are employed, what is contributed by the sense of presence per se?"

The question of the relationship between presence and performance goes to the heart of why presence is important. The issue is not really whether presence itself enhances performance. For example, an individual's performance in word processing is usually better with a modern point-and-click user interface than under UNIX using "vi" - not, of course, because of presence, but because of the former's superior user interface. In our view presence is important because the greater the degree of presence, the greater the chance that participants will behave in a VE in a manner similar to their behaviour in similar circumstances in everyday reality. Hence if an immersive virtual environment (IVE) is being used to train fire-fighters or surgeons, then presence is crucial, since we want them to behave appropriately in the VE and then transfer that knowledge to corresponding behaviour in the real world. There could obviously be cases where presence would diminish performance, just as being present in a real-life situation and using a machine with a poor "user interface" adversely affects performance.

Hence it is posing the wrong question to consider whether presence per se facilitates task performance. Rather, presence brings into play "natural" reactions to a situation (which may or may not have something to do with efficiency of task performance) - and the more these natural reactions can be brought into play, the more presence is facilitated, and so on. It is not really a question of how good the performance is, but rather of how it is grounded in presence.

We would nevertheless expect to find an association between presence and performance for some tasks - precisely those tasks that benefit from immersion. For the purposes of this study we postulated that increased "immersion" would lead to improved "task performance" (to be defined below in the context of this experiment). This is because the task involved comprehension and memory of a complex three-dimensional structure and of events relating to that structure, and we considered that performance would be enhanced by an egocentric, stereo view based on a head-tracked HMD compared with an exocentric, screen-based view. Since our overall hypothesis is that both performance and presence are enhanced by immersion, we would not be surprised to find an association between performance and presence.

3. Experiment