Movement, Attention and Movies: the Possibilities and Limitations of Eye Tracking? – Adrian G. Dyer & Sarah Pink

Abstract

Movies often present a rich encapsulation of the diversity of complex visual information and other sensory qualities and affordances that are part of the worlds we inhabit. Yet we still know little about either the physiological or experiential elements of the ways in which people view movies. In this article we bring together two approaches that have not commonly been employed in audience studies, to suggest ways in which to produce novel insights into viewer attention: through the measurement of observer eye movements whilst watching movies; in combination with an anthropological approach to understanding vision as a situated practice. We thus discuss how both eye movement studies that investigate complex media such as movies need to consider some of the important principles that have been developed for sensory ethnography, and in turn how ethnographic and social research can gain important insights into aspects of human engagement from the emergence of new technologies that can better map how an understanding of the world is constructed through sensory perceptual input. We consider recent evidence that top down mediated effects like narrative do promote significant changes in how people attend to different aspects of a film, and thus how film media combined with eye tracking and ethnography may reveal much about how people build understandings of the world.

Introduction

Seeing in complex environments is not a trivial task. Whilst people are often under the impression that you can believe what you see (Levin et al. 2000), physiological and neural constraints on how our visual system operates mean that only a very small proportion of an overall visual scene might be reliably perceived at one point in time during the evaluation of a sequence of events. Evidence that we often perceive only a portion of the vast amount of visual information present in a scene is nicely illustrated in the ‘Gorillas in our Midst’ short (25s) motion sequence, in which a group of six participants (two teams of three, dressed in white and black respectively) are filmed passing a basketball between team members (Simons and Chabris 1999). Subjects observing the film sequence are required to count the number of passes between the three students dressed in white, and whilst many subjects do correctly count the number of passes, the majority of test subjects fail to observe a large gorilla (an actor dressed as a gorilla) that walks into the middle of the visual field and beats its chest, before walking casually out of the scene. People typically don’t see this salient gorilla in the action sequence because their attention has been directed to the team in white by the instruction to count the number of passes. Why do we miss such a salient object as a gorilla, and what does this mean for our understanding of how different subjects might view complex information in real life, or in presentations that encapsulate aspects of real life, such as movies?

In this article we take an interdisciplinary approach to the question of how we might see certain things in complex dynamic environments. We draw together insights from the neurosciences and eye tracking studies, with anthropological understandings of vision and audio-visual media in order to map out an approach to audience research that accounts for the relationship between human perception, vision as a form of practical activity, and the environments through which these are co-constituted. We first build a brief outline of how the eye, visual perception and the subjectivity or selectivity of viewing are currently understood from the perspective of vision sciences. This demonstrates how physiologically there is evidence that the eye sees selectively, yet it does not fully explain why or how perceptual understanding might vary across different persons, or for the same person across different contexts. We then build on this understanding with a discussion of what we may learn from eye tracking studies with moving images. As we will show, eye tracking can offer detailed measurements of how the eye attends to specific instances, movements, and points within sequences of action. This can reveal patterns of attention across a sample of participants, towards specific types of action. Yet eye tracking is limited in that while it can tell us what participants’ eyes are attending to, it cannot easily tell us why, what they are experiencing, what their affective states are, nor how their actions are shaped by the wider social, material, sensory and atmospheric environments of which they are part. Therefore in the subsequent section we turn to phenomenological anthropology, and draw on the possibilities provided by the theoretical-ethnographic dialogue that is at the core of anthropological research, to suggest how the propositions of eye tracking studies might be situated in relation to the ongoingness and movement of complex environments.

We argue that such an interdisciplinary approach, which brings together monitoring and measurement with qualitative and experiential research, is needed to generate understandings of not only what people view but of how these viewing practices and experiences become relevant as part of the ways in which they both perceive and participate in the making of everyday worlds. However, we end the article with a note about the relative complexities of working across disciplines, and in particular between those that measure and those that use empathetic and collaborative modes of understanding and knowing, which can be theorised as part of the ways film is experienced (Bordwell and Thompson 2010; Pink 2013). For a review of how these issues may relate to broader issues about film culture, eye tracking and the moving image, readers are also referred to the manuscripts in this special issue by Redmond and Batty (2015), and Smith (2015).

Visual Resolution, Perception and the Human Eye

To enable visual perception the human eye has cone photoreceptors distributed across the retina which enable wide field binocular visual perception of about 180 degrees (Leigh and Zee 2006). In the central fovea region of the eye, cone photoreceptors are much more densely packed, and our resulting high acuity vision covers only about 2-3 degrees of visual angle (Leigh and Zee 2006). Visual angle is a convenient way to understand the relationship between the actual size of an object and viewing distance; for example, our foveal acuity is approximately equivalent to the width of our thumb held at about 57 cm (at this distance 1 cm represents 1 degree of visual angle). This means that to view visual information in detail it is often necessary to direct the gaze of our eyes to different parts of a scene, and this is typically done with either ballistic eye movements termed saccades, or much slower smooth pursuit eye movements such as when we follow the movement of a slow object in the distance (Martinez-Conde et al. 2004). Saccades are commonly broken down into two main types that are of high value for interpreting how viewers might perceive their environment: reflexive saccades, mainly thought to be driven by image salience (also termed exogenous control), and volitional saccades (endogenous control), where a viewer’s internal decision making directs attention through top-down mechanisms to where the gaze should be directed within a scene or movie sequence (Martinez-Conde et al. 2004; Parkhurst et al. 2002; Tatler et al. 2014; Pashler 1998; Smith 2013). Thus eye movements can be, in very broad terms, described as ‘bottom up’ processing when the eye makes reflexive saccades to salient stimuli within a scene, or ‘top down’ when a viewer uses their volitional control to direct where the eye should look, and both types of saccade are important for understanding how we interact with complex scenes in everyday life.
For example, on entering a café we might casually gaze at the wonderful variety of cakes with reflexive saccades to all the highly colourful icings; but when a friend says to ‘try the chocolate cake’ we direct our eyes only to cakes of chocolate brown colour using volitional saccades. Interestingly, these different types of saccadic eye movements are likely to involve different cortical processing of information (Martinez-Conde et al. 2004), potentially allowing for complex multi modal processing that incorporates the rich and dynamic environment experienced when viewing a movie. It is likely that both these mechanisms operate whilst subjects view a film, and the extent to which each mechanism dominates during a particular film sequence may depend upon factors like visual design, narrative, audio input and cinematographic style, as well as the individual experience or demographic profile of observers.
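The relationship between object size, viewing distance and visual angle described above can be checked with a few lines of arithmetic. The following is a minimal sketch rather than part of any cited method: the function names are our own, and the 300 cm viewing distance is a hypothetical value chosen purely for illustration.

```python
import math


def visual_angle_deg(size_cm: float, distance_cm: float) -> float:
    """Visual angle, in degrees, subtended by an object viewed face-on."""
    return math.degrees(2 * math.atan(size_cm / (2 * distance_cm)))


def size_for_angle_cm(angle_deg: float, distance_cm: float) -> float:
    """Physical size, in cm, that subtends a given visual angle at a distance."""
    return 2 * distance_cm * math.tan(math.radians(angle_deg) / 2)


# Rule of thumb from the text: 1 cm viewed at 57 cm is roughly 1 degree.
one_cm_at_57 = visual_angle_deg(1.0, 57.0)

# Hypothetical example: width of a 2 degree foveal window on a screen
# viewed from 300 cm (the distance is an assumption for illustration).
foveal_width_cm = size_for_angle_cm(2.0, 300.0)

print(f"1 cm at 57 cm subtends {one_cm_at_57:.3f} degrees")
print(f"2 degrees at 300 cm spans {foveal_width_cm:.1f} cm")
```

Under these assumptions only a patch of the screen roughly ten centimetres wide falls within high acuity vision at any one moment, which is why gaze must be redirected frequently while viewing.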

The fact that we typically only perceive the world in low resolution at any one point in time can be easily illustrated with an eye chart in which letters in different parts of our visual field are scaled to make the letters equally legible when a subject fixates their gaze on a central fixation spot, or simulated by selectively Gaussian blurring a photograph such that it matches how we see detail at any one point in time (Figure 1). Human subjects typically shift their gaze about 3 times a second in many real world type scenarios in order to build up a detailed representation of the visual environment (Martinez-Conde et al. 2004; Tatler et al. 2014; Yarbus 1967). To efficiently direct the fovea to different parts of a visual scene, the human eye usually makes saccades, which also require a shift of the observer’s attention (Kustov et al. 1996; Martinez-Conde et al. 2004). One way to record subject gaze is to use a video-based eye tracking system that makes use of the different reflective properties of the eye to infrared radiation (Duchowski 2003), using a wavelength of radiation that is both invisible to the test subject and does not damage the eye. This non invasive technique thus enables very natural behavioural responses to be collected from a wide range of subjects. When the eye is illuminated by infrared light, which is typically provided by the eye tracking equipment, the light enters the lens and is strongly reflected back by the retina, providing a high contrast signal for an infrared camera to record. Some of the carefully placed infrared lights also reflect off the cornea of the eye, which provides a constant reference signal that enables eye tracker software to disentangle minor head movements from the actual eye movements of a subject. A subject is first calibrated to a grid stimulus of known spatial dimensions (Dyer et al. 2006), and then when test images are viewed it is possible to accurately quantify the different regions of a scene to which the subject pays attention, the sequence order of this attention, and thus also what features of a scene may escape the direct visual attention of a viewer (Duchowski 2003). The use of this non invasive technique then directly enables the measurement of subject attention to the different components of a stimulus (Figure 2), and has been extensively employed with static images in many fields including medicine, forensics, face processing, advertising, sport and perceptual learning (Dyer et al. 2006; Horsely 2014; Russo et al. 2003; Tatler 2014; Vassallo et al. 2009; Yarbus 1967).

Figure 1. The way our eye samples the world means that only the central fovea region is viewed in detail. The left hand image shows letters scaled to equal legibility when a subject fixates gaze on the central dot, and the right hand image is a photographic reconstruction of how an eye would typically resolve detail of the Sydney Harbour Bridge at one point in time.


In recent times there has been a growing appreciation that to understand how the human visual system and brain process complex information, the use of moving images has significant advantages since these stimuli may more accurately represent the very complex and dynamic visual environments in which we typically operate (Tatler et al. 2011). For example, when the eyes of a subject are tracked whilst driving a car, it can be observed that the gaze of subjects tends to be directed ahead of the responding action that a driver will take (Land and Lee 1994), and in other real life activities like making a cup of tea test subjects also tend to fixate on particular objects before an action like picking up an object (Land et al. 1999). This shows visual processing is often dynamic and may be influenced by the top down volitional goals of a subject, whilst static images may not always best represent how subjects’ actions are informed by visual input in a dynamic situation (Tatler 2014). Interestingly, the capacity of subjects to visually anticipate events may be linked to performance or experience at a given action, as elite cricket batsmen viewing action can more efficiently predict the location where a ball will bounce in advance of the event, providing significant advantages for facing fast bowling where decisions must be made very quickly and accurately (Land and McLeod 2000). Thus there is evidence that visual perception and eye movements for moving images may be influenced by top down mechanisms and experience, as well as bottom up salience driven mechanisms of visual processing (Tatler 2014).

Viewer gaze and attention in dynamic environments can also be significantly influenced by the actions of other people who may be viewed within a scene. For example, when viewing a simple magic trick where an experienced magician in a video waves a hand to make an object disappear, the gaze direction of subjects viewing the video is heavily influenced by the actual gaze direction of the magician in the video clip (Tatler and Kuhn 2007). If the magician appears to pay attention to his waving hand then subjects follow this misdirection, the manipulation performed with the other hand is not detected, and the magic trick is successful. However, this pattern changes if the magician’s gaze attends to the hand performing the apparent magical act, and then the trick is readily detected by observers. This simple but highly effective demonstration shows how viewer experience is not only driven by reflexive bottom up salience signals present in complex images, but may also be influenced by several top down and/or contextual factors. The effect of dynamic complex environments on subject eye movements has also been observed in demonstrations of how people might encounter each other and either divert or direct their gaze depending upon prior experience, the perception of threat and/or the chance of a collision (Jovancevic-Misic and Hayhoe 2009). Other evidence of top down type influences on observer gaze behaviour comes from our understanding of how instructions or narrative may influence where a subject looks (Land and Tatler 2009; Tatler 2014; Yarbus 1967). For example, in the classic eye movement experiments done by Yarbus (1967), in which he presented static images to test subjects, a variety of different instructions were provided for viewing the painting ‘The Unexpected Visitor’ by Ilya Repin.
These instructions included estimating the material circumstances of the people depicted in the painting, their ages, or their clothing. A very different set of saccades and fixations was observed for each instruction, and for a free viewing situation that might be taken as a condition mainly driven by bottom up salience factors, showing that top down viewing goals strongly influence the way in which eye gaze is directed (Tatler 2014; Yarbus 1967).

Eye Tracking For Understanding Dynamic and Complex Visual Information

Whilst these clever, and comparatively complex, evaluations of visual perception are currently teaching us a lot about human visual performance and viewer experience, the current rapid advances in computer technology and eye tracking are now starting to enable the testing of how subjects view very complex dynamic environments as encapsulated in movies (Mital et al. 2011; Smith and Henderson 2008; Smith and Mital 2013; Smith et al. 2012; Treuting 2006; Vig et al. 2009). This potentially allows for new insights into increasingly real world type viewer experience, how the visual system potentially processes very complex information, and how viewers from different demographics may interpret information content in films. For example, some recent work has looked at viewer attention within movies and observed high levels of attention to faces (Treuting 2006), revealing behaviours consistent with previous work that used static images (Vassallo et al. 2009; Yarbus 1967), and a wealth of opportunities are becoming available for better understanding real world visual processing.

Figure 2. When we view an image our eyes often fixate on key areas of interest for short periods of about a third of a second, and then the eyes may make ballistic shifts (saccades) to other features. When a typical subject viewed sequential images from the film ‘UP’, fixations (green circles) mainly centred on the respective faces of main characters, whilst lines between fixations show the direction of respective saccades [image from Craig Batty, Adrian Dyer, Claire Perkins and Jodi Sita. 2014 Seeing Animated Worlds: Eye Tracking and the Spectator’s Experience of Narrative (Bloomsbury, 2015) with permission].


A current issue in interpreting eye movement data for subjects viewing a film is how such a large volume of data can be managed and statistically separated in order to interpret viewer experience. One initial solution is gaze plot analysis, which shows the average attention of a number of subjects to a particular scene (Fig. 3). Investigations on still images using gaze plot analyses have indicated a strong tendency for central bias to an image that is largely independent of factors like subject matter or composition (Tatler 2007). Studies on moving images appear to confirm a tendency for viewing restricted parts of the overall image in detail (Dorr et al. 2010; Goldstein et al. 2007; Mital et al. 2011; Smith and Henderson 2008; Tosi et al. 1997). This may hold important implications for data compression type algorithms where large amounts of image data are streamed to a variety of different mobile viewing devices, since certain information does not have to be displayed at high resolution given the limited resolution of the human eye (Fig. 1); certain parts of the movie may even be modified to enhance the viewing experience for visually impaired viewers (Goldstein et al. 2007). Despite the qualitative value of gaze plot displays, quantitative analyses can be better facilitated by allocating Areas of Interest (Fig. 4) to certain components of a scene that are hypothesised to be of high value for dissecting different theories about information processing of moving images. For example, one of the current issues in understanding how eye tracking can inform film culture, and how movies can be a useful stimulus for understanding visual behaviour, is having a method that can explore the potential effects of narrative, which is a hypothesised top down or endogenous control on viewer gaze behaviour, when subjects are freely viewing a movie to enable natural behaviour (Smith 2013).
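As a rough illustration of how attention might be averaged across a group of viewers, the sketch below divides a frame into a coarse grid and computes, for each cell, the mean share of fixation time across viewers. It is a simplified stand-in for the displays produced by commercial eye tracking software, not a description of the analysis used in the studies cited above. The function name, grid size, frame dimensions and fixation values are all hypothetical.

```python
def mean_attention_grid(fixations_by_viewer, frame_w, frame_h, cols=3, rows=3):
    """Average each viewer's share of fixation time per grid cell.

    fixations_by_viewer: one list per viewer of (x, y, duration_ms) tuples,
    with x and y in pixels measured from the top-left corner of the frame.
    """
    grid = [[0.0] * cols for _ in range(rows)]
    viewers_counted = 0
    for fixations in fixations_by_viewer:
        total_ms = sum(duration for _, _, duration in fixations)
        if total_ms <= 0:
            continue  # skip viewers with no usable fixations
        viewers_counted += 1
        for x, y, duration in fixations:
            # Clamp so fixations on or beyond the frame edge land in border cells.
            col = min(max(int(x / frame_w * cols), 0), cols - 1)
            row = min(max(int(y / frame_h * rows), 0), rows - 1)
            # Normalise per viewer so each viewer contributes equally.
            grid[row][col] += duration / total_ms
    if viewers_counted:
        grid = [[cell / viewers_counted for cell in row] for row in grid]
    return grid


# Invented fixations for two hypothetical viewers of a 1920 x 1080 frame.
viewer_a = [(960, 540, 300), (1000, 500, 300), (100, 100, 400)]
viewer_b = [(900, 600, 500), (1800, 1000, 500)]
grid = mean_attention_grid([viewer_a, viewer_b], frame_w=1920, frame_h=1080)

for row in grid:
    print(" ".join(f"{cell:.2f}" for cell in row))
```

Normalising within each viewer before averaging is a design choice: it prevents a viewer with unusually long fixations from dominating the group picture, at the cost of discarding differences in total viewing time.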

Figure 3. Gaze plot shows the mean attention of a number of viewers (n=12) to a particular scene. In this case faces capture most attention, consistent with previous reports (Yarbus 1967; Vassallo et al. 2009). [image from Craig Batty, Adrian Dyer, Claire Perkins and Jodi Sita. 2014 Seeing Animated Worlds: Eye Tracking and the Spectator’s Experience of Narrative (Bloomsbury, 2015) with permission].


Recently one study has tackled this question by using a montage sequence from the animated movie ‘Up’ (Pete Docter, 2009) to explore whether it is possible to collect empirical evidence that supports modulation between bottom up and top down mechanisms. The animation montage is a high value study case as it encapsulates a lifetime of narrative story within a 262s montage film sequence that contains no dialogue (Batty 2011), and the overall salience of the two principal characters ‘Carl’ and ‘Elli’ depicted within the montage is fairly consistently matched due to the control afforded by animation production. For example, in an initial opening scene where these two characters are first encountered by subjects viewing the film, an almost identical percentage of viewing time was directed to Carl and Elli respectively; however, as the montage unfolds with a life story narrative of marriage, dreams of children, miscarriage, dreams of travel, illness and death, there is a significant difference in the amount of attention paid to the respective characters by viewers at different stages in the montage (Batty et al. 2014). This suggests that the influence of top down type processing on the overall salience of complex images, as has been observed in some studies using short motion displays in laboratory type conditions (Jovancevic-Misic and Hayhoe 2009; Tatler and Kuhn 2007; Tatler 2014), is a promising avenue of investigation for movie studies, provided it is possible to design protocols for controlling the many factors that can influence image salience (Parkhurst et al. 2002; Martinez-Conde et al. 2004; Tatler et al. 2014).
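To make the idea of Area of Interest quantification concrete, the sketch below counts fixations and sums fixation durations within rectangular regions, then expresses each region's share of the time spent on any region. It is a minimal illustration under stated assumptions, not the procedure used by Batty et al. (2014): the class and function names, the rectangle coordinates assigned to ‘Carl’ and ‘Elli’, and the fixation values are all invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AreaOfInterest:
    """A static rectangular region of the frame, in pixel coordinates."""
    name: str
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x: float, y: float) -> bool:
        return self.left <= x < self.right and self.top <= y < self.bottom


def summarise_fixations(fixations, areas):
    """Count fixations and total duration (ms) falling inside each area."""
    summary = {area.name: {"count": 0, "duration_ms": 0.0} for area in areas}
    for x, y, duration_ms in fixations:
        for area in areas:
            if area.contains(x, y):
                summary[area.name]["count"] += 1
                summary[area.name]["duration_ms"] += duration_ms
                break  # assign each fixation to at most one area
    return summary


def dwell_share(summary):
    """Share of total within-area fixation time attributed to each area."""
    total = sum(entry["duration_ms"] for entry in summary.values())
    return {
        name: (entry["duration_ms"] / total if total else 0.0)
        for name, entry in summary.items()
    }


# Hypothetical regions around two characters in a 1920 x 1080 frame.
areas = [
    AreaOfInterest("Carl", left=300, top=200, right=600, bottom=500),
    AreaOfInterest("Elli", left=1200, top=200, right=1500, bottom=500),
]

# Invented fixations as (x, y, duration_ms); one falls outside both regions.
fixations = [
    (400, 300, 250), (1300, 320, 350), (420, 350, 250),
    (900, 900, 200), (1350, 300, 150),
]

summary = summarise_fixations(fixations, areas)
shares = dwell_share(summary)
print(summary)
print(shares)
```

In a moving image the regions would need to follow the characters from frame to frame; the static rectangles here correspond to a single shot in which the characters stay roughly in place. Comparing such shares between scenes is one way a shift in attention from one character to another could be expressed numerically.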

Figure 4. Areas of interest can be programmed to quantify the number and respective duration of fixations to key components within a scene of a movie, which may allow for the dissection of how factors like narrative influence viewer behaviour [image from Craig Batty, Adrian Dyer, Claire Perkins and Jodi Sita. 2014 Seeing Animated Worlds: Eye Tracking and the Spectator’s Experience of Narrative (Bloomsbury, 2015) with permission].


Yet because eye tracking can only tell us part of the story (that is, what people look at, and not how and why these ways of looking emerge and are enacted), other qualitative research approaches such as those used in visual and sensory ethnography (Pink 2013; Pink 2015) are needed to put eye tracking data into context. This involves approaching viewing and the practices of vision that it entails as situated activities, and as part of a broader experiential repertoire beyond the eye. The subjectivity and selectivity of viewing that the studies outlined above have evidenced, once documented and measured, can only be properly understood as emergent from particular (and always complex) environmental conditions and embodied experiences. In the next section we therefore turn to anthropological approaches to vision and the environment in order to show how this might be achieved. However, before proceeding we note that when working across disciplines, there is inevitably a certain amount of conceptual slippage. Here this means that whereas we have so far suggested that eye tracking enables our understanding of how complex environmental information is processed, in the next sections we refigure this way of thinking to consider how human perception and viewer experience are constituted in relation to the affordances of complex environments of which they are also part.

Situating Viewing as Part of Complex Environments

The environment, as a concept, is slippery and is used to different empirical and political ends in different contexts. As the anthropologist Tim Ingold has emphasised, in contemporary discourses ‘the environment’ is often referred to as an entity and as something that we exist as separate from. Indeed, this idea is present in our discussion above, where we have considered how eye tracking studies might better show us how we process complex environmental information. As Ingold expresses it, this means we are ‘inclined to forget that the environment is, in the first place, a world we live in, and not a world we look at’. He argues that ‘We inhabit our environment: we are part of it; and through this practice of habitation it becomes part of us too’ (Ingold 2011). Following this approach the environment can be understood as an ecology that humans are part of and with which we, and the ways we view, see and experience, are mutually constituted. This however does not simply mean that ‘we’ as humans are encompassed by the environment; it means that the environment is co-constituted by us and our relationships with other constituents, which for our purposes in this article we would emphasise include film, images, art, technologies, other humans, the weather and the built environment (as well as much more). As advanced by Ingold (Ingold 2000, 2010) and art historian Barbara Stafford (Stafford 2006), approaches that critique linguistic and semiotic studies invite an analysis which acknowledges that, as Stafford puts it, ‘when you open your eyes and actively interrogate the visual scene, what you see is that aspect, or the physical fragments, of the environment that you perform’ (Stafford 2006). This also means that the experience of film does not simply involve us looking at something that is external to us; rather, it is through the affordances of film, in relation to the other constituents of our environments/worlds, that viewing becomes meaningful to us.
In this interpretation the use of ‘we’ derives from the development of a universal theory of human perception and our relationship to a (complex) environment. Yet, as we explain in the next section this rendering does not dismiss the idea that different people may often perceive the same information differently, and indeed to the contrary invites us to study precisely how and why difference emerges.

If we take Ingold’s approach further, to focus in on how meanings are generated through our engagements with and experiences of visual images, we can gain an appreciation of how the measurement and monitoring patterns that emerge from eye tracking studies are materialisations or representations not just of how the eye (or the mind) responds to the moving image. Rather they can be understood as standing for (but not actually explaining the meaning of) what people do with the moving image. Building on philosophical and other traditions emerging from the work of Merleau-Ponty, Gibson and Jonas, Ingold has argued that human perception, learning and knowing emerge from movement, specifically as we move through our environments and engage with the affordances of those other things and processes we encounter (Ingold 2000). With regard to art, he has used this approach to suggest that therefore …

Should the drawing or painting be understood as a final image to be inspected and interpreted, as is conventional in studies of visual culture, or should we rather think of it as a node in a matrix of trails to be followed by observant eyes? Are drawings or paintings of things in the world, or are they like things in the world, in the sense that we have to find our ways through and among them, inhabiting them as we do the world itself? (Ingold 2010; p16)

If we transfer this idea to the question of how we view film, we might then ask how we, as viewers, inhabit film, and what eye tracking studies can tell us about these forms of habitation. If we consider the relationship established between the viewer (or the viewer’s eyes) and the film by eye tracking visualisations such as those shown in the earlier sections of this article, we can begin to think of how the movement of the eye and the movement of the film become entangled. Indeed, while the film and the eye will both inevitably continue to move, the question becomes not simply how the composition and action on the screen influence the movement of the eye, but rather how the eye selects the aspects of the composition and action of the screen with which to move. By taking this perspective, we are able to remove something of the technological determinism that underpins assumptions that eye tracking studies might enable film and advertising organisations to better influence viewing behaviours. Instead it directs us towards considering what eye tracking studies might tell us about what people do when they view, and how this can inform us about how they inhabit a world in which film, and more generally the moving image, is a ubiquitous presence.

The work and arguments discussed thus far in this section have focused on interpreting the question of how, at a general level, people see when they are viewing moving images. The theories advanced as yet however neither explain nor discuss the usefulness of attending to the patterning of eye tracking studies. Moreover the examples and visualisations we have shown of eye tracking studies in the earlier sections of this article were undertaken with a sample of people who were likely to have similar viewing perspectives, and as might therefore be expected showed distinct patterns in the ways that people view particular information. Indeed, as far as we know, the data that would be needed to tell us to what extent such viewing patterns are universal (that is, supported by studies and theories of the ways in which the human brain processes visual information), and to what extent they were situationally and biographically constituted for this particular group of participants, do not yet exist. Such work would be of high value given the increasing globalisation of both entertainment industries and forms of activism that use visual media, where films may be distributed in markets distant from the original context in which audience experience is understood. Indeed, studies of how people learn to look and know, undertaken in culturally specific contexts, definitely reveal that where we look and what we see is contingent on processes of learning and apprenticeship, and therefore specific to complex environments.

Vision, Learning and Knowing

Eye tracking studies have shown us that there are sometimes similarities and patterns in the ways people view and remember complex images (Norton and Stark 1971), although if present such patterns are easily changed through instruction (Yarbus 1967; Tatler 2014). We have seen in the earlier sections of this manuscript that participants in studies have consistently fixed their gaze on the faces of film characters (Figs 2, 3), and that visual attention may become focused on a film character whose story line commands (or affords) particularly powerful affective and/or empathetic connections for viewers. Further eye tracking research would be needed to underpin any proposals that such ways of viewing are both gendered and culturally specific; however, existing research in visual and media anthropology indicates that this is likely to be the case. Two bodies of literature are relevant here: first, the applied visual and media anthropology literature, and second, the anthropology of vision.

Applied visual and media anthropology studies (Pink 2007) focus on using anthropological understandings of media, along with audiovisual interventions (often in the form of filmmaking processes and film products), to work towards new forms of social and public awareness, and societal change. This work draws on and advances a strand in film studies developed in the work of Laura Marks, who has advanced the idea of the ‘embodied viewing experience’ (2000: 211). Marks, whose work focuses on intercultural cinema, has argued that as ‘a mimetic medium’ cinema is ‘capable of drawing us into sensory participation with its world’ (Marks 2000: 214). The notion of empathy as a route towards creating intercultural understanding through film is also increasingly popular in the visual anthropology literature (discussed in Pink 2015). While on the whole there has been insufficient research into the ways in which people view intervention films of this kind, one study that has been undertaken suggests how viewer attention, and importantly viewers’ capacity to engage with and remember film narrative, can depend on the ways in which they are able to affectively or empathetically engage with the experiences of film characters. Susan Levine’s media anthropology study of how viewers discussed a film made as part of a South African HIV/AIDS intervention campaign, and which drew on local narratives to communicate the central message, is a good example (Levine 2007). Levine (unsurprisingly) found that participants engaged with the stories of film characters that followed locally relevant narratives, thus generating important lessons for filmmaking campaigns of this kind, where it is often difficult to communicate generic health messages to local audiences. The bridge between this type of anthropological understanding and a capacity to map viewer attention to faces and expressions within visual representations (Vassallo et al. 2009) may allow for more comprehensive understandings of why film is such a powerful medium for communication.

Anthropological studies of vision provide further evidence of the importance of attending to how seeing is situated. Indeed, when vision is understood as a practice rather than as a behavior, it is not just a situated practice but one that is learned through participation. The anthropologist Cristina Grasseni has developed a theory of what she calls ‘skilled vision’ through which to explain this (Grasseni 2004, 2007, 2011). As she puts it:

The “skilled visions” approach considers vision as a social activity, a proactive engagement with the world, a realm of expertise that depends heavily on trained perception and on a structured environment (Grasseni 2011).

Emphasizing that skilled visions are ‘positional, political and relational’ as well as sensuous and corporeal, Grasseni points out that ‘Because skilled visions combine aspects of embodiment (as an educated capacity for selective perception) and of apprenticeship, they are both ecological and ideological, in the sense that they inform worldviews and practice’ (Grasseni 2011). As Pink has shown through her work on the Spanish bullfight, what one sees when viewing the performance is highly contingent on how one has learned to view, on one’s own empathetic embodied ways of sensorially and affectively ‘feeling’ the performance at which a visual representation was created, and on how one’s existing ways of knowing and understanding the world can inform perception (Pink 1997; 2011). For example, consider the different ways in which Figure 5, or a film sequence around the same performance, would be interpreted by a bullfighting fan and an animal rights activist. Each will have learned how and what to know about this performance through different trajectories. Whilst an eye tracking investigation of the respective subjects might show somewhat similar patterns (especially if bottom up mechanisms dominate), the semantic interpretation of the visual input by the respective viewers may be completely different. Whether such information content can be assessed through evaluation of the bottom up or top down mechanisms involved in visual processing will be a major challenge for the interpretation of information as complex as that typically perceived in a movie.

Conclusion

Figure 5. How emotive content, as is common in many films, may influence the perception of visual images even if the same information is present to viewers remains a major topic for exploration. For example, we know that the bullfight is interpreted, and affectively experienced, very differently when viewed by bullfight fans and animal rights activists. We also know that learning how to view the bullfight, as a bullfight fan, is a process of cultural apprenticeship (see for example Pink 1997). Consider how, for the above image, the action of a bullfight could promote very different visual behavior depending upon cultural context, whether a subject was a bullfighting fan or an animal rights activist, whether the representation was depicted as animation instead of real life, or whether it was presented as motion rather than as a still image. Copyright: Sarah Pink.

Bringing together measurement and monitoring data with anthropologically informed ethnographic ways of knowing, which are always collaboratively crafted and sensorially and tacitly known, is increasingly common. For instance, in energy research a number of projects seek to combine ethnographic and energy consumption measurement data (Cosar et al. 2013). Such an approach has not yet been integrated into eye tracking studies of movies, yet this would be the next step if we want to understand better the significance and relevance of the types of data and knowledge that eye tracking studies can offer for understanding film audiences. This, however, presents certain challenges, which impinge on, but are not necessarily unique to, the use of eye tracking data in audience research. The first challenge is to generate sufficient interdisciplinary understanding between the approaches involved. This article has intended to initiate that process. That is, it has explained how eye tracking and anthropological-ethnographic (that is, at once theoretical and practical) approaches offer different, and differently theorised, perspectives on the ways in which people look at and participate in the viewing of film. It has simultaneously suggested, however, that these different approaches and disciplines offer something to each other that enables new questions to be asked, and therefore makes it possible to develop deeper understandings of how audiences view film.

Future work testing human visual behaviour with complex stimuli, as are typically present in movies, may help build our understanding of how humans sometimes process very complex information to build an understanding of our surrounding world, but sometimes also miss salient information in complex moving images, as in the ‘Gorillas in our Midst’ study. Current theories suggest that perceptual blindness to salient and recognisable stimuli occurs when our attention is captured by other competing stimuli that impose a cognitive load to process (Simons and Chabris 1999; Levin et al. 2000; Memmert 2006), but more fully exploring the effects of narrative or instructions, character gaze and other potential top down mechanisms will likely make fruitful contributions to our knowledge of perceptual blindness. Indeed, as discussed above in relation to anthropological and ethnographic factors, factors like experience do appear to modulate the ability of subjects to detect a gorilla in a perceptual blindness type test (Memmert 2006), suggesting that future investigations on eye tracking and movies should consider the broad range of human experience that can influence our perception. This type of research is also likely to provide richer understandings in some ethnographic studies, as researchers will have, possibly for the first time, access to precise quantitative data on whether an observer actually failed to even look at certain objects in a scene, or whether such information, like an unexpected gorilla in a basketball game, was viewed but not directly perceived (Memmert 2006). Many individual scenes within a film are short, typically of about 4 s duration, and so it is often only possible for viewers to process a small percentage of the entire visual presentation in detail, especially in cases where movies are subtitled (Smith 2013). This means that elements of a film that might be essential to the complete comprehension of the narrative story line may be easily missed by a percentage of an audience, depending upon their individual knowledge base, linguistic skills, attention and motivation. Eye tracking potentially offers film makers a useful vehicle to test different demographic groups, to better understand how different components of scenes might be constructed to enhance viewer experience, and also to build our understanding of how we process very complex environmental information.

Acknowledgements. We are very grateful to Dr Craig Batty, Dr Claire Perkins and Dr Jodi Sita for discussions and permission to use images from their collaborative work with one of us (AGD), and for broader discussions with members of the Eye Tracking of the Moving Image research group. AGD acknowledges funding support from the Australian Research Council (LE130100112) for eye tracking equipment. We are grateful to Dr Lalina Muir for her careful proofreading of the manuscript.

REFERENCES

Batty, Craig. 2011. Movies That Move Us: Screenwriting and the Power of the Protagonist’s Journey. Basingstoke: Palgrave Macmillan.

Batty, Craig, Dyer, Adrian, G., Perkins, Claire, and Sita, Jodi. 2015. Seeing Animated Worlds: Eye Tracking and the Spectator’s Experience of Narrative (Palgrave, forthcoming).

Cosar Jorda, P, Buswell, RA, Webb, LH, Leder Mackley, K, Morosanu, R, and Pink, Sarah. 2013. ‘Energy in the home: Everyday life and the effect on time of use.’ In The Proceedings of the 13th International Conference on Building Simulation 2013. Chambery, France. 25-28/8/2013.

Docter, P. 2009. Up. Disney-Pixar Motion Film.

Dorr, M, Martinetz, T, Gegenfurtner, KR, and Barth, E. 2010. ‘Variability of eye movements when viewing dynamic natural scenes.’ Journal of Vision 10 (10): 28, 1-17.

Duchowski, Andrew. 2003. Eye tracking methodology: theory and practice. London: Springer-Verlag.

Dyer, Adrian, G., Found, Brian, and Rogers, Doug. 2006. ‘Visual attention and expertise for forensic signature analysis.’ Journal of Forensic Sciences 51: 1397–1404.

Goldstein, Robert, B., Woods, Russell, L., and Peli, Eli. 2007. ‘Where people look when watching movies: Do all viewers look at the same place?’ Computers in Biology and Medicine 37 (7): 957-964.

Grasseni, Cristina. 2004. ‘Video and ethnographic knowledge: skilled vision in the practice of breeding.’ In Working Images, edited by S Pink, L Kürti, and AI Afonso, 259-288. London: Routledge.

Grasseni, Cristina. 2007. Skilled Visions. Oxford: Berghahn.

Grasseni, Cristina. 2011. ‘Skilled Visions: Toward an Ecology of Visual Inscriptions.’ In Made to be Seen: Perspectives on the History of Visual Anthropology, edited by M. Banks and J. Ruby. Chicago: University of Chicago Press.

Horsley, Mike. 2014. ‘Eye Tracking as a Research Method in Social and Marketing Applications.’ In Current Trends in Eye Tracking Research, edited by M Horsley et al., 179-182. London: Springer.

Ingold, Tim. 2000. The Perception of the Environment. London: Routledge.

Ingold, Tim. 2010. ‘Ways of mind-walking: reading, writing, painting.’ Visual Studies, 25 (1): 15–23

Ingold, Tim. 2011. Being Alive. Oxford: Routledge. p 95.

Jovancevic-Misic, Jelena, and Hayhoe, Mary. 2009. ‘Adaptive Gaze Control in Natural Environments.’ Journal of Neuroscience 29 (19): 6234–6238. DOI:10.1523/JNEUROSCI.5570-08.2009.

Kustov, Alexander, A., and Robinson, David Lee. 1996. ‘Shared neural control of attentional shifts and eye movements.’ Nature 384: 74–77.

Levine, Susan. 2007. ‘Steps for the Future: HIV/AIDS, Media Activism and Applied Visual Anthropology in Southern Africa.’ In Visual Interventions, edited by S. Pink, 71-89. Oxford: Berghahn.

Marks, Laura. 2000. The Skin of the Film: Intercultural Cinema, Embodiment, and the Senses. Durham and London: Duke University Press

Martinez-Conde, Susana, Macknik, Stephen, L., and Hubel, David, H. 2004. ‘The role of fixational eye movements in visual perception.’ Nature Reviews Neuroscience 5: 229–240.

Memmert, Daniel. 2006. ‘The effects of eye movements, age, and expertise on inattentional blindness.’ Consciousness and Cognition 15 (3): 620–627.

Mital, Parag, K., Smith, Tim, J., Hill, Robin, L., and Henderson, John, M. 2011. ‘Clustering of gaze during dynamic scene viewing is predicted by motion.’ Cognitive Computation 3: 5–24.

Nodine, Calvin, F., Mello-Thoms, Claudia, Kundel, Harold, L., and Weinstein, Susan, P. 2002. ‘Time course of perception and decision making during mammographic interpretation.’ American Journal of Roentgenology 179: 917–923.

Norton, David, and Stark, Lawrence. 1971. ‘Scanpaths in eye movements during pattern perception.’ Science 171: 308–311.

Parkhurst, Derrick, Law, Klinton, and Niebur, Ernst. 2002. ‘Modeling the role of salience in the allocation of overt visual attention.’ Vision Research 42: 107–123.

Pashler, Harold. 1998. Attention. Hove, UK: Psychology Press Ltd.

Russo, Francesco, Pitzalis, Sabrina, and Spinelli, Donatella. 2003. ‘Fixation stability and saccadic latency in elite shooters.’ Vision Research 43: 1837–1845.

Pink, Sarah. 1997. Women and Bullfighting. Oxford: Berghahn.

Pink, Sarah. 2007. (ed) Visual Interventions. Oxford: Berghahn.

Pink, Sarah. 2011. ‘From Embodiment to Emplacement: re-thinking bodies, senses and spatialities.’ In Sport, Education and Society (SES), special issue on New Directions, New Questions. Social Theory, Education and Embodiment 16 (3): 343-355.

Pink, Sarah. 2013. Doing Visual Ethnography, 3rd edition. London: Sage.

Pink, Sarah. 2015. Doing Sensory Ethnography, 2nd edition. London: Sage.

Simons, Daniel, J., and Chabris, Christopher, F. 1999. ‘Gorillas in our midst: sustained inattentional blindness for dynamic events.’ Perception 28 (9): 1059-1074.

Smith, Tim, J., and Henderson, John, M. 2008. ‘Edit blindness: The relationship between attention and global change blindness in dynamic scenes.’ Journal of Eye Movement Research 2: 1–17.

Smith, Tim, J. 2013. ‘Watching you watch movies: Using eye tracking to inform cognitive film theory.’ In Psychocinematics: Exploring Cognition at the Movies edited by A. P. Shimamura, 165-191. New York: Oxford University Press

Smith, T, Levin, D, and Cutting J. 2012. ‘A Window on Reality: Perceiving Edited Moving Images.’ Current Directions in Psychological Science 21(2): 107-113. doi: 10.1177/0963721412437407

Smith, Tim, J., and Mital, Parag, K. 2013. ‘Attentional synchrony and the influence of viewing task on gaze behaviour in static and dynamic scenes.’ Journal of Vision 13 (8): 16.

Stafford, Barbara Maria. 2006. Echo Objects: the Cognitive Work of Images. Chicago: University of Chicago Press.

Tatler, Ben, W. 2007. ‘The central fixation bias in scene viewing: Selecting an optimal viewing position independently of motor biases and image feature distributions.’ Journal of Vision 7 (14): 4, 1–17. http://www.journalofvision.org/content/7/14/4, doi:10.1167/7.14.4.

Tatler, Ben, W. 2014. ‘Eye Movements from Laboratory to Life.’ In Current Trends in Eye Tracking Research edited by Horsley et al., p17-35.

Tatler, Ben, W., and Kuhn, Gustav. 2007. ‘Don’t look now: The magic of misdirection.’ In Eye Movements: A window on mind and brain, edited by R van Gompel, M Fischer, W Murray and R Hill, 697–714. Amsterdam: Elsevier.

Tatler, Ben, W., Hayhoe, Mary, M., Land, Michael, F., and Ballard, Dana, H. 2011. ‘Eye guidance in natural vision: Reinterpreting salience.’ Journal of Vision 11 (5): 1–23. http://www.journalofvision.org/content/11/5/5, doi:10.1167/11.5.5.

Tatler, Ben, W., Kirtley, Claire, Macdonald, Ross. G., Mitchell, Katy, MA., and Savage, Steven, W. 2014. ‘The Active Eye: Perspectives on Eye Movement Research.’ In Current Trends in Eye Tracking Research, 3-16. DOI 10.1007/978-3-319-02868-2_16 Print ISBN 978-3-319-02867-5 Online ISBN 978-3-319-02868-2

Treuting, Jennifer. 2006. ‘Eye tracking and cinema: A study of film theory and visual perception.’ Society of Motion Picture and Television Engineers 115 (1): 31-40.

Tosi, Virgilio, Mecacci, Luciano, and Pasquali, Elio. 1997. ‘Scanning eye movements made when viewing film: Preliminary observations.’ International Journal of Neuroscience 92 (1/2): 47-52.

Vassallo, Suzanne, Cooper, Sian, LC., and Douglas, Jacinta, M. 2009. ‘Visual scanning in the recognition of facial affect: Is there an observer sex difference?’ Journal of Vision 9: 1-10.

Vig, Eleonora, Dorr, Michael, and Barth, Erhardt. 2009. ‘Efficient visual coding and the predictability of eye movements on natural movies.’ Spatial Vision 22 (2): 397-408.

Yarbus, Alfred, L. 1967. Eye Movements and Vision. New York: Plenum.

BIOS:

Adrian Dyer is an Associate Professor in Media and Communication at RMIT University (Australia) investigating vision in complex environments. He is an Alexander von Humboldt Fellow (Germany) and a Queen Elizabeth II Fellow (Australia), and has completed postdoctoral positions at La Trobe University and Monash University (Australia), Cambridge University (UK), and Wuerzburg and Mainz Universities (Germany).

Sarah Pink is Professor of Design and Media Ethnography at RMIT University (Australia). She is visiting/guest Professor at Halmstad University (Sweden), Loughborough University (UK), and Free University Berlin (Germany). Her most recent books include Situating Everyday Life (2012), Doing Visual Ethnography 3rd edition (2013) and Doing Sensory Ethnography 2nd edition (2015).