Sound and Sight: An Exploratory Look at Saving Private Ryan through the Eye Tracking Lens – Jennifer Robinson, Jane Stadler and Andrea Rassell

Abstract

Using eye tracking as a method to analyse how four subjects respond to the opening Omaha Beach landing scene in Saving Private Ryan (Steven Spielberg, 1998), this article draws on insights from cinema studies about the types of aesthetic techniques that may direct the audience’s attention along with findings about cognitive resource allocation in the field of media psychology to examine how viewers’ eyes track across film footage. In particular, this study examines differences when viewing the same film sequences with and without sound. The authors suggest that eye tracking on its own is a technological tool that can be used both to reveal individual differences in experiencing cinema and to find psychophysiologically governed patterns of audience engagement.

Introduction

Steven Spielberg’s Saving Private Ryan (1998) begins at a geriatric pace, ambling alongside an elderly World War II veteran as he visits a military cemetery and begins to reminisce about the men who saved his life during the Battle of Normandy in June, 1944. This is where the story really starts, with a platoon of terrified, seasick servicemen led by Captain John Miller (Tom Hanks) landing on Omaha Beach where they come under heavy fire by German infantry. The Omaha Beach landing scene is gruelling in its experiential intensity as the hand-held camera locates the audience alongside soldiers desperately fighting their way toward the enemy line amidst relentless machine gunfire and bone-shuddering explosions that tear them limb from limb.

An interdisciplinary 2014 study by Vittorio Gallese (one of the scientists credited with the discovery of mirror neurons), fellow neuroscientists Katrin Heimann and Maria Alessandra Umiltà, and film scholar Michele Guerra investigated the effects of camera movement on the audience’s feeling of involvement in film scenes and their ability to place themselves in the position of a screen character. This study was conducted using a high-density electroencephalogram (EEG) to test whether the audience’s experience of what Gallese (2012, 2013) terms “embodied simulation” (that is, neural mirroring responses that are associated with empathy) is affected by camera movement as well as by the action of human figures on screen. The researchers found that the relationship between cognition and action perception is significantly influenced by camera movement and that the use of camera techniques such as steadicam elicits stronger mirroring responses and an augmented sense of involvement in the scene because this type of cinematography more closely resembles human movement than static camera, zooms, or dolly-mounted tracking shots (Heimann et al. 2014, 2098–99).

These findings are consistent with eye tracking studies by Paul Marchant and colleagues who have demonstrated that the audience’s visual attention is captured and guided by mobile framing, focus, the direction of screen characters’ movement and lines of sight, and the colour and motion of other aspects of the mise-en-scene (Marchant et al. 2009, 157–58). This interplay of figure movement and the technical and aesthetic dimensions of cinematography is relevant to Saving Private Ryan in that the arresting beach landing scene at the start of the film is shot almost exclusively using hand-held camera to simulate human movement. The study by Heimann and colleagues suggests that this form of camera movement, teamed with the panicked motion of the figures on screen, functions to elicit a sense of affective identification with Captain Miller and the soldiers he leads by stimulating a shared experience of embodied confusion and sensory overload as the military men shake with fear and scramble to dodge the shrapnel ricocheting across the war-ravaged beach. The unstable gaze of the constantly moving camera makes it as difficult for the audience as it is for the soldiers in the scene to focus attention or see a pathway to safety and this shared perceptual experience may elevate neural mirroring responses or empathic concordance with observed actions.

Venturing into an area that has received less attention from film scholars, media effects researchers or neuroscientists, we were struck by the acoustic ferocity of the Omaha Beach scene and we sought to understand the ways in which sound functions as a perceptual cue that may affect the cinema audience’s attention and modulate gaze patterns. This interdisciplinary study brings empirical eye tracking research into dialogue with formalist understandings of film style and cognitive engagement with narrative, using the following question to establish a framework for analysis: What audio-visual aesthetic cues guide the audience’s attention and what psychophysiological processes underlie audience responses to the screen? In particular, we draw on existing research on film dialogue by Todd Berliner and others and we supplement eye tracking data by drawing on Lisa Coulthard’s concept of “dirty sound,” Vivian Sobchack’s work on the “sonic imagination” and cognitivist methods of aesthetic film analysis to work through the experiential dimensions of the sonic confusion generated in the scene.

Cognitivist film theory, as advanced by scholars such as David Bordwell and Carl Plantinga, conceptualises film and television spectatorship as the active construction of meaning via the inferential elaboration of perceptual cues and formal screen production conventions. In a quest for greater explanatory power and a more holistic understanding of spectatorship that moves beyond rational thought and conscious inferential processes, film theorists are increasingly drawing upon empirical research in fields such as neuroscience, psychology and media effects to test assumptions about how audiences perceive and respond to screen texts, and to account for the sensory experiences and involuntary physiological reactions of the audience.

Psychophysiological Approaches to Cinema Studies

There are several different empirical approaches to studying audience members’ responses to film, including biometrics, neuroimaging, and psychophysiological techniques. Psychophysiology is an area of research that quantifies physiological or bodily responses to psychological states. Neurocinema (Hasson et al. 2008) and Psychocinematics (Shimamura 2013) are emerging fields that connect these psychophysiological methods to cinematic experience. Where the neurocinematic approach involves imaging of the brain while watching cinema, in psychophysiology the subject’s physiological state is understood to be representative of psychological responses (for example, skin conductance and heart rate indicate arousal or an emotional reaction). One such response is an involuntary orienting response that assigns cognitive resources to processing stimuli in screen texts automatically.

Annie Lang (Lang 2000; Lang et al. 2000) proposes a model of responding to dynamic screen media that starts from the position that there are limited cognitive resources that any individual can bring to bear when processing mediated content. Features of the screen content can automatically consume some of those cognitive resources, which leaves less capacity for the intentional interpretation of meaning, formulation of hypotheses or speculation about protagonists’ motives (the very processes that cognitive film theory privileges). While this has been well developed for visual attributes, such as hard edits, movement and new features, Lang and colleagues are developing a similar catalogue of attributes for aural content (sound). Using a physiological indicator of an orienting response (a short, rapid decrease in heart rate just after the feature is introduced), they have identified “voice changes, music onsets, sound effect onsets, production effect onsets, emotional word onsets, silence onsets, and voice onsets” as aural cues that orient attention (Lang et al. 2014, 4).

Embodied responses to film are not necessarily indicative of cognitive processing as some responses occur in the autonomic nervous system (such as the startle response to a loud sound or a sudden movement); other processes involve the conscious allocation of cognitive resources. For example, seeing a poisonous reptile on screen can make the audience form hypotheses about impending danger, which can then prime emotional reactions such as anxiety. Increased heart rate during the shower scene in Psycho (Alfred Hitchcock, 1960), or light perspiration on the palms as viewers watch Grace Kelly fossick through the neighbour’s apartment in Rear Window (Alfred Hitchcock, 1954), are widely understood to be biological evidence of changes in psychological states in response to cinema. In such a state of arousal hormones are released, blood pressure rises, and brain wave patterns shift. These biological changes can be recorded using non-invasive techniques and have proven to be stable markers of psychophysiological changes. Some commonly used psychophysiological measures are eye tracking, Galvanic Skin Response (GSR), and pupillometry; however, in this exploratory study, only eye tracking has been used.

Eye Tracking

Eye tracking is a technique that can measure the movements of the eye by gauging the direction of infrared light bounced off the eye surface. The most common technique utilises the eye’s physiology to create different reflections of the light source from the pupil and cornea that are captured by two cameras and used to track the gaze and control for head and eye movements. While there are several types of eye tracking devices, those most pertinent to this study include eye trackers that require the viewer to be in a fixed position such as seated in front of a monitor, and those that can be head-mounted or worn like glasses by a mobile viewer. Eye tracking devices are used in a wide variety of fields including marketing, sports coaching and user experience. The range of Tobii Technology eye trackers is frequently employed as a research tool to measure attention, as is the case in this study. Two of the main characteristics of eye movement that can be measured by eye tracking devices are saccades and fixations.

Saccades

In order to collect high-quality visual data about our environment, the eye needs to be constantly redirected. We use movements called saccades in order to do this. Saccades occur at a rate of about 2-3 per second (Tatler 2014) and can be voluntary or reflexive (Duchowski 2007). Their duration ranges from 10-100 ms, rendering the individual effectively blind during this time, but not for long enough to be perceivable: “Visual sensitivity effectively shuts down during a saccade via a process known as saccadic suppression, in order to ensure that the rapid movement of light across the retina is not perceived as motion blur” (Smith 2014, 86).

Fixations

A fixation is a length of time when the eyes stop large movements (saccades) and stay focused on a small visual range (typically about 5 degrees). Fixations should not be thought of as static, as the name implies, but as “miniature eye movements: tremor, drift and microsaccades” (Duchowski 2007, 46). Their duration is usually in the range of 150-600 ms (Duchowski 2007) and most visual information is processed when the eyes stabilize or fixate on a point on the screen (Smith 2014, 86).
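The distinction between fixations and saccades can be illustrated with a minimal sketch that groups raw gaze samples into fixations. This is not the filter used by any particular eye tracking system: the function name, the 35-pixel distance threshold and the 100 ms minimum duration are assumptions chosen for illustration only, and the sample data are invented.

```python
import math


def detect_fixations(samples, max_dist_px=35, min_duration_ms=100):
    """Group gaze samples (t_ms, x, y) into fixations.

    A sample joins the current group while it stays within max_dist_px
    of the group's running centroid; a larger jump is treated as a saccade.
    Returns a list of (start_ms, end_ms, centroid_x, centroid_y).
    """
    fixations = []
    group = []

    def close_group(g):
        # Keep the group only if it lasted long enough to count as a fixation.
        if g and g[-1][0] - g[0][0] >= min_duration_ms:
            cx = sum(s[1] for s in g) / len(g)
            cy = sum(s[2] for s in g) / len(g)
            fixations.append((g[0][0], g[-1][0], cx, cy))

    for sample in samples:
        if group:
            cx = sum(s[1] for s in group) / len(group)
            cy = sum(s[2] for s in group) / len(group)
            if math.dist((sample[1], sample[2]), (cx, cy)) > max_dist_px:
                close_group(group)
                group = []
        group.append(sample)
    close_group(group)
    return fixations


# Invented samples: a stable gaze near (100, 100), then a jump to (400, 300).
samples = [
    (0, 100, 100), (50, 102, 99), (100, 101, 101), (150, 99, 100), (200, 100, 102),
    (250, 400, 300), (300, 401, 299), (350, 399, 301), (400, 400, 300),
]
fixations = detect_fixations(samples)
```

The small jitter within each group corresponds to the miniature movements described above, while the single large jump between groups stands in for a saccade.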

Previous findings

A consistent finding from eye tracking research that is relevant to this study of cinema is that when scenes are viewed on a screen or a monitor, the gaze tends to fixate at the centre more than the periphery, even when salient features are not located in the middle of the frame. Because this tendency may be adaptive (for example the centre is a good resting place for fast response to new action that requires attending to), rather than solely visual, Benjamin Tatler (2014) warns against a reductive expectation that these fixations are caused by visual stimuli alone. While this study attends closely to visual stimuli and the aesthetic techniques used by filmmakers to direct attention, we also consider aural stimuli and involuntary biological responses.

Despite the large body of eye tracking research, Antoine Coutrot and colleagues claim that until recently, only two preliminary studies had investigated the influence of sound on eye movements and patterns of attention when watching film or video footage (Coutrot et al. 2012, 2).[i] When studying eye movements in response to the presence and absence of sound in audiovisual stimuli, Coutrot et al. analyse differences in three further eye tracking metrics: dispersion, distance to centre, and Kullback-Leibler divergence. Dispersion refers to the “variability of eye positions between observers” (2012, 4). Distance to centre is a measurement of “the distance between the barycenter of a set of eye positions and the centre of the screen” (Coutrot et al. 2012, 4). Kullback-Leibler divergence “is used to estimate the difference between two probability distributions. This metric can be compared as a weighted correlation measure between two probability density functions… The lower the KL-divergence is, the closer the two distributions are… If soundtrack impacts on eye position locations, we should find a significant difference between the mean inter and intra KL-divergences” (Coutrot et al. 2012, 4). Dispersion provides information about the variability between eye positions, but does not determine the relative position of the two data sets of the eye positions for the two stimulus conditions (sound on/sound off). For the KL-divergence, it is the opposite.
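To make these three metrics concrete, the sketch below shows one possible way of computing them. It is an illustration under stated assumptions rather than the procedure used by Coutrot and colleagues: dispersion is taken here to be the mean pairwise distance between observers' eye positions, the KL-divergence is computed over simple histograms of eye positions in matching spatial bins, and all function names and the smoothing constant are hypothetical.

```python
import math
from itertools import combinations


def dispersion(points):
    """Mean pairwise Euclidean distance between observers' eye positions.

    Assumes at least two (x, y) positions, one per observer.
    """
    pairs = list(combinations(points, 2))
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)


def distance_to_centre(points, screen_w, screen_h):
    """Distance from the barycentre of the eye positions to the screen centre."""
    bx = sum(x for x, _ in points) / len(points)
    by = sum(y for _, y in points) / len(points)
    return math.dist((bx, by), (screen_w / 2, screen_h / 2))


def kl_divergence(p, q, eps=1e-9):
    """KL(p || q) for two histograms defined over the same spatial bins."""
    # Smooth and normalise so that empty bins do not cause division by zero.
    p = [v + eps for v in p]
    q = [v + eps for v in q]
    sp, sq = sum(p), sum(q)
    return sum((a / sp) * math.log((a / sp) / (b / sq)) for a, b in zip(p, q))
```

A lower dispersion value indicates that observers are looking at similar places, while a KL-divergence near zero indicates that two conditions (for example, sound on and sound off) produce similar spatial distributions of gaze.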

Coutrot and colleagues (2012) found that eye movements follow a consistent pattern that is involuntary and that is not affected by screen aesthetics, narrative content, genre, sound or other factors in the first second following an edit. After a brief latent phase, the eye automatically refocuses on the centre of the screen after a cut and takes a second to adjust to the new image. Thereafter, they found that sound does influence gaze patterns in the following ways: dispersion is lower in the sound on condition than the sound off condition; fixation locations are different between the two conditions; sound results in larger saccades than the same footage without sound; and sound elicits longer fixations than sound off (Coutrot et al. 2012, 8).

More recently, Coutrot and Guyader found that “removing the original soundtrack from videos featuring various visual content impacts eye positions increasing the dispersion between the eye positions of different observers and shortening saccade amplitudes” (2014, 2). This study also found that in dialogue scenes, the audience’s attention tends to “follow speech turn taking more closely” (Coutrot and Guyader 2014, 1). A 2014 study by Tim Smith also investigated the cross-modal influences of audio on visual attention and found that “When the visual referent is present on the screen, such as the face of a speaker (that is, a diegetic on-screen external sound source), gaze will be biased towards the sound source, and towards the lips if the audio is difficult to interpret” (Smith 2014, 92). This accords with research in film studies into dialogue and conversation in movies. For instance, Berliner notes that movie dialogue is typically scripted to advance the narrative by directing the audience’s attention to key plot points and protagonists;[ii] furthermore, “characters in Hollywood movies communicate effectively and efficiently through dialogue” and “movie characters tend to speak flawlessly” (Berliner 2010, 191). Similarly, Aline Remael identifies the promotion of narrative continuity and textual cohesion as two of the chief functions of film dialogue (2003, 227; 233). Given these findings from two different fields of research, we pay particular attention to gaze patterns during dialogue exchanges in the analysis of Saving Private Ryan that follows.

Method

Building on previous work by Tim Smith, Antoine Coutrot, Nathalie Guyader and other researchers who have used eye tracking to investigate attentional synchrony[iii] (as illustrated in gaze plots and heat maps that represent the concentration of the audience’s gaze), our methodology examines the distribution of fixations across nine smaller central Areas of Interest (AOIs) during film sequences to explore what is occurring for viewers who are not following the predicted pattern and instead are searching for something else. Using two conditions as stimuli (film with sound on, and film with sound off), we conducted a qualitative comparison between and within the viewing patterns of four subjects. Within the limitations of a qualitative and exploratory study with only four subjects, we drilled down to conduct a fine-grained mapping of attention to determine whether it functions in a predictable way in relation to previous findings about dialogue scenes, sonic cues and attention in relation to camera and figure movement.

For the purposes of this study, a Tobii X-120 eye tracker and Tobii Studio 2.3.2 software (Tobii Technology, Stockholm, Sweden) were used to record seven individual subjects (five females, two males) as they watched film footage. As this was an exploratory study, subjects were recruited from the researchers’ networks, with ethics approval. They were seated and positioned 55-65 cm away from the eye tracker for viewing. All subjects were recorded on the same Tobii X-120, in the same room, with the film footage played on the Tobii computer to standardise start times for all subjects to enable comparison in later analysis. Each subject was successfully calibrated by looking at symbols in different areas of the screen, which ensures the eye tracker gets a reliable measure of gaze location across the whole screen. After the viewing session, each subject’s data was analysed for quality, with three subjects excluded because one condition had segments with lower reliability than desired. Thus, the results are reported on the basis of four subjects with high quality and complete data (three females, one male).

We analysed the areas of the screen where subjects looked while watching discrete sequences of the key beach-landing scene at the beginning of Saving Private Ryan. We investigated how different stylistic techniques employed in the following four consecutive sequences of the scene affect the audience’s gaze patterns:

  1. The “Indistinct Dialogue” sequence is an 11-second clip that was chosen with a view to finding out how the audience’s attention is affected when dialogue is overridden by chaotic background noise, forcing the audience to strain to decipher what is being said. This part of the scene occurs immediately after Captain Miller has located the men under his command. The first shot is an unsteady medium close-up of Miller shouting to Horvath (Tom Sizemore) as bullets splash in the surf around him and ping off the metal structure he is crouching behind. Miller yells, “Sergeant Horvath! (Explosion.) Move your men off the beach! (Water splashes up noisily.) Now!” The next shot shows Horvath’s response in a hand-held medium close-up as he points at his men and hollers, “OK you guys, get on my ass! (Directional hand signals as bullet hits metal and drowns out dialogue.) Follow me!” (Horvath ducks as a mortar shell explodes, screen right.)
  2. The “Wounded Man” sequence that occurs as the men move up the beach is a 30-second segment that is noteworthy because it includes a subjective sequence that solicits audience engagement with Captain Miller’s experience of temporary hearing loss following the concussive impact of a mortar shell nearby. This clip begins with a hand-held long shot of carnage on the beach as Miller moves to the right, dragging Briggs, a wounded soldier he is trying to help. The audience hears artillery fire, crashing, splashing, and shouting as Miller lugs Briggs into the mid-ground, with explosions and debris visible in the foreground. As mortar shells hit, spraying blood and water upwards, Miller hollers for a medic. Following a massive explosion, the sound of gunfire in the background is muted and is replaced with the subdued drone of a low, echoing, wind-like sound that communicates Miller’s subjective experience of shellshock as sand obscures everything and Miller falls to the ground. In slow motion, we cut to a low level close-up of boots in the sand. Miller scrambles up and the hand held camera follows him. Other soldiers pass in front of the camera, occluding the lens and masking the edit. The sounds of the battlefield are replaced by an echoing, low frequency droning noise and the subdued clink of military gear as Miller is momentarily dazed by shellshock. As he gets up and grabs Briggs’s arm, the sound of artillery returns loudly and we see Miller in long-shot framed with a low level camera. He staggers, looks back, and realises that Briggs is dead: his lower abdomen and legs have been blasted away. This sequence ends with a close-up of Miller’s reaction as he looks at Briggs in shock, abandons him, crawls away from the camera, then stands and runs toward the sand dunes, into enemy fire and gun smoke.
  3. The “Sand Dunes” sequence is one unusually long and complex shot that lasts for a full minute; however, in the interests of generating a more granular analysis we divided the shot in two. The first 25 seconds “Sand Dunes: In Command” begins with a match on action as the body of a soldier that was catapulted into the air by a grenade in the previous shot now hits the ground. Quickly, the hand-held camera tilts down from the long-shot of the falling soldier, pans left, and follows Miller forward as he dives behind a ridge of sand for shelter. The camera pushes in to frame Miller in close up as the dialogue begins:

Miller: (Turns left to address the radio operator.) Shore Party! No armour has made it ashore. We got no DD tanks on the beach. Dog One is not open. (Miller rolls to the right so he is framed in an over-the-shoulder shot as he shouts to other soldiers seen in medium-long shot on the dune.) Who is in command here?

Soldier: You are, Sir.

Miller: Sergeant Horvath!

Horvath: Sir!

Miller: Do you know where we are?

Horvath: Right where we’re supposed to be, but nobody else is …

  4. “Sand Dunes: Radio” is a 35-second continuation of the shot detailed in sequence three, beginning when the hand-held camera pans left as Miller rolls back toward the radio operator, facing the camera in close up as he grabs the radio operator’s shoulder and hollers in his ear, straining to be heard over the background gunfire.

Soldier: (Distant, off screen, as Miller rolls to the left.) Nobody’s where they’re supposed to be!

Miller: (To radio operator) Shore Party! First wave ineffective. We do not hold the beach. Say again, we do not hold the beach. (Miller turns and rolls back towards the right, away from the camera. The camera zooms toward Horvath, excluding Miller from the frame as he listens to Horvath.)

Horvath: (Indistinct) We’re all mixed up, sir. We got the leftovers from Fox Company, Able Company and George Company. Plus we got some Navy Demo guys and a Beachmaster.

Miller: (The camera follows Horvath as he rolls to the left, toward Miller; we then see Miller in medium close-up as he turns back to radio operator.) Shore party! Shore party! (Realises radio operator is deceased; grabs hold of the radio himself.) Cat-F, Cat-F, C-… (Miller realises the radio is dead.)

In Overhearing Film Dialogue, Sarah Kozloff states that “although what the characters say, exactly how they say it, and how the dialogue is integrated with the rest of the cinematic techniques are crucial to our experience and understanding of every film since the coming of sound, for the most part analysts incorporate the information given by a film’s dialogue and overlook the dialogue as signifier” (2000, 6). By contrast, our analysis focuses closely on what the characters say, and also on what they hear. It may seem counterintuitive to be investigating the significance of sound in an eye tracking study, because sound is something that the eyes are not normally required to process. However, the Omaha Beach dialogue sequences are unusual because the audience has to rely on their eyes to search for contextual cues in order to fill in gaps in understanding due to indistinct vocals. Such cues include the direction of figure movement and eye lines (when Horvath yells, “Get on my ass and follow me!” but his words are obscured by background sound), or facial expressions and body language when words don’t fully make sense due to the inclusion of military terminology or unfamiliar radio communication codes and incomplete communicative exchanges (when Miller shouts “Dog-One” and “Cat-F” into the radio and receives no response). Coutrot and colleagues identify numerous instances in which aural and visual stimuli interact to affect attention (they offer as one example, “the help given by ‘lip reading’ to understanding speech, even more when speech is produced in poor acoustical conditions,” as is the case in Saving Private Ryan); furthermore, they report that “perceivers gazed more at the mouth as auditory masking noise levels increased” (Coutrot et al. 2012, 2).[iv] Our study builds on this work, as detailed below.

With respect to each of the four sequences outlined above, two different analyses were conducted. Given that most of the viewing was within the central area of the screen, we subdivided that area into a three by three grid providing nine smaller areas of interest, as illustrated in Figure 1.[v] The total time fixated and the mean fixation duration were calculated for each of the nine Areas of Interest (AOI). For the default method on the Tobii eye tracking system that we used, a fixation is identified when the gaze is steadily focused in the same area of X-Y coordinates on the screen, typically occurring within 35 pixels. This technique of dividing the centre of the screen into nine smaller AOIs allowed greater granularity in determining the primary AOI where most people attended and the instances where individuals diverged and looked at other parts of the screen.
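The first analysis can be sketched in a few lines of code. The example below is a minimal illustration rather than the Tobii Studio procedure: the pixel bounds of the central region, the row-major numbering of AOIs from 1 to 9, the function names and the fixation data are all assumptions introduced here.

```python
from collections import defaultdict


def aoi_index(x, y, region, rows=3, cols=3):
    """Return the AOI number (1..9, row-major) containing the point (x, y).

    region is (left, top, width, height) of the central area in pixels.
    Returns None when the point falls outside the central area.
    """
    left, top, width, height = region
    if not (left <= x < left + width and top <= y < top + height):
        return None
    col = int((x - left) * cols / width)
    row = int((y - top) * rows / height)
    return row * cols + col + 1


def summarise_fixations(fixations, region):
    """Total time fixated and mean fixation duration (ms) for each AOI.

    fixations is an iterable of (x, y, duration_ms).
    """
    durations = defaultdict(list)
    for x, y, duration_ms in fixations:
        aoi = aoi_index(x, y, region)
        if aoi is not None:
            durations[aoi].append(duration_ms)
    return {
        aoi: {"total_ms": sum(ds), "mean_ms": sum(ds) / len(ds)}
        for aoi, ds in durations.items()
    }


# Hypothetical central region and fixations, for illustration only.
central_region = (300, 200, 600, 300)
example = [(600, 350, 200), (610, 360, 400), (310, 210, 150), (50, 50, 300)]
summary = summarise_fixations(example, central_region)
```

In this invented example two fixations fall in the centre cell (AOI 5), one in the top-left cell (AOI 1), and one lies outside the central region and is ignored.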

Figure 1: Nine Areas of Interest (AOI), Saving Private Ryan: Still from Saving Private Ryan (Steven Spielberg, 1998), data illustrated by the authors

The second step was to analyse whether the participants exhibited attentional synchrony and primarily looked at the same AOIs as each other or displayed individual variation. For this, we calculated an estimate of attentional distribution. For this exploratory study, a simple ratio of time spent following the guided (dominant) viewing pattern compared to time looking at other parts of the screen was calculated.[vi] The dominant AOIs were determined by which AOIs account for over 50% of fixation time in the sound on condition (control). As some sequences guide the viewer between several of the AOIs, the combination that yielded a majority of viewing time was used rather than simply the AOI with the greatest portion of time. The Distribution ratio was then calculated as follows:

$$\text{Distribution Ratio} = \frac{\text{Number of non-dominant AOIs viewed} \times \text{amount of time in those AOIs}}{\text{Number of AOIs in dominant view} \times \text{amount of time in the dominant AOIs}}$$
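As a worked illustration of the distribution ratio, the sketch below computes it for invented fixation times. Several details are assumptions made for the example and are not specified in the method: the dominant set is chosen greedily by adding the most-viewed AOIs until they exceed half of the sound on fixation time, the same dominant set is then applied to the sound off data, and the difference score is taken as sound off minus sound on. The function names and all numbers are hypothetical.

```python
import math


def dominant_aois(time_by_aoi, threshold=0.5):
    """Smallest set of most-viewed AOIs whose combined time exceeds the threshold share."""
    total = sum(time_by_aoi.values())
    chosen, running = set(), 0.0
    for aoi, t in sorted(time_by_aoi.items(), key=lambda kv: kv[1], reverse=True):
        chosen.add(aoi)
        running += t
        if running > threshold * total:
            break
    return chosen


def distribution_ratio(time_by_aoi, dominant):
    """(non-dominant AOIs viewed x their time) / (dominant AOIs x their time)."""
    dominant_time = sum(t for a, t in time_by_aoi.items() if a in dominant)
    others = [t for a, t in time_by_aoi.items() if a not in dominant and t > 0]
    if dominant_time == 0:
        return math.inf  # no time at all in the dominant AOIs
    return (len(others) * sum(others)) / (len(dominant) * dominant_time)


# Invented fixation times in milliseconds, keyed by AOI number.
sound_on = {5: 6000, 4: 2000, 6: 1500, 2: 500}
sound_off = {5: 4000, 4: 2000, 6: 2000, 2: 1000, 8: 1000}

dominant = dominant_aois(sound_on)                   # {5}
ratio_on = distribution_ratio(sound_on, dominant)    # (3 * 4000) / (1 * 6000) = 2.0
ratio_off = distribution_ratio(sound_off, dominant)  # (4 * 6000) / (1 * 4000) = 6.0
difference = ratio_off - ratio_on                    # 4.0
```

In this invented case the ratio rises without sound both because more time is spent away from the dominant AOI and because fixations fall in more of the nine AOIs.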

Eye Tracking Results and Discussion

The distribution ratio was intended to see whether the findings of Coutrot et al. (2012) applied to what we saw in the sequences from the Omaha Beach landing scene. We hypothesised that a lack of sound would increase divergence away from the dominant AOI, compared to the control condition with sound that should reinforce “attentional synchrony” (Smith 2013) by guiding the viewer to the most important focal area. However, we expected that because some of the sequences we were examining contained many competing audio and visual contextual cues, we might not see such a clear distinction. Furthermore, we anticipated that our findings in relation to dialogue may diverge from Coutrot and Guyader (2014) because the Omaha Beach scene has atypical dialogue sequences that deviate from conventional turn-taking and are overloaded with noise and movement to create a sense of confusion.

We found that averaged across all sequences, three of the viewing subjects followed the expected pattern of greater divergence with sound off, as indicated by a positive difference score in Table 1. The fourth (Subject 3) had slightly higher gaze distribution when there was sound but did have an increase in the mean number of AOIs with fixations in the “sound off” condition. This would suggest that, on the whole, sound does function to focus attention more tightly. Given the small number of participants in this exploratory study, this result is encouraging.

Table 1: Distribution Ratios Arranged by Subject
*Note: A positive difference indicates that in line with Coutrot et al. (2012), there was greater distribution of attention across AOIs for the no sound condition. The higher distribution ratio when sound was off could be due to either more total time fixated away from the dominant AOI or having fixations in more of the nine AOIs, being more spread out, or a combination of both.

However, not all sequences of the beach scene elicited the same results. When the distribution data is averaged by sequence, it turns out that sequences 1 (d = -3.4) and 3 (d = -1.8) were strongly in the predicted direction with less distribution across AOIs when there was sound. For sequence 2 there was little difference (d < 1.0), but it was still in the predicted direction. However, for sequence 4 (d = 1.0), there was greater distribution of the fixations away from the dominant AOI in the sound on condition, indicating greater focus when there was no sound. Sequence 4 (“Sand Dunes: Radio”) does not follow screen conventions for shooting dialogue: it is shot in one long take rather than the customary shot-reverse-shot style and because many of Captain Miller’s lines contain military jargon and receive no response, the audience’s habituated expectations about turn-taking and shifting attention from speaker to speaker are derailed. Breaking with aesthetic and technical conventions may disrupt the cognitive process of meaning-making when watching film. Another, more physiologically based reason that the “Sand Dunes: Radio” sequence may not conform to the gaze distribution patterns found in other parts of the scene is that it is the continuation of a very long take (together the sand dune sequences constitute a single, minute-long shot that viewers watched unbroken); consequently, viewers’ eyes are not re-focused on the centre of the screen following a cut and their eyes have more time to rove and explore the visual field for other meaningful cues. Put another way, without any cuts to generate an orienting response during this sequence there is no automatic allocation of cognitive resources to the story or refocusing of attention back onto a particular portion of the screen (Lang 2000, 2014). It is then a very individual response to novel and emotive (signal) cues within the scene that drives where each subject looks over the duration of this sequence. With so many different types of auditory cues that orient the viewer (Lang 2014), it is not surprising that viewers had fixations in the various AOIs we analysed on the screen.

In Saving Private Ryan, as has been found to be the case in other films such as Sergei Eisenstein’s 1938 historical war epic, Alexander Nevsky (see Smith 2014), the overall viewing patterns reflect the intention of the director in that audience members typically look where they are guided to by audio-visual screen conventions. Yet, an important reminder for further investigation is that not all members of an audience respond in the same way to each scene. This leads us to question what cues other than the lack of a sound track might lead to increased gaze distribution.

Even though the size of the nine AOIs and the number of participants were small, paired-sample t-tests were conducted to see whether there were any statistically significant differences at the level of each of the nine AOIs between the sound and no sound viewing experiences of the participants. No significant differences between the sound on and off conditions were found for fixation duration, total time fixated, visit duration or total time visiting any particular AOI for three of the sequences: “Indistinct Dialogue,” “Wounded Man,” or “Sand Dunes: Radio.” However, for “Sand Dunes: In Command,” subjects spent significantly (p < .05) less time looking at areas 5 and 6 when the sound was off (significantly greater mean fixation duration, total time fixated and total time visiting AOI 5 when the sound was on; only total time fixated on AOI 6 was greater when sound was present). Interestingly, this did not translate to an increase in any particular AOI so it seems their gazes spread out significantly (dispersed) in the sound off condition.
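The paired-sample comparison described above can be sketched as follows for a single AOI. This is an illustration under stated assumptions rather than the authors' analysis script: the function name `paired_t` and the fixation times are hypothetical, and the critical value 3.182 is the standard two-tailed threshold for p < .05 with three degrees of freedom (four subjects).

```python
from math import sqrt
from statistics import mean, stdev

# Two-tailed critical t at alpha = .05 for df = 3 (four paired observations).
T_CRIT_DF3 = 3.182


def paired_t(sound_on, sound_off):
    """Return (t statistic, degrees of freedom) for a paired-sample t-test."""
    if len(sound_on) != len(sound_off):
        raise ValueError("paired data need one value per subject in each condition")
    # Work on within-subject differences, since each subject saw both conditions.
    diffs = [a - b for a, b in zip(sound_on, sound_off)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


# Hypothetical total fixation time (seconds) on one AOI for four subjects.
on_times = [4.0, 5.0, 6.0, 5.0]
off_times = [2.0, 2.5, 4.5, 3.0]

t_stat, df = paired_t(on_times, off_times)
significant = abs(t_stat) > T_CRIT_DF3
print(f"t({df}) = {t_stat:.2f}, significant at .05: {significant}")
```

With only four subjects the threshold is demanding, which is consistent with the small number of significant AOI-level differences reported here.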

With a small sample size, it is not surprising that there were few statistical differences, but it was surprising that focusing on the central area as represented by the nine AOIs did not pick up what seemed “obvious” when looking at the aggregated gaze plots. For example, in the “Indistinct Dialogue” sequence (see Figure 2) there is an explosion on the right-hand side of the screen that equally drew the attention of the subjects when sound was off as well as on (the screen characters ducked as the mortar shell whistled in, so aural and visual cues reinforced each other). Although there was one subject who looked down to the lower right part of the screen outside our central area of view when sound was off, the overall pattern was consistent in both conditions.

Figure 2a (above) and Figure 2b (below): Gaze plots of four subjects for “Indistinct Dialogue” (Sequence 1): Still from Saving Private Ryan (Steven Spielberg, 1998), data illustrated by the authors

We explored whether eye tracking could help reveal differences in viewing experience amongst subjects or even offer insight into what was happening beneath the surface of apparent synchrony. An obvious finding is that each individual subject had a different gaze pattern across the four sequences sampled, as can be seen when examining the pattern as recorded for Subject 2 in Figure 4 and Figure 6. For the longer sequences, their gaze fixated in more than half of the AOIs, while for the shorter sequences they were often more focused on particular AOIs. However, this pattern could change with sound on and off. For example, when examining the pattern for Subject 1 for “Indistinct Dialogue”, there was a noticeable difference between sound on and off such that they only viewed three AOIs with sound, but with sound off they spread out to three new AOIs, ranging across six in total. The greatest shift was away from time fixated in the top left corner of our central area (when the footage was played with sound), contracting to the central third of the screen (without sound).

Sean Redmond and colleagues reported that the presence of sound only has an effect on fixation duration (number of fixations) for the “Wounded Man” sequence (forthcoming 2015). In the “Wounded Man” sequence, there was no overall difference in gaze location with sound on or sound off. However, the gaze fixation pattern for Subject 2 showed a large qualitative difference (see Figure 3). With sound off, Subject 2 only looked at AOIs 6, 8 and 9 (bottom right part of the central area, which is consistent with Miller’s screen direction and the action of falling to the ground and dragging the wounded soldier, Briggs, in this sequence). However, with sound on, Subject 2 fixates at least briefly in all nine AOIs, with the most time shifting to the centre of the screen where noisy background action is taking place and where other soldiers rapidly pass in front of the camera.

Figure 3: Areas of Interest for subject 2 in “Wounded Man” (Sequence 2): Still from Saving Private Ryan (Steven Spielberg, 1998), data illustrated by the authors

However, when comparing all of the subjects and how they responded to the “Wounded Man” sequence (see Figure 4), the other three subjects exhibit similar patterns of scanning across the AOIs when sound is on and off. This pattern is what we expected for this sequence, which incorporates a significant subjective sound component when Miller experiences shellshock and is temporarily stunned and deafened. It is possible that subjective sound may help to anchor the viewer’s attention to the character’s experience.[vii]

Figure 4: Total fixation duration in nine AOIs for “Wounded Man” (all subjects): Graphs produced by the authors

A final illustration is the “Sand Dunes: In Command” sequence. Subject 3 was interesting because their viewing patterns for the “Wounded Man” (sequence 2) and the “Sand Dunes: Radio” (sequence 4) were consistent with the other subjects; however, the focus for “Sand Dunes: In Command” (sequence 3) was inconsistent. There were clear AOIs for the sound off condition, but with sound their eyes wandered over more of the central area of the screen. Attention is focused on the radio operator in the sound off condition (as indicated by the red bar in AOI 4, middle left of Figure 5). However, with sound on (indicated by the blue bars), the subject’s attention extends to new AOIs, including corners (top left, bottom left, and bottom right) that were not fixated on when there were no sound cues.

Figure 5: Areas of Interest for subject 4 in “Sand Dunes: In Command” (Sequence 3): Still from Saving Private Ryan (Steven Spielberg, 1998), data illustrated by the authors

A comparison of how the subjects fixated during this segment when the sound was on and off (Figure 6) illustrates a similar pattern of having eyes fixate in different AOIs when sound was on and off, except for Subject 1. Given the length of this sequence, the focus on three main AOIs for all subjects in the sound off condition is interesting in its consistency. However, even though the fixation data averages out to no difference between sound on and off, each of our subjects had a different response when sound was on, ranging from Subject 4 fixating in all of the AOIs to Subject 1 just staying longer on the same AOIs. The variation between sound on and off for this sequence may simply be an artefact of camera and figure motion, where shifts in the location of the protagonists’ faces on the screen can result in fixations in non-dominant AOIs (Mital et al. 2011, 19). However, the fact that these shifts did not occur in both conditions indicates that there is something different about those shifts when sound is on and the viewer can hear the dialogue. This is the only sequence where there was a significant difference in total time spent fixated in AOIs 5 and 6. The much lower time spent on key AOIs when sound was off suggests the subjects were looking to the periphery of the screen and did not look as much to the nine central AOIs.

Figure 6: Total fixation duration for nine AOIs in “Sand Dunes: In Command” (all subjects): Graphs produced by the authors

Relating Eye Tracking Findings to Film Aesthetics

Our qualitative, exploratory analysis of gaze patterns in Saving Private Ryan has used eye tracking to offer an empirical account of cognitive-perceptual processing that includes sound and attends closely to audio-visual cues in the film’s stylistic system. In this way we have sought both to redress the limitations of theoretical approaches to film analysis that privilege inferential cognitive processes and to counterbalance the tendency of empirical studies to neglect the role of screen aesthetics in informing audience responses. In particular, we have built on existing work on eye tracking by taking account of how the aesthetic and experiential deployment of sound might affect perceptual processing, given that sound waves have “palpable force,” which means that sound seems “more materialized, more concrete, and more present to our experience than what we can see” (Sobchack 2005, 7). We have worked from the premise that expectations regarding character and narrative are chief among the ways that screen texts engage audiences in the construction of meaning, yet we have also acknowledged that the process of meaning-making is informed by aesthetic cues and by physiological sense-making, which is in part involuntary. The dangerous and chaotic Omaha Beach scene is what Man-Fung Yip refers to as an “intensified sensory environment” in which, “as a concrete visual and visceral force rather than a mere vehicle for semiotic signification, film violence offers an intensity of raw, immediate sensation that powerfully engages the eye and body of the spectator” (2014, 78). Like Yip, our interest has been sustained by “the complex interplay between the capacity of the human body and the resources of the cinematic medium” (2014, 89).

In her acoustic study of extreme cinema, Lisa Coulthard refers to the use of “deliberate sonic imperfections” (2013, 115) in films in which “visual assaultiveness” is paired with a disturbing soundscape: “Capable of impacting the body in palpable ways, sound is mined in many of these films for its viscerality: as one listens to extremities of acoustic proximity, frequency and volume, one’s own body responds in subconscious ways to those depicted and heard on screen” (2013, 117). These insights about sound are pertinent to Saving Private Ryan in that the Omaha Beach scene is designed to bombard the audience with the relentless onslaught of noise and action that the characters themselves face. In analysing this scene, we began with the hypothesis that “when the intensity of a background sound exceeds a certain threshold, mental activity can be paralyzed” (Augoyard and Torgue 2005, 41 qtd in Coulthard 2013, 118). In other words, we questioned whether the frenzied barrage of sound might cause a form of sensory-cognitive overload that could affect typical patterns of perceptual processing.

The eye tracking results did not reveal a significant pattern for scenes where we predicted this would occur. It would be helpful to explore this in the future with other physiological or neuro-measures that are better at identifying moments of cognitive overload or resource allocation. What does seem to emerge from our exploratory study is that even in films that firmly direct attention, as is characteristic of Spielberg’s directorial style, individual audience members bring their own complexity and experience to the viewing.[viii] Lang and colleagues point out that “complexity should be indexed not by how much of something is in the message but rather by how many human processing resources will be consumed when the message is processed” (Lang et al. 2014, 2). With respect to understanding the specific effects of what individual viewers bring with them to the screen, or teasing out how the audience is affected when watching footage that uses hand-held camera, induces cognitive overload, invokes the acoustic imagination, or uses indistinct dialogue, we conclude that this eye tracking study has raised fruitful questions that may best be answered by an approach that includes multiple measures provided by electroencephalograms, pupillometry or galvanic skin response techniques, as well as eye tracking technologies.

The “Indistinct Dialogue” and “Sand Dunes” sequences have what Sobchack terms a larger number of “‘synch points’ (‘salient’ and ‘prominent’ moments of audio-visual synchrony),” such as lines of dialogue, bullets pinging off metal and mortar shells landing, and these sonic cues “are firmly attached in a logically causal—and conventionally ‘realistic’—relation to the image’s specificity” (2005, 6–7). These synchronised sounds are “not as acousmatically imaginative and generative” as we contend that the subjective sound in the “Wounded Man” sequence is because the sounds appear to be “generated from the physical action” seen on the screen (Sobchack 2005, 7). In the “Indistinct Dialogue” and “Sand Dunes” conversation sequences, our eye tracking experiment did not necessarily reveal greater attentional focus with dialogue. While this counters what Coutrot and colleagues found in their 2012 study and Smith’s finding that sound reinforces visual synchrony (Smith 2013), it is in line with our expectation that the unconventional use of indistinct dialogue and chaotic background sound and imagery would disperse attention. Perhaps dialogue is not something that focuses visual attention, but rather something that focuses engagement. When the dialogue is clear, the viewer is able to look around the screen and absorb other cues about context. Precisely because the linguistic meaning is clear, such expository dialogue does not require as many cognitive resources to process and leaves some free for assessing other audio-visual cues. However, when the dialogue is indistinct, the viewer must then use other cues to work out the importance of the speech; in such cases the audience is essentially in the same position as watching without sound—although they may even be worse off in terms of cognitive resource allocation because there is also a barrage of other sound being processed in concert with the visual stimuli.

Overall, our use of eye tracking in conjunction with aesthetic analysis in our investigation of Saving Private Ryan has supported Coutrot and colleagues’ 2012 findings that dispersion (the degree of variability between observers’ eye positions) was lower with sound than without, so sound generally acted to concentrate perceptual attention. However, unlike Coutrot et al., we teamed eye tracking with qualitative film analysis to explore the effect of aesthetic variation and individual differences on gaze patterns as well as to identify common psychophysiologically governed patterns of attention. In this exploratory study, we found that differences in aesthetic techniques within segments of footage in the same film scene do make a difference to the audience’s gaze patterns and attentional fixation, and we found that within these patterns individual subjects exhibited divergent perceptual processes as well. Although our study is more restricted than comparable work undertaken by Coutrot and others, our attention to screen aesthetics and to variations in subjects’ responses within a single scene affords our method broader explanatory power than a study that excludes outliers and looks for commonalities across a wide range of video styles and genres.

 

References

Alexander Nevsky. Directed by Sergei Eisenstein, 1938. Mosfilm, DVD.

Augoyard, Jean-Francois, and Henry Torgue. 2005. Sonic Experience: A Guide to Everyday Sounds. Translated by Andra McCartney and David Paquette. Montreal: McGill Queen’s University Press.

Berliner, Todd. 2010. Hollywood Incoherent: Narration in Seventies Cinema. Austin: University of Texas Press.

Bordwell, David. 2009. “Cognitive Theory.” In Routledge Companion to Philosophy and Film, edited by Paisley Livingston and Carl Plantinga. 356–367. London: Routledge.

Coulthard, Lisa. 2013. “Dirty Sound: Haptic Noise in New Extremism.” In The Oxford Handbook of Sound and Image in Digital Media, edited by Carol Vernallis, Amy Herzog and John Richardson. 115–126. New York: Oxford University Press.

Coutrot, Antoine, Gelu Ionescu, Nathalie Guyader and Bertrand Rivet. 2011. “Audio Tracks do not Influence Eye Movements when Watching Videos.” Paper presented to the 34th European Conference on Visual Perception, Toulouse, France, August 30.

Coutrot, Antoine, Nathalie Guyader, Gelu Ionescu and Alice Caplier. 2012. “Influence of Soundtrack on Eye Movements During Video Exploration.” Journal of Eye Movement Research 5.5: 1–10.

Coutrot, Antoine and Nathalie Guyader. 2014. “How Saliency, Faces, and Sound Influence Gaze in Dynamic Social Scenes.” Journal of Vision 14.8: 5.

Duchowski, Andrew T. 2007. Eye Tracking Methodology: Theory and Practice. Dordrecht: Springer.

Gallese, Vittorio. 2013. “Mirror Neurons, Embodied Simulation and a Second-person Approach to Mind-reading.” Cortex in press: 1–3. Accessed August 28, 2014, http://dx.doi.org/10.1016/j.cortex.2013.09.008

Gallese, Vittorio and Michele Guerra. 2012. “Embodying Movies: Embodied Simulation and Film Studies.” Cinema: Journal of Philosophy and the Moving Image 3: 183–210.

Hasson, Uri, Ohad Landesman, Barbara Knappmeyer, Ignacio Vallines, Nava Rubin and David J. Heeger. 2008. “Neurocinematics: The Neuroscience of Film.” Projections 2.1: 1–26.

Heimann, Katrin, Maria Alessandra Umiltà, Michele Guerra and Vittorio Gallese. 2014. “Moving Mirrors: A High-density EEG Study Investigating the Effect of Camera Movements on Motor Cortex Activation during Action Observation.” Journal of Cognitive Neuroscience 26.9: 2087–2101.

Kozloff, Sarah. 2000. Overhearing Film Dialogue. Berkeley: University of California Press.

Land, Michael, Neil Mennie and J. Rusted. 1999. “The Roles of Vision and Eye Movements in the Control of Activities of Daily Living.” Perception 28.11: 1311–1328.

Lang, Annie. 2000. “The Limited Capacity Model of Mediated Message Processing.” Journal of Communication 50.1: 46–70.

Lang, Annie, Shuhua Zhou, Nancy Schwartz, Paul D. Bolls and Robert F. Potter. 2000. “The Effects of Edits on Arousal, Attention, and Memory for Television Messages: When an Edit is an Edit Can an Edit be too Much?” Journal of Broadcasting & Electronic Media 44.1: 94–109.

Lang, Annie, Ya Gao, Robert F. Potter, Seungjo Lee, Byungho Park and Rachel L. Bailey. 2014. “Conceptualizing Audio Message Complexity as Available Processing Resources.” Communication Research, published online before print. Accessed September 28, 2014, doi: 10.1177/0093650213490722

Marchant, Paul, David Raybould, Tony Renshaw and Richard Stevens. 2009. “Are you seeing what I’m seeing? An Eye-tracking Evaluation of Dynamic Scenes.” Digital Creativity 20.3: 153–163.

McGurk, Harry and John MacDonald. 1976. “Hearing Lips and Seeing Voices.” Nature 264.5588: 746–8. doi:10.1038/264746a0.

Mital, Parag, Tim J. Smith, Robin Hill and John Henderson. 2011. “Clustering of Gaze During Dynamic Scene Viewing is Predicted by Motion.” Cognitive Computation 3: 5–24.

Plantinga, Carl. 2009. Moving Viewers: American Film and the Spectator’s Experience. Berkeley: University of California Press.

Psycho. Directed by Alfred Hitchcock, 1960. Shamley Productions, DVD.

Rear Window. Directed by Alfred Hitchcock, 1954. Paramount, DVD.

Redmond, Sean, Sarah Pink, Jane Stadler, Jenny Robinson, Andrea Rassell and Darrin Verhagen. 2015 (forthcoming). “Seeing, Sensing Sound: Eye Tracking Soundscapes in Saving Private Ryan and Monsters Inc.” In Making Sense of Cinema: Empirical Studies into Film Spectators and Spectatorship, edited by CarrieLynn D. Reinhard and Christopher J. Olson. New York: Bloomsbury.

Remael, Aline. 2003. “Mainstream Narrative Film Dialogue and Subtitling.” The Translator 9.2: 225–247.

Saving Private Ryan. Directed by Steven Spielberg. 1998. Dreamworks/Paramount. DVD.

Shimamura, Arthur, ed. 2013. Psychocinematics: Exploring Cognition at the Movies. New York: Oxford University Press.

Sita, Jodi. 2014. Personal Communication. 19 June 2014. Australian Catholic University: Victoria, Australia.

Smith, Tim J. 2014. “Audiovisual Correspondences in Sergei Eisenstein’s Alexander Nevsky: A Case Study in Viewer Attention.” In Cognitive Media Theory (AFI Film Reader), edited by Paul Taberham and Ted Nannicelli. 85–105. New York: Routledge.

Smith, Tim J. 2013. “Watching You Watch Movies: Using Eye Tracking to Inform Cognitive Film Theory.” In Psychocinematics: Exploring Cognition at the Movies, edited by Arthur P. Shimamura. 165–191. New York: Oxford University Press.

Sobchack, Vivian. 2005. “When the Ear Dreams: Dolby Digital and the Imagination of Sound.” Film Quarterly 58.4: 2–15.

Song, Guanghan, Denis Pellerin and Lionel Granjon. 2011. “Sound Effect on Visual Gaze When Looking at Videos.” In 19th European Signal Processing Conference. 2034–2038. Barcelona: EUSIPCO 2011.

Tatler, Benjamin. 2014. “Eye Movements from Laboratory to Life.” In Current Trends in Eye Tracking Research, edited by Mike Horsley, Matt Eliot, Bruce Allen Knight and Ronan Reilly. 17–35. London: Springer.

Võ, Melissa, Tim J. Smith, Parag Mital and John Henderson. 2012. “Do the Eyes Really Have it? Dynamic Allocation of Attention when Viewing Moving Faces.” Journal of Vision 12.13(3): 1–14. http://www.journalofvision.org/content/12/13/3.full

Yip, Man-Fung. 2014. “In the Realm of the Senses: Sensory Realism, Speed, and Hong Kong Martial Arts Cinema.” Cinema Journal 53.4: 76–97.

 

List of figures

Figure 1: Nine Areas of Interest (AOI), Saving Private Ryan: Still from Saving Private Ryan (Steven Spielberg, 1998), data illustrated by the authors

Figure 2a: Gaze plots of four subjects for “Indistinct Dialogue” (Sequence 1): Still from Saving Private Ryan (Steven Spielberg, 1998), data illustrated by the authors

Figure 2b: Gaze plots of four subjects for “Indistinct Dialogue” (Sequence 1): Still from Saving Private Ryan (Steven Spielberg, 1998), data illustrated by the authors

Figure 3: Areas of Interest for subject 2 in “Wounded Man” (Sequence 2): Still from Saving Private Ryan (Steven Spielberg, 1998), data illustrated by the authors

Figure 4: Total fixation duration in nine AOIs for “Wounded Man” (all subjects): Graphs produced by the authors

Figure 5: Areas of Interest for subject 4 in “Sand Dunes: In Command” (Sequence 3): Still from Saving Private Ryan (Steven Spielberg, 1998), data illustrated by the authors

Figure 6: Total fixation duration for nine AOIs in “Sand Dunes: In Command” (all subjects): Graphs produced by the authors

 

Notes

[i] The two preliminary sound-based eye tracking studies preceding Coutrot et al.’s 2012 publication are a conference presentation by Coutrot et al. (2011), and a conference paper by Song, Pellerin, and Granjon (2011). However, in 2012 Melissa Võ and colleagues also published a study that investigated the effects on attention to faces in videos when the auditory speech track was removed. This study found that when speech was not present, observers’ gaze allocation changed: they looked more at the scene background and decreased fixations to faces generally and especially decreased concentration on the mouth region (Võ et al. 2012, 12).

[ii] A study of everyday attention indicates that people exhibit visual search behaviours that anticipate, locate, and monitor action, which is evidence of top down influences on visual perception (see Land et al. 1999).

[iii] Tim Smith states that “The degree of attentional synchrony observed for a particular movie frame will vary depending on whether it is from a Hollywood feature film or from unedited real-world footage, the time since a cut and compositional details such as focus or lighting but attentional synchrony will always be greater in moving images than static images” (2014, 90).

[iv] The lip-reading phenomenon is called the “McGurk effect” (see McGurk and MacDonald 1976).

[v] For further discussion of central areas of interest in Saving Private Ryan, see Redmond et al. (2015).

[vi] Established formulae for dispersion and other measures of individual variation in gaze pattern exist (e.g., Coutrot et al. 2012). As an exploratory study, we were limited by both the number of subjects and post hoc data analysis. This distribution estimate was a sufficient way to capture dominant and non-dominant viewing. However, we would recommend future research develop a better variance measure of asynchronous viewing, such as the Kullback-Leibler divergence formula referred to above.

[vii] Note that similar results were obtained in a related study of a sequence earlier in the beach-landing scene that depicts Captain Miller’s experience of shellshock (Redmond et al. forthcoming 2015).

[viii] A neuroimaging study comparing responses to film clips ranging from a sequence directed by Alfred Hitchcock to a segment of actuality footage shot in Washington Square Park found that higher levels of aesthetic control generate greater viewer synchrony or inter-subject correlation in the audience’s viewing patterns and brain activity (Hasson et al. 2008, 15).
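
As a supplement to note [vi], the following is a minimal sketch of how a Kullback-Leibler divergence could be used to compare how fixation time is distributed across the nine AOIs in two viewing conditions. It is one possible formulation and not a measure used in this study: the function name `kl_divergence`, the small smoothing constant `eps`, and the fixation times are all hypothetical.

```python
from math import log


def kl_divergence(p_times, q_times, eps=1e-9):
    """KL(P || Q) in nats between two fixation-time profiles over the same AOIs."""
    if len(p_times) != len(q_times):
        raise ValueError("both profiles must cover the same AOIs")
    # Add a tiny constant so AOIs with zero fixation time do not cause log(0)
    # or division by zero (an assumed smoothing choice, not from the study).
    p = [t + eps for t in p_times]
    q = [t + eps for t in q_times]
    p_total, q_total = sum(p), sum(q)
    # Normalise each profile to proportions, then sum p * log(p / q) over AOIs.
    return sum(
        (pi / p_total) * log((pi / p_total) / (qi / q_total))
        for pi, qi in zip(p, q)
    )


# Hypothetical seconds fixated in AOIs 1-9 by one subject (invented values).
sound_off = [0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 0.0, 1.5, 1.0]
sound_on = [0.3, 0.4, 0.2, 0.5, 1.2, 0.8, 0.3, 0.5, 0.3]

divergence = kl_divergence(sound_on, sound_off)
same = kl_divergence(sound_on, sound_on)
print(f"KL(on || off) = {divergence:.3f}, KL(on || on) = {same:.3f}")
```

The value is zero when the two profiles are identical and grows as gaze is allocated differently across AOIs. Note that the measure is asymmetric, so the direction of comparison would need to be fixed in advance.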

 

Bios

Dr Jennifer Robinson is Lecturer in Public Relations, School of Media and Communication at RMIT University. She authors industry reports and has published in J Advertising, BMC Public Health, J Interactive Marketing and the J Public Relations Research. Her media effects research investigates new media and media audiences using neuro-measures.

Jane Stadler is Associate Professor of Film and Media Studies, School of Communication and Arts at the University of Queensland. She is author of Pulling Focus: Intersubjective Experience, Narrative Film and Ethics, and co-author of Screen Media and Media and Society.

Andrea Rassell is a PhD student and Research Assistant in the School of Media and Communication at RMIT University. She has a professional background in both science and film and researches at the nexus of the two disciplines.