Sound and Sight: An Exploratory Look at Saving Private Ryan through the Eye Tracking Lens – Jennifer Robinson, Jane Stadler and Andrea Rassell


Using eye tracking as a method to analyse how four subjects respond to the opening Omaha Beach landing scene in Saving Private Ryan (Steven Spielberg, 1998), this article draws on insights from cinema studies about the types of aesthetic techniques that may direct the audience’s attention along with findings about cognitive resource allocation in the field of media psychology to examine how viewers’ eyes track across film footage. In particular, this study examines differences when viewing the same film sequences with and without sound. The authors suggest that eye tracking on its own is a technological tool that can be used both to reveal individual differences in experiencing cinema and to find psychophysiologically governed patterns of audience engagement.


Steven Spielberg’s Saving Private Ryan (1998) begins at a geriatric pace, ambling alongside an elderly World War II veteran as he visits a military cemetery and begins to reminisce about the men who saved his life during the Battle of Normandy in June, 1944. This is where the story really starts, with a platoon of terrified, seasick servicemen led by Captain John Miller (Tom Hanks) landing on Omaha Beach where they come under heavy fire by German infantry. The Omaha Beach landing scene is gruelling in its experiential intensity as the hand-held camera locates the audience alongside soldiers desperately fighting their way toward the enemy line amidst relentless machine gunfire and bone-shuddering explosions that tear them limb from limb.

An interdisciplinary 2014 study by Vittorio Gallese (one of the scientists credited with the discovery of mirror neurons), fellow neuroscientists Katrin Heimann and Maria Alessandra Umiltà, and film scholar Michele Guerra investigated the effects of camera movement on the audience’s feeling of involvement in film scenes and their ability to place themselves in the position of a screen character. This study was conducted using a high-density electroencephalogram (EEG) to test whether the audience’s experience of what Gallese (2012, 2013) terms “embodied simulation” (that is, neural mirroring responses that are associated with empathy) is affected by camera movement as well as by the action of human figures on screen. The researchers found that the relationship between cognition and action perception is significantly influenced by camera movement and that the use of camera techniques such as steadicam elicits stronger mirroring responses and an augmented sense of involvement in the scene because this type of cinematography more closely resembles human movement than static camera, zooms, or dolly-mounted tracking shots (Heimann et al. 2014, 2098–99).

These findings are consistent with eye tracking studies by Paul Marchant and colleagues who have demonstrated that the audience’s visual attention is captured and guided by mobile framing, focus, the direction of screen characters’ movement and lines of sight, and the colour and motion of other aspects of the mise-en-scene (Marchant et al. 2009, 157–58). This interplay of figure movement and the technical and aesthetic dimensions of cinematography is relevant to Saving Private Ryan in that the arresting beach landing scene at the start of the film is shot almost exclusively using hand-held camera to simulate human movement. The study by Heimann and colleagues suggests that this form of camera movement, teamed with the panicked motion of the figures on screen, functions to elicit a sense of affective identification with Captain Miller and the soldiers he leads by stimulating a shared experience of embodied confusion and sensory overload as the military men shake with fear and scramble to dodge the shrapnel ricocheting across the war-ravaged beach. The unstable gaze of the constantly moving camera makes it as difficult for the audience as it is for the soldiers in the scene to focus attention or see a pathway to safety and this shared perceptual experience may elevate neural mirroring responses or empathic concordance with observed actions.

Venturing into an area that has received less attention from film scholars, media effects researchers and neuroscientists alike, we were struck by the acoustic ferocity of the Omaha Beach scene and we sought to understand the ways in which sound functions as a perceptual cue that may affect the cinema audience’s attention and modulate gaze patterns. This interdisciplinary study brings empirical eye tracking research into dialogue with formalist understandings of film style and cognitive engagement with narrative, using the following question to establish a framework for analysis: What audio-visual aesthetic cues guide the audience’s attention and what psychophysiological processes underlie audience responses to the screen? In particular, we draw on existing research on film dialogue by Todd Berliner and others and we supplement eye tracking data by drawing on Lisa Coulthard’s concept of “dirty sound,” Vivian Sobchack’s work on the “sonic imagination” and cognitivist methods of aesthetic film analysis to work through the experiential dimensions of the sonic confusion generated in the scene.

Cognitivist film theory, as advanced by scholars such as David Bordwell and Carl Plantinga, conceptualises film and television spectatorship as the active construction of meaning via the inferential elaboration of perceptual cues and formal screen production conventions. In a quest for greater explanatory power and a more holistic understanding of spectatorship that moves beyond rational thought and conscious inferential processes, film theorists are increasingly drawing upon empirical research in fields such as neuroscience, psychology and media effects to test assumptions about how audiences perceive and respond to screen texts, and to account for the sensory experiences and involuntary physiological reactions of the audience.

Psychophysiological Approaches to Cinema Studies

There are several different empirical approaches to studying audience members’ responses to film, including biometrics, neuroimaging, and psychophysiological techniques. Psychophysiology is an area of research that quantifies physiological or bodily responses to psychological states. Neurocinema (Hasson et al. 2008) and Psychocinematics (Shimamura 2013) are emerging fields that connect these psychophysiological methods to cinematic experience. Where the neurocinematic approach involves imaging of the brain while watching cinema, in psychophysiology the subject’s physiological state is understood to be representative of psychological responses (for example, skin conductance and heart rate indicate arousal or an emotional reaction). One such response is an involuntary orienting response that assigns cognitive resources to processing stimuli in screen texts automatically.

Annie Lang (Lang 2000; Lang et al. 2000) proposes a model of responding to dynamic screen media that starts from the position that there are limited cognitive resources that any individual can bring to bear when processing mediated content. Features of the screen content can automatically consume some of those cognitive resources, which leaves less capacity for the intentional interpretation of meaning, formulation of hypotheses or speculation about protagonists’ motives (the very processes that cognitive film theory privileges). While this has been well developed for visual attributes, such as hard edits, movement and new features, Lang and colleagues are developing a similar catalogue of attributes for aural content (sound). Using a physiological indicator of an orienting response (a short, rapid decrease in heart rate just after the feature is introduced), they have identified “voice changes, music onsets, sound effect onsets, production effect onsets, emotional word onsets, silence onsets, and voice onsets” as aural cues that orient attention (Lang et al. 2014, 4).

Embodied responses to film are not necessarily indicative of cognitive processing as some responses occur in the autonomic nervous system (such as the startle response to a loud sound or a sudden movement); other processes involve the conscious allocation of cognitive resources. For example, seeing a poisonous reptile on screen can make the audience form hypotheses about impending danger, which can then prime emotional reactions such as anxiety. Increased heart rate during the shower scene in Psycho (Alfred Hitchcock, 1960), or light perspiration on the palms as viewers watch Grace Kelly fossick through the neighbour’s apartment in Rear Window (Alfred Hitchcock, 1954), are widely understood to be biological evidence of changes in psychological states in response to cinema. In such a state of arousal hormones are released, blood pressure rises, and brain wave patterns shift. These biological changes can be recorded using non-invasive techniques and have proven to be stable markers of psychophysiological changes. Some commonly used psychophysiological measures are eye tracking, Galvanic Skin Response (GSR), and pupillometry; however, in this exploratory study, only eye tracking has been used.

Eye Tracking

Eye tracking is a technique that can measure the movements of the eye by gauging the direction of infrared light bounced off the eye surface. The most common technique utilises the eye’s physiology to create different reflections of the light source from the pupil and cornea that are captured by two cameras and used to track the gaze and control for head and eye movements. While there are several types of eye tracking devices, those most pertinent to this study include eye trackers that require the viewer to be in a fixed position such as seated in front of a monitor, and those that can be head-mounted or worn like glasses by a mobile viewer. Eye tracking devices are used in a wide variety of fields including marketing, sports coaching and user experience. The range of Tobii Technology eye trackers is frequently employed as a research tool to measure attention, as is the case in this study. Two of the main characteristics of eye movement that can be measured by eye tracking devices are saccades and fixations.


In order to collect high-quality visual data about our environment, the eye needs to be constantly redirected. We use movements called saccades in order to do this. Saccades occur at a rate of about 2-3 per second (Tatler 2014) and can be voluntary or reflexive (Duchowski 2007). Their duration ranges from 10-100 ms, rendering the individual effectively blind during this time, but not for long enough to be perceivable: “Visual sensitivity effectively shuts down during a saccade via a process known as saccadic suppression, in order to ensure that the rapid movement of light across the retina is not perceived as motion blur” (Smith 2014, 86).


A fixation is a length of time when the eyes stop large movements (saccades) and stay focused on a small visual range (typically about 5 degrees). Fixations should not be thought of as static, as the name implies, but as “miniature eye movements: tremor, drift and microsaccades” (Duchowski 2007, 46). Their duration is usually in the range of 150-600 ms (Duchowski 2007) and most visual information is processed when the eyes stabilize or fixate on a point on the screen (Smith 2014, 86).
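The distinction between fixations and saccades can be illustrated with a minimal sketch. It assumes a simple dispersion-style rule: consecutive gaze samples are grouped into one fixation while they stay close to the group's running centroid. The function names, the 100 ms minimum duration and the sample format are illustrative assumptions, and the 35-pixel radius is borrowed from the description of the Tobii default later in this article; this is not the filter implemented in Tobii Studio.

```python
import math

def _centroid(group):
    """Mean (x, y) position of a group of (t_ms, x, y) samples."""
    n = len(group)
    return (sum(s[1] for s in group) / n, sum(s[2] for s in group) / n)

def _close(group, fixations, min_duration_ms):
    """Store the group as a fixation if it lasted long enough (assumed threshold)."""
    if not group:
        return
    duration = group[-1][0] - group[0][0]
    if duration >= min_duration_ms:
        cx, cy = _centroid(group)
        fixations.append(
            {"start_ms": group[0][0], "duration_ms": duration, "x": cx, "y": cy}
        )

def detect_fixations(samples, max_dist_px=35.0, min_duration_ms=100.0):
    """Group time-ordered (t_ms, x, y) gaze samples into fixations.

    A sample further than max_dist_px from the current group's centroid is
    treated as the start of a new group (i.e. a saccade has occurred).
    """
    fixations, group = [], []
    for sample in samples:
        if group and math.dist(sample[1:], _centroid(group)) > max_dist_px:
            _close(group, fixations, min_duration_ms)
            group = []
        group.append(sample)
    _close(group, fixations, min_duration_ms)
    return fixations

# Hypothetical gaze samples: two stable clusters separated by one large jump.
samples = [
    (0, 100, 100), (50, 102, 101), (100, 101, 99), (150, 100, 100),
    (200, 400, 300), (250, 401, 301), (300, 399, 299), (350, 400, 300),
]
fixations = detect_fixations(samples)
```

With these hypothetical samples, the jump between 150 ms and 200 ms is treated as a saccade and the two clusters are returned as separate fixations.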

Previous findings

A consistent finding from eye tracking research that is relevant to this study of cinema is that when scenes are viewed on a screen or a monitor, the gaze tends to fixate at the centre more than the periphery, even when salient features are not located in the middle of the frame. Because this tendency may be adaptive (for example the centre is a good resting place for fast response to new action that requires attending to), rather than solely visual, Benjamin Tatler (2014) warns against a reductive expectation that these fixations are caused by visual stimuli alone. While this study attends closely to visual stimuli and the aesthetic techniques used by filmmakers to direct attention, we also consider aural stimuli and involuntary biological responses.

Despite the large body of eye tracking research, Antoine Coutrot and colleagues claim that until recently, only two preliminary studies had investigated the influence of sound on eye movements and patterns of attention when watching film or video footage (Coutrot et al. 2012, 2).[i] When studying eye movements in response to the presence and absence of sound in audiovisual stimuli, Coutrot et al. analyse differences in three further eye tracking metrics: dispersion, distance to centre, and Kullback-Leibler divergence. Dispersion refers to the “variability of eye positions between observers” (2012, 4). Distance to centre is a measurement of “the distance between the barycenter of a set of eye positions and the centre of the screen” (Coutrot et al. 2012, 4). Kullback-Leibler divergence “is used to estimate the difference between two probability distributions. This metric can be compared as a weighted correlation measure between two probability density functions… The lower the KL-divergence is, the closer the two distributions are… If soundtrack impacts on eye position locations, we should find a significant difference between the mean inter and intra KL-divergences” (Coutrot et al. 2012, 4). Dispersion provides information about the variability between eye positions, but does not determine the relative position of the two data sets of the eye positions for the two stimulus conditions (sound on/sound off). For the KL-divergence, it is the opposite.
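The three metrics described above can be sketched in a few lines. This is an illustration under stated assumptions rather than the computation used by Coutrot and colleagues: dispersion is taken here to be the mean pairwise Euclidean distance between observers' eye positions, the KL-divergence is computed over discrete distributions (for example, the share of gaze time per screen region) with a small smoothing constant, and all function names and sample values are hypothetical.

```python
import math
from itertools import combinations

def dispersion(points):
    """Mean pairwise Euclidean distance between eye positions (assumed definition)."""
    pairs = list(combinations(points, 2))
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)

def distance_to_centre(points, screen_w, screen_h):
    """Distance between the barycentre of the eye positions and the screen centre."""
    bx = sum(x for x, _ in points) / len(points)
    by = sum(y for _, y in points) / len(points)
    return math.dist((bx, by), (screen_w / 2, screen_h / 2))

def kl_divergence(p, q, eps=1e-9):
    """KL(P || Q) for two discrete distributions of equal length.

    eps is a smoothing constant (an assumption) that avoids division by zero
    when Q assigns no probability to a region that P uses.
    """
    return sum(
        pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0
    )

# Hypothetical eye positions of four observers on a 1280 x 720 frame.
sound_on = [(640, 360), (650, 350), (630, 370), (645, 365)]
sound_off = [(640, 360), (900, 200), (300, 500), (700, 600)]

print(dispersion(sound_on) < dispersion(sound_off))
print(distance_to_centre(sound_on, 1280, 720))
```

A lower dispersion value means the observers' gazes are clustered more tightly; a KL-divergence near zero means the two gaze distributions occupy similar locations.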

Coutrot and colleagues (2012) found that eye movements follow a consistent pattern that is involuntary and that is not affected by screen aesthetics, narrative content, genre, sound or other factors in the first second following an edit. After a brief latent phase, the eye automatically refocuses on the centre of the screen after a cut and takes a second to adjust to the new image. Thereafter, they found that sound does influence gaze patterns in the following ways: dispersion is lower in the sound on condition than the sound off condition; fixation locations are different between the two conditions; sound results in larger saccades than the same footage without sound; and sound elicits longer fixations than sound off (Coutrot et al. 2012, 8).

More recently, Coutrot and Guyader found that “removing the original soundtrack from videos featuring various visual content impacts eye positions increasing the dispersion between the eye positions of different observers and shortening saccade amplitudes” (2014, 2). This study also found that in dialogue scenes, the audience’s attention tends to “follow speech turn taking more closely” (Coutrot and Guyader 2014, 1). A 2014 study by Tim Smith also investigated the cross-modal influences of audio on visual attention and found that “When the visual referent is present on the screen, such as the face of a speaker (that is, a diegetic on-screen external sound source), gaze will be biased towards the sound source, and towards the lips if the audio is difficult to interpret” (Smith 2014, 92). This accords with research in film studies into dialogue and conversation in movies. For instance, Berliner notes that movie dialogue is typically scripted to advance the narrative by directing the audience’s attention to key plot points and protagonists;[ii] furthermore, “characters in Hollywood movies communicate effectively and efficiently through dialogue” and “movie characters tend to speak flawlessly” (Berliner 2010, 191). Similarly, Aline Remael identifies the promotion of narrative continuity and textual cohesion as two of the chief functions of film dialogue (2003, 227; 233). Given these findings from two different fields of research, we pay particular attention to gaze patterns during dialogue exchanges in the analysis of Saving Private Ryan that follows.


Building on previous work by Tim Smith, Antoine Coutrot, Nathalie Guyader and other researchers who have used eye tracking to investigate attentional synchrony[iii] (as illustrated in gaze plots and heat maps that represent the concentration of the audience’s gaze), our methodology examines the distribution of fixations across nine smaller central Areas of Interest (AOIs) during film sequences to explore what is occurring for viewers who are not following the predicted pattern and instead are searching for something else. Using two conditions as stimuli (film with sound on, and film with sound off), we conducted a qualitative comparison between and within the viewing patterns of four subjects. Within the limitations of a qualitative and exploratory study with only four subjects, we drilled down to conduct a fine-grained mapping of attention to determine whether it functions in a predictable way in relation to previous findings about dialogue scenes, sonic cues and attention in relation to camera and figure movement.

For the purposes of this study, a Tobii X-120 eye tracker and Tobii Studio 2.3.2 software (Tobii Technology, Stockholm, Sweden) were used to record seven individual subjects (five females, two males) as they watched film footage. As this was an exploratory study, subjects were recruited from the researchers’ networks, with ethics approval. They were seated and positioned 55-65 cm away from the eye tracker for viewing. All subjects were recorded on the same Tobii X-120, in the same room, with the film footage played on the Tobii computer to standardise start times for all subjects to enable comparison in later analysis. Each subject was successfully calibrated by looking at symbols in different areas of the screen, which ensures the eye tracker gets a reliable measure of gaze location across the whole screen. After the viewing session, each subject’s data was analysed for quality, with three subjects excluded because one condition had segments with lower reliability than desired. Thus, the results are reported on the basis of four subjects with high quality and complete data (three females, one male).

We analysed the areas of the screen where subjects looked while watching discrete sequences of the key beach-landing scene at the beginning of Saving Private Ryan. We investigated how different stylistic techniques employed in the following four consecutive sequences of the scene affect the audience’s gaze patterns:

  1. The “Indistinct Dialogue” sequence is an 11-second clip that was chosen with a view to finding out how the audience’s attention is affected when dialogue is overridden by chaotic background noise, forcing the audience to strain to decipher what is being said. This part of the scene occurs immediately after Captain Miller has located the men under his command. The first shot is an unsteady medium close-up of Miller shouting to Horvath (Tom Sizemore) as bullets splash in the surf around him and ping off the metal structure he is crouching behind. Miller yells, “Sergeant Horvath! (Explosion.) Move your men off the beach! (Water splashes up noisily.) Now!” The next shot shows Horvath’s response in a hand-held medium close-up as he points at his men and hollers, “OK you guys, get on my ass! (Directional hand signals as bullet hits metal and drowns out dialogue.) Follow me!” (Horvath ducks as a mortar shell explodes, screen right.)
  2. The “Wounded Man” sequence that occurs as the men move up the beach is a 30-second segment that is noteworthy because it includes a subjective sequence that solicits audience engagement with Captain Miller’s experience of temporary hearing loss following the concussive impact of a mortar shell nearby. This clip begins with a hand-held long shot of carnage on the beach as Miller moves to the right, dragging Briggs, a wounded soldier he is trying to help. The audience hears artillery fire, crashing, splashing, and shouting as Miller lugs Briggs into the mid-ground, with explosions and debris visible in the foreground. As mortar shells hit, spraying blood and water upwards, Miller hollers for a medic. Following a massive explosion, the sound of gunfire in the background is muted and is replaced with the subdued drone of a low, echoing, wind-like sound that communicates Miller’s subjective experience of shellshock as sand obscures everything and Miller falls to the ground. In slow motion, we cut to a low-level close-up of boots in the sand. Miller scrambles up and the hand-held camera follows him. Other soldiers pass in front of the camera, occluding the lens and masking the edit. The sounds of the battlefield are replaced by an echoing, low frequency droning noise and the subdued clink of military gear as Miller is momentarily dazed by shellshock. As he gets up and grabs Briggs’s arm, the sound of artillery returns loudly and we see Miller in long-shot framed with a low-level camera. He staggers, looks back, and realises that Briggs is dead: his lower abdomen and legs have been blasted away. This sequence ends with a close-up of Miller’s reaction as he looks at Briggs in shock, abandons him, crawls away from the camera, then stands and runs toward the sand dunes, into enemy fire and gun smoke.
  3. The “Sand Dunes” sequence is one unusually long and complex shot that lasts for a full minute; however, in the interests of generating a more granular analysis we divided the shot in two. The first 25-second segment, “Sand Dunes: In Command,” begins with a match on action as the body of a soldier that was catapulted into the air by a grenade in the previous shot now hits the ground. Quickly, the hand-held camera tilts down from the long-shot of the falling soldier, pans left, and follows Miller forward as he dives behind a ridge of sand for shelter. The camera pushes in to frame Miller in close-up as the dialogue begins:

Miller: (Turns left to address the radio operator.) Shore Party! No armour has made it ashore. We got no DD tanks on the beach. Dog One is not open. (Miller rolls to the right so he is framed in an over-the-shoulder shot as he shouts to other soldiers seen in medium-long shot on the dune.) Who is in command here?

Soldier: You are, Sir.

Miller: Sergeant Horvath!

Horvath: Sir!

Miller: Do you know where we are?

Horvath: Right where we’re supposed to be, but nobody else is …

  4. “Sand Dunes: Radio” is a 35-second continuation of the shot detailed in sequence three, beginning when the hand-held camera pans left as Miller rolls back toward the radio operator, facing the camera in close-up as he grabs the radio operator’s shoulder and hollers in his ear, straining to be heard over the background gunfire.

Soldier: (Distant, off screen, as Miller rolls to the left.) Nobody’s where they’re supposed to be!

Miller: (To radio operator) Shore Party! First wave ineffective. We do not hold the beach. Say again, we do not hold the beach. (Miller turns and rolls back towards the right, away from the camera. The camera zooms toward Horvath, excluding Miller from the frame as he listens to Horvath.)

Horvath: (Indistinct) We’re all mixed up, sir. We got the leftovers from Fox Company, Able Company and George Company. Plus we got some Navy Demo guys and a Beachmaster.

Miller: (The camera follows Horvath as he rolls to the left, toward Miller; we then see Miller in medium close-up as he turns back to radio operator.) Shore party! Shore party! (Realises radio operator is deceased; grabs hold of the radio himself.) Cat-F, Cat-F, C-… (Miller realises the radio is dead.)

In Overhearing Film Dialogue, Sarah Kozloff states that “although what the characters say, exactly how they say it, and how the dialogue is integrated with the rest of the cinematic techniques are crucial to our experience and understanding of every film since the coming of sound, for the most part analysts incorporate the information given by a film’s dialogue and overlook the dialogue as signifier” (2000, 6). By contrast, our analysis focuses closely on what the characters say, and also on what they hear. It may seem counterintuitive to be investigating the significance of sound in an eye tracking study, because sound is something that the eyes are not normally required to process. However, the Omaha Beach dialogue sequences are unusual because the audience has to rely on their eyes to search for contextual cues in order to fill in gaps in understanding due to indistinct vocals. Such cues include the direction of figure movement and eye lines (when Horvath yells, “Get on my ass and follow me!” but his words are obscured by background sound), or facial expressions and body language when words don’t fully make sense due to the inclusion of military terminology or unfamiliar radio communication codes and incomplete communicative exchanges (when Miller shouts “Dog-One” and “Cat-F” into the radio and receives no response). Coutrot and colleagues identify numerous instances in which aural and visual stimuli interact to affect attention (they offer as one example, “the help given by ‘lip reading’ to understanding speech, even more when speech is produced in poor acoustical conditions,” as is the case in Saving Private Ryan); furthermore, they report that “perceivers gazed more at the mouth as auditory masking noise levels increased” (Coutrot et al. 2012, 2).[iv] Our study builds on this work, as detailed below.

With respect to each of the four sequences outlined above, two different analyses were conducted. Given that most of the viewing was within the central area of the screen, we subdivided that area into a three by three grid providing nine smaller areas of interest, as illustrated in Figure 1.[v] The total time fixated, and mean fixation duration, was calculated for each of the nine Areas of Interest (AOI). For the default method on the Tobii eye tracking system that we used, a fixation is identified when the gaze is steadily focused in the same area of X-Y coordinates on the screen, typically occurring within 35 pixels. This technique of dividing the centre of the screen into nine smaller AOIs allowed greater granularity in determining the primary AOI where most people attended and the instances where individuals diverged and looked at other parts of the screen.

Figure 1: Nine Areas of Interest (AOI), Saving Private Ryan: Still from Saving Private Ryan (Steven Spielberg, 1998), data illustrated by the authors
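The AOI procedure described above can be sketched as follows. The sketch assumes that the central area is an axis-aligned rectangle split into a regular three by three grid, that AOIs are numbered 1 to 9 in row-major order (so that AOI 5 is the centre), and that fixations arrive as (x, y, duration) triples; the rectangle coordinates and fixation values are hypothetical and do not come from the study data.

```python
from collections import defaultdict

def aoi_index(x, y, left, top, width, height):
    """Return the AOI number (1-9, row-major; assumed numbering) for a gaze point.

    Returns None when the point falls outside the central area.
    """
    if not (left <= x < left + width and top <= y < top + height):
        return None
    col = min(2, int((x - left) * 3 // width))
    row = min(2, int((y - top) * 3 // height))
    return row * 3 + col + 1

def summarise(fixations, area):
    """Total time fixated and mean fixation duration per AOI.

    fixations: iterable of (x, y, duration_ms)
    area: (left, top, width, height) of the central region in pixels
    Returns {aoi: (total_ms, mean_ms)} for AOIs that received fixations.
    """
    durations = defaultdict(list)
    for x, y, dur in fixations:
        aoi = aoi_index(x, y, *area)
        if aoi is not None:
            durations[aoi].append(dur)
    return {aoi: (sum(d), sum(d) / len(d)) for aoi, d in durations.items()}

# Hypothetical central area and fixations (x, y, duration in ms).
central_area = (300, 200, 600, 300)
fixations = [(600, 350, 200), (610, 360, 400), (310, 210, 150), (10, 10, 300)]
summary = summarise(fixations, central_area)
```

In this hypothetical example the last fixation falls outside the central area and is ignored, which mirrors the observation that some gaze behaviour lies beyond the nine AOIs.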

The second step was to analyse whether the participants exhibited attentional synchrony and primarily looked at the same AOIs as each other or displayed individual variation. For this, we calculated an estimate of attentional distribution. For this exploratory study, a simple ratio of time spent following the guided (dominant) viewing pattern compared to time looking at other parts of the screen was calculated.[vi] The dominant AOIs were determined by which AOIs account for over 50% of fixation time in the sound on condition (control). As some sequences guide the viewer between several of the AOIs, the combination that yielded a majority of viewing time was used rather than simply the AOI with the greatest portion of time. The Distribution ratio was then calculated as follows:

$$\text{Distribution Ratio} = \frac{(\text{number of non-dominant AOIs viewed}) \times (\text{amount of time in those AOIs})}{(\text{number of AOIs in dominant view}) \times (\text{amount of time in the dominant AOIs})}$$
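A minimal sketch of this calculation follows. It assumes that fixation time has already been totalled per AOI for each condition, and that the dominant set is built by adding AOIs in descending order of sound-on fixation time until the set accounts for more than half of the total; that selection rule is one plausible reading of the description above, not a documented procedure, and the per-AOI times are hypothetical.

```python
def dominant_aois(time_by_aoi):
    """Smallest set of top AOIs holding over 50% of fixation time (assumed rule)."""
    total = sum(time_by_aoi.values())
    chosen, running = set(), 0.0
    ranked = sorted(time_by_aoi.items(), key=lambda kv: kv[1], reverse=True)
    for aoi, seconds in ranked:
        chosen.add(aoi)
        running += seconds
        if running > total / 2:
            break
    return chosen

def distribution_ratio(time_by_aoi, dominant):
    """(non-dominant AOIs viewed x time in them) / (dominant AOIs x time in them)."""
    non_dominant = {
        aoi: t for aoi, t in time_by_aoi.items() if aoi not in dominant and t > 0
    }
    dominant_time = sum(time_by_aoi.get(aoi, 0.0) for aoi in dominant)
    if dominant_time == 0:
        raise ValueError("no fixation time in the dominant AOIs")
    numerator = len(non_dominant) * sum(non_dominant.values())
    return numerator / (len(dominant) * dominant_time)

# Hypothetical fixation time in seconds per AOI for one subject and one sequence.
sound_on = {5: 6.0, 6: 2.0, 2: 1.0, 8: 1.0}
sound_off = {5: 4.0, 6: 2.0, 2: 2.0, 8: 1.0, 4: 1.0}

dominant = dominant_aois(sound_on)                     # {5}
ratio_on = distribution_ratio(sound_on, dominant)      # 2.0
ratio_off = distribution_ratio(sound_off, dominant)    # 6.0
difference = ratio_off - ratio_on                      # 4.0
```

Because the ratio multiplies the number of non-dominant AOIs by the time spent in them, it rises both when more time is spent away from the dominant AOIs and when that time is spread over more of the grid.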

Eye Tracking Results and Discussion

The distribution ratio was intended to see whether the findings of Coutrot et al. (2012) applied to what we saw in the sequences from the Omaha Beach landing scene. We hypothesised that a lack of sound would increase divergence away from the dominant AOI, compared to the control condition with sound that should reinforce “attentional synchrony” (Smith 2013) by guiding the viewer to the most important focal area. However, we expected that because some of the sequences we were examining contained many competing audio and visual contextual cues, we might not see such a clear distinction. Furthermore, we anticipated that our findings in relation to dialogue may diverge from Coutrot and Guyader (2014) because the Omaha Beach scene has atypical dialogue sequences that deviate from conventional turn-taking and are overloaded with noise and movement to create a sense of confusion.

We found that averaged across all sequences, three of the viewing subjects followed the expected pattern of greater divergence with sound off, as indicated by a positive difference score in Table 1. The fourth (Subject 3) had slightly higher gaze distribution when there was sound but did have an increase in the mean number of AOIs with fixations in the “sound off” condition. This would suggest that, on the whole, sound does function to focus attention more tightly. Given the small number of participants in this exploratory study, this result is encouraging.

Table 1: Distribution Ratios Arranged by Subject
*Note: A positive difference indicates that in line with Coutrot et al. (2012), there was greater distribution of attention across AOIs for the no sound condition. The higher distribution ratio when sound was off could be due to either more total time fixated away from the dominant AOI or having fixations in more of the nine AOIs, being more spread out, or a combination of both.

However, not all sequences of the beach scene elicited the same results. When the distribution data is averaged by sequence, it turns out that sequences 1 (d = -3.4) and 3 (d = -1.8) were strongly in the predicted direction with less distribution across AOIs when there was sound. For sequence 2 there was little difference (d < 1.0), but it was still in the predicted direction. However, for sequence 4 (d = 1.0), there was greater distribution of the fixations away from the dominant AOI in the sound on condition, indicating greater focus when there was no sound. Sequence 4 (“Sand Dunes: Radio”) does not follow screen conventions for shooting dialogue: it is shot in one long take rather than the customary shot-reverse-shot style and because many of Captain Miller’s lines contain military jargon and receive no response, the audience’s habituated expectations about turn-taking and shifting attention from speaker to speaker are derailed. Breaking with aesthetic and technical conventions may disrupt the cognitive process of meaning-making when watching film. Another, more physiologically based reason that the “Sand Dunes: Radio” sequence may not conform to the gaze distribution patterns found in other parts of the scene is that it is the continuation of a very long take (together the sand dune sequences constitute a single, minute-long shot that viewers watched unbroken); consequently, viewers’ eyes are not re-focused on the centre of the screen following a cut and their eyes have more time to rove and explore the visual field for other meaningful cues. Put another way, without any cuts to generate an orienting response during this sequence there is no automatic allocation of cognitive resources to the story or refocusing of attention back onto a particular portion of the screen (Lang 2000, 2014). It is then a very individual response to novel and emotive (signal) cues within the scene that drives where each subject looks over the duration of this sequence.
With so many different types of auditory cues that orient the viewer (Lang 2014), it is not surprising that viewers had fixations in the various AOIs we analysed on the screen.

In Saving Private Ryan, as has been found to be the case in other films such as Sergei Eisenstein’s 1938 historical war epic, Alexander Nevsky (see Smith 2014), the overall viewing patterns reflect the intention of the director in that audience members typically look where they are guided to by audio-visual screen conventions. Yet, an important reminder for further investigation is that not all members of an audience respond in the same way to each scene. This leads us to question what cues other than the lack of a sound track might lead to increased gaze distribution.

Even though the size of the nine AOIs and the number of participants were small, paired-sample t-tests were conducted to see whether there were any statistically significant differences at the level of each of the nine AOIs between the sound and no sound viewing experiences of the participants. No significant differences between the sound on and off conditions were found for fixation duration, total time fixated, visit duration or total time visiting any particular AOI for three of the sequences: “Indistinct Dialogue,” “Wounded Man,” or “Sand Dunes: Radio.” However, for “Sand Dunes: In Command,” subjects spent significantly (p < .05) less time looking at areas 5 and 6 when the sound was off (significantly greater mean fixation duration, total time fixated and total time visiting AOI 5 when the sound was on; only total time fixated on AOI 6 was greater when sound was present). Interestingly, this did not translate to an increase in any particular AOI, so it seems their gazes spread out (dispersed) in the sound off condition.
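
As an illustration of the paired-sample comparison described above, the sketch below computes a paired t statistic for four subjects and compares it with the two-tailed critical value for three degrees of freedom at the .05 level (3.182). The fixation times are hypothetical placeholders, not the study’s data, and the function name is an assumption.

```python
import math
import statistics

# Two-tailed critical value of t for alpha = .05 with df = 3 (four subjects).
T_CRIT_DF3 = 3.182


def paired_t(on, off):
    """Return (t, df) for a paired-sample t-test on two equal-length lists.

    Assumes the paired differences are not all identical, otherwise the
    standard deviation is zero and t is undefined.
    """
    diffs = [a - b for a, b in zip(on, off)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


# Hypothetical total time fixated on one AOI (seconds), one value per subject.
time_sound_on = [4.1, 3.8, 4.5, 3.9]
time_sound_off = [2.0, 2.4, 2.1, 1.9]

t, df = paired_t(time_sound_on, time_sound_off)
significant = abs(t) > T_CRIT_DF3
```

With only four subjects the test has three degrees of freedom, so a difference must be large and consistent across subjects to exceed the critical value, which is in keeping with the small number of significant results reported here.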

With a small sample size, it is not surprising that there were few statistical differences, but it was surprising that focusing on the central area as represented by the nine AOIs did not pick up what seemed “obvious” when looking at the aggregated gaze plots. For example, in the “Indistinct Dialogue” sequence (see Figure 2) there is an explosion on the right-hand side of the screen that equally drew the attention of the subjects when sound was off as well as on (the screen characters ducked as the mortar shell whistled in, so aural and visual cues reinforced each other). Although there was one subject who looked down to the lower right part of the screen outside our central area of view when sound was off, the overall pattern was consistent in both conditions.

Figure 2a (above) and Figure 2b (below): Gaze plots of four subjects for “Indistinct Dialogue” (Sequence 1): Still from Saving Private Ryan (Steven Spielberg, 1998), data illustrated by the authors

We explored whether eye tracking could help reveal differences in viewing experience amongst subjects or even offer insight into what was happening beneath the surface of apparent synchrony. An obvious finding is that each individual subject had a different gaze pattern across the four sequences sampled, as can be seen when examining the pattern as recorded for Subject 2 in Figure 4 and Figure 6. For the longer sequences, their gaze fixated in more than half of the AOIs, while for the shorter sequences they were often more focused on particular AOIs. However, this pattern could change with sound on and off. For example, when examining the pattern for Subject 1 for “Indistinct Dialogue,” there was a noticeable difference between sound on and off such that they only viewed three AOIs with sound, but with sound off they spread out to three new AOIs, ranging across six in total. The greatest shift was away from time fixated in the top left corner of our central area (when the footage was played with sound), contracting to the central third of the screen (without sound).
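
The count of AOIs a subject fixated in can be sketched as follows. This is an illustrative reconstruction rather than the authors’ analysis pipeline: it assumes the nine AOIs form a 3 × 3 grid numbered row by row from the top left (consistent with AOI 4 being middle left and AOI 9 bottom right), and the pixel bounds of the central area and the fixation coordinates are hypothetical.

```python
def aoi_for_point(x, y, left, top, width, height):
    """Return the AOI number (1-9, row by row from top left) containing (x, y).

    Returns None when the point lies outside the central area.
    """
    if not (left <= x < left + width and top <= y < top + height):
        return None
    col = int((x - left) * 3 // width)
    row = int((y - top) * 3 // height)
    return row * 3 + col + 1


def aois_visited(fixations, area):
    """Return the set of AOIs that received at least one fixation."""
    visited = set()
    for x, y, _duration in fixations:
        aoi = aoi_for_point(x, y, *area)
        if aoi is not None:
            visited.add(aoi)
    return visited


# Hypothetical central area: (left, top, width, height) in screen pixels.
central_area = (300, 150, 600, 450)

# Hypothetical fixations as (x, y, duration in seconds), not study data.
fixations = [(350, 200, 0.3), (620, 380, 0.5), (880, 590, 0.2), (100, 100, 0.4)]

visited = aois_visited(fixations, central_area)
```

Comparing the sets returned for the sound on and sound off recordings of the same subject shows which AOIs were newly fixated in one condition, which is the kind of comparison described for Subject 1.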

Sean Redmond and colleagues reported that the presence of sound only has an effect on fixation duration (number of fixations) for the “Wounded Man” sequence (forthcoming 2015). In the “Wounded Man” sequence, there was no overall difference in gaze location with sound on or sound off. However, the gaze fixation pattern for Subject 2 showed a large qualitative difference (see Figure 3). With sound off, Subject 2 only looked at AOIs 6, 8 and 9 (bottom right part of the central area, which is consistent with Miller’s screen direction and the action of falling to the ground and dragging the wounded soldier, Briggs, in this sequence). However, with sound on, Subject 2 fixates at least briefly in all nine AOIs, with the most time shifting to the centre of the screen where noisy background action is taking place and where other soldiers rapidly pass in front of the camera.

Figure 3: Areas of Interest for subject 2 in “Wounded Man” (Sequence 2): Still from Saving Private Ryan (Steven Spielberg, 1998), data illustrated by the authors

However, when comparing all of the subjects and how they responded to the “Wounded Man” sequence (see Figure 4), the other three subjects exhibit similar patterns of scanning across the AOIs when sound is on and off. This pattern is what we expected for this sequence, which incorporates a significant subjective sound component when Miller experiences shellshock and is temporarily stunned and deafened. It is possible that subjective sound may help to anchor the viewer’s attention to the character’s experience.[vii]

Figure 4: Total fixation duration in nine AOIs for “Wounded Man” (all subjects): Graphs produced by the authors

A final illustration is the “Sand Dunes: In Command” sequence. Subject 3 was interesting because their viewing patterns for the “Wounded Man” (sequence 2) and the “Sand Dunes: Radio” (sequence 4) were consistent with the other subjects; however, the focus for “Sand Dunes: In Command” (sequence 3) was inconsistent. There were clear AOIs for the sound off condition, but with sound their eyes wandered over more of the central area of the screen. Attention is focused on the radio operator in the sound off condition (as indicated by the red bar in AOI 4, middle left of Figure 5). However, with sound on (indicated by the blue bars), the subject’s attention extends to new AOIs, including corners (top left, bottom left, and bottom right) that were not fixated on when there were no sound cues.

Figure 5: Areas of Interest for subject 4 in “Sand Dunes: In Command” (Sequence 3): Still from Saving Private Ryan (Steven Spielberg, 1998), data illustrated by the authors

A comparison between how the subjects fixated during this segment when the sound was on and off (Figure 6) illustrates a similar pattern of having eyes fixate in different AOIs when sound was on and off, except for Subject 1. Given the length of this sequence, the focus on three main AOIs for all subjects in the sound off condition is interesting in its consistency. However, even though the fixation data averages out to no difference between sound on and off, each of our subjects had a different response when sound was on, ranging from fixating in all of the AOIs by Subject 4 to just staying longer on the same AOIs for Subject 1. The variation between sound on and off for this sequence may simply be an artefact of camera and figure motion, where shifts in the location of the protagonists’ faces on the screen can result in fixations in non-dominant AOIs (Mital et al. 2011, 19). However, the fact that these shifts did not occur in both conditions indicates that there is something different about those shifts when sound is on and the viewer can hear the dialogue. This is the only sequence where there was a significant difference in total time spent fixated in AOIs 5 and 6. The much lower time spent on key AOIs when sound was off suggests the subjects were looking to the periphery of the screen and did not look as much to the nine central AOIs.

Figure 6: Total fixation duration for nine AOIs in “Sand Dunes: In Command” (all subjects): Graphs produced by the authors

Relating Eye Tracking Findings to Film Aesthetics

Our qualitative, exploratory analysis of gaze patterns in Saving Private Ryan has used eye tracking to offer an empirical account of cognitive-perceptual processing that includes sound and attends closely to audio-visual cues in the film’s stylistic system. In this way we have sought both to redress the limitations of theoretical approaches to film analysis that privilege inferential cognitive processes and to counterbalance the tendency of empirical studies to neglect the role of screen aesthetics in informing audience responses. In particular, we have built on existing work on eye tracking by taking account of how the aesthetic and experiential deployment of sound might affect perceptual processing, given that sound waves have “palpable force,” which means that sound seems “more materialized, more concrete, and more present to our experience than what we can see” (Sobchack 2005, 7). We have worked from the premise that expectations regarding character and narrative are chief among the ways that screen texts engage audiences in the construction of meaning, yet we have also acknowledged that the process of meaning-making is informed by aesthetic cues and by physiological sense-making, which is in part involuntary. The dangerous and chaotic Omaha Beach scene is what Man-Fung Yip refers to as an “intensified sensory environment” in which, “as a concrete visual and visceral force rather than a mere vehicle for semiotic signification, film violence offers an intensity of raw, immediate sensation that powerfully engages the eye and body of the spectator” (2014, 78). Like Yip, our interest has been sustained by “the complex interplay between the capacity of the human body and the resources of the cinematic medium” (2014, 89).

In her acoustic study of extreme cinema, Lisa Coulthard refers to the use of “deliberate sonic imperfections” (2013, 115) in films in which “visual assaultiveness” is paired with a disturbing soundscape: “Capable of impacting the body in palpable ways, sound is mined in many of these films for its viscerality: as one listens to extremities of acoustic proximity, frequency and volume, one’s own body responds in subconscious ways to those depicted and heard on screen” (2013, 117). These insights about sound are pertinent to Saving Private Ryan in that the Omaha Beach scene is designed to bombard the audience with the relentless onslaught of noise and action that the characters themselves face. In analysing this scene, we began with the hypothesis that “when the intensity of a background sound exceeds a certain threshold, mental activity can be paralyzed” (Augoyard and Torgue 2005, 41 qtd in Coulthard 2013, 118). In other words, we questioned whether the frenzied barrage of sound might cause a form of sensory-cognitive overload that could affect typical patterns of perceptual processing.

The eye tracking results did not reveal a significant pattern for scenes where we predicted this would occur. It would be helpful to explore this in the future with other physiological or neuro-measures that are better at identifying moments of cognitive overload or resource allocation. What does seem to emerge from our exploratory study is that even in films that firmly direct attention, as is characteristic of Spielberg’s directorial style, individual audience members bring their own complexity and experience to the viewing.[viii] Lang and colleagues point out that “complexity should be indexed not by how much of something is in the message but rather by how many human processing resources will be consumed when the message is processed” (Lang et al. 2014, 2). With respect to understanding the specific effects of what individual viewers bring with them to the screen, or teasing out how the audience is affected when watching footage that uses hand-held camera, induces cognitive overload, invokes the acoustic imagination, or uses indistinct dialogue, we conclude that this eye tracking study has raised fruitful questions that may best be answered by an approach that includes multiple measures provided by electroencephalograms, pupillometry or galvanic skin response techniques, as well as eye tracking technologies.

The “Indistinct Dialogue” and “Sand Dunes” sequences have what Sobchack terms a larger number of “‘synch points’ (‘salient’ and ‘prominent’ moments of audio-visual synchrony),” such as lines of dialogue, bullets pinging off metal and mortar shells landing, and these sonic cues “are firmly attached in a logically causal—and conventionally ‘realistic’—relation to the image’s specificity” (2005, 6–7). These synchronised sounds are “not as acousmatically imaginative and generative” as we contend that the subjective sound in the “Wounded Man” sequence is because the sounds appear to be “generated from the physical action” seen on the screen (Sobchack 2005, 7). In the “Indistinct Dialogue” and “Sand Dunes” conversation sequences, our eye tracking experiment did not necessarily reveal greater attentional focus with dialogue. While this counters what Coutrot and colleagues found in their 2012 study and Smith’s finding that sound reinforces visual synchrony (Smith 2013), it is in line with our expectation that the unconventional use of indistinct dialogue and chaotic background sound and imagery would disperse attention. Perhaps dialogue is not something that focuses visual attention, but rather something that focuses engagement. When the dialogue is clear, the viewer is able to look around the screen and absorb other cues about context. Precisely because the linguistic meaning is clear, such expository dialogue does not require as many cognitive resources to process and leaves some free for assessing other audio-visual cues. However, when the dialogue is indistinct, the viewer must then use other cues to work out the importance of the speech; in such cases the audience is essentially in the same position as watching without sound—although they may even be worse off in terms of cognitive resource allocation because there is also a barrage of other sound being processed in concert with the visual stimuli.

Overall, our use of eye tracking in conjunction with aesthetic analysis in our investigation of Saving Private Ryan has supported Coutrot and colleagues’ 2012 findings that dispersion (the degree of variability between observers’ eye positions) was lower with sound than without, so sound generally acted to concentrate perceptual attention. However, unlike Coutrot et al., we teamed eye tracking with qualitative film analysis to explore the effect of aesthetic variation and individual differences on gaze patterns as well as to identify common psychophysiologically governed patterns of attention. In this exploratory study, we found that differences in aesthetic techniques within segments of footage in the same film scene do make a difference to the audience’s gaze patterns and attentional fixation, and we found that within these patterns individual subjects exhibited divergent perceptual processes as well. Although our study is more restricted than comparable work undertaken by Coutrot and others, our attention to screen aesthetics and to variations in subjects’ responses within a single scene affords our method broader explanatory power than a study that excludes outliers and looks for commonalities across a wide range of video styles and genres.



Alexander Nevsky. Directed by Sergei Eisenstein, 1938. Mosfilm, DVD.

Augoyard, Jean-Francois, and Henry Torgue. 2005. Sonic Experience: A Guide to Everyday Sounds. Translated by Andra McCartney and David Paquette. Montreal: McGill Queen’s University Press.

Berliner, Todd. 2010. Hollywood Incoherent: Narration in Seventies Cinema. Austin: University of Texas Press.

Bordwell, David. 2009. “Cognitive Theory.” In Routledge Companion to Philosophy and Film, edited by Paisley Livingston and Carl Plantinga. 356–367. London: Routledge.

Coulthard, Lisa. 2013. “Dirty Sound: Haptic Noise in New Extremism.” In The Oxford Handbook of Sound and Image in Digital Media, edited by Carol Vernallis, Amy Herzog and John Richardson. 115–126. New York: Oxford University Press.

Coutrot, Antoine, Gelu Ionescu, Nathalie Guyader and Bertrand Rivet. 2011. “Audio Tracks do not Influence Eye Movements when Watching Videos.” Paper presented to the 34th European Conference on Visual Perception, Toulouse, France, August 30.

Coutrot, Antoine, Nathalie Guyader, Gelu Ionescu and Alice Caplier. 2012. “Influence of Soundtrack on Eye Movements During Video Exploration.” Journal of Eye Movement Research 5.5: 1–10.

Coutrot, Antoine and Nathalie Guyader. 2014. “How Saliency, Faces, and Sound Influence Gaze in Dynamic Social Scenes.” Journal of Vision 14.8: 5.

Duchowski, Andrew T. 2007. Eye Tracking Methodology: Theory and Practice. Dordrecht: Springer.

Gallese, Vittorio. 2013. “Mirror Neurons, Embodied Simulation and a Second-person Approach to Mind-reading.” Cortex in press: 1–3. Accessed August 28, 2014,

Gallese, Vittorio and Michele Guerra. 2012. “Embodying Movies: Embodied Simulation and Film Studies.” Cinema: Journal of Philosophy and the Moving Image 3: 183–210.

Hasson, Uri, Ohad Landesman, Barbara Knappmeyer, Ignacio Vallines, Nava Rubin and David J. Heeger. 2008. “Neurocinematics: The Neuroscience of Film.” Projections 2.1: 1–26.

Heimann, Katrin, Maria Alessandra Umiltà, Michele Guerra and Vittorio Gallese. 2014. “Moving Mirrors: A High-density EEG Study Investigating the Effect of Camera Movements on Motor Cortex Activation during Action Observation.” Journal of Cognitive Neuroscience 26.9: 2087–2101.

Kozloff, Sarah. 2000. Overhearing Film Dialogue. Berkeley: University of California Press.

Land, Michael, Neil Mennie and J. Rusted. 1999. “The Roles of Vision and Eye Movements in the Control of Activities of Daily Living.” Perception 28.11: 1311–1328.

Lang, Annie. 2000. “The Limited Capacity Model of Mediated Message Processing.” Journal of Communication 50.1: 46–70.

Lang, Annie, Shuhua Zhou, Nancy Schwartz, Paul D. Bolls and Robert F. Potter. 2000. “The Effects of Edits on Arousal, Attention, and Memory for Television Messages: When an Edit is an Edit Can an Edit be too Much?” Journal of Broadcasting & Electronic Media 44.1: 94–109.

Lang, Annie, Ya Gao, Robert F. Potter, Seungjo Lee, Byungho Park and Rachel L. Bailey. 2014. “Conceptualizing Audio Message Complexity as Available Processing Resources.” Communication Research, published online before print. Accessed September 28, 2014, doi: 10.1177/0093650213490722

Marchant, Paul, David Raybould, Tony Renshaw and Richard Stevens. 2009. “Are you seeing what I’m seeing? An Eye-tracking Evaluation of Dynamic Scenes.” Digital Creativity 20.3: 153–163.

McGurk, Harry and John MacDonald. 1976. “Hearing Lips and Seeing Voices.” Nature 264.5588: 746–8. doi:10.1038/264746a0.

Mital, Parag, Tim J. Smith, Robin Hill and John Henderson. 2011. “Clustering of Gaze During Dynamic Scene Viewing is Predicted by Motion.” Cognitive Computation 3: 5–24.

Plantinga, Carl. 2009. Moving Viewers: American Film and the Spectator’s Experience. Berkeley: University of California Press.

Psycho. Directed by Alfred Hitchcock, 1960. Shamley Productions, DVD.

Rear Window. Directed by Alfred Hitchcock, 1954. Paramount, DVD.

Redmond, Sean, Sarah Pink, Jane Stadler, Jenny Robinson, Andrea Rassell and Darrin Verhagen. 2015 (forthcoming). “Seeing, Sensing Sound: Eye Tracking Soundscapes in Saving Private Ryan and Monsters Inc.” In Making Sense of Cinema: Empirical Studies into Film Spectators and Spectatorship, edited by CarrieLynn D. Reinhard and Christopher J. Olson. New York: Bloomsbury.

Remael, Aline. 2003. “Mainstream Narrative Film Dialogue and Subtitling.” The Translator 9.2: 225–247.

Saving Private Ryan. Directed by Steven Spielberg. 1998. Dreamworks/Paramount. DVD.

Shimamura, Arthur, ed. 2013. Psychocinematics: Exploring Cognition at the Movies. New York: Oxford University Press.

Sita, Jodi. 2014. Personal Communication. 19 June 2014. Australian Catholic University: Victoria, Australia.

Smith, Tim J. 2014. “Audiovisual Correspondences in Sergei Eisenstein’s Alexander Nevsky: A Case Study in Viewer Attention.” In Cognitive Media Theory (AFI Film Reader), edited by Paul Taberham and Ted Nannicelli. 85–105. New York: Routledge.

Smith, Tim J. 2013. “Watching You Watch Movies: Using Eye Tracking to Inform Cognitive Film Theory.” In Psychocinematics: Exploring Cognition at the Movies, edited by Arthur P. Shimamura. 165–191. New York: Oxford University Press.

Sobchack, Vivian. 2005. “When the Ear Dreams: Dolby Digital and the Imagination of Sound.” Film Quarterly 58.4: 2–15.

Song, Guanghan, Denis Pellerin and Lionel Granjon. 2011. “Sound Effect on Visual Gaze When Looking at Videos.” In 19th European Signal Processing Conference. 2034–2038. Barcelona: EUSIPCO 2011.

Tatler, Benjamin. 2014. “Eye Movements from Laboratory to Life.” In Current Trends in Eye Tracking Research, edited by Mike Horsley, Matt Eliot, Bruce Allen Knight and Ronan Reilly. 17–35. London: Springer.

Võ, Melissa, Tim J. Smith, Parag Mital and John Henderson. 2012. “Do the Eyes Really Have it? Dynamic Allocation of Attention when Viewing Moving Faces.” Journal of Vision 12.13(3): 1–14.

Yip, Man-Fung. 2014. “In the Realm of the Senses: Sensory Realism, Speed, and Hong Kong Martial Arts Cinema.” Cinema Journal 53.4: 76–97.


List of figures

Figure 1: Nine Areas of Interest (AOI), Saving Private Ryan: Still from Saving Private Ryan (Steven Spielberg, 1998), data illustrated by the authors

Figure 2a: Gaze plots of four subjects for “Indistinct Dialogue” (Sequence 1): Still from Saving Private Ryan (Steven Spielberg, 1998), data illustrated by the authors

Figure 2b: Gaze plots of four subjects for “Indistinct Dialogue” (Sequence 1): Still from Saving Private Ryan (Steven Spielberg, 1998), data illustrated by the authors

Figure 3: Areas of Interest for subject 2 in “Wounded Man” (Sequence 2): Still from Saving Private Ryan (Steven Spielberg, 1998), data illustrated by the authors

Figure 4: Total fixation duration in nine AOIs for “Wounded Man” (all subjects): Graphs produced by the authors

Figure 5: Areas of Interest for subject 4 in “Sand Dunes: In Command” (Sequence 3): Still from Saving Private Ryan (Steven Spielberg, 1998), data illustrated by the authors

Figure 6: Total fixation duration for nine AOIs in “Sand Dunes: In Command” (all subjects): Graphs produced by the authors



[i] The two preliminary sound-based eye tracking studies preceding Coutrot et al.’s 2012 publication are a conference presentation by Coutrot et al. (2011), and a conference paper by Song, Pellerin, and Granjon (2011). However, in 2012 Melissa Võ and colleagues also published a study that investigated the effects on attention to faces in videos when the auditory speech track was removed. This study found that when speech was not present, observers’ gaze allocation changed: they looked more at the scene background and decreased fixations to faces generally and especially decreased concentration on the mouth region (Võ et al. 2012, 12).

[ii] A study of everyday attention indicates that people exhibit visual search behaviours that anticipate, locate, and monitor action, which is evidence of top down influences on visual perception (see Land et al. 1999).

[iii] Tim Smith states that “The degree of attentional synchrony observed for a particular movie frame will vary depending on whether it is from a Hollywood feature film or from unedited real-world footage, the time since a cut and compositional details such as focus or lighting but attentional synchrony will always be greater in moving images than static images” (2014, 90).

[iv] The lip-reading phenomenon is called the “McGurk effect” (see McGurk and MacDonald 1976).

[v] For further discussion of central areas of interest in Saving Private Ryan, see Redmond et al. (2015).

[vi] Established formulae for dispersion and other measures of individual variation in gaze pattern exist (e.g., Coutrot et al. 2012). As an exploratory study, we were limited by both number of subjects and post hoc data analysis. This distribution estimate was a sufficient way to capture dominant and non-dominant viewing. However, we would recommend future research develop a better variance measure of asynchronous viewing, such as the Kullback-Leibler divergence formula referred to above.

[vii] Note that similar results were obtained in a related study of a sequence earlier in the beach-landing scene that depicts Captain Miller’s experience of shellshock (Redmond et al. forthcoming 2015).

[viii] A neuroimaging study comparing responses to film clips ranging from a sequence directed by Alfred Hitchcock to a segment of actuality footage shot in Washington Square Park found that higher levels of aesthetic control generate greater viewer synchrony or inter-subject correlation in the audience’s viewing patterns and brain activity (Hasson et al. 2008, 15).



Dr Jennifer Robinson is Lecturer in Public Relations, School of Media and Communication at RMIT University. She authors industry reports and has published in J Advertising, BMC Public Health, J Interactive Marketing and the J Public Relations Research. Her media effects research investigates new media and media audiences using neuro-measures.

Jane Stadler is Associate Professor of Film and Media Studies, School of Communication and Arts at the University of Queensland. She is author of Pulling Focus: Intersubjective Experience, Narrative Film and Ethics, and co-author of Screen Media and Media and Society.

Andrea Rassell is a PhD student and Research Assistant in the School of Media and Communication at RMIT University. She has a professional background in both science and film and researches at the nexus of the two disciplines.

How We Came To Eye Tracking Animation: A Cross-Disciplinary Approach to Researching the Moving Image – Craig Batty, Claire Perkins, & Jodi Sita


In this article, three researchers from a large cross-disciplinary team reflect on their individual experiences of a pilot study in the field of eye tracking and the moving image. The study – now concluded – employed a montage sequence from the Pixar film Up (2009) to determine the impact of narrative cues on gaze behaviour. In the study, the researchers’ interest in narrative was underpinned by a broader concern with the interaction of top-down (cognitive) and bottom-up (salient) factors in directing viewers’ eye movements. This article provides three distinct but interconnected reflections on what the aims, process and results of the pilot study demonstrate about how eye tracking the moving image can expand methods and knowledge across the three disciplines of screenwriting, screen theory and eye tracking. It is in this way both an article about eye tracking, animation and narrative, and also a broader consideration of cross-disciplinary research methodologies.



Over the past 18 months, a team of cross-disciplinary researchers has undertaken a pilot study in eye tracking and the moving image that has sought to understand where spectators look when viewing animation.[i] The original study employed eye tracking methods to record the gaze of 12 subjects. It used a Tobii X120 (Tobii Technology, 2005) remote eye tracking device which allowed viewers to watch the animation sequence on a widescreen PC monitor at 25 frames per second, with sound. The eye tracker pairs the movements of the eye over the screen with the stimuli being viewed by the participant. For each scene viewed, the researchers selected areas of interest; and for these areas, all of the gaze data, including the number and duration of each fixation, was collected and analysed.

Using a well-known montage sequence from the Pixar film Up (2009), this pilot study focussed on narrative with the aim of discerning whether story cues were instrumental in directing spectator gaze. Focussing on narrative seemed to be useful in that as well as being an original line of enquiry in the eye tracking context, it also offered a natural connection between each of our disciplines and research experiences. The study did not take into account emotional and physiological responses from its participants as a way of discerning their narrative comprehension. Nevertheless, what we found from our data was that characters (especially their faces), key (narrative) objects and visual/scenic repetition seemed to be core factors in determining where participants looked.[ii]

In the context of a montage sequence that spans around 60 years of story time, in which the death of the protagonist’s wife sets up the physical and emotional stakes of the rest of the film, it was clear that narrative meaning relating to a character’s journey/arc is important to viewers, more so (in this study) than peripheral action or visual style, for example. With regards to animation specifically, a form ‘particularly equipped to play out narratives that solicit […] emotions because of its capacity to illustrate and enhance interior states, and to express feeling that is beyond the realms of words to properly capture’ (Wells, 2007: 127), the highly controlled nature of the sequence from which the data was drawn seems to suggest that animation embraces narrative techniques fully to control viewer attention.

In this article, three researchers from the team – A, a screenwriter, B, a screen scholar and C, an eye tracking neuroscientist – discuss the approaches they took to conducting this study. Each of us came to the project armed with different expertise, different priorities and a different set of expectations for what we might find out, which we could then take back to our individual disciplines. In this article, then, we purposely use three voices as a way of teasing out our understandings before, during and after the study, with the aim of better understanding the potential for cross-disciplinary research in this area. Although other studies in eye tracking and the moving image have been undertaken and reported on, we suggest that using animation with a strongly directed narrative as a test study provides new information. Furthermore, few other studies to date have brought together traditional and creative practice researchers in this way.

What we present, then, is a series of interconnected discussions that draw together ideas from each researcher’s community of thought and practice, guided by the overriding question: how did this study embrace methodological originality and yield innovative findings that might be important to the disciplines of eye tracking and moving image studies? We present these discussions in the format of individual reflections, as a way of highlighting each researcher’s contributions to the study, and in the hope that others will see the potential of disciplinary knowledge in a study such as this one.

How ‘looking’ features in our disciplines, and what we might expect to ‘see’

Researcher A: ‘Looking’ in screenwriting means two things: seeing and reflecting on. By this I mean that a viewer looks at the screen to see what is happening, whilst at the same time reflecting on what they are looking at on a personal, cultural and/or political level. Some screenwriters focus on theme from the outset: on what they want their work to ‘say’ (see Batty, 2013); some screenwriters focus on plot: on what viewers will see (action) (see Vogler, 2007). What connects these is character. In Aristotelian terms, a character does and therefore is (Aristotle, 1996); for Egri, a character is and therefore does (Egri, 2004). The link here is that what we see on the screen (action) is always performed by a character, meaning that through a process of agency, actions are given meaning, feeding into the controlling theme(s) of the text. In this way, looking at – or seeing – is tied closely to understanding and the feelings that we bring to a text. As Hockley (2007) says, viewers are sutured into the text on an emotional level, connecting them and the text through the psychology of story space.

What we ‘see’, then, is meaning. In other words, we do not just see but we also feel. We look for visual cues that help us to understand the narrative unfolding before our eyes. With sound used to point to particular visual aspects and heighten our emotional states, we bestow energy and emotion in the visuality of the screen, in the hope that we will arrive at an understanding. As this study has revealed, examples include symbolic objects in the frame (the adventure book; the savings jar; the picture of Paradise Falls) that have narrative value in screenwriting because of the meaning they possess (Batty and Waldeback, 2008: 52-3). By seeing these objects repeated throughout the montage, we understand what they mean (to the characters and to the story) and glean a sense of how they will re-appear throughout the rest of the film as a way of representing the emotional space of the story.

Landscape is also something we see, though this is always in the context of the story world (see Harper and Rayner, 2010; Stadler, 2010). In other words, where is this place? What happens here? What cannot happen here? Characters belong to a story world, and therefore landscape also helps us to understand the situations in which we find them. This, again, draws us back to action, agency and theme: when we see landscape, we are in fact understanding why the screenwriter chose to put their characters – and us, the audience – there in the first place.

Researcher B: In screen theory, looking is never just looking – never innocent and immediate. The act of looking is the gateway to the experience and knowledge of what is seen on screen, but also of how that encounter reflects the world beyond the screen and our place within it. Looking is overdetermined as gazing, knowing and being, endlessly charged by the coincidence of eye and I and of real and reel. Psychoanalytic theory imagines the screen as mirror and our identity as a spectatorial effect of recognizing ourselves in the characters and situations that unfold upon it, however refracted. Reception studies, conversely, seeks out how real individuals encounter content on screen, and how meaning sparks in that meeting – invented anew with every pair of eyes. Television studies emerges from an understanding of a fundamental schism in looking: where the cinematic apparatus enables a gaze, the televisual counterpart can (traditionally) only produce a broken and distracted glance.

All of these theories begin with the act of looking, and are enabled by it in their metaphors, methods and practices. But in no instance is looking attended to as anatomical vision – the process of the “meat and bones” body and brain rather than the metaphysical consciousness. As a scholar of screen theory, my base interest in eye tracking comes down to this “problem”. Is it a problem? Should the biology and theory of looking align? What effects and contradictions arise when they are brought together?

Phenomenological screen theory is a key and complex pathway into this debate, as an approach that values embodied experience, but discredits the ocular—seeking to bring the whole body to spectatorship rather than privilege the centred and distant subject of optical visuality (Marks, 2002: xvi). Vivian Sobchack names film ‘an expression of experience by experience … an act of seeing that makes itself seen, an act of hearing that makes itself heard’ (Sobchack, 1992: 3). Eye tracking shows us the act of seeing – the raw fixations and movements with which screen content is taken in. In the study under discussion here it is this data that is of central interest, with our key questions deriving from what such material can verify about how narrative shapes gaze behaviour. A central question and challenge for me moving forward in this field, though, is to consider this process without ceding to ocularcentrism: that is, without automatically equating seeing to knowing. This ultimately means being cautious about reading gaze behaviour as ‘proof’ of what viewing subjects are thinking, feeling and understanding. This approach will be supported by the inclusion of further physiological measurements.

Researcher C: Interest in vision and how we see the world is age-old, and it has been commonly held that the eyes are the windows to the mind. Where we look is therefore of great importance, as learning this offers us opportunities to understand more about where the brain wants to spend its time. Human eyes move independently of our heads, and so our eyes have developed a specialised operating system that both allows them to move around our visual environment and counteracts any movements the head may be making. This has led to a distinct set of eye movements we can study: saccades (the very fast blasts of movement that pivot our eye from focus point to focus point) and fixations (brief moments of relative stillness where our gaze stops for a moment to allow the receptors in our eye to collect visual information). In addition, only a tiny area at the back of our eyeball, the fovea on the retina, is sensitive enough to gather highly ‘acuitive’ information; thus the brain must drive the eye around precisely in order to get light to fall onto this tiny area. As such, our eye movements are an integral and essential part of our vision system.
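As an illustration only, the distinction between saccades and fixations can be sketched with a simple velocity threshold applied to raw gaze samples. This is not the method used in the study, which is not specified here; the function names, the 60 Hz sampling rate, the 30 degrees-per-second threshold and the use of degrees of visual angle are all assumptions chosen for the example.

```python
import math


def classify_samples(xs, ys, sample_rate_hz=60.0, velocity_threshold=30.0):
    """Label each gaze sample 'fixation' or 'saccade' by point-to-point velocity.

    xs, ys: gaze positions in degrees of visual angle (an assumed unit).
    A sample is labelled by the speed of the movement that arrives at it.
    """
    if not xs:
        return []
    labels = ["fixation"]  # the first sample has no predecessor; assume stillness
    for i in range(1, len(xs)):
        distance = math.hypot(xs[i] - xs[i - 1], ys[i] - ys[i - 1])
        velocity = distance * sample_rate_hz  # degrees per second
        labels.append("saccade" if velocity > velocity_threshold else "fixation")
    return labels


def group_fixations(labels, sample_rate_hz=60.0):
    """Collapse runs of 'fixation' samples into (start_index, duration_ms) pairs."""
    fixations = []
    start = None
    for i, label in enumerate(labels):
        if label == "fixation" and start is None:
            start = i  # a new run of stillness begins
        elif label != "fixation" and start is not None:
            fixations.append((start, (i - start) * 1000.0 / sample_rate_hz))
            start = None
    if start is not None:  # close a run that lasts to the final sample
        fixations.append((start, (len(labels) - start) * 1000.0 / sample_rate_hz))
    return fixations
```

In this sketch, a gaze trace that rests near one point, jumps several degrees and rests again yields two fixations separated by a single saccade sample.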

Eye movement research has seen great advances during the last 50 years, with many early questions examined in the classic work of Buswell (1935) and Yarbus (1967). One question visual scientists and neuroscientists have been, and still are, keen to explore is why we look where we do: what is it about the objects or scene that draws our visual attention? Research over the decades has found that several different aspects are involved, relating to object salience, recognition, movement and contextual value (see Schütz et al., 2011). For animations that are used for learning purposes, Schnotz and Lowe (2008) discussed two major contributing factors that influence the attention-grabbing properties of features that make up this form. One is visuospatial contrast and the second is dynamic contrast: features that are relatively large, brightly coloured or centrally placed are more likely to be fixated on than their less distinctive neighbours, and features that move or change over time draw more attention.

Eye tracking research, which is now easier than ever to conduct, allows us to examine how these and other features influence us, and is a unique way to gain access to the windows of the mind. Directing this focus to learning more about how we watch films, and animation in particular, is what drove me to use eye tracking to better see how people experience them, and to delve into questions such as: what are people drawn to look at, and how might things like the narrative affect the way we direct our gaze?

When looking around a visual world, our view is often full of different objects and we tend to drive our gaze to them so we can recognize, inspect or use them. Not so surprisingly, what we are doing (our task at hand) strongly affects how we direct our gaze, such that as we perform a task, our salience-based mechanisms seem to go offline as people almost exclusively fixate on the task-relevant objects (Hayhoe, 2000; Land et al., 1999). From this, one expectation we have when considering how viewers watch animation is that, more than salient features, aspects relating to the narrative components of the viewer’s understanding of the story will be the stronger drive. Another well-known drawcard for visual attention is faces, which tend to draw the eye very strongly (Cerf et al., 2009; Crouzet et al., 2010). For animated films we were interested to see if similar effects would be observed.

Finally, another strong and interesting effect that has been discussed is the central viewing bias: a tendency for people to fixate in the centre of a display, which has been shown to have a large effect on viewing behaviour (Tatler and Vincent, 2009). As this study was based on moving images shown on a screen, we were keen to compare different scenes and how the narrative affected this tendency.
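One simple way to compare scenes for central bias, offered here as a hedged sketch and not as the measure used in this study or by Tatler and Vincent, is the mean distance of fixations from the screen centre, scaled so that 0 means the exact centre and 1 means a corner. The function name and the pixel-coordinate input format are assumptions made for the example.

```python
import math


def mean_centre_distance(fixations, width, height):
    """Mean distance of fixations from the screen centre, scaled to 0..1.

    fixations: list of (x, y) positions in pixels (assumed format).
    0.0 means every fixation was at the centre; 1.0 means every one was at a corner.
    """
    if not fixations:
        raise ValueError("at least one fixation is required")
    centre_x, centre_y = width / 2.0, height / 2.0
    corner_distance = math.hypot(centre_x, centre_y)  # centre to any corner
    total = sum(math.hypot(x - centre_x, y - centre_y) for x, y in fixations)
    return total / len(fixations) / corner_distance
```

A lower value for one scene than another would suggest that gaze stayed closer to the centre in that scene, although differences in where the action is staged would need to be considered before reading this as a bias.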

How we came to the project, and what we thought it might reveal

Researcher A: From a screenwriting perspective, I was excited to think that at last, we might have data that not only privileges the story (i.e., the screenwriter’s input), but that also highlights the minutiae of a scene that the screenwriter is likely to have influenced. This can be different in animation than in live action, whereby a team of story designers and animators actively shape the narrative as the ‘script’ emerges (see Wells, 2010). Nevertheless, if we follow that what we see on screen has been imagined or at least intended by a ‘writer’ of sorts – someone who knows about the composition of screen narratives – then it was rousing to think that this study might provide ‘evidence’ to support long-standing questions (for myself at least) of writing for the screen and authorship. Screenwriters work in layers, building a screenplay from broad aspects such as plot, character and theme, to micro aspects such as scene rhythm, dialogue and visual cues. Being able to ‘prove’ what viewers are looking at, and hoping that this might correlate with a screenwriting perspective of scene composition, was very appealing to me.

I was also interested in what other aspects of the screen viewers might look at, either as glances or as gazes. In some genres of screenwriting, such as comedy, much of the clever work comes around the edges: background characters; ironic landscapes; peripheral visual gags, etc. From a screenwriting perspective, then, it was exciting to think that we might find ways to trace who looks at what, and if indeed the texture of a screenplay is acknowledged by the viewer. The study would be limited and not all aspects could be explored, but as a general method for screen analysis, simply having ideas about what might be revealed led to some very interesting discussions within the team.

Researcher B: All screen theories rest upon a fundamental assumption that different types of content, and different viewing situations, produce different viewing behaviours and effects. Laura Mulvey’s famous theory of the gaze stipulates that classical Hollywood cinema and the traditional exhibition environment (dark cinema, large screen, audience silence) position men as bearers of the look and women as objects of the look, and that avant-garde cinemas avoid this configuration (Mulvey, 1975). New theories of digital cinema speculate upon whether a spectator’s identification with an image is altered when it bears no indexical connection to reality; that is, when the image is a simulated collection of pixels rather than the trace of an event that once took place before a camera (Rodowick, 2007). The phenomenological film theory of Laura Marks suggests that certain kinds of video and multimedia work can engender haptic visuality, where the eyes function like ‘organs of touch’ and the viewer’s body is more obviously involved in the process of seeing than is the case with optical visuality (Marks, 2002: 2-3). It made sense to begin our study into eye tracking by thinking about these different assumptions regarding content and context and formulating methods to analyse them empirically.

For our first project we chose to focus on an assumption regarding spectatorship that is more straightforward and essential than any listed above: namely that viewers can follow a story told only in images. This is an assumption that underpins the ubiquitous presence of the montage sequence in narrative filmmaking, where a large amount of story information is presented in a short, dialogue-free sequence. We hypothesized that by tracking a montage sequence we would be able to ascertain if and how viewers looked at narrative cues, even when these are not the most salient (i.e., large, colourful, moving) features in the scene. The study was in this way designed to start investigating how much film directors and designers can control subjects’ gaze behaviour and top-down (cognitively driven) processes.

The sequence from Up! was chosen in part to act as a ‘control’ against which we could later assess different types of content. The story told in the 4-minute sequence is complex but unambiguous, with its events and emotive power linked by clear relationships of cause and effect. It is in this way a prime example of a classical narrative style of filmmaking, where the emphasis is on communicating story information as transparently as possible (Bordwell, 1985: 160). Our hypothesis was that subjects’ gaze behaviour would be controlled by the tightly directed sequence with its strong narrative cues, and that this study could thereby function as a benchmark against which different types of less story-driven material could be compared later.

Researcher C: A colleague and I set up the Eye Tracking and the Moving Image (ETMI) research group in 2012, following discussions around how evidence was collected to support and investigate current film theory. These conversations grew into a determination to begin a cross-disciplinary research group, initially in Melbourne, to begin working together on these ideas. I had previously been involved in research using eye tracking to study other dynamic stimuli such as decision making processes in sport and the dynamics of signature forgery and detection, and my experience led to a belief that the eye tracker could have enormous potential as a research tool in the analysis and understanding of the moving image. Work on this particular study was inspired by the early aims of a subgroup (of which the other authors are a part), whose members were interested to investigate, in a more objective manner, the effect that narrative cues had on viewer gaze behaviour.

Existing research in our disciplines, and how that influenced our approaches to the study

Researcher A: While there had been research already conducted on eye tracking and the moving image, none of it had focussed on the creational aspects of screen texts: what goes into making a moving image text, before it becomes a finished product to be analysed. Much like screen scholarship that studies in a ‘post event’ way, what was lacking – usefully for us – was input from those who are practitioners themselves. The wider Melbourne-based Eye Tracking and the Moving Image research group within which this study sits has a membership that includes other practitioners, including a sound designer and a filmmaker. Combined, this suggested that our approach might offer something different; that it might ‘do more’ and hopefully speak to the industry as well as other researchers. As a screenwriter, the opportunity to co-research with scholars, scientists and other creative practitioners was therefore not only appealing, but also methodologically important.

As already highlighted, it was both an academic and a practical interest in the intersection of plot, character and theme that underpinned my approach. As Smith has argued, valuing character in screen studies has not always been possible (1995); moving this forward, valuing character, and in particular the character’s journey, has recently become more salient (see Batty, 2011; Marks, 2009), adding weight to a creative practice approach to screen scholarship. In this way, understanding the viewer’s experience of the screen seemed to lend itself well to some of the core concerns of the screenwriter; or to put it another way, had the ability to test what we ‘know’ about creative practice, and the role of the practitioner. Feeding, then, into wider debates about the place of screenwriting in the academy (see Baker, 2013; Price, 2013; 2010), it was important to value the work of the screenwriter, and in a scholarly rigorous – and hopefully innovative – way.

Researcher B: The majority of research on eye tracking and the moving image to date has been designed and undertaken as an extension to cognitive theories of film comprehension. Deriving from the constructivist school of cognitive psychology, and led by film theorist David Bordwell, this approach argues that viewers do not simply absorb but construct the meaning of a film from the data that is presented on screen. This data does not constitute a complete narrative but a series of cues that viewers process by generating inferences and hypotheses (Elsaesser and Buckland, 2002: 170). Bordwell’s approach explicitly opposes psychoanalytic film theory by attending to perceptual and cognitive aspects of film viewing rather than unconscious processes. Psychologist Tim Smith has mobilized eye tracking in connection with Bordwell’s work to demonstrate how this empirical method can “prove” cognitive theories of comprehension—showing that subjects’ eyes do fixate on those cues in a film’s mise-en-scène that the director has controlled through strategies of staging and movement (Smith, 2011; 2013).

The Up study was designed to follow in the wake of Smith’s work, with a particular interest in examining the premise of Bordwell’s theory – which is that narration is the central process that influences the way spectators understand a narrative film (Elsaesser and Buckland, 2002: 170). With this in mind, we deliberately chose a segment from an animated film, where the tightly directed narrative of the montage sequence is competing with a variety of other stimuli that subjects’ eyes could plausibly be attracted to: salient colourful and visibly designed details in the background and landscape of each shot.

We were also interested in this montage sequence for the highly affecting nature of its mini storyline, which establishes the protagonist Carl’s deep love for his wife Ellie as the motivation for his journey in Up! itself. The sequence carries a great deal of emotive power by contrasting the couple’s happiness in their long marriage with Carl’s ultimate sadness and regret at not being able to fulfill their life-long dream of moving to South America before Ellie falls sick and dies. Would it be possible to ‘see’ this emotional impact in viewers’ gaze behaviour?

How we reacted to the initial data, and what it was telling us

Researcher A: When looking at data for the first time, I certainly saw a correlation between what we know about screenwriting and seeing, and what we could now turn to as evidence. For example, key objects such as the adventure book, the savings jar (see Fig. 1) and the picture of Paradise Falls – all of which recurred throughout the montage sequence – were looked at by viewers intensely, suggesting that narrative meaning was ‘achieved’.

Fig. 1. A heat map showing the collective intensity of viewers’ responses to the savings jar.

As another example, when characters were purposely (from a screenwriting perspective) separated within the frame of the action, viewers oscillated between the two, eventually settling on the one they believed to possess the most narrative meaning (see Fig. 2). This further implied the importance of the character journey and its associated sense of theme, which for screenwriting verifies the careful work that has gone into a screenplay to set up narrative expectations.

Fig. 2. A gaze plot showing the fixations and saccades of one viewer in a scene with the prominent faces of Carl and Ellie.

Researcher B: We chose to analyse the data on Up! by examining how viewer attention fluctuated in focus between Carl and Ellie across the course of the montage sequence. The two are equal agents in the narrative at the beginning, but the montage’s story unfolds through the action and behaviour of each as it continues – that is, each character carries the story at different points. Overwhelmingly, the data supported this narrative pattern by showing that the majority of viewers fixated on the character who, moment by moment, functions as the agent of the story, even when that figure is not the most salient aspect of the image. Aligning with Bordwell’s cognitive theory of comprehension, this data confirms that viewers do rely principally on narrative cues to understand a film. As a top-down process of cognition, narrative exerts control over viewer attention to keep focus on the story rather than let the gaze wander to other bottom-up (salient) details in the mise-en-scène. It is this process that allowed Smith to show that viewers overwhelmingly will not notice glaring continuity errors on screen (Smith, 2005). As in the famous ‘Gorillas in our Midst’ experiment (Simons and Chabris, 1999), viewer attention is focused so closely on applying narrative schemas to link events spatially, temporally and causally that salient stimuli on screen appear to be missed completely.
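To make this kind of comparison concrete, the sketch below shows one way the share of viewing time spent on each character could be computed from fixation data within a single shot. It is an illustration under stated assumptions and does not reproduce the analysis software used in the study: the function name, the rectangular areas of interest (AOIs), their coordinates and the (x, y, duration) fixation format are all hypothetical.

```python
def dwell_shares(fixations, aois):
    """Return the proportion of total fixation time that falls in each AOI.

    fixations: list of (x, y, duration_ms) tuples (assumed format).
    aois: dict mapping a name to a (left, top, right, bottom) rectangle in pixels.
    Fixations outside every rectangle are counted under 'other'.
    """
    totals = {name: 0.0 for name in aois}
    totals["other"] = 0.0
    for x, y, duration_ms in fixations:
        for name, (left, top, right, bottom) in aois.items():
            if left <= x <= right and top <= y <= bottom:
                totals[name] += duration_ms
                break  # count each fixation once, in the first AOI that contains it
        else:
            totals["other"] += duration_ms
    grand_total = sum(totals.values())
    if grand_total == 0:
        return totals  # no viewing time recorded; every share stays at 0.0
    return {name: total / grand_total for name, total in totals.items()}


# Hypothetical rectangles for one shot; real AOIs would follow each character.
example_aois = {"carl": (0, 0, 100, 100), "ellie": (200, 0, 300, 100)}
example_fixations = [(50, 50, 300), (250, 50, 100), (150, 50, 100)]
example_shares = dwell_shares(example_fixations, example_aois)
```

Computed shot by shot, shares of this kind could be set against which character carries the story at that moment, which is the comparison described above.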

Researcher C: Initially I was quite interested to see the attention paid to faces, and in particular, characters’ eyes and mouths. Being animation, I had been keen to see if similar elements of faces would draw viewers’ eyes in the same ways that we look at human faces, where eyes and mouths are most viewed (Crouzet et al., 2010). Here, even though the characters were not engaging in dialogue, their mouths as well as their eyes were still searched. Looking at eyes has been linked to looking for contextual emotional information (Guastella et al., 2007), and so with this montage sequence being non-verbal, it was not surprising to see much of the focus on characters’ eyes as viewers attempted to read the emotion through them (see Fig. 3).

Fig. 3. Two viewers’ gaze plots depicting the sequence of fixations made between Carl and Ellie.

Other areas I was interested to observe were instances when other well-known features drew strong viewer attention, such as written text and bright (salient) objects. Two particular scenes we examined contained examples of these. In one scene, in which the savings jar sits at the back of a dark bookshelf, viewers were drawn to look both at the bright candle in the foreground and at the savings jar. The jar was in the dark; however, with narrative cues drawing attention to it, as well as the fact that it contained text, viewers were drawn to look at it (see Fig. 1). Surprisingly, although other interesting objects are easily discernible in this scene – a colourful wooden bird figure; a guitar; a compass – it was the savings jar and the bright candles that were viewed. The contextual information, the text and the salience appear to be working here to drive the eye, all within a few seconds of time.

Fig. 4. Gaze plots of fixations made by all viewers over the scene in which Carl purchases airline tickets.

The second instance of text working as a cue for the eye was in the travel shop scene (Fig. 4). Here, viewers were drawn to look at two text-based posters placed on the back wall of the shop. Again, this scene was only shown momentarily, yet glances towards the text and images, as well as the exchange between the characters, give viewers the elements of the story they need to glean so that they know what is going on, and where the story will go next (Carl’s surprise for Ellie).

How over time we better understood the data, and what we began to know more

Researcher A: I was interested to see that some viewers spent time looking at the periphery. The Up! montage sequence did not necessarily offer ‘alternative’ layers in the margins of the screen, though given its created and controlled animated nature, it perhaps should not be a surprise that away from the centre of the screen there were visual delights, such as the sun setting over the city and a blanket of clouds that changed shape, from clouds to animals to babies. This suggested to me that in animation, because viewers know that images have been created from scratch, there is an expectation that the screen will offer a plethora of experiences, from narrative agency to visual amplification. This, in turn, suggested that in further studies, it might be useful to contrast texts that use the potential of the full screen to engage viewers with those that go in close and privilege the centre. Genre would most likely play a key role in this future endeavour.

Researcher B: As hoped, this pilot study has been instructive as a base from which we can now expand. It has raised many questions. One issue is that this data cannot ‘prove’ subjects were not seeing those elements on-screen that were not fixated upon – were they perhaps seeing them peripherally? This could only be confirmed by conducting interviews after the eye tracking takes place, and could instructively inform an understanding of how story information that is layered in the mise-en-scène (for instance in setting, lighting and costume) contributes to overall narrative comprehension. We are also very interested to determine how the context of viewing affects gaze behaviour. For instance, would subjects still fixate overwhelmingly on narrative cues when watching this sequence in a cinema environment on a large – even an IMAX – screen? In this environment the image on screen is larger and the texture more palpable. Would viewers here perhaps be more focused on these salient pleasures of the image and engage in a different, less cognitive experience of the film; letting their eyes roam across the grain of the shot in its colours, shapes and surfaces? Would results alter between an animated and live action film? Psychoanalytic film theory tells us that the cinematic apparatus promotes identification with characters and, by extension, the ideologies of the social system from which they are produced (Mulvey, 1975). Eye tracking can potentially intervene in this powerful theory of spectatorship by showing if and how viewers do fixate on the cues that give rise to this interpellation.

Researcher C: After looking at some of the early scene analyses, I was somewhat surprised by how many eye movements could be made in fleetingly fast scenes, and at how many items in these scenes one could fixate on, if only briefly. I had expected viewers to be taking in some of the surrounding items in a scene using their peripheral vision, and to see more of the centralisation bias (Tatler and Vincent, 2009). Yet for some scenes, in particular for the two scenes in which Carl purchases the surprise airline tickets (see Figs 4 and 5), we see how viewers were drawn to search for narrative clues by looking around the scene.

Fig. 5. Gaze plot showing the fixations made by all viewers as they briefly see the contents of the picnic basket.

In the first scene (see Fig. 4), Carl is seen in a shop, facing the shop assistant. Viewers had previously seen him in the midst of coming up with a bright idea. This scene thus gives the viewer a chance to work out what his idea was. What can be seen is that most viewers scanned the surrounds for clues. A similar pattern is seen in the next scene, in which we quickly glance at the contents of a picnic basket being carried by Carl (see Fig. 5). The basket is seen close up, and viewers scan its contents. It contains picnic items and the surprise airline ticket, and even though some glances went to other basket items, it was the ticket, the item that held the most narrative information, that captured most of the attention. This item was also the most salient, being the clearest and brightest item in the basket, and, importantly, the only item to contain written text. In a very short glimpse of a scene, these features almost ensured that viewers’ eyes were directed to look at and acknowledge the ticket.

What excites us about the future of work in this area, and where we think it might take our own disciplines

Researcher A: If we are to fully embrace the creative practice potential of studies such as this, then we might look to creating new texts that can then be studied. If, in 1971, Noton and Stark created simple drawings to test how their subjects recognised and learned patterns, then over 40 years later, our approach might be to develop a short moving image narrative through which we can test our viewers’ gaze. For example, if we were to develop a short film and play it out of sequence (i.e., narrative meaning altered), might we affect where viewers look? Might they look differently: in different places and for different lengths of time? Similarly, what if we were to musically score a text in different ways, diegetically and non-diegetically? Might we affect the focus of viewer gaze? If so, what might this tell us about narrative attention and filmmaking techniques that sit ‘beyond the screenplay’?

For screenwriting as a discipline, studies such as these would serve two purposes, I feel. Firstly, they would help to strengthen the presence of screenwriting in the academy, especially in regard to innovative research that privileges the role of the practitioner. Accordingly, these studies could provide a variety of methodological approaches that might be of use to other screenwriting scholars; or that might be applied to other creative practice disciplines, in which researchers wish to understand the work that has gone into the creation of a text that might otherwise only be studied once it has been completed. Secondly, and perhaps more importantly, such studies might yield results that benefit, or at least inform, future screenwriting practices. Whether industry-related practices or otherwise, just like all ‘good’ creative practice research, the insights and understandings gained would contribute to the discipline in question in the form of ‘better’ or ‘different’ ways of doing (Harper, 2007). For me, this would reflect both the nature and the value of creative practice research.

Researcher B: All of the potential avenues for future research in this field take an essential interest in how moving images on screen produce a play between top-down and bottom-up cognition. In this, a larger issue for me – going back to the points I raised at the beginning of my section – is how the data can be mobilized beyond a strictly cognitive framework and vocabulary of screen theory. As indicated, the cognitive approach offers a deliberately ‘common sense’ counterpart to a paradigm such as psychoanalysis, with its reliance on myth, desire and fantasy (Elsaesser and Buckland, 2002: 169). Cognitive theory understands a film as a data set that a viewer’s brain processes and completes in an active construction of meaning – an understanding that eye tracking and neurocinematics are very well placed to support and expand. But most screen scholars appreciate and theorize film and television texts as much more than mere sets of data. The moving image is an experience that only ‘works’ by generating emotional affect, by engaging the viewer’s attachments, memories, desires and fears. Film theorist Linda Williams proposes that our investment in following the twists and turns of a narrative is fundamentally reliant upon the emotion of pathos: we continually, pleasurably invest in the expectation that a character will act or be acted upon in such a way that they achieve their goal, and continually, pleasurably have that expectation obscured and dashed by the story (Williams, 1998). So viewer attention is driven not just by a drive to know but also by a desire to feel: to be swept up in waves of hope and disappointment.

The mini storyline of the Up! montage sequence relies entirely on this dialectic of action and pathos. Carl and Ellie’s hopes are repeatedly frustrated, and Carl is finally unable to redeem this pattern before Ellie dies – producing a profound sense of pathos and regret as the defining theme of the sequence. We can see that our subjects’ fixations fell in line with this pattern as the sequence unfolded, consistently focusing on the character who was triggering or carrying the emotional power. But how do we distinguish the ‘felt’ dimension of this gaze from the viewer’s efforts to simply comprehend what is happening by following characters’ movements, facial expressions or body language? How, that is, can we ‘see’ emotional engagement, and start to appreciate how this crucial dimension of spectatorship – based on feeling not thinking – governs the play between top-down and bottom-up cognition in moving pictures? For me, grappling with this problem – and perhaps experimenting with further measurements of pupil dilation, heart rate and brain activity – offers a fascinating pathway into understanding how eye tracking can move beyond an engagement with cognitive film theory to contribute to phenomenological thinking on genuinely embodied seeing and experience.

Researcher C: There is so much that can be done in this area, and that makes it an exciting pursuit; yet what makes it even more motivating is the way that we hope to go about it: collaboratively. One of the core aspects that members of ETMI are very passionate about is working together, bringing in different fields, different disciplines, different ways of seeing things, and building bridges between them. This work is not only about learning more about how we watch and interact with films, but also about having different perspectives on those insights. Work I would personally like to see undertaken in this way is to explore how black and white viewing compares to colourised viewing, and to explore whether and how 3D viewing affects how we gaze about a scene. To compare the gaze and emotional responses of children and adults to the same visual content, and similarly compare visual and emotional responses to material between males and females, and between genre fans and haters, is also an interesting possibility.

Finally, adding to these, I am excited about the potential collection and analysis of other physiological measures to better gauge emotional engagement. These include blood pressure, pupillometry, skin conductance, breathing rate and volume, heart rate, sounds made (gasps, held breath, sighs etc.) and facial expressions.


By reflecting on each of our research backgrounds, experiences and expectations, what this article has revealed is that while we might have all come to the study with varied approaches and intentions, we have come out of the study with a somewhat surprisingly harmonious set of observations and conclusions. Without knowing it, perhaps, we were all interested in narrative and the role that characters play in the agency of it. We were also similarly interested in landscape and the visual potential of the screen; not in an obvious way, but in relation to subtext, meaning and emotion. The value of a study like this, then, lies not just in its methodological originality, but also in its ability to stir up passions in cross-disciplinary researchers, whereby each can bring to the table their own skills and ways of understanding data to reach mutual and respective conclusions. Although we ‘knew’ this from undertaking the study, the opportunity to reflect fully on the process in the form of an article has given us an even greater understanding of the collaborative potential of cross-disciplinary researchers such as ourselves.



Aristotle. (1996). Poetics. Trans. Malcolm Heath. London: Penguin.

Baker, Dallas. (2013). Scriptwriting as Creative Writing Research: A Preface. In: Dallas Baker and Debra Beattie (eds.) TEXT: Journal of Writing and Writing Courses, Special Issue 19: Scriptwriting as Creative Writing Research, pp. 1-8.

Batty, Craig, Adrian G. Dyer, Claire Perkins and Jodi Sita. (Forthcoming). Seeing Animated Worlds: Eye Tracking and the Spectator’s Experience of Narrative. In: CarrieLynn D. Reinhard and Christopher J. Olson (eds.). Making Sense of Cinema: Empirical Studies into Film Spectators and Spectatorship. New York: Bloomsbury.

Batty, Craig. (2013). Creative Interventions in Screenwriting: Embracing Theme to Unify and Improve the Collaborative Development Process. In: Shane Strange and Kay Rozynski (eds.) The Creative Manoeuvres: Making, Saying, Being Papers – the Refereed Proceedings of the 18th Conference of the Australasian Association of Writing Programs, pp. 1-12.

Batty, Craig. (2011). Movies That Move Us: Screenwriting and the Power of the Protagonist’s Journey. Basingstoke: Palgrave Macmillan.

Batty, Craig and Zara Waldeback. (2008). Writing for the Screen: Creative and Critical Approaches. Basingstoke: Palgrave Macmillan.

Bordwell, David. (1985). Narration in the Fiction Film. London: Routledge.

Buswell, Guy T. (1935). How People Look at Pictures. Chicago: Chicago University Press.

Cerf, Moran, E. Paxon Frady and Christof Koch. (2009). Faces and text attract gaze independent of the task: Experimental data and computer model. Journal of Vision, 9(12): 10, pp. 1–15.

Crouzet, Sebastien M., Holle Kirchner and Simon J. Thorpe. (2010). Fast saccades toward faces: Face detection in just 100 ms. Journal of Vision, 10(4): 16, pp. 1–17.

Egri, Lajos. (2004). The Art of Dramatic Writing. New York: Simon & Schuster.

Elsaesser, Thomas and Warren Buckland. (2002). Studying Contemporary American Film: A Guide to Movie Analysis. London: Hodder Headline.

Guastella, Adam J., Philip B. Mitchell and Mark R Dadds. (2008). Oxytocin increases gaze to the eye region of human faces. Biological Psychiatry, 63, pp. 3-5.

Harper, Graeme and Jonathan Rayner. (2010). Cinema and Landscape. Bristol: Intellect.

Harper, Graeme. (2007). Creative Writing Research Today. Writing in Education, 43, pp. 64-66.

Hayhoe, Mary. (2000). Vision using routines: A functional account of vision. Visual Cognition, 7, pp. 43–64.

Hockley, Luke. (2007). Frames of Mind: A Post-Jungian Look at Cinema, Television and Technology. Bristol: Intellect.

Land, Michael F., Neil Mennie and Jennifer Rusted. (1999). The roles of vision and eye movements in the control of activities of daily living. Perception, 28, pp. 1311–1328.

Marks, Dara. (2009). Inside Story: The Power of the Transformational Arc. London: A&C Black.

Marks, Laura U. (2002). Touch: Sensuous Theory and Multisensory Media. Minneapolis: University of Minnesota Press.

Mulvey, Laura. (1975). Visual Pleasure and Narrative Cinema. Screen, 16(3), pp. 6-18.

Norton, David, and Lawrence Stark. (1971). Scanpaths in eye movements during pattern perception. Science, 171, pp. 308–311.

Price, Steven. (2013). A History of the Screenplay. Basingstoke: Palgrave Macmillan.

Price, Steven. (2010). The Screenplay: Authorship, Theory and Criticism. Basingstoke: Palgrave Macmillan.

Rodowick, David. (2007). The Virtual Life of Film. Cambridge, MA: Harvard University Press.

Schnotz, Wolfgang and Richard K. Lowe. (2008). A unified view of learning from animated and static graphics. In: Richard K. Lowe and Wolfgang Schnotz (eds.). Learning with animation: Research implications for design. New York: Cambridge University Press, pp. 304-356.

Schütz, Alexander C., Doris I. Braun and Karl R. Gegenfurtner. (2011). Eye movements and perception: A selective review. Journal of Vision, 11(5): 9, pp. 1–30.

Simons, Daniel J. and Christopher F. Chabris. (1999). Gorillas in our Midst: Sustained Inattentional Blindness for Dynamic Events. Perception, 28, pp. 1059-1074.

Smith, Murray (1995). Engaging Characters: Fiction, Emotion, and the Cinema. Oxford: Oxford University Press.

Smith, Tim J. (2005). An Attentional Theory of Continuity Editing. [accessed October 17, 2014].

Smith, Tim J. (2011). Watching You Watch There Will Be Blood. [accessed August 22, 2014].

Smith, Tim J. (2013). Watching you watch movies: Using eye tracking to inform cognitive film theory. In: A. P. Shimamura (ed.). Psychocinematics: Exploring Cognition at the Movies. New York: Oxford University Press, pp. 165-191.

Sobchack, Vivian (1992). The Address of the Eye: A Phenomenology of Film Experience. Princeton, N.J: Princeton University Press.

Stadler, Jane (2010). Landscape and Location in Australian Cinema. Metro, 165.

Tatler, Benjamin W., and Benjamin T. Vincent. (2009). The prominence of behavioural biases in eye guidance. Visual Cognition, 17, pp. 1029–1054.

Tobii Technology (2005). User Manual. Tobii Technology AB. Danderyd, Sweden.

Vogler, Christopher (2007). The Writer’s Journey: Mythic Structure for Writers. Studio City, CA: Michael Wiese Productions.

Wells, Paul (2010). Boards, Beats, Binaries and Bricolage – Approaches to the Animation Script. In: Jill Nelmes (ed.) Analysing the Screenplay, Abingdon: Routledge, pp. 104-120.

Wells, Paul (2007). Basics Animation 01: Scriptwriting. Worthing: AVA Publishing.

Williams, Linda (1998). Melodrama Revised. In: Nick Browne (ed.). Refiguring American Film Genres: History and Theory. Berkeley, CA: University of California Press.

Yarbus, Alfred L. (1967). Eye Movements and Vision. New York: Plenum.


List of figures

Fig. 1. A heat map showing the collective intensity of viewers’ responses to the savings jar. Source: author study.

Fig. 2. A gaze plot showing the fixations and saccades of one viewer in a scene with the prominent faces of Carl and Ellie. Source: author study.

Fig. 3. Two viewers’ gaze plots depicting the sequence of fixations made between Carl and Ellie. Source: author study.

Fig. 4. Gaze plots of fixations made by all viewers over the scene in which Carl purchases airline tickets. Source: author study.

Fig. 5. Gaze plot showing the fixations made by all viewers as they briefly see the contents of the picnic basket. Source: author study.



[i] A full analysis of this study, ‘Seeing Animated Worlds: Eye Tracking and the Spectator’s Experience of Narrative’, will appear in the forthcoming collection Making Sense of Cinema: Empirical Studies into Film Spectators and Spectatorship, edited by CarrieLynn D. Reinhard and Christopher J. Olson.

[ii] See Batty, Craig, Dyer, Adrian G., Perkins, Claire and Sita, Jodi (forthcoming) for full results.



Associate Professor Craig Batty is Creative Practice Research Leader in the School of Media and Communication, RMIT University, where he also teaches screenwriting. He is author, co-author and editor of eight books, including Screenwriters and Screenwriting: Putting Practice into Context (2014), The Creative Screenwriter: Exercises to Expand Your Craft (2012) and Movies That Move Us: Screenwriting and the Power of the Protagonist’s Journey (2011). Craig is also a screenwriter and script editor, with experiences across short film, feature film, television and online drama.

Dr Claire Perkins is Lecturer in Film and Screen Studies in the School of Media, Film and Journalism at Monash University. She is the author of American Smart Cinema (2012) and co-editor of collections including B is for Bad Cinema: Aesthetics, Politics and Cultural Value (2014) and US Independent Film After 1989: Possible Films (forthcoming, 2015). Her writing has also appeared in journals including Camera Obscura, Critical Studies in Television, Celebrity Studies and The Velvet Light Trap.

Dr Jodi Sita is Senior Lecturer in the School of Allied Health at the Australian Catholic University. She works within the areas of neuroscience and anatomy, with expertise in eye tracking research. She has extensive experience with multiple project types using eye tracking technologies and other biophysical data. As well as her current research into viewer gaze patterns while watching moving images, she is using eye tracking to examine expertise in Australian Rules Football League coaches and players, and to examine the signature forgery process.

Movement, Attention and Movies: the Possibilities and Limitations of Eye Tracking? – Adrian G. Dyer & Sarah Pink


Movies often present a rich encapsulation of the diversity of complex visual information and other sensory qualities and affordances that are part of the worlds we inhabit. Yet we still know little about either the physiological or experiential elements of the ways in which people view movies. In this article we bring together two approaches that have not commonly been employed in audience studies, to suggest ways in which to produce novel insights into viewer attention: through the measurement of observer eye movements whilst watching movies; in combination with an anthropological approach to understanding vision as a situated practice. We thus discuss how both eye movement studies that investigate complex media such as movies need to consider some of the important principles that have been developed for sensory ethnography, and in turn how ethnographic and social research can gain important insights into aspects of human engagement from the emergence of new technologies that can better map how an understanding of the world is constructed through sensory perceptual input. We consider recent evidence that top down mediated effects like narrative do promote significant changes in how people attend to different aspects of a film, and thus how film media combined with eye tracking and ethnography may reveal much about how people build understandings of the world.


Seeing in complex environments is not a trivial task. Whilst people are often under the impression that you can believe what you see (Levin et al. 2000), physiological and neural constraints on how our visual system operates mean that only a very small proportion of an overall visual scene might be reliably perceived at one point in time during the evaluation of a sequence of events. Evidence for the way in which we often only perceive a portion of the vast amount of visual information present in a scene is nicely illustrated in the ‘Gorillas in our Midst’ short (25s) motion sequence where a group of six participants (two teams of three, dressed in white and black respectively) are filmed passing a basketball between team members (Simons and Chabris 1999). Subjects observing the film sequence are required to count the number of passes between the three students dressed in white, and whilst many subjects do correctly count the number of passes, the majority of test subjects fail to observe a large gorilla (an actor dressed as a gorilla) that walks into the middle of the visual field and beats its chest, before walking casually out of the scene. People typically don’t see this salient gorilla in the action sequence because their attention has been directed to the basketball catching team in white with the instruction of counting the number of passes. Why do we miss such a salient object as a gorilla, and what does this mean for our understanding of how different subjects might view complex information in real life, or in presentations that encapsulate aspects of real life, such as movies?

In this article we take an interdisciplinary approach to the question of how we might see certain things in complex dynamic environments. We draw together insights from the neurosciences and eye tracking studies, with anthropological understandings of vision and audio-visual media in order to map out an approach to audience research that accounts for the relationship between human perception, vision as a form of practical activity, and the environments through which these are co-constituted. We first build a brief outline of how the eye, visual perception and the subjectivity or selectivity of viewing are currently understood from the perspective of vision sciences. This demonstrates how physiologically there is evidence that the eye sees selectively, yet it does not fully explain why or how perceptual understanding might vary across different persons, or for the same person across different contexts. We then build on this understanding with a discussion of what we may learn from eye tracking studies with moving images. As we will show, eye tracking can offer detailed measurements of how the eye attends to specific instances, movements, and points within sequences of action. This can reveal patterns of attention across a sample of participants, towards specific types of action. Yet eye tracking is limited in that while it can tell us what participants’ eyes are attending to, it cannot easily tell us why, what they are experiencing, what their affective states are, nor how their actions are shaped by the wider social, material, sensory and atmospheric environments of which they are part. Therefore in the subsequent section we turn to phenomenological anthropology, and draw on the possibilities provided by the theoretical-ethnographic dialogue that is at the core of anthropological research, to suggest how the propositions of eye tracking studies might be situated in relation to the ongoingness and movement of complex environments.

We argue that such an interdisciplinary approach, which brings together monitoring and measurement with qualitative and experiential research, is needed to generate understandings of not only what people view but of how these viewing practices and experiences become relevant as part of the ways in which they both perceive and participate in the making of everyday worlds. However, we end the article with a note about the relative complexities of working across disciplines, and in particular between those that measure and those that use empathetic and collaborative modes of understanding and knowing, which can be theorised as part of the ways film is experienced (Bordwell and Thompson 2010; Pink 2013). For a review of how these issues may relate to broader issues about film culture, eye tracking and the moving image, readers are also referred to the manuscripts in this special issue by Redmond and Batty (2015), and Smith (2015).

Visual Resolution, Perception and the Human Eye

To enable visual perception the human eye has cone photoreceptors distributed across the retina which enable wide field binocular visual perception of about 180 degrees (Leigh and Zee 2006). In the central fovea region of the eye, cone photoreceptors are much more densely packed, and our resulting high acuity vision covers only about 2-3 degrees of visual angle (Leigh and Zee 2006). Visual angle is a convenient way to understand the relationship between the actual size of an object and viewing distance; for example, our foveal acuity is approximately equivalent to the width of our thumb held at about 57 cm (at this distance 1 cm represents 1 degree of visual angle). This means that to view visual information in detail it is often necessary to direct the gaze of our eyes to different parts of a scene, and this is typically done with either ballistic eye movements termed saccades, or much slower smooth pursuit eye movements, as when we follow the movement of a slow object in the distance (Martinez-Conde et al. 2004). Saccades are commonly broken down into two main types that are of high value for interpreting how viewers might perceive their environment: reflexive saccades, mainly thought to be driven by image salience (also termed exogenous control), and volitional saccades (endogenous control), where a viewer’s internal decision making directs attention through top-down mechanisms to where the gaze should be directed within a scene or movie sequence (Martinez-Conde et al. 2004; Parkhurst et al. 2002; Tatler et al. 2014; Pashler 1998; Smith 2013). Thus eye movements can be, in very broad terms, described as ‘bottom up’ processing when the eye makes reflexive saccades to salient stimuli within a scene, or ‘top down’ when a viewer uses their volitional control to direct where the eye should look, and both types of saccade are important for understanding how we interact with complex scenes in everyday life. For example, on entering a café we might casually gaze at the wonderful variety of cakes with reflexive saccades to all the highly colourful icings; but when a friend says to ‘try the chocolate cake’ we direct our eyes only to cakes of chocolate brown colour using volitional saccades. Interestingly, these different types of saccadic eye movements are likely to involve different cortical processing of information (Martinez-Conde et al. 2004), potentially allowing for complex multi modal processing that incorporates the rich and dynamic environment experienced when viewing a movie. It is likely that both these mechanisms operate whilst subjects view a film, and the extent to which either mechanism dominates during a particular film sequence may depend upon factors like visual design, narrative, audio input and cinematographic style, as well as individual experience or demographic profile of observers.
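The thumb-at-57-cm rule mentioned above follows from simple trigonometry, and a short calculation may help readers who want to convert between object size, viewing distance and visual angle for their own screens. The sketch below is illustrative only: the function names and the 3 m viewing distance are assumptions introduced here, not values from any study cited in this article.

```python
import math


def visual_angle_deg(size_cm: float, distance_cm: float) -> float:
    """Angle in degrees subtended at the eye by an object of the given size."""
    return math.degrees(2 * math.atan(size_cm / (2 * distance_cm)))


def size_for_angle_cm(angle_deg: float, distance_cm: float) -> float:
    """Inverse relation: physical size that subtends angle_deg at distance_cm."""
    return 2 * distance_cm * math.tan(math.radians(angle_deg) / 2)


# 1 cm viewed from 57 cm subtends very nearly 1 degree.
print(visual_angle_deg(1.0, 57.0))

# Hypothetical example: width of a 2 degree foveal patch on a screen 3 m away.
print(size_for_angle_cm(2.0, 300.0))
```

On a screen viewed from 3 m, a 2 degree foveal window comes out at roughly 10 cm across, which gives a sense of how little of a cinema frame is seen in full detail at any instant.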

The fact that we typically only perceive the world in low resolution at any one point in time can be easily illustrated with an eye chart in which letters in different parts of our visual field are scaled to make the letters equally legible when a subject fixates their gaze on a central fixation spot, or simulated by selectively Gaussian blurring a photograph such that it matches how we see detail at any one point in time (Figure 1). Human subjects typically shift their gaze about 3 times a second in many real world type scenarios in order to build up a detailed representation of their visual environment (Martinez-Conde et al. 2004; Tatler et al. 2014; Yarbus 1967). To efficiently direct the fovea to different parts of a visual scene, the human eye usually makes saccades, which also require a shift of the observer’s attention (Kustov et al. 1996; Martinez-Conde et al. 2004). One way to record subject gaze is to use a video-based eye tracking system that makes use of the different reflective properties of the eye to infrared radiation (Duchowski 2003), using a wavelength of radiation that is both invisible to the test subject and does not damage the eye. This non invasive technique thus enables very natural behavioural responses to be collected from a wide range of subjects. When the eye is illuminated by infrared light, which is typically provided by the eye tracking equipment, the light enters the lens and is strongly reflected back by the retina, providing a high contrast signal for an infrared camera to record, whilst some of the carefully placed infrared lights also reflect off the cornea of the eye, which provides a constant reference signal that enables eye tracker software to disentangle minor head movements from the actual eye movements of a subject. A subject is first calibrated to a grid stimulus of known spatial dimensions (Dyer et al. 2006), and then when test images are viewed it is possible to accurately quantify the different regions of a scene to which the subject pays attention, the sequence order of this attention, and thus also what features of a scene may escape the direct visual attention of a viewer (Duchowski 2003). The use of this non invasive technique then directly enables the measurement of subject attention to the different components of a stimulus (Figure 2), and has been extensively employed for static images in many fields including medicine, forensics, face processing, advertising, sport and perceptual learning (Dyer et al. 2006; Horsely 2014; Russo et al. 2003; Tatler 2014; Vassallo et al. 2009; Yarbus 1967).
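To make concrete how a stream of calibrated gaze samples becomes a sequence of fixations, the sketch below implements a simple dispersion-threshold approach (often called I-DT). This is one common textbook method and is not claimed to be the algorithm used by any particular eye tracker or by the studies cited here. The thresholds (35 pixels, 100 ms) and the synthetic samples are assumptions chosen purely for illustration.

```python
from typing import List, NamedTuple, Tuple

Sample = Tuple[float, float, float]  # (time_ms, x_px, y_px)


class Fixation(NamedTuple):
    start_ms: float
    end_ms: float
    x: float
    y: float


def _dispersion(window: List[Sample]) -> float:
    """Horizontal plus vertical spread of the gaze points in a window."""
    xs = [s[1] for s in window]
    ys = [s[2] for s in window]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def detect_fixations(samples: List[Sample],
                     max_dispersion_px: float = 35.0,
                     min_duration_ms: float = 100.0) -> List[Fixation]:
    """Group time-ordered gaze samples into fixations (dispersion threshold)."""
    fixations: List[Fixation] = []
    i, n = 0, len(samples)
    while i < n:
        # Grow a window from sample i until it spans the minimum duration.
        j = i
        while j < n and samples[j][0] - samples[i][0] < min_duration_ms:
            j += 1
        if j >= n:
            break  # not enough data left for another fixation
        if _dispersion(samples[i:j + 1]) <= max_dispersion_px:
            # Keep extending while the gaze stays within the spatial threshold.
            while j + 1 < n and _dispersion(samples[i:j + 2]) <= max_dispersion_px:
                j += 1
            window = samples[i:j + 1]
            cx = sum(s[1] for s in window) / len(window)
            cy = sum(s[2] for s in window) / len(window)
            fixations.append(Fixation(window[0][0], window[-1][0], cx, cy))
            i = j + 1
        else:
            i += 1  # gaze is moving (e.g. mid-saccade); slide the window on
    return fixations


# Synthetic 100 Hz recording: steady gaze, a quick jump, steady gaze again.
samples = [(t * 10.0, 100.0 + (t % 2), 100.0) for t in range(20)]
samples += [(200.0, 200.0, 170.0), (210.0, 300.0, 240.0)]
samples += [(220.0 + t * 10.0, 400.0, 300.0 - (t % 2)) for t in range(20)]

fixations = detect_fixations(samples)
for f in fixations:
    print(f)
```

The ordered list of fixations is what allows the sequence of attention to be reconstructed, and the two samples that fall between fixations correspond to the saccade that links them.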

Figure 1. The way our eye samples the world means that only the central fovea region is viewed in detail. The left hand image shows letters scaled to equal legibility when a subject fixates gaze on the central dot, and the right hand image is a photographic reconstruction of how an eye would typically resolve detail of the Sydney Harbour Bridge at one point in time.

In recent times there has been a growing appreciation that to understand how the human visual system and brain process complex information, the use of moving images has significant advantages since these stimuli may more accurately represent the very complex and dynamic visual environments in which we typically operate (Tatler et al. 2011). For example, when the eyes of a subject are tracked whilst driving a car, it can be observed that the gaze of subjects tends to be directed ahead of the responding action that a driver will take (Land and Lee 1994), and in other real life activities like making a cup of tea test subjects also tend to fixate on particular objects before an action like picking up an object (Land et al. 1999). This shows visual processing is often dynamic and may be influenced by top down volitional goals of a subject, whilst static images may not always best represent how subjects’ actions are informed by visual input in a dynamic situation (Tatler 2014). Interestingly, the capacity of subjects to visually anticipate events may be linked to performance or experience at a given action, as elite cricket batsmen viewing action can more efficiently predict the location where a ball will bounce in advance of the event, providing significant advantages for facing fast bowling where decisions must be made very quickly and accurately (Land and McLeod 2000). Thus there is evidence that visual perception and eye movements for moving images may be influenced by top down mechanisms and experience, as well as bottom up salience driven mechanisms of visual processing (Tatler 2014).

Subject viewer gaze and attention in dynamic environments can also be significantly influenced by the actions of other people who may be viewed within a scene. For example, when viewing a simple magic trick where an experienced magician in a video waves a hand to make an object disappear, the gaze direction of subjects viewing the video is heavily influenced by the actual gaze direction of the magician in the video clip (Tatler and Kuhn 2007). If the magician appears to pay attention to his waving hand then subjects follow this misdirection, and the magic trick, performed with the other hand, goes undetected and is therefore successful. However, this pattern changes if the magician’s gaze attends to the hand performing the apparent magical act, and the trick is then readily detected by observers. This simple but highly effective demonstration shows how viewer experience is not only driven by reflexive bottom up salience signals present in complex images, but that several top down and/or contextual factors may influence visual behaviour. The effect of dynamic complex environments on subject eye movements has also been observed in demonstrations of how people might encounter each other and either divert or direct their gaze depending upon prior experience, the perception of threat and/or chance of a collision (Jovancevic-Misic and Hayhoe 2009). Other evidence of top down type influences on observer gaze behaviour comes from our understanding of how instructions or narrative may influence where a subject looks (Land and Tatler 2009; Tatler 2014; Yarbus 1967). For example, in the classic eye movement experiments done by Yarbus (1967), in which he presented static images to test subjects, a variety of different instructions were provided for viewing the painting ‘The Unexpected Visitor’ by Ilya Repin. These instructions included estimating the material circumstances of the people depicted within the painting, their ages, or their clothing. A very different set of saccades and fixations was observed for each instruction and for a free viewing situation (which might be taken as a condition mainly driven by bottom up salience factors), showing that top down viewing goals strongly influence the way in which gaze is directed (Tatler 2014; Yarbus 1967).

Eye Tracking For Understanding Dynamic and Complex Visual Information

Whilst these clever, and comparatively complex, evaluations of visual perception are currently teaching us a lot about human visual performance and viewer experience, the current rapid advances in computer technology and eye tracking are now starting to enable the testing of how subjects view very complex dynamic environments as encapsulated in movies (Mital et al. 2011; Smith and Henderson 2008; Smith and Mital 2013; Smith et al. 2012; Treuting 2006; Vig et al. 2009). This potentially allows for new insights into increasingly real world type viewer experience, how the visual system potentially processes very complex information, and how viewers from different demographics may interpret information content in films. For example, some recent work has looked at subject viewer attention within movies and observed high levels of attention to faces (Treuting 2006), revealing behaviours consistent with previous work that used static images (Vassallo et al. 2009; Yarbus 1967), but a wealth of opportunities are becoming available for better understanding real world visual processing.

Figure 2. When we view an image our eyes often fixate on key areas of interest for short periods of about a third of a second, and then the eyes may make ballistic shifts (saccades) to other features. When a typical subject viewed sequential images from the film ‘Up’, fixations (green circles) mainly centred on the respective faces of main characters, whilst lines between fixations show the direction of respective saccades [image from Craig Batty, Adrian Dyer, Claire Perkins and Jodi Sita. 2014. Seeing Animated Worlds: Eye Tracking and the Spectator’s Experience of Narrative (Bloomsbury, 2015) with permission].

A current issue in interpreting eye movement data for subjects viewing a film is how such a large volume of data can be managed and statistically separated to try and interpret viewer experience. One initial solution is gaze plot analyses, which show the average attention of a number of subjects to a particular scene (Fig. 3). Investigations on still images using gaze plot analyses have indicated a strong tendency for central bias to an image that is largely independent of factors like subject matter or composition (Tatler 2007). Studies on moving images appear to confirm a tendency for viewing restricted parts of the overall image in detail (Dorr et al. 2010; Goldstein et al. 2007; Mital et al. 2011; Smith and Henderson 2008; Tosi et al. 1997). This may hold important implications for data compression type algorithms where large amounts of image data may be streamed to a variety of different mobile viewing devices, such that certain information does not have to be displayed at high resolution due to the resolution of the human eye (Fig. 1), or certain parts of the movie may even be modified to enhance viewing experience for visually impaired viewers (Goldstein et al. 2007). Despite the qualitative value of gaze plot displays, quantitative analyses can be better facilitated by allocating Areas of Interest (Fig. 4) to certain components of a scene that are hypothesised to be of high value for dissecting different theories about information processing of moving images. For example, one of the current issues in understanding how eye tracking can inform film culture, and how movies can be a useful stimulus for understanding visual behaviour, is having a method that can explore the potential effects of narrative, which is a hypothesised top down or endogenous control on viewer gaze behaviour, when subjects are freely viewing a movie to enable natural behaviour (Smith 2013).
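One simple way to summarise the pooled attention of several viewers, and to put a number on central bias, is to bin duration-weighted fixations into a coarse grid over the frame. The sketch below is a minimal illustration under stated assumptions: the grid size, the definition of the "central" region (the middle half of the columns and rows) and the example fixations are all hypothetical, and this is not the procedure used to produce the figures in this article.

```python
from typing import List, Sequence, Tuple

Fix = Tuple[float, float, float]  # (x_px, y_px, duration_ms)


def attention_grid(fixations_by_viewer: Sequence[Sequence[Fix]],
                   width: float, height: float,
                   cols: int = 8, rows: int = 6) -> List[List[float]]:
    """Share of pooled fixation time per grid cell.

    Each viewer's durations are normalised to sum to 1 before pooling, so a
    viewer with unusually long fixations does not dominate the group map.
    """
    grid = [[0.0] * cols for _ in range(rows)]
    viewers = [v for v in fixations_by_viewer if sum(d for _, _, d in v) > 0]
    for fixes in viewers:
        total = sum(d for _, _, d in fixes)
        for x, y, d in fixes:
            if not (0 <= x < width and 0 <= y < height):
                continue  # ignore fixations that fall off the frame
            c = int(x / width * cols)
            r = int(y / height * rows)
            grid[r][c] += d / total / len(viewers)
    return grid


def central_share(grid: List[List[float]]) -> float:
    """Fraction of pooled attention in the middle half of columns and rows."""
    rows, cols = len(grid), len(grid[0])
    return sum(grid[r][c]
               for r in range(rows // 4, rows - rows // 4)
               for c in range(cols // 4, cols - cols // 4))


# Hypothetical fixations for two viewers of an 800 x 600 frame.
viewer_a = [(400.0, 300.0, 300.0), (410.0, 310.0, 100.0)]
viewer_b = [(50.0, 50.0, 200.0), (420.0, 290.0, 200.0)]

grid = attention_grid([viewer_a, viewer_b], width=800, height=600)
print(central_share(grid))
```

A heat map display is essentially a smoothed rendering of such a grid. A central share well above the fraction of frame area covered by the central region would be one indication of central bias.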

Figure 3. Gaze plot shows the mean attention of a number of viewers (n=12) to a particular scene. In this case faces capture most attention, consistent with previous reports (Yarbus 1967; Vassallo et al. 2009) [image from Craig Batty, Adrian Dyer, Claire Perkins and Jodi Sita. 2014. Seeing Animated Worlds: Eye Tracking and the Spectator’s Experience of Narrative (Bloomsbury, 2015) with permission].

Recently one study has tackled this question by using a montage sequence from the animated movie ‘Up’ (Pete Docter, 2009) to explore whether it is possible to collect empirical evidence that supports modulation between bottom up and top down mechanisms. The animation montage is a high value case study as it encapsulates a lifetime of narrative within a 262 s film sequence that contains no dialogue (Batty 2011), and the overall salience of the two principal characters, ‘Carl’ and ‘Elli’, is fairly consistently matched throughout the montage owing to the control afforded by animation production. For example, in the opening scene in which viewers first encounter these two characters, an almost identical percentage of viewing time was directed to Carl and to Elli; however, as the montage unfolds through a life story of marriage, dreams of children, miscarriage, dreams of travel, illness and death, there is a significant difference in the amount of attention viewers pay to the respective characters at different stages (Batty et al. 2014). This suggests that the influence of top down processing on the overall salience of complex images, as has been observed in some studies using short motion displays in laboratory conditions (Jovancevic-Misic and Hayhoe 2009; Tatler and Kuhn 2007; Tatler 2014), is a promising avenue of investigation for movie studies, provided that protocols can be designed to control the many factors that can influence image salience (Parkhurst et al. 2002; Martinez-Conde et al. 2004; Tatler et al. 2014).

FIGURE 4. Areas of interest can be programmed to quantify the number and respective duration of fixations on key components within a scene of a movie, which may allow for the dissection of how factors like narrative influence viewer behaviour. [From Craig Batty, Adrian Dyer, Claire Perkins and Jodi Sita. 2014. Seeing Animated Worlds: Eye Tracking and the Spectator’s Experience of Narrative (Bloomsbury, 2015) with permission.]
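The kind of quantification described in this caption can be sketched in a few lines of Python. The example below counts fixations and sums their durations within rectangular Areas of Interest, then expresses each area's dwell time as a share of all time spent on any area. It is a minimal sketch under stated assumptions: the class and function names are hypothetical, the areas are static rectangles (in a film they would normally have to follow a character from frame to frame), and the coordinates labelled "carl" and "elli" in the usage note are invented for illustration rather than drawn from the study discussed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fixation:
    x: float            # horizontal gaze position in pixels
    y: float            # vertical gaze position in pixels
    duration_ms: float  # how long the eye rested at this position


@dataclass(frozen=True)
class AreaOfInterest:
    name: str
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x, y):
        """True if the point lies inside this (static, rectangular) area."""
        return self.left <= x <= self.right and self.top <= y <= self.bottom


def summarise_fixations(fixations, areas):
    """Count fixations and total their durations for each area of interest."""
    summary = {a.name: {"count": 0, "total_ms": 0.0} for a in areas}
    for f in fixations:
        for a in areas:
            if a.contains(f.x, f.y):
                summary[a.name]["count"] += 1
                summary[a.name]["total_ms"] += f.duration_ms
                break  # assumption: a fixation is credited to the first matching area
    return summary


def dwell_share(summary):
    """Each area's dwell time as a proportion of time spent on any area."""
    total = sum(v["total_ms"] for v in summary.values())
    return {name: (v["total_ms"] / total if total else 0.0)
            for name, v in summary.items()}
```

For instance, with two hypothetical areas `AreaOfInterest("carl", 100, 200, 400, 600)` and `AreaOfInterest("elli", 800, 200, 1100, 600)`, passing a viewer's fixations for one scene to `summarise_fixations` and then to `dwell_share` yields the percentage of viewing time directed to each character. Comparing these shares scene by scene is one way a shift in attention across a narrative could be examined.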


Yet eye tracking can only tell us part of the story: it shows what people look at, but not how and why these ways of looking emerge and are enacted. Other qualitative research approaches, such as those used in visual and sensory ethnography (Pink 2013; Pink 2015), are therefore needed to put eye tracking data into context. This involves approaching viewing, and the practices of vision that it entails, as situated activities and as part of a broader experiential repertoire beyond the eye. The subjectivity and selectivity of viewing that the studies outlined above have evidenced, once documented and measured, can only be properly understood as emergent from particular (and always complex) environmental conditions and embodied experiences. In the next section we therefore turn to anthropological approaches to vision and the environment in order to show how this might be achieved. Before proceeding, however, we note that when working across disciplines there is inevitably a certain amount of conceptual slippage. Here this means that whereas we have ended the previous paragraph by suggesting that eye tracking enables our understanding of how complex environmental information is processed, in the next sections we refigure this way of thinking to consider how human perception and viewer experience are constituted in relation to the affordances of complex environments of which they are also part.

Situating Viewing as Part of Complex Environments

The environment, as a concept, is slippery and is used to different empirical and political ends in different contexts. As the anthropologist Tim Ingold has emphasised, in contemporary discourses ‘the environment’ is often referred to as an entity, and as something from which we exist separately. Indeed, this idea is present in our discussion above, where we have considered how eye tracking studies might better show us how we process complex environmental information. As Ingold expresses it, this means we are ‘inclined to forget that the environment is, in the first place, a world we live in, and not a world we look at’. He argues that ‘We inhabit our environment: we are part of it; and through this practice of habitation it becomes part of us too’ (Ingold 2011). Following this approach, the environment can be understood as an ecology that humans are part of, and with which we, and the ways we view, see and experience, are mutually constituted. This does not simply mean that ‘we’ as humans are encompassed by the environment. It means that the environment is co-constituted by us and our relationships with other constituents, which for our purposes in this article include film, images, art, technologies, other humans, the weather and the built environment (as well as much more). As advanced by Ingold (2000, 2010) and the art historian Barbara Stafford (2006), approaches that critique linguistic and semiotic studies invite an analysis which acknowledges that, as Stafford puts it, ‘when you open your eyes and actively interrogate the visual scene, what you see is that aspect, or the physical fragments, of the environment that you perform’ (Stafford 2006). This also means that the experience of film does not simply involve us looking at something that is external to us; rather, it is through the affordances of film, in relation to the other constituents of our environments or worlds, that viewing becomes meaningful to us. In this interpretation the use of ‘we’ derives from the development of a universal theory of human perception and our relationship to a (complex) environment. Yet, as we explain in the next section, this rendering does not dismiss the idea that different people may often perceive the same information differently; on the contrary, it invites us to study precisely how and why difference emerges.

If we take Ingold’s approach further, to focus on how meanings are generated through our engagements with and experiences of visual images, we can appreciate that the measurement and monitoring patterns that emerge from eye tracking studies are materialisations or representations not just of how the eye (or the mind) responds to the moving image. Rather, they can be understood as standing for (but not actually explaining the meaning of) what people do with the moving image. Building on philosophical and other traditions emerging from the work of Merleau-Ponty, Gibson and Jonas, Ingold has argued that human perception, learning and knowing emerge from movement, specifically as we move through our environments and engage with the affordances of the other things and processes we encounter (Ingold 2000). With regard to art, he has used this approach to ask:

Should the drawing or painting be understood as a final image to be inspected and interpreted, as is conventional in studies of visual culture, or should we rather think of it as a node in a matrix of trails to be followed by observant eyes? Are drawings or paintings of things in the world, or are they like things in the world, in the sense that we have to find our ways through and among them, inhabiting them as we do the world itself? (Ingold 2010: 16)

If we transfer this idea to the question of how we view film, we might then ask how we, as viewers, inhabit film, and what eye tracking studies can tell us about these forms of habitation. If we consider the relationship established between the viewer’s eyes and the film by eye tracking visualisations such as those shown in the earlier sections of this article, we can begin to think about how the movement of the eye and the movement of the film become entangled. Indeed, while the film and the eye will both inevitably continue to move, the question becomes not simply how the composition and action on the screen influence the movement of the eye, but rather how the eye selects the aspects of the composition and action on the screen with which to move. By taking this perspective, we are able to remove something of the technological determinism that underpins assumptions that eye tracking studies might enable film and advertising organisations to better influence viewing behaviours. Instead, it directs us towards considering what eye tracking studies might tell us about what people do when they view, and how this can inform us about how they inhabit a world in which film, and more generally the moving image, is a ubiquitous presence.

The work and arguments discussed thus far in this section have focused on interpreting the question of how, at a general level, people see when they are viewing moving images. The theories advanced so far, however, neither explain nor discuss the usefulness of attending to the patterns found in eye tracking studies. Moreover, the examples and visualisations of eye tracking studies shown in the earlier sections of this article were produced with a sample of people who were likely to have similar viewing perspectives and, as might therefore be expected, showed distinct patterns in the ways that people view particular information. Indeed, as far as we know, the data that would be needed to tell us to what extent such viewing patterns are universal (that is, supported by studies and theories of the ways in which the human brain processes visual information) and to what extent they were situationally and biographically constituted for this particular group of participants does not yet exist. Such work would be of high value given the increasing globalisation of both entertainment industries and forms of activism that use visual media, where films may be distributed in markets distant from the original context in which audience experience is understood. Indeed, studies of how people learn to look and know, undertaken in culturally specific contexts, clearly reveal that where we look and what we see is contingent on processes of learning and apprenticeship, and is therefore specific to complex environments.

Vision, Learning and Knowing

Eye tracking studies have shown us that there are sometimes similarities and patterns in the ways people view and remember complex images (Norton and Stark 1971), although where such patterns are present they are easily changed through instruction (Yarbus 1967; Tatler 2014). We have seen in the earlier sections of this article that participants in studies have consistently fixed their gaze on the faces of film characters (Figs 2, 3), and that visual attention may become focused on a film character whose story line commands (or affords) particularly powerful affective and/or empathetic connections for viewers. Further eye tracking research would be needed to underpin any proposal that such ways of viewing are both gendered and culturally specific; however, existing research in visual and media anthropology indicates that this is likely to be the case. Two bodies of literature are relevant here: first, the applied visual and media anthropology literature, and second, the anthropology of vision.

Applied visual and media anthropology studies (Pink 2007) focus on using anthropological understandings of media, along with audiovisual interventions (often in the form of filmmaking processes and film products), to work towards new forms of social and public awareness and societal change. This work draws on and advances a strand in film studies developed in the work of Laura Marks, who has advanced the idea of the ‘embodied viewing experience’ (2000: 211). Marks, whose work focuses on intercultural cinema, has argued that as ‘a mimetic medium’ cinema is ‘capable of drawing us into sensory participation with its world’ (Marks 2000: 214). The notion of empathy as a route towards creating intercultural understanding through film is also increasingly popular in the visual anthropology literature (discussed in Pink 2015). While on the whole there has been insufficient research into the ways in which people view intervention films of this kind, one study that has been undertaken suggests that viewer attention and, importantly, viewers’ capacity to engage with and remember film narrative can depend on the ways in which they are able to engage affectively or empathetically with the experiences of film characters. Susan Levine’s media anthropology study of how viewers discussed a film made as part of a South African HIV/AIDS intervention campaign, which drew on local narratives to communicate its central message, is a good example (Levine 2007). Levine (unsurprisingly) found that participants engaged with the stories of film characters that followed locally relevant narratives, thus generating important lessons for filmmaking campaigns of this kind, where it is often difficult to communicate generic health messages to local audiences. Bridging this type of anthropological understanding with a capacity to map viewer attention to faces and expressions within visual representations (Vassallo et al. 2009) may allow for a more comprehensive understanding of why film is such a powerful medium for communication.

Anthropological studies of vision provide further evidence of the importance of attending to how seeing is situated. Indeed, when vision is understood as a practice rather than as a behaviour, it is not just situated but also learned through participation. The anthropologist Cristina Grasseni has developed a theory of what she calls ‘skilled vision’ through which to explain this (Grasseni 2004, 2007, 2011). As she puts it:

The “skilled visions” approach considers vision as a social activity, a proactive engagement with the world, a realm of expertise that depends heavily on trained perception and on a structured environment (Grasseni 2011).

Emphasising that skilled visions are ‘positional, political and relational’ as well as sensuous and corporeal, Grasseni points out that ‘Because skilled visions combine aspects of embodiment (as an educated capacity for selective perception) and of apprenticeship, they are both ecological and ideological, in the sense that they inform worldviews and practice’ (Grasseni 2011). As Pink has shown through her work on the Spanish bullfight, what one sees when viewing the performance is highly contingent on how one has learned to view, on one’s own empathetic, embodied ways of sensorially and affectively ‘feeling’ the performance at which a visual representation was created, and on how one’s existing ways of knowing and understanding the world inform perception (Pink 1997; 2011). For example, consider the different ways in which Figure 5, or a film sequence of the same performance, would be interpreted by a bullfighting fan and by an animal rights activist. Each will have learned how and what to know about this performance through different trajectories. Whilst an eye tracking investigation of these two subjects might show somewhat similar patterns (especially if bottom up mechanisms dominate), their semantic interpretation of the visual input may be completely different. Whether such information content can be assessed by evaluating the bottom up or top down mechanisms involved in visual processing will be a major challenge when interpreting information as complex as that typically perceived in a movie.


Figure 5. How emotive content, as is common in many films, may influence the perception of visual images even when the same information is presented to viewers remains a major topic for exploration. For example, we know that the bullfight is interpreted, and affectively experienced, very differently when viewed by bullfight fans and by animal rights activists. We also know that learning how to view the bullfight, as a bullfight fan, is a process of cultural apprenticeship (see for example Pink 1997). Consider how, for the above image, the action of a bullfight could promote very different visual behaviour depending upon cultural context, upon whether a subject was a bullfighting fan or an animal rights activist, upon whether the representation was depicted as animation instead of real life, or upon whether it was shown in motion rather than as a still image. Copyright: Sarah Pink.


Bringing together measurement and monitoring data with anthropologically informed ethnographic ways of knowing, which are always collaboratively crafted and sensorially and tacitly known, is increasingly common. For instance, in energy research a number of projects seek to combine ethnographic and energy consumption measurement data (Cosar Jorda et al. 2013). Such an approach has not yet been integrated into eye tracking studies of movies, yet this would be the next step if we want to better understand the significance and relevance of the types of data and knowledge that eye tracking studies can offer for understanding film audiences. This presents certain challenges, which impinge on, but are not necessarily unique to, the use of eye tracking data in audience research. The first challenge is to generate sufficient interdisciplinary understanding between the approaches involved. This article has sought to initiate that process. That is, it has explained how eye tracking and anthropological-ethnographic (at once theoretical and practical) approaches offer different, and differently theorised, perspectives on the ways in which people look at and participate in the viewing of film. At the same time, it has suggested that these different approaches and disciplines offer something to each other that enables new questions to be asked, and can therefore develop deeper understandings of how audiences view film.

Future work testing human visual behaviour with complex stimuli of the kind typically present in movies may help build our understanding of how humans sometimes process very complex information to build an understanding of the surrounding world, but sometimes also miss salient information in complex moving images, as in the ‘Gorillas in our midst’ study (Simons and Chabris 1999). Current theories suggest that perceptual blindness to salient and recognisable stimuli occurs when our attention is captured by other competing stimuli that impose a cognitive load to process (Simons and Chabris 1999; Levin et al. 2000; Memmert 2006), but more fully exploring the effects of narrative or instructions, character gaze and other potential top down mechanisms will likely contribute usefully to our knowledge of perceptual blindness. Indeed, as discussed above in relation to anthropological and ethnographic factors, factors like experience do appear to modulate the ability of subjects to detect a gorilla in a perceptual blindness test (Memmert 2006), potentially suggesting that future investigations of eye tracking and movies should consider the broad range of human experience that can influence our perception. This type of research is also likely to provide richer understandings in some ethnographic studies, as researchers will have, possibly for the first time, access to precise quantitative data on whether an observer actually failed even to look at certain objects in a scene, or whether such information, like an unexpected gorilla in a basketball game, was viewed but not directly perceived (Memmert 2006). Many individual scenes within a film are short, typically of about 4 s duration, and so it is often only possible for viewers to process a small percentage of the entire visual presentation in detail, especially in cases where movies are subtitled (Smith 2013). This means that elements of a film that might be essential to the complete comprehension of the narrative story line may easily be missed by a percentage of an audience, depending upon their individual knowledge base, linguistic skills, attention and motivation. Eye tracking potentially offers filmmakers a useful vehicle for testing different demographic groups to better understand how different components of scenes might be constructed to enhance viewer experience, and also for building our understanding of how we process very complex environmental information.



Acknowledgements. We are very grateful to Dr Craig Batty, Dr Claire Perkins and Dr Jodi Sita for discussions and permission to use images from their collaborative work with one of us (AGD), and for broader discussions with members of the Eye Tracking of the Moving Image research group. AGD acknowledges funding support from the Australian Research Council (LE130100112) for eye tracking equipment. We are grateful to Dr Lalina Muir for her careful proofreading of the manuscript.



Batty, Craig. 2011. Movies That Move Us: Screenwriting and the Power of the Protagonist’s Journey. Basingstoke: Palgrave Macmillan.

Batty, Craig, Dyer, Adrian, G., Perkins, Claire, and Sita, Jodi. 2015. Seeing Animated Worlds: Eye Tracking and the Spectator’s Experience of Narrative (Palgrave, forthcoming).

Cosar Jorda, P, Buswell, RA, Webb, LH, Leder Mackley, K, Morosanu, R, and Pink, Sarah. 2013. ‘Energy in the home: Everyday life and the effect on time of use.’ In The Proceedings of the 13th International Conference on Building Simulation 2013. Chambery, France. 25-28/8/2013.

Docter, P. 2009. Up. Disney-Pixar Motion Film.

Dorr, M, Martinetz, T, Gegenfurtner, KR, and Barth, E. 2010. ‘Variability of eye movements when viewing dynamic natural scenes.’ Journal of Vision 10 (28): 1-17.

Duchowski, Andrew. 2003. Eye tracking methodology: theory and practice. London: Springer-Verlag.

Dyer, Adrian, G., Found, Brian, and Rogers, Doug. 2006. ‘Visual attention and expertise for forensic signature analysis.’ Journal of Forensic Science 51: 1397–1404.

Goldstein, Robert, B., Woods, Russell, L., and Peli, Eli. 2007. ‘Where people look when watching movies: Do all viewers look at the same place?’ Computers in Biology and Medicine 37 (7): 957-964.

Grasseni, Cristina. 2004. ‘Video and ethnographic knowledge: skilled vision in the practice of breeding.’ In Working Images, edited by S Pink, L Kürti, and AI Afonso, 259-288. London: Routledge.

Grasseni, Cristina. 2007. Skilled Visions. Oxford: Berghahn.

Grasseni, Cristina. 2011. ‘Skilled Visions: Toward an Ecology of Visual Inscriptions.’ In Made to be Seen: Perspectives on the History of Visual Anthropology, edited by M. Banks and J. Ruby. Chicago: University of Chicago Press.

Horsley, Mike. 2014. ‘Eye Tracking as a Research Method in Social and Marketing Applications.’ In Current Trends in Eye Tracking Research, edited by M Horsley et al., 179-182. London: Springer.

Ingold, Tim. 2000. The Perception of the Environment. London: Routledge.

Ingold, Tim. 2010. ‘Ways of mind-walking: reading, writing, painting.’ Visual Studies 25 (1): 15–23.

Ingold, Tim. 2011. Being Alive. Oxford: Routledge. p 95.

Jovancevic-Misic, Jelena, and Hayhoe, Mary. 2009. ‘Adaptive Gaze Control in Natural Environments.’ Journal of Neuroscience 29 (19): 6234–6238. DOI:10.1523/JNEUROSCI.5570-08.2009.

Kustov, Alexander, A., and Robinson, David Lee. 1996. ‘Shared neural control of attentional shifts and eye movements.’ Nature 384: 74–77.

Levine, Susan. 2007. ‘Steps for the Future: HIV/AIDS, Media Activism and Applied Visual Anthropology in Southern Africa.’ In Visual Interventions, edited by S. Pink, 71-89. Oxford: Berghahn.

Marks, Laura. 2000. The Skin of the Film: Intercultural Cinema, Embodiment, and the Senses. Durham and London: Duke University Press.

Martinez-Conde, Susana, Macknik, Stephen, L., and Hubel, David, H. 2004. ‘The role of fixational eye movements in visual perception.’ Nature Reviews Neuroscience 5: 229–240.

Memmert, Daniel. 2006. ‘The effects of eye movements, age, and expertise on inattentional blindness.’ Consciousness and Cognition 15 (3): 620–627.

Mital, Parag, K., Smith, Tim, J., Hill, Robin, L., and Henderson, John, M. 2011. ‘Clustering of gaze during dynamic scene viewing is predicted by motion.’ Cognitive Computation 3: 5–24.

Nodine, Calvin, F., Mello-Thoms, Claudia, Kundel, Harold, L., and Weinstein, Susan, P. 2002. ‘Time course of perception and decision making during mammographic interpretation.’ American Journal of Roentgenology 179: 917–923.

Norton, David, and Stark, Lawrence. 1971. ‘Scanpaths in eye movements during pattern perception.’ Science 171: 308–311.

Parkhurst, Derrick, Law, Klinton, and Niebur, Ernst. 2002. ‘Modeling the role of salience in the allocation of overt visual attention.’ Vision Research 42: 107–123.

Pashler, Harold. 1998. Attention. Hove, UK: Psychology Press Ltd.

Di Russo, Francesco, Pitzalis, Sabrina, and Spinelli, Donatella. 2003. ‘Fixation stability and saccadic latency in elite shooters.’ Vision Research 43: 1837–1845.

Pink, Sarah. 1997. Women and Bullfighting. Oxford: Berghahn.

Pink, Sarah, ed. 2007. Visual Interventions. Oxford: Berghahn.

Pink, Sarah. 2011. ‘From Embodiment to Emplacement: re-thinking bodies, senses and spatialities.’ Sport, Education and Society (SES), special issue on New Directions, New Questions. Social Theory, Education and Embodiment 16 (3): 343-355.

Pink, Sarah. 2013. Doing Visual Ethnography, 3rd edition. London: Sage.

Pink, Sarah. 2015. Doing Sensory Ethnography, 2nd edition. London: Sage.

Simons, Daniel, J., and Chabris, Christopher, F. 1999. ‘Gorillas in our midst: sustained inattentional blindness for dynamic events.’ Perception 28 (9): 1059-1074.

Smith, Tim, J., and Henderson, John, M. 2008. ‘Edit blindness: The relationship between attention and global change blindness in dynamic scenes.’ Journal of Eye Movement Research 2: 1–17.

Smith, Tim, J. 2013. ‘Watching you watch movies: Using eye tracking to inform cognitive film theory.’ In Psychocinematics: Exploring Cognition at the Movies, edited by A. P. Shimamura, 165-191. New York: Oxford University Press.

Smith, T, Levin, D, and Cutting J. 2012. ‘A Window on Reality: Perceiving Edited Moving Images.’ Current Directions in Psychological Science 21(2): 107-113. doi: 10.1177/0963721412437407

Smith, Tim, J., and Mital, Parag, K. 2013. ‘Attentional synchrony and the influence of viewing task on gaze behaviour in static and dynamic scenes.’ Journal of Vision 13 (8): 16.

Stafford, Barbara Maria. 2006. Echo Objects: the Cognitive Work of Images. Chicago: University of Chicago Press.

Tatler, Ben, W. 2007. ‘The central fixation bias in scene viewing: Selecting an optimal viewing position independently of motor biases and image feature distributions.’ Journal of Vision 7 (14): 4, 1–17. doi:10.1167/7.14.4.

Tatler, Ben, W. 2014. ‘Eye Movements from Laboratory to Life.’ In Current Trends in Eye Tracking Research, edited by M Horsley et al., 17-35.

Tatler, Ben, W., and Kuhn, Gustav. 2007. ‘Don’t look now: The magic of misdirection.’ In Eye Movements: A window on mind and brain, edited by R van Gompel, M Fischer, W Murray and R Hill, 697–714. Amsterdam: Elsevier.

Tatler, Ben, W., Hayhoe, Mary, M., Land, Michael, F., and Ballard, Dana, H. 2011. ‘Eye guidance in natural vision: Reinterpreting salience.’ Journal of Vision 11 (5): 1–23. doi:10.1167/11.5.5.

Tatler, Ben, W., Kirtley, Claire, Macdonald, Ross, G., Mitchell, Katy, MA., and Savage, Steven, W. 2014. ‘The Active Eye: Perspectives on Eye Movement Research.’ In Current Trends in Eye Tracking Research, 3-16. doi:10.1007/978-3-319-02868-2_16.

Treuting, Jennifer. 2006. ‘Eye tracking and cinema: A study of film theory and visual perception.’ Society of Motion Picture and Television Engineers 115 (1): 31-40.

Tosi, Virgilio, Mecacci, Luciano, and Pasquali, Elio. 1997. ‘Scanning eye movements made when viewing film: Preliminary observations.’ International Journal of Neuroscience 92 (1/2): 47-52.

Vassallo, Suzanne, Cooper, Sian, LC., and Douglas, Jacinta, M. 2009. ‘Visual scanning in the recognition of facial affect: Is there an observer sex difference?’ Journal of Vision 9: 1-10.

Vig, Eleonora, Dorr, Michael, and Barth, Erhardt. 2009. ‘Efficient visual coding and the predictability of eye movements on natural movies.’ Spatial Vision 22 (2): 397-408.

Yarbus, Alfred, L. 1967. Eye Movements and Vision. New York: Plenum.

Adrian Dyer is an Associate Professor in Media and Communication at RMIT University (Australia) investigating vision in complex environments. He is an Alexander von Humboldt Fellow (Germany) and a Queen Elizabeth II Fellow (Australia), and has completed postdoctoral positions at La Trobe University and Monash University (Australia), Cambridge University (UK), and Wuerzburg and Mainz Universities (Germany).

Sarah Pink is Professor of Design and Media Ethnography at RMIT University (Australia). She is visiting/guest Professor at Halmstad University (Sweden), Loughborough University (UK), and Free University Berlin (Germany). Her most recent books include Situating Everyday Life (2012), Doing Visual Ethnography 3rd edition (2013) and Doing Sensory Ethnography 2nd edition (2015).

Volume 25, 2015

Themed Issue: Eye-Tracking the Moving Image

Edited by Sean Redmond & Craig Batty


1.  Seeing into Things: Eye Tracking the Moving Image – Sean Redmond & Craig Batty

2.  Movement, Attention and Movies: the Possibilities and Limitations of Eye Tracking? – Adrian G. Dyer & Sarah Pink

3.  How We Came To Eye Tracking Animation: A Cross-Disciplinary Approach to Researching the Moving Image –  Craig Batty, Claire Perkins & Jodi Sita

4.  Sound and Sight: An Exploratory Look at Saving Private Ryan through the Eye-tracking Lens – Jenny Robinson, Jane Stadler & Andrea Rassell

5.  Subtitles on the Moving Image: An Overview of Eye Tracking Studies – Jan Louis Kruger, Agnieszka Szarkowska & Izabela Krejtz

6.  From Subtitles to SMS: Eye-Tracking, Texting and Sherlock – Tessa Dwyer

7.  Our Sherlockian Eyes: the Surveillance of Vision – Sean Redmond, Jodi Sita, & Kim Vincs

8.  Politicizing Eye-tracking Studies of Film – William Brown

9.  Read, Watch, Listen: A commentary on eye tracking and moving images – Tim J. Smith

Editorial – Seeing Into Things: Eye Tracking the Moving Image – Sean Redmond & Craig Batty

Seeing into Things

We chose Seeing into Things: Eye Tracking the Moving Image as the title of this special edition to foreground the importance of reaching beyond – and beneath – the surface of the screen and the worlds that it creates and envisions. Through the empirical data that eye tracking affords us we are able to evidence and account for the depth in perception and sensibility that accompanies or anchors viewing. Seeing into Things is also recognition of the layers – or epidermi – of technological vision: depth cues, focal length, camera movement, and the delicious qualities of mise-en-scene all invite or demand that the image is looked into. There is much to observe across the textures and texturality of the screen. Eye tracking technology sees into the eyes of the viewer who peers – pierces – into the immersive world of the screen, factual or fictional. We see beauty in this alignment between the eye tracker, the viewer, and the screen. As this special edition finds, Seeing into Things is an enriching and intoxicating way of (re)discovering the complexities of viewing the moving image.

The poetry in and of seeing is not simply experiential, but connected to neurological, anatomical and cognitive processes. It is also connected to culture, discourse and ideology, where seeing into things is always gendered, classed and raced, amongst other encultured practices and modes of being in the world. Seeing into Things enables us to see into ourselves and into the complex and sometimes messy relationships between biology and culture, the human and technology, and between eye, brain, body and ear.

This last carnal conjunction is essential to the work being undertaken in this special edition because Seeing into Things is also meant to critically draw attention to the ocularcentric way through which the world is presently imagined to be experienced. Our position here is not to support this insightful supremacy, but rather to offer challenges and counter-points to it. When we see into things in this special edition it is with the need to recognise the centrality of hearing to seeing; of touching to viewing; and of the incorporation of the full human sensorium as it is taken up and in, and extends itself towards, the screen worlds that move and affect it. As Vivian Sobchack (2000) observes:

As “lived bodies” (to use a phenomenological term that insists on “the” objective body as always also lived subjectively as “my” body, diacritically invested and active in making sense and meaning in and of the world), our vision is always already “fleshed out”–and even at the movies it is “in-formed” and given meaning by our other sensory means of access to the world: our capacity not only to hear, but also to touch, to smell, to taste, and always to proprioceptively feel our dimension and movement in the world. In sum, the film experience is meaningful not to the side of my body, but because of my body.

The eye, brain, body and ear conjunction is also recognition that in order to understand viewing processes, one needs to incorporate different academic disciplines and approaches; from the vision sciences, neuroscience and linguistics; from ethnography and anthropology; and from the arteries and veins of creative practice, to the orbital concerns of the phenomenological. To do our work properly, then, Seeing into Things requires the eyes, brains, bodies and ears of scientists, anatomists, anthropologists, musicologists, filmmakers, screenwriters, and screen theorists, amongst others. It is this exciting arts-science nexus that this special edition draws uniquely from and is built upon, offering a foundational intervention into the way one makes critical and creative sense of viewers’ engagement with the moving image.

But what are the origins of this interdisciplinary approach? From where did the impetus for Seeing into Things come? Let us now return to the origins of the formation of the research group that drives many of the articles in this edition. Let us set the cinematic mood for some groundbreaking eye tracking research.

In the Mood for Eye Tracking Research

After a screening of the film In the Mood for Love (Wong Kar-wai, 2000), Sean Redmond, a film and television scholar, mentioned to neuroscientist and anatomist Jodi Sita how its rich colour scheme, expressionistic lighting and meandering narrative had fascinated and affected him. He suggested that he was sure his eyes were focusing on these visual elements, as they were being foregrounded, but also that they ‘wandered’ about the screen, choosing to look at motifs, characters and textures of their own volition, and where ‘mood’ took them. Sean contended that his eyes, or the way he viewed film, were under the command of the film’s narrative and aesthetics, but were also free to discover the opulent fictive world for themselves. He suggested that viewing is an embodied experience.

Jodi responded quite directly: how do you know this? What evidence do you have? She continued that perception and comprehension are cognitive processes, and that what the eyes attend to in any viewing context can be measured objectively and understood through eye tracking, as well as other physiological technologies such as the measurement of pupil dilation. In that moment an arts-science debate was ignited, and the idea for an empirically driven research group devoted to eye tracking the moving image was born. They were now in the mood for some landmark empirical research of the moving image.

Jodi and Sean set up the Melbourne-based Eye Tracking and the Moving Image Research group at the end of 2012. They had two central goals in bringing the group together: one, they wanted to utilise eye tracking technology more centrally in the analysis and examination of the moving image; and two, they wanted to draw together scholars and practitioners from the Sciences, and the (Creative) Arts and Humanities so that different modes of enquiry, and theoretical and methodological apparatus, were placed in the same analytical arena (see Jodi’s account of the group’s formation in this edition). It was felt that having a room full of filmmakers, artists, film and cultural theorists, screenwriters, visual ethnographers, vision scientists and neuroscientists would generate new and exciting conversations and deliberations about how viewers engage with the moving image. To employ a games analogy, Jodi and Sean felt it was as if we had all pinned our tails to different parts of the donkey, but that through opening our eyes together, we would all finally get to see and comprehend its full and glorious anatomy.

Their desire was to build upon existing research that drew disparate disciplines together, extending the type of work being conducted in arts-science research centres such as the Department of Psychology, Neuroscience and Behaviour’s NeuroArts Lab at McMaster University, Hamilton, Ontario. The formation of the group created a strong commitment to inter-disciplinary and cross-institutional relationships, and to what was considered a necessary dialogue between different disciplines united by a shared desire: to investigate vision regimes in relation to the affecting power and beauty of the moving image.

The utilisation of eye tracking technology was thus not born out of a technological determinism, but as a tool to bridge and fuse different approaches and methodologies in order that new findings, new knowledge, and new ways of understanding seeing and sensing images could emerge. This approach drew upon work by scholars who had already ‘crossed the line’, so to speak, including the work of Uri Hasson, Ohad Landesman, Barbara Knappmeyer, Ignacio Vallines, Nava Rubin, and David J. Heeger, who had already introduced to the field the idea of neurocinematics, the neuroscience of film, and the ‘inter-subject correlation analysis (ISC) … used to assess similarities in the spatiotemporal responses across viewers’ brains during movie watching’ (2008: 1). But you may ask: what is eye tracking?

But what is Eye Tracking?

Eye tracking enables us to empirically measure what viewers look at when watching screen-based media. The technology allows us to gather data from all platforms, interfaces and portals through which the moving image is distributed and consumed, including the television set, the cinema screen, the computer, and mobile devices such as smartphones and tablets. It also enables us to enter different types of environment to record viewing patterns, including the home and public spaces, such as the mall or the commute to work.  Analysis of viewers’ engagement with the moving image includes assessing where they look; interpreting why and how they look within determined visual fields or Areas of Interest (AOIs); and exploring what they feel or experience when they look. One can employ eye trackers to analyse viewer engagement with elements such as narrative, cinematography, editing, aesthetics, sound design and score, and characterization – elements that feature in many of the articles in this special edition. To do so, however, requires not only recognition and understanding of the languages employed in telling moving image stories, but also engagement with the science of the eye and the physiological and cardio-vascular transformations that take place when screen content is being viewed. To this end, a range of supportive investigative and methodological tools is also often employed, including the measurement of pupil dilation and the monitoring of heart and breathing rates.

Eye trackers work by shining infrared light onto the eye, which is then reflected back and captured by a sensor. The way we view images involves eye movements that alternate between fixations, the points at which the visual system gathers information, and saccades, the rapid movements that carry the eye from one fixation to the next. The sensor allows these eye movements to be tracked, and specialist software then visualises them in the form of heat maps, swarms and gaze plot graphs. Statistical data can be extracted from these visualisations, and an interpretative framework can also be employed. For example, heat maps show the weighting of all the viewing that occurred in a given scene, and gaze plots show the location of the fixations as well as the sequence in which they were made. To draw conclusions from this data, an area of interest analysis can be performed in which the number of times viewers visited specific objects or areas can be computed. By then analysing the amount of processing time spent in these areas, researchers are able to consider things such as the number of return visits made, building a picture of what was concentrated on. As can be seen from the articles in this special edition, analysis of this data draws us into open, and sometimes competing, exchanges about what has been discovered and why.
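To make the area of interest analysis described above more concrete, the following is a minimal sketch in Python, not the procedure of any particular eye tracking package or of the studies in this edition. It assumes that fixations have already been detected and exported as screen coordinates with a duration, and that each AOI is a simple rectangle; the names Fixation, AOI and aoi_summary are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fixation:
    """One detected fixation (assumed export format: pixels and milliseconds)."""
    x: float
    y: float
    duration_ms: float


@dataclass(frozen=True)
class AOI:
    """A rectangular Area of Interest in screen coordinates (an assumption;
    real AOIs may be irregular or may move with the shot)."""
    name: str
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, fixation: Fixation) -> bool:
        return (self.left <= fixation.x <= self.right
                and self.top <= fixation.y <= self.bottom)


def aoi_summary(fixations, aois):
    """Count fixations, dwell time, visits and return visits for each AOI.

    A 'visit' starts whenever gaze enters an AOI from elsewhere; consecutive
    fixations inside the same AOI belong to the same visit.
    """
    summary = {a.name: {"fixations": 0, "dwell_ms": 0.0, "visits": 0}
               for a in aois}
    previous = None  # name of the AOI holding the previous fixation, if any
    for fixation in fixations:
        # First AOI that contains the fixation, or None if it falls outside all.
        current = next((a.name for a in aois if a.contains(fixation)), None)
        if current is not None:
            stats = summary[current]
            stats["fixations"] += 1
            stats["dwell_ms"] += fixation.duration_ms
            if current != previous:
                stats["visits"] += 1
        previous = current
    for stats in summary.values():
        stats["return_visits"] = max(stats["visits"] - 1, 0)
    return summary


# Invented illustrative data for one viewer: gaze goes to a face, moves to a
# sign, returns to the face, then leaves both areas.
fixations = [
    Fixation(100, 100, 200),
    Fixation(110, 105, 300),
    Fixation(500, 400, 150),
    Fixation(105, 98, 250),
    Fixation(900, 700, 100),
]
aois = [AOI("face", 50, 50, 200, 200), AOI("sign", 450, 350, 600, 450)]
for name, stats in aoi_summary(fixations, aois).items():
    print(name, stats)
```

In this invented example the face receives three fixations across two separate visits, so one return visit is recorded. Treating a visit as an unbroken run of fixations inside one area is only one possible definition, and researchers may prefer another.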

Double Dialogue

The articles in this edition are engaged in what we would like to define as a double dialogue. Each of the articles stands in its own right as discrete research, and yet they are also engaged in reflective and reflexive commentary. This dialoguing happens both within articles (see, for example, Redmond et al.) and also across articles (see, for example, Dyer and Pink, who draw upon the work of Batty et al.), to explore the possibilities and limitations of eye tracking research. The conversations that emerge enable the arts-science nexus to gather its power, since the different approaches to the text and their findings are foregrounded, drawn into syncretic union, or else are openly contested (see, for example, Brown and Smith’s engagement with each other’s work).

One can read this special edition, then, as the literal embodiment of the grounded work that takes place in a shared, respectful and mutually supportive interdisciplinary working environment. The virtues of the double dialogue approach to a special edition such as this are many, but most importantly one can see the value of the research on its own terms, and see how it has grown out of a dynamic research environment. We are able to witness directly how contributors have worked with and for each other, and how they are able to accommodate and enrich each other’s understandings of the texts under investigation. By seeing into things in this way, powerful research stories emerge.

The Stories of Seeing into Things

We have chosen to present the articles in this edition in a way that tells a research story, where conversations emerge and narrative arcs progress within and across the work presented. We have ordered them in a way that creates a narrative pattern; one can see ideas and themes introduced in one article picked up and developed in another. The story is also one that moves across screen media, from film to television and from features to serials. The special edition opens with a master shot of the field and closes with a tying up of the narrative threads that have been presented throughout the special edition. That is not to say, as previously noted, that each article does not stand as discrete research, but to recognise the beautiful truth of bringing overlapping and communicative research stories together like this.

The stories of Seeing into Things are also about the research environment that has been cultivated through the work of the Eye Tracking and the Moving Image Research group, and in the process of putting this special edition together. New international research relationships have been fashioned; and new friends have been made. We find in the inter-disciplinary stories of this special edition a range of content, styles and approaches in a deliberate attempt to engage readers (other researchers and practitioners) in recognising the power of crossing the research line.

Adrian G. Dyer (a vision scientist) and Sarah Pink (a visual anthropologist and ethnographer) open the edition with a critical, holistic overview of eye tracking research in relation to the screen. In Movement, Attention and Movies: the Possibilities and Limitations of Eye Tracking? Dyer and Pink suggest that film narrative and the conditions of viewing have a significant influence on gaze relations and subjectivity, but that there is yet limited work on the complexities and variables of such connections and alignments. Drawing upon their own research fields in vision science and anthropology and ethnography, Dyer and Pink demonstrate the value and importance of inter-disciplinary scholarship to understanding the poetics and politics of viewing the moving image. To make their observations they draw upon research carried out by Craig Batty, Claire Perkins and Jodi Sita, whose article naturally follows in this edition.

In How We Came To Eye Tracking Animation: A Cross-Disciplinary Approach to Researching the Moving Image, Batty, Perkins and Sita draw upon their pilot study of eye tracking a time-lapse montage sequence from the film Up (a study that also included Dyer). In their article, they outline how the ‘research journey’ of their project took shape, and they discuss how each of them came to the study from their individual disciplines: screenwriting, screen studies and neuroscience. They suggest that their own discipline backgrounds initially influenced and shaped both their research methodology and also the analysis of the research findings. However, they then point towards the layering of these approaches, as a way to fully discover how the montage scene under analysis can be best understood. This inter-disciplinary approach is fully taken up in the next article.

In Sound and Sight: An Exploratory Look at Saving Private Ryan through the Eye Tracking Lens, Jenny Robinson, Jane Stadler and Andrea Rassell place emphasis on the connection between looking and hearing; or, seeing and sounding. By focusing on sonic aesthetics that, arguably, direct viewer attention as much as any other film aesthetic, they use a sound-on, sound-off methodology to test their hypothesis. The resulting discussion will be as useful to film practitioners as it is to screen and eye tracking scholars.

The question of utility, or practice, is taken up in Jan Louis Kruger, Agnieszka Szarkowska and Izabela Krejtz’s, Subtitles on the Moving Image: An Overview of Eye Tracking Studies. They look towards new cognitive research horizons in the field of audiovisual translation (AVT). Seeing limitations and weaknesses in the current eye tracking research being conducted on subtitling, they argue that attention needs to be directed to the actual processing of verbal information. Drawing upon data gathered from numerous eye tracking studies, they contend that it demonstrates the way shot changes, language and subtitles impact upon cognitive processes, and how this has implications for subtitling and captioning.

Drawing on her doctoral research, Tessa Dwyer also explores subtitling but in relation to the BBC television series Sherlock. Dwyer’s fascinating article focuses upon its use of post-production (though scripted) free-floating text.  In From Subtitles to SMS: Eye-Tracking, Texting and Sherlock, Dwyer offers an in-depth analysis of viewer engagement with the show, exploring notions of reading vs. viewing, and attraction vs. distraction. Dwyer also draws upon ideas raised by Sean Redmond, Jodi Sita and Kim Vincs in their article, Our Sherlockian Eyes: the Surveillance of Vision.

In this article, Redmond, Sita and Vincs offer us a unique interior dialogue as they each read the eye tracking data gathered through their own discipline filters while also dialoguing with each other’s approaches. Each author sees the hands of direction, misdirection, movement, surveillance and relationality in the scene under analysis, with agreement that vision is never simply cognitive or anatomical but multi-modal and haptic. They employ eye tracking data in ways that recognise the phenomenological embedded in the viewing experience, and which can be ‘extracted’ from what are normally seen or interpreted as qualitative findings.

The final two articles in this special edition then engage in a different type of dialogue or debate. In William Brown’s Politicizing Eye-tracking Studies of Film, he draws upon the (short) history of eye tracking and the moving image research, and specifically the work of Tim J. Smith, to demonstrate its theoretical and applied limitations. While Brown sees great value in eye tracking research he draws our attention to its obviousness in terms of telling us what we may already know. Nonetheless, Brown also outlines where the research may or should go and supplies instructive illustrations to help us chart new courses and terrains.

In what is also a critical commentary on the articles contained in this special edition, Tim J. Smith responds to Brown’s article, pointing to what he sees as misconceptions. In Read, Watch, Listen: A Commentary on Eye Tracking and Moving Images, Smith reflects on his own ground-breaking work as he also summarises and problematizes the articles in this edition. Working from a position as a cognitive psychologist and from within a version of neo-formalist film criticism, Smith’s position on eye tracking is persuasive if corralled.

Combined, the articles in this special edition reflect on the past, present and future of eye tracking and the moving image research and include critiques of the very nature of research itself. The case study material that the articles draw from is predominately from mainstream film and television texts but these are explored through new vectors. Unlike Smith’s work, the authors extend their utilization of eye tracking data to consider the cultural, the ethnographic and anthropological, the ideological and the phenomenological, albeit within the house of film and television aesthetics and genre. We hope that other researchers will draw inspiration and insight from the studies undertaken in this edition.

Future Research Directions

As indicated, the Eye Tracking and the Moving Image Research group features practitioner-academics who are interested in how research can be both carried out and disseminated through creative practice. As two of the articles in this edition signal, there is interest and expertise in sound design and scoring, and in screenwriting. Both of these aspects, which speak to the broader gamut of film and television-making practices, have found natural positions within the research undertaken to date, and also feature in two forthcoming book chapters authored by the group’s members. What is of special interest to us in the near future is how we might use these practices to further develop research methods and research outputs. For example, rather than relying on pre-existing moving image texts, what if we were to make our own? How might we use specific practices – sound, screenwriting – in order to influence the eye tracking experiments that we conduct?

One idea is for the group to make one or more short films in order to test patterns of viewer engagement, where narrative and aesthetics are controlled by the researchers, thus becoming a creative practice research variable. Another idea is to analyse eye tracking data alongside aspects such as the score and the screenplay, in order to make original connections between the source text – intentionality – and its reception. We should also consider how the scientific data provided by the research – heat maps, gaze plots, etc. – might be used as the basis of a creative work in and of itself, such as an artwork or another moving image text. Andrea Rassell, Sean Redmond, Jodi Sita and Darrin Verhagen are currently engaged in public projection and installation projects that use the colours, spirals and vortexes of eye tracking data within thematised artworks.

The make-up of the group has also resulted in some interest in how viewers engage spatially and environmentally with moving image texts, posing questions such as: does the viewing environment alter gaze patterns? How might room set-up and screen size change where and for how long people look at a defined area of interest? In this way, the group might seek to add ethnographic methods to the studies that take place, allowing us to add another set of research variables that could produce interesting and original results. Depending on the context, this type of research would also be of use to the screen industry – distributors, cinema groups, screen manufacturers, interior designers, etc.

We are mindful, nonetheless, of where others might take eye tracking research. It is already being used in the commercial movie making industry, and one of the worries is that it will become a device to reduce production costs as filmmakers use the data to literally paint the screen by numbers. Film and television are artforms; they beautify the world and they enrich our lives. All members of the Eye Tracking and the Moving Image Research group want to employ eye tracking technology to get to know and understand this beauty, and to fully comprehend what the viewer sees, hears and feels when they watch Fred Astaire dance, Ryan Gosling seduce, or Sherlock deduce and detect.


Hasson, Uri, Landesman, Ohad, Knappmeyer, Barbara, Vallines, Ignacio, Rubin, Nava, and Heeger, David J. 2008. Neurocinematics: The neuroscience of film. Projections, 2(1), 1-26

Sobchack, Vivian. 2000. What My Fingers Knew: The Cinesthetic Subject, or Vision in the Flesh. Senses of Cinema, Issue 5, available at: (accessed 19th January, 2015).

Volume 24, 2014

Themed Issue: Intermediations

Edited by Kevin Fisher and Holly Randell-Moon


1. Editorial Introduction — Kevin Fisher and Holly Randell-Moon

2. Animating Ephemeral Surfaces: Transparency, Translucency and Disney’s World of Color  — Kirsten Moana Thompson

3. Vertical Framing: Authenticity and New Aesthetic Practice in Online Videos — Miriam Ross

4. Attached To My Devices: Across Individual, Collective and Panspectric Worlds — John Farnsworth

5. The Ecstatic Gestalt in Werner Herzog’s Cave of Forgotten Dreams — Kevin Fisher

6. Intermediality and Interventions: Applying Intermediality Frameworks to Reality Television and Microblogs — Rosemary Overell

7. ‘God Hates Fangs’: Gay Rights As Transmedia Story in True Blood — Holly Randell-Moon

8. We are the Borg (in a good way): Mapping The Development Of New Kinds Of Being And Knowing Through Inter- and Trans-Mediality — Anne Cranny Francis

We are the Borg (in a good way): Mapping The Development Of New Kinds Of Being And Knowing Through Inter- and Trans-Mediality — Anne Cranny Francis

Abstract: Digital technologies have enabled new ways of communicating and relating to others and this has fundamental consequences for being and for meaning. In this paper I map the development of concepts of intermediality and transmediality that are used to describe textual practice and audience engagement in order to explore these changes to communication practice. At the same time I explore the new kinds of audience engagement enabled by this technology, which includes active participation in the reconstruction of older narratives in new media and the potential this affords for new meanings. It also includes the dissemination of stories, old and new, across multiple platforms by both makers and audiences, who themselves become makers, and the proliferation of stories and meanings this enables. Finally I consider the possibilities for co-creation – my hardware, your software (or vice-versa) – which can enable new forms of sharing and mutual knowledge-formation.

Sherlock (BBC, 2010– )

1. On thinking about inter- and trans-

The research for this paper led me through a range of ideas and arguments about the meanings of intermediation and transmediation, as well as their relationship to intertextuality (for example, Bakhtin 1984; Jenkins 2006; Herzogenrath 2012; Stein and Busse 2012; Phillips 2012). It led me to think about a multiplicity of texts that are all inter in some way—either intertextually related texts and the kinds of meanings they make or intermediated narratives that tell their story across a range of media and platforms—and about texts, producers and audiences that are most definitely trans—deploying a range of media and platforms to create a composite and complex world, engage with that world, and generate new meanings. This textual multiplicity in the contemporary media environment in turn raised questions about what has caused or generated these differing ways of telling a story and what is the significance of these different modes of story-telling: whether this reflects simply a change in technology (if that is ever truly simple) or if that change has consequences that move far beyond the material technologies involved—the material artefacts and related communication practices—to our ways of thinking and of being in the world.

My argument is that digital technologies have enabled new ways of communicating and relating to others and that this has fundamental consequences for being and for meaning. Further, we are only just starting to realise the possibilities and potential offered by this technology for new forms of relationship, knowledge creation and sharing. I work through these possibilities by reference to a range of texts that were suggested by my research and which recur in discussions of these new modes of story-telling and text production. My interest is not only in digital texts themselves, but also in the new forms of engagement they offer to readers, viewers and listeners to become active producers or makers of meaning alongside the creators of the work. This engagement includes our participation in the reconstruction of older narratives in new media and the potential this affords for new meanings; the dissemination of stories, old and new, across multiple platforms by both makers and audiences, who themselves become makers, and the proliferation of stories and meanings this enables; and finally the possibilities for co-creation—my hardware, your software (or vice-versa)—which can enable new forms of sharing and mutual knowledge-formation.

This exploration of shared storytelling and textual production occurs through my engagement with the theory used by media and cultural analysts to understand transformations in creativity, knowledge-formation and being. This work includes the concepts of intermediation, which explores the possibilities opened up by new media and focuses on the textual practices that enable new forms of audience engagement, and transmediation, which also explores the effect of new technologies on meaning-making but shifts its focus from textual practice to audience response. This is a subtle shift as both concepts essentially study the same phenomena (including both textual practice and audience responses), but it mirrors what Henry Jenkins called the development of ‘convergence culture’: “the flow of content across multiple media platforms, the cooperation between multiple media industries, and the migratory behavior of media audiences who will go almost anywhere in search of the kinds of entertainment experiences they want” (2006, 2). As I will go on to argue, this convergence, this sharing of and linking via new media technologies, has the potential to transform our experience of the world and, along with that, our formation of knowledge and fundamental understandings of being.

2. The Consulting Detective and The Doctor

My first thought when beginning this paper was to use the BBC (British Broadcasting Corporation) version of Sherlock (2010-) as my example of intermediation. One of the things that attracted me to this text was that it re-tells Sir Arthur Conan Doyle’s original stories in such a fresh and engaging way, not only through the revised characterisations of its principals (Holmes, Watson, Lestrade, Moriarty, Mycroft) and the rapid editing and visual layering of the mise-en-scène that creates 21st century London as the technological and social successor to Conan Doyle’s 19th century industrial London, but also by the re-framing of familiar narratives to make them directly relevant to contemporary British society. For example, The Hound of the Baskervilles (1901) is re-written by Mark Gatiss as “The Hounds of Baskerville” (2012), a story about experiments with nerve agents and genetic mutation at a United Kingdom military base. The story centres on a local man, Henry Knight, who, as a child, saw his father torn apart by a giant hound on Dartmoor, near the Baskerville military establishment. Fear of the hound is produced not, as in the original story, by phosphorescence painted onto a large dog (though the local innkeepers have a large dog that they used to spread the ‘giant hound’ story to tourists), but by a hallucinogenic drug that is released into the air by nerve pads buried in a certain part of the nearby moors. We eventually discover that Knight’s father was killed accidentally when he wandered into the test area for these nerve pads. Under the influence of the air-borne toxin, the elder Knight tripped and hit his head on a rock while attempting to run away from Baskerville scientist Robert Frankland, who was wearing a gas mask and so appeared monstrous. The young Henry Knight witnessed his father’s accidental death but, under the influence of the nerve toxin, transformed the memory into the story of the giant hound, suggested to him by the initials H.O.U.N.D. on Frankland’s jumper.

Gatiss’s story uses elements of Conan Doyle’s original but reworks them into a contemporary story about the development of chemical and biological weapons and their production within an environment of secrecy that puts citizens’ lives at risk. The main characters (Sherlock Holmes [Benedict Cumberbatch], Dr Watson [Martin Freeman], Mycroft Holmes [Mark Gatiss] and Inspector Lestrade [Rupert Graves]) are also developed further in this story, including exploration of Sherlock’s ambiguous sexuality and his relationship with Watson, which is mapped explicitly onto the gay relationship of the local innkeepers. It is an engaging tale for the Conan Doyle enthusiast as it preserves the central motif of the narrative—the ghostly hound—but finds a way of re-presenting it that changes the story from one about evil aristocrats (the original Baskerville and his ruthless treatment of the local peasants) and modern greed (a villainous descendent of the original attempting to kill the successor to the title so that he inherits the family fortune) to one about weapons of mass destruction and government secrecy. It also presents a different ‘take’ on the sexuality of Holmes (also explored in the recent films directed by Guy Ritchie and starring Robert Downey Jr. as Holmes and Jude Law as Watson [2009, 2011]), opening up the possibility that he is either gay or bisexual whereas Conan Doyle presents Holmes as relatively asexual.[1] This re-working of the story and its characters constitutes the text as more than a period adaptation of Conan Doyle’s story, set in the late Victorian period with Holmes and Watson inhabiting the world of brougham cabs and steam trains. So is this an example of intertextuality or intermediality, with the literary creation of Conan Doyle cast as another text or medium that incorporates audience engagement with the story?

Perhaps the most obvious answer here is that this re-casting of the Holmes story is an example of intermediality, defined in an early essay by Dick Higgins as generated by “the desire to fuse two or more existing media” (1966). Berndt Herzogenrath notes, however, that Higgins saw intermediality not as the final text but as “‘the uncharted land that lies between’ … different media” (2012, loc. 129-142).[2] The intermediality generated by the Sherlock re-visioning of The Hound of the Baskervilles enables the presentation of different meanings (about weapons production and secrecy) while maintaining the bones of the original narrative (about the abuse of power and the production of fear). Herzogenrath notes that in Image-Music-Text (1977) Roland Barthes related intermediality to interdisciplinarity, which occurs:

… when the solidarity of the old disciplines breaks down—perhaps even violently, via the jolts of fashion—in the interests of a new object and a new language neither of which has a place in the field of the sciences that were to be brought peacefully together, this unease in classification being precisely the point from which it is possible to diagnose a certain mutation. (loc. 129)

This disciplinary transformation might seem a heavy burden to place on Sherlock; however, it is certainly the case that this production of The Hound of the Baskervilles in a different medium tells different stories and interrogates different aspects of everyday life (military activity, government control, sexual identity) from Conan Doyle’s original. Moreover, as discussed further below, Mark Gatiss’s revision of The Hound of the Baskervilles might be seen as Bakhtin’s heteroglossia in practice, with Gatiss’s story constituting another voice/telling that reiterates some original narrative elements whilst adding some and transforming others.

Jeremy Brett as/in Sherlock Holmes (1984-94)

From a contemporary perspective the transfer from literary text to television may not seem a case of disciplinary violence; some time ago, however, it did. When television was younger and literature was a canonical art form, the production of a literary work as a television program led inevitably to discussions of what was ‘lost’ by the transfer to such an ‘impoverished’ medium. It is only far more recently that we have understood that an intermediated work is offering something new and different, unconstrained by the disciplinary shackles of the past. This realisation enables Sherlock to be written as a contemporary series, while retaining characteristics of its Victorian predecessor—as distinct, for example, from the older BBC series, The Adventures of Sherlock Holmes starring Jeremy Brett (1984-1994) that retained the Victorian setting for the stories. This successful relocation of the narrative for Sherlock depends on viewers being able to read across media platforms without the disciplinary blinkers of an earlier time; they no longer consider the narrative confined to a particular space/time as defined by the originary text. Instead, as regular consumers of postmodern pastiche, they adjust their reading practice for the complex network of intertextual references and narrative transpositions that constitutes this contemporary Sherlock.

This is more than simply a change in forms of entertainment or the emergence of new technologies. This radical unhooking of the narrative from its original space/time and the ability to read the stories for a different age, with different values and different concerns, is characteristic of the specificity and locatedness (sometimes read as relativism) of postmodernity. The postmodern producer appreciates the origin of textual forms and practices and is able to re-mediate them in order to make new meanings for a new time. Similarly, the postmodern consumer is able to appreciate the multiplicity of (textual) voices that constitute their world, and is not constrained to one major or canonical form of textual address as the bearer of cultural value. This is a reflexive consumer who maps networks of meaning extending beyond the confines of a specific text and its world; the viewer of The Matrix (The Wachowski Brothers, 1999) who knows to ‘follow the white rabbit’ to a looking-glass world that is our own world, and yet is not.

One of the means by which this reflexive writing and viewing practice has been understood is through the concept of intertextuality—used to describe the practice of referencing from one text to another via a character, icon, event or interaction, along with the meanings associated with that reference. Based on the work of Mikhail Bakhtin who saw every text as the premise for and related to every other text, via the heteroglossia (different voices) that constitute(s) our world, intertextuality is a way of mapping the complexity of communication practices and the meanings they convey, along with the impossibility of exerting total control over the meanings associated with a particular utterance (1984, 278). Intertextuality is about meaning and its constant deferral (in Derrida’s terms) not just the appearance of story elements in different texts. So intermediality acknowledges the use of different media or platforms to convey a specific narrative while intertextuality is a way of exploring the meanings constructed.

One way of mapping the possible meanings generated by viewer engagement with (intermediated) texts—including their constant deferral of meaning—is through the notion of genre, since this is the way that we typically classify texts in order to render them accessible. In a sense genre imposes order on the chaotic heteroglossia of our world so that it does not become an incomprehensible Babel in which each individual is isolated by a wholly idiosyncratic reading/viewing/meaning-making practice. Not only does genre identify the conventions or characteristics shared by the texts that we recognise as similar and so enable us to trace their history, it also identifies the kinds of issues commonly addressed by those texts. Science fiction, for example, commonly addresses the relationship between human beings and their technology, how technology influences our lives and even the fundamental nature of human being. This is evident in science fiction works such as Blade Runner (Ridley Scott, 1982) and The Matrix (1999), both of which explore how we deploy technology and what this tells us about ourselves. And this exploration of identity and technology has its roots in what is commonly regarded as the first science fiction text, Mary Shelley’s Frankenstein ([1818] 1982), written at the height of the first Industrial Revolution in western societies, when steam power had transformed work practices and social relationships, obliterating older forms of labour and the classes who performed it and reconstructing society into new classes. This industrial context may not be explicit in every reference to Frankenstein but it echoes through portrayals of the angry, sad and abandoned creature and his deluded creator, who become the robots/androids of today and us, their sometimes deluded or unaware creators and users.

Sherlock and Moriarty

One of the striking features of Sherlock is its stylistic similarity to Doctor Who, generated by the visual aesthetic, costuming, editing, and the enigmatic and manic main character, Sherlock/The Doctor and his mirror self, Moriarty/The Master. This might seem unsurprising given that the same creative team is responsible for both programs; writers Steven Moffat and Mark Gatiss devised the idea for Sherlock on the train to Wales to work on Doctor Who (which is produced in Cardiff). However, that fact does not explain the resulting program and its success. A generic analysis of the two series is suggestive, showing that both science fiction (Doctor Who) and detective fiction (Sherlock) have their story-telling roots in Gothic fiction, which was preoccupied with questions about being, the nature of the real, the nature of good and evil, and the dual (good/evil) nature of humanity. In science fiction those concerns are directed to an exploration of our relationship with new technologies, as discussed above.

The Doctor and The Master

Detective fiction focuses on the nature of knowing, personified in the detective, beginning with Edgar Allan Poe’s brilliant investigator, C. Auguste Dupin in stories the author described as “tales of ratiocination” (2010). Dupin employs a version of the scientific method (involving observation and analysis) leavened with imagination, which enables him to look beyond the obvious. Conan Doyle’s Sherlock Holmes is even more scientific in his practice, but with the same disdain for conventional ways of thinking. This deployment of scientific method in order to solve social (rather than scientific) problems focuses attention on the process of knowledge formation (how we know and understand our world and each other) and its role in our understanding of morality (whether good and evil are easily identified) and of being (whether human beings are simply good or evil). The contemporary BBC Sherlock continues this tradition of the scientific detective informed by an eccentric imagination that enables him to step outside conventionalised patterns of thought and assumption.

Intertextually, Doctor Who and Sherlock share the Gothic preoccupation with interrogating the nature of being and of knowledge, which is evident in some shared generic conventions and preoccupations, though each also has other specific interests—technology (science fiction), the social construction of good and evil (detective fiction). The value of intertextuality is that it enables us to see how these texts are constituted by the kinds of meanings they are making. It allows us to understand why two genres that we now consider quite different can have shared ontological and epistemological preoccupations, because of a common generic ancestor.

Like intertextuality, intermediality is about textual practice. We saw above that the interdisciplinarity that was generated by the postmodern recognition of diversity and difference (and hence the rejection of certainty, grand narratives and canonical textuality) enabled the production of a Sherlock that is not a period drama but a contemporary construct, telling stories of today’s world. At the same time, as the brief intertextual study of genre shows, it also deploys a conventional detective with an eccentric mix of scientific method and artistic creativity whose ‘ratiocination’ at times leads him to find villainy not in evil individuals, but in the government and its representatives. Intermediality is useful for mapping that kind of practice, where a narrative devised in one medium is transposed into another where it deploys meanings enabled by its original production, but also produces new and different meanings that are generated via this transposition.

3. Spirituality and stained glass

The stained glass windows in Christian churches deploy a similar practice, taking stories from one medium (the Biblical word of God) and realising them in another medium (coloured glass). Interestingly the windows feature a complex iconography that would appeal to the modern gamer, with icons emblematic of values and ideas that cluster around the central theme and its story arc but open up depths of spiritual meaning. One reading of these windows is that they told these stories for illiterate peasants who had no access to written versions of biblical tales. Roger Homan notes: “The great transept window at Canterbury known as the Biblia Pauperium (poor person’s bible), for example, depends upon an extensive visual vocabulary of symbols and an awareness of the supposed theological links between the biblical scenes featured in adjacent panels” (2005). In this way the windows acted as a point of meditation for the viewer, recalling the story and its religious significance. Homan notes also that many scholars believe that preachers used the windows as a reference point in sermons, especially those delivered in the vernacular of the uneducated. They could literally point to the visual representation of the story and explain their exegesis, so that later viewings of the window would recall not only the details of the story but its religious significance.

In his study, Religious Art in France XIII Century (1913) Émile Mâle begins by noting:

To the Middle Ages art was didactic. All that it was necessary that men should know—the history of the world from the creation, the dogmas of religion, the examples of the saints, the hierarchy of the virtues, the range of the sciences, arts and crafts—all these were taught them by the windows of the church or by the statues in the porch. (vii)

Mâle goes on to explain that this art is not easily decipherable to the modern viewer who may mistake elements of the works as purely figurative, bringing a momentary pleasure to the eye. By contrast: “In mediæval art every form clothes a thought; one could say that thought works within the material and animates it” (viii). Roger Homan adds to this an appreciation of the role of the material used in the art-work:

But there are properties of coloured glass that are of deeply spiritual significance and have been recognized by, for example, Pseudo-Dionysius in the first century and Bishop Grosseteste in the thirteenth. We view not an image but the light beyond which it mediates for us. The image owes its life to that ultimate light. This sense is much keener than it is in respect of the reflection of light upon opaque surfaces. The stained glass image is therefore like an ikon: we are not to look at it but through it. (2005)

If we regard the stained glass window as an intermediated presentation of religious and spiritual concepts and stories, then Homan’s analysis leads us directly to the point of intermediation—the light generated by the glass, which is as critical to the meanings of the windows as the images and icons created. Homan speaks of the role of the stained glass as being “to sedate light”: “A stained glass window slows us down; it inclines us to proceed reverently and lower our voices” (2005). The sensory effect of the coloured light produced by the windows is to remove viewers from the everyday world, locating them in an otherworldly space in which to contemplate religious mysteries and spiritual truths. This is surely the essence of the intermedial experience, not a translation from one art form to another, but a transformation of being and knowing generated by the (sensory) engagement of the viewer. Again note that although intermediality does address the effect on viewers of a particular form of text, its focus is on textual practice rather than audience interaction. Which is to say, the concept of intermediality tends to address primarily the ways in which the text positions the viewer, rather than the multiple active engagements of viewers.

4. Boba Fett, children’s television and transmediality

The term that seems to best capture the active engagement of audiences or consumers of contemporary texts is transmediality. Henry Jenkins popularised this term in his influential study, Convergence Culture: Where Old and New Media Collide, first published in 2006. Writing about the Matrix phenomenon that had recently developed through the Wachowskis’ interrelated films, games and online comics, Jenkins identifies the work as transmedia storytelling as follows:

A transmedia story unfolds across multiple media platforms, with each new text making a distinct and valuable contribution to the whole. In the ideal form of transmedia storytelling, each medium does what it does best—so that a story might be introduced in a film, expanded through television, novels, and comics; its world might be explored through game play or experienced as an amusement park attraction. Each franchise entry needs to be self-contained so you don’t need to have seen the film to enjoy the game, and vice-versa. (loc. 1974)

This directly confronts older canonical notions of the text as a bounded entity, with the roles of the reader, viewer or listener being to unlock the meaning of that text. Instead it acknowledges the active role of the consumer (who moves between these different media) in creating story and generating meaning that is implicit in the notion of intertextuality. However, this is a different consumer from the medieval worshipper, and the key to that difference is the accessibility of a range of media.

Some thirty years ago, as a creative consultant to a network television producer of children’s programming, my job was to construct the world of a particular television program. Like Lucas’s enormously influential Star Wars series it was set in a different space—a set of planets orbiting a small star, each with their own names and characteristics. I no longer remember the details of the exercise but the project report was about forty pages long, and detailed everything a child might want to know about living on that planet. The aim of the exercise was to create a world that all the separate sequences of the program—games, stories, cartoons, write-in quizzes, the club—could refer back to, so that the show maintained a basic coherence. We wanted our viewers to feel at home in that universe, to feel a sense of engagement and belonging.

Lucasfilm led the way with this kind of world-formation by marketing a series of products that not only capitalised on viewers’ responses to the films, but also provided them with the tools to repeat and enhance that experience imaginatively. And, as Jenkins noted in Convergence Culture, Lucas did not simply endlessly repeat the story of the movie: “When Star Wars went to games, those games didn’t just enact film events; they showed what life would be like for a Jedi trainee or bounty hunter” (2006, loc. 2172). Later in the same chapter Jenkins notes that Lucas found that the value of developing toys based on secondary characters was that they might take on a life of their own: “Boba Fett eventually became the protagonist of his own novels and games and played a much larger role in the later films” (loc. 2273).

Again we might argue that this has happened before, with stories based on earlier texts that expand their imaginary world, including some based on Conan Doyle’s Sherlock Holmes stories: for example, Nicholas Meyer’s novel The Seven-Per-Cent Solution ([1974] 1993) presents a back-story to Holmes’ addiction to cocaine (the novel was made into a film of the same name in 1976). What is new, however, is both the number of different media to which consumers have access and the degree to which they can engage with those media. Jenkins quotes Janet Murray’s assessment of the “‘encyclopedic capacity’ of digital media, which she thinks will lead to new narrative forms as audiences seek information beyond the limits of the individual story” (2006, loc. 2283). Jenkins goes on to argue that, unlike some critics, he does not see this as leading to the death of narrative: “Rather, we are seeing the emergence of new story structures, which create complexity by expanding the range of narrative possibility rather than pursuing a single path with a beginning, middle, and end” (loc. 2323). Of course, it is crucial to know who is developing these new stories and how they relate to the original text.

If we use the example of the Matrix franchise, the whole massive narrative edifice stayed effectively in the control of the Wachowskis. For some viewers it was too complex to try to follow its development and they found the films increasingly difficult to understand, whilst the more dedicated fans were unhappy with the Wachowskis’ attempts to explain every aspect of their narrative, as Jenkins documents (2006, loc. 2436-2446). A fine line exists between the authorial control required to maintain the integrity of the narrative and the dictation of detail that closes down the engagement of the audience. Andrea Phillips discusses this in her practical introduction, A Creator’s Guide to Transmedia Storytelling (2012). She argues “the most effective tool is to actually create a small piece of your world and give it to your audience to play with” (41).

Phillips’ description of transmediality is subtly different from that of Jenkins, perhaps because of their different roles (Jenkins as critic and theorist, Phillips as maker). In her role as storyteller Phillips is concerned not to shut out the audience, so describes her world-building in a way that prioritises audience engagement. In Chapter 8, “Writing for Transmedia Is Different” Phillips notes that “we’ll be concentrating mainly on the requirements of telling a single, highly fragmented story across multiple platforms, and most particularly across digital platforms—you might call it social media storytelling as much as transmedia. That’s because this is where the methods of traditional single-platform or flat narratives become inadequate” (74-75). She goes on to explain this distinction in terms of the strategies used to enable the world of the narrative to be expanded by the audience: “Transmedia storytelling is an exercise in open-ended storytelling, boundless where a traditional single-medium story is finite” (75). Phillips explains that the storyteller should suggest to the audience that the world of the narrative includes more stories than the one that they have been given (75).

As noted earlier, one of the great successes of Star Wars is that its narrative is not confined to a specific set of incidents; rather, the narrative contains the seeds of many other stories, featuring characters such as Boba Fett whose role in the core narrative is relatively minor but has the potential for new storytelling and world-building. By contrast, die-hard Matrix fans were disappointed when the Wachowskis attempted to lock down the meanings of the trilogy to a specific story by resolving the mystery, leaving little scope for imaginative retellings by fans. Instead Phillips notes the value of deliberately leaving loose ends that might become the source of new stories, which directly contradicts conventional advice given to writers. She also notes, however, that these narrative possibilities have to be executed judiciously so that you do not “accidentally create narrative expectations that never achieve any kind of payoff” (76). Hence her earlier point about the importance of a clear story arc: “It is especially important in transmedia to have a plot that goes from beginning to end before you launch” (57). Another strategy to enhance narrative openness is “to create story elements in one medium that have their payoffs in another medium” (78), such as a game based on a film. All of this has to be achieved in relation to the basic premise with which she opens the study: “every single element of a transmedia story has to be fulfilling a narrative purpose, without exception” (40-41). And, as she notes, the aim of transmedia storytelling, and of the marketers who use it, is engagement: “Transmedia storytelling can provide more engagement and more potential points of sale for any given story, and when it’s done well, each piece can effectively become a promotional tool pointing toward every other piece of the whole” (39). Every strategy used by the storyteller, therefore, should be about giving the audience “things to do, not just things to consume” (117).

Phillips’ Guide addresses textual practice directly in relation to audience or consumer engagement, though Phillips also stresses the need for a critical understanding of textuality (63). This engagement is both the reason for transmedia production (to sell products, to tell a story) and the result of audience access to multiple media. As Phillips reiterates in her book, this engagement, and the textual openness that enables it, makes transmedia storytelling different from earlier forms of media narratives and audience-media relationships.

The Matrix (1999)

5. The joy of discovery and the fossilised dolphin

I return here to Jenkins’ crucial insight in Convergence Culture, that this different form of storytelling, described so well by Phillips, and common to the popular culture that preoccupies most children, signifies a new way of being and knowing:

Our workplaces have become more collaborative; our political process has become more decentered; we are living more and more within knowledge cultures based on collective intelligence. Our schools are not teaching what it means to live and work in such knowledge communities but popular culture may be doing so. (2006, loc. 2477)

For Jenkins this makes literacy training for children essential so that they can “develop the skills needed to become full participants in their culture” (loc. 5295), as Phillips argued when she stressed the need to be critical. The joy of transmedia engagement is that of discovery, of finding a way to contribute to the meanings of a text through your own creativity so that your stories are woven into that ever-expanding composite text. As Jenkins notes, however, this is more than a solitary venture. It is about being able to collaborate with others and to contribute to a collective venture without feeling a loss of individual achievement.

Digital technology has enabled this kind of sharing on an extraordinary scale—whether through kids playing games online with others across the globe, researchers collaborating on a project across cities, countries or continents or fans world-wide expanding a beloved narrative. It is also evident in the ways that older media such as radio and television use online resources to expand their research, engage their audiences, and incorporate audience responses and knowledge into their broadcast formats. Museums and libraries too are sharing resources and inviting visitors to become part of the knowledge-production for the institution, for example by checking the digitisation of older manuscripts and newspapers for verisimilitude. On the one hand, this reflects economic necessity and the poor resourcing of many public institutions. On the other hand, it creates a wholly different, expanded knowledge base for the library and an enhanced level of engagement for visitors. Effectively, this visitor/user involvement changes the nature of the library from that of a central authority giving access to knowledge to a collaborative, creative, knowledge-building project. In December 2013 the British Library released an archive of over 1,000,000 images onto Flickr Commons for free use and reproduction. Dan Colman reported in Open Culture (2013):

The librarians behind the project freely admit that they don’t exactly have a great handle on the images in the collection. They know what books the images come from. (For example, the image above comes from Historia de las Indias de Nueva-España y islas de Tierra Firme, 1867.) But they don’t know much about the particulars of each visual. And so they’re turning to crowdsourcing for answers. In fairly short order, the Library plans to release tools that will let willing participants gather information and deepen our understanding of everything in the Flickr Commons collection.

Many other libraries and art galleries around the world have released part of their archives to open access and at the same time invite visitors to join them in becoming producers of knowledge.

Recently the Smithsonian Museum in Washington D.C. announced Smithsonian X 3D, a web portal that enables visitors to use the museum’s 3D scans of artefacts to build their own models using 3D printers. Günter Waibel, Director of the Digitization Program Office, explains:

These projects indicate that this new technology has the potential not only to support the Smithsonian mission, but to transform museum core functions. Researchers working in the field may not come back with specimens, but with 3D data documenting a site or a find. Curators and educators can use 3D data as the scaffolding to tell stories or send students on a quest of discovery. Conservators can benchmark today’s condition state of a collection item against a past state—a deviation analysis of 3D data will tell them exactly what changes have occurred. All of these uses cases are accessible through the Beta Smithsonian X 3D Explorer, as well as videos documenting the project. For many of the 3D models, raw data can be downloaded to support further inquiry and 3D printing.

And he concludes:

With only 1% of collections on display in Smithsonian museum galleries, digitization affords the opportunity to bring the remaining 99% of the collection into the virtual light. All of these digital assets become the infrastructure which will allow not just the Smithsonian, but the world at large to tell new stories about the familiar, as well as the unfamiliar, treasures in these collections.

This venture confirms many of Jenkins’ earlier predictions about how digital technologies will change our ways of producing knowledge. One of the artefacts currently available is the fossilised skull of an unknown species of dolphin, found in rocks that are 6-7 million years old. The Smithsonian X 3D website now supplies the software and instructions to print your own 3D copy of the skull. Even though this will not be the original skull, the value of a tactile engagement with the reproduction should not be underestimated. As a number of recent studies have argued (see Classen 2005, 2012; Howes 2005; Chatterjee 2008; Candlin 2010; Cranny-Francis 2013) tactile contact, indeed all kinds of sensory engagement, generate bodily responses that in turn produce new ways of knowing and understanding an object and our relationship to it. By sharing these knowledges, we learn more about not only the objects, but also ourselves.

6. Conclusion

The terms intertextuality, intermediality and transmediality map the development of new communication technologies through the twentieth and into the twenty-first century. They all effectively interrogate older canonical notions of textuality and of reading, as closed practices controlled by the author. Intertextuality was used to argue that texts have never been closed but part of an infinite conversation to which all texts contribute, and that each textual reading adds another voice to the conversation. Intermediality reflected the beginnings of popular access to multiple media, enabling users to explore the ways in which a particular narrative or text may be transposed from one medium to another, expanding or enhancing the original story or idea. Transmediality is an articulation of convergence culture, whereby audiences are able easily to traverse and correlate a range of media in order to explore a complex and growing narrative or argument. The difference between intermediality and transmediality is not simply quantitative, however; it reflects a new way of understanding our relationship to texts, knowledge, and each other. It reflects, as Jenkins notes, the development of a collective knowledge culture in which collaboration is a key component of thinking and being. Further, the materials and practices that new technologies are making available, which incorporate bodily knowledges into this collaborative production of knowledge, presage new kinds of understanding and self-knowledge. As both Jenkins and Phillips argue above, the element required to leaven this heady mix is critical awareness—of the texts we produce and the meanings we make.



Bakhtin, Mikhail. 1984. The Dialogic Imagination: Four Essays. Translated by Michael Holquist and Caryl Emerson. Austin, TX: University of Texas Press.

Barthes, Roland. 1977. Image, Music, Text. Essays selected and translated by Stephen Heath. London: Fontana.

Candlin, Fiona. 2010. Art, Museums and Touch. Manchester: Manchester University Press.

Chatterjee, Helen, ed. 2008. Touch in Museums: Policy and Practice in Object Handling. Oxford and New York: Berg.

Classen, Constance. 2012. The Deepest Sense: a Cultural History of Touch. Chicago: University of Illinois Press.

Classen, Constance, ed. 2005. The Book of Touch. Oxford: Berg.

Colman, Dan. 2013. “The British Library Puts 1,000,000 Images into the Public Domain, Making Them Free to Reuse & Remix.” Open Culture, December 1. Accessed January 23, 2014.

Conan Doyle, Sir Arthur. 1981. “The Hound of the Baskervilles.” In The Penguin Complete Sherlock Holmes, 669-768. Harmondsworth: Penguin.

Cranny-Francis, Anne. 2013. Technology and Touch: the Biopolitics of Emerging Technologies. London: Palgrave.

Herzogenrath, Berndt, ed. 2012. Travels in Intermedia(lity): reblurring the boundaries. Kindle edition. Hanover, NH: Dartmouth College Press.

Higgins, Dick. 1966. “Synaesthesia and Intersenses: Intermedia.” Originally published in Something Else Newsletter 1, no. 1 (Something Else Press). Accessed May 19, 2014.

Homan, Roger. 2005. “Who Looks on Glass? The Spiritual Significance of Stained Glass.” The Social Affairs Unit, August 3. Accessed January 23, 2014.

Homan, Roger. 2006. The Art of the Sublime: Principles of Christian Art and Architecture. Farnham: Ashgate.

Howes, David, ed. 2005. Empire of the Senses: the Sensual Culture Reader. Oxford: Berg.

Jenkins, Henry. 2006. Convergence Culture: Where Old and New Media Collide. Kindle edition. New York and London: New York University Press.

Lavigne, Carlen. 2012. “The Noble Bachelor and the Crooked Man: Subtext and Sexuality in the BBC’s Sherlock.” In Sherlock Holmes for the 21st Century: Essays on New Adaptations. Kindle edition, edited by Lynette Porter, 13-23. London: McFarland & Company.

Mâle, Émile. 1913. Religious Art in France XIII Century: A Study in Mediaeval Iconography and Its Sources of Inspiration. Kindle edition. London: Dent.

Meyer, Nicholas. (1974) 1993. The Seven-Per-Cent Solution. New York and London: W.W. Norton.

Phillips, Andrea. 2012. A Creator’s Guide to Transmedia Storytelling: How to Captivate and Engage Audiences Across Multiple Platforms. Kindle edition. New York: McGraw-Hill.

Poe, Edgar Allan. 2010. The Dupin Mysteries with The Gold Bug. London: Capuchin Classics.

Shelley, Mary. (1818) 1982. Frankenstein or The Modern Prometheus, edited by Maurice Hindle. Harmondsworth: Penguin.

Stein, Louisa Ellen, and Kristina Busse, eds. 2012. Sherlock and Transmedia Fandom: Essays on the BBC Series. Kindle edition. London: McFarland & Company.

Waibel, Günter. “About Smithsonian X 3D.” Smithsonian X 3D. Accessed January 23, 2014.



Cox, Michael. 1984-1994. The Adventures of Sherlock Holmes. London: BBC.

Doctor Who. 2005 -. Wales, UK: BBC; Canada: CBC.

Gatiss, Mark, and Moffat, Steven. 2010-. Sherlock. London: BBC.

Gatiss, Mark, and Moffat, Steven. “The Hounds of Baskerville.” Sherlock, series 2, episode 2. Original airdate 8 January 2012. London: BBC.

Lucas, George. 1977-2005. Star Wars, Episode I-VI. USA: Lucasfilm.

Ritchie, Guy. 2009. Sherlock Holmes. USA: Warner Bros.

Ritchie, Guy. 2011. Sherlock Holmes: A Game of Shadows. USA: Warner Bros.

Ross, Herbert. 1976. The Seven-Per-Cent Solution. USA: Herbert Ross Productions, Universal Pictures.

Scott, Ridley. 1982. Blade Runner. USA: Ladd Company, Shaw Bros, Warner Bros.

The Wachowski Brothers. 1999. The Matrix. USA: Warner Bros.



[1] Steven Moffat has been reported as saying that he sees Sherlock as asexual. However, the iconography used with Sherlock and the way in which his relationships with Watson and Moriarty (among others) are presented allow for the many fan readings of him as gay or bisexual—as Carlen Lavigne argues (2012).

[2] References to Kindle books are given as locations, unless the book also provides page numbers.

Bio: Anne Cranny-Francis is Professor of Cultural Studies at the University of Technology Sydney. Her recent work includes ARC funded projects on the sense of touch and its deployment by new technologies, described in Technology and Touch: the Biopolitics of Emerging Technologies (Palgrave, 2013), and on expatriate Australian writer, Jack Lindsay.


‘God Hates Fangs’: Gay Rights As Transmedia Story in True Blood — Holly Randell-Moon

Abstract: In this paper I examine the television program True Blood’s allusions to gay liberation in terms of the biopolitical and neoliberal implications of consuming civil rights as a transmedia story. In the program, vampires have ‘outed’ themselves to the population at large and in conjunction with the invention of synthetic blood (Tru Blood) are able to publicly participate in social and economic activities without harming humans. Home Box Office’s (HBO) use of Tru Blood to market the show is premised on the commodification of a (vampire) rights based movement across a range of different story-telling mediums. On the one hand, this means that the program is drawing attention to the biopolitical function of rights discourse by suggesting that it is the management of particular kinds of life, through particular kinds of consumption, which remains valuable to the dominant political and economic order. On the other hand, the mapping of vampirism onto civil rights also functions to legitimise a political discourse wherein the purported social ‘harm’ of granting minority groups equal rights can be mitigated by market forces and the cultivation of a constituency whose political power is linked to their ability to consume. The consumption of the True Blood story by fans thereby enacts principles of biopolitical management and containment of civil rights groups through HBO’s and fans’ willingness to enact play-political consumption and performance of rights in a transmediated public sphere.

The television series True Blood (HBO, 2008-2014), based on The Southern Vampire Mysteries novels by Charlaine Harris, features a number of allusions to gay liberation and lesbian, gay, bisexual, transgender and intersex (LGBTI) politics in its depiction of ‘vampire rights’. In the fictional town of Bon Temps, in Louisiana, United States, where True Blood is set, vampires have ‘outed’ themselves to the population at large and in conjunction with the invention of synthetic blood (Tru Blood) are able to publicly participate in social and economic activities without harming humans. The production of Tru Blood as a commodity enables individual and collective groups of vampires to advocate for the civil and political rights enjoyed by humans. In the vampires’ attempts to become part of ‘mainstream culture’, there are several references to gay liberation. These include the American Vampire League, whose activism and media interventions mirror those of groups such as the Human Rights Campaign, the use of the phrase ‘coming out of the coffin’ to describe the increasing numbers of vampires publicly acknowledging their existence to humans, and the prejudice directed at vampires by humans, particularly by those with conservative or evangelical Christian beliefs. This specific cultural, political and religious milieu for vampire rights is telegraphed in the opening title sequence by a brief shot of a church sign, which reads, “God Hates Fangs”. Amongst the ostensibly non-fictional images of Southern quotidian life—swamps, road kill, baptisms, church choirs, bar brawls—it is the only indication in the sequence of the program’s focus on the supernatural.

The diegetic plausibility of the vampire liberation movement is aided by various transmedia paraphernalia simultaneously operating outside of and in relation to events in the show’s narrative. This includes the availability of Tru Blood beverages and merchandise, Facebook and social media material for the advocacy groups featured within the show and partnerships between Home Box Office (HBO—the channel that broadcasts True Blood) and advertising companies, such as Geico insurance, to produce fictional campaigns targeted explicitly towards vampire consumers but implicitly, True Blood fans. In this extension of the program’s narrative of vampire rights to other types of media and forms of consumption, True Blood is exemplary of the new practices of transmedia storytelling championed by Henry Jenkins. He defines transmedia as

a process where integral elements of a fiction get dispersed systematically across multiple delivery channels for the purpose of creating a unified and coordinated entertainment experience. Ideally, each medium makes its own unique contribution to the unfolding of the story. (Jenkins 2011; original emphases)

For Jenkins, this type of storytelling enables and builds on audience participation in the meaning-making process of media texts (2006). This mode of storytelling is also closely associated with viral marketing, which utilises “pre-existing social networks like websites and YouTube in order to increase franchise or brand awareness” (Ndalianis 2012, 164). Transmedia forms of storytelling, like those employed for True Blood, can be quite complex and multi-faceted, involving the extension of a text across not only different types of media but also different geographical locations and consumer activities. In her excellent book, The Horror Sensorium (2012), Angela Ndalianis details transmedia stories and campaigns involving scavenger hunts, political rallies, social media tourism and urban graffiti that centre on the production of an embodied fan relationship with media texts. She argues that the transmedia stories deployed for texts such as The Dark Knight (Christopher Nolan, 2008), Lost (Walt Disney Studios Home Entertainment, 2004-2010) and True Blood “address the fiction/reality interplay by mitigating their stories more invasively into the social sphere” (165). They do this by encouraging fans and consumers to become ‘actors’ in a transmedia performance of a ‘living’ narrative (166). This performance produces a kind of meta-affect because fans “extract cerebral and sensory pleasure participating in and contributing to a highly crafted fictional world that’s in the process of unveiling itself” (169). An example of this type of meta-affective performance occurred in early 2009, in Auckland, New Zealand, when a series of wooden posters advertising True Blood were installed along public streets. Featuring information about True Blood’s airdate (the series was premiering on New Zealand television at this time), the posters had “In case of vampire” written across the top and “Snap here” at the bottom presented alongside flat wooden stakes. Potential fans and viewers of True Blood were invited to participate as performers in the program’s narrative by exercising vigilance and protection from the newly outed vampires by snapping off a wooden stake and carrying the physical textual detritus into their everyday lives.

What structures this kind of performance and participation by fans is the story and narrative used to extend a text via transmediation. In this paper I want to examine the execution of True Blood’s transmedia storytelling through a narrative of vampire rights that alludes to civil rights debates around gay liberation. I want to focus on the specifically transmedia dimensions of this narrative and how this particular media form interpellates viewers into a biopolitical and neoliberal mode of consuming civil rights. The program’s use of Tru Blood, both intra- and extra-textually, is premised on the commodification of a rights based movement across a range of different story-telling mediums. On the one hand, this means that the program is drawing attention to the biopolitical function of rights discourse by suggesting that it is the management of particular kinds of life, through particular kinds of consumption, that remains valuable to the dominant political and economic order. On the other hand, the mapping of vampirism onto civil rights also functions to legitimise a political discourse wherein the purported social ‘harm’ of granting minority groups equal rights can be mitigated by market forces and the cultivation of a constituency whose political power is linked to their ability to consume. Fans’ affective investment in vampire rights is then managed via consumption in a transmedia format that mirrors biopolitical strategies of management and containment of minority groups through civil rights discourse.

“No darlin’, we’re white, he’s dead”: Vampires and biopolitics

In her essay “Technologies of Monstrosity”, Judith Halberstam argues that “[a]ttempts to consume … vampirism within one interpretive model inevitably produce vampirism. They reproduce, in other words, the very model they claim to have discovered” (1993, 334). For this reason, in her analysis of Bram Stoker’s Dracula she argues that the central figure is “not simply a monster, but a technology of monstrosity” (334). Representations of monstrosity in texts like Dracula function not so much to reify particular characteristics of monstrosity (be it sexual immorality or corporeal difference) but to produce and disseminate particular discourses constituted as monstrous. So if we take a particular representation of vampires to signify for example, minority rights, we are also at the same time producing an understanding of what minority rights mean in popular and political culture.

Given that monstrosity is typically construed as a threat to human life, textual portrayals of monstrosity are also concerned with the management of that threat and the balancing of the value of human life with the containment of monstrosity. The development and application of various governmental strategies designed to foster the life and health of citizens is defined by Michel Foucault as biopower (1991b, 263). In order to maximise the economic productivity of the state, governments and state institutions have “to qualify, measure, appraise, and hierarchize … the living in the domain of value and utility” (1991b, 266). One way to organise social practices around ‘value and utility’ is to encourage citizens to invest in a racialised and heteronormative construction of the family as the site through which life can be fostered or neglected (1991a, 99). As the management of the economic and social life of the polity comes to pivot on heterosexual familial reproduction, non-heterosexual or non-normative sexualities can be positioned in biopolitical terms as threats to the ‘health’ and productive order of a society. In her essay “Tracking the Vampire” Sue-Ellen Case explains:

From the heterosexist perspective, the sexual practice that produced babies was associated with giving life, or practicing a life-giving sexuality, and the living was established as the category of the natural. Thus, the right to life was a slogan not only for the unborn, but for those whose sexual practices could produce them. In contrast, homosexual sex was mandated as sterile—an unlive practice that was consequently unnatural, or queer, and, as that which was unlive, without the right to life. Queer sexual practice, then, impels one out of the generational production of what has been called “life” and historically, and ultimately out of the category of the living. (1991, 4)

In a biopolitical paradigm, subjects deemed unable to contribute productively to the life of a society can be excluded from the rights and protections offered by that society. This exclusion is then overlain with a naturalising discourse, which works to justify the asymmetries of legal and social recognition as simply part of the ‘natural order of things’. This is why Case sees a link between the cultural discourses used to frame both vampirism and homosexuality. In a dominant heteronormative order that conflates a particular kind of social and political life with life itself, both vampirism and homosexuality become aligned with death or unlife.

The representation of the various kinds of harm vampire rights pose to humans in True Blood then seems an apposite metaphor for the biopolitical exclusion of LGBTI people from certain state-based rights. As a number of scholars have pointed out, True Blood’s treatment of vampires is characteristic of a wider shift in textual portrayals of vampires “from the right to exile … to the right to citizenship in the postcolonial United States” (Hudson 2013, 663). Bernard Beck sees “[t]he plain message of today’s vampire lore” as evidence “that we are becoming less fearful and hostile, more curious and sympathetic to those we insist on defining as strangers” (2011, 92). This narrative shift from exclusion to inclusion in representations of vampiric difference is reflective of a broader social and political consensus around managing minority groups through integration rather than expulsion from a neoliberal economic order. Deborah Mutch notes that the narrative framework for the acceptance of vampires in book series such as Twilight and The Southern Vampire Mysteries is premised on “accepting human definitions of nation and race which are then superceded by globalised trade” (2011, 75).

While the supernatural genre has the ability to, as Dale Hudson puts it, “decolonize our familiar habits of thinking”, particularly with respect to cinematic and televisual “political realism” (2013, 662), textual portrayals of supernatural creatures nevertheless tend to incorporate dominant biopolitical conceptions of human life as the normative narrative bedrock against which other kinds of lives or living is measured. Hudson points out that in True Blood, vampirism is constituted as species difference through reference to characters as ‘vampire Bill’, whereas human characters are not described as ‘white Jason’ or ‘black Tara’ within the diegesis of the show (666). Where vampirism is discursively positioned as bodily distinct from human-ness, the nation on which this embodiment is placed remains invisible. True Blood’s representation of First Nations peoples and their interaction with vampires (those old enough to have arrived in North America during colonisation) is limited enough to suggest an erasure of colonialism as significant to the historical formation of the United States. As Hudson notes, “Indigenous nations appear only in the realm of the supernatural in True Blood” (669). For Hudson, the program’s use of the supernatural allows an imagining of “the New South as a space inhabited by multiple species on multiple planes of reality” (664), which invites consideration of “the right to rights” (685). My interest in this paper is how True Blood’s portrayal of “the right to rights” is linked to the public management and presentation of rights-based groups via transmedia texts, which are dependent on public forms of consumption and fan activity.

“You are not our equals. We will eat you. After we eat your children”: Vampire rights

In True Blood’s narrative conflicts around vampire rights, there are several allusions to civil rights and equality movements. The series has been received predominantly as a commentary on gay liberation. A New York Post article, for example, contends that “the fictional vampires’ quest for the same rights and social acceptance enjoyed by” humans “has become synonymous with the very real fight for gay rights” (Shen 2009). The author of the novels on which the show is based also seems to encourage this association (see Solomon 2010). As with the gay rights movement, vampires’ attempts to achieve equality are perceived by their opponents as a threat to the social and cultural stability of the polity they inhabit. However, the crucial difference between vampires and LGBTI peoples is that the alleged ‘harm’ posed to society by granting the latter civil rights is symbolic and imagined whereas vampires, within the diegesis of the show, do perpetrate considerable violence. In this vein, a reviewer of the show opined, “[t]hese vamps are assholes, not oppressed minorities. They deserve to be hated. If these murderous, evil creatures are figures for gay people, then they are figures for the religious right’s worst nightmare of what gay people are” (Newitz 2008). The program’s creator, Alan Ball, also concurs with this reasoning “because the vampires on our show are, for the most part, vicious murderers and predators, and I’m gay myself, so I don’t really want to say, ‘Hey, gays and lesbians are basically viciously amoral murderers’” (Grigoriadis 2010).

The question of whether rights should be reserved only for those who are morally deserving is addressed in an interesting way by the American Vampire League (AVL) within the show. In the first episode (“Strange Love”, 1.1), the AVL spokesperson, Nan Flanagan (in an interview with Bill Maher) refutes assertions that vampires perpetrate large-scale murder and assault against humans (for lack of documented evidence) and counters that humans themselves are responsible for slavery and genocide. Later on in the series, another vampire, Russell Edgington, uses this same logic—humans have caused irreparable damage to the environment and the species they share it with—to reach a very different conclusion regarding vampire-human relations. For Edgington, vampires are right to insist on their superiority to and difference from humans. He broadcasts these views on a live news program and after deboning the anchor, proclaims to the human audience, “You are not our equals. We will eat you. After we eat your children” (“Everything is Broken”, 3.9). Human anti-vampire bigotry meanwhile stems from a corporeal vulnerability to vampires’ biological requirement for human blood. In its extreme form, anti-vampire prejudice manifests as a speciesist right to survival exercised by vigilante groups such as the one seen in Season Five. This group of men don Barack Obama masks as they inflict violence and in some cases, death, upon vampires and other supernatural beings. This group mentions and appears to be linked to the ‘Keep America Human’ movement, which has its own website and promotional material. This doubly imbricated right to ‘America’ and to life is framed by anti-vampire humans as exclusive. One of the vigilante characters complains, “it’s some sort of crime now being a regular old human” (“In the Beginning”, 5.7) as if the uniqueness of being human cannot be co-extensive with the existence of other species.

Vampire prejudice thus goes beyond the simple fear of death or bodily harm and involves a speciesist condemnation of vampire existence that is often inflected with a moral discourse. When the show begins, vampires have achieved a limited degree of civil equality such as the right to marry (in certain states in the US and if the unions are heterosexual) and are protected by anti-discrimination laws (businesses cannot refuse to serve vampires as customers), which are reluctantly enforced by police. There is also a series of moral and social codes, centred primarily on sexuality, that police vampire and human interactions. Humans who engage with or are thought to engage in sexual relations with vampires are derisively referred to as “fang-bangers”. The central character Sookie Stackhouse is often judged negatively in terms of her moral standing and character for her relationship with the vampire Bill Compton. The first season features a violent expression of this chauvinism in the form of a serial killer with a pathological hatred of women who sleep with vampires.

The corporeal vulnerability of humans to vampire attack is balanced by the portrayal of vampire blood as producing hallucinatory and amphetamine-like effects when consumed by humans. Vampire blood or V-juice is a highly sought-after but illegal commodity associated with the vampire bar scene and fang-bangers, which may allude to subcultural forms of clubbing and recreational drug use. In Season One, a lonely vampire named Eddie claims that he can only express and act on his homosexual orientation by trading his blood for sexual favours with human men (in particular Sookie’s co-worker and friend, Lafayette Reynolds). In an inversion of the life-giving connotations of heterosexual sex, one scene in the first season shows Sookie’s brother Jason and his girlfriend consume V-juice and make love whilst Eddie is tied up and tortured in the basement below them. Here it is an undead subject whose blood provides the impetus and facilitation of heterosexual sex.

The moral repugnance at the tarnishing of human life and sexuality brought about by vampire-human contact is aligned with most (although not all) forms of Christianity in True Blood. The second season features an evangelical group called the Fellowship of the Sun that promotes “pro-livin’ values” (Home Box Office 2012) and warns the human polity about the dangers of vampire rights and “the wing nuts on the left” who advocate for them (“The Fourth Man in the Fire”, 1.8). In a television interview, the pastor of the church, Reverend Steve Newlin, explains that vampire rights threaten “the rights of our sons and daughters to go to school without fear of molestation by a bloodthirsty predator in the playground or in the classroom” (“The Fourth Man in the Fire”, 1.8). One of the advertisements produced by the Fellowship of the Sun, not featured in the show but distributed online and in poster form in some cities, depicts a young blonde boy with the caption, “To them he’s just a midnight snack” (Ndalianis 2012, 178).

The figure of the child here is important; as Ben Davies and Jana Funke note, “the teleology of straight time is projected onto the sex act, which displaces its own meaning, significance or indeed non-significance for the production of the future” (2011, 6). In this way, the future viability of a heterosexual society is linked to the purity and protection of children. In a video press release for the advertising campaign, the elder Reverend Theodore Newlin passionately declares, “our children are our most precious resource, our lifeblood” (the video appears on YouTube under the category ‘Nonprofits & Activism’). On the Fellowship’s website, homosexuality is listed alongside vampirism as a social danger: “It’s nothing new for teenagers and young adults to flock to the newest trend, and it’s hardly uncommon for these fashion choices to be self-destructive, like smoking, drugs, tattoos or homosexuality. But the latest fad—a soulless eternity of drinking blood—can’t be undone with a laser treatment or rehab. Vampirism is forever” (Home Box Office 2012). While some organisations and US Republican presidential candidates view homosexuality as a choice or temporary lifestyle that can be cured or corrected, what makes vampirism especially pernicious for the Fellowship is that it cannot be erased or overcome: it’s “forever”. In another television interview, the younger Reverend Newlin says, “the vampires as a group have cheated death. And when death has no meaning, then life has no meaning. And when life has no meaning, it is very, very easy to kill” (“Nothing but the Blood”, 2.1).

Anti-vampire sentiment is not an opposition to the merits or otherwise of particular vampire rights; rather, the opposition stems from the consequence that these rights serve to entrench vampire presence in civil and social spaces. It is precisely because vampirism constitutes a permanent state of being that the necessity of repealing vampire rights takes on an apocalyptic sense of urgency. Such rhetoric alludes to and perhaps parodies anti-gay rights activism, particularly the National Organisation for Marriage’s (NOM) Proposition 8 “gathering storm” commercials which featured activists and citizens expressing concern about marriage equality backgrounded by blue screens depicting severe lightning storms and flooding. Here the public recognition of difference is conflated with disaster. In the type of advocacy employed by the Fellowship of the Sun, and NOM, the out-group’s very existence seems to imperil a safe and normal social and political order.

Where NOM’s advocacy and rhetoric are left open to debate and parody in the marketplace of democratic political suasion, the Fellowship is clearly set up as an object of ridicule within True Blood. First Newlin (in Season Two) and then his wife Sarah (in Season Six) are positioned as villains whose attempts to instigate genocidal war against vampires figure as obstructions and then climactic battles against which Sookie and friends must contend. Hudson argues that “Steve’s punishment is to be ‘made’ vampire, presumably unleashing his latent desires for Jason” and he “becomes a self-defined ‘gay vampire American’” (2013, 672). Such a transformation is presented humorously as a revelation of the character’s moral and political hypocrisy because his hatred of vampires is ostensibly linked to a self-hatred of his orientation. The reading of groups such as the Fellowship as opposed to progressive social and political causes is reflected in scholarly and popular reception of the show. For example, J. M. Tyree explains the premise of True Blood by noting, “The resistance movement to vampire rights is formed out of the ideological dregs of fundamentalist Christianity” (2009, 32). An online recapper describes the vigilante Keep America Human group as “a bungling bunch of bigoted idiots who spew thinly veiled Fox News talking points like ‘lamestream media’” (Berkshire 2012). By framing the Fellowship and Keep America Human’s advocacy against vampires as villainous, True Blood can be seen as participating in progressive representations of civil rights wherein “proclaiming a future in which the current resistance to gay marriage will seem backward” allows those subjects who already accept civil rights to be “projected forward in time” (Davies and Funke 2011, 6).

True Blood’s vampire rights narrative enables the production and facilitation of a set of transmedia texts framed around advocacy. As various groups within the show vie for political, cultural, economic and species preservation, this sets up an affective biopolitical participation wherein fans and reviewers debate the merits of civil rights, equality and state protection. A positive reading of this biopolitical transmedia engagement with the show is that a popular political consensus around inclusion and integration encourages fans to view the contribution of violence and essentialised forms of prejudice to political debate in negative terms—whether in the form of the Fellowship’s moral inflection to humans’ right to life or vampires’ reduction of human ontological existence to food. In the next section of the paper, I want to unpack the implications of how this fan engagement with the biopolitics of vampire rights is achieved through transmedia storytelling as a specifically commodified activity.

“There’s no such thing as bad; or time for that matter”: Vampires and neoliberalism

Aside from some obvious corporeal differences—fast movement, sharp orthodontics, sartorial preference for dark, binding clothing—vampires in True Blood attempt, for the most part, to fit into the social and cultural environment around them. In an interview for The New York Times Harris explains that her vampires “are more sympathetic” than previous sanguisuge incarnations. Of Dracula she says: “He had disgusting personal habits. He had the three wives; he crawled up the sides of the buildings; he had the sharp teeth and fingernails. Mine are at least trying to look like everyone else, but it’s not working out too well for them” (Solomon 2010). While earlier representations of vampires tended to exacerbate their monstrosity as difference, in Harris’ novels and their televisual counterpart, monstrosity is framed around the problem of assimilation to a human-centred social and political order. This integration is premised on the presence of a biotechnological industry, economic infrastructure and political consensus that enable vampires to assimilate.

The AVL is able to advocate for the public acceptance of vampires, on the basis that they do not pose a threat to humans, because of the development of the synthetic Tru Blood replacement for human blood. Originally developed by a Japanese biomedical company as a solution for human blood loss and transfusions, an accidental side effect is that the product can provide sustenance to vampires. Thus while the show centres around the politics of integration, the fulcrum for this integration is the successful branding and marketing of Tru Blood as “a globally transported commodity” (Mutch 2011, 81). The second vampire we see in True Blood is shown purchasing the beverage from a 7-Eleven style convenience store. In this opening scene of the first episode, two bored white teenagers eagerly approach the store clerk, fashioned in dark clothing, piercings and long black hair, to inquire about the possibility of scoring V-juice. The clerk indulges the potential V customers, menacing them with intimations of violence, before abruptly revealing his status as human, to the delight of the male teenager and relieved anger of his female counterpart. A burly gentleman in military garb and a cap adorned with a Confederate flag comes forward to express his displeasure with the ruse. After the male teen excoriates the customer by saying, “fuck you Billy Bob”, ‘Billy Bob’ reveals his fangs and responds, “Fuck me. I’ll fuck you boy. I’ll fuck ya’ and then I’ll eat ya’” (“Strange Love”, 1.1). The vampire’s interactions with both the clerk and the young couple subvert generic expectations, from the characters within the show as well as the audience, of the vampire as reclusive and gothic. Hudson reads this scene as evoking “the lingering embers of ‘lost cause’ for white-male-human privilege” where “the privileged position of the white-male-human in the Old South might be restored only in supernatural terms in the New South” (2013, 672). Now a vampire, the Southern white Confederate man can still expect his purchasing power and public presence to proceed without humiliation or impediment.

The development and dissemination of Tru Blood for public consumption creates new forms of human and vampire interaction, which diverse sets of stakeholders attempt to negotiate and regulate in different ways. The AVL attempts to gain political enfranchisement through a Vampire Rights Amendment (VRA) while other supernatural species, such as werewolves, wait cautiously to see how vampires are treated before likewise revealing themselves publicly (Hudson 2013, 665). The means through which a pharmaceutical product propels the development of vampire rights reinforces Halberstam’s point that Gothic monstrosity is always “an aggregate of race, class, and gender” (1993, 334). In order to participate as good biopolitical citizens, vampires must have the capital to access Tru Blood as well as the legal protection to purchase and consume the product in a discrimination-free environment. The fake commercials for Tru Blood, released on YouTube, attempt to help this economic and political process along by portraying Tru Blood consumption as alternately cool and sexy or folksy and non-threatening. For example, in one commercial, three young white men approach a bar and place their orders in quick succession:

I’ll take that vodka with the really cool ad campaign.

Ridiculously expensive imported beer with a name I can’t pronounce.

I’ll have one of those exotic cocktails.

Their requests are interrupted by a conventionally attractive white woman who orders Tru Blood and then carries it to her wan date, languishing in the shadows of the bar. The men stare at the Tru Blood customer in astonishment and awe. The ad ends with the tagline, “Tru Blood, because you don’t need a pulse to make hearts race”.

The commercial has no branding for True Blood or HBO and is a self-contained transmedia text: the Tru Blood logo shown at the end even has small legalese advising potential consumers, “Synthetic blood products contain varied cellular content than actual blood. Please consult a Tru Blood Cellular Specialist for specific nutritional information”. True Blood fans are addressed as both consumers of the show and of the fictional Tru Blood beverage. These fans are positioned as savvy and media literate cognisors in a way that disarms the purpose of both the True Blood text and the Tru Blood advertisement to establish a blatantly commercial relationship with fans through a postmodern knowingness of alcohol marketing. The intended affective response here, as per Ndalianis, is to generate meta-pleasure in recognising the text’s transmedia connection to the show (in the absence of specific show branding) amidst the generic conventions of alcohol commercials.

Another commercial features a group of mostly white men camping and enjoying beer around a fire. We then see the group through a point of view shot from the darkness in a way that appears to show a predator sneaking up on them. In a reverse shot, a vampire emerges behind one of the men and snarls. The men are startled and then begin to laugh as they welcome the vampire as a recognised friend. “You boys got something for me to drink?” the vampire chuckles as his friends hand him a Tru Blood.

These commercials generate a convivial affective connection to the show anchored through transmedia commodity relations that mirror the internal commodity relations between characters in True Blood. The success of Sookie and Bill’s relationship, for example, is implicated in the proliferation of cheap pharmaceutical substitutes. After a passionate bout of lovemaking and bloodletting, Bill tenderly instructs Sookie to take vitamin B-12 tablets to compensate for and replenish her blood loss. Coming out of the coffin is also made more consequential for some vampires due to their social media proficiency. Hudson notes that, “Unlike Jessica today, whose ‘babyvamp’ blog is part of the series’ multiplatform format” Bill “could not interact with a human society that knew him to be a vampire” (2013, 665). Here the internal narrative of the show permits a younger character to be expanded into its transmedia storytelling in a way that would seem implausible and inauthentic to Bill’s character (at least before he is recruited as an AVL figurehead in Season Three). These video blogs, which are performed by the actors in character, also function to link consumption practices to vampire integration. One vlog has the vampire Pam dispense fashion advice to Jessica and her ‘audience’ about where humans should shop to avoid wearing silver (a metal that enkindles vampire flesh in True Blood). Extra-textually, the real brands that Pam lists off as acceptable for human-vampire contact also confirm to True Blood viewers which consumption practices will identify them as fans of the show (below).

Where once vampires could be seen to attest to “the consequences of over-consumption” (Halberstam 1993, 342), the vampires in True Blood reflect a different set of economic and biopolitical concerns. Writing for Newsweek, Jennie Yabroff posits that the current crop of vampire films and television shows are permeated by “vampires who have enough self-control to resist the lure of human blood, reflecting, perhaps, the conservative direction the culture has taken” (2008). The popularity of vampires who are able to exercise self-control is politically conservative insomuch as it reflects a neoliberal focus on improving and maximising the capacities of the self. In such an economic climate, Stephen Ball writes that workers are encouraged “to think about themselves as individuals who calculate about themselves, ‘add value’ to themselves, improve their productivity, live an existence of calculation” (2001, 223). That this neoliberal calculation and control could be construed as vampiric speaks to cultural shifts in assessing social and economic success. In his book The Culture of the New Capitalism, Richard Sennett writes that workers who flourish in the contemporary business climate are “oriented to the short term, focused on potential ability, willing to abandon past experience”. This type of employee “is—to put a kindly face on the matter—an unusual sort of human being” (2006, 5). While this continual need to improve, calculate and enhance oneself and one’s resources can prove taxing to a living human, vampires have the physical capabilities as well as an endless amount of time to adapt to and thrive in volatile neoliberal economic conditions.

Vampires who are able to successfully pursue their business and political endeavours recognise the strategic value of performance. Despite her exhortations that vampires can ‘mainstream’ through the consumption of Tru Blood, the AVL’s Nan Flanagan presents herself quite differently to humans in comparison with her fellow vampires. In the episode, “Everything is Broken” (3.9), Russell Edgington kills a human on live television and Nan is revealed watching the event unfold mid-snack on a female human. When Bill is invited by Nan to appear at the AVL-sponsored Festival of Tolerance (“Let’s Get Out of Here”, 4.9), he queries the political efficacy of only having three vampires present at the event, “it’s like having a civil rights protest without any black people”. In response, Nan scolds him, “They’re called African Americans and maybe those protests wouldn’t have turned into the blood baths they became if they hadn’t been there, ever consider that?” This cynical and racist understanding of minority groups as responsible for the institutional and social violence inflicted on them is an instrumentalised version of strategic essentialism (see Spivak 1987). The disjunction between Nan’s private ‘life’ and the AVL’s public management of vampire behaviour and comportment draws attention to the ways identity politics bargains on the securing of certain rights at the expense of the lived, or undead, complexity of the identities being politicised.

The shifting between rights discourse in Nan and Bill’s conversation, from the African-American Civil Rights Movement to vampire rights, is indicative of True Blood’s dual treatment of historical inequality as a topic that is both serious and linked to a post-industrial commodification of identity politics. The program typically presents critical views of the US’ racist history through the character of Tara. She is sceptical of Bill’s intentions when they first meet because he admits that his family owned slaves (“The First Taste”, 1.2) and complains, “People think just cause we got vampires out in the open now race isn’t an issue no more” (Hudson 2013, 674). Later Tara is ‘outed’ as a vampire to a former high school classmate who patronisingly affirms her identities by saying, “now you’re a member of two minorities!” (“Somebody That I Used to Know”, 5.8). The politics of being ‘out’ as a vampire are also refracted through allusions to racial segregation. Where Eddie and Steve Newlin’s status as vampires allows them to act on their sexual attraction to men (albeit in different and limited ways), other vampires do not have “built-in privileges of masculine whiteness” (672). Tara’s body reads as both vampire and African-American; Bill, meanwhile, is discursively positioned as simply ‘vampire Bill’. As Arlene Fowler explains to her child (upon seeing Bill), “No darlin’, we’re white, he’s dead” (“Sparks Fly Out”, 1.5), whiteness and race are embodied by the living first and non-white bodies second. While the AVL stakes an authoritative claim to what constitutes ‘good’ vampire behaviour, vampires must negotiate their public presence among humans along normatively defined lines of race, gender and sexuality.

These intersections of vampire rights and human-centred identity politics are dramatised in transmedia texts which portray vampires’ attempts to police themselves according to competing sets of claims about ‘good’ and ‘bad’ vampire behaviour. In one of her vlogs, Jessica politely advises Tara to avoid saying phrases like “it sucks” now that she is a vampire, for fear of alarming her audience and the public at large (see vlog below).

The ways in which vampires in True Blood are portrayed “both as a threat and as a fully paid up part of civilian life” (Matthews 2011, 200) exemplify a biopolitical order which depends on the self-policing and disciplining of subjects according to social norms so that excessive external coercion by the state is not required (Foucault 1977). In this sense, True Blood is the culmination of a representational trajectory of vampires from ostensible outsiders to ciphers for sensible consumption, civic pride and business ethics. In an AVL sponsored Public Service Announcement entitled “Accept the Truth” (below), various vampires describe themselves as ordinary “Americans”, for example, “I’m a short-order cook in New York City, I’m cold to the human touch”, and “I run a horse ranch in Northern Montana, sunlight turns me to ash”.

These dramatic declarations of nationality read as humorous precisely because audiences are used to seeing vampires as obviously different from and suspicious of human life. The extension of the True Blood narrative primarily through these media texts, which simultaneously exhort and parody ‘good’ performances of citizenship and consumption, interpellates fans into a transmedia public sphere along the same lines, through HBO-approved forms of consumption. In the final section of the paper, I want to unpack the distinctions and commingling of political-play as consumption and activism in terms of the role of transmedia storytelling and marketing in disciplining the use of public space.

“But please remember I can rip your throat out if I need to”: Vampires and political-play consumption

I have argued so far that True Blood’s vampire trope conjoins civil rights with consumption and civic pride based on a neoliberal performance and management of the self. The program’s focus on the performance of vampirism enabled by a state protected mode of consumption is carried over into fans’ engagement with the show through officially sanctioned forms of consumption. The program’s production and broadcast through the premium HBO cable channel enables a much more explicit and liberal portrayal of sex and violence than traditional broadcast television, and this is undoubtedly a significant reason the show was pitched to and commissioned by HBO. The positioning of the show as both risqué and compatible with a politically progressive demographic is used in marketing material for the show.

For example, one HBO commercial (above), advertising the Season Two DVD box set, has a white family unwrapping Christmas presents from a young woman, presumably their daughter. In response to her Grandma’s query, “What’s this honey?”, the woman gives a quick recap of the season culminating in this description, “and the whole town has a huge orgy. Merry Christmas Grandma, I love you so much”. The commercial’s tagline is “The perfect gift for almost everybody”. The marketing of True Blood’s sexually explicit and graphically violent content as different to or in opposition to the ‘safe’ television programming that your grandmother enjoys sits at odds with the class and cultural capital required to actually consume the show. This includes access to premium cable or at least reliable broadband Internet to download or view the program as well as the supplementary web material that accompanies the program and is designed to satiate audience interest in between episodes and seasons. Whatever form of risk or subversion the vampires in True Blood present to the existing textual order of vampirism is incorporated into an already safely established mode of television production and consumption.

As Ndalianis points out, the goal of an effective transmedia campaign and story is to make audiences “forget that they’re a marketing strategy devised to sell a product” (2012, 166). Fans are encouraged to immerse themselves “in an emerging narrative that isn’t fixed or pre-staged but which they perform a key role in unraveling” (189) and “the participant is invited to literally play and become part of a performance as if it’s real” (172; original emphases). The unfolding of transmedia participation in ‘real-time’ is precisely how the constructed nature of the story is obfuscated. While fans can unravel or make sense of a transmedia story in diverse ways, the underlying narrative which structures the assemblage of transmedia texts is nevertheless necessarily fixed or pre-staged in order to generate an economy of performance that will move the story along.

The framing of transmedia stories around questions of rights, survival or torture can legitimate biopolitical performances through the commodification of fan activity. For instance, Ndalianis describes an aspect of The Dark Knight campaign, which “included phoning a security guard and trying to convince him to save someone being tortured” (168). In this scenario, fans can ‘create’ their own story based on their conversations with the ‘security guard’ but the narrative economy of bargaining over torture still remains intact. An interesting feature of the transmedia campaigns analysed by Ndalianis is the attempt to import ‘real’ protest into the fictional political campaigns devised for Harvey Dent, the protagonist/antagonist in The Dark Knight, and True Blood’s AVL. In the former, Dent’s campaign website was overlain with graffiti that painted his image with clown make up, signifying the Joker’s growing ‘invasion’ of the movie’s promotion (186). In the latter, AVL ads promoting the VRA were covered over, after their initial ‘clean’ public presentation, with anti-vampire slurs such as ‘Killers’ (179). The more consumers interacted with the campaigns, the more oppositional dissent was introduced into their advertising. This ‘dissent’ then becomes an entertaining spectacle, in which fans can participate, that drives the unfolding transmedia narrative as a story about biopolitical conflict; i.e. what are the democratic limits to expelling the Joker and criminals from Gotham City and vampires from public space in True Blood respectively.

In Simulacra and Simulation, Jean Baudrillard argues that the “impossibility of rediscovering an absolute level of the real is of the same order as the impossibility of staging illusion” (2006, 19). To illustrate this point he talks about the impossibility of staging a ‘fake’ bank robbery and assumes that “the network of artificial signs will become inextricably mixed up with real elements” (20). It is impossible therefore, to stage something that remains “close to the ‘truth,’ in order to test the reaction of the apparatus to a perfect simulation” (20). I would argue however that successful transmedia campaigns illustrate the degree to which the simulacrum of political and juridical order is routinely accomplished by corporate and commercial interests and even accommodated by municipal councils and local governments. These transmedia activities seem to be premised on an expectation and acceptance that political campaigns which ostensibly aim to address crime and inequality will inevitably meet public backlash or violent acts of civil disobedience. Contestation over rights and public space is a normalised feature of transmedia campaigns.

Presumably this is entertaining in the context of a performance for a fictional text, albeit one that requires performance in the non-fictional social and political realm of everyday life, but we might compare this transmediation of political contestation with the everyday disciplining of activism in the public sphere. For example, in 2012, pro-Israel advertisements placed in New York subways by the American Freedom Defense Initiative were defaced with words such as “Racist” and “Hate Speech” and activists such as Egyptian-American writer Mona Eltahawy were arrested for spray-painting over them (Holpuch 2012). Here the spectacle of the invasion and countering of advocate discourse is swiftly disciplined by police and security forces, who acted to protect the purchase of advertising space by the American Freedom Defense Initiative. In New Zealand, 2007 saw a series of anti-terror raids resulting in heavy fines, long court proceedings and jail time for anarchist and Māori activists. Among the evidence used to surveil and arrest the defendants were recorded conversations detailing an apparently jocular suggestion that former US President George W. Bush could be assassinated on his next visit to New Zealand by launching a bus at his person (see Operation 8 [Abi King-Jones and Errol Wright, 2011]). Vijay Devadas (2008) provides a thorough examination of the events by situating them within the convergence of government and private security agendas during the ‘war on terror’. I note here that in distinction to transmedia campaigns that compel play-performance of public safety and order issues, parodic suggestions in the execution of advocacy by marginalised communities exacerbate rather than diminish their biopolitical position as threat.

Of course the difference between these ‘real’ events and transmedia storytelling is that the latter involves “a cognitive and sensory satisfaction that relishes in the performativity and playfulness of the text” (Ndalianis 2012, 183). The playfulness and enjoyment of transmedia fan participation seems to occur by virtue of the lack of substantive social and political consequences to transmedia performances. Where Baudrillard might see such performances as testing the authoritative apparatus of juridical and state institutions in such a way as to restate the latter’s epistemological authority to delineate ‘real’ from ‘fake’ civic activity, I would argue that transmedia activity, provided it is authorised by corporate and municipal bodies, does not test ‘the apparatus’ of a juridical and institutional order so much as it ‘simulates’ this order safely and with a positive affective disposition protected by officially authorised forms of consumption.

Ndalianis’ work maps out a framework of analysis, which takes into account the embodied, affective and urban social participation of transmedia storytelling as a significant dimension of fan activity. Given that transmedia storytelling involves the cultivation of activity and participation in the public sphere and urban environment, by connecting private acts of consumption to a theatre of public brand performance, it would be productive to extend Ndalianis’ analytic framework to an investigation of the types of affective relations emerging between fans, the public sphere, media texts, corporate industry and processes of social and political inclusion and exclusion. Does transmedia storytelling encourage a positive affective relation to biopolitical performance so long as this performance is confined to the ‘fictional’ realm? Do media scholars need to account for the consequences of transmedia ‘play’ such as the mass-shooting which took place in an Aurora, Colorado, cinema during a screening of the film The Dark Knight Rises by a young man impersonating a character from the Batman textual archive? How might we compare the increasing surveillance of political advocacy and creative protest with the seeming acquiescence of municipal and city councils to permit corporate branding to invade civil and public spaces for transmedia storytelling campaigns? Notwithstanding the possibility for resistance or divergence on the part of fans with the ‘intended’ transmedia story, the type of narrative used to anchor transmedia campaigns nevertheless frames and orients fan relations to texts through modes of consumer engagement that are legitimated by corporate, state and municipal institutions. 
Although my focus here has been on the ways in which transmedia consumer engagement legitimises biopolitical modes of performance and debate around civil rights, it may prove fruitful to investigate other types of relations that emerge from embedding fans into state institutions and discourses via transmedia storytelling.

Conclusion: “That’s the sickest shit I’ve ever seen … and I watch Dance Moms!”

In this paper, I have examined how biopolitical imperatives and constraints around vampire integration in True Blood are mediated through transmedia forms of storytelling and marketing. The transmediation of vampire rights involves fan immersion in discursive and representational practices which (re)produce vampirism as an allusion to gay liberation and LGBTI politics. The program’s use of Tru Blood, both intra- and extra-textually, is premised on the commodification of identity politics but also attests to the permeation and popularisation of a rights-based consensus for minority groups. In a positive reading of the program’s allusions to gay rights, True Blood’s transmedia storytelling appears to evince an inclusive textual and representational landscape for LGBTI politics. At the same time, the program draws attention to the biopolitical function of rights discourse by suggesting that it is the management of particular kinds of life, through particular kinds of consumption, that remains valuable to the dominant political and economic order rather than the identities these rights are attached to. In this sense, the mapping of vampirism onto civil rights also functions to legitimise a political discourse that measures some rights against others in terms of the strategic economic and social benefits such rights grant to the polity or fan community as a whole. This weighing up and measuring of rights in terms of who deserves social and political life, and what ‘life’ can be ‘good’ for the community, is surely more monstrous than anything True Blood’s vampires are capable of.



Ball, Stephen. 2001. “Performativities and fabrications in the education economy.” In The Performing School: Managing teaching and learning in a performance culture, ed. Denis Gleeson and Chris Husbands, 210-226. London: Routledge.

Baudrillard, Jean. 2006. Simulacra and Simulation. Translated by Sheila Faria Glaser. Ann Arbor: University of Michigan Press.

Beck, Bernard. 2011. “Fearless Vampire Kissers: Bloodsuckers We Love in Twilight, True Blood and Others.” Multicultural Perspectives 13(2): 90-92.

Berkshire, Geoff. 2012. “‘True Blood’ recap: Roman’s fate revealed ‘In the Beginning’.” HitFix, July 23. Accessed April 27, 2014.

Case, Sue-Ellen. 1991. “Tracking the Vampire.” Differences: A Journal of Feminist Cultural Studies 3(2): 1-20.

Davies, Ben, and Jana Funke. 2011. “Introduction: Sexual Temporalities.” In Sex, Gender and Time in Fiction and Culture, edited by Ben Davies and Jana Funke, 1-16. New York: Palgrave Macmillan.

Devadas, Vijay. 2008. “15 October 2007, Aotearoa: Race, terror and sovereignty.” Sites 5(1): 124-151.

Foucault, Michel. 1977. Discipline and Punish: The Birth of the Prison. New York: Pantheon Books.

Foucault, Michel. 1991a. “Governmentality.” In The Foucault Effect: Studies in Governmentality, edited by Graham Burchell, Colin Gordon, and Peter Miller, 87–104. Chicago: University of Chicago Press.

Foucault, Michel. 1991b. “Right of Death and Power over Life.” In The Foucault Reader: An Introduction to Foucault’s Thought, edited by Paul Rabinow, 258–272. New York: Penguin Books.

Grigoriadis, Vanessa. 2010. “The Joy of Vampire Sex: The Schlocky, Sensual Secrets Behind the Success.” Rolling Stone, September 2, 1112: 54-59.

Halberstam, Judith. 1993. “Technologies of Monstrosity: Bram Stoker’s Dracula.” Victorian Studies 36(3): 333-352.

Home Box Office. 2012. Fellowship of the Sun. Accessed January 1, 2012. [site archived here:]

Holpuch, Amanda. 2012. “Activist Mona Eltahawy released after arrest in New York subway protest.” The Guardian, September 26. Accessed April 26, 2014.

Hudson, Dale. 2013. “‘Of Course There Are Werewolves and Vampires’: True Blood and the Right to Rights for Other Species.” American Quarterly 65(3): 661-687.

Jenkins, Henry. 2006. Fans, Bloggers, and Gamers: Exploring Participatory Culture. New York: New York University Press.

Jenkins, Henry. 2011. “Transmedia 202: Further Reflections.” Confessions of an Aca-Fan, August 1. Accessed April 28, 2014.

Matthews, Nicole. 2011. “Noughties Reading.” In The New Politics of Leisure and Pleasure, edited by Peter Bramham and Stephen Wagg, 195-210. New York: Palgrave Macmillan.

Mutch, Deborah. 2011. “Coming Out of the Coffin: The Vampire and Transnationalism in the Twilight and Sookie Stackhouse Series.” Critical Survey 23(2): 75-90.

Newitz, Annalee. 2008. “Let’s Face It: ‘True Blood’ Hates Gay People.” io9, November 1. Accessed April 28, 2014.

Ndalianis, Angela. 2012. The Horror Sensorium: Media and the Senses. Jefferson: McFarland Publishing.

Sennett, Richard. 2006. The Culture of the New Capitalism. London: Yale University Press.

Shen, Maxine. 2009. “Flesh & ‘Blood’.” New York Post, June 23. Accessed April 28, 2014.

Solomon, Deborah. 2010. “Once Bitten: Questions for Charlaine Harris.” The New York Times, April 30. Accessed April 28, 2014.

Spivak, Gayatri Chakravorty. 1987. In Other Worlds: Essays in Cultural Politics. New York: Methuen.

Tyree, J. M. 2009. “Warm-Blooded: True Blood and Let the Right One In.” Film Quarterly 63(2): 31-37.

Yabroff, Jennie. 2008. “A Bit Long in the Tooth.” Newsweek, December 15. 152(24).



Ball, Alan. True Blood. 2008-2014. USA: HBO.

King-Jones, Abi and Errol Wright. Operation 8. 2011. NZ:

Lieber, Jeffrey, Abrams, J. J., and Damon Lindelof. Lost. 2004-2010. USA: Walt Disney Studios Home Entertainment.

Nolan, Christopher. 2008. The Dark Knight. USA: Warner Home Video.



[1] My thanks to the anonymous referee for their thoughtful comments and suggestions for improving the paper’s analytical focus. I am also grateful to Kevin Fisher for sharing his insights on Baudrillard and transmedia during the writing of this paper and to Katharine Legun for her help with improving the clarity and coherency of the paper. An early version of this paper was published in the magazine Cherrie. The original version of the paper can be found here:


Bio: Holly Randell-Moon is a Lecturer in Communication and Media Studies at the University of Otago, New Zealand. Her publications on popular culture, gender, and sexuality have appeared in the edited book collections Common Sense: Intelligence as Presented on Popular Television (2008) and Television Aesthetics and Style (2013) and the journal Feminist Media Studies. She has also published on race, religion, and secularism in the journals Critical Race and Whiteness Studies, borderlands and Social Semiotics and in the edited book collections Religion, Spirituality and the Social Sciences (2008) and Mediating Faiths (2010).


The Ecstatic Gestalt in Werner Herzog’s Cave of Forgotten Dreams — Kevin Fisher

Abstract: Werner Herzog’s Cave of Forgotten Dreams (2010) has been celebrated as the first non-gratuitous use of 3-D: perfectly suited to revealing the interior of the cave and the naturalistic environment in which it is situated, as opposed to immersing the spectator within computer-generated artificial worlds. I will dispute this reading of the film, describing its use of stereoscopy as instead expressive of an anti-naturalistic ecstatic gestalt by appeal to Ágnes Pethő’s concept of intermediality and the film phenomenology of Vivian Sobchack. Moreover, I will read the figural tropes generated through the film’s use of stereoscopy through Georges Bataille’s analysis of the emergence of human consciousness, which I argue reciprocates a key thematic of Herzog’s filmmaking.


Figure 1: Cave of Forgotten Dreams (Werner Herzog, 2010)


This essay began with the desire to read the use of “3D” stereoscopic imagery in Cave of Forgotten Dreams (Werner Herzog, 2010) as an expression of one of the film’s central themes: the enigmatic consciousness of those early humans who left their renderings and cultural artifacts in the Chauvet cave in southern France. Indeed, the film, and Herzog as narrator, both reflect on the cave paintings as a form of proto-cinema, and reciprocally upon cinema as an analog of primitive consciousness. In its reflexive layering of media forms and metaphors between the bookends of what Herzog claims to be the oldest known examples of human representation and the most current cinematic technologies, the film engages in what Ágnes Pethő, in her book Cinema and Intermediality: The Passion for the In-Between (2011), describes as elaborate forms of mirroring characteristic of “abysmal intermediality”. For Pethő, intermedial confrontations open onto an in-between space (or abyme) that transcends medium specificity and instead foregrounds the embodied situations that negotiate them. In this respect, intermedial analysis has phenomenological implications, in relation to which, like Pethő’s own work, I will invoke the film theory of Vivian Sobchack. In describing the structure of the particular abyme onto which Cave opens, I will also draw upon Georges Bataille’s anthropological speculations about the co-emergence of human consciousness, tool use and the order of things to reflect upon the meaning of the particular correlation in which the film configures spectator, object and world, which I will elaborate as its ecstatic gestalt.

Intermediality and Phenomenology

For Pethő, what differentiates intermediality from intertextuality, as well as from more closely related theories of remediation, is the former’s emphasis on embodiment as the primary axis of analysis. As she asserts: “While in intertextuality we have an object that dissolves into its relations, in cinematic intermediality we seem to have moved closer to … a quasi-palpable, corporeal entity in its intermedial density” (2011, 47). In other words, the intermedial resides at the intersections among the embodied situations implicated both by the different media invoked within a given film, and the specific embodied situation of the spectator. She argues further that, as a result of its irreducibly embodied nature, “the intermedial cannot be read, at least not in any conventional way that we understand reading … because it is not textual in nature” (67). She continues: “It is not something one ‘deciphers’, it is something one perceives or senses” (68). This assertion involves some contentious assumptions regarding the relationship between embodiment and signification in the context of Pethő’s adaptation of Sobchack’s phenomenology. I’ll return to this issue towards the end of the essay, as it will be exemplified in the analysis that follows. However, I want to assert from the outset that notwithstanding this concern, it is the general phenomenological commitment of Pethő’s theorisation of the intermedial that marks its particular applicability to Cave. Reciprocally, I hope that my reading of the film will also lend some clarification to an approach that is at times as vague as it is provocative. For as Pethő herself acknowledges, “the possible import of phenomenological approaches to film in the interpretation of cinematic intermediality has not been stressed enough…. The phenomenology of intermediality, although hinted at … is yet to be spelled out” (69).

As suggested above, such a “spelling out” must attend to the modalities of embodiment implicated in cinematic intermediality. On one hand, intermediality is itself predicated upon an understanding of film and other moving image based media as intrinsically expressive of situations of embodied consciousness. On the other hand, as Pethő points out: “‘reading’ intermedial relations requires more than anything else, an embodied spectator” (69). In elaborating both sides of this correlation, Pethő relies on Sobchack’s phenomenology of film experience: in the first instance her concept of “film’s body” (1992), and in the second her notion of the spectator as “cinesthetic subject” (2004).

According to Sobchack, “a film must constitute an act of seeing for us to be able to see it” (1992, 129). She holds that “vision is an act that occurs from somewhere in particular; its requisites are both a body and a world” (25), an observation she applies “not only to the spectator of the film, but also to the film as spectator” (49). What she calls the “film’s body,” like our own bodies, is experienced primarily and prereflectively not as a visibly represented body-object, but as the implicit means of perceiving a visible world. She writes:

Each film projects and makes uniquely visible not only the objective world but the very structure and process of subjective, embodied vision—hitherto only directly available to human beings as the invisible and private structure we each experience as “my own”. (298)

Figure 2: Cave of Forgotten Dreams (Werner Herzog, 2010)

In this respect, the spectator’s own embodied situation functions as the common denominator through which the film’s body is made sensible and intelligible within a “double occupancy of vision” (260). As Sobchack argues, the fact that cinema communicates directly through sound and vision alone does not mean that the film experience is reduced to those channels of perception. Rather, through the phenomenon of synaesthesia, the lived body automatically and prereflectively transcodes visual and auditory perceptions indirectly across the other sensory registers. In Cave this is most apparent in the way the proximate relation of the camera to the walls of the cave invokes the tactile qualities of the surface within the lived body of the spectator—as if touching through one’s eyes. The cinesthetic subject is thus neither disembodied—reduced to a transcendental gaze—nor is its experience of film equivalent to direct unmediated perception.

In qualifying this mingled “intermedial density” as “quasi-palpable”, Pethő seems to corroborate Sobchack’s point that the film’s body is never experienced as identical to that of the spectator, nor collapsed or conflated in experience. As Sobchack asserts, “I never merely ‘receive’ the film’s vision as my own … ” (271). While they might overlap and intersect in ways uniquely enabled by their intermediation, each still retains a mutual exteriority or otherness so essential to the preservation of that “in-between space” on which the intermedial relies. For Pethő it is precisely the unique openness of the film’s body to that of the spectator combined with the distinct differences between the two that marks the intermedial nature of film experience. She cites Jennifer Barker’s reflections on Sobchack’s phenomenology in this respect:

We exist—emerge really—in the contact between our body and the film’s body… a complex relationship that is marked as often by tension as by alignment…. so that the cinematic experience is the experience of being both “in” our bodies and “in” the liminal space created by that contact. (2009, 19)

What I will explore in this essay is how the use of stereoscopy in Cave of Forgotten Dreams augments this tension, both in relation to the embodied situation of the spectator and the embodied situations implicated by other medialities within the film.

Pethő is equally concerned with intermedial relations within film insofar as the incorporation of other media forms involves exchanges among the modes of embodiment implicated by each. In an article not mentioned by Pethő, “The Scene of the Screen: Envisioning Photographic, Cinematic, and Electronic Presence” (2004), Sobchack prefigures this move as she traces how the “techno-logic” of each medium implicates a particular “phenomo-logic”:

Insofar as the photographic, the cinematic, and the electronic have each been objectively constituted as a new and discrete techno-logic, each has also been subjectively incorporated, enabling a new and discrete perceptual mode of existential and embodied presence. (139)

For example, Sobchack describes a scene in Blade Runner (Ridley Scott, 1982) where Deckard re-animates a photograph belonging to the replicant Leon through a fictional electronic device. She observes how “transmitted to the television screen, the moving images no longer quite retain the concrete, material and objective ‘thingness’ of the photograph, but they also do not achieve the subjective animation of the intentional and prospective vision objectively projected by the cinema” (154). Sobchack’s description of the transmuting force of this remediation anticipates the in-between status that Pethő attributes to the intermedial.

The example from Blade Runner is also indicative of the capacity of film’s body to generate other embodied situations as correlatives of the actual and speculative technologies represented and emulated within its fictional world. In Cave there is a peculiar series of shots, peppered throughout the film, in which scientists researching different aspects of the cave paintings each stand (either alone or with their partners) inside the cave while displaying a printed image of a painting towards the camera. Rather than taking on an enhanced appearance, it is the anemic two-dimensionality of the displayed images that is accentuated in relation to the stereoscopically enhanced environment they occupy. While their affect is one of scientific seriousness, the intermedial figuration of these images within the stereoscopic world of the film offers no enhancement of the representational field within their frame, but instead serves to diminish both their objective presence and signifying power. It’s as if stereoscopy is being turned against representation to reverse the relations of containment between the remediated photos and their referents. The insertion of the two-dimensional images within the stereoscopically augmented world of the film thereby exemplifies the “abysmal” power of the intermedial to figure one media element against the other as ground, so as to throw some limitation into relief (Pethő 2011, 44).

Pethő describes how intermediality can also generate traces or metaphors from other media that reflexively allegorise embodied situations within the film (65). In one instance, the film’s remediation of the digital image processing techniques of Tossello and Fritz, who in Herzog’s words “used the dimensionality of the surface to create a powerful contrast” by dissecting the palimpsest of marks on the walls (from bear scratches to rendered figures) into discrete layers, offers a reflexive model of his own use of stereoscopy to hold certain intermedial elements in suspension. In another example, the computer-generated model of the cave, a geometric structure of luminous pixels rotating within a blackened virtual space, quite literally turns the unfathomable negative space into a stereoscopic positive—an objective correlative for Sobchack’s description of the film’s body as subjective embodied vision turned inside out and made visible onscreen. However, in Cave the reversibility of film’s body finds itself at an impasse qua abyme relative to the consciousness of the cave painters, which according to Herzog’s voice-over, we “will never be able to understand”.

Figure 3: Cave of Forgotten Dreams (Werner Herzog, 2010)

Stereoscopy and Documentary

One key advantage of Pethő’s account of intermediality as intercorporeality is that it bypasses reductively empirical and technologistic definitions of media specificity while restoring the phenomenological grounds for describing (without reifying) the experiential differences among media. For example, within the context of this study an intermedial approach aids in exploring the tensions between assumptions regarding stereoscopy, documentary, and Hollywood cinema that, according to Barbara Klinger, many of the critical celebrations of Cave of Forgotten Dreams have sought to neutralise. For instance, emphasis has been placed on the fact that “Cave … focused on a real-life marvel rather than a CGI-manufactured landscape” and that it was shot in 3-D as opposed to other films “converted in post-production and thus considered as ‘fake 3D’” (Klinger 2012, 38). On this basis the film is regarded as “one of the few justifiable recent excursions into 3D” (Hoberman 2011), “necessary” for revealing the interior of the cave and the natural environment in which it is situated (Klinger 2012, 38). It’s worth noting how the criteria of indexical documentary realism implicated in the aforementioned defenses of Cave, on the one hand, and the connotations of 3D with Hollywood spectacle, on the other, constitute a problem that must be resolved by these same critics through the recuperation of Cave’s “naturalism”.[1] However, such apologies ring false in the face of Herzog’s own rejection of the documentary tendency (of which he accuses Cinéma Vérité in particular) that “confounds fact and truth” (2002, 301). Instead, he insists that “there are deeper strata of truth in cinema, and there is such a thing as poetic, ecstatic truth. It is mysterious and elusive, and can be reached only through fabrication and imagination and stylization” (301). Thus, “ecstatic truth … is the enemy of the merely factual” and a counterpart rather of the sublime (2010). 
The notions of media specificity that haunt the critical reception of the film also conspire to diminish what Pethő identifies as the productive tensions of the intermedial both within the film, and between film’s body and spectator. It is thus no surprise that the radical intermediality of Cave must be quelled in order to preserve a received sense of its “documentariness”.

Klinger herself is suspicious of this critical mobilisation of stereoscopic CGI in the Hollywood blockbuster as foil for Cave’s “naturalism”. As she writes:

Cave’s relationship to 3D is more paradoxical and interesting than such contrasts suggest…. In fact [the] film is [reflexively] as much about 3D as it is about its archaeological site…. [Specifically] the stylistic choices of deep focus cinematography (which presents foreground, middle ground, and background in focus) and a dynamically mobile camera help to wed spectacular natural phenomena and the spectacle of space. (39)

Her strategy is thus to argue that Cave transcends the presumed antagonism between the conventions of naturalism and spectacle qua stereoscopy by becoming reflexive in its spectacular naturalism. In so doing, Klinger responds to, but also reproduces, a tendency to understand stereoscopy as mimetically reflecting (or enhancing) structures and properties belonging to the objective world. Contrary to this tendency, and consistent with intermedial analysis, I’d like to focus instead on the role of stereoscopy in its production of a mode of subjective embodied consciousness. In this context it is worth recalling Sobchack’s fundamental insight that any cinematic representation of a visible world entails the primarily invisible structuring presence of an embodied viewing subject—what she calls film’s body.

In pursuing the question of what structure of consciousness is manifested by the use of stereoscopy within Cave, one source for an answer can be found in Jonathan Crary’s Techniques of the Observer where he argues that the “lack of planar unity” in the nineteenth century stereoscope produced a fragmented observer quite distinct from the unified observer of the camera/photograph (1990, 128). I want to suggest that the reflexivity of Cave hinges upon a structurally analogous “structural/planar disunity of the perceptual field” (128) that prevails in normative cinematic experience, and a correlative fragmentation of the viewing subject/spectator (though to different effect). Following Crary’s account of the experience of stereoscopic photography, my analysis proceeds from a rejection of the naturalism ascribed to Herzog’s deployment of stereoscopy. Instead, I want to argue that the combination of stereoscopy with deep focus cinematography and camera movement (especially inside the cave), which Klinger refers to as “gold standards … of achieving ‘3Dness’” (2012, 40), often results in a hyperbolic space that exaggerates the separation of figure from ground towards the center of the frame while attenuating it at the peripheries through anamorphosis. The effect is especially amplified during instances of negative parallax (objects appearing to protrude through the screen) in addition to the film’s more persistent positive parallax (the appearance of supplementary depth behind the screen).[2] This emphatic demarcation of figure from ground produces what I describe as the film’s ecstatic gestalt. It is ecstatic in the double sense of ekstasis: as a literal standing out of figure from ground, and as the existential sense of what Herzog describes as “a person’s stepping out of himself into an elevated state” (2010). These two meanings of the ecstatic also relate to the two principal correlations within the gestalt: figure and ground within the perceptual field, and subject figured against the perceptual field as object of perception.

Figure 4: Cave of Forgotten Dreams (Werner Herzog, 2010)

I want to argue that stereoscopy exaggerates the configuration of figure/ground relations within the cinematic image that, according to Sobchack, provides the most fundamental expression of the intentional activity of the film’s body as viewing subject. As she writes: “perception—as an irreducible correlation of figure and ground—forms and organizes a perceptual field” (1992, 70). In this respect, every relation of figure and ground implies the intentional activity of a viewing subject as an irreducible element within the gestalt (figured against the perceived world), whose literal and intentional movement reconstitutes the perceptual field, altering and potentially reversing the relation of figure and ground within it. Because this changing configuration of subject/figure/ground is accomplished by “the radical and prereflective deliberation of the body-subject” (Sobchack 1992, 70), we are typically unaware of the co-constituting force of our embodied intentionality, which remains latent to consciousness. However, for Crary this latency is counteracted in stereoscopic photography, which impressed upon consciousness the productive power of the apparatus/observer couplet as projective of dimensionality—experiencing it where it did not objectively exist: within the representational field of the two-dimensional image (1990, 129). Or, as Martin Jay summarises: “its three-dimensional images were only in the perception of the viewer—the stereoscope called into question the assumed congruence between the geometry of the world and the natural geometry of the mind’s eye” (1994, 152).

The highly mobile camera in Cave redoubles this effect by associating the exaggeration of its stereoscopic demarcation of figure/ground with the intentional movements of a viewing subject rather than with some natural geometry inherent to the represented world. Put simply, the effect follows the movement of the camera, as in the three-hundred-and-sixty-degree pan around the so-called “cave of the lions” where the sense of added stereoscopic depth, or positive parallax, fluctuates with the distortions of perspective caused by the rotation of deep focus cinematography within such a tight space. The awareness of stereoscopic perception as constituted by the visual subject is also reinforced inside the cave by the spotlight on Herzog’s helmet (or that of the camera person), which creates a sort of iris within the image that circumscribes a zone of greatest effect. Taken together, these representational strategies install a persistent reflective awareness of the productive power of the film’s stereoscopic body within the otherwise prereflectively constituted visual field.

The expressionism of this ecstatic gestalt is also critical to recognition of the intermedial fissures opened in-between the spectator and film’s body. Barker’s earlier remarks to this effect corroborate Sobchack’s argument that the “double occupancy” of the film experience also creates a potential site of tension since the film’s body “in its visible and visual intentional activity, exists within our vision but not as our vision” (1992, 142). As such, “in so far as the visual space I see before me is not completely isomorphic with the bodily space from which I see, there will be a pressure from, an echo of, the machine that mediates my perception” (179; original emphases). This phenomenon of “echo focus” takes on a persistent quality in Cave given that the 3D effect is unlike both non-stereoscopic film and the extra-cinematic space lived by the spectator. The ecstatic gestalt is thus experienced not just as a transformation of the ordinary dimensionality of film, but also of the quotidian three-dimensional world of the spectator.

The “inner landscape” of stereoscopy

Herzog refers to the cave paintings in voice-over as “inner landscapes … of long forgotten dreams”. The notion of an inner landscape is a potent phenomenological and intermedial metaphor which Eric Ames has analysed as exemplary of how Herzog’s representation of “[outer] landscapes serve to conjure unseen worlds of affect and spirituality, even as they represent the physical world we inhabit” (2009, 58). For example, in his voice-over narration for The Dark Glow of the Mountain (Werner Herzog, 1984), Herzog remarks: “We weren’t so much interested in making a film about mountain climbing per se, or about climbing techniques. What we wanted to find out was what goes on inside mountain climbers who undertake such extreme endeavors…. Aren’t these mountains and peaks like something deep within us all?” During Herzog’s voice-over the camera pans across the peaks and valleys of the range, providing what Ames describes as a “graphic representation” (58) of the affective highs and lows of the climbers’ inner landscape. In Cave’s ecstatic gestalt, I want to locate a related system of correspondence, though in this case the graphical axis of the inner landscape has been rotated from the vertical and horizontal (peaks and valleys) to the perpendicular (figure and ground), and from the syntagmatic to the paradigmatic.

Figure 5: Grizzly Man (Werner Herzog, 2005)

In Cave, as in other films such as Grizzly Man (2005), the expression of inner landscapes connects with another persistent theme from Herzog’s oeuvre: exploration of the boundary between human and pre-human or animal consciousness. The relationship of gestalt philosophy to such themes is well established. In his reading of gestalt theorist Jean Piaget, Habermas describes how the emergence of a reflexive capacity within figure/ground relations in the development of the human child (to constitute oneself as figure against the world) recapitulates a stage in the evolution of human consciousness (1979). This is not to deny that infants, pre-human ancestors, or animals possess some ability to delineate figure from ground. The assumption rather is that they are incapable of bringing the gestalt to reflection, and of thereby figuring themselves within it. As Georges Bataille observes in Theory of Religion, although “the animal can be regarded as a subject for which the rest of the world is an object, it is never given the possibility of regarding itself in this way” (1989, 19).[3] Rather, for Bataille, “the animal is in the world as water is in water” (23). For example, the difference between the eater and eaten within the animal world is never qualitative but only ever quantitative: “In the movement of the waters he is only a higher wave overturning the other, weaker ones” (18–19). The animal is in a state of immanence, intimacy and immediacy within a world defined by continuity.

It is only through the correlated emergence of tool use, language and representation that human consciousness becomes reflectively aware of itself within a world of atomized and discontinuous ‘things’ defined by their function within a scheme of utility. For Bataille, the birth of the tool is intricately connected to that of the object and the subject. An instructive rendering of this event can be found in “The Dawn of Man” sequence from 2001: A Space Odyssey (Stanley Kubrick, 1968) when the hominid has an epiphany in which it suddenly perceives/conceives a bone lying on the ground as a weapon. Simultaneous with this realisation is its ability to imaginatively extrapolate the application of the tool into other situational contexts (from breaking other bones on the ground, to smashing the head of an animal, to killing the leader of a rival group at the water hole), as well as into hypothetical temporalities, such as planning future uses of the tool. As a result, discontinuity is introduced into the world by an object that is perceived indirectly according to what it does rather than directly in its immanence and immediacy: “the purpose of a plow is alien to the reality that constitutes it” (Bataille 1989, 41). In what Bataille describes as “one of the most remarkable and fateful aberrations of language […] men situated on the same plane where the things appeared (as if they were comparable to the digging stick or the chipped stone) elements that were nonetheless continuous with the world, such as animals, plants, other men, and finally, the subject determining itself” (28, 31). In other words, the question of what a thing is automatically devolves to a question of its use value, so that all things refer in their utility to humanity, and humanity in turn to God. Through a certain contagion of thought, “the transcendence of the tool and the creative faculty connected with its use are confusedly attributed … to the entire world” (32).

For Bataille, “the world of things is perceived as a fallen world. It entails the alienation of the one who created it…. The tool changes nature and man at the same time: it subjugates nature to man, who makes and uses it, but it ties man to subjugated nature” (41). The subjugation lies in the fact that, in contrast to the reversibility of figure and ground within the discontinuous world, the figuration of the discontinuous human world against the ground of the continuous pre-human world is irreversible. The thing becomes a reducing filter within consciousness, as perceptions become indiscernible from the concepts projected, which take on a sort of autonomy inseparable from the world in-itself. As a consequence, states Bataille, “nothing is more closed to us than this animal life from which we are descended…. We can never imagine things without consciousness … since we and imagine imply consciousness, our consciousness, adhering indelibly to their presence” (20). Nevertheless, as Bataille asserts: “There is every indication that the first men were closer than we are to the animal world; they distinguished the animal from themselves perhaps, but not without a feeling of doubt mixed with terror and longing” (35). An anthropologist that Herzog interviews concurs that the cave painters did not merely see the animals they painted as things but as spiritual entities in a relationship characterised by greater “fluidity and permeability” than the modern world. By contrast, for Bataille, “The sense of continuity that we must attribute to animals … derived a new significance from the contrast it formed to the world of things … [and] offered man all the fascination of the sacred world, as against the poverty of the profane tool (of the discontinuous object)” (35). 
In Cave as in other films such as Encounters at the End of the World (2007), Herzog is most interested in the inner landscapes and unconscious drives of the scientists he interviews, such as Julien Monnet who reports that after going into the cave he couldn’t stop dreaming of lions and paintings of lions, and that he was possessed by “a feeling of powerful things and deep things, a feeling of understanding things, that was not a direct way”.

Indeed, Bataille concurs with Herzog that indirection is the only means of approach to such depths. The “abysmal” quality of this reciprocal mirroring between the human and the animal is illustrated traumatically in Herzog’s Grizzly Man (2005) through the folly of Timothy Treadwell’s desire to “enter the secret world of the bears”. Narrating over a close-up shot of the face of the bear that likely killed and ate Treadwell, Herzog senses: “no understanding, no kinship, no mercy, only the overwhelming indifference of nature. To me there is no such thing as the secret world of the bears and to me this blank stare speaks only of a half bored interest in food” into which Treadwell was (at his own peril) anthropomorphically projecting an “inner landscape”. Hence the irony (and the poetry) that the false opening through which Treadwell imposed himself on the world of the bears was reprised by an open mouth, the only way into the animal world being the reduction of the self to pure immanence, since one could not become animal and maintain any vestiges of human subjectivity and sovereignty.

Here we might begin to grasp the function of Cave’s stereoscopic ecstatic gestalt: to return its spectator not just to the physical site of the cave, but to a consciousness still astonished by the novelty and discontinuity of the thing figured in unnatural and quasi-hallucinatory fashion against a continuity that it occluded, but from which, to quote William Wordsworth on his recollections of early childhood, it was “still trailing clouds of glory” ([1919] 2008, 536). In this way, my reading of Cave’s ecstatic gestalt through Bataille’s speculative anthropology also resonates with Paul Arthur’s elaboration of Herzog’s “metaphysical realism” in which:

As self-professed intermediary between opposing worlds—modern/pre-modern, prosaic/myth, accessible/recondite—Herzog’s strongest moments revolve around what can’t be shown, what exceeds or beggars representation … [and to] that which testifies to his own inadequacy and, by extension, that of cinema’s meager communicative tools. (2005, 5)

This metaphysical realism is reflected, for instance, in the way that Cave points from objective to non-objective forms of transcendence, between that from which the camera is merely physically blocked and that which is existentially inaccessible. For example, the reverse side of a large stalactite in the “cave of lions” on which is represented the one arguably human figure in the cave, cannot be viewed directly from the walkway to which the camera crew is restricted. “You’ll have to make do with a partial image”, observes the scientist supervising the filming. Undaunted, Herzog’s crew mounts the camera in reverse on a boom and extends it into the space to capture the opposite side of the formation. The effect of negative parallax is pronounced as a result of the proximity of the camera to the surface, which seems to bulge through the screen. While the attempt to get around the backside of the representation yields a full image, it reveals a partial human: resembling a bison from the waist up and what appears to be a female nude (reminiscent of the Venus of Willendorf) from the waist down. The desire to get behind the image leads paradoxically to an image of the desire: the fundamentally conflicted impulse to merge human and animal modalities of embodied consciousness (via “fluidity and permeability”) through the means of representation.

Cave’s ecstatic gestalt involves both the literal sense of ekstasis as a figural standing out, and the metaphorical/metaphysical sense of an extraordinary pronouncement of being that defies representation. Writing in relation to a group of non-stereoscopic films invested in the expression of the spiritual through the material,[4] Sobchack observes how “the camera seeks a parallel ekstasis in the ‘flesh’ of the world: it offers up a profane illumination of objective matter that … opens into an apprehension of something ultimately unfathomable, uncontained and uncontainable—not only in the thing on which we gaze but also in ourselves” (2004, 298). In Cave the intensity of stereoscopic effects waxes and wanes relative to the film’s intentional objects. It is significant in this respect that the most pronounced instances of negative parallax accompany the exhibition and demonstration of pre-historic artifacts including weapons and tools, whose figuration within the film re-enacts—and commutes to the spectator—the sudden eruption of discontinuity that, according to Bataille, would have issued from their originary invention. The reflexivity of this intermedial reading is corroborated by the fact that in addition to pre-historic tools, the only similarly ecstatic expressions of negative parallax relate to representations of the tools of filmmaking, such as boom microphones and lighting equipment.

Indeed, negative parallax has a special affinity for tools that extend intentionality into space. However, it is telling that these stereoscopic representations of tools (both old and new) transcend the film’s explanatory function of their utility, and “illuminate” (to use Sobchack’s term) something in excess of their functionality. In this way, inadvertently perhaps, the film also invites intermedial reference to the very gratuitous deployments of stereoscopy from which critics have been so determined to distance Cave. For example, anthropologist Wulf Hein’s demonstration of replica prehistoric spears by thrusting and throwing them into negative parallax finds analogs in Hollywood films from Bwana Devil (Arch Oboler, 1952) to The Hobbit: An Unexpected Journey (Peter Jackson, 2012). However, the effects of negative parallax are most profound in relation to a different category of objects worked by tools but gesturing towards that continuous world whose opening—like a trompe l’oeil—they simultaneously block. The first of these objects is the figure of a chimerical lion/man rotated against a darkened backdrop, and offered as a complement to the image of the bison/woman on the stalactite. The second figure is the Venus of Hohle Fels, about which anthropologist Nicholas Conard makes the unintentional pun: “this one … stands out. It’s the absolute root of figurative depiction as we know it” (emphasis added). The effect of negative parallax is even greater here than in the previous example. Suspended in a glass box, the small sculpture seems to float in front of the screen in a state of ecstatic discontinuity from the surrounding world. The autonomy and disconnection that these objects achieve relative to their environment resonates with the emphasis on their being the “firsts” of their kind. Through the hyperbolic use of negative parallax, the spectator is invited to return to the moment of their radical “newness” in which their startling discontinuity would have been tantamount to a special effect, like the monolith before the hominids in 2001.

I don’t offer these examples to imply that Herzog’s use of stereoscopy attempts to directly capture the inner landscape of a pre-human consciousness. For as Bataille declares: “There was no landscape in a world where the eyes that opened did not comprehend what they looked at, where indeed, in our terms, the eyes did not see” (1989, 21). Rather, I want to argue that Herzog cultivates this ecstatic gestalt in order to commute to the film spectator the startling novelty and transformative power that would have accompanied the eruption of the capacity for tool use, representation, and reflective consciousness against the ground of a pre-human world—which is why Herzog refers to the cave as the site of “the birth of the human soul”. In this way, ekstasis accedes to ekphrasis, by which, according to Pethő, cinema incorporates intermedial relations to point beyond its own limits (46).


In concluding, I’d like to return to Pethő’s assertion that intermedial relations, because they occur at the prereflective level of embodiment, are not textual and cannot be read (69). This position is further grounded in her statement that “phenomenology does not see images as representations or signs; it sees them foremost as events and corporeal experiences” (70). This argument might at first glance seem to concur with that of Bataille. However, where he is content to be silent, Pethő wants description without signification. In this sense, her argument contradicts the existential phenomenology of Sobchack, in the context of which she advances it. In The Address of the Eye, Sobchack is explicit that she pursues a “semiotic phenomenology,” meaning that the structure of all systems of signification emerges from and recapitulates the structure of embodied prereflective perception (1992, 8). Thus embodiment appears as a theme only by virtue of being brought to reflection, but it can only be brought to reflection because it is already signifying. Hence for Sobchack, embodiment can be read, and she writes of “a textualizing of the sensing body” (69), which is a correlative of the fact that “in its existential function—perception is always semiotic” (70). The consequence of Pethő’s misreading is a potential mystification of the intermedial and romanticisation of the cinematic, in so far as the intermedial must be non-signifying yet embodied and cinematically expressible.

Relative to this discussion, it is intriguing that where Pethő touches briefly on “3D”, she asserts its antagonism to intermedial phenomena by drawing a contrast between “the intrusive ‘tactility’ of 3D images” and “‘haptic’ images” that, by contrast, “preserve a quality of openness towards intermediality” (105 n. 18). She continues that intermediality depends upon an “aesthetic distance”, which is preserved so long as the film is emulating some other mediality, such as painting or photography—a capacity cancelled by the “illusory display of objects in space that act upon our senses (as in the case of 3D imagery)” (105 n. 18). This critique seems overly proscriptive in light of the intermedial elements clearly apparent in a film like Cave. Also, when read in relation to Pethő’s dissociation of signs from corporeality, one might diagnose that it is precisely stereoscopy’s exaggeration of the signifying power of embodied consciousness through its ecstatic gestalt that troubles her.

There is also another way in which what Pethő refers to as stereoscopy’s “illusory display of objects” reflexively turns back upon the illusion of the object. In this respect, I would argue that Cave’s ecstatic gestalt induces a sort of impromptu phenomenological reduction upon the familiar, everyday world of things whose taken-for-granted dimensional extrusion as discrete and autonomous objects it renders strangely artificial, and quasi-hallucinatory. This is nowhere so apparent as in the opening scene, when the novelty of the stereoscopic effects would seem most pronounced to the unaccustomed eyes of the audience. The camera glides down a row within a vineyard that borders the area of the Chauvet Cave. Against the undifferentiated manifold of nature the vines are doubly objectified: both as “raw” nature “cooked” (to use Claude Lévi-Strauss’s [1969] terminology) into the useful form of a vineyard, and as figures of vision unnaturally extruded from their ground through the instrumental movement of our own language-laden consciousness. It is in this way that the novelty of Herzog’s use of stereoscopy doubles for that of the more primordial innovation of which cinema is itself an extension. By breaking the plane of representation and the illusion of depth to which the cinematic spectator is habituated, Cave simulates the world-rupturing force of a much more fundamental discontinuity in the perceptual gestalt and (importantly) provokes reflexive awareness of this event.



Ames, Eric. 2009. “Herzog, Landscape, and Documentary.” Cinema Journal 48 (2): 49-69.

Arthur, Paul. 2005. “Beyond the Limits: Werner Herzog’s Metaphysical Realism.” Film Comment 41 (4): 42-47.

Bataille, Georges. 1989. Theory of Religion. Translated by Robert Hurley. New York: Zone Books.

Bataille, Georges. 2005. The Cradle of Humanity: Prehistoric Art and Culture, edited by Stuart Kendall. Translated by Michelle Kendall and Stuart Kendall. New York: Zone Books.

Crary, Jonathan. 1990. Techniques of the Observer. Cambridge: MIT Press.

Habermas, Jürgen. 1979. Communication and the Evolution of Society. Translated by Thomas McCarthy. Boston: Beacon Press.

Herzog, Werner. 2002. “The Minnesota Declaration: Truth and fact in documentary cinema.” In Herzog on Herzog, edited by Paul Cronin, 301-302. New York: Faber and Faber.

Herzog, Werner. 2010. “On the Absolute, the Sublime and Ecstatic Truth.” Werner Herzog: The Only Official and Authentic Website of Werner Herzog. Accessed June 13, 2013.

Hoberman, J. 2011. “Cave Man: Werner Herzog Can’t Get Out of His Own Way in Forgotten Dreams.” The Village Voice, April 27. Accessed January 12, 2014.

Jay, Martin. 1993. Downcast Eyes: The Denigration of Vision in Twentieth-Century French Thought. Berkeley: University of California Press.

Klinger, Barbara. 2012. “Cave of Forgotten Dreams: Meditations on 3D.” Film Quarterly 65 (3): 38-43.

Klinger, Barbara. 2013. “Beyond Cheap Thrills: 3D Cinema Today, The Parallax Debates, and the ‘Pop-Out’.” Public Journal: 3D Cinema and Beyond 47: 186-99.

Lévi-Strauss, Claude. 1969. The Raw and the Cooked: Mythologiques, Volume One. Chicago: University of Chicago Press.

Pethő, Ágnes. 2011. Cinema and Intermediality: The Passion for the In-Between. Newcastle upon Tyne: Cambridge Scholars Publishing.

Sobchack, Vivian. 1992. The Address of the Eye: A Phenomenology of Film Experience. Princeton: Princeton University Press.

Sobchack, Vivian. 2004. Carnal Thoughts: Embodiment and Moving Image Culture. Berkeley: University of California Press.

Wordsworth, William. (1919) 2008. “Ode on Intimations of Immortality from Recollections of Early Childhood.” In The Oxford Book of English Verse: 1250-1900—Volume II, edited by Arthur Quiller-Couch, 536. Alcester: Read Books.



Herzog, Werner. 2010. Cave of Forgotten Dreams. Creative Differences.

Herzog, Werner. 2007. Encounters at the End of the World. Discovery Films.

Herzog, Werner. 1985. The Dark Glow of the Mountains. Sudfunk Stuttgart.

Herzog, Werner. 2005. Grizzly Man. Lions Gate Films.

Jackson, Peter. 2012. The Hobbit: An Unexpected Journey. New Line Cinema.

Kubrick, Stanley. 1968. 2001: A Space Odyssey. Metro-Goldwyn-Mayer.

Oboler, Arch. 1952. Bwana Devil. United Artists.

Scott, Ridley. 1982. Blade Runner. Warner Bros.



[1] Such critical tactics are ironic given Herzog’s self-reflexive stance towards the realist strategies he routinely deploys. This stance is evident, for example, in his penchant for testing the credulity of the audience, as with the apocryphal story about albino alligators mutated by radiation that concludes Cave of Forgotten Dreams.

[2] For a discussion of debates surrounding the value of negative and positive parallax, see Barbara Klinger, “Beyond Cheap Thrills: 3D Cinema Today, The Parallax Debates, and the ‘Pop-Out’” (2013).

[3] Over the span of his life, Georges Bataille produced a collection of short essays and talks on prehistoric cave art (from 1930 to 1957) compiled in The Cradle of Humanity: Prehistoric Art and Culture (2005). However, his Theory of Religion (1989) reproduces many of these ideas in more systematic form, especially with regards to the relationship between tool use, language, and the emergence of human consciousness.

[4] She refers specifically to Diary of a Country Priest (Robert Bresson, 1950), Thérèse (Alain Cavalier, 1986) and Babette’s Feast (Gabriel Axel, 1987).

Bio: Kevin Fisher is a Senior Lecturer in the Department of Media, Film & Communication at the University of Otago. His research interests include phenomenology, special effects and audio-visual analysis, and documentary. His essays have appeared in the anthologies Meta-Morphing (2000), The Lord of the Rings: Studying the Event Film (2007), Cinephilia in the Age of Digital Reproduction (2008) and The Fourth Eye: Māori Media in Aotearoa/New Zealand (2014), as well as in journals such as Science Fiction Film & Television, The New Review of Film and Television and The New Zealand Journal of Media Studies.


Vertical Framing: Authenticity and New Aesthetic Practice in Online Videos — Miriam Ross

Abstract: In recent years there has been much focus on the opportunities that mobile media devices (phones, tablets) offer for user-generated audio-video production. Most often this focus has been on content, with an emphasis on new citizen journalism and YouTube home videos. Less attention has been given to the negotiations of aesthetic parameters that mark a departure from traditional filmmaking modes. In particular, the tendency for a new generation of filmmakers to shoot on mobile phones has led to a number of works produced in a vertical (portrait) format. Initially dismissed as content “shot the wrong way”, vertical videos have proliferated in the exhibition platforms provided by YouTube, Facebook and other social media sites. This article examines the trend for shooting in a vertical mode, the material markers of ‘authenticity’ this mode appears to lend to its audio-visual content, and the effect of circulating this material in a context where other users can ‘police’ videos for bad practice. It will focus, in particular, on how these aspects interact with the different mediations of authenticity that emerge from new screen technologies amongst the ongoing contingency of media forms.

A video version of this essay is available at this link:

The “CAUGHT ON CAMERA: Fertilizer Plant Explosion Near Waco, Texas” video on YouTube

In 2013, a video entitled “CAUGHT ON CAMERA: Fertilizer Plant Explosion Near Waco, Texas” was uploaded to YouTube. It records the 17th April ammonium nitrate explosion at the West Fertilizer Company Storage and Distribution Facility from the perspective of an observer who is seated in a car beyond the perimeter of the plant. One of the aesthetic considerations that most clearly signals that it was recorded by an ‘everyday’ user is its use of vertical framing (the production of moving images in a portrait mode). Due in part to the proliferation of mobile phone cameras, which record in a vertical mode when the phone is held upright, vertical videos have begun to circulate widely on YouTube and other social media sites. Because they depart from the professional standard of horizontal composition, vertical videos are commonly perceived to be shot by amateur users rather than professional filmmakers. They tend to become known to a wider audience only when they are distributed as viral videos and/or picked up by traditional news channels as alternatives to professional news footage.[1] As in many newsworthy vertical videos, the vertical framing in the “Fertilizer Plant” video iterates a sense of authenticity: the vertical mode’s association with amateur users suggests an everyday user providing unmediated witness to events as they occur rather than a professional filmmaker involved in staging and careful composition.[2] Although the relationship between authenticity and aesthetic configurations is always contingent upon historical, social and cultural uses of media, in cases such as this, vertical framing emphasises the potential for new media technologies to be used for capturing the ‘moment’ and emphasising the object of observation rather than traditional aesthetic concerns.
At the same time that the “Fertilizer Plant” video breaks with a hundred-year-plus tradition of displaying moving images in a horizontal, or landscape, format, it signals some interlinked debates around authenticity, aesthetics and new media technologies that are currently being worked through against a new media backdrop in which moving-image screens proliferate to a greater extent than ever before and in increasingly diverse configurations (for example, Manovich 2001, 94; Friedberg 2009, 6; Verhoeff 2012; Casetti 2011). Within this context, I will discuss how vertical media generates possibilities for a new, increasingly flexible audio-visual environment in the early twenty-first century and how the use of a seemingly ‘amateur’ mode of framing raises issues around concepts of authenticity and aesthetic norms.[3]

Although YouTube has an idiosyncratic and diverse range of material on its site (ranging from HD video and 3D-enabled films to stop-motion animation and pixelated phone videos), the “Fertilizer Plant” video exemplifies what has become known as the YouTube video aesthetic: an audiovisual object that expresses user-generated content through an amateur rather than professional appearance (Cubitt 2008, 45; Burgess and Green 2009a, 90; Müller 2009, 136).[4] This particular video’s affective power lies in the way that it combines its amateur appearance with an explosive event that is more akin to the pyrotechnic effects of a blockbuster action film. The spontaneity of the blast combines with the realisation that there are observers in the car who are perilously close to the fireball at the Plant. It is not unique in this regard, as various YouTube videos capture spectacular ‘real life’ events, but it is distinct from the many mundane domestic amateur films that circulate on YouTube.[5] At the same time, the shocking impact of the video’s events is not wholly due to its content but is also supported by aesthetic configurations that indicate to the viewer that the events were not staged and the observers in the car were ‘true’ witnesses to the explosion.
In no preferred order, these aesthetic configurations operate in the following ways: the slight shake from the hand-held position indicates a human observer and, in this particular video, the observer is evident through the display of the filmmaker in the car’s side mirror; the view of the filmmaker makes it clear the video is filmed on a mobile phone, an amateur recording device notable for its spontaneous filming potential; the lack of staged lighting or artificial sets combines with a lack of cuts to express a type of ‘unmediated witness’ to the events that unfold; the wind buffeting the microphone on the mobile camera reminds us of a recording device that is present; the lack of closure at the end of the film (as we struggle to know whether the invisible but presumably present members of the car are okay[6]) colludes with a number of open media texts that circulate somewhat anonymously online.[7]

In addition to all of these factors, there is the use of vertical framing. Beyond merely suggesting an amateur user, the vertical video gestures powerfully to a subjective human observer behind the camera. It suggests a person who has a mobile phone close to hand and has initiated the camera without changing their normal bodily hold upon that technological device. Even though mobile phones such as the Nokia C6 were designed to encourage users to hold them in a horizontal alignment, most camera-enabled phones, particularly the new generation of iPhone and Android-based smart phones, are configured to operate primarily in a vertical alignment. In this way, use of a vertical filming mode reinforces the filmmaker’s personal touch as well as a sense of immediacy and presence within the act of filming. These components contrast with the seemingly impersonal viewpoints created by virtual cameras in CGI compositions, in which no filmmaker could be present, such as shots high above Earth or shots passing through the walls of buildings (Brooker 2009; Jones 2013; Purse 2013). At a time when visual manipulation tools make it increasingly difficult to identify which visual objects are a ‘faithful’ record of an event and which are staged, vertical framing suggests (whether truthfully or not[8]) that no such manipulation has taken place.[9]

This type of aesthetic positioning of a real life event is not without historical precedent. In his discussion of the use of camcorder footage in documentary and news broadcasts towards the end of the twentieth century, Jon Dovey notes that

the low grade video image has become the privileged form of TV ‘truth telling’, signifying authenticity and an indexical reproduction of the real world; indexical in the sense of presuming a direct and transparent correspondence between what is in front of the camera lens and its taped representation. Secondly, the camcorder text has become the form that most relentlessly insists upon a localised, subjective and embodied account of experience. Finally, the video text has become the form that represents better than any other the shifting perimeters of the public and the private. Video texts shot on lightweight camcorders uniquely patrol, re-produce and penetrate the boundaries between the individual subject and the public, material world. (2000, 55)

Mobile phone footage offers the most recent iteration of this context, demonstrating, on the one hand, historical continuity whereby technology serves rather than creates desires for seemingly authentic material that mixes the public and the private and, on the other hand, the potential for new technologies to reinforce and renew the embodied relationship between filmmaker, text and viewer (Hjorth 2006, 2). It is in this context that Max Schleser has discussed the way in which the mobile phone, operating as a hand-held recording device, presents opportunities to “construct personal narratives and representations of self” (2012, 400; see also Hjorth 2006). Similarly, Gavin Wilson notes “such films repeatedly reference the body and sensory perception, evoking the sense of what objects feel like as we look at them, as objects and as images” (2012, 68). In each case, these possibilities are contingent upon the extent to which the mobile technology continues to reiterate the presence of its user and the extent to which the technology is upgraded so that its images are no longer discernible from professional footage. In an era in which YouTube offers to stabilise uploaded videos, phones increasingly offer HD settings, and readily available editing software allows the addition of professional levels of colour grading and sound mixes, it is not always possible to view the traces of an amateur and/or embodied user within the footage. However, in its current manifestation, a vertical framing mode cannot be subsumed within professional filmmaking practice. Apart from the fact that there is not a body of professional vertical works for the new vertical piece to join, few professional screens exist for its exhibition and so YouTube and mobile screens remain its natural home.

Figure 1. YouTube offers a range of configurations to help professionalise the appearance of videos

When exhibited on the mobile screen, it is possible to see that the vertical mode articulates a conflation of audiovisual technologies: the distinction between capture and display devices (Wilson 2012). Since the earliest motion-picture cameras, which had the dual function of capturing images on film and then replaying that film in a projector mode, there has been the ability to use cameras as both recording and display devices. Nonetheless, the speed with which digital devices such as the mobile phone (and increasingly the tablet) can replay images recorded by the camera leads to a sense of immediacy which couples the camera device to its screening function. Immediate playback can lead to forms of intimate encounters between the device, text and viewer. At the same time, the current novelty of vertical framing emphasises how the coupling of mobile filming/display device with filmmaker/viewer creates unique convergences that are not available in traditional media.

While this framing mode can be understood productively as a new iteration of digital vernacular practice, unease with its appropriation of moving image capture/screen technology has appeared. Vertical videos operate in a context whereby new technologies and their users do not function in isolation but are, instead, conditioned by networked, discursive, peer practice that is highly visible in contemporary, Internet-oriented, society:

The instant and permanent visibility and availability of social peers (and the permanent exposure of their content-related activities) enable the instant and permanent social control of exposed activities and connect the semipublic-mediated space with the private place of home. In addition, the convergent nature of the platform (i.e. the permanent and straightforward possibility to receive, post and repost various media) lowers barriers for the participation in content and at the same time brings the text right next to the negotiations of its value. (Macek 2013, 298-9)

The negotiation of the value of vertical videos has been particularly prevalent on YouTube, one of the main exhibition platforms for vertical media. The most visible debate on this topic has been conducted through and in relation to a video that appeared in June 2012, “Vertical Video Syndrome—A PSA.” The video uses a highly comic parody of United States-style Public Service Announcements (PSAs) to explain to audiences why shooting in a vertical mode is incorrect. It makes statements ranging from the technical, “vertical videos happen when you hold your camera the wrong way,” to the technological, “while you can turn a picture, you can’t really turn a video … Motion pictures have always been horizontal. Televisions are horizontal. Computer screens are horizontal,” and the biological, “People’s eyes are horizontal.” It has gathered a significant number of views (4.3 million at the time of writing) and pages of comments on YouTube as well as repeated ‘shares’ throughout social networking sites such as Twitter and Facebook.

“Vertical Video Syndrome—A PSA”

Other, similar videos have been produced, such as “Turn Your Phone! (Vertical Video PSA)” in July 2013 and “Turn Your Phone! (‘No Scrubs’ parody with Andrew Huang, DailyGrace, Hannah Hart, Soundlyawake)” in June 2013. Both are set to music: the former portrays a young man showing various people filming on their mobile phones how to turn their camera “the correct way”, and the latter shows a female singer explaining why videos should be shot in a landscape mode. The videos are entertaining and light-hearted but perpetuate a number of aesthetic ideas surrounding the way in which new camera technologies, particularly the camera phone, should be regulated.

Figure 2. “Turn Your Phone! (‘No Scrubs’ parody with Andrew Huang, DailyGrace, Hannah Hart, Soundlyawake)”

Foremost in their claims is a normalisation of the landscape format as somehow biologically informed and historically pervasive. With regards to the biological argument, originally initiated in the “Vertical Video Syndrome” video, the claim that “people’s eyes are horizontal [sic]” has had a particular resonance with viewers providing comments on YouTube and has been repeated regularly in other Internet discussion sites such as Twitter. In the first instance, this statement ignores a lengthy history of recorded images with roots in the camera obscura whereby “the camera obscura, with its monocular aperture, became a more perfect terminus for a cone of vision, a more perfect incarnation of a single point than the awkward binocular body of the human subject” (Crary 1992, 53; see also Friedberg). Efforts in the nineteenth century to overcome this problem resulted in obsessive attempts by stereoscopic photographers such as David Brewster and John F. Mascher to create two-camera apparatuses that would exactly mimic the eyes’ relations and retinas (Silverman 1993; Pietrobruno 2011). Similarly, a number of twentieth century stereoscopic filmmakers have been equally determined to provide orthoscopic views that exactly replicate the human field of perception (Lipton 1982, 134). The failure of their experiments to take hold as the dominant way of producing photographic images points to the extent to which biological determinism and photographic reproduction have only limited interest for audiences. Even IMAX screens that are designed to exceed the boundaries of human peripheral vision operate within a rectangular frame that is distinct from the seemingly unbounded scope of human vision, particularly with the eye’s ability to focus on and narrow in on a range of different fields. Moreover, there is the problematic assumption embedded in the statement about horizontal eyes that a person must have full vision in both eyes in order to appreciate moving images.

The extent to which horizontal moving images are historically pervasive is also often emphasised in discussions of vertical framing, to the detriment of a more nuanced historical approach that takes into account the lengthy history of diverse visual culture. Although visual culture has produced art in a variety of forms (square, circular, oval, portrait rectangle, landscape rectangle) and across different media, moving-image production has been mainly confined to a landscape rectangular format that is most commonly found in either a 4:3 or 16:9 aspect ratio. Yet, as John Belton notes, even when W. K. L. Dickson and the Lumière Brothers popularised the 4:3 35mm film format during cinema’s development, there was no obvious technological precedent for this standard. Photography in the nineteenth century had a range of aspect ratios and shapes (square, circular, rectangular) and was not standardised into a similar 4:3 aspect ratio until the twentieth century, after the development of cinema. Similarly, nineteenth century hand-painted lanternslides came in a variety of shapes and sizes. Thus,

there is nothing “natural” about these formats. They do not seem to have grown ‘organically’ out of some prior medium of representation. Nor do they initially appear to be automatic consequences of the invention process. At the same time, they were not quite arrived at arbitrarily. (Belton 1992, 18)

The technological conditions that demanded effective ease of reproduction during cinema’s global development mean it is hardly surprising that a rectangular, landscape format was arrived at, but it is, paradoxically, the non-arbitrary nature of technological development that gave rise to alternatives to the landscape mode. One such alternative arose during the development of stereoscopic (3D) cameras. An early problem was that the desire to produce separate left and right eye views for stereoscopic footage resulted in the need for two cameras and two projectors in order to film and display footage. The added expense and difficulty in synchronising footage meant that it was desirable to develop cameras that could simultaneously record the left and right eye image on one filmstrip. The result was cameras such as the 16mm Bolex camera, which split the horizontal frame in two and recorded left and right eye images on each half of the frame. When played back, the projector was focused using a projection lens with polarising filters in order to overlay the two halves of the frame into a singular, vertical, stereoscopic image on a vertical screen. Unsurprisingly, the need for a vertical screen at a time when standard screens were horizontal meant that this technology reached an end point in amateur users while professional 3D filmmakers developed systems that would work in a traditional landscape mode (Hayes 1989; Zone 2007). Nonetheless, important stereoscopic documentation of the twentieth century did take place in a vertical mode, such as the Co-operative Wholesale Society’s documentary on 1950s Britain in Co-op 3-D Film (1952).[10]

More recently, advertising displays in transport facilities such as underground stations and airports have begun displaying moving-image content on vertical screens. In the London Underground, for example, the numerous escalators carrying passengers from platforms to exits, and vice versa, have vertical screens placed on their walls. Similarly, in Glasgow airport there are a number of vertical screens next to the arrivals and departures screens that show moving-image advertisements in a vertical frame. The architecture of these spaces means that horizontal screens would be inappropriate and so advertisers have developed vertical moving-image content (often in the form of short animations or brief live action scenes, for example “Clarins Vertical LED TV”). In each of the aforementioned examples of vertical filmmaking and display, there is nothing arbitrary about the decision to place or view moving images in a portrait format. Instead, they remain tentative examples of what new screen cultures might be able to offer in terms of aesthetic experimentation. While fixed screens will always require that content is created in order to match its intended display, the greatest change in twenty-first century screen culture is the portability and malleability of screens on offer. At this time tablets, mobile phones and other portable screens are limited to rectangular frames, but the ability to turn this frame into either portrait or landscape mode means that moving-image content can be configured depending on which framing possibility best suits the subject being filmed.[11]

Figure 3. Departure and advertisement screens at Glasgow Airport

Attention to how vertical framing suits the ergonomic conditions of the filming device and the subject matter means that vertical videos can be understood more as a form of vernacular creativity than an aesthetic error in filmmaking modes. Examples of vertical videos that have been posted on YouTube and Facebook often suggest a hasty reach for the camera and an opportunistic filming decision (adding to the sense that they reproduce unmediated, ‘authentic’ moments in the filmmaker’s life). This accounts, in part, for how the embodied hold on a mobile device translates into the framing mode. However, this practice does not mean that the filmmaker is inconsiderate of the visual field that the vertical mode will capture. Instead, certain subjects encourage a vertical mode. For example, when a single human is the focus of the video, they are often framed to take up the central sections of the screen, as is the case of the Irish dancer performing on top of a train’s table in “Incredible Set Dancing & Trad Session on Dublin Train to Galway” or the man jumping on a trampoline in “Epic trampoline flip FAIL dog attack.” A type of portrait framing for subjects directly addressing the camera is also seen. This is evident in the New Zealand Red Cross’ website where, amongst others, Olympic rower Mahé Drysdale speaks about how to deal with stressful situations in a vertical video aimed at New Zealand youths. When the architecture of space means that a vertical corridor of action predominates, portrait framing is also used. For example, the interiors of trains, subway cars and other carriages or building corridors and stairways are frequently framed in a portrait format. This can be seen in the interior of a light rail car during “Light Rail Bushido Blade! It’s all fun and games until someone pulls out a sword” and during a shot of dogs descending a staircase in “Puppy teaching Puppy to go down stairs! SO cute!—ORIGINAL VIDEO!.” Unprofessional videos such as the series of students performing trapeze moves on the Aerial Edge Facebook page unashamedly use a portrait format in order to provide as much detail of the moving bodies as possible in videos that are both celebratory of the students’ skills and informative for those trying to see how the moves are undertaken.

Finally, a number of smart phone applications, such as Talking Tom Cat, present characters in a portrait format that users can animate and share on sites such as YouTube, resulting in numerous vertical videos such as “Talking Ben and Talking Tom,” which has had over 9 million views. In each case, the vertical mode frames events in ways that suit the subject matter.

Figure 6. From left to right: “Incredible Set Dancing & Trad Session on Dublin Train to Galway,” “Epic trampoline flip FAIL dog attack,” “Puppy teaching Puppy to go down stairs! SO cute!—ORIGINAL VIDEO!” and “Talking Ben and Talking Tom”


The extent to which this mode, and its revelation of a new type of digital vernacular, will be accepted is dependent upon the mechanisms of highly scrutinised exhibition environments. Although many mobile phone videos are made to be viewed only by the filmmaker or distributed only to personal contacts, the new ‘sharing’ features on most smart phones mean that filmmakers are encouraged to distribute videos immediately via Internet platforms. By returning to Jakub Macek’s point that the visibility and connectivity of peers mediates the production of content in online spaces, it is possible to recognise the ways in which a negotiation of vertical media’s newfound place is occurring through and beyond that of the Vertical Video PSA videos. Not only are the PSAs highly visible on YouTube, they also reflect and contribute to comments that spread across a range of Internet forums, websites and social media sites. In the first instance, commentators often repeat the main claims in the “Vertical Video Syndrome” video: vertical videos happen when you hold your camera the wrong way / while you can turn a picture, you can’t really turn a video / motion pictures have always been horizontal / televisions are horizontal / computer screens are horizontal / people’s eyes are horizontal. In the second instance, they frequently provide links to the “Vertical Video Syndrome” video (which has outpaced the other PSA videos in terms of popularity), further increasing the visibility of the claims in the original video. Examples of this taking place can be seen on websites such as Provideocontent, where a post reiterates that “the screens we watch video on are horizontal” followed by the Vertical Video Syndrome video in an embedded link. Similarly, a search of Twitter on 30 October 2013 found more than a dozen tweets, within a 24-hour time period, in different languages, posting links to the Vertical Video Syndrome video with comments such as “protect yourself! Keep yourself from shooting vertical,” “shooting vertical video on a smartphone? You’re doing it wrong” and “a very serious problem: Vertical Video Syndrome.” When one user, Mike Griffiths, started a thread on Twitter saying “there should be a global campaign to get people shooting video on a smartphone to hold it in landscape” the reply was “why don’t the cool phone maker peeps just do a pop-up alert telling you ‘Turn it round, turn it round’.” Subsequent tweets noted that YouTube and Google’s capture applications have functions that already do so. Further addressing this issue, an app for the Apple Store called Horizon was developed that would similarly discourage users from filming in a vertical mode (Liszewski 2014).

This type of technological reminder can be considered in light of Michel de Certeau’s (1988) description of strategies and tactics, whereby institutions and figures of authority put in place strategies for the correct use of consumer products while users often perform tactics that negotiate and change this intended use. The introduction of software that conditions how users may film content on mobile cameras suggests a reiteration of strategy in the face of vertical video tactics. However, there is not merely a simple division between those who control products and everyday users. Instead, there is a complex interplay at work between technology manufacturers, everyday users, and their peer networks. Following Pierre Bourdieu, Jakub Macek notes that

through participation, we establish our common interest in shared content and so we ensure that our cultural capital (and thus our values, preferences, tastes and opinions) and that of those included in our social circles are compatible, that we are surrounded by ‘proper people’ with ‘proper interests’ and that our textual interests and pleasures are consistent with the rest of our habitus. (2013, 298)

In the vertical video context, filmmakers are often operating within peer networks that are attuned to performing correct consumer operation of filming equipment that has been embedded in and reinforced by traditional media. Yet a tension emerges between the cultural habitus in which users are expected to conform to standard horizontal norms and a parallel, embodied technological habitus which encourages users to hold their phone in the vertical position that non-video applications and other content encourage.[12] Although these different habitus represent the meeting of contradictory strategies (media institutions indicating that horizontal framing is correct, hardware manufacturing that encourages a vertical display), their incongruence is overlooked in discussions of vertical video. Instead of recognising the way users are tactically engaging with media technology in new ways, the assumption in the Vertical Video PSAs and related comments on social network sites is that filmmakers shooting in a vertical mode do so because they are amateur users who do not know better and/or do not have the skills and training to conform to professional norms. Paradoxically, by distancing vertical videos as amateur, commentators reiterate the likely authenticity of vertical videos as unmediated documentation. In contexts where authenticity is favoured, such as news sites and certain realms of social media, vertical videos are thus given additional value.

The vertical videos discussed thus far include many or all of the traits discussed in relation to the “Fertilizer Plant” video and in this way demonstrate amateur practice. At the same time, their successful circulation (23.3 million views for the “Fertilizer Plant” video at the time of writing) means that they are often able to gain more exposure and recognition than the wide variety of professional filmmaking practitioners who put portions of their work on YouTube. It is not surprising, then, that tensions arise when these two groups operate within a shared exhibition platform. Significant to this context is the extent to which blogs, posts, tweets and videos calling for an end to vertical video are often from non-professional filmmakers or otherwise liminal media practitioners. With regard to this type of behaviour amongst filmmakers on YouTube, Eggo Müller notes that similar processes take place when amateur and professional filmmakers interact through videos and forums dedicated to ‘upskilling’ the average YouTube user. He notes that a quality discourse prevails whereby relatively conservative adherence to traditional filmmaking norms is upheld, but he also states that it is impossible to delineate a clear boundary between amateurs and professionals within this process. Instead,

users with different backgrounds and interests in YouTube contribute to and maintain this quality discourse. Full, semi-, pre- and post-professionals use YouTube to share and promote their knowledge, and dabblers, novices and amateurs contribute to the same discourse through their questions and comments. As opposed to the era of mass media—with producers on the one side and consumers on the other – there is a diverse field of positions in the space of participation YouTube creates. (2009, 136)

In this context, the efforts to police a portrait framing practice become a means for some users to distinguish themselves as knowledgeable and attuned to the importance of visual aesthetics at a time when sites such as YouTube offer a type of anarchic free-for-all. Thus, conformity to a perceived set of professional norms is “exposed, articulated and reproduced in a performative interaction” (Macek 2013, 299).

However, for every desire to be valued by and incorporated into a social milieu there are often antithetical desires to be visibly unique (Macek 2013, 299). This means that, aside from advertising professionals who have been contracted to produce vertical moving-images for billboard display, there is a small but growing number of media practitioners advocating a portrait mode in order to produce distinct aesthetic effects. As will be discussed, their work is supported by digital viewing platforms, but there are antecedents for their work in experimental filmmaking practice such as Paolo Gioli’s vertical Film Stenopeico (1973/81/89), Commutazione con mutazione (1969) and L’operatore perforato (1979) (Bordwell 2009) and Bill Viola’s “The Messenger” and “The Crossing” (Young 1997).[13] Building on the challenges these experimental films pose to an understanding of traditional compositional strategies, some groups of digitally oriented filmmakers are using the exhibition platform Vimeo to showcase experiments in vertical framing.[14] Started in 2010, the main focal point for this activity is the Tallscreen group, which provides a space to upload short vertical films. Although not directly stated, it is implied that there is a concern with distinguishing these works from the amateur efforts being produced elsewhere and to this end there are calls to use High Definition SLR cameras or equivalents in order to “avoid the non aesthetic Jello Effect.”

Elsewhere on Vimeo, artist Gregory Gutenko has uploaded versions of his work such as “Rail Poem” (2012) and “Orientation Video” (2012) that have been exhibited in a portrait aspect ratio in gallery space. The former focuses on motion through landscapes, often with an emphasis on movement that follows the narrow borders of railway tracks. Careful editing and manipulation of images means that various parts of what the camera has captured are morphed together and laid over one another within individual shots. The latter uses an upside down camera to produce “upside down” images that then frame media devices such as a television displaying their own images “upside down” which in turn appear the correct way up to viewers. In each case, portrait formatting concentrates attention on movement and objects in the screen space that would no longer be central in the same way were the frame to be widened. In a similar manner, Christoph A. Geiseler’s documentary Curry Power (2012) experiments with vertical framing. Like the Vimeo Tallscreen group, Geiseler narrates his reasons for using the portrait format: “Musicians on a stage, runway models, train-tracks disappearing into the distance, close-up portraits, skyscrapers and trees beg for a vertical video to capture their inherent beauty: the essence of their form, flow and function is vertical.” Like the filmmakers mentioned before, the attention given to his practice sets his work apart from the vertical videos that are deemed and contextualised as amateur content. In this way, discourses of authenticity are less apparent. Bridging the artistic and unmediated documentation contexts for vertical filming, however, is a film by musician Wayne Coyne of the Flaming Lips. In 2013 he produced A Year in the Life of Wayne’s Phone, a film that premiered at the SXSW festival in March of that year. Filmed on his iPhone, A Year in the Life of Wayne’s Phone took vertical moving-images shot by Coyne and displayed them three at a time, side by side on a horizontal screen. Noted as a somewhat jumbled assemblage of footage from Coyne’s personal and professional life, the film captured the unruly nature of unplanned and hastily thrown together YouTube videos while simultaneously elevating them to a status worthy of a film festival audience (even though reviews of the festival screening were not always complimentary [Miller 2013; Saldana 2013]).

Heaven (2013)

In my own experiments with colleagues at Victoria University of Wellington, I have been able to contribute to short films that take into consideration these vertical concerns as well as how our films can be exhibited. In our first short film, “Heaven” (2013), we wanted to make use of new DIY media technologies and so we shot the film on an HTC mobile phone and in a portrait format. This process presented its own creative possibilities that enriched our sense of aesthetic experience.

Figure 7: Poster for Christoph A. Geiseler’s Curry Power (2012)


We often found ourselves visualising sequences in a horizontal format as we were accustomed to working within that framing but then had to recalibrate our mental images as we composed individual shots. However, we found that the focus on human subjects within our script (essentially a story of two idiosyncratic male characters portrayed over a week’s timeframe in Wellington, New Zealand) was supported by vertical framing and we were able to utilise space in ways that we hadn’t previously conceived of. For example, the opening shot is of a plane flying overhead on its way to land at Wellington airport. The vertical framing allows the underbelly of the plane to take up the majority of screen space, evoking the visceral and dizzying feeling that occurs when one watches a plane fly overhead. At various points in the film a two-shot is constructed between the male characters. They do not speak to one another and so it was useful to frame their bodies vertically in a way that emphasised the subtle interactions between them that take place through whole body positioning. While the film could have been shot with horizontal framing and the essence of the story would have remained, the details of our composition and the way in which this framed the relationship between the characters would have been altered. Knowing that the eventual distribution for the film would take place on YouTube and the film would ideally be viewed on portable vertical screens such as smart phones and tablets, we also took into consideration Andreas Treske’s questions concerning developing visual work for small screens: “how does the development of smaller screens and online video influence how we compose and create images? How is the reproduction of images influenced by its assumed viewing environment? How is it related to the viewing situations of its audience if these are not the cinema theatre or the television set with its attached couch?” (2008, 215). For this reason we attempted to compose bold dynamic shots that suited vertical framing for small screen viewing.

Figure 8: “Heaven” (2013)


While the aforementioned vertical works have mainly used live action footage, there have also been experiments with other media. The Alicewinks project created by David Neal originated in a desire to animate the various illustrations of Lewis Carroll’s Alice’s Adventures in Wonderland (1865) that appeared in different publications of the text around the beginning of the twentieth century. Because these illustrations were mainly in a portrait format, the animation was conceived of in this way and Alicewinks follows a variety of different visual versions of Alice as she moves through the narrative of the original book, all within a vertical framing. The story of the work’s journey from production to exhibition is illustrative of the changing landscape of screen technologies and the way in which traditional formats retain a hold on the way new media works can be conceived and distributed. Due to its feature film length, Neal initially approached Apple’s iTunes in order to distribute it through their Movie store but because of its vertical format Apple responded that it could not be distributed as a movie but might be better distributed through their App Store. The App Store stated that the piece was not sufficiently interactive and suggested returning to the Movie Store. Unable to resolve the issue with either of these stores, Neal eventually found a place for Alicewinks’ distribution in Apple’s iBookstore. For many viewers, its placement here means that it will initially be perceived as a book, yet its ability to provide vertical moving images on tablets and e-books means that its audiences will be exposed to portrait format moving images.

In 2008, Lev Manovich asked: “given that the significant percentage of user-generated content either follows the templates and conventions set up by professional entertainment industry, or directly re-uses professionally produced content … does this mean that people’s identities and imagination are now even more firmly colonized by commercial media than in the twentieth century?” (36; see also van Dijck 2009). On the one hand, the continued policing of vertical videos and the desires to prove that there are correct, inflexible ways of framing content confirm the extent to which media traditions retain a stronghold on aesthetic practice. On the other hand, the proliferation of user-filmed vertical videos on YouTube and social networking sites suggests vertical framing may find its place as part of a new digital vernacular. Furthermore, the small but steady uptake of digital vertical content in animation, advertising billboards, gallery works and short films on Vimeo suggests that new screen environments are creating new aesthetic practices. Manovich goes on to draw upon Michel de Certeau to suggest

a city’s layout, signage, driving and parking rules and maps are strategies created by governmental and corporate interests. The ways an individual is moving through the city, taking shortcuts, wandering aimlessly, navigating through favourite routes and adopting others are tactics. In other words, an individual can’t physically reorganize the city but she can adopt itself to her needs by choosing how she moves through it. (2008, 37)

I would suggest that, within this analogy, vertical framing is more akin to squatting: it is the occupation of a space and a way of undertaking habitation that is related to the normal use of this space but in new, unauthorised ways.

As this article has sought to prove, there is no inherent technological determinant that means moving-images must be displayed in a horizontal format, yet there are technological concerns that make portrait and landscape modes more or less advantageous depending, firstly, on the screen environment in which they are to be exhibited, and secondly, on the type of content that filmmakers wish to portray. To return to the “Fertilizer Plant” video, its vertical framing acts as a signifier of authenticity which is useful in its claim to represent a real-time, non-manipulated event. Other videos are able to use the same framing in order to tactically present their own version of events. This mode is, of course, open to be co-opted by commercial media in order to simulate a visual signifier of legitimate amateur content (in much the same way that the shaky camera and camcorder-style white-balance were co-opted by The Blair Witch Project [Daniel Myrick and Eduardo Sánchez, 1999] and Cloverfield [Matt Reeves, 2008]). This potential is exemplified by comedian Ricky Gervais, who produces popular comedy shows (mainly for television) in traditional horizontal framing. At the same time, he maintains a Facebook page which frequently hosts vertical videos filmed in his home space such as a video of him in his bath or a video of his cat. The vertical framing suggests these are unmediated moments in his domestic life but their playful nature also suggests that Gervais is using them to blur the boundary between his real-life and on-screen persona in a way that is typical of his comedy work. In this case, there is a highly media-literate public personality drawing upon an amateur framing technique in order to add value to the videos that he produces.

In the contexts discussed in this article, the diversity in filming possibilities has not really changed, as it has always been possible for a filmmaker to turn a camera on its side for dramatic effect and the vertical mode is already appearing amongst traditional media when, for example, television news programs insert vertical footage of an event in a way that suggests everyday persons were present to witness it. What has changed is the range of exhibition possibilities. In the first instance, readily available digital projectors and large digital monitors make it possible for art gallery films and advertising content to be exhibited as vertical images. In the second instance, hand-held screening devices make the possibility of filming vertically ubiquitous. Detractors of a vertical format are right to note that vertical footage will normally default to a small image within a larger screen when displayed on horizontal televisions and computer monitors but these thoughts fail to foresee the present and the future of new screen technologies in which “a variety of screens—long and wide and square, large and small, flat and fat, composed of grains composed of pixels, lit by projected light, cathode-ray tube, plasma, LCD—all compete for our attention without any convincing arguments about hegemony” (Friedberg 2009, 7).[15] In this environment there will likely be the continuation of amateurs producing vertical works as a type of vernacular practice while there are also open possibilities for media professionals to engage with this framing mode in new and dynamic ways.



Andén-Papadopoulos, Kari. 2013. “Media Witnessing and the ‘crowd-Sourced Video Revolution’.” Visual Communication 12 (3): 341-357.

Arthur, Paul. 1993. “Jargons of authenticity (three American moments).” In Theorizing Documentary, edited by Michael Renov, 108-134. London and New York: Routledge.

Belton, John. 1992. Widescreen Cinema. London: Harvard University Press.

Bordwell, David. 2009. “Paolo Gioli’s Vertical Cinema.” August. Accessed November 26, 2013.

Brooker, Will. 2009. “Camera-Eye, CG-Eye: Videogames and the ‘Cinematic’.” Cinema Journal 48 (3): 122-128.

Burgess, Jean, and Joshua Green. 2009a. “The Entrepreneurial Vlogger: Participatory Culture Beyond the Professional-Amateur Divide.” In The Youtube Reader, edited by Pelle Snickars and Patrick Vonderau, 89-107. Stockholm: National Library of Sweden.

Burgess, Jean, and Joshua Green. 2009b. YouTube: Online Video and Participatory Culture. Cambridge: Polity.

Casetti, Francesco. 2011. “Back to the Motherland: The Film Theatre in the Postmedia Age.” Screen 52 (1): 1-12.

Crary, Jonathan. 1992. Techniques of the Observer: On Vision and Modernity in the Nineteenth Century. Cambridge: MIT Press.

Cubitt, Sean. 2008. “Codecs and Capability.” In Video Vortex Reader: Responses to Youtube, edited by Geert Lovink and Sabine Niederer, 45-52. Amsterdam: Institute of Network Cultures.

de Certeau, Michel. 1988. The Practice of Everyday Life. Berkeley: University of California Press.

Dovey, Jon. 2000. Freakshow: First Person Media and Factual Television. London: Pluto Press.

Friedberg, Anne. 2009. The Virtual Window: From Alberti to Microsoft. London: MIT Press.

Griffiths, Mike. 2013. “There Should Be a Global Campaign to Get People Shooting Video on a Smartphone to Hold It in Landscape.” @mrmikegriffiths, May 15. Archived November 12, 2013.

Hayes, R. M. 1989. 3-D Movies: A History and Filmography of Stereoscopic Cinema. London: St James Press.

Hjorth, Larissa. 2006. “Being Mobile: In Between the Real and Reel.” Paper presented at Asian Pop and Mobile Cultures Conference, Gwangju, October 28-29, 1-15. Accessed May 9, 2014.

Jones, Nick. 2013. “Quantification and Substitution: The Abstract Space of Virtual Cinematography.” Animation 8 (3): 253-266.

Lipton, Lenny. 1982. Foundations of the Stereoscopic Cinema. New York: Van Nostrand Reinhold.

Liszewski, Andrew. 2014. “Brilliant, Overdue App Forces Your Phone to Take Horizontal Videos.” Gizmodo, January 15. Accessed May 9, 2014.

Macek, Jakub. 2013. “More than a Desire for Text: Online Participation and the Social Curation of Content.” Convergence: The International Journal of Research into New Media Technologies 19 (3): 295-302.

Maheshwari, Laya. 2014. “Angular Visions: Vertical Cinema at Rotterdam.” Filmmaker Magazine, January 28. Accessed May 9, 2014.

Manovich, Lev. 2001. The Language of New Media. London: MIT Press.

Manovich, Lev. 2008. “The Practice of Everyday (Media) Life.” In Video Vortex Reader: Responses to Youtube, edited by Geert Lovink and Sabine Niederer, 33-44. Amsterdam: Institute of Network Cultures.

Miller, Jeff. 2013. “SXSW: Film About ‘A Year in the Life’ of Wayne Coyne’s Phone Is a Dizzying Spectacle.” The Hollywood Reporter, March 14. Accessed November 27, 2013.

Müller, Eggo. 2009. “Where Quality Matters: Discourses on the Art of Making a YouTube Video.” In The Youtube Reader, edited by Pelle Snickars and Patrick Vonderau, 126-140. Stockholm: National Library of Sweden.

Pantti, Mervi, and Piet Bakker. 2009. “Misfortunes, Memories and Sunsets: Non-Professional Images in Dutch News Media.” International Journal of Cultural Studies 12 (5): 471-489.

Pietrobruno, Sheenagh. 2011. “The Stereoscope and the Miniature.” Early Popular Visual Culture 9 (3): 171-190.

Purse, Lisa. 2013. Digital Imaging in Popular Cinema. Edinburgh: Edinburgh University Press.

Saldana, Mark. 2013. “2013 SXSW Review: A Year in the Life of Wayne’s Phone.” March 26. Accessed May 9, 2014.

Schleser, Max. 2012. “Collaborative Mobile Phone Filmmaking.” In Handbook of Participatory Video, edited by Elizabeth-Jane Milne, Claudia Mitchell and Naydene De Lange, 397-411. Lanham, MD: AltaMira Press.

Silverman, Robert J. 1993. “The Stereoscope and Photographic Depiction in the 19th Century.” Technology and Culture 34 (4): 729-756.

Tolson, Andrew. 2010. “A New Authenticity? Communicative Practices on YouTube.” Critical Discourse Studies 7 (4): 277-289.

Treske, Andreas. 2008. “Detailing and Pointing.” In Video Vortex Reader: Responses to Youtube, edited by Geert Lovink and Sabine Niederer, 215-221. Amsterdam: Institute of Network Cultures.

van Dijck, José. 2009. “Users like You? Theorizing Agency in User-Generated Content.” Media, Culture & Society 31 (1): 41-58.

Verhoeff, Nanna. 2012. Mobile Screens: The Visual Regime of Navigation. Amsterdam: Amsterdam University Press.

Wilson, Gavin. 2012. “A Phenomenology of Reciprocal Sensation in the Moving Body Experience of Mobile Phone Films.” Cinema: Journal of Philosophy and the Moving Image 3: 62-83.

Young, Lisa Jaye. 1997. “The Elemental Sublime.” Performing Arts Journal 19 (3): 65-71.

Zone, Ray. 2007. Stereoscopic Cinema and the Origins of 3-D Film. Lexington: University Press of Kentucky.



[1] For a related discussion of the way news sites are using user-filmed videos as a type of ‘media witnessing based on an aesthetics of authenticity’ see Andén-Papadopoulos (2013).

[2] For a discussion of the way the unprofessional was associated with authenticity in factual documentary video footage see Dovey (2000) and in photojournalism see Pantti and Bakker (2009).

[3] This article takes on and moves forward initial discussion of this concept outlined in Ross, Miriam and Maddy Glen. “Vertical Cinema: New Digital Possibilities.” Rhizomes (forthcoming).

[4] There is a paradoxical public imagining of YouTube’s operation whereby YouTube seems to simultaneously act as a transparent exhibition platform for professionally produced content (music videos, trailers, old movies, high quality promotional videos) and a creative machine for engendering home-video style user videos and their proliferation (Burgess and Green 2009b).

[5] For a discussion of the way YouTube content creators use ordinariness as a trope for suggesting authenticity in vlogs see Tolson (2010).

[6] Although the YouTube page that hosts the video now has a description of who the filmmaker was and a note that says he and the child that was in the car are okay, initially no such description was added and the video was just one of the many videos uploaded to user zidyboby’s prolific channel.

[7] For a discussion of the way previous media texts, particularly 1960s Direct Cinema documentaries, have used similar techniques to suggest unmediated and authentic representation see Arthur (1993).

[8] For example, the vertically framed YouTube video “Seeing her for the first time again,” purportedly showing a man awakening from an operation and not recognizing his wife, was widely believed to be faked.

[9] Sites such as Tom Phillip’s features images and videos that have been circulated by traditional news sites and social media sites as purportedly authentic representations of a particular moment or event even as their veracity has been questioned.

[10] Artist Zoe Beloff has used this technology in recent years to make the critically acclaimed Shadow Land or Light from the Other Side (2000) and Charming Augustine (2005), both of which are screened on a portable, vertical screen maintained by Beloff.

[11] It is worth noting that new media applications such as Vine and Instagram produce moving image content in a square format.

[12] The tablet offers an interesting case study in this regard as computer-similar tablets such as the iPad are often seen used in a landscape mode whereas book-similar tablets such as the Kindle Fire are often seen used in a portrait format. In both cases, the tablet will be rotated depending on both the functions it is performing (electronic books encourage portrait display, audio-visual content encourages landscape display) and user preferences and habits.

[13] A more recent example would be the vertical films screened at the International Film Festival Rotterdam: films that, while made with digital technologies, were printed on 35mm film (Maheshwari 2014).

[14] I would suggest that Vimeo and YouTube have complex complementary and competitive relationships whereby Vimeo has emerged as a space for ‘artistic’ filmmakers who place emphasis on cinematic style whereas YouTube simultaneously appeals to popular, mass audiences and extremely niche interest groups that often have little concern with aesthetic tendencies.

[15] A playful installation, DVD Dead Drop vol.6: “Vertical Video”, by Aram Bartholl was commissioned by MOMA New York to provide audiences with a DVD of amateur videos captured in a portrait format along with instructions for adjusting a home theatre or other viewing environment to properly experience the works.


Bio: Dr Miriam Ross is Lecturer in the Film Programme at Victoria University of Wellington. She is the author of South American Cinematic Culture: Policy, Production, Distribution and Exhibition (2010), as well as publications on film industries, stereoscopic media, film festivals and new digital technologies.