Read, Watch, Listen: A commentary on eye tracking and moving images – Tim J. Smith


Eye tracking is a research tool that has great potential for advancing our understanding of how we watch movies. Questions such as how differences in the movie influence where we look, and how individual differences between viewers alter what we see, can be operationalised and empirically tested using a variety of eye tracking measures. This special issue collects together an inspiring interdisciplinary range of opinions on what eye tracking can (and cannot) bring to film and television studies and practice. In this article I will reflect on each of these contributions with specific focus on three aspects: how subtitling and digital effects can reinvigorate visual attention, how audio can guide and alter our visual experience of film, and how methodological, theoretical and statistical considerations are paramount when trying to derive conclusions from eye tracking data.



I have been obsessed with how people watch movies since I was a child. All you have to do is turn and look at an audience member's face at the movies, or at home in front of the TV, to see the power the medium holds over them. We sit enraptured, transfixed and immersed in the sensory patterns of light and sound projected back at us from the screen. As our physical activity diminishes, our mental activity takes over. We piece together minimal audiovisual cues to perceive rich otherworldly spaces, believable characters and complex narratives that engage us mentally and move us emotionally. As I progressed through my education in Cognitive Science and Psychology, I was struck by how little science understood about cinema and the mechanisms filmmakers used to create this powerful experience.[i] Reading the film literature, listening to filmmakers discuss their work and excavating gems of their craft knowledge, I started to realise that film was a medium ripe for psychological investigation. The empirical study of film would further our understanding of how films work and how we experience them, but it would also serve as a test bed for investigating complex aspects of real-world cognition that were often considered beyond the realms of experimentation. As I (Smith, Levin & Cutting, 2010) and others (Anderson, 2006) have argued elsewhere, film evolved to "piggy-back" on normal cognitive development, using basic cognitive tendencies such as attentional preferences, theory of mind, empathy and the narrative structuring of memory to make the perception of film as enjoyable and effortless as possible. By investigating film cognition we can, in turn, advance our understanding of general cognition. But to do so we need to step outside of traditional disciplinary boundaries concerning the study of film and approach the topic from an interdisciplinary perspective. This special issue represents a highly commendable attempt to do just that.

By bringing together psychologists, film theorists, philosophers, vision scientists, neuroscientists and screenwriters, this special issue (and the Melbourne research group to which most contributors belong) provides a unique perspective on film viewing. The authors included in this special issue share my passion for understanding the relationship between viewers and film, but this interest manifests in very different ways depending on their perspectives (see Redmond, Sita and Vincs, this issue, for a personal journey into eye tracking similar to that presented above). By focussing on viewer eye movements, the articles in this special issue offer readers from a range of disciplines a way into the eye tracking investigation of film viewing. Eye tracking (as comprehensively introduced and discussed by Dyer and Pink, this issue) is a powerful tool for quantifying a viewer's experience of a film, comparing viewing behaviour across different viewing conditions and groups, and testing hypotheses about how certain cinematic techniques affect where we look. But, as several of the authors in this special issue rightly highlight, eye tracking is not a panacea for all questions about film spectatorship.

Like all experimental techniques, it can only measure a limited range of psychological states and behaviours, and the data it produces does not say anything in and of itself. Data requires interpretation. Interpretation can take many forms[ii] but if conclusions are to be drawn about how the data relates to the psychological states of the viewer, this interpretation must be based on theories of psychology and ideally confirmed using secondary/supporting measures. For example, the affective experience of a movie is a critical aspect which cognitive approaches to film are often wrongly accused of ignoring. Although cognitive approaches to film often focus on how we comprehend narratives (Magliano and Zacks, 2011), attend to the image (Smith, 2013) or follow formal patterns within a film (Cutting, DeLong and Nothelfer, 2010), several cognitivists have focussed in depth on emotional aspects (see the work of Carl Plantinga, Torben Grodal or Murray Smith). Eye tracking is the perfect tool for investigating the impact of immediate audiovisual information on visual attention, but it is less suitable for measuring viewer affect. Psychophysiological measures such as heart rate and skin conductance, neuroimaging methods such as fMRI or EEG, or even self-report ratings may be better for capturing a viewer's emotional responses to a film, as has been demonstrated by several research teams (Suckfull, 2000; Raz et al, 2014). Unless the emotional state of the viewer changed where they looked or how quickly they moved their eyes, the eye tracker may not detect any differences between two viewers with different emotional states.[iii]

As such, a researcher interested in studying the emotional impact of a film should either choose a different measurement technique or combine eye tracking with another, more suitable technique (Dyer and Pink, this issue). This does not mean that eye tracking is unsuitable for studying the cinematic experience. It simply means that you should always choose the right tool for the job, and often this means combining multiple tools that are strong in different ways. As Murray Smith (the current President of the Society for Cognitive Studies of the Moving Image; SCSMI) has argued, a fully rounded investigation of the cinematic experience requires "triangulation" through the combination of multiple perspectives including psychological, neuroscientific and phenomenological/philosophical theory and methods (Smith, 2011) – an approach taken proudly across this special issue.

For the remainder of my commentary I would like to focus on the themes that struck me as most personally relevant and interesting when reading the other articles in this special issue. This is by no means an exhaustive list of the themes raised by the other articles, nor a judgement on the relative importance of the themes I selected. There are many other interesting observations made in the articles that I do not discuss below, but given my perspective as a cognitive scientist and my current interests I decided to focus my commentary on these specific themes rather than attempt a comprehensive review of the special issue or tackle topics I am unqualified to comment on. I also wanted to take the opportunity to dispel some common misconceptions about eye tracking (see the section 'Listening to the Data') and empirical methods in general.

Reading an image

One area of film cognition that has received considerable empirical investigation is subtitling. As Kruger, Szarkowska and Krejtz (this issue) comprehensively review, eye tracking is the perfect tool for investigating how we watch subtitled films. The presentation of subtitles divides the film viewing experience into a dual task: reading and watching. Given that the medium was originally designed to communicate critical information through two channels, the image and the soundtrack, introducing text as a third channel of communication places extra demands on the viewer's visual system. However, for most competent readers, serially shifting attention between these two tasks does not lead to difficulties in comprehension (Kruger, Szarkowska and Krejtz, this issue). Immediately following the presentation of the subtitles, gaze shifts to the beginning of the text, saccades across it and returns to the centre of interest within a couple of seconds. Gaze heatmaps comparing the same scenes with and without subtitles (Kruger, Szarkowska and Krejtz, this issue; Fig. 3) show that the areas of the image fixated are very similar (ignoring the area of the screen occupied by the subtitles themselves); rather than distracting from the visual content, the presence of subtitles seems to condense gaze behaviour on the areas of central interest in an image, e.g. faces and the centre of the frame. This illustrates the redundancy of much of the visual information presented in films and the fact that, under non-subtitle conditions, viewers rarely explore the periphery of the image (Smith, 2013).

My colleague Anna Vilaró and I recently demonstrated this similarity in an eye tracking study in which the gaze behaviour of viewers was compared across four versions of an animated film, Disney's Bolt (Howard & Williams, 2008): the original English audio, a Spanish language version with English subtitles, an English language version with Spanish subtitles, and a Spanish language version without subtitles (Vilaró & Smith, 2011). Given that our participants were English speakers who did not know Spanish, these conditions allowed us to investigate both where they looked under the different audio and subtitle conditions and what they comprehended. Using cued recall tests of memory for verbal and visual content, we found no significant differences in recall for either type of content across the viewing conditions, except for verbal recall in the Spanish-only condition (not surprisingly, given that our English participants could not understand the Spanish dialogue). Analysis of the gaze behaviour showed clear evidence of subtitle reading, even in the Spanish subtitle condition (see Figure 1), but no differences in the degree to which peripheral objects were explored. This indicates that even when participants watch film sequences without subtitles and know that their memory for the visual content will be tested, their gaze remains focussed on the central features of a traditionally composed film. This supports arguments for subtitling movies over dubbing: whilst subtitles place greater demands on viewer gaze and heighten cognitive load, there is no evidence that they lead to poorer comprehension.

Figure 1: Figure from Vilaró & Smith (2011) showing the gaze behaviour of multiple viewers directed to own language subtitles (A) and foreign language/uninterpretable subtitles (B).


The high degree of attentional synchrony (Smith and Mital, 2013) observed in the above experiment, and during most film sequences, indicates that the visual features of the image and the areas of semantic significance (e.g. social information and objects relevant to the narrative) tend to coincide in the same part of the image (Mital, Smith, Hill and Henderson, 2011). Only when areas of the image are placed in conflict through image composition (e.g. depth of field, lighting, colour or motion contrast) or staging (e.g. multiple actors) does attentional synchrony break down and viewer gaze divide between multiple locations. Such shots are relatively rare in mainstream Hollywood cinema or TV (Salt, 2009; Smith, 2013) and, when used, the depicted action tends to be highly choreographed so that attention shifts between the multiple centres of interest in a predictable fashion (Smith, 2012). If such choreographing of action is not used, the viewer can quickly exhaust the information in the image and start craving either new action or a cut to a new shot.

Hochberg and Brooks (1978) referred to this as the visual momentum of the image: the pace at which visual information is acquired. This momentum is directly observable in the saccadic behaviour during an image's presentation, with frequent short-duration fixations interspersed by large-amplitude saccades at the beginning of a scene's presentation (known as the ambient phase of viewing; Velichkovsky, Dornhoefer, Pannasch and Unema, 2000) and less frequent, longer-duration fixations separated by smaller-amplitude saccades as the presentation duration increases (known as the focal phase of viewing; Velichkovsky et al., 2000). I have recently demonstrated the same pattern of fixations during the viewing of dynamic scenes (Smith and Mital, 2013) and shown how this pattern gives rise to more central fixations at shot onset, and to greater exploration of the image and decreased attentional synchrony as the shot duration increases (Mital, Smith, Hill and Henderson, 2011). Interestingly, the introduction of subtitles to a movie may have the unintended consequence of sustaining visual momentum throughout a shot. The viewer is less likely to exhaust the information in the image because their eyes are busy saccading across the text to acquire the information that would otherwise be presented in parallel to the image via the soundtrack. This increased saccadic activity may increase the cognitive load experienced by viewers of subtitled films and change their affective experience, producing greater arousal and an increased sense of pace.
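The ambient/focal distinction described above can be sketched in code. The following is a hypothetical illustration only, not code from any of the cited studies: the 180ms duration threshold and 5-degree amplitude threshold are assumptions chosen for demonstration, as the exact criteria vary across the literature.

```python
# Sketch: labelling fixations as "ambient" (scanning) or "focal"
# (scrutiny) from fixation duration and the amplitude of the
# following saccade. Thresholds are illustrative assumptions.

def classify_fixation(duration_ms, next_saccade_deg,
                      dur_threshold=180, amp_threshold=5.0):
    """Label one fixation by its duration and the amplitude
    (in degrees of visual angle) of the saccade that follows it."""
    if duration_ms < dur_threshold and next_saccade_deg > amp_threshold:
        return "ambient"   # short fixation, large saccade: broad scanning
    if duration_ms >= dur_threshold and next_saccade_deg <= amp_threshold:
        return "focal"     # long fixation, small saccade: local scrutiny
    return "mixed"         # neither pattern clearly applies

# Early in a shot fixations tend to look like the first case,
# later in a shot like the second:
print(classify_fixation(120, 9.0))
print(classify_fixation(320, 2.5))
```

Run over a whole eye tracking record, the proportion of "ambient" labels early versus late in each shot gives a simple operationalisation of how visual momentum decays over a shot's duration.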

For some filmmakers and producers of dynamic visual media, increasing the visual momentum of an image sequence may be desirable as it maintains interest and attention on the screen (e.g. Michael Bay's use of rapidly edited extreme close-ups and intense camera movements in the Transformers movies). In this modern age of multiple screens fighting for our attention while we consume moving images (e.g. mobile phones and computer screens in our living rooms and even, sadly, increasingly at the cinema), designers of this media who wish to ensure that our visual attention is focussed on their screen over the competing screens need to design the visual display in a way that makes comprehension impossible without visual attention. Feature films and television dramas often rely heavily on dialogue for narrative communication, and the information communicated through the image may be of secondary narrative importance to the dialogue, so viewers can generally follow the story just by listening to the film rather than watching it. If producers of dynamic visual media are to draw visual attention back to the screen and away from secondary devices, they need to increase the ratio of visual to verbal information. A simple way of accomplishing this is to present the critical audio information through subtitling. The more visually attentive mode of viewing afforded by watching subtitled film and TV may partly explain the growing interest (at least in the UK) in foreign TV series, such as the popular Nordic Noir series The Bridge (2011) and The Killing (2007).

Another way of drawing attention back to the screen is to constantly “refresh” the visual content of the image by either increasing the editing rate or creatively using digital composition.[iv] The latter technique is wonderfully exploited by Sherlock (2010) as discussed brilliantly by Dwyer (this issue). Sherlock contemporised the detective techniques of Sherlock Holmes and John Watson by incorporating modern technologies such as the Internet and mobile phones and simultaneously updated the visual narrative techniques used to portray this information by using digital composition to playfully superimpose this information onto the photographic image. In a similar way to how the sudden appearance of traditional subtitles involuntarily captures visual attention and draws our eyes down to the start of the text, the digital inserts used in Sherlock overtly capture our eyes and encourage reading within the viewing of the image.

If Dwyer (this issue) had eye tracked viewers watching these excerpts, she would likely have observed this interesting shifting between phases of reading and dynamic scene perception. Given that digital inserts produce sudden visual transients on appearance and are highly incongruous with the visual features of the background scene, they are likely to involuntarily attract attention (Mital, Smith, Hill & Henderson, 2012). As such, they can be creatively used to reinvigorate the pace of viewing and strategically direct visual attention to parts of the image away from the screen centre. Traditionally, the same content might have been presented verbally as narration, as heavy-handed dialogue exposition (e.g. "Oh my! I have just received a text message stating….") or as a slow and laboured cut to a close-up of the actual mobile phone so we can read it from the perspective of the character. None of these approaches takes full advantage of the communicative potential of the whole screen space or of our ability to rapidly attend to and comprehend visual and audio information in parallel.

Such intermixing of text, digital inserts and filmed footage is common in advertisements, music videos and documentaries (see Figure 2) but is still surprisingly rare in mainstream Western film and TV. Short-form audiovisual messages have recently experienced a massive increase in popularity due to the internet and direct streaming to smartphones and mobile devices. To maximise their communicative potential and increase their likelihood of being "shared", these videos use all the audiovisual tricks available to them. Text, animations, digital effects, audio and classic filmed footage all mix together on the screen, packing every frame with as much information as possible (Figure 2), essentially maximising the visual momentum of each video and maintaining interest for as long as possible.[v] Such videos are so effective at grabbing attention and delivering satisfying/entertaining/informative experiences in a short period of time that they often compete directly with TV and film for our attention. Once we click play, the audiovisual bombardment ensures that our attention remains latched on to the second screen (i.e., the tablet or smartphone) for its duration and away from the primary screen, i.e., the TV set. Whilst distressing for producers of TV and film who wish our experience of their material to be undistracted, the ease with which we pick up a handheld device and seek other stimulation in parallel to the primary experience may indicate that the primary material does not require our full attention for us to follow what is going on. As attention has a natural ebb and flow (Cutting, DeLong and Nothelfer, 2010) and "There is no such thing as voluntary attention sustained for more than a few seconds at a time" (p. 421; James, 1890), modern producers of film and TV who want to maintain a high level of audience attention directed to the screen must either rely on viewer self-discipline to inhibit distraction, reward attention to the screen with rich and nuanced visual information (as fans of "slow cinema" would argue of films like those of Béla Tarr), or utilise the full range of postproduction effects to keep visual interest high and maintained on the image, as Sherlock so masterfully demonstrates.

Figure 2: Gaze Heatmaps of participants’ free-viewing a trailer for Lego Indiana Jones computer game (left column) and the Video Republic documentary (right column). Notice how both make copious use of text within the image, as intertitles and as extra sources of information in the image (such as the head-up display in A3). Data and images were taken from the Dynamic Images and Eye Movement project (DIEM; Mital, Smith, Hill & Henderson, 2010). Videos can be found here ( and here (


A number of modern filmmakers are beginning to experiment with the language of visual storytelling by questioning our assumptions about how we perceive moving images. At the forefront of this movement are Ang Lee and Andy and Lana Wachowski. On Hulk (2003), Lee worked very closely with editor Tim Squyres, using non-linear digital editing and effects to break apart the traditional frame and shot boundaries and create an approximation of a comic book style within film. This chaotic, unpredictable style polarised viewers and was partly blamed for the film's poor reception. However, the experiment cannot be dismissed as wholly unsuccessful. Several sequences within the film used multiple frames, split screens and digital transformation of images to increase the number of centres of interest on the screen and, as a consequence, increase the pace of viewing and the arousal experienced by viewers. In the sequence depicted below (Figure 3), two parallel scenes depicting Hulk's escape from a containment chamber (A1) and this action being watched from a control room by General Ross (B1) were presented simultaneously by placing elements of both scenes on the screen at the same time. Instead of using a point-of-view (POV) shot to show Ross looking off screen (known as the glance shot; Branigan, 1984) followed by a cut to what he was looking at (the object shot), both shots were combined into one image (F1 and F2), with the latter shot sliding in from behind Ross's head (E2). These digital inserts float within the frame, often gliding behind objects or suddenly enlarging to fill the screen (A2-B2). Such visual activity and use of shots-within-shots makes viewer gaze highly active (notice how the gaze heatmap is rarely clustered in one place; Figure 3).
Note that this method of embedding a POV object shot within a glance shot is similar to Sherlock’s method of displaying text messages as both the glance, i.e., Watson looking at his phone, and the object, i.e., the message, are shown in one image. Both uses take full advantage of our ability to rapidly switch from watching action to reading text without having to wait for a cut to give us the information.

Figure 3: Gaze heatmap of eight participants watching a series of shots and digital inserts from Hulk (Ang Lee, 2003). Full heatmap video is available at


Similar techniques have been used in Andy and Lana Wachowski's films, most audaciously in Speed Racer (2008). Interestingly, both sets of filmmakers seem to intuitively understand that packing an image with as much visual and textual information as possible can lead to viewer fatigue, and so they limit such intense periods to only a few minutes and separate them with more traditionally composed sequences (typically shot/reverse-shot dialogue sequences). These filmmakers have also demonstrated similar respect for viewer attention, and for the difficulty of actively locating and encoding visual information in a complex visual composition, in their more recent 3D movies. Ang Lee's Life of Pi (2012) uses the visual volume created by stereoscopic presentation to its full potential. Characters inhabit layers within the volume as foreground and background objects fluidly slide around each other within this space. The lessons Lee and his editor Tim Squyres learned on Hulk (2003) clearly informed the decisions they made when tackling their first 3D film and allowed them to avoid some of the issues most 3D films experience, such as eye strain, sudden unexpected shifts in depth and an inability to ensure viewers are attending to the part of the image easiest to fuse across the two eye images (Banks, Read, Allison & Watt, 2012).

Watching Audio

I now turn to another topic featured in this special issue: the influence of audio on gaze (Robinson, Stadler and Rassell, this issue). Film and TV are inherently multimodal. Both media have always existed as a combination of visual and audio information. Even early silent film was almost always presented with either live musical accompaniment or a narrator. As such, the relative lack of empirical investigation into how the combination of audio and visual input influences how we perceive movies and, specifically, how we attend to them is surprising. Robinson, Stadler and Rassell (this issue) have attempted to address this omission by comparing eye movements for participants watching either the original version of the Omaha beach sequence from Steven Spielberg's Saving Private Ryan (1998) or the same sequence with the sound removed. This film sequence is a great choice for investigating AV influences on viewer experience, as the intensity of the action, the hand-held cinematography and the immersive soundscape all work together to create a disorientating, embodied experience for the viewer. The authors could have approached this question by simply showing a set of participants the sequence with audio and qualitatively describing the gaze behaviour at interesting AV moments during the sequence. Such description of the data would have served as inspiration for further investigation, but in itself it cannot say anything about the causal contribution of audio to this behaviour, as there would be nothing to compare the behaviour to. Thankfully, the authors avoided this problem by choosing to manipulate the audio.

In order to identify the causal contribution of any factor you need to design an experiment in which that factor (known as the independent variable) is either removed or manipulated, and the impact of this manipulation on the behaviour of interest (known as the dependent variable) is tested using appropriate inferential statistics. I commend Robinson, Stadler and Rassell's experimental design, as they present such a manipulation and are therefore able to produce data that allows them to test their hypotheses about the causal impact of audio on viewer gaze behaviour. Several other papers in this special issue (Redmond, Sita and Vincs; Batty, Perkins and Sita) discuss gaze data (typically in the form of scanpaths or heatmaps) from one viewing condition without quantifying its difference from another viewing condition. As such, they are only able to describe the gaze data, not use it to test hypotheses. There is always a temptation to attribute too much meaning to a gaze heatmap (I too am guilty of this; Smith, 2013) due to its seemingly intuitive nature (i.e., they looked here and not there) but, as with all psychological measures, heatmaps are only as good as the experimental design within which they are employed.[vi]

Qualitative interpretation of individual fixation locations, scanpaths or group heatmaps is useful for informing initial interpretation of which visual details are most likely to make it into later visual processing (e.g. perception, encoding and long-term memory representations), but care has to be taken not to falsely assume that fixation equals awareness (Smith, Lamont and Henderson, 2012). Also, the visual form of a gaze heatmap varies widely depending on how many participants contribute to it, which parameters you choose to generate it and which oculomotor measures it represents (Holmqvist et al., 2011). For example, I have demonstrated that, unlike during reading, visual encoding during scene perception requires over 150ms during each fixation (Rayner, Smith, Malcolm and Henderson, 2009). This means that if fixations with durations less than 150ms are included in a heatmap, it may suggest that parts of the image have been processed which were in actual fact fixated too briefly to be processed adequately. Similarly, heatmaps representing fixation duration instead of just fixation location have been shown to be a better representation of visual processing (Henderson, 2003). Heatmaps have an immediate allure, but care has to be taken not to impose too much meaning on them, especially when the gaze and the image are changing over time (see Smith and Mital, 2013; and Sawahata et al, 2008 for further discussion). As eye tracking hardware becomes more available to researchers from across a range of disciplines, we need to work harder to ensure that it is not used inappropriately and that the conclusions drawn from eye tracking data are theoretically and statistically motivated (see Rayner, 1998; and Holmqvist et al, 2013 for clear guidance on how to conduct sound eye tracking studies).
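The sensitivity of heatmaps to these parameter choices can be illustrated with a minimal sketch. This is my own illustrative construction, not the pipeline of any cited study: the screen size, Gaussian width (sigma) and sample fixations are assumptions, and the 150ms cut-off follows the encoding threshold discussed above.

```python
import numpy as np

# Sketch of a duration-weighted gaze heatmap. Fixations shorter than
# 150 ms are excluded (too brief to support scene encoding), and each
# remaining fixation is weighted by its duration. All parameter values
# are illustrative assumptions.

def heatmap(fixations, width=640, height=360, sigma=30, min_dur_ms=150):
    """fixations: list of (x, y, duration_ms) tuples.
    Returns a height x width array normalised to a max of 1."""
    ys, xs = np.mgrid[0:height, 0:width]
    out = np.zeros((height, width))
    for x, y, dur in fixations:
        if dur < min_dur_ms:
            continue  # drop fixations too brief to be encoded
        # Add a Gaussian blob centred on the fixation, weighted by duration
        out += dur * np.exp(-((xs - x) ** 2 + (ys - y) ** 2)
                            / (2 * sigma ** 2))
    return out / out.max() if out.max() > 0 else out

# Three fixations: the 80 ms one is excluded from the map entirely
fix = [(320, 180, 400), (100, 90, 80), (500, 200, 250)]
h = heatmap(fix)
print(h.shape)
```

Changing `min_dur_ms`, `sigma` or the duration weighting produces visibly different maps from the same raw data, which is exactly why heatmap parameters need to be reported and justified.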

Given that Robinson, Stadler and Rassell (this issue) manipulated the critical factor, i.e., the presence of audio, the question now is whether their study tells us anything new about AV influences on gaze during film viewing. To examine the influence of audio they chose two traditional methods for expressing the gaze data: area of interest (AOI) analysis and dispersal. By using nine static (relative to the screen) AOIs they were able to quantify how much time the gaze spent in each AOI and use this measure to work out how distributed gaze was across all AOIs. Using these measures, they reported a trend towards greater dispersal in the mute condition compared to the audio condition, and a small number of significant differences in the amount of time spent in some regions across the audio conditions.
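A static nine-region AOI analysis of this kind is straightforward to sketch. The code below is a hypothetical reconstruction of the general approach, not the authors' actual analysis: the screen resolution and gaze samples are invented, and summarising dispersal with Shannon entropy is my own choice of index.

```python
import math
from collections import Counter

# Sketch: divide the screen into a 3x3 grid of nine static AOIs,
# assign gaze samples to AOIs, and compute dwell proportions plus a
# dispersal index. All concrete values are illustrative assumptions.

def aoi_dwell(samples, width=1280, height=720):
    """samples: list of (x, y) gaze coordinates.
    Returns {(row, col): proportion of samples in that AOI}."""
    counts = Counter(
        (min(int(y // (height / 3)), 2), min(int(x // (width / 3)), 2))
        for x, y in samples)
    n = len(samples)
    return {aoi: c / n for aoi, c in counts.items()}

def dispersal_entropy(dwell):
    """Shannon entropy of the dwell distribution: 0 when all gaze sits
    in one AOI, log2(9) ~ 3.17 when gaze is spread evenly over nine."""
    return -sum(p * math.log2(p) for p in dwell.values() if p > 0)

# 90% of samples on the centre AOI, 10% in the top-left corner
centre_heavy = [(640, 360)] * 90 + [(100, 100)] * 10
d = aoi_dwell(centre_heavy)
print(round(dispersal_entropy(d), 3))
```

Comparing this entropy between audio and mute conditions would give a single dispersal number per viewer per condition, suitable for an inferential test, rather than a bank of per-region bar charts.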

However, the conclusions we can draw from these findings are seriously hindered by the low sample size (only four participants were tested, meaning that any statistical test is unlikely to reveal significant differences) and by the static AOIs, which did not move with the image content. By locking the AOIs to static screen coordinates, their AOI measures express the deviation of gaze relative to these coordinates, not to the image content. This approach can be informative for quantifying gaze exploration away from the screen centre (Mital, Smith, Hill and Henderson, 2011), but in order to draw conclusions about what was being fixated, the gaze needs to be quantified relative to dynamic AOIs that track objects of interest on the screen (see Smith and Mital, 2013). For example, their question about whether we fixate a speaker's mouth more in scenes where the clarity of the speech is reduced by background noise (i.e., their "Indistinct Dialogue" scene) has previously been investigated in studies that have manipulated the presence of audio (Võ, Smith, Mital and Henderson, 2012) or the level of background noise (Buchan, Paré and Munhall, 2007) and measured gaze to dynamic mouth regions. As Robinson, Stadler and Rassell correctly predicted, lip reading increases as speech becomes less distinct or as the listener's linguistic competence in the spoken language decreases (see Võ et al, 2012 for review).

Similarly, by measuring gaze dispersal using a limited number of static AOIs they lose considerable nuance in the gaze data and have to resort to qualitative description of unintuitive bar charts (Figure 4). There exist several methods for quantifying gaze dispersal (see Smith and Mital, 2013, for a review) and even open-source tools for calculating this measure and comparing dispersal across groups (Le Meur and Baccino, 2013). Some methods are as easy, if not easier, to calculate than the static AOIs used in the present study. For example, the Euclidean distance between the screen centre and the x/y gaze coordinates at each frame of the movie provides a rough measure of how spread out the gaze is from the screen centre (typically the default viewing location; Mital et al, 2011), and a similar calculation can be performed between the gaze positions of all participants within a viewing condition to get a measure of group dispersal.
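The two distance-based measures just described can be sketched in a few lines. This is a minimal illustration under assumed values: the screen centre coordinates and the per-frame gaze points are invented for demonstration.

```python
import math

# Sketch of two frame-by-frame gaze dispersal measures:
# (1) mean Euclidean distance of each viewer's gaze from the screen
#     centre (centre bias), and
# (2) mean pairwise distance between all viewers' gaze points
#     (group dispersal / attentional synchrony).

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def centre_bias(gaze_points, centre=(640, 360)):
    """Mean distance of viewers' gaze from the screen centre."""
    return sum(dist(g, centre) for g in gaze_points) / len(gaze_points)

def group_dispersal(gaze_points):
    """Mean pairwise distance between viewers on one frame; small
    values indicate high attentional synchrony."""
    pairs = [(a, b) for i, a in enumerate(gaze_points)
             for b in gaze_points[i + 1:]]
    return sum(dist(a, b) for a, b in pairs) / len(pairs)

# One frame, three viewers: two clustered near centre, one outlier
frame = [(640, 360), (660, 380), (200, 100)]
print(round(centre_bias(frame), 1))
print(round(group_dispersal(frame), 1))
```

Computed per frame and averaged over a sequence, either measure yields a continuous dispersal time series per condition that can be compared with standard inferential statistics, with no grid of AOIs required.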

Using such measures, Coutrot and colleagues (2012) showed that gaze dispersal is greater when audio is removed from dialogue film sequences; they also observed shorter-amplitude saccades and marginally shorter fixation durations. However, I have recently shown that a non-dialogue sequence from Sergei Eisenstein’s Alexander Nevsky (1938) does not show significant differences in eye movement metrics when the accompanying music is removed (Smith, 2014). This difference in findings points towards interesting differences in the impact that diegetic sound (within the depicted scene, e.g. dialogue) and non-diegetic sound (outside of the depicted scene, e.g. the musical score) may have on gaze guidance. It also highlights how some cinematic features may have a greater impact on aspects of a viewer’s experience that eye tracking cannot measure, such as physiological markers of arousal and emotional state. This is also the conclusion that Robinson, Stadler and Rassell come to.

Listening to the Data (aka, What is Eye Tracking Good For?)

The methodological concerns I raised in the previous section lead nicely to the article by William Brown, entitled There’s no I in Eye Tracking: How Useful is Eye Tracking to Film Studies? (this issue). I have known William Brown for several years through our attendance at the Society for Cognitive Studies of the Moving Image (SCSMI) annual conference and I have a deep respect for his philosophical approach to film and his ability to incorporate empirical findings from the cognitive neurosciences, including some of my own work, into his theories. It therefore came as something of a surprise that his article openly attacks the application of eye tracking to film studies. However, I welcome Brown’s criticisms, as they provide me with an opportunity to address some general assumptions about the scientific investigation of film and, hopefully, to suggest future directions in which eye tracking research can avoid some of the pitfalls Brown identifies.

Brown’s main criticisms of current eye tracking research are: 1) eye tracking studies neglect “marginal” viewers or marginal ways of watching movies; 2) studies so far have neglected “marginal” films; 3) they only provide “truisms”, i.e., already known facts; and 4) they have an implicit political agenda to argue that the only “true” way to study film is a scientific approach and that the “best” way to make a film is to ensure homogeneity of viewer experience. I will address these criticisms in turn, but before I do so I would like to state that many of Brown’s arguments could be recast as arguments against science in general, built upon a misunderstanding of how scientific studies should be conducted and what their results mean.

To respond to Brown’s first criticism, that eye tracking “has up until now been limited somewhat by its emphasis on statistical significance – or, put simply, by its emphasis on telling us what most viewers look at when they watch films” (Brown, this issue; 1), I first have to subdivide the criticism into ‘the search for significance’ and ‘attentional synchrony’, i.e., how similar gaze is across viewers (Smith and Mital, 2013). Brown tells an anecdote about a Dutch film scholar whose data had to be excluded from an eye tracking study because they did not look where the experimenter wanted them to look. I wholeheartedly agree with Brown that this sounds like a bad study, as data should never be excluded for subjective reasons such as not supporting the hypothesis, i.e., not looking as predicted. However, exclusion on statistical grounds is valid if the research question being tested relates to how representative the behaviour of a small set of participants (known as the sample) is of the overall population. To explain when such a decision is valid, and to respond to Brown’s criticism about only ‘searching for significance’, I will first provide a brief overview of how empirical eye tracking studies are designed and why significance testing is important.

For example, if we were interested in the impact sound has on the probability of fixating an actor’s mouth (e.g., Robinson, Stadler and Rassell, this issue), we would need to compare the gaze behaviour of a sample of participants who watch a sequence with the sound turned on to a sample who watch it with the sound turned off. By comparing the behaviour of these two groups using inferential statistics, we test the likelihood that these two viewing conditions would differ in the population of all viewers, given the variation within and between the two groups. In actual fact we do this by performing the opposite test: testing the probability that the two groups belong to a single, statistically indistinguishable population. This is known as the null hypothesis. By showing that there is less than a 5% chance that the observed difference would arise if the null hypothesis were true, we can conclude that the difference is statistically significant and that another sample of participants presented with the same two viewing conditions would be likely to show a similar difference in viewing behaviour.
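The logic of null hypothesis testing sketched above can be made concrete with a small permutation test: pool the scores of the two groups, repeatedly shuffle participants between conditions, and ask how often a difference in group means at least as large as the observed one arises by chance. All numbers below are hypothetical mouth-fixation probabilities invented for illustration, not data from any cited study:

```python
import random
import statistics

def permutation_p_value(group_a, group_b, n_perm=5000, seed=1):
    """Two-tailed permutation test of the null hypothesis that both groups
    were drawn from a single distribution. Returns the proportion of random
    relabellings whose mean difference is at least as large as the observed one."""
    observed = abs(statistics.mean(group_a) - statistics.mean(group_b))
    pooled = group_a + group_b
    rng = random.Random(seed)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        perm_a, perm_b = pooled[:len(group_a)], pooled[len(group_a):]
        if abs(statistics.mean(perm_a) - statistics.mean(perm_b)) >= observed:
            extreme += 1
    return extreme / n_perm

# Hypothetical per-participant probabilities of fixating the mouth AOI
sound_on = [0.42, 0.39, 0.45, 0.41, 0.38, 0.44, 0.40, 0.43]
sound_off = [0.30, 0.28, 0.33, 0.27, 0.31, 0.29, 0.32, 0.26]

p = permutation_p_value(sound_on, sound_off)
print(p < 0.05)  # is the difference significant at the conventional 5% level?
```

A permutation test is only one of several ways to test the null hypothesis (a t-test is the more familiar choice); it is used here because its shuffling procedure makes the underlying logic visible.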

In order to test whether our two viewing conditions belong to one or two distributions we need to be able to express those distributions. This is typically done by identifying the mean score for each participant on the dependent variable of interest (in this case, the probability of fixating a dynamic mouth AOI), then calculating the mean of this measure across all participants within a group along with their variation in scores (known as the standard deviation). Most natural measures produce a distribution of scores looking somewhat like a bell curve (known as the normal distribution), with most observations near the centre of the distribution and an ever-decreasing number of observations as you move away from this central score. Each observation (in our case, each participant) can be expressed relative to this distribution by subtracting the mean of the distribution from its score and dividing by the standard deviation. This converts a raw score into a normalised, or z-score. For normally distributed data, roughly ninety-five percent of all observations fall within two standard deviations of the mean. This means that observations with a z-score greater than two are highly unrepresentative of that distribution and may be considered outliers.
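The z-score calculation described above can be shown as a short worked example. The fixation probabilities here are hypothetical, chosen so that one viewer attends to the mouth far more than the rest of the group:

```python
import statistics

def z_scores(scores):
    """Express each participant's score relative to the group distribution:
    subtract the group mean and divide by the standard deviation."""
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)  # sample standard deviation
    return [(s - mean) / sd for s in scores]

# Hypothetical per-participant probabilities of fixating the mouth AOI;
# the last viewer differs markedly from the rest of the group.
fixation_prob = [0.31, 0.29, 0.33, 0.30, 0.28, 0.32, 0.30, 0.62]

zs = z_scores(fixation_prob)
outliers = [i for i, z in enumerate(zs) if abs(z) > 2]
print(outliers)  # → [7]: only the last participant exceeds two standard deviations
```

As the next paragraph stresses, flagging a participant this way is a prompt for scrutiny (was there a calibration problem?), not an automatic licence to exclude them.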

However, being unrepresentative of the group mean is insufficient motivation to exclude a participant. The outlier still belongs to the group distribution and should be included unless there is a supporting reason for exclusion, such as measurement error (e.g. poor calibration of the eye tracker). If an extreme outlier is not excluded it can have a disproportionate impact on the group mean and make statistical comparison of groups difficult. However, if this is the case it suggests that the sample is too small and not representative of the overall population. Correct choice of sample size, given an estimate of the predicted effect size, combined with minimising measurement error, should mean that subjective decisions never have to be made about whose data is “right” and who should be included or excluded.

Brown also believes that eye tracking research has so far marginalised viewers who have atypical ways of watching film, such as film scholars, either by not studying them or by treating them as statistical outliers and excluding them from analyses. However, I would argue that the only way to know whether their way of watching a film is atypical is to first map out the distribution of how viewers typically watch films. If a viewer attended more to the screen edge than the majority of other viewers in a random sample of the population (as was the case with Brown’s film scholar colleague), this should show up as a large z-score when their gaze data is expressed relative to the group on a suitable measure, such as Euclidean distance from the screen centre. Similarly, a non-native speaker of English might have appeared as an outlier in terms of how much time they spent looking at the speaker’s mouth in Robinson, Stadler and Rassell’s (this issue) study. Such idiosyncrasies may be of interest to researchers, and there are statistical methods for expressing emergent groupings within the data (e.g. cluster analysis) or for testing whether group membership predicts behaviour (e.g. regression). That these approaches have not previously been applied to questions of film viewing is simply due to the immaturity of the field and the limited availability of the equipment and expertise needed to conduct such studies.
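To illustrate how emergent groupings might be expressed, here is a deliberately tiny one-dimensional k-means sketch applied to hypothetical centre-distance scores; a real study would use an established clustering or regression package rather than this toy implementation, and all names and values below are my own:

```python
import statistics

def kmeans_1d(values, k=2, iters=50):
    """Minimal 1-D k-means: group participants by a single gaze measure
    (e.g. mean distance of gaze from the screen centre) so that viewing
    styles emerge from the data rather than being imposed in advance."""
    centroids = [min(values), max(values)] if k == 2 else list(values[:k])
    assignment = []
    for _ in range(iters):
        # Assign each participant to the nearest centroid...
        assignment = [min(range(k), key=lambda c: abs(v - centroids[c]))
                      for v in values]
        # ...then move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(values, assignment) if a == c]
            if members:
                centroids[c] = statistics.mean(members)
    return assignment, centroids

# Hypothetical mean centre-distances (pixels): most viewers stay central,
# while two (perhaps our film scholars) explore the screen edges.
centre_dist = [85, 92, 78, 88, 90, 240, 255, 95]
labels, centres = kmeans_1d(centre_dist)
print(labels)
```

Here the two edge-exploring viewers fall into their own cluster, so their viewing style becomes a group to analyse rather than noise to discard.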

In my own recent work I have shown how viewing task influences how we watch unedited video clips (Smith and Mital, 2013), how infants watch TV (Wass and Smith, in press), how infant gaze differs from adult gaze (Smith, Dekker, Mital, Saez De Urabain and Karmiloff-Smith, in prep) and even how film scholars attend to and remember a short film compared to non-expert film viewers (Smith and Smith, in prep). Such group viewing differences are of great interest to me and I hope these studies illustrate how much eye tracking has to offer such research questions when the right statistics and experimental designs are employed.

Brown’s second main criticism is that the field of eye tracking neglects “marginal” films. I agree that the majority of films used so far in eye tracking studies could be considered mainstream. For example, the film/TV clips used in this special issue include Sherlock (2010), Up (2009) and Saving Private Ryan (1998). However, this limitation is simply a sign of how few eye tracking studies of moving images there have been. All research areas take time to fully explore their range of possible research questions.

I have always employed a range of films from diverse film traditions, cultures and languages. My first published eye tracking study (Smith and Henderson, 2008) used film clips from Citizen Kane (1941), Dogville (2003), October (1928), Requiem for a Dream (2000), Dancer in the Dark (2000), Koyaanisqatsi (1982) and Blade Runner (1982). Several of these films may be considered “marginal” relative to the mainstream. If I have chosen to focus most of my analyses on mainstream Hollywood cinema, this is only because such films were the most suitable exemplars of the phenomena I was investigating, such as continuity editing and its creation of a universal pattern of viewing (Smith, 2006; 2012). This interest is not because, as Brown argues, I have a hidden political agenda or an implicit belief that this style of filmmaking is the “right” way to make films. I am interested in this style because it is the dominant style and, as a cognitive scientist, I wish to use film as a way of understanding how most people process audiovisual dynamic scenes.

Hollywood film stands as a wonderfully rich example of what filmmakers think “fits” human cognition. By testing filmmaker intuitions and seeing what impact particular compositional decisions have on viewer eye movements and behavioural responses I hope to gain greater insight into how audiovisual perception operates in non-mediated situations (Smith, Levin and Cutting, 2012). But, just as a neuropsychologist can learn about typical brain function by studying patients with pathologies such as lesions and strokes, I can also learn about how we perceive a “typical” film by studying how we watch experimental or innovative films. My previous work is testament to this interest (Smith, 2006; 2012a; 2012b; 2014; Smith & Henderson, 2008) and I hope to continue finding intriguing films to study and further my understanding of film cognition.

One practical reason why eye tracking studies rarely use foreign language films is the presence of subtitles. As has been comprehensively demonstrated by other authors in this special issue (Kruger, Szarkowska and Krejtz, this issue) and earlier in this article, the sudden appearance of text on the screen, even if it is incomprehensible, leads to differences in eye movement behaviour. This invalidates the use of eye tracking as a way to measure how the filmmaker intended to shape viewer attention and perception. The alternatives are either to use silent film (an approach I employed with October; Smith and Henderson, 2008), to remove the audio (which changes gaze behaviour and awareness of editing; Smith and Martin-Portugues Santacreu, under review) or to use dubbing (which can bias gaze down to the poorly synched lips; Smith, Batten and Bedford, 2014). None of these options is ideal for investigating foreign language sound film, and until there is a suitable methodological solution this will restrict experimental eye tracking studies to films in a participant’s native language.

Finally, I would like to counter Brown’s assertion that eye tracking investigations of film have so far only generated “truisms”. I admit that there is often a temptation to reduce empirical findings to simplified take-home messages that only seem to confirm previous intuitions such as a bias of gaze towards the screen centre, towards speaking faces, moving objects or subtitles. However, I would argue that such messages fail to appreciate the nuance in the data. Empirical data correctly measured and analysed can provide subtle insights into a phenomenon that subjective introspection could never supply.

For example, film editors believe that an impression of continuous action can be created across a cut by overlapping somewhere between two (Anderson, 1996) and four frames (Dmytryk, 1986) of the action. However, psychological investigations of time perception have revealed that our judgements of duration depend on how attention is allocated during the estimated period (Zakay and Block, 1996) and vary depending on whether our eyes remain still or saccade during that period (Yarrow et al, 2001). In my thesis (Smith, 2006) I used simplified film stimuli to investigate the role visual attention plays in the estimation of temporal continuity across a cut, and found that participants experienced an overlap of 58.44ms as continuous when an unexpected cut occurred during fixation, and an omission of 43.63ms as continuous when they performed a saccade in response to the cut. As different cuts may result in different degrees of overt (i.e., eye movement) and covert attentional shifts, these empirical findings support editor intuitions that temporal continuity varies between cuts (Dmytryk, 1986) whilst also explaining the factors influencing time perception at a level of precision not possible through introspection.

Reflection on our own experience of a film is limited by the fact that it relies on our own senses and cognitive abilities to identify, interpret and express what we experience. I may feel that my experience of a dialogue sequence from Antichrist (2009) differs radically from a similar sequence from Secrets & Lies (1996), but I would be unable to attribute these differences to particular aspects of the two scenes without quantifying both the cinematic features and my responses to them. Without isolating individual features I cannot know their causal contribution to my experience. Was it the rapid camera movements in Antichrist, the temporally incongruous editing, the emotionally extreme dialogue, or the combination of these features that made me feel so unsettled whilst watching the scene? If one is not interested in understanding the causal contributions of each cinematic decision to an audience member’s response, then one may be content with informed introspection and not find empirical hypothesis testing the right method. I make no judgement about the validity of either approach so long as each researcher understands the limits of their approach.

Introspection utilises the imprecise measurement tool that is the human brain and is therefore subject to distortion, human bias and an inability to extrapolate the subjective experience of one person to another. Empirical hypothesis testing also has its limitations: research questions have to be clearly formulated so that hypotheses can be stated in a way that allows them to be statistically tested using appropriate, observable and reliable measurements. A failure at any of these stages can invalidate the conclusions drawn from the data. For example, an eye tracker may be poorly calibrated, resulting in an inaccurate record of where somebody was looking, or it could be used to test an ill-formed hypothesis, such as how a particular film sequence caused attentional synchrony without another film sequence against which to compare the gaze data. Each approach has its strengths and weaknesses and no single approach should be considered “better” than any other, just as no film should be considered “better” than any other film.


The articles collected here constitute the first attempt to bring together interdisciplinary perspectives on the application of eye tracking to film studies. I fully commend the intention of this special issue and hope that it encourages future researchers to conduct further studies using these methods to investigate research questions and film experiences we have not even conceived of. However, given that the recent release of low-cost eye tracking peripherals such as the EyeTribe[vii] tracker and the Tobii EyeX[viii] has moved eye tracking from a niche and highly expensive research tool to an accessible option for researchers in a range of disciplines, I need to take this opportunity to issue a word of warning. As I have outlined in this article, eye tracking is like any other research tool: it is only useful if used correctly, if its limitations are respected, if its data is interpreted through the appropriate application of statistics, and if conclusions are drawn only from the data in combination with a sound theoretical base. Eye tracking is not the “saviour” of film studies, nor is science the only “valid” way to investigate somebody’s experience of a film. Hopefully, the articles in this special issue and the ideas I have put forward here suggest how eye tracking can function within an interdisciplinary approach to film analysis that furthers our appreciation of film in previously unfathomed ways.



Thanks to Rachael Bedford, Sean Redmond and Craig Batty for comments on earlier drafts of this article. Thank you to John Henderson, Parag Mital and Robin Hill for help in gathering and visualising the eye movement data used in the Figures presented here. Their work was part of the DIEM project, funded by the Leverhulme Trust. The author, Tim Smith, is funded by the EPSRC (EP/K012428/1), the Leverhulme Trust (PLP-2013-028) and a BIAL Foundation grant (224/12).



Anderson, Joseph. 1996. The Reality of Illusion: An Ecological Approach to Cognitive Film Theory. Southern Illinois University Press.

Batty, Craig, Claire Perkins and Jodi Sita. 2015. “How We Came To Eye Tracking Animation: A Cross-Disciplinary Approach to Researching the Moving Image”, Refractory: a Journal of Entertainment Media, 25.

Banks, Martin S., Jenny R. Read, Robert S. Allison and Simon J. Watt. 2012. “Stereoscopy and the human visual system.” SMPTE Mot. Imag. J., 121 (4), 24-43

Bradley, Margaret M., Laura Miccoli, Miguel A. Escrig and Peter J. Lang. 2008. “The pupil as a measure of emotional arousal and autonomic activation.” Psychophysiology, 45(4), 602-607.

Branigan, Edward R. 1984. Point of View in the Cinema: A Theory of Narration and Subjectivity in Classical Film. Berlin: Mouton.

Brown, William. 2015. “There’s no I in Eye Tracking: How Useful is Eye Tracking to Film Studies?”, Refractory: a Journal of Entertainment Media, 25.

Buchan, Julie N., Martin Paré and Kevin G. Munhall. 2007. “Spatial statistics of gaze fixations during dynamic face processing.” Social Neuroscience, 2, 1–13.

Coutrot, Antoine, Nathalie Guyader, Gelu Ionesc and Alice Caplier. 2012. “Influence of Soundtrack on Eye Movements During Video Exploration”, Journal of Eye Movement Research 5, no. 4.2: 1-10.

Cutting, James. E., Jordan E. DeLong and Christine E. Nothelfer. 2010. “Attention and the evolution of Hollywood film.” Psychological Science, 21, 440-447.

Dwyer, Tessa. 2015. “From Subtitles to SMS: Eye Tracking, Texting and Sherlock”, Refractory: a Journal of Entertainment Media, 25.

Dyer, Adrian. G and Sarah Pink. 2015. “Movement, attention and movies: the possibilities and limitations of eye tracking?”, Refractory: a Journal of Entertainment Media, 25.

Dmytryk, Edward. 1986. On Filmmaking. London, UK: Focal Press.

Henderson, John. M., 2003. “Human gaze control during real-world scene perception.” Trends in Cognitive Sciences, 7, 498-504.

Hochberg, Julian and Virginia Brooks. 1978. “Film Cutting and Visual Momentum”. In John W. Senders, Dennis F. Fisher and Richard A. Monty (Eds.), Eye Movements and the Higher Psychological Functions (pp. 293-317). Hillsdale, NJ: Lawrence Erlbaum.

Holmqvist, Kenneth, Marcus Nyström, Richard Andersson, Richard Dewhurst, Halszka Jarodzka and Joost van de Weijer. 2011. Eye Tracking: A comprehensive guide to methods and measures. Oxford, UK: OUP Press.

James, William. 1890. The principles of psychology (Vol.1). New York: Holt

Kruger, Jan Louis, Agnieszka Szarkowska and Izabela Krejtz. 2015. “Subtitles on the Moving Image: An Overview of Eye Tracking Studies”, Refractory: a Journal of Entertainment Media, 25.

Le Meur, Olivier and Baccino, Thierry. 2013. “Methods for comparing scanpaths and saliency maps: strengths and weaknesses.” Behavior research methods, 45(1), 251-266.

Magliano, Joseph P. and Jeffrey M. Zacks. 2011. “The Impact of Continuity Editing in Narrative Film on Event Segmentation.” Cognitive Science, 35(8), 1-29.

Mital, Parag K., Tim J. Smith, Robin Hill and John M. Henderson. 2011. “Clustering of gaze during dynamic scene viewing is predicted by motion.” Cognitive Computation, 3(1), 5-24.

Rayner, Keith. 1998. “Eye movements in reading and information processing: 20 years of research”. Psychological Bulletin, 124(3), 372-422.

Rayner, Keith, Tim J. Smith, George Malcolm and John M. Henderson. 2009. “Eye movements and visual encoding during scene perception.” Psychological Science, 20, 6-10.

Raz, Gal, Yael Jacob, Tal Gonen, Yonatan Winetraub, Tamar Flash, Eyal Soreq and Talma Hendler. 2014. “Cry for her or cry with her: context-dependent dissociation of two modes of cinematic empathy reflected in network cohesion dynamics.” Social cognitive and affective neuroscience, 9(1), 30-38.

Redmond, Sean, Jodi Sita and Kim Vincs. 2015. “Our Sherlockian Eyes: the Surveillance of Vision”, Refractory: a Journal of Entertainment Media, 25.

Robinson, Jennifer, Jane Stadler and Andrea Rassell. 2015. “Sound and Sight: An Exploratory Look at Saving Private Ryan through the Eye-tracking Lens”, Refractory: a Journal of Entertainment Media, 25.

Salt, Barry. 2009. Film Style and Technology: History and Analysis (Vol. 3rd). Totton, Hampshire, UK: Starword.

Sawahata, Yasuhito, Rajiv Khosla, Kazuteru Komine, Nobuyuki Hiruma, Takayuki Itou, Seiji Watanabe, Yuji Suzuki, Yumiko Hara and Nobuo Issiki. 2008. “Determining comprehension and quality of TV programs using eye-gaze tracking.” Pattern Recognition, 41(5), 1610-1626.

Smith, Murray. 2011. “Triangulating Aesthetic Experience”, paper presented at the annual Society for Cognitive Studies of the Moving Image conference, Budapest, June 8–11, 2011.

Smith, Tim J. 2006. An Attentional Theory of Continuity Editing. Ph.D., University of Edinburgh, Edinburgh, UK.

Smith, Tim J. 2012a. “The Attentional Theory of Cinematic Continuity”, Projections: The Journal for Movies and the Mind. 6(1), 1-27.

Smith, Tim J. 2012b. “Extending AToCC: a reply,” Projections: The Journal for Movies and the Mind. 6(1), 71-78

Smith, Tim J. 2013. “Watching you watch movies: Using eye tracking to inform cognitive film theory.” In A. P. Shimamura (Ed.), Psychocinematics: Exploring Cognition at the Movies. New York: Oxford University Press. pages 165-191

Smith, Tim J. 2014. “Audiovisual correspondences in Sergei Eisenstein’s Alexander Nevsky: a case study in viewer attention”. Cognitive Media Theory (AFI Film Reader), Eds. P. Taberham & T. Nannicelli.

Smith, Tim J., Jonathan Batten and Rachael Bedford. 2014. “Implicit detection of asynchronous audiovisual speech by eye movements.” Journal of Vision,14(10), 440-440.

Smith, Tim J., T. Dekker, Parag K. Mital, I. R. Saez De Urabain and A. Karmiloff-Smith. In prep. “Watch like mother: Motion and faces make infant gaze indistinguishable from adult gaze during Tot TV.”

Smith, Tim J. and John M. Henderson. 2008. “Edit Blindness: The relationship between attention and global change blindness in dynamic scenes”. Journal of Eye Movement Research, 2(2):6, 1-17.

Smith, Tim J., Peter Lamont and John M. Henderson. 2012. “The penny drops: Change blindness at fixation.” Perception, 41(4), 489-492.

Smith, Tim J., Daniel Levin and James E. Cutting. 2012. “A Window on Reality: Perceiving Edited Moving Images.” Current Directions in Psychological Science. 21: 101-106

Smith, Tim J. and Parag K. Mital. 2013. “Attentional synchrony and the influence of viewing task on gaze behaviour in static and dynamic scenes”. Journal of Vision 13(8): 16.

Smith, Tim J. and Janet Y. Martin-Portugues Santacreu. Under Review. “Match-Action: The role of motion and audio in limiting awareness of global change blindness in film.”

Smith, Tim J. and Murray Smith. In prep. “The impact of expertise on eye movements during film viewing.”

Suckfull, Monika. 2000. “Film Analysis and Psychophysiology: Effects of Moments of Impact and Protagonists”. Media Psychology, 2(3), 269-301.

Vilaro, Anna and Tim J. Smith. 2011. “Subtitle reading effects on visual and verbal information processing in films.” Published abstract In Perception. ECVP abstract supplement, 40. (p. 153). European Conference on Visual Perception. Toulousse, France.

Velichkovsky, Boris M., Sascha M. Dornhoefer, Sebastian Pannasch and Pieter J. A. Unema. 2001. “Visual fixations and level of attentional processing”. In Andrew T. Duhowski (Ed.), Proceedings of the International Conference Eye Tracking Research & Applications, Palm Beach Gardens, FL, November 6-8, ACM Press.

Wass, Sam V. and Tim J. Smith. In Press. “Visual motherese? Signal-to-noise ratios in toddler-directed television,” Developmental Science

Yarrow, Kielan, Patrick Haggard, Ron Heal, Peter Brown and John C. Rothwell. 2001. “Illusory perceptions of space and time preserve cross-saccadic perceptual continuity”. Nature, 414.

Zakay, Dan and Richard A. Block. 1996. Role of Attention in Time Estimation Processes. Time, Internal Clocks, and Movement. Elsevier Science.



[ii] An alternative take on eye tracking data is to divorce the data itself from psychological interpretation. Instead of viewing a gaze point as an index of where a viewer’s overt attention is focussed and a record of the visual input most likely to be encoded into the viewer’s long-term experience of the media, researchers can instead take a qualitative, or even aesthetic approach to the data. The gaze point becomes a trace of some aspect of the viewer’s engagement with the film. The patterns of gaze, its movements across the screen and the coordination/disagreement between viewers can inform qualitative interpretation without recourse to visual cognition. Such an approach is evident in several of the articles in this special issue (including Redmond, Sita, and Vincs, this issue; Batty, Perkins, and Sita, this issue). This approach can be interesting and important for stimulating hypotheses about how such patterns of viewing have come about and may be a satisfying endpoint for some disciplinary approaches to film. However, if researchers are interested in testing these hypotheses further empirical manipulation of the factors that are believed to be important and statistical testing would be required. During such investigation current theories about what eye movements are and how they relate to cognition must also be respected.

[iii] Although, one promising area of research is the use of pupil diameter changes as an index of arousal (Bradley, Miccoli, Escrig and Lang, 2008).

[iv] This technique has been used for decades by producers of TV advertisements and by some “pop” serials, such as Hollyoaks in the UK (thanks to Craig Batty for this observation).

[v] This trend in increasing pace and visual complexity of film is confirmed by statistical analyses of film corpora over time (Cutting, DeLong and Nothelfer, 2010) and has resulted in a backlash and increasing interest in “slow cinema”.

[vi] Other authors in this special issue may argue that taking a critical approach to gaze heatmaps without recourse to psychology allows them to embed eye tracking within their existing theoretical framework (such as hermeneutics). However, I would warn that eye tracking data is simply a record of how a relatively arbitrary piece of machinery (the eye tracking hardware) and its associated software decided to represent the centre of a viewer’s gaze. There are numerous parameters that can be tweaked to massively alter how such gaze traces and heatmaps appear. Without understanding the psychology and the physiology of the human eye, a researcher cannot know how to set these parameters or how much to trust the equipment they are using and the data it is recording, and as a consequence may over-attribute interpretation to a representation that is not reliable.

[vii] (accessed 13/12/14). The EyeTribe tracker is $99 and is as spatially and temporally accurate (up to 60Hz sampling rate) as some science-grade trackers.

[viii] (accessed 13/12/14). The Tobii EyeX tracker is $139, samples at 30Hz and is as spatially accurate as the EyeTribe although the EyeX does not give you as much access to the raw gaze data (e.g., pupil size and binocular gaze coordinates) as the EyeTribe.



Dr Tim J. Smith is a senior lecturer in the Department of Psychological Sciences at Birkbeck, University of London. He applies empirical Cognitive Psychology methods including eye tracking to questions of Film Cognition and has published extensively on the subject both in Psychology and Film journals.


Subtitles on the Moving Image: an Overview of Eye Tracking Studies – Jan Louis Kruger, Agnieszka Szarkowska and Izabela Krejtz


This article provides an overview of eye tracking studies on subtitling (also known as captioning) and makes recommendations for future cognitive research in the field of audiovisual translation (AVT). We find that most studies in the field conducted to date fail to address the actual processing of verbal information contained in subtitles, focusing instead on the impact of subtitles on viewing behaviour. We also show how eye tracking can be utilised to measure not only the reading of subtitles, but also the impact of stylistic elements, such as language usage, and technical issues, such as the presence of subtitles during shot changes, on the cognitive processing of the audiovisual text as a whole. We support our overview with empirical evidence from eye tracking studies conducted on a number of languages, language combinations and viewing contexts, as well as on different types of viewers/readers: hearing, hard of hearing and Deaf people.


The reading of printed text has received substantial attention from scholars since the 1970s (for an overview of the first two decades see Rayner 1998). Many of these studies, conducted from a psycholinguistic angle, made use of eye tracking. As a result, a large body of knowledge exists on the eye movements of readers with varying levels of reading skill and language proficiency, across a range of ages, first languages, cultural backgrounds and contexts. Studies on subtitle reading, however, have not achieved the same level of scientific rigour, largely for practical reasons: subtitles are not static for more than a few seconds at a time; they compete for visual attention with a moving image; and they compete for overall cognitive resources with verbal and non-verbal sounds. This article will identify some of the gaps in current research in the field and illustrate how some of these gaps can be bridged.

Studying the reading of subtitles is significantly different from studying the reading of static text. In the first place, as far as eye tracking software is concerned, the subtitles appear on a moving image as image rather than text, which renders traditional text-based reading statistics and software all but useless. This also makes the collection of data for reading research on subtitles a painstakingly slow process involving substantial manual inspection and coding. Secondly, the fact that subtitles appear against the background of the moving image means that they are always in competition with this image, which renders the reading process fundamentally different from that of static texts: on the one hand, the reading of subtitles competes with the processing of the image, sometimes resulting in interrupted reading; on the other hand, the limited time the subtitles are on screen means that readers have less time to reread, to regress to difficult words, or to check information. Either way, studying this reading process, and the cognitive processing that takes place during reading, is much more complicated than in the case of static texts, where we know that the reader is mainly focussing on the words before her/him without additional auditory and visual information to process.

While the viewing of subtitles has been the object of a growing number of eye tracking studies in recent years (see, for example, Bisson et al. 2012; d’Ydewalle and Gielen 1992; d’Ydewalle and De Bruycker 2007; Ghia 2012; Krejtz et al. 2013; Kruger 2013; Kruger et al. 2013; Kruger and Steyn 2014; Perego et al. 2010; Rajendran et al. 2013; Specker 2008; Szarkowska et al. 2011; Winke et al. 2013), the study of the reading of subtitles remains a largely uncharted territory with many research avenues still to be explored. Those studies that do venture to measure more than just attention to the subtitle area seldom do so for extended texts.

In this article we provide an overview of studies on how subtitles change the way viewers process audiovisual material, and also of studies on the unique characteristics of the subtitle reading process. Taking an analysis of the differences between reading printed (static) text and subtitles as point of departure, we examine a number of aspects typical of the way subtitle text is processed in reading. We also look at the impact of the dynamic nature of the text and the competition with other sources of information on the reading process (including scene perception, changes in the viewing process, shifts between subtitles and image, visual saliency of text, faces, and movement, and cognitive load), as well as discussing studies on the impact of graphic elements on subtitle reading (e.g. number of lines, and text chunking), and studies that attempt to measure the subtitle reading process in more detail.

We start off with a discussion of the way in which watching an audiovisual text with subtitles alters viewing behaviour, as well as of the complexities of studying subtitles against the dynamic image that forms their backdrop. Here we focus on the fleeting nature of the subtitle text, the competition between reading the subtitles and scanning the image, and the interaction between different sources of information. We further discuss internal factors that impact on subtitle processing, such as the language and culture of the audience, the language of the subtitles, and the degree of access the audience has to sound, before turning to external factors related to the nature of the audiovisual text and the presentation of the subtitles. Finally, we provide an overview of studies attempting to measure the processing of subtitles, as well as findings from two studies that approach the processing of subtitles in more detail.

The dynamic nature of the subtitle reading process

Reading subtitles differs substantially from reading printed text in a number of respects. As opposed to “static text on a stable background”, the viewer of subtitled audiovisual material is confronted with “fleeting text on a dynamic background” (Kruger and Steyn 2014, 105). In consequence, viewers not only need to process and integrate information from different communication channels (verbal visual, non-verbal visual, verbal auditory and non-verbal auditory; see Gottlieb 1998), but they also have no control over the presentation speed (see Kruger and Steyn 2014; Szarkowska et al. forthcoming). Unlike in the reading of static texts, the pace of reading is therefore in part dictated by the text rather than the reader – by the time the text is available to be read – and there is much less time for the reader to regress to an earlier part of a sentence or phrase, and no opportunity to return to previous sentences. Reading thus takes place in a limited window which the reader is acutely aware will disappear in a few seconds. Even though there are exceptions to the level of control a viewer has – for example, DVD, PVR and other electronic media allow the viewer to rewind and fast-forward at will – the typical viewing of subtitles for most audiovisual products happens continuously and without pauses, just as when watching live television.

Regressions, which form an important consideration in the reading of static text, take on a different aspect given the viewer’s knowledge that dwelling too long on any part of a subtitle may make it difficult to finish reading the subtitle before it disappears. Any subtitle is on screen for between one and six seconds, during which the viewer also has to process all the other auditory (in the case of hearing audiences) and visual cues. In other words, unlike when reading printed text, reading becomes only one of the cognitive processes the viewer has to juggle in order to understand the audiovisual text as a whole. Some regressions are in fact triggered by the change of the image at shot changes (and, to a much lesser extent, scene changes) when the text stays on across these boundaries: the viewer sometimes returns to the beginning of the subtitle to check whether it is a new subtitle, and sometimes even re-reads it. For example, in a recent study, Krejtz et al. (2013) established that participants tend not to re-read subtitles after a shot change or cut, but their data also revealed that a proportion of the participants did return their gaze to the beginning of the subtitle after such a change (see also De Linde and Kay, 1999). What this means for the study of subtitle reading is that these momentary returns (even if only for checking) constitute a class of eye movements that are not in fact regressions to re-read a word or section, but rather false initiations of reading of what some viewers initially perceive to be a new sentence.

On the positive side, the fact that subtitles are embedded on a moving image and are accompanied by a soundtrack (in the case of hearing audiences) facilitates the processing of language in context. Unfortunately, this context also introduces competition for attention and cognitive resources. For the Deaf and hard of hearing audience, attention has to be divided between reading the subtitles and processing the scene, extracting information from facial expressions, lip movements and gestures, and matching or checking this against the information obtained in the subtitles. For the hearing audience who makes use of subtitles for support or to provide access to foreign language dialogue, attention is likewise divided between subtitles and the visual scene, and just as the Deaf and hard of hearing audiences have the added demand on their cognitive resources of having to match what they read with what they get from non-verbal signs and lip movements, the hearing audience matches what they read with what they hear, checking for correspondence of information and interpreting intonation, tenor and other non-verbal elements of speech.

What stands beyond doubt is that the appearance of subtitles changes the viewing process. In 2000, Jensema et al. famously stated that “the addition of captions to a video resulted in major changes in eye movement patterns, with the viewing process becoming primarily a reading process” (2000a, 275). Having examined the eye movements of six subjects watching video clips with and without subtitles, they found that the onset of a subtitle triggers a change in the eye movement pattern: when a subtitle appears, viewers move their gaze from whatever they were watching in order to follow the subtitle. In a larger-scale study, d’Ydewalle and De Bruycker (2007, 196) concluded that “paying attention to the subtitle at its presentation onset is more or less obligatory and is unaffected by major contextual factors such as the availability of the soundtrack, knowledge of the foreign language in the soundtrack, and important episodic characteristics of actions in the movie: Switching attention from the visual image to “reading” the subtitles happens effortlessly and almost automatically”.

Subtitles therefore appear to bias eye movements in much the same way as faces (see Hershler & Hochstein, 2005; Langton, Law, Burton, & Schweinberger, 2008; Yarbus, 1967), the centre of the screen, contrast and movement do. Eyes are drawn to subtitles not only because the text is identified as a source of meaningful information (a top-down impulse, as the viewer consciously consults the subtitles to obtain relevant information), but also because of the change to the scene that the appearance of a subtitle causes (a bottom-up impulse, automatically drawing the eyes to what has changed on the screen).

As in most other contexts, the degree to which viewers will process the subtitles (i.e. read them rather than merely look at them when they appear and then look away) will be determined by the extent to which they need the subtitles to follow the dialogue or to obtain information on relevant sounds. In studying visual attention to subtitles it therefore remains a priority to measure the degree of processing, something that has not been done in more than a handful of studies, and something to which we will return later in the article.

Viewers usually attend to the image on the screen, but when subtitles appear, it only takes a few frames for most viewers to move their gaze to read the subtitles. The fact that people tend to move their gaze to subtitles the moment they appear on the screen is illustrated in Figures 1 and 2.

Figure. 1 Heat maps of three consecutive film stills – Polish news programme Fakty (TVN) with intralingual subtitles.


Figure 2. Heat maps of two consecutive film stills – Polish news programme Wiadomości (TVP1) with intralingual subtitles


Likewise, when the gaze of a group of viewers watching an audiovisual text without subtitles is compared to that of a similar group watching the same text with subtitles, the split in attention is immediately visible as the second group reads the subtitles and attends less to the image, as can be seen in Figure 3.

Figure 3. Heat maps of the same scene seen without subtitles and with subtitles – recording of an academic lecture.


Viewer-internal factors that impact on subtitle processing

The degree to which subtitles are processed is far from straightforward. In a study performed at a South African university, Sesotho students watching a recorded lecture with subtitles in their first language and audio in English (their language of instruction) were found to avoid looking at the subtitles (see Kruger, Hefer and Matthew, 2013b). Sesotho students in a different group who saw the same lecture with English subtitles processed the subtitles to a much larger extent. This contrast is illustrated in the focus maps in Figure 4.


Figure 4. Focus maps of Sesotho students looking at a lecture in intralingual English subtitles (left) and another group looking at the same lecture with interlingual Sesotho subtitles (right) – recording of an academic lecture.

The difference in eye movement behaviour between the conditions is also evident when considering the number of subtitles skipped. Participants in the above study who saw the video with Sesotho subtitles skipped an average of around 50% of the Sesotho subtitles (median at around 58%), whereas participants who saw the video with English subtitles only skipped an average of around 20% of the English subtitles (with a median of around 8%) (see Kruger, Hefer & Matthew, 2014).
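Skip rates of this kind are straightforward to derive once each subtitle event has been assigned its fixations. A minimal sketch in Python (the fixation counts are invented for illustration, not data from the study):

```python
# Illustrative sketch: computing a participant's subtitle skip rate as
# the proportion of subtitles that attracted no fixations at all.
# The per-subtitle fixation counts below are hypothetical.

def skip_rate(fixation_counts):
    """Fraction of subtitles receiving zero fixations."""
    skipped = sum(1 for n in fixation_counts if n == 0)
    return skipped / len(fixation_counts)

# One hypothetical participant's fixation counts over ten subtitles.
counts = [3, 0, 2, 0, 0, 4, 1, 0, 0, 2]
print(f"{skip_rate(counts):.0%}")  # 50%
```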

This example does not, however, represent the conventional use of subtitles, where viewers rely on the subtitles to gain access to a text from which they would otherwise have been excluded. It does serve to illustrate that subtitle reading is not unproblematic and that more research is needed on the nature of processing in different contexts by different audiences. For example, in a study in Poland, interlingual subtitles (English to Polish) were skipped slightly less often by hearing viewers than intralingual subtitles (Polish to Polish), possibly because hearing viewers did not need the latter to follow the plot (see Szarkowska et al., forthcoming).

Another important finding from eye tracking studies on the subtitle process relates to how viewers typically go about reading a subtitle. Jensema et al. (2000) found that in subtitled videos, “there appears to be a general tendency to start by looking at the middle of the screen and then moving the gaze to the beginning of a caption within a fraction of a second. Viewers read the caption and then glance at the video action after they finish reading” (2000, 284). This pattern is indeed often found, as illustrated in the sequence of frames from a short video from our study in Figure 5.

Figure 5. Sequence of typical subtitle reading – a recording of Polish news programme Fakty (TVN) with intralingual subtitles.


Some viewers, however, do not read so smoothly and tend to shift their gaze between the image and the subtitles, as demonstrated in Figure 6. These gaze shifts between image and subtitle, also referred to in the literature as ‘deflections’ (de Linde and Kay 1999) or ‘back-and-forth shifts’ (d’Ydewalle and De Bruycker 2007), can be regarded as an indication of the smoothness of the subtitle reading process: the fewer the gaze shifts, the more fluent the reading, and vice versa.
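Deflection counts can be derived directly from a fixation sequence once each fixation has been labelled with an area of interest (AOI). A minimal sketch, with hypothetical AOI labels and invented data:

```python
# Illustrative sketch (not from any cited study): counting 'deflections'
# -- gaze shifts between the subtitle area and the rest of the image --
# from a sequence of fixations labelled by area of interest (AOI).

def count_deflections(fixation_aois):
    """Count transitions between consecutive fixations in different AOIs."""
    shifts = 0
    for prev, curr in zip(fixation_aois, fixation_aois[1:]):
        if prev != curr:
            shifts += 1
    return shifts

# A smooth reading pattern: one shift down to the subtitle, one back up.
smooth = ["image", "subtitle", "subtitle", "subtitle", "image"]
# Frequent back-and-forth shifting between text and image.
erratic = ["image", "subtitle", "image", "subtitle", "image", "subtitle"]

print(count_deflections(smooth))   # 2
print(count_deflections(erratic))  # 5
```

Fewer transitions indicate a more fluent reading episode; many transitions flag the back-and-forth pattern described above.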

Figure 6. Scanpath of frequent gaze shifting between text and image – a recording of Polish news programme Fakty (TVN) with intralingual subtitles.


An important factor that influences subtitle reading patterns is the nature of the audience. In Figure 7 an interesting difference is shown between the way a Deaf and a hard of hearing viewer watched a subtitled video. The Deaf viewer moved her gaze from the centre of the screen to read the subtitle and then, after having read the subtitle, returned the gaze to the centre of the screen. In contrast, the hard of hearing viewer made constant comparisons between the subtitles and the image, possibly relying on residual hearing and trying to support the subtitle reading process with lip-reading. Such a result was reported by Szarkowska et al. (2011), who found differences in the number of gaze shifts between the subtitles and the image in the verbatim subtitles condition, particularly discernible (and statistically significant) in the hard of hearing group (when compared to the hearing and Deaf groups).

Figure 7. Scanpaths of Deaf and hard of hearing viewers. Left: Gaze plot illustrating the viewing pattern of a Deaf participant watching a clip with verbatim subtitles.  Right: Gaze plot illustrating the viewing pattern of a hard of hearing participant watching a clip with verbatim subtitles.


These provisional qualitative indications of differences between eye movements of users with different profiles require more in-depth quantitative investigation and the subsequent section will provide a few steps in this direction.

As mentioned above, subtitle reading patterns largely depend on the type of viewers. Fluent readers have been found to have no difficulty following subtitles. Diao et al. (2007), for example, found a direct correlation between the impact of subtitles on learning and the academic and literacy levels of participants. Similarly, given that “hearing status and literacy tend to covary” (Burnham et al. 2008, 392), some previous studies found important differences in the way hearing and hearing-impaired people watch subtitled programmes. Robson (2004, 21) notes that “regardless of their intelligence, if English is their second language (after sign language), they [i.e. Deaf people] cannot be expected to have the same comprehension levels as hearing people who grew up exposed to English”. This is indeed confirmed by Szarkowska et al. (forthcoming) who report that Deaf and hard of hearing viewers in their study made more fixations on subtitles and that their dwell time on the subtitles was longer compared to hearing viewers. This result may indicate a larger effort needed to process subtitled content and more difficulty in extracting information (see Holmqvist et al. 2011, 387-388). This, in turn, may stem from the fact that for some Deaf people the language in the subtitles is not their mother tongue (their L1 being sign language). At the same time, for hearing-impaired viewers, subtitles provide an important source of information on the words spoken in the audiovisual text as well as other information contained in the audio track, which in itself explains the fact that they would spend more time looking at the subtitles.

Viewer-external factors that impact on subtitle processing

The ‘smoothness’ of the subtitle reading process depends on a number of factors, including the nature of the audiovisual material as well as technical and graphical aspects of the subtitles themselves. At a general level, genre has an impact both on the role of subtitles in the total viewing experience and on the way viewers process them. For example, d’Ydewalle and Van Rensbergen (1989) found that children in Grade 2 paid less attention to subtitles if a film involved a lot of action (see d’Ydewalle & De Bruycker 2007 for a discussion). One reason could simply be that action films tend to have less dialogue in the first place; more significantly, the pace of the visual editing and the use of special effects create a stronger visual element which shifts the balance of content towards the action (visual content) and away from dialogue (soundtrack and therefore subtitles). This, however, is an area that has to be investigated empirically. At a more specific level, technical characteristics of an audiovisual text such as film editing have an impact on the processing of subtitles.

1 Film editing

Film editing has a strong influence on the way people read subtitles, even beyond the difference in editing pace as a result of genre (for example, action and experimental films could typically be said to have a higher editing pace than dramas and documentaries). In terms of audience perception, viewers have been found to be unaware of standard film editing techniques (such as continuity editing) and are thus able to perceive film as a continuous whole in spite of numerous cuts – the phenomenon termed “edit blindness” (Smith & Henderson, 2008, 2). With more erratic and fast-paced editing, it stands to reason that the cognitive demands will increase as viewers have to work harder to sustain the illusion of a continuous whole.

When subtitles clash with editing such as cuts (i.e. if subtitles stay on screen over a shot or scene change), conventional wisdom as passed on by generations of subtitling guides (see Díaz Cintas & Remael 2007, ITC Guidance on Standards for Subtitling 1999) suggests that the viewer will assume that the subtitle has changed with the image and as a consequence they will re-read it (see above). However, Krejtz et al. (2013) reported that subtitles displayed over shot changes are more likely to cause perceptual confusion by making viewers shift their gaze between the subtitle and the rest of the image more frequently than subtitles which do not cross film cuts (cf. de Linde and Kay 1999). As such, the cognitive load is bound to increase.

2 Text chunking and line segmentation

Another conventional wisdom, perpetuated in subtitling guidelines and standards, is that poor line segmentation will result in less efficient processing (see Díaz Cintas & Remael 2007, Karamitroglou 1998). In other words, subtitles should be chunked, per line and between subtitles, into self-contained semantic units. The line of dialogue “He told me he would meet me at the red mailbox” should therefore be segmented in one of the following ways:

He told me he would meet me
at the red mailbox.


He told me
he would meet me at the red mailbox.

Neither of the following segmentations would be optimal because the prepositional phrase ‘at the red mailbox’ and the verb phrase ‘he would meet me’, respectively, are split, which is considered an error:

He told me he would meet me at the
red mailbox.

He told me he
would meet me at the red mailbox.
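The rule illustrated above – break near the visual midpoint, but only at a phrase boundary, never inside a phrase – can be sketched in code. This is an illustrative toy, not actual subtitling software: the legal break points are supplied by hand here, whereas real systems would derive them from syntactic analysis.

```python
# Illustrative sketch: segmenting a subtitle into two lines at a
# semantic (phrase) boundary rather than at the raw midpoint of the text.
# The break_after indices (word positions after which a break is legal)
# are hypothetical hand-supplied input.

def segment(words, break_after):
    """Split a word list into two lines, breaking only after an index
    listed in break_after, choosing the break closest to the midpoint."""
    target = len(" ".join(words)) / 2
    best = min(break_after,
               key=lambda i: abs(len(" ".join(words[:i + 1])) - target))
    return " ".join(words[:best + 1]), " ".join(words[best + 1:])

words = "He told me he would meet me at the red mailbox".split()
# Legal break points: after word 2 ("me") and word 6 ("me"), i.e. at
# clause boundaries -- never inside "at the red mailbox".
line1, line2 = segment(words, break_after=[2, 6])
print(line1)  # He told me he would meet me
print(line2)  # at the red mailbox
```

Removing the phrase-boundary constraint (allowing a break after any word) would reproduce the ill-segmented versions above, splitting the prepositional phrase.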

However, Perego et al. (2010) found that poor line segmentation in two-line subtitles did not affect subtitle comprehension negatively. They also investigated 28 subtitles viewed by 16 participants, using a threshold line between the subtitle region and the upper part of the screen (the main film zone), but did not find a statistically significant difference between the well-segmented and ill-segmented subtitles in terms of fixation counts, total fixation time, or number of shifts between the subtitle region and the upper area. The only statistically significant difference they found was in the mean fixation duration within the subtitle area, with fixations on the ill-segmented subtitles being on average 12 ms longer than on the well-segmented subtitles. Although the authors downplay this difference on the grounds that it is so small, it does seem to indicate at least a slightly higher cognitive load when subtitles are ill-segmented. The small number of subtitles and participants, however, makes it difficult to generalize from their results – again a consequence of the fact that it is difficult to extract reading statistics for subtitles unless reading behaviour can be quantified over longer audiovisual texts.

In a study conducted a few years later, Rajendran et al. (2013) found that “chunking improves the viewing experience by reducing the amount of time spent on reading subtitles” (2013, 5). This study compared conditions different from those investigated by Perego et al. (2010), excluding their ill-segmented condition, and focused mostly on live subtitling with respeaking. In the earlier study, which focused on pre-recorded subtitling, the subtitles in the two conditions were essentially still part of one sense unit that appeared as one two-line subtitle. In the later study, the conditions were: chunked by phrase (similar to the well-segmented condition of the earlier study, but with phrases appearing one by one on one line), no segmentation (where the subtitle area was filled with as much text as possible with no attempt at segmentation), word by word (where words appeared one by one), and chunked by sentence (where sentences showed up one by one). Although this later study therefore essentially investigated different conditions, it did find that the most disruptive condition was the one where the subtitle appeared word by word – eliciting more gaze points (defined less strictly than in the fixation algorithms used by commercial eye trackers) and more “saccadic crossovers”, or switches between image and subtitle area. However, in this study the videos were extremely short (under a minute) and the sound was muted, hampering the ecological validity of the material and once again making the findings less suitable for generalization.

Although both these studies have limitations in terms of generalizability, they both provide some indication that segmentation has an impact on subtitle processing. Future studies will nonetheless have to investigate this aspect over longer videos to determine whether the graphical appearance, and particularly the segmentation of subtitles, has a detrimental effect on subtitle processing in terms of cognitive load and effectiveness.

3 Language

The language of subtitles has received considerable attention from psycholinguists in the context of subtitle reading. D’Ydewalle and de Bruycker (2007) examined eye movement behaviour of people reading standard interlingual subtitles (with the audio track in a foreign language and subtitles in their native language) and reversed subtitles (with the audio in their mother tongue and subtitles in a foreign language). They found more regular reading patterns in the standard interlingual subtitling condition, with the reversed subtitling condition having more subtitles skipped, fewer fixations per subtitle, etc. (see also d’Ydewalle and de Bruycker 2003 and Pavakanun 1993). This is an interesting finding in itself, as it is the reversed subtitling that has been found to be particularly conducive to foreign language learning (see Díaz Cintas and Fernández Cruz 2008, and Vanderplank 1988).

Szarkowska et al. (forthcoming) examined differences in reading patterns of intralingual (Polish to Polish) and interlingual (English to Polish) subtitles among a group of Deaf, hard of hearing and hearing viewers. They found no differences in reading for the Deaf and hard of hearing audiences, but hearing people made significantly more fixations to subtitles when watching English clips with interlingual Polish subtitles than Polish clips with intralingual Polish subtitles. This confirms that the hearing viewers processed the subtitles to a significantly lower degree when they were redundant, as in the case of intralingual transcriptions of the soundtrack. What would be interesting to investigate in this context is those instances when the hearing audience did in fact read the subtitles, to determine to what extent and under what circumstances the redundant written information is used by viewers to support their auditory intake of information.

In a study on the influence of translation strategies on subtitle reading, Ghia (2012) investigated differences in the processing of literal vs. non-literal translations into Italian of an English film clip (6 minutes) when watched by Italian EFL learners. According to Ghia, just as subtitle format, layout and segmentation have the potential to affect visual and perceptual dynamics, the relationship translation establishes with the original text means that “subtitle translation is also likely to influence the perception of the audiovisual product and viewers’ general reading patterns” (2012, 175). Ghia particularly wanted to investigate the processing of different translation strategies in the presence of sound and image. She found that non-literal translations (where the target text diverged from the source text) resulted in more deflections between text and image – similar to the finding of Rajendran et al. (2013) that word-by-word subtitles produce less fluent viewing.

As can be seen from the above, the aspect of language processing in the context of subtitled audiovisual texts has received some attention, but has not to date been approached in any comprehensive manner. In particular, there is a need for more psycholinguistic studies to determine how subtitle reading differs from the reading of static text, and how this knowledge can be applied to the practice of subtitling.

Measuring subtitle processing

1 Attention distribution and presentation speed

In the study by Jensema et al. (2000), subjects spent on average 84% of the time looking at subtitles, 14% at the video picture and 2% outside of the frame. The study represents an important early attempt to identify subtitle reading patterns, but it has considerable limitations: it had only six participants (three deaf and three hearing), and the video clips were extremely short (around 11 seconds each), presented with English subtitles (in upper case) and without sound. The absence of a soundtrack therefore increased the time spent on the subtitles. In Perego et al.'s study (2010), the ratio is reported as 67% on the subtitle area and 33% on the image; here 41 Italian participants watched a 15-minute clip with a Hungarian soundtrack and subtitles in Italian, so the audience likewise had to rely heavily on the subtitles in order to follow the dialogue. Kruger et al. (2014), in the context of intralingual subtitles in a Psychology lecture in English, found a ratio of 43% on subtitles, 43% on the speaker and slides, and 14% on the rest of the screen. When the same lecture was subtitled into Sesotho, the ratio changed to 20% on the subtitles, 66% on the speaker and slides, and 14% on the rest of the screen. This wide range indicates how the distribution of visual attention varies across contexts with different language combinations, different levels of redundancy of information, and different audiences.

In order to account for “the audiovisual nature of subtitled programmes”, Romero-Fresco (in press) puts forward the notion of ‘viewing speed’ – as opposed to reading speed and subtitling speed – which he defines as “the speed at which a given viewer watches a piece of audiovisual material, which in the case of subtitling includes accessing the subtitle, the accompanying images and the sound, if available”. The perception of subtitled programmes is therefore a result not only of subtitle reading patterns, but also of the visual elements of the film. Based on an analysis of over seventy-one thousand subtitles created in the course of the Digital Television for All project, Romero-Fresco provides the following data on viewing speed, reflecting the proportion of time viewers spend looking at subtitles and at the images as a function of the subtitle presentation rate (see Table 1).

Viewing speed    Time on subtitles    Time on images
120 wpm          ±40%                 ±60%
150 wpm          ±50%                 ±50%
180 wpm          ±60–70%              ±40–30%
200 wpm          ±80%                 ±20%

Table 1. Viewing speed and distribution of gaze between subtitles and images (Romero-Fresco) 
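As a rough illustration of how figures like those in Table 1 could be applied, the sketch below linearly interpolates between the tabulated points to estimate the proportion of viewing time spent on subtitles at an arbitrary presentation rate. This is an approximation for illustration only (midpoints are used where the table gives a range), not part of Romero-Fresco's analysis.

```python
# Illustrative sketch: interpolating the Table 1 figures to estimate
# time-on-subtitles at intermediate presentation rates. Range entries
# in the table (e.g. ±60%-70%) are replaced by their midpoints.

# (presentation rate in wpm, approx. proportion of time on subtitles)
TABLE = [(120, 0.40), (150, 0.50), (180, 0.65), (200, 0.80)]

def time_on_subtitles(wpm):
    """Estimated proportion of viewing time spent on subtitles."""
    if wpm <= TABLE[0][0]:
        return TABLE[0][1]
    if wpm >= TABLE[-1][0]:
        return TABLE[-1][1]
    for (x0, y0), (x1, y1) in zip(TABLE, TABLE[1:]):
        if x0 <= wpm <= x1:
            return y0 + (y1 - y0) * (wpm - x0) / (x1 - x0)

print(round(time_on_subtitles(165), 3))  # 0.575
```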

Jensema et al. also suggested that the subtitle presentation rate may influence the time spent reading subtitles vs. watching the rest of the image: “higher captioning speed results in more time spent reading captions on a video segment” (2000, 275). This was later confirmed by Szarkowska et al. (2011), who found that viewers spent more time on verbatim subtitles displayed at higher presentation rates than on edited subtitles displayed at lower rates, as illustrated in Figure 8.

Figure 8. Fixation-count based heatmaps illustrating changes in attention allocation of hearing and Deaf viewers watching videos subtitled at different rates.


2 Mean fixation duration

Irwin (2004, 94) states that “fixation location corresponds to the spatial locus of cognitive processing and that fixation or gaze duration corresponds to the duration of cognitive processing of the material located at fixation”. Within the same activity (e.g. reading), longer mean fixation durations can therefore be taken to reflect more cognitive processing and higher cognitive load. One would thus expect viewers to have longer fixations when the subject matter is more difficult, or when the language is more specialized. Across activities, however, comparisons of fixation duration are less meaningful, as reading elicits more, and shorter, fixations than scene perception or visual scanning simply because of the nature of the activities. It is therefore essential in eye tracking studies of subtitle reading to distinguish between the actual subtitles when they are on screen, the rest of the screen, and the subtitle area when there is no text (between successive subtitles).
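The three-way distinction argued for here (subtitle text, rest of the screen, empty subtitle area) amounts to checking each fixation's onset against the subtitle display intervals. A minimal sketch, where the interval representation is my own assumption rather than any tracker's format:

```python
def classify_fixation(onset_ms, in_subtitle_area, subtitle_intervals):
    """Assign a fixation to one of the three regions discussed above.

    `subtitle_intervals` is a list of (start_ms, end_ms) display times
    for successive subtitles (a hypothetical representation).
    """
    if not in_subtitle_area:
        return "rest of screen"
    if any(start <= onset_ms < end for start, end in subtitle_intervals):
        return "subtitle text"
    # Gaze in the subtitle area while no subtitle is displayed.
    return "empty subtitle area"

intervals = [(0, 2000), (3500, 6000)]
print(classify_fixation(1200, True, intervals))   # subtitle text
print(classify_fixation(2500, True, intervals))   # empty subtitle area
print(classify_fixation(2500, False, intervals))  # rest of screen
```

Mean fixation durations can then be computed per region, rather than over a single crude subtitle-area AOI that conflates reading with empty-area gaze.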

The difference between reading and scene perception is illustrated in Figure 9: fixations on the image tend to be longer (indicated here by bigger circles) and more exploratory in nature than the shorter, more focused fixations on the subtitle text (see the distinction between focal and ambient fixations in Velichkovsky et al. 2005).

Figure 9. Differences in fixation durations between the image and subtitle text – from Polish TV series Londyńczycy.


Rayner (1984) indicated the impact of different tasks on mean fixation durations, as reflected in Table 2 below:

Task               Mean fixation duration (ms)   Mean saccade size (degrees)
Silent reading     225                           2 (about 8 letters)
Oral reading       275                           1.5 (about 6 letters)
Visual search      275                           3
Scene perception   330                           4
Music reading      375                           1
Typing             400                           1 (about 4 letters)

Table 2. Approximate Mean Fixation Duration and Saccade Length in Reading, Visual Search, Scene Perception, Music Reading, and Typing[1]

In subtitling, silent reading is accompanied by simultaneous processing of the same information in the soundtrack (in the same or another language) as well as of other sounds and visual signs (for a hearing audience, that is – for a Deaf audience, it would be text and visual signs). The difference in mean fixation duration across these tasks therefore reflects differences in cognitive load. In silent reading of static text, there is no external competition for cognitive resources. When reading out loud, the speaker/reader inevitably monitors his or her own reading, introducing additional cognitive load. As the nature of the sign becomes more abstract, the load, and with it the fixation duration, increases; in the case of typing, different processing, production and checking activities are performed simultaneously, resulting in even higher cognitive load. This is inevitably an oversimplification of cognitive load, and indeed the nature of information acquisition in reading successive groups of letters (words) in a linear fashion differs significantly from that of scanning a visual scene for cues.

Undoubtedly, subtitle reading imposes different cognitive demands, and these demands are also very much dependent on the audience. In an extensive study on the differences in subtitle reading between Deaf, hard of hearing and hearing participants, we found a high degree of variation in mean fixation duration between the groups, as well as, within the Deaf and the hard of hearing groups, a difference in mean fixation duration between subtitles presented at 12 characters per second and at 15 characters per second (see Szarkowska et al. forthcoming).

                 12 characters per second   15 characters per second
Deaf             241.93 ms                  232.82 ms
Hard of hearing  218.51 ms                  214.78 ms
Hearing          186.66 ms                  186.58 ms

Table 3. Mean fixation duration when reading subtitles presented at different rates

Statistical analyses with mean fixation duration as the dependent variable and group and speed as categorical factors produced a statistically significant main effect, further confirmed by subsequent t-tests that yielded statistically significant differences in mean fixation duration between all three groups at both subtitling speeds. The difference between 12 cps and 15 cps was also significant within the Deaf and the hard of hearing groups. This suggests that subtitle presentation rate has a more pronounced effect on Deaf and hard of hearing viewers than on hearing ones.
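The design described above (three groups by two presentation rates, with mean fixation duration as the dependent variable) rests on comparing cell means such as those in Table 3. A sketch of that descriptive step, with invented per-participant values whose pattern merely echoes Table 3 (a rate effect for Deaf viewers, essentially none for hearing viewers):

```python
from statistics import mean

# Invented per-participant mean fixation durations (ms); only the
# ordering of the cells mirrors Table 3, not the actual data.
data = [
    ("deaf", "12cps", 245.0), ("deaf", "12cps", 239.0),
    ("deaf", "15cps", 235.0), ("deaf", "15cps", 231.0),
    ("hearing", "12cps", 187.0), ("hearing", "12cps", 186.0),
    ("hearing", "15cps", 187.0), ("hearing", "15cps", 186.0),
]

def cell_mean(rows, group, speed):
    """Mean of the dependent variable within one design cell."""
    return mean(v for g, s, v in rows if g == group and s == speed)

# Within-group effect of presentation rate (12 cps minus 15 cps):
deaf_effect = cell_mean(data, "deaf", "12cps") - cell_mean(data, "deaf", "15cps")
hearing_effect = cell_mean(data, "hearing", "12cps") - cell_mean(data, "hearing", "15cps")
print(deaf_effect, hearing_effect)  # 9.0 0.0
```

The inferential step (main effects and follow-up t-tests) would then be run on these per-participant values with standard statistical software; the sketch only shows how the factorial cells are organised.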

3 Subtitle reading

As indicated at the outset, one of the biggest hurdles in studying the processing of subtitles is the fact that, as far as eye tracking analysis software is concerned, the subtitles appear as image on image rather than text on image. Whereas reading statistics software can automatically mark words as areas of interest in static texts, and then calculate the number of regressions, refixations, saccade lengths, fixation durations and fixation counts for specific words, this process has to be done manually for subtitles. Because it is virtually impossible to create similar areas of interest on subtitle words embedded in the image across large numbers of subtitles, it is very difficult to obtain reliable eye tracking results on subtitles as text. This explains the predominance of measures such as fixation count and fixation duration, as well as shifts between subtitle area and image, in eye tracking studies on subtitle processing. As a result, many of these studies do not distinguish directly between looking at the subtitle area and reading the subtitles, and “they tend to define crude areas of interest (AOIs), such as the entire subtitle area, which means that eye movement data are also collected for the subtitle area when there are no subtitles on screen, which further skews the data” (Kruger and Steyn, 2014, 109).

Although a handful of studies come closer to studying subtitle reading by going beyond fixation counts, mean fixation duration, and shifts between subtitle area and image area, most studies tend to focus on the amount of attention rather than the nature of attention. Briefly, the exceptions can be identified in the following studies: Specker (2008) looks at consecutive fixations; Perego et al. (2010) add the path length (sum of saccade lengths in pixels) to the more conventional measures; Rajendran et al. (2013) add the proportion of gaze points; Ghia (2012) looks at fixations on specific words as well as regressions; Bisson et al. (2014) look at the number of subtitles skipped and the proportion of successive fixations (number of successive fixations divided by total number of fixations); and in one of the most comprehensive studies on the subject of subtitle processing, d’Ydewalle and De Bruycker (2007) look at attention allocation (percentage of skipped subtitles, latency time, and percentage of time spent in the subtitle area), fixations (number, duration, and word-fixation probability), and saccades (saccade amplitude, percentage of regressive eye movements, and number of back-and-forth shifts between visual image and subtitle).

In a recent study, Kruger and Steyn (2014) provide a reading index for dynamic texts (RIDT) designed specifically to measure the degree of reading that takes place when subtitled material is viewed. This index is explained as “a product of the number of unique fixations per standard word in any given subtitle by each individual viewer and the average forward saccade length of the viewer on this subtitle per length of the standard word in the text as a whole” (2014, 110). Taking as the point of departure the location and start time of successive fixations within the subtitle area while a subtitle is present, the number of unique fixations (i.e. excluding refixations and fixations following a regression) is determined, as well as the average length of forward saccades in the subtitle. Calculating the number of fixations per word, and the length of saccades as a ratio of the length of the average word in the audiovisual text, gives an indication of the meaningful processing of the words in the subtitle. Essentially, the formula quantifies the reading of a particular subtitle by a particular participant by measuring eye movement during subtitle reading against what is known about eye movements during reading and the perceptual span.

In a little more detail, the formula can be written as follows for video v, with participant p viewing subtitle s:


(Kruger and Steyn, 2014, 110).

This index was validated against a manual inspection of the reading of 145 subtitles by 17 participants, and makes it possible to study the reading of subtitles over extended texts. In their study, Kruger and Steyn (2014) use the index to determine the relationship between subtitle reading and performance in an academic context, finding a significant positive correlation between the degree to which participants read the subtitles and their performance in a test written after watching subtitled lectures. The RIDT therefore presents a robust index of the degree to which subtitles are processed over extended texts, and could add significant value to psycholinguistic studies on subtitles. Using the index, previous claims that subtitles have a positive or negative impact on comprehension, vocabulary acquisition, language learning or other dependent variables can be weighed against whether, and to what extent, viewers actually read the subtitles.
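Since the published formula cannot be reproduced here, the prose definition can still be turned into a calculation: unique fixations per standard word, scaled by the mean forward saccade length as a ratio of the standard word length. A sketch under that reading (the function and parameter names are mine, not Kruger and Steyn's):

```python
def ridt(unique_fixations, standard_words, mean_forward_saccade_px, standard_word_px):
    """Reading index for dynamic texts, per the prose description:
    (unique fixations per standard word) multiplied by (mean forward
    saccade length as a ratio of the standard word length)."""
    return (unique_fixations / standard_words) * (mean_forward_saccade_px / standard_word_px)

# A subtitle of 8 standard words read with 8 unique fixations, with
# forward saccades averaging one standard word in length:
print(ridt(8, 8, 60.0, 60.0))  # 1.0
```

On this reading, a score near 1 indicates thorough word-by-word reading of the subtitle, while a score near 0 indicates that the subtitle was largely skipped.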


From this overview of studies investigating the processing of subtitles on the moving image it should be clear that much still needs to be done to gain a better understanding of the impact of various independent variables on subtitle processing. The complexity of the multimodal text, and in particular the competition between different sources of information, means that a subtitled audiovisual text is a substantially altered product from a cognitive perspective. Much progress has been made in coming to grips with the way different viewers behave when looking at subtitled audiovisual texts, but there are still more questions than answers – relating, for instance, to differences in how people process subtitled content on various devices (cf. the HBBTV4ALL project). The use of physiological measures like eye tracking and EEG (see Kruger et al. 2014) in combination with subjective measures like post-report questionnaires is, however, continually bringing us closer to understanding the impact of audiovisual translation like subtitling on the experience and processing of audiovisual texts.



This study was partially supported by research grant No. IP2011 053471 “Subtitling for the deaf and hard of hearing on digital television” from the Polish Ministry of Science and Higher Education for the years 2011–2014.



Bisson, Marie-Josée, Walter Van Heuven, Kathy Conklin, and Richard Tunney. 2014. “Processing of Native and Foreign Language Subtitles in Films: An Eye Tracking Study.” Applied Psycholinguistics 35(2):399-418.

Burnham, Denis, Greg Leigh, William Noble, Caroline Jones, Michael Tyler, Leonid Grebennikov, and Alex Varley. 2008. “Parameters in Television Captioning for Deaf and Hard-of-Hearing Adults: Effects of Caption Rate versus Text Reduction on Comprehension.” Journal of Deaf Studies and Deaf Education 13(3):391–404.

de Linde, Zoé and Neil Kay. 1999. The Semiotics of Subtitling. Manchester: St. Jerome.

Diao, Y., P. Chandler, and J. Sweller. 2007. “The Effect of Written Text on Comprehension of Spoken English as a Foreign Language.” The American Journal of Psychology 120(2):237–261.

Díaz Cintas, Jorge and Marco Fernandez Cruz. 2008. “Using Subtitled Video Materials for Foreign Language Instruction.” In The Didactics of Audiovisual Translation, edited by Jorge Díaz Cintas, 201–214. Amsterdam/Philadelphia: John Benjamins.

Díaz Cintas, Jorge and Aline Remael. 2007. Audiovisual Translation: Subtitling. Manchester: St. Jerome.

d’Ydewalle, Géry and Wim De Bruycker. 2003. Reading native and foreign language television subtitles in children and adults. In The mind’s eyes: Cognitive and applied aspects of eye movement research, edited by J. Hyönä, R. Radach and H. Deubel, 444-461. New York: Springer-Verlag.

d’Ydewalle, Géry and Wim De Bruycker. 2007. “Eye Movements of Children and Adults while Reading Television Subtitles.” European Psychologist 12:196–205.

d’Ydewalle, Géry and Ingrid Gielen. 1992. “Attention Allocation with Overlapping Sound, Image, and Text.” In Eye Movements and Visual Cognition: Scene Perception and Reading, edited by Keith Rayner, 415–427. New York: Springer-Verlag.

d’Ydewalle, Géry, Johan Van Rensbergen, and Joris Pollet. 1987. Reading a message when the same message is available auditorily in another language: The case of subtitling. In Eye Movements: From Physiology to Cognition edited by J.K O’Reagan and A. Lévy Schoen, 313-321. Amsterdam: Elsevier Science Publishers B.V. (North-Holland).

Ghia, Elisa. 2012. “The Impact of Translation Strategies on Subtitle Reading.” In Eye Tracking in Audiovisual Translation, edited by Elisa Perego, 155–182. Roma: Aracne Editrice.

Gottlieb, Henrik. 1998. Subtitling. In Routledge Encyclopaedia of Translation Studies, edited by Mona Baker, 244-248. London & New York: Routledge.

Hershler, Orit and Shaul Hochstein. 2005. At first sight: a high-level pop out effect for faces. Vision Research, 45, 1707–1724.

Holmqvist, Kenneth et al. 2011. Eyetracking. A Comprehensive Guide to Methods and Measures. Oxford: Oxford University Press.

Irwin, David E. 2004. Fixation location and fixation duration as indices of cognitive processing. In J.M. Henderson & F. Ferreira (Eds.), The interface of language, vision, and action: Eye movements and the visual world, 105-133. New York, NY: Psychology Press.

ITC Guidance on Standards for Subtitling. Online at:

Jensema, Carl. 2000. “Eye Movement Patterns of Captioned TV Viewers.” American Annals of the Deaf 145(3):275–285.

Karamitroglou, Fotios. 1998. A Proposed Set of Subtitling Standards in Europe. Translation Journal 2(2).

Krejtz, Izabela, Agnieszka Szarkowska, and Krzysztof Krejtz. 2013. “The Effects of Shot Changes on Eye Movements in Subtitling.” Journal of Eye Movement Research 6 (5): 1–12.

Kruger, Jan-Louis and Faans Steyn. 2014. “Subtitles and Eye Tracking: Reading and Performance.” Reading Research Quarterly 49 (1): 105–120.

Kruger, Jan-Louis, Esté Hefer, and Gordon Matthew. 2013a. “Measuring the Impact of Subtitles on Cognitive Load: Eye Tracking and Dynamic Audiovisual Texts.” Proceedings of Eye Tracking South Africa 29-31 August 2013, Cape Town.

Kruger, Jan-Louis, Esté Hefer, and Gordon Matthew. 2013b. The impact of subtitles on academic performance at tertiary level. Paper presented at the Linguistics Society of Southern Africa annual conference in Stellenbosch, June, 2013.

Kruger, Jan-Louis. 2013. “Subtitles in the Classroom: Balancing the Benefits of Dual Coding with the Cost of Increased Cognitive Load.” Journal for Language Teaching 47(1):29–53.

Kruger, Jan-Louis, Hefer, Esté, and Gordon Matthew. 2014. Attention distribution and cognitive load in a subtitled academic lecture: L1 vs. L2. Journal of Eye Movement Research 7(5):4, 1–15.

Langton, Stephen R.H., Anna S. Law, Burton, A. Mike and Stefan R. Schweinberger. 2008. Attention capture by faces. Cognition, 107:330-342.

Pavakanun, Ubowanna. 1992. Incidental acquisition of foreign language through subtitled television programs as a function of similarity with native language and as a function of presentation mode. Unpublished doctoral thesis, Leuven, Belgium, University of Leuven.

Perego, Elisa, Fabio Del Missier, Marco Porta and Mauro Mosconi. 2010. “The Cognitive Effectiveness of Subtitle Processing.” Media Psychology 13(3):243–272.

Rajendran, Dhevi, Andrew Duchowski, Pilar Orero, Juan Martínez, and Pablo Romero-Fresco. 2013. “Effects of Text Chunking on Subtitling: A Quantitative and Qualitative Examination.” Perspectives: Studies in Translatology 21(1):5–31.

Rayner, Keith. 1984. Visual selection in reading, picture perception, and visual search: A tutorial review. In Attention and performance edited by H. Bouma and D. Bouhwhuis, vol. 10. Hillsdale, NJ: Erlbaum.

Rayner, Keith 1998. “Eye movements in reading and information processing: Twenty years of research.” Psychological Bulletin, 124:372–422.

Robson, Gary D. 2004. The closed captioning handbook. Amsterdam: Elsevier.

Romero-Fresco, Pablo. In press. The Reception of Subtitles for the Deaf and Hard of Hearing in Europe. Peter Lang.

Smith, Tim, and John M. Henderson. 2008. Edit Blindness: The relationship between attention and global change blindness in dynamic scenes. Journal of Eye Movement Research 2(2), 6:1-17.

Specker, Elizabeth, A. 2008. L1/L2 Eye Movement Reading of Closed Captioning: A Multimodal Analysis of Multimodal Use. Unpublished PhD thesis. University of Arizona.

Szarkowska, Agnieszka, Krejtz, Izabela, and Łukasz Dutka. (forthcoming) The effects of subtitle presentation rate, text editing and type of subtitling on the comprehension and reading patterns of subtitles among deaf, hard of hearing and hearing viewers. To appear in: Across Languages and Cultures 2016, vol. 2.

Szarkowska, Agnieszka, Krejtz, Izabela, Kłyszejko, Zuzanna and Anna Wieczorek. 2011. “Verbatim, standard, or edited? Reading patterns of different captioning styles among deaf, hard of hearing, and hearing viewers”. American Annals of the Deaf 156 (4):363-378.

Vanderplank, Robert. 1988 “The value of teletext sub-titles in language learning”. ELT Journal 42(4):272-81.

Velichkovsky, Boris M., Markus Joos, Jens R. Helmert, and Sebastian Pannasch. 2005. “Two Visual Systems and Their Eye Movements: Evidence from Static and Dynamic Scene Perception.” In CogSci 2005: Proceedings of the XXVII Conference of the Cognitive Science Society, 2283–2288.

Winke, Paula, Susan Gass, and Tetyana Syderenko. 2013. “Factors Influencing the Use of Captions by Foreign Language Learners: An Eye Tracking Study.” The Modern Language Journal 97 (1):254–275.

Yarbus, Alfred L. 1967. Eye movements and vision. New York, NY: Plenum Press.



[1] Values are taken from a number of sources and vary depending on a number of factors (see Rayner, 1984).



Jan-Louis Kruger is director of translation and interpreting in the Department of Linguistics at Macquarie University in Sydney, Australia.  He holds a PhD in English on the translation of narrative point of view. His main research interests include studies on the reception and cognitive processing of audiovisual translation products including aspects such as cognitive load, comprehension, attention allocation, and psychological immersion.

Agnieszka Szarkowska, PhD, is Assistant Professor in the Institute of Applied Linguistics at the University of Warsaw, Poland. She is the founder and head of the Audiovisual Translation Lab, a research group working on media accessibility. Her main research interests lie in audiovisual translation, especially subtitling for the deaf and the hard of hearing, and audio description.

Izabela Krejtz, PhD, is Assistant Professor at University of Social Sciences and Humanities, Warsaw. She is a co-founder of Eyetracking Research Center at USSH. Her research interests include neurocognitive and educational psychology. Her applied work focuses on pro-positive trainings of attention control, eye tracking studies in perception of audiovisual material and emotions regulation.

Reaching for the Screen in Nine Inch Nails’ ‘Lights in the Sky’ – Katheryn Wright

Abstract: During the Nine Inch Nails’ Lights in the Sky tour in 2008, Trent Reznor made use of two semi-transparent stealth screens layered in front of a third screen, through which the band performed the second and third acts of the show. A stealth screen is made from reflective elements linked together like a chain, and can appear either transparent or opaque depending upon the lighting. When these screens appear onstage during the concert, attention shifts from Reznor’s body in performance to his physical interactions with the surrounding screens. The screens are not only spaces for the projection of images, but physical objects Reznor interacts with during the course of the show. Reznor plays a game of hide-and-seek with his audience, using screens to reveal and conceal his body. The downstage screen also transforms into a touch interface, where Reznor experiments with its responsiveness. Both the hide-and-seek and the play of responsivity in NIN’s performance echo the everyday interactions people have with their screen technologies. As such, NIN’s Lights in the Sky tour maps emerging bodily habituations forming through the materiality of the screen.

Nine Inch Nails performing ’31 Ghosts IV’ during their 2008 Lights In The Sky Over North America Tour.

“We may debate whether our society is a society of spectacle or of simulation, but, undoubtedly, it is a society of the screen” (Manovich 94). In this quote from The Language of New Media, Lev Manovich recognizes the central role screen technologies play in the digital era. This “society of the screen” has taken on new life over the past decade. A few examples: Mirjam Struppek organized the first international conference on the aesthetic and political potential of urban screens in 2005. Released in 2007, the Apple iPhone brought multi-touch technology to a mass audience; a multi-touch surface can respond to two or more inputs, increasing the functionality of touch screen (or trackpad) devices like the iPhone. That same year, American Express sponsored a program for cardholders at the U.S. Open where attendees were issued handheld televisions to carry with them during the event to enhance the live experience of tennis. In 2008, life on a spaceship involved living through your own personal screen in the post-apocalyptic film Wall-E. And the opening ceremonies of the Beijing Olympic Games in 2008 featured Olympic gold-medal gymnast Li Ning traversing the inner rim of the stadium while a media surface unfolded behind him to display images from the torch’s journey across China.

These brief descriptions capture artists, scholars, entrepreneurs, and athletes challenging what the screen can do. Scott McQuire traces the migration of the television set from the domestic spaces of the home to the urban spaces of the city, a shift that has developed alongside the rise of global networks and the mobilization of media (2). McQuire and Sean Cubitt argue that contemporary forms of sociality occur through the materiality of screen technology as an architectural façade, a virtual interface, or a personal companion (McQuire 48, Cubitt 105). Uta Caspary, Erkki Huhtamo, and Cubitt trace the material histories of the precursors to urban screens and outdoor advertising. Experimental artworks incorporating large-scale video, like Krzysztof Wodiczko’s projections on historical monuments, or smaller-scale interactive pieces, like Chris Jordan’s Chrono Beam (2011), have received recent critical attention for the way they interrogate the intersection between the embodied spectator and the ephemeral politics of public displays (Susik 113-114, 118). In addition to advertising displays and experimental art projections, experiments with emerging screen practices are also going on in popular forms of entertainment, especially rock concerts. Trent Reznor, lead singer and driving force behind Nine Inch Nails (NIN), has written and performed industrial rock music for more than two decades, and since the height of his popularity in the nineties has extended his creative pursuits to include digital imaging, remixing, online distribution, and most recently composing for movies including The Girl with the Dragon Tattoo (2011). Added to this list are his experiments with screen technologies.

For NIN’s Lights in the Sky tour, Reznor introduced a new twist to the second act of each concert by adding two semi-transparent stealth screens positioned in front of a third screen (Gardiner par. 10). Because a stealth screen is made from reflective elements linked together like a chain, the combination of projection and light cues can make it appear both transparent and opaque at the same time. These screens function not only as spaces for the projection of animations or images, but as material objects Reznor interacts with during the course of the show. The screens in Lights in the Sky are an extension of the design concept of his previous tour, Live: With Teeth, where Reznor used a big screen to display video footage throughout the concert. Lights in the Sky (designed by Reznor and Rob Sheridan, the artistic director, with lighting designer Roy Bennett and the company Moment Factory) combines laser technologies, particle-based animation that runs off several Linux-based devices, choreographed stage lighting, a high-resolution screen, and the two semi-transparent stealth screens. In addition to these major elements, the production includes a closed-circuit camera system streamed live through the Linux-based computer terminal and preprogrammed song cues controlled by the artistic director and lighting designer via the motherboard.

The collection of screens, lighting, animation, and sound creates a media environment that Reznor moves within for the rest of the concert. Trent Reznor’s body is the critical touchstone around which every element in the performance revolves. For fans, rock concerts are about being in the moment and in the presence of the star. Reznor’s introduction of stealth screens during the concert creates an awkward situation in which the people there to see NIN perform live do so through a technological interface. When I attended a live performance on September 29, 2008 in Jacksonville, FL, two primary types of interaction occurred between Reznor, the transparent screens onstage, and the audience. First, Reznor engages in a game of hide-and-seek in which the screen is used to both reveal and conceal his body from the audience. Second, the play of responsivity between his physical actions and digital imagery creates a visual continuity between screen projections and his body in motion that reverberates through the venue. These interactions echo the habituations people develop as they use the variety of screen technologies at their disposal, making NIN’s Lights in the Sky a compelling case study for exploring a privileged moment in time and space when the cultural significance of the screen is in flux.


The lights go out as the screens lower across the stage. Spectators murmur and wait for Reznor to reappear and begin the second set, but he does not immediately come back onstage. Instead, a blue field of light appears, accompanied by drums and a xylophone. The silhouette of Reznor, flanked on either side by the other members of the band, appears behind the transparent screen. Only after the instrumental section has been going on for a few minutes does additional light flood the stage, and spectators realize which body belongs to Reznor.

In this opening sequence to the second act of the show, the stealth screen initially masks and then dramatically reveals Reznor’s position on the stage. During the “Greater Good” track from the album Year Zero, the downstage stealth screen displays what looks like a time-lapsed recording of bacteria growing and pulsating in a Petri dish, a mound of digital particles. For the first thirty seconds of the performance, these animations appear onscreen while Reznor is nowhere to be found. Then, extremely subtly, Reznor’s silhouette backs onto the stage in front of the left corner of the screen. Dwarfed by the vastness of the stealth screen, he faces the offstage wing, crouched over in profile. He is barely visible and remains completely out of view for many. After several beats, the movement of the digital particles takes the shape of Reznor’s face; the extreme close-up is a video recorded by an offstage camera. During this segment, Reznor’s body overlaps the live feed of his physical image being projected onto the stealth screen. He appears onscreen and onstage at the same time. Reznor hides. The audience seeks. This game adds an intriguing twist for the audience, who came to hear and see NIN in person. Rather than getting to see the band perform for the entire set, the lead singer disappears from the stage for fairly large chunks of time. The band performs songs like “Greater Good” behind the stealth screen, out of sight of the audience. As such, Reznor controls how much the audience sees him, on and offscreen, during the show.

During the performance of “Survivalism” in the third act of the concert, a closed-circuit camera system installed around the premises records live footage of the audience and projects it onto the big screen. In what looks like a collection of monitors in a security station, spectators watch themselves watching NIN. This sequence is about the ubiquity of surveillance technologies in contemporary American culture, but the audience cannot hide among the screens like Reznor can. Even so, those watching can use their mobile devices to capture the live event as it unfolds in real time. Smartphones enable audience members to communicate ideas and information via text, voice, image, and video, and they can send what they record at the concert to NIN’s website. Ironically, audience members also use these same devices to capture a better view than they have while standing, zooming in to get a closer look at the action. The power Reznor has over his visibility onstage dissipates as the territory of the media environment expands into the World Wide Web. Although Reznor plays with the audience in terms of his physical visibility, the overall performance hinges on its technological infrastructure. The production team cannot always manage the software and screens at the venue. Computer glitches continued to crop up in the flow of the performances during the tour, including when I saw it. In Jacksonville, animations on the stealth screen kept flickering on and off and, during the final “Head Like a Hole” encore, the red “NIN” symbol flashing on the downstage stealth screen had part of an “N” chopped off. These glitches form a digital reality outside of any single person’s control. Spectators have no influence over their bodies on display.

When Reznor plays hide-and-seek with his audience using the screens surrounding him, this communicative act challenges the implied materiality of screen space. Critical discussions about the spatial relations of screen media emerged in the apparatus theory of the 1970s. Jean-Louis Baudry argues that the screen is simultaneously a mirror and a frame that produces ideological distance between the conscious spectator and the dreamworld of the cinema (352–353). Manovich echoes the logic of apparatus theory in his cultural history of screen technologies, where he distinguishes between the classical screen (Renaissance perspectival painting), the dynamic screen (film and television), and the screen in real time (computer) (96). In all of these cases, “The act of cutting reality into a sign and nothingness simultaneously doubles the viewing subject who now exists in two spaces: the familiar space of his/her real body and the virtual space of an image within the screen” (106). For Baudry and Manovich, the screen marks a border between two qualitatively different spaces that may speak to each other in dynamic and provocative ways, as decades of compelling scholarship in media and cultural studies have shown, yet they remain separate. The screen is important in as much as it frames the point of view of someone looking through it at whatever movie or show they happen to be watching.

Reznor’s game challenges the implied separation between real and virtual space that Baudry and Manovich trace in their respective theories. Establishing a connection between the spectacle and spectators through the screen transforms the display space into something more than a window frame to a virtual world. Reznor uses the screen to shield himself from view and to reveal his presence to the crowd. The juxtaposition between the stealth screen’s transparency and opacity highlights the basic attributes of its materiality. Its surface conceals and reveals as much as the edges frame what is on it. The stealth screen is like the cape a magician uses to hide the bag of tricks from the audience. The turn, however, is when the onscreen spectacle bleeds into the physical space the screen occupies. Through hide-and-seek, Reznor occupies a modular media space. So, too, do audience members who use their screened devices to get a better view of the show. Media scholar Adriana de Souza e Silva explains how “from the merging of mixed reality and augmented spaces, mobility, and sociability arises a hybrid reality…a hybrid space is not constructed by technology. It is built by the connection of mobility and communication and materialized by social networks developed simultaneously in physical and digital spaces” (265 – 266). She describes what others have called augmented space, simulacra, computer-mediated reality, or multimedia environments: concepts with varied connotations that attempt to describe the blending of physical and digital spaces. Reznor experiments with what communication feels like within these spaces that de Souza e Silva, among others, attempts to explicate. Ironically, the gap between virtual and actual space symbolized by the screen for Baudry and Manovich seems to migrate to the body of the user, who in the case of the Lights in the Sky performance is Reznor and not the audience watching him.


A white field of particles fills the screen at the beginning of “Only” from the album With Teeth. A small opening grows out from the center to reveal Reznor’s body. The closer he moves toward the screen, the larger the opening. He walks across the stage, and the opening follows him. He moves upstage and the opening closes. The field gives way to a violent streaming of digital noise that momentarily reveals the rest of the band playing behind Reznor. The white field returns to view. Again, Reznor paces across the stage as the opening follows him wherever he goes. The drummer comes out onstage and lights up a series of boxes by touching each individual square.

This transitional sequence leads into the performance of “Echoplex,” when the initial beat he establishes by touching the boxes merges into the song’s introduction. The drummer returns at the song’s conclusion to deactivate the light boxes.

In these two instances, the onstage stealth screens transform into touch screens when lasers running along the back of the screens indicate their position. The animations generated in real time record the performers’ physical movement in relation to the screen to produce the illusion of a haptic interface. Even if the performers do not actually touch anything, the sequences establish a sense of continuity between onscreen and offscreen through the responsivity of the screen. Responsivity refers to the quality of the digital connection between onscreen and offscreen. The screen interfaces of popular technologies like the iPhone react to the touch of a finger or pen. Touch screens work by layering two surfaces with an electric current or laser beam sandwiched between them. When somebody touches the surface of the outer screen, the flow of the current or beam is interrupted and signals the device to react in that particular spot. Slightly different from the touch screen, the remote responsiveness of game consoles like Nintendo’s Wii Remote and the Xbox Kinect depends on a remote sensor or motion control. Reznor experiments with both types of responsiveness during the performance.

Reznor draws on the second type of responsivity for the performance of “Terrible Lie” in the third act. At this point in the show, the three screens have changed position. The stealth screens have been raised to allow Reznor and the band to move more freely on the stage. However, they remain staggered so as to continue to project images, although intensities would probably be the better term at this point, given that approximately 90% of what is onscreen are bursts of color and light. Amorphous red, orange and yellow particles flash on the three screens in the same rhythmic patterns as the song. These animations translate auditory and haptic cues (rock music performed live can be felt just as much as it is heard) into visualizations. These sensory translations, like synesthesia, extend the aesthetic possibilities of live performance. Reznor becomes connected to the (digital, screened, animated) world through the responsivity of his onstage environment. The responsiveness between body and screen draws disparate elements of the performance together into a single rhythm during the course of the show.

During the concert, Reznor appears to cross through the frame of the screen. Even though he never actually steps through it, the animations create the illusion of breaking through the screen’s surface. The stealth screen offers a way to cross the frame that separates the spatial reality of the viewer from the virtual space of representation. The formal separation between real and virtual space plays a pivotal role in Western aesthetics. Writing about this divide, Anne Friedberg traces the cultural history of what she calls the “virtual window” to Renaissance perspectival painting, where the viewer is situated in front of a framed surface. Reznor challenges this tradition by linking the physical and digital through his body. Although screens continue to situate the perspective of the viewer towards onscreen content, Reznor temporarily acts as a node through which the onscreen and offscreen converge. Reznor frames the visual content for the audience by moving towards and away from the stealth screen. He determines the path of the real-time particle animations during the show. Still, he never actually crosses through the frame of the screen. This crossing is an illusion; Reznor remains trapped behind the screen in order for that illusion to work. To cross the frame of the screen is transgressive within the field of modern aesthetics, much like breaking through the imaginary “fourth wall” of the stage, but the interplay between screen and body during the concert translates into an optical effect for everyone watching where Reznor is, himself, on display. Like a painting or film, this performance can only be accessed through a proscenium, a frame, which separates the spectacle from those watching it.

Similar to the responsivity of the touch screen, the integration of sensory components during the show cultivates a sense of continuity between actual and virtual through Reznor’s body. He acts like a remote control for the live action onstage, altering the sense of presence throughout the venue. Every aspect of the concert feels connected, and nobody can get out of it because the surveillance cameras make everyone visible. This feeling of connection continues when video recorded with mobile devices extends into online archives after the concert. The attempt by fans to capture the “live” experience through media reinforces the responsivity that Reznor draws on throughout the second and third acts. After each concert concludes, NIN’s website archives fan videos and chats in an effort to collect the individualized performances together into the broader context of the tour. Like the multiple elements of the live show combining through the responsiveness of the screen, information comes together through the human-computer interface. Mobile devices enable spectators to participate in the concert by recording and archiving the live event. The website clearly organizes its galleries according to concert and tour dates so visitors can easily navigate through the streaming videos. The feeling of connection cultivated through the play of responsivity during the show is reignited when audiences come together to construct their own narrative about the tour through the documentary Another Version of the Truth: The Gift, which was produced by fans and distributed through

From my perspective in the crowd, interacting with his audience through the stealth screens seems almost like a spiritual experience for Reznor, who is the obvious centerpiece around which the hybrid reality is constructed. He stands behind the screen and soaks up the spectacle while the audience looks on. For me, however, the experience was ultimately frustrating because the spontaneity and singular intimacy of a rock concert, the feel and smell of bodies cramming in next to you in a collective push towards the stage, is lost. We were left watching NIN interact with cutting-edge screen technologies onstage without access to them. Even though the animations and lighting cues were generated live, the concert began to feel closer to performance art (like one of Wodiczko’s projections) than a rock concert. It is the same feeling I have when I watch someone text at the table at lunch, for instance. The person is there, but not in the same way as they would have been otherwise.
Screens for Sale

The Lights in the Sky tour combines industrial-alternative rock, live performance, new media technologies, real-time animation, ticketholders, critics, and fans like myself into a symbolic act of a body reaching out to touch the screen. The game of hide-and-seek and play of responsivity represent different ways people interact with the screens around them. A commercial for the Blackberry Storm smartphone released in November 2008 (and running throughout the following year) illustrates how the appeal of interacting with a screen interface, much like the games Reznor plays, stems from the symbolic act of crossing through its frame. This commercial sells the idea that by simply touching a screen you can connect more efficiently, immediately, and directly with what’s important in your life. The multi-touch interface allows users to make contact with their social networks through a phone, yet the commercial itself acknowledges the reality of the interface, the semi-transparent field at the center of the composition, as the primary point of access. Reznor stands behind his stealth screen; the woman stands behind a rectangular plane and faces forward as she navigates through the textural space of the graphic user interface. Different from Reznor’s live performance, however, is what happens afterward. Animations like the boy with a kite, rock concert, and photographs explode from the frame rather than being projected onto a screen. Touching the interface releases the three-dimensionality of life as it unfolds in real time. The advertisement suggests all we need is a Blackberry Storm to make our experience of the world more real, an ideology that continues to shape narratives about technology in the 21st century. Still, the promise represented by the screen remains tempered by the frame of the television set or YouTube video player. 
The life falling out from the flat plane, activated through touch, can only be perceived by sitting in front or standing behind another screen, another interface.

Being connected and simultaneously in sync with each other through actual and virtual space creates a sense of presence rooted in the modularity of media space. This spatial arrangement appears to situate the body as a locus of control, benefiting artists like Reznor, who embraces the emancipatory promise of new media technology, and multinational telecommunications companies like Research in Motion Limited (RIM), the makers of Blackberry, who hope to sell the need to connect with others by touching a really cool screen on your smartphone. Still, as anybody who uses a smartphone will eventually find out, Trent Reznor’s ability to control the terms of his physical interactions with the screens around him is a fantasy. His financial investment, celebrity status, and social positioning, coupled with the obvious fact that he and his band are the only people allowed onstage during the concert, make him a privileged participant. Reznor chooses what to release on the website. Reznor manages the NIN brand. The documentary is called “the gift” because he released high-quality digital recordings to his fans so they could make it. Still, this fantasy – the fantasy of control and connection – is something that commercials for new media technologies ranging from smartphones and video games to Project Glass from Google X continue to promote. Reznor’s onstage encounter with his screens during the Lights in the Sky tour represents a time and place when this fantasy was just beginning to enter into the mainstream, when the potentiality of hybrid reality was being tested within the volatile boundaries of popular culture, and before the games Reznor plays were written into the teleological narrative of technological change.



“Another Version of the Truth: The Gift.” Web. 14 Oct. 2011.

Baudry, Jean-Louis. “Ideological Effects of the Basic Cinematographic Apparatus.” Film Theory and Criticism: Introductory Readings, 5th Edition. Ed. Leo Braudy and Marshall Cohen. Oxford University Press, USA, 1998. 345 – 355. Print.

Caspary, Uta. “Digital Media as Ornament in Contemporary Architectural Facades.” Urban Screens Reader. Eds. Scott McQuire, Meredith Martin, and Sabine Niederer. Institute of Network Cultures: Amsterdam, 2009. 65 – 74.

Cubitt, Sean. “LED Technology and the Shaping of Culture.” Urban Screens Reader. Eds. Scott McQuire, Meredith Martin, and Sabine Niederer. Institute of Network Cultures: Amsterdam, 2009. 97 – 108.

De Souza e Silva, Adriana. “From Cyber to Hybrid.” Space and Culture 9.3 (2006): 261 – 278. Web. 19 Aug. 2010.

Easybakechicken. “Nine Inch Nails – Lights in the Sky Tour – Echoplex – Trent Speaks.” YouTube. Web. 14 Oct. 2011.

Friedberg, Anne. The Virtual Window: From Alberti to Microsoft. The MIT Press, 2009. Print.

Gardiner, Bryan. “NIN Dazzles With Lasers, LEDs and Stealth Screens.” 13 Sept. 2008. Web. 14 Oct. 2011.

Huhtamo, Erkki. “Messages on the Wall: An Archeology of Public Media Displays.” Urban Screens Reader. Eds. Scott McQuire, Meredith Martin, and Sabine Niederer. Institute of Network Cultures: Amsterdam, 2009. 15 – 28.

Manovich, Lev. The Language of New Media. The MIT Press, 2002. Print.

Mattssnet. “Blackberry Storm – Feel Your Passion – Hot Girl.” YouTube. Web. 14 Oct. 2011.

McQuire, Scott. “Mobility, Cosmopolitanism, and Public Space in the Media City.” Urban Screens Reader. Eds. Scott McQuire, Meredith Martin, and Sabine Niederer. Institute of Network Cultures: Amsterdam, 2009. 45 – 64.

–. “The Politics of Public Space in the Media City.” First Monday. Web. 14 Oct. 2011.

“[The Official Nine Inch Nails Website].” 13 Sept. 2008. Web. 14 Oct. 2011.

Shelterslullabies. “Nine Inch Nails – Only (Lights In The Sky Tour).” YouTube. Web. 14 Oct. 2011.

Shelterslullabies. “Nine Inch Nails – Survivalism (Lights In The Sky 2008).” YouTube. Web. 14 Oct. 2011.

Susik, Abigail. “The Screen Politics of Architectural Light Projection.” Public 23: 45 (June 2012): 106 – 119.

Vacantenigma. “Nine Inch Nails – The Greater Good – Philly, PA – 8-29-08.” YouTube. Web. 14 Oct. 2011.


A Moving Image Experience: Il Cinema Ritrovato: Bologna, June-July, 2010 – Wendy Haslem

A film festival is always a time machine, and Il Cinema Ritrovato doubly so. Every bit of film contributes to the kaleidoscope of a century, especially when screened now, at the beginning of a new century and during circumstances where almost no moment of film, and few entire films, count in the same way.

Peter von Bagh, Artistic Director, Il Cinema Ritrovato, (2010 9).

In 2010 the 24th edition of Il Cinema Ritrovato screened 313 films over eight days in four locations throughout the city of Bologna, Italy. The coordinator of Il Cinema Ritrovato, Guy Borlée, and the artistic director, Peter von Bagh, were responsible for curating a festival of cinema dedicated to the conservation and exhibition of newly discovered films. Conservation technologies are at once the invisible and the highly visible forces behind this festival. These technologies are revealed in newly cleaned, pristine images, brilliant with the erasure of traces of time and use. Films reveal narratives that are restored with the insertion of intertitles, and even with black sequences highlighting those scenes that were beyond restoration. This is a festival that makes a dynamic contribution to the evolution of the history of world cinema. Il Cinema Ritrovato exhibits the results of conservation projects by the Cineteca di Bologna, The World Cinema Foundation and other restoration institutions worldwide. As von Bagh implies, this network connects organisations, spaces, people and histories beyond a simple chronology. Von Bagh perceives this festival to be as much about the future as the past. He writes:

Considering that the cinema year 2009-10 has been filled with especially infantile discussions about 3-D and related matters, I’m glad to state the overwhelming – and essential – presence in our program of technologies and the dialogue about them. This doesn’t mean only our dear themes of colour and widescreen, but also a more surprising face: that stepping into the midst of silent films is often also a trip to the future (Peter von Bagh, 2010 9).

This film festival not only connects the past to the present, it creates a culture that understands both as necessary for the future of the moving image. Il Cinema Ritrovato is a festival that cannot be reduced to the binary oppositions of ‘business’ and ‘audience’ festivals outlined by Mark Peranson (2009 23-37). In its diachronic connection of short films, feature films, documentaries and cinema from across film history, exhibited in spaces including theatres, museums and a Piazza, this is a “moving image experience” greater than film according to the definition outlined by Paolo Cherchi Usai. The moving image experience connects the act of seeing with creation, preservation and access (2008 9). In its history, in the establishment of its hierarchies and in the creation of its rituals, Il Cinema Ritrovato could be aligned closely with André Bazin’s effusive description of festivals, “in which people join in holy worship of a common transcendent reality, then the Festival is a religious Order” (1955, 2009 13-19).

This festival has the continuing support of screen luminaries like Martin Scorsese (who provides access to his archive) and prestigious organisations like The World Cinema Foundation, which sponsors the restoration of many films. Some of these films are surprisingly new. Recent historical forces affecting the history of film are evident in the exhibition of Mest/Revenge (Ermek Shinarbaev, 1989), a film described by Kent Jones as “one of the greatest films to emerge from the Kazakh New Wave and one of the toughest” (2010 47). Mest, a film that investigates the Korean diaspora displaced into the Russian Far East, was denied distribution by Soviet authorities and shelved as soon as it was completed. Mest was restored in 2010 by The World Cinema Foundation and Cineteca di Bologna at the L’Immagine Ritrovata laboratory, Bologna, with the collaboration of its director, Ermek Shinarbaev.

Programs of auteur films shown at Il Cinema Ritrovato include a retrospective of the films of Jean-Luc Godard, musicals created by Stanley Donen, silent and early sound films of John Ford, the films of Albert Capellani and a project reflecting the collaboration between Charlie Chaplin and Robert Florey. The ‘auteur’ is reconceptualised throughout these programs as embodying multiple identities, evident in nascent careers and in collaborations between filmmakers and studios. The festival consciously references the interrelationship between cinema and history in films that reflect ‘anni difficili’ in collections entitled Hard Times: Italian Cinema Before the Codes (1945-1949), as well as Hard Times in Europe: European Cinema (1945-1952). A recurring feature of Il Cinema Ritrovato is ‘A Hundred Years Ago: European Films of 1910’, a program commemorating the cinematic technologies available one hundred years prior. Another program of films addressing issues of national identity, early communications and the development of global flows was ‘The Naples/Italy Project and Cinema of Emigration’, curated by Elena Correra and Luigi Virgolin. Many of the short films that comprise this collection were made at the turn of the century when the port city of Naples was a gateway to the rest of the world. Colour was also a focus in a program entitled ‘Searching For Colour in Films’, with many films (like Visconti’s Senso, 1954, Il Gattopardo, 1963 and Nicholas Ray’s Johnny Guitar, 1954) restored to their original vibrancy. Curator Gian Luca Farinelli notes the importance of colour when he writes that the “chromatic mood” of a film might be the most secret and intimate aspect of our relationship with films we have loved (2010 99).
A program entitled ‘Fearless and Peerless: Adventurous Women of the Silent Screen’ showed films featuring active, mobile feminine protagonists: detective figures who travelled by ship, plane, horse and cart, even car, and women armed with guns and chloroform who were not afraid to use them.

The Piazza Maggiore is the largest open-air auditorium, showing films for free and connecting the local community with film buffs, scholars and archivists. Each evening of the festival viewers gathered in the twilight, reserving their seats before dusk descended and provided the ambience for the nightly screening. This public screen sits on auspicious ground in terms of history and architecture. On the right is the Basilica Maggiore, shrouded in scaffolding supporting its reconstruction. The screen faces Bologna’s Archaeological Museum, a further indication of deference to the rich history of the Comune di Bologna. The screen is surrounded by cafes and restaurants, with some of Bologna’s distinctive leaning towers visible in the streets beyond. A small bio box sits at the rear of the piazza, projecting light above the audience and through the celluloid – the medium of choice for Il Cinema Ritrovato. This large public screen provides the focal point for the festival. In 2010 Il Cinema Ritrovato screened restorations of films like Boudu Saved From Drowning (Jean Renoir, 1932), The African Queen (John Huston, 1951) and the classic musical Singin’ In The Rain (Stanley Donen and Gene Kelly, 1952), which was introduced by the ebullient Stanley Donen.


Figure 1: Il Cinema Ritrovato, Piazza Maggiore in the daylight, Bologna. Photograph: Simon McLean

Figure 2: Stanley Donen introducing Singin’ In The Rain. Photograph: Simon McLean

Two public screenings in the Piazza Maggiore illustrated both the innovation and the significance of this festival. The first was the breathtaking public presentation of Lumière! (2010), a portmanteau of short films curated by the Institut Lumière representing innovations in anaglyphic, stereoscopic film and autochromatic coloured film stock in early cinema. In 2010 the screening of Lumière! took place on a hot night when the Piazza teemed with more than six thousand people waiting to become only the second public audience in the world to watch the Lumière brothers’ experiments with stereoscopic illusions, precursors to 3D cinema. The audience demographic was broad, and included young Italian cinephiles, some luminaries from the world of cinema and film historians, many of whom would have experienced significant change in film and screen technologies throughout their lifetimes. On this particular night I noticed a young man with an awkward gait sitting uncomfortably close to a woman who set him back in his seat with a steely glare. In our row sat young mothers cradling babies on their laps. Someone had brought their dog, and he slept, curled up by his owner’s feet, for the entire screening. Overwhelmingly, the impression was of an incredibly diverse audience who met in the Piazza every night, a tangible sign of the vibrant life of film culture in Italy and of the devotion to the Bologna Cinematheque specifically, the organisation that presents Il Cinema Ritrovato annually.

It is not such a stretch to imagine the inventors Auguste and Louis Lumière creating technologies to film and project cinema in three dimensions as early as the 1930s. However, this new package of short films displays the surprising extent of their experimentation, beginning with the earliest impressions of pre-cinema in the single-reel, static-camera recordings of everyday events or ‘actualities’. Included within the collection of the Lumière films presented at Il Cinema Ritrovato are the recognisable early scenes: workers leaving the Lumière factory (La Sortie des Usines Lumière à Lyon, 1895) and feeding the baby (Repas de Bébé, 1895), but also early narrative films like The Waterer Watered (L’Arroseur Arrosé, 1895), sequences comprising more than a single shot, indications of experiments with cause and effect. One of the films in the collection shows a pedestrian being hit by a car, and then magically springing to his feet as the film is reversed, a homage to Georges Méliès and the potential for editing to provide illusions beyond reality. The magic of early cinema is evident in innovations in film narration, in experiments with space and perception, but also in the exhibition of images shot by travelling cameramen.

The program of Lumière films includes sequences of panoramas of distant locations shot by Lumière camera operators travelling throughout the world. One particular stereoscopic film included a panning shot revealing iconic buildings like the Blue Mosque and Hagia Sophia in the city newly named Istanbul at the time these images were recorded. With the camera mounted on a boat or even a bike, the Lumière brothers were able to present early impressions of exotic cities as their camera scans locations visible from the Golden Horn and throughout Turkish street markets. A stereoscopic version of The Arrival of a Train at La Ciotat Station (L’Arrivée d’un Train En Gare de La Ciotat, 1895) pushes Tom Gunning’s description of ‘the aesthetics of astonishment’ into a new, more contemporary realm. Whilst Gunning questions the mythology associated with early accounts of audience shock and the terror of witnessing the train arrive, audiences resplendent in their cardboard 3D glasses displayed the opposite – attentive wonder and fascination. With the experience of the IMAX and 3D screen common amongst contemporary audiences, shock is replaced by wonder and appreciation of the effects of stereoscopic technologies producing images in spatial relief. Coloured sequences created with the Lumières’ patented autochromatic process displayed ladies in patterned dresses, pastel landscapes and the slightly unnatural glow of cityscapes. This collection of films shows the influence of the Lumière family photography business in Louis and Auguste’s experiments with the development of cinematic technologies to produce delicately toned, coloured film.

Lumière! was presented and narrated by the director of the Institut Lumière, Thierry Frémaux. The narration was both respectful and revealing. Frémaux showcased the Lumière films, which included sequences that feature a family of circus performers, with surprising images of children being juggled from the feet of their parents, their small bodies spinning through the air. The patriarch of this family reappears in a later sequence displaying his capacity for origami, folding and then modelling a range of hats. Frémaux drew our attention to detail in some of the staged sequences, including the delight of two men dancing together at a formal ball. Films in this collection are designed to display spectacle, performance and magic. When the program of Lumière films reached the end, it played again – in fast rewind – from the end to the beginning, a reminder of the range and depth of the images that comprise the collection. This digital restoration project emerged as a collaboration between the Institut Lumière and the L’Immagine Ritrovata laboratory of Cineteca Bologna. These experiments in early cinematic technologies, including autochrome and anaglyph films, provided a fascinating, and at times breathtaking, collection of early cinematic experiences.

Another highlight screened in the Piazza was the latest version of Fritz Lang’s Metropolis (1927), a screening that included an additional 15 minutes of footage. This recent version of Metropolis was discovered in the collection of Manuel Peña Rodriguez in 2008 and verified by the Buenos Aires Film Museum. The latest print is based on an original nitrate copy of the film that was purchased by the distributor, Adolfo Z. Wilson, in 1927. The additional scenes stand out due to an alternative aspect ratio and the surprising beauty of their faded and scratched original state. The combination of the deteriorated found material alongside the otherwise pristine film stock required viewers to look carefully and deeply into the new sequences, making visible the impact of time on celluloid. This version includes detail about the rivalry between Fredersen and Rotwang for Maria, and it provides the motivation for the invention of the robot. To augment the experience even further, Metropolis was accompanied by music played by the Bologna Symphony Orchestra. Audiences for this unique screening spilled into the streets beyond the Piazza, exceeding the audience numbers for Lumière!. One of the features of Il Cinema Ritrovato is the combination of ‘live’ performance through introductions, musical soundtracks, or even narration alongside the screenings of the restored films in the Piazza Maggiore.

Whilst Lumière! and Metropolis were spectacular, some of the shorts that were screened prior to the features almost eclipsed the longer films. One of these was Il Ruscello Di Ripasottile (1941), a magical realist film shot by Roberto Rossellini and Rodolfo Lombardi. This is an eight-minute experiment with the potential for cinematic illusion in documentary cinema. Rossellini creates a narrative of two conflicting threads, beginning with the story of a perch couple awaiting the imminent hatching of their eggs. A chain of talking animals is established as the fish communicate with frogs and birds in their underwater environment, transmitting the good news with delight and some trepidation about the predatory trout in their vicinity. The drama escalates as the trout overhear the conversation. Exteriors were filmed in the Ladispoli hinterland, whilst close-ups of the fish were shot by creating cascading waters in the fish breeding tanks at the Ittiogenico Fish Biology Institute in Rome. Rossellini juxtaposed exterior locations with controlled interior environments and inserted the sounds and speech of animals to produce a magical realist underwater fantasy.

Il Ruscello Di Ripasottile was filmed in Italy just prior to the development of the Neorealist film movement, a time when filmic subjects usually focused on daily struggles, producing films that were sanctioned by the Fascist Regime. Whilst this short film might be interpreted as an analogy of larger power struggles, the aesthetics and lyricism distinguish this tale from the Neorealist formula. Il Ruscello Di Ripasottile was restored at the L’Immagine Ritrovata laboratory, Cineteca Bologna, from fragments of the film and documents discovered in Cinema Cilea de Palmi (Calabria). Il Ruscello Di Ripasottile was conserved without the necessity for absolute completion of form; some images (particularly towards the conclusion of the film) are rendered with black frames, something that serves to end the film in time and support the soundtrack. This film also exhibits damaged sequences in reverence to its original form, and it shows a dedication and attention to detail in the restoration of the images and sounds that remain. Il Ruscello Di Ripasottile is indicative of the ideology that emphasises the primacy of the original in conservation and the desire to preserve as much of the source as possible. Linking film through the bloodline, Il Ruscello Di Ripasottile may also be seen as a precursor to Isabella Rossellini’s fabulous Green Porno project, a book and DVD set investigating the sex life of various insect species.

Figure 3: Il Ruscello Di Ripasottile, (Roberto Rossellini, 1941)

Another inspiring and moving short film displayed in the Piazza Maggiore was Islands in the Lagoon (Isole Nella Laguna, Luciano Emmer, Enrico Gras, 1948), a poetic sequence that was described in 1948 as a chronicle of “the feelings and emotions of the islands” (Venturi, 2010 30). This black and white travelogue begins with a woman sitting on an island small enough to support only herself and a goat that she has tethered to a pole. Whilst the goat chews grass, the woman is occupied by her sewing. As the camera ascends, the resulting panorama offers an indication of the mass of water surrounding this tiny landform, and the first sign of movement is noted in a small sailing ship that glides along the still waters of the lagoon, waters “without rest” according to the narrator. Water and land are linked inextricably by high contrast imagery, shots which blur the horizon line. The voice over narration mentions the “grand silence” of this landscape, one that supports ruins of a previous age. Bones and other traces of past lives remain invisible to two children who are busy pulling blackberries from canes. In a later scene, a gliding camera visits a quiet church, identifying its “abandoned saints”. The movement of the camera highlights the stillness of the church. Reflections of water produce an illusion that the Madonna’s eyes are glistening. Buildings are shot to emphasise the shimmering reflections in the water of the lagoon, adding the illusion of movement to the stillness of exteriors. The closing sequences of Islands in the Lagoon detail the work of the inhabitants of the islands. Sequences of women making lace, beading and sewing are juxtaposed with images of men blowing, spinning and cutting decorative glass for chandeliers, heated to extreme temperatures that render it soft and pliable. Focus falls on the serious faces of children concentrating hard on their crafts, an image that might imply their destinies.
But Islands in the Lagoon concludes with the voice over narrator identifying the great treasure that is hidden at the bottom of the sea, which, if we look closely, could be found one day.

Beyond the screenings at Il Cinema Ritrovato, the multimedia exhibition Federico Fellini Dall’Italia Alla Luna (Federico Fellini From Italy to the Moon) offered a fascinating insight into the career, dreams and desires of one of the most important Italian auteurs. The Museum of Modern Art, Bologna (MAMbo) exhibited impressions of the life and work of Federico Fellini in public and in private moments. Visitors are greeted with large, vibrant posters advertising Fellini’s films. These include the powerful imagery of posters for Roma (1972), the cityscape with characters linked in a matrix of eyeline matches in La Dolce Vita (1960) and the collage of stars and filmmaker in the classic poster for 8 1/2 (1963). Beginning an exhibition with the public art provides a reflection on the first visual impressions of a Fellini film. This public imagery also includes newspaper articles and paparazzi snapshots designed to create scandal surrounding Fellini, his films and his stars. ‘Cinestories’, popular in illustrated magazines, provide insight into the early storyboarding process in imagining films like The White Sheikh (1952). One of the moving image screens shows the mesmerising opening sequence from La Dolce Vita, featuring a helicopter trailing a statue. A photostory of Fellini’s images of Mandrake the Magician appeared in Vogue and reimagined Marcello Mastroianni as Mandrake working in advertising in the early 1970s.

Figures 4 & 5: Posters for Federico Fellini’s films Roma and La Dolce Vita.

Federico Fellini Dall’Italia Alla Luna also reveals private images, behind the scenes photos from film productions and drawings that offer a (sometimes alarming) indication of Fellini’s thoughts. Fellini’s dreams are exposed in a reflective journal of watercolour illustrations and text. Drawings reveal his fear of being stuck in doorways, his anxieties of falling from buildings and Fellini’s nightmares about giant crocodiles. Watercolour illustrations magnify the proportion of breasts and penises in lascivious images of Fellini’s sexual fantasies. The exhibition includes Fellini’s thoughts on Roma where he writes: “Everything here belongs to the belly, everything is the belly… such a show is a feast for the eyes, but at the same time threatens all gazes: mouths, faces, outpouring bodies avidly swallowing”. Fellini associates the procession of prostitutes in Roma with both Fascist parades and processions of the Catholic Church, all of which he describes as ‘hypnotic representations’ of ritual. Photographs reveal the antics behind the scenes of Fellini’s film productions. This is illustrated by one particular image of a kitten placed gently on Anita Ekberg’s head during a lighter moment on the set of La Dolce Vita. These private images include satirical caricatures: photos written over with dialogue bubbles revealing the thoughts of the young filmmaker. The collection shows a collage of responses to a classified advertisement that Fellini published in an Italian newspaper announcing that he is ready to meet anyone who would like to see him. Displays include personal letters written to Fellini directly – one in orange texta, others containing snapshots of aspiring actors, some in profile, some in various states of undress. MAMbo’s exhibition Federico Fellini Dall’Italia Alla Luna is a revealing and rich collection of still and moving images, both public and private, designed to follow the ‘red thread’ of Fellini’s obsessions.

Complementing the film and multimedia programs, the Cineteca Bologna and the L’Immagine Ritrovata film restoration and conservation laboratory present the FIAF (Fédération Internationale des Archives du Film) summer school in film restoration. The summer school provides distance education on the theory of film restoration, on-site classes in the practice of restoration and an internship, aimed primarily at archivists and film industry workers. The DVD Awards ceremony acknowledges the best results in conservation and reproduction in the digital format from the previous year. Exhibitions of photographs and multimedia, and the commitment to training a new generation of film conservators, are evidence of the breadth of Il Cinema Ritrovato and its interest in the future of restoration.

The crowds spilling beyond the limitations of the Piazza Maggiore, the full houses in the Sala Scorsese or the Cinema Lumière, the visitors to MAMbo and the interest in workshops in conservation provide measurable evidence of the breadth of Il Cinema Ritrovato. The devotion to film history and the reverence for film is expressed in the dedication of the organisation, which is mirrored in the vibrancy of the audiences in both large public spaces and in the more intimate theatres. The culture of Il Cinema Ritrovato sits resolutely against the swirling fears about the end of celluloid and the eclipse of film by digital media. But this is not a festival that exists in opposition to change; it is one that is progressively engaged with film and media histories. Peter von Bagh defines Il Cinema Ritrovato as a “web of correspondences in the finest sense of the word” (2010 9). He argues that the “program is always immeasurably more than a succession of films. Behind the scene of the program is not only the Bologna staff, but also so many individual participants, and the enormously knowledgeable audience we now have around us” (2010 9). Bazin notes that when the festival reviewer returns home, “he feels as though he’s come back from far away, having spent a long spell in a world where order, rigour and necessity reign” with the experience an “amazing albeit hard-working retreat, with cinema as its unifying spiritual focus” (1955, 2009 19). On both sides of the screen, in its organisation and in its audiences, Il Cinema Ritrovato reaffirms the life and vibrancy of cinema of the near and distant past.


André Bazin (2009, 1955) ‘The Festival Viewed As A Religious Order’, Dekalog3: On Film Festivals, Richard Porton (ed.), London: Wallflower, 13-19.

Peter von Bagh (2010) ‘Introduzione/Foreword’, Il Cinema Ritrovato, 24th Edizione, Bologna: Cineteca del Comune di Bologna, 9-12.

Paolo Cherchi Usai [et al] (2008) Film Curatorship: Archives, Museums, and the Digital Marketplace, Wien: Synema.

Guy Borlee, Roberto Chiesi [eds] (2010) Il Cinema Ritrovato, 24th Edizione, Bologna: Cineteca del Comune di Bologna.

Tom Gunning (1995) ‘An Aesthetic of Astonishment’, Viewing Positions: Ways of Seeing Film, Linda Williams (ed.), New Jersey: Rutgers University Press, 114-33.

Kent Jones (2010) ‘Mest’, in Il Cinema Ritrovato, Guy Borlee, Roberto Chiesi [eds], 24th Edizione, Bologna: Cineteca del Comune di Bologna, 47.

Mark Peranson (2009) ‘First You Get the Power, Then You Get the Money: Two Models of Film Festivals’, Dekalog3, Richard Porton (ed.) London: Wallflower Press, 23-37.

Lauro Venturi (2010, 1948) ‘Isole Nella Laguna’, in Il Cinema Ritrovato, Guy Borlee, Roberto Chiesi [eds], 24th Edizione, Bologna: Cineteca del Comune di Bologna, 29-30.



Wendy Haslem is a lecturer in Screen Studies in the School of Culture and Communication at the University of Melbourne, where she is also Coordinator of the Moving Image MA, which is part of the Master of Arts and Cultural Management. Wendy teaches, researches and publishes on the intersections of film history and new media. Her research includes: Gothic film, film noir, cinema of the 1950s, Atomic culture, trauma cinema, censorship, Japanese film, Australian film culture and industry. Wendy is interested in the impact of new forms of exhibition on the archive. She is the author of ‘A Charade of Innocence and Vice’: Hollywood Gothic Films of the 1940s (2009) and she is a co-editor of the anthology Super/Heroes: From Hercules to Superman (2007). She is currently researching the evolution of the Gothic from silent cinema to new media for her book Gothic Projections: From Méliès to New Media. Email:






From Night and Day to De-Lovely: Cinematic Representations of Cole Porter – Penny Spirou

Cole Porter (1891 – 1964) was a composer and popular songwriter with two musical biopics that explore his life story: Night and Day (Michael Curtiz, 1946) and De-Lovely (Irwin Winkler, 2004). Due to their time of release and production, the films offer different interpretations of Cole Porter’s life through musical integration, narrative content, star casting and genre characteristics. Star casting in De-Lovely and Night and Day transforms the way the film audience interprets the protagonist. Casting for a musical biopic is significant as it changes the perception of both the character and the actor. Several popular music artists appear in De-Lovely, not as specific characters but as performers/singers of Porter’s music. The following will explore the function of the popular music artists in the film as well as the casting of Kevin Kline as Cole Porter in comparison to Cary Grant in Night and Day. In terms of genre characteristics, De-Lovely is recognised as what Altman refers to as the ‘backstage musical’ (1987). This genre categorisation can be identified in the narrative premise of the film. The over-arching plot in De-Lovely is that Cole Porter (in the final days before his death) witnesses his life re-enacted through a musical onstage. This stage musical then becomes the diegetic film narrative. A discussion concerning the backstage musical and the function of the contemporary self-reflexive approach will conclude the analysis of De-Lovely and how style and form affect the interpretation of content.

Keaton and the Lion: A Critical Re-evaluation of The Cameraman, Free and Easy and Speak Easily – Anna Gardner

Much of the academic writing on the films of Buster Keaton concentrates on the silent period from 1917, when Keaton began his career in films, to 1928 when his independent production company (Buster Keaton Productions) was wound up and Keaton signed a contract with Metro-Goldwyn-Mayer. This article will explore some of the less highly regarded films of Keaton’s career, namely those made at MGM between 1928 and 1933, which have been neglected in the majority of the academic literature on Keaton as they fall outside what is generally considered to be his most fertile and creative period.

Volume 14, 2008

Double Trouble – Special Issue on Split and Double Screens 

Guest Editors: Tessa Dwyer & Mehmet Mehmet


1. Double Trouble: Editorial – Tessa Dwyer & Mehmet Mehmet

2. The Mosaic-Screen: Exploration and Definition – Sergio Dias Branco

3. Sound and Space in the Split-Screen Movie – Ian Garwood

4. The Embedded Screen and the State of Exception: Counterterrorist Narratives and the “War on Terror” – Cormac Deane

5. “What Am I… Beloved or Bewitched?” Split Screens, Gender Confusion, and Psychiatric Solutions in The Dark Mirror – Tim Snelson

6. Medusa in the Mirror: The Split World of Brian De Palma’s Carrie – David Greven

7. The Double Side of Delay: Sutapa Biswas’ film installation Birdsong and Gilles Deleuze’s Actual/Virtual Couplet – Maria Walsh

8. Missed Encounters: Film Theory and Expanded Cinema – Bruno Lessard

9. Four Cameras are Better than One: Division as Excess in Mike Figgis’ Timecode – Nadia Bozak

10. The Aesthetics of Displays: How the Split Screen Remediates Other Media – Malte Hagener

11. Double Take: Rotoscoping and the Processing of Performance – Kim Louise Walden


Sound and Space in the Split-Screen Movie – Ian Garwood

Abstract: This article focuses on the operation of sound in the split-screen movie. It concentrates, in particular, on instances where the storytelling function of sound is accompanied by the aural exploration of the split screen as a specific spatial form. Different relationships between the soundtrack and multiple frames are demonstrated through examples from The Thomas Crown Affair, The Boston Strangler and Timecode.

Refractory Volume 8, 2005

Edited by Angela Ndalianis and Wendy Haslem

Some of the essays in this special bumper issue were presented as papers at the Men in Tights! Superheroes Conference, held at the University of Melbourne in June 2005.


1. True Lies: Do We Really Want Our Icons to Come to Life – Louise Krasniewicz

2. The Comicbook Superhero: Myth For Our Times – Nigel Kaw

3. Toys and Grrls: Comparing Figures in the Merchandising of Television’s Action Heroine – Miranda J. Banks

4. What the *Hezmanah* Are You Talking about?: Alien Discourses in ‘Farscape’ – Jes Battis

5. Xena’s Double-Edged Sword: Sapphic Love & the Judaeo-Christian tradition – Ivar Kvistad

6. Romancing the vampire: the lives and loves of two vampire slayers: Anita and Buffy – Ingrid Hofman-Howley

7. Smallville’s Sexual Symbolism: From Queer Repression to Fans’ Queered Expressions – Anne Kustritz

8. Cyborg girls and shape-shifters: The discovery of difference by Anime and Manga Fans in Australia – Craig Norris

9. The Bold and the Forgetful: Amnesia, Character Mutability and Serial Narrative Form in The X-Men – Radha O’Meara

10. All’s Well, the Twentieth Century Dies: David Bowie as Postmodern Art Detective Professor – Kellie A. Wacker

11. Side FX – the Aura of Electronics in the Information Age – Rock Chugg

12. More than Meets the Eye: the Suburban Cinema Megaplex as Sensory Heterotopia – Leanne Downing