Subtitles on the Moving Image: an Overview of Eye Tracking Studies – Jan Louis Kruger, Agnieszka Szarkowska and Izabela Krejtz


This article provides an overview of eye tracking studies on subtitling (also known as captioning), and makes recommendations for future cognitive research in the field of audiovisual translation (AVT). We find that most studies in the field to date fail to address the actual processing of the verbal information contained in subtitles, focusing instead on the impact of subtitles on viewing behaviour. We also show how eye tracking can be utilised to measure not only the reading of subtitles, but also the impact of stylistic elements such as language usage, and of technical issues such as the presence of subtitles during shot changes, on the cognitive processing of the audiovisual text as a whole. We support our overview with empirical evidence from eye tracking studies conducted on a number of languages, language combinations and viewing contexts, as well as with different types of viewers/readers, such as hearing, hard of hearing and Deaf people.


The reading of printed text has received substantial attention from scholars since the 1970s (for an overview of the first two decades, see Rayner 1998). Many of these studies, conducted from a psycholinguistic angle, made use of eye tracking. As a result, a large body of knowledge exists on the reading eye movements of people with varying levels of reading skill and language proficiency, of different ages, first languages and cultural backgrounds, and in different contexts. Studies on subtitle reading, however, have not achieved the same level of scientific rigour, largely for practical reasons: subtitles are not static for more than a few seconds at a time; they compete for visual attention with a moving image; and they compete for overall cognitive resources with verbal and non-verbal sounds. This article will identify some of the gaps in current research in the field, and also illustrate how some of these gaps can be bridged.

Studying the reading of subtitles is significantly different from studying the reading of static text. In the first place, as far as eye tracking software is concerned, the subtitles appear on a moving image as image rather than text, which renders traditional text-based reading statistics and software all but useless. This also makes the collection of data for reading research on subtitles a painstakingly slow process involving substantial manual inspection and coding. Secondly, the fact that subtitles appear against the background of the moving image means that they are always in competition with this image, which renders the reading process fundamentally different from that of static texts: on the one hand, the reading of subtitles competes with the processing of the image, sometimes resulting in interrupted reading; on the other hand, the limited time the subtitles are on screen means that readers have less time to reread or regress to study difficult words or to check information. Either way, studying this reading process, and the cognitive processing that takes place during the reading, is much more complicated than in the case of static texts, where we know that the reader is mainly focussing on the words before her/him without additional auditory and visual information to process.

While the viewing of subtitles has been the object of a growing number of eye tracking studies in recent years (see, for example, Bisson et al. 2012; d’Ydewalle and Gielen 1992; d’Ydewalle and De Bruycker 2007; Ghia 2012; Krejtz et al. 2013; Kruger 2013; Kruger et al. 2013; Kruger and Steyn 2014; Perego et al. 2010; Rajendran et al. 2013; Specker 2008; Szarkowska et al. 2011; Winke et al. 2013), the study of the reading of subtitles remains largely uncharted territory with many research avenues still to be explored. Those studies that do venture to measure more than just attention to the subtitle area seldom do so for extended texts.

In this article we provide an overview of studies on how subtitles change the way viewers process audiovisual material, and also of studies on the unique characteristics of the subtitle reading process. Taking an analysis of the differences between reading printed (static) text and subtitles as point of departure, we examine a number of aspects typical of the way subtitle text is processed in reading. We also look at the impact of the dynamic nature of the text and the competition with other sources of information on the reading process (including scene perception, changes in the viewing process, shifts between subtitles and image, visual saliency of text, faces, and movement, and cognitive load), as well as discussing studies on the impact of graphic elements on subtitle reading (e.g. number of lines, and text chunking), and studies that attempt to measure the subtitle reading process in more detail.

We start off with a discussion of the way in which watching an audiovisual text with subtitles alters viewing behaviour, as well as of the complexities of studying subtitles that arise from the dynamic nature of the image against which they appear. Here we focus on the fleeting nature of the subtitle text, the competition between reading the subtitles and scanning the image, and the interaction between different sources of information. We further discuss internal factors that impact on subtitle processing, such as the language and culture of the audience, the language of the subtitles, the degree of access the audience has to sound, and other internal factors, before turning to external factors related to the nature of the audiovisual text and the presentation of the subtitles. Finally, we provide an overview of studies attempting to measure the processing of subtitles, as well as findings from two studies that approach the processing of subtitles in more detail.

The dynamic nature of the subtitle reading process

Reading subtitles differs substantially from reading printed text in a number of respects. As opposed to “static text on a stable background”, the viewer of subtitled audiovisual material is confronted with “fleeting text on a dynamic background” (Kruger and Steyn 2014, 105). In consequence, viewers not only need to process and integrate information from different communication channels (verbal visual, non-verbal visual, verbal auditory, non-verbal auditory; see Gottlieb 1998), but they also have no control over the presentation speed (see Kruger and Steyn 2014; Szarkowska et al. forthcoming). Unlike in the reading of static texts, then, the pace of reading is in part dictated by the text rather than the reader – by the time the text is available to be read – and there is much less time for the reader to regress to an earlier part of a sentence or phrase, and no opportunity to return to previous sentences. Subtitle reading thus takes place in a limited window which, the reader is acutely aware, will disappear in a few seconds. Even though there are exceptions to the level of control a viewer has – for example DVD, PVR and other electronic media where the viewer can rewind and fast-forward at will – the typical viewing of subtitles for most audiovisual products happens continuously and without pauses, just as when watching live television.

Regressions, which form an important consideration in the reading of static text, take on a different aspect in the context of the viewer’s knowledge that dwelling too long on any part of a subtitle may make it difficult to finish reading the subtitle before it disappears. Any subtitle is on screen for between one and six seconds, and the viewer also has to simultaneously process all the other auditory (in the case of hearing audiences) and visual cues. In other words, unlike when reading printed text, reading becomes only one of the cognitive processes the viewer has to juggle in order to understand the audiovisual text as a whole. Some regressions are in fact triggered by the change of the image in shot changes (and, to a much lesser extent, scene changes) when the text stays on across these boundaries: the viewer sometimes returns to the beginning of the subtitle to check whether it is a new subtitle, and sometimes even re-reads the subtitle. In a recent study, Krejtz et al. (2013) established that participants tend not to re-read subtitles after a shot change or cut, but their data also revealed that a proportion of the participants did return their gaze to the beginning of the subtitle after such a change (see also de Linde and Kay 1999). What this means for the study of subtitle reading is that these momentary returns (even if only for checking) result in a class of regressions that is not in fact a regression to re-read a word or section, but rather a false initiation of reading of what some viewers initially perceive to be a new sentence.

On the positive side, the fact that subtitles are embedded in a moving image and are accompanied by a soundtrack (in the case of hearing audiences) facilitates the processing of language in context. Unfortunately, this context also introduces competition for attention and cognitive resources. For the Deaf and hard of hearing audience, attention has to be divided between reading the subtitles and processing the scene: extracting information from facial expressions, lip movements and gestures, and matching or checking this against the information obtained in the subtitles. For the hearing audience who use subtitles for support, or to gain access to foreign language dialogue, attention is likewise divided between subtitles and the visual scene. Just as Deaf and hard of hearing audiences face the added demand on their cognitive resources of having to match what they read with what they get from non-verbal signs and lip movements, the hearing audience matches what they read with what they hear, checking for correspondence of information and interpreting intonation, tenor and other non-verbal elements of speech.

What stands beyond doubt is that the appearance of subtitles changes the viewing process. In 2000, Jensema et al. famously stated that “the addition of captions to a video resulted in major changes in eye movement patterns, with the viewing process becoming primarily a reading process” (2000a, 275). Having examined the eye movements of six subjects watching video clips with and without subtitles, they found that the onset of a subtitle triggers a change in the eye movement pattern: when a subtitle appears, viewers move their gaze from whatever they were watching in order to follow the subtitle. In a larger-scale study, d’Ydewalle and De Bruycker (2007, 196) concluded that “paying attention to the subtitle at its presentation onset is more or less obligatory and is unaffected by major contextual factors such as the availability of the soundtrack, knowledge of the foreign language in the soundtrack, and important episodic characteristics of actions in the movie: Switching attention from the visual image to “reading” the subtitles happens effortlessly and almost automatically”.

Subtitles therefore appear to cause an eye movement bias similar to that exerted by faces (see Hershler & Hochstein, 2005; Langton, Law, Burton, & Schweinberger, 2008; Yarbus, 1967), the centre of the screen, contrast and movement. In other words, subtitles attract the gaze at least in part because the eye is drawn to words on screen just as it is drawn to movement and other salient elements. Eyes are drawn to subtitles not only because the text is identified as a source of meaningful information (a top-down impulse, as the viewer consciously consults the subtitles to obtain relevant information), but also because of the change to the scene that the appearance of a subtitle causes (a bottom-up impulse, automatically drawing the eyes to what has changed on the screen).

As in most other contexts, the degree to which viewers will process the subtitles (i.e. read them rather than merely look at them when they appear and then look away) will be determined by the extent to which they need the subtitles to follow the dialogue or to obtain information on relevant sounds. In studying visual attention to subtitles it therefore remains a priority to measure the degree of processing, something that has not been done in more than a handful of studies, and something to which we will return later in the article.

Viewers usually attend to the image on the screen, but when subtitles appear, it only takes a few frames for most viewers to move their gaze to read the subtitles. The fact that people tend to move their gaze to subtitles the moment they appear on the screen is illustrated in Figures 1 and 2.

Figure 1. Heat maps of three consecutive film stills – Polish news programme Fakty (TVN) with intralingual subtitles.


Figure 2. Heat maps of two consecutive film stills – Polish news programme Wiadomości (TVP1) with intralingual subtitles.


Likewise, when the gaze of a group of viewers watching an audiovisual text without subtitles is compared to that of a similar group watching the same text with subtitles, the split in attention is immediately visible as the second group reads the subtitles and attends less to the image, as can be seen in Figure 3.

Figure 3. Heat maps of the same scene seen without subtitles and with subtitles – recording of an academic lecture.


Viewer-internal factors that impact on subtitle processing

The degree to which the subtitles are processed is far from straightforward. In a study performed at a South African university, Sesotho-speaking students watching a recorded lecture with subtitles in their first language and audio in English (their language of instruction) were found to avoid looking at the subtitles (see Kruger, Hefer and Matthew, 2013b). Sesotho students in a different group, who saw the same lecture with English subtitles, processed the subtitles to a much larger extent. This contrast is illustrated in the focus maps in Figure 4.


Figure 4. Focus maps of Sesotho students looking at a lecture in intralingual English subtitles (left) and another group looking at the same lecture with interlingual Sesotho subtitles (right) – recording of an academic lecture.

The difference in eye movement behaviour between the conditions is also evident when considering the number of subtitles skipped. Participants in the above study who saw the video with Sesotho subtitles skipped an average of around 50% of the Sesotho subtitles (median at around 58%), whereas participants who saw the video with English subtitles only skipped an average of around 20% of the English subtitles (with a median of around 8%) (see Kruger, Hefer & Matthew, 2014).

This example does not, however, represent the conventional use of subtitles, where viewers rely on the subtitles to gain access to a text from which they would otherwise have been excluded. It does serve to illustrate that subtitle reading is not unproblematic and that more research is needed on the nature of processing in different contexts by different audiences. For example, in a study in Poland, interlingual subtitles (English to Polish) were skipped slightly less often by hearing viewers than intralingual subtitles (Polish to Polish), possibly because hearing viewers did not need the latter to follow the plot (see Szarkowska et al., forthcoming).

Another important finding from eye tracking studies on subtitle processing relates to how viewers typically go about reading a subtitle. Jensema et al. (2000) found that in subtitled videos, “there appears to be a general tendency to start by looking at the middle of the screen and then moving the gaze to the beginning of a caption within a fraction of a second. Viewers read the caption and then glance at the video action after they finish reading” (2000, 284). This pattern is indeed often found, as illustrated by the sequence of frames from a short video from our study in Figure 5.

Figure 5. Sequence of typical subtitle reading – a recording of Polish news programme Fakty (TVN) with intralingual subtitles.


Some viewers, however, do not read so smoothly and tend to shift their gaze between the image and the subtitles, as demonstrated in Figure 6. These gaze shifts between the image and the subtitle, also referred to in the literature as ‘deflections’ (de Linde and Kay 1999) or ‘back-and-forth shifts’ (d’Ydewalle and De Bruycker 2007), can be regarded as an indication of the smoothness of the subtitle reading process: the fewer the gaze shifts, the more fluent the reading, and vice versa.

Figure 6. Scanpath of frequent gaze shifting between text and image – a recording of Polish news programme Fakty (TVN) with intralingual subtitles.


An important factor that influences subtitle reading patterns is the nature of the audience. In Figure 7 an interesting difference is shown between the way a Deaf and a hard of hearing viewer watched a subtitled video. The Deaf viewer moved her gaze from the centre of the screen to read the subtitle and then, after having read the subtitle, returned the gaze to the centre of the screen. In contrast, the hard of hearing viewer made constant comparisons between the subtitles and the image, possibly relying on residual hearing and trying to support the subtitle reading process with lip-reading. Such a result was reported by Szarkowska et al. (2011), who found differences in the number of gaze shifts between the subtitles and the image in the verbatim subtitles condition, particularly discernible (and statistically significant) in the hard of hearing group (when compared to the hearing and Deaf groups).

Figure 7. Scanpaths of Deaf and hard of hearing viewers. Left: Gaze plot illustrating the viewing pattern of a Deaf participant watching a clip with verbatim subtitles.  Right: Gaze plot illustrating the viewing pattern of a hard of hearing participant watching a clip with verbatim subtitles.


These provisional qualitative indications of differences between eye movements of users with different profiles require more in-depth quantitative investigation and the subsequent section will provide a few steps in this direction.

As mentioned above, subtitle reading patterns depend to a large extent on the type of viewer. Fluent readers have been found to have no difficulty following subtitles. Diao et al. (2007), for example, found a direct correlation between the impact of subtitles on learning and the academic and literacy levels of participants. Similarly, given that “hearing status and literacy tend to covary” (Burnham et al. 2008, 392), some previous studies have found important differences in the way hearing and hearing-impaired people watch subtitled programmes. Robson (2004, 21) notes that “regardless of their intelligence, if English is their second language (after sign language), they [i.e. Deaf people] cannot be expected to have the same comprehension levels as hearing people who grew up exposed to English”. This is indeed confirmed by Szarkowska et al. (forthcoming), who report that the Deaf and hard of hearing viewers in their study made more fixations on subtitles and that their dwell time on the subtitles was longer than that of hearing viewers. This result may indicate a larger effort needed to process subtitled content and more difficulty in extracting information (see Holmqvist et al. 2011, 387-388). That, in turn, may stem from the fact that for some Deaf people the language in the subtitles is not their mother tongue (their L1 being sign language). At the same time, for hearing-impaired viewers, subtitles provide an important source of information on the words spoken in the audiovisual text as well as other information contained in the audio track, which in itself explains why they would spend more time looking at the subtitles.

Viewer-external factors that impact on subtitle processing

The ‘smoothness’ of the subtitle reading process depends on a number of factors, including the nature of the audiovisual material as well as technical and graphical aspects of the subtitles themselves. At a general level, genre has an impact both on the role of subtitles in the total viewing experience and on the way viewers process the subtitles. For example, d’Ydewalle and Van Rensbergen (1989) found that children in Grade 2 paid less attention to subtitles if a film involved a lot of action (see d’Ydewalle & De Bruycker 2007 for a discussion). The reason could simply be that action films tend to have less dialogue in the first place; more significantly, however, the pace of the visual editing and the use of special effects create a stronger visual element, shifting the balance of content towards the action (visual content) and away from the dialogue (soundtrack and therefore subtitles). This, however, is an area that has to be investigated empirically. At a more specific level, technical characteristics of an audiovisual text, such as film editing, have an impact on the processing of subtitles.

1 Film editing

Film editing has a strong influence on the way people read subtitles, even beyond the difference in editing pace as a result of genre (for example, action and experimental films could typically be said to have a higher editing pace than dramas and documentaries). In terms of audience perception, viewers have been found to be unaware of standard film editing techniques (such as continuity editing) and are thus able to perceive film as a continuous whole in spite of numerous cuts – the phenomenon termed “edit blindness” (Smith & Henderson, 2008, 2). With more erratic and fast-paced editing, it stands to reason that the cognitive demands will increase as viewers have to work harder to sustain the illusion of a continuous whole.

When subtitles clash with editing such as cuts (i.e. if subtitles stay on screen over a shot or scene change), conventional wisdom as passed on by generations of subtitling guides (see Díaz Cintas & Remael 2007, ITC Guidance on Standards for Subtitling 1999) suggests that the viewer will assume that the subtitle has changed with the image and as a consequence they will re-read it (see above). However, Krejtz et al. (2013) reported that subtitles displayed over shot changes are more likely to cause perceptual confusion by making viewers shift their gaze between the subtitle and the rest of the image more frequently than subtitles which do not cross film cuts (cf. de Linde and Kay 1999). As such, the cognitive load is bound to increase.

2 Text chunking and line segmentation

Another conventional wisdom, perpetuated in subtitling guidelines and standards, is that poor line segmentation will result in less efficient processing (see Díaz Cintas & Remael 2007, Karamitroglou 1998). In other words, subtitles should be chunked per line, and between subtitles, in self-contained semantic units. The line of dialogue “He told me he would meet me at the red mailbox” should therefore be segmented in one of the following ways:

He told me he would meet me
at the red mailbox.


He told me
he would meet me at the red mailbox.

Neither of the following segmentations would be optimal because the prepositional phrase ‘at the red mailbox’ and the verb phrase ‘he would meet me’, respectively, are split, which is considered an error:

He told me he would meet me at the
red mailbox

He told me he
would meet me at the red mailbox.
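The chunking principle above can be sketched in code. The following fragment is our own illustration, not drawn from any of the guidelines cited; it assumes that phrase boundaries have already been identified (for instance by a parser or by the subtitler). Breaks are only permitted between semantic units, and the most balanced admissible pair of lines is preferred:

```python
# Illustrative sketch (our own, hypothetical): two-line subtitle segmentation
# that never breaks inside a phrase, as subtitling guidelines recommend.

def best_two_line_split(phrases, max_line_len=37):
    """Pick the phrase boundary that gives the most balanced two lines.

    phrases: list of strings, each a self-contained semantic unit.
    Returns (line1, line2); breaks only *between* phrases, never inside one.
    """
    candidates = []
    for i in range(1, len(phrases)):
        line1 = " ".join(phrases[:i])
        line2 = " ".join(phrases[i:])
        if len(line1) <= max_line_len and len(line2) <= max_line_len:
            # Prefer the most balanced split (smallest length difference).
            candidates.append((abs(len(line1) - len(line2)), line1, line2))
    if not candidates:
        raise ValueError("no admissible split within the line-length limit")
    _, line1, line2 = min(candidates)
    return line1, line2

print(best_two_line_split(["He told me", "he would meet me", "at the red mailbox."]))
# -> ('He told me he would meet me', 'at the red mailbox.')
```

For the example dialogue line this yields the first of the two acceptable segmentations shown above; both ill-segmented versions are excluded by construction, because they would require a break inside a phrase.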

However, Perego et al. (2010) found that poor line segmentation in two-line subtitles did not affect subtitle comprehension negatively. They analysed 28 subtitles viewed by 16 participants, using a threshold line between the subtitle region and the upper part of the screen (the main film zone), but did not find a statistically significant difference between the well-segmented and ill-segmented subtitles in terms of fixation counts, total fixation time, or number of shifts between the subtitle region and the upper area. The only statistically significant difference they found was in the mean fixation duration within the subtitle area, with fixations in the ill-segmented subtitles being on average 12ms longer than in the well-segmented subtitles. Although the authors downplay the importance of this difference on the grounds that it is so small, it does seem to indicate at least a slightly higher cognitive load when the subtitles are ill-segmented. The small number of subtitles and participants, however, makes it difficult to generalize from their results, again a consequence of the fact that it is difficult to extract reading statistics for subtitles unless reading behaviour can be quantified over longer audiovisual texts.

In a study conducted a few years later, Rajendran et al. (2013) found that “chunking improves the viewing experience by reducing the amount of time spent on reading subtitles” (2013, 5). This study compared conditions different from those investigated by Perego et al. (2010), excluding their ill-segmented condition, and focused mostly on live subtitling with respeaking. In the earlier study, which focused on pre-recorded subtitling, the subtitles in the two conditions were essentially still part of one sense unit that appeared as one two-line subtitle. In the later study, the conditions were chunked by phrase (similar to the well-segmented condition of the earlier study, but with phrases appearing one by one on one line), no segmentation (where the subtitle area was filled with as much text as possible, with no attempt at segmentation), word by word (where words appeared one by one) and chunked by sentence (where the sentences appeared one by one). Although this later study therefore essentially investigated different conditions, it did find that the most disruptive condition was the one where the subtitle appeared word by word, eliciting more gaze points (defined less strictly than in the fixation algorithms used by commercial eye trackers) and more “saccadic crossovers”, or switches between image and subtitle area. However, in the study by Rajendran et al. (2013) the videos were extremely short (under a minute) and the sound was muted, hampering the ecological validity of the material and once again making the findings less suitable for generalization.

Although both these studies have limitations in terms of generalizability, they both provide some indication that segmentation has an impact on subtitle processing. Future studies will nonetheless have to investigate this aspect over longer videos to determine whether the graphical appearance, and particularly the segmentation of subtitles, has a detrimental effect on subtitle processing in terms of cognitive load and effectiveness.

3 Language

The language of subtitles has received considerable attention from psycholinguists in the context of subtitle reading. D’Ydewalle and de Bruycker (2007) examined eye movement behaviour of people reading standard interlingual subtitles (with the audio track in a foreign language and subtitles in their native language) and reversed subtitles (with the audio in their mother tongue and subtitles in a foreign language). They found more regular reading patterns in the standard interlingual subtitling condition, with the reversed subtitling condition having more subtitles skipped, fewer fixations per subtitle, etc. (see also d’Ydewalle and de Bruycker 2003 and Pavakanun 1993). This is an interesting finding in itself, as it is the reversed subtitling that has been found to be particularly conducive to foreign language learning (see Díaz Cintas and Fernández Cruz 2008, and Vanderplank 1988).

Szarkowska et al. (forthcoming) examined differences in reading patterns of intralingual (Polish to Polish) and interlingual (English to Polish) subtitles among a group of Deaf, hard of hearing and hearing viewers. They found no differences in reading for the Deaf and hard of hearing audiences, but hearing people made significantly more fixations to subtitles when watching English clips with interlingual Polish subtitles than Polish clips with intralingual Polish subtitles. This confirms that the hearing viewers processed the subtitles to a significantly lower degree when they were redundant, as in the case of intralingual transcriptions of the soundtrack. What would be interesting to investigate in this context is those instances when the hearing audience did in fact read the subtitles, to determine to what extent and under what circumstances the redundant written information is used by viewers to support their auditory intake of information.

In a study on the influence of translation strategies on subtitle reading, Ghia (2012) investigated differences in the processing of literal vs. non-literal translations into Italian of an English film clip (6 minutes) watched by Italian EFL learners. According to Ghia, just as subtitle format, layout, and segmentation have the potential to affect visual and perceptual dynamics, the relationship translation establishes with the original text means that “subtitle translation is also likely to influence the perception of the audiovisual product and viewers’ general reading patterns” (2012, 175). Ghia particularly wanted to investigate the processing of different translation strategies in the presence of the sound and image accompanying the subtitles. She found that non-literal translations (where the target text diverged from the source text) resulted in more deflections between text and image. This parallels the finding of Rajendran et al. (2013) that the less fluent, word-by-word presentation of subtitles elicits more switches between image and subtitle area.

As can be seen from the above, the aspect of language processing in the context of subtitled audiovisual texts has received some attention, but has not to date been approached in any comprehensive manner. In particular, there is a need for more psycholinguistic studies to determine how subtitle reading differs from the reading of static text, and how this knowledge can be applied to the practice of subtitling.

Measuring subtitle processing

1 Attention distribution and presentation speed

In the study by Jensema et al. (2000), subjects spent on average 84% of the time looking at subtitles, 14% at the video picture and 2% outside of the frame. The study represents an important early attempt to identify patterns in subtitle reading, but it has considerable limitations: it had only six participants, three deaf and three hearing, and the video clips were extremely short (around 11 seconds each), presented with English subtitles (in upper case) and without sound. The absence of a soundtrack is likely to have inflated the time spent on the subtitles. In Perego et al.’s study (2010), the ratio is reported as 67% on the subtitle area and 33% on the image; here, 41 Italian participants watched a 15-minute clip with a Hungarian soundtrack and subtitles in Italian. As in the previous study, the audience therefore had to rely heavily on the subtitles in order to follow the dialogue. Kruger et al. (2014), in the context of intralingual subtitles in a Psychology lecture in English, found a ratio of 43% on the subtitles, 43% on the speaker and slides, and 14% on the rest of the screen. When the same lecture was subtitled into Sesotho, the ratio changed to 20% on the subtitles, 66% on the speaker and slides, and 14% on the rest of the screen. This wide range is an indication of how the distribution of visual attention differs across contexts with different language combinations, different levels of redundancy of information, and different audiences.
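Proportions like those reported above are conventionally derived by summing fixation durations inside each area of interest (AOI) and dividing by the total fixation time. A minimal sketch, using hypothetical fixation records rather than data from any of the studies cited:

```python
# Illustrative sketch (hypothetical data): attention-distribution percentages
# from fixation records, computed as each AOI's share of total fixation time.

def attention_distribution(fixations):
    """fixations: list of (aoi_label, duration_ms) tuples.

    Returns {aoi: share of total fixation time, in the range 0..1}.
    """
    totals = {}
    for aoi, dur in fixations:
        totals[aoi] = totals.get(aoi, 0) + dur
    grand_total = sum(totals.values())
    return {aoi: dur / grand_total for aoi, dur in totals.items()}

# Four invented fixations: two in the subtitle AOI, two on the image.
fixations = [("subtitle", 250), ("subtitle", 210), ("image", 300), ("image", 240)]
print(attention_distribution(fixations))
# -> {'subtitle': 0.46, 'image': 0.54}
```

The AOI labels here would in practice come from mapping raw gaze coordinates onto screen regions, a step we omit for brevity.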

In order to account for “the audiovisual nature of subtitled programmes”, Romero-Fresco (in press) puts forward the notion of ‘viewing speed’ – as opposed to reading speed and subtitling speed – which he defines as “the speed at which a given viewer watches a piece of audiovisual material, which in the case of subtitling includes accessing the subtitle, the accompanying images and the sound, if available”. The perception of subtitled programmes is therefore a result not only of subtitle reading patterns, but also of the visual elements of the film. Based on an analysis of over seventy-one thousand subtitles created in the course of the Digital Television for All project, Romero-Fresco provides the following data on viewing speed, reflecting the proportion of time viewers spent looking at subtitles and at the images relative to the subtitle presentation rate (see Table 1).

Viewing speed Time on subtitles Time on images
120wpm ±40% ±60%
150wpm ±50% ±50%
180wpm ±60%-70% ±40%-30%
200wpm ±80% ±20%

Table 1. Viewing speed and distribution of gaze between subtitles and images (Romero-Fresco) 
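Read as a rough guide, the trend in Table 1 can be captured with a small interpolation function. The sketch below is an illustration only: the linear interpolation between the reported points, and the use of midpoints where the table gives a range (e.g. 60%-70% at 180 wpm), are assumptions of this sketch, not part of Romero-Fresco's data.

```python
def time_on_subtitles(wpm: float) -> float:
    """Approximate proportion of viewing time spent on subtitles for a
    given presentation rate, interpolated linearly between the points
    reported in Table 1 (midpoints used where a range is given)."""
    points = [(120, 0.40), (150, 0.50), (180, 0.65), (200, 0.80)]
    if wpm <= points[0][0]:
        return points[0][1]
    if wpm >= points[-1][0]:
        return points[-1][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= wpm <= x1:
            # linear interpolation between adjacent reported rates
            return y0 + (y1 - y0) * (wpm - x0) / (x1 - x0)
```

For example, under these assumptions a presentation rate of 135 wpm would put roughly 45% of viewing time on the subtitles.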

Jensema et al. also suggested that the subtitle presentation rate may influence the time spent reading subtitles as opposed to watching the rest of the image: “higher captioning speed results in more time spent reading captions on a video segment” (2000, 275). This was later confirmed by Szarkowska et al. (2011), who found that viewers spent more time on verbatim subtitles displayed at higher presentation rates than on edited subtitles displayed at lower rates, as illustrated by Figure 8.

Figure 8. Fixation-count based heatmaps illustrating changes in attention allocation of hearing and Deaf viewers watching videos subtitled at different rates.


2 Mean fixation duration

Irwin (2004, 94) states that “fixation location corresponds to the spatial locus of cognitive processing and that fixation or gaze duration corresponds to the duration of cognitive processing of the material located at fixation”. Within the same activity (e.g. reading), longer mean fixation durations could therefore be said to reflect more cognitive processing and higher cognitive load. One would therefore expect viewers to have longer fixations when the subject matter is more difficult, or when the language is more specialized. Across activities, however, comparisons of fixation duration are less meaningful, as reading elicits more, and shorter, fixations than scene perception or visual scanning simply because of the nature of the activities. It is therefore essential in eye tracking studies of subtitle reading to distinguish between the actual subtitles when they are on screen, the rest of the screen, and the subtitle area when there is no text (between successive subtitles).

The difference between reading and scene perception is illustrated in Figure 9: fixations on the image tend to be longer (indicated here by a bigger circle) and more exploratory in nature than the shorter, more focused fixations on the subtitle text (see the distinction between focal and ambient fixations in Velichkovsky et al. 2005).

Figure 9. Differences in fixation durations between the image and subtitle text – from Polish TV series Londyńczycy.


Rayner (1984) indicated the impact of different tasks on mean fixation durations, as reflected in Table 2 below:

Task Mean fixation duration (ms) Mean saccade size (degrees)
Silent reading 225 2 (about 8 letters)
Oral reading 275 1.5 (about 6 letters)
Visual search 275 3
Scene perception 330 4
Music reading 375 1
Typing 400 1 (about 4 letters)

 Table 2. Approximate Mean Fixation Duration and Saccade Length in Reading, Visual Search, Scene Perception, Music Reading, and Typing[1]

In subtitling, silent reading is accompanied by simultaneous processing of the same information in the soundtrack (in the same or another language) as well as of other sounds and visual signs (for a hearing audience, that is – for a Deaf audience, it would be text and visual signs). The difference in mean fixation duration across these tasks therefore reflects differences in cognitive load. In silent reading of static text, there is no external competition for cognitive resources. When reading out loud, the speaker/reader inevitably monitors his/her own reading, introducing additional cognitive load. As the nature of the sign becomes more abstract, the load, and with it the fixation duration, increases; in the case of typing, different processing, production and checking activities are performed simultaneously, resulting in even higher cognitive load. This is inevitably an oversimplification of cognitive load, and indeed the nature of information acquisition in reading successive groups of letters (words) in a linear fashion differs significantly from that of scanning a visual scene for cues.

Undoubtedly, subtitle reading imposes different cognitive demands, and these demands are also very much dependent on the audience. In an extensive study on the differences in subtitle reading between Deaf, hard of hearing and hearing participants, we found a high degree of variation in mean fixation duration between the groups, and also a difference between the mean fixation duration in the Deaf and the hard of hearing groups between subtitles presented at 12 characters per second and 15 characters per second (see Szarkowska et al. forthcoming).

  12 characters per second 15 characters per second
Deaf 241.93 ms 232.82 ms
Hard of hearing 218.51 ms 214.78 ms
Hearing 186.66 ms 186.58 ms

Table 3. Differences in reading subtitles presented at different rates

Statistical analyses performed on the three groups, with mean fixation duration as the dependent variable and group and speed as categorical factors, produced a statistically significant main effect, further confirmed by subsequent t-tests that yielded statistically significant differences in mean fixation duration between all three groups at both subtitling speeds. The difference between 12 cps and 15 cps was also significant within the Deaf and the hard of hearing groups. This suggests that presentation rate has a more pronounced effect on Deaf and hard of hearing viewers than on hearing ones.
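The pairwise comparisons described here can be illustrated with a minimal sketch of Welch's two-sample t statistic. The per-participant durations below are invented for illustration only, loosely centred on the group means in Table 3; they are not the study's data, and the group sizes are assumed.

```python
import math
import random

def welch_t(a, b):
    """Welch's two-sample t statistic (unequal variances), the kind of
    pairwise comparison used to contrast mean fixation durations
    between viewer groups."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    return (ma - mb) / math.sqrt(va / len(a) + vb / len(b))

# Hypothetical per-participant mean fixation durations (ms), loosely
# centred on the group means in Table 3 -- invented, illustrative data.
random.seed(1)
deaf = [random.gauss(242, 15) for _ in range(20)]
hearing = [random.gauss(187, 15) for _ in range(20)]
t_stat = welch_t(deaf, hearing)  # a large positive t for data like these
```

With group means this far apart relative to their spread, the statistic is large, which is what a subsequent significance test would pick up.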

3 Subtitle reading

As indicated at the outset, one of the biggest hurdles in studying the processing of subtitles is the fact that, as far as eye tracking analysis software is concerned, the subtitles appear as image on image rather than text on image. Whereas reading statistics software can automatically mark words as areas of interest in static texts, and then calculate the number of regressions, refixations, saccade lengths, fixation durations and fixation counts for specific words, this process has to be done manually for subtitles. Since it is virtually impossible to create similar areas of interest on subtitle words embedded in the image over large numbers of subtitles, it is very difficult to obtain reliable eye tracking results on subtitles as text. This explains the predominance of measures such as fixation count and fixation duration, as well as shifts between subtitle area and image, in eye tracking studies on subtitle processing. As a result, many of these studies do not distinguish directly between looking at the subtitle area and reading the subtitles, and “they tend to define crude areas of interest (AOIs), such as the entire subtitle area, which means that eye movement data are also collected for the subtitle area when there are no subtitles on screen, which further skews the data” (Kruger and Steyn, 2014, 109).

Although a handful of studies come closer to studying subtitle reading by going beyond fixation counts, mean fixation duration, and shifts between the subtitle area and the image area, most studies tend to focus on the amount of attention rather than the nature of attention. Briefly, the exceptions are the following: Specker (2008) looks at consecutive fixations; Perego et al. (2010) add path length (the sum of saccade lengths in pixels) to the more conventional measures; Rajendran et al. (2013) add the proportion of gaze points; Ghia (2012) looks at fixations on specific words as well as regressions; Bisson et al. (2012) look at the number of subtitles skipped and the proportion of successive fixations (the number of successive fixations divided by the total number of fixations); and in one of the most comprehensive studies on subtitle processing, d’Ydewalle and De Bruycker (2007) look at attention allocation (percentage of skipped subtitles, latency time, and percentage of time spent in the subtitle area), fixations (number, duration, and word-fixation probability), and saccades (saccade amplitude, percentage of regressive eye movements, and number of back-and-forth shifts between visual image and subtitle).
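To make the coarse, AOI-level measures concrete, one of them (the proportion of skipped subtitles) can be sketched as follows. The data structures, coordinate threshold and field names are illustrative assumptions, not taken from any of the studies cited.

```python
from dataclasses import dataclass

@dataclass
class Fixation:
    t: float    # fixation onset time (s)
    x: float    # gaze position on screen (px)
    y: float
    dur: float  # fixation duration (ms)

@dataclass
class Subtitle:
    start: float  # on-screen interval (s)
    end: float

# Hypothetical vertical boundary of the subtitle AOI (e.g. on a 720p frame).
SUBTITLE_TOP = 620

def in_subtitle_area(f: Fixation) -> bool:
    return f.y >= SUBTITLE_TOP

def skipped_proportion(fixations, subtitles):
    """Proportion of subtitles receiving no fixation in the subtitle
    AOI while on screen -- a coarse, AOI-level measure that says nothing
    about whether fixated subtitles were actually read."""
    skipped = sum(
        not any(s.start <= f.t <= s.end and in_subtitle_area(f)
                for f in fixations)
        for s in subtitles
    )
    return skipped / len(subtitles)
```

Note that a subtitle counts as "attended" here after a single fixation anywhere in the AOI, which is precisely why such measures cannot distinguish looking from reading.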

In a recent study, Kruger and Steyn (2014) provide a reading index for dynamic texts (RIDT) designed specifically to measure the degree of reading that takes place when subtitled material is viewed. This index is explained as “a product of the number of unique fixations per standard word in any given subtitle by each individual viewer and the average forward saccade length of the viewer on this subtitle per length of the standard word in the text as a whole” (2014, 110). Taking as the point of departure the location and start time of successive fixations within the subtitle area while a subtitle is present, the number of unique fixations (i.e. excluding refixations, and fixations following a regression) is determined, as well as the average length of forward saccades in the subtitle. Calculating the number of fixations per word, and the length of saccades as a ratio of the length of the average word in the audiovisual text, then gives an indication of the meaningful processing of the words in the subtitle. Essentially, the formula quantifies the reading of a particular subtitle by a particular participant by measuring eye movements during subtitle reading against what is known about eye movements during reading and the perceptual span.

In a little more detail, the formula can be written as follows for video v, with participant p viewing subtitle s:


(Kruger and Steyn, 2014, 110).

The index was validated against a manual inspection of the reading of 145 subtitles by 17 participants, and makes it possible to study the reading of subtitles over extended texts. In their study, Kruger and Steyn (2014) use the index to determine the relationship between subtitle reading and performance in an academic context, finding a significant positive correlation between the degree to which participants read the subtitles and their performance in a test written after watching subtitled lectures. The RIDT therefore presents a robust index of the degree to which subtitles are processed over extended texts, and could add significant value to psycholinguistic studies on subtitles. Using the index, previous claims that subtitles have a positive or negative impact on comprehension, vocabulary acquisition, language learning or other dependent variables can be correlated with whether, and to what extent, viewers actually read the subtitles.
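Working purely from the verbal definition quoted above (the published formula itself is not reproduced here), the computation can be sketched as follows. Everything in this sketch is an illustrative assumption rather than the authors' implementation: the function name, the pixel-based units, and in particular the operationalisation of "unique" fixations as those advancing beyond the rightmost point reached so far.

```python
def ridt(fixation_xs, n_standard_words, standard_word_px):
    """Sketch of a reading index for dynamic texts, following the verbal
    definition in Kruger and Steyn (2014, 110): unique fixations per
    standard word, multiplied by mean forward saccade length per
    standard-word length.

    fixation_xs: successive horizontal fixation positions (px) inside
    the subtitle area while the subtitle is on screen."""
    unique = 0
    forward_saccades = []
    rightmost = float("-inf")
    prev_x = None
    for x in fixation_xs:
        if x > rightmost:  # a forward-going fixation on new material
            unique += 1
            if prev_x is not None and x > prev_x:
                forward_saccades.append(x - prev_x)
            rightmost = x
        prev_x = x  # refixations and regressed-to positions not counted
    if not forward_saccades:
        return 0.0
    mean_saccade = sum(forward_saccades) / len(forward_saccades)
    return (unique / n_standard_words) * (mean_saccade / standard_word_px)
```

In this sketch, a three-word subtitle fixated once per word with forward saccades of one standard-word length scores 1, while skipped or only glanced-at subtitles score near 0, mirroring the idea of quantifying the degree of reading per subtitle.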


From this overview of studies investigating the processing of subtitles on the moving image, it should be clear that much still needs to be done to gain a better understanding of the impact of various independent variables on subtitle processing. The complexity of the multimodal text, and in particular the competition between different sources of information, means that a subtitled audiovisual text is a substantially altered product from a cognitive perspective. Much progress has been made in coming to grips with the way different viewers behave when looking at subtitled audiovisual texts, but there are still more questions than answers – relating, for instance, to differences in how people process subtitled content on various devices (cf. the HBBTV4ALL project). The use of measures such as eye tracking and EEG (see Kruger et al. 2014), in combination with subjective measures such as self-report questionnaires, is, however, continually bringing us closer to understanding the impact of audiovisual translation modalities such as subtitling on the experience and processing of audiovisual texts.



This study was partially supported by research grant No. IP2011 053471 “Subtitling for the deaf and hard of hearing on digital television” from the Polish Ministry of Science and Higher Education for the years 2011–2014.



Bisson, Marie-Josée, Walter Van Heuven, Kathy Conklin, and Richard Tunney. 2014. “Processing of Native and Foreign Language Subtitles in Films: An Eye Tracking Study.” Applied Psycholinguistics 35(2):399-418.

Burnham, Denis, Greg Leigh, William Noble, Caroline Jones, Michael Tyler, Leonid Grebennikov, and Alex Varley. 2008. “Parameters in Television Captioning for Deaf and Hard-of-Hearing Adults: Effects of Caption Rate versus Text Reduction on Comprehension.” Journal of Deaf Studies and Deaf Education 13(3):391-404.

de Linde, Zoé and Neil Kay. 1999. The Semiotics of Subtitling. Manchester: St. Jerome.

Diao, Y., P. Chandler, and J. Sweller. 2007. “The Effect of Written Text on Comprehension of Spoken English as a Foreign Language.” The American Journal of Psychology 120(2):237-261.

Díaz Cintas, Jorge and Marco Fernandez Cruz. 2008. “Using Subtitled Video Materials for Foreign Language Instruction.” In The Didactics of Audiovisual Translation, edited by Jorge Díaz Cintas, 201-214. Amsterdam/Philadelphia: John Benjamins.

Díaz Cintas, Jorge and Aline Remael. 2007. Audiovisual Translation: Subtitling. Manchester: St. Jerome.

d’Ydewalle, Géry and Wim De Bruycker. 2003. “Reading Native and Foreign Language Television Subtitles in Children and Adults.” In The Mind’s Eye: Cognitive and Applied Aspects of Eye Movement Research, edited by J. Hyönä, R. Radach and H. Deubel, 444-461. Amsterdam: Elsevier.

d’Ydewalle, Géry and Wim De Bruycker. 2007. “Eye Movements of Children and Adults while Reading Television Subtitles.” European Psychologist 12:196–205.

d’Ydewalle, Géry and Ingrid Gielen. 1992. “Attention Allocation with Overlapping Sound, Image, and Text.” In Eye Movements and Visual Cognition: Scene Perception and Reading, edited by Keith Rayner, 415–427. New York: Springer-Verlag.

d’Ydewalle, Géry, Johan Van Rensbergen, and Joris Pollet. 1987. “Reading a Message When the Same Message Is Available Auditorily in Another Language: The Case of Subtitling.” In Eye Movements: From Physiology to Cognition, edited by J.K. O’Regan and A. Lévy-Schoen, 313-321. Amsterdam: Elsevier Science Publishers B.V. (North-Holland).

Ghia, Elisa. 2012. “The Impact of Translation Strategies on Subtitle Reading.” In Eye Tracking in Audiovisual Translation, edited by Elisa Perego, 155–182. Roma: Aracne Editrice.

Gottlieb, Henrik. 1998. Subtitling. In Routledge Encyclopaedia of Translation Studies, edited by Mona Baker, 244-248. London & New York: Routledge.

Hershler, Orit and Shaul Hochstein. 2005. At first sight: a high-level pop out effect for faces. Vision Research, 45, 1707–1724.

Holmqvist, Kenneth, Marcus Nyström, Richard Andersson, Richard Dewhurst, Halszka Jarodzka, and Joost van de Weijer. 2011. Eye Tracking: A Comprehensive Guide to Methods and Measures. Oxford: Oxford University Press.

Irwin, David E. 2004. Fixation location and fixation duration as indices of cognitive processing. In J.M. Henderson & F. Ferreira (Eds.), The interface of language, vision, and action: Eye movements and the visual world, 105-133. New York, NY: Psychology Press.

ITC Guidance on Standards for Subtitling. Online at:

Jensema, Carl. 2000. “Eye Movement Patterns of Captioned TV Viewers.” American Annals of the Deaf 145(3):275-285.

Karamitroglou, Fotios. 1998. A Proposed Set of Subtitling Standards in Europe. Translation Journal 2(2).

Krejtz, Izabela, Agnieszka Szarkowska, and Krzysztof Krejtz. 2013. “The Effects of Shot Changes on Eye Movements in Subtitling.” Journal of Eye Movement Research 6 (5): 1–12.

Kruger, Jan-Louis and Faans Steyn. 2014. “Subtitles and Eye Tracking: Reading and Performance.” Reading Research Quarterly 49 (1): 105–120.

Kruger, Jan-Louis, Esté Hefer, and Gordon Matthew. 2013a. “Measuring the Impact of Subtitles on Cognitive Load: Eye Tracking and Dynamic Audiovisual Texts.” Proceedings of Eye Tracking South Africa 29-31 August 2013, Cape Town.

Kruger, Jan-Louis, Esté Hefer, and Gordon Matthew. 2013b. The impact of subtitles on academic performance at tertiary level. Paper presented at the Linguistics Society of Southern Africa annual conference in Stellenbosch, June, 2013.

Kruger, Jan-Louis. 2013. “Subtitles in the Classroom: Balancing the Benefits of Dual Coding with the Cost of Increased Cognitive Load.” Journal for Language Teaching 47(1):29–53.

Kruger, Jan-Louis, Esté Hefer, and Gordon Matthew. 2014. “Attention Distribution and Cognitive Load in a Subtitled Academic Lecture: L1 vs. L2.” Journal of Eye Movement Research 7(5):4, 1-15.

Langton, Stephen R.H., Anna S. Law, Burton, A. Mike and Stefan R. Schweinberger. 2008. Attention capture by faces. Cognition, 107:330-342.

Pavakanun, Ubowanna. 1992. Incidental acquisition of foreign language through subtitled television programs as a function of similarity with native language and as a function of presentation mode. Unpublished doctoral thesis, Leuven, Belgium, University of Leuven.

Perego, Elisa, Fabio Del Missier, Marco Porta and Mauro Mosconi. 2010. “The Cognitive Effectiveness of Subtitle Processing.” Media Psychology 13(3):243–272.

Rajendran, Dhevi, Andrew Duchowski, Pilar Orero, Juan Martínez, and Pablo Romero-Fresco. 2013. “Effects of Text Chunking on Subtitling: A Quantitative and Qualitative Examination.” Perspectives: Studies in Translatology 21(1):5–31.

Rayner, Keith. 1984. “Visual Selection in Reading, Picture Perception, and Visual Search: A Tutorial Review.” In Attention and Performance, vol. 10, edited by H. Bouma and D. Bouwhuis. Hillsdale, NJ: Erlbaum.

Rayner, Keith. 1998. “Eye Movements in Reading and Information Processing: Twenty Years of Research.” Psychological Bulletin 124:372-422.

Robson, Gary D. 2004. The closed captioning handbook. Amsterdam: Elsevier.

Romero-Fresco, Pablo. In press. The Reception of Subtitles for the Deaf and Hard of Hearing in Europe. Peter Lang.

Smith, Tim, and John M. Henderson. 2008. Edit Blindness: The relationship between attention and global change blindness in dynamic scenes. Journal of Eye Movement Research 2(2), 6:1-17.

Specker, Elizabeth, A. 2008. L1/L2 Eye Movement Reading of Closed Captioning: A Multimodal Analysis of Multimodal Use. Unpublished PhD thesis. University of Arizona.

Szarkowska, Agnieszka, Izabela Krejtz, and Łukasz Dutka. Forthcoming. “The Effects of Subtitle Presentation Rate, Text Editing and Type of Subtitling on the Comprehension and Reading Patterns of Subtitles among Deaf, Hard of Hearing and Hearing Viewers.” To appear in Across Languages and Cultures 2016, vol. 2.

Szarkowska, Agnieszka, Izabela Krejtz, Zuzanna Kłyszejko, and Anna Wieczorek. 2011. “Verbatim, Standard, or Edited? Reading Patterns of Different Captioning Styles among Deaf, Hard of Hearing, and Hearing Viewers.” American Annals of the Deaf 156(4):363-378.

Vanderplank, Robert. 1988 “The value of teletext sub-titles in language learning”. ELT Journal 42(4):272-81.

Velichkovsky, Boris M., Markus Joos, Jens R. Helmert, and Sebastian Pannasch. 2005. “Two Visual Systems and Their Eye Movements: Evidence from Static and Dynamic Scene Perception.” In CogSci 2005: Proceedings of the XXVII Conference of the Cognitive Science Society, 2283-2288.

Winke, Paula, Susan Gass, and Tetyana Syderenko. 2013. “Factors Influencing the Use of Captions by Foreign Language Learners: An Eye Tracking Study.” The Modern Language Journal 97 (1):254–275.

Yarbus, Alfred L. 1967. Eye movements and vision. New York, NY: Plenum Press.



[1] Values are taken from a number of sources and vary depending on a number of factors (see Rayner, 1984)



Jan-Louis Kruger is director of translation and interpreting in the Department of Linguistics at Macquarie University in Sydney, Australia.  He holds a PhD in English on the translation of narrative point of view. His main research interests include studies on the reception and cognitive processing of audiovisual translation products including aspects such as cognitive load, comprehension, attention allocation, and psychological immersion.

Agnieszka Szarkowska, PhD, is Assistant Professor in the Institute of Applied Linguistics at the University of Warsaw, Poland. She is the founder and head of the Audiovisual Translation Lab, a research group working on media accessibility. Her main research interests lies in audiovisual translation, especially subtitling for the deaf and the hard of hearing and audio description.

Izabela Krejtz, PhD, is Assistant Professor at University of Social Sciences and Humanities, Warsaw. She is a co-founder of Eyetracking Research Center at USSH. Her research interests include neurocognitive and educational psychology. Her applied work focuses on pro-positive trainings of attention control, eye tracking studies in perception of audiovisual material and emotions regulation.