From Subtitles to SMS: Eye Tracking, Texting and Sherlock – Tessa Dwyer


As we progress into the digital age, text is experiencing a resurgence and reshaping as blogging, tweeting and phone messaging establish new textual forms and frameworks. At the same time, an intrusive layer of text, obviously added in post, has started to feature on mainstream screen media – from the running subtitles of TV news broadcasts to the creative portrayals of mobile phone texting on film and TV dramas. In this paper, I examine the free-floating text used in BBC series Sherlock (2010–). While commentators laud this series for the novel way it integrates text into its narrative, aesthetic and characterisation, it requires eye tracking to unpack the cognitive implications involved. Through recourse to eye tracking data on image and textual processing, I revisit distinctions between reading and viewing, attraction and distraction, while addressing a range of issues relating to eye bias, media access and multimodal redundancy effects.

Figure 1: Press conference in ‘A Study in Pink’, Sherlock (2010), Episode 1, Season 1.


BBC’s Sherlock (2010–) has received considerable acclaim for its creative deployment of text to convey thought processes and, most notably, to depict mobile phone messaging. Receiving high-profile write-ups in The Wall Street Journal (Dodes, 2013) and Wired UK, this innovative representational strategy has been hailed as an incisive reflection of our current “transhuman” reality and “a core element of the series’ identity” (McMillan, 2014).[1] In the following discussion, I deploy eye tracking data to develop an alternate perspective on this phenomenon. While Sherlock’s on-screen text directly engages with the emerging modalities of digital and online technologies, it also borrows from more conventional textual tools like subtitling and captioning or SDH (subtitling for the deaf and hard-of-hearing). Most emphatically, the presence of floating text in Sherlock challenges the presumption that screen media is made to be viewed, not read. To explore this challenge in detail, I bring Sherlock’s inventive titling into contact with eye tracking research on subtitle processing, using insights from audiovisual translation (AVT) studies to investigate the complexities involved in processing dynamic text on moving-image screens. Bridging screen and translation studies via eye tracking, I consider recent on-screen text developments in relation to issues of media access and linguistic diversity, noting the gaps or blind spots that regularly infiltrate research frameworks. Discussion focuses on ‘A Study in Pink’ – the first episode of Sherlock’s initial season – which producer Sue Vertue explains was actually “written and shot last, and so could make the best use of onscreen text as additional script and plot points” (qtd in McMillan, 2014).

Texting Sherlock

Figure 2: Watson reads a text message in ‘A Study in Pink’, Sherlock (2010), Episode 1, Season 1.

The phenomenon under investigation in this article is by no means easy to define. Already it has inspired neologisms, word mashes and acronyms including TELOP (television optical projection), ‘impact captioning’ (Sasamoto, 2014), ‘decotitles’ (Kofoed, 2011), ‘beyond screen text messaging’ (Zhang, 2014) and ‘authorial titling’ (Pérez González, 2012). While slight differences in meaning separate such terms from one another, the on-screen text in Sherlock fits all. Hence, in this discussion, I alternate between them and often default to more general terms like ‘titling’ and ‘on-screen text’ for their wide applicability across viewing devices and subject matter. This approach preserves the terminological ambiguity that attaches to this phenomenon instead of seeking to solve it, finding it symptomatic of the rapid rate of technological development with which it engages. Whatever term is decided upon today could well be obsolete tomorrow. Additionally, as Rick Altman (2004: 16) notes in his ‘crisis historiography’ of silent and early sound film, the “apparently innocuous process of naming is actually one of culture’s most powerful forms of appropriation.” He argues that in the context of new technologies and the representational codes they engender, terminological variance and confusion signal an identity crisis “reflected in every aspect of the new technology’s socially defined existence” (19).

According to the write-ups, phone messaging is the hero of BBC’s updated and rebooted Sherlock adaptation. Almost all the press garnered around Sherlock’s on-screen text links this strategy to mobile phone ‘texting’ or SMS (short messaging service). Reporting on “the storytelling challenges of a world filled with unglamorous smartphones, texting and social media”, The Wall Street Journal’s Rachel Dodes (2013) credits Sherlock with solving this dilemma and establishing a new convention for depicting texting on the big screen, creatively capturing “the real world’s digital transformation of everyday life.” For Mariel Calloway (2013), “Sherlock is honest about the role of technology and social media in daily life and daily thought… the seamless way that text messages and internet searches integrate into our lives.” Wired’s Graeme McMillan (2014) ups the ante, naming Sherlock a “new take” on “television drama as a whole” due precisely to its on-screen texting technique that sets it apart from other “tech-savvy shows out there”. McMillan continues that “as with so many aspects of Sherlock, there’s an element of misdirection going on here, with the fun, eye-catching slickness of the visualization distracting from a deeper commentary the show is making about its characters relationship with technology – and, by extension, our own relationship with it, as well.”

As this flurry of media attention makes clear, praise for Sherlock’s on-screen text or texting firmly anchors this strategy to technology and its newly evolving forms, most notably the iPhone or smartphone. Appearing consistently throughout the series’ three seasons to date, on-screen text in Sherlock occurs in a plain, uniform white sans-serif font that appears unadorned over the screen image, obviously added during post-production. This text is superimposed, pure and simple, relying on neither text bubbles nor coloured boxes nor sender IDs to formally separate it from the rest of the image area. As Michele Tepper (2011) eloquently notes, by utilising text in this way, Sherlock “is capturing the viewer’s screen as part of the narrative itself”:

It’s a remarkably elegant solution from director Paul McGuigan. And it works because we, the viewing audience, have been trained to understand it by the last several years of service-driven, multi-platform, multi-screen applications. Last week’s iCloud announcement is just the latest iteration of what can happen when your data is in the cloud and can be accessed by a wide range of smart-enough devices. Your VOIP phone can show caller ID on your TV; your iPod can talk to both your car and your sneakers; Twitter is equally accessible via SMS or a desktop application. It doesn’t matter where or what the screen is, as long as it’s connected to a network device. … In this technological environment, the visual conceit that Sherlock’s text message could migrate from John Watson’s screen to ours makes complete and utter sense.

Unlike on-screen text in Glee (Fox, 2009–), for instance (see Fig. 3), that is used only occasionally in episodes like ‘Feud’ (Season 4, Ep 16, March 14, 2013), Sherlock flaunts its on-screen text as signature. Its consistently interesting textual play helps to give the series cohesion. Yet, just as it aids in characterisation, helps to progress the narrative, and binds the series as a whole, it also, necessarily, remains at somewhat of a remove, as an overtly post-production effect.

Figure 3: Ryder chats online in ‘Feud’, Glee (2013), Episode 16, Season 4.

While Tepper (2011) explains how Sherlock’s “disembodied” (Banks, 2012) texting ‘makes sense’ in the age of cross-platform devices and online clouds, this argument falters when the on-screen text in question is less overtly technological. The extradiegetic nature of this on-screen text – so obviously a ‘post’ effect – is brought to the fore when it is used to render thoughts and emotions rather than technological interfacing. In ‘A Study in Pink’, a large proportion of the text that pops up intermittently on-screen functions to represent Sherlock’s interiority, not his Internet prowess. In concert with camera angles and “microscopic close-ups”, it elucidates Sherlock’s forensic “mind’s eye” (Redmond, Sita and Vincs, this issue), highlighting clues and literally spelling out their significance (see Figs. 4 and 5). The fact that these human-coded moments of titling have received far less attention in the press than those that more directly index new technologies is fascinating in itself, revealing the degree to which praise for Sherlock’s on-screen text is invested in ideas of newness and technological innovation – underlined by the predilection for neologisms.

Figure 4: Sherlock examines the pink lady’s ring in ‘A Study in Pink’, Sherlock (2010), Episode 1, Season 1.

Figure 5: Sherlock examines the pink lady’s ring in ‘A Study in Pink’, Sherlock (2010), Episode 1, Season 1.

Of course, even when not attached to smartphones or data retrieval, Sherlock’s deployment of on-screen text remains fresh, creative and playful and still signals perceptual shifts resulting from technological transformation. Even when representing Sherlock’s thoughts, text flashes on screen manage to recall the excesses of the digital, when email, Facebook and Twitter ensconce us in streams of endlessly circulating words, and textual pop-ups are ubiquitous. Nevertheless, the blinkered way in which Sherlock’s on-screen text is repeatedly framed as, above all, a means of representing mobile phone texting functions to conceal some of its links to older, more conventional forms of titling and textual intervention, from silent-era intertitles to expository titles to subtitles. By relentlessly emphasising its newness, much discussion of Sherlock’s on-screen text overlooks links to a host of related past and present practices. Moreover, Sherlock’s textual play actually invites a rethinking of these older, ongoing text-on-screen devices.

Reading, Watching, Listening

As Szarkowska and Kruger (this issue) explain, research into subtitle processing builds upon earlier eye tracking studies on the reading of static, printed text. They proceed to detail differences between subtitle and ‘regular’ reading, in relation to factors like presentation speed, information redundancy, and sensory competition between different multimodal channels. Here, I focus on differences between saccadic or scanning movements and fixations, in order to compare data across the screen and translation fields. During ‘regular’ reading (of static texts) average saccades last 20 to 50 milliseconds (ms) while fixations range between 100 and 500 ms, averaging 200 to 300 ms (Rayner, 1998). Referencing pioneering studies into subtitle processing by Géry d’Ydewalle and associates, Szarkowska et al. (2013: 155) note that “when reading film subtitles, as opposed to print, viewers tend to make more regressions” and fixations tend to be shorter. Regressions occur when the eye returns to material that has already been read, and Rayner (1998: 393) finds that slower readers (of static text) make more regressions than faster readers. A study by d’Ydewalle and de Bruycker (2007: 202) found “the percentage of regressions in reading subtitles was globally, among children and adults, much higher than in normal text reading.” They also report that mean fixation durations in the subtitles were shorter, at 178 ms (for adults), and explain that subtitle regressions can be partly attributed to the “considerable information redundancy” that occurs when “[s]ubtitle, soundtrack (including the voice and additional information such as intonation, background noise, etc.), and image all provide partially overlapping information, eliciting back and forth shifts with the image and more regressive eye-movements” (202).

What happens to saccades and fixations when image processing is brought into the mix? When looking at static images, average fixations last 330 ms (Rayner, 1998). This figure is slightly longer than average fixations during regular reading and longer again than average subtitle fixations. Szarkowska and Kruger (this issue) note that “reading requires many successive fixations to extract information whereas looking at a scene requires fewer, but longer fixations” that tend to be more exploratory or ambient in nature, taking in a greater area of focus. In relation to moving images, Smith (2013: 168) finds that viewers take in roughly 3.8% of the total screen area during an average length shot. Peripheral processing is at play but “is mostly reserved for selecting future saccade targets, tracking moving targets, and extracting gist about scene category, layout and vague object information”. In thinking about these differences in regular reading behaviour, screen viewing, and subtitle processing, it is noticeable that with subtitles, distinctions between fixations and saccades are less clear-cut. While saccades last between 20 and 50 ms, Smith (2013: 169) notes that the smallest amount of time taken to perform a saccadic eye movement (taking into account saccadic reaction time) is 100 to 130 ms. Recalling d’Ydewalle and de Bruycker’s (2007: 202) finding that fixations during subtitle processing last around 178 ms, it would seem that subtitle conditions blur the boundaries somewhat between saccades and fixations, scanning and reading.
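The comparison of durations in this paragraph can be laid out as a small worked example. The sketch below simply encodes the figures quoted above (Rayner, 1998; Smith, 2013; d’Ydewalle and de Bruycker, 2007) and checks where the 178 ms subtitle fixation falls relative to them. The constant and function names are illustrative assumptions, not part of any of the cited studies.

```python
# Durations in milliseconds, as quoted in the text above.
READING_FIXATION_RANGE = (100, 500)       # regular reading (Rayner, 1998)
READING_FIXATION_TYPICAL = (200, 300)     # typical average for regular reading
SACCADE_WITH_REACTION_TIME = (100, 130)   # minimum saccade time (Smith, 2013: 169)
STATIC_IMAGE_FIXATION = 330               # average fixation on static images
SUBTITLE_FIXATION_ADULT = 178             # d'Ydewalle and de Bruycker (2007: 202)


def within(value, bounds):
    """Return True if value lies inside the inclusive (low, high) bounds."""
    low, high = bounds
    return low <= value <= high


def describe_subtitle_fixation(duration_ms=SUBTITLE_FIXATION_ADULT):
    """Place a subtitle fixation duration against the quoted benchmarks."""
    return {
        "within_reading_fixation_range": within(duration_ms, READING_FIXATION_RANGE),
        "below_typical_reading_average": duration_ms < READING_FIXATION_TYPICAL[0],
        "ms_below_typical_reading": READING_FIXATION_TYPICAL[0] - duration_ms,
        "ms_above_minimum_saccade_time": duration_ms - SACCADE_WITH_REACTION_TIME[1],
        "shorter_than_static_image_fixation": duration_ms < STATIC_IMAGE_FIXATION,
    }


summary = describe_subtitle_fixation()
```

On these figures, the subtitle fixation sits inside the broad range for reading fixations but below the typical reading average, and only a few tens of milliseconds above the minimum time needed to execute a saccade, which is the sense in which the boundary appears blurred.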

Interestingly, studies have also shown that the processing of two-line subtitles involves more regular word-by-word reading than for one-liners (D’Ydewalle and de Bruycker, 2007: 199). D’Ydewalle and de Bruycker (2007: 199) report, for instance, that more words are skipped and more regressions occur for one-line subtitles than for two-line subtitles. Two-line subtitles result in a larger proportion of time being spent in the subtitle area, and occasion more back-and-forth shifts between the subtitles and the remaining image area (201). This finding suggests that the processing of one-line subtitles differs considerably from regular reading behaviour. D’Ydewalle and de Bruycker (2007: 202) surmise that the distinct way in which one-line subtitles are processed relates to a redundancy effect caused by the multimodal nature of screen media. Noting how one-line subtitles often convey short exclamations and outcries, they suggest that a “standard one-line subtitle generally does not provide much more information than what can already be extracted from the picture and the auditory message.” They conclude that one-line subtitles occasion “less reading” than two-line subtitles (202). Extrapolating further, I posit that the routine overlapping of information that occurs in subtitled screen media blurs lines between reading and watching. One-line subtitles are ‘read’ irregularly and partly blind – that is, they are regularly skipped and processed through saccadic eye movements rather than fixations.
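To make the two measures discussed here concrete, the following minimal sketch computes a regression rate and a word-skipping rate from the order in which words are fixated. It follows the definition of a regression given above (the eye returning to material already read). The function name and the two fixation sequences are hypothetical illustrations and are not data from d’Ydewalle and de Bruycker’s study.

```python
def reading_metrics(fixated_word_indices, word_count):
    """Summarise a fixation sequence over a subtitle.

    fixated_word_indices: 0-based word positions in the order they were fixated.
    word_count: total number of words in the subtitle.
    """
    pairs = zip(fixated_word_indices, fixated_word_indices[1:])
    # A regression is a move back to a word earlier than the one just fixated.
    regressions = sum(1 for previous, current in pairs if current < previous)
    transitions = max(len(fixated_word_indices) - 1, 0)
    # A skipped word is one that never receives a fixation.
    skipped = set(range(word_count)) - set(fixated_word_indices)
    return {
        "regression_rate": regressions / transitions if transitions else 0.0,
        "skip_rate": len(skipped) / word_count if word_count else 0.0,
    }


# Hypothetical one-line subtitle: five words, one regression, last word skipped.
one_line = reading_metrics([0, 2, 1, 3], word_count=5)

# Hypothetical two-line subtitle: ten words read almost word by word.
two_line = reading_metrics([0, 1, 2, 3, 4, 5, 4, 6, 7, 8, 9], word_count=10)
```

In this invented example the one-line sequence shows both a higher regression rate and a higher skip rate than the two-line sequence, mirroring the direction of the reported pattern without reproducing its actual values.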

This suggestion is supported by data on subtitle skipping. Szarkowska and Kruger (this issue) find that longer subtitles containing frequently used words are easier and quicker to process than shorter subtitles containing low-frequency words. Hence, they conclude that cognitive load relates more to word familiarity than quantity, something that is overlooked in many professional subtitling guidelines. This finding indicates that high-frequency words are processed ‘differently’ in subtitling than in static text, in a manner more akin to visual recognition or scanning than reading. Szarkowska and Kruger find that high-frequency words in subtitles are often skipped. Hence, as with one-line subtitles, high-frequency words are, to a degree, processed blind, possibly through shape recognition and mapping more than durational focus. In relation to other types of on-screen text, such as the short, free-floating type that characterises Sherlock, it seems entirely possible that this innovative mode of titling may just challenge distinctions between text and image processing. While commentators laud this series for the way it integrates on-screen text into its narrative, style and characterisation, eye tracking is required to unpack the cognitive implications of Sherlock’s text/image morph.

The Pink Lady

Figure 6: Letters scratched into the floor in ‘A Study in Pink’, Sherlock (2010), Episode 1, Season 1.

Sherlock producer Vertue refers to the pink lady scene in ‘A Study in Pink’ as particularly noteworthy for its “text all around the screen”, referring to it as the “best use” of on-screen text in the series (qtd in McMillan, 2014). In this scene, a dead woman dressed in pink lies face first on the floor of a derelict building into which she has painstakingly etched a word or series of letters (‘Rache’) with her fingernails. As Sherlock investigates the crime scene, forensics officer Anderson interrupts to explain that ‘Rache’ is the German word for ‘revenge’. The German-to-English translation pops up on screen (see Fig. 6), and this time Sherlock sees it too. This superimposed text, so obviously laid over the image, oversteps its surface positioning to enter Sherlock’s diegetic space, and we next view it backwards, from Sherlock’s point of view, not ours (see Fig. 7). After an exasperated eye roll that signals his disregard for Anderson, Sherlock dismisses this textual intervention and we watch it swirl into oblivion. Here, on-screen text is at once both inside and outside the narrative, diegetic and extra-diegetic, informative and affecting. In this way it self-reflexively draws attention to the show’s narrative framing, demonstrating its complexity as distinct diegetic levels merge.

Figure 7: Sherlock sees on-screen text in ‘A Study in Pink’, Sherlock (2010), Episode 1, Season 1.

For Carol O’Sullivan (2011), when on-screen text affords this type of play between the diegetic and extra-diegetic it functions as an “extreme anti-naturalistic device” (166) that she unpacks via Gérard Genette’s notion of narrative metalepsis (164). Detailing numerous examples of humorous, formally transgressive diegetic subtitles, such as those found in Annie Hall (Woody Allen, 1977) (Fig. 8), O’Sullivan points to their metatextual function, referring to them as “metasubtitles” (166) that implicitly comment on the limits and nature of subtitling itself. When Sherlock’s on-screen titles oscillate between character and viewer point-of-view shots, they too become metatextual, demonstrating, in Genette’s terms, “the importance of the boundary they tax their ingenuity to overstep in defiance of verisimilitude – a boundary that is precisely the narrating (or the performance) itself: a shifting but sacred frontier between two worlds, the world in which one tells, the world of which one tells” (qtd in O’Sullivan 2011: 165). Moreover, for O’Sullivan, “all subtitles are metatextual” (166), necessarily foregrounding their own act of mediation and interpretation. Specifically linking such ideas to Sherlock, Luis Pérez González (2012: 18) notes how “the series creators incorporate titles that draw attention to the material apparatus of filmic production”, thereby creating a complex alienation-attraction effect “that shapes audience engagement by commenting upon the diegetic action and disrupting conventional forms of semiotic representation, making viewers consciously work as co-creators of media content.”

Figure 8: Subtitled thoughts in the balcony scene, Annie Hall (1977).

Eye Bias

One finding from subtitle eye tracking research particularly pertinent to Sherlock is the notion that on-screen text causes eye bias. This was established in various studies conducted by d’Ydewalle and associates, which found that subtitle processing is largely automatic and obligatory. D’Ydewalle and de Bruycker (2007: 196) state:

Paying attention to the subtitle at its presentation onset is more or less obligatory and is unaffected by major contextual factors such as the availability of the soundtrack, knowledge of the foreign language in the soundtrack, and important episodic characteristics of actions in the movie: Switching attention from the visual image to “reading” the subtitles happens effortlessly and almost automatically (196).

This point is confirmed by Bisson et al. (2014: 399), who report that participants read subtitles even in ‘reversed’ conditions – that is, when subtitles are rendered in an unfamiliar language and the screen audio is fully comprehensible (in the viewers’ first language) (413). Again, in intralingual or same-language subtitling – when titles replicate the language spoken on screen – hearing audiences still divert to the subtitle area (413). These findings indicate that viewers track subtitles irrespective of language or accessibility requirements. In fact, the tracking of subtitles overrides function. As Bisson et al. (413) surmise, “the dynamic nature of the subtitles, i.e., the appearance and disappearance of the subtitles on the screen, coupled with the fact that the subtitles contained words was enough to generate reading behavior”.

Szarkowska and Kruger (this issue) reach a similar conclusion, explaining eye bias towards subtitles in terms of both bottom-up and top-down impulses. When subtitles or other forms of text flash up on screen, they effect a change to the scene that automatically pulls our eyes. The appearance and disappearance of text on screen is registered in terms of motion contrast, which according to Smith (2013: 176), is the “critical component predicting gaze behavior”, attaching to small movements as well as big. Additionally, we are drawn to words on screen because we identify them as a ready source of relevant information, as found in Batty et al. (forthcoming). Analysing a dialogue-free montage sequence from animated feature Up (Pete Docter, 2009), Batty et al. found that on-screen text in the form of signage replicates in miniature how ‘classical’ montage functions as a condensed form of storytelling aiming for enhanced communication and exposition. They suggest that montage offers a rhetorical amplification of an implicit intertitle, thereby alluding to the historical roots of text on screen while underlining its narrative as well as visual salience. One frame from the montage sequence focuses in close-up on a basket containing picnic items and airline tickets (see Fig. 9). Eye tracking tests conducted on twelve participants indicate a high degree of attentional synchrony in relation to the text elements of the airline ticket on which Ellie’s name is printed. Here, text provides a highly expedient visual clue as to the narrative significance of the scene and viewers are drawn to it precisely for its intertitle-like, expository function, highlighting the top-down impulse also at play in the eye bias caused by on-screen text.

Figure 9: Heat map showing collective gaze weightings during the montage sequence in Up (2009).

In this image from Up, printed text appears in the centre of the frame and, as Smith (2013: 178) elucidates, eyes are instinctively drawn towards frame centre, a finding backed up by much subtitle research (see Szarkowska and Kruger, this issue). However, eye tracking results on Sherlock conducted by Redmond, Sita and Vincs (this issue) indicate that viewers also scan static text when it is not in the centre of the frame. In an establishing shot of 221B Baker Street from the first episode of Sherlock’s second season, ‘A Scandal in Belgravia’, viewers track static text that borders the frame across its top and right-hand sides, again searching for information (see Fig. 10). Hence, the eye-pull exerted by text is noticeable even in the absence of movement, contrast and central framing. In part, viewers are attracted to text simply because it is text – identified as an efficient communication mode that facilitates speedy comprehension (see Lavaur, 2011: 457).

Figure 10: Single viewer gaze path for ‘A Scandal in Belgravia’, Sherlock (2012), Episode 1, Season 2.


What do these eye tracking results across screen and translation studies tell us about Sherlock’s innovative use of on-screen text and texting? Based on the notion that text on screen draws the eye in at least dual ways, due to both its dynamic/contrastive nature and its communicative expediency, we can surmise that for Sherlock viewers, on-screen text is highly visible and more than likely to be in that 3.8% of the screen on which they will focus at any one point in time (see Smith, 2013: 168). The marked eye bias caused by text on screen is further accentuated in Sherlock by the freshness of its textual flashes, especially for English-speaking audiences given the language hierarchies of global screen media (see Acland 2012, UNESCO 2013). The small percentage of foreign-language media imported into most English-speaking markets tends to result in a lack of familiarity with subtitling beyond niche audience segments. For those unfamiliar with subtitling or captioning, on-screen text appears particularly novel. Additionally, as explored, floating TELOPs in Sherlock attract attention due to the complex functions they fulfil, providing narrative and character clues as well as textual and stylistic cohesion. As Tepper (2011) points out, in the first episode of the series, viewers are introduced to Sherlock’s character via text, before seeing him on screen. “When he texts the word ‘Wrong!’ to DI Lestrade and all the reporters at Lestrade’s press conference,” notes Tepper, “the technological savvy and the imperiousness of tone tell you most of what you need to know about the character.”

There seems no doubt that on-screen text in Sherlock attracts eye movement, and that it therefore distracts from other parts of the image. One question then that immediately presents itself is why Sherlock’s textual distractions are tolerated – even celebrated – to a far greater extent than other, more conventional or routine forms of titling like subtitles and captions. While Sherlock’s on-screen text is praised as innovative and incisive, interlingual subtitling and SDH are criticised by detractors for the way in which they supposedly force viewers to read rather than watch, effectively transforming film into “a kind of high-class comic book with sound effects” (Canby, 1983).[2] Certainly, differences in scale affect such attitudes and the quantitative variance between post-subtitles (produced for distribution only) and authorial or diegetic titling (as seen in Sherlock) is pronounced.[3] However, eye tracking research on subtitle processing indicates that, on the whole, viewers easily accommodate the increased cognitive load it presents. Although attentional splitting occurs, leading to an increase in back-and-forth shifts between the subtitles and the rest of the image area (Szarkowska and Kruger, this issue), viewers acclimatise by making shorter fixations than in regular reading and by skipping high-frequency words and subtitles while still managing to register meaning (see d’Ydewalle and de Bruycker, 2007: 199). In this way, subtitle processing differs in many respects from the reading of static text, and approximates techniques of visual scanning. Bearing these findings in mind, I propose it is more accurate to see subtitling as transforming reading into viewing and text into image, rather than vice versa.

Situating Sherlock in relation to a range of related TELOP practices across diverse TV genres (such as game shows, panel shows, news broadcasting and dramas), Ryoko Sasamoto (2014: 7) notes that the additional processing effort caused by on-screen text is offset by its editorial function.[4] TELOPs are often deployed by TV producers to guide interpretation and ensure comprehension by selecting and highlighting information deemed most relevant. This suggestion is backed up by research by Rei Matsukawa et al. (2009), which found that the information redundancy effect caused by TELOPs facilitates understanding of TV news. For Sasamoto (2014: 7), ‘impact captioning’ highlights salient information in much the same way as voice intonation or contrastive stress. It acts as a “written prop on screen” enabling “TV producers to achieve their communicative aims… in a highly economical manner” (8). Focusing on Sherlock specifically, Sasamoto suggests that its captioning provides “a route for viewers into complex narratives” (9). Moreover, as Szarkowska and Kruger (this issue) note, in static reading conditions, “longer fixations typically reflect higher cognitive load.” Consequently, the shorter fixations that characterise subtitle viewing support the contention that on-screen text processing is eased by its expedient, editorial function and by redundancy effects resulting from its multimodality.

Switched On

Another way in which Sherlock’s text and titling innovations extend beyond mobile phone usage was exemplified in July 2013 by a promotional campaign that promised viewers a ‘sneak peek’ at a yet-to-be-released episode title, requiring them to find and piece together a series of clues. In true Sherlockian style, the clues were well hidden, only visible to viewers if they switched on closed-captioning or SDH available for deaf and hard-of-hearing audiences. With this device turned on, viewers encountered intralingual captioning along the bottom of their screen and, additionally, individually boxed letters that appeared top left (see Figs. 11 and 12). Viewers needed to gather all these single letter clues in order to deduce the episode title: ‘His Last Vow’. According to the ‘I Heart Subtitles’ blog (July 16, 2013), in doing so, Sherlock once again displayed its ability to “think outside the box and consider all…options”. It also cemented its commitment to on-screen text in various guises, and effectively gave voice to an audience segment typically disregarded in screen commentary and analysis. Through this highly unusual, cryptic campaign, Sherlock alerted viewers to more overtly functional forms of titling, and intimated points of connection between language, textual intervention and access.

Figure 11: Boxed letter clues (top left of frame) that appeared when closed captioning was switched on, during a re-run of ‘A Scandal in Belgravia’, Sherlock (2012), Episode 1, Season 2.

Figure 12: Boxed letter clues (top left of frame) that appeared when closed captioning was switched on, during a re-run of ‘A Scandal in Belgravia’, Sherlock (2012), Episode 1, Season 2.


On-screen text invites a rethinking of the visual, expanding its borders and blurring its definitional clarity. Eye tracking research demonstrates that moving text on screens is processed differently to static text, affected by a range of factors issuing from its multimodal complexity. Sherlock subtly signals such issues through its playful, irreverent deployment of text, which enables viewers to directly access Sherlock’s thoughts and understand his reasoning, while also distancing them, asking them to marvel at his ‘millennial’ technological prowess (Stein and Busse, 2012: 11) while remaining self-consciously aware of his complex narrative framing as it flips inside out, inviting audiences to watch themselves watching. Such diegetic transgression is yet to be mapped through eye tracking, intimating a profitable direction for future studies. To date, data on text and image processing demonstrates how on-screen text attracts eye movement and hence, it can be inferred that it distracts from other parts of the image area. Yet, despite rendering more of the image effectively ‘invisible’, text in the form of TELOPs is increasingly prevalent in news broadcasts, current affairs panel shows (when audience text messages are displayed) and, most notably, in Asian TV genres where TELOPs are now a “standard editorial prop” featured in many dramas and game shows (Sasamoto, 2014: 1). In order to take up the challenge presented by such emerging modes of screen address, research needs to move beyond surface assessments of the attraction/distraction nexus. It is the very attraction to TELOP distraction that Sherlock – via eye tracking – brings to the fore.



Acland, Charles. 2012. “From International Blockbusters to National Hits: Analysis of the 2010 UIS Survey on Feature Film Statistics.” UIS Information Bulletin 8: 1-24. UNESCO Institute for Statistics.

Altman, Rick. 2004. Silent Film Sound. New York: Columbia University Press.

Banks, David. 2012. “Sherlock: A Perspective on Technology and Story Telling.” Cyborgology, January 25. Accessed October 9, 2014.

Batty, Craig, Adrian Dyer, Claire Perkins and Jodi Sita (forthcoming). “Seeing Animated Worlds: Eye Tracking and the Spectator’s Experience of Narrative.” In Making Sense of Cinema: Empirical Studies into Film Spectators and Spectatorship, edited by Carrie Lynn D. Reinhard and Christopher J. Olson. London and New York: Bloomsbury.

Bennet, Alannah. 2014. “From Sherlock to House of Cards: Who’s Figured Out How to Translate Texting to Film.” Bustle, August 18. Entertainment. Accessed October 9.

Biedenharn, Isabella. 2014. “A Brief Visual History of On-Screen Text Messages in Movies and TV.” Flavorwire, April 24. Accessed October 13.

Bisson, Marie-Josée, Walter J. B. Van Heuven, Kathy Conklin and Richard J. Tunney. 2014. “Processing of native and foreign language subtitles in films: An eye tracking study.” Applied Psycholinguistics 35: 399–418. Accessed October 13, 2014. doi: 10.1017/S0142716412000434.

Calloway, Mariel. 2013. “The Game is On(line): BBC’s ‘Sherlock’ in the Age of Social Media.” Mariel Calloway, March 8. Accessed October 14, 2014.

Canby, Vincent. 1983. “A Rebel Lion Breaks Out.” New York Times, March 27, 21.

Dodes, Rachel. 2013. “From Talkies to Texties.” Wall Street Journal, April 4, Arts and Entertainment Section. Accessed October 13, 2014.

d’Ydewalle, Géry and Wim De Bruycker. 2007. “Eye movements of children and adults while reading television subtitles.” European Psychologist 12 (3): 196-205.

Kofoed, D. T. 2011. “Decotitles, the Animated Discourse of Fox’s Recent Anglophonic Internationalism.” Reconstruction 11 (1). Accessed October 5, 2012.

Lavaur, Jean-Marc and Dominic Bairstow. 2011. “Languages on the screen: Is film comprehension related to the viewers’ fluency level and to the language in the subtitles?” International Journal of Psychology 46 (6): 455-462. doi: 10.1080/00207594.2011.565343.

McMillan, Graeme. 2014. “Sherlock’s Text Messages Reveal Our Transhumanism.” Wired UK, February 3. Accessed October 14.

Matsukawa, Rei, Yosuke Miyata and Shuichi Ueda. 2009. “Information Redundancy Effect on Watching TV News: Analysis of Eye Tracking Data and Examination of the Contents.” Library and Information Science 62: 193-205.

O’Sullivan, Carol. 2011. Translating Popular Film. Basingstoke and New York: Palgrave Macmillan.

Pérez González, Luis. 2013. “Co-Creational Subtitling in the Digital Media: Transformative and Authorial Practices.” International Journal of Cultural Studies 16 (1): 3-21. Accessed September 25, 2014. doi: 10.1177/1367877912459145.

Rayner, K. 1998. “Eye Movements in Reading and Information Processing: 20 Years of Research.” Psychological Bulletin 124: 372-422.

Redmond, Sean, Jodi Sita and Kim Vincs. 2015. “Our Sherlockian Eyes: The Surveillance of Vision.” Refractory: a Journal of Entertainment Media, 25.

Romero-Fresco, Pablo. 2013. “Accessible filmmaking: Joining the dots between audiovisual translation, accessibility and filmmaking.” JoSTrans: The Journal of Specialised Translation 20: 201-23. Accessed September 20, 2014.

Sasamoto, Ryoko. 2014. “Impact caption as a highlighting device: Attempts at viewer manipulation on TV.” Discourse, Context and Media 6: 1-10. Accessed September 18 (Article in Press). doi: 10.1016/j.dcm.2014.03.003.

Schrodt, Paul. 2013. “This is How to Shoot Text Messaging.” Esquire, February 4. The Culture Blog. Accessed October 13, 2014.

Smith, Tim J. 2013. “Watching You Watch Movies: Using Eye Tracking to Inform Cognitive Film Theory.” In Psychocinematics: Exploring Cognition at the Movies, edited by Arthur P. Shimamura, 165-91. Oxford and New York: Oxford University Press. Accessed October 7, 2014. doi:

Stein, Louisa Ellen and Kristina Busse. 2012. “Introduction: The Literary, Televisual and Digital Adventures of the Beloved Detective.” In Sherlock and Transmedia Fandom: Essays on the BBC Series, edited by Louisa Ellen Stein and Kristina Busse, 9-24. Jefferson: McFarland and Company.

Szarkowska, Agnieszka et al. 2013. “Harnessing the Potential of Eye-Tracking for Media Accessibility.” In Translation Studies and Eye-Tracking Analysis, edited by Sambor Grucza, Monika Płużyczka and Justyna Zając, 153-83. Frankfurt am Main: Peter Lang.

Szarkowska, Agnieszka and Jan-Louis Kruger. 2015. “Subtitles on the Moving Image: An Overview of Eye Tracking Studies.” Refractory: a Journal of Entertainment Media, 25.

Tepper, Michele. 2011. “The Case of the Travelling Text Message.” Interactions Everywhere, June 14. Accessed October 14, 2014.

UNESCO. 2013. “Feature Film Diversity.” UIS Fact Sheet 24, May. Accessed October 3, 2014.

Zhang, Sarah. 2014. “How Hollywood Figured Out A Way To Make Texting In Movies Look Less Dumb.” Gizmodo, August 18. Accessed August 19.

Zhou, Tony. 2014. “A Brief Look at Texting and the Internet in Film.” Video Essay, Every Frame a Painting, August 15. Accessed August 19.






[1] While some commentators point out that Sherlock was by no means the first to depict text messaging in this way – as floating text on screen – it is this series more than any other that has brought this phenomenon into the limelight. Other notable uses of on-screen text to depict mobile phone messaging occur in films All About Lily Chou-Chou (Iwai, 2001), Disconnect (Rubin, 2013), The Fault in Our Stars (Boone, 2014), LOL (Azuelos, 2012), Non-Stop (Collet-Serra, 2014), Wall Street: Money Never Sleeps (Stone, 2010), and in TV series Glee (Fox, 2009–), House of Cards (Netflix, 2013–), Hollyoaks (Channel 4, 1995–), Married Single Other (ITV, 2010) and Slide (Fox8, 2011). For discussion of some ‘early adopters’, see Biedenharn (2014).



[2] Notably, in this New York Times piece, Canby (1983) actually defends subtitling against this charge, and advocates for subtitling over dubbing.

[3] On distinctions between post-subtitling and pre-subtitling (including diegetic subtitling), see O’Sullivan (2011).

[4] According to Sasamoto (2014: 1), “the use of OCT [Open Caption Telop] as an aid for enhanced viewing experience originated in Japan in 1990.”



Dr Tessa Dwyer teaches Screen Studies at the University of Melbourne, specialising in language politics and issues of screen translation. Her publications have appeared in journals such as The Velvet Light Trap, The Translator and The South Atlantic Quarterly and in a range of anthologies including B is for Bad Cinema (2014), Words, Images and Performances in Translation (2012) and the forthcoming Locating the Voice in Film (2016), Contemporary Publics (2016) and the Routledge Handbook of Audiovisual Translation (2017). In 2008, she co-edited a special issue of Refractory on split screens. She is a member of the ETMI research group and is currently writing a book on error and screen translation.