Our Sherlockian Eyes: the Surveillance of Vision – Sean Redmond, Jodi Sita and Kim Vincs


For this inter-disciplinary article, we undertook a pilot case study that eye-tracked the ‘Holmes Saves Mrs. Hudson’ sequence from the episode A Scandal in Belgravia (Sherlock, BBC, 2012). This small-scale empirical study involved a total of 13 participants (3 males and 10 females; mean age 27 years), comprising a mixture of academics and undergraduate students at La Trobe University in Melbourne, Australia. The article examines its findings through a range of threaded frames – neuroscience, forensics, surveillance, haptics, memory, performance-movement, and relationality – and uniquely draws upon the interests of the authors to set the examination in context. The article is both a reading of Sherlock and a dialogue between its authors. We discover that the codes and conventions of Sherlock have a direct impact on where viewers look but we also discover eyes emerging in the periphery of the frame, and we account for these ways of seeing in different ways.

My Sherlockian Eyes

Sean Redmond

I have always been fascinated, perhaps even obsessed, with my eyes. I have often felt them looking into things, as if they had their own embodied consciousness that I was entirely, simultaneously, conscious of. It was as if we, my eyes and I, saw the world separately and together, possessing a double vision, one set within the meaty windows of my sockets, and the other looking outside, grasping the world with a replete hapticity, sending shivers across my pupils and retinas as they did so.

I have found myself trying to catch my eyes out, to second guess their movements, their sightlines, and their interests. I must be a sight for sore eyes on the rush hour train, wrestling with what I will allow my eyes to see. I often try to resist my conforming eyes, to make them look towards the cultural periphery, to the aesthetic margins, and to the haphazard shards of broken, refracted light on oily windows that few others see as they go about their busy, and sometimes dreary lives. I have a deep yearning to see my eyes politicised, to turn them completely into organs of touch (Marks, 2000), and to feel them wander freely across the intricate layers of the film and television screen. I want Sherlockian eyes.

I have held a rather romantic notion about my viewing eyes, and the eyes of some viewers: that they sometimes wander freely across the spaces, objects, lights, colours, bodies, movements and sounds of the diegetic world they are presented with. Narrative action may be centre frame, and all the elements of the mise en scène may be attempting to draw one’s eyes to this interaction, but I will catch myself looking to the far left of the screen, to hold my sight on an obscure pattern on a wall, or to search for the origins of a distant minor or insignificant sound just off-screen. I want to see inside and outside the narrative simultaneously. I imagine my eyes as Sherlock-like, searching for narrative clues, new plot developments, and for the sensuous expression of character, mood and feeling. But I also see them loosing or freeing themselves; my eyes (unconsciously) float within all the elements of filmic or televisual material as they happen on the screen.

I see in Sherlock’s eyes this double vision: the ability to have foresight, to see into the margins of things, and to be consciously aware of the vision within and all around him. As Sherlock sees into the finest grain of things, so do my eyes and I. My Sherlockian eyes are forensic, haptic, self-processing and are blessed with twenty twenty vision – they have the power to see into all things clearly. Sherlock mirrors or rather embodies the very qualities of the cinema machine (Metz, 1982), and of the surveillance regimes (Foucault, 1977), that emerged at the time the first Sherlock Holmes books were written (1887-1927). Sherlock is a text that already embodies the eye tracking experience.

But is this so, or just a fictive longing? What evidence do we have that our eyes do what we say they do? What evidence do we have that viewers possess a double vision? This romantic, phenomenological notion of the viewing, carnal, haptic eyes, then, we wanted to test, to explore, to see in action and interaction…

The Science of the Sherlockian Eye

Jodi Sita

I can often be found staring off into space, deep in thought, looking at nothing in particular. If I were being eye tracked it would look like I was staring at something. Where people look and, more particularly, why they are looking there, are questions that fascinate me and make me think about the phenomenon of blank stares. Human beings have fascinating eyes that, because they are housed independently, with their own localized environments, need to and can move about quite a lot. A tiny spot on the retina at the back of the eye, the fovea, houses the receptors for high visual acuity, and this spot must be directed at the object we want to see, in order for us to see it clearly, and see its fine detail. People tend to move their eyes to aspects of a scene which are interesting or useful. The visual system in the brain directs these movements; they are not random. Bottom-up control processes (see Itti & Koch, 2000) help direct some shifts in visual gaze and involve features that are thought to attract attention because they are highly noticeable. These include salient features such as luminosity, colour and movement. More importantly, in the human gaze we know that top-down processes are at play when viewing complex or meaningful scenes; our eyes employ feature selection which is based on our understanding of the scene and our internal expectations about where important things are or are likely to occur (Torralba et al., 2006; Birmingham et al., 2008; Vincent et al., 2009).

What I am curious to learn more about is how our viewing behavior is shaped by what we are doing, by how we are interacting with the world and how our brains are responding to that and shaping that encounter. My involvement in this work comes from these curiosities; however, it also stems from my own forensic tendencies. It develops from my own need to ask the Sherlockian questions about viewers viewing Sherlock.

My early research had me investigate a branch of forensic science: handwriting and signature examination. At the time the area had a lot of practitioners, many quite experienced and successful, and there had been a substantial amount written about it, yet very little objective evidence for the field’s claims had been produced. What the field needed were studies which produced hard evidence to support or dispute its original claims. My work was part of a large and ongoing body of work, where the field’s ideas and claims are tested objectively, and whose results can be used as evidence to support existing notions or derive new ones. It was within this area of research that I started using eye tracking, yet it has also led me to want to bring eye tracking to this moving-image field, where the focus of the eye tracker, shining like an objective lens over some of the theories of the area, can help bring to it some other method for its practitioners to use – to examine how viewers watch and are involved with what they are watching.

The Optometry of Sherlock

Kim Vincs

The science of the gaze—of how eyes fixate or fail to fixate—has always been of great interest to me, firstly in my original career as an optometrist, and more latterly as a choreographer and then a transmedia dance artist. As an optometrist, I was less concerned with where people looked than with whether they could look, and with the accuracy and resolution of the sensory information they received and interpreted when they did look. What do I mean by this? In considering whether people could look, I am referring to whether they were able to accurately fixate the static and moving targets they wanted and needed to. Fixating, that is, aiming one’s eyes at a static or moving target, is a function of attention, and integrated as action by the sensory and muscular systems of the eye and brain. There are many pathological conditions that interfere with the capacity to fixate a static visual stimulus quickly, accurately and efficiently. As an optometrist, I was primarily concerned with detecting these conditions and referring patients who had them for appropriate treatment. I was, in a very real sense, perfectly happy to allow my patients to decide for themselves what to fixate on. My job was simply to ensure that, should they wish to, they would be capable of locating and tracking something. This willingness to allow dissociation between capacity and will, between ability and decision, is something I consider foundational to the ways in which I have pursued my subsequent research into creative practices. I have never, as a choreographer or an interactive / transmedia artist, wished to dictate to people where they should look or what they should perceive. I consider my job to be to place appropriate objects / events / movements within a context in which they can be perceived should people so choose.

This outlook has had some specific implications for my art practice. As a choreographer, I have never thought to ask what someone watches when they observe a dancer moving. Cognitive psychologist Kate Stevens’ seminal work on eye movements in dance has demonstrated a classic novice/expert shift in the way that observers view dance. As with many other fields of expertise such as airline pilots, and driving instructors, experts make significantly fewer saccades, that is, changes in fixation, watching a dance performance than do novices, where experts are people with professional experience in dance and novices are people with no particular prior experience of the artform (Stevens et al., 2010). The implication of these results is that experts do not need to change fixation as many times as novices because they are able, to some extent, to predict where the dancing body will move. In essence, they understand what they are looking for, and are therefore able to maximize the efficiency of their fixation choices.

What Stevens’ work does not tell us is which movement features most attract fixation when watching a dancing body. My own work in motion capture analysis of dance movement provides me with a theory about why this might be a difficult thing to measure. Dance, at the movement level, comprises movement of some 33 major joints, each of which may make movements of entirely different velocity, acceleration and magnitude to achieve an overall aesthetic effect. The dancing body essentially has no ‘centre of focus’ that can be interpolated from movement data such as the speed, momentum or even position of specific body parts, because the semantics of dance movement are only meaningful in relation to the composition across the body. As I have argued previously (Vincs, 2014), the semantic significance of a movement bears no relationship to its metrics, such as amplitude or speed. In some aesthetic contexts, tiny movements of the fingers may be essential to the meaning and feeling tone of the movement. In others, such as large virtuosic or acrobatic movement forms, hand gestures may contribute relatively little to a movement’s significance.

Dance grammars are aesthetically and culturally, rather than anatomically determined. I think that this fact has also contributed to my attraction to the notion of Sherlockian eyes. As a choreographer, I am always a detective, seeking potential significances in movement rather than predetermined ones. I value the opportunity to go looking for the dancing body, browsing, shuffling, wandering through the multiple and complex joint actions that comprise a single ‘step,’ looking for something of newness and emotional value rather than assuming I know what it is and where I will find it. Yet I am always aware that my aesthetic search is underpinned by a neurosensory apparatus that is primed to respond selectively to human movement (Hagendoorn, 2004, Vincs, 2009). I am therefore armed, at least potentially, with an inherent ‘grammar’ that is defined by the morphology and physical capacity of the human body, and I am curious as to what predilections and biases my visual sensory system imposes on my seemingly adventurous gaze.

What now follows is an exploration of our different approaches to the eye tracking data that we generated. Jodi is first off, outlining our empirical method and undertaking a close reading of the preliminary results. Jodi shows how the results begin to tell us that the viewers’ gaze patterns and fixations are closely clustered together, and she situates these findings in relation to the science of the eye, the importance of the face in human communication, and to the visual and narrative codes and conventions of Sherlock. Sean then explores the results in terms of haptic visuality and the surveillance gaze, drawing upon phenomenology and the discourses of conspiracy to argue that the vision in Sherlock is marked by touch, texture, and control. Kim examines the results in terms of movement and relationality, examining the eye tracking data in terms of the way it supports and confirms the necessary nature of vision in seeing into moving things. Kim shows that even though there is a high degree of direction in terms of where viewers are being asked to look, visual perception allows or enables the eyes to wander. Finally, we conclude our article together, drawing together our voices to offer an interdisciplinary way forward.

Eye Tracking Sherlock (the objective viewing): Methods and Preliminary Results

Jodi Sita

We undertook a pilot case study that eye-tracked the ‘Holmes Saves Mrs. Hudson’ sequence from the episode A Scandal in Belgravia (Sherlock, BBC, 2012). This small-scale empirical study involved a total of 13 participants (3 males and 10 females; mean age 27 years), comprising a mixture of academics and undergraduate students at La Trobe University in Melbourne, Australia.

A Tobii X-120 remote eye tracker (Tobii Technology, Stockholm, Sweden) was used to record participants’ eye movements. It has an accuracy of 0.5º of visual angle and allows a moderate amount of free head movement (30 x 22 x 30 cm at 70 cm; Width x Height x Depth). This data collection technique uses reflected infrared light from the eye to determine participants’ gaze position and allows for natural head movements and natural human responses to screened material. The eye tracker was connected to a PC running an Intel® Core™ i7 CPU ‘Cool Master’ hard-drive. The eye tracker used Tobii Studio 2.3.2 professional edition software for the presentation of the movie scene stimuli and recording eye movements. The eye tracker was set up on a desk, situated below a Dell PC monitor (1680×1050), which was utilised by the participants to view the Sherlock sequence. Participants were seated on a sturdy chair between 55-65 cm away from the eye tracker and between 65-75 cm from the viewing screen. A second screen (Dell; 1920×1080) was utilised by the researchers to view, in real time, the eye movements of the participants as they were being tracked and calibrated, although all computer analyses and statistics reported here were based on stored data.
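To give a sense of what an accuracy of 0.5º of visual angle means on the viewing screen, the following minimal sketch converts that angle into an on-screen distance. The 0.5º figure, the 70 cm viewing distance (the midpoint of the 65-75 cm range) and the 1680-pixel screen width come from the setup described above; the physical screen width of 47.4 cm is an assumption made purely for illustration, since the monitor's dimensions are not reported, and the function names are hypothetical.

```python
import math


def visual_angle_to_cm(angle_deg, distance_cm):
    """Return the on-screen size (cm) subtended by a visual angle at a viewing distance."""
    return 2.0 * distance_cm * math.tan(math.radians(angle_deg) / 2.0)


def cm_to_pixels(size_cm, screen_width_cm, screen_width_px):
    """Convert a horizontal on-screen size in cm to pixels."""
    return size_cm * screen_width_px / screen_width_cm


ACCURACY_DEG = 0.5              # tracker accuracy stated in the text
DISTANCE_CM = 70.0              # midpoint of the stated 65-75 cm viewing distance
SCREEN_WIDTH_PX = 1680          # horizontal resolution stated in the text
ASSUMED_SCREEN_WIDTH_CM = 47.4  # ASSUMPTION: physical width is not given in the text

error_cm = visual_angle_to_cm(ACCURACY_DEG, DISTANCE_CM)
error_px = cm_to_pixels(error_cm, ASSUMED_SCREEN_WIDTH_CM, SCREEN_WIDTH_PX)
print(f"{error_cm:.2f} cm, about {error_px:.0f} px")
```

Under these assumptions the positional uncertainty is a little over half a centimetre on screen, which is small relative to a face in close-up but worth bearing in mind when distinguishing fixations on the eyes from fixations on the mouth in wider shots.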

Participants were recruited via posters advertising the study at La Trobe University, with ethics approval (Ethics approval number: FHEC13/101). Participants were required to be at least 18 years of age to be considered eligible. People who expressed interest in taking part in the study were contacted via email to attend a single recording session. In preliminary tests participants were introduced to the study and screened for exclusion criteria such as taking medications (e.g. benzodiazepines) that may potentially affect their eye movement, or known neurological conditions, disorders or injuries that could potentially affect their eye movement. All participants were screened for normal or corrected to normal near visual acuity of N8 or better on the Designs for Vision near sighted visual acuity test, and with a pen-torch eye movement excursion test to screen for symmetrical movement of the eyes. Participants who were ametropic were allowed to wear their glasses to watch the stimuli.

Prior to eye movement data being collected, the eye tracker was configured for each participant using a 9-point on-screen calibration test within the Tobii Studio recording software. Participants were told only that they would watch short segments from a variety of films. Recording sessions typically lasted between 15 and 25 minutes, and each participant was tested individually.

First, we found that our viewers’ eyes were strongly drawn to follow movement and directional cues and signs. This included camera and character movement. In the opening scene, where Mrs. Hudson’s fingers scrape along the wall, followed by Sherlock’s fingers retracing her steps (03-010 seconds), we see all viewers making strings of successive fixations – each following these finger movements (see Figures 1 and 2). The sound of these fingers scraping along the wall was heavily amplified and fully synchronised, and we suggest, then, that sound was also an aesthetic device being employed to direct where viewers looked. These results confirm previous findings where camera movement, sound, character behavior, and editing patterns are seen to inform gaze patterns and fixations (see Smith, forthcoming; Smith and Mital, 2011).

In one brief shot in the middle of this scene, we cut to a close-up of Mrs. Hudson’s face, full of anguish. All the subjects discern this face in amongst the movement and chaos of the surrounding action, as seen by their fixating on its features. Her face is captured in the center of the screen, making it central to the scene’s visualisation. However, the face is known to be a strong attractor of what the eye attends to (Treuting, 2006), and with it being such an important narrative component in this scene, it would have been a strong attentional cue.

Figure 1: Finger drag: 2 subjects

Figure 2: Finger drag: 13 subjects

Second, we observed an alignment in vision with regard to where Sherlock was looking. This sight co-proximation is referred to as ‘joint attention,’ in which what one attends to seems to shift automatically to where another is looking (Birmingham et al., 2009). Interestingly, this is a common misdirection trick used by magicians (Kuhn et al., 2009).

In particular, we observed that Sherlock’s point of view in the scene very often produced a close proximity in viewers’ focus and attention (participants looked where Sherlock looked, and with the same overall gaze patterning, see from 1.05 to 1.14). This also supports the findings reported by other film scholars using eye tracking methods, such as Rassell et al. (forthcoming, 2015) who found that a character’s point of view and subjective experiences have an influence on where viewers look.

The trends for this short sequence support the idea that Sherlock is a character-driven drama in which his vision is not only foregrounded but given omnipotent and omniscient power. Thus, viewers are not only being positioned to observe from his authorial position but to trust where he looks and what he discovers there. There are recognizable genre codes and conventions also in play, structuring the looking patterns we have observed. This is a detective-thriller series that repeats a series of camera and editing motifs that become familiar to audiences (Neale, 1990).

Figure 3: A heat map showing the hot spots where viewers gazed; red indicating longer dwelling time

Figure 4: The gaze plots showing the sequence of looks that viewers made over Sherlock’s face

Third, we found that viewers focused heavily on the characters’ faces, both in scenes with dialogue and those without. In scenes where Sherlock was clearly putting together the evidence, viewers focused heavily on his eyes, dwelling there for almost the entire shot (Figures 3 and 4).
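The idea behind a dwell-time heat map can be sketched in a few lines. The example below is an illustration only and is not the algorithm used by Tobii Studio, which is not described here: it simply divides the 1680×1050 screen into square cells and sums fixation durations within each cell. The fixation coordinates and durations, the cell size and the function names are all hypothetical.

```python
def dwell_heatmap(fixations, screen_w, screen_h, cell=105):
    """Sum fixation durations (ms) into a grid of square cells covering the screen.

    fixations: iterable of (x_px, y_px, duration_ms) tuples.
    Returns a list of rows, each a list of accumulated durations.
    """
    cols = -(-screen_w // cell)  # ceiling division
    rows = -(-screen_h // cell)
    grid = [[0.0] * cols for _ in range(rows)]
    for x, y, duration in fixations:
        # Ignore fixations recorded outside the screen area.
        if 0 <= x < screen_w and 0 <= y < screen_h:
            grid[int(y // cell)][int(x // cell)] += duration
    return grid


def hottest_cell(grid):
    """Return (row, col, dwell_ms) for the cell with the longest total dwell time."""
    dwell, row, col = max(
        (value, r, c) for r, line in enumerate(grid) for c, value in enumerate(line)
    )
    return row, col, dwell


# HYPOTHETICAL fixations: two on a face near screen centre, one in a corner.
example_fixations = [(800, 400, 300), (820, 410, 450), (200, 900, 120)]
grid = dwell_heatmap(example_fixations, screen_w=1680, screen_h=1050)
hottest = hottest_cell(grid)
print(hottest)
```

In a rendered heat map the cell with the longest accumulated dwell time would be coloured red; production tools typically also smooth each fixation over neighbouring pixels rather than using hard-edged cells.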

Viewers fixated back and forth between the eyes, face, and mouth of the central characters. These viewing patterns are characteristic of the movements made in facial and emotional recognition (Ekman & Friesen, 1971; Hernandez et al., 2009) and show some indication that viewers were paying attention to the different characters in the scene, working out the role of each character and what their intentions and emotions might be. These patterns of eye movements suggest that viewers are engaging with the scene as they would in a normal face-to-face encounter, using eye movements to verify who people are and what they are feeling. It is interesting to note that people who are not able to perform these socially informative tasks, such as those with schizophrenia or with some traumatic brain injuries, do not show the same eye movement behaviors (Watt & Douglas, 2006; Loughland et al., 2002; Williams et al., 1999).

Our viewers clearly followed narrative cues in line with the dialogue exchanges, looking back and forth between the characters’ interpersonal relays (Figures 5, 6 and 7). These results are similar to those of Treuting (2006), who eye-tracked 14 participants viewing short clips from such films as Shawshank Redemption (1994) and Harry Potter and the Philosopher’s Stone (2001). Treuting found that gaze clusters emerged in and around the faces of the central characters involved in dialogue and moments of heightened drama (see also, Redmond, Sita, 2013).


Figure 5: Single viewer character alignment

Figure 6: 6 subjects and the relay of looks on eyes, mouths and faces.

Figure 7: Final scene, last 12 seconds, searching for information: 13 subjects (most of the fixations are falling over the faces of the 2 central characters in dialogue)

Fourth, we saw evidence that viewers searched for narrative information and cues: this included fixating on aspects of the background wall before Sherlock first enters the scene (from 0.33-0.36 seconds), then moving between the image of a smile seen on the wall and Holmes’ face, spending time ‘reading’ the shop window signs and the note on the front door (Figure 8) as Watson arrives at the scene (from 2.22 to 2.37). One can understand such scanning as influenced by the meticulous work of the mise en scène: where all the elements have been carefully placed to enact this type of searching for narrative cues (see Smith, forthcoming).


Figure 8: Searching for narrative information

Finally, albeit in relation to our last point, we observed that certain viewers looked at more elements of the mise en scène (Figure 9 shows gaze patterns for 4 of the 13 viewers), including the interior lights, the computer, and furniture, even as the more dramatic moments of the scene were taking place.

These findings were interesting but not totally unexpected; we would hope that not all people viewing the same scene would watch it in the same way (this is something that is discussed further below). Insights like this allow us to see that even though there are some aspects that are strong attention grabbers, such as faces and movement within a scene, other aspects can captivate and draw attention away from those areas of interest. For example, the scene shown below (Figure 9) involves a particularly emotive exchange between two key characters, Mrs. Hudson and Dr. Watson. The fact that 4 of the viewers were attending elsewhere helps us to see these aspects of interest outside the main narrative at play. Why certain viewers look to the margins of the screen, to the more ‘insignificant’ elements of the mise en scène remains of great interest. One possibility would be that these viewers were not fully engaged with the exchange between the characters, and their attention therefore drifted to other elements in the scene. Another possibility is that these scenic elements drew particular interest because of their pattern, colour, etc. Further testing is needed to begin to tease out whether this response is scene-dependent, or a characteristic of these particular observers.
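One way the further testing mentioned above might begin is by quantifying, per viewer, how much dwell time falls outside an area of interest (AOI) drawn around the characters in dialogue. The sketch below is a hypothetical illustration, not the analysis performed in this study: the AOI rectangle, the viewer labels, the fixation values, the 50% threshold and the function names are all assumptions.

```python
def share_outside(fixations, aoi):
    """Fraction of a viewer's total dwell time spent outside a rectangular AOI.

    fixations: list of (x_px, y_px, duration_ms); aoi: (left, top, right, bottom).
    """
    left, top, right, bottom = aoi
    total = sum(duration for _, _, duration in fixations)
    if total == 0:
        return 0.0
    outside = sum(
        duration
        for x, y, duration in fixations
        if not (left <= x <= right and top <= y <= bottom)
    )
    return outside / total


def wide_viewers(fixations_by_viewer, aoi, threshold=0.5):
    """Return the viewers whose share of dwell time outside the AOI exceeds the threshold."""
    return sorted(
        viewer
        for viewer, fixations in fixations_by_viewer.items()
        if share_outside(fixations, aoi) > threshold
    )


# HYPOTHETICAL data: an AOI around the two faces and three invented viewers.
face_aoi = (600, 200, 1100, 700)
example_data = {
    "P01": [(800, 400, 400), (900, 450, 300)],   # stays on the faces
    "P02": [(100, 100, 500), (850, 420, 200)],   # mostly on the background
    "P03": [(820, 430, 600), (1500, 900, 100)],  # brief glance away
}
wide = wide_viewers(example_data, face_aoi)
print(wide)
```

Running the same measure over several scenes would help separate the two possibilities raised above: a viewer who scores high on every scene suggests an observer characteristic, whereas high scores confined to one scene suggest a scene-dependent effect.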

Figure 9: A slightly different patterning – 4 subjects – and wider viewing

It should be noted, nonetheless, that these observations come from only a very small sample (13 participants to date), which will be increased, and which still needs to undergo further data analysis and interpretation.

In summary, what have we seen in these results so far? Evidence of the eyes being held to attention by narrative cues, by camera and character movement, faces, dialogue, point of view and performance. These were elements to be expected and add to the growing body of eye tracking evidence that supports much of current film and television theory, particularly that of scholars working in the cognitive tradition such as David Bordwell (2007) and Noël Carroll (1996). The results equally support the results of other studies into narrative-centered visual texts (see Batty et al, forthcoming, 2015) and into how sound and movement affect gaze patterns (Rassell et al, forthcoming, 2015). Further, they speak to the way viewers are pulled seamlessly into the diegetic worlds they believe and invest in.

In this article we would now like to apply two different theoretical filters to the results just summarized: the first will be an examination of the gaze, by Sean, and the second will be an examination of the physiological and perceptual processes of the eye in relation to movement, by Kim. Both filters will attempt to make deeper sense of the results from the traditions in which the scholars operate. Following this analysis, a summary conclusion will draw their approaches together to make further inter-disciplinary sense of Sherlock’s eyes.

Sherlock’s Gaze

Sean Redmond

The concept of the gaze has a long and contentious history in Film Studies if much less so in the study of television. In fact, John Ellis has suggested that the domestic context in which television viewing has historically taken place, with a host of likely distractions, and in a context of constant programme flow and segmentation, produces a glance aesthetic whereby the image isn’t looked into deeply or for a sustained period of time (1982: 138). Sherlock, of course, contests this idea since the programme is heavily built around the details of forensic gazing.

Most notably, the idea of the gaze has been employed in psychoanalytical film theory to make the argument that the cinema looking apparatus is patriarchal and heterosexual, and viewers are positioned as ‘male’ subjects through which masculine identifications emerge (Mulvey, 1975, 1989). In Sherlock, the main male characters, and the male writers and directors, of course control the vision regime, although its objects of focus are rarely to-be-looked-at female characters.

Critical race theory, by contrast, has employed the concept of the gaze to demonstrate how the racial Other is fixed in inferior, marginal and fetishized subject positions (Hall, 2001). Sherlock can be read as a post-colonial text enacting a present England that centres whiteness and ‘invisibly’ marginalizes the Other from its panopticon empowered centre (Cuningham, 2004). The Other in Sherlock of course extends to those who sit outside the bourgeois social centre; there are particular class dimensions to the way crime is surveyed and defined (Jann, 1990).

In terms of surveillance discourse, film has been read as a vision machine set within an invasive visual culture that promotes:

the normalizing gaze, a surveillance that makes it possible to qualify, to classify and to punish. It establishes over individuals a visibility through which one differentiates and judges them (Foucault, 1977: 25).

Sherlock can be read as a text that carries out this normalizing gaze, defining the parameters of law and order and the way the criminal can be discovered, classified, and ultimately disciplined. That is not to say that the visual excesses of the programme do not at times undermine its simple binaries. To the contrary, Sherlock is constantly troubled by its own dominant discourses particularly through the way Sherlock is also a maverick outsider.

Finally, film phenomenology has made use of the gaze to demonstrate how looking and seeing is always embodied, experiential, and depending on the text, haptic and synesthetic – where ‘the eyes themselves function like organs of touch’ (Marks, 2010: 162). Sherlock creates the conditions of both embodied presence and haptic visuality through the way the gaze is employed to see deeply into things, while the programme’s textural mise en scène ‘demands’ to be attended to.

What I would now like to do is analyse two particular aspects of the way the gaze can be understood in Sherlock, relating my reading back to the eye tracking results that we have, and to eye tracking technology itself. First, I will explore Sherlock through its forensic gazing and the way this creates the particular conditions for the way viewers become locked into particular viewing patterns and relations. Second, I will explore Sherlock through its haptic elements whereby the viewer is understood to gaze at and touch (things) simultaneously.

The Forensic Gaze

In Sherlock one can argue that camera movement and position are motivated by the following factors. First, to reveal narrative information such as a new location, or setting; character relations and their relative physical proximity; time, and temporal detail; and moments of revelation where a new angle or focus reveals something previously hidden or a new ‘enigma’ emerges. Second, as a dramatic device: the camera is re-positioned to signal and cue moments of narrative development, crisis, reaction, and activation. Third, there are repeated and recognised televisual conventions of the programme: one can locate and expect certain camera motifs to function in Sherlock, such as the way we enter Sherlock’s mind’s eye to see what he is unearthing in microscopic close-up. Finally, camera movement and position signal certain emotional states and modes of feeling. The cut to a close-up, for example, signals a moment of affecting intensity, as is the case with the fingers being scraped along the wall in the scene under analysis in this article.

When one takes these Sherlockian codes and conventions into consideration one can make better sense of the eye tracking results that we have obtained. The eyes of the viewer seem to be relentlessly led and directed. Viewers familiar with the programme’s codes can be expected to have expectations of its visual tapestry, and to make predictions about where to look (see Rassell et al. forthcoming). This would explain both the way that viewers seem closely aligned with the looking operations of the scene (figures 1-7), and the way that viewers scan shots for narrative information (figures 8-10).

However, I also think there is something more telling to discover here – one around a consistent forensic looking regime where the text and the viewer align. This is the ‘double vision’ I refer to in my introduction to this article. Viewers come to embody the gazing powers that Sherlock possesses and look at the diegetic world through his eyes even where no direct or imagined point of view is in operation. Viewers experience their very own form of social surveillance, becoming detectives and snoopers in the process. Sherlock, then, can be read as a text of and for paranoid surveillance, fuelled by the constant search for facts, omissions, falsehoods, and half-truths. At a more general cultural level, trust is at issue here in what is perceived to be an age of ‘faithless’ activity and widespread corruption, where politicians are regarded to be as corrupt as the criminals they covertly support. Sherlockians ultimately become part of this age of conspiracy (Knight, 2002).

As do, in a very real sense, eye tracking technology and the data it produces. Sherlock is his very own eye tracker – he creates his own heat maps and relays and through this inbuilt biotechnology he sees into everything. Eye tracking technology is Sherlockian and the data it produces allows us to see into everything the viewer sees. Or, at least mostly…

The Haptic Gaze

Laura U. Marks (2000) has written that haptic visuality is a more intimate form of looking, where the eyes “move over the surface of its object rather than plunge into illusionist depth, not to distinguish form so much as to discern texture” (162). For Marks, film and video may be “thought of as impressionable and conductive, like skin” (2000: xi-xii), and this sensory materiality is heightened by it containing:

Grainy, unclear images; sensuous imagery that evokes memory of the senses (i.e. water, nature); the depiction of characters in acute states of sensory activity (smelling, sniffing, tasting, etc.); close-to-the-body camera positions and panning across the surface of objects; changes in focus, under- and overexposure, decaying film and video imagery; optical printing; scratching on the emulsion; densely textured, effects and formats such as Pixelvision… and alternating between film/video.  (Totaro, 2002)

The gaze found in Sherlock is very often a haptic one. The programme’s entire mise en scène evokes the activity and memory of sensation. Lights, objects, clothes, furniture, exteriors are given deep and layered textures. Sherlockian environments are populated with objects and qualities that are themselves sensory driven (poison, oil, tactile fabrics, beads of sweat, cigarette smoke, wet soil). The camera very often dwells on these, picks them out, and tracks and pans over them in close and proximate detail. Sherlock of course is a master of haptic visuality – his eyes touch the things that he observes or that he conjures up in his imagination. In many respects, then, the viewer is also invited to see Sherlock through a haptic lens.

If one were to return to the eye tracking results on the scene, what we might be observing is not just an alignment in vision, and the search for narrative information, but eyes that have been turned into organs of touch and deep sensual appreciation. For example, in figures 1 and 2 viewers are not just following the fingers that scrape along the wall but touching (with) them, and in touching them feeling them as if it is their fingers suffering this pain. In figures 5-7 viewers are not just following the relay of looks between the two characters but ‘touching’ faces, eyes and mouths. In figures 8-10, viewers are not just searching for narrative information and clues but are actively seeing into the textures, lights, objects, items that populate those scenes. The heat maps that eye tracking technology can generate may be more apt than we imagine since the suggestion of temperature, of body-heat, may well give truth to the embodied and carnal nature of vision. Here, however, we meet one of the limitations of eye tracking technology: it cannot tell us what people are feeling when watching a film or television text.

I would like to make one final observation about the eye that searches the mise en scène for narrative information or clues, as in figures 8-10. This is a point about the privacy or individualism of watching a screen text, co-dependent as it is on personal memory, biography, and the contexts in which one finds oneself viewing something. After the viewing of the scene one of our subjects remarked that they had actually spent much of the time trying to figure out who the actor playing Dr Watson was. Any number of ‘personal’ factors might get in the way of the looking regime of the text and shape why we might scan a particular text.

Roland Barthes (1981) has usefully employed the concept of the punctum (a Latin word derived from the Greek word for trauma) in relation to viewing photographs. He argues that the still image inspires an intensely private meaning, one in which an affecting ‘partial object’ emerges from its centre to ‘prick’ or ‘wound’ the viewer. The punctum is personal and as soon as it emerges it holds the viewer’s gaze. Although Barthes is singularly writing about the photograph I think the idea of the punctum can be applied to the moving image text, to Sherlock. Although they are dynamic media, television and film still settle on images and representations that reach out into the private realm of the viewer; and the viewer still finds their memories, traumas, life events activated and mobilised in the fictive worlds constructed. The wandering eyes of figures 8-10 are caught in their own biographical exploration, looking for objects that may ultimately wound them, their haptic eyes. In Sherlock we are not just being positioned as objects and subjects of surveillance but as carnal beings.

The Eyes of Sherlock; the Eyes of the Viewer

Kim Vincs

Coming, as I do, from the perspective of a dancer and choreographer, I read these eye tracking results not exactly as touch, but as a search for relationality. Erin Manning, in her seminal work on the philosophy of movement (Manning, 2009), emphasizes the incipiency of movement (movement as something in the act of becoming something, of reaching towards something or someone) over its positionality. That is to say, movement, in Manning’s terms, is always a process of relationality or reaching towards the world, rather than simply a series of coordinates on a grid. Bodies, or, more precisely “bodies-in-the-making,” are a means of thought rather than simply of action because they define, by the possibilities embedded in the moment of pre-articulation, a relationship or set of relationships within the world (Manning, 2009: 78).

Our eye movement data defines very clearly the relationships between the protagonists. In figures 1 and 2, ‘finger drag,’ the fixations map the pathway of Mrs. Hudson’s fingers along the wall. The relationships between Mrs. Hudson’s fingers and the texture of the wall are, in fact, the only potential human relationships within the scene. These relationships are not ‘positional’— that is to say, fingers defining points on the wall—but, by virtue of their spatial distribution, an articulation of the trajectory of Mrs. Hudson’s hand in relation to the wall.

Figures 3–6 reveal fixation patterns that are concentrated around the face, and in particular the eyes. These fixations address the origin of relationality in the scenes within the eyes, as if the eyes are understood to reveal the incipient thought of the characters. In Manning’s terms, incipiency, the moment in which a movement is in a state of ‘pre-acceleration,’ organizing and mobilizing itself, yet still capable of any number of actual outcomes, is the most potent aspect of a movement. The predominance of fixations on the eyes and face, while perfectly interpretable as simply a biological reflex designed to respond to and recognize human faces, also speaks of the process by which relationality is thought into being.

Figures 7–10, with their fixations distributed between human-to-human gaze and gesture (Figure 7), non-human scenic elements that lend cognitive elements or ‘clues’ (Figure 8), and poetic elements such as the look and texture of surrounding objects in Figures 8 (curtain texture) and 10 (lens flare), articulate an expanded notion of relationality in which human and non-human elements are implicated within a web of actions.

For me, these results suggest a mutability between human and non-human elements that is reminiscent of Manning’s understanding of relationality as something that can have both interpersonal and person-world dimensions, and also points to the kind of ‘Sherlockian body’ I seek as a dance artist. A purely ‘narrative’ approach to the filmic bodies analysed here would suggest that only human factors would feature in the gaze analysis. Similarly, a biologically driven notion of human movement perception as pre-wired to detect human shapes and actions would not predict migrations to and from the bodies to surrounding objects.

I read these results as revealing a semantic ambiguity that can form the basis for a Sherlockian search for relationality that is produced and constructed by the viewer as much as it is dictated by the film-maker. These scenes offer relatively few visual details from which to construct such a relationality, and this, no doubt, is indicative of a film-making style designed to direct the eye, and hence the mind, to very specifically and deliberately arranged narrative events. However, despite the seeming prescriptiveness of these images, they offer the viewer an opportunity to construct relational scenarios across like and non-like (human and non-human) scenic elements. They therefore demonstrate at least the potential for a Sherlockian approach to the visual perception of movement.


So, what have we seen with your eyes? On the one hand, we have demonstrated how Sherlock’s narrative and mise en scène pull the viewer into taking certain (emotive, forensic, relational) viewing positions. Sherlock is tightly bound by a number of codes and conventions and its palette and composition are highly constructed, creating spaces and interactions that focus our eyes. On the other hand, we have looked at the way haptics and relationality open up the possibility for better understanding the synaesthetic and organic/inorganic ways through which the vision of movement takes place. A televisual text such as Sherlock intends to marshal our viewing experience but, as we have also seen, eyes wander; they search, their own movement and the poetics of movement opening up textual encounters not always pre-determined by the scene’s deliberate operations. Our eyes escape themselves. Finally, we have noted how memory and biography can impact on where and why we look, or why we might look away. One of our respondents searched Watson’s face in the hope of remembering the actor’s name. The brutal violence meted out on Mrs. Hudson may be looked at (felt) differently by someone who has themselves undergone such misery. This is ultimately about our-being-in-the-world, and about how we make sense of beings-in-the-fictive world. Vision is never disembodied but full of the drink, food and love of life itself.

Vision can only be fully understood by combining different theoretical and methodological frameworks. What we have found in this article, and in the broader work of the Eye Tracking the Moving Image Research group – see the introduction to this special edition for more on this group – is that it is in the conversations and deliberations, provocations and discussions between vision scientists, neuroscientists, anatomists, choreographers, film makers, ethnographers, screen theorists, and screen writers where insights are best made, conclusions thickened, and arguments enriched and extended. What we have found in this article is that Sherlock exists at the eye of the ArtsScience nexus, and it is at this nexus where the authors would like to situate their work.

The language of film and television constantly creates these spaces of vision and for seeing to take place, whether this be: the embodied point-of-view shot which allows us to become the character; aerial cinematography that brings wide open exteriors and bejeweled cityscapes into view; the furtive camera that glimpses into dark corners, allowing us to happen on to what is supposedly hidden; or the interiorized gaze that expressionistically captures the nightmare visions of the lost, the hunted, and the alien. With a close-up shot one can trace the undulating valleys of emotion on a character’s face and feel their affecting eyes reaching out and into yours. In Sherlock, the very force of his vision, mobilized through special effects and the power of digital photography, enables his/our eyes to create mathematical formulations out of thin air and to re-visit crime scenes as if one is witnessing, experiencing it all over again. Sherlock establishes the sense that all vision is embodied and personally engineered, and speaks to the wider conceit that television is an artform that gives a miraculous, omnipotent and omnipresent vision to its ever watchful viewers.



Sean Redmond is an Associate Professor in Media and Communication at Deakin University. He has research interests in film and television aesthetics, film and television genre, film authorship, film sound, stardom and celebrity, and film phenomenology. He has published nine books including The Cinema of Takeshi Kitano: Flowering Blood (Columbia, 2013), and Celebrity and the Media (Palgrave, 2014), and with Su Holmes he edits the journal Celebrity Studies. Sean Redmond and Jodi Sita set up the Eye Tracking the Moving Image Research group in 2011.

Jodi Sita is an academic working within the areas of neuroscience and anatomy with expertise in eye tracking research. She has extensive experience with multiple project types using eye tracking technologies and other biophysical data. Her current research uses eye tracking to study viewers’ gaze patterns while watching moving images; to examine expertise in Australian Rules Football League coaches and players; and to examine the signature forgery process.

Professor Kim Vincs is Director and founder of Deakin Motion.Lab, at Deakin University. Kim integrates scientific and artistic approaches through research. She is currently working on a three-year project, supported by the Australian Research Council’s Discovery program. Her collaborations with mathematicians, biomechanists and cognitive psychologists span Deakin and the Universities of Sydney, Western Sydney and New South Wales.

From Subtitles to SMS: Eye Tracking, Texting and Sherlock – Tessa Dwyer


As we progress into the digital age, text is experiencing a resurgence and reshaping as blogging, tweeting and phone messaging establish new textual forms and frameworks. At the same time, an intrusive layer of text, obviously added in post, has started to feature on mainstream screen media – from the running subtitles of TV news broadcasts to the creative portrayals of mobile phone texting on film and TV dramas. In this paper, I examine the free-floating text used in BBC series Sherlock (2010–). While commentators laud this series for the novel way it integrates text into its narrative, aesthetic and characterisation, it requires eye tracking to unpack the cognitive implications involved. Through recourse to eye tracking data on image and textual processing, I revisit distinctions between reading and viewing, attraction and distraction, while addressing a range of issues relating to eye bias, media access and multimodal redundancy effects.

Figure 1: Press conference in ‘A Study in Pink’, Sherlock (2010), Episode 1, Season 1.


BBC’s Sherlock (2010–) has received considerable acclaim for its creative deployment of text to convey thought processes and, most notably, to depict mobile phone messaging. Receiving high-profile write-ups in The Wall Street Journal (Dodes, 2013) and Wired UK, this innovative representational strategy has been hailed an incisive reflection of our current “transhuman” reality and “a core element of the series’ identity” (McMillan 2014).[1] In the following discussion, I deploy eye tracking data to develop an alternate perspective on this phenomenon. While Sherlock’s on-screen text directly engages with the emerging modalities of digital and online technologies, it also borrows from more conventional textual tools like subtitling and captioning or SDH (subtitling for the deaf and hard-of-hearing). Most emphatically, the presence of floating text in Sherlock challenges the presumption that screen media is made to be viewed, not read. To explore this challenge in detail, I bring Sherlock’s inventive titling into contact with eye tracking research on subtitle processing, using insights from audiovisual translation (AVT) studies to investigate the complexities involved in processing dynamic text on moving-image screens. Bridging screen and translation studies via eye tracking, I consider recent on-screen text developments in relation to issues of media access and linguistic diversity, noting the gaps or blind spots that regularly infiltrate research frameworks. Discussion focuses on ‘A Study in Pink’ – the first episode of Sherlock’s initial season – which producer Sue Vertue explains was actually “written and shot last, and so could make the best use of onscreen text as additional script and plot points” (qtd in McMillan, 2014).

Texting Sherlock

Figure 2: Watson reads a text message in ‘A Study in Pink’, Sherlock (2010), Episode 1, Season 1.

The phenomenon under investigation in this article is by no means easy to define. Already it has inspired neologisms, word mashes and acronyms including TELOP (television optical projection), ‘impact captioning’ (Sasamoto, 2014), ‘decotitles’ (Kofoed, 2011), ‘beyond screen text messaging’ (Zhang 2014) and ‘authorial titling’ (Pérez González, 2012). While slight differences in meaning separate such terms from one another, the on-screen text in Sherlock fits all. Hence, in this discussion, I alternate between them and often default to more general terms like ‘titling’ and ‘on-screen text’ for their wide applicability across viewing devices and subject matter. This approach preserves the terminological ambiguity that attaches to this phenomenon instead of seeking to solve it, finding it symptomatic of the rapid rate of technological development with which it engages. Whatever term is decided upon today could well be obsolete tomorrow. Additionally, as Rick Altman (2004: 16) notes in his ‘crisis historiography’ of silent and early sound film, the “apparently innocuous process of naming is actually one of culture’s most powerful forms of appropriation.” He argues that in the context of new technologies and the representational codes they engender, terminological variance and confusion signals an identity crisis “reflected in every aspect of the new technology’s socially defined existence” (19).

According to the write-ups, phone messaging is the hero of BBC’s updated and rebooted Sherlock adaptation. Almost all the press garnered around Sherlock’s on-screen text links this strategy to mobile phone ‘texting’ or SMS (short messaging service). Reporting on “the storytelling challenges of a world filled with unglamorous smartphones, texting and social media”, The Wall Street Journal’s Rachel Dodes (2013) credits Sherlock with solving this dilemma and establishing a new convention for depicting texting on the big screen, creatively capturing “the real world’s digital transformation of everyday life.” For Mariel Calloway (2013), “Sherlock is honest about the role of technology and social media in daily life and daily thought… the seamless way that text messages and internet searches integrate into our lives.” Wired’s Graeme McMillan (2014) ups the ante, naming Sherlock a “new take” on “television drama as a whole” due precisely to its on-screen texting technique that sets it apart from other “tech-savvy shows out there”. McMillan continues that “as with so many aspects of Sherlock, there’s an element of misdirection going on here, with the fun, eye-catching slickness of the visualization distracting from a deeper commentary the show is making about its characters relationship with technology – and, by extension, our own relationship with it, as well.”

As this flurry of media attention makes clear, praise for Sherlock’s on-screen text or texting firmly anchors this strategy to technology and its newly evolving forms, most notably the iPhone or smartphone. Appearing consistently throughout the series’ three seasons to date, on-screen text in Sherlock occurs in a plain, uniform white sans-serif font that appears unadorned over the screen image, obviously added during post-production. This text is superimposed, pure and simple, relying on neither text bubbles nor coloured boxes nor sender IDs to formally separate it from the rest of the image area. As Michele Tepper (2011) eloquently notes, by utilising text in this way, Sherlock “is capturing the viewer’s screen as part of the narrative itself”:

It’s a remarkably elegant solution from director Paul McGuigan. And it works because we, the viewing audience, have been trained to understand it by the last several years of service-driven, multi-platform, multi-screen applications. Last week’s iCloud announcement is just the latest iteration of what can happen when your data is in the cloud and can be accessed by a wide range of smart-enough devices. Your VOIP phone can show caller ID on your TV; your iPod can talk to both your car and your sneakers; Twitter is equally accessible via SMS or a desktop application. It doesn’t matter where or what the screen is, as long as it’s connected to a network device. … In this technological environment, the visual conceit that Sherlock’s text message could migrate from John Watson’s screen to ours makes complete and utter sense.

Unlike on-screen text in Glee (Fox, 2009–), for instance (see Fig. 3), that is used only occasionally in episodes like ‘Feud’ (Season 4, Ep 16, March 14, 2013), Sherlock flaunts its on-screen text as signature. Its consistently interesting textual play helps to give the series cohesion. Yet, just as it aids in characterisation, helps to progress the narrative, and binds the series as a whole, it also, necessarily, remains at somewhat of a remove, as an overtly post-production effect.

Figure 3: Ryder chats online in ‘Feud’, Glee (2013), Episode 16, Season 4.

While Tepper (2011) explains how Sherlock’s “disembodied” (Banks, 2014) texting ‘makes sense’ in the age of cross-platform devices and online clouds, this argument falters when the on-screen text in question is less overtly technological. The extradiegetic nature of this on-screen text – so obviously a ‘post’ effect – is brought to the fore when it is used to render thoughts and emotions rather than technological interfacing. In ‘A Study in Pink’, a large proportion of the text that pops up intermittently on-screen functions to represent Sherlock’s interiority, not his Internet prowess. In concert with camera angles and “microscopic close-ups”, it elucidates Sherlock’s forensic “mind’s eye” (Redmond, Sita and Vincs, this issue), highlighting clues and literally spelling out their significance (see Figs. 4 and 5). The fact that these human-coded moments of titling have received far less attention in the press than those that more directly index new technologies is fascinating in itself, revealing the degree to which praise for Sherlock’s on-screen text is invested in ideas of newness and technological innovation – underlined by the predilection for neologisms.

Figure 4: Sherlock examines the pink lady’s ring in ‘A Study in Pink’, Sherlock (2010), Episode 1, Season 1.

Figure 5: Sherlock examines the pink lady’s ring in ‘A Study in Pink’, Sherlock (2010), Episode 1, Season 1.

Of course, even when not attached to smartphones or data retrieval, Sherlock’s deployment of on-screen text remains fresh, creative and playful and still signals perceptual shifts resulting from technological transformation. Even when representing Sherlock’s thoughts, text flashes on screen manage to recall the excesses of the digital, when email, Facebook and Twitter ensconce us in streams of endlessly circulating words, and textual pop-ups are ubiquitous. Nevertheless, the blinkered way in which Sherlock’s on-screen text is repeatedly framed as, above all, a means of representing mobile phone texting functions to conceal some of its links to older, more conventional forms of titling and textual intervention, from silent-era intertitles to expository titles to subtitles. By relentlessly emphasising its newness, much discussion of Sherlock’s on-screen text overlooks links to a host of related past and present practices. Moreover, Sherlock’s textual play actually invites a rethinking of these older, ongoing text-on-screen devices.

Reading, Watching, Listening

As Szarkowska and Kruger (this issue) explain, research into subtitle processing builds upon earlier eye tracking studies on the reading of static, printed text. They proceed to detail differences between subtitle and ‘regular’ reading, in relation to factors like presentation speed, information redundancy, and sensory competition between different multimodal channels. Here, I focus on differences between saccadic or scanning movements and fixations, in order to compare data across the screen and translation fields. During ‘regular’ reading (of static texts) average saccades last 20 to 50 milliseconds (ms) while fixations range between 100 and 500ms, averaging 200 to 300ms (Rayner, 1998). Referencing pioneering studies into subtitle processing by Géry d’Ydewalle and associates, Szarkowska et al. (2013: 155) note that “when reading film subtitles, as opposed to print, viewers tend to make more regressions” and fixations tend to be shorter. Regressions occur when the eye returns to material that has already been read, and Rayner (1998: 393) finds that slower readers (of static text) make more regressions than faster readers. A study by d’Ydewalle and de Bruycker (2007: 202) found “the percentage of regressions in reading subtitles was globally, among children and adults, much higher than in normal text reading.” They also report that mean fixation durations in the subtitles were shorter, at 178 ms (for adults), and explain that subtitle regressions can be partly explained by the “considerable information redundancy” that occurs when “[s]ubtitle, soundtrack (including the voice and additional information such as intonation, background noise, etc.), and image all provide partially overlapping information, eliciting back and forth shifts with the image and more regressive eye-movements” (202).

What happens to saccades and fixations when image processing is brought into the mix? When looking at static images, average fixations last 330 ms (Rayner, 1998). This figure is slightly longer than average fixations during regular reading and longer again than average subtitle fixations. Szarkowska and Kruger (this issue) note that “reading requires many successive fixations to extract information whereas looking at a scene requires fewer, but longer fixations” that tend to be more exploratory or ambient in nature, taking in a greater area of focus. In relation to moving-images, Smith (2013: 168) finds that viewers take in roughly 3.8% of the total screen area during an average length shot. Peripheral processing is at play but “is mostly reserved for selecting future saccade targets, tracking moving targets, and extracting gist about scene category, layout and vague object information”. In thinking about these differences in regular reading behaviour, screen viewing, and subtitle processing, it is noticeable that with subtitles, distinctions between fixations and saccades are less clear-cut. While saccades last between 20 and 50ms, Smith (2013: 169) notes that the smallest amount of time taken to perform a saccadic eye movement (taking into account saccadic reaction time) is 100-130ms. Recalling d’Ydewalle and de Bruycker’s (2007: 202) finding that fixations during subtitle processing last around 178ms, it would seem that subtitle conditions blur the boundaries somewhat between saccades and fixations, scanning and reading.

Interestingly, studies have also shown that the processing of two-line subtitles involves more regular word-by-word reading than for one-liners (D’Ydewalle and de Bruycker, 2007: 199). D’Ydewalle and de Bruycker (2007: 199) report, for instance, that more words are skipped and more regressions occur for one-line subtitles than for two-line subtitles. Two-line subtitles result in a larger proportion of time being spent in the subtitle area, and occasion more back-and-forth shifts between the subtitles and the remaining image area (201). This finding suggests that the processing of one-line subtitles differs considerably from regular reading behaviour. D’Ydewalle and de Bruycker (2007: 202) surmise that the distinct way in which one-line subtitles are processed relates to a redundancy effect caused by the multimodal nature of screen media. Noting how one-line subtitles often convey short exclamations and outcries, they suggest that a “standard one-line subtitle generally does not provide much more information than what can already be extracted from the picture and the auditory message.” They conclude that one-line subtitles occasion “less reading” than two-line subtitles (202). Extrapolating further, I posit that the routine overlapping of information that occurs in subtitled screen media blurs lines between reading and watching. One-line subtitles are ‘read’ irregularly and partly blind – that is, they are regularly skipped and processed through saccadic eye movements rather than fixations.

This suggestion is supported by data on subtitle skipping. Szarkowska and Kruger (this issue) find that longer subtitles containing frequently used words are easier and quicker to process than shorter subtitles containing low-frequency words. Hence, they conclude that cognitive load relates more to word familiarity than quantity, something that is overlooked in many professional subtitling guidelines. This finding indicates that high-frequency words are processed ‘differently’ in subtitling than in static text, in a manner more akin to visual recognition or scanning than reading. Szarkowska and Kruger find that high-frequency words in subtitles are often skipped. Hence, as with one-line subtitles, high-frequency words are, to a degree, processed blind, possibly through shape recognition and mapping more than durational focus. In relation to other types of on-screen text, such as the short, free-floating type that characterises Sherlock, it seems entirely possible that this innovative mode of titling may just challenge distinctions between text and image processing. While commentators laud this series for the way it integrates on-screen text into its narrative, style and characterisation, eye tracking is required to unpack the cognitive implications of Sherlock’s text/image morph.

The Pink Lady

Figure 6: Letters scratched into the floor in ‘A Study in Pink’, Sherlock (2010), Episode 1, Season 1.

Sherlock producer Vertue refers to the pink lady scene in ‘A Study in Pink’ as particularly noteworthy for its “text all around the screen”, referring to it as the “best use” of on-screen text in the series (qtd in McMillan, 2014). In this scene, a dead woman dressed in pink lies face first on the floor of a derelict building into which she has painstakingly etched a word or series of letters (‘Rache’) with her fingernails. As Sherlock investigates the crime scene, forensics officer Anderson interrupts to explain that ‘Rache’ is the German word for ‘revenge’. The German-to-English translation pops up on screen (see Fig. 6), and this time Sherlock sees it too. This superimposed text, so obviously laid over the image, oversteps its surface positioning to enter Sherlock’s diegetic space, and we next view it backwards, from Sherlock’s point of view, not ours (see Fig. 7). After an exasperated eye roll that signals his disregard for Anderson, Sherlock dismisses this textual intervention and we watch it swirl into oblivion. Here, on-screen text is at once both inside and outside the narrative, diegetic and extra-diegetic, informative and affecting. In this way it self-reflexively draws attention to the show’s narrative framing, demonstrating its complexity as distinct diegetic levels merge.

Figure 7: Sherlock sees on-screen text in ‘A Study in Pink’, Sherlock (2010), Episode 1, Season 1.

For Carol O’Sullivan (2011), when on-screen text affords this type of play between the diegetic and extra-diegetic it functions as an “extreme anti-naturalistic device” (166) that she unpacks via Gérard Genette’s notion of narrative metalepsis (164). Detailing numerous examples of humorous, formally transgressive diegetic subtitles, such as those found in Annie Hall (Woody Allen, 1977) (Fig. 8), O’Sullivan points to their metatextual function, referring to them as “metasubtitles” (166) that implicitly comment on the limits and nature of subtitling itself. When Sherlock’s on-screen titles oscillate between character and viewer point-of-view shots, they too become metatextual, demonstrating, in Genette’s terms, “the importance of the boundary they tax their ingenuity to overstep in defiance of verisimilitude – a boundary that is precisely the narrating (or the performance) itself: a shifting but sacred frontier between two worlds, the world in which one tells, the world of which one tells” (qtd in O’Sullivan 2011: 165). Moreover, for O’Sullivan, “all subtitles are metatextual” (166), necessarily foregrounding their own act of mediation and interpretation. Specifically linking such ideas to Sherlock, Luis Perez Gonzalez (2012: 18) notes how “the series creators incorporate titles that draw attention to the material apparatus of filmic production”, thereby creating a complex alienation-attraction effect “that shapes audience engagement by commenting upon the diegetic action and disrupting conventional forms of semiotic representation, making viewers consciously work as co-creators of media content.”

Figure 8: Subtitled thoughts in the balcony scene, Annie Hall (1977).

Eye Bias

One finding from subtitle eye tracking research particularly pertinent to Sherlock is the notion that on-screen text causes eye bias. This was established in various studies conducted by d’Ydewalle and associates, which found that subtitle processing is largely automatic and obligatory. D’Ydewalle and de Bruycker (2007: 196) state:

Paying attention to the subtitle at its presentation onset is more or less obligatory and is unaffected by major contextual factors such as the availability of the soundtrack, knowledge of the foreign language in the soundtrack, and important episodic characteristics of actions in the movie: Switching attention from the visual image to “reading” the subtitles happens effortlessly and almost automatically (196).

This point is confirmed by Bisson et al. (2014: 399), who report that participants read subtitles even in ‘reversed’ conditions – that is, when subtitles are rendered in an unfamiliar language and the screen audio is fully comprehensible (in the viewers’ first language) (413). Again, in intralingual or same-language subtitling – when titles replicate the language spoken on screen – hearing audiences still divert to the subtitle area (413). These findings indicate that viewers track subtitles irrespective of language or accessibility requirements. In fact, the tracking of subtitles overrides function. As Bisson et al. (413) surmise, “the dynamic nature of the subtitles, i.e., the appearance and disappearance of the subtitles on the screen, coupled with the fact that the subtitles contained words was enough to generate reading behavior”.

Szarkowska and Kruger (this issue) reach a similar conclusion, explaining eye bias towards subtitles in terms of both bottom-up and top-down impulses. When subtitles or other forms of text flash up on screen, they effect a change to the scene that automatically pulls our eyes. The appearance and disappearance of text on screen is registered in terms of motion contrast, which according to Smith (2013: 176), is the “critical component predicting gaze behavior”, attaching to small movements as well as big. Additionally, we are drawn to words on screen because we identify them as a ready source of relevant information, as found in Batty et al. (forthcoming). Analysing a dialogue-free montage sequence from animated feature Up (Pete Docter, 2009), Batty et al. found that on-screen text in the form of signage replicates in miniature how ‘classical’ montage functions as a condensed form of storytelling aiming for enhanced communication and exposition. They suggest that montage offers a rhetorical amplification of an implicit intertitle, thereby alluding to the historical roots of text on screen while underlining its narrative as well as visual salience. One frame from the montage sequence focuses in close-up on a basket containing picnic items and airline tickets (see Fig. 9). Eye tracking tests conducted on twelve participants indicate a high degree of attentional synchrony in relation to the text elements of the airline ticket on which Ellie’s name is printed. Here, text provides a highly expedient visual clue as to the narrative significance of the scene and viewers are drawn to it precisely for its intertitle-like, expository function, highlighting the top-down impulse also at play in the eye bias caused by on-screen text.

Figure 9: Heat map showing collective gaze weightings during the montage sequence in Up (2009).

In this image from Up, printed text appears in the centre of the frame and, as Smith (2013: 178) elucidates, eyes are instinctively drawn towards frame centre, a finding backed up by much subtitle research (see Szarkowska and Kruger, this issue). However, eye tracking results on Sherlock conducted by Redmond, Sita and Vincs (this issue) indicate that viewers also scan static text when it is not in the centre of the frame. In an establishing shot of 221B Baker Street from the first episode of Sherlock’s second season, ‘A Scandal in Belgravia’, viewers track static text that borders the frame across its top and right hand sides, again searching for information (see Fig. 10). Hence, the eye-pull exerted by text is noticeable even in the absence of movement, contrast and central framing. In part, viewers are attracted to text simply because it is text – identified as an efficient communication mode that facilitates speedy comprehension (see Lavaur, 2011: 457).

Figure 10: Single viewer gaze path for ‘A Scandal in Belgravia’, Sherlock (2012), Episode 1, Season 2.


What do these eye tracking results across screen and translation studies tell us about Sherlock’s innovative use of on-screen text and texting? Based on the notion that text on screen draws the eye in at least dual ways, due to both its dynamic/contrastive nature and its communicative expediency, we can surmise that for Sherlock viewers, on-screen text is highly visible and more than likely to be in that 3.8% of the screen on which they will focus at any one point in time (see Smith, 2013: 168). The marked eye bias caused by text on screen is further accentuated in Sherlock by the freshness of its textual flashes, especially for English-speaking audiences given the language hierarchies of global screen media (see Acland 2012, UNESCO 2013). The small percentage of foreign-language media imported into most English-speaking markets tends to result in a lack of familiarity with subtitling beyond niche audience segments. For those unfamiliar with subtitling or captioning, on-screen text appears particularly novel. Additionally, as explored, floating TELOPs in Sherlock attract attention due to the complex functions they fulfil, providing narrative and character clues as well as textual and stylistic cohesion. As Tepper (2011) points out, in the first episode of the series, viewers are introduced to Sherlock’s character via text, before seeing him on screen. “When he texts the word ‘Wrong!’ to DI Lestrade and all the reporters at Lestrade’s press conference,” notes Tepper, “the technological savvy and the imperiousness of tone tell you most of what you need to know about the character.”

There seems no doubt that on-screen text in Sherlock attracts eye movement, and that it therefore distracts from other parts of the image. One question then that immediately presents itself is why Sherlock’s textual distractions are tolerated – even celebrated – to a far greater extent than other, more conventional or routine forms of titling like subtitles and captions. While Sherlock’s on-screen text is praised as innovative and incisive, interlingual subtitling and SDH are criticised by detractors for the way in which they supposedly force viewers to read rather than watch, effectively transforming film into “a kind of high-class comic book with sound effects” (Canby, 1983).[2] Certainly, differences in scale affect such attitudes and the quantitative variance between post-subtitles (produced for distribution only) and authorial or diegetic titling (as seen in Sherlock) is pronounced.[3] However, eye tracking research on subtitle processing indicates that, on the whole, viewers easily accommodate the increased cognitive load it presents. Although attentional splitting occurs, leading to an increase in back-and-forth shifts between the subtitles and the rest of the image area (Szarkowska and Kruger, this issue), viewers acclimatise by making shorter fixations than in regular reading and by skipping high-frequency words and subtitles while still managing to register meaning (see d’Ydewalle and de Bruycker, 2007: 199). In this way, subtitle processing differs in many respects from the reading of static text, and approximates techniques of visual scanning. Bearing these findings in mind, I propose it is more accurate to see subtitling as transforming reading into viewing and text into image, rather than vice versa.

Situating Sherlock in relation to a range of related TELOP practices across diverse TV genres (such as game shows, panel shows, news broadcasting and dramas), Ryoko Sasamoto (2014: 7) notes that the additional processing effort caused by on-screen text is offset by its editorial function.[4] TELOPs are often deployed by TV producers to guide interpretation and ensure comprehension by selecting and highlighting information deemed most relevant. This suggestion is backed up by research by Rei Matsukawa et al. (2009), which found that the information redundancy effect caused by TELOPs facilitates understanding of TV news. For Sasamoto (2014: 7), ‘impact captioning’ highlights salient information in much the same way as voice intonation or contrastive stress. It acts as a “written prop on screen” enabling “TV producers to achieve their communicative aims… in a highly economical manner” (8). Focusing on Sherlock specifically, Sasamoto suggests that its captioning provides “a route for viewers into complex narratives” (9). Moreover, as Szarkowska and Kruger (this issue) note, in static reading conditions, “longer fixations typically reflect higher cognitive load.” Consequently, the shorter fixations that characterise subtitle viewing support the contention that on-screen text processing is eased by its expedient, editorial function and by redundancy effects resulting from its multimodality.

Switched On

Another way in which Sherlock’s text and titling innovations extend beyond mobile phone usage was exemplified in July 2013 by a promotional campaign that promised viewers a ‘sneak peek’ at a yet-to-be-released episode title, requiring them to find and piece together a series of clues. In true Sherlockian style, the clues were well hidden, only visible to viewers if they switched on closed-captioning or SDH available for deaf and hard-of-hearing audiences. With this device turned on, viewers encountered intralingual captioning along the bottom of their screen and, additionally, individually boxed letters that appeared top left (see Figs. 11 and 12). Viewers needed to gather all these single letter clues in order to deduce the episode title: ‘His Last Vow’. According to the ‘I Heart Subtitles’ blog (July 16, 2013), in doing so, Sherlock once again displayed its ability to “think outside the box and consider all…options”. It also cemented its commitment to on-screen text in various guises, and effectively gave voice to an audience segment typically disregarded in screen commentary and analysis. Through this highly unusual, cryptic campaign, Sherlock alerted viewers to more overtly functional forms of titling, and intimated points of connection between language, textual intervention and access.

Figure 11: Boxed letter clues (top left of frame) that appeared when closed captioning was switched on, during a re-run of ‘A Scandal in Belgravia’, Sherlock (2012), Episode 1, Season 2.

Figure 12: Boxed letter clues (top left of frame) that appeared when closed captioning was switched on, during a re-run of ‘A Scandal in Belgravia’, Sherlock (2012), Episode 1, Season 2.


On-screen text invites a rethinking of the visual, expanding its borders and blurring its definitional clarity. Eye tracking research demonstrates that moving text on screens is processed differently to static text, affected by a range of factors issuing from its multimodal complexity. Sherlock subtly signals such issues through its playful, irreverent deployment of text, which enables viewers to directly access Sherlock’s thoughts and understand his reasoning, while also distancing them, asking them to marvel at his ‘millennial’ technological prowess (Stein and Busse, 2012: 11) while remaining self-consciously aware of his complex narrative framing as it flips inside out, inviting audiences to watch themselves watching. Such diegetic transgression is yet to be mapped through eye tracking, intimating a profitable direction for future studies. To date, data on text and image processing demonstrate how on-screen text attracts eye movement and hence, it can be inferred that it distracts from other parts of the image area. Yet, despite rendering more of the image effectively ‘invisible’, text in the form of TELOPs is increasingly prevalent in news broadcasts, current affairs panel shows (when audience text messages are displayed) and, most notably, in Asian TV genres where it is now a “standard editorial prop” featured in many dramas and game shows (Sasamoto, 2014: 1). In order to take up the challenge presented by such emerging modes of screen address, research needs to move beyond surface assessments of the attraction/distraction nexus. It is the very attraction to TELOP distraction that Sherlock – via eye tracking – brings to the fore.







[1] While some commentators point out that Sherlock was by no means the first to depict text messaging in this way – as floating text on screen – it is this series more than any other that has brought this phenomenon into the limelight. Other notable uses of on-screen text to depict mobile phone messaging occur in films All About Lily Chou-Chou (Iwai, 2001), Disconnect (Rubin, 2013), The Fault in Our Stars (Boone, 2014), LOL (Azuelos, 2012), Non-Stop (Collet-Serra, 2014), Wall Street: Money Never Sleeps (Stone, 2010), and in TV series Glee (Fox, 2009–), House of Cards (Netflix, 2013–), Hollyoaks (Channel 4, 1995–), Married Single Other (ITV, 2010) and Slide (Fox8, 2011). For discussion of some ‘early adopters’, see Biendenharn 2014.



[2] Notably, in this New York Times piece, Canby (1983) actually defends subtitling against this charge, and advocates for subtitling over dubbing.

[3] On distinctions between post-subtitling and pre-subtitling (including diegetic subtitling), see O’Sullivan (2011).

[4] According to Sasamoto (2014: 1), “the use of OCT [Open Caption Telop] as an aid for enhanced viewing experience originated in Japan in 1990.”



Dr Tessa Dwyer teaches Screen Studies at the University of Melbourne, specialising in language politics and issues of screen translation. Her publications have appeared in journals such as The Velvet Light Trap, The Translator and The South Atlantic Quarterly and in a range of anthologies including B is for Bad Cinema (2014), Words, Images and Performances in Translation (2012) and the forthcoming Locating the Voice in Film (2016), Contemporary Publics (2016) and the Routledge Handbook of Audiovisual Translation (2017). In 2008, she co-edited a special issue of Refractory on split screens. She is a member of the ETMI research group and is currently writing a book on error and screen translation.

We are the Borg (in a good way): Mapping The Development Of New Kinds Of Being And Knowing Through Inter- and Trans-Mediality — Anne Cranny Francis

Abstract: Digital technologies have enabled new ways of communicating and relating to others and this has fundamental consequences for being and for meaning. In this paper I map the development of concepts of intermediality and transmediality that are used to describe textual practice and audience engagement in order to explore these changes to communication practice. At the same time I explore the new kinds of audience engagement enabled by this technology, which includes active participation in the reconstruction of older narratives in new media and the potential this affords for new meanings. It also includes the dissemination of stories, old and new, across multiple platforms by both makers and audiences, who themselves become makers, and the proliferation of stories and meanings this enables. Finally I consider the possibilities for co-creation (my hardware, your software, or vice-versa), which can enable new forms of sharing and mutual knowledge-formation.

Sherlock (BBC, 2010– )

1. On thinking about inter- and trans-

The research for this paper led me through a range of ideas and arguments about the meanings of intermediation and transmediation, as well as their relationship to intertextuality (for example, Bakhtin 1984; Jenkins 2006; Herzogenrath 2012; Stein and Busse 2012; Phillips 2012). It led me to think about a multiplicity of texts that are all inter in some way—either intertextually related texts and the kinds of meanings they make or intermediated narratives that tell their story across a range of media and platforms—and about texts, producers and audiences that are most definitely trans—deploying a range of media and platforms to create a composite and complex world, engage with that world, and generate new meanings. This textual multiplicity in the contemporary media environment in turn raised questions about what has caused or generated these differing ways of telling a story and what is the significance of these different modes of story-telling: whether this reflects simply a change in technology (if that is ever truly simple) or if that change has consequences that move far beyond the material technologies involved—the material artefacts and related communication practices—to our ways of thinking and of being in the world.

My argument is that digital technologies have enabled new ways of communicating and relating to others and that this has fundamental consequences for being and for meaning. Further, we are only just starting to realise the possibilities and potential offered by this technology for new forms of relationship, knowledge creation and sharing. I work through these possibilities by reference to a range of texts that were suggested by my research and which recur in discussions of these new modes of story-telling and text production. My interest is not only in digital texts themselves, but also in the new forms of engagement they offer to readers, viewers and listeners to become active producers or makers of meaning alongside the creators of the work. This engagement includes our participation in the reconstruction of older narratives in new media and the potential this affords for new meanings; the dissemination of stories, old and new, across multiple platforms by both makers and audiences, who themselves become makers, and the proliferation of stories and meanings this enables; and finally the possibilities for co-creation—my hardware, your software (or vice-versa)—which can enable new forms of sharing and mutual knowledge-formation.

This exploration of shared storytelling and textual production occurs through my engagement with the theory used by media and cultural analysts to understand transformations in creativity, knowledge-formation and being. This work includes the concepts of intermediation, which explores the possibilities opened up by new media and focuses on the textual practices that enable new forms of audience engagement, and transmediation, which also explores the effect of new technologies on meaning-making but shifts its focus from textual practice to audience response. This is a subtle shift as both concepts essentially study the same phenomena (including both textual practice and audience responses), but it mirrors what Henry Jenkins called the development of ‘convergence culture’: “the flow of content across multiple media platforms, the cooperation between multiple media industries, and the migratory behavior of media audiences who will go almost anywhere in search of the kinds of entertainment experiences they want” (2006, 2). As I will go on to argue, this convergence, this sharing of and linking via new media technologies, has the potential to transform our experience of the world and, along with that, our formation of knowledge and fundamental understandings of being.

2. The Consulting Detective and The Doctor

My first thought when beginning this paper was to use the BBC (British Broadcasting Corporation) version of Sherlock (2010-) as my example of intermediation. One of the things that attracted me to this text was that it re-tells Sir Arthur Conan Doyle’s original stories in such a fresh and engaging way, not only through the revised characterisations of its principals (Holmes, Watson, Lestrade, Moriarty, Mycroft) and the rapid editing and visual layering of the mise-en-scène that creates 21st century London as the technological and social successor to Conan Doyle’s 19th century industrial London, but also by the re-framing of familiar narratives to make them directly relevant to contemporary British society. For example, The Hound of the Baskervilles (1901) is re-written by Mark Gatiss as “The Hounds of Baskerville” (2012), a story about experiments with nerve agents and genetic mutation at a United Kingdom military base. The story centres on a local man, Henry Knight, who, as a child, saw his father torn apart by a giant hound on Dartmoor, near the Baskerville military establishment. Fear of the hound is produced not, as in the original story, by phosphorescence painted onto a large dog (though the local innkeepers have a large dog that they used to spread the ‘giant hound’ story to tourists), but by a hallucinogenic drug that is released into the air by nerve pads buried in a certain part of the nearby moors. We eventually discover that Knight’s father was killed accidentally when he wandered into the test area for these nerve pads. Under the influence of the air-borne toxin, the elder Knight tripped and hit his head on a rock while attempting to run away from Baskerville scientist Robert Frankland, who was wearing a gas mask and so appeared monstrous. The young Henry Knight witnessed his father’s accidental death but, under the influence of the nerve toxin, transformed the memory into the story of the giant hound, suggested to him by the initials H.O.U.N.D. on Frankland’s jumper.

Gatiss’s story uses elements of Conan Doyle’s original but reworks them into a contemporary story about the development of chemical and biological weapons and their production within an environment of secrecy that puts citizens’ lives at risk. The main characters (Sherlock Holmes [Benedict Cumberbatch], Dr Watson [Martin Freeman], Mycroft Holmes [Mark Gatiss] and Inspector Lestrade [Rupert Graves]) are also developed further in this story, including exploration of Sherlock’s ambiguous sexuality and his relationship with Watson, which is mapped explicitly onto the gay relationship of the local innkeepers. It is an engaging tale for the Conan Doyle enthusiast as it preserves the central motif of the narrative, the ghostly hound, but finds a way of re-presenting it that changes the story from one about evil aristocrats (the original Baskerville and his ruthless treatment of the local peasants) and modern greed (a villainous descendant of the original attempting to kill the successor to the title so that he inherits the family fortune) to one about weapons of mass destruction and government secrecy. It also presents a different ‘take’ on the sexuality of Holmes (also explored in the recent films directed by Guy Ritchie and starring Robert Downey Jr. as Holmes and Jude Law as Watson [2009, 2011]), opening up the possibility that he is either gay or bisexual whereas Conan Doyle presents Holmes as relatively asexual.[1] This re-working of the story and its characters constitutes the text as more than a period adaptation of Conan Doyle’s story, set in the late Victorian period with Holmes and Watson inhabiting the world of brougham cabs and steam trains. So is this an example of intertextuality or intermediality, with the literary creation of Conan Doyle cast as another text or medium that incorporates audience engagement with the story?

Perhaps the most obvious answer here is that this re-casting of the Holmes story is an example of intermediality, defined in an early essay by Dick Higgins as generated by “the desire to fuse two or more existing media” (1966). Bernd Herzogenrath notes, however, that Higgins saw intermediality not as the final text but as “‘the uncharted land that lies between’ … different media” (2012, loc. 129-142).[2] The intermediality generated by the Sherlock re-visioning of The Hound of the Baskervilles enables the presentation of different meanings (about weapons production and secrecy) while maintaining the bones of the original narrative (about the abuse of power and the production of fear). Herzogenrath notes that in Image-Music-Text (1977) Roland Barthes related intermediality to interdisciplinarity, which occurs:

… when the solidarity of the old disciplines breaks down—perhaps even violently, via the jolts of fashion—in the interests of a new object and a new language neither of which has a place in the field of the sciences that were to be brought peacefully together, this unease in classification being precisely the point from which it is possible to diagnose a certain mutation. (loc. 129)

This disciplinary transformation might seem a heavy burden to place on Sherlock; however, it is certainly the case that this production of The Hound of the Baskervilles in a different medium tells different stories and interrogates different aspects of everyday life (military activity, government control, sexual identity) from Conan Doyle’s original. Moreover, as discussed further below, Mark Gatiss’s revision of The Hound of the Baskervilles might be seen as Bakhtin’s heteroglossia in practice, with Gatiss’s story constituting another voice/telling that reiterates some original narrative elements whilst adding some and transforming others.

Jeremy Brett as/in Sherlock Holmes (1984-94)

From a contemporary perspective the transfer from literary text to television may not seem a case of disciplinary violence; some time ago, however, it did. When television was younger and literature was a canonical art form, the production of a literary work as a television program led inevitably to discussions of what was ‘lost’ by the transfer to such an ‘impoverished’ medium. It is only far more recently that we have understood that an intermediated work is offering something new and different, unconstrained by the disciplinary shackles of the past. This realisation enables Sherlock to be written as a contemporary series, while retaining characteristics of its Victorian predecessor, in contrast, for example, to the older Granada Television series The Adventures of Sherlock Holmes (1984-1994), starring Jeremy Brett, which retained the Victorian setting for the stories. This successful relocation of the narrative for Sherlock depends on viewers being able to read across media platforms without the disciplinary blinkers of an earlier time; they no longer consider the narrative confined to a particular space/time as defined by the originary text. Instead, as regular consumers of postmodern pastiche, they adjust their reading practice for the complex network of intertextual references and narrative transpositions that constitutes this contemporary Sherlock.

This is more than simply a change in forms of entertainment or the emergence of new technologies. This radical unhooking of the narrative from its original space/time and the ability to read the stories for a different age, with different values and different concerns, is characteristic of the specificity and locatedness (sometimes read as relativism) of postmodernity. The postmodern producer appreciates the origin of textual forms and practices and is able to re-mediate them in order to make new meanings for a new time. Similarly, the postmodern consumer is able to appreciate the multiplicity of (textual) voices that constitute their world, and is not constrained to one major or canonical form of textual address as the bearer of cultural value. This is a reflexive consumer who maps networks of meaning extending beyond the confines of a specific text and its world; the viewer of The Matrix (The Wachowski Brothers, 1999) who knows to ‘follow the white rabbit’ to a looking-glass world that is our own world, and yet is not.

One of the means by which this reflexive writing and viewing practice has been understood is through the concept of intertextuality—used to describe the practice of referencing from one text to another via a character, icon, event or interaction, along with the meanings associated with that reference. Based on the work of Mikhail Bakhtin who saw every text as the premise for and related to every other text, via the heteroglossia (different voices) that constitute(s) our world, intertextuality is a way of mapping the complexity of communication practices and the meanings they convey, along with the impossibility of exerting total control over the meanings associated with a particular utterance (1984, 278). Intertextuality is about meaning and its constant deferral (in Derrida’s terms) not just the appearance of story elements in different texts. So intermediality acknowledges the use of different media or platforms to convey a specific narrative while intertextuality is a way of exploring the meanings constructed.

One way of mapping the possible meanings generated by viewer engagement with (intermediated) texts, including their constant deferral of meaning, is through the notion of genre, since this is the way that we typically classify texts in order to render them accessible. In a sense genre imposes order on the chaotic heteroglossia of our world so that it does not become an incomprehensible Babel in which each individual is isolated by a wholly idiosyncratic reading/viewing/meaning-making practice. Not only does genre identify the conventions or characteristics shared by the texts that we recognise as similar and so enable us to trace their history, it also identifies the kinds of issues commonly addressed by those texts. Science fiction, for example, commonly addresses the relationship between human beings and their technology, how technology influences our lives and even the fundamental nature of human being. This is evident in science fiction works such as Blade Runner (Ridley Scott, 1982) and The Matrix (1999), both of which explore how we deploy technology and what this tells us about ourselves. And this exploration of identity and technology has its roots in what is commonly regarded as the first science fiction text, Mary Shelley’s Frankenstein ([1818] 1982), written at the height of the first Industrial Revolution in western societies, when steam power had transformed work practices and social relationships, obliterating older forms of labour and the classes who performed it and reconstructing society into new classes. This industrial context may not be explicit in every reference to Frankenstein but it echoes through portrayals of the angry, sad and abandoned creature and his deluded creator, who become the robots/androids of today and us, their sometimes deluded or unaware creators and users.

Sherlock and Moriarty


One of the striking features of Sherlock is its stylistic similarity to Doctor Who, generated by the visual aesthetic, costuming, editing, and the enigmatic and manic main character, Sherlock/The Doctor, and his mirror self, Moriarty/The Master. This might seem unsurprising given that the same creative team is responsible for both programs: the writers Steven Moffat and Mark Gatiss devised the idea for Sherlock on the train to Wales to work on Doctor Who (which is produced in Cardiff). However, that fact does not explain the resulting program and its success. A generic analysis of the two series is suggestive, showing that both science fiction (Doctor Who) and detective fiction (Sherlock) have their story-telling roots in Gothic fiction, which was preoccupied with questions about being, the nature of the real, the nature of good and evil, and the dual (good/evil) nature of humanity. In science fiction those concerns are directed to an exploration of our relationship with new technologies, as discussed above.

The Doctor and The Master


Detective fiction focuses on the nature of knowing, personified in the detective, beginning with Edgar Allan Poe’s brilliant investigator, C. Auguste Dupin in stories the author described as “tales of ratiocination” (2010). Dupin employs a version of the scientific method (involving observation and analysis) leavened with imagination, which enables him to look beyond the obvious. Conan Doyle’s Sherlock Holmes is even more scientific in his practice, but with the same disdain for conventional ways of thinking. This deployment of scientific method in order to solve social (rather than scientific) problems focuses attention on the process of knowledge formation (how we know and understand our world and each other) and its role in our understanding of morality (whether good and evil are easily identified) and of being (whether human beings are simply good or evil). The contemporary BBC Sherlock continues this tradition of the scientific detective informed by an eccentric imagination that enables him to step outside conventionalised patterns of thought and assumption.

Intertextually, Doctor Who and Sherlock share the Gothic preoccupation with interrogating the nature of being and of knowledge, which is evident in some shared generic conventions and preoccupations, though each also has other specific interests—technology (science fiction), the social construction of good and evil (detective fiction). The value of intertextuality is that it enables us to see how these texts are constituted by the kinds of meanings they are making. It allows us to understand why two genres that we now consider quite different can have shared ontological and epistemological preoccupations, because of a common generic ancestor.

Like intertextuality, intermediality is about textual practice. We saw above that the interdisciplinarity that was generated by the postmodern recognition of diversity and difference (and hence the rejection of certainty, grand narratives and canonical textuality) enabled the production of a Sherlock that is not a period drama but a contemporary construct, telling stories of today’s world. At the same time, as the brief intertextual study of genre shows, it also deploys a conventional detective with an eccentric mix of scientific method and artistic creativity whose ‘ratiocination’ at times leads him to find villainy not in evil individuals, but in the government and its representatives. Intermediality is useful for mapping that kind of practice, where a narrative devised in one medium is transposed into another where it deploys meanings enabled by its original production, but also produces new and different meanings that are generated via this transposition.

3. Spirituality and stained glass

The stained glass windows in Christian churches deploy a similar practice, taking stories from one medium (the Biblical word of God) and realising them in another medium (coloured glass). Interestingly the windows feature a complex iconography that would appeal to the modern gamer, with icons emblematic of values and ideas that cluster around the central theme and its story arc but open up depths of spiritual meaning. One reading of these windows is that they told these stories for illiterate peasants who had no access to written versions of biblical tales. Roger Homan notes: “The great transept window at Canterbury known as the Biblia Pauperium (poor person’s bible), for example, depends upon an extensive visual vocabulary of symbols and an awareness of the supposed theological links between the biblical scenes featured in adjacent panels” (2005). In this way the windows acted as a point of meditation for the viewer, recalling the story and its religious significance. Homan notes also that many scholars believe that preachers used the windows as a reference point in sermons, especially those delivered in the vernacular of the uneducated. They could literally point to the visual representation of the story and explain their exegesis, so that later viewings of the window would recall not only the details of the story but its religious significance.

In his study, Religious Art in France XIII Century (1913) Émile Mâle begins by noting:

To the Middle Ages art was didactic. All that it was necessary that men should know—the history of the world from the creation, the dogmas of religion, the examples of the saints, the hierarchy of the virtues, the range of the sciences, arts and crafts—all these were taught them by the windows of the church or by the statues in the porch. (vii)

Mâle goes on to explain that this art is not easily decipherable to the modern viewer who may mistake elements of the works as purely figurative, bringing a momentary pleasure to the eye. By contrast: “In mediæval art every form clothes a thought; one could say that thought works within the material and animates it” (viii). Roger Homan adds to this an appreciation of the role of the material used in the art-work:

But there are properties of coloured glass that are of deeply spiritual significance and have been recognized by, for example, Pseudo-Dionysius in the first century and Bishop Grosseteste in the thirteenth. We view not an image but the light beyond which it mediates for us. The image owes its life to that ultimate light. This sense is much keener than it is in respect of the reflection of light upon opaque surfaces. The stained glass image is therefore like an ikon: we are not to look at it but through it. (2005)

If we regard the stained glass window as an intermediated presentation of religious and spiritual concepts and stories, then Homan’s analysis leads us directly to the point of intermediation—the light generated by the glass, which is as critical to the meanings of the windows as the images and icons created. Homan speaks of the role of the stained glass as being “to sedate light”: “A stained glass window slows us down; it inclines us to proceed reverently and lower our voices” (2005). The sensory effect of the coloured light produced by the windows is to remove viewers from the everyday world, locating them in an otherworldly space in which to contemplate religious mysteries and spiritual truths. This is surely the essence of the intermedial experience, not a translation from one art form to another, but a transformation of being and knowing generated by the (sensory) engagement of the viewer. Again note that although intermediality does address the effect on viewers of a particular form of text, its focus is on textual practice rather than audience interaction. Which is to say, the concept of intermediality tends to address primarily the ways in which the text positions the viewer, rather than the multiple active engagements of viewers.

4. Boba Fett, children’s television and transmediality

The term that seems to best capture the active engagement of audiences or consumers of contemporary texts is transmediality. Henry Jenkins popularised this term in his influential study, Convergence Culture: Where Old and New Media Collide, first published in 2006. Writing about the Matrix phenomenon that had recently developed through the Wachowskis’ interrelated films, games and online comics, Jenkins identifies the work as transmedia storytelling as follows:

A transmedia story unfolds across multiple media platforms, with each new text making a distinct and valuable contribution to the whole. In the ideal form of transmedia storytelling, each medium does what it does best—so that a story might be introduced in a film, expanded through television, novels, and comics; its world might be explored through game play or experienced as an amusement park attraction. Each franchise entry needs to be self-contained so you don’t need to have seen the film to enjoy the game, and vice-versa. (loc. 1974)

This directly confronts older canonical notions of the text as a bounded entity, with the roles of the reader, viewer or listener being to unlock the meaning of that text. Instead it acknowledges the active role of the consumer (who moves between these different media) in creating story and generating meaning that is implicit in the notion of intertextuality. However, this is a different consumer from the medieval worshipper, and the key to that difference is the accessibility of a range of media.

Some thirty years ago, as a creative consultant to a network television producer of children’s programming, my job was to construct the world of a particular television program. Like Lucas’s enormously influential Star Wars series it was set in a different space—a set of planets orbiting a small star, each with their own names and characteristics. I no longer remember the details of the exercise but the project report was about forty pages long, and detailed everything a child might want to know about living on that planet. The aim of the exercise was to create a world that all the separate sequences of the program—games, stories, cartoons, write-in quizzes, the club—could refer back to, so that the show maintained a basic coherence. We wanted our viewers to feel at home in that universe, to feel a sense of engagement and belonging.

Lucasfilm led the way with this kind of world-formation by marketing a series of products that not only capitalised on viewers’ responses to the films, but also provided them with the tools to repeat and enhance that experience imaginatively. And, as Jenkins noted in Convergence Culture, Lucas did not simply endlessly repeat the story of the movie: “When Star Wars went to games, those games didn’t just enact film events; they showed what life would be like for a Jedi trainee or bounty hunter” (2006, loc. 2172). Later in the same chapter Jenkins notes that Lucas found that the value of developing toys based on secondary characters was that they might take on a life of their own: “Boba Fett eventually became the protagonist of his own novels and games and played a much larger role in the later films” (loc. 2273).

Again we might argue that this has happened before, with stories based on earlier texts that expand their imaginary world, including some based on Conan Doyle’s Sherlock Holmes stories: for example, Nicholas Meyer’s novel The Seven-Per-Cent Solution ([1974] 1993) presents a back-story to Holmes’ addiction to cocaine (the novel was made into a film of the same name in 1976). What is new, however, is both the number of different media to which consumers have access and the degree to which they can engage with those media. Jenkins quotes Janet Murray’s assessment of the “‘encyclopedic capacity’ of digital media, which she thinks will lead to new narrative forms as audiences seek information beyond the limits of the individual story” (2006, loc. 2283). Jenkins goes on to argue that, unlike some critics, he does not see this as leading to the death of narrative: “Rather, we are seeing the emergence of new story structures, which create complexity by expanding the range of narrative possibility rather than pursuing a single path with a beginning, middle, and end” (loc. 2323). Of course, it is crucial to know who is developing these new stories and how they relate to the original text.

If we use the example of the Matrix franchise, the whole massive narrative edifice stayed effectively in the control of the Wachowskis. For some viewers it was too complex to try to follow its development and they found the films increasingly difficult to understand, whilst the more dedicated fans were unhappy with the Wachowskis’ attempts to explain every aspect of their narrative, as Jenkins documents (2006, loc. 2436-2446). A fine line exists between the authorial control required to maintain the integrity of the narrative and the dictation of detail that closes down the engagement of the audience. Andrea Phillips discusses this in her practical introduction, A Creator’s Guide to Transmedia Storytelling (2012). She argues “the most effective tool is to actually create a small piece of your world and give it to your audience to play with” (41).

Phillips’ description of transmediality is subtly different from that of Jenkins, perhaps because of their different roles (Jenkins as critic and theorist, Phillips as maker). In her role as storyteller Phillips is concerned not to shut out the audience, so describes her world-building in a way that prioritises audience engagement. In Chapter 8, “Writing for Transmedia Is Different” Phillips notes that “we’ll be concentrating mainly on the requirements of telling a single, highly fragmented story across multiple platforms, and most particularly across digital platforms—you might call it social media storytelling as much as transmedia. That’s because this is where the methods of traditional single-platform or flat narratives become inadequate” (74-75). She goes on to explain this distinction in terms of the strategies used to enable the world of the narrative to be expanded by the audience: “Transmedia storytelling is an exercise in open-ended storytelling, boundless where a traditional single-medium story is finite” (75). Phillips explains that the storyteller should suggest to the audience that the world of the narrative includes more stories than the one that they have been given (75).

As noted earlier, one of the great successes of Star Wars is that its narrative is not confined to a specific set of incidents; rather, the narrative contains the seeds of many other stories, featuring characters such as Boba Fett whose role in the core narrative is relatively minor but has the potential for new storytelling and world-building. By contrast, die-hard Matrix fans were disappointed when the Wachowskis attempted to lock down the meanings of the trilogy to a specific story by resolving the mystery, leaving little scope for imaginative retellings by fans. Instead Phillips notes the value of deliberately leaving loose ends that might become the source of new stories, which directly contradicts conventional advice given to writers. She also notes, though, that these narrative possibilities have to be executed judiciously so that you do not “accidentally create narrative expectations that never achieve any kind of payoff” (76). Hence her earlier point about the importance of a clear story arc: “It is especially important in transmedia to have a plot that goes from beginning to end before you launch” (57). Another strategy to enhance narrative openness is “to create story elements in one medium that have their payoffs in another medium” (78), such as a game based on a film. All of this has to be achieved in relation to the basic premise with which she opens the study: “every single element of a transmedia story has to be fulfilling a narrative purpose, without exception” (40-41). And as she notes, the aim of transmedia storytelling, and of the marketers who use it, is engagement: “Transmedia storytelling can provide more engagement and more potential points of sale for any given story, and when it’s done well, each piece can effectively become a promotional tool pointing toward every other piece of the whole” (39). Every strategy used by the storyteller, therefore, should be about giving the audience “things to do, not just things to consume” (117).

Phillips’ Guide addresses textual practice directly in relation to audience or consumer engagement, though Phillips also stresses the need for a critical understanding of textuality (63). This engagement is both the reason for transmedia production (to sell products, to tell a story) and the result of audience access to multiple media. As Phillips reiterates in her book, this engagement, and the textual openness that enables it, makes transmedia storytelling different from earlier forms of media narratives and audience-media relationships.

The Matrix (1999)


5. The joy of discovery and the fossilised dolphin

I return here to Jenkins’ crucial insight in Convergence Culture, that this different form of storytelling, described so well by Phillips, and common to the popular culture that preoccupies most children, signifies a new way of being and knowing:

Our workplaces have become more collaborative; our political process has become more decentered; we are living more and more within knowledge cultures based on collective intelligence. Our schools are not teaching what it means to live and work in such knowledge communities but popular culture may be doing so. (2006, loc. 2477)

For Jenkins this makes literacy training for children essential so that they can “develop the skills needed to become full participants in their culture” (loc. 5295), as Phillips argued when she stressed the need to be critical. The joy of transmedia engagement is that of discovery, of finding a way to contribute to the meanings of a text through your own creativity so that your stories are woven into that ever-expanding composite text. As Jenkins notes, however, this is more than a solitary venture. It is about being able to collaborate with others and to contribute to a collective venture without feeling a loss of individual achievement.

Digital technology has enabled this kind of sharing on an extraordinary scale—whether through kids playing games online with others across the globe, researchers collaborating on a project across cities, countries or continents or fans world-wide expanding a beloved narrative. It is also evident in the ways that older media such as radio and television use online resources to expand their research, engage their audiences, and incorporate audience responses and knowledge into their broadcast formats. Museums and libraries too are sharing resources and inviting visitors to become part of the knowledge-production for the institution, for example by checking the digitisation of older manuscripts and newspapers for verisimilitude. On the one hand, this reflects economic necessity and the poor resourcing of many public institutions. On the other hand, it creates a wholly different, expanded knowledge base for the library and an enhanced level of engagement for visitors. Effectively, this visitor/user involvement changes the nature of the library from that of a central authority giving access to knowledge to a collaborative, creative, knowledge-building project. In December 2013 the British Library released an archive of over 1,000,000 images onto Flickr Commons for free use and reproduction. Dan Colman reported in Open Culture (2013):

The librarians behind the project freely admit that they don’t exactly have a great handle on the images in the collection. They know what books the images come from. (For example, the image above comes from Historia de las Indias de Nueva-España y islas de Tierra Firme, 1867.) But they don’t know much about the particulars of each visual. And so they’re turning to crowdsourcing for answers. In fairly short order, the Library plans to release tools that will let willing participants gather information and deepen our understanding of everything in the Flickr Commons collection.

Many other libraries and art galleries around the world have released part of their archives to open access and at the same time invite visitors to join them in becoming producers of knowledge.

Recently the Smithsonian Museum in Washington D.C. announced Smithsonian X 3D, a web portal that enables visitors to use the museum’s 3D scans of artefacts to build their own models using 3D printers. Günter Waibel, Director of the Digitization Program Office, explains:

These projects indicate that this new technology has the potential not only to support the Smithsonian mission, but to transform museum core functions. Researchers working in the field may not come back with specimens, but with 3D data documenting a site or a find. Curators and educators can use 3D data as the scaffolding to tell stories or send students on a quest of discovery. Conservators can benchmark today’s condition state of a collection item against a past state—a deviation analysis of 3D data will tell them exactly what changes have occurred. All of these uses cases are accessible through the Beta Smithsonian X 3D Explorer, as well as videos documenting the project. For many of the 3D models, raw data can be downloaded to support further inquiry and 3D printing.

And he concludes:

With only 1% of collections on display in Smithsonian museum galleries, digitization affords the opportunity to bring the remaining 99% of the collection into the virtual light. All of these digital assets become the infrastructure which will allow not just the Smithsonian, but the world at large to tell new stories about the familiar, as well as the unfamiliar, treasures in these collections.

This venture confirms many of Jenkins’ earlier predictions about how digital technologies will change our ways of producing knowledge. One of the artefacts currently available is the fossilised skull of an unknown species of dolphin, found in rocks that are 6-7 million years old. The Smithsonian X 3D website now supplies the software and instructions to print your own 3D copy of the skull. Even though this will not be the original skull, the value of a tactile engagement with the reproduction should not be underestimated. As a number of recent studies have argued (see Classen 2005, 2012; Howes 2005; Chatterjee 2008; Candlin 2010; Cranny-Francis 2013) tactile contact, indeed all kinds of sensory engagement, generate bodily responses that in turn produce new ways of knowing and understanding an object and our relationship to it. By sharing these knowledges, we learn more about not only the objects, but also ourselves.

6. Conclusion

The terms intertextuality, intermediality and transmediality map the development of new communication technologies through the twentieth and into the twenty-first century. They all effectively interrogate older canonical notions of textuality and of reading, as closed practices controlled by the author. Intertextuality was used to argue that texts have never been closed but part of an infinite conversation to which all texts contribute, and that each textual reading adds another voice to the conversation. Intermediality reflected the beginnings of popular access to multiple media, enabling users to explore the ways in which a particular narrative or text may be transposed from one medium to another, expanding or enhancing the original story or idea. Transmediality is an articulation of convergence culture, whereby audiences are able easily to traverse and correlate a range of media in order to explore a complex and growing narrative or argument. The difference between intermediality and transmediality is not simply quantitative, however; it reflects a new way of understanding our relationship to texts, knowledge, and each other. It reflects, as Jenkins notes, the development of a collective knowledge culture in which collaboration is a key component of thinking and being. Further, the materials and practices that new technologies are making available, which incorporate bodily knowledges into this collaborative production of knowledge, presage new kinds of understanding and self-knowledge. As both Jenkins and Phillips argue above, the element required to leaven this heady mix is critical awareness—of the texts we produce and the meanings we make.



