“You gave me no choice”: A queer reading of Mordred’s journey to villainy and struggle for identity in BBC’s Merlin – Joseph Brennan

Abstract: This essay performs a queer reading of the Mordred character—that great archetype of the treacherous villain—from BBC’s Merlin (2008–2012) so as to examine his role in a series that garnered a devoted following among ‘slash fans,’ who homoeroticise male pairings. By charting the various catalysts that set this villain on his path, we are privy to insights into the representations and (queer) metaphors of this popular British series and what these elements have to tell us about this reimagined legendary villain. The reading is supported by analysis of slash fanart (known as ‘slash manips’), which anchors my interpretation and opens onto typologies that help examine the construction and journey of Mordred as the archetypal villain, his multiple identities of knight and magician, and the queer associations of his struggle for self. In doing so, the essay offers insight into the reimagining of an iconic villain and the queer metaphors his journey illuminates.

Introduction

The Arthurian legend’s Mordred, like Bram Stoker’s (1897) Count Dracula or Arthur Conan Doyle’s (1894) Professor Moriarty, is one of literature’s most iconic villains; his portrayal in the legend’s best-known rendition, Thomas Malory’s (1485) Le Morte d’Arthur, for example, is as a Judas figure. (For those unfamiliar with the legends of King Arthur, Aronstein 2012 is an accessible introduction.) The Mordred character’s morphological qualities as the archetypal villain (see Propp 1968), combined with his weight in Arthurian literature, meant his appearance and relationship with Arthur—that great hero of Western literature and folklore, fated to die at Mordred’s hand (see Sutton 2003)—were highly anticipated from the start of the BBC’s recent television adaptation of the legend, Merlin (2008–2012). Mordred was also a major source of tension for the titular character in the series, ‘Merlin the Magician,’ who in this adaptation keeps his magical identity as the most powerful wizard in all of Albion (Britain) secret from ‘Arthur the King’ until Arthur’s death at the hand of the also-magical (and also-knighted) Mordred in the climactic Battle of Camlann, which ended the program’s five-year run. This essay performs a queer reading of the Mordred character so as to examine his role in a series that has garnered a devoted following among slash fans, who create artistic works that actualise latent homoeroticism in popular texts. This reading is bolstered by analysis of select ‘slash manips’ featuring the character. A form of visual slash, these images help to anchor this author’s reading by connecting it with fans’ own queer interpretations of Mordred and his interactions with other men, Merlin and Arthur specifically. By charting the various catalysts that set this villain on his path, we are privy to insights into the representations and (queer) metaphors of this popular British series, and what these elements have to tell us about this reimagined legendary villain. Further, such a reading allows us to hypothesise about how Mordred’s villainy could have been avoided altogether if only his dual identities of Magician and Warrior had been accepted by his mentor, Merlin, and his master, Arthur.

Merlin (2008–2012)

Spanning five years and 65 episodes, Merlin chronicles the namesake’s acceptance and fulfilment of his destiny to assist Arthur in becoming the king of legend. Advising him along the way are his guardian Gaius (Richard Wilson) and the dragon Kilgharrah (voiced by John Hurt), while King Uther (Anthony Head), and later Morgana (Katie McGrath) and Mordred (Alexander Vlahos), are his main hindrances. The series differs from most interpretations of the King Arthur legend by making Merlin and Arthur (portrayed by Colin Morgan and Bradley James, respectively) contemporaries (Sherman 2015, 93) in a world where magic is outlawed. The resultant need for secrecy on Merlin’s part became a central narrative drive throughout the series, with the character only revealing his true self to Arthur in the final episode—an eventuality anticipated from the pilot. For many fans, Merlin’s ‘magic reveal’ in the final episode invites comparison with coming out as homosexual, for it is only after revealing his true self to Arthur that the pair’s love for each other may be acknowledged. Queer viewers can easily identify with characters such as Mordred and Merlin, who keep their identities secret for fear of an unaccepting society, forming a “wishful identification” (see Hoffner and Buchanan 2005) with such characters’ struggle for acceptance and identity in a universe hostile to ‘their kind.’

The finale saw the death of King Arthur in the arms of his manservant, Merlin, an event foreshadowed from the first episode of the final season.[1] Arthur is slain by his former knight and surrogate son, Mordred, who feels betrayed by both Arthur and Merlin, two men who represent two sides of himself—Warrior and Magician—that he failed to reconcile. This essay’s queer reading of the Mordred character is from the position of an aca–fan (an academic and fan, see Brennan 2014b). It is written with the belief—put forth by Henry Jenkins in his seminal text on television fan cultures, Textual Poachers—that “speaking as a fan is a defensible position within the debates surrounding mass culture.” (1992, 23) To this end, I use fan readings of the series and analyse select photomontaged fan works (known as ‘slash manips’), including some from my own practice, to support my reading and delve into typologies that help examine the construction and journey of Mordred as the archetypal villain, as well as his multiple identities of knight and magician, and queer associations of his struggle for self.

Medieval (Homo)Eroticism, Queer Readings, and Slash Manips

Scholarship on the series, in the form of chapters in edited collections (see Elmes 2015; Meredith 2015) and journal articles (see Foster and Sherman 2015 for a special issue on the subject in Arthuriana), has begun to explore its significance. In particular, scholars have examined its representations and the value of its unique version of a legend that is broadly familiar to most viewers (Britons particularly). Such familiarity, as Jon Sherman points out, makes up much of Merlin’s appeal (2015, 97). Among this scholarship is my own article (see Brennan 2015), which performs a queer reading of the Lancelot character (the great Romantic archetype) as he appears in this BBC series and the works of Thomas Malory, T.H. White, and Marion Zimmer Bradley. In that article, I situate the popular series in the long heritage of Arthurian adaptation. The article also includes an examination of a tradition of using queer theory to analyse Arthurian texts (see Brennan 2015, 21–22). In particular, I explore the proposition by certain medievalists (see Burger and Kruger 2001; Zeikowitz 2003) that a ‘queer approach’ (see Halperin 1995) to texts of or set in the Middle Ages can be useful in making “intelligible expressions of same-sex desire.” (Brennan 2015, 21) The applicability of queer readings to this series is perhaps illustrated best by the fan followings it has inspired, which contribute to its status as a ‘cult text.’ (See Hills 2004 and his definition of cult television as a complex interaction among television texts, discourses about them, and the fan practices these texts inspire; also see Machat 2012, who examines Merlin fanfic trailers to explore how fans of the series remix the canon relationship of its male protagonists.)

Of particular relevance to a queer approach to television series such as Merlin are the products of ‘slash’ fans and their exploration of homoeroticism in popular texts, which often lack representations of homosexuals (see Russ 1985; Bacon-Smith 1992; Jenkins 1992). Slash derives its name from the convention of using a forward slash (/) to designate sexual male pairings, such as ‘Arthur/Mordred’ (see Jones 2002, 80). Slash fans produce texts in the form of fiction, video, and art to depict their (often subversive) homoerotic readings. The attraction of Merlin for many slash fans can be read as a result of Merlin and Mordred’s secret identities as sorcerers in a world where the practice of sorcery is punishable by death. For many fans, magic here is a metaphor.[2] And when magic is read as a metaphor for homosexuality, as David M. Halperin reminds us, the term ‘queer’ becomes available: to “anyone who is or feels marginalized because of her or his sexual practices.” (1995, 62) I have examined the Merlin/Arthur pairing previously (see Brennan 2013) in an article that also introduces a form of slash that had at that time yet to receive scholarly attention, namely ‘slash manips.’ (See Brennan 2014a for more on the significance of slash manips with respect to how slash practice has been defined.)

Slash manips remix images from the source material (such as high-resolution screenshots or promotional images from Merlin) with images from scenes selected from gay pornography. Most commonly, these works come in the form of two characters’ heads (often with expressions of exertion) digitally superimposed onto gay porn bodies (that generally match the physicality of the characters in question). It is a process I describe as the ‘semiotic significance of selection’ (see Brennan 2013). The present article includes analysis of select slash manips involving the Mordred character, all of which are reproduced here with the permission of the respective artists. The inclusion of these works is useful in the context of a queer reading of Mordred because the visual impact of these digital manipulations, in addition to complementing discussion of the symbolism of certain scenes, is itself distinctly ‘queer.’ Such imagery is in and of itself an embodiment of the “project of contestation” that is queering, in addition to helping disrupt “our assumptions about medieval culture and textual practices.” (Lochrie 1997, 180)

Reading Character: Mordred-as-Villain

In his seminal syntagmatic structural analysis of folklore, Morphology of the Folktale, Vladimir Propp (1968) develops a typology that identifies seven character types in folktales, each with a role to play in forwarding the narrative, namely: Villain, Donor, Helper, Princess, False Hero, Dispatcher, and Hero. By focalising the story through Merlin, two central heroes emerge in this retelling: Merlin and Arthur. (Ordinarily Merlin would be the ‘helper’ character type, the hero’s guide who prepares Arthur and provides him with magical assistance.) As my close reading will demonstrate, with Merlin-as-hero Mordred is consigned to the villain type, as he is never viewed by this character with anything other than suspicion of villainy; from the perspective of Arthur-as-hero, conversely, Mordred is a false hero, a character once viewed as good who becomes evil, much like the series’ other false hero, Morgana (known to legend as Morgan le Fay), to whom Mordred turns in this version of the legend after being betrayed by the heroes of the story. This essay explores how the heroes’ own categorising of Mordred’s character ensures his path as villain, as confirmed by Mordred’s final words to Arthur: “You gave me no choice.” (V.13 [abbreviated season and episode number]) This reading is similar in spirit to Mary Stewart’s 1983 novel The Wicked Day, which retells the legend from Mordred’s perspective, portraying him sympathetically as a victim of circumstance and confirming that we are all the heroes of our own story.

Mordred as he appears in Merlin is fascinating not only because he is a villain of the series—and villains are often fascinating in queer readings—but further because he bridges the central characters of Merlin and Arthur, or ‘Merlin/Arthur,’ who are described in the series as “two sides of the same coin” (Kilgharrah, V.3). In a queer reading, Merlin (manservant)/Arthur (master) as two sides of the same coin create a binary chain of tails/heads, bottom/top, passive/active, sorcery/non-sorcery, intuition/rationality, magic/strength, feminine/masculine, homosexual/homosocial. Mordred, as both sorcerer and knight, straddles these positions in Merlin, moving freely between them, which is in part why the titular character—with his intention to “Keep the magic secret” (a series tagline)—can only ever see Mordred as a threat. Conversely, to Mordred, Merlin represents someone with magic like himself, someone who can help him negotiate his dual identity of knight/sorcerer. As this essay’s close reading of select episodes will reveal, by not trusting him, what Merlin ultimately denies Mordred (the freedom to be himself) is also what he ultimately denies himself.

Reading Character: Mordred and the Magician/Warrior Archetype

William P. McFarland and Timothy R. McMahon (1999) employ the four masculine archetypes of King, Lover, Magician, and Warrior (see Moore 1991; Moore and Gillette 1990, 1992) to outline the respective benefits of each to homosexual identity development. The King archetype displays “qualities of order, of reasonable and rational patterning, of integration and integrity” (Moore and Gillette 1990, 62); the Lover is “deeply sensual, sensually aware, and sensitive to the physical world in all its splendor” (ibid., 121); the Magician bears the characteristics of “thoughtfulness, reflection, and introversion,” exhibiting “the ability to connect with inner truths” (McFarland and McMahon 1999, 51); and the Warrior incites others to “take the offensive and to move out of a defensive or holding position about life’s tasks and problems” (Moore and Gillette 1990, 79).

These archetypes are useful in introducing the characters of Mordred, Merlin, and Arthur, each of whom, in addition to being a literal personification of these archetypes, displays a combination of the corresponding traits in their representation: Mordred (as Lover, as Magician, as Warrior), Merlin (as Lover, as Magician), and Arthur (as King, as Warrior). They are also useful in plotting the binary of Arthur/Merlin, primarily King/Magician, and the manner in which Mordred belongs to both men, while ultimately struggling and eventually failing to exist in the grey area between the well-defined and policed binaries the men embody. For while being Magician and Lover affords Merlin (as Helper) attributes that Arthur both needs and does not possess himself (as King and Warrior; hence the earlier ‘coin’ metaphor), these are identities that Merlin conceals, that bring shame within the context of the series, for they also bear feminine (Lover) and queer (Magician) connotations; and thus Merlin is treated as such in the series, excluded from Arthur’s homosocial circle of knights, and ridiculed for his sensitivity, his lack of masculine worth—“Pathetic. You’re pretending to be a battle-hardened warrior, not a daffodil.” (Arthur to Merlin, I.2). As King, Arthur “stabilizes chaotic emotion and out-of-control behaviors” (Moore and Gillette 1990, 62); he controls the unruly feminine, which is how sorcery is defined (and portrayed through Morgana), and why it needed to be outlawed by the ultimate Father and King, Uther.

In this essay I examine the otherness of Mordred and how his polar personas of Warrior/Magician, knight/sorcerer, hero/villain toy with Merlin and his efforts to maintain separation between such identities. In particular, I consider the Druid boy’s appearances over the final season of Merlin and his transition to Arthur’s favourite knight, as well as the fluidity and openness with which he occupies positions of otherness, as is supported by slash manips featuring the character. The essay also explores how Mordred subverts the homosocial order of Camelot in a way Merlin never could, eroticising the sacred bonds between Arthur and his men.

Arthur/Mordred: The Erotic Bonds of Heroes and Villains

Figure 1. The Arthur/Merlin/Mordred homosocial triangle (V.1).

Male heroes and villains of legend and myth share obsessive bonds and a covert homoeroticism (Battis 2006). The villain becomes obsessed with the hero’s body, “with finding his weakness, with penetrating or shattering or inflicting violence upon him” (ibid.). In his obsession, the villain becomes a “failed version” of the hero, needing to eradicate the hero to validate his own perverse ethical agenda, not just interested in ruling the world, but in “ruling the hero’s body” as well (ibid.). Writing here on the comic book tradition and the queer potential of the central antagonism of Clark Kent/Lex Luthor as they appear in the television series Smallville (2001–2011), Jes Battis’s description is also suited to the rivalry of Arthur/Morgana.[3] As villain and woman, Morgana seeks to disrupt and possess all that Arthur is—chivalric order, his reign, and his legacy—so as to impose her own worldview on the realm. “I want his annihilation, Mordred,” she tells him in V.2. “I want to put his head on a spike and I want to watch as the crows feast on his eyes.” While not homoerotic, there is a taboo eroticism inherent in Arthur/Morgana due to their blood relation, and the romantic references to the pairing in season one—such as in I.5, when Guinevere confides in Merlin that she hopes one day Arthur and Morgana will marry. Mordred, who responds to Morgana’s bloodlust by urging her to “calm yourself” (V.2), is different. He is, in the end, fate’s and Morgana’s pawn—especially when compared with other adaptations in which the character appears, T.H. White’s The Once and Future King in particular (see Thomas 1982). That Mordred’s villainy is an extension of Morgana’s perverse agenda is an idea put forward by Erin Chandler, who argues that at times (such as in season three):

the series focuses on Morgana playing what is essentially the legendary Mordred role, turning against her father, Uther, and everyone dear to him for his past actions and his refusal to acknowledge his errors. (2015, 109–10)

After all, while in Merlin Mordred may wield the sword that delivers the fatal blow, Morgana is the one who makes it unbeatable by forging it in dragon’s breath (Edwards 2015, 81).

Mordred’s portrayal as pawn explains why interest in the character from the perspective of slash fans seems to be less about his antagonism with Arthur—though there is certainly homoeroticism in that regard—and more about the love and devotion that turns sour and leads to mutual betrayal. Mordred defies Morgana at the start of season five, in fact wounding her in favour of Arthur’s vision of a nobler way, and makes the transition from Druid nomad to Arthur’s favourite knight in the space of a few episodes. As a man of magic, who also wishes to prove to Merlin his devotion to Arthur, the character self-sacrifices for the greater good until Arthur asks of him a sacrifice that is too much: to allow the woman he loves to be executed. To have done so, to have let the girl die, would be to betray himself (the Lover). In the end Mordred is as betrayed by Arthur and Merlin (his mentor, his ‘helper,’ if you like) as he himself betrays. Until their mutual destruction he still desires Arthur, smiling when Arthur returns a mortal wound, welcoming the opportunity to join Arthur in death.

Mordred enacts a kind of homosocial, or ‘erotic’ (to appropriate Eve Kosofsky Sedgwick’s use of the term, see 1985) triangle with Arthur and Merlin, challenging Merlin and his decision to maintain secrecy. He is also devoted to Arthur, trusting him completely, a trust that is in his eyes betrayed; although there is more to it than that, for Mordred has a part to play in Arthur’s fate. The triangle enacted by these men is visible from their first meeting as adults (V.1). In this scene (to be explored further in the next section), as Merlin recognises Mordred for the threat he is, an instant bond is formed between Arthur and his future knight (see Figure 1). Concerning the bond of Arthur and Mordred, there are traces of erotic connection between the men in the literature also. In Wilfred Campbell’s 1895 play Mordred: A Tragedy in Five Acts—in which the character is cast in the role of tragic anti-hero rather than villain—Mordred makes the point that Arthur’s affection for Launcelot “outweighs his affection for the queen, suggesting a possible homosexual subtext and therefore implicitly threatening Arthur with sexual blackmail.” (Yee 2014, 15) Benjamin Franklin Fisher IV takes this observation further when he suggests that Mordred’s suspicions in this play are not entirely unfounded; for, as Launcelot says, “I love thee, King, as doth no other man.” (1990, 171) The significance of such a suggestion of eroticism—whether valid or not—is that, as Pamela M. Yee argues: “the fact that Mordred introduces the possibility of inappropriate conduct between king and knight indicates that both he and Campbell are preoccupied with definitions of proper masculine behavior”. (2014, 16) In the second half of this essay, I will consider, via close readings of episodes and analysis of slash manips, the ease with which Mordred negotiates and simultaneously inhabits dual positions—knight/sorcerer, hero/villain, lover/destroyer—a quality that renders him an intriguing and highly ‘slashable’ figure throughout the final season of the series, and a character that has something important to say about the villain’s journey.

V.1: Arthur’s Bane Is Mordred’s Destiny

Figure 2. Mordred (V).

Mordred (portrayed by Asa Butterfield, I–II; Alexander Vlahos, V) is first introduced as a young Druid boy in three episodes over seasons one and two (I.8, II.3, and II.11). He is the first to call Merlin by his Druid name, ‘Emrys,’ and plays a crucial role in introducing Morgana to sorcery early in the series. He is saved initially when Arthur allows him to escape execution by Uther, an act of mercy that endears Arthur to the character and explains the bond they later share: Arthur does, in a way, give Mordred life. Kilgharrah the dragon prophesies that the young Druid will bring about Arthur’s demise and therefore that Merlin “must let the boy die.” However it is only at the end of this episode (I.8) that viewers learn this character is in fact the Mordred of legend. As Sherman points out, in Merlin the plot device of “introducing a figure or object from Arthurian legend while withholding his, her, or its name” (as with Mordred, Geoffrey of Monmouth, and Excalibur, for example) is a pattern that is repeated throughout the series (see Sherman 2015, 91 and 94). As a result, when the character returns in season two, Merlin attempts unsuccessfully to have him captured, knowing he will be killed if he is. These are actions Mordred vows never to forgive and never to forget. He does not return again until the final season (V.1). Recast as an adult (the 24-year-old Vlahos, see Figure 2), he becomes a central character until the series’ end twelve episodes later (V.13). There is significance to be found in this recasting. For the Mordred of season five, while an adult, remains somehow younger, more innocent, more easily corrupted than the other men who sit among Arthur’s ‘circle.’ He is also now at a suitable age to be ‘paired’ by slash fans with other adult males.

Mordred’s reintroduction comes while Merlin and Arthur are separated from the Knights of Camelot and being held as captives of slave traders. Mordred’s entrance is by way of intervention, preventing one of the men from killing Arthur: “Shouldn’t we leave it to the Lady Morgana to decide their fate?” Assisting Arthur up from the ground, their hands still clasped, Mordred says, “You don’t remember me do you? You saved my life once, many years ago.” The scene (see Figure 3) in which Arthur and Mordred first meet as adults is rich in visual symbolism. Mordred, with his black fur, clean appearance, and well-tailored-yet-exotic attire, stands apart from the filthy brutes of the party he travels with. His pallid complexion, blue eyes, blood-red lips, and black, curly hair make him an alluring presence, set against a woodlands backdrop of lush greenery. All this contrasts with Arthur’s golden hair and reflective armour: he sits stark in the shot. Mordred’s furs and associations with the Druids make him almost wolf-like, a lone wolf boy with bushy fur and piercing eyes. Combined with the appearance of the character in Merlin’s dreams throughout the final season, such imagery is phallic and homoerotic, as Sigmund Freud’s psychoanalytic reading of the ‘Wolf Man’ case suggests (see Freud 1955). The ‘Wolf Man’—as Freud’s patient has come to be known—is a case that appeared in From the History of an Infantile Neurosis. It details “the primal scene,” the witnessing by a child of a sexual act. In this case from the 1910s the patient, a Russian aristocrat, has an anal fixation: a predilection for heterosexual relations in which he penetrates his partner from behind, and an inability to move his bowels without an enema administered by a male attendant. The patient has a recurring dream of a tree full of white wolves, which Freud relates to a time when, at just one-and-a-half years of age, the patient was exposed to his parents having coitus a tergo (“from behind”), and thus a “repressed homosexual attitude” developed (Freud 1955, 64). As Lee Edelman writes, “the Wolf Man observed at first hand what being used from behind entailed.” (1991, 96) Edelman, in connecting the case with passages in texts that depict sodomy between men, argues that the Wolf Man case “carries more specifically the psychic inscription of the anal-erotic organization.” (98)[4] The erotic potential of Arthur and Mordred’s first adult meeting is explored in my 2013 slash manip, The Coming of Mordred (see Figure 4). The work employs binary symbolism of colour and physiology (gold/black, muscular/slight, hairless/hairy, light/dark) to represent the contrast in the Arthur/Mordred dynamic; while the connection of their bodies, their hands exploring each other’s naked flesh, foreshadows the (erotic) intimacy to follow. Like the base image onto which the characters have been placed, it is a work of foreplay.

Figure 3. Mordred and Arthur’s first meeting as adults (V.1).

There is an unkempt wildness to Mordred that recalls Morgana, a character who has undergone a transition from colourful and regal gowns (I–III) to black furs and unkempt sensuality (IV–V), from the warmth of her place as ward of Camelot to the icy climate of exile; a transformation from young and beautiful into the series’ main antagonist (Mediavilla 2015, 52), one that coincides with her embracing sorcery. Cindy Mediavilla argues that the televisual format “presents many opportunities for characters to evolve from one season to the next” (2015, 52), and that of all characters, “Morgana’s transformation is, by far, the most profound,” making her “one of the most complex and fascinating Arthurian characters depicted on television.” (ibid.) Further, summing up the connection between the journeys of Mordred and Morgana in the series, Elysse T. Meredith argues that in Merlin, “Mordred’s path is a rough reversal of Morgana’s.” (2015, 165) In many regards a resemblance in the evolution of these characters is fitting, especially given that in many retellings of the legend, Mordred is the unwanted son of Arthur and Morgana (Edwards 2015, 50). There is a quality of heightened sexuality signified by the appearances of the adult Mordred and season five’s Morgana, which ties the sorcerer with the sexual, and the taboo of magic with the taboo of unbridled sexuality, at odds with the chaste chivalric order of Arthurian knights.

In the first episode of season five, despite travelling with their captors, Mordred continues to protect Merlin and Arthur, even smuggling them food. And when the pair escape and Arthur is presented with the opportunity to kill Mordred, he refrains: “He showed us kindness.” When Mordred is reunited with Morgana, she is both delighted and surprised to see him alive. “Sorcery frightens people,” Mordred says, “even those who claim to support it.” He is of course speaking of Merlin, whose decision to keep his identity secret Mordred never fully reconciles himself to. “You see a lot,” Morgana replies. “I’ve learned to,” Mordred says. “I’ve had to. If I was not to be burned at the stake or exploited for another man’s gain.” We realise at this point that Mordred too has changed: he no longer associates with the Druids. He is an outcast, like Merlin, having to hide in plain sight to survive. We never learn why this is, the mystery of his background adding to the suspense of the character and his intentions. Morgana becomes hostile when Mordred informs her that they had Arthur in their grasp and that he escaped. She accuses Mordred of letting him go. Mordred is clearly taken aback by Morgana’s outburst and her detailing of how she wishes for Arthur’s head on a spike. Their reunion is cut short when the alarm is sounded: Arthur has come to free his men.

Figure 4. The Coming of Mordred, Merlin/Mordred slash manip. By chewableprose.

While Morgana is successful in capturing Arthur, she is stopped from killing him by Mordred, who decides in a moment of intensity to change sides. It would seem that Arthur’s willingness to risk his life—“Had to free my men.”—inspires Mordred to literally stab his own kind in the back with a dagger. In the following scene, a confused Merlin asks the Diamair—the key to all knowledge—“If Mordred is not Arthur’s bane then who is?”, to which the Diamair replies, “Himself.” This is Arthur’s betrayal of Mordred to which I earlier referred. Mordred does, by all appearances, change sides; however it is Arthur’s later decisions that ultimately lead Mordred to double-cross him, decisions ‘helped’ by Merlin. Mordred returns to Camelot and is knighted. In the scene following, Merlin offers to remove his cape, and queries Mordred’s defection:

MERLIN          You saved Arthur’s life, why?

MORDRED Because Arthur is right, the love that binds us is more important than the power we wield. Morgana had forgotten that.

Merlin disrobing Mordred is a titillating sight for slash fans. It connotes a changed dynamic for the former rivals. While Mordred was previously an outsider and Merlin had Arthur’s ear, now Mordred is granted access to Arthur’s inner circle. Merlin is now subservient to Sir Mordred, and must interact with him accordingly. Such is the symbolism attached to the removal of the ceremonial cape. Yet there is also subterfuge in the scene. Merlin veils a threat of exposure through the line, “if Arthur knew.” The threat is of course empty, as Mordred holds the same damning knowledge over Merlin; theirs is a stalemate. Merlin resists the shift of power, the subtext of this scene being his jealousy.

Mordred and Merlin are “not so different,” as Mordred identifies earlier in the episode. His rationale for turning on Morgana bears an uncanny resemblance to a scene from the previous season (IV.6), when a captured Merlin accuses Morgana of knowing nothing of loyalty, caring only for power. They also both keep their magical identities hidden from Arthur. This essay suggests that Merlin’s suspicion of Mordred is misplaced, and in fact helps ensure his eventual betrayal (as is argued below in regard to the events of V.5). Merlin is the focal character of the series, and his suspicion—however unwarranted—manifests itself in slash art that exploits the potential power, symbolic and supernatural, Mordred has to control Merlin. My 2013 slash manip Like A Beast is a case in point. In the work, I exploit the derogatory connotations of the ‘doggystyle’ position (of being fucked “from behind,” to refer back to the Wolf Man case) and signifieds of dispassionate, focused, in control (Mordred) versus shocked, overwhelmed, distant (Merlin) in my selection of facial expressions. Merlin’s expression in particular evokes all the passivity and phallus-accommodating potential of the toothless, gaping mouths of sideshow carnival clowns ready for ball play. Such imagery is also supported by Merlin’s performance in the series of a medieval fool.[5] The Camelot banner and digitally engorged scrotums, combined with the ‘movement’ of the sexual position—Mordred employing elements of the ‘leap frog’ doggystyle variant, ‘balls deep’ inside Merlin—help convey my intended subversion of Merlin, the power afforded to Sir Mordred, and the fallacy of his knighthood, which is built on a lie and a constant ‘threat-of-outing’ game with Merlin.

Other artists have also explored the new power differential between Merlin and Mordred, and further, the new affordances with Arthur that come as a result of Mordred’s knighthood. In an untitled 2014 work by wishfulcelebfak, who posts his works to LiveJournal, Mordred sits on Arthur’s cock (perhaps symbolic of a throne). In text accompanying the work, the artist situates the image:

Arthur (bradley james) helps druid Mordred (alexander vlahos) come out of his shell, by introducing him to “knights of the round table” aka sex buddy club.

Morgana can only offer Mordred some cheap magic tricks and a wooden dildo, but Arthur can offer him unlimited gay sex with all the hunks of the kingdom. Which side will Mordred choose? (wishfulcelebfak 2014)

Expressed in the above are the benefits that come with Mordred’s inclusion in the Knights of the Round Table, including certain ‘homosocial rituals,’ which wishfulcelebfak has (homo)sexualised. The work of Ruth Mazo Karras is useful here; her 2002 From Boys to Men, for example, examines formations of masculinity in late medieval Europe through a queer reading of the bonds that ignite among knights. The message of this manip is just how much Arthur has to offer.

Similarly, a 2012 work titled Breaking in a New Knights by endless_paths, also a LiveJournal artist, depicts Arthur entering Mordred ‘from behind.’ The accompanying text, “Who needs merlin when you have knights” (endless_paths 2012a), makes clear the role (once occupied by Merlin) that Mordred now fills; or in the context of the sexual act depicted, the willingness of Mordred to provide a ‘space’ for Arthur to fill. The artist implies that Mordred’s hole is more compatible with the cock of a king than that offered by his manservant. This implication is in much the same spirit as the erotic rituals that may have taken place between knights, such as bathing in front of each other to verify health and masculinity, as recounted in the 1300s by French knight Geoffroi de Charny in his Book of Chivalry (as noted by Zeikowitz 2003, 64–65; Zeikowitz also details intimate interactions between knights in Geoffrey Chaucer’s Troilus and Criseyde and Sir Gawain and the Green Knight, author unknown). Concerning the erotic rituals of Arthur and his knights, Mordred specifically, my 2014 slash manip It’s Good to be Bad describes just such a ritual:

Mordred knew it was wrong that, when the other knights were not looking and the Queen was away, he would get down on his knees in the grass in that private spot behind the castle and take Arthur’s manhood in his mouth, and keep it there until the King moaned, withdrew and showered him with his seed. Mordred knew it was bad to be so suggestive in front of the others in gesturing for his King to repeat the ritual more and more, but such dangerous displays were also what made it feel so good (chewableprose 2014)

In Arthur’s eyes Merlin and Mordred are entirely different (a theme explored in endless_paths’s manip): one is brave and noble and knightly; the other a friend and manservant, yes, but not possessing the qualities necessary to be a knight. Mordred is given recognition and a place at Arthur’s right side, which is everything prophesied, but not realised, about Merlin and Arthur’s relationship. In Kilgharrah’s words to Merlin: “The Druid boy, his fate, and Arthur’s are bound together like ivy around a tree.” (V.3) While the legend is clear about the significance of such a statement, in Merlin there is the implication that it is the character Merlin’s unwillingness to trust Mordred’s sincerity that in part ensures Arthur’s grim fate. That Merlin may have had a role to play in the death of Arthur is supported by Chandler, who argues that in Merlin, and indeed much of the literature on which it is based, there is no single contributing factor in Arthur’s downfall (2015, 110). As Gaius, Merlin’s most trusted friend, tells him: “People change, perhaps you should give [Mordred] the benefit of the doubt.” (V.2) Merlin never does.

Seeking a Father, Seeking a Son: Arthur and Mordred’s Search for Each Other

Etymologically Mordred is Latin and means “painful,” an apt descriptor for a character difficult to watch. From a slash perspective, he is painful because he had so much promise. That promise existed despite the character’s “weight of history,” a phrase used by series co-producer Julian Murphy to explain certain inevitable conclusions to the series (see Brennan 2015, 37; also see Sherman 2015, 83, who discusses audience expectations around Arthurian retellings). Because he is introduced as an adolescent alongside the ‘of age’ Merlin and Arthur early in the series, understanding Mordred’s portrayal relies on remembering that he is much younger than his contemporaries Merlin and Arthur—easy to forget given that Alexander Vlahos, the actor recast in the role, is within two years of Merlin actor Colin Morgan in age. In the legend the character is often Arthur’s illegitimate son (to Morgause in Malory and White, and to Morgana in Bradley’s 1982 The Mists of Avalon), which perhaps explains Arthur’s father-like devotion, and Morgana’s protectiveness, in this version of the story. Mordred wishes to please Arthur, and when that fails, repurposes this wish for Morgana. He gives up Merlin’s secret identity late in the final season (V.11) as a demonstration of his devotion to Morgana’s cause, committing himself to the destruction of his father-figure, and the Law-of-the-Father (see Lacan 1977, 67).

The Oedipal potential of the Arthur/Mordred/Morgana relationship is plain to see, and has been noted by scholars (see Worthington 2002) in their readings of other iterations of the Arthurian legend. In renouncing Arthur and turning to the ‘dark side’ (see Figure 5), Mordred also forgoes all knightly, chivalric artifice. He embraces the sorcerer, traitor, feminine side of the binaries he once moved between. Keeping in mind Mordred’s age and his search for guardianship, Arthur and Merlin emerge, before he shifts sides, as two potential surrogate fathers, the erotic potential of which is as pronounced in Merlin as it is in the incestuous unions that spawned Mordred in many other adaptations (most notably in Malory). Mordred’s search for a father is met with Arthur’s search for a son and heir, and is most evident in V.5. It is a search at odds with Merlin’s own quest to prove himself to Arthur, the tragedy of which rings true when we consider that Arthur dies before producing an heir.

Figure 5. Left–right: Mordred in service to Arthur; Mordred in service to Morgana (V).

In a scene from V.5 that follows a training session, Arthur makes clear to Merlin his intention to mentor Mordred, and speaks of him with an admiration and pride he does not show any of his other knights. Mordred’s prowess with a sword confirms how little we know of his life in the intervening years since we last saw him. Where did he learn to fight in a manner that would impress the king? Furthering the surrogate father metaphor, Mordred is half Merlin, half Arthur: he has both of their skills and the potential to become the best of both men.[6] Mordred reaches out to both men, and while Arthur reciprocates Mordred’s love, Merlin shuns it. This is despite the efforts of Gaius—Merlin’s own father-figure—to convince Merlin that Mordred will not necessarily betray Arthur:

The future has many paths, that is only one. […] Seeing’s not the same as knowing, and we must know before we act.

In this episode Merlin acts before he knows, seizing an opportunity to ensure Mordred dies, actions that in fact ensure Mordred’s survival and the continuation of the prophecy of ‘Arthur’s bane.’

V.5: “I Cannot Save the Life of a Man Destined to Kill Arthur”

Arthur displays his faith in Mordred by inviting him on a routine patrol of the woods surrounding Camelot. Merlin objects in an early scene that belabours his inability to afford Mordred the opportunity to prove himself, suggesting yet again that there could have been a very different outcome for all concerned if he had. The purpose of the patrol is to apprehend a rogue sorcerer, Osgar, who when confronted presents Arthur with a relic of the ‘Old Religion.’ Such relics and references to magic as an ‘Old Religion’ add to the mysticism of magic as it is represented in the series (via glowing eyes, potions, the collection of herbs for poultices, etc.). Naturally, given his unsuperstitious nature and traits of King and Warrior (Moore and Gillette 1990, 62, 79), Arthur is not too concerned. The sorcerer dies from wounds sustained in his confrontation with the patrol and is buried in secret by Merlin. Mordred notices:

MORDRED What would the king say? Sorcerers are not permitted marked graves. It’s all right, Merlin, I’d have done the same. He was one of us, after all.

MERLIN          It won’t always be like this. One day we’ll live in freedom again.

MORDRED You really believe that?

MERLIN          I do.

MORDRED Until then, we go unmarked in death as in life.

It is their first scene alone since Merlin disrobed Mordred following his knighting. And Mordred begins as Merlin had before, with a veiled threat of exposure. Before he died, the sorcerer Osgar had told Arthur there was still time to find his “true path.” This warning mirrors Gaius’s “many paths” comment to Merlin. Kilgharrah confirms this later in the episode when he tells Merlin: “The future is never clear, there are many paths, they do not all lead to Camelot’s ruin.” It follows, therefore, that not all paths lead to Mordred’s villainy. Within Merlin, Mordred is seeking someone in whom he can confide, someone with magic like himself who can help him negotiate his dual identity. This is what Merlin ultimately denies him, and himself. Merlin is so used to keeping his identities separate that he is unable to understand Mordred, a man who refuses to give up on others knowing that side of himself. That becomes clear in this scene as Mordred seeks assurance that he will not always have to hide who he is. In the end, it is Morgana who gives him this certainty of self. In the episode, Gaius convinces Arthur to investigate the relic, a journey that takes them to the White Mountains and the dwelling of the ‘Disir,’ representatives of the Old Religion (all women). When conflict inevitably follows, Mordred is gravely wounded while protecting Arthur. Mordred’s only hope for survival is Merlin’s magic, which Merlin will not use for fear of who Mordred will become. Gaius rightly notes that letting someone die based on a prophecy of what they may one day do is out of character for Merlin. Interestingly, this scene is similar to the scene between Arthur and Morgana in V.1 that convinced Mordred to change sides:

ARTHUR          What happened to you, Morgana? As a child, you were so kind, so compassionate.

MORGANA      I grew up.

Merlin remains committed to his decision to let Mordred die for the greater good, as the experience of ‘growing up’ has taught him. This is perhaps where Mordred’s youth, as a man yet to ‘grow up’ and thus in need of guidance and understanding, becomes significant. Believing it his only recourse, Arthur returns with Merlin to the Disir, prepared to lay down his life for Mordred’s. The Disir tell Arthur he must embrace magic, and he is given the night to decide. “My heart says do anything I can to save Mordred,” Arthur says to Merlin that night by campfire, a recurrent setting of intimacy and phallic symbolism (“tongues of flame” [Freud 1930, 37]) for the men. “But I have seen what misery unfettered sorcery brings. Before my father outlawed magic, Camelot was almost destroyed by sorcery. In my own time, Morgana has used it for nothing but evil. What would you do? In my place?” Arthur seriously considers the prospect that magic may not be as evil as his father thought, and even if it is, seems prepared to accept that threat in exchange for Mordred’s life. He asks Merlin for advice: “So what should we do? Accept magic? Or let Mordred die?” Merlin chooses the latter, and seals the fate of both men: “There can be no place for magic in Camelot.”

Arthur tells the Disir of his decision, returning with a heavy heart to Camelot. When he arrives he is delighted to discover that Mordred is alive and well; Mordred runs to embrace him. Merlin then realises, in a scene with Gaius, that by influencing Arthur not to allow magic to return to the realm, he had ensured Mordred’s path towards bringing about Arthur’s death:

MERLIN          How could I have been so stupid?

GAIUS             You did what you thought was best.

MERLIN          I assumed the best way to protect Arthur was to kill Mordred.

GAIUS             A perfectly natural assumption.

MERLIN          But all I did was make sure he lived. That was the Disir’s judgment. Mordred’s life is Arthur’s punishment for rejecting magic.

GAIUS             You mustn’t blame yourself.

MERLIN          But it is my fault. Mordred is alive and well. He’s free to play his part in Arthur’s death and there’s nothing I can do to prevent it. Nothing.

I am inclined to disagree with Merlin’s logic, as expressed in the above dialogue. Given the references in this episode to the many paths of fate, and the Disir’s promise to spare Mordred’s life should Arthur accept magic, it seems more plausible that it is not Mordred’s life that is the punishment, but rather forthcoming catalysts—namely the character Kara—that will lead Mordred to stray onto a different path. Merlin is right in so far as this cannot now be prevented; the sentence has been passed: Arthur will die at Mordred’s hand, and Merlin ensured it. This reasoning makes sense when considered in relation to a key fan criticism (see Caspers 2013) of Merlin ending when it does, which is that the prophecy of Merlin and Arthur side by side, uniting the lands of Albion and returning magic to the realm, is never realised. It would seem this is the hero’s critical mistake. As Gaius words it, Merlin did what he thought was ‘best,’ but not what was ‘right.’ As Arthur prophetically told Merlin in V.1: “No matter what adversity we face, we stand for what is right. To betray our beliefs, Merlin, that is what would destroy everything we strive for.”

This is the tragedy of this particular retelling. By betraying the beliefs that Arthur and Merlin had lived by, and that had seen them escape certain death many times before, Merlin had ensured Arthur’s destruction. This point also explains another fan criticism of the plotting of the final episode (see Caspers 2013), which is that Arthur and Merlin had survived worse in the past. This time was different; this time Arthur’s fate was decided in advance. The earlier scene in which Mordred doubts whether magic will ever cease to be outlawed lends further credence to the argument that had Arthur chosen Mordred’s life over his decree, Mordred would not need to go on “unmarked in death as in life.” The episode ends with Arthur, his arms around Mordred, hoisting him into the air (see Figure 6); it serves as a grim reminder—for Arthur/Mordred shippers[7] particularly—of what might have been.

Figure 6. Arthur hoists Mordred into the air in a playful embrace (V.5).

V.9: “Three’s Better than Two, Isn’t That Right, Merlin?”

Mordred continues to reach out to Merlin in the lead-up to the cataclysmic event that reroutes him onto the path of Arthur’s destruction. And Arthur continues to treat Mordred like a son. The events of V.9 are a good illustration of this. In the plot for this episode, Mordred and Leon are the only knights Arthur trusts with information of a plan intended to disrupt potential leaks in the ranks. The episode is the final instalment in the ‘evil!Guinevere trilogy,’ in which Guinevere is enchanted to serve Morgana, and in it Merlin and Arthur set out with an unconscious Guinevere to meet ‘The Dolma,’ a mysterious elderly sorceress, in hopes of a cure. Mordred, having noticed Merlin acting strangely, follows them. It is just as well he does too, coming to the rescue when a cliff fall leaves Merlin unconscious and Arthur pinned beneath a boulder. Mordred is praised that evening around a campfire: that site of homoerotic significance. There, sitting around erect flames, Arthur makes reference to the triangle Mordred effects in the Arthur/Merlin dynamic: “Good to have you with us. Three’s better than two, isn’t that right, Merlin?” That evening, Mordred once again confronts Merlin, expressing a desire for amicable relations between them:

MORDRED You don’t trust me do you, Merlin?

MERLIN          I believe you to be a fine knight.

MORDRED But not one to be trusted. It’s all right, I know you have the king’s best interests at heart. I only wish you would believe that I do too. One day I shall prove my loyalty to you and the king. Then I hope we may be friends.

MERLIN          I would wish for nothing more.

When an attack from Morgana renders Mordred unconscious, Merlin convinces Arthur to leave him for dead. Yet another refusal by Merlin to believe in Mordred, which in turn facilitates Morgana and Mordred’s first meeting since his defection:

MORDRED Why don’t you kill me?

MORGANA      My argument’s not with you, Mordred. How could it be? We’re of a kind.

MORDRED Never.

MORGANA      You wear the uniform well but we both know what lies beneath. Do you think Arthur would tolerate you for one minute if he knew the truth? One of his knights, a sorcerer.

MORDRED One day he will know. One day we will be accepted.

MORGANA      Your naïveté would be charming if it wasn’t so dangerous.

Mordred defeats Morgana using magic, his eyes glowing gold: symbolising the fire Morgana has ignited within (see Figure 7); embers of doubt—and of Camelot’s destruction, as the prophecy goes—are being fanned, which again would not have been the case had Arthur embraced magic in V.5. At the episode’s end Mordred reveals that he had known the mysterious sorceress Arthur had gone to meet was in fact Merlin, and vows to keep his secret yet again, to trust that Merlin’s intentions are just: “Have no fear. I will not divulge your secret. I admire you. It can’t be easy to do so much for so little reward.” This episode and the meeting with Morgana mark the beginning of the end.

Figure 7. Mordred defeats Morgana using magic (V.9).

V.11: “You’re Breaking His Heart. You’ll Lose His Trust”

Arthur’s sentence—to die at the hands of a Druid—begins with Mordred’s betrayal in V.11 and is completed only two episodes later. In V.11 Mordred (as Lover) shelters a childhood friend and implied lover, Kara, who is subsequently captured and sentenced to death after killing several of Arthur’s men and making an attempt on Arthur’s life. Mordred pleads with Arthur on Kara’s behalf for clemency, weeping and kneeling before him: “I beg you, Arthur.” Arthur is moved by the display and responds in a father-like manner: “You know there’s nothing I wouldn’t do for you.” Yet he refuses to commute the sentence, for she is a danger to his people. Merlin watches these events unfold with great interest, well aware of what is at stake, and pleads with Arthur on Mordred’s behalf:

MERLIN          You’re breaking his heart. You’ll lose his trust.

ARTHUR          There’s nothing I can do. In time Mordred will understand that. He’ll come to forgive me.

MERLIN          I fear you’re wrong, Arthur.

Kara exploits Mordred’s feelings for her, poisoning him against Arthur to further her own cause against Uther’s doctrine: “No matter what he preaches, he is no different from his father.” Mordred resolves to free Kara and smuggle her out of Camelot. However before he does, he returns to Arthur to apologise for what he is about to do, and to say goodbye: “You took me in. I will always remember that, and everything you’ve done for me.” Recognising Mordred’s speech for what it is, Merlin confronts Mordred and his intention to free Kara. Mordred warns Merlin not to betray his trust. “Tell me you wouldn’t do the same for the woman you love,” Mordred says. “You see, you can’t.” When Merlin discusses the situation with Gaius, he is reminded that what Mordred is planning is “nothing you haven’t done yourself a hundred times before.” And yet, as Merlin has always done, he applies a double standard where Mordred is concerned, betraying his trust and telling Arthur of Mordred’s intentions. It is one final failure on Merlin’s part to choose another path for Mordred, the man who so admires him.

Mordred and Kara are captured in the woods beyond the castle, Kara having killed a guard during the escape. They are imprisoned, Kara’s sentence standing and Mordred’s pending. Merlin makes another attempt to persuade Arthur to free Kara. And it works. The next morning, in the throne room before all of the court, Arthur offers Kara a chance: “If you repent your crimes, I will spare your life.” Arthur’s love for Mordred is such that he would betray his own beliefs—allowing a sorcerer and killer to go free—if it will mean winning back Mordred’s favour. Slash manip artist endless_paths speculates on Arthur’s devotion and the seductiveness of the Mordred character in a 2012 Arthur/Mordred manip titled A Knight Doing His Duty. In a brief statement accompanying the work and setting up the action depicted, endless_paths writes: “Sometimes the power of a sorcerer is to [sic] much to resist.” (2012b) The manip configures the two in the missionary position and is set in Arthur’s chambers, two qualities that connote intimacy and familiarity between the pair: they have done this before. In line with the ‘semiotic significance of selection’ (Brennan 2013) in the work, Mordred, as you would expect, is slighter in stature, while Arthur is particularly limber. In a plank position, Mordred folds Arthur’s knees back and by his sides, elevating his arse for deeper penetration. Arthur’s arms are reclined behind his head, his toes pointed and clenched, and his chin pressed to his chest, allowing a full view of Mordred’s cock entering him: Arthur is entirely committed to the act, maximising the full range of his penetrator’s motion. Both men have relaxed expressions and a clear line of sight to each other.

Despite Arthur’s best efforts to alleviate tensions with Mordred via an offer of clemency, Kara remains resolute: “You deserve everything that’s coming to you, Arthur Pendragon.” Mordred never learns of Arthur’s offer to pardon Kara. In a state of acute grief, Mordred uses magic to free himself following her execution (see Figure 8) and travels directly to Morgana, to whom he reveals that the identity of the man who had been stalking her dreams, Emrys, is none other than Arthur’s manservant, Merlin. Once again, a connection can be made between Mordred’s and Morgana’s journeys to villainy; in particular, this critical episode and its sequence of events can be compared with a storyline from season one. As Jennifer C. Edwards explains, after witnessing Uther’s resolve to execute a man of magic (Alvarr in I.12) who had provided her with comfort, “Morgana changes from a loving ward to a treacherous rebel and even goes so far as to plot Uther’s death.” (2015, 51) A similar fate befalls Mordred here, whose “betrayal of Arthur results not from inherent malevolence but from the death of his childhood sweetheart.” (Meredith 2015, 165)

Figure 8. In a state of grief, Mordred uses magic to set himself free from his cell and from Arthur (V.11).

Conclusion

Reflecting on her experience of the aftermath of a public execution of a criminal during a residence in Scandinavia, Mary Wollstonecraft (1802) writes:

[…] executions, far from being useful examples to the survivors, have, I am persuaded, a quite contrary effect, by hardening the heart they ought to terrify. Besides, the fear of an ignominious death, I believe, never deterred any one from the commission of a crime; because, in committing it, the mind is roused to activity about present circumstances. It is a game of hazard, at which all expect the turn of the die in their own favour; never reflecting on the chance of ruin, till it comes. In fact, from what I saw, in the fortresses of Norway, I am more and more convinced that the same energy of character, which renders a man a daring villain, would have rendered him useful to society, had that society been well organized. (208)

Wollstonecraft’s reflection resonates with the execution of Kara, the catalyst that spurs Mordred the Lover to betray and destroy his King. In her critique of the spectacle of the public execution, Wollstonecraft makes the case that villainy is not innate, but rather the product of some external, societal failure. Such an observation is comparable to my argument in this essay about the Mordred character, that great archetype of the treacherous villain: it is the societal failure of a pre-unified Albion, in which magic is banned and Merlin the Magician feels the need to hide himself, that leads Mordred onto his villainous path. This reading offers insight into the popular reimagining of an iconic villain, as well as the various types and queer metaphors the character’s journey in this popular series illuminates and rouses within the minds of fans. The inclusion in this essay of works by slash manip artists both demonstrates the appeal of a queer reading of the Mordred character and supports broader queer readings of Merlin as a program full of homoerotic potential.

T.H. White’s adaptation of the Arthurian legend has been read by some scholars as an allegory of the horrors of the Second World War. In it Mordred is a Hitlerian character who turns to new technology to bring about a ‘New Order’ (1958, 620–21). Where Hitler sought to destroy civilisation, White’s Mordred, by valorising power above honour, destroys chivalry (Thomas 1982, 50). In Merlin, Mordred is more a pawn of fate than an agent of destruction; he carries out Arthur’s sentence from the Triple Goddess (V.5) under the instruction of Morgana, High Priestess of the Triple Goddess. He stands as an example of the dire consequences of secrecy. Merlin’s unwillingness to trust him, and his resolve to remain closeted about his secret identity, seal Mordred and Arthur’s fate of mutual destruction. When Mordred strikes the fatal blow in V.13, he says to Arthur: “You gave me no choice.” When Arthur returns with a fatal strike of his own, Mordred smiles: he will not go into death unmarked or alone.

 

References

Aronstein, Susan. An Introduction to British Arthurian Narrative. Gainesville: University Press of Florida, 2012.

Bacon-Smith, Camille. Enterprising Women: Television Fandom and the Creation of Popular Myth. Philadelphia: University of Pennsylvania Press, 1992.

Battis, Jes. “The Kryptonite Closet: Silence and Queer Secrecy in Smallville.” Jump Cut: A Review of Contemporary Media, no. 48 (2006).

Bradley, Marion Zimmer. The Mists of Avalon. New York: Random House, 1982.

Brennan, Joseph. “Slash Manips: Remixing Popular Media with Gay Pornography.” M/C Journal: A Journal of Media and Culture 16.4 (2013).

Brennan, Joseph. “Not ‘From My Hot Little Ovaries’: How Slash Manips Pierce Reductive Assumptions.” Continuum: Journal of Media & Cultural Studies 28.2 (2014a): 247–64.

Brennan, Joseph. “The Fannish Parergon: Aca–Fandom and the Decentred Canon.” Australasian Journal of Popular Culture 3.2 (2014b): 217–32.

Brennan, Joseph. “‘You Could Shame the Great Arthur Himself’: A Queer Reading of Lancelot from BBC’s Merlin with Respect to the Character in Malory, White, and Bradley.” Arthuriana 25.2 (2015): 20–43.

Burger, Glenn, and Steven F. Kruger. “Introduction.” In Queering the Middle Ages, edited by Glenn Burger and Steven F. Kruger, i–xxiii. Minneapolis: University of Minnesota Press, 2001.

Caspers, Kirsten. “Why Merlin Should Return to the Screen,” Den of Geek!, February 13, 2013, accessed September 3, 2015.

Chandler, Erin. “Pendragons at the Chopping Block: Elements of Sir Gawain and the Green Knight in the BBC’s Merlin.” Arthuriana 25.1 (2015): 101–112.

chewableprose. “Chewyboys It’s Good to be Bad,” LiveJournal, July 8, 2014, accessed September 8, 2015.

Doyle, Arthur Conan. “The Final Problem.” In his The Memoirs of Sherlock Holmes, 256–79. London: George Newnes, 1894.

Edelman, Lee. “Seeing Things: Representation, the Scene of Surveillance, and the Spectacle of Gay Male Sex.” In Inside/Out: Lesbian Theories, Gay Theories, edited by Diana Fuss, 93–118. New York: Routledge, 1991.

Edwards, Jennifer C. “Casting, Plotting, and Enchanting: Arthurian Women in Starz’s Camelot and the BBC’s Merlin.” Arthuriana 25.1 (2015): 57–81.

Elmes, Melissa Ridley. “Episodic Arthur: Merlin, Camelot, and the Visual Modernization of the Medieval Romance Tradition.” In The Middle Ages on Television: Critical Essays, edited by Meriem Pages and Karolyn Kinane, 99–121. Jefferson: McFarland, 2015.

endless_paths. “17th December: Breaking in a New Knights,” LiveJournal, December 17, 2012a, accessed September 7, 2015.

endless_paths. “A Knight Doing His Duty,” LiveJournal, December 31, 2012b, accessed September 7, 2015.

Fisher IV, Benjamin Franklin. “King Arthur Plays from the 1890s.” Victorian Poetry 28.3–4 (1990): 153–76.

Foster, Tara, and Jon Sherman, eds., “King Arthur in the Twenty-First Century: Kaamelott, BBC’s Merlin, and Starz’s Camelot.” [Special Issue] Arthuriana 25.1 (2015).

Freud, Sigmund. Civilization and its Discontents. Translated by James Strachey. New York: W.W. Norton and Company, 1930.

Freud, Sigmund. “From the History of an Infantile Neurosis.” In The Standard Edition of the Complete Psychological Works of Sigmund Freud 17, edited by James Strachey, 1–124. London: The Hogarth Press, 1955.

Halperin, David M. Saint Foucault: Towards a Gay Hagiography. Oxford: Oxford University Press, 1995.

Hills, Matt. “Defining Cult TV: Texts, Inter-texts, and Fan Audiences.” In The Television Studies Reader, edited by Robert Clyde Allen and Annette Hill, 509–23. London: Routledge, 2004.

Hoffner, Cynthia, and Martha Buchanan. “Young Adults’ Wishful Identification with Television Characters: The Role of Perceived Similarity and Character Attributes.” Media Psychology 7.4 (2005): 325–51.

Jenkins, Henry. Textual Poachers: Television Fans and Participatory Culture. London: Routledge, 1992.

Jones, Sara Gwenllian. “The Sex Lives of Cult Television Characters.” Screen 43.1 (2002): 79–90.

Karras, Ruth Mazo. From Boys to Men: Formations of Masculinity in Late Medieval Europe. Philadelphia: University of Pennsylvania Press, 2002.

Lacan, Jacques. Ecrits: A Selection. London: W.W. Norton & Company, 1977.

Lochrie, Karma. “Mystical Acts, Queer Tendencies.” In Constructing Medieval Sexuality, edited by Karma Lochrie, James Schultz, and Peggy McCracken, 180–200. Minneapolis: University of Minnesota Press, 1997.

Machat, Sibylle. “‘Prince Arthur Spotted Exiting Buckingham Palace!’: The Re-Imagined Worlds of Fanfic Trailers.” In Film Remakes, Adaptations and Fan Productions: Remake/Remodel, edited by Kathleen Loock and Constantine Verevis, 197–214. Basingstoke: Palgrave Macmillan, 2012.

Malory, Thomas. Le Morte d’Arthur. London: William Caxton, 1485.

McFarland, William P., and Timothy R. McMahon. “Male Archetypes as Resources for Homosexual Identity Development in Gay Men.” The Journal of Humanistic Counseling, Education and Development 38.1 (1999): 47–60.

Mediavilla, Cindy. “From ‘Unthinking Stereotype’ to Fearless Antagonist: The Evolution of Morgan le Fay on Television.” Arthuriana 25.1 (2015): 44–56.

Meredith, Elysse T. “Gendering Morals, Magic and Medievalism in the BBC’s Merlin.” In The Middle Ages on Television: Critical Essays, edited by Meriem Pages and Karolyn Kinane, 158–73. Jefferson: McFarland, 2015.

Moore, Robert L. The Magician and the Analyst: Ritual, Sacred Space, and Psychotherapy. Chicago: Council of Societies for the Study of Religion, 1991.

Moore, Robert L., and Douglas Gillette. King, Warrior, Magician, Lover: Rediscovering the Archetypes of the Mature Masculine. San Francisco: HarperSanFrancisco, 1990.

Moore, Robert L., and Douglas Gillette. The King Within: Accessing the King in the Male Psyche. New York: William Morrow & Co., 1992.

Padva, Gilad. “Dreamboys, Meatmen and Werewolves: Visualizing Erotic Identities in All-Male Comic Strips.” Sexualities 8.5 (2005): 587–99.

Propp, Vladimir. Morphology of the Folktale. 2nd ed. Austin: University of Texas Press, 1968.

Russ, Joanna. “Pornography by Women for Women, with Love.” In her Magic Mommas, Trembling Sisters, Puritans and Perverts: Feminist Essays, 79–99. Trumansburg: Crossing Press, 1985.

Scodari, Christine, and Jenna L. Felder. “Creating a Pocket Universe: ‘Shippers,’ Fan Fiction, and The X-Files Online.” Communication Studies 51.3 (2000): 238–57.

Sedgwick, Eve Kosofsky. “Gender Asymmetry and Erotic Triangles.” In her Between Men: English Literature and Male Homosocial Desire, 21–27. New York: Columbia University Press, 1985.

Sherman, Jon. “Source, Authority, and Audience in the BBC’s Merlin.” Arthuriana 25.1 (2015): 82–100.

Stewart, Mary. The Wicked Day. London: Hodder & Stoughton, 1983.

Stoker, Bram. Dracula. London: Archibald Constable and Company, 1897.

Sutton, John William. “Mordred’s End: A Reevaluation of Mordred’s Death Scene in the Alliterative Morte Arthure.” The Chaucer Review 37.3 (2003): 280–85.

Thomas, Jimmie Elaine. “The Once and Present King: A Study of the World View Revealed in Contemporary Arthurian Adaptations.” Ph.D. Thesis, University of Arkansas, 1982.

Tollerton, David C. “Multiculturalism, Diversity, and Religious Tolerance in Modern Britain and the BBC’s Merlin.” Arthuriana 25.1 (2015): 113–27.

White, T.H. The Once and Future King. New York: GP Putnam’s Sons, 1958.

wishfulcelebfak. “Hugedicklover’s Top 10 Gift – Random Merlin Hunks Fucking,” LiveJournal, November 9, 2014, accessed September 7, 2015.

Wollstonecraft, Mary. “Letter XIX.” In her Letters Written During A Short Residence in Sweden, Norway, and Denmark, 207–18. Second Edition. London: G. Woodfall, 1802.

Worthington, Heather. “From Children’s Story to Adult Fiction: T.H. White’s The Once and Future King.” Arthuriana 12.2 (2002): 97–119.

Yee, Pamela M. “Re-presenting Mordred: Three Plays of 1895.” Arthuriana 24.4 (2014): 3–32.

Zeikowitz, Richard E. Homoeroticism and Chivalry: Discourses of Male Same-Sex Desire in the 14th Century. Basingstoke: Palgrave Macmillan, 2003.

 

Notes

[1] During Arthur’s quest to save his knights from Morgana in V.1, Merlin encounters a Druid seer who tells him of ‘Arthur’s bane,’ the prophecy of Arthur’s death at the hands of a Druid (Mordred). Merlin is told: “Now more than ever it is you and you alone that can keep Arthur safe.” It sets a sinister tone for the final season. Coupled with the season’s tagline “The die is cast,” it suggests that Arthur’s death is an inescapable destiny, which harks back to season one’s tagline, “You can’t escape destiny.”

[2] See Tollerton 2015, who discusses the “freer hand” Merlin has “to gesture toward modern concerns and make ethical judgements on issues of diversity and society.” (123)

[3] Not surprising, given that the format of Smallville (depicting Clark Kent before he became Superman) served as principal inspiration for Merlin (Brennan 2015, 39).

[4] Also see Padva 2005, who uses Freud’s reading of the homoerotic symbolism in the wolf dream to read a gay male comic, Jon Macy’s ‘Tail.’

[5] In a scene from V.1, Arthur delights in the opportunity to humiliate Merlin, forcing him to juggle for the entertainment of Queen Annis and her guests.

[6] Producing offspring based on a digital composite of two male faces is a popular practice among digital slash artists.

[7] A ‘shipper’ is a fan who wishes for a particular pairing to share a romantic relationship (see Scodari and Felder 2000).

 

Bio:
Joseph Brennan
is a sessional lecturer in the Department of Media and Communications at the University of Sydney, where he was recently awarded his Ph.D. His doctoral work involved textual analysis of photo-montaged fan works inspired by BBC’s Merlin. Known as ‘slash manips,’ in these photo remixes fans layer images of male characters from popular media with gay, and often pornographic, material. He argues that these works are of scholarly interest because they have something to tell us about sex and bodies, about the divides we erect within male sexuality, between popular and pornographic, homosocial and homosexual, the implied and the explicit. He was Teaching Fellow at the University of Sydney, 2012–2013, and a critic with Australian Art Review, 2008–2013.

Read, Watch, Listen: A commentary on eye tracking and moving images – Tim J. Smith

Abstract

Eye tracking is a research tool that has great potential for advancing our understanding of how we watch movies. Questions such as how differences in the movie influence where we look, and how individual differences between viewers alter what we see, can be operationalised and empirically tested using a variety of eye tracking measures. This special issue collects together an inspiring interdisciplinary range of opinions on what eye tracking can (and cannot) bring to film and television studies and practice. In this article I will reflect on each of these contributions with specific focus on three aspects: how subtitling and digital effects can reinvigorate visual attention, how audio can guide and alter our visual experience of film, and how methodological, theoretical and statistical considerations are paramount when trying to derive conclusions from eye tracking data.

 

Introduction

I have been obsessed with how people watch movies since I was a child. All you have to do is turn and look at an audience member’s face at the movies or at home in front of the TV to see the power the medium holds over them. We sit enraptured, transfixed and immersed in the sensory patterns of light and sound projected back at us from the screen. As our physical activity diminishes our mental activity takes over. We piece together minimal audiovisual cues to perceive rich otherworldly spaces, believable characters and complex narratives that engage us mentally and move us emotionally. As I progressed through my education in Cognitive Science and Psychology I was struck by how little science understood about cinema and the mechanisms filmmakers used to create this powerful experience.[i] Reading the film literature, listening to filmmakers discuss their craft and excavating gems of their craft knowledge, I started to realise that film was a medium ripe for psychological investigation. The empirical study of film would further our understanding of how films work and how we experience them, but it would also serve as a test bed for investigating complex aspects of real-world cognition that were often considered beyond the realms of experimentation. As I (Smith, Levin & Cutting, 2010) and others (Anderson, 2006) have argued elsewhere, film evolved to “piggy back” on normal cognitive development and use basic cognitive tendencies such as attentional preferences, theory of mind, empathy and narrative structuring of memory to make the perception of film as enjoyable and effortless as possible. By investigating film cognition we can, in turn, advance our understanding of general cognition. But to do so we need to step outside of traditional disciplinary boundaries concerning the study of film and approach the topic from an interdisciplinary perspective. This special issue represents a highly commendable attempt to do just that.

By bringing together psychologists, film theorists, philosophers, vision scientists, neuroscientists and screenwriters this special issue (and the Melbourne research group that most contributors belong to) provides a unique perspective on film viewing. The authors included in this special issue share my passion for understanding the relationship between viewers and film but this interest manifests in very different ways depending on their perspectives (see Redmond, Sita, and Vincs, this issue, for a similar personal journey into eye tracking to that presented above). By focussing on viewer eye movements the articles in this special issue provide readers from a range of disciplines with a way into the eye tracking investigation of film viewing. Eye tracking (as comprehensively introduced and discussed by Dyer and Pink, this issue) is a powerful tool for quantifying a viewer’s experience of a film, comparing viewing behaviour across different viewing conditions and groups, as well as testing hypotheses about how certain cinematic techniques impact where we look. But, as is rightly highlighted by several of the authors in this special issue, eye tracking is not a panacea for all questions about film spectatorship.

Like all experimental techniques it can only measure a limited range of psychological states and behaviours, and the data it produces does not say anything in and of itself. Data requires interpretation. Interpretation can take many forms[ii] but if conclusions are to be drawn about how the data relates to psychological states of the viewer this interpretation must be based on theories of psychology and ideally confirmed using secondary/supporting measures. For example, the affective experience of a movie is a critical aspect which cognitive approaches to film are often wrongly accused of ignoring. Although cognitive approaches to film often focus on how we comprehend narratives (Magliano and Zacks, 2011), attend to the image (Smith, 2013) or follow formal patterns within a film (Cutting, DeLong and Nothelfer, 2010), several cognitivists have focussed in depth on emotional aspects (see the work of Carl Plantinga, Torben Grodal or Murray Smith). Eye tracking is the perfect tool for investigating the impact of immediate audiovisual information on visual attention but it is less suitable for measuring viewer affect. Psychophysiological measures such as heart rate and skin conductance, neuroimaging methods such as fMRI or EEG, or even self-report ratings may be better for capturing a viewer’s emotional responses to a film, as has been demonstrated by several research teams (Suckfull, 2000; Raz et al, 2014). Unless the emotional state of the viewer changed where they looked or how quickly they moved their eyes, the eye tracker may not detect any differences between two viewers with different emotional states.[iii]

As such, a researcher interested in studying the emotional impact of a film should either choose a different measurement technique or combine eye tracking with another more suitable technique (Dyer and Pink, this issue). This does not mean that eye tracking is unsuitable for studying the cinematic experience. It simply means that you should always choose the right tool for the job and often this means combining multiple tools that are strong in different ways. As Murray Smith (the current President of the Society for Cognitive Studies of the Moving Image; SCSMI) has argued, a fully rounded investigation of the cinematic experience requires “triangulation” through the combination of multiple perspectives including psychological, neuroscientific and phenomenological/philosophical theory and methods (Smith, 2011) – an approach taken proudly across this special issue.

For the remainder of my commentary I would like to focus on certain themes that struck me as most personally relevant and interesting when reading the other articles in this special issue. This is by no means an exhaustive list of the themes raised by the other articles, or even an assessment of the importance of the particular themes I have chosen. There are many other interesting observations made in the articles that I do not focus on below but, given my perspective as a cognitive scientist and my current interests, I decided to focus my commentary on these specific themes rather than make a comprehensive review of the special issue or tackle topics I am unqualified to comment on. Also, I wanted to take the opportunity to dispel some common misconceptions about eye tracking (see the section ‘Listening to the data’) and empirical methods in general.

Reading an image

One area of film cognition that has received considerable empirical investigation is subtitling. As Kruger, Szarkowska and Krejtz (this issue) so comprehensively review, they and I believe eye tracking is the perfect tool for investigating how we watch subtitled films. The presentation of subtitles divides the film viewing experience into a dual task: reading and watching. Given that the medium was originally designed to communicate critical information through two channels, the image and the soundtrack, introducing text as a third channel of communication places extra demands on the viewer’s visual system. However, for most competent readers serially shifting attention between these two tasks does not lead to difficulties in comprehension (Kruger, Szarkowska and Krejtz, this issue). Immediately following the presentation of the subtitles, gaze will shift to the beginning of the text, saccade across the text and return to the centre of interest within a couple of seconds. Gaze heatmaps comparing the same scenes with and without subtitles (Kruger, Szarkowska and Krejtz, this issue; Fig. 3) show that the areas of the image fixated are very similar (ignoring the area of the screen occupied by the subtitles themselves); rather than distracting from the visual content, the presence of subtitles seems to condense gaze behaviour onto the areas of central interest in an image, e.g. faces and the centre of the image. This illustrates the redundancy of a lot of the visual information presented in films and the fact that under non-subtitle conditions viewers rarely explore the periphery of the image (Smith, 2013).

My colleague Anna Vilaró and I recently demonstrated this similarity in an eye tracking study in which the gaze behaviour of viewers was compared across four versions of an animated film, Disney’s Bolt (Howard & Williams, 2008): the original English audio, a Spanish-language version with English subtitles, an English-language version with Spanish subtitles, and a Spanish-language version without subtitles (Vilaró & Smith, 2011). Given that our participants were English speakers who did not know Spanish, these conditions allowed us to investigate both where they looked under the different audio and subtitle conditions and what they comprehended. Using cued recall tests of memory for verbal and visual content, we found no significant differences in recall for either type of content across the viewing conditions, except for verbal recall in the Spanish-only condition (not surprisingly, given that our English participants could not understand the Spanish dialogue). Analysis of the gaze behaviour showed clear evidence of subtitle reading, even in the Spanish subtitle condition (see Figure 1), but no differences in the degree to which peripheral objects were explored. This indicates that even when participants are watching film sequences without subtitles and know that their memory will be tested for the visual content, their gaze still remains focussed on the central features of a traditionally composed film. This supports arguments for subtitling movies over dubbing: whilst subtitles place greater demands on viewer gaze and impose a heightened cognitive load, there is no evidence that subtitling leads to poorer comprehension.

Figure 1: Figure from Vilaró & Smith (2011) showing the gaze behaviour of multiple viewers directed to own language subtitles (A) and foreign language/uninterpretable subtitles (B).

The high degree of attentional synchrony (Smith and Mital, 2013) observed in the above experiment and during most film sequences indicates that all visual features in the image and areas of semantic significance (e.g. social information and objects relevant to the narrative) tend to point to the same part of the image (Mital, Smith, Hill and Henderson, 2011). Only when areas of the image are placed in conflict through image composition (e.g. depth of field, lighting, colour or motion contrast) or staging (e.g. multiple actors) does attentional synchrony break down and viewer gaze divide between multiple locations. Such shots are relatively rare in mainstream Hollywood cinema or TV (Salt, 2009; Smith, 2013) and, when they are used, the depicted action tends to be highly choreographed so that attention shifts between the multiple centres of interest in a predictable fashion (Smith, 2012). If such choreographing of action is not used the viewer can quickly exhaust the information in the image and start craving either new action or a cut to a new shot.

Hochberg and Brooks (1978) referred to this as the visual momentum of the image: the pace at which visual information is acquired. This momentum is directly observable in the saccadic behaviour during an image’s presentation, with frequent short duration fixations at the beginning of a scene’s presentation interspersed with large amplitude saccades (known as the ambient phase of viewing; Velichkovsky, Dornhoefer, Pannasch and Unema, 2000) and less frequent, longer duration fixations separated by smaller amplitude saccades as the presentation duration increases (known as the focal phase of viewing; Velichkovsky et al., 2000). I have recently demonstrated the same pattern of fixations during viewing of dynamic scenes (Smith and Mital, 2013) and shown how this pattern gives rise to more central fixations at shot onset and greater exploration of the image and decreased attentional synchrony as the shot duration increases (Mital, Smith, Hill and Henderson, 2011). Interestingly, the introduction of subtitles to a movie may have the unintended consequence of sustaining visual momentum throughout a shot. The viewer is less likely to exhaust the information in the image because their eyes are busy saccading across the text to acquire the information that would otherwise be presented in parallel to the image via the soundtrack. This increased saccadic activity may increase the cognitive load experienced by viewers of subtitled films and change their affective experience, producing greater arousal and an increased sense of pace.

For some filmmakers and producers of dynamic visual media, increasing the visual momentum of an image sequence may be desirable as it maintains interest and attention on the screen (e.g. Michael Bay’s use of rapidly edited extreme close-ups and intense camera movements in the Transformers movies). In this modern age of multiple screens fighting for our attention when we are consuming moving images (e.g. mobile phones and computer screens in our living rooms and even, sadly, increasingly at the cinema), if the designers of this media are to ensure that our visual attention is focussed on their screen over the other competing screens, they need to design the visual display in a way that makes comprehension impossible without visual attention. Feature films and television dramas often rely heavily on dialogue for narrative communication, and the information communicated through the image may be of secondary narrative importance to the dialogue, so viewers can generally follow the story just by listening to the film rather than watching it. If producers of dynamic visual media are to draw visual attention back to the screen and away from secondary devices they need to increase the ratio of visual to verbal information. A simple way of accomplishing this is to present the critical audio information through subtitling. The more visually attentive mode of viewing afforded by watching subtitled film and TV may partly explain the growing interest (at least in the UK) in foreign TV series such as the Nordic Noir dramas The Bridge (2011) and The Killing (2007).

Another way of drawing attention back to the screen is to constantly “refresh” the visual content of the image by either increasing the editing rate or creatively using digital composition.[iv] The latter technique is wonderfully exploited by Sherlock (2010) as discussed brilliantly by Dwyer (this issue). Sherlock contemporised the detective techniques of Sherlock Holmes and John Watson by incorporating modern technologies such as the Internet and mobile phones and simultaneously updated the visual narrative techniques used to portray this information by using digital composition to playfully superimpose this information onto the photographic image. In a similar way to how the sudden appearance of traditional subtitles involuntarily captures visual attention and draws our eyes down to the start of the text, the digital inserts used in Sherlock overtly capture our eyes and encourage reading within the viewing of the image.

If Dwyer (this issue) had eye tracked viewers watching these excerpts she would most likely have observed this interesting shifting between phases of reading and dynamic scene perception. Given that the digital inserts produce sudden visual transients and are highly incongruous with the visual features of the background scene, they are likely to involuntarily attract attention (Mital, Smith, Hill & Henderson, 2012). As such, they can be creatively used to reinvigorate the pace of viewing and strategically direct visual attention to parts of the image away from the screen centre. Traditionally, the same content might have been presented verbally as narration, as heavy-handed dialogue exposition (e.g. “Oh my! I have just received a text message stating….”), or as a slow and laboured cut to a close-up of the actual mobile phone so we can read it from the perspective of the character. None of these approaches takes full advantage of the communicative potential of the whole screen space or of our ability to rapidly attend to and comprehend visual and audio information in parallel.

Such intermixing of text, digital inserts and filmed footage is common in advertisements, music videos, and documentaries (see Figure 2) but is still surprisingly rare in mainstream Western film and TV. Short-form audiovisual messages have recently experienced a massive increase in popularity due to the internet and direct streaming to smartphones and mobile devices. To maximise their communicative potential and increase their likelihood of being “shared,” these videos use all the audiovisual tricks available to them. Text, animations, digital effects, audio and classic filmed footage all mix together on the screen, packing every frame with as much information as possible (Figure 2), essentially maximising the visual momentum of each video and maintaining interest for as long as possible.[v] Such videos are so effective at grabbing attention and delivering satisfying/entertaining/informative experiences in a short period of time that they often compete directly with TV and film for our attention. Once we click play, the audiovisual bombardment ensures that our attention remains latched on to the second screen (i.e., the tablet or smartphone) for its duration and away from the primary screen, i.e., the TV set. Whilst distressing for producers of TV and film who wish our experience of their material to be undistracted, the ease with which we pick up a handheld device and seek other stimulation in parallel to the primary experience may indicate that the primary material does not require our full attention for us to follow what is going on. As attention has a natural ebb and flow (Cutting, DeLong and Nothelfer, 2010) and “There is no such thing as voluntary attention sustained for more than a few seconds at a time” (p. 421; James, 1890), if modern producers of film and TV want to maintain a high level of audience attention and ensure it is directed to the screen they must either rely on viewer self-discipline to inhibit distraction, reward attention to the screen with rich and nuanced visual information (as fans of “slow cinema” would argue of films like those of Bela Tarr), or utilise the full range of postproduction effects to keep visual interest high and maintained on the image, as Sherlock so masterfully demonstrates.

Figure 2: Gaze Heatmaps of participants’ free-viewing a trailer for Lego Indiana Jones computer game (left column) and the Video Republic documentary (right column). Notice how both make copious use of text within the image, as intertitles and as extra sources of information in the image (such as the head-up display in A3). Data and images were taken from the Dynamic Images and Eye Movement project (DIEM; Mital, Smith, Hill & Henderson, 2010). Videos can be found here (http://vimeo.com/6628451) and here (http://vimeo.com/2883321).

A number of modern filmmakers are beginning to experiment with the language of visual storytelling by questioning our assumptions about how we perceive moving images. At the forefront of this movement are Ang Lee and Andy and Lana Wachowski. In Hulk (2003), Lee worked very closely with editor Tim Squyres to use non-linear digital editing and after effects to break apart the traditional frame and shot boundaries and create an approximation of a comic book style within film. This chaotic, unpredictable style polarised viewers and was partly blamed for the film’s poor reception. However, it cannot be argued that this experiment was wholly unsuccessful. Several sequences within the film used multiple frames, split screens, and digital transformation of images to increase the number of centres of interest on the screen and, as a consequence, increase the pace of viewing and the arousal experienced by viewers. In the sequence depicted below (Figure 3), two parallel scenes depicting Hulk’s escape from a containment chamber (A1) and this action being watched from a control room by General Ross (B1) were presented simultaneously by presenting elements of both scenes on the screen at the same time. Instead of using a point of view (POV) shot to show Ross looking off screen (known as the glance shot; Branigan, 1984) followed by a cut to what he was looking at (the object shot), both shots were combined into one image (F1 and F2), with the latter shot sliding in from behind Ross’ head (E2). These digital inserts float within the frame, often gliding behind objects or suddenly enlarging to fill the screen (A2-B2). Such visual activity and use of shots-within-shots make viewer gaze highly active (notice how the gaze heatmap is rarely clustered in one place; Figure 3). Note that this method of embedding a POV object shot within a glance shot is similar to Sherlock’s method of displaying text messages, as both the glance, i.e., Watson looking at his phone, and the object, i.e., the message, are shown in one image. Both uses take full advantage of our ability to rapidly switch from watching action to reading text without having to wait for a cut to give us the information.

Figure 3: Gaze heatmap of eight participants watching a series of shots and digital inserts from Hulk (Ang Lee, 2003). Full heatmap video is available at http://youtu.be/tErdurgN8Yg.

Similar techniques have been used in Andy and Lana Wachowski’s films, most audaciously in Speed Racer (2008). Interestingly, both sets of filmmakers seem to intuitively understand that packing an image with as much visual and textual information as possible can lead to viewer fatigue, and so they limit such intense periods to only a few minutes and separate them with more traditionally composed sequences (typically shot/reverse-shot dialogue sequences). These filmmakers have also demonstrated similar respect for viewer attention and the difficulty of actively locating and encoding visual information in a complex visual composition in their more recent 3D movies. Ang Lee’s Life of Pi (2012) uses the visual volume created by stereoscopic presentation to its full potential. Characters inhabit layers within the volume as foreground and background objects fluidly slide around each other within this space. The lessons Lee and his editor Tim Squyres learned on Hulk (2003) clearly informed the decisions they made when tackling their first 3D film and allowed them to avoid some of the issues most 3D films experience, such as eye strain, sudden unexpected shifts in depth and an inability to ensure viewers are attending to the part of the image easiest to fuse across the two eye images (Banks, Read, Allison & Watt, 2012).

Watching Audio

I now turn to another topic featured in this special issue, the influence of audio on gaze (Robinson, Stadler and Rassell, this issue). Film and TV are inherently multimodal. Both media have always existed as a combination of visual and audio information. Even early silent film was almost always presented with either live musical accompaniment or a narrator. As such, the relative lack of empirical investigation into how the combination of audio and visual input influences how we perceive movies and, specifically, how we attend to them is surprising. Robinson, Stadler and Rassell (this issue) have attempted to address this omission by comparing eye movements for participants watching either the original version of the Omaha beach sequence from Steven Spielberg’s Saving Private Ryan (1998) or the same sequence with the sound removed. This film sequence is a great choice for investigating AV influences on viewer experience as the intensity of the action, the hand-held cinematography and the immersive soundscape all work together to create a disorientating embodied experience for the viewer. The authors could have approached this question by simply showing a set of participants the sequence with audio and qualitatively describing the gaze behaviour at interesting AV moments during the sequence. Such description of the data would have served as inspiration for further investigation but in itself cannot say anything about the causal contribution of audio to this behaviour, as there would be nothing to compare the behaviour to. Thankfully, the authors avoided this problem by choosing to manipulate the audio.

In order to identify the causal contribution of any factor you need to design an experiment in which that factor (known as the Independent Variable) is either removed or manipulated, and the significant impact of this manipulation on the behaviour of interest (known as the Dependent Variable) is tested using appropriate inferential statistics. I commend Robinson, Stadler and Rassell’s experimental design as they present such a manipulation and are therefore able to produce data that will allow them to test their hypotheses about the causal impact of audio on viewer gaze behaviour. Several other papers in this special issue (Redmond, Sita and Vincs; Batty, Perkins and Sita) discuss gaze data (typically in the form of scanpaths or heatmaps) from one viewing condition without quantifying its difference from another viewing condition. As such, they are only able to describe the gaze data, not use it to test hypotheses. There is always a temptation to attribute too much meaning to a gaze heatmap (I too am guilty of this; Smith, 2013) due to their seemingly intuitive nature (i.e., they looked here and not there) but, as with all psychological measures, they are only as good as the experimental design within which they are employed.[vi]

Qualitative interpretation of individual fixation locations, scanpaths or group heatmaps is useful for informing initial interpretation of which visual details are most likely to make it into later visual processing (e.g. perception, encoding and long term memory representations) but care has to be taken not to falsely assume that fixation equals awareness (Smith, Lamont and Henderson, 2012). Also, the visual form of gaze heatmaps varies widely depending on how many participants contribute to the heatmap, which parameters you choose to generate the heatmaps and which oculomotor measures the heatmap represents (Holmqvist et al., 2011). For example, I have demonstrated that, unlike during reading, visual encoding during scene perception requires over 150ms during each fixation (Rayner, Smith, Malcolm and Henderson, 2009). This means that if fixations with durations less than 150ms are included in a heatmap, it may suggest that parts of the image have been processed which were in actual fact fixated too briefly to be processed adequately. Similarly, heatmaps representing fixation duration instead of just fixation location have been shown to be a better representation of visual processing (Henderson, 2003). Heatmaps have an immediate allure but care has to be taken about imposing too much meaning on them, especially when the gaze and the image are changing over time (see Smith and Mital, 2013; and Sawahata et al, 2008 for further discussion). As eye tracking hardware becomes more available to researchers from across a range of disciplines we need to work harder to ensure that it is not used inappropriately and that the conclusions drawn from eye tracking data are theoretically and statistically motivated (see Rayner, 1998; and Holmqvist et al, 2013 for clear guidance on how to conduct sound eye tracking studies).
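
By way of illustration, the following minimal sketch shows one way these two cautions might be operationalised when building a heatmap: discarding fixations shorter than 150ms and weighting the remaining fixations by their duration. The data structure (a list of x/y pixel coordinates plus durations) and all parameter values are hypothetical, not taken from any of the studies discussed here.

```python
import numpy as np

def duration_weighted_heatmap(fixations, screen_w=1280, screen_h=720,
                              min_duration_ms=150, sigma_px=40):
    """Build a duration-weighted gaze heatmap, dropping fixations that were
    too brief (< 150 ms) for full visual encoding during scene viewing.

    `fixations` is a hypothetical list of (x, y, duration_ms) tuples.
    """
    heatmap = np.zeros((screen_h, screen_w))
    ys, xs = np.mgrid[0:screen_h, 0:screen_w]
    for x, y, dur in fixations:
        if dur < min_duration_ms:
            continue  # too brief to assume the fixated content was processed
        # Add a Gaussian 'blob' centred on the fixation, scaled by its duration
        heatmap += dur * np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2 * sigma_px ** 2))
    if heatmap.max() > 0:
        heatmap /= heatmap.max()  # normalise to [0, 1] for display
    return heatmap

# Example with invented fixations: (x, y, duration in ms)
example = [(640, 360, 320), (700, 300, 90), (200, 500, 450)]
hm = duration_weighted_heatmap(example)
```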

Given that Robinson, Stadler and Rassell (this issue) manipulated the critical factor, i.e., the presence of audio, the question now is whether their study tells us anything new about the AV influences on gaze during film viewing. To examine the influence of audio they chose two traditional methods for expressing the gaze data: area of interest (AOI) analysis and dispersal. By using nine static (relative to the screen) AOIs they were able to quantify how much time the gaze spent in each AOI and utilise this measure to work out how distributed gaze was across all AOIs. Using these measures they reported a trend towards greater dispersal in the mute condition compared to the audio condition and a small number of significant differences in the amount of time spent in some regions across the audio conditions.

However, the conclusions we can draw from these findings are seriously hindered by the low sample size (only four participants were tested, meaning that any statistical test is unlikely to reveal significant differences) and by the static AOIs that did not move with the image content. By locking the AOIs to static screen coordinates their AOI measures express the deviation of gaze relative to these coordinates, not to the image content. This approach can be informative for quantifying gaze exploration away from the screen centre (Mital, Smith, Hill and Henderson, 2011) but in order to draw conclusions about what was being fixated the gaze needs to be quantified relative to dynamic AOIs that track objects of interest on the screen (see Smith and Mital, 2013). For example, their question about whether we fixate a speaker’s mouth more in scenes where the clarity of the speech is reduced by background noise (i.e., their “Indistinct Dialogue” scene) has previously been investigated in studies that have manipulated the presence of audio (Võ, Smith, Mital and Henderson, 2012) or the level of background noise (Buchan, Paré and Munhall, 2007) and measured gaze to dynamic mouth regions. As Robinson, Stadler and Rassell correctly predicted, lip reading increases as speech becomes less distinct or as the listener’s linguistic competence in the spoken language decreases (see Võ et al, 2012 for review).
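
To illustrate the difference between static and dynamic AOIs, the toy sketch below tests each gaze sample against a mouth bounding box defined for its own frame (such as might be supplied by a face tracker). The per-frame gaze samples and AOI coordinates are hypothetical; the point is only that the region of interest moves with the image content rather than being fixed to screen coordinates.

```python
def proportion_in_dynamic_aoi(gaze_by_frame, aoi_by_frame):
    """Proportion of frames on which gaze fell inside a moving AOI.

    `gaze_by_frame`: hypothetical dict {frame: (x, y)} of gaze coordinates.
    `aoi_by_frame`: hypothetical dict {frame: (left, top, right, bottom)}
    giving, e.g., the mouth bounding box for that frame from a face tracker.
    """
    hits, total = 0, 0
    for frame, (gx, gy) in gaze_by_frame.items():
        if frame not in aoi_by_frame:
            continue  # the AOI (e.g. the mouth) is not visible on this frame
        left, top, right, bottom = aoi_by_frame[frame]
        total += 1
        if left <= gx <= right and top <= gy <= bottom:
            hits += 1
    return hits / total if total else float('nan')
```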

Similarly, by measuring gaze dispersal using a limited number of static AOIs they lose considerable nuance in the gaze data and have to resort to qualitative description of unintuitive bar charts (Figure 4). There exist several methods for quantifying gaze dispersal (see Smith and Mital, 2013, for review) and even open-source tools for calculating this measure and comparing dispersal across groups (Le Meur and Baccino, 2013). Some methods are as easy to calculate as, if not easier than, the static AOIs used in the present study. For example, the Euclidean distance between the screen centre and the x/y gaze coordinates at each frame of the movie provides a rough measure of how spread out the gaze is from the screen centre (typically the default viewing location; Mital et al, 2011), and a similar calculation can be performed between the gaze positions of all participants within a viewing condition to get a measure of group dispersal.
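
As a rough sketch of both calculations, assuming hypothetical arrays of per-frame x/y gaze coordinates (one array per viewer) and an assumed 1280 x 720 screen, the first function returns each viewer's frame-by-frame distance from the screen centre and the second returns the mean pairwise distance between viewers as an index of group dispersal:

```python
import numpy as np
from itertools import combinations

def distance_from_centre(gaze_xy, centre=(640, 360)):
    """Per-frame Euclidean distance of one viewer's gaze from the screen centre.
    `gaze_xy` is a hypothetical (n_frames, 2) array of x/y coordinates."""
    return np.linalg.norm(np.asarray(gaze_xy) - np.asarray(centre), axis=1)

def group_dispersal(gaze_per_viewer):
    """Mean pairwise inter-viewer distance at each frame: a rough measure of
    how spread out the group's gaze is (lower = greater attentional synchrony).
    `gaze_per_viewer` is a hypothetical list of (n_frames, 2) arrays, one per viewer."""
    gaze = np.stack([np.asarray(g) for g in gaze_per_viewer])  # (viewers, frames, 2)
    pair_dists = [np.linalg.norm(gaze[i] - gaze[j], axis=1)
                  for i, j in combinations(range(len(gaze)), 2)]
    return np.mean(pair_dists, axis=0)  # one dispersal value per frame
```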

Using such measures, Coutrot and colleagues (2012) showed that gaze dispersal is greater when you remove audio from dialogue film sequences; they also observed shorter amplitude saccades and marginally shorter fixation durations. However, I have recently shown that a non-dialogue sequence from Sergei Eisenstein’s Alexander Nevsky (1938) does not show significant differences in eye movement metrics when the accompanying music is removed (Smith, 2014). This difference in findings points towards interesting differences in the impact that diegetic sound (within the depicted scene, e.g. dialogue) and non-diegetic sound (outside of the depicted scene, e.g. the musical score) may have on gaze guidance. It also highlights how some cinematic features may have a greater impact on aspects of a viewer’s experience other than those measurable by eye tracking, such as physiological markers of arousal and emotional states. This is also the conclusion that Robinson, Stadler and Rassell come to.

Listening to the Data (aka, What is Eye Tracking Good For?)

The methodological concerns I have raised in the previous section lead nicely to the article by William Brown, entitled There’s no I in Eye Tracking: How useful is Eye Tracking to Film Studies (this issue). I have known William Brown for several years through our attendance of the Society for Cognitive Studies of the Moving Image (SCSMI) annual conference and I have a deep respect for his philosophical approach to film and his ability to incorporate empirical findings from the cognitive neurosciences, including some references to my own work, into his theories. Therefore, it comes as something of a surprise that his article openly attacks the application of eye tracking to film studies. However, I welcome Brown’s criticisms as they provide me with an opportunity to address some general assumptions about the scientific investigation of film and hopefully suggest future directions in which eye tracking research can avoid falling into some of the pitfalls Brown identifies.

Brown’s main criticisms of current eye tracking research are: 1) eye tracking studies neglect “marginal” viewers or marginal ways of watching movies; 2) studies so far have neglected “marginal” films; 3) they only provide “truisms,” i.e., already known facts; and 4) they have an implicit political agenda to argue that the only “true” way to study film is a scientific approach and that the “best” way to make a film is to ensure homogeneity of viewer experience. I will address these criticisms in turn, but before I do so I would like to state that many of Brown’s arguments could be recast as arguments against science in general and are built upon a misunderstanding of how scientific studies should be conducted and what they mean.

To respond to Brown’s first criticism that eye tracking “has up until now been limited somewhat by its emphasis on statistical significance – or, put simply, by its emphasis on telling us what most viewers look at when they watch films” (Brown, this issue; 1), I first have to subdivide the criticism into ‘the search for significance’ and ‘attentional synchrony,’ i.e., how similar gaze is across viewers (Smith and Mital, 2013). Brown tells an anecdote about a Dutch film scholar whose data had to be excluded from an eye tracking study because they did not look where the experimenter wanted them to look. I wholeheartedly agree with Brown that this sounds like a bad study, as data should never be excluded for subjective reasons such as not supporting the hypothesis, i.e., not looking as predicted. However, exclusion for statistical reasons is valid if the research question being tested relates to how representative the behaviour of a small set of participants (known as the sample) is of the overall population. To explain when such a decision is valid, and to respond to Brown’s criticism about only ‘searching for significance,’ I will first need to provide a brief overview of how empirical eye tracking studies are designed and why significance testing is important.

For example, if we were interested in the impact sound has on the probability of fixating an actor’s mouth (e.g., Robinson, Stadler and Rassell, this issue) we would need to compare the gaze behaviour of a sample of participants who watch a sequence with the sound turned on to a sample who watch it with the sound turned off. By comparing the behaviour of these two groups using inferential statistics we are testing the likelihood that the two viewing conditions would differ in the population of all viewers, given the variation within and between the two groups. In actual fact we do this by performing the opposite test: testing the probability that the two groups belong to a single, statistically indistinguishable population. This is known as the null hypothesis. By showing that our data would be very unlikely (conventionally, less than a 5% chance) if the null hypothesis were true, we can conclude that the difference is statistically significant and that another sample of participants presented with the same two viewing conditions would be likely to show a similar difference in viewing behaviour.
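
To make the logic concrete, here is a minimal sketch of such a comparison using an independent-samples t-test. The numbers are entirely invented for illustration and are not data from Robinson, Stadler and Rassell's study; each value stands for one participant's proportion of viewing time spent on the speaker's mouth AOI.

```python
import numpy as np
from scipy import stats

# Hypothetical data: proportion of viewing time on the mouth AOI per participant.
sound_on = np.array([0.32, 0.41, 0.28, 0.35, 0.38, 0.30, 0.44, 0.36])
mute     = np.array([0.45, 0.52, 0.39, 0.48, 0.41, 0.55, 0.47, 0.50])

# Independent-samples t-test of the null hypothesis that both samples
# come from populations with the same mean.
t, p = stats.ttest_ind(sound_on, mute)
print(f"t = {t:.2f}, p = {p:.4f}")
# If p < .05 we reject the null hypothesis and conclude that the difference
# between conditions is unlikely to be due to sampling variation alone.
```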

In order to test whether our two viewing conditions belong to one or two distributions we need to be able to express those distributions. This is typically done by identifying the mean score for each participant on the dependent variable of interest, in this case the probability of fixating a dynamic mouth AOI, then calculating the mean of this measure across all participants within a group along with their variation in scores (known as the standard deviation). Most natural measures produce a distribution of scores looking somewhat like a bell curve (known as the normal distribution), with most observations near the centre of the distribution and an ever decreasing number of observations as you move away from this central score. Each observation (in our case, each participant) can be expressed relative to this distribution by subtracting the mean of the distribution from its score and dividing by the standard deviation. This converts a raw score into a normalised or z-score. For normally distributed data, roughly ninety-five percent of all observations will fall within two standard deviations of the mean. This means that observations with an absolute z-score greater than two are highly unrepresentative of that distribution and may be considered outliers.
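
A short sketch of the z-score calculation just described, again using invented per-participant scores: any participant whose absolute z-score exceeds two would be flagged as a potential outlier relative to the group distribution.

```python
import numpy as np

# Hypothetical per-participant scores: probability of fixating the mouth AOI.
scores = np.array([0.36, 0.41, 0.33, 0.39, 0.37, 0.35, 0.40, 0.71])

# z-score: (raw score - group mean) / group standard deviation
z = (scores - scores.mean()) / scores.std(ddof=1)
potential_outliers = np.where(np.abs(z) > 2)[0]  # |z| > 2: unrepresentative of the group
print(z.round(2), potential_outliers)
```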

However, being unrepresentative of the group mean is insufficient motivation to exclude a participant. The outlier still belongs to the group distribution and should be included unless there is a supporting reason for exclusion, such as measurement error, e.g. poor calibration of the eye tracker. If an extreme outlier is not excluded it can often have a disproportionate impact on the group mean and make statistical comparison of groups difficult. However, if this is the case it suggests that the sample size is too small and not representative of the overall population. Correct choice of sample size, given an estimate of the predicted effect size, combined with minimising measurement error should mean that subjective decisions do not have to be made about whose data is “right” and who should be included or excluded.

Brown also believes that eye tracking research has so far marginalised viewers who have atypical ways of watching film, such as film scholars, either by not studying them or by treating them as statistical outliers and excluding them from analyses. However, I would argue that the only way to know whether their way of watching a film is atypical is to first map out the distribution of how viewers typically watch films. If a viewer attended more to the screen edge than the majority of other viewers in a random sample of the population (as was the case with Brown’s film scholar colleague) this should show up as a large z-score when their gaze data is expressed relative to the group on a suitable measure such as Euclidean distance from the screen centre. Similarly, a non-native speaker of English may have appeared as an outlier in terms of how much time they spent looking at the speaker’s mouth in Robinson, Stadler and Rassell’s (this issue) study. Such idiosyncrasies may be of interest to researchers, and there are statistical methods for expressing emergent groupings within the data (e.g. cluster analysis) or for seeing whether group membership predicts behaviour (e.g. regression). These approaches may not have previously been applied to questions of film viewing but this is simply due to the immaturity of the field and the limited availability of the equipment and expertise needed to conduct such studies.
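
As a purely illustrative sketch of the clustering approach (the feature values below are invented), viewers described by a couple of summary gaze measures can be grouped without imposing any categories in advance, letting 'atypical' viewing styles emerge from the data rather than being treated as noise.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical per-viewer gaze features:
# mean distance from screen centre (px) and mean fixation duration (ms).
features = np.array([
    [110, 310], [125, 295], [118, 305],   # e.g. centre-focused viewers
    [260, 410], [275, 430], [250, 420],   # e.g. more exploratory viewers
])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(features)
print(kmeans.labels_)  # emergent grouping of viewers, with no labels imposed in advance
```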

In my own recent work I have shown how viewing task influences how we watch unedited video clips (Smith and Mital, 2013), how infants watch TV (Wass and Smith, in press), how infant gaze differs from adult gaze (Smith, Dekker, Mital, Saez De Urabain and Karmiloff-Smith, in prep) and even how film scholars attend to and remember a short film compared with non-expert film viewers (Smith and Smith, in prep). Such group viewing differences are of great interest to me and I hope these studies illustrate how much eye tracking has to offer such research questions if the right statistics and experimental designs are employed.

Brown’s second main criticism is that the field of eye tracking neglects “marginal” films. I agree that the majority of films that have so far been used in eye tracking studies could be considered mainstream. For example, the film/TV clips used in this special issue include Sherlock (2010), Up (2009) and Saving Private Ryan (1998). However, this limit is simply a sign of how few eye tracking studies of moving images there have been. All research areas take time to fully explore the range of possible research questions within that area.

I have always employed a range of films from diverse film traditions, cultures, and languages. My first published eye tracking study (Smith and Henderson, 2008) used film clips from Citizen Kane (1941), Dogville (2003), October (1928), Requiem for a Dream (2000), Dancer in the Dark (2000), Koyaanisqatsi (1982) and Blade Runner (1982). Several of these films may be considered “marginal” relative to the mainstream. If I have chosen to focus most of my analyses on mainstream Hollywood cinema, this is only because such films were the most suitable exemplars of the phenomena I was investigating, such as continuity editing and its creation of a universal pattern of viewing (Smith, 2006; 2012). This interest is not because, as Brown argues, I have a hidden political agenda or an implicit belief that this style of filmmaking is the “right” way to make films. I am interested in this style because it is the dominant style and, as a cognitive scientist, I wish to use film as a way of understanding how most people process audiovisual dynamic scenes.

Hollywood film stands as a wonderfully rich example of what filmmakers think “fits” human cognition. By testing filmmaker intuitions and seeing what impact particular compositional decisions have on viewer eye movements and behavioural responses I hope to gain greater insight into how audiovisual perception operates in non-mediated situations (Smith, Levin and Cutting, 2012). But, just as a neuropsychologist can learn about typical brain function by studying patients with pathologies such as lesions and strokes, I can also learn about how we perceive a “typical” film by studying how we watch experimental or innovative films. My previous work is testament to this interest (Smith, 2006; 2012a; 2012b; 2014; Smith & Henderson, 2008) and I hope to continue finding intriguing films to study and further my understanding of film cognition.

One practical reason why eye tracking studies rarely use foreign language films is the presence of subtitles. As has been comprehensively demonstrated by other authors in this special issue (Kruger, Szarkowska and Krejtz, this issue) and earlier in this article, the sudden appearance of text on the screen, even if it is incomprehensible, leads to differences in eye movement behaviour. This invalidates the use of eye tracking as a way to measure how the filmmaker intended to shape viewer attention and perception. The alternatives would be to either use silent film (an approach I employed with October; Smith and Henderson, 2008), remove the audio (which changes gaze behaviour and awareness of editing; Smith and Martin-Portugues Santacreu, under review) or use dubbing (which can bias the gaze down to the poorly synched lips; Smith, Batten, and Bedford, 2014). None of these options are ideal for investigating foreign language sound film and, until there is a suitable methodological solution, this will restrict experimental eye tracking studies to films in a participant’s native language.

Finally, I would like to counter Brown’s assertion that eye tracking investigations of film have so far only generated “truisms”. I admit that there is often a temptation to reduce empirical findings to simplified take-home messages that only seem to confirm previous intuitions, such as a bias of gaze towards the screen centre, speaking faces, moving objects or subtitles. However, I would argue that such messages fail to appreciate the nuance in the data. Empirical data, correctly measured and analysed, can provide subtle insights into a phenomenon that subjective introspection could never supply.

For example, film editors believe that an impression of continuous action can be created across a cut by overlapping somewhere between two (Anderson, 1996) and four frames (Dmytryk, 1986) of the action. However, psychological investigations of time perception revealed that our judgements of duration depend on how attention is allocated during the estimated period (Zakay and Block, 1996) and will vary depending on whether our eyes remain still or saccade during the period (Yarrow et al, 2001). In my thesis (Smith, 2006) I used simplified film stimuli to investigate the role that visual attention played in estimation of temporal continuity across a cut and found that participants experienced an overlap of 58.44ms as continuous when an unexpected cut occurred during fixation and an omission of 43.63ms as continuous when they performed a saccade in response to the cut. As different cuts may result in different degrees of overt (i.e., eye movements) and covert attentional shifts, these empirical findings support editor intuitions that temporal continuity varies between cuts (Dmytryk, 1986) while also explaining the factors that influence time perception at a level of precision not possible through introspection.

Reflecting on our own experience of a film suffers from the fact that it relies on our own senses and cognitive abilities to identify, interpret and express what we experience. I may feel that my experience of a dialogue sequence from Antichrist (2009) differs radically from a similar sequence from Secrets & Lies (1996) but I would be unable to attribute these differences to different aspects of the two scenes without quantifying both the cinematic features and my responses to them. Without isolating individual features I cannot know their causal contribution to my experience. Was it the rapid camera movements in Antichrist, the temporally incongruous editing, the emotionally extreme dialogue, or the combination of these features that made me feel so unsettled whilst watching the scene? If one is not interested in understanding the causal contributions of each cinematic decision to an audience member’s response, then one may be content with informed introspection and not find empirical hypothesis testing the right method. I make no judgement about the validity of either approach as long as each researcher understands the limits of their approach.

Introspection utilises the imprecise measurement tool that is the human brain and is therefore subject to distortion, human bias and an inability to extrapolate the subjective experience of one person to another. Empirical hypothesis testing also has its limitations: research questions have to be clearly formulated so that hypotheses can be stated in a way that allows them to be statistically tested using appropriate, observable and reliable measurements. A failure at any of these stages can invalidate the conclusions that can be drawn from the data. For example, an eye tracker may be poorly calibrated, resulting in an inaccurate record of where somebody was looking, or it could be used to test an ill-formed hypothesis, such as whether a particular film sequence caused attentional synchrony without having another film sequence to compare the gaze data to. Each approach has its strengths and weaknesses and no single approach should be considered “better” than any other, just as no film should be considered “better” than any other film.
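To make the point about comparison concrete, here is a purely illustrative Python sketch (my own, not a published analysis) that computes a crude per-frame attentional synchrony measure, the mean gaze dispersion around the group centroid, for two clips and then compares them; the simulated data, array layout and the use of a simple two-sample t-test are all assumptions, and with real recordings the temporal dependence between frames would need more careful statistical treatment.

    import numpy as np
    from scipy import stats

    def per_frame_dispersion(gaze):
        """Crude attentional synchrony measure: for each frame, the mean
        Euclidean distance of viewers' gaze from the group centroid.
        Lower values indicate more synchronous viewing. 'gaze' is assumed
        to have shape (n_viewers, n_frames, 2)."""
        centroid = gaze.mean(axis=0)                                  # (n_frames, 2)
        return np.linalg.norm(gaze - centroid, axis=-1).mean(axis=0)  # (n_frames,)

    # Simulated gaze for two hypothetical clips (stand-ins for real recordings)
    rng = np.random.default_rng(1)
    clip_a = rng.normal([960.0, 540.0], 80.0, size=(16, 300, 2))   # tighter clustering
    clip_b = rng.normal([960.0, 540.0], 200.0, size=(16, 300, 2))  # looser clustering

    disp_a = per_frame_dispersion(clip_a)
    disp_b = per_frame_dispersion(clip_b)

    # Without a second clip there is nothing to test against; with it,
    # a simple comparison of dispersion between the two sequences is possible.
    t, p = stats.ttest_ind(disp_a, disp_b)
    print(f"mean dispersion A = {disp_a.mean():.1f}px, B = {disp_b.mean():.1f}px, p = {p:.3g}")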

Conclusion

The articles collected here constitute the first attempt to bring together interdisciplinary perspectives on the application of eye tracking to film studies. I fully commend the intention of this special issue and hope that it encourages future researchers to conduct further studies using these methods to investigate research questions and film experiences we have not even conceived of. However, given that the recent release of low-cost eye tracking peripherals such as the EyeTribe[vii] tracker and the Tobii EyeX[viii] has moved eye tracking from a niche and highly expensive research tool to an accessible option for researchers in a range of disciplines, I need to take this opportunity to issue a word of warning. As I have outlined in this article, eye tracking is like any other research tool in that it is only useful if it is used correctly, its limitations are respected, its data is interpreted through the appropriate application of statistics, and conclusions are drawn only on the basis of the data in combination with a sound theoretical base. Eye tracking is not the “saviour” of film studies, nor is science the only “valid” way to investigate somebody’s experience of a film. Hopefully, the articles in this special issue and the ideas I have put forward here suggest how eye tracking can function within an interdisciplinary approach to film analysis that furthers our appreciation of film in previously unfathomed ways.

 

Acknowledgements

Thanks to Rachael Bedford, Sean Redmond and Craig Batty for comments on earlier drafts of this article. Thank you to John Henderson, Parag Mital and Robin Hill for help in gathering and visualising the eye movement data used in the Figures presented here. Their work was part of the DIEM Leverhulme Trust funded project (https://thediemproject.wordpress.com/). The author, Tim Smith, is funded by EPSRC (EP/K012428/1), Leverhulme Trust (PLP-2013-028) and BIAL Foundation grant (224/12).

 

References

Anderson, Joseph. 1996. The Reality of Illusion: An Ecological Approach to Cognitive Film Theory. Southern Illinois University Press.

Batty, Craig, Claire Perkins and Jodi Sita. 2015. “How We Came To Eye Tracking Animation: A Cross-Disciplinary Approach to Researching the Moving Image”, Refractory: a Journal of Entertainment Media, 25.

Banks, Martin S., Jenny R. Read, Robert S. Allison and Simon J. Watt. 2012. “Stereoscopy and the human visual system.” SMPTE Motion Imaging Journal, 121(4), 24-43.

Bradley, Margaret M., Laura Miccoli, Miguel A. Escrig and Peter J. Lang. 2008. “The pupil as a measure of emotional arousal and autonomic activation.” Psychophysiology, 45(4), 602-607.

Branigan, Edward R. 1984. Point of View in the Cinema: A Theory of Narration and Subjectivity in Classical Film. Berlin: Mouton.

Brown, William. 2015. “There’s no I in Eye Tracking: How Useful is Eye Tracking to Film Studies?”, Refractory: a Journal of Entertainment Media, 25.

Buchan, Julie N., Martin Paré and Kevin G. Munhall. 2007. “Spatial statistics of gaze fixations during dynamic face processing.” Social Neuroscience, 2, 1–13.

Coutrot, Antoine, Nathalie Guyader, Gelu Ionescu and Alice Caplier. 2012. “Influence of Soundtrack on Eye Movements During Video Exploration”, Journal of Eye Movement Research 5, no. 4.2: 1-10.

Cutting, James. E., Jordan E. DeLong and Christine E. Nothelfer. 2010. “Attention and the evolution of Hollywood film.” Psychological Science, 21, 440-447.

Dwyer, Tessa. 2015. “From Subtitles to SMS: Eye Tracking, Texting and Sherlock”, Refractory: a Journal of Entertainment Media, 25.

Dyer, Adrian. G and Sarah Pink. 2015. “Movement, attention and movies: the possibilities and limitations of eye tracking?”, Refractory: a Journal of Entertainment Media, 25.

Dmytryk, Edward. 1986. On Filmmaking. London, UK: Focal Press.

Henderson, John M. 2003. “Human gaze control during real-world scene perception.” Trends in Cognitive Sciences, 7, 498-504.

Hochberg, Julian and Virginia Brooks. 1978. “Film Cutting and Visual Momentum”. In John W. Senders, Dennis F. Fisher and Richard A. Monty (Eds.), Eye Movements and the Higher Psychological Functions (pp. 293-317). Hillsdale, NJ: Lawrence Erlbaum.

Holmqvist, Kenneth, Marcus Nyström, Richard Andersson, Richard Dewhurst, Halszka Jarodzka and Joost van de Weijer. 2011. Eye Tracking: A comprehensive guide to methods and measures. Oxford, UK: Oxford University Press.

James, William. 1890. The principles of psychology (Vol.1). New York: Holt

Kruger, Jan Louis, Agnieszka Szarkowska and Izabela Krejtz. 2015. “Subtitles on the Moving Image: An Overview of Eye Tracking Studies”, Refractory: a Journal of Entertainment Media, 25.

Le Meur, Olivier and Thierry Baccino. 2013. “Methods for comparing scanpaths and saliency maps: strengths and weaknesses.” Behavior Research Methods, 45(1), 251-266.

Magliano, Joseph P. and Jeffrey M. Zacks. 2011. “The Impact of Continuity Editing in Narrative Film on Event Segmentation.” Cognitive Science, 35(8), 1-29.

Mital, Parag K., Tim J. Smith, Robin Hill and John M. Henderson. 2011. “Clustering of gaze during dynamic scene viewing is predicted by motion.” Cognitive Computation, 3(1), 5-24.

Rayner, Keith. 1998. “Eye movements in reading and information processing: 20 years of research”. Psychological Bulletin, 124(3), 372-422.

Rayner, Keith, Tim J. Smith, George Malcolm and John M. Henderson. 2009. “Eye movements and visual encoding during scene perception.” Psychological Science, 20, 6-10.

Raz, Gal, Yael Jacob, Tal Gonen, Yonatan Winetraub, Tamar Flash, Eyal Soreq and Talma Hendler. 2014. “Cry for her or cry with her: context-dependent dissociation of two modes of cinematic empathy reflected in network cohesion dynamics.” Social cognitive and affective neuroscience, 9(1), 30-38.

Redmond, Sean, Jodi Sita and Kim Vincs. 2015. “Our Sherlockian Eyes: the Surveillance of Vision”, Refractory: a Journal of Entertainment Media, 25.

Robinson, Jennifer, Jane Stadler and Andrea Rassell. 2015. “Sound and Sight: An Exploratory Look at Saving Private Ryan through the Eye-tracking Lens”, Refractory: a Journal of Entertainment Media, 25.

Salt, Barry. 2009. Film Style and Technology: History and Analysis (3rd ed.). Totton, Hampshire, UK: Starword.

Sawahata, Yasuhito, Rajiv Khosla, Kazuteru Komine, Nobuyuki Hiruma, Takayuki Itou, Seiji Watanabe, Yuji Suzuki, Yumiko Hara and Nobuo Issiki. 2008. “Determining comprehension and quality of TV programs using eye-gaze tracking.” Pattern Recognition, 41(5), 1610-1626.

Smith, Murray. 2011. “Triangulating Aesthetic Experience”, paper presented at the annual Society for Cognitive Studies of the Moving Image conference, Budapest, June 8–11, 2011.

Smith, Tim J. 2006. An Attentional Theory of Continuity Editing. Ph.D., University of Edinburgh, Edinburgh, UK.

Smith, Tim J. 2012a. “The Attentional Theory of Cinematic Continuity”, Projections: The Journal for Movies and the Mind. 6(1), 1-27.

Smith, Tim J. 2012b. “Extending AToCC: a reply,” Projections: The Journal for Movies and the Mind. 6(1), 71-78

Smith, Tim J. 2013. “Watching you watch movies: Using eye tracking to inform cognitive film theory.” In A. P. Shimamura (Ed.), Psychocinematics: Exploring Cognition at the Movies (pp. 165-191). New York: Oxford University Press.

Smith, Tim J. 2014. “Audiovisual correspondences in Sergei Eisenstein’s Alexander Nevsky: a case study in viewer attention.” In Cognitive Media Theory (AFI Film Reader), edited by P. Taberham and T. Nannicelli.

Smith, Tim J., Jonathan Batten and Rachael Bedford. 2014. “Implicit detection of asynchronous audiovisual speech by eye movements.” Journal of Vision,14(10), 440-440.

Smith, Tim J., T. Dekker, Parag K. Mital, I. R. Saez De Urabain and A. Karmiloff-Smith. In prep. “Watch like mother: Motion and faces make infant gaze indistinguishable from adult gaze during Tot TV.”

Smith, Tim J. and John M. Henderson. 2008. “Edit Blindness: The relationship between attention and global change blindness in dynamic scenes”. Journal of Eye Movement Research, 2(2):6, 1-17.

Smith, Tim J., Peter Lamont and John M. Henderson. 2012. “The penny drops: Change blindness at fixation.” Perception, 41(4), 489-492.

Smith, Tim J., Daniel Levin and James E. Cutting. 2012. “A Window on Reality: Perceiving Edited Moving Images.” Current Directions in Psychological Science. 21: 101-106

Smith, Tim J. and Parag K. Mital. 2013. “Attentional synchrony and the influence of viewing task on gaze behaviour in static and dynamic scenes”. Journal of Vision 13(8): 16.

Smith, Tim J. and Janet Y. Martin-Portugues Santacreu. Under Review. “Match-Action: The role of motion and audio in limiting awareness of global change blindness in film.”

Smith, Tim J. and Murray Smith. In Prep. “The impact of expertise on eye movements during film viewing.”

Suckfull, Monika. 2000. “Film Analysis and Psychophysiology: Effects of Moments of Impact and Protagonists.” Media Psychology, 2(3), 269-301.

Vilaro, Anna and Tim J. Smith. 2011. “Subtitle reading effects on visual and verbal information processing in films.” Published abstract in Perception, ECVP Abstract Supplement, 40 (p. 153). European Conference on Visual Perception, Toulouse, France.

Velichkovsky, Boris M., Sascha M. Dornhoefer, Sebastian Pannasch and Pieter J. A. Unema. 2001. “Visual fixations and level of attentional processing”. In Andrew T. Duchowski (Ed.), Proceedings of the International Conference Eye Tracking Research & Applications, Palm Beach Gardens, FL, November 6-8. ACM Press.

Wass, Sam V. and Tim J. Smith. In Press. “Visual motherese? Signal-to-noise ratios in toddler-directed television,” Developmental Science

Yarrow, Kielan, Patrick Haggard, Ron Heal, Peter Brown and John C. Rothwell. 2001. “Illusory perceptions of space and time preserve cross-saccadic perceptual continuity”. Nature, 414.

Zakay, Dan and Richard A. Block. 1996. “Role of Attention in Time Estimation Processes.” In Time, Internal Clocks, and Movement. Elsevier Science.

 

Notes

[ii] An alternative take on eye tracking data is to divorce the data itself from psychological interpretation. Instead of viewing a gaze point as an index of where a viewer’s overt attention is focussed and a record of the visual input most likely to be encoded into the viewer’s long-term experience of the media, researchers can instead take a qualitative, or even aesthetic, approach to the data. The gaze point becomes a trace of some aspect of the viewer’s engagement with the film. The patterns of gaze, its movements across the screen and the coordination/disagreement between viewers can inform qualitative interpretation without recourse to visual cognition. Such an approach is evident in several of the articles in this special issue (including Redmond, Sita, and Vincs, this issue; Batty, Perkins, and Sita, this issue). This approach can be interesting and important for stimulating hypotheses about how such patterns of viewing have come about and may be a satisfying endpoint for some disciplinary approaches to film. However, if researchers are interested in testing these hypotheses, further empirical manipulation of the factors that are believed to be important, together with statistical testing, would be required. During such investigation, current theories about what eye movements are and how they relate to cognition must also be respected.

[iii] One promising area of research, however, is the use of pupil diameter changes as an index of arousal (Bradley, Miccoli, Escrig and Lang, 2008).

[iv] This technique has been used for decades by producers of TV advertisements and by some “pop” serials such as Hollyoaks in the UK (thanks to Craig Batty for this observation).

[v] This trend of increasing pace and visual complexity in film is confirmed by statistical analyses of film corpora over time (Cutting, DeLong and Nothelfer, 2010) and has resulted in a backlash and an increasing interest in “slow cinema”.

[vi] Other authors in this special issue may argue that taking a critical approach to gaze heatmaps without recourse to psychology allows them to embed eye tracking within their existing theoretical framework (such as hermeneutics). However, I would warn that eye tracking data is simply a record of how a relatively arbitrary piece of machinery (the eye tracking hardware) and associated software decided to represent the centre of a viewer’s gaze. There are numerous parameters that can be tweaked to massively alter how such gaze traces and heatmaps appear. Without understanding the psychology and the physiology of the human eye, a researcher cannot know how to set these parameters or how much to trust the equipment they are using and the data it is recording, and as a consequence they may over-attribute interpretation to a representation that is not reliable.

[vii] https://theeyetribe.com/ (accessed 13/12/14). The EyeTribe tracker is $99 and is as spatially and temporally accurate (up to 60Hz sampling rate) as some science-grade trackers.

[viii] http://www.tobii.com/eye-experience/ (accessed 13/12/14). The Tobii EyeX tracker is $139, samples at 30Hz and is as spatially accurate as the EyeTribe although the EyeX does not give you as much access to the raw gaze data (e.g., pupil size and binocular gaze coordinates) as the EyeTribe.

 

Bio

Dr Tim J. Smith is a senior lecturer in the Department of Psychological Sciences at Birkbeck, University of London. He applies empirical Cognitive Psychology methods including eye tracking to questions of Film Cognition and has published extensively on the subject both in Psychology and Film journals.

 

From Subtitles to SMS: Eye Tracking, Texting and Sherlock – Tessa Dwyer

Abstract

As we progress into the digital age, text is experiencing a resurgence and reshaping as blogging, tweeting and phone messaging establish new textual forms and frameworks. At the same time, an intrusive layer of text, obviously added in post, has started to feature on mainstream screen media – from the running subtitles of TV news broadcasts to the creative portrayals of mobile phone texting on film and TV dramas. In this paper, I examine the free-floating text used in BBC series Sherlock (2010–). While commentators laud this series for the novel way it integrates text into its narrative, aesthetic and characterisation, eye tracking is required to unpack the cognitive implications involved. Through recourse to eye tracking data on image and textual processing, I revisit distinctions between reading and viewing, attraction and distraction, while addressing a range of issues relating to eye bias, media access and multimodal redundancy effects.

Figure 1: Press conference in ‘A Study in Pink’, Sherlock (2010), Episode 1, Season 1.

Introduction

BBC’s Sherlock (2010–) has received considerable acclaim for its creative deployment of text to convey thought processes and, most notably, to depict mobile phone messaging. Receiving high-profile write-ups in The Wall Street Journal (Dodes, 2013) and Wired UK, this innovative representational strategy has been hailed as an incisive reflection of our current “transhuman” reality and “a core element of the series’ identity” (McMillan 2014).[1] In the following discussion, I deploy eye tracking data to develop an alternate perspective on this phenomenon. While Sherlock’s on-screen text directly engages with the emerging modalities of digital and online technologies, it also borrows from more conventional textual tools like subtitling and captioning or SDH (subtitling for the deaf and hard-of-hearing). Most emphatically, the presence of floating text in Sherlock challenges the presumption that screen media is made to be viewed, not read. To explore this challenge in detail, I bring Sherlock’s inventive titling into contact with eye tracking research on subtitle processing, using insights from audiovisual translation (AVT) studies to investigate the complexities involved in processing dynamic text on moving-image screens. Bridging screen and translation studies via eye tracking, I consider recent on-screen text developments in relation to issues of media access and linguistic diversity, noting the gaps or blind spots that regularly infiltrate research frameworks. Discussion focuses on ‘A Study in Pink’ – the first episode of Sherlock’s initial season – which producer Sue Vertue explains was actually “written and shot last, and so could make the best use of onscreen text as additional script and plot points” (qtd in McMillan, 2014).

Texting Sherlock

Figure 2: Watson reads a text message in ‘A Study in Pink’, Sherlock (2010), Episode 1, Season 1.

The phenomenon under investigation in this article is by no means easy to define. Already it has inspired neologisms, word mashes and acronyms including TELOP (television optical projection), ‘impact captioning’ (Sasamoto, 2014), ‘decotitles’ (Kofoed, 2011), ‘beyond screen text messaging’ (Zhang 2014) and ‘authorial titling’ (Pérez González, 2012). While slight differences in meaning separate such terms from one another, the on-screen text in Sherlock fits all. Hence, in this discussion, I alternate between them and often default to more general terms like ‘titling’ and ‘on-screen text’ for their wide applicability across viewing devices and subject matter. This approach preserves the terminological ambiguity that attaches to this phenomenon instead of seeking to solve it, finding it symptomatic of the rapid rate of technological development with which it engages. Whatever term is decided upon today could well be obsolete tomorrow. Additionally, as Rick Altman (2004: 16) notes in his ‘crisis historiography’ of silent and early sound film, the “apparently innocuous process of naming is actually one of culture’s most powerful forms of appropriation.” He argues that in the context of new technologies and the representational codes they engender, terminological variance and confusion signals an identity crisis “reflected in every aspect of the new technology’s socially defined existence” (19).

According to the write-ups, phone messaging is the hero of BBC’s updated and rebooted Sherlock adaptation. Almost all the press garnered around Sherlock’s on-screen text links this strategy to mobile phone ‘texting’ or SMS (short messaging service). Reporting on “the storytelling challenges of a world filled with unglamorous smartphones, texting and social media”, The Wall Street Journal’s Rachel Dodes (2013) credits Sherlock with solving this dilemma and establishing a new convention for depicting texting on the big screen, creatively capturing “the real world’s digital transformation of everyday life.” For Mariel Calloway (2013), “Sherlock is honest about the role of technology and social media in daily life and daily thought… the seamless way that text messages and internet searches integrate into our lives.” Wired’s Graeme McMillan (2014) ups the ante, naming Sherlock a “new take” on “television drama as a whole” due precisely to its on-screen texting technique that sets it apart from other “tech-savvy shows out there”. McMillan continues that “as with so many aspects of Sherlock, there’s an element of misdirection going on here, with the fun, eye-catching slickness of the visualization distracting from a deeper commentary the show is making about its characters relationship with technology – and, by extension, our own relationship with it, as well.”

As this flurry of media attention makes clear, praise for Sherlock’s on-screen text or texting firmly anchors this strategy to technology and its newly evolving forms, most notably the iPhone or smartphone. Appearing consistently throughout the series’ three seasons to date, on-screen text in Sherlock occurs in a plain, uniform white sans-serif font that appears unadorned over the screen image, obviously added during post-production. This text is superimposed, pure and simple, relying on neither text bubbles nor coloured boxes nor sender IDs to formally separate it from the rest of the image area. As Michele Tepper (2011) eloquently notes, by utilising text in this way, Sherlock “is capturing the viewer’s screen as part of the narrative itself”:

It’s a remarkably elegant solution from director Paul McGuigan. And it works because we, the viewing audience, have been trained to understand it by the last several years of service-driven, multi-platform, multi-screen applications. Last week’s iCloud announcement is just the latest iteration of what can happen when your data is in the cloud and can be accessed by a wide range of smart-enough devices. Your VOIP phone can show caller ID on your TV; your iPod can talk to both your car and your sneakers; Twitter is equally accessible via SMS or a desktop application. It doesn’t matter where or what the screen is, as long as it’s connected to a network device. … In this technological environment, the visual conceit that Sherlock’s text message could migrate from John Watson’s screen to ours makes complete and utter sense.

Unlike on-screen text in Glee (Fox, 2009–), for instance (see Fig. 3), which is used only occasionally in episodes like ‘Feud’ (Season 4, Episode 16, March 14, 2013), Sherlock flaunts its on-screen text as signature. Its consistently interesting textual play helps to give the series cohesion. Yet, just as it aids in characterisation, helps to progress the narrative, and binds the series as a whole, it also, necessarily, remains at somewhat of a remove, as an overtly post-production effect.

Figure 3: Ryder chats online in ‘Feud’, Glee (2013), Episode 16, Season 4.

While Tepper (2011) explains how Sherlock’s “disembodied” (Banks, 2014) texting ‘makes sense’ in the age of cross-platform devices and online clouds, this argument falters when the on-screen text in question is less overtly technological. The extradiegetic nature of this on-screen text – so obviously a ‘post’ effect – is brought to the fore when it is used to render thoughts and emotions rather than technological interfacing. In ‘A Study in Pink’, a large proportion of the text that pops up intermittently on-screen functions to represent Sherlock’s interiority, not his Internet prowess. In concert with camera angles and “microscopic close-ups”, it elucidates Sherlock’s forensic “mind’s eye” (Redmond, Sita and Vincs, this issue), highlighting clues and literally spelling out their significance (see Figs. 4 and 5). The fact that these human-coded moments of titling have received far less attention in the press than those that more directly index new technologies is fascinating in itself, revealing the degree to which praise for Sherlock’s on-screen text is invested in ideas of newness and technological innovation – underlined by the predilection for neologisms.

Figure 4: Sherlock examines the pink lady’s ring in ‘A Study in Pink’, Sherlock (2010), Episode 1, Season 1.

Figure 5: Sherlock examines the pink lady’s ring in ‘A Study in Pink’, Sherlock (2010), Episode 1, Season 1.

Of course, even when not attached to smartphones or data retrieval, Sherlock’s deployment of on-screen text remains fresh, creative and playful and still signals perceptual shifts resulting from technological transformation. Even when representing Sherlock’s thoughts, text flashes on screen manage to recall the excesses of the digital, when email, Facebook and Twitter ensconce us in streams of endlessly circulating words, and textual pop-ups are ubiquitous. Nevertheless, the blinkered way in which Sherlock’s on-screen text is repeatedly framed as, above all, a means of representing mobile phone texting functions to conceal some of its links to older, more conventional forms of titling and textual intervention, from silent-era intertitles to expository titles to subtitles. By relentlessly emphasising its newness, much discussion of Sherlock’s on-screen text overlooks links to a host of related past and present practices. Moreover, Sherlock’s textual play actually invites a rethinking of these older, ongoing text-on-screen devices.

Reading, Watching, Listening

As Szarkowska and Kruger (this issue) explain, research into subtitle processing builds upon earlier eye tracking studies on the reading of static, printed text. They proceed to detail differences between subtitle and ‘regular’ reading, in relation to factors like presentation speed, information redundancy, and sensory competition between different multimodal channels. Here, I focus on differences between saccadic or scanning movements and fixations, in order to compare data across the screen and translation fields. During ‘regular’ reading (of static texts), average saccades last 20 to 50 milliseconds (ms) while fixations range between 100 and 500ms, averaging 200 to 300ms (Rayner, 1998). Referencing pioneering studies into subtitle processing by Géry d’Ydewalle and associates, Szarkowska et al. (2013: 155) note that “when reading film subtitles, as opposed to print, viewers tend to make more regressions” and fixations tend to be shorter. Regressions occur when the eye returns to material that has already been read, and Rayner (1998: 393) finds that slower readers (of static text) make more regressions than faster readers. A study by d’Ydewalle and de Bruycker (2007: 202) found “the percentage of regressions in reading subtitles was globally, among children and adults, much higher than in normal text reading.” They also report that mean fixation durations in the subtitles were shorter, at 178 ms (for adults), and note that subtitle regressions (where the eye travels back across words already read) can be partly explained by the “considerable information redundancy” that occurs when “[s]ubtitle, soundtrack (including the voice and additional information such as intonation, background noise, etc.), and image all provide partially overlapping information, eliciting back and forth shifts with the image and more regressive eye-movements” (202).
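By way of illustration only, the following Python sketch shows how measures of this kind, the proportion of dwell time in the subtitle area, mean fixation duration within it, back-and-forth shifts between subtitle and image, and a simple regression count, might be derived from a list of fixations; the data structure, screen layout, area boundary and left-to-right regression definition are my own assumptions, not the procedure used in the studies discussed here.

    from dataclasses import dataclass

    @dataclass
    class Fixation:
        x: float         # horizontal gaze position in pixels
        y: float         # vertical gaze position in pixels
        duration: float  # fixation duration in milliseconds

    def subtitle_measures(fixations, subtitle_top=980):
        """Illustrative subtitle-reading measures from a fixation list.
        A fixation counts as 'in the subtitle area' if it falls below
        subtitle_top, a hypothetical boundary near the bottom of a
        1080-pixel-high screen."""
        in_sub = [f.y >= subtitle_top for f in fixations]

        # Proportion of total dwell time spent in the subtitle area
        total = sum(f.duration for f in fixations) or 1.0
        dwell_prop = sum(f.duration for f, s in zip(fixations, in_sub) if s) / total

        # Mean fixation duration within the subtitle area
        sub_fix = [f for f, s in zip(fixations, in_sub) if s]
        mean_sub_dur = sum(f.duration for f in sub_fix) / len(sub_fix) if sub_fix else 0.0

        # Back-and-forth shifts between the subtitle and image areas
        shifts = sum(1 for a, b in zip(in_sub, in_sub[1:]) if a != b)

        # Regressions within the subtitle: a fixation landing to the left of
        # the previous subtitle fixation (left-to-right scripts assumed;
        # line breaks within two-line subtitles are ignored here)
        xs = [f.x for f in sub_fix]
        regressions = sum(1 for a, b in zip(xs, xs[1:]) if b < a)

        return dwell_prop, mean_sub_dur, shifts, regressions

    # Toy example: five fixations, three of them in the subtitle area
    fix = [Fixation(400, 500, 300), Fixation(300, 1020, 180),
           Fixation(600, 1030, 170), Fixation(450, 1025, 160),
           Fixation(800, 400, 350)]
    print(subtitle_measures(fix))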

What happens to saccades and fixations when image processing is brought into the mix? When looking at static images, average fixations last 330 ms (Rayner, 1998). This figure is slightly longer than average fixations during regular reading and longer again than average subtitle fixations. Szarkowska and Kruger (this issue) note that “reading requires many successive fixations to extract information whereas looking at a scene requires fewer, but longer fixations” that tend to be more exploratory or ambient in nature, taking in a greater area of focus. In relation to moving images, Smith (2013: 168) finds that viewers take in roughly 3.8% of the total screen area during an average length shot. Peripheral processing is at play but “is mostly reserved for selecting future saccade targets, tracking moving targets, and extracting gist about scene category, layout and vague object information”. In thinking about these differences in regular reading behaviour, screen viewing, and subtitle processing, it is noticeable that with subtitles, distinctions between fixations and saccades are less clear-cut. While saccades last between 20 and 50ms, Smith (2013: 169) notes that the smallest amount of time taken to perform a saccadic eye movement (taking into account saccadic reaction time) is 100-130ms. Recalling d’Ydewalle and de Bruycker’s (2007: 202) finding that fixations during subtitle processing last around 178ms, it would seem that subtitle conditions blur the boundaries somewhat between saccades and fixations, scanning and reading.

Interestingly, studies have also shown that the processing of two-line subtitles involves more regular word-by-word reading than that of one-liners (D’Ydewalle and de Bruycker, 2007: 199). D’Ydewalle and de Bruycker report, for instance, that more words are skipped and more regressions occur for one-line subtitles than for two-line subtitles. Two-line subtitles result in a larger proportion of time being spent in the subtitle area, and occasion more back-and-forth shifts between the subtitles and the remaining image area (201). This finding suggests that the processing of one-line subtitles differs considerably from regular reading behaviour. D’Ydewalle and de Bruycker (2007: 202) surmise that the distinct way in which one-line subtitles are processed relates to a redundancy effect caused by the multimodal nature of screen media. Noting how one-line subtitles often convey short exclamations and outcries, they suggest that a “standard one-line subtitle generally does not provide much more information than what can already be extracted from the picture and the auditory message.” They conclude that one-line subtitles occasion “less reading” than two-line subtitles (202). Extrapolating further, I posit that the routine overlapping of information that occurs in subtitled screen media blurs lines between reading and watching. One-line subtitles are ‘read’ irregularly and partly blind – that is, they are regularly skipped and processed through saccadic eye movements rather than fixations.

This suggestion is supported by data on subtitle skipping. Szarkowska and Kruger (this issue) find that longer subtitles containing frequently used words are easier and quicker to process than shorter subtitles containing low-frequency words. Hence, they conclude that cognitive load relates more to word familiarity than quantity, something that is overlooked in many professional subtitling guidelines. This finding indicates that high-frequency words are processed ‘differently’ in subtitling than in static text, in a manner more akin to visual recognition or scanning than reading. Szarkowska and Kruger find that high-frequency words in subtitles are often skipped. Hence, as with one-line subtitles, high-frequency words are, to a degree, processed blind, possibly through shape recognition and mapping more than durational focus. In relation to other types of on-screen text, such as the short, free-floating type that characterises Sherlock, it seems entirely possible that this innovative mode of titling may just challenge distinctions between text and image processing. While commentators laud this series for the way it integrates on-screen text into its narrative, style and characterisation, eye tracking is required to unpack the cognitive implications of Sherlock’s text/image morph.

The Pink Lady

Figure 6: Letters scratched into the floor in ‘A Study in Pink’, Sherlock (2010), Episode 1, Season 1.

Sherlock producer Vertue singles out the pink lady scene in ‘A Study in Pink’ as particularly noteworthy for its “text all around the screen”, referring to it as the “best use” of on-screen text in the series (qtd in McMillan, 2014). In this scene, a dead woman dressed in pink lies face down in a derelict building, having painstakingly etched a word or series of letters (‘Rache’) into the floor with her fingernails. As Sherlock investigates the crime scene, forensics officer Anderson interrupts to explain that ‘Rache’ is the German word for ‘revenge’. The German-to-English translation pops up on screen (see Fig. 6), and this time Sherlock sees it too. This superimposed text, so obviously laid over the image, oversteps its surface positioning to enter Sherlock’s diegetic space, and we next view it backwards, from Sherlock’s point of view, not ours (see Fig. 7). After an exasperated eye roll that signals his disregard for Anderson, Sherlock dismisses this textual intervention and we watch it swirl into oblivion. Here, on-screen text is at once both inside and outside the narrative, diegetic and extra-diegetic, informative and affecting. In this way it self-reflexively draws attention to the show’s narrative framing, demonstrating its complexity as distinct diegetic levels merge.

Figure 7: Sherlock sees on-screen text in ‘A Study in Pink’, Sherlock (2010), Episode 1, Season 1.

For Carol O’Sullivan (2011), when on-screen text affords this type of play between the diegetic and extra-diegetic it functions as an “extreme anti-naturalistic device” (166) that she unpacks via Gérard Genette’s notion of narrative metalepsis (164). Detailing numerous examples of humorous, formally transgressive diegetic subtitles, such as those found in Annie Hall (Woody Allen, 1977) (Fig. 8), O’Sullivan points to their metatextual function, referring to them as “metasubtitles” (166) that implicitly comment on the limits and nature of subtitling itself. When Sherlock’s on-screen titles oscillate between character and viewer point-of-view shots, they too become metatextual, demonstrating, in Genette’s terms, “the importance of the boundary they tax their ingenuity to overstep in defiance of verisimilitude – a boundary that is precisely the narrating (or the performance) itself: a shifting but sacred frontier between two worlds, the world in which one tells, the world of which one tells” (qtd in O’Sullivan 2011: 165). Moreover, for O’Sullivan, “all subtitles are metatextual” (166), necessarily foregrounding their own act of mediation and interpretation. Specifically linking such ideas to Sherlock, Luis Pérez González (2012: 18) notes how “the series creators incorporate titles that draw attention to the material apparatus of filmic production”, thereby creating a complex alienation-attraction effect “that shapes audience engagement by commenting upon the diegetic action and disrupting conventional forms of semiotic representation, making viewers consciously work as co-creators of media content.”

Figure 8: Subtitled thoughts in the balcony scene, Annie Hall (1977).

Eye Bias

One finding from subtitle eye tracking research particularly pertinent to Sherlock is the notion that on-screen text causes eye bias. This was established in various studies conducted by d’Ydewalle and associates, which found that subtitle processing is largely automatic and obligatory. D’Ydewalle and de Bruycker (2007: 196) state:

Paying attention to the subtitle at its presentation onset is more or less obligatory and is unaffected by major contextual factors such as the availability of the soundtrack, knowledge of the foreign language in the soundtrack, and important episodic characteristics of actions in the movie: Switching attention from the visual image to “reading” the subtitles happens effortlessly and almost automatically (196).

This point is confirmed by Bisson et al. (2014: 399), who report that participants read subtitles even in ‘reversed’ conditions – that is, when subtitles are rendered in an unfamiliar language and the screen audio is fully comprehensible (in the viewers’ first language) (413). Again, in intralingual or same-language subtitling – when titles replicate the language spoken on screen – hearing audiences still divert to the subtitle area (413). These findings indicate that viewers track subtitles irrespective of language or accessibility requirements. In fact, the tracking of subtitles overrides their function. As Bisson et al. (413) surmise, “the dynamic nature of the subtitles, i.e., the appearance and disappearance of the subtitles on the screen, coupled with the fact that the subtitles contained words was enough to generate reading behavior”.

Szarkowska and Kruger (this issue) reach a similar conclusion, explaining eye bias towards subtitles in terms of both bottom-up and top-down impulses. When subtitles or other forms of text flash up on screen, they effect a change to the scene that automatically pulls our eyes. The appearance and disappearance of text on screen is registered in terms of motion contrast, which, according to Smith (2013: 176), is the “critical component predicting gaze behavior”, attaching to small movements as well as large. Additionally, we are drawn to words on screen because we identify them as a ready source of relevant information, as found in Batty et al. (forthcoming). Analysing a dialogue-free montage sequence from animated feature Up (Pete Docter, 2009), Batty et al. found that on-screen text in the form of signage replicates in miniature how ‘classical’ montage functions as a condensed form of storytelling aiming for enhanced communication and exposition. They suggest that montage offers a rhetorical amplification of an implicit intertitle, thereby alluding to the historical roots of text on screen while underlining its narrative as well as visual salience. One frame from the montage sequence focuses in close-up on a basket containing picnic items and airline tickets (see Fig. 9). Eye tracking tests conducted on twelve participants indicate a high degree of attentional synchrony in relation to the text elements of the airline ticket on which Ellie’s name is printed. Here, text provides a highly expedient visual clue as to the narrative significance of the scene and viewers are drawn to it precisely for its intertitle-like, expository function, highlighting the top-down impulse also at play in the eye bias caused by on-screen text.

Figure 9: Heat map showing collective gaze weightings during the montage sequence in Up (2009).

In this image from Up, printed text appears in the centre of the frame and, as Smith (2013: 178) elucidates, eyes are instinctively drawn towards frame centre, a finding backed up by much subtitle research (see Szarkowska and Kruger, this issue). However, eye tracking results on Sherlock conducted by Redmond, Sita and Vincs (this issue) indicate that viewers also scan static text when it is not in the centre of the frame. In an establishing shot of 221B Baker Street from the first episode of Sherlock’s second season, ‘A Scandal in Belgravia’, viewers track static text that borders the frame across its top and right-hand sides, again searching for information (see Fig. 10). Hence, the eye-pull exerted by text is noticeable even in the absence of movement, contrast and central framing. In part, viewers are attracted to text simply because it is text – identified as an efficient communication mode that facilitates speedy comprehension (see Lavaur and Bairstow, 2011: 457).

Figure 10: Single viewer gaze path for ‘A Scandal in Belgravia’, Sherlock (2012), Episode 1, Season 2.

Distraction/Attraction

What do these eye tracking results across screen and translation studies tell us about Sherlock’s innovative use of on-screen text and texting? Based on the notion that text on screen draws the eye in at least dual ways, due to both its dynamic/contrastive nature and its communicative expediency, we can surmise that for Sherlock viewers, on-screen text is highly visible and more than likely to be in that 3.8% of the screen on which they will focus at any one point in time (see Smith, 2013: 168). The marked eye bias caused by text on screen is further accentuated in Sherlock by the freshness of its textual flashes, especially for English-speaking audiences given the language hierarchies of global screen media (see Acland 2012, UNESCO 2013). The small percentage of foreign-language media imported into most English-speaking markets tends to result in a lack of familiarity with subtitling beyond niche audience segments. For those unfamiliar with subtitling or captioning, on-screen text appears particularly novel. Additionally, as explored, floating TELOPs in Sherlock attract attention due to the complex functions they fulfil, providing narrative and character clues as well as textual and stylistic cohesion. As Tepper (2011) points out, in the first episode of the series, viewers are introduced to Sherlock’s character via text, before seeing him on screen. “When he texts the word ‘Wrong!’ to DI Lestrade and all the reporters at Lestrade’s press conference,” notes Tepper, “the technological savvy and the imperiousness of tone tell you most of what you need to know about the character.”

There seems no doubt that on-screen text in Sherlock attracts eye movement, and that it therefore distracts from other parts of the image. One question then that immediately presents itself is why Sherlock’s textual distractions are tolerated – even celebrated – to a far greater extent than other, more conventional or routine forms of titling like subtitles and captions. While Sherlock’s on-screen text is praised as innovative and incisive, interlingual subtitling and SDH are criticised by detractors for the way in which they supposedly force viewers to read rather than watch, effectively transforming film into “a kind of high-class comic book with sound effects” (Canby, 1983).[2] Certainly, differences in scale affect such attitudes and the quantitative variance between post-subtitles (produced for distribution only) and authorial or diegetic titling (as seen in Sherlock) is pronounced.[3] However, eye tracking research on subtitle processing indicates that, on the whole, viewers easily accommodate the increased cognitive load it presents. Although attentional splitting occurs, leading to an increase in back-and-forth shifts between the subtitles and the rest of the image area (Szarkowska and Kruger, this issue), viewers acclimatise by making shorter fixations than in regular reading and by skipping high-frequency words and subtitles while still managing to register meaning (see d’Ydewalle and de Bruycker, 2007: 199). In this way, subtitle processing reveals many differences from the reading of static text, and approximates techniques of visual scanning. Bearing these findings in mind, I propose it is more accurate to see subtitling as transforming reading into viewing and text into image, rather than vice versa.

Situating Sherlock in relation to a range of related TELOP practices across diverse TV genres (such as game shows, panel shows, news broadcasting and dramas), Ryoko Sasamoto (2014: 7) notes that the additional processing effort caused by on-screen text is offset by its editorial function.[4] TELOPs are often deployed by TV producers to guide interpretation and ensure comprehension by selecting and highlighting information deemed most relevant. This suggestion is backed up by research by Rei Matsukawa et al. (2009), which found that the information redundancy effect caused by TELOPs facilitates understanding of TV news. For Sasamoto (2014: 7), ‘impact captioning’ highlights salient information in much the same way as voice intonation or contrastive stress. It acts as a “written prop on screen” enabling “TV producers to achieve their communicative aims… in a highly economical manner” (8). Focusing on Sherlock specifically, Sasamoto suggests that its captioning provides “a route for viewers into complex narratives” (9). Moreover, as Szarkowska and Kruger (this issue) note, in static reading conditions, “longer fixations typically reflect higher cognitive load.” Consequently, the shorter fixations that characterise subtitle viewing support the contention that on-screen text processing is eased by its expedient, editorial function and by redundancy effects resulting from its multimodality.

Switched On

Another way in which Sherlock’s text and titling innovations extend beyond mobile phone usage was exemplified in July 2013 by a promotional campaign that promised viewers a ‘sneak peek’ at a yet-to-be-released episode title, requiring them to find and piece together a series of clues. In true Sherlockian style, the clues were well hidden, only visible to viewers if they switched on closed captioning or SDH available for deaf and hard-of-hearing audiences. With this device turned on, viewers encountered intralingual captioning along the bottom of their screen and, additionally, individually boxed letters that appeared top left (see Figs. 11 and 12). Viewers needed to gather all these single-letter clues in order to deduce the episode title: ‘His Last Vow’. According to the ‘I Heart Subtitles’ blog (July 16, 2013), in doing so, Sherlock once again displayed its ability to “think outside the box and consider all…options”. It also cemented its commitment to on-screen text in various guises, and effectively gave voice to an audience segment typically disregarded in screen commentary and analysis. Through this highly unusual, cryptic campaign, Sherlock alerted viewers to more overtly functional forms of titling, and intimated points of connection between language, textual intervention and access.

Figure 11: Boxed letter clues (top left of frame) that appeared when closed captioning was switched on, during a re-run of ‘A Scandal in Belgravia’, Sherlock (2012), Episode 1, Season 2.

Figure 12: Boxed letter clues (top left of frame) that appeared when closed captioning was switched on, during a re-run of ‘A Scandal in Belgravia’, Sherlock (2012), Episode 1, Season 2.

Conclusion

On-screen text invites a rethinking of the visual, expanding its borders and blurring its definitional clarity. Eye tracking research demonstrates that moving text on screens is processed differently to static text, affected by a range of factors issuing from its multimodal complexity. Sherlock subtly signals such issues through its playful, irreverent deployment of text, which enables viewers to directly access Sherlock’s thoughts and understand his reasoning, while also distancing them, asking them to marvel at his ‘millennial’ technological prowess (Stein and Busse, 2012: 11) while remaining self-consciously aware of his complex narrative framing as it flips inside out, inviting audiences to watch themselves watching. Such diegetic transgression is yet to be mapped through eye tracking, intimating a profitable direction for future studies. To date, data on text and image processing demonstrates how on-screen text attracts eye movement and hence, it can be inferred that it distracts from other parts of the image area. Yet, despite rendering more of the image effectively ‘invisible’, text in the form of TELOPs is increasingly prevalent in news broadcasts, current affairs panel shows (when audience text messages are displayed) and, most notably, in Asian TV genres, where it is now a “standard editorial prop” featured in many dramas and game shows (Sasamoto, 2014: 1). In order to take up the challenge presented by such emerging modes of screen address, research needs to move beyond surface assessments of the attraction/distraction nexus. It is the very attraction to TELOP distraction that Sherlock – via eye tracking – brings to the fore.

 

References

Acland, Charles. 2012. “From International Blockbusters to National Hits: Analysis of the 2010 UIS Survey on Feature Film Statistics.” UIS Information Bulletin 8: 1-24. UNESCO Institute for Statistics.

Altman, Rick. 2004. Silent Film Sound. New York: Columbia University Press.

Banks, David. 2012. “Sherlock: A Perspective on Technology and Story Telling.” Cyborgology, January 25. Accessed October 9, 2014.

Batty, Craig, Adrian Dyer, Claire Perkins and Jodi Sita (forthcoming). “Seeing Animated Worlds: Eye Tracking and the Spectator’s Experience of Narrative.” In Making Sense of Cinema: Empirical Studies into Film Spectators and Spectatorship, edited by Carrie Lynn D. Reinhard and Christopher J. Olson. London and New York: Bloomsbury.

Bennet, Alannah. 2014. “From Sherlock to House of Cards: Who’s Figured Out How to Translate Texting to Film.” Bustle, August 18. Entertainment. Accessed October 9. http://www.bustle.com/articles/36115-from-sherlock-to-house-of-cards-whos-figured-out-how-to-translate-texting-to-film/image/36115.

Biedenharn, Isabella. 2014. “A Brief Visual History of On-Screen Text Messages in Movies and TV.” Flavorwire, April 24. Accessed October 13, 2014.

Bisson, Marie-Josée, Walter J. B. Van Heuven, Kathy Conklin and Richard J. Tunney. 2014. “Processing of native and foreign language subtitles in films: An eye tracking study.” Applied Psycholinguistics 35: 399–418. Accessed October 13, 2014. doi: 10.1017/S0142716412000434.

Calloway, Mariel. 2013. “The Game is On(line): BBC’s ‘Sherlock’ in the Age of Social Media.” Mariel Calloway, March 8. Accessed October 14, 2014.

Canby, Vincent. 1983. “A Rebel Lion Breaks Out.” New York Times, March 27, 21.

Dodes, Rachel. 2013. “From Talkies to Texties.” Wall Street Journal, April 4, Arts and Entertainment Section. Accessed October 13, 2014.

d’Ydewalle, Géry and Wim De Bruycker. 2007. “Eye movements of children and adults while reading television subtitles.” European Psychologist 12 (3): 196-205.

Kofoed, D. T. 2011. “Decotitles, the Animated Discourse of Fox’s Recent Anglophonic Internationalism.” Reconstruction 11 (1). Accessed October 5, 2012.

Lavaur, Jean-Marc and Dominic Bairstow. 2011. “Languages on the screen: Is film comprehension related to the viewers’ fluency level and to the language in the subtitles?” International Journal of Psychology 46 (6): 455-462. doi: 10.1080/00207594.2011.565343.

McMillan, Graeme. 2014. “Sherlock’s Text Messages Reveal Our Transhumanism.” Wired UK, February 3. Accessed October 14, 2014.

Matsukawa, Rei, Yosuke Miyata and Shuichi Ueda. 2009. “Information Redundancy Effect on Watching TV News: Analysis of Eye Tracking Data and Examination of the Contents.” Literary and Information Science 62: 193-205.

O’Sullivan, Carol. 2011. Translating Popular Film. Basingstoke and New York: Palgrave Macmillan.

Pérez González, Luis. 2013. “Co-Creational Subtitling in the Digital Media: Transformative and Authorial Practices.” International Journal of Cultural Studies 16 (1): 3-21. Accessed September 25, 2014. doi: 10.1177/1367877912459145.

Rayner, Keith. 1998. “Eye Movements in Reading and Information Processing: 20 Years of Research.” Psychological Bulletin 124: 372-422.

Redmond, Sean, Jodi Sita and Kim Vincs. 2015. “Our Sherlockian Eyes: The Surveillance of Vision.” Refractory: a Journal of Entertainment Media, 25.

Romero-Fresco, Pablo. 2013. “Accessible filmmaking: Joining the dots between audiovisual translation, accessibility and filmmaking.” JoSTrans: The Journal of Specialised Translation 20: 201-23. Accessed September 20, 2014.

Sasamoto, Ryoko. 2014. “Impact caption as a highlighting device: Attempts at viewer manipulation on TV.” Discourse, Context and Media 6: 1-10. Accessed September 18 (Article in Press). doi: 10.1016/j.dcm.2014.03.003.

Schrodt, Paul. 2013. “This is How to Shoot Text Messaging.” Esquire, February 4. The Culture Blog. Accessed October 13, 2014.

Smith, Tim J. 2013. “Watching You Watch Movies: Using Eye Tracking to Inform Cognitive Film Theory.” In Psychocinematics: Exploring Cognition at the Movies, edited by Arthur P. Shimamura, 165-91. Oxford and New York: Oxford University Press. Accessed October 7, 2014. doi: http://dx.doi.org/10.1093/acprof:oso/9780199862139.001.0001.

Stein, Louisa Ellen and Kristina Busse. 2012. “Introduction: The Literary, Televisual and Digital Adventures of the Beloved Detective.” In Sherlock and Transmedia Fandom: Essays on the BBC Series, edited by Louisa Ellen Stein and Kristina Busse, 9-24. Jefferson: McFarland and Company.

Szarkowska, Agnieszka et al. 2013. “Harnessing the Potential of Eye-Tracking for Media Accessibility.” In Translation Studies and Eye-Tracking Analysis, edited by Sambor Grucza, Monika Płużyczka and Justyna Zając, 153-83. Frankfurt am Main: Peter Lang.

Szarkowska, Agnieszka and Jan Louis Kruger. 2015. “Subtitles on the Moving Image: An Overview of Eye Tracking Studies.” Refractory: a Journal of Entertainment Media, 25.

Tepper, Michele. 2011. “The Case of the Travelling Text Message.” Interactions Everywhere, June 14. Accessed October 14, 2014.

UNESCO. 2013. “Feature Film Diversity”, UIS Fact Sheet 24, May. Accessed October 3, 2014.

Zhang, Sarah. 2014. “How Hollywood Figured Out A Way To Make Texting In Movies Look Less Dumb.” Gizmodo, August 18. Accessed August 19, 2014.

Zhou, Tony. 2014. “A Brief Look at Texting and the Internet in Film.” Video Essay, Every Frame a Painting, August 15. Accessed August 19, 2014.

 

Notes

[1] While some commentators point out that Sherlock was by no means the first to depict text messaging in this way – as floating text on screen – it is this series more than any other that has brought this phenomenon into the limelight. Other notable uses of on-screen text to depict mobile phone messaging occur in films All About Lily Chou-Chou (Iwai, 2001), Disconnect (Rubin, 2013), The Fault in Our Stars (Boone, 2014), LOL (Azuelos, 2012), Non-Stop (Collet-Serra, 2014), Wall Street: Money Never Sleeps (Stone, 2010), and in TV series Glee (Fox, 2009–), House of Cards (Netflix, 2013–), Hollyoaks (Channel 4, 1995–), Married Single Other (ITV, 2010) and Slide (Fox8, 2011). For discussion of some ‘early adopters’, see Biendenharn 2014.

 

[2] Notably, in this New York Times piece, Canby (1983) actually defends subtitling against this charge, and advocates for subtitling over dubbing.

[3] On distinctions between post-subtitling and pre-subtitling (including diegetic subtitling), see O’Sullivan (2011).

[4] According to Sasamoto (2014: 1), “the use of OCT [Open Caption Telop] as an aid for enhanced viewing experience originated in Japan in 1990.”

 

Bio

Dr Tessa Dwyer teaches Screen Studies at the University of Melbourne, specialising in language politics and issues of screen translation. Her publications have appeared in journals such as The Velvet Light Trap, The Translator and The South Atlantic Quarterly and in a range of anthologies including B is for Bad Cinema (2014), Words, Images and Performances in Translation (2012) and the forthcoming Locating the Voice in Film (2016), Contemporary Publics (2016) and the Routledge Handbook of Audiovisual Translation (2017). In 2008, she co-edited a special issue of Refractory on split screens. She is a member of the ETMI research group and is currently writing a book on error and screen translation.

Subtitles on the Moving Image: an Overview of Eye Tracking Studies – Jan Louis Kruger, Agnieszka Szarkowska and Izabela Krejtz

Abstract

This article provides an overview of eye tracking studies on subtitling (also known as captioning) and makes recommendations for future cognitive research in the field of audiovisual translation (AVT). We find that most studies conducted in the field to date fail to address the actual processing of the verbal information contained in subtitles, focusing instead on the impact of subtitles on viewing behaviour. We also show how eye tracking can be utilised to measure not only the reading of subtitles, but also the impact of stylistic elements, such as language usage, and of technical issues, such as the presence of subtitles during shot changes, on the cognitive processing of the audiovisual text as a whole. We support our overview with empirical evidence from eye tracking studies conducted on a range of languages, language combinations and viewing contexts, and with different types of viewers/readers, such as hearing, hard of hearing and Deaf people.

Introduction

The reading of printed text has received substantial attention from scholars since the 1970s (for an overview of the first two decades see Rayner 1998). Many of these studies, conducted from a psycholinguistic angle, made use of eye tracking. As a result, a large body of knowledge exists on the eye movements of readers with varying levels of reading skill and language proficiency, of different ages, first languages and cultural backgrounds, and reading in different contexts. Studies on subtitle reading, however, have not achieved the same level of scientific rigour, largely for practical reasons: subtitles are not static for more than a few seconds at a time; they compete for visual attention with a moving image; and they compete for overall cognitive resources with verbal and non-verbal sounds. This article will identify some of the gaps in current research in the field, and also illustrate how some of these gaps can be bridged.

Studying the reading of subtitles is significantly different from studying the reading of static text. In the first place, as far as eye tracking software is concerned, the subtitles appear on a moving image as image rather than text, which renders traditional text-based reading statistics and software all but useless. This also makes the collection of data for reading research on subtitles a painstakingly slow process involving substantial manual inspection and coding. Secondly, the fact that subtitles appear against the background of the moving image means that they are always in competition with this image, which renders the reading process fundamentally different from the reading of static texts: on the one hand, the reading of subtitles competes with the processing of the image, sometimes resulting in interrupted reading; on the other hand, the limited time the subtitles are on screen means that readers have less time to reread or regress in order to study difficult words or check information. Either way, studying this reading process, and the cognitive processing that takes place during reading, is much more complicated than in the case of static texts, where we know that the reader is mainly focussing on the words before her/him without additional auditory and visual information to process.

While the viewing of subtitles has been the object of a growing number of eye tracking studies in recent years (see, for example, Bisson et al. 2012; d’Ydewalle and Gielen 1992; d’Ydewalle and De Bruycker 2007; Ghia 2012; Krejtz et al. 2013; Kruger 2013; Kruger et al. 2013; Kruger and Steyn 2014; Perego et al. 2010; Rajendran et al. 2013; Specker 2008; Szarkowska et al. 2011; Winke et al. 2013), the study of the reading of subtitles remains largely uncharted territory with many research avenues still to be explored. Those studies that do venture to measure more than just attention to the subtitle area seldom do so for extended texts.

In this article we provide an overview of studies on how subtitles change the way viewers process audiovisual material, and also of studies on the unique characteristics of the subtitle reading process. Taking an analysis of the differences between reading printed (static) text and subtitles as point of departure, we examine a number of aspects typical of the way subtitle text is processed in reading. We also look at the impact of the dynamic nature of the text and the competition with other sources of information on the reading process (including scene perception, changes in the viewing process, shifts between subtitles and image, visual saliency of text, faces, and movement, and cognitive load), as well as discussing studies on the impact of graphic elements on subtitle reading (e.g. number of lines, and text chunking), and studies that attempt to measure the subtitle reading process in more detail.

We start off with a discussion of the way in which watching an audiovisual text with subtitles alters viewing behaviour, as well as of the complexities of studying subtitles given the dynamic nature of the image they have as a backdrop. Here we focus on the fleeting nature of the subtitle text, the competition between reading the subtitles and scanning the image, and the interaction between different sources of information. We further discuss viewer-internal factors that impact on subtitle processing, such as the language and culture of the audience, the language of the subtitles, and the degree of access the audience has to sound, before turning to viewer-external factors related to the nature of the audiovisual text and the presentation of the subtitles. Finally, we provide an overview of studies attempting to measure the processing of subtitles, as well as findings from two studies that approach the processing of subtitles in more detail.

The dynamic nature of the subtitle reading process

Reading subtitles differs substantially from reading printed text in a number of respects. As opposed to “static text on a stable background”, the viewer of subtitled audiovisual material is confronted with “fleeting text on a dynamic background” (Kruger and Steyn 2014, 105). In consequence, viewers not only need to process and integrate information from different communication channels (verbal visual, non-verbal visual, verbal auditory and non-verbal auditory; see Gottlieb 1998), but they also have no control over the presentation speed (see Kruger and Steyn 2014; Szarkowska et al. forthcoming). Unlike in the reading of static texts, the pace of reading is therefore in part dictated by the text rather than the reader – by the time the text is available to be read – and there is much less time for the reader to regress to an earlier part of a sentence or phrase, and no opportunity to return to previous sentences. Subtitle reading thus takes place in a limited window which the reader is acutely aware will disappear within a few seconds. Even though there are exceptions to the level of control a viewer has – for example in the case of DVD, PVR and other electronic media where the viewer can rewind and fast-forward at will – the typical viewing of subtitles for most audiovisual products happens continuously and without pauses, just as when watching live television.

Regressions, which form an important consideration in the reading of static text, take on a different aspect given the viewer’s knowledge that dwelling too long on any part of a subtitle may make it difficult to finish reading the subtitle before it disappears. Any subtitle is on screen for between one and six seconds, and the viewer also has to process all the other auditory (in the case of hearing audiences) and visual cues simultaneously. In other words, unlike when reading printed text, reading becomes only one of the cognitive processes the viewer has to juggle in order to understand the audiovisual text as a whole. Some regressions are in fact triggered by the change of the image at shot changes (and, to a much lesser extent, scene changes) when the text stays on across these boundaries: the viewer sometimes returns to the beginning of the subtitle to check whether it is a new subtitle, and sometimes even re-reads it. In a recent study, Krejtz et al. (2013) established that participants tend not to re-read subtitles after a shot change or cut, but their data also revealed that a proportion of the participants did return their gaze to the beginning of the subtitle after such a change (see also de Linde and Kay 1999). What this means for the study of subtitle reading is that these momentary returns (even if only for checking) result in a class of regressions that is not in fact a regression to re-read a word or section, but rather a false initiation of reading for what some viewers initially perceive to be a new sentence.

On the positive side, the fact that subtitles are embedded on a moving image and are accompanied by a soundtrack (in the case of hearing audiences) facilitates the processing of language in context. Unfortunately, this context also introduces competition for attention and cognitive resources. For the Deaf and hard of hearing audience, attention has to be divided between reading the subtitles and processing the scene, extracting information from facial expressions, lip movements and gestures, and matching or checking this against the information obtained in the subtitles. For the hearing audience who makes use of subtitles for support or to provide access to foreign language dialogue, attention is likewise divided between subtitles and the visual scene, and just as the Deaf and hard of hearing audiences have the added demand on their cognitive resources of having to match what they read with what they get from non-verbal signs and lip movements, the hearing audience matches what they read with what they hear, checking for correspondence of information and interpreting intonation, tenor and other non-verbal elements of speech.

What stands beyond doubt is that the appearance of subtitles changes the viewing process. In 2000, Jensema et al. famously stated that “the addition of captions to a video resulted in major changes in eye movement patterns, with the viewing process becoming primarily a reading process” (2000a, 275). Having examined the eye movements of six subjects watching video clips with and without subtitles, they found that the onset of a subtitle triggers a change in the eye movement pattern: when a subtitle appears, viewers move their gaze from whatever they were watching in order to follow the subtitle. In a larger-scale study, d’Ydewalle and De Bruycker (2007, 196) concluded that “paying attention to the subtitle at its presentation onset is more or less obligatory and is unaffected by major contextual factors such as the availability of the soundtrack, knowledge of the foreign language in the soundtrack, and important episodic characteristics of actions in the movie: Switching attention from the visual image to “reading” the subtitles happens effortlessly and almost automatically”.

Subtitles therefore appear to cause an eye movement bias similar to that caused by faces (see Hershler & Hochstein, 2005; Langton, Law, Burton, & Schweinberger, 2008; Yarbus, 1967), the centre of the screen, contrast and movement. In other words, subtitles attract the gaze at least in part because the eye is drawn to words on screen just as it is drawn to movement and other salient elements. Eyes are drawn to subtitles not only because the text is identified as a source of meaningful information (a top-down impulse, as the viewer consciously consults the subtitles to obtain relevant information), but also because of the change to the scene that the appearance of a subtitle causes (a bottom-up impulse, automatically drawing the eyes to what has changed on the screen).

As in most other contexts, the degree to which viewers will process the subtitles (i.e. read them rather than merely look at them when they appear and then look away) will be determined by the extent to which they need the subtitles to follow the dialogue or to obtain information on relevant sounds. In studying visual attention to subtitles it therefore remains a priority to measure the degree of processing, something that has not been done in more than a handful of studies, and something to which we will return later in the article.

Viewers usually attend to the image on the screen, but when subtitles appear, it only takes a few frames for most viewers to move their gaze to read the subtitles. The fact that people tend to move their gaze to subtitles the moment they appear on the screen is illustrated in Figures 1 and 2.

Figure 1. Heat maps of three consecutive film stills – Polish news programme Fakty (TVN) with intralingual subtitles.

Figure 2. Heat maps of two consecutive film stills – Polish news programme Wiadomości (TVP1) with intralingual subtitles.

Likewise, when the gaze of a group of viewers watching an audiovisual text without subtitles is compared to that of a similar group watching the same text with subtitles, the split in attention is immediately visible as the second group reads the subtitles and attends less to the image, as can be seen in Figure 3.

Figure 3. Heat maps of the same scene seen without subtitles and with subtitles – recording of an academic lecture.

Viewer-internal factors that impact on subtitle processing

The degree to which the subtitles are processed is far from straightforward. In a study performed at a South African university, Sesotho-speaking students watching a recorded lecture with the audio in English (their language of instruction) and subtitles in their first language were found to avoid looking at the subtitles (see Kruger, Hefer and Matthew 2013b). Sesotho students in a different group, who saw the same lecture with English subtitles, processed the subtitles to a much larger extent. This contrast is illustrated in the focus maps in Figure 4.

Figure 4. Focus maps of Sesotho students looking at a lecture with intralingual English subtitles (left) and another group looking at the same lecture with interlingual Sesotho subtitles (right) – recording of an academic lecture.

The difference in eye movement behaviour between the conditions is also evident when considering the number of subtitles skipped. Participants in the above study who saw the video with Sesotho subtitles skipped an average of around 50% of the Sesotho subtitles (median at around 58%), whereas participants who saw the video with English subtitles only skipped an average of around 20% of the English subtitles (with a median of around 8%) (see Kruger, Hefer & Matthew, 2014).

This example does not, however, represent the conventional use of subtitles where viewers would rely on the subtitles to gain access to a text from which they would have been excluded without the subtitles. It does serve to illustrate that subtitle reading is not unproblematic and that more research is needed on the nature of processing in different contexts by different audiences. For example, in a study in Poland, interlingual subtitles (English to Polish) were skipped slightly less often by hearing viewers compared to intralingual subtitles (Polish to Polish), possibly because hearing viewers didn’t need them to follow the plot (see Szarkowska et al., forthcoming).

Another important finding from eye tracking studies on the subtitle process relates to how viewers typically go about reading a subtitle. Jensema et al. (2000) found that in subtitled videos, “there appears to be a general tendency to start by looking at the middle of the screen and then moving the gaze to the beginning of a caption within a fraction of a second. Viewers read the caption and then glance at the video action after they finish reading” (2000, 284). This pattern is indeed often found, as illustrated in the sequence of frames from a short video from our study in Figure 5.

Figure 5. Sequence of typical subtitle reading – a recording of Polish news programme Fakty (TVN) with intralingual subtitles.

Some viewers, however, do not read so smoothly and tend to shift their gaze between the image and the subtitles, as demonstrated in Figure 6. These gaze shifts between the image and the subtitle, also referred to in the literature as ‘deflections’ (de Linde and Kay 1999) or ‘back-and-forth shifts’ (d’Ydewalle and De Bruycker 2007), can be regarded as an indication of the smoothness of the subtitle reading process: the fewer the gaze shifts, the more fluent the reading, and vice versa.

Figure 6. Scanpath of frequent gaze shifting between text and image – a recording of Polish news programme Fakty (TVN) with intralingual subtitles.

An important factor that influences subtitle reading patterns is the nature of the audience. In Figure 7 an interesting difference is shown between the way a Deaf and a hard of hearing viewer watched a subtitled video. The Deaf viewer moved her gaze from the centre of the screen to read the subtitle and then, after having read the subtitle, returned the gaze to the centre of the screen. In contrast, the hard of hearing viewer made constant comparisons between the subtitles and the image, possibly relying on residual hearing and trying to support the subtitle reading process with lip-reading. Such a result was reported by Szarkowska et al. (2011), who found differences in the number of gaze shifts between the subtitles and the image in the verbatim subtitles condition, particularly discernible (and statistically significant) in the hard of hearing group (when compared to the hearing and Deaf groups).

Figure 7. Scanpaths of Deaf and hard of hearing viewers. Left: Gaze plot illustrating the viewing pattern of a Deaf participant watching a clip with verbatim subtitles. Right: Gaze plot illustrating the viewing pattern of a hard of hearing participant watching a clip with verbatim subtitles.

These provisional qualitative indications of differences between eye movements of users with different profiles require more in-depth quantitative investigation and the subsequent section will provide a few steps in this direction.

As mentioned above, subtitle reading patterns largely depend on the type of viewers. Fluent readers have been found to have no difficulty following subtitles. Diao et al. (2007), for example, found a direct correlation between the impact of subtitles on learning and the academic and literacy levels of participants. Similarly, given that “hearing status and literacy tend to covary” (Burnham et al. 2008, 392), some previous studies found important differences in the way hearing and hearing-impaired people watch subtitled programmes. Robson (2004, 21) notes that “regardless of their intelligence, if English is their second language (after sign language), they [i.e. Deaf people] cannot be expected to have the same comprehension levels as hearing people who grew up exposed to English”. This is indeed confirmed by Szarkowska et al. (forthcoming) who report that Deaf and hard of hearing viewers in their study made more fixations on subtitles and that their dwell time on the subtitles was longer compared to hearing viewers. This result may indicate a larger effort needed to process subtitled content and more difficulty in extracting information (see Holmqvist et al. 2011, 387-388). This, in turn, may stem from the fact that for some Deaf people the language in the subtitles is not their mother tongue (their L1 being sign language). At the same time, for hearing-impaired viewers, subtitles provide an important source of information on the words spoken in the audiovisual text as well as other information contained in the audio track, which in itself explains the fact that they would spend more time looking at the subtitles.

Viewer-external factors that impact on subtitle processing

The ‘smoothness’ of the subtitle reading process depends on a number of factors, including the nature of the audiovisual material as well as technical and graphical aspects of the subtitles themselves. At a general level, genre has an impact both on the role of subtitles in the total viewing experience and on the way viewers process the subtitles. For example, d’Ydewalle and Van Rensbergen (1989) found that children in Grade 2 paid less attention to subtitles if a film involved a lot of action (see d’Ydewalle & De Bruycker 2007 for a discussion). The reasons for this could be, firstly, that action films tend to have less dialogue and, secondly and more significantly, that the pace of the visual editing and the use of special effects create a stronger visual element which shifts the balance of content towards the action (visual content) and away from dialogue (soundtrack and therefore subtitles). This, however, is an area that still has to be investigated empirically. At a more specific level, technical characteristics of an audiovisual text such as film editing have an impact on the processing of subtitles.

1 Film editing

Film editing has a strong influence on the way people read subtitles, even beyond the difference in editing pace as a result of genre (for example, action and experimental films could typically be said to have a higher editing pace than dramas and documentaries). In terms of audience perception, viewers have been found to be unaware of standard film editing techniques (such as continuity editing) and are thus able to perceive film as a continuous whole in spite of numerous cuts – the phenomenon termed “edit blindness” (Smith & Henderson, 2008, 2). With more erratic and fast-paced editing, it stands to reason that the cognitive demands will increase as viewers have to work harder to sustain the illusion of a continuous whole.

When subtitles clash with editing such as cuts (i.e. if subtitles stay on screen over a shot or scene change), conventional wisdom as passed on by generations of subtitling guides (see Díaz Cintas & Remael 2007, ITC Guidance on Standards for Subtitling 1999) suggests that the viewer will assume that the subtitle has changed with the image and as a consequence they will re-read it (see above). However, Krejtz et al. (2013) reported that subtitles displayed over shot changes are more likely to cause perceptual confusion by making viewers shift their gaze between the subtitle and the rest of the image more frequently than subtitles which do not cross film cuts (cf. de Linde and Kay 1999). As such, the cognitive load is bound to increase.

2 Text chunking and line segmentation

Another piece of conventional wisdom, perpetuated in subtitling guidelines and standards, is that poor line segmentation will result in less efficient processing (see Díaz Cintas & Remael 2007, Karamitroglou 1998). In other words, subtitles should be chunked, per line and between subtitles, into self-contained semantic units. The line of dialogue “He told me he would meet me at the red mailbox” should therefore be segmented in one of the following ways:

He told me he would meet me
at the red mailbox.

Or

He told me
he would meet me at the red mailbox.

Neither of the following segmentations would be optimal because the prepositional phrase ‘at the red mailbox’ and the verb phrase ‘he would meet me’, respectively, are split, which is considered an error:

He told me he would meet me at the
red mailbox

He told me he
would meet me at the red mailbox.
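
As a rough illustration of this chunking guideline, the following is a minimal sketch in Python; it is not drawn from the guidelines or studies cited here. The phrase chunks are supplied by hand (in practice a parser or a subtitler’s judgement would provide them), and a line break is only ever placed between chunks, never inside them, with the most balanced legal split winning.

def segment_subtitle(chunks, max_chars=37):
    # Return the most balanced two-line split that keeps the chunks intact
    # and keeps both lines within max_chars characters (a common line limit).
    best = None
    for i in range(1, len(chunks)):
        line1 = " ".join(chunks[:i])
        line2 = " ".join(chunks[i:])
        if len(line1) <= max_chars and len(line2) <= max_chars:
            balance = abs(len(line1) - len(line2))
            if best is None or balance < best[0]:
                best = (balance, line1, line2)
    if best is None:
        raise ValueError("no legal two-line split within the character limit")
    return best[1], best[2]

print("\n".join(segment_subtitle(
    ["He told me", "he would meet me", "at the red mailbox."])))
# He told me he would meet me
# at the red mailbox.

On this input the sketch reproduces the first well-segmented version above, since the split between the verb phrase and the prepositional phrase also yields the most balanced pair of lines.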

However, Perego et al. (2010) found that poor line segmentation in two-line subtitles did not affect subtitle comprehension negatively. They also investigated 28 subtitles viewed by 16 participants, using a threshold line between the subtitle region and the upper part of the screen, or main film zone, but did not find a statistically significant difference between the well-segmented and ill-segmented subtitles in terms of fixation counts, total fixation time, or number of shifts between the subtitle region and the upper area. The only statistically significant difference they found was in the mean fixation duration within the subtitle area between the two conditions, with the mean fixation duration in the ill-segmented subtitles being on average 12ms longer than in the well-segmented subtitles. Although the authors downplay the importance of this difference on the grounds that it is so small, it does seem to indicate at least a slightly higher cognitive load when the subtitles are ill-segmented. The small number of subtitles and participants, however, makes it difficult to generalize from their results – again a consequence of the fact that it is difficult to extract reading statistics for subtitles unless the reading behaviour can be quantified over longer audiovisual texts.

In a study conducted a few years later, Rajendran et al. (2013) found that “chunking improves the viewing experience by reducing the amount of time spent on reading subtitles” (2013, 5). This study compared conditions different from those investigated in the previous study, excluding the ill-segmented condition of Perego et al. (2010), and focused mostly on live subtitling with respeaking. In the earlier study, which focused on pre-recorded subtitling, the subtitles in the two conditions were essentially still part of one sense unit that appeared as one two-line subtitle. In the later study, the conditions were chunked by phrase (similar to the well-segmented condition of the earlier study, but with phrases appearing one by one on one line), no segmentation (where the subtitle area was filled with as much text as possible with no attempt at segmentation), word by word (where words appeared one by one) and chunked by sentence (where the sentences showed up one by one). Even though this later study therefore essentially investigated different conditions, the authors did find that the most disruptive condition was the one in which the subtitle appeared word by word – eliciting more gaze points (defined less strictly than in the fixation algorithms used by commercial eye trackers) and more “saccadic crossovers” or switches between the image and the subtitle area. However, in this study by Rajendran et al. (2013), the videos were extremely short (under a minute) and the sound was muted, hampering the ecological validity of the material and once again making the findings less suitable for generalization.

Although both of these studies have limitations in terms of generalizability, they provide some indication that segmentation has an impact on subtitle processing. Future studies will nonetheless have to investigate this aspect over longer videos to determine whether the graphical appearance, and particularly the segmentation of subtitles, has a detrimental effect on subtitle processing in terms of cognitive load and effectiveness.

3 Language

The language of subtitles has received considerable attention from psycholinguists in the context of subtitle reading. D’Ydewalle and De Bruycker (2007) examined the eye movement behaviour of people reading standard interlingual subtitles (with the audio track in a foreign language and subtitles in their native language) and reversed subtitles (with the audio in their mother tongue and subtitles in a foreign language). They found more regular reading patterns in the standard interlingual subtitling condition, with the reversed subtitling condition resulting in more skipped subtitles, fewer fixations per subtitle, and so on (see also d’Ydewalle and De Bruycker 2003 and Pavakanun 1992). This is an interesting finding in itself, as it is reversed subtitling that has been found to be particularly conducive to foreign language learning (see Díaz Cintas and Fernández Cruz 2008, and Vanderplank 1988).

Szarkowska et al. (forthcoming) examined differences in reading patterns of intralingual (Polish to Polish) and interlingual (English to Polish) subtitles among a group of Deaf, hard of hearing and hearing viewers. They found no differences in reading for the Deaf and hard of hearing audiences, but hearing people made significantly more fixations to subtitles when watching English clips with interlingual Polish subtitles than Polish clips with intralingual Polish subtitles. This confirms that the hearing viewers processed the subtitles to a significantly lower degree when they were redundant, as in the case of intralingual transcriptions of the soundtrack. What would be interesting to investigate in this context is those instances when the hearing audience did in fact read the subtitles, to determine to what extent and under what circumstances the redundant written information is used by viewers to support their auditory intake of information.

In a study on the influence of translation strategies on subtitle reading, Ghia (2012) investigated differences in the processing of literal vs. non-literal translations into Italian of an English film clip (6 minutes) when watched by Italian EFL learners. According to Ghia, just as subtitle format, layout, and segmentation have the potential to affect visual and perceptual dynamics, the relationship the translation establishes with the original text means that “subtitle translation is also likely to influence the perception of the audiovisual product and viewers’ general reading patterns” (2012, 175). Ghia particularly wanted to investigate the processing of different translation strategies in the presence of the sound and image accompanying the subtitles. She found that the non-literal translations (where the target text diverged from the source text) resulted in more deflections between text and image. This parallels the finding of Rajendran et al. (2013) that the less fluent, word-by-word presentation of subtitles elicits more shifts between image and subtitle.

As can be seen from the above, the aspect of language processing in the context of subtitled audiovisual texts has received some attention, but has not to date been approached in any comprehensive manner. In particular, there is a need for more psycholinguistic studies to determine how subtitle reading differs from the reading of static text, and how this knowledge can be applied to the practice of subtitling.

Measuring subtitle processing

1 Attention distribution and presentation speed

In the study by Jensema et al. (2000), subjects spent on average 84% of the time looking at subtitles, 14% at the video picture and 2% outside of the frame. The study represents an important early attempt to identify patterns in subtitle reading, but it has considerable limitations. The study had only six participants, three deaf and three hearing, and the video clips were extremely short (around 11 seconds each), presented with English subtitles (in upper case) and without sound. The absence of a soundtrack therefore impacted on the time spent on the subtitles. In Perego et al.’s (2010) study, the ratio is reported as 67% on the subtitle area and 33% on the image. In this study, 41 Italian participants watched a 15-minute clip with a Hungarian soundtrack and subtitles in Italian. As in the previous study, the audience therefore had to rely heavily on the subtitles in order to follow the dialogue. Kruger et al. (2014), in the context of intralingual subtitles in a Psychology lecture in English, found a ratio of 43% on the subtitles, 43% on the speaker and slides, and 14% on the rest of the screen. When the same lecture was subtitled into Sesotho, the ratio changed to 20% on the subtitles, 66% on the speaker and slides, and 14% on the rest of the screen. This wide range is an indication of the difference in the distribution of visual attention in different contexts with different language combinations, different levels of redundancy of information, and different audiences.

In order to account for “the audiovisual nature of subtitled programmes”, Romero-Fresco (in press) puts forward the notion of ‘viewing speed’ – as opposed to reading speed and subtitling speed – which he defines as “the speed at which a given viewer watches a piece of audiovisual material, which in the case of subtitling includes accessing the subtitle, the accompanying images and the sound, if available”. The perception of subtitled programmes is therefore a result not only of subtitle reading patterns, but also of the visual elements of the film. Based on the analysis of over seventy-one thousand subtitles created in the course of the Digital Television for All project, Romero-Fresco provides the following data on viewing speed, reflecting the proportion of time spent by viewers looking at subtitles and at the images relative to the subtitle presentation rates (see Table 1).

Viewing speed Time on subtitles Time on images
120wpm ±40% ±60%
150wpm ±50% ±50%
180wpm ±60%-70% ±40%-30%
200wpm ±80% ±20%

Table 1. Viewing speed and distribution of gaze between subtitles and images (Romero-Fresco) 

Jensema et al. also suggested that the subtitle presentation rate may have an influence on the time spent reading subtitles vs. watching the rest of the image: “higher captioning speed results in more time spent reading captions on a video segment” (2000, 275). This was later confirmed by Szarkowska et al. (2011), who found that viewers spent more time on verbatim subtitles displayed at higher presentation rates compared to edited subtitles displayed with low reading speed, as illustrated by Figure 8.

Figure 8. Fixation-count based heatmaps illustrating changes in attention allocation of hearing and Deaf viewers watching videos subtitled at different rates.

2 Mean fixation duration

Irwin (2004, 94) states that “fixation location corresponds to the spatial locus of cognitive processing and that fixation or gaze duration corresponds to the duration of cognitive processing of the material located at fixation”. Within the same activity (e.g. reading), longer mean fixation durations could therefore be said to reflect more cognitive processing and higher cognitive load. One would therefore expect viewers to have longer fixations when the subject matter is more difficult, or when the language is more specialized. Across activities, however, comparisons of fixation duration are less meaningful, as reading elicits shorter fixations than scene perception or visual scanning simply because of the nature of the activities. It is therefore essential in eye tracking studies of subtitle reading to distinguish between the actual subtitles when they are on screen, the rest of the screen, and the subtitle area when there is no text (between successive subtitles).

The difference between reading and scene perception is illustrated in Figure 9, which demonstrates that fixations on the image tend to be longer (indicated here by a bigger circle) and more exploratory in nature than the shorter fixations on the subtitles, which indicate more focused viewing (see the distinction between focal and ambient fixations in Velichkovsky et al. 2005).

Figure 9. Differences in fixation durations between the image and subtitle text – from the Polish TV series Londyńczycy.

Rayner (1984) indicated the impact of different tasks on mean fixation durations, as reflected in Table 2 below:

Task Mean fixation duration (ms) Mean saccade size (degrees)
Silent reading 225 2 (about 8 letters)
Oral reading 275 1.5 (about 6 letters)
Visual search 275 3
Scene perception 330 4
Music reading 375 1
Typing 400 1 (about 4 letters)

 Table 2. Approximate Mean Fixation Duration and Saccade Length in Reading, Visual Search, Scene Perception, Music Reading, and Typing[1]

In subtitling, silent reading is accompanied by the simultaneous processing of the same information in the soundtrack (in the same or another language) as well as of other sounds and visual signs (for a hearing audience, that is – for a Deaf audience, it would be text and visual signs). The difference in mean fixation duration across these different tasks therefore reflects the difference in cognitive load. In the silent reading of static text, there is no external competition for cognitive resources. When reading out loud, the speaker/reader inevitably monitors his/her own reading, introducing additional cognitive load. As the nature of the sign becomes more abstract, the load, and with it the fixation duration, increases, and in the case of typing, different processing, production and checking activities are performed simultaneously, resulting in even higher cognitive load. This is inevitably an oversimplification of cognitive load, and indeed the nature of information acquisition when reading successive groups of letters (words) in a linear fashion is significantly different from that of scanning a visual scene for cues.

Undoubtedly, subtitle reading imposes different cognitive demands, and these demands are also very much dependent on the audience. In an extensive study on the differences in subtitle reading between Deaf, hard of hearing and hearing participants, we found a high degree of variation in mean fixation duration between the groups, and also a difference between the mean fixation duration in the Deaf and the hard of hearing groups between subtitles presented at 12 characters per second and 15 characters per second (see Szarkowska et al. forthcoming).

  12 characters per second 15 characters per second
Deaf 241.93 ms 232.82 ms
Hard of hearing 218.51 ms 214.78 ms
Hearing 186.66 ms 186.58 ms

Table 3. Mean fixation durations when reading subtitles presented at different rates

Statistical analyses performed on the three groups with mean fixation duration as a dependent variable and groups and speed as categorical factors produced a statistically significant main effect, further confirmed by subsequent t-tests that yielded statistically significant differences in mean fixation duration for both subtitling speeds between all three groups. The difference within the Deaf and hard of hearing groups was also significant between 12cps and 15cps. What this suggests is that reading speed has a more pronounced effect on Deaf and hard of hearing viewers than on hearing ones.

3 Subtitle reading

As indicated at the outset, one of the biggest hurdles in studying the processing of subtitles is the fact that the subtitles appear as image on image rather than text on image as far as eye tracking analysis software is concerned. Whereas reading statistics software can automatically mark words as areas of interest in static texts, and then calculate the number of regressions, refixations, saccade lengths, fixation durations and counts related to specific words, this process has to be done manually for subtitles. The fact that it is virtually impossible to create similar areas of interest on the subtitle words embedded in the image over large numbers of subtitles makes it very difficult to obtain reliable eye tracking results on subtitles as text. This explains the predominance of measures such as fixation count and fixation duration, as well as shifts between subtitle area and image, in eye tracking studies on subtitle processing. As a result, many of these studies do not distinguish directly between looking at the subtitle area and reading the subtitles, and “they tend to define crude areas of interest (AOIs), such as the entire subtitle area, which means that eye movement data are also collected for the subtitle area when there are no subtitles on screen, which further skews the data” (Kruger and Steyn, 2014, 109).

Although a handful of studies come closer to studying subtitle reading by going beyond the study of fixation counts, mean fixation duration, and shifts between subtitle area and image area, most studies tend to focus on amount of attention rather than nature of attention. Briefly, the exceptions can be identified in the following studies: Specker (2008) looks at consecutive fixations; Perego et al. (2010) add the path length (sum of saccade lengths in pixels) to the more conventional measures; Rajendran et al. (2013) add the proportion of gaze points; Ghia (2012) looks at fixations on specific words as well as regressions; Bisson et al. (2012) look at the number of subtitles skipped, and proportion of successive fixations (number of successive fixations divided by total number of fixations); and in one of the most comprehensive studies on the subject of subtitle processing, d’Ydewalle and De Bruycker (2007) look at attention allocation (percentage of skipped subtitles, latency time, and percentage of time spent in the subtitle area), fixations (number, duration, and word-fixation probability), and saccades (saccade amplitude, percentage of regressive eye movements, and number of back-and-forth shifts between visual image and subtitle).
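
By way of illustration, the following is a minimal sketch of how crude measures of this kind – skipped subtitles, fixation counts, dwell time in the subtitle area, and shifts between image and subtitle area – could be derived from a fixation log once a subtitle AOI has been marked by hand. It is not the procedure used in any of the studies listed above; the field names and the simple rectangular subtitle area are assumptions made purely for the example.

from dataclasses import dataclass

@dataclass
class Fixation:
    start: float     # seconds from video onset
    duration: float  # seconds
    x: float         # gaze position in pixels
    y: float

@dataclass
class Subtitle:
    onset: float     # second at which the subtitle appears
    offset: float    # second at which the subtitle disappears

def in_subtitle_area(fixation, y_threshold=600):
    # Everything below a horizontal threshold counts as the subtitle AOI;
    # the threshold value is an assumption for this example.
    return fixation.y >= y_threshold

def subtitle_measures(fixations, subtitle):
    # Fixations that start while the subtitle is on screen.
    window = [f for f in fixations if subtitle.onset <= f.start < subtitle.offset]
    on_subtitle = [f for f in window if in_subtitle_area(f)]
    shifts = sum(1 for a, b in zip(window, window[1:])
                 if in_subtitle_area(a) != in_subtitle_area(b))
    return {
        "skipped": len(on_subtitle) == 0,       # no fixation landed on the subtitle
        "fixation_count": len(on_subtitle),
        "dwell_time": sum(f.duration for f in on_subtitle),
        "image_subtitle_shifts": shifts,
    }

# Example call for one subtitle on screen from 12.0 s to 15.5 s:
# subtitle_measures(all_fixations, Subtitle(onset=12.0, offset=15.5))

Such area-level counts say how much attention the subtitle area received, but, as the next paragraphs argue, they do not yet say whether the subtitle was actually read.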

In a recent study, Kruger and Steyn (2014) provide a reading index for dynamic texts (RIDT) designed specifically to measure the degree of reading that takes place when subtitled material is viewed. This index is explained as “a product of the number of unique fixations per standard word in any given subtitle by each individual viewer and the average forward saccade length of the viewer on this subtitle per length of the standard word in the text as a whole” (2014, 110). Taking the location and start time of successive fixations within the subtitle area when a subtitle is present as the point of departure, the number of unique fixations (i.e. excluding refixations, and fixations following a regression) is determined, as well as the average length of forward saccades in the subtitle. These values give an indication of the meaningful processing of the words in the subtitle once the number of fixations per word, and the length of saccades as a ratio of the length of the average word in the audiovisual text, have been calculated. Essentially, the formula quantifies the reading of a particular subtitle by a particular participant by measuring the eye movements during subtitle reading against what is known about eye movements during reading and the perceptual span.

In a little more detail, the formula can be written as follows for video v, with participant p viewing subtitle s:

\[ \mathrm{RIDT}_{v,p,s} = \frac{\text{number of unique fixations}_{p,s}}{\text{number of standard words}_{s}} \times \frac{\text{mean forward saccade length}_{p,s}}{\text{standard word length}_{v}} \]

(Kruger and Steyn, 2014, 110).
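
As a computational gloss on the formula, the following minimal sketch assumes the inputs have already been extracted from the gaze data; the exact operational definitions of ‘unique fixation’, ‘forward saccade’ and ‘standard word’ are those of Kruger and Steyn (2014), not something this sketch settles, and the numbers in the example are illustrative only.

def ridt(unique_fixations, standard_words_in_subtitle,
         mean_forward_saccade_px, standard_word_length_px):
    # Fixations per standard word in the subtitle ...
    fixations_per_word = unique_fixations / standard_words_in_subtitle
    # ... multiplied by forward saccade length per standard word length.
    saccade_ratio = mean_forward_saccade_px / standard_word_length_px
    return fixations_per_word * saccade_ratio

# Illustrative values: 8 unique fixations on a subtitle of 7 standard words,
# a mean forward saccade of 60 px, and a standard word length of 70 px.
print(ridt(8, 7, 60, 70))  # approximately 0.98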

The index was validated against a manual inspection of the reading of 145 subtitles by 17 participants, and it makes it possible to study the reading of subtitles over extended texts. In their study, Kruger and Steyn (2014) use the index to determine the relationship between subtitle reading and performance in an academic context, finding a significant positive correlation between the degree to which participants read the subtitles and their performance in a test written after watching subtitled lectures. The RIDT therefore presents a robust index of the degree to which subtitles are processed over extended texts, and could add significant value to psycholinguistic studies on subtitles. Using the index, previous claims that subtitles have a positive or negative impact on comprehension, vocabulary acquisition, language learning or other dependent variables can be correlated with whether, and to what extent, viewers actually read the subtitles.

Conclusion

From this overview of studies investigating the processing of subtitles on the moving image it should be clear that much still needs to be done to gain a better understanding of the impact of various independent variables on subtitle processing. The complexity of the multimodal text, and in particular the competition between different sources of information, means that a subtitled audiovisual text is a substantially altered product from a cognitive perspective. Much progress has been made in coming to grips with the way different viewers behave when looking at subtitled audiovisual texts, but there are still more questions than answers – relating, for instance, to differences in how people process subtitled content on various devices (cf. the HBBTV4ALL project). The use of physiological measures like eye tracking and EEG (see Kruger et al. 2014) in combination with subjective measures like post-report questionnaires is, however, continually bringing us closer to understanding the impact of audiovisual translation like subtitling on the experience and processing of audiovisual texts.

 

Acknowledgements

This study was partially supported by research grant No. IP2011 053471 “Subtitling for the deaf and hard of hearing on digital television” from the Polish Ministry of Science and Higher Education for the years 2011–2014.

 

References

Bisson, Marie-Josée, Walter Van Heuven, Kathy Conklin, and Richard Tunney. 2014. “Processing of Native and Foreign Language Subtitles in Films: An Eye Tracking Study.” Applied Psycholinguistics 35(2):399-418.

Burnham, Denis, Greg Leigh, William Noble, Caroline Jones, Michael Tyler, Leonid Grebennikov and Alex Varley. 2008. “Parameters in Television Captioning for Deaf and Hard-of-Hearing Adults: Effects of Caption Rate Versus Text Reduction on Comprehension.” Journal of Deaf Studies and Deaf Education 13 (3): 391-404.

de Linde, Zoé and Neil Kay. 1999. The Semiotics of Subtitling. Manchester: St. Jerome.

Diao, Y., P. Chandler and J. Sweller. 2007. “The Effect of Written Text on Comprehension of Spoken English as a Foreign Language.” The American Journal of Psychology 120 (2): 237-261.

Díaz Cintas, Jorge and Marco Fernández Cruz. 2008. “Using Subtitled Video Materials for Foreign Language Instruction.” In The Didactics of Audiovisual Translation, edited by Jorge Díaz Cintas, 201-214. Amsterdam/Philadelphia: John Benjamins.

Díaz Cintas, Jorge and Aline Remael. 2007. Audiovisual Translation: Subtitling. Manchester: St. Jerome.

d’Ydewalle, Géry and Wim De Bruycker. 2003. “Reading Native and Foreign Language Television Subtitles in Children and Adults.” In The Mind’s Eye: Cognitive and Applied Aspects of Eye Movement Research, edited by J. Hyönä, R. Radach and H. Deubel, 444-461. New York: Springer-Verlag.

d’Ydewalle, Géry and Wim De Bruycker. 2007. “Eye Movements of Children and Adults while Reading Television Subtitles.” European Psychologist 12:196–205.

d’Ydewalle, Géry and Ingrid Gielen. 1992. “Attention Allocation with Overlapping Sound, Image, and Text.” In Eye Movements and Visual Cognition: Scene Perception and Reading, edited by Keith Rayner, 415–427. New York: Springer-Verlag.

d’Ydewalle, Géry, Johan Van Rensbergen, and Joris Pollet. 1987. “Reading a Message When the Same Message Is Available Auditorily in Another Language: The Case of Subtitling.” In Eye Movements: From Physiology to Cognition, edited by J.K. O’Regan and A. Lévy-Schoen, 313-321. Amsterdam: Elsevier Science Publishers B.V. (North-Holland).

Ghia, Elisa. 2012. “The Impact of Translation Strategies on Subtitle Reading.” In Eye Tracking in Audiovisual Translation, edited by Elisa Perego, 155–182. Roma: Aracne Editrice.

Gottlieb, Henrik. 1998. “Subtitling.” In Routledge Encyclopedia of Translation Studies, edited by Mona Baker, 244-248. London & New York: Routledge.

Hershler, Orit and Shaul Hochstein. 2005. “At First Sight: A High-Level Pop Out Effect for Faces.” Vision Research 45: 1707-1724.

Holmqvist, Kenneth et al. 2011. Eye Tracking: A Comprehensive Guide to Methods and Measures. Oxford: Oxford University Press.

Irwin, David E. 2004. “Fixation Location and Fixation Duration as Indices of Cognitive Processing.” In The Interface of Language, Vision, and Action: Eye Movements and the Visual World, edited by J.M. Henderson and F. Ferreira, 105-133. New York, NY: Psychology Press.

ITC Guidance on Standards for Subtitling. 1999. Online at: http://www.ofcom.org.uk/static/archive/itc/itc_publications/codes_guidance/standards_for_subtitling/subtitling_1.asp.html

Jensema, Carl. 2000. “Eye Movement Patterns of Captioned TV Viewers.” American Annals of the Deaf 145 (3): 275-285.

Karamitroglou, Fotios. 1998. “A Proposed Set of Subtitling Standards in Europe.” Translation Journal 2 (2). http://translationjournal.net/journal/04stndrd.htm

Krejtz, Izabela, Agnieszka Szarkowska, and Krzysztof Krejtz. 2013. “The Effects of Shot Changes on Eye Movements in Subtitling.” Journal of Eye Movement Research 6 (5): 1–12.

Kruger, Jan-Louis and Faans Steyn. 2014. “Subtitles and Eye Tracking: Reading and Performance.” Reading Research Quarterly 49 (1): 105–120.

Kruger, Jan-Louis, Esté Hefer, and Gordon Matthew. 2013a. “Measuring the Impact of Subtitles on Cognitive Load: Eye Tracking and Dynamic Audiovisual Texts.” Proceedings of Eye Tracking South Africa 29-31 August 2013, Cape Town.

Kruger, Jan-Louis, Esté Hefer, and Gordon Matthew. 2013b. The impact of subtitles on academic performance at tertiary level. Paper presented at the Linguistics Society of Southern Africa annual conference in Stellenbosch, June, 2013.

Kruger, Jan-Louis. 2013. “Subtitles in the Classroom: Balancing the Benefits of Dual Coding with the Cost of Increased Cognitive Load.” Journal for Language Teaching 47(1):29–53.

Kruger, Jan-Louis, Esté Hefer, and Gordon Matthew. 2014. “Attention Distribution and Cognitive Load in a Subtitled Academic Lecture: L1 vs. L2.” Journal of Eye Movement Research 7 (5): 4, 1-15.

Langton, Stephen R.H., Anna S. Law, Burton, A. Mike and Stefan R. Schweinberger. 2008. Attention capture by faces. Cognition, 107:330-342.

Pavakanun, Ubowanna. 1992. Incidental acquisition of foreign language through subtitled television programs as a function of similarity with native language and as a function of presentation mode. Unpublished doctoral thesis, University of Leuven, Belgium.

Perego, Elisa, Fabio Del Missier, Marco Porta and Mauro Mosconi. 2010. “The Cognitive Effectiveness of Subtitle Processing.” Media Psychology 13(3):243–272.

Rajendran, Dhevi, Andrew Duchowski, Pilar Orero, Juan Martínez, and Pablo Romero-Fresco. 2013. “Effects of Text Chunking on Subtitling: A Quantitative and Qualitative Examination.” Perspectives: Studies in Translatology 21(1):5–31.

Rayner, Keith. 1984. “Visual Selection in Reading, Picture Perception, and Visual Search: A Tutorial Review.” In Attention and Performance, vol. 10, edited by H. Bouma and D. Bouwhuis. Hillsdale, NJ: Erlbaum.

Rayner, Keith. 1998. “Eye Movements in Reading and Information Processing: Twenty Years of Research.” Psychological Bulletin 124: 372-422.

Robson, Gary D. 2004. The closed captioning handbook. Amsterdam: Elsevier.

Romero-Fresco, Pablo. In press. The Reception of Subtitles for the Deaf and Hard of Hearing in Europe. Peter Lang.

Smith, Tim J., and John M. Henderson. 2008. “Edit Blindness: The Relationship Between Attention and Global Change Blindness in Dynamic Scenes.” Journal of Eye Movement Research 2 (2), 6: 1-17.

Specker, Elizabeth A. 2008. L1/L2 Eye Movement Reading of Closed Captioning: A Multimodal Analysis of Multimodal Use. Unpublished PhD thesis, University of Arizona.

Szarkowska, Agnieszka, Izabela Krejtz, and Łukasz Dutka. Forthcoming. “The Effects of Subtitle Presentation Rate, Text Editing and Type of Subtitling on the Comprehension and Reading Patterns of Subtitles Among Deaf, Hard of Hearing and Hearing Viewers.” To appear in Across Languages and Cultures 2016, vol. 2.

Szarkowska, Agnieszka, Izabela Krejtz, Zuzanna Kłyszejko and Anna Wieczorek. 2011. “Verbatim, Standard, or Edited? Reading Patterns of Different Captioning Styles Among Deaf, Hard of Hearing, and Hearing Viewers.” American Annals of the Deaf 156 (4): 363-378.

Vanderplank, Robert. 1988. “The Value of Teletext Sub-titles in Language Learning.” ELT Journal 42 (4): 272-81.

Velichkovsky, Boris M., Markus Joos, Jens R. Helmert, and Sebastian Pannasch. 2005. “Two Visual Systems and Their Eye Movements: Evidence from Static and Dynamic Scene Perception.” In CogSci 2005: Proceedings of the XXVII Conference of the Cognitive Science Society, 2283-2288.

Winke, Paula, Susan Gass, and Tetyana Sydorenko. 2013. “Factors Influencing the Use of Captions by Foreign Language Learners: An Eye Tracking Study.” The Modern Language Journal 97 (1): 254-275.

Yarbus, Alfred L. 1967. Eye movements and vision. New York, NY: Plenum Press.

 

Notes

[1] Values are taken from a number of sources and vary depending on a number of factors (see Rayner, 1984).

 

Bios

Jan-Louis Kruger is director of translation and interpreting in the Department of Linguistics at Macquarie University in Sydney, Australia. He holds a PhD in English on the translation of narrative point of view. His main research interests include the reception and cognitive processing of audiovisual translation products, including aspects such as cognitive load, comprehension, attention allocation, and psychological immersion.

Agnieszka Szarkowska, PhD, is Assistant Professor in the Institute of Applied Linguistics at the University of Warsaw, Poland. She is the founder and head of the Audiovisual Translation Lab, a research group working on media accessibility. Her main research interests lie in audiovisual translation, especially subtitling for the deaf and the hard of hearing, and audio description.

Izabela Krejtz, PhD, is Assistant Professor at the University of Social Sciences and Humanities, Warsaw. She is a co-founder of the Eyetracking Research Center at USSH. Her research interests include neurocognitive and educational psychology. Her applied work focuses on pro-positive trainings of attention control, eye tracking studies of the perception of audiovisual material, and emotion regulation.

Sound and Sight: An Exploratory Look at Saving Private Ryan through the Eye Tracking Lens – Jennifer Robinson, Jane Stadler and Andrea Rassell

Abstract

Using eye tracking as a method to analyse how four subjects respond to the opening Omaha Beach landing scene in Saving Private Ryan (Steven Spielberg, 1998), this article draws on insights from cinema studies about the types of aesthetic techniques that may direct the audience’s attention, along with findings about cognitive resource allocation in the field of media psychology, to examine how viewers’ eyes track across film footage. In particular, this study examines differences when viewing the same film sequences with and without sound. The authors suggest that eye tracking on its own is a technological tool that can be used both to reveal individual differences in experiencing cinema and to find psychophysiologically governed patterns of audience engagement.

Introduction

Steven Spielberg’s Saving Private Ryan (1998) begins at a geriatric pace, ambling alongside an elderly World War II veteran as he visits a military cemetery and begins to reminisce about the men who saved his life during the Battle of Normandy in June 1944. This is where the story really starts, with a platoon of terrified, seasick servicemen led by Captain John Miller (Tom Hanks) landing on Omaha Beach, where they come under heavy fire from German infantry. The Omaha Beach landing scene is gruelling in its experiential intensity as the hand-held camera locates the audience alongside soldiers desperately fighting their way toward the enemy line amidst relentless machine gunfire and bone-shuddering explosions that tear them limb from limb.

An interdisciplinary 2014 study by Vittorio Gallese (one of the scientists credited with the discovery of mirror neurons), fellow neuroscientists Katrin Heimann and Maria Alessandra Umiltà, and film scholar Michele Guerra investigated the effects of camera movement on the audience’s feeling of involvement in film scenes and their ability to place themselves in the position of a screen character. This study was conducted using a high-density electroencephalogram (EEG) to test whether the audience’s experience of what Gallese (2012, 2013) terms “embodied simulation”—that is, neural mirroring responses that are associated with empathy—is affected by camera movement as well as by the action of human figures on screen. The researchers found that the relationship between cognition and action perception is significantly influenced by camera movement and that the use of camera techniques such as steadicam elicits stronger mirroring responses and an augmented sense of involvement in the scene because this type of cinematography more closely resembles human movement than a static camera, zooms, or dolly-mounted tracking shots do (Heimann et al. 2014, 2098–99).

These findings are consistent with eye tracking studies by Paul Marchant and colleagues who have demonstrated that the audience’s visual attention is captured and guided by mobile framing, focus, the direction of screen characters’ movement and lines of sight, and the colour and motion of other aspects of the mise-en-scene (Marchant et al. 2009, 157–58). This interplay of figure movement and the technical and aesthetic dimensions of cinematography is relevant to Saving Private Ryan in that the arresting beach landing scene at the start of the film is shot almost exclusively using hand-held camera to simulate human movement. The study by Heimann and colleagues suggests that this form of camera movement, teamed with the panicked motion of the figures on screen, functions to elicit a sense of affective identification with Captain Miller and the soldiers he leads by stimulating a shared experience of embodied confusion and sensory overload as the military men shake with fear and scramble to dodge the shrapnel ricocheting across the war-ravaged beach. The unstable gaze of the constantly moving camera makes it as difficult for the audience as it is for the soldiers in the scene to focus attention or see a pathway to safety and this shared perceptual experience may elevate neural mirroring responses or empathic concordance with observed actions.

Venturing into an area that has received less attention from film scholars, media effects researchers and neuroscientists, we were struck by the acoustic ferocity of the Omaha Beach scene and we sought to understand the ways in which sound functions as a perceptual cue that may affect the cinema audience’s attention and modulate gaze patterns. This interdisciplinary study brings empirical eye tracking research into dialogue with formalist understandings of film style and cognitive engagement with narrative, using the following question to establish a framework for analysis: What audio-visual aesthetic cues guide the audience’s attention and what psychophysiological processes underlie audience responses to the screen? In particular, we draw on existing research on film dialogue by Todd Berliner and others and we supplement eye tracking data by drawing on Lisa Coulthard’s concept of “dirty sound,” Vivian Sobchack’s work on the “sonic imagination” and cognitivist methods of aesthetic film analysis to work through the experiential dimensions of the sonic confusion generated in the scene.

Cognitivist film theory, as advanced by scholars such as David Bordwell and Carl Plantinga, conceptualises film and television spectatorship as the active construction of meaning via the inferential elaboration of perceptual cues and formal screen production conventions. In a quest for greater explanatory power and a more holistic understanding of spectatorship that moves beyond rational thought and conscious inferential processes, film theorists are increasingly drawing upon empirical research in fields such as neuroscience, psychology and media effects to test assumptions about how audiences perceive and respond to screen texts, and to account for the sensory experiences and involuntary physiological reactions of the audience.

Psychophysiological Approaches to Cinema Studies

There are several different empirical approaches to studying audience members’ responses to film, including biometrics, neuroimaging, and psychophysiological techniques. Psychophysiology is an area of research that quantifies physiological or bodily responses to psychological states. Neurocinema (Hasson et al. 2008) and Psychocinematics (Shimamura 2013) are emerging fields that connect these psychophysiological methods to cinematic experience. Where the neurocinematic approach involves imaging of the brain while watching cinema, in psychophysiology the subject’s physiological state is understood to be representative of psychological responses (for example, skin conductance and heart rate indicate arousal or an emotional reaction). One such response is the involuntary orienting response, which automatically assigns cognitive resources to processing stimuli in screen texts.

Annie Lang (Lang 2000; Lang et al. 2000) proposes a model of responding to dynamic screen media that starts from the position that there are limited cognitive resources that any individual can bring to bear when processing mediated content. Features of the screen content can automatically consume some of those cognitive resources, which leaves less capacity for the intentional interpretation of meaning, formulation of hypotheses or speculation about protagonists’ motives (the very processes that cognitive film theory privileges). While this work is well developed for visual attributes such as hard edits, movement and the introduction of new features, Lang and colleagues are developing a similar catalogue of attributes for aural content (sound). Using a physiological indicator of an orienting response (a short, rapid decrease in heart rate just after the feature is introduced), they have identified “voice changes, music onsets, sound effect onsets, production effect onsets, emotional word onsets, silence onsets, and voice onsets” as aural cues that orient attention (Lang et al. 2014, 4).
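
To make the heart-rate indicator concrete, the sketch below flags candidate orienting responses as short decelerations immediately after a stimulus onset. It is illustrative only: the 1 Hz sampling rate, window length and threshold are assumptions rather than the criteria used by Lang and colleagues.

```python
# Illustrative sketch (not Lang et al.'s procedure): flag onsets that are
# followed by a short, rapid heart-rate deceleration. Assumes heart_rate is
# sampled at 1 Hz (beats per minute) and onset_times are given in seconds.
from statistics import mean

def orienting_responses(heart_rate, onset_times, window=4, drop_bpm=2.0):
    """Return the onsets followed by a mean heart-rate drop of at least drop_bpm."""
    flagged = []
    for t in onset_times:
        before = heart_rate[max(0, t - window):t]   # baseline just before onset
        after = heart_rate[t:t + window]            # samples just after onset
        if before and after and mean(before) - mean(after) >= drop_bpm:
            flagged.append(t)
    return flagged

# Example: a sound-effect onset at t = 10 s followed by a brief deceleration.
hr = [72] * 10 + [69, 68, 68, 69] + [72] * 10
print(orienting_responses(hr, [10]))  # -> [10]
```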

Embodied responses to film are not necessarily indicative of cognitive processing as some responses occur in the autonomic nervous system (such as the startle response to a loud sound or a sudden movement); other processes involve the conscious allocation of cognitive resources. For example, seeing a poisonous reptile on screen can make the audience form hypotheses about impending danger, which can then prime emotional reactions such as anxiety. Increased heart rate during the shower scene in Psycho (Alfred Hitchcock, 1960), or light perspiration on the palms as viewers watch Grace Kelly fossick through the neighbour’s apartment in Rear Window (Alfred Hitchcock, 1954), are widely understood to be biological evidence of changes in psychological states in response to cinema. In such a state of arousal hormones are released, blood pressure rises, and brain wave patterns shift. These biological changes can be recorded using non-invasive techniques and have proven to be stable markers of psychophysiological changes. Some commonly used psychophysiological measures are eye tracking, Galvanic Skin Response (GSR), and pupillometry; however, in this exploratory study, only eye tracking has been used.

Eye Tracking

Eye tracking is a technique that can measure the movements of the eye by gauging the direction of infrared light bounced off the eye surface. The most common technique utilises the eye’s physiology to create different reflections of the light source from the pupil and cornea that are captured by two cameras and used to track the gaze and control for head and eye movements. While there are several types of eye tracking devices, those most pertinent to this study include eye trackers that require the viewer to be in a fixed position such as seated in front of a monitor, and those that can be head-mounted or worn like glasses by a mobile viewer. Eye tracking devices are used in a wide variety of fields including marketing, sports coaching and user experience. Tobii Technology’s range of eye trackers is frequently employed as a research tool to measure attention, as is the case in this study. Two of the main characteristics of eye movement that can be measured by eye tracking devices are saccades and fixations.

Saccades

In order to collect high-quality visual data about our environment, the eye needs to be constantly redirected. We use movements called saccades in order to do this. Saccades occur at a rate of about 2-3 per second (Tatler 2014) and can be voluntary or reflexive (Duchowski 2007). Their duration ranges from 10-100 ms, rendering the individual effectively blind during this time, but not for long enough to be perceivable: “Visual sensitivity effectively shuts down during a saccade via a process known as saccadic suppression, in order to ensure that the rapid movement of light across the retina is not perceived as motion blur” (Smith 2014, 86).
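
In practice, saccades are often separated from other gaze samples with a simple velocity threshold, as in the hedged sketch below; the sampling rate and the 30 degrees-per-second threshold are assumed values, not those of any particular eye tracker or of this study.

```python
# Minimal velocity-threshold (I-VT style) sketch for labelling saccadic samples.
# Assumes gaze coordinates are in degrees of visual angle at a fixed sample rate.
import numpy as np

def saccade_samples(x, y, sample_rate_hz=120, velocity_threshold_deg_s=30.0):
    """Return a boolean array marking samples whose angular velocity exceeds
    the threshold (such samples are commonly treated as part of a saccade)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    dt = 1.0 / sample_rate_hz
    velocity = np.hypot(np.diff(x), np.diff(y)) / dt   # deg/s between samples
    velocity = np.append(velocity, 0.0)                # pad to original length
    return velocity > velocity_threshold_deg_s
```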

Fixations

A fixation is a length of time when the eyes stop large movements (saccades) and stay focused on a small visual range (typically about 5 degrees). Fixations should not be thought of as static, as the name implies, but as “miniature eye movements: tremor, drift and microsaccades” (Duchowski 2007, 46). Their duration is usually in the range of 150-600 ms (Duchowski 2007) and most visual information is processed when the eyes stabilize or fixate on a point on the screen (Smith 2014, 86).
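
Fixations can likewise be recovered from raw gaze samples with a dispersion-threshold routine of the kind sketched below. The threshold and minimum run length are illustrative; the Tobii Studio fixation filter used later in this article applies its own settings.

```python
# Compact dispersion-threshold (I-DT style) sketch for grouping gaze samples
# into fixations. Threshold values are illustrative, not the Tobii defaults.
def detect_fixations(samples, max_dispersion=35.0, min_samples=9):
    """samples: list of (x, y) gaze points in pixels, in temporal order.
    Returns (start, end) index pairs (end exclusive) for runs whose bounding
    box stays within max_dispersion pixels for at least min_samples samples."""
    fixations, start, n = [], 0, len(samples)
    while start < n:
        end = start + 1
        # grow the window while its bounding box stays within the threshold
        while end < n:
            xs = [p[0] for p in samples[start:end + 1]]
            ys = [p[1] for p in samples[start:end + 1]]
            if (max(xs) - min(xs)) + (max(ys) - min(ys)) > max_dispersion:
                break
            end += 1
        if end - start >= min_samples:
            fixations.append((start, end))
        start = end  # move on to the next candidate window
    return fixations
```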

Previous findings

A consistent finding from eye tracking research that is relevant to this study of cinema is that when scenes are viewed on a screen or a monitor, the gaze tends to fixate at the centre more than the periphery, even when salient features are not located in the middle of the frame. Because this tendency may be adaptive (for example, the centre is a good resting place from which to respond quickly to new action that requires attention), rather than solely visual, Benjamin Tatler (2014) warns against a reductive expectation that these fixations are caused by visual stimuli alone. While this study attends closely to visual stimuli and the aesthetic techniques used by filmmakers to direct attention, we also consider aural stimuli and involuntary biological responses.

Despite the large body of eye tracking research, Antoine Coutrot and colleagues claim that until recently, only two preliminary studies had investigated the influence of sound on eye movements and patterns of attention when watching film or video footage (Coutrot et al. 2012, 2).[i] When studying eye movements in response to the presence and absence of sound in audiovisual stimuli, Coutrot et al. analyse differences in three further eye tracking metrics: dispersion, distance to centre, and Kullback-Leibler divergence. Dispersion refers to the “variability of eye positions between observers” (2012, 4). Distance to centre is a measurement of “the distance between the barycenter of a set of eye positions and the centre of the screen” (Coutrot et al. 2012, 4). Kullback-Leibler divergence “is used to estimate the difference between two probability distributions. This metric can be compared as a weighted correlation measure between two probability density functions… The lower the KL-divergence is, the closer the two distributions are… If soundtrack impacts on eye position locations, we should find a significant difference between the mean inter and intra KL-divergences” (Coutrot et al. 2012, 4). Dispersion provides information about the variability between eye positions, but does not capture the relative position of the two sets of eye positions for the two stimulus conditions (sound on/sound off); the KL-divergence does the opposite.
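
To show how these three metrics relate to raw gaze data, the sketch below computes dispersion, distance to centre and a histogram-based KL-divergence from sets of eye positions. The screen size, binning and smoothing are assumptions; Coutrot et al.'s published implementation may differ.

```python
# Sketch of the three metrics reported by Coutrot et al. (2012), computed from
# two sets of eye positions (e.g., sound on vs. sound off). Screen size,
# binning and smoothing are assumptions, not the published implementation.
import numpy as np

def dispersion(points):
    """Mean distance of eye positions from their barycentre."""
    pts = np.asarray(points, float)
    return float(np.mean(np.linalg.norm(pts - pts.mean(axis=0), axis=1)))

def distance_to_centre(points, screen_size=(1280, 720)):
    """Distance between the barycentre of eye positions and the screen centre."""
    centre = np.asarray(screen_size, float) / 2.0
    return float(np.linalg.norm(np.asarray(points, float).mean(axis=0) - centre))

def kl_divergence(points_p, points_q, bins=16, screen_size=(1280, 720)):
    """KL-divergence between 2-D histograms of two sets of eye positions;
    lower values mean the two gaze distributions are more alike."""
    rng = [[0, screen_size[0]], [0, screen_size[1]]]
    p, _, _ = np.histogram2d(*np.asarray(points_p, float).T, bins=bins, range=rng)
    q, _, _ = np.histogram2d(*np.asarray(points_q, float).T, bins=bins, range=rng)
    p = (p + 1e-9) / (p.sum() + 1e-9 * p.size)   # smooth and normalise
    q = (q + 1e-9) / (q.sum() + 1e-9 * q.size)
    return float(np.sum(p * np.log(p / q)))
```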

Coutrot and colleagues (2012) found that eye movements follow a consistent pattern that is involuntary and that is not affected by screen aesthetics, narrative content, genre, sound or other factors in the first second following an edit. After a brief latent phase, the eye automatically refocuses on the centre of the screen after a cut and takes a second to adjust to the new image. Thereafter, they found that sound does influence gaze patterns in the following ways: dispersion is lower in the sound on condition than the sound off condition; fixation locations are different between the two conditions; sound results in larger saccades than the same footage without sound; and sound elicits longer fixations than sound off (Coutrot et al. 2012, 8).

More recently, Coutrot and Guyader found that “removing the original soundtrack from videos featuring various visual content impacts eye positions increasing the dispersion between the eye positions of different observers and shortening saccade amplitudes” (2014, 2). This study also found that in dialogue scenes, the audience’s attention tends to “follow speech turn taking more closely” (Coutrot and Guyader 2014, 1). A 2014 study by Tim Smith also investigated the cross-modal influences of audio on visual attention and found that “When the visual referent is present on the screen, such as the face of a speaker (that is, a diegetic on-screen external sound source), gaze will be biased towards the sound source, and towards the lips if the audio is difficult to interpret” (Smith 2014, 92). This accords with research in film studies into dialogue and conversation in movies. For instance, Berliner notes that movie dialogue is typically scripted to advance the narrative by directing the audience’s attention to key plot points and protagonists;[ii] furthermore, “characters in Hollywood movies communicate effectively and efficiently through dialogue” and “movie characters tend to speak flawlessly” (Berliner 2010, 191). Similarly, Aline Remael identifies the promotion of narrative continuity and textual cohesion as two of the chief functions of film dialogue (2003, 227; 233). Given these findings from two different fields of research, we pay particular attention to gaze patterns during dialogue exchanges in the analysis of Saving Private Ryan that follows.

Method

Building on previous work by Tim Smith, Antoine Coutrot, Nathalie Guyader and other researchers who have used eye tracking to investigate attentional synchrony[iii] (as illustrated in gaze plots and heat maps that represent the concentration of the audience’s gaze), our methodology examines the distribution of fixations across nine smaller central Areas of Interest (AOIs) during film sequences to explore what is occurring for viewers who are not following the predicted pattern and instead are searching for something else. Using two conditions as stimuli (film with sound on, and film with sound off), we conducted a qualitative comparison between and within the viewing patterns of four subjects. Within the limitations of a qualitative and exploratory study with only four subjects, we drilled down to conduct a fine-grained mapping of attention to determine whether it functions in a predictable way in relation to previous findings about dialogue scenes, sonic cues, and camera and figure movement.
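
Heat maps of the kind referred to above can be produced by pooling fixations across viewers and smoothing them over the frame, as in the brief sketch below; the frame resolution and smoothing width are assumptions rather than the settings used in this study.

```python
# Sketch of an aggregate gaze heat map built from fixations pooled across
# viewers; resolution and smoothing width are illustrative assumptions.
import numpy as np
from scipy.ndimage import gaussian_filter

def gaze_heatmap(fixations, screen_size=(1280, 720), sigma_px=40):
    """fixations: iterable of (x, y, duration) tuples pooled across subjects.
    Returns a 2-D array (height x width) of duration-weighted gaze density."""
    width, height = screen_size
    grid = np.zeros((height, width))
    for x, y, duration in fixations:
        xi, yi = int(round(x)), int(round(y))
        if 0 <= xi < width and 0 <= yi < height:
            grid[yi, xi] += duration      # weight each location by dwell time
    return gaussian_filter(grid, sigma=sigma_px)
```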

For the purposes of this study, a Tobii X-120 eye tracker and Tobii Studio 2.3.2 software (Tobii Technology, Stockholm, Sweden) were used to record seven individual subjects (five females, two males) as they watched film footage. As this was an exploratory study, subjects were recruited from the researchers’ networks, with ethics approval. They were seated and positioned 55-65 cm away from the eye tracker for viewing. All subjects were recorded on the same Tobii X-120, in the same room, with the film footage played on the Tobii computer to standardise start times for all subjects to enable comparison in later analysis. Each subject was successfully calibrated by looking at symbols in different areas of the screen, which ensures the eye tracker gets a reliable measure of gaze location across the whole screen. After the viewing session, each subject’s data was analysed for quality, with three subjects excluded because one condition had segments with lower reliability than desired. Thus, the results are reported on the basis of four subjects with high quality and complete data (three females, one male).

We analysed the areas of the screen where subjects looked while watching discrete sequences of the key beach-landing scene at the beginning of Saving Private Ryan. We investigated how different stylistic techniques employed in the following four consecutive sequences of the scene affect the audience’s gaze patterns:

  1. The “Indistinct Dialogue” sequence is an 11-second clip that was chosen with a view to finding out how the audience’s attention is affected when dialogue is overridden by chaotic background noise, forcing the audience to strain to decipher what is being said. This part of the scene occurs immediately after Captain Miller has located the men under his command. The first shot is an unsteady medium close-up of Miller shouting to Horvath (Tom Sizemore) as bullets splash in the surf around him and ping off the metal structure he is crouching behind. Miller yells, “Sergeant Horvath! (Explosion.) Move your men off the beach! (Water splashes up noisily.) Now!” The next shot shows Horvath’s response in a hand-held medium close-up as he points at his men and hollers, “OK you guys, get on my ass! (Directional hand signals as bullet hits metal and drowns out dialogue.) Follow me! (Horvath ducks as a mortar shell explodes, screen right.)”
  2. The “Wounded Man” sequence that occurs as the men move up the beach is a 30-second segment that is noteworthy because it includes a subjective sequence that solicits audience engagement with Captain Miller’s experience of temporary hearing loss following the concussive impact of a mortar shell nearby. This clip begins with a hand-held long shot of carnage on the beach as Miller moves to the right, dragging Briggs, a wounded soldier he is trying to help. The audience hears artillery fire, crashing, splashing, and shouting as Miller lugs Briggs into the mid-ground, with explosions and debris visible in the foreground. As mortar shells hit, spraying blood and water upwards, Miller hollers for a medic. Following a massive explosion, the sound of gunfire in the background is muted and is replaced with the subdued drone of a low, echoing, wind-like sound that communicates Miller’s subjective experience of shellshock as sand obscures everything and Miller falls to the ground. In slow motion, we cut to a low level close-up of boots in the sand. Miller scrambles up and the hand held camera follows him. Other soldiers pass in front of the camera, occluding the lens and masking the edit. The sounds of the battlefield are replaced by an echoing, low frequency droning noise and the subdued clink of military gear as Miller is momentarily dazed by shellshock. As he gets up and grabs Briggs’s arm, the sound of artillery returns loudly and we see Miller in long-shot framed with a low level camera. He staggers, looks back, and realises that Briggs is dead: his lower abdomen and legs have been blasted away. This sequence ends with a close-up of Miller’s reaction as he looks at Briggs in shock, abandons him, crawls away from the camera, then stands and runs toward the sand dunes, into enemy fire and gun smoke.
  3. The “Sand Dunes” sequence is one unusually long and complex shot that lasts for a full minute; however, in the interests of generating a more granular analysis we divided the shot in two. The first 25 seconds “Sand Dunes: In Command” begins with a match on action as the body of a soldier that was catapulted into the air by a grenade in the previous shot now hits the ground. Quickly, the hand-held camera tilts down from the long-shot of the falling soldier, pans left, and follows Miller forward as he dives behind a ridge of sand for shelter. The camera pushes in to frame Miller in close up as the dialogue begins:

Miller: (Turns left to address the radio operator.) Shore Party! No armour has made it ashore. We got no DD tanks on the beach. Dog One is not open. (Miller rolls to the right so he is framed in an over-the-shoulder shot as he shouts to other soldiers seen in medium-long shot on the dune.) Who is in command here?

Soldier: You are, Sir.

Miller: Sergeant Horvath!

Horvath: Sir!

Miller: Do you know where we are?

Horvath: Right where we’re supposed to be, but nobody else is …

4. “Sand Dunes: Radio” is a 35-second continuation of the shot detailed in sequence three, beginning when the hand-held camera pans left as Miller rolls back toward the radio operator, facing the camera in close up as he grabs the radio operator’s shoulder and hollers in his ear, straining to be heard over the background gunfire.

Soldier: (Distant, off screen, as Miller rolls to the left.) Nobody’s where they’re supposed to be!

Miller: (To radio operator) Shore Party! First wave ineffective. We do not hold the beach. Say again, we do not hold the beach. (Miller turns and rolls back towards the right, away from the camera. The camera zooms toward Horvath, excluding Miller from the frame as he listens to Horvath.)

Horvath: (Indistinct) We’re all mixed up, sir. We got the leftovers from Fox Company, Able Company and George Company. Plus we got some Navy Demo guys and a Beachmaster.

Miller: (The camera follows Horvath as he rolls to the left, toward Miller; we then see Miller in medium close-up as he turns back to radio operator.) Shore party! Shore party! (Realises radio operator is deceased; grabs hold of the radio himself.) Cat-F, Cat-F, C-… (Miller realises the radio is dead.)

In Overhearing Film Dialogue, Sarah Kozloff states that “although what the characters say, exactly how they say it, and how the dialogue is integrated with the rest of the cinematic techniques are crucial to our experience and understanding of every film since the coming of sound, for the most part analysts incorporate the information given by a film’s dialogue and overlook the dialogue as signifier” (2000, 6). By contrast, our analysis focuses closely on what the characters say, and also on what they hear. It may seem counterintuitive to be investigating the significance of sound in an eye tracking study, because sound is something that the eyes are not normally required to process. However, the Omaha Beach dialogue sequences are unusual because the audience has to rely on their eyes to search for contextual cues in order to fill in gaps in understanding due to indistinct vocals. Such cues include the direction of figure movement and eye lines (when Horvath yells, “Get on my ass and follow me!” but his words are obscured by background sound), or facial expressions and body language when words don’t fully make sense due to the inclusion of military terminology or unfamiliar radio communication codes and incomplete communicative exchanges (when Miller shouts “Dog-One” and “Cat-F” into the radio and receives no response). Coutrot and colleagues identify numerous instances in which aural and visual stimuli interact to affect attention (they offer as one example, “the help given by ‘lip reading’ to understanding speech, even more when speech is produced in poor acoustical conditions,” as is the case in Saving Private Ryan); furthermore, they report that “perceivers gazed more at the mouth as auditory masking noise levels increased” (Coutrot et al. 2012, 2).[iv] Our study builds on this work, as detailed below.

With respect to each of the four sequences outlined above, two different analyses were conducted. Given that most of the viewing was within the central area of the screen, we subdivided that area into a three by three grid providing nine smaller areas of interest, as illustrated in Figure 1.[v] The total time fixated and the mean fixation duration were calculated for each of the nine Areas of Interest (AOI). For the default method on the Tobii eye tracking system that we used, a fixation is identified when the gaze remains steadily focused within the same area of X-Y coordinates on the screen, typically within 35 pixels. This technique of dividing the centre of the screen into nine smaller AOIs allowed greater granularity in determining the primary AOI to which most people attended and the instances where individuals diverged and looked at other parts of the screen.

Figure 1: Nine Areas of Interest (AOI)

Figure 1: Nine Areas of Interest (AOI), Saving Private Ryan: Still from Saving Private Ryan (Steven Spielberg, 1998), data illustrated by the authors
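
The sketch below illustrates how fixations can be assigned to a three-by-three grid of this kind and how total and mean fixation duration per AOI can then be tallied. The pixel bounds of the central area are placeholders, not the coordinates used in the study.

```python
# Sketch of mapping fixations onto a 3 x 3 grid of AOIs and tallying total and
# mean fixation duration per AOI. The central-area bounds are placeholders.
from collections import defaultdict

def aoi_durations(fixations, area=(160, 90, 1120, 630), rows=3, cols=3):
    """fixations: (x, y, duration_ms) tuples. area: (left, top, right, bottom)
    of the central region that is subdivided into a rows x cols grid.
    Returns {aoi_number: (total_ms, mean_ms)} with AOIs numbered 1-9 row-wise."""
    left, top, right, bottom = area
    cell_w, cell_h = (right - left) / cols, (bottom - top) / rows
    per_aoi = defaultdict(list)
    for x, y, dur in fixations:
        if left <= x < right and top <= y < bottom:
            col = int((x - left) // cell_w)
            row = int((y - top) // cell_h)
            per_aoi[row * cols + col + 1].append(dur)
    return {aoi: (sum(d), sum(d) / len(d)) for aoi, d in sorted(per_aoi.items())}
```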

The second step was to analyse whether the participants exhibited attentional synchrony, primarily looking at the same AOIs as each other, or displayed individual variation. To do this, we calculated an estimate of attentional distribution: for this exploratory study, a simple ratio of time spent following the guided (dominant) viewing pattern to time spent looking at other parts of the screen.[vi] The dominant AOIs were determined by which AOIs accounted for over 50% of fixation time in the sound on condition (the control). As some sequences guide the viewer between several of the AOIs, the combination that yielded a majority of viewing time was used rather than simply the AOI with the greatest portion of time. The Distribution Ratio was then calculated as follows:

Distribution Ratio = (number of non-dominant AOIs viewed × amount of time spent in those AOIs) ÷ (number of AOIs in the dominant view × amount of time spent in the dominant AOIs)
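
The sketch below shows one way this ratio might be computed from per-AOI fixation times, with the dominant AOIs selected, as described above, as those that together account for over 50% of fixation time in the sound-on (control) condition. It is an illustrative reconstruction of the formula, not the authors' analysis script.

```python
# Illustrative reconstruction of the distribution ratio from per-AOI fixation
# times; not the authors' analysis script.
def dominant_aois(sound_on_times):
    """Pick AOIs, largest first, until they cover over 50% of fixation time."""
    total = sum(sound_on_times.values())
    chosen, covered = set(), 0.0
    for aoi, t in sorted(sound_on_times.items(), key=lambda kv: -kv[1]):
        chosen.add(aoi)
        covered += t
        if covered > 0.5 * total:
            break
    return chosen

def distribution_ratio(aoi_times, dominant):
    """aoi_times: {aoi: total fixation time} for the condition being scored."""
    dominant_time = sum(t for a, t in aoi_times.items() if a in dominant)
    non_dominant = {a: t for a, t in aoi_times.items()
                    if a not in dominant and t > 0}
    return (len(non_dominant) * sum(non_dominant.values())) / (
        len(dominant) * dominant_time)
```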

Eye Tracking Results and Discussion

The distribution ratio was intended to test whether the findings of Coutrot et al. (2012) applied to what we saw in the sequences from the Omaha Beach landing scene. We hypothesised that a lack of sound would increase divergence away from the dominant AOI, compared to the sound-on (control) condition, which should reinforce “attentional synchrony” (Smith 2013) by guiding the viewer to the most important focal area. However, we expected that because some of the sequences we were examining contained many competing audio and visual contextual cues, we might not see such a clear distinction. Furthermore, we anticipated that our findings in relation to dialogue might diverge from Coutrot and Guyader (2014) because the Omaha Beach scene has atypical dialogue sequences that deviate from conventional turn-taking and are overloaded with noise and movement to create a sense of confusion.

We found that, averaged across all sequences, three of the four viewing subjects followed the expected pattern of greater divergence with sound off, as indicated by a positive difference score in Table 1. The fourth (Subject 3) had a slightly higher gaze distribution when there was sound, but did show an increase in the mean number of AOIs with fixations in the “sound off” condition. This suggests that, on the whole, sound does function to focus attention more tightly. Given the small number of participants in this exploratory study, this result is encouraging.


Table 1: Distribution Ratios Arranged by Subject
*Note: A positive difference indicates that in line with Coutrot et al. (2012), there was greater distribution of attention across AOIs for the no sound condition. The higher distribution ratio when sound was off could be due to either more total time fixated away from the dominant AOI or having fixations in more of the nine AOIs, being more spread out, or a combination of both.

However, not all sequences of the beach scene elicited the same results. When the distribution data is averaged by sequence, it turns out that sequences 1 (d = -3.4) and 3 (d = -1.8) were strongly in the predicted direction with less distribution across AOIs when there was sound. For sequence 2 there was little difference (d < 1.0), but it was still in the predicted direction. However, for sequence 4 (d = 1.0), there was greater distribution of the fixations away from the dominant AOI in the sound on condition, indicating greater focus when there was no sound. Sequence 4 (“Sand Dunes: Radio”) does not follow screen conventions for shooting dialogue: it is shot in one long take rather than the customary shot-reverse-shot style, and because many of Captain Miller’s lines contain military jargon and receive no response, the audience’s habituated expectations about turn-taking and shifting attention from speaker to speaker are derailed. Breaking with aesthetic and technical conventions may disrupt the cognitive process of meaning-making when watching a film. Another, more physiologically based reason that the “Sand Dunes: Radio” sequence may not conform to the gaze distribution patterns found in other parts of the scene is that it is the continuation of a very long take (together the sand dune sequences constitute a single, minute-long shot that viewers watched unbroken); consequently, viewers’ eyes are not re-focused on the centre of the screen following a cut and have more time to rove and explore the visual field for other meaningful cues. Put another way, without any cuts to generate an orienting response during this sequence there is no automatic allocation of cognitive resources to the story or refocusing of attention back onto a particular portion of the screen (Lang 2000, 2014). It is then a very individual response to novel and emotive (signal) cues within the scene that drives where each subject looks over the duration of this sequence. With so many different types of auditory cues that orient the viewer (Lang 2014), it is not surprising that viewers had fixations in the various AOIs we analysed on the screen.

In Saving Private Ryan, as has been found to be the case in other films such as Sergei Eisenstein’s 1938 historical war epic, Alexander Nevsky (see Smith 2014), the overall viewing patterns reflect the intention of the director in that audience members typically look where they are guided by audio-visual screen conventions. Yet an important reminder for further investigation is that not all members of an audience respond in the same way to each scene. This prompts us to ask what cues other than the absence of a soundtrack might lead to increased gaze distribution.

Even though the size of the nine AOIs and the number of participants were small, paired-sample t-tests were conducted to see whether there were any statistically significant differences at the level of each of the nine AOIs between the sound and no sound viewing experiences of the participants. No significant differences between the sound on and off conditions were found for fixation duration, total time fixated, visit duration or total time visiting any particular AOI for three of the sequences: “Indistinct Dialogue,” “Wounded Man,” or “Sand Dunes: Radio.” However, for “Sand Dunes: In Command,” subjects spent significantly (p < .05) less time looking at areas 5 and 6 when the sound was off (significantly greater mean fixation duration, total time fixated and total time visiting AOI 5 when the sound was on; only total time fixated on AOI 6 was greater when sound was present). Interestingly, this did not translate to an increase in any particular AOI, so it seems their gazes spread out (dispersed) in the sound off condition.
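
For readers unfamiliar with the procedure, a paired-sample t-test of this kind can be run as sketched below. The values are invented placeholders included only to show the form of the comparison; they are not the study's data.

```python
# Sketch of the paired comparison described above: for one AOI and one metric,
# compare the four subjects' sound-on and sound-off values. The numbers are
# invented placeholders, not the study's data.
from scipy import stats

# Total time fixated (seconds) on a single AOI for each subject, sound on vs. off.
sound_on = [4.2, 3.8, 5.1, 4.6]
sound_off = [2.9, 3.1, 3.4, 3.0]

t_statistic, p_value = stats.ttest_rel(sound_on, sound_off)
print(f"t = {t_statistic:.2f}, p = {p_value:.3f}")  # significant if p < .05
```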

With a small sample size, it is not surprising that there were few statistical differences, but it was surprising that focusing on the central area as represented by the nine AOIs did not pick up what seemed “obvious” when looking at the aggregated gaze plots. For example, in the “Indistinct Dialogue” sequence (see Figure 2) there is an explosion on the right-hand side of the screen that drew the attention of the subjects equally whether sound was off or on (the screen characters ducked as the mortar shell whistled in, so aural and visual cues reinforced each other). Although there was one subject who looked down to the lower right part of the screen outside our central area of view when sound was off, the overall pattern was consistent in both conditions.

Figure 2a (above) and Figure 2b (below): Gaze plots of four subjects for “Indistinct Dialogue” (Sequence 1)

Figure 2a (above) and Figure 2b (below): Gaze plots of four subjects for “Indistinct Dialogue” (Sequence 1): Still from Saving Private Ryan (Steven Spielberg, 1998), data illustrated by the authors

We explored whether eye tracking could help reveal differences in viewing experience amongst subjects or even offer insight into what was happening beneath the surface of apparent synchrony. An obvious finding is that each individual subject had a different gaze pattern across the four sequences sampled, as can be seen when examining the pattern as recorded for Subject 2 in Figure 4 and Figure 6. For the longer sequences, their gaze fixated in more than half of the AOIs, while for the shorter sequences they were often more focused on particular AOIs. However, this pattern could change with sound on and off. For example, when examining the pattern for Subject 1 for “Indistinct Dialogue”, there was a noticeable difference between sound on and off: they viewed only three AOIs with sound, but with sound off they spread out to three new AOIs, ranging across six in total. The greatest shift was away from the top left corner of our central area, where time was fixated when the footage was played with sound, contracting to the central third of the screen without sound.

Sean Redmond and colleagues reported that the presence of sound only has an effect on fixation duration (number of fixations) for the “Wounded Man” sequence (forthcoming 2015). In the “Wounded Man” sequence, there was no overall difference in gaze location with sound on or sound off. However, the gaze fixation pattern for Subject 2 showed a large qualitative difference (see Figure 3). With sound off, Subject 2 only looked at AOIs 6, 8 and 9 (bottom right part of the central area, which is consistent with Miller’s screen direction and the action of falling to the ground and dragging the wounded soldier, Briggs, in this sequence). However, with sound on, Subject 2 fixates at least briefly in all nine AOIs, with the most time shifting to the centre of the screen where noisy background action is taking place and where other soldiers rapidly pass in front of the camera.

Figure 3: Areas of Interest for subject 2 in “Wounded Man” (Sequence 2)

Figure 3: Areas of Interest for subject 2 in “Wounded Man” (Sequence 2): Still from Saving Private Ryan (Steven Spielberg, 1998), data illustrated by the authors

However, when comparing all of the subjects and how they responded to the “Wounded Man” sequence (see Figure 4), the other three subjects exhibit similar patterns of scanning across the AOIs when sound is on and off. This pattern is what we expected for this sequence, which incorporates a significant subjective sound component when Miller experiences shellshock and is temporarily stunned and deafened. It is possible that subjective sound may help to anchor the viewer’s attention to the character’s experience.[vii]

Figure 4: Total fixation duration in nine AOIs for “Wounded Man” (all subjects)

Figure 4: Total fixation duration in nine AOIs for “Wounded Man” (all subjects): Graphs produced by the authors

A final illustration is the “Sand Dunes: In Command” sequence. Subject 3 was interesting because their viewing patterns for the “Wounded Man” (sequence 2) and the “Sand Dunes: Radio” (sequence 4) were consistent with those of the other subjects; however, the focus for “Sand Dunes: In Command” (sequence 3) was inconsistent. There were clear AOIs for the sound off condition, but with sound their eyes wandered over more of the central area of the screen. Attention is focused on the radio operator in the sound off condition (as indicated by the red bar in AOI 4, middle left of Figure 5). However, with sound on (indicated by the blue bars), the subject’s attention extends to new AOIs, including corners (top left, bottom left, and bottom right) that were not fixated on when there were no sound cues.

Figure 5: Areas of Interest for subject 4 in “Sand Dunes: In Command” (Sequence 3)

Figure 5: Areas of Interest for subject 4 in “Sand Dunes: In Command” (Sequence 3): Still from Saving Private Ryan (Steven Spielberg, 1998), data illustrated by the authors

A comparison of how the subjects fixated during this segment when the sound was on and off (Figure 6) illustrates a similar pattern of eyes fixating in different AOIs in the two conditions, except for Subject 1. Given the length of this sequence, the focus on three main AOIs for all subjects in the sound off condition is interesting in its consistency. However, even though the fixation data averages out to no difference between sound on and off, each of our subjects had a different response when sound was on: from Subject 4 fixating in all of the AOIs to Subject 1 simply staying longer on the same AOIs. The variation between sound on and off for this sequence may simply be an artefact of camera and figure motion, where shifts in the location of the protagonists’ faces on the screen can result in fixations in non-dominant AOIs (Mital et al. 2011, 19). However, the fact that these shifts did not occur in both conditions indicates that there is something different about those shifts when sound is on and the viewer can hear the dialogue. This is the only sequence where there was a significant difference in total time spent fixated in AOIs 5 and 6. The much lower time spent on key AOIs when sound was off suggests the subjects were looking to the periphery of the screen and did not look as much to the nine central AOIs.

Figure 6: Total fixation duration for nine AOIs in “Sand Dunes: In Command” (all subjects)

Figure 6: Total fixation duration for nine AOIs in “Sand Dunes: In Command” (all subjects): Graphs produced by the authors

Relating Eye Tracking Findings to Film Aesthetics

Our qualitative, exploratory analysis of gaze patterns in Saving Private Ryan has used eye tracking to offer an empirical account of cognitive-perceptual processing that includes sound and attends closely to audio-visual cues in the film’s stylistic system. In this way we have sought both to redress the limitations of theoretical approaches to film analysis that privilege inferential cognitive processes and to counterbalance the tendency of empirical studies to neglect the role of screen aesthetics in informing audience responses. In particular, we have built on existing work on eye tracking by taking account of how the aesthetic and experiential deployment of sound might affect perceptual processing, given that sound waves have “palpable force,” which means that sound seems “more materialized, more concrete, and more present to our experience than what we can see” (Sobchack 2005, 7). We have worked from the premise that expectations regarding character and narrative are chief among the ways that screen texts engage audiences in the construction of meaning, yet we have also acknowledged that the process of meaning-making is informed by aesthetic cues and by physiological sense-making, which is in part involuntary. The dangerous and chaotic Omaha Beach scene is what Man-Fung Yip refers to as an “intensified sensory environment” in which, “as a concrete visual and visceral force rather than a mere vehicle for semiotic signification, film violence offers an intensity of raw, immediate sensation that powerfully engages the eye and body of the spectator” (2014, 78). Like Yip, our interest has been sustained by “the complex interplay between the capacity of the human body and the resources of the cinematic medium” (2014, 89).

In her acoustic study of extreme cinema, Lisa Coulthard refers to the use of “deliberate sonic imperfections” (2013, 115) in films in which “visual assaultiveness” is paired with a disturbing soundscape: “Capable of impacting the body in palpable ways, sound is mined in many of these films for its viscerality: as one listens to extremities of acoustic proximity, frequency and volume, one’s own body responds in subconscious ways to those depicted and heard on screen” (2013, 117). These insights about sound are pertinent to Saving Private Ryan in that the Omaha Beach scene is designed to bombard the audience with the relentless onslaught of noise and action that the characters themselves face. In analysing this scene, we began with the hypothesis that “when the intensity of a background sound exceeds a certain threshold, mental activity can be paralyzed” (Augoyard and Torgue 2005, 41 qtd in Coulthard 2013, 118). In other words, we questioned whether the frenzied barrage of sound might cause a form of sensory-cognitive overload that could affect typical patterns of perceptual processing.

The eye tracking results did not reveal a significant pattern for scenes where we predicted this would occur. It would be helpful to explore this in the future with other physiological or neuro-measures that are better at identifying moments of cognitive overload or resource allocation. What does seem to emerge from our exploratory study is that even in films that firmly direct attention, as is characteristic of Spielberg’s directorial style, individual audience members bring their own complexity and experience to the viewing.[viii] Lang and colleagues point out that “complexity should be indexed not by how much of something is in the message but rather by how many human processing resources will be consumed when the message is processed” (Lang et al. 2014, 2). With respect to understanding the specific effects of what individual viewers bring with them to the screen, or teasing out how the audience is affected when watching footage that uses hand-held camera, induces cognitive overload, invokes the acoustic imagination, or uses indistinct dialogue, we conclude that this eye tracking study has raised fruitful questions that may best be answered by an approach that includes multiple measures provided by electroencephalograms, pupillometry or galvanic skin response techniques, as well as eye tracking technologies.

The “Indistinct Dialogue” and “Sand Dunes” sequences have a larger number of what Sobchack terms “‘synch points’ (‘salient’ and ‘prominent’ moments of audio-visual synchrony),” such as lines of dialogue, bullets pinging off metal and mortar shells landing, and these sonic cues “are firmly attached in a logically causal—and conventionally ‘realistic’—relation to the image’s specificity” (2005, 6–7). These synchronised sounds are “not as acousmatically imaginative and generative” as we contend the subjective sound in the “Wounded Man” sequence is, because the sounds appear to be “generated from the physical action” seen on the screen (Sobchack 2005, 7). In the “Indistinct Dialogue” and “Sand Dunes” conversation sequences, our eye tracking experiment did not necessarily reveal greater attentional focus with dialogue. While this counters what Coutrot and colleagues found in their 2012 study and Smith’s finding that sound reinforces visual synchrony (Smith 2013), it is in line with our expectation that the unconventional use of indistinct dialogue and chaotic background sound and imagery would disperse attention. Perhaps dialogue is not something that focuses visual attention, but rather something that focuses engagement. When the dialogue is clear, the viewer is able to look around the screen and absorb other cues about context. Precisely because the linguistic meaning is clear, such expository dialogue does not require as many cognitive resources to process and leaves some free for assessing other audio-visual cues. However, when the dialogue is indistinct, the viewer must then use other cues to work out the importance of the speech; in such cases the audience is essentially in the same position as watching without sound—although they may even be worse off in terms of cognitive resource allocation because there is also a barrage of other sound being processed in concert with the visual stimuli.

Overall, our use of eye tracking in conjunction with aesthetic analysis in our investigation of Saving Private Ryan has supported Coutrot and colleagues’ 2012 findings that dispersion (the degree of variability between observers’ eye positions) was lower with sound than without, so sound generally acted to concentrate perceptual attention. However, unlike Coutrot et al., we teamed eye tracking with qualitative film analysis to explore the effect of aesthetic variation and individual differences on gaze patterns as well as to identify common psychophysiologically governed patterns of attention. In this exploratory study, we found that differences in aesthetic techniques within segments of footage in the same film scene do make a difference to the audience’s gaze patterns and attentional fixation, and we found that within these patterns individual subjects exhibited divergent perceptual processes as well. Although our study is more restricted than comparable work undertaken by Coutrot and others, our attention to screen aesthetics and to variations in subjects’ responses within a single scene affords our method broader explanatory power than a study that excludes outliers and looks for commonalities across a wide range of video styles and genres.

 

References

Alexander Nevsky. Directed by Sergei Eisenstein, 1938. Mosfilm, DVD.

Augoyard, Jean-Francois, and Henry Torgue. 2005. Sonic Experience: A Guide to Everyday Sounds. Translated by Andra McCartney and David Paquette. Montreal: McGill Queen’s University Press.

Berliner, Todd. 2010. Hollywood Incoherent: Narration in Seventies Cinema. Austin: University of Texas Press.

Bordwell, David. 2009. “Cognitive Theory.” In Routledge Companion to Philosophy and Film, edited by Paisley Livingston and Carl Plantinga. 356–367. London: Routledge.

Coulthard, Lisa. 2013. “Dirty Sound: Haptic Noise in New Extremism.” In The Oxford Handbook of Sound and Image in Digital Media, edited by Carol Vernallis, Amy Herzog and John Richardson. 115–126. New York: Oxford University Press.

Coutrot, Antoine, Gelu Ionescu, Nathalie Guyader and Bertrand Rivet. 2011. “Audio Tracks do not Influence Eye Movements when Watching Videos.” Paper presented to the 34th European Conference on Visual Perception, Toulouse, France, August 30, 2011.

Coutrot, Antoine, Nathalie Guyader, Gelu Ionescu and Alice Caplier. 2012. “Influence of Soundtrack on Eye Movements During Video Exploration.” Journal of Eye Movement Research 5.5: 1–10.

Coutrot, Antoine and Nathalie Guyader. 2014. “How Saliency, Faces, and Sound Influence Gaze in Dynamic Social Scenes.” Journal of Vision 14.8: 5.

Duchowski, Andrew T. 2007. Eye Tracking Methodology: Theory and Practice. Dordrecht: Springer.

Gallese, Vittorio. 2013. “Mirror Neurons, Embodied Simulation and a Second-person Approach to Mind-reading.” Cortex in press: 1–3. Accessed August 28, 2014, http://dx.doi.org/10.1016/j.cortex.2013.09.008

Gallese, Vittorio and Michele Guerra. 2012. “Embodying Movies: Embodied Simulation and Film Studies.” Cinema: Journal of Philosophy and the Moving Image 3: 183–210.

Hasson, Uri, Ohad Landesman, Barbara Knappmeyer, Ignacio Vallines, Nava Rubin and David J. Heeger. 2008. “Neurocinematics: The Neuroscience of Film.” Projections 2.1: 1–26.

Heimann, Katrin, Maria Alessandra Umiltà, Michele Guerra and Vittorio Gallese. 2014. “Moving Mirrors: A High-density EEG Study Investigating the Effect of Camera Movements on Motor Cortex Activation during Action Observation.” Journal of Cognitive Neuroscience 26.9: 2087–2101.

Kozloff, Sarah. 2000. Overhearing Film Dialogue. Berkeley: University of California Press.

Land, Michael, Neil Mennie and J. Rusted. 1999. “The Roles of Vision and Eye Movements in the Control of Activities of Daily Living.” Perception 28.11: 1311–1328.

Lang, Annie. 2000. “The Limited Capacity Model of Mediated Message Processing.” Journal of Communication 50.1: 46–70.

Lang, Annie, Shuhua Zhou, Nancy Schwartz, Paul D. Bolls and Robert F. Potter. 2000. “The Effects of Edits on Arousal, Attention, and Memory for Television Messages: When an Edit is an Edit Can an Edit be too Much?” Journal of Broadcasting & Electronic Media 44.1: 94–109.

Lang, Annie, Ya Gao, Robert F. Potter, Seungjo Lee, Byungho Park and Rachel L. Bailey 2014. “Conceptualizing Audio Message Complexity as Available Processing Resources.” Communication Research, published online before print. Accessed September 28, 2014, doi: 10.1177/0093650213490722

Marchant, Paul, David Raybould, Tony Renshaw and Richard Stevens. 2009. “Are you seeing what I’m seeing? An Eye-tracking Evaluation of Dynamic Scenes.” Digital Creativity 20.3: 153–163.

McGurk, Harry and John MacDonald. 1976. “Hearing Lips and Seeing Voices.” Nature 264.5588: 746–8. doi:10.1038/264746a0.

Mital, Parag, Tim J. Smith, Robin Hill and Jim Henderson. 2011. “Clustering of Gaze During Dynamic Scene Viewing is Predicted by Motion.” Cognitive Computation 3: 5–24.

Plantinga, Carl. 2009. Moving Viewers: American Film and the Spectator’s Experience. Berkeley: University of California Press.

Psycho. Directed by Alfred Hitchcock, 1960. Shamley Productions, DVD.

Rear Window. Directed by Alfred Hitchcock, 1954. Paramount, DVD.

Redmond, Sean, Sarah Pink, Jane Stadler, Jenny Robinson, Andrea Rassell and Darrin Verhagen. 2015 (forthcoming). “Seeing, Sensing Sound: Eye Tracking Soundscapes in Saving Private Ryan and Monsters Inc.” In Making Sense of Cinema: Empirical Studies into Film Spectators and Spectatorship, edited by CarrieLynn D. Reinhard and Christopher J. Olson. New York: Bloomsbury.

Remael, Aline. 2003. “Mainstream Narrative Film Dialogue and Subtitling.” The Translator 9.2: 225–247.

Saving Private Ryan. Directed by Steven Spielberg. 1998. Dreamworks/Paramount. DVD.

Shimamura, Arthur, ed. 2013. Psychocinematics: Exploring Cognition at the Movies. New York: Oxford University Press.

Sita, Jodi. 2014. Personal Communication. 19 June 2014. Australian Catholic University: Victoria, Australia.

Smith, Tim J. 2014. “Audiovisual Correspondences in Sergei Eisenstein’s Alexander Nevsky: A Case Study in Viewer Attention.” In Cognitive Media Theory (AFI Film Reader), edited by Paul Taberham and Ted Nannicelli. 85–105. New York: Routledge.

Smith, Tim J. 2013. “Watching You Watch Movies: Using Eye Tracking to Inform Cognitive Film Theory.” In Psychocinematics: Exploring Cognition at the Movies, edited by Arthur P. Shimamura. 165–191. New York: Oxford University Press.

Sobchack, Vivian. 2005. “When the Ear Dreams: Dolby Digital and the Imagination of Sound.” Film Quarterly 58.4: 2–15.

Song, Guanghan, Denis Pellerin and Lionel Granjon. 2011. “Sound Effect on Visual Gaze When Looking at Videos.” In 19th European Signal Processing Conference. 2034–2038. Barcelona: EUSIPCO 2011.

Tatler, Benjamin. 2014. “Eye Movements from Laboratory to Life.” In Current Trends in Eye Tracking Research, edited by Mike Horsley, Matt Eliot, Bruce Allen Knight and Ronan Reilly. 17–35. London: Springer.

Võ, Melissa, Tim J. Smith, Parag Mital and John Henderson. 2012. “Do the Eyes Really Have it? Dynamic Allocation of Attention when Viewing Moving Faces.” Journal of Vision 12.13 (3): 1–14. http://www.journalofvision.org/content/12/13/3.full

Yip, Man-Fung. 2014. “In the Realm of the Senses: Sensory Realism, Speed, and Hong Kong Martial Arts Cinema.” Cinema Journal 53.4: 76–97.

 

List of figures

Figure 1: Nine Areas of Interest (AOI), Saving Private Ryan: Still from Saving Private Ryan (Steven Spielberg, 1998), data illustrated by the authors

Figure 2a: Gaze plots of four subjects for “Indistinct Dialogue” (Sequence 1): Still from Saving Private Ryan (Steven Spielberg, 1998), data illustrated by the authors

Figure 2b: Gaze plots of four subjects for “Indistinct Dialogue” (Sequence 1): Still from Saving Private Ryan (Steven Spielberg, 1998), data illustrated by the authors

Figure 3: Areas of Interest for subject 2 in “Wounded Man” (Sequence 2): Still from Saving Private Ryan (Steven Spielberg, 1998), data illustrated by the authors

Figure 4: Total fixation duration in nine AOIs for “Wounded Man” (all subjects): Graphs produced by the authors

Figure 5: Areas of Interest for subject 4 in “Sand Dunes: In Command” (Sequence 3): Still from Saving Private Ryan (Steven Spielberg, 1998), data illustrated by the authors

Figure 6: Total fixation duration for nine AOIs in “Sand Dunes: In Command” (all subjects): Graphs produced by the authors

 

Notes

[i] The two preliminary sound-based eye tracking studies preceding Coutrot et al.’s 2012 publication are a conference presentation by Coutrot et al. (2011) and a conference paper by Song, Pellerin, and Granjon (2011). However, in 2012 Melissa Võ and colleagues also published a study that investigated the effects on attention to faces in videos when the auditory speech track was removed. This study found that when speech was not present, observers’ gaze allocation changed: they looked more at the scene background and decreased fixations to faces generally and especially decreased concentration on the mouth region (Võ et al. 2012, 12).

[ii] A study of everyday attention indicates that people exhibit visual search behaviours that anticipate, locate, and monitor action, which is evidence of top down influences on visual perception (see Land et al. 1999).

[iii] Tim Smith states that “The degree of attentional synchrony observed for a particular movie frame will vary depending on whether it is from a Hollywood feature film or from unedited real-world footage, the time since a cut and compositional details such as focus or lighting but attentional synchrony will always be greater in moving images than static images” (2014, 90).

[iv] The lip-reading phenomenon is called the “McGurk effect” (see McGurk and MacDonald 1976).

[v] For further discussion of central areas of interest in Saving Private Ryan, see Redmond et al. (2015).

[vi] Established formulae for dispersion and other measures of individual variation in gaze pattern exist (e.g., Coutrot et al. 2012). As an exploratory study, we were limited by both the number of subjects and post hoc data analysis. This distribution estimate was a sufficient way to capture dominant and non-dominant viewing. However, we would recommend that future research develop a better variance measure of asynchronous viewing, such as the Kullback-Leibler divergence formula referred to above.

[vii] Note that similar results were obtained in a related study of a sequence earlier in the beach-landing scene that depicts Captain Miller’s experience of shellshock (Redmond et al. forthcoming 2015).

[viii] A neuroimaging study comparing responses to film clips ranging from a sequence directed by Alfred Hitchcock to a segment of actuality footage shot in Washington Square Park found that higher levels of aesthetic control generate greater viewer synchrony or inter-subject correlation in the audience’s viewing patterns and brain activity (Hasson et al. 2008, 15).

 

Bios

Dr Jennifer Robinson is Lecturer in Public Relations, School of Media and Communication at RMIT University. She authors industry reports and has published in the Journal of Advertising, BMC Public Health, the Journal of Interactive Marketing and the Journal of Public Relations Research. Her media effects research investigates new media and media audiences using neuro-measures.

Jane Stadler is Associate Professor of Film and Media Studies, School of Communication and Arts at the University of Queensland. She is author of Pulling Focus: Intersubjective Experience, Narrative Film and Ethics, and co-author of Screen Media and Media and Society.

Andrea Rassell is a PhD student and Research Assistant in the School of Media and Communication at RMIT University. She has a professional background in both science and film and researches at the nexus of the two disciplines.

How We Came To Eye Tracking Animation: A Cross-Disciplinary Approach to Researching the Moving Image – Craig Batty, Claire Perkins, & Jodi Sita

Abstract

In this article, three researchers from a large cross-disciplinary team reflect on their individual experiences of a pilot study in the field of eye tracking and the moving image. The study – now concluded – employed a montage sequence from the Pixar film Up (2009) to determine the impact of narrative cues on gaze behaviour. In the study, the researchers’ interest in narrative was underpinned by a broader concern with the interaction of top-down (cognitive) and bottom-up (salient) factors in directing viewers’ eye movements. This article provides three distinct but interconnected reflections on what the aims, process and results of the pilot study demonstrate about how eye tracking the moving image can expand methods and knowledge across the three disciplines of screenwriting, screen theory and eye tracking. It is in this way both an article about eye tracking, animation and narrative, and also a broader consideration of cross-disciplinary research methodologies.

 

Introduction

Over the past 18 months, a cross-disciplinary team of researchers has undertaken a pilot study in eye tracking and the moving image, seeking to understand where spectators look when viewing animation.[i] The study employed eye tracking methods to record the gaze of 12 subjects. It used a Tobii X120 (Tobii Technology, 2005) remote eye tracking device, which allowed viewers to watch the animation sequence on a widescreen PC monitor at 25 frames per second, with sound. The eye tracker pairs the movements of the eye over the screen with the stimuli being viewed by the participant. For each scene viewed, the researchers selected areas of interest (AOIs); for these areas, all of the gaze data, including the number and duration of each fixation, was collected and analysed.
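As a rough indication of what that per-AOI analysis involves, the sketch below aggregates fixation counts and total fixation duration for each area of interest from an exported table of fixations. The file name and column names are assumptions for illustration, not the actual Tobii export format used in the study.

import csv
from collections import defaultdict

counts = defaultdict(int)       # fixations per AOI
durations = defaultdict(float)  # total fixation time per AOI (ms)

# Hypothetical export: one row per fixation, tagged with the AOI it fell inside.
with open("fixations.csv", newline="") as f:
    for row in csv.DictReader(f):
        aoi = row["aoi"]
        counts[aoi] += 1
        durations[aoi] += float(row["duration_ms"])

for aoi in sorted(durations, key=durations.get, reverse=True):
    print(f"{aoi}: {counts[aoi]} fixations, {durations[aoi]:.0f} ms total")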

Using a well-known montage sequence from the Pixar film Up! (2009), this pilot study focussed on narrative, with the aim of discerning whether story cues were instrumental in directing spectator gaze. Focussing on narrative seemed useful because, as well as being an original line of enquiry in the eye tracking context, it offered a natural connection between each of our disciplines and research experiences. The study did not take into account emotional and physiological responses from its participants as a way of discerning their narrative comprehension. Nevertheless, what we found from our data was that characters (especially their faces), key (narrative) objects and visual/scenic repetition seemed to be the core factors in determining where viewers looked.[ii]

In the context of a montage sequence that spans around 60 years of story time, in which the death of the protagonist’s wife sets up the physical and emotional stakes of the rest of the film, it was clear that narrative meaning relating to a character’s journey/arc is important to viewers, more so (in this study) than peripheral action or visual style, for example. With regard to animation specifically, a form ‘particularly equipped to play out narratives that solicit […] emotions because of its capacity to illustrate and enhance interior states, and to express feeling that is beyond the realms of words to properly capture’ (Wells, 2007: 127), the highly controlled nature of the sequence from which the data was drawn suggests that animation fully embraces narrative techniques to control viewer attention.

In this article, three researchers from the team – A, a screenwriter, B, a screen scholar and C, an eye tracking neuroscientist – discuss the approaches they took to conducting this study. Each of us came to the project armed with different expertise, different priorities and a different set of expectations for what we might find out, which we could then take back to our individual disciplines. In this article, then, we purposely use three voices as a way of teasing out our understandings before, during and after the study, with the aim of better understanding the potential for cross-disciplinary research in this area. Although other studies in eye tracking and the moving image have been undertaken and reported on, we suggest that using animation with a strongly directed narrative as a test study provides new information. Furthermore, few other studies to date have brought together traditional and creative practice researchers in this way.

What we present, then, is a series of interconnected discussions that draw together ideas from each researcher’s community of thought and practice, guided by the overriding question: how did this study embrace methodological originality and yield innovative findings that might be important to the disciplines of eye tracking and moving image studies? We present these discussions in the format of individual reflections, as a way of highlighting each researcher’s contributions to the study, and in the hope that others will see the potential of disciplinary knowledge in a study such as this one.

How ‘looking’ features in our disciplines, and what we might expect to ‘see’

Researcher A: ‘Looking’ in screenwriting means two things: seeing and reflecting on. By this I mean that a viewer looks at the screen to see what is happening, whilst at the same time reflecting on what they are looking at on a personal, cultural and/or political level. Some screenwriters focus on theme from the outset: on what they want their work to ‘say’ (see Batty, 2013); some screenwriters focus on plot: on what viewers will see (action) (see Vogler, 2007). What connects these is character. In Aristotelian terms, a character does and therefore is (Aristotle, 1996); for Egri, a character is and therefore does (Egri, 2004). The link here is that what we see on the screen (action) is always performed by a character, meaning that through a process of agency, actions are given meaning, feeding into the controlling theme(s) of the text. In this way, looking at – or seeing – is tied closely to understanding and the feelings that we bring to a text. As Hockley (2007) says, viewers are sutured into the text on an emotional level, connecting them and the text through the psychology of story space.

What we ‘see’, then, is meaning. In other words, we do not just see but we also feel. We look for visual cues that help us to understand the narrative unfolding before our eyes. With sound used to point to particular visual aspects and heighten our emotional states, we invest energy and emotion in the visuality of the screen, in the hope that we will arrive at an understanding. As this study has revealed, examples include symbolic objects in the frame (the adventure book; the savings jar; the picture of Paradise Falls) that have narrative value in screenwriting because of the meaning they possess (Batty and Waldeback, 2008: 52-3). By seeing these objects repeated throughout the montage, we understand what they mean (to the characters and to the story) and glean a sense of how they will re-appear throughout the rest of the film as a way of representing the emotional space of the story.

Landscape is also something we see, though this is always in the context of the story world (see Harper and Rayner, 2010; Stadler, 2010). In other words, where is this place? What happens here? What cannot happen here? Characters belong to a story world, and therefore landscape also helps us to understand the situations in which we find them. This, again, draws us back to action, agency and theme: when we see landscape, we are in fact understanding why the screenwriter chose to put their characters – and us, the audience – there in the first place.

Researcher B: In screen theory, looking is never just looking – never innocent and immediate. The act of looking is the gateway to the experience and knowledge of what is seen on screen, but also of how that encounter reflects the world beyond the screen and our place within it. Looking is overdetermined as gazing, knowing and being, endlessly charged by the coincidence of eye and I and of real and reel. Psychoanalytic theory imagines the screen as a mirror and our identity as a spectatorial effect of recognizing ourselves in the characters and situations that unfold upon it, however refracted. Reception studies seeks out how, conversely, real individuals encounter content on screen, and how meaning sparks in that meeting—invented anew with every pair of eyes. Television studies emerges from an understanding of a fundamental schism in looking: where the cinematic apparatus enables a gaze, the televisual counterpart can (traditionally) only produce a broken and distracted glance.

All of these theories begin with the act of looking, and are enabled by it in their metaphors, methods and practices. But in no instance is looking attended to as anatomical vision – the process of the “meat and bones” body and brain rather than the metaphysical consciousness. As a scholar of screen theory, my base interest in eye tracking comes down to this “problem”. Is it a problem? Should the biology and theory of looking align? What effects and contradictions arise when they are brought together?

Phenomenological screen theory is a key and complex pathway into this debate, as an approach that values embodied experience, but discredits the ocular—seeking to bring the whole body to spectatorship rather than privilege the centred and distant subject of optical visuality (Marks, 2002: xvi). Vivian Sobchack names film ‘an expression of experience by experience … an act of seeing that makes itself seen, an act of hearing that makes itself heard’ (Sobchack, 1992: 3). Eye tracking shows us the act of seeing – the raw fixations and movements with which screen content is taken in. In the study under discussion here it is this data that is of central interest, with our key questions deriving from what such material can verify about how narrative shapes gaze behaviour. A central question and challenge for me moving forward in this field, though, is to consider this process without ceding to ocularcentrism: that is, without automatically equating seeing to knowing. This ultimately means being cautious about reading gaze behaviour as ‘proof’ of what viewing subjects are thinking, feeling and understanding. This approach will be supported by the inclusion of further physiological measurements.

Researcher C: Interest in vision and how we see the world is age-old; it has been commonly held that the eyes are the windows to the mind. Where we look is then of great importance, as learning this offers us opportunities to understand more about where the brain wants to spend its time. Human eyes move independently from our heads, and so our eyes have developed a specialised operating system that both allows our eyes to move around our visual environment and counteracts any movements the head may be making. This has led to a distinct set of eye movements we can study: saccades (the very fast bursts of movement that pivot our eye from focus point to focus point) and fixations (brief moments of relative stillness where our gaze stops to allow the receptors in our eye to collect visual information). In addition, only a tiny area at the back of our eyeball, the fovea on the retina, is sensitive enough to gather high-acuity information, thus the brain must drive the eye around precisely in order to get light to fall onto this tiny area of the eye. As such, our eye movements are an integral and essential part of our vision system.
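In practice, fixations and saccades are recovered from the raw stream of gaze samples by an event-detection algorithm. The sketch below is a simplified dispersion-threshold approach (in the spirit of I-DT): gaze points that stay within a small spatial window for long enough are grouped into a fixation. The thresholds and data format are assumptions for illustration, not those of the eye tracking software used in the study.

def detect_fixations(samples, max_dispersion=30.0, min_duration=0.1):
    """samples: list of (t, x, y) gaze points, t in seconds, x/y in pixels.
    Returns a list of (start, end, centre_x, centre_y) fixations."""
    fixations = []
    i, n = 0, len(samples)
    while i < n:
        window = [samples[i]]
        j = i
        # Grow the window while its spatial spread stays under the threshold.
        while j + 1 < n:
            candidate = window + [samples[j + 1]]
            xs = [p[1] for p in candidate]
            ys = [p[2] for p in candidate]
            if (max(xs) - min(xs)) + (max(ys) - min(ys)) > max_dispersion:
                break
            window = candidate
            j += 1
        duration = window[-1][0] - window[0][0]
        if duration >= min_duration:
            cx = sum(p[1] for p in window) / len(window)
            cy = sum(p[2] for p in window) / len(window)
            fixations.append((window[0][0], window[-1][0], cx, cy))
            i = j + 1
        else:
            i += 1  # this sample belongs to a saccade; move on
    return fixations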

Eye movement research has seen great advances during the last 50 years, with many early questions examined in the classic work of Buswell (1935) and Yarbus (1967). One question visual scientists and neuroscientists have been, and still are, keen to explore is why we look where we do: what is it about the objects or scene that draws our visual attention? Research over the decades has found that several different aspects are involved, relating to object salience, recognition, movement and contextual value (see Schütz et al., 2011). For animations that are used for learning purposes, Schnotz and Lowe (2008) discussed two major contributing factors that influence the attention-grabbing properties of the features that make up this form. One is visuospatial contrast and the second is dynamic contrast: features that are relatively large, brightly coloured or centrally placed are more likely to be fixated on than their less distinctive neighbours, and features that move or change over time draw more attention.
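These two forms of contrast can be approximated computationally. The sketch below is offered only as an illustration, not as the measure used by Schnotz and Lowe or by this study: it treats visuospatial contrast as local luminance variation within a greyscale frame, and dynamic contrast as the luminance change between consecutive frames.

import numpy as np
from scipy.ndimage import uniform_filter

def visuospatial_contrast(frame, size=17):
    """Local luminance contrast: standard deviation of a greyscale frame
    within a sliding window of the given size (pixels)."""
    f = frame.astype(float)
    mean = uniform_filter(f, size)
    mean_sq = uniform_filter(f ** 2, size)
    return np.sqrt(np.maximum(mean_sq - mean ** 2, 0.0))

def dynamic_contrast(frame, previous_frame):
    """Temporal change: absolute luminance difference between two frames."""
    return np.abs(frame.astype(float) - previous_frame.astype(float))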

Eye tracking research, which is now easier than ever to conduct, allows us to delve into examining how these and other features influence us, and is a unique way to gain access to the windows of the mind. Directing this focus to learning more about how we watch films, and animation in particular, is what drove me to want to use eye tracking to better see how people experience these forms, and to delve into questions such as: what are people drawn to look at, and how might things like the narrative affect the way we direct our gaze?

When looking around a visual world, our view is often full of different objects and we tend to drive our gaze to them so we can recognize, inspect or use them. Not so surprisingly, what we are doing (our task at hand) strongly affects how we direct our gaze: as we perform a task, our salience-based mechanisms seem to go offline and we almost exclusively fixate on the task-relevant objects (Hayhoe, 2000; Land et al., 1999). From this, one expectation we had when considering how viewers watch animation is that aspects relating to the narrative components of the viewer’s understanding of the story would be a stronger drive than salient features alone. Another well-known drawcard for visual attention is faces, which tend to draw the eye very strongly (Cerf et al., 2009; Crouzet et al., 2010). For animated films we were interested to see if similar effects would be observed.

Finally, another strong and interesting effect that has been discussed is the central viewing bias: people tend to fixate in the centre of a display (Tatler and Vincent, 2009). As this study was based on moving images viewed on a screen, we were keen to compare different scenes and see how the narrative affected this tendency.
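Central bias is often quantified very simply as the distance of each fixation from the screen centre. The sketch below is a minimal illustration that assumes a 1920 x 1080 display, rather than reporting the measure used in this study: it returns a value near 0 when all fixations cluster at the centre and near 1 when they sit in the corners.

import math

def central_bias(fixations, width=1920, height=1080):
    """Mean normalised distance of fixation points (x, y) from the screen centre."""
    cx, cy = width / 2, height / 2
    max_dist = math.hypot(cx, cy)  # centre-to-corner distance
    dists = [math.hypot(x - cx, y - cy) / max_dist for x, y in fixations]
    return sum(dists) / len(dists)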

How we came to the project, and what we thought it might reveal

Researcher A: From a screenwriting perspective, I was excited to think that at last, we might have data that not only privileges the story (i.e., the screenwriter’s input), but that also highlights the minutiae of a scene that the screenwriter is likely to have influenced. This can work differently in animation than in live action, where a team of story designers and animators actively shape the narrative as the ‘script’ emerges (see Wells, 2010). Nevertheless, if we accept that what we see on screen has been imagined or at least intended by a ‘writer’ of sorts – someone who knows about the composition of screen narratives – then it was rousing to think that this study might provide ‘evidence’ to support long-standing questions (for myself at least) of writing for the screen and authorship. Screenwriters work in layers, building a screenplay from broad aspects such as plot, character and theme, to micro aspects such as scene rhythm, dialogue and visual cues. Being able to ‘prove’ what viewers are looking at, and hoping that this might correlate with a screenwriting perspective of scene composition, was very appealing to me.

I was also interested in what other aspects of the screen viewers might look at, either as glances or as gazes. In some genres of screenwriting, such as comedy, much of the clever work comes around the edges: background characters; ironic landscapes; peripheral visual gags, etc. From a screenwriting perspective, then, it was exciting to think that we might find ways to trace who looks at what, and if indeed the texture of a screenplay is acknowledged by the viewer. The study would be limited and not all aspects could be explored, but as a general method for screen analysis, simply having ideas about what might be revealed led to some very interesting discussions within the team.

Researcher B: All screen theories rest upon a fundamental assumption that different types of content, and different viewing situations, produce different viewing behaviours and effects. Laura Mulvey’s famous theory of the gaze stipulates that classical Hollywood cinema and the traditional exhibition environment (dark cinema, large screen, audience silence) position men as bearers of the look and women as objects of the look, and that avant-garde cinemas avoid this configuration (Mulvey, 1975). New theories of digital cinema speculate upon whether a spectator’s identification with an image is altered when it bears no indexical connection to reality; that is, when the image is a simulated collection of pixels rather than the trace of an event that once took place before a camera (Rodowick, 2007). The phenomenological film theory of Laura Marks suggests that certain kinds of video and multimedia work can engender haptic visuality, where the eyes function like ‘organs of touch’ and the viewer’s body is more obviously involved in the process of seeing than is the case with optical visuality (Marks, 2002: 2-3). It made sense to begin our study into eye tracking by thinking about these different assumptions regarding content and context and formulating methods to analyse them empirically.

For our first project we chose to focus on an assumption regarding spectatorship that is more straightforward and essential than any listed above: namely that viewers can follow a story told only in images. This is an assumption that underpins the ubiquitous presence of the montage sequence in narrative filmmaking, where a large amount of story information is presented in a short, dialogue-free sequence. We hypothesized that by tracking a montage sequence we would be able to ascertain if and how viewers looked at narrative cues, even when these are not the most salient (i.e., large, colourful, moving) features in the scene. The study was in this way designed to start investigating how much film directors and designers can control subjects’ gaze behaviour and top-down (cognitively driven) processes.

The sequence from Up! was chosen in part to act as a ‘control’ against which we could later assess different types of content. The story told in the 4-minute sequence is complex but unambiguous, with its events and emotive power linked by clear relationships of cause and effect. It is in this way a prime example of a classical narrative style of filmmaking, where the emphasis is on communicating story information as transparently as possible (Bordwell, 1985: 160). Our hypothesis was that subjects’ gaze behaviour would be controlled by the tightly directed sequence with its strong narrative cues, and that this study could thereby function as a benchmark against which different types of less story-driven material could be compared later.

Researcher C: A colleague and I set up the Eye Tracking and the Moving Image (ETMI) research group in 2012, following discussions around how evidence was collected to support and investigate current film theory. These conversations grew into a determination to begin a cross-disciplinary research group, initially in Melbourne, to begin working together on these ideas. I had previously been involved in research using eye tracking to study other dynamic stimuli such as decision making processes in sport and the dynamics of signature forgery and detection, and my experience led to a belief that the eye tracker could have enormous potential as a research tool in the analysis and understanding of the moving image. Work on this particular study was inspired by the early aims of a subgroup (of which the other authors are a part), whose members were interested to investigate, in a more objective manner, the effect that narrative cues had on viewer gaze behaviour.

Existing research in our disciplines, and how that influenced our approaches to the study

Researcher A: While there had been research already conducted on eye tracking and the moving image, none of it had focussed on the creational aspects of screen texts: what goes into making a moving image text, before it becomes a finished product to be analysed. Much like screen scholarship, which tends to study texts in a ‘post event’ way, what was lacking – usefully for us – was input from those who are practitioners themselves. The wider Melbourne-based Eye Tracking and the Moving Image research group within which this study sits has a membership that includes other practitioners, including a sound designer and a filmmaker. Combined, this suggested that our approach might offer something different; that it might ‘do more’ and hopefully speak to the industry as well as other researchers. As a screenwriter, the opportunity to co-research with scholars, scientists and other creative practitioners was therefore not only appealing, but also methodologically important.

As already highlighted, it was both an academic and a practical interest in the intersection of plot, character and theme that underpinned my approach. As Smith (1995) has argued, valuing character in screen studies has not always been possible; moving this forward, valuing character, and in particular the character’s journey, has recently become more salient (see Batty, 2011; Marks, 2009), adding weight to a creative practice approach to screen scholarship. In this way, understanding the viewer’s experience of the screen seemed to lend itself well to some of the core concerns of the screenwriter; or to put it another way, it had the ability to test what we ‘know’ about creative practice, and the role of the practitioner. Feeding, then, into wider debates about the place of screenwriting in the academy (see Baker, 2013; Price, 2013; 2010), it was important to value the work of the screenwriter in a rigorous – and hopefully innovative – scholarly way.

Researcher B: The majority of research on eye tracking and the moving image to date has been designed and undertaken as an extension to cognitive theories of film comprehension. Deriving from the constructivist school of cognitive psychology, and led by film theorist David Bordwell, this approach argues that viewers do not simply absorb but construct the meaning of a film from the data that is presented on screen. This data does not constitute a complete narrative but a series of cues that viewers process by generating inferences and hypotheses (Elsaesser and Buckland, 2002: 170). Bordwell’s approach explicitly opposes psychoanalytic film theory by attending to perceptual and cognitive aspects of film viewing rather than unconscious processes. Psychologist Tim Smith has mobilized eye tracking in connection with Bordwell’s work to demonstrate how this empirical method can “prove” cognitive theories of comprehension—showing that subjects’ eyes do fixate on those cues in a film’s mise-en-scène that the director has controlled through strategies of staging and movement (Smith, 2011; 2013).

The Up study was designed to follow in the wake of Smith’s work, with a particular interest in examining the premise of Bordwell’s theory – which is that narration is the central process that influences the way spectators understand a narrative film (Elsaesser and Buckland, 2002: 170). With this in mind, we deliberately chose a segment from an animated film, where the tightly directed narrative of the montage sequence is competing with a variety of other stimuli that subjects’ eyes could plausibly be attracted to: salient colourful and visibly designed details in the background and landscape of each shot.

We were also interested in this montage sequence for the highly affecting nature of its mini storyline, which establishes the protagonist Carl’s deep love for his wife Ellie as the motivation for his journey in Up! itself. The sequence carries a great deal of emotive power by contrasting the couple’s happiness in their long marriage with Carl’s ultimate sadness and regret at not being able to fulfill their life-long dream of moving to South America before Ellie falls sick and dies. Would it be possible to ‘see’ this emotional impact in viewers’ gaze behaviour?

How we reacted to the initial data, and what it was telling us

Researcher A: When looking at data for the first time, I certainly saw a correlation between what we know about screenwriting and seeing, and what we could now turn to as evidence. For example, key objects such as the adventure book, the savings jar (see Fig. 1) and the picture of Paradise Falls – all of which recurred throughout the montage sequence – were looked at by viewers intensely, suggesting that narrative meaning was ‘achieved’.
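For readers unfamiliar with how images such as Fig. 1 are generated: a collective heat map is typically built by accumulating fixations from all viewers, weighted by duration, and smoothing the result into a continuous density. The sketch below illustrates that idea only; the screen dimensions, data format and smoothing width are assumptions, not the settings of the eye tracking software used in the study.

import numpy as np
from scipy.ndimage import gaussian_filter

def heat_map(fixations, screen=(1920, 1080), sigma=40):
    """fixations: iterable of (x, y, duration_ms) pooled across viewers.
    Returns a 2D array in which brighter values mark longer collective viewing."""
    heat = np.zeros((screen[1], screen[0]))    # rows = y, columns = x
    for x, y, dur in fixations:
        if 0 <= int(x) < screen[0] and 0 <= int(y) < screen[1]:
            heat[int(y), int(x)] += dur        # weight each fixation by its duration
    return gaussian_filter(heat, sigma=sigma)  # spread points into smooth hotspots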

Fig. 1. A heat map showing the collective intensity of viewers’ responses to the savings jar.

As another example, when characters were purposely (from a screenwriting perspective) separated within the frame of the action, viewers oscillated between the two, eventually settling on the one they believed to possess the most narrative meaning (see Fig. 2). This further implied the importance of the character journey and its associated sense of theme, which for screenwriting verifies the careful work that has gone into a screenplay to set up narrative expectations.

Fig. 2. A gaze plot showing the fixations and saccades of one viewer in a scene with the prominent faces of Carl and Ellie.

Researcher B: We chose to analyse the data on Up! by examining how viewer attention fluctuated between Carl and Ellie across the course of the montage sequence. The two are equal agents in the narrative at the beginning, but the montage’s story unfolds through the action and behaviour of each as it continues – that is, each character carries the story at different points. Overwhelmingly, the data supported this narrative pattern by showing that the majority of viewers fixated on the character who, moment by moment, functions as the agent of the story, even when that figure is not the most salient aspect of the image. Aligning with Bordwell’s cognitive theory of comprehension, this data confirms that viewers do rely principally on narrative cues to understand a film. As a top-down process of cognition, narrative exerts control over viewer attention to keep focus on the story rather than let the gaze wander to other bottom-up (salient) details in the mise-en-scène. It is this process that allowed Smith to show that viewers overwhelmingly will not notice glaring continuity errors on screen (Smith, 2005). As in the famous ‘Gorillas in our Midst’ experiment (Simons and Chabris, 1999), viewer attention is focused so closely on employing narrative schemata to link events spatially, temporally and causally that salient stimuli on screen appear to be completely missed.
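The comparison described here can be made concrete by computing, shot by shot, the share of total fixation time that falls within each character’s area of interest. The sketch below assumes fixations have already been tagged with a shot number and an AOI label (a labelling step not shown); it illustrates the form of the comparison rather than the exact analysis performed in the study.

from collections import defaultdict

def aoi_share_by_shot(fixations):
    """fixations: iterable of (shot_id, aoi_label, duration_ms).
    Returns {shot_id: {aoi_label: proportion of that shot's fixation time}}."""
    totals = defaultdict(float)
    per_aoi = defaultdict(lambda: defaultdict(float))
    for shot, aoi, dur in fixations:
        totals[shot] += dur
        per_aoi[shot][aoi] += dur
    return {shot: {aoi: d / totals[shot] for aoi, d in aois.items()}
            for shot, aois in per_aoi.items()}

# Hypothetical usage: tracking the proportions for 'Carl' and 'Ellie' AOIs per shot
# shows which character viewers followed as the agency of the story shifted.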

Researcher C: Initially I was quite interested to see the attention paid to faces, and in particular, characters’ eyes and mouths. This being animation, I had been keen to see whether the same elements of faces would draw viewers’ eyes in the way they do when we look at human faces, where eyes and mouths are most viewed (Crouzet et al., 2010). Here, even though the characters were not engaging in dialogue, their mouths as well as their eyes were still searched. Looking at eyes has been linked to looking for contextual emotional information (Guastella et al., 2008), and so with this montage sequence being non-verbal, it was not surprising to see much of the focus on characters’ eyes as viewers attempted to read the emotion through them (see Fig. 3).

Fig. 3. Two viewers’ gaze plots depicting the sequence of fixations made between Carl and Ellie.

Other areas I was interested to observe were instances when other well-known features drew strong viewer attention, such as written text and bright (salient) objects. Two particular scenes we examined contained examples of these. In one scene, in which the savings jar sits at the back of a dark bookshelf, viewers were drawn both to the bright candle in the foreground and to the savings jar. The jar was in the dark; however, with narrative cues drawing attention to it, as well as the fact that it contained text, viewers were drawn to look at it (see Fig. 1). Surprisingly, other interesting objects in this scene are easily discernible – a colourful wooden bird figure; a guitar; a compass – yet it was the savings jar and the bright candles that were viewed. The contextual information, the text and the salience appear to be working together here to drive the eye, all within a few seconds.

Fig. 4. Gaze plots of fixations made by all viewers over the scene in which Carl purchases airline tickets.

The second scene in which text worked as a cue for the eye was the travel shop scene (Fig. 4). Here, viewers were drawn to look at two text-based posters placed on the back wall of the shop. Again, this scene was only shown momentarily, yet glances towards the text and images, as well as the exchange between the characters, give viewers the elements of the story they need in order to know what is going on and where the story will go next (Carl’s surprise for Ellie).

How over time we better understood the data, and what more we began to know

Researcher A: I was interested to see that some viewers spent time looking at the periphery. The Up! montage sequence did not necessarily offer ‘alternative’ layers in the margins of the screen, though given its created and controlled animated nature, it perhaps should not be a surprise that away from the centre of the screen there were visual delights, such as the sun setting over the city and a blanket of clouds that changed shape, from clouds to animals to babies. This suggested to me that in animation, because viewers know that images have been created from scratch, there is an expectation that the screen will offer a plethora of experiences, from narrative agency to visual amplification. This, in turn, suggested that in further studies, it might be useful to contrast texts that use the potential of the full screen to engage viewers with those that go in close and privilege the centre. Genre would most likely play a key role in this future endeavour.

Researcher B: As hoped, this pilot study has been instructive as a base from which we can now expand. It has raised many questions. One issue is that this data cannot ‘prove’ subjects were not seeing those elements on-screen that were not fixated upon – were they perhaps seeing them peripherally? This could only be confirmed by conducting interviews after the eye tracking takes place, and could instructively inform an understanding of how story information that is layered in the mise-en-scène (for instance in setting, lighting and costume) contributes to overall narrative comprehension. We are also very interested to determine how the context of viewing affects gaze behaviour. For instance, would subjects still fixate overwhelmingly on narrative cues when watching this sequence in a cinema environment on a large – even an IMAX – screen? In this environment the image on screen is larger and the texture more palpable. Would viewers here perhaps be more focused on these salient pleasures of the image and engage in a different, less cognitive experience of the film; letting their eyes roam across the grain of the shot in its colours, shapes and surfaces? Would results alter between an animated and live action film? Psychoanalytic film theory tells us that the cinematic apparatus promotes identification with characters and, by extension, the ideologies of the social system from which they are produced (Mulvey, 1975). Eye tracking can potentially intervene in this powerful theory of spectatorship by showing if and how viewers do fixate on the cues that give rise to this interpellation.

Researcher C: After looking at some of the early scene analyses, I was somewhat surprised by how many eye movements could be made in fleetingly fast scenes, and at how many items in these scenes one could fixate on, if only briefly. I had expected viewers to be taking in some of the surrounding items in a scene using their peripheral vision, and to see more of the centralisation bias (Tatler and Vincent, 2009). Yet for some scenes, in particular for the two scenes in which Carl purchases the surprise airline tickets (see Figs 4 and 5), we see how viewers were drawn to search for narrative clues by looking around the scene.

Fig. 5. Gaze plot showing the fixations made by all viewers as they briefly see the contents of the picnic basket.

In the first scene (see Fig. 4), Carl is seen in a shop, facing the shop assistant. Viewers had previously seen him in the midst of coming up with a bright idea. This scene thus gives the viewer a chance to work out what his idea was. What can be seen is that most viewers scanned the surrounds for clues. A similar pattern is seen in the next scene, in which we quickly glance at the contents of a picnic basket being carried by Carl (see Fig. 5). The basket, seen in close up, contains picnic items and the surprise airline ticket, and even though some glances went to other basket items, it was the ticket that captured most of the attention: the item that held the most narrative information. This item was also the most salient, being the clearest and brightest item in the basket, and, importantly, the only item to contain written text. In a very short glimpse of a scene, these features almost ensured that viewers’ eyes were directed to look at and acknowledge the ticket.

What excites us about the future of work in this area, and where we think it might take our own disciplines

Researcher A: If we are to fully embrace the creative practice potential of studies such as this, then we might look to creating new texts that can then be studied. If, in 1971, Norton and Stark created simple drawings to test how their subjects recognised and learned patterns, then over 40 years later, our approach might be to develop a short moving image narrative through which we can test our viewers’ gaze. For example, if we were to develop a short film and play it out of sequence (i.e., narrative meaning altered), might we affect where viewers look? Might they look differently: in different places and for different lengths of time? Similarly, what if we were to musically score a text in different ways, diegetically and non-diegetically? Might we affect the focus of viewer gaze? If so, what might this tell us about narrative attention and filmmaking techniques that sit ‘beyond the screenplay’?

For screenwriting as a discipline, studies such as these would serve two purposes, I feel. Firstly, they would help to strengthen the presence of screenwriting in the academy, especially in regard to innovative research that privileges the role of the practitioner. Accordingly, these studies could provide a variety of methodological approaches that might be of use to other screenwriting scholars; or that might be applied to other creative practice disciplines, in which researchers wish to understand the work that has gone into the creation of a text that might otherwise only be studied once it has been completed. Secondly, and perhaps more importantly, such studies might yield results that benefit, or at least inform, future screenwriting practices. Whether industry-related practices or otherwise, just like all ‘good’ creative practice research, the insights and understandings gained would contribute to the discipline in question in the form of ‘better’ or ‘different’ ways of doing (Harper, 2007). For me, this would reflect both the nature and the value of creative practice research.

Researcher B: All of the potential avenues for future research in this field take an essential interest in how moving images on screen produce a play between top-down and bottom-up cognition. In this, a larger issue for me – going back to the points I raised at the beginning of my section – is how the data can be mobilized beyond a strictly cognitive framework and vocabulary of screen theory. As indicated, the cognitive approach offers a deliberately ‘common sense’ counterpart to a paradigm such as psychoanalysis, with its reliance on myth, desire and fantasy (Elsaesser and Buckland, 2002: 169). Cognitive theory understands a film as a data set that a viewer’s brain processes and completes in an active construction of meaning – an understanding that eye tracking and neurocinematics are very well placed to support and expand. But most screen scholars appreciate and theorize film and television texts as much more than mere sets of data. The moving image is an experience that only ‘works’ by generating emotional affect, by engaging the viewer’s attachments, memories, desires and fears. Film theorist Linda Williams proposes that our investment in following the twists and turns of a narrative is fundamentally reliant upon the emotion of pathos: we continually, pleasurably invest in the expectation that a character will act or be acted upon in such a way that they achieve their goal, and continually, pleasurably have that expectation obscured and dashed by the story (Williams, 1998). So viewer attention is driven not just by a drive to know but also by a desire to feel: to be swept up in waves of hope and disappointment.

The mini storyline of the Up! montage sequence relies entirely on this dialectic of action and pathos. Carl and Ellie’s hopes are repeatedly frustrated, and Carl is finally unable to redeem this pattern before Ellie dies – producing a profound sense of pathos and regret as the defining theme of the sequence. We can see that our subjects’ fixations fell in line with this pattern as the sequence unfolded, consistently focusing on the character who was triggering or carrying the emotional power. But how do we distinguish the ‘felt’ dimension of this gaze from the viewer’s efforts to simply comprehend what is happening by following characters’ movements, facial expressions or body language? How, that is, can we ‘see’ emotional engagement, and start to appreciate how this crucial dimension of spectatorship – based on feeling not thinking – governs the play between top-down and bottom-up cognition in moving pictures? For me, grappling with this problem – and perhaps experimenting with further measurements of pupil dilation, heart rate and brain activity – offers a fascinating pathway into understanding how eye tracking can move beyond an engagement with cognitive film theory to contribute to phenomenological thinking on genuinely embodied seeing and experience.

Researcher C: There is so much that can be done in this area, and that makes it an exciting pursuit; yet what makes it even more motivating is the way that we hope to go about it: collaboratively. One of the core aspects that members of ETMI are very passionate about is working together, bringing in different fields, different disciplines, different ways of seeing things, and building bridges between them. This work is not only about learning more about how we watch and interact with films, but also about having different perspectives on those insights. Work I would personally like to see undertaken in this way is to explore how black and white viewing compares to colourised viewing, and to explore whether and how 3D viewing affects how we gaze about a scene. To compare the gaze and emotional responses of children and adults to the same visual content, and similarly compare visual and emotional responses to material between males and females, and between genre fans and haters, is also an interesting possibility.

Finally, adding to these, I am excited about the potential collection and analysis of other physiological measures to better gauge emotional engagement. These include blood pressure, pupillometry, skin conductance, breathing rate and volume, heart rate, sounds made (gasps, held breath, sighs, etc.) and facial expressions.

Conclusion

By reflecting on each of our research backgrounds, experiences and expectations, what this article has revealed is that while we might all have come to the study with varied approaches and intentions, we have come out of it with a somewhat surprisingly harmonious set of observations and conclusions. Without knowing it, perhaps, we were all interested in narrative and the role that characters and their agency play in it. We were also similarly interested in landscape and the visual potential of the screen; not in an obvious way, but in relation to subtext, meaning and emotion. The value of a study like this, then, lies not just in its methodological originality, but also in its ability to stir up passions in cross-disciplinary researchers, whereby each can bring to the table their own skills and ways of understanding data to reach mutual and respective conclusions. Although we ‘knew’ this from undertaking the study, the opportunity to reflect fully on the process in the form of an article has given us an even greater understanding of the collaborative potential of cross-disciplinary researchers such as ourselves.

 

References

Aristotle. (1996). Poetics. Trans. Malcolm Heath. London: Penguin.

Baker, Dallas. (2013). Scriptwriting as Creative Writing Research: A Preface. In: Dallas Baker and Debra Beattie (eds.) TEXT: Journal of Writing and Writing Courses, Special Issue 19: Scriptwriting as Creative Writing Research, pp. 1-8.

Batty, Craig, Adrian G. Dyer, Claire Perkins and Jodi Sita. (Forthcoming). Seeing Animated Worlds: Eye Tracking and the Spectator’s Experience of Narrative. In: CarrieLynn D. Reinhard and Christopher J. Olson (eds.). Making Sense of Cinema: Empirical Studies into Film Spectators and Spectatorship. New York: Bloomsbury.

Batty, Craig. (2013) Creative Interventions in Screenwriting: Embracing Theme to Unify and Improve the Collaborative Development Process. In: Shane Strange and Kay Rozynski. (eds.) The Creative Manoeuvres: Making, Saying, Being Papers – the Refereed Proceedings of the 18th Conference of the Australasian Association of Writing Programs, pp. 1-12.

Batty, Craig. (2011). Movies That Move Us: Screenwriting and the Power of the Protagonist’s Journey. Basingstoke: Palgrave Macmillan.

Batty, Craig and Zara Waldeback. (2008). Writing for the Screen: Creative and Critical Approaches. Basingstoke: Palgrave Macmillan.

Bordwell, David. (1985). Narration in the Fiction Film. London: Routledge.

Buswell, Guy T. (1935). How People Look at Pictures. Chicago: University of Chicago Press.

Cerf, Moran, E. Paxon Frady and Christof Koch. (2009). Faces and text attract gaze independent of the task: Experimental data and computer model. Journal of Vision, 9(12): 10, pp. 1–15.

Crouzet, Sebastien M., Holle Kirchner and Simon J. Thorpe. (2010). Fast saccades toward faces: Face detection in just 100 ms. Journal of Vision, 10(4): 16, pp. 1–17.

Egri, Lajos. (2004). The Art of Dramatic Writing. New York: Simon & Schuster.

Elsaesser, Thomas and Warren Buckland. (2002). Studying Contemporary American Film: A Guide to Movie Analysis. London: Hodder Headline.

Guastella, Adam J., Philip B. Mitchell and Mark R. Dadds. (2008). Oxytocin increases gaze to the eye region of human faces. Biological Psychiatry, 63, pp. 3-5.

Harper, Graeme and Jonathan Rayner. (2010). Cinema and Landscape. Bristol: Intellect.

Harper, Graeme. (2007). Creative Writing Research Today. Writing in Education, 43, pp. 64-66.

Hayhoe, Mary. (2000). Vision using routines: A functional account of vision. Visual Cognition, 7, pp. 43–64.

Hockley, Luke. (2007). Frames of Mind: A Post-Jungian Look at Cinema, Television and Technology. Bristol: Intellect.

Land, Michael F., Neil Mennie and Jennifer Rusted. (1999). The roles of vision and eye movements in the control of activities of daily living. Perception, 28, pp. 1311–1328.

Marks, Dara. (2009). Inside Story: The Power of the Transformational Arc. London: A&C Black.

Marks, Laura U. (2002). Touch: Sensuous Theory and Multisensory Media. Minneapolis: University of Minnesota Press.

Mulvey, Laura. (1975). Visual Pleasure and Narrative Cinema. Screen, 16(3), pp. 6-18.

Norton, David, and Lawrence Stark. (1971). Scanpaths in eye movements during pattern perception. Science, 171, pp. 308–311.

Price, Steven. (2013). A History of the Screenplay. Basingstoke: Palgrave Macmillan.

Price, Steven. (2010). The Screenplay: Authorship, Theory and Criticism. Basingstoke: Palgrave Macmillan.

Rodowick, David. (2007). The Virtual Life of Film. Cambridge, MA: Harvard University Press.

Schnotz, Wolfgang and Richard K. Lowe. (2008). A unified view of learning from animated and static graphics. In: Richard K. Lowe and Wolfgang Schnotz (eds.). Learning with animation: Research implications for design. New York: Cambridge University Press, pp. 304-356.

Schütz, Alexander C., Doris I. Braun and Karl R. Gegenfurtner. (2011). Eye movements and perception: A selective review. Journal of Vision, 11(5): 9, pp. 1–30.

Simons, Daniel J. and Christopher F. Chabris. (1999). Gorillas in our Midst: Sustained Inattentional Blindness for Dynamic Events. Perception, 28, pp. 1059-1074.

Smith, Murray (1995). Engaging Characters: Fiction, Emotion, and the Cinema. Oxford: Oxford University Press.

Smith, Tim J. (2005). An Attentional Theory of Continuity Editing. [accessed October 17, 2014].

Smith, Tim J. (2011). Watching You Watch There Will Be Blood. [accessed August 22, 2014].

Smith, Tim J. (2013). Watching you watch movies: Using eye tracking to inform cognitive film theory. In: A. P. Shimamura (ed.). Psychocinematics: Exploring Cognition at the Movies. New York: Oxford University Press, pp. 165-191.

Sobchack, Vivian (1992). The Address of the Eye: A Phenomenology of Film Experience. Princeton, N.J: Princeton University Press.

Stadler, Jane (2010). Landscape and Location in Australian Cinema. Metro, 165.

Tatler, Benjamin W., and Benjamin T. Vincent. (2009). The prominence of behavioural biases in eye guidance. Visual Cognition, 17, pp. 1029–1054.

Tobii Technology (2005). User Manual. Tobii Technology AB. Danderyd, Sweden.

Vogler, Christopher (2007). The Writer’s Journey: Mythic Structure for Writers. Studio City, CA: Michael Wiese Productions.

Wells, Paul (2010). Boards, Beats, Binaries and Bricolage – Approaches to the Animation Script. In: Jill Nelmes (ed.) Analysing the Screenplay, Abingdon: Routledge, pp. 104-120.

Wells, Paul (2007) Basics Animation 01: Scriptwriting. Worthing: AVA Publishing.

Williams, Linda (1998). Melodrama Revised. In: Nick Browne (ed.). Refiguring American Film Genres: History and Theory. Berkeley, CA: University of California Press.

Yarbus, Alfred L. (1967). Eye Movements and Vision. New York: Plenum.

 

List of figures

Fig. 1. A heat map showing the collective intensity of viewers’ responses to the savings jar. Source: author study.

Fig. 2. A gaze plot showing the fixations and saccades of one viewer in a scene with the prominent faces of Carl and Ellie. Source: author study.

Fig. 3. Two viewers’ gaze plots depicting the sequence of fixations made between Carl and Ellie. Source: author study.

Fig. 4. Gaze plots of fixations made by all viewers over the scene in which Carl purchases airline tickets. Source: author study.

Fig. 5. Gaze plot showing the fixations made by all viewers as they briefly see the contents of the picnic basket. Source: author study.

 

Notes

[i] A full analysis of this study, ‘Seeing Animated Worlds: Eye Tracking and the Spectator’s Experience of Narrative’, will appear in the forthcoming collection Making Sense of Cinema: Empirical Studies into Film Spectators and Spectatorship, edited by CarrieLynn D. Reinhard and Christopher J. Olson.

[ii] See Batty, Craig, Dyer, Adrian G., Perkins, Claire and Sita, Jodi (forthcoming) for full results.

 

Bios

Associate Professor Craig Batty is Creative Practice Research Leader in the School of Media and Communication, RMIT University, where he also teaches screenwriting. He is author, co-author and editor of eight books, including Screenwriters and Screenwriting: Putting Practice into Context (2014), The Creative Screenwriter: Exercises to Expand Your Craft (2012) and Movies That Move Us: Screenwriting and the Power of the Protagonist’s Journey (2011). Craig is also a screenwriter and script editor, with experiences across short film, feature film, television and online drama.

Dr Claire Perkins is Lecturer in Film and Screen Studies in the School of Media, Film and Journalism at Monash University. She is the author of American Smart Cinema (2012) and co-editor of collections including B is for Bad Cinema: Aesthetics, Politics and Cultural Value (2014) and US Independent Film After 1989: Possible Films (forthcoming, 2015). Her writing has also appeared in journals including Camera Obscura, Critical Studies in Television, Celebrity Studies and The Velvet Light Trap.

Dr Jodi Sita is Senior Lecturer in the School of Allied Health at the Australian Catholic University. She works within the areas of neuroscience and anatomy, with expertise in eye tracking research. She has extensive experience with multiple project types using eye tracking technologies and other biophysical data. As well as her current research into viewer gaze patterns while watching moving images, she is using eye tracking to examine expertise in Australian Rules Football League coaches and players, and to examine the signature forgery process.