In a talk given at the Pompidou Centre in 2012, Steven Connor proposed a new term, panophonia, as an update to Schafer’s concept of schizophonia. The term schizophonia refers to the generalized condition in which recorded sounds, in particular voices, are split off from their sources (bodies) and removed from the context of their original environments, so that listeners experience moments of the vocal uncanny. According to Connor, the proliferation of technical devices designed for reproducing, generating and transmitting voices affects the sonic dimension of contemporary culture in such a way that the term schizophonia is out of date; it no longer properly describes the condition we are all living in, a condition in which prerecorded and synthesized voices have multiplied to such an extent that they create a peculiar “phonesthetic effect” in which these voices, “rather than being exiled from their origins” (as in the condition of schizophonia), “find a way of being at home everywhere” (Connor 2012, 8). In a nod to Leopold Bloom’s observation in Joyce’s Ulysses that “everything speaks in its own way,” Connor notes:

Not only are we used to listening to those artificial voices “that do not necessarily have human bodies as their source” (Pettman 2017, 7), which are no longer even uncanny, and to hearing them talk with almost perfect naturalness and sometimes very intimately [2]; we are also becoming accustomed to sharing domestic and private spaces with what Cayley calls “vocal transactors”: computational entities, such as Amazon’s Alexa, that hear us and respond with a voice which, although almost indistinguishable from a human one, is purely synthetic and ventriloquizes the institutions of the “technological and cultural architecture of the Big Software” (Cayley 2017a). For Cayley, subversive experiments with the newest technologies of speech recognition and voice synthesis can significantly reconfigure “the field of literary practices of all kinds” (Cayley 2017a, 2017b), giving way to aurature, that is, the “linguistic works valued for lasting artistic merit that has been expressed in the support media of aurality [rather than visuality]” (Cayley 2017a). If we consider aurature the critical expression of the panophonic condition, then the linguistic performances transacted between the users of the Alexa skill The Listeners (2015) [3] and the Amazon Echo’s vocal identity (entangled with the Other Voices performed by Ian Hatcher [4]) would be one of the most compelling instances of the mixed economy of voice. Connor uses the concept of “vicariances” from Michel Serres’ The Parasite to describe this new panophonic system of production, distribution and reception of voice:

Cayley’s The Listeners project demonstrates potential for practices in “transactive synthetic language in aurality” (Cayley 2017b) and is at the cutting edge of experimental language art.

This paper discusses selected projects and practice-based research initiatives from the wider field of new media art that can be described as foreshadowing the aurature proclaimed by Cayley.

Language and words in their auditory form play a significant role in Ken Feingold’s art. The most noteworthy works in his diverse, although internally consistent, oeuvre are minimalistic, non-interactive animatronic installations whose characteristic element is a set of realistic talking heads. Artificial intelligence software designed by the artist cooperates with speech (or text) recognition systems and speech synthesis systems to allow these creations to “conduct or, rather, present or enact endless improvised conversations” (Kluszczyński 2014, 19) in the presence of onlookers. In his insightful observations about Feingold’s work, Maciej Ożóg writes:

In If/Then (2001) [5] two identical, bodiless androgynous heads submerged in a box filled with Styrofoam pieces bring to mind “replacement parts being shipped from the factory that had suddenly gotten up and begun a kind of existential dialog on the assembly line” (Feingold 2015, online catalogue). The heads speak to each other, calling into question the reality of their own existence, asking: “Is this life?”, “What does exist mean for example?”, “Can I believe my ears?”, “Why can anything be the same as anything else if two things can’t be in the same place at the same time?”

The repetition of phrases beginning with the pronoun “I” (“I think we are exactly alike,” “I feel like I exist,” “I feel like I am inside my head,” “I feel like I am here,” “I think about what things mean,” “I can say things that have no meaning”) calls up associations with the “litany” of “I am’s,” harvested from real-time data feeds of Internet chat rooms and bulletin boards and recited by a synthesized voice in Mark Hansen and Ben Rubin’s Listening Post (2002) installation (“I am off,” “I am hot,” “I am nice,” “I am freezing,” “I am doing fine,” “I am fully awake,” “I am comfortable with my assertion,” “I am an artist,” “I am bored,” “I am just a security guide,” “I am not god in English”).

Both works are similar in their lexical resources and in the structure of the repeated phrases; however, they differ in the effect produced by their respective synthetic voices. In her eloquent analysis of the sonic dimension of Listening Post, Norie Neumark points out that despite the fact that the creators used only one synthetic voice, the “productive play between [its] thickness and thinness” allowed them to create the impression of the diversity of the internet chatter, causing the spectators visiting the installation to experience an almost intimate connection with the numerous individuals hiding behind usernames in virtual space, “whom the work doesn’t reduce to identities or ‘presences’” (Neumark 2010, 111). Neumark explains:

While the performative voicing of the internet communication exchange in Listening Post “produces a kind of authenticity effect and calls forth identity” (Neumark 2010, 111), the quasi-dialogue conducted by the hairless heads in If/Then using one shared voice (If/Then uses the Festival/Mbrola SDK and voice) denominates identity and calls into question the notion of the individual self. It is worth noting that in If/Then the animatronic “figures” do indeed listen to and answer each other. At times, the homonymy of certain words and the lack of context to disambiguate them lead to misunderstandings and verbal loops, in turn producing a peculiar kind of humor (wordplay) which combines the verbal performance with an element of the absurd. “They would go off on long chains of associations based on something the other one didn’t say. They would really veer very far off,” Feingold explains, adding that “the speech-recognition engine (…) was a kind of black box — there wasn’t much I could do to affect what it understood or misunderstood” (see Shanken 2014, 124). These accidental mistakes, resulting from the peculiarities of speech recognition systems (Feingold used the ViaVoice SDK for Linux, which IBM released experimentally and later asked all developers to stop using), were desired by the creator, who has distanced himself from techno-utopian visions of computer-mediated communication and is critical of proclamations regarding the possibility of complete and efficacious verbal exchange between people and artificial intelligence agents. As noted by Matthew Biro, “the humor of the pseudological (and slightly erotic) If/Then only incompletely masks its trenchant critique of the technologically saturated future that is now emerging” (Biro 2007, 7).

When IBM withdrew the free SDK for the Linux version of ViaVoice in 2002, the artist, unable to find an alternative to the IBM product, started to work with text recognition systems (in installations such as Hell (2013), “there are text strings going back and forth between the program objects, and they are analyzing the text and formulating a response, and then they turn the text into speech” (Shanken 2014, 123)). Because this technological change eliminated the accidental wordplay that arose from the peculiarities of speech recognition systems, the wordplay in the newest installations is designed by the artist: “If misunderstanding/humor occurs, it’s intentional and predefined” (Shanken 2014, 124). The theme of dysfunction in the communication process appears in many of Feingold’s works; in the case of Hell [6], misunderstanding is the main focus of the work. Feelings and emotions become the subject of a lively conversation, a “shouting” match between lovers, conducted by two visually identical heads which nonetheless speak with two different voices (male and female).

Hell is subject to a certain paradox, which Feingold mentions in a conversation with Edward Shanken: the figures stubbornly and continuously attempt to establish a deeper emotional connection; however, because they are not able to go through actual affective states and can only talk about them (obsessively following the same trains of thought and using a limited repertoire of clichés), they never reach the stage where they can engage in a real loving discourse (Shanken 2014, 113). It must be noted, however, that the artist bestowed upon the “arguing lovers” perceptible personalities of a kind, which manifest themselves not only through the choice of words but also through the overuse of certain expressions and phrasing (Wójtowicz 2014, 104). In Hell, the speaking rate, loudness, voice quality, and the effects of pausing also influence the impressions of individual personality that listeners might form about the two hairless heads [7]. The lovers in Hell speak with AT&T Natural Voices (the voices Crystal and Mark were selected for the installation). As the artist explained, the illusion of psychological affect that he achieved has more to do with the cadences he established in programming while taking into account the peculiarities of the speech technology developed by AT&T Laboratories; no custom effects were used to manipulate the vocal performance of the two talking heads (Feingold 2017, [email]).

To some extent, the aesthetic experience of the Hell installation is reminiscent of what is experienced by the spectators of Bill Vorn’s “hysterical machines” performances [8]. The audiovisual show of dysfunctional and deviant behaviors of the machines designed by the Canadian artist, as well as the nervous verbal performance of Feingold’s neurotic chatterbots with their incomplete body prosthetics, expresses the paradoxical nature of artificial life. As Maciej Ożóg observes:

Feingold’s animatronic heads, which act like “caricatures of the fully functional chatterbots,” are a means of criticizing general assumptions about AI and deconstructing the myths arising around it, perpetuated by popular culture texts (Ożóg 2009, 5). Prejudices in humanoid robot design are also the subject of the practice-based research initiative conducted by Mark Böhlen from RTS Research. One of the more curious examples of Böhlen’s speculative robotics is Amy and Klara (2006) [9], two chatterbots which draw topics for their conversations from an internet tabloid website, Salon.com. The Amy and Klara installation was created as part of a bigger research and artistic endeavor called the Make Language Project [10], whose aim is to think critically about speech synthesis technology and its perception in a wider socio-cultural context. The technological basis for the programmed conversation between Amy and Klara consists of the following systems: text analysis (selected agent applications search the Salon.com website, providing Amy and Klara with vocabulary and topics for the argument); speech synthesis (the SVOX speech synthesis engine allows the robots to speak to each other); and speech recognition (Amy and Klara hear each other and react to the uttered sentences thanks to the capabilities of the FONIX speech recognition engine) [11]. A machine vision module is also used in the installation (the robots grow quieter whenever people appear in their immediate vicinity and become verbally aggressive when they see the color pink). The performance of Amy and Klara is one of conflict, manifesting itself through profane language (normally filtered from ASR products) and expletives that the robots aim at each other. The conflict stems from misunderstandings, which arise partly from the peculiarities of the speech recognition system (as in Feingold’s If/Then) but are in the first place predetermined by the software written by Böhlen:

Additionally, one of the robots, Klara, speaks with a simulated thick German accent (“this accent is generated in real time by swapping select vowels and consonants between the SVOX language models for German and English and applying several ad hoc SAMPA alphabet based phonetic remappings for special cases,” explains Böhlen), which greatly increases the probability of a communicational fiasco (Böhlen 2006, 12) and, at the same time, gives a kind of individuality to this bodiless artificial entity.
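
The general idea of accent simulation by phonetic remapping can be illustrated with a minimal sketch. This is not Böhlen’s implementation: the remapping table, the special-case rule and the function name `apply_accent` are all hypothetical, chosen only to show how swapping a few phoneme symbols (written here in a SAMPA-like notation) before synthesis could produce a German-sounding rendering of English.

```python
# Hypothetical remapping table (an assumption, not Böhlen's data):
# English phoneme symbol -> substitute typical of a German accent.
ACCENT_MAP = {
    "w": "v",  # "we" sounds like "ve"
    "T": "s",  # voiceless "th" sounds like "s"
    "D": "z",  # voiced "th" sounds like "z"
}

# Hypothetical special case: voiced stops are devoiced at the end of a word.
FINAL_DEVOICING = {"b": "p", "d": "t", "g": "k"}


def apply_accent(phonemes):
    """Return a new phoneme list with the accent substitutions applied."""
    # Swap every phoneme that has an entry in the table; keep the rest.
    remapped = [ACCENT_MAP.get(p, p) for p in phonemes]
    # Apply the word-final special case to the last phoneme only.
    if remapped and remapped[-1] in FINAL_DEVOICING:
        remapped[-1] = FINAL_DEVOICING[remapped[-1]]
    return remapped


# "this" and "word" as rough phoneme sequences.
print(apply_accent(["D", "I", "s"]))
print(apply_accent(["w", "3:", "d"]))
```

In such a scheme a speech recognizer trained on unaccented English would receive input that departs systematically from what it expects, which is consistent with the increased probability of misunderstanding described above.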

The performance of Amy and Klara, quarrelsome communication agents embodied in pink boxes (while designing the robots, Böhlen purposefully rejected physical anthropomorphism), provocatively crosses the boundary of permissible AI behaviors, thereby criticizing the prevailing conventions of the normative paradigm of HCI design (“benevolence and politeness are problematic machine design guidelines,” Böhlen explains; “by normalizing android culture, one loses opportunities for interaction forms that are uncomfortable and problematic but, potentially, rich and complex” (Böhlen 2008, 2)). Discussion concerning the philosophical assumptions of general artificial intelligence has been initiated by many works in the field of new media art, some of which could be discussed here; nevertheless, Böhlen’s project is especially noteworthy because it focuses not on the problem of the identity/subjectivity of intelligent agents themselves, but rather on important questions about digital language and synthetic speech technology as a cultural phenomenon.

Citing experiments on simulations of language evolution in embodied robots conducted by computational linguistics researchers, he also predicts:

Voice transactive Artificial Intelligence robots, such as Alexa, are the most intriguing ones among those figures.

The projects discussed in this paper often raise issues similar to those which John Cayley takes as the main focus of his artistic practice: synthetic language; the robot imaginary; the reconfiguration of robotics; identity; the voice and individuality; artificial intelligence and identified artificial intelligence; digital subjectivity (Cayley 2017a, 2017b). Aurature, proclaimed by Cayley, along with the assumptions at its core, follows the pattern of the so-called “sonic turn” [12] in contemporary art, which shows appreciation for sound while at the same time investigating its historical and cultural place as one of secondary importance to visual media. John Barber’s practice-based research and artistic projects, which probe the potential of digital sound as a basis for e-literary experience (Barber 2014) [13], also address this “sonic turn.” Considering the impact that transactive synthetic language can have on artistic practices as well as on human culture, we should invest more energy in studying the technologies which allow current cloud-based distributed entities, such as Siri, Cortana, Google Now, Watson, and Alexa, to listen to us and to themselves and to speak. As platform studies promote the investigation of the computing systems, both hardware and software, that shape and support the creative works of electronic literature, we should pay special attention to technologies of text-to-speech recognition, voice recognition and speech synthesis.

BARBER, John (2014). “Internet radio and electronic literature: locating the text in the act of listening.” Electronic Book Review. http://www.electronicbookreview.com/thread/electropoetics/internetradio
BIRO, Matthew (2007). “Introduction.” In Feingold, Ken. Selected Works 1978-2007. New York: Ken Feingold Studio. 5-7.
BÖHLEN, Mark (2006). “When a Machine Picks a Fight. Notes on Machinic Male-Dicta and synthetic hissy fits.” CHI 2006 Workshop: Misuse and Abuse of Interactive Technologies. Montréal Québec Canada. 9-12.
–––––––––– (2008). “Robots with Bad Accent: Living with Synthetic Speech.” Leonardo. Vol. 41.3: 209-214. doi: 10.1162/leon.2008.41.3.209.
CAYLEY, John (2017a). “Aurature at the End(s) of Electronic Literature.” Electronic Book Review. http://electronicbookreview.com/thread/electropoetics/aurature.
–––––––––– (2017b). “Reconfiguration: Symbolic Image and Language Art.” Humanities 6.1, 8. doi:10.3390/h6010008.
CONNOR, Steven (2012). “Panophonia.” A talk given at Pompidou Centre, 22 February 2012. http://www.stevenconnor.com/panophonia/panophonia.pdf.
DROBNICK, Jim (2004). “Listening Awry.” Jim Drobnick, ed. Aural Cultures. Toronto: Banff, YYZ Books/Walter Phillips Gallery Editions, 2004. 9–18.
FEINGOLD, Ken (2007). Selected Works 1978-2007. New York: Ken Feingold Studio.
–––––––––– (2017). “Concerning If/Then and Hell.” [email to the author]
FLORENCE, Penny (2016). “A Review Essay: John Cayley’s The Listeners.” Hyperrhiz 14. http://hyperrhiz.io/hyperrhiz14/reviews/1-florence-aurature.html
HATCHER, Ian (2017). “Multiplicity: An Interview with Ian Hatcher.” Interviewed by Steven Wingate. Rain Taxi Online Edition. Published 8 May, 2017. http://www.raintaxi.com/multiplicity-an-interview-with-ian-hatcher/.
KLOBUCAR, Andrew (2017). “Programming’s Turn: Computation and Poetics.” Humanities 6.2, 27. doi:10.3390/h6020027.
KLUSZCZYŃSKI, Ryszard Waldemar, ed. (2014). Robotic Art and Culture: Bill Vorn and his Hysterical Machines. Gdańsk: Centrum Sztuki Współczesnej “Łaźnia”.
KLUSZCZYŃSKI, Ryszard Waldemar (2014). “Finding Oneself in Others. Introductory Reflections on Ken Feingold’s Art.” Ed. Ryszard Waldemar Kluszczyński. Ken Feingold: Figury mowy/Figures of Speech. Gdańsk: Centrum Sztuki Współczesnej “Łaźnia”. 6-23.
LABELLE, Brandon (2010). “Raw Orality: Sound Poetry and Live Bodies.” Ed. Norie Neumark, with Ross Gibson, and Theo van Leeuwen. VOICE: Vocal Aesthetics in Digital Art and Media. Cambridge, MA: MIT Press. 147-71.
NEUMARK, Norie with Ross Gibson and Theo Van Leeuwen (2010). VOICE: Vocal Aesthetics in Digital Art and Media. Cambridge, MA: MIT Press.
NEUMARK, Norie (2010). “Doing Things with Voices: Performativity and Voice.” Ed. Norie Neumark, with Ross Gibson and Theo van Leeuwen. VOICE: Vocal Aesthetics in Digital Art and Media. Cambridge, MA: MIT Press. 95-119.
OŻÓG, Maciej (2009). “Art Investigating Science: Critical Art as a Meta-discourse of Science.” Proceedings of the Digital Arts and Culture Conference, University of California, December 12-15, 2009.
PETTMAN, Dominic (2017). Sonic Intimacy: Voice, Species, Technics (or, How to Listen to the World). Stanford, CA: Stanford University Press.
SHANKEN, Edward (2014). “Love is a Good Place to Start: Interview with Ken Feingold.” Ed. Ryszard Waldemar Kluszczyński. Ken Feingold: Figury mowy/Figures of Speech. Gdańsk: Centrum Sztuki Współczesnej “Łaźnia”. 108-137.
WÓJTOWICZ, Ewa (2014). “Writing Personalities: Art Vis-à-vis Artificial Intelligence. Ken Feingold’s Figures of Speech.” Ed. Ryszard Waldemar Kluszczyński. Ken Feingold: Figury mowy/Figures of Speech. Gdańsk: Centrum Sztuki Współczesnej “Łaźnia”. 76-107.

[1] From the conversation between Steven Wingate and Ian Hatcher in Rain Taxi. Read the full interview at http://www.raintaxi.com/multiplicity-an-interview-with-ian-hatcher/

[2] Dominic Pettman, in the first chapter of Sonic Intimacy: Voice, Species, Technics (or, How to Listen to the World), explores the issue of intimate relationships with the voices that speak to us from machines, centering his analysis on Spike Jonze’s movie Her (2014). He observes: “One of the genuine gifts of the film Her is the suggestive sense of the ways in which nonhuman voices might soon have the capacity to seduce us into feeling genuine emotions of intimacy and affection. Or perhaps this moment has already arrived. The answer might depend on the listening closely to the various ghost emanating from the diverse machines of the present moment; which themselves (…) constitute – and contribute to — the vox mundi. Electronic, prerecorded, and synthesized voices are isolated members of a wider planetary chorus that I am (…) calling “the voice of the world.” The machinic or cybernetic voice (…) traces the invisible but affecting line anew between the hailing entity (a radio, for instance) and the interlocutor (who is not necessarily human). And it is the intensity of this relationship, based on acoustic attunement, that reveals sociality itself to be forged through the practice of listening to voices that do not necessarily have human bodies as their source. There is an extrahuman Eros at work in the vox mundi, seducing “us” into forms of recognizing, heeding, and needing a different type of presence, usually reserved for the generic metaphysical Man or human neighbor. Samantha (…) is partly modeled on Siri, Apple’s “intelligent personal assistant” (…) And Siri has quickly become an invisible sexual fetish of at least a vocal minority of users” (Pettman 2017, 17).

[3] For further details about The Listeners by John Cayley, see Penny Florence’s insightful review published in Hyperrhiz. See also the recorded performance of transactional conversation with The Listeners at the Kitchen, NYC, Sept 10, 2016.

[4] For further details about Ian Hatcher’s vocal performances, in which he convincingly imitates synthetic speech technology with the analog instrument of his own voice, read Andrew Klobucar’s article in Humanities. doi: 10.3390/h6020027

[5] See documentation clip from If/Then: https://www.youtube.com/watch?v=y_8mKgoYmFc.

[6] See documentation clip of Hell: https://www.youtube.com/watch?v=weP3JKC-IRI.

[7] The issue of individuality inscribed within the synthesized voices has been raised recently by John Cayley: “When it comes to language in aurality — as human or humanoid voice — the situation becomes more complex because we cannot (yet) conceive of the voice that is not marked by human individuality. This implies that whatever a voice inscribes is, minimally, within the diegetic scope of this individuality. If an apparent individual is subject to symbolic process for the production of their language, would this not break their individuality and require a change of voice (perhaps expressed as distinct intonation or accent—think also of acting, drama, and the complexities that this field of aesthetic practice would further introduce)? These are questions (…) that become crucial since the advent of distributed entities that speak and listen — such as Siri, Cortana, Google Now, Watson, and Alexa, all of which are literally embodied as transactive synthetic language. The language of these entities is (…) a synthesis of conventional linguistic image and language-generative symbolic process. Not only are the broken diegeses of this language disguised by their inevitable coincidence with differences that constitute language as such, synthetic language in aurality must also be wrapped within one of the definitive indications of human embodiment, an individual voice. This renders the implicated symbolic processes compelling in so far as they acquire a compelling relationship with embodied humanity. The resultant voices are not, by the way, necessarily ‘uncanny’ (disturbingly human-seeming non-human). They are something more troubling than that.” (Cayley, 2017b, note 6 on page 6).

[8] For more information about Bill Vorn’s artistic projects see: Kluszczyński, Ryszard Waldemar. 2014. ed. Robotic Art and Culture: Bill Vorn and his Hysterical Machines. Gdańsk: Centrum Sztuki Współczesnej “Łaźnia”.

[9] See documentation clip of Amy and Klara at ISEA 2006: http://www.realtechsupport.org/repository/male-dicta.html.

[10] See website: http://www.realtechsupport.org/repository/accents.html.

[11] For more technical details read the technical note in the CHI workshop paper: http://www.realtechsupport.org/pdf/CHI2006.pdf.

[12] In Listening Awry Jim Drobnick wrote: “Although an aural equivalent to “visual studies” has yet to become firmly established in the academy, there is nevertheless a distinct and vibrant “sonic turn” that can be discerned in the recent upsurge in sound-based scholarship and artistic work. A phrase such as “sonic turn” — referring to the increasing significance of the acoustic as simultaneously a site for analysis, a medium for aesthetic engagement, and a model for theorization — self-consciously echoes W.J.T. Mitchell’s articulation of a “pictorial turn” (…).” (Drobnick 2004, 10).

[13] John Barber raises the following provocative questions: “Understanding the primacy of sound in human narrative, may we not reconsider sound as a basis for engagement with emerging forms of electronic literature? Rather than augmenting the visual text, cannot sound be the text? And in addition to the human voice or music, or even in lieu of, cannot the aural narrative of a work of electronic literature be comprised completely of environmental and/or mechanical sounds, or even what otherwise might be thought of as noise, all figures from the ground of acoustic space?” (Barber 2014). See also John Barber’s website: http://www.nouspace.net/john/