Language Art in the Age of Panophonia

This article calls attention to the hybrid genre of voice-based performance and its blending of the supposed binaries of human and machinic speech. Using the concept of panophonia, the author discusses the animatronic sculptures of speaking figures created by Ken Feingold and Mark Böhlen’s talking robots. Through a comparative analysis of these works, the author explores the different poetic metalanguages that both artists create to deconstruct the communicative structures that demarcate the post-human era.

Language is a system flowing through people and other systems in various forms. And now, on top of it, we have all these digital layers of data and correlation and collection and circulation, which feedback and have effects on language. It is tangled hierarchy. 1
- Ian Hatcher

1 From the conversation between Steven Wingate and Ian Hatcher in Rain Taxi. Read the full interview at http://www.raintaxi.com/multiplicity-an-interview-with-ian-hatcher/

The term schizophonia refers to the generalized condition in which recorded sounds, in particular voices, are split off from their sources (bodies) and taken out of the context of their original environments, so that listeners experience moments of the vocal uncanny. According to Connor, the proliferation of technical devices designed for reproducing, generating and transmitting voices affects the sonic dimension of contemporary culture in such a way that the term schizophonia is out of date; it no longer properly describes the condition we are all living in, a condition in which prerecorded and synthesized voices have multiplied to such an extent that they create a peculiar "phonesthetic effect" in which these voices, "rather than being exiled from their origins" (as in the condition of schizophonia), "find a way of being at home everywhere" (Connor 2012, 8). In a nod to Leopold Bloom's observation in Joyce's Ulysses that "everything speaks in its own way," Connor notes:

Well, everything indeed speaks, but now perhaps not necessarily in its own way - in propria persona - but rather in borrowed accents, mobile turns of phrase, mirrorings, accompaniments, descants, impersonations. (Connor 2012, 8)

Not only are we used to listening to those artificial voices "that do not necessarily have human bodies as their source" (Pettman 2017, 7), which are not even uncanny anymore, and used to them talking with almost perfect naturalness and sometimes very intimately, 2 but we are also becoming accustomed to sharing domestic/private spaces with what Cayley calls "vocal transactors": computational entities, such as Amazon's Alexa, that hear us and respond with a voice which, although almost indistinguishable from a human one, is purely synthetic and ventriloquizes the institutions of the "technological and cultural architecture of the Big Software" (Cayley 2017a). For Cayley, subversive experiments with the newest technologies of speech recognition and voice synthesis can significantly reconfigure "the field of literary practices of all kinds" (Cayley 2017a, 2017b), giving way to aurature, that is, the "linguistic works valued for lasting artistic merit that has been expressed in the support media of aurality [rather than visuality]" (Cayley 2017a). If we consider aurature the critical expression of the panophonic condition, the linguistic performances transacted between the users of the Alexa skill The Listeners (2015) 3 and the Amazon Echo's vocal identity (with the Other Voices performed by Ian Hatcher, entangled 4) would be among the most compelling instances of the mixed economy of voice. Connor uses the concept of "vicariances" from Michel Serres' The Parasite to describe this new panophonic system of production, distribution and reception of voice:

The very same technological ventriloquism which once made a fetish of the voice now acts to disenchant it, making it less and less apprehensible in itself, or able to speak in its own voice. Just for a brief historical interval, our technological ventriloquisms highlighted the voice, stripping it out from its habitat, and making it the object of fascination, imaginary trauma and enchantment. Technological ventriloquism can now detach us from a restricted economy of voice, in which voices only ever commingle with each other, to a mixed economy, of mediatic translations and transpositions, or what Michel Serres has called "vicariances." (Connor 2012, 9)

2 Dominic Pettman, in the first chapter of Sonic Intimacy. Voice, Species, Technics (or, How to Listen to the World), explores the issue of intimate relationships with the voices that speak to us from machines, centering his analysis on Spike Jonze's movie Her (2014). He notes: "One of the genuine gifts of the film Her is the suggestive sense of the ways in which nonhuman voices might soon have the capacity to seduce us into feeling genuine emotions of intimacy and affection. Or perhaps this moment has already arrived. The answer might depend on the listening closely to the various ghost emanating from the diverse machines of the present moment; which themselves (…) constitute - and contribute to - the vox mundi. Electronic, prerecorded, and synthesized voices are isolated members of a wider planetary chorus that I am (…) calling "the voice of the world." The machinic or cybernetic voice (…) traces the invisible but affecting line anew between the hailing entity (a radio, for instance) and the interlocutor (who is not necessarily human). And it is the intensity of this relationship, based on acoustic attunement, that reveals sociality itself to be forged through the practice of listening to voices that do not necessarily have human bodies as their source. There is an extrahuman Eros at work in the vox mundi, seducing "us" into forms of recognizing, heeding, and needing a different type of presence, usually reserved for the generic metaphysical Man or human neighbor. Samantha (…) is partly modeled on Siri, Apple's "intelligent personal assistant" (…) And Siri has quickly become an invisible sexual fetish of at least a vocal minority of users" (Pettman 2017, 17).
Cayley's The Listeners project demonstrates the potential for practices in "transactive synthetic language in aurality" (Cayley 2017b) and is at the cutting edge of experimental language art.

Sound File 1. "The Listeners." http://impactum-journals.uc.pt/matlit/article/view/5860/4710
Sound File 2. "The Listeners and Other Voices." http://impactum-journals.uc.pt/matlit/article/view/5860/4710

This paper selects projects and practice-based research initiatives from the wider field of new media art that can be described as foreshadowing the aurature proclaimed by Cayley.

II. KEN FEINGOLD'S TALKING HEADS
Language and words in their auditory form play a significant role in Ken Feingold's art. The most noteworthy works in his diverse, although internally consistent, body of work are minimalistic, non-interactive animatronic installations built around realistic talking heads. Artificial intelligence software designed by the artist cooperates with speech (or text) recognition systems and speech synthesis systems to allow these creations to "conduct or, rather, present or enact endless improvised conversations" (Kluszczyński 2014, 19) in the presence of onlookers. Maciej Ożóg, in his insightful observations on Feingold's work, writes:

The software Feingold designed allows for generating improvised, predetermined only to a certain degree logic sequences of statements. The torrent of words that streams from their mouths seems to have no end; yet it is the way of talking and of formulating statements that is more surprising than the effusiveness of speaking automats. Alongside with logical sentences, the heads can create statements on the border of poetry and chaotic gibberish. Correctly pronounced words are accompanied by inarticulate sounds that form long sequences of repetitions and rhymes. (...) In their statements triviality borders with existential pathos, while the expressions of courtesy are accompanied by questions concerning ontology and epistemology. Last but not least, the heads reveal the ability of self-reflexivity, they reflect upon their nature, ask about the purpose and the aim of living, they can evaluate themselves and the others, also express their own opinions, fears and desires. (Ożóg 2004, 9)

In If/Then (2001), 5 two identical, bodiless, androgynous heads submerged in a box filled with Styrofoam pieces bring to mind "replacement parts being shipped from the factory that had suddenly gotten up and begun a kind of existential dialog on the assembly line" (Feingold 2015, online catalogue).
The heads speak to each other, calling into question the reality of their own existence and asking: "Is this life?", "What does exist mean for example?", "Can I believe my ears?", "Why can anything be the same as anything else if two things can't be in the same place at the same time?"

Sound File 3. "If/Then." http://impactum-journals.uc.pt/matlit/article/view/5860/4710

The repetition of phrases beginning with the pronoun "I" ("I think we are exactly alike," "I feel like I exist," "I feel like I am inside my head," "I feel like I am here," "I think about what things mean," "I can say things that have no meaning") calls to mind the "litany" of "I am's" harvested from real-time data feeds of Internet chat rooms and bulletin boards and recited by a synthesized voice in Mark Hansen and Ben Rubin's Listening Post (2002) installation ("I am off," "I am hot," "I am nice," "I am freezing," "I am doing fine," "I am fully awake," "I am comfortable with my assertion," "I am an artist," "I am bored," "I am just a security guide," "I am not god in English").
Both works are similar in their lexical resources and in the structure of the repeated phrases; however, they differ in the effect produced by their respective synthetic voices. In her eloquent analysis of the sonic dimension of Listening Post, Norie Neumark points out that although the creators used only one synthetic voice, the "productive play between [its] thickness and thinness" allowed them to create an impression of the diversity of internet chatter, so that spectators visiting the installation experience an almost intimate connection with the numerous individuals hiding behind usernames in virtual space, "whom the work doesn't reduce to identities or 'presences'" (Neumark 2010, 111). Neumark explains:

There is some technical-emotional variation to the synthesized voice -pitch, thickness, reverberance and the way they compose themselves -which makes the "one" synthesized voice sound as if it were multiple voice (...) The thickness of a single voice can be particularly intense and moving (especially in the "I am" scene (...)) -giving a sense of variety and depth of emotion, condensed in one fragment, filtering through that one thickened voice. (Neumark 2010, 110)

While the performative voicing of the internet communication exchange in Listening Post "produces a kind of authenticity effect and calls forth identity" (Neumark 2010, 111), the quasi-dialogue which is conducted by the hairless heads in If/Then using one shared voice (If/Then uses the Festival/Mbrola SDK and voice) denominates identity and calls into question the notion of the individual self. It is worth noting that in If/Then the animatronic "figures" do indeed listen to and answer each other. At times, the homonymy of certain words and the lack of context to explain them lead to misunderstandings and verbal loops, in turn producing a peculiar kind of humor (wordplay) which combines the verbal performance with an element of the absurd. "They would go off on long chains of associations based on something the other one didn't say. They would really veer very far off," Feingold explains, and adds: "the speech-recognition engine (…) was a kind of black box - there wasn't much I could do to affect what it understood or misunderstood" (see Shanken 2014, 124). These accidental mistakes, resulting from the peculiarities of the speech recognition system (Feingold used the ViaVoice SDK for Linux, which IBM released experimentally and later asked all developers to stop using), were welcomed by the artist, who has distanced himself from techno-utopian visions of computer-mediated communication and is critical of proclamations regarding the possibility of complete and efficacious verbal exchange between people and artificial intelligence agents. As noted by Matthew Biro, "the humor of the pseudological (and slightly erotic) If/Then only incompletely masks its trenchant critique of the technologically saturated future that is now emerging" (Biro 2007, 7). When IBM withdrew the free ViaVoice SDK for Linux in 2002, the artist, unable to find an alternative to the IBM product, started to work with text recognition systems (in installations such as Hell (2013), "there are text strings going back and forth between the program objects, and they are analyzing the text and formulating a response, and then they turn the text into speech" (Shanken 2014, 123)). Because this technological change eliminated the accidental wordplay that arose from the peculiarities of speech recognition, the wordplay in the newer installations is designed by the artist: "If misunderstanding/humor occurs, it's intentional and predefined" (Shanken 2014, 124). The theme of dysfunction in the communication process appears in many of Feingold's works; in the case of Hell (2013), 6 misunderstanding is the main focus of the work. Feelings and emotions become the subject of a lively conversation, a "shouting" match between lovers, conducted by two visually identical heads that nonetheless speak with two different voices (male and female).
Sound File 5. "Hell." http://impactum-journals.uc.pt/matlit/article/view/5860/4710

X: "At time I feel, I want to disturb you."
Y: "I love you."
X: "How can it be."
Y: "Please, don't stop."
X: "You are sure?"
Y: "You say so."
X: "I have been so."
Y: "Yes, yes, yes."
Y: "So you love to make me feel so irritated."
Y: "So, I love you so."

Hell hinges on a paradox that Feingold mentions in a conversation with Edward Shanken: the figures stubbornly and continuously attempt to establish a deeper emotional connection; however, because they cannot go through actual affective states and can only talk about them (obsessively following the same trains of thought and using a limited repertoire of clichés), they never reach the stage where they can engage in a real loving discourse (Shanken 2014, 113). It must be noted, however, that the artist bestowed upon the "arguing lovers" perceptible personalities of a kind, which manifest themselves not only through the choice of words but also through the overuse of certain expressions and phrasing (Wójtowicz 2014, 104). In Hell it is also the speaking rate, loudness, voice quality, and the effects of pausing that influence the impressions of individual personality that listeners might form about the two hairless heads. 7 The lovers from Hell speak with AT&T Natural Voices (Crystal's and Mark's voices were selected for the installation). As the artist explained, the illusion of psychological affect that he achieved has more to do with the cadences he established in programming while taking into account the peculiarities of the speech technology developed by AT&T Laboratories; no custom effects were used to manipulate the vocal performance of the two talking heads (Feingold 2017, [email]).
While there is no way to manipulate voice qualities in the ViaVoice SDK directly, my code for interfacing with the synthesizer has several layers, including one that passes the generated sound output by the synthesizer through the native Linux audio mixer and uses its capabilities. There are ways to code emphasis directly in some other synthesizers, but not in this one. The other thing to mention is that AT&T seems to make some changes between versions to the synthesizer, so each version has its own peculiarities. (Feingold 2017, [email])

The issue of individuality inscribed within the synthesized voices has been raised recently by John Cayley: "When it comes to language in aurality -as human or humanoid voice -the situation becomes more complex because we cannot (yet) conceive of the voice that is not marked by human individuality. This implies that whatever a voice inscribes is, minimally, within the diegetic scope of this individuality. If an apparent individual is subject to symbolic process for the production of their language, would this not break their individuality and require a change of voice (perhaps expressed as distinct intonation or accent - think also of acting, drama, and the complexities that this field of aesthetic practice would further introduce)? These are questions (…) that become crucial since the advent of distributed entities that speak and listen -such as Siri, Cortana, Google Now, Watson, and Alexa, all of which are literally embodied as transactive synthetic language. The language of these entities is (…) a synthesis of conventional linguistic image and language-generative symbolic process. Not only are the broken diegeses of this language disguised by their inevitable coincidence with differences that constitute language as such, synthetic language in aurality must also be wrapped within one of the definitive indications of human embodiment, an individual voice. This renders the implicated symbolic processes compelling in so far as they acquire a compelling relationship with embodied humanity. The resultant voices are not, by the way, necessarily 'uncanny' (disturbingly human-seeming non-human). They are something more troubling than that." (Cayley 2017b, note 6 on page 6)
To some extent, the aesthetic experience of the Hell installation is reminiscent of what the spectators of Bill Vorn's "hysterical machines" performances experience. 8 The audiovisual show of dysfunctional and deviant behaviors staged by the machines of the Canadian artist, like the nervous verbal performance of Feingold's neurotic chatterbots with their incomplete body prosthetics, expresses the paradoxical nature of artificial life. As Maciej Ożóg notes:

The bizarre personality of the heads, which manifests in characteristics of their speech, contrasts with the common idea about such artificial creatures. (...) A normal conversation with a computer should be predictable and logical, therefore comprehensible, aimed at a clearly defined goal and effective, whereas the artificially intelligent partner should be friendly, polite, helpful and user-oriented. Such notion of a talking digital identity is a direct result of the tradition of "strong AI" that defines intelligence in the categories of logical problem solving and rational symbolic representation. Feingold's chatterbots, not fulfilling this pattern, or so to say, overtly contesting it, on the one hand raise the question about the meaning of irrational, unconscious and pre-symbolic forms of knowledge and communication and the role they play in establishing both inter-human and human-machine relationship. On the other hand, they refer to the influence of the reductive vision of intelligence on the ways of perceiving technology as well as on the understanding of human nature. (Ożóg 2009, 4)

III. AMY AND KLARA. MARK BÖHLEN'S ROBOTS WITH BAD ACCENTS
Feingold's animatronic heads, which act like "caricatures of the fully functional chatterbots," are a means of criticizing the general assumptions about AI and deconstructing the myths that have arisen around it and are perpetuated by popular culture texts (Ożóg 2009, 5). Prejudices in humanoid robot design are also the subject of the practice-based research initiative conducted by Mark Böhlen from RTS Research. One of the more curious examples of Böhlen's speculative robotics is Amy and Klara (2006), 9 two chatterbots that draw topics for their conversations from an internet tabloid website, Salon.com. The Amy and Klara installation was created as part of a bigger research and artistic endeavor called the Make Language Project, 10 whose aim is to think critically about the technology of speech synthesis and its perception in a wider socio-cultural context. The technological basis for the programmed conversation between Amy and Klara consists of the following systems: text analysis (selected agent applications search the Salon.com website, providing Amy and Klara with vocabulary and topics for the argument), speech synthesis (the speech synthesis engine SVOX allows the robots to speak to each other), and speech recognition (Amy and Klara hear each other and react to the uttered sentences thanks to the capabilities of the speech recognition engine FONIX). 11 A machine vision module is also used in the installation (the robots grow quieter whenever people appear in their immediate vicinity, and they manifest verbal aggression when they see the color pink). The performance of Amy and Klara is one of conflict, manifesting itself through profane language (normally filtered from ASR products) and the expletives the robots aim at each other.
The conflict stems from misunderstandings, which partially arise from the peculiarities of the speech recognition system (as in Feingold's If/Then) but are primarily predetermined by the software written by Böhlen:

The results from the speech recognizer as well as the physical transmission of utterances from speaker to microphone are error prone. Even the best speech recognizers offer often spotty recognition (...) Hence miscommunication is unavoidable. If several misunderstandings occur in any given time frame, aggression, for which the agents have a programmatic disposition, increases and foul language comes to play. (Böhlen 2006, 11)

Additionally, one of the robots, Klara, speaks with a simulated thick German accent ("this accent is generated in real time by swapping select vowels and consonants between the VOX language models for German and English and applying several ad hoc SAMBA alphabet based phonetic remappings for special cases," explains Böhlen), which greatly increases the probability of a communicational fiasco (Böhlen 2006, 12) and at the same time gives a kind of individuality to this bodiless artificial entity.
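The escalation rule that Böhlen describes, in which several misunderstandings within a given time frame raise the agents' aggression, can be illustrated with a minimal Python sketch. It is not Böhlen's code: the class name, the length of the time window, the threshold for "several," and the tone labels are all hypothetical values chosen for illustration only.

```python
from collections import deque


class AggressionModel:
    """Toy model: misunderstandings clustered in time raise aggression."""

    def __init__(self, window_seconds=30.0, threshold=3):
        # Both parameters are illustrative assumptions, not values from Böhlen.
        self.window_seconds = window_seconds
        self.threshold = threshold
        self.recent = deque()  # timestamps of recent misunderstandings
        self.aggression = 0    # grows while misunderstandings keep clustering

    def register_misunderstanding(self, timestamp):
        """Record a failed recognition at `timestamp` (in seconds)."""
        self.recent.append(timestamp)
        # Forget misunderstandings that have fallen out of the time frame.
        while self.recent and timestamp - self.recent[0] > self.window_seconds:
            self.recent.popleft()
        # "Several" misunderstandings inside the window raise aggression.
        if len(self.recent) >= self.threshold:
            self.aggression += 1
        return self.aggression

    def tone(self):
        """Map the aggression level onto a hypothetical speech register."""
        if self.aggression == 0:
            return "polite"
        if self.aggression < 3:
            return "irritated"
        return "foul"


# Four misunderstandings in quick succession push the agent past "polite".
model = AggressionModel(window_seconds=30.0, threshold=3)
for t in (0.0, 5.0, 12.0, 14.0):
    model.register_misunderstanding(t)
print(model.aggression, model.tone())
```

In this sketch, widely spaced misunderstandings never accumulate, so the agent stays polite; only a cluster of recognition failures tips the exchange toward foul language, which is the dynamic the quoted passage describes.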
Sound File 6. "Amy and Klara." http://impactum-journals.uc.pt/matlit/article/view/5860/4710

K: "Leave me alone."
A: "What is wrong with you?"
K: "Leave me alone, please."
A: "Weirdo."
K: "Aha."
A: "You are such a dork."

The performance of Amy and Klara, quarrelsome communication agents embodied in pink boxes (while designing the robots Böhlen purposefully rejected physical anthropomorphism), provocatively crosses the boundary of permissible AI behaviors and thereby criticizes the prevailing conventions of the normative paradigm of HCI design. "Benevolence and politeness are problematic machine design guidelines," Böhlen explains; "by normalizing android culture, one loses opportunities for interaction forms that are uncomfortable and problematic but, potentially, rich and complex" (Böhlen 2008, 2). Many works in the field of new media art have initiated discussion of the philosophical assumptions of general artificial intelligence, and some of them could be discussed here; nevertheless, Böhlen's project is especially noteworthy because it focuses not on the identity/subjectivity of intelligent agents themselves but on important questions about digital language and synthetic speech technology as a cultural phenomenon.
Synthetic speech research (...) is another victim of the division of labor, as it were, that has established itself between the engineering sciences and the humanities and arts. This would be just another instance of (...) often lamented disciplinary specialization if we did not have to repeatedly listen to the consequences on telephones and hear them in automobile navigation systems. How different might voice enabled machines sound and behave if they were informed by Wittgenstein's insight into meaning of words arising only from their use, or Rose's elaborately choreographed word games (...). Imagine if they knew about Blonk's powerful vocal tract, tongue and cheek skills that create sounds so odd they seem un-human and at times machinic, or the novelist Albahari, who surmised in recent work the minimum number of words one actually needs to function. (Böhlen 2008, 7)

Citing experiments on simulations of language evolution in embodied robots conducted by computational linguistics researchers, Böhlen also predicts:

Languages that are human in origin will be altered and amended by their use in machines in similar ways as popular culture alters and adds to the corpora of English language. There will be new figures of speech. (Böhlen 2008, 7)
Voice-transactive artificial intelligence robots, such as Alexa, are the most intriguing of those figures.

IV. CONCLUSIONS
The issues raised in the projects discussed in this paper are often similar to those that John Cayley has taken as the main focus of his artistic practice: synthetic language; the robot imaginary; the reconfiguration of robotics; identity; the voice and individuality; artificial intelligence and identified artificial intelligence; digital subjectivity (Cayley 2017a, 2017b). Aurature, proclaimed by Cayley, along with the assumptions at its core, follows the pattern of the so-called "sonic turn" 12 in contemporary art, which shows appreciation for sound while at the same time investigating its historical and cultural place as one of secondary importance to visual media. John Barber's practice-based research and artistic projects, which probe the potential of digital sound as a base for e-literary experience (Barber 2014), 13 also address this "sonic turn." Considering the impact that transactive synthetic language can have on artistic practices as well as on human culture, we should invest more energy in the study of the technologies that allow current cloud-based distributed entities, such as Siri, Cortana, Google Now, Watson, and Alexa, to listen to us and to themselves and to speak. Just as platform studies promote the investigation of the computing systems, both hardware and software, that shape and support creative works of electronic literature, we should pay special attention to technologies of text-to-speech recognition, voice recognition and speech synthesis.

12 In Listening Awry Jim Drobnick wrote: "Although an aural equivalent to "visual studies" has yet to become firmly established in the academy, there is a nevertheless a distinct and vibrant "sonic turn" that can be discerned in the recent upsurge in sound-based scholarship and artistic work."