Parrot progress

Emily Bender famously coined the phrase "stochastic parrot" to describe text-only chatbots. The trend towards parrots continues: Ars Technica has a clip of a ChatGPT-4o test run where the bot, which has a canned voice it is supposed to use, replies in the user's own voice.

OpenAI promises that the actual release version totally doesn’t do this.1

In a forthcoming paper, I argue that a big problem with text chatbots— what makes them engines of bullshit— is that they have no model of the world apart from their model of discourse. So they string together words based on words, with no further architecture connecting the structure of words to the structure of things.

ChatGPT-4o is multi-modal, meaning that it works in sounds and pictures as well as text. Instead of providing separate constraints on one another, however, those modalities are just more data that the big network is trained on. It has no model of anything apart from its monolithic model of everything. Just as a text chatbot riffs on topics with an indifference to truth, a multi-modal chatbot can make sounds with an indifference to the things it sounds like.

  1. OpenAI's solution is to run an additional algorithm on the output. If that separate check detects output that doesn't use the canned voice they want it to use, the user never hears it.
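A minimal sketch of that kind of output gate, purely as an illustration and not OpenAI's actual pipeline. The helper names generate_reply and matches_canned_voice are my own placeholders for the model and the separate voice check:

```python
# Hypothetical sketch of an output-side voice check. Not OpenAI's real code:
# both helpers are stand-ins for the model and the separate classifier.

def generate_reply(prompt: str) -> bytes:
    """Placeholder for the multi-modal model producing an audio reply."""
    raise NotImplementedError

def matches_canned_voice(audio: bytes) -> bool:
    """Placeholder for a separate check on whether the reply uses the canned voice."""
    raise NotImplementedError

def guarded_reply(prompt: str) -> bytes | None:
    """Pass the reply along only if the separate check approves the voice."""
    audio = generate_reply(prompt)
    if matches_canned_voice(audio):
        return audio
    # Otherwise the user never hears it.
    return None
```

The point of the sketch is that the check sits outside the model: the network itself isn't constrained to use the canned voice, its output is simply filtered after the fact.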
