Back in September, I wrote a post about generative AI and photographic transparency. The gist of it was this: Kendall Walton famously argued that I actually see Karl Marx when I look at a photograph of him, in a way I don’t when I look at a painting. The painting is mediated by the beliefs of the painter in a way that the photograph is not mediated by the photographer’s beliefs. So, I asked, what about an AI-generated image of Marx?
As I said in a footnote to that post, I wasn’t very happy with my answer to the question. As it happens, my Philosophy of Art class got interested in photographic transparency all on their own. So I made a mid-semester adjustment, added it to the syllabus, reread the Walton essay, and taught it to students in October. It turns out there was a part of the essay that I had forgotten when I wrote my post in September, and Walton gives us the resources for a better answer to the puzzle of AI-generated images.
Near the end of the essay, Walton considers a machine that detects incoming light and prints out a text description of the scene.1 He thinks it is obvious that reading the output of such a device would not count as seeing the scene. Because the text descriptions would not be mediated by someone else’s beliefs (the way a painting is), there must be some further feature required for transparency: a feature that photographs have but which machine-generated text descriptions lack.
The feature (Walton suggests) is that the kinds of mistakes we are inclined to make with photographs are like those that we make with ordinary direct seeing, but that text descriptions lend themselves to entirely different mistakes. He writes:
A house is easily confused with a horse or a hearse, when our information comes from a verbal description, as is a cat with a cot, a madam with a madman, intellectuality with ineffectuality, and so on. When we confront things directly or via pictures, houses are more apt to be confused with barns or woodsheds, cats with puppies, and so forth.2
Now, a generative AI is roughly the reverse of the machine that Walton imagined: given a text prompt, it generates a digital image. Having text at one end makes it prone to the sorts of errors that Walton highlights. It trips up on verbal ambiguities, depicting fruit when prompted for an orange color scheme, and it goes astray when the prompt is misspelled. To take a specific generative AI: Midjourney reliably generates pictures that look like Karl Marx when prompted for Karl Marx, but a prompt for Karl Marks yields a street scene, a jar of ointment, a bird perched on a doorknob, and this guy (below).