After recent posts about AI image generation, I’ve been mulling over whether there are any interesting philosophical lessons. Despite the title, this is not a post about AI and the twilight of capitalism. Instead, it’s about the would-be transparency of images.1
Kendall Walton famously argues that one literally sees someone when one looks at a photograph of them, but one only fictionally sees someone when one sees an artist’s rendering of them. For example, I see Abraham Lincoln in a photograph of him. But I do not see Abraham Lincoln when I look at a painted portrait of Lincoln. I might say “I see Lincoln” as I turn a corner in a gallery and the portrait comes into view, but it’s a figure of speech— not literally true.2
Here’s how Walton puts the point:
[P]art of what it is to see something is to have visual experiences which are caused by it in a purely mechanical manner. Objects cause their photographs and the visual experiences of viewers mechanically; so we see the objects through the photographs. By contrast, objects cause paintings not mechanically but in a more “human” way, a way involving the artist; so we don’t see through paintings.3
What should we say about an AI-generated image? Consider, for example, this image, which was generated in response to the prompt “Karl Marx with funny hat”.
This seems to pose a dilemma for Walton’s position. Either this image presents Marx mechanically or it presents Marx artistically.
If it is a mechanical presentation, then (on Walton’s view) it offers us the opportunity to literally see Marx. He never literally wore that funny hat, of course, so we are not literally seeing Karl Marx in a funny hat. But it’s like what happens with a defaced photograph. If I scribble a funny hat on a photograph of Lincoln, I can still see Lincoln in it.
If it is an artistic presentation, then it seems like we must recognize a degree of creative agency in the operation of the AI. Walton refers to what happens with painting as representation “in a more ‘human’ way.”
One might insist that both horns of the dilemma are unacceptably absurd and so that this is a reductio of Walton’s view. I’m not sure I want to go that far.
I do want to deny that Midjourney (the program that generated this image) is doing work in a human way. One difference is that the program generates a lot of output. The typical format is to generate four images for initial prompts, and users pick from those and iterate— running the algorithm with their preferred image as part of the prompt, having the program produce higher resolution versions, and so on.4
One might think this blunts the second horn of the dilemma, because human selection provides the creative agency. Maybe, but I don’t think that helps Walton’s view. The image of Marx was selected from the first handful of options in a way that’s typical for photography. One secret of professional photographers has always been that they take a lot of pictures. The famous, poignant photos are typically selected from a lot of shots that aren’t so great. The photographers were in the right place at the right time, of course, but that’s because they were in lots of places at lots of times. If curation and selection were sufficient for the human element, then Walton’s view would fail to include much of photography.
Take a step back and consider a different objection to Walton’s view.
Jonathan Cohen and Aaron Meskin object to Walton’s account by pointing out a disanalogy between seeing in a photograph and literal seeing: If I were to see Lincoln across a crowded room, I would be able to situate him in my egocentric space as over there now. When I see a photo of Lincoln, I can’t do so— he’s just someplace at some time in the past.5
Note that their objection, even if successful, allows Walton’s distinction between the photograph (as mechanical) and the painting (as artist-mediated). Even if neither is literal seeing, the former is more like seeing than the latter.
However, Cohen and Meskin remind us that a crucial element of seeing is the information conveyed in the image. With a photograph, the connection to Marx is secured by a causal series: light bounces off Marx, light strikes a plate causing chemical changes, the plate is developed in a lab, and so on. The image below is from a digital scan of a print of a photoreproduction of the original photograph (or something like that). It’s an image of Marx both because of the mechanical process and because of human agency at each stage deciding to preserve the image of Marx. That holds even at the last stage: I cropped it, and I chose to keep his face in frame rather than just zoom in on his jacket.
The image by Midjourney has none of that intentionality. The algorithm has a complex set of weights and dispositions, so that it generates something that looks like Marx in response to a prompt that includes the phrase “Karl Marx”. Humans interact with images of Marx in a broader context. We know that it is a name, that he is a person, and so on.6
Intentionality, which photographs often have but Midjourney lacks, is one of the human, artist-involving features of painting. Someone who paints a portrait of Karl Marx intends for it to depict Marx. It is about Marx.
So one might evade the dilemma in this way: A photograph is of Marx. A painted portrait is about Marx. An AI-generated image merely resembles Marx— it evokes him without invoking him.7
1. This was nearly a post about Neil Gaiman. I picked Karl Marx, though, because an image of Marx was created by a user at Midjourney. I’ve used up my allowance of free samples there.
2. “Transparent Pictures: On the Nature of Photographic Realism”, Critical Inquiry, Dec 1984, 11(2): 246-277.
3. p. 261
4. I decided not to use an image of Neil Gaiman as my example because the creator describes it as a “collaboration” with Midjourney.
5. “On the Epistemic Value of Photographs”, Journal of Aesthetics and Art Criticism, Spr 2004, 62(2): 197-210.
6. Of course, I found the photo with a quick image search. So it was also called up by the phrase “Karl Marx” fed into a pattern-matching machine. Let’s suppose, for the sake of argument, that it is a scan of a photo of Marx rather than (say) of a prize-winning Marx impersonator, a barn facade, or a cleverly disguised mule.
7. I’m not sure I’m happy with this move, but one has to stop blogging somewhere.
This is really great. I’ve really enjoyed the recent posts about AI art because I’ve also been thinking a lot about what is philosophically interesting about them. It does seem like one of the big questions is how to think about intentions. I like the evoking and invoking distinction as well. I’m wondering though if the invokers could say that Marx is invoked by the intentions of the real/true artist: the prompter. The meaning of the work was radically underdetermined (it could be about anything) until the prompter constrained the set of possible meanings. Maybe this is what painters do when they add paint to the canvas. They constrict the set of possible meanings for a future work according to their purposes. Then, the final image is selected from all of the images the AI generates on account of its ability to satisfy the prompter’s purposes. All of this amounts to the causal role of an intentional agent. In this way, the prompter is an artist the way that a fashion photographer or director might be. The photographer prompts the model to show them ‘fierce’ or ‘smize’ and then selects the image from a set on account of it fulfilling the prompt. I’m not sure what would follow about the artform of AI prompting on this picture though.
Another way in which Midjourney is less “mechanical”: labels. In order to produce its Marx image (or any other), it’s using image labels provided by humans. (Or, even if some of those labels are algorithmically generated, those algorithms in turn relied on human labels – it’s not turtles all the way down.) I wouldn’t necessarily call this collaborative filtering process “art” (as you suggest, that comes in the curation process), but nor is it “mechanical” enough for Walton.
Evan: This seems like a matter of degree. As more iterations happen and human choices further guide the outcome, I am more comfortable calling the user’s contribution artistic.
Dan: That seems right. Insofar as there is “aboutness”, it’s provided at the edges by human language and intention.