Earlier this week, my Twitter feed blew up with comments on a controversial new paper in Nature Communications titled “Tracking historical changes in trustworthiness using machine learning analyses of facial cues in paintings” by Nicolas Baumard and colleagues. In a sentence, the paper purports to have used a machine-learning algorithm based on human preferences to conclude that faces in European paintings from the 16th to the 20th century gradually increase in perceived trustworthiness and that this effect lags behind an increase in GDP per capita in Europe.
Here’s the paper announcement, as put out by the lead author:
Reading the tweet alone, it’s easy to see why this might be controversial. The announcement was met with scathing criticism, including comparisons to the racist and pseudoscientific phrenology of the 19th century:
I decided to take a closer look at this paper because 1) I suspected most people quote-tweeting criticism hadn’t actually read the paper and 2) it’s fashionable to pile on controversial content without meaningfully adding to the conversation. So here’s me adding to the conversation in a nuanced way. Let’s dive into what the paper is, what it isn’t, and what we can learn from it all.
What this paper IS
The premise of this paper is actually quite interesting. Social trust is a good thing, and cultivating it should be a goal of any society. To study how we can do that, it might be helpful to learn how social trust has emerged throughout history and how this has affected society. We obviously can’t go back and survey the 16th-century population. Instead, all we have are artifacts from the time: books, art, etc. Art (portraits in particular) reflects the values of a society at a given time and thus provides a useful portal into the collective psychology of past societies. To quantitatively study 16th-20th century portraiture, the authors of this paper trained a machine-learning algorithm to evaluate the trustworthiness of a face based on human preferences:
The authors then applied this algorithm to a database of European portraits from the 16th to the 20th century and found that perceived trustworthiness increased over time. This increase parallels the observed decline of interpersonal violence and the rise of western democratic values in Europe over the same period. Moreover, the authors found that one predictor of the increase in perceived trustworthiness is a rise in GDP per capita two decades earlier. Thus, the authors point to a rise in living standards as a reason for the increase in social trust and the decline in violence, and to the importance of economic prosperity for our psychological well-being.
What this paper ISN’T
If all of that sounds like a stretch to you, it’s because there are plenty of completely valid criticisms of this work. For one, how can one possibly conclude that portrait paintings (which were largely of the wealthy and elite) reflect the values of an entire society? Moreover, how can we know whether perceived trustworthiness in portraits indicates an actual increase in social trust? We can also criticize the narrow and biased nature of the database of portraits used. Others have commented on how this paper seems to completely disregard the effects of art styles and other aspects of art history, which could have informed the research if the authors had consulted a historian.
But what this paper ISN’T is racist phrenology. Much of the criticism I’ve seen seems to falsely assume that the authors trained an algorithm to detect trustworthiness using English portraiture of mostly white people. This is not true. The authors developed the algorithm based on flawed and biased human preferences and applied it to historical portraits to predict how people might have perceived the trustworthiness of the faces in the paintings. The paper makes no claim to predict actual trustworthiness from facial features. It is strictly about “perceived trustworthiness,” not “actual trustworthiness.”
What we can learn
The problem with this paper is entirely in its framing and writing. It is well known that one cannot judge a person’s actual character from facial features alone. The authors acknowledge this fact in passing, yet fail to make it explicit that they are using flawed human tendencies to study social trust. The title of this paper doesn’t do the authors any favors. “Tracking… trustworthiness… using machine learning analyses… [of] paintings” makes it sound like they are using white, European portraits as a basis for determining trustworthiness. With this title, it is understandable why many people have been quick to call this paper racist.
It turns out that the framing of a scientific experiment is almost as important as the experiment itself when it comes to communicating it. It’s perhaps not fashionable in science to acknowledge the importance of a humanities lens, but it was absolutely necessary for this article. It shouldn’t take a close read by someone who is used to reading academic science to realize that this paper is about perceptions, not actual characteristics. When scientists are toeing the line with dangerous ideas, they should be thinking about how the public is going to read the article. After all, the public pays a lot of money for this research.
For the record, I don’t find the science in this paper very convincing or interesting. It looks to me like a spurious correlation of historical data based on 21st-century preferences. The discourse around this paper should be about the validity of the data, not the ideology behind it. That it isn’t is largely the fault of the authors’ framing and the echo chamber of social media.
Instead of the usual interesting links, I encourage all of you to read this paper (it’s short and fairly readable) and draw your own conclusions. I’m open to changing my mind, so if you’ve got a different opinion, let me know.
Thanks for reading.