Health Warning: See text for warning about significance
The Implicit Association Test (IAT) is telling virtually everyone that they are unconsciously racist, sexist and any other -ist you care to mention – however much they may protest otherwise. Is it science, or is it just another stick to beat us with? Read on.
- 1. The Implicit Association Test (IAT)
- 2. Claims for the IAT
- 3. Morality: Fast and Slow – A Little Neuroscience
- 4. The Evolution and Interaction of the Two Morality Modes
- 5. Criticisms of the IAT
- 5.1 What does the IAT Measure?
- 5.2 Reproducibility
- 5.3 Hating Yourself
- 6. Does the IAT Predict Behaviour?
- 7. Sexism and the IAT
You may recall that I have alluded to unconscious bias in a couple of previous posts (this one and this). I said we would be hearing more of it, and indeed we have. On 17/5/17, Radio 4’s “All In The Mind” programme was on the subject of Unconscious Bias and so was the same station’s “Analysis” programme on 5/6/17. The latter focussed on the Implicit Association Test, which is the subject of this post. The Analysis programme will not remain available on iPlayer for long but a transcript of the major points can be found here.
If you want to ‘prove’ that all men are misogynists and all white people are racists, then the Implicit Association Test (IAT) is like a gift from the Gods. Or perhaps I should say it’s like the Delphic oracle – damned treacherous and never what it seems.
Ostensibly, the IAT can be used to test ‘implicit’ or ‘unconscious’ bias against any chosen “out-group”. It works like this – using the race IAT as an example. You’re shown words and faces. The words may be positive ones (“terrific”, “friendship”, “joyous”, “celebrate”) or negative (“pain”, “despise”, “dirty”, “disaster”). In one part of the process you have to press a key whenever you see either a black face or a bad word, and press another key when you see either a white face or a good word. Then it switches round: one key for a black face and good words, another for white faces and bad words. Sounds easy? The snag is you’ve got to hit the appropriate key as fast as possible. The computer measures your speed.
Your implicit bias against (say) blacks is revealed by taking longer on average when positive words are accompanied by a black person’s face, as compared with when positive words are associated with a white person’s face (and being quicker to associate negative words with a black face).
What sort of time difference are we talking about? Typically about 0.2 of a second.
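To make the arithmetic concrete, here is a minimal sketch of how such a time difference could be computed. The response times below are invented purely for illustration, and the function name `latency_gap` is my own; the scoring actually used on the Project Implicit site is more elaborate than a simple difference of means, so treat this as a toy rather than the real algorithm.

```python
from statistics import mean


def latency_gap(easy_pairing_ms, hard_pairing_ms):
    """Mean response time for one pairing minus the other, in milliseconds.

    A positive value means the second pairing was, on average, slower.
    """
    return mean(hard_pairing_ms) - mean(easy_pairing_ms)


# Hypothetical response times (ms) for one test-taker, invented for illustration.
white_good_black_bad = [650, 700, 720, 680, 750]  # first key arrangement
black_good_white_bad = [850, 900, 940, 880, 930]  # switched key arrangement

gap_ms = latency_gap(white_good_black_bad, black_good_white_bad)
print(f"Mean latency gap: {gap_ms:.0f} ms ({gap_ms / 1000:.1f} s)")
```

With these made-up numbers the gap comes out at 200 ms, i.e. the 0.2 of a second quoted above. The whole inference of ‘bias’ rests on a difference of this size.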
Note that whilst I shall refer to the use of the IAT as a measure of racial bias, this is not because I wish to extend the remit of this blog to the race issue. It is merely because the IAT has been used so extensively in this context, especially in the USA. My interest is in the reliability of the test to measure what it purports to measure – bias against the out-group, whatever the out-group.
The furore surrounding the IAT arises because it claims to show that racism, or sexism, or whatever, is far more widespread than people think. Take someone who insists he is entirely unbiased, as egalitarian or as progressive as you might wish: such people are, nevertheless, exposed as closet racists, or misogynists, by the test. Matin Durrani, editor of Physics World, had this disconcerting experience. The March 2016 edition of Physics World, the monthly journal of the UK Institute of Physics, was given over entirely to diversity in physics. (Initiatives by the Institute of Physics and the Institution of Mechanical Engineers to enhance the profile of women in physics and engineering are common, as I reviewed here). The IAT confidently informed Matin that he had a “strong automatic preference for white people compared with black people”. He was mortified, of course. (Matin’s father is Pakistani, his mother is German, and he was raised in Birmingham).
Poor Matin fared no better on the gender test. This time the IAT asked him to link male and female words with arts and science related words. Shock horror! He was informed that he had “a strong association of male with science and female with liberal arts compared with vice-versa”. The misogynist swine! [I’m tempted to add that if he had been driven strictly by data on student numbers he should, of course, have associated women with both arts and sciences].
There are other psychology tests which purport to measure implicit bias, but the IAT has become by far the dominant tool used by psychologists. It was developed by Anthony Greenwald (University of Washington) and Mahzarin Banaji (Harvard).
You can take the test yourself on the Project Implicit website. Apparently some 18 million people have now taken the test on-line.
The IAT has become difficult to avoid. Unconscious bias posters are appearing in universities, and you may recall Hillary Clinton claiming implicit anti-black bias by the police in her election campaigning. She claimed there was a need to retrain the police because of this. The IAT is having “an enormous impact on public discourse”, to quote Radio 4’s Analysis programme.
Any tool which claims to weed out the racists and misogynists amongst us, and can also be deployed as part of a diversity training programme and sold to companies for solid cash, is going to catch on fast. It has. Diversity training is claimed to be an $8 billion/yr industry in the USA, and the UK is following suit.
A spokesman for KPMG at Canary Wharf told the Analysis programme that all staff have taken implicit bias training. He said, “implicit bias can get in the way of us being a truly inclusive and diverse company. Encouraging diversity makes good business sense – it improves the company’s bottom line”. (You’ll have seen the data. No? Me neither).
It was inevitable that I would take a closer look at the IAT eventually. I have. After reiterating the claims being made for the IAT, I indulge in a little neuroscience to explain what’s actually going on in the test. Then I focus my artillery on the veracity of the test and its purported interpretation.
The key feature of the IAT is that it measures reactions under severe time pressure. This is why it is referred to as a test of “implicit”, or automatic, bias. The subject does not have time for any measured thought. For this reason it is also referred to as a test of “unconscious” bias. The interpretational leap is then made that this automatic, or implicit, or unconscious bias – obtained essentially as a reflex action without time for thought – reveals your true nature. As one of the contributors to the Analysis programme put it, “what we see coming through is your genuine implicit attitudes”, i.e., you don’t have time to ‘cover it up’.
To quote from the introduction to the 2013 book Blindspot: Hidden Biases of Good People by the originators of the IAT, Banaji and Greenwald,
“The automatic White preference expressed on the Race IAT is now established as signaling discriminatory behavior. It predicts discriminatory behavior even among research participants who earnestly (and, we believe, honestly) espouse egalitarian beliefs. That last statement may sound like a self-contradiction, but it’s an empirical truth. Among research participants who describe themselves as racially egalitarian, the Race IAT has been shown, reliably and repeatedly, to predict discriminatory behavior that was observed in the research.”
We will see below that both those claims – that the IAT is “reliable and repeatable” and that it “predicts discriminatory behaviour” – are contentious and probably false, or at least dangerously exaggerated.
In an Appendix to the book, Banaji and Greenwald wrote,
“given the relatively small proportion of people who are overtly prejudiced and how clearly it is established that automatic race preference [as measured by the IAT] predicts discrimination, it is reasonable to conclude not only that implicit bias is a cause of Black disadvantage but also that it plausibly plays a greater role than does explicit bias in explaining the discrimination that contributes to Black disadvantage.”
Pause to consider how bold this claim is: that a reflex reaction occurring unconsciously in a small fraction of a second is responsible for the majority of actual racial disadvantage. That such a reflex reaction is generally over-ruled by subsequent measured thought (as evidenced by the admitted fact that a relatively small proportion of people are overtly prejudiced) seems not to count. It’s almost as if they were looking for a stick to beat us with.
Banaji and Greenwald have been lobbying for implicit bias to be taken into account in the criminal courts (extract taken from “Psychological Science Under Scrutiny: Recent Challenges and Proposed Solutions” edited by Scott O. Lilienfeld, Irwin D. Waldman). I can readily believe that blacks are treated more harshly in the criminal courts, just as men are, but I’m not convinced that the IAT is the right vehicle to use to address it.
Are these claims for the IAT valid, or is science being co-opted to provide a veneer of respectability for the approved sociopolitical narrative?
We are in the realms of morality. Were it not for racism and sexism being regarded as morally reprehensible we would have far less interest in whether a given individual displayed these characteristics. Whilst people may appeal to the pragmatic benefits of avoiding racism and sexism, it would be untrue, I believe, to assert that practical disbenefits are the reason for one’s pejorative opinion of racism and sexism. The claims that increasing diversity automatically leads to improved company performance, for example, would probably not arise if it were not for the received morality narrative. Indeed, in truth, the claim is an inducement to enact diversity which is actually motivated socio-politically.
I have strayed into moral philosophy once before in this blog, in discussing the intrinsically immoral nature of identity politics. Traditionally, discussions of morality have been taken to be philosophical, addressing questions such as whether there are moral absolutes, for example. But there is another approach to morality which is more tractable, and addresses different questions. In this approach one accepts that morality is an aspect of human behaviour, and hence that the relevant disciplines are psychology and neuroscience. There have been several excellent books published in the last few years which provide the lay person with a good grounding in how productive the psychological approach to studying morality can be, for example The Righteous Mind by Jonathan Haidt, Against Empathy by Paul Bloom, and Moral Tribes by Joshua Greene.
Studying the phenomenology of moral decision making leads rather quickly to the conclusion that moral opinions are not entirely rational. Greene, in Moral Tribes, makes extensive use of variants of a “trolley scenario” to dissect the structure of moral decisions. No doubt you are familiar with the initial version of the vignette…
You are on a footbridge over a railway line. Ahead you see five workmen on the line. Behind you, and bearing down upon the men, you see a runaway trolley. The workmen’s vision of the trolley is obscured and it is clear to you that they are going to be killed. Beside you on the bridge is a man wearing a large backpack. If you push him off the bridge and into the path of the runaway trolley you will save the lives of five men at the cost of one. Do you push him off? (You may assume that it is clear that you yourself would not be sufficient to stop the trolley, so a noble self-sacrifice would not work).
Most people would not push the man off the bridge.
However, change the scenario to this: there is no man on the bridge. Instead, there is a side-track off the main line, the points for which are operated by a lever next to you on the bridge. But there is one workman on the branch line (whose vision of the trolley is also obscured). If you throw the switch you can save five lives at the cost of killing the single man on the branch line.
Most people would indeed throw the switch.
And yet, rationally, those two situations are functionally identical. Both involve you taking action which will lead to the death of one man, deliberately to save five. So why the difference in moral judgment? It is not as if the death of the unfortunate sacrifice is any different, e.g., a fast painless death versus a slow agonising death. The mechanism of the death is the same in both cases: being run over by the trolley.
A computer could surely not distinguish between the cases, and yet people do – very clearly.
This illustrates the motivation behind the “dual process” hypothesis of moral cognition.
What follows is woefully over-simplified, but inevitably so if I am to avoid straying too far from my more narrow subject matter. The phenomenology of scenarios like the trolley problem suggests that there are two processes at work in moral decision making: a rapid process which is emotionally based and a slower process which is more rational. Indeed, one of the key functions of emotions is to act as cognitive shortcuts. It’s no good working out slowly what that lion might be about to do by debating rationally on the use to which those teeth might be put. Instead, the reaction by any species which has succeeded in evolution is ‘fear-so-run’, cutting out the fatally slow rational middle man.
In order to be fast, the ‘emotional’ moral process has to be of particularly simple cognitive form. To explain certain variants of the trolley scenario, Greene hypothesises that the fast emotional process can address only a single linear causal sequence. Any event which happens on a logical branch in the causal sequence is ignored by the fast moral process (so Greene hypothesises). The necessary cognitive simplicity of the rapid emotional moral response is something I’ll come back to in the context of the IAT.
I close this section by noting that the dual process hypothesis gains credibility when neurological correlates are also considered. Again this is really woefully over-simplified, but here goes…
The brain is not so modular in its operation that one can identify one region which carries out rational deduction and another which causes emotional responses. However, there are indeed areas of the brain which respond more to emotional issues, and others which respond more to rational calculation. This can be established by testing subjects using fMRI (functional magnetic resonance imaging). The correspondence between brain region and cognitive process can be established by getting the subject to perform a calculation or view pictures which induce an emotional response whilst in a scanner.
Emotional response is correlated with increased activity in the ventromedial prefrontal cortex (VMPFC), as well as the amygdala. In contrast, rational cognition is correlated with increased activity in the dorsolateral prefrontal cortex (DLPFC).
Subjects can then be scanned whilst being asked to make moral judgments of the ‘trolley problem’ type. So – does increased activity in these areas correspond as the dual process hypothesis would suggest, with emotionally driven decisions exciting the VMPFC and amygdala more, but cold rational decisions being related to DLPFC excitation? Yes, it does.
Some elementary considerations of the evolution of human social dynamics help rationalise the emergence of a dual process morality.
The social structure of humans is unique. No other species forms such huge societies of cooperative, but unrelated, individuals. (Termites, and Hymenoptera such as bees, wasps and ants, are not an exception as all the individuals in their hives or colonies are siblings). A key ingredient in facilitating cooperative behaviour in human societies is morality – a shared innate view of correct and incorrect behaviour.
Humans cannot survive alone, so cooperation within your own tribe is essential. On the other hand, cooperation between different tribes may either be desirable or not. Depending upon game theoretic considerations, it might be beneficial to exploit or aggress against another tribe. Alternatively, cooperation with the other tribe may be mutually beneficial, just as cooperation between individuals within your own tribe is generally beneficial.
Within-tribe recognition, and the associated positive moral bias, can therefore be implemented in the fast, emotionally based neural process. Cooperation within the “in group” is the default setting. But cooperation with a member of an out-group is more problematic. This requires rational cognition because contingent factors must be taken into account to decide how to treat a ‘foreigner’. There’s not much benefit in being nice to some bloke who’s about to stick his spear in you. On the other hand, there may well be mutual benefit in trading with said ‘foreigner’.
The fast, emotional moral process (VMPFC-amygdala) therefore flags a negative response as a means of referring the decision ‘up the management’ to the slower rational moral process (DLPFC). If the emotional response is sufficiently strong, the more measured process may never activate (“spear raised – fear – attack/run”). But if the VMPFC-amygdala response is mild, being triggered solely by the recognition of an out-group member rather than by an overt threat, the higher management is engaged to make rational decisions.
In short, the emotionally based fast morality is for the purpose of ensuring automatic cooperation between “me” and “us”, whereas the slow, rationally based, morality implements an option to cooperate (or not) between “us” and “them”.
All this is again ridiculously over-simplified, but I think it’s a first order approximation to what happens. And it does rationalise the existence of a dual process moral mechanism.
Is it valid, then, to castigate ourselves for negative fast-process VMPFC-amygdala responses to an out-group member occurring in ~0.2 seconds if they are subsequently over-ruled by rational-cognitive responses?
Greene in Moral Tribes gives an example of the interplay of the two processes, observed experimentally…
“Kevin Ochsner and colleagues showed people pictures that elicit strong negative emotions (e.g., women crying outside a church) and asked them to reinterpret the pictures in a more positive way, for example, by imagining that the crying women are overjoyed wedding guests rather than despondent mourners. Simply observing these negative pictures produced increased activity in our old emotional friends, the amygdala and the VMPFC. By contrast, the act of reappraising the pictures was associated with increased activity in the DLPFC. What’s more, the DLPFC’s reinterpretation efforts reduced the level of activity in both the amygdala and the VMPFC.
It appears that many of us engage in this kind of reappraisal spontaneously when we encounter racial out-groups. Wil Cunningham and colleagues presented white people with pictures of black people’s and white people’s faces. Sometimes the pictures were presented subliminally – that is, for only 30 milliseconds, too quickly to be consciously perceived. Other times the faces were presented for about half a second, allowing participants to consciously perceive the faces. When the faces were presented subliminally, the black faces, as compared with the white faces, produced more activity in the amygdala of the white viewers. What’s more, the effect was stronger in people who had more negative associations with black people as measured by an IAT.
All of the participants in this study reported being motivated to respond to these faces without prejudice, and their efforts are reflected in their brain scans. When the faces were on the screen long enough to be consciously perceived, activity in the DLPFC went up, and amygdala activity went down, just as in Ochsner’s emotional regulation experiment. Consistent with these results, a subsequent study showed that, for white people who don’t want to be racist, interacting with a black person imposes a kind of cognitive load.
Thus, we see dual-process brain design not just in moral judgment but in the choices we make about food, money, and the attitudes we’d like to change. For most of the things that we do, our brains have automatic settings that tell us how to proceed. But we can also use our manual mode to over-ride these automatic settings, provided that we are aware of the opportunity to do so and motivated to take it.”
In January 2017, Jesse Singal published two excellent reviews of the IAT in Science of Us. The first addresses the race issue, but is also an accessible introduction to the literature for the lay person: Psychology’s Favorite Tool for Measuring Racism Isn’t Up to the Job. The second addresses gender bias. Both are recommended reading.
A 2008 article by John Tierney is also worth reading. After outlining the raging academic dispute between the pro-IAT and anti-IAT camps, Tierney pointedly concludes, “If they can’t figure out how to get along with their own colleagues, how seriously should we take their advice for everyone else?”
One can reasonably question what the association between words and pictures in the IAT means. An association between black faces and negative words is universally interpreted as meaning the person in question thinks badly of black people. Well, for a start, the word “thinks” here is inapplicable. The IAT detects reflex reactions over timescales too short for “thought” – understood as a rational-cognitive process.
All we can say is that people associating black faces with negative words have a fast VMPFC-amygdala response which refers the issue upwards for further rational consideration. The IAT itself tells us nothing about the subsequent rational process. Is this enough to imply racism? I think not. Consider a society in which people are schooled assiduously to avoid anti-black racism. This is likely to lead to a learnt VMPFC-amygdala response in which the association of black faces and negative words would indeed flag the need for careful further consideration – exactly as is commonly observed. In other words, the apparently racist IAT response may sometimes be exactly the opposite.
This point can be explained in a different way. Recall that the essence of the VMPFC-amygdala response is its speed. Its speed is accomplished by being computationally (cognitively) very simple – perhaps so simple that its response is virtually a binary switch. If so, it is reasonable to suppose that all ‘negative’ associations would produce the same response. So, what if a group – say blacks – were associated socially with disadvantage, oppression, victimization, and discrimination? The VMPFC-amygdala response might well conflate these negatives with the negative words deployed in the IAT (“lazy”, “dirty”, “disaster”, etc). So, an IAT result which is being interpreted as indicating racial bias may actually be indicating a recognition of racial disadvantage – the very opposite. Uhlmann, Brescoll, and Levy Paluck have published a paper demonstrating this very effect experimentally.
In an article titled “I Don’t Actually Hate Myself: Why Harvard Is Wrong About Bias”, the openly gay journalist John Cloud was surprised to be told that, “Your data suggest a slight automatic preference for Straight People compared to Gay People”. He observed, “My results might mean I’m self-hating, although I’m not exactly sure what I could do to be gayer. Wear a tiara to work?”.
Cloud also quotes Maia Szalavitz as having noted that, “48% of African Americans who take the test also show a bias against themselves”. Are we to conclude that self-ism is as prevalent as other-ism?
But the paradox of ‘hating yourself’ disappears if you accept, as argued in Section 5.1, that the IAT can be a measure of perceived disadvantage rather than perceived inferiority.
One of the major criticisms of the IAT is its very poor reproducibility. A proposed measure of anything is obviously suspect if a second, third and fourth test produce substantially different results. With the IAT, they do tend to.
A measure of test reproducibility (or reliability) is the coefficient of multiple correlation, r. (For a single repeat test this is just the usual Pearson correlation coefficient). If r = 1 then the test result is precisely and consistently reproducible. If r = 0 then testing is merely producing random numbers. This site gives the following guidance on the degree of confidence one can have in a test for a given value of r:
- 0.9 and greater: excellent reliability
- Between 0.9 and 0.8: good reliability
- Between 0.8 and 0.7: acceptable reliability
- Between 0.7 and 0.6: questionable reliability
- Between 0.6 and 0.5: poor reliability
- Less than 0.5: unacceptable reliability
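For anyone who wants to see the mechanics, the sketch below computes the Pearson correlation between two sittings of a test and looks up the guidance band from the list above. The scores are hypothetical, invented for illustration, and the function names `pearson_r` and `reliability_band` are my own. I have assumed that a value lying exactly on a boundary belongs to the better band, since the list does not say.

```python
from math import sqrt


def pearson_r(first, second):
    """Pearson correlation between paired scores from two sittings."""
    n = len(first)
    mean_1 = sum(first) / n
    mean_2 = sum(second) / n
    cov = sum((a - mean_1) * (b - mean_2) for a, b in zip(first, second))
    var_1 = sum((a - mean_1) ** 2 for a in first)
    var_2 = sum((b - mean_2) ** 2 for b in second)
    return cov / sqrt(var_1 * var_2)


def reliability_band(r):
    """Map r onto the guidance bands listed above (boundary handling assumed)."""
    bands = [
        (0.9, "excellent"),
        (0.8, "good"),
        (0.7, "acceptable"),
        (0.6, "questionable"),
        (0.5, "poor"),
    ]
    for threshold, label in bands:
        if r >= threshold:
            return label
    return "unacceptable"


# Hypothetical scores for six people tested twice (illustration only).
first_sitting = [0.45, 0.10, 0.62, -0.05, 0.30, 0.55]
second_sitting = [0.20, 0.35, 0.40, 0.15, -0.10, 0.60]

r = pearson_r(first_sitting, second_sitting)
print(f"test-retest r = {r:.2f} -> {reliability_band(r)} reliability")
```

The point of the exercise is only that r is a property of repeated sittings across many people. A single sitting, which is all most test-takers ever get, tells you nothing about how stable the result is.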
Jesse Singal tells us,
“The IAT’s architects have reported that overall, when you lump together the IAT’s many different varieties, from race to disability to gender, it has a test-retest reliability of about r = .55. By the normal standards of psychology, this puts these IATs well below the threshold of being useful in most practical, real-world settings.”
But it gets worse. The value of r = 0.55 derives from the IAT’s proponents. Jesse Singal reviews estimates of r from other sources. Generally they are not better than r = 0.4 or thereabouts.
So, even setting aside what an IAT result might mean (if anything), the IAT does not meet the criterion to be a sound measurement of anything. Quoting Jesse Singal again,
“What all these (r) numbers mean is that there doesn’t appear to be any published evidence that the race IAT has test-retest reliability that is close to acceptable for real-world evaluation. If you take the test today, and then take it again tomorrow — or even in just a few hours — there’s a solid chance you’ll get a very different result. That’s extremely problematic given that in the wild, whether on Project Implicit or in diversity-training sessions, test-takers are administered the test once, given their results, and then told what those results say about them and their propensity to commit biased acts.”
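How big is a “solid chance”? A rough feel can be had from a toy simulation. The assumptions here are mine and are not taken from Singal or from the IAT literature: scores at the two sittings are taken to be standard normal with correlation r, and a person is labelled “biased” if his score is above the population average. The sketch counts how often the label flips between sittings, and compares that with the known closed-form result for this idealised case, arccos(r)/π.

```python
import random
from math import acos, pi, sqrt


def simulated_flip_rate(r, n=20000, seed=0):
    """Fraction of simulated people whose above/below-average label changes.

    Toy model (an assumption, not the real IAT scoring): both sittings are
    standard normal scores with correlation r, and the label is the sign.
    """
    rng = random.Random(seed)
    flips = 0
    for _ in range(n):
        first = rng.gauss(0, 1)
        noise = rng.gauss(0, 1)
        # Build a second score with correlation r to the first.
        second = r * first + sqrt(1 - r * r) * noise
        if (first > 0) != (second > 0):
            flips += 1
    return flips / n


def expected_flip_rate(r):
    """Closed-form flip probability for the same toy model."""
    return acos(r) / pi


for r in (0.9, 0.55, 0.4):
    print(f"r = {r}: simulated {simulated_flip_rate(r):.2f}, "
          f"theory {expected_flip_rate(r):.2f}")
```

Under these assumptions, roughly three people in ten would be given the opposite label on a second sitting when r = 0.55, and more than a third when r = 0.4. The real test reports graded categories rather than a simple above/below split, so the true figures will differ, but the direction of the argument is the same.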
So, finally to the main event: does an IAT tell us anything about a person’s behaviour? If not, then who cares about the IAT? If the IAT tells you that you have a bias against blacks or against women, but these claims are not borne out in actual behaviour, then the claims of Greenwald and Banaji for their test are invalid. Behaviour is, in truth, all that matters.
In psychology, a single piece of research, published as a single journal paper, tends to be unreliable. At least the 100 studies considered in this review proved to be so – with only between one third and one half of reported significant effects being reproducible. It is important, therefore, to amalgamate many individual studies in meta-analyses in order to extract any reliable features. After years of tussle between pro-IAT and anti-IAT academics, the outcome seems to have been decided: the original architects of the IAT, Greenwald and Banaji, effectively conceded in 2015. Jesse Singal writes,
“The psychometric issues with race and ethnicity IATs, Greenwald, Banaji, and Nosek wrote, ‘render them problematic to use to classify persons as likely to engage in discrimination.’ In that same paper, they noted that ‘attempts to diagnostically use such measures for individuals risk undesirably high rates of erroneous classifications.’ In other words: you can’t use the IAT to tell individuals how likely they are to commit acts of implicit bias. To Blanton, this is something of a smoking gun: ‘This concession undermines the entire premise of their webpage,’ he said. ‘Their webpage delivers psychological diagnoses that even they now admit are too filled with error to be meaningful.’”
Yet this zombie test is still being touted around, with ever increasing zeal. This is easy to understand: it ostensibly produces results in conformity with the approved narrative and provides succour (and income) to the diversity merchants.
There are even more question marks over the IAT applied to gender than applied to race. There is also a difference in the nature of such tests. In the race IAT, the associations in question relate to words which are indisputably negative or indisputably positive. In the gender IAT – at least as usually applied – the associations are with ‘career related’ words versus ‘family related’ words. There is nothing intrinsically positive or negative about either in this case.
But even leaving that aside, there are puzzling aspects of the IAT as applied to gender. As Greg Mitchell and Phil Tetlock put it in a book chapter* that is very critical of the IAT,
“One particularly puzzling aspect of academic and public dialogue about implicit prejudice research has been the dearth of attention paid to the finding that men usually do not exhibit implicit sexism while women do show pro-female implicit attitudes.”
*“Psychological Science Under Scrutiny: Recent Challenges and Proposed Solutions”, edited by Scott O. Lilienfeld and Irwin D. Waldman.
Mitchell and Tetlock opine that “these findings are contrary to the common finding on IATs of the historically advantaged group being favoured by members of both the advantaged and the disadvantaged groups”. But these authors’ puzzlement is simply due to their assumption as to which gender is advantaged. Had they permitted themselves to entertain the notion that women are the more advantaged sex, their difficulty would largely vanish. That this thought did not occur to them we may attribute to cultural bias – ironically.
Referring to Mitchell and Tetlock’s observation, Jesse Singal writes,
“This appears to be a pretty robust finding, and if you translate it into the same language IAT proponents speak elsewhere, it means men don’t have implicit sexism and are therefore unlikely to make decisions in an implicitly sexist manner (women, meanwhile, will likely favor women over men in implicitly-driven decision-making). Even weirder, when you switch to IATs geared at evaluating not whether the test-taker implicitly favors men over women (or vice versa), but whether they are quicker to associate men versus women more with career, family, and similarly gendered concepts, the IAT somewhat reliably evaluates women as having higher rates of implicit bias against women than men do.”
This quote refers to the IAT results which are the basis of the graphic which heads this post. In view of the preceding criticisms of the IAT, I haven’t a clue as to whether it is indicative of actual behaviour or not.