An algorithm used for years by the municipality of Rotterdam to predict welfare fraud considers young mothers and people with poor Dutch language skills to be high-risk groups. According to research by Lighthouse Reports, Wired, Argos, Vers Beton and Follow the Money – whose journalists were the first to obtain a complete fraud algorithm – these groups had the highest chance of being subjected to a strict investigation by the municipality.
- What’s the news?
For years, Rotterdam has used an algorithm to predict welfare fraud and administrative mistakes. This led to vulnerable groups, like people with poor Dutch and young mothers, being targeted more quickly and subjected more often to investigations.
- Why is this relevant?
The municipality of Rotterdam is known for its strict welfare investigations. How the algorithm it used decided who to target for such invasive investigations was not known, nor was the risk of discrimination it carried.
- How was this investigated?
Journalists from Lighthouse Reports, Argos, Vers Beton, WIRED and Follow the Money managed to lay hands on the Rotterdam algorithm and the data this mathematical model uses to predict the chance of welfare fraud. Age, gender and language proficiency turn out to be very decisive factors. But what is this based on?
The letter tells you to show up for an appointment with all of your documents and bank statements. At the office, you are subjected to a barrage of questions. Adriana, why do you use cash for your groceries? (You couldn’t use a debit card at the market.)
Why did someone transfer 17 euros into your account? (You sold a computer game on eBay.) You are asked about working illegally. (You volunteer at a school, since you haven’t been able to find paid employment yet.) They want to know everything about you.
The slightest irregularity can have major consequences for your welfare benefit. Rotterdam has a reputation for being very strict. The municipality has even announced an investigation into potential possessions abroad, although it’s abundantly clear that you can’t afford the slightest luxury.
Adriana is never told why she was selected for such an invasive investigation. This was not the first time she was summoned either. Is this a coincidence? Did the neighbours snitch? Is Adriana doing something that invites suspicion?
One thing the civil servant sitting across from her is sure not to tell her is that a complex algorithm has calculated that the chance of fraud, errors and accidental mistakes in Adriana’s case is greater than for others (her ‘risk score’ is 0.683 on a scale of 0 to 1). The calculation is based on over 300 different characteristics recorded by the municipality about Adriana and her life.
Everything from her age (30) and relationships (married for three years, one young son) to her mental condition (Adriana has been through a lot, but is doing okay), where she lives (in Rotterdam-Noord district for the past year), her language proficiency (her Dutch is sufficient by now), how a social services staff member judges her ability to find paid employment and whether her personal appearance is appropriate (the civil servant has nothing negative to note).
Since – according to the system – Adriana has important characteristics in common with people caught earlier, either committing benefits fraud or making administrative mistakes, she is suspected too.
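The scoring described above can be sketched in a few lines of code. This is a minimal illustration, not the municipality's model: the feature names, weights and the logistic form are all assumptions chosen for clarity (the real system used over 300 recorded characteristics). What it shows is the core mechanic the article describes: characteristics go in, a score between 0 and 1 comes out, and recipients are ranked by that score.

```python
# Minimal sketch (not the municipality's code) of a risk-scoring model:
# binary characteristics are combined into a score in (0, 1), and welfare
# recipients are ranked by that score for investigation.
# All feature names and weights below are illustrative assumptions.
import math

WEIGHTS = {  # hypothetical learned weights, one per characteristic
    "is_female": 0.4,
    "has_child": 0.3,
    "low_dutch_proficiency": 0.5,
    "financial_problems": 0.3,
}
BIAS = -0.2  # hypothetical baseline


def risk_score(person: dict) -> float:
    """Logistic combination of characteristics -> score between 0 and 1."""
    z = BIAS + sum(WEIGHTS[k] * person.get(k, 0) for k in WEIGHTS)
    return 1 / (1 + math.exp(-z))


def rank_for_investigation(people: dict) -> list:
    """Sort recipients so the highest risk scores come first."""
    return sorted(people, key=lambda name: risk_score(people[name]), reverse=True)


people = {
    "george": {},  # single, childless man: only the baseline applies
    "adriana": {"is_female": 1, "has_child": 1},
}
```

Under these made-up weights, "adriana" outranks "george" purely because of characteristics such as gender and parenthood – the pattern the journalists observed in the real model.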
Minority Report in Rotterdam
It all seems like a Dutch take on Minority Report, the science fiction story that Steven Spielberg made into a movie, in which the pre-crime police squad helps predict and prevent future murders. For years, a complex algorithm has been used in Rotterdam to calculate which of the tens of thousands of city inhabitants on welfare could be committing benefits fraud. Every year, hundreds of Rotterdammers who had the highest risk scores, according to this mathematical model, could count on being investigated.
Lighthouse Reports, Vers Beton, Follow the Money and Argos were able to lay hands on the Rotterdam algorithm – rather boringly named ‘benefits fraud analytics’ – after making a series of information requests under the Dutch Freedom of Information Act. Never before have independent parties been able to assess how an advanced government model predicts fraud, looking at which personal data is being used, how the computer code makes calculations and who receives the highest risk scores.
Spoiler: These are vulnerable groups who are among the poorest city dwellers on benefits, such as people with poor Dutch language skills, young adults, single mothers who have left a lengthy relationship and people in financial difficulties.
Concerns about the Rotterdam fraud predictions have existed for some time. In 2021, the Rekenkamer Rotterdam (the municipal audit office) requested that attention be paid to the ethical risks attached to the fraud algorithm, which was developed in part by the consulting firm Accenture. Just as someone’s nationality can lead to discrimination, so can characteristics like language. The Rotterdamse Rekenkamer felt that this had not been acknowledged.
Criticism also came from the municipal council. ‘In principle, I am against the use of identity attributes,’ said council member Duygu Yildirim, who belongs to the social democratic PvdA. According to her, the municipality cannot predict at the individual level who would or would not be likely to commit fraud. ‘Even if someone is an addict, they cannot be faulted offhand for the fact that – statistically speaking – addicts may be less likely to provide the information they are required to.’
Figures that Argos and Lighthouse Reports requested earlier also showed that the algorithm leads to markedly more women being investigated: in the 2018-2020 period, no fewer than 2179 women were investigated, compared to 933 men. Referred to as the ‘welfare capital’ of the Netherlands, with 30,000 inhabitants receiving welfare, Rotterdam claims that there are logical explanations for this disparity.
It said that previous selection methods had led to disproportionately more men being investigated; these men were subsequently removed from the algorithm’s results, since they had already had their turn. According to Richard Moti, the alderman responsible for social benefits, ‘checks had shown that there were no groups of Rotterdammers who were either under- or overrepresented’. In short, there was no prejudice involved.
Nevertheless, the system was shut down as a precaution in late 2021. The municipality still intends to develop a new version free of discriminatory elements, since Rotterdam continues to believe the claims about what this technology can deliver: discover more fraud, use fraud investigators more efficiently and wrongly provide welfare less often.
Just how much success the ‘risk assessment model’ yields is difficult to assess. According to the municipality, the total amount of recouped benefits averages 2.5 million euros a year. A fraction of this amount is due to the algorithm, since the municipality uses several methods to select people for an investigation.
Rotterdam points out that discovering fraud is not necessarily the objective of an investigation: it can be in the interest of someone receiving welfare to have errors corrected as quickly as possible, to prevent problems due to having to repay excess welfare.
Which people has the system learned to find suspicious?
Is the municipality correct in saying that there is no prejudice? Do vulnerable groups really have nothing to fear from this system, unlike those targeted by the Dutch Tax Administration's infamous child care benefits algorithm?
What is the exact influence on someone’s risk score of personal characteristics that are beyond their control, like origin, age and gender? How significant are sensitive matters like language proficiency, financial difficulties and addiction? In other words, which people has the system learned to consider suspicious, and why?
These questions could only be answered by testing the algorithm extensively, using data from real citizens. These experiments clearly show that the algorithm estimates the chance of fraud to be much higher for certain Rotterdammers.
This is especially true for people with poor Dutch language skills, women, young adults, parents, anyone who has left a lengthy relationship or shares household costs with others, and people with financial problems or addiction issues. They are all overrepresented in the group awarded the highest risk scores, particularly if they belong to more than one of these categories. This means that they top the list of potential investigation candidates.
The experiments also evaluated the influence of a single characteristic on the risk scores. This led to the conclusion that women scored higher merely for being female. It also showed that people who did not meet the language proficiency requirement were seen as a greater risk, solely based on this characteristic.
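The single-characteristic experiments described above amount to a counterfactual test: hold everything else fixed, toggle one attribute, and compare the scores. A rough sketch of that procedure, using a stand-in scoring function with invented weights (the real model had 315 inputs), looks like this:

```python
# Sketch of the single-characteristic experiment: flip one attribute
# while holding the rest fixed, and measure the change in risk score.
# The scoring function and its weights are illustrative stand-ins.
import math


def toy_score(features: dict) -> float:
    """Stand-in for the real 315-variable model; weights are invented."""
    weights = {"is_female": 0.4, "low_dutch_proficiency": 0.5}
    z = -0.2 + sum(weights.get(k, 0) * v for k, v in features.items())
    return 1 / (1 + math.exp(-z))


def single_feature_effect(features: dict, name: str) -> float:
    """Score difference caused by toggling one characteristic on."""
    baseline = toy_score({**features, name: 0})
    flipped = toy_score({**features, name: 1})
    return flipped - baseline


# Effect of poor Dutch proficiency alone, all else held equal:
language_effect = single_feature_effect({"is_female": 1}, "low_dutch_proficiency")
```

A positive difference means the attribute by itself pushes the score up – which is exactly what the journalists found for gender and language proficiency in the real model.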
George has zero chance of being investigated
There are major differences at the individual level. Take an average Rotterdam inhabitant on benefits, a 30-year-old man named George. Although he doesn’t actually exist, there are a lot of real people among those receiving welfare in Rotterdam who share his characteristics. He lives in Rotterdam-Noord district, is athletic, single and childless.
Based on these characteristics, the algorithm calculates George’s risk score as 0.50 and ranks him in place 20,973 out of 30,000. This means that he has zero chance of being investigated.
If George had been a woman, she would immediately rank thousands of places higher. Give her a child and a partner with whom she receives welfare and shares the household costs, and George has turned into the Adriana introduced at the beginning of this article, with a risk score of 0.683. This is among the highest risk scores, and an investigation seems almost guaranteed.
All this is disregarding language, the most sensitive characteristic of all in the Rotterdam welfare fraud algorithm, which includes twenty different language-related variables, ranging from verbal and written language skills to the language requirement for receiving welfare.
If all of these variables are set to indicate poor Dutch language proficiency, this results in someone ending up with the highest risk scores twice as often as someone with Dutch as their native language.
There are major differences for other groups as well. Single mothers have the highest risk scores 43 per cent more often than single women without children. And anyone who has experienced financial difficulties for a few years running will receive a higher score 27 per cent more often than someone on welfare without such problems.
The results of the experiments were shared with the municipality of Rotterdam; in its extensive response, it called the findings ‘interesting, educational and in part familiar’. ‘Over time, we have come to the conclusion that the risk assessment model can never be 100 per cent free from prejudice or from appearing to be prejudiced. This is undesirable, particularly when it involves variables that harbour the risk of prejudice based on discriminatory grounds such as age, nationality or gender. Your findings also demonstrate these risks.’
The research we conducted provides a clear view inside what up until now was a black box. There is usually a lack of clarity regarding how risk models are designed, since government organisations fear that citizens will adapt their behaviour to dodge fraud investigations. However, Rotterdam has opted for far-reaching transparency. After having received information requests under the Open Government Act from Lighthouse and Argos in 2021, the municipality released a list of over 300 risk indicators and even the computer code.
Further information requests yielded extensive technical information. ‘Rotterdam considers it extremely important to be aware of the risks inherent to using algorithms, as should other organisations inside and outside government,’ the municipality stated. Due to concerns about privacy, the municipality was not willing to share the data from thousands of inhabitants who had been investigated for fraud, which had been used to ‘train’ the algorithm to make its predictions.
The 315 variables used in the algorithm were, however, present in the background of the included ‘histograms’ depicting over 12,700 records. Although these had been stripped of directly identifiable data such as names, citizen service numbers and contact details, they came from actual Rotterdammers who had been investigated at some point. These data were used for our investigation, given their journalistic significance. But very few people were given access and the data were destroyed when the findings were published. According to the municipality, these data should never have been released. ‘We have informed our privacy officer about this.’
The investigation into the algorithm’s functioning and its results are just part of the story. The data that were used to ‘teach’ the algorithm to make its predictions are important too. In the case of Rotterdam, these were from 12,700 people receiving welfare who had already been investigated.
For a model to be able to make accurate predictions about who is more or less likely to commit fraud, these data need to reflect reality. There are many examples of where this has gone wrong, from automatic face recognition that malfunctions when confronted with dark skin tones – often because the system was trained predominantly on people with light skin tones – to an algorithm for a job vacancy site that disadvantages women. In such cases, the data is not a true reflection of reality.
Questions abound in Rotterdam too. For example, it is unclear how the people committing fraud were selected in earlier investigations. This may have been a random selection, but could have involved a method harbouring an element of prejudice. For instance, the data could have come from earlier theme-based investigations, which already targeted specific groups, such as people with a certain type of home or household composition. A striking aspect of the Rotterdam data is the lack of young adults, despite the fact that age has the most significant effect on how high a risk score is.
Rotterdam claims that the algorithm was ultimately more effective than random checks. According to figures supplied by the municipality, for each 100 checks it conducted, the algorithm ‘scored’ 39 cases of fraud or other types of irregularities. Random checks yielded 25 out of 100.
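Taking the municipality's figures at face value, the comparison is a simple hit-rate calculation. The numbers below come straight from the article; the "lift" label is just a convenient name for the ratio.

```python
# The municipality's reported figures: per 100 checks, the algorithm
# flagged 39 cases of fraud or other irregularities, versus 25 for
# random checks. Computing what that difference amounts to.
algo_hits, random_hits, checks = 39, 25, 100

algo_hit_rate = algo_hits / checks      # 0.39
random_hit_rate = random_hits / checks  # 0.25
lift = algo_hit_rate / random_hit_rate  # roughly 1.56x random selection
```

So on the municipality's own numbers the model beats random selection by about half again – better than chance, but still wrong in most individual checks, which is the tension Mitchell's assessment below turns on.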
The renowned American computer scientist Margaret Mitchell, who specialises in artificial intelligence, ethics and prejudice, assessed our investigation at our request. Remarking on the evaluations of the Rotterdam algorithm, she said the model does not perform well and ‘just makes random guesses, actually’. In her opinion, determining whether someone is a risk is always a job for humans. Computer models will never be able to accurately predict the actual risk someone poses, she argues, since ‘all lives are different’.
A mathematical model will never weigh all of the factors at play in each individual case, says Mitchell. She thinks that the software developers involved in the Rotterdam algorithm either lacked sufficient information or the correct information required to create a good model. ‘So by and large, you have a recipe for a model that can't generalise well to the real world, based on what it has learned, meaning that it is not useful in the real world.’