In the U.S., computers help decide who goes to jail. But their judgment may be no better than ours
sciencemag.org, 17 Jan 2018 at 14:02
Every day, judges across the United States face harrowing decisions: How many years should they give the bipolar woman convicted of murder? Should they jail the young, possibly innocent man awaiting trial, or release him on bail, where he could commit a crime? Facing overflowing dockets, courts are increasingly using computer-based tools to help make those choices. Now, a new study suggests that one widely used tool, an algorithm that calculates risk scores for defendants in sentencing or bail hearings, is no better than people armed with a few key pieces of information.
"A fancy model isn't necessarily a better model," says David Robinson, who studies predictive analytics and governance at Georgetown University in Washington, D.C., but wasn't involved in the new work.
Being accused of a crime, even a minor one such as trespassing, could land you in jail. But if you're considered low risk, or if jails are overcrowded, you might get to go home before your trial. To make sure judges were treating all defendants fairly, U.S. courts in the 1980s started requiring jail staff to collect data on defendants' finances, families, friends, and drug and criminal histories. That information was often packaged into a recommendation and passed on to judges, who were free to use it, or not.
But in dozens of states, those risk assessment tools are moving from pen-and-paper calculations to complex algorithms, many of them proprietary. Few have been independently studied, raising concerns among researchers and civil rights advocates. Some worry that machines carry an authority unmatched by humans, leading to a greater reliance on their data; others say the "secret sauce" of the algorithms can lead to unfair outcomes. For example, a contested 2016 study by investigative reporters at ProPublica found that one system, Correctional Offender Management Profiling for Alternative Sanctions (COMPAS), tended to wrongly flag black offenders as high risk and white offenders as low risk.
Those findings intrigued Julia Dressel, a computer science major at Dartmouth College. She set out to answer a more basic question: Are humans or machines better at assessing risk? To find out, she uploaded the ProPublica database, a collection of COMPAS scores for 10,000 defendants awaiting trial in Broward County, Florida, as well as their arrest records for the next 2 years.
Dressel randomly selected 1000 of the defendants and recorded seven pieces of information about each, including their age, sex, and number of previous arrests. She then recruited 400 people using Amazon Mechanical Turk, an online crowdsourcing service for finding research volunteers. Each volunteer received profiles of 50 defendants and was asked to predict whether they would be re-arrested within 2 years, the same standard COMPAS uses. The humans got it right nearly as often as the algorithm: between 63% and 67% of the time, compared to about 65% for COMPAS, she reports today in Science Advances.
Graphic credit: J. Dressel et al., Science Advances, eaao5580, 2018; adapted by C. Aycock/Science
Dressel was surprised. So was Megan Stevenson, an economist and legal scholar at George Mason University in Arlington, Virginia, who has studied a similar risk assessment system in Kentucky. Stevenson says she always assumed algorithms were at least somewhat better than people at assessing risk, so the new study, which she calls the first horse race between man and algorithm, left her quite shocked.
In a second experiment, Dressel and her adviser, Dartmouth computer scientist Hany Farid, explored whether a simple algorithm could beat COMPAS, which typically uses six factors from a 137-item questionnaire to assess risk. (A common misperception is that all 137 items are used to score risk, when most determine which rehabilitation programs an offender might qualify for.) They created their own algorithm, ultimately settling on just two factors: age and number of prior convictions. Plugging that information into a simple formula yielded predictions that were right about 67% of the time, similar to the COMPAS score.
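To make the idea of a two-factor formula concrete, here is a minimal sketch in Python. The article does not give the researchers' actual formula, so the logistic form, the coefficients, the 0.5 threshold, and the profiles below are all assumptions made up for illustration; only the choice of inputs (age and number of prior convictions) comes from the text.

```python
import math

# Hypothetical coefficients chosen only for illustration. They are NOT the
# values fitted in the study, which the article does not report.
W_AGE = -0.05      # assumed: estimated risk falls as age rises
W_PRIORS = 0.25    # assumed: estimated risk rises with prior convictions
BIAS = 0.5


def rearrest_probability(age, priors):
    """Map age and prior convictions to a probability with a logistic formula."""
    z = BIAS + W_AGE * age + W_PRIORS * priors
    return 1.0 / (1.0 + math.exp(-z))


def predict_rearrest(age, priors, threshold=0.5):
    """Predict re-arrest when the estimated probability reaches the threshold."""
    return rearrest_probability(age, priors) >= threshold


def accuracy(profiles):
    """Fraction of (age, priors, was_rearrested) profiles predicted correctly."""
    correct = sum(
        1 for age, priors, outcome in profiles
        if predict_rearrest(age, priors) == outcome
    )
    return correct / len(profiles)


# Made-up profiles, not data from Broward County.
toy_profiles = [
    (19, 3, True),
    (23, 0, True),
    (45, 1, False),
    (52, 12, True),
    (30, 6, True),
    (61, 0, False),
]

if __name__ == "__main__":
    print(f"Accuracy on toy profiles: {accuracy(toy_profiles):.2f}")
```

In a real analysis the coefficients would be fitted to historical records and the accuracy measured on defendants held out from the fitting, rather than chosen by hand as they are here.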
Robinson says those results reflect something long known in criminology: If you're young, you're risky. But just how those results would be translated into the criminal justice system is a mystery, he adds. That's because the study looked at untrained volunteers, rather than real judges. What's more, the volunteers were given a real-time feedback score, something impossible to introduce in a courtroom.
Tim Brennan, who created COMPAS in 1998 while at Northpointe (now Equivant) in Canton, Ohio, says that far from undercutting his approach, the new study validates it. Seventy percent accuracy, he says, has long been considered the speed limit of such prediction systems, and the fact that humans did no better is encouraging.
But humans are still no better than machines at eliminating bias, notes mathematician Cathy O'Neil, founder of the firm O'Neil Risk Consulting & Algorithmic Auditing in New York City. Dressel's study found that people were just as likely as COMPAS to overstate re-arrest risks for black defendants and understate risks for white defendants: They incorrectly flagged black defendants as high risk 37.1% of the time (compared to 40.4% for COMPAS) and white defendants as low risk 40.3% of the time (compared to 47.9% for COMPAS).
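Error rates like these can be computed per group from a table of predictions and outcomes. The sketch below assumes the percentages above are false-positive rates (people flagged high risk who were not re-arrested) and false-negative rates (people flagged low risk who were re-arrested). The function name and the records are hypothetical, made up for illustration, and are not data from the study.

```python
def error_rates(records):
    """Return (false_positive_rate, false_negative_rate) for one group.

    records: list of (flagged_high_risk, was_rearrested) boolean pairs.
    A rate is None when its denominator is zero.
    """
    fp = sum(1 for flagged, rearrested in records if flagged and not rearrested)
    tn = sum(1 for flagged, rearrested in records if not flagged and not rearrested)
    fn = sum(1 for flagged, rearrested in records if not flagged and rearrested)
    tp = sum(1 for flagged, rearrested in records if flagged and rearrested)

    # False-positive rate: share of people NOT re-arrested who were flagged high risk.
    fpr = fp / (fp + tn) if (fp + tn) else None
    # False-negative rate: share of people re-arrested who were flagged low risk.
    fnr = fn / (fn + tp) if (fn + tp) else None
    return fpr, fnr


# Made-up records for a single hypothetical group.
toy_group = [
    (True, False),   # flagged high risk, not re-arrested: false positive
    (False, False),
    (False, False),
    (False, False),
    (False, True),   # flagged low risk, re-arrested: false negative
    (False, True),   # false negative
    (True, True),
    (True, True),
]

if __name__ == "__main__":
    fpr, fnr = error_rates(toy_group)
    print(f"False-positive rate: {fpr:.2f}, false-negative rate: {fnr:.2f}")
```

Running the same function separately on each group's records and comparing the results is one way to surface the kind of disparity the study describes.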
That's troubling, given that similar algorithms are increasingly influencing not only court decisions, but also loan approvals and teacher evaluations, among other decisions.
"People get awed by mathematical sophistication, but it's mostly a distraction," says O'Neil. She notes our algorithms are no better than us, or the data we feed them. "At the end of the day all we can do is make it biased in a way we're comfortable with. There's nothing objective about putting people in prison."