Why would a model be prejudiced against a certain race? Very rarely do people give race as a feature to statistical models to begin with. And even if they did, models do not have human prejudices; they train on actual data and care only about making the most accurate predictions possible.
> and care only about making the most accurate predictions possible
of labels often generated by humans.
Bias can get into a classifier. It can get there through a biased model, but that's very unlikely. Much more likely is that it's trained on biased data. Which is _easy_ to do, even by accident.
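To make that failure mode concrete, here's a toy sketch (everything below is synthetic; feature names like "zip_score" and the bias strength are invented for illustration). The labels come from a biased human reviewer, the protected attribute is never given to the model, and a correlated feature ends up standing in for it anyway:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Synthetic applicants: "group" is the protected attribute, "zip_score" is a
# correlated but ostensibly neutral feature, "skill" drives true qualification.
group = rng.integers(0, 2, n)
zip_score = group + rng.normal(0, 0.5, n)   # correlates with group
skill = rng.normal(0, 1, n)                 # identically distributed in both groups

# Historical labels were assigned by biased humans: equal skill, but group 1
# was marked "qualified" less often.
qualified = (skill + rng.normal(0, 0.5, n) - 0.8 * group) > 0

# Plain logistic regression trained WITHOUT the group feature.
X = np.column_stack([skill, zip_score])
w, b = np.zeros(2), 0.0
for _ in range(1000):                       # simple gradient descent
    p = 1 / (1 + np.exp(-(X @ w + b)))
    g = p - qualified
    w -= 0.5 * (X.T @ g) / n
    b -= 0.5 * g.mean()

p = 1 / (1 + np.exp(-(X @ w + b)))
print("mean predicted score, group 0:", round(p[group == 0].mean(), 3))
print("mean predicted score, group 1:", round(p[group == 1].mean(), 3))
# Group 1 scores lower even though skill is identically distributed, because
# zip_score lets the model reproduce the humans' labeling bias.
```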
Sure, but even then they would be no worse than the humans they replace.
But in most cases, the whole point of using machine learned models is to do better than humans. Like an insurance company using ML to predict how likely a customer is to get into an accident. They aren't going to train the model to mimic actuaries; they have plenty of actual accident data to train it on.
And it is quite possible that males get into accidents much more often than females. But that doesn't mean the model is prejudiced or that it's wrong.
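To be clear about what "just predicting risk" means here, a minimal sketch with made-up numbers: if one group genuinely has a higher accident rate, a model fit on outcome data alone will simply recover that rate and price accordingly; there is no animus anywhere in the estimator.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Made-up base rates: group A genuinely files more claims per policy-year.
is_group_a = rng.random(n) < 0.5
true_rate = np.where(is_group_a, 0.12, 0.08)
had_claim = rng.random(n) < true_rate

# The "model" here is just the maximum-likelihood claim rate per group.
rate_a = had_claim[is_group_a].mean()
rate_b = had_claim[~is_group_a].mean()

avg_claim_cost = 4000   # hypothetical average cost per claim
print(f"group A: rate {rate_a:.3f} -> fair premium ~ {rate_a * avg_claim_cost:.0f}")
print(f"group B: rate {rate_b:.3f} -> fair premium ~ {rate_b * avg_claim_cost:.0f}")
# The premiums differ only because the observed risk differs; whether that
# outcome is acceptable is a policy question, not something the estimator decides.
```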
We can't inspect human brains; they are black boxes. People are incredibly biased, but also mostly unaware of their biases. For instance, judges have been found to give much harsher sentences just before lunch, when they are hungry. Attractive people get much shorter sentences and do better in job interviews, and attractiveness matters even more in waitresses' tips and in elections.
But on top of that, humans almost always do worse than even the simplest statistical baselines. Simple linear regression on a few relevant variables beats human 'experts' 99% of the time. Humans shouldn't be allowed to make decisions at all, yet everyone seems to fear the scary algorithms instead.
https://motherboard.vice.com/read/why-an-ai-judged-beauty-co... “It happens to be that color does matter in machine vision,” Alex Zhavoronkov, chief science officer of Beauty.ai, wrote me in an email, “and for some population groups the data sets are lacking an adequate number of samples to be able to train the deep neural networks.”
Well of course a beauty contest judged by AIs would go horribly wrong. Appearance is highly subjective and arbitrary.
But even so it's not clear their algorithm was the cause of the bias, or that the bias was significant. For instance, it's possible that black people have slightly worse "facial symmetry" on average, or whatever made-up metric they were using. And even if black people only scored 1% worse on average, the extremes will be dominated by whites, because of the way Gaussian distributions work. So it may appear to be way more biased than it actually is.
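The tail effect is easy to check numerically. A sketch with made-up score distributions (0-100 scale, same spread, group B averaging 1% lower): even that tiny gap noticeably skews who ends up above an extreme cutoff.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1_000_000

# Two made-up populations of "beauty scores": same spread, but group B's
# mean is 1% lower than group A's.
scores_a = rng.normal(loc=70.0, scale=5.0, size=n)
scores_b = rng.normal(loc=69.3, scale=5.0, size=n)

# "Winners" are the top 0.1% of the combined pool.
cutoff = np.percentile(np.concatenate([scores_a, scores_b]), 99.9)
top_a = int((scores_a > cutoff).sum())
top_b = int((scores_b > cutoff).sum())
print(f"winners from A: {top_a}, from B: {top_b}, ratio ~ {top_a / top_b:.2f}")
# A ~1% gap in means turns into a visibly lopsided winner list at the tail,
# so the selected extremes look more skewed than the populations actually are.
```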
Even with no bias, uncertainty can cause problems. Say an ML system is tasked with finding the top 10 candidates in terms of "confidence that they will be able to do the job". Then if it has little training data on candidates of a particular class, and those in that class are actually quite different on many different variables (so it can't generalize very well), it may not be able to reach the required levels of confidence for them.
I think this is actually the reason for a lot of accidental discrimination, because human judges would have exactly the same problem.
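Here's a toy sketch of that confidence effect (the scoring model and the "number of similar past cases" are invented; the point is only the mechanism). Both groups have the same true suitability distribution, but the system has seen far fewer cases like group B, so its estimates for them are noisier, and ranking by a lower confidence bound quietly filters them out.

```python
import numpy as np

rng = np.random.default_rng(3)

def estimate(true_score, n_similar_cases):
    """Posterior-style summary: mean near the truth, uncertainty shrinking with data."""
    std = 1.0 / np.sqrt(n_similar_cases)
    return true_score + rng.normal(0, std), std

candidates = []
for group, n_similar in [("A", 400), ("B", 4)]:   # far less data on group B
    for _ in range(500):
        true_score = rng.normal(0, 1)             # same distribution for both groups
        mean, std = estimate(true_score, n_similar)
        lcb = mean - 1.64 * std                   # "how sure are we they can do the job?"
        candidates.append((lcb, group))

top10 = sorted(candidates, reverse=True)[:10]
print("top 10 by confidence:", [g for _, g in top10])
# Group B candidates almost never make the cut -- not because they score worse,
# but because the system can't be confident enough about them.
```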
I remember in school, playing chess a couple of times against a guy fresh from Sudan. He had the most unsettling smile, and played very unorthodox openings. I won some, he won some - but I always suspected he was stronger than me, and just being polite/testing me. It's just impossible to read someone from a culture so different. I'm glad we didn't play poker, to put it like that.
That particular study is pretty bad. Look at the contestants' faces. The system might be failing to identify facial features, or not accounting for lighting, tilt, etc. It's obvious if you look at its estimated ages.
They clearly got the ages wrong; that's not really debatable. Don't take my word for it, though. There are plenty of studies in this space that got wildly different results, some of which mostly agree with hotornot ratings.
It's possible to predict race by looking at several other variables that are correlated with race. See Technical Appendix A and especially Table 8 for an example of predicting ethnicity using surname and state only: http://files.consumerfinance.gov/f/201409_cfpb_report_proxy-...
You can do this deliberately, or your machine learning model might do it spontaneously.
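A very rough sketch of what such a proxy looks like, in the spirit of that appendix: combine a surname-based table with a geography-based table. The probabilities below are invented placeholders (not the census-derived numbers the report uses), and the combination is a crude naive-Bayes-style product rather than the report's exact formula.

```python
# P(ethnicity | surname) -- in practice built from Census surname tables.
p_eth_given_surname = {
    "garcia":     {"hispanic": 0.90, "white": 0.06, "black": 0.04},
    "washington": {"hispanic": 0.02, "white": 0.09, "black": 0.89},
    "miller":     {"hispanic": 0.03, "white": 0.85, "black": 0.12},
}

# P(ethnicity | state) -- in practice from geographic demographics.
p_eth_given_state = {
    "NM": {"hispanic": 0.48, "white": 0.38, "black": 0.14},
    "VT": {"hispanic": 0.02, "white": 0.94, "black": 0.04},
}

def proxy(surname, state):
    """Combine the two weak signals and renormalize into a probability estimate."""
    scores = {}
    for eth in ("hispanic", "white", "black"):
        scores[eth] = (p_eth_given_surname[surname.lower()][eth]
                       * p_eth_given_state[state][eth])
    total = sum(scores.values())
    return {eth: round(s / total, 3) for eth, s in scores.items()}

print(proxy("Garcia", "NM"))   # the two signals reinforce each other
print(proxy("Miller", "VT"))
```

Neither field looks like "race" on its own, but together they pin it down fairly well, and a flexible model can stumble into the same combination on its own.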
I don't deny that, although I don't think anyone uses surnames as a feature; that would be pretty blatant. You also need census data to really make use of that for less common last names. Anyone going out of their way to do this might as well just discriminate directly instead of using this convoluted method.
It may be true that a sufficiently complex machine learned model could learn race as a feature, but again, why would it? It has no prejudice against specific races. Unless you really believe that black people are inherently more likely to get into car accidents, even after controlling for income, education, etc. And even then it doesn't hate black people; it's just doing its best to predict risk as accurately as possible. It's not charging blacks, as a group, a higher rate than they cost, as a group.
I fail to see why people get so upset at the mere possibility of this. I think they are anthropomorphizing AI, as if it were a human bigot with an irrational hatred of other races who strongly discriminates against them for no reason. This is more like giving people who live in neighborhoods with slightly higher accident rates slightly higher insurance rates, to make up for their increased risk. Maybe it correlates with race, maybe it doesn't; it doesn't really matter.
Based on a list of 137 questions[1] the Northpointe system predicts the risk of re-offending, and "blacks are almost twice as likely as whites to be labeled a higher risk but not actually re-offend." Meanwhile, whites are "much more likely than blacks to be labeled lower risk but go on to commit other crimes."
In other words:
- a white person labeled high risk will re-offend 66.5% of the time
- a black person labeled high risk will re-offend 55.1% of the time
- a white person labeled low risk will re-offend 47.7% of the time
- a black person labeled low risk will re-offend 28% of the time.
The model specifically avoids race as an input, but it still overestimates the risk that black defendants will re-offend while underestimating the risk for white defendants.
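For anyone wondering where numbers like those come from, here's how the conditional rates are computed from a per-group confusion matrix. The counts below are invented, chosen only so that the derived rates roughly match the figures quoted above.

```python
# Hypothetical counts: (label given by the model) x (whether the person re-offended).
groups = {
    #          hh = labeled high, re-offended   hn = labeled high, did not
    #          lh = labeled low,  re-offended   ln = labeled low,  did not
    "white": {"hh": 300, "hn": 150, "lh": 430, "ln": 470},
    "black": {"hh": 550, "hn": 450, "lh": 170, "ln": 430},
}

for name, c in groups.items():
    high = c["hh"] + c["hn"]
    low = c["lh"] + c["ln"]
    no_reoffense = c["hn"] + c["ln"]
    print(f"{name}: "
          f"P(re-offend | labeled high) = {c['hh'] / high:.2f}, "
          f"P(re-offend | labeled low) = {c['lh'] / low:.2f}, "
          f"P(labeled high | no re-offense) = {c['hn'] / no_reoffense:.2f}")
# The last column is the false positive rate: with these counts it is roughly
# twice as high for black defendants, even though race never appears as an input.
```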