Data & Intelligence October 04, 2018
Unbiased machine learning does not exist
When speaking about machine learning, we’re all excited about the unbiased manner in which machines can provide us with data and analyse it. However, when implementing machine learning algorithms, there are still ethical challenges we face. After all, machines or not, they learn from human behaviour.
When looking at data, we look for a pattern. From that pattern, we make a prediction. It is that same process that we try to teach machines. When doing so, we call it machine learning. It gives us the opportunity to analyse data that our brains are incapable of processing. A computer, on the other hand, is capable of analysing multiple variables instantaneously and uses algorithms to do so.
Machines, as taught by humans
Apart from being able to analyse faster than we do and process a large set of variables at the same time, computers are not biased like humans are. They don’t have preconceived notions of the outcome, based on personal prejudices. As humans, we always come up with a way to prove to ourselves that we are right. No wonder we have a saying in data analysis: ‘If you torture the data long enough, it will confess to anything.’ When seeking to prove your own assumption, you are likely to succeed.
Therefore, when leaving the analysis up to machines, we make more objective – and better – predictions. That’s why we’re using them in finance, medicine, and marketing. Machines are trading stocks, diagnosing illnesses, as well as forecasting supply and demand. And this is great! Except there’s one catch: these machines still learn from human behaviour. From the historical data that we provide them with, they find patterns based on our decision-making, behaviour and data sets.
In other words, these “objective” machines find patterns after learning from our irrational decision-making, our strange behaviour, and our biased choices. Consequently, algorithms could become prejudiced and segregated exactly the way our society is. One can wonder if that’s a good thing.
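A minimal sketch of how this happens, assuming a deliberately naive "model" that only learns approval rates from past decisions. The records, the group labels and the `fit_approval_rates` helper are all invented for illustration and do not come from any real system:

```python
from collections import defaultdict

# Hypothetical historical decisions: (group, qualified, approved).
# Both groups contain equally qualified people, but past human reviewers
# approved qualified members of group "B" less often. All values are invented.
history = [
    ("A", True, True), ("A", True, True), ("A", True, True), ("A", False, False),
    ("B", True, True), ("B", True, False), ("B", True, False), ("B", False, False),
]

def fit_approval_rates(records):
    """Learn, per group, how often qualified people were approved in the past."""
    approved = defaultdict(int)
    total = defaultdict(int)
    for group, qualified, was_approved in records:
        if qualified:
            total[group] += 1
            approved[group] += int(was_approved)
    return {group: approved[group] / total[group] for group in total}

# The "model" has no opinion of its own; it simply mirrors the old decisions.
learned = fit_approval_rates(history)
print(learned)
```

Nothing in the code mentions prejudice, yet the learned rates differ per group, because the difference was already present in the decisions it learned from.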
There are already some very embarrassing examples of algorithms making things worse. The most infamous example of this is the COMPAS software the US justice system is currently using to predict a defendant’s risk of recidivism. Usually, at sentencing, it is entirely a judge’s prerogative to prescribe a sentence within statutory guidelines. The obvious flaw in this system is that it is biased. Judges might abuse this unchecked power and base a sentence not only on relevant factors, such as the seriousness of a defendant’s offence, but also on factors that are morally and constitutionally problematic, such as gender and race.
This is where the algorithm comes in. The algorithm was introduced to make a better, and more importantly, fairer risk assessment. But shifting the sentencing responsibility to a computer does not necessarily eliminate bias; it delegates and often compounds it.
Algorithms like COMPAS simply mimic the data with which we train them. A ProPublica study found that COMPAS predicts African American defendants will have higher risks of recidivism than they actually do, while white defendants are predicted to have lower rates than they actually do.
However, the company that developed the COMPAS software disputes this. Because it doesn’t publicly disclose the calculations used to arrive at a defendant’s risk score, neither the public nor the defendant can understand how the score came about. So, we now have an algorithm that guides sentencing, is racially biased, and is a complete black box to those involved.
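Even when the scoring itself is a black box, the kind of disparity ProPublica described can be checked from the outside by comparing error rates per group. The sketch below is only an illustration of that idea: the records are invented toy data, not COMPAS data, and `error_rates` is a hypothetical helper, not ProPublica’s actual method.

```python
def error_rates(records):
    """Return false positive and false negative rates.

    Each record is a pair (predicted_high_risk, reoffended).
    """
    false_pos = sum(1 for pred, actual in records if pred and not actual)
    false_neg = sum(1 for pred, actual in records if not pred and actual)
    negatives = sum(1 for _, actual in records if not actual)
    positives = sum(1 for _, actual in records if actual)
    return {
        "false_positive_rate": false_pos / negatives if negatives else 0.0,
        "false_negative_rate": false_neg / positives if positives else 0.0,
    }

# Invented toy data for two hypothetical groups of defendants.
toy_data = {
    "group_1": [(True, False), (True, False), (False, False), (False, False),
                (True, True), (True, True)],
    "group_2": [(True, False), (False, False), (False, False), (False, False),
                (True, True), (False, True)],
}

rates = {group: error_rates(records) for group, records in toy_data.items()}
for group, result in rates.items():
    print(group, result)
```

In this toy data, people in group_1 who did not reoffend are flagged as high risk more often than those in group_2, while reoffenders in group_2 are missed more often. That is the shape of the imbalance the study reported: risk overestimated for one group and underestimated for the other.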
Now, we data scientists are no evil geniuses. We are not planning to take over the world. In the end, we’re just a bunch of enthusiasts who like data and want to make online marketing more efficient. We use data to make better websites and ads that are more relevant. This is for your convenience, so you don’t have to click your way through endless menus on webshops, for example. Via machine learning and the insane amount of data, we know who you are and what you’re looking for. The only question remaining is: how far can we go in using these insights?
Ethically responsible, or not?
Let’s illustrate this with an example. Take Peter. An internet user like many of us. We know a lot about him. We know that he recently lost his job because he just googled for information on unemployment allowance. We also know that he likes camping, because he bought camping supplies for a holiday he’s been comparing online. Also, by showing him discounts on these products we motivated him to buy them. However, after receiving them, he returned them all. But there’s more: he searched for what helps best when feeling bloated. Until some years ago he bought quite a lot of alcohol online; then he googled ways to quit drinking for some time.
That’s quite a lot of information. We can do a lot with that. But one might question whether all of our options are ethically responsible. Let me present you with a few options, and you decide whether you agree or not:
- Is it ok for us to show him more camping supplies ads when he’s browsing on websites?
- Can we show him products on sale? As that is what motivated him to purchase last time around.
- Should we exclude him from our campaigns? Given that he is most likely to return the items purchased, costing our clients money.
- Can we target him for a personal loan? Having lost his job and already booked his holiday, he could use the money.
- Is it ok to send him alcohol ads? Given the stress of losing his job, he might be more eager to relieve that stress through alcohol.
- Can we show him ads from one of our clients, a health insurance company, for supplementary insurance with dynamic (higher) prices? His bloating might indicate something worse after all those years of drinking.
Although you could argue that some of the above options are unethical, this is already happening. And let’s be clear, there’s no right or wrong answer to these questions. Nor are they good or bad. But not everyone has the same perspective on this. It is all a matter of opinion. This means that as we continue to apply machine learning algorithms, sooner or later we might cross someone’s moral line.
In all honesty, even now I often wonder whether or not I am crossing any lines. Is it ok for me to make these kinds of decisions? To decide that someone I don’t know will be confronted with their personal weaknesses? We’ve seen that even with the best intentions, algorithms fail to eliminate our human biases. And even if we could create an unbiased algorithm, we should discuss how to use it. This is something we have to decide for ourselves, because there are no laws yet, no guidebook, no bible.