Velocity is a key factor in consulting and project delivery, and data engineering and data science projects are no different. The urge to ship a “functional” product, plus the arrival of new tools such as Azure Machine Learning Studio, has made it quite easy to build a model from data. With a “plug and play” framework you can preprocess your data, then train, “validate,” and test your model in order to deliver a product as fast as possible. With all this possible, one question I’d like to ask is: “Is data modeling only about modeling data?”
To answer this question, let’s take a look at a use case where focusing only on data modeling has been the cause of a bottleneck, and a narrow one at that.
Use Case: A biased criminal justice system.
Proposed solution: An impartial and objective AI algorithm to judge without the inevitable human bias.
Sounds good, right? It just might work. Let’s train a well-known logistic regression model with the U.S. Supreme Court database that holds information on court cases dating back to 1791. Better yet, let’s train a sophisticated deep neural network. Once we have our data model trained and tested, we can start using it on new court cases to make impartial judgments without human intervention. After all, an algorithm does not have feelings or emotions and its decisions are objective and neutral with no personal preferences or bias.
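As a minimal sketch of the pipeline just described, here is a logistic regression trained and tested with scikit-learn. The data is synthetic stand-in data: the real Supreme Court Database features and outcome coding are not reproduced here, so the feature columns and labels below are assumptions for illustration only.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(42)
n = 5000
# Stand-in "case features"; the real SCDB fields are not modeled here
X = rng.normal(size=(n, 4))
# Synthetic outcome: a noisy linear function of the features
y = (X @ np.array([1.0, -0.5, 0.3, 0.0]) + rng.normal(scale=0.8, size=n)) > 0

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LogisticRegression().fit(X_train, y_train)
print(f"held-out accuracy: {accuracy_score(y_test, model.predict(X_test)):.2f}")
```

Mechanically, this is all it takes — which is exactly the point: the pipeline is easy, and nothing in it checks whether the labels themselves are trustworthy.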
Outcome: The U.S. justice system turned to technology for help (guided by the popular idea of “algorithmic neutrality”), only to find that the algorithms had a remarkable bias too.
So what happened? The answer is simple:
In order to develop an AI algorithm, human-created data sets are used. If the underlying data is biased and reflects human preferences, the algorithm will learn these inclinations as well. In other words, AI learns what you teach it. In this particular case, the AI algorithm, trained on historical court cases, learned from what humans (judges) decided in the past. If the judges were biased in court, the neural network will find that pattern and use it when making new judgments. There is a saying in data science: “garbage in, garbage out,” or in this specific case: “bias in, bias out.” Machine learning algorithms are not magical. If you provide them with skewed information, they will not fix the skews by themselves. If you are not careful enough, you risk automating the very biases you were supposed to eliminate.
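“Bias in, bias out” is easy to demonstrate. In the hypothetical sketch below, the historical labels penalize one group regardless of merit; a logistic regression trained on those labels dutifully reproduces the disparity in its own decisions. All data is synthetic and the variable names are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 10_000
merit = rng.normal(size=n)            # a legitimate decision factor
group = rng.integers(0, 2, size=n)    # a sensitive attribute (0 or 1)
# Biased historical labels: group 1 was penalized regardless of merit
label = (merit + 1.0 - 1.5 * group + rng.normal(scale=0.5, size=n)) > 0

X = np.column_stack([merit, group])
model = LogisticRegression().fit(X, label)

pred = model.predict(X)
rate0 = pred[group == 0].mean()
rate1 = pred[group == 1].mean()
print(f"favorable-outcome rate, group 0: {rate0:.2f}")
print(f"favorable-outcome rate, group 1: {rate1:.2f}")
```

The model’s favorable-outcome rate for group 1 is far below that of group 0, mirroring the bias in its training labels. Note that simply dropping the `group` column is not a fix either, since in real data other features often act as proxies for the sensitive attribute.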
When put this way, algorithmic bias sounds obvious, but many companies at the forefront of AI research have already encountered this very problem. For example, Google Translate’s algorithms reproduce gender stereotypes, LinkedIn’s advertising program showed preferences for male names, and Microsoft’s chatbot “Tay” spent only hours learning from Twitter before adopting an anti-Semitic posture, among other cases.
So far, researchers and data scientists have mostly focused on the learning part of the process, searching for new, better, and faster ways to train an algorithm. They have indeed made progress here. However, is the ability to learn all that matters? What about the teaching part? What would be the point of achieving the perfect learning algorithm if you only teach it with poor, outdated, unchecked, or biased data?
It would be like teaching Albert Einstein nothing but the birth dates of historical figures, sometimes even unimportant facts about fictional characters. In the human learning process, the teacher carries the greater responsibility. So shouldn’t the same approach apply to a machine learning process?
There are some contexts where the problems of algorithmic bias and the opacity of algorithms’ decisions might not be harmful: Netflix’s AI deciding which movies to recommend to you, Spotify recommending songs, or Amazon and Facebook showing you advertising based on previous patterns. However, there are other contexts where awareness of these problems is crucial. These include using AI to hire people at companies or admit them to universities, using algorithms at banks to decide who is worthy of financing, or using AI to make court decisions.
It may be wise to take a step back on the path to full AI adoption in sensitive situations and consider a model of hybrid intelligence. We may not be ready for full AI adoption just yet. I believe it is too soon to let badly taught, black-box algorithms drive the criminal justice system or the labor market by themselves. At this point, there is barely a federal law or organization that sets standards or performs inspections of AI algorithms making important decisions. If created, such an organization could operate similarly to the way the FDA oversees new drugs and medical devices.
I know at this point you might be wondering: should we just eliminate AI as a possible solution to human bias and prejudice? My answer is: of course not, but I think we need to attack the issue with a different approach. With state-of-the-art AI we could get better results by taking the “hybrid intelligence” approach, by which I mean humans and machines collaborating more effectively. For example, we can use AI and Natural Language Processing (NLP) to develop a virtual assistant that teams up with lawyers and judges as they go through cases. The virtual assistant would have a knowledge base not only of previous cases but also of the latest laws and regulations, and it would be able to perform complex concept-based searches that take context into account, instead of simple keyword matching.
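One simple building block of hybrid intelligence is confidence-based routing: the model decides only the cases it is confident about and defers the rest to a human reviewer. The sketch below, on synthetic data with an assumed confidence cut-off, shows the idea; it is an illustration, not a production triage system.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 2000
x = rng.normal(size=(n, 1))                          # stand-in case feature
y = (x[:, 0] + rng.normal(scale=1.0, size=n)) > 0    # noisy synthetic labels

model = LogisticRegression().fit(x, y)
proba = model.predict_proba(x)[:, 1]

THRESHOLD = 0.9  # assumed confidence cut-off for automatic decisions
confident = (proba >= THRESHOLD) | (proba <= 1 - THRESHOLD)
print(f"auto-decided: {confident.mean():.0%}, "
      f"routed to a human reviewer: {(~confident).mean():.0%}")
```

The threshold is a policy choice: the more sensitive the decision, the higher it should be set, shifting more cases toward human judgment.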
If you are not aware of current machine learning limitations, and do not take them into account while modeling data, you will probably get deficient models that could indeed aggravate social inequalities. However, well-built and, most importantly, well-taught algorithms can mitigate these limitations and the troubles that follow. At Intersys Consulting, we are aware of this. Our expertise on the matter allows us not only to develop the best machine learners but also to be great teachers, able to distinguish good teaching material (data) from bad. We do not let hastily designed algorithms blindly drive your business; instead, we can help you take your AI technology to the next level.