The Price of Precision: Accuracy vs Explainability in Machine Learning

Building a highly accurate AI model, such as one predicting aviation engine failures using a complex neural network that weighs numerous factors, might seem like the ultimate goal. In practice, however, accuracy alone isn’t sufficient: the model’s predictions must also be understandable and verifiable.

Neural networks are great at identifying complex patterns in data, but because of the complexity of their inner workings they lack interpretability; they can’t easily explain why they made a prediction. In high-stakes applications like aviation safety, explainability and justification are a necessity. In these settings, simpler models that may not be as accurate but can justify the predictions they make are more likely to provide real-world value.

In this blog, we’ll look at the key decision points that distinguish different types of machine learning models, helping you choose the right one for your application and deliver real business value.

Regression Models

When predicting a numerical output from a dataset, you’ll typically reach for regression models: models that learn from data how to map given inputs to a predicted output.

Linear regression is the simplest. It assumes linear (proportional) relationships between inputs and outputs: the model is a function that multiplies each input variable by a learned weight and sums the results to produce a prediction. It learns by adjusting these weights to reduce its prediction error.

This makes the model highly explainable: each weight states exactly how much an input contributes to the prediction. However, because it assumes linear relationships, it isn’t accurate on data with complex, non-linear relationships.
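To make this concrete, here is a minimal sketch using scikit-learn and a tiny, invented house-price dataset (both the library choice and the numbers are illustrative assumptions, not from the original article). The learned weights are directly inspectable, which is exactly where the explainability comes from.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical toy data: house price predicted from floor area (m²) and room count.
X = np.array([[50, 2], [80, 3], [120, 4], [150, 5]], dtype=float)
y = np.array([150_000, 220_000, 310_000, 380_000], dtype=float)

model = LinearRegression().fit(X, y)

# Explainability: each weight says how much one unit of that input moves the
# predicted price, and the intercept is the baseline.
print(model.coef_)                # e.g. £ per extra m², £ per extra room
print(model.intercept_)
print(model.predict([[100, 3]]))  # prediction for a new 100 m², 3-room house
```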

Random forest regression models can represent complex, non-linear relationships using decision trees. During training, each tree repeatedly splits the data into smaller groups by choosing the split that produces the least spread of values; for example, house price < 200000 might split a real estate dataset into tighter groups than house price < 180000. These splits continue until the tree’s groups are tight enough to make a prediction, and a random forest combines many such trees, each trained on a random sample of the data, by averaging their predictions.

Therefore, this model can make accurate predictions on non-linear data with complex relationships. While it’s not as explainable as linear regression, its predictions can still be explained in human terms by following the splits in its decision trees.
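Following the same toy setup (the X, y and feature names below are carried over from the linear regression sketch above and remain purely illustrative), a random forest can be trained and partially explained like this:

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.tree import export_text

forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
print(forest.predict([[100, 3]]))

# Partial explainability: the relative importance of each feature across the
# whole forest, plus the human-readable split rules of any individual tree.
print(forest.feature_importances_)
print(export_text(forest.estimators_[0], feature_names=["area_m2", "rooms"]))
```

Reading the split rules printed by export_text is the “following the decision trees” explanation described above, although with a hundred trees it quickly becomes more laborious than reading two linear weights.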

Going one step further, and unlike random forest regression, neural networks can learn useful patterns on their own without needing hand-crafted features. Between the input and output layers sit one or more hidden layers made up of nodes that behave a little like neurons in the human brain. Each node is effectively a tiny linear regression followed by a non-linear transformation: it makes a small intermediate prediction, and thousands or millions of these nodes work together to produce the overall prediction.

The complex inner workings of a neural network mean the model can process all kinds of data, automatically learning intricate relationships from images, text, and numbers, something linear regression and random forests can’t do. As a result, when working with complex, unstructured data, this model can produce by far the most accurate predictions.
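Here’s a comparable sketch with a small fully connected network, again reusing the illustrative X and y from above. scikit-learn’s MLPRegressor is assumed here for brevity; real image or text workloads would use a dedicated deep-learning framework and far more data.

```python
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Two hidden layers of 64 nodes each; inputs are standardised first, since
# neural networks train far better on scaled data.
net = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=5000, random_state=0),
)
net.fit(X, y)
print(net.predict([[100, 3]]))

# Unlike the linear model's two weights, the thousands of weights inside the
# hidden layers have no direct human meaning: this is the interpretability cost.
```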

However, this comes at the cost of interpretability. The inner workings of neural networks are not directly observable, and their decisions are not easily explainable in human terms. This can leave you saying a prediction was made “because the AI said so”! That can be a deal breaker for use cases where explainability and accountability are key.

The Challenge of Explainability

Different regression models offer varying degrees of explainability and accuracy. This raises the question of whether an optimal balance exists between these two competing factors.

There is growing legal and regulatory pressure around explainability, particularly in fields where AI-driven decisions affect people’s lives. In Europe, GDPR requires that AI decisions affecting consumers be justifiable, meaning consumers have the right to understand how those decisions were made.

Therefore, in sectors like finance and healthcare, using highly accurate but abstract deep learning models could pose a compliance risk if the company can’t justify its decisions. Simpler methods like linear regression or decision trees provide accountability and transparency.

In situations demanding high accuracy with complex data, explainability techniques such as LIME and SHAP can provide insights into neural network predictions. These methods treat the trained model largely as a black box, estimating how much each input feature contributed to an individual prediction and so identifying the most influential factors.
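As a rough sketch of how this looks in practice (reusing the net and X from the neural-network example above; the choice of SHAP’s KernelExplainer is an assumption, and LIME follows a very similar pattern):

```python
import shap

# Model-agnostic SHAP explainer: it perturbs the inputs and watches how the
# prediction changes, rather than inspecting the network's internals.
explainer = shap.KernelExplainer(net.predict, X)
shap_values = explainer.shap_values(X[:1])

# One value per feature: how much area and room count pushed this particular
# prediction above or below the average prediction.
print(shap_values)
```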

Key Takeaway

Simple, explainable models are a great fit for cases where transparency, regulatory compliance, and user trust are essential, such as in medical diagnoses. Their justifiable predictions make them ideal when decisions need clear explanations.

Now consider the challenge of analysing extremely large and intricate datasets, such as forecasting uncommon illnesses from general medical records. Simple models aren’t going to give you accurate enough results; this is where deep learning and neural networks offer unparalleled performance. That performance comes at a cost, as interpretability is sacrificed, potentially introducing compliance issues and eroding user trust. Techniques like SHAP and LIME can recover some level of justification, though not as transparently as the simpler models mentioned earlier.

Ultimately, if you don’t have massive datasets and are working with semi-structured data, simple models like linear regression and random forests strike the best balance between accuracy and explainability. Complex deep learning models work best with data that is high-dimensional, unstructured, or highly complex, such as images and text, where white-box models struggle to capture intricate patterns.

For a more in-depth dive into this topic, read the following article: https://medium.com/@jed.ryan/the-price-of-precision-accuracy-vs-explainability-in-machine-learning-0af1059dc9f0

For regular insights into machine learning, connect and follow Jed on LinkedIn: https://linkedin.com/in/jed-ryan-64401325a

Jed Ryan

Apprentice Software Developer

Published

4 June, 2025

