Bias is an interesting yet often misunderstood issue in artificial intelligence (AI). I previously wrote an essay exploring data bias in AI systems but felt it lacked sufficient depth. Upon deeper reflection, I realized I had been viewing AI bias through the wrong lens. Rather than a collection of distinct types, bias manifests in layered forms within AI systems, each building upon the last.
This may seem like semantics, but the layer analogy aptly captures the interdependent nature of biases in AI. Just as the layers of a cake rest on one another, AI bias flows across three key layers, each enabling the next. No single bias exists in isolation; they accumulate into an integrated stack. This layered perspective offers greater technical accuracy for deconstructing the nuanced issue of bias in AI than treating biases as disparate types.
The first pervasive layer involves biases embedded in the training data ingested by AI systems. However, data issues alone do not fully explain bias. Problematic data combines with AI model limitations, which comprise the second layer. Finally, the third layer entails biases within the teams building the AI systems and curating the data. Each layer feeds into the next to generate real-world impacts. By delineating this layered flow, we gain the insight needed to mitigate AI bias.
Let’s go deeper into this.
The First Layer - Data Bias
The foundational layer underpinning all forms of bias in AI systems is data bias. At its core, data bias refers to any skew in the accurate representation, values, and precision of an AI's training data. Anything that improperly weights the data fed into an AI model constitutes data bias.
Here are some examples of Data Bias:
Selection/sampling bias arises when the method of collecting data implicitly overlooks certain groups or viewpoints. This skews the diversity and completeness of the training data (illustrated in the sketch after this list).
Prejudice bias involves conscious or subconscious injection of discriminatory societal prejudices into the data curation process.
Reporting bias stems from inconsistencies in how events and information are recorded and categorized.
Observer bias occurs when human perceptions and assumptions influence data gathering and labeling decisions.
Confirmation bias sees data collectors preferentially seek out or emphasize data that adheres to their existing hypotheses or worldviews.
Exclusion bias is the deliberate or unintentional omission of specific population groups, perspectives, or variables from data collection.
Measurement bias flows from systematic errors in how data is numerically quantified or characterized.
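To make the first of these, selection/sampling bias, concrete: the following minimal Python sketch (the group names and proportions are hypothetical) surfaces skew by comparing each group's share of a collected sample against its share of the population the data is meant to represent.

```python
from collections import Counter

# Hypothetical population and collected sample; a real audit would use
# actual records and a trusted reference distribution (e.g. census data).
population = ["group_a"] * 500 + ["group_b"] * 500   # 50/50 in reality
sample     = ["group_a"] * 80  + ["group_b"] * 20    # skewed collection

def group_shares(records):
    """Return each group's share of the given records."""
    counts = Counter(records)
    total = sum(counts.values())
    return {group: count / total for group, count in counts.items()}

pop_shares = group_shares(population)
for group, share in group_shares(sample).items():
    gap = share - pop_shares[group]
    print(f"{group}: sample {share:.0%} vs population {pop_shares[group]:.0%} "
          f"(gap {gap:+.0%})")
```

The same share-comparison idea extends naturally to checking for exclusion bias, where the question is whether a group's share of the sample has dropped to zero.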
The Second Layer - Algorithmic Bias
The next layer of bias flows from limitations in the AI model algorithms themselves. Even if the data inputs were pristine, issues can emerge from the training process and statistical models. This algorithmic bias, also called technical bias, arises when an AI model learns or amplifies skewed relationships, correlations, or rules through its training process and design choices.
Here are some examples of Algorithmic Bias:
Representation bias occurs when a model fails to sufficiently represent certain groups, identities or perspectives in its parameters or architecture.
Evaluation bias arises when models are tested on non-representative datasets, obscuring unfair performance differences (see the sketch below).
Overgeneralization bias happens when a model draws broad biased conclusions from limited data.
Underspecialization bias stems from models failing to capture relevant specialized cases, nuances and exceptions.
Inductive bias refers to assumptions and simplifications implicitly encoded into a model’s structure by its designers.
These examples demonstrate how bias can become baked into models themselves, irrespective of data inputs.
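To see one of these in miniature, here is a sketch of evaluation bias using made-up labels, predictions, and group memberships: the aggregate accuracy looks respectable, while a disaggregated, per-group view reveals that one group is served far worse.

```python
# Hypothetical ground-truth labels, model predictions, and group memberships.
y_true = [1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0]
groups = ["a"] * 8 + ["b"] * 4

def accuracy(pairs):
    """Fraction of (truth, prediction) pairs that match."""
    pairs = list(pairs)
    return sum(t == p for t, p in pairs) / len(pairs)

# The aggregate number hides the disparity...
print(f"overall: {accuracy(zip(y_true, y_pred)):.0%}")           # 75%

# ...while per-group accuracy exposes it.
for g in sorted(set(groups)):
    subset = [(t, p) for t, p, grp in zip(y_true, y_pred, groups) if grp == g]
    print(f"group {g}: {accuracy(subset):.0%}")                   # a: 100%, b: 25%
```

If the evaluation set contained only records from group a, the model would appear flawless; this is precisely how testing on non-representative data obscures unfair performance differences.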
The Third Layer - Systemic Bias
The final layer of bias flows from the integration of AI systems into the broader social context. Even if data and algorithms are pristine, real-world deployment can propagate bias through interactions with people, policies, and environments. This systemic bias compounds underlying technical issues.
Several forms of systemic bias may emerge:
Deployment bias stems from inequitable access to AI systems across population groups.
Reporting bias arises when humans interact differently with certain groups while collecting AI input data.
Behavioral bias flows from people anticipating and reacting differently to AI systems based on race, gender, age and other attributes.
Composition bias occurs when an AI workforce lacks diversity, propagating the biases of homogenized teams.
Power bias arises when AI concentrates decision-making power among privileged groups.
Policy bias flows from AI governance frameworks that insufficiently address biases.
A good example of systemic bias is Amazon's AI recruiting tool, which learned to discriminate against women because it was trained on resumes drawn from a historically male-dominated applicant pool.
End Note
In summary, bias in AI manifests in layered forms, cascading from data inputs to algorithms to real-world systems. No single bias operates in isolation.
Addressing data bias requires scrutinizing training data integrity and representativeness. Algorithmic bias mitigation involves monitoring for distortions during model development. Combating systemic bias necessitates a proactive evaluation of integration plans.
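As one concrete illustration of such monitoring, here is a minimal sketch of a demographic parity check; the predictions and group labels are hypothetical, and a production audit would typically combine several metrics through established fairness toolkits.

```python
# Hypothetical model decisions (1 = favorable outcome) and group labels.
preds  = [1, 1, 0, 1, 0, 1, 0, 0, 0, 1]
groups = ["a", "a", "a", "a", "a", "b", "b", "b", "b", "b"]

def positive_rate(group):
    """Share of favorable outcomes the model assigns to a group."""
    outcomes = [p for p, g in zip(preds, groups) if g == group]
    return sum(outcomes) / len(outcomes)

rate_a, rate_b = positive_rate("a"), positive_rate("b")
print(f"group a: {rate_a:.0%}, group b: {rate_b:.0%}, "
      f"parity gap: {abs(rate_a - rate_b):.0%}")
```

A persistent gap like this does not prove discrimination on its own, but it flags where a deeper look at the data and the deployment context is warranted.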
While challenging, acknowledging the layered nature of bias provides clearer insight into AI's complexities. Only through holistic understanding can we work to minimize harm. Progress requires cross-disciplinary collaboration and nuanced perspectives.
There are no quick fixes, but many promising paths forward. Developing representative data benchmarks, auditing algorithms, and pioneering inclusive governance frameworks offer potential solutions.
The journey towards equitable, ethical AI demands diligence, imagination and good faith efforts. But the goal of maximizing benefits while minimizing unintended consequences makes it worthwhile. With care and wisdom, we can steadily reduce bias one layer at a time.
If you’d like to read more on this topic, check out ‘s essay, as it was the original inspiration for this piece.