Who Controls What AI Learns—And What It Leaves Out
We live in a time when artificial intelligence can write essays, recommend policies, generate images, and answer complex questions with convincing authority. And thanks to the rapid democratization of AI—especially open-source models—these tools are more widely available than ever before.
This is often celebrated as a win for innovation and access. But beneath the surface lies a less comfortable truth: every AI model has a built-in bullshit filter.
What do we mean by that?
Every AI system, no matter how open or advanced, is trained on a curated corpus of information. That data is not neutral. Someone—often behind closed doors—chooses what the model sees, and what it doesn’t. This filtering process shapes what AI knows, how it behaves, and what it tells the rest of us. And most users have no idea it’s happening.
The Upside of AI Democratization
There’s no doubt that lowering the barrier to entry for AI development and deployment brings powerful benefits. Startups, educators, researchers, and individuals now have access to tools once reserved for Big Tech. From local journalism to custom language learning, democratized AI is unlocking creativity and utility across sectors.
But availability alone does not mean transparency. And trust, in any system that influences our decisions, requires more than just access.
The Hidden Layer: What AI Gets to Learn
Every large language model is shaped by its training data—and every training dataset is filtered. Some sources are excluded for copyright reasons. Others are removed to avoid toxicity or controversy. Still others are prioritized because they are easy to scrape, or because they align with a particular worldview or language pattern.
These decisions amount to a bullshit filter: a quiet, upstream editorial process that decides which content is legitimate, useful, or relevant—and which is not.
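To make this concrete, here is a deliberately simplified, hypothetical sketch in Python of what such an upstream filter can look like. Every name, rule, and threshold below is invented for illustration and does not describe any real pipeline; the point is only that a handful of quiet predicates determine what a model ever gets to see.

```python
# A hypothetical, deliberately simplified sketch of an upstream data filter.
# Every name, rule, and threshold here is invented for illustration; real
# pipelines are far larger, but the shape is the same: a few quiet predicates
# decide which documents a model will ever see.

BLOCKED_DOMAINS = {"example-tabloid.com"}   # editorial choice: licensing or reputation
ALLOWED_LANGUAGES = {"en"}                  # editorial choice: minority languages drop out here
TOXICITY_THRESHOLD = 0.8                    # editorial choice: "toxicity" as scored by another model

def keep(doc: dict) -> bool:
    """Return True if a document survives the filter and enters the training corpus."""
    if doc["domain"] in BLOCKED_DOMAINS:
        return False
    if doc["language"] not in ALLOWED_LANGUAGES:
        return False
    if doc["toxicity_score"] > TOXICITY_THRESHOLD:
        return False
    return True

raw_documents = [
    {"domain": "example-news.org",    "language": "en", "toxicity_score": 0.1},
    {"domain": "example-tabloid.com", "language": "en", "toxicity_score": 0.2},
    {"domain": "example-blog.cy",     "language": "cy", "toxicity_score": 0.1},
]

corpus = [doc for doc in raw_documents if keep(doc)]
print(f"kept {len(corpus)} of {len(raw_documents)} documents")  # -> kept 1 of 3
```

None of these choices is announced to the end user, yet each one is an editorial judgment about what counts as legitimate data.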
But here’s the problem: we don’t know who is doing the filtering. In most cases, the model’s creators don’t disclose their data sources, let alone the criteria used to exclude certain content. This leaves users—citizens, policymakers, and developers—relying on black-box systems that carry invisible biases and blind spots.
What Gets Filtered Gets Forgotten
When entire schools of thought, minority languages, alternative histories, or controversial perspectives are underrepresented in training data, their absence is echoed in the model’s output. Over time, these gaps reinforce dominant narratives and marginalize dissent—not by malice, but by omission.
And as more of society relies on AI to inform decisions, the risk isn’t just skewed results—it’s a narrowing of collective understanding.
We begin to mistake algorithmic answers for objectivity, not realizing they are mediated by unseen filters. In this way, AI becomes not just a tool for information, but a gatekeeper of knowledge.
Responsible AI Means Exposing the Filter
If AI is to serve the public good, its bullshit filter must be made visible. This means:
- Transparency in training data: What went in, what was left out, and why (see the sketch after this list).
- Diverse data governance: Including communities who are most impacted by AI’s outputs.
- Clear accountability: Who benefits from the model, and who bears the cost of its blind spots?
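To give the first of these points a concrete shape, the sketch below imagines a machine-readable transparency record published alongside a model's weights. It is a hypothetical format, not an existing standard: the class and field names are invented, loosely echoing published ideas such as datasheets for datasets and model cards.

```python
# A hypothetical sketch of a machine-readable transparency record that a model
# release could ship alongside its weights. The class and field names are
# invented for illustration; they loosely echo published proposals such as
# datasheets for datasets and model cards.

from dataclasses import dataclass, field

@dataclass
class DataTransparencyRecord:
    sources_included: list[str]            # what went in
    sources_excluded: list[str]            # what was left out
    exclusion_rationale: dict[str, str]    # and why
    governance_bodies: list[str]           # who reviewed the choices
    known_gaps: list[str] = field(default_factory=list)  # acknowledged blind spots

record = DataTransparencyRecord(
    sources_included=["public-domain books", "permissively licensed code"],
    sources_excluded=["paywalled news archives", "minority-language forums"],
    exclusion_rationale={
        "paywalled news archives": "licensing",
        "minority-language forums": "low scrape volume",
    },
    governance_bodies=["external data review board (hypothetical)"],
    known_gaps=["underrepresentation of non-English perspectives"],
)
print(record.sources_excluded)
```

Whatever the exact format, the point is that the filter's decisions become an artifact outsiders can inspect and contest, rather than a private judgment call.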
At the Cambrian Institute, we believe that AI’s promise lies not just in access, but in accountability. Democratization without transparency is not progress—it’s obfuscation.
In a world where AI is rapidly becoming the lens through which people see, search, and solve, we must ask: Who’s adjusting the lens? Who’s tuning the bullshit filter? And are we okay with the answers it gives us?
Because the future of knowledge isn’t just about information—it’s about who gets to decide what counts as truth.