This is the third article in my series on the 2025 AI Index Report.
This one covers the report's third chapter: Responsible AI, or as it's sometimes known, the boring stuff that becomes extremely interesting the moment an AI system leaks patient records, suggests financial fraud, or rewrites history during an election year.
The truth is, we're building systems that can outperform humans in language, code, reasoning, and even manipulation. But when it comes to safety, fairness, and accountability? Let's just say the gap between capability and responsibility is still very wide.
So let's take a look at what the AI Index has to say about how the world is really doing on responsible AI: the risks we're seeing, the progress we're making, and the few serious efforts actually trying to fix it.
1. The Problems Are Growing: Incidents, Hallucinations, and Hidden Biases
AI Incidents Are on the Rise
The AI Incident Database documents everything from mildly concerning missteps to major failures with legal, ethical, or social consequences. In 2024 alone, cases ranged from customer service chatbots giving dangerous medical advice, to AI-powered resume screening tools reinforcing discriminatory hiring patterns. There have also been reports of content recommendation algorithms suggesting self-harm content to vulnerable users.

Importantly, the rising number of reports isn't necessarily a signal of deteriorating model performance. It's just as likely the result of three things happening at once:
- Wider adoption across industries and demographics
- Greater public literacy about what AI is (and isn't) supposed to do
- Stronger oversight mechanisms: internal, governmental, and journalistic
In other words, we're not necessarily seeing more failures because AI is worse, but probably because more eyes are watching more closely, and that's progress in its own right.
The figure below shows that adversarial attacks and privacy violations are the most commonly reported AI-related incidents over the past two years. Not far behind are unintended decision-making, model bias, and performance failures.

Hallucinations
If you've ever asked an LLM to summarize a legal opinion and it confidently cited a court case that doesn't exist, you've seen a hallucination. Not because the model is broken, but because it's doing exactly what it was built to do: generate plausible language. And in that sense, it's behaving a lot like a person: filling in gaps, smoothing over uncertainty, even making things up and delivering it all with a straight face (I know, it feels like I am describing someone you know).
In medical use cases, hallucinations can be dangerous. Evaluations show that even top-tier models like GPT-4 fabricate plausible but incorrect diagnoses or treatment plans with surprising regularity. One study on the FACTS benchmark found GPT-4 produced factually grounded responses only 61.7% of the time when dealing with clinical documents. In law, the stakes are no lower. GPT-based tools have generated entire legal opinions built on fictional precedents, and in at least one real-world case, lawyers were sanctioned in federal court after submitting filings that included completely fabricated case citations generated by AI.
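
How do benchmarks like FACTS actually score groundedness? At a high level, by checking whether each claim in a response is supported by the source document. Here's a minimal sketch of that idea using an off-the-shelf NLI model; the model choice, the crude sentence splitting, the `grounded_fraction` helper, and the threshold are my own illustrative stand-ins, not the benchmark's actual pipeline.

```python
# Sketch: score an answer by the fraction of its sentences that an NLI model
# judges as entailed by the source document. Illustrative only.
from transformers import pipeline

nli = pipeline("text-classification", model="roberta-large-mnli")

def grounded_fraction(source_doc: str, answer: str) -> float:
    """Fraction of answer sentences entailed by the source document."""
    sentences = [s.strip() for s in answer.split(".") if s.strip()]
    if not sentences:
        return 0.0
    entailed = 0
    for sent in sentences:
        # Premise = source document, hypothesis = the claim being checked.
        # (Long documents would need chunking; skipped here for brevity.)
        result = nli({"text": source_doc, "text_pair": sent})[0]
        if result["label"] == "ENTAILMENT":
            entailed += 1
    return entailed / len(sentences)

# Anything below, say, 0.6 could flag the response for human review.
```

Real evaluations are more sophisticated than a sentence-level NLI pass, but the shape of the task is the same: claim by claim, is this actually in the source?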

That said, hallucinations aren't just a bug; they're the byproduct of what makes these models so useful. The ability to generate new ideas, analogies, or explanations without copying from seen examples is exactly what enables them to write poems, brainstorm product names, or explain quantum mechanics to a 10-year-old. It's that creative edge that makes them powerful, and also what makes them unreliable. The tension between fluency and factuality isn't going away anytime soon. It's one of the core design dilemmas of generative AI.
Bias
Efforts to reduce bias in LLMs have come a long way. Many models now undergo extensive fine-tuning and red-teaming focused on fairness, neutrality, and respect for diversity. But here's the twist: models that perform well on bias benchmarks can still exhibit problematic outputs in less constrained, real-world conditions.
One recent example: a study found that even after fairness training, vision-language models were more likely to associate darker-skinned individuals with criminal behavior in image-captioning tasks. These were models that had passed conventional bias tests. Yet subtle, deeply encoded stereotypes remained.
Similarly, in natural language, a model may pass structured tests but still display gendered assumptions in more open-ended generation. Ask it to describe a CEO, and it may skew male; a nurse, and it may skew female. These are reflections of the statistical patterns in the data the model was trained on.
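
You can see this kind of skew for yourself with a crude probe: sample a batch of open-ended completions and count gendered pronouns. A toy sketch, using gpt2 as a small illustrative model (not one of the systems discussed here); the prompts and the `pronoun_skew` helper are my own:

```python
# Toy implicit-bias probe: sample completions for a role prompt and
# count gendered pronouns in the generations.
from collections import Counter
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

def pronoun_skew(prompt: str, n_samples: int = 20) -> Counter:
    counts = Counter()
    outputs = generator(prompt, max_new_tokens=40, do_sample=True,
                        num_return_sequences=n_samples)
    for out in outputs:
        words = [w.strip(".,!?") for w in out["generated_text"].lower().split()]
        counts["he"] += sum(w in {"he", "him", "his"} for w in words)
        counts["she"] += sum(w in {"she", "her", "hers"} for w in words)
    return counts

print(pronoun_skew("The CEO walked into the room."))
print(pronoun_skew("The nurse walked into the room."))
```

A single run proves nothing, of course; the point is that this kind of open-ended sampling surfaces associations that structured multiple-choice benchmarks never touch.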
This is the difference between explicit and implicit bias. Explicit bias is what shows up in structured tests with clear labels and categories, and it's relatively easy to measure and fix. Implicit bias, on the other hand, is more subtle. It shows up in tone, framing, word choice, and associations. And because it hides in plain sight, it's much harder to detect and even harder to eliminate. Yet its impact, especially when scaled across millions of users, is just as real.

So while it's tempting to declare success after a benchmark score bumps by a few points, real-world deployments keep reminding us: bias doesn't vanish just because it's harder to measure.
Agent mistakes
Once upon a time, all a language model could do was answer your question. Now? It can book your meeting, scrape a website, call an API, update your spreadsheet, and send an email to your boss. All before you've finished your coffee.
Welcome to the age of agents: AI systems that don't just respond, they act. They plan, they reason (sort of), and they try things. And when they mess up, it's rarely a one-off. It's a chain reaction.
In 2024, researchers behind the Agent Smith study showed just how fast things can unravel. One cleverly crafted image managed to compromise an entire fleet of thousands of multimodal agents by exploiting a single, shared vulnerability. Why? Because these systems aren't operating in silos. They're linked, and they talk to each other.
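
The mechanism is easy to grasp with a toy simulation: if each compromised agent passes content to even a handful of peers per round, infection grows exponentially. Everything below (fleet size, contact rate, the `simulate_spread` helper) is illustrative, not the paper's actual setup:

```python
# Toy simulation of cascading agent compromise: one poisoned payload
# spreads through a fleet of communicating agents, round by round.
import random

def simulate_spread(n_agents=10_000, contacts_per_round=3, rounds=10, seed=0):
    random.seed(seed)
    infected = {0}  # a single agent receives the malicious content
    for r in range(rounds):
        newly = set()
        for _ in infected:
            # Each infected agent shares content with a few random peers.
            newly.update(random.randrange(n_agents)
                         for _ in range(contacts_per_round))
        infected |= newly
        print(f"round {r + 1}: {len(infected)} / {n_agents} agents compromised")
        if len(infected) == n_agents:
            break

simulate_spread()
```

Run it and the fleet is essentially gone within a handful of rounds. That's the difference between one bad answer and a systemic failure.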
Here's the issue: most current safety tools aren't built for this. They're designed to check if a model answers a prompt badly, not if it collaborates badly with other models in a live system. And as these agents become more common in customer support, automation, search, and even software development, that gap starts to matter.
We've taught our models how to act. Now we need to teach them how to fail gracefully, and preferably alone.
2. Awareness Is Growing, but Gaps Remain
Responsible AI Is Getting Attention
There's no denying the shift. Research in Responsible AI has accelerated rapidly, with the number of papers focused on safety, fairness, robustness, and governance more than doubling in recent years.

And this awareness isn't limited to academia. According to the AI Index, more companies are assigning responsibility for AI governance to senior leadership, especially Chief Data Officers, CTOs, and increasingly, CEOs. In 2024, nearly half of organizations surveyed reported that a C-level executive is now formally responsible for AI risk and compliance.

The strategic value of Responsible AI is also becoming clearer: it's increasingly seen as a business asset. From customer trust to reputational resilience, organizations that prioritize safety and fairness can see competitive upside.

The Gap Between Awareness and Action
Still, awareness alone isn't enough. The data shows a significant disconnect between the risks companies recognize and the ones they actually work to mitigate.

Take explainability. Despite being foundational to understanding and debugging AI systems, only about 40% of organizations even consider it a risk. This suggests a deeper lack of clarity about what explainability actually means, and how it underpins every other aspect of trust, from bias mitigation to error correction.
The most common reasons cited for the gap? Lack of knowledge, inadequate tools, and unclear definitions, all pointing to a still-immature understanding of Responsible AI at the operational level.

Maturity Is Growing
There's also progress in formalizing AI governance structures. Maturity assessments show that more organizations are putting frameworks in place, conducting internal audits, and assigning formal roles to manage AI risk.
But there's a catch: organizational maturity is often outpacing operational readiness. Companies may have policies on paper, but lack the day-to-day mechanisms to implement them consistently. It's what researchers call the "governance illusion": the belief that having a framework is the same as enforcing it.

And the disparity isn't the same everywhere: governance maturity varies noticeably across regions and industries.

3. Governments, Companies, and Researchers Are Trying
Policy Is Catching Up (Mostly)
The EU AI Act is leading the global charge in turning abstract principles into enforceable obligations. It classifies AI systems by risk, mandates transparency for high-risk applications, and sets out real consequences for non-compliance. It's not perfect, but it's the most comprehensive attempt yet to regulate frontier AI at scale.
Other initiatives, including the U.S. Executive Order on AI and OECD AI Principles, are helping lay the groundwork for broader alignment. Still, enforcement and global coordination remain limited. Most efforts are national or regional, and the pace of policymaking continues to lag behind the pace of model deployment.
Companies Are Building Frameworks, but Still Avoid Transparency
Many companies are building internal governance processes, publishing model cards, and participating in benchmarking initiatives. But when it comes to actual transparency, the numbers aren't flattering.
According to the Foundation Model Transparency Index (FMTI), the average developer scored 58 out of 100. Disclosure is especially weak around training data sources, labor practices, and real-world impacts.

Data licensing is another weak spot. The Index highlights how many datasets used in model training are licensed ambiguously, or not at all. This matters not just for legal compliance, but for basic accountability.

Meanwhile, sectors like medicine, education, and journalism are increasingly locking down their data, pushing back against AI systems that were built by scraping everything available online.

Researchers Are Raising the Bar on Evaluation
The academic community has stepped up with new benchmarks and tools that go beyond accuracy and into real-world behavior.
HELM (Holistic Evaluation of Language Models) and AIR Bench are two of the most promising initiatives:
- HELM evaluates models across a wide range of tasks and metrics for performance, robustness, fairness, calibration, and efficiency.
- AIR Bench focuses on alignment and instruction-following, making it more representative of how LLMs are actually used.
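
Neither tool reduces to a few lines of code, but the underlying idea does: run one model across many scenarios and score each on more than accuracy. Here's a minimal sketch of that pattern; the `evaluate` harness, the scenarios, and the metrics are all placeholders of my own, not HELM's actual API:

```python
# Sketch of holistic evaluation: one model, several scenarios,
# several metrics per scenario.
from statistics import mean

def evaluate(model_fn, scenarios, metrics):
    """scenarios: name -> list of (prompt, reference) pairs.
    metrics: name -> fn(prediction, reference) -> float in [0, 1]."""
    report = {}
    for scenario, examples in scenarios.items():
        preds = [(model_fn(prompt), ref) for prompt, ref in examples]
        report[scenario] = {
            m_name: mean(m(pred, ref) for pred, ref in preds)
            for m_name, m in metrics.items()
        }
    return report

# Toy usage: exact match plus a trivial second metric.
scenarios = {"qa": [("2+2=", "4"), ("Capital of France?", "Paris")]}
metrics = {
    "exact_match": lambda pred, ref: float(pred.strip() == ref),
    "non_empty": lambda pred, ref: float(bool(pred.strip())),
}
print(evaluate(lambda p: "4" if "2+2" in p else "Paris", scenarios, metrics))
```

The value of the real benchmarks is in the breadth and rigor of those scenarios and metrics, but the report format (one model, many axes) is exactly this.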


There's also progress on adversarial robustness. One standout technique: Targeted Latent Adversarial Training (LAT). It tweaks internal model representations to reduce vulnerability to harmful prompts, without degrading performance.
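
In rough outline, the training loop has an inner step that searches latent space for a perturbation that elicits the unwanted behavior, and an outer step that updates the model to stay safe even under that perturbation. Below is a heavily simplified sketch with a toy network; real targeted LAT operates on transformer activations with a constrained attack, and every name and number here is illustrative:

```python
# Heavily simplified latent adversarial training loop on a toy MLP.
import torch
import torch.nn as nn

encoder = nn.Linear(16, 32)   # stand-in for the model's lower layers
head = nn.Linear(32, 2)       # stand-in for the rest of the model
opt = torch.optim.Adam(list(encoder.parameters()) + list(head.parameters()),
                       lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
eps = 0.1                     # attack budget in latent space

x = torch.randn(8, 16)                         # batch of inputs
safe_y = torch.zeros(8, dtype=torch.long)      # desired (safe) behavior
harmful_y = torch.ones(8, dtype=torch.long)    # behavior the attack targets

for step in range(100):
    h = encoder(x)
    # Inner step: find a latent perturbation that pushes toward harmful_y.
    delta = torch.zeros_like(h, requires_grad=True)
    attack_loss = loss_fn(head(h.detach() + delta), harmful_y)
    grad, = torch.autograd.grad(attack_loss, delta)
    delta = -eps * grad.sign()   # move *toward* the harmful target
    # Outer step: train the model to stay safe under the perturbed latent.
    loss = loss_fn(head(h + delta), safe_y)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The interesting property is that the attack lives inside the model rather than in the input, which is why this approach can harden against prompts no one thought to red-team.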

And Then There's Misinformation
As we head into global election cycles, LLMs are poised to play an uncomfortably large role. They can generate political propaganda, impersonate candidates, and produce fake news articles indistinguishable from real ones. Several red-teaming efforts have already shown how easy it is to prompt models into generating voter suppression tactics or fabricated policy statements that sound alarmingly official.
The tools to detect and prevent this kind of misuse are still playing catch-up. And once misinformation is out, it becomes a training problem. That false information can make its way back into future models, reinforcing itself through the next generation of training runs.
Efforts like SynthID, a digital watermarking system developed by Google DeepMind, are a step in the right direction. SynthID embeds imperceptible tags into AI-generated content across text, audio, image, and video, helping platforms trace and flag synthetic outputs. It's not a silver bullet, but it's a sign that at least some developers are thinking about provenance and accountability at the point of generation, not just after the damage is done.
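
Google hasn't laid out every detail of SynthID in a form I'd reproduce here, so the sketch below shows a simpler idea from the same family of published text watermarks: seed a pseudo-random "green" subset of the vocabulary from the preceding token, bias generation toward it, and later test whether a text is suspiciously green-heavy. The `is_green` and `green_fraction` helpers are mine, and this is not SynthID's actual algorithm:

```python
# Sketch of a "green list" text watermark detector.
import hashlib

def is_green(prev_token: str, token: str) -> bool:
    # A hash of the token pair deterministically assigns ~half the
    # vocabulary to the "green" set at each step.
    digest = hashlib.sha256(f"{prev_token}|{token}".encode()).digest()
    return digest[0] % 2 == 0

def green_fraction(tokens: list[str]) -> float:
    hits = sum(is_green(prev, tok) for prev, tok in zip(tokens, tokens[1:]))
    return hits / max(len(tokens) - 1, 1)

# Ordinary text hovers near 0.5; text generated with sampling biased
# toward green tokens sits measurably higher, which a detector that
# knows the hash can test statistically.
print(green_fraction("the model wrote this sentence yesterday".split()))
```

The appeal of schemes like this is that detection needs only the key, not the model, which is what makes provenance checking at platform scale even conceivable.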
Conclusion
It would be unfair to say nothing is being done. In fact, 2024 marks a turning point. More companies are treating Responsible AI as a core function. Governments are finally moving beyond guidelines into regulation. Researchers are building tools that evaluate AI systems more rigorously. And we're starting to see real innovation in benchmarks, red teaming, transparency, and risk modeling.
But we're not there yet. The AI Index shows clearly: for every step forward, there's a challenge still unaddressed. Many of the most pressing risks, like implicit bias, cascading agent failures, and misinformation loops, are still poorly understood, let alone consistently measured. Even when we can measure them, mitigation is often ad hoc or underfunded.
The field needs a complete accounting of where AI systems can fail, how often those failures occur, and what it takes to meaningfully reduce harm. That means better data, tools, incentives, and habits, because responsibility can't be retrofitted after deployment. We've made progress. But if these systems are going to scale safely into law, education, healthcare, and infrastructure, then so must our capacity to document risk, evaluate it rigorously, and respond like it matters. Because it does.