This is the third article in my series on the 2025 AI Index Report.
This one covers the report's third chapter: Responsible AI, or as it's sometimes known, the boring stuff that becomes extremely interesting the moment an AI system leaks patient records, suggests financial fraud, or rewrites history during an election year.
The truth is, we're building systems that can outperform humans in language, code, reasoning, and even manipulation. But when it comes to safety, fairness, and accountability? Let's just say the gap between capability and responsibility is still very wide.
So let's take a look at what the AI Index has to say about how the world is really doing on responsible AI: the risks we're seeing, the progress we're making, and the few serious efforts actually trying to fix it.
1. The Problems Are Growing: Incidents, Hallucinations, and Hidden Biases
AI Incidents Are on the Rise
The AI Incident Database documents everything from mildly concerning missteps to major failures with legal, ethical, or social consequences. In 2024 alone, cases ranged from customer service chatbots giving dangerous medical advice, to AI-powered resume screening tools reinforcing discriminatory hiring patterns. There have also been reports of content recommendation algorithms suggesting self-harm content to vulnerable users.

Importantly, the rising number of reports isn't necessarily a signal of deteriorating model performance. It's just as likely the result of three things happening at once:
- Wider adoption across industries and demographics
- Greater public literacy about what AI is (and isn't) supposed to do
- Stronger oversight mechanisms: internal, governmental, and journalistic
In other words, we're not necessarily seeing more failures because AI is worse, but probably because more eyes are watching more closely, and that's progress in its own right.
The figure below shows that adversarial attacks and privacy violations are the most commonly reported AI-related incidents over the past two years. Not far behind are unintended decision-making, model bias, and performance failures.

Hallucinations
If you've ever asked an LLM to summarize a legal opinion and it confidently cited a court case that doesn't exist, you've seen a hallucination. Not because the model is broken, but because it's doing exactly what it was built to do: generate plausible language. And in that sense, it's behaving a lot like a person: filling in gaps, smoothing over uncertainty, even making things up and delivering it all with a straight face (I know, it feels like I am describing someone you know).
In medical use cases, hallucinations can be dangerous. Evaluations show that even top-tier models like GPT-4 fabricate plausible but incorrect diagnoses or treatment plans with surprising regularity. One study on the FACTS benchmark found GPT-4 produced factually grounded responses only 61.7% of the time when dealing with clinical documents. In law, the stakes are no lower. GPT-based tools have generated entire legal opinions built on fictional precedents, and in at least one real-world case, lawyers were sanctioned in federal court after submitting filings that included completely fabricated case citations generated by AI.
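
How do benchmarks like FACTS actually score groundedness? At a high level, by checking whether each claim in a response is supported by the source document. Here's a minimal sketch of that idea using an off-the-shelf NLI model; the model choice, the crude sentence splitting, the `grounded_fraction` helper, and the threshold are my own illustrative stand-ins, not the benchmark's actual pipeline.

```python
# Sketch: score an answer by the fraction of its sentences that an NLI model
# judges as entailed by the source document. Illustrative only.
from transformers import pipeline

nli = pipeline("text-classification", model="roberta-large-mnli")

def grounded_fraction(source_doc: str, answer: str) -> float:
    """Fraction of answer sentences entailed by the source document."""
    sentences = [s.strip() for s in answer.split(".") if s.strip()]
    if not sentences:
        return 0.0
    entailed = 0
    for sent in sentences:
        # Premise = source document, hypothesis = the claim being checked.
        # (Long documents would need chunking; skipped here for brevity.)
        result = nli({"text": source_doc, "text_pair": sent})[0]
        if result["label"] == "ENTAILMENT":
            entailed += 1
    return entailed / len(sentences)

# Anything below, say, 0.6 could flag the response for human review.
```

Real evaluations are more sophisticated than a sentence-level NLI pass, but the shape of the task is the same: claim by claim, is this actually in the source?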

That said, hallucinations aren't just a bug; they're the byproduct of what makes these models so useful. The ability to generate new ideas, analogies, or explanations without copying from seen examples is exactly what enables them to write poems, brainstorm product names, or explain quantum mechanics to a 10-year-old. It's that creative edge that makes them powerful, and also what makes them unreliable. The tension between fluency and factuality isn't going away anytime soon. It's one of the core design dilemmas of generative AI.
Bias
Efforts to reduce bias in LLMs have come a long way. Many models now undergo extensive fine-tuning and red-teaming focused on fairness, neutrality, and respect for diversity. But here's the twist: models that perform well on bias benchmarks can still exhibit problematic outputs in less constrained, real-world conditions.
One recent example: a study found that even after fairness training, vision-language models were more likely to associate darker-skinned individuals with criminal behavior in image-captioning tasks. These were models that had passed conventional bias tests. Yet subtle, deeply encoded stereotypes remained.
Similarly, in natural language, a model may pass structured tests but still display gendered assumptions in more open-ended generation. Ask it to describe a CEO, and it may skew male; a nurse, and it may skew female. These are reflections of the statistical patterns in the data the model was trained on.
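
You can see this kind of skew for yourself with a crude probe: sample a batch of open-ended completions and count gendered pronouns. A toy sketch, using gpt2 as a small illustrative model (not one of the systems discussed here); the prompts and the `pronoun_skew` helper are my own:

```python
# Toy implicit-bias probe: sample completions for a role prompt and
# count gendered pronouns in the generations.
from collections import Counter
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

def pronoun_skew(prompt: str, n_samples: int = 20) -> Counter:
    counts = Counter()
    outputs = generator(prompt, max_new_tokens=40, do_sample=True,
                        num_return_sequences=n_samples)
    for out in outputs:
        words = [w.strip(".,!?") for w in out["generated_text"].lower().split()]
        counts["he"] += sum(w in {"he", "him", "his"} for w in words)
        counts["she"] += sum(w in {"she", "her", "hers"} for w in words)
    return counts

print(pronoun_skew("The CEO walked into the room."))
print(pronoun_skew("The nurse walked into the room."))
```

A single run proves nothing, of course; the point is that this kind of open-ended sampling surfaces associations that structured multiple-choice benchmarks never touch.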
This is the difference between explicit and implicit bias. Explicit bias is what shows up in structured tests with clear labels and categories, and it's relatively easy to measure and fix. Implicit bias, on the other hand, is more subtle. It shows up in tone, framing, word choice, and associations. And because it hides in plain sight, it's much harder to detect and even harder to eliminate. Yet its impact, especially when scaled across millions of users, is just as real.

So while it's tempting to declare success after a benchmark score bumps by a few points, real-world deployments keep reminding us: bias doesn't vanish just because it's harder to measure.
Agent mistakes
Once upon a time, all a language model could do was answer your question. Now? It can book your meeting, scrape a website, call an API, update your spreadsheet, and send an email to your boss. All before you've finished your coffee.
Welcome to the age of agents: AI systems that don't just respond, they act. They plan, they reason (sort of), and they try things. And when they mess up, it's rarely a one-off. It's a chain reaction.
In 2024, researchers behind the Agent Smith study showed just how fast things can unravel. One cleverly crafted image managed to compromise an entire fleet of thousands of multimodal agents by exploiting a single, shared vulnerability. Why? Because these systems aren't operating in silos. They're linked, and they talk to each other.
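
The mechanism is easy to grasp with a toy simulation: if each compromised agent passes content to even a handful of peers per round, infection grows exponentially. Everything below (fleet size, contact rate, the `simulate_spread` helper) is illustrative, not the paper's actual setup:

```python
# Toy simulation of cascading agent compromise: one poisoned payload
# spreads through a fleet of communicating agents, round by round.
import random

def simulate_spread(n_agents=10_000, contacts_per_round=3, rounds=10, seed=0):
    random.seed(seed)
    infected = {0}  # a single agent receives the malicious content
    for r in range(rounds):
        newly = set()
        for _ in infected:
            # Each infected agent shares content with a few random peers.
            newly.update(random.randrange(n_agents)
                         for _ in range(contacts_per_round))
        infected |= newly
        print(f"round {r + 1}: {len(infected)} / {n_agents} agents compromised")
        if len(infected) == n_agents:
            break

simulate_spread()
```

Run it and the fleet is essentially gone within a handful of rounds. That's the difference between one bad answer and a systemic failure.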
Here's the issue: most current safety tools aren't built for this. They're designed to check if a model answers a prompt badly, not if it collaborates badly with other models in a live system. And as these agents become more common in customer support, automation, search, and even software development, that gap starts to matter.
We've taught our models how to act. Now we need to teach them how to fail gracefully, and preferably alone.
2. Awareness Is Growing, but Gaps Remain
Responsible AI Is Getting Attention
There's no denying the shift. Research in Responsible AI has accelerated rapidly, with the number of papers focused on safety, fairness, robustness, and governance more than doubling in recent years.

And this awareness isn't limited to academia. According to the AI Index, more companies are assigning responsibility for AI governance to senior leadership, especially Chief Data Officers, CTOs, and increasingly, CEOs. In 2024, nearly half of organizations surveyed reported that a C-level executive is now formally responsible for AI risk and compliance.

The strategic value of Responsible AI is also becoming clearer: it's increasingly seen as a business asset. From customer trust to reputational resilience, organizations that prioritize safety and fairness can see competitive upside.

The Gap Between Awareness and Action
Still, awareness alone isn't enough. The data shows a significant disconnect between the risks companies recognize and the ones they actually work to mitigate.

Take explainability. Despite being foundational to understanding and debugging AI systems, only about 40% of organizations even consider it a risk. This suggests a deeper lack of clarity about what explainability actually means, and how it underpins every other aspect of trust, from bias mitigation to error correction.
The most common reasons cited for the gap? Lack of knowledge, inadequate tools, and unclear definitions, all pointing to a still-immature understanding of Responsible AI at the operational level.

Maturity Is Growing
There's also progress in formalizing AI governance structures. Maturity assessments show that more organizations are putting frameworks in place, conducting internal audits, and assigning formal roles to manage AI risk.
But there's a catch: organizational maturity is often outpacing operational readiness. Companies may have policies on paper, but lack the day-to-day mechanisms to implement them consistently. It's what researchers call the "governance illusion": the belief that having a framework is the same as enforcing it.

And the disparity isn't the same everywhere: governance maturity varies noticeably across regions and industries.

3. Governments, Companies, and Researchers Are Trying
Policy Is Catching Up (Mostly)
The EU AI Act is leading the global charge in turning abstract principles into enforceable obligations. It classifies AI systems by risk, mandates transparency for high-risk applications, and sets out real consequences for non-compliance. It's not perfect, but it's the most comprehensive attempt yet to regulate frontier AI at scale.
Other initiatives, including the U.S. Executive Order on AI and OECD AI Principles, are helping lay the groundwork for broader alignment. Still, enforcement and global coordination remain limited. Most efforts are national or regional, and the pace of policymaking continues to lag behind the pace of model deployment.
Companies Are Building Frameworks, but Still Avoid Transparency
Many companies are building internal governance processes, publishing model cards, and participating in benchmarking initiatives. But when it comes to actual transparency, the numbers aren't flattering.
According to the Foundation Model Transparency Index (FMTI), the average developer scored 58 out of 100. Disclosure is especially weak around training data sources, labor practices, and real-world impacts.

Data licensing is another weak spot. The Index highlights how many datasets used in model training are licensed ambiguously, or not at all. This matters not just for legal compliance, but for basic accountability.

Meanwhile, sectors like medicine, education, and journalism are increasingly locking down their data, pushing back against AI systems that were built by scraping everything available online.

Researchers Are Raising the Bar on Evaluation
The academic community has stepped up with new benchmarks and tools that go beyond accuracy and into real-world behavior.
HELM (Holistic Evaluation of Language Models) and AIR Bench are two of the most promising initiatives:
- HELM evaluates models across a wide range of tasks and metrics for performance, robustness, fairness, calibration, and efficiency.
- AIR Bench focuses on alignment and instruction-following, making it more representative of how LLMs are actually used.
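
Neither tool reduces to a few lines of code, but the underlying idea does: run one model across many scenarios and score each on more than accuracy. Here's a minimal sketch of that pattern; the `evaluate` harness, the scenarios, and the metrics are all placeholders of my own, not HELM's actual API:

```python
# Sketch of holistic evaluation: one model, several scenarios,
# several metrics per scenario.
from statistics import mean

def evaluate(model_fn, scenarios, metrics):
    """scenarios: name -> list of (prompt, reference) pairs.
    metrics: name -> fn(prediction, reference) -> float in [0, 1]."""
    report = {}
    for scenario, examples in scenarios.items():
        preds = [(model_fn(prompt), ref) for prompt, ref in examples]
        report[scenario] = {
            m_name: mean(m(pred, ref) for pred, ref in preds)
            for m_name, m in metrics.items()
        }
    return report

# Toy usage: exact match plus a trivial second metric.
scenarios = {"qa": [("2+2=", "4"), ("Capital of France?", "Paris")]}
metrics = {
    "exact_match": lambda pred, ref: float(pred.strip() == ref),
    "non_empty": lambda pred, ref: float(bool(pred.strip())),
}
print(evaluate(lambda p: "4" if "2+2" in p else "Paris", scenarios, metrics))
```

The value of the real benchmarks is in the breadth and rigor of those scenarios and metrics, but the report format (one model, many axes) is exactly this.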


There's also progress on adversarial robustness. One standout technique: Targeted Latent Adversarial Training (LAT). It tweaks internal model representations to reduce vulnerability to harmful prompts, without degrading performance.
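
In rough outline, the training loop has an inner step that searches latent space for a perturbation that elicits the unwanted behavior, and an outer step that updates the model to stay safe even under that perturbation. Below is a heavily simplified sketch with a toy network; real targeted LAT operates on transformer activations with a constrained attack, and every name and number here is illustrative:

```python
# Heavily simplified latent adversarial training loop on a toy MLP.
import torch
import torch.nn as nn

encoder = nn.Linear(16, 32)   # stand-in for the model's lower layers
head = nn.Linear(32, 2)       # stand-in for the rest of the model
opt = torch.optim.Adam(list(encoder.parameters()) + list(head.parameters()),
                       lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
eps = 0.1                     # attack budget in latent space

x = torch.randn(8, 16)                         # batch of inputs
safe_y = torch.zeros(8, dtype=torch.long)      # desired (safe) behavior
harmful_y = torch.ones(8, dtype=torch.long)    # behavior the attack targets

for step in range(100):
    h = encoder(x)
    # Inner step: find a latent perturbation that pushes toward harmful_y.
    delta = torch.zeros_like(h, requires_grad=True)
    attack_loss = loss_fn(head(h.detach() + delta), harmful_y)
    grad, = torch.autograd.grad(attack_loss, delta)
    delta = -eps * grad.sign()   # move *toward* the harmful target
    # Outer step: train the model to stay safe under the perturbed latent.
    loss = loss_fn(head(h + delta), safe_y)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The interesting property is that the attack lives inside the model rather than in the input, which is why this approach can harden against prompts no one thought to red-team.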

And Then There's Misinformation
As we head into global election cycles, LLMs are poised to play an uncomfortably large role. They can generate political propaganda, impersonate candidates, and produce fake news articles indistinguishable from real ones. Several red-teaming efforts have already shown how easy it is to prompt models into generating voter suppression tactics or fabricated policy statements that sound alarmingly official.
The tools to detect and prevent this kind of misuse are still playing catch-up. And once misinformation is out, it becomes a training problem. That false information can make its way back into future models, reinforcing itself through the next generation of training runs.
Efforts like SynthID, a digital watermarking system developed by Google DeepMind, are a step in the right direction. SynthID embeds imperceptible tags into AI-generated content across text, audio, image, and video, helping platforms trace and flag synthetic outputs. It's not a silver bullet, but it's a sign that at least some developers are thinking about provenance and accountability at the point of generation, not just after the damage is done.
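
Google hasn't laid out every detail of SynthID in a form I'd reproduce here, so the sketch below shows a simpler idea from the same family of published text watermarks: seed a pseudo-random "green" subset of the vocabulary from the preceding token, bias generation toward it, and later test whether a text is suspiciously green-heavy. The `is_green` and `green_fraction` helpers are mine, and this is not SynthID's actual algorithm:

```python
# Sketch of a "green list" text watermark detector.
import hashlib

def is_green(prev_token: str, token: str) -> bool:
    # A hash of the token pair deterministically assigns ~half the
    # vocabulary to the "green" set at each step.
    digest = hashlib.sha256(f"{prev_token}|{token}".encode()).digest()
    return digest[0] % 2 == 0

def green_fraction(tokens: list[str]) -> float:
    hits = sum(is_green(prev, tok) for prev, tok in zip(tokens, tokens[1:]))
    return hits / max(len(tokens) - 1, 1)

# Ordinary text hovers near 0.5; text generated with sampling biased
# toward green tokens sits measurably higher, which a detector that
# knows the hash can test statistically.
print(green_fraction("the model wrote this sentence yesterday".split()))
```

The appeal of schemes like this is that detection needs only the key, not the model, which is what makes provenance checking at platform scale even conceivable.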
Conclusion
It would be unfair to say nothing is being done. In fact, 2024 marks a turning point. More companies are treating Responsible AI as a core function. Governments are finally moving beyond guidelines into regulation. Researchers are building tools that evaluate AI systems more rigorously. And we're starting to see real innovation in benchmarks, red teaming, transparency, and risk modeling.
But we're not there yet. The AI Index shows clearly: for every step forward, there's a challenge still unaddressed. Many of the most pressing risks, like implicit bias, cascading agent failures, and misinformation loops, are still poorly understood, let alone consistently measured. Even when we can measure them, mitigation is often ad hoc or underfunded.
The field needs a complete accounting of where AI systems can fail, how often those failures occur, and what it takes to meaningfully reduce harm. That means better data, tools, incentives, and habits, because responsibility can't be retrofitted after deployment. We've made progress. But if these systems are going to scale safely into law, education, healthcare, and infrastructure, then so must our capacity to document risk, evaluate it rigorously, and respond like it matters. Because it does.