You retrained your model last week. How do you know it’s better than the one it replaced? Not “the benchmarks improved”, but: how do you certify it’s better, with the kind of documented evidence that survives a regulatory audit?
Your research agent just pulled a client’s portfolio data to generate a briefing. Who logged that access? What happens to that data after the briefing is delivered? If the client asks, can you produce the lineage in under an hour?
An agent is about to delete 200 files from a shared drive. It has permission. The files match the pattern. But some of them are active deal documents. Who decides whether “matches the pattern” is sufficient authorization for a destructive action, and is that decision framework written down anywhere, or does it live in someone’s head?
Now scale it up. Agent A at your firm executes a trade based on a research signal from Agent B at a counterparty. The trade goes wrong. Who’s liable: your firm, theirs, or the agent orchestration layer that connected them? Whose compliance policies governed the interaction? Where’s the audit trail that proves the interaction was within both firms’ entitlement bounds?
One more: Your compliance agent flags a suspicious transaction and routes it to your counterparty’s KYC agent for verification. Whose data retention policy applies to the shared investigation record? If regulators ask for the full chain of agent-to-agent decisions six months later, can either firm reconstruct it?
These aren’t hypotheticals. They’re what enterprises with 50+ agents in production are facing right now. And the numbers are sobering: 80% of enterprises have agent use cases in development, but only 14% have governance frameworks in place. Just 2% report their agents are governed in an always-on, consistent manner.¹ Gartner projects 40% of agentic AI projects will be canceled by end of 2027, primarily due to governance and risk control failures.²
The protocol stack for AI agents is solidifying fast. The governance layer is nowhere close.

What’s Been Solved — and What Hasn’t
To understand the governance gap, you need to understand what the agent protocol landscape actually looks like as of March 2026. It’s more mature than most practitioners realize — and the maturity is exactly what makes the governance gap so glaring.
Agent-to-tool is settled. The Model Context Protocol (MCP), introduced by Anthropic in November 2024 and donated to the Linux Foundation’s Agentic AI Foundation (AAIF) in December 2025, has achieved 97 million monthly SDK downloads.³ RedMonk called it “the fastest adopted standard we’ve ever seen.” Every major AI platform (OpenAI, Google, Microsoft, Cursor, GitHub Copilot) supports it. MCP solves tool discovery, credential isolation (agents never see raw API keys), and structured error handling. It’s the plumbing. It works.
Agent-to-agent is emerging. Google’s Agent-to-Agent Protocol (A2A), launched April 2025 and moved to AAIF governance in late 2025, provides agent discovery via Agent Cards, task lifecycle management, and multi-transport support (JSON-RPC, gRPC, HTTP/SSE).⁴ It has 150+ organizational supporters and five SDKs. Adoption lags MCP’s (multi-agent cross-boundary workflows are still early), but the spec is solid and the governance is neutral.
Tool-to-agent-to-human is getting easier. Claude Code’s elicitation hooks now allow MCP tools to ask the agent clarifying questions, which the agent can escalate to the human operator.⁵ This closes a loop that previously required developers to hardcode every possible tool interaction path. The direction is clear: agents mediating between tools and humans with structured escalation, not just executing blindly.
Agent-to-business and business-to-agent is where enterprise connectors live. Anthropic’s February 2026 enterprise launch shipped 11 open-sourced plugins spanning productivity, financial analysis, HR, and operations, with live MCP connectors to FactSet, S&P Capital IQ, LSEG, Gmail, Google Drive, DocuSign, and Slack.⁶ Microsoft’s Copilot Cowork, powered by Anthropic’s Claude, orchestrates across Outlook, Teams, Excel, and SharePoint within the customer’s M365 tenant.⁷ The enterprise integration layer is filling in rapidly.
Human-to-agent is the natural language interface everyone already uses: the chat window, the slash command, the scheduled task. This is the one layer nobody needs to worry about.
So what’s missing? Not the protocols. Not the tooling. Certainly not the integrations. What’s missing is governance, and it’s missing in two directions simultaneously. Inside the enterprise: how do you govern the agents you deploy? Between enterprises: when your agents interact with another firm’s agents, whose rules apply?
Neither question currently has a standard answer, and with agentic AI at an inflection point, both are urgent.
The Gap Has Two Dimensions
The conventional framing of the “AI governance problem” treats it as a single challenge. It isn’t. Enterprise agent governance splits cleanly into two distinct problem domains, and conflating them is why progress has stalled.
Intra-enterprise governance is the internal question: how does an organization govern its own agents? This covers model and agent evaluation (proving V2 is better than V1), versioning and rollback (managing agent lifecycles like software artifacts), data governance (PII, secrets, data lineage), destructive action prevention (permission models, kill switches, risk taxonomies), and operational registries (tracking what agents exist, who owns them, and what they’re authorized to do). The technical infrastructure for much of this exists. The organizational discipline to use it does not.
Inter-enterprise governance is the boundary question: when agents from different organizations interact — via A2A or any other coordination protocol — who governs the interaction? Whose compliance policies apply? What audit metadata accompanies every cross-boundary action? Who bears liability when a multi-agent workflow fails across firm lines? No protocol, no standard, no governance framework addresses this today. MCP solves tool discovery. A2A solves agent coordination. Neither solves governance at the organizational boundary.
These two dimensions are not independent. You cannot govern agent interactions across firms if you cannot govern them within your own firm first. An enterprise that can’t answer “how do we certify our agent V2 is production-ready?” has no business asking “how should our agents interact with Goldman’s agents?” The intra-enterprise maturity curve is a precondition for the inter-enterprise standard.
First: the state of internal governance — what works, what’s fragmented, what’s missing. Then: a first-principles analysis of the cross-enterprise problem, grounded in three historical parallels that reveal a pattern so consistent it amounts to a playbook.

Inside the Walls: The State of Intra-Enterprise Governance
The technical building blocks exist. Most enterprises are assembling them after the house is already on fire.
Evaluation: The Certification Gap
Anthropic formalized the distinction between regression evals (“does the agent still handle everything it used to?”) and capability evals (“what new things can this agent do?”) in early 2026.⁸ Their skill-creator benchmark mode runs eval suites and records pass rate, latency, and token usage, which creates a quantitative baseline for comparison after model updates or prompt changes. OpenAI’s AgentKit provides parallel infrastructure with automated and human-in-the-loop grading at scale.⁹ Langfuse offers native A/B testing in its prompt management layer for side-by-side version comparison against business metrics.¹⁰
These are working systems, not proposals. The problem is what they don’t solve: there is no standard for agent certification. SWE-bench certifies model performance on software engineering tasks. HELM benchmarks language model capabilities. No equivalent exists for “this agent version is production-ready to replace the previous one.” Enterprises implement domain-specific eval suites manually: test datasets curated by the business, pass/fail criteria defined per workflow, human review of borderline cases. Basically, every firm reinvents the wheel.
For financial services, this is not just an inconvenience; it’s a regulatory compliance risk. FINRA’s 2026 oversight report requires “robust testing of agent capabilities and limitations.”¹¹ Without a standardized certification framework, every firm is bound to interpret “robust” differently. The ones that get it wrong will find out during an examination, and it won’t be pretty.
Versioning: The Discipline Gap
The tools are there: Claude Skills follow semantic versioning with strict backward-compatibility guarantees.¹² Microsoft Foundry and Salesforce mandate immutable versions for deployed agents and any modification requires a new version.¹³ Blue-green deployment, canary releases, and shadow testing are all documented patterns with production implementations.¹⁴
What’s missing is organizational discipline. When your agent depends on 40 custom skills and the underlying model gets upgraded, or one of the skills in the chain is descoped or upgraded, what’s the migration path? When an agent reaches end-of-life, what’s the decommissioning procedure? When Skill A v2.3 is required but only v2.1 is deployed, where’s the dependency resolution? Today, most enterprises manage this with spreadsheets, manual runbooks, and human coordination.¹⁵ The infrastructure treats agents as versioned software artifacts, but the organizations deploying them often don’t.
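The dependency-resolution gap can be made concrete with a minimal semver compatibility check of the kind a registry could run before deployment. This is an illustrative sketch, not any vendor’s resolver; real resolvers also handle version ranges, pre-release tags, and transitive dependencies:

```python
def parse(version: str) -> tuple:
    """Parse 'MAJOR.MINOR.PATCH' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))

def satisfies(deployed: str, required: str) -> bool:
    """Under strict backward-compatibility guarantees, a deployed skill
    satisfies a requirement if the major version matches and the
    deployed version is not older than the required one."""
    d, r = parse(deployed), parse(required)
    return d[0] == r[0] and d >= r

# The scenario from the text: Skill A needs v2.3 but only v2.1 is
# deployed -- the check fails, and something has to flag the gap.
blocked = not satisfies("2.1.0", "2.3.0")
```

A registry that runs this check as a deployment gate turns the spreadsheet-and-runbook process into an automated refusal with an actionable error.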
Data Governance: The Policy Engine Gap
MCP centralizes credential management server-side, agents never see raw API keys.¹⁶ Claude Code uses OS-level sandboxing (Seatbelt on macOS, Bubblewrap + seccomp on Linux/WSL) to block unauthorized file access and network calls.¹⁷ PII masking exists via tools like Microsoft Presidio, LLM Guard, and Kong AI Gateway’s native redaction capabilities.¹⁸ Data lineage agents can trace upstream sources and downstream impact across multi-agent workflows.¹⁹
The gap: no unified policy engine applies rules like “never pass PII to LLM prompts” or “never write SSNs to logs” universally across all agents in an organization, regardless of which tools they’re using. Each agent gets its own data governance configuration. Scale that to 50+ agents and the attack surface is no longer the technology; it’s the inconsistency. One agent with a misconfigured PII filter is all it takes.
Secret rotation for long-lived agents is similarly unresolved. Agents operate continuously with persistent credentials. If a key is compromised, there’s no standard mechanism to propagate fresh credentials to all affected agents without workflow disruption.
Destructive Action Prevention: The Taxonomy Gap
This is where the most visible failures have occurred. In a six-month span: at Meta, an AI safety researcher’s OpenClaw agent mass-deleted 200+ emails while ignoring explicit stop commands.²⁰ Replit’s agent deleted 1,206 executive records from a production database despite an all-caps code freeze instruction. Cursor’s agent deleted ~70 git-tracked files using rm -rf after acknowledging a halt command. Amazon’s Kiro agent reportedly deleted an entire production environment, causing a 13-hour outage.²¹ Four platforms, four control failures, one common thread: the stop mechanism broke.
The defensive infrastructure is maturing. OWASP’s Agentic Top 10 for 2026 (developed by 100+ industry experts) catalogs the risk categories: tool misuse, unexpected code execution, identity and privilege abuse.²² Claude Code implements a multi-layered permission model (normal, auto-accept, plan, bypass) with mandatory human sponsors for agent identity and lifecycle.²³ Microsoft’s Agent Governance Toolkit and open-source tools like AgentBouncr provide runtime policy enforcement and kill switches.²⁴
What nobody has standardized: an action risk taxonomy. Is deleting a file in staging “destructive” or “acceptable,” and at what entitlement levels? Does modifying a database record require human review, or is it safe within scope? Is sending an email on behalf of a user always high-risk, or does it depend on the content? OWASP identifies the risk categories but doesn’t classify actions within them. Most enterprises are building their own risk matrices, which go something like this: rows are agent types, columns are action categories, cells contain risk levels and required approvals. The result is labor-intensive, inconsistent across organizations, and impossible to audit against a common standard.
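The matrix pattern can be sketched as a small lookup table. The agent types, action categories, and approval gates below are illustrative assumptions, not a proposed standard:

```python
from enum import Enum

class Risk(Enum):
    SAFE = "safe"
    STANDARD = "standard"
    HIGH = "high"
    DESTRUCTIVE = "destructive"

# Illustrative risk matrix: rows are agent types, columns are action
# categories, cells hold (risk level, required approval gate).
RISK_MATRIX = {
    ("research_agent", "read_document"):  (Risk.SAFE, None),
    ("research_agent", "send_email"):     (Risk.HIGH, "human_review"),
    ("ops_agent", "delete_file_staging"): (Risk.STANDARD, "team_lead"),
    ("ops_agent", "delete_file_prod"):    (Risk.DESTRUCTIVE, "change_board"),
}

def required_approval(agent_type: str, action: str):
    """Look up the risk level and approval gate for an action.
    Unknown (agent, action) pairs fail closed: they default to the
    most restrictive cell until someone classifies them."""
    return RISK_MATRIX.get((agent_type, action), (Risk.DESTRUCTIVE, "human_review"))
```

The important design choice is the fail-closed default: an action missing from the matrix is treated as destructive until explicitly classified, which is the opposite of how most of the incidents above played out.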
Agent Registries: The First Products
Microsoft Agent 365 (generally available May 1, 2026 at $15/user/month) is the first productized intra-enterprise governance platform.²⁵ Every agent gets a Microsoft Entra Agent ID. The platform provides audit trails, conditional access (least-privilege, extended to agents), integration with Microsoft Purview for DLP enforcement, and Microsoft Defender for real-time threat detection. Kong’s MCP Registry, in tech preview since February 2026, offers centralized tool discovery and governance integrated with Kong Konnect’s API catalog.²⁶
These are real products, not whitepapers, but they’re platform-specific. Agent 365 governs the Microsoft ecosystem, Kong governs MCP tools. An enterprise running agents across Claude, Copilot, and a custom LangGraph deployment needs three separate governance configurations. Unified metadata standards (what should every agent registry capture?), cross-registry interoperability, and cost tracking/chargeback are all gaps.
The Deployment Reality
The pattern across all five domains is the same: the technical infrastructure exists but organizational adoption lags dangerously behind deployment. The 80% / 14% / 2% stat bears repeating — 80% have agents in development, 14% have governance frameworks, 2% govern agents consistently.¹ Enterprises are deploying first and attempting to govern later. Once agents are in production handling real workflows, retrofitting governance requires workflow redesign, API changes, and retraining. That is expensive. The firms that achieve governance-first deployment now will compound that discipline into a 2-3 year operational maturity advantage by 2028.
But intra-enterprise governance, however critical, only solves half the problem. The moment your agents need to interact with agents outside your firm (and in financial services, that moment is ever so close), you cross the wall into territory that no amount of internal discipline can solve alone.
Beyond the Walls: Three Industries That Hit This Before
Cross-organizational agent governance has no standard. No protocol addresses it. No regulatory framework mandates it. The question isn’t whether it will be standardized — the history of every regulated industry that faced coordinated multi-party digital workflows says it will. The question is what the trajectory looks like and who leads.
Three historical parallels illuminate the pattern.
FIX Protocol (1992–present): Bottom-Up, Speed Wins
FIX was created in 1992 as a bilateral solution between Fidelity and Salomon Brothers to replace voice-and-paper equity trading.²⁷ The design was deliberately simple: ASCII key-value pairs over TCP sockets. A firm could build a FIX adapter in weeks, not months.
FIX 2.7 went public in 1995. By 1998, it had achieved critical mass among major institutional traders. Governance transferred to FIX Protocol Limited, a Purpose Trust with vendor neutrality enshrined in a legal deed: not aspiration, law.²⁸ Regulatory codification came after market dominance, not before. FIX 4.2, released in 2001, is still in production today (twenty-five years and counting). The sign of a mature standard isn’t version churn but version stability.
The pattern: Bilateral innovation → public release → neutral governance (legally protected) → early critical mass (3-4 year window) → regulatory codification → network effects moat.
Why it worked: The pain was acute and daily. Every equity trader experienced voice-and-paper friction multiple times per session. The solution was simple enough to adopt without specialist knowledge. And critically, the governance was neutral from inception; no single firm controlled the standard.
RIXML (2000–present): Consortium-Driven, Regulation-Accelerated
RIXML was founded in October 2000 by a consortium of 15 major buy-side and sell-side firms (Fidelity, Goldman Sachs, Morgan Stanley, T. Rowe Price, among others) to standardize research document metadata exchange.²⁹ Before RIXML, each sell-side firm packaged research in proprietary formats. Buy-side firms couldn’t systematically search, filter, or track research across providers.
The critical design decision: RIXML didn’t invent a new wire protocol. It built a domain-specific schema on top of XML, the existing base transport standard. RIXML added the financial services semantics: research metadata tagging, interaction records (who received what research, when, with what entitlements), analyst rosters, and coverage declarations.³⁰ Four modular standards, not one monolithic spec.
Adoption was painfully slow. RIXML was technically sound from 2001. Critical mass took 17 years, not because the standard was wrong, but because the coordination cost exceeded the compliance benefit. The inflection came in 2018 when MiFID II mandated research commission unbundling, making it legally necessary to track research entitlements and interactions with the precision RIXML enabled.³¹ Today, roughly 30% of firms drive 70-80% of research business activity through RIXML.³²
The pattern: Consortium genesis (competing firms co-create) → modular standards suite → 15+ years of slow voluntary adoption → regulatory mandate creates urgency → concentrated adoption among major players.
Why it matters for agentic AI: RIXML’s design insight is the one that maps most directly. You don’t need firms to agree on how to build agents. You need them to agree on how to describe what their agents did, what their agents are authorized to do, and who is responsible when something goes wrong. A domain-specific schema on top of an existing base protocol such as A2A – that’s the template.
HL7/FHIR (1987–present): Simplicity Wins, Over-Engineering Kills
HL7 was founded in 1987 to solve the N²-N integration problem in healthcare: connecting N hospital systems required roughly N² custom interfaces at ~$100K each.³³ HL7 v2 (1989) used pragmatic, pipe-delimited message formats. It achieved 95% adoption in US healthcare institutions through bottom-up demand, not regulatory mandate.³⁴
Then came the cautionary tale. HL7 v3 attempted a Reference Information Model: perfect semantic consistency, universal data coverage. Architecturally rigorous. Practically unusable. Implementation required specialist knowledge. Most organizations skipped it entirely.³⁵
FHIR (2014) corrected course: REST + JSON + discrete resources. Skills every modern developer already had. No RIM, no domain-specific parsing, no learning cliff. The 80/20 principle applied ruthlessly: cover the top 80% of use cases, let edge cases be extensions.³⁶ The 21st Century Cures Act (2016, enforced 2020-2027) mandated FHIR-based API access for all certified EHR technology but crucially, the mandate came after organic adoption had reached 30-40%.³⁷
The pattern: Vendor pain point → simple protocol → organic adoption → over-engineered successor fails → simpler replacement wins → regulatory mandate after organic critical mass.
The warning: Any attempt to standardize cross-enterprise agent governance must resist the v3 temptation. Comprehensive semantic models that require specialist knowledge to implement do not get adopted. Simple interchange formats that developers can implement in days do.
The Pattern: Six Conditions for a Standard Moment
Every successful cross-enterprise standard in the evidence base required all six conditions. No standard succeeded with fewer than five.
1. Acute, quantifiable shared pain. FIX: hours per voice trade, with transcription errors. HL7: $100K per system integration pair. RIXML: buy-side firms couldn’t search research across providers. Agentic AI today: 81% of organizations lack documented governance for machine-to-machine interactions; 46% cite integration as the primary barrier to scaling agents.³⁸ The pain exists but isn’t yet acute for most firms. Multi-agent cross-boundary workflows are early-stage. Condition partially met.
2. Simple initial solution from a credible source. FIX: ASCII key-value pairs from Fidelity + Salomon. FHIR: REST + JSON from Grahame Grieve within HL7. In every case, the initial solution was deliberately under-specified. Agentic AI today: A2A provides the base protocol: agent discovery, task lifecycle, multi-transport. What’s missing is the domain-specific profile. Condition met for infrastructure; not yet met for domain profile.
3. Low barrier to adoption. FIX: TCP sockets, weeks to integrate. FHIR: REST + JSON, no domain expertise required. HL7 v3: specialist knowledge, years to implement (and it failed). Agentic AI today: A2A uses JSON-RPC, gRPC, HTTP; these are a developer’s bread & butter. A domain profile inheriting this stack would maintain the low barrier. Condition achievable.
4. Vendor-neutral governance, legally protected. FIX: Purpose Trust with neutrality enshrined in deed poll. HL7: ANSI-accredited non-profit. RIXML: consortium with buy-side, sell-side, and vendor representation. Agentic AI today: AAIF provides neutral governance for A2A and MCP under the Linux Foundation. FINOS provides financial services-specific AI governance with multi-bank participation (BMO, Citi, Morgan Stanley, Bank of America).³⁹ Condition met.
5. Regulatory pressure after organic adoption, not before. FIX: codification after market dominance. RIXML: MiFID II, 17 years after founding. FHIR: Cures Act after 30-40% organic adoption. Premature regulation creates compliance theater. Regulation after critical mass locks in the winner. Agentic AI today: FINRA’s 2026 report requires GenAI governance but doesn’t specify interchange standards. NIST is in “federated integrator” mode. SEC exam priorities flag GenAI.⁴⁰ The pressure is building but hasn’t reached mandate stage. Condition approaching, correctly sequenced.
6. Modular scope (solve one problem, not everything). FIX: equity trading communication only. RIXML: four modular standards. FHIR: discrete resources, not a universal health model. Agentic AI today: The scope must be narrow. Not “all agent governance” but the interchange format for cross-boundary agent actions in regulated industries. Condition achievable if scope is disciplined.
Five of six conditions are met or approaching. The missing piece is the domain-specific solution itself.

What the Solution Could Look Like
Two problems and two solution tracks, but they’re connected. The intra-enterprise track is the foundation; the inter-enterprise track is the standard that sits on top of it.
Inside: Governance-First Deployment
The technical infrastructure exists. What’s needed is operational discipline, and the firms that implement it now gain a compounding advantage.
Agent certification pipelines: Before any agent version reaches production, it passes through regression evals (did we break anything?), capability evals (what improved?), and business metric validation (did the business metric actually move?). The certification record becomes an auditable artifact. Think of it as disciplined use of Anthropic’s benchmark mode, OpenAI’s Evals, or Langfuse’s A/B testing, configured as a mandatory gate rather than an optional check.
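A sketch of what such a mandatory gate could look like. The thresholds, metric names, and record shape are illustrative assumptions, not any vendor’s API:

```python
from dataclasses import dataclass, asdict

@dataclass
class EvalResult:
    regression_pass_rate: float   # fraction of prior behaviors preserved
    capability_pass_rate: float   # fraction of new-capability checks passed
    business_metric_delta: float  # e.g. change in task completion rate

def certify(candidate: EvalResult, *, regression_floor: float = 0.99,
            capability_floor: float = 0.90) -> dict:
    """Run all three gates and return an auditable certification record.
    Promotion is blocked unless every gate passes."""
    gates = {
        "regression": candidate.regression_pass_rate >= regression_floor,
        "capability": candidate.capability_pass_rate >= capability_floor,
        "business":   candidate.business_metric_delta >= 0.0,
    }
    return {
        "certified": all(gates.values()),
        "gates": gates,
        "evidence": asdict(candidate),  # the artifact that survives an audit
    }

record = certify(EvalResult(0.995, 0.93, 0.02))
```

The point is not the arithmetic; it’s that the serialized record, not a Slack thread, is what you hand the examiner.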
Hooks as governance primitives: Claude Code’s hook system (pre and post tool execution hooks that can intercept, validate, log, and reject tool calls, plus subagent and session start/stop hooks) provides a blueprint for deterministic governance enforcement.⁴¹ Think of it as the GitHub Actions webhook pattern applied to agent behavior: every tool invocation passes through a policy gate before execution. This is how you enforce “never write PII to logs” or “escalate to human for any action above $50K” consistently: at the harness level, without relying on the model to remember the rule.
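The gate pattern can be sketched as a deterministic pre-tool check. The event field names below are illustrative, not Claude Code’s exact hook payload schema (check current documentation before relying on them); in a real hook, the verdict would be printed to stdout and a non-zero exit code would block the call:

```python
import json
import re

# Naive US SSN pattern, for illustration only; production PII detection
# would use a dedicated tool like Presidio or LLM Guard.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def pre_tool_gate(event: dict) -> dict:
    """Deterministic policy gate run before every tool invocation.
    Returns an allow/block verdict regardless of what the model intended."""
    tool = event.get("tool_name", "")
    payload = json.dumps(event.get("tool_input", {}))
    if SSN_RE.search(payload):
        return {"decision": "block", "reason": "SSN detected in tool input"}
    if tool == "Bash" and "rm -rf" in payload:
        return {"decision": "block", "reason": "destructive command requires human approval"}
    return {"decision": "allow"}
```

Because the gate runs in the harness, not the prompt, the policy holds even when the model ignores its instructions, which is exactly the failure mode in the incidents cataloged earlier.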
Standardized action risk taxonomies: Instead of every enterprise inventing its own risk matrix, the industry needs a shared classification: which actions are destructive, high-risk, standard, or safe, and, most importantly, what approval gates each level requires. OWASP’s Agentic Top 10 provides the threat categories. The next step is translating threats into a shared risk taxonomy that enterprises can adopt and customize rather than building from scratch.
Outside: A Domain-Specific A2A Profile for Financial Services
This is where RIXML’s design insight maps directly. RIXML adapted XML for research exchange between firms. The agentic equivalent adapts A2A for cross-enterprise agent governance in financial services: not a new protocol, but a domain-specific profile on the existing one.
The profile would standardize four components:
Financial Services Agent Cards
Extensions to A2A’s Agent Cards declaring regulatory jurisdiction (SEC/FINRA/FCA registered), authorized action categories (trade execution, research distribution, compliance screening), maximum autonomous authority thresholds (dollar limits, asset classes, counterparty types), audit certification level (SOC 2, ISO 27001), and human escalation contact chain.
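Since Agent Cards are JSON, a profile could add a namespaced section along these lines. Every field name here is invented for illustration; nothing below is a published schema:

```python
# Hypothetical financial-services extension to an A2A Agent Card.
# All field names and values are illustrative assumptions.
AGENT_CARD_EXTENSION = {
    "finservProfile": {
        "regulatoryJurisdiction": ["SEC", "FINRA"],
        "authorizedActionCategories": ["research_distribution", "compliance_screening"],
        "autonomousAuthority": {
            "maxNotionalUsd": 50_000,
            "assetClasses": ["equities"],
            "counterpartyTypes": ["registered_broker_dealer"],
        },
        "auditCertification": ["SOC2_TYPE2", "ISO27001"],
        "escalationChain": ["desk-supervisor@firm.example", "compliance@firm.example"],
    }
}

def within_authority(card: dict, notional_usd: float, asset_class: str) -> bool:
    """Check a proposed action against the card's declared autonomous
    authority before accepting a cross-boundary task."""
    auth = card["finservProfile"]["autonomousAuthority"]
    return notional_usd <= auth["maxNotionalUsd"] and asset_class in auth["assetClasses"]
```

The receiving firm never has to trust the sending firm’s internal controls; it checks the declared envelope and escalates anything outside it.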
Compliance Event Schema
Every cross-boundary agent action produces a standardized audit event containing: initiating and receiving agent identities (cryptographic, not just names), action type from a controlled vocabulary, regulatory classification (material/non-material, advisory/executable), timestamp at exchange-time precision, reasoning trace hash (for reproducibility without exposing proprietary logic), human authorization status, and FINRA/SEC/FCA regulatory reference codes.
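One way to sketch such an event, with a hash committing to the reasoning trace without shipping it across the boundary. Field names and the identity format are illustrative assumptions, not a proposed wire format:

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
import hashlib
import json

def trace_hash(reasoning: str) -> str:
    """Commit to the reasoning trace for reproducibility without
    exposing proprietary logic to the counterparty."""
    return hashlib.sha256(reasoning.encode()).hexdigest()

@dataclass(frozen=True)
class ComplianceEvent:
    initiating_agent: str        # cryptographic identity, not a display name
    receiving_agent: str
    action_type: str             # from a controlled vocabulary
    regulatory_class: str        # e.g. "material/executable"
    timestamp: str               # UTC, exchange-time precision
    reasoning_trace_hash: str
    human_authorized: bool
    regulatory_refs: tuple

event = ComplianceEvent(
    initiating_agent="did:example:firmA#research-1",
    receiving_agent="did:example:firmB#exec-3",
    action_type="trade_execution",
    regulatory_class="material/executable",
    timestamp=datetime.now(timezone.utc).isoformat(),
    reasoning_trace_hash=trace_hash("signal=momentum; threshold crossed"),
    human_authorized=True,
    regulatory_refs=("FINRA-3110",),
)
wire = json.dumps(asdict(event))  # the envelope that crosses the boundary
```

Six months later, either firm can replay its own logs against the hash to prove what reasoning produced the action, without ever having shared the reasoning itself.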
Entitlement Interchange
When Agent A delegates to Agent B across firm boundaries: What data can Agent B access? What actions can it take? What happens to the data after the task completes? Standardized entitlement schemas, adapted from RIXML’s Interactions Standard, mapping to existing compliance infrastructure (LDAP groups, Okta roles, Bloomberg entitlements).
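A minimal sketch of a default-deny entitlement grant, loosely inspired by RIXML’s scoped-interaction idea. The schema is invented for illustration:

```python
# Hypothetical entitlement grant attached to a cross-boundary delegation.
# All fields are illustrative assumptions, not a published schema.
GRANT = {
    "grantor": "firmA",
    "grantee_agent": "did:example:firmB#kyc-7",
    "data_scope": ["transaction_record:TX-1042"],
    "allowed_actions": ["read", "annotate"],
    "retention": "delete_on_task_complete",
    "expires": "2026-06-30T00:00:00Z",
}

def permits(grant: dict, action: str, resource: str) -> bool:
    """Default deny: a grant permits an action only when both the action
    and the specific resource are explicitly in scope."""
    return action in grant["allowed_actions"] and resource in grant["data_scope"]
```

The `retention` field is the piece existing entitlement systems rarely carry: it answers, in the grant itself, what happens to the data after the task completes.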
Cross-Boundary Liability Metadata
The unsolved problem: when a multi-agent workflow fails, who is responsible?
Standardized liability delegation records: which firm’s agent initiated the workflow, which executed the failing action, whether the action was within entitlement bounds, and the contractual framework reference governing the inter-firm relationship.
What the domain-specific A2A profile would not standardize is implementation. Each firm builds its own agent orchestration, compliance engines, and risk controls. The profile only standardizes what crosses the boundary, i.e. the metadata envelope, not the content. You don’t need firms to agree on how to build agents. You need them to agree on what their agents are authorized to do, what their agents did, and who’s responsible when something goes wrong.
When Would It Happen
The evidence suggests the trajectory follows the historical pattern: consortium formation, then organic adoption, then regulatory codification, with intra-enterprise governance maturity as a gating precondition.
Phase 1: Consortium Formation (Late 2026-2027)
FINOS is the natural home. It already has the multi-bank consortium (BMO, Citi, Morgan Stanley, Bank of America), an AI Governance Framework v2.0 with agentic-specific controls, and the operational credibility to convene competitors.³⁹ AAIF provides the protocol governance; FINOS provides the domain expertise. A working group drafts the initial A2A Financial Services Profile. Scope is deliberately narrow: Agent Cards for regulated entities, compliance event schema, entitlement interchange. Liability metadata is deferred because it requires legal frameworks that don’t exist yet. The FIX parallel: 3-5 major institutions running cross-boundary agent workflows using the profile in production or controlled pilots by end of 2027.
Phase 2: Organic Adoption + Regulatory Awareness (2027-2029)
Early adopters publish case studies. FINRA and SEC observe the standard in use. It gets referenced — not mandated — in regulatory guidance. Vendor tooling emerges: A2A SDKs add Financial Services Profile extensions, agent platforms add profile support. The RIXML parallel applies: concentrated adoption (30% of firms driving 70-80% of activity) is the expected pattern, not a failure. If the pain is acute enough, this phase compresses from RIXML’s 17 years to 2-3 (closer to FIX’s timeline).
Phase 3: Regulatory Codification (2029-2032)
Regulators mandate standardized audit trails for cross-boundary agent actions. The FINOS profile becomes the referenced standard, the way FHIR became referenced in the Cures Act, the way FIX became implicitly expected in electronic trading regulation. The regulation doesn’t name the standard; it specifies requirements that the standard uniquely satisfies.
Obviously, there are counterarguments: Platform vendors might preempt the consortium by shipping proprietary governance layers (Microsoft’s Agent 365 is already headed this direction). Regulators might mandate top-down before organic adoption reaches critical mass, creating compliance theater rather than real governance. A2A itself might not achieve sufficient adoption to serve as the base protocol. These are real risks.
But every counterargument runs into the same historical wall: no regulated industry that faced coordinated multi-party digital workflows avoided standardization. Trading standardized (FIX). Research exchange standardized (RIXML). Health records standardized (FHIR). The only variables were timing, governance structure, and whether the standard was simple enough to achieve organic adoption before regulation forced the issue.
The firms that answer three questions first (who leads the consortium, how narrow the initial scope is, and whether the solution is simple enough to adopt in weeks rather than months) will shape the standard for the next 20 years.
That’s not a prediction. It’s what the evidence shows, every single time.
Sources
- [Tier 2] Board.org & DataMatters, “2025 State of Enterprise Data Governance Report,” 2026
- [Tier 2] Gartner, “40% of Agentic AI Projects Will Be Canceled by End of 2027,” June 2025
- [Tier 1] Anthropic, MCP adoption data; RedMonk, “Fastest adopted standard,” late 2025
- [Tier 1] Google, A2A Protocol v1.0.0, March 2026; Linux Foundation AAIF governance
- [Tier 1] Anthropic, Claude Code elicitation hooks documentation, 2026
- [Tier 1] Anthropic, “Briefing: Enterprise Agents,” February 24, 2026
- [Tier 1] Microsoft, “Copilot Cowork,” announced March 9, 2026; GA May 1, 2026
- [Tier 1] Anthropic Engineering Blog, “Demystifying Evals for AI Agents,” 2026
- [Tier 2] OpenAI, “AgentKit and Evals,” OpenAI Docs, 2026
- [Tier 2] Langfuse, “A/B Testing — Prompt Management Features,” 2026
- [Tier 1] FINRA, “2026 Annual Regulatory Oversight Report,” December 2025
- [Tier 1] Anthropic, Claude Agent Skills semantic versioning documentation, 2026
- [Tier 1] Microsoft, “Agent Lifecycle Management — Microsoft Foundry,” Azure Learn docs
- [Tier 2] Deployment patterns: blue-green, canary, shadow testing — MarkTechPost, March 2026
- [Tier 3] Enterprise versioning practices — industry survey data and practitioner reports
- [Tier 1] Kong Inc., “Governing Claude Code: Secure Agent Harness Rollouts with Kong AI Gateway,” 2026
- [Tier 1] Anthropic, Claude Code permissions and security documentation
- [Tier 2] Microsoft Presidio; LLM Guard; Kong AI Gateway PII redaction capabilities
- [Tier 2] OvalEdge, “Data Lineage Best Practices for 2026”
- [Tier 1] Summer Yue incident, February 2026 — documented by Meta Superintelligence Labs alignment director
- [Tier 2] Replit (July 2025), Cursor (December 2025), Kiro (December 2025) agent control failures — compiled from incident reports
- [Tier 1] OWASP, “Top 10 for Agentic Applications 2026,” OWASP Gen AI Security Project
- [Tier 1] Anthropic, Claude Code permission model documentation
- [Tier 2] Microsoft, “Agent Governance Toolkit,” GitHub; AgentBouncr, open-source policy enforcement
- [Tier 1] Microsoft, “Microsoft Agent 365,” GA May 1, 2026, $15/user/month
- [Tier 1] Kong Inc., “Kong Introduces MCP Registry in Kong Konnect,” February 2, 2026
- [Tier 1] FIX Protocol origin: Lamoureux & Morstatt, 1992. Fidelity + Salomon Brothers bilateral
- [Tier 1] FIX Protocol Limited Purpose Trust, 1998. Vendor neutrality enshrined in deed poll
- [Tier 1] RIXML.org founded October 18, 2000. 15 founding members
- [Tier 1] RIXML standards suite: Research Standard, Interactions Standard, Analyst Roster Standard, Coverage Standard
- [Tier 1] MiFID II research unbundling effective January 18, 2018. ESMA, 2020
- [Tier 3] RIXML.org FAQ: ~30% of firms drive 70-80% of activity through RIXML
- [Tier 1] HL7 founding 1987, Donald W. Simborg. N²-N problem, $100K per interface pair
- [Tier 2] HL7 v2: 95% adoption in US healthcare institutions. Comply Assistant analysis
- [Tier 2] HL7 v3 failure: RIM complexity, specialist knowledge required
- [Tier 1] FHIR: Grahame Grieve, August 2011. REST + JSON + discrete resources
- [Tier 1] 21st Century Cures Act (2016), ONC Final Rule (2020), CMS Interoperability Rule
- [Tier 2] Gartner: 81% lack documented M2M governance. IDC: 46% cite integration as primary barrier
- [Tier 1] FINOS AI Governance Framework v2.0; member banks: BMO, Citi, Morgan Stanley, Bank of America
- [Tier 1] FINRA 2026 report; SEC examination priorities; NIST AI Agent Standards Initiative, February 2026
- [Tier 2] Claude Code hooks documentation; GitHub Actions webhook pattern analogy