The U.S. government is getting a deeper look at the next wave of frontier AI models before they reach the public. On May 5, 2026, the Center for AI Standards and Innovation, or CAISI, at NIST announced new agreements with Google DeepMind, Microsoft and xAI for pre-deployment evaluations and targeted research.
The practical meaning is straightforward: more leading AI labs are agreeing to let federal evaluators test advanced systems before public release, then continue assessing them after deployment. That does not make CAISI a product approval board, and NIST did not say the government will have veto power over releases. It does, however, make independent government testing a more normal part of the frontier AI release cycle.
What CAISI Announced
CAISI says the agreements will let it evaluate models before they are publicly available, carry out post-deployment assessments and run research into frontier AI capabilities. NIST also said CAISI has already completed more than 40 evaluations, including evaluations of state-of-the-art models that remain unreleased.
The new deals build on earlier U.S. AI Safety Institute agreements with OpenAI and Anthropic. Those 2024 agreements gave the government access to major new models before and after public release for safety research, testing and evaluation. The latest announcement widens that structure to include Google DeepMind, Microsoft and xAI.
Why Early Access Matters
Frontier AI models are now being evaluated for more than general helpfulness or chatbot safety. CAISI’s public materials say the center focuses on demonstrable risks such as cybersecurity, biosecurity and chemical weapons, while also assessing U.S. and foreign AI capabilities and the state of international AI competition.
That focus matters because the most serious risks are often capability-driven. A model that is better at coding, tool use and long-horizon reasoning may also become more useful for vulnerability discovery, reconnaissance or other dual-use tasks. The point of early testing is to see those behaviors before a broad public rollout makes them harder to contain.
From AI Safety Institute to CAISI
CAISI is not simply a rebranded research group. In June 2025, the Commerce Department announced that the former U.S. AI Safety Institute would become the Center for AI Standards and Innovation, with a stronger emphasis on measurement science, voluntary standards, national security and U.S. competitiveness.
That framing aligns with the White House’s America’s AI Action Plan, which centers on accelerating innovation, building AI infrastructure and leading internationally on security. The result is a testing posture that tries to avoid heavy release licensing while still giving the government technical visibility into the most capable systems.
How the Testing Could Work
NIST says developers frequently provide CAISI with models that have reduced or removed safeguards so evaluators can test national security-related capabilities and risks more thoroughly. That detail is important. Testing a heavily restricted public interface can miss what a model is capable of when safeguards fail, are bypassed or are intentionally removed in controlled settings.
CAISI also says evaluators from across government may participate through its TRAINS Taskforce, an interagency group focused on AI national security concerns. The agreements support testing in classified environments, which suggests the government wants to examine sensitive threat scenarios without pushing all details into public reports.
What Industry Gets Out of It
For AI labs, the incentive is not only regulatory goodwill. Early government testing can produce feedback before launch, support voluntary product improvements and give companies a stronger answer when enterprise buyers ask how advanced models were assessed. Microsoft, for example, said its new work with CAISI and the U.K. AI Security Institute will focus on testing frontier models, assessing safeguards and improving evaluation science.
For buyers, the key is to read these agreements correctly. A CAISI evaluation is not the same thing as a blanket guarantee that a model is safe for every workflow. It is a signal that the developer is participating in a more serious evaluation process, especially around national security risk, but customers still need their own governance, red-team testing, logging and incident response.
The Open Questions
The biggest unresolved issue is transparency. Some testing will necessarily involve sensitive details, especially where cyber or classified evaluation is involved. But if the public only sees high-level announcements, it will be difficult for enterprises, researchers and policymakers to compare how different models performed or whether the feedback materially changed release decisions.
The other question is whether voluntary agreements can keep pace with model capabilities. CAISI's approach depends on cooperation from frontier developers and on enough government talent to test systems in depth. If testing becomes too slow, labs may treat it as process overhead. If it is too opaque, the market may treat it as a trust badge without enough substance behind it.
The Bottom Line
The new CAISI agreements are best understood as a shift in AI release norms. The U.S. government is not publicly claiming approval authority over Google DeepMind, Microsoft or xAI models. It is building a more formal channel for early access, technical measurement and national security review.
That is a meaningful development for AI policy and enterprise risk management. The frontier AI market is moving too fast for safety claims based only on vendor self-assessment. CAISI’s challenge is to turn early access into useful evidence, and to do it without slowing legitimate deployment or hiding all of the important findings from the people who have to buy, deploy and govern these systems.
Sources
- NIST: CAISI signs agreements with Google DeepMind, Microsoft and xAI
- NIST: Center for AI Standards and Innovation
- NIST: 2024 agreements with OpenAI and Anthropic
- U.S. Commerce Department: Transforming the AI Safety Institute into CAISI
- White House: America’s AI Action Plan
- Microsoft: Advancing AI evaluation with CAISI and the U.K. AISI