Every examination starts the same way. A document request list arrives, the compliance team opens a shared folder, and someone starts pulling files. For most of the last two years, the AI section of that list, when it existed at all, could be answered with a policy PDF. We think that window is closing, and the regulators have been telling us so in plain language.
The signal
AI has moved from exam priorities to exam evidence
Look at what the major supervisors published, in order.
The SEC's Division of Examinations put artificial intelligence in its exam priorities, telling advisers and broker-dealers to expect reviews of how AI is actually integrated into operations and whether representations about it are accurate. That followed the Commission's first "AI washing" enforcement actions, where two advisers paid penalties for claiming AI capabilities they did not have. FINRA Regulatory Notice 24-09 made the quietly radical point that no new rulebook was coming: existing rules are technology-neutral, so supervision, suitability, and books-and-records obligations apply to generative AI exactly as they apply to everything else. New York's DFS issued guidance on AI-related cybersecurity risk telling covered entities to fold AI into the risk assessments and controls they already certify annually under 23 NYCRR Part 500. And on August 2, 2026, the EU AI Act's main obligations for high-risk systems begin to apply, including Article 12, which requires those systems to automatically record events over their lifetime, and Article 26, which requires deployers to keep those logs for at least six months.
Notice what none of these did. None created a separate AI evidence standard. They did something more demanding: they attached AI to the evidence standards that already exist, the ones with document requests, retention periods, and attestations behind them.
The question is changing from "do you have an AI policy?" to "show us it ran."
Gartner named AI governance platforms one of its top strategic technology trends and projects that organizations with comprehensive AI governance in place will see 40% fewer AI-related incidents by 2028 than those without. The analyst framing is useful, but we'd put the argument more concretely: the firms that struggle in their next exam cycle will not be the ones missing a policy. They will be the ones who cannot produce records.
A "first-day letter" is the document request list a regulator sends at the start of an examination. It defines what evidence you must produce, and by when.
We've written separately about the questions examiners open with in conversation. This piece is about the paperwork: the document requests, the part of the exam where an answer is not an explanation but a file you either have or don't. Below are five AI requests we believe belong on your rehearsal list, based on where the published priorities, notices, and statutes point. For each one, we describe the common failure mode and what a good answer looks like.
The request list
Five AI document requests to draft answers for now
1. "Provide an inventory of AI models and applications in use, including third-party services."
This is the oldest request on the list. SR 11-7, the Federal Reserve's model risk guidance, has required a model inventory since 2011, and supervisors treat LLM-backed applications as models within its scope. The failure mode is the survey inventory: a spreadsheet built by emailing team leads, accurate for about a week, and missing every integration someone shipped without telling compliance. An examiner who finds one production AI call absent from the inventory now doubts the rest of your program, because the inventory is the artifact everything else hangs on.
A good answer is an inventory derived from observed traffic rather than from self-reporting. If the record of which models your firm calls is generated from the calls themselves, it cannot drift from reality, and the awkward gap between "models we know about" and "models in use" disappears.
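To make "derived from observed traffic" concrete, here is a minimal Python sketch that builds an inventory by aggregating call records and then lists anything seen in traffic that a survey missed. The record fields and the names `CallRecord`, `build_inventory`, and `unreported_models` are illustrative assumptions, not a real product schema.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class CallRecord:
    """One observed AI call. Field names are hypothetical."""
    timestamp: datetime
    application: str
    provider: str
    model: str


def build_inventory(calls):
    """Aggregate observed calls into one inventory row per (provider, model)."""
    inventory = {}
    for call in calls:
        key = (call.provider, call.model)
        row = inventory.setdefault(key, {
            "provider": call.provider,
            "model": call.model,
            "applications": set(),
            "first_seen": call.timestamp,
            "last_seen": call.timestamp,
            "call_count": 0,
        })
        row["applications"].add(call.application)
        row["first_seen"] = min(row["first_seen"], call.timestamp)
        row["last_seen"] = max(row["last_seen"], call.timestamp)
        row["call_count"] += 1
    # Sort for a stable, reviewable listing.
    return sorted(inventory.values(), key=lambda r: (r["provider"], r["model"]))


def unreported_models(inventory, self_reported):
    """Return (provider, model) pairs seen in traffic but absent from a survey list."""
    return [(r["provider"], r["model"]) for r in inventory
            if (r["provider"], r["model"]) not in self_reported]
```

Run against a day of traffic, `unreported_models` is the gap between "models we know about" and "models in use" expressed as a list someone can act on.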
2. "Provide your AI use policy, and evidence it is enforced."
The first half of this request is easy, which is exactly what makes the second half dangerous. We looked at the enforcement gap in detail in our analysis of AI's impact on data breaches: most organizations that have an AI policy have no technology enforcing it. A policy without an enforcement record is a statement of intent, and examiners are professionally trained to distinguish intent from operation.
Evidence of enforcement means a trail: this rule existed on this date, it evaluated this traffic, it triggered here, and this was the outcome. When policy lives as code and every evaluation writes a record, that trail is a query. When policy lives in a PDF and enforcement lives in training slides, the honest answer to "show us it ran" is that it didn't run. It was read.
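One way to picture "every evaluation writes a record" is the sketch below. It assumes a rule is a regular expression with an effective date and an action; the `Rule` fields, the `evaluate` function, and the outcome labels are hypothetical, and a real policy engine would be richer than this.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Rule:
    """A policy rule expressed as data. All fields are illustrative."""
    rule_id: str
    effective_from: datetime   # "this rule existed on this date"
    pattern: str               # regular expression that triggers the rule
    action: str                # e.g. "block" or "flag"


def evaluate(rule, request_id, text, now=None):
    """Evaluate one rule against one request and return the evaluation record.

    A record is produced whether or not the rule triggers, so the trail
    shows that the rule ran, not only that it fired.
    """
    now = now or datetime.now(timezone.utc)
    triggered = now >= rule.effective_from and re.search(rule.pattern, text) is not None
    return {
        "evaluated_at": now.isoformat(),
        "rule_id": rule.rule_id,
        "rule_effective_from": rule.effective_from.isoformat(),
        "request_id": request_id,
        "triggered": triggered,
        "outcome": rule.action if triggered else "allow",
    }
```

The design choice worth noticing is that the non-triggering evaluations are recorded too. Those are the rows that answer "show us it ran" for the days when nothing was blocked.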
3. "Produce records of AI interactions involving customer data for the review period."
This is a books-and-records request wearing new clothes, and the retention math is unforgiving. Broker-dealer records under SEC Rule 17a-4 and FINRA Rule 4511 carry multi-year retention, six years for many record classes. The EU AI Act's Article 26 sets a six-month floor for deployer logs, with longer periods where other law applies. Meanwhile the raw material usually lives in places built for neither regime: a provider console that retains thirty days, or an observability tool that samples traffic and scrubs payloads by design.
The failure mode is discovering, mid-exam, that the records were never created, because no logging system can be retroactive. A good answer captures every AI interaction at a chokepoint the traffic actually crosses, records who initiated it, which model handled it, what data classification applied, and what the governance decision was, and retains that record for the strictest period that applies to your firm.
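As a sketch of the record described above, the following captures the four named attributes plus a request ID and timestamp. The schema, the class name `InteractionRecord`, and the helper names are assumptions for illustration; how the prompt and response payloads themselves are stored is left out of scope here.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class InteractionRecord:
    """One AI interaction as captured at the chokepoint (hypothetical schema)."""
    request_id: str
    recorded_at: str           # ISO 8601 timestamp, UTC
    initiator: str             # who initiated the call
    model: str                 # which model handled it
    data_classification: str   # what data classification applied
    governance_decision: str   # what the governance decision was


def make_record(request_id, initiator, model, data_classification,
                governance_decision, now=None):
    """Build a record, stamping it with the current UTC time unless `now` is given."""
    now = now or datetime.now(timezone.utc)
    return InteractionRecord(request_id, now.isoformat(), initiator, model,
                             data_classification, governance_decision)


def to_log_line(record):
    """Serialize a record as one JSON line with stable key order."""
    return json.dumps(asdict(record), sort_keys=True)
```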
4. "Demonstrate that these records are complete and unaltered."
Here is where ordinary logging, even diligent logging, fails. Application logs sit in systems where an administrator can edit or delete, which means they establish nothing on their own. Chain of custody is a concept examiners bring from every other record class, and they will bring it to AI. Can you prove the record of a prompt is the record, not a reconstruction? Can you prove nothing was removed?
Completeness and integrity need to be properties of the record itself. That means write-once storage and a tamper-evident hash chain, where each event is cryptographically linked to the one before it, so any alteration or gap is detectable by verification rather than asserted by testimony. Examiner-grade, not log-grade: the difference between a system that can show what your AI did and a system that can prove it.
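A hash chain is simple enough to sketch in a few lines. The version below is a minimal illustration using SHA-256 from Python's standard library; the entry layout and the names `append_event` and `verify_chain` are our own assumptions, not a description of any particular product's log format.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash, payload):
    """Hash the previous entry's hash together with a canonical form of the payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_event(chain, payload):
    """Append an event, linking it to the hash of the entry before it."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    entry = {
        "seq": len(chain),
        "prev_hash": prev_hash,
        "payload": payload,
        "hash": _digest(prev_hash, payload),
    }
    chain.append(entry)
    return entry


def verify_chain(chain):
    """Return the index of the first inconsistent entry, or None if the chain is intact."""
    prev_hash = GENESIS
    for i, entry in enumerate(chain):
        if entry["seq"] != i or entry["prev_hash"] != prev_hash:
            return i  # a gap, a reordering, or a broken link
        if entry["hash"] != _digest(prev_hash, entry["payload"]):
            return i  # the payload was altered after it was written
        prev_hash = entry["hash"]
    return None
```

Editing any payload, or removing an entry from the middle, makes `verify_chain` return the position where the chain breaks. Removing entries from the end is not detectable from the chain alone, which is one reason the chain is paired with write-once storage.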
5. "Provide your exception log: blocked requests, waivers, and who approved them."
The counterintuitive one. Firms instinctively want to show examiners a clean sheet, but a governance program that has never blocked anything, never granted a waiver, and never escalated an incident does not read as a healthy program. It reads as a program that isn't looking. Supervisors know what real control operation produces: friction, exceptions, and judgment calls with names attached.
A good answer treats exceptions as first-class records. Every block carries the rule that fired and the request it stopped. Every waiver carries its scope, its expiry, and the approvals behind it. This is also the request that quietly tests governance of AI agents, because an agent that retries, invokes tools, and calls other models generates exactly the kind of activity where "who approved this" needs an answer at the boundary the traffic crosses.
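Treating exceptions as first-class records might look like the sketch below, where a triggered rule either blocks the request or is waived by an in-scope, unexpired, approved waiver, and both outcomes are logged. The `Waiver` fields and the `log_exception` function are hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Waiver:
    """A time-boxed exception to one rule for one application (illustrative)."""
    waiver_id: str
    rule_id: str          # scope: which rule is waived
    application: str      # scope: for which application
    expires_at: datetime  # expiry
    approved_by: tuple    # the approvals behind it


def log_exception(rule_id, application, request_id, waivers, now):
    """Return the exception-log entry for a request that triggered `rule_id`."""
    entry = {
        "request_id": request_id,
        "rule_id": rule_id,
        "application": application,
        "logged_at": now.isoformat(),
    }
    for waiver in waivers:
        in_scope = waiver.rule_id == rule_id and waiver.application == application
        # A waiver only counts if it is in scope, unexpired, and has approvers.
        if in_scope and now < waiver.expires_at and waiver.approved_by:
            entry.update(outcome="waived", waiver_id=waiver.waiver_id,
                         approved_by=list(waiver.approved_by))
            return entry
    entry.update(outcome="blocked", waiver_id=None, approved_by=[])
    return entry
```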
Why screenshots fail
Point-in-time evidence cannot answer operating-effectiveness questions
There is a reason SOC 2 distinguishes a Type I report, which examines design at a moment, from a Type II, which examines operation over a period. Financial supervisors make the same distinction instinctively. A screenshot of a settings page proves that a control was configured on the day someone pressed print screen. It says nothing about the other 364 days of a one-year review period, and the review period is what the request list covers.
This is the structural problem with the way most firms currently plan to evidence AI governance: the artifacts are point-in-time (a policy document, a committee charter, a screenshot of a vendor dashboard) while the questions are longitudinal (show us the period). The only artifact that answers a longitudinal question is a continuous record, which is why the audit trail, not the policy, is the load-bearing document in an AI examination.
There's a corollary worth sitting with. Evidence cannot be backfilled. The records your firm will need for an examination covering 2026 are being created now, or they are not being created at all. Every month of ungoverned AI traffic is a month of the review period you will not be able to evidence, no matter what you deploy later.
Preparing
Building an examination-ready AI audit trail
The encouraging part: the preparation is concrete, and most of it compounds. Four moves, roughly in order.
Put a chokepoint in the path. Records are only complete if the traffic actually crosses the point where records are made. Route AI calls through a single governed path before worrying about the sophistication of what runs there.
Start the record now. Even in observe-only mode, with no enforcement at all, a complete log of AI interactions started today is worth more in an exam than a sophisticated policy launched next quarter, because of the backfill problem above.
Map controls to named regulations. "We govern AI responsibly" is not a control. "This rule set maps to SR 11-7 inventory requirements, this retention setting meets FINRA books-and-records periods, this log satisfies EU AI Act Article 12 record-keeping" gives an examiner a crosswalk they can verify. Named frameworks are the shared language of the exam.
Rehearse the pull. Once a quarter, have someone play examiner: pick a two-week window from last year and produce the five artifacts above. The first rehearsal is usually humbling. That is the point of rehearsing.
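The rehearsal itself can start as something very small. The sketch below assumes records are dictionaries with a `timestamp` field and shows one possible first check: pull everything in a date window and list the days that produced no records at all. The function name `pull_window` and the record shape are assumptions, and an empty day may be a weekend rather than a logging gap, so the output is a prompt for questions rather than a verdict.

```python
from datetime import date, datetime, timedelta


def pull_window(records, start, end):
    """Return (records inside the window, days in the window with no records).

    `start` and `end` are inclusive dates.
    """
    selected = [r for r in records if start <= r["timestamp"].date() <= end]
    days_seen = {r["timestamp"].date() for r in selected}
    all_days = [start + timedelta(days=i) for i in range((end - start).days + 1)]
    missing = [d for d in all_days if d not in days_seen]
    return selected, missing
```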
Where Meilynx fits. We built Meilynx around the premise of this post: that the audit trail, not the dashboard, is what a regulated firm ultimately needs from AI governance. The proxy sits inline in front of every governed AI call (one environment variable change, no application rewrite), enforces policy as code on each request and response, and writes every interaction and every enforcement decision to a tamper-evident, hash-chained audit log that an examiner can independently verify. The SR 11-7 model inventory and the NYDFS certification package are generated from live traffic rather than surveys, and curated control presets ship today for SR 11-7, NYDFS 23 NYCRR 500, FINRA 24-09, and SOC 2, with EU AI Act controls configurable today and a curated bundle in progress. Raw prompts and responses never leave your perimeter. From prompt to examiner, one audit chain.
Frequently asked
What logging does the EU AI Act require for high-risk AI systems?
Article 12 requires high-risk AI systems to automatically record events over the system's lifetime, so that operation is traceable. Article 19 requires providers to keep logs under their control for at least six months, and Article 26 places a matching six-month minimum on deployers, in both cases longer where other Union or national law requires it. For financial firms, sector record-keeping rules typically extend the effective retention well beyond six months.
Does SR 11-7 apply to generative AI and LLM applications?
SR 11-7 defines a model broadly, as a quantitative method that processes inputs into estimates, and supervisors have treated AI and machine learning systems as falling within model risk management scope. In practice that means LLM-backed applications belong in the model inventory, with documented validation, monitoring, and clear ownership, the same expectations applied to any other model.
What makes an AI audit trail examination-ready?
Four properties, together. Completeness: every AI interaction is captured at a point the traffic actually crosses, not sampled. Attribution: each record identifies who initiated the call, which model handled it, and what the governance decision was. Integrity: records are tamper-evident, typically via a cryptographic hash chain over write-once storage, so alterations and gaps are detectable by verification. Retention: records are kept for the strictest period that applies, which for broker-dealer record classes can mean six years.
How long should AI interaction logs be retained?
It depends on which rules reach your firm, and the strictest one wins. The EU AI Act sets a six-month floor for high-risk system logs. US broker-dealer books-and-records rules (SEC Rule 17a-4, FINRA Rule 4511) require multi-year retention, six years for many record classes. A defensible default for a regulated financial firm is to align AI interaction records with the existing books-and-records schedule rather than treating them as ordinary application logs.
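"The strictest one wins" is a maximum over the rules that apply. The sketch below encodes only the two figures quoted in this answer, six months and six years; the rule keys, the `RetentionRule` type, and the function name are illustrative, and which record classes actually apply to a given firm is a determination this code does not make.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetentionRule:
    source: str
    minimum_months: int


# Figures taken from the text above; keys are hypothetical labels.
ILLUSTRATIVE_RULES = {
    "eu_ai_act_high_risk": RetentionRule("EU AI Act Article 26 (deployer logs)", 6),
    "us_broker_dealer": RetentionRule(
        "SEC Rule 17a-4 / FINRA Rule 4511 (six-year record classes)", 72),
}


def required_retention(applicable):
    """Return the strictest (longest) retention rule among those that apply."""
    rules = [ILLUSTRATIVE_RULES[name] for name in applicable]
    if not rules:
        raise ValueError("no applicable retention rules were given")
    return max(rules, key=lambda rule: rule.minimum_months)
```

For a firm subject to both regimes, `required_retention(["eu_ai_act_high_risk", "us_broker_dealer"])` returns the 72-month rule, which is the books-and-records schedule the default above recommends aligning to.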