
Healthcare AI | April 1, 2025 | 12 min read

HIPAA-Adjacent AI: What Covered Entities Need Before Deploying a Fine-Tuned Clinical NLP Model

Fatima Al-Rashid

CEO & Founder, Cognify

HIPAA — the Health Insurance Portability and Accountability Act and its implementing regulations in 45 CFR Parts 160 and 164 — does not contain a section titled "artificial intelligence." There is no HIPAA rule specifically governing clinical NLP models, no OCR guidance on fine-tuning language models with protected health information, and no safe harbor for de-identified training data that was created by an automated process.

What HIPAA does contain is a framework of obligations around the use and disclosure of protected health information (PHI) that has been consistently interpreted broadly by OCR — broadly enough to create significant documentation obligations for any covered entity deploying a model trained on clinical data. The gap between "HIPAA doesn't mention AI" and "HIPAA creates real constraints on clinical AI" is a gap that health system legal and compliance teams are navigating daily, and that clinical ML teams consistently underestimate until they're six weeks into a deployment that should have gone live six weeks earlier.

This post is about the specific HIPAA-adjacent obligations that affect clinical AI teams, and the documentation a covered entity needs in place before a fine-tuned clinical NLP model goes into production.

HIPAA doesn't regulate models — what it does regulate

HIPAA's Privacy Rule (45 CFR §164.502) restricts covered entities from using or disclosing PHI except as expressly permitted. The Security Rule (45 CFR §164.306) requires covered entities to implement safeguards for electronic PHI. Neither rule mentions ML models specifically — but they regulate what can be done with PHI that might be used to train a model, and how that PHI must be handled when a model processes it at inference time.

The practical questions for a clinical AI team are:

Is the training data PHI, de-identified PHI, or a limited data set?
If PHI was used in training, was that use authorized, whether by patient authorization, a research waiver, or the Privacy Rule's permission for treatment, payment, and health care operations?
Is the AI vendor (if using an external fine-tuning platform or cloud training service) a business associate under 45 CFR §160.103?
Does the deployed model at inference time process PHI, and if so, are the technical safeguards for that processing in place?
Does the model's audit log meet the audit control requirements under 45 CFR §164.312(b)?

These are structural questions that need to be answered before training begins, not after the model is ready for deployment.

BAA coverage for AI vendors

Under HIPAA (45 CFR §160.103), a business associate is any entity that performs functions or activities on behalf of a covered entity that involve the use or disclosure of PHI. An AI vendor that trains a model using PHI is almost certainly a business associate, even if the PHI reaches the vendor only as part of a training dataset rather than through a direct transmission. A Business Associate Agreement (BAA) is required before that vendor can receive or use PHI.

The BAA requirement has created problems for clinical AI teams using cloud training infrastructure. Major cloud providers (AWS, Google Cloud, Microsoft Azure) offer BAA coverage for specific services — but not necessarily for every service in their catalog. If your fine-tuning job is running on a GPU instance type that's outside the BAA-covered service scope, you have a compliance gap even if your data never leaves the cloud provider's infrastructure.

We're not saying cloud providers are unreliable BAA partners — most major providers have sophisticated HIPAA programs and extensive BAA coverage. The point is that the coverage is service-specific and needs to be verified at the level of the specific compute resources being used for training, not assumed based on a general cloud relationship.

BAA coverage also needs to extend to any experiment tracking or ML tooling vendor that receives training data or model artifacts containing PHI. This includes self-hosted tools if the vendor has SaaS telemetry that could inadvertently capture training data. It's a category of compliance exposure that catches clinical ML teams off guard because experiment tracking tools are treated as internal engineering tools. If a vendor's tool processes or stores PHI, that vendor is a business associate.
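The service-level check described in this section can be sketched as a small inventory comparison. This is a minimal illustration, not a compliance tool: the vendor and service names are hypothetical placeholders, and the set of covered services is an assumption that would have to be filled in from the actual signed BAAs and each provider's current list of eligible services.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineComponent:
    name: str          # what the component does in the ML pipeline
    vendor: str        # hypothetical vendor label
    service: str       # the specific service or tool, not the vendor as a whole
    touches_phi: bool  # True if PHI can reach this component


def find_baa_gaps(components, covered_services):
    """Return components that touch PHI on a (vendor, service) pair
    that is not listed under a signed BAA."""
    return [
        c for c in components
        if c.touches_phi and (c.vendor, c.service) not in covered_services
    ]


# Hypothetical inventory: coverage is recorded per service, not per vendor.
covered = {
    ("cloud-a", "object-storage"),
    ("cloud-a", "managed-training"),
}
pipeline = [
    PipelineComponent("raw notes bucket", "cloud-a", "object-storage", True),
    PipelineComponent("fine-tuning job", "cloud-a", "gpu-preview-instances", True),
    PipelineComponent("experiment tracker", "tracker-saas", "hosted-dashboard", True),
    PipelineComponent("docs site", "cloud-a", "static-hosting", False),
]

gaps = find_baa_gaps(pipeline, covered)
for gap in gaps:
    print(f"No BAA coverage recorded for {gap.name}: {gap.vendor}/{gap.service}")
```

Keying the lookup on the (vendor, service) pair is the point of the sketch: a vendor-level flag would have reported the fine-tuning job as covered.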

Minimum necessary and training data scope

The Privacy Rule's minimum necessary standard (45 CFR §164.502(b)) requires covered entities to limit PHI uses to the minimum necessary to accomplish the intended purpose. For a clinical NLP model, this creates a training data scoping question: what is the minimum PHI necessary to train a model that accomplishes the clinical purpose?

Take a concrete example: Meridian Health Systems (a synthetic example) wants to fine-tune a language model for clinical note summarization. The clinical notes corpus includes all notes in the EHR system — approximately 40 million notes spanning 15 years across all specialties. The model being developed is specifically for cardiology department use. Using all 40 million notes when the intended use is cardiology summarization would likely not satisfy minimum necessary — the scope of PHI used should be defensible in relation to the specific model purpose.

The documentation implication is that the training data scoping decision — why these record types, this date range, this patient cohort — needs to be documented and reviewed by the privacy officer or compliance team before training. This is a data governance decision, not a purely technical one, and it needs an approval record.
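One way to keep the scoping decision and the data selection from drifting apart is to drive the selection from the approval record itself. The sketch below assumes a hypothetical note schema (`id`, `department`, `date`) and hypothetical field names for the approval record; a real EHR extract will look different, and the date range shown is purely illustrative.

```python
import json
from dataclasses import asdict, dataclass
from datetime import date


@dataclass(frozen=True)
class ScopeDecision:
    model_purpose: str
    departments: tuple
    start: date
    end: date
    justification: str
    approved_by: str   # privacy officer or compliance reviewer
    approved_on: date


def in_scope(note, scope):
    """True if a note falls inside the approved departments and date range."""
    return (
        note["department"] in scope.departments
        and scope.start <= note["date"] <= scope.end
    )


# Hypothetical approval record for the cardiology summarization example.
scope = ScopeDecision(
    model_purpose="cardiology note summarization",
    departments=("cardiology",),
    start=date(2019, 1, 1),
    end=date(2023, 12, 31),
    justification="Model is used only by the cardiology department.",
    approved_by="privacy-officer@example.org",
    approved_on=date(2024, 2, 1),
)

notes = [
    {"id": "n1", "department": "cardiology", "date": date(2021, 5, 4)},
    {"id": "n2", "department": "oncology", "date": date(2021, 5, 4)},
    {"id": "n3", "department": "cardiology", "date": date(2012, 3, 9)},
]

selected = [n for n in notes if in_scope(n, scope)]

# Serialize the decision so it can be stored alongside the dataset version.
approval_record = json.dumps(asdict(scope), default=str, sort_keys=True)
print([n["id"] for n in selected])
print(approval_record)
```

Because the filter reads from the same object that is serialized as the approval record, a change in scope cannot happen without a change in the record.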

De-identification: Safe Harbor vs. Expert Determination

The most common approach to clinical AI training data compliance is de-identification: removing or transforming PHI so that the data no longer constitutes PHI under HIPAA's definition. De-identified data is not PHI and is therefore not subject to the Privacy Rule's use and disclosure restrictions.

HIPAA recognizes two de-identification methods. The Safe Harbor method (45 CFR §164.514(b)(2)) requires removal of 18 specific identifiers: names, geographic data smaller than state, dates (other than year) directly related to an individual, phone numbers, fax numbers, email addresses, Social Security numbers, medical record numbers, health plan beneficiary numbers, account numbers, certificate/license numbers, vehicle identifiers, device identifiers, web URLs, IP addresses, biometric identifiers, full-face photographs and comparable images, and any other unique identifying number or code. Safe Harbor also requires that the covered entity have no actual knowledge that the remaining information could be used to identify an individual.

Expert Determination (45 CFR §164.514(b)(1)) requires a person with appropriate statistical and scientific expertise to determine that the risk of re-identification is very small, and to document the methods and results of that analysis. This method allows retention of some information that Safe Harbor would require removing, if the statistical analysis supports it.

For clinical NLP training data, the de-identification process needs to be documented in detail: which method was used, who performed it (a named person or validated tool), what validation was done to verify that the 18 Safe Harbor identifiers were removed, and what residual re-identification risk assessment was performed. The documentation of this process is part of the compliance record for the model — if the de-identification process is later called into question, the covered entity needs to be able to demonstrate it was conducted rigorously.
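Part of the validation step can be automated as a residual-identifier scan over the de-identified corpus. The sketch below is a spot check under narrow assumptions: it uses simple regular expressions for a handful of pattern-like identifiers (US-style phone numbers and Social Security numbers, email addresses, IPv4 addresses, URLs) and says nothing about names, dates, or free-text geographic references. The patterns are illustrative, will produce false positives and false negatives, and are not a substitute for a validated de-identification tool.

```python
import re

# Illustrative patterns for a subset of the Safe Harbor identifiers.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"(?<!\d)(?:\(\d{3}\)\s?|\d{3}[-.\s])\d{3}[-.\s]\d{4}(?!\d)"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "url": re.compile(r"https?://\S+"),
}


def scan_for_residual_identifiers(text):
    """Return a count of matches per identifier type; an empty dict means
    none of these patterns matched (not that the text is de-identified)."""
    hits = {}
    for label, pattern in PATTERNS.items():
        found = pattern.findall(text)
        if found:
            hits[label] = len(found)
    return hits


def scan_corpus(notes):
    """Map note id to hits, keeping only notes with at least one hit."""
    flagged = {}
    for note_id, text in notes.items():
        hits = scan_for_residual_identifiers(text)
        if hits:
            flagged[note_id] = hits
    return flagged


sample_notes = {
    "a": "Pt called from 555-867-5309, email jdoe@example.org. SSN 123-45-6789 on file.",
    "b": "Patient reports chest pain radiating to left arm. EF 45%.",
}
flagged = scan_corpus(sample_notes)
print(flagged)
```

The scan results (number of notes checked, hits by type, how each hit was resolved) are the kind of validation evidence that belongs in the de-identification record.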

Audit controls under 45 CFR §164.312

The Security Rule's Technical Safeguards (45 CFR §164.312) include an audit controls requirement at §164.312(b): "Implement hardware, software, and/or procedural mechanisms that record and examine activity in information systems that contain or use electronic protected health information." This requirement applies to systems that use PHI at inference time. If your deployed clinical NLP model processes PHI in production queries, the system needs audit logging that records who queried the model, with what input, and when.

This is a more demanding requirement than it initially appears. Standard API logging records request metadata — timestamps, user IDs, endpoints — but may not record input content if the input contains PHI and the team has been cautious about logging PHI. The tension between "log everything for audit purposes" and "minimize PHI exposure in log systems" requires an explicit design decision: either a structured approach to logging metadata without logging PHI content, or a PHI-compliant logging system with appropriate access controls.
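The first option, logging metadata without logging PHI content, might look like the sketch below. It records who called which endpoint and when, plus a keyed hash (HMAC-SHA256) of the input so a specific query can later be matched against a known input without the text itself ever entering the log. The field names and the key handling are assumptions for illustration; in practice the key would live in a secrets manager, and whether a keyed hash of PHI is acceptable in your log system is a question for the privacy officer.

```python
import hashlib
import hmac
import json
from datetime import datetime, timezone


def build_audit_record(user_id, endpoint, input_text, key, now=None):
    """Build an audit record that identifies a query without storing its content."""
    timestamp = (now or datetime.now(timezone.utc)).isoformat()
    # A keyed hash rather than a plain hash: short clinical inputs could
    # otherwise be recovered by hashing guesses and comparing digests.
    digest = hmac.new(key, input_text.encode("utf-8"), hashlib.sha256).hexdigest()
    return {
        "timestamp": timestamp,
        "user_id": user_id,
        "endpoint": endpoint,
        "input_hmac_sha256": digest,
        "input_chars": len(input_text),
    }


# Hypothetical values for illustration only.
demo_key = b"replace-with-a-managed-secret"
record = build_audit_record(
    user_id="clinician-042",
    endpoint="/v1/summarize",
    input_text="Pt John Doe, chest pain",
    key=demo_key,
)
print(json.dumps(record, indent=2))
```

The trade-off is that this log can confirm that a given input was submitted, but it cannot reproduce the input. If reviewers need the content itself, that points to the second option: a PHI-compliant log store with its own access controls.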

The audit log also needs to be tamper-evident. The Security Rule does not use that term, but an audit log that can be silently modified after the fact is hard to defend as satisfying the control requirement: the purpose of audit controls is to provide an authoritative record of access activity that can be reviewed after a potential security incident. Write-once, append-only logging with integrity verification is the appropriate architecture.
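Integrity verification for an append-only log is commonly built as a hash chain, where each entry's hash covers the previous entry's hash. The sketch below is one minimal way to do that in memory; the entry layout and function names are our own assumptions, not a standard. A chain like this makes edits detectable, but it does not by itself prevent someone with write access from rebuilding the whole chain, so the latest hash still needs to be anchored somewhere the log writer cannot modify (for example, write-once storage).

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _entry_hash(prev_hash, payload):
    """Hash the previous entry's hash together with a canonical payload encoding."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_entry(log, payload):
    """Append a payload to the log, linking it to the previous entry."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    entry = {
        "payload": payload,
        "prev_hash": prev_hash,
        "hash": _entry_hash(prev_hash, payload),
    }
    log.append(entry)
    return entry


def verify_chain(log):
    """Return the index of the first entry that fails verification, or None."""
    prev_hash = GENESIS
    for index, entry in enumerate(log):
        expected = _entry_hash(prev_hash, entry["payload"])
        if entry["prev_hash"] != prev_hash or entry["hash"] != expected:
            return index
        prev_hash = entry["hash"]
    return None


audit_log = []
append_entry(audit_log, {"user_id": "clinician-042", "action": "query"})
append_entry(audit_log, {"user_id": "clinician-007", "action": "query"})
append_entry(audit_log, {"user_id": "admin-001", "action": "export"})
print(verify_chain(audit_log))  # None while the log is unmodified
```

Changing the payload of any stored entry causes `verify_chain` to return that entry's index, because its stored hash no longer matches the recomputed one.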

Clinical AI documentation: what's required before deployment

Based on the regulatory framework above, the documentation a covered entity should have in place before deploying a fine-tuned clinical NLP model includes:

Data use authorization record. Documentation of the legal basis for using PHI (or de-identified data) in training: patient authorization, research waiver, the treatment, payment, and health care operations permission, or de-identification documentation.
De-identification process documentation. Which method (Safe Harbor or Expert Determination), who performed it, validation results.
Training data scope justification. Why this dataset scope satisfies the minimum necessary standard for the specific clinical use case.
BAA inventory. Which vendors and cloud services processed PHI (or de-identified data that their service agreements still bring under BAA coverage), and confirmation that BAAs are in place.
Training artifact lineage. Dataset version records, training configuration, model checkpoint hashes — the technical audit trail proving which data produced which model.
Inference PHI handling documentation. Architecture documentation for how PHI is handled in production queries: what is logged, where logs are stored, who has access, retention period.
Model performance documentation. Evaluation results across relevant clinical subpopulations, known limitations, and the monitoring plan for production performance.
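For the training artifact lineage item in the list above, a minimal technical record is a manifest that ties a dataset version to the hashes of the exact files involved. The sketch below assumes the dataset, training configuration, and checkpoint each exist as a single file, and the manifest field names are hypothetical. A hash fixes the bytes that were used; it does not explain where they came from, so the manifest complements the authorization and de-identification records rather than replacing them.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def sha256_file(path, chunk_size=1 << 20):
    """Compute the SHA-256 of a file in chunks so large checkpoints fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_lineage_manifest(dataset_path, config_path, checkpoint_path, dataset_version):
    """Record which dataset and configuration produced which checkpoint."""
    return {
        "dataset_version": dataset_version,
        "dataset_sha256": sha256_file(dataset_path),
        "training_config_sha256": sha256_file(config_path),
        "checkpoint_sha256": sha256_file(checkpoint_path),
    }


# Demonstration with throwaway files standing in for real artifacts.
with tempfile.TemporaryDirectory() as tmp:
    tmp_dir = Path(tmp)
    (tmp_dir / "notes.jsonl").write_bytes(b'{"id": "n1"}\n')
    (tmp_dir / "train.yaml").write_bytes(b"epochs: 3\n")
    (tmp_dir / "model.ckpt").write_bytes(b"\x00\x01\x02")
    manifest = build_lineage_manifest(
        tmp_dir / "notes.jsonl",
        tmp_dir / "train.yaml",
        tmp_dir / "model.ckpt",
        dataset_version="cardiology-notes-v1",
    )

print(json.dumps(manifest, indent=2))
```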

The inference side: PHI in production queries

A common compliance gap: teams carefully document their training data compliance but don't adequately address inference-time PHI handling. When a clinical NLP model is deployed in a production clinical workflow (processing discharge notes, summarizing patient histories, classifying clinical documents), it's typically processing PHI in real time.

This creates obligations beyond the training data documentation. The production system needs: technical safeguards appropriate for a system handling electronic PHI (encryption in transit and at rest, access controls, audit logging), a BAA with any cloud infrastructure vendor hosting the inference endpoint, a documented process for handling inferences that might need to be reviewed or retracted if the model produces clinically incorrect output, and retention policies for the inference logs that are consistent with the covered entity's overall PHI retention framework.

Clinical ML teams who address these questions systematically before deployment — rather than discovering them during a compliance review — consistently move faster in the long run. The documentation work done pre-deployment is also reusable: a well-structured compliance record for model version 1.0 provides most of the scaffolding for the review of version 2.0. The investment in upfront documentation pays dividends in every subsequent iteration cycle.
