Audit-Ready Web Archives: SEC 17a-4, FDA, and HIPAA Rules
Steve Butterworth · May 6th, 2026 · 12 min read

Free archive tools fail under regulatory audit. Here is what SEC 17a-4, FDA 21 CFR Part 11, and HIPAA require from a web archive that holds up under examination.

Steve Butterworth
Founder of Changeflow. Builds regulatory monitoring infrastructure used by compliance teams, law firms, and regulated-industry operators.

A regulator schedules an examination. Their request letter asks for the firm's website content as it appeared on twelve specific dates over the past four years. The compliance team opens the Wayback Machine, finds patchy snapshots, and starts assembling what they can. Some dates have no captures. Others have captures that miss the embedded PDF. The audit response is going to be incomplete and the firm cannot prove what it said.

This is a common pattern. Free web archives are excellent for research and citation, and unsuitable for regulatory production. The gap between the two is what "audit-ready" means.

This guide covers what regulators actually require from a web archive, where the rules differ across SEC, FDA, and HIPAA, and what to look for in a vendor before the request letter arrives. For the inbound side of the problem, detecting changes on regulator and vendor pages, see our guide on compliance monitoring software.

What "Audit-Ready" Means

Across rules, audit-ready archives share five technical properties. A free archive lacks most of them. A compliance vendor will document each.

Immutable storage. The archive cannot be edited or deleted before the retention period ends. The SEC calls this WORM (write-once-read-many). HIPAA effectively requires the same thing for records used in compliance proceedings. Implementation is via immutable cloud storage, write-locked filesystems, or cryptographic hash chains.
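Of the three mechanisms, the hash chain is the easiest to illustrate in a few lines. The following is a minimal sketch, not a description of any vendor's implementation: each entry commits to the hash of the entry before it, so a later edit to an earlier entry becomes detectable. The function names, the `GENESIS` placeholder, and the payload fields are all hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # hypothetical placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, payload: dict) -> str:
    """Hash the previous link together with a canonical form of the payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_entry(chain: list, payload: dict) -> None:
    """Add a new entry that commits to everything recorded before it."""
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"payload": payload, "prev": prev, "hash": entry_hash(prev, payload)})


def verify_chain(chain: list) -> bool:
    """Recompute every link; an edited, reordered, or removed earlier entry fails."""
    prev = GENESIS
    for entry in chain:
        if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["payload"]):
            return False
        prev = entry["hash"]
    return True


chain = []
append_entry(chain, {"url": "https://example.com/disclosures", "sha256": "ab12"})
append_entry(chain, {"url": "https://example.com/fees", "sha256": "cd34"})
print(verify_chain(chain))  # True

chain[0]["payload"]["sha256"] = "ffff"  # simulate tampering with an old entry
print(verify_chain(chain))  # False
```

One limitation of this sketch: dropping entries from the end of the chain goes unnoticed unless the latest hash is also recorded somewhere outside the archive.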

Cryptographic timestamps and hashes. Each capture is signed and hashed at the time of capture. If the captured bytes are altered after the fact, the hash check fails. This is the foundation for self-authentication under Federal Rules of Evidence 902(13) and 902(14), and it is what separates court-admissible captures from screenshots.
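As a rough illustration of the hash-and-sign step, the sketch below seals a capture with a SHA-256 digest and a signature over the URL, timestamp, and digest. It makes two simplifying assumptions: an HMAC with a shared key stands in for a real digital signature, and the local clock stands in for a trusted timestamp authority. `SIGNING_KEY`, `seal_capture`, and `verify_capture` are hypothetical names.

```python
import hashlib
import hmac
from datetime import datetime, timezone

SIGNING_KEY = b"demo-key-not-for-production"  # hypothetical; real systems use managed keys


def seal_capture(content: bytes, url: str) -> dict:
    """Hash the captured bytes and sign the (url, time, hash) tuple."""
    digest = hashlib.sha256(content).hexdigest()
    captured_at = datetime.now(timezone.utc).isoformat()
    message = f"{url}|{captured_at}|{digest}".encode("utf-8")
    signature = hmac.new(SIGNING_KEY, message, hashlib.sha256).hexdigest()
    return {"url": url, "captured_at": captured_at, "sha256": digest, "signature": signature}


def verify_capture(content: bytes, record: dict) -> bool:
    """Re-hash the bytes and check the signature; altered bytes fail the check."""
    digest = hashlib.sha256(content).hexdigest()
    message = f"{record['url']}|{record['captured_at']}|{digest}".encode("utf-8")
    expected = hmac.new(SIGNING_KEY, message, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, record["signature"])


page = b"<html><body>Fee schedule v1</body></html>"
record = seal_capture(page, "https://example.com/fees")
print(verify_capture(page, record))                 # True
print(verify_capture(page + b"<!-- edit -->", record))  # False
```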

Audit log. Every access is logged: who retrieved which capture, when, and from where. HIPAA explicitly requires this. The SEC and FDA expect it. Wayback Machine and archive.today produce no logs the requesting firm controls.
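To make "who, which capture, when, and from where" concrete, here is a minimal sketch of an append-only access log. The field names and the `AccessLog` class are assumptions for illustration, and an in-memory list stands in for the immutable storage a real archive would write to.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AccessEvent:
    user: str        # who retrieved the capture
    capture_id: str  # which capture
    action: str      # e.g. "view" or "export"
    source_ip: str   # from where
    at: str          # when (UTC, ISO 8601)


class AccessLog:
    """Append-only: events can be recorded and read, never edited or removed."""

    def __init__(self):
        self._lines = []

    def record(self, user: str, capture_id: str, action: str, source_ip: str) -> AccessEvent:
        event = AccessEvent(user, capture_id, action, source_ip,
                            datetime.now(timezone.utc).isoformat())
        self._lines.append(json.dumps(asdict(event), sort_keys=True))
        return event

    def for_capture(self, capture_id: str) -> list:
        """Return every logged access to one capture, oldest first."""
        events = (json.loads(line) for line in self._lines)
        return [e for e in events if e["capture_id"] == capture_id]


log = AccessLog()
log.record("alice", "cap-2024-03-01", "view", "10.0.0.5")
log.record("bob", "cap-2024-03-01", "export", "10.0.0.9")
print([e["user"] for e in log.for_capture("cap-2024-03-01")])  # ['alice', 'bob']
```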

Retention configuration. Different sources need different retention periods. A broker-dealer keeps marketing communications for three years. A pharma manufacturer keeps device IFU pages for the product lifecycle plus years of follow-on. A healthcare provider keeps PHI-adjacent content for six years. The archive must let an administrator set per-source retention, not impose a single global rule.
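Per-source retention can be as simple as a lookup table keyed by source type. The sketch below uses the three-year and six-year figures from the paragraph above. The source-type labels, the `DEFAULT_YEARS` fallback, and the function names are hypothetical, and lifecycle-based retention (the pharma case) is left out because it depends on a product retirement date rather than a fixed term.

```python
from datetime import date

# Years of retention per source type. The labels are illustrative only.
RETENTION_YEARS = {
    "marketing": 3,     # broker-dealer marketing communications
    "phi_adjacent": 6,  # healthcare PHI-adjacent content
}
DEFAULT_YEARS = 6  # hypothetical fallback for unclassified sources


def add_years(d: date, years: int) -> date:
    """Add whole years; a Feb 29 start rolls forward to Mar 1, never backward."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, month=3, day=1)


def delete_after(source_type: str, captured_on: date) -> date:
    """Earliest date on which a capture of this source type may be deleted."""
    years = RETENTION_YEARS.get(source_type, DEFAULT_YEARS)
    return add_years(captured_on, years)


print(delete_after("marketing", date(2024, 2, 29)))    # 2027-03-01
print(delete_after("phi_adjacent", date(2020, 5, 6)))  # 2026-05-06
```

Rolling a leap-day capture forward rather than back is a deliberate choice here: when in doubt, a retention system should keep a record slightly longer, not delete it a day early.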

Documented capture process. The archive vendor publishes a process specification: how captures are scheduled, what is captured (HTML, PDF, embedded media), how completeness is validated, and how the archive is recovered after a disaster. Auditors ask to see this. "Trust us" is not a process specification.

A free archive provides none of these reliably. The Wayback Machine has timestamps but they are not signed. Archive.today produces a screenshot and HTML but no audit log or retention model. ArchiveBox can be configured for some of this with effort, but the configuration becomes the firm's responsibility.

SEC 17a-4: Broker-Dealer Recordkeeping

SEC Rule 17a-4 governs how broker-dealers retain books and records. Subsection (f) sets the technical bar for electronic records.

The 2022 amendment updated the rule from a strict WORM-only standard to allow either WORM storage or an audit-trail alternative. Both options remain demanding. A firm using audit-trail storage must demonstrate the system records and time-stamps every record event, retains an unalterable original alongside any modified copy, and produces a complete audit trail on demand.
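The three audit-trail properties described above (time-stamped events, an unalterable original kept alongside modified copies, and a trail produced on demand) can be sketched as a versioned store. This is an illustration of those properties only, not a statement of what satisfies the rule. The `AuditTrailStore` class and its method names are hypothetical, and in-memory structures stand in for durable storage.

```python
from datetime import datetime, timezone


class AuditTrailStore:
    """Keeps every version of a record; nothing is overwritten in place."""

    def __init__(self):
        self._versions = {}  # record_id -> list of versions, index 0 is the original
        self._events = []    # time-stamped record events, in order

    def _log(self, record_id: str, action: str, version: int, user: str) -> None:
        self._events.append({
            "record_id": record_id, "action": action, "version": version,
            "user": user, "at": datetime.now(timezone.utc).isoformat(),
        })

    def create(self, record_id: str, content: bytes, user: str) -> None:
        if record_id in self._versions:
            raise ValueError("record already exists; use modify()")
        self._versions[record_id] = [content]
        self._log(record_id, "create", 0, user)

    def modify(self, record_id: str, content: bytes, user: str) -> None:
        """Append a new version; the original stays at index 0."""
        versions = self._versions[record_id]
        versions.append(content)
        self._log(record_id, "modify", len(versions) - 1, user)

    def original(self, record_id: str) -> bytes:
        return self._versions[record_id][0]

    def current(self, record_id: str) -> bytes:
        return self._versions[record_id][-1]

    def audit_trail(self, record_id: str) -> list:
        """Every event for one record, oldest first."""
        return [e for e in self._events if e["record_id"] == record_id]


store = AuditTrailStore()
store.create("homepage-2024-03-01", b"<html>v1</html>", "capture-bot")
store.modify("homepage-2024-03-01", b"<html>v1 redacted</html>", "alice")
print(store.original("homepage-2024-03-01"))  # b'<html>v1</html>'
print([e["action"] for e in store.audit_trail("homepage-2024-03-01")])  # ['create', 'modify']
```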

Web content lands in scope when the page contains "communications" within FINRA's definition. Public marketing pages, retail-facing product descriptions, social-media posts on official accounts, and email-newsletter archives can all qualify. FINRA's 2017 Regulatory Notice on social media (Regulatory Notice 17-18) made it explicit: a tweet or LinkedIn post about products by a registered representative is a record subject to retention.

FINRA Rule 4511 aligns recordkeeping practices with SEC 17a-4. Member firms are routinely examined on whether their archive design satisfies both rules. A common failing: the firm retains email and chat through a vendor, then archives the website with screenshots that fail the WORM and audit-trail tests.

The practical fix is a single archive that handles email, social media, and website content under the same retention and audit-log discipline. Smarsh, Global Relay, and similar sectoral vendors built this category. PageFreezer focuses on the website and social-media side and integrates with email-archive vendors for the rest. Compliance teams running SEC EDGAR monitoring or watching enforcement-action pages should also archive their own published content under matching retention.

FDA 21 CFR Part 11: Electronic Records for Life Sciences

21 CFR Part 11 sets requirements for electronic records and signatures used to satisfy FDA recordkeeping. The rule is technology-neutral. It demands controls, not specific tools.

For a pharma or medical-device company, web pages enter Part 11 scope when they contain regulated content the FDA expects the company to maintain. Examples:

  • Drug labeling published online
  • Medical device instructions for use (IFUs)
  • Adverse event reporting forms and their submission confirmations
  • Clinical trial result postings
  • Patient-information leaflets in clinical settings

The technical controls Part 11 imposes are familiar to compliance teams: signed records, secure user-access controls, audit trails capturing who created or modified each record, validation of the system that produces the records, and authorized copies of records on demand.

A free archive cannot satisfy these. Neither can most general-purpose website archivers. What FDA-regulated companies actually run are validated archive platforms that come with documented IQ/OQ/PQ (installation, operational, and performance qualification) packages, change-control logs, and a Part 11 compliance attestation.

The pricing is enterprise-level. The ROI shows up at the next pre-approval inspection or post-market surveillance audit. Companies tracking FDA and CMS website changes for current awareness should run a parallel archive of their own regulated pages on a Part 11-validated stack.

HIPAA: Web Archives Touching PHI

HIPAA's Security Rule sets administrative, physical, and technical safeguards for electronic protected health information (ePHI). Web archives intersect HIPAA in three ways, each with its own risk.

The web archive contains PHI. Patient portals, secure messaging interfaces, and provider-facing tools may produce screenshots that contain PHI. If those screenshots are captured for compliance or evidence purposes, the archive becomes ePHI storage and inherits all HIPAA technical safeguards: access controls, audit logs, integrity controls, and transmission security.

Public-web pages inadvertently contain PHI. Misconfigured forms, accidentally indexed staging environments, and crawler-grabbed PDFs can capture PHI that was never supposed to be public. The Wayback Machine has historic captures of PHI in such cases. Healthcare organizations responding to a breach often need to identify these captures and request their removal. The Internet Archive will honor such requests under specific procedures, but copies may persist on third-party mirrors.

The archive is used in audits or breach response. When an OCR (Office for Civil Rights) investigation looks at a healthcare organization's web presence over time, the archive becomes a piece of the investigation. Auditors ask for the access log of the archive itself, the retention configuration, and the chain-of-custody documentation. A free archive answers none of this.

The practical pattern: healthcare organizations run a HIPAA-eligible archive vendor with a Business Associate Agreement (BAA) for any internal-facing or audit-relevant content, and rely on free archives only for genuinely public content where no PHI exposure is possible. Pairing the archive with regulatory change management covers the inbound side: knowing when state privacy laws or HHS guidance evolves.

Court-Grade Authentication: FRE 902(13) and 902(14)

When a web archive ends up in litigation, the rules of evidence take over from the regulatory rules. Federal Rules of Evidence 902(13) and 902(14) cover self-authentication of two kinds of material: certified records generated by an electronic process or system, and certified data copied from an electronic device, storage medium, or file.

The Advisory Committee notes are explicit about what self-authentication requires: a process the proponent can describe, metadata (such as hash values) that allows tampering to be detected, and a certification from a qualified person attesting to the process.

Audit-ready archives produce all of this by default. Free archives produce almost none of it. The gap is why federal courts admit Wayback Machine evidence under FRE 901(b)(1) with witness testimony rather than under 902(13), and why witnesses end up in declarations explaining how they accessed the snapshot.

Our guide on eDiscovery website evidence covers the litigation side of this in depth. The thread that connects litigation evidence and regulatory audit is the same: a process that produces signed, hashed, timestamped, access-logged captures will satisfy both. A process that does not will fall short of both when the stakes are real.

Vendor Checklist: What to Ask Before Signing

For compliance teams evaluating a web-archive vendor, the questions that matter are technical and process-specific. Generic feature lists do not survive an examination.

Storage and immutability. Does the vendor offer WORM-equivalent storage with cryptographic immutability? What is the underlying mechanism? How is the WORM state attestable to a regulator? Where do the bytes physically live and under whose jurisdiction?

Capture process and validation. What is captured (HTML, embedded PDF, JavaScript-rendered content, media)? What is the capture cadence and how is it logged? How is capture completeness validated? Are failed captures flagged and re-attempted?
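One way a completeness check might work is to parse the captured HTML, list the resources it references, and compare that list against what the archive actually stored. The sketch below is a simplified illustration using Python's standard-library HTML parser. The tag list, the PDF-link heuristic, and the names `ResourceCollector` and `missing_resources` are assumptions, and JavaScript-rendered content would need a headless browser that this example does not cover.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ResourceCollector(HTMLParser):
    """Collects absolute URLs of embedded resources and linked PDFs."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.resources = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("img", "script", "iframe", "embed") and attrs.get("src"):
            self.resources.add(urljoin(self.base_url, attrs["src"]))
        elif tag == "a" and (attrs.get("href") or "").lower().endswith(".pdf"):
            self.resources.add(urljoin(self.base_url, attrs["href"]))


def missing_resources(html: str, base_url: str, captured_urls: set) -> set:
    """Resources the page references that are absent from the capture."""
    collector = ResourceCollector(base_url)
    collector.feed(html)
    return collector.resources - captured_urls


html = '<img src="/logo.png"><a href="docs/fees.pdf">Fee schedule</a>'
captured = {"https://example.com/logo.png"}
print(missing_resources(html, "https://example.com/products/", captured))
# {'https://example.com/products/docs/fees.pdf'}
```

A non-empty result is the signal to flag the capture as incomplete and re-attempt it, which is the behavior the questions above are probing for.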

Retention configuration. Can retention periods be set per source? Per record type? Per legal-hold flag? What happens at retention expiry, and is the deletion auditable?
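The interaction between retention expiry, legal holds, and auditable deletion can be sketched in a few lines. This is an illustrative model under simple assumptions, not a vendor's logic: a legal hold always overrides expiry, and every deletion writes an entry to a separate log. The `ArchivedRecord` fields and the function names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ArchivedRecord:
    record_id: str
    retain_until: date
    legal_hold: bool = False


def disposition(record: ArchivedRecord, today: date) -> str:
    """A legal hold wins over expiry; otherwise compare against retain_until."""
    if record.legal_hold:
        return "hold"
    if today < record.retain_until:
        return "retain"
    return "eligible_for_deletion"


def purge(records: list, today: date, deletion_log: list) -> list:
    """Return the records that remain, logging each deletion as it happens."""
    kept = []
    for record in records:
        if disposition(record, today) == "eligible_for_deletion":
            deletion_log.append({
                "record_id": record.record_id,
                "deleted_on": today.isoformat(),
                "retain_until": record.retain_until.isoformat(),
            })
        else:
            kept.append(record)
    return kept


records = [
    ArchivedRecord("expired", date(2023, 1, 1)),
    ArchivedRecord("expired-but-held", date(2023, 1, 1), legal_hold=True),
    ArchivedRecord("still-in-retention", date(2030, 1, 1)),
]
deletion_log = []
remaining = purge(records, date(2026, 5, 6), deletion_log)
print([r.record_id for r in remaining])         # ['expired-but-held', 'still-in-retention']
print([d["record_id"] for d in deletion_log])   # ['expired']
```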

Audit log and access controls. Who can view, retrieve, or export captures? Is each access logged with user, timestamp, IP, and retrieved record? Is the audit log itself immutable?

Authentication and chain of custody. Does each capture include a digital signature, hash, and timestamp authority signature? Does the vendor produce a 902(13) declaration on demand? Is there a documented chain-of-custody from capture to export?

Compliance attestations. What audits has the vendor passed? SOC 2 Type II is table stakes. ISO 27001 helps. For healthcare content, the vendor should be HIPAA-eligible and willing to sign a BAA. Sector-specific certifications matter for SEC 17a-4 and FDA Part 11.

Pricing. Most enterprise compliance archive vendors do not publish pricing. Expect mid-five-figure to six-figure annual contracts depending on scope, integrations, and capture volume. Smaller, sectoral vendors sometimes publish entry tiers. Always get a written quote scoped to your actual sources and retention.

The same questions apply whether the vendor is monitoring your own published content for retention or tracking vendor terms of service for legal-ops awareness. Audit-readiness is a property of the system, not the source.

Where the Lines Get Blurry

Three real-world scenarios consistently catch compliance teams off guard.

Cross-border requirements. A US broker-dealer with EU and APAC clients now has GDPR data-residency, MiFID II archiving, and APAC sectoral rules layered on top of SEC 17a-4. A vendor that stores everything in US-East-1 may not survive an EU audit. Multi-region storage with documented residency is the answer, and not every vendor does it.

Acquired entities and legacy archives. When Company A acquires Company B, Company A inherits Company B's archive obligations. Often the legacy archive is on a vendor with weaker retention guarantees, or no documented capture process for periods before acquisition. The integration step is rarely budgeted and routinely overlooked. Our guide on maintaining regulatory compliance walks through this in detail.

The line between awareness and archive. A monitoring tool detects changes on a regulator page. An archive preserves what was published on a firm's own page. These are different categories with different audit profiles. A team that buys one and assumes it covers the other will discover the gap during an exam, not before. Our compliance monitoring software guide covers the awareness layer; this guide covers the archive layer; both are required for a defensible posture in regulated environments.

Frequently Asked Questions

What does SEC 17a-4 require for web archiving?

SEC Rule 17a-4(f) requires broker-dealers to retain certain electronic records either on non-rewritable, non-erasable storage (WORM) or, since the 2022 amendment, in a system that maintains a complete, time-stamped audit trail. Either way, the firm needs quality and accuracy verification, indexed retrieval, and audit logging. When firm websites or social-media pages contain communications subject to recordkeeping, those pages fall under the rule. Free archives fail because they offer no WORM property, no audit trail the firm controls, and no retention guarantee.

Does 21 CFR Part 11 apply to a pharma company's website?

Sometimes. Part 11 covers electronic records used to satisfy FDA recordkeeping requirements. If a company's website contains GxP-regulated content, such as drug labeling, medical device IFUs, or adverse-event reporting forms, those pages may be in scope. The technical controls Part 11 demands are not features of free web archives.

Can I use Wayback Machine snapshots in a HIPAA audit?

Generally no. HIPAA's Privacy and Security Rules require covered entities to know who accessed a record and when. Wayback Machine snapshots have no access logs, no associated chain of custody, and may have inadvertently archived PHI that should never have been on the public web. For HIPAA-relevant archives, organizations need a tool that produces signed, access-logged captures with documented retention.

What is a WORM web archive?

WORM stands for write-once-read-many. It is a storage property the SEC and several other regulators require or accept for retained records. The archive cannot be modified or deleted before the retention period expires. Compliance vendors implement WORM through immutable cloud storage, write-locked filesystems, or cryptographic hash chains. Free archives like the Wayback Machine and archive.today do not provide WORM properties.

How long do regulators expect web archives to be kept?

Retention periods vary by rule. SEC 17a-4 typically requires three to six years, FINRA aligns with SEC retention, FDA expects records kept for the lifecycle of the regulated product plus follow-on periods, and HIPAA requires six years from creation or last effective date, whichever is later. State laws and sectoral rules add overlays. A vendor that does not let you configure per-source retention will not survive a multi-rule environment.
