The Compensation Problem No One Is Solving
The litigation wave against AI companies is real. Lawsuits targeting training data scraping practices have named major model developers under copyright law, the California Consumer Privacy Act, and state biometric privacy statutes. The legal theory is becoming clearer: if your writing, your voice, your likeness, or your behavioral data was ingested into a training corpus without your consent, you may have a compensable claim.
But here is the structural problem that almost no one in these proceedings is discussing. Even if plaintiffs win, even if a court certifies a class and a settlement fund is established, the distribution mechanism does not exist. There is no registry of who created what, when, and in what form it was scraped. There is no cryptographic record linking a specific individual to a specific data artifact. Without those records, a settlement is hard to administer at best and impossible to distribute at scale.
This is not a speculative concern. It is a known failure mode in data class actions that courts and administrators have struggled with for years.
What Settlements Actually Require to Function
Class action settlements involving digital harms typically require three things to distribute compensation fairly. First, a defined class with identifiable members. Second, a mechanism to verify membership. Third, a claims process where individuals can prove standing.
In data privacy litigation, standing has been contested at the Supreme Court level. TransUnion LLC v. Ramirez (2021), a Fair Credit Reporting Act class action, tightened the requirement that plaintiffs demonstrate concrete harm rather than a speculative risk of future harm. That precedent carries over to AI data harvesting claims because the injury is diffuse and hard to tie to any single individual's provable loss.
For AI training data claims, the standing problem compounds. Your words were scraped. Your stylistic fingerprint may exist inside a model's weights. But proving that your specific creative output was ingested, at a specific time, from a specific URL, in a way that harmed you, requires documentation that most individuals have never thought to create.
Courts will ask: what is your proof of prior ownership? The answer, for most people right now, is nothing verifiable.

Why Existing Frameworks Fall Short
GDPR Article 17 gives EU residents the right to erasure. CCPA Section 1798.105 gives California residents a similar deletion right. Both frameworks assume that a data subject can identify themselves to a controller and request action. Neither framework was designed to handle the inverse problem: proving that you are the original creator of data that has already been absorbed into a model.
Copyright registration is the closest analog to origination proof that currently exists at scale. The U.S. Copyright Office maintains timestamped registration records. But copyright registration covers finished works, not the granular data artifacts that training pipelines actually consume. A scraper does not harvest your registered novel. It harvests sentences, paragraphs, metadata, behavioral signals, image patches, and audio features that exist across thousands of unregistered contexts.
Platform terms of service are not a solution either. When you agreed to a platform's terms, you typically granted that platform a license. Whether that license extended to sublicensing your data to third-party AI developers is the core legal question in several active proceedings. But even a favorable ruling on that question does not create a distribution mechanism. It only establishes liability in the abstract.
The gap between legal liability and individual compensation is where the entire AI data rights movement is currently stalled.
Origination Proof as Infrastructure
Think of what a functioning settlement infrastructure would actually need. It would need a timestamped record that a specific individual created or controlled a specific data artifact before it was scraped. It would need that record to be tamper-resistant, independently verifiable, and tied to a real identity without requiring that identity to be publicly exposed in the record itself.
This is precisely what cryptographic origination certificates provide. A hash of a data artifact, signed with a private key, anchored to a timestamp, and associated with a verified identity creates an audit trail that can survive litigation discovery. It is not a claim that the data was never scraped. It is evidence that you held the data at a recorded point in time, before the scrape, which is what a claim of prior ownership rests on.
The distinction matters enormously in a settlement context. In a class action, the burden of proof for individual claims typically falls on the claimant. If you can produce a cryptographically verifiable certificate showing that you created a specific piece of content on a specific date, and that content is demonstrably present in a training dataset through forensic analysis, you have the foundation of a compensable claim. Without that certificate, you have an argument.
Arguments do not clear claims administration hurdles. Certificates do.
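The hash, signature, and timestamp described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the PDAOS certificate format, which this article does not specify. The field names (`artifact_sha256`, `owner_id`, `issued_at`) and the functions `issue_certificate` and `verify_certificate` are hypothetical. To stay within the standard library, an HMAC over a secret key stands in for a true private-key signature such as Ed25519, and the timestamp is asserted by the issuer rather than obtained from an independent timestamping authority.

```python
import hashlib
import hmac
import json
from datetime import datetime, timezone
from typing import Optional


def _sign(payload: dict, signing_key: bytes) -> str:
    # Canonical JSON so the same fields always serialize to the same bytes.
    message = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode()
    # ASSUMPTION: HMAC-SHA256 stands in for an asymmetric signature here.
    return hmac.new(signing_key, message, hashlib.sha256).hexdigest()


def issue_certificate(artifact: bytes, owner_id: str, signing_key: bytes,
                      issued_at: Optional[datetime] = None) -> dict:
    """Bind an artifact's hash to an owner identifier and a timestamp."""
    issued_at = issued_at or datetime.now(timezone.utc)
    cert = {
        "artifact_sha256": hashlib.sha256(artifact).hexdigest(),
        "owner_id": owner_id,  # hypothetical identifier, not a public identity
        "issued_at": issued_at.isoformat(),
    }
    cert["signature"] = _sign(cert, signing_key)  # signed before the key is added
    return cert


def verify_certificate(cert: dict, artifact: bytes, signing_key: bytes) -> bool:
    """Check that the record is untampered and matches the produced artifact."""
    unsigned = {k: v for k, v in cert.items() if k != "signature"}
    signature_ok = hmac.compare_digest(
        _sign(unsigned, signing_key), cert.get("signature", "")
    )
    hash_ok = hashlib.sha256(artifact).hexdigest() == cert.get("artifact_sha256")
    return signature_ok and hash_ok
```

Only the hash of the artifact is stored, so the record can be produced on demand without exposing the content itself. Changing the artifact, the owner, or the timestamp after issuance causes verification to fail, which is the tamper-resistance property the paragraph above describes.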

PDAOS and the Settlement Layer
Own Your Data Inc. developed the Personal Data Asset Origination System, or PDAOS, specifically to address this infrastructure gap. The full technical specification is published at mydatakey.org/pdaos-white-paper/ and details how origination certificates are structured, signed, and stored in a way that is designed to survive the evidentiary demands of legal proceedings.
The core design principle is that a PDAOS certificate is not a self-attestation. It is a timestamped, cryptographically signed record issued through MyDataKey™ that establishes provable prior creation. The certificate does not need to be publicly broadcast to function. It needs to be producible on demand when a claims administrator, a court, or a settlement fund asks: prove you owned this data before it was harvested.
At scale, this architecture becomes settlement infrastructure. If a significant percentage of claimants in a future AI data class action can produce origination certificates, the claims administration process becomes tractable. A verifiable registry exists. Membership in the class can be confirmed programmatically rather than through manual affidavit review. Compensation can be distributed proportionally based on the scope and type of data covered by each certificate.
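As a rough illustration of proportional distribution, the sketch below splits a fund across verified claimants according to a per-claimant weight. How the scope and type of data on a certificate would translate into a weight is not defined in this article, so the integer weights, the claimant identifiers, and the function name `allocate_pro_rata` are all assumptions. Amounts are kept in whole cents, and leftover cents from rounding go to the claimants with the largest fractional remainders.

```python
def allocate_pro_rata(fund_cents: int, weights: dict) -> dict:
    """Split fund_cents across claimants in proportion to integer weights."""
    if fund_cents < 0:
        raise ValueError("fund_cents must be non-negative")
    if any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-negative")
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one claimant needs a positive weight")

    # Floor division keeps every payout in whole cents.
    payouts = {cid: fund_cents * w // total for cid, w in weights.items()}

    # Rounding down leaves a few cents undistributed; hand them out one at a
    # time, starting with the largest fractional remainders.
    leftover = fund_cents - sum(payouts.values())
    by_remainder = sorted(
        weights, key=lambda cid: (fund_cents * weights[cid]) % total, reverse=True
    )
    for cid in by_remainder[:leftover]:
        payouts[cid] += 1
    return payouts


# Hypothetical weights: claimant-a's certificates cover three times as much
# data as each of the others. The fund is $1,000.00, expressed in cents.
example = allocate_pro_rata(
    100_000, {"claimant-a": 3, "claimant-b": 1, "claimant-c": 1}
)
print(example)
# {'claimant-a': 60000, 'claimant-b': 20000, 'claimant-c': 20000}
```

Working in integer cents avoids floating-point drift, so the payouts always sum exactly to the fund. A real claims administrator would also deduct fees and costs before distribution, which this sketch leaves out.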
This is not a novel concept in compensation systems. Patent pools, music licensing registries like ASCAP and BMI, and the Copyright Office's registration system all function on the same logic: prior registration determines rights and distribution. The AI settlement layer needs an equivalent mechanism. PDAOS is designed to be that mechanism for personal data.
As a 501(c)(3) nonprofit, Own Your Data Inc. built MyDataKey™ to make this infrastructure accessible to individuals, not just corporations with legal departments. The goal is to ensure that when compensation mechanisms finally exist, the people most affected by data harvesting are not excluded because they lacked the resources to document their claims in advance.
What You Can Do Now Before Any Settlement Exists
The window to establish origination records is open now, before litigation resolves. Once a settlement fund is established, a claims bar date is set, and the period to submit evidence closes, there is generally no retroactive path. Courts rarely reopen claims periods because individuals failed to document their assets in advance.
The practical steps are straightforward. Register a MyDataKey™ origination certificate for data you have created and published. This includes written content, images, audio, and any digital artifact with a traceable creation history. If you have published content across platforms that have supplied training data to major AI developers, those artifacts are the most likely candidates for future class membership.
You can start at mydatakey.org/signup/. The certificate process is designed to be usable by individuals without legal or cryptographic expertise, while producing records that meet the technical standards required for legal proceedings.
If you are also concerned about data brokers holding profiles compiled from your behavioral data, the opt-out process documented at mydatakey.org/opt-out/ addresses a parallel but distinct problem. Origination certificates and opt-out rights work in different directions. Certificates prove what you created. Opt-outs limit what brokers can distribute. Both matter. Neither substitutes for the other.
The Registry Question Courts Will Eventually Ask
Legal scholars and class action practitioners are beginning to surface the distribution problem explicitly. If AI companies face billions in liability for training data claims, how does a settlement fund of any size get distributed to tens of millions of potential claimants? The Facebook BIPA settlement in Illinois distributed roughly $397 per claimant from a $650 million fund. That proceeding had biometric data records, platform account data, and a defined user base to work with.
AI training data claims are structurally more complex. The scraped data is not confined to one platform. It spans the open web, social media exports, public repositories, and licensed datasets. There is no single controller with a clean user list. The claimant class, if certified, could be enormous. And the only way to manage distribution at that scale is through a registry of verifiable origination records.
Courts will ask whether such a registry exists. Plaintiffs' counsel will ask whether their clients are in it. Claims administrators will ask how to verify membership without it. The answer today is that no centralized, independent registry exists at the scale required. Building one retroactively, after a settlement is reached, is not feasible under the time constraints of claims administration.
The infrastructure has to be built before the settlement. That is the window that exists right now, in 2026, while litigation is still in early stages and before any major AI data class action has reached the distribution phase. The question of whether individuals will be compensated when courts rule in their favor may ultimately depend on whether they documented their claims before the proceedings concluded.
Origination proof is not a legal strategy. It is legal infrastructure. And the time to build it is before you need it.
Editorial Review
This article was reviewed by Ryan Gaughan on May 13, 2026 for accuracy, currency, and clarity. Content is updated when laws or guidance change.