It is increasingly difficult to determine whether content is created by humans or by artificial intelligence, creating problems both for model developers, who want to avoid polluting their models’ training data with synthetic content, and for society at large. Misuse is already happening, from rampant academic plagiarism to the mass generation of misinformation.
Experts fear this is undermining trust in digital content and could lead to an ‘information apocalypse’, creating societal confusion about which information sources are reliable and a world in which people no longer have a shared reality.
That’s not all. With images, videos, design files, and proprietary documents now representing multimillion-dollar assets, organizations face unprecedented risks from unauthorized use that can severely damage brand reputation and revenue.
Part of the solution to this problem might be generative watermarking, one of the Top 10 Emerging Technologies of 2025 named in a June World Economic Forum report in collaboration with scientific publisher Frontiers. The Innovator is publishing a series of independently reported in-depth articles on the 2025 emerging trends in its FutureScope section, under a collaboration agreement with Frontiers.
The term watermarking comes from a process in which an identifying image or pattern in paper appears when viewed by transmitted light, proving the provenance of a document. Generative watermarking techniques for the AI age modify the training process, the inference process, or both, so that the artifacts a model produces – the text, audio, and video it generates – embed identifying information about the model from which they originate. This way a model operator, or potentially the consumer of the content, can determine whether an artifact came from the model by checking for the presence of the watermark. The trick is to subtly alter generative AI outputs without noticeably impacting their quality.
Over the next decade, generative watermarking technologies could evolve from optional technical safeguards into important components of digital trust infrastructure, according to the Dubai Future Foundation, which contributed to the Forum’s report. As synthetic content becomes increasingly prevalent, the Foundation says, embedded watermarks might form the foundation of a global verification ecosystem that helps distinguish between human- and machine-created digital assets.
Generative watermarking technologies could, for example, help strengthen the Coalition for Content Provenance and Authenticity (C2PA) initiative, which seeks to add provenance information to media from a variety of sources, not just AI. Watermarking has the potential to make C2PA signatures, which are embedded in an image’s metadata, more robust by preserving the origin of the artifact even after unattributed modification, according to a blog post published by Cloudflare, a major content delivery network that supports the C2PA standard.
But first, some serious barriers must be overcome. “It is a very complex problem,” says watermarking expert Sridhar Krishnan, Dean, Faculty of Engineering and Architectural Science, at Toronto Metropolitan University.
From a technical perspective, the central challenge of generative watermarking lies in balancing robustness, imperceptibility, and compatibility across diverse platforms and post-processing operations.
Today it is easy to remove watermarks from text. Methods that tie videos and images to their source can also be broken by bad actors. Researchers are working on new ways to make watermarking more robust, but it is early days. To be effective, generative watermarking needs buy-in from both the makers of large language models and the users of GenAI chatbots like ChatGPT. Currently, neither is a given.
Breaking Bad
Text-based watermark technologies, such as Google DeepMind’s SynthID, take advantage of the fact that there are thousands of words in each language that can be randomly substituted for one another, according to the Forum’s report. They work by weaving a narrow and specific subset of such words throughout AI-generated text in a way that reads naturally but is distinct from the more random word choices a human writer might make. The result is an AI-specific textual “fingerprint”. Image and video watermark technologies include introducing imperceptible changes at the pixel level that can survive edits like resizing and compression – for instance by subtly altering the values of individual pixels so that a machine can see the changes but the human eye cannot – or by embedding hidden patterns in generated output that only a machine can extract.
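To make the textual “fingerprint” idea concrete, below is a minimal, hypothetical sketch of a “green list” watermark of the kind pioneered in academic work: the previous word seeds a hash that splits a toy vocabulary into green and red halves, the generator is nudged toward green words, and a detector that knows the scheme simply counts how many green words appear. The vocabulary, bias parameter and hash choice here are illustrative assumptions, not SynthID’s actual algorithm.

```python
# Toy sketch of a green-list text watermark (illustrative only).
import hashlib
import random

VOCAB = ["river", "stream", "bank", "shore", "water", "flow",
         "current", "edge", "side", "margin", "brook", "creek"]

def green_list(prev_token: str) -> set:
    """Deterministically mark roughly half the vocabulary as 'green' given the previous word."""
    greens = set()
    for word in VOCAB:
        digest = hashlib.sha256((prev_token + "|" + word).encode()).digest()
        if digest[0] % 2 == 0:
            greens.add(word)
    return greens

def generate(length: int, bias: float = 0.9, seed: int = 0) -> list:
    """Toy 'model': picks a green word with probability `bias`, otherwise any word."""
    rng = random.Random(seed)
    tokens = [rng.choice(VOCAB)]
    for _ in range(length - 1):
        greens = green_list(tokens[-1])
        pool = list(greens) if (rng.random() < bias and greens) else VOCAB
        tokens.append(rng.choice(pool))
    return tokens

def green_fraction(tokens: list) -> float:
    """Detector: fraction of words that fall in the green list of their predecessor."""
    hits = sum(1 for prev, tok in zip(tokens, tokens[1:]) if tok in green_list(prev))
    return hits / max(len(tokens) - 1, 1)

watermarked = generate(200)
unmarked = [random.Random(1).choice(VOCAB) for _ in range(200)]
print(f"green fraction, watermarked: {green_fraction(watermarked):.2f}")  # well above 0.5
print(f"green fraction, unmarked:    {green_fraction(unmarked):.2f}")     # near 0.5
```

In real systems the bias is applied to the model’s token probabilities rather than a hand-written word list, and the detector uses a statistical test rather than a raw fraction, but the principle is the same.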
If a watermark is embedded in AI-generated content but can be stripped away with little effort, then it becomes impossible to prove whether the data was generated by AI, says Hanqing Guo, an assistant professor in the Department of Electrical and Computer Engineering at the University of Hawaii at Mānoa. But in practice, there is no perfectly robust watermark, he says. Determined bad actors can eventually break watermarks. For example, they might pass the watermarked output through multiple signal-processing steps, feed it into another generative model for re-synthesis, re-record it through an analog channel, or shuffle the content, says Guo, whose recent work includes studies on the robustness of watermarking in generative AI models and analysis of overwriting attacks that compromise neural watermarking. Each of these transformations can gradually erode or destroy the watermark, making detection unreliable, he says.
Watermarks, including Google’s SynthID and Meta’s Video Seal, are often based on deep learning. These schemes involve training a machine learning model, typically one with an encoder–decoder architecture in which the encoder embeds a signature into an artifact and the decoder extracts it. Hardening such models against tampering amounts to a scaled-up version of penetration testing, an essential practice in security engineering in which a system is subjected to a suite of known attacks until all known vulnerabilities are patched.
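As an illustration of the encoder–decoder idea, the following toy PyTorch sketch trains a small encoder to hide a 16-bit signature in an image as a faint residual, and a decoder to read it back after mild noise. The layer sizes, bit length and noise model are assumptions chosen for brevity and bear no relation to the actual SynthID or Video Seal architectures.

```python
# Minimal encoder-decoder watermark sketch (illustrative toy, not a production scheme).
import torch
import torch.nn as nn

MSG_BITS = 16        # length of the embedded signature
IMG = (3, 32, 32)    # toy image size

class Encoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3 + MSG_BITS, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1), nn.Tanh(),
        )
    def forward(self, img, bits):
        # Broadcast the message bits over the spatial grid and concatenate with the image.
        planes = bits.view(-1, MSG_BITS, 1, 1).expand(-1, MSG_BITS, *IMG[1:])
        residual = self.net(torch.cat([img, planes], dim=1))
        return img + 0.02 * residual   # keep the perturbation small (imperceptibility)

class Decoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, MSG_BITS),   # one logit per signature bit
        )
    def forward(self, img):
        return self.net(img)

encoder, decoder = Encoder(), Decoder()
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

for step in range(200):  # tiny training loop on random data
    img = torch.rand(8, *IMG)
    bits = torch.randint(0, 2, (8, MSG_BITS)).float()
    marked = encoder(img, bits)
    noisy = (marked + 0.01 * torch.randn_like(marked)).clamp(0, 1)  # simulated distortion
    loss = bce(decoder(noisy), bits) + 10.0 * nn.functional.mse_loss(marked, img)
    opt.zero_grad(); loss.backward(); opt.step()

# With sufficient training the decoder should recover the embedded bits
# well above the 50% chance level.
with torch.no_grad():
    recovered = (decoder(marked) > 0).float()
    print("bit accuracy:", (recovered == bits).float().mean().item())
```

Real schemes add many more simulated attacks (compression, cropping, re-encoding) to the training loop, which is where the penetration-testing analogy comes in.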
There will always be new attack variants or strategies that the model was not trained on and that may evade it, prompting the type of cat-and-mouse game well known to cybersecurity professionals, says Krishnan, who contributed to the Forum’s report on generative watermarking. Coping with new attacks on deep learning-based watermarks requires continual intelligence gathering and retraining of deployed models to keep up with attackers.
Cryptography helps reduce the attack surface, enhancing watermarking by providing keys, authentication, and tamper-resistance, Krishnan says. An integration with blockchain systems to create verifiable watermarks represents a promising frontier that could establish better content identity regardless of modification or distribution, according to the Forum report.
Despite ongoing innovation, ensuring robustness remains a challenge, says Krishnan. He is the topic editor of a new Frontiers in Signal Processing research topic exploring how watermarking can maintain its integrity across various file formats and promote standardized verification protocols to enhance digital provenance. A call for proposals closes on October 15.
The Importance Of Imperceptibility
Watermarks need to be imperceptible, meaning users should not notice any difference between the original content and the watermarked content. If a watermark makes text, images, or audio look or sound distorted, users will resist adoption, says Guo. For example, in audio watermarking for music streaming, watermarks are carefully designed so that even professional musicians cannot perceive them, yet they can still be detected with special tools. In generative AI, if the watermark leaves visible text artifacts or unnatural image patterns, both quality and trust in the system will drop. In the current state of the art, the watermark designer needs to make a trade-off between imperceptibility and robustness, Guo says.
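Imperceptibility is commonly quantified with metrics such as peak signal-to-noise ratio (PSNR) between the original and watermarked signal. The NumPy sketch below, using made-up perturbation strengths, shows how a faint watermark scores far higher than a heavy-handed one; values above roughly 40 dB are generally considered visually imperceptible.

```python
# Small sketch of measuring imperceptibility with PSNR (hypothetical values).
import numpy as np

def psnr(original: np.ndarray, watermarked: np.ndarray, peak: float = 255.0) -> float:
    """PSNR in decibels between two same-shaped signals; higher means less perceptible."""
    mse = np.mean((original.astype(np.float64) - watermarked.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(256, 256, 3)).astype(np.float64)

# A faint watermark perturbation versus a heavy-handed one.
subtle = image + rng.normal(0, 1.0, image.shape)    # roughly +/- 1 gray level
strong = image + rng.normal(0, 12.0, image.shape)   # clearly visible noise

print(f"subtle watermark: {psnr(image, subtle):.1f} dB")   # ~48 dB, imperceptible
print(f"strong watermark: {psnr(image, strong):.1f} dB")   # ~26 dB, visibly degraded
```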
The Need for Industry Buy-in
Generative watermarking will only be effective if all the LLM makers opt in, and so far this is not the case, says John Kirchenbauer, a PhD student in computer science at the University of Maryland, College Park, who is credited with early pioneering work in generative watermarking. (The watermarking technology he developed at the University of Maryland is available in the Hugging Face transformers library, an open-source toolkit for building and sharing state-of-the-art machine learning models.) While Google has open-sourced SynthID, it is unclear which LLM providers are actively using watermarking technology, he says.
One of the first encryption-based approaches to generative watermarking was developed in 2022 by Scott Aaronson for OpenAI, while he was on leave from the University of Texas at Austin. But OpenAI did not deploy it, due to an internal debate. “In trying to decide what to do, OpenAI employees wavered between the startup’s stated commitment to transparency and their desire to attract and retain users,” according to an August 2024 article in The Wall Street Journal.
In April 2023, OpenAI commissioned a survey that showed people worldwide supported the idea of an AI detection tool by a margin of four to one, internal documents show, according to the Journal article. That same month, OpenAI surveyed ChatGPT users and found that 69% believed cheating-detection technology would lead to false accusations of using AI, the article said. Nearly 30% said they would use ChatGPT less if it deployed watermarks and a rival didn’t.
User blowback isn’t the only issue. Some LLM providers fear that the ability to trace content back to their AI model might leave them open to legal liability, Kirchenbauer said in an interview with The Innovator.
Regulatory pressure may be the only way to get all LLM providers to adopt the technology, he says.
Indeed, to be successful, these technologies will need to be accompanied by equally sophisticated governance and use guidelines, the Forum’s report says. It notes that China has moved to regulate generated content by requiring watermarking, and that other regions, such as the European Union, are also developing responses to manage the security and authenticity of digital content.
Contending With Compatibility Issues
It is one thing to pass regulations but quite another to implement them. It is very difficult to enforce a single watermarking standard across all generative AI models such as ChatGPT, says Guo. If every model were forced to adopt the same watermarking method, the scheme would become less secure, since adversaries could study one system and apply the same attack broadly, he says. On the other hand, if different models each use their own watermark, it becomes unclear which model generated a given piece of content, or whether the watermark belongs to one system or another. Moreover, because no watermarking method is fully tamper-proof, even if kept closed-source, attackers could manipulate content to either strip the watermark or create false positives. This creates a risk of mislabeling: a user who never touched ChatGPT could be incorrectly flagged as using it, or conversely, GPT-generated content that has been manipulated could be mislabeled as human-authored, Guo says. “Such compatibility issues highlight the difficulty of making watermarking both secure and fair at scale,” he says.
The Role of Complementary Detection Technologies
While researchers and policymakers work out how to secure generative watermarking, scale it and regulate its use, startups are perfecting ways to detect AI-generated content.
Among them is New York City-based Pangram Labs. Developed by experienced AI researchers with backgrounds at Stanford, Tesla and Google, Pangram claims its AI detector can identify AI writing from ChatGPT, Claude, Gemini, Perplexity and more, with a near-zero false positive rate. It does not detect watermarking. Instead, it searches for patterns that AI injects into writing.
“Watermarking is like locking your front door. It acts as a deterrent but anyone who really wants to get in can,” says Max Spero, Pangram Labs’ CEO. “Detection technologies are a good complement, the equivalent of a motion detector on the inside.”
Clients are often shocked at the results. Take the case of the American Association for Cancer Research (AACR). It used an AI tool developed by Pangram Labs to analyze tens of thousands of research-paper submissions. As reported in Nature, Pangram found that 23% of abstracts in manuscripts and 5% of peer-review reports submitted to its journals in 2024 contained text that was probably generated by LLMs. The publisher also found that less than 25% of authors disclosed their use of AI to prepare manuscripts, despite the publisher mandating disclosure at submission.
When applied to 46,500 abstracts, 46,021 methods sections and 29,544 peer-review comments submitted to 10 AACR journals between 2021 and 2024, Pangram’s tool flagged a rise in suspected AI-generated text in submissions and review reports since the public release of OpenAI’s chatbot, ChatGPT, in November 2022.
The AACR study is an example of the meteoric rise of chatbot use in academia, the media and business, Spero said in an interview with The Innovator.
Pangram’s clients include academic institutions, news organizations and executives at large companies. One of its clients, NewsGuard, a watchdog organization that helps enterprises and consumers identify reliable information online, uses Pangram’s tool to detect whether websites that appear to represent news organizations are in fact AI-generated and using bots to generate tens of thousands of articles to sway opinion.
“There is a lot of demand for this,” says Spero. “Detecting AI is one of the last remaining natural language problems. If you ask an LLM if something is AI-generated they are really, really bad at detecting it.”
The Urgent Need for Comprehensive Governance
The tech sector can’t fix the problem alone. There is a need for a comprehensive governance system for digital content, according to the Dubai Future Foundation’s contribution to the Forum’s report. It predicts that nations and organizations that take the lead in setting watermarking standards will shape the rules of the emerging synthetic media economy and could potentially reshape legal and financial systems.
“Courts might eventually accept watermarked content as evidence in intellectual property disputes and defamation cases, while insurance companies could consider developing tiered coverage models based on content authentication levels,” the Foundation says in the report. “For creators, the ability to verify their work – whether human-made or AI-assisted – may position them to command premium prices in markets increasingly populated with synthetic alternatives.”
Looking ahead, the Foundation says generative watermarking represents not just another tool for content verification, but a potential reimagining of how trust is established in an increasingly synthetic digital landscape. “Progress will depend on collaboration across technology, policy and creative sectors to create systems that balance innovation with appropriate safeguards,” it said in the Forum’s report.
It’s a tall order. Given the exponential growth of AI-generated content, the question is whether such systems can be developed fast enough.