I. Introduction
My deeply profound guiding Principle No. 1 for Responsible AI (“Don’t be scum”) starts not being quite so helpful right around here. My IP head and my IP heart diverge when analyzing generative AI companies’ use of copyrighted works to train their models. This is where law and ethics collide.
Let’s look to our nation’s moral compass–the “Dude with Sign” guy–for guidance. His company FJerry LLC has continued its practice of suing companies for copyright infringement and violation of rights of publicity.1 If you question the propriety of a meme-maker suing people for using his images the way memes are commonly used, you are certainly not alone. But the fundamental right conferred by any intellectual property is the ability to exclude others from using it. The Dude with Sign has successfully developed a widely recognized and valuable brand. If people continue to trade on your brand without permission, what other recourse do you have but to sue? Especially if you’ve already made it clear to the market that you will enforce your rights?
What’s my point? Well, if you think that generative AI providers and implementers who train their models using copyrighted materials are scum, then logic dictates you need to be on Team Dude with Sign on his copyright lawsuit campaign too. The rights and interests of content creators and those of Dude with Sign are fundamentally one and the same.
Let’s start exploring these issues with this first of many blog articles to come on generative AI and copyrights.
A. Principle 2: But where there is gray area….
1. How to Prove Copyright Infringement in Generative AI
Big picture, copyright owners must prove that they own the copyrighted work and that the defendant violated one of their exclusive rights, such as the rights to reproduce and distribute the work and to prepare derivative works based on it.2 To be infringing, an accused work must be “substantially similar” to the copyrighted work.
The featured image above shows President Biden with the “Dude With Sign” guy promoting coronavirus vaccination in August 2021.
2. Current generative AI copyright cases
a. The “substantially similar” requirement is a high bar
Several cases challenging the generative AI use of copyrighted works are pending. The initial returns for creators have not been promising, as shown in two prominent examples: Andersen v. Stability AI Ltd.3 and Silverman v. OpenAI, Inc.4
Establishing “substantial similarity” in generative AI cases is particularly challenging. This should not be a surprise. Unless a prompt instructs the AI to generate “an image similar to X,” the output will not look like X. It will be a composite of numerous inputs, not substantially similar to any one of them.
“Substantial similarity” is even harder to establish with text, which is much easier to plagiarize without proper attribution. Replace a couple of choice words and play with the sentence structure, and you’ve made someone else’s idea your own. This is not necessarily improper. In fact, it happens to at least some degree in most writing, in particular in any sort of research paper.
Generative AI isn’t conceptually doing anything different than any of us are doing when we do research and analysis. Or when we are inspired by musicians or artists when making our own compositions. It just does it better than any human being could in several respects, including the volume of information it “considers” when generating its output.
b. A failure to establish “substantial similarity” is also fatal to indirect copyright infringement claims
For a successful third-party claim against the big fish–AI providers like OpenAI–for any copyright infringement by the AI-Agent providers or AI implementers5 who incorporate, e.g., OpenAI’s ChatGPT into their own service offerings, the plaintiff first needs to establish that direct infringement by the AI implementer and/or its end user took place.
Even if that significant hurdle is cleared, plaintiff also needs to establish that the AI provider:
- knew or had reason to know of the infringing activity of that AI implementer; and
- intentionally induced or materially contributed to that AI implementer’s infringing activity.3
These are also high bars.
c. It’s not how you start, it’s how you finish…
Team Creators has seemingly taken loss after loss in the early stages of this first wave of AI lawsuits. This, however, was entirely to be expected. AI is a new world and raises issues that do not fit neatly within traditional IP and other legal concepts. In some ways, AI turns IP entirely on its head.
Copyright plaintiffs’ counsel are understandably stretching the boundaries of copyright law in creative ways to try to capture the generative AI use of copyrighted works. They are swinging for the fences, as they should, and are not surprisingly striking out with many of these claims.
Some claims may ultimately be allowed to proceed after amendment, and some may be revived on appeal. The minority that survive this process may well sit at the margins of copyright law or even entirely outside it. But that doesn’t mean they can’t ultimately develop into principles that define or move the law in favor of creators.
d. A key battleground will be the propriety of how the training data was collected
With copyright infringement an uphill battle at best, one of the key under-the-radar battlegrounds will be the interpretation of Section 1201(a) of the Digital Millennium Copyright Act (DMCA) of 1998. This is an avenue by which third parties might establish liability based solely on how AI businesses collected their training data.
Section 1201(a) states: “No person shall circumvent a technological measure that effectively controls access to a work protected under this title.”6
There is currently a circuit split as to whether Section 1201(a) “created a new statutory anti-circumvention right distinct from infringement or whether circumvention of an access control is only actionable if it facilitates infringement.”7 The Ninth Circuit has held that Subsections 1201(a)(1) and 1201(a)(2) created “a new form of protection, i.e., the right to prevent circumvention of access controls, broadly to . . . copyrighted works.”8 In contrast, the Federal Circuit has required a showing of a causal link between the circumvention and actual copyright infringement.9
The propriety of AI businesses’ webcrawling activities, which bypass firewalls and ignore copyright protections to collect training data, will be clarified in the coming months and years. Perhaps the Supreme Court will weigh in and resolve the circuit split. More likely, Congress will pass legislation on this issue.
e. Creators, mark your creative works
Creators individually and collectively would have much stronger copyright infringement claims against generative AI companies if they more regularly did one simple thing: mark their creative works with a copyright notice. E.g., © 2023 Jim Ko. That’s all it takes for starters.10
The U.S. Copyright Act of 1976 relaxed, and the Berne Convention Implementation Act of 1988 officially did away with, the requirement of such marking. This brought the U.S. in line with the international Berne Convention of 1886, extending international copyright protection to U.S. creators.11 The intent was to strengthen copyright protections by making protection the default for anything published.
Fast forward to today. We now have a well-entrenched custom of not affixing copyright marking on text and photos we post on the internet. This unfortunately plays right into the hands of AI providers and implementers and their webscraping activities.
i. Getty v. Stability AI watermark claims
Getty Images v. Stability AI illustrates some of the advantages of marking.12 Marking makes available, among other things, a claim under Section 1202(b) of the DMCA, which states: “No person shall, without the authority of the copyright owner or the law … intentionally remove or alter any copyright management information….”13
One of Getty’s claims is that its watermarks were sometimes removed from images used to train the model. Another is that remnants of Getty’s watermarks can even be seen in the AI-generated images themselves.
ii. Explore watermarking and NFTs
Copyright marking mitigates one of AI providers’ and implementers’ “best” defenses: How are we possibly supposed to know or keep track of what on the internet is and is not copyright protected?14 Creators should determine the best way to apply their mark to any content they intend to publish, including through watermarking and nonfungible tokens (NFTs). I will further explore this issue in a future blog article.
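By way of illustration only, here is a minimal sketch of one way a creator might stamp a visible copyright notice onto an image before posting it online, using the Python Pillow library. The filenames and notice text are hypothetical placeholders, and this is a starting point for experimentation rather than a recommended workflow or legal advice.

```python
from PIL import Image, ImageDraw

# Hypothetical example: stamp a visible copyright notice onto an image
# before publishing it online. The notice text and filenames are placeholders.
NOTICE = "© 2023 Jane Creator"

def add_copyright_notice(src_path: str, dst_path: str) -> None:
    img = Image.open(src_path).convert("RGBA")
    overlay = Image.new("RGBA", img.size, (0, 0, 0, 0))
    draw = ImageDraw.Draw(overlay)
    # Place the notice in the lower-left corner over a translucent backdrop
    # so it stays legible on both light and dark images.
    text_pos = (10, img.height - 30)
    bbox = draw.textbbox(text_pos, NOTICE)
    draw.rectangle(bbox, fill=(0, 0, 0, 128))
    draw.text(text_pos, NOTICE, fill=(255, 255, 255, 230))
    Image.alpha_composite(img, overlay).convert("RGB").save(dst_path)

if __name__ == "__main__":
    add_copyright_notice("original.jpg", "marked.jpg")
```

A visible notice like this does not create the copyright itself (protection attaches automatically when the work is created), but it makes ownership easier to assert and harder for webscrapers to plausibly ignore.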
II. Conclusion
There really is little guidance–legal, moral, or otherwise–that I could meaningfully presume to provide AI providers or implementers in the face of such uncertainty in the law and ethics surrounding the generative AI use of copyrighted works. One obvious recommendation is that AI implementers should seek indemnification from their AI providers against third-party copyright infringement claims, as more and more have started doing.15 Another is to monitor developments in the legislature and the courts and try to stay one step ahead in your company’s AI policies. Or hire experienced counsel who can do this for you.
Copyright law, as currently drafted, may simply not adequately protect the interests of creators, either individually or as a whole. We will gain clarity in the years to come as this first wave of AI copyright cases winds its way through the courts.
The flip side is that copyright law and the law in general16 may impose restrictions on U.S. AI companies that put us at a competitive disadvantage compared to the rest of the world, China in particular.
I presume big tech is hedging its bets and pushing for legislation to “clarify” the law in its favor with respect to copyright issues, the DMCA, and other areas of law that impact generative AI companies. Creators are assuredly doing the same, with more limited resources but with strength in numbers. It will be fascinating to see how this all plays out.
© 2023 Ko IP & AI Law PLLC
Come back in the New Year for new blog articles on the 2d and 4th Mondays of every month.
Happy Holidays and New Year everyone!
- See Kyle Jahner, Seltzer Company Ripped off DudeWithSign Instagram Pic, Suit Says, Bloomberg Law (Dec. 14, 2023), available here; FJerry, LLC v. Neatly Spiked LLC, Case No. 1:23-cv-10821 (S.D.N.Y. filed Dec. 13, 2023); Katie Notopoulos, The “Dude with Sign” Instagram Account Sure Does Sue A Lot Of Brands, BuzzFeed.News (Apr. 7, 2023), available here. ↩︎
- See 17 U.S.C. Sect. 106 (Exclusive rights in copyrighted works); How to Prove Copyright Infringement, Copyright Alliance, available here. ↩︎
- See Blake Brittain, US judge finds flaws in artists’ lawsuit against AI companies (July 19, 2023), available here (discussing U.S. District Judge William Orrick’s comments during a motion to dismiss hearing in Andersen v. Stability AI Ltd., Case No. 3:23-cv-00201 (N.D. Cal. 2023)). ↩︎
- See Joshua Benton, The legal framework for AI is being built in real time, and a ruling in the Sarah Silverman case should give publishers pause, NiemanLab (Nov. 27, 2023), available here (discussing U.S. District Judge Vince Chhabria’s motion to dismiss ruling in Silverman v. OpenAI, Inc., Case No. 3:23-cv-03416 (N.D. Cal. 2023)). ↩︎
- [Updated Nov. 2024] For definitions of “AI providers” (including “LLM providers” and “AI-Agent providers”) and “AI implementers” (including those incorporating AI into one’s products and services and those implementing AI into one’s internal business processes) as used here and throughout this blog, see 11/13/23 blog article (“…and without implementing AI successfully, you will be replaced“). For purposes of this discussion, the term “AI implementer” also includes “AI-Agent providers” with respect to their relationships with their “LLM providers.” ↩︎
- 17 U.S.C. Sect. 1201(a). ↩︎
- Ian C. Ballon, E-Commerce & Internet Law, Vol. 1, Ch. 5 Data Scraping, Database Protection, and the Use of Bots and Artificial Intelligence to Gather Content and Information, at Sect. 5.07[1] DMCA Anti-Circumvention Provisions. ↩︎
- MDY Industries, LLC v. Blizzard Entertainment, Inc., 629 F.3d 928, 945 (9th Cir. 2011). ↩︎
- See Storage Tech. Corp. v. Custom Hardware Engineering & Consulting, Inc., 421 F.3d 1307, 1318-19 (Fed. Cir. 2005) (applying First Circuit law); Chamberlain Group, Inc. v. Skylink Technologies, Inc., 381 F.3d 1178, 1192-1203 (Fed. Cir. 2004) (applying Seventh Circuit law). ↩︎
- For specific guidance, see Copyright Notice, U.S. Copyright Office, available here. ↩︎
- See Summary of the Berne Convention for the Protection of Literary and Artistic Works (1886), World Intellectual Property Organization, available here. ↩︎
- Case No. 1:23-cv-00135 (D. Del. 2023). ↩︎
- 17 U.S.C. Sect. 1202(b). ↩︎
- Of course, the real answer is that they could simply set their own AI tools–which are perfect for this exact type of challenge–to the task. But that’s neither here nor there. ↩︎
- See Kyle Wiggers, OpenAI promises to defend business customers against copyright claims, TechCrunch, available here. But buyer beware, the exceptions can easily swallow the rule when it comes to indemnification terms. ↩︎
- We’re looking at you, U.S. patent law…. ↩︎