Your guide to responsible AI contracting. Part 1: Don’t be scum.

I. What is “responsible AI”?

Most people would agree we should implement AI responsibly and appropriately balance the interests of each stakeholder and society. But AI may be the most disruptive innovation in history, promising to increase overall productivity and displace workers to an unprecedented degree. Figuring out the proper balance is not easy. “Responsible AI contracting” is a prerequisite.

This issue was by all accounts at the heart of this past week’s failed coup attempt at AI’s present avatar, OpenAI of ChatGPT fame. It is safe to assume both sides lay claim to embodying “responsible AI.” They simply can’t both be right.

Whatever the heck “responsible AI” is, the contractual arrangements, terms of use, and data protection and privacy policies of and between generative AI providers, implementers, and end-users¹ should reflect such a balance. And governments should set up a system of laws and regulations that incentivizes “responsible” behaviors maintaining this balance and disincentivizes those that disrupt it.

This is the first in an ongoing series on responsible AI contracting, in which I will:

lay out the key ethical and legal issues raised by generative AI,
assess how they are addressed by current AI providers’ terms of service, privacy policies, and acceptable use policies, and
provide guidance on how you as an AI provider, implementer, or end user can use “responsible AI” to drive your contract negotiations to secure a closer approximation to the protections you deserve, at least on principled grounds.

Let’s start with the lowest hanging fruit and then go from there. This article will focus on my first Principle of Responsible AI: Don’t be scum. We’ll turn our immediate attention to the interests of third parties with whom you don’t have a contractual relationship.

II. Principle No. 1: Don’t be scum

A. Rule 1(a): Don’t bypass paywalls and copy

1. The OpenAI-Microsoft Bing Search paywall case study

In May, OpenAI provided one of our seminal “Oh shit, we didn’t know that would happen” moments in AI history. OpenAI beta launched ChatGPT Browse, a Microsoft Bing-based search engine feature. This launch incorporated ChatGPT 4.0, trained on a broader and more up-to-date dataset than earlier versions.²

Users discovered they could use Bing Search to bypass website paywalls. If you asked for the text of an article for which access required a paid subscription, ChatGPT dutifully obliged.

OpenAI disabled Bing Search in July to “do right by content owners.” It addressed this issue and then fully launched Bing Search in September.

a. not-so-OpenAI at work

“Responsible AI contracting” in action, right? Not in the least. OpenAI knew damn well that ChatGPT was already bypassing paywalls to deconstruct copyrighted information into its component parts and use them to train its chatbot capabilities. Its entire business model is predicated on doing exactly this and getting away with it.³ The volume of training data is king in the generative AI competitive world: more data enables more accurate and powerful generative AI chatbots. Any AI provider that cuts itself off from any volume of relevant (and probably higher-quality) available data will be left behind.

OpenAI’s “mistake” was just that it didn’t realize that this capability could be co-opted by its Bing Search users to steal component parts that were big enough to be recognizable as being from the original copyrighted articles, images, etc. That left OpenAI’s hands unequivocally caught in the cookie jar for contributory copyright infringement liability in violation of the Digital Millennium Copyright Act Sect. 1201(a).⁴

Self-schedule a free 20-min. video or phone consult with Jim W. Ko of Ko IP and AI Law PLLC here.

b. Why even AI providers respect website paywalls

Respecting website paywalls and preventing wholesale copying is the closest thing we will find to unanimous agreement.⁵ Enough to even overcome the stock position of manufacturers and service providers everywhere that they shouldn’t be held responsible for the illegal activities of their users.

This paywall issue has a number of unique things going for it in this AI arena:

The reproduction of such paywalled information is in clear violation of the terms of use of any website, copyright law, and possibly the Computer Fraud and Abuse Act of 1986. Good fences make good neighbors.
There was a feasible technological solution to the issue, as reflected in subsequent documentation OpenAI released:
- OpenAI can filter its GPTBot webcrawler searches to remove sources that require paywall access to at least some degree.
- In addition, OpenAI (and presumably all generative AI providers) can make its GPTBot webcrawlers transparent, such that website hosts know when and from where its bots are crawling through their sites. OpenAI issued guidance to all websites on how to specify what parts they want to close off from its GPTBot webcrawlers.⁶
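
To make the second point concrete, here is a minimal sketch of how a website host might use a robots.txt file to close off part of a site from GPTBot, and how to check the result. The `/premium/` path, the example.com URLs, the `SomeOtherBot` name, and the `crawler_may_fetch` helper are all hypothetical and chosen only for illustration; the directives are standard robots.txt syntax, and the check uses Python’s built-in `urllib.robotparser`. Keep in mind that robots.txt is a request, not a lock: it only works if the crawler chooses to honor it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a news site that keeps subscriber content
# under /premium/. The path is an assumption made for this example only.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /premium/

User-agent: *
Allow: /
"""


def crawler_may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent, url in [
        ("GPTBot", "https://example.com/premium/article-1"),
        ("GPTBot", "https://example.com/free/post"),
        ("SomeOtherBot", "https://example.com/premium/article-1"),
    ]:
        print(agent, url, crawler_may_fetch(ROBOTS_TXT, agent, url))
```

Under these rules a compliant GPTBot is turned away from `/premium/` but can still read the rest of the site, while every other crawler is untouched, because the rule names only GPTBot.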

Nonetheless, OpenAI is relatively if not completely unique on this front. Most if not all other AI providers have not given third-party website hosts such an avenue to block their webcrawlers.

Furthermore, any such universal respect for website paywalls is simply not reflected within the policies of AI providers.

2. How current AI providers’ customer policies address paywall bypassing

They don’t. At least not directly. In my survey of ~12 generative-AI providers’ policies, none mention paywalls or the concept of their webcrawlers bypassing them.

This should not come as a surprise. As noted, the generative AI provider’s business model is predicated on bypassing paywalls.

Given OpenAI’s recent Bing Search history discussed above, its policies are most informative on this issue. It still accepts no responsibility for ChatGPT bypassing paywalls with respect to any customer that uses its free platforms. OpenAI’s governing Terms of Use require its users to “indemnify and hold harmless” OpenAI “from and against any costs, losses, liabilities, and expenses (including attorneys’ fees) from third party claims arising or relating to use” of ChatGPT.⁷ OpenAI disclaims any warranties that it doesn’t expressly assume, and respect for paywalls ain’t one of them.⁸

Doesn’t seem particularly “responsible,” but in fairness, OpenAI is in good company. In fact, OpenAI’s policy is currently one of the better ones on this issue.

The closest AI providers get is to have IP indemnification policies stating they will defend their customers against third-party claims. OpenAI, Google, Adobe, Microsoft Copilot, and IBM offer IP indemnification terms, but only in their contract terms for paid services.⁹ This in effect covers paid AI implementers and end users from liability for at least unwitting paywall hopping which they did not contribute to.¹⁰

In contrast, several AI providers, including Cohere (Generate) and Anthropic (Claude), do not provide any indemnification for IP infringement whatsoever. They flip it entirely, imposing broad indemnification obligations on their AI implementers and end-users to protect Cohere and Anthropic from all third party IP claims.¹¹

3. Negotiating paywall bypassing contract terms on principled grounds

In general, a provider should indemnify its customers from liabilities stemming from its products or services that are entirely within its control and/or from liabilities that are entirely outside of the control of its customers. Bypassing paywalls falls squarely under this category, at least when end users are not intentionally trying to do so when typing their prompts into the AI provider’s platform.

Negotiating in such terms may simply not be possible, because leverage is everything in contract negotiation. But at a minimum, you should renegotiate any portion of your AI provider’s policies and standard contract terms that assign such liabilities to you. E.g., with Cohere and Anthropic, you should carve out liability for unintentional paywall bypassing that is baked into their platforms.

You may not get everything you want, but you will get closer to what you deserve with responsible AI contracting as your guide.

*Note: Nothing in this blog constitutes legal advice or the formation of any attorney-client relationship. See Disclaimers.

III. More responsible AI contracting to come…

My goal for this series is to present everything you need to know to negotiate AI contracts on principled grounds. Responsible AI contracting requires having an appropriate legal, technological, historical, and societal context for these issues. I will attempt to capture the competing interests of all stakeholders and gather all the information necessary for you to fully evaluate “What is Responsible AI?”

If I do this right, this will be the best stuff out there on AI and this paradigm shift of our times.

If you have started to look to this blog as a go-to resource on any of the above, then please:

subscribe below, and
spread the word to any of your friends and colleagues who might be interested.

I will post a new article every Monday around 1 PM Pacific.

Come back next Monday for the next article in this series:

Your guide to responsible AI contracting.

Part 2: My body and identity, my choice

[updated Nov. 2024] For definitions of “AI providers” (including “LLM providers” and “AI-Agent providers”) and “AI implementers” (including those incorporating AI into one’s products and services and those implementing AI into one’s internal business processes) as used here and throughout this blog, see 11/13/23 blog article (“…and without implementing AI successfully, you will be replaced”). ↩︎
Per OpenAI’s September 27 tweet, ChatGPT 4.0 can now browse the internet real time and “is no longer limited to data before September 2021.” ↩︎
This statement may not have been entirely accurate before last week’s failed coup at OpenAI. OpenAI’s founding principles were to practice transparency including with respect to how its generative AI works and to keep generative AI out of the clutches of big tech. It was uniquely formed as a nonprofit, later forming a for-profit subsidiary as a necessary evil to solicit the massive investment (most notably from Microsoft) required to build the infrastructure and hire the AI programmers necessary to develop a competitive generative AI. While we certainly don’t know all the details, it appears clear that the then majority “transparency camp” in OpenAI’s board resolved to terminate OpenAI’s CEO Sam Altman who had come to champion the capitalistic drive of its for-profit subsidiary. Team Altman won, with a populist revolt of its AI programmers who threatened to walk out with Altman and into the clutches of Microsoft.

So perhaps OpenAI’s entire business model before last week was not in fact entirely predicated on secretly training on copyrighted information and getting away with it. But it is now! ↩︎
17 U.S.C. Sect. 1201(a)(1)(A) (“No person shall circumvent a technological measure that effectively controls access to a work protected under this title.”). ↩︎
But even here, there is room for some principled dissent. Social media and generative AI already test the marketplace of ideas theory so fundamental to the health of a democracy with the proliferation of fake news generated with the specific intent of presenting it as real. When the major news outlets tie up any body of news and opinion pieces reported and written by verified reporters and columnists behind paywalls, there is less “real news” available to stem the tide of unlimited “fake news” that AI can and will be used to generate. I personally do not believe this particular greater good should trump the rights of creators to their own work product and support the hopping of paywalls. But it is worth noting the logical tie-ins between this dissent and the broader challenges of combatting “algorithmic discrimination.” ↩︎
This was, however, more of a one-off workaround than a solution to the issue. All generative AIs send their webcrawlers to comb the internet and all of them access paywalled information. The OpenAI/ChatGPT solution only works to block GPTBot. A broader but still ad hoc solution would need to be jointly developed by all generative AI providers who opt in. Any universal, compulsory solution would need to come from our federal legislature and ultimately from global cooperation between governments. ↩︎
OpenAI’s Terms of Use, updated Nov. 14, 2023, available here. ↩︎
See id. at “Disclaimer of Warranties.” ↩︎
OpenAI’s IP Indemnification terms are only found in its Business Terms, directed to its ChatGPT Enterprise Users (updated Nov. 14, 2023, available here). Google’s IP Indemnification terms are found in Google’s Cloud Platform Terms Of Service (last modified November 16, 2023, available here) (excluding any “Services provided to Customer free of charge” in Sect. 13.3(c)). A summary of Adobe Firefly’s IP indemnification terms for its enterprise customers is found in Firefly Legal FAQs – Enterprise Customers (Sept. 13, 2023, available here). A summary of Microsoft Copilot’s IP indemnification plans can be found in its Introducing the Microsoft Copilot Copyright Commitment announcement (September 7, 2023, available here) (stating the commitment applies to “paid versions of Microsoft commercial Copilot services and Bing Chat Enterprise”). A summary of IBM’s watsonx models’ IP indemnification plans can be found in its press release (Sept. 28, 2023, available here).

ScribeAI tries to in effect do the same, offering IP indemnification for all customers, but imposing a limit to its liability to the amount paid. See Scribe’s Terms of Use (last modified April 7, 2020, available here) (see Sect. 8 Indemnification and Sect. 10. Limitation of Liability). This is standard practice with indemnification terms. ↩︎
Each has additional limitations that, if not met, exclude the application of any IP indemnification terms. These are drafted with varying levels of opaqueness, which I will dig into in a future article (I’m looking at you, OpenAI and Amazon…) ↩︎
See Cohere’s Terms of Use (undated, last accessed Nov. 27, 2023, available here) (see Sect. 21 Indemnification); Anthropic’s Terms of Service (ver. 4.0, effective Sept. 6, 2023, available here) (see Sect. 12. Disclaimer of Warranties, Limitations of Liability, and Indemnity). Both Cohere and Anthropic put the entire burden of any IP infringement by third-party users on the AI implementers who incorporate their platforms into their service offerings. They take zero responsibility for any such third party IP infringement claims. ↩︎
