Any discussion of “responsible AI” and your use of AI customer data simply cannot start with the pretense: “It’s currently not illegal, so we’re good!” As noted in our discussion of deepfake pornography last week, the law simply hasn’t caught up with the technology. Perhaps the same can be said for data privacy issues. But it’s also harder to catch up here in the U.S. when we seem to be moving sideways at best.
There is no comprehensive federal data privacy law in the U.S.
In the U.S., there is no constitutional right to data privacy. Nor is there a comprehensive federal law on data privacy.
In Europe, the rights to privacy and data protection are both considered fundamental rights.1 Under Europe’s comprehensive privacy law, the General Data Protection Regulation (GDPR), companies have to ask individuals for permission to share their data, and individuals have the right to access, delete, or control the use of that data.2
In the U.S., the courts determine an individual’s data privacy rights and an organization’s ability to use her personal information on a case-by-case basis. The courts factor in a hodge-podge of:
- various federal laws regulating finance, health, communications, consumer protection, law enforcement, workplace privacy, student privacy, and more
- applicable law in the individual’s home state3
- applicable industry standards, and
- applicable contractual requirements.
Key responsibilities for customer AI data custodians: transparency and security
As such, U.S. companies can by default sell their customers’ private information unless a law or court specifies otherwise. This is common knowledge. But most consumers simply do not have a full understanding of the implications and how this can harm them. “What’s currently unclear for many consumers is the complex and indirect ways that companies go about monetizing through tracking, bundling, and profiling their personal information and behavior in order to further influence parents’, kids’, and consumers’ behavior.”4
It would be both irresponsible and, in most cases, bad business for U.S. companies to sell their customers’ private information. Certainly not without consent.
Your key responsibilities as custodian of your AI customer data are to be transparent and take “reasonable measures” to secure it.
Note to AI Agent providers and AI implementers: this is not like internet paywall bypassing and deepfake pornography (discussed in Parts 1 and 2 of this series). The duty to your AI customers and their data is primarily if not entirely yours (not your LLM provider’s).5
I. Principle No. 1: Don’t be scum, cont’d
A. Rule 1(c): Be transparent about what you will do with your AI customer data
If your AI implementer or end-user customers are good with you selling or sharing the data they provide to you with third parties, then:
- tell them what you plan to do (and not do) with their data
- secure their consent, and
- go forth and prosper.
1. Facebook: Do as I say, don’t do as I do…
Not surprisingly, you shouldn’t say you will keep your customers’ data secure and then renege and sell it. Nor should you give assurances regarding your security measures and then not live up to them.
Facebook did all of the above in multiple ways, in specific violation of an earlier 2012 settlement it had made with the Federal Trade Commission.6 Facebook “repeatedly misrepresented the extent to which users could control the privacy of their data.”
To settle, Facebook agreed in 2019 to implement a new privacy structure and to submit to additional FTC monitoring. It further agreed to a record $5 billion penalty.
As an AI business, you should determine whether you plan to use any of your AI implementer or end-user customers’ data to train your AI models, just as you should determine whether you plan to sell or share any of that data. Both purposes are unrelated to the delivery of your products or services.
It’s fine if you do. Just be transparent about it, most obviously by complying with any disclosure obligations.
2. It’s not “transparent” if you need a lawyer to understand the basics
a. How not in control of my data am I, exactly…?
There are three key questions that consumers should care about regarding the data they provide to an AI provider or implementer:
1. Will you sell or share my data with third parties unrelated to the delivery of your AI services?
2. Will you use my data to further train your AI models?
3. If yes to either 1 or 2, will you remove all my personal identifying information (PII) from the data beforehand?
For my part, I’d prefer to know the answers (AI providers are simply not forthcoming regarding no. 1 in particular) and, better yet, have the ability to opt out of 1 and 2.
But I would generally consent regardless so long as the data:
- is “disaggregated”/”de-identified” from me (i.e., the information within cannot be connected back specifically to me); and
- does not include my financial account information
b. When “legal disclosure” and practical reality collide…
I don’t, however, have time to pore over a company’s privacy policy to figure this all out before every purchase. Nor do I have the time to figure out all the ways an app will unnecessarily access the other information on my phone. And if an IP lawyer like me can’t manage this, it’s safe to say that 99.9% of other people can’t either. For most, even if they had the time, they wouldn’t have the ability to sufficiently comprehend the ever-finer print.
Some companies do better than others here, in particular with the “acknowledgment” boxes that must be checked before proceeding. Most, however, fail miserably.
There is no reason that companies couldn’t provide this basic information in plain English if they were either required or inclined to do so. All companies based or doing business in Europe are doing it, as required by the GDPR. All larger companies based or doing business in California are already doing it, as required by the CCPA.7
California has by far the strongest consumer-privacy protections in the U.S.
B. Rule 1(d): Take “reasonable measures” to secure your AI customer data
What would you do if a regulator like the Federal Trade Commission started an enforcement action against you for a data breach of your AI inputs or outputs? Or if your customer filed a private claim against you for the same?
Your primary defense will be to establish that your security efforts were “reasonable.” The level of security required depends on several factors, including:
- the type of information at issue (e.g., social security numbers v. nonpublic but not particularly sensitive business information)
- the type of organization, and
- the level of resources your organization has available to it
Guidance on this issue provided by various regulators is general, not specific. And it is not legally binding in any event.
Courts look to industry customs to inform a reasonable security measures analysis. And “in some instances, legislatures and regulatory agencies have already identified particular security measures or ‘controls’ to be worth the cost of implementation and have required them.”8
Some of the privacy measures the FTC imposed on Facebook should apply to AI companies of all sizes, including:
- encrypt passwords (and let’s go ahead and add social security, credit card, and bank account numbers, OK? A minimal sketch follows this list.)
- don’t use phone numbers provided for two-factor authentication for advertising purposes
- don’t retain on your servers personal information that users have deleted
- don’t give your employees free access to your customer information.9
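For the password item, “encrypting” in practice means storing passwords as salted, slowly computed hashes rather than in plaintext or in any reversible form. The sketch below, using only Python’s standard library, is an illustration of that idea; the function names and iteration count are my own choices, not anything the FTC order prescribes.

```python
# Illustrative sketch only: store passwords as salted PBKDF2 hashes so that a
# database breach does not expose the passwords themselves.
import hashlib
import hmac
import os

def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest) for storage; never store the raw password."""
    salt = os.urandom(16)  # unique random salt per user
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 600_000)
    return salt, digest

def verify_password(password: str, salt: bytes, digest: bytes) -> bool:
    """Re-derive the digest from the stored salt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 600_000)
    return hmac.compare_digest(candidate, digest)
```

Social security, credit card, and bank account numbers, by contrast, usually need to be recoverable, so they call for encryption with a properly managed key rather than hashing (see the sketch under Rule 1(e) below).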
C. Rule 1(e): Fess up if you have to.
There will be federal and state laws and regulations to come mandating what you must do in response to a data breach impacting your AI customers’ data.
Until then, just be aware that each state has its own data breach response requirements. Notification requirements vary state by state for: 1.) regulators; 2.) credit/consumer reporting agencies; and 3.) impacted individuals.10
One universal takeaway is that you should keep your customers’ sensitive information encrypted. Virtually all states exempt you from any notification requirements in the event of a data breach when you do.11
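As a rough illustration of encryption at rest, here is a minimal sketch using the third-party Python cryptography package (an assumption on my part; any comparable library or a managed database feature would do). The field being encrypted is hypothetical, and in production the key would live in a key-management service rather than in application code.

```python
# Illustrative sketch only: symmetric encryption of a sensitive customer field
# at rest, using the third-party "cryptography" package (pip install cryptography).
from cryptography.fernet import Fernet

key = Fernet.generate_key()  # in practice, fetched from a key-management service
fernet = Fernet(key)

ciphertext = fernet.encrypt(b"123-45-6789")           # e.g., a social security number
assert fernet.decrypt(ciphertext) == b"123-45-6789"   # recoverable only with the key
```

The design point is that the key, not the algorithm, is what ultimately protects the data, so keep it away from the systems that store the ciphertext.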
D. How current AI provider customer policies address use of customer data
1. For businesses
The market standard for data privacy of business proprietary data is being set as we speak.
a. AI provider enterprise service offerings from the big guns
AI providers, in particular the larger ones, will often agree not to share, sell, or use the data their enterprise customers provide for unrelated purposes.
The reason is simple. All businesses need to protect their confidential, proprietary information. This force is significant enough to even trump AI providers’ insatiable need for data to train their models. AI providers simply would not get business from the corporate world without such protections.
Both Google Workspace and Microsoft Copilot have clear policy statements or plans to isolate and protect their customers’ data from use in the training of their AI.12 Similarly, OpenAI “do[es] not use content submitted by customers to [its] business offerings such as [its] API and ChatGPT Enterprise to improve model performance.”13
Some AI providers, however, exclude commercial customer data from model training only if the customer opts out.14
b. AI provider or implementer service offerings for businesses
The non-behemoth AI Agent providers and AI implementers, however, take a wide variety of approaches on this front. Their AI service terms and conditions and privacy policies tend to be the same for their business customers as for their consumer customers, as presented below.
2. For consumers
a. Will you sell or share my data with third parties or use it to train your AI models?
A minority of AI providers affirmatively state they will not sell or share their customer data without consent.
Many do sell or share their customers’ data, but few are particularly transparent about this. E.g., “This Policy places no limitations on our use or sharing of Aggregate/De-Identified Information.”15 Translation: “Yeah, we’re totally selling your data, but by disclosing this, you can’t sue us for lying about it even though we’re kinda hiding it.”
b. Will you de-identify my data?
The best practice for AI providers is to “further take steps to reduce the amount of personal information in [] training datasets before they are used to improve [the AI provider’s] models.”16 This is in my estimation where the rubber will hit the road in terms of any government regulatory efforts to address the problem of AI and personal data privacy.
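To make that concrete, here is a deliberately crude, hypothetical sketch in Python of a pre-training scrubbing pass. It is not any provider’s actual pipeline, and pattern matching like this is exactly the sort of “reasonable effort” that misses plenty, which is why the privacy-enhancing technologies discussed next matter.

```python
# Illustrative sketch only: a naive redaction pass over text destined for a
# training set. Real de-identification pipelines go far beyond regexes.
import re

PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b(?:\+1[ .-]?)?\(?\d{3}\)?[ .-]?\d{3}[ .-]?\d{4}\b"),
}

def redact_pii(text: str) -> str:
    """Replace obvious PII patterns with labeled placeholders."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label} REDACTED]", text)
    return text

print(redact_pii("Reach Jane at jane@example.com or 555-867-5309, SSN 123-45-6789."))
# -> Reach Jane at [EMAIL REDACTED] or [PHONE REDACTED], SSN [SSN REDACTED].
```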
The Biden executive order on AI references the need to develop and implement “privacy-enhancing technologies” (PETs), but it references imposing them only on data collected and stored by federal agencies. If mitigating against the inexorable spread of personal data is the goal, however, it is the AI providers and implementers that must implement such PETs into their own generative AI processes.
In the absence of laws, regulations, and standards-setting for PETs, there will be no accountability on this issue. Just more aspirational statements, along the lines of: “We’re going to try our best to de-identify your information when we sell it to a third party for reasons unrelated to our services to you. But if we fail to do so and you become a victim of identity theft because of it, then oh well.”
c. Opt out or opt in?
AI providers give their users the ability to opt out of having their prompts and output used to train the AI. This aligns with the general practice of requiring users to opt out of having their data sold to third parties.
Some advocate for these defaults to be flipped.
The key distinction for me remains whether or not the data is “de-identified” from my name. But so-called “reasonable efforts” to de-identify are not sufficient here.
“Actually de-identified” should be the standard on this specific issue. The AI provider is, in effect, running a side hustle off of my data. The AI provider has control over how it collects and processes its data. And the AI provider has the (AI) technological means to screen it. The standard should be higher.
If my personal data is actually de-identified at a minimum as of the time that it is sold, shared, or used, then I would generally have no issue with an opt-out regime. Examples of AI providers that specify that they will use or share only aggregate/de-identified information include Scribe17 and Anthropic.18 But most build in considerable wiggle room in their verbiage.
But if not, then responsible AI would require that it be opt-in. People should have to affirmatively provide clear, informed consent for their data to be sold or shared with third parties for unrelated purposes, especially if there is any risk that sensitive information will be connected back to them.
*Note: Nothing in this blog constitutes legal advice or the formation of any attorney-client relationship. See Disclaimers.
II. Conclusion
It will be interesting to see how the politics on these data privacy issues develop in our new AI age.
AI is the poison. It greatly heightens the data privacy risks involved. Any private information that is posted publicly even for an instant may never become private again. AI will find it.
AI is also the closest approximation to a cure that we have. In principle, AI can be used to find all personal identifying information and remove it (at least in a given data repository at a given moment in time). The devil as always will be in the details.
Figuring out what other AI providers are doing and what the applicable federal and state laws are should be your starting point for mitigating AI data security risks. Figuring out where the law is going should be your target.
Find experienced counsel who actively tracks all of this. Someone who can help you develop and implement comprehensive AI and cybersecurity policies now. This will save you a lot of heartache down the road.
III. Where we’ve come from and where we’re going
We’ve completed our journey through the responsible AI no-brainers over the past three weeks. We’ve covered internet paywalls, deepfake pornography, and customer data privacy policies.
In the coming weeks, we’ll be tackling the tougher-to-call topics of:
- AI provider responsibilities when using copyrighted works to train their generative AI models,
- political ad deepfakes, and
- AI provider responsibilities when collecting third-party private information that is publicly available, but shouldn’t be.
© 2023 Ko IP & AI Law PLLC
Come back next Monday for the next article in this series:
Your guide to responsible AI contracts.
Part 4: The law and ethics of generative AI use of copyrighted works
1. European Convention on Human Rights, art. 8 (Right to respect for private and family life); European Charter of Fundamental Rights, arts. 7 & 8 (Protection of Personal Data. “1. Everyone has the right to the protection of personal data concerning him or her. 2. Such data must be processed fairly for specified purposes and on the basis of the consent of the person concerned or some other legitimate basis laid down by law. Everyone has the right of access to data which has been collected concerning him or her, and the right to have it rectified. 3. Compliance with these rules shall be subject to control by an independent authority.”). ↩︎
2. General Data Protection Regulation (GDPR), art. 5 (Principles relating to processing of personal data), available here. ↩︎
3. The state of residence is particularly important if it is California, which has developed its own comprehensive data privacy law similar to Europe’s GDPR. The California Consumer Privacy Act (CCPA) provides strong consumer-privacy-friendly mechanisms such as: 1.) a “global opt out,” by which residents can set their internet browsers to automatically notify every website that the user wishes to opt out of the sale of their personal data or its use for targeted advertising (six other states’ privacy laws require this too, but only California put some more teeth behind it–see Samuel Adams and Stacey Gray, Survey of Current Universal Opt-Out Mechanisms, Future of Privacy Forum (Oct. 12, 2023), available here); and 2.) a private right of action by which individuals can sue companies directly for any violations (California stands alone here–see Comparing US state-level data privacy laws, usercentrics, available here). Most states do not have a comprehensive data privacy law in place or in the works, and the majority of those that do are not so consumer-privacy friendly, including Virginia. For a website tracking U.S. state privacy legislation, see The International Association of Privacy Professionals (IAPP) here. For some critical commentary, see Todd Feathers, Big Tech is Pushing States to Pass Privacy Laws, and Yes, You Should Be Suspicious, The Markup (Apr. 15, 2021), available here.
The private right of action is a particularly hot-button issue for the development of any federal or state comprehensive data privacy statute. The consumer-privacy side of the argument is that, in the absence of such a right to sue, individuals have to rely entirely on federal law enforcement or state attorneys general (if there is an applicable federal or state law) or federal or state regulators (again, if applicable) to protect their privacy interests, and that is more miss than hit. The business side of the argument is that private rights of action will be abused by individuals and their attorneys, imposing disproportionate and potentially debilitating litigation costs on businesses.
Some of the existing applicable federal laws specifically provide for private rights of action, most notably the Fair Credit Reporting Act, which permits consumers to recover actual damages from “any person who is negligent in failing to comply with a [credit reporting] requirement” and punitive damages for willful violations. 15 U.S.C. §§ 1681n–1681o (1996), as amended by the Fair and Accurate Credit Transactions Act in 2003. See also Telephone Consumer Protection Act, 47 U.S.C. § 227 (1991) (including a private right of action for “actual monetary loss or $500 per telemarketing violation, whichever is greater,” and up to treble damages for willful violations). ↩︎
4. Jeff G., A Majority of Apps Are About to Come Clean and Say They’ve Been Selling Your Data All Along, common sense education (Mar. 29, 2022), available here. For a comprehensive study, see 2021 State of Kids’ Privacy, common sense education (2021), available here. ↩︎
5. [Updated Nov. 2024] For definitions of “AI providers” (including “LLM providers” and “AI-Agent providers”) and “AI implementers” (including those incorporating AI into one’s products and services and those implementing AI into one’s internal business processes) as used here and throughout this blog, see 11/13/23 blog article (“…and without implementing AI successfully, you will be replaced”). ↩︎
6. To be precise, Facebook made customer data available to other parties either in exchange for more data or as payment to, e.g., Facebook app developers. See, e.g., Alexis C. Madrigal, Facebook Didn’t Sell Your Data; It Gave It Away, The Atlantic (Dec. 19, 2018), available here. You can judge for yourself whether Zuckerberg had his fingers crossed when he claimed in his infamous Wall Street Journal op-ed in response to the Cambridge Analytica scandal, “We don’t sell people’s data, even though it’s often reported that we do.”
In the Cambridge Analytica scandal itself, the British consulting firm paid for data harvested by a third party that developed an app using an Application Programming Interface (API) that Facebook made available. Through this Facebook API, the data of 87 million Facebook users was accessed, including public and private information. For a discussion of the inadequate security measures Facebook put in place for this API, see Ronnie Mitra, How the facebook API led to the Cambridge Analytica Fiasco, APIacademy (June 15, 2018), available here. ↩︎
7. The CCPA applies to for-profit businesses that do business in California and meet any of the following:
— Have a gross annual revenue of over $25 million;
— Buy, sell, or share the personal information of 100,000 or more California residents, households, or devices; or
— Derive 50% or more of their annual revenue from selling California residents’ personal information. ↩︎
8. The Sedona Conference, Commentary on a Reasonable Security Test, 22 Sedona Conf. J. 345, 358 (2021). ↩︎
9. See Andrew Morse and Queenie Wong, Facebook-FTC settlement: What you need to know about the $5 billion deal, CNET, available here. ↩︎
10. For a summary of this information and how to develop a data breach incident response plan, see The Sedona Conference, Incident Response Guide, 21 Sedona Conf. J. 125 (2020). ↩︎
11. Id. at 182-83. ↩︎
12. See How we’re protecting your Google Workspace data in the era of generative AI, Google (“Your data is your data,” “Your data stays in Workspace,” “Your content is not used for ads targeting,” “Your interactions with Duet AI stay within your organization,” “Your content is not used for any other customers,” etc.), available here. See Our vision to bring Microsoft Copilot to everyone, and more, Microsoft Bing Blogs (Nov. 15, 2023) (“With Copilot’s commercial data protection, prompts and responses are not saved, Microsoft has no eyes-on access to it, and it’s not used to train the underlying models.”), available here; see also Data, Privacy, and Security for Microsoft Copilot for Microsoft 365, Microsoft (Dec. 5, 2023), available here. Naturally this only applies to your data that you keep within the Google or Microsoft platforms and to which you apply the required security settings. ↩︎
13. See Michael Schade, Data usage for consumer services FAQ, OpenAI, available here. ↩︎
14. See, e.g., Cohere Data Usage Policy, Cohere (last update: Oct. 30, 2023), available here. ↩︎
15. See Privacy Policy, Colony Labs (d/b/a Scribe), available here. ↩︎
16. See Michael Schade, How your data is used to improve model performance, OpenAI, available here. See also Cohere Data Usage Policy, supra note 14 (“API data undergoes a sanitization process before storage. Before being fed into any training models, our team removes common sources of personal information.”). ↩︎
17. See Privacy Policy, Colony Labs (d/b/a Scribe) (“This Policy places no limitations on our use or sharing of Aggregate/De-Identified Information.”), available here. ↩︎
18. See Privacy Policy, Anthropic (version 3.0, effective July 8, 2023) (“We use your personal data for the following purposes … To de-identify it and train our AI models”), available here. ↩︎