I. Introduction
President Biden’s October 2023 Executive Order on AI pays special attention to fostering the development and implementation of “privacy-enhancing technologies” (PETs). Sec. 9 (Protecting Privacy) of the Order is focused almost entirely on the topic. The Office of Management and Budget, the Federal Privacy Council, the Interagency Council on Statistical Policy, the Office of Science and Technology Policy, the Secretary of Commerce, and the National Science Foundation all have designated roles to play.
The focus is entirely on the development and implementation of PETs by the federal government. There is no discussion of imposing any duties, requirements, or liabilities on the generative AI providers who contribute significantly to the threat to our private information. Let’s explore why.
II. Privacy-enhancing technologies (PETs)
A. All named PETs focus on data repository and sharing protections
The term “privacy-enhancing technology” means any software or hardware solution, technical process, technique, or other technological means of mitigating privacy risks arising from data processing, including by enhancing predictability, manageability, disassociability, storage, security, and confidentiality. These technological means may include secure multiparty computation, homomorphic encryption, zero-knowledge proofs, federated learning, secure enclaves, differential privacy, and synthetic-data-generation tools.
Executive Order on the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence, Sec. 3(z), Oct. 30, 2023, available here.
Data disassociation or de-identification is a key concept for privacy-enhancing technologies. De-identification “unlinks” individuals from their sensitive information. Once personal identifiers are removed or transformed, the data can be reused and shared without implicating data privacy (presuming the data can’t be re-paired with the individuals it was originally disassociated from, which AI is really, really good at doing).
“Disassociability” is defined by NIST as: “[e]nabling the processing of PII or events without association to individuals or devices beyond the operational requirements of the system.”
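To make the concept concrete, here is a minimal, hypothetical Python sketch of one common de-identification approach, pseudonymization via a salted hash. The field names, salt handling, and record shape are all my own illustrative assumptions, not drawn from NIST or the Executive Order:

```python
import hashlib

# SALT is a stand-in; a real system keeps it secret and separate from the data.
SALT = b"replace-with-a-secret-value"

def pseudonymize(record: dict) -> dict:
    """Drop direct identifiers; keep a salted hash so records about the
    same person can still be linked to each other, but not back to them."""
    out = {k: v for k, v in record.items() if k not in ("name", "ssn", "email")}
    out["subject_id"] = hashlib.sha256(SALT + record["email"].encode()).hexdigest()[:16]
    return out

print(pseudonymize({"name": "Jane Doe", "email": "jane@example.com",
                    "ssn": "123-45-6789", "zip": "85004", "diagnosis": "flu"}))
# {'zip': '85004', 'diagnosis': 'flu', 'subject_id': '...'}
```

Note what the sketch does not do: quasi-identifiers like the ZIP code survive, and it is precisely those leftovers that modern AI is so good at re-pairing with individuals.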
A full discussion of the 7 different “technological means” for privacy-enhancing technologies (PETs) listed above is beyond the scope of this blog article.1 Like all security measures, they generally entail a trade-off between privacy and utility.
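That trade-off is easy to see in the simplest of the listed techniques, differential privacy. The following is a minimal sketch of its standard Laplace mechanism; the function, counts, and epsilon values are illustrative assumptions, not anything prescribed by the Order:

```python
import numpy as np

def dp_count(true_count: int, epsilon: float) -> float:
    """Laplace mechanism for a counting query (sensitivity = 1).

    Smaller epsilon means stronger privacy but a noisier, less
    useful answer; that is the privacy/utility trade-off in one line.
    """
    return true_count + np.random.laplace(loc=0.0, scale=1.0 / epsilon)

print(dp_count(1000, epsilon=1.0))   # typically within a few of 1000
print(dp_count(1000, epsilon=0.01))  # can be off by a hundred or more
```

Each of the other listed techniques involves a similar dial between how much protection you get and how much utility you give up.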
It is certainly technologically possible, for example, to encrypt all data. And it would be legally advantageous to do so, as virtually all states exempt you from any notification requirements in the event of a data breach when you do.2 But the general assumption remains that while highly sensitive information should be encrypted, it would be a bridge too far to impose such a requirement on all just-sorta-sensitive information. The concern is that doing so would be cost-prohibitive and/or would slow down system processing speeds too much.3
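The mechanics themselves are not the obstacle. A minimal sketch of symmetric encryption at rest, using the widely used Python cryptography library (key handling is deliberately oversimplified here and is not production practice), might look like:

```python
from cryptography.fernet import Fernet  # pip install cryptography

key = Fernet.generate_key()   # in practice, held in a key-management system
f = Fernet(key)

token = f.encrypt(b"account: 1234567890")  # what sits in the database at rest
plain = f.decrypt(token)                   # this cost is paid on every read
assert plain == b"account: 1234567890"
```

The decrypt call on every read is the processing-speed cost, and key management is the operational cost, that together drive the concerns noted above.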
Notably, all 7 listed technological means focus on the data repository side. They are repository and sharing protections that can only be implemented by the organizations that manage person-related data for their businesses.
[Figure: Federal regulatory agencies charged with developing and implementing privacy-enhancing technologies (PETs) under President Biden’s Oct. 2023 Executive Order on AI]
B. No PETs focus on personally-identifiable information (PII) detection
But what about technologies or practices that the AI providers themselves can implement, given that they actively collect massive amounts of data (through webcrawlers or otherwise) to train their models? Shouldn’t AI providers bear responsibility for
- mitigating against the collection of private information in the first instance, and/or
- scrubbing the data they collect of such private information after the fact (see the sketch following this list)?
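As a rough illustration, after-the-fact scrubbing can start with nothing more exotic than pattern matching. This minimal Python sketch is my own hypothetical; the patterns are examples and nowhere near exhaustive:

```python
import re

# Hypothetical patterns; a production system would use a vetted PII library.
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}

def scrub(text: str) -> str:
    """Replace matched PII with a typed placeholder, e.g. [SSN]."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label.upper()}]", text)
    return text

print(scrub("Call 555-867-5309 or write jane@example.com, SSN 123-45-6789."))
# -> Call [PHONE] or write [EMAIL], SSN [SSN].
```

Pattern matching catches the well-structured identifiers; the harder, free-text cases are where the providers’ own AI could come in, as discussed below.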
The closest the Biden Oct. 2023 Executive Order gets to this is:
Sec. 9. Protecting Privacy. (a) To mitigate privacy risks potentially exacerbated by AI — including by AI’s facilitation of the collection or use of information about individuals, or the making of inferences about individuals — the Director of OMB shall:
(i) evaluate and take steps to identify commercially available information (CAI) procured by agencies, particularly CAI that contains personally identifiable information and including CAI procured from data brokers and CAI procured and processed indirectly through vendors …
This responsibility is as top-down as it gets. It is directed at the federal government and its collection or use of any “commercially available information (CAI) procured by agencies, particularly CAI that contains personally identifiable information and including CAI procured from data brokers and CAI procured and processed indirectly through vendors….” The Director of the Office of Management and Budget is directed to “evaluate and take steps to identify” such CAI, to issue a Request for Information, and to develop privacy impact assessments to mitigate the resulting privacy risks, “including those that are further exacerbated by AI.”4
And as to those who are doing the exacerbating, the generative AI providers crawling the internet to train their AI models? The Executive Order is notably silent, both as to them and as to this entire issue.
I acknowledge that it is on some levels objectively unfair to impose liability for the collection of publicly available information. It is, however, objectively unfair on every level to the public that a single unauthorized release of your Social Security number, bank information, etc. should doom you to having that information permanently publicly available due to the inexorable work of generative AI webcrawlers and model building. That is particularly so because generative AI providers have the perfect tool to screen for and identify private information after the fact in the massive amounts of data they collect: their very own AI technology.5
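To put a bit of flesh on “their very own AI technology”: here is a minimal sketch of machine-learning-based PII screening using the open-source spaCy NER library. It is my own illustration of the general approach, not a description of any provider’s actual pipeline:

```python
# pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")

def flag_possible_pii(text: str) -> list[tuple[str, str]]:
    """Flag entity spans a general-purpose NER model tags as person-related.

    A real pipeline would combine this with pattern matching (SSNs,
    account numbers) and a model actually trained for PII detection.
    """
    risky_labels = {"PERSON", "GPE", "ORG", "DATE"}
    doc = nlp(text)
    return [(ent.text, ent.label_) for ent in doc.ents if ent.label_ in risky_labels]

print(flag_possible_pii("Jane Doe of Phoenix opened an account on June 3, 2021."))
# e.g. [('Jane Doe', 'PERSON'), ('Phoenix', 'GPE'), ('June 3, 2021', 'DATE')]
```

If a small open-source model can flag names, places, and dates out of the box, providers with frontier models at their disposal could plainly go much further if so required.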
III. Conclusion
Perhaps that side of the coin is actually being addressed in full elsewhere, and we shouldn’t rush to judgment as to our federal government’s efforts here.
But there is no right to data privacy in the U.S.6 And there are no laws meaningfully limiting the collection of private information by generative AI providers, requiring them to screen out such private information after collecting it, or preventing them from reselling such information after collecting it. The general rule, after all, is that if it is publicly available on the internet, then anyone has the right to collect it and do what they want with it. A strong case can be made that this is generally how it should be.
Unless laws are eventually passed directly addressing these issues, there really isn’t any reason for generative AI providers to do anything here, is there? The simple reality is that, left unchecked, their incentive is to do just enough to support what they really want: no laws passed on this issue at all, or better yet, laws favorable to their positions, so they can continue to minimize any legal liability they might otherwise become subject to.
© 2024 Ko IP & AI Law PLLC
- For the best high-level technological discussion of the subject that I have come across, see Katharine Jarmul, Privacy Enhancing Technologies: An Introduction for Technologists, martinfowler.com, May 30, 2023, available here.
- See The Sedona Conference, Incident Response Guide, 21 Sedona Conf. J. 125, 182-83 (2020).
- See Rebecca Herold, Top 4 Reasons Encryption Is Not Used, Privacy & Security Brainiacs, March 21, 2020, available here.
- Executive Order on the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence, Sec. 9(a)(i)-(ii), Oct. 30, 2023, available here.
- A Google search for “PII detection” and the reams of hits it yields strongly suggests that a lot could be done on this front if so required….
- For discussion, see my blog article Your duties to your AI customers and their private data, available here.