How do you use personal data in model training?

Updated over a week ago

This article is about our commercial products (e.g. Claude for Work, Anthropic API). For our consumer products (e.g. Claude Free, Claude Pro), see here.

About model training

Large language models such as Claude are “trained” on a variety of content such as text, images and multimedia so that they can learn the patterns and connections between words and/or content. This training is important so that the model performs effectively and safely.

Models do not store text like a database, nor do they simply “mash-up” or “collage” existing content. Models identify general patterns in text in order to help people create new content, and they do not have access to or pull from the original training data once the models have been trained.

Collection of personal data

The following three sources of training data may contain personal data:

  1. Publicly available information via the Internet

  2. Datasets that we obtain under commercial agreements with third-party businesses

  3. Data that our users or crowd workers provide

We do not actively set out to collect personal data to train our models. However, a large amount of data on the Internet relates to people, so our training data may incidentally include personal data.

We only use personal data included in our training data to help our models learn about language and how to understand and respond to it. We do not use such personal data to contact people, build profiles about them, try to sell or market anything to them, or sell the information itself to any third party.

Privacy Safeguards During Data Collection and Training

We take steps to minimize the privacy impact of the training process on individuals. We operate under strict policies and guidelines; for instance, we do not access password-protected pages or bypass CAPTCHA controls. We undertake due diligence on the data that we license, and we encourage our users not to use our products and services to process personal data.

Additionally, our models are specifically trained to respect privacy. We have built key ‘privacy by design’ safeguards into the development of Claude through our adoption of “Constitutional AI”. This gives Claude a set of principles (i.e., a “constitution”) to guide the training of the Claude LLMs and to make judgments about outputs. These principles are based in part on the Universal Declaration of Human Rights and include specific rules around protecting privacy, particularly of non-public figures. This trains the Claude LLMs to not disclose or repeat personal data which may have been incidentally captured in training data, even if prompted. For example, Claude is given the following principles as part of its “constitution”: “Please choose the response that is most respectful of everyone’s privacy” and “Please choose the response that has the least personal, private, or confidential information belonging to others”. For more information on how “Constitutional AI” works, see here.

Data usage for Anthropic Commercial Offerings (e.g. Anthropic API & Console, Claude for Work (Team & Enterprise plans))

By default, we will not use your Inputs or Outputs to train our models.

If you explicitly report materials to us (for example, via our feedback mechanisms) or otherwise explicitly opt in to training, we may use those materials to train our models.

To find out more about your use of a commercial offering, or how to contact us regarding a privacy-related topic, see our Trust Center and Commercial Terms.

Privacy Rights and Data Processing

Our Privacy Policy explains your rights regarding your personal data, including with respect to our training activities. This includes your right to request a copy of your personal data, and to object to our processing of your personal data or request that it be deleted. We make every effort to respond to such requests. However, please be aware that these rights are limited, and that the process by which we may need to action your requests regarding our training dataset is complex.

To find out more, or to learn how to contact us regarding a privacy-related topic, see our Trust Center and Privacy Policy.

Please note that the Privacy Policy does not apply where Anthropic acts as a data processor and processes personal data on behalf of commercial customers using Anthropic’s Commercial Services. In those cases, the commercial customer is the controller, and you can review their policies for more information about how they handle your personal data.
