Opting Out of Training Data Usage with the Cohere API
When integrating LLMs into applications, one of the biggest concerns is how data sent via the API is handled. Especially when dealing with potentially confidential or personal information, it’s natural to want to keep that data out of model training.
This article introduces the data privacy policies and options available when using the Cohere API.
First, let’s get to the point: by default, data sent to the Cohere API may be used for model improvement purposes.
However, Cohere provides clear opt-out mechanisms so users can control their data.
The most convenient method is to opt out from the Cohere dashboard, where the settings screen provides a toggle to disable data usage.
By disabling this setting, your API data will no longer be used for model training or fine-tuning. For many use cases, this dashboard opt-out will be a simple and sufficient solution.
Below is a quote from the Cohere dashboard settings screen (https://dashboard.cohere.com/data-controls):
Data Controls
Use the toggles below to control how we may use prompts, generations, or fine-tune data to train our models. You may change your settings at any time and they will take effect immediately. To learn more about how we handle and protect enterprise data, read our Enterprise Data Commitments.
On ✅ ALLOW PROMPT AND GENERATION USE FOR TRAINING
Prompts are the text or content that you input to the model (e.g. your messages to chat.cohere.com), and Generations are the model outputs (e.g. the responses generated in the chat).
- ON - we may use prompts or generations to help train our models to improve their quality.
- OFF - we do not use prompts or generations to train our models.
On ✅ ALLOW FINETUNE DATA USE FOR TRAINING
Finetune data are your data sets (csv, jsonl, etc) uploaded to create a fine-tuned model.
- ON - we may use these datasets to help train our models to improve their quality.
- OFF - we do not use fine-tune data to train our models.
We never share fine-tuned models you create with any other customer.
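To make the “prompts” in the quote above concrete, here is a minimal sketch of what actually leaves your system on an API call. The helper and model name are illustrative assumptions, not part of Cohere’s documentation; separating payload construction from the network call makes it easy to audit exactly which text would be sent.

```python
# Illustrative sketch: build_chat_payload and the model name are
# assumptions for demonstration. The "content" string below is the
# prompt that the data-controls toggles govern.
def build_chat_payload(user_text, model="command-r-plus"):
    return {
        "model": model,  # example model name; substitute your own
        "messages": [{"role": "user", "content": user_text}],
    }

payload = build_chat_payload("Summarize this internal memo: ...")

# With the official `cohere` Python SDK installed, the request could
# then be sent along the lines of:
#   import cohere
#   co = cohere.ClientV2(api_key="...")
#   response = co.chat(**payload)
print(payload["messages"][0]["content"])
```

Keeping payload construction in one place like this also gives you a single point at which to apply any pre-send checks your security policy requires.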
Depending on corporate security policies or the confidentiality of data being handled, more stringent data management may be required. Cohere provides multiple advanced options to meet such needs.
“Zero Data Retention (ZDR)” is, as the name suggests, an option under which no data is retained at all. When it is enabled, all logging of API requests is disabled.
ZDR is well suited to applications that handle particularly sensitive information. Using this feature requires consultation with Cohere’s privacy and sales teams.
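Provider-side options like ZDR control what Cohere retains; independently of that, you can also minimize what you send in the first place. The sketch below is a hypothetical client-side safeguard, not a Cohere feature, and its patterns are illustrative rather than exhaustive.

```python
import re

# Hypothetical pre-send safeguard (not part of the Cohere API): strip
# obvious identifiers from a prompt before it leaves your system.
# These two patterns are illustrative only and far from exhaustive.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
PHONE = re.compile(r"\b\d{3}[-.\s]?\d{3,4}[-.\s]?\d{4}\b")

def redact(text: str) -> str:
    text = EMAIL.sub("[EMAIL]", text)
    text = PHONE.sub("[PHONE]", text)
    return text

print(redact("Contact alice@example.com or 555-123-4567."))
# → Contact [EMAIL] or [PHONE].
```

For production use, a dedicated PII-detection library or service would be a more robust choice than hand-rolled regular expressions.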
When you want to keep data entirely within your own infrastructure, private cloud deployment is the optimal solution. Cohere supports private deployments on major cloud platforms such as Amazon Bedrock and Amazon SageMaker.
The benefit of this approach is clear: your data never leaves infrastructure you control. While it does incur infrastructure management costs, it is a highly effective option for enterprise environments requiring the highest level of data governance.
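As a rough sketch of the private-deployment path, the snippet below builds a request for a Cohere model on Amazon Bedrock. The model ID and request-body shape are assumptions based on Bedrock’s documented format for Cohere models; verify both for your region and model version before relying on them.

```python
import json

# Sketch under assumptions: the model ID and body shape below are
# examples only; check the Amazon Bedrock documentation for the
# Cohere model you actually deploy.
def build_bedrock_request(prompt):
    model_id = "cohere.command-r-plus-v1:0"  # example ID only
    body = json.dumps({"message": prompt})
    return model_id, body

model_id, body = build_bedrock_request("Classify this support ticket.")

# With a private deployment, the call below stays inside your AWS account:
#   import boto3
#   client = boto3.client("bedrock-runtime")
#   resp = client.invoke_model(modelId=model_id, body=body)
print(model_id)
```

Because the request is served from your own AWS account, the same payload that would otherwise cross the public API boundary never leaves your cloud perimeter.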
Cohere complies with major data protection and security standards such as GDPR (General Data Protection Regulation), CCPA (California Consumer Privacy Act), and SOC 2.
Additionally, for users seeking legal guarantees through business contracts, DPA (Data Processing Agreement) execution is also possible. A DPA is a document that legally commits how Cohere will protect customer data and maintain compliance, serving as an important element for enterprises to use the service with confidence.
Data privacy settings for the Cohere API can be chosen flexibly according to developer and enterprise requirements, from the dashboard opt-out through Zero Data Retention to private deployment.
To safely utilize LLMs, it’s essential to properly understand these privacy policies and available options. Choose the method that best fits your project requirements and proceed with development confidently.
That’s all from the field.