Large Language Model (LLM) Data Collection: What You Can "Do" About It . . .
You can be proactive about your history!
WHAT USER DATA/INFORMATION DO AI MODELS COLLECT?
CHATGPT: OPENAI (OpenAI Privacy Policy)
What user data/information is collected?
Account information: name, contact information, email address, credit card number, transaction history
IP address, browser type and settings, operating system, device identifiers, date and time of system use, user-website interactions
User content: prompts (which may include private information) and AI output (conversation history), uploaded content (e.g. images, files), user feedback
DOES THIS MODEL STORE A USER’S CHAT HISTORY? YES
OpenAI states, “We only keep this information for as long as we need it to serve its intended purpose.”
DO CHAT HISTORIES TRAIN THE MODEL? YES
“We use training information [chat prompts/conversation history and uploaded files] only to help our models learn about language and how to understand and respond to it. We do not and will not use any personal information in training information to build profiles about people, to contact them, to try to sell them anything, or to sell the information itself.”
User content, which may or may not be de-identified (made anonymous), is used to train the AI model, to conduct research, and to “analyze the effectiveness” of OpenAI operations; it may also be disclosed to third parties without notifying the user (unless required by law)
BARD: GOOGLE (Bard Privacy Help Hub)
What user data/information is collected?
Account information (if applicable): name, contact information, billing address
IP address, location (“precise location” information is not stored as part of a user’s Bard activity), browser type, applications used, device identifiers
For users 18 and older, Google automatically stores Bard activity in their Google Account for 18 months; this activity includes location information, which Google describes as “the general area from your device, IP address, or Home or Work addresses in your Google Account.”
User content: prompts (which may include private information) and AI output (conversation history), uploaded content (e.g. images), user feedback
BING CHAT: MICROSOFT (Microsoft Privacy Statement, which contains the section “Search, Microsoft Edge, and artificial intelligence”; and The new Bing: Our Approach to Responsible AI)
What user data/information is collected?
IP address, location, browser type, device identifiers, cookies, time and date of use
User content: text, voice, data, and images associated with searches, prompts, and queries
Search results, suggested websites, websites visited
Microsoft shares “[s]ome de-identified data (data where the identity of a specific person is not known) from Bing and Bing-powered experiences with selected third parties”
If a user signs up for the Bing Experience Improvement Program (when setting up their account), additional information is collected, including “how you use these specific Bing apps, such as the addresses of the websites you visit, to help improve search ranking and relevance.” According to Microsoft, this data is not used to identify or contact the user or to target advertising to them
DOES THIS MODEL STORE A USER’S CHAT HISTORY? YES
IP addresses are removed from stored search queries after 6 months; cookie IDs and “other cross-session identifiers” used to identify an account or device are deleted after 18 months
DO CHAT HISTORIES TRAIN THE MODEL? YES
CLAUDE: ANTHROPIC (Anthropic Privacy Policy)
What user data/information is collected?
Account information: name, email address, password
IP address, location, time zone, operating system, browser information, device identifiers, “advertising identifiers, probabilistic identifiers, and other unique personal or online identifiers”
User content: prompts, system output, feedback
Pages visited before and after using Anthropic’s services, browsing history, search history, date and time of use, click history (links used, time between clicks), viewed websites, cookies
DOES THIS MODEL STORE A USER’S CHAT HISTORY? YES
Anthropic keeps personal data “for as long as reasonably necessary for the purposes and criteria outlined in our Privacy Policy.” That data may be de-identified for research or “statistical purposes.”
User prompts and system output are deleted from the server after 90 days unless another agreement has been reached
Conversations that are flagged for violating Trust & Safety guidelines are saved for up to 2 years
DO CHAT HISTORIES TRAIN THE MODEL? NO
Conversation histories are not used to train the AI model unless certain arrangements have been made, such as giving Anthropic permission to use them. When permission is given (or a user provides constructive feedback or reports bugs), the information is de-identified and kept for 10 years
HOW TO TURN OFF CHAT HISTORIES
CHATGPT: ALL CHATS
Click on your username (lower left-hand corner) > Settings & Beta > Data Controls > Chat history & training
Toggle “Chat history & training” off. Once this setting is applied, an “Enable chat history” button and a disclaimer appear on screen, allowing the user to re-enable the feature
As explained by OpenAI, “While history is disabled, new conversations won’t be used to train and improve our models, and won’t appear in the history sidebar. To monitor for abuse, we will retain all conversations for 30 days before permanently deleting.”
Previous chat history is not deleted when the toggle setting is changed
CHATGPT: AN INDIVIDUAL CHAT
In the history sidebar, click the three dots (· · ·) to the right of the desired chat
From the dropdown menu select “Delete”
BARD
On the main menu bar click the “Bard Activity” icon (upper right-hand corner)
Click the “Bard Activity” card > toggle “Bard Activity” to off
Even when Bard Activity is off, Google may “continue to save location and other data as part of your use of other Google services” for up to 72 hours
Google notes that “[h]uman reviewers read, annotate, and process [a user’s] Bard conversations”
Google states, “Bard conversations that have been reviewed or annotated by human reviewers are not deleted when you delete your Bard activity because they are kept separately and are not connected to your Google Account. Instead, they are retained for up to three years.”
Auto-delete removes Bard interactions after 18 months; to change that default to either 3 or 36 months, click the “Auto-delete” card
To manually remove Bard activity, click the “Delete” button; deletion choices are: Last hour, Last day, All time, Custom range
BING (REMOVING EARLIER CONVERSATIONS)
Open Bing Chat in the Edge browser > click on the three-bar icon (upper right-hand corner) > click on “Search History”
Scroll down to “Activity” to view the list of individual searches/chatbot exchanges
Click the check boxes associated with the chats to be removed > click “Clear”
Click on the uppermost check box to select 16 searches simultaneously > click “Clear”
BING (REMOVING AN ENTIRE SEARCH HISTORY & PREVENTING FUTURE INFO COLLECTION)
On the “Manage or clear your search history” page, click “Clear all”
Toggle “Show New Searches Here” to off (according to Microsoft, “New searches won't appear on your search history page on Bing.”)
CLAUDE
Click on the action arrow [V] next to the chat title > click “Delete”