What are Gemini, Claude, and Meta AI doing with enterprise data?
Enterprise users of leading large language models risk making private information public, according to a new study of the data collection and sharing practices of companies such as Meta, Google, and Microsoft. The study found that these companies collect sensitive data and share it with unknown third parties.
In fact, businesses may face even greater risks than the many individuals who use the various LLMs, according to findings from Incogni, a personal data removal and data privacy company.
“Employees frequently use generative AI tools to help draft internal reports or communications, not realizing that this can result in proprietary data becoming part of the model’s training dataset,” the company said. “This lack of safeguards not only exposes individuals to unwanted data sharing, but could also lead to sensitive business data being reused in future interactions with other users, creating privacy, compliance, and competitive risks.”
Ron Zayas, CEO of Ironwall, Incogni’s business and government division, said: “The analogy would be that we spend a lot of time as businesses making sure that our emails are secure, making sure that our machines lock themselves down after a certain period of time, following SOC 2 protocols, all these things to protect information.” But now, he said, the concern is that “we’ve opened the door, and we have employees feeding information to engines that will process that and use it [perhaps in responses to competitors or foreign governments].”
To evaluate the LLMs, Incogni assessed each platform against a set of 11 privacy criteria and compiled the results into rankings in three areas: training, transparency, and data collection and sharing. From these, it also derived an overall rating for each platform.
Among the key findings of Incogni’s study:
Le Chat by Mistral AI is the “least privacy invasive platform, with ChatGPT and Grok following closely behind. These platforms performed the best when it comes to how transparent they are on how they use and collect data, and how easy it is to opt out of having personal data used to train underlying models.”
LLM platforms developed by the biggest tech companies turned out to be the most privacy-invasive, the report said, with Meta AI (Meta) being the worst, followed by Gemini (Google) and Copilot (Microsoft).
Gemini, DeepSeek, Pi AI, and Meta AI don’t seem to allow users to opt out of having prompts used to train the models.
ChatGPT turned out to be the most transparent about whether prompts will be used for model training, and it had a clear privacy policy.
Grok (xAI) may share photos provided by users with third parties.
Meta.ai “shares names, email addresses and phone numbers with external entities, including research partners and corporate group members.”
What not to tell AI
Justin St-Maurice, technical counselor at Info-Tech Research Group, said that from a corporate perspective, “training your staff on what not to put into tools like ChatGPT, Gemini, or Meta’s AI is critical.”
He added, “just as people are taught not to post private or sensitive information on social media, they need similar awareness when using generative AI tools. These platforms should be treated as public, not private. Putting personally identifiable information (PII) or proprietary company data into these systems is no different than publishing it on a blog. If you wouldn’t post it on LinkedIn or Twitter, don’t type it into ChatGPT. The good news? You can do a lot with these tools without needing to expose sensitive data.”
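To make that advice concrete, here is a minimal sketch, not drawn from the study itself, of how an organization might strip obvious personally identifiable information from a prompt before it ever reaches a third-party AI tool. The regular expressions and placeholder labels below are illustrative assumptions only; production-grade redaction would need a proper PII-detection service and a review of each vendor’s data-handling policy.

```python
import re

# Illustrative patterns for common PII; deliberately simplistic.
PII_PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "PHONE": re.compile(r"\b(?:\+?\d{1,2}[\s.-]?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def redact(prompt: str) -> str:
    """Replace anything matching a PII pattern with a placeholder tag."""
    for label, pattern in PII_PATTERNS.items():
        prompt = pattern.sub(f"[{label} REDACTED]", prompt)
    return prompt

if __name__ == "__main__":
    draft = "Summarize this note for jane.doe@example.com, phone 555-123-4567."
    print(redact(draft))
    # -> "Summarize this note for [EMAIL REDACTED], phone [PHONE REDACTED]."
```

The point of the sketch is the workflow, not the patterns: scrub prompts on the way out, so whatever the platform retains or shares contains no names, contact details, or proprietary identifiers.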
According to St-Maurice, “if you’re worried about Meta or Google sharing your data, you should reconsider your overall platform choices; this isn’t really about how LLMs process your data, but how these large corporations handle your data more generally.”
Privacy concerns are important, he said, “but it doesn’t mean organizations should avoid large language models altogether. If you’re hosting models yourself, on-prem or through secure cloud services like Amazon Bedrock, you can ensure that no data is retained by the model.”
St-Maurice pointed out that, in these scenarios, “the LLM functions strictly as a processor, like your laptop’s CPU. It doesn’t ‘remember’ anything you don’t store and pass back into it yourself. Build your systems so that the LLM does the thinking, while you retain control over memory, data storage, and user history. You don’t need OpenAI or Google to unlock the value of LLMs; host your own internal models, and cut out the risk of third-party data exposure entirely.”
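The pattern St-Maurice describes can be sketched in a few lines. The example below, which is an illustration rather than anything prescribed in the article, assumes a self-hosted inference server exposing an OpenAI-compatible /v1/chat/completions endpoint at http://localhost:8000 (a convention used by several local serving tools); the URL, model name, and in-memory history store are all hypothetical. The application keeps the conversation history and passes it to the stateless model on every call.

```python
import requests

# "LLM as stateless processor": the app owns memory and history;
# the model only processes what it is handed on each request.
# ASSUMPTIONS: a self-hosted, OpenAI-compatible endpoint and model name.
LLM_URL = "http://localhost:8000/v1/chat/completions"
MODEL = "internal-llm"  # hypothetical self-hosted model

# Conversation memory lives in the application, not in the model.
history: list[dict] = []

def ask(user_message: str) -> str:
    """Append the user message to app-held history, call the model,
    store the reply locally, and return it."""
    history.append({"role": "user", "content": user_message})
    resp = requests.post(LLM_URL, json={"model": MODEL, "messages": history})
    resp.raise_for_status()
    reply = resp.json()["choices"][0]["message"]["content"]
    history.append({"role": "assistant", "content": reply})
    return reply

if __name__ == "__main__":
    print(ask("Summarize our Q3 incident report in three bullet points."))
```

Because the model endpoint sits inside the organization’s own infrastructure, nothing in this loop leaves the company’s control, which is the risk reduction St-Maurice is pointing to.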
What people don’t understand, added Ironwall’s Zayas, “is that all this information is not only being sucked in, it’s being repurposed, it’s being reused. It’s being publicized out there, and it’s going to be used against you.”