Privacy Is a Must-Have Requirement for Using AI
Using a cloud AI is a privacy nightmare waiting to happen.
ChatGPT chats are showing up in Google Search and that’s just the surface of it. What happens when you give it access to your local files? That’s right—you don’t know. It’s safe to assume that they’ll be gobbled up by the machine for later training purposes.
American big tech is not exactly renowned for its privacy credentials. The AI crowd is particularly data-hungry. Having exhausted Reddit, 4Chan, Twitter and the rest of the web to train their models, the hunt for more data is on. Clearview AI scraped 3 billion social media images in a major privacy invasion. Perplexity was caught using stealth, undeclared crawlers to evade websites' no-crawl directives. Meta even crawls revenge porn sites. It's not a pretty picture.
From my perspective, privacy is a non-negotiable requirement for using AI. That's tricky if you're in the tech space and want to experiment with this technology, since the industry is dominated by US cloud providers.
I stopped using GitHub Copilot
I used GitHub Copilot for a year, to see how it would work out for writing software. I stopped, for two reasons:
- Quality. Having a junior pair programmer interrupt me with dubious code suggestions doesn’t enhance my productivity.
- Privacy. By default, GitHub Copilot will use your data for training. The paid account I used allowed me to opt out of that, but I don't really trust that. More fundamentally, the Copilot integration in my Emacs editor used a big black box of Microsoft-supplied code that I don't trust with access to potentially all of my files.
If you run closed-source software in your editor, there are no guardrails preventing that software from opening any of the files you yourself can open. All you can do is hope it won't.
Secure open source integration
Instead I'm now using a more secure open source integration in my editor, one that gives me complete control over what I'm sharing, and especially over what I'm not sharing with a cloud provider. By default, only the text I type is sent over the wire. If I want to add a file as context, I have to add it explicitly myself.
With open source, you can at least verify it's not up to dirty tricks. Well, I can. Many people can. Even if you can't yourself, you can trust those who can to sound the alarm if there's evil intent.
Don’t just give your data to OpenAI, Anthropic or Google
Trusting a US company with your data is beyond naive. Anthropic’s Claude chatbot agrees:
Key risks of giving AI services access to local files include:
Privacy & Data Exposure:
- Personal documents, credentials, API keys sent to their servers
- No guarantee of data deletion after processing
- Potential data breaches exposing your information
Security Vulnerabilities:
- Accidental sharing of sensitive business/financial data
- Source code or proprietary information exposure
- System configuration details that could aid attackers
Compliance Issues:
- May violate GDPR, HIPAA, or corporate data policies
- Legal liability for exposing client/customer data
- Industry regulations around data handling
Loss of Control:
- Files processed on external servers vs. locally
- Unclear data retention and usage policies
- Potential training data incorporation (though major providers claim to avoid this)
Best Practices:
- Review files before sharing
- Use local/offline AI tools for sensitive data
- Redact personal info when possible
- Check your organization’s AI usage policies
- Consider file-level permissions and isolation
The risk level depends on file sensitivity, but it’s wise to treat any upload as potentially permanent and accessible to the service provider.
In other words: don't hand over your local files to Anthropic or OpenAI, unless you want them handed to the NSA and potentially the whole world.
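One of the best practices in Claude's own list above is redacting personal info before sharing. A minimal sketch of what that can look like; the patterns are illustrative only and far from exhaustive, since real secrets take many forms:

```python
import re

# Illustrative redaction patterns; a real tool would need far more.
PATTERNS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "[EMAIL]"),                  # email addresses
    (re.compile(r"(?i)api[_-]?key\s*[:=]\s*\S+"), "api_key=[REDACTED]"),  # key assignments
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),                      # US SSN-shaped numbers
]

def redact(text: str) -> str:
    """Strip recognizable personal data before text leaves the machine."""
    for pattern, replacement in PATTERNS:
        text = pattern.sub(replacement, text)
    return text
```

For example, `redact("contact alice@example.com")` returns `"contact [EMAIL]"`, and `redact("my key: api_key=sk-abc123")` returns `"my key: api_key=[REDACTED]"`. Of course, this only mitigates the problem: anything you do send is still best treated as permanently in the provider's hands.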