Privacy Is a Must-Have Requirement for Using AI
Using a cloud AI is a privacy nightmare waiting to happen.
ChatGPT chats are showing up in Google Search and that’s just the surface of it. What happens when you give it access to your local files? That’s right—you don’t know. It’s safe to assume that they’ll be gobbled up by the machine for later training purposes.
American big tech is not exactly renowned for its privacy credentials. The AI crowd is particularly data-hungry. Having exhausted Reddit, 4Chan, Twitter and the rest of the web to train their models, the hunt for more data is on. Clearview AI scraped 3 billion social media images in a major privacy invasion. Perplexity was caught using stealth, undeclared crawlers to evade websites' no-crawl directives. Meta even crawls revenge porn sites. It's not a pretty picture.
From my perspective, privacy is a non-negotiable requirement for using AI. That's tricky if you're in the tech space and want to experiment with this technology, since the industry is dominated by US cloud providers.
I stopped using GitHub Copilot
I used GitHub Copilot for a year, to see how it would work out for writing software. I stopped, for two reasons:
- Quality. Having a junior pair programmer interrupt me with dubious code suggestions doesn’t enhance my productivity.
- Privacy. By default, GitHub Copilot will use your data for training. The paid account I used allowed me to opt out of that, but I don't really trust that. More fundamentally, the Copilot integration in my Emacs editor used a big black box of Microsoft-supplied code that I don't trust with access to potentially all of my files.
If you run closed-source software in your editor, there are no guardrails preventing that software from opening any of the files you yourself can open. All you can do is hope it won't.
Secure open source integration
Instead I'm now using a more secure open source integration in my editor, one that gives me complete control over what I'm sharing, and especially over what I'm not sharing with a cloud provider. By default, only the text I type is sent over the wire. If I want to add a file as context, I have to add it explicitly myself.
With open source, you can at least verify it's not up to dirty tricks. Well, I can. Many people can. Even if you can't yourself, you can trust those who can to sound the alarm if there's evil intent.
Don’t just give your data to OpenAI, Anthropic or Google
Trusting a US company with your data is beyond naive. Anthropic’s Claude chatbot agrees:
Key risks of giving AI services access to local files include:
Privacy & Data Exposure:
- Personal documents, credentials, API keys sent to their servers
- No guarantee of data deletion after processing
- Potential data breaches exposing your information
Security Vulnerabilities:
- Accidental sharing of sensitive business/financial data
- Source code or proprietary information exposure
- System configuration details that could aid attackers
Compliance Issues:
- May violate GDPR, HIPAA, or corporate data policies
- Legal liability for exposing client/customer data
- Industry regulations around data handling
Loss of Control:
- Files processed on external servers vs. locally
- Unclear data retention and usage policies
- Potential training data incorporation (though major providers claim to avoid this)
Best Practices:
- Review files before sharing
- Use local/offline AI tools for sensitive data
- Redact personal info when possible
- Check your organization’s AI usage policies
- Consider file-level permissions and isolation
The risk level depends on file sensitivity, but it’s wise to treat any upload as potentially permanent and accessible to the service provider.
In other words: don't hand over your local files to Anthropic or OpenAI, unless you want them handed to the NSA and potentially the whole world.
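One of the best practices in Claude's own list above is redacting personal info before sharing. A minimal sketch of what that can look like; the patterns are illustrative only and far from exhaustive, since real secrets take many forms:

```python
import re

# Illustrative redaction patterns; a real tool would need far more.
PATTERNS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "[EMAIL]"),                  # email addresses
    (re.compile(r"(?i)api[_-]?key\s*[:=]\s*\S+"), "api_key=[REDACTED]"),  # key assignments
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),                      # US SSN-shaped numbers
]

def redact(text: str) -> str:
    """Strip recognizable personal data before text leaves the machine."""
    for pattern, replacement in PATTERNS:
        text = pattern.sub(replacement, text)
    return text
```

For example, `redact("contact alice@example.com")` returns `"contact [EMAIL]"`, and `redact("my key: api_key=sk-abc123")` returns `"my key: api_key=[REDACTED]"`. Of course, this only mitigates the problem: anything you do send is still best treated as permanently in the provider's hands.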