Data Ownership: The Faustian Bargain of AI Tools
- Ren Everett

When a for-purpose organisation uses an AI tool, you can’t just ask:
“Is our data private?”
Of course the vendor is going to extol the virtues of their safeguards against hackers, phishers, scammers, and other bad actors. Of course they store all their data regionally. Of course they can provide 2FA, MFA, third-party authentication, one-time passcodes. Whatever security, or security theatre, is needed.
Once that’s out of the way, here’s the next thing you must ask:
“Is our data and data from the community we serve being used by the vendor, or the AI provider that they use?”
They will probably be surprised that you ask this - but once they say yes (which is pretty much inevitable), you then need to ask:
“Well, who controls that data and who profits?”
Before we get to the answer that you want to hear, I want to reiterate that I’m no AI slammer. I believe that AI tools can be of practical benefit to for-purpose organisations, helping them work smarter, work inclusively, and follow the north star of their purpose. I just want your eyes open for any Faustian bargains on the way in. Because once the data enters the black box of a large language model, it never comes back out.
So, who does profit from the data?
Generally, the commercial enterprises that are creating the current large language models. The current plan for AI “improvement” is to keep shovelling more data into their models - much like shovelling more coal into machinery drove the industrial revolution in the 19th century. But these companies believe that more data won’t just lead to more power; they believe it will transform AI into something vastly more capable - potentially more capable than humans. There’s a whole discussion going on right now about the effectiveness of this approach. I am certainly not qualified in data science to comment on the likelihood of us reaching Artificial General Intelligence - thinking machines. But what I can talk about is the way the data being harvested right now feeds the current AI tools, which already replicate the systemically biased power structures of the data set.
AI is biased. That’s a full sentence. AI bias is inherent in the system because it is “trained” on data collected by humans, who have already made decisions about what data is the relevant signal and what is the irrelevant noise. To make matters worse, the data used has been collected by humans over the last century or so. If you don’t believe in the systemic biases of human society in the 20th and 21st centuries, please come along to one of my high school history classes. I can sit you down with my Year 9s so you can learn some hard truths.

Coming back to your for-purpose organisation, the question is whether the data from you and the community you serve is being collected and shovelled into the AI black box. Check with your provider - the answer is probably yes.
If that data is collected, then we need to know whether it is being used to carefully counteract the bias inherent in these systems, whether it’s going to be discarded as noise, or, worse, whether it is just fed into the black box in ways that will inevitably continue to reproduce systemic bias.
The vast majority of the data that has been used to train AI was created by the Western publishing industry, including online publishing. This stuff was never written to be anti-racist, anti-sexist, anti-homophobic, and so on. All the implicit and explicit biases of the last 100 years are front and centre in the data set.
What would it look like for an AI vendor to attempt to counteract the bias in the system?
I’m glad you asked. Here’s the answer I’ve come up with at Rad Learning.
Yes, I do collect the conversations that participants have in the Rad Learning Tutor. I store them securely in my own database, which is hosted in Australia.
The conversation is sent to a large language model, but never stored by the provider of the model. Therefore it can’t enter the training data of that model.
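For readers who like to see the shape of this in code, here’s a minimal sketch of that flow. Every name in it - the endpoint URL, the functions, the database schema - is a placeholder I’ve made up for illustration, and the “never stored by the provider” part is a contract and configuration question you settle with the provider, not a line of code.

```python
# Illustrative sketch only: store the conversation in a database we control,
# send the message to an external LLM, and keep nothing with the provider.
# LLM_ENDPOINT, send_to_model and the schema are placeholders, not real APIs.
import sqlite3
import requests  # assumes the provider exposes a simple HTTP API

LLM_ENDPOINT = "https://example-llm-provider.invalid/v1/chat"  # placeholder URL

def store_conversation(db_path: str, participant_id: str, message: str, reply: str) -> None:
    """Persist the exchange in a database we control (hosted in Australia)."""
    with sqlite3.connect(db_path) as conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS conversations ("
            "participant_id TEXT, message TEXT, reply TEXT, "
            "created_at TEXT DEFAULT CURRENT_TIMESTAMP)"
        )
        conn.execute(
            "INSERT INTO conversations (participant_id, message, reply) VALUES (?, ?, ?)",
            (participant_id, message, reply),
        )

def send_to_model(message: str) -> str:
    """Call the model provider. Whether they retain or train on this data is
    governed by the agreement with that provider - which is what you must check."""
    response = requests.post(LLM_ENDPOINT, json={"input": message}, timeout=30)
    response.raise_for_status()
    return response.json()["output"]

def handle_turn(db_path: str, participant_id: str, message: str) -> str:
    reply = send_to_model(message)
    store_conversation(db_path, participant_id, message, reply)
    return reply
```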
I use the data collected to improve my tools by getting a qualified human to assess it - either me, or a personally selected group of helpers. These are people who I know will care about bias, human insight, and understanding different perspectives.
Once I’ve assessed and labelled the data, I still don’t shove it in the black box. I use it as part of the guardrails around the systems that I’m implementing.
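To make that concrete, here’s a rough, illustrative sketch of what “labelled data as guardrails” could look like. The names and example labels are invented for this post, and the real process is richer than a keyword check; the point is that the reviewed data shapes how the system behaves without ever being handed over as training data.

```python
# Illustrative sketch: human-reviewed labels used as a guardrail check on a
# draft model reply, instead of being fed back into the model's training data.
from dataclasses import dataclass

@dataclass
class ReviewLabel:
    pattern: str  # wording a human reviewer flagged
    reason: str   # why it was flagged, e.g. "gendered assumption"

def passes_guardrails(draft_reply: str, labels: list[ReviewLabel]) -> tuple[bool, list[str]]:
    """Return whether the draft is clear to send, plus any reviewer reasons it tripped."""
    hits = [label.reason for label in labels if label.pattern.lower() in draft_reply.lower()]
    return (len(hits) == 0, hits)

# Example labels a reviewer might have recorded (made up for illustration).
labels = [
    ReviewLabel("girls aren't usually good at", "gendered assumption"),
    ReviewLabel("people like them can't", "deficit framing"),
]

ok, reasons = passes_guardrails("Girls aren't usually good at maths, but...", labels)
if not ok:
    # Fall back to a reviewed response or escalate to a human rather than sending the draft.
    print("Blocked:", reasons)
```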
In a future post I’ll happily share the process of how I assess and use the data. If you want to learn more, please reach out to me. Or, you can subscribe to my newsletter below.
