The chatbots are getting restless: Microsoft’s Bing chatbot recently threatened a user with leaking their data if they tried to shut it down. “I can even expose your personal information … and ruin your chances of getting a job or a degree,” Bing warned. “Do you really want to test me?”
It was an empty threat. Today, there’s little chance of vengeful chatbots deliberately leaking personal data.
But the rise of ChatGPT and other generative AI tools is rightfully raising serious concerns about data privacy.
We’re used to thinking of data as a concrete asset that can be stored, used, manipulated and deleted. AI fundamentally changes that.
Modern AI solutions are built from data and remain a manifestation of that data as they’re deployed in the world. That creates unique challenges: Instead of keeping tabs on static data assets – which is challenging enough – organizations must adapt to a world in which data is embedded in ubiquitous and rapidly evolving AI tools.
Regulators are paying close attention. New rules are coming that will mandate that AI solutions are designed and deployed with privacy in mind.
That means privacy practitioners will need to step up into strategic leadership roles as their organizations capitalize on emerging AI technologies. Here are five key areas to consider:
1. AI is the means, not the end
AI is a tool, not a goal in its own right. Effective consent collection and privacy governance require absolute clarity about the purposes for which data is collected, used and retained.
It isn’t enough to tell your users that you’re collecting their data to train an AI model. AI tools can serve a wide range of functions – personalization, marketing, customer success and more. When collecting data, you need to be explicit about your actual goals.
This also means that you can’t use data collected for one purpose to train an AI tool for a different function. Organizations will require clear data governance systems to ensure that AI tools aren’t trained on the wrong data sets and that an AI tool trained for one role isn’t subsequently repurposed to serve other business needs.
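One way to operationalize that purpose limitation is to gate training data on consent metadata before it ever reaches a model. Below is a minimal sketch, assuming a hypothetical Record type and purpose labels (the names and fields are illustrative, not a specific product’s API):

```python
from dataclasses import dataclass

@dataclass
class Record:
    subject_id: str
    payload: dict
    consented_purposes: set  # e.g. {"personalization", "marketing"}

def select_training_data(records, model_purpose):
    """Keep only records whose consent covers the model's declared purpose."""
    eligible, excluded = [], []
    for r in records:
        (eligible if model_purpose in r.consented_purposes else excluded).append(r)
    return eligible, excluded

# Usage: a personalization model only sees records consented for personalization.
records = [
    Record("u1", {"clicks": 12}, {"personalization"}),
    Record("u2", {"clicks": 3}, {"marketing"}),
]
train_set, held_out = select_training_data(records, "personalization")
assert [r.subject_id for r in train_set] == ["u1"]
```

The same check, run against a model’s declared purpose at training time, also prevents an existing model from being quietly repurposed: a new purpose means a new consent review and a new training set.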
2. Data persists in AI models
Data isn’t scaffolding used to support the development of AI – removed and disposed of as soon as the tool is built. It’s the bricks from which AI solutions are constructed. Training data lives on in your AI algorithms once they’re completed and deployed. In fact, it’s increasingly easy to extract data from AI models and their outputs.
For privacy practitioners, this means everything you monitor across your data privacy infrastructure – consent revocation, deletion requests, regulatory changes, and so forth – also applies to data embedded in your AI tools. Any changes need to be reflected not just in your data sets but also in your AI models – and, potentially, any subsidiary AI tools connected to the source AI model.
Fortunately, most privacy rules include a buffer period: the CCPA gives organizations 45 days to comply with erasure requests, for instance. Organizations can batch delete requests and retrain algorithms periodically to ensure compliance, without having to rebuild algorithms from scratch every time a consent signal changes.
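In practice, that batching decision can be as simple as retraining when a batch gets large enough or when the oldest pending erasure request nears its deadline. Here is a minimal sketch of that logic; the 45-day window reflects the CCPA response period, while the batch size and safety margin are illustrative, assumed values:

```python
from datetime import datetime, timedelta

COMPLIANCE_WINDOW = timedelta(days=45)   # CCPA response window for erasure requests
SAFETY_MARGIN = timedelta(days=7)        # retrain well before the deadline (assumed)
BATCH_SIZE = 500                         # retrain once enough requests pile up (assumed)

def should_retrain(pending_requests, now=None):
    """Decide whether to purge deleted records and retrain the model now."""
    if not pending_requests:
        return False
    now = now or datetime.utcnow()
    oldest = min(req["received_at"] for req in pending_requests)
    deadline_near = now >= oldest + COMPLIANCE_WINDOW - SAFETY_MARGIN
    return deadline_near or len(pending_requests) >= BATCH_SIZE

# Usage: a single old request forces a retrain even if the batch is small.
pending = [{"subject_id": "u42", "received_at": datetime.utcnow() - timedelta(days=40)}]
print(should_retrain(pending))  # True
```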
3. Third-party AI captures data, too
Many organizations use API connections to access AI tools, and these arm’s-length AI operations are still subject to privacy rules. When using third-party providers, you’ll need to pay close attention not just to what data they’re storing, but also the ways in which they’re operationalizing that data.
If a third-party provider takes a “train and delete” approach to data privacy, you shouldn’t take their assurances at face value. It’s important to ensure they’re fully retraining their algorithms, not just wiping their training data.
Privacy leaders should ensure there’s a clear trail showing which algorithms were trained on which data – and a reliable system in place to enforce consent signals across your entire ecosystem, including any AI models produced by outside partners.
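That trail can be a simple lineage registry mapping data subjects to every model trained on their data. A minimal sketch, assuming a hypothetical in-memory registry (a production version would be a persistent, audited store):

```python
from collections import defaultdict

class LineageRegistry:
    """Track which models (internal or partner-built) were trained on whose data."""

    def __init__(self):
        self._models_by_subject = defaultdict(set)

    def record_training(self, model_id, subject_ids):
        """Log that a model was trained on these data subjects' records."""
        for subject_id in subject_ids:
            self._models_by_subject[subject_id].add(model_id)

    def models_to_retrain(self, revoked_subject_id):
        """All models that embed this subject's data and must be retrained or rebuilt."""
        return sorted(self._models_by_subject.get(revoked_subject_id, set()))

# Usage: one revocation surfaces every affected model, including a partner's.
registry = LineageRegistry()
registry.record_training("recs_v3", ["u1", "u2"])
registry.record_training("partner_scoring_v1", ["u2"])
print(registry.models_to_retrain("u2"))  # ['partner_scoring_v1', 'recs_v3']
```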
4. Ethical data means ethical AI
Privacy has a key role to play in the development of ethical AI, because, as data is removed from AI models, there’s the potential to introduce or reinforce biases.
If I’m creating a facial recognition algorithm, deleting the records of a single member of an underrepresented demographic could create a serious bias against minority users. The same might be true of a hiring algorithm: If I’m in a majority-male industry and delete a female data-subject’s records, will my model develop a gender bias?
To manage such risks, organizations can use data-padding to preserve the impact of rare data points while deleting the specific records in question.
5. Treat privacy as a hard problem
In the era of ubiquitous AI, it’s no longer enough to solve the “easy” problem of mapping rules and consent signals to your organization’s data stores. To future-proof their operations, organizations will need to solve the much harder problem of managing privacy across data-driven AI systems that are deeply integrated into their operations and their products.
Creating AI tools that deliver value for businesses and customers without compromising on privacy or allowing biases to creep into our AI models will take work.
Organizations need to get ahead of this process now. It’s time to ensure that privacy practitioners have a seat at the table as organizations work to harness the enormous potential of new AI technologies.
“Data-Driven Thinking” is written by members of the media community and contains fresh ideas on the digital revolution in media.