
How Self-Supervised Learning In AI Can Reduce Reliance On User Data

Melinda Han Williams, Chief Data Scientist at Dstillery

The “P” in “ChatGPT” is perfect for cookieless targeting.

Here “P” stands for pre-trained. It’s an aspect of the latest generation of AI models that deserves a closer look from programmatic advertisers and agencies, along with the concept that powers it: self-supervised learning. 

Self-supervised learning is at the heart of generative AI, and it’s perfectly suited to address the signal loss we’re increasingly facing in digital advertising today. 

Using self-supervised learning, AI targeting models can build up a body of knowledge about digital behavior that lets the AI do more with less data when targeting ads for a specific brand.

Pre-training lesson

Pre-training is when an AI model learns a foundation of knowledge before it ever tries to tackle an individual prompt. AI models like ChatGPT learn by predicting the next word in a sentence for millions of sentences pulled from the internet. Before you ask ChatGPT your first question, this pre-training step has already allowed the AI model to build up an enormous breadth of knowledge, enabling it to give a surprisingly detailed answer in response to any little prompt.

This is an example of what’s called self-supervised machine learning. Self-supervised learning is when an AI model learns from a data set that doesn’t include labeled examples or other explicit guidance on what the AI model should learn. By using any readily available text, ChatGPT supervises itself to learn what each word means. It does this by guessing the next word in a sentence, then checking the answer and correcting itself. It plays this game of guess-and-check millions of times over.
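
The guess-and-check idea can be illustrated with a toy sketch. It assumes a simple word-pair counter in place of the neural network behind a model like ChatGPT, and the three-sentence corpus and all function names are invented for illustration. The point it shows is that the training "labels" come from the text itself: each word's label is simply the word that follows it.

```python
from collections import Counter, defaultdict

def make_training_pairs(sentence):
    """Turn one unlabeled sentence into (current word, next word) pairs."""
    words = sentence.lower().split()
    return list(zip(words, words[1:]))

def train_next_word_model(sentences):
    """Count which word follows which; the text supplies its own answers."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        for current_word, next_word in make_training_pairs(sentence):
            counts[current_word][next_word] += 1
    return counts

def predict_next_word(counts, word):
    """Guess the most frequently seen follower, or None for an unseen word."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]

# Toy corpus (an assumption for this sketch): no human labeled anything here.
corpus = [
    "the cat sat on the mat",
    "the cat chased the mouse",
    "the dog sat on the rug",
]
model = train_next_word_model(corpus)
print(predict_next_word(model, "sat"))  # prints "on"
print(predict_next_word(model, "the"))  # prints "cat"
```

A real language model replaces the counting table with a neural network that adjusts its weights after each wrong guess, but the source of supervision is the same: plain, unlabeled text.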

The self-supervised nature of AI models like ChatGPT is what makes them so useful. The key to their success is that they can learn from data that’s already plentiful – data that wasn’t created specifically for training AI. 

This differentiates self-supervised learning from classic supervised learning. With supervised learning, if you want your AI to learn something, you need a data set with specifically labeled examples to guide the learning process. This data is always limited. Creating or acquiring it often comes at a cost. (There is an episode of “Silicon Valley” in which a character tries to trick a classroom of college students into labeling images as a homework assignment so he can use them to train an AI image identification system.) Self-supervised learning removes that barrier. The AI model can learn from data that’s already out there, without any special labels. 

Self-supervised learning enables pre-training an AI model on massive amounts of general-purpose data. That way, it can bring a ton of knowledge to the table in response to a specific prompt.

Applying self-supervised learning to programmatic advertising


What does self-supervised learning have to do with signal loss in programmatic advertising?

At the core of the signal loss problem is this question: How do you make an ad-targeting decision without the wealth of user-specific data we’ve all been spoiled with? Using just the information about the impression moment itself, like URL, time of day and DMA, how do you answer the question, “How valuable is this impression to this brand’s campaign?”

This is where pre-training comes in. Like ChatGPT, programmatic advertising needs AI models that harness their own foundation of knowledge in response to these impossibly tiny prompts. But, in this case, the knowledge we need isn’t about language; it’s about digital behavior. Rather than pre-training on sentence data, an ad targeting AI model would need to pre-train on the patterns and nuances of digital behavior. 

Digital journeys from opted-in panels, collected outside of the advertising ecosystem, are ideal for this self-supervised learning task. Where ChatGPT learns by predicting the next word in a sentence, an ad targeting AI model can learn by predicting the next website in a digital journey. 

As with other self-supervised learning processes, this allows the AI model to learn an impressive breadth of knowledge. In this case, the AI model learns what a visit to each website means, the intent behind each visit and that visit’s role in someone’s digital journey.
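
A minimal sketch of next-website prediction, under loud assumptions: the journeys and domain names below are made up, and a transition counter stands in for whatever model a real ad targeting system would use. It shows two things: predicting the next site from unlabeled journeys, and how sites that lead to similar next steps end up looking alike, which is one simple way to read "what a visit means."

```python
import math
from collections import Counter, defaultdict

# Hypothetical opted-in panel journeys; every domain here is invented.
journeys = [
    ["marathon-training.example", "running-shoes.example", "shoe-store.example"],
    ["trail-running.example", "running-shoes.example", "shoe-store.example"],
    ["marathon-training.example", "race-calendar.example", "running-shoes.example"],
    ["trail-running.example", "race-calendar.example", "running-shoes.example"],
    ["recipe-blog.example", "grocery-delivery.example", "kitchen-store.example"],
]

def learn_transitions(journeys):
    """Count site -> next-site transitions; the journey itself is the label."""
    counts = defaultdict(Counter)
    for journey in journeys:
        for site, next_site in zip(journey, journey[1:]):
            counts[site][next_site] += 1
    return counts

def predict_next_site(counts, site):
    """Return the most common next site, or None if the site was never seen."""
    followers = counts.get(site)
    if not followers:
        return None
    return followers.most_common(1)[0][0]

def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

transitions = learn_transitions(journeys)
print(predict_next_site(transitions, "running-shoes.example"))  # shoe-store.example

# Sites with similar "what happens next" profiles score as similar.
similar = cosine_similarity(
    transitions["marathon-training.example"], transitions["trail-running.example"]
)
unrelated = cosine_similarity(
    transitions["marathon-training.example"], transitions["recipe-blog.example"]
)
print(similar > unrelated)  # True
```

In this toy data, the two running sites share the same next steps, so they look nearly identical to the model even though no one ever tagged either of them as being about running.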

This allows the AI model to bring a wealth of knowledge in response to that tiny prompt – “How valuable is this impression moment to this brand’s campaign?” – and produce an impressively accurate response.
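
One way such a response could be produced is sketched below, purely as an illustration and not as a description of any vendor's system. It assumes the pre-training step has already produced a vector for each site, and that the brand supplies a short list of sites it considers relevant. The vectors, domains and the `score_impression` function are all hypothetical. The score uses only the impression's URL, with no user-level data.

```python
import math

# Hypothetical output of pre-training: one small vector per site.
# The numbers are made up; a real model would learn them from journeys.
SITE_VECTORS = {
    "running-shoes.example": [0.9, 0.1, 0.0],
    "race-calendar.example": [0.8, 0.2, 0.1],
    "recipe-blog.example": [0.0, 0.1, 0.9],
}

def cosine(u, v):
    """Cosine similarity between two equal-length dense vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def centroid(vectors):
    """Average a non-empty list of equal-length vectors, component by component."""
    count = len(vectors)
    return [sum(components) / count for components in zip(*vectors)]

def score_impression(impression_domain, brand_seed_domains):
    """Score an impression by how close its site is to the brand's seed sites.

    Returns 0.0 when the impression's site or all seed sites are unknown.
    """
    site_vector = SITE_VECTORS.get(impression_domain)
    if site_vector is None:
        return 0.0
    seed_vectors = [
        SITE_VECTORS[domain] for domain in brand_seed_domains if domain in SITE_VECTORS
    ]
    if not seed_vectors:
        return 0.0
    return cosine(site_vector, centroid(seed_vectors))

# A hypothetical running-shoe brand with a single seed site.
seeds = ["running-shoes.example"]
relevant_score = score_impression("race-calendar.example", seeds)
irrelevant_score = score_impression("recipe-blog.example", seeds)
print(relevant_score > irrelevant_score)  # True
```

Other impression-moment signals mentioned above, such as time of day and DMA, could be added as further inputs; they are left out here to keep the sketch short.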

Do more with less data

The industry faces plenty of uncertainty as we prepare for the planned deprecation of third-party cookies in 2024. But one thing is certain: advertisers will need to do more with less user data.

Pre-training with self-supervised learning provides a way for AI models to bring their own foundation of knowledge to the table, so that less data is needed to make each buying decision within a campaign. It’s an approach with the potential to eliminate the reliance on user-level data for effective targeting. That makes self-supervised learning a rare technology that can simultaneously support both consumer privacy and advertiser effectiveness.

“Data-Driven Thinking” is written by members of the media community and contains fresh ideas on the digital revolution in media.

