Why is the Reddit licensing deal important for Google’s AI plan?

This deal couldn’t have come at a better time for the two companies. Reddit wants money and investor love ahead of its planned initial public offering (IPO). And Google wants to save face from its AI misadventures.

2024-02-28 - John Xavier Nabeel Ahmed

Social media platform Reddit on Thursday struck a licensing deal with Google, allowing the search giant to access Reddit users’ posts to train the company’s artificial intelligence (AI) engine. As part of the deal, Google will pay the social news aggregation site $60 million annually to access user-generated content from the platform. This deal couldn’t have come at a better time for the two companies. Reddit wants money and investor love ahead of its planned initial public offering (IPO). And Google is looking to save face from its AI misadventures.

While Reddit generates revenue, the company is not profitable. Its IPO document, filed with the U.S. stock market regulator, reveals revenue of $804 million in 2023, most of it coming from advertisers. But the platform suffered a net loss of $90.8 million. Google’s annual paycheck to Reddit will give the platform money that could help it move towards profitability. Plus, a data partnership with one of the biggest players in AI can boost Reddit’s stature before its IPO, helping investors find value in the platform. For Google, the licensing deal hands the Mountain View, California-based company a data mine to salvage itself from the AI wreck it’s in now.

What ails Google?

Google’s sporadic attempts to break OpenAI’s dominance in AI have left the search giant badly bruised. The company’s maiden AI chatbot Bard, launched as a rival to OpenAI’s ChatGPT, was faulty. It had factual errors in its first demo video and subsequent iterations weren’t up to par either.

Most recently, the company’s Gemini chatbot overcompensated for the lack of diversity by throwing up irrelevant images in response to queries. The company’s AI-based image generator showed a picture of a Black woman when queried ‘Who is the United States’ founding father?’ In another instance, it showed Asian persons as Nazi-era German soldiers. Such unintelligent responses have caused quite a stir. These blunders made Prabhakar Raghavan, the senior executive overseeing the company’s search business, apologise and note that the product “missed the mark”.

While these issues are tied to its large language model (LLM) and weights attached to tokens, the other challenge Google is facing is the lack of raw data. LLMs are data-hungry algorithms, and the quality of information flowing into them matters a lot. To be good at typing out accurate text, Generative AI (GenAI) models first need to read copious amounts of text.

Till now, tech firms had a free ride by scraping the web for text and using open-source crawling tools to sneak into websites and take data from those sites. This modus operandi is being challenged as users and publishers are pushing back against AI companies scraping data from the web indiscriminately. In a proposed class action lawsuit, in July 2023, Google was accused of misusing a large amount of web users’ personal information to train its AI models. Separately, in December, news publisher The New York Times sued OpenAI and Microsoft for copyright infringement. The lawsuit claims that the AI firms used millions of its news articles to train their AI models, including the one behind ChatGPT. Such complaints from individuals and corporations are making lawmakers sit up and formulate policies on the ethical use of information available on the web.

Lawmakers in the U.S. filed a Bill, the AI Foundational Model Transparency Act, that would require the Federal Trade Commission (FTC) and National Institute of Standards and Technology (NIST) to frame rules for reporting data transparency in AI models. This would require builders of foundational AI models to disclose their sources of training data. If such a law is passed, AI companies will have to pay for the data they use to train their models. Consequently, the cost of building AI models will go up. To preempt such a law, large tech firms are sealing licensing deals with news publishers and other content sources. OpenAI’s deal with the news agency Associated Press is a case in point.

Other news organisations, including Gannett (the largest U.S. newspaper company) and News Corp (the owner of The Wall Street Journal), have been in talks with OpenAI, as per media reports. The publications that have cut a deal with AI companies will get a fee based on the frequency of their content being used.

How different is this deal?

It is against this backdrop that Google is making a deal with Reddit. But, unlike other platforms, Reddit works as a social news website, where content is socially curated and promoted. The platform is composed of hundreds of subcommunities, known as subreddits, where members submit content, which is then upvoted or downvoted by other members.

Under this deal, Google will have access to Reddit’s Data API, which will provide the search giant with real-time, unique content from a large and dynamic platform. This will help the company’s AI model access behavioural and trending information. Apart from this, Google will continue to access information from the web using crawlers.

However, there is one catch. In July 2023, when Reddit decided to introduce a new policy that charged some third-party apps for accessing data on its platform, concerns over content moderation and accessibility arose. Several groups protested the changes proposed by Reddit. Over 8,000 subreddits went dark. The subreddit groups, at the time, said the changes threatened to end a key way in which the platform had historically been customised. To avoid such a conflict this time around, Reddit is giving an unspecified number of its top users, including moderators and those with high karma scores (a score that shows how much a user contributes to the Reddit community), the chance to buy shares in its IPO, according to a report by The Verge.

Reddit plans to do this through an allocation system based on tiers. Tier one will comprise certain users and moderators identified as those who have meaningfully contributed to Reddit community programmes. The second tier will be made up of people with a karma score of at least 2,000 and those who have performed at least 5,000 moderator actions. This is an unusual move, as this privilege is usually reserved for professional investors who want to buy stock at a theoretically lower price before the stock is listed on an exchange. Reddit currently has some 267.5 million weekly active users, more than 1,00,000 active communities, and one billion total posts, according to its SEC filing.

Unlike Reddit, few platforms have been forthcoming on whether the public information of users is used to train AI models. X, formerly Twitter, in September, said it would use users’ posts to train AI models for the purposes outlined in its policy. The policy did not specify the AI model it referred to.

Meta said user data from its applications, including Facebook, Instagram, and Threads, would be used to train AI for its AI chatbot. While TikTok and Snapchat have both launched AI chatbots, neither has mentioned taking users’ posts to train AI models.

The practice of using user data to train algorithms is not new in the world of tech. Most platforms’ recommender engines use a person’s usage data to suggest videos, articles and movies. But using that information to train AI models is new, and it calls for caution given these chatbots’ propensity to regurgitate personal information when they respond to prompts. A case in point is Samsung, which banned the use of AI chatbots in its offices after it found that a bot spat out company secrets following employees’ use of the application.
