Reddit sues data scrapers and Perplexity over unauthorized content access

Reddit filed a comprehensive federal lawsuit on October 22, 2025, naming four defendants accused of circumventing technological controls to access and scrape its content. The complaint, filed in the United States District Court for the Southern District of New York, targets SerpApi LLC, Oxylabs UAB, AWMProxy, and Perplexity AI, Inc., seeking damages and injunctive relief under the Digital Millennium Copyright Act.

The platform’s 41-page complaint describes how the defendants allegedly bypassed two layers of security protection. First, the companies evaded Reddit’s own anti-scraping measures. Second, they circumvented Google’s SearchGuard system to scrape Reddit content directly from Google search engine results pages. “Defendants are similar to would-be bank robbers, who, knowing they cannot get into the bank vault, break into the armored truck carrying the cash instead,” the complaint states.

During a two-week period in July 2025, three defendant companies accessed nearly three billion search engine results pages (SERPs) containing Reddit content through automated processes. SerpApi accessed 784 million SERPs between July 1-6 and 1.05 billion between July 7-13. Oxylabs accessed 333 million and 448 million SERPs during the same periods. AWMProxy accessed 217 million and 264 million SERPs.
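As a rough check on these figures, the per-defendant counts can be tallied in a few lines of Python. The inputs are the rounded values quoted above, and the variable and function names are illustrative only.

```python
# SERP counts in millions, as quoted above (rounded figures).
# Each tuple is (July 1-6, July 7-13).
serps_millions = {
    "SerpApi": (784, 1050),
    "Oxylabs": (333, 448),
    "AWMProxy": (217, 264),
}

def total_per_defendant(counts):
    """Sum the two reporting periods for each defendant."""
    return {name: sum(periods) for name, periods in counts.items()}

totals = total_per_defendant(serps_millions)
grand_total = sum(totals.values())

for name, total in totals.items():
    share = total / grand_total * 100
    print(f"{name}: {total} million SERPs ({share:.0f} percent of the total)")
print(f"All three: {grand_total / 1000:.2f} billion SERPs")
```

With these rounded inputs the combined figure comes to about 3.1 billion, the same order as the roughly three billion cited above, and SerpApi accounts for more than half of it.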

Reddit maintains over 100 million daily active users engaging in discussions across hundreds of thousands of interest-based communities. This corpus of genuine human discourse has become increasingly valuable to AI companies. “Reddit is a ‘top-cited source’ of information for these companies,” the complaint states, citing industry analysis describing Reddit data as “like manna from heaven” for AI development.

The platform has entered partnership agreements with select AI companies, including Google and OpenAI, that include contractual guardrails to protect user privacy and intellectual property rights. Reddit and Google announced their partnership on February 22, 2024, enabling programmatic access to Reddit content for use in Google’s services.

Technical circumvention strategies detailed

The complaint describes specific techniques the defendant scrapers employ to bypass technological control measures. SerpApi advertises its “Ludicrous Speed Max” feature as using four times the server resources to create multiple parallel requests that “reject bad HTMLs, CAPTCHA and error pages, and other abnormalities.” The company’s website tells users they “don’t need to care” about technological control measures including “HTTP requests, parsing HTML files to JSON, or captcha, IP address, bots detection, maintaining user-agent, HTML headers, [or] being blocked by Google.”

Oxylabs operates what it calls “the world’s largest ethical proxy network” and offers over 62,000 IP addresses located in New York. The company’s website explicitly states its scraping service is meant to “bypass” restrictions, noting that “Oxylabs’ Google Search API is built to bypass the technical challenges Google has implemented.”

AWMProxy, previously operated by a Russian entity in connection with a cybercriminal botnet shut down in 2021, has apparently resumed operation, selling access to proxy services that conceal location and identity. The company re-established its proxy network to resume data-scraping operations.

Perplexity identified through marked data tracking

Reddit employed digital tracking techniques to confirm Perplexity was using Reddit data acquired through scraping of Google SERPs. The platform created a “test post” that could only be crawled by Google’s search engine and was not otherwise accessible anywhere on the internet. Within hours, queries to Perplexity’s answer engine produced the contents of that test post. “The only way that Perplexity could have obtained that Reddit content and then used it in its ‘answer engine’ is if it and/or its Co-Defendants scraped Google SERPs for that Reddit content,” the complaint states.
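The test-post approach resembles what is often called a canary or honeytoken: a unique marker is planted where only one party should be able to see it, and any later appearance elsewhere points to how it travelled. The Python sketch below illustrates the general idea only. It is not Reddit’s actual tooling, which the complaint does not describe at this level, and every name in it is hypothetical.

```python
import secrets

def make_canary(prefix="canary"):
    """Return a unique, hard-to-guess marker string to embed in a test post."""
    return f"{prefix}-{secrets.token_hex(8)}"

def contains_canary(text, canary):
    """Report whether a piece of text (e.g. an answer-engine response) includes the marker."""
    return canary.lower() in text.lower()

# Hypothetical example: the marker is planted in a post visible to a single
# crawler, then later responses from a third-party service are checked for it.
canary = make_canary()
test_post = f"Test post body containing marker {canary}."
response_with_leak = f"According to a Reddit post: {test_post}"
response_without_leak = "No relevant Reddit content was found."

print(contains_canary(response_with_leak, canary))     # True
print(contains_canary(response_without_leak, canary))  # False
```

Because the marker is random and appears nowhere else, a match is hard to explain by coincidence, which is the inference the complaint draws from the test post.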

Reddit sent a cease-and-desist letter to Perplexity in May 2024, demanding the company stop scraping Reddit data. At that time, Perplexity told Reddit it was not using Reddit content to train any AI models and would respect Reddit’s robots.txt directives. However, after Reddit sent its cease-and-desist letter, the volume of Reddit data cited by Perplexity increased forty-fold.

Perplexity publicly lists itself as a customer of SerpApi on the scraping company’s website. The AI firm acknowledged in an August 2025 blog post that “Reddit has emerged as the most cited domain across AI models globally.”

Legal claims under DMCA and unfair competition

The complaint brings six counts against the defendants. Count I alleges all four defendants violated 17 U.S.C. § 1201(a)(1)(A) by circumventing technological measures that effectively control access to copyrighted works. Counts II and III target SerpApi and Oxylabs specifically for trafficking in technology designed to circumvent technological measures under 17 U.S.C. § 1201(a)(2) and § 1201(b).

Count IV alleges unfair competition by all defendants under New York state common law, claiming they misappropriated Reddit’s labor, skill, expenditures, and goodwill while displaying bad faith. Count V alleges unjust enrichment by all defendants at Reddit’s expense. Count VI alleges civil conspiracy between SerpApi and Perplexity.

Reddit has implemented multiple anti-scraping techniques including robots.txt files, IP rate limits, captcha bot protection, and anomaly-detection tools. The platform’s User Agreement explicitly prohibits “scraping the Services without Reddit’s prior written consent.” Reddit’s robots.txt file states, “Reddit believes in an open internet, but not the misuse of public content.”
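For readers unfamiliar with robots.txt, the sketch below shows how a well-behaved crawler might consult such a file before fetching a page, using Python’s standard `urllib.robotparser` module. The robots.txt contents, bot name, and URLs are assumptions made up for illustration and are not copied from Reddit’s actual file.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt files (assumptions for this sketch,
# not Reddit's real directives).
block_everything = """\
User-agent: *
Disallow: /
"""

block_one_section = """\
User-agent: *
Disallow: /private/
"""

def is_allowed(robots_txt, user_agent, url):
    """Check whether the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

# "ExampleBot" and example.com are placeholders.
print(is_allowed(block_everything, "ExampleBot", "https://www.example.com/r/some-community/"))  # False
print(is_allowed(block_one_section, "ExampleBot", "https://www.example.com/public/page"))       # True
print(is_allowed(block_one_section, "ExampleBot", "https://www.example.com/private/page"))      # False
```

A robots.txt file is a published request rather than an enforcement mechanism, since a crawler has to choose to run a check like this. That is why the complaint lists it alongside rate limits, captchas, and anomaly detection.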

Google likewise prohibits unauthorized automated access to its search engine results pages through technological control measures including SearchGuard. This system is designed to prevent automated systems from accessing and obtaining wholesale search results while allowing individual users access to Google’s search results.

Financial implications and prior litigation

Reddit’s economic model depends increasingly on content licensing revenue. The platform achieved 31 percent year-over-year growth in daily active users to 108.1 million in Q1 2025, with advertising revenue rising 61 percent to $358.6 million. The company went public in 2024.

The complaint notes Reddit previously filed a lawsuit against Anthropic on June 4, 2025, in San Francisco Superior Court (Case No. CGC-25-625892) for breach of contract, unjust enrichment, trespass to chattels, and unfair competition. That case alleged the AI company scraped platform data without permission to train Claude chatbot models.

The current lawsuit emphasizes privacy protection mechanisms absent from unauthorized scraping. “By bypassing all of the lawful means for obtaining this content, Defendants also bypass all of the strict restrictions placed on Reddit’s licensed business partners that protect Redditors and their anonymity and privacy,” the complaint states.

Reddit is forced to invest significant resources in hardware, software, and personnel to improve its technical security systems, surveillance, and anti-scraping efforts. The complaint seeks to stop the defendants’ conduct to prevent ongoing harm to Reddit’s reputation, user trust, and licensing business.

Context in broader AI data disputes

The lawsuit emerges as legal battles intensify across the AI industry over training data rights. Cloudflare data released in August 2025 revealed stark imbalances between how much content AI platforms crawl and the traffic they refer back to publishers. Perplexity’s crawl-to-refer ratio increased 256.7 percent during 2025, climbing from 54 crawls per referral in January to 195 by July.
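The relationship between the two ratios and the percentage increase can be worked through in Python. The inputs are the rounded figures quoted above, and the function name is illustrative.

```python
def percent_increase(start, end):
    """Percentage increase from start to end."""
    return (end - start) / start * 100

january_ratio = 54   # crawls per referral, rounded figure quoted above
july_ratio = 195     # crawls per referral, rounded figure quoted above

change = percent_increase(january_ratio, july_ratio)
print(f"{change:.1f} percent increase")
```

The rounded endpoints give an increase of about 261 percent, slightly above the 256.7 percent reported. One plausible explanation, offered here only as an assumption, is that the published figure was calculated from unrounded ratios.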

Earlier in 2025, Cloudflare noted that “Perplexity is using stealth, undeclared crawlers to evade website no-crawl directives.” The company’s CEO described Perplexity’s practices as resembling “North Korean hackers” rather than a reputable AI company.

Google has taken its own steps to protect search results from scraping. On September 14, 2025, the search giant eliminated its n=100 SERP parameter, forcing SEO tools to make 10 separate requests instead of one to retrieve 100 search results. This change increased operational costs tenfold for businesses dependent on comprehensive search result analysis.
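The tenfold figure follows directly from the page size. The short Python sketch below assumes 10 results per request after the change, as implied by the “10 separate requests” above, and treats request count as a proxy for cost. The function name is illustrative.

```python
import math

def requests_needed(results_wanted, results_per_request):
    """Number of requests required to collect the desired number of results."""
    return math.ceil(results_wanted / results_per_request)

before = requests_needed(100, 100)  # one request returned 100 results
after = requests_needed(100, 10)    # assumed 10 results per request after the change

print(f"Before: {before} request(s), after: {after} request(s)")
print(f"Request multiplier: {after // before}x")
```

Under these assumptions, any per-request cost (proxy bandwidth, captcha solving, API fees) scales by the same factor of ten.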

The complaint details that Reddit data is particularly well-suited to training large language models because it provides “real-time access to evolving and dynamic topics” and “constantly grows and regenerates as users come and interact with their communities and each other in real and genuine ways.”

Reddit seeks injunctive relief to stop defendants from accessing or using Reddit’s website and Google’s website for unlawful data scraping, developing or distributing circumvention technology, and using any Reddit data previously obtained through circumvention. The platform also seeks actual, statutory, or compensatory damages; disgorgement of defendants’ profits; pre-judgment and post-judgment interest; and attorneys’ fees.

The defendants have not yet filed formal responses to the lawsuit. The case is assigned Case No. 25-cv-8736 with a jury trial demanded.

Timeline

February 22, 2024: Reddit and Google announce partnership enabling programmatic access to Reddit content for use in Google’s services
May 2024: Reddit sends cease-and-desist letter to Perplexity demanding it stop scraping Reddit data
June 4, 2025: Reddit files lawsuit against Anthropic for unauthorized use of Reddit data to train Claude AI models
July 1-13, 2025: Defendants access nearly three billion Google SERPs containing Reddit data during two-week period
August 2025: Cloudflare data shows Perplexity’s crawl-to-refer ratio reached 195 crawls per referral by July, up from 54 in January
September 14, 2025: Google eliminates n=100 SERP parameter to protect search results from scraping
October 22, 2025: Reddit files lawsuit against SerpApi, Oxylabs, AWMProxy, and Perplexity AI in U.S. District Court for the Southern District of New York

Summary

Who: Reddit Inc. filed the lawsuit against four defendants: SerpApi LLC (a Texas company providing web-scraping tools), Oxylabs UAB (a Lithuanian data scraper), AWMProxy (a former Russian botnet now registered in California), and Perplexity AI Inc. (an artificial intelligence company based in San Francisco).

What: The lawsuit alleges the defendants circumvented technological control measures implemented by Reddit and Google to access and scrape Reddit content on an industrial scale. During a two-week period in July 2025, the three scraper defendants accessed nearly three billion search engine results pages containing Reddit data through automated processes that bypassed security restrictions.

When: The complaint was filed on October 22, 2025, in the United States District Court for the Southern District of New York. The alleged circumvention activities occurred throughout 2024 and 2025, with specific data collection documented during July 1-13, 2025.

Where: The lawsuit was filed in the Southern District of New York (Case No. 25-cv-8736). The defendants operate from various locations: SerpApi is based in Austin, Texas; Oxylabs operates from Vilnius, Lithuania; AWMProxy is registered in California; and Perplexity AI maintains offices in San Francisco, California, and New York, New York.

Why: This matters for the marketing community because it addresses fundamental questions about data access, AI training practices, and the value of user-generated content in the artificial intelligence era. Reddit’s content licensing revenue has become an increasingly important part of its business model, while AI companies desperately need access to high-quality, current data to support their ambitions. The case could establish important precedents for AI training data rights as the technology industry faces growing litigation over content usage, potentially affecting how marketing professionals access data for competitive intelligence and how platforms monetize their content through licensing agreements versus allowing unauthorized scraping.
