How ChatGPT Actually Picks Sources

I keep getting the same question from clients and SEOs (GEOs?).

“How do we show up in ChatGPT?”

The answer is always the same. Write good content, do listicles, comment on Reddit.

The usual.

But how do we actually know any of that works? Most of it gets repeated on faith, one expert quoting the last.

So, instead of taking it on trust, I spent a few days reading what ChatGPT sends my browser beneath the answer. The raw network traffic, in readable JSON.

This is a walk-through of what I found, roughly in the order I found it.

Before you quote a number from this, read this. It’s one person, one logged-in Pro account, a few days of traffic, not a population study. I logged about 1,240 source records across a few dozen searches. The structural findings, the fields ChatGPT uses and how they behave, are firm, because you only need to see a field once to know it’s real, and I saw them over and over. The numbers and percentages are a different matter. They come from a small batch of mostly SaaS and tech queries, so treat them as direction, not measurement. I flag which is which throughout.

How This Differs From The Big Visibility Studies, And What You Can Take To The Bank

There are two ways to do this kind of study, and they point in opposite directions.

The big studies, the ones the platforms and the well-funded tools run, fire thousands of prompts, record which brands appear in the answers, and roll that up into share-of-voice reports. Big sample, but black box. They only ever see the finished answer, so they have to infer the machinery underneath from the output.

This is the other way round. I read the network traffic, the JSON the engine sends to my own browser, and pull out the engine’s own internal labels: the result_source it stamps on each result, the turn_use_case it files each query under, the vendor names, the search queries it wrote, the model it actually ran. I’m not measuring how often something happens across a population. I’m documenting that the machine has a thing, and what the machine calls it.

That difference decides what you can trust here, so I’m going to be blunt about it.

2 Confidence Levels, Do Not Mix Them Up

Structural Facts (High Confidence)

That result_source exists and carries serp, labrador, bright, oxylabs. That bright is Bright Data and oxylabs is Oxylabs. That there are six turn_use_case values. That text queries skip the web entirely. That Thinking fires dozens of site: and price-verification sub-queries. These are read straight off the wire. One clean capture proves a field exists and what it’s named, and a prompt study, however enormous, cannot see any of it.

Frequency Observations (Directional Only)

Anything with a percentage or a ranking, “70% bright,” “Reddit is the most cited domain,” “YouTube never gets cited,” comes from tens of queries on a single account, and my own query choice skews it. I picked SaaS and tech, which is exactly why Reddit and the tech review hubs lead here; a batch of health or fashion queries would crown different ones. Read these as the shape of the thing, not the measurement. Where a direction has a mechanical reason behind it (Reddit is text, so it gets quoted; YouTube is video, fetched only as metadata, so it doesn’t), trust the direction and ignore the exact number.

First, The Boring Truth About ‘Packet Analysis’

Skip this section if you don’t want to get into nitty-gritty technical details.

My first instinct was wrong. You cannot sniff packets and read queries, because the payload is TLS-encrypted, so a capture hands you scrambled ciphertext for the actual messages. What the capture does leak is the metadata.

The destination hostname, the IPs, and the fact that the ChatGPT app talks over QUIC (HTTP/3), not plain TCP. That is why, in the screenshot below, Wireshark can still show “openai” in the handshake. It reads the unencrypted server name, not the conversation. QUIC protects its first packet only with keys derived from a public salt in the spec and the connection ID carried in the packet itself, so a tool can unwrap that opening packet to show the ClientHello.

Image Credit: Suganthan Mohanadasan

The real request and response bodies sit in later protected payloads that stay unreadable. So the readable layer is the browser itself, after decryption, in the Network panel.

That’s where the queries, the answers, and all the metadata live as JSON.

This is HTTP inspection, not packet sniffing, and it’s worth saying because half the people who try this start with Wireshark and give up. (I know I did lol.)

Two things that didn’t work, so you don’t repeat them.

Driving a clean automated Chrome got me hard-blocked by Cloudflare within a few queries on a different engine: the “verifying you’re human” wall just loops forever in an automated browser, so I moved to my real Chrome with my real sessions.
On ChatGPT, the answer never showed up in my capture at first, because it streams over a long-lived connection opened at page load that a hook installed mid-session cannot see. More on both later.

The Field That Labels Every Source

I opened DevTools, turned on Preserve log, ran a normal query, and searched the responses for anything that looked like a label.

The field that came back was result_source. It sits on every web result ChatGPT pulls; you never see it in the answer, and it takes one of four values.

Mark Williams-Cook shared that he had found three of these; I came across the fourth. I then saw Metehan’s post, and it looks like he may have already found it too. But honestly, this isn’t really about who found what first. It’s more about sharing what we’re seeing, comparing notes, and learning from each other.

Here’s one source from the traffic, trimmed to the fields that matter.

{
 "attribution": "TechRadar",
 "url": "https://www.techradar.com/best/...",
 "snippet": "...",
 "pub_date": "2026-05-09",
 "result_source": "labrador"
}

The four values it uses:

`result_source`	What it is
`serp`	The open web baseline, mostly seen on news (Yahoo, StreetInsider)
`labrador`	An allowlist of established publishers: Reuters, The Guardian, the WSJ, the FT, Wikipedia, even arXiv. Snippets run to ~1,080 characters, basically full-article extracts
`bright`	Bright Data, a commercial web scraper. Dominant for shopping, finance, weather, local
`oxylabs`	Oxylabs, a rival scraper. Regional and local press, some open web

labrador looks like a licensed tier: several of those publishers have signed content deals with OpenAI, and it isn’t one you get into unless you own a national newspaper.

bright and oxylabs are the interesting pair. The names point at Bright Data and Oxylabs, two commercial scraping companies that happen to be direct rivals. I can’t see a contract in the traffic, so I won’t claim ChatGPT pays them, but its open web fetching runs through both, and the field tells you which one fetched each result. (We’ve been Oxylabs customers for a long time for our SaaS, Keyword Insights.)

Across everything I logged, bright did the bulk of the fetching, especially on commercial, shopping, finance and weather queries. oxylabs skewed regional and local, labrador stayed on news and reference, and serp mostly turned up on news. To put names to the tiers, labrador carried Reuters, the WSJ, Wikipedia and TechRadar, bright pulled Reddit, Forbes and rtings, and oxylabs brought the Gulf press like Khaleej Times and Gulf News.

I even caught the split inside one weather query, bright taking the international data sites like the Met Office while oxylabs handled the local Gulf press. (I live in Dubai, by the way.) In that one query, the breakdown came out like this.

Source	Pipeline
metoffice.gov.uk	bright
accuweather.com	bright
timeanddate.com	bright
khaleejtimes.com	oxylabs
gulfnews.com	oxylabs
whatson.ae	oxylabs
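A per-pipeline breakdown like this can be tallied from any list of captured source records. What follows is a minimal sketch, not the tooling behind this post: countByPipeline is a hypothetical helper name, and the sample records are hand-written stand-ins that keep only the result_source field plus a label.

```javascript
// Hypothetical helper: tally captured source records by their result_source value.
function countByPipeline(records) {
  const counts = {};
  for (const record of records) {
    // Records without the field are grouped under "unknown" rather than dropped.
    const pipeline = record.result_source ?? 'unknown';
    counts[pipeline] = (counts[pipeline] || 0) + 1;
  }
  return counts;
}

// Illustrative records: the trimmed TechRadar object plus the three Gulf sites.
const capturedSources = [
  { attribution: 'TechRadar', result_source: 'labrador' },
  { attribution: 'khaleejtimes.com', result_source: 'oxylabs' },
  { attribution: 'gulfnews.com', result_source: 'oxylabs' },
  { attribution: 'whatson.ae', result_source: 'oxylabs' },
];

console.log(countByPipeline(capturedSources)); // { labrador: 1, oxylabs: 3 }
```

Counts from a handful of queries show the shape of the split, not a market share.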

The AI SEO/GEO Takeaway

You’re mostly competing in the scraped tier, so be cleanly scrapable. Put your facts and numbers in plain HTML text, never behind a script or inside a PDF or an image. The licensed tier is mostly closed, so the lever you’ve got is third-party coverage, PR, brand mentions, links, and Reddit, to land on the pages the scrapers actually reach.

The Queries That Never Reach The Web

The next thing I noticed was that some queries produced no network search whatsoever. Before ChatGPT searches, it files your question into a bucket, in a field called turn_use_case. I saw six of them across the questions I tried: instant search, shopping, text, local, thinking, and image generation.

The one to care about is text. When ChatGPT files your question as text, it doesn’t search. It answers from its training corpus and stops.

The obvious cases end up here: “how do I change a flat tyre,” “write a Python function to merge two sorted lists,” and “translate this into four languages” all came back text with an empty network tab.

The one that should worry you is that “latest treatment guidelines for type 2 diabetes” also came back text, a current, high-stakes question you’d assume it researches. It didn’t; it answered from training. No E-E-A-T here. Oops!

Of 10 deliberately current questions I tried, three were handled this way with no search at all.

The wording decides the bucket, not the topic.

“best coffee near me” flips to the local pipeline, “best 4K TVs to buy” triggers shopping, but “best 4K TVs with reviews” stayed a normal search.

A maths question quietly jumped to a reasoning model under thinking, while “Tesla stock price this week” stayed instant search.

Remember, these are results from my limited testing. I’ll do more tests when I find some more time.
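To check a saved conversation for its bucket and for whether it searched at all, a recursive walk is enough. This is a rough sketch under assumptions: the field names turn_use_case and result_source are the ones seen in the traffic, but the nesting in the sample object and the literal string 'text' are invented for illustration, and walk and summarizeConversation are hypothetical helper names.

```javascript
// Visit every property of a JSON-like value, however deeply it is nested.
function walk(node, visit) {
  if (node === null || typeof node !== 'object') return;
  for (const [key, value] of Object.entries(node)) {
    visit(key, value);
    walk(value, visit);
  }
}

// Hypothetical helper: report the buckets seen and whether any web source was fetched.
function summarizeConversation(conversation) {
  const useCases = new Set();
  let sourceCount = 0;
  walk(conversation, (key, value) => {
    if (key === 'turn_use_case' && typeof value === 'string') useCases.add(value);
    if (key === 'result_source') sourceCount += 1;
  });
  return { useCases: [...useCases], sourceCount, searched: sourceCount > 0 };
}

// Made-up shape: a turn filed as text carries no result_source objects at all.
const textOnlyTurn = { messages: [{ metadata: { turn_use_case: 'text' } }] };
console.log(summarizeConversation(textOnlyTurn));
```

A conversation that reports searched: false was answered without fetching anything, which is the case the takeaway below is about.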

The AI SEO/GEO Takeaway

Before you spend a penny on a page, check the query even searches. If it’s a how-to or a definition, it may be answered from training, where no page can get in, however good it is. Spend your effort where it actually fetches.

If you want to be mentioned for such queries, you’d need to spend a lot of time building authority and wait for your brand to be included in future training data. (For example, make sure that crawlers like Common Crawl can see your site.)

How One Question Fans Out Into Dozens Of Searches (Fan-Out Queries)

ChatGPT also exposes the searches it runs for you, if you pull the full conversation back from its own API. On the fast model, it’s minimal: one reworded query and done, maybe optimized for speed over depth. On the thinking model, asked to compare a few products, it ran roughly 15 to 40 sub-queries off the single question. (The number depended on the complexity of the question.)

Here’s a slice of what it actually ran for one compare task.

"Profound AI search visibility pricing AI engines tracked 2026"
"AthenaHQ pricing AI search visibility tool"
"site:peec.ai/pricing Peec AI Starter Pro Advanced 50 prompts 150 prompts"
"Peec AI pricing $95 $245 $495 official" (a guessed price, then searched to confirm)
"Scrunch AI pricing" (not in my prompt, discovered mid-research)
...around 40 of these for one comparison

Three things stand out in there. It fires site: probes straight at vendor pricing pages.

It guesses a price and then searches to confirm it. And it keeps widening as it goes, picking up tools you never named and chasing their pricing, too.
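If you export the fan-out list from your own conversations, the first two patterns are easy to flag mechanically. This is a small sketch with a hypothetical classifyQuery helper; the regular expressions are my own rough heuristics, not anything ChatGPT exposes.

```javascript
// Hypothetical helper: flag site: probes and price-confirmation searches in a fan-out query.
function classifyQuery(query) {
  return {
    // A site: operator at the start of the query or after whitespace.
    siteProbe: /(^|\s)site:\S+/.test(query),
    // A currency symbol followed by a digit, i.e. a guessed price being checked.
    priceGuess: /[$€]\s?\d/.test(query),
  };
}

const fanOut = [
  'AthenaHQ pricing AI search visibility tool',
  'site:peec.ai/pricing Peec AI Starter Pro Advanced 50 prompts 150 prompts',
  'Peec AI pricing $95 $245 $495 official',
];

for (const query of fanOut) {
  console.log(classifyQuery(query), query);
}
```

The third pattern, tools you never named, needs a comparison against your original prompt and is left out here.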

It doesn’t only search either; the page-reading is just as literal. It ran find for $, €, 99 and even “Company,” then used the browsing tool’s own open and click commands to pull up the results it wanted, run server-side, not an agent on your screen.

The same happens to your own site. Ask it “keyword insights pricing,” and it runs a site:keywordinsights.ai/pricing probe, guesses something like “Starter $58, Pro $145, Advanced $299,” then opens the page and reads the HTML for the currency symbol to confirm.

The AI SEO/GEO Takeaway

Put your key numbers and data in plain HTML text, never inside an image, because in this case with pricing it greps the page for $ and € and can’t read a graphic. Also, you need to make sure you survive a site:yourdomain.com/pricing probe in this use case and write for the cleaned-up query it actually runs, not the messy phrase a person types. Avoid JavaScript-based toggles and dynamic data loading.
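A rough self-check for this: take the raw HTML of your pricing page as served, before any JavaScript runs, and look for currency amounts in the visible text. The sketch below assumes regex stripping is good enough for a spot check (it is a crude stand-in for a real HTML parser). visiblePrices is a hypothetical helper, and both sample pages are invented.

```javascript
// Hypothetical helper: list currency amounts present in the static text of an HTML string.
function visiblePrices(html) {
  const text = html
    .replace(/<script\b[^>]*>[\s\S]*?<\/script>/gi, ' ') // script bodies are not page text
    .replace(/<style\b[^>]*>[\s\S]*?<\/style>/gi, ' ')
    .replace(/<[^>]+>/g, ' '); // drop the remaining tags, keep their text content
  return text.match(/[$€]\s?\d[\d,]*(?:\.\d+)?/g) || [];
}

// Invented examples: the same price in static markup vs. injected by a script.
const staticPage = '<table><tr><td>Starter</td><td>$58/mo</td></tr></table>';
const scriptedPage = '<div id="pricing"></div><script>render({ starter: "$58" })</script>';

console.log(visiblePrices(staticPage));   // [ '$58' ]
console.log(visiblePrices(scriptedPage)); // []
```

An empty result on your own pricing page suggests a reader that does not execute scripts would come away with nothing to quote.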

Fetched, Cited, And Mentioned Aren’t The Same

This is the distinction people muddle most, so it’s worth being exact. Three different things can happen to a source.

Fetched. The model pulls your page into context. This is the result_source object. The reader never sees it.
Cited. It attaches your page as the source behind a specific sentence, the footnote you can click.
Mentioned. Your brand name appears in the answer, often as a chip linking to your site, but it isn’t the source of the claim.

They’re three separate outcomes, and you can win or lose each on its own.

To see the gap between them, I took a batch of commercial and recommendation queries and split what ChatGPT fetched from what it cited.

This is the small, tech-skewed sample, so read what follows as a pattern, not a number to bank on.

Across that batch, Reddit and YouTube were both fetched heavily, 278 and 201 times. But Reddit was cited 11 times and YouTube not once.

I think the reason is mechanical. A citation has to bind to text the model actually pulled, and when it fetches a YouTube page in search, it gets the metadata, not the actual video transcript.

A Reddit thread is all there in the page. This isn’t just my sample either. Ahrefs, across 1.4 million ChatGPT prompts, found Reddit cited at 1.93% against YouTube’s 0.51%, and Profound found the same gap.
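To put the fetched-versus-cited gap from my batch into one number per domain, divide citations by fetches. A minimal sketch using only the four counts quoted above; citationRate is a hypothetical helper name.

```javascript
// Hypothetical helper: share of fetches that ended up as citations.
function citationRate(fetched, cited) {
  return fetched === 0 ? 0 : cited / fetched;
}

// Counts from this batch only (small, tech-skewed sample).
const batch = {
  Reddit: { fetched: 278, cited: 11 },
  YouTube: { fetched: 201, cited: 0 },
};

for (const [name, { fetched, cited }] of Object.entries(batch)) {
  console.log(name, (citationRate(fetched, cited) * 100).toFixed(1) + '%');
}
// Reddit 4.0%
// YouTube 0.0%
```

This is a cited-per-fetch ratio inside one batch, so it is not the same measure as the Ahrefs percentages, which come from a study that only sees finished answers.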

A few other patterns, same caveat on sample size. Reddit was the single most-cited domain, narrowly, and after that no one ran away with it. The citations spread thin across review hubs like rtings and TechRadar and vendor pages cited for their own specs.

Here’s the top of the cited list across that batch.

Vendor pages get cited too, but for their own facts, the pricing and specs. Zoho, Semrush, and the VPNs earned citations that way. The verdict on which one is best still gets cited to a third party. You can be mentioned without being cited, and cited without being mentioned.

Two mechanics sit underneath this. Citations bind to a specific sentence, not the whole answer, so being topically relevant isn’t enough; you have to be the best support for a precise claim.

And results are deduped by domain, so 20 thin pages from your site collapse into one.

One strong page per claim beats a pile of weak ones.

So, don’t go around creating thousands of low-quality, thin pages to address each fan-out query.
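The effect of deduping by domain is easy to picture with a toy version. This is a sketch under stated assumptions, not ChatGPT’s actual logic: which page survives is not visible in the traffic, so this version simply keeps the first result seen per hostname, and dedupeByDomain and the example.com URLs are hypothetical.

```javascript
// Hypothetical helper: keep one result per hostname (first seen wins, an assumption).
function dedupeByDomain(results) {
  const firstPerHost = new Map();
  for (const result of results) {
    // Treat www.example.com and example.com as the same site.
    const host = new URL(result.url).hostname.replace(/^www\./, '');
    if (!firstPerHost.has(host)) firstPerHost.set(host, result);
  }
  return [...firstPerHost.values()];
}

// Three thin pages on one made-up site, plus one page elsewhere.
const fetchedPages = [
  { url: 'https://www.example.com/pricing-for-agencies' },
  { url: 'https://www.example.com/pricing-for-startups' },
  { url: 'https://example.com/pricing-for-freelancers' },
  { url: 'https://www.reddit.com/r/SEO/' },
];

console.log(dedupeByDomain(fetchedPages).length); // 2
```

However many near-duplicate pages you publish, this kind of collapse leaves you with one slot, which is why the strongest page matters more than the count.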

The AI SEO/GEO Takeaway

You can’t cite yourself. The claim about you gets sourced from someone else, so earn third-party coverage on review sites and Reddit, win on text rather than video, and put one strong page behind each claim, because it dedupes by domain.

The Model Explains Its Own Method

I went looking for a hidden ranking score first and found nothing. That kind of logic (a domain authority number, a trust weight, a formula) never reaches your browser, because it stays on OpenAI’s servers.

So, anyone selling you “ChatGPT’s ranking formula” is selling you snake oil.

What the traffic does have is the thinking model’s chain of thought, saved in the conversation, where it describes its own sourcing in plain words.

For facts, the pricing and the specs, it goes to the official page first, and it says so.

Comparing Ahrefs, it reads the official page, notes it “lists Lite at $129, Standard at $249, and Advanced at $449,” and decides “pricing page seems more current, so I should cite that.” It wants the source it trusts, and the current one.

Then it hits the wall this whole post is about.

On Profound, it reasons that “the pricing isn’t showing up directly in the search result, probably because it’s loaded with JavaScript.” Same on Peec, where “the pricing doesn’t show up directly, probably hidden with JavaScript.”

So, it stops trying to read them and falls back. “I can quote third-party sources since the official page is hard to parse and doesn’t show prices,” it writes, and it notes it should “use citations from G2 where appropriate.”

That’s the whole game in one trace. The model wanted Profound’s and Peec’s own numbers. Their pricing sat behind JavaScript, so it couldn’t read them, and it cited G2 instead. Your facts, someone else’s page, because yours wouldn’t parse.

These quotes are the model’s own, from the saved reasoning, not mine.

The AI SEO/GEO Takeaway

Own your facts, in plain HTML. Your pricing and spec numbers need to sit in crawlable text, not loaded by JavaScript and not baked into an image, because the model reads the page itself and gives up when it can’t. A JavaScript pricing table doesn’t just rank badly; it hands your numbers to G2.

The opinion you earn separately, through reviews, Reddit, and honest comparison content, which is where the recommendation gets cited from. A clean, readable pricing page with no third-party coverage gets your facts read and someone else recommended.

What I Could Not See

There’s no visible ranking logic, as above, so why one source beats another, beyond the model’s own narration, stays server-side.

Personalization is real and selective.

On a query that overlapped my own work, ChatGPT pulled in my past conversations, with the sources listed as personal_sources: ["convo_search", "gmail", "files"].

It used one of my past chats inside a generic “best tools” answer, but only on one of the three conversations I checked, the one that matched my history.

So, part of some answers is built from a user’s own data you can never optimize for, which is one reason two people get different answers and visibility scores wobble.

Local is capped. There’s a config value, local_results_limit, set to 2.

Ask for the best coffee near you, and ChatGPT returns two places, not a top 10. For local, you’re in the top 2, or you aren’t there.

One thing I genuinely can’t call yet. My read on shopping comes from a single shopping query, and it flatly contradicts what Mark saw on his single query, so the shopping mix is unsettled until someone runs a proper batch.

And the broader caveat, said plainly. The structure I’m sure of, because I saw it across roughly 1,240 records. The percentages come from a small batch of commercial queries, mostly SaaS and tech, so they need a bigger run across real verticals before anyone banks on them.

That run is the next piece.

Run It Yourself

None of this needs special access or requires you to be connected to the Matrix and become an operator, just your own browser.

Open ChatGPT, press Cmd+Option+I for DevTools, open Network, tick Preserve log, run a query, then press Cmd+Option+F and search the responses for result_source.

That alone shows you the pipeline behind each link.

For the rest, the fan-out and the citations and the reasoning, open the Console, type allow pasting once, and run this against a conversation that searched the web.

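// Run in the DevTools Console on an open ChatGPT conversation page (top-level await works there).
// 1. Read the access token for your own logged-in session.
const t = (await (await fetch('/api/auth/session')).json()).accessToken;
// 2. Pull the full conversation JSON; the ID is the part of the URL after /c/.
const c = await (await fetch('/backend-api/conversation/' + location.pathname.split('/c/')[1], {headers: {Authorization: 'Bearer ' + t}})).json();
const rows = [];
// 3. Use the JSON.stringify replacer as a cheap recursive walk over every nested value.
JSON.stringify(c, (k, v) => {
  if (v && v.result_source) {
    // Prefer the attribution label; fall back to the URL, trimmed down to its host.
    const d = (v.attribution || v.url || '?').toString();
    rows.push({source: d.replace('https://', '').replace('www.', '').split('/')[0], pipeline: v.result_source});
  }
  return v;
});
console.table(rows);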

It reads only your own session, so nothing leaves your machine. The output is a plain table of each source and the pipeline that fetched it.

source	pipeline
techradar.com	labrador
whathifi.com	labrador
soundguys.com	bright
rtings.com	bright
khaleejtimes.com	oxylabs
streetinsider.com	serp

Change what the loop collects, and you can pull the searches, the citations, and the reasoning the same way.
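One way to make the loop reusable is to pass the field name in. A hedged sketch: collectObjectsWithKey is a hypothetical helper built on the same JSON.stringify replacer trick, and the nesting in the mock object is invented so the example runs outside the browser. The only field names used are ones named in this post; the keys that hold the searches, citations, and reasoning are not listed here, so find them in your own capture first.

```javascript
// Hypothetical helper: return every nested object that has the given key.
function collectObjectsWithKey(root, key) {
  const hits = [];
  // The replacer is called for every nested value, which makes it a cheap tree walk.
  JSON.stringify(root, (k, v) => {
    if (v && typeof v === 'object' && !Array.isArray(v) && key in v) hits.push(v);
    return v;
  });
  return hits;
}

// Mock conversation with a made-up shape, for illustration only.
const mockConversation = {
  turns: [
    {
      turn_use_case: 'local',
      results: [
        { url: 'https://gulfnews.com/', result_source: 'oxylabs' },
        { url: 'https://khaleejtimes.com/', result_source: 'oxylabs' },
      ],
    },
  ],
};

console.log(collectObjectsWithKey(mockConversation, 'result_source').length); // 2
console.log(collectObjectsWithKey(mockConversation, 'turn_use_case').length); // 1
```

In the console, pass the fetched conversation object in place of the mock and swap the key for whichever field you are chasing.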

A Free Extension Now Captures Most Of This

If pasting scripts into your own console isn’t your thing, there’s now an easier route. Olivier de Segonzac already ran a free Chrome extension that pulls ChatGPT’s search and fan-out data.

He read this research and extended it to capture three of the signals I took apart above.

The turn_use_case bucket. The intent label ChatGPT files each turn under, so you can spot when a query flips to shopping, local, or text before it even answers.
The reference-type mix. How many of the answer’s citations were products versus search results, news, or images, parsed straight from the reference tokens.
The result_source pipeline. The scraper behind each cited result, charted per conversation, so the Bright Data, Oxylabs, Labrador, and SERP split shows up without you reading a line of JSON.

It runs locally on your own session and exports straight to Excel. Grab it from the Chrome Web Store, and Olivier wrote up the update here.

So, back to the question we opened with. Does the usual advice hold up? Mostly. Reddit earns citations and topped my cited list. Listicles and review sites make up most of the rest. Good content still matters, but only the part the model can actually read. The rest it reads off someone else’s page.

Which is the real lesson. ChatGPT isn’t a search engine, so stop optimizing for one.

It reads your own page for the facts, if it can parse them, and everyone else’s for the opinion, and only when the question is worth a search. Build for that.

And treat all of this, mine included, as a snapshot of a system that changes by the week. The structure holds. The numbers move.

While I was in the traffic, I also found a pile of things with nothing to do with sourcing: the bot wall that stops you scripting it, a hidden shopping engine, and 573 live experiments running on the account. These will be published separately.

I’ve also done similar analysis on Perplexity, Gemini, etc., so I’ll be sharing those soon.

This post was originally published on Suganthan.

Featured Image: Viktoriia_M/Shutterstock