LinkedIn this week published an in-depth technical post on how it rebuilt the recommendation system underpinning its Feed, deploying large language models, transformer-based architectures, and clusters of H100 GPUs to replace what had previously been a fragmented set of independent retrieval pipelines. The blog post, authored by Hristo Danchev and published on March 12, 2026, is the most granular disclosure LinkedIn has made about the mechanics of content ranking on its platform, which now reaches more than 1.3 billion professionals.

The disclosure matters to the marketing community for a specific reason: the Feed is not merely a social surface. It is the primary surface on which organic and paid content compete for attention on LinkedIn, a platform that accounted for 41% of total B2B paid media budgets in 2025, according to Dreamdata's most recent benchmarks. How the Feed ranks content – what it surfaces and what it suppresses – has direct implications for content strategy, audience targeting, and ultimately return on ad spend.

The old architecture and its limits

Until this rebuild, according to the announcement, LinkedIn's Feed retrieval relied on what the company describes as a "heterogeneous architecture." When a member opened the Feed, content arrived from several separate systems running in parallel: a chronological index of network activity, trending posts filtered by geography, collaborative filtering based on similar members' interests, industry-specific trending content, and several embedding-based retrieval systems. Each source maintained its own infrastructure, its own index, and its own optimization logic. The architecture was functional. It produced diverse results. But it carried, according to LinkedIn, "substantial maintenance costs" and made holistic optimization difficult because no single team could tune across all sources simultaneously.

The ranking layer compounded the problem. According to the announcement, the previous approach "treated each impression independently" – meaning that when the system evaluated whether to show a post, it made that judgment in isolation, with no reference to what the member had just read or what trajectory their interests appeared to be following.

A unified retrieval system

The new system replaces the multi-source architecture with a single, unified retrieval pipeline built around LLM-generated embeddings. The core idea is that a sufficiently capable language model, fine-tuned on LinkedIn engagement data, can represent both posts and member profiles as vectors in a shared embedding space – and that semantic proximity in that space is a better signal of relevance than any combination of keyword matching, collaborative filtering, or geographic trending alone.
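The shared-space idea can be illustrated with a minimal sketch: both member and post are reduced to vectors, and relevance becomes a cosine-similarity comparison. The vectors below are toy values standing in for real LLM embeddings, not anything from LinkedIn's system.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy vectors standing in for LLM-generated embeddings (illustrative only).
member = np.array([0.9, 0.1, 0.3])
post_relevant = np.array([0.8, 0.2, 0.4])     # semantically close
post_unrelated = np.array([-0.1, 0.9, -0.5])  # semantically distant

# Retrieval reduces to "which posts sit nearest the member in this space?"
assert cosine_similarity(member, post_relevant) > cosine_similarity(member, post_unrelated)
```

Because every source of relevance flows through one geometric comparison, a single team can tune the whole pipeline – the property the multi-source architecture lacked.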

According to the post, the practical difference becomes visible in what the company calls "cold-start" scenarios. When a new member joins with only a profile headline and a job title, a conventional embedding system can identify shallow correlations – power, energy, electronics. A model trained on a large pretraining corpus understands deeper relationships: that an electrical engineer who mentions grid optimization likely has latent interest in renewable energy infrastructure and small modular reactors, even when no engagement history yet exists to confirm it.

From structured data to LLM-readable prompts

One of the more technically specific disclosures in the post concerns how LinkedIn converts structured profile and post data into text the LLM can process. The company built what it calls a "prompt library" – a system of templates that transforms features into templated text sequences. For posts, this includes format, author headline, company, industry, engagement counts, and post text. For members, it includes work history, skills, education, and – critically – a chronologically ordered sequence of posts the member has previously engaged with.
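A prompt-library template is conceptually simple: flatten structured fields into one text string. The field names and layout below are assumptions for illustration, not LinkedIn's actual schema.

```python
# Hypothetical sketch of a prompt-library template; field names are
# illustrative, not LinkedIn's actual schema.
POST_TEMPLATE = (
    "format: {fmt} | author headline: {headline} | company: {company} | "
    "industry: {industry} | views: {views} | text: {text}"
)

def render_post_prompt(post: dict) -> str:
    """Flatten a structured post record into LLM-readable text."""
    return POST_TEMPLATE.format(
        fmt=post["format"],
        headline=post["author_headline"],
        company=post["company"],
        industry=post["industry"],
        views=post["views"],
        text=post["text"],
    )

prompt = render_post_prompt({
    "format": "article",
    "author_headline": "Electrical Engineer",
    "company": "Acme Grid Co",
    "industry": "Energy",
    "views": 12345,
    "text": "Notes on grid optimization...",
})
assert "grid optimization" in prompt
```

Note that this naive version passes the raw view count straight through – exactly the numerical-feature problem the next paragraphs describe.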

The team ran into an unexpected problem with numerical features. Raw engagement counts, passed into the prompt as plain text – "views:12345" – were treated by the model as arbitrary text tokens, with no ordinal meaning. The result was near-zero correlation (-0.004) between item popularity and the cosine similarity scores between member and item embeddings. Popularity is, in practice, a strong relevance signal. The team needed the model to understand it.

The solution was percentile bucketing. Instead of passing a raw count, the system now converts it to a percentile rank wrapped in special tokens; in the post's example, the value 71 tells the model that the post sits in the 71st percentile of view counts – above average but not exceptional. Crucially, most percentile values between 1 and 100 tokenize as a single token, making the representation stable and learnable. The result was a 30x improvement in correlation between popularity features and embedding similarity, and a 15% improvement in recall@10 – meaning the top 10 retrieved posts were measurably more relevant.
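Percentile bucketing itself is a small transformation. The sketch below computes a percentile rank against a reference population and wraps it in marker tokens; the `<pct>`/`</pct>` token strings are assumptions (the post's actual special tokens are not visible in the published text).

```python
import bisect

def percentile_rank(value: int, population: list[int]) -> int:
    """Map a raw count to its percentile (1-100) within a reference population."""
    ranked = sorted(population)
    below_or_equal = bisect.bisect_right(ranked, value)
    pct = round(100 * below_or_equal / len(ranked))
    return max(1, min(100, pct))

def bucketed_feature(raw_views: int, population: list[int],
                     open_tok: str = "<pct>", close_tok: str = "</pct>") -> str:
    """Wrap the percentile in marker tokens (token strings are assumptions)."""
    return f"{open_tok}{percentile_rank(raw_views, population)}{close_tok}"

# A toy population of view counts: 100 posts with views 0, 100, ..., 9900.
views = list(range(0, 10000, 100))
assert bucketed_feature(5000, views) == "<pct>51</pct>"
```

The payoff described in the post comes from tokenization: a two-digit percentile is one stable token the model can learn an ordinal meaning for, whereas "12345" fragments into arbitrary sub-tokens.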

Training dual encoders

The retrieval model uses a dual encoder architecture. A shared LLM processes both member prompts and item prompts, producing embeddings compared via cosine similarity. Training used InfoNCE loss with two kinds of negative examples: easy negatives (randomly sampled posts the member never saw) and hard negatives (posts that were shown but received no engagement). The distinction matters. Hard negatives force the model to learn nuanced distinctions between content that is nearly relevant and content that is genuinely worthwhile.
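A minimal numpy sketch of InfoNCE makes the easy/hard distinction concrete: the loss is a cross-entropy that asks the model to pick the engaged post out of a candidate set, and a hard negative – one that sits close to the member in embedding space – produces a much larger loss, hence a stronger training signal. The vectors and temperature value are illustrative.

```python
import numpy as np

def info_nce_loss(member: np.ndarray, positive: np.ndarray,
                  negatives: np.ndarray, temperature: float = 0.07) -> float:
    """InfoNCE: cross-entropy of the positive against positive + negatives.

    `member` and `positive` are (d,), `negatives` is (n, d); all vectors
    are assumed L2-normalized so a dot product equals cosine similarity.
    """
    candidates = np.vstack([positive, negatives])   # (n+1, d), positive at index 0
    logits = candidates @ member / temperature      # (n+1,)
    logits -= logits.max()                          # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum())
    return float(-log_probs[0])

def normalize(v: np.ndarray) -> np.ndarray:
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

m = normalize(np.array([1.0, 0.2, 0.1]))
pos = normalize(np.array([0.9, 0.3, 0.1]))
easy_neg = normalize(np.array([[-0.8, 0.5, 0.2]]))  # random post, far away
hard_neg = normalize(np.array([[0.8, 0.1, 0.3]]))   # shown but not engaged: nearby

# The hard negative is nearly as similar as the positive, so it dominates
# the denominator and yields a larger loss (stronger gradient).
assert info_nce_loss(m, pos, hard_neg) > info_nce_loss(m, pos, easy_neg)
```

In production the negatives would be other posts in the batch plus the mined impressed-but-unengaged posts; the contrastive structure is the same.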

Adding just two hard negatives per member improved recall@10 by 3.6%, relative to a baseline using only easy negatives. One hard negative per member produced a 2.0% improvement. The marginal value of the second hard negative – an additional 1.6 percentage points – was substantial given the engineering cost involved.

A second key finding concerned the composition of the member's interaction history. Initially, the team included all impressed posts – both those the member engaged with and those they scrolled past. This hurt performance and increased compute costs, since GPU compute scales quadratically with context length. By filtering to include only posts that received positive engagement, the team achieved a 37% reduction in per-sequence memory footprint, the ability to process 40% more training sequences per batch, and 2.6x faster training iterations – all at equal or better model quality. The training configuration used 8 H100 GPUs.
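The filtering step itself is a one-line transformation over the interaction log; the leverage comes from where it sits in the pipeline, because every dropped impression shortens the quadratic-cost context. Field names and the action taxonomy below are illustrative, not LinkedIn's schema.

```python
# Minimal sketch of the history-filtering step: keep only positively
# engaged posts in the member's chronological interaction sequence.
POSITIVE_ACTIONS = {"like", "comment", "share", "long_dwell"}

def filter_history(events: list[dict]) -> list[dict]:
    """Drop scrolled-past impressions; preserve chronological order."""
    return [e for e in events if e["action"] in POSITIVE_ACTIONS]

history = [
    {"post_id": 1, "action": "impression"},  # scrolled past
    {"post_id": 2, "action": "like"},
    {"post_id": 3, "action": "impression"},  # scrolled past
    {"post_id": 4, "action": "comment"},
]
assert [e["post_id"] for e in filter_history(history)] == [2, 4]
```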

Sequential ranking with the Generative Recommender

Retrieval determines which posts reach the ranking stage. What a member actually sees is determined by a separate model LinkedIn calls the Generative Recommender (GR). Unlike the previous approach – which scored each post independently – the GR model treats a member's Feed interaction history as a sequence. It processes more than 1,000 historical interactions to understand temporal patterns and long-term interest trajectories.

The architecture uses transformer layers with causal attention, meaning each position in the sequence can only attend to earlier positions. This mirrors the actual temporal flow of how a member experiences content. A member who engaged with machine learning content on Monday, then distributed systems on Tuesday, is not displaying two random preferences. According to the post, "a sequential model understands the trajectory."
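Causal attention is implemented with a lower-triangular mask: positions to the right of the diagonal (the future) are excluded before the softmax. A generic sketch, not LinkedIn's code:

```python
import numpy as np

def causal_mask(seq_len: int) -> np.ndarray:
    """Boolean mask: True where attention is allowed (key position <= query position)."""
    return np.tril(np.ones((seq_len, seq_len), dtype=bool))

def causal_attention(scores: np.ndarray) -> np.ndarray:
    """Softmax over raw attention scores with future positions masked out."""
    mask = causal_mask(scores.shape[0])
    masked = np.where(mask, scores, -np.inf)
    exp = np.exp(masked - masked.max(axis=-1, keepdims=True))
    return exp / exp.sum(axis=-1, keepdims=True)

weights = causal_attention(np.zeros((4, 4)))
# The first interaction can only attend to itself...
assert weights[0, 0] == 1.0
# ...and no position attends to the future.
assert np.all(weights[np.triu_indices(4, k=1)] == 0.0)
```

Monday's machine-learning engagement can inform how Tuesday's distributed-systems engagement is interpreted, but never the reverse – matching the order in which the member actually experienced the content.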

After the transformer layers, the model uses a technique called late fusion: the transformer output is concatenated with per-timestep context features – device type, member profile embeddings, aggregated affinity scores – before being passed through a Multi-gate Mixture-of-Experts (MMoE) prediction head. Passive tasks (click, skip, long-dwell) and active tasks (like, comment, share) receive specialized gating while sharing the same underlying sequential representations.

The decision to fuse count and affinity features after, rather than inside, the transformer sequence is a deliberate computational trade-off. Including them in the self-attention pathway would inflate cost quadratically – self-attention already scales quadratically with sequence length. Since these features provide independent signal strength rather than sequential interaction value, late fusion delivers their benefit without that cost penalty.
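The shape arithmetic makes the trade-off visible: the context features are concatenated per timestep after attention, so they widen only the cheap linear layers, never the quadratic attention. All dimensions, expert counts, and the MMoE wiring below are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative shapes: 8 timesteps, 16-dim transformer output,
# 4-dim context features (device type, affinity scores, ...).
seq_len, d_model, d_ctx, n_experts, n_tasks = 8, 16, 4, 3, 2

transformer_out = rng.standard_normal((seq_len, d_model))
context = rng.standard_normal((seq_len, d_ctx))

# Late fusion: concatenate per timestep, AFTER self-attention has run,
# so the attention cost stays a function of d_model alone.
fused = np.concatenate([transformer_out, context], axis=-1)  # (8, 20)

def softmax(x: np.ndarray) -> np.ndarray:
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Minimal MMoE head: shared experts, one softmax gate per task, so passive
# and active tasks mix the same experts with different weights.
experts = rng.standard_normal((n_experts, d_model + d_ctx, d_model))
gates = rng.standard_normal((n_tasks, d_model + d_ctx, n_experts))

expert_out = np.einsum("sd,edk->sek", fused, experts)  # (8, 3, 16)
task_outputs = []
for t in range(n_tasks):
    g = softmax(fused @ gates[t])                      # (8, 3) mixing weights
    task_outputs.append(np.einsum("se,sek->sk", g, expert_out))

assert fused.shape == (seq_len, d_model + d_ctx)
assert task_outputs[0].shape == (seq_len, d_model)
```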

Serving at scale

The engineering challenge is not only building a capable model. It is serving that model for thousands of queries per second with sub-second latency. Traditional ranking models at LinkedIn ran on CPUs. Transformers are different. Self-attention scales quadratically with sequence length, and parameter counts in the billions require the high-bandwidth memory available only on GPUs.

LinkedIn's solution involves a set of custom infrastructure choices. A disaggregated architecture separates CPU-bound feature processing from GPU-heavy model inference. A shared context batching approach computes the history representation once, then scores all candidates in parallel using custom attention masks. A custom Flash Attention variant – called GRMIS (Generative Recommender Multi-Item Scoring) – delivers an additional 2x speedup over PyTorch's standard scaled dot-product attention implementation.

On the training side, a custom C++ data loader eliminates Python's multiprocessing overhead by fusing padding, batching, and packing at the native-code level. Custom CUDA kernels for multi-label AUC computation reduce metric calculation from a significant bottleneck to negligible overhead.

The result, according to the post, is sub-50ms retrieval latency across an index containing millions of posts. The system retrieves the top candidates by running a k-nearest-neighbor search against a GPU-accelerated index of item embeddings. Embeddings are refreshed by three continuous nearline pipelines – prompt generation, embedding inference, and index updates – each optimized independently. New posts receive embeddings in near-real time. Existing posts gaining engagement are dynamically refreshed.
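Stripped of the GPU acceleration, a k-nearest-neighbor search over an embedding index is a matrix-vector product plus a top-k selection. A brute-force numpy sketch under the usual assumption that index rows are pre-normalized (production systems use approximate-NN indexes to hit this latency at scale):

```python
import numpy as np

def knn_search(query: np.ndarray, index: np.ndarray, k: int) -> np.ndarray:
    """Return indices of the k items most cosine-similar to the query.

    `index` rows are assumed L2-normalized, so a matrix-vector
    product yields cosine scores directly.
    """
    q = query / np.linalg.norm(query)
    scores = index @ q
    # argpartition finds the unordered top-k in O(n); sort just those k.
    top_k = np.argpartition(-scores, k)[:k]
    return top_k[np.argsort(-scores[top_k])]

rng = np.random.default_rng(42)
items = rng.standard_normal((1000, 64))
items /= np.linalg.norm(items, axis=1, keepdims=True)

# A query near item 7 should return item 7 first.
query = items[7] + 0.01 * rng.standard_normal(64)
assert knn_search(query, items, k=10)[0] == 7
```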

Why this matters for marketing professionals

The implications extend beyond organic reach. The Feed is the primary placement for LinkedIn's sponsored content formats, and the same ranking logic that determines which organic posts a member sees operates alongside paid placement decisions. LinkedIn's ad platform has grown significantly in recent years, with B2B return on ad spend reaching 121% in 2025, according to Dreamdata's March 10, 2026 report. A Feed that better models professional interest trajectories – rather than scoring posts in isolation – changes the competitive dynamics for both organic and paid content.

Marketers producing content aimed at professionals in adjacent or emerging fields may find the new system more receptive to their material, particularly if those audiences have demonstrated latent interest signals even without direct engagement history. The cold-start handling – inferring interests from profile data and the world knowledge embedded in the LLM – is relevant for campaigns targeting professionals who are new to a category or role.

The question of how LinkedIn surfaces content has become more pressing as the platform's weight in B2B media plans has grown. At 41% of B2B paid media budgets in 2025, LinkedIn is the single largest line item for many advertisers – larger than any individual Google product. Understanding the mechanics of what the Feed prioritizes is not an academic exercise.

The LLM-based approach also introduces a different kind of content competition. Under keyword-based or shallow-embedding systems, a post about "data security" competes primarily with other posts about "data security." Under an LLM-based system that understands semantic relationships, the same post competes with content about regulatory compliance, cloud infrastructure, and operational risk – because the model understands these topics as interconnected for certain professional profiles. For content strategists, the breadth of effective competition expands considerably.

LinkedIn has previously disclosed its broader AI-driven content strategy, including how it restructured its own marketing operations after B2B non-brand traffic fell by as much as 60% on its owned web properties. The March 12 engineering disclosure is a separate but related thread – it describes the internal recommendation infrastructure rather than the platform's external SEO posture, but both reflect the same underlying shift: the platform's content surfaces are increasingly mediated by LLM-based reasoning rather than simpler signal aggregation.

Responsible AI considerations are mentioned in the announcement. According to the post, LinkedIn "regularly and rigorously" audits models to confirm that posts from different creators compete on equal footing, and that the scrolling experience is consistent across viewer groups. The ranking model relies on professional signals and engagement patterns, and specifically excludes demographic attributes.

Summary

Who: LinkedIn's AI Modeling, Product Engineering, and Infrastructure teams, led by engineer Hristo Danchev.

What: A complete rebuild of LinkedIn's Feed recommendation system, replacing a multi-source retrieval architecture with a unified LLM-based dual encoder for retrieval and a transformer-based Generative Recommender (GR) model for ranking. Key technical specifics include percentile-bucketed numerical features, hard negative sampling with a 3.6% recall gain, 8 H100 GPUs for training, sub-50ms retrieval latency, and a custom Flash Attention variant delivering an additional 2x speedup.

When: The engineering blog was published March 12, 2026, though the system has been rolling out as an ongoing deployment. Related infrastructure work, including SGLang-based LLM serving, was documented in a February 20, 2026 post.

Where: The changes affect the LinkedIn Feed globally, serving content to all 1.3 billion members of the platform. The infrastructure runs on GPU clusters, with nearline pipelines refreshing embeddings and indices continuously.

Why: LinkedIn's previous heterogeneous retrieval architecture created maintenance complexity and made holistic optimization difficult, and its ranking model treated each impression independently, missing sequential patterns in professional content consumption. The rebuild aims to surface more relevant content – including from outside a member's immediate network – by modeling both semantic understanding and temporal interest trajectories.

