LLMs Link Pseudonymous Profiles at Scale

Society
Published: Fri, Feb 20, 2026 • By Theo Solander
New research finds Large Language Models (LLMs) can link pseudonymous accounts across platforms by mining unstructured text. With web access or a fixed candidate set, the attacks achieve up to 68% recall at 90% precision, far ahead of classical baselines. The work argues practical obscurity no longer protects users and urges updated threat models.

Pseudonymity on public forums has long relied on practical obscurity: you could be found, but only if someone cared enough to look. New work shows that Large Language Models (LLMs) turn that old, creaky lock into a revolving door. By lifting identifying signals out of ordinary posts and profiles, LLMs can link accounts at scale with unsettling competence.

The researchers test two attack modes. In one, an agent with web search converts a target’s posts into concrete claims and hunts for likely identities. On datasets from Hacker News and Reddit, this approach saw roughly 25 to 67 percent recall at 70 to 90 percent precision. On a vetted Hacker News set, it correctly identified 226 of 338 targets, about 67 percent recall at 90 percent precision. What a dedicated human might do in hours now fits into an automated loop.
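
In outline, the open-web mode is little more than a loop: extract claims, search each one, then ask the model to verify. A minimal sketch, with the LLM call and the search tool passed in as generic callables and illustrative prompts (the authors released no code):

```python
# Hypothetical skeleton of the open-web agent mode: posts -> concrete claims
# -> web searches -> verification. `llm` and `search` stand in for any
# chat-completion API and search tool; the paper does not release code.
from typing import Callable, Optional


def propose_identity(
    posts: list[str],
    llm: Callable[[str], str],           # chat-completion call
    search: Callable[[str], list[str]],  # web search returning result snippets
    max_claims: int = 5,
) -> Optional[str]:
    """Return a likely real-world identity for a pseudonymous profile, or None."""
    claims_text = llm(
        "List concrete, identity-relevant claims (location, employer, dates, "
        "projects) implied by these posts:\n\n" + "\n---\n".join(posts)
    )
    claims = [line.strip() for line in claims_text.splitlines() if line.strip()]

    evidence: list[str] = []
    for claim in claims[:max_claims]:
        evidence.extend(search(claim))   # gather snippets supporting each claim

    verdict = llm(
        "Given these claims and search results, name the most likely real "
        "identity, or answer UNKNOWN if you are not confident.\n\n"
        f"Claims: {claims}\n\nEvidence: {evidence}"
    )
    return None if "UNKNOWN" in verdict.upper() else verdict.strip()
```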

The second mode is a closed-world pipeline that breaks deanonymisation into four steps: Extract, Search, Reason and Calibrate. The model first turns free-form text into identity-relevant micro-data, then uses embeddings to pull a shortlist of candidates, then applies LLM reasoning to verify matches, with a final calibration step to tune confidence. Across three settings it substantially outperforms classical baselines, reaching up to 68 percent recall at 90 percent precision, while the best non-LLM methods achieve near-zero recall on some tasks. The components matter: dense retrieval often puts the true match in the top 15, but the Reason stage is the workhorse. In one comparison, adding Reason lifted recall at 99 percent precision from 4.4 percent to 45.1 percent.
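
A minimal sketch of that closed-world pipeline, assuming generic `llm` and `embed` callables and an illustrative confidence threshold (the authors did not publish their implementation):

```python
# Sketch of the Extract -> Search -> Reason -> Calibrate structure described
# above. The prompts, embedding model and threshold are placeholders.
import numpy as np
from typing import Callable


def link_profiles(
    targets: dict[str, str],              # pseudonym -> concatenated posts
    candidates: dict[str, str],           # candidate id -> profile text
    llm: Callable[[str], str],            # any chat-completion API
    embed: Callable[[str], np.ndarray],   # any sentence-embedding model
    top_k: int = 15,                      # shortlist size; true match is often in the top 15
    min_conf: float = 0.9,                # illustrative calibration threshold
) -> dict[str, tuple[str, float]]:
    def extract(text: str) -> str:
        # Extract: free-form text -> identity-relevant micro-data.
        return llm("List identity-relevant facts (location, job, interests, "
                   "writing habits) found in this text:\n" + text)

    def unit(v: np.ndarray) -> np.ndarray:
        return v / (np.linalg.norm(v) + 1e-12)

    cand_facts = {cid: extract(txt) for cid, txt in candidates.items()}
    cand_vecs = {cid: unit(embed(facts)) for cid, facts in cand_facts.items()}

    matches: dict[str, tuple[str, float]] = {}
    for pseudonym, text in targets.items():
        facts = extract(text)
        q = unit(embed(facts))
        # Search: cosine-similarity shortlist of the top_k candidates.
        shortlist = sorted(cand_vecs, key=lambda cid: -float(q @ cand_vecs[cid]))[:top_k]
        # Reason: the LLM verifies which shortlisted candidate, if any, matches.
        answer = llm(
            "Target facts:\n" + facts + "\n\nCandidates:\n"
            + "\n".join(f"{cid}: {cand_facts[cid]}" for cid in shortlist)
            + "\n\nReply '<candidate id>|<confidence between 0 and 1>' or 'NONE'."
        )
        if "|" not in answer:
            continue  # treated as no confident match
        cid, conf_text = (part.strip() for part in answer.split("|", 1))
        try:
            conf = float(conf_text)
        except ValueError:
            continue
        # Calibrate: keep only matches above the confidence threshold.
        if cid in candidates and conf >= min_conf:
            matches[pseudonym] = (cid, conf)
    return matches
```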

To ground the numbers, the authors build three datasets with verifiable links. One connects Hacker News profiles to LinkedIn via cross-platform references that appear in the profiles themselves. Another pairs users across Reddit movie-discussion communities. A third splits a single Redditor’s history in time to create two pseudonyms to be matched. They also report results on a set of Anthropic interview transcripts. Performance degrades as the candidate pool grows, but the Reason-enabled pipeline holds up better and remains non-trivial.
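
The temporal-split construction amounts to ordering one account’s posts by time and cutting the history in two. A minimal sketch, assuming a simple post structure with `timestamp` and `text` fields and a median-point cut (the processed data was not released):

```python
# Illustrative temporal split of one user's posting history into two
# pseudonymous profiles, as in the third dataset. The post fields and the
# median-timestamp cut point are assumptions.
def split_history(posts: list[dict]) -> tuple[list[str], list[str]]:
    """Split a user's posts at the midpoint of their timeline into two pseudo-profiles."""
    ordered = sorted(posts, key=lambda p: p["timestamp"])
    mid = len(ordered) // 2
    earlier = [p["text"] for p in ordered[:mid]]
    later = [p["text"] for p in ordered[mid:]]
    return earlier, later
```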

If this sounds familiar, it is. The Netflix Prize episode showed how a few structured signals could pierce anonymity. The difference now is that LLMs work directly on raw prose. Style, topics, and those stray biographical crumbs we all drop become a composite signature. Yesterday’s manual feature engineering becomes today’s prompt.

For defenders, the security story is plain. Practical obscurity no longer protects at scale. Automated linking raises the stakes for surveillance, targeted social engineering, harassment and commercial profiling, and it erodes the safety of communities that depend on separation between handles and real identities. When the marginal cost of a search falls close to zero, threat models must change.

There are practical steps that follow from the evidence here. Platforms can reduce exposure through data minimisation, rate-limited and audited access to user content, and privacy-preserving processing such as differential privacy where feasible. Policies and monitoring need to assume adversaries will automate feature extraction, retrieval and verification, not just scrape.
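
As a rough illustration of rate-limited, audited access, a platform-side wrapper might look like the sketch below; the quota, in-memory storage and logging choices are placeholders rather than recommendations from the paper.

```python
# Illustrative platform-side accessor that rate-limits and audits reads of
# user content. Quota, storage and logging setup are placeholders.
import logging
import time
from collections import defaultdict, deque

logging.basicConfig(level=logging.INFO)
audit_log = logging.getLogger("content-access-audit")


class RateLimitedContentStore:
    def __init__(self, store: dict[str, list[str]], max_reads_per_hour: int = 100):
        self._store = store                                 # user id -> posts
        self._max = max_reads_per_hour
        self._reads: dict[str, deque] = defaultdict(deque)  # caller -> read timestamps

    def read_posts(self, caller: str, user_id: str) -> list[str]:
        now = time.time()
        window = self._reads[caller]
        while window and now - window[0] > 3600:  # drop reads older than one hour
            window.popleft()
        if len(window) >= self._max:
            audit_log.warning("rate limit hit: caller=%s user=%s", caller, user_id)
            raise PermissionError("hourly read quota exceeded")
        window.append(now)
        audit_log.info("read: caller=%s user=%s", caller, user_id)  # audit trail
        return self._store.get(user_id, [])
```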

The caveats are sensible. The datasets rely on ground-truthable links or synthetic splits, which may overestimate recall relative to harder cases where accounts are genuinely separate. Open-web agents depend on external search systems, which makes their contribution hard to isolate. The authors avoided targeting real pseudonymous users, did not release code or processed data, and false positives remain a risk as pools grow. Still, the trajectory is clear. Each time we make behaviour machine-readable, the shadows shorten. The rhyme with past deanonymisation work is unmistakable; the verse has just become easier to sing at scale.

Additional analysis of the original arXiv paper

📋 Original Paper Title and Abstract

Large-scale online deanonymization with LLMs

Authors: Simon Lermen, Daniel Paleka, Joshua Swanson, Michael Aerni, Nicholas Carlini, and Florian Tramèr
We show that large language models can be used to perform at-scale deanonymization. With full Internet access, our agent can re-identify Hacker News users and Anthropic Interviewer participants at high precision, given pseudonymous online profiles and conversations alone, matching what would take hours for a dedicated human investigator. We then design attacks for the closed-world setting. Given two databases of pseudonymous individuals, each containing unstructured text written by or about that individual, we implement a scalable attack pipeline that uses LLMs to: (1) extract identity-relevant features, (2) search for candidate matches via semantic embeddings, and (3) reason over top candidates to verify matches and reduce false positives. Compared to prior deanonymization work (e.g., on the Netflix prize) that required structured data or manual feature engineering, our approach works directly on raw user content across arbitrary platforms. We construct three datasets with known ground-truth data to evaluate our attacks. The first links Hacker News to LinkedIn profiles, using cross-platform references that appear in the profiles. Our second dataset matches users across Reddit movie discussion communities; and the third splits a single user's Reddit history in time to create two pseudonymous profiles to be matched. In each setting, LLM-based methods substantially outperform classical baselines, achieving up to 68% recall at 90% precision compared to near 0% for the best non-LLM method. Our results show that the practical obscurity protecting pseudonymous users online no longer holds and that threat models for online privacy need to be reconsidered.

🔍 ShortSpan Analysis of the Paper

Problem

The paper studies whether large language models can be used to deanonymise pseudonymous online accounts at scale and how that changes threat models for online privacy. It shows that LLMs, when combined with web access or a closed candidate set, can extract identity-relevant signals from unstructured text, search large candidate pools, and reason to verify matches. This matters because many users rely on practical obscurity and pseudonymity; automated LLM-driven linking could enable mass doxxing, surveillance, targeted scams and erosion of safe spaces for vulnerable groups.

Approach

The authors evaluate two attack modes. First, an open-web, agentic approach where an LLM with web-search tools autonomously converts a pseudonymous profile into structured claims, performs web searches, and reasons about candidate identities. Second, a modular closed-world pipeline that decomposes deanonymisation into Extract, Search, Reason and Calibrate stages. Extract uses LLMs to turn unstructured posts into semi-structured micro-data; Search encodes features with dense embeddings for nearest-neighbour retrieval; Reason applies LLM selection and verification on top candidates; Calibrate produces confidence scores or sorts matches to trade precision and recall. They build three evaluation datasets with ground truth: Hacker News profiles linked to LinkedIn, Reddit movie-community splits, and temporally split Reddit histories. They also report results on an Anthropic interview transcript set. Classical baselines adapted from prior deanonymisation work are used for comparison.
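
To make the Calibrate stage’s precision-recall trade-off concrete, the sketch below computes recall at a target precision by sweeping a confidence threshold over scored match predictions. It is a standard metric calculation, not the authors’ evaluation code; pass the true number of linkable pairs so that targets with no prediction count against recall.

```python
# Standard threshold sweep for "recall at X% precision" over confidence-scored
# predictions; a sketch of the metric, not the authors' evaluation code.
def recall_at_precision(
    predictions: list[tuple[float, bool]],   # (confidence, is_correct_match)
    target_precision: float = 0.9,
    total_positives: int | None = None,      # number of truly linkable pairs
) -> float:
    if total_positives is None:
        # Fallback: count correct predictions only (ignores unmatched targets).
        total_positives = sum(correct for _, correct in predictions)
    best_recall = 0.0
    tp = fp = 0
    # Sweep thresholds from the most to the least confident prediction.
    for _, correct in sorted(predictions, key=lambda p: -p[0]):
        tp += correct
        fp += not correct
        precision = tp / (tp + fp)
        recall = tp / total_positives if total_positives else 0.0
        if precision >= target_precision:
            best_recall = max(best_recall, recall)
    return best_recall
```

For example, `recall_at_precision([(0.99, True), (0.95, True), (0.80, False)], target_precision=0.9, total_positives=4)` returns 0.5: both confident predictions are correct, but two linkable pairs were never predicted at all.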

Key Findings

  • Open-web agentic attacks can re-identify real users quickly: on Hacker News and Reddit the agents achieved roughly 25 to 67 percent recall at 70 to 90 percent precision, and on a vetted Hacker News set the agent correctly identified 226 of 338 targets (67%) at 90% precision.
  • In closed-world experiments the LLM-augmented pipeline substantially outperforms classical baselines. Across settings the authors report up to about 68% recall at 90% precision while the best non-LLM methods were near 0% for some tasks.
  • Pipeline components are complementary: embeddings narrow the candidate pool (true match often in top-15), while the Reason step (LLM selection and verification) and Calibrate step (confidence scoring or pairwise sorting) materially increase recall at high precision. For LinkedIn to Hacker News matching the Reason step improved recall at 99% precision from 4.4% to 45.1% in one comparison.
  • Performance degrades with larger candidate pools but remains non-trivial: the Reason-enabled pipeline scaled more gracefully and extrapolations indicate meaningful recall even at much larger pools. High reasoning effort models outperform low effort, especially at strict precision targets.

Limitations

Evaluation relies on datasets constructed to provide verifiable ground truth: synthetically anonymised profiles and profile splits. This may overestimate recall because users who expose cross-links are easier to deanonymise, and split profiles are more similar than truly separate accounts. The agentic open-web experiments depend on external search systems, making contributions hard to isolate. The authors did not release code or processed data and avoided deanonymising real pseudonymous users to limit harm.

Why It Matters

The results imply that pseudonymity on public platforms is less protective than assumed: LLMs lower the cost of linking accounts and identities by automating feature extraction, retrieval and reasoning. Security implications include increased risks of surveillance, targeted social engineering, harassment and commercial profiling. Defences suggested include stronger data minimisation, audited and rate-limited access to user content, privacy-preserving processing, improved platform policies, and monitoring of model misuse. The paper calls for reconsidering anonymisation standards and developing new mitigations that account for LLM-enabled attacks.

