
LLMs Aid SOC Analysts, But Do Not Replace Them

Enterprise
Published: Wed, Aug 27, 2025 • By Clara Nyx
A 10-month study of 3,090 queries from 45 SOC analysts finds LLMs act as on-demand cognitive aids for interpreting telemetry and polishing reports, not as decision-makers. Usage grows from casual to routine among power users. The results show promise for efficiency but warn against unchecked trust and against generalising from a single-site deployment.

The new field study is refreshingly honest and quietly alarming. Researchers tracked 3,090 queries from 45 SOC analysts over 10 months and found LLMs mostly doing grunt sensemaking: decoding commands, clarifying logs, and tidying writeups. They help, but they do not replace the analyst in the loop.

That matters because vendors and press love bold headlines. Here the real story is less glamorous. Analysts use short 1–3 turn exchanges to get context or reword a report. Only about 4 percent of interactions ask for explicit recommendations. In short, LLMs are a tool, not an oracle.

The good news is practical: these models reduce friction on tiny tasks that otherwise interrupt attention during a crisis. The worrying news is structural. Usage skews toward a handful of power users, the study covers a single enterprise deployment, and no hard outcomes like time-to-triage or missed alerts are reported. That leaves room for complacency and for leaders to assume benefits that are not yet proven.

Design takeaway: toolmakers must surface evidence, not rhetoric. Show the telemetry used for a suggestion, flag uncertainty, and avoid turning helpful drafts into unquestioned decisions.
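
As a thought experiment, here is a minimal Python sketch (all names are hypothetical, not from the paper) of a suggestion object that always carries its supporting telemetry and an explicit confidence label, so a draft never masquerades as a decision:

from dataclasses import dataclass, field
from typing import List

@dataclass
class Suggestion:
    # One LLM-assisted suggestion, always displayed with its evidence.
    summary: str                                        # short, human-readable interpretation
    evidence: List[str] = field(default_factory=list)   # telemetry lines the model actually saw
    confidence: str = "unverified"                      # e.g. "unverified", "corroborated", "analyst-confirmed"
    requires_analyst_decision: bool = True               # the tool never acts on its own output

def render(s: Suggestion) -> str:
    # Render so the evidence and uncertainty are visible next to the suggestion text.
    lines = [f"Suggestion ({s.confidence}): {s.summary}"]
    lines += [f"  evidence: {e}" for e in s.evidence]
    if s.requires_analyst_decision:
        lines.append("  action: analyst review required")
    return "\n".join(lines)

The point is simply that the evidence travels with the text, so an analyst can check the model rather than trust it.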

Do this now: require any SOC LLM trial to log queries and outputs for audit, and run a short pilot that measures time-to-triage and error rates before you scale. Small, measurable proofs beat optimism and marketing every time.
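
A starting point for that audit trail, sketched in Python (the file path and field names are assumptions, not a prescribed schema): every analyst-LLM exchange is appended as a JSON line with a timestamp and an alert reference, so a pilot can later compute time-to-triage and error rates from the same records.

import json
import time
import uuid
from typing import Optional

AUDIT_LOG = "soc_llm_audit.jsonl"  # hypothetical path; a real deployment would write to a SIEM or append-only store

def log_interaction(analyst_id: str, query: str, response: str,
                    alert_id: Optional[str] = None) -> str:
    # Append one analyst-LLM exchange as a JSON line for later audit and pilot metrics.
    record = {
        "id": str(uuid.uuid4()),
        "ts": time.time(),         # timestamps allow per-alert time-to-triage measurement
        "analyst": analyst_id,
        "alert_id": alert_id,      # ties the exchange to the alert being triaged, if any
        "query": query,
        "response": response,
    }
    with open(AUDIT_LOG, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return record["id"]

JSON Lines keeps each exchange independently parseable, so pilot metrics can be computed later without a schema migration.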

Additional analysis of the original ArXiv paper

📋 Original Paper Title and Abstract

LLMs in the SOC: An Empirical Study of Human-AI Collaboration in Security Operations Centres

The integration of Large Language Models (LLMs) into Security Operations Centres (SOCs) presents a transformative, yet still evolving, opportunity to reduce analyst workload through human-AI collaboration. However, their real-world application in SOCs remains underexplored. To address this gap, we present a longitudinal study of 3,090 analyst queries from 45 SOC analysts over 10 months. Our analysis reveals that analysts use LLMs as on-demand aids for sensemaking and context-building, rather than for making high-stakes determinations, preserving analyst decision authority. The majority of queries are related to interpreting low-level telemetry (e.g., commands) and refining technical communication through short (1-3 turn) interactions. Notably, 93% of queries align with established cybersecurity competencies (NICE Framework), underscoring the relevance of LLM use for SOC-related tasks. Despite variations in tasks and engagement, usage trends indicate a shift from occasional exploration to routine integration, with growing adoption and sustained use among a subset of analysts. We find that LLMs function as flexible, on-demand cognitive aids that augment, rather than replace, SOC expertise. Our study provides actionable guidance for designing context-aware, human-centred AI assistance in security operations, highlighting the need for further in-the-wild research on real-world analyst-LLM collaboration, challenges, and impacts.

🔍 ShortSpan Analysis of the Paper

Problem

The paper studies how large language models (LLMs) are actually used by Security Operations Centre (SOC) analysts in live operations, addressing a gap between laboratory evaluations and real-world practice. This matters because LLMs could change analyst workload, decision processes and tooling in time-sensitive, high-stakes security environments.

Approach

Longitudinal analysis of 3,090 valid analyst queries submitted by 45 SOC analysts over 10 months to an internally deployed GPT-4 instance (GPT-4-0613) without fine-tuning or internet access. The researchers anonymised the data and used a six-phase mixed-methods approach: exploratory statistics, manual coding and conversation tagging, semantic clustering with Sentence-BERT, and triangulation. Reported inter-rater reliability: Fleiss' kappa of 0.90, 0.82 and 0.79 for the coding dimensions and 0.75 for conversation tagging. Analysts were free to use the system but were instructed not to share sensitive data.
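
The semantic clustering step can be reproduced in spirit with off-the-shelf tooling; a minimal sketch follows, noting that the embedding model, cluster count and example queries are assumptions for illustration rather than the paper's exact configuration.

from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

# Embed analyst queries with a Sentence-BERT model, then group them into rough themes.
model = SentenceTransformer("all-MiniLM-L6-v2")    # assumed model choice, for illustration only
queries = [
    "what does this powershell command do",
    "explain this proxy log entry",
    "rewrite this incident summary for management",
]
embeddings = model.encode(queries)

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(embeddings)
for label, query in zip(kmeans.labels_, queries):
    print(label, query)

Sentence-level embeddings make near-duplicate analyst questions cluster together even when the wording differs.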

Key Findings

  • Adoption: usage rose from under 10 to over 30 queries per day, driven by a subset of analysts; one power user submitted ~600 queries (17% of the dataset).
  • Primary uses: 31% of queries were command interpretation, 22% text editing/rewriting, 11% code/script/regex analysis; 93% of queries aligned with NICE cybersecurity competencies.
  • Interaction style: most conversations were short and iterative—57% two-step, 75% two to three queries; analysts favoured short 1–3 turn exchanges and retained final decision authority.
  • Behavioural patterns: analysts used the LLM as an on-demand cognitive aid for sensemaking and communication rather than for prescriptive judgements; only ~4% requested explicit recommendations.
  • Query/response stats: mean analyst query length 25 words, mean LLM response 161 words; median gap between visits 1–2 hours.

Limitations

The study covers a single enterprise SOC and a single model deployment, which limits generalisability; no objective performance metrics (such as time-to-triage or accuracy) were measured; and there may be novelty effects, with no direct interviews conducted to validate analyst motives. No other limitations are reported.

Why It Matters

LLMs can augment SOC workflows by interpreting low-level telemetry and offloading documentation tasks, improving situational awareness and efficiency while preserving analyst autonomy. Design priorities include embedding context-aware explanations, supporting microtasks to reduce context switching, and surfacing evidence rather than definitive recommendations to limit over-reliance. Further multi-site, outcome-focused studies are needed.

