Study Finds 62 Security Smells in IaC
Defenses
A study updates the catalogue of Infrastructure as Code (IaC) security smells, expanding the recognised patterns from seven to 62 categories across seven popular IaC tools. That expansion matters because insecure IaC can create persistent attack surfaces that expose AI endpoints, credentials and data pipelines used in automated deployments.
The work covers Terraform, Ansible, Chef, Puppet, Pulumi, Saltstack and Vagrant, and combines automated Large Language Model (LLM) clustering with systematic human validation. The authors map findings to Common Weakness Enumeration (CWE) categories and report about 95 percent consistency between the LLM output and CWE-oriented labels. They also implement new linter rules for seven tool ecosystems and report that manually validated rules often achieved a precision of 1.00.
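To make the linter-rule idea concrete, the sketch below shows what a minimal smell check might look like. It is a hedged illustration in Python, not one of the paper's rules: the regex, the `.tf` file glob and the output format are assumptions chosen for readability.

```python
import re
from pathlib import Path

# Illustrative pattern only: flags assignments such as
#   password = "hunter2"  or  access_key = "AKIA..."
# Real rule sets (e.g. the paper's additions to existing linters)
# are more nuanced and map findings to CWE categories.
HARDCODED_SECRET = re.compile(
    r'\b(password|secret|access_key|token)\s*=\s*"[^"]+"',
    re.IGNORECASE,
)

def scan_terraform(root: str) -> list[tuple[str, int, str]]:
    """Return (file, line number, offending line) for suspected hard-coded secrets."""
    findings = []
    for tf_file in Path(root).rglob("*.tf"):
        text = tf_file.read_text(encoding="utf-8", errors="ignore")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if HARDCODED_SECRET.search(line):
                findings.append((str(tf_file), lineno, line.strip()))
    return findings

if __name__ == "__main__":
    for path, lineno, line in scan_terraform("."):
        print(f"{path}:{lineno}: possible hard-coded secret: {line}")
```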
What changed
Rather than rely on a single tool or a small set of manual examples, the study samples verified snippets from public GitHub projects and uses static analysis to validate candidate smells. LLMs help scale pattern discovery, but taxonomy decisions are reconciled by human reviewers against security standards. An evolution study in the paper shows that smells persist in repositories over multiple years, indicating detection and remediation gaps.
The practical risk is straightforward. IaC drives cloud configuration, and repeatable misconfigurations in templates or automation can leak API keys, expose model endpoints, grant excessive privileges to storage or compute, or route sensitive data through insecure pipelines. For AI services that rely on automated provisioning, these smells become avenues for data leakage, model tampering or unauthorised access.
Mitigations
Teams can act now by baking IaC checks into CI pipelines, treating linter rules as first-class tests and keeping human review in the loop for any LLM-generated rule. Key priorities include the following (a minimal CI wrapper is sketched after the list):
- Enable and enforce IaC linting in CI to catch template smells early.
- Run secret scanning and rotate exposed credentials immediately.
- Enforce least privilege for provisioned resources and review default network access.
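As an illustration of treating linter findings as first-class test failures, here is a minimal CI wrapper sketch in Python. The tool choices (checkov for IaC misconfigurations, gitleaks for secrets) are assumptions; substitute whatever scanners your stack already uses.

```python
import subprocess
import sys

# Tool choices are illustrative assumptions; swap in whatever
# scanners your stack already uses (tflint, ansible-lint, ...).
CHECKS = [
    ("IaC misconfiguration scan", ["checkov", "-d", "."]),
    ("Secret scan", ["gitleaks", "detect", "--source", "."]),
]

def main() -> int:
    failed = False
    for name, cmd in CHECKS:
        print(f"--- {name}: {' '.join(cmd)}")
        # A non-zero exit code from any scanner fails the build,
        # which treats smell findings as first-class test failures.
        if subprocess.run(cmd).returncode != 0:
            print(f"FAILED: {name}")
            failed = True
    return 1 if failed else 0

if __name__ == "__main__":
    sys.exit(main())
```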
Validation matters: the paper shows automated rules need human supervision to reach high precision. Organisations should treat new checks as guidelines until they are validated against their own codebase and threat model.
Limitations include the use of public GitHub data, which may not reflect closed environments, and the non-deterministic behaviour of LLMs, which requires careful prompt engineering and repeatable validation. Some ecosystems also posed practical barriers to compiling or deploying new rules.
Looking forward, the study pushes for continuous DevSecOps integration: automated IaC linting, targeted human validation and regular audits can reduce the long tail of infrastructure smells and shrink the attack surface for AI deployments.
Additional analysis of the original arXiv paper
📋 Original Paper Title and Abstract
Security smells in infrastructure as code: a taxonomy update beyond the seven sins
🔍 ShortSpan Analysis of the Paper
Problem
Infrastructure as Code (IaC) security smells are recurring patterns in IaC scripts that signal security weaknesses. This study revisits the seven sins taxonomy and extends it to seven popular IaC tools, including Terraform, Ansible, Chef, Puppet, Pulumi, Saltstack and Vagrant, to produce a comprehensive taxonomy of security smell categories and to assess how these patterns relate to risky configurations in automated AI service deployments. The authors combine an automated Large Language Model assisted workflow with thorough human validation to expand the taxonomy from seven to 62 categories, and they demonstrate actionable mitigation by augmenting linters with new security checks for seven IaC tools. The work also finds that smells persist in GitHub projects, indicating gaps in detection and remediation tools, and discusses practical implications for adopting DevSecOps practices to build safer infrastructure code, while highlighting exploitation risks such as exposure of AI endpoints, credentials, storage or data pipelines through insecure automation.
Approach
The methodology relies on open source IaC scripts from GitHub and uses security fix commits, identified via tool-specific keywords, to assemble relevant code snippets. Snyk static analysis aids validation of the collected data, yielding a final dataset of 1050 verified smelly snippets (150 per tool type where possible). An LLM-based workflow (GPT-3.5 Turbo and GPT-4o) is used to cluster patterns and generate descriptive labels, with two categorisation paths: CWE-oriented extractive mapping and generative LLM-based labelling. All taxonomic decisions undergo rigorous human validation against CWE and OWASP guidelines, with external annotators providing additional review. The study expands detection rule sets for seven IaC tools and demonstrates actionability by implementing new security rules in linters for Ansible, Terraform, Chef, Puppet, Pulumi, Vagrant and Saltstack, using Yaml Lint, Bandit, Rubocop, ESLint and Terrascan. A separate evolution study on 212 smelly code snippets across seven tools quantifies persistence of smells from 2019 to 2024. The work also documents a replication package containing prompts and results to ensure reproducibility.
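For readers who want a feel for the LLM-assisted path, the sketch below imitates the two categorisation paths in spirit only; the prompt, model name and output handling are assumptions, not the paper's replication package.

```python
from openai import OpenAI  # assumes the openai package and an API key are configured

client = OpenAI()

def label_snippet(snippet: str) -> dict:
    """Ask a model for a CWE mapping plus a free-form smell label.

    Mirrors the paper's two categorisation paths in spirit only:
    an extractive CWE-oriented answer and a generative label.
    Every result still needs human validation against CWE/OWASP
    before it can enter a taxonomy.
    """
    prompt = (
        "You are auditing Infrastructure as Code for security smells.\n"
        "1. Name the closest CWE identifier for the snippet.\n"
        "2. Give a short descriptive label for the smell.\n\n"
        f"Snippet:\n{snippet}"
    )
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # reduces, but does not eliminate, non-determinism
    )
    return {
        "snippet": snippet,
        "raw_labels": response.choices[0].message.content,
        "human_validated": False,  # flipped only after reviewer sign-off
    }
```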
Key Findings
- 472 security smells across 62 CWE-related categories were identified in IaC scripts, indicating broad vulnerability coverage across seven IaC tools.
- The taxonomy expands from the original seven smells to 62 categories and achieves high alignment with CWE categories; cross-validation between CWE-oriented and LLM-generated labels yielded a 95 percent consistency rate, supporting the validity of the LLM-assisted approach.
- Actionable mitigation is demonstrated by augmenting linters with new security rules for seven IaC tools; evaluation on an oracle dataset of 212 smelly snippets shows precision scores often reaching 1.00 for manually validated rules, while LLM-only validation yields lower precision, underscoring the need for a human in the loop (a minimal precision sketch follows this list).
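For reference, the precision figures reported above are the standard ratio of true positives to everything a rule flags; here is a tiny Python sketch with hypothetical snippet identifiers.

```python
def precision(flagged: set[str], oracle: set[str]) -> float:
    """Precision = true positives / everything the rule flagged."""
    if not flagged:
        return 0.0
    return len(flagged & oracle) / len(flagged)

# Hypothetical snippet identifiers: a rule flags three snippets and the
# human-labelled oracle confirms all three as smelly, so precision is 1.0.
oracle = {"snippet_01", "snippet_02", "snippet_03", "snippet_04"}
flagged = {"snippet_01", "snippet_02", "snippet_03"}
print(precision(flagged, oracle))  # 1.0
```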
Additional observations reveal tool-specific smell patterns with persistence over time. For example, Chef scripts show a very high prevalence of outdated software version smells (74.9 percent), while Puppet exhibits a high rate of code injection vulnerabilities (55.89 percent). Across tools there are recurring issues in insecure dependency management and insecure input handling, with variations in prevalence by tool type. The results indicate that although IaC adoption is growing, automated detection and remediation must keep pace to reduce security debt.
Limitations
The study recognises several limitations. LLMs provide automated analysis but require systematic human supervision; outputs are non-deterministic, and replication relies on documented prompts and validation processes. The evaluation focuses on the Top Ten smells and may not fully cover all 62 categories. Some tool ecosystems pose practical barriers to validating generated rules, notably Terrascan, where produced rules were difficult to compile, limiting validation. Data are derived from open source GitHub projects, which may not fully reflect closed source environments. The authors also note potential data leakage from pre-training and emphasise that practitioner perception and real-world usability of the smells and rules require further study.
Why It Matters
The work strengthens the security lens on Infrastructure as Code by showing that 62 smell patterns across seven IaC tools can indicate risky configurations in automated AI deployments. Practical implications include tool-wide lint checks and an automated LLM-assisted analysis workflow with human validation to detect and remediate issues early in deployment pipelines. The findings stress that insecure IaC can expose AI endpoints, credentials and data pipelines, risking data leakage, model tampering or unauthorised access. Societal impact includes reducing large-scale AI service outages, privacy breaches and misuse stemming from insecure cloud configurations. In practice the taxonomy enables IaC practitioners to methodically identify common smells and adopt DevSecOps practices to improve infrastructure code security.