The scale nobody talks about
GitGuardian’s 2024 report identified over 12.8 million new secrets exposed in public GitHub repositories in a single year. These include API keys, database passwords, cloud access tokens, and private SSH keys. Each of these is a potential entry point into infrastructure that was never meant to be public.
But public repositories are only the tip of the iceberg. In private repositories — where most production code lives — the problem is equally severe, just less visible. The difference is that a leaked secret in a private repo stays there until someone actively looks for it. And most organizations don’t look.
How secrets end up in pipelines
The mechanisms are prosaic. A developer who needs to test a deployment hardcodes a token in a workflow file, intending to move it to the secret store later. “Later” never comes, the file gets committed, and the secret lives in Git history indefinitely. Even if the file is later edited to remove the token, every earlier commit still contains it.
Another mechanism: debug output. A pipeline fails, the developer adds echo $SECRET_VALUE to diagnose the issue. The debug line stays after the fix. Or the CI platform’s masking fails for a specific format of the secret. Or the secret appears in an error message from a tool that doesn’t know it should be masked.
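To make the masking failure concrete, here is a minimal sketch of exact-match log masking. It is a simplified model, not the implementation of any particular CI platform; the function name mask_log_line and the token value are made up for illustration.

```python
import base64

MASK = "***"


def mask_log_line(line: str, secrets: list[str]) -> str:
    """Replace exact occurrences of each known secret with a mask."""
    for secret in secrets:
        if secret:  # skip empty strings, which would match everywhere
            line = line.replace(secret, MASK)
    return line


# Made-up token used only for this illustration.
secret = "tok_live_9f8e7d6c5b4a"
encoded = base64.b64encode(secret.encode()).decode()

plain_line = f"DEBUG token={secret}"
encoded_line = f"DEBUG auth header: Basic {encoded}"

# The literal value is masked as expected.
masked_plain = mask_log_line(plain_line, [secret])

# The base64 form is a different string, so exact matching misses it.
masked_encoded = mask_log_line(encoded_line, [secret])

print(masked_plain)
print(masked_encoded)
```

The second line passes through unmasked even though anyone can decode it back to the original token. Any transformation of the secret (encoding, URL escaping, splitting across lines) can defeat a masker that only knows the literal value.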
Third mechanism: environment variable leakage. The pipeline sets a secret as an environment variable, then runs a third-party tool that logs all environment variables as part of its diagnostic output. The secret appears in the build log — visible to anyone with read access to the repository.
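One way to limit this kind of leakage is to hand third-party tools only the variables they need. The sketch below assumes a hypothetical allowlist (ALLOWED_ENV) and hypothetical helper names (minimal_env, run_tool); the right set of variables depends on the tool and the platform.

```python
import os
import subprocess

# Hypothetical allowlist: variables the tool needs to run at all.
ALLOWED_ENV = frozenset({"PATH", "HOME", "LANG"})


def minimal_env(source, allowed=ALLOWED_ENV):
    """Return a copy of `source` containing only allowlisted variables."""
    return {key: value for key, value in source.items() if key in allowed}


def run_tool(cmd, source_env=None):
    """Run `cmd` with a stripped-down environment instead of the full one."""
    env = minimal_env(os.environ if source_env is None else source_env)
    return subprocess.run(cmd, env=env, capture_output=True, text=True, check=False)
```

A call such as run_tool(["some-linter", "--verbose"]), where the command is a placeholder, would start the tool without any pipeline secrets in its environment, so a diagnostic dump of environment variables has nothing sensitive to print. An allowlist is used rather than a blocklist because a blocklist has to anticipate every secret name in advance.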
Why traditional approaches fail
Manual code review doesn’t catch secrets reliably. Reviewers focus on logic and functionality. A base64-encoded token embedded in a YAML file looks like configuration, not a security vulnerability. And even if a reviewer catches it, the secret has already been committed and remains in Git history.
Periodic scanning (quarterly, annually) finds secrets that have been exposed for months. By the time the scan runs, the secret may already have been harvested and used, and the damage done. The window between exposure and detection is the attacker’s advantage.
What actually works
Two controls work together: pre-commit hooks that scan for secret patterns before code enters the repository, and pipeline-integrated scanners (TruffleHog, Gitleaks, GitHub Advanced Security secret scanning) that run on every push and block merges when secrets are detected. These tools combine pattern matching, entropy analysis, and known credential formats to identify secrets with high accuracy.
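The combination of known-format matching and entropy analysis can be sketched in a few lines. This is a toy illustration, not how TruffleHog or Gitleaks are implemented: the two patterns, the candidate regex, and the 4.0 bits-per-character threshold are all assumptions chosen for the example.

```python
import math
import re
from collections import Counter

# Illustrative patterns only; real scanners ship hundreds of rules.
KNOWN_FORMATS = {
    "aws_access_key_id": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    "private_key_header": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}

# Long runs of token-like characters are candidates for the entropy check.
CANDIDATE = re.compile(r"[A-Za-z0-9+/=_\-]{20,}")
ENTROPY_THRESHOLD = 4.0  # assumed cutoff, in bits per character


def shannon_entropy(text: str) -> float:
    """Shannon entropy of `text` in bits per character."""
    counts = Counter(text)
    total = len(text)
    return -sum(n / total * math.log2(n / total) for n in counts.values())


def scan_line(line: str) -> list[str]:
    """Return the names of the rules that `line` triggers."""
    findings = [name for name, rx in KNOWN_FORMATS.items() if rx.search(line)]
    # Flag at most one high-entropy finding per line.
    if any(shannon_entropy(tok) >= ENTROPY_THRESHOLD
           for tok in CANDIDATE.findall(line)):
        findings.append("high_entropy_string")
    return findings
```

Known-format rules catch credentials with a recognizable shape, such as the AKIA prefix in AWS's documented example key AKIAIOSFODNN7EXAMPLE. The entropy check catches random-looking strings that match no known format, at the cost of false positives on hashes and other long identifiers, which is why the threshold needs tuning in practice.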
The critical factor: these must be mandatory gates, not optional checks. A scanner that runs but doesn’t block is a monitoring tool, not a prevention mechanism. The difference matters: monitoring tells you about the problem after it has happened, while prevention stops it from happening.
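In practice the difference between a gate and a monitor often comes down to the exit status of the scan step, since CI systems generally fail a step when its process exits non-zero. A minimal sketch, with a hypothetical function name and a hypothetical blocking flag:

```python
def scan_exit_status(findings: list[str], blocking: bool) -> int:
    """Report findings and return the exit status for the CI step."""
    for finding in findings:
        print(f"secret detected: {finding}")
    # A monitor always exits 0; a gate exits non-zero when anything is found.
    return 1 if (findings and blocking) else 0


# Same findings, two outcomes: the monitor lets the pipeline continue.
monitor_status = scan_exit_status(["aws_access_key_id"], blocking=False)
gate_status = scan_exit_status(["aws_access_key_id"], blocking=True)
```

A real entry point would pass the returned value to sys.exit. With blocking disabled, the finding is printed and the merge proceeds anyway; with it enabled, the step fails and the merge is stopped until the secret is removed and rotated.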