# Wazuh Detection Harness: automated alert validation per ATT&CK technique
A Python tool that queries the Wazuh Indexer REST API after Atomic Red Team tests and tells you — per technique — whether your detection rules actually fired. Not "does the rule exist in the config." Does it produce an alert when the attack runs. That's a different question. This harness answers it.
## Context
A rule can be syntactically valid, deployed to Wazuh, and still never fire in practice. The logsource might not match. The Sysmon config might not capture the required event. The agent might not have the right policy. None of that is visible from the rule file alone. The only way to know a detection works is to trigger the attack, look for the alert, and document whether it appeared. This harness automates that verification step — write a YAML spec, run the harness, get a pass/fail report with evidence artifacts for every test.
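The core of that verification step is a time-bounded query against the alerts index. A minimal sketch of how such a query could be assembled from one test spec, assuming field paths from the standard `wazuh-alerts` mapping (`rule.id`, `rule.groups`, `rule.mitre.id`, `agent.name`); the function and spec key names are illustrative, not the harness's actual API:

```python
from datetime import datetime, timedelta, timezone

def build_alert_query(test_spec: dict) -> dict:
    """Assemble an OpenSearch bool query for one test spec.

    Sketch only: spec keys mirror expected_detections.yaml, and the
    must_contain strings are assumed to be checked client-side after
    the hits come back, so they do not appear in the query itself.
    """
    now = datetime.now(timezone.utc)
    since = now - timedelta(minutes=test_spec.get("lookback_minutes", 30))
    must = [
        {"range": {"timestamp": {"gte": since.isoformat()}}},
        {"term": {"agent.name": test_spec["agent_name"]}},
    ]
    expected = test_spec["expected"]
    if "rule_id" in expected:
        must.append({"term": {"rule.id": expected["rule_id"]}})
    for group in expected.get("rule_groups", []):
        must.append({"term": {"rule.groups": group}})
    for tid in expected.get("mitre_ids", []):
        must.append({"term": {"rule.mitre.id": tid}})
    return {"query": {"bool": {"must": must}}, "size": 50}
```

Keeping the query a pure function of the spec is also what makes `query_debug.json` cheap to produce: the exact request body can be serialized before it is ever sent.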
## Problem

- Language: Python 3
- Input: `expected_detections.yaml` — per-test specs with rule ID, rule groups, MITRE technique IDs, and must-contain strings
- Query target: Wazuh Indexer (OpenSearch) via REST API — `wazuh-alerts-4.x-*` index
- Auth: environment variables (`WAZUH_INDEXER_HOST`, `WAZUH_INDEXER_USER`, `WAZUH_INDEXER_PASS`)
- Output: timestamped run folder with `report.md`, `matches.json`, `query_debug.json`
- Exit code: non-zero if any expected detection fails — CI-compatible
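The environment-variable contract above can be captured in a small loader. A sketch under the assumption that missing credentials should abort the run early, and that `WAZUH_TLS_INSECURE` is the documented opt-in for lab self-signed certs (the function name and returned dict shape are illustrative):

```python
import os

def load_indexer_config(env=os.environ) -> dict:
    """Read Wazuh Indexer connection settings from the environment.

    Fails fast with a clear message if a required variable is unset;
    TLS verification stays on unless explicitly disabled.
    """
    try:
        cfg = {
            "host": env["WAZUH_INDEXER_HOST"],
            "user": env["WAZUH_INDEXER_USER"],
            "password": env["WAZUH_INDEXER_PASS"],
        }
    except KeyError as missing:
        raise SystemExit(f"missing required environment variable: {missing}")
    # Opt-in only: anything other than 'true' keeps verification enabled.
    cfg["verify_tls"] = env.get("WAZUH_TLS_INSECURE", "false").lower() != "true"
    return cfg
```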
## Approach

A declarative YAML spec drives the harness. Each test names the target agent, a lookback window, and the evidence an alert must carry to count as a match:

```yaml
tests:
  - test_name: ART_T1110_001_password_guessing
    platform: windows
    agent_name: WINDOWS-PRIMARY
    lookback_minutes: 30
    expected:
      rule_id: "60204"
      rule_groups:
        - authentication_failures
      must_contain:
        - "logon"

  - test_name: ART_T1003_001_lsass_dump
    platform: windows
    agent_name: WINDOWS-PRIMARY
    lookback_minutes: 30
    expected:
      rule_groups:
        - credential_access
      mitre_ids:
        - T1003
      must_contain:
        - "lsass"
```
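Each returned alert then has to be checked against the `expected` block. A sketch of that client-side match, assuming the alert document follows the standard `wazuh-alerts` schema and that `must_contain` strings are matched case-insensitively against the alert's `full_log` field (both assumptions, not confirmed by the source):

```python
def alert_matches(expected: dict, alert: dict) -> bool:
    """Check one indexed alert (_source document) against an
    'expected' block from the YAML spec. Every constraint that is
    present must hold; absent keys constrain nothing.
    """
    rule = alert.get("rule", {})
    if "rule_id" in expected and str(rule.get("id")) != str(expected["rule_id"]):
        return False
    # Expected groups / MITRE IDs must be a subset of what the rule carries.
    if not set(expected.get("rule_groups", [])) <= set(rule.get("groups", [])):
        return False
    if not set(expected.get("mitre_ids", [])) <= set(rule.get("mitre", {}).get("id", [])):
        return False
    full_log = alert.get("full_log", "").lower()
    return all(s.lower() in full_log for s in expected.get("must_contain", []))
```

Treating absent keys as "no constraint" is what lets the second spec above omit `rule_id` and match on groups plus MITRE ID alone.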
## Evidence

```bash
export WAZUH_INDEXER_HOST='localhost'
export WAZUH_INDEXER_USER='admin'
export WAZUH_INDEXER_PASS='[REDACTED_INTERNAL]'
export WAZUH_TLS_INSECURE='true'

# From project root — runs Atomic tests on the endpoint first,
# then calls the harness to check whether detections fired:
./scripts/new_run.sh

# Output: run_02-19-2026_HHMMSS/
#   report.md        — pass/fail per technique
#   matches.json     — raw alert matches
#   query_debug.json — exact OpenSearch queries sent
```
## Outcome
Run from 2026-02-19, 3 tests against the primary Windows endpoint:
Run: run_02-19-2026_034113
Result: 0/3 PASS

| Test | Status | Expected | Matches |
|---|---|---|---|
| ART_T1110_001_password_guessing | FAIL | rule_id=60204; groups=auth_fail | 0 |
| ART_T1059_001_powershell | FAIL | groups=powershell; mitre=T1059 | 0 |
| ART_T1003_001_lsass_dump | FAIL | groups=credential_access; T1003 | 0 |
0/3 PASS is useful information. It tells you the lookback window was too narrow, the Atomic test artifacts didn't match the detection logic, or the Sysmon/agent config isn't capturing these events. That's a detection gap analysis, not a failure. A SOC that can run this harness and get 0/3 is in a better position than a SOC with no harness — because it now knows specifically where coverage is missing.
The SOC report from the same sprint (a separate detection validation tool) achieved 1/5 PASS — T1110 password guessing fired rule 60204 on the primary Windows endpoint at exactly 2026-02-19T02:20:25.678+0000, alert ID 1771467625.2810738. That match was captured and preserved.
## Why 0/3 is worth showing
Detection engineering portfolios often show only the successes. This shows the tooling, the methodology, and the honest result. 0/3 in a test run is a detection gap report. The value is in the process:
- The harness exists and is runnable
- The spec is version-controlled and reproducible
- The run artifacts are timestamped and preserved
- The gaps are now addressable — tune detection rules and re-run
## Lessons + next hardening step
- No active response — harness only reads from Wazuh Indexer, never writes or modifies the environment
- No Atomic execution — harness validates detections, does not run attacks; separation of concerns
- CI-compatible exit codes — non-zero on any FAIL; can gate CI pipelines on detection coverage
- Timestamped run folders — each run is isolated; history is preserved for trend analysis
- YAML spec — version-controlled, reviewable, extensible without modifying Python code
- TLS insecure flag — explicit opt-in for lab self-signed certs; not default
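The CI-gating and run-folder conventions above can be sketched together: write the artifacts into a timestamped directory, then surface coverage as the process exit code. The `results` shape (`{test_name: {"passed": bool, "matches": [...]}}`) and the function name are hypothetical, not the harness's actual schema:

```python
import json
from datetime import datetime
from pathlib import Path

def write_run_report(results: dict, out_root: Path) -> int:
    """Persist matches.json and a minimal report.md into a
    run_MM-DD-YYYY_HHMMSS folder; return 0 only if every test passed,
    so callers can sys.exit() the result and gate a CI pipeline on it.
    """
    run_dir = out_root / datetime.now().strftime("run_%m-%d-%Y_%H%M%S")
    run_dir.mkdir(parents=True, exist_ok=True)
    (run_dir / "matches.json").write_text(json.dumps(results, indent=2))
    passed = sum(1 for r in results.values() if r["passed"])
    lines = [f"Result: {passed}/{len(results)} PASS", ""]
    for name, r in results.items():
        status = "PASS" if r["passed"] else "FAIL"
        lines.append(f"| {name} | {status} | {len(r['matches'])} |")
    (run_dir / "report.md").write_text("\n".join(lines) + "\n")
    return 0 if passed == len(results) else 1
```

Because each run gets its own folder, the 0/3 report above stays on disk unchanged after rules are tuned and the harness is re-run, which is what makes trend analysis possible.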