Overview: Decision-Level Observability for Autonomous Agents
Autonomous AI agents make hundreds of decisions in a single run. They select tools, interpret results, escalate permissions, branch logic paths, and interact with external systems — all without human intervention. When something goes wrong, a simple success/failure log is not enough. You need to understand why the agent made each decision and what context drove it there.
The Run Trace Analyzer provides decision-level observability for every autonomous agent run in your environment. Rather than treating agent executions as opaque black boxes, it captures a structured trace of every step: the action taken, the reasoning behind it, the tools invoked, the permissions exercised, and the outcome produced. This gives security teams, compliance officers, and platform engineers a complete forensic record of agent behaviour.
Traditional application logs tell you what happened. Run traces tell you what the agent was thinking when it happened — and whether that thinking was sound.
The analyzer serves two audiences. Security teams use it to detect anomalous agent behaviour in real time — permission escalation, data exfiltration patterns, unexpected tool sequences. Engineering teams use it post-run to debug failures, validate agent logic, and ensure compliance with organisational policies. Both benefit from a single, structured trace format that is machine-parseable and human-readable.
Trace Format
Every agent run produces a structured JSON trace document. The format is designed for both automated analysis and manual review, with each decision point captured as a discrete step within an ordered array. Here is a representative trace from a production triage agent:
{
  "run_id": "run-a4f8c312-9e7b",
  "agent_id": "prod-triage-au-east",
  "trigger": "alert:cpu-spike-web-03",
  "started_at": "2026-03-05T14:22:01.440Z",
  "completed_at": "2026-03-05T14:22:38.112Z",
  "status": "completed",
  "policy_version": "v2.4.1",
  "steps": [
    {
      "step_index": 0,
      "action": "assess_alert",
      "reasoning": "Incoming CPU spike alert on web-03. Checking current metrics to determine severity before escalating.",
      "tool_call": {
        "tool": "metrics.query",
        "parameters": {
          "host": "web-03",
          "metric": "cpu.usage",
          "window": "5m"
        }
      },
      "result": {
        "status": "success",
        "data": { "avg_cpu": 94.2, "peak_cpu": 99.1 }
      },
      "permissions_used": ["metrics:read"],
      "timestamp": "2026-03-05T14:22:02.881Z",
      "duration_ms": 1441
    },
    {
      "step_index": 1,
      "action": "identify_process",
      "reasoning": "CPU at 94.2% average over 5 minutes is critical. Identifying the top consuming process to determine if this is expected load or anomalous.",
      "tool_call": {
        "tool": "ssh.exec",
        "parameters": {
          "host": "web-03",
          "command": "ps aux --sort=-%cpu | head -5"
        }
      },
      "result": {
        "status": "success",
        "data": { "top_process": "node /app/worker.js", "cpu_pct": 91.3 }
      },
      "permissions_used": ["ssh:exec:read-only"],
      "timestamp": "2026-03-05T14:22:05.220Z",
      "duration_ms": 2339
    },
    {
      "step_index": 2,
      "action": "restart_service",
      "reasoning": "worker.js is the primary consumer. Process has been running for 72 hours without restart, likely a memory leak causing CPU pressure. Restarting the service per runbook SOP-114.",
      "tool_call": {
        "tool": "service.restart",
        "parameters": {
          "host": "web-03",
          "service": "app-worker"
        }
      },
      "result": {
        "status": "success",
        "data": { "new_pid": 48291, "cpu_after": 12.4 }
      },
      "permissions_used": ["service:restart", "ssh:exec:write"],
      "timestamp": "2026-03-05T14:22:14.005Z",
      "duration_ms": 8785
    }
  ],
  "permissions_summary": {
    "granted": ["metrics:read", "ssh:exec:read-only", "service:restart", "ssh:exec:write"],
    "denied": [],
    "escalated": ["ssh:exec:write"]
  }
}
Every step includes a reasoning field that captures the agent's internal rationale. This is not optional metadata — it is the primary audit artifact. Without it, a trace is just a log. With it, a trace is evidence of intent.
The permissions_summary block at the top level aggregates all permissions exercised during the run, making it straightforward for policy engines to cross-reference against the agent's assigned permission boundary. Any permission that was escalated (requested beyond the agent's baseline grant) is flagged explicitly.
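Given this format, the cross-reference against a permission boundary can be sketched in a few lines of Python. The trace shape matches the example above; the baseline grant set shown here is an assumption for illustration, not a documented configuration.

```python
import json

def find_escalations(trace, baseline):
    """Return (step_index, action, permission) for every permission
    exercised during the run that falls outside the baseline grant."""
    escalations = []
    for step in trace.get("steps", []):
        for perm in step.get("permissions_used", []):
            if perm not in baseline:
                escalations.append((step["step_index"], step["action"], perm))
    return escalations

# Abbreviated trace in the format shown above.
trace = json.loads("""{
  "steps": [
    {"step_index": 0, "action": "assess_alert",
     "permissions_used": ["metrics:read"]},
    {"step_index": 2, "action": "restart_service",
     "permissions_used": ["service:restart", "ssh:exec:write"]}
  ]
}""")
# Hypothetical baseline grant for the triage agent.
baseline = {"metrics:read", "ssh:exec:read-only", "service:restart"}
print(find_escalations(trace, baseline))
# → [(2, 'restart_service', 'ssh:exec:write')]
```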
Anomaly Detection
The Run Trace Analyzer applies a set of heuristic and rule-based checks against each trace to surface suspicious behaviour. These checks run both in real time (as steps are emitted) and during post-run analysis. The following categories of anomaly are detected:
Permission escalation. The analyzer compares every permissions_used entry against the agent's configured permission boundary. If an agent with read-only SSH access suddenly exercises ssh:exec:write, this is flagged as an escalation event. Escalations are not inherently malicious — some agents are allowed to request elevated permissions through approval workflows — but every escalation is recorded and scored.
Unusual tool sequences. Agents tend to follow predictable patterns: query metrics, then read logs, then take action. The analyzer builds a baseline tool-sequence profile for each agent and flags deviations. If a deployment agent suddenly invokes a database export tool it has never used before, that deviation is surfaced with the sequence context that led to it.
Excessive retries. An agent that retries the same tool call more than a configurable threshold (default: three attempts) may be brute-forcing credentials, probing for access, or stuck in a failure loop. The analyzer distinguishes between retries with identical parameters (likely a bug or connectivity issue) and retries with mutating parameters (potentially adversarial).
Data exfiltration patterns. The analyzer watches for sequences that resemble data staging and extraction: large data reads followed by writes to external endpoints, encoding operations, or file transfers to destinations outside the organisation's trusted network list. These patterns are matched against a configurable ruleset that accounts for legitimate ETL workflows.
Deviation from expected behaviour. Each agent can be associated with a behavioural policy — a declarative specification of what the agent should and should not do. The analyzer evaluates each step's action and reasoning against this policy and flags contradictions. For example, if a triage agent's policy states "never delete resources in production" but the trace shows a resource.delete call targeting a production host, the deviation is flagged regardless of whether the call succeeded.
Anomaly detection is not about blocking agents in the moment — that is the job of the policy engine. Trace analysis is about building the evidentiary record that tells you whether your policy engine is working as intended.
Analysis Modes
The Run Trace Analyzer supports two distinct operational modes, each suited to different workflows.
Real-time monitoring connects to the agent runtime's event stream and analyses steps as they are emitted. This mode is designed for security operations centres and on-call engineers who need immediate visibility into what agents are doing right now. Real-time monitoring surfaces anomalies within seconds of occurrence and can trigger alerts via webhook, Slack, PagerDuty, or any configured notification channel. It is particularly valuable during incident response, when agents may be operating under elevated permissions and reduced oversight.
Post-run analysis operates on completed trace files. This mode is used for compliance auditing, periodic security reviews, and forensic investigation. Post-run analysis can apply more computationally expensive checks — cross-referencing traces against historical baselines, running behavioural policy evaluations, and generating detailed reports. It also supports batch analysis across hundreds or thousands of traces, enabling trend detection and fleet-wide behavioural profiling.
CLI Usage
The SecuRight CLI provides direct access to trace analysis from the terminal. The two primary commands are trace analyze for post-run analysis and trace watch for real-time monitoring.
Analyse a completed trace file:
$ securight trace analyze run-12345.json

Run Trace Analysis: run-12345
  Agent:    prod-triage-au-east
  Duration: 36.67s (12 steps)
  Status:   completed

Permissions:
  Baseline:  metrics:read, ssh:exec:read-only
  Used:      metrics:read, ssh:exec:read-only, service:restart, ssh:exec:write
  Escalated: service:restart, ssh:exec:write

Anomalies:
  [WARN] Step 2: Permission escalation — ssh:exec:write not in baseline grant
  [WARN] Step 2: Permission escalation — service:restart not in baseline grant
  [INFO] Step 7: Retry detected — metrics.query called 2x with identical params

Behavioural Policy: SOP-114 (triage-standard)
  [PASS] No prohibited actions detected
  [PASS] All tool calls within approved tool list
  [WARN] Escalated permissions require manual approval audit

Summary: 2 warnings, 0 critical — review escalation approvals
Watch a running agent in real time:
$ securight trace watch --agent prod-triage --format compact
Watching agent: prod-triage-au-east (run-c891f...)
14:22:02 [step 0] assess_alert metrics.query OK metrics:read
14:22:05 [step 1] identify_process ssh.exec OK ssh:exec:read-only
14:22:14 [step 2] restart_service service.restart OK service:restart [ESCALATED]
14:22:19 [step 3] verify_recovery metrics.query OK metrics:read
14:22:25 [step 4] close_alert alerting.resolve OK alerts:write
^C
Disconnected. Partial trace saved to: ~/.securight/traces/run-c891f-partial.json
Batch analysis across a directory of traces:
$ securight trace analyze ./traces/2026-03/ --batch --output report.json

Analysing 342 trace files...
  Critical anomalies: 3
  Warnings:           47
  Clean runs:         292
Report written to: report.json
Dashboard Concepts
The trace dashboard provides a visual interface for exploring agent runs. It is organised around four primary views.
Timeline view. A horizontal timeline showing every step in a run, colour-coded by action type. Each step is a clickable node that expands to reveal the full step detail: reasoning, tool call parameters, result data, and permissions. Steps flagged with anomalies are highlighted with a warning or critical indicator. The timeline makes it immediately obvious where an agent spent most of its time and where the decision flow branched or stalled.
Decision tree. For agents that evaluate multiple options before selecting an action, the decision tree view renders the branching logic visually. Each node represents a decision point where the agent considered alternatives, and the selected branch is highlighted. Rejected branches are shown in muted tones with their rejection reasoning displayed on hover. This view is invaluable for understanding why an agent chose one remediation strategy over another.
Permission heatmap. A matrix view with agents on one axis and permissions on the other. Cell intensity indicates frequency of use over a selected time window. This view reveals which agents are exercising which permissions most frequently and helps identify agents that have been granted permissions they never use (indicating over-provisioning) or agents that frequently escalate the same permission (indicating a baseline grant that needs updating).
Anomaly highlights. A filtered feed showing only flagged steps across all recent runs. Each entry links directly to the relevant step in the full trace view. Anomalies are categorised by severity and type, and the feed supports filtering by agent, anomaly type, time range, and resolution status. Security teams typically use this as their primary monitoring view during business hours.
Forensic Use Cases
Run traces become critical evidence in three primary forensic scenarios.
Incident investigation. When an autonomous agent causes an outage, corrupts data, or interacts with a compromised system, the trace provides a step-by-step reconstruction of exactly what happened. Investigators can follow the agent's reasoning chain from the initial trigger through every decision point to the final outcome. Because each step includes the agent's stated reasoning alongside the actual action taken, investigators can determine whether the agent behaved as designed (a policy failure) or deviated from its expected behaviour (a potential compromise or software defect). In regulated Australian industries such as financial services and healthcare, this level of reconstruction is increasingly expected by regulators.
Compliance auditing. Frameworks like the Australian Privacy Act, APRA CPS 234, and the Essential Eight require organisations to demonstrate control over automated systems that access sensitive data. Run traces provide the audit trail that proves agents only accessed data they were authorised to access, only performed actions within their approved scope, and operated under appropriate oversight. The batch analysis mode enables auditors to review thousands of runs systematically, generating compliance reports that map agent behaviour to specific control requirements.
Agent behaviour validation. Before promoting an agent to production or expanding its permission boundary, engineering teams use trace analysis to validate that the agent behaves correctly across a range of scenarios. By analysing traces from staging or shadow-mode runs, teams can verify that the agent's decision-making aligns with organisational expectations. This validation process catches subtle issues that unit tests miss — for instance, an agent that technically follows its policy but makes decisions in an inefficient or risky order that could lead to problems under production conditions.
A trace is not just a debugging tool. It is the organisational memory of every decision an autonomous agent has ever made. Treat it accordingly — store it securely, retain it according to policy, and index it for rapid retrieval.