# Observability

Prometheus metrics, status fields, and monitoring the Remediator Agent in production.

## Prometheus Metrics
The Remediator Agent exposes Prometheus metrics at the controller manager’s metrics endpoint.
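### Available Metrics

| Metric | Type | Labels | Description |
|---|---|---|---|
| `remediator_reconciles_total` | Counter | `result` (`success` or `error`) | Total number of reconciliation runs |
| `remediator_reconcile_duration_seconds` | Histogram | `result` (`success` or `error`) | Duration of each reconciliation run, in seconds |

Like any Prometheus histogram, `remediator_reconcile_duration_seconds` is exposed as `_bucket`, `_sum`, and `_count` series. Latency percentile queries use the `_bucket` series.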
### Enable ServiceMonitor

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: go-agent-remediator-metrics
  namespace: go-agent-remediator-system
spec:
  selector:
    matchLabels:
      control-plane: controller-manager
  endpoints:
    - port: https
      path: /metrics
      scheme: https
      tlsConfig:
        insecureSkipVerify: true
```
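### Access Metrics Directly

```bash
# Terminal 1: forward the metrics port (this command keeps running)
kubectl -n go-agent-remediator-system port-forward \
  deploy/go-agent-remediator-controller-manager 8443:8443

# Terminal 2: create a short-lived token for the controller's service account
# and scrape the endpoint (-k skips TLS certificate verification)
SA=go-agent-remediator-controller-manager
NS=go-agent-remediator-system
TOKEN=$(kubectl -n "$NS" create token "$SA")
curl -k -H "Authorization: Bearer $TOKEN" https://localhost:8443/metrics
```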
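If Prometheus is not scraping the agent yet, a single scrape is still enough for a rough success ratio. The function below is a minimal sketch; `reconcile_success_ratio` is a hypothetical helper, not part of the agent. It assumes the standard Prometheus text exposition format and label values without spaces. Because it reads raw counter values, the ratio covers every reconcile since the controller last restarted rather than a time window like a `rate()` query.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reads Prometheus text-format metrics on stdin and
# prints the fraction of reconciles whose result label is "success".
# Assumes samples look like: remediator_reconciles_total{result="success"} 42
reconcile_success_ratio() {
  awk '
    # Only counter samples; "# HELP" and "# TYPE" lines start with "#".
    /^remediator_reconciles_total[{]/ {
      total += $2
      if ($1 ~ /result="success"/) ok += $2
    }
    END {
      if (total == 0) { print "no reconciles recorded" > "/dev/stderr"; exit 1 }
      printf "%.4f\n", ok / total
    }'
}
```

With the port-forward running, pipe the scrape into it: `curl -sk -H "Authorization: Bearer $TOKEN" https://localhost:8443/metrics | reconcile_success_ratio`.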
### Example Queries

```promql
# Success rate over the last hour
sum(rate(remediator_reconciles_total{result="success"}[1h]))
  / sum(rate(remediator_reconciles_total[1h]))

# P95 reconciliation latency over the last hour
histogram_quantile(0.95,
  sum by (le) (rate(remediator_reconcile_duration_seconds_bucket[1h]))
)
```
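To see what the P95 query computes, the hypothetical `estimate_quantile` helper below applies the same linear interpolation as `histogram_quantile` to one scrape of the bucket series. It sums buckets across `result` (like `sum by (le)`), finds the first bucket whose cumulative count reaches `q × total`, and interpolates inside that bucket. This is a sketch for offline inspection that assumes the standard text exposition format and a `sort` that supports `-g` (GNU or BSD). Because it uses raw cumulative counts, it reflects all reconciles since the controller started, not just the last hour.

```bash
#!/usr/bin/env bash
# Hypothetical helper: estimate quantile Q (e.g. 0.95) of
# remediator_reconcile_duration_seconds from Prometheus text-format input on
# stdin, mimicking histogram_quantile(Q, sum by (le) (...)) on raw counts.
estimate_quantile() {
  local q="$1"
  grep '^remediator_reconcile_duration_seconds_bucket[{]' \
    | sed -E 's/.*le="([^"]+)".*[}][[:space:]]+([^[:space:]]+).*/\1 \2/' \
    | awk '{ if ($1 == "+Inf") $1 = "1e308"; sum[$1] += $2 }
           END { for (le in sum) print le, sum[le] }' \
    | LC_ALL=C sort -g \
    | awk -v q="$q" '
        { le[NR] = $1 + 0; cum[NR] = $2 + 0 }
        END {
          n = NR
          if (n < 2) { print "need at least two buckets" > "/dev/stderr"; exit 1 }
          rank = q * cum[n]   # cum[n] is the +Inf bucket, i.e. the total count
          b = n
          for (i = 1; i < n; i++) if (cum[i] >= rank) { b = i; break }
          # Rank falls in the +Inf bucket: report the highest finite bound.
          if (b == n) { printf "%g\n", le[n - 1]; exit }
          lower = (b == 1) ? 0 : le[b - 1]
          below = (b == 1) ? 0 : cum[b - 1]
          printf "%g\n", lower + (le[b] - lower) * (rank - below) / (cum[b] - below)
        }'
}
```

For example, summed buckets of 2, 6, 10, and 10 for `le` values 0.5, 1, 2, and `+Inf` give a P95 rank of 9.5. That rank falls in the `(1, 2]` bucket, so the estimate is 1 + (2 − 1) × (9.5 − 6) / (10 − 6) = 1.875 seconds.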
---
## Remediator Status
The `Remediator` resource reports detailed status about each run.
```bash
# View full status
kubectl get remediator remediator-argo-hub -n nirmata -o yaml
# View just the last run summary
kubectl get remediator remediator-argo-hub -n nirmata \
  -o jsonpath='{.status.lastRunSummary}' | jq .
```

### Status Fields

| Field | Description |
|---|---|
| `phase` | Current operational phase: `Running`, `Idle`, or `Failed` |
| `lastScheduleTime` | When the last remediation run was scheduled |
| `lastSuccessfulTime` | When the last successful run completed |
| `nextScheduledTime` | When the next run is scheduled |
| `conditions` | Step-by-step workflow tracking, including collector information |
| `lastRunSummary.startTime` / `lastRunSummary.endTime` | Start and end timestamps of the last run |
| `lastRunSummary.status` | Outcome of the last run (success or failure) |
| `lastRunSummary.message` | Human-readable description of the outcome |
| `lastRunSummary.targetsProcessed` | Number of targets scanned |
| `lastRunSummary.violationsFound` | Total number of violations found |
| `lastRunSummary.remediationPlans` | Number of AI-generated remediation plans produced |
| `lastRunSummary.actionsExecuted` | Number of actions taken (for example, pull requests created) |
| `lastRunSummary.errors` | Errors encountered during the run, if any |
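For a cron job or CI gate, `phase` is the simplest signal to check. The hypothetical `remediator_healthy` function below reads that field with `kubectl` JSONPath output and maps the phases listed in the table above to exit codes. It returns 0 for `Running` or `Idle`, 1 for `Failed`, and 2 when the `kubectl` call fails or the phase is missing or unrecognized. This exit-code convention is an assumption chosen for illustration, not something the agent defines.

```bash
#!/usr/bin/env bash
# Hypothetical health check: exit 0 if the Remediator phase is Running or Idle,
# 1 if it is Failed, and 2 if the phase cannot be read or is unrecognized.
remediator_healthy() {
  local name="$1" ns="$2" phase
  if ! phase=$(kubectl get remediator "$name" -n "$ns" \
      -o jsonpath='{.status.phase}'); then
    echo "error: could not read remediator $ns/$name" >&2
    return 2
  fi
  case "$phase" in
    Running|Idle) echo "ok: phase=$phase"; return 0 ;;
    Failed)       echo "unhealthy: phase=Failed"; return 1 ;;
    *)            echo "unknown phase: '$phase'"; return 2 ;;
  esac
}
```

For example, `remediator_healthy remediator-argo-hub nirmata` prints `ok: phase=Idle` and exits 0 between runs.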
### Example Status Query

```bash
kubectl get remediator remediator-argo-hub -n nirmata \
  -o jsonpath='{.status.lastRunSummary}' | jq '{
    status: .status,
    violations: .violationsFound,
    plans: .remediationPlans,
    actions: .actionsExecuted,
    errors: .errors
  }'
```
---
## Logs
```bash
# Follow live logs
kubectl logs -n nirmata -l app.kubernetes.io/name=nirmata-agent -f
# Last 100 lines
kubectl logs -n nirmata -l app.kubernetes.io/name=nirmata-agent --tail=100
```
---
## Support Matrix
| Component | Supported |
|-----------|-----------|
| **Kubernetes** | All CNCF-compliant distributions v1.20+, including on-prem |
| **AI providers** | Nirmata AI (default), AWS Bedrock, Azure OpenAI |
| **GitOps** | ArgoCD |
| **VCS** | GitHub (App & PAT), GitLab (Enterprise & SaaS) |
| **Manifests** | YAML files, simple Helm charts |