Passed with one manual review branch.
Correctness and policy compliance stayed above threshold.
Reviewed by Prometa QA · run at 2026-03-15T15:00:00Z
Testing Lab
Testing Lab combines manual test runs, golden dataset checks, review scoring, execution telemetry, and release readiness checks in one bounded environment.
Correctness and policy compliance stayed above threshold.
Reviewed by Prometa QA · run at 2026-03-15T15:00:00Z
Review copy needs refinement before publish.
Reviewed by Claims Reviewer · run at 2026-03-15T16:30:00Z
Regression breach blocks release readiness.
Reviewed by Planning QA · run at 2026-03-15T18:00:00Z
Evaluation engine
Exact match · semantic similarity · rule-based validation · human review scoring.
Compare version vs version to prevent silent behavior drift before release.
Run agent_v1 vs agent_v2 or workflow branch vs branch under the same dataset.
Asynchronous evaluation runner supports multiple test cases, versions, and score summaries.
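As a rough illustration of the version-vs-version comparison, the sketch below runs two stand-in agents concurrently over one shared dataset and reports a mean exact-match score per version. It is not the product's runner: `Agent`, `evaluate`, `compare`, the `GOLDEN` dataset, and the toy agent bodies are all assumptions, and only exact match is scored.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical type: an "agent" is any async callable from prompt to answer.
Agent = Callable[[str], Awaitable[str]]


def exact_match(expected: str, actual: str) -> float:
    """Return 1.0 when the trimmed, case-folded strings are identical, else 0.0."""
    return 1.0 if expected.strip().lower() == actual.strip().lower() else 0.0


async def evaluate(agent: Agent, dataset: list[tuple[str, str]]) -> float:
    """Run every test case concurrently and return the mean exact-match score."""
    answers = await asyncio.gather(*(agent(prompt) for prompt, _ in dataset))
    scores = [exact_match(expected, got) for (_, expected), got in zip(dataset, answers)]
    return sum(scores) / len(scores) if scores else 0.0


async def compare(agents: dict[str, Agent], dataset: list[tuple[str, str]]) -> dict[str, float]:
    """Score each named version against the same dataset."""
    names = list(agents)
    results = await asyncio.gather(*(evaluate(agents[name], dataset) for name in names))
    return dict(zip(names, results))


# Stand-in agents for illustration only; real versions would call a model.
async def agent_v1(prompt: str) -> str:
    return {"2+2": "4", "capital of France": "Paris"}.get(prompt, "")


async def agent_v2(prompt: str) -> str:
    return {"2+2": "4", "capital of France": "Lyon"}.get(prompt, "")


# Assumed golden dataset of (prompt, expected answer) pairs.
GOLDEN = [("2+2", "4"), ("capital of France", "Paris")]

summary = asyncio.run(compare({"agent_v1": agent_v1, "agent_v2": agent_v2}, GOLDEN))
print(summary)
```

Because both versions see the same dataset, a lower score for `agent_v2` points to behavior drift rather than a change in the test cases.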
Release readiness
Tests passed: yes
Integrations healthy: yes
Governance approved: yes
Rollback path: available
Tests passed: yes
Integrations healthy: yes
Governance approved: yes
Rollback path: available
Tests passed: no
Integrations healthy: yes
Governance approved: no
Rollback path: missing
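One way to read the four checks above is as a gate in which any failed check blocks release. The sketch below assumes exactly that rule, which the page does not state. The `Readiness` class and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Readiness:
    """Hypothetical container for the four readiness checks shown above."""
    tests_passed: bool
    integrations_healthy: bool
    governance_approved: bool
    rollback_available: bool

    def blockers(self) -> list[str]:
        """List the label of every check that is not satisfied."""
        checks = {
            "Tests passed": self.tests_passed,
            "Integrations healthy": self.integrations_healthy,
            "Governance approved": self.governance_approved,
            "Rollback path": self.rollback_available,
        }
        return [label for label, ok in checks.items() if not ok]

    @property
    def ready(self) -> bool:
        # Assumption: a release is ready only when no check is failing.
        return not self.blockers()


# Values taken from the first and last readiness blocks above.
passing = Readiness(True, True, True, True)
failing = Readiness(
    tests_passed=False,
    integrations_healthy=True,
    governance_approved=False,
    rollback_available=False,
)

print(passing.ready, passing.blockers())
print(failing.ready, failing.blockers())
```

Returning the list of blockers, rather than a single boolean, keeps the reason for a blocked release visible to the reviewer.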
Execution trace
Trace `/trace-001` exposes orchestration steps, tool calls, decision points, latency, token cost, and suggested remediation without opening backend internals.
Scenario pack normalized
Forecast API
Peak cluster anomaly detected
1240 in · 318 out · $2.10
model 420 ms · tool 980 ms · orchestration 210 ms
Escalated to planner queue
Planner Review UI
Confidence below threshold
220 in · 91 out · $0.60
model 180 ms · tool 220 ms · orchestration 140 ms
Decision path stayed inside registered tools, applied runtime guardrails, and routed to review whenever confidence or policy state required intervention.
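The per-step figures in the trace can be rolled up into totals. The sketch below is a minimal illustration using the two steps shown. The `TraceStep` structure and `summarize` helper are assumptions, not the trace's actual schema, and summing latency assumes the steps ran one after another.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class TraceStep:
    """Hypothetical record for one orchestration step in a trace."""
    name: str
    tokens_in: int
    tokens_out: int
    cost_usd: Decimal
    model_ms: int
    tool_ms: int
    orchestration_ms: int

    @property
    def latency_ms(self) -> int:
        """Total latency of the step across model, tool, and orchestration."""
        return self.model_ms + self.tool_ms + self.orchestration_ms


def summarize(steps: list[TraceStep]) -> dict[str, object]:
    """Aggregate tokens, cost, and latency; expects at least one step."""
    return {
        "tokens_in": sum(s.tokens_in for s in steps),
        "tokens_out": sum(s.tokens_out for s in steps),
        # Decimal avoids floating-point rounding when adding currency amounts.
        "cost_usd": sum((s.cost_usd for s in steps), Decimal("0")),
        # Assumption: steps are sequential, so latencies add up.
        "latency_ms": sum(s.latency_ms for s in steps),
        "slowest_step": max(steps, key=lambda s: s.latency_ms).name,
    }


# Figures copied from the two steps of the trace above.
TRACE_001 = [
    TraceStep("Scenario pack normalized", 1240, 318, Decimal("2.10"), 420, 980, 210),
    TraceStep("Escalated to planner queue", 220, 91, Decimal("0.60"), 180, 220, 140),
]

totals = summarize(TRACE_001)
print(totals)
```

For these two steps the totals are 1460 tokens in, 409 tokens out, $2.70, and 2150 ms. The first step is the slowest, mostly because of its 980 ms tool call.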
The planner scenario payload is missing one required pricing field. Suggested fix: add schema validation in the input handler and inject a default pricing floor before review.
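The suggested remediation could look roughly like the sketch below. Every name in it is hypothetical: the field names in `REQUIRED_FIELDS`, the `pricing_floor` key, and the placeholder `DEFAULT_PRICING_FLOOR` value are assumptions, since the trace does not say which pricing field was missing or what the floor should be.

```python
from decimal import Decimal

# Hypothetical schema: the real payload's field names are not shown in the trace.
REQUIRED_FIELDS = ("scenario_id", "region", "pricing_floor")
# Placeholder value; the actual default floor would come from pricing policy.
DEFAULT_PRICING_FLOOR = Decimal("0.00")


def validate_payload(payload: dict) -> tuple[dict, list[str]]:
    """Return a normalized copy of the payload plus any warnings.

    A missing pricing floor is filled with the default and reported as a
    warning. Any other missing required field raises ValueError.
    """
    normalized = dict(payload)  # never mutate the caller's payload
    warnings: list[str] = []

    if normalized.get("pricing_floor") is None:
        normalized["pricing_floor"] = DEFAULT_PRICING_FLOOR
        warnings.append("pricing_floor missing; default floor injected")

    missing = [field for field in REQUIRED_FIELDS if field not in normalized]
    if missing:
        raise ValueError(f"missing required fields: {', '.join(missing)}")

    return normalized, warnings


clean, notes = validate_payload({"scenario_id": "s-1", "region": "eu"})
print(clean["pricing_floor"], notes)
```

Surfacing the injected default as a warning, instead of filling it silently, lets the reviewer see that the payload arrived incomplete.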