Part of Forge DevKit ecosystem
◇ forge-qa
Tests that trace to requirements
The problem
AI writes tests that test nothing
Unit tests assert against their own mocks. Displays render hardcoded mock data. Suites pass without verifying real behavior.
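A minimal sketch of the failure mode, in plain TypeScript (the mocking helper below is an illustrative stand-in, not a real `vi.fn()`/`jest.fn()` call): a "fake" test mocks the unit under test itself and then asserts the mock's own canned value, so it can never fail.

```typescript
// Illustrative stand-in for a mocking helper (real suites would use vi.fn() or jest.fn()).
function mockReturning<T>(value: T): () => T {
  return () => value;
}

// FAKE: the mock replaces the unit under test, then the assertion checks
// the mock's canned value. This passes no matter how broken the real code is.
const computeTotal = mockReturning(25);
console.assert(computeTotal() === 25, "always passes; verifies nothing");

// REAL: the actual implementation is exercised, including an edge case.
function computeCartTotal(prices: number[]): number {
  return prices.reduce((sum, p) => sum + p, 0);
}
console.assert(computeCartTotal([]) === 0, "empty cart totals zero");
console.assert(computeCartTotal([10, 15]) === 25, "sums line items");
```

The fake variant is what a judge-less generator tends to emit: green checkmarks with zero verified behavior.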
No traceability to requirements
You can't tell which test covers which acceptance criterion. Gaps are invisible.
Test strategy is an afterthought
AI generates random tests. No coverage plan, no prioritization, no framework consistency.
How it works
Setup
Test auditor scans your project: framework, patterns, coverage tooling, maturity level.
Generate
From product artifacts or code analysis — unit, integration, component, E2E, and acceptance tests.
Trace
4-level traceability: AC (acceptance criteria)→unit, UC (use cases)→E2E, UX→component, plus LLM-as-Judge quality checks. Every test maps to a requirement.
Judge
LLM-as-Judge evaluates test quality against rubrics. Catches fake mocks and meaningless assertions.
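The trace step can be pictured as a mapping from requirements to tests. The record shape and requirement IDs below are hypothetical, not forge-qa's actual tagging format; the point is that once every test declares the requirement it covers, uncovered requirements fall out mechanically.

```typescript
// Hypothetical traced-test record: each test declares the requirement it covers.
// AC = acceptance criterion (unit), UC = use case (E2E), UX = UX spec (component).
interface TracedTest {
  file: string;
  requirement: string; // e.g. "AC-3"
  name: string;
}

const tests: TracedTest[] = [
  { file: "coupon.test.ts",  requirement: "AC-3", name: "rejects expired coupon" }, // AC→unit
  { file: "checkout.e2e.ts", requirement: "UC-1", name: "checkout happy path" },    // UC→E2E
  { file: "banner.test.tsx", requirement: "UX-2", name: "error banner renders" },   // UX→component
];

// The traceability matrix: requirements with no traced test are the
// gaps that line coverage alone never surfaces.
const requirements = ["AC-1", "AC-3", "UC-1", "UX-2"];
const covered = new Set(tests.map(t => t.requirement));
const gaps = requirements.filter(r => !covered.has(r));
console.log(gaps); // ["AC-1"]
```

Here `AC-1` has no traced test, so it surfaces as a gap even though every existing test passes.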
Key capabilities
◇4-level traceability
AC→unit tests, UC→E2E tests, UX→component tests, LLM-as-Judge for quality.
◇8+ test frameworks
Vitest, Jest, Playwright, Cypress, Testing Library, Supertest, and more. Auto-detected.
◇LLM-as-Judge
Rubric-based evaluation catches fake tests, meaningless mocks, and missing edge cases.
◇Product artifact integration
When forge-product artifacts exist, tests are generated from requirements; without them, from code analysis.
◇10 execution modes
Unit, integration, component, E2E, acceptance, coverage, plan, generate, quality, upgrade.
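What rubric-based evaluation might look like as data. This is a hypothetical sketch, not forge-qa's actual rubric schema: weighted criteria turn a judge's per-criterion verdicts into a quality score that flags suspect tests instead of just letting them pass.

```typescript
// Hypothetical rubric: weighted criteria an LLM judge scores per test
// (binary verdicts here for simplicity).
interface Criterion { id: string; text: string; weight: number }

const rubric: Criterion[] = [
  { id: "behavior", text: "asserts real behavior, not the mock's canned values", weight: 3 },
  { id: "edges",    text: "covers at least one edge case per branch",            weight: 2 },
  { id: "message",  text: "failure message identifies the broken requirement",   weight: 1 },
];

// Judge verdicts for one test (would come from the LLM in practice).
const verdicts: Record<string, 0 | 1> = { behavior: 1, edges: 0, message: 1 };

const earned = rubric.reduce((sum, c) => sum + c.weight * verdicts[c.id], 0);
const possible = rubric.reduce((sum, c) => sum + c.weight, 0);
const score = earned / possible; // 4/6, below a hypothetical 0.8 threshold
console.log(score >= 0.8 ? "PASS" : "FLAG for review"); // prints "FLAG for review"
```

Weighting matters: a test that aces style criteria but asserts only mock output still scores low, which is exactly the "fake test" case a pass/fail runner cannot catch.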
forge-qa vs Virtuoso / TestSprite
See the full comparison for details.
| Dimension | Virtuoso / TestSprite | Forge DevKit |
|---|---|---|
| Test source | AI guesses from code | Traces to acceptance criteria and use cases |
| Quality check | None — tests just need to pass | LLM-as-Judge evaluates against rubrics |
| Coverage map | Line coverage only | Requirement-level traceability matrix |