Part of Forge DevKit ecosystem

forge-qa

Tests that trace to requirements

The problem

AI writes tests that test nothing

Unit tests are fake. UIs render hardcoded mock data. Suites pass without verifying real behavior.

No traceability to requirements

You can't tell which test covers which acceptance criterion. Gaps are invisible.

Test strategy is an afterthought

AI generates random tests. No coverage plan, no prioritization, no framework consistency.

How it works

1. Setup

The test auditor scans your project: framework, patterns, coverage tooling, maturity level.

2. Generate

From product artifacts or code analysis: unit, integration, component, E2E, and acceptance tests.

/forge:qa test authentication

3. Trace

4-level traceability: AC→unit, UC→E2E, UX→component. Every test maps to a requirement.

4. Judge

LLM-as-Judge evaluates test quality against rubrics and catches fake mocks and meaningless assertions.
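The traceability step above can be pictured as a simple data structure. This is an illustrative sketch only: the names (`TestRecord`, `coverageGaps`) and shapes are assumptions for the example, not forge-qa's actual internals or API.

```typescript
// Each generated test records which requirement IDs it covers
// (e.g. acceptance criteria "AC-3", use cases "UC-1").
type Level = "unit" | "e2e" | "component";

interface TestRecord {
  name: string;
  level: Level;
  covers: string[]; // requirement IDs this test traces to
}

// Given all requirement IDs and the generated tests,
// report requirements that no test traces to.
function coverageGaps(requirements: string[], tests: TestRecord[]): string[] {
  const covered = new Set(tests.flatMap((t) => t.covers));
  return requirements.filter((r) => !covered.has(r));
}

const tests: TestRecord[] = [
  { name: "rejects invalid password", level: "unit", covers: ["AC-1"] },
  { name: "login flow end-to-end", level: "e2e", covers: ["UC-1", "AC-1"] },
];

console.log(coverageGaps(["AC-1", "AC-2", "UC-1"], tests)); // → ["AC-2"]
```

Because every test carries its requirement IDs, a gap is a set difference rather than a guess — which is what makes uncovered acceptance criteria visible.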

Key capabilities

4-level traceability

AC→unit tests, UC→E2E tests, UX→component tests, LLM-as-Judge for quality.

8+ test frameworks

Vitest, Jest, Playwright, Cypress, Testing Library, Supertest, and more. Auto-detected.
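Auto-detection of this kind typically works from the project's dependency manifest. A minimal sketch, assuming detection from package.json (the framework list and function name are illustrative, not forge-qa's real auditor):

```typescript
// Frameworks the sketch knows how to recognize (illustrative subset).
const KNOWN_FRAMEWORKS = ["vitest", "jest", "playwright", "cypress", "supertest"];

interface PackageJson {
  dependencies?: Record<string, string>;
  devDependencies?: Record<string, string>;
}

// Return the known test frameworks present in the project's dependencies.
function detectFrameworks(pkg: PackageJson): string[] {
  const deps = { ...pkg.dependencies, ...pkg.devDependencies };
  return KNOWN_FRAMEWORKS.filter((f) => f in deps);
}

const detected = detectFrameworks({
  devDependencies: { vitest: "^2.0.0", playwright: "^1.45.0" },
});
console.log(detected); // → ["vitest", "playwright"]
```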

LLM-as-Judge

Rubric-based evaluation catches fake tests, meaningless mocks, and missing edge cases.
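To make the idea concrete, here is a deliberately crude stand-in for one rubric check — a regex heuristic, not the actual LLM evaluation — flagging tests whose bodies contain no assertion at all. All names are hypothetical:

```typescript
interface RubricResult {
  test: string;
  verdict: "pass" | "fake";
}

// Crude heuristic rubric: a test body with no expect()/assert() call
// verifies nothing. (A real LLM-as-Judge pass would also catch
// meaningless mocks and trivial assertions a regex cannot.)
function judgeAssertionPresence(tests: { name: string; body: string }[]): RubricResult[] {
  return tests.map((t) => ({
    test: t.name,
    verdict: /\b(expect|assert)\s*\(/.test(t.body) ? "pass" : "fake",
  }));
}

const results = judgeAssertionPresence([
  { name: "renders list", body: "render(list);" }, // no assertion: fake
  { name: "adds item", body: "expect(add(1, 2)).toBe(3);" }, // asserts: pass
]);
```

The real rubric evaluation is semantic rather than syntactic, but the output shape is the same: a per-test verdict that gates test quality instead of just test pass/fail.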

Product artifact integration

When forge-product artifacts exist, tests are generated from requirements; without them, from code analysis.

10 execution modes

Unit, integration, component, E2E, acceptance, coverage, plan, generate, quality, upgrade.

forge-qa vs Virtuoso / TestSprite

See the full comparison for details.

Dimension     | Virtuoso / TestSprite         | Forge DevKit
Test source   | AI guesses from code          | Traces to acceptance criteria and use cases
Quality check | None; tests just need to pass | LLM-as-Judge evaluates against rubrics
Coverage map  | Line coverage only            | Requirement-level traceability matrix
Get Forge →