Part of Forge DevKit ecosystem
◇ forge-ab
Test with rigor, not hunches
The problem
Tests launched without statistical rigor
A team runs an A/B test for three days and declares a winner. Sample size: 47 visitors. That's noise, not signal.
No pre-committed hypothesis
Change a button color, measure everything, and find something significant. Classic p-hacking disguised as experimentation.
Test results don't get documented
Nobody remembers what you tested last quarter. Same experiments get repeated. Learnings evaporate.
How it works
Install
One command adds forge-ab to your environment.
Configure
A 3-gate wizard reads your analytics context and establishes experimentation principles.
Experiment
Structured hypothesis, pre-committed sample sizes, isolated variables, documented results.
Learn
Every test produces a structured doc: hypothesis, result, confidence level, and next action. Win or lose, it's searchable.
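The hypothesis → design → analyze → learn flow above implies a structured record per experiment. A minimal sketch of what such a record might look like (field names here are illustrative, not forge-ab's actual schema):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ExperimentRecord:
    """Illustrative shape of a structured, searchable experiment doc."""
    hypothesis: str          # "If [change] then [metric] because [reason]"
    metric: str              # primary metric being measured
    guardrail: str           # condition that must not regress
    sample_size: int         # pre-committed before launch
    result: str = "pending"  # win / loss / inconclusive
    confidence: Optional[float] = None  # filled in at analysis time
    next_action: str = ""    # documented learning for the next experiment

# Example record mirroring the sample output below
record = ExperimentRecord(
    hypothesis="If the CTA turns green then clicks rise 10% because it contrasts more",
    metric="CTA click-through rate on /pricing",
    guardrail="Bounce rate must not increase by > 5%",
    sample_size=3200,
)
```

Because the sample size and hypothesis are fields set before launch, the record itself enforces pre-commitment: analysis fills in `result` and `confidence` later, it never rewrites the design.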
Key capabilities
◇ 3 experiment modes
Hypothesis (structured if/then/because), design (sample size + duration calc), analyze (significance test + documented learning).
◇ Sample size pre-commitment
Calculate required sample size before launch. No early stopping, no p-hacking.
◇ 4 psychology biases
Anchoring to first results, confirmation bias in analysis, novelty effect - surfaced as experiment warnings.
◇ Documented learnings
Every experiment produces structured documentation. Win or lose, knowledge compounds.
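Sample size pre-commitment boils down to a standard two-proportion power calculation. A sketch using only the standard library (the baseline conversion rate is an assumed input here, not something forge-ab is shown to infer):

```python
import math
from statistics import NormalDist

def sample_size_per_arm(baseline: float, mde_rel: float,
                        alpha: float = 0.05, power: float = 0.80) -> int:
    """Visitors needed per arm to detect a relative lift of mde_rel
    over baseline with a two-sided two-proportion test."""
    p1 = baseline
    p2 = baseline * (1 + mde_rel)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # e.g. 1.96 for alpha=0.05
    z_power = NormalDist().inv_cdf(power)           # e.g. 0.84 for power=0.80
    p_bar = (p1 + p2) / 2
    n = ((z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
          + z_power * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
         / (p1 - p2) ** 2)
    return math.ceil(n)

# Assumed 20% baseline CTR, 10% relative MDE, 80% power
n = sample_size_per_arm(baseline=0.20, mde_rel=0.10)
```

Computing this number before launch is what makes "no early stopping" enforceable: the decision date is fixed by `n` and traffic, not by when the results start to look good.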
Sample output
A real-world example of what this module produces.
◆ Experiment: Pricing Page CTA Color
Hypothesis: Changing CTA from blue to green increases clicks by 10%
Metric: CTA click-through rate on /pricing
Guardrail: Bounce rate must not increase by > 5%
Design:
Control: Blue CTA (#2563EB) 50% traffic
Variant: Green CTA (#16A34A) 50% traffic
Sample: 3,200 visitors (MDE 10%, power 80%)
Duration: ~14 days at current traffic
Pre-commit: Decision logged before results - no peeking
Who is this for
Product Manager
Run statistically rigorous experiments with pre-committed hypotheses and sample sizes.
Growth Lead
Document every experiment result - wins and losses compound into organizational knowledge.
Data-Driven Developer
Get concrete experiment specs with sample size calculations instead of gut-feel testing.
forge-ab vs Ad-hoc A/B testing
| Dimension | Ad-hoc A/B testing | forge-ab |
|---|---|---|
| Statistical rigor | Run for a week, pick the winner | Pre-committed sample size, significance threshold |
| Hypothesis | Change it, measure everything, find something significant | Structured: If [change] then [metric] because [reason] |
| Knowledge retention | Results in a Slack thread, then forgotten | Documented learnings that compound across experiments |