The problem
Prompts drift across sessions
Same question, different answers. No consistent framework. Each session reinvents the prompt wheel.
No way to test prompt quality
You change a system prompt and hope it still works. No regression tests, no quality metrics.
Prompt knowledge stays in one person's head
The developer who wrote the prompt leaves. Nobody knows why it's structured that way.
How it works
Install
One command adds forge-prompts to your environment.
Configure
3-gate wizard detects your LLM stack, establishes prompt principles, and selects frameworks (CO-STAR, RISEN, TIDD-EC).
Manage
Inventory all prompts, audit against principles, review for quality, test for regressions.
Evolve
Learning loop captures findings from audits and tests. Principles improve automatically over time.
Key capabilities
◇5 operational modes
Inventory, audit, review, test, evolve. Full lifecycle management for every prompt in your project.
◇3 prompt frameworks
CO-STAR (context-structured), RISEN (role-based), TIDD-EC (task-decomposed) - or define your own. Each enforces a different prompt architecture.
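As a concrete illustration of what a framework enforces, CO-STAR structures a prompt into six sections: Context, Objective, Style, Tone, Audience, and Response format. The builder below is a minimal sketch of that structure; the function and its signature are illustrative assumptions, not the forge-prompts API.

```python
# Illustrative CO-STAR prompt builder. The build_co_star_prompt helper is
# hypothetical -- it is not part of the forge-prompts API.

CO_STAR_SECTIONS = ["Context", "Objective", "Style", "Tone", "Audience", "Response"]

def build_co_star_prompt(**sections: str) -> str:
    """Assemble a prompt from the six CO-STAR sections, in canonical order."""
    missing = [s for s in CO_STAR_SECTIONS if s.lower() not in sections]
    if missing:
        raise ValueError(f"missing CO-STAR sections: {missing}")
    return "\n\n".join(
        f"# {name}\n{sections[name.lower()]}" for name in CO_STAR_SECTIONS
    )

prompt = build_co_star_prompt(
    context="You support the acme-web ticketing system.",
    objective="Summarize the ticket below in two sentences.",
    style="Plain, direct prose.",
    tone="Neutral and professional.",
    audience="Support engineers triaging the queue.",
    response="A single paragraph, no bullet points.",
)
```

Because every section is required, an audit can flag a prompt like `prompts/draft-email.md` above ("no role, no output format") mechanically rather than by eyeballing.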
◇Regression testing
LLM-as-judge tests ensure prompt changes don't break existing behavior. Integrated with forge-qa.
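The LLM-as-judge pattern can be sketched as follows. The judge here is stubbed with a keyword-overlap heuristic so the example runs offline; in practice it would send a rubric plus both outputs to a model. All names, the rubric, and the threshold are assumptions for illustration, not the forge-prompts or forge-qa implementation.

```python
# Sketch of an LLM-as-judge regression check. stub_judge stands in for a
# real model call so the example is self-contained; every name here is
# hypothetical, not forge-prompts API.

from typing import Callable

JUDGE_RUBRIC = (
    "Score the candidate output 1-10 for faithfulness and brevity, "
    "compared against the reference output."
)

def stub_judge(reference: str, candidate: str) -> int:
    """Offline stand-in for an LLM judge: score by shared keywords."""
    ref_words = set(reference.lower().split())
    cand_words = set(candidate.lower().split())
    overlap = len(ref_words & cand_words) / max(len(ref_words), 1)
    return round(overlap * 10)

def regression_test(
    reference: str,
    candidate: str,
    judge: Callable[[str, str], int] = stub_judge,
    threshold: int = 7,
) -> bool:
    """Pass if the judge scores the new output at or above the threshold."""
    return judge(reference, candidate) >= threshold

# A prompt change that preserves the output's content should pass...
ok = regression_test("login page times out on mobile",
                     "the login page times out on mobile devices")
# ...while drifted output should fail.
drifted = regression_test("login page times out on mobile",
                          "please reset your password")
```

The key design point is that the judge is a pluggable callable, so the same regression harness works with a heuristic in CI smoke tests and a real model in full runs.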
◇5 psychology biases
Guards against anchoring to first drafts, confirmation bias in test evaluation, sunk-cost attachment to failing prompts, authority bias toward vendor examples, and framing effects in A/B prompt comparisons.
◇Learning loop
Audit findings become new principles automatically. After 3 cycles, your prompt guidelines reflect real project patterns, not generic best practices.
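The promotion mechanic described above can be sketched in a few lines: findings that recur across audit cycles graduate into principles. The threshold of 3 mirrors the "after 3 cycles" claim; the data shapes and function name are assumptions, not the forge-prompts implementation.

```python
# Minimal sketch of the learning loop: audit findings that recur across
# enough cycles get promoted to principles. Names and shapes are
# illustrative, not forge-prompts internals.

from collections import Counter

PROMOTION_THRESHOLD = 3  # matches "after 3 cycles" in the text above

def evolve_principles(principles: list[str],
                      audit_findings: list[list[str]]) -> list[str]:
    """Promote any finding seen in >= PROMOTION_THRESHOLD audit cycles."""
    counts = Counter(f for cycle in audit_findings for f in set(cycle))
    promoted = [f for f, n in counts.items()
                if n >= PROMOTION_THRESHOLD and f not in principles]
    return principles + sorted(promoted)

history = [
    ["missing negative examples", "no output format"],
    ["missing negative examples"],
    ["missing negative examples", "no output format"],
]
updated = evolve_principles(["always set a role"], history)
# "missing negative examples" recurs in all 3 cycles and is promoted;
# "no output format" appears only twice and stays a finding.
```

Counting distinct cycles (via `set(cycle)`) rather than raw occurrences keeps one noisy audit from inflating a finding into a principle.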
Sample output
A real-world example of what this module produces.
◆ Prompt Audit - acme-web
| File | Framework | Score | Issues |
|---|---|---|---|
| prompts/generate-summary.md | CO-STAR | 9/10 | - |
| prompts/classify-ticket.md | RISEN | 6/10 | missing negative examples |
| prompts/draft-email.md | none | 3/10 | no role, no output format |

Total: 3 prompts | 1 passing | 1 warning | 1 failing

Who is this for
AI Engineer
Manage and version-control prompts with frameworks, audit trails, and regression tests.
Developer Using LLM APIs
Stop ad-hoc prompt writing - get structured frameworks and automated quality checks.
Team Lead
Standardize prompt engineering across the team with shared principles and learning loops.
forge-prompts vs Manual prompt engineering
| Dimension | Manual prompt engineering | forge-prompts |
|---|---|---|
| Prompt management | Scattered across files, no inventory | Full catalog with principles and frameworks |
| Quality assurance | Manual spot-checking | Automated audit + LLM-as-judge regression tests |
| Knowledge retention | In developer's head | Documented principles with learning loop evolution |
| Consistency | Each prompt written ad-hoc | Framework-guided with team-wide principles |