A development copilot built to change behavior, not just deliver content.
I spotted where talent development stalls, then scoped, built, and evaluated a working applied-AI fix, solo, before a single engineer was involved. Real RAG, a real eval harness, and a documented path to production. Live prototype below.
The actual working app, embedded live. Open it full screen to ask a real question and watch it retrieve, cite, refuse the out-of-scope ones, and follow up.
Teams have good content and no signal that it changed anything. The opening: an assistant that meets people at the moment of need, and closes the loop on whether the advice actually landed.
most ai tools stop at delivering content. greenroom is built around the part they skip: behavior change.
An employee asks a real question in plain words. Greenroom answers from a curated people development library using real retrieval (RAG) with citations, or an honest "not covered". It proposes a verb-first next step, then follows up with a "did it land?" check that records the behavior, not just the click. Built solo with Claude Code across six surfaces.
A real question at the moment of need. No course catalog to dig through, no keyword guessing.
Answers come from the curated library with citations. Outside it, the answer is "not covered", not a guess.
Every response ends in one concrete action the person can actually do, not a reading list.
It checks back and records the behavior, not just the click. Kirkpatrick Level 3, built into the flow.
Hosted Voyage embeddings over a governed library, similarity retrieval with a refusal threshold. An agentic loop runs underneath: a cheaper model triages and routes each question, answer, clarify, or refuse, before a stronger model writes the grounded response. Every answer exposes its own wiring on a run sheet.
Hosted Voyage embeddings over a governed library. Similarity retrieval with a refusal threshold, so weak matches get declined instead of faked.
A cheaper model triages and routes each question before a stronger model writes the answer. Right-sized compute per step.
Every answer shows how it was produced: what was retrieved, how it routed, what it grounded against. Nothing hidden.
A labeled evaluation suite runs all 22 cases through the real pipeline: 14 in-scope, 4 out-of-scope, 4 adversarial and prompt-injection. Not vibes, a harness.
Two grounding checks on purpose: 100% deterministic citation integrity in code, plus an independent judge model that scores how well each answer traces to its sources, a mean of 92/100. So the system never grades its own work.
A prototype that knows what production would take.
Spotting the highest-leverage opportunity, scoping it, and building a working prototype solo, before engaging a single engineer. Validating it with a real eval harness. Defining the production handoff: cost and latency per answer, the evals as a release gate, monitoring, a clear buy-vs-build line, and a privacy and fairness review from the start. This is the space between strategy and engineering, owned end to end. Outcomes, not outputs.
A working prototype. Ask it a real question, watch it retrieve, refuse the out-of-scope ones, and follow up on whether the advice landed.
open the live demo view the code on github →