KingdomWatch.dev · Deterministic trust tooling for the AI-code era

AI wrote your code. Who checks the tests?

When AI migrates a test suite, checks quietly disappear while CI stays green. The same is true when people do it. KingdomWatch counts what your tests still actually assert, and hands you the list of what vanished. It's deterministic. No AI grading AI.

I run it on a real migration of yours and walk you through the report, including everything it can't tell you.

Deterministic: Not heuristic
Evidence-first: Receipts, not vibes
AI-neutral: 0 AI in the verdict
1. The Trap

Green CI is telling you a comforting lie.

A converter is paid to say "done." So is an LLM. A passing test run agrees with them. Neither one is built to tell you that the check you used to have is gone.

Migrating React tests from Enzyme to React Testing Library is mechanically easy now. A tool rewrites shallow() into render() all day long. The diff looks like a tidy library swap. Review approves it. CI goes green. Ship it.

Except sometimes the conversion quietly drops a check, or weakens an exact count into a vague "something is there." The test still runs. It still passes. And nothing, anywhere, tells you your coverage just shrank.

Assertion-survival is the one audit a converter structurally cannot run honestly on itself. That is exactly why it slips past even great teams, and why the checker can't be another model.

CI pipeline: what you see
  • Build
  • Tests
  • Coverage
PASSED
What you don't see: expectations · edge cases · negative tests · assertions. Vanishing quietly, one migration at a time.
[Image: pixel art of an aged parchment scroll in a dark dungeon corridor, its edge dissolving into drifting pixel fragments as its writing vanishes.]
checks vanish. confidence lies. risk grows in the shadows.
Adyen/adyen-web #3849 · a payments form
// before (Enzyme): exact count
expect(input).toHaveLength(1);

// after (RTL): presence only
expect(input).toBeTruthy();

// and this check just vanished:
expect(img.prop('alt')).toEqual('VISA');

// a wrong number of card inputs
// now passes the test. Still green.
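
Why the weakened form stays green can be shown without any test framework. This is a minimal sketch in plain JavaScript: `hasLength` and `isTruthy` are hypothetical stand-ins for the core check behind the two matchers above, not Jest's implementation, and the arrays are pretend query results rather than anything taken from the Adyen PR.

```javascript
// Hypothetical stand-ins for the two matchers, reduced to their core check.
const hasLength = (value, n) => value.length === n; // exact count
const isTruthy = (value) => Boolean(value);         // presence only

// Pretend query results: one, two, or zero card inputs found.
const oneInput = ["input"];
const twoInputs = ["input", "input"];
const noInputs = [];

console.log(hasLength(oneInput, 1), isTruthy(oneInput));   // true true
console.log(hasLength(twoInputs, 1), isTruthy(twoInputs)); // false true
// An empty array is still truthy in JavaScript.
console.log(hasLength(noInputs, 1), isTruthy(noInputs));   // false true
```

Whether the empty case applies to a given test depends on what `input` actually holds after the migration, which the snippet above doesn't show. The two-input case is the one the example is about.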
2. The Receipts

I measured it on real, merged, reviewed migrations.

Before trusting AI to do this at scale, I wanted a baseline: how much do careful humans lose, with full code review? I ran an assertion counter over 12 real Enzyme-to-RTL pull requests from well-known open-source projects.

1 in 3 reviewed migration files silently dropped an assertion
7.6% of all assertions quietly vanished
100% of these suites stayed green & passed review
0 AI in the verdict. It just counts.
135 genuine before/after test-file conversions · 9 repos incl. Kiali, Adyen, MetaMask, Foreman, RedisGrafana, AWX, react-plotly. Every number reproducible from the merge SHA.
| What we measured | Result |
| --- | --- |
| Files that demonstrably dropped an assertion (strict) | ~35% (47/135) |
| Files with any dropped or weakened assertion (broad) | ~74% (100/135) |
| Assertions that survived the conversion | 92.4% |
| Assertions that quietly disappeared | 7.6% |
| Suites that went red to warn anyone | 0 |
| Migrations that scored a clean 100% (not crying wolf) | 5 |
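
The file-level percentages can be re-derived from the counts in the table. A small sketch of that arithmetic follows; the variable names are mine, and the raw assertion totals behind the 7.6% figure aren't given on this page, so only its complement is checked.

```javascript
// Figures copied from the table above.
const totalFiles = 135;
const strictFiles = 47;  // demonstrably dropped an assertion
const broadFiles = 100;  // any dropped or weakened assertion

const pct = (part, whole) => ((part / whole) * 100).toFixed(1);

console.log(pct(strictFiles, totalFiles)); // 34.8, reported as ~35%, roughly 1 in 3
console.log(pct(broadFiles, totalFiles));  // 74.1, reported as ~74%
console.log((100 - 7.6).toFixed(1));       // 92.4, the survival rate
```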

That's the human baseline, with real review. An AI-assisted batch migration runs the same mechanism, just faster and larger, with thinner review per file. Nothing about that direction makes the number smaller.

3. How It Works

It counts. That's the whole trick, and the whole point.

A short quest. Deterministic. Repeatable. No model sits in the verdict path. Same input, same number, every run, reproducible by anyone from the same commit. In a world where AI writes the code and reviews the code, the trust layer can't be a third model's opinion.

[Image: pixel art panorama of a knight walking a winding night path connecting four lit watchtowers, the final tower's gate glowing gold.]
Step 1: You bring the migration

The original suite and the converted one: a branch, a diff, a migration PR. Send a repo link or the before/after files, and we fetch what we need.

Step 2: We analyze both worlds

It pairs every test and counts effective assertions, before and after. Assertions, coverage, and intent.
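
To make "counts assertions per test" concrete, here is a rough sketch. It is not the KingdomWatch counter: it assumes a regex pass is good enough for illustration, where a real counter would presumably walk the syntax tree. The function name and the sample suite are hypothetical.

```javascript
// Assumption: a regex pass stands in for real parsing. It will miscount
// "expect(" inside strings or comments, which a syntax-tree walk would not.
function countAssertionsPerTest(source) {
  // Find each it('title', ...) or test('title', ...) and remember where it starts.
  const testStart = /\b(?:it|test)\(\s*(['"`])(.*?)\1/g;
  const starts = [...source.matchAll(testStart)];
  const counts = new Map();
  starts.forEach((match, i) => {
    // A test's text runs from its own start to the start of the next test.
    const end = i + 1 < starts.length ? starts[i + 1].index : source.length;
    const body = source.slice(match.index, end);
    counts.set(match[2], (body.match(/\bexpect\(/g) || []).length);
  });
  return counts;
}

// Hypothetical sample suite, reusing the assertions from the example earlier.
const enzymeSource = `
it('renders one card input', () => {
  expect(input).toHaveLength(1);
  expect(img.prop('alt')).toEqual('VISA');
});
it('shows an error', () => {
  expect(error.text()).toEqual('Invalid');
});
`;

const perTest = countAssertionsPerTest(enzymeSource);
console.log(perTest.get("renders one card input")); // 2
console.log(perTest.get("shows an error"));         // 1
```

Run the same function over the converted file and you have two maps to compare, keyed by test title.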

Step 3: We compare and prove

An intra-file shuffle can't hide a drop; a pre-existing weak test doesn't get blamed on the migration. What changed, what vanished, what matters.
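
One way to get both properties is to compare per test instead of per file. The sketch below assumes tests are paired by title, which is my simplification (a renamed test would show up as a false drop); the function name and the counts are hypothetical.

```javascript
// Compare assertion counts test by test, never as a file-level net total.
function compareTests(beforeCounts, afterCounts) {
  const findings = [];
  for (const [title, beforeCount] of Object.entries(beforeCounts)) {
    // Assumption: a test with no partner in the converted file lost everything.
    const afterCount = afterCounts[title] ?? 0;
    const dropped = Math.max(0, beforeCount - afterCount);
    if (dropped > 0) findings.push({ title, beforeCount, afterCount, dropped });
  }
  return findings;
}

// Hypothetical counts. Both files total 3 assertions, so the net change is zero.
const beforeCounts = { "renders card": 2, "shows error": 1, "legacy smoke": 0 };
const afterCounts = { "renders card": 1, "shows error": 2, "legacy smoke": 0 };

const findings = compareTests(beforeCounts, afterCounts);
console.log(findings);
```

The extra assertion in "shows error" doesn't cancel the loss in "renders card", and "legacy smoke" had nothing before the migration, so it isn't reported as a loss.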

Step 4: You get the receipts

A worklist, not a verdict: which files lost assertions, which checks disappeared, and where an exact count became a loose truthy check. A human decides which of those losses actually matter.
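
A sketch of how such a worklist could be assembled, using the two assertions from the Adyen example. Everything here is an assumption for illustration: the split into "exact" and "loose" matchers is mine, and pairing assertions by identical subject text is far cruder than a real migration needs, since queries usually change shape between Enzyme and RTL.

```javascript
// Assumption: an illustrative split of real Jest matchers, not an official one.
const EXACT = new Set(["toHaveLength", "toEqual", "toBe", "toStrictEqual"]);
const LOOSE = new Set(["toBeTruthy", "toBeDefined", "toBeFalsy"]);

// Pull the subject and matcher name out of one assertion line.
function parseAssertion(line) {
  const m = line.match(/expect\((.+)\)\.(\w+)\(/);
  return m ? { subject: m[1], matcher: m[2] } : null;
}

function buildWorklist(beforeLines, afterLines) {
  const afterBySubject = new Map();
  for (const line of afterLines) {
    const a = parseAssertion(line);
    if (a) afterBySubject.set(a.subject, a.matcher);
  }
  const worklist = [];
  for (const line of beforeLines) {
    const b = parseAssertion(line);
    if (!b) continue;
    const afterMatcher = afterBySubject.get(b.subject);
    if (afterMatcher === undefined) {
      worklist.push({ kind: "dropped", was: line.trim() });
    } else if (EXACT.has(b.matcher) && LOOSE.has(afterMatcher)) {
      worklist.push({ kind: "weakened", was: line.trim(), now: afterMatcher });
    }
  }
  return worklist;
}

const worklist = buildWorklist(
  ["expect(input).toHaveLength(1);", "expect(img.prop('alt')).toEqual('VISA')"],
  ["expect(input).toBeTruthy();"]
);
console.log(worklist);
```

The output is one "weakened" entry and one "dropped" entry. Neither is a verdict: someone still has to decide whether the alt-text check mattered.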

The honest part

A smoke detector, not fireproofing.

This is a structural proxy, not proof. It counts assertion nodes before and after. A green grade means "no obvious loss detected." It never means "your coverage is safe." A dropped assertion might have been deliberately replaced by an equivalent check the counter can't pair; a surviving one can still pass for the wrong reason.

I lead with this on purpose. If I oversold it, you'd be right to ignore me. Every finding ships with what it can't tell you. A number you can trust is worth more than one that's trying to impress you.

4. Free Migration Parity Pass

Start free. Pay only when you want the deep pass.

The finding is the gift. If you want a person to go through the list, decide which losses matter, and help restore the coverage, that's the paid audit. One fixed price, agreed up front, so there are no surprises.

Start here

Your Free Pass ($0) includes:
  • Analyze one migration (before vs after)
  • Assertion parity diff: dropped, weakened, survived
  • Risk summary & recommendations
  • Human-reviewed report (no AI verdicts)
Get my free pass

No login, no credit card, no obligations. One migration per team. Bring the one you trust least.

When you want it fixed

Fixed-scope audit

One fixed price, agreed up front
  • Multi-area audit across your migration
  • Root cause & risk heatmap
  • Which coverage losses actually matter
  • Prioritized fix plan to restore coverage
  • Optional re-audit after fixes
Start the conversation

The free report doubles as the scope. You'll know exactly what you're paying for before you pay for it.

5. CI Gate Roadmap

Make migrations safer over time.

The same deterministic check, running automatically on every migration PR, so a silent drop fails the build instead of merging green.

  • Automated parity checks in CI. Every push, every PR.
  • Fail builds on risky drops. The gate goes red where review goes blind.
  • Trend dashboards & alerts. Watch assertion health over time.
  • Policy-as-code for quality gates. Your rules, enforced the same way every run.
Push / PR » Run checks » Parity gate » Report
Status: in development. The audit service comes first, so the gate ends up enforcing what real migrations actually break, not hypotheticals.
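
Since the gate isn't released, the following is only a guess at its general shape: a policy object with thresholds, a report with counts, and a pass/fail result that maps to an exit code. The field names (`maxDropped`, `maxWeakened`) and the report format are hypothetical, not a published config.

```javascript
// Hypothetical policy shape. The real gate's config format is not published.
const policy = { maxDropped: 0, maxWeakened: 0 };

// Same report and same policy always give the same result.
function parityGate(report, rules) {
  const reasons = [];
  if (report.dropped > rules.maxDropped) {
    reasons.push(`dropped ${report.dropped} > allowed ${rules.maxDropped}`);
  }
  if (report.weakened > rules.maxWeakened) {
    reasons.push(`weakened ${report.weakened} > allowed ${rules.maxWeakened}`);
  }
  return { pass: reasons.length === 0, reasons };
}

const gateResult = parityGate({ dropped: 1, weakened: 1 }, policy);
// In CI, a non-zero exit code is what turns the build red.
const exitCode = gateResult.pass ? 0 : 1;
console.log(exitCode, gateResult.reasons);
```

A team that accepts some weakening during a migration would raise `maxWeakened` in the policy, not edit the gate.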
[Image: pixel art of a castle gatehouse at night, its raised portcullis revealing a glowing teal magical gate that everything must pass through.]
6. The Toolbag

Seven deterministic test-engineering analyzers. Built for truth.

One counts assertions; the rest watch the other ways a suite lies.

[Image: pixel art of a candlelit armory-library shelf holding scales, swords, potions, scrolls, a magnifying glass, and a certain dwarf's beloved beer mug.]
kw parity | Migration Parity | CORE
Does your converted suite still assert what it used to? Assertion & intent parity across migrations. (the one above)

kw cypress | TestBridge · Cypress | E2E
Cypress to Playwright migration confidence report: analyzer & risk detector.

kw protractor | TestBridge · Protractor | LEGACY
Protractor to Playwright, for the framework that's already EOL. Analyzer & migration helper.

kw flake | FlakeWatch | QUALITY
Static flakiness-risk grader. Catches the risk before a test starts failing at random.

kw audit | SuiteAudit | REPORTS
Grades the "test smells" that make a suite green but worthless. Human-reviewed audit & evidence reporting.

kw investigate | Incident Investigator | FORENSICS
Deterministic on-call triage: alert in, ranked verdict + proof trail out. Deep dives into diffs, gaps & mismatches.

kw ci-truth | CI-Truth | PIPELINE
Your tests exist. Does CI actually run them? Finds the escape hatches that make a pipeline impossible to fail: skipped globs, continue-on-error, zeroed coverage gates.
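
For a feel of what an escape-hatch scan looks like, here is a toy version, not the CI-Truth analyzer. It assumes a GitHub Actions style workflow passed in as text and checks two patterns: `continue-on-error: true`, which the description above names, and a trailing `|| true`, which is my own addition as a common way to swallow an exit code.

```javascript
// Assumption: two illustrative patterns only. A real scan would cover more.
const ESCAPE_HATCHES = [
  { name: "continue-on-error", pattern: /continue-on-error:\s*true/ },
  { name: "exit code swallowed", pattern: /\|\|\s*true\b/ },
];

// Report every line of the workflow text that matches a known escape hatch.
function findEscapeHatches(workflowText) {
  return workflowText.split("\n").flatMap((line, i) =>
    ESCAPE_HATCHES.filter((hatch) => hatch.pattern.test(line)).map((hatch) => ({
      line: i + 1,
      name: hatch.name,
      text: line.trim(),
    }))
  );
}

// Hypothetical workflow with two ways to stay green no matter what.
const workflow = [
  "jobs:",
  "  test:",
  "    runs-on: ubuntu-latest",
  "    steps:",
  "      - run: npm test || true",
  "      - run: npm run lint",
  "        continue-on-error: true",
].join("\n");

const hatches = findEscapeHatches(workflow);
console.log(hatches);
```

Line-by-line matching keeps the result deterministic and easy to cite: each finding points at a line number a reviewer can open.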
7. Bring Us Your Migration

Bring me a migration. I'll show you what it dropped.

A repo link, a PR, or just the before/after. You'll get an honest report and a walkthrough. No pitch until you ask for one.

  • 100% free. No strings.
  • Human-reviewed. No AI verdicts.
  • Actionable in days, not weeks.
  • Built by an engineer, sold with receipts.
[Image: pixel art of the knight at a candlelit writing desk, helmet off, drafting your report, his dwarf friend's beer mug within reach.]
your report gets written by a person. the mug belongs to a friend.

Prefer email? hello@kingdomwatch.dev