AI does the work. You supervise until it earns your trust.

A framework for verifiable agentic workflows. The AI reads documents, extracts structured data, proposes a plan, and waits for human sign-off — then gradually earns autonomy through measured performance.

The Bottleneck

Someone reads a document and types the important parts into a system. Every industry, every department, same story.

The AI Problem

AI can read those documents, but it can give a different answer each time it reads the same one. That's not good enough when the output triggers real consequences.

Apprentice

A framework that treats AI like an apprentice — it works under supervision, proves competence through measurement, and gradually earns autonomy.

Before — Manual Entry
Policy Configuration Form
Policy Configuration Draft
Basic Information: Standard Attendance Policy 2025 · TechCorp Industries · US - East · 03/08/2026 · Optional
Point System Configuration: 12 · 12 months · Yes
Violation Rules & Points
Tardiness (Late Arrival): 7 min · 0.5 · 1.0
Early Departure: 15 min · 0.5 · Pro-rate
No Call / No Show: 2.0 · 3 · Terminate
Unexcused Absence: 1.0 · 4 · 1.5x
Escalation Levels
Level 1 (Verbal Warning): trigger at 3 pts
Level 2 (Written Warning): trigger at 6 pts
Level 3 (Final Warning): trigger at 9 pts
Level 4 (Termination): trigger at 12 pts

Read a 20-page policy doc, then fill out every field by hand — rules, points, thresholds, escalation levels, one by one.

After — AI Pipeline
fairplay.app/dashboard
Playground Safe sandbox — no production data
Upload HR-Policy-2025.pdf
Step 1 — Extracted Keywords gpt-4o-mini 847 tok
attendance_policy · tardy > 7min 0.5 pts · no_call_no_show 2.0 pts · excused_absence · rolling_12mo · early_departure · escalation_4lvl
Human Review: Good Partial Bad
Step 2 — Ingestion Plan 87% conf claude-sonnet Plain Text

1. INSERT INTO policies — name="HR-811-A"

2. INSERT INTO rules — tardy > 7min → 0.5pts

3. INSERT INTO rules — early_depart → 0.5pts

4. INSERT INTO rules — no_call → 2.0pts

5. INSERT INTO rules — unexcused → 1.0pts

6. INSERT INTO escalation_levels ×4

7. LINK policy → region US-East

Human Review: Good Partial Bad
Step 3 — Generated SQL 7 statements gpt-4o

INSERT INTO policies (name, org_id, region_id)
VALUES ('HR-811-A', 1, 1);

INSERT INTO rules (policy_id, name, points)
VALUES (1, 'tardy', 0.5);

-- 5 more statements...

Human Review: Sandbox execute → verify results before production
3 steps · 3 models · 2.1s · 3,412 tokens

Upload the same policy doc. AI extracts every rule, shows the plan, you rate each step, approve. Done.

The Shift

Software Changed. The Workflow Didn't.

Traditional software is predictable. AI-powered software isn't. That demands a fundamentally different approach.

How it works
Traditional Software: Same input = same output, every time
AI-Powered Software: Same input can produce different results each time

Where effort goes
Traditional Software: 80–90% building (writing logic, fixing bugs)
AI-Powered Software: 80–90% evaluating (checking outputs, tuning quality)

When is it "done"?
Traditional Software: When the bugs are fixed, you ship it
AI-Powered Software: Never; monitoring and improvement are continuous

When things go wrong
Traditional Software: Bugs are predictable and reproducible
AI-Powered Software: Failures are subtle, inconsistent, and can drift

The human's role
Traditional Software: Operates the system (fills forms, clicks buttons)
AI-Powered Software: Supervises the system (reviews output, approves or rejects)

Businesses need guarantees: precise compliance, exact calculations, auditable decisions. AI does not offer those by nature, so Apprentice supplies them from the outside, with human checkpoints, sandbox runs, and measurement around every AI output.

How It Works

Three Steps. Human Checkpoints at Every Stage.

Instead of a black box, the AI shows its work at every stage — and you rate, review, or reject before it moves on.

Step 1: Extract (Rate & Review)

"Here's what I found"

The AI reads your document and pulls out structured data. You skim the results and rate the extraction — Good, Partial, or Bad.
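
A minimal sketch of what this checkpoint could record, assuming a hypothetical `ExtractionResult` shape and a `Rating` enum that mirrors the Good / Partial / Bad buttons. None of these names come from Apprentice itself, and the rule that only a Good rating unlocks the next step is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Rating(Enum):
    GOOD = "good"
    PARTIAL = "partial"
    BAD = "bad"


@dataclass
class ExtractedRule:
    name: str
    points: float
    threshold_minutes: Optional[int] = None  # None when the rule has no time threshold


@dataclass
class ExtractionResult:
    document: str
    model: str
    rules: list[ExtractedRule] = field(default_factory=list)
    rating: Optional[Rating] = None  # stays None until a human reviews it

    def review(self, rating: Rating) -> None:
        """Attach the reviewer's rating to this extraction."""
        self.rating = rating

    @property
    def may_proceed(self) -> bool:
        # Assumption: only a Good rating lets the pipeline move on to planning.
        return self.rating is Rating.GOOD


result = ExtractionResult(
    document="HR-Policy-2025.pdf",
    model="gpt-4o-mini",
    rules=[
        ExtractedRule("tardy", 0.5, threshold_minutes=7),
        ExtractedRule("no_call_no_show", 2.0),
    ],
)
before_review = result.may_proceed  # False: nobody has rated it yet
result.review(Rating.GOOD)
print(before_review, result.may_proceed)  # False True
```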

Step 2: Plan (Approve or Reject)

"Here's what I'll do"

Before anything touches the database, the AI shows its plan in plain language. You say "yes" or "no, fix this."
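
One way to picture the approval gate, as a sketch only: `PlanStep`, `render_plan`, and `decide` are hypothetical names, and requiring feedback on every rejection is an assumption that follows from the "no, fix this" wording.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PlanStep:
    action: str  # e.g. "INSERT INTO rules"
    detail: str  # e.g. "tardy > 7min -> 0.5pts"


def render_plan(steps: list[PlanStep]) -> str:
    """Number the steps so a reviewer can point at one when rejecting."""
    return "\n".join(
        f"{i}. {s.action}: {s.detail}" for i, s in enumerate(steps, start=1)
    )


def decide(approved: bool, feedback: Optional[str] = None) -> dict:
    """Record the reviewer's decision; a rejection must say what to fix."""
    if not approved and not feedback:
        raise ValueError("a rejection needs feedback so the plan can be revised")
    return {"approved": approved, "feedback": feedback}


plan = [
    PlanStep("INSERT INTO policies", 'name="HR-811-A"'),
    PlanStep("INSERT INTO rules", "tardy > 7min -> 0.5pts"),
    PlanStep("LINK policy", "region US-East"),
]
print(render_plan(plan))
decision = decide(approved=False, feedback="step 2: re-check the tardy threshold")
```

Nothing downstream runs unless `decision["approved"]` is true, which keeps the database untouched until a person has read the plan.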

Step 3: Execute (Sandbox First)

"Approved? Done."

Run in sandbox first to verify results. When satisfied, execute against production. Deterministic. Auditable.
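
A sandbox-first run can be sketched with Python's built-in `sqlite3` module. The in-memory database, the two-table schema, and the helper names are assumptions for illustration; the real Apprentice sandbox and schema are not described on this page.

```python
import sqlite3

# Assumed schema, reduced to the columns used by the sample statements.
SCHEMA = """
CREATE TABLE policies (id INTEGER PRIMARY KEY, name TEXT, org_id INTEGER, region_id INTEGER);
CREATE TABLE rules (id INTEGER PRIMARY KEY, policy_id INTEGER, name TEXT, points REAL);
"""

APPROVED_SQL = [
    "INSERT INTO policies (name, org_id, region_id) VALUES ('HR-811-A', 1, 1)",
    "INSERT INTO rules (policy_id, name, points) VALUES (1, 'tardy', 0.5)",
]


def run_statements(conn: sqlite3.Connection, statements: list[str]) -> None:
    """Run every statement in one transaction; roll back all of them on error."""
    with conn:  # commits on success, rolls back if an exception is raised
        for stmt in statements:
            conn.execute(stmt)


def sandbox_check(statements: list[str]) -> dict:
    """Execute against a throwaway in-memory database and report row counts."""
    sandbox = sqlite3.connect(":memory:")
    try:
        sandbox.executescript(SCHEMA)
        run_statements(sandbox, statements)
        return {
            "policies": sandbox.execute("SELECT COUNT(*) FROM policies").fetchone()[0],
            "rules": sandbox.execute("SELECT COUNT(*) FROM rules").fetchone()[0],
        }
    finally:
        sandbox.close()


report = sandbox_check(APPROVED_SQL)
print(report)  # {'policies': 1, 'rules': 1}
```

Once the reviewer accepts the report, the same `run_statements` call could be pointed at the production connection with the identical statement list, so what was verified is what gets executed.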

The Trajectory

Trust Is Earned, Not Declared

Every approval and rejection trains the system. The human isn't removed from the process — they're promoted.

Phase 1: Hands-On

Review every output. Rate each step. The system builds a reliability baseline. The interface is your workspace.

All reviews required · Rate every step · Building baseline
fairplay.app/dashboard Phase 1 — Hands-On
Pipeline Review Progress 0 / 3 steps reviewed
Step 1 — Extract Review Required
attendance_policy · tardy · no_call_no_show · rolling_12mo
Your rating: Good Partial Bad
Step 2 — Plan 87% conf Review Required

1. INSERT INTO policies

2. INSERT INTO rules ×4

3. INSERT INTO escalation_levels ×4

Your rating: Good Partial Bad
Step 3 — Execute Sandbox Required
Must run sandbox → verify results before production
3 steps · all reviews pending

Every step requires human review before the pipeline can proceed.
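
The reliability baseline could be as simple as an agreement rate over the recorded ratings. The sketch below is an assumption, not the product's metric: it counts only Good as agreement, and the `min_reviews` and `min_rate` cut-offs for leaving Phase 1 are made-up illustrative values, as is the sample history.

```python
from collections import Counter


def agreement_rate(ratings: list[str]) -> float:
    """Share of reviews rated Good. Assumption: Partial and Bad count as disagreement."""
    if not ratings:
        raise ValueError("no reviews recorded yet")
    return Counter(ratings)["Good"] / len(ratings)


def ready_for_phase_2(ratings: list[str], min_reviews: int = 50, min_rate: float = 0.9) -> bool:
    """Hypothetical promotion rule: enough reviews and a high enough agreement rate."""
    return len(ratings) >= min_reviews and agreement_rate(ratings) >= min_rate


# Made-up review history for one pipeline step.
history = ["Good"] * 47 + ["Partial"] * 2 + ["Bad"]
print(f"{agreement_rate(history):.1%}")  # 94.0%
print(ready_for_phase_2(history))        # True
```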

Phase 2: Building Trust

High-confidence results auto-approve. You focus on what the system flags as uncertain. The interface becomes your audit panel.

Auto-approve > 90% · Flag low confidence · Review exceptions
fairplay.app/dashboard Phase 2 — Assisted
Auto-approve threshold: 90% Agreement rate: 94.2%
Step 1 — Extract 97% conf Auto-approved
Step 2 — Plan 78% conf Flagged

Low confidence on rule mapping — found 2 ambiguous escalation thresholds

Your review: Good Partial Bad
Step 3 — Execute 95% conf Auto-approved
2 of 3 steps auto-approved · 1 flagged for review

Only the flagged step needs your attention. High-confidence steps flow through.
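
The routing rule in this phase comes down to one comparison per step. A minimal sketch, assuming each step reports a confidence between 0 and 1 (how that score is produced is not covered here) and using the hypothetical function name `route`:

```python
AUTO_APPROVE_THRESHOLD = 0.90  # the "Auto-approve > 90%" setting


def route(
    step_confidences: dict[str, float],
    threshold: float = AUTO_APPROVE_THRESHOLD,
) -> tuple[list[str], list[str]]:
    """Split steps into auto-approved and flagged-for-review lists."""
    auto_approved, flagged = [], []
    for step, confidence in step_confidences.items():
        if confidence > threshold:
            auto_approved.append(step)
        else:
            flagged.append(step)
    return auto_approved, flagged


# Confidence values taken from the Phase 2 example run.
auto_approved, flagged = route({"extract": 0.97, "plan": 0.78, "execute": 0.95})
print(auto_approved)  # ['extract', 'execute']
print(flagged)        # ['plan']
```

Keeping the threshold as a parameter makes it easy to tighten or relax as the measured agreement rate changes.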

Phase 3: Autonomous

The system processes documents overnight. You see a morning summary. Only genuine edge cases reach a human — unusual documents, ambiguous language, cross-jurisdictional conflicts. Everything routine is handled.

Overnight processing · Exceptions only · Human promoted
fairplay.app/dashboard Phase 3 — Autonomous

Overnight Processing Summary

Mar 6, 2026 · 2:00 AM – 6:00 AM

83.3% auto-resolved

12 Processed · 10 Auto-approved · 2 Flagged · 0 Failed

Auto-Approved (10)
HR-Policy-2025-East.pdf 97%
Attendance-Midwest-Q1.pdf 95%

+ 8 more documents...

Needs Your Attention (2)

Multi-State-Policy.pdf 62%
Cross-jurisdictional conflict: CA vs. TX overtime rules

Contractor-Attendance.docx 71%
Ambiguous: policy references external handbook
Next batch: tonight at 2:00 AM

Morning summary. 10 policies handled overnight. Only 2 edge cases need you.
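
A morning summary like this could be built from per-document results. The sketch below assumes a hypothetical `DocResult` record and reuses the 90% auto-approve threshold; only the four documents named in the summary are included, so the counts differ from the full overnight batch.

```python
from dataclasses import dataclass


@dataclass
class DocResult:
    filename: str
    confidence: float  # assumed to be a 0..1 score reported by the pipeline
    failed: bool = False


def summarize(results: list[DocResult], threshold: float = 0.90) -> dict:
    """Count outcomes and list flagged documents, least confident first."""
    done = [r for r in results if not r.failed]
    auto = [r for r in done if r.confidence > threshold]
    flagged = sorted(
        (r for r in done if r.confidence <= threshold),
        key=lambda r: r.confidence,
    )
    return {
        "processed": len(results),
        "auto_approved": len(auto),
        "flagged": [r.filename for r in flagged],
        "failed": len(results) - len(done),
        "auto_share": len(auto) / len(results) if results else 0.0,
    }


batch = [
    DocResult("HR-Policy-2025-East.pdf", 0.97),
    DocResult("Attendance-Midwest-Q1.pdf", 0.95),
    DocResult("Multi-State-Policy.pdf", 0.62),
    DocResult("Contractor-Attendance.docx", 0.71),
]
summary = summarize(batch)
print(summary["flagged"])  # ['Multi-State-Policy.pdf', 'Contractor-Attendance.docx']
```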

First Apprentice

The Fair Play Initiative

Workforce attendance compliance — complex HR policies, nested rules, real consequences. The first domain where Apprentice proves the Extract → Plan → Execute pattern works.

Policy Upload & AI Extraction

Upload HR policy docs (PDF, DOCX, or paste text). AI extracts rules, thresholds, and escalation levels.

Points-Based Discipline

Configurable rules with rolling windows. CSV attendance upload with automatic scoring against active policies.
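
Scoring against a rolling window can be sketched as follows. The escalation triggers mirror the sample policy form earlier on this page; treating the 12-month window as 365 days, the function names, and the violation dates are assumptions for illustration.

```python
from datetime import date, timedelta
from typing import Optional

# Highest trigger first, so the first match is the most severe level reached.
ESCALATION_LEVELS = [
    (12, "Termination"),
    (9, "Final Warning"),
    (6, "Written Warning"),
    (3, "Verbal Warning"),
]


def active_points(
    violations: list[tuple[date, float]], as_of: date, window_days: int = 365
) -> float:
    """Sum the points of violations that fall inside the rolling window."""
    cutoff = as_of - timedelta(days=window_days)
    return sum(points for day, points in violations if cutoff < day <= as_of)


def escalation_level(points: float) -> Optional[str]:
    """Return the most severe level whose trigger has been reached, if any."""
    for trigger, label in ESCALATION_LEVELS:
        if points >= trigger:
            return label
    return None


# Made-up attendance history for one employee.
violations = [
    (date(2025, 1, 10), 2.0),   # older than the window on the as_of date, so ignored
    (date(2025, 9, 1), 0.5),
    (date(2025, 11, 20), 1.0),
    (date(2026, 1, 5), 2.0),
]
total = active_points(violations, as_of=date(2026, 3, 6))
print(total, escalation_level(total))  # 3.5 Verbal Warning
```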

Multi-Tenant Hierarchy

Regions (with labor law context) → Organizations → Policies → Rules → Employees.

Full Audit Trail

Every point change — violation or manual override — is recorded. Dashboard with KPIs, trends, and risk grid.
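
An audit trail of this kind is often modeled as an append-only ledger, where balances are derived from entries rather than stored. The sketch below is one such model under that assumption; `PointChange`, `AuditLedger`, and the employee and actor identifiers are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)  # frozen: an entry cannot be edited after it is written
class PointChange:
    employee_id: str
    delta: float
    reason: str  # "violation" or "manual_override"
    actor: str   # who or what made the change
    at: datetime


class AuditLedger:
    """Append-only log of point changes."""

    def __init__(self) -> None:
        self._entries: list[PointChange] = []

    def record(self, employee_id: str, delta: float, reason: str, actor: str) -> PointChange:
        if reason not in ("violation", "manual_override"):
            raise ValueError(f"unknown reason: {reason}")
        entry = PointChange(employee_id, delta, reason, actor, datetime.now(timezone.utc))
        self._entries.append(entry)
        return entry

    def balance(self, employee_id: str) -> float:
        """Current points, recomputed from the full history."""
        return sum(e.delta for e in self._entries if e.employee_id == employee_id)

    def history(self, employee_id: str) -> tuple[PointChange, ...]:
        return tuple(e for e in self._entries if e.employee_id == employee_id)


ledger = AuditLedger()
ledger.record("emp-42", 2.0, "violation", actor="scoring-job")
ledger.record("emp-42", -0.5, "manual_override", actor="hr-admin")
print(ledger.balance("emp-42"))       # 1.5
print(len(ledger.history("emp-42")))  # 2
```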

Measurement Infrastructure

Human feedback on every step. Token usage, cost, and response time tracking per model per stage.
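
Per-model, per-stage tracking could be aggregated roughly like this. `CallRecord` and `aggregate` are hypothetical names, the prices are placeholders rather than real vendor pricing, and apart from the 847-token extraction call shown in the pipeline example the sample numbers are invented.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class CallRecord:
    stage: str
    model: str
    tokens: int
    seconds: float


def aggregate(records: list[CallRecord], usd_per_1k_tokens: dict[str, float]) -> dict:
    """Total calls, tokens, latency, and estimated cost per (stage, model) pair."""
    totals = defaultdict(lambda: {"calls": 0, "tokens": 0, "seconds": 0.0})
    for r in records:
        bucket = totals[(r.stage, r.model)]
        bucket["calls"] += 1
        bucket["tokens"] += r.tokens
        bucket["seconds"] += r.seconds
    for (_stage, model), bucket in totals.items():
        bucket["cost_usd"] = bucket["tokens"] / 1000 * usd_per_1k_tokens[model]
    return dict(totals)


# Placeholder prices in USD per 1k tokens; not real pricing.
PLACEHOLDER_PRICES = {"gpt-4o-mini": 0.001, "claude-sonnet": 0.01}

records = [
    CallRecord("extract", "gpt-4o-mini", tokens=847, seconds=0.6),
    CallRecord("extract", "gpt-4o-mini", tokens=153, seconds=0.4),
    CallRecord("plan", "claude-sonnet", tokens=1200, seconds=0.9),
]
stats = aggregate(records, PLACEHOLDER_PRICES)
print(stats[("extract", "gpt-4o-mini")]["tokens"])  # 1000
```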

Playground Mode

Safe sandbox with sample policies and seed data. Test the full pipeline without touching production.

Beyond Fair Play

Same Apprentice. Different Domain.

Apprentice is domain-agnostic: change the document type, the extraction schema, and the reviewer — the framework stays the same. Fair Play is the first apprentice. These are the next ones.

LIVE
Fair Play Initiative

Doc: HR attendance policies

Extract: Rules, point values, thresholds

Plan: "Create policy, add 5 rules, link to region"

Result: Policy & rules in compliance database

Insurance

Doc: Claims, medical records

Extract: Diagnoses, procedures, coverage terms

Plan: "Approve claim, apply deductible, flag for review"

Result: Adjudication decision and payout

Procurement

Doc: Vendor contracts, RFPs

Extract: Terms, SLAs, pricing, penalties

Plan: "Onboard vendor, set payment terms, flag SLA risk"

Result: Contract records and payment schedule

Regulatory

Doc: Legal filings, compliance docs

Extract: Requirements, deadlines, obligations

Plan: "Map to reporting schedule, flag gaps"

Result: Filing submissions and audit trail

Healthcare

Doc: Patient forms, referrals

Extract: Demographics, conditions, medications

Plan: "Assign provider, schedule intake, flag allergies"

Result: EHR records and care plan

Same Architecture

Extract → Plan → Verify → Execute

Different document. Different domain. Same bones.
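
To make "same bones" concrete, here is a sketch of a pipeline whose four stages are plain callables that a domain can swap out. The `Pipeline` class and the toy stage functions are hypothetical; in practice the verify stage would be a human reviewer or a confidence policy, not a lambda.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Pipeline:
    extract: Callable[[str], Any]
    plan: Callable[[Any], list[str]]
    verify: Callable[[list[str]], bool]  # the human (or policy) checkpoint
    execute: Callable[[list[str]], Any]

    def run(self, document: str) -> dict:
        data = self.extract(document)
        steps = self.plan(data)
        if not self.verify(steps):
            # Nothing is executed without approval.
            return {"status": "rejected", "steps": steps}
        return {"status": "executed", "steps": steps, "result": self.execute(steps)}


# Toy stand-ins for an HR attendance domain.
hr = Pipeline(
    extract=lambda doc: [part.strip() for part in doc.split(",")],
    plan=lambda rules: [f"INSERT rule {name}" for name in rules],
    verify=lambda steps: len(steps) > 0,  # stand-in for a reviewer's approval
    execute=lambda steps: len(steps),     # stand-in for running the statements
)
outcome = hr.run("tardy, no_call")
print(outcome["status"], outcome["result"])  # executed 2
```

Moving to another domain would mean passing different `extract` and `plan` callables and a different reviewer behind `verify`, while `run` stays unchanged.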

See the first Apprentice in action

The Fair Play Initiative — upload an HR policy document in the playground, watch the AI extract rules and build an ingestion plan, then approve or reject.

Launch the Dashboard