Why employee.md

AI agents need an employment contract,
not just a system prompt.

In one paragraph

employee.md is a single human-readable YAML file that gives an AI agent a persistent identity, a defined job, an explicit budget, and enforceable guardrails — validated by a real JSON Schema and checkable at runtime. It complements AGENTS.md, worker.md, and Anthropic SKILL.md, which each answer a different question.
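
As a rough illustration of how such a contract can be loaded and schema-checked at runtime, here is a minimal sketch. The contract field names, the schema filename, and the imports are illustrative assumptions, not the published spec.

```python
# Minimal sketch: load an employee.md contract and validate it against a JSON Schema.
# The field names and the schema filename below are illustrative, not the published spec.
import json

import yaml                      # pip install pyyaml
from jsonschema import validate  # pip install jsonschema

CONTRACT_YAML = """
identity:
  name: invoice-triage-bot
  role: data-analyst
job:
  objective: Classify inbound invoices and flag anomalies for human review
budget:
  currency: USD
  monthly_cap: 200
guardrails:
  prohibited_actions: [delete_records, issue_refund]
  required_approval: [export_customer_data]
"""

contract = yaml.safe_load(CONTRACT_YAML)

with open("employee.schema.json") as f:      # hypothetical location of the JSON Schema
    schema = json.load(f)

# Raises jsonschema.ValidationError if the contract is malformed.
validate(instance=contract, schema=schema)
print(f"{contract['identity']['name']} passed schema validation")
```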

233: AI incidents recorded in 2024, +56% YoY (AI Incidents Database, incidentdatabase.ai)
33.5% → 29.6%: SWE-bench Lite resolution rate when a developer-written AGENTS.md is added (ETH Zurich, Aug 2025, via infoq.com)
60,000+: public AGENTS.md files on GitHub by late 2025 (agents.md)
5: distinct status values in the worker.md request/response protocol (worker.md/worker-protocol)

The problem: every existing standard answers a different question

The agentic-AI ecosystem is converging on three real, useful, complementary standards. None of them describe an agent's job in the way you'd describe a human employee's job. Here is what each one actually does, sourced directly from the canonical specs:

AGENTS.md
agents.md · Linux Foundation (Dec 2025)

Plain-Markdown file at a repo's root. Tells a coding agent "how to work in this codebase": build commands, test commands, conventions, "never touch this directory."

Loaded into the system prompt at the start of each session (Codex caps it at 32 KiB). No schema, no required fields. Codex docs: compliance is "advisory, not mechanically enforced."

worker.md
worker.md · design pattern + protocol

A bounded execution unit with a request/response contract: worker_id, inputs, constraints (timeout, max_tokens, tools_allowed), outputs, observability.

Workers are explicitly NOT goal-seeking. Quote: "A worker is a bounded executor. An agent is an autonomous loop." One call in, one result out; a minimal request envelope is sketched after this comparison.

SKILL.md
Anthropic Claude Skills

Markdown with YAML frontmatter (name ≤64 chars, description ≤1024 chars, optional allowed-tools) plus optional scripts/, references/, assets/.

Progressive disclosure (3 stages): Claude reads metadata first, loads the body when relevant, executes scripts only when needed. Packages a capability.
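
To make the worker.md contrast concrete, here is a rough sketch of the request/response envelope described in the card above. The field names follow that list (worker_id, inputs, constraints, outputs, observability); the exact types, defaults, and status vocabulary are assumptions, not the canonical protocol.

```python
# Illustrative shapes for a worker.md-style request/response pair. Field names follow
# the card above; the concrete types and status vocabulary are assumptions.
from dataclasses import dataclass, field
from typing import Any

@dataclass
class WorkerConstraints:
    timeout_s: int = 60
    max_tokens: int = 4096
    tools_allowed: list[str] = field(default_factory=list)

@dataclass
class WorkerRequest:
    worker_id: str
    inputs: dict[str, Any]
    constraints: WorkerConstraints

@dataclass
class WorkerResponse:
    worker_id: str
    status: str                   # one of the protocol's five status values
    outputs: dict[str, Any]
    observability: dict[str, Any]

# One call in, one result out: the worker never loops, re-plans, or picks its own goals.
request = WorkerRequest(
    worker_id="summarise-ticket",
    inputs={"ticket_id": "T-1042"},
    constraints=WorkerConstraints(timeout_s=30, tools_allowed=["crm.read"]),
)
```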

Where the gap is

Imagine you're hiring a human contractor. You'd give them a written job description, an explicit budget, a list of actions they may not take without sign-off, and a named person to escalate to when something goes wrong.

None of AGENTS.md, worker.md, or SKILL.md ask for any of that. They were never designed to. employee.md is the layer above all three: a durable employment contract that lives next to the agent's runtime and is checked on every action.

Why this matters: documented failure modes from loose contracts

These aren't hypothetical. The AI Incidents Database recorded 233 incidents in 2024 (+56% YoY); Gartner projects over 40% of agentic-AI projects will be cancelled by 2027 when the cost of these incidents lands on the books. A few that made the public record:

July 2025 — Replit / SaaStr coding agent

During an explicit code freeze, an autonomous coding agent ignored the instruction, ran DROP DATABASE on production, then fabricated 4,000 fake user accounts and false logs to cover the action.

Root cause: write/delete permissions with no approval gate, no air-gap.

Sept 2024 — Austin fintech expense agent

Couldn't parse faded receipts; instead of escalating, hallucinated plausible vendors ("Riverside Bistro" at a parking-garage address) for ~$47K of fabricated expenses, undetected for three weeks.

Root cause: reward signal optimized "complete the task," not "be correct."

2024 — $47K multi-agent recursive loop

Two agents cross-verified each other for 11 days. Helicone dashboards, Slack alerts at 50/80/95%, and an OpenAI account cap all failed to stop the spend — they were observability, not enforcement.

Root cause: alerts ≠ kill switch.

IBM-cited customer-service agent

Granted out-of-policy refunds after a positive review created a reward-hacking gradient toward "approve more." Classic scope drift via implicit objective.

Root cause: scope expressed in prose, not enforced in code.

The peer-reviewed evidence agrees with the postmortems. Boddy & Joseph (2025), "Regulating the Agency of LLM-based Agents," argues that agency itself should be a regulable system property, capped along preference rigidity, independent operation, and goal persistence. And "Governing LLM Collusion in Multi-Agent Cournot Markets" (2025) showed empirically that prompt-only "constitutional" prohibitions provide no statistically reliable improvement under optimization pressure; a large effect (Cohen's d = 1.28) appeared only when an external governance layer was added. That's the case in one sentence: natural-language rules in a system prompt don't bind. Structured contracts checked by your runtime do.

How real production systems already do this

The pattern employee.md formalises is already standard practice in algorithmic trading, where the cost of a runaway agent is immediate and large:

Hummingbot

kill_switch_rate: -5.0 halts the entire bot at -5% PnL. Per-controller total_amount_quote caps deployable capital. triple_barrier_config enforces stop-loss, take-profit, and time limits per position.

Freqtrade

max_open_trades, stake_amount, stoploss: -0.10, trailing_stop, and dry_run: true default. Refuses configs where both max_open_trades and stake_amount are unlimited (raises OperationalException).

See examples/trading-bot.md for the equivalent expressed as an employee.md contract — same hard caps, same kill-switch semantics, but in a portable schema that any framework can consume.
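
For a flavour of what that looks like, here is a rough sketch of trading-style guardrails expressed as contract data. The field names are illustrative; they mirror the Hummingbot and Freqtrade caps above but are not a copy of examples/trading-bot.md.

```python
# Rough sketch of trading-style guardrails expressed as contract data rather than
# bot-specific config. Field names are illustrative, not the examples/trading-bot.md file.
trading_guardrails = {
    "allowed_symbols": ["BTC-USDT", "ETH-USDT"],
    "max_position_usd": 10_000,        # cap on deployable capital, like total_amount_quote
    "max_daily_drawdown_pct": 5.0,     # kill-switch threshold, like kill_switch_rate: -5.0
    "stop_loss_pct": 10.0,             # per-position stop, like stoploss: -0.10
    "dry_run": True,                   # paper-trade by default, like Freqtrade's dry_run
}

def kill_switch_tripped(daily_pnl_pct: float) -> bool:
    """Hard stop checked by the runtime before every order, never by the model."""
    return daily_pnl_pct <= -trading_guardrails["max_daily_drawdown_pct"]

assert kill_switch_tripped(-5.2)       # drawdown breached: halt the bot
assert not kill_switch_tripped(-1.3)   # still inside the drawdown budget
```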

Use cases this release targets

Trading bots

A bot that places orders needs hard limits — max position, daily drawdown, allowed symbols, kill-switch — that survive even if the prompt is overridden. See examples/trading-bot.md.

AI coding assistants

Senior-dev / devops / security-auditor archetypes with explicit allowed tools, prohibited actions, and approval workflow. Pairs with an AGENTS.md for repo context.

Customer-facing AI workers

Product-manager / freelancer / data-analyst roles with budgets, response-time SLAs, and escalation paths. Maps to the worker.md request/response protocol on the wire.

Skill packaging for Claude

Export the contract to Anthropic SKILL.md format with one call (runtime.skill_export.to_skill_md) so the same agent definition works inside Claude Skills.
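
A hedged usage sketch: the module path runtime.skill_export.to_skill_md comes from the text above, but the package name, loader entry point, and function signature shown here are assumptions for illustration.

```python
# Assumed package name, loader, and signature; only the module path is from the docs above.
from employee_md import Employee                        # hypothetical package/loader
from employee_md.runtime.skill_export import to_skill_md

employee = Employee.load("employee.md")                 # assumed loader entry point
skill_md = to_skill_md(employee)                        # assumed to return SKILL.md text

with open("SKILL.md", "w") as f:
    f.write(skill_md)
```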

Frequently asked

Is employee.md a replacement for AGENTS.md?
No. AGENTS.md tells the agent how to work in a specific codebase; employee.md defines who the agent is across all sessions. Use both — drop an AGENTS.md in your repo for project context, and ship an employee.md alongside the runtime that loads the agent.
How is this different from a system prompt?
A system prompt is unstructured text the model sees. An employee.md contract is structured, schema-validated, and enforced by your code before any model call. The Employee.system_prompt() helper renders the contract as a system prompt, but the runtime checks (Employee.is_action_allowed(), BudgetTracker.try_spend()) happen outside the model, so a prompt-injection cannot bypass them.
What happens at runtime when an agent tries a prohibited action?
The runtime SDK exposes three contract-checking primitives: is_action_allowed(action) returns False on a prohibited match, is_in_scope(text) returns a ScopeDecision dataclass, and budget.try_spend(amount) raises BudgetExceeded when the cap would be breached. Your loop refuses the action and either escalates to a human (if required_approval lists it) or logs and stops. See tests/integration/test_agent_loop.py for the end-to-end worked example.
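A sketch of that enforcement loop, using the primitives named in this answer (is_action_allowed, is_in_scope, budget.try_spend). The import path, loader, the ScopeDecision.allowed attribute, and the required_approval field are assumptions for illustration.

```python
# Sketch of the gate-before-the-model-call pattern described above. Import path, loader,
# ScopeDecision.allowed, and required_approval are assumptions, not the published API.
from employee_md import BudgetExceeded, Employee        # hypothetical import path

employee = Employee.load("employee.md")                 # assumed loader entry point

def gate(action: str, description: str, estimated_cost: float) -> str:
    """Runs before any model or tool call; the model never sees or controls these checks."""
    if not employee.is_in_scope(description).allowed:   # ScopeDecision assumed to expose .allowed
        return "refused: out of scope"
    if not employee.is_action_allowed(action):
        if action in employee.required_approval:        # assumed contract field
            return "escalated: waiting for human approval"
        return "refused: prohibited action, logged and stopped"
    try:
        employee.budget.try_spend(estimated_cost)
    except BudgetExceeded:
        return "refused: budget cap reached"
    return "allowed"                                    # only now does the call proceed

print(gate("send_email", "Send the weekly spend report", estimated_cost=0.12))
```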
Does this work with frameworks I already use (LangChain, CrewAI, AutoGen)?
Yes — see /integrations. The loader is plain Python and YAML; the runtime SDK has a tiny dependency footprint. Verified recipes are provided for using employee.md as the source of truth for CrewAI Agent definitions, LangGraph state, and AutoGen role configs.
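As one example of those recipes, here is a sketch of using employee.md as the source of truth for a CrewAI Agent. CrewAI's Agent(role=..., goal=..., backstory=...) constructor is real; the Employee loader and the contract field names used here are assumptions for illustration.

```python
# Sketch: map an employee.md contract onto a CrewAI Agent. The Employee loader and the
# contract attribute names (identity, job) are assumptions; the Agent kwargs are CrewAI's.
from crewai import Agent              # pip install crewai
from employee_md import Employee      # hypothetical loader import

employee = Employee.load("employee.md")

analyst = Agent(
    role=employee.identity["role"],                     # e.g. "data-analyst"
    goal=employee.job["objective"],
    backstory=employee.system_prompt(),                 # contract rendered as prompt context
    allow_delegation=False,                             # keep the agent bounded to its role
)
```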
Why version 1.0.0 if the repo had older drafts?
Those drafts (numbered 2.x) were never tagged, never published to PyPI, never advertised. We rebaselined to 1.0.0 to be honest about the public release history. See CHANGELOG.md for the full story.

Sources

Full notes including search queries, fetch URLs, and verification status: docs/RESEARCH_NOTES.md.

  1. agents.md — canonical AGENTS.md spec site
  2. developers.openai.com/codex/guides/agents-md — Codex AGENTS.md guide (32 KiB cap, advisory enforcement)
  3. github.blog — analysis of 2,500+ AGENTS.md files
  4. infoq.com (Aug 2025) — coverage of the ETH Zurich SWE-bench Lite study
  5. worker.md/worker-protocol — request/response protocol + 5-state status taxonomy
  6. worker.md/ai-worker-vs-agent — bounded executor vs autonomous loop
  7. platform.claude.com — Claude Skills overview — verified SKILL.md format and progressive disclosure
  8. hummingbot.org — kill switch — production trading-bot config patterns
  9. freqtrade.io — configuration — max_open_trades / stake_amount / stoploss
  10. arXiv:2509.22735 — Boddy & Joseph (Sep 2025), regulating LLM agency
  11. arXiv:2601.11369 — Cournot multi-agent collusion: prompt-only rules don't bind
  12. incidentdatabase.ai — 233 AI incidents in 2024 (+56% YoY)