Build ReleasePilot, an internal platform that moves application versions through deployment environments, dev → staging → production. The core is the Promotion: a state machine with strict business rules. Model it with DDD, CQRS and events, in any backend stack you like.
The domain is the whole point. Use any language and framework you like. Model the Promotion as an aggregate that guards its own rules, keep the API thin, and ship a docker-compose.yml that starts the database and message queue it needs.
Environments form an ordered pipeline, dev → staging → production. A Promotion advances one application version exactly one step along it, and a version must complete each environment before it can move to the next, no skipping. On top of that: only one promotion may be in progress per application + target environment, only an approver may approve, and once completed or cancelled a promotion is immutable. The aggregate guards these rules itself; violating one is a domain error, never a 500.
One dedicated handler per command. Each transition emits a domain event.
Read models are shaped for the consumer, not the aggregate.
The platform talks to external systems. Define them as ports and stub them in-memory, no real HTTP. Where you place these interfaces matters, and you'll be asked why.
Publish every domain event to a queue and consume it in a decoupled handler that could be its own process. Build an Audit Log consumer that persists each event: type, promotion id, timestamp, acting user. The API responds before the consumer finishes.
When a promotion reaches Approved, trigger an agent that drafts release notes for it. Build a real tool-calling loop, not a single prompt, with an agent framework if your stack has one or hand-rolled if it doesn't. A mocked LLM backend is perfectly fine, we evaluate structure, not live output. Doing this won't make up for a weak core, but attempting it well, even partially, is a strong signal.
Please, use AI. We do too. Just own every line you send us.
You won't finish it all, and that's fine. Pick a slice you can stand behind and do it well, then tell us what you left out. We care more about your decisions than your line count.
If the work's strong, we pair. We go through your code together and talk through a few real situations. No whiteboard puzzles, no trick questions.
Push your code to a repo and send us the link at hiring@theagilemonkeys.com. Make sure it includes four things:
It should run against the infrastructure your compose file starts, and do what you set out to build. A small slice that actually works beats a broad one that doesn't. Your commit history is part of the story, so let it show how you got there.
Prerequisites, how to start the infrastructure, how to run the API, and at least one example request per command. We should be able to get it going without asking you.
The record of how you worked with AI, whatever your tools produce: a chat export, a prompts log, screenshots. We want to see how you used it, not whether you did. Own every line you send us.
An async presentation of how you approached the challenge: your domain model, where invariants live, how CQRS, events and ports are wired, the trade-offs you weighed, and what you'd improve next. Any format you like, a Loom, a video, a deck, a doc. Keep it tight: under ten minutes if it's a video, under fifteen slides if it's a deck.