Pull requests pile up. A senior developer opens the queue on Monday morning and finds eleven PRs waiting, most of them touching style, obvious bugs, or the same patterns the team has flagged a hundred times before. The author has been waiting since Friday. Velocity stalls in the gap between "ready for review" and "merged." HumanPR sits in that gap.
When a PR opens, a GitHub App webhook hands the payload to a BullMQ worker, which runs a deterministic rule pass first: hardcoded secrets, leftover console.log calls, merge markers, missing license headers. Only what survives gets routed to the LLM the organization picked, whether that is Claude, GPT, Gemini, or a self-hosted model behind their own tunnel. By the time a reviewer opens the PR in the dashboard, the cheap checks are already triaged, with suggestions ready to accept, reject, or comment on. The reviewer agrees or pushes back per suggestion, and those decisions feed per-org metrics on acceptance rate and turnaround.
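To make the deterministic rule pass concrete, here is a minimal TypeScript sketch. It is not HumanPR's actual rule set: the `Finding` and `LineRule` types, the rule names, the regular expressions, and the `runRulePass` function are all assumptions made for illustration, and real secret detection would need far broader patterns.

```typescript
// Hypothetical types and rule names; the real rule set is not shown in the source.
interface Finding {
  rule: string;
  line: number; // 1-based line number within the scanned text
  message: string;
}

interface LineRule {
  name: string;
  pattern: RegExp; // no "g" flag, so RegExp.test stays stateless
  message: string;
}

// Illustrative patterns only (assumptions, not the production rules).
const LINE_RULES: LineRule[] = [
  {
    name: "hardcoded-secret",
    pattern: /(api[_-]?key|secret|password|token)\s*[:=]\s*["'][^"']{8,}["']/i,
    message: "Possible hardcoded secret",
  },
  {
    name: "console-log",
    pattern: /\bconsole\.log\s*\(/,
    message: "Leftover console.log call",
  },
  {
    name: "merge-marker",
    pattern: /^(<{7}|={7}|>{7})(\s|$)/,
    message: "Unresolved merge conflict marker",
  },
];

// Scans the text line by line and returns every rule hit.
function runRulePass(source: string, requireLicenseHeader = false): Finding[] {
  const lines = source.split("\n");
  const findings: Finding[] = [];

  lines.forEach((text, index) => {
    for (const rule of LINE_RULES) {
      if (rule.pattern.test(text)) {
        findings.push({ rule: rule.name, line: index + 1, message: rule.message });
      }
    }
  });

  // File-level check: look for a license or copyright notice in the first five lines.
  const header = lines.slice(0, 5).join("\n");
  if (requireLicenseHeader && !/license|copyright/i.test(header)) {
    findings.push({
      rule: "missing-license-header",
      line: 1,
      message: "No license header found near the top of the file",
    });
  }

  return findings;
}
```

In a pipeline like the one described, a pass of this kind would run before any model call, so anything it flags can be reported without spending tokens.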
The Challenge
Code review is the part of engineering that scales worst. The people qualified to review are also the people building the hardest features, so teams either let PRs sit (and watch velocity collapse), rubber-stamp them (and watch quality collapse), or burn senior hours on lint-level checks. Pointing a raw LLM at every diff sounds easy until the bill arrives, two workers race the same webhook and double-charge a customer, the chosen model is down, or a prompt leaks a secret a regex would have caught for free. The real problem was triage: deciding what deserves an LLM, what deserves a human, and what can be answered by a rule in a few milliseconds, while staying defensible enough that an admin can sign findings into a PR comment under their own name.
Our Solution
The orchestrator treats LLM calls as the expensive last resort. A deterministic rule engine runs first and filters out issues a regex can prove, so tokens are only spent on code that actually needs judgment. Surviving chunks are sized to each model's token budget, with Anthropic prompt caching reused across reviews on the same repo. Provider choice is per-organization, with LiteLLM and a self-hosted vLLM box reachable through a Cloudflare Tunnel as fallbacks.
Concurrency is solved by an atomic claim: workers transition a job from PENDING_AI to AI_REVIEWING via a single conditional update, so only one worker proceeds and the rest exit cleanly. GitHub webhook signatures are verified before any payload-driven side effect. NextAuth v5 gates admin routes behind MFA, every admin mutation chains into a hash-verified audit log, and GDPR export and delete endpoints preserve that chain by anonymizing rather than dropping records.
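The atomic claim can be sketched as a compare-and-set on the job's status. The example below is only an illustration under stated assumptions: `JobStore` is a hypothetical in-memory stand-in for the real database, and the SQL in the comment (with its `jobs` table and column names) is a guess at what a single conditional update might look like, not the project's actual query.

```typescript
type JobStatus = "PENDING_AI" | "AI_REVIEWING";

// Hypothetical in-memory stand-in for the jobs table.
class JobStore {
  private readonly statuses = new Map<string, JobStatus>();

  enqueue(jobId: string): void {
    this.statuses.set(jobId, "PENDING_AI");
  }

  statusOf(jobId: string): JobStatus | undefined {
    return this.statuses.get(jobId);
  }

  // Stands in for one conditional statement such as (assumed schema):
  //   UPDATE jobs SET status = 'AI_REVIEWING'
  //   WHERE id = $1 AND status = 'PENDING_AI'
  // Returns the number of rows changed: 1 if the transition happened, else 0.
  // The check and the write sit in one synchronous method, so nothing can
  // interleave between them here; a database gives the same guarantee for a
  // single UPDATE statement.
  conditionalUpdate(jobId: string, from: JobStatus, to: JobStatus): number {
    if (this.statuses.get(jobId) !== from) return 0;
    this.statuses.set(jobId, to);
    return 1;
  }
}

// A worker proceeds only if its own update changed the row.
function tryClaim(store: JobStore, jobId: string): boolean {
  return store.conditionalUpdate(jobId, "PENDING_AI", "AI_REVIEWING") === 1;
}

// Three workers receive the same job; exactly one claim succeeds.
const demoStore = new JobStore();
demoStore.enqueue("job-1");
const claims = ["worker-a", "worker-b", "worker-c"].map(() => tryClaim(demoStore, "job-1"));
console.log(claims.filter(Boolean).length);
```

The point of the pattern is that the status check and the status write are one operation, so a worker never reads PENDING_AI and then acts on a job that another worker claimed in between.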
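Webhook signature verification can be illustrated in a few lines. GitHub signs each delivery with an HMAC-SHA256 of the raw request body and sends it in the `X-Hub-Signature-256` header as `sha256=<hex digest>`. The sketch below assumes a Node.js runtime; the function names `signPayload` and `verifyGithubSignature` are hypothetical and are not taken from the project.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Computes the header value GitHub would send for this body and secret.
function signPayload(rawBody: string, secret: string): string {
  return "sha256=" + createHmac("sha256", secret).update(rawBody, "utf8").digest("hex");
}

// Returns true only when the header matches the HMAC of the raw body.
// The body must be the exact bytes received, not a re-serialized JSON object.
function verifyGithubSignature(
  rawBody: string,
  signatureHeader: string | undefined,
  secret: string,
): boolean {
  if (!signatureHeader || !signatureHeader.startsWith("sha256=")) return false;

  const expected = Buffer.from(signPayload(rawBody, secret), "utf8");
  const received = Buffer.from(signatureHeader, "utf8");

  // timingSafeEqual throws when lengths differ, so check length first.
  if (received.length !== expected.length) return false;

  // Constant-time comparison avoids leaking how many characters matched.
  return timingSafeEqual(received, expected);
}
```

A handler following the approach described above would call a check like this first and return an error response on failure, before enqueueing a job or touching any stored state.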
Results
Tech Stack

