HumanPR

SaaS

Pull requests pile up. A senior developer opens the queue on Monday morning and finds eleven PRs waiting, most of them touching style, obvious bugs, or the same patterns the team has flagged a hundred times before. The author has been waiting since Friday. Velocity stalls in the gap between "ready for review" and "merged."

HumanPR sits in that gap. When a PR opens, a GitHub App webhook hands the payload to a BullMQ worker, which first runs a deterministic rule pass for hardcoded secrets, leftover console.log calls, merge conflict markers, and missing license headers. Only what survives is routed to the LLM the organization has chosen, whether that is Claude, GPT, Gemini, or a self-hosted model behind its own tunnel. By the time a reviewer opens the PR in the dashboard, the cheap checks have already been triaged and each suggestion is ready to accept, reject, or comment on. The reviewer agrees or pushes back per suggestion, and those decisions feed per-org metrics on acceptance rate and turnaround.
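
As a rough illustration of the rule pass, the sketch below scans text line by line with a few regular expressions. It is not HumanPR's actual rule set: the names Finding, Rule, RULES, and runRulePass are hypothetical, the patterns are deliberately simple, and the license-header check is left out because it applies to a whole file rather than a single line.

```typescript
// Hypothetical shapes for illustration; none of these names come from HumanPR.
interface Finding {
  rule: string;
  line: number; // 1-based line number within the scanned text
  message: string;
}

interface Rule {
  id: string;
  message: string;
  pattern: RegExp; // tested one line at a time, so no global flag is needed
}

// Illustrative patterns only; a production rule set would be broader and tuned.
const RULES: Rule[] = [
  {
    id: "hardcoded-secret",
    message: "Possible hardcoded secret",
    pattern: /(api[_-]?key|secret|token|password)\s*[:=]\s*["'][^"']{8,}["']/i,
  },
  {
    id: "console-log",
    message: "Leftover console.log call",
    pattern: /\bconsole\.log\s*\(/,
  },
  {
    id: "merge-marker",
    message: "Unresolved merge conflict marker",
    pattern: /^(<{7}|={7}|>{7})(\s|$)/,
  },
];

// Runs every rule against every line and collects the matches in order.
function runRulePass(source: string): Finding[] {
  const findings: Finding[] = [];
  source.split("\n").forEach((text, index) => {
    for (const rule of RULES) {
      if (rule.pattern.test(text)) {
        findings.push({ rule: rule.id, line: index + 1, message: rule.message });
      }
    }
  });
  return findings;
}
```

A pass like this costs no tokens, and its findings can be attached to the PR before any model is called.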

The Challenge

Code review is the part of engineering that scales worst. The people qualified to review are also the people building the hardest features, so teams either let PRs sit (and watch velocity collapse), rubber-stamp them (and watch quality collapse), or burn senior hours on lint-level checks. Pointing a raw LLM at every diff sounds easy until the bill arrives, two workers race the same webhook and double-charge a customer, the chosen model is down, or a prompt leaks a secret a regex would have caught for free. The real problem was triage: deciding what deserves an LLM, what deserves a human, and what can be answered by a rule in a few milliseconds, while staying defensible enough that an admin can sign findings into a PR comment under their own name.

Our Solution

The orchestrator treats LLM calls as the expensive last resort. A deterministic rule engine runs first and filters out issues a regex can prove, so tokens are spent only on code that actually needs judgment. Surviving chunks are sized to each model's token budget, and Anthropic prompt caching is reused across reviews of the same repo. Provider choice is per-organization, with LiteLLM and a self-hosted vLLM box reachable through a Cloudflare Tunnel as fallbacks. Concurrent deliveries are handled by an atomic claim: workers transition a job from PENDING_AI to AI_REVIEWING via a single conditional update, so only one worker proceeds and the rest exit cleanly.

GitHub webhook signatures are verified before any payload-driven side effect. NextAuth v5 gates admin routes behind MFA, every admin mutation is appended to a hash-chained audit log, and the GDPR export and delete endpoints preserve that chain by anonymizing records rather than dropping them.
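
The atomic claim can be sketched as a conditional update whose affected-row count decides the winner. The exact query HumanPR uses is not given here, so the following is an assumption: the JobTable interface mirrors the shape of Prisma's updateMany({ where, data }), which resolves to { count }, and InMemoryJobTable and claimJob are hypothetical names. The in-memory table only stands in for PostgreSQL so the example runs on its own.

```typescript
type JobStatus = "PENDING_AI" | "AI_REVIEWING";

interface ClaimArgs {
  where: { id: string; status: JobStatus };
  data: { status: JobStatus };
}

// Minimal stand-in for a database table (hypothetical). The method mirrors
// the shape of Prisma's updateMany, which resolves to { count }.
interface JobTable {
  updateMany(args: ClaimArgs): Promise<{ count: number }>;
}

// In-memory table for demonstration only. A real database applies the
// WHERE check and the write as one atomic statement.
class InMemoryJobTable implements JobTable {
  private rows = new Map<string, JobStatus>();

  insert(id: string, status: JobStatus): void {
    this.rows.set(id, status);
  }

  async updateMany(args: ClaimArgs): Promise<{ count: number }> {
    const current = this.rows.get(args.where.id);
    // No row matches unless the job is still in the expected status.
    if (current !== args.where.status) return { count: 0 };
    this.rows.set(args.where.id, args.data.status);
    return { count: 1 };
  }
}

// Resolves to true for exactly one caller per job; all others see count 0.
async function claimJob(table: JobTable, jobId: string): Promise<boolean> {
  const { count } = await table.updateMany({
    where: { id: jobId, status: "PENDING_AI" },
    data: { status: "AI_REVIEWING" },
  });
  return count === 1;
}
```

A worker would call claimJob first and return immediately on false, so a duplicate webhook delivery or a restarted worker never reaches the LLM call or the billing step.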

Results

Routine PR feedback (secrets, merge markers, stale debug statements, missing headers) is caught before any LLM is called, keeping cost per review predictable as repo activity grows.
Concurrent webhook deliveries and worker restarts cannot double-bill or double-comment, because the AI_REVIEWING transition is a single atomic claim against the database.
Each organization can switch between Claude, GPT, Gemini, and a self-hosted model without code changes, trading speed against cost on their own terms.
Admin mutations, session revocations, and role changes are hashed into an append-only audit chain that an operator can verify end to end with one script.
GDPR export and delete are first-class endpoints, with the delete path anonymizing the user record so the audit chain stays intact.
Stripe Billing meters per suggestion rather than per seat, so pricing tracks the work the platform actually does.
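
The append-only audit chain and its end-to-end verification might look roughly like the sketch below. The field names, the SHA-256 choice, and the functions hashEntry, appendEntry, and verifyChain are all assumptions made for illustration. The hash covers an opaque actorId rather than personal data, which is one way an anonymizing delete could leave the chain verifiable; how HumanPR actually handles this is not stated here.

```typescript
import { createHash } from "node:crypto";

// Hypothetical record shape; the real audit schema is not described here.
interface AuditEntry {
  actorId: string; // opaque ID, not personal data
  action: string;
  timestamp: string;
  prevHash: string;
  hash: string;
}

// The first entry links to a fixed all-zero hash.
const GENESIS_HASH = "0".repeat(64);

// Hashes the entry's fields together with the previous entry's hash.
function hashEntry(prevHash: string, actorId: string, action: string, timestamp: string): string {
  return createHash("sha256")
    .update(JSON.stringify([prevHash, actorId, action, timestamp]))
    .digest("hex");
}

// Returns a new chain with one more entry linked to the current tail.
function appendEntry(chain: AuditEntry[], actorId: string, action: string, timestamp: string): AuditEntry[] {
  const prevHash = chain.length > 0 ? chain[chain.length - 1].hash : GENESIS_HASH;
  const hash = hashEntry(prevHash, actorId, action, timestamp);
  return [...chain, { actorId, action, timestamp, prevHash, hash }];
}

// Walks the chain from the start, recomputing every hash and link.
function verifyChain(chain: AuditEntry[]): boolean {
  let expectedPrev = GENESIS_HASH;
  for (const entry of chain) {
    if (entry.prevHash !== expectedPrev) return false;
    const recomputed = hashEntry(entry.prevHash, entry.actorId, entry.action, entry.timestamp);
    if (recomputed !== entry.hash) return false;
    expectedPrev = entry.hash;
  }
  return true;
}
```

Because each hash includes the previous one, editing or removing any earlier entry makes verifyChain fail from that point onward, which is what lets a single script check the whole log.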

Tech Stack

Next.js 16, React 19, TypeScript, BullMQ, Redis, PostgreSQL 16, Prisma 7, NextAuth v5, GitHub App, Octokit, Anthropic Claude, OpenAI GPT, Google Gemini, LiteLLM, Stripe Billing, Tailwind CSS v4, shadcn/ui, Resend, Loki, Grafana, Turborepo, pnpm, Vitest, Playwright, Docker, Coolify
