How Koalr Works
Architecture overview — how Koalr collects, processes, and surfaces engineering data.
Koalr connects to your existing tools (GitHub, Jira, PagerDuty, Codecov, etc.) via OAuth and webhooks, then normalizes the data into a unified engineering intelligence layer.
Data collection
Webhooks (real-time)
When you connect GitHub, Koalr registers a webhook on your organization. Every pull request event, deployment event, and check run is delivered in real time. Koalr processes these within seconds.
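The real-time path above can be sketched as a handler that accepts only the ingested event types and enqueues them for asynchronous processing. This is an illustrative sketch: the function and queue names are hypothetical, an in-memory array stands in for the BullMQ queue, and the real handler would also verify the webhook signature.

```typescript
// Event types Koalr ingests in real time (per the docs above).
type WebhookEvent = {
  event: "pull_request" | "deployment" | "check_run";
  deliveryId: string;
  payload: unknown;
};

// Stand-in for the BullMQ queue the real ingestion worker consumes.
const ingestQueue: WebhookEvent[] = [];

function handleGitHubWebhook(
  eventName: string,
  deliveryId: string,
  payload: unknown
): boolean {
  // Only PR, deployment, and check-run events are processed; anything
  // else is acknowledged and dropped.
  const accepted = ["pull_request", "deployment", "check_run"];
  if (!accepted.includes(eventName)) return false;
  ingestQueue.push({ event: eventName as WebhookEvent["event"], deliveryId, payload });
  return true;
}
```

Enqueueing immediately and doing all processing off the request path is what keeps webhook handling within seconds even under bursts.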
REST API sync (historical)
On first connection, Koalr fetches up to 1 year of historical data: merged PRs, deployment history, CODEOWNERS files, and repository metadata. This backfill populates your initial DORA metrics and deploy risk model.
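The one-year window can be expressed as a simple cutoff check. This is a hypothetical helper (the names `MergedPr` and `withinBackfillWindow` are not from Koalr's codebase); the real sync presumably pages through the GitHub REST API until results cross this cutoff.

```typescript
interface MergedPr {
  number: number;
  mergedAt: string; // ISO 8601 timestamp from the GitHub API
}

// True if a merged PR falls inside the 1-year historical backfill window.
function withinBackfillWindow(pr: MergedPr, now: Date): boolean {
  const cutoff = new Date(now);
  cutoff.setFullYear(cutoff.getFullYear() - 1); // up to 1 year of history
  return new Date(pr.mergedAt) >= cutoff;
}
```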
Scheduled syncs
For data that doesn't emit webhooks (CODEOWNERS drift detection, coverage snapshots, on-call schedules), Koalr runs scheduled syncs every 1–24 hours depending on the integration.
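A per-integration cadence table makes the 1–24 hour range concrete. The integration keys and exact intervals below are illustrative (chosen to match the freshness table later in this page); the real scheduler presumably uses BullMQ repeatable jobs rather than this helper.

```typescript
// Hypothetical sync cadences, in hours, per integration.
const syncIntervalHours: Record<string, number> = {
  codeowners_drift: 24,  // daily
  coverage_snapshot: 6,  // provider-dependent, 6-24h
  oncall_roster: 0.25,   // every 15 minutes
};

// Compute when an integration's next scheduled sync should run.
function nextSyncAt(integration: string, lastRun: Date): Date {
  const hours = syncIntervalHours[integration] ?? 24; // default to daily
  return new Date(lastRun.getTime() + hours * 60 * 60 * 1000);
}
```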
Processing pipeline
GitHub webhook
  → BullMQ queue
  → Ingestion worker
  → Prisma (PostgreSQL)
  → Risk scoring engine
  → ClickHouse (aggregations)
  → API → Dashboard
- Ingestion: raw events are enqueued in BullMQ and processed asynchronously
- Normalization: events are mapped to Koalr's unified schema (PR → Deployment → Incident chain)
- Scoring: deploy risk scores are computed per deployment using 7 weighted factors
- Aggregation: DORA metrics, cycle time, and review health are pre-aggregated in ClickHouse for fast queries
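The scoring step can be sketched as a weighted sum over normalized factors. The document only states that 7 weighted factors are used; the factor names and weights below are hypothetical placeholders, not Koalr's actual model.

```typescript
// Hypothetical factor weights (must sum to 1.0). The real factor set
// and weighting are not documented here.
const weights = {
  diffSize: 0.2,
  filesTouched: 0.15,
  reviewDepth: 0.15,
  testCoverageDelta: 0.15,
  authorRecency: 0.1,
  deployTimeOfDay: 0.1,
  recentIncidents: 0.15,
};

// Each factor is assumed pre-normalized to [0, 1]; the result is 0-100.
function riskScore(factors: Record<keyof typeof weights, number>): number {
  let score = 0;
  for (const [name, w] of Object.entries(weights)) {
    score += w * factors[name as keyof typeof weights];
  }
  return Math.round(score * 100);
}
```

Because the weights sum to 1, the score stays on a fixed 0–100 scale regardless of how many factors fire.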
Data freshness
| Data type | Freshness |
|---|---|
| PR events | < 5 seconds (webhook) |
| Deployment risk score | < 30 seconds (webhook trigger) |
| DORA metrics | < 1 minute (aggregation job) |
| CODEOWNERS drift | Daily (3 AM UTC) |
| Coverage snapshots | Every 6–24 hours (provider-dependent) |
| On-call roster | Every 15 minutes (live query) |
Multi-tenancy
All data is scoped to your organization. No data is shared between organizations. Each query is automatically filtered by organizationId at the database level.
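The automatic organizationId filter can be sketched as a helper that injects the tenant scope into every query. The helper name is hypothetical; in the real service this is presumably done by a Prisma middleware or client extension so callers can't forget it.

```typescript
interface ScopedWhere {
  organizationId: string;
  [key: string]: unknown;
}

// Merge the tenant scope into a caller-supplied Prisma-style `where`
// clause. Spreading organizationId last means a caller-supplied value
// can never widen access beyond the current organization.
function scopeToOrg(
  where: Record<string, unknown>,
  organizationId: string
): ScopedWhere {
  return { ...where, organizationId };
}
```

Applying the scope at the query-construction layer, rather than in each endpoint, is what makes the "filtered at the database level" guarantee hold uniformly.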