How Koalr Works
Architecture overview — how Koalr collects, processes, and surfaces engineering data.
Koalr connects to your existing tools (GitHub, Jira, PagerDuty, Codecov, etc.) via OAuth and webhooks, then normalizes the data into a unified engineering intelligence layer.
Data collection
Webhooks (real-time)
When you connect GitHub, Koalr registers a webhook on your organization. Every pull request event, deployment event, and check run is delivered in real time. Koalr processes these within seconds.
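The real-time path above can be sketched as a handler that accepts only the ingested event types and enqueues them for asynchronous processing. This is an illustrative sketch: the function and queue names are hypothetical, an in-memory array stands in for the BullMQ queue, and the real handler would also verify the webhook signature.

```typescript
// Event types Koalr ingests in real time (per the docs above).
type WebhookEvent = {
  event: "pull_request" | "deployment" | "check_run";
  deliveryId: string;
  payload: unknown;
};

// Stand-in for the BullMQ queue the real ingestion worker consumes.
const ingestQueue: WebhookEvent[] = [];

function handleGitHubWebhook(
  eventName: string,
  deliveryId: string,
  payload: unknown
): boolean {
  // Only PR, deployment, and check-run events are processed; anything
  // else is acknowledged and dropped.
  const accepted = ["pull_request", "deployment", "check_run"];
  if (!accepted.includes(eventName)) return false;
  ingestQueue.push({ event: eventName as WebhookEvent["event"], deliveryId, payload });
  return true;
}
```

Enqueueing immediately and doing all processing off the request path is what keeps webhook handling within seconds even under bursts.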
REST API sync (historical)
On first connection, Koalr fetches up to 1 year of historical data: merged PRs, deployment history, CODEOWNERS files, and repository metadata. This backfill populates your initial DORA metrics and deploy risk model.
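The one-year window can be expressed as a simple cutoff check. This is a hypothetical helper (the names `MergedPr` and `withinBackfillWindow` are not from Koalr's codebase); the real sync presumably pages through the GitHub REST API until results cross this cutoff.

```typescript
interface MergedPr {
  number: number;
  mergedAt: string; // ISO 8601 timestamp from the GitHub API
}

// True if a merged PR falls inside the 1-year historical backfill window.
function withinBackfillWindow(pr: MergedPr, now: Date): boolean {
  const cutoff = new Date(now);
  cutoff.setFullYear(cutoff.getFullYear() - 1); // up to 1 year of history
  return new Date(pr.mergedAt) >= cutoff;
}
```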
Scheduled syncs
For data that doesn't emit webhooks (CODEOWNERS drift detection, coverage snapshots, on-call schedules), Koalr runs scheduled syncs every 1–24 hours depending on the integration.
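A per-integration cadence table makes the 1–24 hour range concrete. The integration keys and exact intervals below are illustrative (chosen to match the freshness table later in this page); the real scheduler presumably uses BullMQ repeatable jobs rather than this helper.

```typescript
// Hypothetical sync cadences, in hours, per integration.
const syncIntervalHours: Record<string, number> = {
  codeowners_drift: 24,  // daily
  coverage_snapshot: 6,  // provider-dependent, 6-24h
  oncall_roster: 0.25,   // every 15 minutes
};

// Compute when an integration's next scheduled sync should run.
function nextSyncAt(integration: string, lastRun: Date): Date {
  const hours = syncIntervalHours[integration] ?? 24; // default to daily
  return new Date(lastRun.getTime() + hours * 60 * 60 * 1000);
}
```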
Processing pipeline
GitHub webhook
  → BullMQ queue
  → Ingestion worker
  → Prisma (PostgreSQL)
  → Risk scoring engine
  → ClickHouse (aggregations)
  → API → Dashboard
- Ingestion: raw events are enqueued in BullMQ and processed asynchronously
- Normalization: events are mapped to Koalr's unified schema (PR → Deployment → Incident chain)
- Scoring: deploy risk scores are computed per deployment using 7 weighted factors
- Aggregation: DORA metrics, cycle time, and review health are pre-aggregated in ClickHouse for fast queries
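The scoring step can be sketched as a weighted sum over normalized factors. The document only states that 7 weighted factors are used; the factor names and weights below are hypothetical placeholders, not Koalr's actual model.

```typescript
// Hypothetical factor weights (must sum to 1.0). The real factor set
// and weighting are not documented here.
const weights = {
  diffSize: 0.2,
  filesTouched: 0.15,
  reviewDepth: 0.15,
  testCoverageDelta: 0.15,
  authorRecency: 0.1,
  deployTimeOfDay: 0.1,
  recentIncidents: 0.15,
};

// Each factor is assumed pre-normalized to [0, 1]; the result is 0-100.
function riskScore(factors: Record<keyof typeof weights, number>): number {
  let score = 0;
  for (const [name, w] of Object.entries(weights)) {
    score += w * factors[name as keyof typeof weights];
  }
  return Math.round(score * 100);
}
```

Because the weights sum to 1, the score stays on a fixed 0–100 scale regardless of how many factors fire.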
Data freshness
| Data type | Freshness |
|---|---|
| PR events | < 5 seconds (webhook) |
| Deployment risk score | < 30 seconds (webhook trigger) |
| DORA metrics | < 1 minute (aggregation job) |
| CODEOWNERS drift | Daily (3 AM UTC) |
| Coverage snapshots | Every 6–24 hours (provider-dependent) |
| On-call roster | Every 15 minutes (live query) |
Multi-tenancy
All data is scoped to your organization. No data is shared between organizations. Each query is automatically filtered by organizationId at the database level.
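The automatic organizationId filter can be sketched as a helper that injects the tenant scope into every query. The helper name is hypothetical; in the real service this is presumably done by a Prisma middleware or client extension so callers can't forget it.

```typescript
interface ScopedWhere {
  organizationId: string;
  [key: string]: unknown;
}

// Merge the tenant scope into a caller-supplied Prisma-style `where`
// clause. Spreading organizationId last means a caller-supplied value
// can never widen access beyond the current organization.
function scopeToOrg(
  where: Record<string, unknown>,
  organizationId: string
): ScopedWhere {
  return { ...where, organizationId };
}
```

Applying the scope at the query-construction layer, rather than in each endpoint, is what makes the "filtered at the database level" guarantee hold uniformly.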