sofarr/ARCHITECTURE.md
Gronod 1bef14d590
feat(webhooks): security hardening, tests, full documentation audit & polish (Phase 6)
2026-05-19 17:11:45 +01:00

# sofarr — Architecture Reference
> Concise top-level architecture guide. For the full deep-dive (API reference, matching pipeline, deployment) see [`docs/ARCHITECTURE.md`](docs/ARCHITECTURE.md).
---
## 1. Overview
sofarr is a **Node.js/Express** server with a single-page front end. It aggregates download activity from multiple media automation services, filters the results by Emby user identity, and presents a real-time, personalised dashboard.
Three pluggable layers form the core:
| Layer | Name | Location |
|-------|------|----------|
| Download client abstraction | **PDCA** — Pluggable Download Client Architecture | `server/clients/` + `server/utils/downloadClients.js` |
| *arr data retrieval | **PALDRA** — Pluggable *Arr Library Data Retrieval Architecture | `server/utils/arrRetrievers.js` |
| Real-time push | **Webhook receiver** | `server/routes/webhook.js` |
---
## 2. Request / Data Flow
```
Browser (SPA)
│ POST /api/auth/login → Auth routes → Emby verify → set httpOnly cookie
│ GET /api/dashboard/stream → SSE stream → poller cache → matched downloads
│ POST /api/webhook/* ← Sonarr/Radarr push events
Express Server (:3001)
├── Helmet (CSP nonce, HSTS, X-Frame-Options, …)
├── express-rate-limit (300/15 min general; 60/1 min webhook)
├── cookie-parser (HMAC-signed session cookie)
├── verifyCsrf (double-submit cookie, all state-changing /api routes except auth + webhook)
├── /api/auth → login, logout, me, csrf
├── /api/webhook → [rate-limit] → [secret validation] → [payload validation]
│ → [replay check] → updateWebhookMetrics → processWebhookEvent
├── /api/dashboard → requireAuth → read cache → match downloads → SSE/JSON
├── /api/history → requireAuth → historyFetcher (5 min cache) → filter + dedup
├── /api/sonarr|radarr → requireAuth → verifyCsrf → proxy to *arr API
└── /api/sabnzbd|emby → requireAuth → verifyCsrf → proxy
Background:
Poller (setInterval POLL_INTERVAL ms)
└── shouldSkipInstancePolling? ──yes──► extend cache TTL, increment pollsSkipped
│ no (or fallback triggered)
PDCA Registry.getDownloadsByClientType()
PALDRA Registry.getQueuesByType() / getHistoryByType() / getTagsByType()
cache.set('poll:*', data, TTL)
notify pollSubscribers → SSE push to all connected browsers
```
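The last step of the background flow (notify subscribers, then push over SSE) can be sketched as a small publish/subscribe helper. This is a minimal illustration, not sofarr's actual code: `subscribe`, `notifySubscribers`, and `toSseFrame` are hypothetical names, and only `pollSubscribers` appears in the diagram above.

```javascript
// Hypothetical sketch of the subscriber fan-out; names are illustrative.
const pollSubscribers = new Set();

// Register a callback (one per open SSE connection).
// Returns an unsubscribe function to call when the connection closes.
function subscribe(cb) {
  pollSubscribers.add(cb);
  return () => pollSubscribers.delete(cb);
}

// Called after the cache has been refreshed for a poll cycle.
function notifySubscribers(snapshot) {
  for (const cb of pollSubscribers) {
    try {
      cb(snapshot);
    } catch (err) {
      // A failing connection must not block delivery to the others.
      console.error('[Poller] subscriber failed:', err.message);
    }
  }
}

// Format a payload as a single Server-Sent Events frame.
function toSseFrame(event, data) {
  return `event: ${event}\ndata: ${JSON.stringify(data)}\n\n`;
}
```

In an Express handler, the callback would typically write `toSseFrame(...)` to the response, and the unsubscribe function would run on the request's `close` event.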
---
## 3. Pluggable Download Client Architecture (PDCA)
All download clients extend `DownloadClient` (abstract base in `server/clients/DownloadClient.js`):
```
DownloadClient (abstract)
├── SABnzbdClient — REST API, API key auth
├── QBittorrentClient — Sync API (incremental deltas), cookie auth, fallback to /torrents/info
├── TransmissionClient — JSON-RPC, session-ID management
└── RTorrentClient — XML-RPC, HTTP Basic Auth
```
`DownloadClientRegistry` (`server/utils/downloadClients.js`) initialises all configured clients from `*_INSTANCES` env vars, fetches from all in parallel, and returns a `{ sabnzbd, qbittorrent, transmission, rtorrent }` map. Individual client failures are isolated.
**Adding a new client:** extend `DownloadClient`, implement `getActiveDownloads()` returning `NormalizedDownload[]`, register in the registry factory.
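As a rough sketch of those steps, the example below adds a client for an imaginary service. Everything apart from `DownloadClient` and `getActiveDownloads()` is an assumption: the stand-in base class, the `ExampleGetClient` name, the injected `fetchJobs` function, and the field names of the normalized object are all hypothetical. Check the real `NormalizedDownload` shape in `server/clients/` before copying this.

```javascript
// Stand-in for server/clients/DownloadClient.js so the sketch runs on its own.
class DownloadClient {
  constructor(config) {
    this.id = config.id;
    this.url = config.url;
  }

  async getActiveDownloads() {
    throw new Error('getActiveDownloads() must be implemented by the subclass');
  }
}

// Hypothetical client for an imaginary "ExampleGet" service.
class ExampleGetClient extends DownloadClient {
  constructor(config, fetchJobs) {
    super(config);
    this.fetchJobs = fetchJobs; // injected so the sketch needs no network access
  }

  // Map one service-specific job to the (assumed) normalized shape.
  normalize(job) {
    return {
      id: String(job.id),
      name: job.title,
      progress: job.total > 0 ? job.done / job.total : 0,
      client: 'exampleget',
      instance: this.id,
    };
  }

  async getActiveDownloads() {
    const jobs = await this.fetchJobs(this.url);
    return jobs.map((job) => this.normalize(job));
  }
}
```

Keeping the mapping in a separate synchronous `normalize()` method makes it easy to unit-test without mocking HTTP.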
---
## 4. Pluggable *Arr Library Data Retrieval Architecture (PALDRA)
`server/utils/arrRetrievers.js` provides `arrRetrieverRegistry` which:
- Initialises one retriever per configured Sonarr/Radarr instance
- Exposes `getQueuesByType()`, `getHistoryByType()`, `getTagsByType()` — returning results keyed by `sonarr` / `radarr`
- Results carry `{ instance: instanceId, data: … }` so callers can look up instance credentials
The poller and webhook processor both use the same registry, ensuring consistency.
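A caller might consume these keyed results roughly as follows. This sketch assumes each type maps to an array of `{ instance, data }` entries and that `data` is an array of records. The `flattenQueues` helper and the `instancesById` lookup are hypothetical, and `_instanceUrl` follows the cache key table's description of queue records.

```javascript
// Assumed input shape:
//   { sonarr: [{ instance, data: [...] }], radarr: [{ instance, data: [...] }] }
// instancesById is a hypothetical lookup from instance id to its config.
function flattenQueues(queuesByType, instancesById) {
  const records = [];
  for (const [type, results] of Object.entries(queuesByType)) {
    for (const { instance, data } of results) {
      const cfg = instancesById[instance];
      if (!cfg) continue; // unknown instance: skip rather than guess credentials
      for (const record of data) {
        // Tag each record so later stages know where it came from.
        records.push({ ...record, _type: type, _instanceUrl: cfg.url });
      }
    }
  }
  return records;
}
```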
---
## 5. Webhook Flow (Phase 15.1)
```
Sonarr/Radarr
POST /api/webhook/sonarr (X-Sofarr-Webhook-Secret: <secret>)
{
"eventType": "Grab",
"instanceName": "Main Sonarr",
"date": "2026-05-19T10:00:00.000Z",
}
webhookLimiter (60 req/min/IP)
validateWebhookSecret() ──fail──► 401 Unauthorized
│ ok
validatePayload() ──fail──► 400 Bad Request
│ ok
isReplay() ──yes───► 200 { received: true, duplicate: true }
│ no
cache.updateWebhookMetrics(instance.url) ← activates smart polling skip
processWebhookEvent('sonarr', 'Grab') [fire-and-forget]
├── classify: Grab → QUEUE_EVENT
├── arrRetrieverRegistry.getQueuesByType()
├── cache.set('poll:sonarr-queue', …, CACHE_TTL)
└── pollAllServices() → pollSubscribers.forEach(cb) → SSE push
200 { received: true } (returned immediately, before fire-and-forget completes)
```
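The three gates in the flow above (secret, payload, replay) could look something like the sketch below. It is an illustration under stated assumptions, not the code in `server/routes/webhook.js`: the event allowlist, the `handleWebhook` wrapper, and the in-memory `seen` map are hypothetical, and the 5-minute window is taken from the security table in this document.

```javascript
const crypto = require('node:crypto');

const REPLAY_WINDOW_MS = 5 * 60 * 1000;
const ALLOWED_EVENTS = new Set(['Grab', 'Download', 'Test']); // assumed allowlist

// Constant-time comparison; hashing first yields equal-length buffers.
function validateWebhookSecret(provided, expected) {
  if (typeof provided !== 'string' || provided.length === 0) return false;
  const a = crypto.createHash('sha256').update(provided).digest();
  const b = crypto.createHash('sha256').update(expected).digest();
  return crypto.timingSafeEqual(a, b);
}

// Accept only known event types with the fields the replay key needs.
function validatePayload(body) {
  return Boolean(body)
    && typeof body === 'object'
    && ALLOWED_EVENTS.has(body.eventType)
    && typeof body.instanceName === 'string'
    && typeof body.date === 'string';
}

const seen = new Map(); // replay key -> expiry timestamp (ms)

function isReplay(body, now = Date.now()) {
  // Drop expired keys so the map cannot grow without bound.
  for (const [key, expiry] of seen) {
    if (expiry <= now) seen.delete(key);
  }
  const key = JSON.stringify([body.eventType, body.instanceName, body.date]);
  if (seen.has(key)) return true;
  seen.set(key, now + REPLAY_WINDOW_MS);
  return false;
}

// Returns the status and body each branch of the flow would produce.
function handleWebhook(headers, body, expectedSecret, now = Date.now()) {
  if (!validateWebhookSecret(headers['x-sofarr-webhook-secret'], expectedSecret)) {
    return { status: 401, body: { error: 'Unauthorized' } };
  }
  if (!validatePayload(body)) {
    return { status: 400, body: { error: 'Bad Request' } };
  }
  if (isReplay(body, now)) {
    return { status: 200, body: { received: true, duplicate: true } };
  }
  return { status: 200, body: { received: true } };
}
```

Node lowercases incoming header names, which is why the lookup uses `x-sofarr-webhook-secret`.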
---
## 6. Smart Polling Optimization (Phase 5)
```
pollAllServices() called every POLL_INTERVAL ms:
  globalMetrics = cache.getGlobalWebhookMetrics()
  fallbackTriggered = lastGlobalWebhookTimestamp is older than WEBHOOK_FALLBACK_TIMEOUT
  for each service type (sonarr, radarr):
    shouldSkip = !fallbackTriggered
                 && all instances have metrics.eventsReceived > 0
                 && all instances have metrics.lastWebhookTimestamp within WEBHOOK_FALLBACK_TIMEOUT
    if shouldSkip:
      extend TTL of existing cached data   ← no API calls made
      increment metrics.pollsSkipped
      log "[Poller] Skipping sonarr polling for N instance(s) with active webhooks"
    else:
      fetch from *arr APIs → update cache
```
**Result:** zero *arr API calls per poll cycle while webhooks are active and recent. Polling resumes automatically after `WEBHOOK_FALLBACK_TIMEOUT` minutes without a webhook (default: 10).
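The skip decision can be expressed as a pure function, which makes the rule easy to test. This is a sketch under assumptions: the function names and argument order are hypothetical, the metric field names are taken from the pseudocode above, and the timeout is passed in milliseconds (10 minutes is `10 * 60 * 1000`). Treating an empty instance list as "do not skip" is also an assumption.

```javascript
// True when no webhook has arrived from any instance within the timeout.
function isFallbackTriggered(lastGlobalWebhookTimestamp, timeoutMs, now) {
  return now - lastGlobalWebhookTimestamp > timeoutMs;
}

// instanceMetrics: array of { eventsReceived, lastWebhookTimestamp } per instance.
function shouldSkipPolling(instanceMetrics, globalMetrics, timeoutMs, now = Date.now()) {
  // With no instances there is nothing vouching for the cache, so poll.
  if (instanceMetrics.length === 0) return false;
  if (isFallbackTriggered(globalMetrics.lastGlobalWebhookTimestamp, timeoutMs, now)) {
    return false;
  }
  // Every instance must have sent at least one webhook, and recently.
  return instanceMetrics.every(
    (m) => m.eventsReceived > 0 && now - m.lastWebhookTimestamp <= timeoutMs,
  );
}
```

Note that a single stale instance forces polling for the whole service type, which matches the "all instances" wording in the pseudocode.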
---
## 7. Cache Keys
| Key | Content | TTL |
|-----|---------|-----|
| `poll:sab-queue` | SABnzbd queue slots + status | `POLL_INTERVAL × 3` |
| `poll:sab-history` | SABnzbd history slots | `POLL_INTERVAL × 3` |
| `poll:sonarr-queue` | Sonarr queue records (with `_instanceUrl`) | `POLL_INTERVAL × 3` |
| `poll:sonarr-history` | Sonarr history records | `POLL_INTERVAL × 3` |
| `poll:sonarr-tags` | Sonarr tag list per instance | `POLL_INTERVAL × 3` |
| `poll:radarr-queue` | Radarr queue records (with `_instanceUrl`) | `POLL_INTERVAL × 3` |
| `poll:radarr-history` | Radarr history records | `POLL_INTERVAL × 3` |
| `poll:radarr-tags` | Radarr tag list | `POLL_INTERVAL × 3` |
| `poll:qbittorrent` | qBittorrent torrent list | `POLL_INTERVAL × 3` |
| `history:sonarr` | Sonarr history (on-demand, `/api/history/recent`) | 5 min |
| `history:radarr` | Radarr history (on-demand) | 5 min |
| `emby:users` | Emby user list | 60 s |
When polling is disabled (`POLL_INTERVAL=0`), all `poll:*` TTLs fall back to 30 s.
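The TTL rule for `poll:*` keys is simple arithmetic: with `POLL_INTERVAL=5000`, entries live for 15 000 ms; with `POLL_INTERVAL=0`, they live for 30 000 ms. A minimal sketch follows, assuming `POLL_INTERVAL` is in milliseconds. `pollCacheTtl` and this tiny `MemoryCache` are hypothetical stand-ins, not the implementation in `server/utils/cache.js`.

```javascript
const FALLBACK_TTL_MS = 30 * 1000; // used when polling is disabled

// TTL for poll:* keys: three poll intervals, or 30 s when POLL_INTERVAL is 0.
function pollCacheTtl(pollIntervalMs) {
  return pollIntervalMs > 0 ? pollIntervalMs * 3 : FALLBACK_TTL_MS;
}

// Minimal TTL cache with lazy expiry on read.
class MemoryCache {
  constructor() {
    this.store = new Map();
  }

  set(key, value, ttlMs, now = Date.now()) {
    this.store.set(key, { value, expiresAt: now + ttlMs });
  }

  get(key, now = Date.now()) {
    const entry = this.store.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= now) {
      this.store.delete(key); // expired: evict and report a miss
      return undefined;
    }
    return entry.value;
  }
}
```

Three intervals of headroom means a single slow or failed poll cycle does not empty the dashboard.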
---
## 8. Security Model
| Concern | Mechanism |
|---------|-----------|
| User authentication | Emby credentials → httpOnly HMAC-signed cookie |
| Session validation | `requireAuth` middleware on all `/api/dashboard`, `/api/history`, and proxy routes |
| CSRF | Double-submit cookie (`X-CSRF-Token` header) on all state-changing `/api` routes except auth and webhook |
| Webhook auth | Shared secret on `X-Sofarr-Webhook-Secret` header (webhook routes are outside CSRF) |
| Webhook input | `validatePayload()` allowlists event types; rejects invalid shapes |
| Webhook replay | 5-minute nonce cache keyed on `(eventType, instanceName, date)` |
| Rate limiting | 300 req/15 min (general), 10 fails/15 min (login), 60 req/1 min (webhook) |
| Secret leakage | `sanitizeError()` redacts all secrets from error messages and logs |
| Headers | Helmet v7: CSP nonce, HSTS, X-Frame-Options DENY, noSniff, Referrer-Policy |
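To illustrate the double-submit CSRF row, the sketch below compares the token from the cookie with the token from the `X-CSRF-Token` header. It is a simplified illustration, not the code in `server/middleware/verifyCsrf.js`: `issueCsrfToken`, `csrfOk`, and the safe-method list are assumptions.

```javascript
const crypto = require('node:crypto');

// Methods that do not change state are exempt from the check (assumed list).
const SAFE_METHODS = new Set(['GET', 'HEAD', 'OPTIONS']);

// Random token, set as a cookie and echoed back by the SPA in a header.
function issueCsrfToken() {
  return crypto.randomBytes(32).toString('hex');
}

// Double-submit check: the cookie value and the header value must match.
function csrfOk(method, cookieToken, headerToken) {
  if (SAFE_METHODS.has(method)) return true;
  if (typeof cookieToken !== 'string' || typeof headerToken !== 'string') return false;
  const a = Buffer.from(cookieToken);
  const b = Buffer.from(headerToken);
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return a.length === b.length && crypto.timingSafeEqual(a, b);
}
```

The check works because a cross-site page can cause the browser to send the cookie but cannot read it, so it cannot supply a matching header.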
---
## 9. Directory Structure (summary)
```
sofarr/
├── server/
│ ├── app.js Express factory (imported by tests + index.js)
│ ├── index.js Entry point: logging, listen, start poller
│ ├── clients/ PDCA — one file per download client
│ ├── routes/
│ │ ├── auth.js Login / logout / csrf / me
│ │ ├── dashboard.js SSE stream, downloads, status, cover-art
│ │ ├── history.js Recently completed downloads
│ │ ├── webhook.js Webhook receiver (Phase 16)
│ │ ├── sonarr.js Sonarr API proxy + webhook management
│ │ └── radarr.js Radarr API proxy + webhook management
│ ├── middleware/
│ │ ├── requireAuth.js Cookie auth enforcement
│ │ └── verifyCsrf.js Double-submit CSRF check
│ └── utils/
│ ├── arrRetrievers.js PALDRA — Sonarr/Radarr fetch registry
│ ├── cache.js MemoryCache + webhook metrics helpers
│ ├── config.js Multi-instance config parser
│ ├── downloadClients.js PDCA registry + factory
│ ├── historyFetcher.js History fetch + event classification
│ ├── poller.js Smart background polling engine
│ ├── sanitizeError.js Secret redaction from errors
│ └── tokenStore.js Emby token store (JSON file, atomic writes)
├── public/ Static SPA (HTML + CSS + vanilla JS)
├── tests/
│ ├── setup.js Isolated DATA_DIR, SKIP_RATE_LIMIT
│ ├── unit/ Pure unit tests
│ └── integration/ Supertest + nock integration tests
├── docs/ARCHITECTURE.md Full deep-dive architecture documentation
├── ARCHITECTURE.md This file — concise reference
├── SECURITY.md Threat model + hardening guide
├── CHANGELOG.md Version history
└── .env.sample Annotated configuration template
```
---
*For complete API reference, data-flow diagrams, download matching pipeline, qBittorrent Sync API details, and deployment guidance see [`docs/ARCHITECTURE.md`](docs/ARCHITECTURE.md).*