# 113 — Stirling PDF Deployment Guide
> Status: **PLANNED** — not yet deployed
> CT ID: 113 · IP: 192.168.1.113
> Domain: `pdf.spendlik.sk`
> Last updated: 2026-06-22
---
## Overview
Stirling PDF is a self-hosted, open-source PDF toolkit with 50+ operations (merge, split, OCR, compress, convert, redact, sign, rotate, watermark, etc.). All processing happens locally — no documents leave the network.
Primary use case in this homelab: PDF preprocessing for the Paperless-ngx pipeline (CT 111). Potential n8n integration for automated document processing.
Stack: single Docker container (Java/Spring Boot backend with a bundled web frontend), no database required.
---
## Resource Allocation
| Resource | Allocation |
|---|---|
| **CT ID** | 113 |
| **IP** | 192.168.1.113 |
| **CPUs** | 2 |
| **RAM** | 1 GB (idle ~512 MB; OCR/conversion peaks higher) |
| **Disk** | 8 GB |
| **Template** | Debian 13 (trixie) |
| **Privileged** | Yes (simplest way to run Docker in LXC; an unprivileged CT would also need `keyctl=1`) |
| **Nesting** | Enabled (`features: nesting=1`) |
---
## Phase 1 — Create LXC Container
In the Proxmox web UI terminal on the host:
```bash
pct create 113 local:vztmpl/debian-13-standard_13.0-1_amd64.tar.zst \
--hostname stirling-pdf \
--cores 2 \
--memory 1024 \
--swap 512 \
--rootfs local-lvm:8 \
--net0 name=eth0,bridge=vmbr0,ip=192.168.1.113/24,gw=192.168.1.1 \
--unprivileged 0 \
--features nesting=1 \
--ostype debian \
--start 1
```
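Before entering the container, it can be worth confirming that the settings this guide relies on actually landed in the config. The sketch below is a hypothetical helper (`audit_ct_config` is not a Proxmox command) that reads `pct config` output on stdin and flags a missing `nesting=1` feature or a missing `onboot: 1`, assuming the usual `key: value` layout of that output:

```bash
#!/usr/bin/env bash
# audit_ct_config: reads `pct config <id>` output on stdin.
# Hypothetical helper, not part of Proxmox; it only pattern-matches the text.
audit_ct_config() {
  local cfg rc=0
  cfg=$(cat)
  # Docker inside LXC needs the nesting feature.
  if ! printf '%s\n' "$cfg" | grep -Eq '^features:.*nesting=1'; then
    echo "features: nesting=1 is missing"
    rc=1
  fi
  # Without onboot the container stays stopped after a host reboot.
  if ! printf '%s\n' "$cfg" | grep -Eq '^onboot:[[:space:]]*1'; then
    echo "onboot: 1 is missing"
    rc=1
  fi
  return "$rc"
}
```

On the Proxmox host, run it as `pct config 113 | audit_ct_config`. The `pct create` command above does not set `onboot`, so expect that warning until `pct set 113 --onboot 1` has been run.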
Enter the container:
```bash
pct enter 113
```
---
## Phase 2 — Base Setup
```bash
apt update && apt upgrade -y
apt install -y nano curl ca-certificates gnupg lsb-release
```
---
## Phase 3 — Install Docker
```bash
install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/debian/gpg | gpg --dearmor -o /etc/apt/keyrings/docker.gpg
chmod a+r /etc/apt/keyrings/docker.gpg
echo \
"deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] \
https://download.docker.com/linux/debian \
$(lsb_release -cs) stable" | tee /etc/apt/sources.list.d/docker.list > /dev/null
apt update
apt install -y docker-ce docker-ce-cli containerd.io docker-compose-plugin
```
Verify:
```bash
docker run --rm hello-world
```
---
## Phase 4 — Deploy Stirling PDF
```bash
mkdir -p /opt/stirling-pdf
cd /opt/stirling-pdf
nano docker-compose.yml
```
Paste:
```yaml
services:
  stirling-pdf:
    image: stirlingtools/stirling-pdf:latest
    container_name: stirling-pdf
    restart: unless-stopped
    ports:
      - "8080:8080"
    volumes:
      - ./trainingData:/usr/share/tessdata
      - ./extraConfigs:/configs
      - ./logs:/logs
    environment:
      - DOCKER_ENABLE_SECURITY=false
      - INSTALL_BOOK_AND_ADVANCED_HTML_OPS=false
      - LANGS=en_GB
```
> ⚠️ `DOCKER_ENABLE_SECURITY=false` is correct for setups where authentication is handled externally by Authelia. Do not enable internal login as well — it conflicts.
Start:
```bash
docker compose up -d
docker compose logs -f
```
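Wait for the Spring Boot startup line (`Started SPDFApplication` in recent releases; the class name may differ in yours), then press Ctrl+C to stop following the logs. Then verify locally: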
```bash
curl -s http://localhost:8080 | grep -i stirling
```
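Tailing the logs by hand works, but a scripted readiness check is handy for repeat deployments. A minimal sketch, assuming the container answers on `http://localhost:8080` as in the compose file; the function name `wait_for_http`, the 120-second default, and the 2-second poll interval are illustrative choices, not part of Stirling PDF:

```bash
#!/usr/bin/env bash
# wait_for_http URL [TIMEOUT_SECONDS]
# Polls URL until curl gets a non-error response (HTTP status below 400)
# or the timeout expires. Hypothetical helper; name and defaults are illustrative.
wait_for_http() {
  local url="$1" timeout="${2:-120}" waited=0
  while [ "$waited" -lt "$timeout" ]; do
    # -f: fail on HTTP errors; -s: silent; --max-time bounds each attempt.
    if curl -fs --max-time 5 -o /dev/null "$url"; then
      echo "ready after ${waited}s: $url"
      return 0
    fi
    sleep 2
    waited=$((waited + 2))
  done
  echo "timed out after ${timeout}s: $url" >&2
  return 1
}
```

Inside CT 113, `wait_for_http http://localhost:8080 180` returns 0 as soon as the UI responds and 1 if it never does, which makes it usable as a gate in a larger setup script.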
---
## Phase 5 — nginx Reverse Proxy (CT 101)
Enter CT 101:
```bash
pct enter 101
nano /etc/nginx/sites-available/stirling-pdf
```
Paste:
```nginx
server {
    listen 80;
    server_name pdf.spendlik.sk;

    location / {
        proxy_pass http://192.168.1.113:8080;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        client_max_body_size 100M;
    }
}
```
> ⚠️ `client_max_body_size 100M` is important: nginx defaults to 1 MB and rejects larger uploads with `413 Request Entity Too Large`.
Enable and reload:
```bash
ln -s /etc/nginx/sites-available/stirling-pdf /etc/nginx/sites-enabled/
nginx -t && systemctl reload nginx
```
---
## Phase 6 — SSL Certificate
Still in CT 101:
```bash
certbot --nginx -d pdf.spendlik.sk
```
> ⚠️ After certbot runs, always inspect the config:
```bash
cat /etc/nginx/sites-available/stirling-pdf
```
Check for:
- Duplicate `server_name` directives
- Missing closing `}` brace
- `listen 443 ssl` block correctly added
If anything looks wrong, fix manually — do not re-run certbot without correcting first.
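The three checks above can be partly automated. The sketch below is a hypothetical helper (`check_vhost` is not an nginx or certbot tool) that counts braces, looks for a `listen 443 ssl` directive, and compares the number of `server_name` lines with the number of `server {` blocks. It is deliberately rough: braces inside comments are counted too, so treat it as a complement to `nginx -t`, not a replacement.

```bash
#!/usr/bin/env bash
# check_vhost FILE
# Rough sanity checks on an nginx vhost file after certbot has edited it.
# Hypothetical helper; prints one line per problem and returns 1 if any is found.
check_vhost() {
  local file="$1" opens closes names blocks rc=0
  # Unequal counts of "{" and "}" point to a missing brace.
  opens=$(tr -cd '{' < "$file" | wc -c)
  closes=$(tr -cd '}' < "$file" | wc -c)
  if [ "$opens" -ne "$closes" ]; then
    echo "brace mismatch: $opens opening vs $closes closing"
    rc=1
  fi
  # certbot should have added an HTTPS listener.
  if ! grep -Eq '^[[:space:]]*listen[[:space:]]+(\[::\]:)?443[[:space:]]+ssl' "$file"; then
    echo "no 'listen 443 ssl' directive found"
    rc=1
  fi
  # More server_name lines than server blocks suggests a duplicate.
  names=$(grep -Ec '^[[:space:]]*server_name[[:space:]]' "$file")
  blocks=$(grep -Ec '^[[:space:]]*server[[:space:]]*\{' "$file")
  if [ "$names" -gt "$blocks" ]; then
    echo "possible duplicate server_name: $names directives in $blocks server blocks"
    rc=1
  fi
  return "$rc"
}
```

In CT 101: `check_vhost /etc/nginx/sites-available/stirling-pdf && nginx -t`.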
---
## Phase 7 — DNS Record
In WebSupport admin panel:
1. Add A record: `pdf` → current public IP
2. **Check both DNS management pages** — missing the second page has caused outages before
3. Note the numeric record ID assigned by WebSupport
4. Add the record ID to `00_index.md` DNS table
---
## Phase 8 — DDNS Updater (CT 108)
Enter CT 108:
```bash
pct enter 108
nano /usr/local/bin/ddns-update.sh
```
Add an entry for `pdf.spendlik.sk` using the record ID obtained in Phase 7, following the existing pattern in the script.
---
## Phase 9 — Authelia Protection (CT 102)
Enter CT 102:
```bash
pct enter 102
nano /etc/authelia/configuration.yml
```
In the `access_control.rules` section, add a bypass rule for the Stirling PDF API (needed if wiring to n8n) **before** the catch-all 2FA rule:
```yaml
# Indent to match the existing entries under access_control.rules
- domain: pdf.spendlik.sk
  resources:
    - "^/api/.*"
  policy: bypass
- domain: pdf.spendlik.sk
  policy: two_factor
```
> ⚠️ Rule order is first-match-wins. The API bypass must precede the catch-all `two_factor` rule or n8n calls will be blocked by 2FA.
Restart Authelia:
```bash
docker compose restart
```
Add Authelia middleware to the nginx vhost in CT 101 (refer to how other Authelia-protected services are configured — e.g., `automation.spendlik.sk`).
---
## Phase 10 — Verify
Test from mobile data rather than the LAN, so the request takes the real external path instead of depending on hairpin NAT:
- `https://pdf.spendlik.sk` loads and Authelia prompts for 2FA
- After login, Stirling PDF home page is accessible with all tool categories visible
- Upload a test PDF and run merge or compress — file should process and download
---
## API Integration with n8n
Stirling PDF exposes a full REST API. Swagger UI is available at:
```
https://pdf.spendlik.sk/swagger-ui/index.html
```
Example n8n HTTP Request node — compress a PDF:
```
POST https://pdf.spendlik.sk/api/v1/general/compress-pdf
Content-Type: multipart/form-data
fileInput: <binary PDF>
optimizeLevel: 3
```
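For testing outside n8n, the same request can be sent with `curl`. A minimal sketch, assuming the endpoint path and the `fileInput` / `optimizeLevel` field names shown above are valid for the deployed release (confirm in Swagger); the `compress_pdf` function and the `STIRLING_URL` environment variable are hypothetical conveniences:

```bash
#!/usr/bin/env bash
# compress_pdf INPUT OUTPUT [LEVEL]
# Posts INPUT to the compress endpoint and writes the response to OUTPUT.
# Hypothetical wrapper; STIRLING_URL is an illustrative override, not a Stirling PDF setting.
compress_pdf() {
  local in="$1" out="$2" level="${3:-3}"
  local base="${STIRLING_URL:-https://pdf.spendlik.sk}"
  if [ ! -r "$in" ]; then
    echo "cannot read input file: $in" >&2
    return 1
  fi
  # -F sends multipart/form-data; the "@" prefix uploads the file contents.
  # --fail makes curl exit non-zero on HTTP errors (status 400 and above).
  curl --fail --silent --show-error \
    -F "fileInput=@${in}" \
    -F "optimizeLevel=${level}" \
    -o "$out" \
    "${base}/api/v1/general/compress-pdf"
}
```

Example: `compress_pdf scan.pdf scan-small.pdf 3`. Setting `STIRLING_URL=http://192.168.1.113:8080` targets the container directly and skips the reverse proxy.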
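> For API calls from n8n, the Authelia bypass rule on `/api/*` (Phase 9) lets requests through without credentials. As written, the rule matches any client, not only LAN hosts; add a `networks` condition to the rule if the API should not be reachable from the internet.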
Common operations available via API:
| Operation | Endpoint |
|---|---|
| Merge PDFs | `POST /api/v1/general/merge-pdfs` |
| Split PDF | `POST /api/v1/general/split-pdf` |
| Compress PDF | `POST /api/v1/general/compress-pdf` |
| PDF to image | `POST /api/v1/convert/pdf/img` |
| Add OCR layer | `POST /api/v1/misc/add-ocr-pdf` |
| Rotate pages | `POST /api/v1/general/rotate-pdf` |
| Remove blank pages | `POST /api/v1/misc/remove-blanks` |
Endpoint paths and parameters can differ between releases; the Swagger UI on your instance is the full reference after deployment.
---
## Paperless-ngx Integration Ideas
Stirling PDF sits naturally upstream of Paperless-ngx for preprocessing:
- **Compress oversized scans** before consume — reduces NAS storage for Modelář magazines
- **Rotate misaligned pages** from batch scans
- **Strip metadata** from sensitive documents before ingestion
- **Add an OCR layer** to scanned PDFs that Paperless struggles with. Paperless has its own OCR, so use Stirling for this only when needed.
A simple n8n workflow pattern:
```
New file in NAS consume folder (webhook or poll)
→ Stirling PDF: compress + rotate
→ Save back to consume folder
→ Paperless picks it up automatically
```
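The "compress oversized scans" step needs a way to decide which files are worth sending through Stirling PDF at all. A minimal sketch of that filter, assuming a flat consume folder; the function name `list_oversized` and the 20 MiB default threshold are illustrative, not values from this homelab:

```bash
#!/usr/bin/env bash
# list_oversized DIR [LIMIT_MIB]
# Prints PDFs directly inside DIR that are larger than LIMIT_MIB mebibytes.
# Hypothetical helper; the default threshold is an arbitrary example.
list_oversized() {
  local dir="$1" limit_mib="${2:-20}"
  local limit_bytes=$((limit_mib * 1024 * 1024))
  # "-size +Nc" matches files strictly larger than N bytes, avoiding the
  # round-up behaviour of find's M unit.
  find "$dir" -maxdepth 1 -type f -iname '*.pdf' -size +"${limit_bytes}c" | sort
}
```

Each path it prints is a candidate for the compress call; smaller files can go straight to Paperless untouched, which keeps load on the 1 GB container low.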
---
## Resource Notes
- Idle RAM: ~512 MB
- OCR operations: can spike to ~800 MB temporarily
- LibreOffice conversions (PDF → DOCX etc.): heaviest operation, may need RAM bump to 2 GB if used frequently
- 2 CPU cores sufficient for single-user use
---
## Gotchas
| Issue | Fix |
|---|---|
| Large PDF uploads rejected | `client_max_body_size 100M` in nginx config |
| certbot corrupts nginx config | Always inspect after issuance |
| n8n API calls blocked by Authelia | Add `/api/*` bypass rule before catch-all |
| Container won't start after Proxmox reboot | Check `pct config 113` for `onboot: 1`; if missing, run `pct set 113 --onboot 1` |