Add Paperless-ngx deployment guide (CT 111)

Spendlik 2026-06-18 04:21:52 +00:00
parent 843193352e
commit 11abd62309

111_paperless_deployment.md Normal file

@@ -0,0 +1,230 @@
# Paperless-ngx Deployment — CT 111
## Overview
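Self-hosted document management system with multi-language OCR. Deployed on CT 111 via Docker Compose, accessible at `paperless.spendlik.sk`. Documents, media, and Paperless application data are stored on the NAS; the PostgreSQL and Redis volumes remain Docker named volumes on the container's local disk.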
| Property | Value |
|---|---|
| **Container** | CT 111 |
| **Hostname** | paperless |
| **IP** | 192.168.1.111 |
| **OS** | Debian 13 (privileged LXC, `nesting=1`) |
| **URL** | https://paperless.spendlik.sk |
| **Internal port** | 8000 |
| **Compose file** | `/opt/paperless/docker-compose.yml` |
| **NAS mount (host)** | `/mnt/pve/spendlik-nas/data/paperless` |
| **NAS mount (CT)** | `/mnt/paperless` |
---
## LXC Configuration
```
# /etc/pve/lxc/111.conf
arch: amd64
cores: 2
features: nesting=1
hostname: paperless
memory: 8192
mp1: /mnt/pve/spendlik-nas/data/paperless,mp=/mnt/paperless,shared=1
net0: name=eth0,bridge=vmbr0,gw=192.168.1.1,hwaddr=BC:24:11:A8:11:71,ip=192.168.1.111/24,type=veth
ostype: debian
rootfs: local-lvm:vm-111-disk-0,size=50G
startup: order=5,up=30
swap: 1024
```
> ⚠️ RAM is set to 8192 MB, raised from 4 GB to handle bulk OCR. Reduce it to 2048 MB once bulk imports are complete.
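
The reduction can be applied from the Proxmox host with the same `pct set` call listed under Troubleshooting. A minimal sketch, assuming the swap size stays at 1024 MB; `pct_memory_cmd` is a hypothetical helper (not part of Proxmox) that only prints the command so it can be reviewed before it is run:

```bash
# Hypothetical helper: builds the pct command as text instead of executing it.
# Arguments: container ID, memory in MB, swap in MB.
pct_memory_cmd() {
    printf 'pct set %s --memory %s --swap %s\n' "$1" "$2" "$3"
}

# Print the command for review. To apply it on the Proxmox host, pipe it to sh:
#   pct_memory_cmd 111 2048 1024 | sh
pct_memory_cmd 111 2048 1024
```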
---
## NAS Directory Structure
The entire `/mnt/pve/spendlik-nas/data/paperless` directory on the Proxmox host is bind-mounted into CT 111 at `/mnt/paperless`. It contains these subdirectories:
| Path (inside CT) | Purpose |
|---|---|
| `/mnt/paperless/consume` | Drop files here for automatic ingestion |
| `/mnt/paperless/export` | Export destination |
| `/mnt/paperless/media` | Processed documents (originals, archive, thumbnails) |
| `/mnt/paperless/data` | Paperless application data (search index, classifier, etc.) |
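
If the NAS mount is missing or read-only, the stack starts but cannot store anything, so it may be worth checking the layout inside CT 111 before bringing the stack up. A minimal sketch; `check_paperless_dirs` is a hypothetical helper, and the subdirectory names are taken from the table above:

```bash
# Hypothetical helper: reports every expected subdirectory that is missing
# or not writable under the given base path. Returns non-zero on any problem.
check_paperless_dirs() {
    base="$1"
    status=0
    for sub in consume export media data; do
        if [ ! -d "$base/$sub" ]; then
            echo "missing: $base/$sub"
            status=1
        elif [ ! -w "$base/$sub" ]; then
            echo "not writable: $base/$sub"
            status=1
        fi
    done
    return "$status"
}

# Usage inside CT 111:
#   check_paperless_dirs /mnt/paperless && echo "NAS layout OK"
```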
---
## Docker Compose
Located at `/opt/paperless/docker-compose.yml`:
```yaml
services:
  broker:
    image: redis:7
    restart: unless-stopped
    volumes:
      - redis_data:/data
  db:
    image: postgres:16
    restart: unless-stopped
    volumes:
      - pg_data:/var/lib/postgresql/data
    environment:
      POSTGRES_DB: paperless
      POSTGRES_USER: paperless
      POSTGRES_PASSWORD: <see Vaultwarden>
  webserver:
    image: ghcr.io/paperless-ngx/paperless-ngx:latest
    restart: unless-stopped
    user: root
    depends_on:
      - db
      - broker
    ports:
      - "8000:8000"
    volumes:
      - /mnt/paperless/data:/usr/src/paperless/data
      - /mnt/paperless/media:/usr/src/paperless/media
      - /mnt/paperless/export:/usr/src/paperless/export
      - /mnt/paperless/consume:/usr/src/paperless/consume
    environment:
      PAPERLESS_REDIS: redis://broker:6379
      PAPERLESS_DBHOST: db
      PAPERLESS_DBNAME: paperless
      PAPERLESS_DBUSER: paperless
      PAPERLESS_DBPASS: <see Vaultwarden>
      PAPERLESS_URL: https://paperless.spendlik.sk
      PAPERLESS_SECRET_KEY: <see Vaultwarden>
      PAPERLESS_TIME_ZONE: Europe/Bratislava
      PAPERLESS_OCR_LANGUAGE: slk+ces+rus+hun+deu+eng
      PAPERLESS_OCR_LANGUAGES: slk ces rus hun deu eng
volumes:
  redis_data:
  pg_data:
```
> `media` and `data` were originally Docker named volumes. They were migrated to NAS bind mounts after the container disk filled up during bulk OCR. See migration notes below.
---
## Docker Container Names
| Name | Image | Purpose |
|---|---|---|
| `paperless-webserver-1` | `ghcr.io/paperless-ngx/paperless-ngx:latest` | Main app + Celery worker + consumer |
| `paperless-db-1` | `postgres:16` | Database |
| `paperless-broker-1` | `redis:7` | Task queue |
---
## nginx Reverse Proxy (CT 101)
Config at `/etc/nginx/sites-available/paperless.spendlik.sk`:
```nginx
server {
    server_name paperless.spendlik.sk;

    location / {
        proxy_pass http://192.168.1.111:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }

    listen 443 ssl; # managed by Certbot
    ssl_certificate /etc/letsencrypt/live/paperless.spendlik.sk/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/paperless.spendlik.sk/privkey.pem;
    include /etc/letsencrypt/options-ssl-nginx.conf;
    ssl_dhparam /etc/letsencrypt/ssl-dhparams.pem;
}

server {
    if ($host = paperless.spendlik.sk) {
        return 301 https://$host$request_uri;
    }

    listen 80;
    server_name paperless.spendlik.sk;
    return 404;
}
```
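
To tell a proxy problem from an application problem, the HTTP status can be compared between the public URL and the internal port. A minimal sketch, assuming `curl` and `awk` are available on the machine running the check; `http_status` is a hypothetical helper that reads response headers on stdin:

```bash
# Hypothetical helper: prints the status code from the first line of
# HTTP response headers read on stdin (works for HTTP/1.1 and HTTP/2).
http_status() {
    awk 'NR == 1 { print $2 }'
}

# Usage from a host on the LAN:
#   curl -sI https://paperless.spendlik.sk | http_status   # through nginx (CT 101)
#   curl -sI http://192.168.1.111:8000 | http_status       # directly against CT 111
```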
---
## Consuming Documents
### Automatic (inotify watcher)
Drop files into `/mnt/paperless/consume` — the consumer detects new files automatically via inotify and queues them. The consumer runs inside the `paperless-webserver-1` container.
### Manual trigger (for pre-existing files)
The inotify watcher only detects **new** file additions, not files already present when the container starts. To process existing files:
```bash
cd /opt/paperless
docker compose exec webserver python3 manage.py document_consumer --oneshot
```
> ⚠️ Flag is `--oneshot` (one word), not `--one-shot`.
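
Before a one-shot run it can be useful to know how many files are waiting. A minimal sketch, assuming recursive consumption is not enabled, so only top-level files are counted; `count_pending` is a hypothetical helper and relies on `find -maxdepth`, which GNU find on Debian supports:

```bash
# Hypothetical helper: counts regular files directly inside a directory.
# File names containing newlines would be counted more than once.
count_pending() {
    find "$1" -maxdepth 1 -type f | wc -l | tr -d ' '
}

# Usage inside CT 111:
#   count_pending /mnt/paperless/consume
```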
### If consumer process is not running
Check with:
```bash
docker compose exec webserver ps aux | grep consumer
```
If missing, restart the webserver container:
```bash
docker compose restart webserver
```
Then watch logs to confirm consumer starts:
```bash
docker compose logs webserver --tail=50 -f
```
Look for: `Using inotify to watch directory for changes: /usr/src/paperless/consume`
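
The log check can also be scripted instead of read by eye. A minimal sketch; `consumer_started` is a hypothetical helper that searches log text on stdin for the message quoted above:

```bash
# Hypothetical helper: succeeds if the log text on stdin contains the
# inotify startup message of the consumer.
consumer_started() {
    grep -q 'Using inotify to watch directory for changes'
}

# Usage from /opt/paperless:
#   docker compose logs webserver --tail=200 | consumer_started \
#       && echo "consumer is watching" \
#       || echo "consumer message not found in the last 200 lines"
```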
---
## Supported File Types
Paperless-ngx supports PDF and common image formats (JPG, PNG, etc.). `.djvu` files are **not supported** and will be skipped with a warning.
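
Skipped `.djvu` files stay in the consume folder, so listing them shows what still needs converting or removing. A minimal sketch; `list_djvu` is a hypothetical helper and uses `find -iname`, which GNU find supports:

```bash
# Hypothetical helper: lists .djvu files (any letter case) under a directory.
list_djvu() {
    find "$1" -type f -iname '*.djvu'
}

# Usage inside CT 111:
#   list_djvu /mnt/paperless/consume
```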
---
## OCR Notes
- 6 languages configured: Slovak, Czech, Russian, Hungarian, German, English
- Tesseract warnings about "lots of diacritics" and "too few characters" are normal for old scanned magazines — not errors
- OCR is CPU-intensive; bulk imports require adequate RAM (8GB during bulk, can reduce to 2GB after)
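
The two language settings in the Compose file must list the same codes: `PAPERLESS_OCR_LANGUAGES` is space-separated and `PAPERLESS_OCR_LANGUAGE` is joined with `+`. A minimal sketch for deriving one from the other so they cannot drift apart; `ocr_language_from_list` is a hypothetical helper:

```bash
# Hypothetical helper: converts the space-separated language list
# (PAPERLESS_OCR_LANGUAGES) into the "+"-joined form (PAPERLESS_OCR_LANGUAGE).
ocr_language_from_list() {
    printf '%s\n' "$1" | tr ' ' '+'
}

ocr_language_from_list "slk ces rus hun deu eng"
# prints: slk+ces+rus+hun+deu+eng
```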
---
## Troubleshooting
| Symptom | Cause | Fix |
|---|---|---|
| Files in consume folder not processing | Consumer process died (OOM kill) | `docker compose restart webserver` |
| HTTP 500 on web UI | Container disk full | Check disk: `df -h`; migrate volumes to NAS or resize disk |
| `chown: Invalid argument` in logs | NAS mount doesn't allow ownership changes | Harmless — files still process correctly |
| OOM kill of worker | Insufficient RAM during bulk OCR | `pct set 111 --memory 8192 --swap 1024` on Proxmox host |
| Tasks show as failed in UI | OOM kill mid-processing | Re-trigger with `--oneshot`; failed tasks can be cleared from UI |
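
Several rows above trace back to OOM kills, which can be confirmed from the kernel log on the Proxmox host. A minimal sketch, assuming the kernel reports kills with the usual "Killed process" wording; `count_oom_kills` is a hypothetical helper that reads log text on stdin:

```bash
# Hypothetical helper: counts kernel log lines on stdin that report an
# OOM kill (matches "Killed process", case-insensitive).
count_oom_kills() {
    grep -ci 'killed process'
}

# Usage on the Proxmox host:
#   journalctl -k --no-pager | count_oom_kills
#   dmesg | count_oom_kills
```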
---
## Deployment History & Key Events
1. **Initial deploy**: CT 111 created, Docker + Paperless stack deployed, NAS consume/export mounted via Proxmox bind mount
2. **Disk fill**: Container root disk (20GB) filled during bulk OCR of 500+ magazines; resized to 50GB (`pct resize 111 rootfs +30G`)
3. **OOM kills**: 2GB RAM insufficient for 6-language bulk OCR; raised to 4GB, then 8GB
4. **NAS migration**: `media` and `data` Docker named volumes migrated to NAS bind mounts (`/mnt/paperless/media` and `/mnt/paperless/data`) to avoid future disk issues. Migration done via `docker cp` without reprocessing.
5. **Bulk import**: 500+ scanned Czech/Slovak modelling magazines (Letecký Modelář, 1950s to 1960s) imported
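
The exact commands used for the NAS migration in step 4 were not recorded. The following is a reconstruction under stated assumptions, not the procedure that was actually run: it assumes the container name `paperless-webserver-1` and that the copy happens before `docker-compose.yml` is switched to the bind mounts. `run` and `migrate_to_nas` are hypothetical helpers, and `DRY_RUN` defaults to printing the steps only:

```bash
# Hypothetical wrapper: prints the command when DRY_RUN=1 (the default),
# executes it when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

migrate_to_nas() {
    # Stop the app so files do not change mid-copy; docker cp also works
    # on stopped containers.
    run docker compose stop webserver
    for sub in media data; do
        # A source path ending in "/." copies the directory contents
        # into the existing destination directory.
        run docker cp "paperless-webserver-1:/usr/src/paperless/$sub/." "/mnt/paperless/$sub/"
    done
    # Only after docker-compose.yml points at the bind mounts:
    run docker compose up -d webserver
}

# Dry run from /opt/paperless:
#   migrate_to_nas
```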
---
## Planned: Gemini Post-Processing
Future project to run nightly Gemini API post-processing on documents to improve OCR text, suggest tags, and improve titles. See `obsidian-vault/02 Projects/Gemini Post-Processing for Paperless.md`.