Add Paperless-ngx deployment guide (CT 111)
111_paperless_deployment.md
# Paperless-ngx Deployment — CT 111

## Overview

Self-hosted document management system with multi-language OCR. Deployed on CT 111 via Docker Compose and accessible at `paperless.spendlik.sk`. Documents, media, and Paperless application data are stored on the NAS; the PostgreSQL and Redis named volumes remain on the container disk.

| Property | Value |
|---|---|
| **Container** | CT 111 |
| **Hostname** | paperless |
| **IP** | 192.168.1.111 |
| **OS** | Debian 13 (privileged LXC, `nesting=1`) |
| **URL** | https://paperless.spendlik.sk |
| **Internal port** | 8000 |
| **Compose file** | `/opt/paperless/docker-compose.yml` |
| **NAS mount (host)** | `/mnt/pve/spendlik-nas/data/paperless` |
| **NAS mount (CT)** | `/mnt/paperless` |

---

## LXC Configuration

```
# /etc/pve/lxc/111.conf
arch: amd64
cores: 2
features: nesting=1
hostname: paperless
memory: 8192
mp1: /mnt/pve/spendlik-nas/data/paperless,mp=/mnt/paperless,shared=1
net0: name=eth0,bridge=vmbr0,gw=192.168.1.1,hwaddr=BC:24:11:A8:11:71,ip=192.168.1.111/24,type=veth
ostype: debian
rootfs: local-lvm:vm-111-disk-0,size=50G
startup: order=5,up=30
swap: 1024
```

> ⚠️ RAM is set to 8192 MB. It was raised from 4 GB to handle bulk OCR and should be reduced to 2048 MB once bulk imports are complete.
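
Switching between the bulk-import and steady-state memory settings could be wrapped in a small helper on the Proxmox host. The sketch below only prints the `pct set` command for a profile and does not run it. The function name `pct_memory_cmd` and the profile names `bulk` and `normal` are illustrative, and keeping swap at 1024 for the `normal` profile is an assumption.

```bash
#!/usr/bin/env bash
# Print (do not run) the pct command for a memory profile of CT 111.
# Profile names are hypothetical; the values come from this guide.
pct_memory_cmd() {
  local profile="$1"
  case "$profile" in
    bulk)   echo "pct set 111 --memory 8192 --swap 1024" ;;  # during bulk OCR
    normal) echo "pct set 111 --memory 2048 --swap 1024" ;;  # after bulk imports
    *)      echo "unknown profile: $profile" >&2; return 1 ;;
  esac
}
```

Review the output of `pct_memory_cmd normal` before running the printed command on the Proxmox host.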

---

## NAS Directory Structure

The entire `/mnt/pve/spendlik-nas/data/paperless` directory on the Proxmox host is bind-mounted into CT 111 at `/mnt/paperless`. Subdirectories:

| Path (inside CT) | Purpose |
|---|---|
| `/mnt/paperless/consume` | Drop files here for automatic ingestion |
| `/mnt/paperless/export` | Export destination |
| `/mnt/paperless/media` | Processed documents (originals, archive, thumbnails) |
| `/mnt/paperless/data` | Paperless application data (search index, classifier, etc.) |
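
Before starting the stack, it can help to confirm that the bind mount is present and that all four subdirectories exist. The following is a minimal sketch to run inside CT 111; the function name `missing_paperless_dirs` is hypothetical.

```bash
#!/usr/bin/env bash
# List the expected Paperless subdirectories that are missing under a root.
# Prints one missing path per line and nothing if all four exist.
missing_paperless_dirs() {
  local root="${1:-/mnt/paperless}" name
  for name in consume export media data; do
    [ -d "$root/$name" ] || echo "$root/$name"
  done
}
```

Empty output from `missing_paperless_dirs` means the layout matches the table above. Any printed path points to a missing directory or an unmounted NAS share.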

---

## Docker Compose

Located at `/opt/paperless/docker-compose.yml`:

```yaml
services:
  broker:
    image: redis:7
    restart: unless-stopped
    volumes:
      - redis_data:/data

  db:
    image: postgres:16
    restart: unless-stopped
    volumes:
      - pg_data:/var/lib/postgresql/data
    environment:
      POSTGRES_DB: paperless
      POSTGRES_USER: paperless
      POSTGRES_PASSWORD: <see Vaultwarden>

  webserver:
    image: ghcr.io/paperless-ngx/paperless-ngx:latest
    restart: unless-stopped
    user: root
    depends_on:
      - db
      - broker
    ports:
      - "8000:8000"
    volumes:
      - /mnt/paperless/data:/usr/src/paperless/data
      - /mnt/paperless/media:/usr/src/paperless/media
      - /mnt/paperless/export:/usr/src/paperless/export
      - /mnt/paperless/consume:/usr/src/paperless/consume
    environment:
      PAPERLESS_REDIS: redis://broker:6379
      PAPERLESS_DBHOST: db
      PAPERLESS_DBNAME: paperless
      PAPERLESS_DBUSER: paperless
      PAPERLESS_DBPASS: <see Vaultwarden>
      PAPERLESS_URL: https://paperless.spendlik.sk
      PAPERLESS_SECRET_KEY: <see Vaultwarden>
      PAPERLESS_TIME_ZONE: Europe/Bratislava
      PAPERLESS_OCR_LANGUAGE: slk+ces+rus+hun+deu+eng
      PAPERLESS_OCR_LANGUAGES: slk ces rus hun deu eng

volumes:
  redis_data:
  pg_data:
```

> ℹ️ `media` and `data` were originally Docker named volumes. They were migrated to NAS bind mounts after the container disk filled up during bulk OCR. See migration notes below.

---

## Docker Container Names

| Name | Image | Purpose |
|---|---|---|
| `paperless-webserver-1` | `ghcr.io/paperless-ngx/paperless-ngx:latest` | Main app + Celery worker + consumer |
| `paperless-db-1` | `postgres:16` | Database |
| `paperless-broker-1` | `redis:7` | Task queue |

---

## nginx Reverse Proxy (CT 101)

Config at `/etc/nginx/sites-available/paperless.spendlik.sk`:

```nginx
server {
    server_name paperless.spendlik.sk;

    location / {
        proxy_pass http://192.168.1.111:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }

    listen 443 ssl; # managed by Certbot
    ssl_certificate /etc/letsencrypt/live/paperless.spendlik.sk/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/paperless.spendlik.sk/privkey.pem;
    include /etc/letsencrypt/options-ssl-nginx.conf;
    ssl_dhparam /etc/letsencrypt/ssl-dhparams.pem;
}

server {
    if ($host = paperless.spendlik.sk) {
        return 301 https://$host$request_uri;
    }
    listen 80;
    server_name paperless.spendlik.sk;
    return 404;
}
```

---

## Consuming Documents

### Automatic (inotify watcher)
Drop files into `/mnt/paperless/consume`. The consumer detects new files automatically via inotify and queues them. The consumer runs inside the `paperless-webserver-1` container.

### Manual trigger (for pre-existing files)
The inotify watcher only detects **new** file additions, not files already present when the container starts. To process existing files:

```bash
cd /opt/paperless
docker compose exec webserver python3 manage.py document_consumer --oneshot
```

> ⚠️ The flag is `--oneshot` (one word), not `--one-shot`.
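
To decide whether a manual trigger is needed at all, one option is to count the files still waiting in the consume directory. This is a minimal sketch to run inside CT 111; the function name `pending_consume_files` is hypothetical.

```bash
#!/usr/bin/env bash
# Count regular files waiting in the consume directory (recursively).
pending_consume_files() {
  local dir="${1:-/mnt/paperless/consume}"
  find "$dir" -type f | wc -l | tr -d ' '
}
```

A non-zero count that does not shrink over time suggests that files were present before the watcher started and need the `--oneshot` run.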

### If the consumer process is not running
Check with:
```bash
docker compose exec webserver ps aux | grep consumer
```

If it is missing, restart the webserver container:
```bash
docker compose restart webserver
```

Then watch the logs to confirm that the consumer starts:
```bash
docker compose logs webserver --tail=50 -f
```

Look for: `Using inotify to watch directory for changes: /usr/src/paperless/consume`
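
The log check can also be scripted instead of read by eye. The sketch below reads log text on standard input and succeeds only if the start-up line quoted above is present; the function name `consumer_started` is hypothetical.

```bash
#!/usr/bin/env bash
# Read log text on stdin; succeed if the inotify start-up line is present.
consumer_started() {
  grep -qF "Using inotify to watch directory for changes: /usr/src/paperless/consume"
}
```

From `/opt/paperless`, a possible invocation is `docker compose logs --tail=200 webserver | consumer_started && echo "consumer running"`. The check fails if the start-up line has already scrolled out of the chosen tail window.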

---

## Supported File Types

Paperless-ngx supports PDF and common image formats (JPG, PNG, etc.). `.djvu` files are **not supported** and will be skipped with a warning.
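
Unsupported files can be spotted before they reach the consumer. The following sketch lists `.djvu` files under the consume directory so they can be moved aside or converted first; the function name `list_djvu_files` is hypothetical, and it assumes a `find` that supports `-iname` (true for the GNU version shipped with Debian).

```bash
#!/usr/bin/env bash
# List .djvu files (any letter case) under the consume directory.
list_djvu_files() {
  local dir="${1:-/mnt/paperless/consume}"
  find "$dir" -type f -iname '*.djvu'
}
```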

---

## OCR Notes

- 6 languages configured: Slovak, Czech, Russian, Hungarian, German, English
- Tesseract warnings about "lots of diacritics" and "too few characters" are normal for old scanned magazines and are not errors
- OCR is CPU- and memory-intensive; bulk imports need adequate RAM (8 GB during bulk imports, reducible to 2 GB afterwards)
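
In the Compose file, the same six language codes appear twice: joined with `+` in `PAPERLESS_OCR_LANGUAGE` and separated by spaces in `PAPERLESS_OCR_LANGUAGES`. Deriving one form from the other is a simple way to keep them from drifting apart. This is a sketch only, and the function name `ocr_language_packs` is hypothetical.

```bash
#!/usr/bin/env bash
# Convert a '+'-separated OCR language string to the space-separated form.
ocr_language_packs() {
  echo "$1" | tr '+' ' '
}
```

For example, `ocr_language_packs "slk+ces+rus+hun+deu+eng"` yields the value used for `PAPERLESS_OCR_LANGUAGES` in this deployment.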

---

## Troubleshooting

| Symptom | Cause | Fix |
|---|---|---|
| Files in consume folder not processing | Consumer process died (OOM kill) | `docker compose restart webserver` |
| HTTP 500 on web UI | Container disk full | Check disk: `df -h`; migrate volumes to NAS or resize disk |
| `chown: Invalid argument` in logs | NAS mount doesn't allow ownership changes | Harmless; files still process correctly |
| OOM kill of worker | Insufficient RAM during bulk OCR | `pct set 111 --memory 8192 --swap 1024` on Proxmox host |
| Tasks show as failed in UI | OOM kill mid-processing | Re-trigger with `--oneshot`; failed tasks can be cleared from UI |
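
The disk-full case is the easiest one to catch early. The sketch below wraps `df` so the check can run from cron or a monitoring script inside CT 111. The function names and the default threshold of 90% are arbitrary choices for illustration, not values from this deployment.

```bash
#!/usr/bin/env bash
# Print the used-space percentage (digits only) of the filesystem holding a path.
disk_usage_pct() {
  df -P "${1:-/}" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Succeed if usage is at or above a threshold (default 90, an arbitrary choice).
disk_nearly_full() {
  local pct
  pct="$(disk_usage_pct "${1:-/}")"
  [ "$pct" -ge "${2:-90}" ]
}
```

A possible use is `disk_nearly_full / 90 && echo "root disk almost full"`, which gives a warning before the web UI starts returning HTTP 500.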

---

## Deployment History & Key Events

1. **Initial deploy:** CT 111 created, Docker + Paperless stack deployed, NAS consume/export directories mounted via Proxmox bind mount
2. **Disk fill:** Container root disk (20GB) filled during bulk OCR of 500+ magazines; resized to 50GB (`pct resize 111 rootfs +30G`)
3. **OOM kills:** 2GB RAM was insufficient for 6-language bulk OCR; raised to 4GB, then 8GB
4. **NAS migration:** `media` and `data` Docker named volumes migrated to NAS bind mounts (`/mnt/paperless/media` and `/mnt/paperless/data`) to avoid future disk issues. The migration was done with `docker cp`, so no documents had to be reprocessed.
5. **Bulk import:** 500+ scanned Czech/Slovak modelling magazines (Letecký Modelář, 1950s–1960s) imported

---

## Planned: Gemini Post-Processing

Future project to run nightly Gemini API post-processing on documents to improve OCR text, suggest tags, and improve titles. See `obsidian-vault/02 Projects/Gemini Post-Processing for Paperless.md`.