# Paperless-ngx Deployment: CT 111

## Overview

Self-hosted document management system with multi-language OCR. Deployed on CT 111 via Docker Compose, accessible at `paperless.spendlik.sk`. All documents, media, and data stored on NAS.

| Property | Value |
|---|---|
| **Container** | CT 111 |
| **Hostname** | paperless |
| **IP** | 192.168.1.111 |
| **OS** | Debian 13 (privileged LXC, `nesting=1`) |
| **URL** | https://paperless.spendlik.sk |
| **Internal port** | 8000 |
| **Compose file** | `/opt/paperless/docker-compose.yml` |
| **NAS mount (host)** | `/mnt/pve/spendlik-nas/data/paperless` |
| **NAS mount (CT)** | `/mnt/paperless` |

---

## LXC Configuration

```
# /etc/pve/lxc/111.conf
arch: amd64
cores: 2
features: nesting=1
hostname: paperless
memory: 8192
mp1: /mnt/pve/spendlik-nas/data/paperless,mp=/mnt/paperless,shared=1
net0: name=eth0,bridge=vmbr0,gw=192.168.1.1,hwaddr=BC:24:11:A8:11:71,ip=192.168.1.111/24,type=veth
ostype: debian
rootfs: local-lvm:vm-111-disk-0,size=50G
startup: order=5,up=30
swap: 1024
```

> ⚠️ RAM is set to 8192MB; this was raised from 4GB to handle bulk OCR. Should be reduced to 2048MB once bulk imports are complete.

---

## NAS Directory Structure

The entire `/mnt/pve/spendlik-nas/data/paperless` is bind-mounted into CT 111 at `/mnt/paperless`. Subdirectories:

| Path (inside CT) | Purpose |
|---|---|
| `/mnt/paperless/consume` | Drop files here for automatic ingestion |
| `/mnt/paperless/export` | Export destination |
| `/mnt/paperless/media` | Processed documents (originals, archive, thumbnails) |
| `/mnt/paperless/data` | Paperless application data (search index, classifier, etc.) |

---

## Docker Compose

Located at `/opt/paperless/docker-compose.yml`:

```yaml
services:
  broker:
    image: redis:7
    restart: unless-stopped
    volumes:
      - redis_data:/data

  db:
    image: postgres:16
    restart: unless-stopped
    volumes:
      - pg_data:/var/lib/postgresql/data
    environment:
      POSTGRES_DB: paperless
      POSTGRES_USER: paperless
      POSTGRES_PASSWORD:

  webserver:
    image: ghcr.io/paperless-ngx/paperless-ngx:latest
    restart: unless-stopped
    user: root
    depends_on:
      - db
      - broker
    ports:
      - "8000:8000"
    volumes:
      - /mnt/paperless/data:/usr/src/paperless/data
      - /mnt/paperless/media:/usr/src/paperless/media
      - /mnt/paperless/export:/usr/src/paperless/export
      - /mnt/paperless/consume:/usr/src/paperless/consume
    environment:
      PAPERLESS_REDIS: redis://broker:6379
      PAPERLESS_DBHOST: db
      PAPERLESS_DBNAME: paperless
      PAPERLESS_DBUSER: paperless
      PAPERLESS_DBPASS:
      PAPERLESS_URL: https://paperless.spendlik.sk
      PAPERLESS_SECRET_KEY:
      PAPERLESS_TIME_ZONE: Europe/Bratislava
      PAPERLESS_OCR_LANGUAGE: slk+ces+rus+hun+deu+eng
      PAPERLESS_OCR_LANGUAGES: slk ces rus hun deu eng

volumes:
  redis_data:
  pg_data:
```

> ℹ️ `media` and `data` were originally Docker named volumes. They were migrated to NAS bind mounts after the container disk filled up during bulk OCR. See migration notes below.

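
Because all four webserver volumes are bind mounts under `/mnt/paperless`, it may be worth confirming the subdirectories are visible inside CT 111 before starting the stack. The following is a minimal sketch, not part of Paperless-ngx: the function name `check_paperless_dirs` is hypothetical, and it only checks that the directories exist, not that the NAS itself is actually mounted.

```bash
#!/usr/bin/env bash
# Sketch: verify the four bind-mount source directories exist.
# check_paperless_dirs is a hypothetical helper name; the default base
# path is the CT-side NAS mount documented above.

check_paperless_dirs() {
    local base="${1:-/mnt/paperless}"
    local missing=0 sub
    for sub in consume export media data; do
        if [ ! -d "$base/$sub" ]; then
            # Report each missing directory on stderr
            echo "missing: $base/$sub" >&2
            missing=1
        fi
    done
    # Returns 0 when all four exist, 1 otherwise
    return "$missing"
}
```

One possible use, run from `/opt/paperless`, is `check_paperless_dirs && docker compose up -d`, so the stack only starts when every directory is present.
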

---

## Docker Container Names

| Name | Image | Purpose |
|---|---|---|
| `paperless-webserver-1` | `ghcr.io/paperless-ngx/paperless-ngx:latest` | Main app + Celery worker + consumer |
| `paperless-db-1` | `postgres:16` | Database |
| `paperless-broker-1` | `redis:7` | Task queue |

---

## nginx Reverse Proxy (CT 101)

Config at `/etc/nginx/sites-available/paperless.spendlik.sk`:

```nginx
server {
    server_name paperless.spendlik.sk;

    location / {
        proxy_pass http://192.168.1.111:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }

    listen 443 ssl; # managed by Certbot
    ssl_certificate /etc/letsencrypt/live/paperless.spendlik.sk/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/paperless.spendlik.sk/privkey.pem;
    include /etc/letsencrypt/options-ssl-nginx.conf;
    ssl_dhparam /etc/letsencrypt/ssl-dhparams.pem;
}

server {
    if ($host = paperless.spendlik.sk) {
        return 301 https://$host$request_uri;
    }

    listen 80;
    server_name paperless.spendlik.sk;
    return 404;
}
```

---

## Consuming Documents

### Automatic (inotify watcher)

Drop files into `/mnt/paperless/consume`. The consumer detects new files automatically via inotify and queues them. The consumer runs inside the `paperless-webserver-1` container.

### Manual trigger (for pre-existing files)

The inotify watcher only detects **new** file additions, not files already present when the container starts. To process existing files:

```bash
cd /opt/paperless
docker compose exec webserver python3 manage.py document_consumer --oneshot
```

> ⚠️ Flag is `--oneshot` (one word), not `--one-shot`.

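
Before running the manual trigger, it can help to confirm that files are actually waiting in the consume folder. This is a minimal sketch, assuming it runs inside CT 111; `count_pending` is a hypothetical helper name, and it only looks at the top level of the folder on the assumption that recursive consumption is not enabled (no such setting appears in the compose file).

```bash
#!/usr/bin/env bash
# Sketch: count regular files waiting at the top level of the consume folder.
# count_pending is a hypothetical helper; the default path is the CT-side
# consume directory from this document.

count_pending() {
    local dir="${1:-/mnt/paperless/consume}"
    # One output line per regular file; wc -l counts them.
    # File names containing newlines would be over-counted.
    find "$dir" -maxdepth 1 -type f | wc -l | tr -d '[:space:]'
}
```

For example, `[ "$(count_pending)" -gt 0 ] && docker compose exec webserver python3 manage.py document_consumer --oneshot` would trigger the one-shot consumer only when something is waiting.
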

### If consumer process is not running

Check with:

```bash
docker compose exec webserver ps aux | grep consumer
```

If missing, restart the webserver container:

```bash
docker compose restart webserver
```

Then watch logs to confirm consumer starts:

```bash
docker compose logs webserver --tail=50 -f
```

Look for: `Using inotify to watch directory for changes: /usr/src/paperless/consume`

---

## Supported File Types

Paperless-ngx supports PDF and common image formats (JPG, PNG, etc.). `.djvu` files are **not supported** and will be skipped with a warning.

---

## OCR Notes

- 6 languages configured: Slovak, Czech, Russian, Hungarian, German, English
- Tesseract warnings about "lots of diacritics" and "too few characters" are normal for old scanned magazines; they are not errors
- OCR is CPU-intensive; bulk imports require adequate RAM (8GB during bulk, can reduce to 2GB after)

---

## Troubleshooting

| Symptom | Cause | Fix |
|---|---|---|
| Files in consume folder not processing | Consumer process died (OOM kill) | `docker compose restart webserver` |
| HTTP 500 on web UI | Container disk full | Check disk: `df -h`; migrate volumes to NAS or resize disk |
| `chown: Invalid argument` in logs | NAS mount doesn't allow ownership changes | Harmless; files still process correctly |
| OOM kill of worker | Insufficient RAM during bulk OCR | `pct set 111 --memory 8192 --swap 1024` on Proxmox host |
| Tasks show as failed in UI | OOM kill mid-processing | Re-trigger with `--oneshot`; failed tasks can be cleared from UI |

---

## Deployment History & Key Events

1. **Initial deploy**: CT 111 created, Docker + Paperless stack deployed, NAS consume/export mounted via Proxmox bind mount
2. **Disk fill**: Container root disk (20GB) filled during bulk OCR of 500+ magazines; resized to 50GB (`pct resize 111 rootfs +30G`)
3. **OOM kills**: 2GB RAM insufficient for 6-language bulk OCR; raised to 4GB then 8GB
4. **NAS migration**: `media` and `data` Docker named volumes migrated to NAS bind mounts (`/mnt/paperless/media` and `/mnt/paperless/data`) to avoid future disk issues. Migration done via `docker cp` without reprocessing.
5. **Bulk import**: 500+ scanned Czech/Slovak modelling magazines (Letecký Modelář, 1950s–1960s) imported

---

## Planned: Gemini Post-Processing

Future project to run nightly Gemini API post-processing on documents to improve OCR text, suggest tags, and improve titles. See `obsidian-vault/02 Projects/Gemini Post-Processing for Paperless.md`.