
Paperless-ngx Deployment — CT 111

Overview

Self-hosted document management system with multi-language OCR, deployed on CT 111 via Docker Compose and accessible at paperless.spendlik.sk. Documents, media, and application data are stored on the NAS. The PostgreSQL and Redis named volumes remain on the container's root disk.

| Property | Value |
|---|---|
| Container | CT 111 |
| Hostname | paperless |
| IP | 192.168.1.111 |
| OS | Debian 13 (privileged LXC, nesting=1) |
| URL | https://paperless.spendlik.sk |
| Internal port | 8000 |
| Compose file | /opt/paperless/docker-compose.yml |
| NAS mount (host) | /mnt/pve/spendlik-nas/data/paperless |
| NAS mount (CT) | /mnt/paperless |

LXC Configuration

```
# /etc/pve/lxc/111.conf
arch: amd64
cores: 2
features: nesting=1
hostname: paperless
memory: 8192
mp1: /mnt/pve/spendlik-nas/data/paperless,mp=/mnt/paperless,shared=1
net0: name=eth0,bridge=vmbr0,gw=192.168.1.1,hwaddr=BC:24:11:A8:11:71,ip=192.168.1.111/24,type=veth
ostype: debian
rootfs: local-lvm:vm-111-disk-0,size=50G
startup: order=5,up=30
swap: 1024
```

⚠️ RAM is set to 8192 MB. It was raised from 4 GB to handle bulk OCR and should be reduced to 2048 MB once bulk imports are complete.


NAS Directory Structure

The entire /mnt/pve/spendlik-nas/data/paperless directory on the Proxmox host is bind-mounted into CT 111 at /mnt/paperless (mount point mp1). Subdirectories:

| Path (inside CT) | Purpose |
|---|---|
| /mnt/paperless/consume | Drop files here for automatic ingestion |
| /mnt/paperless/export | Export destination |
| /mnt/paperless/media | Processed documents (originals, archive, thumbnails) |
| /mnt/paperless/data | Paperless application data (search index, classifier, etc.) |
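When a log line reports a path inside the webserver container, it helps to know where that file lives on the NAS. Each path passes through two bind mounts. The first is the Docker volume mapping from /usr/src/paperless/<dir> to /mnt/paperless/<dir>. The second is the Proxmox mount point mp1, which maps /mnt/paperless to /mnt/pve/spendlik-nas/data/paperless. Below is a small Python sketch of that translation. `app_to_nas` is a hypothetical helper, and the three roots are copied from the LXC config and compose file in this document.

```python
from pathlib import PurePosixPath

# Roots copied from the LXC config (mp1) and docker-compose.yml in this document.
APP_ROOT = PurePosixPath("/usr/src/paperless")  # inside paperless-webserver-1
CT_ROOT = PurePosixPath("/mnt/paperless")  # inside CT 111
HOST_ROOT = PurePosixPath("/mnt/pve/spendlik-nas/data/paperless")  # Proxmox host
BIND_DIRS = ("consume", "export", "media", "data")


def app_to_nas(path: str) -> tuple[PurePosixPath, PurePosixPath]:
    """Map a webserver-container path to its (CT 111 path, Proxmox host path)."""
    p = PurePosixPath(path)
    for name in BIND_DIRS:
        app_dir = APP_ROOT / name
        if p == app_dir or app_dir in p.parents:
            rel = p.relative_to(app_dir)
            return CT_ROOT / name / rel, HOST_ROOT / name / rel
    raise ValueError(f"{path} is not under a NAS-backed bind mount")
```

Paths outside the four bind mounts, such as the application code under /usr/src/paperless/src, raise `ValueError` because they live in the image and not on the NAS.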

Docker Compose

Located at /opt/paperless/docker-compose.yml:

```yaml
services:
  broker:
    image: redis:7
    restart: unless-stopped
    volumes:
      - redis_data:/data

  db:
    image: postgres:16
    restart: unless-stopped
    volumes:
      - pg_data:/var/lib/postgresql/data
    environment:
      POSTGRES_DB: paperless
      POSTGRES_USER: paperless
      POSTGRES_PASSWORD: <see Vaultwarden>

  webserver:
    image: ghcr.io/paperless-ngx/paperless-ngx:latest
    restart: unless-stopped
    user: root
    depends_on:
      - db
      - broker
    ports:
      - "8000:8000"
    volumes:
      # NAS-backed bind mounts (CT paths under /mnt/paperless, mount point mp1)
      - /mnt/paperless/data:/usr/src/paperless/data
      - /mnt/paperless/media:/usr/src/paperless/media
      - /mnt/paperless/export:/usr/src/paperless/export
      - /mnt/paperless/consume:/usr/src/paperless/consume
    environment:
      PAPERLESS_REDIS: redis://broker:6379
      PAPERLESS_DBHOST: db
      PAPERLESS_DBNAME: paperless
      PAPERLESS_DBUSER: paperless
      PAPERLESS_DBPASS: <see Vaultwarden>
      PAPERLESS_URL: https://paperless.spendlik.sk
      PAPERLESS_SECRET_KEY: <see Vaultwarden>
      PAPERLESS_TIME_ZONE: Europe/Bratislava
      # Languages passed to Tesseract for OCR, joined with "+"
      PAPERLESS_OCR_LANGUAGE: slk+ces+rus+hun+deu+eng
      # Extra Tesseract language packs installed at container start, space-separated
      PAPERLESS_OCR_LANGUAGES: slk ces rus hun deu eng

# Named volumes; these stay on the CT 111 root disk, not on the NAS
volumes:
  redis_data:
  pg_data:
```

The media and data directories were originally Docker named volumes. They were migrated to NAS bind mounts after the container disk filled up during bulk OCR (see step 4 under Deployment History & Key Events below).
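The two OCR variables in the compose file use different separators. PAPERLESS_OCR_LANGUAGE is the Tesseract language string, with codes joined by `+`. PAPERLESS_OCR_LANGUAGES lists language packs to install, separated by spaces. The Python sketch below catches the two drifting apart when a language is added. `missing_language_packs` is a hypothetical helper. The check assumes this deployment's convention of listing every OCR language in both variables. The image already ships a few languages, so a reported code is not necessarily an error.

```python
def ocr_codes(ocr_language: str) -> list[str]:
    """Split a Tesseract language string such as 'slk+ces+eng'."""
    return [code for code in ocr_language.split("+") if code]


def missing_language_packs(ocr_language: str, ocr_languages: str) -> list[str]:
    """Return codes used for OCR that are absent from the install list."""
    installed = set(ocr_languages.split())  # space-separated install list
    return [code for code in ocr_codes(ocr_language) if code not in installed]
```

With the values above, `missing_language_packs("slk+ces+rus+hun+deu+eng", "slk ces rus hun deu eng")` returns an empty list.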


Docker Container Names

| Name | Image | Purpose |
|---|---|---|
| paperless-webserver-1 | ghcr.io/paperless-ngx/paperless-ngx:latest | Main app + Celery worker + consumer |
| paperless-db-1 | postgres:16 | Database |
| paperless-broker-1 | redis:7 | Task queue |
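These names follow the Docker Compose v2 default pattern `<project>-<service>-<index>`. The project name defaults to the basename of the directory holding the compose file, so /opt/paperless gives `paperless`. You can override it with `-p`, `COMPOSE_PROJECT_NAME`, or a top-level `name:` key. The Python sketch below approximates the default. `compose_container_name` is hypothetical, and its normalization step is a simplification of what Compose actually does.

```python
import re
from pathlib import PurePosixPath


def compose_container_name(project_dir: str, service: str, index: int = 1) -> str:
    """Approximate the Compose v2 default container name for one service replica."""
    # Simplified normalization: lowercase the directory name and drop any
    # characters outside a-z, 0-9, '-' and '_'.
    project = re.sub(r"[^a-z0-9_-]", "", PurePosixPath(project_dir).name.lower())
    return f"{project}-{service}-{index}"
```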

nginx Reverse Proxy (CT 101)

Config at /etc/nginx/sites-available/paperless.spendlik.sk:

```nginx
server {
    server_name paperless.spendlik.sk;

    location / {
        proxy_pass http://192.168.1.111:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }

    listen 443 ssl; # managed by Certbot
    ssl_certificate /etc/letsencrypt/live/paperless.spendlik.sk/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/paperless.spendlik.sk/privkey.pem;
    include /etc/letsencrypt/options-ssl-nginx.conf;
    ssl_dhparam /etc/letsencrypt/ssl-dhparams.pem;
}

server {
    if ($host = paperless.spendlik.sk) {
        return 301 https://$host$request_uri;
    }
    listen 80;
    server_name paperless.spendlik.sk;
    return 404;
}
```

Consuming Documents

Automatic (inotify watcher)

Drop files into /mnt/paperless/consume. The consumer runs inside the paperless-webserver-1 container, detects new files automatically via inotify, and queues them for processing.

Manual trigger (for pre-existing files)

The inotify watcher only detects new file additions, not files already present when the container starts. To process existing files:

```bash
cd /opt/paperless
docker compose exec webserver python3 manage.py document_consumer --oneshot
```

⚠️ Flag is --oneshot (one word), not --one-shot.

If the consumer process is not running

Check with:

```bash
docker compose exec webserver ps aux | grep consumer
```

If missing, restart the webserver container:

```bash
docker compose restart webserver
```

Then watch logs to confirm consumer starts:

```bash
docker compose logs webserver --tail=50 -f
```

Look for: Using inotify to watch directory for changes: /usr/src/paperless/consume
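A quick way to tell whether the consumer is keeping up is to look for files that have sat in the consume folder for a while. Successfully consumed files are normally removed from it. The Python sketch below is meant to run inside CT 111 against /mnt/paperless/consume. `stale_files` and the 10-minute threshold are hypothetical choices. The sketch only looks at the top level of the folder, on the assumption that recursive consumption (PAPERLESS_CONSUMER_RECURSIVE) is not enabled, which matches the compose file above.

```python
import time
from collections.abc import Iterable
from pathlib import Path


def stale_entries(
    entries: Iterable[tuple[str, float]], now: float, older_than_s: float = 600
) -> list[str]:
    """Return names whose mtime is more than older_than_s seconds before now."""
    return sorted(name for name, mtime in entries if now - mtime > older_than_s)


def stale_files(
    consume_dir: str = "/mnt/paperless/consume", older_than_s: float = 600
) -> list[str]:
    """List top-level files in the consume folder that have not been picked up."""
    entries = [
        (p.name, p.stat().st_mtime)
        for p in Path(consume_dir).iterdir()
        if p.is_file()
    ]
    return stale_entries(entries, time.time(), older_than_s)
```

If `stale_files()` returns names while the consumer is running, re-run the `--oneshot` command. If the consumer is missing, restart the webserver first.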


Supported File Types

Paperless-ngx supports PDF and common image formats (JPG, PNG, etc.). DjVu files (.djvu) are not supported and are skipped with a warning.
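Before you copy a large batch into the consume folder, it helps to see which files will be skipped. The following is a minimal Python sketch. `SUPPORTED` is an assumed, partial list of extensions that the default setup consumes. It is not taken from this deployment or from an authoritative source, so check it against the paperless-ngx documentation before relying on it.

```python
from pathlib import PurePath

# Assumption: partial list of extensions the default paperless-ngx setup
# consumes; verify against the official documentation.
SUPPORTED = {".pdf", ".png", ".jpg", ".jpeg", ".tif", ".tiff", ".gif", ".webp", ".txt"}


def split_by_support(names):
    """Partition file names into (likely consumed, likely skipped)."""
    supported, skipped = [], []
    for name in names:
        bucket = supported if PurePath(name).suffix.lower() in SUPPORTED else skipped
        bucket.append(name)
    return supported, skipped
```

For a magazine batch, anything ending in .djvu lands in the skipped list and needs converting to PDF before import.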


OCR Notes

  • 6 languages configured: Slovak, Czech, Russian, Hungarian, German, English
  • Tesseract warnings about "lots of diacritics" and "too few characters" are normal for old scanned magazines — not errors
  • OCR is CPU- and memory-intensive; bulk imports need 8 GB of RAM, which can be reduced to 2 GB once they are finished
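When you scan the webserver logs during a bulk import, the benign Tesseract warnings listed above can drown out real errors. The Python sketch below drops them. The substrings come from the descriptions in this section, and the exact log wording may differ. `BENIGN_SUBSTRINGS` and `actionable` are hypothetical names, so adjust them to what the logs actually show.

```python
# Substrings taken from the OCR notes; assumed, adjust to the real log text.
BENIGN_SUBSTRINGS = ("lots of diacritics", "too few characters")


def actionable(lines):
    """Return log lines that do not match a known-benign OCR warning."""
    return [
        line
        for line in lines
        if not any(s in line.lower() for s in BENIGN_SUBSTRINGS)
    ]
```

For example, you could pipe `docker compose logs webserver --no-color` into a script that calls `sys.stdout.writelines(actionable(sys.stdin))`.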

Troubleshooting

| Symptom | Cause | Fix |
|---|---|---|
| Files in consume folder not processing | Consumer process died (OOM kill) | `docker compose restart webserver` |
| HTTP 500 on web UI | Container disk full | Check disk with `df -h`; migrate volumes to NAS or resize the disk |
| `chown: Invalid argument` in logs | NAS mount doesn't allow ownership changes | Harmless; files still process correctly |
| OOM kill of worker | Insufficient RAM during bulk OCR | `pct set 111 --memory 8192 --swap 1024` on the Proxmox host |
| Tasks show as failed in UI | OOM kill mid-processing | Re-trigger with `--oneshot`; failed tasks can be cleared from the UI |

Deployment History & Key Events

  1. Initial deploy: CT 111 created, Docker + Paperless stack deployed, NAS consume/export mounted via Proxmox bind mount
  2. Disk fill: container root disk (20GB) filled during bulk OCR of 500+ magazines; resized to 50GB (pct resize 111 rootfs +30G)
  3. OOM kills: 2GB RAM insufficient for 6-language bulk OCR; raised to 4GB, then 8GB
  4. NAS migration: media and data Docker named volumes migrated to NAS bind mounts (/mnt/paperless/media and /mnt/paperless/data) to avoid future disk issues. The migration was done with docker cp, so no documents had to be reprocessed.
  5. Bulk import: 500+ scanned Czech/Slovak modelling magazines (Letecký Modelář, 1950s–1960s) imported

Planned: Gemini Post-Processing

Future project to run nightly Gemini API post-processing on documents to improve OCR text, suggest tags, and improve titles. See obsidian-vault/02 Projects/Gemini Post-Processing for Paperless.md.