Troubleshooting
Paperless-NGX Connection Issues
"Connection refused" or timeout errors
Symptom: The Test Connection button fails, or sync jobs fail immediately.
Causes and fixes:
- Wrong URL: Verify PAPERLESS_URL is correct. It should include the protocol and port (e.g., http://paperless:8000). Do not include a trailing slash.
- Docker networking: If both Paperless-NGX and Paperless NGX Dedupe run in Docker, localhost inside the Dedupe container refers to the Dedupe container itself, not the host. Use the container name (e.g., http://paperless-ngx:8000) or the Docker network IP. Both containers must be on the same Docker network.
- Firewall: Ensure the Paperless-NGX port is accessible from the Dedupe container. On Linux, iptables or ufw rules may block inter-container traffic.
# Test connectivity from inside the container
docker compose exec app node -e "fetch('http://paperless:8000/api/').then(r => console.log(r.status)).catch(console.error)"
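The URL rules listed above can be checked before the container is started. The following is a minimal sketch: check_paperless_url is a hypothetical helper, not part of Paperless NGX Dedupe, and it only inspects the shape of the value (protocol present, no trailing slash, explicit port), not whether the host is reachable.

```shell
# Hypothetical helper: validate the shape of a PAPERLESS_URL value.
# Returns 0 when the value looks usable, 1 otherwise (reason on stderr).
check_paperless_url() {
  url=$1
  case "$url" in
    http://*|https://*) ;;
    *) echo "error: missing http:// or https:// prefix: $url" >&2; return 1 ;;
  esac
  case "$url" in
    */) echo "error: remove the trailing slash: $url" >&2; return 1 ;;
  esac
  # Host and optional port are everything between the scheme and the first "/".
  hostport=${url#*://}
  hostport=${hostport%%/*}
  case "$hostport" in
    *:*) ;;
    *) echo "warning: no explicit port, the protocol default will be used: $url" >&2 ;;
  esac
  return 0
}

# Usage:
#   check_paperless_url "$PAPERLESS_URL"
```

A missing port is only reported as a warning because an HTTPS instance behind a reverse proxy is commonly reached on the default port.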
Authentication failures (401)
Symptom: Test Connection returns "Unauthorized" or sync fails with a 401 error.
Causes and fixes:
- Invalid token: Regenerate your API token in Paperless-NGX and update PAPERLESS_API_TOKEN.
- Wrong auth method: If using username/password, ensure both PAPERLESS_USERNAME and PAPERLESS_PASSWORD are set. Providing only one will fail.
- Paperless-NGX permissions: The token or user must have read access to all documents. Admin-level tokens work best.
# Verify your token works
curl -H "Authorization: Token your-token-here" http://paperless:8000/api/documents/
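The credential rules above can also be checked against the current environment. This is a sketch under assumptions: check_paperless_auth is a hypothetical helper, and the order in which it looks at the variables (token first, then username/password) is a guess for illustration, not documented behavior of Paperless NGX Dedupe.

```shell
# Hypothetical helper: report which auth method the current environment provides.
# Assumption: a token, when present, is used in preference to username/password.
check_paperless_auth() {
  if [ -n "${PAPERLESS_API_TOKEN:-}" ]; then
    echo "token"
  elif [ -n "${PAPERLESS_USERNAME:-}" ] && [ -n "${PAPERLESS_PASSWORD:-}" ]; then
    echo "username/password"
  elif [ -n "${PAPERLESS_USERNAME:-}" ] || [ -n "${PAPERLESS_PASSWORD:-}" ]; then
    echo "error: set both PAPERLESS_USERNAME and PAPERLESS_PASSWORD" >&2
    return 1
  else
    echo "error: no Paperless-NGX credentials are set" >&2
    return 1
  fi
}
```

Run it in the same shell, or with the same environment file, that is used to start the container.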
SSL/TLS errors
If your Paperless-NGX instance uses HTTPS with a self-signed certificate, the Node.js runtime may reject the connection. This is a security measure. If you must bypass it for local development:
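Node.js reads two standard environment variables that matter here. The fragment below is a sketch that assumes the variables are passed through to the container's environment (for example via .env); the certificate path is a placeholder, not a path the project defines. Use one of the two options, not both.

```shell
# Option 1 (preferred): trust the self-signed certificate, or its CA, explicitly.
# Placeholder path; the PEM file must be readable inside the container.
NODE_EXTRA_CA_CERTS=/path/to/paperless-ca.pem

# Option 2 (local development only): disable certificate verification for
# all outgoing TLS connections made by the Node.js process.
NODE_TLS_REJECT_UNAUTHORIZED=0
```

Option 2 removes protection against man-in-the-middle attacks for every HTTPS request the process makes, so it should not be left enabled outside a development setup.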
Sync Problems
"A job of type 'sync' is already running or pending"
Symptom: POST /api/v1/sync returns a 409 error.
Fix: Wait for the current sync to complete. Only one sync job can run at a time. Check the current job status:
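One way to do that is the sync status endpoint listed under Diagnostic information. A minimal sketch, assuming the app is reachable at http://localhost:3000 as in the other examples on this page; sync_status and DEDUPE_URL are hypothetical names introduced here, and the shape of the JSON response is not described in this section.

```shell
# Hypothetical wrapper around the sync status endpoint.
# DEDUPE_URL is an assumed variable; it defaults to the URL used elsewhere on this page.
sync_status() {
  curl -sS "${DEDUPE_URL:-http://localhost:3000}/api/v1/sync/status"
}

# Usage:
#   sync_status
#   DEDUPE_URL=https://dedupe.example.com sync_status
```

Repeat the call every few seconds until the job is no longer reported as running or pending, then retry POST /api/v1/sync.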
If a sync appears stuck, check the container logs for errors. If needed, cancel the job:
Documents synced but no content
Symptom: Documents appear in the Documents page but have no text content.
Causes:
- OCR not complete: Paperless-NGX may still be processing documents. Wait for Paperless-NGX to finish OCR, then re-sync.
- Documents without text: Some documents (images without OCR, corrupted PDFs) may genuinely have no extractable text. Check the Documents page for processing status.
Slow sync
Causes:
- First sync is always the slowest because it fetches all documents. Subsequent incremental syncs only fetch changes.
- Large library: Syncing thousands of documents takes time. The progress bar shows how many documents have been processed.
- Network latency: If Paperless-NGX is on a remote server, network speed is the bottleneck.
For detailed sync progress, enable debug logging by setting LOG_LEVEL=debug (see Enable debug logging under Getting Help).
Analysis Issues
No duplicates found
Possible causes:
- No actual duplicates: Your library may not contain duplicate documents.
- Threshold too high: The default similarityThreshold of 0.75 requires strong similarity. Try lowering it to 0.5 or 0.6.
- Documents too short: Documents with fewer than minWords (default: 20) words are skipped. Check how many documents were analyzed vs. total.
- Sync incomplete: Ensure documents have text content. Run sync first if you have not already.
# Check analysis results
curl http://localhost:3000/api/v1/analysis/status
# Lower the threshold
curl -X PUT http://localhost:3000/api/v1/config/dedup \
-H 'Content-Type: application/json' \
-d '{ "similarityThreshold": 0.5 }'
Too many false positives
Symptom: The system flags documents as duplicates when they are clearly different.
Fixes:
- Raise similarityThreshold (e.g., 0.85 or 0.90)
- Adjust confidence weights to shift emphasis between Jaccard and fuzzy text matching
- Reduce numBands to narrow the candidate pool
- See How It Works - Tuning Guide for detailed guidance
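To see why fewer bands narrow the candidate pool, it helps to look at the textbook MinHash LSH banding scheme: a signature is split into b bands of r rows each, and two documents with Jaccard similarity s become a candidate pair with probability 1 - (1 - s^r)^b. The sketch below evaluates that formula with awk. It assumes Paperless NGX Dedupe follows the textbook scheme with r = numPermutations / numBands, which this page does not confirm; lsh_candidate_probability is a hypothetical helper.

```shell
# Textbook LSH banding formula: P(candidate) = 1 - (1 - s^r)^b
#   $1 = s, Jaccard similarity of the pair (0..1)
#   $2 = b, number of bands
#   $3 = r, rows per band
lsh_candidate_probability() {
  awk -v s="$1" -v b="$2" -v r="$3" 'BEGIN { printf "%.4f\n", 1 - (1 - s ^ r) ^ b }'
}

# Illustrative comparison for a pair with similarity 0.5 and 256 permutations
# (the numbers are examples, not the project's defaults):
#   lsh_candidate_probability 0.5 32 8     # 32 bands of 8 rows
#   lsh_candidate_probability 0.5 16 16    # 16 bands of 16 rows
```

Under that assumption, the second call yields a much smaller probability than the first: halving the number of bands doubles the rows per band, so borderline pairs are far less likely to share a full band and be scored at all.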
Documents skipped during analysis
Symptom: documentsAnalyzed is much lower than totalDocuments.
Cause: Documents with fewer than minWords words are excluded from analysis. This is intentional -- very short documents produce unreliable MinHash signatures.
If you want to include shorter documents:
curl -X PUT http://localhost:3000/api/v1/config/dedup \
-H 'Content-Type: application/json' \
-d '{ "minWords": 5 }'
"A job of type 'analysis' is already running or pending"
Same as the sync job conflict -- wait for the current analysis to finish or cancel it.
Database Issues
"database is locked"
Symptom: API requests fail with "database is locked" errors.
Causes:
- Multiple processes writing to the same SQLite file simultaneously. Paperless NGX Dedupe handles this internally, but if you have external tools accessing the same database file, they may conflict.
- A crashed worker left a write lock. Restart the container.
Volume permissions
Symptom: Container fails to start with "Permission denied" errors for the database.
Fix: The container drops privileges to PUID/PGID (defaults: 1000:1000). Ensure your bind-mounted data directory is owned by, or at least writable by, that user and group.
If you override PUID/PGID in .env, use those values instead.
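Before changing ownership, it can help to confirm what the directory currently looks like. A minimal sketch, assuming the default ./docker-data bind mount described under Data directory path issues; dir_owner_matches is a hypothetical helper, and the chown shown in the usage comment needs root privileges.

```shell
# Hypothetical helper: succeed when DIR is owned by the given numeric UID:GID.
# Uses "ls -nd" (numeric owner and group) so it does not depend on GNU stat.
dir_owner_matches() {
  dir=$1 uid=${2:-1000} gid=${3:-1000}
  owner=$(ls -nd "$dir" | awk '{ print $3 ":" $4 }')
  if [ "$owner" = "$uid:$gid" ]; then
    return 0
  fi
  echo "$dir is owned by '$owner', expected $uid:$gid" >&2
  return 1
}

# Usage (defaults to 1000:1000; pass your PUID and PGID if you override them):
#   dir_owner_matches ./docker-data || sudo chown -R 1000:1000 ./docker-data
```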
Database corruption
In rare cases (e.g., unclean shutdown during a write), SQLite databases can become corrupted.
Recovery steps:
- Stop the container: docker compose down
- Back up ./docker-data/paperless-ngx-dedupe.db
- Run the SQLite integrity check against the database file.
- If the database is corrupted beyond repair, delete the database file and re-sync.
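The backup and integrity-check steps can be sketched as follows. This assumes the sqlite3 command-line tool is installed on the host and that the container is already stopped; backup_db and check_db are hypothetical helpers, and the -wal and -shm sidecar files are copied only if they exist, since this page does not say which journal mode the app uses.

```shell
# Default database location from the steps above.
DB=${DB:-./docker-data/paperless-ngx-dedupe.db}

# Copy the database, plus any -wal/-shm sidecar files, into a backup directory.
# Prints the backup directory on success.
backup_db() {
  src=$1
  dest=${2:-./dedupe-backup-$(date +%Y%m%d-%H%M%S)}
  [ -f "$src" ] || { echo "no such file: $src" >&2; return 2; }
  mkdir -p "$dest" || return 1
  for f in "$src" "$src-wal" "$src-shm"; do
    if [ -f "$f" ]; then
      cp -p "$f" "$dest/" || return 1
    fi
  done
  echo "$dest"
}

# Run SQLite's built-in consistency check; succeeds only when it reports "ok".
check_db() {
  src=$1
  [ -f "$src" ] || { echo "no such file: $src" >&2; return 2; }
  command -v sqlite3 >/dev/null 2>&1 || { echo "sqlite3 CLI not found" >&2; return 2; }
  result=$(sqlite3 "$src" "PRAGMA integrity_check;" 2>&1)
  echo "$result"
  [ "$result" = "ok" ]
}

# Usage, after "docker compose down":
#   backup_db "$DB" && check_db "$DB"
```

If check_db prints anything other than ok, keep the backup and continue with the last step.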
Resetting the database
To start fresh, remove the persisted data directory (./docker-data in the default setup).
This deletes all synced data, duplicate groups, and configuration. You will need to sync and analyze again.
Docker Issues
Port conflicts
Symptom: Container fails to start with "port is already in use".
Fix: Change the host port in your .env or compose.yml, or map the app to a different host port directly.
ORIGIN environment variable
Symptom: POST requests return 403 "Cross-site POST form submissions are forbidden".
Cause: SvelteKit requires the ORIGIN environment variable to match the URL users access the app at. This is a CSRF protection mechanism.
Fix: Set ORIGIN in your .env:
# For local access
ORIGIN=http://localhost:3000
# Behind a reverse proxy
ORIGIN=https://dedupe.example.com
Data directory path issues
Symptom: Startup or write failures for the SQLite database.
Cause: The app needs write access to /app/data in the container. In the default setup, this maps to ./docker-data on the host.
Fix: Verify that the host directory is mounted at /app/data and that it is writable by the container user (see Volume permissions).
Viewing container logs
# Follow logs in real-time
docker compose logs -f app
# Last 100 lines
docker compose logs --tail 100 app
Performance Tuning
Large libraries (10,000+ documents)
- Sync: The first sync will take time proportional to your library size. Subsequent syncs are incremental and fast.
- Analysis: MinHash signature generation is O(n). LSH candidate detection is sub-quadratic. The most expensive step is detailed scoring of candidate pairs.
- Memory: Signatures are stored in the SQLite database, not in memory. RAM usage is modest even for large libraries.
Reducing analysis time
- Lower numPermutations (e.g., 128) -- fewer hash computations per document
- Lower fuzzySampleSize (e.g., 2000) -- less text compared per pair
- Raise similarityThreshold -- fewer pairs to score in detail
- Increase minWords -- skip more short documents
Reducing false positives in large libraries
Large libraries tend to surface more borderline matches. Consider:
- Setting similarityThreshold to 0.85 or higher
- Using the Duplicates page filters to focus on high-confidence groups first
Getting Help
Diagnostic information
Gather this information before reporting an issue:
# Application readiness (checks DB + Paperless connectivity)
curl http://localhost:3000/api/v1/ready
# Container logs
docker compose logs --tail 200 app
# Sync status
curl http://localhost:3000/api/v1/sync/status
# Analysis status
curl http://localhost:3000/api/v1/analysis/status
# Document stats
curl http://localhost:3000/api/v1/documents/stats
Enable debug logging
Set LOG_LEVEL=debug in your .env file and restart:
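A minimal sketch of that change follows. set_env_var is a hypothetical helper that replaces an existing entry instead of appending a duplicate; editing .env by hand works just as well.

```shell
# Hypothetical helper: set KEY=VALUE in an env file, replacing any existing entry.
set_env_var() {
  key=$1 value=$2 file=${3:-.env}
  tmp=$(mktemp) || return 1
  if [ -f "$file" ]; then
    # Keep every line except an existing assignment of this key.
    grep -v "^$key=" "$file" > "$tmp" || true
  fi
  printf '%s=%s\n' "$key" "$value" >> "$tmp"
  # Write back in place so the file keeps its original permissions.
  cat "$tmp" > "$file" && rm -f "$tmp"
}

# Usage:
#   set_env_var LOG_LEVEL debug .env
#   docker compose up -d    # recreate the container so the new value is applied
```

Note that docker compose restart reuses the existing container configuration, so a changed .env value only takes effect once the container is recreated with docker compose up -d.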
Debug logs include detailed information about API calls, sync progress, and analysis stages.
Filing issues
Report bugs on the project's GitHub issue tracker. Include:
- What you were doing when the issue occurred
- The error message or unexpected behavior
- Output from the diagnostic commands above
- Your Docker Compose configuration (redact tokens/passwords)