Paperless NGX Dedupe
Intelligent document deduplication for Paperless-NGX
Features
Intelligent Detection
MinHash signatures combined with Locality-Sensitive Hashing provide efficient O(n log n) candidate discovery — no need to compare every document against every other.
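To make the idea concrete, here is a minimal Python sketch of MinHash signatures with LSH banding. It is an illustration only, not the project's implementation: the function names, the 3-word shingle size, the 64 hash functions, and the 16 bands are all assumptions chosen for the example.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def shingles(text, k=3):
    """Return the set of k-word shingles in text (k=3 is an assumed default)."""
    words = text.lower().split()
    if len(words) < k:
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def _hash(seed, shingle):
    """Deterministic 64-bit hash of a shingle, salted with a seed."""
    data = f"{seed}:{shingle}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash(shingle_set, num_perm=64):
    """Signature: the minimum hash over all shingles, per seeded hash function.

    Expects a non-empty shingle set.
    """
    return [min(_hash(seed, s) for s in shingle_set) for seed in range(num_perm)]


def lsh_candidates(signatures, bands=16):
    """Bucket documents by signature band.

    Documents that share at least one bucket become candidate pairs, so only
    those pairs need a full comparison. Expects a non-empty mapping.
    """
    rows = len(next(iter(signatures.values()))) // bands
    buckets = defaultdict(set)
    for doc_id, sig in signatures.items():
        for b in range(bands):
            key = (b, tuple(sig[b * rows:(b + 1) * rows]))
            buckets[key].add(doc_id)
    pairs = set()
    for ids in buckets.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs
```

Two documents with a high shingle overlap are likely to agree on every row of at least one band and therefore land in a shared bucket, while unrelated documents almost never do. More bands with fewer rows each makes the filter more permissive; fewer bands with more rows makes it stricter.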
Multi-Dimensional Scoring
Four similarity dimensions — Jaccard text overlap, fuzzy text matching, metadata comparison, and filename similarity — are combined into a single confidence score with configurable weights.
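As a rough sketch of how four per-dimension scores might be folded into one confidence value, the Python below uses a normalized weighted average. The weight values, dictionary keys, and function names are hypothetical and are not taken from the project's configuration; the actual formula may differ.

```python
from difflib import SequenceMatcher

# Hypothetical weights for illustration; the real defaults may differ.
EXAMPLE_WEIGHTS = {"jaccard": 0.4, "fuzzy": 0.3, "metadata": 0.15, "filename": 0.15}


def jaccard(a, b):
    """Jaccard overlap of two sets: |A ∩ B| / |A ∪ B|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def filename_similarity(a, b):
    """Character-level similarity in [0, 1] from the standard library."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def confidence(scores, weights):
    """Weighted average of per-dimension scores, normalized by the weight sum.

    Normalizing means the weights do not have to add up to 1.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    return sum(weights[name] * scores[name] for name in weights) / total
```

With this shape, setting a weight to 0 removes that dimension from the result without changing the others' relative influence.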
Real-Time Processing
Background worker threads handle sync, analysis, and batch operations with real-time progress streamed via Server-Sent Events.
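For readers unfamiliar with Server-Sent Events, the following Python sketch parses the standard `text/event-stream` line format into (event, data) pairs. The event names and payload used in the usage comment are made up for illustration; see the API Reference for the fields the application actually sends.

```python
def parse_sse(lines):
    """Yield (event, data) tuples from an iterable of SSE text lines.

    Follows the basic rules of the event-stream format: a blank line
    dispatches the event, lines starting with ':' are comments, and
    multiple 'data' lines are joined with newlines.
    """
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
            continue
        if line.startswith(":"):
            continue  # comment, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space is stripped
        if field == "event":
            event = value
        elif field == "data":
            data.append(value)


# Usage with a hypothetical stream:
# for event, data in parse_sse(response_lines):
#     print(event, data)
```

Because the parser only needs an iterable of lines, it can be fed from any HTTP client that exposes a streaming response line by line.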
Single Container
Deploy with Docker Compose using an embedded SQLite database. No Redis, no Postgres, no external dependencies beyond Paperless-NGX itself.
Quick Start
# 1. Create your configuration
cp .env.example .env
# Edit .env — set PAPERLESS_URL and PAPERLESS_API_TOKEN
# 2. Start the application
docker compose up -d
# 3. Open the web UI
# http://localhost:3000
# 4. Sync → Analyze → Review duplicates
See the Getting Started Guide for a full walkthrough.
Explore the Documentation
- Getting Started: First run walkthrough (sync documents, run analysis, and review duplicates)
- Configuration: Environment variables, authentication methods, and algorithm tuning parameters
- API Reference: Complete REST API documentation with curl examples for every endpoint
- How It Works: The deduplication pipeline (shingling, MinHash, LSH, scoring, and clustering)
- SDK Reference: TypeScript client library for programmatic access to the Paperless NGX Dedupe API
- CLI Reference: Command-line interface for sync, analysis, configuration, and data export
Community & Support
- GitHub: rknightion/paperless-ngx-dedupe
- Issues: Report bugs or request features
- Discussions: Community discussions
- Paperless-NGX: Official Paperless-NGX project