Paperless NGX Dedupe

Intelligent document deduplication for Paperless-NGX

Features

Intelligent Detection

MinHash signatures combined with Locality-Sensitive Hashing (LSH) provide efficient O(n log n) candidate discovery, so there is no need to compare every document against every other.
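
As a rough illustration of the idea above, the following TypeScript sketch builds MinHash signatures from word shingles and uses LSH banding to find candidate pairs. It is not the project's implementation: the function names (`hash32`, `shingles`, `minhash`, `candidatePairs`), the seeded FNV-1a hash family, the shingle size, and the band/row layout are all assumptions chosen for brevity.

```typescript
// Illustrative sketch only. The hash family, shingle size, and band layout
// are assumptions, not the parameters used by Paperless NGX Dedupe.

/** 32-bit FNV-1a hash of a string, with the seed mixed into the initial state. */
function hash32(text: string, seed: number): number {
  let h = (0x811c9dc5 ^ seed) >>> 0;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

/** Split text into overlapping word shingles of length k. */
function shingles(text: string, k = 3): Set<string> {
  const words = text.toLowerCase().split(/\s+/).filter((w) => w.length > 0);
  const out = new Set<string>();
  for (let i = 0; i + k <= words.length; i++) {
    out.add(words.slice(i, i + k).join(" "));
  }
  return out;
}

/** MinHash signature: for each seed, keep the minimum hash over all shingles. */
function minhash(shingleSet: Set<string>, numHashes: number): number[] {
  return Array.from({ length: numHashes }, (_, seed) => {
    let min = 0xffffffff;
    shingleSet.forEach((s) => {
      min = Math.min(min, hash32(s, seed));
    });
    return min;
  });
}

/**
 * LSH banding: split each signature into `bands` bands of `rows` values.
 * Documents that agree on every value of at least one band share a bucket
 * and become a candidate pair, so only bucket-mates are compared later.
 */
function candidatePairs(
  signatures: Map<string, number[]>,
  bands: number,
  rows: number,
): Set<string> {
  const buckets = new Map<string, string[]>();
  signatures.forEach((sig, docId) => {
    for (let b = 0; b < bands; b++) {
      const key = `${b}:${sig.slice(b * rows, (b + 1) * rows).join(",")}`;
      const bucket = buckets.get(key) ?? [];
      bucket.push(docId);
      buckets.set(key, bucket);
    }
  });

  const pairs = new Set<string>();
  buckets.forEach((ids) => {
    for (let i = 0; i < ids.length; i++) {
      for (let j = i + 1; j < ids.length; j++) {
        pairs.add([ids[i], ids[j]].sort().join("|"));
      }
    }
  });
  return pairs;
}
```

Two documents with identical shingle sets produce identical signatures and therefore always collide in every band. Near-duplicates collide with a probability that rises with their Jaccard similarity, which is why the number of bands and rows acts as a tuning knob.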

Multi-Dimensional Scoring

Four similarity dimensions — Jaccard text overlap, fuzzy text matching, metadata comparison, and filename similarity — are combined into a single confidence score with configurable weights.
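
One plausible way to combine the four dimensions is a normalized weighted average, sketched below in TypeScript. The `SimilarityScores` shape, the `jaccard` and `confidence` helpers, and the example weights are hypothetical. The actual formula and default weights used by the application may differ.

```typescript
// Illustrative sketch only. Field names and the weighted-average formula are
// assumptions, not the scoring code used by Paperless NGX Dedupe.

interface SimilarityScores {
  jaccard: number; // text overlap, 0..1
  fuzzy: number; // fuzzy text match, 0..1
  metadata: number; // metadata comparison, 0..1
  filename: number; // filename similarity, 0..1
}

// Weights use the same four keys; they do not need to sum to 1.
type Weights = SimilarityScores;

/** Jaccard similarity of two sets: |A ∩ B| / |A ∪ B|. */
function jaccard(a: Set<string>, b: Set<string>): number {
  if (a.size === 0 && b.size === 0) return 1;
  let intersection = 0;
  a.forEach((item) => {
    if (b.has(item)) intersection++;
  });
  return intersection / (a.size + b.size - intersection);
}

/** Weighted average of the four dimension scores, normalized by total weight. */
function confidence(scores: SimilarityScores, weights: Weights): number {
  const keys: (keyof SimilarityScores)[] = ["jaccard", "fuzzy", "metadata", "filename"];
  let total = 0;
  let weightSum = 0;
  for (const key of keys) {
    total += scores[key] * weights[key];
    weightSum += weights[key];
  }
  return weightSum > 0 ? total / weightSum : 0;
}

// Hypothetical weights that favor text overlap over filename similarity.
const exampleWeights: Weights = { jaccard: 40, fuzzy: 30, metadata: 20, filename: 10 };
const exampleScore = confidence(
  { jaccard: 0.9, fuzzy: 0.8, metadata: 0.5, filename: 1.0 },
  exampleWeights,
);
```

Normalizing by the sum of the weights keeps the result in the 0 to 1 range regardless of how the weights are scaled, so a dimension can be disabled by setting its weight to zero.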

Real-Time Processing

Background worker threads handle sync, analysis, and batch operations with real-time progress streamed via Server-Sent Events.
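
For readers unfamiliar with the Server-Sent Events wire format, the TypeScript sketch below parses a raw event stream into discrete events. The `parseSse` helper, the `progress` event name, and the JSON payload are hypothetical. The event names and payloads the application actually emits are documented in the API Reference.

```typescript
// Illustrative sketch only. The event name and payload in the sample are
// made up; only the framing rules (event:/data: fields, blank-line
// terminator) come from the Server-Sent Events format.

interface SseEvent {
  event: string; // event type, "message" when no event: field is present
  data: string; // data: lines joined with newlines
}

/** Parse a complete SSE text stream into a list of events. */
function parseSse(raw: string): SseEvent[] {
  const events: SseEvent[] = [];
  // Events are separated by a blank line.
  for (const block of raw.split(/\r?\n\r?\n/)) {
    let event = "message";
    const data: string[] = [];
    for (const line of block.split(/\r?\n/)) {
      if (line.startsWith("event:")) {
        event = line.slice("event:".length).trim();
      } else if (line.startsWith("data:")) {
        // A single leading space after the colon is not part of the value.
        data.push(line.slice("data:".length).replace(/^ /, ""));
      }
      // Lines starting with ":" are comments (often keep-alives) and are ignored.
    }
    if (data.length > 0) {
      events.push({ event, data: data.join("\n") });
    }
  }
  return events;
}

// Hypothetical stream: one named progress event followed by an unnamed one.
const sampleStream = 'event: progress\ndata: {"progress":0.5}\n\n: keep-alive\n\ndata: done\n\n';
const sampleEvents = parseSse(sampleStream);
```

In a browser, the built-in `EventSource` API performs this parsing automatically. A hand-written parser like this one is mainly useful when reading the stream from a script or a test.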

Single Container

Deploy with Docker Compose using an embedded SQLite database. No Redis, no Postgres, no external dependencies beyond Paperless-NGX itself.

Quick Start

# 1. Create your configuration
cp .env.example .env
# Edit .env — set PAPERLESS_URL and PAPERLESS_API_TOKEN

# 2. Start the application
docker compose up -d

# 3. Open the web UI
# http://localhost:3000

# 4. Sync → Analyze → Review duplicates

See the Getting Started Guide for a full walkthrough.

Explore the Documentation

Getting Started

First run walkthrough: sync documents, run analysis, and review duplicates

Configuration

Environment variables, authentication methods, and algorithm tuning parameters

API Reference

Complete REST API documentation with curl examples for every endpoint

How It Works

The deduplication pipeline: shingling, MinHash, LSH, scoring, and clustering

SDK Reference

TypeScript client library for programmatic access to the Paperless NGX Dedupe API

CLI Reference

Command-line interface for sync, analysis, configuration, and data export

Community & Support

GitHub: rknightion/paperless-ngx-dedupe
Issues: Report bugs or request features
Discussions: Community discussions
Paperless-NGX: Official Paperless-NGX project