Paperless NGX Dedupe

Intelligent document deduplication for Paperless-NGX

Features

Intelligent Detection

MinHash signatures combined with Locality-Sensitive Hashing provide efficient O(n log n) candidate discovery — no need to compare every document against every other.
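As a rough illustration of the idea, the sketch below builds MinHash signatures from word shingles and uses LSH banding to find candidate pairs without an all-pairs comparison. It is a minimal sketch, not the project's implementation: the function names, the shingle size, the 64-hash signature length, and the 16-band split are all assumptions chosen for the example.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def shingles(text, k=3):
    """Return the set of k-word shingles in text (k=3 is an assumed default)."""
    words = text.lower().split()
    if len(words) < k:
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def _hash(seed, shingle):
    """Deterministic 64-bit hash of a shingle under a given seed."""
    data = f"{seed}:{shingle}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(shingle_set, num_perm=64):
    """Keep the minimum hash value per seeded hash function."""
    if not shingle_set:
        raise ValueError("cannot build a signature for an empty document")
    return [min(_hash(seed, s) for s in shingle_set) for seed in range(num_perm)]


def lsh_candidates(signatures, bands=16):
    """Return pairs of document ids that share at least one band bucket.

    signatures maps a document id to its MinHash signature. Each signature
    is cut into `bands` equal slices; trailing values are ignored if the
    length is not a multiple of `bands`.
    """
    buckets = defaultdict(list)
    for doc_id, sig in signatures.items():
        rows = len(sig) // bands
        for b in range(bands):
            key = (b, tuple(sig[b * rows:(b + 1) * rows]))
            buckets[key].append(doc_id)
    pairs = set()
    for ids in buckets.values():
        for a, b in combinations(sorted(ids), 2):
            pairs.add((a, b))
    return pairs


# Hypothetical documents, keyed by an illustrative id.
docs = {
    "a": "invoice 2024 acme corp total due 100 euro",
    "b": "invoice 2024 acme corp total due 100 euro",
    "c": "completely unrelated letter about garden fence repairs next spring",
}
signatures = {doc_id: minhash_signature(shingles(text)) for doc_id, text in docs.items()}
candidates = lsh_candidates(signatures)
```

Only documents that land in the same bucket for some band are compared further, which is what keeps candidate discovery well below a full pairwise scan.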

Multi-Dimensional Scoring

Four similarity dimensions — Jaccard text overlap, fuzzy text matching, metadata comparison, and filename similarity — are combined into a single confidence score with configurable weights.
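To make the weighting concrete, here is a minimal sketch of combining the four dimension scores into one confidence value. The weight values, the dictionary keys, and the normalization by the weight total are assumptions for illustration; the actual defaults and configuration names are described in the Configuration documentation.

```python
# Hypothetical weights; the real defaults may differ.
EXAMPLE_WEIGHTS = {
    "jaccard": 0.4,
    "fuzzy": 0.3,
    "metadata": 0.15,
    "filename": 0.15,
}


def jaccard(a, b):
    """Jaccard overlap of the word sets of two texts, in [0, 1]."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    if not set_a and not set_b:
        return 1.0  # two empty texts are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


def confidence(scores, weights=EXAMPLE_WEIGHTS):
    """Weighted average of per-dimension scores, each expected in [0, 1].

    Dividing by the weight total keeps the result in [0, 1] even when the
    configured weights do not sum to exactly 1.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    return sum(weights[name] * scores[name] for name in weights) / total


# Illustrative scores for one candidate pair.
pair_scores = {
    "jaccard": jaccard("acme invoice march 2024", "acme invoice april 2024"),
    "fuzzy": 0.9,
    "metadata": 1.0,
    "filename": 0.5,
}
pair_confidence = confidence(pair_scores)
```

Setting a weight to zero removes that dimension from the score, which is one way to ignore filenames when they are known to be unreliable.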

Real-Time Processing

Background worker threads handle sync, analysis, and batch operations with real-time progress streamed via Server-Sent Events.
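For readers unfamiliar with Server-Sent Events, the sketch below parses the standard `event:` / `data:` text framing into (event, data) tuples. The event name and payload in the sample stream are hypothetical and are not taken from this project's API; see the API Reference for the actual endpoints and event shapes.

```python
def parse_sse(lines):
    """Yield (event, data) tuples from an iterable of SSE text lines.

    Follows the basic framing rules: a blank line dispatches the pending
    event, lines starting with ':' are comments, and multiple data lines
    are joined with newlines. Events without data are not dispatched.
    """
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space is stripped
        if field == "event":
            event = value
        elif field == "data":
            data.append(value)


# Hypothetical stream content, as it might arrive line by line.
sample_stream = [
    "event: progress\n",
    'data: {"progress": 0.5}\n',
    "\n",
    ": keep-alive\n",
    "data: done\n",
    "\n",
]
events = list(parse_sse(sample_stream))
```

In practice the lines would come from a streaming HTTP response rather than a list; the parsing logic stays the same.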

Single Container

Deploy with Docker Compose using an embedded SQLite database. No Redis, no Postgres, no external dependencies beyond Paperless-NGX itself.

Quick Start

# 1. Create your configuration
cp .env.example .env
# Edit .env — set PAPERLESS_URL and PAPERLESS_API_TOKEN

# 2. Start the application
docker compose up -d

# 3. Open the web UI
# http://localhost:3000

# 4. Sync → Analyze → Review duplicates

See the Getting Started Guide for a full walkthrough.


Explore the Documentation

  • Getting Started: First-run walkthrough covering document sync, analysis, and duplicate review

  • Configuration: Environment variables, authentication methods, and algorithm tuning parameters

  • API Reference: Complete REST API documentation with curl examples for every endpoint

  • How It Works: The deduplication pipeline, from shingling, MinHash, and LSH through scoring and clustering

  • SDK Reference: TypeScript client library for programmatic access to the Paperless NGX Dedupe API

  • CLI Reference: Command-line interface for sync, analysis, configuration, and data export

Community & Support