Paperless-NGX Dedupe
Find and resolve duplicate documents with MinHash/LSH and fuzzy OCR matching, then add AI metadata suggestions when you are ready.
What it does
Paperless-NGX Dedupe connects to your paperless-ngx instance, syncs documents, analyzes content similarity, and groups likely duplicates. You can review groups in the UI and resolve duplicates safely. Optionally, you can run the LLM-based categorization workflow, which uses OpenAI to suggest titles, tags, correspondents, document types, and dates.
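To illustrate the general idea behind MinHash/LSH similarity analysis, here is a minimal sketch in pure Python. It is not the project's actual implementation: the function names (`shingles`, `minhash_signature`, `estimated_jaccard`, `lsh_buckets`), the 3-word shingle size, and the 64-hash / 16-band settings are all assumptions chosen for illustration.

```python
import hashlib
import re


def shingles(text, k=3):
    """Return the set of k-word shingles from lowercased text (k is an illustrative choice)."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def _hash64(shingle, seed):
    """Deterministic 64-bit hash of a shingle, salted with the seed."""
    digest = hashlib.blake2b(
        shingle.encode("utf-8"), digest_size=8, salt=seed.to_bytes(8, "big")
    ).digest()
    return int.from_bytes(digest, "big")


def minhash_signature(shingle_set, num_perm=64):
    """For each of num_perm salted hash functions, keep the minimum hash over all shingles."""
    return [
        min((_hash64(s, seed) for s in shingle_set), default=2**64 - 1)
        for seed in range(num_perm)
    ]


def estimated_jaccard(sig_a, sig_b):
    """The fraction of matching signature positions estimates Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def lsh_buckets(signature, bands=16):
    """Split the signature into bands; two documents sharing any bucket are candidate duplicates."""
    rows = len(signature) // bands
    return {(b, tuple(signature[b * rows:(b + 1) * rows])) for b in range(bands)}


if __name__ == "__main__":
    scan_a = "Invoice 2024-001 from Acme Corp for consulting services rendered in March"
    scan_b = "Invoice 2024-001 from Acme Corp for consulting services rendered in March."
    sig_a = minhash_signature(shingles(scan_a))
    sig_b = minhash_signature(shingles(scan_b))
    print("estimated similarity:", estimated_jaccard(sig_a, sig_b))
    print("candidate pair:", bool(lsh_buckets(sig_a) & lsh_buckets(sig_b)))
```

The banding step is what keeps this approach scalable: only documents that share at least one bucket need a more expensive comparison, such as fuzzy matching on the OCR text.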
Quick start
- Start the stack with Docker (see the root README for compose examples).
- Open the UI at http://localhost:30002 and configure your Paperless-NGX connection.
- Sync documents, then run deduplication analysis from the Dashboard controls.
- Review and resolve duplicates from the Duplicates page.
- (Optional) Add an OpenAI API key in Settings and run AI Processing.
Documentation map
- Getting Started - setup and first run
- User Guide - UI walkthrough and workflows
- AI Processing - metadata suggestions with OpenAI
- Configuration - settings and environment variables
- Troubleshooting - common issues and fixes
API reference
When the backend is running, interactive API documentation is available at http://localhost:30001/docs.