AI Processing (OpenAI)
AI Processing uses OpenAI to suggest metadata based on each document's OCR text and its existing metadata. This is the LLM-based categorization flow for Paperless-NGX Dedupe. It proposes (but does not apply) updates for:
- Title
- Correspondent
- Document type
- Tags (up to 5)
- Date
All suggestions include confidence scores and remain in a pending review state until you apply them.
Behavior details:
- Values may be null when evidence is missing
- Dates are ISO formatted (YYYY-MM-DD) when detected
- Suggestions are returned in English
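As a minimal sketch of how a client might handle these values, the snippet below models one suggested field and normalizes dates and tags. The names `FieldSuggestion`, `parse_suggested_date`, and `clamp_tags` are hypothetical, and the 0.0 to 1.0 confidence range is an assumption; the actual result schema may differ.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class FieldSuggestion:
    """One suggested value (hypothetical shape, not the tool's schema)."""
    value: Optional[str]  # None when the model found no evidence
    confidence: float     # assumed to lie between 0.0 and 1.0


def parse_suggested_date(raw: Optional[str]) -> Optional[date]:
    """Return a date for an ISO string (YYYY-MM-DD), or None if missing or invalid."""
    if raw is None:
        return None
    try:
        return date.fromisoformat(raw)
    except ValueError:
        return None


def clamp_tags(tags: List[str], limit: int = 5) -> List[str]:
    """Keep at most `limit` tags, matching the documented cap of 5."""
    return list(tags)[:limit]
```

Treating a missing or unparseable date as `None` keeps it consistent with the rule that values may be null when evidence is missing.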
Requirements
- Paperless-NGX connection configured in Settings
- OpenAI API key configured in Settings (or through an environment variable)
Configuration options
In Settings > AI Processing:
- OpenAI API Key
- Model: gpt-5.1, gpt-5-mini, or gpt-5-nano
- Reasoning effort: low, medium, high
- Max OCR characters per document (default 12000)
The max OCR character limit caps how much text is sent per document, which bounds token usage and cost.
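To illustrate how a character cap bounds input size, here is a minimal sketch. Both function names are hypothetical, truncating from the start of the text is an assumption about how the cap is applied, and the figure of roughly 4 characters per token is only a common rule of thumb for English text, not a value taken from this tool.

```python
def truncate_ocr(text: str, max_chars: int = 12000) -> str:
    """Keep only the first `max_chars` characters of the OCR text."""
    return text[:max_chars]


def rough_token_estimate(text: str, chars_per_token: float = 4.0) -> int:
    """Very rough token estimate; real tokenizers vary by language and content."""
    return int(len(text) / chars_per_token)


# Example: a 20,000-character document is cut to the default cap.
capped = truncate_ocr("a" * 20000)
estimated_tokens = rough_token_estimate(capped)
```

Under these assumptions, halving the character limit roughly halves the input tokens sent per document.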
Running a job
- Open AI Processing
- Choose a tag or process all documents
- Select which fields to extract (or Everything)
- Start processing
Jobs are queued and processed in the background. Progress is shown in the Current run card, and completed results appear in the Results table.
Reviewing results
Each row shows:
- Current document metadata
- Suggested values with confidence scores
- Status (pending_review, applied, failed)

You can:
- Select specific rows
- Choose which fields to apply
- Apply selected suggestions
Nothing is written to Paperless-NGX until you click Apply.
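The selection logic can be sketched as follows. This is an illustration only: the row layout, the `build_updates` helper, and the `min_confidence` threshold are all hypothetical and are not described as features of the tool.

```python
from typing import Any, Dict, Iterable, List


def build_updates(
    rows: List[Dict[str, Any]],
    selected_ids: Iterable[int],
    fields: Iterable[str],
    min_confidence: float = 0.0,
) -> Dict[int, Dict[str, Any]]:
    """Collect the chosen fields from selected rows that are still pending review."""
    selected = set(selected_ids)
    wanted = list(fields)
    updates: Dict[int, Dict[str, Any]] = {}
    for row in rows:
        if row["id"] not in selected or row["status"] != "pending_review":
            continue
        chosen = {}
        for name in wanted:
            suggestion = row["suggestions"].get(name)
            # Skip fields with no suggestion or a null value.
            if suggestion is None or suggestion["value"] is None:
                continue
            if suggestion["confidence"] >= min_confidence:
                chosen[name] = suggestion["value"]
        if chosen:
            updates[row["id"]] = chosen
    return updates


rows = [
    {"id": 1, "status": "pending_review", "suggestions": {
        "title": {"value": "Electricity bill", "confidence": 0.9},
        "date": {"value": None, "confidence": 0.0}}},
    {"id": 2, "status": "applied", "suggestions": {
        "title": {"value": "Lease", "confidence": 0.8}}},
]
updates = build_updates(rows, selected_ids=[1, 2], fields=["title", "date"])
```

In this example only row 1 yields an update, and only for the title, because its date suggestion is null and row 2 has already been applied.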
Applying suggestions to Paperless-NGX
When you apply results:
- Titles, dates, and document types update the Paperless document
- Tags and correspondents are created in Paperless-NGX if missing
- The local cache is updated to match Paperless

If the Paperless connection is not configured, applying suggestions will fail.
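The "create if missing" behavior for tags can be sketched as a get-or-create lookup. The helper `resolve_tag_ids` and the `create` callback are hypothetical stand-ins for whatever client talks to Paperless-NGX, and case-insensitive name matching is an assumption rather than documented behavior.

```python
from typing import Callable, Dict, Iterable, List


def resolve_tag_ids(
    suggested: Iterable[str],
    existing: Dict[str, int],
    create: Callable[[str], int],
) -> List[int]:
    """Map tag names to IDs, calling `create` once for each missing name."""
    by_lower = {name.lower(): tag_id for name, tag_id in existing.items()}
    ids = []
    for name in suggested:
        key = name.lower()
        if key not in by_lower:
            # `create` is expected to return the ID of the newly created tag.
            by_lower[key] = create(name)
        ids.append(by_lower[key])
    return ids


created: List[str] = []


def fake_create(name: str) -> int:
    """In-memory stand-in for a real create call; returns a made-up ID."""
    created.append(name)
    return 100 + len(created)


tag_ids = resolve_tag_ids(["invoice", "Tax", "tax"], {"Invoice": 1}, fake_create)
```

Caching newly created names in the lookup avoids creating the same tag twice within one apply run.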
Health checks
The Verify OpenAI button checks that your API key and model are valid. It retrieves the model's details rather than running a completion, so it does not consume tokens.
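A check of this kind can be approximated with OpenAI's model retrieval endpoint (`GET /v1/models/{model}`). The sketch below uses only the standard library. The function names and the allow-list constant are hypothetical, and the tool's own implementation may differ.

```python
import urllib.error
import urllib.request

ALLOWED_MODELS = ("gpt-5.1", "gpt-5-mini", "gpt-5-nano")


def build_model_request(api_key: str, model: str) -> urllib.request.Request:
    """Build a GET request for the model's details; no prompt is sent."""
    if model not in ALLOWED_MODELS:
        raise ValueError(f"Model not allowed: {model}")
    return urllib.request.Request(
        f"https://api.openai.com/v1/models/{model}",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def verify_openai(api_key: str, model: str, timeout: float = 10.0) -> bool:
    """Return True if the key can retrieve the model, False on any HTTP or network error."""
    request = build_model_request(api_key, model)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError, HTTPError (e.g. 401 or 404), and timeouts all derive from OSError.
        return False
```

Calling `verify_openai(key, "gpt-5-mini")` needs network access; a `False` result can mean an invalid key, an unavailable model, or a connectivity problem.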
Privacy and cost
OCR text and relevant metadata are sent to OpenAI. Use a key you control and review privacy requirements for your documents.

Reduce cost by:
- Using gpt-5-mini or gpt-5-nano
- Lowering the max OCR character limit
- Processing a single tag instead of all documents
Common issues
- OpenAI API key missing: add a key in Settings
- Model not allowed: use gpt-5.1, gpt-5-mini, or gpt-5-nano
- Health check fails: verify network access and API key validity
- No results: ensure documents have OCR content and the job completed