Development Guide
This guide covers setting up a local development environment, running tests, and understanding the project structure.
Prerequisites
- Node.js 24 or later
- pnpm 9.15 or later
- Git
Setup
# Clone the repository
git clone https://github.com/rknightion/paperless-ngx-dedupe.git
cd paperless-ngx-dedupe
# Install dependencies
pnpm install
# Copy environment configuration
cp .env.example .env
# Edit .env — set PAPERLESS_URL and PAPERLESS_API_TOKEN
# Start the development server
pnpm dev
# Opens at http://localhost:5173
Worker Thread Limitation
Background jobs (sync, analysis, batch delete) use worker_threads, which run as plain Node.js threads outside Vite. They do not work under pnpm dev because Node.js cannot execute the TypeScript source files directly. Use docker compose up to test the full workflow, including background jobs.
Project Structure
paperless-ngx-dedupe/
├── packages/
│   ├── core/                 # Framework-agnostic business logic
│   │   └── src/
│   │       ├── dedup/        # MinHash, LSH, scoring, analysis
│   │       ├── sync/         # Document sync and normalization
│   │       ├── jobs/         # Worker thread management
│   │       ├── queries/      # Database queries (Drizzle ORM)
│   │       ├── schema/       # Database schema definitions
│   │       ├── paperless/    # Paperless-NGX API client
│   │       ├── export/       # CSV and JSON export
│   │       └── config.ts     # Zod-validated environment config
│   ├── web/                  # SvelteKit 2 application
│   │   └── src/
│   │       ├── routes/       # UI pages and API endpoints
│   │       └── lib/          # Shared components and utilities
│   ├── sdk/                  # TypeScript API client library
│   │   └── src/
│   │       ├── client.ts     # PaperlessDedupeClient class
│   │       └── types.ts      # Exported type definitions
│   └── cli/                  # Command-line interface
│       └── src/
│           ├── commands/     # sync, analyze, status, config, export
│           └── lib/          # CLI utilities
├── docs/                     # Documentation (this site)
├── Dockerfile                # Multi-stage Docker build
├── compose.yml               # Development/production compose
└── pnpm-workspace.yaml       # Monorepo workspace config
Package Dependencies
graph TD
Web["packages/web<br/>SvelteKit App"] --> Core["packages/core<br/>Business Logic"]
CLI["packages/cli<br/>CLI Tool"] --> Core
SDK["packages/sdk<br/>API Client"] -.->|"HTTP"| Web
style Core fill:#e8eaf6,stroke:#3f51b5
style Web fill:#e8f5e9,stroke:#4caf50
style SDK fill:#fff3e0,stroke:#ff9800
style CLI fill:#fce4ec,stroke:#e91e63
- core: No framework dependencies. All business logic lives here.
- web: Imports @paperless-dedupe/core directly. Serves both the UI and REST API.
- cli: Imports @paperless-dedupe/core directly. Runs operations without the web server.
- sdk: Communicates with the web server over HTTP. Zero dependencies on core.
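The HTTP-only boundary between the sdk and web packages can be illustrated with a small sketch. Everything named here (SketchClient, apiUrl, FetchLike, and the example path) is hypothetical and is not the real PaperlessDedupeClient API; only the /api/v1/ prefix is taken from this guide. The point is that a client of this shape needs nothing from core, just a base URL and a fetch-like function.

```typescript
// Minimal shape of the fetch API this sketch relies on, so a fake can be injected.
interface HttpResponse {
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}

type FetchLike = (
  url: string,
  init?: { headers?: Record<string, string> },
) => Promise<HttpResponse>;

// Join a server base URL with a path under the /api/v1/ prefix.
function apiUrl(baseUrl: string, path: string): string {
  const base = baseUrl.replace(/\/+$/, ''); // drop trailing slashes
  const rest = path.replace(/^\/+/, ''); // drop leading slashes
  return `${base}/api/v1/${rest}`;
}

// Hypothetical client: it knows only a base URL and how to speak HTTP.
class SketchClient {
  private readonly baseUrl: string;
  private readonly fetchImpl: FetchLike;

  constructor(baseUrl: string, fetchImpl: FetchLike) {
    this.baseUrl = baseUrl;
    this.fetchImpl = fetchImpl;
  }

  async get(path: string): Promise<unknown> {
    const response = await this.fetchImpl(apiUrl(this.baseUrl, path), {
      headers: { accept: 'application/json' },
    });
    if (!response.ok) {
      throw new Error(`Request failed with HTTP ${response.status}`);
    }
    return response.json();
  }
}
```

Injecting the fetch function keeps the sketch testable without a running server. The path passed to get is a placeholder, so check the route files under packages/web/src/routes/api/v1/ for the endpoints that actually exist.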
Build Commands
| Command | Description |
|---|---|
| pnpm dev | Start SvelteKit dev server at http://localhost:5173 |
| pnpm build | Build all packages (core, sdk, cli, web) |
| pnpm check | TypeScript type-check all packages |
| pnpm test | Run Vitest tests (core and sdk) |
| pnpm lint | ESLint check |
| pnpm lint:fix | ESLint auto-fix |
| pnpm format | Prettier check |
| pnpm format:fix | Prettier auto-fix |
Testing
Tests use Vitest and are co-located with source files as *.test.ts or in __tests__/ directories.
# Run all tests once
pnpm test
# Watch mode (per package)
pnpm --filter @paperless-dedupe/core test:watch
pnpm --filter @paperless-dedupe/sdk test:watch
# Run tests for a specific package
pnpm --filter @paperless-dedupe/core test
pnpm --filter @paperless-dedupe/sdk test
Test structure:
- packages/core/src/dedup/__tests__/ -- MinHash, LSH, shingles, scoring
- packages/core/src/sync/__tests__/ -- Document sync and normalization
- packages/core/src/queries/__tests__/ -- Database query tests
- packages/core/src/paperless/__tests__/ -- Paperless API client tests
- packages/core/src/jobs/__tests__/ -- Job management tests
- packages/sdk/src/__tests__/ -- SDK client tests
Code Conventions
- Formatting: Prettier -- 100 char width, single quotes, trailing commas, 2-space indent
- Path alias: @paperless-dedupe/core resolves to packages/core/src/index.ts
- API routes: SvelteKit file-based routing at packages/web/src/routes/api/v1/
- Validation: Zod schemas for environment config and API request bodies
- Logging: Pino structured JSON logging
- Styling: Tailwind CSS 4 via Vite plugin
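To make the validation convention concrete, here is a dependency-free sketch of the kind of checks an environment schema might enforce. The project itself uses Zod; this hand-rolled version is a stand-in so the example runs without installing anything. The variable names PAPERLESS_URL, PAPERLESS_API_TOKEN, and AUTO_MIGRATE come from this guide, while AppConfig, parseConfig, and the exact parsing rules are assumptions.

```typescript
// Hand-rolled stand-in for a Zod schema. The real project validates with Zod.
// The parsing rules below are assumptions made for illustration.

interface AppConfig {
  paperlessUrl: string;
  paperlessApiToken: string;
  autoMigrate: boolean;
}

type Env = Record<string, string | undefined>;

function parseConfig(env: Env): AppConfig {
  const errors: string[] = [];

  // Require an http(s) URL with no whitespace.
  const paperlessUrl = (env.PAPERLESS_URL ?? '').trim();
  if (!/^https?:\/\/\S+$/.test(paperlessUrl)) {
    errors.push('PAPERLESS_URL must be an http(s) URL');
  }

  const paperlessApiToken = (env.PAPERLESS_API_TOKEN ?? '').trim();
  if (paperlessApiToken === '') {
    errors.push('PAPERLESS_API_TOKEN must not be empty');
  }

  // Assumption: unset means true, and only the literal "false" disables it.
  const autoMigrate = (env.AUTO_MIGRATE ?? 'true').toLowerCase() !== 'false';

  // Report every problem at once instead of failing on the first one.
  if (errors.length > 0) {
    throw new Error(`Invalid configuration: ${errors.join('; ')}`);
  }
  return { paperlessUrl, paperlessApiToken, autoMigrate };
}
```

Calling parseConfig(process.env) once at startup would surface every misconfigured variable in a single error message, which is the main reason to validate through a schema rather than reading variables ad hoc.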
Database
- Engine: SQLite via better-sqlite3
- ORM: Drizzle ORM with schema defined in packages/core/src/schema/
- Migrations: Auto-detected via SHA-256 hashing of DDL statements. Run on startup when AUTO_MIGRATE=true (the default).
- Tables: document, documentContent, documentSignature, duplicateGroup, duplicateMember, job, appConfig, syncState
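The hash-based migration detection described above can be sketched roughly as follows. This is not the project's actual code: hashDdl and needsMigration are hypothetical helpers, the column definitions are invented for illustration, and the whitespace normalization is an assumption. The idea is to digest the ordered DDL statements with SHA-256 and treat any difference from the stored digest as a schema change.

```typescript
import { createHash } from 'node:crypto';

// Hypothetical helper: hash an ordered list of DDL statements into one digest.
function hashDdl(statements: string[]): string {
  const hash = createHash('sha256');
  for (const statement of statements) {
    // Assumption: collapse whitespace so formatting-only edits keep the same digest.
    hash.update(statement.replace(/\s+/g, ' ').trim());
    // Separator so that ['ab', 'c'] and ['a', 'bc'] produce different digests.
    hash.update('\n');
  }
  return hash.digest('hex');
}

// A migration is needed when no digest is stored yet or the stored one differs.
function needsMigration(statements: string[], storedHash: string | null): boolean {
  return storedHash !== hashDdl(statements);
}

// Illustrative DDL only. The real column definitions live in packages/core/src/schema/.
const ddl = [
  'CREATE TABLE document (id INTEGER PRIMARY KEY, title TEXT)',
  'CREATE TABLE job (id INTEGER PRIMARY KEY, status TEXT)',
];
const appliedHash = hashDdl(ddl);

console.log(needsMigration(ddl, appliedHash)); // false
console.log(
  needsMigration([...ddl, 'CREATE TABLE appConfig (key TEXT PRIMARY KEY)'], appliedHash),
); // true
```

Comparing a single digest keeps the startup check cheap, but it only reports that something changed, not what changed. A real implementation still needs a way to work out which statements to apply.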
Docker Development
To test the full workflow, including background jobs, run docker compose up.
The Dockerfile uses a 3-stage build:
- deps -- Install pnpm dependencies
- build -- Build core + web and bundle the CLI
- production -- Minimal runtime with flattened node_modules
The container runs as a non-root user using PUID/PGID (defaults: 1000:1000). Data is persisted at /app/data (mounted from ./docker-data by default).