v0.8.0 · self-hosted — Open source · MIT

Documents,
structured
and queryable.

Paperwise turns the pile of PDFs, scans, and statements on your machine into a searchable, citation-backed knowledge base. Run the whole stack locally. Your files never leave the box.

Local OCR · Bring-your-own LLM · No telemetry

A library, not a folder.

248 documents · 13 tags
filtered by correspondent, type, and date

§01 — Pipeline

From raw files
to structured answers.

OCR, extraction, organization, and grounded Q&A: four stages, one self-hosted stack. Swap the model behind each stage. Refine the taxonomy over time. Keep everything on your own infrastructure.

01 · OCR & EXTRACTION ~3s / page

Read messy PDFs like a person would.

Switch between local OCR and an LLM per document. Paperwise handles scans, multi-column layouts, handwritten margins, and 50-page statements without babysitting.

Extracted · last batch
Document             Type       Received
Sonic Invoice        Invoice    Mar 12, 2026
Annual Financials    Statement  Sep 1, 2025
Auto Rate Notice     Insurance  Apr 5, 2026
Kindergarten Packet  Form       Mar 14, 2026

02 · ORGANIZATION auto-tagged

Tagged by type, party, date.

Auto-classify each document into a taxonomy you control. Refine the rules, merge categories, and the library reorganizes itself.

Active taxonomy · 13 tags
12 Contract · 28 Medical · 41 Invoice · 7 Amendment · 3 Tuition · 19 Utility · 6 Legal · 22 Finance · 14 Billing · 11 Insurance
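
To make "merge categories" concrete, here is a minimal Python sketch of folding one tag into another. It uses the tag counts shown in the panel above; the `merge_tags` helper is hypothetical and is not Paperwise's actual API. It also assumes no document carries both tags, since a real merge would deduplicate per document.

```python
from collections import Counter

# Tag counts copied from the "Active taxonomy" panel above.
TAG_COUNTS = Counter({
    "Contract": 12, "Medical": 28, "Invoice": 41, "Amendment": 7,
    "Tuition": 3, "Utility": 19, "Legal": 6, "Finance": 22,
    "Billing": 14, "Insurance": 11,
})


def merge_tags(counts: Counter, source: str, target: str) -> Counter:
    """Return a new Counter with `source` folded into `target` (hypothetical helper)."""
    if source not in counts:
        raise KeyError(f"unknown tag: {source}")
    merged = Counter(counts)  # copy, so the input stays untouched
    if source == target:
        return merged
    moved = merged.pop(source)  # drop the old tag
    merged[target] += moved     # and credit its documents to the target
    return merged


merged = merge_tags(TAG_COUNTS, "Billing", "Invoice")
print(merged["Invoice"])    # 55
print("Billing" in merged)  # False
```

The total number of tag assignments is unchanged by the merge; only the labels move.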

03 · GROUNDED Q&A cited

Answers that cite their source.

Every answer comes back with the page-level citations it was built from. Click through, verify, and trust.

"What is the billing cap in the Renfroe contract?"

Aggregate liability is capped at $1,000,000 in the executed agreement.

renfroe-msa.pdf · p.4
amendment-2.pdf · p.1
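
One way to picture a cited answer is as a small data structure: the answer text plus the list of pages it was built from. The sketch below is an assumption about shape only. The `Citation` and `Answer` classes and their fields are hypothetical names, not Paperwise's real schema; the values come from the example above.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Citation:
    """A page-level pointer into a source document (hypothetical shape)."""
    filename: str
    page: int

    def label(self) -> str:
        # Same "file · p.N" form used by the citation chips above.
        return f"{self.filename} · p.{self.page}"


@dataclass
class Answer:
    text: str
    citations: list[Citation] = field(default_factory=list)

    def is_grounded(self) -> bool:
        # An answer with no citations cannot be verified by clicking through.
        return len(self.citations) > 0


answer = Answer(
    text="Aggregate liability is capped at $1,000,000 in the executed agreement.",
    citations=[Citation("renfroe-msa.pdf", 4), Citation("amendment-2.pdf", 1)],
)

for citation in answer.citations:
    print(citation.label())
```

Keeping citations as structured values rather than text inside the answer makes it straightforward to reject or flag any answer where `is_grounded()` is false.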

04 · MODEL ROUTING 3 slots · BYO key

The right model for each job.

Configure separate models for OCR, extraction, and Q&A — cheap and fast where it doesn't matter, slow and accurate where it does. Local, OpenAI-compatible, or hosted.

Active configuration
OCR      gemini-2.5-flash  fast
Extract  gpt-4.1-mini      balanced
Q&A      gpt-4.1           accurate
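
As a rough illustration, the three slots above can be thought of as a lookup from pipeline stage to model. This is a sketch under assumptions: the stage keys (`ocr`, `extract`, `qa`), the `ModelSlot` class, and `model_for` are hypothetical names, not Paperwise's real configuration format. Only the model names and profiles come from the panel above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSlot:
    model: str    # model identifier passed to the provider
    profile: str  # the speed/accuracy label shown in the panel


# Hypothetical routing table mirroring the "Active configuration" panel.
ROUTES = {
    "ocr": ModelSlot("gemini-2.5-flash", "fast"),
    "extract": ModelSlot("gpt-4.1-mini", "balanced"),
    "qa": ModelSlot("gpt-4.1", "accurate"),
}


def model_for(stage: str) -> ModelSlot:
    """Look up the slot for a stage, failing loudly on unknown stages."""
    try:
        return ROUTES[stage]
    except KeyError:
        raise ValueError(f"no model configured for stage {stage!r}") from None


print(model_for("qa").model)  # gpt-4.1
```

Failing on an unknown stage, rather than falling back to a default model, keeps an unconfigured stage from silently sending documents to a provider you did not choose.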

§02 — In the wild

Private documents,
grounded answers.

Each example is reconstructed from public sample documents in a demo library — with the page numbers to prove it.

Medical bills · EOB + invoice · Reconciliation

Which medical bill still needs action, and what should I pay next?

Reconciles the insurance EOB, clinic invoice, and HSA record so the answer tells you what was covered, what remains, and which document is the source of truth.

Across billing & insurance records · 3 sources

Utility bills · Table extraction · Monthly trend

Summarize my PG&E bills by month and show usage in a table.

Pulls four separate monthly statements into one cross-document table — totals, kWh, therms — each row linked to the statement it came from.

Across four utility statements · 4 sources

Vendor renewal · Contract + policy · Cross-document

What should we know before renewing Acme Analytics?

Pulls renewal terms, expiring security review, procurement policy, and usage exports into a compact brief — flagging the open issues before sign-off.

Across contract, policy & usage docs · 4 sources

§03 — Workflow

Three steps. No babysitting.

Drop files into the watch folder and walk away. Paperwise does the rest — OCR, extraction, tagging, indexing — and lets you know when something needs your eyes.

step / 01

Drop in a stack.

Drag PDFs, scans, screenshots, and text dumps into the workspace. Or point Paperwise at a folder and let it watch.

Upload · Watch folder
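
A watch folder can be approximated with simple polling: list the directory, skip anything already seen, and hand the rest to the pipeline. The sketch below uses only the Python standard library. The `SUPPORTED` extension set and the `scan_new_files` helper are assumptions made for illustration, not a description of how Paperwise watches folders.

```python
from pathlib import Path

# Assumed set of extensions; the real list of supported types may differ.
SUPPORTED = {".pdf", ".png", ".jpg", ".jpeg", ".txt"}


def scan_new_files(folder: Path, seen: set[Path]) -> list[Path]:
    """Return supported files in `folder` not yet in `seen`, and record them."""
    new_files = []
    for path in sorted(folder.iterdir()):
        if not path.is_file():
            continue  # ignore subdirectories
        if path.suffix.lower() not in SUPPORTED:
            continue  # ignore file types the pipeline cannot read
        if path in seen:
            continue  # already handed off on an earlier pass
        seen.add(path)
        new_files.append(path)
    return new_files
```

Calling `scan_new_files` in a loop with a short sleep between passes gives a basic watcher. A production version would also need to wait until a file has finished copying before picking it up, which this sketch does not handle.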

step / 02

Let it work.

OCR runs automatically. Metadata is extracted. Documents are tagged, dated, and attached to the right correspondent. Re-indexable any time.

OCR · Extract · Tag

step / 03

Ask in plain English.

Search in natural language, ask cross-document questions, and get back answers with citations attached. Trace each claim to the page that proves it.

Query · Cite · Act

§04 — Self-host

Your documents
stay on your machine.

Clone the repo, run a single command, and you've got a private document intelligence service. No accounts. No upload limits. No outbound calls you didn't authorize.

License    MIT
Deploy     Docker Compose · localhost
Storage    Postgres + local object store
Queue      Redis + Celery worker
OCR        Local Tesseract or LLM OCR
Models     Self-hosted or API provider
Telemetry  None in the app