Self-hosted

Documents, structured and queryable.

Paperwise helps you OCR, extract, organize, and query documents on your own infrastructure. Run it locally or self-host it, and keep full control of your data.

[Screenshot: "Recent documents" panel showing 5 new today, with Name, Date, and Status columns, for example Invoice_Dec_invoice_I.pdf, March 12, 2025, parsed.]

From raw files to structured answers

Paperwise combines OCR, extraction, organization, and grounded Q&A in one self-hosted workflow.

OCR & extraction accuracy

OCR & Data Extraction

Go from a 50-page PDF to structured data in seconds. Paperwise can switch between local OCR and LLM-based OCR depending on document quality, helping with scans, dense layouts, and messier files.

Name                Date Received
Sonic Invoice       March 12, 2025
Annual Financials   September 1, 2024
Auto Rate           April 5, 2025
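
As a rough illustration of switching OCR backends by document quality, here is a minimal sketch in Python. The `PageStats` fields, the `choose_ocr_backend` function, and the 0.85 threshold are all assumptions made for this example, not Paperwise's actual logic or API.

```python
from dataclasses import dataclass


@dataclass
class PageStats:
    """Hypothetical per-page quality signals (not Paperwise's real schema)."""
    has_text_layer: bool    # the PDF page already carries embedded text
    mean_confidence: float  # 0.0-1.0 confidence from a first local OCR pass


def choose_ocr_backend(stats: PageStats, threshold: float = 0.85) -> str:
    """Return "local" for clean pages and "llm" for low-quality ones."""
    if stats.has_text_layer:
        # Embedded text is already machine-readable, so the cheap path is enough.
        return "local"
    # Assumed rule: fall back to LLM-based OCR when local confidence is low.
    return "local" if stats.mean_confidence >= threshold else "llm"
```

Under these assumptions, a clean scan stays on the local path and only a messy scan pays for the slower LLM pass.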

Flexible document categories

Smart Organization

Auto-tag documents by type, date, entity, and custom taxonomy. Build filterable views across your library and refine the structure over time.

Auto-tagged categories

Contract, Health, Invoice, Amendment, Tuition, Housing, Legal, Finance, Billing, Insurance
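
To make the idea of a filterable view concrete, here is a small Python sketch. The `Document` class and `filter_documents` function are hypothetical, and the tags attached to the sample documents are assigned only for illustration.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Document:
    """Hypothetical document record with auto-assigned tags."""
    name: str
    received: date
    tags: set[str] = field(default_factory=set)


def filter_documents(docs, required_tags=(), since=None):
    """Keep documents that carry every required tag and arrived on or after `since`."""
    required = set(required_tags)
    return [
        d for d in docs
        if required <= d.tags and (since is None or d.received >= since)
    ]


# Names and dates mirror the sample table; the tags are illustrative assumptions.
library = [
    Document("Sonic Invoice", date(2025, 3, 12), {"Invoice", "Billing"}),
    Document("Annual Financials", date(2024, 9, 1), {"Finance"}),
]
recent_invoices = filter_documents(library, {"Invoice"}, since=date(2025, 1, 1))
print([d.name for d in recent_invoices])  # ['Sonic Invoice']
```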

Source-backed answers

Grounded Q&A

Ask natural language questions and get answers grounded in your documents, with source quotes you can trace back to the original files.

"What is the billing cap in the Renfroe contract?"

The aggregate liability is capped at $1,000,000 as stated in Section 4, Paragraph 3 of the executed agreement.
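
One way to picture a source-backed answer is as text plus a list of quotes that can be checked against the original files. The Python sketch below assumes a hypothetical `Answer` and `Citation` shape and a simple verbatim-match check. It is not Paperwise's actual response format.

```python
from dataclasses import dataclass


@dataclass
class Citation:
    document: str  # file the quote is attributed to
    quote: str     # supporting text, expected to appear verbatim in that file


@dataclass
class Answer:
    text: str
    citations: list[Citation]


def _normalize(text: str) -> str:
    """Collapse whitespace and lowercase so line breaks do not defeat matching."""
    return " ".join(text.split()).lower()


def unsupported_citations(answer: Answer, sources: dict[str, str]) -> list[Citation]:
    """Return citations whose quote is empty or missing from the cited source text."""
    return [
        c for c in answer.citations
        if not _normalize(c.quote)
        or _normalize(c.quote) not in _normalize(sources.get(c.document, ""))
    ]
```

An empty result from `unsupported_citations` means every quote was found in the file it points to. Anything returned is a citation worth reviewing by hand.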

3 configurable model slots

Task-Specific Models

Configure different AI models for OCR, extraction, and Q&A. Use fast models for bulk processing and premium models for high-stakes analysis.

Example model configuration

OCR          gemini-2.5-flash   Fast
Extraction   gpt-4.1-mini       Balanced
Doc Q&A      gpt-4.1            Accurate

See it in action

Real queries. Real documents. Real answers with citations.

Invoices, Financial, Multi-doc query

Query

"Which domains am I paying for on Cloudflare?"

Returned a structured table of every domain, billing period, and cost across all invoices, with each row citing its source document.

Across 8 invoices

Billing, Time series, Summarization

Query

"How much did I spend on Sonic internet in each month?"

Extracted a 10-month billing history with amounts and service periods, plus a summary of the charge range. Every row cites its source invoice.

Across 10 monthly invoices
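
The aggregation behind a query like this can be sketched in a few lines of Python. The row shape (month, amount, source file) and both function names are assumptions for illustration. The sketch shows only the grouping and range arithmetic, not how Paperwise extracts the rows.

```python
from collections import defaultdict
from decimal import Decimal


def monthly_totals(rows):
    """Group (month "YYYY-MM", Decimal amount, source filename) rows by month.

    Each month keeps its total and the list of invoices that back it,
    so every figure stays traceable to a source document.
    """
    totals = defaultdict(lambda: {"amount": Decimal("0"), "sources": []})
    for month, amount, source in rows:
        totals[month]["amount"] += amount
        totals[month]["sources"].append(source)
    return dict(sorted(totals.items()))


def charge_range(totals):
    """Return the (lowest, highest) monthly total."""
    amounts = [entry["amount"] for entry in totals.values()]
    return min(amounts), max(amounts)
```

`Decimal` is used instead of `float` so currency amounts add up exactly.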

Policy docs, Financial analysis, Complex reasoning

Query

"List every time the Federal Reserve changed the target rate"

Identified three rate changes across policy statements and produced a timeline with old ranges, new ranges, and reserve balance rates.

Across 3 policy statements

Three steps. Less manual work.

Self-host Paperwise and turn raw files into structured, answerable knowledge.

Upload your files in batch

Drag and drop PDFs, scans, images, or text documents into your workspace.

AI processes & structures

OCR runs automatically. Metadata is extracted, and documents are organized.

Query & act

Use grounded AI to search across any set of documents in natural language.

Ready to self-host your document stack?

Clone the repo, run it locally or on your own infrastructure, and keep your document workflows under your control.

Open source and self-hosted