Use pdfmux inside Claude Code (and Claude Desktop, Cursor, any MCP agent)

TL;DR: Two-minute setup to give Claude Code, Claude Desktop, and Cursor any-PDF parsing via the pdfmux MCP server. Local, offline, no API keys.

Direct answer: pdfmux ships a native MCP server that gives Claude Code, Claude Desktop, Cursor, or any spec-compliant agent the ability to parse any local PDF — tables, scans, multilingual, anything — with confidence scores per page, no uploads, no API keys. Install with pip install pdfmux, add {"command": "pdfmux", "args": ["serve"]} to your agent’s MCP config, restart. The server exposes six tools: convert_pdf, analyze_pdf, extract_structured, extract_streaming, get_pdf_metadata, and batch_convert. Backed by a 7-extractor router (PyMuPDF, RapidOCR, Surya, Docling, Mistral OCR, Marker, Gemma 4) that scores 0.903 on opendataloader-bench — #2 overall, #1 among free tools.

If you use Claude Code, Claude Desktop, or Cursor and you’ve ever dragged a PDF into the chat window and waited for a lossy vision-model parse — there’s a faster, more accurate way.

pdfmux runs entirely on your machine, caches every result under a hash of the file so re-runs are near-instant, and tells the agent how confident it is on each page so the agent can flag uncertain regions instead of hallucinating through them. This guide covers the entire setup, start to finish, with the command surface as of pdfmux 1.6.3.

1. Install pdfmux

pip install pdfmux

That’s the baseline install: PyMuPDF, the router, and the MCP server. For scanned documents and the heavier backends, add the optional extras:

pip install "pdfmux[ocr]"     # RapidOCR + Surya
pip install "pdfmux[marker]"  # Marker neural extractor
pip install "pdfmux[llm-all]" # Mistral OCR + Gemma 4 + every LLM provider
pip install "pdfmux[all]"     # everything in one shot

You only need the extras for the engines you’ll actually use. The router gracefully skips anything that isn’t installed and tells you (via the structured error surface) what to pip install if a page needs a backend you don’t have.
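If you want to see up front which backends are importable in the environment your agent will launch, a small check like the one below can help. It is only a sketch: the module names in BACKEND_MODULES are assumptions based on the upstream packages' usual import names, not something taken from pdfmux, and pdfmux's own structured errors remain the authoritative signal.

```python
import importlib.util

# Assumed import names for each optional backend. These are guesses based on
# the upstream packages, not names confirmed by pdfmux.
BACKEND_MODULES = {
    "PyMuPDF": "fitz",
    "RapidOCR": "rapidocr_onnxruntime",
    "Surya": "surya",
    "Docling": "docling",
    "Marker": "marker",
}


def missing_backends(modules=BACKEND_MODULES):
    """Return the backend names whose top-level module cannot be found."""
    return [
        name
        for name, module in modules.items()
        if importlib.util.find_spec(module) is None
    ]


for backend in missing_backends():
    print(f"{backend}: not importable in this environment")
```

Run it with the same Python interpreter that owns the pdfmux command, otherwise the answer describes a different environment.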

2. Register the MCP server

The MCP config shape is identical across Claude Desktop, Claude Code, and Cursor — only the file path differs.

Claude Desktop

Open ~/Library/Application Support/Claude/claude_desktop_config.json on macOS (%APPDATA%\Claude\claude_desktop_config.json on Windows) and add:

{
  "mcpServers": {
    "pdfmux": {
      "command": "pdfmux",
      "args": ["serve"]
    }
  }
}

Restart Claude Desktop. You should see pdfmux under the plug icon in the composer.

Claude Code

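Claude Code reads project-level MCP config from a .mcp.json file in the project root (you can also register the server for your user with claude mcp add pdfmux -- pdfmux serve):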

{
  "mcpServers": {
    "pdfmux": {
      "command": "pdfmux",
      "args": ["serve"]
    }
  }
}

Then run claude and try: extract the tables from ~/Downloads/q3-report.pdf using pdfmux.

Cursor

Cursor reads from ~/.cursor/mcp.json with the same shape:

{
  "mcpServers": {
    "pdfmux": {
      "command": "pdfmux",
      "args": ["serve"]
    }
  }
}

Any other MCP client

If your agent implements the MCP spec (Continue, Cline, Zed’s assistant panel, your own client) it will pick up pdfmux from a stdio MCP config the same way. The server speaks standard MCP over stdio — no proprietary handshake.
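Hand-editing JSON is where most setup mistakes happen, so here is a minimal sketch of doing the registration from a script. It merges the pdfmux entry into whatever config already exists instead of overwriting it. The function names are made up for this example, and the config path is whatever your agent uses.

```python
import json
from pathlib import Path


def add_pdfmux_server(config: dict) -> dict:
    """Return a copy of an MCP config with the pdfmux entry added."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers["pdfmux"] = {"command": "pdfmux", "args": ["serve"]}
    merged["mcpServers"] = servers
    return merged


def register(config_path: Path) -> None:
    """Read config_path if it exists, add pdfmux, and write it back."""
    text = config_path.read_text() if config_path.exists() else ""
    existing = json.loads(text) if text.strip() else {}
    config_path.parent.mkdir(parents=True, exist_ok=True)
    updated = add_pdfmux_server(existing)
    config_path.write_text(json.dumps(updated, indent=2) + "\n")
```

For Cursor that would be register(Path.home() / ".cursor" / "mcp.json"). Other servers and top-level settings already in the file are left untouched.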

3. Verify it works

In any of these agents, ask:

List the MCP tools available to you.

You should see six pdfmux tools:

convert_pdf(path, quality, format) — full extraction, returns Markdown / JSON / LLM-ready chunks
analyze_pdf(path) — classifier-only triage (page count, layout signals, suggested plan, no extraction)
extract_structured(path, schema) — pulls structured fields (invoice, receipt, contract) into typed JSON
extract_streaming(path) — NDJSON page-by-page streaming for long documents and live UIs
get_pdf_metadata(path) — title, author, page count, encryption status, form fields
batch_convert(dir, profile) — recurse a directory, apply a saved profile (invoices, receipts, papers, contracts, bulk-rag)

Now try a real task:

Use pdfmux to extract the tables from ~/Downloads/q3-report.pdf and summarise the revenue section.

The agent calls pdfmux locally, gets back structured content with per-page confidence, and reasons over it. The PDF never leaves your machine.

Why this matters more than it looks

The agent gets structured context, not a text blob

Most PDF-to-chat workflows collapse the document to one flat string, losing page boundaries, tables, and heading structure. pdfmux returns sectioned, chunked, token-estimated context — exactly what an LLM needs to reason well. The --format llm output is pre-chunked with overlap, page metadata, and token estimates per chunk; the agent can decide how much to load instead of stuffing the whole document into context.
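To make "pre-chunked with overlap, page metadata, and token estimates" concrete, here is an illustrative sketch of overlap chunking. It is not pdfmux's chunker: the chunk size, the overlap, the field names, and the four-characters-per-token estimate are all assumptions chosen for the example.

```python
def chunk_page(text, page, max_chars=2000, overlap=200):
    """Split one page's text into overlapping chunks with rough token counts."""
    if not 0 <= overlap < max_chars:
        raise ValueError("overlap must be non-negative and smaller than max_chars")
    chunks = []
    start = 0
    step = max_chars - overlap
    while start < len(text):
        body = text[start:start + max_chars]
        chunks.append({
            "page": page,
            "text": body,
            # Crude heuristic (about 4 characters per token), an assumption.
            "est_tokens": max(1, len(body) // 4),
        })
        if start + max_chars >= len(text):
            break  # this chunk already reached the end of the page
        start += step
    return chunks
```

Each chunk repeats the last overlap characters of the previous one, so a sentence that straddles a boundary is still readable in at least one chunk.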

You see confidence per page

When pdfmux returns {"page": 7, "confidence": 0.62, "text": "..."}, your agent can flag that page as uncertain instead of confidently hallucinating. This is the difference between “Claude answered” and “Claude answered correctly”. Confidence is computed per-page from extractor self-reports (PyMuPDF char-recovery rate, OCR engine confidence, neural-extractor logprobs) and audited against a separate verifier — the self-healing extraction loop re-routes low-confidence pages to a stronger backend before returning.
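On the consuming side, using that score takes very little code. The sketch below assumes the result is a list of page dictionaries shaped like the example above (page, confidence, text). The 0.7 threshold and the function name are arbitrary choices for illustration.

```python
def split_by_confidence(pages, threshold=0.7):
    """Separate pages into (trusted, flagged) by their confidence score."""
    trusted, flagged = [], []
    for page in pages:
        # A page with no score is treated as untrusted.
        if page.get("confidence", 0.0) >= threshold:
            trusted.append(page)
        else:
            flagged.append(page)
    return trusted, flagged


pages = [
    {"page": 6, "confidence": 0.97, "text": "..."},
    {"page": 7, "confidence": 0.62, "text": "..."},
]
trusted, flagged = split_by_confidence(pages)
print("needs review:", [p["page"] for p in flagged])
```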

Failed pages get auto-retried with a stronger backend

The router picks from seven extractors — PyMuPDF, Docling, OpenDataLoader, RapidOCR, Surya, Mistral OCR, Marker, plus Gemma 4 as a vision LLM — and routes each page to the one most likely to succeed. Failures get re-extracted with a stronger backend, optionally including a BYOK LLM (your Gemini / Claude / GPT-4o key) as the last-resort fallback. The agent only sees the final, best result, along with a per-page recovered: true flag if a fallback fired.
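The escalation idea can be sketched generically. This is not pdfmux's router, which also classifies pages before choosing a backend. It only shows the loop of trying backends in order until one clears a confidence bar. The extractor callables are hypothetical stand-ins, and recovered here means the returned result came from a backend other than the first.

```python
from typing import Callable, Optional, Sequence, Tuple

# A hypothetical extractor: takes page bytes, returns (text, confidence).
Extractor = Callable[[bytes], Tuple[str, float]]


def extract_with_fallback(
    page: bytes,
    extractors: Sequence[Tuple[str, Extractor]],
    min_confidence: float = 0.7,
) -> Optional[dict]:
    """Try extractors in order, keeping the most confident result seen."""
    best = None
    for index, (name, extract) in enumerate(extractors):
        try:
            text, confidence = extract(page)
        except Exception:
            continue  # a crashed backend counts as a failed attempt
        if best is None or confidence > best["confidence"]:
            best = {
                "backend": name,
                "text": text,
                "confidence": confidence,
                "recovered": index > 0,
            }
        if confidence >= min_confidence:
            break  # good enough, skip the more expensive backends
    return best
```

Ordering the list from cheapest to most expensive means the costly backends only run on pages that actually need them.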

On the opendataloader-bench 200-PDF dataset, this routing strategy scored 0.903 overall — ranking #2 across all libraries (paid and free) and #1 among free tools, with 0.920 on reading order and 0.911 on table accuracy (TEDS). The paid leader (the opendataloader-hybrid engine) scored 0.909. Numbers are from a benchmark re-run on April 22, 2026 against the then-current versions.

It’s local-first

No uploads. No API keys (unless you opt into Mistral or a BYOK LLM as fallback). No per-page bill. For sensitive documents — legal, medical, financial, case files — this is the entire reason to use it.

Re-runs are essentially free

Every extraction is keyed by (file_hash, quality, format, schema) and cached at ~/.cache/pdfmux/results/. A 600-page prospectus that took 14 seconds the first time comes back in 0.05 seconds on the second run — the same SHA-256 hash hits the cache regardless of which agent or which tool call triggered it. For an MCP-driven workflow where the agent might call convert_pdf, then extract_structured, then analyze_pdf on the same document in a single conversation, this turns a 30-second sequence into one.
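To illustrate why the cache is independent of which agent asked, here is a sketch of a content-addressed key built from the same four parts. It is an illustration of the idea, not pdfmux's implementation: the function names and the way the parts are combined into a final key are assumptions.

```python
import hashlib
import json


def file_sha256(path, chunk_size=1 << 20):
    """Hash a file's bytes in chunks so large PDFs are not loaded at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def cache_key(file_hash, quality, fmt, schema=None):
    """Combine (file_hash, quality, format, schema) into one stable key."""
    payload = json.dumps([file_hash, quality, fmt, schema])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Because the key depends on the file's bytes and not its path, renaming or moving a PDF still hits the cache, while changing quality, format, or schema produces a different entry.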

Common follow-ups

“What about scanned PDFs?” pip install "pdfmux[ocr]" gets you RapidOCR and Surya. pdfmux picks the right engine per page based on layout and language signals. For Arabic and other RTL languages, pip install "pdfmux[llm-all]" adds Gemma 4 27B, which has native Arabic vision OCR.

“I want to use my own LLM as fallback.” Drop a 5-line pdfmux.yaml in your project:

llm:
  provider: anthropic
  model: claude-sonnet-4-6
  api_key_env: ANTHROPIC_API_KEY
  trigger_below_confidence: 0.7

pdfmux will only call your LLM when the confidence score says a page is unreliable. You can also pass --llm-provider gemini (or mistral, gemma, gpt-4o, ollama) on the CLI to override per-run.

“How do I cap my agent’s spend?” Two ways. First, pdfmux estimate <path> runs the classifier and the cost model without doing any extraction — it tells you exactly what each plan will cost before you commit. Second, the --budget flag (and the PDFMUX_BUDGET_USD env var) enforces a hard cap that aborts mid-run instead of overspending. For agent workflows we recommend setting a per-session budget on the MCP server’s environment so a runaway agent can’t bill more than you’ve authorized.
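One way to set that per-session budget is through the env field that MCP client configs accept on a server entry. The sketch below builds such an entry. PDFMUX_BUDGET_USD comes from the paragraph above, but formatting the value as a two-decimal string is an assumption, as is the helper's name.

```python
import json


def pdfmux_entry_with_budget(budget_usd: float) -> dict:
    """Build an MCP server entry that passes a spend cap to pdfmux."""
    if budget_usd <= 0:
        raise ValueError("budget must be positive")
    return {
        "command": "pdfmux",
        "args": ["serve"],
        # Environment values must be strings; the format is an assumption.
        "env": {"PDFMUX_BUDGET_USD": f"{budget_usd:.2f}"},
    }


config = {"mcpServers": {"pdfmux": pdfmux_entry_with_budget(5)}}
print(json.dumps(config, indent=2))
```

The printed JSON can be pasted into the same config file used in step 2.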

“Does this work with any MCP client, not just Anthropic’s?” Yes. pdfmux implements the standard MCP protocol over stdio. Any spec-compliant client works — Continue, Cline, Zed, your own.

“My agent extracted a 400-PDF batch and silently lost 16 documents. How do I prevent that?” This is real and it happened to us. The fix landed in 1.6.1: pdfmux batch_extract --strict --min-confidence 0.7 fails loud the moment any document falls below threshold, writes a manifest.json per batch with per-file status, and exposes pdfmux doctor --check <dir> to audit results after the fact. The full retro is here — if you’re putting pdfmux behind an agent that processes user-uploaded batches, read it before shipping.

“Can the MCP server emit results as the agent watches?” Yes — that’s what extract_streaming is for. For a 600-page document the agent can render page 1 while page 600 is still extracting. NDJSON events include classified, page (with confidence and char count), warning, and complete. Live UIs (and live agent reasoning) feel dramatically more responsive than waiting for the full doc to settle.
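If you consume the stream yourself, NDJSON is just one JSON object per line. The sketch below summarizes a stream as it arrives. The event types are the ones listed above, but the field names (event, page, confidence, message) are assumptions, so check them against real output before relying on this.

```python
import json


def read_events(lines):
    """Yield one parsed event per non-empty NDJSON line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def summarize(lines, threshold=0.7):
    """Collect low-confidence pages and warnings from an event stream."""
    low, warnings, done = [], [], False
    for event in read_events(lines):
        kind = event.get("event")  # assumed name of the type field
        if kind == "page":
            if event.get("confidence", 0.0) < threshold:
                low.append(event.get("page"))
        elif kind == "warning":
            warnings.append(event.get("message", ""))
        elif kind == "complete":
            done = True
    return {"low_confidence_pages": low, "warnings": warnings, "complete": done}
```

Since read_events is a generator, it works equally well on a list of strings, an open file, or a subprocess's stdout.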

A realistic Claude Code session

Here’s what a working session looks like once pdfmux is wired in. The conversation is verbatim from a recent Claude Code task on a stack of Arabic / English bilingual logistics documents:

You: I dropped 12 bills of lading into ~/Downloads/cargo/. Use pdfmux to extract the consignee, vessel, and total weight from each one into a CSV.

Claude: I’ll call batch_convert with the invoices profile, then walk each result and pull the three fields. Running now… (Claude calls analyze_pdf first on one document to confirm the schema fits, then batch_convert with quality=high and schema=invoice, then writes a CSV.)

Claude: Done. 11 of 12 extracted with confidence ≥ 0.9. Document cargo-7.pdf came back at 0.62 — page 2 is a scanned signature block and the consignee field is blurred. I’ve flagged it in the CSV and saved the recovered text for your review.

That last paragraph is the whole point. Without confidence scoring, the agent would have silently emitted whatever it guessed for cargo-7.pdf. With it, the agent knows which row to flag and the human knows where to look.

Get started

pip install pdfmux
pdfmux serve

Then add the MCP config block above to whichever agent you use, restart, and ask it to list its tools. If you don’t see the six pdfmux tools, the most common causes are: forgetting to restart the agent after editing the config, having two mcpServers blocks in the JSON (only one is read), or the pdfmux binary not being on the agent’s PATH — fixable by setting "command": "/full/path/to/pdfmux" instead of just "pdfmux".
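For the PATH problem, you can look up the absolute path from a shell where pdfmux already works and use that in the config. A minimal sketch using the standard library's shutil.which follows. The helper name is made up for this example.

```python
import shutil


def resolved_entry(binary="pdfmux", which=shutil.which):
    """Build an MCP server entry that uses the binary's absolute path."""
    path = which(binary)
    if path is None:
        raise FileNotFoundError(f"{binary} is not on PATH in this environment")
    return {"command": path, "args": ["serve"]}
```

Call resolved_entry() from the virtualenv or shell where you installed pdfmux and copy the resulting command value into your agent's config.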

Repo: github.com/NameetP/pdfmux
Full docs: pdfmux.com
Release notes: pdfmux 1.6: three new backends, smart cache, streaming, watch mode

If you hit anything weird, open an issue — feedback from agent builders is what’s shaping the 1.7 roadmap.