Context Pool
Open source · Self-hosted · No vector DB

Document Q&A without
embeddings or guesswork

Context Pool exhaustively scans every chunk of every document, pools positive hits, and synthesizes a final answer with verbatim citations — all running on your own infrastructure.

Get started → · See how it's different
terminal
# 3 commands to get started
$ git clone https://github.com/steve958/Context-Pool.git
$ cp config.example.yaml config/config.yaml
$ docker-compose -f docker-compose.hub.yml up
✓ backend ready http://localhost:8000
✓ frontend ready http://localhost:3000
4 LLM providers · 8 file formats · 0 vector DBs needed · 100% self-hosted
🔒Self-hosted — your data never leaves your infra
No vector DB required
📄MIT Licensed — open source forever
The problem with prefiltering

Why not vector RAG?

Vector RAG prefilters chunks by similarity score before your LLM ever sees them. If the relevant passage scores low, it's silently dropped. Context Pool never prefilters — it reads every chunk.

Scenario: Legal contract — buried indemnification clause
Standard Vector RAG
Query
"Does this contract limit our liability for data breaches?"
Similarity prefilter
Embedding model scores all 47 chunks. Top-5 retrieved by cosine similarity.
📄Retrieved chunks (top-5)
0.91 · §3.1 — Services overview and delivery timeline
0.88 · §7.2 — Payment terms and invoice schedule
0.85 · §12.4 — Governing law and jurisdiction
0.82 · §2.1 — Scope of engagement and deliverables
0.79 · §9.1 — Confidentiality obligations
⚠️ Missed (score: 0.41)
§18.3 — Liability cap: In no event shall either party be liable for indirect, incidental, or consequential damages arising from data loss or security breaches, including but not limited to…
Wrong — key clause missed
Based on the retrieved sections, the contract does not appear to contain explicit liability limits for data breaches.
Context Pool
Query
"Does this contract limit our liability for data breaches?"
🔍Exhaustive scan
Reads all 47 chunks sequentially. No prefiltering. No chunk is skipped.
Positive hit found
§18.3 — Liability cap: In no event shall either party be liable for indirect, incidental, or consequential damages arising from data loss or security breaches, including but not limited to…
🗂️Pooled with 2 other hits
§18.3, §19.1 (force majeure carve-out), §21.2 (mutual indemnification) — synthesized together.
Correct — with verbatim citation
Yes. §18.3 explicitly caps liability for data breach damages. Confirmed by cross-reference in §19.1 and §21.2.
§18.3 — “In no event shall either party be liable for indirect, incidental, or consequential damages arising from data loss or security breaches…”
💡
The tradeoff is deliberate
Context Pool is slower than vector RAG because it reads every chunk. In domains where missing a single passage is unacceptable — legal, compliance, finance, medical — that slowness is the point. You get exhaustive recall, not probabilistic retrieval.
Reproducible results

Benchmarks

We measured Context Pool against vector RAG baselines on a synthetic legal contract dataset. The results confirm what the architecture predicts: exhaustive scanning finds answers that similarity prefiltering misses.

📊Recall Benchmark Results
Method | Recall | Chunks examined | Est. tokens
Context Pool (exhaustive) | 100% | 19 / 19 | ~116K
Vector RAG (top-5) | 70% | 5 / 19 | ~10K
100% Recall
Context Pool examines every chunk. By design, it cannot miss an answer that exists in the document.
Prefiltering Risk
Vector RAG missed 3 of 10 answers due to keyword mismatches and similarity scoring thresholds.
The Tradeoff
Speed vs. certainty. Vector RAG is faster and cheaper. Context Pool is exhaustive.
Run the benchmark yourself on your own documents.
View Full Report
What's New

New in Context Pool

v1.3.0 · March 2026

Stay up to date with the latest features and improvements. Every release makes document analysis more powerful.

💾

Query History & Persistence

Every query you run is now automatically saved to disk. Review past questions, compare results over time, and re-run with a single click.

  • Automatic persistence with gzip compression (~80% savings)
  • Browse complete query history per workspace
  • Re-run any historical query against current documents
  • Full detail view with citations and token usage
Architecture

How Context Pool works

Four deterministic phases. No semantic shortcuts. Every document, every chunk, every time.

STEP 01

Parse

Each file is converted to clean Markdown — PDF text layers, DOCX headings, HTML content, EML bodies and attachments, or OCR for scanned images.

PyMuPDF · python-docx · BeautifulSoup · OCR.space
STEP 02

Chunk

Markdown is split into token-bounded segments that respect heading boundaries and page markers. Chunk size is fully configurable.

Heading-aware · Token-windowed · Page-marker preserved
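The chunking step can be sketched in a few lines. This is a minimal illustration only: it uses whitespace word counts as a stand-in for real token counting, and the function name is hypothetical, not Context Pool's actual API:

```python
# Minimal sketch of heading-aware, token-bounded chunking.
# Whitespace split stands in for a real tokenizer; names are illustrative.

def chunk_markdown(markdown: str, max_tokens: int = 200) -> list[str]:
    """Split Markdown into token-bounded chunks, starting a fresh
    chunk at every heading so heading boundaries are respected."""
    chunks: list[str] = []
    current: list[str] = []
    count = 0
    for line in markdown.splitlines():
        tokens = len(line.split())          # crude token estimate
        starts_section = line.startswith("#")
        if current and (starts_section or count + tokens > max_tokens):
            chunks.append("\n".join(current))  # flush the finished chunk
            current, count = [], 0
        current.append(line)
        count += tokens
    if current:
        chunks.append("\n".join(current))
    return chunks

doc = "# Intro\nSome text here.\n# Liability\nThe cap applies."
for c in chunk_markdown(doc, max_tokens=50):
    print("---")
    print(c)
```

The real implementation additionally preserves page markers and supports overlap; the core loop shape is the same.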
STEP 03

Scan

Every chunk is sent to the LLM with a strict extractive prompt. Positive hits are pooled; empty chunks are discarded. No skipping, no shortcuts.

{"has_answer": true, "evidence_quotes": ["..."]}
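The scan loop can be sketched as follows. `ask_llm` here is a stub standing in for a real provider call, and the keyword check inside it only simulates a model's judgment; the JSON shape matches the extractive prompt output above:

```python
# Sketch of the exhaustive per-chunk scan. ask_llm is a stubbed
# stand-in for a real provider call; names are illustrative.
import json

def ask_llm(question: str, chunk: str) -> str:
    # Stub: pretend the model finds evidence only in chunks that
    # mention liability. A real deployment calls OpenAI/Anthropic/etc.
    hit = "liability" in chunk.lower()
    quotes = ["In no event shall either party be liable..."] if hit else []
    return json.dumps({"has_answer": hit, "evidence_quotes": quotes})

def scan(chunks: list[str], question: str) -> list[dict]:
    """Send EVERY chunk to the LLM; pool positive hits, drop the rest."""
    pool = []
    for i, chunk in enumerate(chunks):      # no prefiltering, no skipping
        reply = json.loads(ask_llm(question, chunk))
        if reply["has_answer"]:
            pool.append({"chunk_index": i, "quotes": reply["evidence_quotes"]})
    return pool

chunks = [
    "§7.2 Payment terms and invoice schedule...",
    "§18.3 Liability cap: In no event shall either party be liable...",
    "§12.4 Governing law and jurisdiction...",
]
hits = scan(chunks, "Does this contract limit our liability?")
print(hits)
```

Note that the loop touches every index: recall is a property of control flow, not of a similarity threshold.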
STEP 04

Synthesize

All pooled hits are sent to the LLM in a single synthesis call. The result is a final answer with full citations: document, page, heading, verbatim quote.

{"final_answer": "...", "citations": [...]}
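The synthesis step can be sketched the same way: all pooled evidence goes into one prompt, and the answer comes back with structured citations. The LLM call is stubbed below and the field names on the hit dicts are assumptions for illustration; only the output shape mirrors the documented format:

```python
# Sketch of the single synthesis call over the pooled evidence.
# The provider call is stubbed; hit field names are illustrative.
import json

def synthesize(question: str, pooled_hits: list[dict]) -> dict:
    """Build one prompt from all pooled quotes, return answer + citations."""
    evidence = "\n".join(
        f'[{h["doc"]} p.{h["page"]} {h["heading"]}] "{h["quote"]}"'
        for h in pooled_hits
    )
    prompt = f"Question: {question}\nEvidence:\n{evidence}"
    # Stubbed response in place of a real provider call:
    return {
        "final_answer": f"Answer synthesized from {len(pooled_hits)} quote(s).",
        "citations": [
            {"document": h["doc"], "page": h["page"],
             "heading": h["heading"], "quote": h["quote"]}
            for h in pooled_hits
        ],
    }

hits = [{"doc": "msa.pdf", "page": 18, "heading": "§18.3 Liability cap",
         "quote": "In no event shall either party be liable..."}]
result = synthesize("Does this contract limit our liability?", hits)
print(json.dumps(result, indent=2))
```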
Deployment Flow
📄
Documents
PDF
DOCX
EML
HTML
Images
⚙️
Parse
Text extraction
OCR
Normalization
🧩
Chunk
Heading-aware split
Token windows
🔍
Scan
LLM per chunk
Hit detection
Pool building
📝
Synthesize
Evidence pooling
Cited answer
💡The key difference: Every chunk is checked individually. No semantic prefiltering. The LLM sees every segment of the document before synthesizing the final answer.
🔍
Exhaustive by design
Unlike vector-search RAG, Context Pool never prefilters chunks. Every segment of every document is evaluated against your question. If the answer exists somewhere in your documents, Context Pool will find it — even when the vocabulary in the question differs from the document.
Capabilities

Everything you need

Batteries included. From OCR to citations to a production-ready Docker setup.

🔎
Exhaustive scanning
Every chunk of every document is evaluated. No prefiltering, no semantic shortcuts, no missed passages.
📌
Verbatim citations
Every claim is backed by an exact quote from the source, with document name, page number, and heading path.
🏠
Fully self-hosted
Run on your own machine or server. Documents stay in your Docker volume. Your infrastructure, your data.
🔌
4 LLM providers
OpenAI, Anthropic, Google Gemini, and Ollama. Switch without changing code — just update config.yaml.
📄
8 file formats
PDF (text + scanned), DOCX, TXT, Markdown, HTML, EML (with attachments), PNG, and JPEG.
👁
OCR built in
Scanned PDFs and images are processed via OCR.space. Toggle per query — no permanent setup needed.
📧
Email-aware parsing
.eml files are parsed intelligently: body, attachments, or both — individually chunked and cited.
Real-time progress
WebSocket events stream chunk-by-chunk progress to the UI as the scan runs. No polling required.
🧩
REST + WebSocket API
Every feature is available programmatically. The UI is just a client. Build your own integration.
🗂
Workspaces
Organize documents into named workspaces. Query a single document or the entire workspace at once.
🎛
Configurable chunking
Control chunk size, overlap strategy, and token limits. Tune the accuracy vs. cost trade-off for your use case.
🔐
Production security
API key auth middleware, CORS env config, non-root Docker user, file MIME validation, and input bounds checking.
LLM Providers

Your model, your choice

Switch providers by changing one line in config.yaml. No code changes needed.

OpenAI · Recommended
gpt-4o · gpt-4o-mini · gpt-4-turbo
provider: openai
api_key: "ENV:OPENAI_API_KEY"
model: "gpt-4o-mini"
context_window_tokens: 128000
max_chunk_tokens: 24000
💡 gpt-4o-mini is the best cost/quality starting point.
Anthropic · Best reasoning
claude-3-5-sonnet · claude-3-5-haiku · claude-3-opus
provider: anthropic
api_key: "ENV:ANTHROPIC_API_KEY"
model: "claude-3-5-haiku-20241022"
context_window_tokens: 200000
max_chunk_tokens: 32000
💡 200K context window means fewer, larger chunks.
Google Gemini · Largest context
gemini-2.0-flash · gemini-1.5-pro · gemini-1.5-flash
provider: google
api_key: "ENV:GOOGLE_API_KEY"
model: "gemini-2.0-flash"
context_window_tokens: 1000000
max_chunk_tokens: 48000
💡 1M context window. Very large chunk sizes possible.
Ollama · 100% offline
llama3.2 · mistral · phi3 · deepseek-r1
provider: ollama
api_key: ""
model: "llama3.2"
context_window_tokens: 8192
max_chunk_tokens: 3000
ollama_base_url: "http://host.docker.internal:11434"
💡 Nothing leaves your machine. Requires Ollama running locally.
Installation

Up and running in minutes

Docker Compose is the fastest path. Local dev and API-only modes also supported.

1Clone the repo
git clone https://github.com/steve958/Context-Pool.git
cd Context-Pool
2Create config
mkdir -p config
cp config.example.yaml config/config.yaml
# Edit config/config.yaml — set provider + model
3Set your API key
# Create .env at the project root
echo "OPENAI_API_KEY=sk-proj-..." > .env

# Optional: enable API authentication
echo "API_KEY=your-secret-here" >> .env
4Start (pulls pre-built images — no build needed)
docker-compose -f docker-compose.hub.yml up

# UI  → http://localhost:3000
# API → http://localhost:8000/docs
REST API

First-class programmatic access

Every feature available in the UI is accessible via REST API and WebSocket. Build your own integrations.

WS /ws/query/{run_id}
Real-time events: chunk_progress · synthesis_started · synthesis_finished · error
Request
{
  "name": "Q3 Contracts"
}
Response
{
  "ws_id": "550e8400-e29b-41d4-a716-446655440000",
  "name": "Q3 Contracts",
  "document_count": 0
}
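As a sketch, creating a workspace from Python might look like the following. The `/workspaces` path and the `X-API-Key` header name are assumptions inferred from the request/response shapes above; check the live docs at `http://localhost:8000/docs` for the actual routes:

```python
# Sketch of calling a workspace-creation endpoint from Python.
# The "/workspaces" route and "X-API-Key" header are assumed,
# not confirmed; verify against http://localhost:8000/docs.
import json
import urllib.request

BASE = "http://localhost:8000"

def build_create_workspace(name: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the POST request for a new workspace."""
    body = json.dumps({"name": name}).encode()
    return urllib.request.Request(
        f"{BASE}/workspaces",                  # assumed route
        data=body,
        headers={"Content-Type": "application/json",
                 "X-API-Key": api_key},        # assumed header name
        method="POST",
    )

req = build_create_workspace("Q3 Contracts", "your-secret-here")
print(req.method, req.full_url)
# Send with urllib.request.urlopen(req) once the stack is running.
```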
Use cases

Built for high-stakes document work

Wherever missing a relevant passage is not an option, exhaustive scanning pays off.

⚖️Legal

Contract review

QUESTION
"What does each contract say about termination clauses and notice periods?"
RESULT
Found 7 relevant clauses across 12 contracts. Page and heading citations included.
🔬Research

Literature review

QUESTION
"Which papers discuss transformer attention mechanisms in the context of long documents?"
RESULT
Extracted relevant passages from 34 PDFs, cited by author, section, and page.
📊Finance

Due diligence

QUESTION
"Are there any contingent liabilities or pending litigation mentioned in the disclosure documents?"
RESULT
3 disclosures flagged. Verbatim evidence quotes with page references.
📧Discovery

Email archive search

QUESTION
"Find all emails that discuss the merger timeline and list the mentioned dates."
RESULT
Scanned 240 .eml files including attachments. 18 positive hits extracted.
🏥Healthcare

Clinical document review

QUESTION
"What contraindications are mentioned for Drug X across all patient records?"
RESULT
Scanned PDFs processed via OCR. 9 contraindications found across 15 records.
🛠Engineering

Technical spec analysis

QUESTION
"What are the stated load-bearing limits in each structural report?"
RESULT
Extracted 22 numeric values with units, pages, and table headings cited.
FAQ

Common questions