Your AI is only as good
as its input
Upload any document. Get back structured, labeled, deduplicated data — ready for any LLM workflow. One API call.
From 100K+ tokens to ~8K
Every metric preserved. Every table structured. Every risk factor labeled.
Raw Document
Before processing
- 65 pages, 100K+ tokens
- Unstructured text dump
- Table structure lost in extraction
- No semantic labels or categories
- Exceeds LLM context windows
Structured Output
After pretreatment
{
"document_type": "10-K",
"company_name": "Apple Inc.",
"skill": "financial",
"metrics": [
{ "label": "Revenue",
"value": "$416,161M",
"period": "FY2025" }
],
"condensed_text": "~8K tokens..."
}
Pipeline
Seven steps, one API call
Ingest, structure, return — runtime depends on document size and complexity.
Upload
PDF, DOCX, or XLSX
Extract
Text & tables preserved
Chunk
Section-aware splits
Classify
Route by type
Analyze
Metrics, facts & risks
Merge
Deduplicate & reconcile
Condense
Structured output
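Conceptually, the seven steps can be read as a left-to-right fold over the document: each stage takes the document state and returns an enriched copy. The sketch below is illustrative only — the stage functions are placeholders, not the engine's implementation — but it shows the shape of the pipeline.

```javascript
// Illustrative sketch: each stage is a pure function from document state
// to document state, so the whole pipeline is a reduce over the stages.
const extract  = (d) => ({ ...d, text: `${d.file}:text`, tables: [] });
const chunk    = (d) => ({ ...d, chunks: [d.text] });            // section-aware in the real engine
const classify = (d) => ({ ...d, skill: d.skill ?? "generic" }); // auto-routing fallback
const analyze  = (d) => ({ ...d, metrics: [], facts: [], risks: [] });
const merge    = (d) => ({ ...d, deduped: true });
const condense = (d) => ({ ...d, condensed_text: d.chunks.join("\n") });

const stages = [extract, chunk, classify, analyze, merge, condense];
const run = (file, skill) =>
  stages.reduce((doc, stage) => stage(doc), { file, skill });

const out = run("10k.pdf", "financial");
// out.skill is "financial"; out.condensed_text holds the condensed body
```

Because every stage shares the same signature, the real service can reorder, parallelize, or skip stages per document type without changing the external contract.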
Skills
Domain-specific intelligence
Each skill teaches the engine what to extract and how to structure it for your domain.
Auto
Let the engine classify the document and choose the best skill — financial, legal, sales, or generic — so you do not have to pick one up front.
Financial Analysis
10-K filings, annual reports, earnings. Revenue, margins, ratios, and risk factors.
Legal Contracts
Contracts, NDAs, leases. Clauses, obligations, parties, and key dates.
Real Estate
Appraisals, leases, property reports. Valuations, lease terms, cap rates.
Sales & Growth
Sales reports, QBRs, board decks. ARR/MRR, pipeline, conversion rates.
Generic Document
Any document type. Key facts, entities, dates, numbers, and summary.
Integration
Built for Crucible
Every document processed through Crucible is powered by Pretreatment. Raw files stay ephemeral; structured output is stored in the cloud so you can pick up where you left off.
API
Plug the engine into your product
Your app sends documents to our service over HTTPS. Single files go to /extract; multiple files in one request go to /extract/batch. Pass an optional skill field (note: skill, not skill_id) so we know how to label the output. Include your API key on every call, and we return JSON your product can store or pipe into any model.
// Pretreatment API
const base = "https://api.pretreatment.io";
const apiKey = "pt_live_YOUR_KEY_HERE";
const auth = { Authorization: `Bearer ${apiKey}` };
// One file → POST /extract (multipart: file + optional skill)
const one = new FormData();
one.append("file", pdfFile);
// one.append("skill", "financial"); // optional: auto, financial, legal, …
const single = await fetch(`${base}/extract`, {
method: "POST",
headers: auth,
body: one,
});
const doc = await single.json();
// doc.condensed_text, doc.metrics, …
// Several files at once → POST /extract/batch (same fields, repeat file)
const batch = new FormData();
batch.append("file", fileA);
batch.append("file", fileB);
batch.append("skill", "financial");
const multi = await fetch(`${base}/extract/batch`, {
method: "POST",
headers: auth,
body: batch,
});
const payload = await multi.json();
// payload.documents (or equivalent) — see API reference
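One common downstream use is turning the response into a compact LLM prompt. The field names below (`document_type`, `company_name`, `metrics`, `condensed_text`) match the sample response earlier on this page; the prompt layout itself is just one reasonable choice, not a prescribed format.

```javascript
// Build a compact prompt from a Pretreatment response. Field names match
// the sample output above; the prompt structure is our own convention.
function buildPrompt(doc, question) {
  const metrics = (doc.metrics ?? [])
    .map((m) => `${m.label}: ${m.value} (${m.period})`)
    .join("\n");
  return [
    `Document type: ${doc.document_type}`,
    `Company: ${doc.company_name}`,
    metrics && `Key metrics:\n${metrics}`,
    `Context:\n${doc.condensed_text}`,
    `Question: ${question}`,
  ]
    .filter(Boolean) // drop empty sections, e.g. when no metrics exist
    .join("\n\n");
}
```

Because the condensed text is already ~8K tokens, the assembled prompt fits comfortably in any mainstream model's context window.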