Your AI is only as good
as its input
Upload any document. Get back structured, labeled, deduplicated data — ready for any LLM workflow. One API call.
From 100K+ tokens to ~8K
Every metric preserved. Every table structured. Every risk factor labeled.
Raw Document
Before processing
- 65 pages, 100K+ tokens
- Unstructured text dump
- Table structure lost in extraction
- No semantic labels or categories
- Exceeds LLM context windows
Structured Output
After pretreatment
{
"document_type": "10-K",
"company_name": "Apple Inc.",
"skill": "financial",
"metrics": [
{ "label": "Revenue",
"value": "$416,161M",
"period": "FY2025" }
],
"condensed_text": "~8K tokens..."
}
Pipeline
Seven steps, one API call
Ingest, structure, return — runtime depends on document size and complexity.
Upload
PDF, DOCX, or XLSX
Extract
Text & tables preserved
Chunk
Section-aware splits
Classify
Route by type
Analyze
Metrics, facts & risks
Merge
Deduplicate & reconcile
Condense
Structured output
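Conceptually, the seven steps can be read as a left-to-right fold over the document: each stage takes the document state and returns an enriched copy. The sketch below is illustrative only — the stage functions are placeholders, not the engine's implementation — but it shows the shape of the pipeline.

```javascript
// Illustrative sketch: each stage is a pure function from document state
// to document state, so the whole pipeline is a reduce over the stages.
const extract  = (d) => ({ ...d, text: `${d.file}:text`, tables: [] });
const chunk    = (d) => ({ ...d, chunks: [d.text] });            // section-aware in the real engine
const classify = (d) => ({ ...d, skill: d.skill ?? "generic" }); // auto-routing fallback
const analyze  = (d) => ({ ...d, metrics: [], facts: [], risks: [] });
const merge    = (d) => ({ ...d, deduped: true });
const condense = (d) => ({ ...d, condensed_text: d.chunks.join("\n") });

const stages = [extract, chunk, classify, analyze, merge, condense];
const run = (file, skill) =>
  stages.reduce((doc, stage) => stage(doc), { file, skill });

const out = run("10k.pdf", "financial");
// out.skill is "financial"; out.condensed_text holds the condensed body
```

Because every stage shares the same signature, the real service can reorder, parallelize, or skip stages per document type without changing the external contract.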
Skills
Domain-specific intelligence
Each skill teaches the engine what to extract and how to structure it for your domain.
Auto
Let the engine classify the document and choose the best skill — financial, legal, sales, or generic — so you do not have to pick one up front.
Financial Analysis
10-K filings, annual reports, earnings. Revenue, margins, ratios, and risk factors.
Legal Contracts
Contracts, NDAs, leases. Clauses, obligations, parties, and key dates.
Real Estate
Appraisals, leases, property reports. Valuations, lease terms, cap rates.
Sales & Growth
Sales reports, QBRs, board decks. ARR/MRR, pipeline, conversion rates.
Generic Document
Any document type. Key facts, entities, dates, numbers, and summary.
Integration
Built for Crucible
Every document processed through Crucible is powered by Pretreatment. Raw files stay ephemeral; structured output is stored in the cloud so you can pick up where you left off.
API
Plug the engine into your product
Your app sends documents to our service over HTTPS. Single files go to /extract; multiple files in one request go to /extract/batch. Pass an optional skill field (note: skill, not skill_id) so we know how to label the output. Include your API key on every call, and we return JSON your product can store or pipe into any model.
// Pretreatment API
const base = "https://api.pretreatment.io";
const apiKey = "pt_live_YOUR_KEY_HERE";
const auth = { Authorization: `Bearer ${apiKey}` };
// One file → POST /extract (multipart: file + optional skill)
const one = new FormData();
one.append("file", pdfFile);
// one.append("skill", "financial"); // optional: auto, financial, legal, …
const single = await fetch(`${base}/extract`, {
method: "POST",
headers: auth,
body: one,
});
const doc = await single.json();
// doc.condensed_text, doc.metrics, …
// Several files at once → POST /extract/batch (same fields, repeat file)
const batch = new FormData();
batch.append("file", fileA);
batch.append("file", fileB);
batch.append("skill", "financial");
const multi = await fetch(`${base}/extract/batch`, {
method: "POST",
headers: auth,
body: batch,
});
const payload = await multi.json();
// payload.documents (or equivalent) — see API reference
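One common downstream use is turning the response into a compact LLM prompt. The field names below (`document_type`, `company_name`, `metrics`, `condensed_text`) match the sample response earlier on this page; the prompt layout itself is just one reasonable choice, not a prescribed format.

```javascript
// Build a compact prompt from a Pretreatment response. Field names match
// the sample output above; the prompt structure is our own convention.
function buildPrompt(doc, question) {
  const metrics = (doc.metrics ?? [])
    .map((m) => `${m.label}: ${m.value} (${m.period})`)
    .join("\n");
  return [
    `Document type: ${doc.document_type}`,
    `Company: ${doc.company_name}`,
    metrics && `Key metrics:\n${metrics}`,
    `Context:\n${doc.condensed_text}`,
    `Question: ${question}`,
  ]
    .filter(Boolean) // drop empty sections, e.g. when no metrics exist
    .join("\n\n");
}
```

Because the condensed text is already ~8K tokens, the assembled prompt fits comfortably in any mainstream model's context window.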