Document Condensing Engine

Your AI is only as good as its input

Upload any document. Get back structured, labeled, deduplicated data — ready for any LLM workflow. One API call.

100K+
Tokens in
~12×
Token reduction
92%
Compression
10
Max batch files

From 100K+ tokens to ~8K tokens

Every metric preserved. Every table structured. Every risk factor labeled.
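
The headline figures follow from simple arithmetic. Here is a minimal sketch, assuming the example quoted on this page (100K tokens in, roughly 8K tokens out). The function name is illustrative and is not part of the Pretreatment API.

```javascript
// Illustrative helper (not part of the Pretreatment API): derive the
// reduction factor and compression percentage from two token counts.
function compressionStats(tokensIn, tokensOut) {
  const foldReduction = tokensIn / tokensOut;              // e.g. 100K / 8K
  const compressionPct = (1 - tokensOut / tokensIn) * 100; // share of tokens removed
  return { foldReduction, compressionPct };
}

// Token counts taken from the example on this page.
const stats = compressionStats(100_000, 8_000);
console.log(
  `${stats.foldReduction.toFixed(1)}x reduction, ` +
    `${stats.compressionPct.toFixed(0)}% compression`
);
// prints "12.5x reduction, 92% compression"
```

Actual ratios will vary with the document; the 100K and 8K inputs are only the example figures used above.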

Raw Document

Before processing

  • 65 pages, 100K+ tokens
  • Unstructured text dump
  • Table structure lost in extraction
  • No semantic labels or categories
  • Exceeds LLM context windows

Structured Output

After pretreatment

{
  "document_type": "10-K",
  "company_name": "Apple Inc.",
  "skill": "financial",
  "metrics": [
    { "label": "Revenue",
      "value": "$416,161M",
      "period": "FY2025" }
  ],
  "condensed_text": "~8K tokens..."
}
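
One way to feed this output into an LLM workflow is to flatten it into a short text context. The sketch below assumes the response has exactly the fields shown in the example above; `toPromptContext` is a hypothetical helper, not something the API provides.

```javascript
// Sample response copied from the structured-output example on this page.
const sampleDoc = {
  document_type: "10-K",
  company_name: "Apple Inc.",
  skill: "financial",
  metrics: [{ label: "Revenue", value: "$416,161M", period: "FY2025" }],
  condensed_text: "~8K tokens...",
};

// Hypothetical helper: turn the structured output into plain text for a prompt.
// Missing fields fall back to empty values so the function never throws.
function toPromptContext(doc) {
  const metricLines = (doc.metrics ?? []).map(
    (m) => `- ${m.label} (${m.period}): ${m.value}`
  );
  return [
    `${doc.document_type} for ${doc.company_name}`,
    "Key metrics:",
    ...metricLines,
    "",
    doc.condensed_text ?? "",
  ].join("\n");
}

console.log(toPromptContext(sampleDoc));
```

Keeping metrics as labeled lines and the condensed text as a separate block lets a model cite exact values without re-reading the full document.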

Pipeline

Seven steps, one API call

Ingest, structure, return — runtime depends on document size and complexity.

Upload

PDF, DOCX, or XLSX

Extract

Text & tables preserved

Chunk

Section-aware splits

Classify

Route by type

Analyze

Metrics, facts & risks

Merge

Deduplicate & reconcile

Condense

Structured output
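
To make the Merge step concrete, here is a minimal sketch of deduplicating metrics extracted from several chunks. The engine's actual merge logic is not described on this page, so everything here is an assumption: the function name, the choice of label plus period as the identity key, and the rule that the first value seen wins.

```javascript
// Hypothetical illustration of "deduplicate & reconcile"; not the engine's real code.
// chunkResults: an array of per-chunk metric arrays, each metric shaped like
// { label, value, period } as in the structured-output example.
function mergeMetrics(chunkResults) {
  const byKey = new Map();
  const conflicts = [];
  for (const metrics of chunkResults) {
    for (const m of metrics) {
      // Assumed identity: same label (case-insensitive) and same period.
      const key = `${m.label.trim().toLowerCase()}|${m.period}`;
      const seen = byKey.get(key);
      if (!seen) {
        byKey.set(key, m); // first occurrence wins
      } else if (seen.value !== m.value) {
        // Same metric reported with a different value: record it for review.
        conflicts.push({ key, kept: seen.value, dropped: m.value });
      }
    }
  }
  return { metrics: [...byKey.values()], conflicts };
}

// Two chunks report the same metric; only one copy survives.
const merged = mergeMetrics([
  [{ label: "Revenue", value: "$416,161M", period: "FY2025" }],
  [{ label: "revenue", value: "$416,161M", period: "FY2025" }],
]);
console.log(merged.metrics.length, merged.conflicts.length);
```

Recording conflicts rather than silently overwriting keeps disagreements between chunks visible to whatever step reconciles them.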

Skills

Domain-specific intelligence

Each skill teaches the engine what to extract and how to structure it for your domain.

Auto

Let the engine classify the document and choose the best skill — financial, legal, sales, or generic — so you do not have to pick one up front.

skill="auto"

Financial Analysis

10-K filings, annual reports, earnings. Revenue, margins, ratios, and risk factors.

skill="financial"

Legal Contracts

Contracts, NDAs, leases. Clauses, obligations, parties, and key dates.

skill="legal"

Real Estate

Appraisals, leases, property reports. Valuations, lease terms, cap rates.

skill="real-estate"

Sales & Growth

Sales reports, QBRs, board decks. ARR/MRR, pipeline, conversion rates.

skill="sales"

Generic Document

Any document type. Key facts, entities, dates, numbers, and summary.

skill="generic"
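
Because the skill is passed as a plain string, a typo is easy to miss. The sketch below is a client-side guard built from the six values listed above; `checkSkill` is a hypothetical helper, and how the service itself responds to an unknown skill is not stated here.

```javascript
// Skill values listed on this page.
const SKILLS = ["auto", "financial", "legal", "real-estate", "sales", "generic"];

// Hypothetical client-side guard: returns the skill unchanged if it is one of
// the listed values, otherwise throws before any request is sent.
function checkSkill(skill) {
  if (!SKILLS.includes(skill)) {
    throw new RangeError(
      `Unknown skill "${skill}". Expected one of: ${SKILLS.join(", ")}`
    );
  }
  return skill;
}

console.log(checkSkill("real-estate")); // prints "real-estate"
```

A call such as `form.append("skill", checkSkill(userChoice))` would then fail fast on a value like `"realestate"`.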

Integration

Built for Crucible

Every document uploaded to Crucible is processed by Pretreatment. Raw files stay ephemeral; structured output is stored in the cloud so you can pick up where you left off.

Document Upload

User uploads to Crucible

Pretreatment.io

Condensed to ~8K tokens

Cloud storage

Reuse output anytime

Crucible

Business decision engine

API

Plug the engine into your product

Your app sends documents to our service over HTTPS. Send a single file to POST /extract, or several files in one request to POST /extract/batch (up to 10 files per batch). Both endpoints take multipart form data with an optional skill field (not skill_id) that tells us how to label the output. Include your API key as a Bearer token in the Authorization header on every call; we return JSON your product can store or pipe into any model.

// Pretreatment API
const base = "https://api.pretreatment.io";
const apiKey = "pt_live_YOUR_KEY_HERE";
const auth = { Authorization: `Bearer ${apiKey}` };

// One file → POST /extract (multipart: file + optional skill)
const one = new FormData();
one.append("file", pdfFile);
// one.append("skill", "financial");  // optional: auto, financial, legal, …

const single = await fetch(`${base}/extract`, {
  method: "POST",
  headers: auth,
  body: one,
});
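// Surface HTTP errors (bad key, unsupported file, …) instead of parsing an error body as a document
if (!single.ok) throw new Error(`Extract failed: HTTP ${single.status}`);
const doc = await single.json();
// doc.condensed_text, doc.metrics, …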

// Several files at once → POST /extract/batch (same fields, repeat file)
const batch = new FormData();
batch.append("file", fileA);
batch.append("file", fileB);
batch.append("skill", "financial");

const multi = await fetch(`${base}/extract/batch`, {
  method: "POST",
  headers: auth,
  body: batch,
});
if (!multi.ok) throw new Error(`Batch extract failed: HTTP ${multi.status}`);
const payload = await multi.json();
// payload.documents (or equivalent); see API reference
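
If you have more files than the 10-file batch limit quoted on this page, one option is to split them into groups and send one /extract/batch request per group. This is a minimal sketch under that assumption; `chunkFiles` and `uploadInBatches` are hypothetical helper names, and the shape of each batch response should be taken from the API reference.

```javascript
const MAX_BATCH_FILES = 10; // "Max batch files" figure quoted on this page

// Split an array into consecutive groups of at most `size` items.
function chunkFiles(files, size = MAX_BATCH_FILES) {
  const groups = [];
  for (let i = 0; i < files.length; i += size) {
    groups.push(files.slice(i, i + size));
  }
  return groups;
}

// Hypothetical helper: send each group to POST /extract/batch, one request at a time.
// `files` are File or Blob objects; `skill` is optional.
async function uploadInBatches(base, apiKey, files, skill) {
  const results = [];
  for (const group of chunkFiles(files)) {
    const form = new FormData();
    for (const file of group) form.append("file", file); // repeat the "file" field
    if (skill) form.append("skill", skill);
    const res = await fetch(`${base}/extract/batch`, {
      method: "POST",
      headers: { Authorization: `Bearer ${apiKey}` },
      body: form,
    });
    if (!res.ok) throw new Error(`Batch extract failed: HTTP ${res.status}`);
    results.push(await res.json());
  }
  return results;
}

// 23 files split into groups of 10, 10, and 3.
console.log(chunkFiles(new Array(23).fill("file")).map((g) => g.length));
```

Sending groups sequentially keeps the sketch simple; whether parallel requests are allowed depends on rate limits that are not covered here.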