MinML®
Features
How it works
Savings
Calculator
MinML®
Features
How it works
Savings
Calculator
MinML
Cut LLM costs by 40–60%.
Download for macOS
Read the docs
MinML
Cut LLM costs by 40–60%.
Download for macOS
Read the docs
MinML
Cut LLM costs by 40–60%.
Download for macOS
Read the docs
-60%
fewer input tokens
2–3×
effective context
<200ms
overhead, local-first
-60%
fewer input tokens
2–3×
effective context
<200ms
overhead, local-first
-60%
fewer input tokens
2–3×
effective context
<200ms
overhead, local-first
Semantic compression.
Three modes: Clean, Focus, Ultra-choose your tradeoff between brevity and fidelity.

Model-agnostic.
Supports major providers: OpenAI, Anthropic, Google, gpt-oss, Ollama/TGI.
Privacy, observability.
On-device by default, one-tap purge, optional encryption. Per-request metrics and CSV/JSON exports.

Privacy is Priority
Semantic compression.
Three modes: Clean, Focus, Ultra-choose your tradeoff between brevity and fidelity.

Model-agnostic.
Supports major providers: OpenAI, Anthropic, Google, gpt-oss, Ollama/TGI.
Privacy, observability.
On-device by default, one-tap purge, optional encryption. Per-request metrics and CSV/JSON exports.

Privacy is Priority
Semantic compression.
Three modes: Clean, Focus, Ultra-choose your tradeoff between brevity and fidelity.

Model-agnostic.
Supports major providers: OpenAI, Anthropic, Google, gpt-oss, Ollama/TGI.
Privacy, observability.
On-device by default, one-tap purge, optional encryption. Per-request metrics and CSV/JSON exports.

Privacy is Priority
How it works: minimal effort.
Your prompt: “Teach me the fundamentals of software engineering as if I’m five years old and completely new to the field.”
MinML intercepts your prompt, compresses it semantically, and forwards a compact version to your model, no workflow changes required.
How it works: minimal effort.
Your prompt: “Teach me the fundamentals of software engineering as if I’m five years old and completely new to the field.”
MinML intercepts your prompt, compresses it semantically, and forwards a compact version to your model, no workflow changes required.
How it works: minimal effort.
Your prompt: “Teach me the fundamentals of software engineering as if I’m five years old and completely new to the field.”
MinML intercepts your prompt, compresses it semantically, and forwards a compact version to your model, no workflow changes required.
3-step compression timeline.
1. Clean (−20%) → 32 tokens: removes filler/duplication, preserves structure.
2. Focus (−25%) → 24 tokens: prioritizes entities, numbers, verbs.
3. Ultra (−33%) → 16 tokens: shorthand + reversible AI Dictionary merges.
3-step compression timeline.
1. Clean (−20%) → 32 tokens: removes filler/duplication, preserves structure.
2. Focus (−25%) → 24 tokens: prioritizes entities, numbers, verbs.
3. Ultra (−33%) → 16 tokens: shorthand + reversible AI Dictionary merges.
3-step compression timeline.
1. Clean (−20%) → 32 tokens: removes filler/duplication, preserves structure.
2. Focus (−25%) → 24 tokens: prioritizes entities, numbers, verbs.
3. Ultra (−33%) → 16 tokens: shorthand + reversible AI Dictionary merges.

AI Dictionary
Opt-in, collapses frequent phrases into reversible super-tokens. See dictionary merges.

Deploy anywhere
macOS app, Chrome extension, Docker/CLI, air-gapped ready.

AI Dictionary
Opt-in, collapses frequent phrases into reversible super-tokens. See dictionary merges.

Deploy anywhere
macOS app, Chrome extension, Docker/CLI, air-gapped ready.

AI Dictionary
Opt-in, collapses frequent phrases into reversible super-tokens. See dictionary merges.

Deploy anywhere
macOS app, Chrome extension, Docker/CLI, air-gapped ready.
Pricing strip.
≤50M tokens/mo: 10% • 50–200M: 8% • 200M+: 5%.


$0 minimum charge.
If input savings <30%, you pay $0. Shared-savings only.
Transparent fees.
You only pay 10% of what you save. Net to you is maximized.

Pricing strip.
≤50M tokens/mo: 10% • 50–200M: 8% • 200M+: 5%.


$0 minimum charge.
If input savings <30%, you pay $0. Shared-savings only.
Transparent fees.
You only pay 10% of what you save. Net to you is maximized.

Pricing strip.
≤50M tokens/mo: 10% • 50–200M: 8% • 200M+: 5%.


$0 minimum charge.
If input savings <30%, you pay $0. Shared-savings only.
Transparent fees.
You only pay 10% of what you save. Net to you is maximized.

Privacy & security.
No cloud storage. Ephemeral cache with one-click purge.
Optional encryption at rest; local-first by default.
Privacy & security.
No cloud storage. Ephemeral cache with one-click purge.
Optional encryption at rest; local-first by default.
Privacy & security.
No cloud storage. Ephemeral cache with one-click purge.
Optional encryption at rest; local-first by default.
How do I know quality is retained?
We benchmark ROUGE/BLEU/LLM evals—check our docs.
Do you change the model’s answer?
No, just compresses inputs. Model output is unchanged. Details in docs.
What about latency?
Sub–200ms overhead. All local. Read more about performance in docs.
How do I know quality is retained?
We benchmark ROUGE/BLEU/LLM evals—check our docs.
Do you change the model’s answer?
No, just compresses inputs. Model output is unchanged. Details in docs.
What about latency?
Sub–200ms overhead. All local. Read more about performance in docs.
How do I know quality is retained?
We benchmark ROUGE/BLEU/LLM evals—check our docs.
Do you change the model’s answer?
No, just compresses inputs. Model output is unchanged. Details in docs.
What about latency?
Sub–200ms overhead. All local. Read more about performance in docs.
Ready to save?
Drop-in proxy. Turn off anytime.
Download for macOS
Ready to save 40–60
Drop-in proxy. Turn off anytime.
Download for macOS
Ready to save 40–60
Drop-in proxy. Turn off anytime.
Download for macOS
𝓶
Links
GitHub
Contact
Not affiliated notice
Docs
FAQ
Privacy
Download
𝓶
Links
GitHub
Contact
Not affiliated notice
Docs
FAQ
Privacy
Download
𝓶
Links
GitHub
Contact
Not affiliated notice
Docs
FAQ
Privacy
Download