
May 7, 2025

Fastino Raises $17.5M Seed Round from Khosla Ventures to Launch TLMs: Task-Specific Language Models

A new chapter in enterprise AI begins as we announce TLMs – faster, cheaper, and smarter models built for task-level precision at scale.

In a space dominated by massive general-purpose models trained on everything from internet chatter to code dumps, we are taking a radically different path: purpose-built, Task-Specific Language Models (TLMs) that do one job — and do it better than anything else on the market.

Today, we’re thrilled to announce our $17.5M seed round led by Khosla Ventures (the first investor in OpenAI) to bring TLMs to every AI developer on the planet. This brings our total funding to nearly $25 million, following our $7M pre-seed round led by Insight Partners and M12, Microsoft’s Venture Fund, announced in November 2024. Additional investors in this seed round include Valor Equity Partners, former Docker CEO Scott Johnston, and Weights & Biases co-founders CEO Lukas Biewald and CTO Shawn Lewis.

With this new funding, we’ll double down on our mission to make powerful, lightweight AI models available to every developer, everywhere. We’re scaling our research team to push the boundaries of fast, accurate model performance.

Why TLMs?

We started Fastino after seeing firsthand how unsustainable generalist LLMs were at scale at our last agent startup: as platform usage grew, our LLM API costs began to skyrocket. It wasn’t a compute problem – it was inefficiency. Our tasks didn’t require a trillion-parameter generalist model; they needed something faster, cheaper, and purpose-built. That insight sparked Fastino and our TLMs: task-optimized models that do specific jobs exceptionally well – and don’t burn your budget doing it.

Our TLMs are trained from the ground up on a new architecture, built for specific tasks core to modern enterprise needs:

  • Summarization: Generating concise, accurate summaries from long-form or noisy text, enabling faster understanding and content distillation.

  • Function Calling: A hyper-efficient model designed for agentic systems, enabling precise, low-latency tool invocation – ideal for integrating LLMs into production workflows.

  • Text to JSON: Converting unstructured text into structured, clean, and production-ready JSON for seamless downstream integration.

  • PII Redaction: Redacting sensitive or personally identifiable information (PII) on a zero-shot basis, including support for user-defined or industry-specific entity types.

  • Text Classification: A versatile zero-shot model for any labeling task, equipped with enterprise-grade safeguards including spam and toxicity detection, out-of-bounds filtering, jailbreak detection, and intent classification.

  • Profanity Censoring: Identifying and redacting profane language to ensure content compliance and brand safety.

  • Information Extraction: Extracting structured data – such as entities, attributes, and contextual insights – from unstructured text to support use cases like document processing, search query parsing, question answering, and custom data detection.
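As an illustration of how a task-specific model like Text to JSON might be invoked, here is a minimal sketch in Python. The endpoint URL, payload fields, and request shape below are assumptions for illustration only, not Fastino’s actual API; see the docs at fastino.ai for the real interface.

```python
import json

# Hypothetical endpoint -- illustrative only, not the real Fastino API surface.
FASTINO_TEXT_TO_JSON_URL = "https://api.fastino.ai/v1/text-to-json"

def build_text_to_json_request(text: str, schema: dict) -> dict:
    """Package unstructured text plus a target schema into a request body
    that a Text-to-JSON TLM could fill in (field names are assumptions)."""
    return {
        "task": "text-to-json",
        "input": text,
        "schema": schema,  # the structured shape we want back
    }

payload = build_text_to_json_request(
    "Order #1042: 3 widgets shipped to Palo Alto on May 7",
    {"order_id": "string", "quantity": "number", "city": "string"},
)
body = json.dumps(payload)  # serialized request body, ready to POST
```

The point of the sketch is the shape of the interaction: instead of prompt-engineering a generalist LLM into emitting valid JSON, the task and output schema are first-class parts of the request.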

Built by Experts, Trained on Low-End Gaming GPUs

Fastino was started by our co-founders Ash Lewis and George Hurn-Maloney and includes research team members from Google DeepMind, Stanford, CMU, and Apple Intelligence. Together, we developed a novel model architecture that outperforms traditional LLMs on task-specific benchmarks — despite being trained on commodity NVIDIA gaming GPUs for less than $100K and zero H100s.

Yes, you read that right. While the industry trains trillion-parameter models on multi-million-dollar clusters, we achieved state-of-the-art performance with models that are:

  • 99x faster than standard LLMs

  • Able to run on CPUs and low-end GPUs

  • Faster, more accurate, and cheaper to deploy

“Large enterprises using frontier models typically only care about performance on a narrow set of tasks,” said Jon Chu, Partner at Khosla Ventures. “Fastino’s tech allows enterprises to create a model with better-than-frontier model performance for just the set of tasks you care about and package it into a small, lightweight model that’s portable enough to run on CPUs, all while being orders of magnitude faster with latency guarantees. These tradeoffs open up new use cases for generative models that historically haven’t been practical before.”

Flat Pricing. Predictable Costs. And a Free Model API Running Inference on CPUs.

We’re also breaking from the status quo of runaway per-token fees. We are offering the first flat monthly subscription model for developers — use any TLM API without worrying about surprise bills.

We’re also introducing the industry’s first free model API, offering up to 10,000 requests per user each month. It runs entirely on CPUs to minimize environmental impact and avoid unnecessary use of natural resources.

Enterprise-Ready Models Built for Scale.

For enterprise customers, Fastino TLMs can be seamlessly integrated across your infrastructure stack:

  • Within your private VPC environment

  • Across on-prem compute clusters or bare metal

  • At the edge, close to data sources for low-latency inference

This means maximum control, full compliance, and zero data leakage — without compromising performance.

Real-World Impact

From document parsing in finance and healthcare to real-time search query intelligence in e-commerce, Fortune 500 companies are already using Fastino TLMs to speed up AI adoption across their workflows.

Our COO and co-founder George Hurn-Maloney put it best:

"AI developers don't need an LLM trained on trillions of irrelevant data points – they need the right model for their task. That’s why we’re making highly accurate, lightweight models with the first-ever flat monthly pricing – and a free tier so devs can integrate the right model into their workflow without compromise."

Join Us

We believe this is a new chapter in enterprise AI. Fastino’s TLMs are changing what’s possible — with models that are not just smart, but sensible.

Check out our APIs, docs, and use cases at fastino.ai — and see how precision AI can help you build faster, scale cheaper, and deploy safely.

Fastino, Inc.

Palo Alto, CA

© All rights reserved
