GLiNER-2 Pricing

GLiNER 2 delivers enterprise-grade information extraction at a fraction of the cost of large language models — optimized for real-time inference and large-scale deployment.

Overview

GLiNER 2 is built for production workloads where efficiency, cost control, and predictable performance matter. Its CPU-optimized architecture enables low-latency Named Entity Recognition (NER), Text Classification, and Structured Extraction at half the cost of comparable LLM-based solutions.

Efficiency at Scale

| Metric | Value | Description |
| --- | --- | --- |
| Price | $0.625 per 1M tokens | Run enterprise-grade NER and classification at full scale, 50% lower cost than standard LLM inference. |
| Average Latency | ≈130 ms per request | Built for real-time pipelines and streaming applications. |
| Throughput | >1,500 req/sec | Horizontally scalable. |
| Model Size | 1B parameters | Compact, high-accuracy transformer model optimized for low latency. |

💡 Billing granularity: Usage is measured per 1M processed tokens (input + output combined).
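A quick way to read the latency and throughput figures together is Little's Law (L = λW): at 1,500 requests per second with a 130 ms average latency, roughly 195 requests are in flight at steady state. The sketch below is illustrative arithmetic on the quoted aggregates, not a statement about any particular instance size.

```python
# Illustrative arithmetic only: Little's Law (L = lambda * W) applied to the
# quoted aggregate figures; per-instance concurrency depends on deployment.
throughput_rps = 1500   # quoted throughput, requests per second
latency_s = 0.130       # quoted average latency (130 ms)

in_flight = throughput_rps * latency_s  # L = lambda * W
print(f"~{in_flight:.0f} requests in flight at steady state")  # ~195
```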

Example Cost Scenarios

| Use Case | Volume / Month | Estimated Cost |
| --- | --- | --- |
| Customer support entity extraction | 25M tokens | ≈ $15.63 |
| Document classification pipeline | 80M tokens | ≈ $50.00 |
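These estimates follow directly from the published $0.625 per 1M token rate. A minimal sketch to reproduce them (the helper function below is ours, not part of any SDK):

```python
PRICE_PER_1M_TOKENS = 0.625  # published GLiNER-2-XL rate, USD

def estimated_cost(tokens: int) -> float:
    """Estimated cost in USD for a monthly token volume (input + output combined)."""
    return tokens / 1_000_000 * PRICE_PER_1M_TOKENS

print(estimated_cost(25_000_000))  # 15.625 -> ~$15.63/month
print(estimated_cost(80_000_000))  # 50.0   -> ~$50.00/month
```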

Performance Benchmark

GLiNER 2 achieves state-of-the-art efficiency compared to general-purpose LLMs:

| Model | Avg Latency | Cost per 1M Tokens |
| --- | --- | --- |
| GLiNER-2-XL | 130 ms | $0.625 |
| GPT-4-Turbo | 500–900 ms | $1.25–$3.00 |
| GPT-5 | 7,000–28,000 ms | $1.25–$3.00 |
| Claude 3 Haiku | 250–400 ms | $0.80–$1.00 |

Included Features

All tiers include:

  • Access to the /gliner-2 hosted inference API (see the request sketch after this list)

  • Schema-based multi-task extraction (NER + classification + structured parsing)

  • CPU-optimized real-time inference (no GPU needed)

  • Usage dashboard and token analytics

  • Fastino support and model updates
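As an illustration of schema-based multi-task extraction over the hosted endpoint, here is a minimal request sketch. Only the /gliner-2 path comes from this page; the base URL, authentication header, and payload field names are assumptions, so consult the API reference for the actual contract.

```python
import requests

# Hypothetical base URL and payload shape: only the /gliner-2 path is
# documented on this page; field names below are illustrative assumptions.
API_URL = "https://api.example.com/gliner-2"
headers = {"Authorization": "Bearer YOUR_API_KEY"}

payload = {
    "text": "Acme Corp's support ticket #4521 was escalated by Jane Doe.",
    "schema": {
        "entities": ["organization", "person", "ticket_id"],    # NER labels
        "classification": ["billing", "technical", "account"],  # document classes
    },
}

response = requests.post(API_URL, json=payload, headers=headers, timeout=10)
response.raise_for_status()
print(response.json())  # extracted entities plus a predicted class
```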

Summary

GLiNER 2 delivers half-price, full-scale extraction with 130 ms latency — purpose-built for enterprise information pipelines, real-time analytics, and cost-sensitive applications.
It’s the most efficient way to bring schema-driven intelligence into your workflow.
