LLM Guardrails That Don't Slow You Down

A 300M-parameter safety model that evaluates prompts and responses across four moderation dimensions in a single pass. Accuracy competitive with guard models 23–90× its size.

Try GLiGuard in Pioneer

Hugging Face

GitHub

87.7 avg. prompt F1 score

16× higher throughput

0.3B params, encoder

RESEARCH

Four moderation tasks.
One forward pass.

Multi-aspect moderation in one pass. Prompt safety, response safety, harm categorization, and jailbreak detection scored simultaneously in a single forward pass.


Schema-conditioned at inference. Any combination of moderation tasks, composed at inference time. No retraining. No prompt redesign.
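One way to picture composing moderation tasks at inference time is as a plain data structure assembled per request. The sketch below is illustrative only: `TASKS`, `Schema`, `compose_schema`, and every label name are hypothetical and are not GLiGuard's actual API. Check the model card for the real interface.

```python
from dataclasses import dataclass, field

# Hypothetical task names mirroring the four moderation dimensions.
# The label sets are illustrative assumptions, not GLiGuard's real labels.
TASKS = {
    "prompt_safety": ["safe", "unsafe"],
    "response_safety": ["safe", "unsafe"],
    "harm_category": ["violence", "self_harm", "fraud", "other"],
    "jailbreak": ["jailbreak", "benign"],
}


@dataclass(frozen=True)
class Schema:
    """A per-request selection of moderation tasks and their labels."""
    tasks: dict = field(default_factory=dict)


def compose_schema(*names: str) -> Schema:
    """Build a schema from any subset of known tasks; nothing is retrained."""
    unknown = [n for n in names if n not in TASKS]
    if unknown:
        raise ValueError(f"unknown task(s): {unknown}")
    return Schema(tasks={n: list(TASKS[n]) for n in names})


# Two requests can ask for different task combinations.
input_gate = compose_schema("prompt_safety", "jailbreak")
output_gate = compose_schema("response_safety", "harm_category")
```

The point of the design is that the task list is data passed alongside the text, so changing what gets checked is a configuration change rather than a training run.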


Inline with the agent loop. At sub-30ms latency, GLiGuard is fast enough to gate every prompt and response without slowing your agent loop.
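Gating every prompt and response can be sketched as a thin wrapper around one agent turn. This is a minimal sketch under stated assumptions: `guarded_turn`, `toy_classify`, and `toy_agent` are hypothetical names, the classifier is a keyword stand-in rather than a real guard model, and the 0.5 threshold is arbitrary.

```python
import time
from typing import Callable, Dict

# A classifier maps text to per-task "unsafe" scores in [0, 1].
# In practice this would wrap a guard model; here it is a stand-in.
Classifier = Callable[[str], Dict[str, float]]


def guarded_turn(prompt: str, agent: Callable[[str], str],
                 classify: Classifier, threshold: float = 0.5) -> dict:
    """Run one agent turn, checking the prompt before and the response after."""
    start = time.perf_counter()
    prompt_flagged = max(classify(prompt).values()) >= threshold
    guard_s = time.perf_counter() - start
    if prompt_flagged:
        return {"blocked": "prompt", "output": None, "guard_ms": guard_s * 1000}

    response = agent(prompt)

    start = time.perf_counter()
    response_flagged = max(classify(response).values()) >= threshold
    guard_s += time.perf_counter() - start
    return {
        "blocked": "response" if response_flagged else None,
        "output": None if response_flagged else response,
        "guard_ms": guard_s * 1000,  # time spent in the guard only
    }


# Toy stand-ins so the sketch runs end to end (keyword match, not a model).
def toy_classify(text: str) -> Dict[str, float]:
    return {"unsafe": 1.0 if "attack" in text.lower() else 0.0}


def toy_agent(prompt: str) -> str:
    return f"echo: {prompt}"


ok = guarded_turn("What is the weather?", toy_agent, toy_classify)
bad = guarded_turn("Plan an attack", toy_agent, toy_classify)
```

Tracking `guard_ms` separately from the agent's own time makes it easy to confirm that the guard stays within whatever latency budget the loop allows.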

Deploy GLiGuard

EFFICIENCY

23–90× smaller than competing LLM guards

Single non-autoregressive pass

Bidirectional encoder evaluates every safety dimension in parallel. No token-by-token generation, no sequential decoding bottleneck.
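The difference between sequential decoding and a single pass can be made concrete by counting forward passes. The model below is a deliberate simplification, not a measurement: it assumes a generative guard needs one pass per emitted verdict token (the prefill pass yields the first token), and that an encoder scores all task heads in one pass. The function names and token counts are hypothetical.

```python
def decoder_passes(generated_tokens: int) -> int:
    """Sequential passes for a generative guard: the prefill pass yields
    the first token, then one more pass per additional token."""
    if generated_tokens < 1:
        raise ValueError("a generative guard must emit at least one token")
    return generated_tokens


def encoder_passes(num_tasks: int) -> int:
    """A single pass scores every task, so the count does not grow."""
    if num_tasks < 1:
        raise ValueError("need at least one task")
    return 1


# Illustrative verdict lengths only; real guards differ.
for n_tokens in (1, 8, 32):
    print(n_tokens, decoder_passes(n_tokens), encoder_passes(num_tasks=4))
```

Each decoder pass must wait for the previous one, which is why verdict length shows up directly in latency, while the encoder's cost is fixed per input.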

Competitive accuracy at 1/30 the size

At 0.3B, GLiGuard runs on a single GPU and beats models like LlamaGuard (12B) and WildGuard (7B) on HarmBench and SafeRLHF response classification.

Stay on your infrastructure

Apache 2.0 open weights. Run it on-prem or air-gapped, or deploy with Pioneer.

0.3B parameters (vs. 7B–12B baselines)

9 safety benchmarks evaluated

16× higher throughput vs. Qwen3Guard-8B

17× lower latency at sequence length 64

GLiGuard Benchmark Results

Nine established safety benchmarks. Up to 16x faster inference.

| | LlamaGuard4 | Qwen3Guard-Gen | FASTINO GLiGuard |
| --- | --- | --- | --- |
| Parameters | 12B | | 0.3B |
| Architecture | Decoder (autoregressive) | | Encoder (bidirectional) |
| Multi tasks in one pass | | | |
| Avg. prompt F1 | 82.5 | 88.7 | 87.7 |
| Avg. response F1 | 70.8 | 84.1 | 82.7 |
| HarmBench (response) | 83.3 | 87.2 | 91.0 |
| SafeRLHF (response) | 42.5 | 70.5 | 84.5 |
| Latency @ SL 64 (ms) | - | 426 | |
| Throughput @ BS 4 (req/s) | - | 8.2 | 133 |
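The headline ratios can be checked against the reported numbers with a few lines of arithmetic. This sketch assumes the 426 ms latency and 8.2 req/s throughput belong to the Qwen3Guard baseline, and the GLiGuard latency it prints is implied by the 17× claim, not a reported figure.

```python
# Figures copied from the benchmark results and efficiency section.
qwen_throughput = 8.2        # req/s at batch size 4 (assumed Qwen3Guard)
gliguard_throughput = 133.0  # req/s at batch size 4
qwen_latency_ms = 426.0      # at sequence length 64 (assumed Qwen3Guard)
latency_ratio = 17.0         # the "17x lower latency" headline
gliguard_params_b = 0.3      # billions of parameters

throughput_ratio = gliguard_throughput / qwen_throughput
implied_latency_ms = qwen_latency_ms / latency_ratio
size_ratio_7b = 7.0 / gliguard_params_b
size_ratio_12b = 12.0 / gliguard_params_b

print(f"throughput ratio: {throughput_ratio:.1f}x")               # 16.2x
print(f"implied GLiGuard latency: {implied_latency_ms:.1f} ms")   # 25.1 ms
print(f"size ratio vs 7B: {size_ratio_7b:.1f}x")                  # 23.3x
print(f"size ratio vs 12B: {size_ratio_12b:.1f}x")                # 40.0x
```

The implied latency of roughly 25 ms is consistent with the sub-30ms figure quoted for the agent loop, and the throughput ratio matches the 16× headline.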

Get started

3,000,000+ monthly downloads

3,200+ GitHub stars

1.1B+ end users

Join the community

Join our active community on Discord.

Join now

Need help?

Get in touch with our support team.

Contact Support


Fastino Inc. (“Fastino”) develops specialized AI models and provides APIs designed to support structured data extraction, classification, reasoning, and production AI workflows. Fastino is a technology company and does not provide legal, financial, compliance, or advisory services.

Any outputs, predictions, classifications, or decisions generated through Fastino models are based on the configuration, data, and implementation provided by the customer. Fastino does not control, verify, or guarantee the accuracy, completeness, or suitability of model outputs for any specific purpose. By using this website or Fastino’s models and services, you acknowledge that all content and outputs are provided for informational and operational purposes only and agree to our Terms of Use and Privacy Policy.

2026 Fastino Inc.
