Open source · MIT · Python ≥ 3.10

AIGuard

Evaluation, Testing, and Observability for Large Language Models.

AIGuard helps developers evaluate LLM applications before deployment and continuously monitor them in production through hallucination detection, adversarial testing, human review workflows, and trace analysis.

Get Started Documentation GitHub

Install

$pip install aiguard-safety

Test before deployment. Monitor after deployment.

Core capabilities

A focused set of building blocks for safer, more reliable LLM applications.

Hallucination Detection

Evaluate model outputs using structured classification: ground-truth verification, contextual grounding, and self-consistency analysis.

Adversarial Testing

Test models against prompt injection, jailbreak prompts, and instruction override attacks before deployment.

Human Review

Escalate uncertain evaluations to a reviewer queue for manual validation and calibration feedback.

Trace Explorer

Inspect prompts, responses, latency, metadata, and evaluation results from every LLM call.

Two stages, one toolkit

Pre-Deployment Evaluation

Run adversarial attack suites and hallucination checks against your model. Enforce thresholds in CI to block unsafe releases.

CI/CD integration

Production Monitoring

Capture traces from every call with the SDK. Continuously score for hallucinations and escalate uncertain cases to human review.

Monitoring dashboard