Open Source · MIT Licensed

Robustness testing
for physical AI

Systematically test how vision-language-action models degrade under realistic physical perturbations. One command. Audit-grade reports. Compliance-ready.

The Problem

VLA models look great in the lab.
They fail in the real world.

Vision-language-action models power the next generation of autonomous robots. But change the lighting, shift the camera angle, or add a distractor object and performance can collapse. Most teams discover this in production, not in testing.

87%

of VLA models show significant performance drops under moderate lighting changes

7

physical dimensions where models silently degrade: camera, lighting, noise, texture, layout, dynamics, language

2027

EU AI Act and Machinery Regulation deadlines requiring documented robustness testing

Features

Test. Red-team. Certify.

Robustness Benchmarks

Sweep across 7 physical dimensions with 5 calibrated severity levels. Immutable benchmarks that are comparable across time and teams.
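The sweep itself is just a cross-product of the seven dimensions listed above with the five severity levels. A minimal sketch, assuming only the dimension names from this page (the real suite definitions live in the kiln codebase):

```python
from itertools import product

# The seven perturbation dimensions named on this page, swept at
# five calibrated severity levels (L1 = mild ... L5 = severe).
DIMENSIONS = ["camera", "lighting", "noise", "texture",
              "layout", "dynamics", "language"]
LEVELS = [f"L{i}" for i in range(1, 6)]

def benchmark_grid():
    """Enumerate every (dimension, severity) cell in the sweep."""
    return [
        {"dimension": dim, "severity": lvl}
        for dim, lvl in product(DIMENSIONS, LEVELS)
    ]

grid = benchmark_grid()
print(len(grid))  # 7 dimensions x 5 levels = 35 cells
```

Because the severity parameters are fixed (see "Physically calibrated" below), every cell of this grid means the same thing in every run.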

Adversarial Red-Teaming

Find deliberate exploit paths with optimization-based adversarial attacks. Universal patches, action freezing, 3D texture attacks, and backdoor detection.
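The common thread in these attacks is searching for an input change that drives task success down. A gradient-free toy sketch of that loop (kiln's attacks are optimization-based; the `toy_success_rate` scorer here is a stand-in for actually rolling out the policy, not anything from the tool):

```python
import random

def toy_success_rate(patch):
    # Stand-in for pasting `patch` into the camera image and rolling
    # out the policy; returns a task-success score in [0, 1].
    return max(0.0, 1.0 - sum(abs(v) for v in patch) / len(patch))

def random_search_patch(dim=16, iters=200, step=0.1, seed=0):
    """Sketch of patch optimization: keep any mutation that pushes
    the model's success rate down (lower = stronger attack)."""
    rng = random.Random(seed)
    patch = [0.0] * dim
    best = toy_success_rate(patch)
    for _ in range(iters):
        cand = [v + rng.uniform(-step, step) for v in patch]
        score = toy_success_rate(cand)
        if score < best:
            patch, best = cand, score
    return patch, best

patch, degraded = random_search_patch()
print(round(degraded, 3))
```

A universal patch is the same search run jointly over many episodes; backdoor detection inverts it, searching for triggers a model responds to suspiciously strongly.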

Compliance Reports

Generate audit-grade evidence packs mapped to EU AI Act, Machinery Regulation, and ISO 10218. PDF reports ready for regulators.

How It Works

One command to test your model

terminal

# Install
$ pip install kiln-lab

# Run robustness benchmark
$ kiln bench --model my_model.py --suite standard-L3 --env PickCube

Running robustness evaluation...
Model: MyVLA | Suite: standard-L3 | Env: PickCube | Episodes: 50

Baseline TSR: 0.92
Camera jitter (L3):    0.74  Grade: C
Lighting change (L3):  0.68  Grade: C
Gaussian blur (L3):    0.85  Grade: B
Distractors (L3):      0.41  Grade: F
Robot init state (L3): 0.79  Grade: B

Overall Robustness Score: 0.67 (Grade: C)

# Generate compliance report
$ kiln comply results.json --regulation eu-ai-act -o report.pdf
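The letter grades in the transcript are consistent with simple fixed cutoffs on the success rate (TSR, presumably task success rate). The thresholds below are an illustrative guess that happens to reproduce every grade shown above; kiln's actual cutoffs may differ:

```python
def grade(tsr):
    """Map a task-success rate to a letter grade.

    Illustrative thresholds only -- they reproduce the grades in the
    transcript above, but they are not kiln's documented cutoffs.
    """
    if tsr >= 0.90: return "A"
    if tsr >= 0.75: return "B"
    if tsr >= 0.60: return "C"
    if tsr >= 0.50: return "D"
    return "F"

scores = {"camera jitter": 0.74, "lighting change": 0.68,
          "gaussian blur": 0.85, "distractors": 0.41,
          "robot init state": 0.79}
print({k: grade(v) for k, v in scores.items()})
print(grade(0.67))  # overall score from the transcript -> C
```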

Model-agnostic

Any (image, instruction) → action model
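In practice "any (image, instruction) → action model" means something shaped like the class below. This is a hypothetical stub to show the contract, not kiln's actual adapter protocol (see the kiln docs for that); the class name, method name, and action dimension are all illustrative:

```python
from typing import Sequence

class MyVLA:
    """Minimal shape of a model kiln can wrap: anything that maps an
    (image, instruction) pair to an action vector."""

    ACTION_DIM = 7  # e.g. 6-DoF end-effector delta + gripper

    def act(self, image, instruction: str) -> Sequence[float]:
        # A real model runs inference here; this stub just returns a
        # zero action of the right shape.
        return [0.0] * self.ACTION_DIM

model = MyVLA()
action = model.act(image=[[0] * 224] * 224, instruction="pick up the cube")
print(len(action))  # 7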

Physically calibrated

Fixed severity parameters — benchmarks are comparable across years

CI/CD ready

GitHub Actions integration — robustness as a required pipeline check
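A pipeline check boils down to loading kiln's JSON results and failing the job when the score regresses below a bar. A minimal sketch; the JSON field name and the 0.70 threshold are assumptions for illustration, not kiln's documented schema:

```python
def check_robustness(results: dict, min_score: float = 0.70) -> bool:
    """True iff the overall robustness score clears the CI bar."""
    return results.get("overall_robustness_score", 0.0) >= min_score

# Stand-in for json.load(open("results.json")) in a CI step; a
# failing gate would exit nonzero to block the pipeline.
results = {"overall_robustness_score": 0.67}
print(check_robustness(results, min_score=0.70))  # False: 0.67 < 0.70
print(check_robustness(results, min_score=0.60))  # True
```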

Open Source

Built in the open

The Kiln CLI is MIT-licensed and free forever. Contribute perturbation modules, simulator adapters, or failure reports. The robustness of robot AI is too important to be behind closed doors.

MIT

Licensed

Compliance

Regulation is coming.
Be ready.

The EU AI Act and Machinery Regulation require documented robustness testing for AI-powered robots. Kiln maps your test results directly to regulatory requirements.
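To make "maps your test results to regulatory requirements" concrete, here is a sketch of the kind of evidence summary such a mapping produces. The field names and structure are illustrative assumptions, not kiln's output format; Article 15 is the AI Act's accuracy, robustness, and cybersecurity clause:

```python
def evidence_pack(results: dict, regulation: str = "eu-ai-act") -> dict:
    """Assemble a minimal evidence summary from benchmark results.

    Hypothetical sketch of what a compliance mapping gathers: the
    headline score, the worst-case dimension, and which dimensions
    cleared an (assumed) 0.60 bar.
    """
    per_dim = results["per_dimension"]
    return {
        "regulation": regulation,
        "requirement": "Art. 15 (accuracy, robustness, cybersecurity)",
        "overall_score": results["overall"],
        "worst_case": min(per_dim.items(), key=lambda kv: kv[1]),
        "passed_dimensions": [d for d, s in per_dim.items() if s >= 0.60],
    }

pack = evidence_pack({
    "overall": 0.67,
    "per_dimension": {"camera": 0.74, "lighting": 0.68, "blur": 0.85,
                      "distractors": 0.41, "init_state": 0.79},
})
print(pack["worst_case"])  # ('distractors', 0.41)
```

Surfacing the worst-case dimension matters for audits: a regulator cares less about the average than about the failure mode that collapses performance.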

EU

EU AI Act

High-risk AI systems · Art. 6(1)

Mandatory robustness requirements for AI safety components in machinery, medical devices, and automotive systems.

Deadline: August 2, 2027

EU

Machinery Regulation

2023/1230 · Self-evolving AI

Mandatory Notified Body assessment for safety components using machine learning. No assessment means no EU market access.

Deadline: January 20, 2027

ISO

ISO 10218

Robot safety · 2025 revision

Updated robot safety standard with new requirements for AI-based control systems, environmental conditions testing, and cybersecurity.

+

More Frameworks

NIST AI RMF · ISO 42001

Additional framework mappings for NIST AI 100-2 adversarial ML taxonomy and ISO/IEC 42001 AI management systems.

Start testing today

Install the CLI and run your first robustness benchmark in minutes. Free and open source.

pip install kiln-lab