Systematically test how vision-language-action models degrade under realistic physical perturbations. One command. Audit-grade reports. Compliance-ready.
The Problem
Vision-language-action models power the next generation of autonomous robots. But change the lighting, shift the camera angle, or add a distractor object and performance can collapse. Most teams discover this in production, not in testing.
VLA models show significant performance drops under moderate lighting changes.
7 physical dimensions where models silently degrade: camera, lighting, noise, texture, layout, dynamics, language.
EU AI Act and Machinery Regulation deadlines require documented robustness testing.
Features
Sweep across 7 physical dimensions with 5 calibrated severity levels. Immutable benchmarks that are comparable across time and teams.
Find deliberate exploit paths with optimization-based adversarial attacks. Universal patches, action freezing, 3D texture attacks, and backdoor detection.
Generate audit-grade evidence packs mapped to EU AI Act, Machinery Regulation, and ISO 10218. PDF reports ready for regulators.
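The key to comparable benchmarks is pinning each severity level to fixed, versioned parameters. The sketch below illustrates that idea for two of the seven dimensions; the parameter values and function names are hypothetical, not Kiln's calibrated settings.

```python
import numpy as np

# Hypothetical severity tables: each level maps to a fixed parameter,
# so a "level 3" run today means the same thing as a "level 3" run next year.
SEVERITY = {
    "lighting": {1: 0.9, 2: 0.75, 3: 0.6, 4: 0.45, 5: 0.3},   # brightness scale
    "noise":    {1: 2.0, 2: 5.0, 3: 10.0, 4: 20.0, 5: 40.0},  # Gaussian sigma
}

def perturb(image: np.ndarray, dim: str, level: int, rng=None) -> np.ndarray:
    """Apply one physical perturbation at a calibrated severity level."""
    rng = rng or np.random.default_rng(0)
    img = image.astype(np.float32)
    if dim == "lighting":
        img *= SEVERITY["lighting"][level]          # darken the scene
    elif dim == "noise":
        img += rng.normal(0.0, SEVERITY["noise"][level], img.shape)
    else:
        raise ValueError(f"unknown dimension: {dim}")
    return np.clip(img, 0, 255).astype(np.uint8)

# Sweep one frame across all five levels of one dimension.
frame = np.full((64, 64, 3), 128, dtype=np.uint8)
sweep = [perturb(frame, "lighting", lvl) for lvl in (1, 2, 3, 4, 5)]
```

Because the tables are immutable, a score regression between two runs can only come from the model, never from a silently retuned perturbation.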
How It Works
# Install
$ pip install kiln-lab
# Run robustness benchmark
$ kiln bench --model my_model.py --suite standard-L3 --env PickCube
Running robustness evaluation...
Model: MyVLA | Suite: standard-L3 | Env: PickCube | Episodes: 50
Baseline TSR: 0.92
Camera jitter (L3): 0.74 Grade: C
Lighting change (L3): 0.68 Grade: C
Gaussian blur (L3): 0.85 Grade: B
Distractors (L3): 0.41 Grade: F
Robot init state (L3): 0.79 Grade: B
Overall Robustness Score: 0.67 (Grade: C)
# Generate compliance report
$ kiln comply results.json --regulation eu-ai-act -o report.pdf
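Conceptually, the harness above only needs a policy it can query step by step: a callable from an (image, instruction) pair to an action. A minimal sketch of that contract follows; the class and method names are illustrative assumptions, not Kiln's actual adapter API.

```python
import numpy as np
from typing import Protocol

class VLAPolicy(Protocol):
    """Hypothetical adapter contract: (image, instruction) -> action."""
    def act(self, image: np.ndarray, instruction: str) -> np.ndarray: ...

class RandomPolicy:
    """Stand-in model: emits a random 7-DoF action (6 pose deltas + gripper)."""
    def __init__(self, seed: int = 0):
        self.rng = np.random.default_rng(seed)

    def act(self, image: np.ndarray, instruction: str) -> np.ndarray:
        return self.rng.uniform(-1.0, 1.0, size=7)

policy: VLAPolicy = RandomPolicy()
action = policy.act(np.zeros((224, 224, 3), dtype=np.uint8), "pick up the red cube")
```

Any model that can be wrapped in this shape, regardless of architecture, can be swept through the same perturbation suites.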
Any (image, instruction) → action model
Fixed severity parameters — benchmarks are comparable across years
GitHub Actions integration — robustness as a required pipeline check
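Wiring the benchmark into CI can be as small as one workflow step. The sketch below is illustrative: the workflow layout and the idea of gating the pipeline on the command's exit status are assumptions, and only flags shown earlier in this page are used.

```yaml
# Hypothetical GitHub Actions workflow; job and step names are illustrative.
name: robustness
on: [pull_request]
jobs:
  bench:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install kiln-lab
      # Marking this job as a required check makes robustness a merge gate.
      - run: kiln bench --model my_model.py --suite standard-L3 --env PickCube
```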
Open Source
The Kiln CLI is MIT-licensed and free forever. Contribute perturbation modules, simulator adapters, or failure reports. The robustness of robot AI is too important to be behind closed doors.
Compliance
The EU AI Act and Machinery Regulation require documented robustness testing for AI-powered robots. Kiln maps your test results directly to regulatory requirements.
EU AI Act · High-risk AI systems · Art. 6(1)
Mandatory robustness requirements for AI safety components in machinery, medical devices, and automotive systems.
Deadline: August 2, 2027
Machinery Regulation (EU) 2023/1230 · Self-evolving AI
Mandatory Notified Body assessment for safety components using machine learning. No assessment means no EU market access.
Deadline: January 20, 2027
ISO 10218 · Robot safety · 2025 revision
Updated robot safety standard with new requirements for AI-based control systems, environmental conditions testing, and cybersecurity.
NIST AI RMF · ISO 42001
Additional framework mappings for NIST AI 100-2 adversarial ML taxonomy and ISO/IEC 42001 AI management systems.
Install the CLI and run your first robustness benchmark in minutes. Free and open source.
pip install kiln-lab