Open Source · MIT Licensed

Robustness testing
for physical AI

Systematically test how vision-language-action models degrade under realistic physical perturbations. One command. Audit-grade reports. Compliance-ready.

The Problem

VLA models look great in the lab.
They fail in the real world.

Vision-language-action models power the next generation of autonomous robots. But change the lighting, shift the camera angle, or add a distractor object and performance can collapse. Most teams discover this in production, not in testing.

87%

of VLA models show significant performance drops under moderate lighting changes

7

physical dimensions where models silently degrade: camera, lighting, noise, texture, layout, dynamics, language

2027

EU AI Act and Machinery Regulation deadlines requiring documented robustness testing

Features

Test. Red-team. Certify.

Robustness Benchmarks

Sweep across 7 physical dimensions with 5 calibrated severity levels. Immutable benchmarks that are comparable across time and teams.
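The sweep itself is just a cross-product of the seven dimensions listed above with the five severity levels. A minimal sketch, assuming only the dimension names from this page (the real suite definitions live in the kiln codebase):

```python
from itertools import product

# The seven perturbation dimensions named on this page, swept at
# five calibrated severity levels (L1 = mild ... L5 = severe).
DIMENSIONS = ["camera", "lighting", "noise", "texture",
              "layout", "dynamics", "language"]
LEVELS = [f"L{i}" for i in range(1, 6)]

def benchmark_grid():
    """Enumerate every (dimension, severity) cell in the sweep."""
    return [
        {"dimension": dim, "severity": lvl}
        for dim, lvl in product(DIMENSIONS, LEVELS)
    ]

grid = benchmark_grid()
print(len(grid))  # 7 dimensions x 5 levels = 35 cells
```

Because the severity parameters are fixed (see "Physically calibrated" below), every cell of this grid means the same thing in every run.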

Adversarial Red-Teaming

Find deliberate exploit paths with optimization-based adversarial attacks. Universal patches, action freezing, 3D texture attacks, and backdoor detection.
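The common thread in these attacks is searching for an input change that drives task success down. A gradient-free toy sketch of that loop (kiln's attacks are optimization-based; the `toy_success_rate` scorer here is a stand-in for actually rolling out the policy, not anything from the tool):

```python
import random

def toy_success_rate(patch):
    # Stand-in for pasting `patch` into the camera image and rolling
    # out the policy; returns a task-success score in [0, 1].
    return max(0.0, 1.0 - sum(abs(v) for v in patch) / len(patch))

def random_search_patch(dim=16, iters=200, step=0.1, seed=0):
    """Sketch of patch optimization: keep any mutation that pushes
    the model's success rate down (lower = stronger attack)."""
    rng = random.Random(seed)
    patch = [0.0] * dim
    best = toy_success_rate(patch)
    for _ in range(iters):
        cand = [v + rng.uniform(-step, step) for v in patch]
        score = toy_success_rate(cand)
        if score < best:
            patch, best = cand, score
    return patch, best

patch, degraded = random_search_patch()
print(round(degraded, 3))
```

A universal patch is the same search run jointly over many episodes; backdoor detection inverts it, searching for triggers a model responds to suspiciously strongly.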

Compliance Reports

Generate audit-grade evidence packs mapped to EU AI Act, Machinery Regulation, and ISO 10218. PDF reports ready for regulators.

How It Works

One command to test your model

terminal

# Install
$ pip install kiln-lab

# Run robustness benchmark
$ kiln bench --model my_model.py --suite standard-L3 --env PickCube

Running robustness evaluation...
Model: MyVLA | Suite: standard-L3 | Env: PickCube | Episodes: 50

Baseline TSR: 0.92
Camera jitter (L3):    0.74  Grade: C
Lighting change (L3):  0.68  Grade: C
Gaussian blur (L3):    0.85  Grade: B
Distractors (L3):      0.41  Grade: F
Robot init state (L3): 0.79  Grade: B

Overall Robustness Score: 0.67 (Grade: C)

# Generate compliance report
$ kiln comply results.json --regulation eu-ai-act -o report.pdf
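The letter grades in the transcript are consistent with simple fixed cutoffs on the success rate (TSR, presumably task success rate). The thresholds below are an illustrative guess that happens to reproduce every grade shown above; kiln's actual cutoffs may differ:

```python
def grade(tsr):
    """Map a task-success rate to a letter grade.

    Illustrative thresholds only -- they reproduce the grades in the
    transcript above, but they are not kiln's documented cutoffs.
    """
    if tsr >= 0.90: return "A"
    if tsr >= 0.75: return "B"
    if tsr >= 0.60: return "C"
    if tsr >= 0.50: return "D"
    return "F"

scores = {"camera jitter": 0.74, "lighting change": 0.68,
          "gaussian blur": 0.85, "distractors": 0.41,
          "robot init state": 0.79}
print({k: grade(v) for k, v in scores.items()})
print(grade(0.67))  # overall score from the transcript -> C
```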

Model-agnostic

Any (image, instruction) → action model
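In practice "any (image, instruction) → action model" means something shaped like the class below. This is a hypothetical stub to show the contract, not kiln's actual adapter protocol (see the kiln docs for that); the class name, method name, and action dimension are all illustrative:

```python
from typing import Sequence

class MyVLA:
    """Minimal shape of a model kiln can wrap: anything that maps an
    (image, instruction) pair to an action vector."""

    ACTION_DIM = 7  # e.g. 6-DoF end-effector delta + gripper

    def act(self, image, instruction: str) -> Sequence[float]:
        # A real model runs inference here; this stub just returns a
        # zero action of the right shape.
        return [0.0] * self.ACTION_DIM

model = MyVLA()
action = model.act(image=[[0] * 224] * 224, instruction="pick up the cube")
print(len(action))  # 7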

Physically calibrated

Fixed severity parameters — benchmarks are comparable across years

CI/CD ready

GitHub Actions integration — robustness as a required pipeline check
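A pipeline check boils down to loading kiln's JSON results and failing the job when the score regresses below a bar. A minimal sketch; the JSON field name and the 0.70 threshold are assumptions for illustration, not kiln's documented schema:

```python
def check_robustness(results: dict, min_score: float = 0.70) -> bool:
    """True iff the overall robustness score clears the CI bar."""
    return results.get("overall_robustness_score", 0.0) >= min_score

# Stand-in for json.load(open("results.json")) in a CI step; a
# failing gate would exit nonzero to block the pipeline.
results = {"overall_robustness_score": 0.67}
print(check_robustness(results, min_score=0.70))  # False: 0.67 < 0.70
print(check_robustness(results, min_score=0.60))  # True
```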

Open Source

Built in the open

The Kiln CLI is MIT-licensed and free forever. Contribute perturbation modules, simulator adapters, or failure reports. The robustness of robot AI is too important to be behind closed doors.

MIT

Licensed

Compliance

Regulation is coming.
Be ready.

The EU AI Act and Machinery Regulation require documented robustness testing for AI-powered robots. Kiln maps your test results directly to regulatory requirements.
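To make "maps your test results to regulatory requirements" concrete, here is a sketch of the kind of evidence summary such a mapping produces. The field names and structure are illustrative assumptions, not kiln's output format; Article 15 is the AI Act's accuracy, robustness, and cybersecurity clause:

```python
def evidence_pack(results: dict, regulation: str = "eu-ai-act") -> dict:
    """Assemble a minimal evidence summary from benchmark results.

    Hypothetical sketch of what a compliance mapping gathers: the
    headline score, the worst-case dimension, and which dimensions
    cleared an (assumed) 0.60 bar.
    """
    per_dim = results["per_dimension"]
    return {
        "regulation": regulation,
        "requirement": "Art. 15 (accuracy, robustness, cybersecurity)",
        "overall_score": results["overall"],
        "worst_case": min(per_dim.items(), key=lambda kv: kv[1]),
        "passed_dimensions": [d for d, s in per_dim.items() if s >= 0.60],
    }

pack = evidence_pack({
    "overall": 0.67,
    "per_dimension": {"camera": 0.74, "lighting": 0.68, "blur": 0.85,
                      "distractors": 0.41, "init_state": 0.79},
})
print(pack["worst_case"])  # ('distractors', 0.41)
```

Surfacing the worst-case dimension matters for audits: a regulator cares less about the average than about the failure mode that collapses performance.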

EU

EU AI Act

High-risk AI systems · Art. 6(1)

Mandatory robustness requirements for AI safety components in machinery, medical devices, and automotive systems.

Deadline: August 2, 2027

EU

Machinery Regulation

2023/1230 · Self-evolving AI

Mandatory Notified Body assessment for safety components using machine learning. No assessment means no EU market access.

Deadline: January 20, 2027

ISO

ISO 10218

Robot safety · 2025 revision

Updated robot safety standard with new requirements for AI-based control systems, environmental conditions testing, and cybersecurity.

+

More Frameworks

NIST AI RMF · ISO 42001

Additional framework mappings for NIST AI 100-2 adversarial ML taxonomy and ISO/IEC 42001 AI management systems.

Start testing today

Install the CLI and run your first robustness benchmark in minutes. Free and open source.

pip install kiln-lab