A micro-benchmark suite to assess the effectiveness of tools designed for IoT apps
Updated Nov 22, 2019 - Groovy
Evaluate safety in long-horizon, tool-using AI agents with this collection of realistic trajectory benchmarks.