v3.9.1
6 days ago by Rok Šikonja
Task repeatability, Trials, Solver postprocessor, Risk Dashboard.
What's new
- Task repeatability helps you check whether an evaluation reliably produces consistent numbers (i.e. consistent metrics and scores).
- Trials let you run each sample multiple times (before computing the final metrics).
- Solver postprocessor: apply a Python postprocessing step to solver outputs before scoring.
- "Used for metrics" indicator in evidence stats, so you can see at a glance which evidence contributes to metrics.
- CLI: filter local datasets directly from the command line.
- CLI: new `lf integration list` command to list your configured integrations.
- CLI: load additional environment variables from extra env files.
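To illustrate how the solver postprocessor, trials, and task repeatability fit together, here is a minimal Python sketch. It does not use the product's API: every name in it (`postprocess`, `score`, `run_with_trials`, `repeatability_spread`) is hypothetical, and averaging trial scores is only one assumed way of combining trials into a final metric.

```python
import statistics
from typing import Callable, Sequence, Tuple


def postprocess(raw_output: str) -> str:
    """Hypothetical postprocessing step: normalize solver output before scoring."""
    return raw_output.strip().lower()


def score(output: str, target: str) -> float:
    """Toy exact-match scorer (an assumption, not the product's scorer)."""
    return 1.0 if output == target else 0.0


def run_with_trials(
    solver: Callable[[str], str],
    samples: Sequence[Tuple[str, str]],
    trials: int,
) -> float:
    """Run every sample `trials` times, then compute one final metric.

    Each sample's trial scores are averaged first; the final metric is the
    mean of those per-sample averages.
    """
    per_sample = []
    for prompt, target in samples:
        trial_scores = [
            score(postprocess(solver(prompt)), target) for _ in range(trials)
        ]
        per_sample.append(statistics.mean(trial_scores))
    return statistics.mean(per_sample)


def repeatability_spread(metric_values: Sequence[float]) -> float:
    """Population standard deviation of one metric across repeated task runs.

    A value of 0.0 means every run produced the same number.
    """
    return statistics.pstdev(metric_values)
```

In this sketch, a low `repeatability_spread` over the metrics of several identical task runs suggests the evaluation is stable, while raising `trials` is one way to reduce that spread for a non-deterministic solver.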
Improvements
- Empty risk policies page now links directly to the documentation to help you get started.
- Faster model adapters thanks to bulk conversion, eliminating an N+1 query bottleneck.
- Redesigned and rebranded AI platform login page, including refreshed labels, spacing, and styling.
- Polished model configuration form in the UI.
- Added a cancel button to editing forms so you can discard changes more easily.
- Added a Beta tag to the risk overview.
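For readers unfamiliar with the N+1 pattern mentioned above, the following Python sketch contrasts per-item lookups with a single batched lookup. It is a generic illustration, not the actual adapter code: `FakeStore`, `convert_one_by_one`, and `convert_bulk` are hypothetical names, and the in-memory store stands in for a real database.

```python
from typing import Dict, List


class FakeStore:
    """In-memory stand-in for a database that counts how many queries it serves."""

    def __init__(self, rows: Dict[int, str]) -> None:
        self.rows = rows
        self.query_count = 0

    def fetch_one(self, row_id: int) -> str:
        self.query_count += 1  # one query per item
        return self.rows[row_id]

    def fetch_many(self, row_ids: List[int]) -> Dict[int, str]:
        self.query_count += 1  # one query for the whole batch
        return {row_id: self.rows[row_id] for row_id in row_ids}


def convert_one_by_one(store: FakeStore, row_ids: List[int]) -> List[str]:
    """Slow pattern: issues a separate lookup for every item being converted."""
    return [store.fetch_one(row_id).upper() for row_id in row_ids]


def convert_bulk(store: FakeStore, row_ids: List[int]) -> List[str]:
    """Bulk pattern: loads all rows in one lookup, then converts in memory."""
    rows = store.fetch_many(row_ids)
    return [rows[row_id].upper() for row_id in row_ids]
```

Both functions return the same result; the difference is that the per-item version's query count grows with the number of items, while the bulk version's stays constant.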
Bug fixes
- Updated the Fireworks Kimi model from 2.5 to 2.7 following the retirement of 2.5.
- Removed an outdated Anthropic model and added missing model entries.
- Fixed an incorrect color for the progress value in the risk policy status table's score column.
- Fixed the task progress tracker stopping early during repeatability runs.
- Fixed an invalid log status check.
- Fixed model serialization that was dropping type information.
