Deprecate evaluator cutoff config fields and add CLI cutoff precedence#6580
@BugBot review
…add config-only deprecation warning, fix backticks in error message
@BugBot review
Cursor Bugbot has reviewed your changes and found 1 potential issue.
    ) -> Result<HashMap<String, f32>> {
        for evaluator_name in cli_cutoffs.keys() {
            if !evaluator_configs.contains_key(evaluator_name) {
                return Err(anyhow!("Unknown evaluator in --cutoff: `{evaluator_name}`"));
Error message references wrong CLI flag name `--cutoff`
Medium Severity
The error message says `--cutoff` (singular), but the actual CLI flag is `--cutoffs` (plural). A user seeing this error would try `--cutoff`, which doesn't exist. The doc comment on line 675 has the same mismatch. Notably, the warning messages on lines 704 and 712 correctly reference `--cutoffs`.
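A minimal sketch of what the corrected check might look like, using only the standard library. The function name `validate_cli_cutoffs`, the `known_evaluators` set, and the `String` error type are assumptions for illustration; the PR itself uses `anyhow!` and looks names up in `evaluator_configs`.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical stand-in for the PR's check: reject any evaluator named on
/// the command line that the evaluation does not define.
fn validate_cli_cutoffs(
    cli_cutoffs: &HashMap<String, f32>,
    known_evaluators: &HashSet<String>,
) -> Result<(), String> {
    // Sort the names so the evaluator reported in the error is deterministic.
    let mut names: Vec<&String> = cli_cutoffs.keys().collect();
    names.sort();
    for name in names {
        if !known_evaluators.contains(name) {
            // Use the plural flag name so the message matches the real CLI flag.
            return Err(format!("Unknown evaluator in --cutoffs: `{name}`"));
        }
    }
    Ok(())
}
```

Sorting the keys is a choice made for this sketch only; it keeps the error stable when several unknown names are passed at once.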


The general idea is that cutoffs should not be configured at the evaluator level. They are contextual, so people who use them today in regression-testing setups should pass them on the CLI instead.
#6603
Note
Medium Risk
Changes evaluation pass/fail behavior by introducing CLI-driven cutoff thresholds with precedence over config values, which can affect CI/regression outcomes. Deprecation warnings and cutoff resolution errors (e.g., unknown evaluator names) may also surface in existing workflows.
Overview
Evaluation cutoffs are migrated from evaluator config fields to a new CLI flag, adding `--cutoffs evaluator=value,...` (validated as non-negative) and using it to determine pass/fail exit status.

The evaluation runner now resolves effective cutoffs by merging legacy config cutoffs with CLI cutoffs (CLI wins, with warnings), errors on unknown evaluator names, and logs cutoff failures via `tracing` before failing the run. Evaluator `cutoff` fields are explicitly deprecated across configs/tests and the tutorial example removes in-config cutoffs.

Written by Cursor Bugbot for commit 4ba600c.
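The parsing and precedence rules described in the overview could be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's code: `parse_cutoffs` and `resolve_cutoffs` are hypothetical names, errors are plain `String`s rather than `anyhow` errors, warnings are returned in a `Vec` instead of being emitted through `tracing`, and the unknown-evaluator check is left out for brevity.

```rust
use std::collections::HashMap;

/// Parse a `--cutoffs` value such as "exact_match=0.9,judge=0.5".
/// (Hypothetical helper; the real parser may differ.)
fn parse_cutoffs(raw: &str) -> Result<HashMap<String, f32>, String> {
    let mut cutoffs = HashMap::new();
    for pair in raw.split(',').map(str::trim).filter(|p| !p.is_empty()) {
        let (name, raw_value) = pair
            .split_once('=')
            .ok_or_else(|| format!("expected evaluator=value, got `{pair}`"))?;
        let value: f32 = raw_value
            .trim()
            .parse()
            .map_err(|_| format!("invalid cutoff for `{name}`: `{raw_value}`"))?;
        // Reject negative and non-finite values (NaN, infinity).
        if !value.is_finite() || value < 0.0 {
            return Err(format!("cutoff for `{name}` must be non-negative"));
        }
        cutoffs.insert(name.trim().to_string(), value);
    }
    Ok(cutoffs)
}

/// Merge legacy config cutoffs with CLI cutoffs; the CLI value wins.
/// Returns the effective cutoffs plus one warning per legacy config cutoff.
fn resolve_cutoffs(
    config: &HashMap<String, f32>,
    cli: &HashMap<String, f32>,
) -> (HashMap<String, f32>, Vec<String>) {
    let mut effective = config.clone();
    let mut warnings = Vec::new();
    // Sort names so warnings come out in a stable order.
    let mut names: Vec<&String> = config.keys().collect();
    names.sort();
    for name in names {
        if cli.contains_key(name) {
            warnings.push(format!(
                "config cutoff for `{name}` is overridden by --cutoffs"
            ));
        } else {
            warnings.push(format!(
                "config cutoff for `{name}` is deprecated; pass it via --cutoffs"
            ));
        }
    }
    // Inserting CLI entries last overwrites any config value with the same name.
    effective.extend(cli.iter().map(|(k, v)| (k.clone(), *v)));
    (effective, warnings)
}
```

Returning warnings as data rather than logging them keeps the sketch free of external crates; how each cutoff is then compared against an evaluator's score is not shown here.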