Deprecate evaluator cutoff config fields and add CLI cutoff precedence #6580

Open
shuyangli wants to merge 2 commits into main from sl/deprecate-cutoff-config

shuyangli (Member) commented Feb 25, 2026

The general idea is that cutoffs should not be configured at the evaluator level. They are contextual, so anyone who uses them today for regression-testing setups should pass them on the CLI instead.

#6603


Note

Medium Risk
Changes evaluation pass/fail behavior by introducing CLI-driven cutoff thresholds with precedence over config values, which can affect CI/regression outcomes. Deprecation warnings and cutoff resolution errors (e.g., unknown evaluator names) may also surface in existing workflows.

Overview
Evaluation cutoffs are migrated from evaluator config fields to a new CLI flag, adding --cutoffs evaluator=value,... (validated as non-negative) and using it to determine pass/fail exit status.
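
To make the flag format concrete, here is a minimal sketch of parsing an `evaluator=value,...` string into a map while rejecting negative values. It is not the PR's implementation: the helper name `parse_cutoffs`, the error wording, and the choice to also reject NaN and infinite values are assumptions for illustration, and it uses only the standard library rather than whatever CLI parser the project relies on.

```rust
use std::collections::HashMap;

/// Hypothetical helper (not from the PR): parse "name=value,name=value"
/// into a map of evaluator name -> cutoff.
fn parse_cutoffs(raw: &str) -> Result<HashMap<String, f32>, String> {
    let mut cutoffs = HashMap::new();
    // Split on commas, tolerate surrounding whitespace, skip empty segments.
    for pair in raw.split(',').map(str::trim).filter(|p| !p.is_empty()) {
        let (name, value) = pair
            .split_once('=')
            .ok_or_else(|| format!("expected `evaluator=value`, got `{pair}`"))?;
        let name = name.trim();
        if name.is_empty() {
            return Err(format!("missing evaluator name in `{pair}`"));
        }
        let value: f32 = value
            .trim()
            .parse()
            .map_err(|_| format!("invalid cutoff value in `{pair}`"))?;
        // Assumption: besides negatives, NaN and infinities are rejected too.
        if !value.is_finite() || value < 0.0 {
            return Err(format!(
                "cutoff for `{name}` must be a finite, non-negative number"
            ));
        }
        cutoffs.insert(name.to_string(), value);
    }
    Ok(cutoffs)
}
```

With this sketch, an input such as `exact_match=0.9,judge=0.5` (evaluator names invented for the example) yields two entries, while `exact_match=-0.1` is rejected.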

The evaluation runner now resolves effective cutoffs by merging legacy config cutoffs with CLI cutoffs (CLI wins, with warnings), errors on unknown evaluator names, and logs cutoff failures via tracing before failing the run. Evaluator cutoff fields are explicitly deprecated across configs/tests and the tutorial example removes in-config cutoffs.
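
The resolution rule described above (CLI wins over config, unknown names are errors, legacy values trigger warnings) can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's code: `EvaluatorConfig` and its `legacy_cutoff` field are hypothetical stand-ins, the warning text is invented, and warnings are returned as strings instead of being logged through tracing so that the sketch needs only the standard library.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for an evaluator's config; only the
/// deprecated in-config cutoff is modeled here.
struct EvaluatorConfig {
    legacy_cutoff: Option<f32>,
}

/// Merge legacy config cutoffs with CLI cutoffs. Returns the effective
/// cutoffs plus any deprecation warnings the caller should log.
fn resolve_cutoffs(
    evaluator_configs: &HashMap<String, EvaluatorConfig>,
    cli_cutoffs: &HashMap<String, f32>,
) -> Result<(HashMap<String, f32>, Vec<String>), String> {
    // Fail fast on CLI names that match no configured evaluator.
    for name in cli_cutoffs.keys() {
        if !evaluator_configs.contains_key(name) {
            return Err(format!("Unknown evaluator in --cutoffs: `{name}`"));
        }
    }

    let mut effective = HashMap::new();
    let mut warnings = Vec::new();
    for (name, config) in evaluator_configs {
        match (cli_cutoffs.get(name), config.legacy_cutoff) {
            // Both sources set: the CLI value wins, with a warning.
            (Some(&cli), Some(_)) => {
                warnings.push(format!(
                    "`{name}`: config cutoff is deprecated and overridden by --cutoffs"
                ));
                effective.insert(name.clone(), cli);
            }
            (Some(&cli), None) => {
                effective.insert(name.clone(), cli);
            }
            // Config only: still honored, but flagged as deprecated.
            (None, Some(legacy)) => {
                warnings.push(format!(
                    "`{name}`: config cutoff is deprecated; pass it via --cutoffs"
                ));
                effective.insert(name.clone(), legacy);
            }
            (None, None) => {}
        }
    }
    Ok((effective, warnings))
}
```

Returning the warnings rather than logging them keeps the merge logic easy to test in isolation; a real runner would presumably emit them through its logging layer.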

Written by Cursor Bugbot for commit 4ba600c.

@shuyangli shuyangli force-pushed the sl/deprecate-cutoff-config branch from b40e5e6 to 1aff483 Compare February 25, 2026 20:48
@shuyangli shuyangli force-pushed the sl/deprecate-cutoff-config branch from 1aff483 to 6db61cb Compare February 25, 2026 21:11
@shuyangli shuyangli force-pushed the sl/deprecate-cutoff-config branch from 6db61cb to f0dfab1 Compare February 25, 2026 22:25
@shuyangli shuyangli force-pushed the sl/deprecate-cutoff-config branch from f0dfab1 to df16908 Compare February 25, 2026 22:29
@shuyangli shuyangli force-pushed the sl/deprecate-cutoff-config branch from df16908 to 18a6444 Compare February 26, 2026 15:04
@shuyangli shuyangli force-pushed the sl/deprecate-cutoff-config branch from 18a6444 to 5925ae2 Compare February 26, 2026 22:30
Base automatically changed from sl/regex-evaluator to main February 26, 2026 23:01
@shuyangli shuyangli force-pushed the sl/deprecate-cutoff-config branch from 5925ae2 to 50d9b9d Compare February 27, 2026 01:23
@shuyangli shuyangli marked this pull request as ready for review February 27, 2026 01:38

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 50d9b9d938

@shuyangli shuyangli force-pushed the sl/deprecate-cutoff-config branch 3 times, most recently from 5f28647 to 6b126f8 Compare February 27, 2026 15:02
shuyangli (Member, Author) commented:

@BugBot review

@shuyangli shuyangli force-pushed the sl/deprecate-cutoff-config branch from a622918 to daafa66 Compare February 27, 2026 16:10
…add config-only deprecation warning, fix backticks in error message
shuyangli (Member, Author) commented:

@BugBot review

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

) -> Result<HashMap<String, f32>> {
    for evaluator_name in cli_cutoffs.keys() {
        if !evaluator_configs.contains_key(evaluator_name) {
            return Err(anyhow!("Unknown evaluator in --cutoff: `{evaluator_name}`"));

Error message references wrong CLI flag name --cutoff

Medium Severity

The error message says --cutoff (singular), but the actual CLI flag is --cutoffs (plural). A user who sees this error would likely try --cutoff, which does not exist. The doc comment on line 675 has the same mismatch, whereas the warning messages on lines 704 and 712 correctly reference --cutoffs.
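
One way to keep the flag name from drifting between messages is to define it once and reuse it everywhere. The sketch below shows the idea only; the constant `CUTOFFS_FLAG` and the helper `unknown_evaluator_error` are hypothetical names, not identifiers from the PR, and the real code builds its error with `anyhow!` rather than returning a `String`.

```rust
/// Hypothetical single source of truth for the flag name, so error
/// messages, warnings, and doc comments cannot disagree about it.
const CUTOFFS_FLAG: &str = "--cutoffs";

/// Build the unknown-evaluator message from the shared constant.
fn unknown_evaluator_error(evaluator_name: &str) -> String {
    format!("Unknown evaluator in {}: `{}`", CUTOFFS_FLAG, evaluator_name)
}
```

The simpler fix, of course, is to change the string literal to --cutoffs; the constant only pays off if the flag name appears in several messages.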
