Add gcov-based test pruning with coverage cache #1284
sbryngelson wants to merge 5 commits into MFlowCode:master from
Conversation
- File-level gcov coverage cache maps 555 test UUIDs to exercised .fpp source files (gzip JSON, 11KB, committed to repo)
- --only-changes flag prunes tests by intersecting PR-changed files against the coverage cache; --build-coverage-cache builds the cache
- New rebuild-cache CI job runs on Phoenix via SLURM when cases.py or the Fortran dependency graph changes (on both PRs and master pushes)
- Dep-change detection greps PR/push diffs for added use/include statements that would invalidate the coverage cache
- Conservative fallbacks: a missing cache runs all tests, missing sim coverage includes the test, ALWAYS_RUN_ALL files trigger the full suite
- Remove continue-on-error from the github CI job (fixes auto-cancellation)
- TEMP: duplicate use in m_bubbles.fpp + remove CMakeLists.txt from ALWAYS_RUN_ALL to test the full cache rebuild pipeline in CI
- 53 unit tests cover the core coverage logic

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
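The pruning idea in the first two bullets reduces to a per-test set intersection. A minimal sketch of that logic — the cache shape, function name, and UUIDs here are illustrative assumptions, not the PR's actual API:

```python
# Hypothetical cache shape: {"tests": {uuid: [covered .fpp files]}}.
def prune_tests(cache: dict, changed_files: set) -> tuple:
    """Keep a test iff a changed .fpp file appears in its coverage set."""
    changed_fpp = {f for f in changed_files if f.endswith(".fpp")}
    run, skip = [], []
    for uuid, covered in cache["tests"].items():
        # Conservative fallback: a test with no recorded coverage always runs.
        if not covered or changed_fpp & set(covered):
            run.append(uuid)
        else:
            skip.append(uuid)
    return run, skip

cache = {"tests": {
    "a1b2": ["src/simulation/m_bubbles.fpp"],
    "c3d4": ["src/common/m_helper.fpp"],
    "e5f6": [],  # missing sim coverage -> include
}}
run, skip = prune_tests(cache, {"src/simulation/m_bubbles.fpp", "docs/readme.md"})
# run == ["a1b2", "e5f6"], skip == ["c3d4"]
```

Non-.fpp changed files (like docs/readme.md above) simply fall out of the intersection, which is why the infrastructure escape hatches (ALWAYS_RUN_ALL) matter so much in the reviews below.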
The coverage cache builder's _prepare_test was generating simulation .inp files without post_process output params (cons_vars_wrt, parallel_io, etc.). Without these, simulation doesn't write output files and post_process fails.

Extract the post_process param dicts into shared constants in case.py (POST_PROCESS_OUTPUT_PARAMS, POST_PROCESS_3D_PARAMS, POST_PROCESS_OFF_PARAMS) and a get_post_process_mods() function. Both the generated case.py template and coverage.py now use the same source of truth.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
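The shared-constants refactor can be sketched as follows; the constant and function names come from the commit message above, but the parameter keys, values, and gating condition are illustrative assumptions, not MFC's real parameter set:

```python
# Sketch only: dict names mirror the commit message; keys/values are assumed.
POST_PROCESS_OUTPUT_PARAMS = {"cons_vars_wrt": "T", "parallel_io": "T"}
POST_PROCESS_OFF_PARAMS = {"cons_vars_wrt": "F", "parallel_io": "F"}

def get_post_process_mods(params: dict) -> dict:
    # Single source of truth for both the generated case.py template
    # and the coverage cache builder.
    if params.get("post_process", False):
        return dict(POST_PROCESS_OUTPUT_PARAMS)
    return dict(POST_PROCESS_OFF_PARAMS)

case_params = {"post_process": True}
case_params.update(get_post_process_mods(case_params))
```

Centralizing the dicts means a future parameter rename only needs one edit, and the coverage builder can no longer drift out of sync with the template.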
Claude Code Review
Head SHA:

Summary

Findings
1. TEMP changes must not be merged —
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Claude Code Review
Head SHA: 110e00c
Files changed: 16

Summary

Findings

🔴 Must Revert Before Merge (PR explicitly acknowledges these as TEMP)
1. Duplicate
2. Stripped
3. Empty

🟡 Significant Issues
4.
5.
6. Dep-change detection grep can fire on comment lines —

🟢 Minor / Improvement Opportunities
7.
8.
9.

Pre-merge Checklist
…uard Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Claude Code Review
Head SHA:
Files changed: 16

Summary:

Findings
1. TEMP changes must be reverted before merge (3 items)
2. case.params.update(get_post_process_mods(case.params))
   The application should be conditional on whether the test case includes post_process as a target, mirroring the generated
3. Direct
   permissions:
     contents: write
   ...
   git push
   The
4. exec(compile(_src, _COVERAGE_PATH, "exec"), _globals)
   The fallback path string-replaces import lines in
Nit
Review Summary by Qodo
Add gcov-based test pruning with file-level coverage cache and CI integration

Walkthrough / Description
• Add gcov-based file-level coverage cache for intelligent test pruning
  - 555 test UUIDs mapped to exercised .fpp source files (11KB gzip JSON)
  - --only-changes flag prunes tests by intersecting PR-changed files against cache
  - --build-coverage-cache flag with 3-phase parallel builder (prepare, run, gcov collect)
• New rebuild-cache CI job on Phoenix via SLURM when cases.py or Fortran dependencies change
  - Dependency-change detection greps PR/push diffs for added use/include statements
  - Conservative fallbacks: missing cache runs all, missing sim coverage includes test
• Extract post_process output params into shared constants for coverage builder consistency
• Add 53 unit tests covering core coverage logic with offline mocks
• Fix CI auto-cancellation by removing continue-on-error from github test job
• Update CMake gcov build flags: disable -march=native and LTO to avoid assembler errors

Diagram

```mermaid
flowchart LR
    A["Build with --gcov"] -->|Phase 1| B["Prepare Tests"]
    B -->|Phase 2| C["Run Tests Isolated"]
    C -->|Phase 3| D["Collect Coverage"]
    D --> E["Cache: UUID → Files"]
    E --> F["PR: Get Changed Files"]
    F --> G["Filter Tests"]
    G --> H["Run Only Affected"]
```

File Changes
1. toolchain/mfc/test/coverage.py
Code Review by Qodo
1. Non-fpp PR skips tests
Pull request overview
Adds a gcov-derived, file-level coverage cache to enable pruning the MFC test suite to only those tests that exercise files changed in a PR, with CI support to rebuild and distribute the cache when it becomes stale.
Changes:
- Introduces toolchain/mfc/test/coverage.py (cache build/load + changed-file detection + coverage-based test filtering) and commits a gzipped JSON cache.
- Integrates new CLI flags --only-changes and --build-coverage-cache into the test runner and CLI schema.
- Updates CI workflows/scripts to rebuild the cache on dependency graph changes and to run pruned test suites on PRs.
Reviewed changes
Copilot reviewed 14 out of 16 changed files in this pull request and generated 9 comments.
Show a summary per file
| File | Description |
|---|---|
| toolchain/mfc/test/coverage.py | Implements coverage cache build/load + changed-file diffing + pruning logic (ALWAYS_RUN_ALL safeguards). |
| toolchain/mfc/test/test_coverage_unit.py | Adds offline unit tests covering diff parsing, gcov JSON parsing, cache normalization, and pruning behavior. |
| toolchain/mfc/test/test.py | Wires --only-changes pruning and --build-coverage-cache cache generation into the test command. |
| toolchain/mfc/cli/commands.py | Adds CLI arguments/examples for coverage cache building and pruning. |
| toolchain/mfc/test/case.py | Factors post-process output parameter mods into reusable constants/helpers used by the cache builder. |
| toolchain/mfc/test/test_coverage_cache.json.gz | Adds committed baseline coverage cache artifact. |
| .github/workflows/test.yml | Adds rebuild-cache job + dependency-change detection + artifact flow; enables pruning on PRs. |
| .github/workflows/phoenix/rebuild-cache.sh | SLURM-executed rebuild script: clean, gcov build, build cache in parallel. |
| .github/workflows/phoenix/test.sh | Enables pruning on PRs and scales CPU thread count based on allocation. |
| .github/workflows/phoenix/submit.sh | Adjusts CPU SLURM submission options for cache rebuild workload. |
| .github/workflows/frontier/test.sh | Enables pruning on PRs. |
| .github/workflows/frontier_amd/test.sh | Enables pruning on PRs. |
| .github/file-filter.yml | Adds cases_py filter to trigger cache rebuilds when test definitions change. |
| CMakeLists.txt | Improves gcov build reliability (disable -march=native/LTO for gcov; add Fypp line-marker flag). |
| .gitignore | Ignores legacy raw (non-gz) cache file. |
| src/simulation/m_bubbles.fpp | TEMP change: duplicated use for pipeline exercise. |
```python
# toolchain files define or change the set of tests themselves.
# TEMP: stripped to GPU macros only so CI exercises the pruning logic.
# Restore full list before merge.
ALWAYS_RUN_ALL = frozenset([
    "src/common/include/parallel_macros.fpp",
    "src/common/include/acc_macros.fpp",
    "src/common/include/omp_macros.fpp",
    "src/common/include/shared_parallel_macros.fpp",
])

# Directory prefixes: any changed file under these paths triggers full suite.
# Note: src/simulation/include/ (.fpp files like inline_riemann.fpp) is NOT
# listed here — Fypp line markers (--line-marker-format=gfortran5) correctly
# attribute included file paths, so gcov coverage tracks them accurately.
# TEMP: cmake/ prefix removed to exercise pruning in CI. Restore before merge.
ALWAYS_RUN_ALL_PREFIXES = ()
```
ALWAYS_RUN_ALL / ALWAYS_RUN_ALL_PREFIXES are explicitly marked TEMP and currently exclude toolchain/build-system paths. If this ships, --only-changes can incorrectly skip tests for changes to CMake/toolchain/CI files, creating false-green CI. Restore the full ALWAYS_RUN_ALL list and prefixes before merge (as described in the PR description).
Suggested change:

```python
# toolchain / build-system files define or change the set of tests themselves.
ALWAYS_RUN_ALL = frozenset([
    # GPU macro files: coverage does not see directive-only changes.
    "src/common/include/parallel_macros.fpp",
    "src/common/include/acc_macros.fpp",
    "src/common/include/omp_macros.fpp",
    "src/common/include/shared_parallel_macros.fpp",
    # Core build configuration.
    "CMakeLists.txt",
])

# Directory prefixes: any changed file under these paths triggers full suite.
# Note: src/simulation/include/ (.fpp files like inline_riemann.fpp) is NOT
# listed here — Fypp line markers (--line-marker-format=gfortran5) correctly
# attribute included file paths, so gcov coverage tracks them accurately.
ALWAYS_RUN_ALL_PREFIXES = (
    # CMake modules and support scripts.
    "cmake/",
    # Python toolchain (build, test harness, case validator, etc.).
    "toolchain/",
    # CI configuration directories.
    ".github/",
    ".gitlab/",
    ".circleci/",
)
```
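The restored guard amounts to exact membership plus prefix matching. A minimal sketch, under the assumption that should_run_all_tests has the signature the quoted unit tests imply (the file lists here are abbreviated for illustration):

```python
ALWAYS_RUN_ALL = frozenset({
    "CMakeLists.txt",
    "src/common/include/parallel_macros.fpp",
})
ALWAYS_RUN_ALL_PREFIXES = ("cmake/", "toolchain/", ".github/")

def should_run_all_tests(changed_files: set) -> bool:
    # Any exact match or prefix hit disables pruning for the whole run.
    # str.startswith accepts a tuple, so all prefixes are checked at once.
    return any(
        f in ALWAYS_RUN_ALL or f.startswith(ALWAYS_RUN_ALL_PREFIXES)
        for f in changed_files
    )
```

Because the check is a flat any() over the changed set, a single infrastructure file in a large PR is enough to force the full suite — which is exactly the conservative behavior the reviewers want preserved.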
```python
# TEMP: macros.fpp, cases.py, case.py, definitions.py, input.py,
# case_validator.py, case.fpp, coverage.py, cmake/ removed from
# ALWAYS_RUN_ALL to exercise pruning in CI. Restore before merge.

def test_macros_fpp_does_not_trigger_all(self):
    assert should_run_all_tests(
        {"src/common/include/macros.fpp"}
    ) is False
```
The TEMP expectations here (e.g., macros.fpp not triggering should_run_all_tests, cases.py removed from ALWAYS_RUN_ALL) encode behavior that the PR description says must be reverted before merge. These tests will either fail once ALWAYS_RUN_ALL is restored, or worse, lock in unsafe pruning rules if they stay. Please remove/adjust the TEMP tests when restoring the full ALWAYS_RUN_ALL set/prefixes.
Suggested change (delete):

```python
# TEMP: macros.fpp, cases.py, case.py, definitions.py, input.py,
# case_validator.py, case.fpp, coverage.py, cmake/ removed from
# ALWAYS_RUN_ALL to exercise pruning in CI. Restore before merge.
def test_macros_fpp_does_not_trigger_all(self):
    assert should_run_all_tests(
        {"src/common/include/macros.fpp"}
    ) is False
```
```python
# pylint: disable=too-many-branches, too-many-statements, trailing-whitespace
# pylint: disable=too-many-branches,too-many-locals,too-many-statements,trailing-whitespace
def __filter(cases_) -> typing.List[TestCase]:
```
__filter is annotated as returning typing.List[TestCase], but it actually returns a tuple (selected_cases, skipped_cases). Please fix the return type annotation to match the real return value (or adjust the function to return only the list).
Suggested change:

```diff
-def __filter(cases_) -> typing.List[TestCase]:
+def __filter(cases_) -> typing.Tuple[typing.List[TestCase], typing.List[TestCase]]:
```
```shell
  DIFF=""
fi
if echo "$DIFF" | \
    grep -qP '^\+\s*(use[\s,]+\w|#:include\s|include\s+['"'"'"])'; then
```
The dependency-change detector comment says it detects added/removed use/include, but the grep only matches added lines (^+). Removals (or modifications where a use is deleted) can also change the dependency graph and should trigger a cache rebuild. Consider matching both ^+ and ^- (while excluding diff headers like +++/---).
Suggested change:

```diff
-    grep -qP '^\+\s*(use[\s,]+\w|#:include\s|include\s+['"'"'"])'; then
+    grep -qP '^[+-]\s*(use[\s,]+\w|#:include\s|include\s+['"'"'"])'; then
```
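A quick sanity check of the broadened pattern (GNU grep with -P assumed; the diff hunk and module names are made up): the +++/--- headers do not match, because the sign must be followed only by whitespace before the use/#:include keyword:

```shell
# Sample diff hunk (hypothetical module names) to exercise the pattern.
diff_hunk='--- a/src/foo.fpp
+++ b/src/foo.fpp
+    use m_new_module
-    use m_old_module
+! use mentioned in a comment'
# Count added AND removed use/include lines. The +++/--- headers and the
# Fortran comment line do not match: after the leading sign, only whitespace
# may precede the keyword.
count=$(printf '%s\n' "$diff_hunk" | grep -cP '^[+-]\s*(use[\s,]+\w|#:include\s)')
echo "$count"   # 2: one added and one removed use line
```

Note the comment-line caveat raised elsewhere in this review still applies to lines where `use` directly follows the sign and whitespace; a stricter pattern would also need to exclude `!`-prefixed Fortran comments.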
```python
if not COVERAGE_CACHE_PATH.exists():
    return None

try:
    with gzip.open(COVERAGE_CACHE_PATH, "rt", encoding="utf-8") as f:
```
load_coverage_cache(root_dir) ignores its root_dir argument when locating the cache file because COVERAGE_CACHE_PATH is computed from common.MFC_ROOT_DIR at import time. This makes the function harder to reuse/test with alternate roots and is inconsistent with the root_dir parameter (used for staleness checks). Consider computing the cache path from root_dir inside load_coverage_cache (or make the function take an explicit cache_path).
Suggested change:

```diff
-if not COVERAGE_CACHE_PATH.exists():
-    return None
-try:
-    with gzip.open(COVERAGE_CACHE_PATH, "rt", encoding="utf-8") as f:
+cache_path = Path(root_dir) / "toolchain/mfc/test/test_coverage_cache.json.gz"
+if not cache_path.exists():
+    return None
+try:
+    with gzip.open(cache_path, "rt", encoding="utf-8") as f:
```
```python
# TEMP: test_non_fpp_always_run_all_detected removed (cases.py not
# in ALWAYS_RUN_ALL during CI pruning test). Restore before merge.
```
The comments below indicate TEMP removal of ALWAYS_RUN_ALL coverage for non-.fpp files (like cases.py). As written, this reduces safety (toolchain changes may skip tests) and conflicts with the intended conservative fallback rules. Please restore the non-.fpp ALWAYS_RUN_ALL coverage and reintroduce/update the removed test case(s).
Suggested change:

```diff
-# TEMP: test_non_fpp_always_run_all_detected removed (cases.py not
-# in ALWAYS_RUN_ALL during CI pruning test). Restore before merge.
+def test_non_fpp_always_run_all_detected(self):
+    """
+    Non-.fpp toolchain files like cases.py are part of ALWAYS_RUN_ALL.
+    Changing them must force should_run_all_tests() to return True so
+    that toolchain logic changes never silently skip tests.
+    """
+    changed = {"toolchain/mfc/cases.py"}
+    assert should_run_all_tests(changed) is True
```
```fortran
use m_variables_conversion !< State variables type conversion procedures

use m_helper_basic !< Functions to compare floating point numbers
use m_helper_basic
```
This introduces a duplicate use m_helper_basic statement. Even if accepted by some compilers, it is redundant noise and (per the PR description) is a TEMP change that should be reverted before merge.
Suggested change (delete):

```diff
-use m_helper_basic
```
```shell
if [ "${{ github.event_name }}" = "pull_request" ]; then
  DIFF=$(gh pr diff ${{ github.event.pull_request.number }})
elif [ "${{ github.event_name }}" = "push" ]; then
  DIFF=$(git diff ${{ github.event.before }}..${{ github.event.after }} 2>/dev/null || echo "")
else
```
In the push event path, git diff ${{ github.event.before }}..${{ github.event.after }} can fail with the default shallow checkout (fetch-depth=1), yielding an empty DIFF and missing real dependency-graph changes. Consider fetching the required commits (e.g., set checkout fetch-depth: 0 / fetch the before SHA) or using the GitHub API/gh to get the push diff reliably.
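One possible shape of the fix, sketched as an illustrative workflow fragment (the step name and checkout options are assumptions, not the PR's actual YAML):

```yaml
# Illustrative only: make the push-range base commit available locally even
# with a shallow checkout, so `git diff before..after` cannot silently fail.
- uses: actions/checkout@v4
  with:
    fetch-depth: 0   # or keep depth 1 and fetch the base SHA explicitly below
- name: Detect dependency changes
  run: |
    if [ "${{ github.event_name }}" = "push" ]; then
      git fetch --no-tags origin "${{ github.event.before }}" || true
      DIFF=$(git diff "${{ github.event.before }}..${{ github.event.after }}" 2>/dev/null || echo "")
    fi
```

Either fetch-depth: 0 or the targeted `git fetch` of the before SHA works; the targeted fetch keeps checkout fast on large repositories.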
```cmake
# Override Release -O3 with -O1 for gcov: coverage instrumentation is
# inaccurate at -O3, and aggressive codegen (e.g. AVX-512 FP16 on
# Granite Rapids) can emit instructions that older assemblers reject.
set(CMAKE_Fortran_FLAGS_RELEASE "-O1 -DNDEBUG" CACHE STRING "" FORCE)
```
For gcov builds, this force-overrides CMAKE_Fortran_FLAGS_RELEASE in the cache. That can unexpectedly discard user/toolchain-provided Release flags (and makes it harder to compose flags in downstream builds). Prefer appending or conditionally adjusting optimization flags (e.g., replace -O3 with -O1) rather than clobbering the entire Release flag string.
Suggested change:

```diff
-set(CMAKE_Fortran_FLAGS_RELEASE "-O1 -DNDEBUG" CACHE STRING "" FORCE)
+# Adjust existing flags instead of clobbering user/toolchain settings.
+set(_mfc_orig_ftn_release_flags "${CMAKE_Fortran_FLAGS_RELEASE}")
+if(_mfc_orig_ftn_release_flags MATCHES "-O[0-9]")
+    string(REGEX REPLACE "-O[0-9]" "-O1" CMAKE_Fortran_FLAGS_RELEASE "${_mfc_orig_ftn_release_flags}")
+else()
+    set(CMAKE_Fortran_FLAGS_RELEASE "${_mfc_orig_ftn_release_flags} -O1")
+endif()
+if(NOT CMAKE_Fortran_FLAGS_RELEASE MATCHES "-DNDEBUG")
+    set(CMAKE_Fortran_FLAGS_RELEASE "${CMAKE_Fortran_FLAGS_RELEASE} -DNDEBUG")
+endif()
```
Claude Code Review
Head SHA:
Changed files

Summary

Findings
1. SECURITY — Self-hosted runner checkout of untrusted PR head SHA with write permissions (BLOCKER)
   File: The
   A malicious PR could inject arbitrary code into
   Fix: Either (a) only trigger
2. MUST REVERT — Multiple TEMP changes bypass the safety net (BLOCKER before merge)
   The PR body lists these as temporary and the test plan says "Revert TEMP changes before merge", but they are present in this diff:
   a) Duplicate
   b) After revert, the full list must be restored (
3. dep-change detection grep pattern is too broad
   File: The grep pattern
4. rebuild-cache blocks all downstream test jobs even when not triggered
   File: Both
5. New .fpp files silently skip all existing tests
   File: When a brand-new
6. CMakeLists.txt:
📝 Walkthrough
This pull request implements file-level gcov-based coverage analysis for selective test execution. It introduces a new

🚥 Pre-merge checks: ✅ 1 | ❌ 2
❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
Actionable comments posted: 4
🧹 Nitpick comments (3)
src/simulation/m_bubbles.fpp (1)

19-19: Remove duplicate module import. Line 19 repeats use m_helper_basic, already declared on Line 18. This is redundant and can cause avoidable compiler warnings/noise across toolchains.

Proposed fix:

```diff
-use m_helper_basic
```

.github/workflows/phoenix/test.sh (1)

54-56: Potential issue with unset variable in arithmetic comparison. When SLURM_CPUS_ON_NODE is unset, the comparison SLURM_CPUS_ON_NODE > 64 evaluates the unset variable as 0, but the :-8 default is only applied in the else branch. This works but is fragile. Consider applying the default first.

Suggested improvement:

```diff
-# Use up to 64 parallel test threads on CPU (GNR nodes have 192 cores).
-# Cap at 64 to avoid overwhelming MPI's ORTE daemons with concurrent launches.
-n_test_threads=$(( SLURM_CPUS_ON_NODE > 64 ? 64 : ${SLURM_CPUS_ON_NODE:-8} ))
+# Use up to 64 parallel test threads on CPU (GNR nodes have 192 cores).
+# Cap at 64 to avoid overwhelming MPI's ORTE daemons with concurrent launches.
+_cpus=${SLURM_CPUS_ON_NODE:-8}
+n_test_threads=$(( _cpus > 64 ? 64 : _cpus ))
```

toolchain/mfc/test/test_coverage_unit.py (1)

74-75: Use underscore-prefixed stub args to keep lint clean. Optional cleanup to silence ARG005 without changing behavior.

♻️ Suggested tweak:

```diff
-_case_stub.input_bubbles_lagrange = lambda case: None
-_case_stub.get_post_process_mods = lambda params: {}
+_case_stub.input_bubbles_lagrange = lambda _case: None
+_case_stub.get_post_process_mods = lambda _params: {}
```
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

⛔ Files ignored due to path filters (1)
- toolchain/mfc/test/test_coverage_cache.json.gz is excluded by !**/*.gz

📒 Files selected for processing (15)
- .github/file-filter.yml
- .github/workflows/frontier/test.sh
- .github/workflows/frontier_amd/test.sh
- .github/workflows/phoenix/rebuild-cache.sh
- .github/workflows/phoenix/submit.sh
- .github/workflows/phoenix/test.sh
- .github/workflows/test.yml
- .gitignore
- CMakeLists.txt
- src/simulation/m_bubbles.fpp
- toolchain/mfc/cli/commands.py
- toolchain/mfc/test/case.py
- toolchain/mfc/test/coverage.py
- toolchain/mfc/test/test.py
- toolchain/mfc/test/test_coverage_unit.py
```python
# TEMP: stripped to GPU macros only so CI exercises the pruning logic.
# Restore full list before merge.
ALWAYS_RUN_ALL = frozenset([
    "src/common/include/parallel_macros.fpp",
    "src/common/include/acc_macros.fpp",
    "src/common/include/omp_macros.fpp",
    "src/common/include/shared_parallel_macros.fpp",
])

# Directory prefixes: any changed file under these paths triggers full suite.
# Note: src/simulation/include/ (.fpp files like inline_riemann.fpp) is NOT
# listed here — Fypp line markers (--line-marker-format=gfortran5) correctly
# attribute included file paths, so gcov coverage tracks them accurately.
# TEMP: cmake/ prefix removed to exercise pruning in CI. Restore before merge.
ALWAYS_RUN_ALL_PREFIXES = ()
```
Do not merge with TEMP ALWAYS_RUN_ALL relaxations in place.
The reduced set and empty prefixes materially weaken conservative full-suite fallbacks and can create false negatives in PR validation.
```python
for ref in [compare_branch, f"origin/{compare_branch}"]:
    merge_base_result = subprocess.run(
        ["git", "merge-base", ref, "HEAD"],
        capture_output=True, text=True, cwd=root_dir, timeout=30, check=False
    )
    if merge_base_result.returncode == 0:
        break
else:
    return None
merge_base = merge_base_result.stdout.strip()
if not merge_base:
    return None

diff_result = subprocess.run(
    ["git", "diff", merge_base, "HEAD", "--name-only", "--no-color"],
    capture_output=True, text=True, cwd=root_dir, timeout=30, check=False
)
```
Preserve fallback contract by catching git subprocess exceptions.
get_changed_files() currently returns None on git failure codes, but TimeoutExpired/OSError can still bubble and hard-fail instead of falling back.
🛡️ Proposed fix

```diff
 # Try local branch first, then origin/ remote ref (CI shallow clones).
 for ref in [compare_branch, f"origin/{compare_branch}"]:
-    merge_base_result = subprocess.run(
-        ["git", "merge-base", ref, "HEAD"],
-        capture_output=True, text=True, cwd=root_dir, timeout=30, check=False
-    )
+    try:
+        merge_base_result = subprocess.run(
+            ["git", "merge-base", ref, "HEAD"],
+            capture_output=True, text=True, cwd=root_dir, timeout=30, check=False
+        )
+    except (subprocess.TimeoutExpired, OSError):
+        continue
     if merge_base_result.returncode == 0:
         break
 else:
     return None
@@
-    diff_result = subprocess.run(
-        ["git", "diff", merge_base, "HEAD", "--name-only", "--no-color"],
-        capture_output=True, text=True, cwd=root_dir, timeout=30, check=False
-    )
+    try:
+        diff_result = subprocess.run(
+            ["git", "diff", merge_base, "HEAD", "--name-only", "--no-color"],
+            capture_output=True, text=True, cwd=root_dir, timeout=30, check=False
+        )
+    except (subprocess.TimeoutExpired, OSError):
+        return None
     if diff_result.returncode != 0:
         return None
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
for ref in [compare_branch, f"origin/{compare_branch}"]:
    try:
        merge_base_result = subprocess.run(
            ["git", "merge-base", ref, "HEAD"],
            capture_output=True, text=True, cwd=root_dir, timeout=30, check=False
        )
    except (subprocess.TimeoutExpired, OSError):
        continue
    if merge_base_result.returncode == 0:
        break
else:
    return None
merge_base = merge_base_result.stdout.strip()
if not merge_base:
    return None

try:
    diff_result = subprocess.run(
        ["git", "diff", merge_base, "HEAD", "--name-only", "--no-color"],
        capture_output=True, text=True, cwd=root_dir, timeout=30, check=False
    )
except (subprocess.TimeoutExpired, OSError):
    return None
```
🧰 Tools
🪛 Ruff (0.15.2)
[error] 569-569: subprocess call: check for execution of untrusted input (S603)
[error] 570-570: Starting a process with a partial executable path (S607)
[error] 581-581: subprocess call: check for execution of untrusted input (S603)
[error] 582-582: Starting a process with a partial executable path (S607)
```python
changed_fpp = {f for f in changed_files if f.endswith(".fpp")}
if not changed_fpp:
    return [], list(cases)
```
Do not skip all tests for non-.fpp source edits.
If a PR changes src/**/*.f90 but no .fpp, this returns “run none,” which can bypass testing for real code changes.
💡 Conservative fallback tweak

```diff
 changed_fpp = {f for f in changed_files if f.endswith(".fpp")}
+changed_fortran_nonfpp = {
+    f for f in changed_files
+    if f.startswith("src/") and f.endswith((".f90", ".f"))
+}
+if changed_fortran_nonfpp and not changed_fpp:
+    # Cache currently maps .fpp coverage only; non-.fpp source edits are not representable.
+    return list(cases), []
 if not changed_fpp:
     return [], list(cases)
```

```python
skipped_cases += cases
cases = []
else:
```
Avoid passing empty pruned sets into shard validation.
When pruning yields zero runnable tests, this path keeps flowing and can hit the shard guard at Line 190, raising an error for valid “nothing to run” cases (e.g., docs-only PRs or sparse shard hits).
💡 Proposed fix

```diff
 if not changed_fpp:
     cons.print()
     cons.print("[bold cyan]Coverage Change Analysis[/bold cyan]")
     cons.print("-" * 50)
     cons.print("[green]No .fpp source changes detected — skipping all tests.[/green]")
     cons.print("-" * 50)
     cons.print()
-    skipped_cases += cases
-    cases = []
+    return [], skipped_cases + cases
 else:
     cons.print()
     cons.print("[bold cyan]Coverage Change Analysis[/bold cyan]")
     cons.print("-" * 50)
     for fpp_file in sorted(changed_fpp):
         cons.print(f"  [green]*[/green] {fpp_file}")
     cases, new_skipped = filter_tests_by_coverage(cases, cache, changed_files)
     skipped_cases += new_skipped
     cons.print(f"\n[bold]Tests to run: {len(cases)} / {len(cases) + len(new_skipped)}[/bold]")
     cons.print("-" * 50)
     cons.print()
+if not cases:
+    return [], skipped_cases
```

Also applies to: 152-154
```python
# --only-changes: filter based on file-level gcov coverage
if ARG("only_changes"):
    from .coverage import (  # pylint: disable=import-outside-toplevel
        load_coverage_cache, get_changed_files,
        should_run_all_tests, filter_tests_by_coverage,
    )

    cache = load_coverage_cache(common.MFC_ROOT_DIR)
    if cache is None:
        cons.print("[yellow]Coverage cache missing or stale.[/yellow]")
        cons.print("[yellow]Run: ./mfc.sh build --gcov -j 8 && ./mfc.sh test --build-coverage-cache[/yellow]")
        cons.print("[yellow]Falling back to full test suite.[/yellow]")
    else:
        changed_files = get_changed_files(common.MFC_ROOT_DIR, ARG("changes_branch"))

        if changed_files is None:
            cons.print("[yellow]git diff failed — falling back to full test suite.[/yellow]")
        elif should_run_all_tests(changed_files):
            cons.print()
            cons.print("[bold cyan]Coverage Change Analysis[/bold cyan]")
            cons.print("-" * 50)
            cons.print("[yellow]Infrastructure or macro file changed — running full test suite.[/yellow]")
            cons.print("-" * 50)
        else:
            changed_fpp = {f for f in changed_files if f.endswith(".fpp")}
            if not changed_fpp:
                cons.print()
                cons.print("[bold cyan]Coverage Change Analysis[/bold cyan]")
                cons.print("-" * 50)
                cons.print("[green]No .fpp source changes detected — skipping all tests.[/green]")
                cons.print("-" * 50)
                cons.print()
                skipped_cases += cases
                cases = []
            else:
```
1. Non-.fpp PR skips tests (🐞 Bug, ✓ Correctness)
With --only-changes, the filter path skips *all* tests when no .fpp files changed; CI now enables --only-changes for PRs, so many PRs (CMake/Python/workflow/.f90) can end up with zero functional tests. This is made worse by the TEMP-reduced ALWAYS_RUN_ALL/prefixes that currently won’t force a full run for infrastructure changes.
Agent Prompt
### Issue description
`--only-changes` currently skips the entire suite when there are no `.fpp` changes, but CI now enables this flag for PRs. This can result in CI running **zero functional tests** for many PRs (Python/toolchain, CMake, workflow changes, `.f90` changes) unless those paths are explicitly classified as “always run all”.
### Issue Context
- CI’s `checkall` filter triggers test jobs for many non-`.fpp` changes.
- The pruning logic only looks at `.fpp` changes and explicitly empties the test list when there are none.
- The intended escape hatch list is TEMP-stripped in this PR.
### Fix Focus Areas
- toolchain/mfc/test/test.py[111-145]
- toolchain/mfc/test/coverage.py[39-56]
- .github/workflows/test.yml[255-281]
- .github/file-filter.yml[4-38]
### Suggested direction
- In `__filter`, if `--only-changes` is set and there are **no relevant source changes**, default to a safe behavior (e.g., run full suite, or run a minimal smoke subset) unless the change-set is clearly “docs-only”.
- Restore/expand `ALWAYS_RUN_ALL`/`ALWAYS_RUN_ALL_PREFIXES` to include infrastructure inputs that can invalidate the cache or the toolchain itself (e.g., `CMakeLists.txt`, `toolchain/**`, `.github/workflows/**`, etc.).
- Consider updating CI to set `ONLY_CHANGES` only when the PR actually changes Fortran sources (or when a fresh cache artifact is present).
ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
```python
def load_coverage_cache(root_dir: str) -> Optional[dict]:
    """
    Load the coverage cache, returning None if missing or stale.

    Staleness is detected by comparing the SHA256 of cases.py at cache-build time
    against the current cases.py. Auto-converts old line-level format if needed.
    """
    if not COVERAGE_CACHE_PATH.exists():
        return None

    try:
        with gzip.open(COVERAGE_CACHE_PATH, "rt", encoding="utf-8") as f:
            cache = json.load(f)
    except (OSError, gzip.BadGzipFile, json.JSONDecodeError, UnicodeDecodeError):
        cons.print("[yellow]Warning: Coverage cache is unreadable or corrupt.[/yellow]")
        return None

    cases_py = Path(root_dir) / "toolchain/mfc/test/cases.py"
    try:
        current_hash = hashlib.sha256(cases_py.read_bytes()).hexdigest()
    except FileNotFoundError:
        cons.print("[yellow]Warning: cases.py not found; cannot verify cache staleness.[/yellow]")
        return None
    stored_hash = cache.get("_meta", {}).get("cases_hash", "")

    if current_hash != stored_hash:
        cons.print("[yellow]Warning: Coverage cache is stale (cases.py changed).[/yellow]")
        return None

    return _normalize_cache(cache)
```
2. Cache staleness check too weak (🐞 Bug, ✓ Correctness)
Cache validity is only checked via cases.py hash, so dependency-graph/build changes can leave a stale cache that still “looks valid” and causes incorrect pruning. CI will still pass --only-changes even when the cache rebuild job is skipped or fails to provide a new artifact.
Agent Prompt
### Issue description
The coverage cache can become stale due to Fortran dependency graph changes or other build/source changes without modifying `cases.py`. Because `load_coverage_cache` only checks `cases.py` hash, the toolchain may treat a stale cache as valid and incorrectly prune tests.
### Issue Context
- CI attempts to rebuild cache on `dep_changed`, but `--only-changes` is still enabled on PRs regardless, and cache downloads are best-effort.
- If rebuild fails or the artifact is missing, CI may still prune using an older committed cache that is no longer correct.
### Fix Focus Areas
- toolchain/mfc/test/coverage.py[520-549]
- .github/workflows/test.yml[95-149]
- .github/workflows/test.yml[190-197]
- .github/workflows/test.yml[255-281]
### Suggested direction
- Add additional `_meta` fields to the cache (e.g., hash of a dependency-graph input, or a hash of relevant Fortran source/include trees) and validate them in `load_coverage_cache`.
- In CI, if `dep_changed == true`, only enable `--only-changes` when the rebuilt cache artifact was successfully produced and downloaded; otherwise force full suite (or omit `--only-changes`).
- Consider making artifact download non-optional (remove `continue-on-error`) when pruning is enabled based on it.
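The suggested `_meta` hardening can be sketched as storing one extra hash and checking it on load. The field name `dep_graph_hash` and the choice to hash the `src/` Fortran tree are assumptions for illustration, not the toolchain's actual schema:

```python
import hashlib
from pathlib import Path

def tree_hash(root: Path, patterns=("*.fpp", "*.f90")) -> str:
    """Stable digest over the Fortran sources feeding the dependency graph."""
    h = hashlib.sha256()
    for pat in patterns:
        for p in sorted(root.rglob(pat)):
            h.update(p.as_posix().encode())  # include the path so renames count
            h.update(p.read_bytes())
    return h.hexdigest()

def cache_is_fresh(cache: dict, root: Path) -> bool:
    meta = cache.get("_meta", {})
    cases = root / "toolchain/mfc/test/cases.py"
    if meta.get("cases_hash") != hashlib.sha256(cases.read_bytes()).hexdigest():
        return False  # existing check: cases.py changed
    # New check: any .fpp/.f90 change invalidates the cache, not just cases.py.
    return meta.get("dep_graph_hash") == tree_hash(root / "src")
```

Hashing the whole source tree is coarser than hashing the dependency graph itself, but it fails safe: any edit that could rewire `use`/`include` edges forces a full run until the cache is rebuilt.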
```fortran
use m_variables_conversion !< State variables type conversion procedures

use m_helper_basic !< Functions to compare floating point numbers
use m_helper_basic
```
3. Duplicate use statement 🐞 Bug ✓ Correctness

`m_bubbles.fpp` now contains a duplicate `use m_helper_basic`, which should be reverted before merge as noted in the PR description. Leaving it in can introduce compiler warnings or stricter build failures depending on flags.
Agent Prompt
### Issue description
`src/simulation/m_bubbles.fpp` has a duplicate `use m_helper_basic` statement.
### Issue Context
The PR description marks this as TEMP to exercise CI dep-change detection; it should not be merged.
### Fix Focus Areas
- src/simulation/m_bubbles.fpp[16-20]
Heavy 3D QBMM tests with gcov instrumentation need more time. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Claude Code Review

Head SHA:

Summary:
Findings

[BLOCKER] TEMP code must be removed before merge

The PR body's test plan explicitly says "Revert TEMP changes before merge," but three TEMP items remain:
[SECURITY] Self-hosted runner checks out PR head SHA
```yaml
ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }}
```

The job runs on a self-hosted runner, so checking out and executing PR head code there is a risk.

[BUG]

```python
case.params.update(get_post_process_mods(case.params))
```

This is applied to all test cases regardless of whether post_process is in that test's targets.

[MINOR] Phase 2 cap comment says "Phase 1"
```python
# Cap Phase 1 parallelism: each test spawns MPI processes (~500MB each)...
phase1_jobs = min(n_jobs, 32)
```

Phase 1 (test preparation) is a sequential step; the cap actually governs Phase 2, so the comment is mislabeled.

[MINOR] Dep-change detection comment says "added/removed" but only checks added lines
```shell
# Detect added/removed use/include statements that change the
# Fortran dependency graph
```

The comment promises added/removed detection, but the grep only matches added lines.

Overall this is a well-designed feature with good conservative fallbacks. The TEMP items are the primary blocker, the duplicate `use` in `m_bubbles.fpp` among them.
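The comment-line false positives flagged in both reviews can be avoided by stripping the diff's `+` prefix and skipping Fortran `!` comments before matching. A sketch of that idea; the regex and helper name are assumptions, not the workflow's actual grep:

```python
import re

# Matches Fortran/fpp dependency statements: "use m_foo", "include 'x'",
# "#:include" (fpp). This pattern is an illustrative assumption.
DEP_RE = re.compile(r"^\s*(use\s+\w+|#?:?include\b)", re.IGNORECASE)

def dep_changed(diff_text: str) -> bool:
    """True if the unified diff adds a line that alters the dependency graph."""
    for line in diff_text.splitlines():
        if not line.startswith("+") or line.startswith("+++"):
            continue  # only added lines; skip '+++ b/...' file headers
        body = line[1:].lstrip()
        if body.startswith("!"):
            continue  # Fortran comment, not a real dependency edge
        if DEP_RE.match(body):
            return True
    return False
```

Because only `+` lines are scanned, a removed `use` will not trigger a rebuild under this sketch either; covering removals would mean also scanning `-` lines (minus the `---` header).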
Codecov Report

✅ All modified and coverable lines are covered by tests.

Additional details and impacted files

@@ Coverage Diff @@
## master #1284 +/- ##
=======================================
Coverage 44.04% 44.04%
=======================================
Files 70 70
Lines 20499 20499
Branches 1993 1993
=======================================
Hits 9028 9028
Misses 10330 10330
Partials 1141 1141

☔ View full report in Codecov by Sentry.
Summary
- File-level gcov coverage cache maps 555 test UUIDs to exercised `.fpp` source files (gzip JSON, 11KB, committed to repo)
- `--only-changes` flag prunes tests by intersecting PR-changed files against coverage cache
- `--build-coverage-cache` flag + 3-phase parallel cache builder (prepare, run, gcov collect)
- `rebuild-cache` CI job runs on Phoenix via SLURM when `cases.py` or Fortran dependency graph changes
- Dep-change detection greps PR/push diffs for added `use`/`include` statements
- Conservative fallbacks: missing cache runs all; `ALWAYS_RUN_ALL` files trigger full suite
- Remove `continue-on-error` from github CI job (fixes auto-cancellation)
- TEMP: duplicate `use` in `m_bubbles.fpp` + remove `CMakeLists.txt` from `ALWAYS_RUN_ALL` to test the full cache rebuild pipeline in CI

Replaces #1283.
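The `--only-changes` pruning described in the summary reduces to an intersection test per UUID, with a conservative fallback when a test has no coverage entry. A sketch assuming a `{uuid: [covered files]}` cache shape:

```python
def select_tests(cache: dict, changed_files: set, all_uuids: list) -> list:
    """Keep tests whose coverage set intersects the PR's changed files.

    Tests missing from the cache are kept (conservative fallback), matching
    the "missing sim coverage includes test" behavior the PR describes.
    """
    selected = []
    for uuid in all_uuids:
        covered = cache.get(uuid)
        if covered is None:
            selected.append(uuid)            # no coverage data: run it anyway
        elif changed_files & set(covered):
            selected.append(uuid)            # test exercises a changed file
    return selected
```

A PR that touches only one `.fpp` file then runs just the tests whose gcov trace includes that file, plus any tests with unknown coverage.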
Test plan
- `rebuild-cache` job triggers (dep_changed detection)
- `--only-changes`

🤖 Generated with Claude Code