Refactor m_riemann_solvers Module (HLLC Solver Subroutine) #912

sbryngelson merged 16 commits into MFlowCode:master from

Conversation
PR Reviewer Guide 🔍

Here are some key observations to aid the review process:
Pull Request Overview
This PR refactors the s_hllc_riemann_solver subroutine in m_riemann_solvers.fpp to streamline variable initialization, consolidate energy adjustment logic, and introduce a loop_end helper for fluid loops.
- Consolidated zero-initialization of primitive-state variables at the start of each cell loop
- Merged hypoelastic and hyperelastic energy adjustments into a single conditional block
- Introduced loop_end to simplify fluid-count branching
PR Code Suggestions ✨

Explore these optional code suggestions:
Pushing this after ensuring that the previously failing tests now pass.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Codecov Report

❌ Patch coverage is
Additional details and impacted files

@@ Coverage Diff @@
## master #912 +/- ##
==========================================
- Coverage 40.92% 40.91% -0.02%
==========================================
Files 70 70
Lines 20299 20270 -29
Branches 2521 2520 -1
==========================================
- Hits 8307 8293 -14
+ Misses 10454 10439 -15
Partials 1538 1538

☔ View full report in Codecov by Sentry.
if (mpp_lim) then
    $:GPU_LOOP(parallelism='[seq]')
    do i = 1, num_fluids
        qL_prim_rs${XYZ}$_vf(j, k, l, i) = max(0._wp, qL_prim_rs${XYZ}$_vf(j, k, l, i))
        qL_prim_rs${XYZ}$_vf(j, k, l, E_idx + i) = min(max(0._wp, qL_prim_rs${XYZ}$_vf(j, k, l, E_idx + i)), 1._wp)
        alpha_L_sum = alpha_L_sum + qL_prim_rs${XYZ}$_vf(j, k, l, E_idx + i)
    end do

    $:GPU_LOOP(parallelism='[seq]')
    do i = 1, num_fluids
        qL_prim_rs${XYZ}$_vf(j, k, l, E_idx + i) = qL_prim_rs${XYZ}$_vf(j, k, l, E_idx + i)/max(alpha_L_sum, sgm_eps)
    end do

    $:GPU_LOOP(parallelism='[seq]')
    do i = 1, num_fluids
        qR_prim_rs${XYZ}$_vf(j + 1, k, l, i) = max(0._wp, qR_prim_rs${XYZ}$_vf(j + 1, k, l, i))
        qR_prim_rs${XYZ}$_vf(j + 1, k, l, E_idx + i) = min(max(0._wp, qR_prim_rs${XYZ}$_vf(j + 1, k, l, E_idx + i)), 1._wp)
        alpha_L_sum = alpha_L_sum + qL_prim_rs${XYZ}$_vf(j, k, l, E_idx + i)
        alpha_R_sum = alpha_R_sum + qR_prim_rs${XYZ}$_vf(j + 1, k, l, E_idx + i)
    end do

    $:GPU_LOOP(parallelism='[seq]')
    do i = 1, num_fluids
        qL_prim_rs${XYZ}$_vf(j, k, l, E_idx + i) = qL_prim_rs${XYZ}$_vf(j, k, l, E_idx + i)/max(alpha_L_sum, sgm_eps)
        qR_prim_rs${XYZ}$_vf(j + 1, k, l, E_idx + i) = qR_prim_rs${XYZ}$_vf(j + 1, k, l, E_idx + i)/max(alpha_R_sum, sgm_eps)
    end do
end if
Suggestion: Guard against the case where all volume fractions are zero after limiting. If both alpha_*_sum are zero, the normalization keeps zeros and can cause later divisions by rho_* or invalid EOS mixtures. Ensure a fallback that assigns a minimum fraction to a valid phase to keep sums positive. [possible issue, importance: 8]
Existing code:

if (mpp_lim) then
    $:GPU_LOOP(parallelism='[seq]')
    do i = 1, num_fluids
        qL_prim_rs${XYZ}$_vf(j, k, l, i) = max(0._wp, qL_prim_rs${XYZ}$_vf(j, k, l, i))
        qL_prim_rs${XYZ}$_vf(j, k, l, E_idx + i) = min(max(0._wp, qL_prim_rs${XYZ}$_vf(j, k, l, E_idx + i)), 1._wp)
        alpha_L_sum = alpha_L_sum + qL_prim_rs${XYZ}$_vf(j, k, l, E_idx + i)
    end do
    $:GPU_LOOP(parallelism='[seq]')
    do i = 1, num_fluids
        qL_prim_rs${XYZ}$_vf(j, k, l, E_idx + i) = qL_prim_rs${XYZ}$_vf(j, k, l, E_idx + i)/max(alpha_L_sum, sgm_eps)
    end do
    $:GPU_LOOP(parallelism='[seq]')
    do i = 1, num_fluids
        qR_prim_rs${XYZ}$_vf(j + 1, k, l, i) = max(0._wp, qR_prim_rs${XYZ}$_vf(j + 1, k, l, i))
        qR_prim_rs${XYZ}$_vf(j + 1, k, l, E_idx + i) = min(max(0._wp, qR_prim_rs${XYZ}$_vf(j + 1, k, l, E_idx + i)), 1._wp)
        alpha_L_sum = alpha_L_sum + qL_prim_rs${XYZ}$_vf(j, k, l, E_idx + i)
        alpha_R_sum = alpha_R_sum + qR_prim_rs${XYZ}$_vf(j + 1, k, l, E_idx + i)
    end do
    $:GPU_LOOP(parallelism='[seq]')
    do i = 1, num_fluids
        qL_prim_rs${XYZ}$_vf(j, k, l, E_idx + i) = qL_prim_rs${XYZ}$_vf(j, k, l, E_idx + i)/max(alpha_L_sum, sgm_eps)
        qR_prim_rs${XYZ}$_vf(j + 1, k, l, E_idx + i) = qR_prim_rs${XYZ}$_vf(j + 1, k, l, E_idx + i)/max(alpha_R_sum, sgm_eps)
    end do
end if

Suggested code:

if (mpp_lim) then
    $:GPU_LOOP(parallelism='[seq]')
    do i = 1, num_fluids
        qL_prim_rs${XYZ}$_vf(j, k, l, i) = max(0._wp, qL_prim_rs${XYZ}$_vf(j, k, l, i))
        qL_prim_rs${XYZ}$_vf(j, k, l, E_idx + i) = min(max(0._wp, qL_prim_rs${XYZ}$_vf(j, k, l, E_idx + i)), 1._wp)
        qR_prim_rs${XYZ}$_vf(j + 1, k, l, i) = max(0._wp, qR_prim_rs${XYZ}$_vf(j + 1, k, l, i))
        qR_prim_rs${XYZ}$_vf(j + 1, k, l, E_idx + i) = min(max(0._wp, qR_prim_rs${XYZ}$_vf(j + 1, k, l, E_idx + i)), 1._wp)
        alpha_L_sum = alpha_L_sum + qL_prim_rs${XYZ}$_vf(j, k, l, E_idx + i)
        alpha_R_sum = alpha_R_sum + qR_prim_rs${XYZ}$_vf(j + 1, k, l, E_idx + i)
    end do
    if (alpha_L_sum <= sgm_eps) then
        qL_prim_rs${XYZ}$_vf(j, k, l, E_idx + 1) = 1._wp
        alpha_L_sum = 1._wp
    end if
    if (alpha_R_sum <= sgm_eps) then
        qR_prim_rs${XYZ}$_vf(j + 1, k, l, E_idx + 1) = 1._wp
        alpha_R_sum = 1._wp
    end if
    $:GPU_LOOP(parallelism='[seq]')
    do i = 1, num_fluids
        qL_prim_rs${XYZ}$_vf(j, k, l, E_idx + i) = qL_prim_rs${XYZ}$_vf(j, k, l, E_idx + i)/max(alpha_L_sum, sgm_eps)
        qR_prim_rs${XYZ}$_vf(j + 1, k, l, E_idx + i) = qR_prim_rs${XYZ}$_vf(j + 1, k, l, E_idx + i)/max(alpha_R_sum, sgm_eps)
    end do
end if
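The vacuum fallback the suggestion describes can be illustrated with a small scalar sketch. This is Python for brevity, not MFC Fortran; the function name limit_and_normalize and the sgm_eps default are illustrative assumptions:

```python
# Illustrative sketch of the suggested limiting logic, not MFC source:
# clamp each volume fraction into [0, 1]; if the clamped sum underflows,
# assign the whole cell to the first phase so the sum stays positive,
# then normalize back to a partition of unity.

def limit_and_normalize(alphas, sgm_eps=1e-16):
    clamped = [min(max(0.0, a), 1.0) for a in alphas]
    total = sum(clamped)
    if total <= sgm_eps:
        # vacuum fallback: keep the sum positive with one valid phase
        clamped = [1.0] + [0.0] * (len(clamped) - 1)
        total = 1.0
    return [a / total for a in clamped]
```

For example, limit_and_normalize([-0.1, -0.2]) yields [1.0, 0.0] rather than an all-zero state that would later divide by zero.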
E_L = gamma_L*pres_L + pi_inf_L + 5.e-1_wp*rho_L*vel_L_rms + qv_L
E_R = gamma_R*pres_R + pi_inf_R + 5.e-1_wp*rho_R*vel_R_rms + qv_R
Suggestion: Protect against zero or negative rho_* before using in energy/enthalpy computations. During limiting or reconstruction, rho_* can be zero, leading to NaNs downstream when computing H_* or wave speeds. Clamp to a minimum positive value. [possible issue, importance: 8]
Existing code:

E_L = gamma_L*pres_L + pi_inf_L + 5.e-1_wp*rho_L*vel_L_rms + qv_L
E_R = gamma_R*pres_R + pi_inf_R + 5.e-1_wp*rho_R*vel_R_rms + qv_R

Suggested code:

rho_L = max(rho_L, sgm_eps)
rho_R = max(rho_R, sgm_eps)
E_L = gamma_L*pres_L + pi_inf_L + 5.e-1_wp*rho_L*vel_L_rms + qv_L
E_R = gamma_R*pres_R + pi_inf_R + 5.e-1_wp*rho_R*vel_R_rms + qv_R
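As a scalar illustration of why clamping the density matters (Python sketch; safe_enthalpy is a hypothetical helper, not an MFC routine), dividing by a clamped density keeps quantities such as the total enthalpy finite even at a vacuum state:

```python
import math

# Hypothetical scalar analogue of the guarded enthalpy H = (E + p)/rho:
# clamping rho by sgm_eps keeps the division finite when limiting or
# reconstruction drives the density to zero.

def safe_enthalpy(E, pres, rho, sgm_eps=1e-16):
    return (E + pres) / max(rho, sgm_eps)
```

With rho = 0 the unguarded expression raises a division error (or produces Inf/NaN in floating point), while the guarded version returns a large but finite value.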
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
    qL_prim_rs${XYZ}$_vf(j, k, l, i) = max(0._wp, qL_prim_rs${XYZ}$_vf(j, k, l, i))
    qL_prim_rs${XYZ}$_vf(j, k, l, E_idx + i) = min(max(0._wp, qL_prim_rs${XYZ}$_vf(j, k, l, E_idx + i)), 1._wp)
    alpha_L_sum = alpha_L_sum + qL_prim_rs${XYZ}$_vf(j, k, l, E_idx + i)
end do

$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
    qL_prim_rs${XYZ}$_vf(j, k, l, E_idx + i) = qL_prim_rs${XYZ}$_vf(j, k, l, E_idx + i)/max(alpha_L_sum, sgm_eps)
end do

$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
    qR_prim_rs${XYZ}$_vf(j + 1, k, l, i) = max(0._wp, qR_prim_rs${XYZ}$_vf(j + 1, k, l, i))
    qR_prim_rs${XYZ}$_vf(j + 1, k, l, E_idx + i) = min(max(0._wp, qR_prim_rs${XYZ}$_vf(j + 1, k, l, E_idx + i)), 1._wp)
    alpha_L_sum = alpha_L_sum + qL_prim_rs${XYZ}$_vf(j, k, l, E_idx + i)
    alpha_R_sum = alpha_R_sum + qR_prim_rs${XYZ}$_vf(j + 1, k, l, E_idx + i)
end do

$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
    qL_prim_rs${XYZ}$_vf(j, k, l, E_idx + i) = qL_prim_rs${XYZ}$_vf(j, k, l, E_idx + i)/max(alpha_L_sum, sgm_eps)
    qR_prim_rs${XYZ}$_vf(j + 1, k, l, E_idx + i) = qR_prim_rs${XYZ}$_vf(j + 1, k, l, E_idx + i)/max(alpha_R_sum, sgm_eps)
end do
Suggestion: Guard against the case where all alpha values clamp to zero after limiting, which would make alpha_*_sum be zero and leave all alpha entries at zero after normalization. This can lead to rho=0 and divisions by zero downstream when forming H and wave speeds. If the sum is below sgm_eps, redistribute uniformly (or keep the largest component) to ensure a valid partition of unity. [possible issue, importance: 8]
Existing code:

$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
    qL_prim_rs${XYZ}$_vf(j, k, l, i) = max(0._wp, qL_prim_rs${XYZ}$_vf(j, k, l, i))
    qL_prim_rs${XYZ}$_vf(j, k, l, E_idx + i) = min(max(0._wp, qL_prim_rs${XYZ}$_vf(j, k, l, E_idx + i)), 1._wp)
    alpha_L_sum = alpha_L_sum + qL_prim_rs${XYZ}$_vf(j, k, l, E_idx + i)
end do
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
    qL_prim_rs${XYZ}$_vf(j, k, l, E_idx + i) = qL_prim_rs${XYZ}$_vf(j, k, l, E_idx + i)/max(alpha_L_sum, sgm_eps)
end do
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
    qR_prim_rs${XYZ}$_vf(j + 1, k, l, i) = max(0._wp, qR_prim_rs${XYZ}$_vf(j + 1, k, l, i))
    qR_prim_rs${XYZ}$_vf(j + 1, k, l, E_idx + i) = min(max(0._wp, qR_prim_rs${XYZ}$_vf(j + 1, k, l, E_idx + i)), 1._wp)
    alpha_L_sum = alpha_L_sum + qL_prim_rs${XYZ}$_vf(j, k, l, E_idx + i)
    alpha_R_sum = alpha_R_sum + qR_prim_rs${XYZ}$_vf(j + 1, k, l, E_idx + i)
end do
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
    qL_prim_rs${XYZ}$_vf(j, k, l, E_idx + i) = qL_prim_rs${XYZ}$_vf(j, k, l, E_idx + i)/max(alpha_L_sum, sgm_eps)
    qR_prim_rs${XYZ}$_vf(j + 1, k, l, E_idx + i) = qR_prim_rs${XYZ}$_vf(j + 1, k, l, E_idx + i)/max(alpha_R_sum, sgm_eps)
end do

Suggested code:

$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
    qL_prim_rs${XYZ}$_vf(j, k, l, i) = max(0._wp, qL_prim_rs${XYZ}$_vf(j, k, l, i))
    qL_prim_rs${XYZ}$_vf(j, k, l, E_idx + i) = min(max(0._wp, qL_prim_rs${XYZ}$_vf(j, k, l, E_idx + i)), 1._wp)
    qR_prim_rs${XYZ}$_vf(j + 1, k, l, i) = max(0._wp, qR_prim_rs${XYZ}$_vf(j + 1, k, l, i))
    qR_prim_rs${XYZ}$_vf(j + 1, k, l, E_idx + i) = min(max(0._wp, qR_prim_rs${XYZ}$_vf(j + 1, k, l, E_idx + i)), 1._wp)
    alpha_L_sum = alpha_L_sum + qL_prim_rs${XYZ}$_vf(j, k, l, E_idx + i)
    alpha_R_sum = alpha_R_sum + qR_prim_rs${XYZ}$_vf(j + 1, k, l, E_idx + i)
end do
if (alpha_L_sum < sgm_eps) then
    $:GPU_LOOP(parallelism='[seq]')
    do i = 1, num_fluids
        qL_prim_rs${XYZ}$_vf(j, k, l, E_idx + i) = 1._wp/real(num_fluids, wp)
    end do
    alpha_L_sum = 1._wp
end if
if (alpha_R_sum < sgm_eps) then
    $:GPU_LOOP(parallelism='[seq]')
    do i = 1, num_fluids
        qR_prim_rs${XYZ}$_vf(j + 1, k, l, E_idx + i) = 1._wp/real(num_fluids, wp)
    end do
    alpha_R_sum = 1._wp
end if
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
    qL_prim_rs${XYZ}$_vf(j, k, l, E_idx + i) = qL_prim_rs${XYZ}$_vf(j, k, l, E_idx + i)/alpha_L_sum
    qR_prim_rs${XYZ}$_vf(j + 1, k, l, E_idx + i) = qR_prim_rs${XYZ}$_vf(j + 1, k, l, E_idx + i)/alpha_R_sum
end do
E_L = gamma_L*pres_L + pi_inf_L + 5.e-1_wp*rho_L*vel_L_rms + qv_L
E_R = gamma_R*pres_R + pi_inf_R + 5.e-1_wp*rho_R*vel_R_rms + qv_R

H_L = (E_L + pres_L)/rho_L
Suggestion: Protect divisions by rho_L and rho_R in enthalpy when density can approach zero (e.g., after limiting or at vacuum states). Clamp densities with max(rho, sgm_eps) to avoid NaNs and ensure stable wave speed estimates in the HLLC solver. [possible issue, importance: 8]
New proposed code:
E_L = gamma_L*pres_L + pi_inf_L + 5.e-1_wp*rho_L*vel_L_rms + qv_L
E_R = gamma_R*pres_R + pi_inf_R + 5.e-1_wp*rho_R*vel_R_rms + qv_R
-H_L = (E_L + pres_L)/rho_L
-H_R = (E_R + pres_R)/rho_R
+H_L = (E_L + pres_L)/max(rho_L, sgm_eps)
+H_R = (E_R + pres_R)/max(rho_R, sgm_eps)

$:GPU_LOOP(parallelism='[seq]')
do i = 1, 2
    Re_L(i) = dflt_real
    Re_R(i) = dflt_real
    if (Re_size(i) > 0) Re_L(i) = 0._wp
    if (Re_size(i) > 0) Re_R(i) = 0._wp
    $:GPU_LOOP(parallelism='[seq]')
    do q = 1, Re_size(i)
        Re_L(i) = qL_prim_rs${XYZ}$_vf(j, k, l, E_idx + Re_idx(i, q))/Res(i, q) &
                  + Re_L(i)
    end do
    Re_L(i) = 1._wp/max(Re_L(i), sgm_eps)
end do

$:GPU_LOOP(parallelism='[seq]')
do i = 1, 2
    Re_R(i) = dflt_real
    if (Re_size(i) > 0) Re_R(i) = 0._wp
    $:GPU_LOOP(parallelism='[seq]')
    do q = 1, Re_size(i)
        Re_R(i) = qR_prim_rs${XYZ}$_vf(j + 1, k, l, E_idx + Re_idx(i, q))/Res(i, q) &
                  + Re_R(i)
    end do
    Re_L(i) = 1._wp/max(Re_L(i), sgm_eps)
    Re_R(i) = 1._wp/max(Re_R(i), sgm_eps)
end do
Suggestion: Prevent division by zero if any Res(i,q) entries are zero or extremely small due to user input or preprocessing. Apply max(Res(i,q), sgm_eps) in the denominator when accumulating to ensure finite Reynolds numbers. [general, importance: 7]
Existing code:

$:GPU_LOOP(parallelism='[seq]')
do i = 1, 2
    Re_L(i) = dflt_real
    Re_R(i) = dflt_real
    if (Re_size(i) > 0) Re_L(i) = 0._wp
    if (Re_size(i) > 0) Re_R(i) = 0._wp
    $:GPU_LOOP(parallelism='[seq]')
    do q = 1, Re_size(i)
        Re_L(i) = qL_prim_rs${XYZ}$_vf(j, k, l, E_idx + Re_idx(i, q))/Res(i, q) &
                  + Re_L(i)
    end do
    Re_L(i) = 1._wp/max(Re_L(i), sgm_eps)
end do
$:GPU_LOOP(parallelism='[seq]')
do i = 1, 2
    Re_R(i) = dflt_real
    if (Re_size(i) > 0) Re_R(i) = 0._wp
    $:GPU_LOOP(parallelism='[seq]')
    do q = 1, Re_size(i)
        Re_R(i) = qR_prim_rs${XYZ}$_vf(j + 1, k, l, E_idx + Re_idx(i, q))/Res(i, q) &
                  + Re_R(i)
    end do
    Re_L(i) = 1._wp/max(Re_L(i), sgm_eps)
    Re_R(i) = 1._wp/max(Re_R(i), sgm_eps)
end do

Suggested code:

$:GPU_LOOP(parallelism='[seq]')
do i = 1, 2
    Re_L(i) = dflt_real
    Re_R(i) = dflt_real
    if (Re_size(i) > 0) Re_L(i) = 0._wp
    if (Re_size(i) > 0) Re_R(i) = 0._wp
    $:GPU_LOOP(parallelism='[seq]')
    do q = 1, Re_size(i)
        Re_L(i) = qL_prim_rs${XYZ}$_vf(j, k, l, E_idx + Re_idx(i, q))/max(Res(i, q), sgm_eps) &
                  + Re_L(i)
        Re_R(i) = qR_prim_rs${XYZ}$_vf(j + 1, k, l, E_idx + Re_idx(i, q))/max(Res(i, q), sgm_eps) &
                  + Re_R(i)
    end do
    Re_L(i) = 1._wp/max(Re_L(i), sgm_eps)
    Re_R(i) = 1._wp/max(Re_R(i), sgm_eps)
end do
why do we have two loops when we could have one
Answer:
In the original code, Re_L and Re_R were computed in two separate loops (first for the left state, then for the right). Functionally, both loops perform the same accumulation over q for each i, then apply the same stabilization 1/max(..., sgm_eps). There’s no data dependency between the left and right accumulations, so keeping them separate wasn’t necessary.
In this refactor, they’re consolidated into a single loop over i that:
- Initializes both Re_L(i) and Re_R(i),
- Accumulates both left and right sums in the same inner loop over q,
- Finalizes both with the same normalization.
Benefits of one loop:
- Reduces loop overhead and improves cache locality by touching Re_idx/Res and the corresponding qL/qR entries once per i.
- Lowers code duplication, making maintenance less error-prone (e.g., any future fixes like adding max(Res(i,q), sgm_eps) can be made in one place).
- Better GPU mapping: fewer separate GPU_LOOP regions and improved kernel fusion potential.
In short, there’s no correctness need for two loops; a single loop computes both sides independently and more efficiently.
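The fusion described above can be sketched in a few lines. This is illustrative Python, not the Fortran kernel; fused_sums and its list arguments are assumptions standing in for the Re_idx/Res-indexed accumulations:

```python
# One pass computes both independent accumulations, mirroring how a
# fused loop would build the Re_L(i) and Re_R(i) sums in a single sweep.

def fused_sums(qL, qR, res):
    inv_L = 0.0
    inv_R = 0.0
    for qL_q, qR_q, res_q in zip(qL, qR, res):
        inv_L += qL_q / res_q  # left-state term
        inv_R += qR_q / res_q  # right-state term, same iteration
    return inv_L, inv_R
```

Because the two accumulations never read each other's results, fusing them changes neither value; it only halves the number of passes over the shared index data.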
User description
Description
Reduced PR from (#855).

Essentially, I refactored math-critical items in the s_hllc_riemann_solver subroutine.

PR Type
Enhancement
Description
- Refactor HLLC Riemann solver variable initialization
- Consolidate loop structures for better performance
- Optimize Reynolds number calculations
- Streamline pressure and energy computations
Diagram Walkthrough
File Walkthrough
m_riemann_solvers.fpp (src/simulation/m_riemann_solvers.fpp)

HLLC Riemann solver performance optimization
organization
computations