Openmp cce by prathi-wind · Pull Request #1035 · MFlowCode/MFC

prathi-wind · 2025-11-10T20:13:16Z

User description

Description

Add OpenMP support to MFC on the Cray Compiler. Passes all test cases, and adds CI to test on Frontier.

Also adds the mixed precision support that was developed for Gordon Bell.

Type of change

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Something else

PR Type

Enhancement, Bug fix

Description

Add OpenMP support to MFC on the Cray Compiler with comprehensive case optimization
Implement mixed precision support throughout the codebase for improved performance and memory efficiency
Refactor boundary condition arrays from negative indexing (-1:1) to positive indexing (1:2) for optimization
Add case optimization guards to skip unnecessary 3D computations in 2D simulations (WENO, viscous, MHD, capillary)
Introduce simplex noise generation module for procedural perturbation support
Fix GPU compatibility issues in chemistry and FFT modules for both NVIDIA and AMD compilers
Update MPI operations to support mixed precision with proper type conversions and 64-bit indexing
Refactor bubble Eulerian projection data structure for improved memory layout
Fix MUSCL workspace array naming conflicts and GPU parallel loop syntax issues
Add CI testing on Frontier supercomputer for Cray Compiler validation
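
As a rough illustration of the case optimization guards listed above: Fypp evaluates directive conditions as Python expressions, so the condition `not MFC_CASE_OPTIMIZATION or num_dims > 2` quoted in this PR's file walkthrough can be reasoned about directly in Python. The sketch below is not MFC code. The function name `compiles_3d_block` is hypothetical, and it only tabulates when a guarded 3D block would be kept.

```python
def compiles_3d_block(case_optimization: bool, num_dims: int) -> bool:
    """Evaluate the guard `not MFC_CASE_OPTIMIZATION or num_dims > 2`.

    Hypothetical helper for illustration only; the arguments stand in for
    the preprocessor values MFC_CASE_OPTIMIZATION and num_dims.
    """
    return (not case_optimization) or num_dims > 2


if __name__ == "__main__":
    # Tabulate the guard for every combination of flag and dimensionality.
    for case_optimization in (False, True):
        for num_dims in (1, 2, 3):
            kept = compiles_3d_block(case_optimization, num_dims)
            print(f"case_optimization={case_optimization} "
                  f"num_dims={num_dims} keep_3d_block={kept}")
```

Under this reading, the 3D block is dropped only when case optimization is on and the case has fewer than three dimensions. Without case optimization the block is always kept, since the guard's first operand is already true.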

Diagram Walkthrough

flowchart LR
  A["Cray Compiler<br/>OpenMP Support"] --> B["Case Optimization<br/>2D/3D Guards"]
  A --> C["Mixed Precision<br/>wp/stp Types"]
  B --> D["WENO, Viscous,<br/>MHD Modules"]
  C --> E["Boundary Arrays<br/>MPI Operations"]
  C --> F["Bubble EL,<br/>QBMM Modules"]
  G["GPU Compatibility<br/>Fixes"] --> H["Chemistry<br/>FFT Modules"]
  I["Simplex Noise<br/>Module"] --> J["Perturbation<br/>Support"]
  D --> K["Performance<br/>Optimization"]
  E --> K
  H --> K
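
The mixed precision branch of the diagram also changes how I/O sizes are computed: the file walkthrough in this PR mentions `MPI_BYTE` element counts, a change of `WP_MOK` from 8 bytes to 4 bytes, and `integer(kind=8)` sizes. A minimal sketch of that arithmetic follows, assuming an 8-byte working precision and a 4-byte storage precision. The helper names are hypothetical, and the actual kinds depend on how MFC is built.

```python
INT32_MAX = 2**31 - 1  # largest value of a default 32-bit signed integer


def byte_count(num_elements: int, bytes_per_element: int) -> int:
    """Number of bytes to transfer when data is described as raw bytes."""
    if num_elements < 0 or bytes_per_element <= 0:
        raise ValueError("expected a non-negative count and a positive size")
    return num_elements * bytes_per_element


def needs_64bit_index(count: int) -> bool:
    """True when a count no longer fits in a 32-bit signed integer."""
    return count > INT32_MAX


if __name__ == "__main__":
    cells = 1024**3  # illustrative grid size, not taken from the PR
    for label, size in (("8-byte reals", 8), ("4-byte reals", 4)):
        n_bytes = byte_count(cells, size)
        print(label, n_bytes, "needs 64-bit:", needs_64bit_index(n_bytes))
```

Halving the element size halves the byte count, but for large grids the count can still exceed the 32-bit range, which is one plausible reason for the 64-bit size variables mentioned in the walkthrough.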

File Walkthrough

Relevant files

Enhancement

17 files

m_boundary_common.fpp `Refactor boundary arrays and add case optimization support` src/common/m_boundary_common.fpp	+249/-231
Changed boundary condition array indexing from `(-1:1)` to `(1:2)` to support case optimization
Updated all references from negative indices `(-1, 1)` to positive indices `(1, 2)` throughout the module
Added conditional compilation directives `#:if not MFC_CASE_OPTIMIZATION or num_dims > 2` to skip 3D boundary operations in optimized 2D cases
Changed data type from `real(wp)` to `real(stp)` for mixed precision support in buffer parameters
Updated MPI I/O operations to handle mixed precision with `MPI_BYTE` and element count calculations

m_weno.fpp `Add case optimization guards to WENO reconstruction schemes` src/simulation/m_weno.fpp	+286/-274
Added `#:include 'case.fpp'` directive for case optimization support
Wrapped WENO order 5 and 7 reconstruction code with conditional compilation `#:if not MFC_CASE_OPTIMIZATION or weno_num_stencils > N`
Nested TENO stencil calculations within additional optimization guards for higher-order schemes
Maintained existing WENO algorithm logic while enabling compile-time optimization for reduced stencil cases

m_bubbles_EL.fpp `Refactor bubble Eulerian projection data structure for mixed precision` src/simulation/m_bubbles_EL.fpp	+50/-48
Changed `q_beta` from `type(vector_field)` to `type(scalar_field), dimension(:), allocatable` for better memory layout
Updated allocation from `q_beta%vf(1:q_beta_idx)` to `q_beta(1:q_beta_idx)` with individual scalar field setup
Changed data type from `real(wp)` to `real(stp)` for mixed precision support in gradient calculations
Updated `s_gradient_dir` subroutine signature to accept raw arrays instead of scalar field types
Updated all references from `q_beta%vf(i)%sf` to `q_beta(i)%sf` throughout the module

m_boundary_conditions.fpp `Update boundary condition array indexing to positive indices` src/pre_process/m_boundary_conditions.fpp	+20/-12
Updated boundary condition array indexing from `(-1:1)` to `(1:2)` in function signatures
Added conditional logic to map old negative indices `(-1)` to new positive index `(1)` and `(1)` to `(2)`
Updated all `bc_type` array accesses to use new indexing scheme with explicit if-statements for location mapping

m_viscous.fpp `Add case optimization conditionals to viscous module` src/simulation/m_viscous.fpp	+319/-316
Added `case.fpp` include directive for case optimization support
Wrapped viscous stress tensor computation blocks with `MFC_CASE_OPTIMIZATION` conditional to skip 3D-specific calculations in 2D cases
Reorganized gradient computation loops with conditional compilation for dimensional optimization
Improved code indentation and structure for better readability

m_riemann_solvers.fpp `Optimize MHD calculations and improve GPU parallelization` src/simulation/m_riemann_solvers.fpp	+78/-35
Added `case.fpp` include for case optimization support
Wrapped MHD-related 3D vector operations with `MFC_CASE_OPTIMIZATION` conditionals to optimize 2D cases
Expanded private variable lists in GPU parallel loops for better memory management
Fixed duplicate variable assignments and improved variable initialization in HLLC solver
Added explicit type conversions for mixed precision support

m_mpi_common.fpp `Support mixed precision and dynamic dimensionality in MPI` src/common/m_mpi_common.fpp	+23/-23
Added `case.fpp` include directive for case optimization
Changed `halo_size` from `integer` to `integer(kind=8)` for 64-bit support
Fixed dimension indexing to use `num_dims` instead of hardcoded value 3 for 2D/3D compatibility
Added explicit type conversions between `wp` and `stp` precision kinds in MPI buffer operations

m_cbc.fpp `Optimize CBC module with case-specific conditionals` src/simulation/m_cbc.fpp	+7/-8
Added `case.fpp` include for case optimization support
Moved `dpres_ds` variable declaration from module level to local scope within subroutine
Added `dpres_ds` to private variable list in GPU parallel loop
Wrapped `dpi_inf_dt` assignment with `MFC_CASE_OPTIMIZATION` conditional for single-fluid optimization
Improved GPU update directive formatting

m_global_parameters.fpp `Add simplex noise perturbation parameters` src/pre_process/m_global_parameters.fpp	+12/-0
Added new `simplex_perturb` logical flag for simplex noise perturbation support
Added `simplex_params` derived type variable to store simplex noise parameters
Initialized simplex perturbation parameters in module initialization routine

m_qbmm.fpp `Mixed precision support and case optimization for QBMM` src/simulation/m_qbmm.fpp	+98/-87
Changed `pb` parameter type from `real(wp)` to `real(stp)` for mixed precision support
Wrapped bubble model coefficient calculations with preprocessor conditionals for case optimization
Added missing loop variables `i1, i2, j` to GPU parallel loop private list
Added TODO comment regarding `rhs_pb` and `rhs_mv` precision types

m_simplex_noise.fpp `New simplex noise generation module` src/pre_process/m_simplex_noise.fpp	+245/-0
New module implementing 3D and 2D Perlin simplex noise functions
Includes gradient lookup tables and permutation vectors for noise generation
Provides `f_simplex3d` and `f_simplex2d` functions for procedural noise

m_fftw.fpp `FFT filter GPU implementation refactoring` src/simulation/m_fftw.fpp	+115/-130
Refactored GPU FFT filter implementation to use direct device addresses instead of pointers
Removed `#:block UNDEF_CCE` wrapper and simplified GPU data management
Restructured loop logic to process initial ring separately before main loop
Improved compatibility with both NVIDIA and AMD GPU compilers

m_start_up.fpp `Mixed precision I/O and time stepping improvements` src/simulation/m_start_up.fpp	+88/-54
Changed `WP_MOK` from 8 bytes to 4 bytes for mixed precision I/O
Updated MPI file read calls to use `mpi_io_type` multiplier for data size
Added logic to copy `q_cons_ts(1)` to `q_cons_ts(2)` for multi-stage time steppers
Fixed NaN checking to use explicit `real()` conversion for mixed precision
Added GPU updates for bubble and chemistry parameters
Removed unused GPU declare directive for `q_cons_temp`

m_rhs.fpp `Mixed precision and GPU compatibility in RHS module` src/simulation/m_rhs.fpp	+57/-45
Added conditional GPU declare directives for OpenACC-specific declarations
Changed loop variable types to `integer(kind=8)` for large array indexing
Wrapped flux allocation logic with `if (.not. igr)` condition
Updated `bc_type` parameter dimension from `1:3, -1:1` to `1:3, 1:2`
Changed `pb_in`, `mv_in` parameters to `real(stp)` type for mixed precision
Added TODO comments about precision type consistency

m_icpp_patches.fpp `Mixed precision support for patch identification arrays` src/pre_process/m_icpp_patches.fpp	+75/-4
Added conditional compilation for `patch_id_fp` array type based on `MFC_MIXED_PRECISION` flag
Uses `integer(kind=1)` for mixed precision mode, standard `integer` otherwise
Applied consistently across all patch subroutines for memory efficiency

m_ibm.fpp `Mixed precision support for IBM module` src/simulation/m_ibm.fpp	+7/-5
Changed `pb_in` and `mv_in` parameters to `real(stp)` type for mixed precision
Added missing variables to GPU parallel loop private list
Added type casting for `pb_in` and `mv_in` in interpolation calls
Changed `levelset_in` access to use explicit `real()` conversion
Updated `pb_in, mv_in` intent from INOUT to IN in interpolation subroutine

inline_capillary.fpp `Case optimization for capillary tensor calculations` src/simulation/include/inline_capillary.fpp	+7/-6
Wrapped 3D capillary tensor calculations with preprocessor conditional
Only computes 3D components when not using case optimization or when `num_dims > 2`
Reduces unnecessary computations for 2D simulations
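
The boundary index change described in the entries above (old location `-1` becomes `1`, old location `1` becomes `2`) can be sketched as a small lookup. This is only an illustration of the mapping stated in the walkthrough, not the Fortran code from the PR; the function names and the placeholder boundary codes are hypothetical.

```python
def remap_bc_location(old_loc: int) -> int:
    """Map an old boundary location index (-1 or 1) to the new one (1 or 2)."""
    mapping = {-1: 1, 1: 2}
    if old_loc not in mapping:
        raise ValueError(f"unsupported boundary location index: {old_loc}")
    return mapping[old_loc]


def remap_bc_table(old_table: dict) -> dict:
    """Re-key a {(direction, old_loc): value} table with the new indices."""
    return {
        (direction, remap_bc_location(old_loc)): value
        for (direction, old_loc), value in old_table.items()
    }


if __name__ == "__main__":
    # Placeholder boundary codes, keyed by (direction, old location index).
    old = {(1, -1): "code_a", (1, 1): "code_b", (2, -1): "code_c"}
    print(remap_bc_table(old))
```

A Fortran range of `-1:1` has three slots (`-1`, `0`, `1`) while `1:2` has two, so a mapping like this one also drops the middle slot, which the walkthrough does not describe as being used.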

Bug fix

2 files

m_muscl.fpp `Rename MUSCL arrays and fix GPU loop syntax` src/simulation/m_muscl.fpp	+158/-158
Renamed MUSCL workspace arrays from `v_rs_ws_x/y/z` to `v_rs_ws_x/y/z_muscl` to avoid naming conflicts
Fixed GPU parallel loop syntax and indentation consistency
Corrected `#:endcall` directives to include full macro name for clarity
Improved code formatting and nested subroutine indentation

m_chemistry.fpp `GPU compatibility fixes for chemistry module` src/common/m_chemistry.fpp	+113/-112
Removed sequential GPU loop directives that were causing compilation issues
Added temporary variable `T_in` for type conversion in temperature calculations
Wrapped large GPU parallel loop with `#:block UNDEF_AMD` to handle AMD compiler differences
Added missing variables to GPU parallel loop private and copyin lists

Additional files

48 files

bench.yml	+8/-0
submit-bench.sh	+1/-1
submit-bench.sh	+1/-1
test.yml	+0/-4
CMakeLists.txt	+18/-7
case.py	+104/-0
load_amd.sh	+7/-0
setupNB.sh	+5/-0
3dHardcodedIC.fpp	+113/-2
macros.fpp	+5/-4
omp_macros.fpp	+25/-1
m_derived_types.fpp	+26/-9
m_helper.fpp	+6/-6
m_precision_select.f90	+14/-0
m_variables_conversion.fpp	+10/-34
m_data_input.f90	+11/-10
m_assign_variables.fpp	+18/-4
m_checker.fpp	+23/-0
m_data_output.fpp	+32/-34
m_initial_condition.fpp	+35/-14
m_mpi_proxy.fpp	+29/-2
m_perturbation.fpp	+90/-1
m_start_up.fpp	+3/-2
m_acoustic_src.fpp	+8/-11
m_body_forces.fpp	+1/-1
m_bubbles_EE.fpp	+4/-4
m_bubbles_EL_kernels.fpp	+15/-15
m_data_output.fpp	+25/-26
m_derived_variables.fpp	+15/-13
m_global_parameters.fpp	+6/-11
m_hyperelastic.fpp	+1/-1
m_hypoelastic.fpp	+4/-4
m_igr.fpp	+1616/-1385
m_mhd.fpp	+1/-1
m_sim_helpers.fpp	+31/-26
m_surface_tension.fpp	+38/-34
m_time_steppers.fpp	+86/-37
p_main.fpp	+2/-0
golden-metadata.txt	+154/-0
golden.txt	+10/-0
build.py	+2/-1
lock.py	+1/-1
case_dicts.py	+16/-0
input.py	+1/-1
state.py	+3/-3
modules	+4/-0
pyproject.toml	+3/-2
frontier.mako	+11/-10

src/common/m_boundary_common.fpp

src/simulation/m_viscous.fpp

CMakeLists.txt

examples/3D_IGR_jet_1fluid/case.py

wilfonba · 2025-11-12T19:01:23Z

load_amd.sh

This file needs to be deleted, moved to misc/, or incorporated into the toolchain somehow. This PR doesn't add AMD compilers anyway, so my suggestion would be to delete it for now.

setupNB.sh

src/common/include/3dHardcodedIC.fpp

src/simulation/m_rhs.fpp

src/simulation/m_time_steppers.fpp

toolchain/pyproject.toml

src/simulation/p_main.fpp

wilfonba

Approve for benchmark

wilfonba · 2025-11-13T03:51:48Z

I addressed a bunch of my own comments and pushed the changes. Hopefully the weird Phoenix test failure goes away. It didn't look like an MFC code problem. The rest of the comments are more relevant to general support of half precision, so I don't think they're very urgent at the moment.

wilfonba

Approve for benchmark

src/common/m_boundary_common.fpp

prathi-wind and others added 30 commits July 22, 2025 13:16

Update mfc python and cmake to support OpenMP

8584e89

Fixed issue with not compiling on CPU builds

7521731

Temporary commit

90c6738

OMP parallel and parallel loop

06a783e

Removed pure markings

2abfad5

Added routine and declare and partial data, non compiling

db1b8c5

Some manual changes to codebase, and implemented attach

ed29d13

Changed parallel loop to also include the end parallel

c1b41a6

Ran formatter

29e9404

Fixed some issues with matching start and end of parallel loop macros, started work on enter and exit data, compiles

cec9867

Moved macro code to their corresponding file, and finished enter data, exit data, and update

33eca5f

remove line that sets default_val to empty string

11822f5

Fixed GPU_PARALLEL for omp and ran formatter

314fa13

Add syscheck of OpenMP, add omp support for GPU_HOST_DATA, ATOMIC, and WAIT

9d23036

Update var name

de586ad

Change how parallel loop is translated

84ddc01

Ran formatter

7246e9b

Remove extraneous build flags

d5381aa

Remove thermochem function calls

dbcb6f3

Remove LTO add always to data allocation omp, switch delete to release, add mappers to derived types, change how allocate is done

94222f4

Fixed parallel loop when no OpenMP or OpenACC

473d19d

Update how allocate macro works, update riemann solver, and parallel loop

5703c19

Passing most 1D cases

7021690

Fixed IGR 2D, readded parallel loop in cbc, undid changes in derived types, removed rest of pure functions, fix issue with acoustic on nvfortran

828d9d8

Forgot to add something for IGR and add back parallel loop for data output

cb02511

change binding + test suite works

69b5792

Added missing space

a01f262

Chemistry works with OpenACC and almost works with OpenMP

d7fbcab

Added a half-precision data type

b3c1ad1

Readd LTO to cmake

0d550aa

Copilot finished reviewing on behalf of sbryngelson November 12, 2025 17:21

qodo-code-review bot reviewed Nov 12, 2025

src/common/m_boundary_common.fpp (resolved)

src/simulation/m_viscous.fpp (resolved)

sbryngelson approved these changes Nov 12, 2025

Anand Radhakrishnan and others added 3 commits November 12, 2025 13:40

fix post_process benchmark

a774051

fix benchmarking on frontier

87fae79

fix benchmarking on frontier

771797b

sbryngelson approved these changes Nov 12, 2025

wilfonba reviewed Nov 12, 2025

wilfonba approved these changes Nov 12, 2025

wilfonba added 3 commits November 12, 2025 22:19

example case fix ups

f1afee3

CMakeLists.txt fix

e08bcae

Addressing more PR comments

b2ad78a

wilfonba and others added 6 commits November 12, 2025 23:03

fix boundary condition patches and simplex noise

88cbac0

more bug fixes

7ae0a65

Merge remote-tracking branch 'upstream/master' into openmp_cce

78ff9c6

format

4f2f579

fix IB markers seg fault

56097f8

Fix for benchmarking on frontier

8a66941

wilfonba self-requested a review November 13, 2025 22:54

wilfonba approved these changes Nov 13, 2025

Anand Radhakrishnan added 2 commits November 13, 2025 21:18

fix to make benchmark work on frontier

f5ac529

format

5fa68fa

wilfonba approved these changes Nov 14, 2025

sbryngelson reviewed Nov 14, 2025

src/common/m_boundary_common.fpp (resolved)

sbryngelson reviewed Nov 14, 2025

src/common/m_boundary_common.fpp (resolved)

anandrdbz merged commit 7169711 into MFlowCode:master Nov 14, 2025
44 of 47 checks passed

7 participants
