
Openmp cce #1035

Merged
anandrdbz merged 221 commits into MFlowCode:master from prathi-wind:openmp_cce
Nov 14, 2025

@prathi-wind (Contributor) commented Nov 10, 2025

User description

Description

Adds OpenMP support to MFC for the Cray compiler. Passes all test cases and adds CI testing on Frontier.

Also adds the mixed-precision mode developed for Gordon Bell.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Something else

PR Type

Enhancement, Bug fix


Description

  • Add OpenMP support to MFC on the Cray Compiler with comprehensive case optimization

  • Implement mixed precision support throughout the codebase for improved performance and memory efficiency

  • Refactor boundary condition arrays from negative indexing (-1:1) to positive indexing (1:2) for optimization

  • Add case optimization guards to skip unnecessary 3D computations in 2D simulations (WENO, viscous, MHD, capillary)

  • Introduce simplex noise generation module for procedural perturbation support

  • Fix GPU compatibility issues in chemistry and FFT modules for both NVIDIA and AMD compilers

  • Update MPI operations to support mixed precision with proper type conversions and 64-bit indexing

  • Refactor bubble Eulerian projection data structure for improved memory layout

  • Fix MUSCL workspace array naming conflicts and GPU parallel loop syntax issues

  • Add CI testing on Frontier supercomputer for Cray Compiler validation


Diagram Walkthrough

flowchart LR
  A["Cray Compiler<br/>OpenMP Support"] --> B["Case Optimization<br/>2D/3D Guards"]
  A --> C["Mixed Precision<br/>wp/stp Types"]
  B --> D["WENO, Viscous,<br/>MHD Modules"]
  C --> E["Boundary Arrays<br/>MPI Operations"]
  C --> F["Bubble EL,<br/>QBMM Modules"]
  G["GPU Compatibility<br/>Fixes"] --> H["Chemistry<br/>FFT Modules"]
  I["Simplex Noise<br/>Module"] --> J["Perturbation<br/>Support"]
  D --> K["Performance<br/>Optimization"]
  E --> K
  H --> K

File Walkthrough

Relevant files
Enhancement
17 files
m_boundary_common.fpp
Refactor boundary arrays and add case optimization support

src/common/m_boundary_common.fpp

  • Changed boundary condition array indexing from (-1:1) to (1:2) to
    support case optimization
  • Updated all references from negative indices (-1, 1) to positive
    indices (1, 2) throughout the module
  • Added conditional compilation directives #:if not
    MFC_CASE_OPTIMIZATION or num_dims > 2 to skip 3D boundary operations
    in optimized 2D cases
  • Changed data type from real(wp) to real(stp) for mixed precision
    support in buffer parameters
  • Updated MPI I/O operations to handle mixed precision with MPI_BYTE and
    element count calculations
+249/-231
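When MPI I/O moves data as MPI_BYTE, the count argument is a number of bytes, so the element count has to be scaled by the storage size of each element. The sketch below shows only that arithmetic, in Python; the helper name `mpi_byte_count` and the precision-to-size table are hypothetical and are not taken from MFC.

```python
# Hypothetical table: bytes occupied by one element at each storage precision.
BYTES_PER_ELEMENT = {"half": 2, "single": 4, "double": 8}


def mpi_byte_count(n_elements: int, precision: str) -> int:
    """Count to pass to an MPI I/O call whose datatype is MPI_BYTE."""
    if n_elements < 0:
        raise ValueError("element count must be non-negative")
    return n_elements * BYTES_PER_ELEMENT[precision]


# A 100^3 field stored in single precision occupies 4,000,000 bytes.
print(mpi_byte_count(100**3, "single"))  # 4000000
```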
m_weno.fpp
Add case optimization guards to WENO reconstruction schemes

src/simulation/m_weno.fpp

  • Added #:include 'case.fpp' directive for case optimization support
  • Wrapped WENO order 5 and 7 reconstruction code with conditional
    compilation #:if not MFC_CASE_OPTIMIZATION or weno_num_stencils > N
  • Nested TENO stencil calculations within additional optimization guards
    for higher-order schemes
  • Maintained existing WENO algorithm logic while enabling compile-time
    optimization for reduced stencil cases
+286/-274
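The guard `not MFC_CASE_OPTIMIZATION or weno_num_stencils > N` is evaluated by the Fypp preprocessor, so a block whose guard is false is never compiled at all. The Python sketch below mimics that selection; the block names and thresholds in `GUARDS` are hypothetical placeholders, not the values used in m_weno.fpp.

```python
# Hypothetical guarded blocks and the N each guard compares against.
GUARDS = {"order5_block": 1, "order7_block": 2, "teno_extra_block": 3}


def emitted_blocks(case_optimization: bool, weno_num_stencils: int) -> list:
    """Names of blocks whose guard `not case_optimization or stencils > N` holds."""
    return [
        name
        for name, n in GUARDS.items()
        if not case_optimization or weno_num_stencils > n
    ]


# Without case optimization every block is compiled in.
print(emitted_blocks(False, 2))  # ['order5_block', 'order7_block', 'teno_extra_block']
# With case optimization, only blocks the stencil count actually needs survive.
print(emitted_blocks(True, 2))  # ['order5_block']
```

Because the choice is made before compilation, an optimized build carries no runtime branch for the skipped stencils, but it is tied to the case parameters it was built with.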
m_bubbles_EL.fpp
Refactor bubble Eulerian projection data structure for mixed precision

src/simulation/m_bubbles_EL.fpp

  • Changed q_beta from type(vector_field) to type(scalar_field),
    dimension(:), allocatable for better memory layout
  • Updated allocation from q_beta%vf(1:q_beta_idx) to
    q_beta(1:q_beta_idx) with individual scalar field setup
  • Changed data type from real(wp) to real(stp) for mixed precision
    support in gradient calculations
  • Updated s_gradient_dir subroutine signature to accept raw arrays
    instead of scalar field types
  • Updated all references from q_beta%vf(i)%sf to q_beta(i)%sf throughout
    the module
+50/-48 
m_boundary_conditions.fpp
Update boundary condition array indexing to positive indices

src/pre_process/m_boundary_conditions.fpp

  • Updated boundary condition array indexing from (-1:1) to (1:2) in
    function signatures
  • Added conditional logic to map old negative indices (-1) to new
    positive index (1) and (1) to (2)
  • Updated all bc_type array accesses to use new indexing scheme with
    explicit if-statements for location mapping
+20/-12 
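The old bc_type layout (1:3, -1:1) reserves three slots per direction even though only the two ends (-1 and 1) are referenced; (1:2) stores just those two. A minimal Python sketch of the location mapping described above, assuming -1 denotes the beginning boundary and 1 the end boundary (`bc_slot` is a hypothetical name):

```python
def bc_slot(loc: int) -> int:
    """Map an old boundary location (-1 or 1) to its new 1-based slot (1 or 2)."""
    if loc == -1:
        return 1
    if loc == 1:
        return 2
    raise ValueError(f"unexpected boundary location: {loc}")


# The same mapping can be written branch-free as (loc + 3) // 2.
for loc in (-1, 1):
    assert bc_slot(loc) == (loc + 3) // 2
print([bc_slot(loc) for loc in (-1, 1)])  # [1, 2]
```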
m_viscous.fpp
Add case optimization conditionals to viscous module         

src/simulation/m_viscous.fpp

  • Added case.fpp include directive for case optimization support
  • Wrapped viscous stress tensor computation blocks with
    MFC_CASE_OPTIMIZATION conditional to skip 3D-specific calculations in
    2D cases
  • Reorganized gradient computation loops with conditional compilation
    for dimensional optimization
  • Improved code indentation and structure for better readability
+319/-316
m_riemann_solvers.fpp
Optimize MHD calculations and improve GPU parallelization

src/simulation/m_riemann_solvers.fpp

  • Added case.fpp include for case optimization support
  • Wrapped MHD-related 3D vector operations with MFC_CASE_OPTIMIZATION
    conditionals to optimize 2D cases
  • Expanded private variable lists in GPU parallel loops so each thread
    gets its own copy of loop temporaries
  • Fixed duplicate variable assignments and improved variable
    initialization in HLLC solver
  • Added explicit type conversions for mixed precision support
+78/-35 
m_mpi_common.fpp
Support mixed precision and dynamic dimensionality in MPI

src/common/m_mpi_common.fpp

  • Added case.fpp include directive for case optimization
  • Changed halo_size from integer to integer(kind=8) for 64-bit support
  • Fixed dimension indexing to use num_dims instead of hardcoded value 3
    for 2D/3D compatibility
  • Added explicit type conversions between wp and stp precision kinds in
    MPI buffer operations
+23/-23 
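A default 32-bit integer tops out at 2,147,483,647, and a halo buffer's element count is a product of several extents, so large 3D runs can exceed it. The worked example below uses a hypothetical size formula (ghost layers × variables × cells on one face) and made-up extents purely to show the arithmetic; `ctypes.c_int32` reproduces the wraparound a 32-bit integer would typically exhibit on common hardware (in Fortran, integer overflow is not defined behavior).

```python
import ctypes

INT32_MAX = 2**31 - 1


def halo_elements(ghost_layers: int, n_vars: int, face_a: int, face_b: int) -> int:
    """Hypothetical halo size: ghost layers x variables x cells on one face."""
    return ghost_layers * n_vars * face_a * face_b


size = halo_elements(4, 10, 8192, 8192)
print(size, size > INT32_MAX)  # 2684354560 True
# ctypes does no overflow checking, so this shows the wrapped 32-bit value.
print(ctypes.c_int32(size).value)  # -1610612736
```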
m_cbc.fpp
Optimize CBC module with case-specific conditionals           

src/simulation/m_cbc.fpp

  • Added case.fpp include for case optimization support
  • Moved dpres_ds variable declaration from module level to local scope
    within subroutine
  • Added dpres_ds to private variable list in GPU parallel loop
  • Wrapped dpi_inf_dt assignment with MFC_CASE_OPTIMIZATION conditional
    for single-fluid optimization
  • Improved GPU update directive formatting
+7/-8     
m_global_parameters.fpp
Add simplex noise perturbation parameters                               

src/pre_process/m_global_parameters.fpp

  • Added new simplex_perturb logical flag for simplex noise perturbation
    support
  • Added simplex_params derived type variable to store simplex noise
    parameters
  • Initialized simplex perturbation parameters in module initialization
    routine
+12/-0   
m_qbmm.fpp
Mixed precision support and case optimization for QBMM     

src/simulation/m_qbmm.fpp

  • Changed pb parameter type from real(wp) to real(stp) for mixed
    precision support
  • Wrapped bubble model coefficient calculations with preprocessor
    conditionals for case optimization
  • Added missing loop variables i1, i2, j to GPU parallel loop private
    list
  • Added TODO comment regarding rhs_pb and rhs_mv precision types
+98/-87 
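With pb held in a storage kind (stp) narrower than the working kind (wp), every store rounds the value and every load widens it again. The Python sketch below imitates that round trip with the `struct` module, assuming stp is IEEE single precision (`'f'`) or half precision (`'e'`) and wp is double; which kinds MFC actually selects is defined in m_precision_select.f90 and is not reproduced here. `to_storage` is a hypothetical helper.

```python
import struct


def to_storage(x: float, fmt: str = "f") -> float:
    """Round a double-precision value to a narrower IEEE format and widen it back.

    fmt="f" models single precision, fmt="e" models half precision.
    """
    return struct.unpack(fmt, struct.pack(fmt, x))[0]


wp_value = 0.1
stp_value = to_storage(wp_value)
print(stp_value)  # 0.10000000149011612
print(abs(stp_value - wp_value))  # rounding error introduced by the store
print(to_storage(0.5))  # 0.5 (exactly representable, so no error)
```

In Fortran, an actual argument passed through an explicit interface must match the kind of its dummy argument, which is one reason conversions such as real(..., wp) have to be written out wherever wp and stp differ.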
m_simplex_noise.fpp
New simplex noise generation module                                           

src/pre_process/m_simplex_noise.fpp

  • New module implementing 3D and 2D Perlin simplex noise functions
  • Includes gradient lookup tables and permutation vectors for noise
    generation
  • Provides f_simplex3d and f_simplex2d functions for procedural noise
+245/-0 
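For readers unfamiliar with simplex noise, the Python sketch below follows Stefan Gustavson's widely circulated 2D reference formulation: skew the input onto a triangular lattice, find the three corners of the enclosing simplex, and sum radially attenuated gradient contributions. It is not a transcription of f_simplex2d; the gradient table, permutation seeding, and final scale factor of 70 come from the reference algorithm and may differ from the module.

```python
import math
import random

F2 = 0.5 * (math.sqrt(3.0) - 1.0)  # skews (x, y) onto the simplex lattice
G2 = (3.0 - math.sqrt(3.0)) / 6.0  # unskews lattice coordinates back

# 2D projections of the 12 edge gradients used in the reference implementation
GRAD2 = [(1, 1), (-1, 1), (1, -1), (-1, -1),
         (1, 0), (-1, 0), (1, 0), (-1, 0),
         (0, 1), (0, -1), (0, 1), (0, -1)]


def make_perm(seed: int = 0) -> list:
    """Shuffled 0..255, doubled so lookups never need wrapping."""
    p = list(range(256))
    random.Random(seed).shuffle(p)
    return p + p


def simplex2d(x: float, y: float, perm: list) -> float:
    """2D simplex noise, roughly in [-1, 1]."""
    s = (x + y) * F2
    i, j = math.floor(x + s), math.floor(y + s)
    t = (i + j) * G2
    x0, y0 = x - (i - t), y - (j - t)  # offset from the first corner
    i1, j1 = (1, 0) if x0 > y0 else (0, 1)  # which of the two triangles we are in
    x1, y1 = x0 - i1 + G2, y0 - j1 + G2  # offset from the middle corner
    x2, y2 = x0 - 1.0 + 2.0 * G2, y0 - 1.0 + 2.0 * G2  # offset from the last corner
    ii, jj = i & 255, j & 255
    corners = (
        (perm[ii + perm[jj]] % 12, x0, y0),
        (perm[ii + i1 + perm[jj + j1]] % 12, x1, y1),
        (perm[ii + 1 + perm[jj + 1]] % 12, x2, y2),
    )
    total = 0.0
    for gi, dx, dy in corners:
        falloff = 0.5 - dx * dx - dy * dy  # contribution vanishes beyond radius sqrt(0.5)
        if falloff > 0.0:
            gx, gy = GRAD2[gi]
            total += falloff**4 * (gx * dx + gy * dy)
    return 70.0 * total


perm = make_perm(seed=42)
print(simplex2d(0.0, 0.0, perm))  # 0.0 at a lattice point
print(simplex2d(0.3, 1.7, perm))
```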
m_fftw.fpp
FFT filter GPU implementation refactoring                               

src/simulation/m_fftw.fpp

  • Refactored GPU FFT filter implementation to use direct device
    addresses instead of pointers
  • Removed #:block UNDEF_CCE wrapper and simplified GPU data management
  • Restructured loop logic to process initial ring separately before main
    loop
  • Improved compatibility with both NVIDIA and AMD GPU compilers
+115/-130
m_start_up.fpp
Mixed precision I/O and time stepping improvements             

src/simulation/m_start_up.fpp

  • Changed WP_MOK from 8 bytes to 4 bytes for mixed precision I/O
  • Updated MPI file read calls to use mpi_io_type multiplier for data
    size
  • Added logic to copy q_cons_ts(1) to q_cons_ts(2) for multi-stage time
    steppers
  • Fixed NaN checking to use explicit real() conversion for mixed
    precision
  • Added GPU updates for bubble and chemistry parameters
  • Removed unused GPU declare directive for q_cons_temp
+88/-54 
m_rhs.fpp
Mixed precision and GPU compatibility in RHS module           

src/simulation/m_rhs.fpp

  • Added conditional GPU declare directives for OpenACC-specific
    declarations
  • Changed loop variable types to integer(kind=8) for large array
    indexing
  • Wrapped flux allocation logic with if (.not. igr) condition
  • Updated bc_type parameter dimensions from (1:3, -1:1) to (1:3, 1:2)
  • Changed pb_in, mv_in parameters to real(stp) type for mixed precision
  • Added TODO comments about precision type consistency
+57/-45 
m_icpp_patches.fpp
Mixed precision support for patch identification arrays   

src/pre_process/m_icpp_patches.fpp

  • Added conditional compilation for patch_id_fp array type based on
    MFC_MIXED_PRECISION flag
  • Uses integer(kind=1) for mixed precision mode, standard integer
    otherwise
  • Applied consistently across all patch subroutines for memory
    efficiency
+75/-4   
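Storing patch IDs as integer(kind=1) cuts the array from (typically) four bytes per cell to one, but a signed one-byte integer can only hold values up to 127. The Python sketch below uses the standard `array` module to show both effects; typecode 'b' (signed char) and 'i' (C int) stand in for the Fortran kinds, and treating kind=1 as a one-byte integer is an assumption, since Fortran kind numbers are compiler-specific. `patch_id_bytes` is a hypothetical helper.

```python
from array import array


def patch_id_bytes(n_cells: int, compact: bool) -> int:
    """Bytes needed for one patch ID per cell ('b' = 1-byte, 'i' = C int)."""
    typecode = "b" if compact else "i"
    return n_cells * array(typecode).itemsize


n_cells = 256**3
print(patch_id_bytes(n_cells, compact=False))  # 4 bytes per cell on common platforms
print(patch_id_bytes(n_cells, compact=True))  # 16777216

# A signed 8-bit integer cannot represent patch ID 128.
try:
    array("b", [128])
except OverflowError as exc:
    print("overflow:", exc)
```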
m_ibm.fpp
Mixed precision support for IBM module                                     

src/simulation/m_ibm.fpp

  • Changed pb_in and mv_in parameters to real(stp) type for mixed
    precision
  • Added missing variables to GPU parallel loop private list
  • Added type casting for pb_in and mv_in in interpolation calls
  • Changed levelset_in access to use explicit real() conversion
  • Updated pb_in, mv_in intent from INOUT to IN in interpolation
    subroutine
+7/-5     
inline_capillary.fpp
Case optimization for capillary tensor calculations           

src/simulation/include/inline_capillary.fpp

  • Wrapped 3D capillary tensor calculations with preprocessor conditional
  • Only computes 3D components when not using case optimization or when
    num_dims > 2
  • Reduces unnecessary computations for 2D simulations
+7/-6     
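Assuming the capillary tensor is symmetric, a 2D run needs only three independent components, while 3D adds three more (xz, yz, zz); those extra components are the kind of work the guard lets a case-optimized 2D build drop. A small Python enumeration of the independent components (the function name is hypothetical):

```python
from itertools import combinations_with_replacement


def independent_components(num_dims: int) -> list:
    """Independent entries of a symmetric num_dims x num_dims tensor."""
    axes = "xyz"[:num_dims]
    return ["".join(pair) for pair in combinations_with_replacement(axes, 2)]


print(independent_components(2))  # ['xx', 'xy', 'yy']
print(independent_components(3))  # ['xx', 'xy', 'xz', 'yy', 'yz', 'zz']
extra_3d = sorted(set(independent_components(3)) - set(independent_components(2)))
print(extra_3d)  # ['xz', 'yz', 'zz']
```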
Bug fix
2 files
m_muscl.fpp
Rename MUSCL arrays and fix GPU loop syntax                           

src/simulation/m_muscl.fpp

  • Renamed MUSCL workspace arrays from v_rs_ws_x/y/z to
    v_rs_ws_x/y/z_muscl to avoid naming conflicts
  • Fixed GPU parallel loop syntax and indentation consistency
  • Corrected #:endcall directives to include full macro name for clarity
  • Improved code formatting and nested subroutine indentation
+158/-158
m_chemistry.fpp
GPU compatibility fixes for chemistry module                         

src/common/m_chemistry.fpp

  • Removed sequential GPU loop directives that were causing compilation
    issues
  • Added temporary variable T_in for type conversion in temperature
    calculations
  • Wrapped large GPU parallel loop with #:block UNDEF_AMD to handle AMD
    compiler differences
  • Added missing variables to GPU parallel loop private and copyin lists
+113/-112
Additional files
48 files
bench.yml +8/-0     
submit-bench.sh +1/-1     
submit-bench.sh +1/-1     
test.yml +0/-4     
CMakeLists.txt +18/-7   
case.py +104/-0 
load_amd.sh +7/-0     
setupNB.sh +5/-0     
3dHardcodedIC.fpp +113/-2 
macros.fpp +5/-4     
omp_macros.fpp +25/-1   
m_derived_types.fpp +26/-9   
m_helper.fpp +6/-6     
m_precision_select.f90 +14/-0   
m_variables_conversion.fpp +10/-34 
m_data_input.f90 +11/-10 
m_assign_variables.fpp +18/-4   
m_checker.fpp +23/-0   
m_data_output.fpp +32/-34 
m_initial_condition.fpp +35/-14 
m_mpi_proxy.fpp +29/-2   
m_perturbation.fpp +90/-1   
m_start_up.fpp +3/-2     
m_acoustic_src.fpp +8/-11   
m_body_forces.fpp +1/-1     
m_bubbles_EE.fpp +4/-4     
m_bubbles_EL_kernels.fpp +15/-15 
m_data_output.fpp +25/-26 
m_derived_variables.fpp +15/-13 
m_global_parameters.fpp +6/-11   
m_hyperelastic.fpp +1/-1     
m_hypoelastic.fpp +4/-4     
m_igr.fpp +1616/-1385
m_mhd.fpp +1/-1     
m_sim_helpers.fpp +31/-26 
m_surface_tension.fpp +38/-34 
m_time_steppers.fpp +86/-37 
p_main.fpp +2/-0     
golden-metadata.txt +154/-0 
golden.txt +10/-0   
build.py +2/-1     
lock.py +1/-1     
case_dicts.py +16/-0   
input.py +1/-1     
state.py +3/-3     
modules +4/-0     
pyproject.toml +3/-2     
frontier.mako +11/-10 

prathi-wind and others added 30 commits July 22, 2025 13:16
…, started work on enter and exit data, compiles
…e, add mappers to derived types, change how allocate is done
…types, removed rest of pure functions, fix issue with acoustic on nvfortran

Three earlier comments, including ones from chatgpt-codex-connector[bot] and cursor[bot], were marked as resolved.

Contributor

This file needs to be deleted, moved to misc/, or incorporated into the toolchain somehow. This PR doesn't add AMD compilers anyway, so my suggestion would be to delete it for now.

@wilfonba (Contributor) left a comment

Approve for benchmark

@wilfonba (Contributor) commented Nov 13, 2025

I addressed a bunch of my own comments and pushed the changes. Hopefully the weird Phoenix test failure goes away; it didn't look like an MFC code problem. The rest of the comments are more relevant to general support for half precision, so I don't think they're very urgent at the moment.

@wilfonba self-requested a review November 13, 2025 22:54

@wilfonba (Contributor) left a comment

Approve for benchmark

@anandrdbz merged commit 7169711 into MFlowCode:master Nov 14, 2025
44 of 47 checks passed


7 participants