ExMatEx

Exascale Co-Design Center for Materials in Extreme Environments

About ExMatEx

This document was originally published at Los Alamos as LALP-11-020.

Abstract

Exascale computing presents an enormous opportunity for solving some of today’s most pressing problems, including clean energy production, nuclear reactor lifetime extension, and nuclear stockpile aging. At its core, each of these problems requires the prediction of material response to extreme environments. Our Center’s objective is to establish the interrelationship between software and hardware required for materials simulation at the exascale while developing a multiphysics simulation framework for modeling materials subjected to extreme mechanical and radiation environments. This will be accomplished via a focused effort in four primary areas:

  • Scale-bridging algorithms: Our science strategy is an uncertainty quantification (UQ)-driven adaptive physics refinement in which coarse-scale simulations spawn sub-scale direct numerical simulations as needed.
  • Programming models: Our task-based approach leverages the extensive concurrency and heterogeneity expected at exascale while enabling fault tolerance within applications.
  • Proxy applications: Proxy apps and kernels play a key role in our co-design process, as they are the main mechanism for exploring algorithm design space and communicating the application workload to the hardware architects and system software developers.
  • Co-design analysis and optimization: Performance models, simulators (from node- to system-level, including exascale complexity), and scalable analysis tools will inform a co-optimization loop to address the challenges of power, resiliency, concurrency, and heterogeneity that will characterize exascale platforms.

The programming models and approaches developed to achieve this will be broadly applicable to a variety of multiscale, multiphysics applications beyond the materials science problems addressed here.

Science Strategy

Our adaptive physics refinement technique is illustrated in Fig. 1 for a high strain-rate loading problem. The coarser-scale model, for example a finite element method, spawns finer-scale crystal plasticity or atomistic models as needed wherever the standard empirical constitutive model loses accuracy, for instance in the vicinity of shock fronts. The procedure may be carried through multiple levels of refinement, or applied in the time domain, using ab initio techniques to compute activation energies for a rate theory or kinetic Monte Carlo model.
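
A minimal C++ sketch of this refinement logic appears below. The model functions (empirical_model, fine_scale_model), the error indicator, and all numerical values are hypothetical stand-ins for illustration; this is a toy version of the pattern, not the Center’s multiphysics framework.

    #include <cmath>
    #include <cstdio>
    #include <vector>

    // Hypothetical coarse-scale element state.
    struct Element {
        double strain_rate = 0.0;
        double stress = 0.0;
    };

    // Cheap empirical constitutive model (placeholder linear law).
    double empirical_model(double strain_rate) { return 2.0 * strain_rate; }

    // Stand-in for an expensive fine-scale (crystal plasticity or MD) solve;
    // adds the nonlinearity the empirical law misses at high strain rates.
    double fine_scale_model(double strain_rate) {
        return 2.0 * strain_rate + 0.1 * strain_rate * strain_rate;
    }

    // Illustrative UQ-style indicator: trust the empirical model only at
    // low strain rates, i.e., away from shock fronts.
    bool needs_refinement(const Element& e, double threshold) {
        return std::fabs(e.strain_rate) > threshold;
    }

    int main() {
        // Coarse mesh with a strain-rate ramp toward a "shock".
        std::vector<Element> mesh(8);
        for (std::size_t i = 0; i < mesh.size(); ++i)
            mesh[i].strain_rate = 0.5 * static_cast<double>(i);

        const double threshold = 2.0;
        for (auto& e : mesh) {
            // Spawn a fine-scale simulation only where the coarse model is suspect.
            e.stress = needs_refinement(e, threshold)
                           ? fine_scale_model(e.strain_rate)
                           : empirical_model(e.strain_rate);
        }
        for (const auto& e : mesh)
            std::printf("strain_rate=%4.1f  stress=%6.2f\n", e.strain_rate, e.stress);
        return 0;
    }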

Proxy Applications

Our multi-institutional team will leverage considerable experience in petascale simulation and vendor interaction to create flexible proxy applications (“apps”). These encompass both Gordon Bell Prize-winning single-scale applications that achieve petascale performance using single program multiple data (SPMD) approaches and a scale-bridging prototype that represents the asynchronous, task-based multiple program multiple data (MPMD) programming model, which avoids bulk synchronous parallelism and is more likely to survive the transition to exascale. These two classes of proxy apps will target distinct hardware characteristics: node-level data structures, memory, and power management for the single-scale SPMD apps (e.g., molecular dynamics), and system-level data movement, fault management, and load balancing for the scale-bridging MPMD apps. The proxy apps are a condensation of the “real” apps: they capture the broader workflow but are built to readily explore strategies such as data layout and solution algorithms, as well as overlay strategies for fault and power management.
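
The C++ sketch below illustrates the asynchronous, task-based MPMD pattern described above, using std::async as a stand-in for an exascale task runtime; the fine_scale_task function and its injected fault are hypothetical. It shows sub-scale tasks launched without a global barrier, and a failed task re-executed in isolation rather than forcing a global restart, which is the task-level fault tolerance a bulk-synchronous SPMD model lacks.

    #include <future>
    #include <iostream>
    #include <stdexcept>
    #include <vector>

    // Hypothetical fine-scale solve; throws to emulate a soft fault.
    double fine_scale_task(int region, bool inject_fault) {
        if (inject_fault) throw std::runtime_error("soft fault in region task");
        return 10.0 * region;  // placeholder sub-scale result
    }

    int main() {
        const int num_regions = 4;
        std::vector<std::future<double>> tasks;

        // The coarse scale spawns independent sub-scale tasks; there is no
        // global barrier, so tasks may complete in any order.
        for (int region = 0; region < num_regions; ++region)
            tasks.push_back(std::async(std::launch::async, fine_scale_task,
                                       region, /*inject_fault=*/region == 2));

        for (int region = 0; region < num_regions; ++region) {
            double result;
            try {
                result = tasks[region].get();  // rethrows if the task faulted
            } catch (const std::exception& err) {
                // Task-level fault tolerance: re-execute only the failed task.
                std::cerr << "region " << region << ": " << err.what()
                          << ", retrying\n";
                result = fine_scale_task(region, false);
            }
            std::cout << "region " << region << " -> " << result << '\n';
        }
        return 0;
    }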

Co-Design Strategy

Borrowing from agile development concepts, we will establish and execute a continuous (i.e., throughout the project lifetime) algorithm/hardware modeling, evaluation, optimization, and synthesis loop (Fig. 2), including optimization for performance, memory and data movement, power, and resiliency. Proxy applications and performance models/simulators will be used to introduce a realistic domain workload into the exascale hardware and software stack development process at an early stage, helping to ensure that real scientific applications are ready when exascale platforms become available.

The proxy apps will be used to explore the breadth of algorithm space (including programming models and other implementation choices), and will be co-optimized with emerging vendor architecture designs with respect to price, power (energy consumption), performance (time-to-solution), and resilience (robustness under node- and system-level jitter, faults, and other cross-cutting challenges) within the externally imposed constraints (“P3R” optimization).
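
As a toy illustration of the “P3R” co-optimization idea, the C++ sketch below scores candidate (algorithm, architecture) configurations with a weighted cost over performance, power, price, and resilience. The weights, candidate names, and numbers are invented for illustration; they are not data or results from the Center.

    #include <cstdio>
    #include <string>
    #include <vector>

    // One candidate point in the joint algorithm/architecture design space.
    struct Candidate {
        std::string name;
        double time_to_solution;  // performance (lower is better)
        double energy;            // power (lower is better)
        double price;             // cost (lower is better)
        double failure_rate;      // resilience proxy (lower is better)
    };

    // Weighted "P3R" cost under externally imposed constraints; illustrative
    // weights only. Lower is better.
    double p3r_cost(const Candidate& c) {
        return 1.0 * c.time_to_solution + 0.5 * c.energy +
               0.2 * c.price + 5.0 * c.failure_rate;
    }

    int main() {
        std::vector<Candidate> candidates = {
            {"wide-SIMD node, SPMD layout",     1.0, 2.0, 1.5, 0.02},
            {"heterogeneous node, MPMD tasks",  1.2, 1.4, 1.3, 0.01},
        };
        const Candidate* best = nullptr;
        for (const auto& c : candidates) {
            double cost = p3r_cost(c);
            std::printf("%-32s P3R cost = %.3f\n", c.name.c_str(), cost);
            if (!best || cost < p3r_cost(*best)) best = &c;
        }
        std::printf("selected: %s\n", best->name.c_str());
        return 0;
    }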

Opportunities for Collaboration

We anticipate several potential areas of synergy with ASCR-supported exascale computer science research. These include, but are not limited to:

X-Stack Software
  • Task-based programming models, frameworks, compilers, and debugging tools.
  • Runtime systems enabling task-based resource allocation and management.
  • Scalable tools for analyzing workflow characteristics and performance.
  • Novel tools for “stress-testing” application and software stack resiliency at scale.
  • Co-design of fault and power management strategies that share responsibility among applications, the OS/runtime system, and hardware-level instructions/signals.
Advanced Architectures
  • Utilizing single-scale SPMD and scale-bridging MPMD proxy applications to evaluate power and heat management strategies, for example with Sandia’s Structural Simulation Toolkit (SST).
Scientific Data Management & Analysis
  • In situ visualization and analysis techniques exploiting the same task-based paradigm as our adaptive physics refinement technique.
  • Data analysis and visualization techniques for multi-scale, multi-physics simulations.