The inability to predict lasting languages and architectures led us to develop OCCA, a C++ library focused on host-device interaction. Using run-time compilation and macro expansions, the result is a novel single kernel language that expands to multiple threading languages. Currently, OCCA supports device kernel expansions for the OpenMP, OpenCL, and CUDA platforms. Computational results using finite difference, spectral element and discontinuous Galerkin methods show OCCA delivers portable high performance in different architectures and platforms.
Native APIs: MATLAB, Python, Julia, C++, C#, C, F90. Optional back-ends: OpenCL, CUDA, OpenMP, Pthreads, Intel COI.
To obtain the solutions of large sparse linear system of equations arised from the discretization of elliptic partial differential equations. The method uses an aggregation strategy based on parallel maximal independent set algorithm and defines piecewise constant interpolations from coarse to fine grids. The solution procedure uses optimal Krylov accelerated cycling strategy (K-cycle). The implementations use OCCA2 for portability across several hardware architectures and multi-threading approaches.
To predict the tsunami wave propagation using discontinuous Galerkin methods on triangular meshes for two dimensional Shallow water equations. The method uses multirate Adams-Bashforth method for time integration, a positivity preserving method, and a slope limiter for stability of the numerical scheme. The implementations use OCCA2 for portability across several hardware architectures and multi-threading approaches.
gLaser:
Slides.To simulate a laser intensity in laser-induced thermo therapy, we developed gLaser application which solves Radiative Transport Equation in biological tissue. The algorithm applies Spectral Element method in space and sphercial harmonics in the velocity field. The implementation uses OCCA which enables the app to run in difference parallel modes.
Hermes:
We employ Hermite methods to discretize a variety of Hyperbolic Initial Value Problems. Building on Hermite interpolation, these methods allow for the construction of arbitrary order methods based on 2k-point stencils in k spatial dimensions. A remarkable feature is that the CFL condition of the resulting scheme is independent of the approximation order. The implementation uses OCCA2 for portability across various computing architectures.
Nodal Discontinuous Galerkin Methods: my interests include developing practical numerical methods for time-dependent electromagnetics, acoustics, computational seismology, and fluid dynamics and also steady state elliptic problems. In particular I am working to improve the efficiency and robustness of these methods through the use of local time-stepping methods, artificial viscosity for shock capturing, preconditioners, and convergent adaptive schemes.
HPC: My team at VirginiaTech is part of the Argonne Exascale Co-Design center: CESAR. We are developing programming tools for programming at exascale. I am working with David Medina to develop a new multi-threading framework, OCCA, that provides a unified abstract interface and compute kernel programming model that interfaces behind the scenes with OpenCL, CUDA, pThreads, and OpenMP threading models.
Applications: I am collaborating with the Shell Computational and Modeling group to develop a world class seismic inversion simulation platform based on the GPU accelerated discontinuous Galerkin methods that my research team has developed. I am also working with researchers at the MD Anderson Cancer Center to develop a prototype therapy modeling tool for the Magnetic Resonance Guided Laser Instertitial Thermal Therapy (MRgLITT) that can provide predictive capabilities..
Recent Activity:
Research Interests: High order finite element methods, adaptivity, DG, hybridized DG, stabilized methods, least squares methods, wave equations, computational fluid dynamics.
Projects: Adaptive Plane Wave DG, Discontinuous Petrov-Galerkin methods
Recent Activity:
Research Interests: computational methods for wave-like problems (in acoustics, electromagnetism, elastodynamics, oceanography, ...), absorbing boundary conditions, absorbing layers, perfectly matched layers, ...
Projects: Computational Methods for Seismic Imaging
Recent Activity:
Research Interests: Numerical PDEs, High Performance Computing
Projects:
Recent Activity:
Research Interests: High performance numerical methods for scientific computing applications
Projects:
Recent Activity:
Research Interests: Fast methods for solving discrete systems arising from high order finite element methods for time and frequency domain wave propagation.
Projects: Adaptive plane wave DG.
Recent Activity:
Research Interests: Device-accelerated algorithms for numerical methods
Projects:
Recent Activity:
Tools for multi-core computing:
OCCA2: The inability to predict lasting languages and architectures led us to develop OCCA, a C++ library focused on host-device interaction. Using run-time compilation and macro expansions, the result is a novel single kernel language that expands to multiple threading languages. Currently, OCCA supports device kernel expansions for the OpenMP, OpenCL, and CUDA platforms. Computational results using finite difference, spectral element and discontinuous Galerkin methods show OCCA delivers portable high performance in different architectures and platforms.
Native APIs: MATLAB, Python, Julia, C++, C#, C, F90. Optional back-ends: OpenCL, CUDA, OpenMP, Pthreads, Intel COI.
Unstructured grid discontinuous Galerkin based solvers:
MIDG2: A time-domain Maxwell's solver based on discontinuous Galerkin discretization in space and low-storage Runge-Kutta discretization in time accelerated by the OCCA2 extensible many-core programming interface. Capabilities include: Maxwell's equations solved in 2D and 3D on unstructured grids, hybrid MPI+OCCA2 parallelism.
pasidg: A time-dependent shallow water equation solver. Capabilities: reads GEBCO bathymetry data (link), GSHHS coastline data (link), gmsh meshing (link), interactive global region selection, multirate linear multistep local time-stepping, positivity preserving limiter, TVB limiting, and OCCA2 based acceleration.
RiDG: A time-dependent linear acoustic-elastic water equation solver. Capabilities: multirate linear multistep local time-stepping, MPI+OCCA2 hybrid parallelism, acoustic/vti/tti/elastic modules, disk-free reverse time migration, consistent temporal correlation, time-reversed multirate time-stepping.
Spectral element based solvers:
brainNek:
Preprints:
Abstract: “High-order finite-difference methods are commonly used in wave propagators for industrial subsurface imaging algorithms. Computational aspects of the reduced linear elastic vertical transversely isotropic propagator are considered. Thread parallel algorithms suitable for implementing this propagator on multi-core and many-core processing devices are introduced. Portability is addressed through the use of the OCCA runtime programming interface. Finally, performance results are shown for various architectures on a representative synthetic test case.”
Abstract: “The inability to predict lasting languages and architectures led us to develop OCCA, a C++ library focused on host-device interaction. Using run-time compilation and macro expansions, the result is a novel single kernel language that expands to multiple threading languages. Currently, OCCA supports device kernel expansions for the OpenMP, OpenCL, and CUDA platforms. Computational results using finite difference, spectral element and discontinuous Galerkin methods show OCCA delivers portable high performance in different architectures and platforms.”
Abstract: “We present a study of two residual a posteriori error indicators for the Plane Wave Discontinuous Galerkin (PWDG) method for the Helmholtz equation. In particular we study the h-version of PWDG in which the number of plane wave directions per element is kept fixed. First we use a slight modification of the appropriate a priori analysis to determine a residual indicator. Numerical tests show that this is reliable but pessimistic in that the ratio between the true error and the indicator increases as the mesh is refined. We therefore introduce a new analysis based on the observation that sufficiently many plane waves can approximate piecewise linear functions as the mesh is refined. Numerical results demonstrate an improvement in the efficiency of the indicators.”
Abstract: “We discuss the development, verification, and performance of a GPU accelerated discontinuous Galerkin method for the solutions of two dimensional nonlinear shallow water equations. The shallow water equations are hyperbolic partial differential equations and are widely used in the simulation of tsunami wave propagations. Our algorithms are tailored to take advantage of the single instruction multiple data (SIMD) architecture of graphic processing units. The time integration is accelerated by local time stepping based on a multi-rate Adams-Bashforth scheme. A total variational bounded limiter is adopted for nonlinear stability of the numerical scheme. This limiter is coupled with a mass and momentum conserving positivity preserving limiter for the special treatment of a dry or partially wet element in the triangulation. Accuracy, robustness and performance are demonstrated with the aid of test cases. We compare the performance of the kernels expressed in a portable threading language OCCA, when cross compiled with OpenCL, CUDA, and OpenMP at runtime.”
Accelerated Numerical Methods:
Abstract: “Discontinuous Galerkin (DG) methods for the numerical solution of partial differential equations have enjoyed considerable success because they are both flexible and robust: They allow arbitrary unstructured geometries and easy control of accuracy without compromising simulation stability. In a recent publication, we have shown that DG methods also adapt readily to execution on modern, massively parallel graphics processors (GPUs). A number of qualities of the method contribute to this suitability, reaching from locality of reference, through regularity of access patterns, to high arithmetic intensity. In this article, we illuminate a few of the more practical aspects of bringing DG onto a GPU, including the use of a Python-based metaprogramming infrastructure that was created specifically to support DG, but has found many uses across all disciplines of computational science.”
Abstract: “Discontinuous Galerkin (DG) methods for the numerical solution of partial differential equations have enjoyed considerable success because they are both flexible and robust: They allow arbitrary unstructured geometries and easy control of accuracy without compromising simulation stability. Lately, another property of DG has been growing in importance: The majority of a DG operator is applied in an element-local way, with weak penalty-based element-to-element coupling. The resulting locality in memory access is one of the factors that enables DG to run on off-the-shelf, massively parallel graphics processors (GPUs). In addition, DG’s high-order nature lets it require fewer data points per repre- sented wavelength and hence fewer memory accesses, in exchange for higher arithmetic intensity. Both of these factors work significantly in favor of a GPU implementation of DG. Using a single US$400 Nvidia GTX 280 GPU, we accelerate a solver for Maxwell’s equations on a general 3D unstructured grid by a factor of 40 to 60 relative to a serial computation on a current-generation CPU. In many cases, our algorithms exhibit full use of the device’s available memory bandwidth. Example computations achieve and surpass 200 gigaflops/s of net application- level floating point work. In this article, we describe and derive the techniques used to reach this level of performance. In addition, we present comprehensive data on the accuracy and runtime behavior of the method. ”
Abstract: “A multirate Adams-Bashforth (AB) scheme for simulation of electromagnetic wave propagation using the discontinuous Galerkin finite element method (DG-FEM) is presented. The algorithm is adapted such that single-instruction multiple-thread (SIMT) characteristic for the implementation on a graphics processing unit (GPU) is preserved. A domain decomposition strategy respecting the multirate classification for computation on multiple GPUs is presented. Accuracy and performance is analyzed with help of suitable benchmarks.”
Abstract: “Discontinuous Galerkin (DG) methods for the numerical solution of partial differential equations have enjoyed considerable success because they are both flexible and robust: They allow arbitrary unstructured geometries and easy control of accuracy without compromising simulation stability. Lately, another property of DG has been growing in importance: The majority of a DG operator is applied in an element-local way, with weak penalty-based element-to-element coupling. The resulting locality in memory access is one of the factors that enables DG to run on off-the-shelf, massively parallel graphics processors (GPUs). In addition, DG’s high-order nature lets it require fewer data points per represented wavelength and hence fewer memory accesses, in exchange for higher arithmetic intensity. Both of these factors work significantly in favor of a GPU implementation of DG. Using a single US$400 Nvidia GTX 280 GPU, we accelerate a solver for Maxwell’s equations on a general 3D unstructured grid by a factor of around 50 relative to a serial computation on a current-generation CPU. In many cases, our algorithms exhibit full use of the device’s available memory bandwidth. Example computations achieve and surpass 200 gigaflops/s of net application-level floating point work.”
Discontinuous Galerkin Methods:
Abstract: “The low-storage curvilinear discontinuous Galerkin (LSC-DG) method reduces the storage requirements for solving symmetric and linear linear symmetric conservation laws in curvi- linear domains when compared to the standard discontinuous Galerkin method. We perform a semidiscrete, a priori convergence analysis of LSC-DG and determine sufficient conditions on se- quences of meshes to guarantee the same rate of convergence as the original upwind DG method on curvilinear domains. Computational results confirm the optimal order convergence on sample mesh sequences that satisfy the sufficiency conditions. Additional results show that the sufficient conditions for optimal order convergence of LSC-DG may also be necessary conditions..”
Abstract: “We develop and analyze an adaptive hybridized Interior Penalty Discontinuous Galerkin (IPDG-H) method for H(curl)- elliptic boundary value problems in 2D or 3D arising from a semi-discretization of the eddy currents equations. The method can be derived from a mixed formula- tion of the given boundary value problem and involves a Lagrange multiplier that is an approximation of the tangential traces of the primal variable on the interfaces of the underlying triangulation of the computational domain. It is shown that the IPDG-H technique can be equivalently formulated and thus implemented as a mortar method. The mesh adaptation is based on a residual-type a posteriori error estimator consisting of element and face residuals. Within a unified framework for adaptive finite element methods, we prove the reliability of the estimator up to a consistency error. The performance of the adaptive symmetric IPDG-H method is documented by numerical results for representative test examples in 2D.”
Abstract: “We consider discontinuous Galerkin (DG) approximations of the Maxwell eigenproblem on meshes with hanging nodes. It is known that while standard DG methods provide spurious-free and accurate approximations on the so-called k-irregular meshes, they may generate spurious solutions on general irregular meshes. In this paper we present a mortar-type method to cure this problem in the two-dimensional case. More precisely, we introduce a projection based penalization at non-conforming interfaces and prove that the obtained DG methods are spectrally correct. The theoretical results are validated in a series of numerical experiments on both convex and non convex problem domains, and with both regular and discontinuous material coefficients.”
Numerical Analysis & Methods:
Abstract: “An open question concerns the spatial distribution of nodes that are suitable for high-order Lagrange interpolation on the triangle and tetrahedron. Several current methods used to produce nodal sets with small Lebesgue constants are recalled. A new approach is presented for building nodal distributions of arbitrary order, that is based on curvilinear finite-element techniques. Numerical results are shown which demonstrate that, despite the explicit nature of this construction, the resulting node sets are well suited for interpolation and competitive with existing sets for up to tenth-order polynomial interpolation. Matlab scripts which evaluate the node distributions on the equilateral triangle are included.”
Abstract: “We derive inverse trace inequalities for hp-finite elements. Utilizing orthogonal polynomials, we show how to derive explicit expressions for the constants when considering triangular and tetrahedral elements. We also discuss how to generalize this technique to the general d-simplex.”
Mathematical result: Given a d-dimensional simplex $D$ then $\| u \|_{\partial D} \leq \sqrt{\frac{(N+1)(N+d)}{d} \frac{|\partial D|}{|D|}} \|u\|_D \; \forall \; u\in \mathbb{P}^N(D)$.
Radiation Boundary Conditions:
Abstract: “ We develop complete plane wave expansions for time-dependent waves in a half-space and use them to construct arbitrary order local radiation boundary conditions for the scalar wave equation and equivalent first order systems. We demonstrate that, unlike other local methods, boundary conditions based on complete plane wave expansions provide nearly uniform accuracy over long time intervals. This is due to their explicit treatment of evanescent modes. Exploiting the close connection between the boundary condition formulations and discretized absorbing layers, corner compatibility conditions are constructed which allow the use of polygonal artificial boundaries. Theoretical arguments and simple numerical experiments are given to establish the accuracy and efficiency of the proposed methods”
Abstract: “ We construct and analyze new local radiation boundary condition sequences for first-order, isotropic, hyperbolic systems. The new conditions are based on representations of solutions of the scalar wave equation in terms of modes which both propagate and decay. Employing radiation boundary conditions which are exact on discretizations of the complete wave expansions essentially eliminates the long time nonuniformities encountered when using the standard local methods (perfectly matched layers or Higdon sequences). Specifically, we prove that the cost in terms of degrees-of-freedom per boundary point scales with $\ln\frac{1}{\epsilon}\cdot\ln\frac{cT}{\delta}$, where $\epsilon$ is the error tolerance, T is the simulation time, and $\delta$ is the separation between the source-containing region and the artificial boundary. Choosing $\delta\sim\lambda$, where $\lambda$ is the wavelength, leads to the same estimate which has been obtained for optimal nonlocal approximations. Numerical experiments confirm that the efficiencies predicted by the theory are attained in practice.”
Abstract: “ We present a new auxiliary variable formulation of high-order radiation boundary conditions for the numerical simulation of waves on unbounded domains. Retaining the flexibility of Higdon’s wave-product conditions, our approach allows arbitrary-order implementations. When applied to the scalar wave equation, the proposed method leads to balanced, symmetrizable systems of wave equations on the boundary. It can also be extended to first-order systems. Corner compatibility conditions are derived for the auxiliary variable equations. They are shown experimentally to lead to stable, accurate results.”