Parallel Computations Efficiency: Abaqus, Ansys and Simmakers

Hardware based on parallel computing architectures has been steadily gaining popularity in high-performance computing.

The efficiency of parallel hardware in engineering problems, such as the computer simulation of physical processes, does not scale directly with the number of processors: four CPU cores do not in fact solve a complex engineering problem four times faster than one CPU core. Similarly, moving a computation to a graphics card with hundreds of cores does not provide a hundredfold increase in speed.

First of all, the acceleration from parallel computation is limited by the computational algorithms themselves; running algorithms with a low degree of parallelization on supercomputers and high-performance workstations is wasteful. The limit on the "efficiency of parallelization" is described by Amdahl's law: if at least 1/10 of a program is executed sequentially, then the speedup cannot exceed 10 times the original speed, regardless of the number of cores employed.
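The bound in Amdahl's law can be checked numerically. The following is a minimal sketch; the function name `amdahl_speedup` and the list of core counts are chosen here for illustration and are not taken from any of the packages discussed.

```python
def amdahl_speedup(serial_fraction: float, cores: int) -> float:
    """Upper bound on speedup over one core, according to Amdahl's law.

    serial_fraction is the share of the run time that cannot be parallelized.
    """
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    parallel_fraction = 1.0 - serial_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # With 1/10 of the program serial, the bound approaches 10 but never reaches it.
    for cores in (1, 2, 4, 8, 448, 2688):
        print(f"{cores:>5} cores: at most {amdahl_speedup(0.1, cores):.2f}x")
```

With a serial share of 1/10 the bound stays below 10x however many cores are added, which is the limit quoted above.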

Telling examples of the limited effectiveness of algorithm parallelization in engineering problems are the relatively modest results of the world's leading computer-aided engineering (CAE) packages, Abaqus and Ansys.

In SIMULIA's Abaqus, moving a computation from 2 CPU cores to 4 CPU cores gave a speedup factor of 1.7. Porting these algorithms to the CUDA architecture, with the 448 cores of an Nvidia Tesla C2075 working alongside 4 CPU cores, resulted in a speedup of only 3.5 times [Source].

SIMULIA’s Abaqus performance acceleration when transferring from 2 to 4 CPU cores

SIMULIA’s Abaqus performance acceleration when using 4 CPU cores and 448 GPU cores

Ansys achieved a parallelization efficiency comparable to that of Abaqus. When the number of CPU cores was increased from two to eight, the processing speed of the Ansys Mechanical 15.0 package tripled. Sharing the computation between 2 CPU cores and the 2880 cores of the Nvidia Tesla K40 accelerator was 3.5 times faster than using the 2 CPU cores alone [Source].
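The CPU figures quoted for Abaqus (1.7 times faster going from 2 to 4 cores) and Ansys (3 times faster going from 2 to 8 cores) can be turned into a rough estimate of the serial share of each solver. The sketch below assumes that Amdahl's law is the only limiting factor, which ignores memory bandwidth, communication and other overheads, so the result is an illustration rather than a measurement. The function names are hypothetical.

```python
import math


def serial_fraction(cores_before: int, cores_after: int, relative_speedup: float) -> float:
    """Serial share s implied by Amdahl's law, T(p) = s + (1 - s) / p,
    when going from cores_before to cores_after cores makes the run
    relative_speedup times faster."""
    a = 1.0 / cores_before
    b = 1.0 / cores_after
    r = relative_speedup
    # Solve r * (s + (1 - s) * b) = s + (1 - s) * a for s.
    return (a - r * b) / (r * (1.0 - b) - (1.0 - a))


def max_speedup(s: float) -> float:
    """Amdahl limit relative to one core as the core count grows without bound."""
    return math.inf if s <= 0.0 else 1.0 / s


if __name__ == "__main__":
    # Figures quoted in the text: (package, cores before, cores after, speedup).
    quoted = (("Abaqus", 2, 4, 1.7), ("Ansys Mechanical 15.0", 2, 8, 3.0))
    for name, p1, p2, r in quoted:
        s = serial_fraction(p1, p2, r)
        print(f"{name}: serial share ~{s:.1%}, limit ~{max_speedup(s):.1f}x over one core")
```

Under these assumptions the quoted figures correspond to a serial share of roughly 10% for Abaqus and roughly 6% for Ansys.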

Ansys Mechanical 15.0 performance acceleration with parallel processing

The mathematical solvers embedded in the «Frost 3D Universal» software parallelize their computational algorithms more effectively and make more efficient use of parallel architectures.

A computer model of production wells was used to compare the parallel computing speed on CPUs and GPUs.

Soil thermal field distribution over 5 years in the XZ plane

The hardware was selected from widely available consumer components: an Intel Core i7 CPU and an Nvidia Titan graphics card.

| Specification | Intel Core i7-3770 | Nvidia GeForce GTX Titan |
| --- | --- | --- |
| Cores | 4 | 2688 |
| Base clock | 3.4 GHz | 836 MHz |
| Boost clock | 3.9 GHz | 876 MHz |
| Power | 77 W | 250 W |
| Recommended price | $305 | $1080 |

The three-dimensional model was discretized with different spatial steps, producing meshes of approximately 2 million, 4 million, 8 million and 16 million nodes. Each mesh was computed on 1 core of the Intel Core i7, on 4 cores of the Intel Core i7 and on the GeForce GTX Titan video card. The results for the two-year simulation forecast are given below.

| Number of nodes | Processing time: 1 core of Intel Core i7 | Processing time: 4 cores of Intel Core i7 | Processing time: GeForce GTX Titan | Speedup: 4 cores of Intel Core i7 to 1 core | Speedup: GeForce GTX Titan to 4 cores of Intel Core i7 | Speedup: GeForce GTX Titan to 1 core of Intel Core i7 |
| --- | --- | --- | --- | --- | --- | --- |
| 2,000,000 | 9.62 h (34,632 s) | 5.97 h (21,504 s) | 34.11 min (2,047 s) | 1.61x | 10.50x | 16.91x |
| 4,000,000 | 18.16 h (65,388 s) | 10.63 h (38,287 s) | 57.65 min (3,459 s) | 1.70x | 11.06x | 18.90x |
| 8,000,000 | 34.33 h (123,600 s) | 19.22 h (69,221 s) | 1.62 h (5,844 s) | 1.78x | 11.84x | 21.14x |
| 16,000,000 | 61.14 h (220,104 s) | 32.98 h (118,736 s) | 2.62 h (9,456 s) | 1.85x | 12.55x | 23.27x |
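The speedup factors in the table follow directly from the processing times in seconds. The sketch below recomputes them and also derives the parallel efficiency of the 4-core run (speedup divided by the number of cores), a figure that is not in the original table and is added here only as an illustration. The names `TIMES_S` and `speedups` are chosen for this example.

```python
# Processing times in seconds, copied from the table above:
# nodes -> (1 core of Intel Core i7, 4 cores of Intel Core i7, GeForce GTX Titan)
TIMES_S = {
    2_000_000: (34_632, 21_504, 2_047),
    4_000_000: (65_388, 38_287, 3_459),
    8_000_000: (123_600, 69_221, 5_844),
    16_000_000: (220_104, 118_736, 9_456),
}


def speedups(t_1core: float, t_4core: float, t_gpu: float) -> tuple[float, float, float]:
    """Return (4 cores vs. 1 core, GPU vs. 4 cores, GPU vs. 1 core)."""
    return t_1core / t_4core, t_4core / t_gpu, t_1core / t_gpu


if __name__ == "__main__":
    for nodes, times in TIMES_S.items():
        cpu4_vs_1, gpu_vs_cpu4, gpu_vs_1 = speedups(*times)
        efficiency = cpu4_vs_1 / 4  # share of the ideal 4x actually achieved
        print(
            f"{nodes:>10,} nodes: 4 cores {cpu4_vs_1:.2f}x "
            f"(efficiency {efficiency:.0%}), "
            f"GPU vs. 4 cores {gpu_vs_cpu4:.2f}x, GPU vs. 1 core {gpu_vs_1:.2f}x"
        )
```

Recomputed values can differ from the published ones in the last digit, since the published factors appear to be truncated rather than rounded.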

Computation acceleration chart

The performance of 1 core of the Intel Core i7 is taken as a speedup factor of 1x.

It should be noted that, when comparing the computational speed on multi-core architectures, the following model parameters have a significant impact on the acceleration:
- the number of materials;
- the number of boundary conditions;
- mesh uniformity;
- whether the number of mesh nodes is a multiple of the number of computational cores;
- conformity of thermo-physical properties of materials.
This means that the maximum acceleration on parallel architectures is achieved on the simplest models, with a uniform computational mesh and a minimum number of materials and boundary conditions. In practice, computational models are more complicated, which is why our speed analysis was based on the production wells simulation model: it gives more objective results.

Conclusions:

  1. The use of computational algorithms with a low degree of parallelization is inefficient on multi-core processors and video accelerators.
  2. The major engineering analysis software packages on the market contain a high proportion of serial code, which significantly limits the acceleration available from parallel computing. This is largely due to the use of dated mathematical solver algorithms, developed before technologies such as CUDA existed and therefore not designed to take advantage of them.
  3. Mathematical algorithms in the latest generation of CAE software are designed around parallel processing technology. This makes it possible to achieve speedups of more than an order of magnitude (17x to 23x in the test above) by moving computation from one CPU core to a multi-core graphics accelerator.
