Performance, Intel MKL and multithreading

Updated May 10, 2024





Intel® oneAPI Math Kernel Library (oneMKL)

To improve performance and reduce analysis times, flow5 uses Intel’s Math Kernel Library (Intel MKL) to perform operations on linear systems.

Intel MKL is a library of optimized math routines for science, engineering, and financial applications. Core math functions include BLAS, LAPACK, ScaLAPACK, sparse solvers, fast Fourier transforms, and vector math. The routines in MKL are hand-optimized specifically for Intel processors. The library supports Intel processors and is available for Windows, Linux and macOS operating systems.

The performance and efficiency of this library are impressive and have led to a dramatic reduction in analysis times compared to those of xflr5.


flow5 vs. xflr5

Analysis times in flow5 have been greatly reduced by the use of Intel's MKL library and by multithreading. This in turn allows the use of larger meshes. The following chart shows the improvement when running on an Intel Core i5 at 2.50 GHz.


[Figure: intel_perf]



flow5 on macOS

The benefits are similar on a low-end, aging Mac. The 2014 Mac mini test platform had only 4 GB of RAM, so memory constraints limited the benchmark to matrix sizes no greater than 10000.
Starting with v7.03, the MKL libraries have been replaced by the macOS native vecLib framework, with comparable performance.
flow5 runs smoothly on the Mac mini M1 under the Rosetta 2 translator, with good performance.
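
As a rough illustration of the memory constraint mentioned above, the following Python sketch estimates the storage needed for one dense square matrix of size 10000. It is only an estimate under assumptions: the matrix is taken to be dense, the element size used by flow5 is not stated on this page (so both 4-byte and 8-byte elements are shown), and the function name is hypothetical.

```python
def dense_matrix_gb(n, bytes_per_element):
    """Storage for one dense n x n matrix, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return n * n * bytes_per_element / 1e9


# Element sizes are assumptions: 4 bytes (single precision), 8 bytes (double precision).
for size in (4, 8):
    print(f"n=10000, {size}-byte elements: {dense_matrix_gb(10000, size):.1f} GB")
```

With 8-byte elements, a single 10000 x 10000 matrix takes about 0.8 GB before counting the operating system, the application itself, and any working copies, which may help explain why a 4 GB machine runs out of headroom near that size.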


[Figure: macOS_perf.svg]



flow5 on different CPUs

Analysis times decrease as the processor's speed and number of threads increase, as shown in the graph below.
Note that Intel's MKL library is optimized specifically for Intel processors, so the benefit on an AMD processor is smaller than the number of cores would suggest. The improvement is nonetheless significant.



[Figure: perf_200613.svg]


The model used to perform the testing can be downloaded here: LU_test.fl5.


Troubleshooting

MKL has been reported to run slowly on Intel "Efficiency cores", with LU factorization times increasing by one or two orders of magnitude.

This problem is fixed by forcing MKL to run on "Performance cores". These pages explain how to proceed:

In practice, the environment variables which control thread usage need to be set before running flow5.

  1. Open a terminal window.
  2. Set the environment variable to a value suited to the processor, for instance “set KMP_HW_SUBSET=8c:intel_core,1t” in Windows and “export KMP_HW_SUBSET=8c:intel_core,1t” in Linux.
  3. Launch flow5 from the same terminal window, for instance by typing c:/path/to/flow5.exe in Windows.

Alternatively, the environment variable can be set at system level, or by creating a script.
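
One possible form for such a script is the Python sketch below. It sets KMP_HW_SUBSET only for the flow5 process, leaving the rest of the system untouched. The function names build_env and launch_flow5 are hypothetical, the executable path is the placeholder used in the steps above, and the value 8c:intel_core,1t is only the example given on this page; it must be adapted to the actual processor.

```python
import os
import subprocess


def build_env(hw_subset="8c:intel_core,1t", base_env=None):
    """Return a copy of the environment with KMP_HW_SUBSET set.

    The default value is the example from this page, not a recommendation
    for every processor.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["KMP_HW_SUBSET"] = hw_subset
    return env


def launch_flow5(exe_path, hw_subset="8c:intel_core,1t"):
    """Start flow5 as a child process that receives the modified environment."""
    return subprocess.Popen([exe_path], env=build_env(hw_subset))


# Example call (placeholder path, replace with the real install location):
# launch_flow5("c:/path/to/flow5.exe")
```

Because the variable is passed only to the child process, other applications keep their default thread placement.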



