

OpenMP parallelization of dscf, odft and ricc2

The OpenMP parallelization does not need any special program startup: the binaries are invoked in exactly the same manner as for sequential (non-parallel) calculations. The only difference is that, before the program is started, the environment variable PARNODES has to be set to the number of threads the program should use. The scripts then set OMP_NUM_THREADS to the same value and start the OpenMP binaries. The number of threads is essentially the maximum number of CPU cores the program will try to utilize. To exploit, e.g., all eight cores of a machine with two quad-core CPUs, set

export PARNODES=8
(for csh and tcsh use setenv PARNODES 8).
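
As a minimal sketch of the setup described above, the thread count can be taken from the machine instead of being typed in by hand. The use of getconf _NPROCESSORS_ONLN and the variable name ncores are assumptions made for this illustration and are not part of the TURBOMOLE scripts. The command reports logical CPUs, which can exceed the number of physical cores when hyper-threading is enabled.

```shell
# Sketch only: derive the thread count from the number of online CPUs.
# Falls back to 1 if getconf does not support this query.
ncores=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)

# PARNODES is read by the start-up scripts, which set OMP_NUM_THREADS
# to the same value before launching the OpenMP binaries.
export PARNODES="$ncores"
echo "PARNODES=$PARNODES"

# The program is then started exactly as in a sequential run, e.g.:
# dscf > dscf.out
```

The program call is left commented out because it only makes sense inside a prepared job directory.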

Presently the OpenMP parallelization of ricc2 comprises all functionalities apart from the recently added LT-SOS-RI-MP2 and the calculation of expectation values for $\hat{S}^{2}$.
Note that for OpenMP-parallel calculations the memory specified with $maxcor is the maximum amount of memory that will be dynamically allocated by all threads together. To use your computational resources efficiently, it is recommended to set this value to about 75% of the physical memory available for your calculations, but to at most 16000 (megabytes). (Due to the use of integer*4 arithmetic the ricc2 program is presently limited to 16 Gbytes.)
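
The rule of thumb above (about 75% of the available memory, capped at 16000 megabytes) can be written as a short shell calculation. The function name recommend_maxcor is hypothetical and introduced only for this sketch. It takes the memory available to the job in megabytes and uses integer arithmetic.

```shell
# Hypothetical helper: suggest a $maxcor value (in MB) from the memory
# available to the calculation (in MB): 75% of it, but at most 16000.
recommend_maxcor() {
    avail_mb=$1
    maxcor=$(( avail_mb * 75 / 100 ))
    if [ "$maxcor" -gt 16000 ]; then
        maxcor=16000
    fi
    echo "$maxcor"
}

recommend_maxcor 8192     # 8 GB available  -> prints 6144
recommend_maxcor 65536    # 64 GB available -> prints 16000
```

The value applies to all threads together, so it does not need to be divided by PARNODES.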

In the dscf program the OpenMP parallelization presently covers only the Hartree-Fock Coulomb and exchange contributions to the Fock matrix in fully integral-direct mode and is mainly intended to be used in combination with OpenMP-parallel runs of ricc2. The OpenMP parallelization can nevertheless also be used in DFT calculations, but the numerical integration for the DFT contribution to the Fock matrix will use only a single thread (CPU core), so the overall speedup will be smaller. Memory usage is low and dscf will ignore $maxcor settings.

The odft module is parallelized using OpenMP. For LHF an almost ideal speedup is obtained because the most expensive parts of the calculation are the evaluation of the Fock matrix and of the Slater potential, and both of them are well parallelized. The calculation of the correction term of the grid uses only a single thread.

Restrictions:

