This directory contains an example of several versions of Cholesky Factorization algorithm.

dpotrf: An implementation that calls the Intel® Math Kernel Library (Intel® MKL) dpotrf function to directly perform the factorization. This can be a serial implementation or threaded implementation depending on the version of the Intel MKL library that is linked against.

crout: A serial implementation that uses the Crout-Cholesky algorithm for factorization. The same approach is parallelized for the other Intel® Threading Building Blocks (Intel® TBB) based approaches below.

depend: A parallel version of Crout-Cholesky factorization that uses an Intel TBB flow graph. This version uses a dependence graph made solely of continue_node objects. This an inspector-executor approach, where a loop nest that is similar to the serial implementation is used to create an unrolled version of the computation. Where the Intel MKL calls would have been made in the original serial implementation of Crout-Cholesky, instead nodes are created and these nodes are linked by edges to the other nodes that they are dependent upon. The resulting graph is relatively large, with a node for each instance of each Intel MKL call. For example, there are many nodes that call dtrsm; one for each invocation of dtrsm in the serial implementation. The is very little overhead in message management for this version and so it is often the highest performing.

join: A parallel version of Crout-Cholesky factorization that uses an Intel TBB flow graph. This version uses a data flow approach. This is a small, compact graph that passes tiles along its edges. There is one node per type of Intel MKL call, plus join_nodes that combine the inputs required for each call. So for example, there is only a single node that applies all calls to dtrsm. This node is invoked when the tiles that hold the inputs and outputs for an invocation are matched together in the tag-matching join_node that precedes it. The tag represents the iteration values of the i, j, k loops in the serial implementation at that invocation of the call. There is some overhead in message matching and forwarding, so it may not perform as well as the dependence graph implementation.

This sample code requires a recent Intel TBB library (one that supports the flow graph). And also the Intel MKL library.

System Requirements

For the most up to date system requirements, see the release notes.

Files
cholesky.cpp
Source code for example.
init.cpp
Source code for example.
Makefile
Makefile for building the example.
Directories
msvs
Contains Microsoft* Visual Studio* workspace for building and running the example (Windows* systems only).
xcode
Contains Xcode* IDE workspace for building and running the example (OS X* systems only).

For information about the minimum supported version of IDE, see release notes.

Build instructions

General build directions can be found here.

Usage
cholesky [size=value] [blocksize=value] [num_trials=value] [output_prefix=value] [algorithm=value] [num_tbb_threads=value] [input_file=value] [-x] [-h] [size [blocksize [num_trials [output_prefix [algorithm [num_tbb_threads]]]]]]
where:
size - the row/column size of NxN matrix (size <= 46000)
blocksize - the block size; size must be a multiple of the blocksize
num_trials - the number of times to run each algorithm
output_prefix - if provided the prefix will be preappended to output files: output_prefix_posdef.txt and output_prefix_X.txt; where X is the algorithm used
if output_prefix is not provided, no output will be written
algorithm - name of the used algorithm - can be dpotrf, crout, depend or join
num_tbb_threads - number of started TBB threads
input_file - if provided it will be read to get the input matrix
-x - skips all validation
-h - show this message

Up to parent directory
Legal Information

Intel and the Intel logo are trademarks of Intel Corporation in the U.S. and/or other countries.
* Other names and brands may be claimed as the property of others.
© 2016, Intel Corporation