Computing Matrix Exponentials in C++ with the Boost Library


Evaluating the exponential of a matrix is an important problem that arises in physics, mathematics, and engineering. For example, in quantum theory, a central problem is solving the Schrödinger equation,

    \[\frac{\partial}{\partial t}\left|\psi\right> = - \frac{i}{\hbar}H\left|\psi\right>\;,\label{eq:0}\]

where H is a Hermitian matrix and \left|\psi\right> is a complex state vector.

If the Hamiltonian H is time independent, the solution of the Schrödinger equation above is given by

    \[\ \left|\psi_f\right> = \exp\left(-\frac{i}{\hbar}Ht\right)\left|\psi_i\right> \;,\]

where \left|\psi_f\right> is the final state, and \left|\psi_i\right> is the initial state.

There are several methods to compute the matrix exponential. Methods involving approximation theory, differential equations, the matrix eigenvalues, and the matrix characteristic polynomial have all been proposed.

Computed by Taylor Series Expansion

A direct way to define the matrix exponential \exp(At), where t is a real number, is undoubtedly through the exponential power series expansion,

    \[\ e^{At} = I + \sum_{n=1}^{\infty}\frac{(At)^n}{n!} \;, \label{eq:app:MatrixExp}\]

whose convergence is guaranteed for any square matrix A. If we momentarily ignore efficiency, we can simply sum up the series until adding another term no longer alters the numbers stored in the computer. That is, if

    \[\ T_k(At) = \sum_{n=0}^{k}\frac{(At)^n}{n!}\;,\label{eq:app:MatrixExpSum}\]

then we can find a k such that

    \[\ \|T_k(At) - T_{k+1}(At)\| \leqslant \delta \;, \label{eq:app:MatrixExpSuminquality}\]

where \delta is some prescribed error tolerance. Concern over where to truncate the series is important if efficiency is being considered. Unfortunately, such an algorithm is known to be inefficient even in the scalar case, and there are many papers\cite{Moler2003, Liou1966} concerning the truncation error of the Taylor series. As a result, directly summing the Taylor series does not work very well.

Computed by Diagonalization of the Matrix

For most physicists, another instinctive method is to diagonalize the matrix A, such that

    \[\ A=VDV^{-1} \;, \label{eq:app:Matrixdiagonalize}\]

then, using the power series definition of e^{tA}, we obtain

    \[\ e^{tA} = Ve^{tD}V^{-1} \;, \label{eq:app:Matrixdiagonalize1}\]

where V is the matrix whose columns are eigenvectors of A, D=\mathrm{diag}(\lambda_1,...,\lambda_n), and \lambda_i are the eigenvalues of A. The exponential of tD is very easy to compute, and most computer languages provide a satisfactory built-in routine for the exponential of a scalar,

    \[\ e^{tD} = \mathrm{diag}(e^{\lambda_1t},...,e^{\lambda_nt}) \;. \label{eq:app:Matrixdiagonalize2}\]

The difficulty with this approach is not finding the eigenvalues of the matrix A, but arises when A does not have a complete set of linearly independent eigenvectors. In that case A is defective, no invertible matrix V exists, and the algorithm breaks down. In real-world computing, difficulties occur even when A is nearly defective. Define the condition number \mathrm{cond}(V)=\|V\|\|V^{-1}\|. When A is nearly defective, \mathrm{cond}(V) is very large, and the errors of computing e^{tA}, including the roundoff errors from the eigenvalue computation, may be magnified by \mathrm{cond}(V). As a result, the computed exponential of a matrix will most likely be inaccurate when \mathrm{cond}(V) is very large. An example is demonstrated in this paper\cite{Moler2003}.

Computed by Padé Approximation

The easiest method for computing e^{At} numerically might be the Taylor series, but as discussed above, this approach is inefficient and inaccurate. A Padé approximation of a given function (here e^{At}) often gives a better result than truncating its Taylor series, and it may still work where the Taylor series does not converge well. Since it is defined as a rational function, i.e. a ratio of two polynomials, it is easy to evaluate numerically. Therefore, Padé approximations are used extensively in computer calculations. A Padé approximation approximates a function of one variable; an approximation of a function of two variables is called a Chisholm approximation, and of multiple variables a Canterbury approximation.

A Padé rational approximation to f(x) is the quotient of two polynomials N_{p/q}(x) and D_{p/q}(x) of degrees p and q, respectively. This is the so-called (p,q)-degree Padé approximation. We use the notation R_{p/q}(x) to denote this quotient:

    \[\ R_{p/q}(x) = \frac{N_{p/q}(x)}{D_{p/q}(x)}\;,\]

where x is a scalar. If the argument is a matrix A, the quotient is defined as

    \[\ R_{p/q}(A) =\left[D_{p/q}(A)\right]^{-1}N_{p/q}(A)\;.\]

The polynomials in this quotient are

    \[\ N_{p/q}(A) = n_0\mathrm{I} + n_1A+n_2A^2+...+n_pA^p\;,\]


    \[\ D_{p/q}(A) = \mathrm{I} + d_1A+d_2A^2+...+d_qA^q\;.\]

In the case q=0, the approximation reduces to the Taylor (Maclaurin) expansion of f(A). There are p+1 unknown coefficients in N_{p/q}(A) and q unknown coefficients in D_{p/q}(A); hence the rational function R_{p/q}(A) has p+q+1 unknown coefficients. Assume that f(A) is analytic and has the Maclaurin expansion,

    \[\ f(A) = a_0\mathrm{I} + a_1A+a_2A^2+...+a_kA^k+...\;,\]

then R_{p/q}(A) is said to be a Padé approximation to the series f(A).
Since the highest possible order of a nonzero derivative of R_{p/q}(A) is p+q, that is,

    \[\ \frac{\partial^{k}}{\partial A^{k}}R_{p/q}(A)=0\;,\;\;\;\;\;\forall k \geqslant p+q+1\;,\]

as a result, the first p+q derivatives of f(A) and R_{p/q}(A) are required to agree at A=0,

    \[\ \left.\frac{\partial^{k}}{\partial A^{k}}R_{p/q}(A) \right|_{A=0}=\left. \frac{\partial^{k}}{\partial A^{k}}f(A)\right|_{A=0}\;, \;\;\;\;\; k=0,1,...,p+q\;.\]

This matching condition implies that

    \[\ R_{p/q}(A)-f(A) = \sum_{k=p+q+1}^{\infty}c'_kA^k\;.\]

Multiplying both sides by D_{p/q}(A) gives

    \[\ D_{p/q}(A)f(A)-N_{p/q}(A) = D_{p/q}(A)\sum_{k=p+q+1}^{\infty}c'_kA^k = \sum_{k=p+q+1}^{\infty}c_kA^k \;,\]

and we obtain

    \[\ \sum_{i=0}^{p}n_iA^i - \left( I + \sum_{i=1}^{q}d_iA^i\right)\left(\sum_{i=0}^{\infty}a_iA^i\right)=\sum_{k=p+q+1}^{\infty}c_kA^k \;.\]

When the left side of this equation is multiplied out and the coefficients of A^i are set equal to zero for i=0,1,...,p+q, the result is a system of p+q+1 linear equations:

    \[\ a_0-n_0=0\]

    \[\ d_1a_0+a_1-n_1=0\]

    \[\ d_2a_0+d_1a_1+a_2-n_2=0\]

    \[\ \vdots\;\;\;\;\;\;\;\;\;\;\;\;\;\vdots\;\;\;\;\;\;\;\;\;\;\;\;\;\vdots\]

    \[\ d_qa_{p-q}+d_{q-1}a_{p-q+1}+\dots+a_p-n_p=0\]


    \[\ d_{q}a_{p-q+1}+d_{q-1}a_{p-q+2}+ \cdots +d_1a_{p}+a_{p+1}=0\]

    \[\ d_{q}a_{p-q+2}+d_{q-1}a_{p-q+3}+ \cdots +d_1a_{p+1}+a_{p+2}=0\]

    \[\ \vdots\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\vdots\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\vdots\]

    \[\ d_{q}a_{p}+d_{q-1}a_{p+1}+ \cdots +d_1a_{p+q-1}+a_{p+q}=0\]

The last q equations involve only the unknowns d_1, d_2, \cdots, d_q and have to be solved first. Then the first p+1 equations are used successively to find n_0, n_1, \cdots, n_p.

Going back to our original problem, setting f(A)=e^{A} gives

    \[\ a_n=\frac{1}{n!} \;.\]

Solving this linear system with these coefficients gives

    \[\ N_{p/q} = \sum_{i=0}^{p}\frac{(p+q-i)!p!}{(p+q)!i!(p-i)!}A^i\]


    \[\ D_{p/q}= \sum_{i=0}^{q}\frac{(p+q-i)!q!}{(p+q)!i!(q-i)!}(-A)^i\]

It has been discussed\cite{Moler2003} that the diagonal approximants (p = q) are preferred over the off-diagonal approximants (p \neq q) for reasons of stability and economy of computation. For p=q, we have n_0=1, n_i=n_{i-1}\frac{p+1-i}{(2p+1-i)i}, and d_i = (-1)^in_i. As noted in Sidje's thesis\cite{Sidje1994}, R_{p/p} can be written in the following irreducible form for economical computation,

    \[\ R_{p/p}(A) = \left\{\begin{array}{ll} 1+2 \left(\sum_{i=0}^{p/2}n_{2i}A^{2i} - A\sum_{i=0}^{p/2-1}n_{2i+1}A^{2i} \right)^{-1}\left( A\sum_{i=0}^{p/2-1}n_{2i+1}A^{2i} \right) & \mathrm{if}\;p\;\mathrm{is\;even}\;,\\ -1-2\left(A\sum_{i=0}^{(p-1)/2}n_{2i+1}A^{2i} - \sum_{i=0}^{(p-1)/2}n_{2i}A^{2i} \right)^{-1}\left( \sum_{i=0}^{(p-1)/2}n_{2i}A^{2i} \right) & \mathrm{if}\;p\;\mathrm{is\;odd}\;. \end{array}\right.\]

The Padé approximation is only accurate near the origin, so the approximation of \exp(A) is not valid when \|A\| is too large. Fortunately, this problem can be solved by introducing the so-called scaling and squaring technique\cite{Moler2003, Sidje1998}. We make use of the exponential property

    \[\ \mathrm{exp}(A)=\left[\mathrm{exp}\left(2^{-s}A\right)\right]^{2^s} \approx \left[R_{p/p}\left(2^{-s}A\right)\right]^{2^s} \;,\]

where s is chosen such that \|2^{-s}A\| \leqslant 1/2. The idea is to choose a natural number s for which M=\mathrm{exp}\left(2^{-s}A\right) can be reliably and efficiently computed, and then to compute the result \mathrm{exp}(A)=M^{2^s} by repeated squaring.

Because the result is evaluated by repeatedly squaring the exponential of the scaled matrix, the drawback of this algorithm is that the roundoff errors may become large when s\gg1. The error analysis in these papers\cite{Moler2003, Sidje1998} has shown that if \|2^{-s}A\| \leqslant 1/2, then


    \[\ \left[R_{p/p}\left(2^{-s}A\right)\right]^{2^s}=\mathrm{exp}(A+E) \;, \label{eq:app:PadeError1}\]

with

    \[\ \frac{\|E\|}{\|A\|} \leqslant \frac{(p!)^2}{(2p)!(2p+1)!}\left(\frac{1}{2}\right)^{2p-3}\approx \left\{\begin{array}{c} 0.34\times10^{-15}\;\;\;\;(p=6)\\ 0.11\times10^{-18}\;\;\;\;(p=7)\\ 0.27\times10^{-22}\;\;\;\;(p=8)\\ \end{array}\right. \;. \label{eq:app:PadeError2}\]

Therefore, a value of p=6 is generally satisfactory for computation in double precision.

Matrix Exponential Source Code

In this section, we show the source code for the matrix exponential, implemented based on the irreducible (p,p)-degree Padé approximation described in the previous section and on the Fortran source code of the Expokit software package\cite{Sidje1998}. Our source code is written in C++, and we take advantage of the Boost C++ Libraries, which provide many useful basic linear algebra routines (uBLAS). Our source code is released under the GPLv2 (or any later version) or the Boost Software License, Version 1.0.


//  \file   expm.hpp
//  Implement matrix exponential using Pade approximation.
//  Copyright (c) 2007
//  \author Tsai, Dung-Bang  蔡東邦
//  Department of Physics,
//  National Taiwan University.
//  E-Mail : dbtsai [_at_] dbtsai [_dot_] org
//  Begin : 2007/11/20
//  Last modify : 2007/11/26
//  Version : v0.4
//
//  expm_pad computes the matrix exponential exp(H) for general matrices,
//  including complex and real matrices, using the irreducible (p,p) degree
//  rational Pade approximation to the exponential
//  exp(z) = r(z) = (+/-)( I + 2*(Q(z)/P(z)) ).
//
//  Usage :
//    U = expm_pad(H)
//    U = expm_pad(H, t),
//    U = expm_pad(H, t, p),
//  where t is a real number which defaults to 1.0, such that U = exp(t*H),
//  and p is internally set to 6 (recommended and generally satisfactory).
//
//  See also MATLAB supplied functions, EXPM and EXPM1.
//
//  Reference :
//  EXPOKIT, Software Package for Computing Matrix Exponentials.
//  ACM - Transactions On Mathematical Software, 24(1):130-156, 1998
//
//  Use, modification and distribution are subject to the Boost Software
//  License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at
//  http://www.boost.org/LICENSE_1_0.txt)

#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdlib>
#include <iostream>
#include <boost/numeric/ublas/vector.hpp>
#include <boost/numeric/ublas/matrix.hpp>
#include <boost/numeric/ublas/lu.hpp>
#include <boost/numeric/ublas/traits.hpp>

namespace boost { namespace numeric { namespace ublas {

template<typename MATRIX>
MATRIX expm_pad(const MATRIX &H,
                typename type_traits<typename MATRIX::value_type>::real_type t = 1.0,
                const int p = 6)
{
	typedef typename MATRIX::value_type value_type;
	typedef typename MATRIX::size_type size_type;
	typedef typename type_traits<value_type>::real_type real_value_type;
	assert(H.size1() == H.size2());
	assert(p >= 1);
	const size_type n = H.size1();
	const identity_matrix<value_type> I(n);
	matrix<value_type> U(n,n), H2(n,n), P(n,n), Q(n,n);
	real_value_type norm = 0.0;
	// Calculate the Pade coefficients.
	vector<real_value_type> c(p+1);
	c(0) = 1.0;
	for(size_type i = 0; i < p; ++i)
		c(i+1) = c(i) * ((p - i) / ((i + 1.0) * (2.0 * p - i)));
	// Calculate the infinity norm of H, defined as the largest row sum of the matrix.
	for(size_type i = 0; i < n; ++i) {
		real_value_type temp = 0.0;
		for(size_type j = 0; j < n; ++j)
			temp += std::abs(H(i, j));
		norm = t * std::max<real_value_type>(norm, temp);
	}
	// If norm = 0, and all H elements are not NaN or infinity but zero,
	// then U should be the identity.
	if (norm == 0.0) {
		bool all_H_are_zero = true;
		for(size_type i = 0; i < n; ++i)
			for(size_type j = 0; j < n; ++j)
				if( H(i,j) != value_type(0.0) )
					all_H_are_zero = false;
		if( all_H_are_zero == true ) return I;
		// Some error happened: H has elements which are NaN or infinity.
		std::cerr << "Null input error in the template expm_pad.\n";
		std::cout << "Null INPUT : " << H << "\n";
		exit(0);
	}
	// Scaling: seek s such that || H*2^(-s) || < 1/2, and set scale = 2^(-s).
	int s = 0;
	real_value_type scale = 1.0;
	if (norm > 0.5) {
		s = std::max<int>(0, static_cast<int>(std::log(norm) / std::log(2.0) + 2.0));
		scale /= real_value_type(std::pow(2.0, s));
	}
	U.assign((scale * t) * H); // U is used as a temporary because H is const.
	// Horner evaluation of the irreducible fraction; see the reference above.
	// Initialise P (numerator) and Q (denominator).
	H2.assign( prod(U, U) );
	Q.assign( c(p)*I );
	P.assign( c(p-1)*I );
	size_type odd = 1;
	for( size_type k = p - 1; k > 0; --k) {
		( odd == 1 ) ?
			( Q = ( prod(Q, H2) + c(k-1) * I ) ) :
			( P = ( prod(P, H2) + c(k-1) * I ) ) ;
		odd = 1 - odd;
	}
	( odd == 1 ) ? ( Q = prod(Q, U) ) : ( P = prod(P, U) );
	Q -= P;
	// In the original EXPOKIT package, LAPACK's ZGESV is used to obtain the
	// inverse matrix, and internally ZGESV uses an LU decomposition. Since
	// uBLAS has no matrix-inversion template, we simply use the built-in LU
	// decomposition in uBLAS and back substitute ourselves.
	permutation_matrix<size_type> pm(n);
	int res = lu_factorize(Q, pm);
	if( res != 0) {
		std::cerr << "Matrix inversion error in the template expm_pad.\n";
		exit(0);
	}
	// H2 is not needed anymore, so it is temporarily reused as the identity
	// matrix for the back substitution.
	H2.assign(I);
	lu_substitute(Q, pm, H2);
	(odd == 1) ?
		( U.assign( -(I + real_value_type(2.0) * prod(H2, P)) ) ):
		( U.assign(   I + real_value_type(2.0) * prod(H2, P) ) );
	// Squaring.
	for(size_type i = 0; i < static_cast<size_type>(s); ++i)
		U = prod(U, U);
	return U;
}

}}} // namespace boost::numeric::ublas



