CCAMA – A customized alternating minimization algorithm for structured covariance completion problems


Written by:
Armin Zare and Mihailo R. Jovanovic

April 2016

Matlab Files
Download all Matlab files

Papers
Theory and algorithms:   arXiv:1412.3399
Application to turbulent flows:   arXiv:1602.05105

Purpose

This website provides a Matlab implementation of a customized alternating minimization algorithm (CCAMA) for solving the covariance completion problem (CC). This problem aims to explain and complete partially available statistical signatures of dynamical systems by identifying the directionality and dynamics of input excitations into linear approximations of the system. In doing so, we seek to account for the available correlation data using the smallest possible number of input disturbance channels. This inverse problem is formulated as a rank minimization problem and, for its solution, we employ a convex relaxation based on the nuclear norm. CCAMA exploits the structure of (CC) in order to gain computational efficiency and improve scalability.

Problem formulation

Consider a linear time-invariant system

\begin{array}{rcl}
\dot{x} & = & A\,x \;+\; B\,u \\[0.1cm]
y & = & C\,x
\end{array}

where x(t) is the state vector, y(t) is the output, u(t) is a stationary zero-mean stochastic process, A is the dynamic matrix, B is the input matrix, and C is the output matrix. For Hurwitz A and controllable (A, B), a positive definite matrix X qualifies as the steady-state covariance matrix of the state vector

X \,\mathrel{\mathop:}=\, \lim_{t\,\rightarrow\,\infty} \mathbf{E}\left( x(t)\, x^*(t) \right),

if and only if the linear equation

A\,X \,+\, X A^* \;=\; -\left( B H^* \,+\, H B^* \right),

is solvable for H. Here, \mathbf{E} is the expectation operator, H represents the cross-correlation of the state x and the input u, and * denotes the complex conjugate transpose.

The algebraic relation between the second-order statistics of the state and the forcing can be used to explain partially known sampled second-order statistics using stochastically driven LTI systems. While the dynamical generator A is known, the origin and directionality of the stochastic excitation u are unknown. It is also important to restrict the complexity of the forcing model. This complexity is quantified by the number of degrees of freedom that are directly influenced by stochastic forcing, which translates into the number of input channels, i.e., {\rm rank}(B). It can be shown that the rank of B is closely related to the signature of the matrix

\begin{array}{rcl}
Z & \mathrel{\mathop:}= & -(A\,X \,+\, X A^*) \\[.15cm]
& = & B H^* \,+\, H B^*.
\end{array}

The signature of a matrix is determined by the number of its positive, negative, and zero eigenvalues. In addition, the rank of Z provides a lower bound on the rank of B, since {\rm rank}(Z) \,\leq\, 2\,{\rm rank}(B).
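
For illustration, the minimal sketch below (a hypothetical example, assuming MATLAB's Control System Toolbox for lyap) generates a random Hurwitz matrix A and a rank-2 input matrix B, computes the state covariance for white-in-time unit-variance forcing, and reports the rank and signature of Z = -(A X + X A^*).

% Hypothetical example: connection between rank(B) and the signature of Z
rng(1);
n = 10; m = 2;                          % example state and input dimensions
A = randn(n);
A = A - (max(real(eig(A))) + 1)*eye(n); % shift spectrum so that A is Hurwitz
B = randn(n, m);                        % low-rank input matrix

% For white-in-time forcing with unit covariance, X solves A X + X A' + B B' = 0
X = lyap(A, B*B');

Z    = -(A*X + X*A');                   % here Z = B B' up to round-off
eigZ = eig((Z + Z')/2);
tol  = 1e-8*max(abs(eigZ));
fprintf('rank(Z) = %d, rank(B) = %d\n', sum(abs(eigZ) > tol), rank(B));
fprintf('signature of Z: %d positive, %d negative, %d zero eigenvalues\n', ...
        sum(eigZ > tol), sum(eigZ < -tol), sum(abs(eigZ) <= tol));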

Based on this, the problem of identifying low-complexity structures for stochastic forcing can be formulated as the following structured covariance completion problem

\begin{array}{cl}
{\rm minimize} & -{\rm log\,det}\left(X\right) \,+\, \gamma\,\|Z\|_* \\[.25cm]
{\rm subject~to} & A\,X \,+\, X A^* \,+\, Z \;=\; 0 \\[.15cm]
& \left( C X C^* \right) \circ E \,-\, G \;=\; 0.
\end{array}
\hspace{1.5cm} {\rm (CC)}

Here, \gamma is a positive regularization parameter, the matrices A, C, E, and G are problem data, and the Hermitian matrices X and Z are optimization variables. Entries of G represent partially available second-order statistics of the output y, the symbol \circ denotes elementwise matrix multiplication, and E is the structural identity matrix,

E_{ij} \;=\; \left\{ \begin{array}{ll}
1, & ~{\rm if}~ G_{ij} ~{\rm is~available} \\[.1cm]
0, & ~{\rm if}~ G_{ij} ~{\rm is~unavailable.}
\end{array} \right.

Convex optimization problem (CC) combines the nuclear norm with an entropy function in order to target low-complexity structures for stochastic forcing and to facilitate the construction of a particular class of low-pass filters that generate colored-in-time forcing correlations. The nuclear norm, i.e., the sum of singular values of a matrix, \|Z\|_* \mathrel{\mathop:}= \sum_i \sigma_i(Z), is used as a proxy for rank minimization. On the other hand, the logarithmic barrier function in the objective is introduced to guarantee the positive definiteness of the state covariance matrix X.
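
For small problems, (CC) can also be solved directly with a general-purpose modeling package. The sketch below is a minimal CVX formulation (assuming CVX is installed, real-valued data A, C, E, G, and a chosen regularization value gam for \gamma); it is useful mainly as a baseline for validation, since generic SDP solvers do not scale to the problem sizes that motivate CCAMA.

% Minimal CVX sketch of (CC); A, C, E, G, gam, and n are assumed given
% (unavailable entries of G are assumed to be stored as zeros)
cvx_begin quiet
    variable X(n, n) symmetric          % use hermitian for complex-valued data
    variable Z(n, n) symmetric
    minimize( -log_det(X) + gam*norm_nuc(Z) )
    subject to
        A*X + X*A' + Z == 0;            % Lyapunov-like constraint
        (C*X*C').*E == G;               % match available output statistics
cvx_end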

In (CC), \gamma determines the importance of the nuclear norm relative to the logarithmic barrier function. The convexity of (CC) follows from the convexity of the objective function

J_p(X,Z) \;\mathrel{\mathop:}=\; -{\rm log\,det}\, X \,+\, \gamma\,\|Z\|_*

and the convexity of the constraint set. Problem (CC) can be equivalently expressed as follows,

\begin{array}{cl}
{\rm minimize} & -{\rm log\,det}\, X \,+\, \gamma\,\|Z\|_* \\[.25cm]
{\rm subject~to} & {\cal A}\, X \,+\, {\cal B}\, Z \,-\, {\cal C} \,=\, 0,
\end{array}
\hspace{1.5cm} {\rm (CC-1)}

where the constraints are now given by

\left[ \begin{array}{c} {\cal A}_1 \\ {\cal A}_2 \end{array} \right] X
\,+\,
\left[ \begin{array}{c} I \\ 0 \end{array} \right] Z
\,-\,
\left[ \begin{array}{c} 0 \\ G \end{array} \right]
\,=\, 0.

Here, {\cal A}_1 and {\cal A}_2 are linear operators, with

\begin{array}{c}
{\cal A}_1(X) \;\mathrel{\mathop:}=\; A\,X \;+\; X A^* \\[.15cm]
{\cal A}_2(X) \;\mathrel{\mathop:}=\; (C\,X\,C^*) \circ E
\end{array}

and their adjoints, with respect to the standard inner product \left< M_1, M_2 \right> \mathrel{\mathop:}= {\rm trace}\,(M_1^* M_2), are given by

\begin{array}{c}
{\cal A}_1^\dagger(Y) \;=\; A^*\,Y \;+\; Y A \\[.15cm]
{\cal A}_2^\dagger(Y) \;=\; C^* (E \circ Y)\,C.
\end{array}
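
In MATLAB, the two operators and their adjoints can be encoded as anonymous functions. The sketch below (with A, C, E denoting the problem data and n, p the assumed state and output dimensions) also verifies the adjoint relation numerically through the inner product \left< M_1, M_2 \right> = {\rm trace}\,(M_1^* M_2).

% Linear operators and adjoints as anonymous function handles
A1  = @(X)  A*X + X*A';             % A1(X)  = A X + X A^*
A2  = @(X)  (C*X*C').*E;            % A2(X)  = (C X C^*) o E
A1t = @(Y1) A'*Y1 + Y1*A;           % A1^dagger(Y1) = A^* Y1 + Y1 A
A2t = @(Y2) C'*(E.*Y2)*C;           % A2^dagger(Y2) = C^* (E o Y2) C

% Numerical adjoint check: <Y, A2(X)> should equal <A2t(Y), X>
X0 = randn(n); X0 = (X0 + X0')/2;
Y0 = randn(p); Y0 = (Y0 + Y0')/2;
adj_err = abs( trace(Y0'*A2(X0)) - trace(A2t(Y0)'*X0) );   % ~ machine precision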

By splitting Z into its positive and negative semidefinite parts,

Z \;=\; Z_+ \;-\; Z_-, \qquad Z_+ \;\succeq\; 0, \qquad Z_- \;\succeq\; 0

it can be shown that (CC-1) can be cast as an SDP,

\begin{array}{cl}
{\rm minimize} & -{\rm log\,det}\, X \;+\; \gamma \left( {\rm trace}\,(Z_+) \,+\, {\rm trace}\,(Z_-) \right) \\[.25cm]
{\rm subject~to} & {\cal A}_1(X) \,+\, Z_+ \,-\, Z_- \,=\, 0 \\[0.15cm]
& {\cal A}_2(X) \,-\, G \,=\, 0 \\[0.15cm]
& Z_+ \,\succeq\, 0, \qquad Z_- \,\succeq\, 0.
\end{array}
\hspace{1.5cm} {\rm (P)}

The Lagrange dual of (P) is given by

\begin{array}{cl}
{\rm maximize} & {\rm log\,det}\left( {\cal A}_1^\dagger(Y_1) \,+\, {\cal A}_2^\dagger(Y_2) \right) \,-\, \left< G,\, Y_2 \right> \,+\, n \\[.25cm]
{\rm subject~to} & \|Y_1\|_2 \,\leq\, \gamma
\end{array}
\hspace{1.5cm} {\rm (D)}

where Hermitian matrices Y_1, Y_2 are the dual variables associated with the equality constraints in (P) and n is the number of states.
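
Reusing the operator handles from the sketch above, the dual objective in (D) can be evaluated as follows. This is a minimal sketch: the Cholesky factorization assumes that the matrix {\cal A}_1^\dagger(Y_1) + {\cal A}_2^\dagger(Y_2) is positive definite (otherwise the dual objective is unbounded below), and G and n are problem data.

% Dual objective J_d(Y1, Y2) = log det(A1t(Y1) + A2t(Y2)) - <G, Y2> + n
Jd = @(Y1, Y2) 2*sum(log(diag(chol( A1t(Y1) + A2t(Y2) )))) ...
               - real(trace(G'*Y2)) + n;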

Alternating Minimization Algorithm (AMA)

The logarithmic barrier function in (CC) is strongly convex over any compact subset of the positive definite cone. This makes it well-suited for the application of AMA, which requires strong convexity of the smooth part of the objective function.

The augmented Lagrangian associated with (CC-1) is

\begin{array}{l}
{\cal L}_\rho(X, Z; Y_1, Y_2) \;=\; -{\rm log\,det}\, X \,+\, \gamma\,\|Z\|_* \,+\, \left< Y_1,\, {\cal A}_1(X) + Z \right> \,+\, \left< Y_2,\, {\cal A}_2(X) - G \right> \;+ \\[0.25cm]
\hfill \dfrac{\rho}{2}\,\|{\cal A}_1(X) \,+\, Z\|_F^2 \;+\; \dfrac{\rho}{2}\,\|{\cal A}_2(X) \,-\, G\|_F^2
\end{array}

where \rho is a positive scalar and \|\cdot\|_F is the Frobenius norm.

AMA consists of the following steps:

\begin{array}{rrcl}
X\mathbf{-minimization~step:} & X^{k+1} & \mathrel{\mathop:}= & {\rm argmin}_{X}\; {\cal L}_0\,(X,\, Z^k;\, Y_1^k,\, Y_2^k) \\[0.25cm]
Z\mathbf{-minimization~step:} & Z^{k+1} & \mathrel{\mathop:}= & {\rm argmin}_{Z}\; {\cal L}_\rho\,(X^{k+1},\, Z;\, Y_1^k,\, Y_2^k) \\[0.25cm]
Y\mathbf{-update~step:} & Y_1^{k+1} & \mathrel{\mathop:}= & Y_1^k \,+\, \rho\left( {\cal A}_1(X^{k+1}) \,+\, Z^{k+1} \right) \\[0.25cm]
& Y_2^{k+1} & \mathrel{\mathop:}= & Y_2^k \,+\, \rho\left( {\cal A}_2(X^{k+1}) \,-\, G \right)
\end{array}

The iterations terminate when the duality gap

\Delta_{\rm gap} \;\mathrel{\mathop:}=\; -{\rm log\,det}\, X^{k+1} \,+\, \gamma\,\|Z^{k+1}\|_* \,-\, J_d\left( Y_1^{k+1}, Y_2^{k+1} \right)

and the primal residual

\Delta_{\rm p} \;\mathrel{\mathop:}=\; \|{\cal A}\,X^{k+1} \,+\, {\cal B}\,Z^{k+1} \,-\, {\cal C}\|_F

are sufficiently small, i.e., |\Delta_{\rm gap}| \,\leq\, \epsilon_1 and \Delta_{\rm p} \,\leq\, \epsilon_2. Here, J_d denotes the objective function of the dual problem (D). In the X-minimization step, AMA minimizes the Lagrangian {\cal L}_0 with respect to X. This step is followed by a Z-minimization step in which the augmented Lagrangian {\cal L}_\rho is minimized with respect to Z. Finally, the Lagrange multipliers Y_1 and Y_2 are updated based on the primal residuals with step-size \rho.
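
A sketch of these stopping criteria, reusing A1, A2, and Jd from the sketches above; here X, Z, Y1, Y2 denote the current iterates, gam denotes \gamma, and eps1, eps2 are user-specified tolerances.

% Duality gap and primal residual for the stopping criteria
Jp    = -2*sum(log(diag(chol(X)))) + gam*sum(svd(Z));        % primal objective
dgap  = Jp - Jd(Y1, Y2);                                     % duality gap
rprim = sqrt( norm(A1(X) + Z, 'fro')^2 + norm(A2(X) - G, 'fro')^2 );
converged = (abs(dgap) <= eps1) && (rprim <= eps2);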

The X-minimization step amounts to a matrix inversion and the Z-minimization step amounts to the application of the soft-thresholding operator {\cal S}_\tau to the eigenvalues of a Hermitian matrix:

\begin{array}{rcl}
X^{k+1} & = & \left( {\cal A}_1^\dagger(Y_1^k) \,+\, {\cal A}_2^\dagger(Y_2^k) \right)^{-1} \\[0.25cm]
Z^{k+1} & = & {\cal S}_\tau\left( -{\cal A}_1(X^{k+1}) \,-\, (1/\rho)\,Y_1^k \right), \qquad \tau = \gamma/\rho \\[0.25cm]
Y_1^{k+1} & = & {\cal T}_\gamma\left( Y_1^k \,+\, \rho\,{\cal A}_1(X^{k+1}) \right) \\[0.25cm]
Y_2^{k+1} & = & Y_2^k \,+\, \rho\left( {\cal A}_2(X^{k+1}) \,-\, G \right)
\end{array}

For a Hermitian matrix M with eigenvalue decomposition M = U\,\Sigma\,U^*, where \Sigma = {\rm diag}(\sigma_i), {\cal S}_\tau and {\cal T}_\tau denote the soft-thresholding and saturation operators that act on the eigenvalues of M:

\begin{array}{rclrcl}
{\cal S}_\tau(M) & \mathrel{\mathop:}= & U\,{\cal S}_\tau(\Sigma)\,U^*, & \qquad {\cal S}_\tau(\Sigma) & = & {\rm diag}\left( {\rm sign}(\sigma_i) \left( |\sigma_i| \,-\, \tau \right)_+ \right) \\[.25cm]
{\cal T}_\tau(M) & \mathrel{\mathop:}= & U\,{\cal T}_\tau(\Sigma)\,U^*, & \qquad {\cal T}_\tau(\Sigma) & = & {\rm diag}\left( \min\left( \max(\sigma_i, -\tau),\, \tau \right) \right)
\end{array}

with a_+ \mathrel{\mathop:}= \max\,\{a, 0\}. The updates of the Lagrange multipliers guarantee dual feasibility at each iteration, i.e., \|Y_1^{k+1}\|_2 \,\leq\, \gamma for all k.
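
Putting the four updates together, one AMA iteration can be sketched in MATLAB as follows. This reuses the operator handles A1, A2, A1t, A2t from the earlier sketches; gam and rho denote \gamma and the step-size \rho, and the eigenvalue decomposition implements the thresholding and saturation of a Hermitian matrix.

% One AMA iteration (sketch)
% X-minimization: explicit inverse (valid when A1t(Y1) + A2t(Y2) is positive definite)
X = inv( A1t(Y1) + A2t(Y2) );
X = (X + X')/2;                                   % symmetrize against round-off

% Z-minimization: soft-thresholding of the eigenvalues of a Hermitian matrix
W      = -A1(X) - Y1/rho;  W = (W + W')/2;
[U, S] = eig(W);  s = diag(S);
Z      = U*diag( sign(s).*max(abs(s) - gam/rho, 0) )*U';     % S_{gam/rho}(W)

% Y-updates: saturation keeps ||Y1||_2 <= gam (dual feasibility)
V1     = Y1 + rho*A1(X);  V1 = (V1 + V1')/2;
[Q, D] = eig(V1);  d = diag(D);
Y1     = Q*diag( min(max(d, -gam), gam) )*Q';                % T_gam(V1)
Y2     = Y2 + rho*( A2(X) - G );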

Alternating Direction Method of Multipliers (ADMM)

In contrast to AMA, ADMM minimizes the augmented Lagrangian in each step of the iterative procedure. In addition, ADMM does not have efficient step-size selection rules: typically, either a constant step-size is selected or the step-size is adjusted to keep the norms of the primal and dual residuals within a constant factor of one another, Boyd et al. '11.

ADMM consists of the following steps:

\begin{array}{rrcl}
X\mathbf{-minimization~step:} & X^{k+1} & \mathrel{\mathop:}= & {\rm argmin}_{X}\; {\cal L}_\rho\,(X,\, Z^k;\, Y_1^k,\, Y_2^k) \\[0.25cm]
Z\mathbf{-minimization~step:} & Z^{k+1} & \mathrel{\mathop:}= & {\rm argmin}_{Z}\; {\cal L}_\rho\,(X^{k+1},\, Z;\, Y_1^k,\, Y_2^k) \\[0.25cm]
Y\mathbf{-update~step:} & Y_1^{k+1} & \mathrel{\mathop:}= & Y_1^k \,+\, \rho\left( {\cal A}_1(X^{k+1}) \,+\, Z^{k+1} \right) \\[0.25cm]
& Y_2^{k+1} & \mathrel{\mathop:}= & Y_2^k \,+\, \rho\left( {\cal A}_2(X^{k+1}) \,-\, G \right)
\end{array}

which terminate with stopping criteria similar to those used for AMA.

While the Z-minimization step is equivalent to that of AMA, the X-update in ADMM is obtained by minimizing the augmented Lagrangian. This amounts to solving the following optimization problem

\begin{array}{cl}
{\rm minimize} & -{\rm log\,det}\, X \;+\; \dfrac{\rho}{2}\, \displaystyle{\sum_{i\,=\,1}^{2}} \,\|{\cal A}_i(X) \,-\, U_i^k\|_F^2
\end{array}

where U_1^k \mathrel{\mathop:}= -\left( Z^k + (1/\rho)\,Y_1^k \right) and U_2^k \mathrel{\mathop:}= G - (1/\rho)\,Y_2^k. From the first-order optimality conditions we have

-\,X^{-1} \,+\, \rho\,{\cal A}_1^\dagger\left( {\cal A}_1(X) - U_1^k \right) \,+\, \rho\,{\cal A}_2^\dagger\left( {\cal A}_2(X) - U_2^k \right) \;=\; 0.

Since {\cal A}_1^\dagger {\cal A}_1 and {\cal A}_2^\dagger {\cal A}_2 are not unitary operators, the X-minimization step does not have an explicit solution. To solve this problem, we use a proximal gradient method, Parikh and Boyd '13, to update X. By linearizing the quadratic term around the current inner iterate X_i and adding a quadratic penalty on the difference between X and X_i, the next iterate X_{i+1} is obtained as the minimizer of

-{\rm log\,det}\, X \;+\; \rho\, \displaystyle{\sum_{j\,=\,1}^{2}} \left< {\cal A}_j^\dagger\left( {\cal A}_j(X_i) - U_j^k \right),\, X \right> \;+\; \dfrac{\mu}{2}\,\|X \,-\, X_i\|_F^2.

To ensure convergence of the proximal gradient method, the parameter \mu has to satisfy \mu \,\geq\, \rho\,\lambda_{\max}\left( {\cal A}_1^\dagger {\cal A}_1 \,+\, {\cal A}_2^\dagger {\cal A}_2 \right), where we use the power iteration to compute the largest eigenvalue of the operator {\cal A}_1^\dagger {\cal A}_1 + {\cal A}_2^\dagger {\cal A}_2.
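
A simple power iteration for estimating \lambda_{\max} of this composite operator can be sketched as below, reusing the operator handles A1, A2, A1t, A2t; the iteration count is an arbitrary choice and rho denotes \rho.

% Power iteration on the self-adjoint operator T = A1t o A1 + A2t o A2
T  = @(X) A1t(A1(X)) + A2t(A2(X));
Xp = randn(n); Xp = (Xp + Xp')/2; Xp = Xp/norm(Xp, 'fro');
lam_max = 0;
for it = 1:50
    W       = T(Xp);
    lam_max = real(trace(Xp'*W));     % Rayleigh quotient estimate (||Xp||_F = 1)
    Xp      = W/norm(W, 'fro');
end
mu = rho*lam_max;                     % step-size parameter for the proximal update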

By taking the variation with respect to X, we arrive at the first-order optimality condition

\mu\,X \,-\, X^{-1} \;=\; \mu\,X_i \,-\, \rho\, \displaystyle{\sum_{j\,=\,1}^{2}} {\cal A}_j^\dagger\left( {\cal A}_j(X_i) - U_j^k \right)
\hspace{1.5cm} {\rm (*)}

The solution to (*) is given by

X_{i+1} \;=\; V\,{\rm diag}\left( g \right) V^*

where the jth entry of the vector g is given by

g_j \;=\; \dfrac{\lambda_j}{2\mu} \,+\, \sqrt{ \left( \dfrac{\lambda_j}{2\mu} \right)^2 \,+\, \dfrac{1}{\mu} }.

Here, the \lambda_j's are the eigenvalues of the matrix on the right-hand side of (*) and V is the matrix of the corresponding eigenvectors. As is typically done in proximal gradient algorithms, Parikh and Boyd '13, starting with X_0 \mathrel{\mathop:}= X^k, we obtain X^{k+1} by repeating the inner iterations until the desired accuracy is reached.
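
One inner proximal-gradient step can then be sketched as follows, reusing the operator handles; Xi denotes the current inner iterate X_i, U1 and U2 stand for U_1^k and U_2^k, and rho, mu are as defined above.

% One inner iteration of the proximal gradient X-update in ADMM
RHS = mu*Xi - rho*( A1t(A1(Xi) - U1) + A2t(A2(Xi) - U2) );   % right-hand side of (*)
RHS = (RHS + RHS')/2;
[V, L] = eig(RHS);  lam = diag(L);
g   = lam/(2*mu) + sqrt( (lam/(2*mu)).^2 + 1/mu );           % solves mu*g - 1/g = lam
Xi  = V*diag(g)*V';                                          % next inner iterate X_{i+1}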

Acknowledgements

This project is supported by the