Geometric methods for the estimation of structured covariances

Introduction

Consider a zero-mean, real-valued, discrete-time, stationary random process {x(t), ~ tin mathbb{Z} } . Let

denote the autocorrelation function, and

T:= left[ begin{array}{cccc} r(0) & r(1) & ldots& r(n-1) r(-1)& r(0) & ldots & r(n-2) vdots& vdots & ddots & vdots r(-n+1)& r(-n+2)& ddots& r(0) end{array}right]

the covariance of the finite observation vector

{{bf x}} = [x(0),~ x(1),~ldots,~x(n-1)]'

i.e. T=E({{bf x}{bf x}}') . Estimating from an observation record is often the first step in spectral analysis.

$hat{T} :=frac{1}{m}sum_{k=1}^m {bf x}_k {bf x}_k^prime$

may not have the Toeplitz structure. We seek a Toeplitz structured estimate by solving

$min_{{color{red} T}in mathcal{T}} d({color{red}T}, {color{blue}hat{T}}), hspace*{3cm}(1)$

where mathcal{T}:=left{T: Tgeq 0, T~ mbox{being Toeplitz} right} . In this, represents a suitable notion of distance. In the following, we overview some notions of distance and the relation among them. If the distance is not symmetric, we use d(T||hat{T}) instead. This approach can be carried out verbatim in approximating state covariances.

Likelihood divergence

Assuming that {x(t), tin mathbb{Z}} is Gaussian, and letting ${bf x}:= [{bf x}_1,ldots, {bf x}_m]$ , the joint density function is

$p({{bf x}}; T)=(2pi)^{- frac{mn} {2}} |T|^{-frac{m}{2}} exp(-frac{1}{2}{rm trace}(T^{-1}hat{T}) ),$

where $hat{T}=frac{1}{m}{bf x}{bf x}^prime$ is the sample covariance matrix. Given hat{T} , it is natural to seek a Tin mathcal{T} such that p({bf x}; T) , equivalently log p({bf x}; T) , is maximal. Alternatively, one may consider the likelihood divergence

$begin{array}{rl} d_{rm L}(T||hat{T}) :=& frac{1}{m}(log p({{bf x}}; hat{T})-log p({{bf x}}; T)) =& frac{1}{2}left(-log|hat{T}| +log|T|+ {rm trace} (hat{T}T^{-1})-n right). end{array}$

Using $d_{rm L}(T||hat{T})$ , then the relevant optimization problem in (1) is

$min_{{color{red} T}in mathcal{T}} left{ -log|{color{blue}hat{T}}| +log|{color{red} T}|+ {rm trace} ({color{blue}hat{T}}{color{red}T}^{-1})-nright},$

which is not convex. A necessary condition for a local minimum is given in [Burg, et al.]:

${rm trace} left( ( T^{-1}hat{T} T^{-1}-T^{-1}) Q right)=0,$

for all Toeplitz . An iterative algorithm for solving the above equation and obtain a (local) minimum is proposed in [Burg, et al.]. This is implemented in CovEst_blw.m.

Kullback-Leibler divergence

The Kullback-Leibler (KL) divergence between two probability density functions , hat{p} on $mathbb{R}^{n}$ is

$d_{rm KL}(p|| hat{p}) :=int_{mathbb{R}^n} plog (frac{p}{hat{p}}) dx.$

In the case where and hat{p} are normal with zero-mean and covariance and hat{T} , respectively, their KL divergence becomes

$d_{rm KL}(p|| hat{p}) =frac{1}{2}left(log|hat{T}| -log|T|+ {rm trace} (That{T}^{-1})-n right),$

while switching the order $d_{rm KL}(hat{p}|| p)= d_{rm L}(T||hat{T})$ given earlier. If $d_{rm KL}(p|| hat{p})$ is used in (1) , then the optimization problem becomes

$min_{{color{red} T}in mathcal{T}} left{ log|{color{blue}hat{T}}| -log|{color{red}T}|+ {rm trace} ({color{red}T}{color{blue}hat{T}}^{-1})-nright},$

which is convex. This is implemented in CovEst_KL.m which requires the installation of cvx.

Fisher metric and geodesic distance

The KL divergence induces the Fisher metric, which is the quadratic term of $d_{rm KL}(p|| p+delta)$

$g_{p,{rm Fisher}}(delta)=int frac{delta^2}{p}dx.$

For probability distributions parameterized by a vector theta , the corresponding metric is often referred to as the Fisher-Rao metric and given by

$g_{p,{rm Fisher-Rao}}(delta_theta)=delta_{theta}'Eleft[ left( frac{partial log p}{partialtheta}right)left( frac{partial log p}{partialtheta}right)' right]delta_{theta}.$

For zero-mean Gaussian distributions parametrized by the corresponding covariance matrices, it becomes the Rao metric

$g_{T, {rm Rao}}(Delta)=left|T^{-1/2} Delta T^{-1/2} right|_F^2.$

The geodesic distance between two points and hat{T} is the log-deviation distance

$d_{rm Log}(T,hat{T})=|log (hat{T}^{-1/2}T hat{T}^{-1/2}) |_F.$

When $d_{rm Log}(T,hat{T})$ is used in (1) , the corresponding optimization problem is not convex. Linearization of the objective function about hat{T} may be used instead, this leads to

$min_{{color{red}T}in mathcal{T}}left{ |{color{blue}hat{T}}^{-1/2}{color{red}T} {color{blue}hat{T}}^{-1/2}-I |_Fright}$

which is convex.

Bures metric and Hellinger distance

The Fisher information metric is the unique Riemannian metric for which stochastic maps are contractive [N. Cencov, 1982]. In quantum mechanics, a similar property has been sought for density matrices, which are positive semi-definite matrices with trace equal to one. There are several metrics for which stochastic maps are contractive. They take the form

${rm trace}(Delta D_{T}(Delta)).$

For example, in the Bures metric

$D_{T,{rm Bures}}(Delta):=M, ~mbox{where~}frac12 (TM+MT)=Delta.$

The Bures metric can also be written as

$g_{T, {rm Bures}}(Delta) =min_W Big{ ~|Y|_F^2 ~mid~ Delta=frac12(YW'+WY'), ~mbox{and}~T=WW' Big}.$

If ${rm trace}(T)=|W|_F^2=1$ , we regard as a point on the unit sphere, and this point is not unique. The geodesic distance between two density matrices is the smallest great-circle distance between corresponding points, which is call Bures length. The smallest “straight line” distance is the Hellinger distance which is

$begin{array}{rl} d_{rm H}(T, hat{T}):=!&min_{U, V} left{|T^{frac12}U!-!hat{T}^{frac12}V |_Fmid UU'!=!I,VV'!=!Iright} =&min_{U} left{|T^{frac12}U!-!hat{T}^{frac12} |_F mid UU'=Iright}. end{array}$

The distance has a closed form

$d_{rm H}(T, hat{T})=left({rm trace}(T+hat{T}-2(hat{T}^frac12 T hat{T}^frac12)^frac12) right)^frac12.$

When the above distance is used in (1) , the corresponding optimization problem is actually equivalent to a linear matrix inequality (LMI) [1]. This will be shown in the following section.

Transportation distance

Let {bf x} and bf{y} be random variables in $mathbb{R}^n$ having pdf's p_x and p_y . A formulation of the Monge-Kantorovich transportation problem (with a quadratic cost) is as follows. Determine

$d^2_{rm W_2} (p_x, p_y)!:=!inf_{p}left{E (|{bf x}-{bf y}|^{2})~mid~ int_x p!=!p_y,int_y p!=!p_x right},$

where is the joint distribution for {bf x} and bf{y} . The metric $d_{{rm W}_2}$ is known as the Wasserstein metric. For {bf x} and bf{y} being zero-mean normal with covariances and hat{T} and also jointly normal, we obtain

$d^2_{rm{W_2}}(p, hat{p})= min_{S}left{ {rm trace}(T+hat{T}-S-S') ~mid~ left[ begin{array}{cc} T & S S' & hat{T} end{array} right]geq 0 right},$

where $S:=E({bf x} bf{y}^prime)$ . The distance $d_{rm{W_2}}(p, hat{p})$ has a closed form which is given by

$d_{rm{W_2}}(p, hat{p})=left({rm trace}(T+hat{T}-2(hat{T}^frac12 T hat{T}^frac12)^frac12) right)^frac12.$

So $d_{rm{W_2}}(p, hat{p})=d_{rm H}(T, hat{T})$ . So using either $d_{rm{W_2}}(p, hat{p})$ , or $d_{rm H}(T, hat{T})$ , the relevant optimization problem (1) becomes

$min_{{color{red}T}in mathcal{T},; S}left{{rm trace}({color{red}T}+{color{blue}hat{T}}-S-S') ~mid~ left[ begin{array}{cc} {color{red}T} & S S' & {color{blue}hat{T}} end{array} right]geq 0right}.$

It is an LMI thus a convex optimization problem. This is implemented in CovEst_transp.m.

An alternative interpretation for the transportation distance is as follows. We postulate the stochastic model

where {bf v} represents noise, and hat{T} and the covariances of {bf hat x} and {bf x} , respectively. By seeking the least possible noise variance while allowing possible coupling between {bf x} and {bf v} brings us the minimum transportation distance. Analogous rationale, albeit with different assumptions has been used to justify different methods. For instance, assuming that {bf hat x}= {bf x}+{bf v} while {bf x} and {bf v} are independent leads to

$min_{Tinmathcal{T}}left{ {rm trace}(hat{T}-T)~mid~ hat{T}-Tgeq0right}.$

This is implemented in CovEst_Q.m (SCovEst_Q.m for state-covariance estimation). Then, also, assuming a “symmetric” noise contribution as in

{ bf hat {x}}+{bf hat{v}}={bf x}+{bf v},

where the noise vectors hat{bf v} and {bf v} are independent of {bf x} and {bf hat{x}} , we are called to solve

$min_{Tin {mathcal{T}}, Q,hat{Q}}left{ {rm trace}(hat{Q}+Q)~mid~ hat{T}+hat{Q}=T+Q,~Q,hat{Q}geq0right},$

where hat{Q} and designate covariances of hat{bf v} and {bf v} , respectively. This is implemented in CovEst_QQh.m (SCovEst_QQh.m for state covariance estimation).

Mean-based covariance approximation

We investigate the concept of the mean as a way to fuse together sampled statistics from different sources into a structured one. In particular, given a set of positive semi-definite matrices ${hat{T}_i, ~i=1,ldots, N }$ , we consider the mean-based structured covariance

$T^*:=argmin_{Tin{mathcal{T}}}~ frac{1}{N}sum_{i=1}^N d^2(T,hat{T}),$

for a metric d(T,hat{T}) or a divergence d(T||hat{T}) (and a suitable choice of power). Below, we consider each of the distances mentioned earlier.

Mean-based on KL divergence and likelihood

The optimization problem based on $d_{rm KL}$ is

$min_{Tin mathcal{T} } ~!!left{frac{1}{N}sum_{i=1}^{N}left(log |hat{T}_i|!-!log |T|!+!{rm trace}(That{T}_i^{-1})!-!nright)right},$

and is equivalent to

$min_{Tin mathcal{T} }~left{-log |T|+{rm trace}(T hat{T}^{-1})right},$

where $hat{T}=left(frac{1}{N}sum_{i=1}^N hat{T}_i^{-1}right)^{-1}$ is the harmonic mean of the $hat{T}_i$ 's. On the other hand, if the likelihood divergence $d_{rm L}$ is used, the optimization problem

$min_{Tin mathcal{T} }~!! left{frac{1}{N}sum_{i=1}^{N}left(-log|hat{T}_i|!+!log|T|!+!{rm trace} (hat{T}_i T^{-1})!-!nright)right}$

is equivalent to

$min_{Tin mathcal{T} }~left{log|T|+{rm trace} (hat{T} T^{-1})right},$

where $hat{T}=frac{1}{N}sum_{i=1}^N hat{T}_i$ is the arithmetic mean of $hat{T}_i$ 's.

Mean based on log-deviation

In this case

$min_{Tin mathcal{T}}~left{frac{1}{N}sum_{i=1}^{N} |log (hat{T}_i^{-1/2}T hat{T}_i^{-1/2}) |_F^2right}$

is not convex in . If the admissible set mathcal{T} is relaxed to the set of positive definite matrices, the minimizer is precisely the geometric mean, and it is the unique positive definite solution to the following equation

$sum_{i=1}^N log(hat{T}_i^{-1}T)=0.$

When N=2 , is the unique geometric mean between two positive definite matrices

$T=hat{T}_1^frac12 (hat{T}_1^{-frac12}hat{T}_2hat{T}_1^{-frac12})^frac12 hat{T}_1^frac12.$

One may consider instead the linearized optimization problem

$min_{Tin mathcal{T}}~left{sum_{i=1}^N |hat T_i^{-1/2}That T_i^{-1/2}-I|_F^2 right}$

which is convex in .

Mean based on transportation/Bures/Hellinger distance

Using the transportation/Bures/Hellinger distance, the optimization problem becomes

$min_{Tin mathcal{T},; S_i}~bigg{frac{1}{N}sum_{i=1}^N{rm trace}(T+hat{T}_i-S_i-S_i') ~mid ~ left[ begin{array}{cc} T & S_i S_i' & hat{T}_i end{array} right]geq 0,~forall~ i=1,cdots, N bigg}.$

When is only constrained to be positive semi-definite, it has been shown that the transportation mean is the unique solution of the following equation

$frac{1}{N}sum_{i=1}^N (T^frac12 hat{T}_i T^frac12)^frac12=T.$

The routine Tmean.m provides a solution to the mean based Toeplitz estimation problem which is based on cvx

Examples

Identifying a spectral line in noise

We compare the performance of the likelihood estimation and the transportation-based method in identifying a single spectral line in white noise. Consider a single “observation record”

$x(t)=cos(frac{pi}{4}t+phi)+v(t),; t=0,1,ldots, 10,$

with the “noise” generated from a zero-mean Gaussian distribution. Hence the covariance matrix

hat{T}=left[ begin{array}{c} x(0)x(1) vdots x(10)end{array}right]left[ begin{array}{cccc} x(0)&x(1)& cdots& x(10)end{array}right]

is 11times 11 and of rank equal to . We use CovEst_blw and CovEst_transp to approximate hat{T} with a covariance having admissible structure. We compare the corresponding maximum-entropy spectral estimates (subplot 2&3) with the spectrum obtained using pburg (matlab command) shown in subplot 1. The true spectral line is marked by an arrow in each plot.

Averaging multiple spectra

Consider different runs

x_k(t)=cos(omega_k t+phi_k)+v_k(t),~t=0,1,ldots, 10

with $omega_1=frac{pi}{4}-frac{pi}{100}$ , $omega_2=frac{pi}{4}$ , $omega_3=frac{pi}{4}+frac{pi}{100}$ and $phi_1=frac{pi}{4}$ , $phi_2=frac{2pi}{4}$ , $phi_3=frac{3pi}{4}$ an v_1, v_2 and v_3 generated from zero-mean Gaussian distribution with variance 0.0025 . We consider the three rank one sample covariances. We estimate the likelihood and transportation mean of these three sampled covariance with the requirement that the mean also be Toeplitz (using CovEst_blw and Tmean). We plot the corresponding maximum entropy spectra. As a comparison, we also take the average of three observation records, and compute the Burg's spectrum based on the averaged data. The three results are shown in the following figure. Both the likelihood and the Burg's methods have two peaks near the central frequency $frac{pi}{4}$ , whereas the transportation-based method has only one peak at approximately the mean of the three instantiations of the underlying sinusoid.

Reference

code1

[1] L. Ning, X. Jiang and T.T. Georgiou, “Geometric methods for the estimation of structured covariances,” submitted, http://arxiv.org/abs/1110.3695.