### MR forward models

Before we introduce the concept of compressed sensing MR, we begin with a discussion of the MR forward model. Mathematically, the forward model for the k-space measurement can be described by

$$ {b}^i\left(\mathbf{k}\right)=\int {\gamma}^i(r){e}^{-j2\pi {k}^Tr} dr,\kern1em i=1,\dots, C, $$

(1)

where **r** ∈ R^d and **k** ∈ R^{d}, d = 2, 3, denote the image domain and k-space coordinates, respectively, C is the number of coils, and the i-th coil image γ^{i} is given by

$$ {\upgamma}^i\left(\mathbf{r}\right)=\upnu \left(\mathbf{r}\right){s}^i\left(\mathbf{r}\right) $$

(2)

Here, ν(**r**) denotes the contrast-weighted bulk magnetization distribution, and *s*^{i}(**r**) is the sensitivity of the i-th coil. Although the expression may give the impression that the measurement is two- or three-dimensional, it in fact originates from a 1-D measurement, since the k-space trajectory is a function of time, i.e. **k**:= **k**(*t*), and we acquire one k-space sample at each time point t. This makes MR imaging inherently slow, since we need to scan through a 3-D object via 1-D trajectories.
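As a concrete illustration, a discrete Cartesian version of the forward model (1)-(2) can be sketched in a few lines of NumPy. The phantom, the number of coils, and the Gaussian sensitivity profiles below are illustrative assumptions, not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
n, C = 64, 4

# Contrast-weighted magnetization nu(r): a simple synthetic disk phantom.
x = np.linspace(-1, 1, n)
X, Y = np.meshgrid(x, x)
nu = (X**2 + Y**2 < 0.6).astype(float)

# Smooth coil sensitivities s^i(r): Gaussians centered near the four corners.
centers = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
S = np.stack([np.exp(-((X - cx)**2 + (Y - cy)**2)) for cx, cy in centers])

# Coil images gamma^i = nu * s^i, as in (2), and fully sampled Cartesian
# k-space b^i = F gamma^i, the discrete analogue of (1).
gamma = S * nu                         # shape (C, n, n)
b = np.fft.fft2(gamma, axes=(-2, -1))

# Since each time point acquires one k-space sample, a full Cartesian
# acquisition requires C * n * n samples in total.
print(b.shape)  # (4, 64, 64)
```

This also makes the slowness argument concrete: the number of acquired samples grows with the full grid size, even though the underlying image may be highly compressible.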

Dynamic MRI is another important MR technique used to monitor dynamic processes such as brain hemodynamics and cardiac motion. Among the various forms of dynamic MR modeling, here we mainly focus on the k-t formulation. Specifically, consider a discrete imaging equation for a Cartesian trajectory for simplicity. Because the samples along the readout direction are fully sampled, most dynamic MR formulations are applied separably after taking the Fourier transform along the readout direction. More specifically, let γ*(s, t)* denote the unknown image content (for example, proton density, T1/T2 weighted image, etc.) at the spatial coordinate s along the phase encoding line at time instance t. Then, the k-t space measurement *b(k, t)* at time t is given by

$$ b\left(k,t\right)=\int \gamma \left(s,t\right){e}^{-j2\pi ks} ds $$

(3)

where γ*(s, t)* is the spatio-temporal image, which may be weighted by a coil sensitivity map in the case of parallel imaging.

Throughout the paper, for simplicity we often use the operator notation for (1):

$$ {b}^i=F\left[{\mathrm{S}}^i\right]\upnu, \kern1em i=1,\cdots, C, $$

(4)

where [S^{i}] is a diagonal operator comprised of the i-th coil sensitivity, or equivalently

$$ {b}^i=F{\gamma}^i,\kern1em i=1,\cdots, C $$

(5)

for the case of parallel imaging. Similarly, we use the operator notation *b* = *Fγ* for the dynamic imaging model (3).

### Compressed sensing theory

#### Performance guarantees

Compressed sensing (CS) theory [12, 16, 17] addresses the accurate recovery of unknown sparse signals from underdetermined linear measurements and has been one of the main research topics in the signal processing area over the last two decades [18,19,20,21,22,23]. Most compressed sensing theories have been developed to address the so-called single measurement vector (SMV) problem [12, 16, 17]. More specifically, let m and n be positive integers such that m < n. Then, the SMV compressive sensing problem is given by

$$ (P0):{\displaystyle \begin{array}{c}\operatorname{minimize}\ {\left\Vert \mathrm{x}\right\Vert}_0\\ {}\mathrm{subject}\ \mathrm{to}\ \left\Vert \mathrm{y}-\mathrm{Ax}\right\Vert <\upepsilon, \end{array}} $$

(6)

where y ∈ R^m, A ∈ R^{m×n}, x ∈ R^n, and ϵ denotes the noise level. (P0) seeks the sparsest solution consistent with the measurements.

Since (P0) requires a computationally expensive combinatorial optimization, greedy methods [24], reweighted norm algorithms [25, 26], or convex relaxation using the l_{1} norm [12, 27] have been widely investigated as alternatives. In particular, the convex relaxation approach addresses the following l_{1} minimization problem:

$$ (P1):{\displaystyle \begin{array}{c}\operatorname{minimize}\ {\left\Vert \mathrm{x}\right\Vert}_1\\ {}\mathrm{subject}\ \mathrm{to}\ \left\Vert \mathrm{y}-\mathrm{Ax}\right\Vert <\upepsilon, \end{array}} $$

(7)
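Among the alternatives to the combinatorial (P0), the greedy family is the easiest to illustrate. Below is a minimal orthogonal matching pursuit (OMP) sketch; the problem sizes and the random Gaussian sensing matrix are illustrative assumptions, not a reference implementation of the cited methods.

```python
import numpy as np

def omp(A, y, k):
    """Orthogonal matching pursuit: greedily recover a k-sparse x
    from y = A x (a minimal sketch, not a production solver)."""
    m, n = A.shape
    support, r = [], y.copy()
    for _ in range(k):
        # pick the column most correlated with the current residual
        j = int(np.argmax(np.abs(A.T @ r)))
        if j not in support:
            support.append(j)
        # least-squares fit on the current support, then update the residual
        xs, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        r = y - A[:, support] @ xs
    x = np.zeros(n)
    x[support] = xs
    return x

rng = np.random.default_rng(1)
m, n, k = 40, 100, 3
A = rng.standard_normal((m, n)) / np.sqrt(m)   # random Gaussian sensing matrix
x_true = np.zeros(n)
x_true[[5, 40, 77]] = [1.0, -2.0, 1.5]
x_hat = omp(A, A @ x_true, k)
print(np.linalg.norm(x_hat - x_true))  # recovery error (expected tiny w.h.p.)
```

For Gaussian matrices with m much larger than k log(n/k), such greedy recovery succeeds with high probability, which ties directly to the RIP discussion below.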

One of the important theoretical tools of CS is the so-called restricted isometry property (RIP), which enables us to guarantee robust recovery of certain input signals [17]. More specifically, a sensing matrix A ∈ R^{m×n} is said to satisfy the k-restricted isometry property (RIP) if there is a constant 0 ≤ δ_{k} < 1 such that

$$ \left(1-{\updelta}_k\right){\left\Vert \mathrm{x}\right\Vert}^2\le {\left\Vert \mathrm{Ax}\right\Vert}^2\le \left(1+{\updelta}_k\right){\left\Vert \mathrm{x}\right\Vert}^2 $$

for all x ∈ R^n with ||x||_{0} ≤ k. It has been demonstrated that δ_{2k} < \( \sqrt{2}-1 \) is sufficient for l_{1}/l_{0} equivalence [12]. For many classes of random matrices, the RIP condition is satisfied with high probability if the sampling pattern is incoherent and the number of measurements satisfies m ≥ *ck* log(*n/k*) for some constant c > 0 [17].
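Verifying RIP exactly is combinatorial, but the near-isometry above can be checked empirically for a random Gaussian matrix. All sizes below are illustrative assumptions; the experiment demonstrates concentration, not a proof of RIP.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, k = 120, 512, 10
# Entries drawn N(0, 1/m) make E||Ax||^2 = ||x||^2 for any fixed x.
A = rng.standard_normal((m, n)) / np.sqrt(m)

# Ratio ||Ax||^2 / ||x||^2 over many random k-sparse vectors.
ratios = []
for _ in range(500):
    x = np.zeros(n)
    idx = rng.choice(n, size=k, replace=False)
    x[idx] = rng.standard_normal(k)
    ratios.append(np.linalg.norm(A @ x) ** 2 / np.linalg.norm(x) ** 2)

print(min(ratios), max(ratios))  # both concentrate around 1
```

The observed spread of the ratios gives an empirical feel for the constant δ_k; for a rigorous statement one must control all k-sparse supports simultaneously, which is where the m ≥ ck log(n/k) scaling enters.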

#### Optimization approaches

In practice, the following unconstrained form of optimization problem is often used:

$$ \underset{x}{\min}\frac{1}{2}{\left\Vert \mathrm{y}-\mathrm{Ax}\right\Vert}^2+\uplambda {\left\Vert \Psi \mathrm{x}\right\Vert}_1 $$

(8)

where y is the noisy measurement, λ > 0 is the regularization parameter, and Ψ refers to an analysis transform such that Ψx becomes sparse. Note that for the special case Ψ = I, (8) reduces to the unconstrained (Lagrangian) form of (*P*1).

One of the technical issues in solving (8) is that the cost function is not smooth at Ψx = 0, which makes the corresponding gradient ill-defined. This led to the development of various techniques, which culminated in a new branch of convex optimization theory called *proximal optimization* [28]. For example, popular optimization methods such as forward-backward splitting (FBS) [29], split Bregman iteration [30], the alternating direction method of multipliers (ADMM) [31], the Douglas-Rachford splitting algorithm [32], and the primal-dual algorithm [33] have been developed to solve compressed sensing problems. However, a comprehensive coverage of these techniques deserves another review paper, so in this section we mainly review the ADMM algorithm, which has been extensively used for compressed sensing MRI ever since its first introduction to the field [34].

More specifically, we convert the problem (8) to the following constrained problem:

$$ \underset{\gamma, \mathrm{u}}{\min }\ \frac{1}{2}{\left\Vert y- A\gamma \right\Vert}^2+\uplambda {\left\Vert u\right\Vert}_1 $$

(9)

subject to

$$ \mathrm{u}=\Psi \upgamma . $$

(10)

Then, the associated ADMM is given by

$$ {\gamma}^{\left(k+1\right)}=\mathit{\arg}\ \underset{\gamma}{\min }\ \frac{1}{2}{\left\Vert y- A\gamma \right\Vert}^2+\frac{\upmu}{2}{\left\Vert {\upzeta}^{(k)}+\Psi \upgamma -{\mathrm{u}}^{(k)}\right\Vert}^2 $$

$$ {u}^{\left(k+1\right)}=\mathit{\arg}\ \underset{u}{\min}\kern0.50em \uplambda {\left\Vert u\right\Vert}_1+\frac{\upmu}{2}{\left\Vert {\upzeta}^{(k)}+\Psi {\upgamma}^{\left(k+1\right)}-\mathrm{u}\right\Vert}^2 $$

$$ {\upzeta}^{\left(k+1\right)}={\upzeta}^{(k)}+\Psi {\upgamma}^{\left(k+1\right)}-{u}^{\left(k+1\right)}, $$

Now, each step of ADMM has a closed-form solution. Algorithm 1 summarises the resulting ADMM iteration, where shrink_{1} denotes the soft-thresholding:

$$ {shrink}_1\left(\mathrm{x},\uptau \right)=\operatorname{sgn}\left(\mathrm{x}\right)\ \max \left\{0,|\mathrm{x}|-\uptau\ \right\}, $$

(11)

for the threshold value τ > 0.
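The ADMM iteration above can be sketched in NumPy for the special case Ψ = I, where every step has a simple closed form. The matrix sizes, the penalty μ, the value of λ, and the dense linear solve in the first step are illustrative assumptions, not tuned settings.

```python
import numpy as np

def admm_l1(A, y, lam, mu=1.0, iters=300):
    """Minimal ADMM sketch for min_x 0.5||y - Ax||^2 + lam*||x||_1,
    i.e. problem (8) with Psi = I (illustrative, untuned parameters)."""
    m, n = A.shape
    u = np.zeros(n); z = np.zeros(n)         # z plays the role of zeta
    G = A.T @ A + mu * np.eye(n)             # normal equations for the x-update
    Aty = A.T @ y
    shrink = lambda v, t: np.sign(v) * np.maximum(0.0, np.abs(v) - t)  # (11)
    for _ in range(iters):
        x = np.linalg.solve(G, Aty + mu * (u - z))  # quadratic subproblem
        u = shrink(z + x, lam / mu)                 # soft-thresholding step
        z = z + x - u                               # dual update
    return x

rng = np.random.default_rng(3)
m, n = 60, 128
A = rng.standard_normal((m, n)) / np.sqrt(m)
x0 = np.zeros(n); x0[[7, 50, 100]] = [2.0, -1.5, 1.0]
x_hat = admm_l1(A, A @ x0, lam=0.01)
print(np.linalg.norm(x_hat - x0))  # small reconstruction error
```

For a general analysis transform Ψ, the x-update becomes a linear solve involving ΨᵀΨ and the u-update applies the same shrinkage to Ψx, exactly as in the iteration above.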

In addition to the analysis prior in (8), the total variation (TV) penalty has also been extensively used for imaging applications, because the finite difference operator can sparsify smooth images. Specifically, the TV minimisation problem assumes the following form:

$$ \underset{x}{\min}\frac{1}{2}{\left\Vert y- Ax\right\Vert}^2+\uplambda \mathrm{TV}\left(\mathrm{x}\right), $$

(12)

where TV(x) is given by

$$ \mathrm{TV}\left(\mathrm{x}\right)={\left\Vert \nabla x\right\Vert}_{1,p}, $$

(13)

where ‖∙‖_{1, p}, *p* = 1, 2, denotes the l_{1}/l_{p}-mixed norm. In d-dimensional space (e.g. d = 2 for images), the discretized implementation can be defined as

$$ {\left\Vert \nabla x\right\Vert}_{1,p}={\sum}_{i=1}^n{\left\Vert \nabla x(i)\right\Vert}_p $$

where ∇x(i) ∈ R^{d} denotes the gradient of x at the i-th coordinate and n denotes the number of discretized samples.
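The mixed norm above is straightforward to compute with forward differences. The following sketch (the image sizes and the test phantom are illustrative assumptions) covers both the anisotropic (p = 1) and isotropic (p = 2) variants for d = 2.

```python
import numpy as np

def tv_mixed_norm(x, p=2):
    """l1/lp mixed norm of the discrete gradient of a 2-D image, as in (13):
    sum over pixels of the lp norm of the forward-difference gradient."""
    dx = np.diff(x, axis=0, append=x[-1:, :])  # vertical differences
    dy = np.diff(x, axis=1, append=x[:, -1:])  # horizontal differences
    g = np.stack([dx, dy], axis=-1)            # gradient vector in R^2 per pixel
    return np.linalg.norm(g, ord=p, axis=-1).sum()

# A piecewise-constant image has much smaller TV than a noisy version of it.
img = np.zeros((32, 32)); img[8:24, 8:24] = 1.0
noisy = img + 0.1 * np.random.default_rng(4).standard_normal(img.shape)
print(tv_mixed_norm(img, p=2), tv_mixed_norm(noisy, p=2))
```

This gap between the TV of clean and noisy images is precisely why minimising (12) suppresses noise while preserving edges.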

In order to apply ADMM to (12), we focus on the primal formulation of the total variation penalty:

$$ TV(x)={\left\Vert \nabla x\right\Vert}_{1,p}={\sum}_{i=1}^n{\left\Vert \nabla x(i)\right\Vert}_p $$

(14)

Now, we define a splitting variable u(i) = ∇x(i) ∈ R^{d}. Then, the constrained optimization formulation is given by

$$ \underset{x,{\left\{u(i)\right\}}_{i=1}^n}{\min}\frac{1}{2}{\left\Vert y- Ax\right\Vert}^2+\uplambda {\sum}_{i=1}^n{\left\Vert u(i)\right\Vert}_p $$

(15)

$$ \mathrm{subject}\ \mathrm{to}\kern0.5em u(i)=\nabla x(i),\kern1em i=1,\cdots, n $$

(16)

Then, the associated ADMM is given by

$$ {x}^{\left(k+1\right)}=\mathit{\arg}\ \underset{x}{\min }\ \frac{1}{2}{\left|\left|y- Ax\right|\right|}^2+\frac{\upeta}{2}{\sum}_i{\left\Vert {\upzeta}^{(k)}(i)+\nabla x(i)-{\mathrm{u}}^{(k)}(i)\ \right\Vert}^2 $$

$$ {u}^{\left(k+1\right)}(i)=\mathit{\arg}\ \underset{u(i)}{\min}\kern0.50em \uplambda {\left\Vert u(i)\right\Vert}_p+\frac{\upeta}{2}{\left\Vert {\upzeta}^{(k)}(i)+\nabla {x}^{\left(k+1\right)}(i)-\mathrm{u}\left(\mathrm{i}\right)\ \right\Vert}^2 $$

$$ {\upzeta}^{\left(k+1\right)}(i)={\upzeta}^{(k)}(i)+\nabla {x}^{\left(k+1\right)}(i)-{u}^{\left(k+1\right)}\left(\mathrm{i}\right) $$

Each step has a closed-form expression. Algorithm 2 summarises the TV-ADMM algorithm, where shrink_{vec,2}(x) for x ∈ R^{d} denotes the vector shrinkage [34]:

$$ {shrink}_{vec,2}\left(x,\uptau \right)=\frac{x}{\left|x\right|}\max \left\{0,\left|x\right|-\uptau\ \right\},x\in {\mathrm{R}}^d $$

(17)

for the threshold value τ > 0.
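The vector shrinkage (17) acts on each d-dimensional gradient vector at once: it shrinks the magnitude while preserving the direction, which is what makes the u-update of TV-ADMM closed-form. A minimal sketch (the sample vectors are illustrative):

```python
import numpy as np

def shrink_vec2(x, tau):
    """Vector (group) soft-thresholding (17) applied row-wise:
    shrink the Euclidean norm of each vector by tau, keep its direction."""
    nrm = np.linalg.norm(x, axis=-1, keepdims=True)
    scale = np.maximum(0.0, nrm - tau) / np.maximum(nrm, 1e-12)  # guard 0/0
    return scale * x

g = np.array([[3.0, 4.0],    # norm 5 -> shrunk to norm 4
              [0.1, 0.0]])   # norm 0.1 < tau -> set to zero
print(shrink_vec2(g, 1.0))   # [[2.4, 3.2], [0.0, 0.0]]
```

Vectors whose norm falls below τ are zeroed entirely, which is exactly how TV-ADMM produces piecewise-constant (sparse-gradient) reconstructions.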

### Basic MR ingredients for compressed sensing

In order to apply compressed sensing theory to a specific imaging application, the unknown signal should be sparse in some transform domain, and the sensing matrix should be sufficiently incoherent. This section shows why these conditions can be readily satisfied in MR imaging, which is one of the main reasons for the successful application of CS theory to MR imaging.

#### Sparsity of MR images

An MR image is rarely sparse on its own. However, one of the important observations that led to the successful application of CS to MR is that sparsity is closely related to signal redundancy: redundant signals can be easily converted to sparse signals using appropriate transforms.

Basically, three major directions have been investigated in compressed sensing MRI: 1) spatial domain redundancy, 2) temporal domain redundancy, and 3) coil domain redundancy. For example, as shown in Fig. 1(a), natural images can be sparsely represented in the finite difference or wavelet transform domain, although the image is not sparse on its own. This observation is the main idea behind total variation (TV) and wavelet transform approaches for image denoising and reconstruction. Accordingly, TV and wavelets have been the main transforms used in most CS MRI research. On the other hand, dynamic MR images such as cardiac cine, functional MRI, and MR parameter mapping have significant redundancy along the temporal dimension, as shown in Fig. 1(b). For example, if the image is perfectly periodic, then the temporal Fourier transform may be the optimal transform to sparsify the signal. However, in many dynamic MR problems, the temporal variations depend on the MR physics as well as the specific motion of the organs, so an analytic transform such as the Fourier transform may not be optimal; more data-driven approaches such as PCA or dictionary learning are better options. Indeed, these observations lead to the dictionary learning approaches that will be discussed later.
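The spatial-domain redundancy can be seen numerically. The sketch below applies a one-level Haar transform, used here as a simple stand-in for the wavelet transforms employed in CS-MRI (the phantom and sizes are illustrative assumptions), and shows that most coefficients of a piecewise-constant image are near zero.

```python
import numpy as np

def haar1(x):
    """One level of a separable 2-D Haar transform (orthonormal)."""
    a = (x[0::2] + x[1::2]) / np.sqrt(2)   # row averages
    d = (x[0::2] - x[1::2]) / np.sqrt(2)   # row details
    x = np.vstack([a, d])
    a = (x[:, 0::2] + x[:, 1::2]) / np.sqrt(2)
    d = (x[:, 0::2] - x[:, 1::2]) / np.sqrt(2)
    return np.hstack([a, d])

# A piecewise-constant disk phantom: not sparse in the pixel domain,
# but its Haar detail coefficients vanish away from edges.
n = 64
X, Y = np.meshgrid(np.linspace(-1, 1, n), np.linspace(-1, 1, n))
img = (X**2 + Y**2 < 0.5).astype(float)
w = haar1(img)
frac_small = np.mean(np.abs(w) < 1e-3 * np.abs(w).max())
print(frac_small)  # large fraction of (near-)zero coefficients
```

A multi-level transform would compress the remaining approximation band further, which is the regime in which the l1 recovery guarantees of the previous section apply.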

On the other hand, the coil redundancy is somewhat distinct from the spatial- and temporal-domain redundancy. As shown in Fig. 1(c), the redundancy of the coil images stems from the underlying common image, which results in cross-channel redundancies:

$$ {s}^j(r){\gamma}^i(r)={s}^i(r){\gamma}^j(r),\mathrm{i}\ne \mathrm{j},\forall \mathrm{r} $$

(18)

where *s*^{i} denotes the i-th coil sensitivity map and *γ*^{i} is the coil image. The relationship in (18) is easy to show, since the coil image is given by (2). The main idea of classical parallel MRI is to exploit this relationship. Specifically, sensitivity encoding (SENSE) [1] exploits the image domain redundancy described in (18), whereas k-space domain approaches such as GRAPPA [3] exploit its dual relationship in the k-space:

$$ {\widehat{s}}^j(r)\ast {\widehat{\gamma}}^i(r)={\widehat{s}}^i(r)\ast {\widehat{\gamma}}^j(r),\mathrm{i}\ne \mathrm{j},\forall \mathrm{r} $$

(19)

where ∗ denotes convolution, and \( {\widehat{s}}^j \) and \( {\widehat{\gamma}}^j \) denote the Fourier transforms of *s*^{j} and *γ*^{j}, respectively. Later we will describe how these two expressions of the coil dimensional redundancy have been exploited in compressed sensing MRI.
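Relations (18) and (19) can be verified numerically with a synthetic magnetization and two smooth sensitivity maps (all profiles below are illustrative assumptions). The k-space dual follows because the DFT maps pointwise products to circular convolutions.

```python
import numpy as np

n = 32
X, Y = np.meshgrid(np.linspace(-1, 1, n), np.linspace(-1, 1, n))
nu = np.exp(-2 * (X**2 + Y**2))          # bulk magnetization nu(r)
s1 = np.exp(-((X + 0.5)**2 + Y**2))      # coil-1 sensitivity s^1(r)
s2 = np.exp(-((X - 0.5)**2 + Y**2))      # coil-2 sensitivity s^2(r)
g1, g2 = s1 * nu, s2 * nu                # coil images, as in (2)

# (18): pointwise cross-channel identity in the image domain.
print(np.allclose(s2 * g1, s1 * g2))

# (19): its dual in k-space. Circular convolution of the DFTs equals
# (up to a constant factor) the DFT of the pointwise product.
circ = lambda A, B: np.fft.fft2(np.fft.ifft2(A) * np.fft.ifft2(B))
S1, S2, G1, G2 = (np.fft.fft2(a) for a in (s1, s2, g1, g2))
print(np.allclose(circ(S2, G1), circ(S1, G2)))
```

Both checks hold to machine precision, which is the annihilation-type constraint that GRAPPA-style k-space methods exploit.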

#### Incoherent sampling pattern

In compressed sensing MRI, the downsampling pattern is very important for imposing incoherence, but its realization is limited by MR physics. For example, in 2-D acquisition, the readout direction should be fully sampled, so there is freedom only along the phase encoding direction in designing incoherent sampling patterns. Figure 2(a)-(c) shows examples of realizable sampling patterns that have been exploited in the literature: a) Cartesian undersampling, b) radial trajectory, and c) spiral trajectory. In particular, the radial and spiral trajectories, which had been studied even before the advent of compressed sensing [35], produce more incoherent sampling patterns than Cartesian undersampling.

On the other hand, if we deal with 3-D imaging or dynamic imaging, there is more room for sampling pattern design, since there are two dimensions of freedom. Figure 2(d) shows an example of a 2-D random sampling pattern. For the case of a wavelet transform-based sparsity-imposing prior, Lustig et al. [14] showed that the incoherence can be optimized by generalizing the notion of the point spread function (PSF) to the Transform Point Spread Function (TPSF). Specifically, the TPSF measures how a single transform coefficient of the underlying object ends up influencing other transform coefficients of the measured undersampled object. To calculate the TPSF, a single point in the wavelet transform space at the i-th location is transformed to the image space and then to the Fourier space. Once the corresponding Fourier space data are subsampled by the given downsampling pattern, the influence in the wavelet transform domain can be calculated by taking the inverse Fourier transform followed by the wavelet transform. To have the best incoherence property, the side lobes of the TPSF should be as small as possible, so that their contribution can be removed by a shrinkage operation. This was used as the main criterion for sampling pattern design.
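The TPSF computation described above can be sketched in 1-D: a single wavelet-domain impulse is pushed through the inverse wavelet transform, undersampled Fourier sampling, and the forward wavelet transform. The one-level Haar transform and the random 50% mask below are illustrative assumptions.

```python
import numpy as np

n = 64
rng = np.random.default_rng(5)

def haar_fwd(x):
    # one-level orthonormal Haar transform: [averages | details]
    return np.concatenate([x[0::2] + x[1::2], x[0::2] - x[1::2]]) / np.sqrt(2)

def haar_inv(w):
    a, d = w[:n // 2], w[n // 2:]
    x = np.empty(n, dtype=w.dtype)
    x[0::2], x[1::2] = (a + d) / np.sqrt(2), (a - d) / np.sqrt(2)
    return x

mask = rng.random(n) < 0.5              # random 50% k-space undersampling

def tpsf(i):
    e = np.zeros(n); e[i] = 1.0         # impulse at the i-th wavelet coefficient
    k = np.fft.fft(haar_inv(e)) * mask  # undersampled Fourier data
    return haar_fwd(np.fft.ifft(k))     # back to the wavelet domain

t = tpsf(10)
side_lobes = np.delete(np.abs(t), 10)
print(abs(t[10]), side_lobes.max())     # main lobe vs. largest side lobe
```

Comparing the main lobe against the largest side lobe for different candidate masks is exactly the design criterion described above: the smaller the side lobes, the more the residual interference behaves like removable noise under shrinkage.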

#### Historical milestones

The earliest application of compressed sensing to MRI was by Lustig and his collaborators [14]. The application of CS to dynamic MRI was pioneered by Jung et al. [18] and Gamper et al. [36], and was later significantly improved by Jung et al. [20]. The positive results of these early works sparked a flurry of new ideas in static and dynamic MRI.

In terms of combining CS with parallel imaging, the first application to static imaging was by Block et al. [37], who combined the total variation penalty with parallel imaging, and by Liang et al. [38] in a more general form using SENSE. In dynamic imaging, the works by Jung et al. [18] and Feng et al. [39] are among the first that combine SENSE-type parallel imaging with k-t domain compressed sensing. The combination of CS and parallel imaging using GRAPPA-type constraints was pioneered by Lustig et al. [40]. Coil compression techniques have also been investigated to reduce the number of coils for CS reconstruction [41]. Feng et al. [42] later combined compressed sensing, parallel imaging, and golden-angle radial sampling for fast and flexible dynamic volumetric MRI, which has been approved for clinical use.

Aside from the standard l_{1} and TV penalties, several innovative sparsity-inducing penalties have been used for compressed sensing MRI. For example, Trzasko et al. [43, 44] proposed direct l_{0} minimization approaches, whereas Knoll et al. [45, 46] proposed generalized total variation approaches, and Sung et al. [47] combined the approximate message passing algorithm with parallel imaging. These ideas were then extended to dictionary learning [20, 48, 49] and motion compensation [20, 50]. Then, the idea of low-rank regularization was soon introduced [49, 51,52,53], thanks to the theoretical advances by Candes et al. [54]. The low-rank idea was further extended to structured low-rank approaches for parallel imaging [55] and single-coil imaging with finite support [56]. The duality between compressed sensing and low-rank Hankel matrix approaches was discovered by Jin and his colleagues [57,58,59,60] and by Ongie et al. [61]. In particular, a unified framework for compressed sensing and parallel MRI in terms of low-rank Hankel matrix approaches was presented by Jin et al. [57], and its theoretical performance guarantees were given in [62].

In the following, we provide more detailed reviews of these historical milestones in compressed sensing MRI.