\documentclass[letterpaper,12pt]{article}
\usepackage{amssymb}
\usepackage{graphicx}
\usepackage{amsmath,rotating,array,multirow,colortbl,setspace,color}
\usepackage{lineno}
\usepackage{algorithm}
\title{Parametrization of Deformation Fields in Diffeomorphic Image Registration}
\author{Gang Song}
\date{}
\begin{document}
\maketitle
\begin{abstract}
\end{abstract}
\section{Introduction}
\label{sec:intro}
A diffeomorphism is a differentiable map with a differentiable
inverse. These maps form a Lie group on which a (right invariant)
metric may be induced. Both the metric structure and intrinsic
regularity of the diffeomorphism group are of great value in fields
that seek to (statistically) quantify shape and its natural variation.
One such field is medical imaging, where numerous researchers employ
diffeomorphisms in the description and detection of disease and
development.
This introduction motivates the purpose of image registration, the distinction between small and large deformations, the structure of the registration energy (an image similarity term and a regularization term, and how they interact), and how the resulting objective is optimized.
% In this paper we discuss these papers:
% Beg's LDDMM demonstrates the successfulness of diffeomorphic image registration \cite{Beg2005Computing}.
% Vercauteren's Differomorphic Demons \cite{Vercauteren2007,Vercauteren2009} build upon Arsigny' Log-Euclidean framework \cite{Arsigny2006} (\textit{NEED TO CONFIRM}). Later, this approach is formulated more rigoriously \cite{Vercauteren2008Symmetric}, which works completely in the log-domain.
% A similar work was introduced by Ashburner \cite{Ashburner2007}, especially he introduced a fast numerical method to compute the exponential mapping.
% A recent improvement by Mansi \cite{Mansi2010} extends the deformation with incompressibility constraints, i.e. the divergence free. Also, interesting work \cite{Chen2010} from Chen independently proposed another more flexible way to control divergence as well as the curl of the deformation field.
\subsection{Structure of the Paper}
First, this report will illustrate several popular algorithms for
diffeomorphic image mapping. We describe the different variants by
contrasting the parameterization of the diffeomorphic transformation
model. We especially fill in the mathematical details, assuming a
common engineering background, to aid in a complete understanding of
the derivation of the most important equations.
Second, we compare the relationships between these algorithms and
discuss how they can be converted into each other by simplifying
assumptions or changing the optimization scheme. We show that several
large deformation algorithms may be viewed as
extensions of a simple common image registration method, the demons algorithm. We summarize the connections between the models in a graph; each connection is illustrated in the corresponding section.
Last, we discuss various regularization approaches applied in these
methods and show how they can be unified under the same framework
using the Sobolev gradient. The regularization schemes reflect how the
gradient of the image similarity term is mapped to the transformation space.
% (Thirdly, we compare how each algorithm compute the inverse transform.)
% \section{mathematical notation}
\newcommand{\vect}[1]{\mathbf{#1}}
\newcommand{\field}[1]{\mathbf{#1}}
\newcommand{\image}[1]{#1}
\newcommand{\I}{\image{I}}
\newcommand{\J}{\image{J}}
\renewcommand{\u}{\vect{u}}
\renewcommand{\v}{\vect{v}}
\renewcommand{\c}{\vect{c}}
\newcommand{\h}{\vect{h}}
\newcommand{\w}{\vect{w}}
\newcommand{\myphi}{\phi}
\newcommand{\mypsi}{\psi}
\newcommand{\D}{D}
\renewcommand{\d}{\nabla}
\newcommand{\dd}{\text{d}}
\newcommand{\p}{\partial}
\renewcommand{\L}{\Delta} % laplacian
\newcommand{\R}{\mathbb{R}}
\newcommand{\myS}{S}
\newcommand{\myR}{R}
\newcommand{\myE}{E}
\newcommand{\ld}{\langle}
\newcommand{\rd}{\rangle}
\newcommand{\LL}{\mathcal{L}} % operator L
\newcommand{\tQ}{\mathcal{Q}}
\newcommand{\Id}{\text{Id}}
\newcommand{\tG}{{G}} % operator L
\newcommand{\Diff}{\text{Diff}}
\newcommand{\VV}{\mathcal{V}}
\newcommand{\opL}{\mathcal{L}}
\section{From Small Deformation to Large Deformation in Image Registration}
The goal of image registration is to find a viable transform from one image to the other. Such a transform defines pixel-wise correspondences between two images.
To simplify notation, we assume that any image $\I$ discussed here is a scalar function defined on the same spatial domain: $\I$ : $\Omega \subseteq \R^d \to \R$.
For example, a 3D image has $d=3$. A transform $\myphi$ is a mapping between two spatial domains: $\Omega \rightarrow \Omega$.
If a transform $\myphi$ is applied to an image $\I$, a new warped
image is generated as the composition of two functions $\I \circ
\myphi$, which is a new function $\Omega \to \R$.
Given a template image $\I_0$ (i.e. \textit{the fixed image}) and an
individual image to be warped $\I_1$ (i.e. \textit{the moving image}),
the image registration problem optimizes a transform $\myphi$ such
that the warped moving image $\I_1 \circ \myphi$ is close to the fixed image $\I_0$ and $\myphi$ is smooth.
This is formulated as minimization of an energy function
$\myE(\myphi)$, which usually comprises two parts:
\begin{align}
\label{eq:energy}
\myE(\myphi) = \lambda_S \myS(\I_0,\I_1,\myphi) + \lambda_R \myR(\myphi),
\end{align}
in which $\lambda_S$ and $\lambda_R$ are two scalar weights. The first term on the right is an image similarity function $\myS(\I_0, \I_1, \myphi)$, evaluating the similarity between the two images under the transform $\myphi$.
How to choose an appropriate similarity function is a fundamental problem in image registration, but it is beyond the scope of this report.
Here we adopt the sum of squared differences (SSD) of two images as the
similarity function throughout this paper, for both its simplicity and
familiarity. One can write SSD as:
\begin{align}
\myS(\I_0, \I_1, \myphi) = \| \I_1 \circ \myphi - \I_0 \|^2.
\label{eq:SSD}
\end{align}
The second part is a transform regularization function $\myR(\myphi)$,
which is usually defined solely on the transform. This term is a prior
on the transform smoothness. For some specific parameterized transformation models, for example the affine or B-spline transformations, smoothness is implicitly encoded in the parameterization itself.
However, for general non-rigid registration when no other information is available, the transform $\myphi$ is defined using a deformation vector field $\u$: $\Omega \to \R^d$,
\begin{align}
\myphi(x) = x + \u(x), \; x \in \Omega.
\end{align}
Such a parameterization simply adds a vector field to the identity transform.
Regularization functions are normally defined on $\u$ instead of directly on $\myphi$.
Later we may simply denote a transform $\myphi$ by $\u$ for simplicity.
$\u(x)$ is the zero vector field when $\myphi(x)=x$.
This form is strongly connected to Hooke's law about the equilibrium position of elastic material (\textit{cite Gee's paper}).
\subsection{A First Example: Horn-Schunck Optical Flow}
\label{sec:opticalflow}
There are many choices of regularization terms. One simple model is to
constrain first order variation of the displacement field $\u$. This
is denoted as the membrane energy model \cite{Ashburner2007} and
provides one form of Tikhonov regularization \cite{Horn1981}:
\begin{align}
\label{eq:membrane}
\R(\u) = \frac{1}{2} \int_\Omega \sum_{i=1}^d\sum_{j=1}^d (\frac{\p \u_i(x)}{\p x_j})^2 \dd x
= \frac{1}{2} \int_\Omega \sum_{i=1}^d \| \d \u_i(x) \|^2 \dd x = \frac{1}{2} \int_\Omega \| \d\u(x)\|^2 \dd x.
\end{align}
Solving the Euler-Lagrange equation of this functional gives the negative Laplacian as its derivative:
\begin{align}
\label{eq:membraneEL}
\frac{\p R(\u)}{\p \u_i} = - \sum_{j=1}^d \frac{\p^2 \u_i(x)}{\p x_j^2}
= - \L \u_i(x)
, \; i=1 \cdots d \;.
\end{align}
When $\u$ is close to zero, the SSD function of Eq.\ref{eq:SSD} can be approximated using the first order Taylor expansion:
\begin{align}
\I_1 \circ (x + \u ) - \I_0 \approx (\I_1 - \I_0 ) + \ld \u, \d \I_1(x) \rd \;.
\label{eq:approxopticalflow}
\end{align}
Combining it with the Tikhonov regularization leads to the overall energy function of Horn-Schunck optical flow method:
\begin{align}
\myE(\u) = \lambda_S \int_\Omega ((\I_1 - \I_0 ) + \ld \u, \d \I_1 \rd)^2 \dd x
+ \lambda_R \int_\Omega \sum_{i=1}^d \| \d \u_i(x) \|^2 \dd x
\end{align}
The Euler-Lagrange equations of $\myE(\u)$ at each pixel $x$ give the linear system characterizing the optimal point:
\begin{align}
\lambda_S (\I_1 - \I_0 + \ld \u, \d \I_1(x) \rd) \frac{\p \I_1(x)}{\p x_i}
- \lambda_R \L \u_i(x) = 0
, \; i=1 \cdots d
\label{eq:opticalflow}
\end{align}
An iterative optimization scheme was given in \cite{Horn1981}, which is essentially the Jacobi method for solving the large sparse linear system. Such a method only works for small $\u$. However, we are going to reformulate it and explore some important ideas adopted in large deformation methods.
Noticing that the first term on the left is the derivative of the image similarity function and the second term is the derivative of the regularization function, we arrive at a general form of the functional derivative:
\begin{align}
\d_\u \myE(\u) = \d_\u \myS(\u) + \d_\u \myR(\u).
\end{align}
Various optimization schemes can be derived from this representation. For example, steepest gradient descent is the iterative scheme $\u \leftarrow \u - \d \myE(\u)$. By introducing a second order approximation of $\myE(\u)$ one can apply the Gauss-Newton method, the Levenberg-Marquardt algorithm, or Newton's method when the second order derivative of $\myE$ is available. In this paper we leave out the discussion of different optimization strategies and focus on the derivation of the gradient.
The optimal point can also be computed by iteratively solving the linear system $\d_\u \myE = 0$.
Specifically, if we define an operator $\tQ$ on $\u$ by $\tQ(\u) = \d_\u \myR(\u)$, we have $\d_\u \myS(\u) + \tQ(\u) = 0$, and $\u$ is solved as:
\begin{align}
\u = \tQ^{-1}(-\d_\u \myS(\u))
\label{eq:projection}
\end{align}
As the regularization $\myR$ is solely defined on $\u$, $\tQ^{-1}$ is
independent of the choice of $\myS$. This formulation can now be viewed as
projecting $-\d_\u \myS(\u)$ onto a new vector space. Each iteration
consists of two steps. The first gives an initial estimate of the displacement from the image observation. The second regularizes the estimate by projecting it onto a smooth vector space. Such a two-step approach is adopted in practice in the optimization of many image registration algorithms, and we will elaborate on this later.
For Horn-Schunck optical flow, $\tQ^{-1}$ is indeed the inverse of a
Laplacian-type operator and thus filters out high-frequency components
of the deformation field $\u$ in the Fourier domain. Later we will
also analyze other transformation models through the perspective of
projecting the similarity function gradient onto the vector space.
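To make the two-step interpretation concrete, the following is a minimal 2D NumPy sketch of one Jacobi-style iteration solving Eq. \ref{eq:opticalflow}; all names are ours and \texttt{lam} plays the role of $\lambda_R/\lambda_S$, so this is only a sketch of the scheme, not the implementation of \cite{Horn1981}.
\begin{verbatim}
import numpy as np
from scipy.ndimage import convolve

def horn_schunck_step(I0, I1, u1, u2, lam=1.0):
    # One Jacobi-style update for the optical flow linear system above.
    g1, g2 = np.gradient(I1)      # spatial gradient of the moving image
    It = I1 - I0                  # image difference term
    # Neighborhood means: the discrete Laplacian satisfies
    # Lap(u) ~ 4 * (mean(u) - u) on a 4-neighborhood stencil.
    k = np.array([[0., .25, 0.], [.25, 0., .25], [0., .25, 0.]])
    u1b, u2b = convolve(u1, k), convolve(u2, k)
    # Solve the per-pixel linear system in closed form (classic update).
    num = g1 * u1b + g2 * u2b + It
    den = lam + g1**2 + g2**2
    return u1b - g1 * num / den, u2b - g2 * num / den
\end{verbatim}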
Throughout this report, $\D$ denotes the Jacobian matrix and $|\D \myphi|$ its determinant; the chain rule for compositions reads $\D_{x} (g \circ h) = \D_{h(x)} g \, \D h$.
\subsection{Demons algorithm}
\label{sec:demons}
Before introducing large deformation image registration, we briefly
go over another important small deformation registration method:
Thirion's demons algorithm \cite{Thirion98}. The demons algorithm is
one of the most efficient methods applied in medical image
registration and has many variants (see the overview in \cite{Vercauteren2007}). It also serves as the baseline algorithm for the two large deformation models discussed in this paper.
In Thirion's original paper \cite{Thirion98}, the demons algorithm considered image registration as a diffusion process. The pixels of images were pushed by demons with local forces inspired by the optical flow equations. However, the diffusion scheme was described more or less as a heuristic procedure without a clear theoretical explanation. Cachier et al \cite{Cachier2003} justified the algorithm as an alternating minimization of an energy function. The idea is to introduce an auxiliary variable $\c$ holding the raw correspondences between pixels, which can be viewed as the observation of the underlying transform $\myphi$:
\begin{align}
\myE(\myphi,\c)= \frac{1}{\sigma_i^2}\| \I_1 \circ \c - \I_0 \|^2
+\frac{1}{\sigma_x^2} \|\c - \myphi\|^2
+\frac{1}{\sigma_T^2} \int_\Omega \| \d\u(x)\|^2 \dd x,
\end{align}
where $\sigma_i$ accounts for the noise on the image intensity, $\sigma_x$ accounts for a spatial uncertainty on the correspondences and $\sigma_T$ controls the amount of regularization. Now the demons algorithm is explained as an alternating optimization over $\c$ and $\myphi$. The first step solves for $\c$ by optimizing $\myS(\c) = \frac{1}{\sigma_i^2}\| \I_1 \circ \c - \I_0 \|^2 +\frac{1}{\sigma_x^2} \|\c - \myphi\|^2 $ given the current $\myphi$.
This is similar to the optical flow equation with the constraint that $\c$ should be close to $\myphi$.
The second step solves for $\myphi$ by optimizing $\myR(\myphi) = \frac{1}{\sigma_x^2} \|\c - \myphi\|^2+\frac{1}{\sigma_T^2} \int_\Omega \| \d\u(x)\|^2 \dd x$ with $\c$ given.
This is exactly the projection step we discussed in the last
section. $\c$ and $\myphi$ are typically initialized as the identity transform.
Since $\c$ and $\myphi$ have to be close, let $\c = \myphi + \w$. The minimization of $\myS(\c)$ uses the same first order expansion of $\I_1 \circ \c$ as in the optical flow algorithm:
\begin{align}
\myS(\c) = \myS(\myphi+\w) \approx
\frac{1}{\sigma_i^2} ( \I_1 \circ \myphi - \I_0 + \ld \w, \d_{\myphi} \I_1 \rd )^2
+ \frac{1}{\sigma_x^2} \| \w \|^2,
\end{align}
where $\d_{\myphi}\I_1$ denotes $(\d\I_1)(\myphi)$. The optimal $\w$ is given by
\begin{align}
\left( \d_{\myphi}\I_1 \, (\d_{\myphi}\I_1)^T
+ \frac{\sigma_i^2}{\sigma_x^2} \Id
\right) \w
+ (\I_1 \circ \myphi - \I_0 )\d_{\myphi}\I_1 = 0
\end{align}
Using the Sherman-Morrison formula, the solution yields one choice of demons displacement $\w$:
\begin{align}
\w = -\frac{\I_1 \circ \myphi - \I_0 }
{\|\d_{\myphi}\I_1\|^2 + {\sigma_i^2}/{\sigma_x^2}}
\d_{\myphi}\I_1
\label{eq:adddemonforce}
\end{align}
Optimizing $\myphi$ (or $\u = \myphi - \Id$) given the auxiliary variable $\c$ is a quadratic problem:
\begin{align}
\myR(\myphi) = \frac{1}{\sigma_x^2} \|\c - \myphi\|^2+\frac{1}{\sigma_T^2} \int_\Omega \| \d\u(x)\|^2 \dd x,
\end{align}
which has a closed form solution. The Euler-Lagrange equation yields the optimal $\u$ given $\c$:
\begin{align}
\u = \left( \Id - \frac{\sigma_x^2}{\sigma_T^2} \L \right)^{-1} \left( \c - x \right),
\end{align}
where $\Id$ is the identity operator. Comparing this with
Eq.\ref{eq:projection}, now it is clear that the regularization step
is also projecting the auxiliary variable $\c-x$ using a smoothing
operator $\tQ^{-1} = \left( \Id - {\sigma_x^2}/{\sigma_T^2} \L \right)^{-1}$.
Thirion's demons algorithm applies Gaussian smoothing for efficient
computation: $\tQ^{-1} \u = \tG \ast \u$, where $\tG$ is a Gaussian
kernel and $\ast$ represents image convolution. For now let us simply
assume that $\left( \Id - {\sigma_x^2}/{\sigma_T^2} \L \right)^{-1} =
\tG$. We will come back to the exact regularization terms corresponding to Gaussian smoothing later.
Now we summarize the demons algorithm as:
\begin{algorithm}
\caption{Additive Demons Algorithm}
\label{alg:additivedemons}
\begin{enumerate}
\item{Given current transformation $\myphi$, compute the correspondence update field $\w$ using Eq.\ref{eq:adddemonforce}.
}
\item{Additive update: $\c \leftarrow \myphi + \w$.
}
\item{Regularization: $\myphi \leftarrow \Id + \tG \ast (\c - \Id)$. Go back to 1.
}
\end{enumerate}
\end{algorithm}
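As an illustration, here is a minimal 2D NumPy sketch of Alg. \ref{alg:additivedemons}. All names are ours; \texttt{eps} stands for the ratio $\sigma_i^2/\sigma_x^2$ in Eq. \ref{eq:adddemonforce}, and $\myphi$ is stored as a displacement field.
\begin{verbatim}
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates

def warp(img, coords):
    # Sample img at the coordinates coords (shape (2, H, W)).
    return map_coordinates(img, coords, order=1, mode='nearest')

def additive_demons(I0, I1, n_iter=50, sigma=2.0, eps=1.0):
    grid = np.mgrid[0:I0.shape[0], 0:I0.shape[1]].astype(float)
    u = np.zeros_like(grid)                 # phi = Id + u
    gI1 = np.array(np.gradient(I1))         # gradient image of I1
    for _ in range(n_iter):
        diff = warp(I1, grid + u) - I0      # I1 o phi - I0
        g = np.array([warp(c, grid + u) for c in gI1])  # warped gradient
        w = -diff * g / (np.sum(g**2, axis=0) + eps)    # step 1: force
        c = u + w                           # step 2: additive update
        u = np.array([gaussian_filter(ci, sigma) for ci in c])  # step 3
    return grid + u                         # final transform phi
\end{verbatim}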
\subsection{Compositive Demons and Inverse Transform}
\label{sec:compositivedemons}
The additive update performs a gradient descent on $\w$ in the vector
space $\R^d$. Another way to combine $\myphi$ and $\w$ is through composition:
\begin{align}
\c = \myphi \circ \left( \Id + \w \right)
\label{eq:demoncomp}
\end{align}
Its first order approximation as $\w \to 0$ is similar to the additive one: $\c \approx \myphi + \D\myphi \cdot \w$. The corresponding similarity term is:
\begin{align}
\myS(\c) = \myS(\myphi \circ (\Id + \w) ) \approx
\frac{1}{\sigma_i^2} ( \I_1 \circ \myphi - \I_0 + \ld \w, \d (\I_1 \circ \myphi ) \rd )^2
+ \frac{1}{\sigma_x^2} \| \w \|^2,
\end{align}
and the demons force $\w$ is:
\begin{align}
\w = -\frac{\I_1 \circ \myphi - \I_0 }
{\|\d (\I_1 \circ \myphi )\|^2 + {\sigma_i^2}/{\sigma_x^2}}
\d (\I_1 \circ \myphi )
\label{eq:compdemonforce}
\end{align}
We list the compositive demons algorithm in Alg. \ref{alg:compdemons}.
If we set $\sigma_i$ as a pixel-wise function, $\sigma_i(x) = \I_1 \circ \myphi - \I_0 $, Eq. \ref{eq:compdemonforce} is close to the form of the original demons force in \cite{Thirion98}. There are other variants of demons forces obtained by choosing different forms of $\myS$ and various second-order optimization schemes.
\begin{algorithm}
\caption{Compositive Demons Algorithm}
\label{alg:compdemons}
\begin{enumerate}
\item{Given current transformation $\myphi$, compute the correspondence update field $\w$ using Eq.\ref{eq:compdemonforce}.
}
\item{Compositive update: $\c \leftarrow \myphi \circ \left( \Id + \w \right)$.
}
\item{Regularization: $\myphi \leftarrow \Id + \tG \ast (\c - \Id)$. Go back to 1.
}
\end{enumerate}
\end{algorithm}
The compositive demons requires computing the transform composition $\myphi \circ \left( \Id + \w \right)$ instead of simply adding $\myphi + \w$. It also computes the gradient of the warped image $\I_1$ as $\d (\I_1 \circ \myphi)$ (Eq. \ref{eq:compdemonforce}). In contrast, the additive update needs to warp the gradient image, $\d_{\myphi}\I_1$ (Eq. \ref{eq:adddemonforce}), which may incur higher numerical error since the gradient image is usually a sparse image.
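In displacement form the composition can be sketched as follows: for $\myphi = \Id + \u_\myphi$ we have $(\myphi \circ (\Id + \w))(x) = x + \w(x) + \u_\myphi(x + \w(x))$, so composing amounts to interpolating $\u_\myphi$ at the displaced positions. A minimal 2D NumPy sketch (names ours):
\begin{verbatim}
import numpy as np
from scipy.ndimage import map_coordinates

def compose(u_phi, w, grid):
    # Displacement field of phi o (Id + w), with phi = Id + u_phi;
    # grid holds the identity coordinates (shape (2, H, W)).
    coords = grid + w
    u_at = np.array([map_coordinates(c, coords, order=1, mode='nearest')
                     for c in u_phi])   # u_phi evaluated at x + w(x)
    return w + u_at
\end{verbatim}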
A more important difference between the compositive and the additive update schemes is related to the invertibility of the transforms.
When $\w$ is small, $\Id+\w$ is approximately invertible
with $(\Id+\w)^{-1} \approx \Id-\w$. But this is generally false for a large deformation $\w$.
In the additive update scheme $\c \leftarrow \myphi + \w$, $\c$ usually becomes non-invertible after several iterations: addition in the vector space does not preserve invertibility.
On the other hand, the composition of two invertible transforms is still invertible.
This implies that the compositive scheme is able to yield an invertible transform when $\w$ is small in each iteration, which was validated in \cite{Vercauteren2009}.
This means that a large and invertible transform can be generated from the composition of many small deformation fields. In the following sections we formalize this intuition and
reveal it as the background for many large deformation transformation models.
\subsection{Diffeomorphism Group}
\label{sec:diffgroup}
The small deformation algorithms we discussed before parametrize the
transform as $\myphi(x) = x + \u(x)$. Unless $\u(x)$ is close to
zero, such a form is not necessarily invertible and in general cannot
preserve topology. In many applications, the nonrigid transform
between two images is very large, where `large' is defined in a
historical context, that is, relative to transformations typically
captured by elastic or demons type algorithms. Traditionally, these
small deformation algorithms could not both explicitly enforce a
one-to-one mapping and provide a high-quality registration
solution. It is therefore necessary to exploit the large deformation
framework, which can preserve topology while capturing a wide range of transformations.
Large deformation is conveniently represented as a diffeomorphism in
mathematics. A diffeomorphism $\myphi$ is a globally one-to-one
continuous and smooth mapping with a continuous and smooth inverse. In other words, $\myphi^{-1}$ exists and both $\myphi$ and $\myphi^{-1}$ are differentiable. Diffeomorphisms form a group $\Diff$ under the composition operation, i.e. $\myphi_1 \circ \myphi_2 \in \Diff$ when $\myphi_1, \myphi_2 \in \Diff$:
\begin{align}
\Diff = \{ \myphi: \Omega \to \Omega | \myphi \text{ and } \myphi^{-1} \text{ are differentiable }\}.
\end{align}
Suppose there is a series of small deformations $(\mypsi_k \in \Diff)_{0 \leq k \leq n}$; we recursively define $\myphi_0 = \Id$ and $\myphi_{k+1} = \myphi_{k} \circ \mypsi_{k}$. $(\myphi_{k})$ forms a polygonal line in $\Diff$. Note that the solution of compositive demons falls into this representation. In a continuous setting this polygonal line becomes a curve $\myphi(x, t), \, 0 \leq t \leq 1$. Here $t$ is a time variable and $\myphi(x, t)$ is the position at time $t$ of a particle whose position was $x$ at $t=0$.
When $\myphi(x, t)$ is differentiable in $t$, we have the o.d.e that generates a diffeomorphism:
\begin{align}
\frac{\dd}{\dd t} \myphi(x, t) = \v(\myphi(x, t), t),
\label{eq:ode}
\end{align}
where $\v$ satisfies continuity conditions in order to guarantee the
existence of the solution. But it is important to point out that
$\v(x,t)$ does not need to be invertible w.r.t $x$.
By introducing the time variable $t$, we can define a curve of transformations between the two images $\I_0$ and $\I_1$. Define $\myphi_t$ as the transform at time $t$, and similarly $\v_t$:
\begin{align}
& \myphi_t(x) = \myphi(x, t) \\
& \v_t(x) = \v(x, t)
\end{align}
At time $t=0$, $\myphi_0 = \Id$; at time $t=1$, we get the final
transform $\myphi = \myphi_1$. Since $\myphi_t$ has an inverse transform, we can also define the transform between two time points $\myphi_{s,t}$ as:
\begin{align}
\myphi_{s,t} = \myphi_t \circ (\myphi_s)^{-1}.
\end{align}
We immediately have the following properties:
\begin{align}
& \myphi_{t, s} = \myphi_{s, t}^{-1} \\
& \myphi_{s, t} = \myphi_{r, t} \circ \myphi_{s, r} \\
& \myphi_1 = \myphi_{0, 1} \\
& \myphi_1^{-1} = \myphi_{1, 0}
\label{eq:invphi}
\end{align}
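For instance, the second property follows directly from the definition of $\myphi_{s,t}$:
\begin{align*}
\myphi_{r,t} \circ \myphi_{s,r}
= \left( \myphi_t \circ \myphi_r^{-1} \right) \circ \left( \myphi_r \circ \myphi_s^{-1} \right)
= \myphi_t \circ \myphi_s^{-1}
= \myphi_{s,t} \;.
\end{align*}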
The o.d.e Eq. \ref{eq:ode} gives an important way to parameterize
$\myphi$ with the time-variant velocity fields $\v$. Instead of only
modeling the target transform $\myphi_{0,1}$, such a parameterization
encodes the whole temporal path that determines how one image $I_1$ is
deformed into the other image $I_0$. Since $\myphi_t$ is a
diffeomorphism, this path is an invertible one-to-one mapping. An
invertible transform $\myphi_1$ can be constructed from time-variant
velocity vector fields $\v_t$, which themselves do not need to be
invertible. Such a parameterization of $\myphi$ avoids handling the existence of $\myphi^{-1}$ directly, which is a desirable property. In the following sections we discuss in detail how different approaches compute $\v$ in the setting of image registration. We show the relationship between different diffeomorphism approaches and small deformation algorithms by deriving them from each other.
\subsection{Connection Between Small Deformation and Diffeomorphism}
We already saw that compositive demons is close to large deformation image registration. It is helpful to view Eq. \ref{eq:ode} through a discretization of $t$. Suppose we discretize $\myphi_t, \, 0\leq t \leq 1$ into a series $(\myphi_i)_{0 \leq i \leq N}$ with $t_i = i \, \delta t$ and $\delta t = 1/N$:
\begin{equation}
\begin{split}
& \frac{\dd}{\dd t} \myphi_t = \v_t(\myphi_t) \; , \\
& \frac{\myphi_{i+1}-\myphi_{i}}{\delta t} \approx \v_{t_i}(\myphi_i) \; , \\
& \myphi_{i+1} = \myphi_{i} + \delta t \, \v_{t_i} \circ \myphi_i \; , \\
& \myphi_{i+1} = (\Id + \delta t \, \v_{t_i}) \circ \myphi_i \; .\\
\end{split}
\label{eq:discretizeode}
\end{equation}
Let $\w = \delta t \, \v_{t_i}$ and we have:
\begin{align}
\myphi_{i+1} = (\Id + \w) \circ \myphi_i
\end{align}
Comparing with the composition used in Eq. \ref{eq:demoncomp} we see that
the difference is whether the small deformation update is composed before or after the current transform $\myphi_i$. The advantage of Eq. \ref{eq:demoncomp} is that it is easier to compute the derivative $\d_\w(\myphi \circ (\Id + \w))$; but physically it is more natural to compose the update $\w$ after the current transform $\myphi_i$. We will discuss the derivation from Beg et al \cite{Beg2005Computing} in Sec. \ref{sec:lddmm}.
Another important fact about this discretization is that although $\Id + \v$ is not necessarily invertible, $\Id + \delta t \, \v$ is usually invertible for a small scalar $\delta t$, with $(\Id + \delta t \, \v)^{-1} \approx \Id - \delta t \, \v$. Thus $\myphi_N$ is also invertible. We will revisit this in Ashburner's method \cite{Ashburner2007} and Vercauteren's methods \cite{Vercauteren2009, Vercauteren2008Symmetric}.
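As an illustration of Eq. \ref{eq:discretizeode}, the following minimal NumPy sketch (names ours) composes $N$ velocity fields into the displacement field of $\myphi_N$:
\begin{verbatim}
import numpy as np
from scipy.ndimage import map_coordinates

def integrate_velocities(vs, grid):
    # vs: list of N velocity fields (each shape (2, H, W)), dt = 1/N;
    # returns u with phi_N = Id + u, using
    # phi_{i+1} = (Id + dt * v_{t_i}) o phi_i.
    dt = 1.0 / len(vs)
    u = np.zeros_like(vs[0])
    for v in vs:
        phi = grid + u
        v_at = np.array([map_coordinates(c, phi, order=1, mode='nearest')
                         for c in v])   # v_{t_i} evaluated at phi_i(x)
        u = u + dt * v_at
    return u
\end{verbatim}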
Theoretically we only require $\v$ to be continuous to guarantee the
existence of $\myphi$; it is more desirable that it also be smooth,
both for numerical reasons and to provide an additional smoothness
control on $\myphi$. The smoothness constraint we discussed before for the small deformation $\w$ now becomes a smoothness constraint on the velocity field $\v$. The regularization term $R(\u)$, where $\u = \myphi - \Id$, naturally becomes a function of $\v$: $R(\v)$.
\section{Smooth Vector Fields and Sobolev Gradient}
Before we discuss the various parameterizations $\v$ of a
diffeomorphism $\myphi$, it is necessary to discuss the regularization
of the vector fields $\v$. A general prior on the final solution of
the optimal transform is that it has to be smooth. The main aim
of regularization in diffeomorphic image registration is to guarantee
smoothness of $\v$ under noisy imaging conditions, and thus implicitly
of the resulting transform $\myphi$. Velocity field regularity also
improves the stability of the numerical integration scheme for both the
forward and the inverse transform.
The regularization term is an important factor not only in formulating
the registration energy function itself, but also in choosing the
corresponding optimization scheme. A vector field $\v$ is usually
defined in the Euclidean space $L^2$ of square integrable
functions. However, this vector space does not carry any smoothness
constraints. Hilbert spaces of smooth functions were introduced into the image registration domain by many researchers \cite{Trouve1998, Beg2005Computing, Hernandez2008, Zikic2010}. Such a Hilbert space can be viewed as a space of smooth functions. This section summarizes the theoretical analysis of regularization as minimizing the image similarity directly in this Hilbert space. Using the notion of optimization in Hilbert space, we show in later sections that various diffeomorphism optimization schemes are unified in the same framework.
\subsection{Examples of Regularization Terms and Their Optimization Operators}
\label{sec:regularizerexample}
The smoothness of a vector field $\v$ is usually defined using its $k$-th order spatial derivatives. Before the discussion of Hilbert spaces, it is helpful to revisit several common regularization terms $\myR(\v)$ used in the literature \cite{Horn1981,Beg2005Computing,Ashburner2007,Vercauteren2009} and their derivatives $\p\myR/\p\v$, which can all be derived using the Euler-Lagrange equations. Note that this requires the $k$-th order derivatives of $\v$ to vanish at the boundary of the domain $\Omega$.
\subsubsection*{Membrane Model}
In Sec. \ref{sec:opticalflow} we already went over the membrane model:
\begin{align}
\label{eq:membraneA}
& \myR(\v) = \frac{1}{2} \int_\Omega \sum_{i=1}^d \| \d \v_i(x) \|^2 \dd x \;, \\
\label{eq:membraneAEL}
& \frac{\p \myR(\v)}{\p \v_i} = - \L \v_i(x)
, \; i=1 \cdots d \;.
\end{align}
\subsubsection*{Bending Energy Model}
\begin{align}
& \myR(\v) = \frac{1}{2} \int_\Omega \sum_{i,j,k=1}^d
\left(
\frac{\p^2 \v_i(x)}{\p x_j \p x_k}
\right)^2 \dd x
\\
\label{eq:dbending}
& \frac{\p \myR(\v)}{\p \v_i}
=
\sum_{j,k=1}^d
\frac{\p^4 \v_i(x)}{\p x_j^2 \p x_k^2}
=
\L^2 \v_i(x)
, \; i=1 \cdots d \;.
\end{align}
\subsubsection*{Laplacian model}
\begin{align}
& \myR(\v) = \frac{1}{2} \int_\Omega \sum_{i=1}^d (\L \v_i(x))^2 \dd x
= \frac{1}{2} \int_\Omega \sum_{i=1}^d
\left(
\sum_{j=1}^d
\frac{\p^2 \v_i(x)}{\p x_j^2}
\right)^2 \dd x
\\
\label{eq:dlaplacian}
& \frac{\p \myR(\v)}{\p \v_i}
=
\sum_{j,k=1}^d
\frac{\p^4 \v_i(x)}{\p x_j^2 \p x_k^2}
=
\L^2 \v_i(x)
, \; i=1 \cdots d \;.
\end{align}
Note that the Laplacian model and the bending energy model \cite{Ashburner2007}
actually share the same derivative (compare Eq. \ref{eq:dbending} and Eq. \ref{eq:dlaplacian}).
\subsubsection*{Cauchy-Navier Model}
\label{sec:cauchynaviermodel}
This is used in Beg's paper \cite{Beg2005Computing} and it is a
variant of the Laplacian model. With a slight abuse of notation, define the operator $\opL$ as $\opL = \gamma \Id - \alpha \L$:
\begin{align}
& \myR(\v) = \frac{1}{2} \int_\Omega \| \opL \v \| ^2 \dd x
= \frac{1}{2} \int_\Omega \sum_{i=1}^d
\left(
\gamma \v_i(x) - \alpha \sum_{j=1}^d \frac{\p^2 \v_i(x)}{\p x_j^2}
\right)^2 \dd x
\end{align}
\begin{equation}
\label{eq:dBeg}
\begin{split}
\frac{\p \myR(\v)}{\p \v_i}
& =
\gamma^2 \v_i - 2 \gamma \alpha \sum_{j=1}^d{\frac{\p^2 \v_i(x)}{\p x_j^2}}
+ \alpha^2 \sum_{j,k=1}^d{\frac{\p^4 \v_i(x)}{\p x_j^2 \p x_k^2}}
\\
& = (\gamma \Id - \alpha \L)^2 \v_i \\
& = \opL^2 \v_i
, \; i=1 \cdots d \;.
\end{split}
\end{equation}
\subsubsection*{Tikhonov Regularization and Gaussian Smoothing}
\label{sec:gaussian}
In Sec. \ref{sec:demons} we conveniently assumed that the Gaussian
smoother $\tG$ acts as an inverse operator: $\tG = \left( \Id -
{\sigma_x^2}/{\sigma_T^2} \L \right)^{-1}$ for some $\L$. Strictly speaking, $\tG$ relates to the general Tikhonov regularization \cite{Nielsen1997,Mansi2010}\footnote{Note: there is a typo in \cite{Mansi2010}: the subscript of the equation in Sec 2.2 should be $1 \leq i_1, \cdots, i_k \leq d$, not $i_1 + \cdots + i_k=k$}:
\begin{align}
\myR(\v) = \frac{1}{2} \int_\Omega \sum_{i=1}^d
\v_i^2(x) +
\sum_{k=1}^{\infty}
\left(
\sum_{1 \leq j_1, \cdots, j_k \leq d}
\frac{1}{\sigma^{2k} k!} \,
\left(
\frac{\p^k \v_i(x)}{\p x_{j_1} \cdots \p x_{j_k}}
\right)^2
\right)
\dd x
\end{align}
Applying the Euler-Lagrange equation, we have:
\begin{equation}
\begin{split}
\frac{\p \myR}{\p \v_i} & = \v_i
+ \sum_{k=1}^{\infty}
\sum_{1 \leq j_1, \cdots, j_k \leq d}
\frac{(-1)^k}{\sigma^{2k}k!}
\frac{\p^{2k}}{\p x^2_{j_1} \cdots \p x^2_{j_k}} \v_i \\
& =
\v_i
+ \sum_{k=1}^{\infty}
\frac{(-1)^k}{\sigma^{2k}k!}
\left(
\sum_{j=1}^{d}
\frac{\p^{2}}{\p x^2_{j}}
\right)^k \v_i
\\
& =
\left(
\Id + \sum_{k=1}^{\infty}
\frac{(-1)^k}{\sigma^{2k}k!}
\L^k
\right) \v_i \;.
\end{split}
\end{equation}
When the sum over $k$ is truncated at $k=1$, $\p \myR / \p \v_i$ becomes $(\Id - \frac{1}{\sigma^2} \L)$, which was used in Sec. \ref{sec:demons}. Using the notation of the operator $\tQ$ in Eq. \ref{eq:projection}, we can now compute its inverse operator $\tQ^{-1}$ from the Fourier transform of the vector field $\v_i(x)$. Let $\mathcal{F}$ denote the Fourier transform, $\omega$ the frequency variable, and $\mathcal{F}(\v_i(x)) = \hat{\v}_i(\omega)$. We have
\begin{equation}
\begin{split}
\mathcal{F}\left(
\L^k \v_i(x)
\right)
& = (-1)^k \omega^{2k} \hat{\v}_i(\omega)
\\
\mathcal{F}\left( \tQ \v_i(x) \right)
& = \mathcal{F}\left(
\left(
\Id + \sum_{k=1}^{\infty}
\frac{(-1)^k}{\sigma^{2k}k!}
\L^k
\right)
\v_i(x)
\right)
\\
& =
\sum_{k=0}^{\infty}
\frac{\omega^{2k}}{\sigma^{2k} k!} \hat{\v}_i(\omega)\\
& =
\exp \left(
\frac{\omega^2}{\sigma^2}
\right) \hat{\v}_i(\omega)
\end{split}
\end{equation}
To get the Fourier transform of the operator $\tQ^{-1}$, we simply invert the coefficient at each frequency $\omega$:
\begin{equation}
\begin{split}
\mathcal{F}\left( \tQ^{-1} \v_i(x) \right)
& =
\exp \left(
- \frac{\omega^2}{\sigma^2}
\right) \hat{\v}_i(\omega)
\end{split}
\end{equation}
Now it is clear that the inverse Fourier transform leads to the operator being a Gaussian convolution smoother $\tG$.
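As a sketch, the operator $\tQ^{-1}$ can be applied to one component $\v_i$ directly in the Fourier domain (NumPy, names ours; constants depending on the Fourier convention are omitted):
\begin{verbatim}
import numpy as np

def apply_Q_inverse(v, sigma):
    # Multiply each Fourier coefficient of v by exp(-|omega|^2/sigma^2),
    # which is exactly a Gaussian low-pass filter.
    freqs = np.meshgrid(*[2 * np.pi * np.fft.fftfreq(n) for n in v.shape],
                        indexing='ij')
    w2 = sum(f**2 for f in freqs)        # |omega|^2 at each grid point
    return np.fft.ifftn(np.fft.fftn(v) * np.exp(-w2 / sigma**2)).real
\end{verbatim}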
\subsection{Sobolev Norm and Sobolev Space}
The various examples of regularization terms $\myR(\v)$ shown in the previous Sec. \ref{sec:regularizerexample} all share the same quadratic form in the $k$-th order derivatives:
\begin{align}
\myR(\v) = \sum_{i=0}^K \lambda_i \ld \v^{(i)}, \v^{(i)} \rd \;,
\end{align}
where $\v^{(i)}$ is the $i$-th order derivative of $\v$. For simplicity, we omit the scalar coefficients $\lambda_i$ in this section. Following the discussion in \cite{Zikic2010} {cite: miccai sobolev and original Neuberger}, using the differential operator $\opL = (\D^0, \cdots, \D^k)$ and the $L^2$ inner product $\ld \cdot, \cdot \rd_{L^2}$, the Sobolev inner product $\ld \cdot, \cdot \rd_{H^k}$ is defined as:
\begin{align}
\ld \v, \v \rd_{H^k} = \sum_{i=0}^k \ld \v^{(i)}, \v^{(i)} \rd_{L^2} = \ld \opL \v, \opL \v \rd_{L^2} \;.
\end{align}
The quadratic form requires that the squares of the derivatives of the smooth function $\v$ up to order $k$ are integrable. Thus we formalize this as $\v$ belonging to the Sobolev space $H^k = \{ f : \| f \|_{H^k} < \infty \}$. For simplicity, we omit $k$ in $H^k$. The inner product in Sobolev space is defined as:
\begin{align}
\ld \u, \v \rd_H = \ld \opL\u, \opL\v \rd_{L^2} = \ld \opL^+\opL \u, \v \rd_{L^2} \;,
\end{align}
where $\opL^+$ is the adjoint operator of $\opL$.
Now we can simply write the regularization term $\myR(\v)$ as $\myR(\v) = \| \v \|_{H^k}^2$. Moreover, the derivative of $\myR(\v)$ has the form:
\begin{align}
\d_\v \myR = \opL^+\opL \v = \left ( \sum_{i=0}^k (-1)^i \L^i \right ) \v\;.
\end{align}
\subsection{Registration in Sobolev Space}
Now we reexamine the image registration energy function Eq. \ref{eq:energy}
\begin{align}
\tag{\ref{eq:energy}}
\myE(\v) = \lambda_S \myS(\I_0,\I_1,\v) + \lambda_R \myR(\v),
\end{align}
Using the notion of the smooth function space $H$, the regularization term $\myR(\v)$ can be replaced by the constraint that $\v \in H$:
\begin{align}
\label{eq:sobolevenergy}
\myE(\v) = \myS(\I_0,\I_1,\v) \text{ , with } \v \in H \; .
\end{align}
Now the energy is the image similarity term only. Accordingly, the optimization needs to be carried out in the Sobolev space to make sure that at each iteration step $\v$ is still in $H$. The Sobolev gradient $(\d_\v \myE)_H$ is used to replace the previous $L^2$ gradient $(\d_\v \myE)_{L^2}$. We derive their relationship using the first order Taylor expansion with the Sobolev inner product:
\begin{equation}
\begin{split}
\myE(\v + \h)
& \approx \myE(\v) + \ld (\d_\v\myE)_{L^2} , \h \rd_{L^2}
\\
& = \myE(\v) + \ld (\d_\v\myE)_{H} , \h \rd_H
\\
& = \myE(\v) + \ld \opL^+\opL (\d_\v\myE)_{H} , \h \rd_{L^2}
\end{split}
\end{equation}
Thus we can express the Sobolev gradient in terms of the $L^2$ gradient as:
\begin{align}
\label{eq:sobolevgradient}
(\d_\v\myE)_{H} = (\opL^+\opL)^{-1} (\d_\v\myE)_{L^2}
\end{align}
The projection idea we discussed in deriving the Horn-Schunck optical flow (Sec. \ref{sec:opticalflow}) is now also validated using the Sobolev gradient, by comparing Eq. \ref{eq:sobolevgradient} with Eq. \ref{eq:projection}. Also, from Sec. \ref{sec:gaussian} we know that Gaussian smoothing is the inverse operator for the gradient of the first-order Tikhonov regularization. This validates the Gaussian smoothing step used in the demons algorithms Alg.\ref{alg:additivedemons} and \ref{alg:compdemons}. Thus we conclude that the various registration algorithms we have discussed can be unified as minimizing the image similarity using the Sobolev gradient on the transform parameter $\v$, which is also true for the diffeomorphic approaches of Beg et al \cite{Beg2005Computing} and Vercauteren et al \cite{Vercauteren2007, Vercauteren2008Symmetric}.
\subsubsection{Sobolev Gradient and Preconditioning}
\label{sec:sobolevprecondition}
Another interesting property about the Sobolev gradient was discussed in \cite{Zikic2010}. If the operator $\opL$ is written as $\opL = (\Id, \opL_1)$, we get:
\begin{align}
\opL^+\opL = \Id + \opL_1^+ \opL_1
\end{align}
Eq. \ref{eq:sobolevgradient} can now also be interpreted as using $\left(
\Id + \opL_1^+ \opL_1 \right)^{-1}$ as a preconditioner for the
$L^2$ gradient $(\d_\v\myE)_{L^2}$. This connects the Sobolev
gradient with preconditioned Newton-type methods such as the
Gauss-Newton method and the Levenberg-Marquardt algorithm. Thus we can also unify the Levenberg-Marquardt algorithm applied in Ashburner's method \cite{Ashburner2007}.
\section{Diffeomorphism with Time-varying Velocity Fields}
\label{sec:lddmm}
We have now formalized the unifying registration framework using the Sobolev gradient. From this section on, we study several parameterizations $\v$ applied in large deformation algorithms and discuss the connections among them and with small deformation algorithms.
Small deformation transforms normally use a small displacement field $\u$ to model the mapping $\myphi = x + \u$. In contrast, a large deformation transform introduces an extra time variable $t$ to encode the warping path $\myphi(x,t)$ between two images. The first parameterization is the o.d.e Eq. \ref{eq:ode}:
\begin{align}
\tag{\ref{eq:ode}}
\frac{\dd}{\dd t} \myphi_t(x) = \v_t(\myphi_t(x)),
\end{align}
Optimizing the diffeomorphism transform $\myphi$ is equivalent to optimizing the time-varying velocity field $\v_t$.
\subsection{Beg's LDDMM algorithm}
Beg et al proposed the Large Deformation Diffeomorphic Metric Mapping (LDDMM) algorithm, which uses this o.d.e as the parameterization to optimize the following energy function:
\begin{align}
\myE(\v) = \frac{1}{2} \| I_0 \circ \myphi_{1,0} - I_1 \|^2
+
\frac{\sigma^2}{2}\int_{0}^{1} \| \opL \v_t\|^2 \dd t \;,
\label{eq:lddmm}
\end{align}
where $\myphi_{1,0}$ is defined by Eq. \ref{eq:invphi} in
Sec. \ref{sec:diffgroup}, the operator $\opL$ represents the
Cauchy-Navier regularizer $\myR(\v)$: $\opL = \gamma \Id - \alpha \L$
in Sec. \ref{sec:cauchynaviermodel} with corresponding Hilbert space
$H$. Note that we omit the integral over the spatial domain $\int_x
\cdot\, \dd x$ for simplicity since $\v$ can be reinterpreted as a
vector with an element for each pixel. The image similarity term is
then $\myS(\v) = \frac{1}{2} \| I_0 \circ \myphi_{1,0} - I_1 \|^2 $.
Beg et al gave a rigorous derivation of $\d_\v \myE$ in \cite{Beg2005Computing}. The key step in the proof is to introduce the Gateaux variation of $\myphi_{s,t}$ w.r.t $\v$ (Lemma 2.1 in \cite{Beg2005Computing}). A small perturbation $\h_r$ of $\v$ at time $r$ affects all transforms $\myphi_t$ for $t>r$ cumulatively.
We state this here without proof:
\begin{align}
\p_\h \myphi_{s,t} = \D\myphi_{s,t}
\int_{s}^{t}
\left(
\D \myphi_{s,r}
\right)
^{-1} \h_r \circ \myphi_{s, r} \, \dd r
\end{align}
Although it is hard to define the Fr{\'e}chet derivative of $\myphi$, which means that the simple chain rule cannot be applied directly to compute $\d_\v \myS$, Beg showed that $\p_\h \myS$ still has a nice structure related to the Fr{\'e}chet derivative in the proof of Theorem 2.1 in \cite{Beg2005Computing}:
\begin{align}
\p_\h \myS(\v) = - \int_{0}^{1}
\ld
|\D\myphi_{t,1}|
(\J_t^0 - \J_t^1)
\D(\J_t^0)
,
\h_t
\rd
\dd t \; ,
\label{eq:lddmmds}
\end{align}
where $\J_t^0 = \I_{0}\circ \myphi_{t,0}$, $\J_t^1 = \I_{1}\circ \myphi_{t,1}$, $\D$ is the Jacobian matrix and $|\cdot|$ is the determinant of the matrix.
For the regularization term, the Gateaux variation is easy to compute:
\begin{align}
\p_\h \myR(\v) = \sigma^2 \int_{0}^{1}
\ld
\opL^+\opL \v_t, \h_t
\rd \dd t
\end{align}
Now the overall Gateaux variation $\p_\h \myE(\v)$ can be represented as:
% \begin{align}
$\p_\h \myE(\v) = \int_0^1
\ld
\d_{\v_t} \myE_t , \h_t
\rd \dd t $
% \; ,
% \end{align}
and $\d_{\v_t} \myE_t$ is defined as:
\begin{align}
\d_{\v_t} \myE_t = \opL^+\opL (\sigma^2 \v_t) - |\D\myphi_{t,1}|(\J_t^0 - \J_t^1)\D(\J_t^0)
\label{eq:lddmmgrad}
\end{align}
Beg et al proposed to use the Sobolev gradient instead of this $L^2$ gradient. From Eq. \ref{eq:sobolevgradient}, we get the gradient:
\begin{align}
(\d_{\v_t} \myE_t)_H = \sigma^2 \v_t - (\opL^+\opL)^{-1}(|\D\myphi_{t,1}|(\J_t^0 - \J_t^1)\D(\J_t^0))
\end{align}
\subsection{Numerical Algorithm of LDDMM}
\label{sec:lddmmalg}
In its numerical implementation, the time-varying velocity fields are discretized into $N$ time points $(\v_{t_i})_{0 \leq i \leq N-1}$. For each time point $i$, a standard steepest descent scheme is applied:
\begin{align}
\v_{t_i} \leftarrow \v_{t_i} - \epsilon (\d_{\v_{t_i}} \myE_{t_i})_H \; .
\end{align}
The inverse of $\opL^+\opL$ in the Sobolev gradient can be computed in the Fourier domain.
One has to compute $\myphi_{t,1}$ and $\myphi_{t,0}$ for $(\d_{\v_{t_i}} \myE_{t_i})_H$ in Eq. \ref{eq:lddmmds}. $\myphi_{t_i,1}$ can be computed directly by composing $(\Id + \frac{1}{N}\v_{t_{N-1}}) \circ \cdots \circ (\Id + \frac{1}{N}\v_{t_{i}})$. For $\myphi_{t,0}$, one needs to compute the inverse of $\myphi_{0,t}$. Beg proposed a semi-Lagrangian scheme in Sec. 3.3 of \cite{Beg2005Computing} for stable numerical computation, which is not covered in this paper. In general, obtaining the inverse transform from a series of $\v_t$ is computationally intensive.
The LDDMM method requires storing $N$ velocity fields. In each iteration, it also needs to compute $N$ gradient fields and $N$ compositions for $\myphi_{t,1}$, and to solve $N$ inverse problems. Overall, this is a very expensive algorithm.
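To fix ideas, here is a minimal 2D NumPy sketch of the Sobolev gradient at one time point, $(\d_{\v_t} \myE_t)_H$. The coordinate maps \texttt{phi\_t0} and \texttt{phi\_t1} for $\myphi_{t,0}$ and $\myphi_{t,1}$ are assumed precomputed, and \texttt{K} approximates $(\opL^+\opL)^{-1}$ (e.g. a Fourier-domain filter); the names are ours, not from \cite{Beg2005Computing}.
\begin{verbatim}
import numpy as np
from scipy.ndimage import map_coordinates

def jacobian_det(phi):
    # 2D Jacobian determinant of a coordinate map phi (shape (2, H, W)).
    g00, g01 = np.gradient(phi[0])
    g10, g11 = np.gradient(phi[1])
    return g00 * g11 - g01 * g10

def lddmm_gradient_at_t(I0, I1, phi_t0, phi_t1, v_t, K, sigma2=1.0):
    J0 = map_coordinates(I0, phi_t0, order=1, mode='nearest')
    J1 = map_coordinates(I1, phi_t1, order=1, mode='nearest')
    body = jacobian_det(phi_t1) * (J0 - J1) * np.array(np.gradient(J0))
    return sigma2 * v_t - np.array([K(b) for b in body])
\end{verbatim}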
\subsection{Comparison with Horn-Schunck Optical Flow}
Revisiting the Horn-Schunck optical flow update Eq. \ref{eq:opticalflow} with the approximation Eq. \ref{eq:approxopticalflow}, one can rewrite it as:
\begin{align}
\d_\u\myE = (\I_0 \circ \myphi - \I_1) \D\I_1 - \lambda \L \u \;.
\end{align}
Comparing with the gradient at time $t$ of LDDMM (Eq. \ref{eq:lddmmgrad}), $\d_{\v_t} \myE_t$ is conveniently explained as the optical flow between $J_t^0$ and $J_t^1$ at time $t$, weighted by the Jacobian determinant $|\D\myphi_{t,1}|$ introduced by the change of variables $y = \myphi_{t,1}(x)$.\footnote{One small difference is that the image similarity is defined as $\|I_0 \circ \myphi_{1,0} - I_1\|^2$, not $\|I_1 \circ \myphi_{0,1} - I_0\|^2$ (Eq.\ref{eq:SSD}). }
Thus the LDDMM algorithm can be summarized as computing $\v_{t_i}$ in the manner of optical flow for all times $t_i$ and composing all $(\v_{t_i})$ to get $\myphi_{t, 1}$ and $\myphi_{t, 0}$.
\subsection{Comparison with Compositive Demons}
Compositive demons described in Sec. \ref{sec:compositivedemons} can also be viewed as a greedy optimization of LDDMM. When the iterative optimization procedure of compositive demons uses $N$ iterations, the updates $\w_i, 1 \leq i \leq N$, from the iterations can be viewed as the velocity fields. At iteration $i$, the greedy optimization fixes all the velocity fields obtained before $i$ and only optimizes $\w_i$.
The advantage of such a greedy approach is that it does not require storing all the velocity fields before $i$; only their composition is needed. The disadvantage is that the greedy method does not put any constraint on the overall quality of the $\w_i$. Each $\w_i$ is locally optimal, but the transform path following all $\w_i$ may be very curved. In fact, Beg et al \cite{Beg2005Computing} also showed that with the regularization used in LDDMM, the optimal transform follows the shortest path from $\Id$.
\section{Diffeomorphism with Stationary Velocity Fields}
The LDDMM method \cite{Beg2005Computing} uses the time-varying
velocity fields $\v_t$ to parameterize the diffeomorphism
$\myphi$. Although the derivation is rigorous, the algorithm has a high complexity and also requires stable numerical methods for solving the inverse transform. One way to simplify the problem is to use a stationary velocity field $\v$, which does not change over time, to generate the diffeomorphism \cite{Arsigny2006,Ashburner2007,Vercauteren2007,Vercauteren2008Symmetric,Hernandez2009}. The o.d.e that generates the diffeomorphism is now written as:
\begin{align}
\frac{\dd}{\dd t} \myphi_t(x) = \v(\myphi_t(x)),
\end{align}
The stationary velocity field $\v$ is referred to as the one-parameter generator of a subgroup $\Phi=(\exp(t\v))_{t \in \R}$, where $\v$ belongs to the tangent space of $\Phi$. The final transform is parameterized as the exponential at unit time, $\exp(\v)$. The theory of such an infinite-dimensional space with a Lie group structure is still under research \cite{Arsigny2006, Vercauteren2008Symmetric, Sternberg1964, Mahony2002}.
Using a stationary velocity field $\v$ in place of the time-varying velocity fields $\v_t$ greatly simplifies the computation. We have
\begin{align}
\myphi_{s,t} = \myphi_{t-s} = \exp((t-s)\v)
\end{align}
This shows that with stationary velocity fields, $\myphi_{s,t}$ only depends on the time difference between the two time points and not on the starting point.
Although not every diffeomorphism can be parameterized by a stationary velocity field \cite{Ashburner2007}, such a model performs well empirically for medical image registration \cite{Hernandez2009, Vercauteren2009}. The stationary parameterization not only simplifies the computation; it also opens the possibility of statistical studies on a single velocity field, which is desirable in many applications.
\subsection{Computation of Exponential Mapping}
In Sec. \ref{sec:lddmmalg} we discussed that computing $\myphi$
requires $N$ compositions of vector fields and solving $N$ times
the p.d.e of the inverse transform $\myphi^{-1}$. \footnote{In fact,
this is the minimal amount of computation required. Increased
integration accuracy requires a dense sampling of the o.d.e. in time
and can require many more integration points than discretization
points, in particular if an accurate/consistent forward and inverse
transformation is desired.} With a stationary field, this can be reduced to $\log N$ compositions for both $\myphi$ and $\myphi^{-1}$.
Arsigny et al \cite{Arsigny2006} introduced an efficient method to compute the exponential mapping $\myphi = \exp(\v)$ by exploiting the following property:
\begin{align}
\exp(\v) = \exp(N^{-1} \v )^N \;\text{for integer } N.
\end{align}
Using the first order approximation $\exp(N^{-1} \v) \approx \Id + N^{-1}\v$ for sufficiently large $N$, the following ``scaling and squaring'' algorithm (Alg.\ref{alg:exp}) computes $\exp(\v)$ recursively in $K$ steps for $N = 2^K$:
\begin{algorithm}
\caption{Scaling-and-squaring for Exponential Mapping}
\label{alg:exp}
\begin{enumerate}
\item{Scaling: choose $K$ s.t. $2^{-K}\v$ is close to 0, e.g. $\max_x \|2^{-K}\v(x)\| \leq 0.5 $.
}
\item{First-order approximation: $\myphi^{0} = \Id + 2^{-K}\v \approx \exp(2^{-K} \v) $.
}
\item{Squaring: recurse for $K$ steps: $\myphi^{k+1} \leftarrow \myphi^{k} \circ \myphi^{k}$.
}
\end{enumerate}
\end{algorithm}
Another important advantage is that one can also compute $\myphi^{-1}$, the inverse of $\myphi$, using the same scaling-and-squaring method with $\exp(-2^{-K} \v) \approx \Id - 2^{-K}\v$.
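A minimal 2D NumPy sketch of Alg. \ref{alg:exp}, representing transforms by their displacement fields (names ours); calling it with $-\v$ yields the inverse transform:
\begin{verbatim}
import numpy as np
from scipy.ndimage import map_coordinates

def expmap(v, grid, K=6):
    # Scaling: phi^0 = Id + v / 2^K, stored as a displacement u.
    u = v / 2.0**K
    for _ in range(K):
        # Squaring: (Id + u) o (Id + u) = Id + u + u o (Id + u).
        u_at = np.array([map_coordinates(c, grid + u, order=1,
                                         mode='nearest') for c in u])
        u = u + u_at
    return u    # exp(v) ~ Id + u
\end{verbatim}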
\subsection{Ashburner's DARTEL algorithm}
Ashburner proposed to use a stationary velocity field for diffeomorphic image registration. The algorithm is named DARTEL: Diffeomorphic Anatomical Registration Through Exponentiated Lie algebra. DARTEL can be viewed as a stationary version of LDDMM in many ways. DARTEL uses the image registration energy
function:
% \footnote{Actually not exactly the same form. Ashburner defined the image similarity term as $\| I_1 \circ \myphi_{0,1} - I_0 \|^2$, here we reformulate it for consistency with LDDMM.}
% Eq. \ref{eq:lddmm} as:
\begin{align}
\myE(\v) = \frac{1}{2} \| I_0 - I_1 \circ \myphi_{1} \|^2
+
\frac{\sigma^2}{2} \| \opL \v\|^2 \;,
\label{eq:dartel}
\end{align}
Compared with Eq. \ref{eq:lddmmds}, the $L^2$ first order derivative of $\myS$ becomes:
\begin{align}
\d_\v \myS(\v) = \int_{0}^{1}
|\D\myphi_{-t}|
(\I_0 \circ \myphi_{-t} - \I_1 \circ \myphi_{1-t})
\D(\I_1 \circ \myphi_{1-t})
\, \dd t \; ,
\label{eq:dartelds}
\end{align}
Previously we discussed that it is expensive to compute this gradient in LDDMM. Fortunately, Ashburner \cite{Ashburner2007} also derived a recursive scheme for $\d_\v \myS(\v)$ using $\log N$ operations. Together with the scaling-and-squaring algorithm for computing $\myphi$, each gradient update altogether needs only $\log N$ operations.
% For the sake of completeness, we include it here.
% By disretizing time $0$ to $1$ into $N=2^K$ time points, Eq. \ref{eq:dartelds} becomes:
% \begin{equation}
% \begin{split}
% \d_\v \myS(\v) & =
% \frac{1}{N} \sum_{n=0}^{N-1} |\D\myphi_{t_n,1}|
% (\I_1 \circ \myphi_{t_n} - \I_0\circ\myphi_{t_n}^0) \D(\J_{t_n}^0)
% \\
% & =
% \frac{1}{N} \sum_{n=0}^{N-1} |\D\myphi_{t_n,1}| (\J_{t_n}^1 - \J_{t_n}^0) \D(\J_{t_n}^0)
% \\
% \end{split}
% \end{equation}
Ashburner used the Levenberg-Marquardt algorithm in $L^2$ space as the optimization scheme, for which the second order derivatives of the energy were also computed. From the discussion in Sec. \ref{sec:sobolevprecondition}, one could also use gradient descent in the Sobolev space, acting as a preconditioner, which was used in Beg et al's LDDMM algorithm \cite{Beg2005Computing}.
% \subsection{DARTEL regularization and Gauss-Newton Method to Soblev gradient}
\subsection{Comparison with LDDMM}
DARTEL and LDDMM both model the transform via the o.d.e generator. Both algorithms compute the integration over time using $N$ discretized time points. Both have a rigorous derivation of the energy gradient.
However, the DARTEL algorithm uses a stationary velocity field $\v$
as the parameterization of the transform $\myphi$. DARTEL is a simplified way to approximate LDDMM, replacing the velocity fields at all time points by their average.
The DARTEL update of the velocity field can be viewed as the average
of the optical flow at every time point. Instead of having to store
the velocity field at every time point, DARTEL stores only one
velocity field. LDDMM needs to compose (at minimum) $N$ different velocity fields to get the transform and compute the gradient at each time point. DARTEL only needs $\log N$ compositions to compute both the transform and the gradient.
Another difference between DARTEL and LDDMM is whether the
optimization occurs in $L^2$ space or in the Sobolev space. DARTEL
computes a sparse matrix inverse in each Levenberg-Marquardt
iteration. Although Ashburner proposed to use the multi-grid method with
Gauss-Seidel iterations, such a computation still has a high complexity compared to the convolution approach in the Fourier domain used in LDDMM. The high-frequency components amplified by the operator $\opL^+\opL$ in the $L^2$ gradient also lead to numerical instability.
\subsection{Vercauteren's Diffeomorphic Demons Algorithm}
Vercauteren extended the compositive demons algorithm of Sec. \ref{sec:compositivedemons} with diffeomorphic updates \cite{Vercauteren2009}. Following the notation used in Sec. \ref{sec:demons}, in each iterative step the diffeomorphic transformation is updated as:
\begin{align}
\myphi \leftarrow \myphi \circ \exp(\w)
\end{align}
If $\myphi$ is initialized as a diffeomorphism (e.g. $\Id$), the composition of the fields $\w$ from all the iterations is $\myphi = \exp(\w_1) \circ \cdots \circ \exp(\w_N)$. Since $\exp(\w) = \Id + \w + o(\|\w\|)$, compositive demons is a first order approximation of this parameterization.
Note that the new regularization term can be reinterpreted as $\| \w \|^2 = \| \log(\myphi^{-1} \circ \c )\|^2$.
Unlike LDDMM and DARTEL, Vercauteren et al did not compute the exact analytical form of $\p_\w \exp(\w)$ for arbitrary $\w$. Instead, they approached the problem from the perspective of Lie group theory. We already discussed that the velocity field $\v$ belongs to the Lie algebra, i.e. the tangent space, of the Lie group. The exponential mapping $\exp(\w)$ maps a neighborhood of $0$ in the tangent vector space to a neighborhood of $\Id$ in the diffeomorphism group.
When $\w$ is close to $0$, one only needs the derivative $\p_{\w}\exp(\w)$ at $0$, similar to the small deformation methods. Without a fully rigorous derivation, Arsigny et al indicated in \cite{Arsigny2006} that
\begin{align}
\frac{\dd}{\dd\w} \exp(\w) \rvert_{\w=0} = \Id \; .
\end{align}
Again this implies that around $0$, $\exp(\w)$ and $\Id+\w$ have the same first derivative. So diffeomorphic demons is very similar to compositive demons; the difference is that the composition step is now $\myphi \leftarrow \myphi \circ \exp(\w)$.
\subsection{Comparison with DARTEL and Compositive Demons}
The update step used in diffeomorphic demons is actually an approximation of the DARTEL algorithm. Instead of averaging the optical flow at every time point, diffeomorphic demons uses only the optical flow (in other words, the demons force) at time $0$ as a replacement. Note that this is valid only when $\w$ is small and not true in general. Unlike in DARTEL, the final transform $\myphi$ can no longer be represented as a single exponential mapping.
Similarly to compositive demons, diffeomorphic demons does not track all the velocity fields generated in the iterations. It is more efficient, and more greedy, than LDDMM. The difference is that the transform is updated by composing with the exponential of the update, instead of with the update directly.
Diffeomorphic demons can thus be viewed as a middle ground between DARTEL and compositive demons. In practice, these three methods have comparable performance \cite{Hernandez2008b,Vercauteren2009} in image matching results but may produce different transforms.
\subsection{Vercauteren's LogDemons Algorithm}
In order to keep the final transform in the exponential form $\exp(\v)$, Vercauteren et al proposed LogDemons \cite{Vercauteren2008Symmetric}, which updates the velocity field directly in the log domain. The aim is to combine the update $\w$ with the current velocity $\v$ into a new exponential transform such that:
\begin{align}
\exp(Z(\v, \epsilon \w)) = \exp(\v) \circ \exp(\epsilon \w) \;,
\end{align}
where $\epsilon$ is a small scalar to emphasize that $\epsilon \w$ is small.
Although the Lie group structure of the infinite-dimensional space is mathematically still not fully understood, Bossa et al showed that the Baker-Campbell-Hausdorff (BCH) formula for Lie groups can be applied successfully. In the setting of LogDemons, Vercauteren used the following approximation:
\begin{align}
Z(\v, \epsilon \w) = \v + \epsilon \w + \frac{1}{2}[\v, \epsilon\w]+\frac{1}{12}[\v,[\v,\epsilon\w]] + O(\|\epsilon\w\|^2) \;,
\end{align}
with the Lie bracket $[\cdot,\cdot]$ defined as
$
[\v,\w]=(\D\v)\,\w-(\D\w)\,\v \;,
$
where $\D$ denotes the Jacobian matrix.
LogDemons also defines the regularization of $\myphi$ on the velocity field: $\| \d \log(\myphi) \|^2=\|\d\v\|^2$. Thus LogDemons has the update step and regularization step:
\begin{align}
& \c = \exp(\v) \;\text{ and } \v \leftarrow Z(\v, \w) \\
& \v \leftarrow \tG \ast \v
\end{align}
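A minimal NumPy sketch of this truncated BCH update (names ours):
\begin{verbatim}
import numpy as np

def lie_bracket(v, w):
    # [v, w] = (Dv) w - (Dw) v, with D the Jacobian matrix.
    Dv = np.array([np.gradient(c) for c in v])  # Dv[i][j] = dv_i/dx_j
    Dw = np.array([np.gradient(c) for c in w])
    return (np.einsum('ij...,j...->i...', Dv, w)
            - np.einsum('ij...,j...->i...', Dw, v))

def bch_update(v, w):
    # Z(v, w) ~ v + w + [v, w]/2 + [v, [v, w]]/12 (truncated BCH).
    vw = lie_bracket(v, w)
    return v + w + 0.5 * vw + lie_bracket(v, vw) / 12.0
\end{verbatim}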
\subsection{Comparison with DARTEL}
LogDemons works completely in the log domain of the exponential mappings. It has the same parameterization as DARTEL, which gives them similar advantages, such as efficient computation of the exponential mapping and availability of the velocity field for further statistical analysis.
Unlike DARTEL, LogDemons is extended from the diffeomorphic demons approach. It relies on the assumption that each update of the velocity field is small. DARTEL has a more rigorous way of computing the gradient, which can be explained as averaging the optical flows at all time points. LogDemons only uses the optical flow at time $0$, as in diffeomorphic demons. Instead, LogDemons utilizes the BCH formula to integrate the update back into the velocity field in the log domain.