BI-PLANAR IMAGE REGISTRATION AND MODELING OF BONES by Joshua Boyd King

A thesis submitted to the Faculty and the Board of Trustees of the Colorado
School of Mines in partial fulfillment of the requirements for the degree of Master of
Science (Engineering Systems).
Golden, Colorado
Date
Signed:
Joshua Boyd King
Approved:
Dr. William Hoff
Professor of Engineering
Thesis Advisor
Golden, Colorado
Date
Dr. Terry Parker
Professor and Head
Department of Engineering
ABSTRACT
To apply computer aided surgery to bone related procedures, a 3D model of the
bone needs to be preoperatively produced uniquely for each patient. This can be
accomplished very accurately by performing a CT scan of the patient and segmenting
the data to create a surface model, which would then need to be rigidly registered
to the patient during surgery. Unfortunately building a model in this manner is
undesirable because it is time consuming, costly and exposes the patient to large
amounts of X-rays.
To minimize cost, time and X-ray exposure it is desirable to use a deformable
3D model that can be altered during the registration process to match any patient.
The use of a deformable model directly couples the modeling and registration processes together, resulting in a 6 (rotation and translation) + N degree of freedom
optimization problem, where N is the number of deformable shape parameters.
Dr. Mahfouz and his colleagues at the University of Tennessee in Knoxville have
developed a registration method that uses a statistical bone atlas to estimate patient
specific femur models from two digitally reconstructed radiographs (synthetic X-ray
images) of the right femur. The objective of this work is to improve the speed and
accuracy of the method, specifically by modifying the existing evaluation function and
using a new search strategy. These improvements are then applied to the registration
and modeling problem for the right femur and L5 lumbar vertebra.
TABLE OF CONTENTS
ABSTRACT
LIST OF FIGURES
LIST OF TABLES
ACKNOWLEDGMENT
Chapter 1 INTRODUCTION
Chapter 2 BACKGROUND
2.1 Image Registration
2.2 Similarity Measures
2.3 Statistical Atlas Models
Chapter 3 IMPROVEMENTS TO EVALUATION FUNCTION
3.1 Image Gradients and the Effects of Blurring
3.2 Edge Segmentation
3.3 Single Point Contribution
Chapter 4 IMPROVEMENTS TO SEARCH METHOD
4.1 Cross-Correlation
4.2 Two Stage Optimization
Chapter 5 IMPLEMENTATION DETAILS
5.1 Genetic Algorithm
5.2 Digitally Reconstructed Radiographs
5.3 Virtual Reality Toolbox and Camera Calibration
Chapter 6 EVALUATION AND TESTING
6.1 Improvements to Evaluation Function
6.2 Improvements to Search Method
6.3 Experimental Results of the Full Algorithm
6.3.1 Femur Results
6.3.2 L5 Lumbar Vertebra Results
Chapter 7 SUMMARY
7.1 Major Results and Conclusions
7.2 Recommendations for Future Work
REFERENCES
LIST OF FIGURES
2.1 The (a) predicted region, (b) edge and (c) smoothed edge images.
2.2 Femur atlas model example modes of variation
3.1 X-ray intensity profiles and corresponding derivatives with (dashed) and without (solid) gaussian blurring.
3.2 Shift in the magnitude of the image gradient at different bone radii and gaussian blurs
3.3 Example of morphological erosion
3.4 The (a) lumbar DRR and (b) corresponding gradient image.
3.5 Canny edge detector applied to a lumbar DRR.
3.6 The true edge near erroneous data points where the predicted edge is spread.
3.7 The true edge near erroneous data points where the X-ray edge is spread.
4.1 Rays passing through the image planes of both camera views.
4.2 Flow chart of the two stage optimization strategy.
5.1 Casting rays through a CT volume
6.1 The plot of mean errors and the corresponding 90% confidence intervals (error bars) for the four test cases used for the femur rigid optimization.
6.2 The plot of mean errors and the corresponding 90% confidence intervals (error bars) for the four test cases used for the lumbar rigid optimization.
6.3 Comparison of the average magnitude errors of the femur with the standard search method in blue and the cross-correlation search method in green.
6.4 Comparison of the average magnitude errors of the lumbar with the standard search method in blue and the cross-correlation search method in green.
6.5 Typical starting pose for the right femur.
6.6 Typical optimized shape for the right femur at the ICP registered rigid pose.
6.7 Typical starting pose for the L5 lumbar vertebra.
6.8 Typical optimized shape for the L5 lumbar vertebra at the ICP registered rigid pose.
7.1 Lumbar vertebra rendered partially transparent.
LIST OF TABLES
6.1 Mean errors and 90% confidence intervals for the four test cases used for the femur rigid optimization.
6.2 Mean errors and 90% confidence intervals for the four test cases used for the L5 lumbar rigid optimization.
6.3 Optimized femur shape point to surface errors.
6.4 Analysis of optimized femur shape point to surface errors
6.5 Optimized lumbar shape point to surface errors.
ACKNOWLEDGMENT
I would like to thank Dr. Mohamed Mahfouz from the University of Tennessee in
Knoxville for the opportunity to work on this project and for supporting my research.
Chapter 1
INTRODUCTION
The main focus of orthopedic medicine is the prevention and correction of injuries or deformities of the skeletal system and its associated muscles and joints. In
orthopedic surgeries, it is important to accurately cut or drill bones. Computer aided
surgery (CAS) technology has the potential to greatly improve the accuracy of such
surgeries, thus improving the overall safety and recovery time of the patients.
Most CAS technologies need a 3D model of the bone [1]. This is because they register sensor data (e.g., fluoroscopic images) to a 3D model of the bone and then estimate the pose of the patient with respect to the sensor. The standard method of producing the 3D model is to perform a preoperative CT scan of the patient and manually segment the 3D volume data. This process is not only costly, but is also time consuming and exposes the patient to a significant radiation dose.
An alternative is to create the 3D models using a small number (e.g., two) of still X-rays. This approach would be faster, cheaper, and expose the patient to less radiation. The problem is that there is not enough information to uniquely reconstruct a 3D model from two X-ray projections alone. One way to deal with this ambiguity is to
use a priori knowledge of the shape of the bones. This knowledge can be captured
in the form of a deformable statistical atlas, created from examples of many bones.
The use of such a deformable model directly couples the modeling and registration
processes together, resulting in a 6 (rotation and translation) + N degree of freedom
optimization problem, where N is the number of deformable shape parameters.
A group at the University of Tennessee [2] has developed a registration method
that uses a statistical bone atlas to estimate patient specific femur models from two
digitally reconstructed radiographs (synthetic X-ray images) of the right femur. The
objective of this work is to improve the speed and accuracy of this method, specifically
by modifying the existing evaluation function and using a new search strategy. These
improvements are then applied to the registration and modeling problem for the right
femur and L5 lumbar vertebra.
The remainder of this thesis is organized as follows: Chapter 2 covers the relevant
background materials and previous work; Chapters 3 and 4 discuss the modifications to the existing Mahfouz method and their justifications; Chapter 5 covers the
implementation details of the genetic algorithm, digitally reconstructed radiographs
and the virtual camera calibration; Chapter 6 presents the evaluation and experimental results of the modifications discussed in Chapters 3 and 4; and finally Chapter 7
is the summary of this work and the recommendations for future work.
Chapter 2
BACKGROUND
2.1
Image Registration
The objective of image registration is to use one or more 2D images of an object
to find the rigid transformation from the corresponding 3D model in the computer
frame to the world frame of reference. Given a 3D model of the object and a model of the imaging process, a 2D image of the model can be predicted at a given rigid pose and compared to the image of the actual object. Specifically for this research the 3D models are of bones (right femur or L5 lumbar) which are registered to X-ray images.
Directly comparing the predicted images to the X-ray images is considered an intensity based method, where the predicted images can vary from silhouettes [3] of a 3D model to digitally reconstructed radiographs [4] computed from CT data. The alternative to intensity based methods is the class of feature based methods, such as [5] and [6]. Feature based methods rely on the extraction/segmentation of landmarks or other features, such as contours and silhouettes, from the X-ray images, which are then rigidly aligned to the same features on the 3D model. The segmentation can be performed either manually or automatically, and once it is complete the method tends to be faster than other methods. However, the segmentation process can be time consuming and prone to error [6].
When using only one image in the registration process it is difficult to find the
depth of the object from the camera accurately without a secondary device, such as a
range sensor. Bi-planar registration can more easily establish the depth of the object
from the first camera using the second camera and vice versa.
2.2
Similarity Measures
The similarity measures described below were originally developed by Dr. Mahfouz and his colleagues to register 3D models of knee implants to single plane fluoroscope images of patients with knee implants [3]. The evaluation function is based on an edge and region similarity measure and was optimized using simulated annealing. This method has since been applied to bi-planar image registration [2], where a statistical bone atlas is registered to two approximately orthogonal synthetic X-ray images. The bi-planar case used a coarse to fine optimization strategy with a genetic algorithm to optimize the evaluation function for both the rigid and shape parameters.
Two types of predicted images are used in the region and edge similarity measures. The region image (Figure 2.1(a)) is the binary silhouette of the projected 3D model. The corresponding smoothed edge image (Figure 2.1(c)) is created by running the silhouette through an edge detector (Figure 2.1(b)) and then filtering the result with an N by N gaussian mask. The edge is smoothed to allow points near the edge to contribute to the score by an amount that decreases with increasing distance from the edge. Edge points beyond N pixels from the edge are given zero weight, where N is chosen to allow for some degree of modeling error, but minimized to reject erroneous edge points.
Figure 2.1. The (a) predicted region, (b) edge and (c) smoothed edge images.
The region score (Equation 2.1) for each individual view is computed by multiplying the X-ray image (I_j) and predicted region image (P_j) together, summing the result and normalizing by the sum of the predicted image. Here, j = 1, 2 for the two views.
\[ R_j = \frac{\sum_u \sum_v I_j(u, v)\, P_j(u, v)}{\sum_u \sum_v P_j(u, v)} \tag{2.1} \]
The edge score (Equation 2.2) is very similar to the region score with the exception of using the edge enhanced (gradient magnitude) X-ray image (Ie_j) and the predicted smoothed edge image (Pe_j).
\[ E_j = \frac{\sum_u \sum_v Ie_j(u, v)\, Pe_j(u, v)}{\sum_u \sum_v Pe_j(u, v)} \tag{2.2} \]
Equations 2.1 and 2.2 are basically the average intensities of the overlapped
regions and edges of the X-rays and predicted images. Since it is assumed that the
bone in the input X-ray image will have a higher intensity than the surrounding soft
tissue and also have a strong edge, both functions should be at a maximum when the
predicted and X-ray images are correctly aligned.
The total score for each image is a weighted combination of both the region and edge scores:
\[ S_j = \alpha R_j + \beta E_j \tag{2.3} \]
where the edge score is given a higher weight so that it dominates the score when
close to the true solution. The overall score is the summation of the total scores for
all j views. This score is then optimized to find the best pose.
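As a rough illustration of Equations 2.1 to 2.3, the following Python sketch computes the region, edge and total scores for a single view. It is not the Matlab code used in this work. The function names are hypothetical, images are assumed to be nested lists of numbers, and the weights are left as arguments because their values are not stated here.

```python
def overlap_score(xray, predicted):
    """Sum of the pixelwise product, normalized by the sum of the predicted image.

    With the X-ray image and the predicted region image this is Equation 2.1;
    with the gradient magnitude image and the smoothed edge image it is
    Equation 2.2.
    """
    numerator = sum(
        i * p
        for xray_row, predicted_row in zip(xray, predicted)
        for i, p in zip(xray_row, predicted_row)
    )
    denominator = sum(p for row in predicted for p in row)
    return numerator / denominator if denominator else 0.0


def total_score(xray, xray_edges, region, smoothed_edge, alpha, beta):
    """Weighted combination of the region and edge scores (Equation 2.3)."""
    region_score = overlap_score(xray, region)
    edge_score = overlap_score(xray_edges, smoothed_edge)
    return alpha * region_score + beta * edge_score
```

The overall score for a pose would then be the sum of `total_score` over both views, with beta chosen larger than alpha so that the edge term dominates near the solution.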
There are a few disadvantages with the existing method. First, strong edges in the gradient magnitude may bias the solution (Chapter 3.2). Second, the spreading of the predicted edge allows multiple edge points to contribute to the overall score (Chapter 3.3). Third, many function evaluations are needed during the search; for example, in [3] up to 10 thousand evaluations were needed (Chapter 4).
2.3
Statistical Atlas Models
The statistical atlas models used in [2] and [7] were constructed by applying
principal component analysis (PCA) [8] to data sets comprised of surface models
segmented from CT. The primary benefit to using PCA is that the dimensionality of
a data set can be reduced by only using the principal components that represent the
most significant variations from the statistical mean. The femur atlas, for example,
has a total of 188 principal components, where the first nine components represent
approximately 99% of the variation of the model. An example of the variations of the femur
atlas is shown in Figure 2.2 for the first three principal components.
Figure 2.2. Femur atlas model example modes of variation
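The way a PCA atlas produces a shape can be sketched as the mean shape plus a weighted sum of principal components. The Python below is a minimal illustration under assumptions that are not taken from the atlas itself: vertices are stored as one flattened coordinate list, the coefficients multiply the components directly, and the 6 + N parameter vector is ordered as rotation, translation, shape.

```python
def atlas_shape(mean_shape, modes, coefficients):
    """Return mean_shape + sum_k coefficients[k] * modes[k].

    mean_shape and each mode are flattened coordinate lists of equal length.
    Only the first len(coefficients) modes are used, which is how PCA
    reduces the number of shape parameters.
    """
    shape = list(mean_shape)
    for coefficient, mode in zip(coefficients, modes):
        for index, value in enumerate(mode):
            shape[index] += coefficient * value
    return shape


def split_parameters(parameters):
    """Split a 6 + N vector into rotation, translation and shape coefficients.

    The ordering is an assumption made for this sketch.
    """
    return parameters[:3], parameters[3:6], parameters[6:]
```

Setting all coefficients to zero returns the mean shape, which corresponds to the base model before any deformation.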
Chapter 3
IMPROVEMENTS TO EVALUATION FUNCTION
3.1
Image Gradients and the Effects of Blurring
In image processing the gradient magnitude of an image (Equation 3.1) is used to highlight abrupt changes in intensities that occur across object boundaries. In the absence of blurring the maximum of the gradient image is coincident with the edges of the original image.
\[ I_{gm}(u, v) = \sqrt{\left(\frac{\partial I}{\partial u}\right)^2 + \left(\frac{\partial I}{\partial v}\right)^2} \tag{3.1} \]
Image blurring can occur naturally in the imaging process and is often used
artificially to suppress noise. The effect of blurring is that it can shift the maximum
of the gradient so that it is not coincident with the true edge locations. This shift
depends on the intensity profile of the edge and the amount of blurring present in the
image.
To estimate the magnitude of this effect, a cross sectional slice through the shaft of the femur is approximated as a solid disk of uniform density. Using parallel projection, its X-ray intensity profile is estimated using the following equation:
\[ I_p(x) = I_o - I_o \exp(-\mu\, t(x)) \tag{3.2} \]
where I_o is the maximum intensity, µ is the X-ray linear attenuation coefficient for bone and t(x) is the thickness of the disk as a function of the distance from its center. To simulate blurring, I_p is convolved with a gaussian. The overall process is shown graphically in Figure 3.1, where the maximum of the derivative of the blurred profile appears to "shrink" the observed silhouette from its predicted (non-blurred) location. Figure 3.2 shows the results of computing the edge location error for multiple disk radii and gaussian blurs.
Figure 3.1. X-ray intensity profiles and corresponding derivatives with (dashed) and without (solid) gaussian blurring.
Figure 3.2. Shift in the magnitude of the image gradient at different bone radii and
gaussian blurs
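The disk experiment behind Equation 3.2 can be reproduced in a few lines. The Python sketch below is an illustration only: it assumes a disk thickness of t(x) = 2 sqrt(r^2 - x^2), works in pixel units with a sub-pixel sampling step, clamps the signal at its borders during blurring, and uses hypothetical function names. The radius, attenuation coefficient and blur width passed to `edge_shift` are free parameters rather than values from this work.

```python
import math


def disk_profile(radius, mu, i_o, xs):
    """Equation 3.2 for a solid disk; the chord length is the thickness t(x)."""
    profile = []
    for x in xs:
        thickness = 2.0 * math.sqrt(max(radius * radius - x * x, 0.0))
        profile.append(i_o - i_o * math.exp(-mu * thickness))
    return profile


def gaussian_blur(signal, sigma_samples):
    """Convolve with a normalized gaussian, clamping indices at the borders."""
    half = int(math.ceil(3 * sigma_samples))
    kernel = [math.exp(-(k * k) / (2.0 * sigma_samples ** 2))
              for k in range(-half, half + 1)]
    total = sum(kernel)
    last = len(signal) - 1
    return [
        sum(w * signal[min(max(i + k - half, 0), last)]
            for k, w in enumerate(kernel)) / total
        for i in range(len(signal))
    ]


def rising_edge_location(profile, xs):
    """x of the largest central-difference gradient on the left half (x < 0)."""
    candidates = [i for i in range(1, len(xs) - 1) if xs[i] < 0]
    best = max(candidates, key=lambda i: profile[i + 1] - profile[i - 1])
    return xs[best]


def edge_shift(radius, mu, sigma, step=0.1):
    """Inward shift (in pixels) of the gradient maximum caused by blurring."""
    margin = radius + 4 * sigma
    count = int(round(2 * margin / step)) + 1
    xs = [-margin + i * step for i in range(count)]
    sharp = disk_profile(radius, mu, 1.0, xs)
    blurred = gaussian_blur(sharp, sigma / step)
    return rising_edge_location(blurred, xs) - rising_edge_location(sharp, xs)
```

A positive return value means the blurred gradient maximum lies inside the true silhouette, which is the shrinking effect described above.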
In general an X-ray image, whether actual or synthetic, will have some degree of blurring, which will shift the maximum of its gradient. The predicted image has little to no blurring and therefore the location of the maximum of its gradient is assumed to be coincident with the true edge of the projected 3D model. This difference in blurring results in a misalignment of the maximum of the gradients in each image. The edge shifts shown in Figure 3.2 are approximately constant beyond a certain bone thickness; therefore the shift itself is assumed to be a constant N, depending only on the degree of blurring present in the image. In order to correctly align the predicted edges with the observed edges, the silhouette of the predicted bone is shrunk by N = 2 pixels prior to edge detection.
Morphological erosion is used to perform the shrinking of the silhouette because it is a simple and relatively fast method when used on binary images, such as the predicted images. Erosion can be defined as the logical operation
\[ \{ z \mid (B)_z \subseteq A \} \tag{3.3} \]
where the erosion of set A by set B is the set of all points z such that B, translated by z, is contained within A [9]. Figure 3.3 below shows example sets A and B and the resulting erosion of A by B. To shift the edges inward by the two pixels desired, the erosion operation is carried out twice.
Figure 3.3. Example of morphological erosion
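Equation 3.3 translates almost directly into code. The Python sketch below is a minimal illustration, not the Matlab routine used in this work. It assumes a 3 by 3 square structuring element (the element is not specified here), treats pixels outside the image as background, and uses hypothetical names.

```python
# Offsets of a 3 by 3 square structuring element B (an assumption).
SQUARE_3X3 = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)]


def erode(image, element=SQUARE_3X3):
    """Binary erosion: keep a pixel only if B translated to it fits inside A."""
    rows, cols = len(image), len(image[0])
    eroded = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            eroded[r][c] = int(all(
                0 <= r + dr < rows and 0 <= c + dc < cols
                and image[r + dr][c + dc]
                for dr, dc in element
            ))
    return eroded


def shrink_silhouette(image, pixels=2):
    """Shift the silhouette boundary inward by applying erosion repeatedly."""
    for _ in range(pixels):
        image = erode(image)
    return image
```

With this element each pass removes a one pixel wide border, so two passes move the edge inward by the two pixels required.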
3.2
Edge Segmentation
In X-rays of the femur the boundaries of the bone are fairly uniform and of a much higher intensity than the surrounding soft tissue. The same is not true for X-rays of the spine due to the irregular shape of the vertebrae. As can be seen in Figure 3.4(a), the X-ray of the L5 lumbar vertebra varies greatly in intensity and portions of the bone cannot be easily distinguished from the surrounding soft tissue, which affects the intensities of the image gradient (Figure 3.4(b)) in the same way.
Figure 3.4. The (a) lumbar DRR and (b) corresponding gradient image.
This non-uniformity in the gradient can be problematic when using the Mahfouz edge similarity measure (Equation 2.2) because strong edges will greatly dominate the score. To eliminate this problem the dim and strong edges need to contribute equally to the score. This can be accomplished by segmenting the gradient and giving it a constant value. A Canny edge detector [10] is used to produce the segmented edge image (Figure 3.5). The Canny detector was selected because it attempts to detect weak gradients by linking them to strong gradients, thus allowing weak contours to be segmented.
Figure 3.5. Canny edge detector applied to a lumbar DRR.
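The linking of weak gradients to strong gradients can be illustrated with the hysteresis thresholding stage of a Canny-style detector. The Python sketch below shows only that stage: a full Canny detector also smooths the image and applies non-maximum suppression, which are omitted. The function name, the 8-connected neighbourhood and the threshold arguments are assumptions for illustration.

```python
from collections import deque


def hysteresis_edges(gradient, low, high):
    """Return a binary edge image with a constant value of 1 on kept edges.

    Pixels at or above `high` are strong edges. Pixels at or above `low` are
    kept only if they are connected to a strong edge through other kept pixels.
    """
    rows, cols = len(gradient), len(gradient[0])
    edges = [[0] * cols for _ in range(rows)]
    queue = deque()
    for r in range(rows):
        for c in range(cols):
            if gradient[r][c] >= high:
                edges[r][c] = 1
                queue.append((r, c))
    while queue:
        r, c = queue.popleft()
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                rr, cc = r + dr, c + dc
                if (0 <= rr < rows and 0 <= cc < cols
                        and not edges[rr][cc] and gradient[rr][cc] >= low):
                    edges[rr][cc] = 1
                    queue.append((rr, cc))
    return edges
```

Because the output is binary, a dim contour that is linked to a strong one contributes to the edge score exactly as much as the strong contour does.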
3.3
Single Point Contribution
The original edge similarity measure spreads the predicted edge image before comparing it with the input X-ray edge image. By spreading the predicted edges there will be multiple points per original predicted edge pixel that can contribute to the overall score. Consider the case of an edge contour near erroneous edge data points shown in Figure 3.6, where the solid vertical line is the correct edge. If the predicted vertical spread edge is evaluated at every possible position, the maximum of the score will likely occur between the correct edge and the erroneous data points if they are close enough together.
Figure 3.6. The true edge near erroneous data points where the predicted edge is spread.
Instead of allowing multiple edge points from the input image to match a single predicted edge point, the matching is limited to at most one input edge point for each predicted edge point. By creating a distance map from the input edge image it is possible to compute the distance from each predicted edge point to the nearest input edge point (Figure 3.7). If a predicted edge point is within a certain maximum distance (7 pixels in this work) of an input edge point, then that distance is used when computing the contribution. If beyond that distance, no contribution is made.
Figure 3.7. The true edge near erroneous data points where the X-ray edge is spread.
Once the distance map, D(u, v), is created from the input edge image it is weighted using a Gaussian. The weighted distance map is defined as \( D_w(u, v) = \exp(-D(u, v)^2 / 2\sigma^2) \), where σ is 1.7.
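The single point contribution can be sketched as a lookup into the weighted distance map. The Python below is an illustration under stated assumptions: the distance map is built by brute force rather than with a fast distance transform, the score is averaged over the predicted edge points (the normalization is not specified here), and the function names are hypothetical. The 7 pixel cutoff and σ = 1.7 are the values given above.

```python
import math

MAX_DISTANCE = 7.0  # pixels, cutoff beyond which no contribution is made
SIGMA = 1.7         # width of the Gaussian weighting


def distance_map(edge_image):
    """Distance from every pixel to the nearest input edge pixel (brute force)."""
    edge_points = [(r, c) for r, row in enumerate(edge_image)
                   for c, value in enumerate(row) if value]
    return [
        [min(math.hypot(r - er, c - ec) for er, ec in edge_points)
         for c in range(len(row))]
        for r, row in enumerate(edge_image)
    ]


def weighted_distance_map(distances):
    """D_w = exp(-D^2 / 2 sigma^2), set to zero beyond the maximum distance."""
    return [
        [math.exp(-d * d / (2.0 * SIGMA ** 2)) if d <= MAX_DISTANCE else 0.0
         for d in row]
        for row in distances
    ]


def single_point_edge_score(predicted_points, weights):
    """Each predicted edge point contributes once, through its nearest edge."""
    total = sum(weights[r][c] for r, c in predicted_points)
    return total / len(predicted_points)
```

Since each predicted point reads a single value from the map, an erroneous cluster of input edge points near the true edge cannot add extra contributions for the same predicted point.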
Chapter 4
IMPROVEMENTS TO SEARCH METHOD
4.1
Cross-Correlation
Consider the case of optimizing translation only, assuming all other parameters are known. The Mahfouz method uses a semi-random search over XYZ and calculates the evaluation function (a product and sum) at each candidate solution. Repeating this process for many candidates is very similar to performing the cross-correlation (XC) between two images, which can be computed quickly if implemented in the frequency domain [11].
In image processing XC is used for template matching, where the objective is to find the u, v image location of a template within a larger image. The equation for XC is defined below:
\[ XC_j(u, v) = \sum_m \sum_n I_j(u + m, v + n)\, T_j(m, n) \tag{4.1} \]
where I_j(u, v) is the larger image, T_j(m, n) is the template and XC_j(u, v) is the resulting matrix of correlation coefficients.
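Equation 4.1 can be written out directly as a double sum. The Python sketch below evaluates it in the spatial domain for clarity, whereas a frequency domain implementation would be used in practice for speed. It assumes that the first index addresses rows, only evaluates positions where the template lies fully inside the image, and uses hypothetical function names.

```python
def cross_correlate(image, template):
    """Direct evaluation of Equation 4.1 at every fully overlapping position."""
    t_rows, t_cols = len(template), len(template[0])
    rows = len(image) - t_rows + 1
    cols = len(image[0]) - t_cols + 1
    return [
        [
            sum(image[u + m][v + n] * template[m][n]
                for m in range(t_rows) for n in range(t_cols))
            for v in range(cols)
        ]
        for u in range(rows)
    ]


def peak_location(correlation):
    """Index (u, v) of the maximum of the correlation image."""
    positions = ((u, v) for u in range(len(correlation))
                 for v in range(len(correlation[0])))
    return max(positions, key=lambda uv: correlation[uv[0]][uv[1]])
```

Each entry of the result is the same product and sum that the original evaluation function computes for one planar translation, which is why a single correlation replaces many individual guesses.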
Incorporating XC into the evaluation function allows it to search over the u and v planar translations relative to each view, making it less dependent on prior knowledge of translation. However, there will always be some dependence on translation because the size of the templates is directly related to the relative depth of the model in each view. The modified evaluation function now only needs a hypothesis of rotation and a rough estimate of the translation, whereas the original function needed an exact hypothesis of rotation and translation.
The index of the maximum of the correlation image is the u, v location on the image plane where the best match occurred. From this location, the intrinsic camera parameters, and the transform from camera to world coordinates, a direction can be calculated in the world reference frame. Using the origin of the camera as the starting point and the calculated direction, a ray can be cast that passes through the maximum location on the image plane (see Figure 4.1). With the rays from both views the translation of the bone can be estimated by calculating the mid-point of the shortest line segment between the rays. This is the primary benefit and novelty of using XC, because translation can now be calculated, which reduces the number of DOF required in the optimization of the evaluation function by three.
Figure 4.1. Rays passing through the image planes of both camera views.
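The mid-point of the shortest segment between two rays has a closed form solution. The Python sketch below uses the standard closest point formula for two lines. It treats each ray as an infinite line, assumes that the origins and directions are already expressed in world coordinates, and uses hypothetical function names.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def ray_midpoint(origin1, direction1, origin2, direction2):
    """Mid-point of the shortest segment between two non-parallel lines."""
    w = [p - q for p, q in zip(origin1, origin2)]
    a = dot(direction1, direction1)
    b = dot(direction1, direction2)
    c = dot(direction2, direction2)
    d = dot(direction1, w)
    e = dot(direction2, w)
    denominator = a * c - b * b
    if abs(denominator) < 1e-12:
        raise ValueError("rays are parallel; the mid-point is not unique")
    s = (b * e - c * d) / denominator  # parameter along the first ray
    t = (a * e - b * d) / denominator  # parameter along the second ray
    closest1 = [p + s * v for p, v in zip(origin1, direction1)]
    closest2 = [q + t * v for q, v in zip(origin2, direction2)]
    return tuple((x + y) / 2.0 for x, y in zip(closest1, closest2))
```

For two approximately orthogonal views the denominator is far from zero, so the estimate is well conditioned. The length of the segment between the two closest points could also serve as a consistency check on the two correlation peaks.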
The final evaluation function only uses the XC modified edge score. Although XC can be applied to the region score as well, the region score has been dropped to reduce the overall computation time of the evaluation function. Incidentally, this also classifies this method as a purely feature based method.
4.2
Two Stage Optimization
The optimization strategy used can be broken into two stages (Figure 4.2). The first stage optimizes the evaluation function for the rotation angles and the first principal component of the atlas model using the genetic algorithm. Since the base model may be significantly different from the actual bone, the first principal component is used to find a rough estimate of the shape and scale of the bone.
The second stage optimizes the rotation angles and the N principal components that represent 99% of the atlas model variation. The second stage also uses the genetic algorithm, with reduced bounds on the parameters optimized in the first stage. An intermediate step occurs at the end of each stage, where the translation is calculated from the maximum correlation peak locations, using the optimized solution of that stage in the XC function call.
Figure 4.2. Flow chart of the two stage optimization strategy.
Chapter 5
IMPLEMENTATION DETAILS
All the code and programs used for this research were implemented in Matlab.
The deformable atlas models, all code pertaining to the digitally reconstructed radiographs and the CT data sets were provided by Dr. Mohamed Mahfouz, Brandon
Merkl and Mike Kuhn of the University of Tennessee in Knoxville.
5.1
Genetic Algorithm
A genetic algorithm (GA) is a semi-random search method that simulates evolution. Given a population of individuals and a fitness function to score each individual, the GA uses a selection function where the highest scoring individuals are more likely to be selected to reproduce or mutate (evolve) into the next generation.
In terms of optimization an individual is a guess to the solution of the evaluation
function being optimized. Initially a population of individuals is randomly generated
and doped with a reasonable guess to the solution to the evaluation function. At
each generation new guesses are generated using crossover and mutation functions.
The new guesses are evaluated and then replace the lowest scoring individuals in the
population. An individual in this case is the vector comprised of the translation and
rotation values, and the coefficients of the first N principal components. This process
is then repeated until a termination condition is met or until a maximum number of
generations is reached.
The GA used is a Matlab implementation that I wrote, based on the GAOT toolbox written by Houck et al. [12]. Blend, arithmetic and uniform crossover as well as single/multi-gene gaussian mutation operators were used to create the new guesses for each generation. The individuals used in the operators were selected using a roulette selection function. The elitist model, where the best individual of the current population always survives to the next generation, was also used.
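The main ingredients of such a GA can be sketched compactly. The Python below is a simplified illustration and is neither the Matlab implementation used in this work nor the GAOT code. It includes only roulette selection, arithmetic crossover and single-gene gaussian mutation, replaces the two lowest scoring individuals each generation, and uses hypothetical names and parameter values.

```python
import random


def roulette_select(population, scores, rng):
    """Pick an individual with probability proportional to its shifted score."""
    lowest = min(scores)
    weights = [score - lowest + 1e-9 for score in scores]
    return rng.choices(population, weights=weights, k=1)[0]


def arithmetic_crossover(parent1, parent2, rng):
    """Children are complementary weighted averages of the two parents."""
    mix = rng.random()
    child1 = [mix * x + (1.0 - mix) * y for x, y in zip(parent1, parent2)]
    child2 = [(1.0 - mix) * x + mix * y for x, y in zip(parent1, parent2)]
    return child1, child2


def gaussian_mutation(individual, sigma, rng):
    """Perturb one randomly chosen gene with gaussian noise."""
    mutant = list(individual)
    gene = rng.randrange(len(mutant))
    mutant[gene] += rng.gauss(0.0, sigma)
    return mutant


def evolve(fitness, population, generations, rng, sigma=0.1):
    """Maximize `fitness`; the population is updated in place."""
    best_history = []
    for _ in range(generations):
        scores = [fitness(individual) for individual in population]
        best_history.append(max(scores))
        parent1 = roulette_select(population, scores, rng)
        parent2 = roulette_select(population, scores, rng)
        child1, child2 = arithmetic_crossover(parent1, parent2, rng)
        newcomers = [gaussian_mutation(child1, sigma, rng), child2]
        # Replace the lowest scoring individuals. With more than two
        # individuals the best one is never replaced (elitist model).
        worst_first = sorted(range(len(population)), key=lambda i: scores[i])
        for slot, newcomer in zip(worst_first, newcomers):
            population[slot] = newcomer
    return max(population, key=fitness), best_history
```

Because the best individual always survives, the best score recorded in `best_history` can never decrease from one generation to the next.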
5.2
Digitally Reconstructed Radiographs
Digitally reconstructed radiographs (DRRs) are synthetic X-ray images. They were used in this work instead of real X-rays because the ground truth, if not explicitly known, can be found fairly easily. The DRRs are created by casting rays through image
slices of a CT scan (Figure 5.1). The CT scan data is stored in a series of DICOM
(digital imaging and communications in medicine) files which contain information
such as: the slice image, slice thickness, slice location and the conversion of pixel
dimensions to SI units. Using the above information from the DICOM images a
continuous volume can be approximated using 3D interpolation.
Figure 5.1. Casting rays through a CT volume
Casting rays through the CT volume requires defining a virtual camera from which the rays will be cast. The virtual camera itself is based on the pinhole camera model with a focal length of 1199 mm, field of view of 0.1467 radians (8.4 degrees) and image plane dimensions of 640 by 480 pixels.¹ The directions of the rays that pass through each pixel in the image plane can be calculated with respect to the camera origin. These directions are then mapped using a transformation matrix from the default camera view to the desired camera view. The rays themselves are subdivided into a series of finite measurement locations along each ray. Each measurement location has XYZ coordinates that are used in the 3D interpolation with the CT volume to determine the CT number (intensity) at that location.
The CT number is related to the X-ray linear attenuation coefficient and is used in Equation 5.1 to simulate the X-ray image at each u, v pixel location:
\[ DRR(u, v) = I_o - I_o \exp\left(-\sum_j \mu_j(u, v)\, t\right) \tag{5.1} \]
where I_o is the maximum image intensity, µ_j(u, v) is the matrix of estimated linear attenuation coefficients at each jth measurement and t is the measurement length.
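Equation 5.1 can be evaluated for a single pixel as follows. The Python sketch below is an illustration, not the DRR code provided for this work. It assumes the volume already holds linear attenuation coefficients indexed as volume[z][y][x] in voxel units, samples with nearest neighbour lookup instead of 3D interpolation, takes each sample at the middle of its measurement interval, and uses hypothetical function names.

```python
import math


def sample_ray(volume, origin, direction, step, count):
    """Attenuation samples at `count` measurement locations along one ray."""
    samples = []
    for k in range(count):
        x, y, z = (origin[i] + (k + 0.5) * step * direction[i] for i in range(3))
        ix, iy, iz = round(x), round(y), round(z)
        inside = (0 <= iz < len(volume)
                  and 0 <= iy < len(volume[0])
                  and 0 <= ix < len(volume[0][0]))
        samples.append(volume[iz][iy][ix] if inside else 0.0)
    return samples


def drr_pixel(mu_samples, step, i_o=1.0):
    """Equation 5.1: I_o - I_o * exp(-sum_j mu_j * t) for one pixel."""
    return i_o - i_o * math.exp(-sum(mu_samples) * step)
```

Repeating this for the ray through every u, v pixel of the virtual camera yields the full synthetic image.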
5.3
Virtual Reality Toolbox and Camera Calibration
The virtual reality toolbox (VRT) is an add-on package for Matlab that interacts with virtual environments written in virtual reality modeling language (VRML). It was used to render the predicted images of the 3D bone models, which are needed for the evaluation function. To render the images at the specific scale and camera views used in the generation of the DRRs, a viewing window is defined which can be oriented to each camera position. However, the VRT does not allow the user to explicitly define the field of view or focal length. The toolbox does have a parameter called the "Zoomfactor" that effectively adjusts the field of view, but how it is adjusted or what is actually being adjusted is not documented in Matlab. Therefore the viewing window needs to be calibrated using the "Zoomfactor" so that the relative scale of the predicted and DRR images is the same.
To calibrate the VRT viewing window to the DRR camera, a similar object needs to be rendered in both systems. The object chosen was a 3D bar, which was first generated by emulating CT DICOM files containing the bar slices. Using the pixel spacing and the distance between slices, an equivalent VRML model was then produced for the VRT. Once both systems were able to render the small known object, the "Zoomfactor" was iteratively adjusted until the images produced from both systems were approximately the same. The value of 2.7879 was used for the "Zoomfactor".²
¹ The values used for the virtual camera match those used to model the fluoroscope in the original work.
² This is a unique value to Matlab R2006b and may not be the same in different versions.
Chapter 6
EVALUATION AND TESTING
The evaluation and testing of my improvements to the Mahfouz method was
performed in three parts. First the improvements to the evaluation function were
tested to verify whether or not there was any improvement over the original method.
Second the improvements to the search method were compared to the modified Mahfouz method for the rigid optimization using the known bone model. The third and
final part evaluates the full algorithm as defined in Chapter 4.2 for both the right
femur and L5 lumbar.
6.1
Improvements to Evaluation Function
To evaluate the changes made to the evaluation function a search was performed
with the GA over the rigid parameters using the known shape with approximately
12500 function evaluations. This was repeated 10 times for two different data sets for
a total of 20 trials. Each trial used a random starting pose each with a magnitude
error of 1.5 cm in translation and 8 degrees in rotation.
The 20-trial search was applied to four different cases for the purpose of comparison. Case 1 is the original Mahfouz method, case 2 applies the correction for
blurring, case 3 uses the blurring correction and edge segmentation, and case 4 uses
all three improvements, namely the correction for blurring, edge segmentation, and
the spreading of the X-ray edge instead of the predicted edge. Once all the trials
were complete, the average magnitude errors for each case were computed, as well as
the corresponding 90% confidence intervals assuming a t-distribution.
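The confidence interval calculation can be sketched as follows. This is a minimal illustration, not the code used to produce the tables: the critical values are the standard two-sided 90% Student's t values for 9 and 19 degrees of freedom, and the sample data are made up.

```python
import math
import statistics

# Two-sided 90% critical values of Student's t (0.95 quantile), keyed by
# degrees of freedom. Only the sample sizes used here are tabulated.
T_CRIT_90 = {9: 1.833, 19: 1.729}


def mean_with_ci90(errors):
    """Return (mean, half_width) of the 90% confidence interval of the mean."""
    n = len(errors)
    dof = n - 1
    if dof not in T_CRIT_90:
        raise ValueError(f"no tabulated t value for {dof} degrees of freedom")
    mean = statistics.mean(errors)
    std_err = statistics.stdev(errors) / math.sqrt(n)
    return mean, T_CRIT_90[dof] * std_err


# Made-up errors for 20 trials, in mm; not the thesis data.
trial_errors_mm = [1.0, 2.0] * 10
mean_mm, half_width_mm = mean_with_ci90(trial_errors_mm)
```

The result is reported as mean ± half-width, which is the form used for the entries in the tables of this section.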
The results for all four cases for the femur are shown graphically in Figure 6.1,
and the values are listed in Table 6.1. The errors
of all four cases overlap when their confidence intervals are considered, and therefore
they do not adequately show whether or not the modifications improved the evaluation
function.
Figure 6.1. The plot of mean errors and the corresponding 90% confidence intervals
(error bars) for the four test cases used for the femur rigid optimization.
            Case 1          Case 2          Case 3          Case 4
|T| (mm)    0.865 ± 0.257   0.763 ± 0.227   0.866 ± 0.257   0.782 ± 0.232
|R| (deg)   0.697 ± 0.207   0.595 ± 0.177   0.979 ± 0.256   0.614 ± 0.182
Table 6.1. Mean errors and 90% confidence intervals for the four test cases used for
the femur rigid optimization.
Figure 6.2. The plot of mean errors and the corresponding 90% confidence intervals
(error bars) for the four test cases used for the lumbar rigid optimization.
The results for all four cases for the lumbar are shown graphically in Figure 6.2,
and the values are listed in Table 6.2. Although the modifications, when applied
to the lumbar, give clearly lower errors than the original method (case 1), the modified
cases are not very different from each other. Comparing the results of the femur and lumbar together,
the lowest errors in both translation and rotation occurred in case 2, where only the
correction for blurring was applied to the evaluation function.
            Case 1          Case 2          Case 3          Case 4
|T| (mm)    5.627 ± 1.671   0.294 ± 0.087   0.649 ± 0.193   0.552 ± 0.164
|R| (deg)   3.522 ± 1.046   0.800 ± 0.238   0.831 ± 0.247   0.973 ± 0.289
Table 6.2. Mean errors and 90% confidence intervals for the four test cases used for
the L5 lumbar rigid optimization.
6.2 Improvements to Search Method
To test the improvements to the search method, two cases were considered in which
the rigid parameters were optimized using the GA for a known shape. The first
case uses the Mahfouz method with the improvements from the previous section
implemented. The second case is similar but uses cross-correlation (XC) in the evaluation function and
calculates translation using the ray calculation every ten generations.
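For readers unfamiliar with the XC term, the sketch below shows a generic zero-mean normalized cross-correlation score between two images of the same size. It is only an illustration of the measure itself; how XC is combined with the rest of the evaluation function is defined elsewhere in this work and is not reproduced here, and the tiny test images are made up.

```python
import math


def normalized_cross_correlation(a, b):
    """Zero-mean normalized cross-correlation of two equal-size 2D images.

    Images are lists of rows of numbers. The result lies in [-1, 1];
    1 means the images match up to brightness gain and offset.
    """
    flat_a = [p for row in a for p in row]
    flat_b = [p for row in b for p in row]
    if len(flat_a) != len(flat_b):
        raise ValueError("images must have the same number of pixels")
    mean_a = sum(flat_a) / len(flat_a)
    mean_b = sum(flat_b) / len(flat_b)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(flat_a, flat_b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in flat_a)
                    * sum((y - mean_b) ** 2 for y in flat_b))
    return num / den if den > 0 else 0.0


edge = [[0, 0, 0], [1, 1, 1], [0, 0, 0]]
brighter = [[10, 10, 10], [30, 30, 30], [10, 10, 10]]  # same pattern, rescaled
shifted = [[1, 1, 1], [0, 0, 0], [0, 0, 0]]            # edge moved up one row

score_same = normalized_cross_correlation(edge, brighter)
score_shifted = normalized_cross_correlation(edge, shifted)
```

The score is insensitive to overall brightness and contrast but drops as soon as the edges are misaligned, which is the property that makes it useful as a registration measure.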
For each case, 10 trials were run, each from a random starting pose with magnitude
errors of 1.5 cm and 8 degrees in translation and rotation, respectively. Each trial was
run for 150 generations with approximately 35 function evaluations per generation.
During the trials, the magnitude errors in translation and rotation were computed
at each generation of the search. These errors were then averaged across the trials at
each corresponding generation and are displayed graphically in Figures 6.3 and 6.4.
Figure 6.3. Comparison of the average magnitude errors of the femur with the standard search method in blue and the cross-correlation search method in green.
Figure 6.4. Comparison of the average magnitude errors of the lumbar with the
standard search method in blue and the cross-correlation search method in green.
For both the femur and lumbar, the magnitude errors achieved by the XC search
method in approximately 10 generations are smaller than the errors achieved by the
original search method at generation 150. In terms of time, the XC search method
also arrived at a better answer faster than the original search method.
For the femur, the error plots for the original method still have a negative slope, so
it is possible that, given enough generations, it would have achieved the same errors
as seen in the XC method. For the lumbar, the original method appears to have
slopes close to zero, and it is likely that it is consistently getting stuck in a solution
regardless of its starting position. Also, the disparity in the errors is
much larger for the lumbar than for the femur.
6.3 Experimental Results of the Full Algorithm
The full algorithm, with the improvements to the evaluation function and the
two-stage search method, was evaluated on DRR images of the femur and
lumbar vertebra using the deformable atlas models. The magnitude errors of the
starting translation and rotation were 1.5 cm and 8 degrees, respectively. The search space
was bounded to ±8.8 degrees for each rotation angle and to weights of ±0.75 on the
principal components.
The two atlas models used consisted of 188 femurs and 14 L5 lumbar vertebrae,
and the data sets evaluated were not included in the construction of the atlases.
Since the ground truth can only be estimated for rotation and translation when using
the atlas models, the results were evaluated by computing the point to surface errors
from the optimized atlas models to the known models. This was done by performing
iterative closest point (ICP) [13] between the optimized atlas models and the known models.
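Once the two models are aligned, the maximum and RMS errors can be computed along the lines of the sketch below. It makes a simplifying assumption that should be stated plainly: the point to surface distance is approximated by the distance to the nearest reference point, using a brute-force search, whereas a true point to surface distance would measure to the nearest triangle. The data are toy values.

```python
import math


def nearest_distance(point, reference_points):
    """Distance from one point to the closest point of the reference set."""
    return min(math.dist(point, q) for q in reference_points)


def max_and_rms_error(model_vertices, reference_points):
    """Return (max, RMS) of nearest-point distances, in the input units."""
    dists = [nearest_distance(p, reference_points) for p in model_vertices]
    rms = math.sqrt(sum(d * d for d in dists) / len(dists))
    return max(dists), rms


# Toy data in cm: three model vertices against three reference points.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
model = [(0.0, 0.3, 0.0), (1.0, 0.0, 0.4), (2.0, 0.0, 0.0)]
max_err, rms_err = max_and_rms_error(model, reference)
```

Reporting both values is useful because the RMS error describes the typical fit while the maximum error exposes isolated outliers.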
6.3.1 Femur Results
For the femur, the pose and the first 9 principal components of the atlas were
estimated, as described in Chapter 4.2, where the GA is used to optimize rotation and
shape, and the translation is calculated. Table 6.3 shows the point to surface errors
of the optimized atlas model for the 27 data sets evaluated, and Table 6.4 shows the
analysis of those errors. Figures 6.5 and 6.6 show typical starting and ending poses
with the predicted silhouettes overlaid on the DRRs.
Although the mean RMS error is 0.387 cm, the mean of the maximum error is
1.500 cm. This suggests that most of the vertices in the optimized atlas model were
fairly close to the known model with the exception of some outliers, which typically
show up at the extremities of the bone as can be seen in Figure 6.6.
Data set   Max Error (cm)   RMS Error (cm)     Data set   Max Error (cm)   RMS Error (cm)
    1          1.198            0.279             15          1.295            0.393
    2          2.106            0.525             16          1.202            0.279
    3          1.666            0.426             17          0.516            0.202
    4          1.358            0.361             18          0.544            0.233
    5          2.062            0.460             19          0.994            0.294
    6          1.201            0.322             20          2.512            0.542
    7          0.911            0.318             21          2.535            0.521
    8          2.586            0.561             22          1.290            0.324
    9          1.459            0.442             23          2.081            0.494
   10          2.170            0.526             24          1.030            0.384
   11          2.227            0.602             25          0.739            0.192
   12          1.718            0.457             26          2.181            0.495
   13          0.848            0.287             27          1.340            0.313
   14          0.683            0.206
Table 6.3. Optimized femur shape point to surface errors.
                  Min      Max      Mean     Std
Max Error (cm)    0.516    2.586    1.500    0.641
RMS Error (cm)    0.192    0.602    0.387    0.122
Table 6.4. Analysis of optimized femur shape point to surface errors.
Figure 6.5. Typical starting pose for the right femur.
Figure 6.6. Typical optimized shape for the right femur at the ICP registered rigid
pose.
6.3.2 L5 Lumbar Vertebra Results
For the lumbar vertebra, 13 principal components and the pose were estimated,
as described in Chapter 4.2, where the GA is used to optimize rotation and shape, and
the translation is calculated. Table 6.5 shows the point to surface errors of the
optimized atlas model for the 2 data sets evaluated. Figures 6.7 and 6.8 show typical
starting and ending poses with the predicted silhouettes overlaid on the DRRs.
Data set   Max Error (cm)   RMS Error (cm)
    1          0.544            0.160
    2          0.482            0.157
  Mean         0.513            0.158
Table 6.5. Optimized lumbar shape point to surface errors.
The mean RMS error is 0.158 cm and the mean maximum error is 0.513 cm. As
with the femur, this suggests that most of the vertices in the optimized atlas model
were fairly close to the known model with the exception of some outliers (see Figure
6.6). The ratio of the mean maximum error to the mean RMS error for the lumbar
is 3.269 to 1, which is better than the femur at 3.876 to 1. This difference may be due to
the fact that the lumbar DRRs are rendered at a third of the distance from the virtual
camera compared with the femur DRRs, so the effective pixel resolution is three times
higher.
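The resolution argument follows from a simple pinhole camera model, in which the object-space size covered by one pixel scales linearly with the distance to the camera. The sketch below illustrates this; the focal length, pixel pitch and distances are made-up values, since the actual camera parameters are not listed in this section.

```python
def pixel_footprint(distance, focal_length, pixel_pitch):
    """Object-space size covered by one pixel under a pinhole camera model.

    All three arguments share one length unit; so does the result.
    """
    return pixel_pitch * distance / focal_length


# Illustrative numbers only; these are not the camera values used in this work.
focal_length_cm = 100.0
pixel_pitch_cm = 0.05
femur_distance_cm = 90.0
lumbar_distance_cm = femur_distance_cm / 3.0

femur_px_cm = pixel_footprint(femur_distance_cm, focal_length_cm, pixel_pitch_cm)
lumbar_px_cm = pixel_footprint(lumbar_distance_cm, focal_length_cm, pixel_pitch_cm)
ratio = femur_px_cm / lumbar_px_cm
```

Whatever values are chosen for the focal length and pixel pitch, they cancel in the ratio, so an object at one third of the distance is sampled three times more finely.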
Figure 6.7. Typical starting pose for the L5 lumbar vertebra.
Figure 6.8. Typical optimized shape for the L5 lumbar vertebra at the ICP registered
rigid pose.
Chapter 7
SUMMARY
7.1 Major Results and Conclusions
The evaluation function has been improved by correcting for the misalignment of
edges due to blurring, segmenting the edges, and spreading the input X-ray segmented
edge images instead of the predicted edge images. The bi-planar search method was
also improved by incorporating cross-correlation into the evaluation function and
calculating translation instead of optimizing it.
The most significant improvement to the original evaluation function is the correction of edge misalignments due to blurring. Although the simple cylinder approximation used to calculate the edge shift was reasonable for the femur, it was not for
the lumbar vertebra due to its irregular shape. In spite of this, the
cylinder approximation did improve the results for the lumbar vertebra as well. The
XC search method is another significant improvement over the original search method
and was shown for the rigid case to outperform the original search method.
As for the evaluation of the full algorithm, the modeling errors are much too large
for the optimized models to be realistically used in computer aided surgery, where the
ideal error in terms of orthopedic surgery is less than 2 mm.
This work was limited to DRRs, where near perfect camera calibration is possible
and realistic effects such as varying source intensity and X-ray hardening do not occur.
To be applied to real X-ray images, a robust calibration process would need to be used
to account for these effects. Another limitation is the overall optimization speed, or the
time required to optimize the atlas models, which is not currently fast enough for
intraoperative procedures but is fast enough to be used preoperatively to create
patient specific bone models.
Another limitation is that the atlas model used for the lumbar consisted of a
small number of models, and only 2 data sets were available to test against. It is likely that with
a larger atlas and more data sets the results would be much different. Also,
the variations of the atlas are ultimately defined by the statistics of the contained
models, so an unusual bone, possibly deformed or containing un-modeled statistics, would
not likely have a very accurate atlas model description. To minimize this possibility,
many more models would need to be incorporated into the atlas.
The disadvantage of using a great number of models in the atlas is that the total
number of available PCs is directly proportional to the number of models contained in
the atlas. This is a problem because the first N PCs needed to represent 99% of the
model variation would also scale with the total number of available PCs; therefore
using too many models in the atlas could result in an N-dimensional space too large to
optimize. To maximize the statistical variation of the atlas and minimize the number
of PCs needed to represent the models, more than one atlas could be constructed for
the same bone. Each atlas could be based on a different patient demographic, such
as age, ethnicity or patient sex.
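The number N of PCs needed to reach a given fraction of the model variation can be read off the PCA eigenvalues, as in the sketch below. The eigenvalues shown are made up for illustration and are not taken from either atlas.

```python
def components_for_variance(eigenvalues, fraction=0.99):
    """Smallest N such that the N largest eigenvalues reach `fraction` of the total."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    running = 0.0
    for n, value in enumerate(ordered, start=1):
        running += value
        if running >= fraction * total:
            return n
    return len(ordered)


# Made-up PCA eigenvalues (variances), not taken from either atlas.
eigenvalues = [50.0, 30.0, 15.0, 4.5, 0.5]
n_needed = components_for_variance(eigenvalues)
```

For an atlas split by demographic, the same calculation would be run on each sub-atlas separately, with the expectation that each needs fewer PCs than a single combined atlas.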
7.2 Recommendations for Future Work
Using only two camera views doesn't allow for direct observation of Rz, the
common non-planar rotation angle between the two views. Using a third view that is
not approximately orthogonal to this non-planar rotation angle may further reduce
the errors in rotation. Using different views, such as 45 degrees from the knee cap or
spine, may also reduce the errors in shape by providing more edges that are not
observable in the orthogonal views.
If more information, such as the internal edges, is rendered in the predicted
images and considered by the evaluation function, the point to surface errors of the
optimized atlas models may be further reduced. As a crude approximation to an X-ray image, the predicted images could be generated by rendering the model partially
transparent (Figure 7.1), and the corresponding edge image would then have some
internal edges. In VRML, this can be done quickly by simply changing the surface
properties of the model prior to rendering. The reason this would be a crude approximation
is that the current model is represented only as a surface (no interior) and
is rendered using self illumination; therefore the internal edges revealed by rendering
the model partially transparent would not necessarily correspond to the internal edges
in the X-ray images.
Figure 7.1. Lumbar vertebra rendered partially transparent.
A better approximation to an X-ray image would be to render the surface model
as a DRR with a uniform density. In general, bones have a non-uniform density, so the
approximation could be further improved by developing a bone atlas that considered
not only the surface of the bone but also its internal density. Generating DRRs from this
volumetric bone atlas would then give the best overall approximation to the X-ray
images. Rendering the predicted images in this manner would also give the option to
compare them directly to the X-ray images.
The feasibility of using DRRs as predicted images does, however, depend on how
fast they can be rendered. In the existing Matlab code it takes about 3.5 minutes
to generate each 640 by 480 DRR with 256 measurements along each projected ray
on a computer using a Pentium 4 3.0 GHz dual-core processor. While optimizing
the existing code or using a different method altogether may speed up the process,
ultimately I think the ideal route would be to render the DRRs in hardware.
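A rough calculation shows why the current rendering speed is the limiting factor. The sketch below uses the DRR size and timing quoted above; the search budget is an assumption made for illustration, taken from the rigid tests of Chapter 6 (about 35 evaluations per generation for 150 generations) with two views rendered per evaluation.

```python
width, height = 640, 480          # DRR size in pixels
samples_per_ray = 256             # measurements along each projected ray
seconds_per_drr = 3.5 * 60        # reported render time, about 3.5 minutes

total_samples = width * height * samples_per_ray
samples_per_second = total_samples / seconds_per_drr

# Assumed budget for illustration: 35 evaluations per generation,
# 150 generations, and two views (bi-planar) per evaluation.
drrs_needed = 35 * 150 * 2
hours_needed = drrs_needed * seconds_per_drr / 3600.0
```

Each DRR requires about 78.6 million samples, and under the assumed budget a single search would take roughly 600 hours of rendering, so a speedup of several orders of magnitude would be needed.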
REFERENCES
[1] R. H. Taylor, S. Lavallee, G. C. Burdea, and R. Mosges, Eds., Computer-Integrated Surgery. Cambridge, Massachusetts and London, England: The MIT Press, 1996.
[2] M. Kuhn, M. Mahfouz, E. ElHak, and B. Merkl, “Reconstruction of 3d patient-specific bone models from biplanar x-ray images,” in 12th International Conference on Biomedical Engineering, Singapore, Dec. 2005.
[3] M. R. Mahfouz, W. A. Hoff, R. D. Komistek, and D. A. Dennis, “A robust method for registration of three-dimensional knee implant models to two-dimensional fluoroscopy images,” IEEE Transactions on Medical Imaging, vol. 22, no. 12, pp. 1561–1574, Dec. 2003.
[4] G. P. Penney, J. Weese, J. A. Little, P. Desmedt, D. L. G. Hill, and D. J. Hawkes, “A comparison of similarity measures for use in 2-d - 3-d medical image registration,” IEEE Transactions on Medical Imaging, vol. 17, no. 4, pp. 586–595, Aug. 1998.
[5] S. Benameur, M. Mignotte, S. Parent, H. Labelle, W. Skalli, and J. de Guise, “3d/2d registration and segmentation of scoliotic vertebrae using statistical models,” Computerized Medical Imaging and Graphics, vol. 27, pp. 321–337, 2003.
[6] Y. Zheng, M. S. Nixon, and R. Allen, “Automated segmentation of lumbar vertebrae in digital videofluoroscopic images,” IEEE Transactions on Medical Imaging, vol. 23, no. 1, pp. 45–52, Jan. 2004.
[7] B. Merkl and M. Mahfouz, “Unsupervised three-dimensional segmentation of medical images using an anatomical bone atlas,” in 12th International Conference on Biomedical Engineering, Singapore, Dec. 2005.
[8] I. T. Jolliffe, Principal Component Analysis. Berlin, Germany: Springer-Verlag, 1986.
[9] R. C. Gonzalez and R. E. Woods, Digital Image Processing, 2nd ed. Upper Saddle River, New Jersey: Prentice Hall, 2001.
[10] J. Canny, “A computational approach to edge detection,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 8, no. 6, pp. 679–698, Nov. 1986.
[11] J. P. Lewis. Fast normalized cross-correlation. Industrial Light & Magic. [Online]. Available: http://www.idiom.com/~zilla/Papers/nvisionInterface/nip.html
[12] C. Houck, J. Joines, and M. Kay, “A genetic algorithm for function optimization: A matlab implementation,” North Carolina State University, Raleigh, NC, Tech. Rep. NCSU-IE-TR-95-09, 1995. [Online]. Available: http://www.ise.ncsu.edu/mirage/GAToolBox/gaot/
[13] P. Besl and N. McKay, “A method for registration of 3-d shapes,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 14, no. 2, pp. 239–256, Feb. 1992.