
Final Project Report

Abstract
In this project, we designed a scoring algorithm to evaluate the quality of historical
images and the images restored from them, using both an existing image
restoration algorithm (Real-ESRGAN) and our self-implemented algorithm.
Our scoring algorithm assesses images based on four dimensions: sharpness, noise,
contrast, and color. Throughout the development of the algorithm, we fine-tuned
parameters for these dimensions, striving to achieve optimal and precise scoring
outcomes. Additionally, we incorporated subjective feedback from individuals to
adjust the algorithm, intending to assign higher scores to images perceived as
visually appealing.
The image restoration algorithms employed include Real-ESRGAN and a
self-implemented algorithm derived from course content. Since the Real-ESRGAN
algorithm has proved effective at image restoration, we also used the scores of
pictures restored by Real-ESRGAN to compare against the scores of the old
pictures and to improve our self-implemented image restoration algorithm.
Upon applying our completed image scoring algorithm to evaluate 16 historical
images, those restored by Real-ESRGAN, and those restored by our
self-implemented algorithm, we obtained three scores for each image. From the
results, we observed that the scores for images restored by both algorithms were
generally higher than those of the original pictures. This suggests the efficacy of
our scoring algorithm in reflecting image quality. Moreover, the scores for images
restored by Real-ESRGAN and our self-implemented algorithm were consistently
close, indicating that both algorithms successfully restored old photos to a
relatively high quality, visually pleasing to the human eye.
Introduction
There are many ways to assess image quality, and one particularly straightforward
approach is assigning scores to images as a reflection of their quality. Thus, we
developed a scoring algorithm for this purpose.
In the formulation of our scoring algorithm, we carefully selected four
dimensions—sharpness, noise, contrast, and color—as variables. This decision was
made after an extensive search and comparison of various algorithms. Sharpness,
focusing on the clarity and detail of visual elements, was evaluated using the Sobel
and Laplacian operators. The noise dimension captures distracting random
artifacts, while contrast and color were considered for their overall visual
impact and quality.
Meanwhile, we fine-tuned our scoring algorithm to align better with human
perceptual preferences regarding images. Notably, we assigned negative weights to
sharpness and noise, as people tend to find images more comfortable when they
exhibit less sharpness and fewer unwanted artifacts. This subjective evaluation
tendency plays a crucial role in our scoring algorithm as well.
We continuously adjusted the weights of each parameter in our scoring algorithm
for effectiveness and accuracy, and ultimately determined the optimal weights
through multiple experiments. This scoring algorithm is used to score old
pictures with various issues such as blurriness, high noise, and fading.
For image restoration, we utilized Real-ESRGAN, a machine learning algorithm
developed by others, and a self-implemented algorithm. Real-ESRGAN is
designed to restore low-resolution images, especially enhancing realistic textures.
Our self-implemented algorithm employed various filters, including Gaussian,
bilateral, and median, for denoising.
Methodology
1. Overall system flow
We use a total of 16 old images, and we pass these images to our two
restoration algorithms: the Real-ESRGAN algorithm and the algorithm
implemented by ourselves. After we get the restored images, we calculate the
four evaluation scores for each image. We then calculate the restoration score
by combining the four scores with the weights and signs discussed in the
following sections.
2. Machine Learning Algorithm
We used Real-ESRGAN (Real Enhanced Super-Resolution Generative Adversarial
Networks) algorithm, which was developed by Xintao Wang et al. in 2021 in the
paper Real-ESRGAN: Training Real-World Blind Super-Resolution with Pure
Synthetic Data.
Real-ESRGAN builds on SRGAN (Photo-Realistic Single Image Super-Resolution
Using a Generative Adversarial Network). SRGAN employed an innovative
perceptual loss utilizing high-level feature maps of the VGG network, coupled
with a discriminator that promotes solutions perceptually hard to distinguish
from the high-resolution reference images. The perceptual loss l^SR is the
weighted sum of a content loss l^SR_X and an adversarial loss l^SR_Gen:

l^SR = l^SR_X + 10^(−3) · l^SR_Gen
To avoid the perceptually unsatisfying, overly smooth textures, the authors
used a VGG loss for the content loss, defined on the feature maps φ of a
pre-trained VGG network:

l^SR_VGG = (1 / (W·H)) · Σ_{x,y} ( φ(I^HR)_{x,y} − φ(G(I^LR))_{x,y} )²

where I^HR is the high-resolution reference image, G(I^LR) is the generator's
reconstruction from the low-resolution input, and W and H are the dimensions of
the feature map.
For the adversarial loss, the authors added the generative component of the GAN
to the perceptual loss. The generative loss is defined over the discriminator's
outputs on all N training samples:

l^SR_Gen = Σ_{n=1..N} −log D(G(I^LR))

where D(G(I^LR)) is the discriminator's estimated probability that the
reconstructed image is a natural high-resolution image.
The SRGAN framework can recover the texture details of images at large
upscaling factors and increase the perceptual quality.
Afterwards, some scholars developed ESRGAN (Enhanced Super-Resolution
Generative Adversarial Networks) based on SRGAN to improve the image quality
of the hallucinated details. The enhancement was in three components: network
architecture, adversarial loss and perceptual loss.
For the network architecture, the authors made two modifications to the
generator structure. The first is removing all batch normalization (BN) layers.
The authors noticed that BN layers are more likely to introduce artifacts when
the network is deeper and trained under a GAN framework; therefore, they
removed the BN layers to achieve stable training and consistent performance.
The second is replacing the original basic block with the proposed
Residual-in-Residual Dense Block (RRDB). The authors observed that more layers
and connections could improve performance; thus, they introduced RRDB, which
uses a deeper and more complex structure than the residual block in SRGAN. The
ESRGAN paper illustrates both modifications in a figure.
Building on ESRGAN, the Real-ESRGAN authors trained the network on purely
synthetic data generated by a high-order degradation modeling process that
better simulates practical, real-world degradations, and used a U-Net
discriminator to enhance the details of image restoration. Their paper includes
a figure contrasting the classical degradation model with this high-order
enhancement.
Regarding the network architecture, the authors utilized the same generator as
ESRGAN but introduced a U-Net discriminator with spectral normalization to
mitigate oversharpness and unnecessary artifacts induced by GAN training.
The training process consisted of two steps. Initially, a PSNR-oriented model
was trained using the L1 loss. Subsequently, the trained PSNR-oriented model
served as the generator's initialization for training Real-ESRGAN.
3. Self-implemented algorithm
For our self-implemented algorithm, we applied image restoration methods
learned in class to process old photos.
As shown in the flow above, we used different filters to denoise the images and
then selected the optimal result based on a clarity rating. Specifically, we
employed three denoising filters: Gaussian, bilateral, and median,
experimenting with each to address various types of common noise. We used
Gaussian blur to reduce high-frequency noise, applied the bilateral filter to
denoise while preserving edges in the image, and employed median blur to
effectively reduce salt-and-pepper noise.
The 2D Gaussian filter is commonly used in image processing. Its function is
given by:

G(x, y) = (1 / (2πσ²)) · exp(−(x² + y²) / (2σ²))

where G(x, y) represents the value of the Gaussian function at coordinates
(x, y), and σ is the standard deviation, a parameter that controls the spread
or width of the Gaussian distribution. The larger the value of σ, the more
significant the blurring effect.
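A 2D Gaussian kernel implementing this function can be built in a few lines of NumPy; the kernel size and σ below are illustrative:

```python
import numpy as np

def gaussian_kernel(size, sigma):
    """Build a normalized 2D Gaussian kernel from
    G(x, y) = exp(-(x^2 + y^2) / (2*sigma^2)) / (2*pi*sigma^2)."""
    ax = np.arange(size) - (size - 1) / 2.0          # coordinates centered on 0
    xx, yy = np.meshgrid(ax, ax)
    kernel = np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2)) / (2.0 * np.pi * sigma**2)
    return kernel / kernel.sum()                     # normalize so brightness is preserved

k = gaussian_kernel(5, 1.0)
```

Normalizing the kernel to sum to one keeps the overall image brightness unchanged after filtering.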
The 2D Bilateral filter is a non-linear filter that is effective in smoothing
images while preserving edges. Its formula is given by:

I_filtered(x, y) = (1 / W_p) · Σ_{(i, j) ∈ Ω} I(i, j) · f_s(‖(i, j) − (x, y)‖) · f_r(|I(i, j) − I(x, y)|)

where I(x, y) is the intensity of the pixel at coordinates (x, y) in the input
image; f_s is the spatial-domain Gaussian function, a function of the spatial
distance between a neighboring pixel (i, j) and the central pixel (x, y); and
f_r is the range-domain Gaussian function, a function of the intensity
difference between the central pixel and the neighboring pixel. Moreover, W_p
is a normalization factor ensuring that the weights sum to one:

W_p = Σ_{(i, j) ∈ Ω} f_s(‖(i, j) − (x, y)‖) · f_r(|I(i, j) − I(x, y)|)

where Ω is the filter window (kernel) centered at (x, y); its size determines
the spatial extent of the filtering.
In other words, the 2D Bilateral filter is a versatile filter that applies different
weights to pixels based on both their spatial distance and intensity difference,
allowing it to smooth the image while preserving edges.
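A direct, unoptimized NumPy sketch of this weighting scheme (the window radius and the two σ values are illustrative assumptions):

```python
import numpy as np

def bilateral_filter(image, radius=2, sigma_s=2.0, sigma_r=25.0):
    """Naive bilateral filter: each output pixel is a normalized weighted
    average, with weights = spatial Gaussian * range (intensity) Gaussian."""
    img = image.astype(np.float64)
    h, w = img.shape
    out = np.zeros_like(img)
    # The spatial weights depend only on the offset, so precompute them once.
    ax = np.arange(-radius, radius + 1)
    dx, dy = np.meshgrid(ax, ax)
    spatial = np.exp(-(dx**2 + dy**2) / (2.0 * sigma_s**2))
    padded = np.pad(img, radius, mode='edge')
    for y in range(h):
        for x in range(w):
            window = padded[y:y + 2*radius + 1, x:x + 2*radius + 1]
            rng = np.exp(-((window - img[y, x])**2) / (2.0 * sigma_r**2))
            weights = spatial * rng
            out[y, x] = (weights * window).sum() / weights.sum()  # W_p normalization
    return out

flat = bilateral_filter(np.full((6, 6), 50.0))  # a constant image is unchanged
```

Because pixels across a strong edge get a near-zero range weight, the edge survives while flat regions are smoothed.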
The Median filter involves a window or kernel moving over the image, and for
each position, the median value of the pixel intensities within the window is
calculated. It is effective at removing salt-and-pepper noise and preserving image
details. However, it is computationally more intensive compared to linear filters
due to the sorting operation within the window.
To account for the possibility of excessive blurring in old photos, we also
applied a sharpening filter to enhance clarity and address potential
over-blurring. To be specific, we created a sharpening kernel and applied it to
the image via convolution.
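As an illustration, a commonly used 3×3 sharpening kernel and the convolution step might look like this; the kernel values here are an assumption, not necessarily the ones used in the project:

```python
import numpy as np

# A typical 3x3 sharpening kernel (identity plus negative Laplacian).
# Illustrative assumption: the project's exact kernel may differ.
SHARPEN_KERNEL = np.array([[ 0, -1,  0],
                           [-1,  5, -1],
                           [ 0, -1,  0]], dtype=np.float64)

def sharpen(image):
    """Apply the (symmetric) sharpening kernel with edge pixels replicated."""
    img = image.astype(np.float64)
    padded = np.pad(img, 1, mode='edge')
    out = np.zeros_like(img)
    for i in range(3):
        for j in range(3):
            out += SHARPEN_KERNEL[i, j] * padded[i:i + img.shape[0], j:j + img.shape[1]]
    return np.clip(out, 0, 255)   # keep results in the valid intensity range
```

Since the kernel sums to one, flat regions are left unchanged while intensity transitions are amplified.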
For the clarity assessment, since sharper images exhibit more high-frequency
components, we calculate the grayscale variance between pixels in the image. We
therefore utilized the Brenner gradient function, which computes the squared
grayscale difference between pixels two positions apart:

B(x, y) = ( I(x + 2, y) − I(x, y) )²

where B(x, y) is the Brenner gradient at pixel coordinates (x, y) and I(x, y)
represents the intensity of the pixel at coordinates (x, y) in the image; the
clarity score is the sum of B(x, y) over all pixels.
This simple calculation makes it sensitive to edges with rapid intensity changes, as
the squared difference will be significant at such locations. Based on this feature,
we ultimately selected the filter with the highest grayscale variance as the final
result.
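The Brenner-based selection can be sketched as follows (the candidate list is illustrative):

```python
import numpy as np

def brenner_score(gray):
    """Sum of squared differences between pixels two columns apart
    (the Brenner gradient summed over the whole grayscale image)."""
    g = gray.astype(np.float64)
    diff = g[:, 2:] - g[:, :-2]
    return float((diff ** 2).sum())

# Pick the candidate with the highest clarity score, e.g.:
# best = max(candidates, key=brenner_score)
```

A flat image scores zero, while any rapid intensity change contributes a large squared term, which is exactly the sensitivity to edges described above.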
4. Scoring Algorithm
We surveyed different scoring algorithms and decided on four final scores to
rate our pictures along multiple dimensions: sharpness, noise, contrast, and
color.
● Sharpness: evaluates the clarity and distinctness of details within an
image, i.e., how well the image preserves edges and fine details.
Specifically, we applied the Sobel and Laplacian operators with a 1:1 ratio
to calculate sharpness. We take the negative value of the sharpness score in
our final restoration score because, by the way we calculate it, the lower
the sharpness score, the better the picture.
● Noise: evaluates the amount and type of unwanted random variation in an
image's pixel values, which can affect its overall quality. To calculate the
noise score, we first convert the image to grayscale using cv2.cvtColor with
the cv2.COLOR_BGR2GRAY flag. Then we apply np.std to calculate the standard
deviation of the pixel intensities in the grayscale image. The standard
deviation measures the amount of variation: a higher standard deviation
implies more variation in pixel intensities, which can be interpreted as
more noise. We also take the negative value of the noise score in our final
restoration score, since the score stands for the variation, meaning the
higher the score, the larger the noise.
● Contrast: evaluates the difference in brightness or color that makes an
object distinguishable in the picture. We first convert the image to
grayscale using the convert('L') method, because contrast is typically
evaluated on luminance or brightness. The grayscale image is then converted
to a NumPy array, and the standard deviation of the pixel intensities is
calculated using np.std. The standard deviation here serves as a measure of
contrast: a higher standard deviation indicates a wider range of tones,
which corresponds to higher contrast.
● Color: evaluates various aspects of how colors are represented and
perceived in an image, which is crucial for visual appeal, realism, and
effectiveness. First, the image is split into its blue, green, and red color
channels using cv2.split. Then the absolute difference between the red and
green channels is calculated, which measures how much red and green differ
within the image. The absolute difference between a combination of the red
and green channels and the blue channel is also calculated, representing the
difference between the yellow-blue components. The means and standard
deviations are computed for these two parts. Then the square root of the sum
of the squared standard deviations (std_root) and the square root of the sum
of the squared means (mean_root) are calculated. The final color score is
std_root plus 30% of mean_root. This formulation means the score depends
more on color variability (std_root) than on the average color difference
(mean_root).
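The color score above can be sketched in plain NumPy (direct channel indexing stands in for cv2.split; the (R + G)/2 yellow component is an assumption in the spirit of standard colorfulness metrics, since the exact channel combination is not spelled out):

```python
import numpy as np

def color_score(image_bgr):
    """Color score: opponent red-green and yellow-blue differences,
    combined as std_root + 0.3 * mean_root as described above."""
    b = image_bgr[..., 0].astype(np.float64)
    g = image_bgr[..., 1].astype(np.float64)
    r = image_bgr[..., 2].astype(np.float64)
    rg = np.abs(r - g)                   # red-green difference
    yb = np.abs(0.5 * (r + g) - b)       # yellow-blue difference (assumed combination)
    std_root = np.sqrt(np.std(rg) ** 2 + np.std(yb) ** 2)
    mean_root = np.sqrt(np.mean(rg) ** 2 + np.mean(yb) ** 2)
    return float(std_root + 0.3 * mean_root)
```

A perfectly gray image (all channels equal) scores zero, since both opponent differences vanish.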
Restoration Score = - Sharpness * 30% - Noise * 30% + Contrast * 20%
+ Color * 20%
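The weighted combination can be sketched directly, assuming the four dimension scores have been computed as described in the bullets above:

```python
# Weights from the Restoration Score formula: sharpness and noise are
# penalized, contrast and color are rewarded.
WEIGHTS = {"sharpness": -0.3, "noise": -0.3, "contrast": 0.2, "color": 0.2}

def restoration_score(scores):
    """Combine the four dimension scores into the final restoration score."""
    return sum(WEIGHTS[name] * value for name, value in scores.items())

# Example with made-up dimension scores:
example = {"sharpness": 10.0, "noise": 5.0, "contrast": 30.0, "color": 20.0}
score = restoration_score(example)  # -0.3*10 - 0.3*5 + 0.2*30 + 0.2*20 = 5.5
```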
Results
Final Scoring Logic with Tuned Parameters
Based on our scoring algorithm, we calculated the scores of the original
images, the pictures restored by the Real-ESRGAN deep learning algorithm, and
those restored by the self-implemented algorithm. We compared the score of the
deep learning algorithm with the score of the original picture, and the score
of the self-implemented algorithm with that of the original picture. Our final
scoring algorithm achieved an accuracy of 87.5%.
We tuned the weight of each dimension based on a subjective sense of the whole
picture, without separating the background from the region of interest. Of the
four dimensions for evaluating the image, we give sharpness and noise more
weight than color and contrast: while exploring the database, we noticed that
old pictures tend to be overly sharp, noisy, and low in contrast. After testing
several old-picture databases, we chose −0.3 for the weight of image sharpness,
−0.3 for the weight of image noise, 0.2 for the weight of image contrast, and
0.2 for the weight of image color. These values were carefully chosen to
reflect the relative impact of each aspect on overall image quality.
Scores based on different algorithms
In this group of images, the restoration algorithms mostly deal with the background
of the images. The deep learning algorithm gets the highest score of 8.14, followed
by the self-implemented algorithm with a score of 7.57, and the original old picture
got 2.29. In this case, we can see the clear difference in the background wall, where
the noise is large in the original picture. The deep learning algorithm has a good
ability to deal with noise as well as sharpness, and our self-implemented algorithm
blurs the noise information making the whole picture look more comfortable. The
color and contrast dimensions are increased during the restoration when the
distracting information is removed.
This set of images follows the same score trend: the deep learning restoration
algorithm gets the highest score of 9.19, the old picture has the lowest score
of 8.11, and the self-implemented algorithm sits in the middle with 8.84. The
scores in this group are higher than the last group because more details are
included. Our sharpness algorithm in the scoring algorithm uses the edge to
calculate the sharpness, and more details mean more edges. After the restoration
algorithms, the score does not change a lot because the unwanted information takes
up a small part of the image, for example, in the window edges and doll faces.
While the first group focuses on the background, this group of pictures focuses
on the foreground. The sand provides fine detail and clearly shows how well the
different algorithms restore the picture. The deep learning algorithm gets
7.16, the self-implemented algorithm gets 6.23, and the original has the lowest
score of 5.88. The score differences between the original and the restored
pictures are not that large because the unwanted information takes up only a
small part of the image.
Discussion
From the above score matrix, we can notice some corner cases with extreme
values. Based on our analysis, we believe the anomalies presented in our showcase
are the results of the interplay between the image restoration algorithm and the
scoring algorithm. We observe that in situations with complex backgrounds and
blurring, the accuracy of our scoring metric tends to decrease. We attribute this
phenomenon to the challenges in distinguishing noise under such conditions.
Additionally, when the image contains many details, the calculation of
sharpness, which is based on the edges between different parts of the image,
also influences the restoration score.
Conclusion
Based on our result matrix, except for some extreme cases, the images generated
by the Real-ESRGAN algorithm and by our self-implemented algorithm have higher
scores than the original images. Moreover, in most cases the machine learning
method tended to outperform the self-implemented approach. We attribute this to
the ability of machine learning networks to intelligently recognize information
within images, such as foreground and background, enabling personalized
restoration, whereas self-implemented methods typically apply uniform
operations to the entire image; this makes the machine learning method more
nuanced in its approach.
Meanwhile, as we can observe in the Discussion section, our scoring algorithm
may sometimes be influenced by image blurriness and by the amount of detail in
the image. These defects should be addressed in future improvements of our
implementation.
References
[1] Christian Ledig, Lucas Theis, Ferenc Huszar, Jose Caballero, Andrew
Cunningham, Alejandro Acosta, Andrew Aitken, Alykhan Tejani, Johannes Totz,
Zehan Wang, Wenzhe Shi. Photo-Realistic Single Image Super-Resolution Using a
Generative Adversarial Network. In IEEE Conference on Computer Vision and
Pattern Recognition (CVPR), pp. 4681-4690, 2017.
[2] Xintao Wang, Ke Yu, Shixiang Wu, Jinjin Gu, Yihao Liu, Chao Dong, Chen
Change Loy, Yu Qiao, Xiaoou Tang. ESRGAN: Enhanced Super-Resolution
Generative Adversarial Networks. In European Conference on Computer Vision
Workshops (ECCVW), 2018.
[3] Xintao Wang, Liangbin Xie, Chao Dong, Ying Shan. Real-ESRGAN: Training
Real-World Blind Super-Resolution with Pure Synthetic Data. In Proceedings of
the IEEE/CVF International Conference on Computer Vision, pp. 1905–1914,
2021.