
Final Project Report

Abstract
In this project, we designed a scoring algorithm to evaluate the quality of historical
images and the images restored from them, using both an existing image
restoration algorithm (Real-ESRGAN) and our self-implemented algorithm.
Our scoring algorithm assesses images based on four dimensions: sharpness, noise,
contrast, and color. Throughout the development of the algorithm, we fine-tuned
parameters for these dimensions, striving to achieve optimal and precise scoring
outcomes. Additionally, we incorporated subjective feedback from individuals to
adjust the algorithm, intending to assign higher scores to images perceived as
visually appealing.
The image restoration algorithms employed include Real-ESRGAN and a
self-implemented algorithm derived from course content. Since the Real-ESRGAN
algorithm has proved effective at image restoration, we also used the scores of
pictures restored by Real-ESRGAN to compare against the scores of the old
pictures and to improve our self-implemented image restoration algorithm.
Upon applying our completed image scoring algorithm to evaluate 16 historical
images, those restored by Real-ESRGAN, and those restored by our
self-implemented algorithm, we obtained three scores for each image. From the
results, we observed that the scores for images restored by both algorithms were
generally higher than those of the original pictures. This suggests the efficacy of
our scoring algorithm in reflecting image quality. Moreover, the scores for images
restored by Real-ESRGAN and our self-implemented algorithm were consistently
close, indicating that both algorithms successfully restored old photos to a
relatively high quality, visually pleasing to the human eye.
Introduction
There are many ways to assess image quality, and one particularly straightforward
approach is assigning scores to images as a reflection of their quality. Thus, we
developed a scoring algorithm for this purpose.
In the formulation of our scoring algorithm, we carefully selected four
dimensions—sharpness, noise, contrast, and color—as variables. This decision was
made after an extensive search and comparison of various algorithms. Sharpness,
focusing on the clarity and detail of visual elements, was evaluated using the Sobel
and Laplacian operators. The noise dimension captures distracting random
artifacts, while contrast and color were considered for their overall visual
impact and quality.
Meanwhile, we fine-tuned our scoring algorithm to align better with human
perceptual preferences regarding images. Notably, we assigned negative weights to
sharpness and noise, as people tend to find images more comfortable when they
exhibit less sharpness and fewer unwanted artifacts. This subjective evaluation
tendency plays a crucial role in our scoring algorithm as well.
We continuously adjusted the weights of each parameter in our scoring algorithm
for effectiveness and accuracy, and ultimately determined the optimal weights
through multiple experiments. This scoring algorithm is used to score old
pictures with various issues such as blurriness, high noise, and fading.
For image restoration, we utilized Real-ESRGAN, a machine learning algorithm
developed by others, and a self-implemented algorithm. Real-ESRGAN is
designed to restore low-resolution images, especially enhancing realistic textures.
Our self-implemented algorithm employed various filters, including Gaussian,
bilateral, and median, for denoising.
Methodology
1. Overall system flow
We use a total of 16 old images, and we pass these images to our two
restoration algorithms: the Real-ESRGAN algorithm and the algorithm
implemented by ourselves. After we get the restored images, we calculate the
four evaluation scores for each image. We then calculate the restoration score
by combining the four scores with the weights and signs discussed in the
following sections.
2. Machine Learning Algorithm
We used Real-ESRGAN (Real Enhanced Super-Resolution Generative Adversarial
Networks) algorithm, which was developed by Xintao Wang et al. in 2021 in the
paper Real-ESRGAN: Training Real-World Blind Super-Resolution with Pure
Synthetic Data.
Real-ESRGAN builds on SRGAN (Photo-Realistic Single Image Super-Resolution
Using a Generative Adversarial Network). SRGAN employed an innovative
perceptual loss utilizing high-level feature maps of the VGG network, coupled
with a discriminator that promotes solutions perceptually hard to distinguish
from the high-resolution reference images. The perceptual loss l^SR is the
weighted sum of a content loss l^SR_X and an adversarial loss l^SR_Gen:

l^SR = l^SR_X + 10^(−3) · l^SR_Gen
To avoid the perceptually unsatisfying, overly smooth textures, the authors
used a VGG loss for the content loss, defined on the feature maps φ of a
pre-trained VGG network:

l^SR_VGG = (1 / (W·H)) · Σ_{x,y} ( φ(I^HR)_{x,y} − φ(G(I^LR))_{x,y} )²

where I^HR is the high-resolution reference image, G(I^LR) is the generator's
reconstruction from the low-resolution input, and W and H are the dimensions of
the feature map.
For the adversarial loss, the authors added the generative component of the GAN
to the perceptual loss. The generative loss is defined over the discriminator's
outputs on all N training samples:

l^SR_Gen = Σ_{n=1..N} −log D(G(I^LR))

where D(G(I^LR)) is the discriminator's estimated probability that the
reconstructed image is a natural high-resolution image.
The SRGAN framework can recover the texture details of images at large
upscaling factors and increase the perceptual quality.
Afterwards, some scholars developed ESRGAN (Enhanced Super-Resolution
Generative Adversarial Networks) based on SRGAN to improve the image quality
of the hallucinated details. The enhancement was in three components: network
architecture, adversarial loss and perceptual loss.
For the network architecture, the authors made two modifications to the
generator structure. The first is removing all batch normalization (BN) layers.
The authors noticed that BN layers are more likely to introduce artifacts when
the network is deeper and trained under a GAN framework; therefore, they
removed the BN layers to achieve stable training and consistent performance.
The second is replacing the original basic block with the proposed
Residual-in-Residual Dense Block (RRDB). The authors observed that more layers
and connections could improve performance; thus, they introduced RRDB, which
uses a deeper and more complex structure than the residual block in SRGAN. The
ESRGAN paper illustrates both modifications in a figure.
Building on ESRGAN, the Real-ESRGAN authors trained the network on purely
synthetic data generated by a high-order degradation modeling process that
better simulates practical, real-world degradations, and used a U-Net
discriminator to enhance the details of image restoration. Their paper includes
a figure contrasting the classical degradation model with this high-order
enhancement.
Regarding the network architecture, the authors utilized the same generator as
ESRGAN but introduced a U-Net discriminator with spectral normalization to
mitigate oversharpness and unnecessary artifacts induced by GAN training.
The training process consisted of two steps. Initially, a PSNR-oriented model
was trained using the L1 loss. Subsequently, the trained PSNR-oriented model
served as the generator's initialization for training Real-ESRGAN.
3. Self-implemented algorithm
For our self-implemented algorithm, we applied image restoration methods
learned in class to process old photos.
As shown in the flow above, we used different filters to denoise the images and
then selected the optimal result based on a clarity rating. Specifically, we
employed three denoising filters: Gaussian, bilateral, and median,
experimenting with each to address various types of common noise. We used
Gaussian blur to reduce high-frequency noise, applied the bilateral filter to
denoise while preserving edges in the image, and employed median blur to
effectively reduce salt-and-pepper noise.
The 2D Gaussian filter is commonly used in image processing. Its function is
given by:

G(x, y) = (1 / (2πσ²)) · exp(−(x² + y²) / (2σ²))

where G(x, y) represents the value of the Gaussian function at coordinates
(x, y), and σ is the standard deviation, a parameter that controls the spread
or width of the Gaussian distribution. The larger the value of σ, the more
significant the blurring effect.
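A 2D Gaussian kernel implementing this function can be built in a few lines of NumPy; the kernel size and σ below are illustrative:

```python
import numpy as np

def gaussian_kernel(size, sigma):
    """Build a normalized 2D Gaussian kernel from
    G(x, y) = exp(-(x^2 + y^2) / (2*sigma^2)) / (2*pi*sigma^2)."""
    ax = np.arange(size) - (size - 1) / 2.0          # coordinates centered on 0
    xx, yy = np.meshgrid(ax, ax)
    kernel = np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2)) / (2.0 * np.pi * sigma**2)
    return kernel / kernel.sum()                     # normalize so brightness is preserved

k = gaussian_kernel(5, 1.0)
```

Normalizing the kernel to sum to one keeps the overall image brightness unchanged after filtering.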
The 2D Bilateral filter is a non-linear filter that is effective in smoothing
images while preserving edges. Its formula is given by:

I_filtered(x, y) = (1 / W_p) · Σ_{(i, j) ∈ Ω} I(i, j) · f_s(‖(i, j) − (x, y)‖) · f_r(|I(i, j) − I(x, y)|)

where I(x, y) is the intensity of the pixel at coordinates (x, y) in the input
image; f_s is the spatial-domain Gaussian function, a function of the spatial
distance between a neighboring pixel (i, j) and the central pixel (x, y); and
f_r is the range-domain Gaussian function, a function of the intensity
difference between the central pixel and the neighboring pixel. Moreover, W_p
is a normalization factor ensuring that the weights sum to one:

W_p = Σ_{(i, j) ∈ Ω} f_s(‖(i, j) − (x, y)‖) · f_r(|I(i, j) − I(x, y)|)

where Ω is the filter window (kernel) centered at (x, y); its size determines
the spatial extent of the filtering.
In other words, the 2D Bilateral filter is a versatile filter that applies different
weights to pixels based on both their spatial distance and intensity difference,
allowing it to smooth the image while preserving edges.
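A direct, unoptimized NumPy sketch of this weighting scheme (the window radius and the two σ values are illustrative assumptions):

```python
import numpy as np

def bilateral_filter(image, radius=2, sigma_s=2.0, sigma_r=25.0):
    """Naive bilateral filter: each output pixel is a normalized weighted
    average, with weights = spatial Gaussian * range (intensity) Gaussian."""
    img = image.astype(np.float64)
    h, w = img.shape
    out = np.zeros_like(img)
    # The spatial weights depend only on the offset, so precompute them once.
    ax = np.arange(-radius, radius + 1)
    dx, dy = np.meshgrid(ax, ax)
    spatial = np.exp(-(dx**2 + dy**2) / (2.0 * sigma_s**2))
    padded = np.pad(img, radius, mode='edge')
    for y in range(h):
        for x in range(w):
            window = padded[y:y + 2*radius + 1, x:x + 2*radius + 1]
            rng = np.exp(-((window - img[y, x])**2) / (2.0 * sigma_r**2))
            weights = spatial * rng
            out[y, x] = (weights * window).sum() / weights.sum()  # W_p normalization
    return out

flat = bilateral_filter(np.full((6, 6), 50.0))  # a constant image is unchanged
```

Because pixels across a strong edge get a near-zero range weight, the edge survives while flat regions are smoothed.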
The Median filter involves a window or kernel moving over the image, and for
each position, the median value of the pixel intensities within the window is
calculated. It is effective at removing salt-and-pepper noise and preserving image
details. However, it is computationally more intensive compared to linear filters
due to the sorting operation within the window.
To account for the possibility of excessive blurring in old photos, we also
applied a sharpening filter to enhance clarity and address potential
over-blurring. To be specific, we created a sharpening kernel and applied it to
the image via convolution.
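As an illustration, a commonly used 3×3 sharpening kernel and the convolution step might look like this; the kernel values here are an assumption, not necessarily the ones used in the project:

```python
import numpy as np

# A typical 3x3 sharpening kernel (identity plus negative Laplacian).
# Illustrative assumption: the project's exact kernel may differ.
SHARPEN_KERNEL = np.array([[ 0, -1,  0],
                           [-1,  5, -1],
                           [ 0, -1,  0]], dtype=np.float64)

def sharpen(image):
    """Apply the (symmetric) sharpening kernel with edge pixels replicated."""
    img = image.astype(np.float64)
    padded = np.pad(img, 1, mode='edge')
    out = np.zeros_like(img)
    for i in range(3):
        for j in range(3):
            out += SHARPEN_KERNEL[i, j] * padded[i:i + img.shape[0], j:j + img.shape[1]]
    return np.clip(out, 0, 255)   # keep results in the valid intensity range
```

Since the kernel sums to one, flat regions are left unchanged while intensity transitions are amplified.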
For the clarity assessment, since sharper images exhibit more high-frequency
components, we calculate the grayscale variance between pixels in the image. We
therefore utilized the Brenner gradient function, which computes the squared
grayscale difference between pixels two positions apart:

B(x, y) = ( I(x + 2, y) − I(x, y) )²

where B(x, y) is the Brenner gradient at pixel coordinates (x, y) and I(x, y)
represents the intensity of the pixel at coordinates (x, y) in the image; the
clarity score is the sum of B(x, y) over all pixels.
This simple calculation makes it sensitive to edges with rapid intensity changes, as
the squared difference will be significant at such locations. Based on this feature,
we ultimately selected the filter with the highest grayscale variance as the final
result.
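The Brenner-based selection can be sketched as follows (the candidate list is illustrative):

```python
import numpy as np

def brenner_score(gray):
    """Sum of squared differences between pixels two columns apart
    (the Brenner gradient summed over the whole grayscale image)."""
    g = gray.astype(np.float64)
    diff = g[:, 2:] - g[:, :-2]
    return float((diff ** 2).sum())

# Pick the candidate with the highest clarity score, e.g.:
# best = max(candidates, key=brenner_score)
```

A flat image scores zero, while any rapid intensity change contributes a large squared term, which is exactly the sensitivity to edges described above.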
4. Scoring Algorithm
We surveyed different scoring algorithms and decided on four final scores to
rate our pictures along multiple dimensions: sharpness, noise, contrast, and
color.
● Sharpness: evaluates the clarity and distinctness of details within an
image, i.e., how well the image preserves edges and fine details.
Specifically, we applied the Sobel and Laplacian operators with a 1:1 ratio
to calculate sharpness. We take the negative value of the sharpness score in
our final restoration score because, by the way we calculate it, the lower
the sharpness score, the better the picture.
● Noise: evaluates the amount and type of unwanted random variation in an
image's pixel values, which can affect its overall quality. To calculate the
noise score, we first convert the image to grayscale using cv2.cvtColor with
the cv2.COLOR_BGR2GRAY flag. Then we apply np.std to calculate the standard
deviation of the pixel intensities in the grayscale image. The standard
deviation measures the amount of variation: a higher standard deviation
implies more variation in pixel intensities, which can be interpreted as
more noise. We also take the negative value of the noise score in our final
restoration score, since the score stands for the variation, meaning the
higher the score, the larger the noise.
● Contrast: evaluates the difference in brightness or color that makes an
object distinguishable in the picture. We first convert the image to
grayscale using the convert('L') method, because contrast is typically
evaluated on luminance or brightness. The grayscale image is then converted
to a NumPy array, and the standard deviation of the pixel intensities is
calculated using np.std. The standard deviation here serves as a measure of
contrast: a higher standard deviation indicates a wider range of tones,
which corresponds to higher contrast.
● Color: evaluates various aspects of how colors are represented and
perceived in an image, which is crucial for visual appeal, realism, and
effectiveness. First, the image is split into its blue, green, and red color
channels using cv2.split. Then the absolute difference between the red and
green channels is calculated, which measures how much red and green differ
within the image. The absolute difference between a combination of the red
and green channels and the blue channel is also calculated, representing the
difference between the yellow-blue components. The means and standard
deviations are computed for these two parts. Then the square root of the sum
of the squared standard deviations (std_root) and the square root of the sum
of the squared means (mean_root) are calculated. The final color score is
std_root plus 30% of mean_root. This formulation means the score depends
more on color variability (std_root) than on the average color difference
(mean_root).
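The color score above can be sketched in plain NumPy (direct channel indexing stands in for cv2.split; the (R + G)/2 yellow component is an assumption in the spirit of standard colorfulness metrics, since the exact channel combination is not spelled out):

```python
import numpy as np

def color_score(image_bgr):
    """Color score: opponent red-green and yellow-blue differences,
    combined as std_root + 0.3 * mean_root as described above."""
    b = image_bgr[..., 0].astype(np.float64)
    g = image_bgr[..., 1].astype(np.float64)
    r = image_bgr[..., 2].astype(np.float64)
    rg = np.abs(r - g)                   # red-green difference
    yb = np.abs(0.5 * (r + g) - b)       # yellow-blue difference (assumed combination)
    std_root = np.sqrt(np.std(rg) ** 2 + np.std(yb) ** 2)
    mean_root = np.sqrt(np.mean(rg) ** 2 + np.mean(yb) ** 2)
    return float(std_root + 0.3 * mean_root)
```

A perfectly gray image (all channels equal) scores zero, since both opponent differences vanish.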
Restoration Score = - Sharpness * 30% - Noise * 30% + Contrast * 20%
+ Color * 20%
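The weighted combination can be sketched directly, assuming the four dimension scores have been computed as described in the bullets above:

```python
# Weights from the Restoration Score formula: sharpness and noise are
# penalized, contrast and color are rewarded.
WEIGHTS = {"sharpness": -0.3, "noise": -0.3, "contrast": 0.2, "color": 0.2}

def restoration_score(scores):
    """Combine the four dimension scores into the final restoration score."""
    return sum(WEIGHTS[name] * value for name, value in scores.items())

# Example with made-up dimension scores:
example = {"sharpness": 10.0, "noise": 5.0, "contrast": 30.0, "color": 20.0}
score = restoration_score(example)  # -0.3*10 - 0.3*5 + 0.2*30 + 0.2*20 = 5.5
```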
Results
Final Scoring Logic with Tuned Parameters
Based on our scoring algorithm, we calculated the scores of the original
images, the pictures restored by the Real-ESRGAN deep learning algorithm, and
those restored by the self-implemented algorithm. We compared the score of the
deep learning algorithm with the score of the original picture, and the score
of the self-implemented algorithm with that of the original picture. Our final
scoring algorithm achieved an accuracy of 87.5%.
We tuned the weight of each dimension based on a subjective sense of the whole
picture, without separating the background from the region of interest. Of the
four dimensions for evaluating the image, we give sharpness and noise more
weight than color and contrast: while exploring the database, we noticed that
old pictures tend to be overly sharp, noisy, and low in contrast. After testing
several old-picture databases, we chose −0.3 for the weight of image sharpness,
−0.3 for the weight of image noise, 0.2 for the weight of image contrast, and
0.2 for the weight of image color. These values were carefully chosen to
reflect the relative impact of each aspect on overall image quality.
Scores based on different algorithms
In this group of images, the restoration algorithms mostly deal with the background
of the images. The deep learning algorithm gets the highest score of 8.14, followed
by the self-implemented algorithm with a score of 7.57, and the original old picture
got 2.29. In this case, we can see the clear difference in the background wall, where
the noise is large in the original picture. The deep learning algorithm has a good
ability to deal with noise as well as sharpness, and our self-implemented algorithm
blurs the noise information making the whole picture look more comfortable. The
color and contrast dimensions are increased during the restoration when the
distracting information is removed.
This set of images follows the same score trend: the deep learning restoration
algorithm gets the highest score of 9.19, the old picture has the lowest score
of 8.11, and the self-implemented algorithm sits in the middle with 8.84. The
scores in this group are higher than the last group because more details are
included. Our sharpness algorithm in the scoring algorithm uses the edge to
calculate the sharpness, and more details mean more edges. After the restoration
algorithms, the score does not change a lot because the unwanted information takes
up a small part of the image, for example, in the window edges and doll faces.
While the first group focuses on the background, this group of pictures focuses
on the foreground. The sand provides fine detail and clearly shows how well the
different algorithms restore the picture. The deep learning algorithm gets
7.16, the self-implemented algorithm gets 6.23, and the original has the lowest
score of 5.88. The score differences between the original and the restored
pictures are not that large because the unwanted information takes up only a
small part of the image.
Discussion
From the above score matrix, we can notice some corner cases with extreme
values. Based on our analysis, we believe the anomalies presented in our showcase
are the results of the interplay between the image restoration algorithm and the
scoring algorithm. We observe that in situations with complex backgrounds and
blurring, the accuracy of our scoring metric tends to decrease. We attribute this
phenomenon to the challenges in distinguishing noise under such conditions.
Additionally, when the image contains many details, the calculation of
sharpness, which is based on the edges between different parts of the image,
also influences the restoration score.
Conclusion
Based on our result matrix, except for some extreme cases, the images generated
by the Real-ESRGAN algorithm and by our self-implemented algorithm have higher
scores than the original images. Moreover, in most cases the machine learning
method tended to outperform the self-implemented approach. We attribute this to
the ability of machine learning networks to intelligently recognize information
within images, such as foreground and background, enabling personalized
restoration, whereas self-implemented methods typically apply uniform
operations to the entire image; this makes the machine learning method more
nuanced in its approach.
Meanwhile, as we can observe in the Discussion section, our scoring algorithm
may sometimes be influenced by image blurriness and by the amount of detail in
the image. These defects should be addressed in future improvements of our
implementation.
References
[1] Christian Ledig, Lucas Theis, Ferenc Huszar, Jose Caballero, Andrew
Cunningham, Alejandro Acosta, Andrew Aitken, Alykhan Tejani, Johannes Totz,
Zehan Wang, Wenzhe Shi. Photo-Realistic Single Image Super-Resolution Using a
Generative Adversarial Network. In IEEE Conference on Computer Vision and
Pattern Recognition (CVPR), pp. 4681-4690, 2017.
[2] Xintao Wang, Ke Yu, Shixiang Wu, Jinjin Gu, Yihao Liu, Chao Dong, Chen
Change Loy, Yu Qiao, Xiaoou Tang. ESRGAN: Enhanced Super-Resolution
Generative Adversarial Networks. In European Conference on Computer Vision
Workshops (ECCVW), 2018.
[3] Xintao Wang, Liangbin Xie, Chao Dong, Ying Shan. Real-ESRGAN: Training
Real-World Blind Super-Resolution with Pure Synthetic Data. In Proceedings of
the IEEE/CVF International Conference on Computer Vision, pp. 1905–1914,
2021.