Ayush Dogra and Sanjeev Kumar
Image fusion techniques efficiently integrate complementary information from multiple source images into one single image to enhance the viewing perception of the observer. Fusing visible and infrared image data, for example, helps reveal hidden information in a scene and works quite well for detecting concealed weapons.
Image fusion techniques extract complementary, useful information from multiple source images, enhance the salient features without disturbing the aesthetics of the image, and then combine that information into one single image that improves the observer's viewing perception. The following article looks at common techniques and an example that fuses infrared and visible images for detecting a concealed gun.
Extracting meaningful features
To maximize the transfer of meaningful information from the source images to the fused image, that information must first be separated using tools such as orthonormal low-pass and high-pass filter banks. Orthonormal filter banks achieve perfect reconstruction by avoiding aliasing and other degrading effects [1], [2]. Later in the process, fusion rules judge the statistical importance of each chosen coefficient and decide whether or not to include it in the final image.
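As a minimal illustration of an orthonormal filter bank, the following Python sketch (using the PyWavelets library purely for illustration; the experiments described later were carried out in MATLAB) performs a single-level Haar decomposition into low-pass and high-pass sub-bands and then reconstructs the image exactly. The image here is a random stand-in, not data from the gun dataset.

```python
# Minimal sketch: single-level orthonormal (Haar) filter-bank decomposition
# and perfect reconstruction. Assumes PyWavelets and a grayscale image
# stored as a float NumPy array (illustrative stand-in only).
import numpy as np
import pywt

src = np.random.rand(256, 256)          # stand-in for a grayscale source image

# Analysis: low-pass (approximation) and high-pass (detail) sub-bands
cA, (cH, cV, cD) = pywt.dwt2(src, 'haar')

# Synthesis: the orthonormal filter bank reconstructs the image exactly
rec = pywt.idwt2((cA, (cH, cV, cD)), 'haar')

print('max reconstruction error:', np.abs(src - rec).max())   # effectively zero
```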
Images have many features to consider, including objects and their boundaries, curves, textures, and details. These features can be categorized into low-frequency and high-frequency components. Low-frequency information corresponds to homogeneous areas of similar pixel intensity, while high-frequency areas are characterized by object boundaries or abrupt changes in pixel intensity, along with noise. The process of breaking an image into its low-frequency and high-frequency components is known as multi-scale decomposition (MSD). Extracting multiple scales of information helps refine details and improves overall fused image quality [3], [4], [5].
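A minimal sketch of MSD follows, assuming a Gaussian low-pass filter as the smoothing operator and NumPy/SciPy for illustration: each scale separates a base (low-frequency) layer from a detail (high-frequency) layer, and the layers sum back to the original image.

```python
# Minimal sketch of multi-scale decomposition (MSD): each scale splits the
# current image into a smooth base layer (low frequency) and a detail layer
# (high frequency). Scale settings and names are illustrative assumptions.
import numpy as np
from scipy.ndimage import gaussian_filter

def multi_scale_decompose(img, sigmas=(2, 4, 8)):
    """Return a list of detail layers (fine to coarse) and the final base layer."""
    details, current = [], img.astype(np.float64)
    for sigma in sigmas:
        base = gaussian_filter(current, sigma)   # low-frequency component
        details.append(current - base)           # high-frequency residual
        current = base                           # decompose the base further
    return details, current

img = np.random.rand(256, 256)                   # stand-in for a source image
details, base = multi_scale_decompose(img)
print(len(details), base.shape)                  # 3 detail layers + 1 base layer

# Summing the layers recovers the original image (telescoping sum):
print(np.allclose(img, base + sum(details)))     # True
```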
In many algorithms used today, extracting meaningful information involves the saliency maps technique (Figure 1). Various saliency detection techniques exist to extract salient information, with the purpose of clearly defining object boundaries while removing redundancies. Refining this information with edge-preserving filters then produces weight maps. When applied to the detail layers of an image, the weights further enhance the details and improve edge strength in the final fused image.
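The sketch below illustrates this idea with a simple, frequency-tuned-style saliency measure and a bilateral filter standing in for the edge-preserving refinement step. The OpenCV-based implementation and parameter values are illustrative assumptions, not the specific filters used in the cited algorithms.

```python
# Minimal sketch: a simple saliency map per source image, normalized into
# competing weight maps, then refined with an edge-preserving (bilateral)
# filter so the weights follow object boundaries. Values are illustrative.
import numpy as np
import cv2

def saliency_map(img):
    """Frequency-tuned-style saliency: distance of each blurred pixel
    from the image's mean intensity (larger = more salient)."""
    blurred = cv2.GaussianBlur(img, (5, 5), 0)
    return np.abs(blurred - img.mean())

visible = np.random.rand(256, 256).astype(np.float32)   # stand-in images
infrared = np.random.rand(256, 256).astype(np.float32)

s_vis, s_ir = saliency_map(visible), saliency_map(infrared)

# Normalize the two saliency maps into weight maps that sum to 1 per pixel
w_vis = s_vis / (s_vis + s_ir + 1e-12)

# Edge-preserving refinement of the weight map (bilateral filter as a
# stand-in for the guided/edge-preserving filters used in the literature)
w_vis = cv2.bilateralFilter(w_vis.astype(np.float32), 9, 0.1, 7)
w_ir = 1.0 - w_vis

# The refined weights would then scale the detail layers of each source,
# e.g. fused_detail = w_vis * d_vis + w_ir * d_ir
```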
Image fusion algorithms aim to remove redundant information and preserve semantically useful information, so the enhanced base and detail layers are combined using a set of defined rules, called fusion rules, to carry the maximum amount of useful information from those enhanced layers into the fused image. Fusion rules help choose the most useful coefficients (pixels) out of a cluster of pixels, and various criteria help judge the statistical or geometrical importance of a coefficient or pixel among a group of pixels [5], [6], [7].
Figure 2 shows several fusion rule categories. Some rules use complex statistical analysis to choose coefficients, while others are generic and rely on simple calculations instead. Examples of the latter include the average, choose-maximum, and choose-minimum fusion rules. As the names indicate, these rules obtain the fused coefficients by simple averaging or by choosing the maximum- or minimum-value coefficient from a group of coefficients. The choice of fusion rule depends largely on the type of data at hand.
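A minimal sketch of these three generic rules, applied to coefficient arrays from two source images, follows; the array names and random stand-in data are illustrative only.

```python
# Minimal sketch of three generic fusion rules applied to coefficient arrays
# (for example, detail layers) from two source images.
import numpy as np

def fuse_average(c1, c2):
    """Average rule: typically applied to the low-frequency (base) layers."""
    return 0.5 * (c1 + c2)

def fuse_choose_max(c1, c2):
    """Choose-maximum rule: keep the coefficient with the larger absolute
    value, typically applied to the high-frequency (detail) layers."""
    return np.where(np.abs(c1) >= np.abs(c2), c1, c2)

def fuse_choose_min(c1, c2):
    """Choose-minimum rule: keep the coefficient with the smaller absolute value."""
    return np.where(np.abs(c1) <= np.abs(c2), c1, c2)

d_vis = np.random.randn(256, 256)    # stand-in detail coefficients (visible)
d_ir = np.random.randn(256, 256)     # stand-in detail coefficients (infrared)

fused_detail = fuse_choose_max(d_vis, d_ir)
fused_base = fuse_average(d_vis, d_ir)
```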
A fusion rule that works on one dataset may or may not work efficiently on other datasets, and situations may arise in which forming the final fused image requires multiple fusion rules [1], [5]. After the fusion process creates a final image, two criteria help assess the algorithm's performance. The first is visual observation (looking for appealing visual detail); the second uses mathematical formulas to evaluate performance, which is necessary to validate the visual results.
Performance evaluation metrics can be categorized into classical metrics and gradient-based metrics. Gradient-based metrics use the source images to evaluate the fusion algorithm's performance; examples include the edge strength or fusion rate metric (QABF), the loss of information metric (LABF), and the artifact measure metric (NABF). Classical metrics do not use any of the source images to evaluate fusion algorithms; examples of these non-reference methods include entropy, standard deviation, average gradient, and spatial frequency. Reference-based metrics are quite popular among researchers today because they directly compare the features of the fused image with the features of all the source images, which gives the evaluation a solid grounding [8].
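The following sketch computes three of the classical (non-reference) metrics (entropy, standard deviation, and spatial frequency) directly on a fused image, using the commonly cited definitions; it is an illustration rather than the exact evaluation code used in the experiment.

```python
# Minimal sketch of three classical (non-reference) fusion metrics computed
# directly on the fused image, with no source images required.
import numpy as np

def entropy(img):
    """Shannon entropy of the 8-bit intensity histogram (bits per pixel)."""
    hist, _ = np.histogram(img, bins=256, range=(0, 256))
    p = hist / hist.sum()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def spatial_frequency(img):
    """Spatial frequency: combined row- and column-wise intensity variation."""
    img = img.astype(np.float64)
    rf = np.sqrt(np.mean((img[:, 1:] - img[:, :-1]) ** 2))   # row frequency
    cf = np.sqrt(np.mean((img[1:, :] - img[:-1, :]) ** 2))   # column frequency
    return np.sqrt(rf ** 2 + cf ** 2)

fused = (np.random.rand(256, 256) * 255).astype(np.uint8)    # stand-in image

print('entropy           :', entropy(fused))
print('standard deviation:', np.std(fused.astype(np.float64)))
print('spatial frequency :', spatial_frequency(fused))
```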
Gun dataset experiment and results
Researchers at CSIR-CSIO (Chandigarh, India; www.csio.res.in) used a single open-source dataset, called the gun dataset [16], which presents challenges because its images were captured in low-light conditions by visible and infrared sensors. The researchers implemented the fusion algorithms described below in MATLAB software from MathWorks (Natick, MA, USA; www.mathworks.com) in order to test the performance of each algorithm on this dataset.
The infrared sensor reveals hidden information in the scene that is not visible to the naked eye: the temperature difference between the gun and the person's body shows up in the infrared image as the outline of a gun. Infrared images, however, contain only object boundary information and have very low contrast and luminance. Image fusion solves this problem by integrating the infrared image with the visible image, which offers high contrast and better visibility but lacks the hidden features in the scene.
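As a rough end-to-end illustration of this idea, the sketch below decomposes registered visible and infrared images into base and detail layers, fuses them with the generic rules described above, and recomposes the result. It is a simplified stand-in under those assumptions, not one of the specific algorithms compared in Figure 3.

```python
# Minimal end-to-end sketch: split visible and infrared images into base and
# detail layers, fuse the bases by averaging and the details by choose-max,
# then recompose the fused image. Illustrative stand-in data and parameters.
import numpy as np
from scipy.ndimage import gaussian_filter

def split(img, sigma=4):
    base = gaussian_filter(img.astype(np.float64), sigma)   # low-frequency base
    return base, img - base                                 # base, detail

visible = np.random.rand(256, 256)     # stand-ins for registered source images
infrared = np.random.rand(256, 256)

b_vis, d_vis = split(visible)
b_ir, d_ir = split(infrared)

fused_base = 0.5 * (b_vis + b_ir)                          # average rule
fused_detail = np.where(np.abs(d_vis) >= np.abs(d_ir),     # choose-max rule
                        d_vis, d_ir)

fused = np.clip(fused_base + fused_detail, 0.0, 1.0)       # final fused image
```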
Figure 3 shows visual results obtained from techniques including discrete cosine harmonic wavelet transform (DCHWT) [9], singular value decomposition (SVD) [10], guided filter (GF) [11], cross bilateral filter (CBF) [12], saliency detection (SD) [7], anisotropic diffusion (AD) [13], fourth-order partial differential equation (FPDE) [14], and fast filtering image fusion (FFIF) [15]. Figure 4 shows objective evaluation results obtained by applying mathematical tools to the same fused images. The experiment uses the reference-based metrics mentioned above to evaluate the fusion algorithms.
The visual results show that the DCHWT-, CBF-, and SD-based fusion methods reveal the target more clearly and hence perform better. The target is barely visible in FFIF, and the other methods also perform poorly on this dataset. The results show that DCHWT is the best-performing algorithm: other algorithms introduce artifacts, while DCHWT manages to preserve the edges and other meaningful details. Figure 4 also shows that, in the case of DCHWT, the objective evaluation results agree with the visual results.
Ayush Dogra is a CSIR-Nehru Postdoctoral Fellow and Sanjeev Kumar is a Senior Scientist at CSIR-CSIO (a research lab of the government of India; Chandigarh, India; www.csio.res.in).