The presence of shadow in an image is a major problem associated with various visual processing applications such as object recognition, traffic surveillance and segmentation. In this paper, we introduce a method to remove the shadow from a real image using the morphological diversities of shadows and sparse representation. The proposed approach first generates an invariant image and further processing is applied to the invariant image. Here, shadow removal is formulated as a decomposition problem that uses separate local dictionaries for shadow and nonshadow parts, without using single global or fixed generic dictionary. These local dictionaries are constructed from the patches extracted from the residual of the image obtained after invariant image formation. Finally, non-iterative Morphological Component Analysis-based image decomposition using local dictionaries is performed to add the geometric component to the non-shadow part of the image so as to obtain shadow free version of the input image. The proposed approach of shadow removal works well for indoor and outdoor images, and the performance has been compared with previous methods and found to be better in terms of RMSE.
Shadow ; Dictionary learning ; Shadow removal ; Sparse coding ; MCA
Shadow detection and removal process is widely used as a pre-processing operation in various image processing applications for the removal of undesirable noise and objects. For example, applications such as video surveillance [1] , scene interpretation [2] and object recognition [3] require shadow removal as an initial step to eliminate the undesirable effect of noise on the performance of such systems. Once detected, shadows in images are used for applications such as detection of object shape and size in aerial images, detection of movement of objects in video surveillance system, and finding the number of light sources and illumination conditions in natural images. In digital photography, removal of shadows can help to improve the visual quality of photographs. Ignoring the existence of shadows in images can, in general, degrade the output quality.
Shadow detection and removal is an active research area for the last two decades. Several algorithms have been proposed based on learning [4] , color models [5] , region [6] and [7] and invariant image models [8] and [9] for shadow detection and removal in image as well as in videos. A major work by Lalonde et al. [10] mainly focuses on the shadows cast by objects onto the ground plane. Other notable works are based on assumptions of Lambertian reflectance and Planckian lighting [11] . Interested readers can see a review article by Sasi and Govindan [12] to get a more comprehensive report of the methodologies reported in the field of shadow detection and removal during the last decade.
Though shadow removal involving multiple images [13] and interactive methodologies [14] , [15] and [16] provides superior performance, fully automatic approaches available for single image shadow removal stand behind them in terms of performance. This is because of the fact that indoor and outdoor shadows are much affected by the direction, intensity of light source, as well as geometry and texture of the objects where shadow is cast.
The review carried out reveals that the research work reported in shadow detection and removal works satisfactorily in the case of user interaction [14] , [15] and [16] and for multiple images [13] . The automatic approaches available for single shadow removal are more complex to implement and set much restriction in the class of images under consideration [10] and [11] . Reintegration methods and local area processing are time intensive. Also, many methods cannot distinguish between near dark objects and shadows. In the case of small-patch regions, image in-painting method is more suitable, whereas in-painting large patch holes involve huge computation. Thus, the topic of single image shadow detection and removal requires a good amount of further research to develop an approach that provides satisfactory performances.
This paper proposes a method to remove shadow from a real image using sparse representation and a variation of MCA. Sparsity is a powerful way to approximate signals and images by using a sparse linear combination of atoms from an over-complete dictionary. Sparse representation is being used in signal and image processing applications such as de-noising [17] , super-resolution [18] , in-painting [19] , deblurring [20] , segmentation [21] , and compression [22] , demonstrating that sparse models are well suited to natural images as well. Starck et al. developed the idea of MCA in a series of papers and it was used in separating the texture from the geometric component [23] , [24] and [25] . MCA algorithm works by decomposing the image into edges and textures, using the morphological differences in these features [23] and [24] . Each of the shape features is related to a fixed dictionary of atoms such as wavelets or Discrete Cosine Transforms. Typical MCA is an iterative thresholding scheme in which the threshold linearly decreases to zero. Another similar work in the area is “Learning the Morphological Diversity” by Peyré et al. [26] . They introduced an adaptive MCA scheme by learning the morphologies of image layers. A combination of adaptive local dictionaries and fixed generic dictionaries using wavelets and curvelets is used for decomposition. The main deficiency of these models is the existence of similar atoms corresponding to cartoon and texture dictionaries that produce coherence. So, to get the expected solution, proper manual initialization is essential.
Apart from sparse representation, we are also using invariant image formation as a base step in the proposed shadow removal method. Many of the works in the area of shadow removal using invariant image formation were authored by Finlayson and his students [8] , [9] , [27] , [28] , [29] , [30] and [31] . In general, his methods are based on forming an invariant image, in which shadows do not appear followed by reconstructing the required missing components using reintegration. Invariant image formation results in the loss of photo quality of image. To bring back the fine details lost in the invariant image formation, reintegration is performed using Poisson equation by averaging over retinex paths [8] or Hamiltonian paths [27] . In most of the methods missing information after shadow removal is interpolated using image inpainting methods. Finlayson also limits his work to images that follow Lambertian model where Planckian illumination lights the scenes. However, real scenes need not satisfy Lambertian assumption.
The remaining part of this research report is put forth as in the following: The basics of sparse coding, dictionary learning, MCA and invariant image formation required for better readability of the paper is presented in Section 2 . The proposed method of shadow removal using sparse representation is discussed in Section 3 . The dataset used, results obtained and further discussions of the proposed work are given in Section 4 . The paper is concluded in Section 5 highlighting the approach used and performance gain achieved.
This section briefly reviews the theory behind sparse coding (SC), dictionary learning, morphological component analysis (MCA) and intrinsic image formation to better understand the proposed shadow removal methodology using MCA.
Using sparse coding, an image y can be expressed as a set of few elementary signals taken from an over-complete dictionary A , subject to α should be sparse.
|
( 1) |
Considering noise and sparsity constraint, we can add a regularization parameter λ and reformulate (1) as
|
( 2) |
Solution to the above problem is NP hard; however, many convex [32] , non-convex [33] optimization and greedy approximation algorithms [34] and [35] exist in literature to deal with problems having the above formulation. Since norm 2 minimization is equivalent to norm 1, L1 regularized LS also gives solution for the above problem [32] , [36] and [37] .
In dictionary learning, the algorithm is given samples of the form y = Aα , where Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://test.scipedia.com:8081/localhost/v1/":): {\textstyle \alpha \in {\mathfrak{R}}^m}
is an unknown random sparse vector and A is an unknown dictionary matrix in Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://test.scipedia.com:8081/localhost/v1/":): {\textstyle {\mathfrak{R}}^{n{_\ast}m}} . The goal is to learn A and α from given y such that
Dictionary learning can be formulated as given in (3) . For fixed dictionary solve system of equations subject to α is sparse. Then, for fixed α , update A.
|
( 3) |
where k denotes sparsity.
Dictionaries can be fixed, global or local. Global dictionaries are built from clean patches of selected images of a database. Earlier, in the sparse coding area, focus was mostly given to fixed over-complete dictionaries, such as wavelets and discrete cosine transform [38] . These approaches are called generic since the dictionary is pre-defined. Local dictionaries are learned online and hence more adapted to the input image. Different methodologies exist in dictionary learning literature, starting from fixed dictionaries to online dictionaries [39] . Fixed dictionaries find sparse approximations of the set of training signals for fixed dictionary, whereas optimized dictionaries keep sparse signals fixed and build optimized dictionaries [40] and [41] . MOD, K-SVD [42] , and online dictionary learning [39] and [43] are the popular algorithms in the area.
Morphological Component Analysis is used for the separation of the components of an image having different morphologies. MCA and Basis Pursuit are based on sparsity, but MCA is much faster and is capable of handling large data sets.
Consider an image y having ‘s ’ morphological components, such that Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://test.scipedia.com:8081/localhost/v1/":): {\textstyle y=\sum_{k=1}^Sy_k}
, where yk denotes the k th geometric or textural component of y . To decompose the image y into Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://test.scipedia.com:8081/localhost/v1/":): {\textstyle {\left\{y_k\right\}}_{k=1}^S} , the MCA algorithm finds the sparsest solution over the dictionaries Ak such that
|
( 4) |
where αk denotes the k th sparse solution
MCA algorithm uses fixed generic dictionaries such as wavelets and curvelets in representing geometric component and DCT for textural components. A major step in this algorithm is the selection of dictionaries that are mutually incoherent. Further details about the MCA algorithm can be found in Reference [24] .
An invariant image is invariant to illumination, color and intensity. Illumination invariant image is invariant to illumination. Shadow is caused by illumination and hence illumination invariant image is free from shadows. However, invariant image formation degrades photo quality of image. The proposed method of shadow removal uses invariant image formation proposed by Finlayson et al. [28] . Finlayson et al. say that the correct angle of projection is one with minimum entropy in the resulting invariant image. The result is a gray scale image that is independent of shadows.
In the proposed approach, shadow removal is formulated as a variation of image decomposition problem using MCA, which uses separate local dictionaries for shadow and nonshadow parts. The input image is initially used to generate an illumination invariant image [28] in which shadows are absent. The formation of invariant image also results in the loss of photo quality of image; hence, texture and edge information is lost and we need to bring back these fine details of the image to get shadow-free resultant image. Invariant image is then subtracted from input image to get the residual image that consists of shadow and geometric information of the input image. Sparse coding using orthogonal Matching Pursuit is applied using locally learned dictionaries. Finally, missing geometric components are reintegrated to the invariant image to get the resultant shadow-free image. The major steps during the shadow removal process is given in Fig. 1 (a)–(f).
|
Fig. 1. (a) Input image I ; (b) shadow free invariant image IN of input I ; (c) IS = I − IN ; (d) geometric component; (e) output of the proposed approach; and (f) ground truth. |
Major differences between the proposed approach and MCA are
Input shadow image I is initially used to generate an illumination invariant image, IN , in which shadows and relevant geometric information are absent. Difference between input image I and illumination invariant image IN generates a residual image IS where the shadows and the other geometric information are retained. Thus, I is now split into two images; the shadow image and the nonshadow image, I = IN + IS . Dictionary learning generates a dictionary Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://test.scipedia.com:8081/localhost/v1/":): {\textstyle A_{I_S}}
using the training patches extracted from IS . Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://test.scipedia.com:8081/localhost/v1/":): {\textstyle A_{I_S}} is further divided into two sub-dictionaries, which represent the shadow and the geometric components of IS , respectively. The problem of shadow removal using the proposed approach is formulated as given in (5) . Sparse coding is performed using Mallat and Zhangs Orthogonal matching pursuit [34] .
|
( 5) |
where Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://test.scipedia.com:8081/localhost/v1/":): {\textstyle y_{I_S}^k\epsilon R^n}
denotes the k th patch extracted from IS . Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://test.scipedia.com:8081/localhost/v1/":): {\textstyle {\alpha }_{I_S}^k\epsilon R^m} are the sparse coefficients of Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://test.scipedia.com:8081/localhost/v1/":): {\textstyle y_{I_S}^k} with respect to Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://test.scipedia.com:8081/localhost/v1/":): {\textstyle A_{I_S}\epsilon R^{n\times m}} and L denotes the maximum number of nonzero coefficients of Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://test.scipedia.com:8081/localhost/v1/":): {\textstyle {\alpha }_{I_S}^k} .
The proposed approach of shadow removal uses separate local dictionaries for shadow and nonshadow parts, instead of using single global dictionary. Even though shadows are affected globally, we perform decomposition and reconstruction locally, using local dictionary. Hence, using local dictionaries can contribute more results than using global dictionary. The proposed work mainly focuses on the use of a local dictionary or an adaptive strategy that builds the dictionary from the exemplar patches extracted from the shadow and the nonshadow regions of residual image (IS ). The reasons for not selecting a single global dictionary are
Initial dictionary Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://test.scipedia.com:8081/localhost/v1/":): {\textstyle A_{I_S}}
is built from a set of overlapping patches from an image having shadow part IS . Dictionary learning for shadow detection problem can be formulated as follows
|
( 6) |
where αk denotes the sparse representation coefficients of yk based on Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://test.scipedia.com:8081/localhost/v1/":): {\textstyle A_{I_S}}
, and λ represents regularization parameter.
To solve (6) , we used an online dictionary learning algorithm proposed by Mairal et al. [43] . The atoms constituting Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://test.scipedia.com:8081/localhost/v1/":): {\textstyle A_{I_S}}
are further divided into two classes representing the geometric (Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://test.scipedia.com:8081/localhost/v1/":): {\textstyle A_{I_S\mbox{_}G}} ) and the shadow components (Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://test.scipedia.com:8081/localhost/v1/":): {\textstyle A_{I_S\mbox{_}S}} ). Figure 2 (a)–(f) illustrates the steps involved in partition. Dictionary partition is performed by computing texton histogram followed by k-means clustering. The sum of the norm of each atoms texton histogram in a cluster is computed and the one giving the minimum norm belongs to a shadow cluster and the other cluster is a geometric component. This is based on the observation that norm of shadow cluster when ploted always gives less area under the curve, whereas geometric cluster gives higher value for the same. Thus
|
( 7) |
|
Fig. 2. (a) Input image I , (b) invariant image; (c) IS = IN − I ; (d) initial dictionary learned from Is (e) Local Shadow dictionary; (f) Local Geometric dictionary. |
The MCA algorithms distinguish between the morphological diversities of geometric, Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://test.scipedia.com:8081/localhost/v1/":): {\textstyle A_{I_S\mbox{_}G}}
, and shadow, Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://test.scipedia.com:8081/localhost/v1/":): {\textstyle A_{I_S\mbox{_}S}} , sub-dictionaries by using mutual incoherence between them. The mutual coherence Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://test.scipedia.com:8081/localhost/v1/":): {\textstyle \mu \left(A_{I_S\mbox{_}G}\mbox{},\mbox{ }A_{I_S\mbox{_}S}\right)} of Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://test.scipedia.com:8081/localhost/v1/":): {\textstyle A_{I_S\mbox{_}G}} and Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://test.scipedia.com:8081/localhost/v1/":): {\textstyle A_{I_S\mbox{_}S}} is defined as
|
( 8) |
where Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://test.scipedia.com:8081/localhost/v1/":): {\textstyle d_{Gi}\mbox{},\mbox{ }d_{Sj}}
stand for the i th and j th atoms in Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://test.scipedia.com:8081/localhost/v1/":): {\textstyle A_{I_S\mbox{_}G}} and Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://test.scipedia.com:8081/localhost/v1/":): {\textstyle A_{I_S\mbox{_}S}} , respectively and Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://test.scipedia.com:8081/localhost/v1/":): {\textstyle \left\langle d_{Si}\mbox{},\mbox{ }d_{Gj}\right\rangle } denotes the inner product of Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://test.scipedia.com:8081/localhost/v1/":): {\textstyle d_{Si}\mbox{},\mbox{ }d_{Gj}} . When each atom is normalized to have a unit l2 norm, the range of µ (A1 , A2 ) is [0, 1]. The mutual incoherence of Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://test.scipedia.com:8081/localhost/v1/":): {\textstyle d_{Gi}\mbox{},\mbox{ }d_{Sj}} is Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://test.scipedia.com:8081/localhost/v1/":): {\textstyle 1-\mu \left(A_1\mbox{},\mbox{ }A_2\right)} . A smaller value of mutual coherence indicates that there are much diversities in the two sub dictionaries and decomposition will be better.
OMP algorithm is applied to each Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://test.scipedia.com:8081/localhost/v1/":): {\textstyle y_{I_S}^k}
extracted from IS via minimization of (5) based on the two dictionaries Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://test.scipedia.com:8081/localhost/v1/":): {\textstyle A_{I_S\mbox{_}S}} and Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://test.scipedia.com:8081/localhost/v1/":): {\textstyle A_{I_S\mbox{_}G}} to find its sparse coefficients Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://test.scipedia.com:8081/localhost/v1/":): {\textstyle {\tilde{\alpha }}_{I_S}^k} . To recover geometric component Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://test.scipedia.com:8081/localhost/v1/":): {\textstyle I_S^G} of IS we use reconstructed patch Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://test.scipedia.com:8081/localhost/v1/":): {\textstyle y_{I_S}^k} as follows.
in Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://test.scipedia.com:8081/localhost/v1/":): {\textstyle {\tilde{\alpha }}_{I_S}^k} is set to zeros to obtain Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://test.scipedia.com:8081/localhost/v1/":): {\textstyle {\tilde{\alpha }}_{I_SS}^k} . Similarly, the corresponding coefficients of Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://test.scipedia.com:8081/localhost/v1/":): {\textstyle A_{I_S\mbox{_}S}} in Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://test.scipedia.com:8081/localhost/v1/":): {\textstyle {\tilde{\alpha }}_{I_S}^k} are set to zero to obtain Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://test.scipedia.com:8081/localhost/v1/":): {\textstyle {\tilde{\alpha }}_{I_SG}^k} .
can be used to recover Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://test.scipedia.com:8081/localhost/v1/":): {\textstyle I_{S_G}} by averaging the pixel values in overlapping regions. Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://test.scipedia.com:8081/localhost/v1/":): {\textstyle y_{I_S}^k} can be re-expressed as geometric and shadow components as follows.
|
Finally, the shadow-removed image can be obtained via Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://test.scipedia.com:8081/localhost/v1/":): {\textstyle I_{shadow\mbox{_}free}=I_N+} Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://test.scipedia.com:8081/localhost/v1/":): I_{S_G}
. Instead of iteratively performing sparse coding, here, MCA based image decomposition is performed only one time for each patch Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://test.scipedia.com:8081/localhost/v1/":): {\textstyle y_{I_S}^k} with respect to Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://test.scipedia.com:8081/localhost/v1/":): {\textstyle A_{I_S}=\left[A_{I_S\mbox{_}G}\vert A_{I_S\mbox{_}S}\right]}
The entire algorithm is summarized in Algorithm 1: Non-iterative MCA based shadow removal.
This section gives detailed description about the implementation details, the dataset used, evaluation metrics and performance of the proposed shadow removal methodology.
Guo et al. [6] provide datasets for shadow detection as well as removal which are available in the University of Illinois at Urbana-Champaign database (UIUC).1 This dataset consists of 108 indoor as well as outdoor image pairs, photographed under a variety of illumination conditions. As this dataset also provides shadow-free image and ground truth of shadow region, it is useful for the evaluation of detection and removal.
Evaluation of the proposed method is performed using RMSE (Root mean squared error). Most of the shadow removal works reported in still images have not performed quantitative evaluation of the result. Guo et al. [6] , Miyazaki et al. [14] Gong and Cosker [15] , and Gryka et al. [16] are the major works that provide quantitative evaluation. Other notable works provide only visual comparison.
Experimental evaluation was performed on both indoor and outdoor images from UIUC database [6] . The images used in this paper include human subjects and natural objects in various environments comprising different background materials and textured surfaces. Figure 3 gives the sample result of few images using the proposed method. The procedure and implementation details of the proposed method are described as follows.
|
Fig. 3. Results of the proposed shadow removal process: Left – input image I ; Middle – output of the proposed approach; Right – Ground truth. |
The implementation of the intrinsic image is adopted from Finlayson et al. [28] . Successful removal of shadow depends to a great extent on this first phase. This method is simple and works by finding the invariant direction followed by generating a gray scale image. Finally, an L1, a chromatic intrinsic image that is free of shadows, is formed. The method does not need any sort of camera calibration or prior knowledge about the image. Invariant image results in the loss of fine details, namely edges and geometric features. The proposed method tries to bring back these fine details into this invariant image to get the final shadow-free image. A plot given in Fig. 4 shows the difference in RMSE for invariant image and resultant image using proposed approach for the entire set of 108 images from the UIUC dataset. It can be observed that the RMSE of the proposed approach always remains better than that of the invariant image.
|
Fig. 4. RMSE for the invariant image and the resultant image for the entire dataset. It can be observed that RMSE of the proposed approach always remain less than that of the invariant image. |
In the dictionary learning step, we used online dictionary learning proposed by Mairal et al. [43] to solve (6) with the suggested regularization parameter λ set to 0.15. Initial dictionary Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://test.scipedia.com:8081/localhost/v1/":): {\textstyle A_{I_S}}
is built from a set of overlapping patches from residual image IS (see Fig. 2 (c) and (d)). Size of dictionary is set as 1024 and the dictionary learning iteration is fixed as 100. For dictionary learning, residual image of size 200 × 200 is used, from which patches of size 18 × 18 are extracted. The criteria for selecting patch size is further explained in this section. K-means clustering is used to classify initial dictionary into shadow and geometric dictionary as the proposed approach uses separate local dictionaries for shadow and geometric parts. Figure 2 (e) and (f) shows a sample of shadow and geometric dictionaries. Sparse coding was performed using Mallat and Zhangs Orthogonal Matching pursuit [34] .
We measure the diversities of two sub dictionaries, shadow and geometric (Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://test.scipedia.com:8081/localhost/v1/":): {\textstyle A_{I_S\mbox{_}G}}
and Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://test.scipedia.com:8081/localhost/v1/":): {\textstyle A_{I_S\mbox{_}S}} ), using mutual coherence µ defined in (8) . Smaller value of mutual coherence leads to much diverse sub-dictionaries and improves the decomposition based on the two sub-dictionaries. As the dictionary atom size increases, mutual coherence also increases. Size of dictionary atoms influences the mutual coherence of the sub-dictionaries. We tried to plot the dictionary atom size versus the mutual coherence of local, global and generic dictionaries in Fig. 5 . For generic dictionaries such as Haar wavelet, DCT and Gaussian, mutual coherence decreases as atom size increases, whereas for local and global dictionaries, mutual coherence increases along with atom size. Also, µ of local dictionaries is less compared to that of global dictionaries. From Fig. 5 its clear that as atom size increases, mutual coherence of local and global dictionaries also increases, whereas decreasing atom size affects the computation time. So, we try to keep atom size as small as possible so that it always maintains a reasonable computation time during the online dictionary learning process. Hence, we fixed the patch size as 18 × 18, which in turn is equal to atom size 324. Even though µ of fixed generic dictionaries is less, the resultant shadow-free image using local dictionaries has less RMSE than the one using generic dictionaries.
|
Fig. 5. Dictionary atom size versus Mutual coherence (µ ) of generic, local and global dictionaries. |
The major difference between the proposed method and MCA was explained in Section 3 . MCA algorithms are iterative, whereas the proposed MCA approach is non-iterative. Even if the algorithm iterates for a few number of times, the result somewhat remains the same and there is no notable difference in the RMSE of the resultant image. This is clear if we observe the plot given in Fig. 6 , where RMSE of the proposed approach for image in Fig. 2 over 100 iteration is given. The intention of Fig. 6 is just to clarify that RMSE value never changes over iteration. Another difference from MCA is that the proposed approach uses local sub-dictionaries, whereas the MCA approach uses fixed generic dictionary. Plot given in Fig. 6 also gives the difference in performance using RMSE value of the resultant image using fixed dictionary, local sub-dictionary and after invariant image formation. The proposed approach using local dictionary substantially improves the performance.
|
Fig. 6. RMSE vs Iteration for image in Fig. 2 using generic dictionary, local dictionary and invariant image. Proposed approach is non-iterative and even if the algorithm iterates for 100 times, RMSE value of the resultant image remains almost constant. |
Quantitative evaluation of the proposed method has been performed using RMSE. Table 1 gives the performance of the proposed approach in terms of RMSE using generic, local and global dictionaries. Performance is comparatively better when using local dictionary. Even though µ of generic dictionaries is comparatively lower, RMSE of shadow removal using the proposed approach is better. The method has been quantitatively compared with Guo et al. [6] and Gryka et al. [16] and found to be better in terms of RMSE. Details of evaluation using images from UIUC dataset are given in Table 2 . The proposed method has been visually compared using results from Gong and Cosker [15] and is given in Fig. 7 .
Generic dictionary | |||||
---|---|---|---|---|---|
1[[#tn0010|a]] | 2[[#tn0015|b]] | 3[[#tn0020|c]] | Global Dictionary | Global + Local Dictionary [[#bib0135|[26]]] | Proposed Approach |
14.67 | 14.85 | 13.93 | 14.10 | 13.5 | 12.23 |
a. Haar wavelet packet and DCT dictionary.
b. DCT and shifted Kronecker delta.
c. iid Gaussian entries with zero mean.
Methodology | RMSE |
---|---|
Guo et al. 2013 [[#bib0035|[6]]] | 19.85 |
Gryka et al. 2015 [[#bib0085|[16]]] | 13.83 |
Invariant image [[#bib0145|[28]]] | 15.10 |
Proposed approach | 12.23 |
|
Fig. 7. Result of the proposed shadow removal process and Gong and Cosker [15] : Left – Input image I ; Middle – output of the proposed approach; Right – Result of Gong and Cosker [15] . |
Table 3 gives a qualitative evaluation of six major shadow removal algorithms – Miyazaki et al. [14] , Lalonde et al. [10] , Guo et al. [6] , Gong and Cosker [15] , Zhu et al. 2015 [11] and Gryka et al. [16] – to compare their performance with the proposed approach using properties such as computational load and preservation of textures.
Author | Method | Preservation of Texture | Computational load |
---|---|---|---|
Miyazaki et al. [[#bib0075|[14]]] | Interactive | Excellent | Low |
Lalonde et al. [[#bib0055|[10]]] | Automatic | Good | Low |
Guo et al. [[#bib0035|[6]]] | Automatic | Excellent | Medium |
Gong and Cosker [[#bib0080|[15]]] | Interactive | Good | Low |
Zhu et al. [[#bib0060|[11]]] | Automatic | Average | Medium |
Gryka et al. [[#bib0085|[16]]] | Interactive | Excellent | Low |
Proposed | Automatic | Excellent | Low |
Average running time of an image from UIUC dataset [6] is 78.34 ± 18 s/image, out of which 69.6 ± 10 is for dictionary learning. In comparison, for the same dataset, the method by Guo et al. [6] takes 104.718 s/image for shadow removal. Computational time of the proposed method is better compared with state of the art methods.
We have presented a method to remove shadow from an image using the morphological diversities of shadow coupled with invariant image formation. In the proposed approach, shadow removal is formulated as a decomposition problem that uses separate local dictionaries for shadow and nonshadow parts, without using single global dictionary. Local dictionaries used in the proposed approach are learned from the patches extracted from the residual of image obtained after invariant image formation. Finally, a variation of MCA-based image decomposition using local dictionaries is performed to add the geometric component to the non-shadow part of the image to obtain the shadow-removed version of input image. The proposed approach of shadow removal works well for indoor and outdoor images, and the method has been compared with previous approaches and was found to be better in terms of RMSE.
Published on 07/04/17
Licence: Other
Are you one of the authors of this document?