Hyperspectral Blind Unmixing using a Double Deep Image Prior

—With the rise of machine learning, hyperspectral image (HSI) unmixing problems have been tackled using learning-based methods. However, physically meaningful unmixing results are not guaranteed without proper guidance. In this work, we propose an unsupervised framework inspired by Deep-Image-Prior (DIP) that can be used for both linear and nonlinear blind unmixing models. The framework consists of three modules: (1) an Endmember estimation module using DIP (EDIP), (2) an Abundance estimation module using DIP (ADIP), and (3) a Mixing module (MM). EDIP and ADIP modules generate endmembers and abundances, respectively, while MM produces a reconstruction of the HSI observations based on the postulated unmixing model. We introduce a composite loss function that applies to both linear and nonlinear unmixing models to generate meaningful unmixing results. Additionally, we propose an adaptive loss weight strategy for better unmixing results in nonlinear mixing scenarios. The proposed methods outperform state-of-art unmixing algorithms in extensive experiments conducted on both synthetic and real datasets.


I. INTRODUCTION
In hyperspectral imaging (HSI) technology, sensors are able to capture the spectral reflectance of every pixel in a scene across hundreds or thousands of spectral channels [1].This wealth of spectral information enables more accurate material identification compared to RGB imaging [2].However, the observed reflectance is usually a mixture of the spectral signatures of the materials present in the scene due to the heterogeneity of the scene [3].Consequently, there is a need for methods that can quantitatively decompose, or unmix, the captured spectral signature into its spectral components, also referred to as "endmembers," and their corresponding proportions within the mixture, referred to as "abundances" [4], [5].
The linear mixture model (LMM), shown in Fig. 1a, ideally models observed HSI signatures as a linear combination of endmembers' signatures weighted by their corresponding abundances [1].However, when there are nonlinear effects such as multiple scattering, LMM is no longer applicable, because the signatures captured by sensors result from interactions with various materials at different levels/layers [3].
To address this nonlinearity, a nonlinear mixing model (NLMM) is proposed [3].Two popular NLMMs are intimate mixture and multilayered mixture, illustrated in Fig. 1b and Fig. 1c, respectively.In intimate mixture, different materials This paper was produced by the IEEE Publication Technology Group.They are in Piscataway, NJ.
Manuscript received April 19, 2021; revised August 16, 2021 are close to each other, while in multilayered mixture, interactions with different materials occur in different layers.Under both LMM and NLMM, the HSI blind unmixing problem involves two tasks [3]: (a) endmember estimation and (b) abundance estimation.Generally, endmember estimation methods are based on geometrical approaches, assuming that the data is embedded in a simplex whose vertices correspond to the endmembers.Two widely used methods that fall under this category are vertex component analysis (VCA) [6] and simplex volume maximization (SiVM) [7].Conversely, most abundance estimation methods in the literature are based on LMM [1].For example, when the endmembers have been estimated by endmember estimation methods, the linear unmixing problem can be reduced to a least square problem, which can be solved using a fully constrained least squares (FCLS) [8] solver.If the endmembers are known in the form of a rich spectral library, the abundance estimation problem can be formulated as a linear sparse regression (SR) problem [9], which has been tackled by methods such as sparse unmixing by variable splitting and augmented Lagrangian (SunSAL) [9] and collaborative SUnSAL (CLSunSAL) [10].There are also abundance estimation methods that take into account NLMM, such as the hierarchical Bayesian algorithm [11], where the unknown model parameters are estimated using the Metropolis-within-Gibbs sampling algorithm.Recently, a robust sparse unmixing (RSU) method with ℓ 2,1 -norm based loss function [12] has been proposed by considering the nonlinear terms as outliers.RSU is solved by the alternative direction method of multipliers (ADMM) [13].
Some methods perform endmember estimation and abundance estimation simultaneously, known as blind unmixing (BU).Nonnegative Matrix Factorization (NMF) [14]- [16] is an example of such a method, which decomposes a nonnegative observations matrix into two nonnegative matrices, where the endmember signature matrix and abundance matrix are interpreted as the endmembers and abundances, respectively, in the context of HSI.Another widely used BU method is Entropic Descent Archetypal Analysis (EDAA) [17], which is based on the concept that HSI data is generated by a linear combination of a small number of archetypes, representing the extreme points of the HSI data, and these archetypes are interpreted as endmembers.Some BU algorithms inspired by Nonnegative Blind Source Separation (nBSS) techniques [18]- [21] have been proposed by incorporating various constraints.For example, HiSun [22] introduces the John ellipsoid (JE) in nBSS to tackle ill-conditioned BU problems.However, these traditional approaches can be computationally complex.HyperCSI [23] proposes a fast BU algorithm by exploiting the fact that the simplest simplex of N vertices can be defined by N associated hyperplanes.More recently, machine learning techniques, particularly neural networks, have been utilized to tackle HSI unmixing tasks, resulting in numerous learning-based approaches [24]- [27].These methods can be broadly classified into two categories: supervised and unsupervised.Supervised methods are trained using pairs of HSI reflectances and their corresponding abundances [27], [28], which enables the models to map HSI spectra to corresponding abundances.In contrast, unsupervised methods learn a function to estimate endmembers and abundances from only the HSI reflectances, without knowledge of true abundances.These blind unmixing methods [29]- [33] utilize an autoencoder network structure with a linear decoder to reconstruct HSI spectra.The bottleneck of the autoencoder provides the abundance estimation, while the weights of the linear decoder give the endmember estimation.However, these methods are only effective in solving the linear blind unmixing problem, and cannot be applied to nonlinear blind unmixing problems [24].
Recently, [24] proposed a deep autoencoder to address the nonlinear blind unmixing problem that involves an additive nonlinear mixture component.Similarly, [26] introduced a novel nonlinear autoencoder structure by incorporating a crossproduct layer to account for nonlinear mixing mechanisms.In contrast, EGU-Net [2] models the nonlinear mixing of materials as spectral variability and solves the unmixing problem using a two-stream deep network that learns an additional network from the (nearly) pure endmembers extracted from HSI data via existing endmember extraction methods.
However, deep learning-based methods may produce unmixing results that lack physical meaning without appropriate guidance [2].To address this issue, UnDIP [34] has proposed using the simplex volume maximization algorithm to extract endmembers, which are then utilized as guidance to train an abundance estimation network using a deep image prior (DIP).
Despite these developments, the quality of guidance remains a bottleneck for unmixing networks trained with guidance, as we demonstrate later.Furthermore, most learning algorithms are unable to generalize from linear to nonlinear unmixing problems, as they rely on the autoencoder structure to solve either linear or nonlinear problems.To address these issues, we propose a novel guidance-based framework that utilizes double deep image prior techniques, which can overcome the performance limitations of existing guidance-based methods.Furthermore, the proposed method does not rely on the popular autoencoder structure and can be applied to both linear and nonlinear blind unmixing problems.We refer to our method as BUDDIP, and it offers the following contributions: 1) A novel framework, BUDDIP, is proposed that builds on the deep image prior (DIP) technique [35].Unlike the commonly used autoencoder structure, BUDDIP consists of three modules: an endmember estimation DIP (EDIP) module, an abundance estimation DIP (ADIP) module, and a mixing module (MM).BUDDIP has the flexibility to solve both linear and nonlinear blind unmixing problems.2) Instead of using random noise as input like other DIP based methods [34], [35], the proposed framework takes meaningful input by exploiting existing unmixing methods such as SiVM.A novel DIP network structure that is more efficient than the common one used in DIP based methods [34] is designed based on this input strategy.3) Inspired by training guidance, we propose a new composite loss function that can be applied to both linear and nonlinear blind unmixing cases.From the perspective of unmixing, the proposed loss function ensures that the network produces physically meaningful unmixing results and yields better endmember and abundance estimation than the guidance.From the perspective of network training, the proposed loss terms can be viewed as regularizations that alleviate overfitting problems in DIP techniques.4) In the nonlinear blind unmixing case, due to the complexity of nonlinear unmixing, we also propose an adaptive loss weight strategy to yield better unmixing results.5) Extensive numerical experiments on both synthetic and real datasets show that the proposed methods outperform state-of-the-art methods such as EGU-Net [2], UnDIP [34] and others [14], [36], [37], under both linear and nonlinear unmixing cases.The paper is structured as follows: Section II introduces related works.Section III describes the construction of each module of the proposed networks and how to train the apparatus in an entirely unsupervised manner.Section IV presents extensive experimental comparisons with competing methods.Finally, Section V concludes the paper.

II. RELATED WORK
We now overview some related works.

A. Linear Mixing Model (LMM)
The linear mixing model (LMM) is one of the most popular unmixing models in hyperspectral unmixing literature, which assumes that, for each pixel, the reflectance spectrum is a linear combination of the spectrum of the endmembers weighted by the corresponding abundances [1], [4].This model can be described as follows: where Y = [y 1 , ..., y n ] ∈ R p×n is a HSI data cube containing the reflectance spectra of n pixels across p spectral bands, i.e.,, y i ∈ R p×1 is the spectra of i th pixel; E = [e 1 , ..., e r ] ∈ R p×r is the endmember matrix containing the spectral signatures of r endmembers across p spectral bands, i.e., e i ∈ R p×1 models the spectra signature of the i th endmember (i = 1, ..., r); A = [a 1 , ..., a n ] ∈ R r×n is the corresponding fractional abundance matrix, i.e., a i ∈ R r×1 is the abundance vector containing the abundances of r different endmembers present in the i th pixel; and N ∈ R p×n represents additive white gaussian noise.
It should be noted that we discuss with flattened HSI image Y ∈ R p×n , flattened abundance map A ∈ R r×n , and flattened noise N ∈ R p×n for the purpose of simpler notation, but we actually work with the HSI image of size n 1 × n 2 , i.e., Y ∈ R p×n1×n2 , A ∈ R r×n1×n2 , and N ∈ R p×n1×n2 .Generally, the abundance is subjected to the non-negative constraint (ANC) and sum-to-one constraint (ASC), i.e., all the elements are equal to or greater than zero, A ≥ 01 and the abundance of each pixel should sum up to one, A T 1 r = 1 n , where 1 r is the all one vector with size r × 1.Similarly, the endmember matrix is also subjected to non-negative constraint (ENC), E ≥ 0, in order to be physically meaningful.
The goal of blind linear unmixing is to estimate E and A given only Y.A popular approach to address this type of problem involves solving [38]: where, R(A) is a regularizer depending on abundance matrix A, such as total variation (TV) [38].Generally, the choice of R is heavily dependent on the prior knowledge about the task at hand.The optimisation problem (2) is typically solved by the multiplicative update rule [15] or a two-stage cyclic descent method [38], which alternates between optimising A for fixed E, and vice versa.Despite its popularity and simplicity, the linear model can not handle more complex hyperspectral unmixing scenarios [2] such as those depicted in Fig. 1b and Fig. 1c.

B. Nonlinear Mixing Model
As an alternative to LMM, the nonlinear mixing model (NLMM) [11], [12], [39], [40] accounts for the presence of nonlinear effects by introducing additional nonlinear interaction terms in the LMM.Generally, a NLMM can be expressed as follows: where Y, E, A, N are akin to the same quantities appearing in (1) and O denotes the additional term accounting for nonlinear mixing effects.( 3) is also known as robust LMM (rLMM), and O is used to denote the outlier terms in [14].NLMM has been widely proposed in hyperspectral imaging literature, that differ from one another depending on how O is modelled.For example, the Bilinear model [39] is one of the most popular variants of NLMM, which models the observed spectra y k for the k th pixel as follows: where, y k , a k , n k correspond to the k th column vector of Y, A, N in (3).⊙ is the element-wise (Hadamard) product.The coefficient β i,j,k captures the degree of nonlinear interactions between the endmember e i and e j .For more comprehensive discussions regarding the Bilinear model, please refer to the Supplementary.The goal of blind nonlinear unmixing is also to estimate E and A (sometimes the outlier/nonlinear terms O) given only Y. Recently, this problem has been solved in [14] via minimising the following objective: The objective (5) defines a robust nonnegative matrix factorisation (NMF) problem and can be solved via an iterative blockcoordinate descent algorithm [14], which update each of the parameters E, A, O in turn while the other parameters are fixed.

C. Deep Image Prior (DIP)
DIP [35] was originally proposed to solve inverse problems such as denoising, given by: where, x 0 is a noisy image, and R is a regularizer explicitly capturing the prior information about the clean image x.Many contributions have concentrated on designing good regularisers, for example, Total Variation (TV) makes regions in the  image more uniform and plug-and-play prior [41] connects image inverse problems with well-developed image denoiser.DIP however solves the optimization problem given by: where, f θ (z) is a neural network parameterized by θ, with random input z.That is, DIP effectively replaces the regularizer R in (6) with a neural network.According to the DIP techniques [34], [35], the optimisation problem ( 7) can be solved by randomly initialising the parameter θ, and using a network optimiser such as gradient descent to update θ iteratively until a predetermined number of iterations is reached.After learning, the network parameterization would implicitly capture the prior and output the restored image given by x * = f θ * (z).
The DIP technique has been used to solve linear unmixing problems in UnDIP [34], where the endmember matrix is estimated via the existing method SiVM, which is used to guide the training of an abundance estimation network using a DIP network.However, UnDIP itself does not provide endmember estimation and the quality of its abundance estimation is limited by the quality of the guidance.Another problem of DIP techniques is that they often suffered from overfitting, which is usually solved by regularisation techniques such as the early stopping technique [35] and exponentially weighted averaging over the outputs from different runs [34], respectively.We next describe how we adapt the DIP technique to solve these issues.

III. BUDDIP
We now introduce the proposed general framework for blind hyperspectral linear and nonlinear unmixing tasks using the double deep image prior (BUDDIP).
The proposed BUDDIP network is a self-supervised endto-end network, which consists of three modules: EDIP, ADIP and MM, as shown in Fig. 2(a).EDIP is responsible for endmember estimation, whereas ADIP is responsible for abundance estimation.After obtaining the estimation of endmembers and abundances, the output of EDIP and ADIP modules will then be injected into MM to generate a reconstruction of the observed hyperspectral spectrum.
The ability to choose the MM module leads to the flexibility of the proposed BUDDIP.When MM is chosen to be an LMM, the proposed BUDDIP becomes linear BUDDIP (L-BUDDIP), which is capable of solving linear unmixing tasks.Similarly, when MM is chosen to be an NLMM, the proposed BUDDIP becomes non-linear BUDDIP (NL-BUDDIP), which can be used to solve non-linear unmixing tasks.In order to generate meaningful unmixing results, the proposed BUDDIP is trained with the guidance of endmembers and abundances, which are generated using any existing unmixing methods such as SiVM [7]+FCLS [8].However, different from other unmixing networks trained with guidance such as UnDIP [34] or EGU-Net [2], the performance of the proposed BUDDIP can surpass that of the guidance.

A. Endmember Estimation using DIP
We now introduce how to solve the endmember estimation problem using DIP.Like the two-stage cyclic descent method [38], let us first assume that, at the endmember estimation stage, we are given access to an estimate of the abundance matrix A G .This could be achieved by using some existing algorithms such as FCLS [8].Then, the optimization problem (2) would reduce to: In this work, following the idea of DIP, we propose to estimate the endmembers using a DIP network (EDIP) f θ E with learnable parameters θ E and an input T E .This leads to the optimization problem given by: In the original DIP [35] and unmixing work using DIP UnDIP [34], the network is fed with a random input, i.e., T E = z E , where z E is Gaussian noise in [34] and uniform noise between zero and 0.1 in [35].However, there are significant drawbacks to using random input as it does not contain any relevant information about the task or data that the network is designed to process.Another observation in [42] is that, when the network is given simultaneously a noisy observation and random noise, the network tends to ignore the noise.To address these drawbacks, we propose to give the DIP network a more meaningful input.Since there are many impressive unmixing works in the hyperspectral literature, we propose to use existing unmixing algorithms, such as SiVM [7], to generate an estimate of endmembers, E I , and use it as input to the proposed EDIP network, T E = E I .This can be viewed as a noisy estimation of the ground truth endmembers.
With the proposed input strategy, the network would already have a reasonable starting point and only need to learn the difference between the desired output and the input.This further motivates the design of the architecture of the proposed EDIP network f θ E .In particular, we use the ResNet [43] like structure, because the skip-connections proposed in ResNet can force the network to learn the difference between the input and output.Different from the common ResNet structure used in DIP and UnDIP, in this paper, we use a simpler network shown in Fig. 3.
The proposed EDIP network f θ E is built upon a block that consists of a Convolutional layer, a Batch normalisation layer, and an Activation layer.This type of block is very popular in neural network architectures like ResNet in literature [35], [44].However, different from the common 2D convolutional layer used in literature, we use the 1D convolutional layer in the proposed EDIP, because 2D convolutional networks are commonly used to tackle image related problems, where the image is represented as a 3D tensor.However, in the proposed EDIP network, the input and output are the endmember signatures represented as a 2D matrix.The 1D convolution will be performed over the spectral band dimension p in order to capture the spectral information as there is no spatial information in the endmember matrix.This block is repeated twice in the middle.As mentioned before, we want the network to learn the difference between the input and desired output, which is achieved via a skip connection shown in the side branch of Fig. 3.The output of the main branch and the skip connection will be added and injected into the final output block.The structure of the output block is the same as before except that the activation layer is replaced with the Sigmoid activation layer in order to meet the ENC constraint.After learning the parameter θ * E , the estimated endmember is given by Ê = f θ * E (E I ).

B. Abundance Estimation using DIP
We now introduce how to derive the DIP network for abundance estimation.Let us now assume that, at the abundance estimation stage, we are given access to an estimate of the endmembers E G , using some algorithm such as SiVM [7].Then, the optimization problem (2) would reduce to: We also propose to generate the estimation of the abundances using a DIP network (ADIP) f θ A with an input T A and learnable parameters θ A .This leads to the optimization problem given by: Similar to EDIP, we give the proposed ADIP a more meaningful input instead of random noise z A .We propose to use existing unmixing algorithms to generate an estimate of abundance A I , and use it as the input of the proposed ADIP network, T A = A I .
Akin to EDIP, the core of designing the architecture of the ADIP network is to force the network to learn the difference between the input and desired output.As a result, we use a similar structure as EDIP with several modifications, which are shown in Fig. 4. First of all, the ADIP network is used to estimate the abundances, which is represented as a 3D tensor like an image.Thus, the convolutional layer in the ADIP network is a 2D convolution, which can capture spatial information via various convolutional kernels.Because the abundance maps are more complex than the endmember matrix, we use four blocks to capture the rich information in abundance maps.Another difference is that, in order to meet ASC and ANC, we use softmax as the output layer of the ADIP network.After learning the parameter θ * A , the estimated abundance is given by Â = f θ * A (A I ).C. Blind Unmixing using Double DIP Finally, we will introduce how to construct the proposed general framework for blind unmixing using double DIP (BUDDIP).
After obtaining an estimation of endmember and abundance, Ê and Â, using EDIP and ADIP respectively, we can immediately generate a reconstruction of the observed HSI image by feeding Ê and Â into a Mixture Module (MM).The structure of the MM, which we will introduce later, is very flexible and dependent on which physical model is selected.After combining MM with EDIP and ADIP, the architecture of the proposed general BUDDIP framework is shown in Fig. 2.This general framework can be used to solve both linear and nonlinear unmixing problems, by applying different MM.
1) Linear BUDDIP (L-BUDDIP): In the LMM case, according to (1), the MM would perform the linear mixing process as follows: where ŶM is the reconstructed HSI observation.This is illustrated in Fig. 2(a)&(b).We coined it L-BUDDIP.
2) Nonlinear BUDDIP (NL-BUDDIP): In the NLMM case, according to the model (3), we can generate a reconstruction of the observed HSI image by feeding Ê, Â into the MM, which would perform the nonlinear mixing process as follows: where, is responsible for the linear effects in NLMM, and would account for the nonlinear effects in NLMM, where Ô = [ô 1 , ..., ôn ] ∈ R p×n .The choice of f N L is very flexible and dependent on which nonlinear model is selected.For example, in the FM model [45], according to (4), f N L would output Ô as follows: An illustration of the proposed NL-BUDDIP is shown in Fig. 2(a)&(c).Note that for other models such as GBM with extra parameters, those parameters can be learned via another DIP network according to [44].

D. Training Details
We now elaborate on how we optimize the various parameters associated with BUDDIP.
1) Loss function: According to the discussion above, the network has generated three estimations: Ê for endmembers, Â for abundances, and ŶM for reconstructed HSI observation, as shown in Fig. 2. Correspondingly, we propose to train our BUDDIP network using six loss functions (two for each output).To make the notations used later clear, we summarise some notations in Table.I.
First, for endmember estimation Ê by EDIP f θ E (E I ), motivated by the optimization problem (9) and the angle distance loss [37], we propose to use the loss function given by: where, α 1 and α 2 in (17) are the hyperparameters that control the relative importance of the corresponding loss term.ŶE is the HSI reconstruction by EDIP f θ E (E I ), given the guidance A G .y k , ŷE k are the k th pixel of HSI observation Y and HSI reconstruction ŶE , respectively.The proposed loss function in (17) is an extension of (9) that includes an additional angle distance loss.The first loss term, L EM SE , in (17) minimizes the discrepancy between Y and ŶE in Euclidean distance, similar to the optimization problem in Eq. ( 9).The second loss term, L EAng , provides a measure of disparity from a geometric perspective.As demonstrated in [37], the inclusion of this additional angle loss term can enhance the performance of the unmixing network.
As discussed before, in (18), the EDIP network f θ E is fed with a meaningful input E I , where E I is an estimate of endmembers generated via some existing unmixing algorithms.Similarly, A G is an estimate of abundances generated by some existing unmixing algorithms, which would serve as the guide to the training of the EDIP network.The guidance A G would guarantee the EDIP network to yield a meaningful estimation of endmembers.In this paper, E I and A G are generated by existing unmixing methods such as SiVM [7] and FCLS respectively.
Similarly, for abundance estimation Â by ADIP f θ A (A I ), we propose to use the loss function given by: where, and α 3 and α 4 are loss weights of L AM SE and L AAng .ŶA is the HSI reconstruction by ADIP f θ A (A I ), given the guidance E G .y k , ŷA k are the k th pixel of HSI observation Y and HSI reconstruction ŶA , respectively.Again, in (20), we need a pair of estimations A I and E G in order to provide ADIP network with a meaningful input and training guidance.Like before, A I and E G are generated by existing unmixing methods such as SiVM [7] and FCLS respectively.
Moreover, for reconstructed HSI observation ŶM , which is defined in (12) and (13), we also impose an additional loss function -the blind unmixing (BU) loss -given by: where, and α 5 and α 6 are the loss weights.y k , ŷM k are the k th pixel of HSI observation Y and HSI reconstruction ŶM , respectively.This additional loss L BU is necessary because otherwise EDIP and ADIP would yield endmember and abundance estimates close to guidance E G and A G , respectively.
The final loss function for linear blind unmixing is a combination of the above losses, as follows: where, L EDIP , L ADIP , L BU are defined in (17), (19) and (21).From the perspective of unmixing, the terms L EDIP and L ADIP in the composite loss function ensure that the network produces meaningful endmembers and abundances, in the sense that Ê, Â cannot deviate too much from E G , A G .At the same time, L BU allows the network to have the freedom to search for better estimations than the guidance E G , A G .From the perspective of network training, L EDIP and L ADIP can be interpreted as regularizations on the outputs of the BUDDIP network, Ê and Â, with L BU serving as the data fidelity term.This composite loss function can therefore alleviate the need for techniques such as early stopping [35] and exponentially averaging over different runs [34].By properly choosing the hyperparameters α 1∼6 , the network outputs will eventually reach an equilibrium state between inducing small fitting error (L BU ), and regularisation penalty, (L EDIP and L ADIP ).
In this paper, the proposed loss function ( 23) is applied to both linear and nonlinear scenarios.While it is possible to modify the form of ŶA and ŶE to accommodate nonlinear reconstruction of hyperspectral data, we choose to retain the linear form of these terms in the nonlinear case for the sake of simplicity and consistency between the linear and nonlinear cases.However, the purpose of L EDIP and L ADIP is to ensure that the network produces meaningful output, which is still valid under nonlinear conditions without the need for modification.This is because the nonlinear model (3) still contains a linear component.For this reason, the meaningful input E I , A I , and training guidance E G , A G are generated using the same unmixing algorithms (SiVM+FCLS) for both linear and nonlinear cases.
Different from the two-stage-cyclic descent method [38], the proposed network is trained in an end-to-end manner.Given only the HSI image Y, the learnable parameters {θ E , θ A } are learned by minimizing the composite loss in (23), using a variant of gradient descent optimizer, ADAM [46].Algorithm 1 Adaptive Loss Weight Strategy.

INPUT :
• α init 1∼6 -the initial value of loss weights • γ 1 , γ 2 -the rate of updating loss weights • α min , α max -the boundary of loss weights • g -update gap • J -the number of training epochs.TRAIN : 2) Adaptive loss weight strategy for NL-BUDDIP: To further improve the nonlinear unmixing performance, we also propose an adaptive loss weight strategy.If α 1∼4 are relatively larger than α 5∼6 , the network would simply mimic the guidance E G and A G On the contrary, if α 5∼6 is larger, the network would simply output meaningless endmembers and abundances.Thus, for the nonlinear case, we propose an adaptive loss weight strategy.The motivation is that, at the beginning of the training, we would like the network to converge to the guidance fast, so the loss weight of EDIP and ADIP, α 1∼4 should be larger.After approaching the guidance, the network should have more freedom to explore for a better solution, so we need to reduce α 1∼4 whilst increasing α 5∼6 .Finally, to avoid the weight exploding and vanishing problems, we need to set a threshold for the loss weights.This strategy is summarised in Alg. 1.

IV. EXPERIMENTS
We now conduct various experiments on both synthetic and real datasets, for linear and nonlinear unmixing tasks, to show the effectiveness of the proposed methods.In view of the space constraints, we have included more experiments in the supplementary materials.

A. Performance Metrics
We adopt some of the most popular metrics in literature to evaluate the unmixing performance of various algorithms.In particular, we employ root mean square error (RMSE), and abundance angle distance (AAD) [16], between the true abundance vector a k for k th pixel and the corresponding estimation âk , to measure the quality of abundance estimation.These metrics are given by: These metrics are then further averaged over the number of pixels to yield the final scalar quantities.
for the endmember estimation, we also adopt the wellknown spectral angle distance (SAD) to measure the dissimilarity between a true endmember signature e i and the corresponding estimate êi , where SAD is given by: This metric is instead averaged over the various endmembers to give the final scalar quantity.

B. Data
We also use both synthetic and real datasets to evaluate the performance of various unmixing algorithms.
1) Synthetic Dataset 1: We adopt the popular procedure in [27] to generate the synthetic HSI dataset for the linear unmixing problem.For the nonlinear unmixing case, we only need to modify the final linear mixing process with the corresponding nonlinear mixing counterpart.The procedure is as follows: • Endmember generation.The endmembers are generated by selecting mineral signatures from the famous USGS spectral library known as splib06 [47].The library contains values of spectral reflectance associated with different minerals over 224 channels, ranging from 0.4 µm to 2.5 µm.Six spectral signatures are randomly selected from the library, resulting in a 224×6 endmember matrix.These endmember signatures are shown in Fig. 5.
• Abundance generation.The abundances underlying the synthetic data is generated as follows.First, we divide a synthetic image of size a 2 × a 2 pixels into a 2 disjoint patches of size a × a pixels.Second, for all pixels of a given patch, we randomly select two endmembers out of the six spectral signatures, and assign them with fractions γ and 1 − γ, while the remaining four endmembers are assigned with value 0. Finally, the abundance map is convolved with a Gaussian filter of size (a + 1) × (a + 1), with variance set to be 2, followed by pixel-wisely rescaling to meet the ASC constraint.In this paper, we set a = 10 and γ = 0.8.• Mixing process.According to different mixing models (1) or (3), we generate correspondingly linear or nonlinear mixing synthetic data.• Noise contamination.Finally, the generated HSI data is contaminated with additive white gaussian noise (AWGN).The signal-to-noise ratio (SNR) is defined as SN R = 10 log 10 E[x T x]/E[n T n] , where x is the clean HSI data, and n is the noise.

2) Synthetic Dataset 2:
To test the effectiveness of the proposed methods when there is no pure pixel in HSI dataset, we also adopt the synthetic procedure in [23], [48]- [50], where a purity measure for an observed pixel y k is defined as ] (due to the ANC and ASC constraint).The larger ρ k is, the higher the purity of the pixel y k is.A set of n observed pixels {y k } n k=1 with ρ − 0.1 ≤ ρ k ≤ ρ is called a dataset with purity level of ρ.The synthetic generation procedure is basically the same as the procedure of synthetic dataset 1 except for the abundance generation, which is as follows: • Randomly sampling K = 10n abundance vectors from a Dirichlet distribution D(µ) where µ = (1/r)1 r , that is, and calculate the corresponding purity ρ k for all k.
• Construct a set of n abundance vectors with purity level ρ by randomly choosing n samples from Ω subject to ρ k ∈ [ρ − 0.1, ρ]. 3) Real Dataset: We also adopt the commonly used real HSI datasets Jasper Ridge [51], which contains 512 × 614 pixels associated with 224 spectral bands ranging from 380 nm up to 2500 nm with a spectral resolution of 9.46 nm.There are four endmembers in the scene: Road, Soil, Water, and Tree.A 100×100 pixels sub-images of the original images is considered in this article to lift the computational burden and deploy faster experimental studies.Due to dense water vapour and atmospheric effects, We also eliminate 26 spectral bands: channels 1-3, 108-112, 154-166 and 220-224, leaving only 198 out of the 224 bands for unmixing purposes.A representative image -associated with the 80th spectral band -is shown in Fig. 6. C. Evaluation on synthetic data 1) Linear Unmixing: We now compare the performance of our approaches under the linear mixing model using synthetic datasets, with that of some state-of-art learning-based methods, UnDIP [34], EGU-Net-ss [2], MNN-BU-2 [36], U-ADMM-BUNet-II [37], CNNAEU [52] and MiSiCNet [53].We also include the traditional method SiVM [7]+FCLS [8], Hyper-CSI [23], HiSun [22], and EDAA [17], where SiVM+FCLS is used to generate the training guide for some of the learningbased methods.Unless explicitly mentioned, the experiments are conducted under the default setting, where the structure of the proposed network is summarised in Table .The proposed network is trained using the Adam optimizer with a learning rate set to 5e − 3, and the number of epochs set to 6000.The synthetic HSI image dataset for both synthetic dataset 1&2 is constructed under the linear mixing model, which consists of 100 × 100 pixels.We also contaminate these HSI reflectances with AWGN leading to SN R = 30 dB.
Convergence Analysis: This experiment presents an analysis of the convergence behavior of the proposed method, by comparing the training process and blind unmixing performance under two different settings.In the first setting, denoted as "setting 1", the network is trained without any regularization or guidance by deactivating the L EDIP and L ADIP terms and setting α 1∼4 = 0 and α 5 = 1, α 6 = 0.1.In the second setting, denoted as "setting 2", the L EDIP and L ADIP terms are activated by setting α 1 = α 3 = α 5 = 1.0, α 2 = 0.001, α 4 = 0.01, and α 6 = 0.1.The number of epochs is set to 12000 for analysis purposes, while the remaining settings are kept as default.
The training process and the corresponding unmixing performance are plotted in Fig. 8 as a function of the number of epochs.In setting 1, the network converges at around 1200 epochs and achieves the smallest fitting error L BU M SE , but the worst unmixing performance, as evidenced by the high AAD and SAD values shown in Fig. 8c and Fig. 8d, respectively.Furthermore, the endmember SAD values increase with more epochs, indicating overfitting.Moreover, the regularisation penalties L EM SE , L AM SE , L EAng , and L AAng in Fig. 8a and Fig. 8b are also high.
In contrast, in setting 2 where all the regularization terms are activated, the network reaches an equilibrium state at around epoch 6000 as shown in Fig. 8, where it minimizes both the fitting error (L BU M SE and L BU Ang ) and the regularization penalties (L EM SE , L AM SE , L EAng , and L AAng ).This setting also yields improved unmixing performance, with an abundance AAD of approximately 4.4 and an endmember SAD of 1.67.Therefore, the proposed method does not require the use of techniques such as early stopping [35] or exponentially weighted averaging of outputs from multiple runs [34], since the L EDIP and L ADIP terms serve as effective regularizations.We choose 6000 epochs as a trade-off between training efficiency and unmixing quality.
Comparison of different input: We now evaluate the effectiveness of the proposed input strategy using synthetic dataset 1.We retain the default settings for BUDDIP except that the network now is fed with two types of input: gaussian input as UnDIP suggested [34], and the proposed input.The unmixing performance with respect to training epochs are shown in Fig. 9.It can be seen that with the proposed input strategy, the network achieves better unmixing when the number of epochs is small, say, 300.We attribute this to the fact that the proposed input can be viewed as a noisy observation of the underlying ground truth.Thus, the task of the network is to generate more refined unmixing results given the noisy input.In comparison, with the gaussian input, the network needs to generate unmixing results given noninformative input.It is also noticeable that, with the proposed input, BUDDIP also delivers better unmixing results when the number of epochs is large.
Performance vs. Purity Level ρ: In this section, we evaluate the impact of purity level ρ on the linear unmixing performance of various unmixing methods using synthetic dataset 2. Specifically, we retain the default settings and the purity ρ varies in [0.8, 0.9, 1.0], where ρ = 0.8 indicates the case where the HSI data is highly mixed and ρ = 1.0 means there are highly pure pixels.We also train the proposed method with guidances generated from two different methods, SiVM+FCLS and HyperCSI, respectively.The unmixing performance is summarised in Table .III.It is evident that, when the data is highly mixed, i.e., ρ = 0.8 and ρ = 0.9, HyperCSI performs the best among the various competing methods.Nevertheless, when the proposed method is trained with the guidance from HyperCSI, it achieves almost 1.6 times better performance than HyperCSI.For example, when HyperCSI shows abundance AAD of 2.19 and endmember SAD of 0.80 at purity level 0.8, the proposed method can further improve it to AAD of 1.97 and SAD of 0.49.In the meantime, although SiVM+FCLS failed to deliver satisfactory unmixing performance, the proposed network can use the unmixing results from SiVM+FCLS as the training guidance and further improve the performance by three times, from an abundance AAD of 11.4 to 3.82.At a purity level of ρ = 1.0, the proposed L-BUDDIP guided with SiVM has the lowest RMSE and AAD, as well as the second best SAD.Comparing the performance of L-BUDDIP trained with different guidance, it can be seen that the performance of the proposed method is affected by the quality of the guidance used during training.A higher quality guidance leads to better performance of the proposed method.Overall, the proposed L-BUDDIP performs well across different purity levels.Additionally, the proposed method can utilize the unmixing results of the state-of-the-art as training guidance (represented by E G and A G ) to further improve its performance.
Processing Time Comparison: In this section, we compare the processing time of various unmixing methods.The test platform is equipped with Intel Xeon Gold 6248 CPU 2.50GHz, Tesla V100 GPU, 503GB RAM.The results for linear unmixing case and nonlinear unmixing case are shown in Table .III and Table .IV, respectively.The quantities are obtained by averaging over ten individual runs.It is evident  that the network-based unmixing methods generally outperform traditional unmixing methods in terms of processing time, with the exception of HyperCSI.Among the networkbased methods, The proposed method stand out as the fastest unmixing approach.
2) Nonlinear Unmixing: We now evaluate the unmixing performance of the proposed NL-BUDDIP with various stateof-art unmixing methods, such as network based methods UnDIP [34], EGU-Net-ss [2], MNN-BU-2 [36], U-ADMM-BUNet-II [37], CNNAEU [52] and MiSiCNet [53], as well as the traditional methods SiVM [7]+FCLS [8], and rNMF [14], HyperCSI [23], HiSun [22], and EDAA [17], using synthetic dataset under FM model.Correspondingly, in the proposed network, the Mixing Module (MM) would use the FM model [45] as defined in (4), while the structure of EDIP and ADIP module are the same as those summarised in Table .II.We also use the Adam optimiser with learning rate set to 5e − 3 and the number of epochs set to 12000.By Fixed vs. Adaptive Loss Weight Strategy: We now evaluate the impact of the proposed adaptive loss weight strategy using the nonlinear HSI synthetic dataset 1 under FM model.Specifically, we vary the hyperparameters α init 2 , α init 4 , γ 1 , γ 2 , while the remaining are fixed as the default setting.The unmixing performance is reported in Fig. 10.The dots with γ 1 = γ 2 = 1 are results of the fixed loss weight strategy, while the others are results of the adaptive loss weight strategy.It is clear that, with the fixed strategy, the network can readily generate unmixing results better than the guidance.Nevertheless, with the proposed adaptive loss weight strategy, the network achieves the best unmixing performance in terms of both endmember SAD and abundance AAD.(1,10,0.8,0.9) (1,10,0.9,0.8) (1,1,0.8,0.9) (1,1,0.9,0.Performance vs. Purity Level ρ: In this section, we evaluate the impact of purity ρ on the nonlinear unmixing performance of the proposed methods using synthetic HSI dataset 2 under FM model.Specifically, we retain the default settings and the purity ρ varies in [0.8, 0.9, 1.0].The unmixing performance is summarised in Table .IV.It is clear that, at the purity level of ρ = 0.8, HyperCSI performs the best among the various competing methods with AAD of 4.6 and SAD of 1.5.On the other hand, although SiVM+FCLS only achieves AAD of 17.96 and SAD of 6.34, the proposed NL-BUDDIP can utilise those unmixing results as guidance and delivers 7 times better performance with AAD of 3.9959 and SAD of 0.8752, which is even better than HyperCSI.When ρ increases to 0.9 and 1.0, the proposed NL-BUDDIP achieves the best performance among all competitors.In general, the proposed NL-BUDDIP shows the state-of-the-art nonlinear unmixing performance across different purity levels.
The proposed L-BUDDIP with guidance from EDAA is trained using the hyperparameters α 1 = α 3 = α 6 = 0.01, α 2 = α 4 = 100, and α 5 = 1.0, while the hyperparameters for NL-BUDDIP with guidance from EDAA are set to α init = 0.01, γ 1 = γ 2 = 0.9, α min = 0.01, α max = 100, and g = 700.These hyperparameters were selected using grid search techniques.In contrast, the hyperparameters for L-BUDDIP with guidance from SiVM+FCLS were set to α 1 = 45.25, α 2 = 100, α 3 = 16.60,α 4 = 47.16,α 5 = 1.0, and α 6 = 0.08, which were determined using random search techniques.All networks were trained using a learning rate of 5e − 3 and epoch 6000.The qualitative results of estimated endmembers and abundances are shown in Fig. 11 and Fig. 12, respectively.The corresponding quantitative results are shown in Table .V. The results presented in Fig. 11 show that several existing methods, such as SiVM+FCLS, CNNAEU, UnDIP, and EGU-Net, failed to accurately estimate the signatures of the road, resulting in large endmember SAD values as indicated in Table .V.Moreover, both HyperCSI and HiSun failed to unmix the signature of water, while MiSiCNet produce the best estimation for the endmember of Tree for this dataset.In contrast, the proposed L-BUDDIP with guidance from SiVM+FCLS was able to recover more accurate endmember signatures for the road.Specifically, Table V shows that L-BUDDIP with SiVM+FCLS guidance achieved an Endmember SAD of 4.13 for the road, which represents a significant improvement compared to SiVM+FCLS, which achieved an SAD of 15.8 for the road.
Furthermore, when using guidance from EDAA, the proposed methods generated the most visually appealing unmixing results.The quantitative results presented in Table .V demonstrate that the proposed method has the best SAD for the road endmember and the overall average SAD among all endmembers.With improved endmember estimation, the proposed method also achieved the best abundance estimation in terms of RMSE and AAD.Moreover, NL-BUDDIP slightly outperformed L-BUDDIP in terms of abundance estimation.Although EDAA readily delivered good unmixing results with an AAD of 7.66, the proposed method was able to further improve the AAD to 5.2.Finally, the qualitative results presented in Fig. 12 indicate that the proposed method generates the best abundance estimation compared to its competitors.

V. CONCLUSION
In this work, we have proposed a general neural network structure that is capable of solving both linear and nonlinear hyperspectral blind unmixing problems.Different from the popular autoencoder structure, building upon Deep Image Prior (DIP) techniques, the proposed method consists of three modules: an Endmember estimation module using DIP (EDIP), an Abundance estimation module using DIP (ADIP), and a Mixing Module (MM).The EDIP and ADIP module   technique, we propose to use the estimations from existing unmixing methods as the input, based upon which, we design a more efficient DIP network structure with less learnable parameters.In order to generate meaningful unmixing results and further improve the unmixing performance, we propose a new composite loss function that is applicable in both linear and nonlinear unmixing cases.For the nonlinear case, we also propose an adaptive loss weight strategy to further improve the performance.Extensive experiments on both linear and nonlinear unmixing cases show that the proposed method can deliver better unmixing performance than the state-of-art unmixing methods such as SiVM+FCLS, rNMF, MNN-BU, U-ADMM-BUNet, EGU-Net, and UnDIP.

e 1 e 2 e 3 (Fig. 1 .
Fig. 1.Hyperspectral linear and nonlinear mixing scenarios.(a) Linear mixing.The image pixel is composed of 3 endmembers, e 1 , e 2 , e 3 .(b) Nonlinear mixing: intimate mixture.The image pixel is composed of a microscopic mixture of 3 endmembers, e 1 , e 2 , e 3 .(c) Nonlinear mixing: multilayered mixture.The image pixel is composed of 2 endmembers, tree and e 1 .Both individual reflection and interacted reflection reach the sensor.

( 21 )Fig. 2 .
Fig. 2. (a) The general architecture of the proposed BUDDIP.It consists of three modules: endmember estimation DIP (EDIP), abundance estimation DIP (ADIP) and mixing module (MM).The output of EDIP is denoted by the green arrow while that of ADIP by the blue arrow.(b) when MM performs LMM as defined in(12), we coined the whole model in (a) as L-BUDDIP, which can solve the linear blind unmixing problem.(c) While MM performs NLMM as defined in(13), we coined the whole model in (a) as NL-BUDDIP, which can solve the nonlinear blind unmixing problem.f L and f N L are defined in (14) and(15), respectively.

Fig. 3 .
Fig.3.The architecture of the proposed EDIP.We propose to give a meaningful input T E = E I , where E I is an estimation of endmember generated by existing methods, such as SiVM[7].The outputs of the main branch and the skipped input are added and forwarded to the final output block.

Fig. 4 .
Fig.4.The architecture of the proposed ADIP.We propose to give a meaningful input T A = A I , where A I is an estimation of abundances generated by existing methods, such as FCLS[8].The outputs of the skip connection and the main branch are concatenated and forwarded to the final output block.

Fig. 10 .
Fig.10.Fixed versus adaptive loss weight strategy under nonlinear unmixing case.X axis is abundance AAD and Y axis is endmember SAD.The dots with γ 1 = γ 2 = γ 3 = 1 are the fixed loss weight strategy, while the remaining are adaptive loss weight strategy.

TABLE I SUMMARY
OF SOME NOTATIONS.
E , θ A learnable parameters of proposed EDIP and ADIP network

TABLE III LINEAR
UNMIXING PERFORMANCE COMPARISON OF VARIOUS METHODS FOR DIFFERENT PURITY LEVEL ρ.THE BEST RESULTS ARE HIGHLIGHT IN BOLD.THE SECOND BEST ARE DENOTED WITH A STAR.

TABLE IV NONLINEAR
UNMIXING PERFORMANCE COMPARISON OF VARIOUS METHODS FOR DIFFERENT PURITY LEVEL ρ.THE BEST RESULTS ARE IN BOLD.THE SECOND BEST ARE DENOTED WITH A STAR.

TABLE V MEAN
AND STANDARD DEVIATION OF ABUNDANCE RMSE, AAD (IN DEGREES), ENDMEMBER SAD (IN DEGREES) BY DIFFERENT METHODS ON JASPER RIDGE.THE BEST RESULTS ARE IN BOLD.THE SECOND BEST ARE DENOTED WITH A STAR. 3011±0.04892.8836±0.0352⋆ 2.7488±0.02282.8532±0.0681⋆ 3.1966±0.0078⋆