Unified Neural Encoding of BTFs

Realistic rendering using discrete reflectance measurements is challenging, because arbitrary directions on the light and view hemispheres are queried at render time, incurring large memory requirements and the need for interpolation. This explains the desire for compact and continuously parametrized models akin to analytic BRDFs; however, fitting BRDF parameters to complex data such as BTF texels can prove challenging, as models tend to describe restricted function spaces that cannot encompass real-world behavior. Recent advances in this area have increasingly relied on neural representations that are trained to reproduce acquired reflectance data. The associated training process is extremely costly and must typically be repeated for each material. Inspired by autoencoders, we propose a unified network architecture that is trained on a variety of materials, and which projects reflectance measurements to a shared latent parameter space. Similarly to SVBRDF fitting, real-world materials are represented by parameter maps, and the decoder network is analogous to the analytic BRDF expression (also parametrized on light and view directions for practical rendering application). With this approach, encoding and decoding materials becomes a simple matter of evaluating the network. We train and validate on BTF datasets of the University of Bonn, but there are no prerequisites on either the number of angular reflectance samples, or the sample positions. Additionally, we show that the latent space is well-behaved and can be sampled from, for applications such as mipmapping and texture synthesis.


Introduction
One of the main goals in Computer Graphics is to create photorealistic renderings that appear plausible to the human eye. Beyond scene geometry, a lot of the visual plausibility comes from the realism of the rendered materials. While many analytic reflectance models can create realistic-looking materials, this is only one part of the challenge. The bigger part of the difficulty lies in matching the appearance of existing, real-world materials.
To tackle this issue, many research efforts have been focused on building devices that can take measurements of a given material's appearance. Typically, this means capturing calibrated photographs of a material sample at a number of different combinations of viewing and lighting direction (see [DvGNK99]). Rendering from such a discrete set of textures is quite impractical as it integrates suboptimally with the rest of the pipeline: at rendering time, we want to be able to query the appearance at an arbitrary view-light configuration, which has to be interpolated from the discrete set of measurements.
The obvious solution to this is to transform the discrete list of reflectance measurements at each point into a continuous representation that is practical for rendering. Analytic spatially-varying bidirectional reflectance distribution functions (SVBRDFs) are the most common choice of models for this purpose, and the straightforward approach would be to fit the parameters of an SVBRDF model given the measurements; however, this does not generalize well to measurements of many real-world materials. Most BRDF models make strong assumptions about the function they represent, such as Helmholtz reciprocity, energy conservation etc.; furthermore, they assume that the shape of the reflectance lobes can be approximated accurately with the analytic expression of the BRDF (often based on Gaussians).
In practice, most real-world materials are far from perfectly defined and isolated, which changes the reflectance response drastically. They contain many imperfections (dust, scratches, fuzziness etc.) which humans are perceptive of and which contribute to a more realistic appearance. For many spatially varying materials, the measurements at single positions also contain traces of light transport from nearby points (such as subsurface scattering, interreflections, shadowing etc.). Therefore we follow the convention of Koudelka et al. [KBMK01] and refer to the texel responses as A(pparent)BRDFs or reflectance functions. To reconstruct these ABRDFs accurately, we need more expressive models that make fewer assumptions about the data.
One possible approach to tackle this issue is to learn not only the set of parameters, but also the reflectance model (see [RJGW19]). Using a neural architecture similar to an auto-encoder, reflectance functions are projected to a parameter vector in latent space. The decoder (analogous to the BRDF model) is learned during training, along with the projection into latent space. Effectively, this means that no assumptions on the reflectance model are made. One of the main shortcomings of this approach, however, is that it does not generalize well across materials. A different instance of the network is trained for every new material BTF, meaning that the parameter space is not shared between materials.
Ideally, we desire an infrastructure that is common to all materials, a way of encoding them all to the same space. This would make the neural reflectance models truly analogous to analytic parametric BRDFs. In this paper, we present our new unified architecture that is trained on a wide range of BTF texels and projects them all to a common latent space, and investigate the flexibility, stability and robustness of such an encoding. Furthermore, we demonstrate the practical advantage of such a unified architecture for efficient encoding of ABRDFs of a new, previously unseen, material.

Related Work
Bidirectional Texture Functions (BTFs) were first introduced by Dana et al. [DvGNK99]. Intuitively, they are organized as a stack of 2-dimensional textures, where each texture corresponds to the material's appearance under a certain combination of viewing and lighting directions. Relinquishing spatial information, one can look at individual texel responses as reflectance measurements of a single point on the material. In that sense, individual texel responses are a list of reflectance values at different angles on a particular position on the material defining its appearance function. Although considered individually, in the case of BTFs, these local responses still contain non-local lighting effects such as subsurface scattering or interreflection.
Fitting parametric models Researchers have often employed fitting of a parametric or analytic model to explain real-world reflectance data. Here, early work in the context of BTFs used polynomials [MGW01] and Lafortune lobes [MLH02] to model the directional dependence of each texel. Later approaches model directional variation as mixtures of parametric models [WDR11, SPS13]. As a side effect, parametric methods often provide physically meaningful and potentially user-editable quantities characterizing the geometry (e.g. surface normals), surface albedo, etc. [LBAD*06, MG09, AWL13]. Related to our process of learning a decoder, Genetic Programming (GP) has previously been employed to learn new analytic BRDF models that better describe specific materials [BLPW14]. Unfortunately, analytic BRDF models are generally not sufficiently expressive to capture the rich variety of local reflectance behavior observed in real-world materials, which leads to significant residuals compared to the original data. Accordingly, the residual of the fit is often kept and compressed separately [MCT*05, WDR11]. Parametric methods also make additional assumptions about the data and the materials: fitting methods generally require close-to-perfect registration of the BTF data, parallax correction, as well as a clearly defined opaque material surface. Some or all of these assumptions may be violated when acquiring materials that do not occupy a clearly defined two-dimensional surface.
Latent spaces of appearance Researchers have also investigated finding a common parameter space for real-world measurements of appearance. Soler et al. [SSN18] have proposed a non-linear manifold using a Gaussian-process latent-variable model that is suitable for interpolation over the space of measured materials (MERL database). Sun et al. [SJR18] have proposed a data-driven diffuse-specular separation which enables efficient material editing operations on the separated diffuse and specular components of measured BRDFs and a novel low-dimensional PCA model for measured BRDFs with similar dimensionality as analytic models. Lagunas et al. [LMS*19] instead learn a perceptual feature space for materials (based on data gathered from crowdsourced experiments) that correlates with perceived appearance similarity. In the context of their BTF compression scheme, Havran et al. [HFM10] extract common intrinsic data between materials, but in general there has been little work on finding a shared projection basis for BTFs.
Neural encoding of appearance Several recent works have applied neural networks to encode material representations or light transport in scenes ([RDL*15]). A comprehensive survey about deep appearance modelling can be found in [Don19]. Maximov et al. [MRF18] introduced the concept of "deep appearance maps", which use a small fully connected network as a material descriptor. Zsolnai-Fehér et al. [ZFWW18] employ a neural network to render previews of materials with static scene geometry. Also related to this is neural texture rendering [TZN19], which features texture maps along with a neural renderer, allowing much more information to be stored than in diffuse textures, such as specular highlights and parallax. Kuznetsov et al. [KHX*19] use Generative Adversarial Networks to avoid explicit modeling and simulation of the surface microstructure, achieving a more flexible representation of specularity.
Closest to our work is that of Rainer et al. [RJGW19], who first proposed neural encoding of BTFs. However, they employ a material-specific autoencoder architecture that, while very good at encoding and interpolation of a specific material, does not generalize to other BTFs. This means that the entire network has to be expensively re-trained for encoding a different material BTF. The lack of a common parameter space for encoding ABRDFs also means that their method cannot support applications such as appearance synthesis or interpolation in latent space. Instead, we present a unified BTF encoding architecture that makes such latent space operations possible, while also making it very efficient to encode a previously unseen material without requiring any re-training.

Problem Analysis
BTFs are spatially-varying reflectance maps, meaning they also contain information about the spatial layout of ABRDFs. To keep input complexity low, we choose to ignore the spatial disposition and process each texel individually, without making use of the neighboring information. This means we encode each Apparent BRDF separately. The difficulty lies in the fact that ABRDFs describe a larger space of possible appearances than BRDFs. Since during the acquisition of the BTF, the point captured in the ABRDF is surrounded by the rest of the material, under directional lighting, the measurements contain a lot of non-local lighting effects, such as subsurface scattering and interreflections. Having these effects in the measurements allows for a more realistic rendering in the end, but it also makes individual treatment of texels more complex. This is one of the reasons why standard BRDFs, which by design model light transport in a single isolated point only, are not an optimal choice to approximate ABRDFs.
Another difficulty comes from the sample spacing of the measurements. ABRDFs are in practice a list of reflectance values with the corresponding light and view directions, for one position on the material. Depending on the acquisition protocol, the number of entries in that list, as well as the light/view directions that were sampled, is variable. Since we want to design an approach to encode any set of BTF measurements, no assumptions on angular resolution and sampling pattern can be made. The only prerequisite we impose on the input data is that the hemispheres of lighting and viewing be sampled fairly uniformly and at a sufficient resolution to correctly sample most reflectance lobes. Fortunately, BTFs are usually sampled regularly in the angular domain, as there is little utility in adaptive sampling patterns for materials with spatial variations: a sampling strategy that is optimal for some set of points on the surface will be suboptimal for other points.
Since we want to learn the space of reflectance functions, beyond a mere mapping of ABRDFs to parameters, we model the entire process using neural networks. We refer the reader to [GBC16] for explanations of the deep learning concepts used in the following sections. Conceptually, our approach is close to an autoencoder ([HS06]): the input measurements are encoded to a vector in parameter space, which, when run through the decoding model, should approximate the input values as well as possible. In that sense, the decoding network is analogous to a BRDF model, with the difference that the decoding function is learned rather than analytically fixed.
Neural networks are known to yield excellent performance for a range of challenging problems when the input data is arranged on a regular grid (e.g. a 2D image), and the network architecture relies on convolutional layers that effectively constitute a type of regularization strategy. When processing ABRDFs, such an approach is unfortunately not possible, since their angular positions are irregular. The number of angular samples may even change from one ABRDF to another, which means that another standard neural network element, the fully-connected layer, is also not admissible. To handle the unstructured angular nature of the data, we introduce a new architecture that is invariant to both the number of angular measurements as well as their exact positions and ordering.

Method
In this section, we delve into the specifics of our neural encoding and decoding method. An input is an ABRDF, which we format as a list of n 7-dimensional entries: incoming light direction (2 dimensions), outgoing light direction (2 dimensions), and the respective RGB reflectance measurement (3 dimensions). The encoding structure (Figure 2) projects the input to a latent vector of small, fixed dimensionality. The decoding structure (Figure 3) is able to reconstruct the input ABRDF given the corresponding latent vector. Similarly to an auto-encoder, the full encoding-decoding pipeline is optimized to best approximate an identity transformation (at the angles sampled in the input).

Figure 2: Our architecture is conceptually an autoencoder for BTF texels that works for any angular sampling resolution and pattern. It encodes input ABRDFs of arbitrary ordering and length n to a low-dimensional latent vector. Using an MLP that predicts weights from angles, we build a weights matrix that the expanded RGB measurements are multiplied with. Averaging across the vertical dimension allows us to recover a 3-by-m feature matrix for any input, satisfying the BTF sampling invariance criterion. Intuitively this is equivalent to discrete angular integration of the product of reflectance signal with angular filters. The remainder of our encoding architecture consists of standard fully-connected networks with ReLU activations.
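The 7-dimensional entry format described above can be sketched as follows. This is only an illustration: the stereographic convention used here (projection of the upper hemisphere onto the unit disk from the pole opposite the surface normal), the function names and the toy reflectance values are assumptions, not taken from the paper.

```python
import math

def stereographic(direction):
    """Map a direction on the upper hemisphere (z >= 0) to the unit disk.

    Assumed convention: projection from the point (0, 0, -1)."""
    x, y, z = direction
    norm = math.sqrt(x * x + y * y + z * z)
    x, y, z = x / norm, y / norm, z / norm
    return (x / (1.0 + z), y / (1.0 + z))

def make_entry(light_dir, view_dir, rgb):
    """Build one 7-D ABRDF entry: 2 light coords, 2 view coords, 3 RGB values."""
    return stereographic(light_dir) + stereographic(view_dir) + tuple(rgb)

# A toy ABRDF with n = 2 measurements (reflectance values are made up).
abrdf = [
    make_entry((0.0, 0.0, 1.0), (0.0, 0.0, 1.0), (0.30, 0.20, 0.10)),
    make_entry((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.05, 0.04, 0.02)),
]
```

Because each entry carries its own angles, the list can have any length and any ordering, which is the property the encoder has to cope with.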
Neural Architecture Since we cannot make any assumptions about the input structure (in angular space), a flexible encoding pipeline is required. To satisfy this, we split the encoding network into two parts. In a first processing phase, a Multi-Layer Perceptron (MLP) outputs basis vectors of fixed dimensionality at each sampled angular position, which the reflectance measurements are projected on. Integration along the angular dimension reduces this to a fixed-size feature vector. In essence, this is a discrete approximation of an integration (in angular space) of the product of the reflectance lobes with learned filters.
The aim of this integration in the angular domain is to detect inherent properties of the reflectance functions through their responses to the filters. For instance, if the filter learned by the MLP were constant, the recovered response would be the mean reflectance, which is a good approximation of diffuse albedo.
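The discrete angular integration can be written compactly. The notation below is ours and purely illustrative: \omega_i denotes the i-th sampled light/view configuration, \rho_c the measured reflectance in color channel c, and w_j the j-th angular filter predicted by the MLP; the 1/n normalization corresponds to the averaging described for the angular MLP.

```latex
f_{c,j} \;=\; \frac{1}{n} \sum_{i=1}^{n} w_j(\omega_i)\, \rho_c(\omega_i),
\qquad c \in \{R, G, B\}, \quad j = 1, \dots, m

% Special case of a constant filter, w_j \equiv 1:
f_{c,j} \;=\; \frac{1}{n} \sum_{i=1}^{n} \rho_c(\omega_i)
\quad \text{(mean reflectance of channel } c\text{)}
```

The sum runs over however many samples are available, so the result has the same 3-by-m shape for every n and does not depend on the order of the samples.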
Another possibly more intuitive way of looking at our encoding approach is as an approximation of a linear layer. The most straightforward architecture would be a fully connected layer between the input list of RGB measurements and the latent vector. However, this is not possible because the ordering and number of angles in the input ABRDF can change between datasets. So instead, we use an MLP (parametrized on the light/view angles) to predict the weights that this fully connected layer would apply. Essentially, the MLP learns a continuous representation of the intended fully connected layer, in the angular domain.
Angular MLP The MLP takes the angles (in stereographic parametrization, similarly to [RJGW19]) of one light-view combination as input and returns a vector of weights. When processing a set of angular reflectance measurements, the angular MLP is run at every sampled light-view position, and we concatenate all m-dimensional output vectors into a weight matrix A. On the other side, the list of RGB reflectance values is expanded m times. Multiplying the resulting weight matrix elementwise with the list of reflectance values is then equivalent to a basic dot product of the angular filters with the reflectance signals. Among the reductions along the angular dimension that satisfy the invariance requirements, we determined empirically that an averaging operation produces the best results. This collapses the dimension of n elements to one, which means that independently of the ordering and the number of angular samples, the result of this operation is always a 3-by-m matrix. The output of this processing step is a feature vector of fixed dimensionality that we use as input to the more traditional encoder.
Consistently with the aim of the angular MLP to simulate a linear layer, we first apply a non-linear activation (parametric ReLU (Rectified Linear Unit) and addition of a bias) on the unrolled 3m-dimensional feature vector. The remaining part of the encoder is composed of standard fully connected layers with ReLU activations. In practice, the fully connected part of the encoder only contains one hidden layer before projection to the latent space. [...], the learned mapping to the latent space has limited complexity. The reason for this convolutional downsampling is that the input ABRDFs are extremely large (almost 80,000 values), so directly using fully connected layers is impractical. Furthermore, a fully connected layer would require fixing the angular sampling of the ABRDFs, which works for Rainer et al., who train a new network per material. In our case, the angular sampling can change between datasets, so we want to remain flexible. To achieve this, our encoder learns a continuous representation of these fully connected weights based on the respective light/view angles with a small MLP (50,000 parameters), which allows us to create an approximate version of this fully connected layer at any given input size.
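The pooling step of the angular MLP can be illustrated with a minimal sketch. The function `toy_angular_weights` is a hypothetical, fixed stand-in for the learned MLP, m is reduced to 4 for readability, and the input data is random; only the weight-times-reflectance product and the averaging over the n samples follow the description above.

```python
import math
import random

M = 4  # number of angular filters in this toy example (the paper uses m = 800)

def toy_angular_weights(lx, ly, vx, vy):
    """Hypothetical stand-in for the angular MLP: 4 angles -> m weights."""
    return [math.tanh((j + 1) * (lx - vy) + 0.5 * j * (ly + vx)) for j in range(M)]

def encode_features(abrdf):
    """Average weight * reflectance over all n samples, giving a 3-by-m matrix."""
    n = len(abrdf)
    features = [[0.0] * M for _ in range(3)]
    for lx, ly, vx, vy, r, g, b in abrdf:
        weights = toy_angular_weights(lx, ly, vx, vy)
        for c, value in enumerate((r, g, b)):
            for j in range(M):
                features[c][j] += weights[j] * value / n
    return features

rng = random.Random(0)
abrdf = [tuple(rng.uniform(-1, 1) for _ in range(4)) + tuple(rng.random() for _ in range(3))
         for _ in range(50)]
shuffled = abrdf[:]
rng.shuffle(shuffled)

f_a = encode_features(abrdf)
f_b = encode_features(shuffled)
# Largest entrywise difference between the two orderings (floating-point noise only).
max_diff = max(abs(a - b) for ra, rb in zip(f_a, f_b) for a, b in zip(ra, rb))
```

The feature matrix has the same shape for any number of samples, and reordering the input changes it only by rounding error, which is what makes the subsequent fully connected layers applicable.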
Decoder Network The decoder (Figure 3) is also a fully connected network with non-linear activations, following the same decoder design as Rainer et al. [RJGW19]. It takes as input the latent coordinates of the ABRDF, along with the light and view directions in stereographic coordinates, which makes it practical for rendering. In practice, we use 4 hidden layers with ReLU activations.
Training Specifications The entire architecture is trained end to end. To cheaply augment the data and to avoid overexposing the network to certain hues, we permute the RGB channels of input ABRDFs at every iteration. Furthermore, to make the network more robust to variations in the angular resolutions, the encoder receives a random subset of between 20% and 100% of the samples during training. The loss is still computed on the full set of angular samples, though. This ensures that even with a lower angular resolution, the projection still converges to the same position in latent space, and that the decoder interpolates smoothly between sampled angles.
We train on BTF texels from the Bonn BTF database [WGK14], which contains 7 material classes, each featuring 12 material BTFs. We train and test on the texels of 11 out of 12 BTFs of each class. The 12th material of each class is kept for validation and used for evaluation in the next section.
To keep training stable, the reflectance values in the ABRDFs are normalized in preprocessing, i.e., the mean gets subtracted and the resulting values are divided by their standard deviation. Additionally, to reduce the high dynamic range of measurements, a logarithmic transformation is applied to the values before the normalization.
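The two augmentations can be sketched as below. This is an assumption-laden illustration rather than the authors' training code: the function name `augment`, the uniform draw of the subset fraction and the rounding of the subset size are our choices. The subsampled list is what gets encoded, while the loss target keeps every sample.

```python
import random

def augment(abrdf, rng):
    """Return (encoder_input, loss_target) for one training iteration.

    abrdf is a list of 7-tuples (lx, ly, vx, vy, r, g, b)."""
    perm = [0, 1, 2]
    rng.shuffle(perm)  # one random RGB channel permutation for the whole ABRDF
    permuted = [e[:4] + tuple(e[4 + p] for p in perm) for e in abrdf]
    fraction = rng.uniform(0.2, 1.0)  # keep between 20% and 100% of the samples
    k = max(1, round(fraction * len(permuted)))
    encoder_input = rng.sample(permuted, k)  # random subset, order not preserved
    return encoder_input, permuted
```

Computing the loss on the full `permuted` list while encoding only `encoder_input` is what pushes sparse and dense versions of the same texel toward the same latent vector.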
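A minimal sketch of this preprocessing is given below. The exact logarithmic transform and the set of values over which mean and standard deviation are computed are not specified in the text, so `log(1 + x)`, the per-call statistics and the small `eps` guard are assumptions made for illustration.

```python
import math

def preprocess(values, eps=1e-8):
    """Log-compress the dynamic range, then standardize (zero mean, unit std)."""
    logged = [math.log1p(v) for v in values]  # assumed transform: log(1 + x)
    mean = sum(logged) / len(logged)
    std = math.sqrt(sum((x - mean) ** 2 for x in logged) / len(logged))
    normalized = [(x - mean) / (std + eps) for x in logged]
    return normalized, mean, std

def postprocess(normalized, mean, std, eps=1e-8):
    """Invert preprocess: undo the standardization, then the log transform."""
    return [math.expm1(x * (std + eps) + mean) for x in normalized]
```

The inverse is needed at render time, since the decoder output lives in the normalized log domain under these assumptions.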
Once training is completed, compressing the appearance of a BTF texel simply becomes a matter of evaluating the network given the corresponding list of measurements as input. For rendering, only the projected latent maps and the decoder layers are required.
Implementation Details In our implementation, the angular MLP consists of 4 hidden layers, each with 128 neurons plus ReLU activations, and m = 800. The MLP hence outputs 3 vectors of 800 weights for each angular configuration of light/view, which are multiplied elementwise with each RGB reflectance measurement (expanded 800 times). The encoder only consists of a PReLU activation, one hidden layer with 128 neurons and a ReLU activation. The decoder consists of 4 hidden linear layers of 106 neurons with ReLU activations (same architecture as [RJGW19]). Whilst those parameters remain fixed, we explore several possibilities of latent space dimensionality in the following section.
We train with standard stochastic gradient descent, learning rate of 0.2, batch size of 10, 100 ABRDFs per dataset per epoch. At every epoch, we load a new random set of 100 ABRDFs from each material BTF. We train for 1000 epochs, which takes about 40 hours on average on an NVIDIA GeForce RTX 2070. We found empirically that using an L1 loss gives our encoding a more accurate average hue and preserves more contrast than the L2 loss used by Rainer et al.
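To make the decoder dimensions concrete, the following sketch builds a decoder with 4 hidden layers of 106 neurons and evaluates it once. The random initialization, the 3-channel RGB output layer and all function names are assumptions for illustration; the latent dimensionality is left as a parameter since several values are explored.

```python
import random

def init_decoder(latent_dim, hidden=106, n_hidden=4, seed=0):
    """Random (untrained) weights for latent + 4 angles -> hidden layers -> RGB."""
    rng = random.Random(seed)
    sizes = [latent_dim + 4] + [hidden] * n_hidden + [3]
    layers = []
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        scale = (2.0 / fan_in) ** 0.5
        weights = [[rng.gauss(0.0, scale) for _ in range(fan_in)] for _ in range(fan_out)]
        layers.append((weights, [0.0] * fan_out))
    return layers

def decode(layers, latent, light_xy, view_xy):
    """Evaluate the decoder for one texel and one light/view configuration."""
    x = list(latent) + list(light_xy) + list(view_xy)
    for index, (weights, biases) in enumerate(layers):
        x = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
        if index < len(layers) - 1:  # ReLU on hidden layers only
            x = [max(0.0, v) for v in x]
    return x

def count_parameters(layers):
    """Total number of weights and biases."""
    return sum(len(w) * len(w[0]) + len(b) for w, b in layers)

decoder = init_decoder(latent_dim=32)
rgb = decode(decoder, [0.0] * 32, (0.1, -0.2), (0.0, 0.3))
n_params = count_parameters(decoder)
```

Under these assumptions the decoder stays below 40,000 parameters for a 32-dimensional latent vector, which is small enough to evaluate per shading sample.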

Results
To assess the performance of our network, we visualize reconstruction results on the 7 datasets used for validation (unseen at training/testing time). We compare to the architecture from [RJGW19], which we refer to as the custom network, as Rainer et al. train a new instance of the network for every material. In contrast, we refer to our architecture as the general network, as it is trained on many different BTF datasets and evaluated on unseen materials.

Accuracy
When dealing with BTFs, it is difficult to compare with ground truth, because as soon as rendering is performed, the original textures are (commonly linearly) interpolated, which introduces bias. The only available method is comparing reconstructions with the original textures at the angles that were sampled in the ABRDF. Figure 4 displays the texture reconstructions of each of the validation materials at one particular combination of light/view directions.
Comparisons in texture space In Figure 4, we compare the ground truth with the custom network of [RJGW19] trained on all Bonn training materials, except the validation materials displayed in the figure, to the custom network overfitted to the individual material, and to our network trained on all training materials. This is a skewed comparison in the sense that the networks of columns 2 and 5 are evaluated on unseen materials, while the network of column 3 was trained solely on the evaluated material.
Furthermore, we use the same latent space dimensionality and decoder size for all networks. This means they all have the same compression and decoding budget, allowing us to assess how well each architecture is able to learn a more general embedding. As the custom network in column 3 is overfitted to the specific material, it represents the upper performance bound that a general architecture could reach.
On average, the custom network overfitted to the specific material performs slightly better than our network. However, this is to be expected as the custom network uses all its encoding budget to cater to the specific appearance of the material, while our network adopts an average solution that works well for all classes of materials. In that sense, our network performs almost as well on an unseen material as the custom network on an overfitted material, given the same parameter budget.
Overall, the main drawbacks of the encoding are firstly a loss of spatial detail (the reconstructions are slightly blurrier than the original). This is most likely due to slight misalignment or parallax in the original data, which means that individual positions on the material still move around in texture space, making the information harder to encode when we process ABRDFs individually. The other issue seems to be a damping of specular highlights for some materials (e.g., Fabric12). The most likely explanation for this is that specular highlights only show in a small subset of captured angles, making this part of the signal less crucial to the reconstruction loss. Diffuse albedo, anisotropy, intershadowing, etc., play a much bigger role in the loss than localized specular highlights.
Influence of the number of latent dimensions In order to tackle these loss-of-detail issues, we investigate the influence of latent space dimensionality, i.e., how much storage budget is given to the network to encode each ABRDF. More latent dimensions means more specific reflectance details can be encoded on each direction (anisotropy, specularity etc.) and the network is given more parameters to separate similar-looking texels.
Rainer et al. set a standard for a reasonable reconstruction performance. When overfitting the projection to a specific dataset, 8 latent dimensions are a good compromise between maximizing reconstruction accuracy and minimizing storage. We attempt to find the best compromise between a small network and similar performance. Regarding the size of the decoder, Rainer et al. showed that 4 hidden layers represent the sweet spot in depth. We experimented with wider layers (more neurons), which also decreases the reconstruction error, but far less drastically than increasing the latent size. Additionally, it makes the network heavier and slower to train, as well as much slower to run for rendering applications. For this reason, we consider a latent size of 32 with the original layer width of 106 neurons to be the most efficient compromise for maximized performance at reasonable compression: this amounts to storing roughly 10 RGB textures for one encoded BTF material.
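The storage figure follows from simple arithmetic, sketched here. The count of 22,801 angular measurements per Bonn BTF is the one quoted in the evaluation on a different data source; the comparison ignores the (shared) decoder weights and any difference in numeric precision, and the variable names are ours.

```python
latent_dim = 32                # latent channels stored per texel
channels_per_rgb_texture = 3
bonn_angular_samples = 22801   # light/view combinations per Bonn BTF

# Number of RGB textures with the same channel count as the latent maps.
equivalent_rgb_textures = latent_dim / channels_per_rgb_texture

# Raw values per texel in the measured BTF versus the encoded one.
values_per_texel_raw = bonn_angular_samples * channels_per_rgb_texture
per_texel_ratio = values_per_texel_raw / latent_dim
```

Here `equivalent_rgb_textures` evaluates to about 10.7, consistent with "roughly 10 RGB textures", and each texel shrinks from 68,403 stored values to 32.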
Comparisons on renderings When rendering with the original dataset, the ground truth renderings are inevitably corrupted by linear interpolation between nearest sampled directions (usually 9 light/view combinations). Many very localized specular details can get lost or blurred out in the process. Hence, some additional reconstruction accuracy can be gained with networks that interpolate well in between the original sampled views, even if at the originally sampled positions (as in Figure 4) the reconstruction is not perfect. The stability of the neural interpolation was demonstrated in [RJGW19]: with superior interpolation capabilities, even if the reconstruction performance at originally sampled directions is not perfect, we still obtain a near-equal quality performance on renderings.
In Figure 5, we compare renderings with the custom overfitted network and with two instances of our architecture, to renderings with the original BTF. Both full BTF and neural rendering use the same rendering code as [RJGW19]. For reference, renderings of the BTF-mapped cylinders in Mitsuba, at 800 × 800 pixels, at 32 samples per pixel, parallelized on 10 processes, pathtraced with parallax mapping of the height field associated to the material, take 1.2 minutes on our machine for our general network with 32 latent dimensions, versus 1.1 minutes with the custom network of [RJGW19] at 8 latent dimensions.
In the third column we use our architecture with 8 latent dimensions and decoder layers of 106 neurons (same budget as [RJGW19]). In the fourth column, we show our model of choice, with 32 latent dimensions this time. The increase in encoding budget greatly improves the reconstruction, even though all the materials displayed are unseen by our network at training time (reserved for validation). This means our codec generalizes well outside of the training set.
It is apparent that our network performs very well at encoding spatial detail (most noticeable on the leather12 dataset), better than the custom overfitted network. Non-local effects like subsurface scattering and intershadowing are particularly well replicated by our architecture. However, for materials with sharp specular lobes (see wood12 and fabric12), some of the specular highlights are damped. In this area, the custom network remains slightly more faithful, albeit applying more spatial blur. This is most likely due to the custom decoder learning a specific reflectance shape that is characteristic of the overfitted material's texels. Our network, however, has to learn an average reflectance shape across many materials with very varied appearance. Specular highlights proportionally only play a small role as the appearance of most materials in the database is dominated by other properties (diffuse albedo, intershadowing, etc.). For visualization of temporal coherence, we provide animations of a BTF-textured plane under a moving point light source in the supplemental material, comparing the original BTF rendering to the custom network and our general network.
Evaluation on a different data source For additional comparisons, we process the first ten datasets of the UTIA MAM database [FKH*18] with our network (trained and tested solely on the Bonn datasets). The UTIA BTFs, too, are uniformly sampled, but are less dense, containing 6,561 angular measurements compared to 22,801 for Bonn. However, the materials are generally more specular and some of them contain transparent layers. To our knowledge, there is no heightfield parallax correction. Figure 6 shows point-light renderings of the first ten materials of the UTIA MAM database, rendered using the original BTF, the custom network of Rainer et al. [RJGW19] overfitted to the respective material, and our network's predicted latent maps. The most notable difference is that our network tends to dim the specular highlights that are truncated in the original dataset, because it has not seen any examples of those in the training data, whereas the overfitting network [RJGW19] can learn those. Overall, we observe the expected slight degradation compared to Rainer et al., but beyond that, our network generalizes plausibly to a new, unseen, data source.

Stability of the latent space
Robustness to angular resolution We show that our novel architecture can accommodate any structure of input sampling. To enforce this flexibility, at training time, we randomly feed between 20% and 100% of the angular inputs to the angular MLP and encoder. For reference, the 6,561 angular measurements of the UTIA datasets amount to approximately 28% of the angular resolution of the Bonn datasets. The loss is still computed at all originally measured angular combinations, to make sure the model learns correct interpolation. Using the validation datasets, we show the convergence of the reconstruction loss as a function of the size of the angular subset in Figure 7. From less than 10% of angular samples on (2,280 randomly chosen out of the original 22,801), the reconstruction is stable and has converged. Figure 8 shows the same comparison on reconstructed textures. Each texture was rendered using a latent map that was projected from a random subset (of the respective proportion) of angular samples. Visually and quantitatively, the reconstructed appearance converges at less than 10% of the original samples. Since the network only needs a tenth of measurements of the Bonn datasets for similar reconstruction quality, this would allow capture times of these datasets to be reduced tenfold. Even datasets with a low-resolution angular sampling will benefit from this encoding that creates an appearance   model that interpolates smoothly and behaves well using the knowledge it gained from the densely sampled training materials, which will produce better renderings than interpolating a sparsely sampled set of angles.
Filtering in latent space An important technique to facilitate rendering at different scales is spatial downfiltering of textures, commonly applied as mipmapping. Texture pyramids are typically precomputed before rendering, to allow texture lookups at different resolutions depending on the footprint of the material in the rendering. In Figure 9, we investigate the equivalence between downsampling in latent space before rendering and downsampling in reconstructed texture space. The difference is barely noticeable, which shows that filtering in latent space is very stable. Furthermore, this saves computation time, as it allows precomputation of latent mipmaps and avoids having to run the decoder multiple times.
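As an illustration of latent mipmapping, the following sketch builds a pyramid by repeatedly averaging 2x2 texel blocks of a latent map. The box filter, the function names, and the random placeholder map are assumptions made for this example; the text above does not specify the downsampling filter.

```python
import numpy as np

def downsample_2x(latent):
    """Average non-overlapping 2x2 texel blocks of an (H, W, C) latent map."""
    h, w, c = latent.shape
    h2, w2 = h // 2, w // 2
    # Crop to even size, then group texels into 2x2 blocks per channel.
    blocks = latent[:h2 * 2, :w2 * 2].reshape(h2, 2, w2, 2, c)
    return blocks.mean(axis=(1, 3))

def build_latent_mipmaps(latent):
    """Return the list of pyramid levels, finest first."""
    levels = [latent]
    while min(levels[-1].shape[:2]) > 1:
        levels.append(downsample_2x(levels[-1]))
    return levels

rng = np.random.default_rng(0)
latent_map = rng.standard_normal((64, 64, 32))  # placeholder 32-D latent map
pyramid = build_latent_mipmaps(latent_map)
```

At render time, the decoder would then be evaluated once on the latent vector fetched from the appropriate level, rather than decoding several fine-level texels and averaging the results.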
Texture synthesis using the most compressed representation Finally, a stable latent space allows for efficient BTF synthesis. Texture synthesis on BTFs is challenging because BTFs are essentially N-dimensional RGB textures, N being the number of angular measurements. Using our encoding, we can instead synthesize directly from the 32-dimensional latent maps. In Figure 10, we use straightforward image quilting [EF01] on the latent maps to enlarge the BTF by a factor of 5.
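To make the idea concrete, here is a minimal sketch of patch-based synthesis operating directly on a latent map. It is a simplified variant of image quilting: each patch is the best of a few random candidates according to the squared error in the overlap region, and the minimum-error boundary cut of [EF01] is omitted. The patch size, overlap width, candidate count, and function name are assumptions chosen for illustration.

```python
import numpy as np

def quilt_latent(latent, out_size, patch=16, overlap=4, n_candidates=50, seed=0):
    """Synthesize an (out_size, out_size, C) latent map from an (H, W, C) one."""
    rng = np.random.default_rng(seed)
    h, w, c = latent.shape
    step = patch - overlap
    n = int(np.ceil((out_size - overlap) / step))   # patches per axis
    out = np.zeros((n * step + overlap, n * step + overlap, c), dtype=latent.dtype)
    for i in range(n):
        for j in range(n):
            y, x = i * step, j * step
            ys = rng.integers(0, h - patch + 1, size=n_candidates)
            xs = rng.integers(0, w - patch + 1, size=n_candidates)
            best, best_err = None, np.inf
            for sy, sx in zip(ys, xs):
                cand = latent[sy:sy + patch, sx:sx + patch]
                err = 0.0
                if i > 0:   # compare with the already synthesized strip above
                    err += np.sum((cand[:overlap] - out[y:y + overlap, x:x + patch]) ** 2)
                if j > 0:   # compare with the already synthesized strip on the left
                    err += np.sum((cand[:, :overlap] - out[y:y + patch, x:x + overlap]) ** 2)
                if err < best_err:
                    best, best_err = cand, err
            out[y:y + patch, x:x + patch] = best    # paste, overwriting the overlap
    return out[:out_size, :out_size]

rng = np.random.default_rng(1)
latent_map = rng.standard_normal((32, 32, 32))      # placeholder latent map
enlarged = quilt_latent(latent_map, out_size=160)   # 5x larger per side
```

Because the overlap error is measured on 32 latent channels rather than on thousands of angular RGB samples, each candidate comparison stays cheap, which is the practical benefit of synthesizing in latent space.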
Visualization of the latent space We use a t-distributed Stochastic Neighbor Embedding (t-SNE) to visualize the behavior of the latent projection (see Figure 1). We observe that within a BTF, the projections of texels are very well clustered. There is, however, no obvious consistency with the semantic classes defined in the Bonn database. This is to be expected, as our clustering is relative to appearance, while the semantic classes refer rather to fabric/material types and fabrication procedures. For instance, some of the carpet ABRDFs are very close in diffuse albedo and reflectance shape to those from felt materials, even though they are in different semantic classes.

Conclusion
We presented a novel architecture capable of handling unstructured angular reflectance measurements. Based on autoencoders, our network projects ABRDFs into a low-dimensional latent space, analogous to analytic BRDF model parameters, while the decoder network is analogous to the analytic BRDF model expression. The network is trained on a variety of ABRDF samples from the Bonn BTF datasets, and evaluated on previously unseen materials. Compared to Rainer et al. [RJGW19], who train a new network instance for every new material, encoding a new material with our method requires a simple evaluation of the encoder. Having a single autoencoder instance also means that the latent space is shared between materials, i.e., texels are projected into the same domain. This allows for exploration of the parameter space for applications such as latent mipmapping and texture synthesis.
Future Work One of the main obstacles hindering more generalization seems to be the lack of data. The Bonn database is the biggest available BTF database, but it is still relatively sparse compared to standard deep learning problems. Only 77 materials are shown to the network, and the validation datasets we evaluate on do not all have close matches in the training set. One way of tackling this issue would be to augment the training data with synthetic ABRDFs. This is a tricky endeavor, however, as most synthetic SVBRDF datasets are generated using common analytic BRDF models. This could bias the network into learning current analytic models, which conflicts with our goal of staying flexible to learn all the components of real-world reflectance functions.
We would like to thank the University of Bonn's Computer Graphics group for making their BTF database publicly available.