UCL Discovery

Exploratory analysis of provenance data using R and the provenance package

Vermeesch, P; (2019) Exploratory analysis of provenance data using R and the provenance package. Minerals, 9 (3) 10.3390/min9030193. Green open access

minerals-09-00193-v3.pdf - Published Version


Abstract

Licensee MDPI, Basel, Switzerland. The provenance of siliciclastic sediment may be traced using a wide variety of chemical, mineralogical and isotopic proxies. These define three distinct data types: (1) compositional data such as chemical concentrations; (2) point-counting data such as heavy mineral compositions; and (3) distributional data such as zircon U-Pb age spectra. Each of these three data types requires separate statistical treatment. Central to any such treatment is the ability to quantify the ‘dissimilarity’ between two samples. For compositional data, this is best done using a logratio distance. Point-counting data may be compared using the chi-square distance, which deals better with missing components (zero values) than the logratio distance does. Finally, distributional data can be compared using the Kolmogorov-Smirnov and related statistics. For small datasets using a single provenance proxy, data interpretation can sometimes be done by visual inspection of ternary diagrams or age spectra. However, this no longer works for larger and more complex datasets. This paper reviews a number of multivariate ordination techniques to aid the interpretation of such studies. Multidimensional Scaling (MDS) is a generally applicable method that displays the salient similarities and differences between multiple samples as a configuration of points in which similar samples plot close together and dissimilar samples plot far apart. For compositional data, classical MDS analysis of logratio data is shown to be equivalent to Principal Component Analysis (PCA). The resulting MDS configurations can be augmented with compositional information as biplots. For point-counting data, classical MDS analysis of chi-square distances is shown to be equivalent to Correspondence Analysis (CA). This technique also produces biplots. Thus, MDS provides a common platform to visualise and interpret all types of provenance data.
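The three dissimilarity measures named in the abstract can be illustrated with a minimal sketch. The paper itself works in R with the provenance package; the Python below is an independent illustration, not that package's API. The function names (`aitchison_distance`, `chi_square_distance`, `ks_distance`) and the toy numbers are hypothetical. The logratio distance is taken here to be the Euclidean distance between centred logratio (clr) vectors, and the chi-square distance weights each component by its column mass in the whole table, as in correspondence analysis; both are assumptions about which variant is meant.

```python
import numpy as np

def aitchison_distance(x, y):
    """Logratio distance: Euclidean distance between clr-transformed compositions.
    Assumes every component is strictly positive (log(0) is undefined)."""
    lx, ly = np.log(np.asarray(x, float)), np.log(np.asarray(y, float))
    clr_x, clr_y = lx - lx.mean(), ly - ly.mean()
    return float(np.linalg.norm(clr_x - clr_y))

def chi_square_distance(counts, i, k):
    """Chi-square distance between rows i and k of a table of point counts.
    Zeros in a sample are allowed; each column needs at least one count overall."""
    counts = np.asarray(counts, float)
    profiles = counts / counts.sum(axis=1, keepdims=True)  # row proportions
    masses = counts.sum(axis=0) / counts.sum()              # column weights
    return float(np.sqrt(np.sum((profiles[i] - profiles[k]) ** 2 / masses)))

def ks_distance(a, b):
    """Kolmogorov-Smirnov statistic: largest gap between two empirical CDFs."""
    a, b = np.sort(np.asarray(a, float)), np.sort(np.asarray(b, float))
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))

# Hypothetical toy data, for illustration only.
composition_1 = [0.60, 0.30, 0.10]
composition_2 = [0.50, 0.25, 0.25]
point_counts = [[10, 0, 5], [20, 0, 10], [3, 4, 0]]  # rows are samples
ages_1 = [1.0, 2.0, 3.0]
ages_2 = [10.0, 11.0, 12.0]

d_logratio = aitchison_distance(composition_1, composition_2)
d_chisq = chi_square_distance(point_counts, 0, 2)
d_ks = ks_distance(ages_1, ages_2)
```

Note that the second column of `point_counts` is zero in two samples: the chi-square distance stays finite, whereas a logratio distance on the same rows would require taking the logarithm of zero.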
Generalising the method to three-way dissimilarity tables provides an opportunity to combine several datasets together and thereby facilitate the interpretation of ‘Big Data’. This paper presents a set of tutorials using the statistical programming language R. It illustrates the theoretical underpinnings of compositional data analysis, PCA, MDS and other concepts using toy examples, before applying these methods to real datasets with the provenance package.
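The stated equivalence between classical MDS of logratio distances and PCA can be checked on a toy example. This is a sketch in Python rather than the R used in the paper, and the helper names (`clr`, `classical_mds`, `pca_scores`) and the random test data are hypothetical. It assumes the logratio distance is the Euclidean distance between clr-transformed rows, and compares the two configurations up to a sign flip per axis, since eigenvectors are only defined up to sign.

```python
import numpy as np

def clr(X):
    """Centred logratio transform of each row (rows must be strictly positive)."""
    L = np.log(X)
    return L - L.mean(axis=1, keepdims=True)

def classical_mds(D, k=2):
    """Classical (Torgerson) MDS: eigendecomposition of the double-centred
    matrix of squared distances."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centring matrix
    B = -0.5 * J @ (D ** 2) @ J
    w, V = np.linalg.eigh(B)                    # eigenvalues in ascending order
    idx = np.argsort(w)[::-1][:k]               # keep the k largest
    return V[:, idx] * np.sqrt(np.maximum(w[idx], 0.0))

def pca_scores(Z, k=2):
    """PCA scores from the SVD of the column-centred data matrix."""
    Zc = Z - Z.mean(axis=0)
    U, s, _ = np.linalg.svd(Zc, full_matrices=False)
    return U[:, :k] * s[:k]

# Hypothetical toy compositions: 6 samples, 4 components, rows sum to 1.
rng = np.random.default_rng(0)
X = rng.dirichlet(np.ones(4), size=6)

Z = clr(X)
D = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=2)  # logratio distances

mds_config = classical_mds(D)
pca_config = pca_scores(Z)

# The two configurations should agree up to the sign of each axis.
same_up_to_sign = np.allclose(np.abs(mds_config), np.abs(pca_config), atol=1e-6)
```

The agreement follows because double-centring the squared Euclidean distances recovers the Gram matrix of the centred clr data, whose eigenvectors are the left singular vectors used by PCA.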

Type: Article
Title: Exploratory analysis of provenance data using R and the provenance package
Open access status: An open access version is available from UCL Discovery
DOI: 10.3390/min9030193
Publisher version: https://doi.org/10.3390/min9030193
Language: English
Additional information: sediment; provenance; statistics; zircon; heavy minerals; point counting; petrography
UCL classification: UCL
UCL > Provost and Vice Provost Offices > UCL BEAMS
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Maths and Physical Sciences
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Maths and Physical Sciences > Dept of Earth Sciences
URI: https://discovery.ucl.ac.uk/id/eprint/10072978