# READ ME - UCL Multispectral Processed Images of Parchment Damage Dataset

## Description

The data presented here is a set of 2,800 multispectral images of an actual parchment, taken before and after a series of degradation procedures designed to reproduce the most likely types of damage that parchment documents may suffer over their lifetime. It is made available as a resource for the digitisation and digital imaging community, so that others can use the dataset in their own research. The image files include raw, white-corrected, set-registered, and calibration images; the dataset is 150 GB in total.

The capture, preparation, and structure of this data are described in Giacometti, A., Campagnolo, A., MacDonald, L., Mahony, S., Robson, S., Weyrich, T., Terras, M., Gibson, A. (2015). “The value of critical destruction: evaluating multispectral image processing methods for the analysis of primary historical texts”. Journal of Digital Scholarship in the Humanities, Oxford University Press. The article is available open access. Anyone wishing to use the dataset should first become familiar with how the data was created and how it is organised, which is covered in the article and in the General Structure section below.

## License

The data is provided under a Creative Commons Attribution 2.0 UK: England & Wales (CC BY 2.0 UK) license: http://creativecommons.org/licenses/by/2.0/uk/. Users are free to:

- **Share**: copy and redistribute the material in any medium or format;
- **Adapt**: remix, transform, and build upon the material for any purpose, even commercially.

In order to do so, you must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use of the data.


## Citing This Data

The data should be cited as: UCL Multispectral Processed Images of Parchment Damage Dataset, Giacometti, A., Campagnolo, A., MacDonald, L., Mahony, S., Robson, S., Weyrich, T., Terras, M., Gibson, A. (2015). UCL: DOI: 10.14324/000.ds.1469099

## General Structure

The data is divided into three top-level directories:

- `normalised`
- `treated_normalised`
- `processed`

Each of these contains one subdirectory per sample, named with the sample code (see Metadata below). The same 25 sample directories appear in all three:

| Sheet     | Sample directories |
| :-------- | :----------------- |
| Inner (I) | I202R, I203R, I204R, I208R, I303R, I304R, I305R, I306R, I309R |
| Outer (O) | O508R, O509R, O511R, O512R, O601R, O601V, O602R, O602V, O603R, O604R, O605R, O606R, O607R, O608R, O609R, O610R |

The *normalised* directory holds all of the data collected from each sample before any treatment was applied. Conversely, the *treated_normalised* directory holds all of the data collected from the samples after treatment; the exceptions are the three control samples, I203, I303 and O601, to which no treatment was applied. The third directory, *processed*, holds the images produced by the processing algorithms.

All images are contained within each one of those sample directories. Each image file is named following the convention detailed in the following section.
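
As a sketch of how this layout can be navigated, the Python below assumes the dataset has been downloaded to a local directory (the `root` path is hypothetical) containing the three top-level directories described above. The helper `index_samples` is illustrative, not part of the dataset: it maps each sample code to the sorted image filenames found in each top-level directory, so that the before (`normalised`) and after (`treated_normalised`) images of a sample can be compared.

```python
from pathlib import Path

# The three top-level directories described in this README.
SECTIONS = ("normalised", "treated_normalised", "processed")


def index_samples(root: Path) -> dict[str, dict[str, list[str]]]:
    """Map each sample code to the sorted image filenames in every section.

    `root` is assumed (hypothetically) to be the directory that contains
    `normalised/`, `treated_normalised/` and `processed/`.
    """
    index: dict[str, dict[str, list[str]]] = {}
    for section in SECTIONS:
        section_dir = root / section
        if not section_dir.is_dir():
            continue  # skip a section missing from a partial download
        # Each subdirectory is named with a sample code, e.g. "O602V".
        for sample_dir in sorted(p for p in section_dir.iterdir() if p.is_dir()):
            files = sorted(f.name for f in sample_dir.iterdir() if f.is_file())
            index.setdefault(sample_dir.name, {})[section] = files
    return index
```

For example, `index_samples(Path("/data/parchment"))["O602V"]["treated_normalised"]` would list the post-treatment images of sample O602V, assuming that (hypothetical) download location.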

## Metadata

There are eight pieces of information recorded in every filename: a unique serial number for the image, the parchment sample identification, the treatment, the image channels, the lighting, the reference & filter mark, the image type, and the file format. Each image filename has the following structure:

    <serial number>_<sample code>_<treatment>_<channels>_<lighting>_<reference_&_filter>_<type of image>.<extension>

Each piece of information is necessary to identify the image and quickly determine what it depicts:

Serial Number
:   Six-digit sequential number that uniquely identifies an image.

Sample Code
:   Denotes three pieces of information: whether the sample comes from the inner or outer sheet of the parchment, denoted by `I` or `O` respectively; a three-character alphanumeric code identifying a single sample of the parchment; and whether the image depicts the front (*recto*) of the sample, denoted by `R`, or the back (*verso*), denoted by `V`. The *recto* of the sample corresponds to the flesh side of the parchment, and the *verso* to the grain side. For example, `O602V` is the verso of outer sample `602`.

Treatment
:   Denotes whether the parchment has been subjected to a degradation treatment. Each treatment was assigned a two-letter code, listed in the table below; `NO` denotes no treatment.

Table: Two-letter codes for degradation procedures.

|  Code  |      Treatment      |
| :----- | :------------------ |
| **AD** | Aniline Dye         |
| **BL** | Blood               |
| **CH** | Calcium Hydroxide   |
| **CO** | Control             |
| **DS** | Desiccant           |
| **HA** | Hydrochloric Acid   |
| **HT** | Heat                |
| **IG** | Iron Gall Ink       |
| **II** | India Ink           |
| **MD** | Mechanical Damage   |
| **MO** | Mould               |
| **NO** | No treatment        |
| **OI** | Oil                 |
| **SA** | Sulphuric Acid      |
| **SC** | Scraping            |
| **SH** | Sodium Hypochlorite |
| **SK** | Smoke               |
| **SR** | Scrunching          |
| **TE** | Tea                 |
| **UV** | Ultraviolet light   |
| **WA** | Water (pH 7.0)      |
| **WI** | Red Wine            |
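
For scripted analysis, the table above can be transcribed into a lookup. The following is a minimal Python sketch; the names `TREATMENTS` and `describe_treatment` are illustrative and not part of the dataset, and it assumes the treatment code is always the third underscore-separated field of a filename, as described in this section.

```python
# Two-letter treatment codes, transcribed from the table above.
TREATMENTS = {
    "AD": "Aniline Dye",
    "BL": "Blood",
    "CH": "Calcium Hydroxide",
    "CO": "Control",
    "DS": "Desiccant",
    "HA": "Hydrochloric Acid",
    "HT": "Heat",
    "IG": "Iron Gall Ink",
    "II": "India Ink",
    "MD": "Mechanical Damage",
    "MO": "Mould",
    "NO": "No treatment",
    "OI": "Oil",
    "SA": "Sulphuric Acid",
    "SC": "Scraping",
    "SH": "Sodium Hypochlorite",
    "SK": "Smoke",
    "SR": "Scrunching",
    "TE": "Tea",
    "UV": "Ultraviolet light",
    "WA": "Water (pH 7.0)",
    "WI": "Red Wine",
}


def describe_treatment(filename: str) -> str:
    """Return the treatment named by the third field of an image filename."""
    code = filename.split("_")[2]  # <serial>_<sample>_<treatment>_...
    try:
        return TREATMENTS[code]
    except KeyError:
        raise ValueError(f"unknown treatment code {code!r} in {filename!r}") from None
```

Raising `ValueError` for an unrecognised code, rather than returning a default, makes misnamed files visible instead of silently grouping them with untreated samples.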


Channels
:   Images are either monochrome, denoted by `L` for luminance, or contain three colour channels, denoted by `C` for colour.

Lighting
:   Lighting conditions used to capture the image. Reflective lighting is denoted by `R` and transmissive lighting by `T`. 

Reference & Filter
:   The reference & filter mark is a four-character sequence specifying whether the image is a reference image, and the wavelength of the filter. If the image is a white reference image (used for calibration), the first character is `W`; otherwise a `0` marks a regular image. The remaining three characters give the filter wavelength in nanometres. If no filter was used, the keyword `NON` appears in place of a wavelength. For example, a white image with no filter is marked `WNON`, and an image of a sample taken through a 500 nm filter is marked `0500`.

Type of Image
:   Marks the kind of image. All acquired images are initially marked with the keyword `ORIG` (original); white-corrected and set-registered images are later marked with `WCOR` and `SREG` respectively.

Extension
:   The format of the image is indicated by a standard file extension, e.g. `.tif` for *Tagged Image File Format* or `.nef` for the Nikon raw format (NEF).

For example:

    000001_I203R_NO_C_R_WNON_ORIG.tif
    000002_O602V_NO_L_T_0750_WCOR.tif 
    
The first file is image number `000001`, a white reference image (`WNON`) associated with the recto of sample `203`, which is from the inner parchment sheet. It has not been subjected to any treatment. The image has three colour channels and was photographed using reflective lighting and no filter. It is an original, unprocessed image saved in uncompressed `TIFF` format.

The second file is image number `000002`, of the verso of sample `602`, which is from the outer parchment sheet. It has not been subjected to any treatment. The image is monochrome and was photographed using transmissive lighting and a filter centred at 750 nm. It has been white-corrected (corrected for illumination) and is saved in uncompressed `TIFF` format.
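
The decoding walked through above can be automated with a regular expression. The following is a minimal Python sketch, assuming every filename follows the eight-field convention exactly; the `ImageName` class and `parse_image_name` function are hypothetical helpers, not distributed with the dataset. It splits the sample code into sheet, sample number and side, and the reference & filter field into a white-reference flag and an optional wavelength in nanometres.

```python
import re
from dataclasses import dataclass
from typing import Optional

# One named group per field of
# <serial>_<sample code>_<treatment>_<channels>_<lighting>_<reference & filter>_<type>.<ext>
FILENAME_RE = re.compile(
    r"^(?P<serial>\d{6})"
    r"_(?P<sheet>[IO])(?P<sample>[0-9A-Z]{3})(?P<side>[RV])"
    r"_(?P<treatment>[A-Z]{2})"
    r"_(?P<channels>[LC])"
    r"_(?P<lighting>[RT])"
    r"_(?P<reference>[W0])(?P<filter>NON|\d{3})"
    r"_(?P<kind>[A-Z]+)"
    r"\.(?P<extension>[A-Za-z]+)$"
)


@dataclass(frozen=True)
class ImageName:
    serial: int
    sheet: str                    # "inner" or "outer"
    sample: str                   # three-character sample number, e.g. "203"
    side: str                     # "recto" or "verso"
    treatment: str                # two-letter treatment code, e.g. "NO"
    colour: bool                  # True for C (colour), False for L (luminance)
    lighting: str                 # "reflective" or "transmissive"
    white_reference: bool         # True when the reference flag is W
    wavelength_nm: Optional[int]  # None when no filter was used (NON)
    kind: str                     # e.g. ORIG, WCOR, SREG
    extension: str                # lower-cased, e.g. "tif" or "nef"


def parse_image_name(filename: str) -> ImageName:
    """Decode a filename that follows the naming convention in this README."""
    m = FILENAME_RE.match(filename)
    if m is None:
        raise ValueError(f"filename does not follow the convention: {filename!r}")
    g = m.groupdict()
    return ImageName(
        serial=int(g["serial"]),
        sheet="inner" if g["sheet"] == "I" else "outer",
        sample=g["sample"],
        side="recto" if g["side"] == "R" else "verso",
        treatment=g["treatment"],
        colour=g["channels"] == "C",
        lighting="reflective" if g["lighting"] == "R" else "transmissive",
        white_reference=g["reference"] == "W",
        wavelength_nm=None if g["filter"] == "NON" else int(g["filter"]),
        kind=g["kind"],
        extension=g["extension"].lower(),
    )
```

Applied to `000002_O602V_NO_L_T_0750_WCOR.tif`, this yields an outer-sheet, verso, monochrome, transmissive image with `wavelength_nm == 750` and `kind == "WCOR"`. A regular expression is used instead of a plain `split("_")` so that names which deviate from the convention are rejected rather than silently mis-decoded.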

## Contact

For further questions please contact the supervisors of this research, Professor Adam Gibson (adam.gibson@ucl.ac.uk) and Professor Melissa Terras (m.terras@ucl.ac.uk).