M6 – Collation and validation phases

Methodology

Each witness was collated by at least two people working independently. During our July 2022 pilot and the first phase of the official project (from January 2023 onwards), the work was divided among four team members and around eight freelance research assistants (RAs), who were paid a set fee per collation. Metadata and variants were entered into bespoke Google Sheets using our notation system, which was first developed for the pilot and went through further development stages during the project; the final revisions were made in summer 2024. We recorded every variant between a witness and the control; where the witness agreed with the control, we left the corresponding spreadsheet cells empty. Variants shared by two or more witnesses suggest potential kinship, which we could then explore.
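
Because agreement with the control is left blank, only the recorded variants carry information, and comparing them across witnesses is what points to possible kinship. The sketch below illustrates that idea under stated assumptions: it is not the project's analysis code, the representation (a mapping from a hypothetical bar-and-voice location to a variant label) is invented for illustration, and the sample readings are made up.

```python
from itertools import combinations

# Hypothetical representation of one witness's collation: location -> variant.
# Locations where the witness agrees with the control are simply absent,
# mirroring the empty spreadsheet cells.
Collation = dict[tuple[int, str], str]


def shared_variants(a: Collation, b: Collation) -> set[tuple[int, str]]:
    """Return the locations where two witnesses record the same variant."""
    return {loc for loc, reading in a.items() if b.get(loc) == reading}


def pairwise_agreement(witnesses: dict[str, Collation]) -> dict[tuple[str, str], int]:
    """Count the shared variants for every pair of witnesses."""
    return {
        (x, y): len(shared_variants(witnesses[x], witnesses[y]))
        for x, y in combinations(sorted(witnesses), 2)
    }


if __name__ == "__main__":
    # Invented sample data for three witnesses labelled A, B and C.
    witnesses = {
        "A": {(3, "RH"): "dotted rhythm", (12, "LH"): "octave lower"},
        "B": {(3, "RH"): "dotted rhythm", (20, "RH"): "trill omitted"},
        "C": {(12, "LH"): "octave lower", (3, "RH"): "dotted rhythm"},
    }
    print(pairwise_agreement(witnesses))
    # {('A', 'B'): 1, ('A', 'C'): 2, ('B', 'C'): 1}
```

A high count only suggests a relationship worth examining; as the paragraph above notes, shared variants are a starting point for exploration rather than proof of kinship.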

As not all collections had been acquired before the project started, we opted to work in phases. Initially, each phase focused on a particular collection (Barcelona M1964, New York Cary, Essercizi, and so on); from January 2024 onwards we worked on what we call Sundry phases. These were designed to complete work on as many of the sonatas as possible (for example, finishing all collations of K490).

The freelance RAs were initially recruited by the five team members; from May 2023 we also recruited through other channels, including email and posters. A dedicated Slack channel was used for project communications, where questions and issues arising during the collation process were discussed. Various Google Forms were used to document issues such as incorrect sonata indexing or missing pages.

Roseingrave

Personalised worksheets for each RA and team member were created using Roseingrave, a tool developed to our specifications by Joseph Lou under the supervision of Professor Jérémie Lumbroso. The latest version of Roseingrave, together with extensive documentation, is freely available here.

RAs were provided with a 35-page User Manual, which contained everything a prospective participant needed to know before starting their collations. The User Manual is now available on this website (link to be added).

After all of the collations in a phase were completed, the data was extracted into JSON files using Roseingrave and backed up both under version control (Git) and on SharePoint.
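
When JSON exports are kept under Git, it helps if re-exporting unchanged data yields a byte-identical file, so that diffs show only genuine changes. The following is a minimal sketch of that practice using Python's standard `json` module; the function name `write_collation_json` and the record fields are hypothetical, and Roseingrave's actual export format and behaviour are described in its own documentation.

```python
import json
import tempfile
from pathlib import Path


def write_collation_json(path: Path, data: dict) -> None:
    """Write collation data as deterministic JSON (hypothetical helper).

    Sorted keys and a fixed indent mean that the same data always produces
    the same bytes, regardless of the order in which keys were inserted.
    ensure_ascii=False keeps accented characters readable in the file.
    """
    text = json.dumps(data, indent=2, sort_keys=True, ensure_ascii=False)
    path.write_text(text + "\n", encoding="utf-8")


if __name__ == "__main__":
    # Hypothetical record; the real export schema may differ.
    record = {"sonata": "K490", "witness": "W1", "variants": {"B3": "var-a"}}
    reordered = {"variants": {"B3": "var-a"}, "witness": "W1", "sonata": "K490"}
    with tempfile.TemporaryDirectory() as tmp:
        first, second = Path(tmp, "a.json"), Path(tmp, "b.json")
        write_collation_json(first, record)
        write_collation_json(second, reordered)
        print(first.read_bytes() == second.read_bytes())  # True
```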

Initiating a work phase using Roseingrave

After the relevant material for a phase of work had been collected and indexed, Roseingrave was used to create worksheets for each RA. The Roseingrave documentation linked above describes in detail how to use the tool.
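
One constraint any worksheet set-up had to respect is that each witness is collated by at least two people working independently. The sketch below shows one simple way to meet that constraint, a round-robin assignment; it is an illustrative assumption, not a description of how Roseingrave distributes work, and the function `assign_collators` and the labels in the example are hypothetical.

```python
from collections import defaultdict


def assign_collators(
    witnesses: list[str], collators: list[str], per_witness: int = 2
) -> dict[str, list[str]]:
    """Give each witness to `per_witness` distinct collators, round-robin.

    Consecutive offsets modulo the number of collators are distinct as long
    as per_witness <= len(collators), so no one collates a witness twice.
    Advancing the start slot by per_witness spreads the load evenly.
    """
    if per_witness > len(collators):
        raise ValueError("not enough collators for independent collations")
    worksheets: dict[str, list[str]] = defaultdict(list)
    slot = 0
    for witness in witnesses:
        for offset in range(per_witness):
            collator = collators[(slot + offset) % len(collators)]
            worksheets[collator].append(witness)
        slot += per_witness
    return dict(worksheets)


if __name__ == "__main__":
    print(assign_collators(["W1", "W2", "W3"], ["RA1", "RA2", "RA3"]))
    # {'RA1': ['W1', 'W2'], 'RA2': ['W1', 'W3'], 'RA3': ['W2', 'W3']}
```

In practice, availability, expertise and the fee-per-collation arrangement would also shape assignments; the round-robin simply guarantees the independence requirement and a balanced load.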

To help prospective Roseingrave users, we have provided a sample directory in our public GitHub repository (link available soon) containing the files needed to run a test operation. Python must be installed on your system. A detailed guide to the process is provided.

Validation

After each collation phase was completed, the collations were checked for accuracy by the project team and by a number of expert RAs. Following this workflow, we collated and validated a total of 2,617 extant witnesses during the project.

Validation was an important aspect of our project: it was imperative that only the most accurate collation (in the correct format) be retained for the analysis phases. Our system of collation (by up to three independent RAs) and validation (by a team member or a trusted expert RA) ensured high accuracy and consistency across the entire corpus.
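
With two or more independent collations of the same witness, a natural place for a validator to start is the set of cells where they disagree. A minimal sketch, assuming each collation is available as a mapping from a hypothetical spreadsheet cell reference to the notation entered there (the labels such as "var-a" are placeholders, not the project's notation); it only surfaces disagreements and does not decide which reading is correct, which remained the validator's judgement.

```python
# Hypothetical representation: spreadsheet cell reference -> entered notation.
Sheet = dict[str, str]


def cell(sheet: Sheet, ref: str) -> str:
    """Return a cell's contents, treating missing cells as empty and
    ignoring stray leading or trailing whitespace."""
    return sheet.get(ref, "").strip()


def disagreements(first: Sheet, second: Sheet) -> dict[str, tuple[str, str]]:
    """List the cells where two independent collations of one witness differ."""
    refs = sorted(first.keys() | second.keys())
    return {
        ref: (cell(first, ref), cell(second, ref))
        for ref in refs
        if cell(first, ref) != cell(second, ref)
    }


if __name__ == "__main__":
    ra_one = {"B3": "var-a", "C7": "var-b"}
    ra_two = {"B3": "var-a ", "D2": "var-c"}
    print(disagreements(ra_one, ra_two))
    # {'C7': ('var-b', ''), 'D2': ('', 'var-c')}
```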

Jasper van der Klis, October 2025