The compilation process involved two broad tasks:
- Searching of bibliographies and library catalogues to identify extant witnesses; and
- Acquiring digital copies of witnesses and the indexing and storing of these copies.
Identifying extant witnesses was facilitated by consulting the main works in this area from the last hundred years:
- Gerstenberg, Walter. 1933. Die Klavierkompositionen Domenico Scarlattis. G. Bosse.
- Kirkpatrick, Ralph. 1953a. Domenico Scarlatti. Princeton University Press.
- Sheveloff, Joel Leonard. 1970. ‘The Keyboard Music of Domenico Scarlatti: A Re-Evaluation of the Present State of Knowledge in the Light of the Sources’. PhD dissertation, Brandeis University.
- Sheveloff, Joel Leonard. 1985. ‘Domenico Scarlatti: Tercentenary Frustrations [Part 1]’. Musical Quarterly 71: 399–436.
- Yáñez Navarro, Celestino. 2015. ‘Nuevas aportaciones para el estudio de las sonatas de Domenico Scarlatti: los manuscritos del Archivo de Música de las catedrales de Zaragoza’. PhD dissertation, Universitat Autònoma de Barcelona.
- Hail, Christopher, and Michael O’Connor. 2017. Scarlatti Domenico: A New Look at the Keyboard Sonatas of Domenico Scarlatti for People Who Use Both Sides of Their Brain. Protean Press. http://web.archive.org/web/20140911064422/http://mysite.verizon.net/chrishail/scarlatti/index.html.
Some other collections were identified by consulting RISM as well as library and archival catalogues. From the start of the project, all of these were compiled in an Excel file we call the ‘Great Table’, which tracked not only the extant witnesses but also their location and the progress of their acquisition. The table also presented the data in an accessible form that any member of the team could easily correct when necessary.
Following initial independent work by Professor Jérémie Lumbroso supported by a Princeton University seed grant, the acquisition of further digital copies of (almost) all extant witnesses for the Texting Scarlatti project was made possible through collaboration with the relevant libraries and archives, to which we are very grateful. Upon receipt of a digital copy, a comma-separated values (CSV) index file was created for every collection, following Professor Lumbroso’s earlier design. These CSV files tell us exactly what is in the collection and on which page of the file we can find the sonatas we are interested in.
Occasionally, some editing of the acquired files was required. In some cases, for instance, pages were in the wrong order or upside down. These issues were corrected at this stage using a variety of tools and are occasionally highlighted on the individual collections pages on the project website.
Comma-separated value (CSV) index files
Here is an example of a few lines of a CSV file describing the contents of the manuscripts now in the Staatsbibliothek zu Berlin:
file,kirkpatrick,local_index,page_start,page_length,memo,codex,variant
berlin-ms10051.pdf,,1,3,4,Giovanni Battista Martini - Sonata BroMI 275/1,Berlin 10051,
berlin-ms10051.pdf,,2,7,4,Martini - Sonata BroMI 275/2,Berlin 10051,
berlin-ms10051.pdf,,3,9,5,Martini - Sonata BroMI 275/4,Berlin 10051,
berlin-ms10051.pdf,,4,13,4,Martini - Sonata BroMI 275/6,Berlin 10051,
berlin-ms10051.pdf,,5,17,4,Martini - Sonata BroMI 275/3,Berlin 10051,
berlin-ms10051.pdf,,6,19,4,Martini - Sonata BroMI 275/5,Berlin 10051,
berlin-ms10051.pdf,,7,23,4,Lorenzo Bonuccelli - Sonata,Berlin 10051,
berlin-ms10051.pdf,30,8,28,3,Del S.r Dom.co Scarlatti,Berlin 10051,
This CSV file is equivalent to the following table:
| File | Kirkpatrick | Local Index | Page Start | Page Length | Memo | Codex | Variant |
|---|---|---|---|---|---|---|---|
| berlin-ms10051.pdf | | 1 | 3 | 4 | Giovanni Battista Martini – Sonata BroMI 275/1 | Berlin 10051 | |
| berlin-ms10051.pdf | | 2 | 7 | 4 | Martini – Sonata BroMI 275/2 | Berlin 10051 | |
| berlin-ms10051.pdf | | 3 | 9 | 5 | Martini – Sonata BroMI 275/4 | Berlin 10051 | |
| berlin-ms10051.pdf | | 4 | 13 | 4 | Martini – Sonata BroMI 275/6 | Berlin 10051 | |
| berlin-ms10051.pdf | | 5 | 17 | 4 | Martini – Sonata BroMI 275/3 | Berlin 10051 | |
| berlin-ms10051.pdf | | 6 | 19 | 4 | Martini – Sonata BroMI 275/5 | Berlin 10051 | |
| berlin-ms10051.pdf | | 7 | 23 | 4 | Lorenzo Bonuccelli – Sonata | Berlin 10051 | |
| berlin-ms10051.pdf | 30 | 8 | 28 | 3 | Del S.r Dom.co Scarlatti | Berlin 10051 | |
The first row of the CSV file is like the header row of a table: it tells us exactly what data is in the columns below it.
- The file column points to the PDF of that particular collection.
- The kirkpatrick column tells us which Scarlatti sonata in the Kirkpatrick catalogue is present in the collection. If this column is empty, it means the piece is either not by Scarlatti or not part of the Kirkpatrick catalogue.
- The local_index column gives the order of appearance of the pieces in the collection.
- The page_start column lists the starting page of each piece.
- The page_length column lists the total number of pages each piece occupies.
- The memo column can be used to describe the contents of the particular piece, a title, or other comments.
- The codex column is where we specify the siglum of a collection: the identifier we use in our project to refer to a particular collection.
- Finally, the variant column is used when the same piece appears multiple times in one collection, to distinguish the first appearance from the second. The sonata K52, for instance, appears twice in Venice 1742: as number 10 and as number 61.
In this particular example, we can tell that the first 7 pieces are not by Scarlatti: the kirkpatrick column is empty for these rows, and as the memo column informs us, the pieces are either by Martini or by Bonuccelli. Only piece number 8, which is sonata K30 by Scarlatti, is relevant to us. We know that it starts on page 28 and that it takes up 3 pages in total.
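The reading of the index described above can be sketched in a few lines of Python. This is only an illustration, not the project's own code: the function names `scarlatti_rows` and `page_span` are hypothetical, and the sketch assumes that page_start is counted from 1 and that page_length includes the first page.

```python
import csv
import io

# Two rows copied from the Berlin example above; a real index file would be read from disk.
SAMPLE = """\
file,kirkpatrick,local_index,page_start,page_length,memo,codex,variant
berlin-ms10051.pdf,,7,23,4,Lorenzo Bonuccelli - Sonata,Berlin 10051,
berlin-ms10051.pdf,30,8,28,3,Del S.r Dom.co Scarlatti,Berlin 10051,
"""


def scarlatti_rows(csv_text):
    """Return only the rows whose kirkpatrick column is filled in."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if row["kirkpatrick"].strip()]


def page_span(row):
    """Return (first_page, last_page), both inclusive, for one index row."""
    start = int(row["page_start"])
    length = int(row["page_length"])
    return start, start + length - 1


for row in scarlatti_rows(SAMPLE):
    first, last = page_span(row)
    print(f"K{row['kirkpatrick']} in {row['codex']}: pages {first}-{last}")
# prints: K30 in Berlin 10051: pages 28-30
```

The Bonuccelli row is dropped because its kirkpatrick field is empty, which mirrors how we read the table by eye.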
Dissecting the collections: our Copycutter script
After creating an index file for the digital copy (PDF) of a particular collection, a Python script named Copycutter was used to cut the collection PDF into its constituent parts: the individual sonata PDFs. Our script, which is an adaptation of an earlier version by Professor Lumbroso, splits a PDF into separate parts based on the information in the CSV files, following the logic described in the previous section. The resulting files were then automatically stored in directories named after the Kirkpatrick catalogue number, so that all witnesses of the same sonata could be found in the same place.
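To make the cutting step more concrete, here is a minimal sketch of how index rows might be turned into cut jobs. It is not the Copycutter script itself. The function names, the `sonatas/K030/Berlin-10051.pdf` naming scheme, and the choice of the third-party pypdf package are all assumptions made for illustration. The sketch also assumes that Kirkpatrick numbers are plain integers and that page_start is counted from 1.

```python
from pathlib import Path


def plan_cuts(rows, out_root="sonatas"):
    """Turn index rows into (output_path, first_index, stop_index) cut jobs.

    Page indexes are 0-based and stop_index is exclusive.
    Rows without a Kirkpatrick number are skipped.
    """
    jobs = []
    for row in rows:
        k = row["kirkpatrick"].strip()
        if not k:
            continue
        first = int(row["page_start"]) - 1  # assumed 1-based page -> 0-based index
        stop = first + int(row["page_length"])
        # Hypothetical naming scheme: one directory per sonata, one file per codex.
        name = row["codex"].replace(" ", "-")
        if row.get("variant"):
            name += f"-{row['variant']}"
        jobs.append((Path(out_root) / f"K{int(k):03d}" / f"{name}.pdf", first, stop))
    return jobs


def cut(source_pdf, jobs):
    """Write each job to its own PDF (requires the third-party pypdf package)."""
    from pypdf import PdfReader, PdfWriter  # lazy import: planning works without pypdf

    reader = PdfReader(source_pdf)
    for out_path, first, stop in jobs:
        writer = PdfWriter()
        for i in range(first, stop):
            writer.add_page(reader.pages[i])
        out_path.parent.mkdir(parents=True, exist_ok=True)
        with open(out_path, "wb") as fh:
            writer.write(fh)
```

Separating the planning from the writing means the page ranges can be checked against the index before any file is produced.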
The collection PDFs, their index files, and the individual sonata files were backed up in multiple locations including a SharePoint hosted by Guildhall School. As part of this process, an index file (in JSON format) detailing the contents of our sonatas directory was created. This directory index file is essential to the Collation step of our project as it tells our Roseingrave software exactly which copies we have access to and where they can be found: more on this in the next section.
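A directory index of this kind could be produced along the following lines. The structure of our actual JSON file is not described here, so the layout below (one key per sonata directory, each holding a sorted list of PDF file names) is purely an assumption, as are the function names.

```python
import json
from pathlib import Path


def build_directory_index(sonatas_root):
    """Map each sonata directory name to the sorted list of PDF files inside it."""
    root = Path(sonatas_root)
    index = {}
    for folder in sorted(p for p in root.iterdir() if p.is_dir()):
        files = sorted(f.name for f in folder.glob("*.pdf"))
        if files:  # skip directories that contain no PDFs
            index[folder.name] = files
    return index


def write_directory_index(sonatas_root, out_file):
    """Build the index, save it as JSON, and return it."""
    index = build_directory_index(sonatas_root)
    Path(out_file).write_text(json.dumps(index, indent=2), encoding="utf-8")
    return index
```

A downstream tool can then load the JSON file once and look up every available witness of a sonata without scanning the file system itself.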
Our Copycutter script, like all other tools and scripts used in the project, is freely available on the public GitHub repository for the project (link available soon). A typical run of the script, which creates just over 3,700 individual PDFs from the collection files gathered during the project, takes less than an hour on a desktop PC. It is also possible to target just one particular collection: splitting up the PDFs of the collections now in Lund (Engelhart/Wenster), which contain 28 sonatas in the Kirkpatrick catalogue, takes about 10 seconds.
Jasper van der Klis, October 2025