The Material
This project presents a pilot for an XML schema to help scholars understand and utilise the vast records of the East India Company. The East India Company was a significant historical force, and much of the world today is shaped by its actions during its existence from the 16th through to the 19th century. It is a fecund area for research and, importantly for modern-day scholars, the Company maintained an efficient bureaucracy that kept extensive records. The British Library now holds this entire collection of records, which occupies 14 kilometres of shelves. The enormous size of such a dataset suggests the usefulness of applying digital techniques.
My first approach to choosing material from these records was to follow the correspondence of an East India Company employee. However, when I examined these documents in closer detail, I discovered that transcribing them would be an enormous project in and of itself: the handwriting was difficult to decipher and some of it was even written in code. This turned my focus towards documents that were easily legible and/or already digitised. Searching through the records, I noticed that some of the documents produced later in the life of the Company were typed, and that compilations of earlier documents already existed. This project creates a pilot for an XML encoding of the compilation Treaties, Agreements, and Engagements, Between the Honorable East India Company and the Native Princes, Chiefs, and States, in Western India; The Red Sea; The Persian Gulf; &c. Also Between Her Britannic Majesty’s Government And Persia, Portugal, and Turkey, compiled by R. Hughes Thomas and printed in 1851. This document was also chosen because it has been digitised and made freely accessible on Google Books; all images and transcriptions have been made from this digitised version.
Document Analysis and Mark-up Design
This project creates a DTD (Document Type Definition) for this book in order to preserve the text of the book intact. A tree-structure diagram has been created to visualise the structure of the mark-up design (see Appendix). As a compilation, the source lends itself to being broken down into individual documents. Each document is then broken down into two main sections: a brief paragraph giving context for the main text, followed by the main text itself. These may in turn be followed by any grouping of elements, including a record of signatories, a statement of date and place, and notes and/or memoranda added by the author. The decision to create a DTD rather than an XML Schema was due to the varied nature of the textual elements and the desire to preserve this variety. Researchers looking at how printed books were compiled at this time may be interested in the varying ways in which the contextual paragraph is presented, and those tracing original non-English copies of documents may be interested in the exact wording of the translation. This is the reason for the repeated use of mixed-content elements, in which text and child elements are interleaved. Mark-up to denote line breaks in the document was not encoded because of time constraints and remains an area for development.
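The role of mixed content in keeping the printed text intact can be illustrated with a minimal Python sketch. The element names (context, place, date), the small internal DTD, and the sample sentence are assumptions made for illustration only; they are not taken from the project's DTD or from the source book.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: element names and the sample sentence are
# illustrative assumptions, not the project's actual DTD or source text.
SAMPLE = """<!DOCTYPE context [
  <!ELEMENT context (#PCDATA | place | date)*>
  <!ELEMENT place (#PCDATA)>
  <!ELEMENT date (#PCDATA)>
]>
<context>Agreement concluded at <place>Bombay</place> on <date>1 January 1820</date> with the local chief.</context>"""


def plain_text(element):
    """Return the running text of an element, ignoring the tags inside it."""
    return "".join(element.itertext())


context = ET.fromstring(SAMPLE)

# The full sentence is recoverable exactly as it was transcribed...
print(plain_text(context))
# ...while the tagged parts remain individually addressable.
print(context.find("date").text)
```

Because the tags sit inside the running text rather than replacing it, the same encoding can serve a reader who wants the sentence as printed and a reader who only wants the date.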
The documents in the source book have already been sorted by the compiler into indexes arranged by place (ranging from states, e.g. Persia, to cities, e.g. Bombay, to villages, e.g. Burat). The individual documents within each index are then arranged by date. This project chose to keep this structure, so each document is located within an index that corresponds to its index in the source book. However, each document is also given a unique ID code, which cannot be duplicated and could allow for cross-referencing between source texts. Each document type is also given an ID in order to group all documents of the same type together and facilitate comparison between them.
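As a hedged illustration of how the ID codes described above might be used, the following Python sketch checks that no document ID is repeated and groups documents by type across the place-based indexes. The element and attribute names (collection, index, document, id, type) and the ID values are hypothetical. Python's standard-library parser does not validate against a DTD, so the uniqueness that a validating parser would enforce for ID-typed attributes is checked by hand here.

```python
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict

# Hypothetical structure and values, for illustration only.
SAMPLE = """<collection>
  <index place="Persia">
    <document id="d001" type="treaty"/>
    <document id="d002" type="agreement"/>
  </index>
  <index place="Bombay">
    <document id="d003" type="treaty"/>
  </index>
</collection>"""


def duplicate_ids(root):
    """Return any document IDs that occur more than once."""
    counts = Counter(doc.get("id") for doc in root.iter("document"))
    return sorted(doc_id for doc_id, n in counts.items() if n > 1)


def group_by_type(root):
    """Map each document type to the IDs of its documents, across all indexes."""
    groups = defaultdict(list)
    for doc in root.iter("document"):
        groups[doc.get("type")].append(doc.get("id"))
    return dict(groups)


root = ET.fromstring(SAMPLE)
print(duplicate_ids(root))
print(group_by_type(root))
```

Grouping by type cuts across the place-based indexes of the source book without altering them, which is the kind of alternative arrangement a printed volume cannot offer.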
A limitation of printed material is that it can be arranged in only a single way. A major concern of this project is to create a multiplicity of indexes, allowing an overview of the documents through a variety of groupings and presentations. The book does not allow the documents to be presented progressing through time. To avoid repetition of data, the date element within the contextual paragraph is the value by which the documents are sorted. However, some contextual paragraphs do not contain a date, and in these cases a reference date has been added; ideally, a way of sorting with a conditional (an if statement) would be found instead. Multiple calendars are used throughout the documents, and each date is given a type attribute: Gregorian, Islamic, or Hindu (this list may develop with further research).
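One possible shape for the conditional sort mentioned above is sketched below in Python. It rests on several assumptions that are not part of the project as described: the element names, a hypothetical "when" attribute holding a normalised year-month-day value, a hypothetical "refdate" attribute for documents whose contextual paragraph has no date, and invented sample dates. Dates recorded in the Islamic or Hindu calendars would first need converting to a common scale, which this sketch does not attempt.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup and invented dates, for illustration only.
SAMPLE = """<index place="Persia">
  <document id="d002">
    <context>Signed on <date type="Gregorian" when="1822-08-30">30th August 1822</date>.</context>
  </document>
  <document id="d003" refdate="1809-03-12">
    <context>An engagement whose contextual paragraph gives no date.</context>
  </document>
  <document id="d001">
    <context>Dated <date type="Gregorian" when="1798-10-12">12th October 1798</date>.</context>
  </document>
</index>"""


def sort_key(document):
    """Use the date in the contextual paragraph if present, else the reference date."""
    date = document.find("context/date")
    if date is not None:
        return date.get("when")
    # Fallback for undated contextual paragraphs. A document with neither
    # value would return None and make the sort fail, so real data would
    # need an explicit policy for that case.
    return document.get("refdate")


def chronological_ids(root):
    """Return document IDs in date order (ISO-style strings sort chronologically)."""
    return [doc.get("id") for doc in sorted(root.iter("document"), key=sort_key)]


root = ET.fromstring(SAMPLE)
print(chronological_ids(root))
```

With a fallback of this kind the reference date only needs to be recorded for the undated documents, so no date is stored twice.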
It would be interesting for future iterations of this project to assign geographical values to the index areas in order to visually map the progression of these documents over both time and space.
The main text of each document is further subdivided into one of two categories: contractual or prose. Each of these elements is in turn split into articles (for contracts) or paragraphs (for prose). Contracts may also contain internal introduction and confirmation sections. This subdivision is intended to facilitate comparison of the styles employed by the various political actors and to enable a study of changing political language and structure in these documents. Due to time constraints, this project has not yet undertaken this analysis.
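As a possible first step towards the comparison described above, the Python sketch below records, for each document, whether its main text is contractual or prose and how many articles or paragraphs it contains. All element names (text, contract, prose, introduction, article, confirmation, paragraph) are assumptions for illustration, and the element bodies are left empty rather than filled with invented treaty text.

```python
import xml.etree.ElementTree as ET

# Hypothetical element names; bodies are deliberately left empty.
SAMPLE = """<collection>
  <document id="d001">
    <text>
      <contract>
        <introduction/>
        <article/>
        <article/>
        <confirmation/>
      </contract>
    </text>
  </document>
  <document id="d002">
    <text>
      <prose>
        <paragraph/>
      </prose>
    </text>
  </document>
</collection>"""


def summarise(root):
    """Map each document ID to (category, number of articles or paragraphs)."""
    summary = {}
    for doc in root.iter("document"):
        if doc.find("text/contract") is not None:
            units = doc.findall("text/contract/article")
            summary[doc.get("id")] = ("contract", len(units))
        else:
            units = doc.findall("text/prose/paragraph")
            summary[doc.get("id")] = ("prose", len(units))
    return summary


root = ET.fromstring(SAMPLE)
print(summarise(root))
```

A table of this kind could then be combined with the date and place of each document to look for changes in structure over time.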
The other main focus of this project is to trace the usage of transliterated words through these documents. How have these words developed over time? What individual standardisation choices did the compiler make, and how do these compare with present-day usage? In order to track this, every word that was transliterated from a non-English language was given the ‘transliterated’ attribute, declared in the DTD as #FIXED with the value “true”. This technique is used to mark multiple elements as being related: it is also used to mark names as signatures (sig=“true”) or seals (seal=“true”), and to mark data that has been added for reference (e.g. ref=“true”). Moving forward, this project would make more use of this markup, using it to identify all documents signed by a given individual and that individual's rank at the time of signing.
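The effect of a #FIXED declaration can be shown with a short Python sketch. The element names (article, term) and the two sample words are assumptions chosen for illustration; only the attribute name "transliterated" and its value "true" come from the description above. The sketch relies on the fact that Python's standard-library parser (built on expat) reads the internal DTD subset and supplies declared attribute defaults, so the attribute never has to be typed in the text itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical element names and sample words, for illustration only.
# Note that the attribute is declared once in the DTD and does not
# appear anywhere in the encoded text below it.
SAMPLE = """<!DOCTYPE article [
  <!ELEMENT article (#PCDATA | term)*>
  <!ELEMENT term (#PCDATA)>
  <!ATTLIST term transliterated CDATA #FIXED "true">
]>
<article>The <term>Sheikh</term> shall receive the <term>Vakeel</term> of the Company.</article>"""


def transliterated_terms(root):
    """Collect the text of every element marked transliterated="true"."""
    return [el.text for el in root.iter() if el.get("transliterated") == "true"]


root = ET.fromstring(SAMPLE)
print(transliterated_terms(root))
```

The same query pattern would apply to the sig, seal, and ref markers: any element carrying the marker can be gathered in one pass, regardless of where it sits in the document tree.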
This project shows that there are many capabilities left to exploit through DTDs and XML schemas. A worthwhile area of further research would be to expand this DTD so that it applies to all documents in the British East India Company records, making it possible to search and collate data from all sources.
Bibliography
Kaplan, Frédéric. "A Map for Big Data Research in Digital Humanities." Frontiers in Digital Humanities. April 18, 2015. Accessed April 30, 2018. https://www.frontiersin.org/articles/10.3389/fdigh.2015.00001/full.
"India Office Records and Private Papers." The British Library. January 29, 2015. Accessed April 30, 2018. https://www.bl.uk/collection-guides/india-office-records.
Keay, John. The Honourable Company: A History of the English East India Company. London: HarperCollins, 1993.
Thomas, R. Hughes. Treaties, Agreements, and Engagements, Between the Honorable East India Company and the Native Princes, Chiefs, and States, in Western India, the Red Sea, the Persian Gulf, &c, Also Between Her Britannic Majesty's Government, and Persia, Portugal, and Turkey. Bombay: Bombay Education Society's Press, 1851.
"Treaties, Agreements, and Engagements, Between the Honorable East India Company and the Native Princes, Chiefs, and States, in Western India, the Red Sea, the Persian Gulf, &c: Also Between Her Britannic Majesty's Government, and Persia, Portugal, and Turkey." Google Play Books. Accessed April 30, 2018. https://play.google.com/store/books/details?id=b5lRAAAAcAAJ.