First International Workshop on Applying EpiDoc/TEI Markup to Complex Scripts
On Monday and Tuesday, October 5th and 6th, 2015, the project "Text Database and Dictionary of Classic Mayan" at Bonn organized the first international workshop on the use of the markup language XML, and in particular XML-based standards such as TEI and EpiDoc, for annotating and studying non-alphabetic writing systems.
The Text Encoding Initiative (TEI, http://www.tei-c.org/index.xml) is a consortium which collectively develops and maintains a standard for the representation of texts in digital form. It provides a set of Guidelines that specify encoding and annotation methods for machine-readable texts using the markup language XML.
The EpiDoc recommendations (http://sourceforge.net/p/epidoc/wiki/Home/) build on TEI to provide structured markup specifically for epigraphic documents.
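To give a concrete sense of what such an encoding looks like, the overall shape of an EpiDoc document can be sketched as follows. This is only a minimal skeleton following the TEI/EpiDoc Guidelines; all content is invented for illustration.

```xml
<!-- Minimal sketch of an EpiDoc-style TEI document; the element names
     follow the TEI Guidelines, but the content is invented. -->
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Hypothetical inscription</title></titleStmt>
      <publicationStmt><p>Unpublished sketch</p></publicationStmt>
      <sourceDesc><p>Imaginary text carrier</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <!-- the edition division carries the transcribed text itself -->
      <div type="edition">
        <ab>
          <lb n="1"/>first line of the text
          <lb n="2"/>second line of the text
        </ab>
      </div>
    </body>
  </text>
</TEI>
```

Further divisions (translation, commentary, apparatus) follow the same `<div>` pattern alongside the edition.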
During the workshop, experts on Maya writing, cuneiform, Egyptian hieroglyphic writing, Luwian hieroglyphs, Minoan and Aegean scripts, and Hebrew inscriptions, as well as specialists in Latin and Greek inscriptions, discussed ways to digitally mark up texts in complex writing systems with the help of metadata standards such as TEI and EpiDoc.
In cooperation with experts on non-alphabetic and other complex writing systems and on XML-based metadata standards for the annotation of texts, the Bonn Maya Dictionary project addressed several questions: for instance, whether and how the existing standards TEI and EpiDoc can also be employed to mark up texts written in hieroglyphic, cuneiform or linear scripts, some of which have been only partially deciphered (Maya) and others of which remain undeciphered (Cypro-Minoan). Invited speakers included experts on Hieroglyphic Luwian (Annick Payne, Basel), Aegean writing systems (Miguel Valerio), Egyptian writing (Daniel Werning, Berlin), cuneiform traditions (Hubert Mara), and Aztec hieroglyphic writing (Gordon Whittaker, Göttingen).
The workshop took place at the Abteilung für Altamerikanistik [Department of Anthropology of the Americas], Universität Bonn.
On the first day of the workshop, 30-minute talks were held on syllabic and logo-syllabic writing systems, and the problems and challenges addressed in these talks were the subject of a round-table discussion with all participants on the second day.
A key objective of XML-based markup of hieroglyphic, linear, or cuneiform texts is to represent the original spelling and arrangement of signs in their respective contexts. A linear transcription alone cannot represent the original text or primary source in its entirety, as many potentially significant details remain undocumented. Detailed markup of the original text is therefore of great importance, particularly for partially deciphered and undeciphered writing systems. In such cases, an alphanumeric or numeric nomenclature is often used to refer to individual signs, which facilitates corpus-linguistic analysis of the texts. In addition, the arrangement of the text and its position on the text carrier should be documented using TEI. To fully understand a writing system and the language and messages it expresses, detailed representation of primary sources and their contexts must be prioritised when digitally marking up documents. Digital documentation of original spellings using XML-based standards such as TEI is fundamental to the study of deciphered and undeciphered scripts, especially hieroglyphic, linear and cuneiform systems, because it enables detailed graphemic and graphetic analysis of the script. This analysis, in turn, constitutes the necessary basis for the script's linguistic and corpus-linguistic investigation. We identify this area as a significant desideratum in epigraphic research.
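One TEI mechanism that could carry such a sign nomenclature is the gaiji module: each catalogued sign is declared once in the header's `<charDecl>`, and every occurrence in the text points back to that declaration with `<g ref="…"/>`. The following is only a sketch; the label "T1" (Thompson's catalogue number for the Maya syllable u) is used purely as an example.

```xml
<!-- Sketch: declaring a catalogued sign once in the teiHeader ... -->
<charDecl>
  <glyph xml:id="sign-T1">
    <glyphName>Maya sign T1, syllabic value u</glyphName>
    <mapping type="catalogue">T1 (Thompson)</mapping>
  </glyph>
</charDecl>

<!-- ... and referring to it in the transcription, so that every token
     of the sign is retrievable by its catalogue number. -->
<ab>
  <g ref="#sign-T1"/>
</ab>
```

Because every token of a sign carries the same reference, frequency counts and co-occurrence queries over the corpus reduce to simple searches on the `ref` attribute, even where the sign's reading is still unknown.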
Topics of discussion
The interdisciplinary workshop thus gave epigraphers and experts on XML, TEI and EpiDoc the opportunity to discuss methods for investigating syllabic and logo-syllabic writing systems using these XML-based standards.
The contributors mainly addressed the following three topics in their presentations:
1A) Graphemics and graphotactic strategies of the respective writing systems: what sign functions are known, and what graphotactic strategies exist for representing meaningful units or words (e.g. affixation, infixation, ligature, superimposition, underspellings, diacritics, phonetic complements or indicators, semantic indicators, determinatives)? Do projects exist in which such graphemic strategies have been successfully encoded using XML?
1B) Signary: sign classification, sign catalogs, and the nomenclature used for the signs of a writing system: what is the significance of sign classifications and nomenclatures in epigraphic research? Where are sign classifications used, and what is their role in transliterating and transcribing texts in the respective writing systems?
2) Graphetics (the use of writing: the formal structure of linguistic units that does not create differences in meaning; the structure of texts), allography (sign variants) and spelling variants: what is the significance of allography? Are allographs annotated when transliterating and/or transcribing the respective writing systems, and if so, how? (In Maya writing, for example, there exist over twenty graphic variants of the 3rd person pronoun u "he, she, it".) What is the reading order of the signs in context, and are word separators or other graphic aids used to differentiate between meaningful units or words? Are there variations in the reading order of signs, possibly even systematized variation? If so, are these anomalous reading orders documented or otherwise indicated by epigraphers? What is the reading order of the meaningful units within texts (e.g. linear spelling, column arrangement, etc.), and are variations or anomalies in these realms indicated as well? Where images appear alongside text, is there a link between text and image? Are texts integrated into images or vice versa, and if so, how? How is the text arranged, and are the size of the text, the text carrier, and the writing material relevant to the graphemics (e.g. in Maya writing, underspellings may appear on objects that are difficult to carve)?
3) Current state of decipherment and legibility of the texts: what is the state of decipherment of the respective writing systems? How do we deal with undeciphered signs or text passages, and how do we mark them in transliterations and transcriptions? How do we handle hypothetical decipherments of individual signs or passages, and how do we accommodate alternative readings of the same passage? (Physical) gaps: are they indicated, and if so, how do we mark and comment upon them in the transliteration and transcription?
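For topic 3 in particular, EpiDoc already provides well-established elements for damage, loss and editorial uncertainty. The following sketch shows how they might be combined for a partially deciphered script; the syllabic values are invented.

```xml
<ab>
  <lb n="1"/>
  <!-- two signs completely lost to damage -->
  <gap reason="lost" quantity="2" unit="character"/>
  <!-- a damaged sign whose identification is uncertain -->
  <unclear>ba</unclear>
  <!-- a lost sign restored by the editor -->
  <supplied reason="lost">la</supplied>
  <!-- two competing readings of the same damaged sign -->
  <choice>
    <unclear>ka</unclear>
    <unclear>ku</unclear>
  </choice>
</ab>
```

Whether patterns such as `<choice>` (or EpiDoc's apparatus-based alternatives) scale to hypothetical decipherments in logo-syllabic material was precisely the kind of question the workshop set out to discuss.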
These questions, among others, fed first into a continuing discussion on the annotation of complex, non-alphabetic writing systems in the form of the mailing list "ENcoding COmplex Writing Systems" (ENCOWS) that arose from the workshop. On this list, discussion of all aspects of encoding complex writing in these and other languages is welcome, including epigraphy, text transcription, object and historical metadata, vocabularies and terminology, publication and infrastructure. Secondly, regularly scheduled workshops of this kind will hopefully follow and keep the discussion going. For further questions about the ENCOWS list or the topic of our workshop, please use the comment section or contact us at email@example.com