Some Meditations on Markup

Submitted by Sarah Stanley on Wed, 12/03/2014 - 21:24

This is my medium-form post. I never got a chance to write my post for the markup and metadata unit, so I'm writing what I would have tried to write for that, but more in-depth.

Now that I've been introduced to many different Digital Humanities methods through Professor Cordell's Texts, Maps, and Networks course, I want to return to what got me here in the first place: markup. Over the course of this semester, I've experienced the many ways in which digital tools both constrain and enable the exploration of texts. Nowhere has this been more apparent than in my work with the TEI. In my post on hacking and the TEI, I explored the ways in which knowledge of the TEI's guidelines and features allowed me to do what some of my classmates could not. Since the TEI is both extensible and customizable, it allows projects to focus on vastly different things. However, I wish to explore the ways in which constraints, rather than extensions, actually help generate new knowledge.

My first example comes from the Early Caribbean Digital Archive. Over the course of the summer and the past semester, the ECDA's encoding team has been editing an ODD file which (to a certain extent) dictates our encoding practices. As I mentioned in my post on hacking, this process of ODD editing has allowed us to expand what the TEI allows us to do. We've added the elements <commodity>, <flora>, <embSN> (embedded slave narrative), and several others which allow us to locate and describe textual features that the TEI Consortium doesn't recognize. However, I've found that some of the most important work I've done has come out of the ECDA's self-imposed limitations.

The ECDA's markup team decided to use the element <distinct> to encode words and phrases that were in some way textually distinct. The TEI defines <distinct> as "any word or phrase which is regarded as linguistically distinct, for example archaic, technical, dialectical, non-preferred, etc., or as forming a part of a sublanguage." The ECDA decided to conflate this element and the <foreign> element, which encodes text written in foreign languages. Since the ECDA is invested in bringing out the voice of the other, we decided that differentiating between "dialects" (such as Haitian French) and "languages" (such as standardized French) would reinforce pre-existing ideas about the importance of certain languages. In many cases, we also used <distinct> instead of <hi> when marking up italicized phrases. Our decision to constrain the TEI (i.e., to get rid of the <foreign> element) was rooted in our theories of Caribbean texts and literary history. We further constrained the <distinct> element by creating a closed list of values on the type attribute (which is a required attribute). These type values were: agricultural, dialect, economic, ethnographical, geographical, mentioned, military, nautical, religious, and scientific. This means that any time an encoder used the <distinct> element, she needed to use one of those descriptors.
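For readers who haven't edited an ODD, here is a minimal sketch of how constraints like these might be written. This is not the ECDA's actual ODD file: the schemaSpec identifier and the choice of modules are my own assumptions for illustration, and only the ten type values come from the list above.

```xml
<!-- Hypothetical ODD fragment; "ecda_sketch" is an invented identifier -->
<schemaSpec ident="ecda_sketch" start="TEI">
  <moduleRef key="tei"/>
  <moduleRef key="header"/>
  <moduleRef key="core"/>
  <moduleRef key="textstructure"/>

  <!-- Remove <foreign> so that encoders must use <distinct> instead -->
  <elementSpec ident="foreign" module="core" mode="delete"/>

  <!-- Make @type required on <distinct> and close its list of values -->
  <elementSpec ident="distinct" module="core" mode="change">
    <attList>
      <attDef ident="type" mode="change" usage="req">
        <valList type="closed" mode="replace">
          <valItem ident="agricultural"/>
          <valItem ident="dialect"/>
          <valItem ident="economic"/>
          <valItem ident="ethnographical"/>
          <valItem ident="geographical"/>
          <valItem ident="mentioned"/>
          <valItem ident="military"/>
          <valItem ident="nautical"/>
          <valItem ident="religious"/>
          <valItem ident="scientific"/>
        </valList>
      </attDef>
    </attList>
  </elementSpec>
</schemaSpec>
```

A schema generated from a customization along these lines would reject any <foreign> element, any <distinct> without a type attribute, and any type value outside the list.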

I was charged with encoding James Grainger's The Sugar Cane, which is an extensively annotated poem about the cultivation of sugar cane in St. Kitts. As I went through my text, I realized that the text italicized or otherwise marked as "distinct" several English words. As I tried deciding which type value to use, I realized that the only one that made any sense was "dialect." As a result, I have marked up many English-language phrases with <distinct type="dialect" xml:lang="en">. This encoding emphasizes that, from the ECDA's perspective, even English is a type of dialect, on par with Antillean Creole or Haitian French. The prevalence of this specific tag with these specific values on @type and @xml:lang also says something about the text itself. By forcing myself to use the <distinct> element, I was able to see the ways in which the text itself made English strange.

From my work with the ECDA's constraints, I have been able to come up with some theories of how the English language functions as a dialect in the piece. Often the English words encoded with <distinct> are words that the text condemns as corrupt or inaccurate. For example, one footnote, when describing a cashew, states:

Its Indian name is Acajou; hence corruptly called Cashew by the English. The fruit has no resemblance to a cherry, either in shape or size; and bears, at its lower extremity, a nut (which the Spaniards name Anacardo, and physicians Anacardium) that resembles a large kidney bean. (130-131)

This passage emphasizes the ways in which the English language corrupts (and by implication is corrupted by) its interaction with non-European languages. By making the English language distant and distinct, Grainger criticizes the English language for being influenced by people it colonizes. Grainger then goes on to propose a more correct and pure version of the English language, influenced by European language and scientific terminology alone. The text uses renditional distinctness to distance itself from English that has been corrupted by non-European languages.
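To make this concrete, here is a sketch of how the first sentence of that footnote might look under the ECDA's constraints. It is an illustration rather than an excerpt from my actual file: I am assuming that "Cashew" is among the words the printed text sets apart, and I have left the other names in the sentence unencoded, since I don't want to guess at their values here.

```xml
<!-- Illustrative only: assumes "Cashew" is typographically marked in the source -->
<note place="bottom">Its Indian name is Acajou; hence corruptly called
  <distinct type="dialect" xml:lang="en">Cashew</distinct> by the English.</note>
```

Because the type attribute is required and its list of values is closed, the English word has to be given a category, and "dialect" is the only one that fits.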

Although our decision to use <distinct> rather than <foreign> or <hi> was rooted in pre-existing theories about the world, it ended up helping me to generate new theories about language in the text. The constraints that were imposed upon me, through our re-definition of TEI elements, were generative rather than restrictive.

Many of my recent experiences with publishing TEI through TEI Boilerplate and the Women Writers Project have also shed light on how publication and display constraints shape the TEI. Kevin Smith has a series of excellent posts on the topic of visualization and the TEI. His discussion of prolepsis and metalepsis in markup is particularly useful when thinking about the ways in which output always constrains encoding, no matter how "descriptive" (as opposed to "procedural") the markup tends to be. Smith writes (of Wendell Piez's "Beyond the 'descriptive vs. procedural' distinction"):

[His] argument is that the productivity of the TEI (and generic languages like TEI) arise in the tension/slippage between its proleptic and metaleptic characteristics. TEI tries to be retrospective while also benefitting from strict validation schemes (looking forward). (Exploratory Markup)

While the TEI is certainly meant to describe pre-existing objects, it must necessarily create something new, whether as a result of the validation schemes (as in the example of <distinct>) or with publication constraints. I specifically noticed this when I taught a group of students how to mark up texts and publish them in TEI Boilerplate. The students were instructed to come up with a list of <interp>s, or their interpretive categories for the text. These were then applied to various elements as values on the ana attribute. Basically, the students needed to come up with a list of motifs that existed across all of their texts, and then mark up when those motifs appeared. They then edited Boilerplate's custom.css file in order to make the different interpretations display with different colors. It was largely easy until the students realized that @ana could take any number of values. This meant that they could apply several interpretations to one element. For example, the phrase "Virgin Modeſty" could be encoded as <seg ana="#virginity #virtue">Virgin Modeſty</seg> instead of only encoding #virginity or #virtue. Obviously, this was great for allowing students to create complex literary analyses with the TEI, but it made it much harder for students to render their documents in TEI Boilerplate.
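A minimal sketch of the pattern the students were working with might look like the following. The two identifiers come from the example above; the type value on <interpGrp>, the wording of the <interp> descriptions, and the surrounding <div> and <p> are my own assumptions, added only so the fragment is well-formed.

```xml
<div>
  <!-- Interpretive categories, each with an xml:id that @ana can point to -->
  <interpGrp type="motif">
    <interp xml:id="virginity">References to virginity</interp>
    <interp xml:id="virtue">References to virtue</interp>
  </interpGrp>

  <!-- One segment pointing at two categories at once -->
  <p><seg ana="#virginity #virtue">Virgin Modeſty</seg></p>
</div>
```

The value of ana is a space-separated list of pointers, so a single <seg> can carry both interpretations. The difficulty comes at display time, when one stretch of text has to be given one color.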

The students were quite clever in coming up with workarounds for this problem. For example, one student continually encountered the coincidence of #magic and #animals. As a result, he created a new familiars <interp>, which allowed him to render this coincidence as a new color.¹ The students also found ways to encode different parts of phrases with different @ana values, so that words alternated in color. The students obviously were aware that their documents would ultimately be displayed, and the constraints involved in this process drastically impacted the ways in which they encoded. The process of negotiation between the limitations placed by the css and xsl editing and the xml encoding itself allowed the students to create new methods for interpretation and generate new ideas about the text itself.

Now, of course, neither of these constraints (<distinct> or css) would have meant anything if the encoders didn't understand the inner workings of the various tools and theories at hand. For example, I could have ignored the ECDA's theoretical constraint which states that all distinct words should be encoded the same way. In this case, I would have simply thought: "Well, English isn't a dialect," and refrained from encoding the English phrases at all. If the students did not understand that combining @ana attribute values complicated their css files, they would have simply kept the multiple values, instead of generating new interpretive categories.

All of this seems to suggest that the creation of new knowledge using digital tools and methods is dependent upon not only the constraints of those tools, but also an awareness of the impact they have upon what you make. It is not enough simply to expand or constrain things like the TEI; it is also necessary to create productive frictions as we modify our tools.


¹ This document was encoded by Sebastian Alberdi, for Marina Leslie's Renaissance Bodies class. He also was responsible for much of the css editing for the project.