
Editioning Shakespeare
Re-Thinking the Electronic Edition of Shakespeare
(Let's Start With Three Basic Problems)

Peter Paolucci, Ph.D. York University, Toronto, Canada

July 30, 2014
Revised January 6, 2016

If you would like to experience the functionality of the Shakespeare XML Project, or if you would like more information on the grammar of its XML, please fill out this request form. All other comments or inquiries should be addressed to: editors@shakespearexml.ca or shakes@yorku.ca.

Preamble

Editing Shakespeare is a daunting task with some unique challenges. For starters, the catalogue of printed FGEs (First Generation Editions) contains an unusually wide range of discrepant variants with no unequivocal way to resolve the problem of textual authority--even when the stemmatology seems clear. If it were only a matter of accidentals (punctuation, orthography, line numbering, etc.) our problems might be less contentious. However, when lines and whole speeches are added (or missing--depending on your point of view), when lexis and semantics can seem contradictory in two different editions, and when whole scenes have been re-sequenced, deciding on a "reliable" (some might say convincing) solution to the problem of documenting and synchronizing variations is suddenly much more difficult.

These problems are compounded by another fact: it's unusual that a writer of Shakespeare's notoriety left no autograph editions, and consequently there is no clearly authoritative text to resolve disputes. Ironically, in spite of scholarly training, editors and critics alike have been seduced by the sirens of speculation in these matters. Led into the dangerous waters of what Walsh calls "psychological intentionalism," or the idea that the texts themselves are "signs or notes of the author's mind" (37: Walsh quoting J. Dover Wilson), we might well ask why so many Shakespearean scholars have felt compelled to speculate about that which is absent, using evidence which is either non-existent or shoddy. Copy-text theory itself is implicated, but so is audience shorthand theory, abridgement theory (Irace), revision theory (Irace, Jowett), "Lost Archetype theory" (Rosenbaum 41), and two-text theory (Blayney) to name a few.

As Rosenbaum (The Shakespeare Wars) and others have suggested, the resultant and highly contentious debates have both informed and undermined the history of Shakespearean editorial scholarship. The long history of editing Shakespeare is fraught with arguments about problems of textual authority (1). And the legacy of editorial scholarship has not, as one might have hoped, helped propel us toward any satisfactory solution. For instance, whether by accident or design, most pre-1980s print editions give a confident (but false) impression of a single, monolithic text which is "the" play as it has been received; even variorum editions take a single (privileged) text as the basis for "the" edition. When faced with "good" and "bad" quartos in Hamlet, for example, and so many significant differences, it becomes apparent that conjectural emendation, speculative stemmatology and reductionist texts are as obfuscating and misleading as they are informing and clarifying. And who's to say with any certainty that the Q1 Hamlet can't work as a text (2)?

F1 has conveniently served as a base text (copy-text) of trusted authority, perhaps because of its posthumous and collaborative publication by Shakespeare's own associates (Heminge and Condell), but also, I suppose, because these editors were the first to impose a navigational order (act.scene.line) on the amorphous quarto versions. Many arguments have been recruited into the service of supporting the inherent literary and editorial superiority of F1, but the evidence is not always convincing. Thus it has come to pass that recent editors (since the 1980s) working electronically and in print have gone beyond the variorum edition by visualizing editorial complexity in many new ways. There are appended, interlocking and conflated editions. For instance, the second edition of The Oxford Shakespeare (Jowett, Montgomery, Taylor and Wells: 2005) offers both the Folio and Quarto editions of Lear, one after the other. The second Norton edition (Greenblatt, Cohen, Howard, and Maus: 2008) raises the stakes of that ante and offers three different textual variations of Lear: the Q1 version on the verso side with a matching or synchronized F1 version on the recto side, and additionally, a conflated version that is appended after these two interlocked versions. In Greenblatt's second edition of the Norton, King Lear line 1.3.33 in Q1 reads "Gloucester: 'I shall, my liege'" while F1 shows "lord" instead of "liege." Barbara Lewalski's conflated version adopts the Q1 reading and suppresses F1.

Comparative renderings empower the scholarly reader, but one could also argue that they muddy the waters for the neophyte who seeks only the most basic understanding. In conflated renderings the reader might never be aware that variants exist; for some readers this is acceptable, for others, not so. One logistical problem emerging from the complexity of available editions is how to synchronize line numbering across so many textual variants. A second problem is how to mark up texts when there are so many textual variants that need to be tracked and synchronized with each other. This paper discusses just a few of the ways in which the Shakespeare XML Project ("SXP") offers solutions to these problems. The paper also briefly discusses some ways to improve the sophistication of word search across all textual variations, including modern and translated ones.

The SXP re-constitutes the old power hierarchy between editor and reader, shifting it away from the control of editorial expertise over to reader preference and empowerment. In so doing, the SXP makes possible the act of editioning, a continuously ongoing process by which multiple and variant chunks (lines or passages) of a play--and their correspondent annotations and commentaries--can be infinitely and spontaneously re-combined at will. These reader-controlled mashups automatically update through RSS feeds and other Web 2.0 technologies, and with different filters (linguistic, philosophical, biographical, historical, scholarly, public, country of origin, time period etc.). Instead of the noun, "an edition" or even "editions," both of which come from the physicality of the book, the SXP makes a case for editioning, a more ethereal process that emerges from the effects of digitization. Since each reader now modifies their own content ("edition") according to individual preferences, no two editions at any moment in time would necessarily be the same. Imagine collated or variorum lines, annotations, description of sources, embedded media, critical annotations, and other critical features shifting dynamically, like the pieces of color in a kaleidoscope. Refresh rates will be as short as a few minutes or as long as a few months, and updates will occur only in the areas that interest the reader.

The SXP signifies a paradigm shift away from the edition as object (noun), towards editioning as process (verb); away from a top-down model of scholarly expertise (the editor knows best) and towards a bottom-up model of collective intelligence (the editor merely empowers the reader to make their own editorial choices). Arguably, the material print culture of the book, with its educated and class-based readers, over-determined the power attributed to knowledge producers (editors) and the consequent disenfranchisement of consumers (readers). It is not mere digitization alone that destabilizes this relationship, but the particular way in which Web 2.0 structurally undermines and reforms that "collaborative space where people can interact," to use Berners-Lee's term. The SXP's "texts" do not really exist as .html files on a drive, or even as records in a database on a server; they only need exist for that single moment when the reader calls them forth (assembles them) as a customized but transient configuration of texts and annotations. A local "print" (output to paper) is merely a snapshot of the mashup at that moment in time.


Problem 1: Line Numbering

The McKerrow-Hinman TLN numbering system takes F1 as the base text and is then forced to distort the line numbering on variant editions in order to retain consistency across different editions. Apart from the inherent awkwardness of this solution, there is the additional problem that the editorial decision of privileging F1 over alternatives is no longer a matter of orthodoxy. This shift in editorial practice is evidenced by the practice of transforming single-text editions (print and/or digital and even from variorum editions) into multi-editions that offer the reader a more flexible choice of a base text. I'm thinking here, for example, of the second edition of The Oxford Shakespeare (Jowett, Montgomery, Taylor and Wells: 2005) which offers both the Folio and Quarto editions of Lear, one after the other. The second Norton edition (Greenblatt, Cohen, Howard, and Maus: 2008) raises the stakes of that ante and offers three different textual variations of Lear: the Q1 version on the verso side with a matching or synchronized F1 version on the recto side, and additionally, a conflated version that is appended after these two interlocked versions. Digitally there is Bevington's As You Like It (3) and hamletworks.org, edited by Bernice W. Kliman (also the inventor of the Enfolded Hamlet, displayed here), Frank N. Clary, Hardin Aasand, and Eric Rasmussen; with Jeffery Triggs, webmaster at <http://www.hamletworks.org>.

The contentiously named "bad" quarto of Hamlet (Q1) seems to be "missing" lines and entire passages that appear elsewhere--for example, in Q2 and F1 (4) (although one might also argue that the other texts have a surplus of words and lines). Tracking variations and line numbers is complex and convoluted. Lear, too, is notoriously problematic, with substantially different endings in F1 and Q1. The complexity of these FGEs is further compounded by problems inherent in F1 itself, with its own internal set of complications pertaining to multiple compositors and their differing typesetting eccentricities. And beyond that, more complications emerge from errors in pagination and the controversial "interruption theory," which suggests that a false start on F1 was made circa 1619 and that, consequently, some pagination and other errors ensued. The current line numbering systems (TLN and act.scene.line) break down when they try to accommodate the textual differences among FGEs. Why should F1 always be the base text? Why can't any edition, even Hamlet Q1, in any language, in any time, and in any place, become a base text? The answer is not simple. True, the weight of editorial orthodoxy and ideology privileges F1, but the practical fact of the matter is also that cross-referencing these lines in print is a nightmare. Hinman's landmark TLN (through line numbering) system and McKerrow's +1, +2 convention of re-numbering unsynchronized lines have, to date, been the only methods of identifying lines across different textual variants when nonstandard (i.e., non-F1) editions are selected. All conventional line numbering methods, even Heminge-Condell, are contextually emergent from (and dependent on) print editions. You can only have one text against which the others are measured. Consequently, for better or worse, Hinman's TLN system--and indeed most print editions--use F1 as the default base text, and there is no flexible way to retain any kind of logical line re-numbering when gaps occur.

No longer is it true, as Jenkins and Foakes claimed, that "any editor has to choose between variant readings" (633); and no longer is it true, as Charles Whitworth observed, that the editor must "fix that which must remain unfixed, fluid, open, ambiguous, always at the mercy, inspired or banal of producers" (107). Rosenbaum describes a variation of this polemic in his discussion of the troublesome complexity of the ambiguous endings in Lear. There are those, says Rosenbaum, "who think it's [the bewildering phenomenon of meaning-altering choices] too complicated for the nonspecialist and can cause worries. In other words, you, the educated general reader, can't handle the truth" (128). Greenblatt's reported response was, "nonsense" (130).

Current editorial practice requires the flexibility to accommodate user preferences for any base text and a scalability that accommodates the collocation of multiple variations across time, medium and language. The editor's job has shifted from the expert who tells us which textual variant to privilege, to the facilitator who empowers us to select whichever edition we want to prioritize. This new world of editorial practice requires a new line numbering system and, with it, a new way to mark up text so that these reader-empowered choices can be properly--and efficiently--enabled. The Shakespeare XML Project (SXP) explores many ways to do this, but this essay focuses on only three practical responses to these new editorial principles.


The SXP line numbering alternative is called RLN or "Relational Line Numbering." This new system allows any edition to be privileged over others and still allows textual collation, juxtaposition or superimposition without awkward and unnatural line numbering. Second, the SXP's alternative method of markup embeds only code that is directly related to navigational coordinates (act.scene.line, TLN and RLN), dramatic context (for example, speaker, listener and over-hearer), and properties of language; any further markup uses X-Pointer to apply new XML tags dynamically to the appropriate coordinates in the text and, by implication, to all their corresponding line variants in different media (theatrical and cinematic scripts) and language translations. Third and finally, I will briefly discuss the SXP's use of a genealogical phonetic tool called Soundex as an alternative way to apply a single loose word search across multiple texts with variant orthography. Wikipedia calls Soundex a "phonetic algorithm for indexing names by sound, as pronounced in English" and, as good luck would have it, PHP comes with built-in Soundex functionality! [Try entering "self" and "selfe" at <http://resources.rootsweb.ancestry.com/cgi-bin/Soundexconverter>]
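To make the Soundex idea concrete, here is a minimal sketch of the standard American Soundex rules written in Python. It is an illustration only, not the SXP's own PHP code; the function name soundex and the sample words are chosen for this example.

```python
# Letter-to-digit table used by the standard American Soundex rules.
CODES = {}
for digit, letters in {"1": "bfpv", "2": "cgjkqsxz", "3": "dt",
                       "4": "l", "5": "mn", "6": "r"}.items():
    for letter in letters:
        CODES[letter] = digit


def soundex(word):
    """Return the four-character Soundex key for a word (empty string if no letters)."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    key = letters[0].upper()
    previous = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters that share a code
        code = CODES.get(ch, "")
        if code and code != previous:
            key += code
        previous = code  # a vowel resets the run, so a repeated code counts again
    return (key + "000")[:4]


# Variant spellings collapse to the same key, so one loose search finds both.
print(soundex("self"), soundex("selfe"))   # S410 S410
print(soundex("hour"), soundex("houre"))   # H600 H600
```

Because the first letter is kept as-is, spellings that differ in their initial letter (for example "vpon" and "upon") would still produce different keys, so a search layer built on Soundex alone would need extra handling for u/v and i/j interchange.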

RLN (Relational Line Numbering) shifts the focus away from the linear and sequential assumptions (limitations) inherent in the two conventional line numbering systems (act.scene.line and TLN). By assigning a unique identifier to each line, much like DNA, all similar lines, regardless of the edition or even language in which they are written, share the same unique RLN. Thus, readers who locate a line in (say) F1 will easily be able to find its equivalents in eighteenth-, nineteenth- and twentieth-century editions, as well as in films, audio readings and even translations. Relational mapping allows scholars a more comprehensive way of studying the "versioning" of Shakespeare's plays over time without an awkward and unnatural numbering system that skips the literal sequencing on the page or adds "+" to additional lines. The Shakespeare XML Project allows users to use conventional act.scene.line coordinates to find a line, but underneath that search it is the RLN that makes everything work. In order to be consistent with the TLN system, each edition starts at 1 and increases incrementally by 1 for each line. In other words, SLN (Sequential Line Numbering) does not reproduce the unnatural skips and plusses that Hinman's TLN produces.

Let's look at an example. First, using Excel, a map is made of each line in each version of each textual variant. The model is fully scalable so an infinite number of textual variants can be added. This mapping records both act.scene.line and TLN, but ultimately these are incidental in the RLN system--they are "legacy" navigational coordinates that can be used to locate a line. Clusters of similar lines are then colour coded to show they are part of a coherent unit, but each line always remains autonomous in its group. Each line in F1 has been assigned a unique ten-character id number using a tool called Password Generator <http://www.folder-password-expert.com/password-generator.html>. The ten-character string contains upper and lower case letters and numbers. In the example below, the line "You come most carefully vpon your houre." is TLN 7 in F1 and Q2, but it is TLN 4 in Q1. No matter. It's the same line wherever it is, and it has been assigned an RLN (in this case) of <y59zfaLJgG>. RLNs are not sequential and are randomly generated, thus allowing for the infinite insertion (or deletion) of extra lines between existent TLNs. The line numbering flexibility also accommodates performance (theatrical or cinematic) scripts/texts that re-sequence scenes or even that add new lines (obviously not Shakespearean). So RLNs could, for example, easily link original lines from F1 to modern-day authored lines in a children's version of Shakespeare or any translation.
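The mapping just described can be sketched in a few lines of Python. This is a hypothetical illustration, not the SXP's Excel workflow or the Password Generator tool: the names new_rln, rln_map and find_equivalents are invented here, and only the TLN figures and the RLN y59zfaLJgG come from the paragraph above.

```python
import secrets
import string

# Upper and lower case letters plus digits, as described in the text.
ALPHABET = string.ascii_letters + string.digits


def new_rln(existing, length=10):
    """Return a random identifier that is not already a key in `existing`."""
    while True:
        candidate = "".join(secrets.choice(ALPHABET) for _ in range(length))
        if candidate not in existing:
            return candidate


# rln -> {edition: TLN}; values taken from the example in the text.
rln_map = {"y59zfaLJgG": {"F1": 7, "Q2": 7, "Q1": 4}}


def find_equivalents(edition, tln):
    """Given a legacy coordinate in one edition, return (rln, all mapped places)."""
    for rln, places in rln_map.items():
        if places.get(edition) == tln:
            return rln, places
    return None


# Starting from Q1 TLN 4 we reach the same family of lines as from F1 TLN 7.
print(find_equivalents("Q1", 4))

# A newly authored line (say, in a hypothetical children's edition) simply
# receives a fresh RLN; no existing numbering has to shift to make room.
fresh = new_rln(rln_map)
rln_map[fresh] = {"ChildrensEdition": 1}
```

Because the identifiers carry no positional meaning, inserting or deleting lines never forces a renumbering; the legacy TLN and act.scene.line values are just attributes attached to each RLN.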

Figure 1: Relational Line Numbers (RLNs) assigned to different lines in different editions (F1, Q2 and Q1)



The resulting output locks each RLN to whatever act.scene.line or TLN number corresponds to it in all versions. Here below is what it looks like in F1.



Figure 2: Showing the mapping of RLNs to act.scene.line and true TLN (TTLN)


RLNs avoid the incongruence inherent in the TLN and act.scene.line systems. (5) The logic of true sequential line numbering or "true through line numbering" (TTLN) in individual quartos has been sacrificed in order to synchronize with F1. The physical constraints of printed text may have necessitated the collation of TLNs against some kind of "master copy," so by forced, generally-accepted convention, F1 has been the "gold standard" even though its authority has sometimes been contested and F1 is incomplete. Not so any more. There are many benefits. Why should the logic of the local line numbering in Q1 be sacrificed for the sake of line collation with other editions?

In the following F1 example, (6) the line numbering is logical because it remains consistent with the actual number of lines you see, or "True Through Line Numbering System" (TTLN). It looks like this:

1 Actus Primus. Scoena Prima. Enter Barnardo and Francisco two Centinels. Barnardo. WHo's there? 5 Fran. Nay answer me: Stand & vnfold your selfe. Bar. Long liue the King. Fran. Barnardo? Bar. He. 10 Fran. You come most carefully vpon your houre.

but the numbering for Q1 (7) is defined by its relation to F1 and consequently the TTLN makes no sense in this standalone file: the fourth line you see is actually numbered as the 10th line. It looks like this:

   Enter Two Centinels.
   1. STand: who is that?
   2. Tis I.
10 1. O you come most carefully vpon your watch,

Note how Q1 skips from line 3 directly to line 10 in order to remain synchronized with F1, when in fact line 10 in Q1 ("1. O you come most carefully vpon your watch,") should really be TTLN 4 if it were to be consistent with its internal (local) line numbering logic.

This next example illustrates in a slightly different way, how digital editions remain unnecessarily trapped in the linear model of TLN with one standard base text. The editors follow the accepted practice of numbering line insertions by adding "+" to the extra lines while retaining the same stem number (in this case, 124):


123     The source of this our watch, and the chiefe head        1.1.106
124     Of this post hast and Romadge in the land.               1.1.107
124+1   { Bar. I thinke it be no other, but enso;}               1.1.108
124+2   {Well may it sort that this portentous figure}           1.1.109
124+3   {Comes armed through our watch so like the King}         1.1.110
124+4   {That was and is the question of these warres.}          1.1.111
124+5   { Hora. A moth it is to trouble the mindes eye:}         1.1.112

Those "+1, +2" sequences in the left-hand margin are cumbersome and do not map easily to other variants. Moreover, if Q1 with all its aberrations were also to be included in an enfolded edition, the overlays would exceed the space limits of the screen (or page). The complexity of three overlays, each with differing color and bracket codes, would be difficult to manage. Clearly, as long as F1 (or any text) remains an invariable, lodestar copy-text, the TLN system will produce eccentric and illogical line numbering. The planned repository in the Shakespeare XML Project will accommodate an unlimited number of textual variants from any generation of editions (even in translation); each textual variant can always begin with line number 1 and proceed continuously. The logic of the local TLN numbering system (TTLN) will always remain consistent with itself.

Here's an example that shows the differences in the opening lines of FGE Hamlet texts but with the RLN assignment:

F1: TLN 1: Enter Barnardo and Francisco two Centinels. <wAGtmr84nw>
Q1: TLN 1: Enter Two Centinels. <wAGtmr84nw>
Q2: TLN 1: Enter Barnardo and Francisco two Centinels. <wAGtmr84nw>
Q3: TLN 1: Enter Barnardo,and Francisco, two Centinele. <wAGtmr84nw>
Q4: TLN 1: Enter BARNARDO, and FRANCISCO, two Centinels. <wAGtmr84nw>

Since each of these lines is essentially the same, they share the same RLN; in this case, the SXP has assigned all these lines the sample RLN of <wAGtmr84nw>. Once assigned, the "family" of lines associated with this line always remains associated by virtue of the shared RLN; any edition, therefore, can easily become the copy-text and any or all other editions will easily map to it. Find one instance of the line and you've found them all!

As long as the RLNs are mapped to each other, the reader can scroll down in one version while the other variants dynamically adjust. When collating or juxtaposing editions, the reader no longer needs to know the line or its TLN or act.scene.line designation; they merely have to locate any line from that family by searching for words or phrases. The Shakespeare XML Project is currently developing this functionality.


(Apologies for this eccentric screen capture.) The line variant in this example is "Touching this dreaded sight, twice seene of vs," and it has been assigned an RLN of <jdltNeXV26>. That RLN maps to TTLN 32 in F1 and TTLN 21 in Q1. At the top of the menu, the user can then "lock in" the synchronicity of line numbering between the two versions, or unlock them and scroll them independently. The highlighted lines (40 in F1 and 30 in Q1) are simply the natural highlighting that occurs every 10 lines. It's the two lines at the top of each column (32 in F1 and 21 in Q1) that are the focus of this discussion.



Figure 3: Synchronizing the same RLN number across two different textual variations (F1 and Q1)

This penultimate example illustrates what happens when one or more TLNs are different:

F1: TLN 16: Enter Horatio and Marcellus. <5khRVKOjF3>
Q1: TLN 8: Enter Horatio and Marcellus. <5khRVKOjF3>
Q2: TLN 16: Enter Horatio and Marcellus. <5khRVKOjF3>
Q3: TLN 16: Enter Horatio and Marcellus <5khRVKOjF3>

All these lines are considered equivalent and therefore they share the same RLN, in this case <5khRVKOjF3>. Find one line and you easily find them all.

The same RLN can also be assigned to more than one TLN. In this example, notice that Q3 distributes the same single line over two lines, numbers 13 and 14. The ability to take the same RLN and distribute it over two or more lines is called spanning an RLN.

F1: TLN 14: Well, goodnight. If you do meet Horatio and <dhnB5xGfYy>
Q1: TLN 5: And if you meete Marcellus and Horatio, <dhnB5xGfYy>
Q2: TLN 14: Well, goodnight. If you do meet Horatio and <dhnB5xGfYy>
Q3: TLN 13: Well, good night: <dhnB5xGfYy>
Q3: TLN 14: If you do meet Horatio and Marcellus <dhnB5xGfYy>

All these lines are equivalent even though some are expressed as one line and others as two: therefore, they all share the same RLN, in this case, <dhnB5xGfYy>. When a visualization of this collation is displayed, the output shows that TLN 13 and 14 in Q3 are relationally mapped to TLN 14 in F1 and Q2, and TLN 5 in Q1.
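The relational mapping for a spanned RLN can be sketched as follows. This is a minimal Python illustration under assumed names (RECORDS, build_index and collate are invented for the example); the editions, TLNs, readings and RLN are the ones listed above.

```python
from collections import defaultdict

# (edition, TLN, reading, RLN) tuples copied from the example above.
RECORDS = [
    ("F1", 14, "Well, goodnight. If you do meet Horatio and", "dhnB5xGfYy"),
    ("Q1", 5, "And if you meete Marcellus and Horatio,", "dhnB5xGfYy"),
    ("Q2", 14, "Well, goodnight. If you do meet Horatio and", "dhnB5xGfYy"),
    ("Q3", 13, "Well, good night:", "dhnB5xGfYy"),
    ("Q3", 14, "If you do meet Horatio and Marcellus", "dhnB5xGfYy"),
]


def build_index(records):
    """Group TLNs by RLN and edition; a spanned RLN keeps several TLNs per edition."""
    index = defaultdict(lambda: defaultdict(list))
    for edition, tln, _reading, rln in records:
        index[rln][edition].append(tln)
    return index


def collate(index, base_edition, base_tln):
    """Treat any edition as the base text and return the TLNs mapped to the same RLN."""
    for editions in index.values():
        if base_tln in editions.get(base_edition, []):
            return {edition: list(tlns) for edition, tlns in editions.items()}
    return {}


index = build_index(RECORDS)
# Q1 works as the base text just as well as F1 does.
print(collate(index, "Q1", 5))
# {'F1': [14], 'Q1': [5], 'Q2': [14], 'Q3': [13, 14]}
```

Storing a list of TLNs per edition, rather than a single number, is what lets one RLN span two physical lines in Q3 while remaining a single line in F1, Q1 and Q2.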

As well, the SXP will soon have a "zoom" feature installed which will allow readers to browse other lines in the vicinity of any particular line.


The Shakespeare XML Project allows readers to focus on texts in the database in three different ways: play-centric ("choose Hamlet or Lear"), edition-centric ("choose Hamlet in any combination of Q1, Q2, Q3, Q4, F1"), or unit-centric (choose a line, phrase, or word from any edition and trace it in other editions over time). Plays or passages can be juxtaposed, but because each line is defined as a discrete datum, readers can view single lines (and their words or phrases) in as many editions or textual variants as they wish. The SXP creates a kind of mashup, or über edition; by selecting particular combinations of plays, editions, and words/lines/phrases, a "subset edition" or subset mashup is created which can then be printed locally or, if copyright permissions allow, sent directly to a publisher for printing and binding. (This functionality is roughed in but not yet built, and we have not yet begun the necessary discussions with publishers.) The SXP mashup is a master library, holding a superset of an infinite possible number of textual variants that can be re-mixed in multiple combinations and permutations. Because everything is built in open source, the SXP resource also connects easily to other electronic editions--even legacy ones.


Problem 2: The Inflexibility of Static, Embedded Coding



Please note: Time and space do not allow me to include the schema and full XML grammar that the SXP has developed. The 32-page document is available to those who have a guest account for the system (8). The main point here is that the SXP avoids embedding hard-coded tags around the actual Shakespearean text as much as possible and offloads that work to X-Pointer (9) so that the markup can take place virtually.

The methods and standards of current markup practice are dominated by the TEI model <http://www.tei-c.org>. As they themselves claim, "Since 1994, the TEI Guidelines have been widely used by libraries, museums, publishers, and individual scholars to present texts for online research, teaching, and preservation." The TEI's guidelines (P5 is the current standard) provide a logical rubric, a protocol, for the coding and markup of electronic texts. The guidelines "...specify a set of markers (or tags) which may be inserted in the electronic representation of the text, in order to mark the text structure and other features of interest." The TEI is also flexible in its guidelines, allowing individuals to develop "simpler data capture TEI-conformant schemas, for example with limited numbers of elements, or with shorter names for the tags being used most often." The Chapter 7 guidelines ("Performance Texts") have been designed to describe dramatic texts, and yet they are not a good fit for the particular complexities of editing Shakespeare. Here is why.

Let's compare the TEI markup of the first few lines of Hamlet Q2 with the SXP's markup of the same lines. Notice the following things. First, the SXP markup uses the RLN system to synchronize line variants with TLNs and act.scene.line coordinates across an infinite number of editions; each new edition is marked up only one time with these navigational, dramatic context and language tags. This means that once a text has been marked up there is no need (ever) to go back to each edition and add more code if at a later point someone wanted to add new XML tags that identified grammatical or semantic properties or any other characteristic. So let's say someone looked at the line "Who's there" and wanted to add tags to describe that line. With conventional embedded markup, if I had multiple editions in the database (F1, Q1, Q2, Q3, Q4, F2, F3, Rowe, Pope, Johnson, Capell, Warburton, Theobald, even Bowdler) I would have to open up each edition and hard-code the same tags 14 different times! And even then, I would not be finished if new tags came along.

By using X-Pointer, SXP's markup tags can be dynamically assigned to the RLN number (in this case <Fk04TfCPFn>), and every edition with this line would be marked up in the same way when/if needed. A secondary benefit is that an infinite number of new tags can be added virtually through X-Pointer; therefore, the number of hard-coded tags inside the actual Shakespearean text of any/all these editions would never change and never become excessive. In this way, an infinite number of new markup tags can be added to every play/script/transcript in the database with only a few keystrokes. Second, the only other attributes the SXP marks up are dramatic context (who is speaking, who is being addressed and who overhears) and language. This structure, which separates language from dramatic context, allows words or phrases to be searched with or without attention to speaker, listener and overhearer. More on this in the third section of this article, but the SXP markup allows searches by gender (including berdache), social status, and even familial relationships.
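The principle of attaching tags to an RLN rather than to each edition's text can be sketched without any XML tooling. The following Python fragment is a plain-dictionary analogue of standoff annotation, not actual X-Pointer syntax and not the SXP's implementation. The names EDITIONS, annotations, annotate and render, and the tag mood="interrogative", are hypothetical; it is also an assumption of this sketch that the F1 reading shares the Q2 line's RLN.

```python
# Edition texts keyed by RLN. The base markup of each edition is never edited.
# Assumption: F1's reading of this line shares the RLN shown for Q2.
EDITIONS = {
    "F1": {"Fk04TfCPFn": "WHo's there?"},
    "Q2": {"Fk04TfCPFn": "VVHose there?"},
}

# Standoff layer: RLN -> tags, stored separately from every edition.
annotations = {}


def annotate(rln, **tags):
    """Attach tags to an RLN once; every edition sharing that RLN inherits them."""
    annotations.setdefault(rln, {}).update(tags)


def render(edition, rln):
    """Combine an edition's reading with the standoff tags at display time."""
    text = EDITIONS[edition][rln]
    tags = annotations.get(rln, {})
    attrs = "".join(f' {name}="{value}"' for name, value in sorted(tags.items()))
    return f'<line rln="{rln}"{attrs}>{text}</line>'


# One call instead of one hand edit per edition (hypothetical tag name).
annotate("Fk04TfCPFn", mood="interrogative")
for edition in EDITIONS:
    print(edition, render(edition, "Fk04TfCPFn"))
```

Adding a fifteenth edition, or a new kind of tag, changes only one of the two dictionaries; nothing already stored needs to be reopened.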

[Example of typical TEI markup]
<div1 n="I" type="Act">
  <head>ACT I</head>
  <div2 n="1" type="Scene">
    <head>SCENE I</head>
    <stage rend="italic">Enter Barnardo and Francisco, two Sentinels, at several doors</stage>
    <sp>
      <speaker>Barn</speaker>
      <l part="Y">Who's there?</l>
    </sp>
    <sp>
      <speaker>Fran</speaker>
      <l>Nay, answer me. Stand and unfold yourself.</l>
    </sp>
    <sp>
      <speaker>Barn</speaker>
      <l part="I">Long live the King!</l>
    </sp>
    <sp>
      <speaker>Fran</speaker>
      <l part="M">Barnardo?</l>
    </sp>
    <sp>
      <speaker>Barn</speaker>
      <l part="F">He.</l>
    </sp>
    <sp>
      <speaker>Fran</speaker>
      <l>You come most carefully upon your hour.</l>
    </sp>
    <sp>
      <speaker>Barn</speaker>
      <l>'Tis now struck twelve. Get thee to bed, Francisco.</l>
    </sp>
    <sp>
      <speaker>Fran</speaker>
      <l>For this relief much thanks. 'Tis bitter cold,</l>
      <l part="I">And I am sick at heart.</l>
    </sp>
  </div2>
</div1>
[Example of typical SXP markup]
<textBody>
  <locator tln="4" act="1" scene="1" line="2" rln="wAGtmr84nw">
    <format appear="emphasis" align="center">
      <stageDirection> Enter Barnardo, and Francisco, two Centinels. </stageDirection>
    </format>
  </locator>
  <locator tln="5" act="1" scene="1" line="4" editionId="" rln="Fk04TfCPFn">
    <format appear="emphasis">
      <cue charName="Barnardo" cueCode="" cueId=""> Bar. </cue>
    </format>
    <speaker charName="Barnardo" toDirect="Francisco" toIndirect="" speakerId="">
      <language word="2" syllable="2"> VVHose there? </language>
    </speaker>
  </locator>
  <locator tln="6" act="1" scene="1" line="5" editionId="" rln="P5OF143A4h">
    <format appear="emphasis">
      <cue charName="Francisco" cueCode="" cueId=""> Fran. </cue>
    </format>
    <speaker charName="Francisco" toDirect="Marcellus" toIndirect="" speakerId="">
      <language word="8" syllable="10"> Nay answere me. Stand and vnfolde your selfe. </language>
    </speaker>
  </locator>
  <locator tln="7" act="1" scene="1" line="7" editionId="" rln="4B3AzBtAvz">
    <format appear="emphasis">
      <cue charName="" cueCode="" cueId=""> Bar. </cue>
    </format>
    <speaker charName="" toDirect="" toIndirect="" speakerId="">
      <language word="" syllable=""> Long liue the King, </language>
    </speaker>
  </locator>
  <locator tln="8" act="1" scene="1" line="8" editionId="" rln="9EG15x2tNS">
    <format appear="emphasis">
      <cue charName="Francisco" cueCode="" cueId=""> Fran. </cue>
    </format>
    <speaker charName="Francisco" toDirect="Barnardo" toIndirect="" speakerId="">
      <language word="1" syllable="3"> Barnardo. </language>
    </speaker>
  </locator>
  <locator tln="9" act="1" scene="1" line="9" editionId="" rln="C1D3vofATy">
    <format appear="emphasis">
      <cue charName="Barnardo" cueCode="" cueId=""> Bar. </cue>
    </format>
    <speaker charName="Barnardo" toDirect="Francisco" toIndirect="" speakerId="">
      <language word="1" syllable="1"> Hee. </language>
    </speaker>
  </locator>
  <locator tln="10" act="1" scene="1" line="10" editionId="" rln="y59zfaLJgG">
    <format appear="emphasis">
      <cue charName="" cueCode="" cueId=""> Fran. </cue>
    </format>
    <speaker charName="Francisco" toDirect="Barnardo" toIndirect="" speakerId="">
      <language word="7" syllable="9,11"> You come most carefully vpon your houre, </language>
    </speaker>
  </locator>
  <locator tln="11" act="1" scene="1" line="11" editionId="" rln="oxBY958lNg">
    <format appear="emphasis">
      <cue charName="Barnardo" cueCode="" cueId=""> Bar. </cue>
    </format>
    <speaker charName="Barnardo" toDirect="Francisco" toIndirect="" speakerId="">
      <language word="10" syllable="11"> Tis now strooke twelfe, get thee to bed Francisco, </language>
    </speaker>
  </locator>
  <locator tln="12" act="1" scene="1" line="12" editionId="" rln="5sD8ZwDITc">
    <format appear="emphasis">
      <cue charName="Francisco" cueCode="" cueId=""> Fran. </cue>
    </format>
    <speaker charName="Francisco" toDirect="Marcellus" toIndirect="" speakerId="">
      <language word="9" syllable="10"> For this reliefe much thanks, tis bitter cold, </language>
    </speaker>
  </locator>
  <locator tln="13" act="1" scene="1" line="13" editionId="" rln="WBauai8n7Y">
    <speaker charName="Francisco" toDirect="Marcellus" toIndirect="" speakerId="">
      <language word="6" syllable="6"> And I am sick at hart. </language>
    </speaker>
  </locator>
</textBody>
Page 10

Conventional, hard-coded markup aims for a universal template, applicable not only to all drama from all time periods and subgenres, but also to "other performance texts such as cinema or TV scripts ...". In so doing, however, it inadvertently reproduces the editorial limitations, and contributes to the editorial complexities, already deeply ingrained in Shakespeare's FGEs. The P5 guidelines, for example, could be modified to accommodate the variations in line numbering as well as the different systems of line numbering (TLN, divisions for the quartos, and act.scene.line for F1), but as it stands, the <l></l> tag is not particularly helpful (in this example) as a navigational marker unless RLNs are included. Adding an attribute to the <l> element would work, as in these fictitious examples: <l acl="3.2.1">Some text here</l> or <l tln="123">Some text here</l> or <l rln="6gT54Faw@q">Some text here</l>.
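
To make the point concrete, here is a minimal sketch (in Python, using only the standard library) of how a navigational attribute on <l> could be used to jump to a line. The acl, tln and rln attributes are the fictitious ones proposed above, not part of TEI P5; the sample values are illustrative, and find_line is a hypothetical helper, not SXP code.

```python
import xml.etree.ElementTree as ET

# Hypothetical TEI-like fragment carrying the fictitious navigational
# attributes suggested in the text (acl, tln, rln). Values are illustrative.
SAMPLE = """
<sp>
  <speaker>Fran</speaker>
  <l acl="1.1.2" tln="6" rln="P5OF143A4h">Nay, answer me. Stand and unfold yourself.</l>
  <l acl="1.1.6" tln="10" rln="y59zfaLJgG">You come most carefully upon your hour.</l>
</sp>
"""

def find_line(root, attr, value):
    """Return the text of the first <l> whose attribute `attr` equals `value`."""
    for line in root.iter("l"):
        if line.get(attr) == value:
            return line.text
    return None  # no line carries that coordinate

root = ET.fromstring(SAMPLE)
print(find_line(root, "rln", "y59zfaLJgG"))
print(find_line(root, "tln", "6"))
```

The same lookup works for any of the three coordinate systems, which is the reason for carrying all of them on the line.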

The TEI guidelines state that the "who" attribute of the <speaker> element must "unambiguously identify the character to whom the speech is assigned" (210), but in a play like Lear, where a composite rendering of the ending would require precisely this kind of ambiguity, the logic of the guidelines becomes difficult to sustain. Besides, in cases where there are more than two people on stage at once, there is no way to tell who else might be an intended recipient of the speaker's words. Dramatic context is not just a matter of the speaker and the person to whom words may ostensibly be spoken directly; it is also a matter of who else is on stage at that moment and might overhear. So while P5 only tracks speaker and listener, SXP tracks the speaker, the person spoken to (toDirect), and anyone else on stage who could potentially overhear those words (toIndirect).



Figure 4: Search by dramatic context
So, SXP code like this includes direct and indirect listeners.
<locator tln="11" act="1" scene="1" line="11" editionId="" rln="oxBY958lNg"> <format appear="emphasis"> <cue charName="Barnardo" cueCode="" cueId=""> Bar. </cue> </format> <speaker charName="Barnardo" toDirect="Francisco" toIndirect="" speakerId=""> <language word="10" syllable="11"> Tis now strooke twelfe, get thee to bed Francisco, </language> </speaker> </locator>
but each character is also on a node that connects them to other kinds of dramatic metadata in the head of the file:
<persona charName="Polonius" charAbbr="Pol" charAnnotationPlay="" charAnnotationEditor="father of Ophelia, father of Laertes, courtier of Denmark, spy of Denmark" gender="male" auralStyleSheetURI=".css" totalLinesSpoken="" totalWordsSpoken="" totalLinesOfProse="" totalLinesOfPoetry="" totalLinesHeard="" totalWordsHeard="" actorName="" performanceDate="" performanceLocation="" characterId="" performanceId="" />

Thus it is possible to search language only, or to search language (a word or phrase) by speaker-listener, but it is also possible to run a search that asks for (say) every time Polonius uses a particular word when a) Ophelia is being spoken to directly (toDirect), or b) Ophelia is on stage but not addressed directly (toIndirect).
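
A search of this kind might be sketched as follows. This is only an illustration in Python over two lines of SXP-style markup quoted in this essay; the function names (search, listeners) are hypothetical and the real SXP search runs against a database, not against a parsed string. Any filter left empty is simply ignored.

```python
import re
import xml.etree.ElementTree as ET

# Two SXP-style lines taken from the examples in this essay.
SAMPLE = """<textBody>
<locator tln="11" act="1" scene="1" line="11" rln="oxBY958lNg">
  <speaker charName="Barnardo" toDirect="Francisco" toIndirect="" speakerId="">
    <language word="10" syllable="11"> Tis now strooke twelfe, get thee to bed Francisco, </language>
  </speaker>
</locator>
<locator tln="20" act="1" scene="1" line="22" rln="mS7BW6rlNX">
  <speaker charName="Francisco" toDirect="Marcellus" toIndirect="Barnardo,Horatio" speakerId="">
    <language word="4" syllable="4"> Giue you good night. </language>
  </speaker>
</locator>
</textBody>"""

def listeners(value):
    """Split a comma-separated listener attribute into a set of names."""
    return {name.strip().rstrip("?") for name in value.split(",") if name.strip()}

def search(root, word, speaker=None, direct=None, indirect=None):
    """Return the RLNs of lines containing `word`, filtered by dramatic context."""
    hits = []
    for locator in root.iter("locator"):
        sp = locator.find("speaker")
        if sp is None:
            continue  # e.g. a stage direction with no speaker
        words = re.findall(r"[a-z]+", (sp.findtext("language") or "").lower())
        if word.lower() not in words:
            continue
        if speaker and sp.get("charName") != speaker:
            continue
        if direct and direct not in listeners(sp.get("toDirect", "")):
            continue
        if indirect and indirect not in listeners(sp.get("toIndirect", "")):
            continue
        hits.append(locator.get("rln"))
    return hits

root = ET.fromstring(SAMPLE)
# "night" spoken by Francisco while Horatio is on stage but not addressed
print(search(root, "night", speaker="Francisco", indirect="Horatio"))
```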

There's more. The TEI model (or any hard-coded model) embeds all the XML coding physically and literally, right into the file; the tags are wrapped around the actual dramatic text. This necessarily means that the more markup is added (semantic, grammatical, dramatic, critical, annotative, biographical, linguistic, historical, rhetorical, etc.), the more cluttered the text lines become with code. The file swells in size and takes longer to parse. Hard coding layers and layers of markup tags is no longer necessary, thanks to technologies like XPointer and XPath, which allow coding to be applied dynamically and virtually from outside the file. The SXP uses CodeIgniter to help achieve this on-the-fly capability. One of CodeIgniter's many advantages is its ability to leverage these XML standards to assign markup code to a line dynamically. The TEI's P5 Guidelines recognize XPointer and XPath (10) and their nodes, ranges, and points, but exploiting this capability further is worth pursuing: many new XML tags can be added dynamically after the initial markup, and as long as we know what the RLN is, the same markup can be applied immediately to many different textual variants and editions.
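
The idea of applying markup from outside the file can be sketched roughly as follows. This Python fragment is an assumption-laden illustration, not the SXP's CodeIgniter implementation: the "modern" witness, the speechAct annotation and the apply_standoff function are all invented for the example. What it shows is one external annotation, keyed by RLN, being attached to two different editions without either file being touched.

```python
import xml.etree.ElementTree as ET

# Two witnesses of the same line, sharing one RLN. The "modern" text is a
# made-up modernised spelling, included only to show a second edition.
EDITIONS = {
    "F1": '<textBody><locator tln="20" rln="mS7BW6rlNX"><speaker charName="Francisco">'
          '<language>Giue you good night.</language></speaker></locator></textBody>',
    "modern": '<textBody><locator rln="mS7BW6rlNX"><speaker charName="Francisco">'
              '<language>Give you good night.</language></speaker></locator></textBody>',
}

# Hypothetical stand-off annotation stored outside the edition files.
STANDOFF = [{"rln": "mS7BW6rlNX", "tag": "speechAct", "value": "farewell"}]

def apply_standoff(xml_text, standoff):
    """Build an annotated view of an edition without modifying its markup."""
    root = ET.fromstring(xml_text)
    view = []
    for note in standoff:
        rln = note["rln"]
        # XPath-style predicate: every <locator> carrying this RLN
        for locator in root.findall(f".//locator[@rln='{rln}']"):
            text = (locator.findtext(".//language") or "").strip()
            view.append({"rln": rln, note["tag"]: note["value"], "text": text})
    return view

for name, xml_text in EDITIONS.items():
    print(name, apply_standoff(xml_text, STANDOFF))
```

Because the annotation lives outside the files, adding a new kind of tag means adding a record to the stand-off list, not re-coding each edition.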

There are two additional character qualifiers here, and more details about them are stored in the tags at the top of the file (outside the text body). Male and female designations are clear enough, but berdache defines genderqueer characters, in disguise or not; its attribute values are either MF (masculine female) or FM (feminine male). Supernatural characters such as the ghost of Hamlet senior, Banquo, Ariel, Puck, and so on, can also be assigned gender attributes.

And so it is that the Shakespearean text in SXP contains very few markup tags; it is very selective about which tags are physically embedded in the Shakespearean text. Navigational coordinate tags track the relation between act.scene.line, TLN (or divisional markers for the quartos), and RLNs. Dramatic-context tags (<cue> and <speaker>) describe the speaker, the direct listener(s), and anyone onstage at the time who hears or overhears what is said but is not directly addressed. Language tags separate spoken words from dramatic context (speaker-listeners) and contain fields for the number of syllables per line (prose is coded as zero). A few miscellaneous tags round it off: formatting tags to deal with alignment and text properties (bold, italic), stage direction tags, and heading tags. Once coded, an edition never has to be coded again, no matter how many new tags are added or what database or statistical processes engage it, because all the other tags are applied dynamically from outside the file when and as they are needed. Any number of new XML markup tags can be added without cluttering the text or having to repeat the embedded tagging for each variant or edition.

SXP code wraps (nests) language tags inside dramatic context tags, inside navigational coordinates tags. Format tags can be nested anywhere and basically describe italic and/or bold.

[One marked-up line of SXP code showing navigational coordinates, formatting tags, dramatic context and language XML descriptors]
<locator tln="20" act="1" scene="1" line="22" rln="mS7BW6rlNX"> <format appear="emphasis"> <cue charName="Francisco" cueCode="" cueId=""> Fran. </cue> </format> <speaker charName="Francisco" toDirect="Marcellus" toIndirect="Barnardo,Horatio" speakerId=""> <language word="4" syllable="4"> Giue you good night. </language> </speaker> </locator>

The <locator> element carries the navigational coordinates; <format> is self-evident, and <cue> marks the beginning of the dramatic context markup. The <speaker> element has four attributes: charName, toDirect, toIndirect, and speakerId. The charName attribute maps to a <cue> element as well as to a name or cluster of name variants (Ham. Ha. Hamlet) in the <dramatisPersonae> at the top of the file.

The code also contains a word count for that line and a total syllabic count, regardless of meter. These values are added by humans. Our word count separates contractions ('Tis = it is = 2 words) and simply counts the number of syllables per line. In cases where the syllable count is ambiguous, the uncertainty is entered as a comma-separated range value, as in the syllable="9,11" line in the typical SXP markup above.

Page 11

SXP coding allows for a number of other kinds of refined searches. For example, a search can cover a single play, any number of plays, or all plays in the database, and can be run by language only, with or without regard for dramatic context. Because we scan meter syllabically only, and do not use the classical-accentual method of counting feet (iambs, dactyls, etc.), we can easily identify tumbling, truncated or otherwise unusual lines. The loose search uses Soundex (see below).



Figure 5: Searching the ShakespeareXML database using language criteria only

The SXP language element separates language (words spoken) from the speaker (full dramatic context), in case the search request is not interested in who is speaking or being spoken to; it also contains a word count for that line and a total syllabic count, regardless of meter. These values are calculated by humans. The SXP word count separates contractions ('Tis = it is = 2 words) and counts the number of syllables per line. In cases where the syllable count is ambiguous, the uncertainty is entered as a comma-separated range value, as in this example:

[SXP markup of syllables showing an ambiguous syllable count in the line]
<language word="5" syllable="11,12"> Welcome Horatio, welcome good Marcellus, </language>

Here, the syllabic ambiguity is in "Horatio," which could scan as three syllables (Ho-RAY-sho) or four (Ho-RAY-she-o). The metric ambiguity in that line is expressed as "11,12." In cases where the range of possibility is greater, the lower and upper limits are listed inclusively, as in "9,12", meaning the line could have 9, 10, 11, or 12 syllables. Rather than relying on accentual-syllabic metrical analysis, the SXP counts only the number of syllables per line and notes aberrations from the metric base (decasyllabic). Thus in the search parameters, readers can filter results based on "tumbling" lines (11 syllables), "truncated" lines (9 syllables), and "short" lines of 8 or fewer syllables (see Figure 5 above).
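
The syllable attribute lends itself to very simple processing. The following Python sketch reads a value such as "11,12" as an inclusive range and labels the line against a decasyllabic base. The function names and the "regular", "long" and "uncoded" labels are assumptions added for the illustration; "tumbling", "truncated" and "short" follow the definitions given above, and zero is treated as prose, as described earlier.

```python
def syllable_range(value):
    """'11,12' -> (11, 12); '10' -> (10, 10); '' -> None (not yet coded)."""
    parts = [int(p) for p in value.split(",") if p.strip()]
    if not parts:
        return None
    return (min(parts), max(parts))

def classify(value, base=10):
    """Return every label the line could carry, given its syllable range."""
    rng = syllable_range(value)
    if rng is None:
        return {"uncoded"}
    low, high = rng
    labels = set()
    for n in range(low, high + 1):  # the limits are inclusive
        if n == 0:
            labels.add("prose")
        elif n <= base - 2:
            labels.add("short")
        elif n == base - 1:
            labels.add("truncated")
        elif n == base:
            labels.add("regular")
        elif n == base + 1:
            labels.add("tumbling")
        else:
            labels.add("long")  # hypothetical label for 12 or more syllables
    return labels

print(syllable_range("9,12"))
print(sorted(classify("11,12")))
print(sorted(classify("4")))
```

An ambiguous line is returned under every label it might satisfy, so that a filter for tumbling lines does not silently drop a line that may or may not tumble.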

The SXP uses syllabic meter (simply the total number of syllables per line) whereas the TEI counts syllables using classic metric scansion (feet). In the TEI example below, line numbers and segments are delineated and then aberrations from the metric base are identified and described. In this case, metric exceptions in segment 1 (a spondee) and in segment 4 (a trochee) are identified and described as they occur.

[TEI markup of syllables showing meter expressed in feet]
<l n="356"> A need<seg type="foot" n="2" real="--">less a</seg>lexandrine ends the song,</l> <l n="357" met="-+|-+|-+|-+|-+|-+"> <seg n="1" real="++"> That, like </seg> a wounded snake, <seg n="4" real="+-"> drags its </seg> <seg n="5" real="++"> slow length </seg> along. </l>

Page 12

The SXP's outer (wrapper) or <locator> tag is a node used by XPointer and XPath when dynamic markup is applied. The formatting tags (italics, etc.) can be nested inside any of the tags in the text. The <cue> tag registers the speaker and is equivalent to the TEI's <castItem> and <role> tags. The SXP's cueCode and cueId attributes are reserved for future use and will likely be used to port out to voice synthesizers. The <speaker> tags may appear redundant since the <cue> tags already identify the speaker, but the grammar is so structured in order to accommodate situations where the cue in the text misidentifies the actual speaker. The SXP's speaker tag has four attributes: charName, toDirect, toIndirect, and speakerId. In this example, Francisco is the cued speaker so he is charName; he speaks directly to Marcellus (toDirect), but Barnardo and Horatio are also present and can hear the lines, so they are indirect auditors (toIndirect). This information describes dramatic context and allows searches that ignore or include dramatic context. Let's look again at this example:

[One marked-up line of SXP code showing navigational coordinates, dramatic context and language parameters]
<locator tln="20" act="1" scene="1" line="22" rln="mS7BW6rlNX"> <format appear="emphasis"> <cue charName="Francisco" cueCode="" cueId=""> Fran. </cue> </format> <speaker charName="Francisco" toDirect="Marcellus" toIndirect="Barnardo,Horatio" speakerId=""> <language word="4" syllable="4"> Giue you good night. </language> </speaker> </locator>

Where there is uncertainty about whether or not an onstage character might have overheard something, a "?" after their name (Horatio?) marks the ambiguity. Markup should never obliterate or ignore ambiguity, only preserve it.

Thus, SXP markup allows searches by language and/or by dramatic context and/or by locator across one or more plays, editions, textual variants, or scripts. As soon as the reader/user fills out a section of the search form, those parameters are automatically included in the search. If the user/reader does not fill out anything in a given section of the search form, those parameters are excluded.



Figure 6: Searching by play


Stage directions are also assigned line identifiers (a navigational locus), and are coded like this:

<locator tln="1" act="1" scene="1" line="2" locatorId="wAGtmr84nw"> <format appear="emphasis" align="center"> <stageDirection> Enter Barnardo and Francisco two Centinels. </stageDirection> </format> <format appear="emphasis" align="center"> <cue charName="Barnardo" cueCode="" cueId=""> Barnardo. </cue> </format> </locator>

This stage direction is also linked, by virtue of its RLN, to all other versions of this particular stage direction. The <speaker> tag, like the <cue> tag, contains the charName attribute. This is not redundant because sometimes the cue and the speaker names are identical and sometimes not. If a <cue> does not match the actual speaker, the system allows the discrepancy. Additionally, by embedding the charName attribute in the <speaker> and <cue> tags, the search process can be executed from outside the <cue>-<speaker> tags inwards, or from the inside outward, thus enabling two different kinds of search and reducing parsing time. The attributes toDirect and toIndirect describe, respectively, the person(s) to whom the lines are directly addressed, and anyone else on stage at that moment who deliberately or accidentally overhears what is being spoken, whether intentionally and/or ironically (such as Polonius lurking behind the arras) or not. If more than one character is spoken to, or overhears, they are added as values of that attribute, in alphabetical order and comma separated.

Page 13

Problem 3: Search Functions

The anatomy of a full SXP XML file for any given play has four general sections: <about>, <dramatisPersonae>, <navigation>, and <textBody>. Discussion here focuses on the fourth part, but there are large metadata segments at the top of the file. (These are described fully in the documents that describe the SXP's XML grammar.) The tags for metadata accommodate information about the specifics of the edition, copy name and item number, translator, the precise physical properties of the book, provenance, compositors, signature numbering and much more.

The outer (wrapper) or <locator> tag is a navigational coordinate and indicates the TLN, the act-scene-line equivalent and the RLN. This node is used by XPointer and XPath when dynamic markup is applied. The formatting tags (italics, etc.) can be nested inside any of the tags in the text. The <cue> tag registers the delineated speaker and is equivalent to the TEI's <castItem> and <role> tags. The cueCode and cueId attributes are reserved for future use and will likely be used to port out to voice synthesizers (11). The <speaker> tags may appear redundant since the <cue> tags already identify the speaker, but the grammar is so structured in order to accommodate situations where the cue in the text misidentifies the actual speaker. The speaker tag has four attributes: charName, toDirect, toIndirect, and speakerId. In this example, Francisco is the cued speaker so he is charName; he speaks directly to Marcellus (toDirect), but Barnardo and Horatio are also present and can hear the lines, so they are indirect auditors (toIndirect). This information describes dramatic context and allows searches that ignore or include dramatic context. Where there is uncertainty about whether or not an onstage character might have overheard something, a "?" after their name (Horatio?) marks the ambiguity. Markup should never obliterate or ignore ambiguity, only preserve it.

<locator tln="20" act="1" scene="1" line="22" locatorId="mS7BW6rlNX"> <format appear="emphasis"> <cue charName="Francisco" cueCode="" cueId=""> Fran. </cue> </format> <speaker charName="Francisco" toDirect="Marcellus" toIndirect="Barnardo,Horatio?" speakerId=""> <language word="4" syllable="4"> Giue you good night. </language> </speaker> </locator>
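
As a rough illustration of how a listener attribute of this kind could be read and written, the following Python sketch separates certain from uncertain auditors and writes them back in alphabetical, comma-separated order. Both function names are hypothetical; this is not the SXP's own code.

```python
def parse_listeners(value):
    """Split 'Barnardo,Horatio?' into (certain, uncertain) lists of names."""
    certain, uncertain = [], []
    for name in value.split(","):
        name = name.strip()
        if not name:
            continue  # an empty attribute means nobody is recorded
        if name.endswith("?"):
            uncertain.append(name[:-1])  # keep the doubt, drop the marker
        else:
            certain.append(name)
    return certain, uncertain

def format_listeners(certain, uncertain=()):
    """Write names back in alphabetical order, marking doubtful ones with '?'."""
    entries = [(name, "") for name in certain] + [(name, "?") for name in uncertain]
    return ",".join(name + mark for name, mark in sorted(entries))

print(parse_listeners("Barnardo,Horatio?"))
print(format_listeners(["Horatio"], uncertain=["Barnardo"]))
```

Keeping the uncertain names in their own list means a search can either include or exclude doubtful auditors, rather than having the ambiguity resolved for it.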

Stage directions are also assigned RLNs and can be searched alone or in dramatic context.

<locator tln="1" act="1" scene="1" line="2" locatorId="wAGtmr84nw"> <format appear="emphasis" align="center"> <stageDirection> Enter Barnardo and Francisco two Centinels. </stageDirection> </format> <format appear="emphasis" align="center"> <cue charName="Barnardo" cueCode="" cueId=""> Barnardo. </cue> </format> </locator>

This stage direction is also linked, by virtue of its RLN, to all other versions of this stage direction in this place. The <speaker> tag, like the <cue> tag, contains the charName attribute. This is not redundant because sometimes the cue and the speaker names are identical and sometimes not. If a <cue> does not match the actual speaker, the system allows the discrepancy. Additionally, by embedding the charName attribute in the <speaker> and <cue> tags, the search process can be executed from outside the <cue>-<speaker> tags inwards, or from the inside outward, thus enabling two different kinds of search and reducing parsing time. The attributes toDirect and toIndirect describe, respectively, the person(s) to whom the lines are directly addressed, and anyone else on stage at that moment who deliberately or accidentally overhears what is being spoken, whether intentionally and/or ironically (such as Polonius lurking behind the arras) or not. If more than one character is spoken to, or overhears, they are added as values of that attribute, in alphabetical order and comma separated.

Page 14

Since the language tags separate the words from the speaker, they make it possible to search in ways that disregard, or include, dramatic context. Searches can be loose or strict; a loose search ignores orthographic variation. The SXP uses Soundex for loose searches. Soundex is a phonetic algorithm widely used by genealogists to track family names where there are phonetic variants. Soundex is convenient because it is built in as a function of PHP and merely needs to be called; no additional programming is required. Although some spelling variations yield slightly different Soundex values (leave=L100 and leaue=L000), these can easily be accommodated through a separate lookup table that lists equivalencies. They are few in Shakespeare: u=v, i=j and occasionally w and vv.
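
For readers who want to see the mechanics, here is a minimal Python sketch of standard American Soundex together with one possible way of handling the equivalencies listed above. It is an illustration only: the SXP calls PHP's built-in soundex() and uses a lookup table, whereas the loose_key function below is a hypothetical shortcut that normalises vv to w, v to u, and j to i before coding, at the cost of some precision.

```python
# Standard Soundex digit classes
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit

def soundex(word):
    """Four-character Soundex code: first letter plus up to three digits."""
    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""
    out = letters[0]
    last = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = CODES.get(ch, "")  # vowels have no code and reset `last`
        if code and code != last:
            out += code
        last = code
    return (out + "000")[:4]

def loose_key(word):
    """Hypothetical normalisation for early modern spelling, then Soundex."""
    w = word.lower().replace("vv", "w").replace("v", "u").replace("j", "i")
    return soundex(w)

print(soundex("leave"), soundex("leaue"))
print(loose_key("leave") == loose_key("leaue"))
print(loose_key("VVHose") == loose_key("Whose"))
```

Folding v into u makes the key looser than plain Soundex (more words share a code), which is why a lookup table of known equivalents is the more careful solution.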


If only the language section of the search form is filled out, the other sections are disregarded by default. If, however, the reader selects a parameter from language and from plays, then the search engine looks for that word (or words) across one, several, or all plays in the database. Further filtering is possible by limiting the search to a range or number of lines or coordinates within one or more plays.


Similarly, the search can include or exclude any one or more of the following parameters from dramatic context. Thus, it is possible to search for word usage based on how (for example) male characters address female siblings.

Page 15

To make searches fast and convenient, the SXP has added autocomplete (12) to its other search functions. By typing in a word or phrase, the autocomplete function finds the line. In the following illustration, the entry of a single word ("who") instantly yields a list of all the lines in all the plays and all their variants where that word appears. If two words are entered, the system returns all the instances where those two words appear next to each other; this search parameter is analogous to the Boolean "adjacent."



Figure 7: The autocomplete search function

In SXP, autocomplete may be used with or without accidentals (punctuation, capitalization, etc.); this feature was added in order to accommodate users whose edition rendered phrases with or without small eccentricities. Autocomplete can also invoke the system's loose search that uses Soundex. Once the line has been located, a double click on that line yields its RLN and, consequently, the other line-equivalents that share that RLN in every other textual variant (assuming it is in the SXP database or in other databases at other sites into which and out of which we port).
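
The adjacency matching described above might be sketched as follows in Python. The four lines and their RLNs are taken from the markup examples earlier in this essay; the tokens and autocomplete functions are hypothetical stand-ins and say nothing about how the SXP's own autocomplete is implemented.

```python
import re

# RLN -> text, taken from the SXP examples quoted earlier in this essay
LINES = {
    "Fk04TfCPFn": "VVHose there?",
    "P5OF143A4h": "Nay answere me. Stand and vnfolde your selfe.",
    "y59zfaLJgG": "You come most carefully vpon your houre,",
    "mS7BW6rlNX": "Giue you good night.",
}

def tokens(text, keep_accidentals=False):
    """Split into words; by default drop punctuation and capitalization."""
    if keep_accidentals:
        return text.split()
    return re.findall(r"[a-z]+", text.lower())

def autocomplete(query, lines, keep_accidentals=False):
    """Return (rln, text) pairs where the query words occur next to each other."""
    q = tokens(query, keep_accidentals)
    if not q:
        return []
    hits = []
    for rln, text in lines.items():
        t = tokens(text, keep_accidentals)
        # slide a window the size of the query across the line
        if any(t[i:i + len(q)] == q for i in range(len(t) - len(q) + 1)):
            hits.append((rln, text))
    return hits

print(autocomplete("your", LINES))
print(autocomplete("good night", LINES))
print(autocomplete("night good", LINES))
```

Each hit carries its RLN, which is the key needed to fetch the equivalent line in every other variant.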

Page 16

Afterword

The SXP has portals out and entry points in that allow for links and data-streams to facsimiles, to other analytical tools, and to other editions (predicated of course on copyright clearance and other relevant legal agreements). Feeds to and from other analytic tools are much more difficult to design, even with the power of XML and schema, because of differences in technologies and context. The TEI is a model for standardized markup of dramatic texts, but that kind of standardization is not as necessary as some might think. It does not matter if editions do not use all (or any) TEI tags; it is more important to standardize how resources talk to each other through the differences; that is the true spirit of Web 2.0. So we return to the earlier point about the radical implications of Web 2.0: the seamless integration of disparate and varied sources is both the desired goal and the biggest challenge. The importance of standardization should not be confused with the tyranny of conformity. The next section explains how the SXP is coded and shows why the TEI guidelines, in spite of their great value elsewhere, have become more of a hindrance than a help.

The TEI's resource, Electronic Textual Editing: Editors' Introduction, observes that "[t]he rapid spread of computing facilities and developments in digital technology in . offered the possibility of circumventing a number of practical (both physical and economic) limitations posed by the modern printed codex" (Burnard). Clearly, our dependency on physical text is, and will remain, pleasantly unavoidable, even though digitization promises to alleviate some limitations of physical text and offer new ways to visualize complexity. The largest part of the SXP, still undeveloped, promises to be the most powerful aspect of the resource. HTML files do not have to "exist" anywhere anymore in the sense of being actual files saved somewhere on a server; now they are generated dynamically by the database, given form by PHP or similar scripting languages, and rendered by a browser. CodeIgniter even dynamically generates those databases "on the fly," just-in-time as needed, and only as long as needed; that database in turn then generates the HTML. Used in conjunction with XPointer, CodeIgniter can also help gather clusters of data from anywhere, assemble them into a new database, mark them up with the appropriate XML, and then transform them into PHP and HTML files, but only for the moment they are needed. We return once again, then, to McLuhan's notion of the transience in orality; this Web 2.0 way of thinking about data means that "data" exists now not as a static record in some database on some server somewhere, but like an oral moment of intensity, fleeting and ethereal, and consigned to obscurity unless someone takes a moment to save or print it. That fleeting and transient moment of data alignment is an edition brought into existence for hours, minutes, or even seconds, and then discarded. This eternal cycle of transience is what editioning Shakespeare in a Web 2.0 world will mean.







Special Thanks

Funding for this project was generously provided by the Faculty of Arts, York University, and by The Social Sciences and Humanities Research Council of Canada (Small Grants Program).


Thanks to Melissa Dalgleish, Regi Khokher, Sonia Strimban, Nicole Renee Pereira, Mark Wadman, Tianshi Li, Xiaoli (Shirley) Hu, Boze Zekan and Dessy Pavlova.

Thanks also to Derek Allard at Dark Horse Consulting and CodeIgniter.