<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-8725958</id><updated>2012-01-17T22:05:13.453-05:00</updated><category term='TEI'/><category term='javascript prototype'/><category term='data curation'/><category term='digital curation'/><category term='zotero'/><category term='DH'/><category term='IDP'/><category term='balisageConference08'/><category term='Theory'/><category term='Ruby XSLT'/><title type='text'>Scriptio Continua</title><subtitle type='html'>Thoughts on software development, Digital Humanities, the ancient world, and whatever else crosses my radar.  All original content herein is licensed under a &lt;a href="http://creativecommons.org/licenses/by/3.0/us/"&gt;Creative Commons Attribution&lt;/a&gt; license.</subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://philomousos.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8725958/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://philomousos.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>Hugh Cayless</name><uri>http://www.blogger.com/profile/05582281819129069158</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://4.bp.blogspot.com/-AzoP5gBkhsA/TlP6uXWN6pI/AAAAAAAAACo/UtHNVFJJgGk/s220/Photo%2B1.jpg'/></author><generator version='7.00' 
uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>41</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-8725958.post-436560448943249136</id><published>2011-11-14T19:23:00.001-05:00</published><updated>2011-11-14T22:05:14.578-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='TEI'/><category scheme='http://www.blogger.com/atom/ns#' term='Theory'/><category scheme='http://www.blogger.com/atom/ns#' term='DH'/><title type='text'>TEI in other formats; part the second: Theory</title><content type='html'>In my &lt;a href="http://philomousos.blogspot.com/2011/11/tei-in-other-formats-part-first-html.html" target="_blank"&gt;first&lt;/a&gt; post on this subject, I poked a bit at how one might represent TEI in HTML without discarding the text model from the TEI document. Now I want to talk a bit more about that model, and the theory behind it. I may at the end say irritable things about Theory as well, if you can stand it until the end. I humbly beg the reader's pardon.&lt;br /&gt;&lt;br /&gt;Let's look at the same document I talked about last time:&amp;nbsp;&lt;a href="http://papyri.info/ddbdp/p.ryl;2;74/source" style="background-color: whitesmoke; border-bottom-color: red; border-bottom-style: dashed; border-bottom-width: 1px; font-family: Georgia, 'Times New Roman', serif; line-height: 22px; text-decoration: none;"&gt;http://papyri.info/ddbdp/p.ryl;2;74/source&lt;/a&gt;&lt;span class="Apple-style-span" style="background-color: whitesmoke; font-family: Georgia, 'Times New Roman', serif; line-height: 22px;"&gt;&amp;nbsp;(see also&amp;nbsp;&lt;/span&gt;&lt;a href="http://papyri.info/ddbdp/p.ryl;2;74/" style="background-color: whitesmoke; font-family: Georgia, 'Times New Roman', serif; line-height: 22px; text-decoration: none;"&gt;http://papyri.info/ddbdp/p.ryl;2;74/&lt;/a&gt;). 
We can visualize the document structure using &lt;a href="http://www.graphviz.org/" target="_blank"&gt;Graphviz&lt;/a&gt;&amp;nbsp;and a spot of XSLT (click for high-res):&lt;br /&gt;&lt;br /&gt;&lt;table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td style="text-align: center;"&gt;&lt;a href="http://philomousos.com/images/p.ryl.2.74.dot.png" style="margin-left: auto; margin-right: auto;" target="_blank"&gt;&lt;img border="0" height="51" src="http://2.bp.blogspot.com/-WDlALdxYAa4/TsG0cE5fhXI/AAAAAAAAADY/Yfd2RdFw2o0/s640/p.ryl.2.74.dot.png" width="640" /&gt;&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="tr-caption" style="text-align: center;"&gt;tree structure of a TEI document&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;br /&gt;&lt;div&gt;It's a fairly flat tree. As an XML document, it has to be a tree, of course, and TEI leverages this built-in "tree-ness" to express concepts like "this text is part of a paragraph" (i.e. it has a tei:p element as its ancestor). In line 1, for example, we find&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;blockquote class="tr_bq"&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;span class="webkit-html-tag" style="white-space: pre-wrap;"&gt;&amp;lt;supplied reason="lost"&amp;gt;Μάρ&amp;lt;/supplied&amp;gt;κος&lt;/span&gt;&lt;/span&gt;&lt;/blockquote&gt;meaning the first three letters of the name Markos have been lost due to damage suffered by the papyrus the text was written on, and the editor of the text has supplied them. The fact that the letters "Μάρ" are contained by the supplied element, or, more properly that the text node containing those letters is a child of the supplied element, means that those letters have been supplied. In other words, the parent-child relationship is given additional semantics by TEI. 
We already have some problems here: the child of supplied is itself part of a word, "Markos", and that word is broken up by the supplied element. Only the fact that no white space intervenes between the end of the supplied element and the following text lets us know that this is a word. It's even worse if you look at the tree version, which is, incidentally, how the document will be interpreted by a computer after it has been parsed:&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/-seS0EwPiCdo/TsG8UYOyLaI/AAAAAAAAADg/ntRasTl0Tzo/s1600/p.ryl.2.74.dot.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://3.bp.blogspot.com/-seS0EwPiCdo/TsG8UYOyLaI/AAAAAAAAADg/ntRasTl0Tzo/s1600/p.ryl.2.74.dot.png" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: left;"&gt;There's no obvious connection here between the first and second halves of the name. And in fact, if we hadn't taken steps to prevent it, any program that processed the document might reformat it so that "Mar" and "kos" were no longer connected. We could solve this problem by adding more elements. As the joke goes, "XML is like violence. If it isn't working, you're not using it enough." 
We could explicitly mark all the words, using a "w" element, thus:&amp;nbsp;&lt;/div&gt;&lt;blockquote class="tr_bq"&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace; white-space: pre-wrap;"&gt;&amp;lt;w&amp;gt;&amp;lt;supplied reason="lost"&amp;gt;Μάρ&amp;lt;/supplied&amp;gt;κος&amp;lt;/w&amp;gt;&lt;/span&gt;&lt;/blockquote&gt;or&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/-eZJFCQBZM2Y/TsHBTBYcBaI/AAAAAAAAADo/_qlDokCEamw/s1600/p.ryl.2.74.dot.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://4.bp.blogspot.com/-eZJFCQBZM2Y/TsHBTBYcBaI/AAAAAAAAADo/_qlDokCEamw/s1600/p.ryl.2.74.dot.png" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: left;"&gt;which would solve any potential problems with words getting split up, because we could always fix the problem—we would know what all the words are. We could even attach useful metadata, like the lemma (the dictionary headword) of the word in question. We don't do this for a couple of reasons. First, because we don't need to. We can work around the splitting-up of words by markup. Second, because it complicates the document and makes it harder for human editors to deal with, and third, because it introduces new chances for overlap. Overlap is the Enemy, as far as XML is concerned. The more containers you have, the greater the chances one container will need to start outside another, but finish inside (or vice versa). Consider that there's no reason at all a region of supplied text shouldn't start in the middle of one word and end in the middle of another. 
Look at lines 5-6 for example:&lt;/div&gt;&lt;blockquote class="tr_bq"&gt;&lt;span class="Apple-style-span" style="background-color: #f8f6f4; color: #3c2217; font-family: 'Lucida Grande', Cardo, 'Arial Unicode MS', 'Galilee Unicode Gk', 'New Athena Unicode', 'Athena Unicode', 'Palatino Linotype', 'Titus Cyberbit Basic', 'Vusillus Old Face', Alphabetum, 'Galatia SIL', 'Code 2000', GentiumAlt, Gentium, 'Minion Pro', GeorgiaGreek, 'Vusillus Old Face Italic', 'Everson Mono', Aristarcoj, Porson, Legendum, 'Aisa Unicode', 'Hindsight Unicode', Caslon, Verdana, Tahoma; font-size: 16px; line-height: 22px;"&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;... οὐ̣[χ ἱκανὸν εἶ-]&lt;/span&gt;&lt;br id="al6" style="background-color: #f8f6f4; color: #3c2217; font-family: 'Lucida Grande', Cardo, 'Arial Unicode MS', 'Galilee Unicode Gk', 'New Athena Unicode', 'Athena Unicode', 'Palatino Linotype', 'Titus Cyberbit Basic', 'Vusillus Old Face', Alphabetum, 'Galatia SIL', 'Code 2000', GentiumAlt, Gentium, 'Minion Pro', GeorgiaGreek, 'Vusillus Old Face Italic', 'Everson Mono', Aristarcoj, Porson, Legendum, 'Aisa Unicode', 'Hindsight Unicode', Caslon, Verdana, Tahoma; font-size: 16px; line-height: 22px; text-align: left;" /&gt;&lt;span class="Apple-style-span" style="background-color: #f8f6f4; color: #3c2217; font-family: 'Lucida Grande', Cardo, 'Arial Unicode MS', 'Galilee Unicode Gk', 'New Athena Unicode', 'Athena Unicode', 'Palatino Linotype', 'Titus Cyberbit Basic', 'Vusillus Old Face', Alphabetum, 'Galatia SIL', 'Code 2000', GentiumAlt, Gentium, 'Minion Pro', GeorgiaGreek, 'Vusillus Old Face Italic', 'Everson Mono', Aristarcoj, Porson, Legendum, 'Aisa Unicode', 'Hindsight Unicode', Caslon, Verdana, Tahoma; font-size: 16px; line-height: 22px; text-align: left;"&gt;[ναι εἰ]ς&lt;/span&gt;&lt;/blockquote&gt;A supplied section begins in the middle of the third word from the end of 
line five, and continues for the rest of the line. The last word is itself broken and continues on the following line, the beginning of which is also supplied, that section ending in the middle of the second word on line six. This is a mess that would only be compounded if we wanted to mark off words.&lt;br /&gt;&lt;br /&gt;This may all seem like a numbing level of detail, but it is on these details that theories of text are tested. The text model here cares about editorial observations on and interventions in the text, and those are what it attempts to capture. It cares much less about the structure of the text itself; note that the text is contained in a tei:ab, an element designed for&amp;nbsp;delineating a block of text without saying anything about its nature as a block (unlike tei:p, for example). Visible features like columns, or text continued on multiple papyri, or multiple texts on the same papyrus would be marked by tei:divs. This is in keeping with papyrology's focus on the materiality of the text. What the editor sees, and what they make of it, is more important than the construction of a coherent narrative from the text, something that is often impossible in any case. Making that set of tasks as easy as possible is therefore the focus of the text model we use.&lt;br /&gt;&lt;br /&gt;What I'm trying to get at is that there is Theory at work here (a host of theories in fact), having to do with a way to model texts, and that this set of theories is mapped onto data structures (TEI, XML, the tree) using a set of conventions, and taking advantage of some of the strengths of the data structures available. Those data structures have weaknesses too, and where we hit those, we have to make choices about how to serve our theories best with the tools we have. There is no end of work to be done at this level, of joining theory to practice, and a great deal of that work involves hacking, experimenting with code and data. 
It is from this realization, I think, that the "more hack, less yack" ethic of THATCamp emerged. And it is at this level, this intersection, this interface, that scholar-programmer types (like me) spend a lot of our time. And we do get a bit impatient with people who don't, or can't, or won't engage at the same level, especially if they then produce critiques of what we're doing.&lt;br /&gt;&lt;br /&gt;As it happens, I do think that DH tends to be under-theorized, but by that I don't mean it needs more Foucault. Because it is largely project-driven, and the people who are able to reason about the lower-level modeling and interface questions are mostly paid&amp;nbsp;only&amp;nbsp;to get code into production, important decisions and theories are left implicit in code and in the shapes of the data, and aren't brought out into the light of day and yacked about as they should be.&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8725958-436560448943249136?l=philomousos.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://philomousos.blogspot.com/feeds/436560448943249136/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8725958&amp;postID=436560448943249136' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8725958/posts/default/436560448943249136'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8725958/posts/default/436560448943249136'/><link rel='alternate' type='text/html' href='http://philomousos.blogspot.com/2011/11/tei-in-other-formats-part-second-theory.html' title='TEI in other formats; part the second: Theory'/><author><name>Hugh Cayless</name><uri>http://www.blogger.com/profile/05582281819129069158</uri><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://4.bp.blogspot.com/-AzoP5gBkhsA/TlP6uXWN6pI/AAAAAAAAACo/UtHNVFJJgGk/s220/Photo%2B1.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/-WDlALdxYAa4/TsG0cE5fhXI/AAAAAAAAADY/Yfd2RdFw2o0/s72-c/p.ryl.2.74.dot.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8725958.post-8046331446184506124</id><published>2011-11-10T11:17:00.000-05:00</published><updated>2011-11-10T12:06:59.536-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='TEI'/><title type='text'>TEI in other formats; part the first: HTML</title><content type='html'>&lt;span class="Apple-style-span" style="font-family: Georgia, 'Times New Roman', serif;"&gt;There has been a fair amount of discussion of late about TEI either having a standard HTML5 representation or even moving entirely to an HTML5 format. I want to do a little thinking "out loud" about how that might work.&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: Georgia, 'Times New Roman', serif;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: Georgia, 'Times New Roman', serif;"&gt;Let's start with a fairly standard EpiDoc document (EpiDoc being a set of guidelines for using TEI to mark up ancient documents).&amp;nbsp;&lt;a href="http://papyri.info/ddbdp/p.ryl;2;74/source"&gt;http://papyri.info/ddbdp/p.ryl;2;74/source&lt;/a&gt; (see&amp;nbsp;&lt;a href="http://papyri.info/ddbdp/p.ryl;2;74/"&gt;http://papyri.info/ddbdp/p.ryl;2;74/&lt;/a&gt; for an HTML version with more info) is a fairly typical example of EpiDoc used to mark up a papyrus document. The document structure is fairly flat, but with a number of editorial interventions, all marked up. 
Line 12, below, shows supplied, unclear, and gap tags&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;lt;lb n="12"/&amp;gt;&amp;lt;gap reason="lost" quantity="16" unit="character"/&amp;gt; &amp;lt;supplied reason="lost"&amp;gt; Π&amp;lt;/supplied&amp;gt;&amp;lt;unclear&amp;gt;ε&amp;lt;/unclear&amp;gt;ρὶ Θή&amp;lt;supplied reason="lost"&amp;gt;βας καὶ Ἑ&amp;lt;/supplied&amp;gt;&amp;lt;unclear&amp;gt;ρ&amp;lt;/unclear&amp;gt;μωνθ&amp;lt;supplied reason="lost"&amp;gt;ίτ &amp;lt;/supplied&amp;gt;&amp;lt;gap reason="lost" extent="unknown" unit="character"/&amp;gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: Georgia, 'Times New Roman', serif;"&gt;So how might we take this line and translate it to HTML? First, we have an &amp;lt;lb&amp;gt; tag, which at first glance would seem to map quite readily onto the HTML &amp;lt;br&amp;gt; tag, but if we look at the TEI Guidelines page for lb (&lt;a href="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-lb.html"&gt;http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-lb.html&lt;/a&gt;), we see a large number of possible attributes that don't necessarily convert well. In practice, all I usually see on a line break tag in TEI is an @n and maybe an @xml:id attribute. HTML doesn't really have a general-purpose attribute like @n, but @class or @title might serve. 
On &amp;lt;lb&amp;gt;, @n is often used to provide line numbers, so @title seems logical.&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: Georgia, 'Times New Roman', serif;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: Georgia, 'Times New Roman', serif;"&gt;Now &lt;/span&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;lt;gap reason="lost" quantity="16" unit="character"/&amp;gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="font-family: Georgia, 'Times New Roman', serif;"&gt; is a bit more of a puzzler. First, HTML's semantics don't extend at all to the recording of attributes of a text being transcribed, so nothing like the gap element exists. We'll have to use a general-purpose inline element (span seems obvious) and figure out how to represent the attribute values. TEI has no lack of attributes, and these don't naturally map to HTML at all in most cases. If we're going to keep TEI's attributes, we'll have to represent them as child elements. &amp;nbsp;We'll want both to identify the original TEI element and to wrap its attributes and maybe its content too, so let's assume we'll use the @class attribute with a couple of fake "namespaces", "tei-" for TEI element names, "teia-" for attribute names, and "teig-" to identify attributes and wrap element contents (the latter might be overkill, but seems sensible as a way to control whitespace). 
We can assume a stylesheet with a span.teig-&lt;/span&gt;&lt;span class="Apple-style-span" style="font-family: Georgia, 'Times New Roman', serif;"&gt;attribute selector that sets display:none.&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: Times, 'Times New Roman', serif;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;lt;span class="tei-gap"&amp;gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp; &amp;lt;span class="teig-attribute teia-&lt;/span&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;reason&lt;/span&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;"&amp;gt;lost&amp;lt;/span&amp;gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp;&amp;nbsp;&amp;lt;span class="&lt;/span&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;teig-&lt;/span&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;attribute teia-quantity"&amp;gt;16&amp;lt;/span&amp;gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp; &amp;lt;span class="&lt;/span&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;teig-&lt;/span&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;attribute teia-unit"&amp;gt;character&amp;lt;/span&amp;gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;lt;/span&amp;gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, 
monospace;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: Georgia, 'Times New Roman', serif;"&gt;Like HTML, TEI has three structural models for elements: block, inline, and milestone. Block elements assume a "block" of text, that is, they create a visually distinct chunk of text. Divs, paragraphs, tables, and similar elements are block level. Inline elements contain content, but don't create a separate block. Examples are span in HTML, or hi in TEI. Milestones are empty elements like lb or br. TEI has several of these, and HTML, which has "generic" elements of the block and inline varieties (div and span) lacks a generic empty element. Hence the need to represent tei:gap as a span.&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: Georgia, 'Times New Roman', serif;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: Georgia, 'Times New Roman', serif;"&gt;tei:supplied is clearly an inline element, and we can do something similar to the example above, using span:&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: Georgia, 'Times New Roman', serif;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;lt;span class="tei-supplied"&amp;gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp; &amp;lt;span class="teig-attribute teia-reason"&amp;gt;lost&amp;lt;/span&amp;gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp; &amp;lt;span class="teig-content"&amp;gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;Π&amp;lt;/span&amp;gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, 
monospace;"&gt;&amp;lt;/span&amp;gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: Georgia, 'Times New Roman', serif;"&gt;and likewise with unclear:&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: Georgia, 'Times New Roman', serif;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;lt;span class="tei-unclear"&amp;gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;nbsp; &amp;lt;span class="teig-content"&amp;gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;ε&lt;/span&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;lt;/span&amp;gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&amp;lt;/span&amp;gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;Now, doing this using generic HTML elements and styling/hiding them with CSS could be considered bad behavior. It's certainly frowned upon in the CSS 2.1 spec (see the note at &lt;a href="http://www.w3.org/TR/CSS2/selector.html#class-html"&gt;http://www.w3.org/TR/CSS2/selector.html#class-html&lt;/a&gt;). I don't honestly see another way to do it though, because, although RDFa has been suggested as a vehicle for porting TEI to HTML, there is no ontology for TEI, so no good way to say "HTML element p is the same as TEI element p, here". Even granting the possibility of saying that, it doesn't help with the attribute problem. 
And we're still left with the problem of presentation: what will my HTML look like in a browser? It must be said that my messing about above won't produce anything like the desired effect, which for line 12 is something like:&lt;br /&gt;&lt;blockquote class="tr_bq"&gt;&lt;span class="Apple-style-span" style="background-color: white; color: #3c2217; font-family: 'Lucida Grande', Cardo, 'Arial Unicode MS', 'Galilee Unicode Gk', 'New Athena Unicode', 'Athena Unicode', 'Palatino Linotype', 'Titus Cyberbit Basic', 'Vusillus Old Face', Alphabetum, 'Galatia SIL', 'Code 2000', GentiumAlt, Gentium, 'Minion Pro', GeorgiaGreek, 'Vusillus Old Face Italic', 'Everson Mono', Aristarcoj, Porson, Legendum, 'Aisa Unicode', 'Hindsight Unicode', Caslon, Verdana, Tahoma; font-size: 16px; line-height: 22px;"&gt;[- ca.16 - Π]ε̣ρὶ Θή[βας καὶ Ἑ]ρ̣μωνθ[ίτ -ca.?- ]&amp;nbsp;&lt;/span&gt;&lt;/blockquote&gt;I could certainly make it so, probably with a combination of CSS and JavaScript, but what have I gained by doing so? I'll have traded one paradigm, XML + XSLT, for another, HTML + CSS + JavaScript. I'll have lost the ability to validate my markup, though I'll still be able to transform it to other formats. I should be able to round-trip it to TEI and back, so perhaps I could solve the validation problem that way. But is anything about this better than TEI XML? I don't think so…&lt;br /&gt;&lt;br /&gt;I suspect I'm missing the point here, and that what the proponents of TEI in HTML are really after is a radically curtailed (or re-thought) version of TEI that does map more comfortably to HTML. The somewhat Baroque complexity of TEI leads the casual observer to wish for something simpler immediately, and can provoke occasional dismay even in experienced users. 
I certainly sympathize with the wish for a simpler architecture, but text modeling is a complex problem, and simple solutions to complex problems are hard to engineer.&lt;br /&gt;&lt;div class="p1"&gt;&lt;span class="s1"&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8725958-8046331446184506124?l=philomousos.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://philomousos.blogspot.com/feeds/8046331446184506124/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8725958&amp;postID=8046331446184506124' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8725958/posts/default/8046331446184506124'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8725958/posts/default/8046331446184506124'/><link rel='alternate' type='text/html' href='http://philomousos.blogspot.com/2011/11/tei-in-other-formats-part-first-html.html' title='TEI in other formats; part the first: HTML'/><author><name>Hugh Cayless</name><uri>http://www.blogger.com/profile/05582281819129069158</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://4.bp.blogspot.com/-AzoP5gBkhsA/TlP6uXWN6pI/AAAAAAAAACo/UtHNVFJJgGk/s220/Photo%2B1.jpg'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8725958.post-3729012614399718654</id><published>2011-06-28T11:00:00.001-04:00</published><updated>2011-06-28T11:01:32.298-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='digital curation'/><category scheme='http://www.blogger.com/atom/ns#' term='IDP'/><category scheme='http://www.blogger.com/atom/ns#' term='DH'/><title type='text'>Humanities Data Curation</title><content type='html'>Last Thursday, I 
attended the excellent &lt;a href="http://cirssweb.lis.illinois.edu/paloalto/"&gt;Humanities Data Curation Summit&lt;/a&gt;, organized by Allen Renear, Trevor Muñoz, Katherine L. Walter, and Julia Flanders. I'm still processing the day, which included a breakout session with Allen, Elli Mylonas, and Michael Sperberg-McQueen, who are some of my favorite people in DH.&lt;br /&gt;&lt;br /&gt;What I started thinking about today was that we'd skipped definitions at the beginning; there was a joke that Allen, as a philosopher, could have spent all day on that task. But in doing so, we elided the question of what counts as data in the humanities, and how it differs from science or social science data.&lt;br /&gt;&lt;br /&gt;Humanities data are not usually static, collected data like instrument readings or survey results. They are things like marked-up texts, uncorrected OCR, images in need of annotation, etc. Humanities datasets can almost always be improved upon. "Curation" for them is not simply preservation, access, and forward migration. It means enabling interested communities to work with the data and make it better. 
Community interaction needs to be factored into the data's curation lifecycle.&lt;br /&gt;&lt;br /&gt;I feel a blog post coming on about how the Integrating Digital Papyrology / &lt;a href="http://papyri.info"&gt;papyri.info&lt;/a&gt; project does this...&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8725958-3729012614399718654?l=philomousos.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://philomousos.blogspot.com/feeds/3729012614399718654/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8725958&amp;postID=3729012614399718654' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8725958/posts/default/3729012614399718654'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8725958/posts/default/3729012614399718654'/><link rel='alternate' type='text/html' href='http://philomousos.blogspot.com/2011/06/humanities-data-curation.html' title='Humanities Data Curation'/><author><name>Hugh Cayless</name><uri>http://www.blogger.com/profile/05582281819129069158</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://4.bp.blogspot.com/-AzoP5gBkhsA/TlP6uXWN6pI/AAAAAAAAACo/UtHNVFJJgGk/s220/Photo%2B1.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8725958.post-5164975367842189533</id><published>2011-01-20T09:09:00.004-05:00</published><updated>2011-01-23T15:30:44.242-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='TEI'/><category scheme='http://www.blogger.com/atom/ns#' term='DH'/><title type='text'>Interfaces and Models</title><content type='html'>In my &lt;a href="http://philomousos.blogspot.com/2011/01/tei-is-text-modelling-language.html"&gt;last post&lt;/a&gt;, I 
argued that TEI is a text modelling language, and in the &lt;a href="http://philomousos.blogspot.com/2011/01/i-will-never-not-ever-type-angle.html"&gt;prior post&lt;/a&gt;, I discussed a frequently-expressed request for TEI editors that hide the tags.  Here, I'm going to assert that your editing interface (implicitly) expresses a model too, and because it does, generic, tag-hiding editors are a losing proposition.&lt;br /&gt;&lt;br /&gt;Everything to do with human-computer interfaces uses models, abstractions, and metaphors.  Your computer "desktop" is a metaphor that treats the primary, default interface like the surface of a desk, where you can leave stuff lying around that you want to have close at hand.  "Folders" are like physical file folders.  Word processors make it look like you're editing a printed page; HTML editors can make it look as though you're directly editing the page as it appears in a browser.  These metaphors work by projecting an image that looks like something you (probably) already have a mental model of.  The underlying model used by the program or operating system is something else again.  Folders don't actually represent any physical containment on the system's local storage, for example.  The WYSIWYG text you edit might be a stream of text and formatting instructions, or a Document Object Model (DOM) consisting of Nodes that model HTML elements and text.  &lt;br /&gt;&lt;br /&gt;If you're lucky, there isn't a big mismatch between your mental model and the computer's.  But sometimes there is: we've all seen weirdly mis-formatted documents, where instead of using a header style for header text, the writer just made it bold, with a bigger font, and maybe put a couple of newlines after it.  Maybe you've done this yourself, when you couldn't figure out the "right" way to do it.  
This kind of thing only bites you, after all, when you want to do something like change the font for all headers in a document.&lt;br /&gt;&lt;br /&gt;And how do we cope if there's a mismatch between the human interface and the underlying model?  If the interface is much simpler than the model, then you will only be able to create simple instances with it; you won't be able to use the model to its full capabilities.  We see this with word processor-to-TEI converters, for example.  The word processor can do structural markup, like headers and paragraphs, but it can't so easily do more complex markup.  You could, in theory, have a tagless TEI editor capable of expressing the full range of TEI, but it would have to be as complex as the TEI is.  You could hide the angle brackets, but you'd have to replace them with something else.&lt;br /&gt;&lt;br /&gt;Because TEI is a language for producing models of texts, it is probably impossible to build a generic tagless TEI editor.  In order for the metaphor to work, there must be a mapping from each TEI structure to a visual feature in the editor.  But in TEI, there are always multiple ways of expressing the same information.  The one you choose is dictated by your goals, by what you want to model, and by what you'll want the model to do.  There's nothing to map to on the TEI side until you've chosen your model.  Thus, while it's perfectly possible (and is useful,&lt;a href="#note1"&gt;*&lt;/a&gt; and has been done, repeatedly) to come up with a "tagless" interface that works well for a particular model of text, I will assert that developing a generic TEI editor that hides the markup would be a &lt;span style="font-weight:bold;"&gt;&lt;a href="http://fishbowl.pastiche.org/2007/07/17/understanding_engineers_feasibility/"&gt;hard&lt;/a&gt;&lt;/span&gt; task.&lt;br /&gt;&lt;br /&gt;This doesn't mean you couldn't build a tool to generate model-specific TEI editors, or build a highly-customizable tagless editor.  
But the customization will be a fairly hefty intellectual effort.  And there's a potential disadvantage here too: creating such a customization implies that you know exactly how you want your model to work, and at the start of a project, you probably don't.  You might find, for example, that for 1% of your texts, your initial assumptions about your text model are completely inadequate, and so it has to be refined to account for them.  This sort of thing happens all the time.&lt;br /&gt;&lt;br /&gt;My advice is to think hard before deciding to "protect" people from the markup.  Text modeling is a skill that any scholar of literature could stand to learn.&lt;br /&gt;&lt;br /&gt;UPDATE: a comment on another site by Joe Wicentowski makes me think I wasn't completely clear above.  There's NOTHING wrong with building "padded cell" editors that allow users to make only limited changes to data.  But you need to be clear about what you want to accomplish with them before you implement one.&lt;br /&gt;&lt;br /&gt;&lt;a name="note1"&gt;*&lt;/a&gt;Michael C. M. 
Sperberg-McQueen has a nice bit on "padded cell editors" at &lt;a href="http://www.blackmesatech.com/view/?p=11"&gt;http://www.blackmesatech.com/view/?p=11&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8725958-5164975367842189533?l=philomousos.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://philomousos.blogspot.com/feeds/5164975367842189533/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8725958&amp;postID=5164975367842189533' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8725958/posts/default/5164975367842189533'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8725958/posts/default/5164975367842189533'/><link rel='alternate' type='text/html' href='http://philomousos.blogspot.com/2011/01/interfaces-and-models.html' title='Interfaces and Models'/><author><name>Hugh Cayless</name><uri>http://www.blogger.com/profile/05582281819129069158</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://4.bp.blogspot.com/-AzoP5gBkhsA/TlP6uXWN6pI/AAAAAAAAACo/UtHNVFJJgGk/s220/Photo%2B1.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8725958.post-5460719897448734265</id><published>2011-01-11T07:47:00.008-05:00</published><updated>2011-01-11T09:35:54.916-05:00</updated><title type='text'>TEI is a text modelling language</title><content type='html'>&lt;div&gt;I'm teaching a &lt;a href="http://www.tei-c.org/"&gt;TEI&lt;/a&gt; class this weekend, so I've been pondering it a bit.  I've come to the conclusion that calling what we do with TEI "text encoding" is misleading.  
I think what we're really doing is text &lt;b&gt;&lt;i&gt;modeling&lt;/i&gt;&lt;/b&gt;.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;TEI provides an XML vocabulary that lets you produce models of texts that can be used for a variety of purposes. Not a Model of Text, mind you, but models (lowercase) of texts (also lowercase).  &lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;TEI has made the (interesting, significant) decision to piggyback its semantics on the structure of XML, which is tree-based.  So XML structure implies semantics for a lot of TEI.  For example, paragraph text appears inside &amp;lt;p&amp;gt; tags; to mark a personal name, I surround the name with a &amp;lt;persName&amp;gt; tag, and so on.  This arrangement is extremely convenient for processing purposes: it is trivial to transform the TEI &amp;lt;p&amp;gt; into an HTML &amp;lt;p&amp;gt;&lt;a href="#note1"&gt;*&lt;/a&gt;, for example, or the &amp;lt;persName&amp;gt; into an HTML hyperlink, which points to more information about the person.  It means, however, that TEI's modeling capabilities are to a large extent XML's own.  This approach has opened TEI up to criticism.  Buzzetti (2002) has argued that its tree structure simply isn't expressive enough to represent the complexities of text, and Schmidt (2010) criticizes TEI for (among other problems) being a bad model of text, because it imposes editorial interpretation on the text itself.  &lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The main disagreement I have with Schmidt's argument is the assumption that there is a text independent of the editorial apparatus.  Maybe there is sometimes, but I can point at many examples where there is no text, as such, only readings.  And a reading is, must be, an interpretive exercise.  
So I'd argue that TEI is at least honest in that it puts the editorial interventions front and center where they are obvious.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;As for the argument that TEI's structure is inadequate to model certain aspects of text, I can only agree.  But TEI has proved good enough to do a lot of serious scholarly work.  That, and the fact that its choice of structure means it can bring powerful XML tools to bear on the problems it confronts, means that TEI represents a "worse is better" solution.&lt;a href="#note2"&gt;†&lt;/a&gt;  It works a lot of the time, doesn't claim to be perfect, and incrementally improves.  Where TEI isn't adequate to model a text in the way you want to use it, then you either shouldn't use it, or should figure out how to extend it.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;One should bear in mind that any digital representation of a text is &lt;i&gt;ipso facto&lt;/i&gt; a model.  It's impossible to do anything digital without a model (whether you realize it's there or not).  Even if you're just transcribing text from a printed page to a text editor you're making editorial decisions, like what character encoding to use, how to represent typographic features in that encoding, how to represent whitespace, and what to do with things you can't easily type (inline figures or symbols without a Unicode representation, for example).&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;So why argue that TEI is a language for modeling texts, rather than a language for "encoding" texts?  The simple answer is that this is a better way of explaining what people use TEI for.  TEI provides a lot of tags to choose from.  No-one uses them all.  Some are arguably incompatible with one another.  We tag the things in a text that we care about and want to use.  
In other words, we build models of the source text, models that reflect what we think is going on structurally, semantically, or linguistically in the text, and/or models that we hope to exploit in some way.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt; For example, &lt;a href="http://epidoc.sf.net/"&gt;EpiDoc&lt;/a&gt; is designed to produce critical editions of inscribed or handwritten ancient texts.  It is concerned with producing an edition (a reading) of the source text that records the editor's observations of and ideas about that text.  It does not at this point concern itself with marking personal or geographic names in the text.  An EpiDoc document is a particular model of the text that focuses on the editor's reading of that text.  As a counterexample, I might want to use TEI to produce a graph of the interactions of characters in Hamlet.  If I wanted to do that, I would produce a TEI document that marked people and whom they were addressing when they spoke.  This would be a completely different model of the text than a critical edition of Hamlet might be.  I could even try to do both at the same time, but that might be a mess—models are easier to deal with when they focus on one thing.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;This way of understanding TEI makes clear a problem that arises whenever one tries to merge collections of TEI documents: that of compatibility.  Just because two documents are marked up in TEI, that does not mean they are interoperable.  This is because each document represents the editor's &lt;b&gt;&lt;i&gt;model&lt;/i&gt;&lt;/b&gt; of that text.  
Compatibility is certainly achievable if both documents follow the same set of conventions, but we shouldn't &lt;b&gt;&lt;i&gt;expect&lt;/i&gt;&lt;/b&gt; it any more than we'd expect to be able to merge any two models that follow different ground rules.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;b&gt;Notes&lt;/b&gt;&lt;/div&gt;&lt;div&gt;&lt;a name="note1"&gt;*&lt;/a&gt; with the caveat that the semantics of TEI &amp;lt;p&amp;gt; and HTML &amp;lt;p&amp;gt; are different, and there may be problems. TEI's &amp;lt;p&amp;gt; can contain lists, for example, whereas HTML's cannot.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;a name="note2"&gt;†&lt;/a&gt; See &lt;a href="http://www.dreamsongs.com/RiseOfWorseIsBetter.html"&gt;http://www.dreamsongs.com/RiseOfWorseIsBetter.html&lt;/a&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Yes, I wrote a blog post with endnotes and bibliography. Sue me.&lt;/div&gt;&lt;div&gt;&lt;ol&gt;&lt;li&gt;&lt;span class="cit-name-surname"&gt;Buzzetti&lt;/span&gt; &lt;span class="cit-name-given-names"&gt;D&lt;/span&gt;&lt;i&gt;. "&lt;/i&gt;&lt;span class="cit-article-title"&gt;Digital Representation and the Text Model&lt;/span&gt;." &lt;abbr class="cit-jnl-abbrev"&gt;&lt;i&gt;New Literary History&lt;/i&gt;&lt;/abbr&gt; &lt;span class="cit-pub-date"&gt;2002&lt;/span&gt;; &lt;span class="cit-vol"&gt;33.1&lt;/span&gt;:&lt;span class="cit-fpage"&gt;61&lt;/span&gt;-&lt;span class="cit-lpage"&gt;88&lt;/span&gt;.&lt;/li&gt;&lt;li&gt;Schmidt, D. "The Inadequacy of Embedded Markup for Cultural Heritage Texts." 
&lt;i&gt;Literary and Linguistic Computing&lt;/i&gt; 2010; 25.3:337-356.&lt;/li&gt;&lt;/ol&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8725958-5460719897448734265?l=philomousos.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://philomousos.blogspot.com/feeds/5460719897448734265/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8725958&amp;postID=5460719897448734265' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8725958/posts/default/5460719897448734265'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8725958/posts/default/5460719897448734265'/><link rel='alternate' type='text/html' href='http://philomousos.blogspot.com/2011/01/tei-is-text-modelling-language.html' title='TEI is a text modelling language'/><author><name>Hugh Cayless</name><uri>http://www.blogger.com/profile/05582281819129069158</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://4.bp.blogspot.com/-AzoP5gBkhsA/TlP6uXWN6pI/AAAAAAAAACo/UtHNVFJJgGk/s220/Photo%2B1.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8725958.post-4007994442974032217</id><published>2011-01-06T23:14:00.006-05:00</published><updated>2011-01-06T23:31:44.502-05:00</updated><title type='text'>I Will Never NOT EVER Type an Angle Bracket (or IWNNETAAB for short)</title><content type='html'>&lt;div&gt;From time to time, I hear an argument that goes something like this: "Our users won't deal with angle brackets, therefore we can't use TEI, or if we do, it has to be hidden from them." It's an assumption I've encountered again quite recently.  Since it's such a common trope, I wonder how true it is.  
Of course, I can't speak for anyone's user communities other than the ones I serve.  And mine are perhaps not the usual run of scholars.  But they haven't tended to throw their hands up in horror at the sight of angle brackets.  Indeed, some of them have become quite expert at editing documents in TEI.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The problems with TEI (and XML in general) are manifold, but its shortcomings often center around its not being expressive *enough* to easily deal with certain classes of problem.  And the TEI evolves.  You can get involved and change it for the better.  &lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The IWNNETAAB objection seems grounded in fear.  But fear of what?  As I mentioned at the start, IWNNETAAB isn't usually an expression of personal revulsion, it's not just Luddism, it's IWNNETAAB by proxy: my users/clients/stakeholders won't stand for it.  Or they'll mess it up.  TEI is hard.  It has *hundreds* of elements.  How can they/why should they learn something so complex just to be able to digitize texts?!  What we want to do is simple, can't we have something simple that produces TEI in the end?&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The problem with simplified editing interfaces is easy to understand: they are simple. Complexities have been removed, and along with them, the ability to express complex things.  To put it another way, if you aren't dealing with the tags, you're dealing with something in which a bunch of decisions have already been made for you.  My argument in the recent discussion was that in fact, these decisions tend to be extremely project-specific.  You can't set it up once and expect it to work again in different circumstances; you (or someone) will have to do it over and over again.  So, for a single project, the cost/benefit equation may look like it leans toward the "simpler" option.  
But taken over many projects, you're looking either at learning a reasonably complex thing or building a long series of tools that each produce a different version of that thing.  Seen in this light, I think learning TEI makes a lot of sense.  On the learning-TEI side, the costs go down over time; on the GUI side, they keep going up.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Moreover, knowing TEI means that you (or your stakeholders) aren't shackled to an interface that imposes decisions that were made before you ever looked at the text you're encoding; instead, you are actually engaging with the text, in the form in which it will be used.  You're seeing behind the curtain.  I can't really fathom why that would be a bad thing.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;(Inspiration for the title comes from a book my 2-year-old is very fond of)&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://en.wikipedia.org/wiki/Charlie_and_LolaG"&gt;&lt;img style="float:left; margin:0 10px 10px 0;cursor:pointer; cursor:hand;width: 220px; height: 244px;" src="http://upload.wikimedia.org/wikipedia/en/thumb/9/97/Charlieandlolatomato.PNG/220px-Charlieandlolatomato.PNG" border="0" alt="" /&gt;&lt;/a&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8725958-4007994442974032217?l=philomousos.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://philomousos.blogspot.com/feeds/4007994442974032217/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8725958&amp;postID=4007994442974032217' title='0 Comments'/><link rel='edit' type='application/atom+xml' 
href='http://www.blogger.com/feeds/8725958/posts/default/4007994442974032217'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8725958/posts/default/4007994442974032217'/><link rel='alternate' type='text/html' href='http://philomousos.blogspot.com/2011/01/i-will-never-not-ever-type-angle.html' title='I Will Never NOT EVER Type an Angle Bracket (or IWNNETAAB for short)'/><author><name>Hugh Cayless</name><uri>http://www.blogger.com/profile/05582281819129069158</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://4.bp.blogspot.com/-AzoP5gBkhsA/TlP6uXWN6pI/AAAAAAAAACo/UtHNVFJJgGk/s220/Photo%2B1.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8725958.post-8445697106013734105</id><published>2010-12-28T07:10:00.001-05:00</published><updated>2010-12-28T07:12:22.083-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='data curation'/><category scheme='http://www.blogger.com/atom/ns#' term='DH'/><title type='text'>DH Tea Leaves</title><content type='html'>&lt;div&gt;From reading my (possibly) representative sample of &lt;a href="https://dh2011.stanford.edu/"&gt;DH&lt;/a&gt; proposals, I'd say the main theme of the conference will not be "Big Tent Digital Humanities" but "data integration".  Of the 8 proposals I read, more than half of them were concerned with problems of connecting data across projects, disciplines, and different systems.  My proposal was too (making 9), so perhaps I did have a representative sample.  &lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Data integration is a meaty problem, resistant to generalized solutions.  To my mind the answers, such as they are, will rely on the same set of practices that good data curation techniques use: open formats and open source code, and good documentation that covers the "why" of decisions made for projects as well as the "how."  
Data integration is a process that involves gaining an understanding of the sources and the semantics of their structures before you can connect them together.  So, while there are tools out there that can enable successful data integration, there are (as usual) no silver bullets.  Grasping the meanings and assumptions embodied in each project's data structures has to be the first step and this is only possible when those structures have been explained.&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8725958-8445697106013734105?l=philomousos.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://philomousos.blogspot.com/feeds/8445697106013734105/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8725958&amp;postID=8445697106013734105' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8725958/posts/default/8445697106013734105'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8725958/posts/default/8445697106013734105'/><link rel='alternate' type='text/html' href='http://philomousos.blogspot.com/2010/12/dh-tea-leaves.html' title='DH Tea Leaves'/><author><name>Hugh Cayless</name><uri>http://www.blogger.com/profile/05582281819129069158</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://4.bp.blogspot.com/-AzoP5gBkhsA/TlP6uXWN6pI/AAAAAAAAACo/UtHNVFJJgGk/s220/Photo%2B1.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8725958.post-6895404428239178618</id><published>2010-12-05T16:29:00.002-05:00</published><updated>2010-12-06T09:32:11.938-05:00</updated><title type='text'>That Bug Bit Me</title><content type='html'>&lt;div&gt;"I had this problem and I fixed it" stories are 
boring to anyone except those intimately concerned with the problem, so I'm not going to tell that story. Instead, I'm going to talk about projects in the Digital Humanities that rely on 3rd party software, and talk about the value of expertise in programming and software architecture. From the outside, modern software development can look like building a castle out of Lego pieces: you take existing components and pop them together. Need search? Grab Apache Solr and plug it in. Need a data store? Grab a (No)SQL database and put your data in it. Need to do web development fast? Grab a framework, like Rails or Django. Doesn't sound that hard.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;This is, more or less, what &lt;a href="http://papyri.info/"&gt;papyri.info&lt;/a&gt; looks like internally. There's a &lt;a href="http://lucene.apache.org/solr/"&gt;Solr&lt;/a&gt; install that handles search, a &lt;a href="http://mulgara.org/"&gt;Mulgara&lt;/a&gt; triple store that keeps track of document relationships, small bits of code that handle populating the former two and displaying the web interface, and a &lt;a href="http://jruby.org/"&gt;JRuby&lt;/a&gt; on &lt;a href="http://rubyonrails.org/"&gt;Rails&lt;/a&gt; application that provides crowdsourced editing capabilities.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Upgrading components in this architecture should range from trivially easy, to moderately complex (that's only if some interface has changed between versions, for example).&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;So why did I find myself sitting in a hotel lobby in Rome a few weeks ago having to roll back to an older version of Mulgara so the editor application would work for a presentation the next day? A bunch of our queries had stopped working, meaning the editor couldn't load texts to edit. 
Oops.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;And why did I spend the last week fighting to keep the application standing up after a new release of the editor was deployed?&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The answer to both questions was that our Lego blocks didn't function the way they were supposed to. They aren't Lego blocks after all—they're complex pieces of software that may have bugs. The fact that our components are open source, and have responsive developers behind them is a help, but we can't necessarily expect those developers to jump to solve our problems. After all, the project's tests must have passed in order for the new release to be pushed, and unless there's a chorus of complaint, our problem isn't necessarily going to be high on their list of things to fix.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;No, the whole point of using open source components is that you don't have to depend solely on other people to fix your problems. In the case of Mulgara, I was able to track down and fix the bug myself, with some pointers from the lead developer. The fix (or a better version of it) will go into the next release, and meantime we can use my patched version. In the case of the Rails issue, there seems to be a bug in the ActiveSupport file caching under JRuby that causes it to go nuts: the request never returns and something continually creates resources that have to be garbage collected. The symptom I was seeing was constant GC and a gradual ramping up of the CPU usage to the point where the app became unstable. 
Tracing back from that symptom took a lot of work, but once I identified it, we were able to switch away from file store caching, and so far things look good.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;My takeaway from this is that even when you're constructing your application from prebuilt blocks, it really helps to have the expertise to dig into the architecture of the blocks themselves. Software components aren't Lego blocks, and although you'll want to use them (because you don't have the time or money to write your own search engine from scratch) you do need to be able to understand them in a pinch. It also really pays to work with open source components. I didn't have to spend weeks feeding bug reports to a vendor to help them fix our Mulgara problem. A handful of emails and about a day's worth of work (spread over the course of a week) were enough to get me to the source of the problem and a fix for it (a 1-liner, incidentally).&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8725958-6895404428239178618?l=philomousos.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://philomousos.blogspot.com/feeds/6895404428239178618/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8725958&amp;postID=6895404428239178618' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8725958/posts/default/6895404428239178618'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8725958/posts/default/6895404428239178618'/><link rel='alternate' type='text/html' href='http://philomousos.blogspot.com/2010/12/that-bug-bit-me.html' title='That Bug Bit Me'/><author><name>Hugh 
Cayless</name><uri>http://www.blogger.com/profile/05582281819129069158</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://4.bp.blogspot.com/-AzoP5gBkhsA/TlP6uXWN6pI/AAAAAAAAACo/UtHNVFJJgGk/s220/Photo%2B1.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8725958.post-6874370350462815953</id><published>2010-05-10T10:42:00.003-04:00</published><updated>2010-05-10T12:20:00.234-04:00</updated><title type='text'>#alt-ac Careers: Digital Humanities Developer</title><content type='html'>(part 1 of a series)&lt;br /&gt;&lt;br /&gt;I've been a digital humanities developer, that is, someone who writes code, does interface and system design, and rescues data from archaic formats in support of DH projects in a few contexts during the course of my career.  I'll be writing a piece for Bethany Nowviskie's #alt-ac (&lt;a href="http://nowviskie.org/2010/alt-ac/"&gt;http://nowviskie.org/2010/alt-ac/&lt;/a&gt;) volume this year.  This is (mostly) not that piece, though portions of it may appear there, so it's maybe part of a draft.  This is an attempt to dredge up bits of my perspective as someone who has had an alternate-academic career (on and off) for the last decade.  It's fairly narrowly aimed at people like me, who've done advanced graduate work in the Humanities, have an interest in Digital Humanities, and who have or are thinking about jobs that employ their technical skills instead of pursuing a traditional academic career.&lt;br /&gt;&lt;br /&gt;A couple of future installments I have in mind are "What Skills Does a DH Developer Need?" and "What's Up with Digital Classics?"&lt;br /&gt;&lt;br /&gt;In this installment, I'm going to talk about some of the environments I've worked in.  I'm not going to pull punches, and this might get me into trouble, but I think that if there is to be a future for people like me, these things deserve an airing.  
If your institution is more enlightened than what I describe, please accept my congratulations and don't take offense.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Working for Libraries&lt;/h3&gt;&lt;br /&gt;In general, libraries are a really good place to work as a programmer, especially doing DH projects.  I've spent the last three years working in digital library programming groups.  There are some downsides to be aware of:  Libraries are very hierarchical organizations, and if you are not a librarian then you are probably in a lower "caste".  You will likely not get consistent (or perhaps any) support for professional development, conference attendance, etc.  Librarians, as faculty, have professional development requirements as part of their jobs.  You, whose professional development is not mandated by the organization (merely something you have to do if you want to stay current and advance your career), will not get the same level of support and probably won't get any credit for publishing articles, giving papers, etc.  This is infuriating, and in my opinion self-defeating on the part of the institution, but it is an unfortunate fact.  &lt;br /&gt;&lt;br /&gt;[Note: Bethany Nowviskie informs me that this is not the case at UVA, where librarians and staff are funded at the same level for professional development.  I hope that signals a trend.  And by the way, I do realize I'm being inflammatory, talking of castes.  This should make you uncomfortable.]&lt;br /&gt;&lt;br /&gt;Another downside is that as a member of a lower caste, you may not be able to initiate projects on your own.  At many institutions, only faculty (including librarians) and higher level administrators can make grant proposals, so if you come up with a grant-worthy project idea, someone will have to front for you (and get the credit).  
&lt;br /&gt;&lt;br /&gt;There do exist librarian/developer jobs, and this would be a substantially better situation from a professional standpoint, but since librarian jobs typically require a Master's degree in Library and/or Information Science, libraries may make the calculation that they would be excluding perfectly good programmers from the job pool by putting that sort of requirement in.  These are not terribly onerous programs on the whole, should you want to get an MLIS degree, but it does mean obtaining another credential.  For what it's worth, I have one, but have never held a librarian position. &lt;br /&gt;&lt;br /&gt;It's not all bad though: you will typically have a lot of freedom, loose deadlines, shorter than average work-weeks, and the opportunity to apply your skills to really interesting and hard problems.  If you want to continue to pursue your academic interests however, you'll be doing it as a hobby.  They don't want your research agenda unless you're a librarian.  In a lot of ways, being a DH developer in a library is a DH developer's nirvana.  I rant because it's &lt;span style="font-weight:bold;"&gt;so&lt;/span&gt; close to ideal. &lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Working for a .edu IT Organization&lt;/h3&gt;&lt;br /&gt;My first full time, permanent position post-Ph.D. was working for an IT organization that supports the College of Arts and Sciences at UNC Chapel Hill.  I was one of a handful of programmers who did various kinds of administrative and faculty project support.  It was a really good environment to work in.  I got to try out new technologies, learned Java, really understood XSLT for the first time, got good at web development and had a lot of fun.  
I also learned to fear unfunded mandates, that projects without institutional support are doomed, and that if you're the last line of support for a web application, you'd better get good at making it scale.&lt;br /&gt;&lt;br /&gt;IT organizations typically pay a bit better than, say, libraries and since it's an IT organization they actually understand technology and what it takes to build systems.  There's less sense of being the odd man out in the organization.  That said, if you're &lt;span style="font-weight:bold;"&gt;the&lt;/span&gt; academic/DH applications developer it's really easy to get overextended, and I did a bad job of avoiding that fate, "learning by suffering" as Aeschylus said.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Working in Industry&lt;/h3&gt;&lt;br /&gt;Working outside academia as a developer is a whole other world.  Again, DH work is likely to have to be a hobby, but depending on where you work, it may be a relevant hobby.  You will be paid (much) more, will probably have a budget for professional development, and may be able to use it for things such as attending DH conferences.  Downsides are that you'll probably work longer hours and you'll have less freedom to choose what you do and how you do it, because you're working for an organization that has to make money.  The capitalist imperative may strike you as distasteful if you've spent years in academia, but in fact it can be a wonderful feedback mechanism.  Doing things the right way (in general) makes the organization money, and doing them wrong (again, in general) doesn't.  It can make decision-making wonderfully straightforward.  
Companies, particularly small ones, can make decisions with a speed that seems bewilderingly quick when compared to libraries, which thrive on committees and meetings and change direction with all the flexibility of a supertanker.&lt;br /&gt;&lt;br /&gt;Another advantage of working in industry is that you are more likely to be part of a team working on the same stuff as you.  In DH we tend to only be able to assign one or two developers to a job.  You will likely be the lone wolf on a project at some point in your career.  Companies have money, and they want to get stuff done, so they hire &lt;span style="font-weight:bold;"&gt;teams&lt;/span&gt; of developers.  Being on a team like this is nice, and I often miss it.&lt;br /&gt;&lt;br /&gt;There are lots of companies that work in areas you may be interested in as someone with a DH background, including the semantic web, text mining, linked data, and digital publishing.  In my opinion, working on DH projects is great preparation for a career outside academia.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Funding&lt;/h3&gt;&lt;br /&gt;As a DH developer, you will more likely than not end up working on grant-funded projects, where your salary is paid with "soft money".  What this means in practical terms is that your funding will expire at a certain date.  This can be good.  It's not uncommon for programmers to change jobs every couple of years anyway, so a time-limited position gives you a free pass at job-switching without being accused of job-hopping.  If you work for an organization that's good at attracting funding, then it's quite possible to string projects together and/or combine them.  Though there can be institutional impedance mismatch problems here, in that it might be hard to renew a time-limited position, or to convert it to a permanent job without re-opening it for new applicants, or to fill in the gaps between funding cycles.  So some institutions have a hard time mapping funding streams onto people efficiently.  
These aren't too hard to spot because they go through "boom and bust" cycles, staffing up to meet demand and then losing everybody when the funding is gone.  This doesn't mean "don't apply for this job," just do it with your eyes open.  Don't go in with the expectation (or even much hope) that it will turn into a permanent position.  Learn what you can and move on.  The upside is that these are often great learning opportunities.&lt;br /&gt;&lt;br /&gt;In sum, being a DH developer is very rewarding.  But I'm not sure it's a stable career path in most cases, which is a shame for DH as a "discipline" if nothing else.  It would be nice if there were more senior positions for DH "makers" as well as "thinkers" (not that those categories are mutually exclusive).  I suspect that the institutions that have figured this out will win the lion's share of DH funding in the future, because their brain trusts will just get better and better.  The ideal situation (and what you should look for when you're looking to settle down) is a place&lt;br /&gt; &lt;ul&gt;&lt;br /&gt; &lt;li&gt;that has a good track record of getting funded,&lt;/li&gt;&lt;br /&gt; &lt;li&gt;where developers are first-class members of the organization (i.e. have "researcher" or similar status),&lt;/li&gt;&lt;br /&gt; &lt;li&gt;where there's a team in place and it's not just you, and&lt;/li&gt;&lt;br /&gt; &lt;li&gt;where there's some evidence of long-range planning.&lt;/li&gt;&lt;br /&gt;&lt;/ul&gt;&lt;br /&gt;For the most part, though, DH development may be the kind of thing you do for a few years while you're young before you go and do something else.  I often wonder whether my DH developer expiration date is approaching.  
Grant funding often won't pay enough for an experienced programmer, unless those who wrote the budget knew what they were doing [Note: I've read too many grant proposals where the developer salary is &amp;lt; $50K (entry-level) but they have the title "Lead Developer" vel sim.  For what it's worth, this positively screams "We don't know what we're doing!"].  It may soon be time to go back to working for lots more money in industry, or to try to get another administrative DH job.  For now, I still have about a year's worth of grant funding left.  Better get back to work.</content><link rel='replies' type='application/atom+xml' href='http://philomousos.blogspot.com/feeds/6874370350462815953/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8725958&amp;postID=6874370350462815953' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8725958/posts/default/6874370350462815953'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8725958/posts/default/6874370350462815953'/><link rel='alternate' type='text/html' href='http://philomousos.blogspot.com/2010/05/alt-ac-careers-digital-humanities.html' title='#alt-ac Careers: Digital Humanities Developer'/><author><name>Hugh Cayless</name><uri>http://www.blogger.com/profile/05582281819129069158</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' 
src='http://4.bp.blogspot.com/-AzoP5gBkhsA/TlP6uXWN6pI/AAAAAAAAACo/UtHNVFJJgGk/s220/Photo%2B1.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8725958.post-6709360917220560947</id><published>2010-05-04T20:54:00.004-04:00</published><updated>2010-05-04T21:50:46.561-04:00</updated><title type='text'>Addenda et Corrigenda</title><content type='html'>The proceedings of the &lt;a href="http://www.library.upenn.edu/exhibits/lectures/ljs_symposium.html"&gt;2009 Lawrence J. Schoenberg Symposium&lt;/a&gt; on Manuscript Studies in the Digital Age, at which I was a panelist were recently published at &lt;a href="http://repository.upenn.edu/ljsproceedings/"&gt;http://repository.upenn.edu/ljsproceedings/&lt;/a&gt;.  I contributed a short &lt;a href="http://repository.upenn.edu/ljsproceedings/vol2/iss1/7/"&gt;piece&lt;/a&gt; arguing for the open licensing of content related to the study of medieval manuscripts.  &lt;br /&gt;&lt;br /&gt;Peter Hirtle, Senior Policy Advisor at the Cornell University Library, wrote me a message commenting on the piece and raising a point that I had elided (reproduced with permission):&lt;br /&gt;&lt;blockquote&gt;&lt;br /&gt;Dear Dr. Cayless:&lt;br /&gt; &lt;br /&gt;I read with great interest your article on “Digitized Manuscripts and Open Licensing.”  Your arguments in favor of a CC-BY license for medieval scholarship are unusual, important, and convincing.&lt;br /&gt; &lt;br /&gt;I was troubled, however, to see your comments on reproductions of medieval manuscripts.  For example, you note that if you use a CC-BY license, “an entrepreneur can print t-shirts using your digital photograph of a nice initial from a manuscript page.”  Later you add:&lt;br /&gt; &lt;br /&gt;"Should reproductions of cultural objects that have never been subject to copyright (and that would no longer be, even if they once had) themselves be subject to copyright? 
The fact is that they are, and some uses of the copyright on photographs may be laudable, for example a museum or library funding its ongoing maintenance costs by selling digital or physical images of objects in its collection, but the existence of such examples does not provide an answer to the question: as an individual copyright owner, do you wish to exert control how other people use a photograph of something hundreds or thousands or years old?"&lt;br /&gt; &lt;br /&gt;There is a fundamental mistaken concept here.  While some reproductions of cultural objects are subject to copyright, most aren’t.  Ever since the Bridgeman decision, the law in the New York circuit at least (and we believe in most other US courts) is that a “slavish” reproduction does not have enough originality to warrant its own copyright protection.  If it is an image of a three-dimensional object, there would be copyright, but if it is just a reproduction of a manuscript page, there would be no copyright.  It may take great skill to reproduce well a medieval manuscript, but it does not take originality.  To claim that it does encourages what has been labeled as “copyfraud.”&lt;br /&gt; &lt;br /&gt;You can read more about Bridgeman on pps. 34-35 in my book on Copyright and Cultural Institutions: Guidelines for Digitization for U.S. Libraries, Archives, and Museums, available for sale from Amazon and as a free download from SSRN at &lt;a href="http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1495365"&gt;http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1495365&lt;/a&gt;.&lt;br /&gt; &lt;br /&gt;Sincerely,&lt;br /&gt;Peter Hirtle&lt;br /&gt;&lt;/blockquote&gt;&lt;br /&gt;&lt;br /&gt;Peter is quite right that the copyright situation in the US, at least insofar as faithful digital reproductions of manuscript pages are concerned, is (with high probability) governed by the &lt;a href="http://en.wikipedia.org/wiki/Bridgeman_Art_Library_v._Corel_Corp."&gt;1999 Bridgeman vs. 
Corel&lt;/a&gt; decision.  So it is arguably best practice for a scholar who has photographed manuscripts to publish them and assert that they are in the public domain.&lt;br /&gt;&lt;br /&gt;I skimmed over this in my article because, for one thing, much of the scholarly content of the symposium dealt with European institutions and their online publication (some behind a paywall—sometimes a crazily expensive one) of manuscripts, so US copyright law doesn't necessarily pertain.  For another, I had in mind not just manuscripts but inscriptions, to which—to the extent that they are three-dimensional objects—Bridgeman doesn't pertain.  Finally, while it is common practice to produce faithful digital reproductions of manuscript texts, it is also common to enhance those images for the sake of readability, at which point they are (may be?) no longer "slavish reproductions" and thus subject to copyright.  The problem of course, as so often in copyright, is that there's no case law to back this up.  If I crank up the contrast or run a Photoshop (or Gimp) filter on an image, have I altered it sufficiently to make it copyrightable?  I don't know for certain, and I'm not sure anyone does.  So on balance I'd still argue for doing the simple thing and putting a &lt;a href="http://creativecommons.org/"&gt;Creative Commons&lt;/a&gt; license (&lt;a href="http://creativecommons.org/licenses/by/3.0/"&gt;CC-BY&lt;/a&gt; is my recommendation) on everything.  This is what the &lt;a href="http://www.archimedespalimpsest.org/"&gt;Archimedes Palimpsest&lt;/a&gt; project does, &lt;a href="http://www.archimedespalimpsest.org/imagebank_intro.html"&gt;for example&lt;/a&gt;, who are arguably in just this situation with their publication of multispectral imaging of the palimpsest.  
And they are to be commended for doing so.&lt;br /&gt;&lt;br /&gt;Anyway, I'd like to thank Peter, first for reading my article and second for prodding me into usefully complicating something I had oversimplified for the sake of argument.</content><link rel='replies' type='application/atom+xml' href='http://philomousos.blogspot.com/feeds/6709360917220560947/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8725958&amp;postID=6709360917220560947' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8725958/posts/default/6709360917220560947'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8725958/posts/default/6709360917220560947'/><link rel='alternate' type='text/html' href='http://philomousos.blogspot.com/2010/05/addenda-et-corrigenda.html' title='Addenda et Corrigenda'/><author><name>Hugh Cayless</name><uri>http://www.blogger.com/profile/05582281819129069158</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://4.bp.blogspot.com/-AzoP5gBkhsA/TlP6uXWN6pI/AAAAAAAAACo/UtHNVFJJgGk/s220/Photo%2B1.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8725958.post-4515773403272907672</id><published>2010-03-02T10:05:00.007-05:00</published><updated>2010-03-04T15:52:47.586-05:00</updated><title type='text'>Making a new Numbers Server for papyri.info</title><content type='html'>[UPDATE: added relationships diagram]&lt;br /&gt;&lt;br /&gt;One of the components of &lt;a href="http://papyri.info/"&gt;papyri.info&lt;/a&gt; (a fairly well-hidden one) is a service that provides lookups of identifiers and correlates them with 
related records in other collections. Over the last few weeks, I've been working on replacing the old, Lucene-based numbers server with a new, triplestore-based one. One of the problems with the old version (though not the one that initially sent me on this quest, which was that I hated its identifiers) was that its structure didn't match the multidimensional nature of the data.&lt;br /&gt;&lt;h4&gt;Dimensions:&lt;/h4&gt;&lt;ul&gt;&lt;li&gt;Collections in the PN (there are four, so far) are hierarchical: for the Duke Databank of Documentary Papyri (DDbDP) and the Heidelberger Gesamtverzeichnis der griechischen Papyrusurkunden Ägyptens (HGV—which has two collections, of metadata and translations), there are series, volumes, and items, and for the Advanced Papyrological Information System (APIS) there are institutions and items.&lt;/li&gt;&lt;li&gt;FRBR: there's a Work (the ancient document itself), which has expression in a scholarly publication, from which the DDbDP transcription, HGV records and translations, and APIS records and translations are derived; these may be made manifest in a variety of ways, including EpiDoc XML, an HTML view, etc.  The scholarly work has bibliography, which is surfaced in the HGV records. There is the possibility of attaching bibliography at the volume level as well (since these are actual books, sitting in libraries). Libraries may have series-level catalog records too.&lt;/li&gt;&lt;li&gt;There are relationships between items that describe the same thing.  DDbDP and HGV usually have a 1::1 relationship (but not always).  APIS has some overlap with both.&lt;/li&gt;&lt;li&gt;There are internal relationships as well.  HGV has the idea of a "principal edition," the canonical publication of a document (there are also "andere publikationen"—other publications).  DDbDP does as well, but expresses it slightly differently: older versions that have been superseded have stub records with a pointer to the replacement.  
The replacements point backward as well, and these can form sometimes complex chains (imagine two fragments published separately, but later recognized as belonging to the same document and republished together).&lt;/li&gt;&lt;/ul&gt;&lt;div&gt;&lt;br /&gt;Relationships:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_jz5dHCAaud4/S4_cJma9cRI/AAAAAAAAAAM/gxuLNyfQF50/s1600-h/relationships.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 320px; height: 243px;" src="http://1.bp.blogspot.com/_jz5dHCAaud4/S4_cJma9cRI/AAAAAAAAAAM/gxuLNyfQF50/s320/relationships.jpg" border="0" alt="" id="BLOGGER_PHOTO_ID_5444812532004778258" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;/div&gt; All of this is really hard to represent in a relational or document-oriented fashion.  It turns out, though, that a graph database does really well.  I experimented with &lt;a href="http://www.mulgara.org/"&gt;Mulgara&lt;/a&gt; and found that it does the job perfectly.  I can write SPARQL queries that retrieve the data I need from the point of view of any component.  Then I can map these to nice URLs so that they are easy to retrieve, using a servlet that does some URL rewriting.  
Some examples:&lt;br /&gt;&lt;br /&gt;An HGV record:&lt;br /&gt;&lt;a href="http://papyri.info/hgv/8875/rdf"&gt;http://papyri.info/hgv/8875/rdf&lt;/a&gt;&lt;br /&gt;An HGV translation:&lt;br /&gt;&lt;a href="http://papyri.info/hgvtrans/8875/rdf"&gt;http://papyri.info/hgvtrans/8875/rdf&lt;/a&gt;&lt;br /&gt;An HGV record's principal edition and andere pub.:&lt;br /&gt;&lt;a href="http://papyri.info/hgv/249/frbr:Work/rdf"&gt;http://papyri.info/hgv/249/frbr:Work/rdf&lt;/a&gt;&lt;br /&gt;An APIS record:&lt;br /&gt;&lt;a href="http://papyri.info/apis/berenike.apis.17/rdf"&gt;http://papyri.info/apis/berenike.apis.17/rdf&lt;/a&gt;&lt;br /&gt;A (corresponding) DDb record:&lt;br /&gt;&lt;a href="http://papyri.info/ddbdp/o.berenike;1;17/rdf"&gt;http://papyri.info/ddbdp/o.berenike;1;17/rdf&lt;/a&gt;&lt;br /&gt;A DDb series listing (with "human-readable" citation):&lt;br /&gt;&lt;a href="http://papyri.info/ddbdp/chla/rdf"&gt;http://papyri.info/ddbdp/chla/rdf&lt;/a&gt;&lt;br /&gt;A DDb volume listing:&lt;br /&gt;&lt;a href="http://papyri.info/ddbdp/chla;1/rdf"&gt;http://papyri.info/ddbdp/chla;1/rdf&lt;/a&gt;&lt;br /&gt;The DDb collection listing:&lt;br /&gt;&lt;a href="http://papyri.info/ddbdp/rdf"&gt;http://papyri.info/ddbdp/rdf&lt;/a&gt;&lt;br /&gt;The HGV collection listing:&lt;br /&gt;&lt;a href="http://papyri.info/hgv/rdf"&gt;http://papyri.info/hgv/rdf&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Results are also available in Notation3 or JSON formats (available by substituting "n3" or "json" for "rdf" in the URLs above).  All of this makes for a nice machine interface to the relationships between papyri.info data, one that can be generated purely from the data files themselves, plus an RDF file that contains the abbreviated series citations from which DDbDP derives its identifiers.  The new Papyrological Editor, which will allow scholars to propose emendations to existing documents and to add new ones, will use it to determine what files to pull for editing.  
I also plan to drive the new Solr-based search indexing (which is necessarily document-oriented) using it, since it provides a clear view of which documents should be aggregated.&lt;br /&gt;&lt;br /&gt;The URL schemes above illustrate what I plan to do with the new version of papyri.info.  Content URLs will be of the form &lt;code&gt;http://papyri.info/ &amp;lt;collection name&amp;gt; / &amp;lt;collection specific identifier&amp;gt; [/ &amp;lt;format&amp;gt;]&lt;/code&gt;.  Leaving off a format will give you a standard HTML view of the document + associated documents; &lt;code&gt;/source&lt;/code&gt; will give you the EpiDoc source document by itself; &lt;code&gt;/atom&lt;/code&gt; will give you an ATOM-based representation.  I'm also thinking of &lt;code&gt;/rdfa&lt;/code&gt; for an HTML-based view of the numbers server data, with embedded RDFa.&lt;div&gt;&lt;h4&gt;What's Next&lt;/h4&gt;I haven't done anything really sophisticated with this yet.  I'd like to experiment with extending the DCTERMS vocabulary to deal with (e.g.) typed identifiers.  Importing other vocabularies (like &lt;a href="http://vocab.org/frbr/core.html"&gt;FRBR&lt;/a&gt; or &lt;a href="http://bibliontology.com/"&gt;BIBO&lt;/a&gt;) may make sense as well.  We're talking about hooking this up to bibliography (via records in &lt;a href="http://www.zotero.org/groups/papyrology"&gt;Zotero&lt;/a&gt;) and ancient places (via &lt;a href="http://pleiades.stoa.org/"&gt;Pleiades&lt;/a&gt;).  
It all works well with my design philosophy for papyri.info, which is that it should consist of data (in the form of EpiDoc source files and representations of those files), retrievable via sensible URLs, with modular services surrounding the data to make it discoverable and usable.&lt;br /&gt;&lt;br /&gt;I made a couple of changes to Mulgara during the course of this:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;turned off its strange and repugnant habit of representing namespaces and other URIs by declaring them as entities in an internal DTD in returned RDF results.  Please don't do this.  It's 2010.  For another thing it breaks if you have any URLEncoded characters (i.e. %something) in a URI, because your XML parser will think they are malformed parameter entities.&lt;/li&gt;&lt;li&gt;made the servlet return a 404 not found for queries with no hits (which seems more RESTfully correct)&lt;/li&gt;&lt;/ol&gt;Anyway, I need to revisit the Mulgara changes I made and try to either get them committed to the Mulgara codebase, or refactor them so that I'm not actually messing with Mulgara's internals.  I guess trying another triplestore is a third option.  Mulgara is fast, easy to use, and it solved my problem, so I went with it.  
But there still might be better alternatives out there.&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://philomousos.blogspot.com/feeds/4515773403272907672/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8725958&amp;postID=4515773403272907672' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8725958/posts/default/4515773403272907672'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8725958/posts/default/4515773403272907672'/><link rel='alternate' type='text/html' href='http://philomousos.blogspot.com/2010/03/making-new-numbers-server-for.html' title='Making a new Numbers Server for papyri.info'/><author><name>Hugh Cayless</name><uri>http://www.blogger.com/profile/05582281819129069158</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://4.bp.blogspot.com/-AzoP5gBkhsA/TlP6uXWN6pI/AAAAAAAAACo/UtHNVFJJgGk/s220/Photo%2B1.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/_jz5dHCAaud4/S4_cJma9cRI/AAAAAAAAAAM/gxuLNyfQF50/s72-c/relationships.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8725958.post-8654974467938308185</id><published>2009-12-31T09:38:00.002-05:00</published><updated>2009-12-31T10:01:06.885-05:00</updated><title type='text'>#APA2010</title><content type='html'>In between shortening my lifespan by doing a crazy yardwork project this week, I've been following with interest the tweets from &lt;a href="http://twitter.com/#search?q=%23MLA09"&gt;#MLA09&lt;/a&gt;.  
A couple of items of interest were that Digital Humanities has become an &lt;a href="http://chronicle.com/blogPost/The-MLAthe-Digital/19468/"&gt;overnight success&lt;/a&gt; (only decades in the making), the job market (still) &lt;a href="http://www.briancroxall.net/2009/12/28/the-absent-presence-todays-faculty/"&gt;reeks&lt;/a&gt;, and there are &lt;a href="http://nowviskie.org/2009/monopolies-of-invention/"&gt;serious inequities&lt;/a&gt; in the status of non-faculty collaborators in DH projects.  None of this is new, of course, but it's good to see it so well stated in a highly-visible venue.&lt;br /&gt;&lt;br /&gt;I'm more than ever convinced that, despite the occasional feelings of regret, I made the right decision to stop seeking faculty employment after I got my Ph.D. DH was not then, and perhaps still isn't now, a hot topic in Classics.  It is odd, because some of the most innovative DH work comes out of Classics, but, as I've said on a number of occasions, DH pickup in the field is concentrated in a few folks who are 20 years ahead of everyone else.  It's interesting to speculate why this may be so.  Classics is hard: you have to master (at least) a couple of ancient languages (Latin, Greek at least), plus a couple of modern ones (French and German are the most likely suspects, but maybe Italian, Spanish, Modern Greek, etc. also, depending on your specialization), then a body of literature, history, and art before you can do serious work.  Ph.D.s from other disciplines sometimes quail when I describe the comps we had to go through (2 3-hour translation exams, 2 4-hour written exams, and an oral—and that's before you got to do your proposal defense).  It may be that there's no room for anything else in this mix, and it's something you have to add later on.  Virtually all the "digital classicists" I know are either tenured or are not faculty (and aren't going to be—at least not in Classics).  It's all a bit grim really.  
A decade ago, if you were a grad student in Classics with an interest in DH, you were doomed unless you were willing to suppress that interest until you had tenure.  I don't know whether that's changed at all.  I hope it has.&lt;br /&gt;&lt;br /&gt;The good news, of course, is that digital skills are highly portable (and better-paid).  The one on-campus interview I had (for which I wasn't offered the job) would have paid several thousand (for a tenure-track job!) less than the (academic!) programming job I ended up taking.  And as fate would have it, I ended up doing &lt;a href="http://papyri.info"&gt;digital classics&lt;/a&gt; anyway, at least until the grant money runs out.&lt;br /&gt;&lt;br /&gt;So I wonder what the twitter traffic from &lt;a href="http://www.apaclassics.org/AnnualMeeting/10mtg/10meeting.html"&gt;APA10&lt;/a&gt; will be like next week.  Maybe DH will be the next big thing there too, but a scan of the program doesn't leave me optimistic.</content><link rel='replies' type='application/atom+xml' href='http://philomousos.blogspot.com/feeds/8654974467938308185/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8725958&amp;postID=8654974467938308185' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8725958/posts/default/8654974467938308185'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8725958/posts/default/8654974467938308185'/><link rel='alternate' type='text/html' href='http://philomousos.blogspot.com/2009/12/apa2010.html' title='#APA2010'/><author><name>Hugh Cayless</name><uri>http://www.blogger.com/profile/05582281819129069158</uri><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://4.bp.blogspot.com/-AzoP5gBkhsA/TlP6uXWN6pI/AAAAAAAAACo/UtHNVFJJgGk/s220/Photo%2B1.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8725958.post-5049491725200728188</id><published>2009-12-16T07:23:00.009-05:00</published><updated>2009-12-17T13:18:55.786-05:00</updated><title type='text'>Converting APIS</title><content type='html'>On Monday, I finished converting the APIS (Advanced Papyrological Information System) intake files to EpiDoc XML. I thought I'd write it up, since I tried some new things to do it. The APIS intake files employ a MARC-inspired text format that looks like:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;cu001 | 1 | duke.apis.31254916&lt;br /&gt;cu035 | 1 | (NcD)31254916&lt;br /&gt;cu965 | 1 | APIS&lt;br /&gt;&lt;br /&gt;status | 1 | 1&lt;br /&gt;cu300 | 1 | 1 item : papyrus, two joining fragments mounted in &lt;br /&gt;glass, incomplete ; 19 x 8 cm&lt;br /&gt;cuDateSchema | 1 | b&lt;br /&gt;cuDateType | 1 | o&lt;br /&gt;cuDateRange | 1 | b&lt;br /&gt;cuDateValue | 1 | 199&lt;br /&gt;cuDateRange | 2 | e&lt;br /&gt;cuDateSchema | 2 | b&lt;br /&gt;cuDateType | 2 | o&lt;br /&gt;cuDateValue | 2 | 100&lt;br /&gt;cuLCODE | 1 | egy&lt;br /&gt;cu090 | 1 | P.Duk.inv. 723 R&lt;br /&gt;cu500 | 1 | Actual dimensions of item are 18.5 x 7.7 cm&lt;br /&gt;cu500 | 2 | 12 lines&lt;br /&gt;cu500 | 3 | Written along the fibers on the recto; written &lt;br /&gt;across the fibers on the verso in a different hand and &lt;br /&gt;inverse to the text on the recto&lt;br /&gt;cu500 | 4 | P.Duk.inv. 723 R was formerly P.Duk.inv. MF79 69 R&lt;br /&gt;cu510_m | 5 | http://scriptorium.lib.duke.edu/papyrus/records/723r.html&lt;br /&gt;cu520 | 6 | Papyrus account of wheat from the Arsinoites (modern &lt;br /&gt;name: Fayyum), Egypt. 
Mentions the bank of Pakrouris(?)&lt;br /&gt;cu546 | 7 | In Demotic&lt;br /&gt;cu655 | 1 | Documentary papyri Egypt Fayyum 332-30 B.C&lt;br /&gt;cu655 | 2 | Accounts Egypt Fayyum 332-30 B.C&lt;br /&gt;cu655 | 3 | Papyri&lt;br /&gt;&lt;br /&gt;cu653 | 1 | Accounting -- Egypt -- Fayyum -- 332-30 B.C.&lt;br /&gt;cu653 | 2 | Banks and banking -- Egypt -- Fayyum -- 332-30 B.C.&lt;br /&gt;cu653 | 3 | Wheat -- Egypt -- Fayyum -- 332-30 B.C.&lt;br /&gt;cu245ab | 1 | Account of wheat [2nd cent. B.C.]&lt;br /&gt;cuPart_no | 1 | 1&lt;br /&gt;cuPart_caption | 1 | Recto&lt;br /&gt;cuPresentation_no | 1 | 1 | 1&lt;br /&gt;cuPresentation_display_res | 1 | 1 | thumbnail&lt;br /&gt;cuPresentation_url | 1 | 1 | http://scriptorium.lib.duke.edu/papyrus/images/thumbnails/723r-thumb.gif&lt;br /&gt;cuPresentation_format | 1 | 1 | image/gif&lt;br /&gt;cuPresentation_no | 1 | 2 | 2&lt;br /&gt;cuPresentation_display_res | 1 | 2 | 72dpi&lt;br /&gt;cuPresentation_url | 1 | 2 | http://scriptorium.lib.duke.edu/papyrus/images/72dpi/723r-at72.gif&lt;br /&gt;cuPresentation_format | 1 | 2 | image/gif&lt;br /&gt;cuPresentation_no | 1 | 3 | 3&lt;br /&gt;cuPresentation_display_res | 1 | 3 | 150dpi&lt;br /&gt;cuPresentation_url | 1 | 3 | http://scriptorium.lib.duke.edu/papyrus/images/150dpi/723r-at150.gif&lt;br /&gt;cuPresentation_format | 1 | 3 | image/gif&lt;br /&gt;perm_group | 1 | w&lt;br /&gt;&lt;br /&gt;cu090_orgcode | 1 | NcD&lt;br /&gt;cuOrgcode | 1 | NcD&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;Some of the element names come from MARC and carry its semantics, while others don't. Fields are delimited with pipe characters ('|') and have sometimes 3 columns, sometimes 4. The second column expresses order, e.g. cu500 (general note) 1, 2, 3, and 4. If there are 4 columns, the third is used to link related fields, e.g. an image with its label. The last column is the field data, which can wrap onto multiple lines. 
This has to be converted to EpiDoc like:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;lt;?xml version="1.0" encoding="UTF-8"?&amp;gt;&lt;br /&gt;&amp;lt;TEI xmlns="http://www.tei-c.org/ns/1.0"&amp;gt;&lt;br /&gt;   &amp;lt;teiHeader&amp;gt;&lt;br /&gt;      &amp;lt;fileDesc&amp;gt;&lt;br /&gt;         &amp;lt;titleStmt&amp;gt;&lt;br /&gt;            &amp;lt;title&amp;gt;Account of wheat [2nd cent. B.C.]&amp;lt;/title&amp;gt;&lt;br /&gt;         &amp;lt;/titleStmt&amp;gt;&lt;br /&gt;         &amp;lt;publicationStmt&amp;gt;&lt;br /&gt;            &amp;lt;authority&amp;gt;APIS&amp;lt;/authority&amp;gt;&lt;br /&gt;            &amp;lt;idno type="apisid"&amp;gt;duke.apis.31254916&amp;lt;/idno&amp;gt;&lt;br /&gt;            &amp;lt;idno type="controlno"&amp;gt;(NcD)31254916&amp;lt;/idno&amp;gt;&lt;br /&gt;         &amp;lt;/publicationStmt&amp;gt;&lt;br /&gt;         &amp;lt;sourceDesc&amp;gt;&lt;br /&gt;            &amp;lt;msDesc&amp;gt;&lt;br /&gt;               &amp;lt;msIdentifier&amp;gt;&lt;br /&gt;                  &amp;lt;idno type="invno"&amp;gt;P.Duk.inv. 723 R&amp;lt;/idno&amp;gt;&lt;br /&gt;               &amp;lt;/msIdentifier&amp;gt;&lt;br /&gt;               &amp;lt;msContents&amp;gt;&lt;br /&gt;                  &amp;lt;summary&amp;gt;Papyrus account of wheat from the Arsinoites (modern name: Fayyum), Egypt. 
&lt;br /&gt;                    Mentions the bank of Pakrouris(?)&amp;lt;/summary&amp;gt;&lt;br /&gt;                  &amp;lt;msItem&amp;gt;&lt;br /&gt;                     &amp;lt;note type="general"&amp;gt;Actual dimensions of item are 18.5 x 7.7 cm&amp;lt;/note&amp;gt;&lt;br /&gt;                     &amp;lt;note type="general"&amp;gt;12 lines&amp;lt;/note&amp;gt;&lt;br /&gt;                     &amp;lt;note type="general"&amp;gt;Written along the fibers on the recto; written across the fibers on &lt;br /&gt;                       the verso in a different hand and inverse to the text on the recto&amp;lt;/note&amp;gt;&lt;br /&gt;                     &amp;lt;note type="general"&amp;gt;P.Duk.inv. 723 R was formerly P.Duk.inv. MF79 69 R&amp;lt;/note&amp;gt;&lt;br /&gt;                     &amp;lt;textLang mainLang="egy"&amp;gt;In Demotic&amp;lt;/textLang&amp;gt;&lt;br /&gt;                  &amp;lt;/msItem&amp;gt;&lt;br /&gt;               &amp;lt;/msContents&amp;gt;&lt;br /&gt;               &amp;lt;physDesc&amp;gt;&lt;br /&gt;                  &amp;lt;p&amp;gt;1 item : papyrus, two joining fragments mounted in glass, incomplete ; 19 x 8 cm&amp;lt;/p&amp;gt;&lt;br /&gt;               &amp;lt;/physDesc&amp;gt;&lt;br /&gt;               &amp;lt;history&amp;gt;&lt;br /&gt;                  &amp;lt;origin&amp;gt;&lt;br /&gt;                     &amp;lt;origDate notBefore="-0199" notAfter="-0100"/&amp;gt;&lt;br /&gt;                  &amp;lt;/origin&amp;gt;&lt;br /&gt;               &amp;lt;/history&amp;gt;&lt;br /&gt;            &amp;lt;/msDesc&amp;gt;&lt;br /&gt;         &amp;lt;/sourceDesc&amp;gt;&lt;br /&gt;      &amp;lt;/fileDesc&amp;gt;&lt;br /&gt;      &amp;lt;profileDesc&amp;gt;&lt;br /&gt;         &amp;lt;langUsage&amp;gt;&lt;br /&gt;            &amp;lt;language ident="en"&amp;gt;English&amp;lt;/language&amp;gt;&lt;br /&gt;            &amp;lt;language ident="egy-Egyd"&amp;gt;In Demotic&amp;lt;/language&amp;gt;&lt;br /&gt;         
&amp;lt;/langUsage&amp;gt;&lt;br /&gt;         &amp;lt;textClass&amp;gt;&lt;br /&gt;            &amp;lt;keywords scheme="#apis"&amp;gt;&lt;br /&gt;               &amp;lt;term&amp;gt;Accounting -- Egypt -- Fayyum -- 332-30 B.C.&amp;lt;/term&amp;gt;&lt;br /&gt;               &amp;lt;term&amp;gt;Banks and banking -- Egypt -- Fayyum -- 332-30 B.C.&amp;lt;/term&amp;gt;&lt;br /&gt;               &amp;lt;term&amp;gt;Wheat -- Egypt -- Fayyum -- 332-30 B.C.&amp;lt;/term&amp;gt;&lt;br /&gt;               &amp;lt;term&amp;gt;&lt;br /&gt;                  &amp;lt;rs type="genre_form"&amp;gt;Documentary papyri Egypt Fayyum 332-30 B.C&amp;lt;/rs&amp;gt;&lt;br /&gt;               &amp;lt;/term&amp;gt;&lt;br /&gt;               &amp;lt;term&amp;gt;&lt;br /&gt;                  &amp;lt;rs type="genre_form"&amp;gt;Accounts Egypt Fayyum 332-30 B.C&amp;lt;/rs&amp;gt;&lt;br /&gt;               &amp;lt;/term&amp;gt;&lt;br /&gt;               &amp;lt;term&amp;gt;&lt;br /&gt;                  &amp;lt;rs type="genre_form"&amp;gt;Papyri&amp;lt;/rs&amp;gt;&lt;br /&gt;               &amp;lt;/term&amp;gt;&lt;br /&gt;            &amp;lt;/keywords&amp;gt;&lt;br /&gt;         &amp;lt;/textClass&amp;gt;&lt;br /&gt;      &amp;lt;/profileDesc&amp;gt;&lt;br /&gt;   &amp;lt;/teiHeader&amp;gt;&lt;br /&gt;   &amp;lt;text&amp;gt;&lt;br /&gt;      &amp;lt;body&amp;gt;&lt;br /&gt;         &amp;lt;div type="bibliography" subtype="citations"&amp;gt;&lt;br /&gt;            &amp;lt;p&amp;gt;&lt;br /&gt;               &amp;lt;ref target="http://scriptorium.lib.duke.edu/papyrus/records/723r.html"&amp;gt;Original record&amp;lt;/ref&amp;gt;.&amp;lt;/p&amp;gt;&lt;br /&gt;         &amp;lt;/div&amp;gt;&lt;br /&gt;         &amp;lt;div type="figure"&amp;gt;&lt;br /&gt;            &amp;lt;figure&amp;gt;&lt;br /&gt;               &amp;lt;head&amp;gt;Recto&amp;lt;/head&amp;gt;&lt;br /&gt;               &amp;lt;figDesc&amp;gt; thumbnail&amp;lt;/figDesc&amp;gt;&lt;br /&gt;               &amp;lt;graphic 
url="http://scriptorium.lib.duke.edu/papyrus/images/thumbnails/723r-thumb.gif"/&amp;gt;&lt;br /&gt;            &amp;lt;/figure&amp;gt;&lt;br /&gt;            &amp;lt;figure&amp;gt;&lt;br /&gt;               &amp;lt;head&amp;gt;Recto&amp;lt;/head&amp;gt;&lt;br /&gt;               &amp;lt;figDesc&amp;gt; 72dpi&amp;lt;/figDesc&amp;gt;&lt;br /&gt;               &amp;lt;graphic url="http://scriptorium.lib.duke.edu/papyrus/images/72dpi/723r-at72.gif"/&amp;gt;&lt;br /&gt;            &amp;lt;/figure&amp;gt;&lt;br /&gt;            &amp;lt;figure&amp;gt;&lt;br /&gt;               &amp;lt;head&amp;gt;Recto&amp;lt;/head&amp;gt;&lt;br /&gt;               &amp;lt;figDesc&amp;gt; 150dpi&amp;lt;/figDesc&amp;gt;&lt;br /&gt;               &amp;lt;graphic url="http://scriptorium.lib.duke.edu/papyrus/images/150dpi/723r-at150.gif"/&amp;gt;&lt;br /&gt;            &amp;lt;/figure&amp;gt;&lt;br /&gt;         &amp;lt;/div&amp;gt;&lt;br /&gt;      &amp;lt;/body&amp;gt;&lt;br /&gt;   &amp;lt;/text&amp;gt;&lt;br /&gt;&amp;lt;/TEI&amp;gt;&lt;br /&gt;&lt;/pre&gt; &lt;br /&gt;&lt;br /&gt;I started learning Clojure this summer.  Clojure is a Lisp dialect that runs on the Java Virtual Machine.  So I thought I'd have a go at writing an APIS converter in it.  The result is probably thoroughly un-idiomatic Clojure, but it converts the 30,000-plus APIS records to EpiDoc in about 2.5 minutes, so I'm fairly happy with it as a baby step.  The script works by reading the intake file line by line and issuing SAX events that are handled by a Saxon XSLT TransformerHandler, which in turn converts to EpiDoc.  
So in effect, the intake file is treated as though it were an XML file and transformed with a stylesheet.&lt;br /&gt;&lt;br /&gt;Most of the processing is done with three functions:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;generate-xml&lt;/span&gt; takes a File, instantiates a transforming SAX handler using a Templates object taken from a pool, starts issuing SAX events, and then hands off to the process-file function.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;(defn generate-xml&lt;br /&gt;  [file-var]&lt;br /&gt;    (let [xslt (.poll @templates)&lt;br /&gt;     handler (.newTransformerHandler (TransformerFactoryImpl.) xslt)]&lt;br /&gt;       (try&lt;br /&gt;         (doto handler&lt;br /&gt;           (.setResult (StreamResult. (File. (.replace &lt;br /&gt;             (.replace (str file-var) "intake_files" "xml") ".if" ".xml"))))&lt;br /&gt;           (.startDocument)&lt;br /&gt;           (.startElement "" "apis" "apis" (AttributesImpl.)))&lt;br /&gt;           (process-file (read-file file-var) "" handler)&lt;br /&gt;           (doto handler&lt;br /&gt;           (.endElement "" "apis" "apis")&lt;br /&gt;           (.endDocument))&lt;br /&gt;       (catch Exception e &lt;br /&gt;         (.println *err* (str (.getMessage e) " processing file " file-var))))&lt;br /&gt;     (.add @templates xslt)))&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;&lt;br /&gt;process-file&lt;/span&gt; recursively processes a sequence of lines from the file.  If lines is empty, we're at the end of the file, and we can end the last element and exit; otherwise, it splits the current line on pipe characters, calls handle-line, then calls itself on the remainder of the line sequence.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;(defn process-file&lt;br /&gt;  [lines, elt-name, handler]&lt;br /&gt;  (if (empty? 
lines)&lt;br /&gt;    (.endElement handler "" elt-name elt-name)&lt;br /&gt;    (if (not (.startsWith (first lines) "#"))  ; comments start with '#' and can be ignored&lt;br /&gt;      (let [line (.split (first lines) "\\s+\\|\\s+")&lt;br /&gt;            ename (if (.contains (first lines) "|") (aget line 0) elt-name)]&lt;br /&gt;          (handle-line line elt-name handler)&lt;br /&gt;          (process-file (rest lines) ename handler))&lt;br /&gt;      (process-file (rest lines) elt-name handler))))  ; skip the comment, keep processing&lt;br /&gt;&lt;/pre&gt;  &lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;handle-line&lt;/span&gt; does most of the XML-producing work. The field name is emitted as an element, column 2 (and column 3, if it's a 4-column field) as @n and @m attributes, and the last column as character content.  If the line is a continuation of the preceding line, then it will be emitted as character data.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;(defn handle-line&lt;br /&gt;    [line, elt-name, handler]&lt;br /&gt;  (if (&gt; (alength line) 2) ; lines with fewer than 3 columns are either continuations or empty fields&lt;br /&gt;    (do (let [atts (AttributesImpl.)]&lt;br /&gt;        (doto atts&lt;br /&gt;          (.addAttribute "" "n" "n" "CDATA" (.trim (aget line 1))))&lt;br /&gt;        (if (&gt; (alength line) 3)&lt;br /&gt;          (doto atts&lt;br /&gt;            (.addAttribute "" "m" "m" "CDATA" (.trim (aget line 2)))))&lt;br /&gt;        (if (false? 
(.equals elt-name ""))&lt;br /&gt;          (.endElement handler "" elt-name elt-name))&lt;br /&gt;        (.startElement handler "" (aget line 0) (aget line 0) atts))&lt;br /&gt;        (let [content (aget line (- (alength line) 1))]&lt;br /&gt;            (.characters handler (.toCharArray (.trim content)) 0 (.length (.trim content)))))&lt;br /&gt;    (do &lt;br /&gt;      (if (== (alength line) 1)&lt;br /&gt;        (.characters handler (.toCharArray (aget line 0)) 0 (.length (aget line 0)))))))&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;The &lt;span style="font-weight:bold;"&gt;-main&lt;/span&gt; function kicks everything off by calling &lt;span style="font-weight:bold;"&gt;init-templates&lt;/span&gt; to load up a ConcurrentLinkedQueue with new Templates objects capable of generating an XSLT handler, then starting a thread pool and mapping the &lt;span style="font-weight:bold;"&gt;generate-xml&lt;/span&gt; function over a sequence of files with the ".if" suffix.  -main takes 3 arguments: the directory to look for intake files in, the XSLT to use for transformation, and the number of worker threads to use. I've been kicking it off with 20 threads.  Speed depends on how much work my machine (3 GHz Intel Core 2 Duo MacBook Pro) is doing at the moment, but it is quite zippy.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;(defn init-templates&lt;br /&gt;    [xslt, nthreads]&lt;br /&gt;  (dosync (ref-set templates (ConcurrentLinkedQueue.) ))&lt;br /&gt;  (dotimes [n nthreads]&lt;br /&gt;    (let [xsl-src (StreamSource. (FileInputStream. xslt))&lt;br /&gt;            configuration (Configuration.)&lt;br /&gt;            compiler-info (CompilerInfo.)]&lt;br /&gt;          (doto xsl-src &lt;br /&gt;            (.setSystemId xslt))&lt;br /&gt;          (doto compiler-info&lt;br /&gt;            (.setErrorListener (StandardErrorListener.))&lt;br /&gt;            (.setURIResolver (StandardURIResolver. 
configuration)))&lt;br /&gt;          (dosync (.add @templates (.newTemplates (TransformerFactoryImpl.) xsl-src compiler-info))))))&lt;br /&gt;  &lt;br /&gt;(defn -main&lt;br /&gt;    [dir-name, xsl, nthreads]&lt;br /&gt;  (def xslt xsl)&lt;br /&gt;  (def dirs (file-seq (File. dir-name)))&lt;br /&gt;  (init-templates xslt nthreads)&lt;br /&gt;  (let [pool (Executors/newFixedThreadPool nthreads)&lt;br /&gt;    tasks (map (fn [x]&lt;br /&gt;        (fn []&lt;br /&gt;          (generate-xml x)))&lt;br /&gt;      (filter #(.endsWith (.getName %) ".if") dirs))]&lt;br /&gt;      (doseq [future (.invokeAll pool tasks)]&lt;br /&gt;            (.get future))&lt;br /&gt;      (.shutdown pool)))&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;I had some conceptual difficulties figuring out how best to associate Templates with the threads that execute them.  The easy thing to do would be to put the Template creation in the function that is mapped to the file sequence, but that bogs down fairly quickly, presumably because a new Template is created for each file and memory usage balloons. So that doesn't work. In Java, I'd either a) write a custom thread that spun up its own Template or b) create a pool of Templates.  After some messing around, I went with b) because I couldn't see how to do such an object-oriented thing in a functional way. b) was a bit hard too, because I couldn't see how to store Templates in a Clojure collection, access them, and use them without wrapping the whole process in a transaction, which seems like it would lock the collection far too much. So I used a thread-safe Java collection, ConcurrentLinkedQueue, which manages concurrent access to its members on its own. &lt;br /&gt;&lt;br /&gt;I've no doubt there are better ways to do this, and I expect I'll learn them in time, but for now, I'm quite pleased with my first effort. The next step will probably be to add some Schematron validation for the APIS files. 
My impression of Clojure is that it's really powerful, and a good way to write concurrent programs. To do it really well, I think you'd need a fairly deep knowledge of both Lisp-style functional programming and the underlying Java/JVM aspects, but that seems doable.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8725958-5049491725200728188?l=philomousos.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://philomousos.blogspot.com/feeds/5049491725200728188/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8725958&amp;postID=5049491725200728188' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8725958/posts/default/5049491725200728188'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8725958/posts/default/5049491725200728188'/><link rel='alternate' type='text/html' href='http://philomousos.blogspot.com/2009/12/converting-apis.html' title='Converting APIS'/><author><name>Hugh Cayless</name><uri>http://www.blogger.com/profile/05582281819129069158</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://4.bp.blogspot.com/-AzoP5gBkhsA/TlP6uXWN6pI/AAAAAAAAACo/UtHNVFJJgGk/s220/Photo%2B1.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8725958.post-5140087941750117358</id><published>2009-10-27T10:19:00.006-04:00</published><updated>2009-10-27T12:32:59.883-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='DH'/><title type='text'>Object Artefact Script</title><content type='html'>A couple of weeks ago, I attended a workshop at the Edinburgh eScience Institute on the relation of text in ancient (and other) documents to its context and on the problems of reading 
difficult texts on difficult objects and ways in which technology can aid the process of interpretation and dissemination without getting in the way of it.  The meeting was well summarized by Alejandro Giacometti in his &lt;a href="http://giacometti.tumblr.com/post/213059488/object-artefact-script"&gt;blog&lt;/a&gt;, and the presentations are posted on the &lt;a href="http://www.nesc.ac.uk/action/esi/contribution.cfm?Title=1014"&gt;eSI wiki&lt;/a&gt;.  &lt;br /&gt;&lt;br /&gt;Kathryn Piquette discussed what would be required to digitally represent Egyptian hieroglyphic texts without divorcing them from their contexts as an integral part of monumental architecture. For example, the interpretation of the meaning of texts should be able to take into account the times of day (and/or year) when they would have been able to be read, their relationship to their surroundings, and so on.  The established epigraphical practice of divorcing the transcribed text from its context, while often necessary, does some violence to its meaning, and this must be recognized and accounted for.  At the same time, digital 3D reconstructions are themselves an interpretation, and it is important to disclose the evidence on which that interpretation is based. &lt;br /&gt;&lt;br /&gt;Ségolène Tarte talked about the process of scholarly interpretation in reading the &lt;a href="http://vindolanda.csad.ox.ac.uk/"&gt;Vindolanda tablets&lt;/a&gt; and similar texts.  As part of analysing the scholarly reading process, the &lt;a href="http://esad.classics.ox.ac.uk/"&gt;eSAD project&lt;/a&gt; observed two experts reading a previously-published tablet.  During the course of their work, they came up with a new reading that completely changed their understanding of the text.  The previous reading hinged on the identification of a single word, which led to the (mistaken) recognition of the document as recording the sale of an ox.  
The new reading hinged on the recognition of a particular letterform as an 'a'.  The ways in which readings of difficult texts are produced—involving skipping around looking for recognizable pieces of text upon which (multiple) partial mental models of the texts are constructed, which must then be resolved somehow into a reading—means that an Interpretation Support System (such as the one eSAD proposes to develop) must be sensitive to the different ways of reading scholars use and must be careful not to impose "spurious exactitude" on them.&lt;br /&gt;&lt;br /&gt;Dot Porter gave an overview of a variety of projects that focus on representing text, transcription, and annotation alongside one another as a way into discussing the relationship between digital text and physical text.  She cautioned against attempts to digitally replicate the experience of the codex, since there is a great deal of (necessary) data interpolation that goes on in any detailed digital reconstruction, and this elides the physical reality of the text.  Digital representations may improve (or even make possible) the reading of difficult texts, such as the Vindolanda tablets or the &lt;a href="http://www.archimedespalimpsest.org/"&gt;Archimedes Palimpsest&lt;/a&gt;, so for purposes of interpretation, they may be superior to the physical reality.  They can combine data, metadata, and other contextual information in ways that help a reader to work with documents. But they cannot satisfactorily replicate the physicality of the document, and it may be a bit dishonest to try.&lt;br /&gt;&lt;br /&gt;I talked about the &lt;a href="http://github.com/hcayless/img2xml"&gt;img2xml&lt;/a&gt; project I'm working on with colleagues from UNC Chapel Hill.  I've got a post or two about that in the pipeline, so I won't say much here.  It involves the generation of SVG tracings of text in manuscript documents as a foundation for linking and annotation.  
Since the technique involves linking to an XML-based representation of the text, it may prove superior to methods that rely simply on pointing at pixel coordinates in images of text.&lt;br /&gt;&lt;br /&gt;Ryan Bauman talked about the use of digital images as scholarly evidence.  He gave a fascinating overview of sophisticated techniques for imaging very difficult documents (e.g. carbonized, rolled up scrolls from Herculaneum) and talked about the need for documentation of the techniques used in generating the images.  This is especially important because the images produced will not resemble the way the document looks in visible light.  Ryan also talked about the difficulties involved in linking views of the document that may have been produced at different times, when the document was in different states, or may have used different techniques.  The Archimedes Palimpsest project is a good example of what's involved in referencing all of the images so that they can be linked to the transcription.&lt;br /&gt;&lt;br /&gt;Finally, Leif Isaksen talked about how some of the techniques discussed in the earlier presentations might be used in crowdsourcing the gathering of data about inscriptions.  Inscriptions (both published and unpublished) are frequently encountered (both in museums and out in the open) by tourists who may be curious about their meaning, but lack the ability to interpret them.  They may well, however, have sophisticated tools available for image capture, geo-referencing, and internet access (via digital cameras, smartphones, etc.).  Can they be employed, in exchange for information about the texts they encounter, as data gatherers?  &lt;br /&gt;&lt;br /&gt;Some themes that emerged from the discussion included: &lt;br /&gt;&lt;ul&gt;&lt;br /&gt;&lt;li&gt;the importance of communicating the processes involved in generating digital representations of texts and their contexts (i.e. 
showing your work)&lt;/li&gt; &lt;br /&gt;&lt;li&gt;the need for standard ways of linking together image and textual data&lt;/li&gt;&lt;br /&gt;&lt;li&gt;the importance of disseminating data and code, not just results&lt;/li&gt;&lt;br /&gt;&lt;/ul&gt;&lt;br /&gt;This was a terrific workshop, and I hope to see followup on it.  eSAD is holding a workshop next month on "Understanding image-based evidence," which I'm sorry I can't attend, but I look forward to seeing the output.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8725958-5140087941750117358?l=philomousos.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://philomousos.blogspot.com/feeds/5140087941750117358/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8725958&amp;postID=5140087941750117358' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8725958/posts/default/5140087941750117358'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8725958/posts/default/5140087941750117358'/><link rel='alternate' type='text/html' href='http://philomousos.blogspot.com/2009/10/object-artefact-script.html' title='Object Artefact Script'/><author><name>Hugh Cayless</name><uri>http://www.blogger.com/profile/05582281819129069158</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://4.bp.blogspot.com/-AzoP5gBkhsA/TlP6uXWN6pI/AAAAAAAAACo/UtHNVFJJgGk/s220/Photo%2B1.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8725958.post-1382727886631772746</id><published>2009-10-16T15:31:00.002-04:00</published><updated>2009-10-16T16:01:39.954-04:00</updated><title type='text'>Stomping on Innovation Killers</title><content type='html'>&lt;a 
href="http://twitter.com/foundhistory"&gt;@foundhistory&lt;/a&gt; has a nice &lt;a href="http://www.foundhistory.org/2009/10/16/3-innovation-killers-in-digital-humanities/"&gt;post&lt;/a&gt; on objections one might hear on a grant review panel that would unjustly torpedo an innovative proposal.  I thought it might be a good idea to take a sideways look at these as advice to grant writers.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size:36pt"&gt;“&lt;/span&gt;&lt;br /&gt;&lt;ul&gt;&lt;br /&gt;&lt;li&gt;Haven’t X, Y, and Z already done this? We shouldn’t be supporting duplication of effort.&lt;/li&gt;&lt;br /&gt;&lt;li&gt;Are all of the stakeholders on board? (Hat tip to @patrickgmj for this gem.)&lt;/li&gt;&lt;br /&gt;&lt;li&gt;What about sustainability?&lt;/li&gt;&lt;br /&gt;&lt;/ul&gt;&lt;br /&gt;&lt;span style="font-size:36pt"&gt;”&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;So, some ideas for countering these when you're working on your proposal: &lt;br /&gt;&lt;br /&gt;&lt;ol&gt;&lt;br /&gt;&lt;li&gt;Have you looked at work that's been done in this area (this might entail some real digging)?  If there are projects and/or literature that deal with the same areas as your proposal, then you should take them into account.  You need to be able to show you've done your homework and that your project is different from what's come before.&lt;/li&gt;&lt;br /&gt;&lt;li&gt;Who is your audience?  Have you talked to them?  If you can get letters of support from one or more of them, that will help silence the stakeholders objection.&lt;/li&gt;&lt;br /&gt;&lt;li&gt;You ought to have some sort of story about sustainability and/or the future beyond the project, to show that you've thought about what comes next.  
Even if your project is an experiment, you should talk about how you're going to disseminate the results so that those who come after will be able to build on your work.&lt;/li&gt;&lt;br /&gt;&lt;/ol&gt;&lt;br /&gt;&lt;br /&gt;I agree with Tom that these criticisms can be deployed to stifle creative work.  In technology, sometimes wheels need to be reinvented, sometimes the conventional wisdom is flat wrong, and sometimes worrying overmuch about the future paralyses you.  But if you're writing a proposal, assume these objections will be thrown at it, and do some prior thinking so you can spike them before they kill your innovative idea.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8725958-1382727886631772746?l=philomousos.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://philomousos.blogspot.com/feeds/1382727886631772746/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8725958&amp;postID=1382727886631772746' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8725958/posts/default/1382727886631772746'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8725958/posts/default/1382727886631772746'/><link rel='alternate' type='text/html' href='http://philomousos.blogspot.com/2009/10/stomping-on-innovation-killers.html' title='Stomping on Innovation Killers'/><author><name>Hugh Cayless</name><uri>http://www.blogger.com/profile/05582281819129069158</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' 
src='http://4.bp.blogspot.com/-AzoP5gBkhsA/TlP6uXWN6pI/AAAAAAAAACo/UtHNVFJJgGk/s220/Photo%2B1.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8725958.post-6244740632285775431</id><published>2009-08-10T08:17:00.006-04:00</published><updated>2009-08-10T12:05:28.150-04:00</updated><title type='text'>Upgrade Notes</title><content type='html'>During my recent work on moving the &lt;a href="http://papyri.info/"&gt;Papyrological Navigator&lt;/a&gt; from Columbia to NYU, I ran into some issues that bear noting.  It's a bit hard to know whether these are generalizable, but they seem to me to be good examples of the kinds of things that can happen when you're upgrading a complex system, and I don't want to forget about them.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;&lt;a href="http://idp.atlantides.org/trac/idp/ticket/188"&gt;Issue #1&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;Search results in the PN are supposed to return with KWIC snippets, highlighting the search terms.  As part of the move, I upgraded Lucene to the latest release (2.4.1).  The Lucene in the PN was 2.3.x, but the developer at Columbia had worked hard to eke as much indexing speed out of it as possible, and had imported code from the 2.4 branch, with some modifications.  Since this code was really close to 2.4, I'd had reason to hope the upgrade would be smooth, and it mostly was.  Highlighting wasn't working for Greek, however, even though the search itself was...&lt;br /&gt;&lt;br /&gt;Debugging this was really hard, because as it turned out, there was no failure in any of the running code.  It just wasn't running the right code.  A couple of the slightly modified Lucene classes in the PN codebase were being stepped on by the new Lucene because, instead of a jar named "ddbdp.jar", the new PN jars were named after the project in which they resided (so, "pn-ddbdp-indexers.jar").  
And they were getting loaded &lt;span style="font-style:italic;"&gt;after&lt;/span&gt; Lucene instead of before.  Not the first time I'd seen this kind of problem, but always a bit baffling.  In the end I moved the PN Lucene classes out of the way by changing their names and how they were called.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;&lt;a href="http://idp.atlantides.org/trac/idp/ticket/191"&gt;Issue #2&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;This one was utterly baffling as well.  Lemmatized search (that is, searching for dictionary headwords and getting hits on all the forms of the word—very useful for inflected languages, like Greek) was working at Columbia, and not at NYU.  Bizarre.  I hadn't done &lt;span style="font-style:italic;"&gt;anything&lt;/span&gt; to the code.  Of course, it was my fault.  It almost always is the programmer's fault.  A few months before, in response to a bug report (and before I started working for NYU), I had updated the &lt;a href="http://epidoc.svn.sourceforge.net/viewvc/epidoc/trunk/transcoder/"&gt;transcoder&lt;/a&gt; software (which converts between various encodings for Ancient Greek) to conform to the recommended practice for choosing which precomposed (letter + accent) character to use when the same one (e.g. alpha + acute accent) occurs in both the Greek (Modern) and Greek Extended (Ancient) blocks in Unicode.  &lt;a href="http://socrates.berkeley.edu/~pinax/greekkeys/technicalDetails.html"&gt;Best practice&lt;/a&gt; is to choose the character from the Greek block, so \u03AC instead of \u1F71 for ά.  Transcoder &lt;span style="font-style:italic;"&gt;used&lt;/span&gt; to use the Greek Extended character, but since late 2008 it has followed the new recommendation and used characters from the Greek block, where available.  Unfortunately this change happened after transcoder had been used to build the lemma database that the PN uses to expand lemmatized queries.  
So it had the wrong characters in it, and a search for any lemma containing an acute accent would fail.  Again, all the code was executing perfectly; some of the data was bad.  It didn't help that when I pasted lemmas into Oxygen, it normalized the encoding, or I might have realized sooner that there were differences.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;&lt;a href="http://idp.atlantides.org/trac/idp/ticket/230"&gt;Issue #3&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Last, but not least, was a bug which manifested as a failure in certain types of search.  "A followed by B within n places" searches worked, but "A and B (not in order) within n places" and "A but not B within n places" both failed.  Again, no apparent errors in the PN code.  The NullPointerException that was being thrown came from within the Lucene code!  After a lot of messing about, I was able to determine that the failure was due to a change in Lucene that the PN code hadn't been written against.  Once I'd found that, all it took to fix it was to override a method from the Lucene code.  This was actually a Lucene bug (https://issues.apache.org/jira/browse/LUCENE-1748), which I reported.  In trying to maintain backward compatibility, they had kept compile-time compatibility with pre-2.4 code, but broken it at run time.  I have to say, I was &lt;span style="font-weight:bold;"&gt;really&lt;/span&gt; impressed with how fast the Lucene team, particularly Mark Miller, responded.  The bug is already fixed.   &lt;br /&gt;&lt;br /&gt;So, lessons learned:&lt;br /&gt;&lt;br /&gt;&lt;ol&gt;&lt;br /&gt;&lt;li&gt;Tests are good.  I didn't have any available for the project that contained all of the bugs listed here.  They exist (though coverage is spotty), but there are dependencies that are tricky to resolve, and I had decided to defer getting the tests to work in favor of getting the PN online.  
Not having tests ate into the time I'd saved by deferring them.&lt;/li&gt;&lt;br /&gt;&lt;li&gt;In both cases #1 and #3, I had to find the problem by reading the code and stepping through it in my head.  Practice this basic skill.&lt;/li&gt;&lt;br /&gt;&lt;li&gt;Look for ways your architecture may have changed during the upgrade.  Anything may be significant, including filenames.&lt;/li&gt;&lt;br /&gt;&lt;li&gt;Greek character encoding is the Devil (but I already knew that).&lt;/li&gt;&lt;br /&gt;&lt;li&gt;It's probably your fault, but it &lt;span style="font-style:italic;"&gt;might&lt;/span&gt; not be.  Look closely at API changes in libraries you upgrade.  Go look at the source if anything looks fishy.  I didn't expect to find anything wrong with something as robust as Lucene, but I did.&lt;/li&gt;&lt;br /&gt;&lt;/ol&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8725958-6244740632285775431?l=philomousos.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://philomousos.blogspot.com/feeds/6244740632285775431/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8725958&amp;postID=6244740632285775431' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8725958/posts/default/6244740632285775431'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8725958/posts/default/6244740632285775431'/><link rel='alternate' type='text/html' href='http://philomousos.blogspot.com/2009/08/upgrade-notes.html' title='Upgrade Notes'/><author><name>Hugh Cayless</name><uri>http://www.blogger.com/profile/05582281819129069158</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' 
src='http://4.bp.blogspot.com/-AzoP5gBkhsA/TlP6uXWN6pI/AAAAAAAAACo/UtHNVFJJgGk/s220/Photo%2B1.jpg'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8725958.post-5387335852914977283</id><published>2009-01-23T15:38:00.002-05:00</published><updated>2009-01-23T15:48:01.911-05:00</updated><title type='text'>Endings and Beginnings</title><content type='html'>It's been that sort of a week.  Great beginning with the inauguration on Tuesday and the start of a new Obama presidency.  My wife was in tears.  Growing up in a small southern town, she never imagined she'd see a black president, and now our youngest daughter will never know a world in which there hasn't been one.  Sometimes things do change for the better.&lt;br /&gt;&lt;br /&gt;On a personal note, I gave my notice to UNC on Tuesday.  My position was partially funded with soft money, and one-time money is one of the primary ways they're trying to address the budget crisis, in order not to lay off permanent employees (as is right and proper).  I'm rather sad about leaving, but I will be starting a job with the NYU digital library team in February, working on &lt;a href="http://papyri.info"&gt;digital papyrology&lt;/a&gt;.  This has the look of a job where I can unite both the Classics geek and the tech geek sides of my personality.  
I may become unbearable.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8725958-5387335852914977283?l=philomousos.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://philomousos.blogspot.com/feeds/5387335852914977283/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8725958&amp;postID=5387335852914977283' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8725958/posts/default/5387335852914977283'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8725958/posts/default/5387335852914977283'/><link rel='alternate' type='text/html' href='http://philomousos.blogspot.com/2009/01/endings-and-beginnings.html' title='Endings and Beginnings'/><author><name>Hugh Cayless</name><uri>http://www.blogger.com/profile/05582281819129069158</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://4.bp.blogspot.com/-AzoP5gBkhsA/TlP6uXWN6pI/AAAAAAAAACo/UtHNVFJJgGk/s220/Photo%2B1.jpg'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8725958.post-6528718985469993581</id><published>2008-12-31T14:55:00.004-05:00</published><updated>2009-01-05T14:38:32.962-05:00</updated><title type='text'>OpenLayers and Djatoka</title><content type='html'>For the last few weeks, I've been playing around with the new JPEG2000 image server released by the Los Alamos National Labs (&lt;a href="http://african.lanl.gov/aDORe/projects/djatoka/"&gt;http://african.lanl.gov/aDORe/projects/djatoka/&lt;/a&gt;).    I never could get the image viewer released along with it to work, and I immediately thought of OpenLayers (&lt;a href="http://openlayers.org"&gt;http://openlayers.org/&lt;/a&gt;), a javascript API for embedding maps.  
OpenLayers is like Google Maps in many ways, but Free.  Besides maps, it works very well with any image: many of the tools it provides were developed for mapping, but are just as useful for displaying and working with any large image.  I wanted to use OpenLayers' support for tiled images in conjunction with Djatoka's ability to render arbitrary sections of an image at a number of zoom levels (the number of levels available depends on how the image was compressed).&lt;br /&gt;&lt;br /&gt;After a lot of messing around and some false starts, I've developed a JavaScript class that supports Djatoka's OpenURL API.  I've been testing it on JPEG2000 images created with ContentDM in the UNC Library's digital collections, with a good deal of success.  The results are not yet available online, because I don't have a public-facing server I can host them on, but the source code is up on GitHub &lt;a href="http://github.com/hcayless/djatoka-openlayers-image-viewer/tree/master"&gt;here&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Instructions&lt;/span&gt;:&lt;br /&gt;&lt;br /&gt;Install Djatoka.  Incidentally, in order to get this in the queue for installation on our systems, I had to make Djatoka work on Tomcat 6.  The binary doesn't work out of the box, but when I rebuilt it on my system (RHEL 5), it worked fine.&lt;br /&gt;&lt;br /&gt;Copy the adore-djatoka WAR into your Tomcat webapps directory.  Follow the instructions on the Djatoka site to start the webapp.&lt;br /&gt;&lt;br /&gt;Grab a copy of OpenLayers.  Put the OpenURL.js file in lib/OpenLayers/Layer/ and run the build.py script.&lt;br /&gt;&lt;br /&gt;To just run the demo, copy djatoka.html, the OpenLayers.js you just built, and the .css files from OpenLayers/theme/ and from the examples/ directory, as well as the OpenLayers control images from OpenLayers/img, into the adore-djatoka directory in webapps.  
You should then be able to access the djatoka.html file and see the demo.&lt;br /&gt;&lt;br /&gt;This all comes with no guarantees, of course.  It seems to work quite well with the JPEG2000 images I've tested, and the tiling means that each request of Djatoka consumes an equal amount of resources.  I've run into OutOfMemoryErrors when requesting full-size images, but this method loads them without any problem.&lt;br /&gt;&lt;br /&gt;Update (2009-01-05 14:37): I've posted a fix to the OpenURL.js script for a bug pointed out to me by John Fereira on the djatoka-devel list.  If you grabbed a copy before now, you should update.&lt;br /&gt;&lt;br /&gt;Update: screenshots --&lt;br /&gt;&lt;br /&gt;&lt;img src="http://farm2.static.flickr.com/1121/3165198310_d1f21b120c_o_d.jpg" /&gt;&lt;br /&gt;&lt;hr /&gt;&lt;br /&gt;&lt;img src="http://farm2.static.flickr.com/1323/3164366943_c09a4952a4_o_d.jpg" /&gt;&lt;br /&gt;&lt;hr /&gt;&lt;br /&gt;&lt;img src="http://farm4.static.flickr.com/3086/3165198612_eb04d43a58_o_d.jpg" /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8725958-6528718985469993581?l=philomousos.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://philomousos.blogspot.com/feeds/6528718985469993581/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8725958&amp;postID=6528718985469993581' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8725958/posts/default/6528718985469993581'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8725958/posts/default/6528718985469993581'/><link rel='alternate' type='text/html' href='http://philomousos.blogspot.com/2008/12/openlayers-and-djatoka.html' title='OpenLayers and Djatoka'/><author><name>Hugh 
Cayless</name><uri>http://www.blogger.com/profile/05582281819129069158</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://4.bp.blogspot.com/-AzoP5gBkhsA/TlP6uXWN6pI/AAAAAAAAACo/UtHNVFJJgGk/s220/Photo%2B1.jpg'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8725958.post-7547822467622848003</id><published>2008-10-29T09:12:00.001-04:00</published><updated>2008-10-29T09:14:44.120-04:00</updated><title type='text'>Thoughts on crosswalking</title><content type='html'>For the second Integrating Digital Papyrology project, we need to develop a method for crosswalking between EpiDoc (which is a dialect of TEI) and various database formats.  We've thought about this quite a bit in the past and we think that we don't just want to write a one-off conversion because (a) there will be more than one such conversion and (b) we want to be able to document the mappings between data sources in a stable format that isn't just code (script, XSLT, etc.)&lt;br /&gt;&lt;br /&gt;Some of the requirements for this notional tool are:&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;br /&gt;&lt;li&gt;should document mappings between data formats in a declarative fashion&lt;/li&gt;&lt;br /&gt;&lt;li&gt;certain fields will require complex transformations.  For example, the document text will likely be encoded in some variant of Leiden in the database, and will need to be converted to EpiDoc XML.  
This is currently accomplished by a fairly complex Python script, so it should be possible to define categories of transformation which would signal a call to an external process.&lt;/li&gt;&lt;br /&gt;&lt;li&gt;Some mappings will involve the combination of database fields into a single EpiDoc element, and others, the division of a single field into multiple EpiDoc elements.&lt;/li&gt;&lt;br /&gt;&lt;li&gt;Context-specific information (not included in the database) will need to be inserted into the EpiDoc document, so some sort of templating mechanism should be supported.&lt;/li&gt;&lt;br /&gt;&lt;li&gt;The mapping should be bidirectional.  We aren't just talking about exporting from a database to EpiDoc, but also about importing from EpiDoc, which is envisioned as an interchange format as well as a publication format.  This is why a single mapping document, rather than a set of instructions on how to get from one to the other, would be nice.&lt;/li&gt;&lt;br /&gt;&lt;/ul&gt;&lt;br /&gt;So far, my questions to various lists have turned up favorable responses (i.e. 
"yes, that would be a good thing") but no existing standards....&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8725958-7547822467622848003?l=philomousos.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://philomousos.blogspot.com/feeds/7547822467622848003/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8725958&amp;postID=7547822467622848003' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8725958/posts/default/7547822467622848003'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8725958/posts/default/7547822467622848003'/><link rel='alternate' type='text/html' href='http://philomousos.blogspot.com/2008/10/thoughts-on-crosswalking.html' title='Thoughts on crosswalking'/><author><name>Hugh Cayless</name><uri>http://www.blogger.com/profile/05582281819129069158</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://4.bp.blogspot.com/-AzoP5gBkhsA/TlP6uXWN6pI/AAAAAAAAACo/UtHNVFJJgGk/s220/Photo%2B1.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8725958.post-8888572769585816357</id><published>2008-10-20T12:34:00.002-04:00</published><updated>2008-10-20T22:50:43.591-04:00</updated><title type='text'>On Bamboo the 2nd</title><content type='html'>I spent Thursday - Saturday last week at the second Bamboo workshop in San Francisco.  So some reactions:&lt;br /&gt;&lt;br /&gt;1) The organizers are well-intentioned and are sincerely trying to wrestle with the problem of cyberinfrastructure for Digital Humanities.&lt;br /&gt;&lt;br /&gt;2) That said, it isn't clear that the Bamboo approach is workable.  
The team is very IT-focused, and while they seem to have a solid grasp of large-scale software architecture, the ways in which that might be applied to the Humanities with any success aren't obvious.  There was a lot of misdirected effort between B1 and B2 by some very smart people, who, I must say, had the good grace to admit it was a nonstarter.  Their attempt to factor the practices of scholars into implementable activities resulted in something that lacked enough context and specificity to be useful.  A refocusing on context and on the processes that contain and help define the activities happened at the workshop and seems likely to go forward.&lt;br /&gt;&lt;br /&gt;3) The workshops themselves seem to have been quite useful.  I wasn't at any of the round one workshops, and I doubt I'll be at any of the others (I represented the UNC Library because the usual candidates weren't available), but everyone I talked to was very engaged (if often skeptical).  The connections and discussion that seem to have emerged so far probably make the investment worthwhile, even if "Bamboo" as conceived doesn't work.&lt;br /&gt;&lt;br /&gt;4) The best idea I heard came (not surprisingly) from Martin Mueller, who suggested Bamboo become a way to focus Mellon funding on projects that conform to certain criteria (such as reusable components and standards) for a defined period (say five years).  The actual outcome of the current Bamboo would be the criteria for the RFP.  Simple, encourages institutions to think along the right lines, might actually do some good, and might allow participation by smaller groups as well.  &lt;br /&gt;&lt;br /&gt;5) There was a lot of talk about the people who are both researchers and technologists (guilty).  These were variously defined as "hybrids," "translators," and, most offensively, "the white stuff inside the Oreo."  None of this was meant to be offensive, but in the end, it is.  
People who can operate comfortably in the worlds of both scholarship and IT can certainly be useful go-betweens for those who can't, but that is not our sole raison d'être.  Until recently there haven't been many jobs for us, but that seems to be changing, and I hope it continues to.  See Lisa Spiro's excellent recent post on &lt;a href="http://digitalscholarship.wordpress.com/2008/10/18/digital-humanities-jobs/"&gt;Digital Humanities Jobs&lt;/a&gt; and Sean Gillies, who, without having been there, manages to capture some of the reservations I feel about the current enterprise and pick up on the &lt;a href="http://sgillies.net/blog/819/second-guessing-project-bamboo/"&gt;educational aspect&lt;/a&gt;.  One possible useful future for Bamboo would be simply to foster the development of more "hybrids."&lt;br /&gt;&lt;br /&gt;6) The Bamboo folks have set themselves a truly difficult task.  They are making a real effort to tackle it in an open way, and should be commended for it.  But it is a very hard problem, and one for which there is still not a clear definition.  The software engineer part of my hybrid brain wants problems defined before it will even consider solutions.  
The classicist part believes some things are just hard, and you can't expect technology to make them easy for you.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8725958-8888572769585816357?l=philomousos.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://philomousos.blogspot.com/feeds/8888572769585816357/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8725958&amp;postID=8888572769585816357' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8725958/posts/default/8888572769585816357'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8725958/posts/default/8888572769585816357'/><link rel='alternate' type='text/html' href='http://philomousos.blogspot.com/2008/10/on-bamboo-2nd.html' title='On Bamboo the 2nd'/><author><name>Hugh Cayless</name><uri>http://www.blogger.com/profile/05582281819129069158</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://4.bp.blogspot.com/-AzoP5gBkhsA/TlP6uXWN6pI/AAAAAAAAACo/UtHNVFJJgGk/s220/Photo%2B1.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8725958.post-4604817491854047332</id><published>2008-09-28T20:50:00.004-04:00</published><updated>2008-09-29T07:59:02.918-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='zotero'/><title type='text'>Go Zotero!</title><content type='html'>The Thomson Reuters lawsuit against the developers of &lt;a href="http://www.zotero.org"&gt;Zotero&lt;/a&gt; is getting a &lt;a href="http://yro.slashdot.org/article.pl?sid=08/09/27/2113248"&gt;lot&lt;/a&gt; &lt;a href="http://laboratorium.net/archive/2008/09/28/thomson_reuters_the_gang_that_couldnt_sue_straight"&gt;of&lt;/a&gt; &lt;a 
href="http://dltj.org/article/zotero-lawsuit-extracts/"&gt;notice&lt;/a&gt;, which is good.&lt;br /&gt;&lt;br /&gt;I've noticed that in the library world, when people mention getting sued, it's with fear and the implication that this represents the end of the world.  It's an interesting contrast coming from working for a startup (albeit a pretty well-funded one) where lawsuits == a) publicity, and are not to be feared (perhaps even to be provoked) and/or b) are a signal that you've scared your competitors enough to make them go running to Daddy, thus unequivocally validating your business model.&lt;br /&gt;&lt;br /&gt;This is an act of sheer desperation on the part of Thomson Reuters.  They're hoping GMU will crumble and shut the project down.  I do hope Dan has contacted the &lt;a href="http://www.eff.org/"&gt;EFF&lt;/a&gt; (&lt;a href="http://www.eff.org/support"&gt;donate!&lt;/a&gt;) and that the GMU administration will take this for what it is: fantastic publicity for one of their most important &lt;a href="http://chnm.gmu.edu/index.php"&gt;departments&lt;/a&gt; and an indicator that they are doing something truly great.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8725958-4604817491854047332?l=philomousos.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://philomousos.blogspot.com/feeds/4604817491854047332/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8725958&amp;postID=4604817491854047332' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8725958/posts/default/4604817491854047332'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8725958/posts/default/4604817491854047332'/><link rel='alternate' type='text/html' href='http://philomousos.blogspot.com/2008/09/go-zotero.html' 
title='Go Zotero!'/><author><name>Hugh Cayless</name><uri>http://www.blogger.com/profile/05582281819129069158</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://4.bp.blogspot.com/-AzoP5gBkhsA/TlP6uXWN6pI/AAAAAAAAACo/UtHNVFJJgGk/s220/Photo%2B1.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8725958.post-1817159289213922272</id><published>2008-08-15T22:36:00.002-04:00</published><updated>2008-08-15T22:54:21.325-04:00</updated><title type='text'>Back from Balisage</title><content type='html'>I never made it to Extreme, Balisage's predecessor, despite wanting to very badly, so I'm very glad I did go to its new incarnation.  I'm still processing the week's very rich diet of information, but it was very, very cool.&lt;br /&gt;&lt;br /&gt;Simon St. Laurent, who wrote one of the first XML books I bought back in 1999, &lt;a href="http://www.amazon.com/Inside-XML-DTDs-Scientific-Technical/dp/007134621X/ref=sr_1_4?ie=UTF8&amp;s=books&amp;qid=1218854797&amp;sr=8-4"&gt;Inside XML DTDs&lt;/a&gt; has a photo of one of the slides from my presentation in &lt;a href="http://news.oreilly.com/2008/08/mighty-markup-megadose.html"&gt;his Balisage roundup&lt;/a&gt; post.  
This is the kind of κλέος I can appreciate!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8725958-1817159289213922272?l=philomousos.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://philomousos.blogspot.com/feeds/1817159289213922272/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8725958&amp;postID=1817159289213922272' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8725958/posts/default/1817159289213922272'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8725958/posts/default/1817159289213922272'/><link rel='alternate' type='text/html' href='http://philomousos.blogspot.com/2008/08/back-from-balisage.html' title='Back from Balisage'/><author><name>Hugh Cayless</name><uri>http://www.blogger.com/profile/05582281819129069158</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://4.bp.blogspot.com/-AzoP5gBkhsA/TlP6uXWN6pI/AAAAAAAAACo/UtHNVFJJgGk/s220/Photo%2B1.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8725958.post-2244137176674159850</id><published>2008-08-14T14:15:00.003-04:00</published><updated>2008-08-14T14:23:29.664-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='balisageConference08'/><title type='text'>Balisage Presentation online</title><content type='html'>I just rsynced up my presentation on linking manuscript images to transcriptions using SVG for &lt;a href="http://www.balisage.net"&gt;Balisage&lt;/a&gt;, that I gave this morning.  It's at &lt;a href="http://www.unc.edu/~hcayless/img2xml/presentation.html"&gt;http://www.unc.edu/~hcayless/img2xml/presentation.html&lt;/a&gt;.  
The image viewer embedded into the presentation is at &lt;a href="http://www.unc.edu/~hcayless/img2xml/viewer.html"&gt;http://www.unc.edu/~hcayless/img2xml/viewer.html&lt;/a&gt;.  Text paths are still busted at the highest resolution, as you'll see if you zoom all the way in, but apart from that it seems to work.&lt;br /&gt;&lt;br /&gt;Balisage has been a really great conference so far.  I highly recommend it.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8725958-2244137176674159850?l=philomousos.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://philomousos.blogspot.com/feeds/2244137176674159850/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8725958&amp;postID=2244137176674159850' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8725958/posts/default/2244137176674159850'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8725958/posts/default/2244137176674159850'/><link rel='alternate' type='text/html' href='http://philomousos.blogspot.com/2008/08/balisage-presentation-online.html' title='Balisage Presentation online'/><author><name>Hugh Cayless</name><uri>http://www.blogger.com/profile/05582281819129069158</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://4.bp.blogspot.com/-AzoP5gBkhsA/TlP6uXWN6pI/AAAAAAAAACo/UtHNVFJJgGk/s220/Photo%2B1.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8725958.post-9046650798420819239</id><published>2008-05-31T12:12:00.007-04:00</published><updated>2008-06-21T20:11:22.930-04:00</updated><title type='text'>New TransCoder release</title><content type='html'>This is something I've been meaning to wrap up and write up for a while now: 
thanks to the Duke &lt;span style="font-style: italic;"&gt;Integrating Digital Papyrology&lt;/span&gt; grant from the &lt;a href="http://www.mellon.org"&gt;Andrew W. Mellon Foundation&lt;/a&gt;, I've been able to make a bunch of updates to the Transcoder, a piece of software I originally wrote for the &lt;a href="http://epidoc.sf.net"&gt;EpiDoc&lt;/a&gt; project.  Transcoder is a Java program that handles switching the encodings of Greek text, for example from &lt;a href="http://en.wikipedia.org/wiki/Beta_code"&gt;Beta Code&lt;/a&gt; to &lt;a href="http://unicode.org"&gt;Unicode&lt;/a&gt; (or back again).  It's used in initiatives like &lt;a href="http://www.perseus.tufts.edu"&gt;Perseus&lt;/a&gt; and &lt;a href="http://www.stoa.org/projects/demos/home"&gt;Demos&lt;/a&gt;.  I've been modifying it to work with &lt;a href="http://scriptorium.lib.duke.edu/papyrus/texts/DDBDP.html"&gt;Duke Databank of Documentary Papyri&lt;/a&gt; XML files (which are &lt;a href="http://www.tei-c.org"&gt;TEI&lt;/a&gt; based).  Besides a variety of bug fixes, Transcoder now also includes a fully functional SAX ContentHandler that lets XML files containing Greek text be transcoded.&lt;br /&gt;&lt;br /&gt;There are a lot of complex edge cases in this sort of work.  For example, Beta Code (or at least the DDbDP's Beta) doesn't distinguish between medial (σ) and final (ς) sigmas.  That's an easy conversion in the abstract (just look for 's' at the end of a word, and it's final), but when your text is embedded in XML, and there may be an expansion (&amp;lt;expan&amp;gt;) tag in the middle of a word, for example, it becomes a lot harder.  You can't just convert the contents of a particular element--you have to be able to look ahead.  The problem with SAX, of course, is that it's stream-based, so no lookahead is possible unless you do some buffering.  
In the end what I did was buffer SAX events when an element (say a paragraph) marked as Greek begins, and keep track of all the text therein.  That let me do the lookahead I needed to do, since I have a buffer containing the whole textual content of the &amp;lt;p&amp;gt; tag.  When the end of the element comes, I then flush the buffer, and all the queued-up SAX events fire, with the transcoded text in them.&lt;br /&gt;&lt;br /&gt;That's a lot of work for one letter, but I'm happy to say that it functions well now, and is being used to process the whole DDbDP.  Another edge case that I chose not to solve in the Transcoder program is the problem of apparatus and their contents in TEI.  An &amp;lt;app&amp;gt; element can contain a &amp;lt;lem&amp;gt; (lemma) and one or more &amp;lt;rdg&amp;gt; (readings).  The problem with it is that the lemma and readings are conceptually parallel in the text.  For example:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:Monaco, Courier, mono"&gt;The quick brown &amp;lt;lem&amp;gt;fox&amp;lt;/lem&amp;gt; jumped over the lazy dog.&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;lt;rdg&amp;gt;cat&amp;lt;/rdg&amp;gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The TEI would be:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:Monaco, Courier, mono"&gt;The quick brown &amp;lt;app&amp;gt;&amp;lt;lem&amp;gt;fox&amp;lt;/lem&amp;gt;&amp;lt;rdg&amp;gt;cat&amp;lt;/rdg&amp;gt;&amp;lt;/app&amp;gt; jumped over the lazy dog&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;So "cat" follows immediately after "fox" in the text stream, but both words occupy the same space as far as the markup is concerned.  In other words, I couldn't rely only on my fancy new lookahead scheme, because it broke down in edge cases like this.  
The solution I went with is dumb, but effective: format the apparatus so that there is a newline after the lemma (and the reading, if there are multiple readings).  That way my code will still be able to figure out what's going on.  The whitespace so introduced really needs to be flagged as significant, so that it doesn't get clobbered by other XML processes though.  That has already happened to us once.  It caused a bug for me too, because I wasn't buffering ignorable whitespace.&lt;br /&gt;&lt;br /&gt;All that trouble over one little letter.  Lunate sigmas would have made life so much easier...&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8725958-9046650798420819239?l=philomousos.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://philomousos.blogspot.com/feeds/9046650798420819239/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8725958&amp;postID=9046650798420819239' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8725958/posts/default/9046650798420819239'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8725958/posts/default/9046650798420819239'/><link rel='alternate' type='text/html' href='http://philomousos.blogspot.com/2008/05/new-transcoder-release.html' title='New TransCoder release'/><author><name>Hugh Cayless</name><uri>http://www.blogger.com/profile/05582281819129069158</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' 
src='http://4.bp.blogspot.com/-AzoP5gBkhsA/TlP6uXWN6pI/AAAAAAAAACo/UtHNVFJJgGk/s220/Photo%2B1.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8725958.post-774703724042749192</id><published>2008-03-16T13:56:00.004-04:00</published><updated>2008-03-21T21:04:44.461-04:00</updated><title type='text'>D·M·S· Allen Ross Scaife 1960-2008</title><content type='html'>On Saturday afternoon, March 15th, I learned that my friend &lt;a href="http://www.uky.edu/%7Escaife/"&gt;Ross&lt;/a&gt; had died that morning after a long and hard-fought struggle with cancer. He was at his home in Lexington, Kentucky, surrounded by his family.&lt;br /&gt;&lt;br /&gt;Ross was one of the giants of the Digital Classics community. He was the guiding force behind the &lt;a href="http://www.stoa.org/"&gt;Stoa&lt;/a&gt;, and the founder of many of its projects. Ross was always generous with his time and resources and has been responsible for incubating many fledgling Digital Humanities initiatives.  His loss leaves a gap that will be impossible to fill.&lt;br /&gt;&lt;br /&gt;Ross was also a good friend, easy to talk to, and always ready to encourage me to experiment with new ideas.  I miss him very much.&lt;br /&gt;&lt;br /&gt;What he began will continue without him, and though we cannot ever replace Ross, we can honour his memory by carrying on his good work.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;update (March 21, 21:04)&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;a href="http://www.stoa.org/?p=786"&gt;Dot&lt;/a&gt; posted a lovely obituary of Ross at the Stoa.  
&lt;a href="http://horothesia.blogspot.com/2008/03/connections-ross-scaife.html"&gt;Tom&lt;/a&gt; and &lt;a href="http://blogsearch.google.com/blogsearch?hl=en&amp;amp;q=Ross+Scaife&amp;amp;btnG=Search+Blogs"&gt;several others&lt;/a&gt; have posted nice memorials as well.&lt;br /&gt;&lt;br /&gt;On a happier note: my daughter, Caroline Emma Ross Cayless was born at 11:52 pm, March 19th. &lt;br /&gt;&lt;span style="font-weight: bold;"&gt;&lt;span style="font-weight: bold;"&gt;&lt;/span&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8725958-774703724042749192?l=philomousos.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://philomousos.blogspot.com/feeds/774703724042749192/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8725958&amp;postID=774703724042749192' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8725958/posts/default/774703724042749192'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8725958/posts/default/774703724042749192'/><link rel='alternate' type='text/html' href='http://philomousos.blogspot.com/2008/03/dms-allen-ross-scaife-1960-2008.html' title='D·M·S· Allen Ross Scaife 1960-2008'/><author><name>Hugh Cayless</name><uri>http://www.blogger.com/profile/05582281819129069158</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://4.bp.blogspot.com/-AzoP5gBkhsA/TlP6uXWN6pI/AAAAAAAAACo/UtHNVFJJgGk/s220/Photo%2B1.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8725958.post-8000279108826887922</id><published>2008-01-23T15:04:00.000-05:00</published><updated>2008-01-23T15:11:31.001-05:00</updated><title type='text'>Catching up</title><content 
type='html'>My New Year's resolution was to write more, and specifically to blog more, but so far all of my writing has been internally focussed at my job.  So I shall have another go...&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Speaking of New Year's, I spent a chunk of New Year's Eve getting &lt;a href="http://docsouth.unc.edu/csr"&gt;The Colonial and State Records of North Carolina&lt;/a&gt; off the ground.  It's driven by the &lt;a href="http://exist.sf.net"&gt;eXist&lt;/a&gt; XML database, of which I've grown rather fond.  &lt;a href="http://www.w3.org/TR/xquery/"&gt;XQuery&lt;/a&gt; has a lot of promise as a tool for digital humanists with large collections of XML.&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8725958-8000279108826887922?l=philomousos.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://philomousos.blogspot.com/feeds/8000279108826887922/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8725958&amp;postID=8000279108826887922' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8725958/posts/default/8000279108826887922'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8725958/posts/default/8000279108826887922'/><link rel='alternate' type='text/html' href='http://philomousos.blogspot.com/2008/01/catching-up.html' title='Catching up'/><author><name>Hugh Cayless</name><uri>http://www.blogger.com/profile/05582281819129069158</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' 
src='http://4.bp.blogspot.com/-AzoP5gBkhsA/TlP6uXWN6pI/AAAAAAAAACo/UtHNVFJJgGk/s220/Photo%2B1.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8725958.post-7205692024379222001</id><published>2007-10-22T10:20:00.000-04:00</published><updated>2007-10-22T13:11:09.738-04:00</updated><title type='text'></title><content type='html'>I've been at the &lt;a href="http://dhcs.northwestern.edu"&gt;Chicago Colloquium on Digital Humanities and Computer Science&lt;/a&gt; since yesterday, presenting on the Colonial and State Records project (available soon at http://docsouth.unc.edu).&lt;br /&gt;&lt;br /&gt;Interesting themes that have emerged:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;The importance of Not Reading, i.e. how to use computational tools to investigate textual spaces when there is more text than you can digest by reading cover-to-cover.&lt;/li&gt;&lt;li&gt;Going beyond search: Discovery is an important task, but it's one we do quite well now. How do we go beyond just finding stuff and start to explore the data spaces that digital methods make available? Visualization tools are going to be an important component of this exploration.  Digitization and search hasn't changed the nature of research.  
It has improved the speed with which research is done (nobody spends years producing concordances anymore), but it hasn't changed the questions we ask.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;The dawn of Eurasian scholarship (this from Lewis Lancaster's talk): the divide between Occidental and Oriental scholarship no longer makes any sense (well, it never really did) and is probably over.&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8725958-7205692024379222001?l=philomousos.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://philomousos.blogspot.com/feeds/7205692024379222001/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8725958&amp;postID=7205692024379222001' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8725958/posts/default/7205692024379222001'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8725958/posts/default/7205692024379222001'/><link rel='alternate' type='text/html' href='http://philomousos.blogspot.com/2007/10/ive-been-at-chicago-colloquium-on.html' title=''/><author><name>Hugh Cayless</name><uri>http://www.blogger.com/profile/05582281819129069158</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://4.bp.blogspot.com/-AzoP5gBkhsA/TlP6uXWN6pI/AAAAAAAAACo/UtHNVFJJgGk/s220/Photo%2B1.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8725958.post-3701749453517920548</id><published>2007-05-21T10:10:00.000-04:00</published><updated>2007-05-21T10:12:12.803-04:00</updated><title type='text'>Note to job seekers</title><content type='html'>When applying for a programming job, listing Dreamweaver as a skill is an automatic 50 
demerits.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8725958-3701749453517920548?l=philomousos.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://philomousos.blogspot.com/feeds/3701749453517920548/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8725958&amp;postID=3701749453517920548' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8725958/posts/default/3701749453517920548'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8725958/posts/default/3701749453517920548'/><link rel='alternate' type='text/html' href='http://philomousos.blogspot.com/2007/05/note-to-job-seekers.html' title='Note to job seekers'/><author><name>Hugh Cayless</name><uri>http://www.blogger.com/profile/05582281819129069158</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://4.bp.blogspot.com/-AzoP5gBkhsA/TlP6uXWN6pI/AAAAAAAAACo/UtHNVFJJgGk/s220/Photo%2B1.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8725958.post-8093749555000723996</id><published>2007-03-01T08:25:00.000-05:00</published><updated>2007-03-21T15:58:35.144-04:00</updated><title type='text'>I'm going to be a digital librarian!</title><content type='html'>As of March 15th, I will be working for the &lt;a href="http://www.lib.unc.edu/"&gt;UNC Library&lt;/a&gt; as a digital library programmer.  I'm going to miss &lt;a href="http://www.lulu.com/"&gt;Lulu&lt;/a&gt; a lot.  It's been a wonderful environment to work in, with people I'm going to find hard to leave.  
But working with collections like &lt;a href="http://docsouth.unc.edu/"&gt;Documenting the American South&lt;/a&gt; is a text geek's Nirvana, so it was far too good an opportunity to pass up...&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8725958-8093749555000723996?l=philomousos.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://philomousos.blogspot.com/feeds/8093749555000723996/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8725958&amp;postID=8093749555000723996' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8725958/posts/default/8093749555000723996'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8725958/posts/default/8093749555000723996'/><link rel='alternate' type='text/html' href='http://philomousos.blogspot.com/2007/03/im-going-to-be-diital-librarian.html' title='I&apos;m going to be a digital librarian!'/><author><name>Hugh Cayless</name><uri>http://www.blogger.com/profile/05582281819129069158</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://4.bp.blogspot.com/-AzoP5gBkhsA/TlP6uXWN6pI/AAAAAAAAACo/UtHNVFJJgGk/s220/Photo%2B1.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8725958.post-5895305883569990269</id><published>2007-02-06T21:10:00.000-05:00</published><updated>2007-02-07T11:43:22.375-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Ruby XSLT'/><title type='text'>How has Ruby blown your mind?</title><content type='html'>...&lt;a href="http://on-ruby.blogspot.com/2007/01/blogging-contest-february-challenge.html"&gt;asks Pat Eyler&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;I had the opportunity to learn Ruby as part of a work project 
last year and was immediately impressed by its object-orientation, its use of blocks, the straightforward way it uses modules (mixins) to do the work of multiple inheritance, and just the elegance and speed with which I could work in it.  The moment that really changed the way I saw the language came when I had to generate previews of Word and OpenDocument (ODT) documents uploaded to the site I was working on.  Converting Word to ODT seemed like the way to go, since ODT is a zipped XML format and can therefore be transformed to XHTML.  I have a lot of experience using XSLT to transform XML from one vocabulary to another, so this seemed like well-explored territory to me, even if it would take a fair amount of work to accomplish.  As usual, I did some web-trolling to see who had dealt with this issue before me, in case the problem was already solved.  Google pointed me at J. David Eisenberg's &lt;a href="http://books.evc-cit.info/odf_utils/ruby_to_xhtml.html"&gt;ruby_odt_to_xhtml&lt;/a&gt;, which looked like a good start.  It didn't do everything I wanted (in particular, it didn't handle footnotes adequately), but I didn't expect it would be too hard to modify.  The surprises came when I looked at the code...&lt;br /&gt;&lt;br /&gt;The first surprise was the utter lack of XSLT.  Not a huge surprise, perhaps.  I'd already gathered that Rubyists viewed XML with a &lt;a href="http://redhanded.hobix.com/images/camping-xml-situps.png"&gt;somewhat jaundiced eye&lt;/a&gt;.  Tim Bray has &lt;a href="http://www.tbray.org/ongoing/When/200x/2006/11/09/Optimizing-Ruby"&gt;lamented&lt;/a&gt; the state of XML support in Ruby as well.  Tim is quite right about the relative weakness of XML support in Ruby, even though I absolutely agree with the practice of avoiding XML configuration files.  
There is a perfectly good Ruby &lt;a href="http://greg.rubyfr.net/pub/packages/ruby-xslt/files/README.html"&gt;frontend&lt;/a&gt; to &lt;a href="http://xmlsoft.org/XSLT/"&gt;libxslt&lt;/a&gt;, however, so its use is not out of the question.  But there it was: for whatever reason, the author had decided not to use the technology I was familiar with...why would he do that, and could I still use his tool?&lt;br /&gt;&lt;br /&gt;The mind expansion came about when I started figuring out how to extend odt_to_xhtml to handle notes, which it was basically ignoring.  I wanted to turn ODT footnotes into endnotes with named anchors at the bottom of the page, links in the text to the corresponding anchor, and backlinks from the note to its link in the text.  Before describing what I found, I should give a little background on XSLT:&lt;br /&gt;&lt;br /&gt;At its most basic, XSLT expects input in the form of an XML document, and produces either XML or text output.  In XSLT, the functions are called templates.  Templates respond either to calls (as do functions in most languages) or, more often, to matches on the input XML document.  
So a template like&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;lt;xsl:template match="text:p"&amp;gt;&lt;br /&gt;  &amp;lt;p&amp;gt;&amp;lt;xsl:apply-templates/&amp;gt;&amp;lt;/p&amp;gt;&lt;br /&gt;&amp;lt;/xsl:template&amp;gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;would be triggered every time a paragraph element in an OpenDocument content.xml is encountered and would output a &amp;lt;p&amp;gt; tag, yield to any other matching templates, and then close the &amp;lt;p&amp;gt; tag.&lt;br /&gt;&lt;br /&gt;As I looked at JDE's code, I saw lots of methods like this:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;def process_text_list_item( element, output_node )&lt;br /&gt;style_name = register_style( element )&lt;br /&gt;item = emit_element( output_node, "li", {"class" =&gt; style_name} )&lt;br /&gt;process_children( element, item )&lt;br /&gt;end&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;emit_element does what it sounds like it does, adds a child element to the element passed in to the method with a hash of attribute name/value pairs.  It's process_children that really interests me:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;#  Process an element's children&lt;br /&gt;#  node: the context node&lt;br /&gt;#  output_node: the node to which to add the children&lt;br /&gt;#  xpath_expr: which children to process (default is all)&lt;br /&gt;#&lt;br /&gt;#  Algorithm:&lt;br /&gt;#  If the node is a text node, output to the destination.&lt;br /&gt;#  If it's an element, munge its name into&lt;br /&gt;#  &amp;lt;tt&amp;gt;process_prefix_elementname&amp;lt;/tt&amp;gt;. If that&lt;br /&gt;#  method exists, call it to handle the element. 
Otherwise,&lt;br /&gt;#  process this node's children recursively.&lt;br /&gt;#&lt;br /&gt;def process_children( node, output_node, xpath_expr="node()" )&lt;br /&gt;REXML::XPath.each( node, xpath_expr ) do |item|&lt;br /&gt;  if (item.kind_of?(REXML::Element)) then&lt;br /&gt;    str = "process_" + @namespace_urn[item.namespace] + "_" + item.name.tr_s(":-", "__")&lt;br /&gt;    if ODT_to_XHTML.method_defined?( str ) then&lt;br /&gt;      self.send( str, item, output_node )&lt;br /&gt;    else&lt;br /&gt;      process_children(item, output_node)&lt;br /&gt;    end&lt;br /&gt;  elsif (item.kind_of?(REXML::Text) &amp;&amp;amp; !item.value.match(/^\s*$/))&lt;br /&gt;    output_node.add_text(item.value)&lt;br /&gt;  end&lt;br /&gt;end&lt;br /&gt;#&lt;br /&gt;#  If it's empty, add a null string to force a begin and end&lt;br /&gt;#  tag to be generated&lt;br /&gt;if (!output_node.has_elements? &amp;&amp;amp; !output_node.has_text?) then&lt;br /&gt;  output_node.add_text("")&lt;br /&gt;end&lt;br /&gt;end&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;Mind expansion ensued.  This Ruby class was doing exactly the same thing that I'd expect an XSLT stylesheet to do, with the help of a few lines of code to keep it going!  process_text_list_item is a template!  Coming from Java and then PHP, I'd have no hesitation switching to XSLT to accomplish a bit of XML processing like this, but in Ruby, there really wasn't any need.  I could write XSLT-like code perfectly naturally without ever leaving Ruby!&lt;br /&gt;&lt;br /&gt;Now, I still like XSLT, and I'd still use it in many cases like this, because it's portable across different languages and platforms.  But here, where there are other considerations, it's wonderful that I'm not forced to step outside the language I'm working in to accomplish what I want.  
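&lt;br /&gt;&lt;br /&gt;The dispatch idea is small enough to sketch on its own. The following is a minimal illustration, not JDE's code: Node is a hypothetical stand-in for a parsed XML element, ListRenderer and its string output are invented for the example, and respond_to? is used here in place of the method_defined? check above.&lt;br /&gt;&lt;br /&gt;
&lt;pre&gt;
```ruby
# Hypothetical stand-in for a parsed XML element; children are Nodes or Strings.
Node = Struct.new(:name, :children)

class ListRenderer
  # Plays the role of a template matching text:list-item.
  def process_text_list_item(node)
    "li(" + process_children(node) + ")"
  end

  # Build a method name from each child's element name and call it if it
  # exists; otherwise fall through to that child's own children.
  def process_children(node)
    node.children.map do |child|
      next child if child.is_a?(String)
      handler = "process_" + child.name.tr(":-", "__")
      if respond_to?(handler)
        send(handler, child)
      else
        process_children(child)
      end
    end.join
  end
end

doc = Node.new("text:list", [
  Node.new("text:list-item", ["one"]),
  Node.new("text:span", [Node.new("text:list-item", ["two"])])
])

puts ListRenderer.new.process_children(doc) # → li(one)li(two)
```
&lt;/pre&gt;
The text:span node has no matching method, so the walk falls through to its children, much as XSLT's built-in rules would.&lt;br /&gt;&lt;br /&gt;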
In order to extend the code to handle notes, I just added some new template-like methods to match on notes and note-citations, e.g.:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;def process_text_note( element, output_node )&lt;br /&gt;process_children(element, output_node, "#{text_ns}:note-citation")&lt;br /&gt;end&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;In OpenDocument, notes are inline structures.  The note is embedded within the text at the point where the citation occurs, so to create endnotes, you need to split the note into a citation link and a note that is placed at the end of the output document.  To add the endnotes, I borrowed a trick from XSLT: modes.  If an XSL template has a mode="something" attribute, then that template will not match on an input node unless it was dispatched with an &amp;lt;xsl:apply-templates mode="something"/&amp;gt;.  So I did the same thing, e.g.:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;def process_text_note_mode_endnote( element, output_node )&lt;br /&gt;p = emit_element(output_node, "p", {"class" =&gt; "footnote"})&lt;br /&gt;process_children(element, p, "#{@text_ns}:note-citation", "endnote")&lt;br /&gt;process_text_s(element, p)&lt;br /&gt;process_children(element, p, "#{@text_ns}:note-body/#{@text_ns}:p[1]/node()")&lt;br /&gt;process_children(element, p, "#{@text_ns}:note-body/#{@text_ns}:p[1]/following-sibling::*")&lt;br /&gt;end&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;The method that controls the processing flow in JDE's code is called analyze_content_xml.  
I just added a call to my moded methods in analyze_content_xml and modified process_children to take a mode parameter.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;def process_children( node, output_node, xpath_expr="node()", mode=nil )&lt;br /&gt;if xpath_expr.nil?&lt;br /&gt;  xpath_expr = "node()"&lt;br /&gt;end&lt;br /&gt;REXML::XPath.each( node, xpath_expr ) do |item|&lt;br /&gt;  if (item.kind_of?(REXML::Element)) then&lt;br /&gt;    str = "process_" + @namespace_urn[item.namespace] + "_" + item.name.tr_s(":-", "__")&lt;br /&gt;    if mode&lt;br /&gt;      str += "_mode_#{mode}"&lt;br /&gt;    end&lt;br /&gt;    if ODT_to_XHTML.method_defined?( str ) then&lt;br /&gt;      self.send( str, item, output_node )&lt;br /&gt;    else&lt;br /&gt;      process_children(item, output_node)&lt;br /&gt;    end&lt;br /&gt;  elsif (item.kind_of?(REXML::Text) &amp;&amp;amp; !item.value.match(/^\s*$/))&lt;br /&gt;    output_node.add_text(item.value)&lt;br /&gt;  end&lt;br /&gt;end&lt;br /&gt;#&lt;br /&gt;#  If it's empty, add a null string to force a begin and end&lt;br /&gt;#  tag to be generated&lt;br /&gt;if (!output_node.has_elements? &amp;&amp;amp; !output_node.has_text?) then&lt;br /&gt;  output_node.add_text("")&lt;br /&gt;end&lt;br /&gt;end&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Done.  Easy.  
Blew my mind.&lt;p&gt;&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8725958-5895305883569990269?l=philomousos.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://philomousos.blogspot.com/feeds/5895305883569990269/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8725958&amp;postID=5895305883569990269' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8725958/posts/default/5895305883569990269'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8725958/posts/default/5895305883569990269'/><link rel='alternate' type='text/html' href='http://philomousos.blogspot.com/2007/02/how-has-ruby-blown-your-mind.html' title='How has Ruby blown your mind?'/><author><name>Hugh Cayless</name><uri>http://www.blogger.com/profile/05582281819129069158</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://4.bp.blogspot.com/-AzoP5gBkhsA/TlP6uXWN6pI/AAAAAAAAACo/UtHNVFJJgGk/s220/Photo%2B1.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8725958.post-296517108904392801</id><published>2007-01-20T23:05:00.000-05:00</published><updated>2007-01-20T23:27:14.693-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='javascript prototype'/><title type='text'>Prototype grows up</title><content type='html'>&lt;a href="http://www.prototypejs.org/"&gt;http://prototypejs.org&lt;/a&gt; is the new site for Prototype 1.5.  As the Ajaxian blog noted: &lt;a href="http://ajaxian.com/archives/prototype-15-now-with-documentation"&gt;Now with Documentation&lt;/a&gt;!  
Of course, Prototype always had &lt;span style="font-style: italic;"&gt;some&lt;/span&gt; documentation; quite good documentation at that, even though there were substantial pieces missing and you had to go digging sometimes.&lt;br /&gt;&lt;br /&gt;Prototype played a big part in reawakening my interest in Javascript as a programming language.  I was rather anti-Javascript for a while, having fought many bloody battles with cross-browser incompatibilities in the early 2000's (UNC Chapel Hill, my then employer, had standardized, somewhat foolishly, on Netscape 4.7, but of course we had to support IE too -- nightmare).  I got back into it seriously when I started to notice all of the AJAXy and Web 2.0-ish stuff going on.  I've learned a lot from digging around in the Prototype source code, so the spotty documentation actually did me some good.&lt;br /&gt;&lt;br /&gt;Kudos to the Prototype development team and the contributors to the documentation effort.  You've done us all a great service!  
I look forward to using 1.5...&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8725958-296517108904392801?l=philomousos.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://philomousos.blogspot.com/feeds/296517108904392801/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8725958&amp;postID=296517108904392801' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8725958/posts/default/296517108904392801'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8725958/posts/default/296517108904392801'/><link rel='alternate' type='text/html' href='http://philomousos.blogspot.com/2007/01/prototype-grows-up.html' title='Prototype grows up'/><author><name>Hugh Cayless</name><uri>http://www.blogger.com/profile/05582281819129069158</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://4.bp.blogspot.com/-AzoP5gBkhsA/TlP6uXWN6pI/AAAAAAAAACo/UtHNVFJJgGk/s220/Photo%2B1.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8725958.post-115642574453411931</id><published>2006-08-24T09:22:00.000-04:00</published><updated>2006-08-24T09:22:24.543-04:00</updated><title type='text'>XSL-FO 2.0 Workshop 2006</title><content type='html'>&lt;h3&gt;International Workshop on the future of the Extensible Stylesheet Language (XSL-FO) Version 2.0&lt;/h3&gt;&lt;br /&gt;&lt;a href="http://www.w3.org/Style/XSL/2006-Workshop/#announcement"&gt;XSL-FO 2.0 Workshop 2006&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;I have two suggestions:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;use CSS instead of weird attribute-based style declarations.&lt;/li&gt;&lt;br /&gt;&lt;li&gt;for God's sake, have a reference implementation.&lt;/li&gt;&lt;br 
/&gt;&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8725958-115642574453411931?l=philomousos.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://philomousos.blogspot.com/feeds/115642574453411931/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8725958&amp;postID=115642574453411931' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8725958/posts/default/115642574453411931'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8725958/posts/default/115642574453411931'/><link rel='alternate' type='text/html' href='http://philomousos.blogspot.com/2006/08/xsl-fo-20-workshop-2006.html' title='XSL-FO 2.0 Workshop 2006'/><author><name>Hugh Cayless</name><uri>http://www.blogger.com/profile/05582281819129069158</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://4.bp.blogspot.com/-AzoP5gBkhsA/TlP6uXWN6pI/AAAAAAAAACo/UtHNVFJJgGk/s220/Photo%2B1.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8725958.post-115453058010070125</id><published>2006-08-02T10:40:00.000-04:00</published><updated>2006-08-02T10:56:21.436-04:00</updated><title type='text'>Boycott Blackboard!</title><content type='html'>I knew Blackboard had a patent application for their LMS, but apparently it has been &lt;a href="http://yro.slashdot.org/yro/06/08/02/1217219.shtml"&gt;granted&lt;/a&gt; and their first act was to file a lawsuit against one of their competitors.  This is terrible on many levels.  Not least because such a stupid patent should never have been granted.  Of course, the USPTO would probably let me patent my nose hair. 
&lt;br /&gt;&lt;br /&gt;I certainly won't be using Blackboard for my XML class in the Fall, and I'd encourage other instructors to drop it too.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8725958-115453058010070125?l=philomousos.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://philomousos.blogspot.com/feeds/115453058010070125/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8725958&amp;postID=115453058010070125' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8725958/posts/default/115453058010070125'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8725958/posts/default/115453058010070125'/><link rel='alternate' type='text/html' href='http://philomousos.blogspot.com/2006/08/boycott-blackboard.html' title='Boycott Blackboard!'/><author><name>Hugh Cayless</name><uri>http://www.blogger.com/profile/05582281819129069158</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://4.bp.blogspot.com/-AzoP5gBkhsA/TlP6uXWN6pI/AAAAAAAAACo/UtHNVFJJgGk/s220/Photo%2B1.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8725958.post-112990205399731288</id><published>2005-10-21T09:09:00.000-04:00</published><updated>2005-10-21T09:40:54.036-04:00</updated><title type='text'>Google Library</title><content type='html'>I just read John Battelle's &lt;a href="http://battellemedia.com/archives/001952.php"&gt;post&lt;/a&gt; on the AAP's lawsuit.  The comments are particularly interesting, with a couple of very strident ones criticising Google.  
I have a theory about how Google plans to justify their actions:&lt;br /&gt;&lt;ol&gt;   &lt;li&gt;Libraries are allowed, under copyright law, to make a single copy of any work in their possession.  This is called the Library Exemption.  There is a nice outline of the terms &lt;a href="http://www.unc.edu/%7Eunclng/SILS-display8.htm"&gt;here&lt;/a&gt;.  The libraries themselves can't get in trouble for contracting with Google to do this for them, because they are receiving no commercial advantage from it.  Google clearly is receiving a competitive advantage from it, BUT:&lt;br /&gt;  &lt;/li&gt;   &lt;li&gt;They may be able to make a good case for Fair Use, depending on the nature of what they keep from the book.  There are four aspects to be weighed in any Fair Use defense (see &lt;a href="http://en.wikipedia.org/wiki/Fair_use"&gt;Wikipedia&lt;/a&gt;):&lt;/li&gt;   &lt;ol&gt;     &lt;li&gt;the purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes;&lt;/li&gt;     &lt;li&gt;the nature of the copyrighted work;&lt;/li&gt;     &lt;li&gt;the amount and substantiality of the portion used in relation to the copyrighted work as a whole; and&lt;/li&gt;     &lt;li&gt;the effect of the use upon the potential market for or value of the copyrighted work.&lt;/li&gt;   &lt;/ol&gt; &lt;/ol&gt; Clearly Google hopes for commercial advantage from the use of the scanned books, so they might fail the first test.  The second doesn't really apply: these are clearly books subject fully to copyright law.  It's the third and fourth aspects that I think are the center of Google's defense.  A copy is a copy, but a searchable &lt;span style="font-style: italic;"&gt;index&lt;/span&gt; created from a scanned copy is arguably a transformative use of the book.  A human being can neither read the index, nor reconstruct the original from it, so Google may be able to successfully defend themselves on aspect #3.  
Their main weakness is the existence of page images from the original scan.  These may or may not be stored and accessible in such a way that a whole copy of the original could be reconstructed and read.  Aspect #4 is another winner for Google.  The clear effect of this system will be to sell more copies of the publishers' books.  The only (theoretical) commercial harm caused to the publishers is that they are effectively prevented from rolling a Google Print of their own, which might bring them in more money than simply selling their books.  So Google wins on at least two of the four counts, and the act of copying itself is protected under the Library Exemption. &lt;br /&gt;&lt;br /&gt;I suspect the AAP would have an uphill battle in winning this one.  I wouldn't be surprised if they wanted Google to license their books for their index at some fairly exorbitant rate, and Google refused to pay because they're doing the publishers a favor.  That would make the lawsuit a negotiating tactic.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8725958-112990205399731288?l=philomousos.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://philomousos.blogspot.com/feeds/112990205399731288/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8725958&amp;postID=112990205399731288' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8725958/posts/default/112990205399731288'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8725958/posts/default/112990205399731288'/><link rel='alternate' type='text/html' href='http://philomousos.blogspot.com/2005/10/google-library.html' title='Google Library'/><author><name>Hugh 
Cayless</name><uri>http://www.blogger.com/profile/05582281819129069158</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://4.bp.blogspot.com/-AzoP5gBkhsA/TlP6uXWN6pI/AAAAAAAAACo/UtHNVFJJgGk/s220/Photo%2B1.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8725958.post-111473311667859063</id><published>2005-04-28T20:02:00.000-04:00</published><updated>2005-04-28T20:05:16.680-04:00</updated><title type='text'>When SEOs Attack</title><content type='html'>Search Engine Foo: &lt;a href="http://iuniverse.com/"&gt;iUniverse Book Publishing: Book Publisher for Self Publishing and Print on Demand&lt;/a&gt;.  Care to guess what terms they're optimizing for?  It does seem to work: they show up #1 for a Google search on "&lt;a href="http://www.google.com/search?hl=en&amp;q=self+publishing"&gt;self publishing&lt;/a&gt;."  But this sort of spamming leads to pretty hilarious prose:&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;&lt;quote&gt;iUniverse, the leading online book &lt;span style="color: rgb(255, 0, 0);"&gt;publisher&lt;/span&gt;, offers the most comprehensive book   &lt;span style="color: rgb(255, 0, 0);"&gt;publishing&lt;/span&gt; services in the self-&lt;span style="color: rgb(255, 0, 0);"&gt;publishing&lt;/span&gt; industry—awarded the Editor's Choice award by PC Magazine and chosen by thousands of satisfied authors as the leading print-on-demand book &lt;span style="color: rgb(255, 0, 0);"&gt;publisher&lt;/span&gt;.&lt;br /&gt;&lt;br /&gt;We help authors to prepare a manuscript, design and self-&lt;span style="color: rgb(255, 0, 0);"&gt;publish&lt;/span&gt; a book of professional quality, publicize and market their book, and print copies of their book for sale online and in bookstores around the world.&lt;br /&gt;&lt;br /&gt;As an innovative book &lt;span style="color: rgb(255, 0, 
0);"&gt;publisher&lt;/span&gt;, we also offer exclusive services such as our acclaimed Editorial Review and our revolutionary Star Program, designed to discover and nurture exceptional new talent within our growing author community.&lt;br /&gt;&lt;br /&gt;Don't wait any longer to get that manuscript off your desk and into the marketplace. With iUniverse as your book &lt;span style="color: rgb(255, 0, 0);"&gt;publisher&lt;/span&gt;, you can become a &lt;span style="color: rgb(255, 0, 0);"&gt;published&lt;/span&gt; author in a matter of weeks. Why not get started today?&lt;/quote&gt;&lt;br /&gt;&lt;/blockquote&gt;&lt;br /&gt;Yes, indeed. Publish your book with a publishing publisher and be published. Ouch. Not sure I'd pay them an exorbitant fee to edit my book.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8725958-111473311667859063?l=philomousos.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://philomousos.blogspot.com/feeds/111473311667859063/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8725958&amp;postID=111473311667859063' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8725958/posts/default/111473311667859063'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8725958/posts/default/111473311667859063'/><link rel='alternate' type='text/html' href='http://philomousos.blogspot.com/2005/04/when-seos-attack.html' title='When SEOs Attack'/><author><name>Hugh Cayless</name><uri>http://www.blogger.com/profile/05582281819129069158</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' 
src='http://4.bp.blogspot.com/-AzoP5gBkhsA/TlP6uXWN6pI/AAAAAAAAACo/UtHNVFJJgGk/s220/Photo%2B1.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8725958.post-111115285418160272</id><published>2005-03-18T08:20:00.000-05:00</published><updated>2005-03-18T08:38:10.486-05:00</updated><title type='text'>Writing Code</title><content type='html'>I'm coming to the conclusion that writing code, as an activity, really is like writing prose. I find myself treating code projects just like writing projects:&lt;br /&gt;&lt;ol&gt;   &lt;li&gt;I spend the first part of the project thinking about it and being (apparently) very unproductive. (25-30%)&lt;br /&gt;&lt;/li&gt;   &lt;li&gt;After I reach some sort of critical mass in my thinking, I very quickly pour out everything into code/onto the page. The project is 80% done as far as volume goes at this point. (10-20%)&lt;br /&gt;&lt;/li&gt;   &lt;li&gt;I spend the rest of the time editing, bugfixing, refining, etc. (50-60%)&lt;/li&gt; &lt;/ol&gt; For larger projects, this cycle gets repeated for each component of the project. This is precisely the pattern I followed when writing my dissertation.  I don't know if this kind of working method is in any way typical, but it does seem to produce the desired results. It makes giving project completion estimates next to impossible though, because I really have no idea how long the project will take until I enter the hyper-productive phase, and when that's complete, I often still have a lot of work to do, even though the bulk of the code/writing is done.  &lt;br /&gt;&lt;br /&gt;This is why it's best for me if I've got some variety in a job. The hyper-productive phase really can't overlap with anything else: if I'm interrupted then, I'll get off track and it may blow the whole day, but in phase 1 or 3 I'm better off not spending all my time focusing on the project, because I'll just end up web surfing. 
Or blogging.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8725958-111115285418160272?l=philomousos.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://philomousos.blogspot.com/feeds/111115285418160272/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8725958&amp;postID=111115285418160272' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8725958/posts/default/111115285418160272'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8725958/posts/default/111115285418160272'/><link rel='alternate' type='text/html' href='http://philomousos.blogspot.com/2005/03/writing-code.html' title='Writing Code'/><author><name>Hugh Cayless</name><uri>http://www.blogger.com/profile/05582281819129069158</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://4.bp.blogspot.com/-AzoP5gBkhsA/TlP6uXWN6pI/AAAAAAAAACo/UtHNVFJJgGk/s220/Photo%2B1.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8725958.post-110921056210113350</id><published>2005-02-23T20:59:00.000-05:00</published><updated>2005-03-23T08:52:54.813-05:00</updated><title type='text'>Tagging Notes</title><content type='html'>From a conversation this afternoon: the tags used in folksonomies are deliberately stupid. They are atomic units of information. 
So a tag can be any atomic unit: it doesn't have to be a word; it could be a URL, a zip code, or anything else you can think of that isn't reducible.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8725958-110921056210113350?l=philomousos.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://philomousos.blogspot.com/feeds/110921056210113350/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8725958&amp;postID=110921056210113350' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8725958/posts/default/110921056210113350'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8725958/posts/default/110921056210113350'/><link rel='alternate' type='text/html' href='http://philomousos.blogspot.com/2005/02/tagging-notes.html' title='Tagging Notes'/><author><name>Hugh Cayless</name><uri>http://www.blogger.com/profile/05582281819129069158</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://4.bp.blogspot.com/-AzoP5gBkhsA/TlP6uXWN6pI/AAAAAAAAACo/UtHNVFJJgGk/s220/Photo%2B1.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8725958.post-110920864761127833</id><published>2005-02-23T20:30:00.000-05:00</published><updated>2005-02-23T20:30:47.610-05:00</updated><title type='text'>adaptive path » ajax: a new approach to web applications</title><content type='html'>&lt;a href="http://www.adaptivepath.com/publications/essays/archives/000385.php"&gt;adaptive path » ajax: a new approach to web applications&lt;/a&gt;.  Web application development is starting to get really exciting again.  
The funny thing is that a lot of this technology has been around for a while, and even though IE supported it, you didn't see tools like these.  I wonder whether the development of Firefox is really what's pushed it.  Certainly nearly all the developers I know shun IE.  So perhaps having a capable Free browser was what sparked all this innovation.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8725958-110920864761127833?l=philomousos.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://philomousos.blogspot.com/feeds/110920864761127833/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8725958&amp;postID=110920864761127833' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8725958/posts/default/110920864761127833'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8725958/posts/default/110920864761127833'/><link rel='alternate' type='text/html' href='http://philomousos.blogspot.com/2005/02/adaptive-path-ajax-new-approach-to-web.html' title='adaptive path » ajax: a new approach to web applications'/><author><name>Hugh Cayless</name><uri>http://www.blogger.com/profile/05582281819129069158</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://4.bp.blogspot.com/-AzoP5gBkhsA/TlP6uXWN6pI/AAAAAAAAACo/UtHNVFJJgGk/s220/Photo%2B1.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8725958.post-110892541288972265</id><published>2005-02-22T23:00:00.000-05:00</published><updated>2005-02-22T22:59:12.700-05:00</updated><title type='text'>Blog-binding</title><content type='html'>Recently, I've been seeing a number of companies and projects springing up around the idea of publishing blogs as books. 
The examples I'm aware of are &lt;a href="http://blogbinders.com/"&gt;Blogbinders&lt;/a&gt;, &lt;a href="http://qoop.com/"&gt;qoop&lt;/a&gt;, &lt;a href="http://ljbook.com/"&gt;LJBook&lt;/a&gt;, and most recently, &lt;a href="http://www.bookthisblog.com/"&gt;book this blog&lt;/a&gt;, but I'd bet there are more. What I'm wondering about is how useful the blog-directly-to-book pathway is. Wouldn't an application that aggregates your blog posts into an editing environment (like Word or OpenOffice) be more useful? Can we really smooth over the formatting differences between web and print well enough to produce (automatically) a nice-looking book 100% of the time? I'm a little skeptical.&lt;br /&gt;&lt;br /&gt;From my (admittedly cursory) browsing, it looks like Blogbinders have a human in the middle of the process, and qoop certainly did for their only title so far, John Battelle's &lt;a href="http://battellemedia.com/archives/001252.php"&gt;SearchBlog&lt;/a&gt;. Requiring a human being in the loop raises costs and introduces scaling issues.&lt;br /&gt;&lt;br /&gt;There's a fine tradition of publishing diaries, going back at least to &lt;a href="http://www.perseus.tufts.edu/cgi-bin/ptext?doc=Perseus%3Atext%3A1999.02.0002"&gt;Caesar&lt;/a&gt;, but unless you happen to be famous, or have a blog that's truly interesting a high percentage of the time, the way to monetize blogs is more likely to be on the &lt;a href="http://www.hardballtimes.com/"&gt;Hardball Times&lt;/a&gt; model, where you build up an audience, and then sell them work they're interested in.&lt;br /&gt;&lt;br /&gt;But if blogs really are the &lt;a href="http://www.salutor.com/2005/02/blogapalooza.html"&gt;ultimate vanity presses&lt;/a&gt;, then there may indeed be money in printing them, if you charge the blogger enough up front. 
It will be interesting to see how all this shakes out.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8725958-110892541288972265?l=philomousos.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://philomousos.blogspot.com/feeds/110892541288972265/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8725958&amp;postID=110892541288972265' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8725958/posts/default/110892541288972265'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8725958/posts/default/110892541288972265'/><link rel='alternate' type='text/html' href='http://philomousos.blogspot.com/2005/02/blog-binding.html' title='Blog-binding'/><author><name>Hugh Cayless</name><uri>http://www.blogger.com/profile/05582281819129069158</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://4.bp.blogspot.com/-AzoP5gBkhsA/TlP6uXWN6pI/AAAAAAAAACo/UtHNVFJJgGk/s220/Photo%2B1.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8725958.post-109884280756846099</id><published>2004-10-26T22:03:00.000-04:00</published><updated>2004-10-26T22:06:47.566-04:00</updated><title type='text'>Reflexive link</title><content type='html'>Just noted on my &lt;a href="http://people.lulu.com/blogs/view_post.php?post_id=3281"&gt;work&lt;/a&gt; blog the Wired article by Hilary Rosen on Creative Commons.  Always nice to see a new convert.
&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8725958-109884280756846099?l=philomousos.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://philomousos.blogspot.com/feeds/109884280756846099/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8725958&amp;postID=109884280756846099' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8725958/posts/default/109884280756846099'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8725958/posts/default/109884280756846099'/><link rel='alternate' type='text/html' href='http://philomousos.blogspot.com/2004/10/reflexive-link.html' title='Reflexive link'/><author><name>Hugh Cayless</name><uri>http://www.blogger.com/profile/05582281819129069158</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://4.bp.blogspot.com/-AzoP5gBkhsA/TlP6uXWN6pI/AAAAAAAAACo/UtHNVFJJgGk/s220/Photo%2B1.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8725958.post-109780087200037890</id><published>2004-10-14T20:37:00.000-04:00</published><updated>2004-10-14T20:42:09.460-04:00</updated><title type='text'>First Post</title><content type='html'>Just trying this again.  Oddly, &lt;a href="http://www.mozilla.org/projects/camino/"&gt;Camino&lt;/a&gt; wouldn't let me type in the text area, but &lt;a href="http://www.mozilla.org/products/firefox/"&gt;Firefox&lt;/a&gt; will. Curious....
&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8725958-109780087200037890?l=philomousos.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://philomousos.blogspot.com/feeds/109780087200037890/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=8725958&amp;postID=109780087200037890' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8725958/posts/default/109780087200037890'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8725958/posts/default/109780087200037890'/><link rel='alternate' type='text/html' href='http://philomousos.blogspot.com/2004/10/first-post.html' title='First Post'/><author><name>Hugh Cayless</name><uri>http://www.blogger.com/profile/05582281819129069158</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://4.bp.blogspot.com/-AzoP5gBkhsA/TlP6uXWN6pI/AAAAAAAAACo/UtHNVFJJgGk/s220/Photo%2B1.jpg'/></author><thr:total>0</thr:total></entry></feed>
