Saturday, November 12, 2005
Exciting developments at, where they're reporting they've completed "the first syntactically annotated electronic Greek New Testament". This seems certain to be a very rich New Testament resource, and one that is likely to support a lot of interesting follow-on work, much as the Penn Treebank did for English.


I'm way behind on my blogging, but meant to post about BATSIS, the Biblical And Theological Studies Indexing System, which was kind enough to post about the hyper-concordance. BATSIS is all about tagging Bible resources in the style of

As a card-carrying computational linguist (well, maybe my membership has lapsed due to too much time spent doing management), I'm not completely on board with this whole tagging thing. What makes it good (the ability to generate ad hoc tags on the fly) is also its weakness: what exactly does the tag "bible" mean? If you google for "bible" and other things, you'll soon find the term is ambiguous: the XML Bible is nobody's idea of Holy Scripture. I guess the answer to "what does a given tag mean?" is "whatever some tagger meant it to mean", but that's not a very satisfactory answer. Nevertheless, anything that adds more structure and organizes resources has to be a good thing (though it's annoying that visiting their site brings up a video ad in another window).

They've also got a discussion going on about different schemes for abbreviating the names of Bible books (more evidence that you don't have to dig too deep before the tagging approach breaks down: see also the comments to "Biblical Lumpers and Splitters"). The only standard I use for resources at is OSIS, which uses abbreviations from the Society of Biblical Literature Manual of Style (warning, that's a 300-page PDF). An individual verse reference separates book, chapter, and verse with periods ("John.3.16"). See the OSIS specification for additional details.
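
For the curious, here's a minimal Python sketch of taking a verse-level reference like "John.3.16" apart and putting it back together. The names (OsisRef, parse_osis_verse, format_osis_verse) are made up for illustration and aren't part of OSIS. It assumes a single verse with exactly three period-separated fields, and it doesn't check the book abbreviation against the SBL list or handle ranges and the other forms the specification covers.

```python
import re
from typing import NamedTuple


class OsisRef(NamedTuple):
    """A single verse reference: book abbreviation, chapter, verse."""
    book: str
    chapter: int
    verse: int


# Assumption: a book abbreviation is letters with an optional leading digit
# (e.g. "John", "Gen", "1Cor"), followed by numeric chapter and verse.
_VERSE_PATTERN = re.compile(r"([0-9]?[A-Za-z]+)\.([0-9]+)\.([0-9]+)")


def parse_osis_verse(ref: str) -> OsisRef:
    """Split a reference like 'John.3.16' into (book, chapter, verse)."""
    match = _VERSE_PATTERN.fullmatch(ref)
    if match is None:
        raise ValueError(f"not a book.chapter.verse reference: {ref!r}")
    book, chapter, verse = match.groups()
    return OsisRef(book, int(chapter), int(verse))


def format_osis_verse(ref: OsisRef) -> str:
    """Join the parts back together with periods."""
    return f"{ref.book}.{ref.chapter}.{ref.verse}"


if __name__ == "__main__":
    ref = parse_osis_verse("John.3.16")
    print(ref.book, ref.chapter, ref.verse)
    print(format_osis_verse(ref))
```

Because every field sits in a fixed position with a fixed separator, there's nothing to guess about, which is exactly what free-form tags and home-grown abbreviations don't give you.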

