Sunday, April 23, 2006

I've created a draft RDF vocabulary for books of the Bible, and i'm looking for feedback before officially publishing it. The vocabulary describes the books (with fairly traditional subclasses like NewTestamentBook > Epistle > GeneralEpistle), textual divisions, and authors. It includes some basic properties like names (in common, abbreviated, and full forms), authorship, and references for passages. RDF seemed to capture the majority of the most useful information, and it comes with fewer processing requirements, so i've used that rather than OWL (though that means losing a few niceties like inverse functions, and cardinality and disjointness constraints).

[Figure: biblebooks.jpg]
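
As a rough illustration of what the subclass chain buys you, here is a minimal Python sketch that stores a few statements as plain triples and follows rdfs:subClassOf transitively. The class names come from the description above; the `example.org` namespace and the local name `Romans` are placeholders assumed for the example, not the vocabulary's actual URIs.

```python
# Hypothetical namespace: a stand-in, not the vocabulary's real location.
NS = "http://example.org/biblebooks#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SUBCLASS = "http://www.w3.org/2000/01/rdf-schema#subClassOf"


def uri(local_name):
    """Build a full URI from a local name in the placeholder namespace."""
    return NS + local_name


# Each statement is a (subject, predicate, object) triple.
triples = {
    (uri("NewTestamentBook"), RDFS_SUBCLASS, uri("Book")),
    (uri("Epistle"), RDFS_SUBCLASS, uri("NewTestamentBook")),
    (uri("GeneralEpistle"), RDFS_SUBCLASS, uri("Epistle")),
    # Assumed instance statement, for illustration only.
    (uri("Romans"), RDF_TYPE, uri("Epistle")),
}


def superclasses(cls, graph):
    """Return every class reachable from cls via rdfs:subClassOf."""
    found = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for s, p, o in graph:
            if s == current and p == RDFS_SUBCLASS and o not in found:
                found.add(o)
                stack.append(o)
    return found


def types_of(resource, graph):
    """Return the stated types of a resource plus their superclasses."""
    direct = {o for s, p, o in graph if s == resource and p == RDF_TYPE}
    inferred = set(direct)
    for cls in direct:
        inferred |= superclasses(cls, graph)
    return inferred
```

With these statements, `types_of(uri("Romans"), triples)` includes `Epistle`, `NewTestamentBook`, and `Book`, even though only the first was stated directly. A real application would more likely hand this to an RDF toolkit with RDFS inference than walk the triples by hand.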

There's no information in this that would surprise anybody: the main benefit i see is providing URIs that uniquely identify these books as resources for the emerging Semantic Web, something i haven't seen "out there" yet. This follows the recommendation of the Architecture of the World Wide Web that everything of importance should have a URI (Tim Berners-Lee even suggests that you provide one for yourself). So this provides a definitive URI for, e.g., the book of Romans, which other resources can refer to (including creating their own versions and stating equivalence) and extend. These URIs are names, not (yet) retrievable resources: so if you visit any of these in a browser, you won't find anything (but that's accepted practice for URIs).

Note that "Book" here is an abstract class, not any particular version or translation of one (those are Texts in this scheme). I got here by starting to layer additional OWL on top of the Composite Gospel Index in RDF (which has been out for nearly a year and a half now), and realized i hadn't properly defined the vocabulary i was using. So here i'm attempting to provide a fairly generic basic vocabulary that other vocabularies can build upon.

Some issues:

  • i'm not completely happy with the URI naming conventions where authors and conventional book names collide (e.g. Mark, both an author and the conventional name of a Gospel). The current version uses the shorter name in constructing the identifier for the author, and a longer version for the book. That's alright, but it means some book URIs are formed differently than others (e.g., where no prefixing of "LetterTo" or other special treatment is required). An alternative is to use a separate namespace for authors, so the author and the book could share the same short name, each under its own namespace. But i'm a little uneasy with encoding classes in the namespace. Since there are distinct RDF properties for the names themselves, however, maybe this isn't such a big deal.
  • Typically you'd separate out instance data from the vocabulary itself, so the vocabulary can be used to define your own instances. In this case, though, the instances are really part of the reference information, and a closed set (at least in the conventional view!), so i've included them.
  • I've included properties for authorship, though of course that's problematic, even if you only intend to represent traditional authorship claims.
  • The list of text types isn't very complete, though perhaps it's good enough for the New Testament, my main focus for the time being. Longer term, it ought to include other things like Prayer, perhaps Song?
  • I need to add more documentation before publishing it (though most of it ought to be clear).
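
To make the first issue above concrete, here is a small Python sketch comparing the two naming schemes: one namespace with a longer local name for colliding books, versus separate namespaces for authors and books. Every URI in it is a placeholder, and the longer name `GospelOfMark` is only a guess at what such a name might look like, not the identifier the draft actually uses.

```python
# All namespaces below are hypothetical placeholders.
VOCAB = "http://example.org/biblebooks#"
AUTHORS = "http://example.org/biblebooks/authors#"
BOOKS = "http://example.org/biblebooks/books#"

# Assumed longer local names for books whose conventional name
# collides with an author's name (illustrative guess only).
LONG_BOOK_NAMES = {"Mark": "GospelOfMark"}


def single_namespace(name, kind):
    """One namespace: authors keep the short name, colliding books get a longer one."""
    if kind == "author":
        return VOCAB + name
    if kind == "book":
        return VOCAB + LONG_BOOK_NAMES.get(name, name)
    raise ValueError("kind must be 'author' or 'book'")


def separate_namespaces(name, kind):
    """Two namespaces: the same short name is reused, the namespace tells them apart."""
    if kind == "author":
        return AUTHORS + name
    if kind == "book":
        return BOOKS + name
    raise ValueError("kind must be 'author' or 'book'")
```

In the first scheme, a book like Acts keeps its plain name while Mark needs special treatment, which is the inconsistency described above. In the second, every book URI is formed the same way, at the cost of encoding the class in the namespace.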

Frankly, it seems to me this ought to be the work of some more respectable standards body like SBL or OSIS, rather than little old me. But since i'm already at a point where i need something like this, i'm forging ahead, and i'd appreciate any feedback that sem-web savvy Blogos readers might have.

The vocabulary is located here. Note this is a temporary location and should not be considered stable: it will definitely move once finalized.

9:32:00 AM