Friday, February 17, 2006

One of my big goals with SemanticBible continues to be a freely sharable New Testament lexicon that

  • has a semantic organization
  • connects existing lexical resources in Greek and English, particularly Louw-Nida, Strong's, and WordNet
  • lives on the web as a resource (using URIs), freely sharable and structured to be re-usable (which to me means RDF or OWL)

In fact, i was blogging about this stuff well over two years ago: it's discouraging how little progress i've made since then!

Here's a thought experiment about how such a lexicon might be structured, starting from available resources (and happily ignoring the thorny issues of publishers and copyright!). The work involved probably exceeds what i can accomplish, at least right now: that's why it's a thought experiment. But i keep finding myself backing into such a project from different directions.

The fundamental unit of this lexicon is the assignment of a word form to a sense. For example, the verb hubrizo has at least two senses: 'insult' and 'mistreat' (the NAS lexicon includes a third, 'to be insolent'). So there would be two (or three) entries corresponding to the dictionary form hubrizo. Louw-Nida provides this kind of sense-mapping information.
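
To make that unit concrete, here's a minimal sketch in Python (the class and field names are mine, purely for illustration): one dictionary form fans out into one entry per sense.

    from dataclasses import dataclass

    @dataclass
    class SenseEntry:
        lexical_form: str   # dictionary (lemma) form, transliterated here
        gloss: str          # English gloss for this particular sense

    # hubrizo yields two entries (three, if you count the extra NAS sense)
    hubrizo_entries = [
        SenseEntry("hubrizo", "insult"),
        SenseEntry("hubrizo", "mistreat"),
        SenseEntry("hubrizo", "to be insolent"),
    ]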

Each lexical entry would have several reference properties (a sketch of one such entry follows this list):

  • the lexical form itself in Greek, as well as an English gloss (i can't render Greek adequately in my blog, unfortunately)
  • the part of speech (here, Verb)
  • the Strong's number (here, 5195): this keys the entry to several existing lexicons
  • the domain and sub-domain identifiers from Louw-Nida: in the case of hubrizo/insult, that's Communication (#33) and Insult, Slander (P'), as well as the sense index (33.390, which also includes another sense, enubrizo/insult; this would be a different entry, however)
  • the corresponding English sense from WordNet: here, that's the verb synset 00839460, which is an instance of verb.communication and includes the other verbs "diss" and "affront", neither of which occurs (as a verb) in any of the common Bible translations.
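
Pulling those properties together, here's a rough sketch of what a single entry might look like as RDF, built with Python and the rdflib library. The namespace and the property names (lex:strongsNumber and so on) are invented for illustration, not an existing vocabulary; a real design would take more care.

    from rdflib import Graph, Literal, Namespace
    from rdflib.namespace import RDF

    LEX = Namespace("http://www.semanticbible.org/lexicon#")   # hypothetical namespace

    g = Graph()
    g.bind("lex", LEX)

    entry = LEX["hubrizo-insult"]                 # one entry per form/sense pair
    g.add((entry, RDF.type, LEX.SenseEntry))
    g.add((entry, LEX.lexicalForm, Literal("hubrizo")))        # Greek form, transliterated
    g.add((entry, LEX.gloss, Literal("insult", lang="en")))
    g.add((entry, LEX.partOfSpeech, Literal("Verb")))
    g.add((entry, LEX.strongsNumber, Literal(5195)))
    g.add((entry, LEX.louwNidaDomain, Literal("33 Communication")))
    g.add((entry, LEX.louwNidaSubdomain, Literal("P' Insult, Slander")))
    g.add((entry, LEX.louwNidaIndex, Literal("33.390")))
    g.add((entry, LEX.wordnetSynset, Literal("00839460")))

    print(g.serialize(format="turtle"))

Serialized that way, each entry gets its own URI, which is exactly the "lives on the web" property from the list of goals above.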

With this much information, and a proper Semantic Web or other linkable implementation, you could connect in a number of other resources. Eventually you'd want a concordance back to the text: that takes real work, because you have to determine, for each form in the text, which sense it carries, and that can't be done automatically (though maybe it can be bootstrapped).
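
Here's a rough sketch of what that bootstrapping might look like (the function and data structures are hypothetical): forms whose lemma has only one sense in the lexicon can be tagged mechanically, leaving the genuinely ambiguous ones for a human.

    def bootstrap_senses(tagged_text, senses_by_lemma):
        """tagged_text: iterable of (verse_ref, lemma) pairs from a morphologically
        tagged New Testament; senses_by_lemma: dict of lemma -> list of sense ids."""
        automatic, needs_review = [], []
        for verse_ref, lemma in tagged_text:
            senses = senses_by_lemma.get(lemma, [])
            if len(senses) == 1:
                automatic.append((verse_ref, lemma, senses[0]))   # unambiguous: tag it
            else:
                needs_review.append((verse_ref, lemma, senses))   # leave for a human
        return automatic, needs_review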

How is this different from what we have now? Some of it exists on-line, but only for dictionary forms, without senses (Zack Hubert's lexicon includes nice RESTful URLs). The pairing of dictionary forms and senses occurs in some lexicons (like Louw-Nida or the New American Standard Greek lexicon), but either in print versions or inside proprietary software interfaces like Libronix. Nothing exists in OWL or RDF to my knowledge.

Moving from thought experiment to realization is a big task. While the vocabulary of the New Testament is a modest 5,500 words, Louw-Nida's lexicon has 25,000 senses. But much of the key information already exists: it's "just" a matter of organizing it, combining it, and transforming it to the right representation. If you're interested in pursuing such a goal (or can contribute toward it in some way), send me an email.

