A Semantic Representation for the New Testament


The World Wide Web provides new capabilities for sharing information with humans across the globe, including the words of Scripture. But this information, expressed as it is in human language, is limited in two respects:

The Semantically-Annotated New Testament (SemANT) Project seeks to overcome these two limitations by representing the semantics of Scripture in a way that can be shared and processed across the Internet.

An early requirement for such an endeavor is the selection of an appropriate formalism for representing semantic knowledge. This document describes detailed criteria for such a formalism and presents an illustrative example. It addresses how to automatically translate this formalism into machine-readable representations for distribution across the World Wide Web. It also describes the relationship between this formalism and expressions in human language, including Greek (the language in which the New Testament was written), English, and other languages that currently lack a translation of the New Testament.

Elements of Semantic Representation

The purpose of SemANT is to provide an unambiguous representation for the linguistic meaning of each verse in the New Testament. To achieve this goal, it's necessary to define a representation language which has clear semantics. Such a language includes symbols which refer to a rich domain of information including individuals, concepts, actions, and attributes. It also includes a syntax, namely, rules for combining symbols into expressions, and rules for combining simple expressions into more complex ones.
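As a rough illustration of what "symbols plus a syntax" could look like in practice, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the class names `Symbol` and `Expression`, the four symbol kinds, the parenthesized text rendering, and the symbol names `Love`, `God`, and `World` are all hypothetical and are not SemANT's actual notation.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Symbol:
    """An atomic symbol naming an individual, concept, action, or attribute."""
    kind: str  # hypothetical categories: "individual", "concept", "action", "attribute"
    name: str


@dataclass(frozen=True)
class Expression:
    """A head symbol applied to arguments; arguments may themselves be expressions."""
    head: Symbol
    args: Tuple[Union["Symbol", "Expression"], ...] = ()


def render(term) -> str:
    """Render a term as plain, human-readable text (no binary data, no numeric indices)."""
    if isinstance(term, Symbol):
        return term.name
    inner = " ".join(render(arg) for arg in term.args)
    return f"({term.head.name} {inner})" if inner else f"({term.head.name})"


# Hypothetical symbols, loosely suggested by "God so loved the world" (John 3:16).
god = Symbol("individual", "God")
world = Symbol("concept", "World")
love = Symbol("action", "Love")

# The syntax rule here is simply: a head symbol followed by its arguments.
rendered = render(Expression(love, (god, world)))
print(rendered)
```

Because an argument can be another `Expression`, simple expressions combine into more complex ones using the same rule.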

This representation should be language-neutral: not tied directly to English, Greek, or any other particular human language, so that it is easier to translate into languages that don't yet have a Bible translation. While machine processing remains a primary goal, human readability is an important practical concern: someone should be able to understand in general terms what an expression in this language represents. To this end, the language must be represented in plain text (not binary data), and use readable symbols (not, for example, numeric indices).

This representation should substantially eliminate certain kinds of ambiguity that are common to human languages. Term ambiguity is the tendency for the same word to be used for several different (but usually related) meanings. For example, the Greek word 'logos' is often appropriately translated as 'word' in English, but it also includes other senses like 'message', 'gospel', and 'speech' (something spoken), which can only be determined by the context in which it is used. Other kinds of ambiguity are structural, or syntactic. A classic example for students of linguistics is "I saw the man in the park with a telescope": the speaker may be using the telescope to see the man, or the man may be carrying the telescope.
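The two kinds of ambiguity can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the table `CANDIDATE_SENSES`, the function `annotate`, the concept names, and the nested-tuple notation are hypothetical, and the sense list for 'logos' contains only the four senses mentioned above, not a complete lexicon entry.

```python
# Term ambiguity: one Greek lemma maps to several candidate concept symbols.
# The annotator picks one from context, so the stored annotation is unambiguous.
CANDIDATE_SENSES = {
    "logos": ("Word", "Message", "Gospel", "Speech"),  # illustrative, not exhaustive
}


def annotate(lemma: str, concept: str) -> dict:
    """Record the concept chosen for a lemma, rejecting senses that are not listed."""
    if concept not in CANDIDATE_SENSES.get(lemma, ()):
        raise ValueError(f"{concept!r} is not a listed sense of {lemma!r}")
    return {"lemma": lemma, "concept": concept}


# Structural ambiguity: the two readings of "I saw the man in the park with a
# telescope" become two different trees, so neither tree is ambiguous by itself.
instrument_reading = (
    "See", "Speaker", ("Man", ("In", "Park")), ("With", "Telescope")
)  # the telescope modifies the seeing
possession_reading = (
    "See", "Speaker", ("Man", ("In", "Park"), ("With", "Telescope"))
)  # the telescope modifies the man

chosen = annotate("logos", "Message")
print(chosen)
print(instrument_reading != possession_reading)
```

The point of the sketch is that ambiguity lives in the surface sentence, not in the representation: each tree attaches the telescope in exactly one place.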

While human intelligence applied together with context will reduce many kinds of ambiguity, it will by no means eradicate them all. In some cases, Biblical scholars have not been able to agree on what the proper reading of a verse should be. In such cases, the semantic language should allow the representation of alternatives, rather than forcing a choice. In addition, SemANT will only address ambiguity of linguistic expression, not philosophy or depth of understanding. For example, Jesus said "Heaven and earth will pass away, but my words will not pass away." (Matt 24:35) A number of things are clear from the language in this verse: heaven and earth refer to familiar concepts, "my" refers to things that belong to Jesus, and "words" refers to things said (by Jesus, in this context). "To pass away" here means to cease to exist; Jesus describes this as a future event that will happen to heaven and earth but not to his words. However, just as many things (or perhaps more) are not made clear. How will heaven and earth pass away, and when will this happen? What is Jesus' purpose in making this comparison? What does it mean for his words to not pass away? (Obviously the sound waves stopped reverberating long ago!) These are not linguistic questions but theological and philosophical ones, so answering them goes beyond linguistic interpretation.
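The requirement to represent alternatives rather than force a choice could be met by a small wrapper that holds every reading still under consideration. The sketch below is hypothetical throughout: the class name `Alternatives`, the `(Or ...)` rendering, and the placeholder readings `(ReadingA)` and `(ReadingB)` are invented for illustration and do not correspond to any actual disputed verse.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Alternatives:
    """One or more candidate readings, each already rendered as plain text."""
    readings: Tuple[str, ...]

    def __post_init__(self):
        if not self.readings:
            raise ValueError("at least one reading is required")

    @property
    def is_disputed(self) -> bool:
        """True when more than one reading is being kept."""
        return len(self.readings) > 1

    def render(self) -> str:
        """A single reading renders as itself; several are wrapped in (Or ...)."""
        if not self.is_disputed:
            return self.readings[0]
        return "(Or " + " ".join(self.readings) + ")"


settled = Alternatives(("(ReadingA)",))
disputed = Alternatives(("(ReadingA)", "(ReadingB)"))

print(settled.render())
print(disputed.render())
```

Keeping the alternatives explicit means a downstream tool can tell a settled annotation from a disputed one without consulting anything outside the representation.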

Iterative Development of Semantic Annotation

While creating a full representation of the linguistic content for an expression is a highly detailed task, incremental progress should be possible. The following levels represent successively more detailed (and more difficult) types of annotation:

There are numerous other representation problems to address. Narrative portions of Scripture (like the Gospels and Acts) frequently include both quoted speech (Jesus said "...") and reported speech (he said to them, "... do not presume to say to yourselves, 'We have Abraham as our father' ...", Matt 3:7-9).

An Example

Here's a detailed example, based on the first part of Col 3:16, "Let the word of Christ dwell in you richly ...".

I've freely invented both notation and symbols for this example: they should be taken as indicative of the kind of representation that is likely to be used, though the details may vary.

