James Strong, IT Heavyweight

James Strong, LL.D., S.T.D.

The author of Strong's Exhaustive Concordance was serious about data long before anybody had conceived of Information Technology as an occupation. My print version of Strong's has 1390 pages of small font, three-column excerpts from Bible verses, indexed by each and every single word in the King James Version. Even function words like "the" are included, though they're presented in a compressed format that just references the verse (10 pages for "the" in an 18-column format!).
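A concordance like this is, at heart, an inverted index: every word maps to the places it occurs. Here is a toy version in Python to make that concrete. It's a minimal sketch, not how Strong (or his many assistants) actually worked. The three verses, the FUNCTION_WORDS set and the build_concordance name are my own illustrative choices. Content words keep the whole verse as their "excerpt" (Strong printed a trimmed phrase), while function words keep only the verse reference, loosely mirroring his compressed listing for "the".

```python
import re
from collections import defaultdict

# A few KJV verses, used only as illustrative input.
VERSES = {
    "Gen 1:1": "In the beginning God created the heaven and the earth.",
    "John 1:1": "In the beginning was the Word, and the Word was with God, and the Word was God.",
    "John 11:35": "Jesus wept.",
}

# Hypothetical set of function words that get the compressed, reference-only treatment.
FUNCTION_WORDS = {"the", "and", "in", "was", "with"}


def build_concordance(verses):
    """Map each lowercased word to the verses it occurs in.

    Content words store (reference, verse text); function words store only
    the reference. Each verse is listed at most once per word.
    """
    index = defaultdict(list)
    for ref, text in verses.items():
        seen = set()
        for word in re.findall(r"[A-Za-z']+", text):
            key = word.lower()
            if key in seen:  # skip repeats of the same word within one verse
                continue
            seen.add(key)
            if key in FUNCTION_WORDS:
                index[key].append(ref)
            else:
                index[key].append((ref, text))
    return index


concordance = build_concordance(VERSES)
print(concordance["the"])   # ['Gen 1:1', 'John 1:1']
print(concordance["wept"])  # [('John 11:35', 'Jesus wept.')]
```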

In contrast, the preface is a single page: apparently Strong felt the data spoke for themselves. One additional page provides directions and explanations (I wonder what Tufte would say about the presentation). Everything else in the main concordance is just data.

I've been working on converting Strong's Greek Lexicon of the New Testament to an XML format, starting from the text that's incorporated in Crosswire's excellent Sword Project. I'm not quite there yet, but I've been impressed by Strong's intuitive grasp of the value of structured data, even though in a pre-computer era he could express it only through typography.

Structuring a Greek Lexicon

Strong's numbers have become the de facto reference system for Greek and Hebrew words in the Bible. This may be his single most important design choice: it lets someone who doesn't read the original languages find the precise lexical entry with ease. It also abstracts away from grammatical details, unifying, for example, the different inflected forms of a verb under a single number. Of course, you can only go so far without really understanding Greek: remember the adage about a little knowledge being a dangerous thing. But the popularity of Strong's system is certainly due in large measure to the ease with which complete novices can use it to find additional information about the original language. Some subsequent works (like Thayer's) use the same system.

Working with the whole dictionary, I discovered that entries 3203 through 3302 aren't there! You'd probably never notice the gap in normal use. I confirmed that the gap exists in my print version as well. I'd love to know why: given that the words are in alphabetical order, it's quite curious. I'm tempted to suspect that he accidentally jumped from 3202 to 3303, and nobody noticed until changing the numbering would have been prohibitively hard (with an electronic version, of course, it would be trivial). Entry 2717 is also mysteriously missing.
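Holes like these are easy to find once the entry numbers are in a list. Here is a minimal sketch, assuming you have already pulled the numbers out of the Sword text into a Python list. find_gaps is a hypothetical helper, and the input below is built from the gaps described above rather than parsed from the actual module.

```python
def find_gaps(numbers):
    """Return (first_missing, last_missing) ranges absent from a run of entry numbers."""
    ordered = sorted(set(numbers))
    gaps = []
    # Compare each number with its successor; any jump bigger than 1 is a gap.
    for prev, cur in zip(ordered, ordered[1:]):
        if cur - prev > 1:
            gaps.append((prev + 1, cur - 1))
    return gaps


# Illustrative input built from the gaps described in the text.
entries = [n for n in range(2700, 3310) if n != 2717 and not 3203 <= n <= 3302]
print(find_gaps(entries))  # [(2717, 2717), (3203, 3302)]
```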

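The entries follow a consistent pattern: the entry number and Greek headword, a derivation (often just pointers to other entries, like "from 1223 and 3049"), a brief definition, and finally, after a ':--' separator, the words the KJV uses to translate the term.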

I've been surprised at how much of each Strong's entry is devoted to derivational information. A typical entry for a minor term spends as many bytes pointing you to other entries as it does on the definition itself.
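Those pointers are structured data in their own right. As a small sketch, you can pull the cited numbers out of a derivation with a regular expression, then invert them into "what derives from this word?" links. This assumes the derivation text has already been split out of each entry. The mini-lexicon is hypothetical and holds a single entry keyed by 1260, which I believe is the number of the sample entry shown further down.

```python
import re


def derivation_refs(derivation):
    """Return the Strong's numbers cited in a derivation note like 'from 1223 and 3049'."""
    return [int(n) for n in re.findall(r"\b\d+\b", derivation)]


# Hypothetical mini-lexicon: entry number -> derivation text.
derivations = {1260: "from 1223 and 3049"}

# Invert the pointers: for each cited entry, which entries derive from it?
derived_from = {}
for number, text in derivations.items():
    for ref in derivation_refs(text):
        derived_from.setdefault(ref, []).append(number)

print(derivation_refs("from 1223 and 3049"))  # [1223, 3049]
print(derived_from)  # {1223: [1260], 3049: [1260]}
```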

The inadequacy of his typographic approach becomes painfully obvious when you get to common words with many different meanings and KJV translations. For these, Strong devised a set of symbols: '+' indicates a many-to-one mapping from Greek to English, 'X' a Greek idiom, parentheses a function word or syllable that isn't the main word (e.g. '(let) go' for "ago"), and brackets an extra word required in the English rendering.

It's pretty clear by this point that you really want a structured representation for all this: typography alone isn't a rich enough data representation for everything he wants to do.
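Here's a rough sketch of what a structured version of those symbols might look like. Each comma-separated KJV rendering becomes a record with one field per marker. The Rendering class and parse_renderings function are hypothetical, and the flag meanings follow my descriptions above rather than Strong's own wording. The parser also assumes that '+' and 'X' appear as a leading marker followed by a space, which may not hold for every entry in the Sword text. The second input string is made up for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class Rendering:
    text: str               # English rendering with ( ) and [ ] markers removed
    combined: bool = False  # '+': many-to-one mapping from Greek to English
    idiom: bool = False     # 'X': rendering arising from a Greek idiom
    optional: tuple = ()    # words in (parentheses), e.g. 'let' in '(let) go'
    supplied: tuple = ()    # words in [brackets], extra words in the English


def parse_renderings(gloss):
    """Turn a gloss list like 'cast in mind, consider, ...' into Rendering records."""
    results = []
    for raw in gloss.rstrip(".").split(","):
        item = raw.strip()
        if not item:
            continue
        combined = idiom = False
        # Assumption: '+' and 'X' appear as a leading marker plus a space.
        if item.startswith("+ "):
            combined, item = True, item[2:]
        elif item.startswith("X "):
            idiom, item = True, item[2:]
        results.append(Rendering(
            text=re.sub(r"[()\[\]]", "", item).strip(),
            combined=combined,
            idiom=idiom,
            optional=tuple(re.findall(r"\(([^)]*)\)", item)),
            supplied=tuple(re.findall(r"\[([^\]]*)\]", item)),
        ))
    return results


for r in parse_renderings("+ as long as, X after, (let) go"):
    print(r)
```

Using flags rather than keeping the symbols in the text means a later step can, for instance, filter out idiomatic renderings without re-parsing the typography.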

Parsing Strong's Greek Dictionary

I created a new, ad hoc XML schema for the results. I worked for a while on making it OSIS-compliant, but I haven't studied the specification closely enough yet to know exactly what that would require.

Here's a sample entry, in all its glory:

derivation=" from 1223 and 3049" gloss="cast in mind, consider, dispute, muse, reason, think.">
to reckon thoroughly, i.e. (genitive case) to deliberate (by reflection or discussion)

from 1223 and 3049; to reckon thoroughly, i.e. (genitive case) to deliberate (by reflection or discussion):--cast in mind, consider, dispute, muse, reason, think.
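For illustration, here is a minimal Python sketch of this kind of conversion, using the standard library's xml.etree.ElementTree. It is not the original conversion code. The entry element name and the n attribute are hypothetical stand-ins for my ad hoc schema; only the derivation and gloss attribute names come from the sample above. The splitting logic assumes the derivation ends at the first ';' and the KJV renderings begin after ':--', which won't hold for every entry.

```python
import xml.etree.ElementTree as ET

# Plain-text form of the sample entry, starting at the derivation.
RAW = ("from 1223 and 3049; to reckon thoroughly, i.e. (genitive case) to deliberate "
       "(by reflection or discussion):--cast in mind, consider, dispute, muse, reason, think.")


def entry_to_xml(raw, number=None):
    """Split 'derivation; definition:--gloss' into one XML element.

    Element and attribute names ('entry', 'n') are a hypothetical schema, not OSIS.
    """
    body, _, gloss = raw.partition(":--")           # KJV renderings follow ':--'
    derivation, _, definition = body.partition(";")  # derivation precedes the first ';'
    entry = ET.Element("entry")
    if number is not None:
        entry.set("n", str(number))
    entry.set("derivation", derivation.strip())
    entry.set("gloss", gloss.strip())
    entry.text = definition.strip()
    return entry


print(ET.tostring(entry_to_xml(RAW, 1260), encoding="unicode"))
```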


After all that... I wrote some code to create the XMLified result, but I'm not happy with the output, I don't want to spend the time fixing it by hand, and I've concluded that Strong's isn't going to give me what I want anyway. I'll try the same kind of thing with Thayer's, if I can find an OSIS version of it. I've been waiting to finish this post (which I started two weeks ago), but I've finally concluded that this idea isn't going to be born. I guess that's what sometimes happens in blogland.

I haven't seen the newer, updated versions of his concordance, like the oddly titled "Strongest Strong's Exhaustive Concordance of the Bible." The one on my desk has probably been in my possession for 25 years or more. It has my notes in the front about "shamar" (Hebrew entry #8104), a Hebrew verb that I used to coin a meaningful middle name for Claire Shamara, my first-born, now 20 years old.