Thursday, September 16, 2004

I've been working for several weeks with the GEDCOM 6.0 XML format for genealogy data, and I've entered about half of the assorted written information I have into an XML file, which an XSLT stylesheet renders into a browsable version (all are works in progress). The stylesheet closely follows an example by Michael Kay (who literally wrote the book on XSLT), and studying it taught me a number of cool new tricks for this sometimes baroque but very powerful language.

One difficulty with using the GEDCOM 6.0 format is that information about an individual is distributed across several different elements, linked by an ID. For example, here's my individual information:

 <IndividualRec Id="BoisenSean">
   <NamePart Type="given name" Level="3">Sean Cornell</NamePart>
   <NamePart Type="surname" Level="1">Boisen</NamePart>
   <PersInfo Type="occupation">
     <Information>computer scientist, manager</Information>
   </PersInfo>
 </IndividualRec>

but my birth is represented in a separate event record:

 <EventRec Id="BoisenSeanBirth" Type="birth" VitalType="birth">
   <Link Target="IndividualRec" Ref="BoisenSean" />
   <Link Target="IndividualRec" Ref="ClaycombDorothy" />
   <Link Target="IndividualRec" Ref="BoisenElliott" />
   <Date Calendar="Gregorian">September 21, 1958</Date>
   <PlacePart Type="town" Level="4">Tacoma</PlacePart>,
   <PlacePart Type="state" Level="2">Washington</PlacePart>,
   <PlacePart Type="country" Level="1">United States</PlacePart>
 </EventRec>

My death (had it already taken place) would be yet another event record, as would my marriage or any other event. My membership in a family is recorded by still another element type.
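To make the "linked by an ID" point concrete, here is a minimal Python sketch (not part of my actual toolchain) that gathers every event record for one individual by matching each Link's Ref against the individual's Id, which is essentially a relational join. The <GEDCOM> wrapper element and the flat nesting are simplifying assumptions; the full GEDCOM 6.0 XML schema may nest these elements more deeply, which is why the code searches for Link elements at any depth.

```python
import xml.etree.ElementTree as ET

# Sample modeled on the two records above. The <GEDCOM> wrapper and the flat
# nesting are simplifying assumptions, not the complete GEDCOM 6.0 XML schema.
SAMPLE = """<GEDCOM>
  <IndividualRec Id="BoisenSean">
    <NamePart Type="given name" Level="3">Sean Cornell</NamePart>
    <NamePart Type="surname" Level="1">Boisen</NamePart>
  </IndividualRec>
  <EventRec Id="BoisenSeanBirth" Type="birth" VitalType="birth">
    <Link Target="IndividualRec" Ref="BoisenSean" />
    <Link Target="IndividualRec" Ref="ClaycombDorothy" />
    <Link Target="IndividualRec" Ref="BoisenElliott" />
    <Date>September 21, 1958</Date>
  </EventRec>
</GEDCOM>"""


def events_for(root, person_id):
    """Return every EventRec that links to person_id (a join on Ref = Id)."""
    matches = []
    for event in root.iter("EventRec"):
        # iter() searches at any depth, so Links inside wrapper elements are found too
        refs = {link.get("Ref") for link in event.iter("Link")
                if link.get("Target") == "IndividualRec"}
        if person_id in refs:
            matches.append(event)
    return matches


root = ET.fromstring(SAMPLE)
for event in events_for(root, "BoisenSean"):
    print(event.get("Id"), event.get("Type"), event.findtext("Date"))
# prints: BoisenSeanBirth birth September 21, 1958
```

Because the Link elements in this excerpt carry no role, the same lookup for ClaycombDorothy also returns the birth record; telling the child from the parents takes more information than a bare ID reference.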

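All of this gives the format very much the feel of a relational data structure, because that's essentially what it is: each kind of fact lives in its own record type and the records are joined by ID, for the same reasons relational structures are appropriate elsewhere (each fact is stored once, and a single event can involve any number of people). But for the simpler cases, I think it would be nice to start from a more compact structure like this: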

ID: BoisenSean
NAME: Boisen, Sean Cornell
OCCUPATION: computer scientist
BORN: September 21, 1958
   AT Tacoma, Washington, United States
   OF father [BoisenElliott]
   OF mother Boisen, Dorothy Louise (Claycomb)
MARRIED: July 25, 1998
   TO Zarba, Donna Irene (Jones)
   AT Andover, Essex County, Massachusetts, United States
NOTE: married at Free Christian Church by Jack L. Daniel
DIED: September 20, 2018
NOTE: this hasn't happened yet

and use a program to generate the corresponding GEDCOM XML elements from it. Of course, this output won't be fully linked into other records (if it could be, you wouldn't need the distributed representation in the first place), so it will require some manual adjustment. But particularly since I hope to gather a lot more information from relatives who don't even know how to spell XML, a more approachable input format seems necessary.

I'm working on some Perl code to process this, which I'll post once it's done (not quite yet).

11:51:26 PM