GEDCOM: The Next Generation
Most genealogists who use computers to store their ancestral data can tell you that GEDCOM is important, but they probably cannot say what it does, or even what GEDCOM stands for. And yet hundreds of technically oriented genealogists around the world are working together to design a better GEDCOM, one that will support the process of genealogical research itself, not just the recording of research results.
GEDCOM is the data format that allows different types of computers and programs to exchange genealogical data. The name is an acronym for Genealogical Data Communication. GEDCOM was originally developed by the Family History Department of the Church of Jesus Christ of Latter-day Saints to pass data among the Church's various computer systems at the Family History Library, LDS temples, and Church archives, as well as its Personal Ancestral File program. Virtually all other vendors of genealogy programs have since adopted it and integrated it into their products, making it the standard for data transfer among genealogy programs.
Thanks to GEDCOM, you can enter data into Family Tree Maker, export it to Family Origins, download data from the LDS Ancestral File and integrate it into your own, then upload your finished project to an online database such as GENSOURCE or convert the data to Web pages. You can send your data to relatives without regard to the computer program they are using.
A GEDCOM file is a plain text file that can be read by any word processor, but seldom is. That's because it was meant to convey data from one computer program to another and so its content is arranged in a sort of code. Experts can look into a GEDCOM file and understand a family structure, but reading GEDCOM directly is not for the faint of heart.
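The "code" is less mysterious than it looks. Each GEDCOM line carries a level number, an optional cross-reference ID such as @I1@, a tag such as NAME or BIRT, and an optional value; the level numbers nest each line under the record above it. The Python below is a minimal sketch of reading that structure, assuming well-formed input. The names `parse_gedcom` and `Node` are hypothetical, and the sketch ignores character sets and the CONC/CONT continuation lines a full reader would need.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field
from typing import Optional

# Simplified GEDCOM line: level, optional @XREF@, tag, optional value.
LINE_RE = re.compile(r"^\s*(\d+)\s+(?:(@[^@]+@)\s+)?(\S+)(?:\s(.*))?$")


@dataclass
class Node:
    level: int
    tag: str
    value: str = ""
    xref: Optional[str] = None
    children: list[Node] = field(default_factory=list)


def parse_gedcom(text: str) -> list[Node]:
    """Nest GEDCOM lines into records using their level numbers (sketch only)."""
    roots: list[Node] = []
    stack: list[Node] = []  # stack[i] holds the most recent node at level i
    for raw in text.splitlines():
        if not raw.strip():
            continue  # tolerate blank lines
        m = LINE_RE.match(raw)
        if m is None:
            raise ValueError(f"not a GEDCOM line: {raw!r}")
        level = int(m.group(1))
        node = Node(level, m.group(3), m.group(4) or "", m.group(2))
        del stack[level:]  # drop deeper nodes; the stack now ends at the parent
        if level == 0:
            roots.append(node)
        elif len(stack) != level:
            raise ValueError(f"level {level} has no parent: {raw!r}")
        else:
            stack[-1].children.append(node)
        stack.append(node)
    return roots


# A hypothetical one-person file: an INDI record followed by the trailer.
sample = """0 @I1@ INDI
1 NAME John /Smith/
1 BIRT
2 DATE 1 JAN 1850
2 PLAC Cork, Ireland
0 TRLR"""

records = parse_gedcom(sample)
person = records[0]
birth = person.children[1]
print(person.xref, person.tag, person.children[0].value)
print([(c.tag, c.value) for c in birth.children])
```

The stack-of-parents approach mirrors how the level numbers work: a line at level n always belongs to the most recent line at level n - 1.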
The GEDCOM standard itself has evolved over the years to its current level, version 5.5. Of course, how well various programs implement this standard determines how successful a transfer of data will be. Problems with varying field lengths, field names, and even types of relationships have caused some users to complain that GEDCOM is not flexible enough for their needs. Also, the latest generation of genealogy programs is keeping track of far more types of data than the GEDCOM standard has defined for transfers. For example, recording sources of information associated with events or facts in a genealogy program is becoming important. Yet, there is no uniform way to transfer this source information to another program using GEDCOM in its current state.
In the past few months, three separate developments have been announced that, I believe, will lead to the next generation of genealogical data transfer standards. Genealogists should watch them carefully and give input to the standards design process when appropriate. Together, I call them "GEDCOM: The Next Generation."
GEDCOM (FD)
On May 1, 1998, the LDS Church Family History Department announced a proposed GEDCOM standard to replace GEDCOM 5.5. Called "GEDCOM (Future Direction)," the draft standard was the topic of a presentation a few days later at the National Genealogical Society convention in Denver, Colorado. According to Jed Allen of the Family History Department, who sent out the announcement message, "The genealogical information model underlying the GEDCOM 5.5 standard has approximately 80 entities (including records, repeating groups and multi-valued fields), and about 140 uniquely defined fields, many of which are used in multiple entities. By comparison, the genealogical information model underlying the GEDCOM (Future Direction) standard has approximately 300 entities and about the same number (140) uniquely defined fields. GEDCOM (Future Direction) allows developers to define more genealogical subjects, but does it for the most part with field definitions that we are familiar with. The two standards documents are roughly the same size by page count due in part to the improved normalization of the GEDCOM (Future Direction) standard."
According to Allen, the GEDCOM (FD) document is being proposed for discussion by the genealogical community. A version of the document is available for download at http://web.genealogia.fi/wwwhome/kaila/gedfevpr.exe . It is in WordPerfect Envoy format, a portable document format similar to, but not compatible with, Adobe's PDF. It comes with a stand-alone viewer that lets readers append notes and comments. Allen invited all interested parties to send him their comments electronically as part of the review process.
According to one person who attended the presentation at NGS, LDS Church representatives predicted a gradual changeover to the new GEDCOM over the next two years.
GedML
Meanwhile, several developers have been working with XML, the new Extensible Markup Language being proposed for exchanging structured information over the Internet. XML is a data exchange format related to HTML, the HyperText Markup Language of the World Wide Web; both derive from SGML (Standard Generalized Markup Language), which was developed for the publishing industry. XML is being developed to overcome the limitations of HTML and to allow the Web to be used for electronic commerce, database access, and many other computing and communication tasks.
Web browsers that fully handle XML are not yet available, although version 4.0 of Microsoft's Internet Explorer already has some XML capabilities built in. Specialized XML vocabularies are rapidly being developed for various disciplines. For example, Microsoft has promised that the next versions of Word, Excel, and PowerPoint will save their files in an XML format for easy exchange over the Web. Chemists, pharmacists, auto dealers, librarians, and experts in many other fields are developing XML document type definitions (DTDs) for their fields.
GedML, genealogical data expressed in XML, was announced on April 15, 1998, by Michael H. Kay, a computer scientist in England. GedML uses the GEDCOM 5.5 data model to express genealogical information in a browsable XML format. Kay published GedML at URL http://users.iclway.co.uk/mhkay/gedml/index.html and is seeking comments on his proposal. Kay sees an XML version of GEDCOM as a way to use industry-standard techniques to solve GEDCOM's compatibility problems. He cited a number of benefits, including intelligent indexing of genealogical data by Web search engines that can distinguish, for example, between Ireland as a country and Ireland as a surname. Instead of searching for sites that merely mention Ireland, you would be able to search for sites containing genealogical data about people with the surname Ireland.
Although GedML has attracted little notice since its introduction, it has the potential to eliminate entirely the need for GEDCOM as we know it, and particularly the need for any future direction of GEDCOM.
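Kay's actual GedML element and attribute names are defined in his published proposal and are not reproduced here. As a hedged illustration of the general idea only, the Python sketch below turns GEDCOM lines into nested XML elements named after their GEDCOM tags; the `GED` root element, the `ID` attribute, and all function names are assumptions. It then shows the kind of structure-aware query a search engine could run: because a surname sits inside a NAME element and a place inside a PLAC element, Ireland the surname and Ireland the country no longer look alike.

```python
import re
import xml.etree.ElementTree as ET

# Simplified GEDCOM line: level, optional @XREF@ (captured without the @s), tag, value.
LINE_RE = re.compile(r"^\s*(\d+)\s+(?:@([^@]+)@\s+)?(\S+)(?:\s(.*))?$")


def gedcom_to_xml(text: str) -> ET.Element:
    """Nest GEDCOM lines as XML elements named after their tags (illustrative mapping)."""
    root = ET.Element("GED")  # hypothetical root element name
    stack = [root]  # stack[level] is the parent element for lines at that level
    for raw in text.splitlines():
        m = LINE_RE.match(raw)
        if m is None:
            continue  # this sketch simply skips blank or malformed lines
        level, xref, tag, value = int(m.group(1)), m.group(2), m.group(3), m.group(4)
        del stack[level + 1:]
        if len(stack) != level + 1:
            raise ValueError(f"level {level} has no parent: {raw!r}")
        elem = ET.SubElement(stack[-1], tag)
        if xref:
            elem.set("ID", xref)  # hypothetical attribute holding the record ID
        if value:
            elem.text = value
        stack.append(elem)
    return root


def ids_with_surname(doc: ET.Element, surname: str) -> list:
    """People whose NAME carries the surname; GEDCOM marks surnames with slashes."""
    return [indi.get("ID") for indi in doc.iter("INDI")
            if any(f"/{surname}/" in (n.text or "") for n in indi.iter("NAME"))]


def ids_with_place(doc: ET.Element, word: str) -> list:
    """People with an event place (PLAC) that mentions `word`, e.g. a country."""
    return [indi.get("ID") for indi in doc.iter("INDI")
            if any(word in (p.text or "") for p in indi.iter("PLAC"))]


# Hypothetical data: one person surnamed Ireland, one person born in Ireland.
doc = gedcom_to_xml("""0 @I1@ INDI
1 NAME Mary /Ireland/
1 BIRT
2 PLAC Boston, Massachusetts
0 @I2@ INDI
1 NAME Sean /Murphy/
1 BIRT
2 PLAC Cork, Ireland""")

print(ET.tostring(doc.find("INDI"), encoding="unicode"))
print(ids_with_surname(doc, "Ireland"), ids_with_place(doc, "Ireland"))
```

A plain text search for "Ireland" would match both people; the element structure is what lets the two queries return different answers.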
Lexicon Working Group
For GEDCOM or even GedML to work at all, everyone using these standards must agree on an underlying data model of how genealogical information relates to the real world. Officers of GenTech, the Texas nonprofit organization that sponsors the annual convention of the same name, have supported a genealogical Lexicon Working Group (LWG) for the past several years, with the purpose of defining and standardizing terms used by genealogists. While this might seem basic and unnecessary, a stable foundation of terms is vitally important for building higher structures like GEDCOM and GedML.
Co-sponsored by the Federation of Genealogical Societies (FGS), the LWG has most recently been working on a data model for genealogical research that describes, for technologists, the activities and thought processes a researcher goes through when working on a genealogy. According to Beau Sharbrough, president of GenTech and a member of the LWG, the group intends to issue its report in August as an RFC, or Request for Comments, similar to the way Internet standards are developed and published. At least two of the three GenTech-sponsored sessions at the FGS convention in Cincinnati, Ohio, this month will address LWG issues and their implications for genealogy programs.
Some information on the Lexicon Working Group is available at the GenTech web site, URL http://www.gentech.org/lexicon.htm . (However, that page does not appear to have been updated in the last six months, and the FGS web site, www.fgs.org , does not mention the project.)
The Next Generation
It will be interesting over the next few months to see how these three proposals slug it out in the genealogical arena. What is clear is that everyone agrees the current data model for GEDCOM, and for most genealogy programs, is outdated. Genealogical data as presented today are merely assertions or conclusions, with very little reasoning or support attached to the entries in a program. The next generation of genealogy programs, some of which are already appearing, will force users to list their sources and reasoning for claiming dates, places, and relationships. Room will be given for noting alternate conclusions, citing multiple spellings, storing questionable but interesting data, and evaluating source material. This flexibility will make genealogy programs more difficult to use, but their data should be more reliable.
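The shift described above, from bare conclusions to claims backed by sources and reasoning, can be pictured with a small data model. The Python below is a hypothetical sketch, not the Lexicon Working Group's model or GEDCOM (FD); every class, field, and sample record is an assumption made for illustration. It refuses a claim that lacks a source or a stated reason, and it keeps competing conclusions side by side instead of letting one overwrite another.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Source:
    citation: str  # free-text citation; a fuller model would structure this


@dataclass
class Assertion:
    """One claim about a person, with the evidence and reasoning behind it."""
    person: str
    fact: str  # e.g. "birth year"
    value: str
    sources: list[Source] = field(default_factory=list)
    reasoning: str = ""


class ResearchFile:
    """Hypothetical container that insists every claim be supported."""

    def __init__(self) -> None:
        self.assertions: list[Assertion] = []

    def record(self, claim: Assertion) -> None:
        # Reject bare conclusions: a claim needs both evidence and an explanation.
        if not claim.sources or not claim.reasoning.strip():
            raise ValueError("cite at least one source and explain the reasoning")
        self.assertions.append(claim)

    def alternatives(self, person: str, fact: str) -> dict[str, list[str]]:
        """Map each competing value for a fact to the citations supporting it."""
        result: dict[str, list[str]] = {}
        for a in self.assertions:
            if a.person == person and a.fact == fact:
                result.setdefault(a.value, []).extend(s.citation for s in a.sources)
        return result


# Hypothetical, conflicting evidence for one person's birth year.
tree = ResearchFile()
tree.record(Assertion("Sean Murphy", "birth year", "1849",
                      [Source("1880 census, age 31")],
                      "Age 31 at the 1880 census points to 1848 or 1849."))
tree.record(Assertion("Sean Murphy", "birth year", "1850",
                      [Source("Family Bible entry")],
                      "The entry appears to have been written near the time of birth."))
print(tree.alternatives("Sean Murphy", "birth year"))
```

Keeping every value with its citations, rather than a single "answer" field, is what makes room for the alternate conclusions and source evaluation the article anticipates.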
A particular effort is going into enabling both internal and external reference links within a genealogical data file. These links will let later researchers follow references to their sources more easily, probably by clicking on a citation in a browser window. When this is possible, perhaps we will no longer need a dedicated genealogy program, only a web browser, to view our genealogy. On that future day, when all genealogical data is online, supported, and enriched with pictures, stories, and perhaps genetic data, we may simply sit and watch our ancestors' lives flow past on a three-dimensional TV screen. When that comes about, I wonder if we'll even find it interesting anymore.
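GEDCOM already contains one kind of internal link: cross-reference pointers, such as a child's FAMC @F1@ line pointing to the family record @F1@, whose HUSB line in turn points to the father. A link-aware viewer would follow such pointers when a reader clicks on them. The Python sketch below is a simplified illustration under stated assumptions: the function names are hypothetical, and it reads only level-0 records and their level-1 lines. It indexes records by ID and follows two pointers to find a person's father.

```python
import re

# Simplified GEDCOM line: level, optional @XREF@, tag, optional value.
LINE_RE = re.compile(r"^\s*(\d+)\s+(?:(@[^@]+@)\s+)?(\S+)(?:\s(.*))?$")


def index_records(text: str) -> dict:
    """Map each record ID (e.g. '@I1@') to its type and level-1 (tag, value) lines."""
    records = {}
    current = None
    for raw in text.splitlines():
        m = LINE_RE.match(raw)
        if m is None:
            continue
        level, xref, tag = int(m.group(1)), m.group(2), m.group(3)
        value = m.group(4) or ""
        if level == 0:
            # Records without an ID (such as the TRLR trailer) are not indexed.
            current = {"type": tag, "lines": []} if xref else None
            if xref:
                records[xref] = current
        elif level == 1 and current is not None:
            current["lines"].append((tag, value))
    return records


def follow(records: dict, xref: str, tag: str):
    """Return the first pointer stored under `tag` in record `xref`, or None."""
    record = records.get(xref)
    if record is None:
        return None
    for t, v in record["lines"]:
        if t == tag and v.startswith("@") and v.endswith("@"):
            return v
    return None


def name_of(records: dict, xref: str):
    """Return the NAME value of a record, or None if it has none."""
    return next((v for t, v in records[xref]["lines"] if t == "NAME"), None)


def father_of(records: dict, person: str):
    family = follow(records, person, "FAMC")  # family in which the person is a child
    return follow(records, family, "HUSB") if family else None


# Hypothetical file: a son, his father, and the family record linking them.
data = """0 @I1@ INDI
1 NAME Sean /Murphy/
1 FAMC @F1@
0 @I2@ INDI
1 NAME Patrick /Murphy/
1 FAMS @F1@
0 @F1@ FAM
1 HUSB @I2@
1 CHIL @I1@
0 TRLR"""

recs = index_records(data)
dad = father_of(recs, "@I1@")
print(dad, name_of(recs, dad))
```

An external link would work the same way from the reader's point of view; the difference is only that the pointer leads to another file or web address rather than to a record in the same file.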
References
- GEDCOM (Future Direction) document (download): http://web.genealogia.fi/wwwhome/kaila/gedfevpr.exe
- GedML proposal: http://users.iclway.co.uk/mhkay/gedml/index.html
- GenTech/FGS Lexicon Working Group charter: http://www.gentech.org/lexicon.htm
More About XML and New Web Developments
- Byte Magazine, March 1998 cover story: http://www.byte.com/art/9803/sec5/sec5.htm
- The SGML and XML Web Page: http://www.oasis-open.org/cover/sgml-xml.html
- World Wide Web Consortium, XML section: http://www.w3c.org/XML/