A GEDCOM file is a plain text file that can be read by any word
processor, but seldom is. That's because it was meant to convey
data from one computer program to another and so its content is
arranged in a sort of code. Experts can look into a GEDCOM file
and understand a family structure, but reading GEDCOM directly
is not for the faint of heart.
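To give a feel for that "sort of code," here is a minimal sketch in Python that splits GEDCOM lines into their parts. It assumes the basic GEDCOM 5.5 line layout: a level number, an optional cross-reference ID wrapped in @ signs, a tag, and an optional value. The sample record, the GedcomLine type, and the parse_line function are invented for illustration and do not come from any particular genealogy program.

```python
from typing import NamedTuple, Optional


class GedcomLine(NamedTuple):
    """One line of a GEDCOM file, split into its parts (illustrative type)."""
    level: int            # nesting depth: 0 starts a new record
    xref: Optional[str]   # cross-reference ID such as "@I1@", if present
    tag: str              # short code such as INDI, NAME, BIRT, DATE
    value: str            # the rest of the line, possibly empty


def parse_line(line: str) -> GedcomLine:
    """Split a single GEDCOM line into level, optional xref, tag, and value."""
    parts = line.strip().split(" ", 2)
    level = int(parts[0])
    if len(parts) > 1 and parts[1].startswith("@"):
        # Lines like "0 @I1@ INDI" carry an ID before the tag.
        xref = parts[1]
        rest = parts[2] if len(parts) > 2 else ""
        tag, _, value = rest.partition(" ")
    else:
        xref = None
        tag = parts[1]
        value = parts[2] if len(parts) > 2 else ""
    return GedcomLine(level, xref, tag, value)


# A made-up record for one person, in the style of a GEDCOM 5.5 file.
SAMPLE = """\
0 @I1@ INDI
1 NAME John /Ireland/
1 BIRT
2 DATE 12 MAR 1850
2 PLAC Cork, Ireland
"""

records = [parse_line(raw) for raw in SAMPLE.splitlines() if raw.strip()]
for rec in records:
    # Indent by level so the family structure hidden in the numbers shows up.
    print("  " * rec.level + rec.tag, rec.value)
```

The level numbers are what carry the structure: the DATE and PLAC lines belong to the BIRT event only because they sit one level deeper than it.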
The GEDCOM standard itself has evolved over the years to its current
level, version 5.5. Of course, how well various programs implement this
standard determines how successful a transfer of data will be. Problems
with varying field lengths, field names, and even types of relationships
have caused some users to complain that GEDCOM is not flexible enough
for their needs. Also, the latest generation of genealogy programs is
keeping track of far more types of data than the GEDCOM standard has
defined for transfers. For example, recording sources of information
associated with events or facts in a genealogy program is becoming important.
Yet, there is no uniform way to transfer this source information to
another program using GEDCOM in its current state.
In the past few months, three separate developments have been announced
that, I believe, will lead to the next generation of genealogical data
transfer standards. Genealogists should watch these carefully, giving
input to the standards design process when appropriate. I call them,
"GEDCOM, The Next Generation."
GEDCOM (FD)
On May 1, 1998, the LDS Church Family History Department announced
a proposed GEDCOM standard to replace GEDCOM 5.5. Called "GEDCOM (Future
Direction)," the draft standard was the topic of a presentation at the
National Genealogical Society convention a few days later in Denver,
Colorado. According to the Family History Department's Jed Allen, who
sent out the announcement message, "The genealogical information model
underlying the GEDCOM 5.5 standard has approximately 80 entities (including
records, repeating groups and multi-valued fields), and about 140 uniquely
defined fields, many of which are used in multiple entities. By comparison,
the genealogical information model underlying the GEDCOM (Future Direction)
standard has approximately 300 entities and about the same number (140)
uniquely defined fields. GEDCOM (Future Direction) allows developers
to define more genealogical subjects, but does it for the most part
with field definitions that we are familiar with. The two standards
documents are roughly the same size by page count due in part to the
improved normalization of the GEDCOM (Future Direction) standard."
According to Allen, the GEDCOM (FD) document is being proposed for
discussion by the genealogical community. A version of the document
is available for downloading at http://web.genealogia.fi/wwwhome/kaila/gedfevpr.exe.
It is in WordPerfect Envoy format, which is a portable document format
similar to but not compatible with Adobe's PDF. It comes with a stand-alone
document viewer which allows readers to append notes and comments. Allen
invited all interested parties to send their electronic comments to
him as part of the review process.
According to one person who attended the presentation at NGS, LDS Church
representatives predicted a gradual changeover to the new GEDCOM over
the next two years.
GedML
Meanwhile, several developers have been working with the new Extensible
Markup Language being proposed for exchanging structured information
over the Internet. XML, as it is known, is a format for data exchange
that is related to HTML, the Hypertext Markup Language of the World
Wide Web. Both are derived from SGML, the format that was developed for
the publishing industry. XML is being developed to overcome the limitations
of HTML and allow the Web to be used for electronic commerce, access
to databases, and for many other computing and communication tasks.
Web browsers that can handle XML are not yet available, although Microsoft's
Internet Explorer in version 4.0 already has some XML capabilities built
in. Specialized versions of XML are rapidly being developed for various
disciplines. For example, Microsoft has promised that Word, Excel, and
PowerPoint, in their next versions, will save their files in an XML
format for easy exchange over the web. Chemists, pharmacists, auto dealers,
librarians, and experts in many other fields are developing XML "document
type definitions" for their fields.
GedML, genealogical data in XML, was announced on April 15, 1998, by
Michael H. Kay, a computer scientist in England. Kay's GedML uses the
information data model of GEDCOM 5.5 to express genealogical information
in an XML-browsable format. Kay published GedML at URL http://users.iclway.co.uk/mhkay/gedml/index.html
and is seeking comments on his proposal. Kay sees an XML version of
GEDCOM as a way of using industry-standard techniques to solve GEDCOM's
compatibility problems. He cited a number of benefits, including
intelligent indexing of genealogical data by web-based search engines
which can distinguish, for example, between Ireland as a country and
Ireland as a surname. Instead of searching for sites that merely mention
Ireland, you will be able to search for sites containing genealogical
data referring to the surname Ireland.
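The surname-versus-country distinction can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Kay's actual GedML definition: it reuses GEDCOM tags (INDI, NAME, SURN, BIRT, PLAC) as XML element names, wraps everything in a GED root element, and stores the cross-reference ID in an ID attribute. Those naming choices, the sample data, and the gedcom_to_xml and people_with_surname functions are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Made-up data: one person surnamed Ireland, one merely born in Ireland.
SAMPLE = """\
0 @I1@ INDI
1 NAME John /Ireland/
2 SURN Ireland
1 BIRT
2 PLAC Cork, Ireland
0 @I2@ INDI
1 NAME Mary /Walsh/
2 SURN Walsh
1 BIRT
2 PLAC Dublin, Ireland
"""


def gedcom_to_xml(text: str) -> ET.Element:
    """Nest GEDCOM lines into XML elements according to their level numbers."""
    root = ET.Element("GED")  # hypothetical root element name
    stack = [root]            # stack[n] is the parent for a line at level n
    for raw in text.splitlines():
        if not raw.strip():
            continue
        parts = raw.strip().split(" ", 2)
        level = int(parts[0])
        if parts[1].startswith("@"):
            xref = parts[1].strip("@")
            tag, _, value = parts[2].partition(" ")
        else:
            xref, tag = None, parts[1]
            value = parts[2] if len(parts) > 2 else ""
        del stack[level + 1:]  # drop elements deeper than this line's parent
        elem = ET.SubElement(stack[level], tag)
        if xref:
            elem.set("ID", xref)
        if value:
            elem.text = value
        stack.append(elem)
    return root


def people_with_surname(root: ET.Element, surname: str) -> list:
    """Return IDs of individuals whose SURN element matches, ignoring places."""
    return [
        indi.get("ID")
        for indi in root.findall("INDI")
        if any(s.text == surname for s in indi.iter("SURN"))
    ]


root = gedcom_to_xml(SAMPLE)
print(ET.tostring(root, encoding="unicode"))
print(people_with_surname(root, "Ireland"))
```

A plain text search for "Ireland" would match both people in this sample, because the word also appears in each birthplace. The structured search looks only inside the surname element, so it returns John Ireland's record alone.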
While very little note has been taken of GedML since its introduction,
it has the potential of completely eliminating the need for GEDCOM as
we know it and, in particular, the need for any future direction of GEDCOM.
Lexicon Working Group
In order for GEDCOM or even GedML to work in the first place, everyone
using these standards must agree on an underlying data model of how
genealogical information relates to the real world. Officers of GenTech,
the Texas non-profit organization that sponsors the yearly convention
of the same name, have supported a genealogical Lexicon Working Group
for the past several years with the purpose of defining and standardizing
terms used by genealogists. While this might seem basic and unnecessary,
it is vitally important to build higher structures like GEDCOM and GedML
upon a stable foundation of terms.
Co-sponsored by the Federation of Genealogical Societies, the LWG has
most recently been working on a data model for genealogical research
that describes for technologists the activities and thought processes
that a researcher goes through when researching a genealogy.
According to Beau Sharbrough, president of GenTech and a member of the
LWG, the group intends to issue its report in August as an RFC, or request
for comment, similar to the way Internet standards are developed and
published. At least two of three GenTech-sponsored sessions at the FGS
convention in Cincinnati, OH, this month will address LWG issues and
their implications for genealogy programs.
Some information on the Lexicon Working Group is available at the
GenTech web site, URL http://www.gentech.org/lexicon.htm.
(It appears, however, that this has not been updated in the last six
months. The FGS web site, www.fgs.org,
does not mention this project.)
The Next Generation
It will be interesting in the next few months to see how these three
proposals will slug it out in the genealogical arena. What is clear
is that everyone agrees that the current data model for GEDCOM and for
most genealogy programs is outdated. Genealogical data as presented
today are merely assertions or conclusions with very little reasoning
or support attached to the entries in a program. The next generation
of genealogy programs, and some are already appearing, will force users
to list their sources and reasoning for claiming dates, places, and
relationships. Room will be given for noting alternate conclusions,
citing multiple spellings, storing questionable but interesting data,
and evaluating source material. This flexibility will make genealogy
programs more difficult to use, but their data should be more reliable.
A particular effort is going into enabling both internal and
external reference links within a genealogical data file. This
will allow later researchers to follow the references to their
sources more easily, probably by clicking on a citation in a browser
window. When this is possible, perhaps we will no longer need
a dedicated genealogy program, but rather merely a web browser
to view our genealogy. In that future day, when all genealogical
data is online, supported, enriched with pictures, stories, and
perhaps genetic data, we may simply sit and watch our ancestors'
lives flow past on a three-dimensional TV screen. When that comes
about, I wonder if we'll think it is even interesting anymore.
References
More About XML and New Web Developments