Decoding Soundex
Overheard in GenForum, November 14, 2002
Q: This question may be too simple to be true to most of you, but can somebody please explain what SOUNDEX is? -- Spitfire
A: One of the things you discover when you get involved in genealogy is that it has its own language. As in many professions and other hobbies, some terms are unique to genealogy. Other terms, while not unique to the hobby, have been so completely incorporated into it that we forget they might be used elsewhere.
Soundex is one of those terms that seems to be owned by genealogists, but in fact is the creation of non-genealogists. While it was not created for genealogists, genealogists recognize the many uses for Soundex. Genealogy programs usually offer a Soundex feature and there are times when our computer searches incorporate a Soundex search.
What is Soundex?
Soundex is an index based on how a name sounds rather than on its exact spelling. Through such an index, surnames that sound alike but are spelled differently can be grouped together. It is this grouping that has made it such a useful index system for genealogists. Instead of having to compile a list of all variant spellings and look for each one individually, the researcher is given a variety of spelling variations together and needs only to concentrate on given names.
Contrary to popular belief, the Soundex was not created for genealogists. In truth, the Soundex cards that genealogists use for the census years 1880 through 1930 were the result of one of the many programs that came out of the Great Depression in an attempt to make work for educated individuals, including teachers and writers, who could not get a job in their profession.
The soundexing of the census began with the 1880 census because of a need to see who, in the 1930s, would be reaching retirement age and needing Social Security. That is why this Soundex did not cover every household in the 1880 census, but only those households with children aged ten or less.
In each case, Soundex cards were created to make something easier for the government. Either they were used to track those of certain ages or they were created to make it easier to locate individuals. In the case of passenger lists, Soundexing was done so that it was easier to verify the date and boat upon which an immigrant arrived when someone was going through the naturalization process.
The Code
What makes the Soundex work is the grouping of like-sounding letters together and assigning each group a number. The code is then created using the first letter of the surname followed by a three-digit code for the next three letters that are recognized by the code (vowels and those consonants that do not have hard sounds are not coded).
Code = Represents the Letters
1 = B P F V
2 = C S K G J Q X Z
3 = D T
4 = L
5 = M N
6 = R
The table above is used to determine which letters get assigned which number. Notice that the letters A, E, I, O, U, Y, H, and W are not coded. We will look at these letters in a moment to see how they are handled.
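For readers who like to see it in a program, the table can be written as a simple lookup. This is only a sketch in Python; the names SOUNDEX_GROUPS, LETTER_TO_DIGIT, and digit_for are made up for illustration and do not come from any genealogy program.

```python
# The six code groups exactly as they appear in the table above.
# All names here are illustrative assumptions, not a standard API.
SOUNDEX_GROUPS = {
    "1": "BPFV",
    "2": "CSKGJQXZ",
    "3": "DT",
    "4": "L",
    "5": "MN",
    "6": "R",
}

# Invert the table so each coded letter maps straight to its digit.
LETTER_TO_DIGIT = {
    letter: digit
    for digit, letters in SOUNDEX_GROUPS.items()
    for letter in letters
}


def digit_for(letter):
    """Return the Soundex digit for a letter, or None if it is not coded."""
    return LETTER_TO_DIGIT.get(letter.upper())


print(digit_for("N"))  # 5
print(digit_for("A"))  # None, because vowels are not in the table
```

The uncoded letters (A, E, I, O, U, Y, H, and W) are simply absent from the lookup, which is why asking for one of them returns nothing.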
To find the code for a surname, you start by writing down the first letter of the surname. The Soundex code always begins with a letter, and that letter is always the first letter of the name to be coded. You would then look at the next letter in the surname and find it in the table, recording the number. This step is repeated until you either run out of letters or you have three numbers.
If we were to code the surname Johnson it would look like J525. The J is the first letter of the surname. The O and the H are ignored. The N is coded to a 5. The S is coded to a 2. The O is ignored. The N is coded to a 5.
When a surname has side-by-side letters that are identical or that share the same code, the second letter is ignored. So the surname Black would be coded as B420. The B is the first letter of the surname. The letter L is represented by code 4. The A is ignored. The letter C is coded as 2. Because the letter K is next to the C and uses the same code number, it is ignored. Because you have run out of letters, the last number of the code is a 0, since the code must have a total of four characters.
We have seen that certain letters are ignored. The letters A, E, I, O, U, and Y are ignored, but they are treated as a separator. When you are coding and you come across one of these letters, you ignore the letter and code the next one. The vowels act as a separator, so even if you had a surname with the letter combination NON you would still code both of the Ns because the O is between them. In the case of the H and W, though, they are ignored completely, as though they were not even in the word. They do not act as a separator (this has often caused some confusion when coding a surname).
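The coding steps described so far can be sketched as a short Python function. Treat it as one possible reading of the rules in this column rather than an official implementation. The function name soundex and the table name CODES are illustrative. One assumption goes beyond the text: the first letter's own code is remembered, so a second letter with the same code is skipped just like any other side-by-side pair.

```python
# Build the letter-to-digit lookup from the code table.
CODES = {}
for digit, letters in {"1": "BPFV", "2": "CSKGJQXZ", "3": "DT",
                       "4": "L", "5": "MN", "6": "R"}.items():
    for letter in letters:
        CODES[letter] = digit


def soundex(surname):
    """Return a four-character code: first letter plus three digits."""
    # Keep only the letters A to Z, so spaces and hyphens are dropped.
    letters = [ch for ch in surname.upper() if "A" <= ch <= "Z"]
    if not letters:
        return ""
    code = letters[0]
    # Assumption: the first letter's code counts when checking the
    # next letter for a side-by-side duplicate.
    previous = CODES.get(letters[0])
    for ch in letters[1:]:
        if ch in "HW":
            continue  # ignored completely; does not separate anything
        digit = CODES.get(ch)
        if digit is None:
            previous = None  # a vowel or Y: ignored, but acts as a separator
            continue
        if digit != previous:
            code += digit
            if len(code) == 4:
                break
        previous = digit
    return code.ljust(4, "0")  # pad with zeros if the letters ran out


print(soundex("Johnson"))   # J525
print(soundex("Black"))     # B420
print(soundex("Ashcraft"))  # A261 (the H does not separate the S and C)
print(soundex("Bonon"))     # B550 (made-up name; the O separates the two Ns)
```

The only difference between the two kinds of ignored letters is what happens to the remembered code: a vowel clears it, while an H or W leaves it alone.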
They Aren't There
While the Soundex code was intended to group like-sounding names together, it does not work well with Eastern European surnames. These often have consonants that are silent when the surname is spoken but that, because of the way the code is designed, are still included in the code. As a result, it is often necessary to think of variant spellings, or to listen to the sound of the surname and try coding it by what you hear rather than by the actual spelling.
Also, as you have seen, some letters are ignored. If you discover that the surname you are looking for does not appear, try recoding the surname treating the H and W as separators. While those who originally created the cards were not supposed to do this, it is always a possibility.
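To see what that recoding looks like, here is a rough Python sketch that produces both codes for a surname: the one that follows the rule and the one a card maker might have written by treating H and W as separators. The names soundex, hw_as_separator, and candidate_codes are invented for this example, and the handling of the first letter (its code is remembered when checking the second letter) is an assumption.

```python
CODES = {}
for digit, letters in {"1": "BPFV", "2": "CSKGJQXZ", "3": "DT",
                       "4": "L", "5": "MN", "6": "R"}.items():
    for letter in letters:
        CODES[letter] = digit


def soundex(surname, hw_as_separator=False):
    """Code a surname; optionally let H and W separate like vowels do."""
    letters = [ch for ch in surname.upper() if "A" <= ch <= "Z"]
    if not letters:
        return ""
    code = letters[0]
    previous = CODES.get(letters[0])  # assumption: first letter counts
    for ch in letters[1:]:
        if ch in "HW" and not hw_as_separator:
            continue  # the rule as written: H and W vanish entirely
        digit = CODES.get(ch)
        if digit is None:
            previous = None  # separator: the next letter is coded afresh
            continue
        if digit != previous:
            code += digit
            if len(code) == 4:
                break
        previous = digit
    return code.ljust(4, "0")


def candidate_codes(surname):
    """Return every distinct code worth checking for a surname."""
    return sorted({soundex(surname),
                   soundex(surname, hw_as_separator=True)})


print(candidate_codes("Ashcraft"))  # ['A226', 'A261']
print(candidate_codes("Johnson"))   # ['J525']
```

Most surnames give the same code either way. Only names where an H or W sits between two letters sharing a code, such as the S and C in Ashcraft, produce a second code to search.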
In Conclusion
The Soundex system of indexing offers us an index that groups certain variant spellings together. While it is a time saver, it is often necessary to look at alternative spellings that would also alter the Soundex code. And, of course, if the initial letter of the surname can be different, you will find that you must search each variant, since the code relies on that initial letter.
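Both points can be seen in a small Python sketch that sorts a handful of surnames into groups by code. The function names soundex and group_by_code and the sample surnames are chosen only for illustration, and the coding function is one reading of the rules in this column (it assumes the first letter's code is remembered when the second letter is checked).

```python
from collections import defaultdict

CODES = {}
for digit, letters in {"1": "BPFV", "2": "CSKGJQXZ", "3": "DT",
                       "4": "L", "5": "MN", "6": "R"}.items():
    for letter in letters:
        CODES[letter] = digit


def soundex(surname):
    """Return the first letter plus three digits, padded with zeros."""
    letters = [ch for ch in surname.upper() if "A" <= ch <= "Z"]
    if not letters:
        return ""
    code = letters[0]
    previous = CODES.get(letters[0])  # assumption: first letter counts
    for ch in letters[1:]:
        if ch in "HW":
            continue  # ignored completely
        digit = CODES.get(ch)
        if digit is None:
            previous = None  # vowel or Y acts as a separator
            continue
        if digit != previous:
            code += digit
            if len(code) == 4:
                break
        previous = digit
    return code.ljust(4, "0")


def group_by_code(surnames):
    """Collect surnames under their shared Soundex code."""
    groups = defaultdict(list)
    for name in surnames:
        groups[soundex(name)].append(name)
    return dict(groups)


groups = group_by_code(["Smith", "Smyth", "Smythe", "Kramer", "Cramer"])
print(groups["S530"])  # ['Smith', 'Smyth', 'Smythe']
print(groups["K656"])  # ['Kramer']
print(groups["C656"])  # ['Cramer']
```

Smith, Smyth, and Smythe land in one group, but Kramer and Cramer end up under different codes even though they sound the same, because the code keeps the initial letter as written.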
Rhonda R. McClure is a professional genealogist specializing in celebrity trees and computerized genealogy. She has been involved in online genealogy for fifteen years. She is an award-winning author of several genealogy how-to books, including The Complete Idiot's Guide to Online Genealogy, The Genealogist's Computer Companion, and Finding Your Famous and Infamous Ancestors. She may be contacted at [email protected].
See more advice from Rhonda in her columns Expert Tips, Twigs and Trees, and Overheard in the Message Boards.