Researcher concerns about the digitized records available on the web sites of NARA’s digitization partners

Posted on May 13, 2010


Recently on NARAtions we heard from researchers who expressed concerns about the digitized records available on the web sites of NARA’s digitization partners.  We shared these concerns with NARA’s Access Programs office, NARA’s Digital Strategies and Services Staff, and with our digitization partners and we would like to respond to these concerns here on NARAtions.

Since 2007 NARA has entered into three major digitization partnerships with Ancestry.com, Footnote.com, and Genealogical Society of Utah (Family Search).  Under these agreements, the partners have created approximately 60 million digital copies of NARA records.  There are also an additional 70 million images of NARA records on Ancestry.com that Ancestry created prior to entering a digitization agreement with NARA.  Ancestry.com digitized, indexed, and placed these images online using NARA microfilm publications that are available to anyone by purchase from NARA.  This was strictly the work of Ancestry, with no involvement, oversight, or quality assurance work by NARA.

NARA takes the concerns raised by researchers seriously.  We are working with our partners to improve their digital products, including those produced before the partnership agreement, as problems come to our attention.  Our partners want to rectify errors and are cooperating in doing so, though some of the issues are difficult to resolve in a seamless and timely manner.  In these difficult cases, the partners post advisory notices to alert users to the anomalies.  There are two examples of issues affecting the browse structure that have been rectified by Ancestry at NARA’s request due to researcher input.  In the first example, the ship name portion of the browse structure for New York passenger arrivals, on November 13, 1893, did not include the ship Etruria.  That has been corrected.  The second example involves a misspelled township in the 1920 Pennsylvania census. Throop township in Lackawanna County was spelled as “Throap.”   Because of this, the search function did not work for this township.  That has been corrected.  We have pointed out to Ancestry that the browse list of townships still contains the misspelling, so we have asked them to correct it.  This does not affect the ability to browse the township, but we all want it corrected nonetheless.

Both the partners and NARA are involved in several aspects of quality assurance work with regard to projects under the digitization partnerships.  The quality assurance work relates to the images, metadata, content completeness, and final transfer to NARA.  There are four main areas of quality control (QC):

  1. QC of imaging is the responsibility of the partner, following standards reported to, and approved by, NARA.  The precise standards are proprietary information.
  2. QC of metadata is the responsibility of the partner, following standards reported to, and approved by, NARA.  The precise standards are proprietary information.
  3. QC of content is the responsibility of NARA – Specifically, NARA does a page-by-page review against a five percent sample of the original records to find and identify information which might have been left out, such as the back of a document that has only a stamp or small notation.  All such information has to be captured.  (Higher levels are reviewed if quality concerns surface during review.)  The partner corrects any omissions found in the review. Skipped pages are imaged and inserted into the images folder at the correct location.
  4. QC relating to transfer of digital materials to NARA – The partners send the digital materials to NARA on hard drives.  NARA staff checks a sample of the images and metadata to verify that the metadata on each hard drive is associated with the correct image and that the metadata the partner agreed to provide is delivered. The staff also checks a sample of the unique identifiers associated with each image to verify that the identifiers are correct. If there are problems with the metadata or images sent by the partner, NARA contacts the partner to resolve the problems.
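The content review in item 3 can be pictured with a short sketch. This is only an illustration: the post says NARA reviews a five percent sample page by page, but it does not say how the sample is chosen, so the random draw, the function names, and the page numbers below are all assumptions.

```python
import random

def draw_sample(page_ids, fraction=0.05, seed=0):
    """Pick a random sample of original page identifiers for review.

    Random selection is an assumption; the post only states the 5% figure.
    """
    size = max(1, round(len(page_ids) * fraction))
    rng = random.Random(seed)
    return sorted(rng.sample(page_ids, size))

def find_omissions(sample, imaged_ids):
    """Return sampled pages that have no corresponding digital image."""
    imaged = set(imaged_ids)
    return [page for page in sample if page not in imaged]

# Hypothetical data: 200 original pages, with every 25th page skipped in imaging.
original_pages = list(range(1, 201))
imaged_pages = [p for p in original_pages if p % 25 != 0]

sample = draw_sample(original_pages)
omissions = find_omissions(sample, imaged_pages)
print(f"Reviewed {len(sample)} pages, found {len(omissions)} omissions")
```

Any omissions found this way would go back to the partner for imaging, and a higher sampling fraction could be passed in when quality concerns surface during review.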

At the March meeting of the Researcher Users Group at the National Archives in the Washington, DC Area, some researchers requested the ability to report partner website problems directly to NARA, so we can be involved in trying to resolve them.  We agreed and indicated that users may report specific matters to us, though obviously this is not a requirement.  Anyone choosing to report to us may write to digitization@nara.gov.   Problem reporting is an option that we invite you to exercise when you come across specific problems, if you wish to use it.  It is not a substitute for quality control, but rather an additional avenue for improving quality.  With approximately 130,000,000 digitized images of NARA records on partner websites, some errors are inevitable.  We are happy that some researchers want to help everyone by letting us know of problems, and we look forward to helping resolve them.

NARA recently posted a list of the digitized records available on the partners’ web sites at the request of researchers. The “status” column on the list has been deleted in response to a concern that we might be giving a false impression that some digitized publications contain all of the records in a publication.  We certainly did not intend to do that.  We had included a “status” column to indicate the status of the digitization work because some publications on Footnote were still being digitized even though images were available online.  The status of “complete” for publications was simply meant to indicate that the partner had completed its digitization work on the publication.

We agree that the statement in the lengthy Family Search press release from 2007 about access to records held by NARA may not have been as clear as possible. Records held by NARA which aren’t restricted due to legal or preservation restrictions are available to anyone.  NARA recognizes, however, that not everyone can visit one of our 30 NARA research rooms in person, and a researcher at any one of these research rooms is limited to the records available at that facility only.  Providing online access to digitized copies of records gives those who cannot readily visit NARA the ability to research our holdings. NARA’s own press release related to its partnership with Family Search states that “Digitization makes possible unprecedented access to the unique historic documents in the custody of the National Archives…  These records [Civil War and later pension files], of great interest to genealogists and others, are currently available only at the National Archives Building in Washington, DC.”

Thank you to all of the researchers who have taken the time to comment about our digitization partnerships.


Comments

Sharon Elliott May 13, 2010 at 6:49 pm

E-gads, what a bunch of nit-picky whiners. I love having the partners provide access and know that digitization and indexing are not perfect. I really cannot fathom what the agenda is of the complainers.

Debra Overbey May 14, 2010 at 10:10 am

E-gads, my complaint is that if I don’t know how someone indexed Samuel, I will miss 39 individuals in the Ancestry 1860 Federal Census database that someone indexed as “Sanil.” And that is only one of the record groups where the commonly used abbreviation of “Sam’l” is indexed in this way. I do not have time to send corrections for nor to list in this post every mistake that I have found. This is just one of my problems with using materials produced in partnership with non-NARA sources. I am sure that mistakes in any derivative source will always occur, simply because of human error BUT don’t tout these sources as the ultimate. Every entity (especially those who are charging for a quality product) should be held accountable for producing what they claim. NARA, and as a result the taxpayer to whom these records actually belong, is not holding the partners responsible. THIS is what the nitpicking is about. I cannot go to any NARA repository easily because of my location. If NARA is allowing these public records to be digitized and placed online by commercial services to make them more accessible for everyone, then NARA should require more stringent quality control to ensure accuracy.

Peggy Reeves May 15, 2010 at 3:14 pm

Thank you so much for taking the time to promptly investigate our concerns as researchers! Your very detailed answer is appreciated, but it also shows that NARA is not fully aware of the extent of the problem with the digitization.

I have seen the “advisory notices” that are included in the explanations of the online databases. T288 is one of the most in-demand microfilms in the research room, obvious to anyone who observes the traffic in and out of that cabinet on a daily basis. Yet the online version at ancestry is only about 70% complete. Interestingly, the “advisory notice” used to say: “10% missing”. Within the last couple years, it was changed to say that only 1% of that index is missing, but I can’t see what has changed. I keep hearing that it is the Navy cards that are missing. Yes, those are missing, but so are a whole LOT of others! NARA needs to be honest about what their partners have done (and not done). Thank you for taking the first step in this by removing the column that says “100% complete” from the list of digitized films. That is a great start.

Thanks for the quality control explanation. It seems to verify that the army of untrained volunteers being recruited from the FamilySearch website to make the indexes to the scanned images of NARA records (that customers then pay to see on the subscription sites) are supposed to do their own proofreading. Is that correct?

The huge army of volunteers are creating bad indexes at a speed that would be mathematically impossible for the legitimate researchers to keep up with. New mistakes due to poor training, lack of experience, lack of work ethic, and no supervision multiply exponentially while researchers spend hours searching databases trying to find a scan that may or may not be there. If it’s there, it may not ever be found due to the ridiculous way that it was indexed.

Try going to the ancestry Civil War pensions (T288) and scan for just the last name “Massey”. Look at the first name on the list. Have you ever heard of anybody with a first name of: “??Aner T. Massey”? Click on the image of the card. Can you read it? It looks like “Clarence” to me…even though the person who did the scanning cut off part of the top of the card. Are we to believe that both the scanner and the indexer proofread their work and decided that this was all okay, with no need for anyone to go back and do it right? Obviously, both should be fired…oops, can’t fire them, they are volunteers, so we have to take whatever we get, and pay for it, too! Now go to the last page of Masseys, and look at the last six names on that page. Can’t even guess at a soldier’s name for those, can we? The person doing the “cutting edge scanning” cut off the top line of the card that contains the soldier’s name. So how can we find him if we do a general scan of his name when he doesn’t even have a name at ancestry?

How about 1812? How would you get a list of all men from the same regiment in 1812? You should be able to go to the ancestry 1812 service record database and put in the Captain’s name, along with the state as a keyword, and get a list of all of them, right? Not even close! I found this out when I was researching Captain Tinnen’s Regiment in MO. You get a list, but how do you know you have them all? I only got 89% of them when I tried it. I had to order the entire regiment of service records and get the names from the file jackets myself. I tried to figure out why some of them didn’t come up on ancestry’s list. I put in the names of the missing men and found that for some of them, ancestry had just plain copied the spelling wrong from the file jackets (which are very readable). In other cases, a space was accidentally typed before a name or a regiment, so that scanning for the name, such as “Smith”, won’t come up unless you hit the space bar once before typing “Smith”. Also, the column that shows the name of the regiment wasn’t put in as a heading, it was typed in by different individuals who worded it differently each time, each one adding various typos as they went along. Every time there was a typo in “Tinnen” or whichever keyword you use, those men won’t come up on your scan of that particular keyword. This is true across the board on any database that you are using.
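The leading-space problem described above can be shown with a small sketch. The index entries, field names, and search functions here are hypothetical and are not Ancestry's actual data or search code; they only illustrate why an exact match misses an entry typed with a stray space, and how trimming whitespace before comparing would catch it.

```python
def normalize(value):
    """Collapse runs of whitespace, trim both ends, and ignore letter case."""
    return " ".join(value.split()).casefold()

# Hypothetical index entries; the second name was keyed with a leading space,
# and the regiment field is worded differently on each line.
index = [
    {"name": "Smith, John", "regiment": "Capt. Tinnen's Co."},
    {"name": " Smith, Amos", "regiment": "Capt Tinnen Company"},
    {"name": "Jones, Eli", "regiment": "Capt. Tinen's Co."},
]

def exact_prefix_search(records, surname):
    """Match only when the stored name starts with the surname exactly."""
    return [r["name"] for r in records if r["name"].startswith(surname)]

def normalized_prefix_search(records, surname):
    """Match after normalizing both the stored name and the query."""
    key = normalize(surname)
    return [r["name"] for r in records if normalize(r["name"]).startswith(key)]

print(exact_prefix_search(index, "Smith"))
print(normalized_prefix_search(index, "Smith"))
```

Normalization of this kind only addresses stray spaces and capitalization. A genuine misspelling in the keyword field, such as the third regiment entry, would still need proofreading or fuzzy matching to be found.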

Do you now see the extent of the problem and the dire need for painstaking proofreading to be done by people who actually care? How can anyone possibly list all of this? I could go on and on, but I fear that I already have.

You are correct that NARA’s press release about the partnership with FamilySearch was fine. However, the line that I previously quoted from the FamilySearch press release which you stated was “not as clear” as it could have been, was not the least bit ambiguous or unclear. On the contrary, it was crystal clear to anyone who speaks English, and FamilySearch should print a retraction and an apology.

I hope that NARA will soon explore possibilities and overcome the obstacles to hiring their own professional staff to do such important digitizing, indexing, and preservation projects, instead of contracting it out to other companies who use apathetic, unsupervised volunteers. What we have now is loss of records, and it is a shame.

Rebecca May 19, 2010 at 2:15 pm

Thank you again for taking the time to express your concerns about the digitized NARA records on Ancestry.com and Footnote.com. We shared additional comments with NARA’s Access Programs office, NARA’s Digital Strategies and Services Staff, and with our digitization partners. NARA is aware that there are issues with several of the digitized publications on our partners’ web sites – the vast majority of which were digitized prior to our partnership agreements. When issues relating to these pre-partnership efforts come to our attention, we pursue them with our partners and work with them to implement solutions. Records that are being digitized under our partnership agreements are subject to quality control by both NARA and the partners and there is both training and supervision of those doing the digitization.

You mentioned particularly the digitization of the General Index to Pension Files (NARA microfilm publication T288) and of the 1860 Census (NARA microfilm publication M653). Both of these microfilm publications were purchased and digitized by Ancestry before NARA had a digitization agreement with them. Consequently NARA was not involved in the creation of the digital copies of these records or in the quality assurance work for these publications. We agree that more than one percent of T288 is not available on Ancestry and have asked Ancestry to update its advisory notice. We will also ask Ancestry about the indexing of the 1860 Census.

We understand completely the importance of indexes to online research and will continue to work with our staff, our volunteers, and our partners to emphasize accuracy and completeness of data associated with digitized records.

Astrid Willis June 15, 2010 at 2:01 pm

Thank you to NARA & Peggy Reeves (May 15th, 2010) for raising the exact concerns that I have regarding records on NARA & Ancestry.com. I am researching my husband’s family on his mother’s & father’s sides for our 6 children & our coming along grandchildren & GGG kids. I am fortunate to have my in-laws’ family names down to approx. 1860 & 1870. But I am stumbling along after that, due to various spellings of their names & also incomplete records. How would you like to have your Mom born in 1832 & listed as Indian Native from Alabama? On some censuses the same person is listed as Mulatto, in other places as Black.

I would like to recommend that a law be passed that all records concerning people descending from Native American Indians & African Americans be free of charge to family researchers. As it is now the stumbling blocks are numerous, but I refuse to give up.

I am fortunate to have the records for my side of the family, since in Norway all genealogical records can be researched free of charge. Thank you for allowing me to comment.

Rebecca June 16, 2010 at 7:35 am

Astrid – Thank you for sharing your research experiences. The National Archives has a number of resources available on our web site that may be helpful in your research on Native American and African American family history. You can also send research questions to our reference staff at inquire[at]nara.gov.

- Rebecca

Julia August 9, 2010 at 10:32 am

I am very happy to read about the NARA effort to rein in partners’ errors. While looking at passenger manifests for ships landing in San Francisco in 1956, I found quite a few manifest pages that were filed under the wrong ship on ancestry’s site. Since I was using my library’s ancestry membership, however, I couldn’t notify the company of these errors (needed to supply the library’s contact info, but only one librarian knows about the account & he wasn’t there – sigh). I also suspect that there are ships that aren’t included on ancestry for this time period (including the one I need) if the shipping news columns in the newspaper are an indication.

These are just a few of the enormous number of errors I’ve personally found on ancestry. It sounds like Commenter #1 has been lucky in her researches, but I suspect that my experience more closely matches those of others. It’s good to hear that help is on the way.

Peggy Reeves August 10, 2010 at 11:32 pm

It has now been nearly three months since Rebecca told us (in her posting directly above Astrid’s) that: “We agree that more than one percent of T288 is not available on Ancestry and have asked Ancestry to update its advisory notice.”

Now, nearly three months later, if you go to ancestry and click on “search” and then “military” and then scroll down the inset menu box to find “Civil War Pension index: General Index to Pension Files” (which is T288) and click on that, you can read ancestry’s advisory notice without a subscription, and see that they have apparently not found the time, in three months, to change that one little number. They are still shamelessly lying to their subscribers and potential subscribers by saying that only 1% of that index is “missing”, when the percentage is actually closer to 25-30%, even after this has been brought to their attention again and again, including on this blog and by NARA officials. If you were a researcher whose primary interest was the Civil War, would it be worth buying a subscription to Ancestry if only 1% of that database was missing? Would you still be willing to pay to subscribe if you knew how much was REALLY missing? Ancestry says nothing and NARA says, in essence, Oops, we’ll talk to them about it (wink, wink)!

Those of us non-partners who do on-site research at NARA in D.C. were told at a user’s meeting that the “partners” have to fix errors brought to their attention within 30 days or a “reasonable” time. How many months is “reasonable” to correct one little number on one database explanation?

My question to NARA is, what are you going to do about this? When are you going to hold your “partners” responsible for telling the truth about your records by restricting their access to more records until they shape up and fix what they already claim to have, instead of looking the other way while the general public who own the federal records continue to be deceived by these people?

Ancestry says they have all of the U.S. census pages from all of the years completely indexed. Yet they were so badly indexed by people who were barely literate in English, that Familysearch is still recruiting volunteers through their website to index them again. Ancestry is telling the general public that the census is “complete” out of one side of their mouth, but apparently asking Familysearch to recruit volunteers to index U.S. census from the other side of their mouth! Which is it, Ancestry? Are you done with the census or not?!! Make up your mind and tell the TRUTH for a change! Do you think it’s appropriate to put the little red “updated” tag beside the census years that you are re-doing, leading people to believe that perhaps you are adding new census pages instead of just cleaning up the massive mess that you made of it the first time?

Familysearch still recruits anybody anywhere in the world to do indexing work from home (see their website main page). The nonprofit company, Familysearch, then brags about making these indexes available for free, but you won’t see the images for free. They hand those over to their huge for-profit “partners” (see press releases at their various websites), Ancestry and Footnote, to charge big subscription fees for the general public to access the images.

The same “partners” who are not telling the truth are the very people who have special access and do not even have to be searched or go through security when entering and leaving the NARA building, able to freely walk out with whatever original documents they want, while the rest of us have to leave early to allow an outrageous amount of time to be searched TWICE.

Is it prudent to trust these people not to walk out with original documents? Have they made a reputation for themselves of being honest and trustworthy?! Has Familysearch ever apologized for the press release saying that they are putting NARA records online that you can’t even see if you show up at NARA in person? Any on-site NARA researcher recognizes this as a lie, so why doesn’t NARA?

Rebecca August 12, 2010 at 2:47 pm

Peggy – We regret that the change to the T288 advisory notice hasn’t already been made. We emailed Ancestry on May 12th regarding the advisory notice for T288. I emailed our contact at Ancestry again yesterday and have asked him for a date by which we can expect the change to be made. Once we have that date we will post it here.

You commented on the security procedures for the volunteers who work at the National Archives Building in Washington, DC. The individuals participating in NARA’s partnership with FamilySearch.org are members of NARA’s volunteer corps and are subject to the same security measures as our other volunteers. In order to be a NARA volunteer, they have met several requirements including undergoing the NACI background check that all NARA staff and volunteers undergo and completing our volunteer training program. The complete requirements for becoming a NARA volunteer can be found on our web site at http://www.archives.gov/careers/volunteering/metro-dc.html#required

NARA wants the projects conducted under our partnerships to be as useful to online researchers as possible. For that reason, we continue to stress the importance of accuracy and completeness in our partnership projects to all of those who are working on them including NARA staff, volunteers, and our partners.

Thank you again for sharing your concerns with us.

- Rebecca

Rebecca August 12, 2010 at 4:28 pm

Peggy – I’m happy to report that Ancestry has updated the advisory notice for T288. The revised text reads: Please Note: Due to deficiencies in the microfilms of the original source cards (i.e. faded, illegible, etc.), a small percentage of the pension cards were not included in this index, and may be re-scanned and included at a later date if legible digital scans can be created.

You had mentioned that you think between 25-30 percent of the images on T288 are missing. We talked with our reference staff about this and they estimate that less than 10 percent of the images are not included online.

-Rebecca

Peggy Reeves August 12, 2010 at 9:30 pm

What method did your reference staff use to estimate what is missing from Ancestry’s version of T288? I looked up some surnames that had a good number of cards (maybe 200-300 cards for one name). I counted the number of cards that were on the NARA microfilm for each particular name, and then looked to see how many of those cards could be found at Ancestry. Consistently, 25-30% were missing. That number is not a guess on my part, it is an objective finding based on that sampling, and I don’t think any reasonable person, certainly not an honest person, calls 25-30% a “small number” to be missing from an index that people are paying to use! Some cards may have been scanned but will never be found because of a typo during the indexing.
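The counting method described above amounts to a simple ratio, sketched below. The surnames and card counts are made up for illustration (the comment gives only the range of 200-300 cards per name and the 25-30% result), and the function name is hypothetical.

```python
def missing_rate(counts):
    """Fraction of microfilm cards not found online, pooled across surnames.

    `counts` maps a surname to a (cards_on_microfilm, cards_found_online) pair.
    """
    total_film = sum(film for film, _ in counts.values())
    total_online = sum(online for _, online in counts.values())
    return (total_film - total_online) / total_film

# Hypothetical tallies for two sampled surnames.
counts = {
    "Surname A": (200, 150),
    "Surname B": (300, 210),
}

rate = missing_rate(counts)
print(f"Estimated missing: {rate:.0%}")
```

An estimate like this reflects only the surnames sampled, so comparing it with another estimate requires knowing which part of the index each one was drawn from.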

Me and other “regulars” at NARA use T288 more than anything else, and are constantly finding missing names and nonsensical spellings of names any time we bring up a list at Ancestry. Clearly they are rushing the product to market without ever proofreading anything.

Initially, that “advisory notice” on T288 said that 10% of the images were missing. Then Ancestry abruptly changed it to 1% for reasons unknown. Now it’s “a small number”. This is dishonest. They continue to delude subscribers, with NARA’s blessings. If you were missing 25-30% on tests, depending upon the school, you would probably have to repeat the class! Ancestry needs to repeat this oft-used database and do it right this time.

When confronted with T288 at a meeting, Mr. Hastings said the “missing” stuff is the Navy cards, which were too dark to scan. This is correct, but that is only a small part of what is missing. The people doing the scanning apparently didn’t take the time to check the regimental index or the numerical index to pensions to at least get a name to enter for the Navy cards or any others that didn’t scan well or were illegible. Of course that would have taken too much of their valuable time and eaten into their profits.

Some of the Navy cards are illegible on the microfilm, but since T288 is in ALPHABETICAL order, a last name could have easily been entered into the index with the notation that it is a Navy card, so that people scanning for that name would know that there is, indeed, a Navy pension (or 2 or 3) for that surname, even if they can’t see the image of it. Also, if Ancestry purposely didn’t include any of the Navy cards, then the database should have stated from the beginning that it is T288 MINUS all of the Navy pensioners.

That brings up another big problem, which is the fact that the volunteer labor force for the profit-making “partners” that you trust so much are not researchers at all and seemingly have no idea how to appropriately cite a source. A client recently sent me an item that she printed out from Ancestry. It shows an image of a WWI draft registration card. Nowhere on the printout does it give the correct citation, which would be NARA Record Group 163, microfilm M1509, roll # (NARA has rolls 1 through 4,277 in this set). Instead, it prints out the “source information” as being the Ancestry database of WWI draft registration cards, roll #1684364. It doesn’t say WHOSE roll number, as if the microfilm belongs to Ancestry! This is an LDS church roll number, not a NARA one. Nowhere on the printout does it say that the custodian of the original paper is NARA, and it was microfilmed at NARA, and the LDS church used the same microfilm, putting their own roll number on it, which is different from NARA’s. The database explanation does credit NARA (though they don’t give the record group), but NARA does not appear anywhere on the printouts that people make from home.

Footnote also uses “volunteers” recruited by Familysearch, and they have similar issues. They have several databases for the Southern Claims. They use NARA’s explanation for the various databases for barred and disallowed claims, and for the approved claims, but they also have several places where you can click for “more information” about the Southern Claims. One of them is Dick Eastman’s newsletter, where Mr. Eastman says: “Footnote.com’s online Southern Claims Commission database contains images of every claim and all accompanying paperwork.” Mr. Eastman apparently doesn’t use these records, and his bias is obvious. This isn’t even CLOSE to being the truth, however!

In addition to Familysearch having to pay for a NARA staff person to be at their offsite location, they should also have to pay for some auditors who are actually researchers, to clean things up, and penalize them when they are caught cheating their subscribers. NARA owes it to the taxpayers who are paying for subscriptions. Subscribers expect the databases to be complete and well-done, and appropriately described and cited, especially when they CLAIM to be complete!

How does NARA penalize the partners in order to have incentive for them to stop lying? The subscribing public would like to know.

Rebecca August 14, 2010 at 10:39 am

Peggy – I think you have some great ideas for how to handle materials that do not digitize well. We will keep them in mind as we work on future projects and will also pass them along to our partners. The staff member who estimated that less than 10% of the images were missing from the digitized version of T288 has retired.

Regarding the M1509 WWI draft registration cards citation on Ancestry, the Ancestry website states that the source for digitization was microfilm at the Family History Library (FHL). This explains their use of the FHL roll number in the citation. Ancestry digitized M1509 from microfilm at FHL, without any involvement from NARA as either a seller of microfilm or a partner.

I also generated a printout of an image from M1509 on Ancestry. In addition to the Source Citation, my printout included information about the NARA microfilm publication in the Source Information in addition to the FHL roll number. Ancestry’s source citation for many other microfilm publications includes the NARA microfilm publication number and the roll number as well. We will follow up with Ancestry to clarify how they develop the citations.

We agree that projects NARA undertakes with its partners should be well done. For that reason all of our projects include requirements for quality control by NARA and by the partners (see our original post above). NARA requires the partners to correct problems found during the quality control and assurance reviews of these partner projects, and the partners do make those corrections.

Thanks again for your suggestions.
- Rebecca
