Tech Tuesday: Applied Research puts NARA “Out in Front” at NAGARA
At the National Association of Government Archives and Records Administrators (NAGARA) plenary address in Nashville a few weeks ago, I was asked to talk about NARA’s new Applied Research Division, which led into an explanation of why we haven’t been called “ERA Research” for the past two years. Folks were encouraged to attend my 1940 Census session, featuring NARA research partners who are using cool, smart tools to make sense of scanned images. There wasn’t an empty chair in the room, and the session led to fruitful discussions and promising collaborations…and that’s what you missed at NAGARA!
Here’s the full story.
The National Association of Government Archives and Records Administrators (NAGARA) met in Nashville, TN, on July 13-16. One of the best-attended technology sessions – in my humble and unbiased opinion – was Friday’s session that I chaired, “The Way We Were: The 1940 Census.” It featured NARA archivist Ms. Constance “Connie” Potter and two NARA Applied Research partners: Dr. Kenton McHenry from the University of Illinois at Urbana-Champaign (UIUC) and Dr. Richard Marciano from the University of North Carolina at Chapel Hill (UNC), who also happened to present on the UNC DCAPE project the day before.
Connie led off the session as NARA’s expert in helping researchers access census records. She explained how the traditional process for searching past censuses changed with the 1940 census, which also introduced new questions and a supplemental schedule.
Connie then pointed us to some information recently posted on NARA’s website (linked at the end of this post) to help folks prepare for the 1940 Census release on April 2 next year. For example, you can read through the manual given to census takers, which gives detailed instructions on asking the questions and properly filling out the forms.
Dr. Kenton McHenry, our second speaker, gave a lively presentation about the project he and his team are working on at the National Center for Supercomputing Applications (NCSA) at the University of Illinois. His presentation focused on the possibilities for free, searchable access to census records.
When the census forms are scanned, they are exactly that: images that carry no descriptive information, such as who or what is in the image. You get a snapshot of columns filled with questions and handwritten responses, but there is no way to search the names or addresses (or, for that matter, any other information collected on the form) across tens of thousands of scanned images.
After each decennial census is conducted (72 years must pass before it can legally be released), the forms are scanned and made available by NARA. Once the images are released, commercial genealogy companies hire people to input the names, addresses, and other data handwritten on the forms, a process that can take several months. They then make the information available for a fee through searchable databases on their websites.
Take the illustration on the left as an example. It could take 4-7 months for people to read through the text, interpret or recognize the information on the form (such as the word “Daughter” in this example), and then “fat-finger” – or manually type – the words into a database.
But to be sure the entry is accurate, a second person would have to confirm that the word really reads “Daughter” and is spelled correctly. Imagine how many people would have to step in if someone had really bad handwriting!
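One common way to organize that kind of check is double keying: two people transcribe the same fields independently, and only the fields where they disagree go to a third person for review. The Python sketch below is a minimal illustration under our own assumptions; the record layout and the `compare_keyings` helper are hypothetical and do not describe any genealogy company’s actual workflow.

```python
# A minimal sketch of double-keying verification (hypothetical workflow):
# two keyers transcribe the same fields independently, and any field where
# their entries disagree is flagged for a third person to review.


def normalize(text: str) -> str:
    """Ignore differences in letter case and extra whitespace."""
    return " ".join(text.split()).casefold()


def compare_keyings(first: dict[str, str], second: dict[str, str]) -> list[str]:
    """Return the names of fields whose two transcriptions disagree."""
    flagged = []
    # Check every field either keyer filled in; a missing field counts as "".
    for field in sorted(first.keys() | second.keys()):
        if normalize(first.get(field, "")) != normalize(second.get(field, "")):
            flagged.append(field)
    return flagged


if __name__ == "__main__":
    # Hypothetical transcriptions of one line on a census form.
    keyer_1 = {"last_name": "Smith", "first_name": "Mary", "relationship": "Daughter"}
    keyer_2 = {"last_name": "Smith", "first_name": "Mary", "relationship": "Daugter"}
    print(compare_keyings(keyer_1, keyer_2))  # ['relationship']
```

Only the misspelled “Daugter” is flagged, so the reviewer looks at one field instead of re-reading the whole line; the cost is that every form still has to be typed twice by hand.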
Dr. McHenry’s team uses advanced technologies that enable computers to recognize parts of a scanned form, such as columns (last name, first name, address) or combinations of image patterns (such as handwritten words or numbers), and to assign them meaning accurately. Because the team uses open source tools, the released data could ultimately be made available quickly and at no charge.
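The post doesn’t go into the NCSA team’s methods. As a much simpler illustration of how software can find the columns on a scanned form, here is a minimal Python sketch of a classic textbook technique, the vertical projection profile: in a binarized scan, printed ruling lines appear as pixel columns that are almost entirely dark. This is a swapped-in technique for illustration only, not the team’s approach; the toy 0/1 image and the `min_fill` threshold are hypothetical.

```python
# A minimal sketch of column detection with a vertical projection profile.
# Assumption: the scan is already binarized into rows of 0 (paper) and 1 (ink).
# This is a generic textbook technique, not the NCSA team's actual method.


def ink_per_column(image: list[list[int]]) -> list[int]:
    """Count the dark pixels in each vertical pixel column."""
    return [sum(col) for col in zip(*image)]


def find_ruling_lines(image: list[list[int]], min_fill: float = 0.9) -> list[int]:
    """Return x-positions whose pixel column is at least `min_fill` dark,
    i.e. likely printed vertical lines separating the form's columns."""
    height = len(image)
    return [x for x, ink in enumerate(ink_per_column(image)) if ink >= min_fill * height]


def column_spans(image: list[list[int]], min_fill: float = 0.9) -> list[tuple[int, int]]:
    """Return (start, end) x-ranges of the cells between adjacent ruling lines."""
    lines = find_ruling_lines(image, min_fill)
    # Skip gaps of zero width (e.g. a ruling line two pixels thick).
    return [(a + 1, b - 1) for a, b in zip(lines, lines[1:]) if b - a > 1]


if __name__ == "__main__":
    # Toy 4x9 "form": ruling lines at x = 0, 4 and 8, handwriting in between.
    form = [
        [1, 0, 1, 0, 1, 0, 0, 1, 1],
        [1, 1, 1, 0, 1, 0, 1, 0, 1],
        [1, 0, 0, 0, 1, 1, 1, 0, 1],
        [1, 0, 1, 1, 1, 0, 0, 0, 1],
    ]
    print(column_spans(form))  # [(1, 3), (5, 7)]
```

Once the column spans are known, each cell can be cropped and sent to a recognizer for that field (a name, a relationship, an age), which is what lets software attach meaning to the handwriting rather than treating the page as one undifferentiated image.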
The third speaker was Dr. Richard Marciano from UNC’s Sustainable Archives & Leveraging Technologies (SALT) Lab. He described a project in which his team uses census and other data to create mashups that let researchers visually explore and combine the data in ways not previously possible. For example, you could examine a map showing the geographic distribution of people of different races in a city and overlay it with a map showing areas of discrimination.
Last September, for example, Richard presented the SALT team’s “T-RACES” project at NARA’s 1940 Census Workshop. T-RACES analyzed redlining practices, carried out between 1932 and 1964 by home loan finance institutions (including the FHA), that used census data to exclude non-white families from receiving housing loans.
Click on the picture below to watch a video of the September presentation – but it’s about one hour long, so finish reading this blog post, then watch the video!
We then facilitated a discussion with the session attendees about how NARA’s research efforts have addressed, and continue to address, the challenges and needs of archives and records management communities like NAGARA. It was clear that the audience, including the genealogy service companies in attendance, was interested in the possibilities for improved access using technologies such as those presented by Kenton and Richard. In addition, participation through social media, such as volunteer crowdsourcing collaborations, could help ensure that future censuses are quickly and easily accessible to the public at no charge.
Here are some links related to this post:
- To prepare for the release of the 1940 Census next year, visit NARA’s web site for the 1940 Census.
- Follow all our blog posts on the 1940 Census!
- Learn about the National Association of Government Archives and Records Administrators (NAGARA)
- The 1940 Census Workshop, hosted by NARA on September 13, 2010
- Just posted! Pictures from the NAGARA Session C-14
- Join the Applied Research Facebook Group
We’d love to hear your feedback! Please leave your comments and questions below, or send an email to us at firstname.lastname@example.org