12 December 2023

How to Batch Process Your Genealogy Documents

I spent 24 years coding websites before I retired, and I was faster than my colleagues because I kept finding ways to be more efficient. Now I apply those skills to genealogy.

This past weekend I added 114 military records to my family tree. I would have doubled that number, but the website they come from dies every day at 2 p.m. Eastern Time. (Do they unplug the router when they go home?)

These records include a ton of facts about each soldier, but the key facts I'm after are when, where, and how he died. Adding each of these invaluable records to my family tree takes many steps:

  1. Search for the soldier on the website (in this case, it's the website of the Benevento State Archives in Italy).
  2. View his page of details and download a PDF with the record image.
  3. Extract the image from the PDF. (This is a function of Adobe Acrobat.)
  4. Edit the image in Photoshop for top quality and a consistent image size (1500 pixels wide).
  5. Create a source citation from a template I created.
  6. Add the citation and a title to the image's document properties and drag it into Family Tree Maker.
  7. Create the death fact for the soldier, add the same source citation, and attach the image to it.
  8. Add the date of death and a category (Military) to the image.

Now do that 113 more times.

At first, I didn't realize the site was crashing at 2:00 each day, so I was working as if it might crash at any moment. To borrow a computer programming term, I started batch-processing the military records.

You'll be faster, more efficient, and more professional with this genealogy document-handling method.

Real batch processing means one computer program automates a series of tasks over and over. In this case, I suppose I'm the computer, running the 8 steps above on soldier after soldier. Doing it this way ensures that:

  • All my military record document images have consistent quality.
  • All the source citations for these records follow the same format.
  • None of the 8 steps are skipped.
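
Batch processing is really a checklist applied the same way every time. If you're comfortable with a little Python, here's a minimal, hypothetical sketch of that idea: it reports which of the 8 steps each soldier in a batch still needs. The short step labels are my own shorthand for the list above, not names from any program.

```python
from typing import Dict, Iterable, List

# Hypothetical short labels for the eight steps listed earlier in this post.
STEPS: List[str] = [
    "search", "download PDF", "extract image", "edit image",
    "create citation", "set properties and import", "add death fact",
    "add date and category",
]

def missing_steps(done: Iterable[str]) -> List[str]:
    """Steps not yet checked off, kept in their original order."""
    finished = set(done)
    return [step for step in STEPS if step not in finished]

def batch_report(progress: Dict[str, Iterable[str]]) -> Dict[str, List[str]]:
    """For each soldier in the batch, list whatever steps were skipped."""
    return {soldier: missing_steps(done) for soldier, done in progress.items()}

if __name__ == "__main__":
    progress = {
        "Autore Giuseppe 1875": STEPS,         # every step done
        "Hypothetical Soldier 1893": STEPS[:5],  # stopped after the citation
    }
    for soldier, missing in batch_report(progress).items():
        print(soldier, "->", missing or "complete")
```

Keeping the steps in one ordered list means the report always names skipped steps in the order you'd normally do them.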

For this project, I have one more ace in the hole. The website has these documents for every man from the Benevento Province who died in World War I. First I made a list of every document for each of my ancestral hometowns. From these lists I created one spreadsheet of 274 soldiers. That tells me exactly who I'm searching for each time. I added a column where I can mark which documents are now in my family tree.
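
A spreadsheet like this also makes it easy to pick the next batch automatically. Here's a minimal Python sketch, assuming the spreadsheet has been saved as CSV with hypothetical column names Surname, GivenName, BirthYear, and InTree (where "yes" marks a soldier already in the tree). It returns the next 6 unmarked soldiers, one per browser tab.

```python
import csv
import io
from typing import Dict, List

BATCH_SIZE = 6  # one browser tab per soldier

def load_rows(csv_text: str) -> List[Dict[str, str]]:
    """Parse the spreadsheet after it has been saved as CSV text."""
    return list(csv.DictReader(io.StringIO(csv_text)))

def next_batch(rows: List[Dict[str, str]],
               size: int = BATCH_SIZE) -> List[Dict[str, str]]:
    """Return the first `size` soldiers not yet marked as in the tree."""
    pending = [row for row in rows
               if (row.get("InTree") or "").strip().lower() != "yes"]
    return pending[:size]

if __name__ == "__main__":
    # Made-up sample rows for illustration only.
    sample = (
        "Surname,GivenName,BirthYear,InTree\n"
        "Esposito,Antonio,1885,yes\n"
        "Rossi,Mario,1890,\n"
        "Bianchi,Luigi,1893,\n"
    )
    for row in next_batch(load_rows(sample)):
        print(row["Surname"], row["GivenName"], row["BirthYear"])
```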

I came up with a way to cram in as many documents as possible before the site crashes each day. I search for 6 soldiers and open each man's summary page in its own tab. I immediately download each man's PDF file and give it a consistent name: last name, first name, year of birth, and "MilitaryRecord." For example, AutoreGiuseppe1875MilitaryRecord.pdf.
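
The naming pattern is simple enough to generate rather than type. A minimal Python sketch follows; the handling of spaces and apostrophes in names is my own assumption, not part of the pattern described above.

```python
def record_filename(surname: str, given: str, birth_year: int,
                    doc_type: str = "MilitaryRecord", ext: str = "pdf") -> str:
    """Build names like AutoreGiuseppe1875MilitaryRecord.pdf."""

    def squash(part: str) -> str:
        # Assumption: keep only letters and digits, so spaces and apostrophes
        # in names such as "De Rosa" or "D'Angelo" disappear.
        return "".join(ch for ch in part if ch.isalnum())

    return f"{squash(surname)}{squash(given)}{birth_year}{doc_type}.{ext}"

if __name__ == "__main__":
    print(record_filename("Autore", "Giuseppe", 1875))
    # AutoreGiuseppe1875MilitaryRecord.pdf
```

Passing ext="jpg" would give the extracted image a name that matches its PDF.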

Next, with the 6 tabs still open, I open each PDF file one at a time and extract the images. I use an old version of Adobe Acrobat Pro where the command for this is File > Export > Image > JPEG. Then I drag and drop all the images into Photoshop. For each image, I do this:

  • Image > Auto Color. For some reason, the documents all look very yellow. Auto Color makes the paper white, the ink black, and the rubber stamps blue. That's how they looked when I saw several of them in person.
  • Image > Auto Contrast. This makes the ink a bit darker and the paper a bit whiter.
  • Export As. Here I can reduce the file size by entering a consistent image width of 1500 pixels.
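
Photoshop handles all of this for me, but the arithmetic behind two of these steps is easy to see in a short Python sketch (no image library involved). The export size keeps the page's proportions at a fixed 1500-pixel width. The contrast function is a simplified min-max stretch that only illustrates the idea of making the darkest ink black and the lightest paper white; it is not Photoshop's actual Auto Contrast algorithm.

```python
from typing import List, Tuple

TARGET_WIDTH = 1500  # pixels: the same width for every record image

def export_size(width: int, height: int,
                target_width: int = TARGET_WIDTH) -> Tuple[int, int]:
    """Scale to the target width, keeping the aspect ratio."""
    return target_width, round(height * target_width / width)

def stretch_contrast(pixels: List[int]) -> List[int]:
    """Min-max stretch of 0-255 grayscale values.

    A simplified stand-in for the idea behind Auto Contrast: the darkest
    value becomes 0 (black) and the lightest becomes 255 (white).
    """
    lo, hi = min(pixels), max(pixels)
    if hi == lo:
        return list(pixels)  # a flat image has nothing to stretch
    return [round((p - lo) * 255 / (hi - lo)) for p in pixels]

if __name__ == "__main__":
    print(export_size(2480, 3508))  # (1500, 2122)
    print(stretch_contrast([60, 120, 200]))
```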

Now I have 6 document images waiting for their source citations. The details for the citations are on the 6 open tabs in my web browser. Here's the format I'm using:

From the Benevento State Archives, military records, fallen soldiers; register #75, record #4292, class #1893
http://archiviodistatobenevento.beniculturali.it/index.php?it/209/ricerca-caduti/caduti/2183
http://archiviodistatobenevento.beniculturali.it/ImgDb/Riproduzioni/ASBnRm075_04292.pdf
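
A citation template is just text with blanks, so it can be filled in by script. This minimal Python sketch reproduces the example above. It rests on two assumptions inferred from this single example: the first URL changes only in its final number (2183 here, which you'd copy from the browser), and every PDF name uses a 3-digit register and a 5-digit record number.

```python
BASE = "http://archiviodistatobenevento.beniculturali.it"

def pdf_url(register: int, record: int) -> str:
    # Assumption: inferred from one example (register 75, record 4292 ->
    # ASBnRm075_04292.pdf). Always compare with the link on the soldier's page.
    return f"{BASE}/ImgDb/Riproduzioni/ASBnRm{register:03d}_{record:05d}.pdf"

def citation(register: int, record: int, class_year: int, page_id: int) -> str:
    """Fill in the three-line citation format shown above."""
    # Assumption: page_id is the only part of the page address that changes;
    # it must be copied from the browser, not calculated.
    page = f"{BASE}/index.php?it/209/ricerca-caduti/caduti/{page_id}"
    return (
        "From the Benevento State Archives, military records, fallen soldiers; "
        f"register #{register}, record #{record}, class #{class_year}\n"
        f"{page}\n"
        f"{pdf_url(register, record)}"
    )

if __name__ == "__main__":
    print(citation(75, 4292, 1893, 2183))
```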

When I went to the archives to see my grandfather's record, all I needed was the register number, record number, and class number. These are the critical facts.

The first URL in the citation is the page that's open in those 6 tabs. The second URL (found on that page) is for the PDF itself. The register and record numbers are on the page, and they're also part of the PDF's URL. The class # is the soldier's year of birth.
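
Because the register and record numbers are embedded in the PDF's URL, a script could read them back out as a cross-check against what you typed into the citation. A minimal Python sketch, assuming the ASBnRm file-name pattern from the example citation holds for every record:

```python
import re
from typing import Tuple

# Assumption: PDF names follow the ASBnRm<register>_<record>.pdf pattern
# seen in the example citation (ASBnRm075_04292.pdf).
PDF_NAME = re.compile(r"ASBnRm(\d+)_(\d+)\.pdf$")

def register_and_record(pdf_url: str) -> Tuple[int, int]:
    """Pull the register and record numbers out of a PDF URL."""
    match = PDF_NAME.search(pdf_url)
    if match is None:
        raise ValueError(f"Unexpected PDF URL: {pdf_url}")
    # int() drops the leading zeros used in the file name.
    return int(match.group(1)), int(match.group(2))

if __name__ == "__main__":
    url = ("http://archiviodistatobenevento.beniculturali.it"
           "/ImgDb/Riproduzioni/ASBnRm075_04292.pdf")
    print(register_and_record(url))  # (75, 4292)
```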

Then, working through the 6 open tabs one at a time, I:

  • Find the soldier in my tree and add his death fact.
  • Create the source citation and put it in the image's file properties.
  • Drag the image into Family Tree Maker and make it his profile picture unless he has a better one.
  • Add the source citation to the death fact and attach the image to it.
  • Add the date of death and a category to the image.

When you batch process any type of document in this way, your work becomes consistent and professional. As you're doing it, you'll get into a groove that lets you move faster through the steps. Once those steps become familiar, you can process a group of documents faster than you ever imagined.

My master spreadsheet contains 105 men who probably aren't in my family tree. Yet. They came from towns I haven't completely documented. (To see what I mean, read "How to Create and Share Your Ancestral Town Database.") Take those 105 men and the 114 records I've already added away from the 274 soldiers on my list, and that leaves 55 more military records to add using this batch process. I'm sure I can get that done in one or two more sessions. (In fact, I finished in one session!)

Keep batch processing in mind when you're tracking down any type of document. When the NYC Municipal Archives put their vital records online, I downloaded a huge number of documents. I created a citation template and fixed each image's color, contrast, and size. This added a ton of value to the New York City branches of my family tree. Here's another look at the idea: "Step-by-Step Source Citations for Your Family Tree."

Imagine the consistency you can achieve if you handle all your census records this way. Or ship manifests. Or newspaper clippings. Think through your process for each document type, including what to add to your family tree. Go through it step by step, then repeat for all the same types of documents. Now you've done some truly professional genealogy work.

2 comments:

  1. Excellent advice. Like you, I pull certs from the NYC Muni Archives site all the time. Instead of pdf, I do a screen grab of the cert only so it can be uploaded to my tree(s) and easily visible.

    1. It was like a miracle when I discovered that Acrobat had an Export Image function. It does a perfect job--very high resolution!
