22 November 2022

When Is a Genealogy Harvest Too Big?

Halfway through International Genealogy Loose Ends Month I faced up to a big problem. (See "Make November Genealogy Loose Ends Month.") My Family Tree Maker file is bloated, with 57,827 people. It can take 3 hours for me to compact the file, which is an important maintenance step. I have to leave my computer running overnight so my files can upload to the cloud.

Something's got to give!

Early last week I was fixing existing images in my family tree—not adding new ones. I edited every World War I and II draft card to crop out the black space. I love the results! (See "How to Improve Your Digital Genealogy Documents.") I replaced bad images with good ones of a smaller file size. That's a worthwhile task, and I planned to move on to bad census images next.

Then I remembered a loose end from earlier this year. The New York City Municipal Archives released vital records for the city's boroughs, and I have tons of relatives from the city.

You've hit the jackpot in vital records for your family tree. Can you accept them?

In my family tree, whenever possible, I used the NYC vital record indexes on Ancestry.com to note certificate numbers. For example, my grandmother was born in the Bronx in 1899. In the description field for her birth, I added "Bronx birth certificate #3072." I did that for every birth, marriage, and death record from the city when possible. (See the "Day 5" section of "7 Days to a Better Family Tree.")

This past week I spent a day gathering 172 documents from the Municipal Archives' website. The most efficient way to tackle this task was to use the latest GEDCOM file exported from my family tree.

I opened my GEDCOM in a text editor and searched for:

  • Bronx birth certificate
  • Bronx death certificate
  • Bronx marriage certificate

…doing that for Manhattan, Brooklyn, and Queens as well. The certificates are downloadable as PDFs, but I can export each certificate from Acrobat as an image. Two-page certificates export as two images. Now I have 172 PDFs plus 309 document images!
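The text-editor search above can also be done with a short script. Here's a minimal Python sketch, assuming the certificate numbers were typed in the same form as the example earlier ("Bronx birth certificate #3072"). The pattern, the function names, and the sample lines are hypothetical illustrations, not part of Family Tree Maker or the GEDCOM standard.

```python
import re

# Hypothetical pattern: matches phrases like "Bronx birth certificate #3072".
# Assumes descriptions were typed consistently as
# "<Borough> <record type> certificate #<number>".
CERT_PATTERN = re.compile(
    r"\b(Bronx|Manhattan|Brooklyn|Queens)\s+(birth|death|marriage)\s+certificate\s+#(\d+)",
    re.IGNORECASE,
)


def find_certificates(gedcom_lines):
    """Return unique (borough, record_type, number) tuples, in order found.

    Every line is scanned regardless of its GEDCOM tag, so it doesn't matter
    whether the text landed in a NOTE or an event value. Text split across
    CONC/CONT continuation lines is NOT rejoined here.
    """
    found = []
    for line in gedcom_lines:
        for match in CERT_PATTERN.finditer(line):
            borough, record_type, number = match.groups()
            found.append((borough.title(), record_type.lower(), number))
    # A marriage certificate may be cited on both spouses; keep one copy.
    return list(dict.fromkeys(found))


def find_certificates_in_file(path):
    # GEDCOM exports are often UTF-8; errors="replace" keeps going on odd bytes.
    with open(path, encoding="utf-8", errors="replace") as f:
        return find_certificates(f)


if __name__ == "__main__":
    # Made-up sample lines for illustration only.
    sample = [
        "0 @I1@ INDI",
        "1 BIRT",
        "2 DATE 1899",
        "2 NOTE Bronx birth certificate #3072",
    ]
    for borough, kind, number in find_certificates(sample):
        print(f"{borough} {kind} certificate #{number}")
```

Run against a full GEDCOM export, the output doubles as a to-do list for downloading from the Municipal Archives. Because it doesn't rejoin text an exporter split across CONC or CONT lines, a quick search in the text editor is still a good cross-check.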

With my Family Tree Maker file already struggling under its own weight, I'm not about to add 309 images to it. Holy cow, that would take forever anyway.

Instead, I know I have the information available whenever I need it. I can create a source citation for each certificate that includes a link to the PDF. This way, anyone who finds their relative in my tree on Ancestry can get the document for themselves.

Sticking to the True Goal of My Family Tree

I began thinking of what I could do next without adding more documents to my tree.

My family tree's goal is to help people with ancestors from any of my ancestors' hometowns. It has names, dates, and places for TONS of people. But most of those facts have no sources or documents.

To add value to my tree, I can build useful source citations that include a link to the original documents. I don't have to add the documents themselves.

But before I build my missing source citations, I have another, really big loose end. Many years ago I documented my Grandpa Leone's entire hometown of Baselice, Italy. I did this by viewing microfilmed vital records (1809–1860) at a local Family History Center. It took me about 5 years to do. (See "Why I Recorded More Than 30,000 Documents.")

All those countless facts have well-crafted but useless source citations. Why is that? Because they cite the microfilm number you would need to order from a Family History Center, and the Family History Centers ended their microfilm program a few years ago.

Source citations can become obsolete. I know. I had 25,000 of them in my family tree. Here's the format for the updated citations.

Today all the documents I was citing are available on Italy's free Antenati website. (See "How to Use the Online Italian Genealogy Archives.") It would be fantastic to rid myself of these 25,000 bad source citations and create usable ones to replace them.

In fact, I decided to delete every one of the outdated citations in one fell swoop and then work on adding good citations. In Family Tree Maker's Sources tab, I could rip out all the bad source citations at once. They're gone now. The process was scary, but all is well. And it cut my 7GB file size in HALF!!

Wrapping Up "Loose Ends Month"

I'll close out November by creating or improving source citations for documents and facts already in my family tree. Then I'll return to my previous project, which documents the town of Baselice after 1860.

It's been a tremendous experience focusing on loose ends this month. I'm so excited by all I found. I don't know about you, but I'd like to dedicate one week a month to genealogy loose ends. Who knows? There may come a time when everything is all tied up.

Yeah, right!


  1. You must be very confident that you won't lose your source documentation?

    1. I have a very tight file back-up policy, and all my documents are searchable on my computer. I can locate whichever one I need in a second.

  2. DiAnn, I know this idea might be a little out of your comfort zone, but it might be time to look at a different piece of software. I believe you use RootsMagic, but you should give Legacy Family Tree a test drive. I, too, have a large Italian and Nova Scotia family, with about 71,000 names and 4,700 images (documents, photos, etc.) attached to the family tree GEDCOM files, and it takes about 20-30 minutes to back up the files (the largest, with all the images, is 5.12GB).

    By the way, I have been working through the Antenati files, and I am curious how you source the Antenati site and documents.

    Thanks for the great topics. Cheers!

    1. I use Family Tree Maker because synchronizing my work with my tree that appears on Ancestry.com is something I can't do without. My next article will include a graphic to show how I source Antenati files, but here is the basic format:

      From the xxx State Archives, YEAR TYPE, TOWN, document xx, image xx of xx at book URL

      image URL

    2. Note that my article about using Antenati explains how to get the exact document image URL. http://bit.ly/antenati -- and thanks for reading!

    3. Actually, it's this article that has the source citation image for Antenati!

    4. Yes, I saw how you sourced Antenati in the article after I re-read the images a few times. Thanks for pointing it out to us. 2022 has been the year that I have really cleaned up my sources and tried to make them consistent. I am always looking for better ways to record the information in my sources.