23 April 2024

2 Free Tools Can Read Document Images for You

Genealogist Lisa Alzo uses a website called Transkribus for recognizing text within images. It's a process that's been around for decades: Optical Character Recognition or OCR. I looked into Transkribus, but it isn’t free. So I searched for free OCR options we can all use.

It turns out a tool you may already be using has this capability. It’s OneNote!

I can think of 2 key reasons to use OCR in genealogy research:

  1. To pull text from images so you don't have to re-type it.
  2. To translate a large amount of text from another language.

Last June I wrote about a book that tells the history of one of my ancestral hometowns. (See "How to Use a Foreign-Language Book for Family Tree Research.") A distant cousin sent me the Italian-language book years ago. I began using Google Translate and saving the results in a Word document. It’s tedious work, though. I have to type the Italian into Google Translate so it can generate the English translation.

You're probably already using 2 free tools that can do more for your family tree than you know. They can extract text from a genealogy document image.

Extract Text from a Photo and Translate

Using OneNote, you can:

  • Photograph (or scan) the pages of the book.
  • Drop the images into a OneNote file.
  • Extract the text by right-clicking an image and choosing Copy Text from Picture. This copies the text to the clipboard.
  • Paste the copied text either below the image or in a new section.
  • Translate that text by choosing Translate > Translate Page.

The translated text appears in a new section of your OneNote document. It's ready for you to format and look over for any errors. It’s hard to find OCR software that will format your text nicely, so there's always a little work to do. OneNote keeps the line breaks from the original, so you have to do some editing to make it more readable.
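If you're comfortable with a little Python, the line-break cleanup can be partly automated. This is only a minimal sketch, not anything OneNote provides: the function name `reflow` is my own, and it assumes blank lines mark paragraph breaks and that a hyphen at the end of a line means a word was split in two. That second assumption will sometimes be wrong for genuinely hyphenated words, so you still need to proofread.

```python
import re

def reflow(ocr_text: str) -> str:
    """Join hard-wrapped OCR lines into paragraphs.

    Assumptions (hypothetical, adjust for your own scans):
    - one or more blank lines separate paragraphs
    - a trailing hyphen means a word was split across two lines
    """
    paragraphs = []
    # Split on blank lines to find the paragraph boundaries.
    for block in re.split(r"\n\s*\n", ocr_text.strip()):
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        text = ""
        for line in lines:
            if text.endswith("-"):
                # Rejoin a word that was hyphenated across a line break.
                text = text[:-1] + line
            elif text:
                text += " " + line
            else:
                text = line
        paragraphs.append(text)
    return "\n\n".join(paragraphs)
```

To use it, paste the extracted text into a plain text file, read it into a string, and print or save the result of `reflow(...)`.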

The translation uses British English even though U.S. English is set as my preferred language. I'll have to change words like favour, colour, and analysed for myself. And I have to look out for footnote numbers. You know how books use a small, raised number to point you to a footnote? They don't get extracted as a superscript number, so they tend to blend into the text.
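Both of these chores can be roughed out in Python, too. The sketch below rests on my own assumptions: the word list `UK_TO_US` holds only the three examples above and would need extending, and `flag_footnotes` uses a simple guess that a short number stuck directly onto a word or punctuation mark is a footnote marker. That guess can misfire on things like decimals, so treat the `[^n]` marks as places to look, not as final edits.

```python
import re

# A tiny sample word list (hypothetical); add more pairs as you find them.
UK_TO_US = {
    "favour": "favor",
    "colour": "color",
    "analysed": "analyzed",
}

def americanize(text: str) -> str:
    """Replace whole-word British spellings, keeping a leading capital."""
    pattern = re.compile(r"\b(" + "|".join(UK_TO_US) + r")\b", re.IGNORECASE)

    def swap(match):
        word = match.group(0)
        replacement = UK_TO_US[word.lower()]
        return replacement.capitalize() if word[0].isupper() else replacement

    return pattern.sub(swap, text)

def flag_footnotes(text: str) -> str:
    """Wrap a 1-3 digit number in [^n] when it is glued to the end of a
    word or punctuation mark and followed by a space or end of text."""
    return re.sub(r"(?<=[A-Za-zÀ-ÿ.,;:!?)])(\d{1,3})(?=\s|$)", r"[^\1]", text)
```

Run `flag_footnotes` first and `americanize` second. A word with a footnote number glued to it, such as "analysed3", is not seen as a whole word until the number has been bracketed.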

I can imagine spending a day putting that book on my scanner, and capturing two pages at a time in an image file. Then I can drop a bunch of images into OneNote, extract and translate.

Turn Handwriting into Text

I did three tests with handwritten Italian documents. OneNote failed to extract the text from them. One of my tests was a 1942 death record with a fill-in-the-blanks format. OneNote extracted the typewritten parts of the form, and skipped over the handwriting!

Then I wrote a simple note in the nicest print I can manage. OneNote couldn't extract any text. If it could, that would be handy for capturing what's written on the back of a family photo.

Then I learned that Google Docs can extract text for you, too. The steps are as follows:

  • Log into your free Google Drive account using a web browser or the app.
  • Upload an image of the text you want to capture.
  • Right-click that image and choose "Open with" > "Google Docs."

The Doc file will contain the image and its extracted text.

This is an easy way to turn handwriting into text. I tested it on the note I printed, and it worked perfectly. I tested it on an old Italian death record and it didn't recognize anything. But it should be great for the backs of photos or old letters written by your ancestors.

I encourage you to give them both a try.

15 comments:

  1. Wow! You always have such great information, and this one is over the top! Thanks for all you do to help us put more leaves on our family trees.

  2. I always read, enjoy and learn from your articles. Lately, however, and today, particularly, PDF ads interfere with my reading enjoyment and interrupt my focus. I've already complained to Google but want you to know. I used to be able to save your articles and keep them for reference but no longer can because of this issue. Thank you.

    Replies
    1. I think I've turned off that big ad that takes up the whole screen. Hopefully that worked. Thanks for the feedback!

  3. I tried to upload a jpg image to Google Docs of a file that I wanted to turn the handwriting to text, but I got a message saying that the selected file is not available to upload. Any comments?

    Replies
    1. Were you uploading the image from your computer or from where it resides on another website? Maybe if the image isn't in your possession you can't upload it.

    2. I have the file on my computer. Originally it was a screenshot from an online record. I have tried uploading it to my Google Docs in many formats including jpg, png and pdf. I always get the message saying that the file is not supported for upload.

    3. I'm sorry to hear that, and I can't make any sense of it.

  4. What a wonderful tip! I have a book about my GF's hometown in Hungary that is written in Hungarian. I'm so excited to translate the book using these tools! Thank you!

  5. Thank you again!!! Wish they had better handwriting in some of those old documents!

    Replies
    1. As I was testing it on handwriting, I kept thinking I can probably make out the text more easily than it can!

  6. This is a great tip as usual. I hope handwriting recognition improves. It is really a game changer.

  7. The only caveat is that if you use Google Docs, it counts toward your total Google storage, including email and photos, so I probably won't use it.

    Replies
    1. Seriously? You can always download and delete the items from Google Docs when you're done. My Google account has 15 GB of space and I've only used 1.57 GB of it.
