Digitization: What you want

Or, where’d you get your information from, huh?

Non-Mormon researchers are frequently shocked by things like the total sales of the Joseph Smith Papers Project’s volumes. Comparable papers editions frequently sell in the hundreds, whereas the first Journals volume of the JSPP sold scores of thousands. Now, I realize that the vast majority of those volumes are destined to reside as trophies on Latter-day Saint bookshelves, unread. However, Mormons clearly have an interest in history that drives feats of strength that would be absurd to believers in other traditions. Voici, the digitization of published and manuscript (or holograph) materials. Various institutions have, over the last decade, digitized a shocking amount of material, an oeuvre that has, for example, allowed me to research and publish in Mormon history when I otherwise would not have been able.

Inspired by WVS’s recent post, I’d like to evaluate aspects of the various digitization efforts. Basically, I’m Anubis, this is my scale, and I just happen to have the feather of Ma’at. Things that are important:

Digital Images
If you live far from the repositories or simply don’t have time to run to the library, having access to images of the documents is really a tremendous win. Collections that offer digital images alone are typically of manuscript documents, as OCR (see below) is so readily available. Important manuscript collections include the Selected Collections DVDs, which are available online if you are a BYU student, and are partially available to everyone at the CHL website. The full DVDs were apparently supposed to be made available there last fall and then last spring. Not sure where that update stands. Having color images of these documents is, in a word, huge. Some of this material was restricted not too long ago. Another example is the Cache Valley diaries at the USU Library digital collections.

Transcripts come in two flavors: OCR and human.

OCR stands for Optical Character Recognition, and it has been around for a long time, though the technology certainly has improved. Back in the early nineties, several groups took it upon themselves to scan books and sell digital libraries using the now-deprecated NFO file format. The internet sort of killed this, but many of these tools are still in circulation as they also include sources that are otherwise not accessible. These include Signature’s New Mormon Studies CD-ROM and Deseret Book’s Gospelink (LDS Library is another, though now defunct, example). The former just sounds like a period piece, but it is still a must-have for the Woodruff diaries alone.

The thing about OCR is that, especially twenty years ago, it was sort of crap. So there are not a few errors, and it is generally advisable to compare against originals. Good for searching, but always verify.

More recently, groups are making published materials available as images with OCR texts associated with them. Examples include university digital collections, which have also produced the Utah Digital Newspapers, the LDS Church History Library’s ambitious and quickly enormous Internet Archive, and the ever-beneficent Google Books.

Human transcripts take a lot more effort. My Adobe Acrobat can OCR a scanned document in seconds. It would take me much, much longer to transcribe it by hand. Because of the effort required, these transcripts are typically produced only for manuscript documents. And the only two institutions of which I am aware that provide both images and manual transcripts of manuscript materials are the Joseph Smith Papers and the BYU Digital Collections, viz., the Mormon Missionary Diaries and the Overland Trails Diaries. And really, the quality of these two groups’ offerings is simply incredible. With BYU the document transcripts are available in both HTML and PDF.

Finding Aids
What good do these 0s and 1s floating in the cloud do if no one can find them? For example, if you didn’t know that the Utah Genealogical and Historical Magazine was recently moved from the BYU Digital Collections to the LDS Family History Library digital repository, there would be no way to find it. No soup for you! Google is definitely your friend here, but drilling down into the catalogues can sometimes be the only way to find things. Consequently, search-engine-friendly finding aids are a real boon (like those that existed at BYU before their upgrade [shakes fist at sky]).

And here is a thing. The amount of content that is now digitized is so enormous that one cannot read it all (for interesting discussions see here and here). So while it is very helpful to be able to search through a single document transcript, the ability to search globally across a collection yields truly a glorious fruit (the consumption of which leads to knowledge or death, depending on how lazy you are). I can honestly say that every project that I have worked on has been improved by global search functions. Who has global search? Legacy NFOs, universities, Google, and the JSPP. Losers: The CHL and the FHL.

Even better than global search is the advanced search of university digital collections. This allows you to search for terms near each other, truncate search terms, search particular collections, and constrain searches all at the same time (granted, legacy NFOs let you do this as well).
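For the curious, here is roughly what those advanced options amount to under the hood. This is only a minimal sketch in Python, not how any of the repositories above actually implement their search; the function names, the collection names, and the sample diary text are all made up for illustration. A term ending in `*` is truncated (it matches any word with that prefix), `window` sets how near the two terms must be, and `only` constrains the search to particular collections.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def matches(token, term):
    """Exact match, unless the term ends in '*', which means truncation."""
    term = term.lower()
    if term.endswith("*"):
        return token.startswith(term[:-1])
    return token == term


def near(text, term_a, term_b, window=5):
    """True if term_a and term_b occur within `window` words of each other."""
    tokens = tokenize(text)
    pos_a = [i for i, t in enumerate(tokens) if matches(t, term_a)]
    pos_b = [i for i, t in enumerate(tokens) if matches(t, term_b)]
    return any(abs(i - j) <= window for i in pos_a for j in pos_b)


def search(collections, term_a, term_b, window=5, only=None):
    """Search all documents, optionally constrained to named collections."""
    hits = []
    for name, docs in collections.items():
        if only is not None and name not in only:
            continue
        for title, text in docs.items():
            if near(text, term_a, term_b, window):
                hits.append((name, title))
    return hits


# Invented sample data, standing in for real transcripts.
SAMPLE = {
    "Sample Diaries": {
        "Diary A": "We anointed the sick sister with oil and she was healed.",
    },
    "Sample Newspapers": {
        "Issue B": "Crossed the river. Much later an elder spoke of healing.",
    },
}

print(search(SAMPLE, "anoint*", "heal*", window=10))
```

Tighten `window` and the diary entry drops out; pass `only={"Sample Newspapers"}` and the diaries are never consulted. That combination of knobs is what makes advanced search so much more useful than a single search box.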

Winner, winner, chicken dinner
BYU. For both prolificacy and quality, they are unmatched. JSPP, you came close. Your transcripts are gold, but your search is still unwieldy and shallow.

As wonderful as it is to have documents available in any form, having images with advanced searchable transcripts is wonderfuler.


  1. I’ve spent most of my time with docs lacking transcriptions, but imaging is still a major boon. Search is the key for imprints and it will only get better. One hopes that Yale and the many other repositories like Huntington will expose much more of their materials digitally.

  2. You think that you can front when revelation comes? (Best unintentional LDS rap line ever.)

    In dealing with electronic libraries in Biblical studies, one of the criticisms has been that search capability removes the breadth that comes with having to read through a multitude of sources before finding what you’re looking for. I think that’s true to an extent. That said, I love my electronic collections, and insha’allah will see more of them in the future.

  3. J. Stapley says:

    WVS, agreed. Do they have the same pressures that Mormon repositories have to actually do it, however?

    And right on, Ben. Ben P. wrote up a thing relating to the idea of laziness, which I linked to in the post, but screwed up the actual link (fixed now). But here it is again. I think that you need both. Read broadly and deeply, and then search like a fiend.

  4. J. Stapley says:

    I also forgot to mention the UU Library’s new Pioneer Diaries series. They have digitized some diaries, but instead of making them available online for free, they have coupled them with their in-house print-on-demand machine. So you can buy each journal volume for $15 or so. I picked up the first available volume of the Kesler diary to check it out. The binding is meh, but it is definitely better than nothing if you know what you are looking for. Hopefully, once they have recouped the cost of digitization, they will be made more widely accessible.

  5. Jonathan Green says:

    For digital images, quality and format are important. The resolution of the scans needs to be sufficient for small type or fine details of a manuscript, and the images need to be full-color or at least grayscale, rather than 2-bit b/w. And most of all, the images need to be plain vanilla JPGs and not restricted to PDFs (which are unwieldy for large documents) or, heaven help us, some awful proprietary format like DjVu.

  6. J. Stapley says:

    Important note, Jonathan. The Utah Digital Newspapers contains notoriously poor-quality two-bit scans, which are very difficult to work with. I don’t mind PDFs, though.

  7. I can’t make it past the image of Stapley passing final judgment. Quaking in my boots!!!

  8. There are such wonderful resources available online. Posts like this are great for hints about what’s happening in the archival world and providing a new link or two. I currently have about seventy digital sources to check for each biography I write — it can be a little tedious, but Netflix streaming can help dull the pain of the repetition, and the search always turns up some real gems.

  9. It is scary, Cynthia!

    Amy, it really can be a slog, but certainly less of a slog than not having search capability!

  10. Thanks for some great ideas of places to look when I am trying to find something specific. I especially didn’t know there was a way to get to some of these search portals without being a university student. The links in the posts were as valuable as anything in the last few weeks!

    Thanks again!
