Saturday, October 22, 2005

Navigating the new library

I've been pretty quiet here for a while, mostly because the William B. Greene research has taken on a life of its own. That's not just because A Special Answer to a Special Prayer threatens to be about twice as long as I expected, or because the editing and republication of Greene's works is at a particularly demanding stage (expect all of the versions of the work on transcendentalism, as well as a new bibliographical essay, soon), but also because--to my great surprise--word of the nature of the work has prompted some new and very practical kinds of collaboration. Mutualism seems to be finding its moment.

Of course, a big part of what's eating up my potential blogtime is the research. Access to "full text" databases of many old sources makes up, in some small part, for the fact that so many books that used to be available on browsable shelves are now tucked away in remote storage facilities. Text-searching is the new shelf-browsing. Unfortunately, we're at a very awkward moment in the transition.

First, about those open stacks--there are fewer and fewer of them, and it looks like the trend may continue. There is talk at the local university here of actually closing one of our two libraries, and jamming its contents either into the other library--a 7-floor structure that now has about half of one floor actually dedicated to browsable stacks--or into compact storage at the regional depository. I've already talked, I think, about the inefficiencies of the storage option. Last weekend, I made a thorough search of the section of the PSs that is 19th century American literature. After culling for storage, that's basically one long row of bookcases, front and back. That's not a lot of books, in terms of what the library possesses in its collection, but it's a lot of books to look at and evaluate, a lot of indexes to check for Greenes and Shaws. It is, however, a quantity of books that one can, with some diligent work, get through in a few hours. Let's shoot for a number. Based on experience from my bookselling days, I would guess that a very low estimate of the number of books I looked over in an afternoon was 4000, of which I had to actually open a couple of hundred. Now, I'm guessing that there are at least three times as many books from this section in the regional depository as there are on the shelf. Assuming the percentage of potentially interesting books is roughly the same, we're talking about something like 500-600 books, or a pretty full day of browsing. But with browsing not an option, we're talking about 500-600 request forms to be filled out, and 500-600 books that must be picked by depository workers, trucked to my library, and handled by circulation desk workers, all before I get my chance to take the several seconds it will take to check an index.
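For anyone who wants to check the back-of-the-envelope figure, here is a minimal sketch of the arithmetic in Python. The inputs are just my rough guesses from the paragraph above (4000 books scanned, about 200 opened, three times as many in storage), and the function name is made up for the illustration; none of it is measured data.

```python
# Back-of-the-envelope version of the estimate above.
# Every input is a rough guess from the post, not a measured figure.
books_on_shelf = 4000        # low estimate of books looked over in an afternoon
books_opened = 200           # "a couple of hundred" actually opened
depository_multiplier = 3    # "at least three times as many" in storage


def estimate_requests(on_shelf, opened, multiplier):
    """Return how many stored books would be worth requesting, assuming
    the share of interesting books is the same in storage as on the shelf."""
    interesting_share = opened / on_shelf
    stored_books = on_shelf * multiplier
    return round(stored_books * interesting_share)


requests = estimate_requests(books_on_shelf, books_opened, depository_multiplier)
print(requests)
```

With exactly 200 books opened this comes out at the top of the 500-600 range; a slightly smaller "couple of hundred" lands lower in it. Either way, each of those books becomes a separate request form rather than a few seconds at the shelf.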

Let me be clear about the kind of research I'm doing. It involves piecing together the lives of Greene and his family from the most offhand sorts of mentions, the most fragmentary sorts of accounts. Many of my best clues about Greene's family life have come from brief passages in the memoirs of his associates. A fine example came from last weekend's browsing expedition. Lydia Maria Child made an offhand reference to William Greene's daughter, Elizabeth, in one of her letters and in the course of a sentence provided the first solid clue I have found to her approach and practice in the work she did with poor single mothers just before her death. A few weeks before it was the discovery of some potentially important historical writings by WBG, thanks to a rather dismissive mention tucked away in a book about Elizabeth Palmer Peabody. Yesterday, it was a mention, in Robert Gould Shaw's letters, that Uncle William and Uncle George seem to have patched things up--the sort of thing that sends you off on a whole new search.

This is, of course, the kind of searching that ought to be made much easier by the advent of "full text" databases, such as American Periodical Series Online and the Making of America archives. Keyword searching has, indeed, been helpful for finding mentions too slight to even make it to an index. If the results of such searches were dependable, then research could be sped up immensely.

The results of such searches are not dependable.

All too often, what you are searching through is raw OCR text, which has not even been edited to make sure keywords, such as authors' names, are correct. For 19th century sources, with fonts that are likely to fragment or be misread in the OCR process, this is a huge problem. There are already difficulties associated with texts of the period. The spelling of names may be somewhat less consistent than in contemporary sources, and abbreviations are much more common. Imagine my surprise when a search term like "wm. batchelder green" turned up a couple of key bits. I've done enough OCR work to know some of the likely mis-scans, and for a name like "Greene," there are simply too many of them to pursue along with all the other combinations of names, abbreviated names, misspelled names and initials.
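To give a feel for how quickly those combinations pile up, here is a rough Python sketch. The confusion table is hypothetical: a handful of letter swaps of the sort I'd expect from worn type, not a list taken from any actual OCR engine or database. The function names and the particular name forms are likewise just illustrative assumptions.

```python
from itertools import product

# Hypothetical confusion table: a few character swaps of the kind OCR
# might make on worn 19th-century type. Illustrative, not exhaustive.
OCR_CONFUSIONS = {"e": ["e", "c", "o"], "n": ["n", "u"], "r": ["r", "t"]}


def ocr_variants(word):
    """Return every spelling of `word` reachable by swapping each
    character for one of its look-alikes in OCR_CONFUSIONS."""
    choices = [OCR_CONFUSIONS.get(ch, [ch]) for ch in word]
    return {"".join(combo) for combo in product(*choices)}


def name_queries(first_forms, middle_forms, surnames):
    """Combine each first/middle name form with every OCR variant
    of each surname, skipping empty name parts."""
    queries = set()
    for first, middle, surname in product(first_forms, middle_forms, surnames):
        for variant in ocr_variants(surname.lower()):
            parts = [first, middle, variant]
            queries.add(" ".join(p for p in parts if p))
    return queries


queries = name_queries(
    first_forms=["william", "wm.", "w."],
    middle_forms=["batchelder", "b.", ""],  # "" means no middle name given
    surnames=["greene", "green"],
)
print(len(queries))
```

Even with only three look-alike rules, three forms of each given name and two spellings of the surname, the count multiplies into well over a thousand search strings, and a real list of mis-scans would be much longer.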

So. . .

We're somewhere between the time when we could, if we were willing to put in the work, do exhaustive shelf-searches, and the time when we'll be able to do exhaustive and dependable electronic searches. At the moment, it isn't really clear if we've gained or lost research power.
