Monday, November 21, 2005

A first go at Google Books

After reading all sorts of newspaper coverage about the conflict between publishers and Google over the intellectual property rights issues involved with the Google Books scanning project, I decided to see what the project amounted to so far. I had been getting the "invitation" to try my searches on the new service for a while, but hadn't waded in. Having done so now, I have to say I'm underwhelmed.

You've probably heard about the plan. Google will scan all the books in a number of libraries, setting things up so that the full text is searchable, but only a fraction within the limits of fair use will actually be viewable. Publishers are objecting to the scanning process, since it involves making a complete copy. Google wants to argue that it's still fair use since nobody can get to the full copy. Some of the publishers simply want the right to say yes or no. I'm no big defender of intellectual property, but I may be leaning just a bit towards the publishers on this one.

In practice, what you get is, unsurprisingly, a big tease. More than that, you get a complicated tease, which requires you to sign into a Google account to see some text, although other pages are open to all and some are simply not readable at all. If by chance you get to view the information you searched for, you have no printing options beyond the Print Screen command, and windows are sized so that even that involves some awkward adaptations.

There's no way to gauge what you're not finding, but the text editing seems to be good and the search engine functions pretty much as you expect Google to function. I did a fairly careful set of searches using my keywords for the William B. Greene research. I came up with a dozen or so listings that looked like they might have new information in them. In about a third of the cases, the information was blocked, so all I got was a new citation to track down.

What's GOOD about what I got: more than half of the references I was eventually able to track down were not properly indexed in the volumes, and would have been nearly impossible to track down otherwise.

What's NOT SO GOOD about what I got: nearly everything I eventually tracked down was no more than a mention. I found one new letter by Greene, but it was in a source that my other search strategies would have found anyway. I also found a suggestion that Bessie Greene and Susan Dimock had been a couple, which wasn't substantiated at all seriously, but was a new one on me. I didn't find dozens of references I know are out there. The bottom line is that there are a lot of books in the world and if Google ever gets a fraction of them online, it may need a much better set of search tools.

What's REALLY NOT GOOD about what I got is that I couldn't tell whether the offhand mentions were simply that or whether they were more substantial, even though the whole text was right there.

I'm something of a special case, working on the sort of project where I can hardly afford to skip over any small reference to my subject. Some of my best bits about the Greenes have come in the form of offhand mentions. Theoretically, then, I might be the guy who, seeing that there is a mention of William Batchelder Greene on page 245 of a book I can search but can't read, might actually lay his money down and purchase something. My research library is, in fact, well stocked with books which only mention one of the Greenes a single time, but which provide valuable context.

I still won't buy blind. And I'm guessing not many other folks will either.

So does that mean the publishers shouldn't cooperate with Google? Is the whole scheme pointless? Elsewhere, I've noted that "the library" is going through a serious transition, as books on the shelf are replaced with full-text electronic copies and books in remote storage facilities. I've even had some remotely stored journal articles delivered to me in PDF form. For serious researchers, the possibility of searching the full texts of full libraries is tremendous. And it looks like Google may be setting a higher standard for full-text searchability. But there's an important difference between the technologies that are good for searching and those that are good for reading. Publishers, booksellers, and librarians have a product in hand that no "ebook" or PDF file is going to surpass anytime soon, at least when it comes to ease of use. There are plenty of sites which provide the texts of whole books, some recent and some in the public domain. What they have in common is that it sucks to read books on a computer. And it really sucks when providers start trying to manage use, by forcing users to print single pages, etc. Harvard's Women Working Open Collection is probably the best online book source I have seen, at least in terms of readability, searchability and printability. It's still something of a pain in the tush to navigate, but that's largely just the nature of the beast with electronic texts.

If Google were to concentrate on works in the public domain, or if publishers were to cooperate on a large scale (figuring selling books might make up for the availability of texts), all the ingenuity of their programming teams could be aimed at making those texts accessible, rather than the current goal of. . .

I'm struck, really, by how unclear the goal of Google Books ultimately is. What are we to make of a search engine that won't really let you find anything?
