Friday, July 13, 2007

Google Books is hiding things again

More stupid search engine tricks. Back in May, I noted some peculiarities of Google Books' search engine. If you follow the links from that original post, you will notice some new peculiarities, including the disappearance of the 1849 Amos E. Senter edition of Equitable Commerce from the results for: inauthor:josiah inauthor:warren. That important edition is still available from Google Books; you can follow the link above to see what Equitable Commerce looked like before Stephen Pearl Andrews edited it. But it, and one other listing, no longer show up in a general search for Warren's work. There are still five listings for Warren, but two are empty placeholders, referring to editions that are unlikely to be available in the archive anytime soon. It's puzzling.

Equally puzzling is the fact that you can now view the plain text for individual pages on Google Books, but cannot easily print that text, or individual page-images, from the standard viewer. Presumably, this is to protect the value of Google Books' investment in scanning and OCR work. It strikes me that a better strategy would be to do the scanning and OCR work well, rather than accumulating an archive half-full of corrupt texts. There is quite simply nobody out there who can compete with Google's ability to accumulate and store searchable data. As bad as it is, Google Books is still one of those must-use resources. But that won't necessarily always be the case, and the real threat to Google's supremacy is probably not the next Google, but the various projects that will inevitably spring up to do efficiently, for some particular audience, what Google Books has done in such a slipshod manner.
