The Case for Books, by Robert Darnton, Public Affairs, 240 pp, £13.99, ISBN: 978-1586488260
In L’an 2440 (The Year 2440), Louis-Sébastien Mercier’s best-selling and recently reissued utopian novel, published in 1771, the narrator falls asleep after a discussion about the injustices of Parisian life and wakes up in the same city seven centuries later. Mercier, a prolific writer in the literary underground of the late ancien régime, embraced the Enlightenment faith in progress; his utopia was not a distant land but a familiar city transformed by time into a version of his ideal society. It was an orderly place. Public spaces had been reshaped and the roads made wider, healthcare had improved, clothes were more comfortable and torture had been outlawed. There were no armies, taxes, prostitutes, aristocrats or beggars. Moderation and modesty had replaced abundance and conceit.
During his wanderings around twenty-fifth-century Paris, the narrator comes upon the national library ‑ the Bibliothèque du roi under Louis XV ‑ expecting to find four great halls containing thousands of volumes. To his surprise, however, he finds only a small cabinet containing a few books. What happened to the once vast and rich collection, he asks the staff. Was it destroyed by fire? “Yes, it was a fire,” they reply, “but we started it deliberately with our own hands.” Gone were fifty thousand dictionaries, a hundred thousand works of poetry, eighty thousand law volumes, 1.6 million travel books and a billion novels. Among those preserved were Homer, Plato, Shakespeare and Molière, but Herodotus and Cicero were purged, as, to the author’s evident approval, were half of Voltaire’s writings. This was the library’s solution to a problem considered pressing in Mercier’s day: the sense of being overwhelmed by information and unable to find what was relevant in a sea of ephemera. And so, by eliminating all that was deemed repetitive, puerile, miserable, frivolous, dangerous, gloomy or useless, the library had done society a great service. After all, “nothing leads the mind farther astray than bad books”, the librarian remarks.
More than two centuries after Mercier wrote his fantasy, the idea of fitting the whole of mankind’s documentary heritage into an improbably small space ‑ this time a desktop computer or a mobile phone handset ‑ has become a real prospect. Having diagnosed a broadly similar problem to that which impelled Mercier to his imaginary act of selective destruction ‑ namely, our helplessness before the vast accumulated wealth of published knowledge and our frustrated desire to access and profit from it ‑ the internet giant Google has proposed one of the most radical and daring ideas in the history of publishing: to scan every book ever published and make them available to anyone with access to the internet.
Guided by its founders’ mission to “organise the world’s information and make it universally accessible and useful”, the company has for the past five years been digitising millions of books and making them searchable on its website. Writing in the New York Times in October 2009, Google co-founder Sergey Brin set out two ambitions for Google Books, as the project is known. The first was to help preserve the world’s repository of printed knowledge, the second to make it more widely available.
The great majority of books are accessible only to the most tenacious researchers at the world’s leading academic libraries, he wrote, while books written after 1923 ‑ the date from which most works are subject to US copyright ‑ disappear into a “literary black hole”.
With rare exceptions, one can buy them only for the small number of years they are in print. After that, they are found only in a vanishing number of libraries and used book stores. As the years pass, contracts get lost and forgotten, authors and publishers disappear, the rights holders become impossible to track down.
Inevitably, Brin suggests, the few remaining copies are left to deteriorate slowly, assuming they are not lost to fires, floods or other disasters. Did not the great library at Alexandria burn three times, in 48 BC, AD 273 and AD 640, he asks, as did the Library of Congress, where a fire in 1851 destroyed two-thirds of the collection? “I hope such destruction never happens again, but history would suggest otherwise. More important, even if our cultural heritage stays intact in the world’s foremost libraries, it is effectively lost if no one can access it easily.”
Put like that, why should anyone object? Who, after all, is against preserving books and expanding their reach? And yet, in the five years since the digitisation project began, Google Books has given rise to some of the most intense debate provoked by any of the company’s pioneering plans. One of the most consistent and well-informed critics is Robert Darnton, a leading historian of eighteenth-century France, a specialist in the history of books and, since 2007, director of Harvard University library.
In The Case for Books, a short collection of previously published ‑ and quite diverse ‑ essays, reviews and scholarly articles, Darnton asks whether the era of the book as we know it ‑ a codex of bound pages ‑ is coming to an end, and if so whether we should celebrate its demise and the triumph of new technology or mourn an irreplaceable loss. Essays examine different aspects of book history: paper, bibliography and the reading process have their own chapters, and large sections are devoted to detailing ways in which libraries and universities are experimenting with technology to better integrate and improve access to their collections.
And yet while the collection leaps ‑ somewhat uneasily perhaps ‑ between centuries and topics, Google Books and Darnton’s efforts to make sense of its implications for libraries, for the reading process, for the very idea of the book itself, are recurring preoccupations. Not that Darnton is quite a polemicist. Like those of many other specialists, his views on Google’s project seem ambivalent: he is seduced by the breathtaking potential of the idea yet deeply anxious about how it could turn out.
The project will make book learning accessible on a new, worldwide scale, despite the digital divide that separates the poor from the computerised, and Darnton is alive to the historical potential of digitisation to develop the power that Gutenberg unleashed more than five centuries ago.
Who could not be moved by the prospect of bringing virtually all the books from America’s greatest research libraries within the reach of all Americans, and perhaps eventually to everyone in the world with access to the internet? Not only will Google’s technological wizardry bring books to readers, it will also open up extraordinary opportunities for research, a whole gamut of possibilities from straightforward word searches to complex text-mining.
Darnton draws on the idea of a Republic of Letters as imagined in the eighteenth century ‑ a place with no police, no boundaries, no inequalities other than those determined by talent. That republic was open to anyone who could exercise the main attributes of citizenship, writing and reading. “Writers formulated ideas, and readers judged them,” he writes. “Thanks to the power of the printed word, the judgments spread in widening circles, and the strongest arguments won.”
But while proclaiming its egalitarian ideals, the Republic of Letters “suffered from the same disease that ate through all societies in the 18th century: privilege”. Printing and the book trade were dominated by exclusive guilds, while the books themselves could not appear legally without a royal privilege and a censor’s approbation. The founding principle may have been the diffusion of light, but the truth was that it seemed to many people a dark and forbidding place.
Darnton wisely stops short of identifying the internet with the Enlightenment (as he points out, the dominant actors in the online market are engaged not in an idealistic campaign to spread knowledge but in ferocious commercial competition to finish one another off), but he is clearly excited by the web’s potential to unlock some of those ideals that the eighteenth century couldn’t quite live up to. Openness and exchange are the web’s watchwords. Thanks to “open access” repositories of digitised articles, countless individual research libraries’ multimedia online projects and openly amateur websites such as Wikipedia, learning has never been so accessible. “The democratisation of knowledge now seems to be at our fingertips,” Darnton writes. “We can make the Enlightenment ideal come to life in reality.”
Most of the essays reproduced in The Case for Books appeared first in the New York Review of Books, where Darnton, being more concerned to elucidate the complexities of the legal cases involving Google, did not pursue the possible connections between the Enlightenment project and the push to digitise the world’s books in any great detail. But listen carefully to Google’s all-conquering, quasi-messianic view of itself and it’s not that difficult to hear echoes of the philosophes. Google’s “mission” to “organise the world’s information and make it universally accessible and useful” recalls Diderot and d’Alembert’s introduction of their Encyclopédie to prospective readers as a systematic account of “the order and concatenation of human knowledge”. As Darnton himself has noted elsewhere, the term encyclopédie, derived from the Greek word for circle, expressed the notion of a world of knowledge, which the encyclopedists could circumnavigate and map. The techies in Mountain View, California would surely approve.
And yet the encyclopédistes, with their metaphor of the “tree of knowledge”, were engaged in an ideological enterprise. They set out to design new categories of information and impose order on the accumulated clutter of human knowledge, but the approach was highly selective. They thought they could map the indeterminate topography of knowledge because they could “limit the domain of the knowable and pin down a modest variety of truth”, as Darnton puts it.
There was a strategy behind the Encyclopédie project: it aimed to shape knowledge in such a way as to remove it from the clergy and to put it in the hands of intellectuals committed to the Enlightenment.
Google may share the encyclopédistes’ sense of messianic purpose, but in spirit its aim is in some respects closer to what Diderot and d’Alembert set themselves against: a dictionary or compendium of information arranged neutrally in the order of the alphabet ‑ or in this case, ordered by keyword search ‑ albeit on a scale never before attempted.
And it is precisely that ‑ its sheer scale ‑ that has made Google Books so controversial. Darnton outlines several grounds for his misgivings, some technical (the quality of its scanning), others principled (the ceding of control, as he sees it, of a large bloc of mankind’s cultural heritage to a private, profit-making enterprise) but most stem from one overriding concern: the apparently unassailable power, reach and resources of the Californian colossus.
Darnton has followed the progress of Google Books closely over the past five years. On being appointed to the directorship of Harvard library he learned that the institution was involved in secret talks with Google about the firm’s project to digitise millions of books, beginning with Harvard’s, and to market the digital copies. He was dazzled by the vision of a “mega-library” bigger than anything dreamt of since Alexandria, but with time grew doubtful about whether Google was “a natural ally of libraries” at all.
In 2005, a group of authors and publishers brought a class action legal case against Google, alleging violation of copyright. The company had been busy digitising books that were in the public domain and making them available online at no cost to the viewer, but it had also scanned a significant number of library books that were protected by copyright in order to provide search services that displayed small snippets of the text.
In October 2008, after long and confidential negotiations, the opposing parties announced agreement on a settlement. Under its terms, Google will sell access to a huge data bank composed mainly of copyrighted, out-of-print books digitised from research libraries. Colleges, universities and other organisations will be able to subscribe by paying for an “institutional licence” providing access to the data bank. A “public access licence” will make this material available to public libraries, where Google will provide free viewing of the digitised books on one computer terminal. Individuals will be able to access and print out digitised versions of the books by purchasing a “consumer licence” from Google, which will distribute 63 per cent of the revenue to the rightsholders and retain 37 per cent. Meanwhile, Google will continue to make books in the public domain available for users to read, download and print free of charge. The result, as Darnton remarks, will be the world’s largest library.
Does that mean that research libraries are bound for obsolescence? On the contrary, Darnton argues, Google Books will make them more important than ever. For all the company’s ambition and unprecedented wealth, few take seriously the claim that it could ever come close to digitising all the world’s books. But even if it scanned 90 per cent of titles in the United States, Darnton writes, the remaining, non-digitised books could be important. Tastes and criteria of importance change from generation to generation. “Our descendants may learn a lot from studying our harlequin novels or computer manuals or telephone books,” he notes.
Google’s digitisation programme is made possible by agreements it signed with a selection of major research libraries, beginning with five (Harvard University, University of Michigan, New York Public Library, the Bodleian in Oxford and Stanford University) and a number of others in the US and Europe, including the Lyon municipal library and the Bavarian state library. But, as Darnton points out, the combined holdings of all its American partner-institutions would not come close to exhausting the stock of books in the United States.
Contrary to what one might expect, there is little redundancy in the holdings of the five libraries: 60 per cent of the books being digitised by Google exist in only one of them. There are about 543 million volumes in the research libraries of the United States. Google reportedly set its initial digitisation goal at 15 million. As Google signs up more libraries ‑ at last count, 31 American libraries are participating in [Google Books] ‑ the representativeness of its digitised database will improve. But it has not yet ventured into special collections, where the rarest works are to be found. And of course the totality of world literature ‑ all the books in all the languages of the world ‑ lies far beyond Google’s capacity to digitise.
In these circumstances, Google’s choices become critical. What does it select for scanning? What does it omit? How does it arrive at its decisions? Will it scan multiple editions of the same title and, critically, how will it choose which one to put at the top of the search list? A reader looking for Shakespeare’s early plays may have neither the time nor the inclination to sift through hundreds of editions, but if they did, they’d find important differences in the text between many of them. Google will surely come up with an algorithm to rank demand for books but “nothing suggests that it will take account of the standards prescribed by bibliographers, such as the first edition to appear in print or the edition that corresponds most closely to the expressed intention of the author”, Darnton laments.
Google may seem unassailable now, but dominance on the internet, as brand names such as MySpace and AOL attest, can be a cruelly transient thing. Google may disappear or be eclipsed by a competitor with better technology or a savvier business plan, which could make its database as obsolete as 3.5-inch disks or CD-ROMs today. Then there is the certainty that it will miss books, skip pages, blur images and make other errors, while there is no guarantee that its copies will last. Bits become degraded over time. Documents may get lost as hardware and software quickly become extinct. As Darnton points out, we have lost eighty per cent of all silent films and fifty per cent of all films made before World War II. The best preservation system ever invented, he adds, was the old-fashioned, pre-modern book.
These are alarming observations, and they help Darnton make a persuasive case for shoring up research libraries, but they don’t in themselves amount to an indictment of Google’s project. Technological and bibliographical standards can be improved after all, while the concerns about obsolescence and deterioration would apply to any digitisation project. Indeed Google’s status as a profit-making entity gives it a strong incentive to improve its offering, while its technical know-how and great wealth mean it also has ample resources, something which might not be the case with a public digitisation project. “These are valid questions,” Sergey Brin has written, “and being a company that obsesses over the quality of our products, we are working hard to address them ‑ improving bibliographic information and categorisation, and further detailing our privacy policy. And if we don’t get our product right, then others will.”
Darnton’s argument is strongest when he writes of Google’s power in the market. He fears its “monopolistic tendencies” and worries that the class action settlement it has agreed with American publishers and writers ‑ which could provide the template for other parts of the world ‑ makes the company invulnerable to competition.
Most book authors and publishers who own US copyrights are automatically covered by the settlement. They can opt out of it; but whatever they do, no new digitising enterprise can get off the ground without winning their assent one by one, a practical impossibility, or without becoming mired down in another class-action suit. If approved by the court … the settlement will give Google control over the digitising of virtually all books covered by copyright in the United States.
Not true, Google retorts. Nothing in the US settlement precludes any other company or organisation from pursuing their own similar effort, Brin has said. “The agreement limits consumer choice in out-of-print books about as much as it limits consumer choice in unicorns. Today, if you want to access a typical out-of-print book, you have only one choice ‑ fly to one of a handful of leading libraries in the country and hope to find it in the stacks.”
Darnton is no doubt correct, when he looks back over the course of digitisation in the 1990s, to believe that libraries missed a great opportunity. Action by a grand alliance of research institutions supported by a coalition of charitable foundations could have done the job at a feasible cost and “designed it in a manner that would have put the public interest first”. Instead Google is now in a position to dominate a market with no serious competitors. Microsoft has dropped its own major digitisation project, and other initiatives are minute in comparison with Google’s. “Google’s record suggests that it will not abuse its double-barreled fiscal-legal power. But what will happen if its current leaders sell the company or retire?”
One disturbing scenario is provided by the experience of the scholarly journal business. As professional journals grew in different fields and sub-fields, learned societies produced them and libraries bought them, a system that worked well for about a century. Then commercial publishers discovered that they could make a fortune by selling subscriptions to the journals. Once a library had subscribed over a certain period, academic staff and students came to expect an uninterrupted flow of issues. Publishers therefore knew that they could raise the prices without affecting demand, and the result, Darnton writes, is that the Journal of Comparative Neurology, for example, now costs $25,910 for a year’s subscription. That means the acquisitions budgets of research libraries are weighted heavily towards serials and that the demand for monographs has declined.
The fundamental problem, as Darnton sees it, is that libraries and businesses exist for radically different reasons. The former are there to promote the public good by encouraging learning, whereas the latter are created to make money for their shareholders. It is true that the two purposes can intersect – Google’s view is self-evidently that it can generate significant revenue by expanding access to written knowledge ‑ and the public good usually depends on a profit-making economy. “Yet if we permit the commercialisation of the content of our libraries, there is no getting around a fundamental contradiction,” Darnton remarks. To digitise collections and sell the product in ways that fail to guarantee wide access would turn the internet into “an instrument for privatising knowledge that belongs in the public sphere”.
Darnton’s is a professional’s critique, a wise, measured and well-informed view that states and research libraries that have yet to grapple with these questions would do well to heed. Some of his grounds for concern may seem overdone ‑ for example, the fear that Google could raise prices dramatically contradicts everything we know about the company, whose business model is founded on giving free access to information and generating revenue through advertising ‑ while others (to do with technical deficiencies and bibliographic choices) are serious but eminently capable of being addressed through cooperation between Google and its partner institutions.
The most interesting dilemmas Darnton raises are essentially political ones: whether Google Books represents a privatisation of knowledge and whether the public good of disseminating our documentary heritage more widely justifies agreeing to it. (It is arguable that the project represents a sharing of knowledge with the company more than its transfer ‑ the libraries whose holdings Google is using are not about to close ‑ but Google’s power is such that most commentators expect it to become the dominant source for digitised books.)
Such questions are raised only in passing in Darnton’s collection. For a more expansive argument as to why ceding public control of digitised books might be a bad thing, one might go to the lively debate that has been taking place in France for the past two years. In the case put forward by Jean-Noël Jeanneney, the former head of the Bibliothèque Nationale de France and the most vocal French sceptic, the Google project represents not simply a shift from public control to private, but from a multiplicity of national guardians of cultural heritage to a single, mono-cultural entity that symbolises the internet’s tendency to elide cultural diversity for the sake of a blander (and, yes, Anglo-American) uniformity.
This becomes the screen through which Jeanneney appraises the whole project. In his polemical work Quand Google défie l’Europe (first published in 2005, updated in French this year and translated as Google and the Myth of Universal Knowledge), he is preoccupied by the cultural-linguistic dimension, and specifically the effect that Google Books could have on smaller cultures outside the dominant English-speaking sphere.
Google’s sensibilities are firmly Anglo-American, Jeanneney argues. He fears that its selection of books for digitisation would be informed by this single perspective and that its presentation of texts based on keywords decontextualises them in culturally damaging ways. The peril, he warns, is of “a forced homogenisation of cultures, in this sector as in others”. He cites Michael Gorman, former president of the American Library Association, who draws a distinction between accessing knowledge and merely retrieving a few pages from a book without the context of the whole. And he goes further: the question of context does not just apply to the work itself but also to the cultural context and language in which it was conceived, written, published, read and understood. Depending on the criteria a search engine applies to its searching and presentation of information, that context can be destroyed or distorted. “The practical value of Google (and every other search engine constructed in a similar way) is deceptive, since the offer is accompanied by no specific information on the limits of the search or the representativeness of the corpus in which it has been carried out,” he writes.
Making an impassioned case against the single model, Jeanneney writes of what is at stake:
In the long term, it’s about constructing a more harmonious balance for the good of the whole planet. We’re not just fighting for a better virtual library. At a time when we have learned, after the failure of Marxism, that cultural forces have as much if not more weight than material interests on the course of History, it’s about the hierarchies of the future, about national dignities, mutual influences, representations, crossed perceptions, stereotypes. It’s about prejudices and about tolerance.
Pointing to the success of European projects such as Airbus and the Galileo satellite navigation system ‑ both competitors or counterweights to American rivals ‑ Jeanneney urges the European Union to devise and fund a major digitisation project of its own, and deplores the lack of will to do so up to this point. In 2005, French and German leaders announced that they would work together to develop a multimedia search engine called Quaero (Latin for “I search”) that many saw as a direct challenge to Google, but progress on the project has been slow due to lack of funding.
Jeanneney’s views have reached the ear of important figures in France, however, and late last year President Nicolas Sarkozy announced that the state would allocate €750 million towards the digitisation of France’s cultural patrimony. “We won’t let ourselves be stripped of our heritage for the benefit of a big company, no matter how friendly, big or American it is,” he said when announcing the project, known as Gallica. “We are not going to be stripped of what generations and generations have produced in the French language, just because we weren’t capable of funding our own digitisation project,” Sarkozy said. There will be a role for Google in the project (the books may be made available on Google and Gallica), but the message from the government was that the balance of power would lean firmly towards the state’s side.
Whereas Darnton gives a professional’s view on the debate, looking primarily at the implications of Google Books for the world’s research libraries, Jeanneney ‑ even if he too easily falls back on Gaullist stereotypes of power relations between the US and Europe ‑ does well to expand the terrain. For libraries, the questions posed by Google Books go to the heart of what it is they are for. What role is there for the research library in a world where a great proportion of its holdings can be accessed online, where publications will increasingly be “born digital” and where hand-held readers could change the act of reading in radical, if yet unclear, ways? These are vital questions.
More importantly, Google Books poses searching questions of states themselves. If the future of the library as a place of learning, and of the state’s monopoly of control over its documentary heritage, is up for discussion, then governments, cultural agencies and intellectuals should play an active role in that debate. We cannot know whether Darnton’s worst fears will be fulfilled, but they are far-reaching enough to merit serious thought. It could well be that Google will live up to its laudable plans and provide a more sophisticated, intellectually rigorous, culturally sensitive and exhaustive service than any coalition of libraries, governments or charitable foundations could ever do. One of the most striking themes in Darnton’s collection however is the sense that very few people had considered the possibilities or the ramifications of a large-scale digitisation project before Google came along with its big idea. By the time they had absorbed the implications, the process was already in train.
It may be too early to predict how the relationship between the French state and Google may develop, but the public-private deal suggested by French officials would appear to offer a useful model, marrying the firm’s technological prowess and unassailable reach with the state’s concern for the public good in a way that both deem mutually beneficial. Technology moves faster than policy, but for those states that have yet to grapple with the implications of digitisation, this should be recognised as one of the most urgent cultural policy questions they face.
Google has come up with a much better idea than Louis-Sébastien Mercier’s in 1771. The firm should be lauded for its daring and ambition. But states should also insist on having their say.
Ruadhán Mac Cormaic is the Paris Correspondent of The Irish Times.