Publishing's biggest challenge

Last Updated : Jun 14 2013 | 4:18 PM IST

Even if Google Print-type initiatives fail, the hidebound publishing industry will never be the same.

Most nations have the equivalent of a national library: an archive that receives a copy of every book published in the country. Most nations are also signatories to international copyright conventions, and their books are listed under the international ISBN system, which assigns a unique number to every book.

Given the quantity of books published annually, that means lots of dead trees. It also means an extremely cumbersome process to access these verbal repositories.

In theory, an Estonian who wishes to read or quote a verse written in Malayalam can write to India, trace the relevant book and quote the verse, with proper attribution, under the concept of "fair use". In practice, this process could take months, if not years.

It would all be a lot simpler if books were stored electronically in digital archives. Then, a researcher could go to the relevant website, run a search, and see passages onscreen. The concepts of "fair use" and copyright would remain in play but the access time would be massively reduced.

Copyright is well defined globally, though the length of protection varies considerably from nation to nation. Most books move out of copyright 50 years after the author's death. In certain cases (Mickey Mouse is a famous example), the copyright can be renewed or extended by inheritors.

In others, such as the works of Rabindranath Tagore, copyright can be extended for a further period by legislation (this has thankfully run out in the case of Tagore).

Even this apparently clear concept becomes a little fuzzy in the hard sciences and mathematics. Two academics working on the same problems may write papers containing large chunks of identical work without breach of copyright.

There may be only one solution to a given problem and both academics may have arrived at that independently. Or, both may be building on work done by a third party.

While copyright can get fuzzy, "fair use" is inherently fuzzy. You may cite an extract of a copyrighted work if there is a clear reason to do so. That extract should be attributed and acknowledged. Prior permission should be taken from the copyright holder if it's a long extract.

But the definition of length is flexible; citing 250 words from a 1,000-page book may be considered fair use whereas copy-pasting an entire 50-word poem may not be fair use.

Project Gutenberg was the first major attempt to create such an archive: volunteers created a repository of e-texts of various books that were out of copyright. The Project Gutenberg archive is a treasure trove.

In technical terms, the costs of digital archive creation revolve around storage and man-hours. Scanners can create images of every printed page, and character-recognition software can convert those images back into ASCII text.

These electronic files can be stored on hard disks and read onscreen by widely-available word processing programs and e-readers. A specific word or phrase or multiple instances of specific words and phrases can be instantly found and collated.
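
As a rough illustration of how a word or phrase can be found and collated in an electronic text, here is a minimal Python sketch. The function name `find_phrase`, the `width` parameter and the sample sentence are all hypothetical, chosen only for this example; a real archive would use a far more elaborate search system.

```python
import re

def find_phrase(text, phrase, width=30):
    """Return (position, snippet) for every case-insensitive match of phrase.

    `width` is an assumed number of context characters kept on each side.
    """
    hits = []
    for match in re.finditer(re.escape(phrase), text, flags=re.IGNORECASE):
        start = max(0, match.start() - width)
        end = min(len(text), match.end() + width)
        hits.append((match.start(), text[start:end]))
    return hits

# Hypothetical sample text standing in for a scanned book.
sample = "The library keeps books. A digital library keeps files."

for position, snippet in find_phrase(sample, "library", width=10):
    print(position, repr(snippet))
```

Each hit carries its position and a little surrounding text, so every instance of a phrase can be gathered into one list rather than hunted for page by page.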

Given the rapidly falling costs of storage, creating terabytes of searchable digital archives is simple. It does, of course, require man-hours to carry out the scanning and storing, quite apart from the man-hours required to program search algorithms. But the savings in terms of physical space, backing up, search time and permanence more than compensate.

Several major Internet-based businesses have started to put together digital book archives. There's amazon.com, which allows page view searches and onscreen browsing of its online catalogues. There's the Yahoo Print service. And, most ambitiously, there is Google Print, which intends to scan every book in existence and create an online repository.

The Amazon search process is restricted. You can search for a specific title and then read a few pages if it's in the Amazon list.

Yahoo Print is sticking to books that are out of copyright and in the public domain. Google goes a lot further. If the initiative works as planned, it would be possible for somebody to run a meta-search for a term like "Iraq" or "Churchill" or "Churchill + Iraq" and retrieve relevant sections from every book ever published that carries such a term.
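
A search like "Churchill + Iraq" across many books can be pictured as a lookup in an inverted index, a table that maps each word to the places it appears. The Python sketch below is only an assumption about how such a search might work in miniature; it does not describe how Google Print is actually built, and the book titles, text and function names are invented for illustration.

```python
import re
from collections import defaultdict

def build_index(books):
    """Map each lowercase word to the set of (title, paragraph number) where it occurs."""
    index = defaultdict(set)
    for title, paragraphs in books.items():
        for number, paragraph in enumerate(paragraphs):
            for word in re.findall(r"[a-z0-9']+", paragraph.lower()):
                index[word].add((title, number))
    return index

def search_all(index, *terms):
    """Return the sorted locations that contain every term (an AND query)."""
    sets = [index.get(term.lower(), set()) for term in terms]
    return sorted(set.intersection(*sets)) if sets else []

# Hypothetical two-book archive; each book is a list of paragraphs.
books = {
    "Book A": ["Churchill visited Cairo.", "Churchill discussed Iraq at length."],
    "Book B": ["Iraq has a long history."],
}
index = build_index(books)

print(search_all(index, "Churchill", "Iraq"))
print(search_all(index, "Iraq"))
```

The index is built once, when a book is scanned; after that, each query only intersects a few sets instead of rereading every book, which is what makes searching a very large archive plausible.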

If the book is out of copyright, you could download the entire text. If the book is in copyright, you could view sections and download bits and pieces in accordance with fair use. It is unclear what the revenue model for the service would be; there could be payments routed to copyright holders, and so on.

Several major libraries have cooperated with the search-engine giant and, at the very least, this has been a shot in the arm for Gutenberg-type initiatives.

However, publishers have been extremely antagonistic about releasing their intellectual property, for fear of piracy and copyright violation. The academic and legal fraternities are split on whether this initiative is in accordance with commonly understood rules of fair use and copyright.

Publishing is among the most hidebound industries in the world when it comes to leveraging technology in marketing. There has scarcely been a major book printed anywhere in the last two decades that has not been electronically processed before it was put on paper.

This means that, if publishers wanted to, they could considerably shorten time to market for e-texts and supply e-books quicker and cheaper than print editions.

They haven't: e-texts are released simultaneously with print editions (when they are released at all) and cost as much as the print edition. Also, if publishers were prepared to release their electronic proofs to national libraries, e-archives could be created much quicker, without the pain of scanning print editions.

As it stands, the New York-based Authors Guild has sued Google for "massive copyright violation" and there will probably be more class action suits on this issue.

One way or another, this will force a long-overdue review of definitions of copyright, fair use and libraries. An idea this big cannot really be suppressed, and it will also force publishers to refine their e-book models, whether these specific initiatives succeed or fail.
