What is the Internet Archive?

Context: Internet Archive (IA) is embroiled in a major legal challenge as it faces off against traditional publishers accusing it of copyright violations. The free digital library is fighting the forced removal of around half a million books from its platform, which it argues functions like a library. 

Internet Archive

  • It is an American nonprofit digital library founded in 1996 by Brewster Kahle.
  • It provides free access to collections of digitized materials including websites, software applications, music, audiovisual, and print materials.
  • The Archive also advocates for a free and open Internet.  
  • Its mission is to provide ‘universal access to all knowledge’. 
  • The Internet Archive allows the public to upload and download digital material to its data cluster, but the bulk of its data is collected automatically by its web crawlers, which work to preserve as much of the public web as possible.
  • Its web archive, the Wayback Machine, contains hundreds of billions of web captures.
  • The Archive provides specialized services relating to the information access needs of the print-disabled. Publicly accessible books were made available in a protected Digital Accessible Information System (DAISY) format. 
  • DAISY is designed as a complete audio substitute for print material. 

Case against Internet Archive: 

  • Many traditional publishers have alleged that Internet Archive violated their copyrights and illegally made their books available to the public, by scanning physical copies and distributing the digital files.
  • Traditional publishers objected to IA’s temporary ‘National Emergency Library’ (NEL) initiative, which it launched during the COVID-19 pandemic to let more users access the e-books in its collection while physical libraries were locked down.
    • During the NEL, IA lifted the technical controls enforcing its one-to-one owned-to-loaned ratio and allowed up to ten thousand patrons at a time to borrow each e-book on the Website.
    • In general, IA uses a system known as ‘controlled digital lending’ to limit the number of people who can access an e-book at any one time.
    • It ended its emergency library system after being hit with the lawsuit.
    • Internet Archive invoked the doctrine of fair use in its defence, but the court rejected the argument.
  • Hachette vs Internet Archive Case (2020): 
    • Traditional publishers Hachette, HarperCollins, Wiley, and Penguin Random House sued Internet Archive.
    • In 2023, an order was issued in favour of the publishers.
    • The order stated: IA’s Website includes millions of public domain e-books that users can download for free and read without restrictions. However, the Website also includes 3.6 million books protected by valid copyrights. 
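
The one-to-one owned-to-loaned ratio behind ‘controlled digital lending’, and the way the NEL relaxed it, can be illustrated with a minimal Python sketch. This is only an illustration of the rule as described above, not IA’s actual software: the class `Title`, its methods, and the `limit` parameter are hypothetical names introduced here.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class Title:
    """One e-book title under a hypothetical controlled-digital-lending rule."""

    owned_copies: int  # physical copies the library owns
    borrowers: Set[str] = field(default_factory=set)  # patrons holding a loan

    def borrow(self, patron: str, limit: Optional[int] = None) -> bool:
        """Lend to `patron` if the cap on concurrent loans is not yet reached.

        By default the cap equals the number of owned copies (one-to-one).
        Passing `limit` replaces that cap, mimicking how the NEL allowed up
        to ten thousand concurrent borrowers per e-book.
        """
        cap = self.owned_copies if limit is None else limit
        if patron in self.borrowers or len(self.borrowers) >= cap:
            return False
        self.borrowers.add(patron)
        return True

    def give_back(self, patron: str) -> None:
        """Return the loan so another patron can borrow the title."""
        self.borrowers.discard(patron)
```

With one owned copy, a second patron is refused until the first returns the book; with `limit=10_000`, both would be served at once.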

Why are books being removed?

  • As a result of the lawsuit, IA was forced to remove over half a million books from its database.
  • While IA identifies itself as a library, it has been compared to a shadow library or a piracy database by traditional publishers, who disagree with its ‘controlled digital lending’ approach.
  • Despite the removal, however, the Internet Archive is still home to a rich collection.
    • It still contains 835 billion web pages, 44 million books and texts, 15 million audio recordings, 10.6 million videos, 4.8 million images, and 1 million software programs. 
    • Live concerts and television programs also make up part of this collection.

Wayback Machine: 

  • The Wayback Machine is a digital archive of the World Wide Web, founded by the Internet Archive in 1996 and launched to the public in 2001. It allows the user to go ‘back in time’ to see how websites looked in the past.
  • The Wayback Machine was created as a joint effort between Alexa Internet (owned by Amazon.com) and the Internet Archive. 
  • Hundreds of billions of web pages and their associated data (images, source code, documents, etc.) are saved in a database.
  • There is a good chance of finding content such as old websites that no longer exist today, earlier versions of existing websites, deleted social media posts, archived versions of paywalled articles, and archived versions of content that is blocked or censored in some jurisdictions.
  • Wayback Machine is useful for personal research or to access information sources, but users should be cautious about relying on the data obtained through such sources, as the saved information can sometimes be outdated or inaccurate.
  • Archiving since 1996 has made more than 28 years of web history accessible through the Wayback Machine.
  • The platform claims users can explore over 866 billion saved web pages through its own search service.
  • The ‘Archive-It’ program identifies important web pages on the Internet Archive’s website.
    • Archive-It: Created in early 2006, Archive-It is a web archiving subscription service that allows institutions and individuals to build and preserve collections of digital content and create digital archives.
    • Archive-It allows the user to customize their capture or exclusion of web content they want to preserve for cultural heritage reasons.
    • Through a web application, Archive-It partners can search, catalogue, manage, browse, and view their archived collections. 
    • Periodically, the data captured through Archive-It is indexed into the Internet Archive’s general archive.
  • Not all web sites are available because many web site owners choose to exclude their sites. 
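
Looking up an earlier version of a page can also be done programmatically. The Python sketch below builds a lookup request and reads the closest saved snapshot from a JSON reply. It assumes the Internet Archive’s public Wayback availability endpoint and its `archived_snapshots` / `closest` response fields, which should be checked against the current documentation; the function names are hypothetical, and no network call is made here.

```python
import json
from typing import Optional
from urllib.parse import urlencode

# Assumed endpoint of the Wayback availability service; verify before use.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_query(page_url: str, timestamp: str = "") -> str:
    """Build the lookup URL for `page_url`.

    `timestamp` is an optional YYYYMMDD-style string asking for the
    snapshot closest to that date.
    """
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return f"{AVAILABILITY_ENDPOINT}?{urlencode(params)}"


def closest_snapshot(response_text: str) -> Optional[str]:
    """Return the archived URL from a JSON reply, or None if nothing is saved.

    A site whose owner has excluded it from the archive would also
    yield None under this assumed response shape.
    """
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None
```

The string returned by `availability_query("example.com", "20060101")` can be fetched with any HTTP client, and the reply body passed to `closest_snapshot`.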

Prelims Practice Question: 

Q. Regarding ‘Internet Archive’ consider the following statements:

  1. Digital Accessible Information System (DAISY) format is designed as a complete audio substitute for print material.
  2. Content that is blocked or censored in some jurisdictions cannot be found through the use of Wayback Machine.

Which of the statements given above is/are correct?

(a) 1 only

(b) 2 only

(c) Both 1 and 2

(d) Neither 1 nor 2

Answer: (a)
