The Internet Archive - PowerPoint PPT Presentation

Slides: 23
Provided by: AndrewK164
Transcript and Presenter's Notes

1
The Internet Archive
  • 116 Sheridan Avenue
  • The Presidio of San Francisco
  • San Francisco, California 94129
  • www.archive.org

2
Our Mission
Global Access to All Intellectual and Creative
Works
  • Information accessible
  • to anyone
  • from anywhere.
  • Information distributed
  • across the globe in
  • a network of regional
  • digital libraries.

3
Our Grand Vision
  • The Internet Archive Proposition
  • If you contribute open content
  • And if you agree to sharing, use, and reuse
  • We will provide you with
  • Unlimited Storage
  • Unlimited Bandwidth
  • Forever
  • For Free

4
What is in the Archive?
  • Web Pages
  • 40 Billion Pages / 500 Terabytes / 47 Million Sites
  • Films & Videos
  • 8,000 Moving Images
  • Music & Spoken Word
  • 19,000 Concerts / 1,000 Lectures & Readings
  • Books & Texts
  • 27,000 Titles
  • Software
  • 10,000 Programs

5
Web Archive
  • Wayback Machine
  • Snapshot Every Two Months
  • 2.5 Billion Pages per Crawl
  • 15 Terabytes per month
  • Recall Full Text Search
  • 11 billion pages
  • The Internet Library
  • Archived URLs Organized by Subject

6
Moving Images Archive
  • Films
  • Prelinger Collection, Feature Films
  • Videos
  • Animation, News, Games, Lectures
  • Television
  • Technology, World Events, Interviews
  • Open Source Collection
  • Individual Contributions

7
Audio Archive
  • Music
  • Live Concerts, Net Labels, Classical
  • Spoken Word
  • Presidential Recordings, Lectures, Poetry
  • Radio Programs
  • News, Public Affairs, Politics
  • Open Source Collection
  • Individual Contributions

8
Books & Texts Archive
  • The Million Book Project
  • Partnered with Carnegie Mellon University
  • International Children's Digital Library
  • 25 Languages / 45 Countries
  • Project Gutenberg
  • Freely Downloadable Public Domain Books
  • Internet Bookmobile
  • Egypt, India, Uganda, USA

9
Software Archive
  • Machinima
  • Animations Using Video Game Engines
  • Speed Runs
  • Record Breaking Game Play Movies
  • Software Electronic Press Kits
  • Background on Major Software Releases
  • Classic Software Preservation
  • Digital Game Archives

10
Internet Archive Site Map
  • MOVING IMAGES
  • Prelinger Film Archives
  • Computer Chronicles
  • SIGGRAPH Theater
  • Net Café
  • World at War
  • Open Source Movies
  • Feature Films
  • MSRI Math Lectures
  • Open Mind
  • Shaping San Francisco
  • Brick Films
  • Mosaic Middle East News
  • Guerrilla News Network
  • Game Videos
  • Machinima
  • Speed Runs
  • Videogame Previews
  • Software Videos
  • Skill Replays
  • Classic Software Preservation
  • Election 2004
  • Independent News
  • Media Arts
  • Youth Media
  • Listen Up
  • Youth Sounds
  • Chat the Planet

11
Internet Archive Site Map
  • THE WEB
  • Wayback Machine
  • Recall Full Text Search
  • TEXTS & BOOKS
  • Million Book Project
  • Children's Library
  • Project Gutenberg
  • Arpanet
  • Open Source Books
  • Dance Manuals
  • Internet Bookmobile
  • AUDIO
  • Live Music Archive
  • Net Labels
  • Presidential Recordings
  • Democracy Now
  • Other Minds
  • Conference Proceedings
  • Naropa Audio Archives
  • Gender Talk
  • Open Source Audio
  • Blues Country
  • Electronic Experimental
  • Hip Hop Rock
  • Indie Jazz
  • Spoken Word

12
The Television Archive
  • 20 Global Television Networks
  • 24 hours a day
  • 7 days a week
  • 4 Languages
  • English, Russian, Japanese, Arabic
  • 20 terabytes per month

13
Internet Archive Process
  • Acquire Content
  • If Analog, Digitize and Encode
  • If Audio/Video, Create Derivatives
  • Create XML Metadata
  • Update Search Engine
  • Curate Individual Items
  • Create Backups
  • Enable Web Access
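
The process above can be read as a fixed pipeline with two conditional steps. The following is only a minimal sketch of that reading, not the Archive's actual software: the function name `plan_steps` and the media-type labels ("audio", "video", "text", "web") are hypothetical, and the step order is simply the order in which the slide lists them.

```python
from typing import List


def plan_steps(is_analog: bool, media_type: str) -> List[str]:
    """Return the processing steps for one item, in slide order.

    `media_type` uses hypothetical labels ("audio", "video", "text", "web");
    the slide itself only distinguishes "Audio/Video" from everything else.
    """
    steps = ["Acquire Content"]
    if is_analog:
        # Analog sources are digitized and encoded before anything else.
        steps.append("Digitize and Encode")
    if media_type in ("audio", "video"):
        # Audio and video items get derivative files.
        steps.append("Create Derivatives")
    # The remaining steps apply to every item.
    steps += [
        "Create XML Metadata",
        "Update Search Engine",
        "Curate Individual Items",
        "Create Backups",
        "Enable Web Access",
    ]
    return steps


if __name__ == "__main__":
    for step in plan_steps(is_analog=True, media_type="video"):
        print(step)
```

Under these assumptions, an analog film passes through all eight steps, while a crawled web page skips the two conditional ones and passes through six.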

14
Technology - Data Acquisition
  • Web Pages
  • Heritrix Web Crawler (IA Developed)
  • Book and Text Scanning
  • Kirtas APT BookScan 1200
  • Film & Video Digitizing
  • Multiple Formats, High Capacity
  • Contribution Engine
  • Automatic Format Deriver

15
Technology - Storage
  • Petabox
  • Scalable Data Repository
  • One Million Gigabytes
  • High Density / Low Power
  • Remote Management
  • Geographic Redundancy
  • San Francisco
  • Amsterdam
  • Alexandria
  • Asia (2005)

16
Technology - Access
  • 10 million hits per day
  • 60,000 unique visitors / day
  • 135,000 files downloaded / day
  • 1.5 gigabits/sec as of Q4 2004
  • XML based search engine

17
Internet Archive Partners
  • National Libraries and Archives
  • Library of Alexandria, Egypt
  • Canadian National Library
  • French National Library
  • National Archives UK
  • Library of Congress USA

18
Internet Archive Partners
  • Universities
  • ArsDigita University - Computer Science
  • Carnegie Mellon University - Million Book Project
  • MIT - OpenCourseWare
  • Naropa University - Poetry
  • Northwestern University - SCOTUS
  • Rice University - Connexions
  • University of Maryland - Children's Digital Library
  • University of Toronto - Canadiana Archive
  • University of Virginia - Miller Center of Public Affairs

19
Internet Archive Partners
  • Specific Content Providers
  • ACF Newsource - Radio Program Archives
  • EOGEO - NASA LandSat Project
  • United Nations - UN Environment Program
  • Link TV - Mosaic Middle Eastern News
  • MSRI - Math Sciences Research Institute
  • Tucows - Software Archives

20
Internet Archive in Numbers
  • Web Sites
  • 40 Billion Pages, 500 Terabytes
  • New Web Crawl Every 2 Months: 30 TB
  • Collections: Video, Audio, Texts
  • 55,000 Unique Items
  • 120 Terabytes
  • Storage Costs
  • $1,500 / Terabyte
  • PetaBox: $1.5 Million
  • Site Activity: 10 million hits / day
  • 60,000 Visitors, 135,000 Downloads / day
  • Typical Bandwidth Usage: 750 Megabits / second
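
Several of these figures can be cross-checked with simple arithmetic. The sketch below is illustrative only and rests on stated assumptions: the cost figures are taken to be US dollars, units are decimal (1 TB = 1,000 GB), and the function names are hypothetical. It relates the per-terabyte cost to the PetaBox figure (one million gigabytes, from the storage slide), converts the typical bandwidth into a daily volume, and spreads the crawl size over its two-month cycle.

```python
def petabox_cost_usd(capacity_gb=1_000_000, cost_per_tb=1_500, gb_per_tb=1_000):
    """Cost of a full PetaBox, assuming US dollars and decimal units."""
    return capacity_gb / gb_per_tb * cost_per_tb


def terabytes_per_day(megabits_per_second):
    """Data volume moved in one day at a sustained rate, in decimal terabytes."""
    bits_per_day = megabits_per_second * 1_000_000 * 86_400  # 86,400 s per day
    return bits_per_day / 8 / 1e12  # bits -> bytes -> terabytes


def crawl_tb_per_month(tb_per_crawl=30, months_per_crawl=2):
    """Average crawl volume per month for a 30 TB crawl every two months."""
    return tb_per_crawl / months_per_crawl


if __name__ == "__main__":
    print(f"PetaBox cost: ${petabox_cost_usd():,.0f}")             # $1,500,000
    print(f"Daily transfer: {terabytes_per_day(750):.1f} TB")      # 8.1 TB
    print(f"Crawl volume: {crawl_tb_per_month():.0f} TB / month")  # 15 TB / month
```

Under these assumptions, $1,500 per terabyte times 1,000 terabytes gives the $1.5 million PetaBox figure, and 30 TB every two months matches the 15 terabytes per month quoted on the Web Archive slide.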

21
Internet Archive Awards
  • Computerworld Smithsonian
  • Laureate Award - 2000
  • PC World
  • Best of the Web - 2002
  • Yahoo Internet Life
  • Site of the Year - 2002
  • Digital Archives
  • Annual Award - 2002
  • PC Magazine
  • Top 100 Classic Sites - 2004

22
Internet Archive
  • Wayback Machine
  • Moving Images
  • Books & Texts
  • Music & Spoken Word
  • Classic Software
  • www.archive.org
  • stewart_at_archive.org