Why Not Store Everything in Main Memory? Why use disks? - PowerPoint PPT Presentation


PPT – Why Not Store Everything in Main Memory? Why use disks? PowerPoint presentation | free to download - id: 7052ac-YjQ0M


The Adobe Flash plugin is needed to view this content

Get the plugin now

View by Category
About This Presentation

Why Not Store Everything in Main Memory? Why use disks?


Background: There is a Twitter Archive at US Library of Congress Certainly the record of what millions of Americans say, think, and feel each day (their tweets) is a ... – PowerPoint PPT presentation

Number of Views:193
Avg rating:3.0/5.0
Slides: 25
Provided by: William1177


Write a Comment
User Comments (0)
Transcript and Presenter's Notes

Title: Why Not Store Everything in Main Memory? Why use disks?

Background There is a Twitter Archive at US
Library of Congress Certainly the record of what
millions of Americans say, think, and feel each
day (their tweets) is a treasure-trove for
historians. But is the technology feasible, and
important for a federal agencygt Is it
cost-effective to handle the three V's that form
the fingerprint of a Big Data project volume,
velocity, and variety? U.S. Library of Congress
said yes and agreed to archive all tweets sent
since 2006 for posterity. But the task is
daunting. Volume? US LoC will archive 172
billion tweets in 2013 alone (300 each from 500
million tweeters), so many trillions,
2006-2016? Velocity? currently absorbing gt 20
million tweets/hour, 24 hours/day, seven
days/week, each stored in a way that can
last. Variety? tweets from a woman who may run
for president in 2016 and Lady Gaga. And
they're different in other ways. "Sure, a tweet
is 140 characters" says Jim Gallagher, US Library
of Congress Director of Strategic Initiatives,
"There are 50 fields. We need to record who wrote
it. Where. When." Because many tweets seem
banal, the project has inspired ridicule. When
the library posted its announcement of the
project, one reader wrote in the comments box
"I'm guessing a good chunk ... came from the
Kardashians." But isn't banality the point?
Historians want to know not just what happened in
the past but how people lived. It is why they
rejoice in finding a semiliterate diary kept by a
Confederate soldier, or pottery fragments in a
colonial town. It's as if a historian today
writing about Lincoln could listen in on what
millions of Americans were saying on the day he
was shot. Youkel and Mandelbaum might seem like
an odd couple to carry out a Big Data project
One is a career Library of Congress researcher
with an undergraduate degree in history, the
other a geologist who worked for years with oil
companies. But they demonstrate something
Babson's Mr. Davenport has written about the
emerging field of analytics "hybrid
specialization." For organizations to use the
new technology well, traditional skills, like
computer science, aren't enough. Davenport points
out that just as Big Data combines many
innovations, finding meaning in the world's
welter of statistics means combining many
different disciplines. Mandelbaum and Youkel
pool their knowledge to figure out how to archive
the tweets, how researchers can find what they
want, and how to train librarians to guide them
but not how to data mine them!. Even before
opening tweets to the public, the library has
gotten more than 400 requests from doctoral
candidates, professors, and journalists. "This
is a pioneering project," Mr. Dizard says. "It's
helping us begin to handle large digital data."
For "America's library," at this moment, that
means housing a Gutenberg Bible and Lady Gaga
tweets. What will it mean in 50 years? I ask
Dizard. He laughs and demurs. "I wouldn't look
that far ahead." I would! It will mean that
data mining will drive a sea-change move to
vertically storing all big data!
NSA programs (PRISM (email/twitter/facebook
analytics?) and ? (phone record analytics) Two
US surveillance programs one scooping up
records of Americans' phone calls and the other
collecting information on Internet-based
activities (PRISM?) came to public attention.
The aim data-mining to help NSA thwart
terrorism. But not everyone is cool with it. In
the name of fighting terrorism, the US gov has
been mining data collected from phone companies
such as Verizon for the past seven years and from
Google, Facebook, and other social media firms
for at least 4 yrs, according to gov docs leaked
this week to news orgs. The two surveillance
programs, one that collects detailed records of
telephone calls, the other that collects data on
Internet-based activities such as e-mail, instant
messaging, and video conferencing facetime,
skype?, were publicly revealed in "top secret"
docs leaked to the British newspaper the Guardian
and the Washington Post. Both are run by the
National Security Agency (NSA), the papers
reported.  The existence of the telephone
data-mining program was previously known, and
civil libertarians have for years complained that
it represents a dangerous and overbroad incursion
into the privacy of all Americans. What became
clear this week were certain details about its
operation such as that the government sweeps up
data daily and that a special court has been
renewing the program every 90 days since about
2007. But the reports about the Internet-based
data-mining program, called PRISM, represent a
new revelation, to the public. Data-mining can
involve the use of automated algorithms to sift
through a database for clues as to the existence
of a terrorist plot. One member of Congress
claimed this week that the telephone data-mining
program helped to thwart a significant terrorism
incident in the United States "within the last
few years," but could not offer more specifics
because the whole program is classified. Others
in Congress, as well as President Obama and the
director of national intelligence, sought to
allay concerns of critics that the surveillance
programs represent Big Government run amok. But
it would be wrong to suggest that every member of
Congress is on board with the sweep of such data
mining programs or with the amount of oversight
such national-security endeavors get from other
branches of government. Some have hinted for
years that they find such programs disturbing and
an infringement of people's privacy. Here's an
overview of these two data-mining programs, and
how much oversight they are known to
have.  Phone-record data mining On Thursday,
the Guardian displayed on its website a
top-secret court order authorizing the telephone
data-collection prog. The order, signed by a
federal judge on the mysterious Foreign
Intelligence Surveillance Court, requires a
subsidiary of Verizon to send to the NSA on an
ongoing daily basis through July its telephony
metadata, or communications logs, between the
United States and abroad or wholly within the
United States, including local telephone calls. 
Such metadata include the phone number calling
and the number called, telephone calling card
numbers, and time and duration of calls. What's
not included is permission for the NSA to record
or listen to a phone conversation. That would
require a separate court order, federal officials
said after the program's details were made
public. After the Guardian published the court's
order, it became clear that the document merely
renewed a data-collection that has been under way
since 2007 and one that does not target
Americans, federal officials said. The judicial
order that was disclosed in the press is used to
support a sensitive intelligence collection op,
on which members of Congress have been fully and
repeatedly briefed, said James Clapper, director
of national intelligence, in a statement about
the phone surveillance program. The classified
program has been authorized by all three branches
of the Government. That does not do much to
assuage civil libertarians, who complain that the
government can use the program to piece together
moment-by-moment movements of individuals
throughout their day and to identify to whom they
speak most often. Such intelligence operations
are permitted by law under Section 215 of the
Patriot Act, so-called business records
provision. It compels businesses to provide
information about their subscribers to the
government. Some companies responded, but
obliquely, given that by law they cannot comment
on the surveillance programs or even confirm
their existence. Randy Milch, general counsel
for Verizon, said in an e-mail to employees that
he had no comment on the accuracy of the Guardian
article, the Washington Post reported. The
alleged order, he said, contains language that
compels Verizon to respond to government
requests and forbids Verizon from revealing the
order's existence.
Will NSA leaks wake us from our techno-utopian
dream? A vast surveillance state is being made
possible by technologies that we were told would
liberate us. Christian Science Monitor Dan
Murphy, Staff writer, 6/10/13 They work a few
hundred yards from one of the Library of
Congress's most prized possessions a vellum copy
of the Bible printed in 1455 by Johann Gutenberg,
inventor of movable type. But almost six
centuries later, Jane Mandelbaum and Thomas
Youkel have a task that would confound Gutenberg.
The researchers are leading a team that is
archiving almost every tweet sent out since
Twitter began in 2006. A half-billion tweets
stream into library computers each day. Their
question How can they store the tweets so they
become a meaningful tool for researchers a sort
of digital transcript providing insights into the
daily flow of history? Thousands of miles away,
Arnold Lund has a different task. Mr. Lund
manages a lab for General Electric, a company
that still displays the desk of its founder,
Thomas Edison at its research headquarters in
Niskayuna, N.Y. But even Edison might need
training before he'd grasp all the dimensions of
one of Lund's projects. Lund's question How can
power companies harness the power of data to
predict which trees will fall on power lines
during a storm thus allowing them to prevent
blackouts before they happen? The work of
Richard Rothman, a professor at Johns Hopkins
University in Baltimore, is more fundamental to
save lives. The Centers for Disease Control and
Prevention (CDC) in Atlanta predicts flu
outbreaks, once it examines reports from
hospitals. That takes weeks. In 2009, a study
seemed to suggest researchers could predict
outbreaks much faster by analyzing millions of
Google searches. Spikes in queries like "My kid
is sick" signaled a flu outbreak before the CDC
knew there would be one. That posed a new
question for Dr. Rothman and his colleague Andrea
Dugas Could Google help predict influenza
outbreaks in time to allow hospitals like the one
at Johns Hopkins to get ready? They ask different
questions. But all of these researchers form
part of the new world of Big Data a phenomenon
that may revolutionize every facet of life,
culture, and, well, even the planet. From curbing
urban crime to calculating the effectiveness of a
tennis player's backhand, people are now
gathering and analyzing vast amounts of data to
predict human behaviors, solve problems, identify
shopping habits, thwart terrorists everything
but foretell which Hollywood scripts might make
blockbusters. Actually, there's a company poring
through numbers to do that, too. Just four years
ago, someone wanted to do a Wikipedia entry on
Big Data. Wikipedia said no there was nothing
special about the term it just combined 2
common words. Today, Big Data seems everywhere,
ushering in what advocates consider the biggest
changes since Euclid. Want to get elected to
public office? Put a bunch of computer geeks in a
room and have them comb through databases to
glean who might vote for you then target them
with micro-tailored messages, as President Obama
famously did in 2012. Want to solve poverty in
Africa? Analyze text messages and social media
networks to detect early signs of joblessness,
epidemics, and other problems, as the United
Nations is trying to do. Eager to find the right
mate? Use algorithms to analyze an infinite
number of personality traits to determine who's
the best match for you, as many online dating
sites now do. What exactly is Big Data? What
makes it new? Different? What's the downside?
Such questions have evoked intense interest,
especially since June 5. On that day, former
National Security Agency analyst Edward Snowden
revealed that, like Ms. Mandelbaum or Rothman,
the NSA had also asked a question Can we find
terrorists using Big Data like the phone
records of hundreds of millions of ordinary
Americans? Could we get those records from, say,
Verizon? Mr. Snowden's disclosures revealed that
PRISM, the program the NSA devised, secretly
monitors calls, Web searches, and e-mails, in the
United States and other countries.
Short Message Service (SMS) is a text messaging
service component of phone, web, or mobile
communication systems, uses standardized
communications protocols that allow the exchange
of short text messages between fixed line or
mobile phone devices. SMS is the most widely
used data application in the world, with 3.5
billion active users, or 78 of all mobile phone
subscribers. The term "SMS" is used for all types
of short text messaging and the user activity
itself in many parts of the world. SMS as used
on modern handsets originated from radio
telegraphy in radio memo pagers using
standardized phone protocols. These were defined
in 1985, as part of the Global System for Mobile
Communications (GSM) series of standards as a
means of sending messages of up to 160 characters
to and from GSM mobile handsets. Though most SMS
messages are mobile-to-mobile text messages,
support for the service has expanded to include
other mobile technologies, such as ANSI CDMA
networks and Digital AMPS, as well as satellite
and landline networks. Message size
Transmission of short messages between the SMSC
and the handset is done whenever using the Mobile
Application Part (MAP) of the SS7 protocol.
Messages are sent with the MAP MO- and
MT-ForwardSM operations, whose payload length is
limited by the constraints of the signaling
protocol to precisely 140 octets (140 octets
140 8 bits 1120 bits). Short messages can be
encoded using a variety of alphabets the default
GSM 7-bit alphabet, the 8-bit data alphabet, and
the 16-bit UCS-2 alphabet. Larger content
(concatenated SMS, multipart or segmented SMS, or
"long SMS") can be sent using multiple messages,
in which case each message will start with a User
Data Header (UDH) containing segmentation info.
Text messaging, or texting, is the act of
typing and sending a brief, electronic message
between two or more mobile phones or fixed or
portable devices over a phone network. The term
originally referred to messages sent using the
Short Message Service (SMS) only it has grown to
include messages containing image, video, and
sound content (known as MMS messages). The sender
of a text message is known as a texter, while the
service itself has different colloquialisms
depending on the region. It may simply be
referred to as a text in North America, the
United Kingdom, Australia and the Philippines, an
SMS in most of mainland Europe, and a TMS or SMS
in the Middle East, Africa and Asia. Text
messages can be used to interact with automated
systems to, for example, order products or
services, or participate in contests. Advertisers
and service providers use direct text marketing
to message mobile phone users about promotions,
payment due dates, etcetera instead of using
mail, e-mail or voicemail. In a straight and
concise definition for the purposes of this
English Language article, text messaging by
phones or mobile phones should include all 26
letters of the alphabet and 10 numerals, i.e.,
alpha-numeric messages, or text, to be sent by
texter or received by the textee. Security
concerns Consumer SMS should not be used for
confidential communication. The contents of
common SMS messages are known to the network
operator's systems and personnel. Therefore,
consumer SMS is not an appropriate technology for
secure communications. To address this issue,
many companies use an SMS gateway provider based
on SS7 connectivity to route the messages. The
advantage of this international termination model
is the ability to route data directly through
SS7, which gives the provider visibility of the
complete path of the SMS. This means SMS messages
can be sent directly to and from recipients
without having to go through the SMS-C of other
mobile operators. This approach reduces the
number of mobile operators that handle the
message however, it should not be considered as
an end-to-end secure communication, as the
content of the message is exposed to the SMS
gateway provider. Failure rates without backward
notification can be high between carriers
(T-Mobile to Verizon is notorious in the US).
International texting can be extremely unreliable
depending on the country of origin, destination
and respective carriers.
Twitter is an online social networking and
microblogging service that enables its users to
send and read text-based messages of up to 140
characters, known as "tweets". Twitter was
created in March 2006 by Jack Dorsey and by July,
the social networking site was launched. The
service rapidly gained worldwide popularity, with
over 500 million registered users as of 2012,
generating over 340 million tweets daily and
handling over 1.6 billion search queries per day.
Since its launch, Twitter has become one of the
ten most visited websites on the Internet, and
has been described as "the SMS of the Internet.
Unregistered users can read tweets, while
registered users can post tweets thru the
website, SMS, or a range of apps for mobiles.
Twitter Inc. is in San Francisco, with servers
and offices in NYC, Boston, San Antonio. Tweets
are publicly visible by default, but senders can
restrict message delivery to just their
followers. Users can tweet via the Twitter
website, compatible external apps (such as for
smartphones), or by Short Message Service (SMS)
available in certain countries. While the service
is free, accessing it thru SMS has phone fees.
Users may subscribe to other users' tweets  this
is known as following and subscribers are known
as followers or tweeps, a portmanteau of Twitter
and peeps. Users can also check people who are
un-subscribing them on Twitter (unfollowing).
Also, users have the capability to block those
who have followed them. Twitter allows users to
update their profile via their mobile phone
either by text messaging or by apps released for
certain smartphones and tablets. Twitter has been
compared to a web-based Internet Relay Chat (IRC)
client. In a 2009 Time essay, described the basic
mechanics of Twitter as "remarkably simple" As
a social network, Twitter revolves around the
principle of followers. When you choose to follow
another Twitter user, that user's tweets appear
in reverse chronological order on your main
Twitter page. If you follow 20 people, you'll see
a mix of tweets scrolling down the page
breakfast-cereal updates, interesting new links,
music recommendations, even musings on the future
of education. Pear Analytic analyzed 2,000
tweets (originating from US in English) over a
2-week period in 8/09 from 1100 am to 500 pm
(CST) and separated them into six
categories Pointless babble 40
Conversational 38 Pass-along value 9
Self-promotion 6 Spam 4 News 4
Social networking researcher Danah Boyd argues
what Pear researchers labeled "pointless babble"
is better characterized as "social grooming"
and/or "peripheral awareness" (which she explains
as persons "wanting to know what the people
around them are thinking and doing and feeling,
even when co-presence isnt viable"). Format
Users can group posts by topic/type with hashtags
words or phrases prefixed with a "" sign.
Similarly, "_at_" sign followed by a username is
used for mention/reply to other users. To repost
a message from another Twitter user, and share it
with one's own followers, retweet function,
symbolized by "RT" in the message. In late 2009,
the "Twitter Lists" feature was added, making it
possible for users to follow (as well as mention
and reply to) ad hoc lists of authors instead of
individual authors Through SMS, users can
communicate with Twitter thru 5 gateway numbers
short codes for US, Canada, India, New Zealand,
Isle of Man-based number for international use.
There is also a short code in the UK only
accessible to those on the Vodafone, O2 and
Orange networks. In India, since Twitter only
supports tweets from Bharti Airtel an alternative
platform called smsTweet was set up by a user to
work on all networks - GladlyCast exists for
mobile phone users in Singapore, Malaysia,
Philippines. The tweets were set to a
140-character limit for compatibility with SMS
messaging, introducing the shorthand notation and
slang commonly used in SMS messages. The
140-character limit has also increased the usage
of URL shortening services such as bit.ly,
goo.gl, and tr.im, and content-hosting services,
such as Twitpic, memozu.com and NotePub to
accommodate multimedia content and text longer
than 140 characters. Since June 2011, Twitter has
used its own t.co domain for automatic shortening
of all URLs posted on its website. Trending
topics A word, phrase or topic that is tagged
at a greater rate than other tags is said to be a
trending topic. Trending topics become popular
either thru a concerted effort by users, or by an
event prompts people to talk about one specific
topic These topics help Twitter and their users
understand what's happening in the
world. Trending topics are sometimes the result
of concerted efforts by fans of certain
celebrities or cultural phenomena, particularly
musicians like Lady Gaga (known as Little
Monsters), Justin Bieber (Beliebers), and One
Direction (Directioners), and fans of the
Twilight (Twihards) and Harry Potter
(Potterheads) novels. Twitter has altered the
trend algorithm in the past to prevent
manipulation of this type. Twitter's March 30,
2010 blog post announced that the hottest Twitter
trending topics would scroll across the Twitter
homepage. Controversies abound on Twitter
trending topics Twitter has censored hashtags
other users found offensive. Twitter censored the
Thatsafrican and the thingsdarkiessay hashtags
after users complained they found the hashtags
offensive. There are allegations that twitter
removed NaMOinHyd from trending list and added
Indian National Congress sponsored hashtag.
Adding and following content There are
numerous tools for adding content, monitoring
content and conversations including Telly (video
sharing, old name is Twitvid), TweetDeck,
Salesforce.com, HootSuite, and Twitterfeed. As of
2009, fewer than half of tweets were posted using
the web user interface with most users using
third-party applications (based on analysis of
500 million tweets by Sysomos). Verified accounts
In June 2008, Twitter launched a verification
program, allowing celebrities to get their
accounts verified.97 Originally intended to
help users verify which celebrity accounts were
created by the celebrities themselves (and
therefore are not fake), they have since been
used to verify accounts of businesses and
accounts for public figures who may not actually
tweet but still wish to maintain control over the
account that bears their name. Mobile Twitter
has mobile apps for iPhone, iPad, Android,
Windows Phone, BlackBerry, and Nokia There is
also version of the website for mobile devices,
SMS and MMS service. Twitter limits the use of
third party applications utilizing the service by
implementing a 100,000 user limit. Authentication
As of August 31, 2010, third-party Twitter
applications are required to use OAuth, an
authentication method that does not require users
to enter their password into the authenticating
application. Previously, the OAuth authentication
method was optional, it is now compulsory and the
user-name/password authentication method has been
made redundant and is no longer functional.
Twitter stated that the move to OAuth will mean
"increased security and a better
experience". Related Headlines On August 19,
2013, Twitter announced Twitter Related
Headlines. Usage Rankings Twitter is ranked
as one of the ten-most-visited websites worldwide
by Alexa's web traffic analysis. Daily user
estimates vary as the company does not publish
statistics on active accounts. A February 2009
Compete.com blog entry ranked Twitter as the
third most used social network based on their
count of 6 million unique monthly visitors and
55 million monthly visits. In March 2009, a
Nielsen.com blog ranked Twitter as the
fastest-growing website in the Member Communities
category for February 2009. Twitter had annual
growth of 1,382 percent, increasing from 475,000
unique visitors in February 2008 to 7 million in
February 2009. In 2009, Twitter had a monthly
user retention rate of forty percent. Demographics
Twitter.com Top5 Global Markets by Reach ()
CountryPercent IndonesiaJun 2010 20.8,
Dec 2010 19.0 BrazilJun 2010 20.5, Dec 2010
21.8 VenezuelaJun 2010 19.0, Dec 2010 21.1
NetherlandsJun 2010 17.7, Dec 2010 22.3
JapanJun 2010 16.8, Dec 2010 20.0 Note Visitor
age 15, home and work locations. Excludes
visitation from public computers such as Internet
cafes or access from mobile phones or PDAs. In
2009, Twitter was mainly used by older adults who
might not have used other social sites before
Twitter, said Jeremiah Owyang, an industry
analyst studying social media. "Adults are just
catching up to what teens have been doing for
years," he said. According to comScore only
eleven percent of Twitter's users are aged twelve
to seventeen. comScore attributed this to
Twitter's "early adopter period" when the social
network first gained popularity in business
settings and news outlets attracting primarily
older users. However, comScore also stated in
2009 that Twitter had begun to "filter more into
the mainstream", and "along with it came a
culture of celebrity as Shaq, Britney Spears and
Ashton Kutcher joined the ranks of the
Twitterati." According to a study by Sysomos in
June 2009, women make up a slightly larger
Twitter demographic than men fifty-three
percent over forty-seven percent. It also stated
that five percent of users accounted for
seventy-five percent of all activity, and that
New York City has more Twitter users than other
cities. According to Quancast, twenty-seven
million people in the US used Twitter as of
September 3, 2009. Sixty-three percent of Twitter
users are under thirty-five years old sixty
percent of Twitter users are Caucasian, but a
higher than average (compared to other Internet
properties) are African American/black (sixteen
percent) and Hispanic (eleven percent)
fifty-eight percent of Twitter users have a total
household income of at least US60,000. The
prevalence of African American Twitter usage and
in many popular hashtags has been the subject of
research studies. On September 7, 2011, Twitter
announced that it has 100 million active users
logging in at least once a month and 50 million
active users every day. In an article published
on January 6, 2012, Twitter was confirmed to be
the biggest social media network in Japan, with
Facebook following closely in second. comScore
confirmed this, stating that Japan is the only
country in the world where Twitter leads
Facebook. Finances Funding Twitter's San
Francisco headquarters located at 1355 Market St.
Twitter raised over US57 million from venture
capitalist growth funding, although exact numbers
are not publicly disclosed. Twitter's first A
round of funding was for an undisclosed amount
that is rumored to have been between US1 million
and US5 million. Its second B round of funding
in 2008 was for US22 million and its third C
round of funding in 2009 was for US35 million
from Institutional Venture Partners and Benchmark
Capital along with an undisclosed amount from
other investors including Union Square Ventures,
Spark Capital and Insight Venture Partners.
Twitter is backed by Union Square Ventures,
Digital Garage, Spark Capital, and Bezos
Expeditions. In May 2008, The Industry Standard
remarked that Twitter's long-term viability is
limited by a lack of revenue. Twitter board
member Todd Chaffee forecast that the company
could profit from e-commerce, noting that users
may want to buy items directly from Twitter since
it already provides product recommendations and
promotions. The company raised US200 million in
new venture capital in December 2010, at a
valuation of approximately US3.7 billion. In
March 2011, 35,000 Twitter shares sold for
US34.50 each on Sharespost, an implied valuation
of US7.8 billion. In August, 2010 Twitter
announced a "significant" investment lead by
Digital Sky Tech that, at US800M, was reported
to be the largest venture round in history.
Twitter has been identified as a possible
candidate for an initial public offering by
2013. In December 2011, the Saudi prince Alwaleed
bin Talal invested 300 million in Twitter. The
company was valued at 8.4 billion at the time.
Revenue sources In July 2009, some of Twitter's
revenue and user growth documents were published
on TechCrunch after being illegally obtained by
Hacker Croll. The documents projected 2009
revenues of US400,000 in the third quarter and
US4 million in the fourth quarter along with
25 million users by the end of the year. The
projections for the end of 2013 were
US1.54 billion in revenue, US111 million in net
earnings, and 1 billion users. No information
about how Twitter planned to achieve those
numbers was published. In response, Twitter
co-founder Biz Stone published a blog post
suggesting the possibility of legal action
against hacker. On April 13, 2010, Twitter
announced plans to offer paid advertising for
companies that would be able to purchase
"promoted tweets" to appear in selective search
results on the Twitter website, similar to Google
Adwords' advertising model. As of April 13,
Twitter announced it had already signed up a
number of companies wishing to advertise
including Sony Pictures, Red Bull, Best Buy, and
Starbucks. To continue their advertising
campaign, Twitter announced on March 20, 2012,
that it would be bringing its promoted tweets to
mobile devices. Twitter generated US139.5
million in advertising sales during 2011 and
expects this number to grow 86.3 to US259.9
million in 2012. The company generated US45
million in annual revenue in 2010, after
beginning sales midway through that year. The
company operated at a loss through most of 2010.
Revenues were forecast for US100 million to
US110 million in 2011. Users' photos can
generate royalty-free revenue for Twitter, with
an agreement with WENN being announced in May
2011. In June 2011, Twitter announced that it
would offer small businesses a self serve
advertising system. In April 2013, Twitter
announced that its Twitter Ads self-service ads
platform was available to all US users without an
invite. Technology Implementation Great
reliance is placed on open-source software. The
Twitter Web interface uses the Ruby on Rails
framework, deployed on a performance enhanced
Ruby Enterprise Edition implementation of Ruby.
As of April 6, 2011, Twitter engineers confirmed
they had switched away from their Ruby on Rails
search-stack, to a Java server they call Blender.
From spring 2007 to 2008 the messages were
handled by a Ruby persistent queue server called
Starling, but since 2009 implementation has been
gradually replaced with software written in
Scala. The service's application programming
interface (API) allows other web services and
applications to integrate with Twitter.
Individual tweets are registered under unique IDs
using software called snowflake and geolocation
data is added using 'Rockdove'. The URL shortner
t.co then checks for a spam link and shortens the
URL. The tweets are stored in a MySQL database
using Gizzard and acknowledged to users as having
been sent. They are then sent to search engines
via the Firehose API. The process itself is
managed by FlockDB and takes an average of 350
ms. On August 16, 2013, Twitters Vice President
of Platform Engineering Raffi Krikorian shared in
a blog post that the company's infrastructure
handled almost 143,000 tweets per second during
that week, setting a new record. Krikorian
explained that Twitter achieved this record by
blending its homegrown and open source. Interface
On April 30, 2009, Twitter adjusted its web
interface, adding a search bar and a sidebar of
"trending topics" the most common phrases
appearing in messages. Biz Stone explains that
all messages are instantly indexed and that "with
this newly launched feature, Twitter has become
something unexpectedly important a discovery
engine for finding out what is happening right
now." In March 2012, Twitter became available in
Arabic, Farsi, Hebrew and Urdu, the first
right-to-left language versions of the site.
About 13,000 volunteers helped with translating
the menu options. it is available in 33 different
languages. Outages When Twitter experiences an
outage, users see the "fail whale" error message
image created by Yiying Lu, illustrating eight
orange birds using a net to hoist a whale from
the ocean captioned "Too many tweets! Please wait
a moment and try again." Twitter had
approximately ninety-eight percent uptime in 2007
(or about six full days of downtime). The
downtime was particularly noticeable during
events popular with the technology industry such
as 2008 Macworld Conf Expo keynote.
Privacy and security Twitter messages are public
but users can also send private messages. Twitter
collects personally identifiable information
about its users and shares it with third parties.
The service reserves the right to sell this
information as an asset if the company changes
hands. While Twitter displays no advertising,
advertisers can target users based on their
history of tweets and may quote tweets in ads
directed specifically to the user. A security
vulnerability was reported on April 7, 2007, by
Nitesh Dhanjani and Rujith. Since Twitter used
the phone number of the sender of an SMS message
as authentication, malicious users could update
someone else's status page by using SMS spoofing.
The vulnerability could be used if the spoofer
knew the phone number registered to their
victim's account. Within a few weeks of this
discovery Twitter introduced an optional personal
identification number (PIN) that its users could
use to authenticate their SMS-originating
messages. On January 5, 2009, 33 high-profile
Twitter accounts were compromised after a Twitter
administrator's password was guessed by a
dictionary attack. Falsified tweets including
sexually explicit and drug-related messages
were sent from these accounts. Twitter launched
the beta version of their "Verified Accounts"
service on June 11, 2009, allowing famous or
notable people to announce their Twitter account
name. The home pages of these accounts display a
badge indicating their status. In May 2010, a bug
was discovered by Inci Sözlük, involving users
that allowed Twitter users to force others to
follow them without the other users' consent or
knowledge. For example, comedian Conan O'Brien's
account, which had been set to follow only one
person, was changed to receive nearly 200
malicious subscriptions. In response to Twitter's
security breaches, the US Federal Trade
Commission brought charges against the service
which were settled on June 24, 2010. This was the
first time the FTC had taken action against a
social network for security lapses. The
settlement requires Twitter to take a number of
steps to secure users' private information,
including maintenance of a "comprehensive
information security program" to be independently
audited biannually. On 12/14/10, USDoJ issued a
subpoena directing Twitter to provide information
for accounts registered to or associated with
WikiLeaks. Twitter decided to notify its users
and said "...it's our policy to notify users
about law enforcement and governmental requests
for their information, unless we are prevented by
law from doing so"....
Open source Twitter has a history of both using
and releasing open source software while
overcoming technical challenges of their service.
A page in their developer documentation thanks
dozens of open source projects which they have
used, from revision control software like Git to
programming languages such as Ruby and Scala.
Software released as open source by the company
includes the Gizzard Scala framework for creating
distributed datastores, the distributed graph
database FlockDB, the Finagle library for
building asynchronous RPC servers and clients,
the TwUI user interface framework for iOS, and
the Bower client-side package manager. The
popular Twitter Bootstrap web design library was
also started at Twitter and is the most popular
repository on GitHub. Innovators patent agreement
On April 17, 2012, Twitter would implement an
Innovators Patent Agreement which obligate
Twitter to only use its patents for defense. URL
shortener t.co is a URL shortening service
created by Twitter. It is only available for
links posted to Twitter and not available for
general use. All links posted to Twitter use a
t.co wrapper. Twitter hopes that the service will
be able to protect users from malicious sites,
and will use it to track clicks on links within
tweets. Having previously used the services of
third parties TinyURL and bit.ly. Twitter began
experimenting with its own URL shortening service
for private messages in March 2010 using the
twt.tl domain, before it purchased the t.co
domain. The service was tested on the main site
using the accounts _at_TwitterAPI, _at_rsarver and
_at_raffi. On Sept 2, 2010, an email from Twitter to
users said they would be expanding the roll-out
of the service to users. On June 7, 2011, Twitter
was rolling out the feature. Integrated
photo-sharing service On June 1, 2011, Twitter
announced its own integrated photo-sharing
service that enables users to upload a photo and
attach it to a Tweet right from Twitter.com.
Users now also have the ability to add pictures
to Twitter's search by adding hashtags to the
tweet. Twitter also plans to provide photo
galleries designed to gather and syndicate all
photos that a user has uploaded on Twitter and
third-party services such as TwitPic. Use and
social impact Dorsey said after a Twitter Town
Hall with Barack Obama held in July 2011, that
Twitter received over 110,000 AskObama
tweets. Main article Twitter usage Twitter has
been used for a variety of purposes in many
industries and scenarios. For example, it has
been used to organize protests, sometimes
referred to as "Twitter Revolutions", which
include the Egyptian revolution, 20102011
Tunisian protests, 20092010 Iranian election
protests, and 2009 Moldova civil unrest. The
governments of Iran and Egypt blocked the service
in retaliation. The Hill on February 28, 2011
described Twitter and other social media as a
"strategic weapon ... which have the apparent
ability to re-align the social order in real
time, with little or no advanced warning." During
the Arab Spring in early 2011, the number of
hashtags mentioning the uprisings in Tunisia and
Egypt increased. A study by the Dubai School of
Government found that only 0.26 of the Egyptian
population, 0.1 of the Tunisian population and
0.04 of the Syrian population are active on
Twitter. The service is also used as a form of
civil disobedience in 2010, users expressed
outrage over the Twitter Joke Trial by making
obvious jokes about terrorism and in the British
privacy injunction debate in the same country a
year later, where several celebrities who had
taken out anonymised injunctions, most notably
the Manchester United player Ryan Giggs, were
identified by thousands of users in protest to
traditional journalism being censored. Another,
more real time and practical use for Twitter
exists as an effective de facto emergency
communication system for breaking news. It was
neither intended nor designed for high
performance communication, but the idea that it
could be used for emergency communication
certainly was not lost on the originators, who
knew that the service could have wide-reaching
effects early on when the San Francisco,
California company used it to communicate during
earthquakes. The Boston Police tweeted news of
the arrest of the 2013 Boston Marathon Bombing
suspect. A practical use being studied is
Twitter's ability to track epidemics, how they
spread. Twitter has been adopted as a
communication and learning tool in educational
settings mostly in colleges and universities. It
has been used as a backchannel to promote student
interactions, especially in large-lecture
courses. Research has found that using Twitter in
college courses helps students communicate with
each other and faculty, promotes informal
learning, allows shy students a forum for
increased participation, increases student
engagement, and improves overall course
grades. In May 2008, The Wall Street Journal
wrote that social networking services such as
Twitter "elicit mixed feelings in the
technology-savvy people who have been their early
adopters. Fans say they are a good way to keep in
touch with busy friends. But some users are
starting to feel 'too' connected, as they grapple
with check-in messages at odd hours, higher
cellphone bills and the need to tell
acquaintances to stop announcing what they're
having for dinner." Television, rating
Twitter is also increasingly used for making TV
more interactive and social. This effect is
sometimes referred to as the "virtual
watercooler" or social television the practice
has been called "chatterboxing". Statistics
Most popular accounts As of July 28, 2013, the
ten accounts with the most followers belonged to
the following individuals and organizations262
Justin Bieber (42.2 mil followers worldwide)
Katy Perry (39.9m) Lady Gaga (39.2m) Barack
Obama (34.5) - most followed account for
politician Taylor Swift (31.4m) Rihanna
(30.7m) YouTube (31m) - highest account not
representing an individual Britney Spears
(29.7m) Instagram (23.6m) Justin Timberlake
(23.3m) Other selected accounts 12. Twitter
(21.6m) 16. Cristiano Ronaldo (20m) - highest
account athlete 58. FC Barcelona (9.5m) -
highest account representing a sports team Oldest
accounts 14 accounts belonging to Twitter
employees at the time and including _at_jack (Jack
Dorsey), _at_biz (Biz Stone) and _at_noah (Noah
Glass). Record tweets On February 3, 2013,
Twitter announced that a record 24.1 million
tweets were sent the night of Super Bowl
XLVII. Future Twitter emphasized its news and
information-network strategy in November 2009 by
changing the question asked to users for status
updates from "What are you doing?" to "What's
happening?" On November 22, 2010, Biz Stone, a
cofounder of the company, expressed for the first
time the idea of a Twitter news network, a
concept of a wire-like news service he has been
working on for years.
The dark side of Big Data involves much more than
Snowden's disclosure, or what the US does. And
what made Big Data possible did not happen
overnight. The term has been around for at least
15 years, though it's only recently become
popular. "It will be quite transformational,"
says Thomas Davenport, an information technology
expert at Babson College in Wellesley, Mass., who
co-wrote the widely used book "Competing on
Analytics The New Science of Winning." Going
back to the beginning. Big Data starts with ... a
lot of data. Google executive chairman Eric
Schmidt has said that we now uncover as much data
in 48 hours 1.8 zettabytes (that's
1,800,000,000,000,000,000,000 bytes) as humans
gathered from "the dawn of civilization to the
year 2003." You read that right. The head of a
company receiving 50 billion search requests a
day believes people now gather in a few days more
data than humans have done throughout almost all
of history. Mr. Schmidt's claim has doubters.
But similar assertions crop up from people not
prone to exaggeration, such as Massachusetts
Institute of Technology researcher Andrew McAfee
and MIT professor Erik Brynjolfsson, authors of
the new book "Race Against the Machine." "More
data crosses the Internet every second," they
write, "than were stored in the entire Internet
20 years ago." A key driver of the growth of data
is the way we've digitized many of our everyday
activities, such as shopping (increasingly done
online) or downloading music. Another factor our
dependence on electronic devices, all of which
leave digital footprints every time we send an
e-mail, search online, post a message, text, or
tweet. Virtually every institution in society,
from government to the local utility, is churning
out its own torrent of electronic digits about
our billing records, our employment, our
electricity use. Add in the huge array of sensors
that now exist, measuring everything from traffic
flow to the spoilage of fruit during shipment,
and the world is awash in information that we had
no way to uncover before all aggregated and
analyzed by increasingly powerful computers. Most
of this data doesn't affect us. Amassing
information alone doesn't mean it's valuable. Yet
the new ability to mine the right information,
discover patterns and relationships, already
affects our everyday lives. Anyone, for
instance, who has a navigation screen on a car
dashboard uses data streaming from 24 satellites
11,000 miles above Earth to pinpoint his or her
exact location. People living in Los Angeles and
dozens of other cities now participate, knowingly
or not, in the growing phenomenon of "predictive
policing" authorities' use of algorithms to
identify crime trends. Tennis fans use IBM
SlamTracker, an online analytic tool, to find out
exactly how many return of serves Andy Murray
needed to win Wimbledon. When we use sites like
SlamTracker, companies take note of our browsing
habits and, through either the miracle or the
meddling of Big Data, use that information to
send us personal pitches. That's what happens
when AOL greets you with a pop-up ad (Slazenger
tennis balls 70 percent off!). In their book,
"Big Data A Revolution That Will Transform How
We Live, Work, and Think," Kenneth Cukier and
Viktor Mayer-Schönberger mention Wal-Mart's
discovery, gleaned by mining sales data, that
people preparing for a hurricane bought lots of
Pop-Tarts. Now, when a storm is on the way,
Wal-Mart puts Pop-Tarts on the shelves next to
the flashlights. But what excites and concerns
people about Big Data is more far-reaching than
that. Seeing the bigger picture taking a closer
look at some of the people in the digital
trenches. I follow Mandelbaum and Mr. Youkel down
a corridor of the Library of Congress, past
exhibits redolent of history and what you might
expect from what we call "America's library,"
with its 38 million books on 838 miles of
shelving. They open a door. We pass behind people
staring at huge computer screens and enter a room
that doesn't look as if it belongs in a library
at all. It's the size of a gym, with fluorescent
lights overhead and tall metal boxes rising from
the floor. "The tweets come here," Mandelbaum
says. It's been three years since Twitter
approached the library with a question. What the
online networking service started in 2006 had
become a new way of communicating. Would there,
Twitter asked, be historical value in archiving
tweets? "We saw the value right away," says
Robert Dizard, deputy director of the library.
"Our mission is, preserve the record of
Arnold Lund is looking ahead. Lund has a Ph.D. in
experimental psychology. He holds 20 patents, has
written a book on managing technology design, and
directs a variety of projects for General
Electric. Last year, a tree fell on power lines
behind my house. As the local utility repaired
things, an electrical surge crashed my computer,
destroying all the contents. Lund's power line
project has my attention. "For power companies,
one of the largest expenses is managing foliage,"
he says. "We lay out the entire geography of a
state and the overlay of the power grid. We use
satellite data to look at tree growth and cut
back where there's most growth. Then we predict
where the most likely problem is. We have 50
different variabilities to see the probability of
outage." In that one compressed paragraph, I see
three big changes Mr. Cukier and Mr.
Mayer-Shönberger say Big Data brings to research.
It's what we might call the three "nots." Size,
not sample. For more than a century,
statisticians have relied on small samples of
data from which to generalize. They had to. They
lacked the ability to collect more. The new
technology means we can "collect a lot of data
rather than settle for ... samples." Messy, not
meticulous. Traditionally, researchers have
insisted on "clean, curated data. When there was
not that much data around, researchers had to be
as exact as possible." Now, that's no longer
necessary. "Accept messiness," they write,
arguing that the benefits of more data outweigh
our "obsession with precision." Correlation, not
cause. While knowing the causes behind things is
desirable, we don't always need to understand how
the world works "to get things done," they
note. Lund's lab exemplifies all three. First,
his "entire geography" and 50 variables involve
massive sets of data information streaming in
from sensors, satellites, and other sources about
everything from forest density to prevailing wind
direction to grid loads. Second, he looks for
"probability" not "obsessive precision."
Correlation? Lund values cause, but the reason
behind, say, tree growth interests him less than
spotting correlations that might spur action. "Ah
that tree," he exclaims, as if he is an
engineer in the field. "Better get the trucks out
ahead of the storm!" Cukier and Mayer-Schönberger
cite the United Parcel Service to bolster their
argument about correlation. UPS equips its trucks
with sensors that identify vibrations and other
things associated with breakdowns. "The data do
not tell UPS why the part is in trouble. They
reveal enough for the company to know what to
do." Lund's boss, GE chief executive officer Jeff
Immelt, also talks about sensor data. The company
is now investing 1 billion in software and
analytics, which includes putting sensors on its
jet engines to help enhance fuel efficiency. Mr.
Immelt has said that just a 1 percent change in
"fuel burn" can be worth hundreds of millions of
dollars to an airline. "You save an oil guy 1
percent," Immelt said at a conference this
spring, "you're his friend for life." While Lund
has talked glowingly about how much data his
projects can collect, he wants to make sure I
know data isn't everything. "As a scientist," he
says, "I know the biggest challenge is finding
the right questions. How do you find the
questions important to business, society, and
culture?" Rothman has questions, too. "We work
in emergency rooms," he says about himself and
Dr. Dugas. "We're the boots on the
ground." Rothman's work has involved emergency
medicine and the nexus between public health and
epidemics, including influenza, which kills as
many as 500,000 people a year around the world
and about 45,000 in the US. The two researchers
wanted to find out if the Google national study
held lessons for Baltimore and their emergency
room (ER). They studied Google queries for the
Baltimore area queries about flu symptoms, or
chest congestion, or where to buy a thermometer.
If they could spot spikes, that might help solve
one crucial problem. "Crowding," Dugas says.
"Huge issue."
When epidemics start, people rush to hospitals.
Waiting rooms fill up. If Google trends showed a
spike just as epidemics started, ERs could staff
up and reserve more space for the surge of
patients. The link between Google spikes and
hospital visits in Baltimore turned out to be
strong, especially for children. As soon as the
first news reports surfaced about the 2009 H1N1
virus, pediatric ER visits at Hopkins increased
at the peak by as much as 75 percent. But when
the two researchers looked closer, they found
something unexpected. No flu. It turned out that
news reports about H1N1 elsewhere fueled a rush
to ERs in Baltimore what one researcher called
"fear week." "If you just looked at correlation
for flu, you'd say it was a false trend," says
Dugas. Even so, she and Rothman found data
important for ERs No matter why people are
coming, they need to staff up. The Baltimore
study also showed the importance of finding out
what was behind all those medically related
Google searches in other words, not just
correlation but cause. Like GE's Lund, Rothman
emphasizes the value of "the questions you're
asking." Evidence that Big Data promises enormous
benefits is more than anecdotal. MIT's Mr.
Brynjolfsson did a study in 2012 examining 179
companies. He found those whose decisions were
"data-driven" had become 5 to 6 percent more
productive in ways only the use of data could
explain. On the other hand, consider just this
one data point If you type "Big Data Dark Side"
into Google, you'll get 40 million results.
Despite the potential, there's also peril. The
dark side of Big Data concerns Laura DeNardis,
Internet scholar, author of three books, and
professor at American University's school of
communication in Washington. She and others worry
not exclusively about three questions. Does
the new technology (1) erode privacy, (2) promote
inequality, and (3) turn government into Big
Brother? She points to public health data as one
potential source of abuse. Her concern echoes
that of critics who fear that supposedly
anonymous patient records are not anonymous at
all. As far back as the 1990s, a Massachusetts
state commission gave researchers health data
about state workers, believing this would help
officials make better health-care decisions.
William Weld, then governor of Massachusetts,
assured workers their files had been scrubbed of
the data that could identify them. Harvard
University computer science graduate student took
this promise of privacy as a challenge. Using
just three bits of data, Latanya Sweeney showed
how to identify everyone including Weld, whose
diagnoses, medications, and entire medical
history Ms. Sweeney, now a professor at Harvard,
gleefully sent to his office. Today there are
powerful ways to identify people from records
supposed to keep things private. And there are
concerns other than our health records. Dr.
DeNardis worries about how much companies know
about our social media habits. "Take a look at
the published privacy policies of Apple,
Facebook, or Google," she says. "They know what
you view, when you make a call, where you are.
People consent to that by 'I agree' to privacy
terms. But how carefully are they read?" She's
not alone. Jay Stanley of the American Civil
Liberties Union describes one example of what
companies can do with what they know about us
"credit-scoring." "Credit card companies," he
wrote in a blog, "sometimes lower a customer's
credit card limit based on the repayment history
of other customers at stores where a person
shops." Do we want Master Card to lower our
credit-card limits, thinking we're a risk, just
because people who frequent the stores we do
don't pay their bills? In addition to individual
privacy, critics worry about Big Data's impact in
more expansive ways, such as the growing gap
between rich and poor nations. Large American
companies can hire hundreds of data analysts. How
can Bangladesh compete? Will this aggravate the
global digital divide? Perhaps most worrisome to
people at the moment is the government's use of
Big Data to monitor its own citizens, or others,
in the name of national security. "The American
people," President Obama after the NSA story
broke, "don't have a Big Brother who is snooping
into their business."
Did Obama mean George Orwell's term doesn't
include governments secretly monitoring calls,
e-mails, audio, and video of citizens suspected
of nothing? Commandeering information from firms
like Yahoo and Google? The questions that arose
from Snowden's revelations in June encompass
issues of privacy, confidentiality, freedom, and,
of course, security. The Obama administration
argues that monitoring personal info keeps the
country safe, asserting that PRISM has helped
foil 54 separate terrorist plots against the
US. Some lawmakers on Capitol Hill dispute that
number, though, and in recent weeks momentum has
been building in Washington to rein in the NSA.
Not only has support increased on the left and
right to adopt more oversight of its surveillance
program, polls show a hardening of public opinion
about snooping, too. Meanwhile, there is no
doubt about the fury in other countries when the
news broke especially in Germany, where critics
have compared American monitoring of foreigners'
phone calls and e-mails with that of Stasi, the
former hated East German secret police. In fact,
some of those most upset about the NSA
revelations include Americans alarmed about what
the new technology means outside US borders.
Suzanne Nossel, head of the PEN American Center,
which works to free writers and artists around
the world imprisoned for free speech, worries
about the government use of data from private
companies to stifle dissent. "It's not new," she
says, citing the Chinese dissident Shi Tao,
imprisoned by China in 2004 for posting political
commentary on foreign websites, and still locked
up. "Yahoo China had assisted the Chinese
government. They used Yahoo data to convict
him." But then Ms. Nossel talks about the recent
unrest in Turkey, where the Turkish military shot
and arrested dozens of protesters in Istanbul's
Taksim Square. To find more of what they called
"looters," the Turkish government went to Twitter
and Facebook for help and announced that
Facebook was "responding positively," something
Facebook has denied. And Nossel sees a difference
between 2004 and now. Talking about the most
repressive governments in the world, she argues
that "the government ability to sweep and search
is now so great, it tips the scale. No
technology on the side of human rights advocates
can confront it. That's new and chilling."
What have we learned? There's a notable "Sesame
Street" episode from years back in which Cookie
Monster wanders into a library and drives the
librarian crazy by asking over and over for a
cookie. "This is a LIBRARY!" the librarian
finally screams, forgetting to whisper. "We have
books! Just books!" That's certainly been our
image of what libraries do. "You can still find
books here," Mandelbaum reminds me, standing in a
room full of processors. But figures over the
past decade seem to show that books those
rectangular things with pages we turn are
slowly on the way out in the Digital Age. That's
less significant than it might seem, though.
After all, we value books because of the
knowledge they hold. We've changed the way we
convey knowledge many times. Big Data is another
source of knowledge. Will it become a more
integral part of tomorrow's libraries? It is
perhaps fitting that one of the "Sesame Street"
characters most in tune with the future is ...
the Count. He counts everything. His role is to
teach kids the importance of counting. Big Data
allows us to count everything and analyze what
we find. But are numbers enough? Brynjolfsson
and Mr. McAfee compare Big Data to Leeuwenhoek's
development of the microscope in the 1670s. They
are, after all, both tools. They let people see
lots of things that have always been around. Of
course, the microscope also prompted us to ask
questions we could never ask before. Big Data
does that, too. RECOMMENDED How much do you
know about cybersecurity? Take our quiz. Still,
while Big Data can predict a flu outbreak or
where trees fall, it can't, by itself, resolve
the economic and moral dilemmas we have. Whether
to keep power running, help patients faster, or
preserve the record of America, Big Data teaches
us what's out there, not what's right. There's
nothing inherently wrong with Big Data. What
matters, as it does for Arnold Lund in California
or Richard Rothman in Baltimore, are the
About PowerShow.com