1
The quality of social interaction: Towards an
automatic analysis of sentiments in informative
and persuasive texts.
  • Khurshid Ahmad,
  • Department of Computing, University of Surrey
  • Department of Computer Science, Trinity College,
    Dublin, Ireland
  • Workshop on Information Management and e-Science,
    Lancaster e-Science Centre, Lancaster University,
    5th October 2005

2
Motivation
Newly emergent subjects and e-Science: Behavioural
Economics; Investor Psychology; Social Studies of
Finance; Economic Sociology.
"The number of items of quantitative and qualitative
information available to [a] well-equipped actor is,
in effect, infinite, yet the capacity of any agencement
(humans, machines, algorithms, location, ...) to
apprehend and to interpret that data is finite"
(Hardie and Mackenzie 2005).
The economies of calculation (Mackenzie 2003, 2004, 2005)
3
Motivation
Newly emergent subjects and e-Science
"I remember '29 very well," Steinbeck writes (2002:
17). "We had it made ... I remember the drugged and
happy faces of people who built paper fortunes in
stocks they couldn't possibly have paid for ... Their
eyes had the look you see around the roulette
table." Then, however, came panic, and "panic
changed to dull shock ... People remembered their
little bank balances, the only certainties in a
treacherous world. They rushed to draw the money
out. There were fights and riots and lines of
policemen. Some banks failed; rumors began to
fly ..."
4
Motivation
  • Of all the contested boundaries that define the
    discipline of sociology, none is more crucial
    than the divide between sociology and economics.
    Talcott Parsons, for all his synthesizing
    ambitions, solidified the divide. Basically,
    Parsons made a pact ... you, economists,
    study value; we, the sociologists, will study
    values.
  • If the financial markets are the core of many
    high-modern economies, so at their core is
    arbitrage: the exploitation of discrepancies in
    the prices of identical or similar assets.

MacKenzie, Donald. 2000b. Long-Term Capital
Management: a Sociological Essay. In Ökonomie und
Gesellschaft, (Eds) Herbert Kalthoff,
Richard Rottenburg and Hans-Jürgen Wagener.
Marburg: Metropolis. Pp 277-287.
5
Motivation
  • Social studies of finance repopulates abstracted
    financial markets with human
  • traders and speculators, who have particular and
    complex relations to what they understand to be
    the market
  • inventors of market models and formulas, that
    prove to be contested and fallible
    interpretations of economic reality rather than
    unproblematic representations
  • designers of technology and risk assessment
    models, which have normative choices and criteria
    at their hearts and
  • journalists who do not just write impassive
    financial news, but play important roles in
    marketing financial products and creating space
    for speculation in everyday life.

de Goede, Marieke (2005). "Resocialising and
Repoliticising Financial Markets: Contours of
Social Studies of Finance". Economic
Sociology, Vol. 6, No. 3, July 2005.
6
Motivation
Newly emergent subjects and e-Science:
Criminology: Crime Perception, Detection and
Prevention; Anthropology: Ethnic and Cultural
Identity.
"The number of items of quantitative
and qualitative information available to [a]
well-equipped actor is, in effect, infinite, yet
the capacity of any agencement (humans, machines,
algorithms, location, ...) to apprehend and to
interpret that data is finite" (Hardie and
Mackenzie 2005)
7
Motivation: Bounded Rationality
  • Herbert Simon
  • Mechanisms of Bounded Rationality: rationality
    is bounded when it falls short of omniscience,
    largely due to failures of knowing all of the
    alternatives, uncertainty about relevant
    exogenous events, and inability to calculate
    consequences (pp 356)
  • Human behaviour, even rational human behaviour,
    is not to be accounted for by a handful of
    invariants (pp 367)

8
Motivation: Sentiment Analysis?
  • In the 1960s and 1970s, the unpredictability of
    inflation was a primary cause of business
    cycles.
  • Friedman: the level of inflation was not a
    problem; it was the uncertainty about future
    costs and prices that would prevent entrepreneurs
    from investing and lead to a recession (Milton
    Friedman 1977).
  • "Friedman's conjecture could only be plausible if
    the uncertainty were changing over time, so this
    was my goal. Econometricians call this
    heteroskedasticity." (Robert Engle 2003)

Friedman, M. (1977). "Nobel Lecture: Inflation
and Unemployment," Journal of Political Economy,
85, 451-472.
Engle, Robert (2003). Risk and Volatility:
Econometric Models and Financial Practice.
Nobel Lecture, December 8, 2003.
9
Motivation Sentiment Analysis?
  • Two strands of literature imply asymmetry in the
    response of exchange rates to news.
  • First strand: bad news in good times should
    have an unusually large impact.
  • Second strand: bad news should have unusually
    large effects.
  • Robert Engle shared the 2003 Nobel Prize in
    Economic Sciences for work on formulating the
    impact of news on economic and financial variables.
    "News" here was code for the announcement of key
    economic indices by various agencies.

Torben G. Andersen, Tim Bollerslev, Francis X.
Diebold and Clara Vega (2002). Micro Effects of
Macro Announcements: Real-Time Price Discovery in
Foreign Exchange. Working Paper 8959. Cambridge,
MA: National Bureau of Economic Research.
http://www.nber.org/papers/w8959
10
Motivation: Bounded Rationality
  • Daniel Kahneman
  • Maps of Bounded Rationality: two generic modes
    of cognitive function: an intuitive mode, where
    judgements and decisions are made automatically
    and rapidly, and a controlled mode, which is
    deliberate and slower (pp 449)
  • Kahneman and Tversky found that intuitive
    judgements occupy a position between the
    automatic operations of perception and the
    deliberate operations of reasoning (e.g.
    the discrepancy between statistical judgement and
    statistical knowledge). (pp 450)
  • Highly accessible features will influence
    decisions, while features of low accessibility
    will be largely ignored. (pp 459)
  • An abrupt transition from risk aversion to risk
    seeking could not be plausibly explained by a
    utility function for wealth (pp 461)

11

Motivation: Bounded Rationality
  • Japanese yen/US dollar exchange rate (decreasing
    solid line), US consumer price index (increasing
    solid line) and Japanese consumer price index
    (increasing dashed line), 1970:1 - 2003:5,
    monthly observations

Why is it that the Japanese consumer price index is
following the same trend as the US CPI?
12
Motivation: I wrote, therefore I existed; I may
write and change the world
The real world | Genre
News Reports; Regulatory Body Reports | Informative
Commentaries; Letters to the Editors; Rumour-laden e-mails | Appellative
Semi-structured interviews; Confidence Surveys | Expressive
Language and text are constitutive (and not
merely representational) -- society is not
reducible to language and linguistic analysis
(Hodgson 2000: 62). -- Discourses are broader
than language, being constituted not just in
texts, but also in definite institutional and
organizational practices (Jackson 2004).
But text is all we have after the event, the
interview, the survey, the news, the review: a
trace of the sentiment.
13
The quality of social interaction or the world
according to Khurshid Ahmad
  • Any analysis of the interaction between the
    members of a well defined social group, where
    each is engaged in optimising return on his or
    her economic and social investment, should
    involve an analysis of the 'sentiments' of the
    group members

14
The quality of social interaction or the world
according to Khurshid Ahmad
  • The sentiment is expressed in the news and views
    that emanate for and on behalf of the members in
    free natural language writing and speech
    excerpts.
  • The quantifiable aspects of the exchange of
    objects, both abstract (power) and concrete (money,
    goods, and services), have to be assessed in the
    context of how the news and views may impact on
    the exchange.

15
The quality of social interaction or the world
according to other folk
  • More importantly the sentiment may be expressed
    through action
  • (a) panic buying and selling of financial
    instruments by the investors and traders, and
  • (b) the sometimes complacent attitude of the
    regulators, are good examples of economic, social
    and political action by individuals and groups.

Simon, H.A. (1978). Rational Decision-Making in
Business Organizations. Nobel Lectures,
Economics 1969-1980, (Editor) Assar Lindbeck,
World Scientific Publishing Co., Singapore, 1992.
http://www.nobel.se/economics/laureates/1978/simon-lecture.html
Kahneman, D. (2002). Maps of
Bounded Rationality: A Perspective on Intuitive
Judgement and Choice. Les Prix Nobel 2002,
(Editor) Professor Tore Frangsmyr.
http://www.nobel.se/economics/laureates/2002/kahneman-lecture.html
Mackenzie, Donald. (2000).
Fear in the Markets. London Review of Books.
Vol 22 (No. 8).
16
The quality of social interaction or the world
according to other folk
  • Actions motivated by panic can equally well be
    seen in mass hysteria related to national/ethnic
    identity that, in turn, can motivate concerns
    related to security and safety (Jackson 2004).

Jackson, Richard (2004). The Social
Construction of Internal War. In (Ed.) Richard
Jackson. (Re)Constructing Cultures of Violence
and Peace. Rodopi: Amsterdam/New York.
17
e-Science and social interaction?
  • The UK e-Science programme is moving towards
    successful completion.
  • Major contributions have been made to UK science
    and technology:
  • Bioinformatics, psychiatry, chemistry and
    engineering (Discovery Net and myGrid)
  • New ways of doing chemistry (CombeChem)
  • Visualisation of complex systems (RealityGrid)
  • Novel design (GEODISE)
  • Safer aircraft (DAME)

18
e-Science and social interaction?
  • Crime, conflict, and economy are deeply
    interrelated and highly interactive.
  • However, data and methods in each area are in a
    mono-disciplinary silo, referred to by some as
    data tombs, where access to others requires
    significant mediation.
  • Data required in each case includes quantitative
    data, textual data, and historical data.

19
e-Science and social interaction?
  • Social sciences and the so-called hard sciences
    increasingly use complementary methodologies, and
    a century or more of discussion of methodology,
    statistical methods and structural models is
    witness to this.
  • E-Science offers the potential for convergence of
    scientific methods through provision of a common
    underlying structure, or "grid", of computational
    methods, data-base technologies and conceptual
    models.

20
e-Science and social interaction?
  • Social scientists often want to develop evidence
    based substantive theory. They want to know what
    determines what, e.g. long term unemployment and
    social exclusion
  • And social scientists want to explore the
    consequences of policy changes on individual
    behaviour, e.g. encouragement to stay on at
    school on educational attainment, truancy, and
    social exclusion
  • Social science data sets may be small (<10 GB,
    with some exceptions) but they are complex

(Imitation is the sincerest form of flattery,
Rob)
21
e-Science and social interaction?
Financial Economics | Sociology of Crime / Crime Science | Social Anthropology
All three areas: Macro-micro Economic Indicators; Census Statistics; Survey of Social Attitudes; Life-style and Well-being Statistics
Market Movement | Crime Statistics | Ethnicity-related data
All three areas: Political News Reports, Editorials, Letters to the Editor; Political and Social Opinion Polls; Consumer Confidence Survey
Investor/Trader Confidence Surveys; Regulatory Body Output; Financial News | Citizen Confidence Surveys; Police Forces/Home Office Reports; Crime Reports | Ethnic Minority Surveys; Police Forces/Home Office Reports; Crime Reports
22
The Surrey Society Grid Demonstrator
  • Was developed under the aegis of the ESRC
    e-Social Science Programme (FINGRID).
  • Demonstrated how Grid technologies could support
    novel research activities in financial economics
    that involve:
  • the rapid processing of large volumes of
    time-varying qualitative and quantitative data
    (Monte Carlo simulation, wavelet analysis, fuzzy
    logic and neural network based simulations)
  • fusing/visualising of such qualitative and
    quantitative data (qualitative data: news and
    e-mails; quantitative data: non-stationary
    and heteroskedastic data collated at different
    frequencies and in different units).

23
The Society Grid Demonstrator
  • Globus Toolkit 3.0 (based on Open Grid Services
    Architecture (OGSA))
  • Java CogKit (Java Commodity Grid) for resource
    management and system integration
  • Languages for Development
  • Java for the implementation of the application
  • Reuters SSL Developers Kit (Java) for the
    connection with the Reuters streaming data
  • Other Technologies
  • XML (NewsML) for the news information
  • JMatlink (adapted to Linux environment for the
    communication with Matlab environment)
  • CGI for communication of Java Applet with the
    server side

24
The Society Grid Demonstrator
  • Live financial data: news, historical time series
    data and tick data provided by Reuters (Reuters
    SSL SDK).
  • Time series analysis: a FORTRAN bootstrap
    algorithm, and the MATLAB toolkit for Wavelet
    Analysis (via JMatLink)
  • News/Sentiment analysis: System Quirk components
    for terminology extraction, ontology learning and
    local grammar analysis.
  • Visualisation and fusion: System Quirk components
    for corpus visualisation, financial charting, and
    data fusion.

25
Design and Performance of the Society Grid
[Figure: processing time in ms (log scale) against number of CPUs]
26
The new (e-) Social Sciences?
  • Social sciences deal with collectives, or
    agencements, comprising human beings, technical
    devices, algorithms, workplaces and so on (Callon
    1998), such that the number of items of
    quantitative and qualitative information available to a
    well equipped economic actor, or agencement, is,
    in effect, infinite, yet the capacity of any
    agencement to apprehend and to interpret that
    data is finite (Hardie and MacKenzie 2005)

Callon, Michel. (1998). The Laws of the
Markets. Oxford: Blackwell.
Hardie, Iain and MacKenzie, Donald. (July 2005).
An Economy of Calculation: Agencement and
Distributed Cognition in a Hedge Fund
(http://www.sps.ed.ac.uk/staff/An%20Economy%20of%20Calculation.pdf)
27
The new (e-) Social Sciences?
  • The number of data items available to an
    agencement in a market place (financial
    instruments, commodity markets, e-Bay?) is
    potentially infinite, but at any given time only a
    fraction of that data can be processed. The
    market place is a fickle place, and the
    information derived from historical data can be
    so quickly outdated that any agencement needs
    a selective, socially distributed,
    technologically-mediated economy of
    calculation.
  • The economies of calculation and the agencements
    that underpin them stretch beyond individual
    firms: the sifting of information often takes
    place in networks of interacting participants.
    The features of the processes involved (for
    instance, where agency lies, the types of
    information that are deemed relevant or
    irrelevant, how that information is processed)
    are consequential. They affect, for example, the
    possibility of a global market and help shape
    how markets and politics interact (Hardie and
    MacKenzie 2005).

Hardie, Iain and MacKenzie, Donald. (July 2005).
An Economy of Calculation: Agencement and
Distributed Cognition in a Hedge Fund (available
from D.MacKenzie_at_ed.ac.uk)
28
The new (e-) Social Sciences?
  • Sentiments and the sociology of financial markets
  • Mackenzie has focused on how a mathematical-economics
    theory is used to create a new instrument,
    especially arbitrage (Mackenzie 2003) and options
    markets (Mackenzie and Millo 2003, Mackenzie
    2004), and how the theory is then used to explain and
    monitor the workings of the instrument.
  • Mackenzie, Knorr-Cetina and others are studying
    the rise of electronic markets where people in
    distant geographical locations can be
    interactionally present.

Mackenzie, Donald. (2003). Long-Term Capital
Management and the sociology of arbitrage.
Economy and Society Vol. 32 (No. 3). pp 349-380.
29
The new (e-) Social Sciences?
  • Sentiments and the sociology of financial markets
  • Mackenzie used interviewing techniques to
    understand the collapse of a large arbitrage firm
    (Long-Term Capital Management, LTCM), a firm that
    pioneered trading of financial instruments that
    sought to profit from price discrepancies; the
    24/7 watch on price discrepancies requires a
    distributed computational infrastructure.
  • Mackenzie (2003) has looked at the change in the
    value of the instruments and has conducted just
    under 70 interviews with partners and employees
    of the failed firm, including a Nobel Laureate
    who was a partner, and with other experts,
    together with documents that were found to have
    precipitated or hastened the demise of LTCM. The
    sentiment about LTCM as expressed in the
    interviews, and in some of the key documents,
    formed the basis of an analysis of a set of time
    series and the computation of key parameters of
    the time series.

30
The new (e-) Social Sciences?
  • Sentiments and the sociology of financial markets
  • Mackenzie found that he was working with a
    community of people who had organized themselves
    and knew each other. There was evidence that
    imitation by others of the business model and
    practices adopted by the firm played a major role
    in the demise of the firm. Most importantly for
    us, Mackenzie cites the existence of a fax sent by
    one of the principals of the firm that asked
    investors to make more investment as problems had
    started to arise; this fax was posted on the
    Internet within five minutes of its dispatch and
    contributed to the demise of the firm. The
    sentiments expressed by the principal were
    misconstrued by the recipients and, despite the
    fairly sound reasons expressed in the fax, albeit
    in a febrile atmosphere, the bounded rationality of
    the recipients came into play.

31
The new (e-) Social Sciences?
  • Sentiments and the sociology of financial markets
  • Knorr-Cetina and Bruegger (2002) have looked at
    the emergence of electronic markets and focused
    on the virtual societies being formed in the
    financial markets through the infrastructure that
    supports electronic trading.
  • The trading room operative is in a disembodied
    world, dealing with an on-screen reality that
    lacks an off-screen counterpart: a form of
    arepresentation (appresentation) of markets. The
    operative is connected to others through
    electronic mail, news and data feeds (this is not
    explicitly dealt with in Knorr-Cetina and
    Bruegger), and has access to a computing system
    that can process very complex data in a timely
    and efficient manner.
  • This virtual world has fast throughput of data
    and processed information, and the rapidity of the
    interaction perhaps compensates for the
    disembodied nature of the electronic trading
    markets.

Knorr-Cetina, Karin and Bruegger, Urs. (2002).
Global Microstructures: The Virtual Societies of
Financial Markets. American Journal of
Sociology, Volume 107, pp 909-950.
32
The new (e-) Social Sciences?
There is a constant stream of news and e-mails in
a dealing room: some items come directly from news
agencies and some are annotated items based on the news.
Hardie, Iain and MacKenzie, Donald. (July 2005).
An Economy of Calculation: Agencement and
Distributed Cognition in a Hedge Fund (available
from D.MacKenzie_at_ed.ac.uk)
33
The new (e-) Social Sciences?
34
The new (e-) Social Sciences?
35
The new (e-) Social Sciences?
But whilst the trader is not reading the news
off the live news wire streams (Reuters,
Bloomberg, BBC, CNN), somebody else is eyeballing
the news for the content (Brazilian economics,
Chilean politics) and the sentiment (bonds so hot
that they were on fire!)
36
The classical Social Sciences: Eyeballing the
text!
  • The key requirement in contemporary social
    sciences is to complement the analysis of a range
    of data sets, demographic, economic and
    political, with data related to the person
    (Kahneman 2002, Simon 1972), or lived experience
    (Sacks 1992, Silverman 2004)

Sacks, H. (1992). Lectures on Conversation
(Ed. Gail Jefferson). Oxford: Blackwell Publishers.
Silverman, David. (2004). Who
cares about experience? In (Ed.) David
Silverman. Qualitative Research. London: Sage
Publications. pp 342-367.
37
The classical Social Sciences Eyeballing the
text!
Package | Function | Facilities
ATLAS.ti | Text analysis and model building | Users attach codes and annotate; search/select segments by code; manual hotlinks connecting segments; displays link information diagrammatically; similar segments can be coded automatically
The General Inquirer | Content analysis | Users can establish patterns in the meaning of words, supported by large content dictionaries (Lasswell Value Dictionary; Harvard Psycho-Sociological Dictionary)
Nvivo | Entry-level qualitative text analysis | Users supply text patterns and can analyse a text database through text-pattern matching to search for repetition, variant word forms and recurrent phrases
QUALRUS | General-purpose qualitative analysis package | Offers intelligent suggestions throughout the coding process; analysis of data once it has already been coded
TextSmart (SPSS's module) | Coding and analyzing open-ended survey questions | Automated stemming; grouping of synonyms; excludes grammatical words automatically; term clustering; text categorisation based on clustering; dictionary-free approach
38
The classical Social Sciences Eyeballing the
text!
  • What is missing in the qualitative analysis
    packages?
  • The texts have to be eye-balled: most phrases,
    clauses and paragraphs have to be coded/annotated by
    hand, an impossible task when the volume of text
    all around us is exploding
  • There is a need for a domain-specific thesaurus
    (conceptually-organised terminology or
    ontology) for each new domain:
  • Identify ontological commitments
  • Find terms, and the broader/narrower equivalents,
    synonyms and antonyms
  • Maintain terminology databases
  • Texts that are conceptually similar within a
    domain have to be clustered using unsupervised
    learning algorithms

39
The new (e-) Social Sciences? Towards an
automatic analysis
  • What is missing in the qualitative analysis
    packages?

40
The new (e-) Social Sciences? Towards an
automatic analysis
  • One key result of close social interaction is the
    emergence of a sub-set of the natural language of
    a given community that is idiosyncratic of the
    desires, aspirations, goals and prejudices of the
    community; this reflects the idiosyncratic nature
    of the ontological commitment of the community
  • The subset has its own lexicogrammar and is
    called the language for special purposes of a given
    specialism
  • Lexicogrammar = Vocabulary (terminology) + Local
    Grammar

41
The new (e-) Social Sciences? Towards an
automatic analysis
July 2005, Reuters Financial News Service: news
items disambiguated using an automatically extracted
terminology and an automatically constructed local
grammar that only recognises changes in financial
instruments
Item | Total | Per Hour
Number of News Items | 134,975 | 208
Number of Words | 46,337,111 | 71,508
Raw Sentiment | 774,507 | 1,195
Raw Positive | 520,006 | 802
Raw Negative | 254,501 | 393
Filtered Sentiment | 56,102 | 87
Filtered Positive | 17,340 | 27
Filtered Negative | 38,762 | 60
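The arithmetic behind the table can be checked in a few lines. The sketch below is only an illustration using the totals quoted on this slide; the helper name positive_share is hypothetical and not part of the Society Grid software. It shows that the raw counts lean positive while the filtered counts lean negative.

```python
# Totals copied from the July 2005 Reuters table on this slide.
raw = {"positive": 520_006, "negative": 254_501}
filtered = {"positive": 17_340, "negative": 38_762}


def positive_share(counts):
    """Fraction of sentiment-bearing hits that are positive (hypothetical helper)."""
    total = counts["positive"] + counts["negative"]
    return counts["positive"] / total


raw_share = positive_share(raw)
filtered_share = positive_share(filtered)

print(f"raw positive share:      {raw_share:.3f}")
print(f"filtered positive share: {filtered_share:.3f}")
```

Roughly two thirds of the raw hits are positive, against under a third after filtering, so the filtering step changes the overall polarity and not just the volume.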
42
The new (e-) Social Sciences? Towards an
automatic analysis
Changes in semantic orientation for a news
input, for July 2005 for all shares in the FTSE.
43
The new (e-) Social Sciences? Towards an
automatic analysis
  • There is no obvious technique in social science
    research methods that can improve the researcher's
    productivity in collecting and analysing large
    volumes of speech and text.
  • Social scientists survey, and occasionally
    interview, interesting individuals in various
    social groups, then analyse the survey forms and
    quantify.
  • So what about the data collected in the field?
    Data is buried in tombs, never to be taken out
    again.
  • Most text, if it is coded at all, is hand-coded by
    the social science researcher, and then the proxy
    of the interpretation of the codes is presented as
    objective analysis.

44
The new (e-) Social Sciences? Towards an
automatic analysis
The real world | Genre
News Reports; Regulatory Body Reports | Informative
Commentaries; Letters to the Editors; Rumour-laden e-mails | Appellative
Semi-structured interviews; Confidence Surveys | Expressive
  • We present a method for systematically
    identifying sentiment-bearing phrases in large
    volumes of streaming texts: a local grammar
    comprising templates to extract the phrases with
    a minimal number of false positives.
  • The sentiments are aligned with quantitative
    (time-varying) information, and the results are
    co-integrated and tested for Granger causality
  • The grammar itself is constructed automatically
    from a corpus of domain-specific texts
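To make the idea of a template-based local grammar concrete, here is a minimal sketch. It is not the System Quirk grammar, which the slides say is constructed automatically from a corpus; the single hand-written template, the verb lists UP and DOWN, and the function extract_sentiment are all assumptions made for illustration. The template only fires on an instrument name followed by a movement verb and an amount, which is one way of keeping false positives low.

```python
import re

# Hypothetical movement vocabularies; a real grammar would learn these from a corpus.
UP = {"rose", "gained", "climbed", "jumped"}
DOWN = {"fell", "dropped", "slid", "tumbled"}

# One illustrative template: <Instrument> <movement verb> <amount> <unit>
TEMPLATE = re.compile(
    r"(?P<instrument>[A-Z][\w&.]*(?:\s[A-Z0-9][\w&.]*)*)\s+"
    r"(?P<verb>" + "|".join(sorted(UP | DOWN)) + r")\s+"
    r"(?P<amount>\d+(?:\.\d+)?)\s*(?P<unit>%|percent|points)"
)


def extract_sentiment(text):
    """Return (instrument, polarity, amount) for every template match in text."""
    hits = []
    for match in TEMPLATE.finditer(text):
        polarity = 1 if match.group("verb") in UP else -1
        hits.append((match.group("instrument"), polarity, float(match.group("amount"))))
    return hits


sample = "Shares in Vodafone rose 2.5 percent while the FTSE 100 fell 30 points."
print(extract_sentiment(sample))
```

The signed hits could then be aggregated per time interval to give a sentiment series that is aligned with price data, as the second bullet describes.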

45
Conclusions and Future Work
  • The methods developed in the Society Grid
    project can be used
  • to investigate a person's perception of his
    or her own well-being, at different times and in
    different places, and in its various facets: social,
    political and economic.
  • This perception can be the same as, or at variance
    with, for example, crime statistics, economic
    indicators, and the achievements or failures of
    (other) ethnic/racial categories.
  • These methods can be extended to new areas like
  • the reassurance gap in policing
  • totalising war discourse that leads to
    ethnic/racial conflicts

46

Towards an automatic analysis of sentiments?
  • We rely on reviews and opinion polls of various
    kinds
  • Film and TV reviews; book reviews; resort reviews
  • Bank reviews; automobile reviews; white goods
    reviews
  • Consumer surveys; write-your-own reviews
  • Newspaper editorials; editor's choice.

47

Towards an automatic analysis of sentiments?
  • We rely on the sentiment of the reviewers,
    editors and investment experts, and
  • We do know the cost of durables, shares,
    holidays.
  • A reasonable price is rejected if the reviews
    are poor; an exorbitant price is acceptable if
    the reviews are good
  • Bad reviews stick in the mind for longer than
    good reviews.

48

Towards an automatic analysis of sentiments?
  • We rely on the sentiment of the more vociferous
    in the society sometimes
  • The vociferous may call black white, and white
    black
  • The vociferous may repudiate facts and purvey
    fiction.

49

Towards an automatic analysis of sentiments?
A new bank has just been launched. Punter Smith
has passed his judgement on the bank. Which of
the two columns tells us that he likes the new
outfit?
online service | unethical practices
online experience | low funds
direct deposit | other problems
local branch | old man
low fees | lesser evil
well other | virtual monopoly
small part | probably wondering
printable version | little difference
true service | other bank
other bank | possible moment
inconveniently located | extra day
Turney, Peter D. (2002). Thumbs Up or Thumbs
Down? Semantic Orientation Applied to
Unsupervised Classification of Reviews. In Proc.
of the 40th Ann. Meeting of the Ass. for Comp.
Linguistics (ACL). Philadelphia, July 2002, pp.
417-424. (Available at
http://acl.ldc.upenn.edu/P/P02/P02-1053.pdf).
50

Towards an automatic analysis of sentiments?
How can a machine detect the positive/negative
sentiment from texts? We eyeball the collocation
of words like "excellent" and "poor" in a text corpus.
The pointwise mutual information (PMI) is computed
between word1 and word2. The semantic orientation of
a phrase is then given by comparing its PMI with
"excellent" against its PMI with "poor" (Turney 2002).
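The formulas themselves are not in this transcript. Following the definitions in Turney (2002), which this slide cites, the two quantities can be written as below; the symbol names are a notational choice made here.

```latex
\[
\mathrm{PMI}(\mathit{word}_1, \mathit{word}_2)
  = \log_2 \frac{p(\mathit{word}_1 \,\&\, \mathit{word}_2)}
                {p(\mathit{word}_1)\, p(\mathit{word}_2)}
\]
\[
\mathrm{SO}(\mathit{phrase})
  = \mathrm{PMI}(\mathit{phrase}, \text{``excellent''})
  - \mathrm{PMI}(\mathit{phrase}, \text{``poor''})
\]
```

A positive SO means the phrase co-occurs more strongly with "excellent" than with "poor"; a negative SO means the opposite.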
51

Towards an automatic analysis of sentiments?
How can a machine detect the positive/negative
sentiment from texts? We eyeball the collocation
of words like "excellent" and "poor" in a number of
texts.
Phrase | Semantic Orientation | Phrase | Semantic Orientation
online service | 2.780 | unethical practices | -8.484
online experience | 2.253 | low funds | -6.843
direct deposit | 1.288 | other problems | -2.748
local branch | 0.421 | old man | -2.566
low fees | 0.333 | lesser evil | -2.288
well other | 0.237 | virtual monopoly | -2.050
small part | 0.053 | probably wondering | -1.830
printable version | -0.705 | little difference | -1.615
true service | -0.732 | other bank | -0.850
other bank | -0.850 | possible moment | -0.668
inconveniently located | -1.541 | extra day | -0.286
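Scores like those in the table can be estimated from co-occurrence counts. The sketch below assumes the hit-count form of semantic orientation described in Turney (2002), where counts of a phrase appearing near "excellent" and near "poor" stand in for the probabilities and a small constant avoids division by zero. The function names and the counts in the example are hypothetical; they are not the data behind the table.

```python
from math import log2


def semantic_orientation(near_excellent, near_poor, hits_excellent, hits_poor,
                         smoothing=0.01):
    """Estimate SO(phrase) from co-occurrence counts.

    near_excellent / near_poor: how often the phrase occurs near each anchor word.
    hits_excellent / hits_poor: how often each anchor word occurs overall.
    smoothing: small constant added to the phrase counts to avoid division by zero.
    """
    return log2(((near_excellent + smoothing) * hits_poor)
                / ((near_poor + smoothing) * hits_excellent))


def classify_review(phrase_scores):
    """Label a review by the average semantic orientation of its phrases."""
    average = sum(phrase_scores) / len(phrase_scores)
    return "recommended" if average > 0 else "not recommended"


# Hypothetical counts for two phrases, for illustration only.
so_good = semantic_orientation(near_excellent=40, near_poor=10,
                               hits_excellent=1000, hits_poor=1000)
so_bad = semantic_orientation(near_excellent=5, near_poor=50,
                              hits_excellent=1000, hits_poor=1000)
print(round(so_good, 3), round(so_bad, 3), classify_review([so_good, so_bad]))
```

Averaging the phrase scores over a review gives a single sign, which is how a column of mostly positive or mostly negative phrases translates into a thumbs up or thumbs down.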
52

Towards an automatic analysis of sentiments?
  • Robert Engle's contribution: volatility may vary
    considerably over time; large (small) changes in
    returns are followed by large (small) changes.

Engle, R. F. (1982). Autoregressive conditional
heteroscedasticity with estimates of the variance
of United Kingdom inflation. Econometrica, Vol.
50, pp. 987-1007.
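The clustering of large and small changes can be illustrated with a simulated ARCH(1) process, the simplest member of the model family introduced by Engle (1982). This is a sketch under assumed parameter values (omega, alpha, the seed, and the series length are chosen here for illustration only). The returns themselves are close to uncorrelated, while their squares are positively autocorrelated.

```python
import math
import random


def simulate_arch1(n, omega=0.2, alpha=0.4, seed=0):
    """Simulate n returns whose conditional variance is
    h_t = omega + alpha * r_{t-1}^2 (illustrative parameters)."""
    rng = random.Random(seed)
    returns = []
    previous = 0.0
    for _ in range(n):
        h = omega + alpha * previous ** 2          # conditional variance
        previous = math.sqrt(h) * rng.gauss(0.0, 1.0)
        returns.append(previous)
    return returns


def lag1_autocorr(xs):
    """Sample autocorrelation at lag 1."""
    n = len(xs)
    mean = sum(xs) / n
    num = sum((xs[i] - mean) * (xs[i + 1] - mean) for i in range(n - 1))
    den = sum((x - mean) ** 2 for x in xs)
    return num / den


returns = simulate_arch1(20000)
squared = [r * r for r in returns]
print("lag-1 autocorrelation of returns:        ", lag1_autocorr(returns))
print("lag-1 autocorrelation of squared returns:", lag1_autocorr(squared))
```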
53

Towards an automatic analysis of sentiments?
  • Engle and Ng have developed the concept of the
    news impact curve.
  • The idea is to condition at time t on the
    information available at t - 2, and thus to
    consider the effect of the shock e_{t-1} on the
    conditional variance h_t in isolation.
  • The conditional variance is affected by the
    latest information, the news e_{t-1}.
  • The symmetric case: positive and negative
    news have the same effect.
  • The asymmetric case: a positive and an equally
    large negative piece of news do not have the
    same effect on the conditional variance.

Engle, R. F. and Ng, V. K. (1993). Measuring and
testing the impact of news on volatility. Journal
of Finance, Vol. 48, pp. 1749-1777.
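The difference between the two cases can be sketched numerically. The symmetric curve below has the quadratic form of a GARCH(1,1)-style model, h_t = A + alpha * e_{t-1}^2, and the asymmetric curve adds an extra coefficient for negative shocks (a GJR-style threshold term). The function names and all parameter values are illustrative assumptions, not estimates from the cited paper.

```python
def nic_symmetric(shock, baseline=1.0, alpha=0.1):
    """Conditional variance as a function of yesterday's shock,
    with the same response to good and bad news."""
    return baseline + alpha * shock ** 2


def nic_asymmetric(shock, baseline=1.0, alpha=0.1, gamma=0.15):
    """As above, but negative shocks get an extra weight gamma."""
    extra = gamma if shock < 0 else 0.0
    return baseline + (alpha + extra) * shock ** 2


for shock in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(f"shock={shock:+.1f}  "
          f"symmetric={nic_symmetric(shock):.3f}  "
          f"asymmetric={nic_asymmetric(shock):.3f}")
```

With these assumed values, a negative shock raises the conditional variance by more than a positive shock of the same size in the asymmetric case, while the symmetric curve is a parabola centred on zero.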
54
News Analysis and Sentiment Analysis
  • Dan Nelson (1992) recognized that volatility
    could respond asymmetrically to past forecast
    errors. In a financial context, negative returns
    seemed to be more important predictors of
    volatility than positive returns. Large price
    declines forecast greater volatility than
    similarly large price increases. This is an
    economically interesting effect that has
    wide-ranging implications.

55

Towards an automatic analysis of sentiments?
[Figure: news impact curves for the asymmetric case and the symmetric case]
Engle, R. F. and Ng, V. K. (1993). Measuring and
testing the impact of news on volatility. Journal
of Finance, Vol. 48, pp. 1749-1777.
56
Towards an automatic analysis of sentiments?
  • News Effects
  • I. News Announcements Matter, and Quickly
  • II. Announcement Timing Matters
  • III. Volatility Adjusts to News Gradually
  • IV. Pure Announcement Effects are Present in
    Volatility
  • V. Announcement Effects are Asymmetric:
    Responses Vary with the Sign of the News
  • VI. The effect on traded volume persists longer
    than on prices.

Andersen, T. G., Bollerslev, T., Diebold, F. X.,
and Vega, C. (2002). Micro effects of macro
announcements: Real-time price discovery in
foreign exchange. National Bureau of Economic
Research Working Paper 8959,
http://www.nber.org/papers/w8959
57
Eyeballing News for Sentiments
  • Qualitative research methods are being used in
    financial economics, and in sociological studies
    of financial markets, for systematically studying
    the hopes and fears of the traders, investors,
    and regulators in the analysis of the behaviour
    of the markets.
  • Since 2000, the analysis of news wire has become
    selective and targeted:
  • some researchers choose news related to economic
    and financial topics;
  • some choose news about employment;
  • some distinguish between scheduled and
    non-scheduled news announcements.

58
Eyeballing News for Sentiments
  • Some pre-select keywords that indicate change in
    the value of a financial instrument, including
    metaphorical terms like above, below, up and down,
    and use them to represent positive/negative
    news stories.
  • Some use the frequency of collocation patterns
    for assigning a feel-good/bad score to the
    story:
  • Good news stories appear to comprise collocates
    like "revenues rose" and "share rose".
  • Bad news stories contain "profit warning" and
    "poor expectation".
  • Neutral stories contain collocates such as
    "announces product" and "alliance made".
  • The sentiment of the story is then correlated
    with that of a financial instrument cited in the
    stories, and inferences are made.
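A collocation-based feel-good/bad score of this kind can be sketched as a count of good-news collocates minus a count of bad-news collocates. The two collocate sets below contain only the examples quoted above; the scoring rule, the tokeniser, and the function name are assumptions for illustration, not a description of any particular published system.

```python
import re

# Example collocates taken from the bullets above; real lists would be longer.
GOOD_COLLOCATES = {("revenues", "rose"), ("share", "rose")}
BAD_COLLOCATES = {("profit", "warning"), ("poor", "expectation")}


def story_score(text):
    """Return (#good collocates - #bad collocates) over adjacent word pairs."""
    tokens = re.findall(r"[a-z]+", text.lower())
    bigrams = list(zip(tokens, tokens[1:]))
    good = sum(1 for pair in bigrams if pair in GOOD_COLLOCATES)
    bad = sum(1 for pair in bigrams if pair in BAD_COLLOCATES)
    return good - bad


print(story_score("Revenues rose sharply and the share rose again."))
print(story_score("The firm issued a profit warning."))
print(story_score("Company announces product."))
```

A story with a score above zero would be treated as good news, below zero as bad news, and zero as neutral.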

59
Automating News Analysis for Extracting Sentiments
  • We adopt a text-driven, bottom-up method,
    starting from a collection of texts in a
    specialist domain together with a representative
    general language corpus.
  • We use the following five-step algorithm for
    identifying discourse patterns with more or less
    unique meanings, without any overt access to an
    external knowledge base.

60
Automating News Analysis for Extracting
Sentiments: A method
  1. Select training corpora: Reuters Corpus Volume 1
    (RCV1) and a general language corpus.
  2. Extract key words
  3. Extract key collocates
  4. Extract local grammar using collocation and
    relevance feedback
  5. Assert the grammar as a finite state automaton.

61
Automating News Analysis for Extracting
Sentiments: An experiment
  • I. Select training corpora
  • Training corpus
  • The British National Corpus, comprising
    100 million tokens distributed over 4124 texts
    (Aston and Burnard 1998)
  • Reuters Corpus Volume 1 (RCV1), comprising news
    texts produced in 1996-1997 and containing 181
    million words distributed over 806,791 texts

62
Automating News Analysis for Extracting
Sentiments: An experiment
  • II. Extract key words
  • The frequencies of individual words in the RCV1
    were computed using System Quirk.
  • To describe how our method works, we use a
    randomly selected component of the corpus: the
    output of February 1997, henceforth referred to
    as the RCV1-Feb97 corpus.
  • The RCV1-Feb97 corpus contains 14 million words
    distributed over 63,364 texts.

63
Automating News Analysis for Extracting
Sentiments: An experiment
Ranks | RCV1 Feb97 (N_RCV1Feb97 = 14 million) | Cumulative number of tokens (%) | British National Corpus (N_BNC = 100 million) | Cumulative number of tokens (%)
1-10 | the, to, of, in, a, and, said, on, s, for | 0.87 M (21.3) | the, of, and, a, in, to, for, is, as, that | 22.3 M (22.3)
11-20 | at, that, was, is, it, by, with, from, percent, be | 0.28 M (6.8) | was, I, on, with, as, be, he, you, at, by | 6.51 M (6.5)
21-30 | as, he, million, year, its, will, but, has, would, were | 0.17 M (4.2) | are, this, have, but, not, from, had, his, they, or | 4.23 M (4.2)
31-40 | an, not, are, have, which, had, up, n, new, market | 0.13 M (3.3) | which, an, she, where, here, we, one, there, all, been | 3.05 M (3.1)
41-50 | this, we, after, one, last, company, u, they, bank, government | 0.10 M (2.6) | their, if, has, will, so, would, no, what, can, when | 2.35 M (2.4)
64
Automating News Analysis for Extracting
Sentiments: An experiment
N_RCV1Feb97 = 14,244,349; N_BNC = 100,000,000
Token | RCV1 Feb97 rank | f_RCV1Feb97 | f_RCV1Feb97 / N_RCV1Feb97 (%) (a) | BNC rank | f_BNC | f_BNC / N_BNC (%) (b) | Weirdness (a/b)
percent | 19 | 65763 | 0.462 | 3394 | 2928 | 0.003 | 157.84
market | 40 | 36349 | 0.255 | 301 | 30078 | 0.030 | 8.49
company | 46 | 29058 | 0.204 | 219 | 40118 | 0.040 | 5.09
bank | 49 | 28041 | 0.197 | 562 | 17932 | 0.018 | 10.99
shares | 56 | 23352 | 0.164 | 1285 | 8412 | 0.008 | 19.51
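The weirdness ratio in the last column is the relative frequency of a token in the specialist corpus divided by its relative frequency in the general corpus. A minimal sketch using the frequencies from the table follows; the function and variable names are chosen here for illustration. Small differences from the tabulated ratios are to be expected if the corpus sizes quoted on the slide are rounded.

```python
def weirdness(f_special, n_special, f_general, n_general):
    """Ratio of relative frequencies: specialist corpus over general corpus."""
    return (f_special / n_special) / (f_general / n_general)


N_RCV1_FEB97 = 14_244_349   # corpus sizes as quoted on the slide
N_BNC = 100_000_000

# token: (frequency in RCV1-Feb97, frequency in BNC), from the table above
frequencies = {
    "percent": (65763, 2928),
    "market": (36349, 30078),
    "company": (29058, 40118),
    "bank": (28041, 17932),
    "shares": (23352, 8412),
}

for token, (f_rcv1, f_bnc) in frequencies.items():
    ratio = weirdness(f_rcv1, N_RCV1_FEB97, f_bnc, N_BNC)
    print(f"{token:8s} {ratio:8.2f}")
```

A ratio well above 1 marks a token that is far more typical of the specialist corpus than of general language, which is what makes it a candidate key word.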
65
Automating News Analysis for Extracting
Sentiments: An experiment
  • III. Extract key collocates

Node word: percent (f = 65763)
Collocate | f | Left | Right | Total | z-score
up | 5315 | 4360 | 955 | 5315 | 15.91
rose | 4361 | 3988 | 373 | 4361 | 13.04
rise | 2391 | 980 | 1411 | 2391 | 7.12
down | 2291 | 1636 | 655 | 2291 | 6.82
fell | 2074 | 1844 | 230 | 2074 | 6.17
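The slide does not say how the z-score was computed. One common choice, assumed here, is to count each collocate to the left and to the right of the node word within a fixed window, and then to standardise the total counts against the mean and standard deviation over all collocates of that node. The window size, the toy sentence, and the function names below are illustrative assumptions.

```python
from collections import Counter
from statistics import mean, pstdev


def collocate_counts(tokens, node, window=5):
    """Count words occurring up to `window` positions left/right of `node`."""
    left, right = Counter(), Counter()
    for i, token in enumerate(tokens):
        if token != node:
            continue
        left.update(tokens[max(0, i - window):i])
        right.update(tokens[i + 1:i + 1 + window])
    return left, right


def z_scores(left, right):
    """Standardise total collocate frequencies across all collocates."""
    total = left + right
    freqs = list(total.values())
    mu, sigma = mean(freqs), pstdev(freqs)
    if sigma == 0:
        return {word: 0.0 for word in total}
    return {word: (f - mu) / sigma for word, f in total.items()}


tokens = ("shares rose 10 percent while bonds fell 2 percent "
          "and oil rose 5 percent").split()
left, right = collocate_counts(tokens, "percent", window=2)
scores = z_scores(left, right)
for word, z in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{word:6s} left={left[word]} right={right[word]} z={z:.2f}")
```

Collocates with a high z-score co-occur with the node word much more often than a typical neighbour does, and are the ones carried forward to the local grammar step.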
66
Automating News Analysis for Extracting
Sentiments: An experiment
  • IV. Extract local grammar using collocation and
    relevance feedback

Pattern | f | Collocate | Left | Right | z-score
10 percent to | 108 | rose | 24 | 0 | 5.45
by 10 percent to | 18 | rose | 5 | 0 | 2.27
rose 10 percent to | 14 | billion | 0 | 7 | 4.24
rose 20 percent to | 11 | billion | 1 | 7 | 6.02
67
Automating News Analysis for Extracting
Sentiments: An experiment
  • V. Assert the grammar as a finite state automaton
  • The (re-)collocation patterns can then be
    asserted as finite state automata, one for each of
    the movement verbs and spatial preposition
    metaphors.
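How a collocation pattern such as "rose 10 percent" or "fell by 2.5 percent" might be asserted as a finite state automaton can be sketched with an explicit transition table. The states, the word classes, and the function names are assumptions made for this illustration; the automata in the original slides were shown as diagrams that did not survive in this transcript.

```python
import re

# Hypothetical word classes, seeded from the collocates of "percent".
UP_WORDS = {"rose", "rise", "up"}
DOWN_WORDS = {"fell", "down"}
NUMBER = re.compile(r"\d+(\.\d+)?")

# Transition table: (state, input symbol) -> next state.
TRANSITIONS = {
    ("START", "UP"): "MOVE",
    ("START", "DOWN"): "MOVE",
    ("MOVE", "BY"): "BY",
    ("MOVE", "NUM"): "NUM",
    ("BY", "NUM"): "NUM",
    ("NUM", "PERCENT"): "ACCEPT",
}


def symbol_of(token):
    """Map a token to the input alphabet of the automaton."""
    if token in UP_WORDS:
        return "UP"
    if token in DOWN_WORDS:
        return "DOWN"
    if token == "by":
        return "BY"
    if token == "percent":
        return "PERCENT"
    if NUMBER.fullmatch(token):
        return "NUM"
    return "OTHER"


def find_movements(tokens):
    """Scan tokens and return (direction, amount) for each accepted pattern."""
    matches = []
    state, direction, amount = "START", None, None
    for token in tokens:
        symbol = symbol_of(token)
        next_state = TRANSITIONS.get((state, symbol))
        if next_state is None:
            # No valid transition: reset, letting this token start a new match.
            next_state = TRANSITIONS.get(("START", symbol), "START")
        if next_state == "MOVE":
            direction = symbol
        elif next_state == "NUM":
            amount = float(token)
        elif next_state == "ACCEPT":
            matches.append((direction, amount))
            next_state = "START"
        state = next_state
    return matches


text = "profit rose 10 percent to 14 billion while shares fell by 2.5 percent"
print(find_movements(text.lower().split()))
```

Keeping the grammar in a table rather than in nested conditionals means that a new pattern learned from the collocation step only adds rows to `TRANSITIONS`.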

70
Experiments and Evaluation of sentiment analysis
method

71
Automating News Analysis for Extracting
Sentiments: Some results
Changes in the total number of positive/negative
words together with those that are used in the
local grammars (filtered positive / negative
words) and total number of words.
73
Automating News Analysis for Extracting
Sentiments: Bradford Riots?
  • BBC News tracked from 9/11/1999 to 5/08/2005 for
    the keywords Bradford Riots, Burnley Riots, and
    Oldham Riots

City | Number of News Items | Total Number of Tokens | Average Number of Tokens (Std. Dev.)
Bradford | 253 | 175191 | 3368 (5478)
Burnley | 172 | 99059 | 2304 (3236)
Oldham | 261 | 151696 | 3096 (3041)
74
Automating News Analysis for Extracting
Sentiments: Bradford Riots?
  • BBC News tracked from 9/11/1999 to 5/08/2005 for
    the keywords Bradford Riots, Burnley Riots, and
    Oldham Riots. The results for the period July
    2001-July 2002

75
Automating News Analysis for Extracting
Sentiments: Bradford Riots?
Rate of change?
76
Automating News Analysis for Extracting
Sentiments: Bradford Riots?
The common agencements: persons, places,
institutions and acts
77
Grids for Automating News Analysis
  • We followed the word-frequency counting approach
    of Hughes et al. (2003) to evaluate the
    performance of our implementation.
  • The corpora used in our experiments are the Brown
    Corpus and the Reuters RCV1 Corpus.

Corpus | Files | Size (MB) | Words (M)
Brown | 500 | 5.2 | 1.0
RCV1 | 806,791 | 2576.8 | 169.9
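The split-and-merge structure of distributed word-frequency counting can be sketched on a single machine. Each worker counts the words in its share of the documents, and the partial counts are then merged. A thread pool stands in for the grid nodes here, so this illustrates the structure only, not the speed-up; the function names and sample documents are assumptions.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_words(text):
    """Map step: word frequencies for one document."""
    return Counter(text.lower().split())


def parallel_word_frequencies(documents, workers=4):
    """Count each document in a worker, then merge the partial counts."""
    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(count_words, documents):
            total.update(partial)   # reduce step
    return total


documents = ["the market rose", "the bank fell", "the market fell"]
print(parallel_word_frequencies(documents).most_common(3))
```

Because the merge is a simple sum of counters, the result does not depend on how the documents are divided among workers, which is what makes the task easy to spread over many CPUs.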
78
Grids for Automating News Analysis
[Figure: time in seconds against number of CPUs]
79
Afterthought
  • Though we have devised programs that can learn
    unambiguous patterns of use of positive or
    negative sentiment, a sentence is always used in
    the context of other sentences, and an inference
    made on the basis of one sentence only may change
    once that context is taken into account.
  • One can argue that a new text is a response to
    some or all of the existing texts, and in that
    sense each text is contextualised within a
    network of other texts: even if all the existing
    texts unambiguously expressed a positive
    sentiment, a new text with strong negative
    sentiment may invalidate all of the positive
    sentiment.

80
Conclusions and Future Work
Data Sources | Financial Economics | Sociology of Crime, Crime Science | Social Anthropology
Quantitative (all fields) | Macro-micro Economic Indicators; Census Statistics; Survey of Social Attitudes; Life-style and Well-being Statistics
Quantitative | Market Movement | Crime Statistics | Ethnicity-related data
Qualitative (all fields) | Political News Reports, Editorials, Letters to the Editor; Political and Social Opinion Polls; Consumer Confidence Survey
Qualitative | Investor/Trader Confidence Surveys; Regulatory Body Output; Financial News | Citizen Confidence Surveys; Police Forces/Home Office Reports; Crime Reports | Ethnic Minority Police Forces/Home Office Reports; Crime Reports