A note on extracting sentiments in financial news in English, Arabic - PowerPoint PPT Presentation

Transcript and Presenter's Notes

1
A note on extracting sentiments in financial
news in English, Arabic and Urdu
  • Yousif Almas
    Khurshid Ahmad

Department of Computing, University of Surrey,
Guildford, UK (y.almas_at_surrey.ac.uk)
Department of Computer Science, Trinity College,
Dublin, Ireland (kahmad_at_cs.tcd.ie)
The Second Workshop on Computational Approaches
to Arabic Script-based Languages, LSA 2007
Linguistic Institute, July 21, 2007, Stanford
University
2
Motivation
Daniel Kahneman (Nobel Prize awarded 2002)
Herbert Simon (Nobel Prize awarded 1978)
Robert Engle (Nobel Prize awarded 2003)
News Impact Curves and the Asymmetrical Effect of
News
Bounded Rationality and Information Overload
Behavioural Finance and Human Psychology
3
Outline
  • Introduction
  • Background
  • Method
  • Experiments
  • Evaluation and Conclusion

4
Introduction - World Language Hierarchy
The world language hierarchy (1997)
The world language hierarchy (2050)
Graddol, David (1997). The Future of English?
London: The British Council.
5
Introduction - Financial News
  • Many sources and languages
  • Information about local markets is not always
    available in English

Q: To what extent do you consider news when taking
buy/sell decisions? A: Regularly, and in two
languages (English and Arabic). (Head of Treasury
in an international bank based in the Middle East)
6
Introduction - The Role of Financial Language
[Diagram: Financial Reporters, Financial Professionals,
Financial Language, Financial News and Financial Markets,
linked by the relations write, analyse, restrict,
communicate, use, describe, survey, report and affect]
7
Introduction - Eyeballing the text!
  • What is missing in the qualitative analysis
    packages?
  • The texts have to be eye-balled: most phrases,
    clauses and paragraphs have to be coded/annotated
    by hand, an impossible task when the volume of
    text around us is exploding (Surdeanu et al., 2003;
    Chan and Wai, 2005; Lan et al., 2005; Das and Chen,
    2006)
  • There is a need for a domain-specific thesaurus
    (conceptually-organised terminology or
    ontology) for each new domain:
  • Identify ontological commitments
  • Find terms, and the broader/narrower equivalents,
    synonyms and antonyms
  • Maintain terminology databases
  • Texts that are conceptually similar within a
    domain have to be clustered using unsupervised
    learning algorithms
  • Almost all systems are Anglo-centric

8
Introduction - Objective
  • Propose a language-informed framework for
    financial news analysis using techniques of
    corpus linguistics and special language
    terminology
  • Expect that positive and negative sentiment
    expressed will be tied to aspects of language
    like metaphor
  • Assume frequency, collocation and local grammar
    analysis will lead to patterns useful for the
    automatic analysis of financial news

9
Introduction - Objective
[Diagram: Financial Reporters, Financial Professionals,
Financial Language, Financial News and Financial Markets,
linked by the relations write, analyse, restrict,
communicate, use, describe, report, affect and survey,
with a System added that (1) learns, (2) analyses and
(3) assists]
10
Background - Behavioural Finance
Investors are not rational agents (machines) that
can process all available information rationally.
Psychological factors and irrationality must be
considered when studying the movements of
financial markets.
Clipart source: www.attitude2food.com
11
Background - The Role of Media
  • Tetlock et al. (2005, 2006) have examined whether
    a simple quantitative measure of language can be
    used to predict individual firms' accounting
    earnings and stock returns
  • 1) the fraction of negative words in
    firm-specific news stories forecasts low firm
    earnings;
  • 2) firms' stock prices briefly under-react to the
    information embedded in negative words; and
  • 3) the earnings and return predictability from
    negative words is largest for the stories that
    focus on fundamentals.
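
The "fraction of negative words" measure mentioned above can be illustrated with a minimal sketch. The word list and the function name below are hypothetical and chosen only for illustration; a real study would use a published sentiment lexicon and proper tokenisation.

```python
import re

# Hypothetical toy lexicon, for illustration only.
NEGATIVE_WORDS = {"fall", "fell", "loss", "losses", "decline", "weak", "worries"}


def negative_fraction(text, lexicon=NEGATIVE_WORDS):
    """Return the share of alphabetic tokens in `text` that occur in `lexicon`."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for tok in tokens if tok in lexicon)
    return hits / len(tokens)


story = "Shares fell after the firm reported a loss and a weak outlook"
print(negative_fraction(story))
```

The story has 12 tokens, three of which ("fell", "loss", "weak") are in the toy lexicon.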

12
Background - Special Languages
  • A special language is a linguistic subsystem
    intended for unambiguous communication in a
    particular subject field using a terminology and
    other linguistic means (ISO 1087)
  • Authors of special texts share a common
    vocabulary and common habits of word usage
    (Hirschman and Sager, 1982)
  • The range of grammatical constructions of a
    natural language is significantly reduced in
    special languages (Kittredge, 1982)
  • Word frequency correlates with its acceptability
    in a language community (Quirk et al., 1985)
  • There is a close relation between word
    distribution and information-bearing phrases
    (Hirschman, 1986)

13
Background - Financial Language
  • A special language comprising a terminology and
    metaphorical mappings couched in a local grammar
  • Oil [instrument] prices rose [movement] to
    68.52 [value] a barrel amid worries about an
    escalation in the standoff between Iran and the
    west [cause].
  • The firm lowered its revenue outlook for the
    first quarter last night and now expects revenue
    to fall six percent from the fourth quarter.
  • Ryanair's healthy margins give its earning a
    strong defence.

Source: www.reuters.co.uk
14
Background - Local Grammars
  • Descriptors of particular parts of language use
    or sentences with specific functions (Gross,
    1993)
  • Capture the contextual properties of lexical
    items
  • Consider the lexical, syntactic and semantic
    restrictions that words exhibit
  • Would only accept sentences that are meaningful
    and related to the task

15
Background - Local Grammars
  • Words like "rose" or "fall" may also be used as
    names
  • A local grammar rejects such spurious uses
  • A local grammar, used almost exclusively in
    financial reporting, can be used to extract
    true sentiment from raw sentiment
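
As a rough illustration of how a local grammar can reject spurious uses, the sketch below encodes a single hypothetical pattern (instrument noun, movement verb, optional preposition, number) as a regular expression. The pattern, its word lists and the function name are assumptions made for this example, not the grammar used by the authors.

```python
import re

# Hypothetical local-grammar pattern:
#   <instrument noun> <movement verb> [to|by] <number>
MOVEMENT = re.compile(
    r"\b(?P<instrument>prices|shares|profit|revenue)\s+"
    r"(?P<movement>rose|fell|rise|fall)\s+"
    r"(?:to|by)?\s*(?P<value>\d+(?:\.\d+)?)",
    re.IGNORECASE,
)


def extract_movement(sentence):
    """Return (instrument, movement, value) if the sentence fits the pattern, else None."""
    match = MOVEMENT.search(sentence)
    if match is None:
        return None
    return (
        match.group("instrument").lower(),
        match.group("movement").lower(),
        float(match.group("value")),
    )


print(extract_movement("Oil prices rose to 68.52 a barrel"))
print(extract_movement("Rose Smith joined the board"))
```

The second sentence contains "Rose" but not in the required context, so the pattern does not match and the name is not counted as a movement word.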

16
Background - Arabic Financial News
  • Increasing liquidity in some parts of the Arab
    world has poured large amounts of money into the
    local financial markets and attracted the
    attention of local and international media (e.g.
    Reuters, CNBC, CNN, BBC, etc.)

17
Background - Case Study
  • Comparing manually selected positive and
    negative words in the Arabic newspaper Al-Wafd
    with the value of the Egyptian pound showed some
    anti-correlation between negative news and the
    value of the pound (Ahmad and Almas, IV05)

[Chart legend: Positives, Negatives, Financial Instrument]
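
A minimal sketch of how such an anti-correlation could be measured is given below, using the Pearson coefficient. The two series are made-up numbers for illustration only and are not data from the case study; the function name is likewise an assumption.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Made-up illustrative series, NOT data from the study.
negative_counts = [2, 5, 3, 8, 6]   # negative words per period
pound_value = [10, 7, 9, 4, 6]      # instrument value per period

print(pearson(negative_counts, pound_value))
```

A coefficient close to -1 indicates that the instrument value tends to drop when the count of negative words goes up; the toy series was built to be perfectly anti-correlated.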
18
Background - Metaphors in Finance
falling or rising
sick or healthy
ascending or descending
(Arabic equivalents lost in extraction)
Cartoons source: http://www.aleqt.com/
19
Background - Metaphors
  • Metaphors can be both culture- and language-bound,
    but in financial news they usually relate to
    physical/biological movements across languages,
    with some exceptions:
  • English: bullish and bearish
  • Arabic: cod (hamoor, هامور)

20
Background - Financial Language and Trading
Financial Services (English)
Agricultural Commodities (Urdu)
Oil (Arabic)
21
Background - Multilingual Analysis
  • Some metaphors may not transfer across languages
  • If the predominant trading changes from
    financial instruments, e.g. shares, currencies,
    bonds, to commodities, will the patterns then
    survive?
  • What is seen as positive news, say, in the USA
    might be received as negative news in the
    Middle East (e.g. the direction of oil prices)

22
Background - Word Order
  • Word order and collocation extraction in a
    multilingual environment
  • English (SVO): Oracle profit rose 50 percent
  • Arabic (VSO): [Arabic script lost in extraction]
  • Urdu (SOV): [Urdu script lost in extraction]

Arabic example: [Arabic script lost in extraction]
Arabic gloss: percent by-ratio Gulf Cement profit rise
(6 intervening tokens)
English translation: Gulf Cement profit up 138 percent
(1 intervening token)
length is neutral
23
Method
Term extraction
  • Statistical corpus-based (Ahmad et al., 2006)
  • 1- Frequency Analysis and Terminology Extraction
  • 2- Collocation Extraction
  • 3- Significant N-gram Extraction
  • There is more information in a sequence of words
    (collocations) than in words individually (Firth,
    Halliday and Sinclair)
  • Focus on special language properties
  • Raw corpora are useful
  • Automatic identification of patterns

Collocation Extraction
N-gram Extraction
N-gram Normalisation
Pattern Generation
Pattern Pruning
24
Method
  • Frequency Analysis and Terminology Extraction
  • Identify terms in special language texts by
    comparing the relative frequency distribution of
    a special language corpus with one that is
    representative of a general language (Ahmad,
    1995)
  • Extract terms with frequency and weirdness
    z-score above a positive threshold

Weirdness (w) = (fSpecial / fGeneral) × (NGeneral / NSpecial)
where f = frequency and N = corpus size (tokens)
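
The weirdness ratio on this slide can be sketched in a few lines. The function names are hypothetical, and treating a word that is absent from the general corpus as if it occurred once is an assumption made here only to avoid division by zero.

```python
from collections import Counter
from statistics import mean, pstdev


def weirdness(special_tokens, general_tokens):
    """w = (f_special / f_general) * (N_general / N_special) for each special-corpus word."""
    f_special = Counter(special_tokens)
    f_general = Counter(general_tokens)
    n_special, n_general = len(special_tokens), len(general_tokens)
    scores = {}
    for word, fs in f_special.items():
        # Assumption: unseen general-corpus words are counted as frequency 1.
        fg = max(f_general.get(word, 0), 1)
        scores[word] = (fs / fg) * (n_general / n_special)
    return scores


def z_scores(values):
    """Standardise a {word: score} mapping using the population standard deviation."""
    data = list(values.values())
    mu, sigma = mean(data), pstdev(data)
    return {w: ((v - mu) / sigma if sigma else 0.0) for w, v in values.items()}


special = ["percent", "shares", "the", "percent"]
general = ["the", "the", "the", "percent", "cat", "dog", "sat", "mat"]
w = weirdness(special, general)
print(w)
print(z_scores(w))
```

Words such as "percent" that are relatively more frequent in the special corpus receive a weirdness above 1, while general-language words such as "the" fall below 1.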
25
Method
  • Example: properties of the keyword "percent" in
    English, Arabic and Urdu financial corpora
  • Using top news corpora for weirdness analysis

26
Method
  • Collocation Extraction (Smadja, 1993)
  • For a given word, find all collocates at
    positions -5 to +5 (Is this applicable to Arabic?
    What about morphological/syntactic complexity?)
  • Avoid semantic constraints (e.g. doctor and
    nurse)
  • Three criteria:
  • strength (normalized frequency), 95% rejection
    (K-Score)
  • position histogram must not be flat (U-Score)
  • select peak from histogram (P-Score)

(adapted from a slide by V. Hatzivassiloglou)
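
The three criteria can be sketched as follows, assuming the usual formulation of Smadja's statistics: strength as a z-score of the collocate's total frequency, spread as the variance of its position histogram, and peaks as positions at least one standard deviation above the histogram mean. Function names are hypothetical, the default thresholds (k0, u0, k1) = (1, 10, 1) mirror the (U,k,p) values quoted in this deck, and sentence boundaries are ignored for brevity.

```python
from collections import defaultdict
from statistics import mean, pstdev


def position_histograms(tokens, keyword, span=5):
    """Count every collocate of `keyword` at relative positions -span..-1 and 1..span."""
    hist = defaultdict(lambda: defaultdict(int))
    for i, tok in enumerate(tokens):
        if tok != keyword:
            continue
        for offset in range(-span, span + 1):
            j = i + offset
            if offset != 0 and 0 <= j < len(tokens):
                hist[tokens[j]][offset] += 1
    return hist


def select_collocates(hist, span=5, k0=1.0, u0=10.0, k1=1.0):
    """Keep collocates that pass the strength, spread and peak tests; return their peak offsets."""
    totals = {w: sum(h.values()) for w, h in hist.items()}
    if not totals:
        return {}
    mu, sigma = mean(list(totals.values())), pstdev(list(totals.values()))
    offsets = [o for o in range(-span, span + 1) if o != 0]
    selected = {}
    for word, h in hist.items():
        strength = (totals[word] - mu) / sigma if sigma else 0.0
        avg = totals[word] / len(offsets)
        spread = sum((h.get(o, 0) - avg) ** 2 for o in offsets) / len(offsets)
        peaks = [o for o in offsets if h.get(o, 0) >= avg + k1 * spread ** 0.5]
        if strength >= k0 and spread >= u0 and peaks:
            selected[word] = peaks
    return selected


# Toy corpus: "profit" always directly precedes "rose"; the fillers are unique noise words.
tokens = []
for i in range(20):
    tokens += ["profit", "rose", f"w{i}", f"v{i}", f"u{i}", f"t{i}", f"s{i}", f"r{i}"]
print(select_collocates(position_histograms(tokens, "rose")))
```

In the toy corpus only "profit" is both frequent and concentrated at one position (offset -1), so it is the only collocate retained.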
27
Method
28
Method
  • Extract N-grams in the corpus that comprise
    highly collocating keywords, with thresholds
    (U, k, p) = (10, 1, 1) and weirdness z-score
    above 0
  • → Avoids closed-class collocates

29
Method
  • Replace words with a frequency z-score less than
    a threshold by a place marker, and merge
    contiguous place markers

Input
[Arabic sentence; script lost in extraction]
Gloss: percent by-ratio Gulf Cement profit rise
Translation: Gulf Cement profit up 138 percent
Output
[Arabic pattern; script lost in extraction]
Gloss: percent by-ratio [place marker] profit rise
Translation: [place marker] profit up 138 percent
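
The normalisation step described on this slide can be sketched as below. The place-marker symbol, the function names and the choice of the population standard deviation are assumptions for illustration; the slide does not specify them.

```python
from collections import Counter
from statistics import mean, pstdev

PLACE_MARKER = "*"  # hypothetical symbol; the slide's own marker was lost in extraction


def frequent_words(tokens, threshold=0.0):
    """Return the words whose frequency z-score is at or above `threshold`."""
    counts = Counter(tokens)
    data = list(counts.values())
    mu, sigma = mean(data), pstdev(data)
    if not sigma:
        return set()
    return {w for w, c in counts.items() if (c - mu) / sigma >= threshold}


def normalise(ngram, keep):
    """Replace tokens outside `keep` with a place marker and merge adjacent markers."""
    out = []
    for tok in ngram:
        symbol = tok if tok in keep else PLACE_MARKER
        if symbol == PLACE_MARKER and out and out[-1] == PLACE_MARKER:
            continue  # merge contiguous place markers
        out.append(symbol)
    return out


corpus = ["profit"] * 5 + ["up"] * 5 + ["gulf", "cement"]
keep = frequent_words(corpus)
print(normalise(["gulf", "cement", "profit", "up"], keep))
```

Here "gulf" and "cement" are rare in the toy corpus, so both are replaced and the two adjacent markers collapse into one.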
30
Method
31
Method
  • Discard specific and frequent Arabic proclitics
    (e.g. the conjunction "and" (w, و))

32
Experiments LoLo (????) Local-Grammar for
Learning Terminology
  • Designed and developed for managing corpora
    (mainly Arabic-script based languages and
    English)
  • Tools
  • Corpus analyser
  • Rules editor
  • Information extractor
  • Information visualiser
  • Each component is accessible via LoLo's GUI, and
    all the data generated can be exported.

33
Experiments - LoLo's Architecture
[Diagram: General Language Corpus, Special Language
Corpus, Texts, Analyser, Candidate Knowledge, Editor,
Extractor, Visualiser, Knowledge Base and user]
34
Experiments - Corpora
Financial (divided into training and test)
Top News
35
Experiments - Top 10 Keywords
(Rank order and the original Arabic and Urdu script
were not preserved in the transcript; Arabic and Urdu
entries are English glosses.)
English: billion, bid, percent, pounds, market,
shares, share, growth, company
Arabic: dollar, the-oil, million, prices, barrel,
percent, billion, price, company, the-dollar
Urdu: bank, less, rupees, million, capital, percent,
increase, increase, 10 million, 100,000
36
Experiments Arabic Patterns
37
Evaluation Corpus Regional Polarity
Sentences
38
Evaluation
[Table: precision and recall for positives and
negatives; values not preserved in the transcript]
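
For reference, precision and recall per polarity class can be computed as in the sketch below. The labels and the toy predictions are hypothetical and are not the evaluation data of this study; `None` stands for a sentence that no pattern matched.

```python
def precision_recall(predicted, gold, label):
    """Precision and recall of `label`, given parallel lists of predicted and gold labels."""
    pairs = list(zip(predicted, gold))
    tp = sum(1 for p, g in pairs if p == label and g == label)
    fp = sum(1 for p, g in pairs if p == label and g != label)
    fn = sum(1 for p, g in pairs if p != label and g == label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical toy labels, for illustration only.
gold = ["pos", "pos", "neg", "pos", "neg"]
predicted = ["pos", None, "neg", "pos", "pos"]

print(precision_recall(predicted, gold, "pos"))
print(precision_recall(predicted, gold, "neg"))
```

Unmatched sentences lower recall without affecting precision, which is the "high accuracy, low coverage" situation typical of pattern-based extraction.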
39
Conclusions
  • We have captured the essence of financial news in
    English and Arabic, the similarities and the
    differences
  • The method produces productive patterns that have
    high accuracy but low coverage at the sentence
    level; coverage is much higher at the document
    level (circa 50-60%)
  • Lead sentences are very regular and give the most
    important information or a summary: the lead
    sentence is the news (an indication of the news
    item's predominant polarity?)

40
Future Work
  • Unsupervised classification (clustering) of
    patterns as positive and negative, and
    bootstrapping the pattern base utilising a
    cross-sentence local grammar
  • Arabic lead sentences start with a verb, and many
    are metaphorical movement words → Seed lexicon
  • Titles contain a paraphrased word or phrase of
    the polarity mentioned in the lead → Automatic
    bootstrapping and clustering
  • Asymmetry (positives are more than negatives) →
    Automatic labelling of clusters
  • Effect of pre-processing Arabic corpora
  • Polarity across languages/regions/cultures

41
Future Work
[Arabic news excerpts (dated 11-7-2007 and 15-7-2007);
the Arabic script was lost in extraction]
Source: www.alarabiya.net
42
Future Work
  • Evaluation of the translation of the General
    Inquirer positive and negative lexicons to
    Arabic and Urdu

43
Questions