Title: Application of bibliometric analysis Thed van Leeuwen CWTS Leiden University
1Application of bibliometric analysis
- Thed van Leeuwen, CWTS, Leiden University
- DAIR-Workshop, 5 November 2008
2Contents
- Introduction of CWTS and bibliometrics
- Part I Various databases
- Differences between Web of Science and Scopus
- Adequacy of citations indexes
- Part II Various indicators one can encounter
- ISI Impact Factors calculation and validity
- International rankings
- The H-index and its limitations
- Part III Various methodological issues
- Journal field normalization
- Citation Windows and Impact Measurement
- Typology of the studies
- Methodology of the studies
3Introduction of CWTS and bibliometrics
4Introduction of CWTS
- CWTS is a research institute at Leiden University, Faculty of Social Sciences.
- CWTS is/was based mainly on contract research, but still produces roughly 10 papers in scientific journals per year.
- Currently in a transformation process, based on a mix of public funding and contract research, focused on both fundamental research in the field and service contracts.
5The work of CWTS
- Since the early 1990s, CWTS has conducted large-scale bibliometric research performance analyses that accompany research assessments by review committees.
- Many of these studies had a disciplinary character, initiated by the VSNU.
- A revision of the evaluation protocol (SEP) in 2003 caused a shift of the analyses to university- or institute-initiated studies.
6Introduction of bibliometrics
- Bibliometrics can be defined as the quantitative analysis of science and technology performance and the cognitive and organizational structure of science and technology.
- The basis for these analyses is the scientific communication between scientists through (mainly) journal publications.
- Key concepts in bibliometrics are output and impact, as measured through publications and citations.
- Important starting point in bibliometrics: scientists express, through citations in their scientific publications, a certain degree of influence of others on their own work.
- By large-scale quantification, citations indicate influence or (inter)national visibility of scientific activity.
7CWTS data system
- CWTS has a full bibliometric license from Thomson Reuters Scientific to conduct evaluation studies using the Web of Science.
- Our database covers the period 1981-2007.
- Some characteristics:
- 27,000,000 publications
- 500,000,000 citation relations between source papers
- 48,000,000 authors (incl. variations)
- 28,000,000 addresses, some 90% cleaned up over the last 10 years.
- Contains reference sets for journal and field citation data
8Part I Various databases
9Differences between Citation Indexes: implications for bibliometric studies
- Martijn S. Visser and Henk F. Moed
10Multidisciplinary Citation Indexes
- Web of Science
- since 1963, formerly produced by ISI
- ca. 9,000 Journals are indexed
- Scopus
- launched by Elsevier in 2004
- ca. 15,000 journals, conf papers and other
- Google Scholar
- launched in 2004
- coverage unclear
11Disciplinary distribution of journals in WoS
- Roughly 5,000 journals from the natural, life, medical and technical sciences.
- Roughly 2,500 journals from the social and behavioral sciences.
- Roughly 1,500 journals from the humanities.
12Contents
- Comparing Web of Science and Scopus on a paper-by-paper basis
- RAE 2001: coverage differences in a practical situation
- Implications for bibliometric studies: case study in the field of oncology (SCImago)
13Comparing WoS and Scopus
- Time period 1996-2006
- Snapshot in time
- Only citeable documents
- Matching algorithm
14Matching WoS with Scopus on a paper-by-paper basis
[Figure: overlap between WoS and Scopus; intersection 9.4 M]
15I From a WoS perspective
- To what extent are citeable documents (articles,
letters, notes and reviews) in journals processed
for the WoS covered by Scopus?
16Scopus coverage of WoS papers increases over time
17Incomplete coverage of WoS journals decreases over time
- In 2005, 82% of WoS journals were completely covered in Scopus.
- In 1996, 55% of WoS journals were completely covered in Scopus.
[Figure labels: WoS journals; WoS journals not covered by Scopus at all]
18II From a Scopus perspective
- To what extent are citeable documents (articles, notes, letters, reviews, conference papers) in sources processed for Scopus not covered by Web of Science? (Scopus surplus)
19NATURAL AND LIFE SCIENCES
20ENGINEERING SCIENCES, SOCIAL SCIENCES AND HUMANITIES
21III From an external perspective
- To what extent are documents in an external file
processed for the WoS and/or Scopus?
22Coverage differences in a practical case: 2001 RAE papers
- Up to 4 publications of every active faculty staff member
- Time period 1994/1996-2000
- Papers assigned to units of assessment
23Only small overall coverage differences
24Coverage differences for Units of Assessments
(Sciences)
- Scopus > WoS by more than 3%
- Nursing
- Clinical Dentistry
- Civil Engineering
- Other Studies and professions allied to Medicine
- Computer Science
- Mineral and Mining Engineering
- WoS > Scopus by more than 3%
- Pharmacology
- Food Science and Technology
- Physics
- Anatomy
25Why are the differences relatively small?
- Best papers are more likely to be published in high-impact journals, which tend to be processed by both indexes.
- British academics may not often use sources solely processed by Scopus.
26Need to characterize the surplus coverage of
Scopus
- Geographical location
- Language
- References
- Citedness
27Case study for journals in Oncology
- Cooperation between SCImago and CWTS (Carmen Lopez Illescas)
- Comparison between Scopus and Web of Science
- Comparison on a journal-by-journal basis
28Overlap from a WoS perspective
29Overlap from a Scopus perspective
30Characterization of Scopus surplus: language and publisher country
31Characterization of Scopus surplus: other journal properties
32Main characteristics Scopus surplus
- More journals in non-English languages
- More recently established
- Less often refereed
- Relatively low impact factors
33Implications for bibliometric studies
- Web of Science tends to select only journals with sufficiently high impact (research front).
- Scopus tends to be more representative of the total of scientific literature.
- The position of the Western World in Scopus is probably less dominant.
34Strong relationship between the number of WoS and Scopus papers for the 50 most productive countries
35Average Citation rate in WoS and Scopus for the
50 most productive countries
36Negative relationship between number of papers in
Scopus surplus and average citation rate
37Effect of Scopus surplus
- Countries that profit most in terms of percentage of published documents tend to show a decline in their average citation rate.
- Keeping in mind the RAE outcomes, the effect of the extension is also that countries publishing relatively often in non-English journals show a decline in their average citation rate.
- Conclusion: more is not necessarily better!
38Discussion
- Citation indexes (will) adopt a more inclusive coverage policy in which citation impact is less important as a criterion for selection.
- This will have implications
- for the way bibliometric assessments of research performance have to be carried out
- for the interpretation of bibliometric indicators and rankings derived from these databases
39Discussion
- Should bibliometric studies aim for widest possible coverage?
- It depends on what you want to measure.
- How should one deal with possible bias as a result of changing coverage policies?
- Define sub-universes (journal sets) of publications and citations (e.g. national/international)
40Adequacy of citation indexes: implications for bibliometric studies
41How to tackle this issue ?
- We conduct analyses on the adequacy of the citation indexes across disciplines, based on the reference behavior of researchers themselves.
- The degree to which references point to other indexed literature indicates the importance of journal literature in the scientific communication process.
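A minimal sketch of this reference-based adequacy measure, assuming every cited reference has already been flagged as pointing to an indexed source or not. The function name and the two sample reference lists are hypothetical and serve only as illustration; they are not CWTS data.

```python
def indexed_reference_share(references):
    """Return the share of cited references that point to indexed sources.

    `references` is a sequence of booleans: True when the cited item sits
    in a source covered by the citation index, False otherwise.
    """
    refs = list(references)
    if not refs:
        raise ValueError("no references given")
    return sum(refs) / len(refs)


# Hypothetical reference lists for two fields (illustration only).
field_a = [True] * 8 + [False] * 2   # mostly references to indexed journals
field_b = [True] * 3 + [False] * 7   # many books and non-indexed sources

print(f"field A: {indexed_reference_share(field_a):.0%}")
print(f"field B: {indexed_reference_share(field_b):.0%}")
```

A low share suggests that journal literature plays a smaller role in that field, so a journal-based citation index describes it less adequately.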
42The medical and life sciences
43The natural sciences
44The technical sciences
45The social and behavioral sciences
46The humanities
47Overall WoS coverage by main field
48Conclusions on adequacy issue
- We can clearly conclude that the application of bibliometric techniques solely based on WoS (but very likely also Scopus) will not be valid for some of the soft fields in the social sciences and the humanities.
- That is why the toolbox has to be extended!
49Part II Bibliometric indicators one encounters in the field
50Some basic indicators are
- P: number of publications in journals processed for the Web of Science.
- C: number of received citations, excl. self-citations.
- CPP: mean number of citations per publication, excl. self-citations.
- Pnc: percentage of the publications not cited (within a certain time frame!).
- SC: percentage of self-citations related to an output set.
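The indicators above can be sketched in a few lines. This is an illustration under stated assumptions, not the CWTS implementation: the function name and sample counts are hypothetical, Pnc is assumed to count publications with no citations left after removing self-citations, and SC is assumed to be self-citations as a percentage of all citations received.

```python
def basic_indicators(citations, self_citations):
    """Compute P, C, CPP, Pnc and SC for one set of publications.

    citations[i]      : all citations received by publication i
    self_citations[i] : the part of those citations that are self-citations
    """
    if len(citations) != len(self_citations):
        raise ValueError("both lists need one entry per publication")
    p = len(citations)
    if p == 0:
        raise ValueError("empty publication set")

    # Citations excluding self-citations, per publication.
    external = [c - s for c, s in zip(citations, self_citations)]
    c_total = sum(external)
    all_citations = sum(citations)

    return {
        "P": p,
        "C": c_total,
        "CPP": c_total / p,
        # Assumption: "not cited" means zero citations after removing self-citations.
        "Pnc": 100 * sum(1 for e in external if e == 0) / p,
        # Assumption: share of self-citations in all citations received.
        "SC": 100 * sum(self_citations) / all_citations if all_citations else 0.0,
    }


# Hypothetical output set of four publications.
result = basic_indicators(citations=[10, 4, 0, 6], self_citations=[2, 1, 0, 0])
print(result)
```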
51ISI Impact Factors calculation and validity
52Methodology: ISI's classical IF
- The ISI Impact Factor (IF) is defined as the number of citations received by a journal in year t, divided by the number of citeable documents in that same journal in the years t-1 and t-2:
$$\mathrm{IF}_t = \frac{\text{Citations in year } t}{\text{Number of citeable documents in } t-1 \text{ and } t-2}$$
53Practical exercise on IF
- Here we get to the first practical exercise of the afternoon.
- Read the Science commentary on Impact Factors.
- Calculate the Impact Factor for the Lancet.
- Discuss the results.
54Share citations-for-free for The Lancet
- ISI Method
$$\mathrm{IF} = \frac{\text{Citations in 2000}}{\text{Citeable documents in 98 and 99}} = \frac{14037\ (c)}{957\ (a)} = 14.7$$
--------------------------------------------
Type         Publications   Citations
             9091           1992
--------------------------------------------
Article      784            2986
Note         144            593
Review       29             232
Sub-total    957 (a)        7959 (b)
Letter       4181           4264
Editorial    1313           905
Other        1421           909
Total        7872           14037 (c)
--------------------------------------------
- CWTS Method
$$\mathrm{IF} = \frac{\text{Citations to Art/Not/Rev in 2000}}{\text{Art/Not/Rev in 98 and 99}} = \frac{7959\ (b)}{957\ (a)} = 8.3$$
$$\mathrm{IF} = \frac{\text{Citations to Art/Let/Not/Rev in 2000}}{\text{Art/Let/Not/Rev in 98 and 99}} = \frac{7959 + 4264}{957 + 4181} = 2.4$$
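The three impact factor variants on this slide can be recomputed from the Lancet totals. The sketch below uses only the sub-totals (a), (b), (c) and the letter counts given on the slide; the function and variable names are illustrative and not part of any ISI or CWTS tool.

```python
def impact_factor(citations, documents):
    """Citations in the census year divided by the documents counted."""
    return citations / documents


# Totals taken from the Lancet slide.
citeable_docs = 957          # (a) articles + notes + reviews
citations_to_citeable = 7959 # (b) citations to articles, notes and reviews
citations_all_types = 14037  # (c) citations to all document types
letters = 4181
citations_to_letters = 4264

# ISI method: all citations in the numerator, only citeable documents below.
isi_if = impact_factor(citations_all_types, citeable_docs)

# CWTS method: numerator and denominator cover the same document types.
cwts_if = impact_factor(citations_to_citeable, citeable_docs)
cwts_if_with_letters = impact_factor(
    citations_to_citeable + citations_to_letters,
    citeable_docs + letters,
)

print(f"ISI: {isi_if:.1f}")
print(f"CWTS (Art/Not/Rev): {cwts_if:.1f}")
print(f"CWTS (Art/Let/Not/Rev): {cwts_if_with_letters:.1f}")
```

The gap between the first and second value comes from citations to letters, editorials and other items that count in the ISI numerator while those items are left out of the denominator.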
55ISI Impact Factors
- From 1995 onwards CWTS has analyzed the uses and validity of the ISI Journal Impact Factor (IF).
- Most important points of criticism were:
- Not sensitive to the composition of the journal in terms of document types.
- Not sensitive to the science fields a journal is attached to.
- Based on too short citation windows.
56Distribution of citations used for the calculation of the IF value of The Lancet
- The red area indicates citations-for-free, while the blue area indicates correct citations.
- The IF-score of The Lancet is seriously overrated by the scientific audience of the journal.
57Impact Factors for Br. J. Clin. Pharm. and Clin. Pharm. Ther.
- The graph shows the correct and erroneous impact factors of BJCP and CPT.
- In the case of CPT, citations to published meeting abstracts are included, while BJCP has stopped publishing meeting abstracts!
58Document types and fields
[Table columns: Field, Journal, IF, JFIS]
The IF is for 2002; JFIS covers 1998-2002.
59Fields and Citation windows
60Citation measurement of IF
        1999  2000  2001  2002  2003  2004  2005  2006
1999    1999  2000  2001
2000          2000  2001  2002
2001                2001  2002  2003
2002                      2002  2003  2004
2003                            2003  2004  2005
2004                                  2004  2005  2006
2005                                        2005  2006
2006                                              2006
61CWTS answer to the problems of the IF
- This indicator is the JFIS, the Journal-to-Field Impact Score.
- The JFIS solves the problems of the Impact Factor, as
- the calculation of JFIS is based on equally large entities,
- document types are taken into account,
- JFIS is field-normalized, and finally,
- it is based on longer citation windows (1-4 years).
62Citation measurement of JFIS
Citation-window: 1999 2000 2001 2002 2003 2004 2005 2006
Publication block   Citation years per publication year in the block
1999-2002           1999-2002 | 2000-2002 | 2001-2002 | 2002
2000-2003           2000-2003 | 2001-2003 | 2002-2003 | 2003
2001-2004           2001-2004 | 2002-2004 | 2003-2004 | 2004
2002-2005           2002-2005 | 2003-2005 | 2004-2005 | 2005
2003-2006           2003-2006 | 2004-2006 | 2005-2006 | 2006
63Practical exercise on journal impact measures
- Here we get to the second practical exercise of the afternoon.
- Consider the application of journal impact measures in evaluation procedures, and try to think of possible advantages/disadvantages.
- Discuss the results.
64- Topic: within the Netherlands, research
performance evaluation studies are performed on a
regular basis.
- The Dutch Pediatrics Society (NVKG) evaluated the
Dutch pediatric centers on the basis of mean IF
values related to their output.
- CWTS was asked to benchmark this method, in order
to indicate possible flaws of the used method.
65Results Case II: Different measures, different conclusions
Centers compared: AMC, WKZ, ERAS, AZL. Each entry gives output / score.

Output vs. IF
--------------------------------------------------
ENDOCRINOLOGY: 28 / 1.86; 46 / 2.23; 64 / 2.21; 23 / 1.22
GASTROENTEROL: 35 / 2.64
IMMUNOLOGY: 31 / 4.38; 82 / 3.24; 46 / 2.94
ONCOLOGY: 61 / 2.39; 42 / 2.11
METABOLISM: 178 / 2.36; 72 / 2.06; 23 / 1.45
--------------------------------------------------

Output vs. JFIS
--------------------------------------------------
ENDOCRINOLOGY: 28 / 0.71; 46 / 1.25; 64 / 1.25; 23 / 0.59
GASTROENTEROL: 35 / 1.35
IMMUNOLOGY: 31 / 1.78; 82 / 1.10; 46 / 1.10
ONCOLOGY: 61 / 1.23; 42 / 1.06
METABOLISM: 178 / 0.80; 72 / 0.87; 23 / 0.53
--------------------------------------------------

Output vs. Actual Field-Normalized Impact
--------------------------------------------------
ENDOCRINOLOGY: 28 / 0.38; 46 / 1.22; 64 / 1.25; 23 / 0.73
GASTROENTEROL: 35 / 0.98
IMMUNOLOGY: 31 / 1.22; 82 / 1.17; 46 / 0.97
ONCOLOGY: 61 / 1.40; 42 / 1.09
METABOLISM: 178 / 0.88; 72 / 0.88; 23 / 0.27
--------------------------------------------------
66International rankings of university performance
67Various academic rankings
- ARWU or Shanghai Ranking
- THES Ranking
- CHE Excellence Ranking
- CWTS University Ranking
68Composition of Shanghai ranking
- Nobel prizes and Fields Medals by alumni: 10%
- Nobel prizes and Fields Medals by staff: 20%
- Highly cited staff in 21 disciplines: 20%
- Articles published in Nature and Science: 20%
- Articles published in citation indexes: 20%
- Per capita performance on those indicators: 10%
69Composition of THES ranking
- Academic peer review: 40%
- Employer review: 10%
- Faculty/student ratio: 20%
- Citations per faculty: 20%
- International faculty: 5%
- International students: 5%
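To see how such weights drive a ranking, the sketch below combines indicator scores into one composite score. It assumes the weights listed above are percentages and that each indicator has already been normalized to a 0-100 scale and is combined linearly; the normalization step, the function name and the sample university are assumptions for illustration, not the published THES procedure.

```python
# Weights from the THES composition above, read as percentages.
THES_WEIGHTS = {
    "academic_peer_review": 0.40,
    "employer_review": 0.10,
    "faculty_student_ratio": 0.20,
    "citations_per_faculty": 0.20,
    "international_faculty": 0.05,
    "international_students": 0.05,
}


def composite_score(scores, weights):
    """Weighted sum of indicator scores; the weights must add up to 1."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(weights[name] * scores[name] for name in weights)


# Hypothetical university with indicator scores on a 0-100 scale.
example = {
    "academic_peer_review": 80,
    "employer_review": 60,
    "faculty_student_ratio": 70,
    "citations_per_faculty": 50,
    "international_faculty": 90,
    "international_students": 40,
}

print(composite_score(example, THES_WEIGHTS))
```

In this example the two survey-based indicators account for half of the composite score, while the only bibliometric indicator contributes a fifth.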
70Composition of CHE ranking
- Size indicator: output volume in citation indexes.
- Perception indicator: citations (in relation to an international standard).
- Beacon indicator: number of often-cited staff and Nobel prize winners at the university.
- Europe indicator: number of projects in the Marie Curie research promotion programme of the EU.
- No weights are indicated; rankings are applied on four fields (biology, chemistry, mathematics, physics), data delivered by CWTS.
71Important CWTS standard indicators
- CPP/JCSm: ratio between real, actual impact and mean journal impact.
- CPP/FCSm: ratio between real, actual impact and mean field impact.
- JCSm/FCSm: ratio between journal impact and field impact, indicative of the quality of the journal package in the field.
72Composition of CWTS ranking
- Yellow: ranking by size, the number of publications.
- Green: ranking by size-independent, field-normalized citation impact (Crown indicator).
- Orange: ranking by the size-dependent "brute force" impact indicator, the multiplication of P with the university's field-normalized average impact.
- Blue: ranking by the simple citations-per-publication indicator (CPP).
- Pink: ranking by the simple citations-per-publication indicator (CPP) for the top-50 ranking institutes on size!
73Practical exercise on rankings
- Here we get to the third practical exercise of the afternoon.
- Which disadvantages can you think of, when considering the indicators used in the various rankings?
- Discuss the results.
74The H-Index and its limitations
75The H-Index, defined as
- The H-Index is the highest rank position h, in a set of publications sorted by descending number of received citations, at which the number of citations received by the publication is still at least equal to its rank position.
- Idea of an American physicist, J. Hirsch, who published about this index in the Proc. NAS USA.
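The definition translates directly into a short routine: sort the citation counts in descending order and find the last rank at which the count is still at least as large as the rank. The function name and the sample citation counts below are hypothetical.

```python
def h_index(citation_counts):
    """Largest h such that h publications have at least h citations each."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank      # this publication still supports rank h
        else:
            break         # counts only get smaller from here on
    return h


# Two hypothetical oeuvres with the same number of papers.
print(h_index([10, 8, 5, 4, 3]))
print(h_index([25, 8, 5, 3, 3]))
```

The second oeuvre has more citations in total (44 versus 30) but a lower value, because one very highly cited paper adds nothing beyond its own rank.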
76Examples of Hirsch-index values
- Environmental biologist, output of 188 papers, cited 4,788 times in the period 1980-2004.
- Hirsch-index value of 31
- Clinical psychologist, output of 72 papers, cited 760 times in the period 1980-2004.
- Hirsch-index value of 14
77Problems with the H-Index
- For serious evaluation of scientific performance, the H-Index is not suitable as an indicator, as the index
- Is insensitive to field-specific characteristics (e.g., difference in citation cultures between medicine and other disciplines).
- Does not take into account age and career length of scientists; a small oeuvre necessarily leads to a low H-Index value.
78- Actual versus field-normalized impact (CPP/FCSm) displayed against the output.
- Large output can be combined with a relatively low impact.
79- H-Index displayed against the output.
- Larger output is strongly correlated with a high
H-Index value.
80Part III Methodological issues
81Journal Field Normalization
82Network of publications (nodes) linked by citations (edges)
[Figure: citation network with a lower citation-density region (e.g., applied research, social sciences) and a higher citation-density region (e.g., basic natural and medical research); FCSm, JCSm and CPP are marked as the values for normalization]
83Calculating the JCSm and FCSm
--------------------------------------------------------------------------------
      Type      Publ. year   Journal      Journal category   Citations until 1999
--------------------------------------------------------------------------------
I     review    1996         CANCER RES   Oncology           17
II    note      1997         J CLIN END   Endocrinology      4
III   article   1999         J CLIN END   Endocrinology      6
IV    article   1999         J CLIN END   Endocrinology      8
--------------------------------------------------------------------------------
84Calculating the JCSm and FCSm 2
-----------------------------------
      CPP     JCS     FCS
-----------------------------------
I     17      16.9    23.7
II    4       3.1     3.0
III   6       4.8     4.1
IV    8       4.8     4.1
-----------------------------------
85Practical exercise on calculating CWTS normalized indicators
- Here we get to the fourth practical exercise of the afternoon.
- How would one calculate the CWTS normalized indicators CPP/JCSm and CPP/FCSm?
- Show the outcomes and discuss the results.
86Calculating the JCSm FCSm 3
- The mean citation score is determined as
$$\mathrm{CPP} = \frac{17 + 4 + 6 + 8}{1 + 1 + 1 + 1} = 8.8$$
- The mean journal citation score as
$$\mathrm{JCSm} = \frac{(1 \times 16.9) + (1 \times 3.1) + (2 \times 4.8)}{1 + 1 + 2} = 7.4$$
$$\mathrm{CPP}/\mathrm{JCSm} = \frac{8.8}{7.4} = 1.19$$
- The mean field citation score as
$$\mathrm{FCSm} = \frac{(1 \times 23.7) + (1 \times 3.0) + (2 \times 4.1)}{1 + 1 + 2} = 8.7$$
$$\mathrm{CPP}/\mathrm{FCSm} = \frac{8.8}{8.7} = 1.01$$
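The same calculation can be sketched in code, using the four example publications and their JCS and FCS values from the preceding slides. The variable names are illustrative. The slide divides the rounded values (8.8, 7.4, 8.7); with unrounded intermediate values the ratios come out marginally lower, at about 1.18 and 1.00.

```python
# (label, citations, JCS, FCS) for the four example publications.
papers = [
    ("I", 17, 16.9, 23.7),
    ("II", 4, 3.1, 3.0),
    ("III", 6, 4.8, 4.1),
    ("IV", 8, 4.8, 4.1),
]


def mean(values):
    values = list(values)
    return sum(values) / len(values)


cpp = mean(cites for _, cites, _, _ in papers)   # actual citations per paper
jcsm = mean(jcs for _, _, jcs, _ in papers)      # expected from the journals
fcsm = mean(fcs for _, _, _, fcs in papers)      # expected from the fields

print(f"CPP = {cpp:.2f}, JCSm = {jcsm:.2f}, FCSm = {fcsm:.2f}")
print(f"CPP/JCSm = {cpp / jcsm:.2f}")
print(f"CPP/FCSm = {cpp / fcsm:.2f}")
print(f"JCSm/FCSm = {jcsm / fcsm:.2f}")
```

A ratio above 1 means the set is cited more often than the journal or field average; JCSm/FCSm below 1 indicates that the journals used are cited less than the field average.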
87Citation Windows and Impact Measurement
88Citation measurement and windows
- Publication years, fixed citation window.
- Publications of 1994, with three citation years (namely 1994, 1995, and 1996), followed by 1995, with three years, etc.
- Blocks of publication years with a window decreasing in length.
- Publications of 1994-1997, with citation windows of 4 years (1994-1997), 3 years (1995-1997), 2 years (1996-1997), and 1 year (1997).
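Both window types can be generated mechanically. The sketch below is an illustration with hypothetical function names: it maps each publication year to the citation years that are counted, and it assumes that a fixed window is simply cut off at the last year for which citation data exist.

```python
def fixed_windows(first_year, last_year, length=3):
    """Fixed window: each publication year gets `length` citation years,
    cut off at `last_year` when the data end earlier."""
    return {
        pub: list(range(pub, min(pub + length - 1, last_year) + 1))
        for pub in range(first_year, last_year + 1)
    }


def block_windows(block_start, block_end):
    """Year block: every publication year in the block is followed up to
    the end of the block, so the window shrinks for later years."""
    return {
        pub: list(range(pub, block_end + 1))
        for pub in range(block_start, block_end + 1)
    }


for pub, years in fixed_windows(1994, 2001).items():
    print("fixed", pub, years)

for pub, years in block_windows(1994, 1997).items():
    print("block", pub, years)
```

With a fixed window every publication year is measured over the same number of years where the data allow it, whereas in a block the most recent papers have had only one year to collect citations.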
89Citation measurement with fixed window
Citation years
        1994  1995  1996  1997  1998  1999  2000  2001
1994    1994  1995  1996
1995          1995  1996  1997
1996                1996  1997  1998
1997                      1997  1998  1999
1998                            1998  1999  2000
1999                                  1999  2000  2001
2000                                        2000  2001
2001                                              2001
90Citation measurement with year blocks
Citation years: 1994 1995 1996 1997 1998 1999 2000 2001
Publication block   Citation years per publication year in the block
1994-1997           1994-1997 | 1995-1997 | 1996-1997 | 1997
1995-1998           1995-1998 | 1996-1998 | 1997-1998 | 1998
1996-1999           1996-1999 | 1997-1999 | 1998-1999 | 1999
1997-2000           1997-2000 | 1998-2000 | 1999-2000 | 2000
1998-2001           1998-2001 | 1999-2001 | 2000-2001 | 2001
91Typology of studies
92Typology of studies
- We distinguish various levels of analysis:
- Macro-level, e.g. country comparison for the EU, Dutch Observatory of S&T
- Meso-level, e.g. disciplinary evaluation of physics research in the Netherlands
- Micro-level, e.g. analysis of research institutes, programmes, or groups
- Nano-level, e.g. analysis of individual researchers.
93Types of data collection
- We distinguish various types of data collection:
- Address based, e.g. country or institute comparisons. Here we select publications from the database starting from country or institute names.
- Author name based, e.g. various evaluation procedures. Here we select publications from the database; the result is verified by the researchers themselves.
- Publication list based, e.g. various evaluation procedures. For example, matching a list retrieved from Metis with the WoS.
94Methodology of studies
95Functions of applying bibliometrics
- Using bibliometrics as a diagnostic tool, we can distinguish two main functions:
- Mainly descriptive (e.g., on German medical sciences, but also CWTS benchmark studies).
- Mainly evaluative (e.g., studies for VSNU, the Dutch Association of Universities).
96Goals of applying bibliometrics
- Using bibliometrics to measure output and impact,
we can distinguish two main goals
- Gaining insight in the research potential of
entities or a complete organization.
- Gaining insight in the past performance of
entities or a complete organization.
97Choices and consequences
- The final choice for a certain approach or type
of study is mainly customer driven.
- Depending on the goals one wants to achieve
(evaluation of research (groups), description of
the scientific profile of an institute or
university, etc.), specific approaches fit the
raised questions.
98Models of bibliometric analysis
- In our analyses, we can roughly distinguish
between two different types of models
In the next section, we will further explain
these two different approaches.
99The building blocks of an organization
- University
- Laboratories / research groups
- Researchers (in these laboratories)
- Scientific publications
100Top Down Approach
CWTS compilation of address-based publication set
101Bottom Up Approach
Author verification of scientific publications
102Top Down Approach (modified)
CWTS compilation of address-based publication
set, verification by client
103Bottom Up Approach (modified)
Client compilation of verified set of
scientific publications
104Combining approach with goal
105Combining approach with function
106Possible results
- Often, bibliometric results combine the scores of an organization with the distribution over fields of scientific activity.
- A common misunderstanding is to link, mentally, the scores of a unit (that fits the name of the field) to that field.
107Research profile
- Profile of an academic medical center
- Impact at average level in three fields
108Differences between organizational units and
fields
Dept. of Chemistry
Dept. of Physics
C-IV
P-III
P-IV
C-III
P-I
C-I
C-II
P-II
Chemical Engineering
Physics
Chemistry
109Choices and consequences
- The crucial role of the verification of publication output is shown.
- Depending on the goals and functions related to the application of bibliometric analysis, a specific approach fits the raised questions best.
110Conclusions
- Within a Top Down approach, evaluation is excluded as an option.
- Within a Top Down approach, insight into research potential is only partial.
- The Bottom Up approach provides the most insight for evaluative purposes, on the level of both past performance and research potential.
111Practical exercise on methodology
- Here we get to the final practical exercise of the afternoon.
- Which advantages/disadvantages can you think of, when considering the two different bibliometric approaches, Top Down versus Bottom Up?
- Discuss the results.
112- End of the workshop
- For questions regarding the contents of the
workshop, mail to leeuwen_at_cwts.nl