1
Semantic Analytics on Social Networks
Experiences in Addressing the Problem of Conflict
of Interest Detection
  • Boanerges Aleman-Meza, Meenakshi Nagarajan,
    Cartic Ramakrishnan, Li Ding, Pranam Kolari, Amit
    P. Sheth, I. Budak Arpinar, Anupam Joshi, Tim
    Finin
  • LSDIS Lab, Dept. of C.S., Univ. of Georgia
  • Dept. of C.S.E.E., Univ. of Maryland
  • WWW 2006
  • (Nominated for Best Paper Award)

2
Outline
  • Introduction.
  • Motivation and Background.
  • Integration of Two Social Networks.
  • COI Detection.
  • Conclusions and Future Work.

3
Introduction
  • In this paper, we describe a Semantic Web
    application that detects Conflict of Interest
    (COI) relationships among potential reviewers and
    authors of scientific papers.
  • COI typically refers to a situation that may
    bias a decision.
  • It can be caused by a variety of factors such as
    family ties, business or friendship ties.
  • Detecting COI is necessary in many situations,
    such as the peer review of scientific research
    papers or proposals, to ensure impartial
    decisions.

4
Introduction (cont.)
  • In some cases, it can be difficult to detect COI
    because of the lack of available information.
  • However, there exists implicit and/or explicit
    information in the form of social networks.
  • LinkedIn.com, comprising people from information
    technology areas, could be used to detect COI in
    situations such as IPOs or company acquisitions.
  • Other examples include MySpace.com, Friendster,
    and Hi5.
  • Social network applications matter not only
    because of their millions of users but also
    because of the millions of dollars they are
    worth.

5
Introduction (cont.)
  • Although social networks can provide data to
    detect COI, one important problem lies in the
    lack of integration among sites hosting them.
  • Moreover, privacy concerns prevent such sites
    from openly sharing their data.
  • We chose publicly available social network data
    to address the challenge of COI detection (in the
    context of peer review of papers).
  • The DBLP bibliography provides collaboration
    network data by virtue of the explicit co-author
    relationship among authors.
  • We used a multitude of FOAF (Friend-Of-A-Friend)
    documents from the Web in which the foaf:knows
    relationship is explicitly stated (to construct a
    social network).

6
Introduction (cont.)
  • We create a populated ontology (network) by
    integrating entities and relationships from the
    above two social networks.
  • Challenges
  • Entity disambiguation
  • DBLP has different entries that in the real world
    refer to the same person.
  • R. Guha and Ramanathan V. Guha
  • A fundamental challenge in developing Semantic
    Web applications involving heterogeneous,
    real-world data.
  • The ontology is used to determine a degree of COI
    between the reviewers and authors.

7
Conflict of Interest Detection Problem
  • COI situations should be identified to produce
    impartial decisions.
  • Many organizations have strict definitions of
    what constitutes a COI.
  • The NIH (National Institutes of Health) defines
    COI in the context of the grant review process
    as follows:
  • A Conflict Of Interest (COI) in scientific peer
    review exists when a reviewer has an interest in
    a grant or cooperative agreement application or
    an R&D contract proposal that is likely to bias
    his or her evaluation of it. A reviewer who has a
    real conflict of interest with an application or
    proposal may not participate in its review.

8
Conflict of Interest Detection Problem (cont.)
  • One major cause for bias is professional or
    social relationships between potential reviewers
    and authors of the material to be reviewed.
  • In this paper, we address the problem of COI
    detection in the context of peer-review
    processes.
  • We believe that the techniques presented here are
    applicable for COI detection in other scenarios
    as well.

9
The Peer-Review Process
  • The process is commonly supported by
    semi-automated tools, such as conference
    management systems.
  • Typically, one person designated as Program
    Committee (PC) chair, is in charge of the proper
    assignment of papers to be reviewed by PC members
    of a conference.
  • Assigning papers to reviewers is probably one of
    the most challenging tasks for the Chair.
  • Conference management systems support this task
    by relying on reviewers specifying their
    expertise and/or bidding on papers.
  • They also allow the Chair to modify the
    assignments.
  • The key is to ensure that there are qualified
    reviewers for a paper and that they will not have
    an a priori bias for or against the paper.

10
The Peer-Review Process (cont.)
  • The assignments rely on the Chair's knowledge of
    any particularly strong social relationships
    that might cause COIs.
  • However, due to the proliferation of
    interdisciplinary research, the Chair cannot be
    expected to keep up with the ever-changing
    landscape of collaborative relationships among
    researchers.
  • Hence, conference management systems need to help
    the Chair with the detection of COIs.

11
The Peer-Review Process (cont.)
  • Contemporary conference management systems
    support COI detection in different manners.
  • EDAS checks for COIs based on declarations of
    possible conflicts by the PC members.
  • Microsoft Research's CMT tool allows authors to
    indicate COIs with reviewers.
  • Confious automatically detects these COIs based
    mainly on similar emails or co-authorship
    criteria.
  • The co-authorship criterion identifies users
    that have co-authored at least one paper in the
    past.
  • These straightforward criteria can miss COIs, as
    exemplified by one recent case (??).
  • The co-author in question has a hyphenated last
    name (??).

12
Online Social Networks
  • A social network is a set of people connected by
    a set of social relationships.
  • Friendship, co-working, etc.
  • The entity Person is the fundamental concept in
    online social networks.
  • Each person is described by one or several
    properties.
  • Name, ID, etc.
  • Different sources might use different sets of
    properties.
  • Such heterogeneous contexts and entity
    identifiers necessitate entity disambiguation.

13
Online Social Networks (cont.)
  • A link is another important concept in social
    networks.
  • Some sources directly provide links among person
    entities.
  • foaf:knows
  • Other sources provide links via metadata.
  • Co-author metadata in DBLP

14
Social Networks Analysis
  • Focus on the analysis of patterns of
    relationships among social entities (e.g.,
    people).
  • Analysis of networks of criminals.
  • Finding influential individuals.
  • Our work is fundamentally different from these
    previous approaches in that
  • It aims to develop and test an approach in
    integrating two social networks.
  • And using semantic association discovery
    techniques for identification of COI
    relationships.

15
Integration of Two Social Networks
  • We bring together a semi-structured semantic
    social network (FOAF) with a structured social
    network extracted from the underlying
    co-authorship network in DBLP.
  • They were chosen because they
  • Are representative.
  • Refer to real-world persons.
  • Are publicly available.
  • We describe the challenges of entity
    disambiguation that have to be addressed to merge
    entities across these sources that refer to the
    same real-world person.

16
FOAF and DBLP
  • FOAF
  • The Friend of a Friend (FOAF) data source is
    created independently by many authors to publish
    information about themselves and their social
    relationships.
  • foaf:name.
  • foaf:knows.
  • DBLP
  • A fixed structure to identify persons by their
    names.
  • Persons are associated by co-author relationships
    (fixed structure).

17
FOAF and DBLP (cont.)
18
Data Cleaning
  • We started with the set of authors of papers in
    the International Semantic Web Conferences (2004
    and earlier) and the program committee members of
    these conferences.
  • These people and their friends are likely to
    publish their personal profiles in FOAF, and their
    names usually also appear in DBLP.

19
Data Cleaning (cont.)
  • DBLP-SW
  • We collected 38,027 person entities within three
    hops of social distance from the persons in
    Semantic Web (SW) conferences.
  • FOAF-EDU
  • We first used the value of foaf:name to perform
    data cleaning (e.g., for names containing special
    characters).
  • We then identified persons whose FOAF documents
    reside on .edu websites.
  • 21,308 persons.
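The FOAF-EDU selection step can be pictured as a simple filter. This is a minimal sketch under stated assumptions, not the authors' pipeline: the input is a hypothetical list of (foaf:name, document URL) records, "cleaning" is reduced to whitespace normalization and dropping empty names (the exact special-character rules are not recoverable from the slide), and a document counts as .edu when its URL host ends in `.edu`.

```python
from typing import Iterable, List, Tuple
from urllib.parse import urlparse

Record = Tuple[str, str]  # hypothetical: (foaf:name value, FOAF document URL)


def is_edu_document(doc_url: str) -> bool:
    """True if the document's host ends with '.edu' (urlparse lowercases hostname)."""
    host = urlparse(doc_url).hostname or ""
    return host.endswith(".edu")


def foaf_edu_persons(records: Iterable[Record]) -> List[Record]:
    """Keep records with a non-empty, whitespace-normalized name on a .edu host."""
    kept = []
    for name, doc_url in records:
        clean = " ".join(name.split())  # stand-in for the real name cleaning
        if clean and is_edu_document(doc_url):
            kept.append((clean, doc_url))
    return kept
```

Checking the host rather than searching the whole URL for "edu" avoids false hits such as `education.example.com`.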

20
Entity Disambiguation
  • The goal is to find entities that might have
    multiple references in DBLP and/or FOAF that
    refer to the same entity in real-life.
  • We adapted a name-reconciliation algorithm
    (SIGMOD 2005).
  • It employs a rigorous form of semantic similarity
    defined as a combination of the similarity
    between attributes.
  • The similarity of their names and affiliations.
  • The number of common co-author relationships.
  • Weights are manually assigned to the attributes
    based on aspects such as the number of entities
    that have values for certain attributes.
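A weighted combination of attribute similarities can be sketched as below. This is an illustration only, not the SIGMOD 2005 algorithm: the `Person` record, the string-similarity measure (`difflib.SequenceMatcher`), the co-author overlap normalization, and the `WEIGHTS` values are all assumptions. Attributes missing on either side are skipped and the remaining weights are renormalized.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher
from typing import Dict, Optional, Set

# Hypothetical weights; the slides only say weights were assigned manually.
WEIGHTS = {"name": 0.4, "affiliation": 0.2, "coauthors": 0.4}


@dataclass
class Person:
    name: str
    affiliation: Optional[str] = None
    coauthors: Set[str] = field(default_factory=set)


def attribute_scores(p: Person, q: Person) -> Dict[str, float]:
    """Per-attribute similarity in [0, 1]; attributes missing on either side are skipped."""
    scores = {"name": SequenceMatcher(None, p.name.lower(), q.name.lower()).ratio()}
    if p.affiliation and q.affiliation:
        scores["affiliation"] = float(p.affiliation.lower() == q.affiliation.lower())
    if p.coauthors and q.coauthors:
        common = len(p.coauthors & q.coauthors)  # number of shared co-authors
        scores["coauthors"] = common / min(len(p.coauthors), len(q.coauthors))
    return scores


def similarity(p: Person, q: Person) -> float:
    """Weighted average of whichever attribute scores are available."""
    scores = attribute_scores(p, q)
    total_weight = sum(WEIGHTS[k] for k in scores)
    return sum(WEIGHTS[k] * s for k, s in scores.items()) / total_weight
```

With these made-up weights, "J. Smith" and "John A. Smith" at the same affiliation with overlapping co-authors score about 0.9, even though the names alone only partially match.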

21
Entity Disambiguation (cont.)
  • Through several experiments, we found these
    weights and merge thresholds to be quite
    effective.
  • The output of the disambiguation algorithm
    populates two result sets
  • A sameAs set.
  • Entity pairs identified as the same entity.
  • 633 pairs.
  • An ambiguous set.
  • Entity pairs having a good probability of being
    the same but without sufficient information to be
    reconciled with certainty.
  • 6,347 pairs.
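Splitting scored candidate pairs into the two result sets can be sketched with two thresholds. The threshold values and the input shape (pairs already carrying a similarity score) are hypothetical; the slides do not give the actual merge thresholds.

```python
from typing import Iterable, List, Tuple

Pair = Tuple[str, str]


def partition_pairs(
    scored_pairs: Iterable[Tuple[Pair, float]],
    merge_threshold: float = 0.9,      # hypothetical value
    ambiguous_threshold: float = 0.7,  # hypothetical value
) -> Tuple[List[Pair], List[Pair]]:
    """Split candidate pairs into a sameAs set and an ambiguous set.

    Pairs scoring below ambiguous_threshold are treated as distinct
    entities and appear in neither result set.
    """
    if ambiguous_threshold > merge_threshold:
        raise ValueError("ambiguous_threshold must not exceed merge_threshold")
    same_as: List[Pair] = []
    ambiguous: List[Pair] = []
    for pair, score in scored_pairs:
        if score >= merge_threshold:
            same_as.append(pair)
        elif score >= ambiguous_threshold:
            ambiguous.append(pair)
    return same_as, ambiguous
```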

22
Entity Disambiguation (cont.)
  • The lack of a gold standard prevented us from
    using precision and recall metrics.
  • We measured statistics of false positives and
    false negatives by manually inspecting random
    samples of entity pairs from both the sameAs set
    and the ambiguous set.
  • We picked 6 random samples, each having 50 entity
    pairs.
  • We counted false positives in the sameAs set.
  • We counted false negatives in the ambiguous set.
  • The false negatives in any ambiguous set will be
    between 2.8 and 7.8.
  • The false positives will be between 0.3 and 0.9.

23
Entity Disambiguation (cont.)
  • Reasons for false negatives
  • Entity pairs under comparison had a high
    similarity in atomic attribute values but had
    very few association attribute matches.
  • A pair of entities had very few attributes for
    comparison, but had a high match in the most
    semantically relevant attributes.
  • Their similarity was not high enough to exceed
    the threshold for reconciliation.

24
Entity Disambiguation (cont.)
  • Based on experiments, we concluded that altering
    the weights and thresholds alone did not improve
    the results.
  • The task of entity disambiguation is very
    difficult.

25
COI Detection
  • Levels of COI
  • Definite COI
  • a reviewer is one of the listed authors in a
    paper to be reviewed.
  • High potential COI
  • the existence of close or strong relationships
    between an author of a submitted paper and a
    reviewer.
  • Medium potential COI
  • a reviewer and an author of a paper to be
    reviewed have close relationships with a third
    party.
  • E.g., the same PhD advisor.
  • Low potential COI
  • Weak or distant relationships between a reviewer
    and an author.
  • Can be ignored.

26
COI Detection (cont.)
  • Weighting Relationships for COI Detection
  • The foaf:knows relationship is used to
    explicitly list the persons known to someone.
  • The assertion (A knows B) is usually subjective
    and imperfect.
  • We assigned a weight of 0.5 to all 34,824
    foaf:knows relationships in the FOAF-EDU dataset.
  • The second type of relationship is the co-author
    relationship.
  • A good indicator for collaboration among authors.
  • We used the ratio of the number of co-authored
    publications to the author's total number of
    publications as the weight for the co-author
    relationship.
  • The weight is therefore asymmetric.
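The two weighting rules can be sketched as follows. The input format (each paper as a list of author names) is an assumption; the 0.5 constant and the co-authored/total ratio come from the slide.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, List, Tuple

FOAF_KNOWS_WEIGHT = 0.5  # the slide assigns 0.5 to every foaf:knows edge


def coauthor_weights(papers: Iterable[List[str]]) -> Dict[Tuple[str, str], float]:
    """Map (a, b) to (#papers co-authored by a and b) / (#papers by a).

    (a, b) and (b, a) share a numerator but are normalized by each
    author's own publication count, so the weight is asymmetric.
    """
    total: Counter = Counter()
    joint: Counter = Counter()
    for authors in papers:
        unique = set(authors)  # ignore duplicate author entries on one paper
        total.update(unique)
        for a, b in combinations(sorted(unique), 2):
            joint[(a, b)] += 1
            joint[(b, a)] += 1
    return {(a, b): n / total[a] for (a, b), n in joint.items()}
```

For example, if A has four papers and only one of them is with B, the A-to-B weight is 0.25, while B, whose only paper is that one, gets a B-to-A weight of 1.0.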

27
COI Detection (cont.)
  • Detection of COI
  • Analyze how two persons are connected by direct
    relationships or through sequences of
    relationships.
  • Our previous work on discovery of semantic
    associations is directly applicable for COI
    detection.
  • A semantic association is basically a path (of
    relationships) between entities.
  • We find semantic associations containing up to 3
    relationships.
  • For two entities (one reviewer, one author), we
    find all semantic associations (paths) between
    them.

[Figure: a semantic association between a reviewer (r) and an author (a)]
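Finding all semantic associations of up to three relationships between a reviewer and an author amounts to enumerating bounded-length simple paths. The sketch below uses a depth-first search over a hypothetical adjacency-list graph; it is not the authors' semantic association discovery engine. Symmetric relationships such as co-authorship must be added in both directions by the caller.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str, float]       # (neighbor, relation, weight)
Step = Tuple[str, str, str, float]  # (from, relation, to, weight)
Graph = Dict[str, List[Edge]]


def semantic_associations(graph: Graph, source: str, target: str,
                          max_len: int = 3) -> List[List[Step]]:
    """Return every simple path from source to target with at most max_len relationships."""
    results: List[List[Step]] = []

    def dfs(node: str, path: List[Step], visited: Set[str]) -> None:
        for neighbor, relation, weight in graph.get(node, []):
            if neighbor in visited:
                continue  # keep paths simple: no person appears twice
            step = path + [(node, relation, neighbor, weight)]
            if neighbor == target:
                results.append(step)
            elif len(step) < max_len:
                dfs(neighbor, step, visited | {neighbor})

    dfs(source, [], {source})
    return results
```

Each returned path keeps its relationship names and weights, which is what the level rules on the following slides need.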
28
COI Detection (cont.)
  • The following cases are considered
  • Reviewer and author are directly related (through
    foaf:knows and/or co-author).
  • High potential COI for at least one relationship
    having weight > 0.3.
  • Medium potential COI for at least one
    relationship having weight between 0.1 and 0.3.
  • Low potential COI for at least one relationship
    having weight < 0.1.
  • Reviewer and author are not directly related but
    they are directly related to (at least) one
    common person (intermediary).
  • Medium potential COI
  • There are many (i.e., 10) such intermediaries in
    common.
  • The relationships connecting to the intermediary
    have weight > 0.3.
  • Low potential COI
  • Other cases

[Figure: reviewer (r) and author (a) connected through an intermediary (i)]
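The rules for the direct and single-intermediary cases can be sketched as two small functions. Several details are assumptions: a weight of exactly 0.3 is treated as Medium (the slide's interval bracket is partly garbled), "many" intermediaries means 10 or more, the two Medium conditions for intermediaries are treated as alternatives, and "weight > 0.3" is required on both relationships through the intermediary.

```python
from typing import Iterable, Sequence, Tuple

HIGH, MEDIUM, LOW = "High", "Medium", "Low"


def level_direct(weights: Iterable[float]) -> str:
    """COI level when reviewer and author share at least one direct relationship."""
    ws = list(weights)
    if not ws:
        raise ValueError("expected at least one direct relationship weight")
    if any(w > 0.3 for w in ws):
        return HIGH
    if any(w >= 0.1 for w in ws):  # assumption: 0.1 <= w <= 0.3 is Medium
        return MEDIUM
    return LOW


def level_via_intermediaries(links: Sequence[Tuple[float, float]],
                             many: int = 10, strong: float = 0.3) -> str:
    """COI level when reviewer and author are linked only through common persons.

    links holds one (reviewer->intermediary, intermediary->author) weight
    pair per common intermediary. Assumption: either condition alone
    yields Medium.
    """
    if len(links) >= many:
        return MEDIUM
    if any(w1 > strong and w2 > strong for w1, w2 in links):
        return MEDIUM
    return LOW
```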
29
COI Detection (cont.)
  • Reviewer and author are indirectly related
    through a semantic association containing three
    relationships.
  • The collaborators (or friends) of the reviewer
    and author have some tie.
  • The level is Low potential COI.

[Figure: reviewer (r) and author (a) connected through two intermediaries (i)]
30
Experimental Results
  • We selected a subset of papers and reviewers from
    the 2004 International World Wide Web Conference.
  • The scenario included a subset of 15 PC members
    of the Semantic Web Track and 10 of the accepted
    papers having topics related to such track.
  • We compared our application with the COI
    detection of the Confious conference management
    system.
  • Confious uses first and last names to identify
    at least one past co-authored paper between a
    reviewer and an author.
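To illustrate why name-based co-authorship checks can miss COIs, here is a hedged sketch. `first_last` is a crude stand-in for matching on first and last names, not Confious's actual code; the names and the `same_as` mapping are hypothetical, standing in for pairs produced by entity disambiguation.

```python
from typing import Dict, Iterable, List, Tuple


def first_last(name: str) -> Tuple[str, str]:
    """Approximate a first/last-name key, e.g. 'J. Smith' -> ('j', 'smith')."""
    parts = name.replace(".", " ").lower().split()
    return parts[0], parts[-1]


def coauthored_by_name(reviewer: str, author: str,
                       papers: Iterable[List[str]]) -> bool:
    """Name-key matching only: variant spellings of one person stay unlinked."""
    r, a = first_last(reviewer), first_last(author)
    for authors in papers:
        keys = {first_last(n) for n in authors}
        if r in keys and a in keys:
            return True
    return False


def coauthored_reconciled(reviewer: str, author: str,
                          papers: Iterable[List[str]],
                          same_as: Dict[str, str]) -> bool:
    """Same check after mapping each name to a canonical entity via sameAs pairs."""
    def canon(name: str) -> str:
        return same_as.get(name, name)

    r, a = canon(reviewer), canon(author)
    for authors in papers:
        entities = {canon(n) for n in authors}
        if r in entities and a in entities:
            return True
    return False
```

If a paper lists "J. Smith" but the reviewer is registered as "John A. Smith", the name-key check finds no co-authorship, while the reconciled check does.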

31
Experimental Results (cont.)
32
Experimental Results (cont.)
  • Confious misses COI situations that our
    application detects, because our approach
    reconciles ambiguous entities in DBLP.
  • Our approach provides detailed information such
    as the level of potential COI as well as the
    cause.
  • The results of our approach are enhanced by the
    relationships coming from the FOAF social
    network.
  • However, in the cases of Table 6, there was no
    situation in which two persons had a foaf:knows
    relationship without also having a co-author
    relationship.

33
Experimental Results (cont.)
  • We manually verified the COI assessments in Table
    6.
  • Our approach's assessments held in most cases;
    only very few cases did not.
  • False negatives caused by lack of information.
  • Outdated FOAF documents.
  • Co-editing rather than co-authorship.
  • We believe that the assessment should still be
    that of low level of potential COI.

34
Conclusions and Future Work
  • We described how our approach to COI detection
    is based on semantic technologies and provided an
    evaluation of its applicability using a social
    network integrated from the FOAF social network
    and the DBLP co-authorship network.
  • We provided details on how these networks were
    integrated.
  • A demo of the application is available at
    lsdis.cs.uga.edu/projects/semdis/coi.