Literature Extraction: Entities and Relations - PowerPoint PPT Presentation

1
Literature Extraction: Entities and Relations
  • ChengXiang (Cheng) Zhai
  • Department of Computer Science
  • Institute for Genomic Biology
  • Statistics
  • Graduate School of Library and Information Science
  • University of Illinois at Urbana-Champaign

ABC Workshop, UIUC, Dec. 5-6, 2007
2
Outline
  • General Background on Text Information Management
  • Information Extraction: Entities and Relations
  • Towards Automated Gene Annotation

3
Text Management Technologies
  • Access: Select information
  • Organization: Add structure/annotations
  • Mining: Create knowledge
4
Text Information Access
  • Search: Works well if you know exactly what you
    want (e.g., PubMed)
  • Navigation: More useful for exploring the information
    space or when you can't formulate a good query
  • Recommendation: Push information to a user

5
Text Information Organization
  • Summarization
    • Single document vs. multiple documents,
      unstructured vs. structured
    • Helps digest information
  • Categorization
    • Classify text into predefined categories (e.g.,
      different GO terms)
    • Adds structure to text; helps browsing and direct
      prediction
  • Clustering
    • Group similar text segments or articles into the
      same cluster
    • Helps reveal underlying structures

6
Text Mining
  • Information Extraction
    • Pulling out entities (e.g., genes and proteins)
      and relations (e.g., protein interactions)
    • Helps semantic analysis and inferences
  • Topic Extraction
    • Tease out subtopics/themes in text (e.g.,
      separate multiple subtopics in the same article)
    • Helps summarization and browsing
  • Inferences
    • Attempts to create new knowledge (e.g., Gene A
      has function F; Gene A and Gene B are similar ⇒
      Gene B has function F)
  • Pattern Discovery
    • Discover outliers, correlations/associations,
      clusters, trends, ... (e.g., gene A and gene B tend
      to occur in similar contexts)
    • Helps create new knowledge

7
Key Technique: Information Retrieval
  • Text/topic representation: Vector-space,
    probabilistic models
  • Term weighting: TF-IDF
  • Text matching (similarity): Vector/distribution
    similarity functions
  • User feedback (learning about what a user is
    interested in): machine learning
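
As a rough illustration of vector-space representation, TF-IDF weighting, and similarity matching, here is a minimal sketch. The whitespace tokenizer, the `log(N/df) + 1` IDF variant, and the function names are all assumptions made for this example; the slides do not specify a particular formula.

```python
import math
from collections import Counter

def build_idf(tokenized_docs):
    """Inverse document frequency per term (one common variant, assumed here)."""
    n_docs = len(tokenized_docs)
    df = Counter(term for tokens in tokenized_docs for term in set(tokens))
    return {term: math.log(n_docs / count) + 1.0 for term, count in df.items()}

def tfidf(tokens, idf):
    """Sparse TF-IDF vector; terms unseen in the collection are dropped."""
    tf = Counter(tokens)
    return {term: count * idf[term] for term, count in tf.items() if term in idf}

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def rank(query, docs):
    """Return (score, document index) pairs, best match first."""
    tokenized = [doc.lower().split() for doc in docs]
    idf = build_idf(tokenized)
    query_vec = tfidf(query.lower().split(), idf)
    scores = [(cosine(query_vec, tfidf(tokens, idf)), i)
              for i, tokens in enumerate(tokenized)]
    return sorted(scores, reverse=True)
```

Calling `rank("protein interaction", docs)` on a small list of strings returns the documents ordered by similarity to the query; a production system would add proper tokenization, stemming, and a tuned weighting scheme.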

8
Key Technique: Machine Learning
  • Computation: Y = F(X)
    • Knowledge-based: Specify a recipe (program)
    • Data-driven: Provide examples of (X,Y) pairs
  • Basic setup
    • Given training data (many (X,Y) pairs)
    • Assume some kind of functional relation between X
      and Y, often with parameters
    • Fit the function to the training data to set the
      parameters
    • Hope the learned function can compute Y
      for new X
  • Generally requires many training examples
    (supervised learning)
  • May also work without training examples
    (clustering)
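
The basic setup above can be sketched with the simplest possible case: assume Y = a·X + b, fit a and b to training pairs by least squares, then apply the fitted function to a new X. The linear form and the function names are assumptions chosen for illustration only.

```python
def fit_line(pairs):
    """Least-squares fit of y = a*x + b to a list of (x, y) training pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    a = sxy / sxx              # slope; requires at least two distinct x values
    b = mean_y - a * mean_x    # intercept
    return a, b

def predict(params, x):
    """Apply the learned function to a new input."""
    a, b = params
    return a * x + b

# Toy training data (made up for this example).
training = [(0, 1), (1, 3), (2, 5), (3, 7)]
params = fit_line(training)
```

After fitting, `predict(params, x)` computes Y for an unseen X. Real text-mining models have many more parameters, but the fit-then-predict pattern is the same.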

9
Key Technique: Natural Language Processing
  • Basic Tasks
    • Part-of-speech tagging (recognizing nouns, verbs,
      ...)
    • Syntactic parsing (recognizing sentence
      structure)
    • Semantic analysis (trying to get the meaning)
  • State-of-the-art methods rely on machine learning
    supplemented by some limited linguistic knowledge
  • Generally a very difficult task, but easier for a
    specific domain such as biomedical literature

10
Massive Entity Recognition
  • Class 1: Small Variation (Dictionary/Ontology)
    • Organism, Anatomy, Biological Process, Pathway,
      Protein Family
  • Class 2: Medium Variation
    • Gene, cis Regulatory Element
  • Class 3: Large Variation
    • Phenotype, Behavior
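
For the small-variation class, a dictionary lookup is often a reasonable starting point. The sketch below does greedy longest-match tagging against a lexicon; the lexicon entries, the regex tokenizer, and the function name are hypothetical and stand in for a real ontology.

```python
import re

def tag_entities(text, lexicon):
    """Greedy longest-match lookup of lexicon phrases in text.

    lexicon maps a lowercase phrase to an entity class.
    Returns (matched text, entity class) pairs in order of appearance.
    """
    tokens = re.findall(r"\w+", text)
    max_len = max(len(phrase.split()) for phrase in lexicon)
    found, i = [], 0
    while i < len(tokens):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            label = lexicon.get(phrase.lower())
            if label:
                found.append((phrase, label))
                i += n
                break
        else:
            i += 1  # no phrase starts at this token
    return found

# Hypothetical mini-lexicon for illustration.
LEXICON = {
    "apis mellifera": "Organism",
    "mushroom body": "Anatomy",
    "olfactory learning": "Biological Process",
}
```

Medium- and large-variation classes such as genes and behaviors would usually need a learned tagger rather than a fixed dictionary, since their surface forms vary too much to enumerate.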

11
Massive Relation Extraction
  • Expression Location
    • the expression of a gene in some location
      (tissues, body parts)
  • Homology/Orthology
    • one gene is homologous to another gene
  • Biological process
    • one gene has some role in a biological process
  • Genetic/Physical/Regulatory Interaction
    • one gene interacts with another gene in a certain
      fashion (3 types of relations)
    • a simple case: Protein-Protein Interaction (PPI)
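
One very simple way to approach the PPI case is a trigger-word pattern: report a relation when two known protein names occur in the same sentence with an interaction verb between them. This is only a sketch under strong assumptions; the trigger list, the protein names, and the function name are invented for the example, and it is not presented as the method used in the talk.

```python
import re
from itertools import combinations

# Hypothetical trigger words signalling an interaction.
TRIGGERS = {"interacts", "binds", "phosphorylates"}

def extract_ppi(text, proteins):
    """Return (protein, trigger, protein) triples found within single sentences."""
    relations = []
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        tokens = re.findall(r"[\w-]+", sentence)
        mentions = [(i, tok) for i, tok in enumerate(tokens) if tok in proteins]
        for (i, first), (j, second) in combinations(mentions, 2):
            # Look for a trigger word strictly between the two mentions.
            between = {tok.lower() for tok in tokens[i + 1:j]}
            hits = between & TRIGGERS
            if first != second and hits:
                relations.append((first, sorted(hits)[0], second))
    return relations
```

For instance, with `proteins = {"ProtA", "ProtB", "ProtC"}`, the sentence "We show that ProtA binds ProtB in vivo." yields one triple, while "ProtA and ProtC were both expressed." yields none. Such patterns miss negation and passive constructions, which is why learned extractors are generally preferred.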

12
Generating New Knowledge
  • Entity Relation Graph Mining
  • Logic-based Inferences

13
Example of Interactive Graph Mining
[Figure: entity-relation graph. Nodes: Behaviors B1, B2, B3, B4; Genes A1, A2, A3, A4, A5 (the labels A1 and A4 each appear twice). Edge labels: isa, Co-occur-fly, Co-occur-bee, Co-occur-mos, Orth-mos, orth, Reg.]
  • 1. X = NeighborOf(B4, Behavior, {co-occur, isa})
    → {B1, B2, B3}
  • 2. Y = NeighborOf(X, Gene, {co-occur, orth})
    → {A1, A1', A2, A3}
  • 3. Y = Y ∪ {A5, A6} → {A1, A1', A2, A3, A5, A6}
  • 4. Z = NeighborOf(Y, Gene, {reg}) → {A4, A4'}
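
The NeighborOf steps above can be sketched as a single function over a typed edge list. The toy graph below is hypothetical (the edges of the slide's figure are not recoverable from the transcript), edges are treated as undirected, and the function signature is an assumption.

```python
def neighbor_of(edges, node_types, sources, target_type, relations):
    """Nodes of target_type linked to any source node by one of the relations.

    edges: iterable of (node, relation, node) triples, treated as undirected.
    node_types: dict mapping each node to its entity type.
    """
    sources = set(sources)
    result = set()
    for u, rel, v in edges:
        if rel not in relations:
            continue
        for a, b in ((u, v), (v, u)):
            if a in sources and node_types.get(b) == target_type:
                result.add(b)
    return result

# Hypothetical toy graph, not the one drawn on the slide.
EDGES = [
    ("B4", "isa", "B1"), ("B4", "co-occur", "B2"),
    ("B1", "co-occur", "A1"), ("B2", "co-occur", "A2"),
    ("A1", "reg", "A4"), ("A2", "reg", "A4"), ("A3", "reg", "A5"),
]
TYPES = {n: ("Behavior" if n.startswith("B") else "Gene")
         for edge in EDGES for n in (edge[0], edge[2])}

x = neighbor_of(EDGES, TYPES, {"B4"}, "Behavior", {"co-occur", "isa"})
y = neighbor_of(EDGES, TYPES, x, "Gene", {"co-occur", "orth"})
z = neighbor_of(EDGES, TYPES, y, "Gene", {"reg"})
```

Because each call returns a plain set, a user can edit the intermediate result between steps (as in step 3) before passing it to the next query.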

14
Logic-Based Discovery
  • Encode all kinds of knowledge in the same
    knowledge representation language
  • Perform logic inferences
  • Example
    • Regulate(GeneA, GeneB, ContextC). (Literature
      mining)
    • SeqSimilar(GeneA', GeneA). (Sequence mining)
    • Regulate(X,Y,C) ← Regulate(Z,Y,C) ∧
      SeqSimilar(X,Z). (Human knowledge)
    • ⇒ Regulate(GeneA', GeneB, ContextC)
    • ADD InPathway(GeneB, P1)
    • InPathway(X,P) ← Regulate(X,Y,C) ∧ InPathway(Y,P).
      (Human knowledge)
    • ⇒ InvolvedInPathway(GeneA', P1)
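
The inference chain in this example can be reproduced with a small forward-chaining loop. This is a sketch under assumptions: facts are stored as tuples, the two rules are hard-coded rather than parsed from a rule language, the second sequence-similar gene is written GeneA' to keep the two genes distinct, and the pathway conclusion uses the InPathway predicate from the rule.

```python
def infer(initial_facts):
    """Apply the two rules repeatedly until no new fact is derived."""
    facts = set(initial_facts)
    while True:
        new = set()
        for fact in facts:
            if fact[0] != "Regulate":
                continue
            _, regulator, target, context = fact
            for other in facts:
                # Rule 1: Regulate(X,Y,C) <- Regulate(Z,Y,C) and SeqSimilar(X,Z)
                if other[0] == "SeqSimilar" and other[2] == regulator:
                    new.add(("Regulate", other[1], target, context))
                # Rule 2: InPathway(X,P) <- Regulate(X,Y,C) and InPathway(Y,P)
                if other[0] == "InPathway" and other[1] == target:
                    new.add(("InPathway", regulator, other[2]))
        if new <= facts:
            return facts
        facts |= new

FACTS = {
    ("Regulate", "GeneA", "GeneB", "ContextC"),  # from literature mining
    ("SeqSimilar", "GeneA'", "GeneA"),           # from sequence mining
    ("InPathway", "GeneB", "P1"),                # added fact
}
derived = infer(FACTS)
```

The loop stops once a pass derives nothing new, so the returned set contains every fact reachable from the starting facts under these two rules.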

15
Towards Automated Annotation

[Figure: annotation workflow connecting Genome (gene names Name 1, Name 2, ..., Name k), Literature, Relevant Text, Entities and Relations, and Gene Ontology (Term 1, Term 2, ..., Term n) through seven numbered steps]
  • 1. Literature Search
  • 2. Gene Summarization
  • 3. GO Term Summarization
  • 4. Gene-Term Association
  • 5. New Term Suggestion
  • 6. Relation Retrieval
  • 7. Inferences
16
Annotation Support Technologies
  • Level 1: Literature Search
    • document filtering (locate relevant articles)
    • relevant passage retrieval (locate relevant
      passages)
  • Level 2: Semi-automatic annotation
    • Gene summarization
    • GO term summarization (profiling)
    • Gene-Term association analysis
    • New GO term suggestion
  • Level 3: Automatic annotation (Gene → GO)
    • Relation retrieval (direct mentioning)
    • Inferences/Relation mining (inferred knowledge)

17
Thank You!