BeeSpace: An Interactive Environment for Functional Analysis of Social Behavior

1
BeeSpace: An Interactive Environment for
Functional Analysis of Social Behavior
  • Bruce Schatz
  • Institute for Genomic Biology
  • University of Illinois at Urbana-Champaign
  • www.beespace.uiuc.edu
  • Session Keynote Talk, 5th Symposium on Understanding Complex Systems
  • University of Illinois May 18, 2005

2
for Social Beehavior
3
Complex Systems I
  • Understanding Social Behavior
  • Honey Bees have only 1 million neurons
  • Yet a Worker Bee exhibits Social Behavior!
  • She forages when she is not hungry, but the Hive is
  • She fights when she is not threatened, but the Hive is

4
for Functional Analysis
5
Complex Systems II
  • Understanding Functional Analysis
  • Emergent Properties of Social Behavior can only be
    discovered via the Emergent Properties of Distributed Systems
  • The Interspace is the next generation of the Net,
    where Concept Navigation across Distributed
    Communities is routine

6
Analysis Environments For Functional Genomics
Bruce R. Schatz
CANIS Laboratory (Community Architectures for Network Information Systems)
University of Illinois at Urbana-Champaign
schatz@uiuc.edu, www.canis.uiuc.edu
7
What are Analysis Environments?
  • Functional Analysis
  • Find the underlying Mechanisms of Genes, Behaviors, Diseases
  • Comparative Analysis
  • Top-down data mining (vs. Bottom-up)
  • Multiple Sources, especially literature

8
Building Analysis Environments
  • Manual by Humans
  • Interaction: user navigation
  • Classification: collection indexing
  • Automatic by Computers
  • Federation: search bridges
  • Integration: results links

9
  • Social Environment Precisely Controlled
  • Normal Behavior Easily Observed

10
Needles and Haystacks
  • Genes
  • Honey Bees have 13K genes
  • Perhaps 100 have known functions
  • Behaviors
  • Perhaps 20 Societal Roles
  • Honey Bee Brain has 1500 genes
  • >> Must relate Genes (small) to Behaviors (large)
  • Each Role expresses nearly ALL of these!

11
Gene Expression of Normal Behavior
12
Trends in Analysis Environments
  • Central versus Distributed Viewpoints
  • The 90s Pre-Genome
  • Entrez (NIH NCBI) versus WCS (NSF Arizona)
  • The 00s Post-Genome
  • GO (NIH curators) versus BeeSpace (NSF Illinois)

13
Pre-Genome Environments
  • Focused on Syntax pre-Web
  • WCS (Worm Community System)
  • Search words across sources
  • Follow links across sources
  • Words automatic, Links manual
  • Towards Uniform Searching

14
Post-Genome Environments
  • Focused on Semantics post-Web
  • BeeSpace (Honey Bee Inter Space)
  • Navigate concepts across sources
  • Integrate data across sources
  • Concepts automatic, Links automatic
  • Towards Question Answering

15
Paradigm Shift
  • Towards Dry-Lab Biology, Walter Gilbert (Jan
    1991)
  • The new paradigm, now emerging, is that all the
    'genes' will be known (in the sense of being
    resident in databases available electronically),
    and that the starting point of a biological
    investigation will be theoretical. An
    individual scientist will begin with a
    theoretical conjecture, only then turning to
    experiment to follow or test that hypothesis.
    ...
  • To use this flood of knowledge (the total
    sequence of the human and model organisms), which
    will pour across the computer networks of the
    world, biologists not only must become
    computer-literate, but also change their approach
    to the problem of understanding life. ...
  • The Coming of Informational Science
  • Correlation of Information across Sources

16
Community Systems
  • data (database management)
  • results (electronic mail)
  • knowledge (hypertext annotations)
  • literature (information retrieval)
  • news (bulletin boards)
  • Formal versus Informal
  • browse and share all the knowledge of a community
17
Worm Community System
  • WCS Information
  • Literature: BIOSIS, MEDLINE, newsletters,
    meetings
  • Data: Genes, Maps, Sequences, strains, cells
  • WCS Functionality
  • Browsing: search, navigation
  • Filtering: selection, analysis
  • Sharing: linking, publishing
  • WCS: 250 users at 50 labs across the Internet (1991)

18
WCS Molecular
19
WCS Cellular
20
WCS Publishing
21
WCS Linking
22
WCS invokes gm
23
WCS vis-à-vis acedb
24
Towards the Interspace
  • from Objects to Concepts
  • from Syntax to Semantics
  • Infrastructure is Interaction with Abstraction

The Internet is packet transmission across computers;
the Interspace is concept navigation across repositories.
25
THE THIRD WAVE OF NET EVOLUTION
CONCEPTS
OBJECTS
PACKETS
26
LEVELS OF INDEXES
27
COMPUTING CONCEPTS
  • 1992: 4,000 (molecular biology)
  • 1993: 40,000 (molecular biology)
  • 1995: 400,000 (electrical engineering)
  • 1996: 4,000,000 (engineering)
  • 1998: 40,000,000 (medicine)
28
Simulating a New World
  • Obtain discipline-scale collection
  • MEDLINE from NLM, 10M bibliographic abstracts
  • human classification: Medical Subject Headings (MeSH)
  • Partition discipline into Community Repositories
  • 4 core terms per abstract for MeSH classification
  • 32K nodes with core terms (classification tree)
  • Community is all abstracts classified by core
    term
  • 40M abstracts containing 280M concepts
  • concept spaces took 2 days on NCSA Origin 2000
  • Simulating World of Medical Communities
  • 10K repositories with >1K abstracts (1K with >
    10K)
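The partitioning step above can be sketched as grouping abstracts by core term. This is a minimal sketch, assuming each abstract record simply lists its core MeSH terms. The function names, record layout, and toy data are hypothetical.

Because an abstract is filed under every one of its core terms, repositories overlap. This is consistent with reading the slide's 40M figure as about 10M abstracts × 4 core terms, although the slide does not state that arithmetic.

```python
from collections import defaultdict

def partition_by_core_term(abstracts):
    """Group abstract IDs into community repositories keyed by core term.

    `abstracts` maps an abstract ID to its list of core MeSH terms (an
    assumed layout). An abstract with k core terms is filed in k
    repositories, so repositories overlap.
    """
    repositories = defaultdict(set)
    for abstract_id, core_terms in abstracts.items():
        for term in core_terms:
            repositories[term].add(abstract_id)
    return dict(repositories)

def large_repositories(repositories, min_size):
    """Core terms whose repository holds more than `min_size` abstracts."""
    return sorted(term for term, ids in repositories.items() if len(ids) > min_size)

# Toy records with hypothetical IDs and term assignments (not real MEDLINE data).
toy = {
    "a1": ["Arthritis, Rheumatoid", "Analgesics", "Gastrointestinal Hemorrhage",
           "Anti-Inflammatory Agents"],
    "a2": ["Analgesics", "Pain", "Anti-Inflammatory Agents", "Aspirin"],
    "a3": ["Pain", "Analgesics", "Analgesics, Opioid", "Morphine"],
}
repos = partition_by_core_term(toy)
total_entries = sum(len(ids) for ids in repos.values())
print(total_entries)                 # 12: three abstracts x four core terms
print(large_repositories(repos, 1))  # ['Analgesics', 'Anti-Inflammatory Agents', 'Pain']
```

At MEDLINE scale, the same counting is what produces repository-size statistics like "10K repositories with >1K abstracts."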

29
Interspace Remote Access Client
30
Navigation in MEDSPACE
  • For a patient with Rheumatoid Arthritis
  • Find a drug that reduces the pain (analgesic) but does
    not cause stomach (gastrointestinal) bleeding

Choose Domain
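The rheumatoid arthritis query above reduces to two navigation steps: moving from one concept to related ones, then excluding a second concept. This is a minimal sketch, assuming a concept space stored as a dictionary from each term to the set of terms linked to it. The drug names and links are placeholders, not medical findings, and `candidates` is a hypothetical helper.

```python
def candidates(concept_space, wanted, unwanted, category):
    """Terms in `category` linked to `wanted` but not linked to `unwanted`."""
    linked_to_wanted = concept_space.get(wanted, set())
    linked_to_unwanted = concept_space.get(unwanted, set())
    return sorted((linked_to_wanted & category) - linked_to_unwanted)

# Placeholder concept space: the links are illustrative only.
space = {
    "analgesic": {"drug_a", "drug_b", "drug_c", "pain"},
    "gastrointestinal bleeding": {"drug_a", "ulcer"},
}
drugs = {"drug_a", "drug_b", "drug_c"}
print(candidates(space, "analgesic", "gastrointestinal bleeding", drugs))
# ['drug_b', 'drug_c']
```

In the interactive session, the user performs these steps by browsing rather than as one set expression.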
31
Concept Search
32
Concept Navigation
33
Retrieve Document
34
Navigate Document
35
Retrieve Document
36
Informational Science
  • Computational Science is widely accepted as the
    Third Branch of Science (beyond Experimental and Theoretical)
  • Genes are Computed, Proteins are Computed,
    Sequence equivalences are Computed.
  • Informational Science is coming to be accepted as
    the Fourth Branch of Science
  • Based on Information Science technologies for
    Functional Analysis across Information Sources

37
Post-Genome Informatics I
  • Comparative Analysis within the Dry Lab of Biological Knowledge
  • Classical Organisms have Genetic Descriptions.
  • There will be NO more classical organisms beyond
    Mice and Men, Worms and Flies, Yeasts and Weeds.
  • Must use comparative genomics on classical
    organisms via sequence homologies and literature analysis.

38
Post-Genome Informatics II
  • Functional Analysis within the Dry Lab of Biological Knowledge
  • Automatic annotation of genes to standard
    classifications, e.g. Gene Ontology via homology
    on computed protein sequences.
  • Automatic analysis of functions to scientific
    literature, e.g. concept spaces via text
    extractions. Thus must use functions in
    literature descriptions.
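Automatic annotation via homology can be sketched as transferring Gene Ontology terms from the best-scoring annotated homolog. This is a simplified illustration, not BeeBase's actual pipeline. The similarity scores stand in for the output of a sequence search such as BLAST. The threshold, gene identifiers, and function name are hypothetical.

```python
def transfer_annotations(hits, annotations, min_score):
    """Give each query gene the GO terms of its best-scoring annotated homolog.

    hits:        query gene -> list of (subject gene, similarity score)
    annotations: subject gene -> set of GO term names
    Queries with no annotated hit scoring at least `min_score` get nothing.
    """
    transferred = {}
    for query, matches in hits.items():
        usable = [(score, subject) for subject, score in matches
                  if subject in annotations and score >= min_score]
        if usable:
            _, best_subject = max(usable)  # highest score wins
            transferred[query] = set(annotations[best_subject])
    return transferred

# Hypothetical identifiers and scores; the scores stand in for search output.
hits = {
    "bee_gene_1": [("fly_gene_A", 0.82), ("worm_gene_B", 0.31)],
    "bee_gene_2": [("worm_gene_B", 0.12)],
}
annotations = {
    "fly_gene_A": {"cGMP-dependent protein kinase activity", "feeding behavior"},
    "worm_gene_B": {"G protein-coupled receptor signaling pathway"},
}
result = transfer_annotations(hits, annotations, min_score=0.5)
print(sorted(result))                # ['bee_gene_1']
print(sorted(result["bee_gene_1"]))  # ['cGMP-dependent protein kinase activity', 'feeding behavior']
```

In this example, `bee_gene_2` receives no terms because its only hit falls below the threshold. Annotation coverage therefore depends directly on how conservative that cutoff is.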

39
Informatics From Bases to Spaces
  • data Bases support genome data
  • e.g. FlyBase has sequences and maps
  • Genes annotated by GO and linked to literature
  • e.g. BeeBase has computed annotations
  • Protein homologies for similar Genes via GO
  • information Spaces support biomedical literature
  • e.g. BeeSpace uses automatically generated
  • conceptual relationships to navigate functions

40
Gene Ontology
41
Gene Ontology
  • Gene Symbol Data Source Full Name
  • Calca MGI calcitonin-related polypeptide
  • Cat-1 Wormbase None
  • Cat-2 Wormbase None
  • CCKR-Human UniProt Cholecystokinin receptor
  • CRF2-Rat UniProt Corticotropin releasing factor
  • Crhr2 RGD corticotrophin relse hormone
  • Egl-10 Wormbase None
  • Egl-30 Wormbase None
  • Feh-1 Wormbase None
  • For FlyBase None

42
Conceptual Navigation in BeeSpace
43
BeeSpace Analysis Environment
  • Build Concept Space of Biomedical Literature for
    Functional Analysis of Bee Genes
  • -Partition Literature into Community Collections
  • -Extract and Index Concepts within Collections
  • -Navigate Concepts within Documents
  • -Follow Links from Documents into Databases
  • Locate Candidate Genes in Related Literatures
    then follow links into Genome Databases
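The step of locating candidate genes in a literature can be sketched as lexicon matching. This is a minimal sketch, assuming a hand-built lexicon that maps gene names and synonyms (taken from examples elsewhere in this talk) to gene symbols. The lexicon and function name are hypothetical, and real entity recognition is far more involved than whole-word matching.

```python
import re

# Illustrative lexicon: synonym -> gene symbol (not an authoritative list).
# The bare symbol "for" is deliberately left out: it is also a common English
# word and would match almost every abstract.
LEXICON = {
    "foraging": "for",
    "CG10033": "for",
    "Amfor": "Amfor",
    "malvolio": "mvl",
    "mvl": "mvl",
}

def find_candidate_genes(text, lexicon):
    """Return the gene symbols whose synonyms occur as whole words in `text`."""
    found = set()
    for synonym, symbol in lexicon.items():
        if re.search(r"\b" + re.escape(synonym) + r"\b", text):
            found.add(symbol)
    return sorted(found)

abstract = ("This idea is supported by results for malvolio (mvl), which encodes "
            "a manganese transporter.")
print(find_candidate_genes(abstract, LEXICON))  # ['mvl']
```

Each returned symbol would then serve as the key for following a link into the corresponding genome database entry.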

44
Question Answering
45
Functional Phrases
  • <gene> encodes <chemical>
  • Sokolowski and colleagues demonstrated in
    Drosophila melanogaster that the foraging gene
    (for) encodes a cGMP dependent protein kinase
    (PKG).
  • The dg2 gene encodes a cyclic guanosine
    monophosphate (cGMP)-dependent protein kinase
    (PKG).
  • <chemical> affects/causes <behavior>
  • Thus, PKG levels affected food-search behavior.
  • cGMP treatment elevated PKG activity and caused
    foraging behavior.
  • <gene> regulates <behavior>
  • Amfor, an ortholog of the Drosophila for gene, is
    involved in the regulation of age at onset of
    foraging in honey bees.
  • This idea is supported by results for malvolio
    (mvl), which encodes a manganese transporter and
    is involved in regulating Drosophila feeding and
    age at onset of foraging in honey bees.
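Phrase templates like "<gene> encodes <chemical>" can be approximated with surface patterns. The following is a simplified regular-expression sketch and not the BeeSpace natural language pipeline; it stands in for real phrase parsing and entity recognition. The pattern and function name are hypothetical. As the third example shows, the pattern misses phrasings such as the malvolio sentence, which lacks the word "gene".

```python
import re

# Hypothetical surface pattern for "<gene> encodes <chemical>": a gene token
# followed by the word "gene", an optional parenthesized abbreviation, then
# "encodes", an optional article, and the product up to the next period.
ENCODES = re.compile(
    r"(?P<gene>[\w-]+)\s+gene"
    r"(?:\s+\((?P<abbrev>[\w-]+)\))?"
    r"\s+encodes\s+(?:an?\s+)?"
    r"(?P<product>[^.]+)"
)

def extract_encodes(sentence):
    """Return (gene, abbreviation or None, product), or None if no match."""
    match = ENCODES.search(sentence)
    if match is None:
        return None
    return match.group("gene"), match.group("abbrev"), match.group("product").strip()

s1 = ("Sokolowski and colleagues demonstrated in Drosophila melanogaster that "
      "the foraging gene (for) encodes a cGMP dependent protein kinase (PKG).")
s2 = ("The dg2 gene encodes a cyclic guanosine monophosphate (cGMP)-dependent "
      "protein kinase (PKG).")
s3 = ("This idea is supported by results for malvolio (mvl), which encodes "
      "a manganese transporter.")
print(extract_encodes(s1))  # ('foraging', 'for', 'cGMP dependent protein kinase (PKG)')
print(extract_encodes(s2))  # ('dg2', None, 'cyclic guanosine monophosphate (cGMP)-dependent protein kinase (PKG)')
print(extract_encodes(s3))  # None
```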

46
BeeSpace Software Implementation
  • Natural Language Processing
  • Identify noun and verb phrases
  • Recognize biological entities
  • Compute biological relations
  • Statistical Information Retrieval
  • Compute statistical contexts
  • Support conceptual navigation
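Statistical contexts are commonly computed by counting how often extracted terms co-occur in the same document. The sketch below uses a simple asymmetric weight: the co-occurrence count divided by the document frequency of the source term. This formula is an assumption chosen for illustration and is not necessarily the weighting BeeSpace uses. The toy documents are hypothetical.

```python
from collections import Counter
from itertools import permutations

def concept_space(documents):
    """Map each term to its related terms, weighted by conditional co-occurrence.

    documents: iterable of sets of extracted terms (one set per document).
    weight(a -> b) = (# docs containing a and b) / (# docs containing a).
    """
    doc_freq = Counter()
    pair_freq = Counter()
    for terms in documents:
        doc_freq.update(terms)
        pair_freq.update(permutations(terms, 2))  # ordered pairs of distinct terms
    space = {}
    for (a, b), n in pair_freq.items():
        space.setdefault(a, {})[b] = n / doc_freq[a]
    return space

def related(space, term, k=3):
    """Top-k related terms, highest weight first (ties broken alphabetically)."""
    weights = space.get(term, {})
    return sorted(weights, key=lambda t: (-weights[t], t))[:k]

docs = [
    {"foraging", "PKG", "cGMP"},
    {"foraging", "PKG"},
    {"foraging", "malvolio", "manganese"},
]
space = concept_space(docs)
print(related(space, "foraging"))  # ['PKG', 'cGMP', 'malvolio']
print(related(space, "PKG"))       # ['foraging', 'cGMP']
```

The weights are deliberately asymmetric. Every document mentioning PKG also mentions foraging, so PKG → foraging is 1.0, but only two of the three foraging documents mention PKG, so foraging → PKG is about 0.67. Navigation can follow whichever direction the user starts from.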

47
Data Integration (FlyBase Gene)
  • D. melanogaster gene foraging, abbreviated as
    for, is reported here. It has also been known
    in FlyBase as BcDNAGM08338, CG10033 and
    l(2)06860. It encodes a product with
    cGMP-dependent protein kinase activity
    (EC 2.7.1.-) involved in protein amino acid
    phosphorylation which is a component of the
    cellular_component unknown. It has been
    sequenced and its amino acid sequence contains a
    eukaryotic protein kinase, a protein kinase
    C-terminal domain, a tyrosine kinase catalytic
    domain, a serine/threonine protein kinase family
    active site, a cAMP-dependent protein kinase and
    a cGMP-dependent protein kinase. It has been
    mapped by recombination to 2-10 and cytologically
    to 24A2-4. It interacts genetically with Csr.
    There are 27 recorded alleles: 1 in vitro
    construct (not available from the public stock
    centers), 25 classical mutants (3 available from
    the public stock centers) and 1 wild-type.
    Mutations have been isolated which affect the
    larval nerve terminal and are behavioral, pupal
    recessive lethal, hyperactive, larval
    neurophysiology defective and larval neuroanatomy
    defective. for is discussed in 80 references
    (excluding sequence accessions), dated between
    1988 and 2003. These include at least 6 studies
    of mutant phenotypes, 2 studies of wild-type
    function, 3 studies of natural polymorphisms and
    7 molecular studies. Among findings on for
    function, for activity levels influence adult
    olfactory trap response to a food medium
    attractant. Among findings on for polymorphisms,
    the frequency of for[R] and for[s] strains in three
    natural populations are studied to determine the
    contribution of the local parasitoid community to
    the differences in for[R] and for[s] frequencies.

48
BeeSpace Information Sources
  • Biomedical Literature
  • Medline (medicine)
  • Biosis (biology)
  • Agricola, CAB Abstracts, Agris (agriculture)
  • Model Organisms (heredity)
  • -Gene Descriptions (FlyBase, WormBase)
  • Natural Histories (environment)
  • -BeeKeeping Books (Cornell, Harvard)

49
Medical Concept Spaces (1998)
  • Medical Literature (Medline, 10M abstracts)
  • Partition with Medical Subject Headings (MeSH)
  • Community is all abstracts classified by core
    term
  • 40M abstracts containing 280M concepts
  • computation is 2 days on NCSA Origin 2000
  • Simulating World of Medical Communities
  • 10K repositories with >1K abstracts
  • (1K with >10K)

50
Biological Concept Spaces (2006)
  • Compute concept spaces for All of Biology
  • BioSpace across entire biomedical literature
  • 50M abstracts across 50K repositories
  • Use Gene Ontology to partition literature into
  • biological communities for functional analysis
  • GO same scale as MeSH but adequate coverage?
  • GO light on social behavior (biological process)

51
Paradigm Shift
  • Dissecting Human Disease, Victor McKusick (Feb
    2001)
  • Structural genomics → Functional genomics
  • Genomics → Proteomics
  • Map-based gene discovery → Sequence-based gene
    discovery
  • Monogenic disorders → Multifactorial disorders
  • Specific DNA diagnosis → Monitoring susceptibility
  • Analysis of one gene → Analysis of multi-gene
    pathways
  • Gene action → Gene regulation
  • Etiology (mutation) → Pathogenesis (mechanism)
  • One species → Several species

52
Concept Switching
  • In the Interspace
  • each Community maintains its own repository
  • Switching is navigating across repositories
  • use your specialty vocabulary to search another
    specialty

53
CONCEPT SWITCHING
  • Concept versus Term: a concept is a set of
    semantically equivalent terms
  • Concept switching: a region to region (set to set) match
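Treating a concept as a set of semantically equivalent terms turns switching into a set-to-set comparison. This is a minimal sketch, assuming each repository's concepts are already clustered into term sets and using Jaccard overlap as the match score. Both the clustering and the choice of Jaccard are illustrative assumptions, and the example vocabularies are hypothetical.

```python
def jaccard(a, b):
    """Overlap of two term sets: |a & b| / |a | b| (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def switch_concept(source_terms, target_concepts):
    """Best-matching target concept for a source term set, as (name, score).

    target_concepts: concept name -> set of equivalent terms in the target
    repository. Returns None when the target repository has no concepts.
    """
    if not target_concepts:
        return None
    name = max(target_concepts, key=lambda c: jaccard(source_terms, target_concepts[c]))
    return name, jaccard(source_terms, target_concepts[name])

# Hypothetical vocabularies for two specialties.
entomology_concept = {"foraging", "food collection", "nectar collection"}
neuroscience_concepts = {
    "feeding behavior": {"feeding", "food intake", "food collection", "foraging"},
    "locomotion": {"walking", "flight", "movement"},
}
print(switch_concept(entomology_concept, neuroscience_concepts))  # ('feeding behavior', 0.4)
```

Matching whole sets instead of single terms lets a searcher's own vocabulary ("nectar collection") reach a concept in another specialty, even when that exact term never appears there.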

54
Biomedical Session
55
Categories and Concepts
56
Concept Switching
57
Document Retrieval
58
Technology Trends
  • IEEE Computer for January 2002
  • Information Infrastructure for Trends issue
  • Review of Building The Interspace
  • Document Representation (Semantic Web)
  • Language Parsing (TIPSTER)
  • Statistical Indexing (TREC)
  • Peer-to-Peer Networking (SETI@home)
  • Vocabulary Switching (UMLS)

59
Open Research Problems
  • Language Processing
  • Bandwidth filtering and normalization
  • Concept Switching
  • Spreading activation and graph matching
  • Dynamic Indexing
  • Session collections and distributed processing
  • Path Matching
  • Aggregating indexes and repository merging
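Spreading activation, listed above under concept switching, can be sketched over a weighted concept graph. Activation starts at the query concepts and flows along edges, decaying at each hop. This is a generic formulation with an assumed decay factor, hop limit, and toy graph, not a description of the Interspace implementation.

```python
def spread_activation(graph, seeds, decay=0.5, hops=2):
    """Propagate activation from seed concepts through a weighted graph.

    graph: concept -> {neighbor: edge weight in [0, 1]}
    seeds: concept -> initial activation
    Each hop passes activation * weight * decay to neighbors; a node keeps
    the maximum activation it receives (one simple choice among several).
    """
    activation = dict(seeds)
    frontier = dict(seeds)
    for _ in range(hops):
        next_frontier = {}
        for node, value in frontier.items():
            for neighbor, weight in graph.get(node, {}).items():
                passed = value * weight * decay
                if passed > activation.get(neighbor, 0.0):
                    activation[neighbor] = passed
                    next_frontier[neighbor] = passed
        frontier = next_frontier  # only nodes whose activation rose keep spreading
    return activation

# Hypothetical concept graph with illustrative edge weights.
graph = {
    "foraging": {"PKG": 0.8, "malvolio": 0.4},
    "PKG": {"cGMP": 1.0},
    "malvolio": {"manganese": 1.0},
}
activation = spread_activation(graph, {"foraging": 1.0})
for concept, value in sorted(activation.items(), key=lambda kv: -kv[1]):
    print(f"{concept}: {value:.2f}")
```

Keeping the maximum, rather than summing, makes cycles harmless: activation flowing back toward a node cannot exceed what that node already holds, so the spread stops.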

60
THE NET OF THE 21st CENTURY
  • Beyond Objects to Concepts
  • Beyond Search to Analysis
  • Problem Solving via Cross-Correlating Multimedia
    Information across the Net
  • Every community has its own special library
  • Every community does semantic indexing
  • The Interspace approximates Cyberspace

61
Interactive Functional Analysis
  • BeeSpace will enable users to navigate a uniform
    space of diverse databases and literature sources
    for hypothesis development and testing, with a
    software system beyond a searchable database,
    using literature analyses to discover functional
    relationships between genes and behavior.
  • Genes to Behaviors
  • Behaviors to Genes
  • Concepts to Concepts
  • Clusters to Clusters
  • Navigation across Sources

62
XSpace Information Sources
  • Organize Genome Databases (XBase)
  • Compute Gene Descriptions from Model Organisms
  • Partition Scientific Literature for Organism X
  • Compute XSpace using Semantic Indexing
  • Boost the Functional Analysis from Special
    Sources
  • Collecting Useful Data about Natural Histories
  • e.g. CowSpace Leverage in AIPL Databases

63
Towards the Interspace
  • The Analysis Environment technology is
    GENERAL! BirdSpace? BeeSpace?
  • PigSpace? CowSpace?
  • BehaviorSpace? BrainSpace?
  • BioSpace
  • Interspace

64
Acknowledgements
  • BeeSpace project is NSF FIBR flagship
  • Frontiers in Integrative Biological Research
  • $5M for 5 years at University of Illinois
  • Biology
  • Gene Robinson, Entomology (behavioral
    expression)
  • Susan Fahrbach, Wake Forest (anatomical
    localization)
  • Sandra Rodriguez-Zas, Animal Sciences (data
    analysis)
  • Informatics
  • Bruce Schatz, Library Information Science
    (systems)
  • ChengXiang Zhai, Computer Science (text
    analysis)
  • Chip Bruce, Library Information Science (users)