Text Analytics for Semantic Applications Workshop - PowerPoint PPT Presentation

About This Presentation
Title:

Text Analytics for Semantic Applications Workshop

Description:

Text Analytics for Semantic Applications Workshop Tom Reamy Chief Knowledge Architect KAPS Group Knowledge Architecture Professional Services http://www.kapsgroup.com – PowerPoint PPT presentation

Slides: 93
Provided by: TomR97

Transcript and Presenter's Notes

Title: Text Analytics for Semantic Applications Workshop


1
Text Analytics for Semantic Applications Workshop
  • Tom Reamy, Chief Knowledge Architect
  • KAPS Group
  • Knowledge Architecture Professional Services
  • http://www.kapsgroup.com

2
Agenda
  • Introduction: Text Analytics Infrastructure
    Platform
  • Text Analytics Features
  • Semantic Infrastructure: Taxonomy, Metadata,
    Technology
  • Value of Text Analytics
  • Getting Started with Text Analytics
  • Development: Taxonomy, Categorization, Faceted
    Metadata
  • Text Analytics Applications
  • Integration with Search and ECM
  • Platform for Information Applications: text into
    data
  • Questions / Discussions

3
KAPS Group: General
  • Knowledge Architecture Professional Services:
    Network of Consultants
  • Partners: SAS, SAP, IBM, FAST, Smart Logic,
    Concept Searching
  • Attensity, Clarabridge, Lexalytics
  • Strategy: IM and KM - Text Analytics, Social
    Media, Integration
  • Services
  • Taxonomy/Text Analytics development, consulting,
    customization
  • Text Analytics Quick Start: Audit, Evaluation,
    Pilot
  • Social Media: text-based applications, design and
    development
  • Clients
  • Genentech, Novartis, Northwestern Mutual Life,
    Financial Times, Hyatt, Home Depot, Harvard
    Business Library, British Parliament, Battelle,
    Amdocs, FDA, GAO, etc.
  • Applied Theory: faceted taxonomies, complexity
    theory, natural categories, emotion taxonomies
  • Presentations, Articles, White Papers:
    http://www.kapsgroup.com

4
Agenda: Introduction - Text Analytics and Semantic
Infrastructure
  • Text Analytics Features
  • Categorization and Extraction
  • Semantic Infrastructure
  • Taxonomy, Metadata, Technology
  • Value of Text Analytics
  • Add Intelligence to Semantic Applications
  • Getting Started with Text Analytics
  • Text Analytics Strategy and Vision
  • Text Analytics Evaluation / Quick Start

5
Introduction to Text Analytics: Text Analytics
Features
  • Noun Phrase Extraction / Fact Extraction
  • Catalogs with variants, rule based dynamic
  • Relationships of entities:
    people-organizations-activities
  • Sentiment Analysis
  • Objects and phrases: statistics and rules,
    Positive and Negative
  • Summarization: replace snippets
  • Auto-categorization: built on a taxonomy
  • Training sets, Terms, Semantic Networks
  • Rules: AND, OR, NOT, DIST, PARAGRAPH, SENTENCE
  • Auto-categorization as Foundation
  • Disambiguation - Identification of objects,
    events, context
  • Build rules based, not simply Bag of Individual
    Words

6
Case Study: Categorization and Sentiment
7
Case Study: Categorization and Sentiment
13
Introduction to Text Analytics: Taxonomy and Metadata
  • Thesauri, Controlled Vocabulary, Glossaries,
    Product Catalogs
  • Resources to build on
  • SharePoint: Managed Metadata Services
  • Term stores: corporate taxonomies
  • Enterprise Keywords (Folksonomy)
  • Metadata standards: Dublin Core - mostly
    syntactic, not semantic
  • Semantic keywords: very poor performance, no
    structure
  • Facets: classes of metadata
  • Standard - People, Organization, Document
    type-purpose
  • Requires huge amounts of metadata: entities,
    triples

14
Introduction to Text Analytics: TA and Taxonomy,
a Complementary Information Platform
  • Taxonomy provides a consistent and common
    vocabulary
  • Enterprise resource: integrated, not centralized
  • Text Analytics provides consistent tagging
  • Human indexing is subject to inter- and
    intra-individual variation
  • Taxonomy provides the basic structure for
    categorization
  • And candidate terms
  • Text Analytics provides the power to apply the
    taxonomy
  • And metadata of all kinds
  • Text Analytics and Taxonomy together: Platform
  • Consistent in every dimension
  • Powerful and economic

15
Introduction to Text Analytics: Taxonomy and Text
Analytics
  • Standard taxonomies: starter categorization
    rules
  • Example: MeSH - bottom 5 layers are terms
  • Categorization: taxonomy structure
  • Tradeoff of depth and complexity of rules
  • Easier to maintain taxonomy, but need to refine
    rules
  • Analysis of taxonomy: suitable for
    categorization?
  • Structure: not too flat, not too large
  • Orthogonal categories
  • Smaller modular taxonomies
  • More flexible relationships: not just
    Is-A-Kind/Child-Of
  • Different kinds of taxonomies
  • Sentiment: products and features
  • Taxonomy of Sentiment, Emotion - Expertise
    process

16
Introduction to Text Analytics: Metadata - Tagging
  • How do you bridge the gap from taxonomy to
    documents?
  • Tagging documents with taxonomy nodes is tough
  • And expensive: central or distributed
  • Library staff: experts in categorization, not
    subject matter
  • Too limited, narrow bottleneck
  • Often don't understand business processes and
    business uses
  • Authors: experts in the subject matter, terrible
    at categorization
  • Intra- and inter-individual inconsistency,
    "intertwingleness"
  • Choosing tags from a taxonomy: complex task
  • Folksonomy: almost as complex, wildly
    inconsistent
  • Resistance: not their job, cognitively difficult;
    the result is non-compliance
  • Text Analytics is the answer(s)!

17
Introduction to Text Analytics: Content Management,
SharePoint
  • Mind the Gap: Manual, Automatic, Hybrid
  • All require human effort: the issue is where and
    how effective
  • Manual - human effort is tagging (difficult,
    inconsistent)
  • Automatic and Hybrid - human effort is prior to
    tagging
  • Build on expertise: librarians on
    categorization, SMEs on subject terms
  • Hybrid Model
  • Publish document -> Text Analytics analysis ->
    suggestions for categorization, entities,
    metadata -> present to author
  • Cognitive task is simple -> react to a suggestion
    instead of selecting from your head or a complex
    taxonomy
  • Feedback: if author overrides -> suggestion for
    new category
  • Facets: require a lot of metadata - entity
    extraction feeds facets
  • Hybrid-Automatic is really a spectrum: depends
    on context
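
A minimal sketch of the hybrid model above, in Python. The keyword matcher stands in for a real text analytics engine, and the names `suggest_categories`, `TaggingSession`, and the sample rules are hypothetical, chosen only to illustrate the publish -> suggest -> react -> feedback loop.

```python
from dataclasses import dataclass, field


@dataclass
class TaggingSession:
    """Tracks what authors did with suggestions (hypothetical helper)."""
    candidate_categories: list[str] = field(default_factory=list)

    def review(self, suggested: list[str], author_choice: list[str]) -> list[str]:
        """Return the author's final tags; log overrides as candidate new categories."""
        for tag in author_choice:
            if tag not in suggested and tag not in self.candidate_categories:
                # The author overrode the suggestion: feed it back to the taxonomy team.
                self.candidate_categories.append(tag)
        return author_choice


def suggest_categories(text: str, rules: dict[str, list[str]]) -> list[str]:
    """Suggest every category with at least one rule term in the text (toy matcher)."""
    lowered = text.lower()
    return [cat for cat, terms in rules.items() if any(t in lowered for t in terms)]


# Illustrative rules only; a real deployment would use the categorization rule base.
rules = {"Marine Science": ["marine", "ocean"], "Finance": ["currency", "exchange rate"]}
session = TaggingSession()
suggested = suggest_categories("Ocean sampling of marine toxins", rules)
final = session.review(suggested, ["Marine Science", "Toxicology"])
```

Here the author only reacts to `suggested`; the extra tag "Toxicology" ends up in `session.candidate_categories` for later review rather than silently entering the taxonomy.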

18
Introduction to Text Analytics: Benefits of Text
Analytics
  • Why Text Analytics?
  • Enterprise search has failed to live up to its
    potential
  • Enterprise Content management has failed to live
    up to its potential
  • Taxonomy has failed to live up to its potential
  • Adding metadata, especially keywords, has not
    worked
  • What is missing?
  • Intelligence: human-level categorization,
    conceptualization
  • Infrastructure: integrated solutions, not
    technology or software
  • Text Analytics can be the foundation that
    (finally) drives success: search, content
    management, and much more
  • Combine with semantic technologies: new
    application dimensions

19
Text Analytics Platform Benefits: IDC White Paper
  • Time Wasted
  • Reformatting information - $5.7 million per 1,000
    per year
  • Not finding information - $5.3 million per 1,000
  • Recreating content - $4.5 million per 1,000
  • Small percent gain = large savings
  • 1% - $10 million
  • 5% - $50 million
  • 10% - $100 million
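
The arithmetic behind these figures can be sketched as follows. This assumes the three numbers are yearly US dollar costs per 1,000 employees and that costs scale linearly with headcount; neither is stated on the slide. The savings tiers on the slide depend on an enterprise size that is not given, so the sketch does not try to reproduce them, and the 10,000-person example is hypothetical.

```python
# Yearly cost per 1,000 employees, as listed on the slide (assumed to be US dollars).
COSTS_PER_1000 = {
    "reformatting information": 5_700_000,
    "not finding information": 5_300_000,
    "recreating content": 4_500_000,
}


def annual_waste(employees: int) -> int:
    """Total yearly cost, scaled linearly from the per-1,000 figures."""
    per_1000 = sum(COSTS_PER_1000.values())
    return per_1000 * employees // 1000


def savings(employees: int, percent_gain: int) -> int:
    """Yearly savings if the wasted time shrinks by percent_gain percent."""
    return annual_waste(employees) * percent_gain // 100


total_per_1000 = annual_waste(1000)   # sum of the three slide figures
example = savings(10_000, 5)          # hypothetical 10,000-person enterprise, 5% gain
```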

20
Text Analytics Platform Benefits
  • Findability within and outside the enterprise
  • Savings per year - millions
  • Rescue enterprise search and ECM projects
  • Add semantics to search
  • Clean up enterprise content
  • Duplication and accurate categorization
  • Improve the quality of information access
  • Finding the right information can save millions
  • Build smarter applications
  • Social networking, locate expertise within the
    enterprise

21
Text Analytics Platform Benefits
  • Understand your customers
  • What they are talking about and how they feel
    about it
  • Empower your employees
  • Not only more time, but they work smarter
  • Understand your competitors
  • What they are working on, talking about
  • Combine unstructured content and rich data
    sources: more intelligent analysis
  • Payoff for semantic technologies: partner with
    Text Analytics

22
Text Analytics Platform: Dangers
  • Text Analytics as a software project
  • Not enough resources to develop, to
    maintain-refine
  • Wrong resources: SMEs, IT, Library
  • Need all of the above and taxonomists
  • Bad Design
  • Start with bad taxonomy
  • Wrong taxonomy: too big or too flat
  • Bad Categorization / Entity Extraction
  • Right kind of experience

23
Getting Started with Text Analytics: Text
Analytics Vision and Strategy
  • Strategic questions: why, what value from the
    text analytics, how are you going to use it
  • Platform or Applications?
  • What are the basic capabilities of Text
    Analytics?
  • What can Text Analytics do for Search?
  • After 10 years of failure, get search to work?
  • What can you do with smart search-based
    applications?
  • RM, PII, Social
  • ROI for effective search: difficulty of
    believing
  • Problems with metadata, taxonomy

24
Getting Started with Text Analytics: Text
Analytics Vision and Strategy
  • Simple Subject Taxonomy structure
  • Easy to develop and maintain
  • Combined with categorization capabilities
  • Added power and intelligence
  • Combined with people tagging, refining tags
  • Combined with Faceted Metadata
  • Dynamic selection of simple categories
  • Allow multiple user perspectives
  • Can't predict all the ways people think
  • Monkey, Banana, Panda
  • Combined with ontologies and semantic data
  • Multiple applications: Text mining to Search
  • Combine search and browse

25
Step 1: TA Information Audit - Start with Self
Knowledge
  • Info problems: what, how severe
  • Formal process - KA audit: content, users,
    technology, business and information behaviors,
    applications - or informal for smaller
    organizations
  • Contextual interviews, content analysis, surveys,
    focus groups, ethnographic studies, Text Mining
  • Category modeling: Cognitive Science, how
    people think
  • Natural level categories mapped to communities,
    activities
  • Novices prefer higher levels
  • Balance of informativeness and distinctiveness
  • Text Analytics Strategy/Model: forms,
    technology, people

26
Step 1: TA Information Audit - Start with Self
Knowledge
  • Ideas: Content and Content Structure
  • Map of Content: tribal language silos
  • Structure: articulate and integrate
  • Taxonomic resources
  • People: Producers and Consumers
  • Communities, Users, Central Team
  • Activities: Business processes and procedures
  • Semantics, information needs and behaviors
  • Information Governance Policy
  • Technology
  • CMS, Search, portals, text analytics
  • Applications: BI, CI, Semantic Web, Text Mining

27
Step 2: TA Evaluation - Varieties of Taxonomy /
Text Analytics Software
  • Taxonomy Management - extraction
  • Full Platform
  • SAS, SAP, Smart Logic, Concept Searching, Expert
    System, IBM, Linguamatics, GATE
  • Embedded: Search or Content Management
  • FAST, Autonomy, Endeca, Vivisimo, NLP, etc.
  • Interwoven, Documentum, etc.
  • Specialty / Ontology (other semantic)
  • Sentiment Analysis: Attensity, Lexalytics,
    Clarabridge, Lots
  • Ontology: extraction, plus ontology

28
Step 2: Text Analytics Evaluation - A Different Kind
of Software Evaluation
  • Traditional Software Evaluation - Start
  • Filter One: ask experts - reputation, research
    (Gartner, etc.)
  • Market strength of vendor, platforms, etc.
  • Feature scorecard: minimum, must have, filter to
    top 6
  • Filter Two: technology filter - match to your
    overall scope and capabilities; a filter, not a
    focus
  • Filter Three: in-depth demo, 3-6 vendors
  • Reduce to 1-3 vendors
  • Vendors have different strengths in multiple
    environments
  • Millions of short, badly typed documents, build
    application
  • Library: 200-page PDF, enterprise and public search

29
Design of the Text Analytics Selection Team:
Traditional Candidates - IT, Business, Library
  • IT - Experience with software purchases, needs
    assessment, budget
  • Search/Categorization is unlike other software,
    needs a deeper look
  • Business - understand business, focus on business
    value
  • They can get executive sponsorship, support, and
    budget
  • But don't understand information behavior,
    semantic focus
  • Library, KM - Understand information structure
  • Experts in search experience and categorization
  • But don't understand business or technology

30
Design of the Text Analytics Selection Team
  • Interdisciplinary Team, headed by Information
    Professionals
  • Relative Contributions
  • IT: set necessary conditions, support tests
  • Business: provide input into requirements,
    support project
  • Library: provide input into requirements, add
    understanding of search semantics and
    functionality
  • Much more likely to make a good decision
  • Create the foundation for implementation

31
Step 3: Proof of Concept / Pilot Project
  • 4 weeks: POC bake-off or short pilot
  • Real-life scenarios, categorization with your
    content
  • 2 rounds of development, test, refine / not OOB
  • Need SMEs as test evaluators, also to do an
    initial categorization of content
  • Measurable quality of results is the essential
    factor
  • Majority of time is on auto-categorization
  • Need to balance uniformity of results with vendor
    unique capabilities: have to determine at POC
    time
  • Taxonomy developers: expert consultants plus
    internal taxonomists

32
Questions?
  • Tom Reamy, tomr@kapsgroup.com
  • KAPS Group
  • Knowledge Architecture Professional Services
  • http://www.kapsgroup.com

33
Text Analytics Workshop: Development
  • Tom Reamy, Chief Knowledge Architect
  • KAPS Group
  • Knowledge Architecture Professional Services
  • http://www.kapsgroup.com

34
Agenda
  • Development - Foundation
  • Case Study 1: Internet News
  • Case Study 2: Tale of Two Taxonomies
  • Case Study 3: Software Evaluation and Beyond
  • BBN Motivations
  • Amgen 2 clustering, auto-taxonomy
  • GAO Taxonomy from terms to rules
  • Exercises

35
Text Analytics Platform: 4 Basic Contexts
  • Ideas: Content Structure
  • Language and Mind of your organization
  • Applications - exchange meaning, not data
  • People: Company Structure
  • Communities, Users
  • Central team - establish standards, facilitate
  • Activities: Business processes and procedures
  • Technology
  • CMS, Search, portals, taxonomy tools
  • Applications: BI, CI, Text Mining

36
Text Analytics Development: Foundation
  • Articulated Information Management Strategy (K
    Map)
  • Content and Structures and Metadata
  • Search, ECM, applications - and how used in
    Enterprise
  • Community information needs and Text Analytics
    Team
  • POC establishes the preliminary foundation
  • Need to expand and deepen
  • Content: full range, basis for rules-training
  • Additional SMEs: content selection, refinement
  • Taxonomy: starting point for categorization /
    suitable?
  • Databases: starting point for entity catalogs

37
Knowledge Architecture Audit: Knowledge Map
  • Stages: Project Foundation; Contextual Interviews;
    Information Interviews; App/Content Catalog;
    User Survey; Strategy Document
  • Activities: Meetings, work groups; Overview, High
    Level Process, Community; Info behaviors of
    Business processes; Technology and content;
    All 4 dimensions; Meetings, work groups
  • Results: General Outline; Broad Context; Deep
    Details; Deep Details; Complete Picture; New
    Foundation
38
Taxonomy Development Process: Progressive
Refinement
Taxonomy Model Information Interviews Content Analysis Refine Map Community Governance Plan
Buy/Find work groups Overview Info behaviors, Card Sorts Bottom Up Prototypes Interviews Evaluate Refine Interviews Develop, Refine
General Outline Preliminary Taxonomy Taxonomy 1.0 Taxonomy 1.0-1.9 Tax 2.0 Taxonomy
39
Text Analytics Development: Categorization Process
  • Starter Taxonomy
  • If no taxonomy, develop initial high level (see
    Chart)
  • Analysis of taxonomy: suitable for
    categorization?
  • Structure: not too flat, not too large
  • Orthogonal categories
  • Content Selection
  • Map of all anticipated content
  • Selection of training sets, if possible
  • Automated selection of training sets: use taxonomy
    nodes as first categorization rules, apply, and
    get content

40
Text Analytics Development: Categorization Process
  • First Round of Categorization Rules
  • Term building from content: basic set of terms
    that appear often / are important to content
  • Add terms to rule, apply to broader set of
    content
  • Repeat for more terms: get recall-precision
    scores
  • Repeat, refine, repeat, refine, repeat
  • Get SME feedback: formal process, scoring
  • Get SME feedback: human judgments
  • Test against more, new content
  • Repeat until done: 90%?
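
The refine-and-score cycle can be illustrated with a small Python sketch. The rule here is a toy "any term matches" rule, and the documents, terms, and SME labels are made up for illustration; a real project would score the vendor's rule engine against SME-categorized content.

```python
def precision_recall(predicted: set[str], relevant: set[str]) -> tuple[float, float]:
    """Precision = correct hits / all hits; recall = correct hits / all relevant docs."""
    true_positives = len(predicted & relevant)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


def apply_rule(terms: list[str], docs: dict[str, str]) -> set[str]:
    """Return ids of documents containing any rule term (a toy OR rule)."""
    return {doc_id for doc_id, text in docs.items()
            if any(term in text.lower() for term in terms)}


# Hypothetical test collection and SME judgments for a "Healthcare" category.
docs = {
    "d1": "Hospital reports new vaccine trial",
    "d2": "Airline adds routes to Boston",
    "d3": "Clinic expands patient care",
    "d4": "Vaccine shipment delayed at airport",
}
sme_labels = {"d1", "d3", "d4"}

round1 = precision_recall(apply_rule(["vaccine"], docs), sme_labels)
round2 = precision_recall(apply_rule(["vaccine", "clinic"], docs), sme_labels)
```

Adding the term "clinic" in the second round raises recall without hurting precision on this tiny set; on real content each added term has to be re-tested, which is why the slide repeats "refine, repeat".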

41
Text Analytics Development: Entity Extraction
Process
  • Facet Design: from KA Audit, K Map
  • Find and convert catalogs
  • Organization: internal resources
  • People: corporate yellow pages, HR
  • Include variants
  • Scripts to convert catalogs: programming
    resource
  • Build initial rules: follow categorization
    process
  • Differences: scale, score
  • Recall: find all entities
  • Precision: correct assignment to entity class
  • Issue: disambiguation - Ford as company, person,
    car
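
A rough Python sketch of catalog-based extraction with variants, plus a naive context check for the "Ford" problem. The catalog entries, cue words, and function names are hypothetical; production tools use far richer rules than counting cue words.

```python
import re

# Hypothetical catalog: canonical entity -> (facet, known variants in lower case).
CATALOG = {
    "Ford Motor Company": ("Organization", ["ford motor company", "ford motor co."]),
    "Henry Ford": ("People", ["henry ford", "mr. ford"]),
}

# Toy context cues for the ambiguous bare word "Ford" (assumed, not from any product).
CONTEXT_CUES = {
    "Organization": ["shares", "company", "plant"],
    "People": ["said", "born", "mr."],
    "Product": ["drove", "parked", "pickup"],
}


def extract_entities(text):
    """Return (canonical name, facet) for every catalog entry with a variant in the text."""
    lowered = text.lower()
    return [(canonical, facet)
            for canonical, (facet, variants) in CATALOG.items()
            if any(variant in lowered for variant in variants)]


def classify_bare_ford(sentence):
    """Pick the entity class whose cue words occur most often; None if no cue is present."""
    words = re.findall(r"[a-z.]+", sentence.lower())
    scores = {facet: sum(words.count(cue) for cue in cues)
              for facet, cues in CONTEXT_CUES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

For example, `classify_bare_ford("She parked the Ford outside")` returns "Product", while a sentence with no cue words returns `None`, which is the case that needs a better rule or a human decision.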

42
Case Study - Background
  • Inxight Smart Discovery
  • Multiple Taxonomies
  • Healthcare: first target
  • Travel, Media, Education, Business, Consumer
    Goods
  • Content: 800 Internet news sources
  • 5,000 stories a day
  • Application: Newsletters
  • Editors using categorized results
  • Easier than full automation

43
Case Study - Approach
  • Initial High Level Taxonomy
  • Auto generation: very strange, not usable
  • Editors: high-level sections of newsletters
  • Editors and Taxonomy Pros - broad categories,
    refine
  • Develop Categorization Rules
  • Multiple test collections
  • Good stories, bad stories: close misses - terms
  • Recall and Precision Cycles
  • Refine and test: taxonomists, many rounds
  • Review: editors, 2-3 rounds
  • Repeat: about 4 weeks

51
Case Study - Issues
  • Taxonomy Structure
  • Aggregate nodes vs. independent nodes
  • Children nodes: subset is rare
  • Depth of taxonomy and complexity of rules
  • Trade-off: need to update vs. usefulness of
    categories
  • Multiple avenues - Facets: a source such as the
    New York Times can be put into rules or made a
    facet to filter results
  • When to use filter or terms: experimental
  • Recall more important than precision: editors'
    role

52
Case Study: Lessons Learned
  • Combination of SME and Taxonomy pros
  • Combination of features: entity extraction,
    terms, Boolean, filters, facts
  • Training sets and find similar are weakest
  • Somewhat useful during development for terms
  • No best answer: taxonomy structure, format of
    rules
  • Need custom development
  • Plan for ongoing refinement
  • This stuff actually works!

53
Enterprise Environment Case Studies
  • A Tale of Two Taxonomies
  • It was the best of times, it was the worst of
    times
  • Basic Approach
  • Initial meetings: project planning
  • High level K map: content, people, technology
  • Contextual and Information Interviews
  • Content Analysis
  • Draft Taxonomy: validation interviews, refine
  • Integration and Governance Plans

54
Enterprise Environment: Case One - Taxonomy, 7
facets
  • Taxonomy of Subjects / Disciplines
  • Science > Marine Science > Marine microbiology >
    Marine toxins
  • Facets
  • Organization > Division > Group
  • Clients > Federal > EPA
  • Instruments > Environmental Testing > Ocean
    Analysis > Vehicle
  • Facilities > Division > Location > Building X
  • Methods > Social > Population Study
  • Materials > Compounds > Chemicals
  • Content Type > Knowledge Asset > Proposals

55
Enterprise Environment: Case One - Taxonomy, 7
facets
  • Project owner: KM department, included RM,
    business process
  • Involvement of library - critical
  • Realistic budget, flexible project plan
  • Successful interviews: build on context
  • Overall information strategy: where taxonomy
    fits
  • Good draft taxonomy and extended refinement
  • Software, process, team: train library staff
  • Good selection and number of facets
  • Final plans and hand off to client

56
Enterprise Environment: Case Two - Taxonomy, 4
facets
  • Taxonomy of Subjects / Disciplines
  • Geology > Petrology
  • Facets
  • Organization > Division > Group
  • Process > Drill a Well > File Test Plan
  • Assets > Platforms > Platform A
  • Content Type > Communication > Presentations

57
Enterprise Environment: Case Two - Taxonomy, 4
facets
  • Environment Issues
  • Value of taxonomy understood, but not the
    complexity and scope
  • Under-budgeted, understaffed
  • Location: not KM, tied to RM and software
  • Solution looking for the right problem
  • Importance of an internal library staff
  • Difficulty of merging internal expertise and
    taxonomy

58
Enterprise Environment: Case Two - Taxonomy, 4
facets
  • Project Issues
  • Project mindset, not infrastructure
  • Wrong kind of project management
  • Special needs of a taxonomy project
  • Importance of integration with team, company
  • Project plan more important than results
  • Rushing to meet deadlines doesn't work as well
    with semantics as with software

59
Enterprise Environment: Case Two - Taxonomy, 4
facets
  • Research Issues
  • Not enough research, and wrong people
  • Interference of non-taxonomy communication
  • Misunderstanding of research: wanted tinker-toy
    connections
  • Interview 1 implies conclusion A
  • Design Issues
  • Not enough facets
  • Wrong set of facets: business, not information
  • Ill-defined facets: too complex internal
    structure

60
Taxonomy Development: Conclusion - Risk Factors
  • Political-Cultural-Semantic Environment
  • Not simple resistance - more subtle
  • Re-interpretation of specific conclusions and
    sequence of conclusions / relative importance of
    specific recommendations
  • Understanding project scope
  • Access to content and people
  • Enthusiastic access
  • Importance of a unified project team
  • Working communication as well as weekly meetings

61
Text Analytics Development: Case Study 3 - POC,
Government Agency
  • Demo of SAS Teragram / Enterprise Content
    Categorization

62
Conclusion
  • Enterprise Context: strategic, self knowledge
  • Importance of a good foundation
  • Importance of Taxonomy Structure: mapped to use
  • POC: a head start on development
  • Importance of Text Analytics Vision / Strategy
  • Infrastructure resource, not a project
  • Balance of expertise and local knowledge
  • Importance of Usability for refinement cycles
  • Difference of taxonomy and categorization
  • Concepts vs. text in documents

63
Text Analytics Workshop: Applications
  • Tom Reamy, Chief Knowledge Architect
  • KAPS Group
  • Knowledge Architecture Professional Services
  • http://www.kapsgroup.com

64
Agenda
  • Text Analytics Applications
  • Integration with Search: Faceted Navigation
  • Ontology as one facet
  • Integration with ECM
  • Metadata
  • Auto-categorization
  • Platform for Information Applications
  • Enterprise: internal and external
  • Semantic Applications
  • Structure for Social

65
Text Analytics and Search - Elements
  • Facet: orthogonal dimension of metadata
  • Entity / Noun Phrase: metadata value of a facet
  • Entity / Fact extraction: feeds facets,
    signature, ontologies
  • Taxonomy and categorization rules
  • Auto-categorization: aboutness, subject facets
  • People: tagging, evaluating tags, fine-tune
    rules and taxonomy

66
Essentials of Facets
  • Facets are not categories
  • Categories are what a document is about: limited
    number
  • Entities are contained within a document: any
    number
  • Facets are orthogonal: mutually exclusive
    dimensions
  • An event is not a person is not a document is not
    a place.
  • Facets: variety of units, of structure
  • Numerical range (price), Location: big to small
  • Alphabetical, Hierarchical: taxonomic
  • Facets are designed to be used in combination
  • Wine, where color = red, price = excessive,
    location = California,
  • And sentiment = snotty
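
Combining facets amounts to filtering on several independent dimensions at once. A minimal Python sketch, with made-up wine records and "excessive" price arbitrarily modeled as above 100 (both assumptions for illustration):

```python
# Hypothetical catalog records; every key is one facet.
wines = [
    {"name": "Hypothetical Cabernet", "color": "red", "price": 150,
     "location": "California", "sentiment": "snotty"},
    {"name": "Hypothetical Merlot", "color": "red", "price": 15,
     "location": "California", "sentiment": "friendly"},
    {"name": "Hypothetical Riesling", "color": "white", "price": 160,
     "location": "Germany", "sentiment": "snotty"},
]


def matches(item, facets):
    """True if the item satisfies every facet constraint (exact value or predicate)."""
    for facet, wanted in facets.items():
        value = item.get(facet)
        ok = wanted(value) if callable(wanted) else value == wanted
        if not ok:
            return False
    return True


def facet_filter(items, **facets):
    """Return the items matching all facet constraints, in their original order."""
    return [item for item in items if matches(item, facets)]


result = facet_filter(
    wines,
    color="red",
    location="California",
    sentiment="snotty",
    price=lambda p: p > 100,  # "excessive" is an assumed threshold
)
```

Because each facet is an independent dimension, constraints can be added or dropped in any order, which is what makes the dynamic selection possible.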

67
Advantages of Faceted Navigation
  • More intuitive: easy to guess what is behind
    each door
  • Simplicity of internal organization
  • 20 questions: we know and use
  • Dynamic selection of categories
  • Allow multiple perspectives
  • Ability to Handle Compound Subjects
  • Systematic advantages: fewer elements
  • 4 facets of 10 nodes = 10,000 node taxonomy
  • Flexible: can be combined with other navigation
    elements
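
The "fewer elements" point is simple combinatorics: 4 facets of 10 nodes each need 40 nodes to maintain but cover 10^4 = 10,000 compound subjects. A short Python check, using placeholder facet and node names:

```python
from itertools import product
from math import prod

# Four placeholder facets with ten placeholder nodes each.
facets = {f"facet_{i}": [f"f{i}_node_{j}" for j in range(10)] for i in range(4)}

nodes_to_maintain = sum(len(nodes) for nodes in facets.values())
compound_subjects = prod(len(nodes) for nodes in facets.values())

# Each compound subject is one choice per facet.
first_combination = next(product(*facets.values()))
```

A single pre-coordinated taxonomy would need one node per combination, while the faceted design only maintains the sum of the facet sizes.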

68
Developing Facets, Tools and Techniques: Software
Tools - Entity Extraction
  • Dictionaries: variety of entities, coverage,
    specialty
  • Cost of update: service or in-house
  • 50 predefined entity types
  • 800,000 people, 700,000 locations, 400,000
    organizations
  • Rules
  • Capitalization, text: Mr., Inc.
  • Advanced: proximity and frequency of actions,
    associations
  • Need people to continually refine the rules
  • Entities and Categorization
  • Total number and pattern of entities: a type of
    aboutness of the document - Bar Code, Fingerprint
  • SAS: integration of entities (concepts) and
    categorization

69
Three Environments
  • E-Commerce
  • Catalogs, small uniform collections of entities
  • Uniform behavior: buy this
  • Enterprise
  • More content, more types of content
  • Enterprise Tools: Search, ECM
  • Publishing Process: tagging, metadata standards
  • Internet
  • Wildly different amount and type of content, no
    taggers
  • General Purpose: Flickr, Yahoo
  • Vertical Portal: selected content, no taggers

70
Three Environments: E-Commerce
71
Three Environments: E-Commerce
72
Enterprise Environment: When and how to add metadata
  • Enterprise content: different world than
    eCommerce
  • More content, more kinds, more unstructured
  • Not a catalog to start: less metadata and
    structured content
  • Complexity -- not just content but variety of
    users and activities
  • Combination of human and automatic metadata: ECM
  • Software aided - suggestions, entities,
    ontologies
  • Enterprise: question of balance / strategy
  • More facets = more findability (up to a point)
  • Fewer facets = lower cost to tag documents
  • Issues
  • Not enough facets
  • Wrong set of facets: business, not information
  • Ill-defined facets: too complex internal
    structure

73
Facets and Taxonomies: Enterprise Environment -
Taxonomy, 7 facets
  • Taxonomy of Subjects / Disciplines
  • Science > Marine Science > Marine microbiology >
    Marine toxins
  • Facets
  • Organization > Division > Group
  • Clients > Federal > EPA
  • Instruments > Environmental Testing > Ocean
    Analysis > Vehicle
  • Facilities > Division > Location > Building X
  • Methods > Social > Population Study
  • Materials > Compounds > Chemicals
  • Content Type > Knowledge Asset > Proposals

74
External Environment: Text Mining, Vertical
Portals
  • Internet Content
  • Scale: impacts design and technology, speed of
    indexing
  • Limited control: from association of publishers,
    to selection of content, to none
  • Major subtypes: different rules, metadata and
    results
  • Complex queries and alerts
  • Terrorism taxonomy, geography, people,
    organizations
  • Text Mining
  • General or specific content and facets and
    categories
  • Dedicated tools or component of portal: internal
    or external
  • Vertical Portal
  • Relatively homogeneous content and users
  • General range of questions
  • More specific targets: the document, not a web
    site

75
Internet Design
  • Subject Matter taxonomy: Business Topics
  • Finance > Currency > Exchange Rates
  • Facets
  • Location > Western World > United States
  • People: Alphabetical and/or Topical -
    Organization
  • Organization > Corporation > Car Manufacturing >
    Ford
  • Date: Absolute or range (1-1-01 to 1-1-08, last
    30 days)
  • Publisher: Alphabetical and/or Topical
    Organization
  • Content Type: list - newspapers, financial
    reports, etc.

79
Integrated Facet Application: Design Issues -
General
  • What is the right combination of elements?
  • Faceted navigation, metadata, browse, search,
    categorized search results, file plan
  • What is the right balance of elements?
  • Dominant dimension or equal facets
  • Browse topics and filter by facet
  • When to combine search, topics, and facets?
  • Search first and then filter by topics / facet
  • Browse/facet front end with a search box

80
Integrated Facet Application: Design Issues -
General
  • Homogeneity of Audience and Content
  • Model of the Domain: broad
  • How many facets do you need?
  • More facets and let users decide
  • Allow for customization: can't define a single
    set
  • User Analysis: tasks, labeling, communities
  • Issue: labels that people use to describe their
    business and labels that they use to find
    information
  • Match the structure to domain and task
  • Users can understand different structures

81
Automatic Facets: Special Issues
  • Scale requires more automated solutions
  • More sophisticated rules
  • Rules to find and populate existing metadata
  • Variety of types of existing metadata:
    Publisher, title, date
  • Multiple implementation standards: Last Name,
    First / First Name, Last
  • Issue of disambiguation
  • Same person, different name: Henry Ford, Mr.
    Ford, Henry X. Ford
  • Same word, different entity: Ford and Ford
  • Relationship discovery: Tim is CEO at IBM
  • Anaphoric resolution: Tim is CEO at IBM. He is
    a mean guy.
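
Two of these issues, mixed name formats and "same person, different name", can be sketched in a few lines of Python. The alias table is hypothetical and hand-built here; in practice it would come from an entity catalog, and the sketch does not attempt relationship discovery or anaphora resolution.

```python
# Hypothetical alias table mapping known variants to one canonical person.
ALIASES = {
    "Mr. Ford": "Henry Ford",
    "Henry X. Ford": "Henry Ford",
}


def normalize_name(raw: str) -> str:
    """Collapse whitespace and turn 'Last, First' into 'First Last'."""
    cleaned = " ".join(raw.split())
    if "," in cleaned:
        last, first = [part.strip() for part in cleaned.split(",", 1)]
        return f"{first} {last}"
    return cleaned


def canonical_person(raw: str) -> str:
    """Normalize the format first, then resolve known variants via the alias table."""
    name = normalize_name(raw)
    return ALIASES.get(name, name)
```

With this, "Ford, Henry", "Mr. Ford", and "Henry X. Ford" all resolve to the same facet value, "Henry Ford"; unknown names pass through in normalized form.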

82
Putting it all together: Infrastructure Solution
  • Facets, Taxonomies, Software, People
  • Combine formal power with ability to support
    multiple user perspectives
  • Facet System: interdependent, map of domain
  • Entity extraction: feeds facets, signatures,
    ontologies
  • Taxonomy and Auto-categorization: aboutness,
    subject
  • People: tagging, evaluating tags, fine-tune
    rules and taxonomy
  • The future is the combination of simple facets
    with rich taxonomies with complex semantics /
    ontologies

83
Putting it all together: Infrastructure Solution
  • Integration with ECM
  • Central Team
  • Metadata: create dictionaries of entities
  • Develop text analytics catalogs
  • Publishing Process
  • Software suggests entities, categorization
  • Author's task is simple: yes or no, not think of
    a keyword
  • Enterprise Search
  • Integrate at metadata level: build advanced
    presentation and refine results
  • Beyond Keywords

84
Text Analytics Platform: Multiple Applications
  • Platform for Information Applications
  • Content Aggregation
  • Duplicate documents: save millions!
  • Text Mining: BI, CI, sentiment analysis
  • Social: hybrid folksonomy / taxonomy /
    auto-metadata
  • Social: expertise, categorize tweets and blogs,
    reputation
  • Ontology: travel assistant, SIRI
  • Integrate with Applications
  • Text into data: predictive analytics
  • Use your Imagination!

85
Text Analytics Platform: Social Media
Applications
  • Beyond Sentiment
  • Context with categorization
  • New types of emotion taxonomies
  • Analysis of Conversations
  • Expertise Analysis
  • Business and Customer Intelligence
  • Security: threat detection, level of expertise
  • Behavior Prediction
  • Combine data (buying patterns) with text

86
New Applications in Social Media: Behavior
Prediction - Telecom Customer Service
  • Problem: distinguish customers likely to cancel
    from mere threats
  • Analyze customer support notes
  • General issues: creative spelling, second-hand
    reports
  • Develop categorization rules
  • First: distinguish cancellation calls - not
    simple
  • Second: distinguish cancel what - one line or
    all
  • Third: distinguish real threats

87
New Applications in Social Media: Behavior
Prediction - Telecom Customer Service
  • Basic Rule
  • (START_20, (AND,  
  • (DIST_7,"cancel", "cancel-what-cust"),
  • (NOT,(DIST_10, "cancel", (OR, "one-line",
    "restore", if)))))
  • Examples
  • customer called to say he will cancell his
    account if the does not stop receiving a call
    from the ad agency.
  • cci and is upset that he has the asl charge and
    wants it off or her is going to cancel his act
  • ask about the contract expiration date as she
    wanted to cxl teh acct
  • Combine sophisticated rules with sentiment,
    statistical training, Predictive Analytics, and
    behavior monitoring
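
To make the basic rule concrete, here is a small Python evaluator for nested rules of this shape. The operator semantics are assumptions for illustration, not the vendor's documented behavior: DIST_n is taken to mean "both arguments occur within n words of each other" and START_n "the match occurs within the first n words". The concept word lists (including treating `if` as a term) are hypothetical stand-ins for the real concept definitions.

```python
import re

# Hypothetical concept dictionaries standing in for the named concepts in the rule.
CONCEPTS = {
    "cancel": {"cancel", "cancell", "cxl"},
    "cancel-what-cust": {"account", "acct", "act", "service"},
    "one-line": {"line"},
    "restore": {"restore"},
    "if": {"if"},
}


def tokenize(text):
    return re.findall(r"[a-z0-9']+", text.lower())


def evaluate(rule, tokens):
    """Return (matched, positions) for a rule given as a string or nested tuple."""
    if isinstance(rule, str):
        words = CONCEPTS.get(rule, {rule})
        hits = {i for i, tok in enumerate(tokens) if tok in words}
        return bool(hits), hits
    op, *args = rule
    results = [evaluate(arg, tokens) for arg in args]
    if op == "OR":
        return any(m for m, _ in results), set().union(*(p for _, p in results))
    if op == "AND":
        ok = all(m for m, _ in results)
        return ok, (set().union(*(p for _, p in results)) if ok else set())
    if op == "NOT":
        return (not results[0][0]), set()
    if op.startswith("DIST_"):
        limit = int(op.split("_")[1])
        (_, left), (_, right) = results
        pairs = [(i, j) for i in left for j in right if abs(i - j) <= limit]
        hits = {pos for pair in pairs for pos in pair}
        return bool(hits), hits
    if op.startswith("START_"):
        limit = int(op.split("_")[1])
        matched, pos = results[0]
        early = {i for i in pos if i < limit}
        return (matched and bool(early)), early
    raise ValueError(f"unknown operator: {op}")


def matches(rule, text):
    return evaluate(rule, tokenize(text))[0]


RULE = ("START_20", ("AND",
        ("DIST_7", "cancel", "cancel-what-cust"),
        ("NOT", ("DIST_10", "cancel", ("OR", "one-line", "restore", "if")))))
```

Under these assumptions the first example note (a conditional "will cancell his account if ...") is rejected by the NOT clause, while the third ("wanted to cxl teh acct") matches, because the misspellings are covered by the concept lists.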

88
New Applications: Wisdom of Crowds - Crowd Sourcing
Technical Support
  • Example: Android User Forum
  • Develop a taxonomy of products, features, problem
    areas
  • Develop Categorization Rules
  • "I use the SDK method and it isn't to bad a all.
    I'll get some pics up later, I am still trying to
    get the time to update from fresh 1.0 to 1.1."
  • Find product and feature: forum structure
  • Find problem areas in response, nearby text for
    solution
  • Automatic: simply expose lists of solutions
  • Search-based application
  • Human mediated: experts scan and clean up
    solutions

89
New Directions in Social Media: Text Analytics,
Text Mining, and Predictive Analytics
  • Two Systems of the Brain
  • Fast, System 1, Immediate patterns (TM)
  • Slow, System 2, Conceptual, reasoning (TA)
  • Text Analytics: pre-processing for TM
  • Discover additional structure in unstructured
    text
  • Behavior Prediction: adding depth in individual
    documents
  • New variables for Predictive Analytics, Social
    Media Analytics
  • New dimensions: 90% of information
  • Text Mining for TA: semi-automated taxonomy
    development
  • Bottom up: terms in documents - frequency, date,
    clustering
  • Improve speed and quality: semi-automatic
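
The bottom-up step can be sketched as a document-frequency count that proposes candidate taxonomy terms for a human to review. The stopword list, threshold, and sample documents below are illustrative assumptions; date and clustering signals are left out.

```python
import re
from collections import Counter

STOPWORDS = {"the", "after", "a", "of", "and"}  # tiny illustrative list


def candidate_terms(docs, min_docs=2):
    """Rank terms by how many documents contain them; keep those in >= min_docs."""
    doc_freq = Counter()
    for text in docs:
        tokens = set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS
        doc_freq.update(tokens)
    # Sort by frequency (descending), then alphabetically, for a stable ranking.
    ranked = sorted(doc_freq.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(term, n) for term, n in ranked if n >= min_docs]


sample_docs = [
    "Battery drains after the update",
    "Update fixed the battery issue",
    "Screen flickers after update",
]
candidates = candidate_terms(sample_docs)
```

The output is only a list of suggestions; deciding which terms become taxonomy nodes or rule terms stays with the taxonomist, which is the "semi-automatic" part.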

90
Questions?
  • Tom Reamy, tomr@kapsgroup.com
  • KAPS Group
  • Knowledge Architecture Professional Services
  • http://www.kapsgroup.com

91
Resources
  • Conferences
  • Text Analytics World: all aspects of text
    analytics
  • Call for Speakers: Oct 3-4, Boston
  • Text Analytics Summit: social media focus
  • LinkedIn Groups
  • Text Analytics World
  • Text Analytics Group
  • Data and Text Professionals
  • Sentiment Analysis
  • Metadata Management
  • Semantic Technologies

92
Resources
  • Books
  • Women, Fire, and Dangerous Things
  • George Lakoff
  • Knowledge, Concepts, and Categories
  • Koen Lamberts and David Shanks
  • The Stuff of Thought, Steven Pinker
  • Journals
  • Academic: Cognitive Science, Linguistics, NLP
  • Applied: Scientific American Mind, New Scientist