Title: Knowledge Management Systems: Development and Applications Part I: Overview and Related Fields
1Knowledge Management Systems: Development and Applications
Part I: Overview and Related Fields
Hsinchun Chen, Ph.D.
McClelland Professor; Director, Artificial Intelligence Lab, The University of Arizona
Founder, Knowledge Computing Corporation
Acknowledgement: NSF DLI1, DLI2, NSDL, DG, ITR, IDM, CSS, NIH/NLM, NCI, NIJ, CIA, DHS, NCSA, HP, SAP
2- My Background (A Mixed Bag!)
- BS, NCTU, Management Science, 1981
- MBA, SUNY Buffalo, Finance, MS, MIS
- Ph.D., NYU, Information System, Minor CS, 1989
- Dissertation: An AI Approach to the Design of Online Information Retrieval Systems (GEAC Online Cataloging System)
- Assistant/Associate/Full/Chair Professor, University of Arizona, MIS Department
- Scientific Counselor, National Library of Medicine (USA), National Library of China, Academia Sinica
3- My Background (A Mixed Bag!)
- Founder/Director, Artificial Intelligence Lab, 1990
- Founder/Director, Hoffman eCommerce Lab, 2000
- PI: NSF CISE DLI-1, DLI-2, NSDL, DG, DARPA, NIJ, NIH, CIA, DHS
- Associate Editor: JASIST, DSS, ACM TOIS, IEEE SMC, IEEE ITS
- Conference/program Co-chair: ICADL 1998-2004, China DL 2002/2004, NSF/NIJ ISI 2003-2006, JCDL 2004
- Industry Consulting: HP, IBM, AT&T, SGI, Microsoft, SAP
- Founder, Knowledge Computing Corporation, 2000
4Knowledge Management Overview
5- Knowledge Management Overview
- What is Knowledge Management
- Data, Information, and Knowledge
- Why Knowledge Management?
- Knowledge Management Processes
6Unit of Analysis
- Data: 1980s
- Factual
- Structured, numeric
- Examples: Oracle, Sybase, DB2
- Information: 1990s
- Factual
- Unstructured, textual
- Examples: Yahoo!, Excalibur, Verity, Documentum
- Knowledge: 2000s
- Inferential, sensemaking, decision making
- Multimedia
- Examples: ???
7Data, Information and Knowledge
- According to Alter (1996), Tobin (1996), and Beckman (1999):
- Data: Facts, images, or sounds (+ interpretation + meaning =)
- Information: Formatted, filtered, and summarized data (+ action + application =)
- Knowledge: Instincts, ideas, rules, and procedures that guide actions and decisions
8Application and Societal Relevance
- Ontologies, hierarchies, and subject headings
- Knowledge management systems and practices: knowledge maps
- Digital libraries, search engines, web mining, text mining, data mining, CRM, eCommerce
- Semantic web, multilingual web, multimedia web, and wireless web
9The Third Wave of Net Evolution
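- ARPANET: Function: Server Access; Unit: Server; Example: Email; Company: IBM
- Internet: Function: Info Access; Unit: File/Homepage; Example: WWW ("World Wide Wait"); Company: Microsoft/Netscape
- Semantic Web: Function: Knowledge Access; Unit: Concepts; Example: Concept Protocols; Company: ???
- Timeline markers: 1965, 1975, 1985, 1995, 2000, 2010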
10Knowledge Management Definition
The system and managerial approach to
collecting, processing, and organizing
enterprise-specific knowledge assets for business
functions and decision making.
11Knowledge Management Challenges
- Making high-value corporate information and knowledge easily available to support decision making at the lowest, broadest possible levels
- Personnel Turn-over
- Organizational Resistance
- Manual Top-down Knowledge Creation
- Information Overload
12Knowledge Management Landscape
- Research Community
- NSF / DARPA / NASA, Digital Library Initiative I & II, NSDL ($120M)
- NSF, Digital Government Initiative ($60M)
- NSF, Knowledge Networking Initiative ($50M)
- NSF, Information Technology Research ($300M)
- Business Community
- Intellectual Capital, Corporate Memory, Knowledge Chain, Competitive Intelligence
13Knowledge Management Foundations
- Enabling Technologies
- Information Retrieval (Excalibur, Verity, Oracle Context)
- Electronic Document Management (Documentum, PC DOCS)
- Internet/Intranet (Yahoo!, Google)
- Groupware (Lotus Notes, MS Exchange)
- Consulting and System Integration
- Best practices, human resources, organizational development, performance metrics, methodology, framework, ontology (Delphi, E&Y, Arthur Andersen, AMS, KPMG)
14Knowledge Management Perspectives
- Process perspective (management and behavior): consulting practices, methodology, best practices, e-learning, culture/reward, existing IT → new information, old IT, new but manual process
- Information perspective (information and library sciences): content management, manual ontologies → new information, manual process
- Knowledge Computing perspective (text mining, artificial intelligence): automated knowledge extraction, thesauri, knowledge maps → new IT, new knowledge, automated process
15KM Perspectives
16KM, Emergence of a Discipline (Ponzi, 2004)
- Influences from three disciplines: Management and Policy (40%), Computer Science (30%), Information/Library Science (20%)
- Continuous, steady growth since 1990 in academic publications and industry articles; not a fad (unlike BPR, TQM)
- Seminal books and articles in Knowledge Management (e.g., Drucker, Davenport, Nonaka): the 50 most-cited KM articles
17KM Thoughts and Thinkers
- Future organizations are information-based organizations of knowledge workers: specialization, cross-discipline task teams, disappearance of middle managers (Drucker, "The Coming of the New Organization")
- The Japanese management style: tacit knowledge, redundancy, slogans, metaphors; the "Ba"; the SECI Model: Socialization, Externalization, Combination, and Internalization (Nonaka, "The Knowledge-Creating Company")
18KM Thoughts and Thinkers (cont'd)
- Knowledge generation (acquisition, dedicated resources, fusion, adaptation, knowledge networking); knowledge codification (mapping and modeling knowledge); knowledge transfer; technologies for KM; learning from experiments (Davenport, "Working Knowledge")
- Deep Smarts: seeing the big picture and knowing the skills; learning from experience (Leonard, "Deep Smarts")
19KM Thoughts and Thinkers (cont'd)
- Teaching smart people how to learn: defensive reasoning and the doom loop; learning how to reason productively (Argyris, "Teaching Smart People How to Learn")
- Technology gets in the way: research on work practices; harvesting local innovation and innovating with the customer; PARC anthropologists (John Seely Brown, "Research That Reinvents the Corporation")
- Inverting organizations (individual professionals leading); creating intellectual webs (Quinn, "Managing Professional Intellect")
20Knowledge Management: The Industry and Status
21- Andersen Consulting (Accenture)
- (1) Acquire
- (2) Create
- (3) Synthesize
- (4) Share
- (5) Use to Achieve Organizational Goals
- (6) Environment Conducive to Knowledge Sharing
22- Ernst & Young
- (1) Knowledge Generation
- (2) Knowledge Representation
- (3) Knowledge Codification
- (4) Knowledge Application
23Reason for Adopting KM
- Retain expertise of personnel: 51.9%
- Increase customer satisfaction: 43.1%
- Improve profits, grow revenues: 37.5%
- Support e-business initiatives: 24.7%
- Shorten product development cycles: 23%
- Provide project workspace: 11.7%
Source: Knowledge Management and IDC, May 2001
24Business Uses Of KM Initiative
- Capture and share best practices: 77.7%
- Provide training, corporate learning: 62.4%
- Manage customer relationships: 58%
- Deliver competitive intelligence: 55.7%
- Provide project workspace: 31.4%
- Manage legal, intellectual property: 31.4%
25Leader Of KM Initiative
Source: Knowledge Management and IDC, May 2001
26Implementation Challenges
- Employees have no time for KM: 41%
- Current culture does not encourage sharing: 36.6%
- Lack of understanding of KM and benefits: 29.5%
- Inability to measure financial benefits of KM: 24.5%
- Lack of skill in KM techniques: 22.7%
- Organization's processes are not designed for KM: 22.2%
27Implementation Challenges (cont'd)
- Lack of funding for KM: 21.8%
- Lack of incentives, rewards to share: 19.9%
- Have not yet begun implementing KM: 18.7%
- Lack of appropriate technology: 17.4%
- Lack of commitment from senior management: 13.9%
- No challenges encountered: 4.3%
Source: Knowledge Management and IDC, May 2001
28Types of Software Purchased
- Messaging, e-mail: 44.7%
- Knowledge base, repository: 40.7%
- Document management: 39.2%
- Data warehousing: 34.6%
- Groupware: 33.1%
- Search engines: 32.3%
29Types of Software Purchased (cont'd)
- Web-based training: 23.8%
- Workflow: 23.8%
- Enterprise information portal: 23.2%
- Business rules management: 11.6%
Source: Knowledge Management and IDC, May 2001
30Spending On IT Services For KM
- Consulting, planning: 27.8%
- Implementation: 27%
- Training: 15.3%
- Operations, outsourcing: 15.3%
- Maintenance: 13.7%
Source: Knowledge Management and IDC, May 2001
31Software Budget Allotments
- Enterprise information portal: 35.6%
- Document management: 26.2%
- Groupware: 24.4%
- Workflow: 22.9%
- Data warehousing: 19.3%
- Search engines: 13.0%
32Software Budget Allotments (cont'd)
- Web-based training: 11.4%
- Messaging, e-mail: 10.8%
- Other: 29.2%
Source: Knowledge Management and IDC, May 2001
33Knowledge Management Systems Overview
34- Knowledge Management Systems (KMS)
- Characteristics of KMS
- The Industry and the Market
- Major Vendors and Systems
35Knowledge Management Systems Definition
- KMSs are computer-based information systems that:
- can help an enterprise acquire, manage, retain, analyze, and retrieve mission-critical information and help turn enterprise information into well-organized, abstract, and actionable knowledge; and
- can help an enterprise identify and inter-connect experts, managers, and knowledge workers and help extract, retain, and disseminate their knowledge in an organization.
36KM Architecture (Source: GartnerGroup)
[Figure: layered architecture diagram. Labels: Web UI, Web Browser; Knowledge Maps, Enterprise Knowledge Architecture; Knowledge Retrieval (Conceptual, Physical), KR Functions; Text and Database Drivers; Application Index, Database Indexes, Text Indexes; Workgroup Applications, Databases, Applications; Distributed Object Models; Intranet and Extranet; Network Services; Platform Services]
37Knowledge Retrieval Level (Source: GartnerGroup)
[Figure: retrieved knowledge by retrieval level. Level labels: Concept ("Yellow Pages"), Semantic, Collaboration, Value ("Recommendation")]
- Clustering: categorization ("table of contents")
- Semantic networks ("index")
- Dictionaries
- Thesauri
- Linguistic analysis
- Data extraction
- Collaborative filters
- Communities
- Trusted advisor
- Expert identification
38Knowledge Retrieval Vendor Direction (Source: GartnerGroup)
[Figure: vendor positioning chart. Axes: Technology Innovation, Content Experience. Region labels: Knowledge Retrieval (market target), Newbies, IR Leaders, Niche Players, Not yet marketed]
- Vendors shown: grapeVINE, Sovereign Hill, CompassWare, Intraspect, KnowledgeX, WiseWire, Lycos, Autonomy, Perspecta, Verity, Fulcrum, Excalibur, Dataware, IDI, Oracle, Open Text, Folio, IBM, InText, PCDOCS, Documentum, Lotus, Netscape, Microsoft
39[Figure: quadrant chart. Axes: Ability to Execute, Completeness of Vision. Quadrants: Challengers, Leaders, Niche Players, Visionaries]
- Vendors shown: Lotus, Microsoft, Dataware, Autonomy, Verity, IBM, Excalibur, Netscape, Documentum, PCDOCS/Fulcrum, IDI, Inference, OpenText, Lycos/InMagic, CompassWare, GrapeVINE, KnowledgeX, InXight, WiseWire, SovereignHill, Semio, Intraspect
40Two Approaches to Codify Knowledge
Top-Down Approach
- Structured
- Manual
- Human-driven
Bottom-Up Approach
- Unstructured
- System-aided
- Data/Info-driven
41- Sample KMS
- Search Engine and Web Portal
- Data Mining
- Text Mining
- Web Mining
42Managing Information: Search Engine and Web Portal (Source: Jan Peterson and William Chang, Excite)
43Basic Architectures: Search
[Figure: search engine architecture. Components: Web, Spider, Index, SE (three instances), Browser, Log. Annotations: 800M pages?, Freshness, Spam, Quality results, 24x7, 20M queries/day]
44Basic Architectures: Directory
[Figure: directory architecture. Components: Web, Url submission, Surfing, Ontology, Reviewed Urls, SE (three instances), Browser]
45Spidering
- Web HTML data
- Hyperlinked
- Directed, disconnected graph
- Dynamic and static data
- Estimated 2 billion indexible pages
- Freshness
- How often are pages revisited?
46Indexing
- Size
- from 50M to 150M to 3B urls
- 50% to 100% indexing overhead
- 200 to 400GB indices
- Representation
- Fields, meta-tags and content
- NLP: stemming?
47Search
- Augmented Vector-space
- Ranked results with Boolean filtering
- Quality-based re-ranking
- Based on hyperlink data
- or user behavior
- Spam
- Manipulation of content to improve placement
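As a rough illustration of "ranked results with Boolean filtering", the sketch below first discards documents that lack a required term and then ranks the rest by TF-IDF cosine similarity. The toy documents, the function names, and the particular TF-IDF weighting are assumptions made for illustration; the slides do not say which scoring formula any engine used.

```python
import math
from collections import Counter

def tokenize(text):
    # Lowercase whitespace tokenization; no stemming (kept simple on purpose).
    return text.lower().split()

def build_index(docs):
    # Per-document term counts plus a smoothed inverse document frequency.
    counts = [Counter(tokenize(d)) for d in docs]
    n = len(docs)
    df = Counter(term for c in counts for term in c)
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}
    return counts, idf

def cosine(vec_a, vec_b):
    dot = sum(w * vec_b.get(t, 0.0) for t, w in vec_a.items())
    norm_a = math.sqrt(sum(w * w for w in vec_a.values()))
    norm_b = math.sqrt(sum(w * w for w in vec_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def search(query, docs, required=()):
    counts, idf = build_index(docs)
    q_vec = {t: c * idf.get(t, 0.0) for t, c in Counter(tokenize(query)).items()}
    results = []
    for doc_id, c in enumerate(counts):
        # Boolean filter: every required term must occur in the document.
        if any(term not in c for term in required):
            continue
        d_vec = {t: tf * idf[t] for t, tf in c.items()}
        results.append((cosine(q_vec, d_vec), doc_id))
    # Highest score first; ties broken by lower document id.
    return sorted(results, key=lambda r: (-r[0], r[1]))

docs = [
    "knowledge management systems",
    "web search engine ranking",
    "search engine spam and ranking quality",
]
ranked = search("search ranking", docs, required=("spam",))
```

Here `ranked` holds only the third document, because it is the only one containing the required term. A quality-based re-ranking step, as mentioned on the slide, would adjust these scores afterwards using link or click data.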
48Queries
- Short expressions of information need
- 2.3 words on average
- Relevance overload is a key issue
- Users typically only view top results
- Search is a high volume business
- Yahoo! 50M queries/day
- Excite 30M queries/day
- Infoseek 15M queries/day
49Alta Vista: within-site search, machine translation
50Directory
- Manual categorization and rating
- Labor intensive
- 20 to 50 editors
- High quality, but low coverage
- 200-500K urls
- Browsable ontology
- Open Directory is a distributed solution
51Yahoo: manual ontology (200 ontologists)
52Special Collections
- Newswire
- Newsgroups
- Specialized services (Deja)
- Information extraction
- Shopping catalog
- Events recipes, etc.
53The Hidden Web
- Non-indexible content
- Behind passwords, firewalls
- Dynamic content
- Often searchable through local interface
- Network of distributed search resources
- How to access?
- Ask Jeeves!
54The Role of NLP
- Many Search Engines do not stem
- Precision bias suggests conservative term treatment
- What about non-English documents?
- N-grams are popular for Chinese
- Language ID anyone?
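To make the n-gram point concrete: because Chinese text has no spaces between words, a common workaround is to index overlapping character n-grams instead of segmented words. The sketch below is a minimal version of that idea; the function name and the sample string (Chinese for "knowledge management system") are illustrative assumptions, not taken from the slides.

```python
def char_ngrams(text, n=2):
    # Drop whitespace, then slide a window of n characters over the text.
    chars = [ch for ch in text if not ch.isspace()]
    return ["".join(chars[i:i + n]) for i in range(len(chars) - n + 1)]

bigrams = char_ngrams("知识管理系统")
```

Each bigram then serves as an index term, so a query is matched by the bigrams it shares with a document, without needing a word segmenter.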
55Link Analysis
- Authors vote via links
- Pages with more inlinks are higher quality
- Not all links are equal
- Links from higher quality sites are better
- Links in context are better
- Resistant to Spam
- Only cross-site links considered
56Page Rank (Page98)
- Limiting distribution of a random walk
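- Jump to a random page with Prob. $\epsilon$
- Follow a link with Prob. $1 - \epsilon$
- Probability of landing at a page D:
- $P(D) = \epsilon / T + (1 - \epsilon) \sum_{C \to D} P(C) / L(C)$
- Sum over pages C leading to D
- T = total number of pages; L(C) = number of links on page C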
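The random-walk description above can be sketched as a simple power iteration. This is a minimal illustration, not the production algorithm of any search engine: the toy link graph, the jump probability of 0.15 (called `epsilon` in the code), the fixed iteration count, and the handling of pages without outgoing links are all assumptions.

```python
def pagerank(links, epsilon=0.15, iterations=50):
    # links maps each page to the list of pages it links to.
    pages = sorted(set(links) | {d for targets in links.values() for d in targets})
    total = len(pages)
    rank = {p: 1.0 / total for p in pages}
    for _ in range(iterations):
        # Random-jump term: every page receives epsilon / T.
        new_rank = {p: epsilon / total for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Link-following term: split this page's rank over its outlinks.
                share = (1.0 - epsilon) * rank[page] / len(targets)
                for dest in targets:
                    new_rank[dest] += share
            else:
                # Dangling page: spread its rank evenly (an assumption; the slide does not cover this case).
                share = (1.0 - epsilon) * rank[page] / total
                for dest in pages:
                    new_rank[dest] += share
        rank = new_rank
    return rank

toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(toy_web)
```

In this toy graph page C collects links from A, B, and D and ends up with the highest score, while D, which nobody links to, keeps only the random-jump share `epsilon / T`.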
57Who asks What?
- Query logs revisited
- Query-based indexing: why index things people don't ask for?
- If they ask for A, give them B
- From atomic concepts to query extensions
- Structure of questions and answers
- Shyam Kapur's chunks
58Futures
- Vertical markets: healthcare, real estate, jobs and resumes, etc.
- Localized search
- Search as embedded app
- Shopping 'bots
- Open Problems
- Has the bubble burst?
59From SE to Web Portal
- Spidering: Intranet and Internet crawling
- Integration: legacy systems and databases
- Content: aggregation and conversion
- Process: collaboration, chat, workflow management, calendaring, and such
- Analysis: data and text mining, agent/alert, web mining
60Discovering Knowledge: Data Mining (Source: Michael Welge, Automated Learning Group, NCSA)
61Why Data Mining? -- Potential Applications
- Database analysis, decision support, and automation
- Market and Sales Analysis
- Fraud Detection
- Manufacturing Process Analysis
- Risk Analysis and Management
- Experimental Results Analysis
- Scientific Data Analysis
- Text Document Analysis
62Data Mining: Confluence of Multiple Disciplines
- Database Systems, Data Warehouses, and OLAP
- Machine Learning
- Statistics
- Mathematical Programming
- Visualization
- High Performance Computing
63Data Mining: A KDD Process
64Required Effort for Each KDD Step
65Data Mining Models and Methods
66Deviation Detection
- Identify outliers in a dataset.
- Typical techniques: OLAP charting, probability distribution contrasts, regression analysis, discriminant analysis
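A very small example of outlier identification is shown below. It uses a plain z-score rule, which is a simpler stand-in for the techniques listed on the slide rather than one of them; the threshold of 2.0, the function name, and the sample readings are assumptions chosen for illustration.

```python
import statistics

def zscore_outliers(values, threshold=2.0):
    # Flag values whose distance from the mean exceeds `threshold` standard deviations.
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []
    return [v for v in values if abs(v - mean) / stdev > threshold]

readings = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 25.0]
outliers = zscore_outliers(readings)
```

With these readings only 25.0 is flagged. On small samples a single extreme value inflates the standard deviation, so the threshold usually needs tuning or a more robust statistic.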
67Link Analysis (Rule Association)
- Given a database, find all associations of the form:
- IF <LHS> THEN <RHS>
- Prevalence: frequency of the LHS and RHS occurring together
- Predictability: fraction of the RHS out of all items with the LHS
- e.g., Beer and diaper
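The two rule measures defined above, prevalence and predictability (more commonly called support and confidence), can be computed directly from a list of transactions. The sketch below does this for a single rule; the basket contents and the function name are made up for illustration.

```python
def rule_stats(transactions, lhs, rhs):
    # Returns (prevalence, predictability) for the rule IF lhs THEN rhs.
    lhs, rhs = set(lhs), set(rhs)
    with_lhs = [t for t in transactions if lhs <= set(t)]
    with_both = [t for t in with_lhs if rhs <= set(t)]
    # Prevalence: share of all transactions containing both sides of the rule.
    prevalence = len(with_both) / len(transactions)
    # Predictability: share of LHS transactions that also contain the RHS.
    predictability = len(with_both) / len(with_lhs) if with_lhs else 0.0
    return prevalence, predictability

baskets = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"beer", "bread"},
    {"diapers", "milk"},
    {"milk", "bread"},
]
prevalence, predictability = rule_stats(baskets, {"beer"}, {"diapers"})
```

For the rule IF beer THEN diapers, two of the five baskets contain both items and two of the three beer baskets contain diapers. A full association miner would enumerate candidate rules and keep those above chosen thresholds on both measures.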
68Database Segmentation
- Regroup datasets into clusters that share common characteristics.
- Typical techniques: hierarchical clustering, neural network clustering (SOM), k-means
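Of the techniques listed, k-means is the easiest to sketch in a few lines. The version below is a minimal illustration under simplifying assumptions: centroids start at the first k points instead of random positions, the loop runs a fixed number of iterations, and the two-dimensional customer data is invented.

```python
def kmeans(points, k, iterations=20):
    # Initialize centroids with the first k points (a simplification; random starts are more common).
    centroids = [tuple(p) for p in points[:k]]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        for i, p in enumerate(points):
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            assignments[i] = distances.index(min(distances))
        # Update step: move each centroid to the mean of its member points.
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:
                centroids[j] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, assignments

customers = [(1.0, 1.0), (9.0, 9.0), (1.2, 0.8), (8.8, 9.2), (0.9, 1.1), (9.1, 8.9)]
centroids, labels = kmeans(customers, k=2)
```

The six points fall into two clearly separated groups, so the labels alternate between cluster 0 and cluster 1. On real data the result depends on the starting centroids, which is why multiple random restarts are typical.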
69Predictive Modeling
- Use past data to predict future response and behavior.
- Typical technique: supervised learning (Neural Networks, Decision Trees, Naïve Bayesian)
- E.g., Who is most likely to respond to a direct mailing?
70Data/Information Visualization
- Gain insight into the contents and complexity of the database being analyzed
- Vast amounts of under-utilized data
- Time-critical decisions hampered
- Key information difficult to find
- Results presentation
- Reduced perceptual, interpretative, cognitive burden
71Rule Association - Basket Analysis
72Text Mining Visualization
This data is considered to be confidential and
proprietary to Caterpillar and may only be used
with prior written consent from Caterpillar.
73Decision Tree Visualizer
74From Data Mining to Text Mining
- Techniques: linguistic analysis, clustering, unsupervised learning, case-based reasoning
- Ontologies: XML/RDF, content management
- P1000: A picture is worth 1000 words
- Formats/types: email, reports, web pages, etc.
- Integration: KMS and IT infrastructure
- Cultural: rewards and unintended consequences