
ADVANCED ALGORITHMS IN COMPUTATIONAL BIOLOGY (C3)

- This is part of your course; I will cover the first two weeks:
- 2012/02/24 DATABASES: AN OVERVIEW
- 2012/03/02 INTRODUCTION TO DATA MINING

Class Info

- Lecturer: Chi-Yao Tseng
- cytseng@citi.sinica.edu.tw
- Grading
- No assignments
- Midterm: 2012/04/20
- I'm in charge of 17x2 points out of 120
- No take-home questions

Outline

- Introduction
- From data warehousing to data mining
- Mining Capabilities
- Association rules
- Classification
- Clustering
- More about Data Mining

Main Reference

- Jiawei Han, Micheline Kamber, Data Mining: Concepts and Techniques, 2nd Edition, Morgan Kaufmann, 2006.
- Official website: http://www.cs.uiuc.edu/homes/hanj/bk2/

Why Data Mining?

- The explosive growth of data: from terabytes to petabytes (10^15 B = 1 million GB)
- Data collection and data availability
- Automated data collection tools, database systems, Web, computerized society
- Major sources of abundant data
- Business: Web, e-commerce, transactions, stocks, ...
- Science: remote sensing, bioinformatics, scientific simulation, ...
- Society and everyone: news, digital cameras, YouTube, Facebook, ...
- We are drowning in data, but starving for knowledge!
- "Necessity is the mother of invention" -- data mining: automated analysis of massive data sets

Why Not Traditional Data Analysis?

- Tremendous amount of data
- Algorithms must be highly scalable to handle data on the order of terabytes
- High dimensionality of data
- Micro-arrays may have tens of thousands of dimensions
- High complexity of data
- New and sophisticated applications

Evolution of Database Technology

- 1960s
- Data collection, database creation, IMS and network DBMS
- 1970s
- Relational data model, relational DBMS implementation
- 1980s
- RDBMS, advanced data models (extended-relational, OO, deductive, etc.)
- Application-oriented DBMS (spatial, scientific, engineering, etc.)
- 1990s
- Data mining, data warehousing, multimedia databases, and Web databases
- 2000s
- Stream data management and mining
- Data mining and its applications
- Web technology (XML, data integration) and global information systems

What is Data Mining?

- Knowledge discovery in databases
- Extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) patterns or knowledge from huge amounts of data
- Alternative names
- Knowledge discovery (mining) in databases (KDD), knowledge extraction, data/pattern analysis, data archeology, data dredging, information harvesting, business intelligence, etc.

Data Mining: On What Kinds of Data?

- Database-oriented data sets and applications
- Relational database, data warehouse, transactional database
- Advanced data sets and advanced applications
- Data streams and sensor data
- Time-series data, temporal data, sequence data (incl. bio-sequences)
- Structured data, graphs, social networks and multi-linked data
- Object-relational databases
- Heterogeneous databases and legacy databases
- Spatial data and spatiotemporal data
- Multimedia databases
- Text databases
- The World-Wide Web

Knowledge Discovery (KDD) Process

[Figure: the KDD pipeline -- Databases --(data cleaning / integration)--> Data Warehouse --(selection / transformation)--> Transformed Data --(data mining)--> Patterns --(interpretation / evaluation)--> Knowledge!]

- This is a view from typical database systems and data warehousing communities.
- Data mining plays an essential role in the knowledge discovery process.

Data Mining and Business Intelligence

Increasing potential to support business decisions (bottom to top):

- End User: Decision Making
- Business Analyst: Data Presentation (Visualization Techniques)
- Data Analyst: Data Mining (Information Discovery)
- Data Exploration: Statistical Summary, Querying, and Reporting
- DBA: Data Preprocessing/Integration, Data Warehouses
- Data Sources: Paper, Files, Web documents, Scientific experiments, Database Systems

Data Mining Confluence of Multiple Disciplines

Typical Data Mining System

[Architecture, top to bottom:]

- Graphical User Interface
- Pattern Evaluation <-> Knowledge Base
- Data Mining Engine
- Database or Data Warehouse Server (data cleaning, integration, and selection)
- Data sources: Database, Data Warehouse, World-Wide Web, other info. repositories

Data Warehousing

- "A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management's decision making process." -- W. H. Inmon

Data Warehousing

- Subject-oriented
- Provide a simple and concise view around particular subject issues by excluding data that are not useful in the decision support process.
- Integrated
- Constructed by integrating multiple, heterogeneous data sources.
- Time-variant
- Provide information from a historical perspective (e.g., past 5-10 years.)
- Nonvolatile
- Operational update of data does not occur in the data warehouse environment
- Usually requires only two operations: load data and access data.

Data Warehousing

- The process of constructing and using data warehouses
- A decision support database that is maintained separately from the organization's operational database
- Support information processing by providing a solid platform of consolidated, historical data for analysis
- Set up stages for effective data mining

Illustration of Data Warehousing

[Figure: data sources in Taipei, New York, London, ... --(clean, transform, integrate, load)--> Data Warehouse --> query and analysis tools --> clients]

OLTP vs. OLAP

- OLTP (On-line Transaction Processing): short online transactions (update, insert, delete) over current, detailed data; versatile
- OLAP (On-line Analytical Processing): complex queries over aggregated, historical data in the data warehouse; static and low volume
- OLAP supports analytics, data mining, and decision making

Multi-Dimensional View of Data Mining

- Data to be mined
- Relational, data warehouse, transactional, stream, object-oriented/relational, active, spatial, time-series, text, multi-media, heterogeneous, legacy, WWW
- Knowledge to be mined
- Characterization, discrimination, association, classification, clustering, trend/deviation, outlier analysis, etc.
- Multiple/integrated functions and mining at multiple levels
- Techniques utilized
- Database-oriented, data warehouse (OLAP), machine learning, statistics, visualization, etc.
- Applications adapted
- Retail, telecommunication, banking, fraud analysis, bio-data mining, stock market analysis, text mining, Web mining, etc.

Mining Capabilities (1/4)

- Multi-dimensional concept description: characterization and discrimination
- Generalize, summarize, and contrast data characteristics, e.g., dry vs. wet regions
- Frequent patterns (or frequent itemsets), association
- Diaper → Beer [0.5%, 75%] (support, confidence)

Mining Capabilities (2/4)

- Classification and prediction
- Construct models (functions) that describe and distinguish classes or concepts for future prediction
- E.g., classify countries based on (climate), or classify cars based on (gas mileage)
- Predict some unknown or missing numerical values

Mining Capabilities (3/4)

- Clustering
- Class label is unknown: group data to form new categories (i.e., clusters), e.g., cluster houses to find distribution patterns
- Maximize intra-class similarity; minimize inter-class similarity
- Outlier analysis
- Outlier: a data object that does not comply with the general behavior of the data
- Noise or exception? Useful in fraud detection, rare events analysis

Mining Capabilities (4/4)

- Time and ordering, trend and evolution analysis
- Trend and deviation: e.g., regression analysis
- Sequential pattern mining: e.g., digital camera → large SD memory card
- Periodicity analysis
- Motifs and biological sequence analysis
- Approximate and consecutive motifs
- Similarity-based analysis

More Advanced Mining Techniques

- Data stream mining
- Mining data that is ordered, time-varying, potentially infinite
- Graph mining
- Finding frequent subgraphs (e.g., chemical compounds), trees (XML), substructures (web fragments)
- Information network analysis
- Social networks: actors (objects, nodes) and relationships (edges)
- e.g., author networks in CS, terrorist networks
- Multiple heterogeneous networks
- A person could be in multiple information networks: friends, family, classmates, ...
- Links carry a lot of semantic information: link mining
- Web mining
- The Web is a big information network: from PageRank to Google
- Analysis of Web information networks
- Web community discovery, opinion mining, usage mining, ...

Challenges for Data Mining

- Handling of different types of data
- Efficiency and scalability of mining algorithms
- Usefulness and certainty of mining results
- Expression of various kinds of mining results
- Interactive mining at multiple abstraction levels
- Mining information from different sources of data
- Protection of privacy and data security

Brief Summary

- Data mining: discovering interesting patterns and knowledge from massive amounts of data
- A natural evolution of database technology, in great demand, with wide applications
- A KDD process includes data cleaning, data integration, data selection, transformation, data mining, pattern evaluation, and knowledge presentation
- Mining can be performed on a variety of data
- Data mining functionalities: characterization, discrimination, association, classification, clustering, outlier and trend analysis, etc.

A Brief History of Data Mining Society

- 1989 IJCAI Workshop on Knowledge Discovery in Databases
- Knowledge Discovery in Databases (G. Piatetsky-Shapiro and W. Frawley, 1991)
- 1991-1994 Workshops on Knowledge Discovery in Databases
- Advances in Knowledge Discovery and Data Mining (U. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, 1996)
- 1995-1998 International Conferences on Knowledge Discovery in Databases and Data Mining (KDD'95-98)
- Journal of Data Mining and Knowledge Discovery (1997)
- ACM SIGKDD conferences since 1998 and SIGKDD Explorations
- More conferences on data mining
- PAKDD (1997), PKDD (1997), SIAM-Data Mining (2001), (IEEE) ICDM (2001), etc.
- ACM Transactions on KDD starting in 2007

More details here: http://www.kdnuggets.com/gpspubs/sigkdd-explorations-kdd-10-years.html

Conferences and Journals on Data Mining

- KDD Conferences
- ACM SIGKDD Int. Conf. on Knowledge Discovery in Databases and Data Mining (KDD)
- SIAM Data Mining Conf. (SDM)
- (IEEE) Int. Conf. on Data Mining (ICDM)
- European Conf. on Machine Learning and Principles and Practices of Knowledge Discovery and Data Mining (ECML-PKDD)
- Pacific-Asia Conf. on Knowledge Discovery and Data Mining (PAKDD)
- Int. Conf. on Web Search and Data Mining (WSDM)
- Other related conferences
- DB: ACM SIGMOD, VLDB, ICDE, EDBT, ICDT
- Web & IR: CIKM, WWW, SIGIR
- ML & PR: ICML, CVPR, NIPS
- Journals
- Data Mining and Knowledge Discovery (DAMI or DMKD)
- IEEE Trans. on Knowledge and Data Eng. (TKDE)
- KDD Explorations
- ACM Trans. on KDD

CAPABILITIES OF DATA MINING

FREQUENT PATTERNS & ASSOCIATION RULES

Basic Concepts

- Frequent pattern: a pattern (a set of items, subsequences, substructures, etc.) that occurs frequently in a data set
- First proposed by Agrawal, Imielinski, and Swami [AIS93] in the context of frequent itemsets and association rule mining
- Motivation: finding inherent regularities in data
- What products were often purchased together? Beer and diapers?!
- What are the subsequent purchases after buying a PC?
- What kinds of DNA are sensitive to this new drug?
- Can we automatically classify web documents?
- Applications
- Basket data analysis, cross-marketing, catalog design, sale campaign analysis, Web log (click stream) analysis, and DNA sequence analysis

Mining Association Rules

- Transaction data analysis. Given:
- A database of transactions (each tx has a list of items purchased)
- Minimum confidence and minimum support
- Find all association rules: the presence of one set of items implies the presence of another set of items

Diaper → Beer [0.5%, 75%] (support, confidence)

Two Parameters

- Confidence (how true is the rule)
- The rule X ∧ Y → Z has 90% confidence: 90% of the customers who bought X and Y also bought Z.
- Support (how useful is the rule)
- Useful rules should have some minimum transaction support.

Mining Strong Association Rules in Transaction Databases (1/2)

- Measurement of rule strength in a transaction database:

A → B [support, confidence]

Mining Strong Association Rules in Transaction Databases (2/2)

- We are often interested in only strong associations, i.e.,
- support ≥ min_sup
- confidence ≥ min_conf
- Examples
- milk → bread [5%, 60%]
- tire and auto_accessories → auto_services [2%, 80%]

Example of Association Rules

Transaction-id | Items bought
1 | A, B, D
2 | A, C, D
3 | A, D, E
4 | B, E, F
5 | B, C, D, E, F

- Let min. support = 50%, min. confidence = 50%
- Frequent patterns: {A}:3, {B}:3, {D}:4, {E}:3, {A,D}:3
- Association rules:

A → D (s = 60%, c = 100%); D → A (s = 60%, c = 75%)
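The support and confidence figures above can be checked directly. Below is a minimal sketch (my illustration, not code from the lecture) that computes both measures over the five example transactions:

```python
# Computing support and confidence over the slide's 5-transaction example.
transactions = [
    {"A", "B", "D"},
    {"A", "C", "D"},
    {"A", "D", "E"},
    {"B", "E", "F"},
    {"B", "C", "D", "E", "F"},
]

def support(itemset):
    """Fraction of transactions containing every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(lhs, rhs):
    """sup(lhs ∪ rhs) / sup(lhs)."""
    return support(set(lhs) | set(rhs)) / support(lhs)

print(support({"A", "D"}))                 # 0.6 -> {A,D} is frequent at min_sup = 50%
print(confidence({"A"}, {"D"}))            # 1.0 -> rule A → D has confidence 100%
print(round(confidence({"D"}, {"A"}), 2))  # 0.75 -> rule D → A has confidence 75%
```

This confirms the two rules on the slide: A → D (s = 60%, c = 100%) and D → A (s = 60%, c = 75%).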

Two Steps for Mining Association Rules

- Determining large (frequent) itemsets
- The main factor for overall performance
- The downward closure property of frequent patterns
- Any subset of a frequent itemset must be frequent
- If {beer, diaper, nuts} is frequent, so is {beer, diaper}
- i.e., every transaction having {beer, diaper, nuts} also contains {beer, diaper}
- Generating rules

The Apriori Algorithm

- Apriori (R. Agrawal and R. Srikant. Fast algorithms for mining association rules. VLDB'94.)
- Derivation of large 1-itemsets L1: at the first iteration, scan all the transactions and count the number of occurrences for each item.
- Level-wise derivation: at the k-th iteration, the candidate set Ck contains those k-itemsets whose every (k-1)-item subset is in Lk-1. Scan the DB and count the number of occurrences for each candidate itemset.

The Apriori Algorithm -- An Example

min. support = 2 txs (50%)

Database TDB:

Tid | Items
100 | A, C, D
200 | B, C, E
300 | A, B, C, E
400 | B, E

1st scan -- C1 (candidate 1-itemsets with counts): {A}:2, {B}:3, {C}:3, {D}:1, {E}:3

L1 (frequent 1-itemsets): {A}:2, {B}:3, {C}:3, {E}:3

C2 (candidate 2-itemsets): {A,B}, {A,C}, {A,E}, {B,C}, {B,E}, {C,E}

2nd scan -- C2 with counts: {A,B}:1, {A,C}:2, {A,E}:1, {B,C}:2, {B,E}:3, {C,E}:2

L2: {A,C}:2, {B,C}:2, {B,E}:3, {C,E}:2

C3: {B,C,E}

3rd scan -- L3: {B,C,E}:2
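The level-wise derivation above can be sketched compactly. The following is my own compact Apriori sketch (not the authors' implementation), run on the slide's TDB with a minimum support of 2 transactions:

```python
from itertools import combinations

db = [{"A", "C", "D"}, {"B", "C", "E"}, {"A", "B", "C", "E"}, {"B", "E"}]
min_sup = 2  # minimum support, in transactions

def count(candidates):
    """Support count of each candidate itemset in one DB scan."""
    return {c: sum(c <= t for t in db) for c in candidates}

# L1: frequent 1-itemsets (1st scan)
C1 = {frozenset([i]) for t in db for i in t}
L = [{c for c, n in count(C1).items() if n >= min_sup}]

# Level-wise: Ck = k-itemsets whose every (k-1)-item subset is in L(k-1)
k = 2
while L[-1]:
    prev = L[-1]
    cand = {a | b for a in prev for b in prev if len(a | b) == k}
    cand = {c for c in cand
            if all(frozenset(s) in prev for s in combinations(c, k - 1))}
    L.append({c for c, n in count(cand).items() if n >= min_sup})
    k += 1

for level, Lk in enumerate(L[:-1], start=1):
    print(f"L{level}:", sorted(sorted(c) for c in Lk))
```

The final level reproduces the slide's 3rd scan: L3 contains only {B, C, E}.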

From Large Itemsets to Rules

- For each large itemset m
- For each subset p of m
- if ( sup(m) / sup(m-p) ≥ min_conf )
- output the rule (m-p) → p
- conf. = sup(m) / sup(m-p)
- support = sup(m)
- Example: m = {a,c,d,e,f,g}: 2000 txs; p = {c,e,f,g}; m-p = {a,d}: 5000 txs
- conf. = sup({a,c,d,e,f,g}) / sup({a,d}) = 2000 / 5000
- rule {a,d} → {c,e,f,g}: confidence = 40%, support = 2000 txs
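The nested loop above translates almost directly into code. This is a sketch under the slide's toy numbers (the support table here is hypothetical, standing in for a full large-itemset table):

```python
from itertools import combinations

# toy support counts (in transactions) for the itemsets the example uses
sup = {
    frozenset("acdefg"): 2000,
    frozenset("ad"): 5000,
}

def rules_from(m, min_conf):
    """Emit (lhs, rhs, conf) for every subset p of m clearing min_conf."""
    for r in range(1, len(m)):
        for p in combinations(m, r):
            p = frozenset(p)
            lhs = m - p
            if lhs in sup and sup[m] / sup[lhs] >= min_conf:
                yield lhs, p, sup[m] / sup[lhs]

m = frozenset("acdefg")
for lhs, rhs, conf in rules_from(m, min_conf=0.4):
    print(sorted(lhs), "->", sorted(rhs), f"conf={conf:.0%}")
```

With min_conf = 40%, the only emitted rule is {a,d} → {c,e,f,g} with confidence 40%, matching the slide.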

Redundant Rules

- For the same support and confidence thresholds, if we have a rule {a,d} → {c,e,f,g}, do we also have:
- {a,d} → {c,e,f}? Yes! (same antecedent, smaller consequent: support and confidence can only grow)
- {a,d,c} → {e,f,g}? Yes! (same itemset, larger antecedent: sup({a,c,d}) ≤ sup({a,d}), so confidence can only grow)
- {a} → {c,e,f,g}? No!
- {a} → {c,d,e,f,g}? No! (shrinking the antecedent can lower the confidence below min_conf)

Practice

- Suppose we additionally have
- 500: A, C, E
- 600: B, C, D
- Support ≥ 3 txs (50%), confidence ≥ 66%
- Repeat the large itemset generation
- Identify all large itemsets
- Derive up to 4 rules
- Generate rules from the large itemsets with the biggest number of elements (from big to small)

Discussion of the Apriori Algorithm

- Apriori (R. Agrawal and R. Srikant. Fast algorithms for mining association rules. VLDB'94.)
- Derivation of large 1-itemsets L1: at the first iteration, scan all the transactions and count the number of occurrences for each item.
- Level-wise derivation: at the k-th iteration, the candidate set Ck contains those k-itemsets whose every (k-1)-item subset is in Lk-1. Scan the DB and count the number of occurrences for each candidate itemset.
- The cardinality (number of elements) of C2 is huge.
- The execution time for the first 2 iterations is the dominating factor for overall performance!
- Database scan is expensive.

Improvement of the Apriori Algorithm

- Reduce passes of transaction database scans
- Shrink the number of candidates
- Facilitate the support counting of candidates

Example Improvement 1 -- Partition: Scan Database Only Twice

- Any itemset that is potentially frequent in DB must be frequent in at least one of the partitions of DB
- Scan 1: partition the database and find local frequent patterns
- Scan 2: consolidate global frequent patterns
- A. Savasere, E. Omiecinski, and S. Navathe. An efficient algorithm for mining association in large databases. In VLDB'95.

Example Improvement 2 -- DHP

- DHP (direct hashing with pruning) = Apriori + hashing
- Use a hash-based method to reduce the size of C2.
- Allow effective reduction of the tx database size (tx number and each tx's size.)

Tid | Items
100 | A, C, D
200 | B, C, E
300 | A, B, C, E
400 | B, E

- J. Park, M.-S. Chen, and P. Yu. An effective hash-based algorithm for mining association rules. In SIGMOD'95.

Mining Frequent Patterns w/o Candidate Generation

- A highly compact data structure: frequent pattern tree.
- An FP-tree-based pattern fragment growth mining method.
- Search technique in mining: partitioning-based, divide-and-conquer method.
- J. Han, J. Pei, Y. Yin, Mining Frequent Patterns without Candidate Generation, in SIGMOD 2000.

Frequent Pattern Tree (FP-tree)

- 3 parts
- One root labeled as "null"
- A set of item prefix subtrees
- Frequent item header table
- Each node in the prefix subtree consists of
- Item name
- Count
- Node-link
- Each entry in the frequent-item header table consists of
- Item-name
- Head of node-link

The FP-tree Structure

[Figure: frequent item header table (f, c, a, b, m, p, each with a head of node-links) alongside the tree: root → f:4 → c:3 → a:3 → {m:2 → p:2; b:1 → m:1}; f:4 → b:1; root → c:1 → b:1 → p:1]

FP-tree Construction: Step 1

- Scan the transaction database DB once (the first time), and derive a list of frequent items.
- Sort frequent items in frequency-descending order.
- This ordering is important since each path of the tree will follow this order.

Example (min. support = 3)

Tx ID | Items Bought | (Ordered) Frequent Items
100 | f,a,c,d,g,i,m,p | f,c,a,m,p
200 | a,b,c,f,l,m,o | f,c,a,b,m
300 | b,f,h,j,o | f,b
400 | b,c,k,s,p | c,b,p
500 | a,f,c,e,l,p,m,n | f,c,a,m,p

- Frequent item header table: f, c, a, b, m, p (each with a head of node-links)
- List of frequent items: (f:4), (c:4), (a:3), (b:3), (m:3), (p:3)

FP-tree Construction: Step 2

- Create the root of the tree, labeled "null"
- Scan the database a second time. The scan of the first tx leads to the construction of the first branch of the tree.

Scan of 1st transaction: f,a,c,d,g,i,m,p

The 1st branch of the tree: <(f:1), (c:1), (a:1), (m:1), (p:1)>, i.e., root → f:1 → c:1 → a:1 → m:1 → p:1

FP-tree Construction: Step 2 (cont'd)

- Scan of 2nd transaction
- a,b,c,f,l,m,o → f,c,a,b,m
- The shared prefix f,c,a is reused (counts become f:2, c:2, a:2); two new nodes are created: (b:1) and (m:1)

[Tree so far: root → f:2 → c:2 → a:2 → {m:1 → p:1; b:1 → m:1}]

The FP-tree

Tx ID | Items Bought | (Ordered) Frequent Items
100 | f,a,c,d,g,i,m,p | f,c,a,m,p
200 | a,b,c,f,l,m,o | f,c,a,b,m
300 | b,f,h,j,o | f,b
400 | b,c,k,s,p | c,b,p
500 | a,f,c,e,l,p,m,n | f,c,a,m,p

[Figure: the complete FP-tree with header table (f, c, a, b, m, p): root → f:4 → c:3 → a:3 → {m:2 → p:2; b:1 → m:1}; f:4 → b:1; root → c:1 → b:1 → p:1]
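The two-scan construction can be sketched in a few lines. This is my own minimal illustration (a nested-dict tree rather than the paper's node/link structure), run on the five example transactions; the item order is hardcoded to the slide's frequency-descending header order, since the tie-break among equally frequent items is a free choice:

```python
from collections import Counter

# the five transactions from the example slide
txs = ["facdgimp", "abcflmo", "bfhjo", "bcksp", "afcelpmn"]
min_sup = 3

counts = Counter(i for t in txs for i in t)        # scan 1: count items
# frequent items in the slide's frequency-descending order (ties as on the slide)
header = [i for i in "fcabmp" if counts[i] >= min_sup]
order = lambda t: [i for i in header if i in t]    # reorder a transaction

tree = {}                                          # nested child-dict tree
for t in txs:                                      # scan 2: insert each tx
    node = tree
    for item in order(t):
        child = node.setdefault(item, {"count": 0, "children": {}})
        child["count"] += 1
        node = child["children"]

def show(node, depth=0):
    """Print the tree, one 'item:count' node per line."""
    for item, ch in sorted(node.items()):
        print("  " * depth + f"{item}:{ch['count']}")
        show(ch["children"], depth + 1)

show(tree)   # root has two children, f:4 and c:1, as in the figure
```

Shared prefixes (f,c,a) collapse into single counted paths, which is exactly why the FP-tree is so compact.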

Mining Process

- Starts from the least frequent item p
- Mining order: p → m → b → a → c → f (following the header table bottom-up)

Mining Process for Item p

- Starts from the least frequent item p (min. support = 3)
- Two paths containing p: <f:4, c:3, a:3, m:2, p:2> and <c:1, b:1, p:1>
- Conditional pattern base of p: <f:2, c:2, a:2, m:2> and <c:1, b:1>
- Conditional frequent pattern: <c:3>
- So we have two frequent patterns: p:3, cp:3

Mining Process for Item m

- min. support = 3
- Two paths containing m: <f:4, c:3, a:3, m:2> and <f:4, c:3, a:3, b:1, m:1>
- Conditional pattern base of m: <f:2, c:2, a:2> and <f:1, c:1, a:1, b:1>
- Conditional frequent pattern: <f:3, c:3, a:3>

Mining m's Conditional FP-tree

- Mine(<f:3, c:3, a:3> | m) yields:
- (cm:3), then Mine(<f:3> | cm) → (fcm:3)
- (am:3), then Mine(<f:3, c:3> | am) → (cam:3) and (fam:3); Mine(<f:3> | cam) → (fcam:3)
- (fm:3)
- So we have the frequent patterns: m:3, am:3, cm:3, fm:3, cam:3, fam:3, fcm:3, fcam:3

Analysis of the FP-tree-based Method

- Finds the complete set of frequent itemsets
- Efficient because it
- Works on a reduced set of pattern bases
- Performs mining operations less costly than candidate generation-and-test
- Cons
- No advantage if most txs are short
- The FP-tree may not fit into main memory

Generalized Association Rules

- Given the class hierarchy (taxonomy), one would like to choose proper data granularities for mining.
- Different confidence/support may be considered.
- R. Srikant and R. Agrawal, Mining generalized association rules, VLDB'95.

Freq. itemset | Support
Jacket | 2
Outerwear | 3
Clothes | 4
Shoes | 2
Hiking Boots | 2
Footwear | 4
Outerwear, Hiking Boots | 2
Clothes, Hiking Boots | 2
Outerwear, Footwear | 2
Clothes, Footwear | 2

Concept Hierarchy

- Clothes → Outerwear (→ Jackets, Ski Pants), Shirts
- Footwear → Shoes, Hiking Boots

Tx ID | Items Bought
100 | Shirt
200 | Jacket, Hiking Boots
300 | Ski Pants, Hiking Boots
400 | Shoes
500 | Shoes
600 | Jacket

Rules at sup ≥ 30%, conf ≥ 60%:

Rule | Support | Confidence
Outerwear → Hiking Boots | 33% | 66%
Outerwear → Footwear | 33% | 66%
Hiking Boots → Outerwear | 33% | 100%
Hiking Boots → Clothes | 33% | 100%
Jacket → Hiking Boots | 16% | 50% (below min. support)
Ski Pants → Hiking Boots | 16% | 100% (below min. support)

Generalized Association Rules

[Figure: level filtering across the concept hierarchy]

Other Relevant Topics

- Max patterns
- R. J. Bayardo. Efficiently mining long patterns from databases. SIGMOD'98.
- Closed patterns
- N. Pasquier, Y. Bastide, R. Taouil, and L. Lakhal. Discovering frequent closed itemsets for association rules. ICDT'99.
- Sequential patterns
- What items one will purchase if he/she has bought some certain items
- R. Srikant and R. Agrawal, Mining sequential patterns, ICDE'95
- Traversal patterns
- Mining path traversal patterns in a web environment where documents or objects are linked together to facilitate interactive access
- M.-S. Chen, J. Park and P. Yu. Efficient Data Mining for Path Traversal Patterns. TKDE'98.
- and more

CLASSIFICATION

Classification

- Classifying tuples in a database.
- Each tuple has some attributes with known values.
- In training set E
- Each tuple consists of the same set of multiple attributes as the tuples in the large database W.
- Additionally, each tuple has a known class identity.

Classification (cont'd)

- Derive the classification mechanism from the training set E, and then use this mechanism to classify general data (in the testing set.)
- A decision tree based approach has been influential in machine learning studies.

Classification Step 1: Model Construction

- Train the model from the existing data pool: training data → classification algorithm → classification rules

name | age | income | own car?
Sandy | <30 | low | no
Bill | <30 | low | yes
Fox | 31..40 | high | yes
Susan | >40 | med | no
Claire | >40 | med | no
Andy | 31..40 | high | yes

Classification Step 2: Model Usage

- Apply the classification rules to the testing data:

name | age | income | own car?
John | >40 | high | ? → No
Sally | <30 | low | ? → No
Annie | 31..40 | high | ? → Yes
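The two steps can be illustrated end to end with a toy majority-vote rule learner (my own sketch, not a real classifier library): Step 1 builds an "age bracket → own car?" rule table from the training tuples; Step 2 applies it to the testing tuples.

```python
from collections import Counter, defaultdict

train = [  # (name, age, income, owns_car)
    ("Sandy", "<30", "low", "no"), ("Bill", "<30", "low", "yes"),
    ("Fox", "31..40", "high", "yes"), ("Susan", ">40", "med", "no"),
    ("Claire", ">40", "med", "no"), ("Andy", "31..40", "high", "yes"),
]

# Step 1: model construction -- majority class per age bracket
votes = defaultdict(Counter)
for _, age, _, label in train:
    votes[age][label] += 1
rules = {age: c.most_common(1)[0][0] for age, c in votes.items()}

# Step 2: model usage -- classify unseen (testing) tuples by age bracket
test_set = [("John", ">40", "high"), ("Sally", "<30", "low"),
            ("Annie", "31..40", "high")]
for name, age, _ in test_set:
    print(name, "->", rules[age])
```

A real decision-tree learner would also decide *which* attribute to split on (see the information-gain and Gini measures later in this lecture); here the split on age is fixed by hand.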

What is Prediction?

- Prediction is similar to classification
- First, construct a model
- Second, use the model to predict the future of unknown objects
- Prediction is different from classification
- Classification predicts categorical class labels.
- Prediction predicts continuous values.
- Major method: regression

Supervised vs. Unsupervised Learning

- Supervised learning (e.g., classification)
- Supervision: the training data (observations, measurements, etc.) are accompanied by labels indicating the class of the observations.
- Unsupervised learning (e.g., clustering)
- We are given a set of measurements, observations, etc. with the aim of establishing the existence of classes or clusters in the data.
- No training data, or the training data are not accompanied by class labels.

Evaluating Classification Methods

- Predictive accuracy
- Speed
- Time to construct the model and time to use the model
- Robustness
- Handling noise and missing values
- Scalability
- Efficiency in large databases (not memory-resident data)
- Goodness of rules
- Decision tree size
- The compactness of classification rules

A Decision-Tree Based Classification

- A decision tree for whether to play tennis or not:

[Figure: outlook? -- sunny → humidity? (high → N, low → P); overcast → P; rainy → windy? (yes → N, no → P)]

- ID-3 and its extended version C4.5 (Quinlan'93)
- A top-down decision tree generation algorithm

Algorithm for Decision Tree Induction (1/2)

- Basic algorithm (a greedy algorithm)
- Tree is constructed in a top-down recursive divide-and-conquer manner.
- Attributes are categorical.
- (If an attribute is a continuous number, it needs to be discretized in advance; e.g., 0 < age ≤ 100 split into brackets 0..20, 21..40, 41..60, 61..80, 81..100.)
- At start, all the training examples are at the root.
- Examples are partitioned recursively based on selected attributes.

Algorithm for Decision Tree Induction (2/2)

- Basic algorithm (a greedy algorithm)
- Test attributes are selected on the basis of a heuristic or statistical measure (e.g., information gain): maximizing an information gain measure, i.e., favoring the partitioning which makes the majority of examples belong to a single class.
- Conditions for stopping partitioning
- All samples for a given node belong to the same class
- There are no remaining attributes for further partitioning (majority voting is employed for classifying the leaf)
- There are no samples left

Decision Tree Induction: Training Dataset

[Figure: first split on "Age?" with branches <30, 31..40, >40]

Primary Issues in Tree Construction (1/2)

- Split criterion: goodness function
- Used to select the attribute to be split at a tree node during the tree generation phase
- Different algorithms may use different goodness functions
- Information gain (used in ID3/C4.5)
- Gini index (used in CART)

Primary Issues in Tree Construction (2/2)

- Branching scheme
- Determining the tree branch to which a sample belongs
- Binary vs. k-ary splitting (e.g., branches for income = high / medium / low)
- When to stop the further splitting of a node? E.g., impurity measure
- Labeling rule: a node is labeled as the class to which most samples at the node belong.

How to Use a Tree?

- Directly
- Test the attribute values of the unknown sample against the tree.
- A path is traced from the root to a leaf which holds the label.
- Indirectly
- The decision tree is converted to classification rules.
- One rule is created for each path from the root to a leaf.
- IF-THEN rules are easier for humans to understand.

Attribute Selection Measure: Information Gain (ID3/C4.5)

- Select the attribute with the highest information gain
- Let pi be the probability that an arbitrary tuple in D belongs to class Ci, estimated by |C_i,D| / |D|
- Expected information (entropy) needed to classify a tuple in D:

Info(D) = -Σ_{i=1..m} p_i log2(p_i)

- Expected information (entropy)
- Entropy is a measure of how "mixed up" an attribute is.
- It is sometimes equated to the purity or impurity of a variable.
- High entropy means that we are sampling from a uniform (boring) distribution.

Expected Information (Entropy)

- Expected information (entropy) needed to classify a tuple in D:

Info(D) = -Σ_{i=1..m} p_i log2(p_i)    (m = number of labels)

Attribute Selection Measure: Information Gain (ID3/C4.5)

- Select the attribute with the highest information gain
- Let pi be the probability that an arbitrary tuple in D belongs to class Ci, estimated by |C_i,D| / |D|
- Expected information (entropy) needed to classify a tuple in D: Info(D) = -Σ_{i=1..m} p_i log2(p_i)
- Information needed (after using A to split D into v partitions) to classify D: Info_A(D) = Σ_{j=1..v} (|D_j| / |D|) × Info(D_j)
- Information gained by branching on attribute A: Gain(A) = Info(D) - Info_A(D)

Expected Information (Entropy)

- Information needed (after using A to split D into v partitions) to classify D:

Info_A(D) = Σ_{j=1..v} (|D_j| / |D|) × Info(D_j)

Attribute Selection: Information Gain

- Class P: buys_computer = "yes" (9 tuples); class N: buys_computer = "no" (5 tuples)
- Info(D) = I(9,5) = 0.940
- Age "<30" has 5 out of 14 samples, with 2 yeses and 3 nos; hence its term in Info_age(D) is (5/14) I(2,3)
- Info_age(D) = (5/14) I(2,3) + (4/14) I(4,0) + (5/14) I(3,2) = 0.694, so Gain(age) = 0.940 - 0.694 = 0.246
- Similarly for the other attributes; age has the highest gain and is chosen as the split.
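The numbers above can be reproduced directly (this assumes the standard buys_computer class counts used in the Han & Kamber example: 9 yes / 5 no overall, with age partitions (2,3), (4,0), (3,2)):

```python
from math import log2

def info(*counts):
    """Expected information I(c1, ..., cm) = -sum(p_i * log2 p_i)."""
    n = sum(counts)
    return -sum(c / n * log2(c / n) for c in counts if c)

info_D = info(9, 5)                # 9 yes, 5 no in D
# age partitions D: "<30" has (2 yes, 3 no), "31..40" has (4, 0), ">40" has (3, 2)
info_age = 5/14 * info(2, 3) + 4/14 * info(4, 0) + 5/14 * info(3, 2)
gain_age = info_D - info_age

print(round(info_D, 3))    # 0.94
print(round(info_age, 3))  # 0.694
print(round(gain_age, 2))  # 0.25 (the slide's 0.246 = 0.940 - 0.694 with rounded terms)
```

Note the pure branch (4 yes, 0 no) contributes zero entropy, which is exactly what the measure rewards.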

Gain Ratio for Attribute Selection (C4.5)

- The information gain measure is biased towards attributes with a large number of values.
- C4.5 (a successor of ID3) uses gain ratio to overcome the problem (normalization of information gain):
- GainRatio(A) = Gain(A) / SplitInfo(A)
- GainRatio(income) = 0.029 / 0.926 = 0.031
- The attribute with the maximum gain ratio is selected as the splitting attribute.

Gini Index (CART, IBM IntelligentMiner)

- If a data set D contains examples from n classes, the gini index Gini(D) is defined as Gini(D) = 1 - Σ_{j=1..n} p_j^2, where pj is the relative frequency of class j in D
- If a data set D is split on A into two subsets D1 and D2, the gini index Gini_A(D) is defined as Gini_A(D) = (|D1|/|D|) Gini(D1) + (|D2|/|D|) Gini(D2)
- Reduction in impurity: ΔGini(A) = Gini(D) - Gini_A(D)
- The attribute that provides the smallest Gini_A(D) (or the largest reduction in impurity) is chosen to split the node (need to enumerate all the possible splitting points for each attribute.)

Gini Index (CART, IBM IntelligentMiner)

- Ex. D has 9 tuples in buys_computer = "yes" and 5 in "no": Gini(D) = 1 - (9/14)^2 - (5/14)^2 = 0.459
- Suppose the attribute income partitions D into 10 tuples in D1: {low, medium} and 4 in D2: {high}: Gini_income∈{low,medium}(D) = (10/14) Gini(D1) + (4/14) Gini(D2) = 0.443
- But Gini_income∈{medium,high}(D) is 0.30 and thus the best since it is the lowest.
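The Gini arithmetic can be checked the same way (this assumes the standard class counts for this example: 7 yes / 3 no in D1 = {low, medium} and 2 yes / 2 no in D2 = {high}):

```python
def gini(*counts):
    """Gini(D) = 1 - sum(p_j^2) over the class counts."""
    n = sum(counts)
    return 1 - sum((c / n) ** 2 for c in counts)

gini_D = gini(9, 5)   # 9 yes, 5 no in D
# D1 = {low, medium}: 10 tuples (7 yes, 3 no); D2 = {high}: 4 tuples (2, 2)
gini_split = 10/14 * gini(7, 3) + 4/14 * gini(2, 2)

print(round(gini_D, 3))      # 0.459
print(round(gini_split, 3))  # 0.443
```

Repeating this for every binary partition of income is what "enumerate all the possible splitting points" means in practice.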

Other Attribute Selection Measures

- CHAID: a popular decision tree algorithm; measure based on the χ2 test for independence
- C-SEP: performs better than info. gain and gini index in certain cases
- G-statistic: has a close approximation to the χ2 distribution
- MDL (Minimal Description Length) principle (i.e., the simplest solution is preferred)
- The best tree is the one that requires the fewest bits to both (1) encode the tree, and (2) encode the exceptions to the tree
- Multivariate splits (partition based on multiple variable combinations)
- CART finds multivariate splits based on a linear combination of attributes.

Which attribute selection measure is the best? Most give good results; none is significantly superior to the others.

Other Types of Classification Methods

- Bayes classification methods
- Rule-based classification
- Support Vector Machine (SVM)
- Some of these methods will be taught in the following lessons.

CLUSTERING

What is Cluster Analysis?

- Cluster: a collection of data objects
- Similar to one another within the same cluster
- Dissimilar to the objects in other clusters
- Cluster analysis
- Grouping a set of data objects into clusters
- Typical applications
- As a stand-alone tool to get insight into data distribution
- As a preprocessing step for other algorithms

General Applications of Clustering

- Spatial data analysis
- Create thematic maps in GIS by clustering feature spaces
- Detect spatial clusters and explain them in spatial data mining
- Image processing
- Pattern recognition
- Economic science (especially market research)
- WWW
- Document classification
- Cluster Web-log data to discover groups of similar access patterns

Examples of Clustering Applications

- Marketing: help marketers discover distinct groups in their customer bases, and then use this knowledge to develop targeted marketing programs.
- Land use: identification of areas of similar land use in an earth observation database.
- Insurance: identifying groups of motor insurance policy holders with a high average claim cost.
- City-planning: identifying groups of houses according to their house type, value, and geographical location.

What is Good Clustering?

- A good clustering method will produce high quality clusters with
- High intra-class similarity
- Low inter-class similarity
- The quality of a clustering result depends on both the similarity measure used by the method and its implementation.
- The quality of a clustering method is also measured by its ability to discover hidden patterns.

Requirements of Clustering in Data Mining (1/2)

- Scalability
- Ability to deal with different types of attributes
- Discovery of clusters with arbitrary shape
- Minimal requirements for domain knowledge as input
- Ability to deal with outliers

Requirements of Clustering in Data Mining (2/2)

- Insensitivity to the order of input records
- High dimensionality
- Curse of dimensionality
- Incorporation of user-specified constraints
- Interpretability and usability

Clustering Methods (I)

- Partitioning Method
- Construct various partitions and then evaluate them by some criterion, e.g., minimizing the sum of squared errors
- Typical methods: k-means, k-medoids, CLARANS
- Hierarchical Method
- Create a hierarchical decomposition of the set of data (or objects) using some criterion
- Typical methods: DIANA, AGNES, BIRCH, ROCK, CHAMELEON
- Density-based Method
- Based on connectivity and density functions
- Typical methods: DBSCAN, OPTICS, DenClue

Clustering Methods (II)

- Grid-based approach
- Based on a multiple-level granularity structure
- Typical methods: STING, WaveCluster, CLIQUE
- Model-based approach
- A model is hypothesized for each of the clusters, and the method tries to find the best fit of the data to that model
- Typical methods: EM, SOM, COBWEB
- Frequent pattern-based approach
- Based on the analysis of frequent patterns
- Typical method: pCluster
- User-guided or constraint-based approach
- Clustering by considering user-specified or application-specific constraints
- Typical methods: cluster-on-demand, constrained clustering

Typical Alternatives to Calculate the Distance between Clusters

- Single link: smallest distance between an element in one cluster and an element in the other, i.e., dis(Ki, Kj) = min(dist(tip, tjq))
- Complete link: largest distance between an element in one cluster and an element in the other, i.e., dis(Ki, Kj) = max(dist(tip, tjq))
- Average: average distance between an element in one cluster and an element in the other, i.e., dis(Ki, Kj) = avg(dist(tip, tjq))
- Centroid: distance between the centroids of two clusters, i.e., dis(Ki, Kj) = dis(Ci, Cj)
- Medoid: distance between the medoids of two clusters, i.e., dis(Ki, Kj) = dis(Mi, Mj)
- Medoid: one chosen, centrally located object in the cluster
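These linkage measures can be written down directly. A minimal sketch in Python (function names and the example clusters are illustrative, not from the slides):

```python
import math

def dist(p, q):
    """Euclidean distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def single_link(ki, kj):
    """Smallest distance between any cross-cluster pair of elements."""
    return min(dist(p, q) for p in ki for q in kj)

def complete_link(ki, kj):
    """Largest distance between any cross-cluster pair of elements."""
    return max(dist(p, q) for p in ki for q in kj)

def average_link(ki, kj):
    """Average distance over all cross-cluster pairs."""
    return sum(dist(p, q) for p in ki for q in kj) / (len(ki) * len(kj))

def centroid(k):
    """Mean point of a cluster."""
    return tuple(sum(c) / len(k) for c in zip(*k))

def centroid_dist(ki, kj):
    """Distance between the centroids of the two clusters."""
    return dist(centroid(ki), centroid(kj))
```

For the same pair of clusters, single link is always the smallest of the three pairwise measures and complete link the largest.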

Centroid, Radius, and Diameter of a Cluster (for numerical data sets)

- Centroid: the "middle" of a cluster (its mean point)
- Radius: square root of the average squared distance from any point of the cluster to its centroid
- Diameter: square root of the average squared distance between all pairs of points in the cluster
- Note that diameter ≠ 2 × radius in general
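These definitions can be sketched in a few lines of Python (the diameter averages over all ordered pairs i ≠ j, which is one common convention for this formula):

```python
import math

def centroid(points):
    """Mean point of the cluster."""
    return tuple(sum(c) / len(points) for c in zip(*points))

def radius(points):
    """Square root of the average squared distance to the centroid."""
    c = centroid(points)
    total = sum(sum((a - b) ** 2 for a, b in zip(p, c)) for p in points)
    return math.sqrt(total / len(points))

def diameter(points):
    """Square root of the average squared distance over all pairs i != j."""
    n = len(points)
    total = sum(sum((a - b) ** 2 for a, b in zip(p, q))
                for p in points for q in points)  # self-pairs contribute 0
    return math.sqrt(total / (n * (n - 1)))
```

A degenerate two-point cluster does happen to satisfy diameter = 2 × radius, but clusters with more points generally do not.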

Partitioning Algorithms Basic Concept

- Partitioning method: construct a partition of a database D of n objects into a set of k clusters
- Given a number k, find a partition of k clusters that optimizes the chosen partitioning criterion
- Global optimum: exhaustively enumerate all partitions
- Heuristic methods: k-means, k-medoids
- k-means (MacQueen'67)
- k-medoids, or PAM (Partitioning Around Medoids) (Kaufman & Rousseeuw'87)

The K-Means Clustering Method

- Given k, the k-means algorithm proceeds in four steps:
- 1. Arbitrarily choose k points as the initial cluster centroids.
- 2. Assign points: assign each object to the cluster with the nearest centroid (seed point).
- 3. Update means (centroids): recompute each centroid as the mean point of the objects currently assigned to its cluster.
- 4. Go back to Step 2; stop when no assignment changes.
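The assign/update loop above can be sketched in plain Python (the seeding strategy, iteration cap, and stopping test on unchanged assignments are illustrative choices, not prescribed by the slides):

```python
import math
import random

def _sqdist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, max_iter=100, seed=0):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)          # Step 1: arbitrary initial centroids
    assignment = None
    for _ in range(max_iter):
        # Step 2: assign each object to the nearest centroid
        new_assignment = [min(range(k), key=lambda c: _sqdist(p, centroids[c]))
                          for p in points]
        if new_assignment == assignment:       # Step 4: stop when nothing changes
            break
        assignment = new_assignment
        # Step 3: recompute each centroid as the mean of its members
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return centroids, assignment
```

On well-separated data the loop converges in a handful of iterations; in general it only reaches a local optimum, as noted below.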

Example of the K-Means Clustering Method

(Figure: given k = 2, two objects are arbitrarily chosen as the initial cluster centroids; each object is assigned to the most similar centroid, the cluster means are updated, and the objects are re-assigned, looping until the assignment stabilizes.)

Comments on the K-Means Clustering

- Time complexity: O(tkn), where n is the number of objects, k is the number of clusters, and t is the number of iterations. Normally k, t << n.
- Often terminates at a local optimum. (The global optimum may be found using techniques such as deterministic annealing and genetic algorithms.)
- Weaknesses
- Applicable only when the mean is defined; what about categorical data?
- Need to specify k, the number of clusters, in advance
- Unable to handle noisy data and outliers

Why is K-Means Unable to Handle Outliers?

- The k-means algorithm is sensitive to outliers
- An object with an extremely large value may substantially distort the distribution of the data.
- K-Medoids: instead of taking the mean value of the objects in a cluster as a reference point, a medoid can be used, which is the most centrally located object in the cluster.

PAM: The K-Medoids Method

- PAM: Partitioning Around Medoids
- Uses real objects to represent the clusters
- 1. Randomly select k representative objects as medoids.
- 2. Assign each data point to the closest medoid.
- 3. For each medoid m and each non-medoid data point o: swap m and o, and compute the total cost of the configuration.
- 4. Select the configuration with the lowest cost.
- 5. Repeat steps 2 to 4 until there is no change in the medoids.
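A minimal sketch of this swap loop in Python, using the sum of point-to-nearest-medoid distances as the total cost (the seeding and the steepest-descent swap selection are illustrative assumptions):

```python
import math
import random

def total_cost(points, medoids):
    """Sum of distances from each point to its nearest medoid."""
    return sum(min(math.dist(p, m) for m in medoids) for p in points)

def pam(points, k, seed=0):
    """PAM sketch: start from k arbitrary medoids, then repeatedly take
    the medoid/non-medoid swap that most lowers the total cost."""
    medoids = random.Random(seed).sample(points, k)
    while True:
        best_cost = total_cost(points, medoids)
        best_swap = None
        for i in range(len(medoids)):
            for o in points:
                if o in medoids:
                    continue
                # Try swapping medoid i with non-medoid o
                trial = medoids[:i] + [o] + medoids[i + 1:]
                cost = total_cost(points, trial)
                if cost < best_cost:
                    best_cost, best_swap = cost, trial
        if best_swap is None:      # no swap improves the configuration: done
            return medoids
        medoids = best_swap
```

The quadratic pair of loops over medoids and non-medoids is exactly what makes each iteration cost O(k(n-k)^2), as discussed below.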

A Typical K-Medoids Algorithm (PAM)

(Figure: with k = 2, two objects m1 and m2 are arbitrarily chosen as the initial medoids; each remaining object is assigned to the nearest medoid, then each medoid is swapped with each data point and the total cost of the resulting configuration is computed.)

PAM Clustering: Total Swapping Cost

- Total cost of swapping medoid i with non-medoid h: TCih = Σj Cjih
- t, i: original medoids
- h: candidate to swap with i
- j: any non-selected object
- (Figure: j is re-assigned depending on whether d(j, h) < d(j, t) or d(j, h) > d(j, t).)
What is the Problem with PAM?

- PAM is more robust than k-means in the presence of noise and outliers, because a medoid is less influenced by outliers or other extreme values than a mean.
- PAM works efficiently for small data sets but does not scale well to large data sets.
- O(k(n-k)^2) per iteration, where n is the number of data points and k is the number of clusters
- Improvements: CLARA (uses a sampled set to determine medoids), CLARANS

Hierarchical Clustering

- Uses a distance matrix as the clustering criterion.
- This method does not require the number of clusters k as an input, but it needs a termination condition.

AGNES (Agglomerative Nesting)

- Introduced in Kaufmann and Rousseeuw (1990)
- Uses the single-link method and the dissimilarity matrix
- Merges the nodes that have the least dissimilarity
- Proceeds in a non-descending fashion
- Eventually all nodes belong to the same cluster
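The agglomerative, single-link merging that AGNES performs can be sketched as follows (stopping at a target number of clusters k rather than merging all the way to one cluster is an illustrative termination condition):

```python
import math

def agnes_single_link(points, k):
    """Agglomerative sketch: start with every point in its own cluster,
    then repeatedly merge the pair of clusters with the smallest
    single-link (minimum pairwise) dissimilarity until k clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None
        # Find the pair of clusters with the least dissimilarity
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(math.dist(p, q) for p in clusters[i] for q in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]   # merge the closest pair
        del clusters[j]
    return clusters
```

Recording the dissimilarity at which each merge happens would yield the dendrogram described on the next slide.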

Dendrogram: Shows How the Clusters are Merged

- Decompose the data objects into several levels of nested partitioning (a tree of clusters), called a dendrogram.
- A clustering of the data objects is obtained by cutting the dendrogram at the desired level; each connected component then forms a cluster.

DIANA (Divisive Analysis)

- Introduced in Kaufmann and Rousseeuw (1990)
- Inverse order of AGNES
- Eventually each node forms a cluster on its own.

More on Hierarchical Clustering

- Major weaknesses
- Does not scale well: time complexity is at least O(n^2), where n is the total number of objects
- Can never undo what was done previously
- Integration of hierarchical with distance-based clustering
- BIRCH (1996): uses a CF-tree data structure and incrementally adjusts the quality of sub-clusters
- CURE (1998): selects well-scattered points from the cluster and then shrinks them towards the center of the cluster by a specified fraction

Density-Based Clustering Methods

- Clustering based on density (a local cluster criterion), such as density-connected points
- Major features
- Discovers clusters of arbitrary shape
- Handles noise
- One scan
- Needs density parameters as a termination condition
- Several interesting studies
- DBSCAN: Ester, et al. (KDD'96)
- OPTICS: Ankerst, et al. (SIGMOD'99)
- DENCLUE: Hinneburg & Keim (KDD'98)
- CLIQUE: Agrawal, et al. (SIGMOD'98) (more grid-based)

Density-Based Clustering Basic Concepts

- Two parameters:
- Eps: maximum radius of the neighborhood
- MinPts: minimum number of points in an Eps-neighborhood of that point
- NEps(q) = {p | dist(p, q) ≤ Eps}, where p and q are data points
- Directly density-reachable: a point p is directly density-reachable from a point q w.r.t. Eps, MinPts if
- p belongs to NEps(q)
- q satisfies the core point condition: |NEps(q)| ≥ MinPts
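These two definitions translate almost literally into code. A small sketch (the helper names and example parameters are made up for illustration):

```python
import math

def n_eps(points, q, eps):
    """Eps-neighborhood of q: all points within distance Eps (including q)."""
    return [p for p in points if math.dist(p, q) <= eps]

def directly_density_reachable(points, p, q, eps, min_pts):
    """p is directly density-reachable from q iff p lies in N_Eps(q)
    and q satisfies the core point condition |N_Eps(q)| >= MinPts."""
    neighborhood = n_eps(points, q, eps)
    return p in neighborhood and len(neighborhood) >= min_pts
```

Note the asymmetry: a border point can be directly density-reachable from a core point, but not vice versa.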

Density-Reachable and Density-Connected

- Density-reachable
- A point p is density-reachable from a point q w.r.t. Eps, MinPts if there is a chain of points p1, ..., pn, with p1 = q and pn = p, such that pi+1 is directly density-reachable from pi.
- Density-connected
- A point p is density-connected to a point q w.r.t. Eps, MinPts if there is a point o such that both p and q are density-reachable from o w.r.t. Eps and MinPts.
DBSCAN: Density-Based Spatial Clustering of Applications with Noise

- Relies on a density-based notion of cluster: a cluster is defined as a maximal set of density-connected points.
- Discovers clusters of arbitrary shape in spatial databases with noise.

(Figure: with Eps = 1 cm and MinPts = 5, core points lie in the interior of a dense region and border points lie on its edge.)
DBSCAN The Algorithm

- 1. Arbitrarily select an unvisited point p.
- 2. Retrieve all points density-reachable from p w.r.t. Eps and MinPts.
- 3. If p is a core point, a cluster is formed; mark all these points as visited.
- 4. If p is a border point (no points are density-reachable from p), mark p as visited, and DBSCAN visits the next point of the database.
- 5. Continue the process until all of the points have been visited.
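The steps above can be sketched as a naive O(n^2) implementation (the dict-of-labels representation and the -1 noise label are illustrative conventions, not from the slides):

```python
import math

def dbscan(points, eps, min_pts):
    """DBSCAN sketch: grow a cluster from each unvisited core point by
    collecting everything density-reachable from it; points reachable
    from no core point stay labeled as noise (-1)."""
    labels = {p: None for p in points}     # None = unvisited
    cluster_id = 0
    for p in points:
        if labels[p] is not None:
            continue
        neighbors = [q for q in points if math.dist(p, q) <= eps]
        if len(neighbors) < min_pts:       # not a core point: tentatively noise
            labels[p] = -1
            continue
        labels[p] = cluster_id             # p is core: start a new cluster
        queue = [q for q in neighbors if q != p]
        while queue:
            q = queue.pop()
            if labels[q] == -1:            # noise becomes a border point
                labels[q] = cluster_id
            if labels[q] is not None:
                continue
            labels[q] = cluster_id
            q_neighbors = [r for r in points if math.dist(q, r) <= eps]
            if len(q_neighbors) >= min_pts:  # q is also core: expand through it
                queue.extend(q_neighbors)
        cluster_id += 1
    return labels
```

A spatial index (e.g. an R*-tree) over the neighborhood queries is what brings the practical complexity down for large spatial databases.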

References (1)

- R. Agrawal, J. Gehrke, D. Gunopulos, and P. Raghavan. Automatic subspace clustering of high dimensional data for data mining applications. SIGMOD'98.
- M. R. Anderberg. Cluster Analysis for Applications. Academic Press, 1973.
- M. Ankerst, M. Breunig, H.-P. Kriegel, and J. Sander. OPTICS: Ordering points to identify the clustering structure. SIGMOD'99.
- P. Arabie, L. J. Hubert, and G. De Soete. Clustering and Classification. World Scientific, 1996.
- F. Beil, M. Ester, and X. Xu. Frequent term-based text clustering. KDD'02.
- M. M. Breunig, H.-P. Kriegel, R. Ng, and J. Sander. LOF: Identifying density-based local outliers. SIGMOD'00.
- M. Ester, H.-P. Kriegel, J. Sander, and X. Xu. A density-based algorithm for discovering clusters in large spatial databases. KDD'96.
- M. Ester, H.-P. Kriegel, and X. Xu. Knowledge discovery in large spatial databases: Focusing techniques for efficient class identification. SSD'95.
- D. Fisher. Knowledge acquisition via incremental conceptual clustering. Machine Learning, 2:139-172, 1987.
- D. Gibson, J. Kleinberg, and P. Raghavan. Clustering categorical data: An approach based on dynamic systems. VLDB'98.

References (2)

- V. Ganti, J. Gehrke, and R. Ramakrishnan. CACTUS: Clustering categorical data using summaries. KDD'99.
- S. Guha, R. Rastogi, and K. Shim. CURE: An efficient clustering algorithm for large databases. SIGMOD'98.
- S. Guha, R. Rastogi, and K. Shim. ROCK: A robust clustering algorithm for categorical attributes. ICDE'99, pp. 512-521, Sydney, Australia, March 1999.
- A. Hinneburg and D. A. Keim. An efficient approach to clustering in large multimedia databases with noise. KDD'98.
- A. K. Jain and R. C. Dubes. Algorithms for Clustering Data. Prentice Hall, 1988.
- G. Karypis, E.-H. Han, and V. Kumar. CHAMELEON: A hierarchical clustering algorithm using dynamic modeling. COMPUTER, 32(8):68-75, 1999.
- L. Kaufman and P. J. Rousseeuw. Clustering by means of medoids. In Y. Dodge (Ed.), Statistical Data Analysis Based on the L1 Norm, North Holland, Amsterdam, pp. 405-416, 1987.
- L. Kaufman and P. J. Rousseeuw. Finding Groups in Data: An Introduction to Cluster Analysis. John Wiley & Sons, 1990.
- E. Knorr and R. Ng. Algorithms for mining distance-based outliers in large datasets. VLDB'98.
- J. B. MacQueen. Some methods for classification and analysis of multivariate observations. Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, University of California Press, 1:281-297, 1967.
- G. J. McLachlan and K. E. Basford. Mixture Models: Inference and Applications to Clustering. John Wiley & Sons, 1988.
- P. Michaud. Clustering techniques. Future Generation Computer Systems, 13, 1997.
- R. Ng and J. Han. Efficient and effective clustering method for spatial data mining. VLDB'94.

References (3)

- L. Parsons, E. Haque, and H. Liu. Subspace clustering for high dimensional data: A review. SIGKDD Explorations, 6(1), June 2004.
- E. Schikuta. Grid clustering: An efficient hierarchical clustering method for very large data sets. Proc. 1996 Int. Conf. on Pattern Recognition.
- G. Sheikholeslami, S. Chatterjee, and A. Zhang. WaveCluster: A multi-resolution clustering approach for very large spatial databases. VLDB'98.
- A. K. H. Tung, J. Han, L. V. S. Lakshmanan, and R. T. Ng. Constraint-based clustering in large databases. ICDT'01.
- A. K. H. Tung, J. Hou, and J. Han. Spatial clustering in the presence of obstacles. ICDE'01.
- H. Wang, W. Wang, J. Yang, and P. S. Yu. Clustering by pattern similarity in large data sets. SIGMOD'02.
- W. Wang, J. Yang, and R. Muntz. STING: A statistical information grid approach to spatial data mining. VLDB'97.
- T. Zhang, R. Ramakrishnan, and M. Livny. BIRCH: An efficient data clustering method for very large databases. SIGMOD'96.
- Wikipedia: DBSCAN. http://en.wikipedia.org/wiki/DBSCAN.

MORE ABOUT DATA MINING

ICDM '10 Keynote Speech: 10 Years of Data Mining Research, Retrospect and Prospect

- http://www.cs.uvm.edu/~xwu/PPT/ICDM10-Sydney/ICDM10-Keynote.pdf
- Xindong Wu, University of Vermont, USA

The Top 10 Algorithms: The 3-Step Identification Process

- 1. Nominations. ACM KDD Innovation Award and IEEE ICDM Research Contributions Award winners were invited in September 2006 to each nominate up to 10 best-known algorithms.
- 2. Verification. Each nomination was verified for its citations on Google Scholar in late October 2006, and nominations with fewer than 50 citations were removed. 18 nominations survived and were organized into 10 topics.
- 3. Voting by the wider community.

Top 10 Most Popular DM Algorithms: 18 Identified Candidates (I)

- Classification
- 1. C4.5: Quinlan, J. R. C4.5: Programs for Machine Learning. Morgan Kaufmann, 1993.
- 2. CART: L. Breiman, J. Friedman, R. Olshen, and C. Stone. Classification and Regression Trees. Wadsworth, 1984.
- 3. K Nearest Neighbors (kNN): Hastie, T. and Tibshirani, R. 1996. Discriminant adaptive nearest neighbor classification. TPAMI 18(6).
- 4. Naive Bayes: Hand, D. J. and Yu, K. 2001. Idiot's Bayes: Not so stupid after all? Internat. Statist. Rev. 69, 385-398.
- Statistical Learning
- 5. SVM: Vapnik, V. N. 1995. The Nature of Statistical Learning Theory. Springer-Verlag.
- 6. EM: McLachlan, G. and Peel, D. 2000. Finite Mixture Models. J. Wiley, New York.
- Association Analysis
- 7. Apriori: Rakesh Agrawal and Ramakrishnan Srikant. Fast algorithms for mining association rules. VLDB'94.
- 8. FP-Tree: Han, J., Pei, J., and Yin, Y. 2000. Mining frequent patterns without candidate generation. SIGMOD'00.

The 18 Identified Candidates (II)

- Link Mining
- 9. PageRank: Brin, S. and Page, L. 1998. The anatomy of a large-scale hypertextual Web search engine. WWW-7, 1998.
- 10. HITS: Kleinberg, J. M. 1998. Authoritative sources in a hyperlinked environment. SODA, 1998.
- Clustering
- 11. K-Means: MacQueen, J. B. Some methods for classification and analysis of multivariate observations. Proc. 5th Berkeley Symp. on Mathematical Statistics and Probability, 1967.
- 12. BIRCH: Zhang, T., Ramakrishnan, R., and Livny, M. 1996. BIRCH: An efficient data clustering method for very large databases. SIGMOD'96.
- Bagging and Boosting
- 13. AdaBoost: Freund, Y. and Schapire, R. E. 1997. A decision-theoretic generalization of on-line learning and an application to boosting.