A systematic overview of Data Mining Algorithms presentation

1
A systematic overview of Data Mining Algorithms

3
Data Mining algorithm specification

Task (visualization, classification, clustering, regression)
Structure (functional form) of the model that we are fitting to the data (linear regression, hierarchical clustering)
Score function
Goodness of fit
Generalization
Search/optimization method (hill climbing, simulated annealing, convergence specification)

4
CART classification

5
CART Tree

6
Tree classification vs. Linear classification

Trees can deal with mixed data types (a combination of categorical and real-valued data)
Trees make it easier to deal with a large number of variables (because they process one variable at a time)

7
CART tree classification

8
CART

The search space is all possible trees (combinatorially large).
Approximation algorithm: uses greedy local search to identify good candidate tree structures
Two phases:
Recursively expands the tree from the root
Prunes branches
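
The greedy step in the expansion phase can be sketched as follows. This is a minimal illustration, not the CART implementation itself: it assumes numeric features only and uses Gini impurity as the split criterion (the slides do not name one), and the names `gini` and `best_split` and the toy data are hypothetical. The recursion over child nodes and the pruning phase are left out.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the node is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels):
    """Greedy local search for one node: try every (feature, threshold)
    pair and keep the one with the lowest weighted impurity.

    Returns (score, feature_index, threshold), or None if no split exists.
    """
    best = None
    n = len(rows)
    for feature in range(len(rows[0])):
        # Variables are examined one at a time.
        for threshold in sorted({row[feature] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
            if not left or not right:
                continue  # a split must send data to both children
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, feature, threshold)
    return best


# Toy data, invented for illustration only.
rows = [(1.0, 5.0), (2.0, 4.0), (3.0, 1.0), (4.0, 2.0)]
labels = ["a", "a", "b", "b"]
score, feature, threshold = best_split(rows, labels)
```

A full tree builder would call `best_split` recursively on the left and right subsets until a stopping rule fires, then prune.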

9
Reductionist view on data mining algorithms

Algorithms are tuples of (model structure, score function, search method, data management)
When deciding on an algorithm for an application, think about which components are suitable
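
One way to picture the tuple view is as a small record type. The sketch below is only an illustration: the class name `AlgorithmSpec` and its field names are hypothetical, and the CART entry is filled in from the CART slides in this deck, with `None` wherever the slides do not state a component.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AlgorithmSpec:
    """A data mining algorithm described by its components."""
    task: str
    model_structure: str
    score_function: Optional[str]
    search_method: Optional[str]
    data_management: Optional[str]


# Values taken from the CART slides; None marks components the slides leave unstated.
cart = AlgorithmSpec(
    task="classification",
    model_structure="tree",
    score_function=None,
    search_method="greedy local search over tree structures, then pruning",
    data_management=None,
)
```

Writing two algorithms side by side in this form makes it easier to see which component actually differs between them.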

10
Multilayer Perceptrons (MLP) for Regression and Classification

11
MLP Basic idea

12
MLP algorithm tuple

13
MLP

14
Score function

Commonly used score function:
$S = \sum_i \left( y(i) - \hat{y}(i) \right)^2$
$y(i)$ is the true target value and $\hat{y}(i)$ is the output of the network for the $i$th data point
$\hat{y}(i)$ is a function of the input vector $x(i)$ and the weights
Training the network means minimizing $S$
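
As a small worked example, the sum-of-squares score can be computed directly from the target values and the network outputs. The function name `sum_squared_error` and the numbers are made up for illustration; the outputs here are given as plain values rather than produced by a network.

```python
def sum_squared_error(targets, outputs):
    """Sum over data points of (true target - network output) squared."""
    if len(targets) != len(outputs):
        raise ValueError("targets and outputs must have the same length")
    return sum((y - y_hat) ** 2 for y, y_hat in zip(targets, outputs))


# Illustrative values only.
targets = [1.0, 0.0, 2.0]
outputs = [0.5, 0.0, 3.0]
s = sum_squared_error(targets, outputs)  # 0.25 + 0.0 + 1.0
```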

15
Training methods

Back-propagation
Steepest descent on the score function: descends to a local minimum from a randomly chosen starting point in the parameter space
Non-linear optimization techniques
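
Steepest descent from a random starting point can be sketched on a deliberately tiny model. The example below assumes a single-weight linear model y = w * x instead of an MLP, so the gradient can be written by hand; in a multilayer network, back-propagation is what supplies this gradient. The step size, iteration count, seed, and data are arbitrary illustrative choices.

```python
import random


def sse_gradient(w, xs, ys):
    """Derivative of sum((y - w*x)^2) with respect to w."""
    return sum(-2.0 * x * (y - w * x) for x, y in zip(xs, ys))


def steepest_descent(xs, ys, step=0.01, iterations=500, seed=0):
    """Start from a randomly chosen weight and repeatedly move
    against the gradient of the score function."""
    rng = random.Random(seed)
    w = rng.uniform(-1.0, 1.0)  # random point in the parameter space
    for _ in range(iterations):
        w -= step * sse_gradient(w, xs, ys)
    return w


# Illustrative data generated by y = 2 * x.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]
w = steepest_descent(xs, ys)
```

This toy score function has a single minimum, so the starting point does not matter. For an MLP the score surface generally has many local minima, and different starting points can end in different ones.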

16
Tree vs. MLP

Tree algorithms search through models of different complexities in a relatively automated manner.
There is no accepted procedure for choosing the network structure of an MLP (number of hidden nodes, layers)
In practice, it is chosen by trial and error

17
Apriori Algorithm for Association Rule Learning

19
Association Rules

20
Example

21
Association Rules Summary

Systematic search explicitly tries to minimize the number of linear scans through the database
Designed to operate on very large data sets
Focus on computational efficiency
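
The level-wise search can be sketched as follows, with one pass over the transactions per itemset size. This is a simplified in-memory illustration, not a tuned implementation: the function name `apriori`, the `min_support` count threshold, and the shopping-basket data are assumptions made for the example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: count} for every itemset that occurs in at
    least min_support transactions."""
    transactions = [frozenset(t) for t in transactions]
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # One linear scan of the data counts all size-k candidates.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Build size-(k+1) candidates, keeping only those whose
        # size-k subsets are all frequent (this prunes the search).
        k += 1
        candidates = set()
        for a in level:
            for b in level:
                union = a | b
                if len(union) == k and all(
                    frozenset(s) in level for s in combinations(union, k - 1)
                ):
                    candidates.add(union)
    return frequent


# Invented transactions for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
frequent = apriori(transactions, min_support=2)
```

Because a candidate is only generated when all of its subsets are already frequent, the three-item set here is never counted, which saves a scan.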

22
Vector Space Algorithms for Text Retrieval

Task: retrieval of the k most similar documents in a database relative to a given query
Representation: vector of term occurrences
Score function: angle between two vectors
Search method
Model representation is the key idea (which terms to use in a vector)
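
A minimal sketch of these components: documents and the query become term-count vectors, and the cosine of the angle between two vectors is used as the similarity. The slide leaves the search method blank, so the linear scan over all documents in `top_k` is an assumption, as are the function names, the whitespace tokenization, and the sample documents.

```python
import math
from collections import Counter


def term_vector(text):
    """Represent a text as a vector of term occurrence counts."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine of the angle between two term vectors (1.0 = same direction)."""
    dot = sum(count * v[term] for term, count in u.items() if term in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def top_k(query, documents, k):
    """Return indices of the k documents most similar to the query,
    found by scoring every document (ties go to the lower index)."""
    q = term_vector(query)
    scored = [(cosine(q, term_vector(doc)), i) for i, doc in enumerate(documents)]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in scored[:k]]


# Invented documents for illustration.
documents = ["data mining algorithms", "tree classification", "mining text data"]
ranking = top_k("data mining", documents, k=2)
```

Changing which terms go into `term_vector` (for example dropping common words) changes the ranking without touching the score or the search, which is the point about model representation being the key idea.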

The statistical approach emphasizes theoretical aspects of inference procedures (parameter estimation, model selection)
The computer science approach focuses on more efficient search and data management
