1
DATA MINING FOR BUSINESS INTELLIGENCE (8)
2
  • Factor Analysis
  • The Factor/PCA node provides powerful
    data-reduction techniques to reduce the
    complexity of your data. Two similar but distinct
    approaches are provided.
  • Principal components analysis (PCA) finds linear
    combinations of the input fields that do the best
    job of capturing the variance in the entire set
    of fields, where the components are orthogonal
    (perpendicular) to each other. PCA focuses on all
    variance, including both shared and unique
    variance.
  • Factor analysis attempts to identify underlying
    concepts, or factors, that explain the pattern of
    correlations within a set of observed fields.
    Factor analysis focuses on shared variance only.
    Variance that is unique to specific fields is not
    considered in estimating the model. Several
    methods of factor analysis are provided by the
    Factor/PCA node.
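As a sketch of the PCA side (this is a generic numpy illustration, not the Factor/PCA node itself), the orthogonal components are the eigenvectors of the covariance matrix of the input fields; the toy data below is invented:

```python
import numpy as np

# Toy data: 6 records, 3 correlated input fields (invented values).
X = np.array([
    [2.5, 2.4, 0.5],
    [0.5, 0.7, 2.1],
    [2.2, 2.9, 0.8],
    [1.9, 2.2, 1.0],
    [3.1, 3.0, 0.4],
    [1.1, 0.9, 1.8],
])

# Center each field, then take the eigendecomposition of the
# covariance matrix; the eigenvectors are the orthogonal components.
Xc = X - X.mean(axis=0)
cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)      # returned in ascending order
order = np.argsort(eigvals)[::-1]           # sort descending by variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Keep the first component: the linear combination of input fields
# that captures the most variance in the entire set of fields.
scores = Xc @ eigvecs[:, :1]
explained = eigvals[0] / eigvals.sum()
print(f"variance explained by first component: {explained:.2f}")
```

Because PCA works on all variance, the explained fraction here includes both shared and unique variance of the fields.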

3
  • For both approaches, the goal is to find a small
    number of derived fields that effectively
    summarize the information in the original set of
    fields.

4
  • Requirements
  • Only numeric fields can be used in a factor/PCA
    model. To estimate a factor analysis or PCA, you
    need one or more In fields. Fields with direction
    Out, Both, or None are ignored, as are
    non-numeric fields.

5
  • Strengths
  • Factor analysis and PCA can effectively reduce
    the complexity of your data without sacrificing
    much of the information content. These techniques
    can help you build more robust models that
    execute more quickly than would be possible with
    the raw input fields.

6
  • Factor analysis is often confused with principal
    components analysis. The two methods are related
    but distinct, though factor analysis becomes
    essentially equivalent to principal components
    analysis if the "errors" in the factor model (the
    variance unique to each field) are assumed to all
    have the same variance.

7
  • Factor analysis in marketing
  • The basic steps are
  • Identify the salient attributes consumers use to
    evaluate products in this category.
  • Use quantitative marketing research techniques
    (such as surveys) to collect data from a sample
    of potential customers concerning their ratings
    of all the product attributes.
  • Input the data into a statistical program and run
    the factor analysis procedure. The computer will
    yield a set of underlying attributes (or
    factors).
  • Use these factors to construct perceptual maps
    and other product positioning devices.
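The third step above can be approximated outside a statistics package. The sketch below uses the eigenvalues of the correlation matrix of hypothetical survey ratings, with the common Kaiser rule (keep factors with eigenvalue greater than 1) standing in for a full factor extraction; the respondents, attributes, and hidden concepts are all invented:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical survey: 100 respondents rate 4 product attributes.
# Attributes 0-1 reflect one hidden concept, attributes 2-3 another.
quality = rng.normal(size=100)
value = rng.normal(size=100)
ratings = np.column_stack([
    quality + 0.3 * rng.normal(size=100),
    quality + 0.3 * rng.normal(size=100),
    value + 0.3 * rng.normal(size=100),
    value + 0.3 * rng.normal(size=100),
])

# Eigenvalues of the correlation matrix; the Kaiser rule
# (eigenvalue > 1) suggests how many underlying factors to keep.
corr = np.corrcoef(ratings, rowvar=False)
eigvals = np.linalg.eigvalsh(corr)[::-1]
n_factors = int((eigvals > 1.0).sum())
print("factors suggested:", n_factors)
```

The two recovered factors would then be named (e.g. "quality" and "value") and used as axes of a perceptual map.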

8
  • Advantages
  • Both objective and subjective attributes can be
    used
  • Factor Analysis can be used to identify the
    hidden dimensions or constructs which may or may
    not be apparent from direct analysis.
  • It is easy to do, inexpensive, and accurate
  • There is flexibility in naming and using
    dimensions

9
  • Disadvantages
  • Usefulness depends on the researchers' ability to
    develop a complete and accurate set of product
    attributes - If important attributes are missed
    the value of the procedure is reduced
    accordingly.
  • Naming of the factors can be difficult - multiple
    attributes can be highly correlated with no
    apparent reason.
  • If the observed variables are completely
    unrelated, factor analysis is unable to produce a
    meaningful pattern (though the eigenvalues will
    highlight this, suggesting that each variable
    should be given a factor in its own right).
  • If sets of observed variables are highly similar
    to each other but distinct from other items,
    factor analysis will assign a factor to them,
    even though this factor will essentially capture
    the true variance of a single item. In other
    words, it is not possible to know what the
    'factors' actually represent; only theory can
    help inform the researcher on this.

10
  • Feature Selection
  • This option allows you to specify the default
    settings for selecting or excluding predictor
    fields in the generated model. You can then add
    the model to a stream to select a subset of
    fields for use in subsequent model-building
    efforts.

11
  • The following options are available
  • All fields ranked. Selects fields based on their
    ranking as important, marginal, or unimportant.
    You can edit the label for each ranking as well
    as the cutoff values used to assign records to
    one rank or another.
  • Top number of fields. Selects the top n fields
    based on importance.
  • Importance greater than. Selects all fields with
    importance greater than the specified value.
  • The target field is always preserved regardless
    of the selection.
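The three selection rules can be sketched in a few lines, assuming importance scores have already been computed for each predictor (the field names, scores, and the 0.95/0.90 cutoffs below are illustrative, not the node's actual output):

```python
# Hypothetical importance scores, one per predictor field.
importance = {"age": 0.98, "income": 0.95, "region": 0.80, "id": 0.10}

def rank(score):
    # Assumed cutoffs for the important / marginal / unimportant labels.
    if score >= 0.95:
        return "important"
    if score >= 0.90:
        return "marginal"
    return "unimportant"

# All fields ranked:
ranked = {f: rank(s) for f, s in importance.items()}

# Top number of fields (n = 2):
top2 = sorted(importance, key=importance.get, reverse=True)[:2]

# Importance greater than 0.90:
above = [f for f, s in importance.items() if s > 0.90]

print(ranked, top2, above)
```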

12
  • Importance Ranking Options
  • Importance is measured on a percentage scale and
    can be defined broadly as 1 minus the p value, or
    the probability of obtaining a result as extreme
    or more extreme than the observed result by
    chance alone.

13
  • All categorical. When all predictors and the
    target are categorical, importance can be ranked
    based on any of four measures
  • Pearson chi-square. Tests for independence of the
    target and the predictor without indicating the
    strength or direction of any existing
    relationship.
  • Likelihood-ratio chi-square. Similar to Pearson's
    chi-square and also tests for target-predictor
    independence.
  • Cramer's V. A measure of association based on
    Pearson's chi-square statistic. Values range from
    0, which indicates no association, to 1, which
    indicates perfect association.
  • Lambda. A measure of association reflecting the
    proportional reduction in error when the variable
    is used to predict the target value. A value of 1
    indicates that the predictor perfectly predicts
    the target, while a value of 0 means that the
    predictor provides no useful information about
    the target.
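The Pearson chi-square and Cramer's V can both be computed directly from a contingency table of predictor versus target categories; the counts below are invented for illustration:

```python
import numpy as np

# Hypothetical contingency table: rows = predictor categories,
# columns = target categories (counts of records).
observed = np.array([[30, 10],
                     [10, 30]])

n = observed.sum()
expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / n

# Pearson chi-square: tests target/predictor independence.
chi2 = ((observed - expected) ** 2 / expected).sum()

# Cramer's V: association strength on a 0 (none) to 1 (perfect) scale.
k = min(observed.shape) - 1
cramers_v = np.sqrt(chi2 / (n * k))
print(f"chi2={chi2:.2f}, V={cramers_v:.2f}")
```

Note that the chi-square value alone says nothing about strength or direction; Cramer's V rescales it into an association measure.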

14
  • Some categorical. When some, but not all,
    predictors are categorical and the target is
    also categorical, importance can be ranked based
    on either the Pearson or likelihood-ratio
    chi-square. (Cramer's V and lambda are not
    available unless all predictors are categorical.)

15
  • Categorical versus continuous. When ranking a
    categorical predictor against a continuous target
    or vice versa (one or the other is categorical
    but not both), the F statistic is used.

16
  • Both continuous. When ranking a continuous
    predictor against a continuous target, the t
    statistic based on the correlation coefficient is
    used.
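The t statistic derived from a correlation coefficient r on n records is t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom. A small sketch with synthetic fields:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=50)                   # continuous predictor
y = 0.8 * x + 0.6 * rng.normal(size=50)   # correlated continuous target

n = len(x)
r = np.corrcoef(x, y)[0, 1]
# t statistic based on the correlation coefficient,
# with n - 2 degrees of freedom.
t = r * np.sqrt(n - 2) / np.sqrt(1 - r**2)
print(f"r={r:.2f}, t={t:.2f}")
```

Importance would then be 1 minus the p value obtained from this t statistic.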

17
  • Neural Networks
  • The network learns by examining individual
    records, generating a prediction for each record,
    and making adjustments to the weights whenever it
    makes an incorrect prediction. This process is
    repeated many times, and the network continues to
    improve its predictions until one or more of the
    stopping criteria have been met.

18
  • Initially, all weights are random, and the
    answers that come out of the net are probably
    nonsensical. The network learns through training.
    Examples for which the output is known are
    repeatedly presented to the network, and the
    answers it gives are compared to the known
    outcomes. Information from this comparison is
    passed back through the network, gradually
    changing the weights. As training progresses, the
    network becomes increasingly accurate in
    replicating the known outcomes. Once trained, the
    network can be applied to future cases where the
    outcome is unknown.
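This error-driven weight adjustment can be sketched with a single-layer perceptron, a far simpler learner than the Neural Net node but with the same learn-from-mistakes loop; all data here is synthetic:

```python
import numpy as np

rng = np.random.default_rng(2)

# Linearly separable toy data: class 1 when x0 + x1 > 1.
X = rng.uniform(0, 1, size=(200, 2))
y = (X.sum(axis=1) > 1).astype(int)

# Initially all weights are random, so early answers are nonsensical;
# training repeatedly presents known examples and nudges the weights
# only when the prediction is wrong.
w = rng.normal(size=2)
b = rng.normal()
lr = 0.1
for _ in range(50):                      # repeated passes over the data
    for xi, yi in zip(X, y):
        pred = int(xi @ w + b > 0)
        if pred != yi:                   # incorrect prediction
            w += lr * (yi - pred) * xi
            b += lr * (yi - pred)

accuracy = np.mean([int(xi @ w + b > 0) == yi for xi, yi in zip(X, y)])
print(f"training accuracy: {accuracy:.2f}")
```

As training progresses, the weights increasingly replicate the known outcomes, after which the model can be applied to new cases.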

19
  • Requirements. There are no restrictions on field
    types. Neural Net nodes can handle numeric,
    symbolic, or flag inputs and outputs. The Neural
    Net node expects one or more fields with
    direction In and one or more fields with
    direction Out. Fields set to Both or None are
    ignored. Field types must be fully instantiated
    when the node is executed.

20
  • Strengths. Neural networks are powerful general
    function estimators. They usually perform
    prediction tasks at least as well as other
    techniques and sometimes perform significantly
    better. They also require minimal statistical or
    mathematical knowledge to train or apply.
    Clementine incorporates several features to avoid
    some of the common pitfalls of neural networks,
    including sensitivity analysis to aid in
    interpretation of the network, pruning and
    validation to prevent overtraining, and dynamic
    networks to automatically find an appropriate
    network architecture.

21
  • Clustering Models
  • K-means clustering
  • Kohonen networks
  • TwoStep clustering
  • Clustering models focus on identifying groups of
    similar records and labeling the records
    according to the group to which they belong. This
    is done without the benefit of prior knowledge
    about the groups and their characteristics. In
    fact, you may not even know exactly how many
    groups to look for.

22
  • What distinguishes clustering models from the
    other machine-learning techniques is that there
    is no predefined output or target field for the
    model to predict. These models are often referred
    to as unsupervised learning models, since there
    is no external standard by which to judge the
    model's classification performance.

23
  • The K-Means node clusters the data set into
    distinct groups (or clusters). The method defines
    a fixed number of clusters, iteratively assigns
    records to clusters, and adjusts the cluster
    centers until further refinement can no longer
    improve the model. Instead of trying to predict
    an outcome, K-Means uses a process known as
    unsupervised learning to uncover patterns in the
    set of input fields.
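The assign-and-adjust loop can be sketched in a few lines of numpy; this is a generic K-Means illustration with invented two-group data, not the K-Means node's implementation:

```python
import numpy as np

rng = np.random.default_rng(3)

# Two obvious groups of records, with no target field (unsupervised).
X = np.vstack([rng.normal(0, 0.5, size=(50, 2)),
               rng.normal(5, 0.5, size=(50, 2))])

k = 2                                        # fixed number of clusters
centers = X[rng.choice(len(X), k, replace=False)]
for _ in range(20):
    # Assign each record to its nearest cluster center...
    d = np.linalg.norm(X[:, None] - centers[None], axis=2)
    labels = d.argmin(axis=1)
    # ...then move each center to the mean of its records.
    new_centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    if np.allclose(new_centers, centers):    # no further refinement
        break
    centers = new_centers

print("cluster sizes:", np.bincount(labels))
```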

24
  • The TwoStep node uses a two-step clustering
    method. The first step makes a single pass
    through the data to compress the raw input data
    into a manageable set of subclusters. The second
    step uses a hierarchical clustering method to
    progressively merge the subclusters into larger
    and larger clusters. TwoStep has the advantage of
    automatically estimating the optimal number of
    clusters for the training data. It can handle
    mixed field types and large data sets efficiently

25
  • The Kohonen node generates a type of neural
    network that can be used to cluster the data set
    into distinct groups. When the network is fully
    trained, records that are similar should appear
    close together on the output map, while records
    that are different will appear far apart. You can
    look at the number of observations captured by
    each unit in the generated model to identify the
    strong units. This may give you a sense of the
    appropriate number of clusters.
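A toy self-organizing map conveys the idea; this is a generic Kohonen sketch with an invented 1 x 4 output map and synthetic data, not the node's implementation:

```python
import numpy as np

rng = np.random.default_rng(4)

# Records drawn from two distinct groups.
X = np.vstack([rng.normal(0, 0.3, size=(60, 2)),
               rng.normal(3, 0.3, size=(60, 2))])
rng.shuffle(X)

# A tiny 1 x 4 output map; each unit holds a weight vector.
units = rng.normal(size=(4, 2))
for t in range(100):
    lr = 0.5 * (1 - t / 100)                 # decaying learning rate
    for x in X:
        # Best-matching unit: the unit closest to this record.
        bmu = int(np.argmin(np.linalg.norm(units - x, axis=1)))
        for j in range(4):
            h = np.exp(-abs(j - bmu))        # neighbors move less
            units[j] += lr * h * (x - units[j])

# Count the observations captured by each unit; strong units hint
# at the appropriate number of clusters.
hits = [int(np.argmin(np.linalg.norm(units - x, axis=1))) for x in X]
counts = np.bincount(hits, minlength=4)
print("observations per unit:", counts)
```

Similar records end up on nearby units of the trained map, while dissimilar records land far apart.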

26
  • Clustering models are often used to create
    clusters or segments that are then used as inputs
    in subsequent analyses. A common example of this
    is the market segments used by marketers to
    partition their overall market into homogeneous
    subgroups. Each segment has special
    characteristics that affect the success of
    marketing efforts targeted toward it. If you are
    using data mining to optimize your marketing
    strategy, you can usually improve your model
    significantly by identifying the appropriate
    segments and using that segment information in
    your predictive models.

27
  • K-Means Node
  • Note The resulting model depends to a certain
    extent on the order of the training data.
    Reordering the data and rebuilding the model may
    lead to a different final cluster model.
  • Requirements. To train a K-Means model, you need
    one or more In fields. Fields with direction Out,
    Both, or None are ignored.
  • Strengths. You do not need to have data on group
    membership to build a K-Means model. The K-Means
    model is often the fastest method of clustering
    for large data sets.


29
  • The TwoStep node can use partitioned data,
    splitting the data into separate subsets, or
    samples, for training, testing, and validation
    based on the current partition field.

30
  • The TwoStep node uses standardized numeric
    fields. By default, TwoStep will standardize all
    numeric input fields to the same scale, with a
    mean of 0 and a variance of 1. To retain the
    original scaling for numeric fields, deselect
    this option. Symbolic fields are not affected.
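Standardizing a numeric field to mean 0 and variance 1 is a one-line transformation; the two fields below (on very different scales) are invented:

```python
import numpy as np

# Two numeric input fields on very different scales.
X = np.array([[20.0, 50000.0],
              [30.0, 60000.0],
              [40.0, 70000.0]])

# Standardize each field to mean 0 and variance 1 so that no field
# dominates distance calculations simply because of its scale.
Z = (X - X.mean(axis=0)) / X.std(axis=0)
print(Z.mean(axis=0), Z.var(axis=0))
```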

31
  • The TwoStep node excludes outliers. If you select
    this option, records that don't appear to fit
    into a substantive cluster will be automatically
    excluded from the analysis. This prevents such
    cases from distorting the results.
  • Outlier detection occurs during the
    pre-clustering step

32
  • The TwoStep node
  • Automatically calculate number of clusters.
    TwoStep can very rapidly analyze a large
    number of cluster solutions to choose the optimal
    number of clusters for the training data. Specify
    a range of solutions to try by setting the
    Maximum and the Minimum number of clusters.
  • Specify number of clusters. If you know how many
    clusters to include in your model, select this
    option and enter the number of clusters.
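TwoStep's actual selection uses an information criterion over its hierarchical merge sequence. As a rough stand-in for the same idea, the sketch below scores a range of K-Means solutions with a BIC-style penalty, so extra clusters must buy a real drop in within-cluster spread; the data, the penalty weight, and the initialization are all illustrative:

```python
import numpy as np

rng = np.random.default_rng(5)

# Three well-separated groups of records, 15 each.
X = np.vstack([rng.normal(c, 0.3, size=(15, 2)) for c in (0, 5, 10)])
n = len(X)

def kmeans_inertia(X, k, iters=25):
    # Farthest-point initialization, then standard K-Means updates.
    centers = [X[0]]
    for _ in range(k - 1):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers)
    for _ in range(iters):
        labels = np.linalg.norm(X[:, None] - centers[None], axis=2).argmin(axis=1)
        centers = np.array([X[labels == j].mean(axis=0) if (labels == j).any()
                            else centers[j] for j in range(k)])
    return ((X - centers[labels]) ** 2).sum()

# Score each candidate number of clusters: goodness of fit
# (log within-cluster spread) plus a BIC-style complexity penalty.
scores = {k: n * np.log(kmeans_inertia(X, k) / n) + 3 * k * np.log(n)
          for k in range(1, 7)}
best_k = min(scores, key=scores.get)
print("chosen number of clusters:", best_k)
```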