Delivering Business Intelligence With Microsoft SQL Server 2005 - Part 2

1
Delivering Business Intelligence With Microsoft
SQL Server 2005- Part 2
2
Summary of Topics
  • Topic 1 What Is Data Mining?
  • Topic 2 SQL Server Data Mining Architecture
    Overview
  • Topic 3 Tasks Accomplished by Data Mining
  • Topic 4 Steps for Data Mining
  • Topic 5 Data Mining Algorithms
  • Topic 6 Data Mining Structure Model

3
Topic 1 What Is Data Mining?
  • Data mining works in a manner similar to our
    topographic map. It makes connections within the
    data that may not be readily apparent to the
    human observer. These connections make the data
    easier to understand. Using this new
    understanding, we can efficiently analyze the
    data and set direction, so our organizations do
    not tumble over a cliff.

4
Making Connections
5
Making Connections -- Planning a route using a
topographic map
                                        
6
Making Predictions
  • Data mining provides us with a similar
    capability. We can use the patterns found in our
    data to make predictions of what will happen
    next. Data mining may find patterns in the way
    our clients make use of our services. Based on
    these patterns, we can predict which clients may
    need additional services in the future. Data
    mining may find patterns in the valid data
    entered for a particular screen. Based on these
    patterns, we can predict whether a newly entered
    set of data is accurate or contains a data entry
    error.

7
Basic Data Mining Terminology
  • The data mining industry uses a lot of terms
    fairly loosely, so it's helpful for us to define
    a few of these terms early on. This is not an
    exhaustive list of data mining terms, only the
    ones relevant to our discussion.
  • Algorithm The programmatic technique used to
    identify the relationships or patterns in the
    data.
  • Model The definition of the relationship
    identified by the algorithm, which generally
    takes the form of a set of rules, a decision
    tree, a set of equations, or a set of
    associations.
  • Case The collection of attributes and
    relationships (variables) that are associated
    with an individual object, usually a customer.
    The case is also known as an observation.

8
Basic Data Mining Terminology
  • Case set A group of cases that share the same
    attributes. Think of a case set as a table with
    one row per unique object (like customer). It's
    possible to have a nested case set when one row
    in the parent table, like customer, joins to
    multiple rows in the nested table, like
    purchases. The case set is also known as an
    observation set.
  • Dependent variable(s) (or predicted attribute or
    predict column) The variable the algorithm will
    build a model to predict or classify.

9
Basic Data Mining Terminology
  • Independent variable(s) (or predictive attribute
    or input column) The variable with descriptive
    information used to build the model. The
    algorithm creates a model that uses combinations
    of independent variables to define a grouping or
    predict the dependent variable.

10
Basic Data Mining Terminology
  • Discrete or Continuous Variables Numeric columns
    can contain continuous or discrete values. A
    column in the Employee table called Salary that
    contains the actual salary values is a
    continuous variable. You can add a column to the
    table during data preparation called
    SalaryRange, containing integers to represent
    encoded salary ranges (1 = 0 to 25,000; 2 =
    between 25,000 and 50,000; and so on). This is a
    discrete numeric column. Early data mining and
    statistical analysis tools required the
    conversion of strings to numeric values like the
    encoded salary ranges. Most tools, including
    most of the SQL Server data mining algorithms,
    allow the use of character descriptions as
    discrete values. The string "0 to 25,000" is
    easier to understand than the number 1. Discrete
    variables are also known as categorical. This
    distinction between discrete and continuous is
    important to the underlying algorithms in data
    mining, although its significance is less
    obvious to those of us who are not
    statisticians.

11
Basic Data Mining Terminology
  • Regression A statistical technique that creates
    a best-fit formula based on a dataset. The
    formula can be used to predict values based on
    new input variables. In linear regression, the
    formula is the equation for a line.
  • Deviation A measure of how well the regression
    formula fits the actual values in the dataset
    from which it was created.
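
The best-fit line and its deviation can be illustrated with a minimal least-squares sketch; the data points below are invented for illustration, and real tools fit far larger datasets.

```python
# Minimal linear-regression sketch using only the standard library.
def fit_line(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Least-squares slope and intercept for the best-fit line.
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
            sum((x - mean_x) ** 2 for x in xs)
    return slope, mean_y - slope * mean_x

xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
slope, intercept = fit_line(xs, ys)
# Deviation: sum of squared differences between actual and fitted values,
# measuring how well the formula fits the data it was created from.
deviation = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
```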

12
Basic Data Mining Terminology
  • Mining Structure A Microsoft data mining term
    used as a name for the definition of a case set
    in Analysis Services. The mining structure is
    essentially a metadata layer on top of a data
    source view that includes additional data
    mining-related flags and column properties, like
    the field that identifies a column as input,
    predict, both, or ignore. A mining structure can
    be used as the basis for multiple mining models.
  • Mining Model The specific application of an
    algorithm to a particular mining structure. You
    can build several mining models with different
    parameters or different algorithms from the same
    mining structure.

13
Topic 2 SQL Server Data Mining Architecture
Overview
14
SQL Server Data Mining Architecture Overview
  • Point A in the Architecture Figure shows how
    data mining models are built using the
    dimensional engine, leveraging its ability to
    load data and quickly perform the base
    statistical calculations like sums, averages,
    and counts. The data mining server can also
    easily pull case data from both relational and
    Analysis Services databases, as seen at point B
    in the Architecture Figure.

15
SQL Server Data Mining Architecture Overview
  • Point C in Architecture Figure shows how the
    typical developer will first experience data
    mining by creating a BI Studio Analysis Services
    project and then using the Data Mining Wizard to
    create a new data mining structure and an initial
    data mining model. The mining structure is a new
    construct that provides a metadata layer allowing
    several mining models to work with the same input
    data. Each mining model in a mining structure can
    have different algorithms and parameters. The
    wizard provides model building guidance with auto
    selection and adjustment of variables based on
    the algorithm selected. The wizard also helps you
    create case sets, including complex, nested
    queries.

16
Build, Deploy, and Process
  • Most of the functions in the Data Mining Designer
    work with the actual model as it exists in
    Analysis Services. This means once the wizard is
    complete, the developer must build and deploy the
    project (which includes processing the model
    cubes) before any more progress can be made.
    Building the project writes the metadata out to
    project files in the development environment. The
    actual model does not come into being until the
    project is deployed to an Analysis Services
    instance. At that point, BI Studio creates a
    database for the project in Analysis Services. It
    writes out the model structure metadata and the
    definition of each model. Finally, it creates a
    cube for each model and processes the models,
    inserting the training data so the algorithm can
    calculate the rules, correlations, and other
    relationships. Until the project is deployed and
    a model is processed, it cannot be viewed in the
    viewers.

18
Accessing the Mining Models
  • As you see at Point D in the Architecture
    Figure, Data Mining eXtensions to SQL (DMX) is
    at the core of all the Microsoft data mining
    APIs. As the name suggests, DMX is an extension
    to SQL designed to create, train, modify, and
    query data mining models. DMX was introduced
    with SQL Server 2000 as part of the OLE DB for
    Data Mining APIs. It has been enhanced in SQL
    Server 2005 with additional options for data
    sources and more flexible SELECT functionality.
    An easy way to begin learning about DMX is to
    use the Mining Model Prediction tab in the Data
    Mining Designer and examine the syntax it
    generates for DMX queries. The code can be
    copied to a DMX query window in SQL Server
    Management Studio for further exploration.
    Although DMX is an extension to SQL, queries are
    submitted to the Analysis Services server;
    that's where the data mining services are.
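
A DMX prediction query of the kind the Designer generates might look like the sketch below. Every name here (model, data source, table, and columns) is hypothetical, and the exact syntax should be checked against what the Designer actually emits.

```
-- Hypothetical names throughout; illustrative DMX syntax only.
SELECT
  t.CustomerKey,
  [Credit Risk Model].[Risk Level],
  PredictProbability([Credit Risk Model].[Risk Level]) AS RiskProbability
FROM [Credit Risk Model]
PREDICTION JOIN
  OPENQUERY([My Data Source],
    'SELECT CustomerKey, CompanySize, AnnualRevenue FROM NewCustomers') AS t
ON  [Credit Risk Model].[Company Size]   = t.CompanySize
AND [Credit Risk Model].[Annual Revenue] = t.AnnualRevenue
```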

19
Accessing the Mining Models
  • When the development environment submits its DMX
    commands and queries to Analysis Services, it
    uses the XML for Analysis (XMLA) APIs. In fact,
    because Analysis Services is a native XMLA
    server, the data mining models are available to
    any application as a web service by using SOAP
    protocols. It is still possible to access the
    server with old OLE DB APIs, or ADO and ADO.NET.

20
Integration Services and Data Mining
  • Integration Services can play a major role in
    the data mining process, as shown in the
    Architecture Figure. Many of the standard
    transforms used for data cleansing and data
    integration are particularly valuable for
    building the training and test case sets.
    Besides the obvious tasks, like Data Conversion
    and Derived Column, tasks like Percentage
    Sampling, Row Sampling, Conditional Split,
    Lookup, and Merge Join are powerful components
    the data miner can use to build a set of
    packages to prepare the case sets for the data
    mining process. This is shown at point E in the
    Architecture Figure. In addition to the standard
    tasks, there are two Integration Services tasks
    that directly interact with the data mining
    models, shown at point F in the Architecture
    Figure.

21
Integration Services and Data Mining
  • The Data Mining Model Training destination task
    is the target for a training set used to train
    (or re-train) an existing mining model. This
    capability is perfect for the ongoing re-training
    required to keep certain mining models
    currentrecommendation models, for example. The
    Data Mining Query task is specifically designed
    to do prediction joins against a model in the IS
    pipeline, once the model has been trained and
    tested. This could be used to add scores to the
    Customer table or identify significant data
    anomalies during the nightly ETL process. It
    could also be used in a real-time mode to flag
    transactions that were potentially fraudulent.

22
Additional Features
  • There are several additional features that will
    be important for certain applications. Most of
    these can be found at point G in the
    Architecture Figure and are listed briefly here:
  • Extensibility Microsoft has provided a set of
    COM APIs that allow developers to integrate
    additional data mining algorithms into the data
    mining engine. They can integrate custom viewers
    into the data mining designer as well. Someone
    could even create a new viewer for an existing
    Microsoft algorithm.
  • Analysis Management Objects (AMO) AMO is a new
    API for managing the creation and maintenance of
    data mining objects, including creating,
    processing, backing up, restoring, and securing.

23
Additional Features
  • Stored procedures and user-defined functions A
    developer can create what are essentially stored
    procedures or user-defined functions and load
    them as managed assemblies into Analysis
    Services. This allows clients to work with large
    mining models through the intermediate layer of
    the server-based managed assembly.
  • Text mining It is possible to do some
    interesting data mining on unstructured text
    data, like the text in HTML files in a set of web
    directories, or even text fields in a database.
    For example, use the Integration Services Term
    Extraction transformation to build a dictionary
    of terms found in the text files. Then use data
    mining to create classification rules to
    categorize the documents according to the terms
    they contain. This is essentially a data mining
    application, not a new data mining algorithm, but
    it has value in certain areas.

24
Architecture Summary
  • The previous slides show how data mining fits
    into the overall SQL Server 2005 architecture,
    and how well data mining has been integrated
    into the SQL Server environment. It should be
    clear by now that data mining is a serious
    element of the BI toolset. Data mining and
    application developers will have a lot more to
    learn before they are proficient with all the
    data mining components. Fortunately, Microsoft's
    documentation is heavily weighted toward the
    development community. Additional help should be
    easy to find. A good place to start is a web
    site called www.sqlserverdatamining.com,
    maintained by the SQL Server data mining
    development team.

25
Topic 3 Tasks Accomplished by Data Mining
  • Classification
  • Regression
  • Segmentation
  • Association
  • Sequence Analysis
  • Probability Predictions

26
Classification
  • We may need to classify a potential customer as
    someone who is likely to need our services or
    someone who is not. If we know how to classify
    an individual, an entity, or a thing, we can
    make more intelligent decisions as we deal with
    that individual, entity, or thing.

27
Example of Classification using Tree Induction
Is the customer renting property > 2 years?
  No  -> Rent property
  Yes -> Is customer age > 45?
           No  -> Rent property
           Yes -> Buy property
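
The tree above can be hand-coded as a set of rules; the thresholds come from the slide's example, not from training data.

```python
# The example decision tree, written out as classification rules:
# classify whether a customer is likely to rent or buy property.
def classify_customer(years_renting, age):
    if years_renting <= 2:
        return "Rent property"
    if age <= 45:
        return "Rent property"
    return "Buy property"
```
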
28
Example of Classification using Neural Induction
29
Regression
  • Classification is used to predict the value for a
    discrete attribute, meaning an attribute that has
    one of a set number of distinct values.
    Regression, on the other hand, is used to predict
    a continuous value.
  • To predict a continuous value, regression looks
    at trends likely to continue and repeat over
    time.
  • Like classification, regression also looks at
    relationships between the value being predicted
    and other continuous values available in the
    data.

30
Regression - Value Prediction
  • Linear regression attempts to fit a straight line
    through a plot of the data, such that the line is
    the best representation of the average of all
    observations at that point in the plot.
  • Problem is that the technique only works well
    with linear data and is sensitive to the presence
    of outliers (i.e.., data values, which do not
    conform to the expected norm).

31
Segmentation
  • Segmentation is the "divide and conquer" approach
    to data analysis. Segmentation divides data into
    groups with similar characteristics. We can then
    analyze the characteristics of each group for
    insights.
  • Applications of database segmentation include
    customer profiling, direct marketing, and cross
    selling.
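
A minimal one-dimensional k-means sketch shows the idea of segmentation: records are grouped by similarity with no labels supplied up front. The values and starting centers below are invented for illustration.

```python
# Tiny 1-D k-means: repeatedly assign each value to its nearest center,
# then move each center to the mean of its group.
def kmeans_1d(values, centers, rounds=10):
    for _ in range(rounds):
        groups = {c: [] for c in centers}
        for v in values:
            nearest = min(centers, key=lambda c: abs(v - c))
            groups[nearest].append(v)
        centers = [sum(g) / len(g) if g else c for c, g in groups.items()]
    return sorted(centers)

ages = [21, 23, 25, 44, 47, 49]
segments = kmeans_1d(ages, centers=[20, 50])
```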

32
Example of Database Segmentation using a Scatter
plot
33
Association
  • Association requires we have some type of
    grouping in our data. Multiple items could be
    grouped together in a single sales transaction.
    Multiple services could be provided to a single
    family unit.
  • Classification, segmentation, and association
    can sometimes run together. They seem similar.
    The difference is this: With classification, we
    pick the attribute that defines the grouping,
    and the algorithm determines what most
    distinguishes those groupings. With
    segmentation, the algorithm itself creates the
    grouping, based on what it determines as
    distinguishing characteristics. With
    association, the underlying data already
    contains some type of grouping. The algorithm
    makes determinations about items likely to be in
    the group, rather than attributes they are
    likely to have in common.
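
The grouping idea behind association can be sketched as market-basket counting: tally how often each pair of items appears together in a transaction (the support counts behind association rules). The baskets are invented for illustration.

```python
from itertools import combinations
from collections import Counter

# Toy sales transactions: each set is one purchase.
transactions = [{"tank", "plane"}, {"tank", "plane", "ship"},
                {"tank", "ship"}, {"plane", "ship"}]

# Count co-occurrence of every item pair across transactions.
pair_support = Counter()
for basket in transactions:
    for pair in combinations(sorted(basket), 2):
        pair_support[pair] += 1
```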

34
Sequence Analysis
  • Sequence analysis is often applied to navigation
    on a website. A user is on Page A. The user is
    then most likely to navigate to Page B, with a
    somewhat smaller probability of navigating to
    Page C.
  • Finds patterns between events such that the
    presence of one set of items is followed by
    another set of items in a database of events
    over a period of time.
  • E.g., used to understand long-term customer
    buying behaviour.

35
Probability Predictions
  • It mathematically analyzes what has occurred in
    the past and determines what is most probable to
    occur if present trends continue.

36
Topic 4 Steps For Data Mining
  • Problem Definition
  • Data Preparation
  • Training
  • Validation
  • Deploy

37
Problem Definition
  • The first step for a data mining solution is to
    define the problem we are trying to solve. Data
    mining is great at finding patterns and
    relationships that are not apparent to human
    eyes. However, it cannot solve problems unless we
    point that analytical power in a specific
    direction.
  • We need to define the business goal we are
    working toward. Perhaps we want to do a better
    job at identifying prospective customers who are
    a bad credit risk. Perhaps we want to identify
    customers who are likely to switch to our
    competitor's products or services. Perhaps we
    want to predict future sales to put in place the
    capacity to meet those needs.

38
Problem Definition
  • Once the business goal is defined, we determine
    what we need data mining to do for us. We select
    which of the five data mining tasks we are
    looking to accomplish:
  • Classification
  • Regression
  • Segmentation
  • Association
  • Sequence Analysis
  • Our choice of task influences our choice of
    data mining algorithm, discussed in Topic 5.

39
Data Preparation
  • When we learned about OLAP cube design, one of
    our main concerns was where the data to populate
    the cube was going to come from. With data
    mining, we have the same concern: where are we
    going to get the data to be fed into our data
    mining algorithms?

40
Training
  • With algorithms selected and our data in place,
    we begin the process of working with the data
    mining model. You can think of a data mining
    model as the combination of one or more data
    mining algorithms with some real, live data. The
    model applies the algorithms to the data and
    creates the classifications, associations,
    regression formulas, and so forth that we use to
    solve our business problem.

41
Training
  • This step is known as training the data mining
    model. We provide our data to the data mining
    algorithms, so they can be trained on a
    particular aspect of our organization. They
    "learn" what is unique about our data, as well as
    the operations and entities, customers, products,
    and sales transactions that produced that data.

42
Validation
  • We spoke previously in this chapter about
    providing a data mining algorithm with past data,
    so it could find the classifications,
    associations, and so forth contained within. In
    actuality, we often divide our past data into two
    sets. One set of data is used to train the data
    model. The second set is used to validate the
    model after it is trained. Let's return to our
    credit risk example for an illustration.
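
Dividing past data into a training set and a validation set can be sketched as a random split; the 70/30 fraction and the seed below are conventions chosen for the example, not SQL Server requirements.

```python
import random

# Shuffle the past cases, then cut them into training and validation sets.
def split_cases(cases, train_fraction=0.7, seed=42):
    shuffled = list(cases)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]
```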

43
Deploy
  • Once we test the data mining model and determine
    which data mining algorithm gives us the most
    accurate information, we can put it to work. We
    place the model where it can be accessed by the
    front-end tools that will put it to work. We use
    the Data Mining Extensions (DMX) query language
    to feed new records to our algorithm and have it
    generate information.

44
Topic 5 Data Mining Algorithms
  • SQL Server 2005 provides us with seven data
    mining algorithms. Most of these algorithms
    perform several different tasks. Having a
    detailed understanding of the inner workings of
    each algorithm is unnecessary. Instead, this
    section provides a brief explanation of each to
    give you some background information. More
    important is the knowledge of what each can be
    used to accomplish.

45
Data Mining Algorithms
  • Microsoft Decision Trees
  • Microsoft Naïve Bayes
  • Microsoft Clustering
  • Microsoft Association
  • Microsoft Sequence Clustering
  • Microsoft Time Series
  • Microsoft Neural Network

46
5.1 Microsoft Decision Trees
  • Function The Microsoft Decision Trees algorithm
    is one of the easiest algorithms to understand
    because it creates a tree structure during its
    training process. (You probably already guessed
    that from the name.) The tree structure is then
    used to provide predictions and analysis.

47
The Microsoft Decision Trees Algorithm
                                               

48
Tasks
  • The main purpose of the Microsoft Decision
    Trees algorithm is Classification.
  • It can also be used for Regression and
    Association.

49
5.2 Microsoft Naïve Bayes
  • Microsoft's SQL Product Manager, Donald Farmer,
    claims because there is a Naïve Bayes algorithm,
    there must be a "deeply cynical" Bayes algorithm
    out there somewhere in data mining land. I guess
    it is needed to bring balance to data mining's
    version of the "force." We will try not to be too
    Naïve as we explore the benefits and shortcomings
    of this algorithm.

50
Function
  • The Naïve Bayes algorithm looks at each attribute
    of the entity in question and determines how that
    attribute, on its own, affects the attribute we
    are looking to predict. Figure 5.2 shows a Naïve
    Bayes algorithm being used to predict whether a
    customer is a good credit risk. One by one, the
    Naïve Bayes algorithm takes a single attribute of
    a customer, size of company, annual revenue, and
    so forth and looks at the training data to
    determine its effect on credit risk.

51
Figure 5.2 The Naïve Bayes algorithm
                                               

52
Function
  • In our diagram, 57% of companies with a size
    attribute of small are bad credit risks. Only
    14% of companies with a size attribute of large
    are bad credit risks. In this particular
    example, it looks pretty cut and dried: we
    should never extend credit to small companies,
    and we should always extend credit to large
    companies.
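
The one-attribute-at-a-time idea can be sketched with counts; the training cases (company size vs. credit risk) below are invented, and the helper names are ours.

```python
from collections import Counter, defaultdict

# Toy training cases: (company_size, credit_risk).
cases = [("small", "bad"), ("small", "bad"), ("small", "good"),
         ("large", "good"), ("large", "good"), ("large", "bad")]

# Count each outcome, and each attribute value per outcome -- Naive Bayes
# looks at each attribute on its own, assuming independence.
risk_counts = Counter(risk for _, risk in cases)
size_given_risk = defaultdict(Counter)
for size, risk in cases:
    size_given_risk[risk][size] += 1

def predict_risk(size):
    # Score each outcome by P(risk) * P(size | risk) and keep the highest;
    # normalization is unnecessary when only the winner matters.
    def score(risk):
        return (risk_counts[risk] / len(cases)) * \
               (size_given_risk[risk][size] / risk_counts[risk])
    return max(risk_counts, key=score)
```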

53
Tasks
  • The Naïve Bayes algorithm can only be used for
    Classification.

54
5.3 Microsoft Clustering
  • The Microsoft Clustering algorithm builds
    clusters of entities as it processes the
    training data set. This is shown in Figure 5.3.
    Once the clusters are created, the algorithm
    analyzes the makeup of each cluster. It looks at
    the values of each attribute for the entities in
    the cluster.
  • Cluster analysis lets the data miner assemble
    data into unforeseen groups containing similar
    characteristics. Also known as "segmentation,"
    this type of data mining is probably the most
    widely used.

55
Figure 5.3 The Microsoft Clustering Algorithm
                                               

56
Tasks
  • The main purposes of the Microsoft Clustering
    algorithm are
  • Segmentation
  • Regression
  • Classification (tree induction, neural
    induction)

57
Database Segmentation
  • Aim is to partition a database into an unknown
    number of segments, or clusters, of similar
    records.
  • Uses unsupervised learning to discover
    homogeneous sub-populations in a database to
    improve the accuracy of the profiles.

58
Example of Database Segmentation using a Scatter
plot
59
Example of Classification using Tree Induction
60
Example of Classification using Neural Induction
61
5.4 Microsoft Association
  • The Microsoft Association algorithm creates its
    own sets of entities, and then determines how
    often those sets occur in the test data set. This
    is shown in Figure 5.4. In this set, we are
    looking at groupings of products purchased
    together in a single purchase transaction. To
    simplify our example, we are only looking at
    products in the World War II product subtype.

62
Figure 5.4 The Microsoft Association algorithm
                                               

63
Tasks
  • The Microsoft Association algorithm can only be
    used for Association.

64
5.5 Microsoft Sequence Clustering
  • The Microsoft Sequence Clustering algorithm
    examines the test data set to identify
    transitions from one state to another. The test
    data set contains data, such as a web log showing
    navigation from one page to another or perhaps
    routing and approval data showing the path taken
    to approve each request. The algorithm uses the
    test data set to determine, as a ratio, how many
    times each of the possible paths is taken.
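
The transition bookkeeping can be sketched with toy page-view sessions (invented for illustration): count state-to-state moves, then express each as a ratio of all transitions leaving that state.

```python
from collections import Counter

# Toy web-log sessions: each list is one user's page sequence.
sessions = [["A", "B", "C"], ["A", "B"], ["A", "C"], ["B", "C"]]

# Count every state-to-state transition across sessions.
transitions = Counter()
for pages in sessions:
    for here, nxt in zip(pages, pages[1:]):
        transitions[(here, nxt)] += 1

def transition_ratio(here, nxt):
    # How often this path is taken, out of all paths leaving `here`.
    leaving = sum(n for (h, _), n in transitions.items() if h == here)
    return transitions[(here, nxt)] / leaving if leaving else 0.0
```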

65
Tasks
  • The main purposes of the Microsoft Sequence
    Clustering algorithm are Sequence Analysis and
    Segmentation.
  • It can also be used for Regression and
    Classification.

66
5.6 Microsoft Time Series
  • The Microsoft Time Series algorithm starts with
    time-related data in the test data set. In Figure
    5.6, this is sales data for each month. To
    simplify things, we are only looking at data for
    two products in our example.

67
Figure 5.6 The Microsoft Time Series algorithm
                                               

68
Tasks
  • The Microsoft Time Series algorithm can only be
    used for Regression.

69
5.7 Microsoft Neural Network
  • Neural networks were developed in the 1960s to
    model the way human neurons function. Microsoft
    has created the Microsoft Neural Network
    algorithm, so we can use neural networks for such
    mundane activities as predicting product sales.
    Of course, predicting product sales might not
    seem so mundane if your future employment is
    dependent on being correct.

70
Tasks
  • The main purposes of the Microsoft Neural
    Network algorithm are
  • Classification
  • Regression

71
Topic 6 Data Mining Structure Model
  • The data mining structure enables us to define
    the data we will use, either from a relational
    database or from an OLAP cube, as the source for
    the mining. It also lets us define one or more
    data mining algorithms to apply to that data.

72
Data Columns
  • The first step in doing any mining is to select
    the data being mined. We select a data source:
    either relational data or an OLAP cube. Once
    this is done, we need to select a table if this
    is a relational data source, or a dimension if
    this is an OLAP data source. Finally, we must
    select the table columns or dimension attributes
    to be used as the data columns for our data
    mining.

73
Data Columns
  • Although an initial table or dimension must be
    selected for data mining, it is possible to
    select additional related tables or dimensions
    for inclusion. These are known as nested tables.
    If a relational database is being used, the
    initial table and the nested table must be linked
    by a foreign key relationship (or a chain of
    foreign key relationships). If a cube is being
    used, the initial dimension and the nested
    dimension must be linked through the fact table.

74
Data Mining Model
  • The data mining model combines the data columns
    with a data mining algorithm. Additionally, we
    must determine how each data column should be
    used by the data mining algorithm. This process
    determines how the data mining algorithm
    functions and what it predicts.

75
Data Column Usage
  • Each column in the data mining structure must
    have one of the following usages:
  • Key The key is the unique identifier for a
    table or a dimension. The key is used to
    uniquely identify records or members, but it is
    not used by the data mining algorithm to predict
    values.
  • Input Input columns are used by the data mining
    algorithm when making a prediction. The input
    values are the values that influence the
    prediction made by the algorithm.

76
Data Column Usage
  • Predict A predict column is a data column whose
    value is being predicted by the data mining
    algorithm. This column can also be used as an
    input column. If data column A and data column B
    are both predict columns, then data column A
    will serve as an input when the algorithm is
    predicting data column B, and vice versa.

77
Data Column Usage
  • Predict Only A predict-only column is a data
    column whose value is being predicted by the
    data mining algorithm. This column cannot be
    used as an input column when predicting other
    columns.
  • Ignore This data column is not used by the data
    mining algorithm.

78
Using Cube Slice As a Training Set
  • Data must be divided into training and testing
    data sets. The cube slice enables us to specify
    an expression that divides the cube into two
    parts. The portion of the data that satisfies
    the expression is fed into the data mining
    algorithm for training. The rest of the data
    becomes the testing data set.

79
Defining a Data Mining Model
  • Steps
  • Identify the business need
  • Select the data mining algorithms
  • Determine the data source

80
Mining Model Viewer
  • Now that we have several data mining models
    created and trained, let's look at how we begin
    to analyze the information. We do this by using
    the data mining model viewers, which are found on
    the Mining Model Viewer tab on the Data Mining
    Design tab.
  • Each data mining algorithm has its own set of
    viewers, enabling us to examine the trained data
    mining algorithm. These viewers present the
    mining information graphically for ease of
    understanding. If desired, we can also use the
    Microsoft Mining Content Viewer to look at the
    raw data underlying the graphical presentations.

81
Mining Model Viewer
  • Microsoft Tree Viewer
  • Microsoft Mining Content Viewer, used to see
    the detailed data underlying the diagrams

82
The SQL Server Management Studio Windows
  • Registered Servers Window
  • Object Explorer Window
  • Summary Window
  • Query Windows

83
Query Windows
  • One of the functions of the SQL Server Management
    Studio is the creation and execution of queries.
    These queries may be Transact-SQL for relational
    databases, MDX or XMLA for Analysis Services
    databases, DMX for data mining, or SQL Mobile for
    hand-held devices. Each type of query has its own
    specific type of query window. We discuss how to
    use these query windows throughout the rest of
    this book.

84
MDX Definition
  • The Multidimensional Expressions (MDX) language
    provides the programming language for OLAP cube
    navigation and over 50 mathematical functions
    for calculating cube measures. It can be called
    and used from the relational database language,
    Transact-SQL.

85
DMX Definition
  • The Data Mining Extensions (DMX) language is a
    new language in SQL Server 2005 which provides
    the commands to easily set up data mining. It
    can be called and used from the relational
    database language, Transact-SQL.

86
XMLA Definition
  • XML for Analysis (XMLA) is an open, XML-based
    standard protocol for interacting with Microsoft
    SQL Server 2005 Analysis Services data over an
    HTTP connection, such as an intranet or the
    Internet. XMLA uses the Simple Object Access
    Protocol (SOAP).

87
Mining Accuracy Chart
  • Before we put a data mining model to use in a
    production environment, we need to ensure the
    model is making predictions with the desired
    accuracy. This is done using the Mining Accuracy
    Chart tab in the Business Intelligence
    Development Studio. The Mining Accuracy Chart
    tab provides three tools for determining the
    accuracy of our mining models: the Lift Chart,
    the Profit Chart, and the Classification Matrix.
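
The idea behind a classification matrix can be sketched as a tally of predicted vs. actual values for the validation cases; the values below are invented for illustration.

```python
from collections import Counter

# Validation cases: what actually happened vs. what the model predicted.
actual    = ["good", "good", "bad", "bad", "good"]
predicted = ["good", "bad",  "bad", "good", "good"]

# Each (actual, predicted) cell of the classification (confusion) matrix.
matrix = Counter(zip(actual, predicted))
# Overall accuracy: cases on the matrix diagonal over all cases.
accuracy = sum(n for (a, p), n in matrix.items() if a == p) / len(actual)
```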

88
Deployment
  • As you see in Figure 6.1, the deployment step in
    the Business Dimensional Lifecycle is where the
    three parallel development tracks come back
    together. This is the great unveiling of the
    DW/BI system to the business community. The
    quality of this first impression will strongly
    influence the acceptance of the system, and you
    get only one shot at it. Like any big event,
    there are a lot of details that must fall into
    place in order for the show to be successful.

89
Figure 6.1 The Deployment step in the Business
Development Lifecycle
90
Deployment
  • What needs to be tested in the DW/BI system
    before deployment?
  • How and where should you perform that testing?
  • What's the easiest way to deploy the completely
    new DW/BI system, and how can you safely test
    and deploy changes to an existing system?
  • Why is DW/BI documentation important, and what
    documentation does your team need to write?
  • What kind of support and training will your users
    need?