PMML (Predictive Model Markup Language) - PowerPoint PPT Presentation

Slides: 17
Provided by: John61

1
PMML (Predictive Model Markup Language)
  • ??PMML
  • ????
  • ???????
  • PMML???
  • PMML DTD

2
??PMML
  • ????XML????????
  • ????????(Data Mining)???
  • ????XML???,??????XML??????
  • ??????????
  • ??????????
  • ???????????

3
??PMML (continued)
  • ??????????(National Center for Data Mining)??????
  • ????www.dmg.org
  • ???????????????
  • DMG PMML
  • OMG CWM DM
  • SQL/MM Part 6 for Data Mining
  • JSR-073 Java Data Mining API
  • Microsoft OLE DB for Data Mining

4
Data Mining Group
  • PMML Version 1.1????
  • Angoss, IBM, Magnify, Microsoft, NCR, Oracle,
    SPSS, University of Illinois at Chicago
  • Focused group to expedite process
  • PMML Version 1.2 ??Xchange,?????
  • Open to any qualified vendor selling data mining
    products
  • Augmented by expert reviewers
  • ??xml.org?????????

5
PMML???
  • PMML??????????????????
  • PMML 1.0??????????(Model),????????
  • ?????????????
  • PMML??????MetaData??????
  • ???????????????
  • ??????????

6
PMML??? (continued)
  • Open standard for Data Mining Models
  • Is not concerned with the process of creating a
    model
  • Provides independence from application, platform,
    and operating system
  • Simplifies use of data mining models by other
    applications (consumers of data mining models)

7
????
  • ???????(Data Mining)????????????,????????Data
    Mining????????????????????????????????????????????
    ????,??????????????????????????,??????????????????
    ???????????,?Data Mining???????,?????????,??????,?
    ?Internet??????????,????????????

8
???? (continued)
  • ??Data Mining?????????????????????????????????????
    ????????,????????????????????????????????????????
    ??????????????,???????????????????????????,???????
    ????,???????????????????,?????????????????????????
    ?,????????,?????????????,????????,????????????

9
???????
  • Classification
  • Estimation
  • Prediction
  • Affinity grouping
  • Clustering

10
???????
  • Association rule
  • Automatic cluster detection
  • Decision tree
  • Artificial neural network
  • Genetic algorithm
  • Online analytical processing (OLAP)

11
PMML???(v1.1)
  • Schemas
  • Data dictionary (data schema, including outliers,
    missing values)
  • Mining schema
  • Infrastructure
  • Univariate statistics
  • Normalization and transformation (very basic)
  • Models

12
PMML???(v1.1)
  • Polynomial regression
  • General regression
  • Trees
  • Center based clusters
  • Density based clusters
  • Associations
  • Neural nets
  • more to be added in v1.2

13
?? PMML (v1.1)
    <TreeModel modelName="golfing">
      <MiningSchema>
        <MiningField name="temperature"/>
        <MiningField name="humidity"/>
        ...
      </MiningSchema>
      <Node score="play">
        <Predicate field="outlook" operator="equal" value="sunny"/>
        <Node score="play">
          <CompoundPredicate booleanOperator="and">
            <Predicate field="temperature" operator="lessThan" value="90F"/>
            <Predicate field="temperature" operator="greaterThan" value="50F"/>
            ...
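
To make the tree fragment concrete, here is a minimal Python sketch of how a consumer application might score one record against it. Several things are assumptions rather than part of the slide: the closing tags that complete the truncated fragment, the helper names (`to_number`, `matches`, `score`), the handling of values such as "90F" by stripping a trailing "F", and the rule that the deepest matching node supplies the score.

```python
import xml.etree.ElementTree as ET

# The slide's fragment, closed off so that it parses (closing tags are assumed).
PMML_FRAGMENT = """
<TreeModel modelName="golfing">
  <MiningSchema>
    <MiningField name="temperature"/>
    <MiningField name="humidity"/>
  </MiningSchema>
  <Node score="play">
    <Predicate field="outlook" operator="equal" value="sunny"/>
    <Node score="play">
      <CompoundPredicate booleanOperator="and">
        <Predicate field="temperature" operator="lessThan" value="90F"/>
        <Predicate field="temperature" operator="greaterThan" value="50F"/>
      </CompoundPredicate>
    </Node>
  </Node>
</TreeModel>
"""


def to_number(value):
    """Hypothetical helper: read '90F' or 90 as the number 90.0."""
    return float(str(value).rstrip("F"))


def matches(pred, record):
    """Evaluate a Predicate or CompoundPredicate element against a record (dict)."""
    if pred.tag == "CompoundPredicate":
        results = [matches(child, record) for child in pred]
        operator = pred.get("booleanOperator")
        if operator == "and":
            return all(results)
        if operator == "or":
            return any(results)
        raise ValueError(f"unsupported booleanOperator: {operator}")
    actual = record.get(pred.get("field"))
    if actual is None:
        return False  # a missing field is treated as "no match" in this sketch
    operator, value = pred.get("operator"), pred.get("value")
    if operator == "equal":
        return str(actual) == value
    if operator == "lessThan":
        return to_number(actual) < to_number(value)
    if operator == "greaterThan":
        return to_number(actual) > to_number(value)
    raise ValueError(f"unsupported operator: {operator}")


def score(node, record):
    """Return the score of the deepest matching node, or None if this node fails."""
    for child in node:
        if child.tag in ("Predicate", "CompoundPredicate"):
            if not matches(child, record):
                return None
    for child in node.findall("Node"):
        result = score(child, record)
        if result is not None:
            return result
    return node.get("score")


top_node = ET.fromstring(PMML_FRAGMENT).find("Node")
print(score(top_node, {"outlook": "sunny", "temperature": 70}))
```

Because the model is plain XML, the consumer needs only an XML parser and a small evaluator, which matches the slides' point that PMML makes models independent of the application that built them.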

14
PMML DTD_v1_1
    <?xml version='1.0' encoding='ISO-8859-1' ?>
    <!ENTITY % A-PMML-MODEL '(TreeModel | NeuralNetwork |
        ClusteringModel | RegressionModel |
        GeneralRegressionModel | AssociationModel)'>
    <!ELEMENT PMML (Header, DataDictionary, (%A-PMML-MODEL;)+, Extension*)>
    <!ATTLIST PMML
        version CDATA #REQUIRED>
    <!ELEMENT Extension ANY>
    <!ATTLIST Extension
        extender CDATA #IMPLIED
        name     CDATA #IMPLIED
        value    CDATA #IMPLIED>
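
The top-level content model can also be checked in application code when no validating parser is available. The following Python sketch is only an illustration: the function name `check_pmml_structure` and the error messages are hypothetical, and it assumes the reading of the DTD in which a document holds a Header, a DataDictionary, at least one model element, and optional Extension elements.

```python
import xml.etree.ElementTree as ET

# The six model elements named in the A-PMML-MODEL entity on the slide.
MODEL_TAGS = {
    "TreeModel", "NeuralNetwork", "ClusteringModel",
    "RegressionModel", "GeneralRegressionModel", "AssociationModel",
}


def check_pmml_structure(xml_text):
    """Return a list of structural problems; an empty list means none were found."""
    root = ET.fromstring(xml_text)
    problems = []
    if root.tag != "PMML":
        problems.append("root element is not PMML")
    if root.get("version") is None:
        problems.append("PMML is missing the required version attribute")
    tags = [child.tag for child in root]
    if tags[:2] != ["Header", "DataDictionary"]:
        problems.append("expected Header followed by DataDictionary")
    if not any(tag in MODEL_TAGS for tag in tags):
        problems.append("no model element found")
    allowed = MODEL_TAGS | {"Header", "DataDictionary", "Extension"}
    for tag in tags:
        if tag not in allowed:
            problems.append(f"unexpected element: {tag}")
    return problems


document = (
    '<PMML version="1.1"><Header/><DataDictionary/>'
    "<TreeModel/><Extension/></PMML>"
)
print(check_pmml_structure(document))
```

This is a lightweight check, not a substitute for validating against the DTD itself; it does not verify element order beyond the first two children.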

15
PMML DTD_v1_1 (continued)
    <!ENTITY % NUMBER "CDATA">
    <!ENTITY % INT-NUMBER "CDATA">
    <!-- content must be an integer, no fractions or exponent -->
    <!ENTITY % REAL-NUMBER "CDATA">
    <!-- content can be any number; covers C/C++ types
         'float', 'long', 'double'; scientific notation,
         eg 1.23e4, is allowed -->
    <!ENTITY % PROB-NUMBER "CDATA">
    <!-- a REAL-NUMBER between 0.0 and 1.0, usually describing
         a probability -->
    <!ENTITY % PERCENTAGE-NUMBER "CDATA">
    <!-- a REAL-NUMBER between 0.0 and 100.0 -->
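
All of these entities expand to plain CDATA, so the DTD itself cannot enforce the constraints stated in the comments; a consumer has to check them. A minimal Python sketch of such checks follows. The function names are hypothetical, and treating NaN and infinity as invalid is an assumption of this sketch.

```python
import math
import re


def is_int_number(text):
    """INT-NUMBER: an integer with no fraction and no exponent."""
    return re.fullmatch(r"[+-]?[0-9]+", text.strip()) is not None


def parse_real_number(text):
    """REAL-NUMBER: return the value as a float, or None if it is not a finite number."""
    try:
        value = float(text.strip())
    except ValueError:
        return None
    return value if math.isfinite(value) else None


def is_real_number(text):
    return parse_real_number(text) is not None


def is_prob_number(text):
    """PROB-NUMBER: a REAL-NUMBER between 0.0 and 1.0."""
    value = parse_real_number(text)
    return value is not None and 0.0 <= value <= 1.0


def is_percentage_number(text):
    """PERCENTAGE-NUMBER: a REAL-NUMBER between 0.0 and 100.0."""
    value = parse_real_number(text)
    return value is not None and 0.0 <= value <= 100.0


print(is_int_number("42"), is_real_number("1.23e4"), is_prob_number("1.5"))
```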

16
PMML DTD_v1_1 (continued)
    <!ENTITY % FIELD-NAME "CDATA">
    <!ELEMENT Array (#PCDATA)>
    <!ATTLIST Array
        n    %INT-NUMBER;          #IMPLIED
        type (int | real | string) #IMPLIED>
    <!ENTITY % NUM-ARRAY "Array">
    <!-- an array of numbers -->
    <!ENTITY % INT-ARRAY "Array">
    <!-- an array of integers -->
    <!ENTITY % REAL-ARRAY "Array">
    <!-- an array of reals -->
    <!ENTITY % STRING-ARRAY "Array">
    <!-- an array of strings -->
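
As an illustration of how the Array element might be consumed, the Python sketch below reads one into a list. The function name `parse_array` is hypothetical, and the sketch makes assumptions the slide does not state: entries are separated by whitespace, quoted strings containing spaces are not handled, and a missing `type` attribute is treated as `string`.

```python
import xml.etree.ElementTree as ET

# Map the three values allowed for the type attribute to Python converters.
CONVERTERS = {"int": int, "real": float, "string": str}


def parse_array(xml_text):
    """Parse an Array element into a Python list, checking n when it is given."""
    elem = ET.fromstring(xml_text)
    if elem.tag != "Array":
        raise ValueError(f"expected an Array element, got {elem.tag}")
    convert = CONVERTERS[elem.get("type", "string")]  # default is an assumption
    values = [convert(item) for item in (elem.text or "").split()]
    declared = elem.get("n")
    if declared is not None and int(declared) != len(values):
        raise ValueError(f"n={declared} but the array holds {len(values)} entries")
    return values


print(parse_array('<Array n="3" type="int">1 2 3</Array>'))
```

Checking `n` against the actual number of entries catches truncated or hand-edited model files early, since both attributes are optional and the DTD cannot relate them to the content.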