Yuan Xiaojie - PowerPoint PPT Presentation

About This Presentation
Title:

Yuan Xiaojie

Description:

Author: XuCF. Created Date: 11/10/2000 1:28:04 AM. Document presentation format: (4:3). PowerPoint PPT presentation

Slides: 52
Provided by: XuCF

Transcript and Presenter's Notes

Title: Yuan Xiaojie


1
??????????
Yuan Xiaojie Dept. of Computer Science and
technology, Nankai University

2
????????
?????????? (20??60?????)
??????? (70??)
??Web?????? (90??--??)
????????? (80????--??)
??????? (80????--??)
????????? (2000--...)
3
??????
DBMS
????
????
  • ?????????
  • ????????
  • ????????
  • ?????DBMS
  • DBMS????
  • DBMS????
  • ?????????
  • ????????
  • ????????

4
?????DBMS
DBMS
  • ???????????????????

??C??????????
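#include <fstream>
struct Date { int iMonth, iDay, iYear; };
int main() {
    Date dt = {6, 10, 92};
    // Write the raw bytes of the struct; the file records nothing about its layout
    std::ofstream datafile("ata.dat", std::ios::binary);
    datafile.write(reinterpret_cast<const char*>(&dt), sizeof dt);
}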
5
ata.dat 06 00 00 00 0A 00 00 00 5C 00 00 00
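The twelve bytes in this dump only become a date once the reader knows the record layout. A minimal Python sketch, assuming three 4-byte little-endian integers (an assumption that matches the dump but is recorded nowhere in the file itself):

```python
import struct

# Raw contents of the file, copied from the hex dump.
raw = bytes.fromhex("06000000 0A000000 5C000000")

# Assumed layout: three little-endian 32-bit signed ints (month, day, year).
# The file carries no metadata, so this format string is outside knowledge.
month, day, year = struct.unpack("<3i", raw)
print(month, day, year)
```

Reading the same bytes with a different assumed layout, for example `struct.unpack("<6h", raw)`, also succeeds and silently yields different values, which illustrates why the description of the data needs to be stored with the data.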
6
?????????????
  • ???(metadata),???????
  • ?????C????,???????????????,?????????????????????
    ??
  • ????????????????????
  • ?????????????????

7
???????
  • ????
  • ???????????????????
  • ?????????????
  • ??????????????????
  • ????
  • ???????????????????
  • ???????????????,?????
  • ??????,??????,???????

8
??????????
  • ?????
  • ??????????,????????
  • ????????????????
  • ??????????
  • ???????????????
  • ????????????
  • ????????
  • ????????
  • ????
  • ?????

9
??????
10
??????
  • ????????????????????,??????????????

11
An Example of a Relation
Table name: Products
Attribute names: Name, Price, Category, Manufacturer
Tuples:
Name         Price    Category     Manufacturer
gizmo        19.99    gadgets      GizmoWorks
Power gizmo  29.99    gadgets      GizmoWorks
SingleTouch  149.99   photography  Canon
MultiTouch   203.99   household    Hitachi
12
????????
Department
Project
Workson
Employee
13
?????????SQL
  • SQL (Structured Query Language) was proposed in 1974 by Boyce and Chamberlin
  • SQL is a very-high-level language
  • User can say what to do rather than specify
    how to do it
  • Can avoid specifying a lot of data-manipulation
    details needed in procedural languages like C
    or Java
  • Database management system figures out best way
    to execute query
  • Called query optimization
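The contrast between saying what to do and specifying how to do it can be sketched with Python's built-in sqlite3 module. The Products rows come from the earlier relation example; the choice of SQLite and the particular query (gadgets ordered by price) are assumptions made for illustration.

```python
import sqlite3

rows = [
    ("gizmo", 19.99, "gadgets", "GizmoWorks"),
    ("Power gizmo", 29.99, "gadgets", "GizmoWorks"),
    ("SingleTouch", 149.99, "photography", "Canon"),
    ("MultiTouch", 203.99, "household", "Hitachi"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Products (Name TEXT, Price REAL, Category TEXT, Manufacturer TEXT)"
)
conn.executemany("INSERT INTO Products VALUES (?, ?, ?, ?)", rows)

# Declarative: state what is wanted; the DBMS decides how to find it.
declarative = [r[0] for r in conn.execute(
    "SELECT Name FROM Products WHERE Category = 'gadgets' ORDER BY Price")]

# Procedural: spell out the scan, the filter, and the sort by hand.
matches = []
for name, price, category, _manufacturer in rows:
    if category == "gadgets":
        matches.append((price, name))
matches.sort()
procedural = [name for _price, name in matches]

print(declarative, procedural)
```

Both versions return the same names; only the SQL version leaves the access strategy to the database system.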


14
?????
CREATE TABLE department (
    deptno   char(4) PRIMARY KEY,
    deptname char(25),
    location char(20)
);
CREATE TABLE employee (
    empno   int PRIMARY KEY,
    empname char(20),
    deptno  char(4) REFERENCES department(deptno)
);
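What PRIMARY KEY and REFERENCES buy in practice can be sketched with SQLite through Python's sqlite3 module. This is an illustration under assumptions: the sample rows are made up, and SQLite only checks REFERENCES constraints after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks REFERENCES only when enabled
conn.execute("""CREATE TABLE department (
    deptno   char(4) PRIMARY KEY,
    deptname char(25),
    location char(20))""")
conn.execute("""CREATE TABLE employee (
    empno   int PRIMARY KEY,
    empname char(20),
    deptno  char(4) REFERENCES department(deptno))""")

# Made-up rows, for illustration only.
conn.execute("INSERT INTO department VALUES ('d1', 'Research', 'Tianjin')")
conn.execute("INSERT INTO employee VALUES (1, 'Li', 'd1')")

def try_insert(sql):
    """Return True if the DBMS accepts the statement, False if a constraint rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False

duplicate_key = try_insert("INSERT INTO employee VALUES (1, 'Wang', 'd1')")  # empno 1 exists
unknown_dept = try_insert("INSERT INTO employee VALUES (2, 'Zhao', 'd9')")   # no department d9
employee_count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
print(duplicate_key, unknown_dept, employee_count)
```

Both invalid inserts are rejected by the database itself, so no application code has to re-check these rules.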
15
?????
???????????
select empno, empname, location
from employee, department
where employee.deptno = department.deptno
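The join of employee and department on deptno can be tried end to end with SQLite. In this sketch the rows are hypothetical, and an ORDER BY clause is added only to make the result order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE department (deptno char(4) PRIMARY KEY, deptname char(25), location char(20));
CREATE TABLE employee (empno int PRIMARY KEY, empname char(20),
                       deptno char(4) REFERENCES department(deptno));
-- Made-up rows, for illustration only.
INSERT INTO department VALUES ('d1', 'Research', 'Tianjin');
INSERT INTO department VALUES ('d2', 'Sales', 'Beijing');
INSERT INTO employee VALUES (1, 'Li', 'd1');
INSERT INTO employee VALUES (2, 'Wang', 'd2');
INSERT INTO employee VALUES (3, 'Zhao', 'd1');
""")

# Each employee row is paired with the department row that has the same deptno.
result = conn.execute("""
    SELECT empno, empname, location
    FROM employee, department
    WHERE employee.deptno = department.deptno
    ORDER BY empno
""").fetchall()

for row in result:
    print(row)
```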
16
?????????
insert into department values ('d5','???','??')
17
DBMS????? ???????
CREATE TABLE employee (
    empno   int PRIMARY KEY,
    empname char(20),
    deptno  char(4) REFERENCES department(deptno)
);
???????? ????????
18
DBMS????? ?????????
????????
?? ??
OS??? ???
??
???? ??
???? ????
??
DBMS??
????????
????????????,???????????,DBMS??????,?????????,????
?????????????
19
?????????
select empno, empname, location
from employee, department
where employee.deptno = department.deptno
???????? ????????
20
DBMS?????
21
?????????
  • Data Warehousing, OLAP and data mining
  • what and why (now)?

22
?????
23
Data, Data everywhere yet ...
  • ?????????
  • ????????
  • ?????,?????
  • ?????????
  • ??????????
  • ?????????
  • ?????????
  • ?????????
  • ??????
  • ??????????????

24
What is a Data Warehouse?
  • ?????????1988?Barry Devlin? Paul
    Murphy?IBM????????????????????????????????????,?
    ???????

????????????????????,????????????????,????????
????????,??????????????????????OLAP?????????,?????
???????,??????
25
???????????
DECISIONS
  • Target Markets
  • Funds allocation
  • Trading options
  • Where to advertise
  • Catalog mailing list
  • Sales geography
KNOWLEDGE
  • Patterns
  • Trends
  • Facts
  • Relations
  • Models
  • Associations
  • Sequences
DATA
  • Financial
  • Economic
  • Government
  • Point-of-Sale
  • Demographic
  • Lifestyle

26
Data Warehouse Architecture
27
????(OLAP Tool)
28
DW Integration
MOLAP
ROLAP
Client-OLAP
DW-DB (mostly relational)
29
Example Data Model
Sale
??
??
???
30
Simple Hierarchies
Dimension levels: Month, Quarter, 1/2 Year Period, Year
Example members:
  • Year: 1999
  • 1/2 Year Period: 1st half-year 99, 2nd half-year 99
  • Quarter: 3rd quarter 99
  • Month: July 99, August 99, Sept. 99, ...
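Rolling a month up through the Month, Quarter, 1/2 Year Period, and Year levels is plain arithmetic. A minimal sketch, in which the function name and the dictionary keys are hypothetical:

```python
def roll_up(month, year):
    """Map a month (1-12) to its ancestors: quarter, half-year, and year."""
    if not 1 <= month <= 12:
        raise ValueError("month must be in 1..12")
    quarter = (month - 1) // 3 + 1   # months 1-3 -> 1, 4-6 -> 2, 7-9 -> 3, 10-12 -> 4
    half = (month - 1) // 6 + 1      # months 1-6 -> 1, 7-12 -> 2
    return {"month": month, "quarter": quarter, "half_year": half, "year": year}

july_99 = roll_up(7, 1999)
print(july_99)
```

For July 1999 this gives the third quarter and the second half-year of 1999, which agrees with the members listed on the slide.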
31
????(I)
32
????(II)
SELECT g1, ..., gn, aggr(m1), ..., aggr(mk)
FROM FactName, Dim1, ..., Dimn
WHERE Dim1.level(r1) = r1 AND ... AND Dimn.level(rn) = rn
  AND Dim1.d1 = FactName.d1 AND ... AND Dimn.dn = FactName.dn
GROUP BY g1, ..., gn
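One concrete instance of this query template can be run in SQLite. Everything below is an assumption made for illustration: the table names TimeDim, ProductDim, and Sale, their columns, and the sample rows. The sketch restricts each dimension at one level, joins the dimensions to the fact table, and aggregates per group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TimeDim (time_id INTEGER PRIMARY KEY, month TEXT, quarter TEXT, year INTEGER);
CREATE TABLE ProductDim (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE Sale (time_id INTEGER, product_id INTEGER, amount REAL);
-- Made-up rows, for illustration only.
INSERT INTO TimeDim VALUES (1, 'Jul', 'Q3', 1999), (2, 'Aug', 'Q3', 1999), (3, 'Oct', 'Q4', 1999);
INSERT INTO ProductDim VALUES (1, 'gizmo', 'gadgets'), (2, 'SingleTouch', 'photography');
INSERT INTO Sale VALUES (1, 1, 20.0), (2, 1, 30.0), (2, 2, 150.0), (3, 1, 40.0);
""")

totals = conn.execute("""
    SELECT t.quarter, p.category, SUM(s.amount)      -- g1, g2, aggr(m1)
    FROM Sale s, TimeDim t, ProductDim p             -- FactName, Dim1, Dim2
    WHERE t.year = 1999 AND p.category = 'gadgets'   -- restriction on a level of each dimension
      AND t.time_id = s.time_id                      -- Dim1.d1 = FactName.d1
      AND p.product_id = s.product_id                -- Dim2.d2 = FactName.d2
    GROUP BY t.quarter, p.category
    ORDER BY t.quarter
""").fetchall()

print(totals)
```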
33
Data Mining works with Warehouse Data
  • Data Warehousing provides the Enterprise with a
    memory
  • Data Mining provides the Enterprise with
    intelligence

34
????
Industry
Application
??
?????
??,????
??
??
??????
??
????
???
????
???????
????
????
????


35
???????????
  • ??????????????,??????????????????
  • ???????????????????100
  • GUS??????????????????????,??????
  • ????????????????????3.8

36
???????????
  • ?????????????????????????????????
  • ???????????,??????,??????????
  • ???????????????????,??????????????
  • ???????30

37
????????(??)
(Big Bank Credit Card Company)
???????
?????? ?????? ??
????? 1,000,000 750,000 (250,000)
?? 1,000,000 750,000 (250,000)
????? 10,000 9,000 (1,000)
??????? 125 125 0
??? 1,250,000 1,125,000 (125,000)
??? 250,000 375,000 125,000
????? 0 40,000 40,000
????? 250,000 335,000 85,000
38
????????? ?????????
  • ????
  • ????????
  • ????????
  • ??,??????????????????,????????????????????????,???
    ???????,??
  • ??/??? ??????(?????)
  • ????
  • ?????
  • ????
  • ?????
  • ???????

39
????????????
  • ?????????????????
  • ?????????????
  • ????????????????????,????????????
  • ?????????,?????????????????????????????????????
  • ????????????????????

40
?????
  • ??????(Automatic Text Categorization,ATC),
    ?????????,????????????????????????.
  • ????
  • ????????
  • Rocchio
  • Naive Bayes
  • k-Nearest Neighbor (kNN)
  • Support vector machine (SVM)
  • ????
  • ??????
  • ???????
  • ?? ??????????

41
???????
a.?????? ??? b.???? ???? ?????(???) ????
???? ????
42
Training Dataset
This follows an example from Quinlan's ID3
43
Output A Decision Tree for buys_computer
age? (branches: <=30, 30..40, >40)
student? (branches: no, yes)
credit rating? (branches: fair, excellent)
Leaves: yes / no
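A decision tree like the one for buys_computer reads naturally as nested conditionals. The transcript preserves the node and branch labels but not which leaf hangs under which branch, so the leaf assignments below are an assumption that follows the commonly published textbook version of this example.

```python
def buys_computer(age, student, credit_rating):
    """Classify one customer by walking the tree from the root (age) down to a leaf."""
    if age <= 30:
        # Assumed subtree: young customers are split on the student attribute.
        return "yes" if student else "no"
    if age <= 40:
        # Assumed leaf: the 30..40 branch ends directly in "yes".
        return "yes"
    # Assumed subtree: customers over 40 are split on credit rating.
    return "yes" if credit_rating == "fair" else "no"

print(buys_computer(25, True, "fair"))
print(buys_computer(35, False, "excellent"))
print(buys_computer(45, False, "excellent"))
```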
44
????
  • ?????????????????,???????,??????????,??,?????????
    ?(unsupervised learning)??,?????????????,??????
    ???,????????????
  • ????(Text clustering) ?????????????,????????????
    ?????????,????????????????????????????????????

47
?????
  • ?????????????????????????,?????
  • ??????,????????,?????,?????????
  • ?????????????????????????,???????,???????

48
????
  • ????????????????????????????(?????)
  • ??????????????????
  • ??
  • ????? ????
  • ???????????
  • ????????

49
????????????
  • ?????
  • ????
  • ?????
  • ???????????
  • ?????
  • ?????????????
  • ???
  • ??????
  • ??????????-?????
  • ????????(legacy)???
  • ?????
  • ???(WWW)

50
http://www.sigkdd.org/kddcup