Title: Yuan Xiaojie
1??????????
Yuan Xiaojie
Dept. of Computer Science and Technology, Nankai University
2????????
?????????? (20??60?????)
??????? (70??)
??Web?????? (90??--??)
????????? (80????--??)
??????? (80????--??)
????????? (2000--...)
3??????
DBMS
????
????
- ?????????
- ????????
- ????????
- ?????DBMS
- DBMS????
- DBMS????
- ?????????
- ????????
- ????????
4?????DBMS
DBMS
??C??????????
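#include <fstream>
struct Date { int iMonth, iDay, iYear; };
int main() {
    Date dt = {6, 10, 92};
    // Write the raw bytes of dt: the file records no field names, types or sizes
    std::ofstream datafile("data.dat", std::ios::binary);
    datafile.write(reinterpret_cast<const char*>(&dt), sizeof dt);
}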
5 data.dat
06 00 00 00 0A 00 00 00 5C 00 00 00
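The dump can be reproduced with a short C++ sketch. `hexBytes` and `fromBytes` are hypothetical helpers, and the expected bytes assume a 4-byte, little-endian `int` (true on typical x86 machines, not guaranteed by C++). Decoding works only because the reader already knows the struct layout; the file itself carries no such description.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Same three-int layout as the Date struct written to data.dat.
struct Date { int iMonth, iDay, iYear; };

// Hypothetical helper: the raw bytes of a Date as space-separated hex pairs,
// i.e. what ofstream::write copies into the file.
std::string hexBytes(const Date& dt) {
    unsigned char raw[sizeof(Date)];
    std::memcpy(raw, &dt, sizeof(Date));
    const char* digits = "0123456789ABCDEF";
    std::string out;
    for (std::size_t i = 0; i < sizeof(Date); ++i) {
        if (i > 0) out += ' ';
        out += digits[raw[i] >> 4];    // high nibble
        out += digits[raw[i] & 0x0F];  // low nibble
    }
    return out;
}

// Hypothetical helper: rebuild a Date from raw bytes. This only makes sense
// because the caller knows the layout; the bytes say nothing about month/day/year.
Date fromBytes(const unsigned char* raw) {
    assert(raw != nullptr);
    Date dt;
    std::memcpy(&dt, raw, sizeof(Date));
    return dt;
}
```

A program that assumed a different layout (say, year first, or 2-byte integers) would read the same 12 bytes without error and get wrong values, which is the problem that metadata in a DBMS solves.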
6?????????????
- ???(metadata),???????
- ?????C????,???????????????,?????????????????????
?? - ????????????????????
- ?????????????????
7???????
- ????
- ???????????????????
- ?????????????
- ??????????????????
- ????
- ???????????????????
- ???????????????,?????
- ??????,??????,???????
8??????????
- ?????
- ??????????,????????
- ????????????????
- ??????????
- ???????????????
- ????????????
- ????????
- ????????
- ????
- ?????
9??????
10??????
- ????????????????????,??????????????
11 An Example of a Relation
Table name: Products. The column headings are the attribute names; each row is a tuple.
Name          Price    Category      Manufacturer
gizmo         19.99    gadgets       GizmoWorks
Power gizmo   29.99    gadgets       GizmoWorks
SingleTouch   149.99   photography   Canon
MultiTouch    203.99   household     Hitachi
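A relation can be modelled directly as a collection of tuples. The minimal C++ sketch below stores Products as a vector of structs and implements two relational operations, selection and projection. The names `Product`, `Relation`, `selectWhere` and `projectName` are illustrative only, not part of any DBMS API.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One tuple of the Products relation; the fields mirror its attribute names.
struct Product {
    std::string name;
    double price;
    std::string category;
    std::string manufacturer;
};

using Relation = std::vector<Product>;

// Selection: keep only the tuples that satisfy a predicate (like SQL WHERE).
template <typename Pred>
Relation selectWhere(const Relation& rel, Pred pred) {
    Relation out;
    for (const Product& t : rel)
        if (pred(t)) out.push_back(t);
    return out;
}

// Projection onto the Name attribute (like SQL SELECT Name).
std::vector<std::string> projectName(const Relation& rel) {
    std::vector<std::string> out;
    for (const Product& t : rel) out.push_back(t.name);
    return out;
}
```

Applied to the four tuples above, selecting `price < 100` and projecting onto Name gives gizmo and Power gizmo, the answer to `SELECT Name FROM Products WHERE Price < 100`.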
12????????
Department
Project
Workson
Employee
13?????????SQL
- SQL (Structured Query Language) was proposed by Boyce and Chamberlin in 1974
- SQL is a very-high-level language
- Users say what to do rather than specify how to do it
- This avoids many of the data-manipulation details needed in procedural languages like C or Java
- The database management system figures out the best way to execute a query
- This is called query optimization
14?????
CREATE TABLE department (
  deptno   char(4) PRIMARY KEY,
  deptname char(25),
  location char(20)
);
CREATE TABLE employee (
  empno   int PRIMARY KEY,
  empname char(20),
  deptno  char(4) REFERENCES department(deptno)
);
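The two constraints declared here can be read as checks the DBMS runs on every insert. The C++ sketch below is a hypothetical, in-memory simplification (key columns only, no NULL foreign keys, no deletes or updates); `MiniCatalog` and its methods are illustrative names, not how any particular DBMS implements constraints.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical in-memory check of the constraints above:
// department.deptno and employee.empno are PRIMARY KEYs,
// and employee.deptno REFERENCES department(deptno).
class MiniCatalog {
public:
    // PRIMARY KEY: reject a second department with the same deptno.
    bool insertDepartment(const std::string& deptno) {
        return deptnos_.insert(deptno).second;
    }
    // FOREIGN KEY first (deptno must exist), then PRIMARY KEY on empno.
    bool insertEmployee(int empno, const std::string& deptno) {
        if (deptnos_.count(deptno) == 0) return false;  // dangling reference
        return empnos_.insert(empno).second;              // false on duplicate key
    }
private:
    std::set<std::string> deptnos_;
    std::set<int> empnos_;
};
```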
15?????
???????????
select empno, empname, location
from employee, department
where employee.deptno = department.deptno
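Because the query only states which rows are wanted, the DBMS is free to choose how to compute them. One plan an optimizer may pick for this equi-join is a hash join, sketched below in C++ over in-memory rows. The struct and function names are hypothetical, and a real optimizer would compare this plan against nested-loop and sort-merge joins by estimated cost.

```cpp
#include <cassert>
#include <string>
#include <tuple>
#include <unordered_map>
#include <vector>

struct Department { std::string deptno, deptname, location; };
struct Employee   { int empno; std::string empname, deptno; };

// One result row of: select empno, empname, location ...
using Row = std::tuple<int, std::string, std::string>;

// Hash join on deptno: build a hash table on department (deptno is its
// primary key, so at most one match), then probe it once per employee.
std::vector<Row> joinEmpDept(const std::vector<Employee>& emps,
                             const std::vector<Department>& depts) {
    std::unordered_map<std::string, const Department*> byDeptno;
    for (const auto& d : depts) byDeptno[d.deptno] = &d;   // build phase
    std::vector<Row> out;
    for (const auto& e : emps) {                           // probe phase
        auto it = byDeptno.find(e.deptno);
        if (it != byDeptno.end())                          // employee.deptno = department.deptno
            out.emplace_back(e.empno, e.empname, it->second->location);
    }
    return out;
}
```

Employees whose deptno has no matching department produce no row, exactly as with the WHERE condition in the SQL query.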
16?????????
insert into department values ('d5','???','??')
17DBMS????? ???????
CREATE TABLE employee (
  empno   int PRIMARY KEY,
  empname char(20),
  deptno  char(4) REFERENCES department(deptno)
);
???????? ????????
18DBMS????? ?????????
????????
?? ??
OS??? ???
??
???? ??
???? ????
??
DBMS??
????????
????????????,???????????,DBMS??????,?????????,????
?????????????
19?????????
select empno, empname, location
from employee, department
where employee.deptno = department.deptno
???????? ????????
20DBMS?????
21?????????
- Data warehousing, OLAP and data mining: what and why (now)?
22?????
23 Data, data everywhere, yet ...
- ?????????
- ????????
- ?????,?????
- ?????????
- ??????????
- ?????????
- ?????????
- ?????????
- ??????
- ??????????????
24 What is a Data Warehouse?
- ?????????1988?Barry Devlin? Paul
Murphy?IBM????????????????????????????????????,?
???????
????????????????????,????????????????,????????
????????,??????????????????????OLAP?????????,?????
???????,??????
25???????????
DECISIONS
KNOWLEDGE
DATA
- Patterns
- Trends
- Facts
- Relations
- Models
- Associations
- Sequences
- Target Markets
- Funds allocation
- Trading options
- Where to advertise
- Catalog mailing list
- Sales geography
- Financial
- Economic
- Government
- Point-of-Sale
- Demographic
- Lifestyle
26 Data Warehouse Architecture
27????(OLAP Tool)
28 DW Integration
Figure: OLAP integration with the warehouse via MOLAP, ROLAP and Client-OLAP, over the DW-DB (mostly relational).
29 Example Data Model
Sale
??
??
???
30 Simple Hierarchies
Dimension: Period. Levels from finest to coarsest: Month → Quarter → 1/2 Year → Year
Example instances: July 99, August 99, Sept. 99 → 3rd quarter 99 → 2nd half-year 99 → 1999; the 1st half-year 99 also rolls up to 1999 ...
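Rolling up along this hierarchy is simple arithmetic on month numbers. A minimal C++ sketch, assuming months numbered 1 to 12 and calendar-aligned quarters and half-years as in the example; the function names are hypothetical.

```cpp
#include <cassert>
#include <string>

// Hypothetical roll-up functions for the Period hierarchy
// Month -> Quarter -> 1/2 Year -> Year (months numbered 1..12).
int quarterOf(int month) {
    assert(month >= 1 && month <= 12);
    return (month - 1) / 3 + 1;        // July, August, Sept. -> quarter 3
}

int halfYearOf(int quarter) {
    assert(quarter >= 1 && quarter <= 4);
    return (quarter - 1) / 2 + 1;      // quarters 3 and 4 -> half-year 2
}

// Label a month with all of its ancestors, e.g. "1999/H2/Q3/8".
std::string pathOf(int year, int month) {
    int q = quarterOf(month);
    return std::to_string(year) + "/H" + std::to_string(halfYearOf(q)) +
           "/Q" + std::to_string(q) + "/" + std::to_string(month);
}
```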
31????(I)
32????(II)
SELECT g1, ..., gn, aggr(m1), ..., aggr(mk)
FROM FactName, Dim1, ..., Dimn
WHERE Dim1.level(r1) = r1 AND ... AND Dimn.level(rn) = rn
  AND Dim1.d1 = FactName.d1 AND ... AND Dimn.dn = FactName.dn
GROUP BY g1, ..., gn
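The template is a star join followed by grouping. The C++ sketch below evaluates one instance of it (one dimension, one measure, SUM as aggr) over in-memory tables: it joins each fact row to its dimension row through the key, applies the restriction on a level, and sums the measure per group. The schema (`SaleFact`, `TimeDim` with quarter and year attributes) is a hypothetical example, not taken from the slides.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical star schema: one fact table (Sale) and one dimension (Time).
struct TimeDim  { int timeKey; int month; int quarter; int year; };
struct SaleFact { int timeKey; double amount; };   // amount is the measure m1

// One instance of the template:
//   SELECT Time.quarter, SUM(Sale.amount)
//   FROM Sale, Time
//   WHERE Time.year = year AND Time.timeKey = Sale.timeKey
//   GROUP BY Time.quarter
std::map<int, double> sumByQuarter(const std::vector<SaleFact>& facts,
                                   const std::vector<TimeDim>& timeDim,
                                   int year) {
    std::unordered_map<int, const TimeDim*> byKey;
    for (const auto& t : timeDim) byKey[t.timeKey] = &t;
    std::map<int, double> groups;                       // g1 -> aggr(m1)
    for (const auto& f : facts) {
        auto it = byKey.find(f.timeKey);                // join predicate
        if (it == byKey.end() || it->second->year != year)
            continue;                                   // restriction on Time.year
        groups[it->second->quarter] += f.amount;        // SUM per group
    }
    return groups;
}
```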
33 Data Mining Works with Warehouse Data
- Data Warehousing provides the Enterprise with a memory
- Data Mining provides the Enterprise with intelligence
34????
Industry
Application
??
?????
??,????
??
??
??????
??
????
???
????
???????
????
????
????
35???????????
- ??????????????,??????????????????
- ???????????????????100
- GUS??????????????????????,??????
- ????????????????????3.8
36???????????
- ?????????????????????????????????
- ???????????,??????,??????????
- ???????????????????,??????????????
- ???????30
37????????(??)
(Big Bank Credit Card Company)
???????
?????? ?????? ??
????? 1,000,000 750,000 (250,000)
?? 1,000,000 750,000 (250,000)
????? 10,000 9,000 (1,000)
??????? 125 125 0
??? 1,250,000 1,125,000 (125,000)
??? 250,000 375,000 125,000
????? 0 40,000 40,000
????? 250,000 335,000 85,000
38????????? ?????????
- ????
- ????????
- ????????
- ??,??????????????????,????????????????????????,???
???????,?? - ??/??? ??????(?????)
- ????
- ?????
- ????
- ?????
- ???????
39????????????
- ?????????????????
- ?????????????
- ????????????????????,????????????
- ?????????,?????????????????????????????????????
- ????????????????????
40?????
- ??????(Automatic Text Categorization,ATC),
?????????,????????????????????????. - ????
- ????????
- Rocchio algorithm
- Naive Bayes
- k-Nearest Neighbor (kNN)
- Support vector machine (SVM)
- ????
- ??????
- ???????
- ?? ??????????
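Of the methods listed, kNN is the easiest to sketch. The C++ example below represents documents as raw term-frequency vectors, compares them with cosine similarity, and takes a majority vote among the k most similar training documents. Whitespace tokenization, raw counts instead of tf-idf weights, and breaking vote ties by alphabetical label order are simplifying assumptions; the function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

using TermVector = std::map<std::string, double>;
using LabeledDoc = std::pair<std::string, std::string>;  // (text, label)

// Raw term-frequency vector of a whitespace-tokenized document.
TermVector termFreq(const std::string& text) {
    TermVector v;
    std::istringstream in(text);
    for (std::string w; in >> w;) v[w] += 1.0;
    return v;
}

// Cosine similarity between two sparse term vectors.
double cosine(const TermVector& a, const TermVector& b) {
    double dot = 0.0, na = 0.0, nb = 0.0;
    for (const auto& kv : a) {
        na += kv.second * kv.second;
        auto it = b.find(kv.first);
        if (it != b.end()) dot += kv.second * it->second;
    }
    for (const auto& kv : b) nb += kv.second * kv.second;
    return (na == 0.0 || nb == 0.0) ? 0.0 : dot / std::sqrt(na * nb);
}

// kNN: score every training document, keep the k most similar, and return
// the most frequent label among them (ties go to the alphabetically first).
std::string knnClassify(const std::vector<LabeledDoc>& train,
                        const std::string& doc, std::size_t k) {
    assert(k >= 1 && !train.empty());
    const TermVector q = termFreq(doc);
    std::vector<std::pair<double, std::string>> scored;  // (similarity, label)
    for (const LabeledDoc& d : train)
        scored.emplace_back(cosine(q, termFreq(d.first)), d.second);
    k = std::min(k, scored.size());
    std::partial_sort(scored.begin(), scored.begin() + static_cast<std::ptrdiff_t>(k),
                      scored.end(),
                      [](const auto& x, const auto& y) { return x.first > y.first; });
    std::map<std::string, int> votes;
    for (std::size_t i = 0; i < k; ++i) ++votes[scored[i].second];
    auto best = std::max_element(votes.begin(), votes.end(),
        [](const auto& x, const auto& y) { return x.second < y.second; });
    return best->first;
}
```

kNN does no training beyond storing the labeled documents; all the work happens at classification time, which is why it is often called a lazy learner.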
41???????
a.?????? ??? b.???? ???? ?????(???) ????
???? ????
42 Training Dataset
This follows an example from Quinlan's ID3.
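ID3 grows a decision tree by choosing, at each node, the attribute with the highest information gain: the entropy of the class distribution, H = -Σ p_i log2 p_i, minus the size-weighted entropy of the partitions produced by splitting on that attribute. A minimal C++ sketch of the two quantities (function names hypothetical; class distributions are passed as counts, e.g. {yes, no}):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Entropy in bits of a class distribution given as counts.
double entropy(const std::vector<int>& counts) {
    int total = 0;
    for (int c : counts) total += c;
    assert(total > 0);
    double h = 0.0;
    for (int c : counts)
        if (c > 0) {
            double p = static_cast<double>(c) / total;
            h -= p * std::log2(p);
        }
    return h;
}

// Information gain of a split: parent entropy minus the weighted entropy
// of the partitions (one count vector per attribute value).
double infoGain(const std::vector<int>& parent,
                const std::vector<std::vector<int>>& partitions) {
    int total = 0;
    for (int c : parent) total += c;
    double remainder = 0.0;
    for (const auto& part : partitions) {
        int n = 0;
        for (int c : part) n += c;
        if (n > 0) remainder += static_cast<double>(n) / total * entropy(part);
    }
    return entropy(parent) - remainder;
}
```

A split into pure partitions recovers the full parent entropy as gain; a split whose partitions are as mixed as the parent gains nothing.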
43 Output: A Decision Tree for buys_computer
age?
  <=30: student?
    no: no
    yes: yes
  30..40: yes
  >40: credit rating?
    excellent: no
    fair: yes
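Once learned, the tree is just a sequence of nested tests. The sketch below hard-codes the buys_computer tree (age at the root; student? for age <=30; yes for 30..40; credit rating for >40). Encoding age as whole years, student as a bool and credit rating as the strings "fair"/"excellent" are assumptions, as is sending age 30 to the <=30 branch, since the two ranges on the slide share that boundary.

```cpp
#include <cassert>
#include <string>

// The buys_computer tree as nested tests; returns true for "yes".
bool buysComputer(int age, bool student, const std::string& creditRating) {
    if (age <= 30) return student;          // student? no -> no, yes -> yes
    if (age <= 40) return true;             // 30..40 -> yes
    return creditRating == "fair";          // >40: fair -> yes, excellent -> no
}
```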
44????
- ?????????????????,???????,??????????,??,?????????
?(unsupervised learning)??,?????????????,??????
???,???????????? - ????(Text clustering) ?????????????,????????????
?????????,????????????????????????????????????
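The slide does not name a clustering algorithm; k-means is used here only as one common example of unsupervised clustering. This C++ sketch implements Lloyd's iteration on small numeric vectors (for text clustering the points would be document vectors such as term-frequency vectors). Initial centroids are supplied by the caller, an assumption made to keep the result deterministic; `kMeans` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

using Point = std::vector<double>;

// Squared Euclidean distance between two points of equal dimension.
static double sqDist(const Point& a, const Point& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += (a[i] - b[i]) * (a[i] - b[i]);
    return s;
}

// Lloyd's k-means: assign each point to its nearest centroid, move each
// centroid to the mean of its points, repeat until no assignment changes.
// Returns the cluster index of each point.
std::vector<int> kMeans(const std::vector<Point>& pts, std::vector<Point> centroids,
                        int maxIter = 100) {
    assert(!pts.empty() && !centroids.empty());
    const std::size_t k = centroids.size(), dim = pts[0].size();
    std::vector<int> assign(pts.size(), -1);
    for (int iter = 0; iter < maxIter; ++iter) {
        bool changed = false;
        for (std::size_t i = 0; i < pts.size(); ++i) {      // assignment step
            int best = 0;
            double bestD = std::numeric_limits<double>::max();
            for (std::size_t c = 0; c < k; ++c) {
                double d = sqDist(pts[i], centroids[c]);
                if (d < bestD) { bestD = d; best = static_cast<int>(c); }
            }
            if (assign[i] != best) { assign[i] = best; changed = true; }
        }
        if (!changed) break;                                // converged
        std::vector<Point> sum(k, Point(dim, 0.0));         // update step
        std::vector<int> count(k, 0);
        for (std::size_t i = 0; i < pts.size(); ++i) {
            for (std::size_t j = 0; j < dim; ++j) sum[assign[i]][j] += pts[i][j];
            ++count[assign[i]];
        }
        for (std::size_t c = 0; c < k; ++c)
            if (count[c] > 0)                               // empty clusters stay put
                for (std::size_t j = 0; j < dim; ++j) centroids[c][j] = sum[c][j] / count[c];
    }
    return assign;
}
```

No labels are involved anywhere: the groups emerge only from the distances between points, which is what distinguishes clustering from the classification methods above.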
47?????
- ?????????????????????????,?????
- ??????,????????,?????,?????????
- ?????????????????????????,???????,???????
48????
- ????????????????????????????(?????)
- ??????????????????
- ??
- ????? ????
- ???????????
- ????????
49????????????
- ?????
- ????
- ?????
- ???????????
- ?????
- ?????????????
- ???
- ??????
- ??????????-?????
- ????????(legacy)???
- ?????
- ???(WWW)
50 http://www.sigkdd.org/kddcup