Title: Mining Frequent Patterns Without Candidate Generation
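1 Chapter 6: Time Series and Sequential Pattern Mining (Outline)
- Time series and their applications
- Common methods for time series prediction
- Sequence matching based on the ARMA model
- Time series similarity search based on the discrete Fourier transform
- Search methods based on normalization transforms
- Sequence mining and its basic methods
- AprioriAll algorithm
- AprioriSome algorithm
- GSP algorithm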
2????????
- ????(Time Series)?????????????????,????????? ?
- ???,??????????????????????????????????????????????
???????????????,??????????????????????????????????
??????? - ???????????????????,???????????,???????????
3????????
- ????????,????????????????????????,????????????????
- ??????????????????????,???????,?????????????????
- ???,??????????????????????????????????????????????
????????,?????????????,???????????????????? - ????????,?????????????????X(t)????,??????t1,t2,,t
n(t????,?t1ltt2lt,lttn)??????????Xt1,Xt2,,Xtn??????
??????X(t)???????,Xti (i1,2,,n)????????,????????
??
4????????
- ?????????????????????,??????????????????????
?????????? - ??????????????????,??????????????????????
- ??????????????????????????,???????????????????????
????????????????? - ?????????????????????????????????,???????????????
- ?????????????????????????????????,???????????????
- ????????????????????????????,????????????????????
????(????)???,???????????????
6???????????
- ????????????????,????????????????????,???????
??????????????????????,????????????????????? - ???????????
- ??????????
-
- ????
-
7???????????(?)
- ???????????
- ???????????????,??????????????,?????????????????
????,?????????????,??????????????????????? - ???????????????????????????????,????????????????
???????????????? - ?????????????????????????????????????
- ??????????(???)????????(??????????)?
- ???????????????
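- Let Tt denote the long-term trend, St the seasonal variation, Ct the cyclical variation, Rt the random fluctuation, and yt the observed value of the series. Common deterministic time series models are:
- Additive model: yt = Tt + St + Ct + Rt.
- Multiplicative model: yt = Tt · St · Ct · Rt.
- Mixed model: yt = Tt · St + Rt, or yt = St + Tt · Ct · Rt.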
8???????????(?)
- ??????????
- ????????,???????????,????????
- ?????????,??????(Auto Regressive,??AR)?????????(Mo
ving Average,??MA)????????(Auto Regressive Moving
Average,??ARMA)????????? - ????
- ??????????????,???????????????????????????,????
????????????????????????????????,?????????????????
??????,?????????????,??????????
10??ARMA?????????
- ARMA??(??????AR??)?????????????????????????1927?,G
. U. Yule????AR??,??,AR???????ARMA?????ARMA???ARMA
????????????ARMA???????????,??????????????????????
,???????????????? - 1.ARMA??
- ??????????????
,?X?t??????????n?????
??,?????m??????
??(n,m1,2,),???????????,???????ARMA(n,m)?? - ?? ?
11??ARMA?????????(?)
- 2.AR??
- AR(n)???ARMA(n,m)???????????ARMA(n,m)?????,?
?,? - ?? ????????????????,????n?????
?,??AR(n)? - 3 . MA??
- MA(m)???ARMA(n,m)????????????ARMA(n,m)?????,?
?,? - ?? ?????????????,????m?????(
Moving Average)??,??MA(m)?
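For reference, the three models named above can be written as follows. This is a sketch in one common notation, not necessarily the one used on the original slides: the symbols \varphi_i (autoregressive coefficients), \theta_j (moving-average coefficients) and a_t (white-noise disturbance) are assumptions, as is the minus sign in front of the \theta terms, which some texts write as a plus.

```latex
\begin{aligned}
\text{ARMA}(n,m):\quad & X_t = \sum_{i=1}^{n} \varphi_i X_{t-i} + a_t - \sum_{j=1}^{m} \theta_j a_{t-j} \\
\text{AR}(n):\quad & X_t = \sum_{i=1}^{n} \varphi_i X_{t-i} + a_t \qquad (\theta_j = 0 \text{ for all } j) \\
\text{MA}(m):\quad & X_t = a_t - \sum_{j=1}^{m} \theta_j a_{t-j} \qquad (\varphi_i = 0 \text{ for all } i)
\end{aligned}
```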
12??AR??
??AR????????????????????? ??AR(n)??,?
,??
, ????????????? ,
, ,
? ?????????? , ?? ???
???????,???? ???????? ?
13??????
- ???????,??????????
????? ,????????????????????Yi????? ? - ? ??n???,?????n??????,??????????????
n???Rn?????????,???????????????????? - 1.Euclide
-
- 2.????????
- ?? ???????????,N??????????
- 3.Mahalanobis????
- ?? ????????????
- 4.Mann????
-
- ??, ???????????, ?????????
15???????????????????
- ??????,??????????????????????
- ??????
- Len(X): the length of sequence X
- First(X): the first element of sequence X
- Last(X): the last element of sequence X
- ??X?i?????,
- ????????lt??,???X?,??iltj ,??XiltXj
- ??? ??X????,????X?k????,???????????? ?
- ?????lt??, ?X????,??
- ,?? ?
- ?????(Overlap),??X S1,XS2?X??????,??
? - ??,?XS1?XS2???
16???????????????????
- ???,??????????
- ????(Whole Matching)???N???
- ???????X,??????????,????
,????? ????? - ?????(Subsequence Matching)???N??????????
???????X??????????????
????????,???????X???????????
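The two query types above, whole matching and subsequence matching, can be sketched as follows. This is a minimal illustration that assumes Euclidean distance and a user-chosen threshold `eps`; the function names are hypothetical, and the brute-force scan stands in for whatever index structure is actually used.

```python
import math

def euclidean(x, y):
    """Euclidean distance between two equal-length numeric sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def whole_matching(query, sequences, eps):
    """Indices of sequences with the same length as the query and within eps of it."""
    return [i for i, y in enumerate(sequences)
            if len(y) == len(query) and euclidean(query, y) <= eps]

def subsequence_matching(query, sequences, eps):
    """(sequence index, start offset) pairs where a window of the sequence is within eps."""
    n = len(query)
    hits = []
    for i, y in enumerate(sequences):
        # Slide a window of the query's length over the (possibly longer) sequence.
        for start in range(len(y) - n + 1):
            if euclidean(query, y[start:start + n]) <= eps:
                hits.append((i, start))
    return hits

whole_hits = whole_matching([1, 2, 3], [[1, 2, 3], [1, 2, 4], [5, 5, 5]], eps=1.0)
sub_hits = subsequence_matching([1, 2, 3], [[0, 1, 2, 3, 9], [7, 7, 7, 7]], eps=0.5)
```

Whole matching compares sequences of equal length directly, whereas subsequence matching has to test every window, which is why it is the more expensive of the two.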
17????
- ???????????????????????????????,????????,?????
????? - 1.????
- ????????
,?X?????????,?? , - ??,X?xt??????,? ? ??????,
, ??????? - 2.????
- ??Parseval???,?????????????????,??
18????(?)
- ??Parseval???,?????????
- ????????,?????????????????,???????????????????????
??????? ???,? - ??,
- ???????????????????????
- 3.????
- ???????????????????????????,?????????????????,????
???
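The role of Parseval's theorem in the steps above can be checked numerically. The sketch below assumes a DFT normalized by 1/sqrt(n), so that Euclidean distance is the same in the time and frequency domains, and it assumes the filter keeps only the first `k` coefficients; the function names are hypothetical. Because dropping coefficients can only shrink the distance, filtering on the truncated distance never discards a true match.

```python
import cmath
import math

def dft(x):
    """Discrete Fourier transform with 1/sqrt(n) normalization (assumed convention)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * f * t / n) for t in range(n)) / math.sqrt(n)
            for f in range(n)]

def distance(a, b):
    """Euclidean distance; works for real and complex sequences."""
    return math.sqrt(sum(abs(p - q) ** 2 for p, q in zip(a, b)))

def truncated_distance(x, y, k):
    """Distance computed on the first k DFT coefficients only (a lower bound)."""
    return distance(dft(x)[:k], dft(y)[:k])

x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 2.0, 2.0, 5.0]
time_dist = distance(x, y)
freq_dist = distance(dft(x), dft(y))
lower = truncated_distance(x, y, 2)
```

Here `time_dist` and `freq_dist` agree up to rounding error, and `lower` does not exceed either of them.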
20???????????
?6-3 ??X?Y
?6-4 ??Gap????X?Y
?6-5 ????????X?Y ?6-6
????????X?Y
21????
- ??6-1 ???? ??????????????
?Y ?????????????
????3???,???? - ? ??-similar
- (1)????
??? - (2)?????????????????????
-
- ????????????,?????????????????,??????????
- (3)???,????
- ???????????X?Y?????????????????????????,????
?X?Y??-similar???????????,??????????????? - ???????X?Y???????,???????????????
?
22???????????
- Agrawal?X?Y???????????????????????????????
- 1.??????
- ?????????????????????,??????(Atomic
Matching)???????????????????????(???520),????????
???????,??????????????????? - ????????????????????????,?????????????????
- ?? ??????i????, ? ???????????????????
????????????????????(-1,1)??????????????????
23???????????(?)
- 2.????
- ????(Window Stitching)??????,?????????????????????
?????????? - ???X?Y?m????????,??
- ??????????????????
- (1)?????i?? ??
- (2)????jgti,
- (3)???igt1,?? ?? ??,? ?
???Gap?????,??Y????????? ? ??,?????d, ?
??????????d? - (4)X???????????????????????,Y?????????????????????
????
24???????????(?)
- 3.?????
- ?????????????????,?????????(Subsequence
Ordering),??????????????? - ????????????????????????????????????
- ????????????????????,???????????????????,?????????
????,??????????????
26????
- ????????????,?????????????????????????????????????
??,????????????????????????,??????????????????????
???????????,???????????????? - ??????????Agrawal?????,???????????????????????????
???????????????????????? - ????????????????????????,???????????????,?DNA?????
???????Web???????????????????????
27????????
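- Definition 6-3: A sequence is an ordered list of itemsets, written α = α1→α2→…→αn, where each αi is an itemset. The length of a sequence is the number of itemsets it contains; a sequence of length k is called a k-sequence.
- Definition 6-4: Given sequences α = α1→α2→…→αn and β = β1→β2→…→βm, if there exist integers i1 < i2 < … < in such that α1 ⊆ βi1, α2 ⊆ βi2, …, αn ⊆ βin, then α is a subsequence of β, or equivalently β contains α. In a set of sequences, a sequence α that is not contained in any other sequence is called a maximal sequence.
- Definition 6-5: Given a sequence S and a sequence database DT, the support of S is the fraction of sequences in DT that contain S. A k-sequence that meets the user-specified minimum support (min-sup) is called a large k-sequence in DT.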
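The containment and support definitions can be sketched in a few lines. This is an illustration only: sequences are represented as lists of Python sets, the function names are hypothetical, and the small database `DT` is sample data in the `<(30)(90)>` style used in this chapter.

```python
def is_contained(a, b):
    """True if sequence a is a subsequence of b.

    Each itemset of a must be a subset of a distinct itemset of b,
    and the matched itemsets of b must appear in the same order.
    """
    j = 0
    for itemset in a:
        # Greedily take the earliest itemset of b that covers this one.
        while j < len(b) and not itemset <= b[j]:
            j += 1
        if j == len(b):
            return False
        j += 1
    return True

def support(seq, database):
    """Fraction of sequences in the database that contain seq."""
    return sum(is_contained(seq, s) for s in database) / len(database)

# Sample customer sequences (one list of itemsets per customer).
DT = [
    [{30}, {90}],
    [{10, 20}, {30}, {40, 60, 70}],
    [{30, 50, 70}],
    [{30}, {40, 70}, {90}],
    [{90}],
]
s1 = support([{30}, {90}], DT)
s2 = support([{30}, {40, 70}], DT)
```

Note that `[{30}, {90}]` is not contained in `[{30, 90}]`: the two itemsets must come from different transactions.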
28??????????
?6-1?????????????
??????????????????????(Customer-id)?????(Transacti
on-Time)??????????(Item)?????????6-1??????????????
????????????????,?????????????????????,??????????
???????????????6-2????6-1???????????
Customer ID (Cust_id)   Transaction time (Tran_time)   Items (Item)
1                       June 25 '99                    30
1                       June 30 '99                    90
2                       June 10 '99                    10, 20
2                       June 15 '99                    30
2                       June 20 '99                    40, 60, 70
3                       June 25 '99                    30, 50, 70
4                       June 25 '99                    30
4                       June 30 '99                    40, 70
4                       July 25 '99                    90
5                       June 12 '99                    90
Table 6-2: Customer sequence database
Customer ID (Cust_id)   Customer sequence
1                       <(30)(90)>
2                       <(10,20)(30)(40,60,70)>
3                       <(30,50,70)>
4                       <(30)(40,70)(90)>
5                       <(90)>
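Turning the transaction table into customer sequences amounts to sorting by customer and time and then grouping. The sketch below shows one way to do it; the function name is hypothetical, and reading the year '99 as 1999 is an assumption made only so the dates can be stored as `date` objects.

```python
from datetime import date
from itertools import groupby

# (customer id, transaction time, items), deliberately not in sorted order.
transactions = [
    (4, date(1999, 7, 25), {90}),
    (2, date(1999, 6, 20), {40, 60, 70}),
    (1, date(1999, 6, 30), {90}),
    (1, date(1999, 6, 25), {30}),
    (2, date(1999, 6, 10), {10, 20}),
    (3, date(1999, 6, 25), {30, 50, 70}),
    (4, date(1999, 6, 25), {30}),
    (2, date(1999, 6, 15), {30}),
    (4, date(1999, 6, 30), {40, 70}),
    (5, date(1999, 6, 12), {90}),
]

def to_customer_sequences(rows):
    """Sort by (customer id, time) and collect each customer's itemsets in order."""
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))
    return {cust: [items for _, _, items in group]
            for cust, group in groupby(ordered, key=lambda r: r[0])}

sequences = to_customer_sequences(transactions)
```

Each value in `sequences` is one customer sequence: a time-ordered list of the itemsets that customer bought.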
29??????????(?)
?????????????????????????????????????????????????
??????????????????????????????????????????????????
???6-3??????????????,????????????????????????????
?6-3??????????
???(Pro_id) ????(Call_time) ???(Call_id)
744 744 1069 9 1069 744 1069 9 -1 04011030 04011031 04011032 04011034 04011035 04011038 04011039 04011040 23 14 4 24 5 81 62 16
?6-4???????????
Process ID (Pro_id)   Call sequence (Call_sequence)
744                   <(23,14,81)>
1069                  <(14,24,16)>
9                     <(4,5,62)>
30???????????
- ??????????????????????????????????????????????????
????????????? - 1. ????
- ????????(Sort),????????????????????(??????????????
???????)???,??????????,??????(Cust_id)?????(trans-
time)????,??????????????????????????????? - 2. ?????
- ??????????????(????)?????L????,????????1-???????,?
ltlgt l ?L? - ????6-2???????????,??????2,???????(30),(40),(70),(
40,70)?(90)??????,??????????????????,????????????6
-6????????,???????????????????
Large itemset   Mapped to
(30)            1
(40)            2
(70)            3
(40,70)         4
(90)            5
31???????????(?)
- 3. ????
- ???????????,?????????????????????????????????
- ?6-7????6-2???????????????,??ID??2????????????
,??(10,20)????,???????????????(40,60,70)????????
(40),(70),(40,70)??? - 4. ????
- ????????????????,????(Large Sequence)?
- 5. ?????
- ????????????(Maximal Sequences)?
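The large itemset phase and the transformation phase can be sketched together. This is an illustration under stated assumptions: large itemsets are found by brute-force enumeration rather than by an Apriori-style search, support is counted once per customer with a minimum of 2 customers, the integer codes are taken from the mapping table in this section, and all function names are hypothetical.

```python
from itertools import combinations

customer_seqs = [
    [{30}, {90}],
    [{10, 20}, {30}, {40, 60, 70}],
    [{30, 50, 70}],
    [{30}, {40, 70}, {90}],
    [{90}],
]

def large_itemsets(seqs, min_count):
    """Itemsets (as sorted tuples) that occur in at least min_count customers."""
    counts = {}
    for seq in seqs:
        seen = set()  # each customer contributes at most once per itemset
        for trans in seq:
            items = sorted(trans)
            for r in range(1, len(items) + 1):
                seen.update(combinations(items, r))
        for itemset in seen:
            counts[itemset] = counts.get(itemset, 0) + 1
    return {s for s, n in counts.items() if n >= min_count}

def transform(seqs, mapping):
    """Replace each transaction by the codes of the large itemsets it contains."""
    result = []
    for seq in seqs:
        new_seq = []
        for trans in seq:
            codes = {code for itemset, code in mapping.items() if set(itemset) <= trans}
            if codes:  # transactions with no large itemset are dropped
                new_seq.append(codes)
        result.append(new_seq)
    return result

# Integer codes as given in the mapping table.
mapping = {(30,): 1, (40,): 2, (70,): 3, (40, 70): 4, (90,): 5}
large = large_itemsets(customer_seqs, min_count=2)
transformed = transform(customer_seqs, mapping)
```

For the customer with ID 2, the transaction (10,20) disappears and (40,60,70) becomes the set of codes for (40), (70) and (40,70).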
33AprioriAll??
- AprioriAll?????????Apriori,??Apriori?????????????,
??????????????? - ????????????????????????,??????????????????????
- ???????,????????????????1-??????
- ??????,???????????????,???????,?????????????
- ???????,????????????1-?????????
34 AprioriAll algorithm
1. Algorithm description
Algorithm 6-1: AprioriAll
Input: the transformed sequence database DT.
Output: the maximal sequences.
(1) L1 = {large 1-sequences}; // result of the large itemset phase
(2) FOR (k = 2; Lk-1 ≠ ∅; k++) DO BEGIN
(3)   Ck = aprioriALL_generate(Lk-1); // Ck is the set of new candidates generated from Lk-1
(4)   FOR each customer-sequence c in DT DO // for each customer sequence c in the database
(5)     Sum the count of all candidates in Ck that are contained in c; // count the candidates in Ck that c contains
(6)   Lk = Candidates in Ck with minimum support; // Lk is the set of candidates in Ck that meet the minimum support
(7) END
(8) Answer = Maximal Sequences in ∪k Lk;
????????????Apriori??????????, ??????????????????????????????,??????????
35AprioriAll????
?? 6-1 ??????????3?????????????aprioriALL_gener
ate?????,????????6-10 (b)????????????L3?????,?????
??6-10 (c)??????????,??lt1,2,4,3gt,??lt2,4,3gt??L3?3??
?,??lt1,2,4,3gt??????????????????????,???????????4??
?????????4???,??lt1,2,4gt?lt1,3,5gt???,???WHERE???????
????
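AprioriAll and its candidate generation can be sketched compactly. This is a minimal illustration, not the slides' implementation: candidates are tuples of large-itemset codes, customer sequences are lists of code sets, the join pairs sequences that agree on everything but their last code, and the pruning step drops candidates with a (k-1)-subsequence that is not large. The sample database `db` and the set `L3` are illustrative assumptions, and all function names are hypothetical.

```python
def contains(customer_seq, candidate):
    """True if the candidate's codes occur, in order, in distinct transactions."""
    pos = 0
    for code in candidate:
        while pos < len(customer_seq) and code not in customer_seq[pos]:
            pos += 1
        if pos == len(customer_seq):
            return False
        pos += 1
    return True

def generate(prev_large):
    """Join on a shared (k-2)-prefix, then prune by (k-1)-subsequences."""
    prev = set(prev_large)
    joined = {p + (q[-1],) for p in prev for q in prev if p[:-1] == q[:-1]}
    return {c for c in joined
            if all(c[:i] + c[i + 1:] in prev for i in range(len(c)))}

def is_subsequence(s, t):
    it = iter(t)
    return all(x in it for x in s)  # each lookup consumes the iterator

def apriori_all(db, min_count):
    codes = {code for seq in db for trans in seq for code in trans}
    large = {(c,) for c in codes if sum(contains(s, (c,)) for s in db) >= min_count}
    all_large = set(large)
    while large:
        candidates = generate(large)
        large = {c for c in candidates
                 if sum(contains(s, c) for s in db) >= min_count}
        all_large |= large
    # Keep only sequences that are not contained in another large sequence.
    return {s for s in all_large
            if not any(s != t and is_subsequence(s, t) for t in all_large)}

# Assumed set of large 3-sequences for the candidate-generation step.
L3 = {(1, 2, 3), (1, 2, 4), (1, 3, 4), (1, 3, 5), (2, 3, 4)}
C4 = generate(L3)

# Assumed transformed database; minimum support is 2 customers.
db = [
    [{1, 5}, {2}, {3}, {4}],
    [{1}, {3}, {4}, {3, 5}],
    [{1}, {2}, {3}, {4}],
    [{1}, {3}, {5}],
    [{4}, {5}],
]
maximal = apriori_all(db, min_count=2)
```

With these inputs, `<1,2,4,3>` is joined but then pruned because `<2,4,3>` is not in `L3`, and the run on `db` ends with the maximal sequences `<1,2,3,4>`, `<1,3,5>` and `<4,5>`.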
36AprioriAll????
??6-2????????????????6-11(a)??,????????????????,??
??????????????????????40(???????????)????????????
?1-??,????AprioriAll??????????????????6-11????????
37AprioriAll????
38AprioriAll????
??,AprioriAll?????????-k???,?L1?L2?L3?L4,?
??-k?????Maximal Sequences in ?kLk??,?????????????
??????? AprioriAll???Apriori?????,??????????
????????????????????????????6-8???????????,???????
???????????????6-11?,??L2??C3???,??lt2,3,4gt?lt2,4,3gt
????????,?????lt2,4,3gt????????????L3????????
Sequences     Support
<1,2,3,4>     2
<1,3,5>       2
<4,5>         2
40AprioriSome??
- AprioriSome???????AprioriAll?????,??????????
- ??????????????????????
- ??????????????????????
Algorithm 6-3: AprioriSome
Input: the transformed sequence database DT.
Output: the maximal sequences.
// Forward Phase
(1) L1 = {large 1-sequences}; // result of the large itemset phase
(2) C1 = L1;
(3) last = 1; // Clast is the candidate set counted most recently
(4) FOR (k = 2; Ck-1 ≠ ∅ and Llast ≠ ∅; k++) DO BEGIN
(5)   IF (Lk-1 known) THEN Ck = New candidates generated from Lk-1; // Ck is generated from Lk-1
(6)   ELSE Ck = New candidates generated from Ck-1; // Ck is generated from Ck-1
(7)   IF (k = next(last)) THEN BEGIN
(9)     FOR each customer-sequence c in the database DO // for each customer sequence c in the database
(10)      Sum the count of all candidates in Ck that are contained in c; // count the candidates in Ck that c contains
(11)    Lk = Candidates in Ck with minimum support; // Lk is the set of candidates in Ck that meet the minimum support
(12)    last = k;
(13)  END
(14) END
41 AprioriSome algorithm (continued)
// Backward Phase
(15) FOR (k--; k >= 1; k--) DO
(16)   IF (Lk not found in forward phase) THEN BEGIN // Lk was not computed in the forward phase
(17)     Delete all sequences in Ck contained in some Li, i > k; // remove candidates already covered by a longer large sequence
(18)     FOR each customer-sequence c in DT DO // for each customer sequence c in DT
(19)       Sum the count of all candidates in Ck that are contained in c; // count the candidates in Ck that c contains
(20)     Lk = Candidates in Ck with minimum support; // Lk is the set of candidates in Ck that meet the minimum support
(21)   END
(22)   ELSE Delete all sequences in Lk contained in some Li, i > k; // Lk is already known
(23) Answer = ∪k Lk; // union of all Lk
In the forward phase, only sequences of certain lengths are counted. For example, the forward phase may count sequences of lengths 1, 2, 4 and 6 and skip the others, while sequences of lengths 3 and 5 are counted in the backward phase. The function next decides which length is counted next: it takes the length counted in the last pass and returns the length to count in the following pass.
Algorithm 6-4: next(k: integer)
IF (hitk < 0.666) THEN return k + 1;
ELSEIF (hitk < 0.75) THEN return k + 2;
ELSEIF (hitk < 0.80) THEN return k + 3;
ELSEIF (hitk < 0.85) THEN return k + 4;
ELSE return k + 5;
hitk is the ratio of the number of large k-sequences to the number of candidate k-sequences, that is, |Lk|/|Ck|.
????????????????????,?? ?????????????????????????????
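The skipping heuristic in next can be written directly as a function. In this sketch the name `next_length` is hypothetical (chosen to avoid shadowing Python's built-in `next`), and the hit ratio is passed in as an argument rather than looked up from the counts of Lk and Ck.

```python
def next_length(k, hit_k):
    """Length to count after length k, given hit_k = |Lk| / |Ck|.

    The higher the share of candidates that turned out to be large,
    the more lengths are skipped in the forward phase.
    """
    if hit_k < 0.666:
        return k + 1
    elif hit_k < 0.75:
        return k + 2
    elif hit_k < 0.80:
        return k + 3
    elif hit_k < 0.85:
        return k + 4
    else:
        return k + 5

# With a hit ratio of 0.7 at every counted length, every other length is skipped.
schedule = [1]
while schedule[-1] < 7:
    schedule.append(next_length(schedule[-1], 0.7))
```

A low hit ratio makes the forward phase behave like AprioriAll (every length is counted), while a high ratio defers more lengths to the backward phase.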
42AprioriSome??
?? 6-3 ??????AprioriAll??????6-11(a)????????Aprior
iSome?????????????L1(??6-9(b)?L1??)???next(k)2k,?
????C2????L2(??6-11(d)??L2??)???????,apriori_gener
ate???L2?????????C3??6-12(e)???C3????????????C3,??
????L3????apriori_generate???C3???C4,??????,??????
?6-9(i)???C4?????C4??L4(?6-12(i))??,??????C5,?????
???????????????6-12???
43AprioriSome??
44AprioriSome??
????????????6-13???
45AprioriAll?AprioriSome??
- AprioriAll?AprioriSome??
- AprioriAll?Lk-1????????Ck,?AprioriSome????Ck-1????
???? Ck.,??Ck-1??Lk-1,??AprioriSome?????????? - ??AprioriSome???????,?????????????,??????????????
- ??????,AprioriSome???????????????,(??????????)???,
????????????????????,???AprioriSome????AprioriAll?
?? - ????????, ???????, ????????????, ?? AprioriSome
????
47GSP??
- GSP??????????
- ????????,?????1?????L1,????????
- ??????i ????Li ????????????????i1???????Ci1????
?????,??????????????,?????i1?????Li1,??Li1?????
?? - ??????,????????????????????????
- ??,?????????????
- ????????????S1?????????????S2???????????????,????
S1?S2????,??S2??????????S1?? - ????????????????????????,????????????????,???????
?????? - ????????????????????
- ?????????????C,???????DT,??????????d,????C??d?????
???????,??????????
48 GSP algorithm
Algorithm 6-5: GSP
Input: the transformed sequence database DT.
Output: the maximal sequences.
(1) L1 = {large 1-sequences}; // result of the large itemset phase
(2) FOR (k = 2; Lk-1 ≠ ∅; k++) DO BEGIN
(3)   Ck = GSPgenerate(Lk-1);
(4)   FOR each customer-sequence c in the database DT DO
(5)     Increment the count of all candidates in Ck that are contained in c;
(6)   Lk = Candidates in Ck with minimum support;
(7) END
(8) Answer = Maximal Sequences in ∪k Lk;
49GSP????
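Example 6-5: Table 6-9 shows the sequential patterns of length 3 and the candidate 4-sequences generated from them.
In the join step, <(1,2),3> can be joined with <2,(3,4)> because <(_,2),3> and <2,(3,_)> are the same, which yields the candidate <(1,2),(3,4)>. Likewise, <(1,2),3> is joined with <2,3,5> to give <(1,2),3,5>. The remaining sequences cannot be joined with any length-3 sequence. For example, <(1,2),4> cannot be joined with anything because there is no sequence of the form <(2),(4,_)> or <(2),(4),(_)>.
In the pruning step, <(1,2),3,5> is removed because its subsequence <1,3,5> is not in L3, whereas every length-3 subsequence of <(1,2),(3,4)> is in L3, so that candidate is kept.
Table 6-9: GSP example
Sequential patterns   Candidate 4-sequences   Candidate 4-sequences
with length 3         after join              after pruning
<(1,2),3>             <(1,2),(3,4)>           <(1,2),(3,4)>
<(1,2),4>             <(1,2),3,5>
<1,(3,4)>
<(1,3),5>
<2,(3,4)>
<2,3,5>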
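The join and prune steps of this example can be reproduced with a short sketch. It assumes sequences are tuples of sorted item tuples, that there are no time constraints (so pruning checks every subsequence obtained by deleting one item), and that the input sequences have at least two items; the special case of joining 1-sequences is not handled. The function names are hypothetical.

```python
def drop_first_item(seq):
    """Remove the first item of the first element (and the element if it empties)."""
    head = seq[0][1:]
    return ((head,) if head else ()) + seq[1:]

def drop_last_item(seq):
    """Remove the last item of the last element (and the element if it empties)."""
    tail = seq[-1][:-1]
    return seq[:-1] + ((tail,) if tail else ())

def gsp_join(frequent):
    """Join s1 with s2 when s1 minus its first item equals s2 minus its last item."""
    candidates = set()
    for s1 in frequent:
        for s2 in frequent:
            if drop_first_item(s1) != drop_last_item(s2):
                continue
            item = s2[-1][-1]
            if len(s2[-1]) == 1:
                # The item was its own element in s2: append it as a new element.
                candidates.add(s1 + ((item,),))
            else:
                # Otherwise merge it into the last element of s1.
                candidates.add(s1[:-1] + (s1[-1] + (item,),))
    return candidates

def delete_one_item(seq):
    """Yield every subsequence obtained by deleting exactly one item."""
    for i, elem in enumerate(seq):
        for j in range(len(elem)):
            rest = elem[:j] + elem[j + 1:]
            yield seq[:i] + ((rest,) if rest else ()) + seq[i + 1:]

def gsp_prune(candidates, frequent):
    known = set(frequent)
    return {c for c in candidates if all(s in known for s in delete_one_item(c))}

# Sequential patterns of length 3 from the example; <(1,2),3> is ((1, 2), (3,)).
L3 = [
    ((1, 2), (3,)), ((1, 2), (4,)), ((1,), (3, 4)),
    ((1, 3), (5,)), ((2,), (3, 4)), ((2,), (3,), (5,)),
]
after_join = gsp_join(L3)
after_prune = gsp_prune(after_join, L3)
```

The join produces `<(1,2),(3,4)>` and `<(1,2),3,5>`, and pruning removes the second because `<1,3,5>` is not among the length-3 patterns.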