Title: Why Not Store Everything in Main Memory? Why use disks?
Slides: 10
Provided by: William1145

Transcript and Presenter's Notes


1
Communications Analytics: prediction and anomaly detection for emails, tweets, phone, and text, using a 3-dim DSR (Document, Sender, Receiver) matrix and 2-dim TD (Term, Doc) and UT (User, Term) matrices.
The pSVD trick is to replace these massive relationship matrices with small feature matrices. Using just one feature, replace them with feature vectors: f = (fD, fT, fU, fS, fR) or f = (fD, fT, fU).
[Figure: DSR cube with sender and receiver axes. Replace DSR with fD, fS, fR; replace TD with fT and fD; replace UT with fU and fT feature matrices (2 features).]
We do pTree conversions and train F in the CLOUD, then download the resulting F to users' personal devices for predictions and anomaly detections. The same setup should work for phone-record Documents, tweet Documents (in the US Library of Congress), text Documents, etc.
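As a minimal illustration of the one-feature replacement (the feature values below are made up, not from the slides): each DSR cell is approximated by a product of three scalars, so |D|·|S|·|R| cells are summarized by only |D|+|S|+|R| numbers.

```python
# One-feature (pSVD1) replacement of a 3-dim DSR matrix: cell (d, s, r) is
# approximated by fD[d] * fS[s] * fR[r]. Feature values here are illustrative.
fD = [1.0, 2.0]        # document features
fS = [0.5, 1.0, 1.5]   # sender features
fR = [2.0, 1.0]        # receiver features

def approx_dsr(fD, fS, fR):
    """Rebuild the full approximation tensor from the three feature vectors."""
    return [[[fd * fs * fr for fr in fR] for fs in fS] for fd in fD]

A = approx_dsr(fD, fS, fR)
# A[1][2][0] = 2.0 * 1.5 * 2.0 = 6.0
```

The point is the storage trade: the 12-cell tensor above is carried as 7 feature values, and any cell (including a blank one) can be predicted on demand.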
2
Structure the 3-dim DTU relationship as a rotatable matrix, then create PTreeSets for each rotation (attach each entity table's PTreeSet to its rotation). Always treat an entity as an attribute of another entity if possible, rather than adding it as a new dimension of a matrix. E.g., treat Sender as a Document attribute instead of as the 3rd dim of matrix DSR. The reason: Sender is a candidate key for Doc (while Receiver is not). (Problem to solve: a mechanism for SVD prediction of Sender?)
[Figure: DR matrix with Sender, CT, and LN attribute columns attached to D; D, T, and U tables and the UT matrix shown alongside, with sample cell values.]
Only provide a blank-mask when there are blanks. pTrees might be provided for D.ST (SendTime) and D.LN (Length).
3
Here we try a comprehensive comparison of the 3 alternatives: 3D (DSR), 2D (DS, DR), and DTU (2D).
[Figure: examples em9, em10.]
4
Comprehensive comparison of the 3 alternatives, DTU:
3D (DTD, TDT, TUT, UUT, DDSR, SDSR, RDSR)
2D (DT, UT, DS, DR)
[Figure: example em11.]
5
pSVD for Communication Analytics: train f by sse gradient descent.
sse = Σ_{nonblank DSR} (dsr − fDd·fSs·fRr)²
sse = Σ_{nonblank TD} (td − fTt·fDd)²
sse = Σ_{nonblank UT} (ut − fUu·fTt)²
∂sse/∂fDd = −2 Σ_{nonblank TD} (td − fTt·fDd)·fTt
∂sse/∂fUu = −2 Σ_{nonblank UT} (ut − fUu·fTt)·fTt
∂sse/∂fDd = −2 Σ_{nonblank DSR} (dsr − fDd·fSs·fRr)·fSs·fRr
∂sse/∂fTt = −2 Σ_{nonblank TD} (td − fTt·fDd)·fDd
∂sse/∂fTt = −2 Σ_{nonblank UT} (ut − fUu·fTt)·fUu
∂sse/∂fSs = −2 Σ_{nonblank DSR} (dsr − fDd·fSs·fRr)·fDd·fRr
∂sse/∂fRr = −2 Σ_{nonblank DSR} (dsr − fDd·fSs·fRr)·fDd·fSs
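A minimal sketch of the TD case of this gradient descent, with assumed (illustrative) non-blank cell values, an all-ones initialization, and a small fixed learning rate; none of these specifics come from the slides.

```python
# sse gradient descent for one-feature pSVD on the non-blank cells of TD,
# per the gradient formulas: d(sse)/d(fT_t) = -2 * sum_nb (td - fT_t*fD_d) * fD_d
#                            d(sse)/d(fD_d) = -2 * sum_nb (td - fT_t*fD_d) * fT_t
# Cell values, learning rate, and iteration count are illustrative assumptions.
TD = {(0, 0): 2.0, (0, 1): 4.0, (1, 0): 1.0, (1, 2): 3.0}  # non-blank cells only
nT, nD = 2, 3
fT, fD = [1.0] * nT, [1.0] * nD
lr = 0.03
for _ in range(2000):
    gT, gD = [0.0] * nT, [0.0] * nD
    for (t, d), v in TD.items():
        err = v - fT[t] * fD[d]
        gT[t] -= 2 * err * fD[d]
        gD[d] -= 2 * err * fT[t]
    fT = [f - lr * g for f, g in zip(fT, gT)]
    fD = [f - lr * g for f, g in zip(fD, gD)]

sse = sum((v - fT[t] * fD[d]) ** 2 for (t, d), v in TD.items())
blank_prediction = fT[1] * fD[1]  # pSVD estimate for the blank cell (t=1, d=1)
```

With these particular values the non-blank cells are exactly rank-1-fittable, so sse approaches 0 and the blank cell gets a prediction consistent with the learned features.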
pSVD classification predicts blank cell values.
pSVD FAUST Cluster: Use pSVD to speed up FAUST clustering by looking for gaps in the pSVD approximation of TD rather than in TD itself (i.e., using the pSVD-predicted values rather than the actual given TD values). The same goes for DT, UT, TU, DSR, SDR, RDS.
E.g., on the TD(d1,...,dn) table, the t-th row is pSVD-estimated as (fTt·fD1, ..., fTt·fDn), and its dot product with v is pSVD-estimated as Σ_{k=1..n} vk·fTt·fDk. So we analyze gaps in this column of values taken over all rows, t.
pSVD FAUST Classification: Use pSVD to speed up FAUST Classification by finding optimal cut points in the pSVD approximation of TD rather than in TD itself (i.e., using the pSVD-predicted values rather than the actual given TD values). The same goes for DT, UT, TU, DSR, SDR, RDS.
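A sketch of that gap analysis, with hypothetical trained feature values fT, fD and projection vector v (none of these values come from the slides).

```python
# Gap analysis on pSVD-estimated projections. Row t of TD is estimated as
# fT[t] * (fD[0], ..., fD[n-1]), so its dot product with v collapses to
# fT[t] * (v . fD): one multiply per row instead of a full dot product.
fT = [0.5, 3.1, 0.6, 3.0, 0.4]  # trained term features (illustrative)
fD = [1.0, 2.0, 0.5]            # trained document features (illustrative)
v = [0.6, 0.8, 0.0]             # projection direction

v_dot_fD = sum(vk * fk for vk, fk in zip(v, fD))
col = sorted(ft * v_dot_fD for ft in fT)          # estimated values over all rows t
gaps = [(b - a, a, b) for a, b in zip(col, col[1:])]
widest = max(gaps)  # the largest gap suggests a cluster cut point
```

Here the widest gap cleanly separates the rows with small fT from those with large fT, which is where a FAUST-style cut would go.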
6
Recalling the massive interconnection of relationships between entities: any analysis we do on this, we can do after estimating each matrix using pSVD-trained feature vectors for the entities.
On the next slide we display the pSVD1 (one-feature) replacement by feature vectors, which approximate the non-blank cell values and predict the blanks.
[Figure: relationship cards: cust-item, term-doc, author-doc, gene-gene (ppi), doc-doc, People?, exp-PI, exp-gene, gene-gene (ppi).]
7
On this slide we display the pSVD1 (one-feature) replacement by feature vectors, which approximate the non-blank cell values and predict the blanks.
[Figure: relationship matrices and their feature-vector replacements: Doc-Sender-Receiver (DSR), UT, CI, AD, TD, Enroll, GG1, DD, User-Movie ratings, ExpG, ExpPI, Term-Term, GG2.]
8
An n-dim vector space RC(C1,...,Cn) is a matrix or TwoEntityRelationship (with row entity instances R1,...,RN and column entity instances C1,...,Cn). ARC will denote the pSVD approximation of RC.
An (N+n)-vector f = (fR, fC) defines the prediction pi,j = fRi·fCj and the error ei,j = pi,j − RCi,j; then ARCi,j = fRi·fCj and ARCrow_i = fRi·fC = fRi·(fC1,...,fCn) = (fRi·fC1, ..., fRi·fCn). Use sse gradient descent to train f.
[Figure: RC matrix with columns C1, C2, ..., Cn and rows R1, R2, ..., RN; sample 0/1 cell values.]
Compute fC∘d = Σ_{k=1..n} fCk·dk, form a constant SPTS with it, and multiply that SPTS by the SPTS fR.
Any data mining that can be done on RC can be done using this pSVD approximation of RC, ARC, e.g., FAUST Oblique (because ARC∘d should show us the large gaps quite faithfully).
Given any K×(N+n) feature matrix F = (FR, FC), with FRi = (f1Ri,...,fKRi) and FCj = (f1Cj,...,fKCj), the prediction is pi,j = FRi∘FCj = Σ_{k=1..K} fkRi·fkCj.
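For example (with illustrative K = 2 feature values, not values from the slides), the K-feature prediction is the dot product of row and column feature vectors:

```python
# K-feature prediction: p[i][j] = FR_i . FC_j = sum_{k=1..K} FR[i][k] * FC[j][k].
# Feature values are illustrative (K = 2, N = 2 row instances, n = 3 columns).
FR = [[1.0, 0.5], [2.0, 1.0]]               # row-entity feature matrix (N x K)
FC = [[1.0, 2.0], [0.0, 1.0], [3.0, 0.0]]   # column-entity feature matrix (n x K)

def predict(FR, FC):
    """Approximate the full N x n matrix RC from the two feature matrices."""
    return [[sum(r * c for r, c in zip(fr, fc)) for fc in FC] for fr in FR]

P = predict(FR, FC)  # P[0][0] = 1.0*1.0 + 0.5*2.0 = 2.0
```

K = 1 recovers the single-feature-vector case above; larger K trades storage for a closer fit.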
Keeping in mind that we have decided (tentatively) to approach all matrices as rotatable tables, this is then a universal method of approximation. The big question is: how good is the approximation for data mining? It is known to be good for Netflix-type recommender matrices, but what about others?
9
Of course, if we take the previous data (all non-blanks = 1) and we only count errors in those non-blanks, then f = pure1 gives error = 0. But of course, if it is an image (a fax-type 0/1 image) then there are no blanks (and the zero positions must be assessed for error too). So we change the data.
[Table: the modified data matrix (columns 1..b) and an sse training trace over steps t.]