Effective Change Detection Using Sampling

1
Effective Change Detection Using Sampling

Junghoo John Cho
Alexandros Ntoulas
UCLA

2
Problem
Polling
Update
Query
Remote database
Local database

Application
Web search engines/crawlers
Web archive
Data warehouse
. . .

3
Existing Approach

Round robin
Download pages in a round robin manner
Change-frequency based [CLW98, CGM00, EMT01]
Estimate the change frequency
Adjust download frequency
Proven to be optimal

4
Our Approach

Sampling-based
Sample k pages from each source
Download more pages from the source with more changed samples

5
Comparison

Frequency based
Proven to be optimal
Change history required
Difficult to estimate change frequency
Sampling based
Can be worse than frequency based policy
No history/frequency-estimation required
Experimental comparison later

6
Questions

Are we assuming correlation?
How to use sampling results?
Proportional vs Greedy
How many samples?
Dynamic sample size adjustment?
What if we have very limited resources?

7
Is Correlation Necessary?

Random sampling
Correlation not necessary. Only random sampling
More discussion later

[Figure: two sources, with 4/5 and 1/5 of the sampled pages changed]

8
Questions

Are we assuming correlation?
How to use sampling results?
Proportional vs Greedy
How many samples?
Dynamic sample size adjustment?
What if we have very limited resources?

9
Download Model (1)

Fixed download cycle
Say, once a month
Fixed download resources in each cycle
Say, 100,000 page downloads every month
Goal
Download as many changes as we can
ChangeRatio = (no. of changed downloaded pages) / (no. of downloaded pages)
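As a minimal sketch of the metric defined on this slide, the ratio can be computed as below. The function name `change_ratio` and the example counts are hypothetical and chosen only for illustration.

```python
def change_ratio(changed_downloaded: int, total_downloaded: int) -> float:
    """Fraction of the downloaded pages that turned out to have changed."""
    if total_downloaded <= 0:
        raise ValueError("total_downloaded must be positive")
    if not 0 <= changed_downloaded <= total_downloaded:
        raise ValueError("changed_downloaded must lie in [0, total_downloaded]")
    return changed_downloaded / total_downloaded


# Hypothetical cycle: 100,000 downloads, 35,000 of which had changed.
print(change_ratio(35_000, 100_000))  # 0.35
```

A higher ChangeRatio means fewer downloads were spent on pages that had not changed.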

10
Download Model (2)

Two-stage sampling policy
Sampling stage
Download stage
Sampling requires page download
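The sampling stage can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: `sampling_stage`, the `has_changed` callback (standing in for "download the page and compare it with the local copy"), and the page names are all hypothetical. The point it shows is that every sampled page is charged against the same download budget as the later download stage.

```python
import random
from typing import Callable, Dict, List, Tuple


def sampling_stage(
    sites: Dict[str, List[str]],
    k: int,
    budget: int,
    has_changed: Callable[[str], bool],
    rng: random.Random,
) -> Tuple[Dict[str, int], Dict[str, List[str]], int]:
    """Sample up to k pages per site; return the number of changed samples
    per site, the pages not yet downloaded, and the remaining budget."""
    changed: Dict[str, int] = {}
    unsampled: Dict[str, List[str]] = {}
    for site, pages in sites.items():
        n = min(k, len(pages), budget)      # never exceed the budget
        sample = rng.sample(pages, n)       # random sample without replacement
        budget -= n                         # each sample costs one download
        changed[site] = sum(1 for page in sample if has_changed(page))
        sampled = set(sample)
        unsampled[site] = [p for p in pages if p not in sampled]
    return changed, unsampled, budget


# Hypothetical setup: two sites of 20 pages, 5 samples each, 20 downloads total.
sites = {"A": [f"a{i}" for i in range(20)], "B": [f"b{i}" for i in range(20)]}
changed, unsampled, left = sampling_stage(
    sites, k=5, budget=20,
    has_changed=lambda page: page.startswith("a"),  # toy change oracle
    rng=random.Random(0),
)
print(left)  # 10 downloads remain for the download stage
```

The download stage then spends the remaining budget on the unsampled pages according to some allocation policy.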

11
How to Use Sampling Result?

Sites A and B, each with 20 pages
20 total downloads, 5 samples from each site
10 page downloads remaining

[Figure: site A with 4/5 of its samples changed, site B with 1/5]

12
Proportional Policy

Download pages proportionally to the detected changes
8 pages from A, 2 pages from B
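A minimal sketch of the proportional split, assuming A is the site with 4 of 5 changed samples and B the one with 1 of 5 (consistent with the 8/2 split on this slide). The function name and the largest-remainder rounding are assumptions made for illustration, and the sketch ignores how many unsampled pages each site actually has left.

```python
from typing import Dict


def proportional_allocation(changed_samples: Dict[str, int], remaining: int) -> Dict[str, int]:
    """Split the remaining downloads in proportion to the changed samples."""
    total = sum(changed_samples.values())
    if total == 0:
        # No changes detected anywhere: the samples give no guidance.
        return {site: 0 for site in changed_samples}
    exact = {s: remaining * c / total for s, c in changed_samples.items()}
    alloc = {s: int(x) for s, x in exact.items()}          # round down first
    leftover = remaining - sum(alloc.values())
    # Hand leftover downloads to the sites with the largest fractional parts.
    by_fraction = sorted(exact, key=lambda s: exact[s] - alloc[s], reverse=True)
    for site in by_fraction[:leftover]:
        alloc[site] += 1
    return alloc


print(proportional_allocation({"A": 4, "B": 1}, remaining=10))  # {'A': 8, 'B': 2}
```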

13
Greedy Policy

Download pages from the sites with most changes
10 pages from A
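A minimal sketch of the greedy choice: sort sites by their sampled change fraction and fill the remaining budget from the top. The function name and the `unsampled_pages` capacity argument are assumptions; the slide states only the rule itself.

```python
from typing import Dict


def greedy_allocation(
    changed_fraction: Dict[str, float],
    unsampled_pages: Dict[str, int],
    remaining: int,
) -> Dict[str, int]:
    """Give every remaining download to the site with the highest sampled
    change fraction, moving to the next site only when one is exhausted."""
    alloc = {site: 0 for site in changed_fraction}
    ranked = sorted(changed_fraction, key=lambda s: changed_fraction[s], reverse=True)
    for site in ranked:
        if remaining == 0:
            break
        take = min(remaining, unsampled_pages[site])
        alloc[site] = take
        remaining -= take
    return alloc


# Running example: 15 unsampled pages per site, 10 downloads left.
print(greedy_allocation({"A": 0.8, "B": 0.2}, {"A": 15, "B": 15}, 10))  # {'A': 10, 'B': 0}
```

With a larger budget the same function spills over to the next-best site, for example 15 pages from A and 5 from B when 20 downloads remain.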

14
Optimality of Greedy

Theorem
Greedy is optimal if we make download decisions purely based on sampling results
Optimality is probabilistic: it holds for the expected values
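The running two-site example illustrates the claim in expectation. This is a worked instance, not a proof, and it rests on an assumption the slides do not state explicitly: each unsampled page of a site has changed with probability equal to that site's sampled fraction (4/5 for A, 1/5 for B). With 10 downloads remaining:

```latex
\begin{align*}
E[\text{changed pages} \mid \text{greedy}]
  &= 10 \times \tfrac{4}{5} = 8 \\
E[\text{changed pages} \mid \text{proportional}]
  &= 8 \times \tfrac{4}{5} + 2 \times \tfrac{1}{5} = 6.8
\end{align*}
```

Under that assumption, every download moved from A to B lowers the expected number of detected changes by 4/5 - 1/5 = 3/5.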

15
Questions

Are we assuming correlation?
How to use sampling results?
Proportional vs Greedy
How many samples?
Dynamic sample size adjustment?
What if we have very limited resources?

16
How Many Samples?

Too few samples
Inaccurate change estimates
Too many samples
Waste of resources for sampling
How to determine optimal sample size?
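One way to quantify "inaccurate change estimates" is the textbook standard error of a sample proportion. The sketch below is not taken from the slides: it assumes independent samples and ignores the finite-population correction, and the function name is hypothetical.

```python
import math


def standard_error(p: float, k: int) -> float:
    """Standard error of a change-fraction estimate from k independent samples,
    where p is the true fraction of changed pages in the site."""
    if k <= 0:
        raise ValueError("k must be positive")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    return math.sqrt(p * (1.0 - p) / k)


# The error shrinks only with the square root of the sample size:
# quadrupling k halves the error, while the sampling cost quadruples.
for k in (5, 20, 80):
    print(k, round(standard_error(0.5, k), 3))
```

This is the tension the slide describes: each extra sample improves the estimate less than the previous one but costs a full page download.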

17
Optimal Sample Size

Factors to consider
Total number of pages that we maintain
Number of pages that we can download in the current cycle
Number of pages in each Web site
Change distribution
Scenario 1 -- A 90/100, B 10/100
Scenario 2 -- A 60/100, B 40/100

18
Change Fraction Distribution
[Figure: distribution of change fractions; y-axis: fraction of sites]

r_i: fraction of changed pages in site i
f(r): distribution of r values

19
Optimal Sample Size

N: no. of pages in a site
r: no. of pages to download / no. of pages we maintain
Analysis is complex
[expression missing from the transcript] is a good rule of thumb

20
Dynamic Sample Size?

Do we need the same sample size for every site?
Change fractions: A 0, B 0.45, C 0.55, D 1

21
Adaptive Sampling

If the estimated r is high/low enough, make an early decision
What does high enough mean?
Confidence interval above threshold
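The early-decision rule can be sketched as below. The slide says only that the confidence interval should lie above the threshold; the choice of the Wilson score interval, the 95% level (z = 1.96), the symmetric "skip" rule, and all names are assumptions made for illustration.

```python
import math
from typing import Iterable, Tuple


def wilson_interval(changed: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score confidence interval for a proportion (changed out of n)."""
    p = changed / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


def adaptive_decision(
    observations: Iterable[bool], threshold: float, max_samples: int, z: float = 1.96
) -> Tuple[str, int]:
    """Sample one page at a time and stop as soon as the interval clears the
    threshold on either side. Returns the decision and the samples used."""
    changed = 0
    n = 0
    for page_changed in observations:
        n += 1
        changed += 1 if page_changed else 0
        low, high = wilson_interval(changed, n, z)
        if low > threshold:
            return "download", n      # confidently above the threshold
        if high < threshold:
            return "skip", n          # confidently below the threshold
        if n >= max_samples:
            break
    return "undecided", n


print(adaptive_decision([True] * 20, threshold=0.5, max_samples=20))       # ('download', 4)
print(adaptive_decision([True, False] * 5, threshold=0.5, max_samples=10))  # ('undecided', 10)
```

Sites whose change fraction is far from the threshold are decided after a few samples, while sites near the threshold use the full sample budget.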

22
In the Paper

More details on
Optimal sample size
Adaptive policy
The cases where resources are too limited for sampling

23
Experiments

353,000 pages from 252 sites
Mostly popular sites
Yahoo, CNN, Microsoft, ...
1,400 pages from each site
Followed links in a breadth-first manner
Monthly change history for 6 months
5 download cycles
In experiments, 100,000 page downloads in each download cycle

24
Comparison of Policies

[Figure: ChangeRatio of each policy]

25
Optimal Sample Size

[Figure: ChangeRatio vs. sample size]

26
Comparison of Long-Term Performance

Problem: We have data for only 5 download cycles
Solution: Extrapolate the history

27
Frequency vs. Sampling

[Figure: ChangeRatio per download cycle for the Frequency and Greedy policies]

28
Related Work

Frequency-based policy
Coffman et al., Journal of Scheduling 1998
Cho et al., SIGMOD 2000
Edwards et al., WWW 2001
Source cooperation
Olston et al., SIGMOD 2002

29
Conclusion

Sampling-based policy
Great short-term performance
No change history required
Frequency-based policy
Potentially good long-term performance if the change frequency does not change
Greedy is easy to implement and shows high performance

30
Future Work

Combination of sampling-based and frequency-based policies
Switch to the frequency-based policy after a while
Good partitioning for sampling?
Site based? Directory based?
Content based?
Link-structure based?

31
Questions?
