SNFS: The design and implementation of a Social Network File System - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: SNFS: The design and implementation of a Social Network File System


1
SNFS: The design and implementation of a Social Network File System
  • Ch. Kaidos, A. Pasiopoulos, N. Ntarmos, P. Triantafillou
  • University of Patras

2
Shameless plug...
  • If interested, please check out:
  • eXO: Decentralized Autonomous Scalable Social Networking,
    5th Conference on Innovative Data Systems Research (CIDR 2011), 2011.

3
Social Networks
  • Our take:
  • Search for
    • People (friends, experts, ...)
    • Content (books, photos, videos, blogs, websites, ...)
  • Form entities (collections)
    • Friends-lists, content-libs
  • Search for entities
    • Using previously formed collections
  • SNFS currently provides the foundation for these

4
Tagging
  • Profiles: sets of tags describing entities.
  • Search for entities based on their profiles.
  • Ranked retrieval (top-k)

5
Current State
  • 5,000,000,000 photos
  • 3,000 photos/min (as of September 2010)
  • 2,000,000,000 videos served up each day
  • (May 2010)
  • 600,000,000 monthly active users (January 2011)
  • 15,000,000 books (October 2010)
  • 130,000,000 by the end of the decade

6
Current State
  • Need to access published content
  • 22,750,000,000 queries in search engines
  • 4,000,000,000 queries in YouTube
  • 351,000,000 queries in Facebook
  • 416,000,000 queries in MySpace
  • (U.S. market figures, December 2009)

7
Current State
How do I provide interesting objects to my users?
How do I find stuff I want?
8
Proposal
A content-aware file system for Social Network Systems
Useful to users...
... and service providers too!
9
Previous Work on File Indexing
  • 1991: Semantic File Systems, by Gifford
  • 1996: BeFS, by Giampaolo and Meurillon, part of BeOS
    (BeOS never had commercial success...)
  • 1998: Indexing Service on Windows NT, not needed at the time
    (a remnant of the Object File System from the Cairo project, which never materialized)
  • Typically
    • No ranked retrieval
    • No user input (tags)
    • No user relationships

10
Desktop Searches
  • 2004: Windows Desktop Search, widely popular
  • 2005...: Mac OS X's Spotlight, Google Desktop, Beagle, Strigi, Tracker...
  • Typically
    • No ranked retrieval (?)
    • No user relationships
    • No exploitation of relationships for searching

11
Problems
Power tools for power users... But for average users...
Boolean operators??? SQL-like queries???
12
Previous Work on Ranked Retrieval
  • 1968: SMART system, by Salton. Introduced weights in retrieval, instead of classical Boolean retrieval
  • 1975: Vectors and cosine similarity, by Salton
  • 1988: Other similarity functions tested and evaluated, by Salton and Buckley
  • 2003: Fagin proposes and compares several efficient algorithms for top-k retrieval
13
Design
14
Design: SNFS
  • Tags are extracted from each object and stemmed, and their
    frequencies are counted
  • Each object is associated with a unique ID in a tree
  • Weights are calculated for each (tag, document) pair
  • A tf-idf weighting scheme was chosen
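
The weighting step can be illustrated with a minimal sketch. The slides do not say which tf-idf variant SNFS uses, so the formula below (relative term frequency times log(N / document frequency)) is an assumption, as are the function name `tfidf_weights` and the sample tags. Stemming is assumed to have happened already.

```python
import math
from collections import Counter

def tfidf_weights(objects):
    """Return {object_id: {tag: weight}} for already-stemmed tag lists.

    Assumed variant: tf = count / number of tags in the object,
    idf = log(N / number of objects containing the tag).
    """
    n = len(objects)
    doc_freq = Counter()
    for tags in objects.values():
        doc_freq.update(set(tags))  # count each tag once per object

    weights = {}
    for oid, tags in objects.items():
        counts = Counter(tags)
        total = len(tags)
        weights[oid] = {
            tag: (c / total) * math.log(n / doc_freq[tag])
            for tag, c in counts.items()
        }
    return weights

# Hypothetical sample data: object id -> extracted tags.
objects = {
    1: ["beach", "sunset", "beach"],
    2: ["beach", "city"],
    3: ["city", "night"],
}
w = tfidf_weights(objects)
for oid in sorted(w):
    print(oid, w[oid])
```

A tag that appears in every object gets weight 0 under this variant, which is the usual tf-idf behaviour for terms that do not discriminate between objects.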
15
Design: SNFS
  • Term, weight, and object ID are stored in an inverted index
  • Each posting list of the index is a B-tree stored in secondary
    memory
  • The position of the root of each B-tree in the index is stored
    in a red-black tree
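
As a rough in-memory illustration of this layout, the sketch below builds an inverted index whose posting lists are sorted by decreasing weight. It is not the SNFS storage format: a plain dict stands in for the red-black tree that locates each posting list, and sorted Python lists stand in for the on-disk B-trees. The function name and sample weights are hypothetical.

```python
from collections import defaultdict

def build_inverted_index(weights):
    """weights: {object_id: {term: weight}} -> {term: [(weight, object_id), ...]}.

    Each posting list is sorted by decreasing weight (ties broken by
    object id), which is the input order the top-k algorithms expect.
    """
    index = defaultdict(list)
    for oid, term_weights in weights.items():
        for term, weight in term_weights.items():
            index[term].append((weight, oid))
    for postings in index.values():
        postings.sort(key=lambda p: (-p[0], p[1]))
    return dict(index)

# Hypothetical per-object term weights.
weights = {
    1: {"beach": 0.5, "sunset": 0.3},
    2: {"beach": 0.8},
    3: {"sunset": 0.9, "beach": 0.1},
}
index = build_inverted_index(weights)
print(index["beach"])   # [(0.8, 2), (0.5, 1), (0.1, 3)]
print(index["sunset"])  # [(0.9, 3), (0.3, 1)]
```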
16
Design: Search and retrieval
  • The query is split into terms, and the terms are stemmed
  • The score of each document is calculated using a threshold
    algorithm and a tf-idf function
17
Threshold Algorithms
Input: posting lists sorted by decreasing weight
NRA (No Random Access) algorithm
[Figure: three posting lists (t1, t2, t3) of (Doc ID, Score) entries over documents d1-d5 with scores s1-s9, read in parallel at depths 1, 2, and 3. A threshold is shown per depth, labelled s1s2s3, s4s5s6, and s7s8s9.]
The algorithm halts when no object ranked below the current top-k
can improve its score enough to exceed the threshold.
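
A minimal sketch of NRA in the style of Fagin et al., under some assumptions: scores are summed across terms, an object missing from a list contributes 0 for that term, and the function name `nra_top_k` and the sample lists are made up for illustration. It keeps, for every object seen so far, a lower bound (scores read) and an upper bound (scores read plus the last score read from each list where the object has not appeared yet).

```python
def nra_top_k(posting_lists, k):
    """posting_lists: lists of (doc_id, score), each sorted by decreasing score.

    Returns (top_k, depth): top_k is a list of (doc_id, lower_bound_score),
    depth is how many rows were read by sorted access before halting.
    """
    m = len(posting_lists)
    seen = {}            # doc_id -> {list index: score read so far}
    last = [0.0] * m     # last score read from each list
    max_depth = max((len(pl) for pl in posting_lists), default=0)

    def ranking():
        lower = {d: sum(s.values()) for d, s in seen.items()}
        upper = {d: lower[d] + sum(last[i] for i in range(m) if i not in s)
                 for d, s in seen.items()}
        order = sorted(seen, key=lambda d: (-lower[d], -upper[d]))
        return lower, upper, order

    for depth in range(max_depth):
        for i, pl in enumerate(posting_lists):
            if depth < len(pl):
                doc, score = pl[depth]
                seen.setdefault(doc, {})[i] = score
                last[i] = score
            else:
                last[i] = 0.0  # exhausted list: nothing more to gain here
        lower, upper, order = ranking()
        if len(order) >= k:
            kth = lower[order[k - 1]]
            unseen_bound = sum(last)  # best case for a never-seen object
            if unseen_bound <= kth and all(upper[d] <= kth for d in order[k:]):
                return [(d, lower[d]) for d in order[:k]], depth + 1

    lower, _, order = ranking()
    return [(d, lower[d]) for d in order[:k]], max_depth

# Hypothetical posting lists for two query terms.
t1 = [("d1", 1.0), ("d2", 0.75), ("d3", 0.25)]
t2 = [("d2", 1.0), ("d1", 0.5), ("d4", 0.25)]
result, depth = nra_top_k([t1, t2], k=1)
print(result, depth)  # [('d2', 1.75)] 2
```

Note that the scores NRA returns are lower bounds; they happen to be exact here because both of d2's entries were read before the algorithm halted. The per-round recomputation of all bounds is the "state keeping and computation" cost that NRA pays in exchange for avoiding random accesses.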
18
Threshold Algorithms
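Input: posting lists sorted by decreasing weight
TA (Threshold Algorithm, with random accesses)
[Figure: three posting lists (t1, t2, t3) of (Doc ID, Score) entries over documents d1-d5 with scores s1-s10, read in parallel at depths 1, 2, and 3, with random accesses to fetch an object's scores from the other lists. A threshold is shown per depth, labelled s1s2s3, s4s5s6, and s7s8s9.]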
The algorithm halts when the threshold drops to or below the score
of the last (k-th) object in the current top-k.
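
A minimal sketch of TA under the same assumptions as before: scores are summed across terms and a missing entry counts as 0. The per-list dictionaries stand in for random accesses into the index; in a disk-based index each such lookup could be a separate disk access. The function name `ta_top_k` and the sample lists are hypothetical.

```python
import heapq

def ta_top_k(posting_lists, k):
    """posting_lists: lists of (doc_id, score), each sorted by decreasing score.

    Returns (top_k, depth): top_k is a list of (doc_id, full_score),
    depth is how many rows were read by sorted access before halting.
    """
    # Stand-in for random access: doc_id -> score, one lookup table per list.
    lookup = [dict(pl) for pl in posting_lists]
    scores = {}  # doc_id -> full score, computed on first sighting
    max_depth = max((len(pl) for pl in posting_lists), default=0)

    for depth in range(max_depth):
        threshold = 0.0
        for pl in posting_lists:
            if depth < len(pl):
                doc, score = pl[depth]
                threshold += score  # sum of scores at the current depth
                if doc not in scores:
                    # Random accesses: fetch this doc's score from every list.
                    scores[doc] = sum(table.get(doc, 0.0) for table in lookup)
        top = heapq.nlargest(k, scores.items(), key=lambda kv: kv[1])
        # No unseen object can score above the threshold, so stop once
        # k objects already reach it.
        if len(top) == k and top[-1][1] >= threshold:
            return top, depth + 1

    return heapq.nlargest(k, scores.items(), key=lambda kv: kv[1]), max_depth

# Hypothetical posting lists for two query terms.
t1 = [("d1", 1.0), ("d2", 0.75), ("d3", 0.25)]
t2 = [("d2", 1.0), ("d1", 0.5), ("d4", 0.25)]
result, depth = ta_top_k([t1, t2], k=1)
print(result, depth)  # [('d2', 1.75)] 2
```

Compared with NRA, TA keeps very little state (one exact score per seen object) but pays for a random access into every other list each time a new object appears.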
19
Qualitative Comparison
[Table: NRA vs. TA, compared on disk accesses, system calls, and state keeping and computation]
We expect TA to perform many more slow disk accesses.
Can NRA's heavy state keeping and computation outweigh TA's disk accesses?
We implement both, on hard disk and on RAM disk, to find out...
20
Implementation with FUSE
21
Testing
  • 4 real-world test sets
  • Files containing tags from online objects
  • The index normally resides in secondary memory
  • A RAM disk is used to evaluate the effect of disk
    accesses

22
Results demanded vs. time (disk-based index)
[Plot: TA vs. NRA]
23
Results demanded vs. time (RAM-based index)
[Plot: TA vs. NRA]
24
Query terms vs. time (disk-based index)
[Plot: TA vs. NRA]
25
Query terms vs. time (RAM-based index)
[Plot: TA vs. NRA]
26
Beagle vs. NRA
[Plots: terms vs. time; results vs. time]
27
Conclusions
  • SNFS
  • Indexing, storage, and ranked retrieval of
    entities in a social network.
  • Study of the efficiency of the algorithms, using
    real-world data and various implementations.
  • Competitive performance (e.g., against Beagle).
  • Many ways of further expansion

28
Future Work
  • Expansion for distributed systems and clouds
  • Distributed file systems (HDFS)
  • Distributed data structures
  • Tagging, indexing, and searching for
    entity collections is straightforward, as our
    object implementation/abstraction already captures
    this.
  • Establishing entities that consist of relationships
    between entities, using advanced tagging, and
    searching for these