Proxy Servers - PowerPoint PPT Presentation

Provided by: carolyn97


1
Proxy Servers
2
What Is a Proxy Server?
  • Intermediary server between clients and the
    actual server
  • Proxy processes request
  • Proxy processes response
  • An intranet proxy may restrict all outbound/inbound
    requests to/from the intranet server

3
What Does a Proxy Server Do?
  • Between client and server
  • Receives the client request
  • Decides if request will go on to the server
  • May have a cache and may respond from the cache
  • Acts as the client with respect to the server
  • Uses one of its own IP addresses to get the page
    from the server
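The request path above (decide, answer from cache, or fetch as the client) can be sketched in Python. This is a minimal illustration, not a working proxy: `handle_request`, `fetch`, `blocked_hosts`, and the example hosts are hypothetical names, the network call is passed in as a function so the sketch runs offline, and a real proxy would also parse HTTP and manage sockets.

```python
from typing import Callable, Dict, Set
from urllib.parse import urlparse


def handle_request(
    url: str,
    blocked_hosts: Set[str],
    cache: Dict[str, str],
    fetch: Callable[[str], str],
) -> str:
    """Decide, serve from cache, or fetch on the client's behalf."""
    host = (urlparse(url).hostname or "").lower()

    # Decide whether the request goes on to the server at all.
    if host in blocked_hosts:
        return "403 Forbidden"

    # Respond from the cache when a copy is already held.
    if url in cache:
        return cache[url]

    # Otherwise act as the client: in a real proxy the origin server
    # sees the proxy's own IP address, not the client's.
    body = fetch(url)
    cache[url] = body
    return body


# Usage with a stand-in for the network (hypothetical hosts).
calls = []

def fake_fetch(url: str) -> str:
    calls.append(url)
    return f"<page for {url}>"

cache: Dict[str, str] = {}
blocked = {"bad.example"}
first = handle_request("http://example.com/a", blocked, cache, fake_fetch)
second = handle_request("http://example.com/a", blocked, cache, fake_fetch)
denied = handle_request("http://bad.example/x", blocked, cache, fake_fetch)
```

The second call is answered from the cache, so the server is contacted only once.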

4
Usual Uses for Proxies
  • Firewalls
  • Employee web use control (email, etc.)
  • Web content filtering (e.g., for children)
  • Black lists (sites not allowed)
  • White lists (sites allowed)
  • Keyword filtering of page content

5
User Perspective
  • The proxy is invisible to the client
  • Either the proxy's IP address is the one used, or the
    browser is configured to send requests there
  • Faster retrieval if the proxy caches pages
  • Can implement profiles or personalization

6
Main Proxy Functions
  • Caching
  • Firewall
  • Filtering
  • Logging

7
Web Cache Proxy
  • Our concern is not with browser cache!
  • Store frequently used pages at the proxy rather than
    asking the server to find or create them again
  • Why?
  • Reduce latency: pages come back faster from the proxy,
    so the server seems more responsive
  • Reduce traffic: less traffic reaches the actual server

8
Proxy Caches
  • Proxy cache serves hundreds/thousands of users
  • Often used by corporations and intranets
  • The most popular pages are fetched from the server only once
  • Good news
  • Proxy cache hit rates often reach 50%
  • Bad news
  • Stale content (stock quotes)

9
How Does a Web Cache Work?
  • A set of rules from either or both of:
  • Proxy admin
  • HTTP header

10
Don't-Cache Rules
  • HTTP header
  • Cache-Control: max-age=xxx, must-revalidate
  • Expires date
  • Last-Modified date
  • Pragma: no-cache (doesn't always work!)
  • Object is authenticated or secure
  • Fails proxy filter rules
  • URL
  • Meta data
  • MIME type
  • Contents
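A shared proxy cache could apply rules like these as a single yes/no check. The sketch below is one possible reading under stated assumptions: `may_cache` and its parameters are hypothetical, `no-store` and `private` are standard Cache-Control directives the slide does not list, and refusing to cache a response that carries no date information at all is a simplification of this sketch.

```python
from typing import Dict, Mapping


def parse_cache_control(value: str) -> Dict[str, str]:
    """Split 'max-age=600, must-revalidate' into {'max-age': '600', ...}."""
    directives: Dict[str, str] = {}
    for part in value.split(","):
        part = part.strip().lower()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.strip()] = arg.strip().strip('"')
    return directives


def may_cache(headers: Mapping[str, str],
              authenticated_or_secure: bool,
              passes_filter: bool) -> bool:
    """Return False if any don't-cache condition applies."""
    h = {k.lower(): v for k, v in headers.items()}
    cc = parse_cache_control(h.get("cache-control", ""))
    # Authenticated/secure objects and filtered content are never stored.
    if authenticated_or_secure or not passes_filter:
        return False
    # A shared proxy must not keep no-store or private responses.
    if "no-store" in cc or "private" in cc:
        return False
    # Pragma is unreliable, as the slide says; honoured here anyway.
    if "no-cache" in h.get("pragma", "").lower():
        return False
    if cc.get("max-age") == "0":
        return False
    # Assumption: without any date information, do not cache.
    return "max-age" in cc or "expires" in h or "last-modified" in h
```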

11
Getting From Cache
  • Use cache copy if it is fresh
  • Within date constraint
  • Used recently and modified date is not recent
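The freshness test can be sketched as follows, assuming all times are already parsed into UTC `datetime` values. `is_fresh` and its parameters are hypothetical names. The explicit `max-age` is checked before `Expires`, and the "modified date is not recent" case uses the Last-Modified heuristic described in the HTTP caching specification (RFC 9111), which suggests about 10% of the time since last modification as a typical fraction.

```python
from datetime import datetime, timedelta
from typing import Optional


def is_fresh(
    now: datetime,
    fetched_at: datetime,
    max_age: Optional[int] = None,       # seconds, from Cache-Control: max-age
    expires: Optional[datetime] = None,  # from the Expires header
    last_modified: Optional[datetime] = None,
    heuristic_fraction: float = 0.1,
) -> bool:
    """Can the cached copy be served without going back to the server?"""
    age = now - fetched_at
    # An explicit max-age takes precedence over Expires.
    if max_age is not None:
        return age < timedelta(seconds=max_age)
    if expires is not None:
        return now < expires
    # Heuristic: a page that had gone unmodified for a long time when it
    # was fetched is assumed to stay fresh for a fraction of that time.
    if last_modified is not None:
        return age < (fetched_at - last_modified) * heuristic_fraction
    # No date information: treat the copy as stale.
    return False
```

For example, a page last modified ten days before it was fetched stays fresh for about one day under the 10% heuristic.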

12
2. Firewalls
  • Proxies for security protection
  • More on this later

13
3. Filtering at the Proxy
  • URL lists (black and white lists)
  • Meta data
  • Content filters

14
Filtering
[Diagram: a Web doc filtered by URL lists, keywords, and label-based ratings]

15
The Problem: the Web
  • 1 billion documents (April 2000)
  • Average query is 2 words (e.g., Sara name)
  • Continual growth
  • Balance global indexing and access against
    unintentional access to inappropriate material

16
Filtering Application Types
  • Proxies
  • Black lists
  • White lists
  • Keyword profiles
  • Labels

17
Black and White Lists
  • Black list: URLs the proxy will not access
  • White list: URLs the proxy will allow access to

18
How Is Filtering/selection Done?
  • Build a profile of preferences
  • Match input against the profile using rules

19
Black and White Lists
  • Black list of URLs
  • No access allowed
  • White list of URLs
  • Access permitted
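A minimal sketch of list-based filtering at the proxy, matching on host names. `is_allowed` and its parameters are hypothetical; the design choice that a black-list hit always wins, and that supplying a white list switches to "only listed sites" mode, is an assumption rather than something the slides specify.

```python
from typing import Iterable, Optional
from urllib.parse import urlparse


def _matches(host: str, domains: Iterable[str]) -> bool:
    """True if host is a listed domain or a subdomain of one."""
    return any(host == d or host.endswith("." + d) for d in domains)


def is_allowed(url: str,
               black_list: Iterable[str],
               white_list: Optional[Iterable[str]] = None) -> bool:
    host = (urlparse(url).hostname or "").lower()
    # Black list: no access allowed.
    if _matches(host, black_list):
        return False
    # White list mode: access permitted only to listed sites.
    if white_list is not None:
        return _matches(host, white_list)
    return True
```

Matching on `"." + d` keeps `xyz.com` from also blocking an unrelated host such as `notxyz.com`.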

20
Lists in Action
  • 1 billion documents!
  • Who builds the lists
  • Who updates them
  • Frequency of updates

21
Labels
  • Metadata tags
  • Rule driven (PICS rules, for example)
  • Labels are part of document or separate
  • Separate label bureau

22
Labels
  • Metadata (goes with page)
  • Label Bureau (stored separately from page)

23
Meta Data as part of HTML doc
      <HTML>
      <HEAD>
      <META HTTP-EQUIV="keywords" CONTENT="federal">
      <META HTTP-EQUIV="keywords" CONTENT="tax">
      </HEAD>
      </HTML>
  • Browser and/or proxy interpret the metadata
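A proxy could read such embedded metadata with Python's standard `html.parser`. This is a sketch: `MetaKeywordParser` is a hypothetical name, and it accepts both the `HTTP-EQUIV="keywords"` form used on the slide and the more common `NAME="keywords"` form.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class MetaKeywordParser(HTMLParser):
    """Collect keyword values from <META ... keywords ...> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.keywords: List[str] = []

    def handle_starttag(self, tag: str,
                        attrs: List[Tuple[str, Optional[str]]]) -> None:
        # html.parser lower-cases tag and attribute names for us.
        if tag != "meta":
            return
        a = {k: (v or "") for k, v in attrs}
        if "keywords" in (a.get("http-equiv", "").lower(),
                          a.get("name", "").lower()):
            content = a.get("content", "")
            self.keywords.extend(w.strip() for w in content.split(",") if w.strip())


doc = ('<HTML><HEAD>'
       '<META HTTP-EQUIV="keywords" CONTENT="federal">'
       '<META HTTP-EQUIV="keywords" CONTENT="tax">'
       '</HEAD></HTML>')
parser = MetaKeywordParser()
parser.feed(doc)
```

After `feed`, `parser.keywords` holds the two keywords from the slide's example page.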

24
Metadata Apart From Doc
  • Label bureaus
  • Request for a doc is also a request for labels
    from one or more label bureaus
  • Who makes the labels
  • Text analysis
  • Community of users
  • Creator of document

25
Labels: Collaborative Filtering
[Diagram: a Web site with author labels, Label Bureaus A and B, a rating service, and a search engine]

26
PICS and PICS Rules
  • Tools for communities to use profiles and
    control/direct access
  • Structure designed by the World Wide Web Consortium (W3C)
  • Content designed by communities of users

27
PICS Rating Data
    (PICS-1.1 "http://www.abc.org/r1.5"
     by "John Doe"
     labels on "1998.11.05"
     until "2000.11.01"
     for "http://www.xyz.com/new.html"
     ratings (violence 2 blood 1 language 4)
    )
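To act on a label, the proxy first has to pull the rating values out of it. The sketch below is deliberately narrow: `parse_pics_ratings` is a hypothetical helper that only reads the `ratings (...)` clause and assumes each category has a single numeric value, so it is not a full PICS label parser (range values such as `3:5` would fail).

```python
import re
from typing import Dict


def parse_pics_ratings(label: str) -> Dict[str, float]:
    """Extract {category: value} from the 'ratings (...)' clause."""
    m = re.search(r"ratings\s*\(([^)]*)\)", label)
    if m is None:
        return {}
    tokens = m.group(1).split()
    # Tokens alternate: category, value, category, value, ...
    return {tokens[i]: float(tokens[i + 1])
            for i in range(0, len(tokens) - 1, 2)}


label = ('(PICS-1.1 "http://www.abc.org/r1.5" by "John Doe" '
         'labels on "1998.11.05" until "2000.11.01" '
         'for "http://www.xyz.com/new.html" '
         'ratings (violence 2 blood 1 language 4))')
ratings = parse_pics_ratings(label)
```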

28
Filtering Using a URL List
    (PicsRule-1.1
     (Policy (RejectByURL ("http://www.xyz.com/"))
      Policy (AcceptIf "otherwise")
     )
    )

29
Using the PICS Data
    (PicsRule-1.1
     (serviceinfo (
        "http://www.lablist.org/ratings/v1.html"
        shortname "PTA"
        bureauURL "http://www.lablist.org/ratings"
        UseEmbedded "N"
       )
      Policy (RejectIf "((PTA.violence > 3) or (PTA.language > 2))")
      Policy (AcceptIf "otherwise")
     )
    )
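The two Policy clauses can be mirrored in Python as an ordered list of (action, condition) pairs. This is a hand translation, not a PICSRules interpreter: `evaluate` and `PTA_RULE` are hypothetical names, the sketch assumes clauses are tried in order with the first matching one deciding, and treating a missing rating as 0 is an assumption of the sketch.

```python
from typing import Callable, List, Mapping, Tuple

Ratings = Mapping[str, float]
Clause = Tuple[str, Callable[[Ratings], bool]]


def evaluate(clauses: List[Clause], ratings: Ratings) -> str:
    """Return the action of the first clause whose condition holds."""
    for action, condition in clauses:
        if condition(ratings):
            return action
    # Assumption: accept when no clause applies.
    return "accept"


# Hand translation of the PTA rule; missing ratings count as 0 here.
PTA_RULE: List[Clause] = [
    # Policy (RejectIf "((PTA.violence > 3) or (PTA.language > 2))")
    ("reject", lambda r: r.get("violence", 0) > 3 or r.get("language", 0) > 2),
    # Policy (AcceptIf "otherwise")
    ("accept", lambda r: True),
]
```

For instance, ratings of `violence 2 blood 1 language 4` are rejected because the language rating exceeds 2.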

30
Example Medical PICS labels
  • Su: UMLS vocabulary word, 0-9999999
  • Aud: audience (1 = patient, 3 = para, 5 = GP, etc.)
  • Ty: information type (5 = scientist, 3 = patient,
    4 = prod)
  • C: country (1 = Can, 4 = Afghan, etc.)
  • Etc.
  • Ratings (su 0019186 aud 3:5 Ty 3 C 1)

31
User Profiles for Labels
  • Rules for interpreting ratings
  • Based on
  • User preferences
  • User access privileges
  • Who keeps these
  • Who updates these
  • How fine is the granularity

32
Labels and Digital Signatures
  • Labels can also carry digital signature
    and authority information

33
Example
    ("byKey" (("N" "aba21241241")
     ("E" "abcdefghijklmnop")))
    ("on" "1996.12.02T22:20-0000")
    ("SigCrypto" "aba1241241"))
    ("Signature" "http://www.w3.org/TR/1998/REC-DSig-label/DSS-1_0"
     ("ByName" "plipp@iaik.tu-graz.ac.at")
     ("on" "1996.12.02T22:20-0000")
     ("SigCrypto" (("R" "aba124124156")
      ("S" "casdfkl3r489")))
    ))

34
Proxy level (hidden)
35
Text analysis of Page content
  • Proxy examines text of page before showing it
  • Generally keyword based
  • Profile of black and/or white keywords

36
Profiles for Text analysis
  • Keywords (sometimes weighted)
  • Reflect the interests of a user or user group
  • May be used to eliminate pages
  • Show all but the matches
  • May be used to select pages
  • Show only the matches

37
Keyword matching algorithms
  • Extract keywords
  • Eliminate noise words with a stop list (1/3)
  • Stem (computer, compute, computation)
  • Match to profile
  • Evaluate value of match
  • Check against a threshold for match
  • Show or throw!
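The steps above can be sketched end to end. Assumptions to note: `crude_stem` is a hypothetical suffix stripper (not the Porter stemmer), the score is simply the sum of profile weights over matching keyword occurrences, and `mode` chooses between "select" (show only matches) and "eliminate" (show all but matches). The stop list is the 27 words from the next slide.

```python
import re
from typing import Dict, List

STOP_WORDS = {"the", "for", "of", "on", "and", "is", "to", "with", "in",
              "by", "a", "as", "be", "this", "will", "are", "from", "that",
              "or", "at", "been", "an", "was", "were", "have", "has", "it"}


def crude_stem(word: str) -> str:
    # Hypothetical suffix stripper: computer/compute/computation -> comput.
    for suffix in ("ation", "ing", "ers", "er", "es", "e", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 4:
            return word[: -len(suffix)]
    return word


def keywords(text: str) -> List[str]:
    """Extract words, drop stop words, then stem."""
    words = re.findall(r"[a-z]+", text.lower())
    return [crude_stem(w) for w in words if w not in STOP_WORDS]


def score(text: str, profile: Dict[str, float]) -> float:
    """Sum the profile weight of every keyword occurrence that matches."""
    stemmed = {crude_stem(t): w for t, w in profile.items()}
    return sum(stemmed.get(k, 0.0) for k in keywords(text))


def show_or_throw(text: str, profile: Dict[str, float],
                  threshold: float, mode: str = "select") -> bool:
    """True means show the page."""
    matched = score(text, profile) >= threshold
    return matched if mode == "select" else not matched


profile = {"computer": 1.0, "tax": 0.5}
page = "The computation of the tax was done by computer."
```

Here `page` scores 2.5 against `profile`: both "computation" and "computer" stem to the profile term for "computer", and "tax" matches once.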

38
Stop List (35)
  • the, for, of, on, and, is, to, with, in, by,
    a, as, be, this, will, are, from, that, or,
    at, been, an, was, were, have, has, it
  • (27 words)

39
Matching Profile to Page
  • Similarity?
  • How many profile terms occur in doc?
  • How often?
  • How many docs does term occur in?
  • How important is the term to the profile?

40
Cosine Similarity Measurement
  • Profile terms weighted PW in (0,1), proportional to importance
  • Document terms weighted TW in (0,1), based on
  • frequency in doc
  • frequency in whole set
  • Overall closeness of doc D to profile P:
    $$\cos(D, P) = \frac{\sum_{t \in P} TW_t \, PW_t}{\sqrt{\sum_{t \in P} TW_t^2 \cdot \sum_{t \in P} PW_t^2}}$$
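A sketch of the measurement in Python. `term_weights` uses one common smoothed tf-idf variant (frequency in the doc times log((1+n)/(1+df)) + 1) as an assumed choice of TW; these weights are not scaled into (0,1), which does not change the result because cosine similarity is unaffected by uniformly scaling either vector. Sums run over the profile terms only, as on the slide.

```python
import math
from collections import Counter
from typing import Dict, List


def term_weights(doc: List[str], collection: List[List[str]]) -> Dict[str, float]:
    """TW per term: frequency in the doc times a smoothed inverse
    document frequency over the whole set (one common tf-idf variant)."""
    n = len(collection)
    counts = Counter(doc)
    weights: Dict[str, float] = {}
    for term, count in counts.items():
        df = sum(1 for d in collection if term in d)
        idf = math.log((1 + n) / (1 + df)) + 1.0
        weights[term] = (count / len(doc)) * idf
    return weights


def cosine(tw: Dict[str, float], pw: Dict[str, float]) -> float:
    """Closeness of a document (TW) to a profile (PW) over profile terms."""
    terms = list(pw)
    num = sum(tw.get(t, 0.0) * pw[t] for t in terms)
    den = math.sqrt(sum(tw.get(t, 0.0) ** 2 for t in terms)
                    * sum(pw[t] ** 2 for t in terms))
    return num / den if den else 0.0


collection = [["tax", "federal"], ["tax"], ["weather"]]
tw = term_weights(collection[0], collection)
sim = cosine(tw, {"federal": 1.0, "tax": 0.5})
```

"federal" gets a higher TW than "tax" because it occurs in fewer documents of the set.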

41
What works well?
Nothing
42
What's the problem?
  • Site Labels
  • Who does them?
  • Are they authentic?
  • Has the source changed?
  • A billion docs?
  • Black and White lists
  • Ditto
  • Text analysis of page contents
  • Poor results