Analyzing Web Server Log Files - PowerPoint PPT Presentation

About This Presentation
Number of Views:53
Slides: 48
Provided by: denmilu

Transcript and Presenter's Notes

Title: Analyzing Web Server Log Files


1
Analyzing Web Server Log Files
  • Eric Landrieu
  • e-mail: eland_at_perfman.com
  • Lead Developer, PerfMan for Web Servers
  • The Information Systems Manager, Inc.

2
Growth of Web Server
  • Has become a vital part of the business model
  • Internet web servers must be reliable, as they
    are truly an international 24x7x365 sales
    mechanism
  • Content of the site(s) can be just as damaging in
    users' eyes as poor performance: we have a
    two-edged sword

3
So how do we monitor the web server?
  • OS-level tools
  • Performance Monitor (Windows NT)
  • SMF, RMF (OS/390)
  • Third-party offerings
  • Active web site monitors (give a client-side
    view of the site)
  • Database/Application monitoring
  • Web server log files

4
So how do we monitor the web server?
  • No one method can give you the whole picture of
    your web server's health and performance

5
What's in the Log Files?
  • View of client-server transactions: the client
    request, with the server response
  • Multiple transactions can be required for a web
    page

  Request:  GET /parking/space.asp
  Response: 404 File Not Found
6
What's in the Log Files?
  • Each transaction is totally separate in the log
    file
  • Any user-level data must be manually grouped
    using criteria available in the particular log
    file

7
So what is in these log files?
8
Information in the log files
  • Client IP - Usually the IP address, but can be
    resolved to a DNS host name by the web server (not
    recommended)
  • File requested by client (including directory)
  • Method used in request (GET, POST, etc.)

9
Information in the log files
  • Return Code - was it successful, and if not, why?
  • Bytes Sent back to the client in the response
  • Referring URL - where did the user find the link
    to this request?
  • Browser String - tells what browser is being used

10
Information in the log files
  • Username - anonymous or authenticated access
  • Cookie - the cookie relating to this
    transaction, if any
  • Bytes Received by the server in the request
  • Time Taken by the server to process the request

11
Standardized Log Formats
  • Common Log Format (CLF)
  • Extended Common Log Format
  • W3C Standard
  • Other formats may be product-specific, and many
    are extensions of the CLF or Extended CLF formats.

12
Common Log Format
  • Advantages
    • Supported by just about every web server ever
      written
  • Disadvantages
    • Inflexible
    • Contains very limited data: no Bytes Received,
      Time Taken, User Agent (Browser), or Referer
      fields available

13
Common Log Format
64.12.105.154 - - [16/Feb/2001:06:59:35 -0800] "GET /cgi-bin/Count.cgi?dfgecbhomeddB HTTP/1.0" 404 211
64.12.97.10 - - [16/Feb/2001:06:59:37 -0800] "GET /java/FixFontHeadline.class HTTP/1.0" 200 2898
64.12.97.9 - - [16/Feb/2001:06:59:43 -0800] "GET /graphics/trombone.gif HTTP/1.0" 200 1050
64.12.96.206 - - [16/Feb/2001:06:59:58 -0800] "GET /images/joinband.jpg HTTP/1.0" 200 13457
64.12.97.9 - - [16/Feb/2001:07:00:30 -0800] "GET /images/parade.jpg HTTP/1.0" 200 22754
128.93.11.53 - - [16/Feb/2001:10:20:53 -0800] "GET /schedule.shtml HTTP/1.0" 200 7103
128.93.11.53 - - [16/Feb/2001:10:26:48 -0800] "GET /index.shtml HTTP/1.0" 200 8650
128.93.11.53 - - [16/Feb/2001:10:21:18 -0800] "GET /about.shtml HTTP/1.0" 200 9151
128.93.11.53 - - [16/Feb/2001:10:26:25 -0800] "GET /communty.shtml HTTP/1.0" 200 5731
128.93.11.53 - - [16/Feb/2001:10:18:25 -0800] "GET /join.shtml HTTP/1.0" 200 5056
128.93.11.53 - - [16/Feb/2001:10:24:53 -0800] "GET /write.shtml HTTP/1.0" 200 9633
128.93.11.53 - - [16/Feb/2001:10:54:05 -0800] "GET /robots.txt HTTP/1.0" 404 204
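
A minimal sketch, in Python, of how one Common Log Format entry could be split into its fields. It assumes the standard CLF layout (timestamp in square brackets, request line in double quotes). The names CLF_PATTERN and parse_clf are illustrative and do not come from the presentation.

```python
import re
from datetime import datetime

# host ident user [timestamp] "request" status bytes
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
)

def parse_clf(line):
    """Return a dict of CLF fields, or None if the line does not match."""
    m = CLF_PATTERN.match(line)
    if m is None:
        return None
    rec = m.groupdict()
    rec["time"] = datetime.strptime(rec["time"], "%d/%b/%Y:%H:%M:%S %z")
    rec["status"] = int(rec["status"])
    # Some servers log "-" when no body was sent; treat that as zero bytes.
    rec["bytes"] = 0 if rec["bytes"] == "-" else int(rec["bytes"])
    # The request line is "METHOD path protocol"; skip the split if malformed.
    parts = rec["request"].split()
    if len(parts) == 3:
        rec["method"], rec["path"], rec["protocol"] = parts
    return rec

sample = ('64.12.97.9 - - [16/Feb/2001:06:59:43 -0800] '
          '"GET /graphics/trombone.gif HTTP/1.0" 200 1050')
record = parse_clf(sample)
print(record["host"], record["path"], record["status"], record["bytes"])
```

Returning None for lines that do not match lets the caller count and inspect malformed entries instead of failing on the first one.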
14
Extended Common Log Format
  • Adds User Agent (Browser) and Referrer to Common
    Log Format
  • Advantages
    • Most web servers support it
    • More information available than CLF
  • Disadvantages
    • Still no Time Taken or Bytes Received
    • Still inflexible

15
Extended Common Log Format
64.12.105.154 - - [16/Feb/2001:06:59:35 -0800] "GET /cgi-bin/Count.cgi?dfgecbhomeddB HTTP/1.0" 404 211 "http://www.mycommunityband.org/" "Mozilla/4.0 (compatible; MSIE 5.5; CS 2000; Windows 98)"
64.12.97.10 - - [16/Feb/2001:06:59:37 -0800] "GET /java/FixFontHeadline.class HTTP/1.0" 200 2898 "-" "Java 1.1"
64.12.97.9 - - [16/Feb/2001:06:59:43 -0800] "GET /graphics/trombone.gif HTTP/1.0" 200 1050 "http://www.mycommunityband.org/" "Mozilla/4.0 (compatible; MSIE 5.5; CS 2000; Windows 98)"
64.12.96.206 - - [16/Feb/2001:06:59:58 -0800] "GET /images/joinband.jpg HTTP/1.0" 200 13457 "http://www.mycommunityband.org/join.shtml" "Mozilla/4.0 (compatible; MSIE 5.5; CS 2000; Windows 98)"
64.12.97.9 - - [16/Feb/2001:07:00:30 -0800] "GET /images/parade.jpg HTTP/1.0" 200 22754 "http://www.mycommunityband.org/about.shtml" "Mozilla/4.0 (compatible; MSIE 5.5; CS 2000; Windows 98)"
128.93.11.53 - - [16/Feb/2001:10:20:53 -0800] "GET /schedule.shtml HTTP/1.0" 200 7103 "-" "xyro_(xcrawler_at_cosmos.inria.fr)"
128.93.11.53 - - [16/Feb/2001:10:26:48 -0800] "GET /index.shtml HTTP/1.0" 200 8650 "-" "xyro_(xcrawler_at_cosmos.inria.fr)"
128.93.11.53 - - [16/Feb/2001:10:21:18 -0800] "GET /about.shtml HTTP/1.0" 200 9151 "-" "xyro_(xcrawler_at_cosmos.inria.fr)"
128.93.11.53 - - [16/Feb/2001:10:26:25 -0800] "GET /communty.shtml HTTP/1.0" 200 5731 "-" "xyro_(xcrawler_at_cosmos.inria.fr)"
128.93.11.53 - - [16/Feb/2001:10:18:25 -0800] "GET /join.shtml HTTP/1.0" 200 5056 "-" "xyro_(xcrawler_at_cosmos.inria.fr)"
128.93.11.53 - - [16/Feb/2001:10:24:53 -0800] "GET /write.shtml HTTP/1.0" 200 9633 "-" "xyro_(xcrawler_at_cosmos.inria.fr)"
128.93.11.53 - - [16/Feb/2001:10:54:05 -0800] "GET /robots.txt HTTP/1.0" 404 204 - -
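
As a rough illustration, the two extra fields of the Extended Common Log Format can be picked up by adding two quoted groups to a CLF-style pattern. This is a sketch only. It assumes the referer and user agent are each wrapped in double quotes, and COMBINED_PATTERN and parse_combined are hypothetical names.

```python
import re

# CLF fields followed by "referer" "user agent"
COMBINED_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_combined(line):
    """Return the named fields as strings, or None if the line does not match."""
    m = COMBINED_PATTERN.match(line)
    return m.groupdict() if m else None

sample = ('64.12.96.206 - - [16/Feb/2001:06:59:58 -0800] '
          '"GET /images/joinband.jpg HTTP/1.0" 200 13457 '
          '"http://www.mycommunityband.org/join.shtml" '
          '"Mozilla/4.0 (compatible; MSIE 5.5; CS 2000; Windows 98)"')
rec = parse_combined(sample)
print(rec["referer"])
print(rec["agent"])
```

A plain CLF line without the two quoted fields does not match this pattern, which gives a simple way to tell the two formats apart.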
16
W3C Extended Log Format
  • http://www.w3.org/TR/WD-logfile
  • Advantages
    • Very flexible
    • Extensible
  • Disadvantages
    • Not as universally supported by web servers

17
W3C Extended Log Format
#Software: Microsoft Internet Information Services 5.0
#Version: 1.0
#Date: 2001-03-18 05:01:20
#Fields: date time c-ip cs-username s-ip cs-method cs-uri-stem cs-uri-query sc-status sc-bytes cs-bytes time-taken cs-version cs-host cs(User-Agent) cs(Cookie) cs(Referer)
2001-03-18 05:01:20 144.249.14.154 - 144.249.252.75 GET /Default.asp - 200 40606 253 16 HTTP/1.1 entry.corp.com Mozilla/4.0+(compatible;+MSIE+5.01;+Windows+95) SITESERVER=ID=547754cdab354b60fcd92cd09351121e -
2001-03-18 05:01:21 144.249.14.154 - 144.249.252.75 GET /corporate.css - 304 160 436 0 HTTP/1.1 entry.corp.com Mozilla/4.0+(compatible;+MSIE+5.01;+Windows+95) SITESERVER=ID=547754cdab354b60fcd92cd09351121e;+ASPSESSIONIDGGQQGZEC=KEJNEBECDJLKONONHOOBBINF http://entry.corp.com/
2001-03-18 05:01:21 144.249.14.154 - 144.249.252.75 GET /images/vDivider2.gif - 304 209 444 0 HTTP/1.1 entry.corp.com Mozilla/4.0+(compatible;+MSIE+5.01;+Windows+95) SITESERVER=ID=547754cdab354b60fcd92cd09351121e;+ASPSESSIONIDGGQQGZEC=KEJNEBECDJLKONONHOOBBINF http://entry.corp.com/
2001-03-18 05:01:21 144.249.14.154 - 144.249.252.75 GET /images/toc_quicklink.gif - 304 209 448 0 HTTP/1.1 entry.corp.com Mozilla/4.0+(compatible;+MSIE+5.01;+Windows+95) SITESERVER=ID=547754cdab354b60fcd92cd09351121e;+ASPSESSIONIDGGQQGZEC=KEJNEBECDJLKONONHOOBBINF http://entry.corp.com/
2001-03-18 05:01:21 144.249.14.154 - 144.249.252.75 GET /images/region_am.jpg - 304 209 444 0 HTTP/1.1 entry.corp.com Mozilla/4.0+(compatible;+MSIE+5.01;+Windows+95) SITESERVER=ID=547754cdab354b60fcd92cd09351121e;+ASPSESSIONIDGGQQGZEC=KEJNEBECDJLKONONHOOBBINF http://entry.corp.com/
2001-03-18 05:01:21 144.249.14.154 - 144.249.252.75 GET /images/orange_square_bullet.gif - 304 209 455 0 HTTP/1.1 entry.corp.com Mozilla/4.0+(compatible;+MSIE+5.01;+Windows+95) SITESERVER=ID=547754cdab354b60fcd92cd09351121e;+ASPSESSIONIDGGQQGZEC=KEJNEBECDJLKONONHOOBBINF http://entry.corp.com/
2001-03-18 05:01:22 144.249.14.154 - 144.249.252.75 GET /corpnews/images/org_pointer_2.gif - 304 209 456 0 HTTP/1.1 entry.corp.com Mozilla/4.0+(compatible;+MSIE+5.01;+Windows+95) SITESERVER=ID=547754cdab354b60fcd92cd09351121e;+ASPSESSIONIDGGQQGZEC=KEJNEBECDJLKONONHOOBBINF http://entry.corp.com/
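
Because a W3C Extended log names its own columns, a reader can be driven by the Fields directive rather than a fixed pattern. The sketch below assumes directives start with "#" and that fields are separated by whitespace. The function name parse_w3c and the shortened field list in the sample are illustrative assumptions.

```python
def parse_w3c(lines):
    """Yield one dict per record, keyed by the names in the #Fields directive."""
    fields = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Directives look like "#Fields: date time c-ip ..."
            name, _, value = line[1:].partition(":")
            if name.strip().lower() == "fields":
                fields = value.split()
            continue
        values = line.split()
        # Skip records whose column count does not match the current directive.
        if fields and len(values) == len(fields):
            yield dict(zip(fields, values))

log = [
    "#Version: 1.0",
    "#Fields: date time c-ip cs-method cs-uri-stem sc-status time-taken",
    "2001-03-18 05:01:20 144.249.14.154 GET /Default.asp 200 16",
    "2001-03-18 05:01:21 144.249.14.154 GET /corporate.css 304 0",
]
records = list(parse_w3c(log))
print(records[0]["cs-uri-stem"], records[0]["time-taken"])
```

Re-reading the field list each time a Fields directive appears means a file whose logged columns change partway through is still parsed with the right names.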
18
Which Format(s) Does My Web Server Support?
Server                                         Common Log Format  Extended CLF  W3C Extended Log Format
Apache                                         Default            Available     No
Microsoft IIS                                  Available          No            Default
IBM HTTP Server (WebSphere) (Based on Apache)  Default            Available     No
iPlanet Web Server                             Default            Available     No
WebSite Pro (O'Reilly)                         Available          No            Available
Lotus Domino                                   Default            Available     No
19
Which Format(s) Does My Web Server Support?
Server                     Common Log Format  Extended CLF  W3C Extended Log Format
AOLServer                  Default            Available     No
Zeus Web Server            Default            Available     No
Xitami                     Available          Default       No
I/Net Commerce Server/400  Default            No            No
WebStar (Mac)              Available          No            Available
Servertec Internet Server  Available          Available     Available
20
Limitations
  • -or-
  • Why we can't ignore other sources of information

21
Log File Limitations
  • Not enough information to get the whole picture
    of the site's performance and health
  • We need to correlate the log data with other
    sources:
    • OS-level statistics (Performance Monitor, SMF,
      3rd party)
    • Active web analysis (e.g. Keynote)
    • Data on databases or other components of the site

[Diagram: Client, Internet, Web Server, Back End DB]
25
Log File Limitations
  • Only when fit together with the other pieces do
    we get the complete picture of your total web
    site health.

[Diagram: Client, Internet, Web Server, Back End DB]
26
Log File Limitations
  • You may also have to deal with log file formats
    that don't include all of the information
    you would like.

[Diagram: Common Log Format with Bytes Received, Time Taken, Referrer, User Agent]
27
Issues With Log Files
  • User or Session level statistics
  • Caching
  • Clustering
  • What constitutes a site?

28
User or Session Level Statistics
  • The server doesn't give you statistics for the
    user (e.g. how long were they on the site?)
  • You have to mine these yourself from the data
    available
  • You will only be able to get approximations with
    this data, not exact figures

29
How do we group records for user-level statistics?
  • Client's IP address
    • Proxy servers and firewalls with Network Address
      Translation (NAT) will make all users behind
      the firewall look like one user
    • If the proxy or firewall has multiple IP
      addresses (or is an array), multiple accesses
      of the site from one user may look like multiple users

30
How do we group records for user-level statistics?
  • Cookies
    • If the site assigns cookies to track users
      through the site, you can group the records based
      upon the cookie
    • Users who disable cookies in their browser mess
      this up
    • Not all log file formats include the cookie

31
How do we group records for user-level statistics?
  • User name
    • Useful for intranet, but you must have the server
      disallow anonymous access
    • Impractical for most internet sites (except
      restricted access)
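
The three grouping criteria above (IP address, cookie, user name) can be combined into a single lookup. The sketch below is one possible arrangement, not the presenter's method. It assumes records are dicts with "cookie", "user" and "host" keys, that "-" marks an empty field, and that the cookie is tried first, then the user name, then the IP address. That preference order is an assumption.

```python
from collections import defaultdict

def visitor_key(rec):
    """Pick the first identifier that is present: cookie, user name, then IP."""
    for field in ("cookie", "user", "host"):
        value = rec.get(field)
        if value and value != "-":
            return (field, value)
    return ("unknown", "")

def group_by_visitor(records):
    """Group records under whichever identifier visitor_key selects."""
    groups = defaultdict(list)
    for rec in records:
        groups[visitor_key(rec)].append(rec)
    return dict(groups)

records = [
    {"host": "10.0.0.1", "user": "-", "cookie": "ID=abc", "path": "/a"},
    {"host": "10.0.0.2", "user": "-", "cookie": "ID=abc", "path": "/b"},
    {"host": "10.0.0.1", "user": "-", "cookie": "-", "path": "/c"},
]
groups = group_by_visitor(records)
for key, recs in groups.items():
    print(key, [r["path"] for r in recs])
```

The first two records share a cookie but arrive from different addresses, as they might through a proxy array, so they are grouped together. The third has no cookie and falls back to its IP address.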

32
Caching
  • Content from the web site may be cached outside
    of the web server
  • The web server may not get notification of
    requests for content that are serviced by these
    caches
  • The caches may be in Proxy Servers, Browsers, or
    elsewhere

33
Clustering
  • Each server in a web cluster may maintain its own
    log file
  • You have to combine the log files to get
    information relevant to the entire site
  • One user accessing your site may get data from
    multiple servers
  • You may still want information on each individual
    server, to verify that they are load-balancing
    properly
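
Combining per-server logs amounts to a merge on the timestamp. A minimal sketch, assuming each server's records are already in time order and carry a "time" value plus a "server" label. Both key names, and merge_logs itself, are made up for illustration.

```python
import heapq
from datetime import datetime

def merge_logs(*server_logs):
    """Merge per-server record lists (each in time order) into one list."""
    return list(heapq.merge(*server_logs, key=lambda rec: rec["time"]))

def at(clock):
    """Helper for the sample data: parse an HH:MM:SS string."""
    return datetime.strptime(clock, "%H:%M:%S")

web1 = [
    {"server": "web1", "time": at("05:01:20"), "path": "/Default.asp"},
    {"server": "web1", "time": at("05:01:22"), "path": "/images/a.gif"},
]
web2 = [
    {"server": "web2", "time": at("05:01:21"), "path": "/corporate.css"},
]
merged = merge_logs(web1, web2)
print([(r["server"], r["path"]) for r in merged])
```

Keeping the server label on every record means the merged stream can still be broken back down per server when checking load balancing.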

34
What constitutes a web site?
  • You have to decide exactly what you want to call
    a site:
    • A load-balanced cluster
    • A single site running on a dedicated server
    • A single site on a server running multiple sites
    • A directory within a site on a server
    • Multiple servers which act as your web presence
      (home, support, e-commerce)

35
What good is analyzing log files?
  • OS-level analysis can't:
    • Provide user (session)-level info
    • Break down by return code, file type or name,
      directory, etc.
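
The breakdowns listed above are simple tallies once the records are parsed. A sketch, assuming each record is a dict with a "path" and an integer "status". The function name breakdown and the sample records are illustrative.

```python
from collections import Counter
import posixpath

def breakdown(records):
    """Tally requests by return code, file extension, and directory."""
    by_status, by_type, by_dir = Counter(), Counter(), Counter()
    for rec in records:
        path = rec["path"].split("?", 1)[0]  # drop any query string
        by_status[rec["status"]] += 1
        ext = posixpath.splitext(path)[1].lower() or "(none)"
        by_type[ext] += 1
        by_dir[posixpath.dirname(path) or "/"] += 1
    return by_status, by_type, by_dir

records = [
    {"path": "/images/joinband.jpg", "status": 200},
    {"path": "/images/parade.jpg", "status": 200},
    {"path": "/schedule.shtml", "status": 200},
    {"path": "/robots.txt", "status": 404},
]
by_status, by_type, by_dir = breakdown(records)
print(by_status.most_common())
print(by_type.most_common())
print(by_dir.most_common())
```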

36
What good is analyzing log files?
  • Active monitoring
    • Gives the client-side perspective
    • May not distinguish between a slow link/router
      and a slow response from the server
    • Some products are concerned only with response to
      the testing system, not server load
    • If it is a browser-based product, it may have
      trouble with browser incompatibilities

37
So what's the key to analyzing log files?
  • Grouping your log file records into useful
    statistics that will help you understand what is
    going on with your site

38
Example: 404 Errors
  • When a user gets a 404 Error (File Not Found),
    they may perceive a lack of professionalism or
    quality with your site.
  • You want to know not only what non-existent files
    are being requested, but why they are being
    requested (outdated link?)
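
One way to answer both questions (what is missing, and why it is being requested) is to tally 404 responses by path and by referer. This is a sketch under the assumption that records carry "path", "status" and "referer". The name report_404 and the example.com URL are placeholders.

```python
from collections import Counter, defaultdict

def report_404(records):
    """Map each missing path to a Counter of the referers that led to it."""
    missing = defaultdict(Counter)
    for rec in records:
        if rec["status"] == 404:
            missing[rec["path"]][rec.get("referer", "-")] += 1
    return missing

records = [
    {"path": "/robots.txt", "status": 404, "referer": "-"},
    {"path": "/old/page.html", "status": 404,
     "referer": "http://example.com/links.html"},
    {"path": "/old/page.html", "status": 404,
     "referer": "http://example.com/links.html"},
    {"path": "/index.shtml", "status": 200, "referer": "-"},
]
missing = report_404(records)
# List the most frequently requested missing files first.
for path, referers in sorted(missing.items(),
                             key=lambda item: -sum(item[1].values())):
    print(path, sum(referers.values()), referers.most_common(1))
```

A referer on your own site points to a broken internal link, while an external referer suggests an outdated link elsewhere. A "-" referer often means a typed URL, a bookmark, or a crawler.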

39
Example: 404 Errors
40
Example: 404 Errors
41
Example: User Session Time
  • You want to get as useful an approximation as is
    possible for how long users are staying at your
    site (at least, marketing will)
  • Obviously, the longer they are browsing your
    site, the more interested they may be in what you
    have to offer
  • You can use their first and last requests for
    files to get a rough approximation
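
The first-and-last-request approximation might look like the sketch below. It adds one assumption that is not in the slides: a gap of more than 30 minutes between requests starts a new session. IDLE_TIMEOUT and session_lengths are hypothetical names.

```python
from datetime import datetime, timedelta

IDLE_TIMEOUT = timedelta(minutes=30)  # assumed cut-off, not from the slides

def session_lengths(times, idle=IDLE_TIMEOUT):
    """Split one visitor's request times into sessions; return each duration."""
    durations = []
    start = last = None
    for t in sorted(times):
        if start is None:
            start = last = t
        elif t - last > idle:
            # Gap too long: close the current session and start a new one.
            durations.append(last - start)
            start = last = t
        else:
            last = t
    if start is not None:
        durations.append(last - start)
    return durations

fmt = "%d/%b/%Y:%H:%M:%S"
times = [datetime.strptime(s, fmt) for s in (
    "16/Feb/2001:10:18:25", "16/Feb/2001:10:20:53",
    "16/Feb/2001:10:26:48", "16/Feb/2001:11:30:00",
)]
durations = session_lengths(times)
print([d.total_seconds() for d in durations])
```

A session with a single request has zero length here, because the log does not show how long the user spent on the last page. That is one reason these figures are only approximations.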

42
Example: User Session Time
  • Most sessions were very short (1-2 pages)
  • This was an Entry server cluster, which passed
    off to other sites
  • A few (<20% of total sessions) were very long

43
Example: Cluster Load-Balancing
  • Ideally, your clustered servers for the site
    would be sharing the load equally
  • If one server is carrying a larger load, it can
    lead to overall perceived slowdown of your site
    (most people going to a heavily loaded server
    while an idle server sits and does nothing)
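
A quick check of how evenly requests are spread is each server's share of the total. The sketch assumes every record has a "server" label (for example, added when the per-server logs are combined). The counts below are made-up sample data.

```python
from collections import Counter

def load_share(records):
    """Return each server's fraction of the total request count."""
    counts = Counter(rec["server"] for rec in records)
    total = sum(counts.values())
    return {server: n / total for server, n in counts.items()}

records = ([{"server": "web1"}] * 6
           + [{"server": "web2"}] * 3
           + [{"server": "web3"}] * 1)
shares = load_share(records)
busiest = max(shares, key=shares.get)
for server, share in sorted(shares.items()):
    print(f"{server}: {share:.0%}")
print("busiest:", busiest)
```

Request counts are only one measure of load. Where the log format has them, weighting by Bytes Sent or Time Taken may give a truer picture.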

44
Example: Cluster Load-Balancing
45
Example: Cluster Load-Balancing
46
So What Should I Take Out Of This?
  • -or-
  • Is there a point???

47
Summary
  • Web server log file analysis is an important part
    of monitoring your web servers
  • Log file analysis alone will not give you the
    complete picture of your web server, but you
    can't get the complete picture without it
  • Know what is useful in the log files, what
    limitations are inherent in them, and how to
    analyze them