Rigorous Performance Testing - Modern Testing Tools | Instart Logic - PowerPoint PPT Presentation



Grant Ellis gives detailed explanations of various performance metrics that are important for measuring web performance. He also covers important tools that can be used for testing web performance, along with their pros and cons.




This is my second blog post in a series of three.
If you haven't already read my prior
post, "Rigorous Performance Testing - How We Got
Here", then mosey on over for some extra context.
We all know that the Internet has gone through
some serious evolution over the past 25 years
(really! 1989!). Data centers and hosting
technologies have changed; media has changed
(copper to fiber!); switches and peering points
have changed; addressing and routing have
changed (IP Anycast); devices have changed;
content has changed. In the last five
years alone, we have seen a transition to rich,
interactive, and dynamic sites and applications.
Clients are accessing those applications on
handheld devices instead of computers.
Connectivity is largely wireless instead of
wired. These are great technologies, and our
lives are better for them, but these same
technologies do horrible things to web
performance. Similarly, measuring performance
has become quite complicated. While the web was
in its infancy, the simple, venerable ping was
sufficient. Then, as bandwidth
demands grew, we needed to use HTTP-aware testing
tools like cURL. With the adoption of the
commercial web, paradigms changed and it became
important to measure whole pages with tools like
Mercury LoadRunner (now HP). When CDNs started
helping the middle-mile with decentralized
infrastructure, the testing tools themselves
needed to decentralize in order to capture
performance data with the CDNs in-line. Gomez
(now Compuware) and Keynote stepped in with
browser-based testing agents distributed all over
the middle-mile (backbone) of the Internet.
Now, the web is filled with super-dynamic sites
and applications. All of these applications are
dynamic on the client-side as well as the
server-side. The browser mechanics of a modern
application are complicated in themselves, and so
testing methodologies have become more
sophisticated. One huge differentiator is which
performance metrics are tracked.
Prior testing tools would simply start a timer,
initiate the page load, and then stop the timer
once the underlying internet connection went
idle. This is the Fully Loaded metric. In the
Web 1.0 world, this was a sufficient test: the
browser needed all the content in order to
actually render the page and get the user
happily using it. On the modern Web 2.0, pages
don't need everything in order to be functional.
Secondary content and/or personalized content
may be loaded asynchronously (for example,
below-the-fold loading), but the page may be
fully functional beforehand. Tertiary backend
functions like analytics beacons have no bearing
on function from the user's perspective. With
these points in mind, internet connection
idleness is no reflection of user experience, and
Fully Loaded has become less relevant.
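To make the old stop-the-timer-when-idle approach concrete, here is a minimal sketch in Python. It is not how any particular tool implements the metric. The function name, the request tuples, and the 2000 ms idle window are all assumptions made up for illustration.

```python
from typing import List, Tuple


def fully_loaded_ms(requests: List[Tuple[float, float]],
                    idle_window_ms: float = 2000.0) -> float:
    """Time (ms since navigation start) at which the network first
    stays idle for idle_window_ms.

    requests: hypothetical (start_ms, end_ms) pairs, one per HTTP request.
    idle_window_ms: assumed idle threshold, not taken from any tool.
    """
    if not requests:
        return 0.0
    ordered = sorted(requests)
    busy_until = ordered[0][1]
    for start, end in ordered[1:]:
        # A long enough gap means the timer would already have stopped.
        if start - busy_until >= idle_window_ms:
            break
        busy_until = max(busy_until, end)
    return busy_until


# Hypothetical page: content is done at 1800 ms, then an analytics
# beacon fires at 3000 ms and a much later one at 6000 ms.
page = [(0, 400), (450, 1200), (1300, 1800), (3000, 3100), (6000, 6200)]
with_beacon = fully_loaded_ms(page)
without_beacon = fully_loaded_ms(page[:3])
```

In this made-up trace the first beacon pushes the result well past the point where the content finished, which is exactly why connection idleness says so little about what the user experienced.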
The Document Complete event is fired in the
browser when, well, when the document is
complete. Generally, this means that the page is
visually complete and responsive to the user (the
user can search, scroll, click links, etc.).
However, the browser may still be loading
asynchronous content or firing beacons (see Fully
Loaded above). This metric is imperfect as well:
some sites deliberately defer loading of
prominent content until after Document Complete.
Beware: some Front-End Optimization (FEO)
packages can defer execution of JavaScript until
after Document Complete. Script deferral can be hugely
misleading. Visual completeness may occur sooner,
and Document Complete may be significantly
improved as well. Testers will even see evidence
of the visual completeness in videos, filmstrips,
and screen shots. However, despite visual
completeness, the page may not be responsive
until long after Document Complete; users may
not be able to click links, scroll, or search.
From a user's perspective, this is hugely
frustrating and contributes to bounce rates.
Imagine if someone switched your browser window
for a screen shot, and you kept trying to click
links but nothing would happen!
Perhaps more importantly, this tactic improves
Document Complete, but only at the cost of making
the metric meaningless altogether! One of the
primary tenets of Document Complete is that the
page is ready for the user. With script deferral,
the page is not ready for the user even if it
looks ready.  
Visually Complete is the moment that all visual
elements are painted on the screen and visible
to the user. Note that a visually complete page
is not necessarily a functional one. See the
beware block on script deferral above!
The Start Render event is fired in the browser
when something (anything!) is first painted on
the screen. The paint event may be the whole page
but it could instead be a single word, single
image, or single pixel. That may not sound
significant; after all, if the content is not
there and the user can't interact, then what is
the value? Keep in mind that, before Start
Render fires, the user is staring at a blank
white browser screen or, worse, the prior page
from which they just tried to navigate away.
From the user's perspective, Start Render is the
moment that the web site is clearly working
properly. There is significant evidence that
abandonment (bounce rate) is correlated very
strongly with slow Start Render timings.
Arguably, Start Render is the most important
metric of all.
When the browser requests the base page, that
request must traverse the Internet (whether or
not a CDN is in play), then the hosting facility
must fetch (or assemble) the page, then the
response must traverse the Internet again back to
the device requesting the page. First Byte is the
time it takes for the first byte of the response
to reach the browser. So, First Byte is a
function of twice the network latency plus the
server latency. Other factors, like packet loss,
may also impact this metric. First Byte is
invisible to your users. However, the metric
is still important because it is on the critical
path for all browser functions.
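As a rough sketch of that arithmetic, the Python below models First Byte as two one-way trips plus server time, and also reads it out of a Navigation Timing style record. The function names and the sample numbers are made up for illustration; `navigationStart` and `responseStart` are the field names from the browser Navigation Timing API. The simple model deliberately ignores connection setup (DNS, TCP, TLS) and packet loss, so treat it as a lower bound.

```python
def estimated_first_byte_ms(one_way_latency_ms: float,
                            server_time_ms: float) -> float:
    """Simplified model: request out, server works, response back."""
    return 2 * one_way_latency_ms + server_time_ms


def measured_first_byte_ms(timing: dict) -> float:
    """First Byte from a Navigation Timing style record (epoch ms)."""
    return timing["responseStart"] - timing["navigationStart"]


# Hypothetical numbers: 40 ms each way, 120 ms to assemble the page.
model = estimated_first_byte_ms(40, 120)

# Hypothetical timing record, as a RUM beacon might report it.
sample = {"navigationStart": 1000, "responseStart": 1215}
measured = measured_first_byte_ms(sample)
```

When the measured value sits far above the model, the difference is usually worth chasing: it may be connection setup, retransmits, or a slower origin than expected.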
The Speed Index is a metric peculiar to
WebPageTest (more on that below). Loosely
speaking, the Speed Index is the average amount of
time for visual components to be painted on the
screen. More technically, if we plotted visual
completeness over time, then measured the area
above the curve, we would have the Speed Index.
That is, the Speed Index is the integral over
time of the fraction of the page not yet painted.
Pages with a faster Start Render and a faster
Visually Complete would have a greater percentage
of the screen painted at any time, so the area
above the curve would be less, and the Speed
Index would be less (lower is better).
WebPageTest has excellent technical documentation
on the Speed Index. Note again that a good Speed
Index is not the same as a functional page. See
the beware block on script deferral above!
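The area-above-the-curve idea is easy to sketch in Python. This is a simplified illustration, not WebPageTest's actual code: the function name and the frame format are assumptions, and it treats visual completeness as a step function that holds its value until the next captured frame.

```python
from typing import List, Tuple


def speed_index(frames: List[Tuple[float, float]]) -> float:
    """Area above the visual completeness curve, in milliseconds.

    frames: hypothetical (time_ms, completeness) samples sorted by time,
    with completeness between 0.0 (blank) and 1.0 (visually complete).
    The screen is assumed blank from time 0 until the first frame.
    """
    total = 0.0
    prev_time, prev_complete = 0.0, 0.0
    for time_ms, complete in frames:
        # Each interval contributes its length times the unpainted fraction.
        total += (time_ms - prev_time) * (1.0 - prev_complete)
        prev_time, prev_complete = time_ms, complete
    return total


# Two made-up pages. The fast one paints half the screen at 500 ms.
fast_page = [(500, 0.5), (1000, 1.0)]
slow_page = [(1500, 0.25), (3000, 1.0)]

fast_si = speed_index(fast_page)
slow_si = speed_index(slow_page)
```

The fast page scores lower on both counts: it starts painting sooner and it finishes sooner, so less area accumulates above its curve.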
  • Middle-mile (or backbone) testing tools are
    great for measuring availability from the broader
    Internet, but they never reflect the experience
    your users are actually seeing, especially those
    using wireless connectivity (even Wi-Fi!).
  • Real User Monitoring (RUM) tools are the best
    way to fill this gap. Basically, performance
    data is collected from your end users as they
    browse your site. RUM tools track all of the
    above metrics (except Speed Index) and
    represent exactly what your users are seeing
    (with one or two exceptions; see below). RUM
    tools are really easy to install: just paste
    in a JavaScript tag.
  • Pros
  • True user experience.
  • Easy set-up
  • Support for a broad range of browsers and
  • Collects data from various real-world connection
    types including high-latency wireless and
    packet-loss scenarios.
  • Open source tools are available (Boomerang.js).

  • Cons
  • Inserting a third-party tag hurts performance to
    a degree. The act of measuring performance with
    RUM also hurts performance.
  • Safari doesn't support the browser APIs on which
    RUM tools are dependent. Data for Safari browsers
    will be a subset of the metrics above, and
    remaining metrics are approximated using
    JavaScript timers rather than using
    hyper-accurate native browser code.
  • Outliers can be extreme and must be removed
    before interpreting aggregate data.
  • RUM requires live traffic. It is not possible to
    use RUM to measure performance of a site
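On the outlier point in the cons above, here is one common way to trim RUM samples before aggregating, sketched in Python. The interquartile-range fence and the 1.5 multiplier are a widely used convention and an assumption here, not a method this post prescribes. The function name and the sample timings are made up.

```python
from statistics import median, quantiles
from typing import List


def drop_outliers(samples: List[float], k: float = 1.5) -> List[float]:
    """Keep samples within k * IQR of the first and third quartiles."""
    q1, _, q3 = quantiles(samples, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [s for s in samples if low <= s <= high]


# Hypothetical Document Complete timings (ms) from RUM beacons. One user
# apparently left the tab in the background on a terrible connection.
doc_complete = [1200, 1350, 1280, 1420, 1310, 1250, 1390, 45000]

kept = drop_outliers(doc_complete)
typical = median(kept)
```

Reporting a median of the trimmed set, rather than a mean of the raw set, keeps a single pathological session from dominating the number you show your team.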

RUM tools are excellent for measuring
performance, but sometimes we really need
synthetic measurements, especially for
evaluating performance of pre-production
environments (code/stack).
WebPageTest is an open-source, community-supported
and widely endorsed tool for measuring and
analyzing performance. The testing nodes are
community-sponsored and freely available;
however, it is possible to set up private testing
nodes for your own dedicated use. Scripting
capabilities are vastly improved on private nodes.
  • Pros
  • Measures user experience metrics, albeit from
    backbone locations.
  • Supports traffic shaping, so testers can
    configure specific bandwidth, latency, or
    packet-loss scenarios. The traffic shaping is, of
    course, synthetic and thus less variable than
    true user connections, but this is still an
    excellent feature and quite representative of
    real-world conditions.
  • Supports a subset of mobile clients, and a wide
    array of browsers.
  • Cons
  • Limited testing agent geographies available.
  • Great analysis overall, but very limited
    statistical support.
  • Extremely difficult to monitor performance on an
    ongoing basis or on regular intervals for a fixed
    period. Testers must set up private instances and
    WebPageTest Monitor in order to monitor
  • Nodes are not centrally managed and therefore
    have inconsistent base bandwidth and hardware
    spec. Furthermore, they can sometimes be unstable
    or unavailable.
  • Supports multi-step transactions only on private

Catchpoint is a commercial synthetic testing
package. Catchpoint has a massive collection of
domestic and international testing nodes
available, and a powerful statistical analysis
  • Pros
  • Tracks user experience metrics.
  • Supports ongoing performance monitoring.
  • Easy to provision complicated tests.
  • Supports multi-step transactions.
  • Captures waterfall diagrams for detailed
  • Supports true mobile connection testing. The
    agents themselves are desktop machines, but they
    operate on wireless (EDGE/3G/4G/LTE) modems.
  • Excellent statistical package.
  • Cons
  • No traffic shaping available. All backbone tests
    have very high bandwidth and very low latency, so
    results are not necessarily representative of
    end-user performance.
  • No support for mobile devices (note that mobile
    connections are supported).

Keynote is also a commercial synthetic testing
package. Keynote has existed for a LONG time, and
formerly measured only the Fully Loaded metric.
However, they have recently revised their service
to measure user experience metrics like Document
Complete and Start Render.
  • Pros
  • Tracks user experience metrics.
  • Supports ongoing performance monitoring.
  • Easy to provision complicated tests.
  • Supports multi-step transactions.
  • Captures waterfall diagrams for detailed
  • Cons
  • No traffic shaping available. All backbone tests
    have very high bandwidth and very low latency, so
    results are not necessarily representative of
    end-user performance.
  • No support for mobile devices.

So, you've picked your performance metrics and
your tool, and now you have plenty of data. What
are the next steps? In the final installment of
this series, we will discuss statistical analysis
and interpretation of performance data sets.