Scalability and Performance Best Practices - PowerPoint PPT Presentation

Provided by: georgesch

1
Scalability and Performance Best Practices
  • Laura Thomson
  • OmniTI
  • laura_at_omniti.com

2
Standing in for George
3
Scalability vs. Performance
  • Scalability: the ability to gracefully handle
    additional traffic while maintaining service
    quality.
  • Performance: the ability to execute a single task
    quickly.
  • Often linked, not the same.

4
Why are Scalability and Performance Important?
  • No hope of growth otherwise.
  • Scalability means you can handle service
    commitments of the future.
  • Performance means you can handle the service
    commitments of today.
  • Together, they make growth cost-efficient.

5
Why PHP?
  • PHP is a completely runtime language
  • Compiled, statically typed languages are faster.
  • BUT
  • Scalability is (almost) never a factor of the
    language you use
  • Most bottlenecks are not in user code
  • PHP's heavy lifting is done in C
  • PHP is fast to learn
  • PHP is fast to write
  • PHP is easy to extend

6
When to Start
  • "Premature optimization is the root of all evil."
    (Donald Knuth)
  • Without direction and goals, your code will only
    get more convoluted, with little hope of actual
    improvement in scalability or speed.
  • Design for refactoring, so that when you need to
    make changes, you can.

7
Knowing When to Stop
  • Optimizations get exponentially more expensive as
    they are accrued.
  • Strike a balance between performance, scalability
    and features.
  • Unless you ship, all the speed in the world is
    meaningless.

8
No fast = true
  • Optimization takes effort.
  • Some optimizations are easier than others, but
    there is no silver bullet.
  • Be prepared to get your hands dirty.

9
General Best Practices
  • Profile early, profile often.
  • Dev-ops cooperation is essential.
  • Test on production data.
  • Track and trend.
  • Assumptions will burn you.

10
Scalability Best Practices
  • Decouple.
  • Cache.
  • Federate.
  • Replicate.
  • Avoid straining hard-to-scale resources.

11
Performance Best Practices
  • Use a compiler cache.
  • Be mindful of using external data sources.
  • Avoid recursive or heavy looping code.
  • Don't try to outsmart PHP.
  • Build with caching in mind.

12
1. Profiling
  • Pick a profiling tool and learn it inside and out:
    • APD, Xdebug, Zend Platform
  • Learn your system profiling tools:
    • strace, dtrace, ltrace
  • Effective debugging profiling is about spotting
    deviations from the norm.
  • Effective habitual profiling is about making the
    norm better.
  • Practice, practice, practice.
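Spotting deviations from the norm can be partly automated by comparing fresh timings against a historical baseline. A minimal sketch in Python (chosen for brevity; the idea carries over to PHP profiler output), assuming you already collect per-request timings for the same operation. The function name and the 3-MAD threshold are illustrative choices, not features of any tool listed above.

```python
import statistics


def find_deviations(baseline, current, k=3.0):
    """Return timings in `current` more than k MADs above the baseline median.

    baseline, current: lists of timings (e.g. milliseconds) for one operation.
    The median absolute deviation (MAD) is robust to occasional outliers
    in the baseline itself.
    """
    median = statistics.median(baseline)
    mad = statistics.median(abs(t - median) for t in baseline)
    # Guard against a perfectly flat baseline (MAD == 0).
    threshold = median + k * max(mad, 1e-9)
    return [t for t in current if t > threshold]


if __name__ == "__main__":
    # Hypothetical page render times in milliseconds.
    last_week = [102, 98, 110, 95, 101, 99, 104, 97, 300, 100]
    today = [103, 99, 250, 101, 180]
    print(find_deviations(last_week, today))
```

Using the median and MAD rather than mean and standard deviation keeps one bad day in the baseline from hiding the next one.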

13
2. Dev-Ops Cooperation
  • The most critical difference in organizations
    that handle crises well.
  • Production problems are time-critical and usually
    hard to diagnose.
  • Build team unity before emergencies happen.
  • Operations staff should provide feedback on
    behavior changes when code is pushed live.
  • Development staff must heed warnings from
    operations staff.
  • Established code launch windows, developer
    escalation procedures, and fallback plans are
    very helpful.

14
3. Test on Production(-ish) Data
  • Code behavior (especially performance) is often
    data driven.
  • Using data that looks like production data will
    minimize surprises.
  • Having a QA environment that simulates production
    load on all components will highlight problems
    before they occur.

15
4. Track and Trend
  • Understanding your historical performance
    characteristics is essential for spotting
    emerging problems.
  • Access logs (with hi-res timings)
  • System metrics
  • Application and query profiling data

16
Access log timings
  • Apache 2 natively supports hi-res timings
  • For Apache 1.3 you'll need to patch it (timings
    in whole seconds are not very useful)
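With Apache 2, the `%D` format directive of mod_log_config logs the time taken to serve each request, in microseconds. The Python sketch below assumes a hypothetical log format that appends `%D` to the Common Log Format, and reduces a log to per-URL request counts, medians, and 95th percentiles that can be tracked and trended over time.

```python
import math
import re
import statistics
from collections import defaultdict

# Assumed LogFormat (Common Log Format with %D appended):
#   LogFormat "%h %l %u %t \"%r\" %>s %b %D" timed
# so the last field of every line is the service time in microseconds.
LINE_RE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ (?P<usec>\d+)$'
)


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = max(1, math.ceil(pct * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def summarize(lines):
    """Return {path: (requests, median_ms, p95_ms)} for lines in the assumed format."""
    timings = defaultdict(list)
    for line in lines:
        match = LINE_RE.search(line.rstrip("\n"))
        if match is None:
            continue  # skip lines that don't match the assumed format
        timings[match.group("path")].append(int(match.group("usec")) / 1000.0)
    summary = {}
    for path, values in timings.items():
        values.sort()
        summary[path] = (len(values), statistics.median(values), percentile(values, 95))
    return summary
```

Storing one such summary per hour or per day gives the historical baseline needed to notice a page slowly getting slower.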

17
5. When you assume
  • Systems are complex and often break in unexpected
    ways.
  • If you knew how your system was broken, you
    probably would have designed it better in the
    first place.
  • Confirming your suspicions is almost always
    cheaper than acting on them.
  • Time is your most precious commodity.

18
6. Decouple
  • Isolate performance failures.
  • Put refactoring time only where needed.
  • Put hardware only where needed.
  • Drawback: it impairs your ability to efficiently
    join two decoupled application data sets.

19
Example: Static versus dynamic content
  • Apache + PHP is fast for dynamic content.
  • It is a waste of resources to serve static content
    (images, CSS, JavaScript) from there.
  • Move static content to a separate, faster solution
    for static content, e.g. lighttpd on a separate
    box, or a geographically distributed CDN.

20
Example: Session data
  • Using the default session store (local files)
    limits scale-out.
  • Decouple session data by putting it elsewhere:
    • In a database
    • In a distributed cache
    • In cookies

21
7. Cache
  • Caching is the core of most optimizations.
  • The fundamental question is: how dynamic does
    this bit have to be?
  • Many levels of caching:
    • Algorithmic
    • Data
    • Page/Component
  • Good technologies out there:
    • APC (local data)
    • Memcache (distributed data)
    • Squid (distributed page/component/data)
    • Bespoke
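Data caching with APC or memcache usually follows the cache-aside pattern: look in the cache, and on a miss compute the value and store it with an expiry time. The Python class below is a hypothetical in-process stand-in used only to show that flow; it is not the APC or memcache API. The clock is injectable so that expiry can be tested.

```python
import time

_MISS = object()  # sentinel so that None can be cached like any other value


class TTLCache:
    """In-process stand-in for a data cache such as APC or memcache (not their API)."""

    def __init__(self, clock=time.monotonic):
        self._store = {}  # key -> (expires_at, value)
        self._clock = clock

    def get(self, key, default=None):
        entry = self._store.get(key)
        if entry is None:
            return default
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._store[key]  # expired: evict lazily on read
            return default
        return value

    def set(self, key, value, ttl):
        self._store[key] = (self._clock() + ttl, value)

    def get_or_compute(self, key, compute, ttl):
        """Cache-aside: return the cached value, or compute it, store it, and return it."""
        value = self.get(key, _MISS)
        if value is _MISS:
            value = compute()
            self.set(key, value, ttl)
        return value
```

The TTL is where the "how dynamic does this bit have to be?" question turns into a number.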

22
Caching examples
  • Compiler cache (APC or Zend)
  • MySQL query cache (tune and use where possible)
  • Cache generated pages or iframes (disk or
    memcache)
  • Cache calculated data, datasets, page fragments
    (memcache)
  • Cache static content (squid)

23
8. Federate
  • Data federation is taking a single data set and
    spreading it across multiple database/application
    servers.
  • Great technique for scaling data.
  • Does not inherently promote data reliability.
  • Reduces your ability to join within the data set.
  • Increases overall internal connection
    establishment rate.
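At its simplest, federation needs a stable rule that maps each record's key to exactly one server. A minimal Python sketch, assuming hypothetical host names and a user id as the federation key; it also shows how work that used to be one query becomes one query per server, merged in application code.

```python
import zlib

# Hypothetical database servers, each holding a disjoint slice of the user data.
SHARDS = ["db1.example.com", "db2.example.com", "db3.example.com", "db4.example.com"]


def shard_for(user_id, shards=SHARDS):
    """Map a federation key (here a user id) to exactly one server, stably."""
    # crc32 is deterministic across processes, unlike Python's built-in hash() for str.
    return shards[zlib.crc32(str(user_id).encode("ascii")) % len(shards)]


def group_by_shard(user_ids, shards=SHARDS):
    """Group ids by server so each server is queried once.

    Any 'join' across groups now has to happen in application code,
    which is the loss of join ability noted above.
    """
    groups = {}
    for uid in user_ids:
        groups.setdefault(shard_for(uid, shards), []).append(uid)
    return groups
```

With this modulo rule, adding a server remaps most keys, so real deployments often use a lookup directory or consistent hashing instead.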

24
9. Replicate
  • Replication is making synchronized copies of data
    available in more than one place.
  • Useful scaling technique, very popular in
    modern PHP architectures.
  • Mostly usable for read-only data.
  • High write rates can make it difficult to keep
    slaves in sync.
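Replication is typically used for scaling by sending writes to the master and reads to the slaves. A simplified Python sketch with hypothetical host names: classifying statements by their first keyword is a deliberate simplification (it misses cases such as `SELECT ... FOR UPDATE`), and because slaves can lag, a read issued right after a write may not see that write.

```python
import random


class ReplicatedPool:
    """Route writes to the master and spread reads across slaves.

    Host names are hypothetical; real code would hand back connections.
    """

    WRITE_VERBS = {"INSERT", "UPDATE", "DELETE", "REPLACE", "CREATE", "ALTER", "DROP"}

    def __init__(self, master, slaves, rng=None):
        self.master = master
        self.slaves = list(slaves)
        self.rng = rng or random.Random()

    def host_for(self, sql, in_transaction=False):
        words = sql.split(None, 1)
        verb = words[0].upper() if words else ""
        # Writes, and every statement inside a transaction, go to the master.
        if in_transaction or verb in self.WRITE_VERBS or not self.slaves:
            return self.master
        # Plain reads can go to any slave (they may lag slightly behind).
        return self.rng.choice(self.slaves)
```

Falling back to the master when no slaves are configured lets the same code run on a single-server development setup.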

25
Problems
  • On the slave, you should see two threads running:
    an I/O thread, which reads data from the master,
    and an SQL thread, which updates the replicated
    tables.
  • (You can see these with SHOW PROCESSLIST.)
  • Since updates on the master occur in multiple
    threads, but on the slave in a single thread,
    the updates on the slave take longer.
  • Slaves have to use a single SQL thread to make
    sure queries are executed in the same order as on
    the master.

26
  • The more writes you do, the more likely the
    slaves are to get behind, and the further behind
    they will get.
  • At a certain point the only solution is to stop
    the slave and re-image from the master.
  • Or use a different solution: multi-master,
    federation, split architectures between
    replication and federation, etc.
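The mechanism on these slides reduces to simple arithmetic: if the master commits W writes per second and the single slave SQL thread can apply at most A per second, the slave's queue grows by W - A for every second that W exceeds A. The Python model below makes that concrete with hypothetical rates. It ignores the I/O thread and the network, and its "lag" is the time needed to drain the queue, which is related to, but not the same as, MySQL's `Seconds_Behind_Master`.

```python
def replication_backlog(write_rates, apply_rate):
    """Simulate the slave's queue of unapplied writes, one step per second.

    write_rates: writes/second arriving from the master in each second.
    apply_rate:  writes/second the single slave SQL thread can execute.
    Returns (backlog_after_each_second, lag_seconds_after_each_second).
    """
    backlog = 0.0
    backlogs, lags = [], []
    for incoming in write_rates:
        # The queue grows when writes arrive faster than they can be applied,
        # and drains (never below zero) when they arrive more slowly.
        backlog = max(0.0, backlog + incoming - apply_rate)
        backlogs.append(backlog)
        # Time the slave would need to drain the queue at full speed.
        lags.append(backlog / apply_rate)
    return backlogs, lags
```

For example, `replication_backlog([1100] * 3600, 1000)` ends with a backlog of 360,000 writes: one hour at 10% over the slave's capacity leaves it six minutes behind, and it only recovers once the write rate drops below what the slave can apply.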

27
Other uses of replication
  • Remember that replication has uses other than
    scale-out:
    • Failover
    • Backups

28
10. Avoid Straining Hard-to-Scale Resources
  • Some resources are inherently hard to scale:
    • Uncacheable data
    • Data with a very high read/write rate
    • Non-federatable data
    • Data in a black box
  • Be aware of these limitations and be extra
    careful with these resources.
  • Try to poke holes in the assumptions about why
    the data is hard to manage.

29
11. Compiler Cache
  • PHP natively reparses a script and its includes
    whenever it executes it.
  • This is wasteful and a huge overhead.
  • A compiler cache sits inside the engine and
    caches the parsed optrees.
  • The closest thing to fast = true
  • In PHP5 the real alternatives are APC and Zend
    Platform.
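PHP's compiler caches work inside the engine, but the underlying idea (parse once, then reuse the result until the file changes) can be sketched with Python's built-in `compile()`, which also turns source into bytecode. The class name is hypothetical; checking the file's modification time and size on each request is similar in spirit to APC's `apc.stat` behaviour.

```python
import os
import tempfile


class CompileCache:
    """Keep compiled code per file; re-parse only when the file changes."""

    def __init__(self):
        self._cache = {}       # path -> ((mtime_ns, size), code object)
        self.compilations = 0  # how many times the expensive parse actually ran

    def get_code(self, path):
        st = os.stat(path)
        stamp = (st.st_mtime_ns, st.st_size)
        entry = self._cache.get(path)
        if entry is not None and entry[0] == stamp:
            return entry[1]  # cache hit: skip reading and parsing entirely
        with open(path, encoding="utf-8") as f:
            code = compile(f.read(), path, "exec")
        self.compilations += 1
        self._cache[path] = (stamp, code)
        return code

    def run(self, path):
        namespace = {}
        exec(self.get_code(path), namespace)
        return namespace


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        script = os.path.join(tmp, "page.py")
        with open(script, "w", encoding="utf-8") as f:
            f.write("result = 6 * 7\n")
        cache = CompileCache()
        for _ in range(3):
            cache.run(script)
        print(cache.compilations)
```

Three executions of an unchanged file cost one parse; without the cache, each request would repeat it.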

30
12. Xenodataphobia
  • External data (RDBMS, App Server, 3rd Party data
    feeds) are the number one cause of application
    bottlenecks.
  • Minimize and optimize your queries.
  • 3rd Party data feeds/transfers are unmanageable.
    Do what you can to take them out of the critical
    path.

31
Managing external data and services
  • Cache it (but beware of the acceptable use
    policies, or AUPs, of third-party APIs)
  • Load it dynamically (iframes/XMLHttpRequest)
  • Batch writes
  • Ask how critical the data is to your app.
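Batching turns many small writes to an external service into a few larger round trips. A minimal Python sketch; `send_batch` is a hypothetical callable standing in for, say, a multi-row INSERT or a bulk API call, and the batch size of 100 is an arbitrary example.

```python
class BatchWriter:
    """Buffer writes to an external service and send them in groups."""

    def __init__(self, send_batch, batch_size=100):
        self.send_batch = send_batch  # hypothetical: one round trip per call
        self.batch_size = batch_size
        self._buffer = []

    def write(self, record):
        self._buffer.append(record)
        if len(self._buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        if self._buffer:
            batch, self._buffer = self._buffer, []
            self.send_batch(batch)

    # Support `with BatchWriter(...) as w:` so the final partial batch is sent.
    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.flush()
        return False  # do not swallow exceptions
```

Here 250 writes become three round trips. A production version would likely also flush on a timer so that records do not sit in the buffer indefinitely during quiet periods.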

32
Query tuning
  • Query tuning is like PHP tuning: what you think
    is slow may not be slow.
  • Benchmarking is the only way to truly test this.
  • When tuning, change one thing at a time.
  • Your toolkit:
    • EXPLAIN
    • Slow Query Log
    • mytop
    • innotop
    • Query profilers
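Since benchmarking is the only reliable test, a small harness that times two variants differing by exactly one change keeps tuning honest. A sketch using Python's standard `timeit` module; the function names are hypothetical.

```python
import timeit


def seconds_per_call(fn, repeat=5, number=1000):
    """Best-of-`repeat` average time per call; the minimum is the least noisy estimate."""
    return min(timeit.repeat(fn, repeat=repeat, number=number)) / number


def speedup(baseline, candidate, **kwargs):
    """How many times faster `candidate` is than `baseline` (>1 means faster).

    Only meaningful if the two callables differ by exactly one change.
    """
    return seconds_per_call(baseline, **kwargs) / seconds_per_call(candidate, **kwargs)
```

For example, `speedup(lambda: run_query(old_sql), lambda: run_query(new_sql))`, where `run_query` is a hypothetical helper that runs a query against a test database loaded with production-like data.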

33
Indexing problems
  • Lack of appropriate indexing
  • Create relevant indexes. Make sure your queries
    use them. (EXPLAIN is your friend here.)
  • The order of multi-column indexes is important
  • Remove unused indexes to speed writes
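Checking that a query actually uses an index is a matter of reading its query plan. MySQL's `EXPLAIN` output has its own format; the self-contained Python sketch below uses SQLite's `EXPLAIN QUERY PLAN` through the standard `sqlite3` module instead, with a hypothetical `users` table, to show the plan changing from a full table scan to an index search once a relevant index exists.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return SQLite's EXPLAIN QUERY PLAN detail text for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(str(row[-1]) for row in rows)  # last column is the detail text


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, country TEXT)")
    conn.executemany(
        "INSERT INTO users (email, country) VALUES (?, ?)",
        [(f"user{i}@example.com", "US" if i % 2 else "DE") for i in range(1000)],
    )
    sql = "SELECT * FROM users WHERE email = ?"
    before = query_plan(conn, sql, ("user42@example.com",))
    conn.execute("CREATE INDEX idx_users_email ON users (email)")
    after = query_plan(conn, sql, ("user42@example.com",))
    return before, after


if __name__ == "__main__":
    before, after = demo()
    print("without index:", before)
    print("with index:   ", after)
```

Running the same check with a filter on only the second column of a multi-column index is a quick way to see why column order matters: without its leading column, the index typically cannot be used for a search.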

34
Schema design (MySQL)
  • Use the smallest data type possible
  • Use fixed-width rows where possible (prefer CHAR
    over VARCHAR; disk is cheap)
  • Denormalize where necessary
  • Take static data out of the database or use
    MEMORY tables
  • Use the appropriate storage engine for each table

35
Queries
  • Minimizing the number of queries is always a good
    start. Web pages that need to make 70-80 queries
    to be rendered need a different strategy:
    • Cache the output
    • Cache part of the output
    • Redesign your schema so you can reduce the
      number of queries
    • Decide if you can live without some of these
      queries
  • Confirm that your queries are using the indexes
    you think they are.
  • Avoid correlated subqueries where possible.
  • Stored procedures are notably faster.
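A common source of 70-80-query pages is fetching related rows one at a time in a loop. A Python sketch using an in-memory SQLite database with a hypothetical `posts` table, counting round trips to compare one query per item with a single `IN (...)` query.

```python
import sqlite3


class CountingConnection:
    """Wrap a DB connection and count the round trips made through it."""

    def __init__(self, conn):
        self.conn = conn
        self.queries = 0

    def execute(self, sql, params=()):
        self.queries += 1
        return self.conn.execute(sql, params)


def make_demo_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT)")
    conn.executemany("INSERT INTO posts VALUES (?, ?)",
                     [(i, f"Post {i}") for i in range(1, 101)])
    return conn


def titles_one_by_one(db, ids):
    # One query per item: how a page ends up issuing dozens of queries.
    return [db.execute("SELECT title FROM posts WHERE id = ?", (i,)).fetchone()[0]
            for i in ids]


def titles_batched(db, ids):
    # One query for all items, using an IN (...) list of placeholders.
    if not ids:
        return []
    placeholders = ",".join("?" * len(ids))
    rows = db.execute(f"SELECT id, title FROM posts WHERE id IN ({placeholders})",
                      tuple(ids)).fetchall()
    by_id = dict(rows)
    return [by_id[i] for i in ids]  # restore the caller's ordering
```

Both functions return the same titles; for 80 ids the first makes 80 round trips and the second makes one.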

36
13. Be Lazy
  • Deeply recursive code is expensive in PHP.
  • Heavy manual looping usually indicates that you
    are doing something wrong.
  • Learn PHP's idioms for dealing with large data
    sets or parsing/packing data.
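One general way to avoid deep recursion is to replace the call stack with an explicit stack. A Python sketch (Python also limits recursion depth, to 1000 frames by default); the nested-list "tree" is a hypothetical stand-in for any recursive data, such as a category hierarchy.

```python
def total_size(tree):
    """Sum every integer in an arbitrarily nested list without recursion.

    An explicit stack replaces the call stack, so depth is limited only by memory.
    """
    total = 0
    stack = [tree]
    while stack:
        node = stack.pop()
        if isinstance(node, list):
            stack.extend(node)  # defer children instead of recursing into them
        else:
            total += node
    return total
```

A recursive version of this function would fail on a list nested 10,000 levels deep; the iterative one handles it without special configuration.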

37
14. Don't Outsmart Yourself
  • Don't try to work around perceived inefficiencies
    in PHP (at least not in userspace code!)
  • Common bad examples here include:
    • Writing parsers in PHP for jobs that could be
      done with a simple regex.
    • Trying to circumvent connection management in
      networking/database libraries.
    • Performing complex serializations that could be
      done with internal extensions.
    • Calling out to external executables when a PHP
      extension can give you the same information.
    • Reimplementing something that already exists in
      PHP.
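To make the first bad example concrete: a hypothetical `key=value; key=value` format parsed character by character, next to the same job done by one regular expression. It is sketched in Python, where the trade-off matches PHP's `preg_match_all` versus a hand-written loop: less code to maintain, and the matching loop runs inside the C regex engine.

```python
import re


def parse_pairs_by_hand(text):
    """Parse 'a=1; b=2' by walking the string one character at a time."""
    pairs = {}
    key, value, in_value = "", "", False
    for ch in text + ";":  # sentinel ';' flushes the last pair
        if ch == "=" and not in_value:
            in_value = True
        elif ch == ";":
            if key.strip():
                pairs[key.strip()] = value.strip()
            key, value, in_value = "", "", False
        elif in_value:
            value += ch
        else:
            key += ch
    return pairs


# key, '=', then a value running up to the next ';' or the end of the string
PAIR_RE = re.compile(r"\s*([^=;\s]+)\s*=\s*([^;]*?)\s*(?:;|$)")


def parse_pairs_with_regex(text):
    """Same result with one regular expression."""
    return dict(PAIR_RE.findall(text))
```

The two functions agree on ordinary input, but only one of them is a single line of parsing logic.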

38
15. Caching
  • Mentioned before, but it deserves a second slide:
    caching is the most important tool in your
    toolbox.
  • For frequently accessed information, even a short
    cache lifespan can be productive.
  • Watch your cache hit rates. An ineffective
    cache is worse than no cache.
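Both points on this slide can be measured directly. A Python sketch of a hypothetical caching decorator with a short time-to-live that also counts hits and misses. With an injected clock, a value requested ten times per second and cached for only one second is served from cache 90% of the time.

```python
import time
from functools import wraps


def ttl_cached(ttl, clock=time.monotonic):
    """Cache results per positional arguments for `ttl` seconds and count hits/misses."""
    def decorator(fn):
        store = {}  # args -> (expires_at, value)
        stats = {"hits": 0, "misses": 0}

        @wraps(fn)
        def wrapper(*args):
            now = clock()
            entry = store.get(args)
            if entry is not None and now < entry[0]:
                stats["hits"] += 1
                return entry[1]
            stats["misses"] += 1
            value = fn(*args)
            store[args] = (now + ttl, value)
            return value

        def hit_rate():
            total = stats["hits"] + stats["misses"]
            return stats["hits"] / total if total else 0.0

        wrapper.stats = stats
        wrapper.hit_rate = hit_rate
        return wrapper
    return decorator
```

Every miss pays for a lookup and a store on top of the real work, which is why a cache whose hit rate stays low can cost more than it saves.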

39
Thanks!
  • There are longer versions of this talk at
    http://omniti.com/george/talks/
  • There are good books on these topics as well:
    • Advanced PHP Programming, G. Schlossnagle
    • Building Scalable Web Sites, C. Henderson
    • Scalable Internet Architectures, T. Schlossnagle
  • Compulsory plug: OmniTI is hiring for a number of
    positions (PHP, Perl, C, UI design):
    http://omniti.com/careers