More Technical Challenges in Using Perl for Commercial Software: Real Life War Stories

1
More Technical Challenges in Using Perl for
Commercial Software: Real Life War Stories
  • Gurusamy Sarathy
  • ActiveState Corp.

2
Commercial Software Technical Goals
  • Concurrency
  • Large number of things must happen at the same
    time
  • Robustness
  • Things cannot stop working, no excuses
  • Scalability
  • Organizations grow; software must not only cope,
    but also aid in this
  • Compatibility
  • Must support older code

3
The Two Cases in Point
  • PureMessage (formerly PerlMx)
  • Mail server product for email policy management
  • PerlEx
  • Web server product to accelerate CGI
  • Many similarities
  • Server class products
  • Many requests must be serviced as quickly as
    possible
  • Business-critical component

4
Concurrency
5
Concurrency
  • Motivation for threads
  • Faster performance, easier on the system
  • Reduced memory footprint
  • Easier data sharing
  • On the other hand
  • Most code isn't thread-safe; must reengineer
  • Locking requires discipline and introduces
    complexity
  • Harder to get right, fix bugs

6
Global/static data
  • Instant recipe for race conditions
  • findglobals - Script to find globals by studying
    nm or dumpbin.exe output from Perl's object files
  • Must move globals into interpreter structure
  • Don't forget globals in XS code!
  • Addressed in 5.8 with MY_CXT macros
  • State is carried in the interpreter instead
  • Each interpreter can only execute in one thread
    at a time

7
Reentrant functions
  • Many APIs aren't reentrancy safe (implies thread
    unsafe)
  • localtime() -> localtime_r()
  • asctime() -> asctime_r()
  • findrfuncs - Script to find reentrant functions
    in the standard headers
  • Configure detects and uses them in 5.8
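
A minimal sketch of the substitution named above, assuming a POSIX system. The function name format_local_time and the timestamp format are hypothetical and not from the talk; only localtime_r() is.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Format t as local time into buf.
 * localtime_r() fills a caller-supplied struct tm, whereas localtime()
 * returns a pointer to storage that may be shared by every thread.
 * Returns the number of characters written (excluding the NUL),
 * or 0 on failure. */
size_t format_local_time(time_t t, char *buf, size_t len)
{
    struct tm tm_buf;   /* private to this call, so no other thread sees it */

    assert(buf != NULL);
    if (localtime_r(&t, &tm_buf) == NULL)
        return 0;
    return strftime(buf, len, "%Y-%m-%d %H:%M:%S", &tm_buf);
}
```

Each thread passes its own buffer, so no locking is needed around the call.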

8
Signals
  • Race conditions
  • e.g. system()
  • sigsetjmp() is not thread safe
  • Not really needed with safe signals in 5.8,
    setjmp() is sufficient

  Thread 1                         Thread 2
  s1 = signal(SIGINT, SIG_IGN);
  /* run child process */          s2 = signal(SIGINT, SIG_IGN);
  signal(SIGINT, s1);              /* run child process */
                                   signal(SIGINT, s2);  /* ouch */
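
One possible way to close the save/restore race above is to serialize the disposition change and count the threads that currently want SIGINT ignored. This is a sketch under that assumption, not what Perl does; all function and variable names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <string.h>

static pthread_mutex_t sigint_lock = PTHREAD_MUTEX_INITIALIZER;
static int sigint_ignorers;            /* threads currently between enter/leave */
static struct sigaction sigint_saved;  /* disposition seen by the first enter */

/* Call before running a child process. Only the first caller saves the
 * old disposition; later callers just bump the count. */
int sigint_ignore_enter(void)
{
    int rc = 0;

    pthread_mutex_lock(&sigint_lock);
    if (sigint_ignorers == 0) {
        struct sigaction ign;
        memset(&ign, 0, sizeof ign);
        ign.sa_handler = SIG_IGN;
        sigemptyset(&ign.sa_mask);
        rc = sigaction(SIGINT, &ign, &sigint_saved);
    }
    if (rc == 0)
        sigint_ignorers++;
    pthread_mutex_unlock(&sigint_lock);
    return rc;
}

/* Call after the child has finished. Only the last caller restores. */
int sigint_ignore_leave(void)
{
    int rc = 0;

    pthread_mutex_lock(&sigint_lock);
    assert(sigint_ignorers > 0);
    if (--sigint_ignorers == 0)
        rc = sigaction(SIGINT, &sigint_saved, NULL);
    pthread_mutex_unlock(&sigint_lock);
    return rc;
}

/* Returns 1 if SIGINT is currently ignored, 0 otherwise. */
int sigint_is_ignored(void)
{
    struct sigaction cur;

    if (sigaction(SIGINT, NULL, &cur) != 0)
        return 0;
    return cur.sa_handler == SIG_IGN;
}
```

With this shape the second thread never records SIG_IGN as the "old" handler, so the original disposition comes back once the last child has finished.
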
9
Environment
  • environ array gets munged when %ENV is modified
  • Race condition when it happens from different
    threads
  • Possible solutions
  • Virtualize environment
  • Lock access to environ (still problematic)
  • Allow it only from main thread (hack)

10
$0 a.k.a. argv[0]
  • This modifies argv[0] in-situ
  • $0 =~ s/.../.../g
  • Two problems
  • Not safe from different threads
  • syslog() needs argv[0] to stay constant
  • Bandaid: only main thread can affect argv[0]
  • Workaround: call openlog() explicitly
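
A sketch of the openlog() workaround, assuming the identity is fixed once at startup. The function name, the 64-byte buffer, and the LOG_MAIL facility are illustrative choices, not from the talk.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>
#include <syslog.h>

/* Some C libraries keep the ident pointer passed to openlog() instead
 * of copying the string, so the name lives in static storage that
 * nothing else rewrites. */
static char log_ident[64];

/* Give syslog its own identity string so later changes to argv[0]
 * cannot affect log output. Returns the stable copy. */
const char *init_syslog_ident(const char *name)
{
    assert(name != NULL);
    strncpy(log_ident, name, sizeof log_ident - 1);
    log_ident[sizeof log_ident - 1] = '\0';
    openlog(log_ident, LOG_PID, LOG_MAIL);  /* facility is an assumption */
    return log_ident;
}
```

Calling this before any thread touches $0 means syslog() never has to look at argv[0] again.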

11
Fast sv_gets()
  • Snooping stdio buffers
  • Efficiency hack
  • Breaks when multiple threads mess with the same
    FILE*
  • e.g. stdin

12
fork()
  • Interacts with mutexes
  • Only one thread recreated in child
  • Deadlock if it is not the thread that holds lock
  • Need to use pthread_atfork()
  • Has issues with dynamic loading e.g. mod_perl
  • Can't undo pthread_atfork()
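
A minimal sketch of the pthread_atfork() pattern for one mutex, assuming POSIX threads. state_lock and the function names are hypothetical. The handlers take the lock before fork() and release it on both sides, so the child never inherits a lock held by a thread that does not exist there.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static pthread_mutex_t state_lock = PTHREAD_MUTEX_INITIALIZER;

/* Runs in the forking thread just before fork(): own the lock. */
static void fork_prepare(void) { pthread_mutex_lock(&state_lock); }
/* Runs in the parent after fork(): give the lock back. */
static void fork_parent(void)  { pthread_mutex_unlock(&state_lock); }
/* Runs in the child after fork(): the only surviving thread owns it. */
static void fork_child(void)   { pthread_mutex_unlock(&state_lock); }

/* Register once at startup. Returns 0 on success. There is no call to
 * unregister, which is the dynamic-loading problem noted above. */
int install_fork_handlers(void)
{
    return pthread_atfork(fork_prepare, fork_parent, fork_child);
}

/* Fork a child that tries to take and release the lock.
 * Returns the child's exit status (0 if locking worked), or -1. */
int fork_and_check_lock(void)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        int ok = pthread_mutex_lock(&state_lock) == 0
              && pthread_mutex_unlock(&state_lock) == 0;
        _exit(ok ? 0 : 1);
    }
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```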

13
dup2()
  • The classic case
  • Present in 5.8.0

close(fd);            /* another thread acquires fd */
dup2(otherfd, fd);    /* ouch */
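
One way to avoid the window above is to drop the explicit close(): dup2() itself closes the target descriptor, and Linux documents the close-and-reuse step as atomic. A sketch under that assumption; redirect_fd and redirect_demo are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Make fd refer to the same open file as otherfd, without a separate
 * close(fd) that would let another thread grab the number in between.
 * Retries if interrupted by a signal. Returns 0 on success, -1 on error. */
int redirect_fd(int otherfd, int fd)
{
    int rc;

    assert(otherfd >= 0 && fd >= 0);
    do {
        rc = dup2(otherfd, fd);
    } while (rc < 0 && errno == EINTR);
    return rc < 0 ? -1 : 0;
}

/* Usage: point the write end of pipe a at pipe b, then check that a
 * byte written through the old number arrives on pipe b.
 * Returns 0 on success, -1 on any failure. */
int redirect_demo(void)
{
    int a[2], b[2];
    char c = 0;

    if (pipe(a) < 0 || pipe(b) < 0)
        return -1;
    if (redirect_fd(b[1], a[1]) < 0)
        return -1;
    if (write(a[1], "x", 1) != 1 || read(b[0], &c, 1) != 1)
        return -1;
    close(a[0]); close(a[1]); close(b[0]); close(b[1]);
    return c == 'x' ? 0 : -1;
}
```
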
14
fdopen()
  • fd = socket()
  • rf = fdopen(fd, "r")
  • wf = fdopen(fd, "w")
  • . . .
  • fclose(rf) /* close(fd) */
  • another thread acquires fd
  • fclose(wf) /* ouch */
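
A sketch of one common fix, assuming POSIX: give the write stream its own descriptor with dup(), so each fclose() closes a descriptor that only that stream owns. The function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

/* Wrap one socket in separate read and write streams.
 * Returns 0 on success, -1 on failure (fd itself is left open). */
int open_stream_pair(int fd, FILE **rf, FILE **wf)
{
    int wfd;

    assert(rf != NULL && wf != NULL);
    if ((wfd = dup(fd)) < 0)
        return -1;
    if ((*wf = fdopen(wfd, "w")) == NULL) {
        close(wfd);
        return -1;
    }
    if ((*rf = fdopen(fd, "r")) == NULL) {
        fclose(*wf);            /* closes wfd only; fd stays open */
        return -1;
    }
    return 0;
}

/* Usage: close the read stream first, then show the write stream still
 * works. Returns 0 on success, -1 on any failure. */
int stream_pair_demo(void)
{
    int sv[2];
    FILE *rf, *wf;
    char c = 0;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    if (open_stream_pair(sv[0], &rf, &wf) < 0)
        return -1;
    fclose(rf);                 /* closes sv[0], not the dup'ed descriptor */
    if (fputc('x', wf) == EOF || fclose(wf) == EOF)
        return -1;
    if (read(sv[1], &c, 1) != 1)
        return -1;
    close(sv[1]);
    return c == 'x' ? 0 : -1;
}
```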

15
Robustness
16
Robustness
  • Must cope with failures and carry on as much as
    possible
  • exit() and _exit()
  • No-no in a server environment
  • Useful to have a minder process that retires
    and respawns server processes
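
A toy sketch of a minder loop, assuming POSIX fork()/waitpid(). The names, the spawn cap, and the rule that a zero exit status means "do not respawn" are all assumptions made so the example terminates; a production minder would have its own retirement policy.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* A worker receives its spawn number (0-based) and returns an exit status. */
typedef int (*worker_fn)(int attempt);

/* Run worker in a child process and respawn it whenever it exits
 * non-zero or dies from a signal, up to max_spawns children.
 * Returns the number of children spawned, or -1 if fork/wait failed. */
int mind_worker(worker_fn worker, int max_spawns)
{
    int spawned = 0;

    assert(worker != NULL);
    while (spawned < max_spawns) {
        int status;
        pid_t pid = fork();

        if (pid < 0)
            return -1;
        if (pid == 0)
            _exit(worker(spawned));   /* child: the server does its work here */
        spawned++;
        if (waitpid(pid, &status, 0) != pid)
            return -1;
        if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
            break;                    /* clean exit: stop respawning */
    }
    return spawned;
}

/* Hypothetical worker that fails on its first two spawns. */
int flaky_worker(int attempt)
{
    return attempt < 2 ? 1 : 0;
}
```

Because the minder only forks and waits, a crash or exit() inside the worker never takes the service down with it.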

17
malloc()
  • Graceful handling of out of memory errors needed
  • Chicken-and-egg problems with interpreter
    creation and malloc() dependency on interpreter
  • Need a malloc() that is both thread-safe and
    scalable to large numbers of threads
  • vmem.h on windows

18
fork()
  • Any open file descriptors are shared with child
  • read(), write(), seek() etc., move the seek
    pointer in all children
  • Berkeley db will have problems with this
  • exec() will inherit handles by default
  • Normally don't want that
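
To keep a descriptor from leaking across exec(), POSIX offers the close-on-exec flag. A minimal sketch; the helper names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Mark fd so that it is closed automatically in any exec()ed program.
 * Returns 0 on success, -1 on error. */
int set_cloexec(int fd)
{
    int flags;

    assert(fd >= 0);
    flags = fcntl(fd, F_GETFD);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFD, flags | FD_CLOEXEC) < 0 ? -1 : 0;
}

/* Returns 1 if fd has the close-on-exec flag set, 0 otherwise. */
int is_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);

    return flags >= 0 && (flags & FD_CLOEXEC) != 0;
}
```

The flag affects exec() only; a plain fork() still shares the descriptor and its seek pointer with the child.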

19
Signals again
  • Two approaches to safety
  • Do absolute minimal work in handler
  • Set a flag when signal arrives
  • Check for it in a safe spot
  • Perl 5.8 has this
  • Handle signals in a dedicated thread
  • Signal handling thread blocks waiting for signals
  • Better model in threaded environments
  • Perl doesn't play nice, calls signal APIs directly
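
The first approach (set a flag, check it at a safe spot) can be sketched in C as follows. This illustrates the idea only, not Perl's implementation; SIGUSR1 and the function names are arbitrary choices.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* The only state the handler touches. */
static volatile sig_atomic_t got_usr1;

/* Absolute minimal work: record that the signal arrived. */
static void on_usr1(int signo)
{
    (void)signo;
    got_usr1 = 1;
}

/* Install the handler. Returns 0 on success, -1 on error. */
int install_usr1_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_usr1;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGUSR1, &sa, NULL);
}

/* Call from a safe spot in the main loop. Returns 1 if the signal
 * arrived since the last call (and clears the flag), else 0. */
int take_pending_usr1(void)
{
    if (!got_usr1)
        return 0;
    got_usr1 = 0;
    return 1;
}
```

Several signals arriving between two checks collapse into one, which is the usual trade-off of this design.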

20
Signals yet again
  • SIG_DFL may not be sane
  • SIGPIPE could kill you
  • Bandaid: only main thread can affect signals
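
For SIGPIPE, a common server-side choice is to ignore the signal and handle the EPIPE error from write() instead. A sketch assuming POSIX; the function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Replace the default SIGPIPE action (terminate the process) with
 * SIG_IGN. Returns 0 on success, -1 on error. */
int ignore_sigpipe(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = SIG_IGN;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGPIPE, &sa, NULL);
}

/* Write to a pipe whose reader has gone away.
 * Returns the errno of the failed write, 0 if the write unexpectedly
 * succeeded, or -1 if the pipe could not be created. */
int write_to_closed_pipe(void)
{
    int p[2], err = 0;

    if (pipe(p) < 0)
        return -1;
    close(p[0]);                      /* the peer disappears */
    if (write(p[1], "x", 1) < 0)
        err = errno;
    close(p[1]);
    return err;
}
```

Once the signal is ignored, a vanished peer shows up as an ordinary error return that the server can log and recover from.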

21
Memory leaks
  • Leaks from compile-time errors
  • Flawed design of OP allocation management
  • Closures have typically been a problem
  • Reference counting cycles

[Diagram labels: CV, pad, CV]
22
Memory leaks
  • Other unintended cyclic references
  • $a = \$a
  • Often stumble into them when parsing, in tree
    (really graph) structures
  • Weak references
  • Referent has a pointer back to reference
  • Reference undefined when referent goes away
  • Still considered experimental
  • Just use symbolic (or soft) references

23
Limits
  • Every system has limits (just a matter of scaling
    high enough before you hit it)
  • Need better control over what happens when limits
    are reached
  • Heap
  • Stack
  • TCP buffers

24
Regexes too difficult
  • End users and admins find regexes prone to human
    error
  • Can't afford this on a central mail gateway or
    web server
  • Application specific solution
  • Sieve - a little language for filtering policy

25
CPAN modules
  • Quality varies widely
  • Code review very important
  • XS code NOT thread-safe until proven otherwise
  • XML::Parser: globals
  • GD: globals
  • Digest::Nilsimsa: globals
  • Berkeley DB 1.8x: unsafe even with flock()

26
Scalability
27
Scalability
  • Minimize locking contention
  • Perl does locking while reference counting the OP
    tree, which can be shared when interpreter is
    created via perl_clone()
  • User code can kill scalability
  • flock()
  • print() can have unintended consequences
  • Typical problem area is log files

28
Arbitrary limits
  • Max file descriptors per process (often set
    ridiculously low)
  • Can't go past a certain value
  • stdio will have issues
  • select() may have issues (use poll() instead)
  • Maximum heap space per process
  • Perl eats lots of memory
  • Maximum stack space
  • Regex engine may recurse massively for certain
    patterns and data; need lots of stack
  • Create a pool of processes in order to scale

29
Regex Scalability
  • Backtracking can be a huge hit
  • Anchor your regexes
  • Pay attention to warnings
  • Avoid repeated recompilation
  • qr// is your friend
  • Peephole optimizer can eat the C stack when
    optimizing regexes
  • Most platforms don't handle running out of stack
    gracefully
  • Had to increase default stack size on some
    platforms
  • Still no good solution for this

30
Interpreter Startup
  • Can be very slow
  • too many stat()s
  • Pileup effects with on-demand creation
  • Slow application startup when creating everything
    at once

31
Memory Footprint
  • Individual module footprint
  • Use B::TerseSize to measure
  • COW sharing on fork()
  • Load everything at startup
  • Load rarely used modules ALAP
  • Do throwaway work in separate process
  • Reduce concurrency to match execution capacity

32
File Descriptors
  • Often a scarce resource
  • Solaris stdio limited to 256 FDs
  • Use fcntl(fd, F_DUPFD, nfd) trick to cope
  • Don't create them until actually needed
  • Close them at earliest opportunity
  • Beware of shared FD semantics
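
The F_DUPFD trick can be sketched as follows: move non-stdio descriptors (sockets, for example) above a threshold so the low numbers stay free for stdio. move_fd_above is a hypothetical helper name.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Move fd to the lowest free descriptor numbered min_fd or higher and
 * close the original number. Returns the new descriptor, or -1 on
 * error (fd is left open and unchanged in that case). */
int move_fd_above(int fd, int min_fd)
{
    int nfd;

    assert(fd >= 0 && min_fd >= 0);
    if (fd >= min_fd)
        return fd;                        /* already high enough */
    nfd = fcntl(fd, F_DUPFD, min_fd);     /* duplicate at or above min_fd */
    if (nfd < 0)
        return -1;
    close(fd);                            /* free the low number for stdio */
    return nfd;
}
```

On a system with the 256-descriptor stdio limit, min_fd would be 256; the caller must use the returned number from then on.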

33
Scalable Data Stores
  • Hard to find a data store that is compatible,
    robust, inexpensive, efficient
  • DB_File doesn't cut it
  • Quadratic behavior on typical index file
  • SQL syntax and data types are non-standard
  • Much of the benefits of DBI compromised
  • Many filesystems scale poorly
  • Maildir style tree mostly works
  • Licensing for real databases varies widely
  • PostgreSQL looks pretty good

34
Compatibility
35
Source Compatibility
  • THX macros add an implicit context pointer where
    needed
  • Makes it possible to retain old function
    signatures
  • Context pointer consistently available
  • Autogenerated compatibility defines
  • iperlsys.h abstractions help virtualize system
    access
  • Overrideable function table
  • All calls to system are indirected through table
  • Different host systems may provide different
    systemic functionality
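
The overrideable function table idea can be illustrated with a toy example. struct host_ops and every name below are hypothetical; this is not the layout of iperlsys.h, only the indirection pattern it relies on.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical host table: all system access goes through these pointers. */
struct host_ops {
    ssize_t (*write)(int fd, const void *buf, size_t len);
};

/* Default host: pass straight through to the operating system. */
static ssize_t default_write(int fd, const void *buf, size_t len)
{
    return write(fd, buf, len);
}

/* An embedding host's replacement: capture output in memory instead. */
static char captured[128];
static size_t captured_len;

static ssize_t capture_write(int fd, const void *buf, size_t len)
{
    (void)fd;
    if (len > sizeof captured - captured_len)
        len = sizeof captured - captured_len;   /* truncate when full */
    memcpy(captured + captured_len, buf, len);
    captured_len += len;
    return (ssize_t)len;
}

const struct host_ops default_host = { default_write };
const struct host_ops capture_host = { capture_write };

/* "Interpreter" code never calls write() directly. */
ssize_t emit(const struct host_ops *host, const char *msg)
{
    assert(host != NULL && host->write != NULL);
    return host->write(STDOUT_FILENO, msg, strlen(msg));
}

/* Number of bytes the capturing host has collected so far. */
size_t captured_bytes(void)
{
    return captured_len;
}
```

Swapping the table changes where output goes without touching emit(), which is what lets different hosts supply different system behavior.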

36
Binary compatibility
  • Interpreter structure
  • Changes break binary compatibility
  • Solution: access through function
  • ret_type *Perl_Ifoo_ptr(pTHX)
  • return &PL_foo
  • #define PL_foo (*Perl_Ifoo_ptr(aTHX))
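
The same accessor pattern in a self-contained toy form. struct interp, Interp_foo_ptr and VAR_foo are hypothetical stand-ins for the interpreter structure and the Perl_Ifoo_ptr / PL_foo pair above, with the context passed explicitly instead of through pTHX/aTHX.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical interpreter structure. Extensions never depend on this
 * layout, so fields can be added or reordered between releases. */
struct interp {
    int  new_field;   /* imagine this was added in a later release */
    long foo;         /* its offset moved, but callers do not care */
};

/* Exported accessor: the only symbol compiled extensions link against. */
long *Interp_foo_ptr(struct interp *my_interp)
{
    assert(my_interp != NULL);
    return &my_interp->foo;
}

/* Extensions keep using VAR_foo(i) as if it were a plain variable. */
#define VAR_foo(i) (*Interp_foo_ptr(i))
```

Because the macro expands to a dereferenced function call, it works on both sides of an assignment, and an already compiled extension keeps working when the structure changes.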

37
Platform Compatibility
  • SMP bugs (Early Linux 2.4.x releases)
  • OOM behavior sometimes not sane (Linux)
  • pthread bugs (Linux, FreeBSD)
  • NPTL will be a big improvement
  • libc bugs (Linux, Windows)
  • Silly stdio limits (Solaris, Windows)
  • How we cope
  • fork() rather than threads, where possible

38
Platform Compatibility
  • Is it a thread or process?
  • getpid() on Linux 2.2
  • Deleting files in use
  • Access denied on Windows
  • ETXTBSY on HP-UX
  • Debugging a crash
  • strace doesn't do threads on Linux
  • Solaris has nicer support
  • pfiles, pstack et al.
  • Coredumps on Linux 2.2 unreliable

39
Platform Compatibility
  • Berkeley DB compatibility
  • Different OSes ship with different versions
  • Incompatible licenses (1.8x vs. newer)
  • Incompatible file formats across versions
  • Impossible to build with your own libdb on
    FreeBSD
  • Dynamic loader prefers the symbols in libc
  • Perl needs a standard database integrated
  • PHP wins here with MySQL

40
Things to do
  • Want lightweight? Don't use Perl. :-)
  • Use a decent queueing model for handling mismatch
    between concurrency demand and supply
  • Perl needs better ways to share data across
    interpreters
  • Perl needs to be more embedding friendly
  • Make it easier to leave out parts
  • More control over global actions and state
  • Perl needs a standard data store

41
Things to take home
  • Design for concurrency and scalability!
  • Cannot retrofit or reengineer for this easily
  • Eat your own dogfood
  • Make it a fundamental QA principle
  • Stress test business-critical software
  • Use SMP hardware for testing
  • Build a beta cycle into release process

42
Questions?