Title: Software Sermon
1. Software Sermon
- Management, Programming, Testing, CVS
2. Software is Hard
- The difficulty of a program goes up roughly with
the square of its size.
- Since software is so flexible there is always the
temptation to add a little more.
- While some algorithms are difficult, usually the
hardest part is getting all the pieces to work
together, and keeping them working.
3. Every Software Project Needs
- A sense of the users
- Wise managers
- Pragmatic programming
- Thorough testing
4. Constant Contact with Users
- It's best to be constantly in contact with three
different users.
- Strive to become a user yourself. Learn users'
other tools.
- A web browser provides a familiar user interface,
and access to tools anywhere.
- Users are a great help for testing as well as an
endless source of ideas.
- You can't implement every idea.
5. Bioinformatics Users
- Willing to read and dig a little
- Biologists, medical doctors, programmers
- Macintosh, PC, Unix
- Nearly universally comfortable with web
- May be experts in some areas, know little of
other areas.
6. Science is Hard
7. Every User Needs
- Reliability
- Speed
- Ability to exchange data with other programs
- Consistent user interface
8. Wise Management
- Many ways to do it right.
- Mistakes can be quite costly.
- Have to deal with:
  - People issues
  - Resource issues
  - Project issues
- Read The Mythical Man-Month, Peopleware,
Extreme Programming, The Pragmatic Programmer
9. People Issues
- Finding the right people
- Introducing new people to the work group
- Training on the big picture and where to focus
- Coaxing the silent to talk, the talkers to listen
- Avoiding unnecessary interruptions
- Reassigning jobs when (and only when) needed.
- Credit where credit is due
- Keeping work varied and interesting
- Accepting there is life outside of work
10. Pragmatic Programming
- Use consistent conventions.
- Build and test from the bottom up.
- Write for readability.
- Keep everything as local as possible.
- Defend against bad input.
- Learn the fine art of debugging.
- Modularize, but don't over-modularize.
- Sometimes computers need to reboot.
- Sometimes programmers need to start over.
11. Consistency and Conventions
- Code is constrained by few natural laws.
- There are many ways to do things, so programmers
make arbitrary decisions.
- Arbitrary decisions are hard to remember.
- Conventions make decisions less arbitrary.
- varName vs. VarName vs. varname vs. var_name: pick
one and stick to it.
- variable vs. var vs. vrbl vs. vble vs. varible: if
you need to abbreviate, keep it short.
12. Write for Readability
- Use descriptive names. Try and keep them short.
- Set your tab stops to 8 and indent cleanly!!
- Comment each module and subroutine with an
overview of what it does and, if need be, how it
does it.
- It's not so important to comment line by line
except in unusual situations, which are often
best avoided entirely.
- Most subroutines should take up less than a
single screen. Break larger subroutines into
logical blocks and comment the start of each
block.
13. A Header File
    /* boxClump - put together 2 dimensional boxes
     * that overlap with each other into clumps. */

    #ifndef BOXCLUMP_H
    #define BOXCLUMP_H

    struct boxIn
    /* Input to box clumper. */
        {
        struct boxIn *next;    /* Next in list. */
        int qStart, qEnd;      /* Range covered in query. */
        int tStart, tEnd;      /* Range covered in target. */
        void *data;            /* Some user-associated data. */
        };

    struct boxClump *boxFindClumps(struct boxIn **pBoxList);
    /* Convert list of boxes to a list of clumps. Clumps
     * are collections of boxes that overlap. Note that
     * the original boxList is overwritten as the boxes
     * are moved from it to the clumps. */

    #endif /* BOXCLUMP_H */
14. A Simple Short Routine
    struct hashEl *hashLookup(struct hash *hash, char *name)
    /* Looks for name in hash table. Returns associated
     * element, if found, or NULL if not. */
    {
    struct hashEl *start = hash->table[hashCrc(name) & hash->mask];
    struct hashEl *el;
    for (el = start; el != NULL; el = el->next)
        {
        if (sameString(el->name, name))
            break;
        }
    return el;
    }
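To see the lookup routine in context, here is a minimal sketch of a table it could run against. Only hashLookup, hashCrc, sameString and the fields table, mask, name and next come from the slide. The val field, the hashNew and hashAdd helpers, and this particular hashCrc formula are assumptions made for illustration. The sketch also assumes the table size is a power of two, so that masking the hash value picks a bucket.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct hashEl
/* An element in a bucket list. */
    {
    struct hashEl *next;    /* Next in bucket. */
    char *name;             /* Key. Not copied in this sketch. */
    void *val;              /* Hypothetical user value. */
    };

struct hash
/* A hash table with a power-of-two number of buckets. */
    {
    struct hashEl **table;  /* Array of bucket lists. */
    unsigned mask;          /* Number of buckets minus one. */
    };

static int sameString(char *a, char *b)
/* Return nonzero if the two strings are identical. */
{
return strcmp(a, b) == 0;
}

static unsigned hashCrc(char *string)
/* Stand-in hash function (assumption): multiply and add each byte. */
{
unsigned result = 0;
unsigned char c;
while ((c = (unsigned char)*string++) != 0)
    result = result * 31 + c;
return result;
}

struct hash *hashNew(int powerOfTwoSize)
/* Return an empty table with 2^powerOfTwoSize buckets (hypothetical). */
{
struct hash *hash;
size_t size;
assert(powerOfTwoSize >= 0 && powerOfTwoSize < 31);
size = (size_t)1 << powerOfTwoSize;
hash = malloc(sizeof(*hash));
if (hash == NULL)
    abort();
hash->table = calloc(size, sizeof(hash->table[0]));
if (hash->table == NULL)
    abort();
hash->mask = (unsigned)(size - 1);
return hash;
}

struct hashEl *hashAdd(struct hash *hash, char *name, void *val)
/* Put name at the front of its bucket (hypothetical). */
{
struct hashEl **bucket = &hash->table[hashCrc(name) & hash->mask];
struct hashEl *el = malloc(sizeof(*el));
if (el == NULL)
    abort();
el->name = name;
el->val = val;
el->next = *bucket;
*bucket = el;
return el;
}

struct hashEl *hashLookup(struct hash *hash, char *name)
/* Looks for name in hash table. Returns associated element,
 * if found, or NULL if not. */
{
struct hashEl *start = hash->table[hashCrc(name) & hash->mask];
struct hashEl *el;
for (el = start; el != NULL; el = el->next)
    {
    if (sameString(el->name, name))
        break;
    }
return el;
}
```

A caller would create a table with hashNew(4), add a few names with hashAdd, and then check the result of hashLookup against NULL before using it.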
15. Part of a Longer Routine
    struct mafAli *mafNext(struct mafFile *mf)
    /* Return next alignment in FILE or NULL if at end. */
    {
    struct lineFile *lf = mf->lf;
    struct mafAli *ali;
    char *line, *word;

    /* Loop until get an alignment paragraph or reach
     * end of file. */
    for (;;)
        {
        /* Get header line. If it's not there assume
         * end of file. */
        if (!nextLine(lf, &line))
            {
            lineFileClose(&mf->lf);
            return NULL;
            }

        /* Parse alignment header line. */
        word = nextWord(&line);
        if (word == NULL)
            continue;    /* Ignore blank lines. */
        if (sameString(word, "a"))
16. Another Header File
    struct dlNode
    /* An element on a doubly linked list. */
        {
        struct dlNode *next;    /* Pointer to next element. */
        struct dlNode *prev;    /* Pointer to previous element. */
        void *val;              /* Pointer to item in this node. */
        };

    struct dlList
    /* A doubly linked list. */
        {
        struct dlNode *head;        /* First member in list. */
        struct dlNode *nullMiddle;  /* Always NULL, shortens code. */
        struct dlNode *tail;        /* Last member in list. */
        };

    void dlRemove(struct dlNode *node);
    /* Removes a node from list. */

    void dlAddTail(struct dlList *list, struct dlNode *newNode);
    /* Add a node to tail of list. */

    int dlCount(struct dlList *list);
    /* Return length of list. */
17. Bottom Up Implementation
- Start with lowest level code. It can be tested
independently.
- It's much easier to write and test code if you
can trust the routines the code calls.
- Code needs to be tested in full immediately after
it is written, even while it is written.
- If not all of a new module's capabilities are
immediately used, write explicit test code for
them or wait to implement them until needed.
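As a small illustration of explicit test code for a low-level routine, the sketch below pairs a range-overlap helper with a test that exercises it on its own, before anything else calls it. The names rangeIntersection, checkEqual and testRangeIntersection are chosen for this example and are not taken from the slides.

```c
#include <assert.h>
#include <stdio.h>

int rangeIntersection(int start1, int end1, int start2, int end2)
/* Return size of the overlap between half-open ranges
 * [start1,end1) and [start2,end2). The result is zero or
 * negative if the ranges do not overlap. */
{
int start = (start1 > start2 ? start1 : start2);
int end = (end1 < end2 ? end1 : end2);
assert(start1 <= end1 && start2 <= end2);
return end - start;
}

static int checkEqual(char *label, int got, int expected)
/* Report a mismatch and return 1, or return 0 if values agree. */
{
if (got == expected)
    return 0;
fprintf(stderr, "FAIL %s: got %d, expected %d\n", label, got, expected);
return 1;
}

int testRangeIntersection(void)
/* Exercise rangeIntersection by itself. Return number of failures. */
{
int failCount = 0;
failCount += checkEqual("overlap", rangeIntersection(0, 10, 5, 20), 5);
failCount += checkEqual("contained", rangeIntersection(0, 10, 2, 4), 2);
failCount += checkEqual("touching", rangeIntersection(0, 10, 10, 20), 0);
failCount += checkEqual("disjoint", rangeIntersection(0, 10, 15, 20), -5);
return failCount;
}
```

Once a test like this passes, higher-level code that calls the helper can be debugged on the assumption that the helper itself is sound.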
18. Keep It Local
- It's easier to understand and modify programs
that primarily deal with local variables.
- Keep things local to a subroutine, module or
object whenever possible.
- Occasionally a global object that is set in one
place and read-only elsewhere is ok.
- Putting data in an object as opposed to global or
module-level variables is usually good.
- Inheritance delocalizes object-oriented programs.
Use inheritance with great care.
19. Code Defensively
- Check inputs, especially in library routines.
- Sprinkle asserts through your program to make
sure it is behaving as you think it should.
- It's better to fail hard, fast, and consistently
than to limp along erratically. Throw/catch.
- Turn on compiler warnings, especially for
uninitialized variables and missing returns.
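A minimal sketch of these habits in C follows. It checks its input before converting it, aborts with a message rather than returning a questionable value, and uses an assert to document what the earlier check guarantees. The names errAbort, isAllDigits and parseCount are illustrative choices for this example, not routines quoted from the slides.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

void errAbort(char *format, ...)
/* Print message to stderr and exit: fail hard, fast, consistently. */
{
va_list args;
va_start(args, format);
vfprintf(stderr, format, args);
va_end(args);
fputc('\n', stderr);
exit(EXIT_FAILURE);
}

int isAllDigits(char *s)
/* Return 1 if s is non-NULL, non-empty and made only of digits. */
{
if (s == NULL || *s == 0)
    return 0;
for (; *s != 0; ++s)
    {
    if (!isdigit((unsigned char)*s))
        return 0;
    }
return 1;
}

int parseCount(char *s)
/* Convert s to a non-negative int, or abort with a message. */
{
long val;
if (!isAllDigits(s))
    errAbort("parseCount: expecting digits, got '%s'",
        s == NULL ? "(null)" : s);
errno = 0;
val = strtol(s, NULL, 10);
if (errno == ERANGE || val > INT_MAX)
    errAbort("parseCount: '%s' is out of range", s);
assert(val >= 0);    /* Guaranteed by the digit check above. */
return (int)val;
}
```

With gcc or clang, compiling with -Wall turns on the uninitialized-variable and missing-return warnings mentioned on the slide.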
20. Debugging the Easy Stuff
- Single stepping through a program the first time
it is written is very worthwhile.
- Single stepping while debugging is often a waste
of time.
- The cause of many bugs is obvious once you find
out where the program died.
- Stack traces are quick, often helpful.
- top can tell you if the program is eating all
memory or is stuck in a loop.
- Check out the easy, obvious stuff first.
21. Debugging the Hard Stuff
- Isolate the minimum input needed to make the bug
happen.
- Run the program on different inputs to help
define the boundaries of the bug.
- Treat a bug as a logic puzzle, not an
embarrassment. Nobody's perfect!
- Put in print statements (possibly to a log file
if the volume is high) to make sure the program is
behaving as you think it should.
- If it's not a typo, it may well be a conceptual
bug of some sort.
22. Debugging Dynamic Memory
- Writing outside of bounds in dynamically
allocated memory can create crashes later.
- Writing to memory after it's freed, or freeing
memory twice, causes similar problems.
- Making malloc/new put sentinel values at start
and end, and keep a list of allocated blocks, can
be invaluable. Free/delete can check these, as
can a checkHeap routine.
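The sentinel idea can be sketched roughly as follows. Only the name checkHeap comes from the slide. The wrappers careAlloc and careFree, the struct memBlock layout, and the guard size and byte value are assumptions made for this example. The sketch catches overwrites just before or just after a block. It does not try to detect double frees or writes to freed memory, and it is not thread safe.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define GUARD_SIZE 8       /* Sentinel bytes on each side of user data. */
#define GUARD_BYTE 0xA5    /* Value every sentinel byte should hold. */

struct memBlock
/* Bookkeeping kept in front of each allocated block. */
    {
    struct memBlock *next, *prev;    /* Links in list of live blocks. */
    size_t size;                     /* Size the caller asked for. */
    };

/* Header plus start sentinel, rounded up to a multiple of 16 bytes
 * so user data stays suitably aligned on common platforms. */
#define HEADER_SIZE ((sizeof(struct memBlock) + GUARD_SIZE + 15) / 16 * 16)

static struct memBlock *blockList = NULL;    /* All live blocks. */

static int guardOk(unsigned char *guard)
/* Return 1 if all sentinel bytes at guard are untouched. */
{
int i;
for (i = 0; i < GUARD_SIZE; ++i)
    {
    if (guard[i] != GUARD_BYTE)
        return 0;
    }
return 1;
}

static int blockOk(struct memBlock *block)
/* Return 1 if sentinels before and after user data are intact. */
{
unsigned char *data = (unsigned char *)block + HEADER_SIZE;
return guardOk(data - GUARD_SIZE) && guardOk(data + block->size);
}

void *careAlloc(size_t size)
/* Allocate size bytes with sentinels and add block to live list. */
{
struct memBlock *block;
unsigned char *data;
assert(size <= (size_t)-1 - HEADER_SIZE - GUARD_SIZE);
block = malloc(HEADER_SIZE + size + GUARD_SIZE);
if (block == NULL)
    abort();
block->size = size;
block->prev = NULL;
block->next = blockList;
if (blockList != NULL)
    blockList->prev = block;
blockList = block;
data = (unsigned char *)block + HEADER_SIZE;
memset(data - GUARD_SIZE, GUARD_BYTE, GUARD_SIZE);
memset(data + size, GUARD_BYTE, GUARD_SIZE);
return data;
}

int checkHeap(void)
/* Return the number of live blocks with a damaged sentinel. */
{
int badCount = 0;
struct memBlock *block;
for (block = blockList; block != NULL; block = block->next)
    {
    if (!blockOk(block))
        ++badCount;
    }
return badCount;
}

void careFree(void *pt)
/* Check sentinels, unlink block from live list and free it. */
{
struct memBlock *block;
if (pt == NULL)
    return;
block = (struct memBlock *)((unsigned char *)pt - HEADER_SIZE);
if (!blockOk(block))
    abort();    /* Fail hard as soon as damage is seen. */
if (block->prev != NULL)
    block->prev->next = block->next;
else
    blockList = block->next;
if (block->next != NULL)
    block->next->prev = block->prev;
free(block);
}
```

Calling checkHeap at a few checkpoints while hunting a bug narrows down when the damage happens, since the count becomes nonzero right after the offending write rather than at some later crash.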
23. Modularize, But Not Too Much
- Keeping track of 100,000 lines of code is hard.
- Keeping track of 1,000 modules is even harder.
- Ideally a module encompasses a logically related
group of functions with a relatively simple
unchanging interface.
- If an interface needs constant changing it may
reflect over-modularization or grouping together
things that don't really belong together.
- It's safer to add a new routine than to bend an
old one.
24. When to Start Over
- A rewrite is often easier than an extensive
modification, at least if the interfaces are
clean.
- When doing a new thing of any complexity,
frequently you have to throw away the first
attempt or two.
- Think twice about a rewrite if the interface is
large or ill defined, and you have to maintain
backwards compatibility.
25. Thorough Testing
- Programmers test the code as it is written.
- Crucial code should be reviewed and tested by a
second programmer.
- As soon as there's a working skeleton, the
program should be regularly tested by
non-programming staff.
- For several months before product launch it's
worthwhile to have 3-6 testers from the user
community, and at least as many in-house testers
as programmers.
26. Modifying Existing Code
- Always start from a compiling/working base.
- Before changing functionality:
  - Obtain a test suite of existing functionality
  - Edit to increase locality as much as possible
  - Consolidate code into smaller subroutines with
    defined inputs and outputs
- Read comments and parenthesize misleading ones.
- As much as possible limit changes inside of
existing routines to a few lines.
- Rename things you modify to force yourself to
visit all places they are used.
27. Implementing a Small Change
- CVS update source tree to get in sync with
everyone else. Some people have cron do this
every morning at 3:00 am.
- Test code in your region before starting.
- Implement small change onto hgwdev-user.
- Test code in your region.
- Do a quick overall test of program.
- CVS commit. CVS update and resolve conflicts if
necessary.
- Test code in your region.
- Do a quick overall test of program.
- Tell people and/or RT about change and compile it
to genome-test.
28. Implementing a Medium Change
- Let genecats know what you're planning to do.
- CVS update at beginning. If it takes more than a
day, carefully CVS update at least once a week,
testing locally and globally before and after
each update.
- When you think you're done and it passes all your
tests, advise Heather and ask her to test on
hgwdev-user.
- When Heather thinks it's ok do one last CVS
update. Test. CVS commit. Test. Make on
genome-test. Announce on genecats.
29. Implementing a Large Change
- Make sure that Donna and genecats in general
agree that it is a good thing to do.
- Create a branch under CVS to isolate what you're
working on from other developers. Contact Mark
D. for help with branching.
- Test as you go as always. Be careful not to mix
up branch code with main code.
- When the large feature is implemented invite the
usual suspects to admire and test it before
merging in the branch.
- Work with a senior developer (Angie, Matt, Mark
D., Chuck or me) to merge the branch.
30. The Release Cycle
- Most developers most of the time are working
towards improving the code on genome-test.
- Periodically a release branch of the code is
made, currently by Matt, tested extra hard, and
then moved to hgwbeta, and eventually to
genome.
- Before the release branch try to polish and test
your code and be sure to commit it.
- Between branching and actual release please focus
on helping test other people's code at least as
much as developing new features.
31. Conclusions
- Building and maintaining a large multiprogrammer
project is a skill nobody is born with.
- Keeping people informed of what you are up to and
treating them with respect is essential.
- Readable modular code and sensible version
control and testing can keep things working as
they grow.