Version 19 (modified by flip, 11 years ago)


Coding Standards


Coding standards are one part of a development protocol. Other parts include unit testing and code reviews. This document covers only coding standards.


Following a coding standard is like handwashing in a hospital: both require discipline. Following the protocol takes more time than ignoring it, and it's pretty difficult to associate a particular negative incident (disease transmission or a software bug) with a particular instance of failing to follow the protocol. Nevertheless, every instance of corner-cutting increases the probability of a bad outcome somewhere down the line.

Development protocols attempt to avoid bad outcomes by reining in software complexity. Entropy kills projects, and the Second Law of Thermodynamics is as true in the software world as it is in the natural world. Perhaps you've heard the maxim that the first 90% of a project takes 90% of the time and the last 10% takes the other 90% of the time. That doesn't have to be true, but it often is. On some projects, that last 10% of development is like a game of whack-a-mole. Smack a bug here, another pops up there.

In fact, if a project is messy enough the last 10% never gets completed. All effort gets sucked into fixing bugs and inadvertently creating new ones. Eventually one faces the choice of shipping something that's only 90% complete or not shipping at all.

Those are some really bad outcomes. They can be avoided, but only by deliberate action. The only thing you get without effort is entropy.

That said, here are the most obvious pressures on this project that call for rigor in software development.

  • This project is being written by a team of 4+ people who are across the country from one another.
  • Only one of them is strong in the project's primary language (Python).
  • This project will subsume GAVA, Vespa and Matpulse. Software complexity usually grows exponentially in relation to size, so this project's complexity will exceed not only the individual projects but also the sum of the individual projects. That's a lot to manage!
  • A larger project needs a long lifespan to justify the effort put into it, and a longer lifespan increases the odds that (a) someone totally new will join the project and need to understand the code and (b) the code will need to be modified and/or expanded in the future.
  • The more people involved, the greater the odds that others will read, use and modify code that you write.
  • The end result needs to be clean enough to encourage outsiders to contribute.

Words of Wisdom from the Masters

"Controlling complexity is the essence of computer programming."

—Brian Kernighan

"Let us change our traditional attitude to the construction of programs: Instead of imagining that our main task is to instruct a computer what to do, let us concentrate rather on explaining to human beings what we want a computer to do."

—Donald Knuth

"Readability counts."

—Tim Peters in PEP 20.

This last quote is about design (aircraft design, actually) rather than code, but it is one of my favorites.

"It seems that perfection is reached not when there is nothing left to add, but when there is nothing left to take away".

—Antoine de Saint Exupéry

It's Not About You

The guidelines below are intended to help you write code that's easier for others to work with. They're not about making your life easy. If you think about it, that makes sense: there are a lot more of them than there are of you.

Be kind to them! They, in turn, will be kind to you.

And you never know, five years down the road it might be you who has to read that long-forgotten code. You'll be glad, then, that you considered the reader when you wrote it.

In General

  • Magic numbers are unacceptable.

  • As a generalization of the above, DRY (Don't Repeat Yourself) is a valuable concept.

  • Comment your Subversion commits.

  • Avoid abbreviations in variable, function, file and class names. There's usually more than one "obvious" way to abbreviate a word or phrase, so if you're not the author of the code (or sometimes even if you are the author of the code) it's hard to remember what abbreviation was used.

    For instance, if you're looking at a variable representing "metabolite description", the author could name it metabolite_desc or metabolite_descr or mdescription or m_desc or mdescr or md. Python requires a bit more care in this area than compiled languages (like C) since compilers complain about undeclared variables whereas Python will happily accept something like this:

    # Code added by person A
    mdesc = [1, 2, 3]

    # ...several pages of code here...

    # Code added by person B months later -- see the bug?
    if erase_previous_data:
        mdsc = None

There's also the benefit that longer variable names help to document the code. The name mdesc could mean "mule desecration" for all I know, whereas metabolite_description carries meaning.

Yes, using unabbreviated variable names makes it harder to respect PEP 8's recommendation of limiting lines to a maximum of 79 characters.

Standard abbreviations are acceptable, like fft for Fast Fourier Transform, or ppm for parts per million. Obviously, "standard" is a weasel word that doesn't really say what's OK and what's not. There's no hard and fast rule; we'll have to judge on a case-by-case basis.

Here are some questions to ask when you're trying to decide whether or not an abbreviation is OK:

  • Does the abbreviation appear more commonly than the expanded form?
  • Is my audience (i.e. those reading the code) likely to be familiar with the abbreviation?
  • Will I save a lot of typing by abbreviating?
  • Don't be shy about using parentheses to clarify operator precedence. e.g.

    This works:

        z = something * PI - something_else / FUDGE_FACTOR

    This works and makes your intent clear:

        z = (something * PI) - (something_else / FUDGE_FACTOR)

  • Don't put redundant information in names. For instance, in a Person class it is unnecessary to call the attributes person_name, person_address, etc. Simply use name and address instead. Similarly, if a file is part of the Analysis project, there's no reason to put the project's name in the file name. The rest of the name will suffice.

    As a bonus, the simpler name will still make sense if the project's name changes or is merged with another project.

  • All of our source code should be straight ASCII. Be careful about copying & pasting text from MS Word that contains curly quotes or em/en dashes.

    If you're ever confronted with a choice as to what non-ASCII encoding to use, choose utf-8.

  • Always use / as the path separator. Microsoft operating systems accept both \ and / (since DOS 2.0 according to this discussion). It's only the DOS command line that hiccups on /. By contrast, backslash as a path separator only works under Windows and is an escape character in Python strings.

  • If you come across (or write) some code that is or may be broken, fix it. If the fix isn't obvious or you don't have time, add a comment containing the string FIXME (no space!) and a brief explanation of what you think is wrong. e.g.

        if film == HOLY_GRAIL:
            pass
        elif film == LIFE_OF_BRIAN:
            pass
        elif film == HOLLYWOOD_BOWL:
            pass
        # FIXME - need an else statement; how to handle unexpected cases?
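
To illustrate the first two guidelines in this list (no magic numbers, and DRY), here's a minimal sketch. The constant names, their values and the function are hypothetical and aren't taken from our code; the point is only that each number gets a name and is defined in exactly one place.

```python
# Hypothetical example -- these names and values are for illustration only.
SECONDS_PER_MINUTE = 60
MINUTES_PER_HOUR = 60

# Derived from the constants above rather than typed in again as 3600.
SECONDS_PER_HOUR = SECONDS_PER_MINUTE * MINUTES_PER_HOUR


def hours_to_seconds(hours):
    """Convert a duration in hours to seconds."""
    return hours * SECONDS_PER_HOUR
```

A reader who sees SECONDS_PER_HOUR in a calculation knows what it is; a reader who sees a bare 3600 has to guess.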

C and C++

C++ coding standards in detail: CppCodingStandards


Python

  • Duck typing is an important and valuable concept in Python that can feel strange if you're used to statically typed languages.

  • The corollary -- if you find yourself using type() or isinstance(), that's usually a sign of unPythonic code.
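
Here's a small sketch of what duck typing looks like in practice. The function total_length() is made up for this example; it never asks what type it was given, it just uses the object the way it needs to.

```python
def total_length(items):
    """Return the combined length of all the elements in items.

    Works with anything that can be iterated over and whose elements
    support len(). No type() or isinstance() checks are needed.
    """
    return sum(len(item) for item in items)


from_list = total_length(["ab", "cde"])      # a list of strings
from_tuple = total_length(([1], [2, 3]))     # a tuple of lists
from_generator = total_length(word for word in "a bb ccc".split())
```

If someone later passes in an object that can't be iterated, Python raises a TypeError at the point of use, which is usually all the type checking that's needed.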

  • Our project will require a minimum Python version of 2.5, so any language features (like the ternary operator) or libraries (like sqlite3 or ctypes) that are in 2.5 are fair game.

  • If you're new to Python, use an editor with decent code highlighting so that it tells you when you're using a Python keyword as a variable name.

  • PEP 8 is worth following. The main things to remember are CamelCase for class names and lower_with_underscores for variable names. Filenames should be all lower case since the filesystems on some of our target operating systems are not case-sensitive.

    Note that PEP 8 observes, "The naming conventions of Python's library are a bit of a mess...". It's true! The standard library is unfortunately not always a good example to follow.

    PEP 20 is also worth a read as it's really short.

  • Never use the idiom from some_package import *. It has a couple of disadvantages. For one, it clutters up your local namespace and can even lead to one module stepping on another's variables.

    The other huge disadvantage is that it makes one's code difficult to read. If the code imports * from, say, five modules and then calls a function foo(), the person reading the code has to guess if the function is local, and if not, then which one of the five imported modules contains it.

    This is also true to a lesser extent for from some_package import xyz where xyz is a function. If I see a call to xyz() in the code, I have to look around to see whether it is a local function or an imported one. By contrast, when I see some_package.xyz() in the code, I know exactly where that function comes from.

    If you find that you're importing some package with an inconveniently long name, make use of Python's as keyword:

        import xml.etree.ElementTree as ElementTree

    Be mindful of creating obscure abbreviations, however:

        import some_complicated_math_library.curves.splines as sp

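
Going back to the point about knowing where a function comes from, here's a short sketch using the standard library's ElementTree. The XML string is just sample data invented for the example.

```python
import xml.etree.ElementTree as ElementTree

# The qualified name makes it obvious that fromstring() belongs to
# ElementTree and isn't a local function.
root = ElementTree.fromstring("<films><film>Holy Grail</film></films>")

# Collect the text of each <film> element directly under the root.
titles = [film.text for film in root.findall("film")]
```
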
  • Python booleans are True and False, not 1 and 0. Be aware of this when you're porting code from languages that don't have a native Boolean type. Some examples include IDL, C, Fortran and possibly Matlab. They usually use 1 and 0 to represent true and false. (C++ has a native boolean type.)

    Note that it's OK to treat 1 and 0 as booleans in expressions, just don't assign them as booleans.

    For instance, if a variable (received from a C function for instance) has a value of 1 or 0 it is perfectly acceptable to do this:

        if some_c_library.function_that_returns_one_or_zero():

    It would be unPythonic, however, to do this:

        def on_foo_checkbox_clicked(self):
            self.foo_is_on = 1  # should be True, not 1

    As a specific application of duck typing, it's usually unPythonic to explicitly test for True and False. Note that all of these evaluate to False:

        bool(None)  # None
        bool(0)     # zero
        bool("")    # empty string
        bool([ ])   # empty list
        bool(( ))   # empty tuple
        bool({ })   # empty dict

    All of these evaluate to True:

        bool(n) where n is a non-zero number
        bool(s) where s is a non-empty string
        bool(z) where z is a non-empty sequence (tuple or list)
        bool(m) where m is a non-empty mapping (dict)
        bool(o) where o is an object other than None (unless its class defines its own truth test)

    Historical note: the values True and False weren't added to Python until version 2.2.1 (the bool type itself arrived in 2.3) so you might see some Python code -- esp. Python library code which must remain compatible with very old versions -- using 1 and 0 instead of True and False.
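
In practice, Python's truth rules mean that one simple boolean test usually covers several cases at once. A sketch, using a made-up function name:

```python
def describe(metabolites):
    """Return a short summary of a collection of metabolite names."""
    # One test covers None, an empty list and an empty tuple. There's no
    # need for "metabolites == None or len(metabolites) == 0".
    if not metabolites:
        return "no metabolites"
    return "%d metabolite(s)" % len(metabolites)


empty_case = describe([])
none_case = describe(None)
full_case = describe(["creatine", "choline"])
```
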
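  • In Python 3, the / operator always performs true division, whereas in Python 2 dividing one integer by another performs floor division (7 / 2 is 3). To get the Python 3 behavior under Python 2, a module needs this line at the top:

        from __future__ import division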

And then we need to review the use of division in those modules to ensure we're not breaking them.

We can either pay this cost now, or pay it later when we want to move to Python 3 and there's a lot more code to review and fix.
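
Reviewing division mostly means deciding, for each /, whether the code wants a whole number or a fraction, and then saying so explicitly. The sketch below uses hypothetical function names; written this way, the results are the same under Python 2 (with or without the __future__ import) and under Python 3.

```python
def midpoint_index(sequence):
    """Return the index of the middle element of a sequence."""
    # // is always floor division, so the result is usable as an index.
    return len(sequence) // 2


def mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    # Converting one operand to float forces true division everywhere.
    return sum(values) / float(len(values))


middle = midpoint_index("abcde")
average = mean([1, 2])
```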

  • Python 2.2 introduced improved classes; these are called (rather unfortunately) "new"-style classes. Old-style classes are gone completely in Python 3. Our classes should always be new-style classes. To create a new-style class, inherit from object. e.g. this:

        class TransformThingy(object):

    not this:

        class TransformThingy():

  • Python has the identity operator "is". It means "are these objects the same object" rather than "are they equivalent". Probably the only time you'll need to use it is when comparing something to None.

        if foo is None:

    Since we prefer to perform simple boolean tests, the need to check explicitly for None (as opposed to False) might indicate a problem somewhere upstream, as this would be better:

        if not foo:

Sometimes an explicit test for None is unavoidable, however.

In short, the admonition against "is" is similar to that against isinstance(), although less strong. If you find yourself using it, it's often a sign of a design flaw.
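
The difference between identity and equivalence, and one case where an explicit test for None really is needed, can be sketched like this. All of the names here are invented for the example.

```python
first = [1, 2, 3]
second = [1, 2, 3]
alias = first

same_contents = (first == second)   # equivalent: the values match
same_object = (first is second)     # two separate list objects
aliased = (first is alias)          # two names for one object


def scale(value, factor=None):
    """Multiply value by factor; the factor defaults to 2 when omitted."""
    # 0 is a legitimate factor, so "if not factor" would wrongly
    # replace it with the default. Only None means "not supplied".
    if factor is None:
        factor = 2
    return value * factor
```

Here a plain boolean test can't tell "no factor was given" apart from "the factor is zero", which is why the explicit comparison to None is justified.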

  • Don't underestimate what you can learn from testing concepts in the Python interpreter. For instance, if you can't remember the rules for taking a slice of a string from the end, try it out in the Python interpreter:
        $ python
        Python 2.5.1 (r251:54863, Nov 17 2007, 21:19:53) 
        [GCC 4.0.1 (Apple Computer, Inc. build 5367)] on darwin
        Type "help", "copyright", "credits" or "license" for more information.
        >>> "abcde"[:-2]