Changes between Initial Version and Version 1 of CodingStandards

Jul 16, 2009, 7:13:14 PM (11 years ago)



     5Coding Standards
     11Coding standards are one part of a development protocol. Other parts include
     12unit testing and code reviews. This document covers only coding standards.
     18Following a coding standard is like handwashing in a hospital:
     19both require discipline. Following the
     20protocol takes more time than ignoring it, and it's pretty difficult to
     21associate a particular negative incident (disease transmission or a software
     22bug) with a particular instance of failing to follow the protocol.
     23Nevertheless, every instance of corner-cutting increases the
     24probability of a bad outcome somewhere down the line.
     26Development protocols attempt to avoid bad outcomes by reining in software
     27complexity. Entropy kills projects, and the Second Law of Thermodynamics is as
     28true in the software world as it is in the natural world. Perhaps you've heard
     29the maxim that the first 90% of a project takes 90% of the time and the last
     3010% takes the other 90% of the time. That doesn't have to be true, but it
     31often is. On some projects, that last 10% of development is like a game of
     32whack-a-mole. Smack a bug here, another pops up there.
     34In fact, if a project is messy enough the last 10% never gets completed. All
     35effort gets sucked into fixing bugs and inadvertently creating new ones.
     36Eventually one faces the choice of shipping something that's only 90% complete
     37or not shipping at all.
     39Those are some really bad outcomes. They can be avoided, but only by deliberate
     40action. **The only thing you get without effort is entropy.**
     42That said, here are the most obvious pressures on this project that call for
     43software development rigor:
     45- This project is being written by a team of 4+ people who are across the
     46  country from one another.
     48- Only one of them is strong in the project's primary language (Python).
     50- This project will subsume GAVA, Vespa and Matpulse. Software complexity
     51  tends to grow faster than linearly with size, so this project's complexity
     52  will exceed not only that of each individual project but also the *sum* of
     53  their complexities. That's a lot to manage!
     55- A larger project needs a long lifespan to justify the effort put into it,
     56  and a longer lifespan increases the odds that (a) someone totally new will
     57  join the project and need to understand the code and (b) the code will need
     58  to be modified and/or expanded in the future.
     60- The more people involved, the greater the odds that others will read, use
     61  and modify code that you write.
     63- The end result needs to be clean enough to encourage outsiders to
     64  contribute.
     67Words of Wisdom from the Masters
     70  "Controlling complexity is the essence of computer programming."
     72  -- Brian Kernighan
     74  "Let us change our traditional attitude to the construction of programs:
     75  Instead of imagining that our main task is to instruct a computer what to
     76  do, let us concentrate rather on explaining to human beings what we want a
     77  computer to do."
     79  -- Donald Knuth
     81  "Readability counts."
     83  -- Tim Peters in PEP 20.
     85This last quote is about design (aircraft design, actually) rather than code,
     86but it is one of my favorites.
     88  "It seems that perfection is reached not when there is nothing left to add,
     89  but when there is nothing left to take away."
     91  -- Antoine de Saint-Exupéry
     94It's Not About You
     97The guidelines below are intended to help you write code that's easier
     98for *others* to work with. They're not about making *your* life easy.
     99If you think about it, that makes sense: there are a lot more of them
     100than there are of you.
     102Be kind to them! They, in turn, will be kind to you.
     104And you never know, five years down the road it might be you
     105who has to read that long-forgotten code. You'll be glad, then, that
     106you considered the reader when you wrote it.
     110In General
     113- `Magic numbers <>`_
     114  are unacceptable.
     116- As a generalization of the above,
     117  `DRY <>`_ is a
     118  valuable concept.
     120- Comment your Subversion commits.
     122- Avoid abbreviations in variable, function, file and class names. There's
     123  usually more than one "obvious" way to abbreviate a word or phrase, so if
     124  you're not
     125  the author of the code (or sometimes even if you *are* the author of the
     126  code) it's hard to remember what abbreviation was used.
     128  For instance, if you're looking at a variable representing "metabolite
     129  description", the author could name it metabolite_desc or metabolite_descr
     130  or mdescription or m_desc or mdescr or md. Python requires a bit more
     131  care in this area than compiled languages (like C) since compilers complain
     132  about undeclared variables whereas Python will happily accept something like
     133  this:
     134  ::
     136    # Code added by person A
     137    mdesc = [1, 2, 3]
     139    # ...several pages of code here...
     141    # Code added by person B months later -- see the bug?
     142    if erase_previous_data:
     143        mdsc = None
     146  There's also the benefit that longer variable names help to document the code.
     147  The name `mdesc` could mean "mule desecration" for all I know, whereas
     148  `metabolite_description` carries meaning.
     150  Yes, using unabbreviated variable names makes it harder to respect PEP 8's
     151  recommendation of limiting lines to a maximum of 79 characters. The clarity is worth it.
     153  Standard abbreviations are acceptable, like *fft* for Fast Fourier
     154  Transform, or *ppm* for parts per million. Obviously, "standard" is a weasel
     155  word that doesn't really say what's OK and what's not. There's no hard and
     156  fast rule; we'll have to judge on a case-by-case basis.
     158  Here are some questions to ask when you're trying to decide whether or not an
     159  abbreviation is OK:
     161  - Does the abbreviation appear more commonly than the expanded form?
     162  - Is my audience (i.e. those reading the code) likely to be familiar with
     163    the abbreviation?
     164  - Will I save a lot of typing by abbreviating?
     166- Don't be shy about using parentheses to clarify operator precedence.
     168  This works:
     169  ::
     171     z = something * PI - something_else / FUDGE_FACTOR
     173  This works and is easier to read:
     174  ::
     176     z = (something * PI) - (something_else / FUDGE_FACTOR)
     178  .. 
     180- Don't put redundant information in names. For instance, in a Person class it
     181  is unnecessary to call the attributes ``person_name``, ``person_address``,
     182  etc. Simply use ``name`` and ``address`` instead. Similarly,
     183  if a file is part of the pyvespa
     184  project, there's no reason to name the file ````. Just
     185  ```` will suffice.
     187  As a bonus, the simpler name will still make
     188  sense if the project is renamed
     189  or merged with another project.
     191- All of our source code should be straight ASCII. Be careful about copying &
     192  pasting text from MS Word that contains curly quotes or em/en dashes.
     194  If you're ever confronted with a choice as to what non-ASCII encoding to
     195  use, choose utf-8.
     197  Related note: there are files in PyVespa that have been generated by wxGlade
     198  that contain this Python metacomment:
     199  ::
     201     # -*- coding: iso-8859-15 -*-
     203  Please change this to comply with PEP 8. In practical terms, this means use
     204  ASCII unless you need non-ASCII characters, in which case use utf-8. It'd be
     205  nice if wxGlade would not output the encoding metacomment at all for ASCII
     206  files, but I don't know if we can control that.
     208- Always use / as the path separator. Microsoft operating systems accept both
     209  \\ and / (since DOS 2.0 `according to this discussion
     210  <>`_).
     211  It's only the DOS command
     212  line that hiccups on /. By contrast, backslash as a
     213  path separator only works under Windows and is an escape character in Python
     214  strings.
     216- If you come across (or write) some code that is or may be broken, fix it. If
     217  the fix isn't obvious or you don't have time, add a comment containing the
     218  string FIXME (no space!) and a brief explanation of what you
     219  think is wrong. e.g.
     220  ::
     222        if film == HOLY_GRAIL:
     223            bring_out_your_dead()
     224        elif film == LIFE_OF_BRIAN:
     225            look_on_bright_side()
     226        elif film == HOLLYWOOD_BOWL:
     227            albatross()
     228        # FIXME - need an else statement; how to handle unexpected cases?
     230  ..
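The magic-number and DRY guidelines at the top of the list above can be sketched roughly like this. It is only an illustration: the constant and function names are hypothetical and are not taken from the project's code.

```python
# Hypothetical example; none of these names come from the project.

# Before: the bare 1e6 is a magic number, and it would be repeated
# wherever the conversion is needed.
#     ppm = (offset_hz / spectrometer_frequency_hz) * 1e6

# After: the constant has a name and each conversion lives in one place.
PARTS_PER_MILLION = 1e6


def hertz_to_ppm(offset_hz, spectrometer_frequency_hz):
    """Convert a frequency offset in Hz to parts per million."""
    return (offset_hz / float(spectrometer_frequency_hz)) * PARTS_PER_MILLION


def ppm_to_hertz(offset_ppm, spectrometer_frequency_hz):
    """Convert an offset in parts per million back to Hz."""
    return (offset_ppm / PARTS_PER_MILLION) * spectrometer_frequency_hz
```

If the scale factor or the formula ever needs to change, there is exactly one place to change it.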
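The path separator guideline in the list above can be illustrated with a short sketch. The directory and file names are made up for the example. ``posixpath.join`` is used instead of ``os.path.join`` because it joins with / on every platform.

```python
import posixpath

# Hypothetical directory and file names, for illustration only.
DATA_DIRECTORY = "resources/basis_sets"

# Forward slashes need no escaping in a Python string.
good_path = DATA_DIRECTORY + "/" + "new_basis.xml"

# posixpath.join always joins with "/", regardless of the host platform.
also_good = posixpath.join(DATA_DIRECTORY, "new_basis.xml")

# Backslashes are escape characters in Python strings: the "\n" below is
# a newline, not a backslash followed by the letter n.
broken_path = "resources\new_basis.xml"
```

The last line shows how a backslash path can break silently: the string is valid Python, but it does not contain the path the author intended.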
     232C and C++
     235- Compile with `-Wall` on and emit no warnings.
     238Python
     241- `Duck typing <>`_ is an important
     242  and valuable concept in Python that can feel strange if
     243  you're used to statically typed languages.
     245- The corollary -- if you find yourself using ``type()`` or
     246  ``isinstance()``, that's usually a sign of unPythonic code.
     249- Our project will require a minimum Python version of 2.5, so any language
     250  features (like the ternary operator) or libraries (like sqlite3 or ctypes) that
     251  are in 2.5 are fair game.
     253- If you're new to Python, use an editor with decent code highlighting so that
     254  it tells you when you're using a Python keyword as a variable name.
     256- PEP 8 is worth following. The main
     257  things to remember are CamelCase for class names and lower_with_underscores
     258  for variable names. Filenames should be all lower case since the filesystems
     259  on some of our target operating systems are not case-sensitive.
     261  Note that PEP 8 observes, "The naming conventions of Python's library are a
     262  bit of a mess...". It's true! The standard library is unfortunately not always
     263  a good example to follow.
     265  PEP 20 is also worth a read as it's really short.
     267- Never use the idiom ``from some_package import *``. It has a couple of
     268  disadvantages. For one, it clutters up your local namespace and can even lead
     269  to one module stepping on another's variables.
     271  The other huge disadvantage is that it makes one's code difficult to read.
     272  If the code
     273  imports * from, say, five modules and then calls a function ``foo()``,
     274  the person reading the code has to guess if the function is local, and
     275  if not, then which one of the five imported modules contains it.
     277  This is also true to a lesser extent for ``from some_package import xyz`` where
     278  xyz is a function. If I see a call to ``xyz()`` in the code, I have to look
     279  around
     280  to see whether it is a local function or an imported one. By contrast, when I
     281  see ```` in the code, I know exactly where that function comes
     282  from.
     284  If you find that you're importing some package with an inconveniently long
     285  name, make use of Python's ``as`` keyword:
     286  ::
     288   import xml.etree.ElementTree as ElementTree
     290  Be mindful of creating obscure abbreviations, however:
     291  ::
     293   import some_complicated_math_library.curves.splines as sp
     295  ..
     297- Python booleans are True and False, not 1 and 0. Be aware of this when you're
     298  porting code from languages that don't have a native Boolean type.
     299  Some examples include IDL, C (before C99) and possibly Matlab. They usually
     300  use 1 and 0 to represent
     301  true and false. (C++ and Fortran have native boolean types.)
     303  Note that it's OK to treat 1 and 0 as booleans in expressions; just don't
     304  *assign* them as booleans.
     306  For instance, if a value received from a C function
     307  is 1 or 0, it is perfectly acceptable to do this:
     308  ::
     310    if some_c_library.function_that_returns_one_or_zero():
     311        do_something()
     313  It would be unPythonic, however, to do this:
     314  ::
     316    def on_foo_checkbox_clicked(self):
     317        self.foo_is_on = 1  # should be True, not 1
     319  As a specific application of duck typing, it's usually unPythonic to
     320  explicitly test for True and
     321  False. Note that all of these evaluate to False:
     322  ::
     324        bool(None)
     325        bool("")    # empty string
     326        bool([ ])   # empty list
     327        bool(( ))   # empty tuple
     328        bool({ })   # empty dict
     329        bool(0)
     331  All of these evaluate to True:
     332  ::
     334        bool(n) where n is a non-zero number
     335        bool(s) where s is a non-empty string
     336        bool(z) where z is a non-empty sequence (tuple or list)
     337        bool(m) where m is a non-empty mapping (dict)
     338        bool(o) where o is another object whose class defines no __nonzero__ or __len__
     340  Historical note: the values True and False weren't added to Python until
     341  the 2.x series (in 2.2.1, with the bool type following in 2.3), so you might
     342  see some Python code, especially library code which must remain compatible
     343  with very old versions, using 1 and 0 instead of True and False.
     345- To prepare for Python 3.0, we need to `explicitly use "true"
     346  division <>`_.
     348  In order to do so, we need to add this to every module that uses division:
     349  ::
     351    from __future__ import division
     353  And then we need to review the use of division in those modules
     354  to ensure we're not breaking them.
     356  We can either pay this cost now, or pay it later when we want to move to
     357  Python 3 and there's a lot more code to review and fix.
     360- Python 2.2 introduced improved classes; these are called (rather
     361  unfortunately) "new"-style classes. Old-style classes are gone completely
     362  in Python 3. Our classes should always be new-style classes. To create a
     363  new-style class, inherit from object. e.g. this:
     364  ::
     366    class TransformThingy(object):
     368  not this:
     369  ::
     371    class TransformThingy():
     373  ..
     376- Python has the identity operator "is". It means "are these objects the same
     377  object" rather than "are they equivalent". Probably the only time you'll need
     378  to use it is when comparing something to None.
     379  ::
     381       if foo is None:
     382           do_something()
     384  Since we prefer to perform simple boolean tests, the need to check explicitly
     385  for None (as opposed to False) might indicate a problem somewhere upstream, as
     386  this would be better:
     387  ::
     389      if not foo:
     390         do_something()
     392  Sometimes an explicit test for None is unavoidable, however.
     394  In short, the admonition against "is" is similar to that against
     395  ``isinstance()``, although less strong. If you find yourself using it, it's
     396  often a sign of a design flaw.
     399- Don't underestimate what you can learn from testing concepts in the Python
     400  interpreter. For instance, if you can't remember the rules
     401  for taking a slice of a string from the end, try it out in the Python
     402  interpreter:
     403  ::
     405        $ python
     406        Python 2.5.1 (r251:54863, Nov 17 2007, 21:19:53)
     407        [GCC 4.0.1 (Apple Computer, Inc. build 5367)] on darwin
     408        Type "help", "copyright", "credits" or "license" for more information.
     409        >>> "abcde"[:-2]
     410        'abc'
     411        >>>
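As a sketch of the duck typing guideline at the top of this list, compare the two functions below. Both function names are hypothetical and exist only for this example. The first checks the argument's type and so rejects inputs that would have worked fine. The second simply uses the argument.

```python
# Hypothetical helper names, for illustration only.

def total_length_type_checked(items):
    # UnPythonic: rejects tuples, generators and any other iterable,
    # even though the expression below would work with all of them.
    if not isinstance(items, list):
        raise TypeError("items must be a list")
    return sum(len(item) for item in items)


def total_length(items):
    # Duck typing: anything iterable whose elements support len() works.
    return sum(len(item) for item in items)
```

With these definitions, ``total_length(("ab", [1, 2, 3]))`` works, while the type-checked version raises ``TypeError`` for the same tuple.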
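For the true division guideline earlier in this list, here is a small sketch of what the review might involve. The variable names are made up. The ``__future__`` import has to be the first statement in the module, and the example assumes that code relying on the old truncating behavior should be switched to ``//``.

```python
from __future__ import division  # must come before other statements

# With true division enabled, / always returns the mathematical quotient.
half = 1 / 2          # 0.5 (without the import, Python 2 gives 0)

# Use // wherever the old truncating behavior is what the code relies on,
# for example when computing an index.
samples = [10, 20, 30, 40, 50]
middle_index = len(samples) // 2
middle_sample = samples[middle_index]
```

Index calculations like the one above are a typical thing to look for during the review, since a float index raises an error.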
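The guideline on ``is`` earlier in this list can be illustrated as follows. The function name and its argument are hypothetical. The first part shows identity versus equivalence. The second shows one case where an explicit test for None is hard to avoid, because 0 is a legitimate value that a simple boolean test would treat as missing.

```python
# Identity versus equivalence.
first = [1, 2, 3]
second = [1, 2, 3]
same_value = (first == second)    # True: equivalent contents
same_object = (first is second)   # False: two distinct list objects


# Hypothetical function, for illustration only.
def describe_line_width(line_width=None):
    # 0 is a legitimate width here, so "if not line_width" would wrongly
    # treat it as missing. This is a case where "is None" is needed.
    if line_width is None:
        return "no line width given"
    return "line width is %s Hz" % line_width
```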