Changes between Initial Version and Version 1 of CodingStandards


Timestamp:
Jul 16, 2009, 7:13:14 PM (11 years ago)
Author:
flip
Comment:

--

  • CodingStandards

    v1 v1  
     1{{{
     2#!rst
     3
     4================
     5Coding Standards
     6================
     7
     8
     9About
     10-----
     11Coding standards are one part of a development protocol. Other parts include
     12unit testing and code reviews. This document covers only coding standards.
     13
     14
     15Why?
     16--------------------------
     17
     18Following a coding standard is like handwashing in a hospital:
     19both require discipline. Following the
     20protocol takes more time than ignoring it, and it's pretty difficult to
     21associate a particular negative incident (disease transmission or a software
     22bug) with a particular instance of failing to follow the protocol.
     23Nevertheless, every instance of corner-cutting increases the
     24probability of a bad outcome somewhere down the line.
     25
     26Development protocols attempt to avoid bad outcomes by reining in software
     27complexity. Entropy kills projects, and the Second Law of Thermodynamics is as
     28true in the software world as it is in the natural world. Perhaps you've heard
     29the maxim that the first 90% of a project takes 90% of the time and the last
     3010% takes the other 90% of the time. That doesn't have to be true, but it
     31often is. On some projects, that last 10% of development is like a game of
     32whack-a-mole. Smack a bug here, another pops up there.
     33
     34In fact, if a project is messy enough the last 10% never gets completed. All
     35effort gets sucked into fixing bugs and inadvertently creating new ones.
     36Eventually one faces the choice of shipping something that's only 90% complete
     37or not shipping at all.
     38
     39Those are some really bad outcomes. They can be avoided, but only by deliberate
     40action. **The only thing you get without effort is entropy.**
     41
     42That said, here are the most obvious pressures on this project that call for
     43software development rigor.
     44
     45- This project is being written by a team of 4+ people who are across the
     46  country from one another.
     47
     48- Only one of them is strong in the project's primary language (Python).
     49
     50- This project will subsume GAVA, Vespa and Matpulse. Software complexity
     51  usually grows exponentially in relation to size, so this project's complexity
     52  will exceed not only the individual projects but also the *sum* of the
     53  individual projects. That's a lot to manage!
     54
     55- A larger project needs a long lifespan to justify the effort put into it,
     56  and a longer lifespan increases the odds that (a) someone totally new will
     57  join the project and need to understand the code and (b) the code will need
     58  to be modified and/or expanded in the future.
     59
     60- The more people involved, the greater the odds that others will read, use
     61  and modify code that you write.
     62
     63- The end result needs to be clean enough to encourage outsiders to
     64  contribute.
     65
     66
     67Words of Wisdom from the Masters
     68--------------------------------
     69
     70  "Controlling complexity is the essence of computer programming."
     71 
     72  -- Brian Kernighan
     73
     74  "Let us change our traditional attitude to the construction of programs:
     75  Instead of imagining that our main task is to instruct a computer what to
     76  do, let us concentrate rather on explaining to human beings what we want a
     77  computer to do."
     78 
     79  -- Donald Knuth
     80
     81  "Readability counts."
     82 
     83  -- Tim Peters in PEP 20.
     84
     85This last quote is about design (aircraft design, actually) rather than code,
     86but it is one of my favorites.
     87
     88  "It seems that perfection is reached not when there is nothing left to add,
     89  but when there is nothing left to take away."
     90   
     91  -- Antoine de Saint-Exupéry
     92
     93
     94It's Not About You
     95--------------------
     96
     97The guidelines below are intended to help you write code that's easier
     98for *others* to work with. They're not about making *your* life easy.
     99If you think about it, that makes sense: there are a lot more of them
     100than there are of you.
     101
     102Be kind to them! They, in turn, will be kind to you.
     103
     104And you never know, five years down the road it might be you
     105who has to read that long-forgotten code. You'll be glad, then, that
     106you considered the reader when you wrote it.
     107
     108}}}
     109
     110In General
     111----------
     112
     113- `Magic numbers <http://en.wikipedia.org/wiki/Magic_number_(programming)#Unnamed_numerical_constants>`_
     114  are unacceptable.
     115 
     116- As a generalization of the above,
     117  `DRY <http://en.wikipedia.org/wiki/Don%27t_repeat_yourself>`_ is a
     118  valuable concept.
     119
     120- Comment your Subversion commits.
     121
     122- Avoid abbreviations in variable, function, file and class names. There's
     123  usually more than one "obvious" way to abbreviate a word or phrase, so if
     124  you're not
     125  the author of the code (or sometimes even if you *are* the author of the
     126  code) it's hard to remember what abbreviation was used.
     127
     128  For instance, if you're looking at a variable representing "metabolite
     129  description", the author could name it metabolite_desc or metabolite_descr
     130  or mdescription or m_desc or mdescr or md. Python requires a bit more
     131  care in this area than compiled languages (like C) since compilers complain
     132  about undeclared variables whereas Python will happily accept something like
     133  this:
     134  ::
     135   
     136    # Code added by person A
     137    mdesc = [1, 2, 3]
     138
     139    # ...several pages of code here...
     140
     141    # Code added by person B months later -- see the bug?
     142    if erase_previous_data:
     143        mdsc = None
     144
     145
     146  There's also the benefit that longer variable names help to document the code.
     147  The name `mdesc` could mean "mule desecration" for all I know, whereas
     148  `metabolite_description` carries meaning.
     149
     150  Yes, using unabbreviated variable names makes it harder to respect PEP 8's
     151  recommendation of limiting lines to a maximum of 79 characters.
     152
     153  Standard abbreviations are acceptable, like *fft* for Fast Fourier
     154  Transform, or *ppm* for parts per million. Obviously, "standard" is a weasel
     155  word that doesn't really say what's OK and what's not. There's no hard and
     156  fast rule; we'll have to judge on a case-by-case basis.
     157
     158  Here are some questions to ask when you're trying to decide whether or not an
     159  abbreviation is OK --
     160 
     161  - Does the abbreviation appear more commonly than the expanded form?
     162  - Is my audience (i.e. those reading the code) likely to be familiar with
     163    the abbreviation?
     164  - Will I save a lot of typing by abbreviating?
     165   
     166- Don't be shy about using parentheses to clarify operator precedence. e.g.
     167
     168  This works:
     169  ::
     170 
     171     z = something * PI - something_else / FUDGE_FACTOR
     172 
     173  This works and is easier to read:
     174  ::
     175 
     176     z = (something * PI) - (something_else / FUDGE_FACTOR)
     177
     178  .. 
     179
     180- Don't put redundant information in names. For instance, in a Person class it
     181  is unnecessary to call the attributes ``person_name``, ``person_address``,
     182  etc. Simply use ``name`` and ``address`` instead. Similarly,
     183  if a file is part of the pyvespa
     184  project, there's no reason to name the file ``pyvespa_utilities.py``. Just
     185  ``utilities.py`` will suffice.
     186 
     187  As a bonus, the simpler name will still make
     188  sense if the project's name
     189  changes or is merged with another project.
     190
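A small sketch of the naming point, assuming a bare-bones ``Person`` class (the sample values are invented). The class name already supplies the context, so the attribute names don't need to repeat it:

```python
class Person(object):
    def __init__(self, name, address):
        # Not person_name / person_address: "customer.name" already
        # reads clearly at the point of use.
        self.name = name
        self.address = address


customer = Person("Ada Lovelace", "12 St James's Square")
label = "%s, %s" % (customer.name, customer.address)
```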
     191- All of our source code should be straight ASCII. Be careful about copying &
     192  pasting text from MS Word that contains curly quotes or em/en dashes.
     193
     194  If you're ever confronted with a choice as to what non-ASCII encoding to
     195  use, choose utf-8.
     196
     197  Related note: there are files in PyVespa that have been generated by wxGlade
     198  that contain this Python metacomment:
     199  ::
     200 
     201     # -*- coding: iso-8859-15 -*-
     202
     203  Please change this to comply with PEP 8. In practical terms, this means use
     204  ASCII unless you need non-ASCII characters, in which case use utf-8. It'd be
     205  nice if wxGlade would not output the encoding metacomment at all for ASCII
     206  files, but I don't know if we can control that.
     207
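If you suspect that curly quotes or dashes have crept into a file, something like the sketch below can locate them. The helper name ``find_non_ascii`` is hypothetical (it is not an existing project utility), and it assumes the text has already been decoded to unicode:

```python
def find_non_ascii(text):
    """Return (line, column, character) for each non-ASCII character."""
    problems = []
    for line_index, line in enumerate(text.splitlines()):
        for column_index, character in enumerate(line):
            if ord(character) > 127:
                # Report 1-based positions, as most editors do.
                problems.append((line_index + 1, column_index + 1, character))
    return problems


# \u201c and \u201d are the curly quotes that MS Word likes to insert.
sample = u"title = \u201cSpectrum\u201d\nwidth = 3\n"
offenders = find_non_ascii(sample)
```

Here ``offenders`` holds two entries, both on line 1, one for each curly quote.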
     208- Always use / as the path separator. Microsoft operating systems accept both
     209  \\ and / (since DOS 2.0 `according to this discussion
     210  <http://bytes.com/groups/python/23123-when-did-windows-start-accepting-forward-slash-path-separator>`_).
     211  It's only the DOS command
     212  line that hiccups on /. By contrast, backslash as a
     213  path separator only works under Windows and is an escape character in Python
     214  strings.
     215
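To see why backslashes are risky, here is a quick sketch. The path is invented for the example:

```python
import os.path

# Forward slashes behave the same on every platform.
portable_path = "data/new_results/table.txt"

# Backslashes are escape characters in Python strings. Here "\n" and "\t"
# quietly turn into a newline and a tab, so this is not the intended path.
broken_path = "data\new_results\table.txt"

contains_newline = "\n" in broken_path
contains_tab = "\t" in broken_path

# The standard path functions are happy with forward slashes.
directory, filename = os.path.split(portable_path)
```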
     216- If you come across (or write) some code that is or may be broken, fix it. If
     217  the fix isn't obvious or you don't have time, add a comment containing the
     218  string FIXME (no space!) and a brief explanation of what you
     219  think is wrong. e.g.
     220  ::
     221   
     222        if film == HOLY_GRAIL:
     223            bring_out_your_dead()
     224        elif film == LIFE_OF_BRIAN:
     225            look_on_bright_side()
     226        elif film == HOLLYWOOD_BOWL:
     227            albatross()
     228        # FIXME - need an else statement; how to handle unexpected cases?
     229
     230  ..
     231
     232C and C++
     233---------
     234
     235- Compile with `-Wall` on and fix the code until the compiler emits no warnings.
     236
     237
     238Python
     239------
     240
     241- `Duck typing <http://en.wikipedia.org/wiki/Duck_typing>`_ is an important
     242  and valuable concept in Python that can feel strange if
     243  you're used to statically typed languages.
     244   
     245- The corollary -- if you find yourself using ``type()`` or
     246  ``isinstance()``, that's usually a sign of unPythonic code.
     247
     248
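A minimal sketch of duck typing in practice. The function and class names are hypothetical. ``write_report()`` never asks what its destination *is*, only that it has a ``write()`` method, so an open file, a socket wrapper or the little collector below all work:

```python
def write_report(destination, lines):
    # No type() or isinstance() check: anything with write() will do.
    for line in lines:
        destination.write(line + "\n")


class LineCollector(object):
    """Not a file, but it quacks like one as far as write_report() cares."""

    def __init__(self):
        self.lines = []

    def write(self, text):
        self.lines.append(text)


collector = LineCollector()
write_report(collector, ["alpha", "beta"])
```

This also makes testing easier: the collector stands in for a real file without touching the disk.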
     249- Our project will require a minimum Python version of 2.5, so any language
     250  features (like the ternary operator) or libraries (like sqlite3 or ctypes) that
     251  are in 2.5 are fair game.
     252
     253- If you're new to Python, use an editor with decent code highlighting so that
     254  it tells you when you're using a Python keyword as a variable name.
     255
     256- PEP 8 is worth following. The main
     257  things to remember are CamelCase for class names and lower_with_underscores
     258  for variable names. Filenames should be all lower case since the filesystems
     259  on some of our target operating systems are not case-sensitive.
     260
     261  Note that PEP 8 observes, "The naming conventions of Python's library are a
     262  bit of a mess...". It's true! The standard library is unfortunately not always
     263  a good example to follow.
     264
     265  PEP 20 is also worth a read as it's really short.
     266 
     267- Never use the idiom ``from some_package import *``. It has a couple of
     268  disadvantages. For one, it clutters up your local namespace and can even lead
     269  to one module stepping on another's variables.
     270
     271  The other huge disadvantage is that it makes one's code difficult to read.
     272  If the code
     273  imports * from, say, five modules and then calls a function ``foo()``,
     274  the person reading the code has to guess if the function is local, and
     275  if not, then which one of the five imported modules contains it.
     276
     277  This is also true to a lesser extent for ``from some_package import xyz`` where
     278  xyz is a function. If I see a call to ``xyz()`` in the code, I have to look
     279  around
     280  to see whether it is a local function or an imported one. By contrast, when I
     281  see ``some_package.xyz()`` in the code, I know exactly where that function comes
     282  from.
     283
     284  If you find that you're importing some package with an inconveniently long
     285  name, make use of Python's ``as`` keyword:
     286  ::
     287 
     288   import xml.etree.ElementTree as ElementTree
     289
     290  Be mindful of creating obscure abbreviations, however:
     291  ::
     292 
     293   import some_complicated_math_library.curves.splines as sp
     294 
     295  ..
     296
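Here's a short sketch of what qualified imports look like at the point of use. The XML snippet and the function name are invented for the example. Every call says where it comes from, so nobody has to scroll up to the import list:

```python
import math
import xml.etree.ElementTree as ElementTree


def circle_area(radius):
    # "math.pi" leaves no doubt about where pi is defined.
    return math.pi * radius ** 2


document = "<metabolites><metabolite name='choline'/></metabolites>"
root = ElementTree.fromstring(document)
names = [node.get("name") for node in root.findall("metabolite")]
```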
     297- Python booleans are True and False, not 1 and 0. Be aware of this when you're
     298  porting code from languages that don't have a native Boolean type.
     299  Some examples include IDL, C, Fortran and possibly Matlab. They usually
     300  use 1 and 0 to represent
     301  true and false. (C++ has a native boolean type.)
     302
     303  Note that it's OK to treat 1 and 0 as booleans in expressions, just don't
     304  *assign* them as booleans.
     305
     306  For instance, if a variable (received from a C function for instance) has
     307  a value of 1 or 0 it is perfectly acceptable to do this:
     308  ::
     309 
     310    if some_c_library.function_that_returns_one_or_zero():
     311        do_something()
     312
     313  It would be unPythonic, however, to do this:
     314  ::
     315 
     316    def on_foo_checkbox_clicked(self):
     317        self.foo_is_on = 1  # should be True, not 1
     318
     319  As a specific application of duck typing, it's usually unPythonic to
     320  explicitly test for True and
     321  False. Note that all of these evaluate to False:
     322  ::
     323 
     324        bool(None)
     325        bool("")    # empty string
     326        bool([ ])   # empty list
     327        bool(( ))   # empty tuple
     328        bool({ })   # empty dict
     329        bool(0)
     330
     331  All of these evaluate to True:
     332  ::
     333 
     334        bool(n) where n is a non-zero number
     335        bool(s) where s is a non-empty string
     336        bool(z) where z is a non-empty iterable (tuple or list)
     337        bool(m) where m is a non-empty mapping (dict)
     338        bool(o) where o is an object other than None (by default; a class can override this with __nonzero__ or __len__)
     339
     340  Historical note: the values True and False weren't added to Python until
     341  version 2.2.1 (the bool type itself arrived in 2.3), so you might see some Python code --
     342  esp. Python library code which must remain compatible with very old
     343  versions -- using 1 and 0 instead of True and False.
     344
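Putting the boolean advice together, here is a sketch with hypothetical names. Test truthiness directly rather than comparing against True, False or a length, and assign real booleans when you store a flag:

```python
def summarize(values):
    # An empty list, an empty tuple and None are all false, so one test
    # covers every "nothing here" case.
    if not values:
        return "nothing to process"
    return "%d value(s) to process" % len(values)


class Checkbox(object):
    def __init__(self):
        self.is_checked = False    # a real boolean, not 0

    def on_clicked(self):
        self.is_checked = not self.is_checked


box = Checkbox()
box.on_clicked()
```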
     345- To prepare for Python 3.0, we need to `explicitly use "true"
     346  division <http://www.python.org/doc/2.2.3/whatsnew/node7.html>`_.
     347
     348  In order to do so, we need to add this to every module that uses division:
     349  ::
     350 
     351    from __future__ import division
     352 
     353  And then we need to review the use of division in those modules
     354  to ensure we're not breaking them.
     355
     356  We can either pay this cost now, or pay it later when we want to move to
     357  Python 3 and there's a lot more code to review and fix.
     358
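The review of division sites mostly comes down to one question per ``/``: did the author want a fraction, or an integer? A sketch with made-up function names:

```python
# Must be the first statement in the module (comments and docstrings aside).
from __future__ import division


def mean(values):
    # True division: the fractional part survives even when both
    # operands are integers.
    return sum(values) / len(values)


def middle_index(values):
    # An index has to stay an integer, so ask for floor division explicitly.
    return len(values) // 2


samples = [1, 2, 4]
average = mean(samples)
middle = samples[middle_index(samples)]
```

Without the ``__future__`` import, Python 2 would truncate ``mean([1, 2, 4])`` to 2. With it, any spot that relied on that truncation needs ``//`` instead.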
     359
     360- Python 2.2 introduced improved classes; these are called (rather
     361  unfortunately) "new"-style classes. Old-style classes are gone completely
     362  in Python 3. Our classes should always be new-style classes. To create a
     363  new-style class, inherit from object. e.g. this:
     364  ::
     365 
     366    class TransformThingy(object):
     367   
     368  not this:
     369  ::
     370 
     371    class TransformThingy():
     372
     373  ..
     374
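One practical reason to care, sketched below: under Python 2, properties only behave fully on new-style classes. On an old-style class the setter would be bypassed and the validation silently skipped. The ``scale`` attribute is invented for the example:

```python
class TransformThingy(object):
    def __init__(self, scale):
        self._scale = scale

    def _get_scale(self):
        return self._scale

    def _set_scale(self, value):
        if value <= 0:
            raise ValueError("scale must be positive")
        self._scale = value

    # property(getter, setter) rather than the @scale.setter decorator,
    # which isn't available in Python 2.5.
    scale = property(_get_scale, _set_scale)


thingy = TransformThingy(2.0)
thingy.scale = 4.0    # goes through _set_scale()
```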
     375
     376- Python has the identity operator "is". It means "are these objects the same
     377  object" rather than "are they equivalent". The only time you'll probably need
     378  to use it is when comparing something to None.
     379  ::
     380 
     381       if foo is None:
     382           do_something()
     383
     384  Since we prefer to perform simple boolean tests, the need to check explicitly
     385  for None (as opposed to False) might indicate a problem somewhere upstream, as
     386  this would be better:
     387  ::
     388 
     389      if not foo:
     390          do_something()
     391
     392  Sometimes an explicit test for None is unavoidable, however.
     393
     394  In short, the admonition against "is" is similar to that against
     395  ``isinstance()``, although less strong. If you find yourself using it, it's
     396  often a sign of a design flaw.
     397
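A sketch of both halves of this point, using invented names. First, identity versus equivalence. Second, one of the cases where an explicit test for None really is unavoidable: 0 is a legitimate value, and a plain boolean test would wrongly treat it as "not supplied":

```python
first = [1, 2, 3]
second = [1, 2, 3]
alias = first

same_contents = (first == second)    # equivalent lists
same_object = (first is second)      # but two separate objects
aliased = (first is alias)           # two names for one object


def scale(values, factor=None):
    # "if not factor" would also replace a factor of 0, so test for None.
    if factor is None:
        factor = 1.0
    return [value * factor for value in values]


zeroed = scale([1, 2], 0)
unchanged = scale([1, 2])
```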
     398
     399- Don't underestimate what you can learn from testing concepts in the Python
     400  interpreter. For instance, if you can't remember the rules
     401  for taking a slice of a string from the end, try it out in the Python
     402  interpreter:
     403  ::
     404   
     405        $ python
     406        Python 2.5.1 (r251:54863, Nov 17 2007, 21:19:53)
     407        [GCC 4.0.1 (Apple Computer, Inc. build 5367)] on darwin
     408        Type "help", "copyright", "credits" or "license" for more information.
     409        >>> "abcde"[:-2]
     410        'abc'
     411        >>>
     412