Changes between Version 10 and Version 11 of SwiggingHlsvd


Ignore:
Timestamp:
Jul 14, 2009, 4:42:46 PM (10 years ago)
Author:
flip
Comment:

--

Legend:

Unmodified
Added
Removed
Modified
  • SwiggingHlsvd

    v10 v11  
    11{{{
    22#!rst
    3 FooBar Header
    4 =============
    5 
    6 reStructuredText is **nice**.
    7 
    8 It has its own webpage_.
    9 
    10 A table:
    11 
    12 RST TracLinks
    13 -------------
    14 
    15 See also ticket `#42`:trac:.
    16 
    17 .. _webpage: http://docutils.sourceforge.net/rst.html
     3=============================================
     4Making the hlsvd Library Accessible to Python
     5=============================================
     6
     7Introduction
     8------------------------------------
     9
     10hlsvd is a library written in FORTRAN. We've run it through f2c, a FORTRAN to C convertor. Our version of the library is therefore written in autogenerated  C. It exposes just one function called ``hlsvd()``.
     11
     12This is an explanation of what I learned using SWIG and ctypes to wrap the hlsvd library in order to make its function accessible to Python.
     13
     14Comments on the hlsvd Code
     15------------------------------------
     16
     17I'm impressed by f2c; it's a very cool tool. Nevertheless, there's some imperfections I'd like to address. They're not specific to SWIG and ctypes; they're evident even when calling hlsvd from straight C.
     18
     19First, the f2c-generated code gives warnings when it compiles. They're mostly suggestions to parenthesize expressions so as not to rely on C operator precedence for statement correctness. I think they're only warning that the code is hard for humans to parse; I don't think the code is broken. However, I'm loath to ignore *any* compiler warnings; I want clean compiles. Hopefully we are on the same wavelength on this subject and I don't have to explain why I think clean compiles are extremely important, particularly in code that we'll share with others.
     20
     21Cleaning up the code to eliminate the warnings is easy enough, but modifying auto-generated code comes with the caveat that if it is ever auto-generated again, bye-bye modifications. I think that's a low risk in this case since (a) the FORTRAN hlsvd code is by now probably quite stable and so probably won't need to be f2c-ed again, and (b) any modifications we'd apply are pretty lightweight and would be pretty easy to reapply if necessary.
     22
     23A second problem with the code generated by f2c is the function interface. All of the parameters are pass-by-reference. This is technically correct because all FORTRAN params are pass-by-reference. C, however, is pass-by-value so f2c has to make every parameter to ``hlsvd()`` a pointer in order to mimic pass-by-reference semantics. It's unnecessary since many of the params to hlsvd are input-only, and it makes calling the function (from C, ctypes, or SWIG) more complicated than it needs to be.
     24
     25I'd like to modify the f2c-generated hlsvd.c to simplify the call signature by making all of the input-only parameters regular C pass-by-value parameters. It might be easier to add a wrapper function that achieves the same result. A wrapper would live in its own file which would have the advantage of not getting overwritten if this code is ever regenerated from the FORTRAN source.
     26
     27A third problem is that the generated hlsvd library exports about a zillion functions when it really only needs to export one. That's a little sloppy. It can be addressed by adding a "static" modifier to each function in the source code or by adding some linker commands. We can also just ignore it.
     28
     29Last but not least, f2c creates several C typedefs to mimic the FORTRAN variable types integer, real and double precision. In the code we have, these are mapped to the C types long, float and double. (By supplying options to f2c when it is run, one can modify these typedefs.) There's nothing wrong with f2c's choices and it would be unfair to present these typedefs as a flaw in f2c. Nevertheless, they do present us with a problem.
     30
     31The problem is that regardless of whether we use SWIG or ctypes or our own handwritten interface, sooner or later something will have to convert those C variables to Python objects by making calls to the Python API. Python's API provides functions for converting standard C types (int, long, float, char \*, etc.) to Python types. (For instance, the Python C library function ``PyInt_FromLong()`` accepts a C long as a parameter and returns a new Python integer object.) However, it does not (cannot) know anything about custom typedefs. At some level, a piece of code is going to have to make an assumption about the true type (i.e. the standard C type) to which the typedef maps.
     32
     33In our case, that's simple since there's only three typedefs we have to worry about and they're very simple. Nevertheless, if this code is f2c-ed again and for whatever reason the typedefs change (for instance, if f2c decides to start typedef-fing FORTRAN reals as C doubles instead of C floats), then our assumption will break and so will our code. Therefore, we have a dependency on how f2c generates code.
     34
     35If we never run f2c to regenerate this code, then this is a non-issue. If we do run f2c again, we must ensure that the typedefs of integer, real and doublereal still map to C's long, float and double.
     36
     37
     38Calling hlsvd from Python
     39------------------------------------
     40
     41Before I get into specifics about SWIG and ctypes, I want to describe what an ideal Python interface to ``hlsvd()`` would look like. Remember that this function takes inputs of various types and returns a number of values also of various types. Returning multiple values from a function is one the of things Python makes easy, so we should take advantage of that.
     42
     43The C interface looks something like this:
     44::
     45
     46    int hlsvd(float *,        //  1 - yr (input)
     47               float *,       //  2 - yi (input)
     48               long *,        //  3 - ndata (input)
     49               long *,        //  4 - ndp (input)
     50               float *,       //  5 - tstep (input)
     51               long *,        //  6 - m (input)
     52               long *,        //  7 - nssv (input)
     53               double *,      //  8 - rootr (output)
     54               double *,      //  9 - rooti (output)
     55               float *,       // 10 - fre (output)
     56               float *,       // 11 - dam (output)
     57               float *,       // 12 - amp (output)
     58               float *,       // 13 - phase (output)
     59               double *,      // 14 - sv (output)
     60               double *,      // 15 - snoise (output)
     61               double *,      // 16 - rms (output)
     62               long *,        // 17 - ndiv (output)
     63               long *,        // 18 - nit (input)
     64               long *);       // 19 - nitf (output)
     65
     66
     67This comment about the parameters is stolen from the f2c-ed code:
     68::
     69
     70    /*   on entry: */
     71    /*     yr    - data array (real part of time-domain signal) */
     72    /*     yi    - data array (imaginary part of time-domain signal) */
     73    /*     ndata - total number of data-points of experimental signal */
     74    /*     ndp   - number of data-points being used in the hlsvd analysis */
     75    /*     tstep - stepsize (milli-seconds) */
     76    /*     m     - size parameter of hankel data matrix */
     77    /*     nssv  - start value for number of signal-related singular values */
     78    /*     nit   - number of iterations entered */
     79
     80    /*   on exit: */
     81    /*     rootr - signal-root array (real part) */
     82    /*     rooti - signal-root array (imaginary part) */
     83    /*     fre   - frequency array (khz) */
     84    /*     dam   - damping-factor array (milli-seconds) */
     85    /*     amp   - amplitude array (arbitrary units) */
     86    /*     phase - phase array (degrees) */
     87    /*     sv    - singular value array */
     88    /*     snois - estimation of signal noise level (standard deviation) */
     89    /*     rms   - root mean square of hlsvd fit */
     90    /*     nitf  - final number of iterations */
     91    /*     ndiv  - number of singular values found */
     92
     93
     94The code to call an ideal Python version of ``hlsvd()`` might look something like the example below. (For clarity's sake I'm sticking with the same variable names as in the comments above; I generally prefer to avoid abbreviations.)
     95::
     96
     97    # - y is an iterable type (e.g. a list) of complex numbers. Complex
     98    # numbers are a native type in Python.
     99    # - ndata is no longer necessary as it can be inferred from len(y)
     100    return_tuple = hlsvd(y, ndp, tstep, m, nssv, nit)
     101
     102    # Extract values from the return_tuple
     103    # - root is a list of complex numbers
     104    # - fre, dam, amp, phase and sv are lists of floats
     105    # - snoise, rms, nitf and ndiv are floats or ints as appropriate
     106    root, fre, dam, amp, phase, sv, snoise, rms, nitf, ndiv = return_tuple
     107
     108
     109This code has several advantages over the C version. For starters, the input and output parameters are clearly separated which makes the code easier to read. Second, since complex numbers are a native type in Python, there's no need to represent them as two correlated lists of floats. That makes the input parameter y and the output parameter root simpler to read and manipulate. It also eliminates the possibility of a mismatched list-size bug (i.e. if a different number of real and imaginary values are supplied). Yet another advantage is that we can eliminate one parameter (ndata) that can be inferred from ``len(y)``.
     110
     111Another bug that can't occur here is a memory overwrite due to undersized output parameters. In C one must preallocate the output parameters (like fre, amp, etc.) using ``malloc()`` or by declaring them on the stack. If they're too small, ``hlsvd()`` might crash (if you're lucky) or silently alter the values of neighboring arrays (if you're not). That can't happen in a Python interface like the one above.
     112
     113Last but not least, in the C interface some of the output parameters are floats and some are doubles. Python's float type covers both C float and C double, so we don't have to remember which is which.
     114
     115Our Python call is just two lines of code, and could be one:
     116::
     117
     118    root, fre, dam, amp, phase, sv, snoise, rms, nitf, ndiv = \
     119        hlsvd(y, ndp, tstep, m, nssv, nit)
     120
     121
     122With this Pythonic interface in mind, let's look at how the interfaces generated by SWIG and ctypes stack up.
     123
     124
     125Wrapping hlsvd with SWIG
     126------------------------------------
     127
     128Wrapping anything with SWIG requires describing it to SWIG via an interface (.i) file. In an ideal world, SWIG can infer what it needs to know from a .h file and the interface file just refers SWIG to the .h file. In the case of hlsvd, there's no .h file so I have to describe the interface to SWIG explicitly.
     129
     130Complications arise with ``hlsvd()`` because it takes arrays (rather than simple types) as input parameters. Further complications result from the fact that it expects to be passed pre-allocated arrays to which it will write output. This is standard stuff with C but takes some work to express in a pointer-less language like Python.
     131
     132The resulting code is something only a mother could love. I should point out that I think SWIG is near miraculous in what it can do, and I don't intend any of my snide comments as criticism of SWIG or its developers. It's just that autotranslated code has all the charm of autotranslated writing. If you've ever used Babelfish, you know what I'm talking about. Meaning is (usually) preserved, albeit crudely, but all the idioms are lost and it's eminently clear that the text (or in this case, the interface) was not created by a native speaker.
     133
     134So we get ugly code like this (where inputs is a Python list I populated from a file):
     135::
     136
     137    yr = hlsvd_wrapper.new_float_array(2048)
     138    for i in range(0, 2048):
     139        hlsvd_wrapper.float_array_setitem(yr, i, inputs[i])
     140
     141
     142And this (which I think is only necessary because of ``hlsvd()``'s pass-by-reference awkwardness I mentioned above):
     143::
     144
     145   tstep = hlsvd_wrapper.new_tstep()
     146   hlsvd_wrapper.tstep_assign(tstep, .256)
     147
     148
     149And a series of calls like this one to create the output arrays:
     150::
     151
     152   rootr = hlsvd_wrapper.new_double_array(50)
     153
     154In all, here's what one has to do just to create (not populate) the variables to be used in the call to ``hlsvd()``:
     155::
     156
     157    yr = hlsvd_wrapper.new_float_array(2048)
     158    yi = hlsvd_wrapper.new_float_array(2048)
     159    ndata = hlsvd_wrapper.new_ndata()
     160    ndp = hlsvd_wrapper.new_ndp()
     161    tstep = hlsvd_wrapper.new_tstep()
     162    m = hlsvd_wrapper.new_m()
     163    nssv = hlsvd_wrapper.new_nssv()
     164    nit = hlsvd_wrapper.new_nit()
     165    rootr = hlsvd_wrapper.new_double_array(50)
     166    rooti = hlsvd_wrapper.new_double_array(50)
     167    fre = hlsvd_wrapper.new_float_array(50)
     168    dam = hlsvd_wrapper.new_float_array(50)
     169    amp = hlsvd_wrapper.new_float_array(50)
     170    phase = hlsvd_wrapper.new_float_array(50)
     171    sv = hlsvd_wrapper.new_double_array(1024)
     172    snoise = hlsvd_wrapper.new_snoise()
     173    rms = hlsvd_wrapper.new_rms()
     174    ndiv = hlsvd_wrapper.new_ndiv()
     175    nitf = hlsvd_wrapper.new_nitf()
     176
     177
     178The call itself looks like this:
     179::
     180
     181    hlsvd_wrapper.hlsvd(yr, yi, ndata, ndp, tstep, m, nssv,
     182                         rootr, rooti, fre, dam, amp, phase, sv, snoise,
     183                         rms, ndiv, nit, nitf)
     184
     185
     186Note that every disadvantage of the C interface remains. There's no distinction between input and output parameters. The complex numbers are still represented as correlated lists of floats. The list/array sizes must be specified explicitly, and an opportunity for memory corruption or a crash exists if one gets it wrong. One must keep track of which variables are floats and which are doubles. And it's many more lines of code than the Python version. (Some of this is exacerbated by ``hlsvd()``'s pass-by-reference interface.)
     187
     188I should also note that generating the code to even make this interface possible requires the creation of a non-trivial interface file, a SWIG-ing step to auto-generate two cooperative wrappers (one in C and one in Python), and compilation and linkage of that C wrapper. Once the code is constructed, the call chain looks like this:
     189::
     190
     191   My code --> SWIG's .py wrapper --> SWIG's .c wrapper --> libhlsvd
     192
     193It works, for which I bow deeply to the SWIG developers. However, it was a fair amount of work to generate the interface file. (Granted, I was climbing the SWIG learning curve.) In addition, the result is  unPythonic and leaves us exposed to a lot of C's unpleasantness.
     194
     195
     196Wrapping hlsvd With Ctypes
     197------------------------------------
     198
     199Ctypes is part of the Python standard library. The Python documentation describes it like this:
     200  ctypes is a foreign function library for Python. It provides C compatible data types, and allows to call functions in dlls/shared libraries. It can be used to wrap these libraries in pure Python.
     201
     202Note the key phrase "pure Python".
     203
     204One starts with the very unPythonic explicit load of the C library to be called:
     205::
     206
     207    lib = ctypes.CDLL("hlsvd.dylib")
     208
     209
     210This is one place where SWIG is more Pythonic, since I access SWIG's library with a normal Python import statement.
     211
     212
     213Ctypes already knows about, well, C types, so it's easy to create something that can be passed to a function expecting a C float, for instance:
     214::
     215
     216   tstep = ctypes.c_float(.256)
     217
     218
     219Arrays are pretty simple too:
     220::
     221
     222   # Create an array type; this is like a class
     223   InputFloatArrayType = ctypes.c_float * 2048
     224   # Create an instance of that "class"
     225   yr = InputFloatArrayType()
     226
     227
     228The variable setup for the call to ``hlsvd()`` looks like the code below:
     229::
     230
     231    InputFloatArrayType = ctypes.c_float * 2048
     232    OutputDoubleArrayType = ctypes.c_double * 50
     233    OutputFloatArrayType = ctypes.c_float * 50
     234
     235    yr = InputFloatArrayType()
     236    yi = InputFloatArrayType()
     237
     238    ndpmax = ctypes.c_long(2048)
     239    ndp = ctypes.c_long(1024)
     240    tstep = ctypes.c_float(.256)
     241    m = ctypes.c_long(380)
     242    nssv = ctypes.c_long(25)
     243    nit = ctypes.c_long(500)
     244
     245    rootr = OutputDoubleArrayType()
     246    rooti = OutputDoubleArrayType()
     247    fre = OutputFloatArrayType()
     248    dam = OutputFloatArrayType()
     249    amp = OutputFloatArrayType()
     250    phase = OutputFloatArrayType()
     251    sv = OutputDoubleArrayType()
     252    snoise = ctypes.c_double()
     253    rms = ctypes.c_double()
     254    ndiv = ctypes.c_long()
     255    nitf = ctypes.c_long()
     256
     257
     258    lib.hlsvd(yr, yi, ctypes.byref(ndpmax), ctypes.byref(ndp),
     259               ctypes.byref(tstep), ctypes.byref(m), ctypes.byref(nssv),
     260               ctypes.pointer(rootr), ctypes.pointer(rooti),
     261               ctypes.pointer(fre), ctypes.pointer(dam),
     262               ctypes.pointer(amp), ctypes.pointer(phase),
     263               ctypes.pointer(sv), ctypes.pointer(snoise),
     264               ctypes.pointer(rms), ctypes.pointer(ndiv),
     265               ctypes.byref(nit), ctypes.pointer(nitf)
     266              )
     267
     268
     269At first glance this might look no better than the SWIG code, but it's actually a bit nicer. For one thing, the array sizes are defined in one place (the first three lines) and not repeated over and over. In addition, it's pretty clear what's going on in a line like this:
     270::
     271
     272    ndiv = ctypes.c_long()
     273
     274whereas SWIG's version is opaque:
     275::
     276
     277    ndiv = hlsvd_wrapper.new_ndiv()
     278
     279
     280The ``ctypes.byref()`` call is informative. It says that the C interface demands that I pass by reference, but I don't need access to the pointer here in Python. That tells anyone reading this code that I probably don't care what the called function does with the value, hinting that it is input-only. Again, I wouldn't need this at all if the f2c-generated code was tweaked to eliminate the unnecessary pass-by-reference for the input parameters.
     281
     282Also note that the code above includes some value assignments, like this line which assigns the value .256 to tstep:
     283::
     284
     285    tstep = ctypes.c_float(.256)
     286
     287
     288In my ideal Python example above I ignored assigning values to the variables for brevity's sake. And in order to make a fair comparison of SWIG's code to the ideal, I didn't include assignment in that code either. But to fairly compare SWIG and ctypes I need to mention assignment here, because it's simpler with ctypes. The code to assign a value to tstep via the SWIG interface looks like this:
     289::
     290
     291    tstep = hlsvd_wrapper.new_tstep()
     292    hlsvd_wrapper.tstep_assign(tstep, .256)
     293
     294
     295Arrays are also considerably uglier in SWIG. First consider the ctypes version (inputs is a list that I read from a disk file):
     296::
     297
     298    yi = InputFloatArrayType()
     299    for i, f in enumerate(inputs):
     300        yi[i] = inputs[i]
     301
     302
     303And now look at the SWIG version:
     304::
     305
     306    yi = hlsvd_wrapper.new_float_array(2048)
     307    for i in range(0, 2048):
     308        hlsvd_wrapper.float_array_setitem(yi, i, inputs[i])
     309
     310
     311Last but not least, note that the code above is 95% of what I needed to use ctypes to call ``hlsvd()``. There's no interface file, no SWIG-ing step, and no compile & link steps. Just the Python standard library, that's it.
     312
     313I have to say, though, that it's only because I saw SWIG first that I'm thrilled with ctypes. It's a nicer tool with which to build an equally ugly interface. Neither SWIG nor ctypes insulates us from the warts and wrinkles of the C interface.
     314
     315Conclusions
     316------------------------------------
     317
     318For the purpose of wrapping ``hlsvd()``, both SWIG and ctypes get the job done, although neither result will win a beauty contest. Setting aside criticisms of their ugly and unPythonic interfaces, ctypes is the hands down winner in the ease-of-use category.
     319
     320Some adjustments to the f2c-generated code and another Python wrapper layer around the code demonstrated here could go a long way to making the interface Pythonic.
     321
     322I'm less clear on what this experience portends for wrapping a larger C++ library such as GAMMA. The example here was perhaps unfair to SWIG, since SWIG can generate code based on header files but there were no header files with which to work in this case. Ctypes code, on the other hand, must always be handwritten, so in order to wrap 100 functions one must write 100 chunks of ctypes code. SWIG, however, might be able to generate at least some of that automatically.
     323
     324I also don't know what this implies for use of C++ objects within Python. Ctypes is strictly for wrapping C code (or code that exports a C-style calling interface) and knows nothing about C++ objects. I suspect that even if SWIG can make it possible to cajole C++ objects into Python, the Babelfish analogy would apply again -- the result might be Python in syntax, but C++ in spirit.
     325
     326
     327
     328References
     329------------------------------------
     330
     331| FORTRAN defines only INTEGER, REAL and "DOUBLE PRECISION" numbers.
     332| http://www.ibiblio.org/pub/languages/fortran/ch2-3.html
     333
     334| The options to f2c control how it deals with FORTRAN data types.
     335| http://www.netlib.org/f2c/f2c.1
     336
     337
     338
    18339
    19340}}}