Changes between Version 7 and Version 8 of SwiggingHlsvd

Jul 14, 2009, 4:41:17 PM (10 years ago)



  • SwiggingHlsvd

    v7 v8  
    3 FooBar Header
    4 =============
    7 i like it
    9 reStructuredText is **nice**.
    11 It has its own webpage_.
    13 A table:
    15 RST TracLinks
    16 -------------
    18 See also ticket `#42`:trac:.
    20 .. _webpage:
     4Making the hlsvd Library Accessible to Python
     10hlsvd is a library written in FORTRAN. We've run it through f2c, a FORTRAN to C convertor. Our version of the library is therefore written in autogenerated  C. It exposes just one function called ``hlsvd()``.
     12This is an explanation of what I learned using SWIG and ctypes to wrap the hlsvd library in order to make its function accessible to Python.
     14Comments on the hlsvd Code
     17I'm impressed by f2c; it's a very cool tool. Nevertheless, there's some imperfections I'd like to address. They're not specific to SWIG and ctypes; they're evident even when calling hlsvd from straight C.
     19First, the f2c-generated code gives warnings when it compiles. They're mostly suggestions to parenthesize expressions so as not to rely on C operator precedence for statement correctness. I think they're only warning that the code is hard for humans to parse; I don't think the code is broken. However, I'm loath to ignore *any* compiler warnings; I want clean compiles. Hopefully we are on the same wavelength on this subject and I don't have to explain why I think clean compiles are extremely important, particularly in code that we'll share with others.
     21Cleaning up the code to eliminate the warnings is easy enough, but modifying auto-generated code comes with the caveat that if it is ever auto-generated again, bye-bye modifications. I think that's a low risk in this case since (a) the FORTRAN hlsvd code is by now probably quite stable and so probably won't need to be f2c-ed again, and (b) any modifications we'd apply are pretty lightweight and would be pretty easy to reapply if necessary.
     23A second problem with the code generated by f2c is the function interface. All of the parameters are pass-by-reference. This is technically correct because all FORTRAN params are pass-by-reference. C, however, is pass-by-value so f2c has to make every parameter to ``hlsvd()`` a pointer in order to mimic pass-by-reference semantics. It's unnecessary since many of the params to hlsvd are input-only, and it makes calling the function (from C, ctypes, or SWIG) more complicated than it needs to be.
     25I'd like to modify the f2c-generated hlsvd.c to simplify the call signature by making all of the input-only parameters regular C pass-by-value parameters. It might be easier to add a wrapper function that achieves the same result. A wrapper would live in its own file which would have the advantage of not getting overwritten if this code is ever regenerated from the FORTRAN source.
     27A third problem is that the generated hlsvd library exports about a zillion functions when it really only needs to export one. That's a little sloppy. It can be addressed by adding a "static" modifier to each function in the source code or by adding some linker commands. We can also just ignore it.
     29Last but not least, f2c creates several C typedefs to mimic the FORTRAN variable types integer, real and double precision. In the code we have, these are mapped to the C types long, float and double. (By supplying options to f2c when it is run, one can modify these typedefs.) There's nothing wrong with f2c's choices and it would be unfair to present these typedefs as a flaw in f2c. Nevertheless, they do present us with a problem.
     31The problem is that regardless of whether we use SWIG or ctypes or our own handwritten interface, sooner or later something will have to convert those C variables to Python objects by making calls to the Python API. Python's API provides functions for converting standard C types (int, long, float, char \*, etc.) to Python types. (For instance, the Python C library function ``PyInt_FromLong()`` accepts a C long as a parameter and returns a new Python integer object.) However, it does not (cannot) know anything about custom typedefs. At some level, a piece of code is going to have to make an assumption about the true type (i.e. the standard C type) to which the typedef maps.
     33In our case, that's simple since there's only three typedefs we have to worry about and they're very simple. Nevertheless, if this code is f2c-ed again and for whatever reason the typedefs change (for instance, if f2c decides to start typedef-fing FORTRAN reals as C doubles instead of C floats), then our assumption will break and so will our code. Therefore, we have a dependency on how f2c generates code.
     35If we never run f2c to regenerate this code, then this is a non-issue. If we do run f2c again, we must ensure that the typedefs of integer, real and doublereal still map to C's long, float and double.
     38Calling hlsvd from Python
     41Before I get into specifics about SWIG and ctypes, I want to describe what an ideal Python interface to ``hlsvd()`` would look like. Remember that this function takes inputs of various types and returns a number of values also of various types. Returning multiple values from a function is one the of things Python makes easy, so we should take advantage of that.
     43The C interface looks something like this:
     46    int hlsvd(float *,        //  1 - yr (input)
     47               float *,       //  2 - yi (input)
     48               long *,        //  3 - ndata (input)
     49               long *,        //  4 - ndp (input)
     50               float *,       //  5 - tstep (input)
     51               long *,        //  6 - m (input)
     52               long *,        //  7 - nssv (input)
     53               double *,      //  8 - rootr (output)
     54               double *,      //  9 - rooti (output)
     55               float *,       // 10 - fre (output)
     56               float *,       // 11 - dam (output)
     57               float *,       // 12 - amp (output)
     58               float *,       // 13 - phase (output)
     59               double *,      // 14 - sv (output)
     60               double *,      // 15 - snoise (output)
     61               double *,      // 16 - rms (output)
     62               long *,        // 17 - ndiv (output)
     63               long *,        // 18 - nit (input)
     64               long *);       // 19 - nitf (output)
     67This comment about the parameters is stolen from the f2c-ed code:
     70    /*   on entry: */
     71    /*     yr    - data array (real part of time-domain signal) */
     72    /*     yi    - data array (imaginary part of time-domain signal) */
     73    /*     ndata - total number of data-points of experimental signal */
     74    /*     ndp   - number of data-points being used in the hlsvd analysis */
     75    /*     tstep - stepsize (milli-seconds) */
     76    /*     m     - size parameter of hankel data matrix */
     77    /*     nssv  - start value for number of signal-related singular values */
     78    /*     nit   - number of iterations entered */
     80    /*   on exit: */
     81    /*     rootr - signal-root array (real part) */
     82    /*     rooti - signal-root array (imaginary part) */
     83    /*     fre   - frequency array (khz) */
     84    /*     dam   - damping-factor array (milli-seconds) */
     85    /*     amp   - amplitude array (arbitrary units) */
     86    /*     phase - phase array (degrees) */
     87    /*     sv    - singular value array */
     88    /*     snois - estimation of signal noise level (standard deviation) */
     89    /*     rms   - root mean square of hlsvd fit */
     90    /*     nitf  - final number of iterations */
     91    /*     ndiv  - number of singular values found */
     94The code to call an ideal Python version of ``hlsvd()`` might look something like the example below. (For clarity's sake I'm sticking with the same variable names as in the comments above; I generally prefer to avoid abbreviations.)
     97    # - y is an iterable type (e.g. a list) of complex numbers. Complex
     98    # numbers are a native type in Python.
     99    # - ndata is no longer necessary as it can be inferred from len(y)
     100    return_tuple = hlsvd(y, ndp, tstep, m, nssv, nit)
     102    # Extract values from the return_tuple
     103    # - root is a list of complex numbers
     104    # - fre, dam, amp, phase and sv are lists of floats
     105    # - snoise, rms, nitf and ndiv are floats or ints as appropriate
     106    root, fre, dam, amp, phase, sv, snoise, rms, nitf, ndiv = return_tuple
     109This code has several advantages over the C version. For starters, the input and output parameters are clearly separated which makes the code easier to read. Second, since complex numbers are a native type in Python, there's no need to represent them as two correlated lists of floats. That makes the input parameter y and the output parameter root simpler to read and manipulate. It also eliminates the possibility of a mismatched list-size bug (i.e. if a different number of real and imaginary values are supplied). Yet another advantage is that we can eliminate one parameter (ndata) that can be inferred from ``len(y)``.
     111Another bug that can't occur here is a memory overwrite due to undersized output parameters. In C one must preallocate the output parameters (like fre, amp, etc.) using ``malloc()`` or by declaring them on the stack. If they're too small, ``hlsvd()`` might crash (if you're lucky) or silently alter the values of neighboring arrays (if you're not). That can't happen in a Python interface like the one above.
     113Last but not least, in the C interface some of the output parameters are floats and some are doubles. Python's float type covers both C float and C double, so we don't have to remember which is which.
     115Our Python call is just two lines of code, and could be one:
     118    root, fre, dam, amp, phase, sv, snoise, rms, nitf, ndiv = \
     119        hlsvd(y, ndp, tstep, m, nssv, nit)
     122With this Pythonic interface in mind, let's look at how the interfaces generated by SWIG and ctypes stack up.
     125Wrapping hlsvd with SWIG
     128Wrapping anything with SWIG requires describing it to SWIG via an interface (.i) file. In an ideal world, SWIG can infer what it needs to know from a .h file and the interface file just refers SWIG to the .h file. In the case of hlsvd, there's no .h file so I have to describe the interface to SWIG explicitly.
     130Complications arise with ``hlsvd()`` because it takes arrays (rather than simple types) as input parameters. Further complications result from the fact that it expects to be passed pre-allocated arrays to which it will write output. This is standard stuff with C but takes some work to express in a pointer-less language like Python.
     132The resulting code is something only a mother could love. I should point out that I think SWIG is near miraculous in what it can do, and I don't intend any of my snide comments as criticism of SWIG or its developers. It's just that autotranslated code has all the charm of autotranslated writing. If you've ever used Babelfish, you know what I'm talking about. Meaning is (usually) preserved, albeit crudely, but all the idioms are lost and it's eminently clear that the text (or in this case, the interface) was not created by a native speaker.
     134So we get ugly code like this (where inputs is a Python list I populated from a file):
     137    yr = hlsvd_wrapper.new_float_array(2048)
     138    for i in range(0, 2048):
     139        hlsvd_wrapper.float_array_setitem(yr, i, inputs[i])
     142And this (which I think is only necessary because of ``hlsvd()``'s pass-by-reference awkwardness I mentioned above):
     145   tstep = hlsvd_wrapper.new_tstep()
     146   hlsvd_wrapper.tstep_assign(tstep, .256)
     149And a series of calls like this one to create the output arrays:
     152   rootr = hlsvd_wrapper.new_double_array(50)
     154In all, here's what one has to do just to create (not populate) the variables to be used in the call to ``hlsvd()``:
     157    yr = hlsvd_wrapper.new_float_array(2048)
     158    yi = hlsvd_wrapper.new_float_array(2048)
     159    ndata = hlsvd_wrapper.new_ndata()
     160    ndp = hlsvd_wrapper.new_ndp()
     161    tstep = hlsvd_wrapper.new_tstep()
     162    m = hlsvd_wrapper.new_m()
     163    nssv = hlsvd_wrapper.new_nssv()
     164    nit = hlsvd_wrapper.new_nit()
     165    rootr = hlsvd_wrapper.new_double_array(50)
     166    rooti = hlsvd_wrapper.new_double_array(50)
     167    fre = hlsvd_wrapper.new_float_array(50)
     168    dam = hlsvd_wrapper.new_float_array(50)
     169    amp = hlsvd_wrapper.new_float_array(50)
     170    phase = hlsvd_wrapper.new_float_array(50)
     171    sv = hlsvd_wrapper.new_double_array(1024)
     172    snoise = hlsvd_wrapper.new_snoise()
     173    rms = hlsvd_wrapper.new_rms()
     174    ndiv = hlsvd_wrapper.new_ndiv()
     175    nitf = hlsvd_wrapper.new_nitf()
     178The call itself looks like this:
     181    hlsvd_wrapper.hlsvd(yr, yi, ndata, ndp, tstep, m, nssv,
     182                         rootr, rooti, fre, dam, amp, phase, sv, snoise,
     183                         rms, ndiv, nit, nitf)
     186Note that every disadvantage of the C interface remains. There's no distinction between input and output parameters. The complex numbers are still represented as correlated lists of floats. The list/array sizes must be specified explicitly, and an opportunity for memory corruption or a crash exists if one gets it wrong. One must keep track of which variables are floats and which are doubles. And it's many more lines of code than the Python version. (Some of this is exacerbated by ``hlsvd()``'s pass-by-reference interface.)
     188I should also note that generating the code to even make this interface possible requires the creation of a non-trivial interface file, a SWIG-ing step to auto-generate two cooperative wrappers (one in C and one in Python), and compilation and linkage of that C wrapper. Once the code is constructed, the call chain looks like this:
     191   My code --> SWIG's .py wrapper --> SWIG's .c wrapper --> libhlsvd
     193It works, for which I bow deeply to the SWIG developers. However, it was a fair amount of work to generate the interface file. (Granted, I was climbing the SWIG learning curve.) In addition, the result is  unPythonic and leaves us exposed to a lot of C's unpleasantness.
     196Wrapping hlsvd With Ctypes
     199Ctypes is part of the Python standard library. The Python documentation describes it like this:
     200  ctypes is a foreign function library for Python. It provides C compatible data types, and allows to call functions in dlls/shared libraries. It can be used to wrap these libraries in pure Python.
     202Note the key phrase "pure Python".
     204One starts with the very unPythonic explicit load of the C library to be called:
     207    lib = ctypes.CDLL("hlsvd.dylib")
     210This is one place where SWIG is more Pythonic, since I access SWIG's library with a normal Python import statement.
     213Ctypes already knows about, well, C types, so it's easy to create something that can be passed to a function expecting a C float, for instance:
     216   tstep = ctypes.c_float(.256)
     219Arrays are pretty simple too:
     222   # Create an array type; this is like a class
     223   InputFloatArrayType = ctypes.c_float * 2048
     224   # Create an instance of that "class"
     225   yr = InputFloatArrayType()
     228The variable setup for the call to ``hlsvd()`` looks like the code below:
     231    InputFloatArrayType = ctypes.c_float * 2048
     232    OutputDoubleArrayType = ctypes.c_double * 50
     233    OutputFloatArrayType = ctypes.c_float * 50
     235    yr = InputFloatArrayType()
     236    yi = InputFloatArrayType()
     238    ndpmax = ctypes.c_long(2048)
     239    ndp = ctypes.c_long(1024)
     240    tstep = ctypes.c_float(.256)
     241    m = ctypes.c_long(380)
     242    nssv = ctypes.c_long(25)
     243    nit = ctypes.c_long(500)
     245    rootr = OutputDoubleArrayType()
     246    rooti = OutputDoubleArrayType()
     247    fre = OutputFloatArrayType()
     248    dam = OutputFloatArrayType()
     249    amp = OutputFloatArrayType()
     250    phase = OutputFloatArrayType()
     251    sv = OutputDoubleArrayType()
     252    snoise = ctypes.c_double()
     253    rms = ctypes.c_double()
     254    ndiv = ctypes.c_long()
     255    nitf = ctypes.c_long()
     258    lib.hlsvd(yr, yi, ctypes.byref(ndpmax), ctypes.byref(ndp),
     259               ctypes.byref(tstep), ctypes.byref(m), ctypes.byref(nssv),
     260               ctypes.pointer(rootr), ctypes.pointer(rooti),
     261               ctypes.pointer(fre), ctypes.pointer(dam),
     262               ctypes.pointer(amp), ctypes.pointer(phase),
     263               ctypes.pointer(sv), ctypes.pointer(snoise),
     264               ctypes.pointer(rms), ctypes.pointer(ndiv),
     265               ctypes.byref(nit), ctypes.pointer(nitf)
     266              )
     269At first glance this might look no better than the SWIG code, but it's actually a bit nicer. For one thing, the array sizes are defined in one place (the first three lines) and not repeated over and over. In addition, it's pretty clear what's going on in a line like this:
     272    ndiv = ctypes.c_long()
     274whereas SWIG's version is opaque:
     277    ndiv = hlsvd_wrapper.new_ndiv()
     280The ``ctypes.byref()`` call is informative. It says that the C interface demands that I pass by reference, but I don't need access to the pointer here in Python. That tells anyone reading this code that I probably don't care what the called function does with the value, hinting that it is input-only. Again, I wouldn't need this at all if the f2c-generated code was tweaked to eliminate the unnecessary pass-by-reference for the input parameters.
     282Also note that the code above includes some value assignments, like this line which assigns the value .256 to tstep:
     285    tstep = ctypes.c_float(.256)
     288In my ideal Python example above I ignored assigning values to the variables for brevity's sake. And in order to make a fair comparison of SWIG's code to the ideal, I didn't include assignment in that code either. But to fairly compare SWIG and ctypes I need to mention assignment here, because it's simpler with ctypes. The code to assign a value to tstep via the SWIG interface looks like this:
     291    tstep = hlsvd_wrapper.new_tstep()
     292    hlsvd_wrapper.tstep_assign(tstep, .256)
     295Arrays are also considerably uglier in SWIG. First consider the ctypes version (inputs is a list that I read from a disk file):
     298    yi = InputFloatArrayType()
     299    for i, f in enumerate(inputs):
     300        yi[i] = inputs[i]
     303And now look at the SWIG version:
     306    yi = hlsvd_wrapper.new_float_array(2048)
     307    for i in range(0, 2048):
     308        hlsvd_wrapper.float_array_setitem(yi, i, inputs[i])
     311Last but not least, note that the code above is 95% of what I needed to use ctypes to call ``hlsvd()``. There's no interface file, no SWIG-ing step, and no compile & link steps. Just the Python standard library, that's it.
     313I have to say, though, that it's only because I saw SWIG first that I'm thrilled with ctypes. It's a nicer tool with which to build an equally ugly interface. Neither SWIG nor ctypes insulates us from the warts and wrinkles of the C interface.
     318For the purpose of wrapping ``hlsvd()``, both SWIG and ctypes get the job done, although neither result will win a beauty contest. Setting aside criticisms of their ugly and unPythonic interfaces, ctypes is the hands down winner in the ease-of-use category.
     320Some adjustments to the f2c-generated code and another Python wrapper layer around the code demonstrated here could go a long way to making the interface Pythonic.
     322I'm less clear on what this experience portends for wrapping a larger C++ library such as GAMMA. The example here was perhaps unfair to SWIG, since SWIG can generate code based on header files but there were no header files with which to work in this case. Ctypes code, on the other hand, must always be handwritten, so in order to wrap 100 functions one must write 100 chunks of ctypes code. SWIG, however, might be able to generate at least some of that automatically.
     324I also don't know what this implies for use of C++ objects within Python. Ctypes is strictly for wrapping C code (or code that exports a C-style calling interface) and knows nothing about C++ objects. I suspect that even if SWIG can make it possible to cajole C++ objects into Python, the Babelfish analogy would apply again -- the result might be Python in syntax, but C++ in spirit.
     331| FORTRAN defines only INTEGER, REAL and "DOUBLE PRECISION" numbers.
     334| The options to f2c control how it deals with FORTRAN data types.