Changes between Version 9 and Version 10 of SwiggingHlsvd

Jul 14, 2009, 4:42:13 PM (12 years ago)



  • SwiggingHlsvd

    v9 v10  
     3FooBar Header
    4 =============================================
    5 Making the hlsvd Library Accessible to Python
    6 =============================================
     6reStructuredText is **nice**.
    8 Introduction
    9 ------------------------------------
     8It has its own webpage_.
    11 hlsvd is a library written in FORTRAN. We've run it through f2c, a FORTRAN to C convertor. Our version of the library is therefore written in autogenerated  C. It exposes just one function called ``hlsvd()``.
     10A table:
    13 This is an explanation of what I learned using SWIG and ctypes to wrap the hlsvd library in order to make its function accessible to Python.
     12RST TracLinks
    15 Comments on the hlsvd Code
    16 ------------------------------------
     15See also ticket `#42`:trac:.
    18 I'm impressed by f2c; it's a very cool tool. Nevertheless, there's some imperfections I'd like to address. They're not specific to SWIG and ctypes; they're evident even when calling hlsvd from straight C.
    20 First, the f2c-generated code gives warnings when it compiles. They're mostly suggestions to parenthesize expressions so as not to rely on C operator precedence for statement correctness. I think they're only warning that the code is hard for humans to parse; I don't think the code is broken. However, I'm loath to ignore *any* compiler warnings; I want clean compiles. Hopefully we are on the same wavelength on this subject and I don't have to explain why I think clean compiles are extremely important, particularly in code that we'll share with others.
    22 Cleaning up the code to eliminate the warnings is easy enough, but modifying auto-generated code comes with the caveat that if it is ever auto-generated again, bye-bye modifications. I think that's a low risk in this case since (a) the FORTRAN hlsvd code is by now probably quite stable and so probably won't need to be f2c-ed again, and (b) any modifications we'd apply are pretty lightweight and would be pretty easy to reapply if necessary.
    24 A second problem with the code generated by f2c is the function interface. All of the parameters are pass-by-reference. This is technically correct because all FORTRAN params are pass-by-reference. C, however, is pass-by-value so f2c has to make every parameter to ``hlsvd()`` a pointer in order to mimic pass-by-reference semantics. It's unnecessary since many of the params to hlsvd are input-only, and it makes calling the function (from C, ctypes, or SWIG) more complicated than it needs to be.
    26 I'd like to modify the f2c-generated hlsvd.c to simplify the call signature by making all of the input-only parameters regular C pass-by-value parameters. It might be easier to add a wrapper function that achieves the same result. A wrapper would live in its own file which would have the advantage of not getting overwritten if this code is ever regenerated from the FORTRAN source.
    28 A third problem is that the generated hlsvd library exports about a zillion functions when it really only needs to export one. That's a little sloppy. It can be addressed by adding a "static" modifier to each function in the source code or by adding some linker commands. We can also just ignore it.
    30 Last but not least, f2c creates several C typedefs to mimic the FORTRAN variable types integer, real and double precision. In the code we have, these are mapped to the C types long, float and double. (By supplying options to f2c when it is run, one can modify these typedefs.) There's nothing wrong with f2c's choices and it would be unfair to present these typedefs as a flaw in f2c. Nevertheless, they do present us with a problem.
    32 The problem is that regardless of whether we use SWIG or ctypes or our own handwritten interface, sooner or later something will have to convert those C variables to Python objects by making calls to the Python API. Python's API provides functions for converting standard C types (int, long, float, char \*, etc.) to Python types. (For instance, the Python C library function ``PyInt_FromLong()`` accepts a C long as a parameter and returns a new Python integer object.) However, it does not (cannot) know anything about custom typedefs. At some level, a piece of code is going to have to make an assumption about the true type (i.e. the standard C type) to which the typedef maps.
    34 In our case, that's simple since there's only three typedefs we have to worry about and they're very simple. Nevertheless, if this code is f2c-ed again and for whatever reason the typedefs change (for instance, if f2c decides to start typedef-fing FORTRAN reals as C doubles instead of C floats), then our assumption will break and so will our code. Therefore, we have a dependency on how f2c generates code.
    36 If we never run f2c to regenerate this code, then this is a non-issue. If we do run f2c again, we must ensure that the typedefs of integer, real and doublereal still map to C's long, float and double.
    39 Calling hlsvd from Python
    40 ------------------------------------
    42 Before I get into specifics about SWIG and ctypes, I want to describe what an ideal Python interface to ``hlsvd()`` would look like. Remember that this function takes inputs of various types and returns a number of values also of various types. Returning multiple values from a function is one the of things Python makes easy, so we should take advantage of that.
    44 The C interface looks something like this:
    45 ::
    47     int hlsvd(float *,        //  1 - yr (input)
    48                float *,       //  2 - yi (input)
    49                long *,        //  3 - ndata (input)
    50                long *,        //  4 - ndp (input)
    51                float *,       //  5 - tstep (input)
    52                long *,        //  6 - m (input)
    53                long *,        //  7 - nssv (input)
    54                double *,      //  8 - rootr (output)
    55                double *,      //  9 - rooti (output)
    56                float *,       // 10 - fre (output)
    57                float *,       // 11 - dam (output)
    58                float *,       // 12 - amp (output)
    59                float *,       // 13 - phase (output)
    60                double *,      // 14 - sv (output)
    61                double *,      // 15 - snoise (output)
    62                double *,      // 16 - rms (output)
    63                long *,        // 17 - ndiv (output)
    64                long *,        // 18 - nit (input)
    65                long *);       // 19 - nitf (output)
    68 This comment about the parameters is stolen from the f2c-ed code:
    69 ::
    71     /*   on entry: */
    72     /*     yr    - data array (real part of time-domain signal) */
    73     /*     yi    - data array (imaginary part of time-domain signal) */
    74     /*     ndata - total number of data-points of experimental signal */
    75     /*     ndp   - number of data-points being used in the hlsvd analysis */
    76     /*     tstep - stepsize (milli-seconds) */
    77     /*     m     - size parameter of hankel data matrix */
    78     /*     nssv  - start value for number of signal-related singular values */
    79     /*     nit   - number of iterations entered */
    81     /*   on exit: */
    82     /*     rootr - signal-root array (real part) */
    83     /*     rooti - signal-root array (imaginary part) */
    84     /*     fre   - frequency array (khz) */
    85     /*     dam   - damping-factor array (milli-seconds) */
    86     /*     amp   - amplitude array (arbitrary units) */
    87     /*     phase - phase array (degrees) */
    88     /*     sv    - singular value array */
    89     /*     snois - estimation of signal noise level (standard deviation) */
    90     /*     rms   - root mean square of hlsvd fit */
    91     /*     nitf  - final number of iterations */
    92     /*     ndiv  - number of singular values found */
    95 The code to call an ideal Python version of ``hlsvd()`` might look something like the example below. (For clarity's sake I'm sticking with the same variable names as in the comments above; I generally prefer to avoid abbreviations.)
    96 ::
    98     # - y is an iterable type (e.g. a list) of complex numbers. Complex
    99     # numbers are a native type in Python.
    100     # - ndata is no longer necessary as it can be inferred from len(y)
    101     return_tuple = hlsvd(y, ndp, tstep, m, nssv, nit)
    103     # Extract values from the return_tuple
    104     # - root is a list of complex numbers
    105     # - fre, dam, amp, phase and sv are lists of floats
    106     # - snoise, rms, nitf and ndiv are floats or ints as appropriate
    107     root, fre, dam, amp, phase, sv, snoise, rms, nitf, ndiv = return_tuple
    110 This code has several advantages over the C version. For starters, the input and output parameters are clearly separated which makes the code easier to read. Second, since complex numbers are a native type in Python, there's no need to represent them as two correlated lists of floats. That makes the input parameter y and the output parameter root simpler to read and manipulate. It also eliminates the possibility of a mismatched list-size bug (i.e. if a different number of real and imaginary values are supplied). Yet another advantage is that we can eliminate one parameter (ndata) that can be inferred from ``len(y)``.
    112 Another bug that can't occur here is a memory overwrite due to undersized output parameters. In C one must preallocate the output parameters (like fre, amp, etc.) using ``malloc()`` or by declaring them on the stack. If they're too small, ``hlsvd()`` might crash (if you're lucky) or silently alter the values of neighboring arrays (if you're not). That can't happen in a Python interface like the one above.
    114 Last but not least, in the C interface some of the output parameters are floats and some are doubles. Python's float type covers both C float and C double, so we don't have to remember which is which.
    116 Our Python call is just two lines of code, and could be one:
    117 ::
    119     root, fre, dam, amp, phase, sv, snoise, rms, nitf, ndiv = \
    120         hlsvd(y, ndp, tstep, m, nssv, nit)
    123 With this Pythonic interface in mind, let's look at how the interfaces generated by SWIG and ctypes stack up.
    126 Wrapping hlsvd with SWIG
    127 ------------------------------------
    129 Wrapping anything with SWIG requires describing it to SWIG via an interface (.i) file. In an ideal world, SWIG can infer what it needs to know from a .h file and the interface file just refers SWIG to the .h file. In the case of hlsvd, there's no .h file so I have to describe the interface to SWIG explicitly.
    131 Complications arise with ``hlsvd()`` because it takes arrays (rather than simple types) as input parameters. Further complications result from the fact that it expects to be passed pre-allocated arrays to which it will write output. This is standard stuff with C but takes some work to express in a pointer-less language like Python.
    133 The resulting code is something only a mother could love. I should point out that I think SWIG is near miraculous in what it can do, and I don't intend any of my snide comments as criticism of SWIG or its developers. It's just that autotranslated code has all the charm of autotranslated writing. If you've ever used Babelfish, you know what I'm talking about. Meaning is (usually) preserved, albeit crudely, but all the idioms are lost and it's eminently clear that the text (or in this case, the interface) was not created by a native speaker.
    135 So we get ugly code like this (where inputs is a Python list I populated from a file):
    136 ::
    138     yr = hlsvd_wrapper.new_float_array(2048)
    139     for i in range(0, 2048):
    140         hlsvd_wrapper.float_array_setitem(yr, i, inputs[i])
    143 And this (which I think is only necessary because of ``hlsvd()``'s pass-by-reference awkwardness I mentioned above):
    144 ::
    146    tstep = hlsvd_wrapper.new_tstep()
    147    hlsvd_wrapper.tstep_assign(tstep, .256)
    150 And a series of calls like this one to create the output arrays:
    151 ::
    153    rootr = hlsvd_wrapper.new_double_array(50)
    155 In all, here's what one has to do just to create (not populate) the variables to be used in the call to ``hlsvd()``:
    156 ::
    158     yr = hlsvd_wrapper.new_float_array(2048)
    159     yi = hlsvd_wrapper.new_float_array(2048)
    160     ndata = hlsvd_wrapper.new_ndata()
    161     ndp = hlsvd_wrapper.new_ndp()
    162     tstep = hlsvd_wrapper.new_tstep()
    163     m = hlsvd_wrapper.new_m()
    164     nssv = hlsvd_wrapper.new_nssv()
    165     nit = hlsvd_wrapper.new_nit()
    166     rootr = hlsvd_wrapper.new_double_array(50)
    167     rooti = hlsvd_wrapper.new_double_array(50)
    168     fre = hlsvd_wrapper.new_float_array(50)
    169     dam = hlsvd_wrapper.new_float_array(50)
    170     amp = hlsvd_wrapper.new_float_array(50)
    171     phase = hlsvd_wrapper.new_float_array(50)
    172     sv = hlsvd_wrapper.new_double_array(1024)
    173     snoise = hlsvd_wrapper.new_snoise()
    174     rms = hlsvd_wrapper.new_rms()
    175     ndiv = hlsvd_wrapper.new_ndiv()
    176     nitf = hlsvd_wrapper.new_nitf()
    179 The call itself looks like this:
    180 ::
    182     hlsvd_wrapper.hlsvd(yr, yi, ndata, ndp, tstep, m, nssv,
    183                          rootr, rooti, fre, dam, amp, phase, sv, snoise,
    184                          rms, ndiv, nit, nitf)
    187 Note that every disadvantage of the C interface remains. There's no distinction between input and output parameters. The complex numbers are still represented as correlated lists of floats. The list/array sizes must be specified explicitly, and an opportunity for memory corruption or a crash exists if one gets it wrong. One must keep track of which variables are floats and which are doubles. And it's many more lines of code than the Python version. (Some of this is exacerbated by ``hlsvd()``'s pass-by-reference interface.)
    189 I should also note that generating the code to even make this interface possible requires the creation of a non-trivial interface file, a SWIG-ing step to auto-generate two cooperative wrappers (one in C and one in Python), and compilation and linkage of that C wrapper. Once the code is constructed, the call chain looks like this:
    190 ::
    192    My code --> SWIG's .py wrapper --> SWIG's .c wrapper --> libhlsvd
    194 It works, for which I bow deeply to the SWIG developers. However, it was a fair amount of work to generate the interface file. (Granted, I was climbing the SWIG learning curve.) In addition, the result is  unPythonic and leaves us exposed to a lot of C's unpleasantness.
    197 Wrapping hlsvd With Ctypes
    198 ------------------------------------
    200 Ctypes is part of the Python standard library. The Python documentation describes it like this:
    201   ctypes is a foreign function library for Python. It provides C compatible data types, and allows to call functions in dlls/shared libraries. It can be used to wrap these libraries in pure Python.
    203 Note the key phrase "pure Python".
    205 One starts with the very unPythonic explicit load of the C library to be called:
    206 ::
    208     lib = ctypes.CDLL("hlsvd.dylib")
    211 This is one place where SWIG is more Pythonic, since I access SWIG's library with a normal Python import statement.
    214 Ctypes already knows about, well, C types, so it's easy to create something that can be passed to a function expecting a C float, for instance:
    215 ::
    217    tstep = ctypes.c_float(.256)
    220 Arrays are pretty simple too:
    221 ::
    223    # Create an array type; this is like a class
    224    InputFloatArrayType = ctypes.c_float * 2048
    225    # Create an instance of that "class"
    226    yr = InputFloatArrayType()
    229 The variable setup for the call to ``hlsvd()`` looks like the code below:
    230 ::
    232     InputFloatArrayType = ctypes.c_float * 2048
    233     OutputDoubleArrayType = ctypes.c_double * 50
    234     OutputFloatArrayType = ctypes.c_float * 50
    236     yr = InputFloatArrayType()
    237     yi = InputFloatArrayType()
    239     ndpmax = ctypes.c_long(2048)
    240     ndp = ctypes.c_long(1024)
    241     tstep = ctypes.c_float(.256)
    242     m = ctypes.c_long(380)
    243     nssv = ctypes.c_long(25)
    244     nit = ctypes.c_long(500)
    246     rootr = OutputDoubleArrayType()
    247     rooti = OutputDoubleArrayType()
    248     fre = OutputFloatArrayType()
    249     dam = OutputFloatArrayType()
    250     amp = OutputFloatArrayType()
    251     phase = OutputFloatArrayType()
    252     sv = OutputDoubleArrayType()
    253     snoise = ctypes.c_double()
    254     rms = ctypes.c_double()
    255     ndiv = ctypes.c_long()
    256     nitf = ctypes.c_long()
    259     lib.hlsvd(yr, yi, ctypes.byref(ndpmax), ctypes.byref(ndp),
    260                ctypes.byref(tstep), ctypes.byref(m), ctypes.byref(nssv),
    261                ctypes.pointer(rootr), ctypes.pointer(rooti),
    262                ctypes.pointer(fre), ctypes.pointer(dam),
    263                ctypes.pointer(amp), ctypes.pointer(phase),
    264                ctypes.pointer(sv), ctypes.pointer(snoise),
    265                ctypes.pointer(rms), ctypes.pointer(ndiv),
    266                ctypes.byref(nit), ctypes.pointer(nitf)
    267               )
    270 At first glance this might look no better than the SWIG code, but it's actually a bit nicer. For one thing, the array sizes are defined in one place (the first three lines) and not repeated over and over. In addition, it's pretty clear what's going on in a line like this:
    271 ::
    273     ndiv = ctypes.c_long()
    275 whereas SWIG's version is opaque:
    276 ::
    278     ndiv = hlsvd_wrapper.new_ndiv()
    281 The ``ctypes.byref()`` call is informative. It says that the C interface demands that I pass by reference, but I don't need access to the pointer here in Python. That tells anyone reading this code that I probably don't care what the called function does with the value, hinting that it is input-only. Again, I wouldn't need this at all if the f2c-generated code was tweaked to eliminate the unnecessary pass-by-reference for the input parameters.
    283 Also note that the code above includes some value assignments, like this line which assigns the value .256 to tstep:
    284 ::
    286     tstep = ctypes.c_float(.256)
    289 In my ideal Python example above I ignored assigning values to the variables for brevity's sake. And in order to make a fair comparison of SWIG's code to the ideal, I didn't include assignment in that code either. But to fairly compare SWIG and ctypes I need to mention assignment here, because it's simpler with ctypes. The code to assign a value to tstep via the SWIG interface looks like this:
    290 ::
    292     tstep = hlsvd_wrapper.new_tstep()
    293     hlsvd_wrapper.tstep_assign(tstep, .256)
    296 Arrays are also considerably uglier in SWIG. First consider the ctypes version (inputs is a list that I read from a disk file):
    297 ::
    299     yi = InputFloatArrayType()
    300     for i, f in enumerate(inputs):
    301         yi[i] = inputs[i]
    304 And now look at the SWIG version:
    305 ::
    307     yi = hlsvd_wrapper.new_float_array(2048)
    308     for i in range(0, 2048):
    309         hlsvd_wrapper.float_array_setitem(yi, i, inputs[i])
    312 Last but not least, note that the code above is 95% of what I needed to use ctypes to call ``hlsvd()``. There's no interface file, no SWIG-ing step, and no compile & link steps. Just the Python standard library, that's it.
    314 I have to say, though, that it's only because I saw SWIG first that I'm thrilled with ctypes. It's a nicer tool with which to build an equally ugly interface. Neither SWIG nor ctypes insulates us from the warts and wrinkles of the C interface.
    316 Conclusions
    317 ------------------------------------
    319 For the purpose of wrapping ``hlsvd()``, both SWIG and ctypes get the job done, although neither result will win a beauty contest. Setting aside criticisms of their ugly and unPythonic interfaces, ctypes is the hands down winner in the ease-of-use category.
    321 Some adjustments to the f2c-generated code and another Python wrapper layer around the code demonstrated here could go a long way to making the interface Pythonic.
    323 I'm less clear on what this experience portends for wrapping a larger C++ library such as GAMMA. The example here was perhaps unfair to SWIG, since SWIG can generate code based on header files but there were no header files with which to work in this case. Ctypes code, on the other hand, must always be handwritten, so in order to wrap 100 functions one must write 100 chunks of ctypes code. SWIG, however, might be able to generate at least some of that automatically.
    325 I also don't know what this implies for use of C++ objects within Python. Ctypes is strictly for wrapping C code (or code that exports a C-style calling interface) and knows nothing about C++ objects. I suspect that even if SWIG can make it possible to cajole C++ objects into Python, the Babelfish analogy would apply again -- the result might be Python in syntax, but C++ in spirit.
    329 References
    330 ------------------------------------
    332 | FORTRAN defines only INTEGER, REAL and "DOUBLE PRECISION" numbers.
    333 |
    335 | The options to f2c control how it deals with FORTRAN data types.
    336 |
     17.. _webpage: