Changes between Version 3 and Version 4 of SwiggingHlsvd


Ignore:
Timestamp:
Jul 14, 2009, 3:51:56 PM (10 years ago)
Author:
flip
Comment:

--

Legend:

Unmodified
Added
Removed
Modified
  • SwiggingHlsvd

    v3 v4  
    1 {{{
    2 #!rst
     1{{{
     2#!rst
     3FooBar Header
     4=============
    35
    4 =============================================
    5 Making the hlsvd Library Accessible to Python
    6 =============================================
     6reStructuredText is **nice**.
    77
    8 Introduction
    9 ------------------------------------
     8It has its own webpage_.
    109
    11 hlsvd is a library written in FORTRAN. We've run it through f2c, a FORTRAN to C convertor. Our version of the library is therefore written in autogenerated  C. It exposes just one function called ``hlsvd()``.
     10A table:
    1211
    13 This is an explanation of what I learned using SWIG and ctypes to wrap the hlsvd library in order to make its function accessible to Python.
     12RST TracLinks
     13-------------
    1414
    15 Comments on the hlsvd Code
    16 ------------------------------------
     15See also ticket `#42`:trac:.
    1716
    18 I'm impressed by f2c; it's a very cool tool. Nevertheless, there's some imperfections I'd like to address. They're not specific to SWIG and ctypes; they're evident even when calling hlsvd from straight C.
    19 
    20 First, the f2c-generated code gives warnings when it compiles. They're mostly suggestions to parenthesize expressions so as not to rely on C operator precedence for statement correctness. I think they're only warning that the code is hard for humans to parse; I don't think the code is broken. However, I'm loath to ignore *any* compiler warnings; I want clean compiles. Hopefully we are on the same wavelength on this subject and I don't have to explain why I think clean compiles are extremely important, particularly in code that we'll share with others.
    21 
    22 Cleaning up the code to eliminate the warnings is easy enough, but modifying auto-generated code comes with the caveat that if it is ever auto-generated again, bye-bye modifications. I think that's a low risk in this case since (a) the FORTRAN hlsvd code is by now probably quite stable and so probably won't need to be f2c-ed again, and (b) any modifications we'd apply are pretty lightweight and would be pretty easy to reapply if necessary.
    23 
    24 A second problem with the code generated by f2c is the function interface. All of the parameters are pass-by-reference. This is technically correct because all FORTRAN params are pass-by-reference. C, however, is pass-by-value so f2c has to make every parameter to ``hlsvd()`` a pointer in order to mimic pass-by-reference semantics. It's unnecessary since many of the params to hlsvd are input-only, and it makes calling the function (from C, ctypes, or SWIG) more complicated than it needs to be.
    25 
    26 I'd like to modify the f2c-generated hlsvd.c to simplify the call signature by making all of the input-only parameters regular C pass-by-value parameters. It might be easier to add a wrapper function that achieves the same result. A wrapper would live in its own file which would have the advantage of not getting overwritten if this code is ever regenerated from the FORTRAN source.
    27 
    28 A third problem is that the generated hlsvd library exports about a zillion functions when it really only needs to export one. That's a little sloppy. It can be addressed by adding a "static" modifier to each function in the source code or by adding some linker commands. We can also just ignore it.
    29 
    30 Last but not least, f2c creates several C typedefs to mimic the FORTRAN variable types integer, real and double precision. In the code we have, these are mapped to the C types long, float and double. (By supplying options to f2c when it is run, one can modify these typedefs.) There's nothing wrong with f2c's choices and it would be unfair to present these typedefs as a flaw in f2c. Nevertheless, they do present us with a problem.
    31 
    32 The problem is that regardless of whether we use SWIG or ctypes or our own handwritten interface, sooner or later something will have to convert those C variables to Python objects by making calls to the Python API. Python's API provides functions for converting standard C types (int, long, float, char \*, etc.) to Python types. (For instance, the Python C library function ``PyInt_FromLong()`` accepts a C long as a parameter and returns a new Python integer object.) However, it does not (cannot) know anything about custom typedefs. At some level, a piece of code is going to have to make an assumption about the true type (i.e. the standard C type) to which the typedef maps.
    33 
    34 In our case, that's simple since there's only three typedefs we have to worry about and they're very simple. Nevertheless, if this code is f2c-ed again and for whatever reason the typedefs change (for instance, if f2c decides to start typedef-fing FORTRAN reals as C doubles instead of C floats), then our assumption will break and so will our code. Therefore, we have a dependency on how f2c generates code.
    35 
    36 If we never run f2c to regenerate this code, then this is a non-issue. If we do run f2c again, we must ensure that the typedefs of integer, real and doublereal still map to C's long, float and double.
    37 
    38 
    39 Calling hlsvd from Python
    40 ------------------------------------
    41 
    42 Before I get into specifics about SWIG and ctypes, I want to describe what an ideal Python interface to ``hlsvd()`` would look like. Remember that this function takes inputs of various types and returns a number of values also of various types. Returning multiple values from a function is one the of things Python makes easy, so we should take advantage of that.
    43 
    44 The C interface looks something like this:
    45 ::
    46 
    47     int hlsvd(float *,        //  1 - yr (input)
    48                float *,       //  2 - yi (input)
    49                long *,        //  3 - ndata (input)
    50                long *,        //  4 - ndp (input)
    51                float *,       //  5 - tstep (input)
    52                long *,        //  6 - m (input)
    53                long *,        //  7 - nssv (input)
    54                double *,      //  8 - rootr (output)
    55                double *,      //  9 - rooti (output)
    56                float *,       // 10 - fre (output)
    57                float *,       // 11 - dam (output)
    58                float *,       // 12 - amp (output)
    59                float *,       // 13 - phase (output)
    60                double *,      // 14 - sv (output)
    61                double *,      // 15 - snoise (output)
    62                double *,      // 16 - rms (output)
    63                long *,        // 17 - ndiv (output)
    64                long *,        // 18 - nit (input)
    65                long *);       // 19 - nitf (output)
    66 
    67 
    68 This comment about the parameters is stolen from the f2c-ed code:
    69 ::
    70 
    71     /*   on entry: */
    72     /*     yr    - data array (real part of time-domain signal) */
    73     /*     yi    - data array (imaginary part of time-domain signal) */
    74     /*     ndata - total number of data-points of experimental signal */
    75     /*     ndp   - number of data-points being used in the hlsvd analysis */
    76     /*     tstep - stepsize (milli-seconds) */
    77     /*     m     - size parameter of hankel data matrix */
    78     /*     nssv  - start value for number of signal-related singular values */
    79     /*     nit   - number of iterations entered */
    80 
    81     /*   on exit: */
    82     /*     rootr - signal-root array (real part) */
    83     /*     rooti - signal-root array (imaginary part) */
    84     /*     fre   - frequency array (khz) */
    85     /*     dam   - damping-factor array (milli-seconds) */
    86     /*     amp   - amplitude array (arbitrary units) */
    87     /*     phase - phase array (degrees) */
    88     /*     sv    - singular value array */
    89     /*     snois - estimation of signal noise level (standard deviation) */
    90     /*     rms   - root mean square of hlsvd fit */
    91     /*     nitf  - final number of iterations */
    92     /*     ndiv  - number of singular values found */
    93 
    94 
    95 The code to call an ideal Python version of ``hlsvd()`` might look something like the example below. (For clarity's sake I'm sticking with the same variable names as in the comments above; I generally prefer to avoid abbreviations.)
    96 ::
    97 
    98     # - y is an iterable type (e.g. a list) of complex numbers. Complex
    99     # numbers are a native type in Python.
    100     # - ndata is no longer necessary as it can be inferred from len(y)
    101     return_tuple = hlsvd(y, ndp, tstep, m, nssv, nit)
    102 
    103     # Extract values from the return_tuple
    104     # - root is a list of complex numbers
    105     # - fre, dam, amp, phase and sv are lists of floats
    106     # - snoise, rms, nitf and ndiv are floats or ints as appropriate
    107     root, fre, dam, amp, phase, sv, snoise, rms, nitf, ndiv = return_tuple
    108 
    109 
    110 This code has several advantages over the C version. For starters, the input and output parameters are clearly separated which makes the code easier to read. Second, since complex numbers are a native type in Python, there's no need to represent them as two correlated lists of floats. That makes the input parameter y and the output parameter root simpler to read and manipulate. It also eliminates the possibility of a mismatched list-size bug (i.e. if a different number of real and imaginary values are supplied). Yet another advantage is that we can eliminate one parameter (ndata) that can be inferred from ``len(y)``.
    111 
    112 Another bug that can't occur here is a memory overwrite due to undersized output parameters. In C one must preallocate the output parameters (like fre, amp, etc.) using ``malloc()`` or by declaring them on the stack. If they're too small, ``hlsvd()`` might crash (if you're lucky) or silently alter the values of neighboring arrays (if you're not). That can't happen in a Python interface like the one above.
    113 
    114 Last but not least, in the C interface some of the output parameters are floats and some are doubles. Python's float type covers both C float and C double, so we don't have to remember which is which.
    115 
    116 Our Python call is just two lines of code, and could be one:
    117 ::
    118 
    119     root, fre, dam, amp, phase, sv, snoise, rms, nitf, ndiv = \
    120         hlsvd(y, ndp, tstep, m, nssv, nit)
    121 
    122 
    123 With this Pythonic interface in mind, let's look at how the interfaces generated by SWIG and ctypes stack up.
    124 
    125 
    126 Wrapping hlsvd with SWIG
    127 ------------------------------------
    128 
    129 Wrapping anything with SWIG requires describing it to SWIG via an interface (.i) file. In an ideal world, SWIG can infer what it needs to know from a .h file and the interface file just refers SWIG to the .h file. In the case of hlsvd, there's no .h file so I have to describe the interface to SWIG explicitly.
    130 
    131 Complications arise with ``hlsvd()`` because it takes arrays (rather than simple types) as input parameters. Further complications result from the fact that it expects to be passed pre-allocated arrays to which it will write output. This is standard stuff with C but takes some work to express in a pointer-less language like Python.
    132 
    133 The resulting code is something only a mother could love. I should point out that I think SWIG is near miraculous in what it can do, and I don't intend any of my snide comments as criticism of SWIG or its developers. It's just that autotranslated code has all the charm of autotranslated writing. If you've ever used Babelfish, you know what I'm talking about. Meaning is (usually) preserved, albeit crudely, but all the idioms are lost and it's eminently clear that the text (or in this case, the interface) was not created by a native speaker.
    134 
    135 So we get ugly code like this (where inputs is a Python list I populated from a file):
    136 ::
    137 
    138     yr = hlsvd_wrapper.new_float_array(2048)
    139     for i in range(0, 2048):
    140         hlsvd_wrapper.float_array_setitem(yr, i, inputs[i])
    141 
    142 
    143 And this (which I think is only necessary because of ``hlsvd()``'s pass-by-reference awkwardness I mentioned above):
    144 ::
    145 
    146    tstep = hlsvd_wrapper.new_tstep()
    147    hlsvd_wrapper.tstep_assign(tstep, .256)
    148 
    149 
    150 And a series of calls like this one to create the output arrays:
    151 ::
    152 
    153    rootr = hlsvd_wrapper.new_double_array(50)
    154 
    155 In all, here's what one has to do just to create (not populate) the variables to be used in the call to ``hlsvd()``:
    156 ::
    157 
    158     yr = hlsvd_wrapper.new_float_array(2048)
    159     yi = hlsvd_wrapper.new_float_array(2048)
    160     ndata = hlsvd_wrapper.new_ndata()
    161     ndp = hlsvd_wrapper.new_ndp()
    162     tstep = hlsvd_wrapper.new_tstep()
    163     m = hlsvd_wrapper.new_m()
    164     nssv = hlsvd_wrapper.new_nssv()
    165     nit = hlsvd_wrapper.new_nit()
    166     rootr = hlsvd_wrapper.new_double_array(50)
    167     rooti = hlsvd_wrapper.new_double_array(50)
    168     fre = hlsvd_wrapper.new_float_array(50)
    169     dam = hlsvd_wrapper.new_float_array(50)
    170     amp = hlsvd_wrapper.new_float_array(50)
    171     phase = hlsvd_wrapper.new_float_array(50)
    172     sv = hlsvd_wrapper.new_double_array(1024)
    173     snoise = hlsvd_wrapper.new_snoise()
    174     rms = hlsvd_wrapper.new_rms()
    175     ndiv = hlsvd_wrapper.new_ndiv()
    176     nitf = hlsvd_wrapper.new_nitf()
    177 
    178 
    179 The call itself looks like this:
    180 ::
    181 
    182     hlsvd_wrapper.hlsvd(yr, yi, ndata, ndp, tstep, m, nssv,
    183                          rootr, rooti, fre, dam, amp, phase, sv, snoise,
    184                          rms, ndiv, nit, nitf)
    185 
    186 
    187 Note that every disadvantage of the C interface remains. There's no distinction between input and output parameters. The complex numbers are still represented as correlated lists of floats. The list/array sizes must be specified explicitly, and an opportunity for memory corruption or a crash exists if one gets it wrong. One must keep track of which variables are floats and which are doubles. And it's many more lines of code than the Python version. (Some of this is exacerbated by ``hlsvd()``'s pass-by-reference interface.)
    188 
    189 I should also note that generating the code to even make this interface possible requires the creation of a non-trivial interface file, a SWIG-ing step to auto-generate two cooperative wrappers (one in C and one in Python), and compilation and linkage of that C wrapper. Once the code is constructed, the call chain looks like this:
    190 ::
    191 
    192    My code --> SWIG's .py wrapper --> SWIG's .c wrapper --> libhlsvd
    193 
    194 It works, for which I bow deeply to the SWIG developers. However, it was a fair amount of work to generate the interface file. (Granted, I was climbing the SWIG learning curve.) In addition, the result is  unPythonic and leaves us exposed to a lot of C's unpleasantness.
    195 
    196 
    197 Wrapping hlsvd With Ctypes
    198 ------------------------------------
    199 
    200 Ctypes is part of the Python standard library. The Python documentation describes it like this:
    201   ctypes is a foreign function library for Python. It provides C compatible data types, and allows to call functions in dlls/shared libraries. It can be used to wrap these libraries in pure Python.
    202 
    203 Note the key phrase "pure Python".
    204 
    205 One starts with the very unPythonic explicit load of the C library to be called:
    206 ::
    207 
    208     lib = ctypes.CDLL("hlsvd.dylib")
    209 
    210 
    211 This is one place where SWIG is more Pythonic, since I access SWIG's library with a normal Python import statement.
    212 
    213 
    214 Ctypes already knows about, well, C types, so it's easy to create something that can be passed to a function expecting a C float, for instance:
    215 ::
    216 
    217    tstep = ctypes.c_float(.256)
    218 
    219 
    220 Arrays are pretty simple too:
    221 ::
    222 
    223    # Create an array type; this is like a class
    224    InputFloatArrayType = ctypes.c_float * 2048
    225    # Create an instance of that "class"
    226    yr = InputFloatArrayType()
    227 
    228 
    229 The variable setup for the call to ``hlsvd()`` looks like the code below:
    230 ::
    231 
    232     InputFloatArrayType = ctypes.c_float * 2048
    233     OutputDoubleArrayType = ctypes.c_double * 50
    234     OutputFloatArrayType = ctypes.c_float * 50
    235 
    236     yr = InputFloatArrayType()
    237     yi = InputFloatArrayType()
    238 
    239     ndpmax = ctypes.c_long(2048)
    240     ndp = ctypes.c_long(1024)
    241     tstep = ctypes.c_float(.256)
    242     m = ctypes.c_long(380)
    243     nssv = ctypes.c_long(25)
    244     nit = ctypes.c_long(500)
    245 
    246     rootr = OutputDoubleArrayType()
    247     rooti = OutputDoubleArrayType()
    248     fre = OutputFloatArrayType()
    249     dam = OutputFloatArrayType()
    250     amp = OutputFloatArrayType()
    251     phase = OutputFloatArrayType()
    252     sv = OutputDoubleArrayType()
    253     snoise = ctypes.c_double()
    254     rms = ctypes.c_double()
    255     ndiv = ctypes.c_long()
    256     nitf = ctypes.c_long()
    257 
    258 
    259     lib.hlsvd(yr, yi, ctypes.byref(ndpmax), ctypes.byref(ndp),
    260                ctypes.byref(tstep), ctypes.byref(m), ctypes.byref(nssv),
    261                ctypes.pointer(rootr), ctypes.pointer(rooti),
    262                ctypes.pointer(fre), ctypes.pointer(dam),
    263                ctypes.pointer(amp), ctypes.pointer(phase),
    264                ctypes.pointer(sv), ctypes.pointer(snoise),
    265                ctypes.pointer(rms), ctypes.pointer(ndiv),
    266                ctypes.byref(nit), ctypes.pointer(nitf)
    267               )
    268 
    269 
    270 At first glance this might look no better than the SWIG code, but it's actually a bit nicer. For one thing, the array sizes are defined in one place (the first three lines) and not repeated over and over. In addition, it's pretty clear what's going on in a line like this:
    271 ::
    272 
    273     ndiv = ctypes.c_long()
    274 
    275 whereas SWIG's version is opaque:
    276 ::
    277 
    278     ndiv = hlsvd_wrapper.new_ndiv()
    279 
    280 
    281 The ``ctypes.byref()`` call is informative. It says that the C interface demands that I pass by reference, but I don't need access to the pointer here in Python. That tells anyone reading this code that I probably don't care what the called function does with the value, hinting that it is input-only. Again, I wouldn't need this at all if the f2c-generated code was tweaked to eliminate the unnecessary pass-by-reference for the input parameters.
    282 
    283 Also note that the code above includes some value assignments, like this line which assigns the value .256 to tstep:
    284 ::
    285 
    286     tstep = ctypes.c_float(.256)
    287 
    288 
    289 In my ideal Python example above I ignored assigning values to the variables for brevity's sake. And in order to make a fair comparison of SWIG's code to the ideal, I didn't include assignment in that code either. But to fairly compare SWIG and ctypes I need to mention assignment here, because it's simpler with ctypes. The code to assign a value to tstep via the SWIG interface looks like this:
    290 ::
    291 
    292     tstep = hlsvd_wrapper.new_tstep()
    293     hlsvd_wrapper.tstep_assign(tstep, .256)
    294 
    295 
    296 Arrays are also considerably uglier in SWIG. First consider the ctypes version (inputs is a list that I read from a disk file):
    297 ::
    298 
    299     yi = InputFloatArrayType()
    300     for i, f in enumerate(inputs):
    301         yi[i] = inputs[i]
    302 
    303 
    304 And now look at the SWIG version:
    305 ::
    306 
    307     yi = hlsvd_wrapper.new_float_array(2048)
    308     for i in range(0, 2048):
    309         hlsvd_wrapper.float_array_setitem(yi, i, inputs[i])
    310 
    311 
    312 Last but not least, note that the code above is 95% of what I needed to use ctypes to call ``hlsvd()``. There's no interface file, no SWIG-ing step, and no compile & link steps. Just the Python standard library, that's it.
    313 
    314 I have to say, though, that it's only because I saw SWIG first that I'm thrilled with ctypes. It's a nicer tool with which to build an equally ugly interface. Neither SWIG nor ctypes insulates us from the warts and wrinkles of the C interface.
    315 
    316 Conclusions
    317 ------------------------------------
    318 
    319 For the purpose of wrapping ``hlsvd()``, both SWIG and ctypes get the job done, although neither result will win a beauty contest. Setting aside criticisms of their ugly and unPythonic interfaces, ctypes is the hands down winner in the ease-of-use category.
    320 
    321 Some adjustments to the f2c-generated code and another Python wrapper layer around the code demonstrated here could go a long way to making the interface Pythonic.
    322 
    323 I'm less clear on what this experience portends for wrapping a larger C++ library such as GAMMA. The example here was perhaps unfair to SWIG, since SWIG can generate code based on header files but there were no header files with which to work in this case. Ctypes code, on the other hand, must always be handwritten, so in order to wrap 100 functions one must write 100 chunks of ctypes code. SWIG, however, might be able to generate at least some of that automatically.
    324 
    325 I also don't know what this implies for use of C++ objects within Python. Ctypes is strictly for wrapping C code (or code that exports a C-style calling interface) and knows nothing about C++ objects. I suspect that even if SWIG can make it possible to cajole C++ objects into Python, the Babelfish analogy would apply again -- the result might be Python in syntax, but C++ in spirit.
    326 
    327 
    328 
    329 References
    330 ------------------------------------
    331 
    332 | FORTRAN defines only INTEGER, REAL and "DOUBLE PRECISION" numbers.
    333 | http://www.ibiblio.org/pub/languages/fortran/ch2-3.html
    334 
    335 | The options to f2c control how it deals with FORTRAN data types.
    336 | http://www.netlib.org/f2c/f2c.1
    337 
    338 
    339 
    340 
     17.. _webpage: http://docutils.sourceforge.net/rst.html
    34118
    34219}}}