Opened 8 years ago

Closed 8 years ago

#38 closed defect (fixed)

HLSVD crashes for certain data points values

Reported by: flip Owned by: flip
Priority: major Milestone:
Component: hlsvd Version:
Keywords: Cc:

Description

While playing around with HLSVD I noticed that it crashes hard (segfault, bus error, etc.) when given certain values for # of data points.

The crash is the result of an easy-to-fix logic bug in the Fortran (described below). The situation is a little more complicated than fixing the bug, though, because the logic bug happens in a section of code flagged as 'error in fft'. I don't think we would have expected this FFT code to fail so often, so that frequent failure might represent another bug or a lack of robustness in the algorithm. We might need to file a second ticket describing that problem. This particular ticket is focused on the crash.

Inducing Failure

There's code attached that tests the problem. The code discard the results from the call to hlsvd() because it's only meant to test whether the call crashes or not. Once this bug is fixed, the attached test code will behave the same regardless of whether or not the call succeeds or fails.

The success or failure of a given # of data points is 100% consistent. What's more, it is consistent on both OS X and Windows. I didn't test with Linux but I expect the results would be exactly the same.

I can't find a strong pattern to predict which numbers succeed or fail. However, I can say that all the powers of 2 that I've tried succeed, as do all numbers evenly divisible by 100.

To demonstrate the lack of a pattern, here are some values I tried for # of data points. The numbers in the left hand column succeed, the ones on the right cause a failure. (I only used even numbers.) Given you find pattern here? I can't.

 OK       Fail
----      ----
 900
           902
           904
           906
           908
           910
 912
           914
           916
 918
 920
           922
 924
           926
           928
           930
           932
           934
 936
           938
           940

The Logic Bug

In hlsubr.f there's a subroutine called fftwp(). The 6th param to fftwp() is called "isn". fftwp() is called from three places in the code, and each passes a hardcoded 1 for the value of isn, like this example from lanczos.f:

call fftwp(lr,li,n1,n1,n1,1)

The logic bug is that under certain conditions, fftwp() sets isn=0 (at label 660). Since params are passed by reference in Fortran, this is trying to write to the address of the integer 1, hence the crash.

I don't know if fftwp() is at fault for trying to overwrite what was meant to be a read-only param, or if the callers are at fault for passing a hardcoded int where a writeable variable was expected. fftwp() is almost completely devoid of meaningful comments so it's hard to figure out what "isn" stands for, let alone whether or not it was meant to be read-write.

The fix I suggest is to simply comment out the line that sets isn=0. Even if this was meant as feedback to the callers, they clearly aren't using it.

Why The f2c-ed Code Doesn't Fail

Our f2c-ed version of this code doesn't crash in the same place. (It doesn't crash, but the jury is still out on whether or not it returns meaningful results when fftwp() fails.) The f2c-ed code doesn't crash because f2c implements Fortran's call-by-reference semantics by passing pointers and that gives fftwp() space to write to. The C version of the call to fftwp() looks something like this --

int isn;

isn = 1;
fftwp(&lr, &li, &n1, &n1, &n1, &isn);

So when the code inside fftwp() does this, no harm is done --

*isn = 0;

Attachments (2)

hlsvd_input.xml (32.4 KB) - added by flip 8 years ago.
main.py (897 bytes) - added by flip 8 years ago.

Download all attachments as: .zip

Change History (3)

Changed 8 years ago by flip

Changed 8 years ago by flip

comment:1 Changed 8 years ago by flip

  • Resolution set to fixed
  • Status changed from new to closed

Fixed in r3112.

Also, the symptoms I observed regarding some numbers failing indeed indicate the "lack of robustness" I hinted at. It's a design choice, I think, because fftwp() implements a "mixed-radix fft" which apparently accepts a limited number of inputs.

Note: See TracTickets for help on using tickets.