Opened 8 years ago

Closed 5 years ago

#49 closed defect (duplicate)

HLSVDPRO passing NaN to ZLASCL

Reported by: flip
Owned by: flip
Priority: major
Milestone:
Component: svd
Version:
Keywords:
Cc:

Description

When using HLSVDPRO, I've seen intermittent cases of the following error:

 ** On entry to ZLASCL parameter number  4 had an illegal value

"Intermittent" isn't quite the right word to describe this problem. When it crops up, it tends to be 100% reproducible. But I've thought several times that I had found the source of the problem and vanquished it, only to have it return. Since I've been unable to find the magic conditions that reliably turn it off and on, I'll describe this in general terms.

I should mention that I've seen this mostly on 32-bit OS X, but also occasionally on 64-bit Linux and never on Windows. That might simply be because I do 98% of my testing under OS X.

The problem happens in the function zsafescal() which is in zsafescal.f. It mostly consists of this:

    sfmin = dlamch('s')

    if (abs(alpha).ge.sfmin) then
       call zscal(n,one/alpha, x, 1)
    else
       call zlascl('General',i,i,alpha,one,n,1,x,n,info)
    endif

In some cases alpha (which is a parameter to zsafescal()) is NaN. It is passed on to ZLASCL as the fourth argument (CFROM), hence the error "ZLASCL parameter number 4 had an illegal value".
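Why a NaN ends up in the zlascl branch rather than the zscal branch can be seen in isolation. The following is a minimal free-form Fortran sketch, not code from HLSVDPRO. It assumes alpha is double precision, uses tiny(1.0d0) as a stand-in for dlamch('s'), and assumes a compiler that follows IEEE comparison rules (no fast-math style flags). Under those assumptions the .ge. test is false for NaN, so the else branch is the one that runs.

```fortran
program nan_branch_demo
  use, intrinsic :: ieee_arithmetic, only: ieee_value, ieee_quiet_nan, &
                                           ieee_is_nan
  implicit none
  double precision :: alpha, sfmin

  ! Stand-in for dlamch('s'); an assumption made so the sketch
  ! does not need LAPACK to build.
  sfmin = tiny(1.0d0)

  ! Manufacture a quiet NaN to play the role of the bad alpha.
  alpha = ieee_value(1.0d0, ieee_quiet_nan)

  ! Same test as in zsafescal(). Any ordered comparison involving
  ! NaN is false, so control falls through to the else branch.
  if (abs(alpha) .ge. sfmin) then
     print *, 'would call zscal'
  else
     print *, 'would call zlascl'
  end if

  ! An explicit NaN test, for comparison with the branch above.
  print *, 'alpha is NaN: ', ieee_is_nan(alpha)
end program nan_branch_demo
```

Note that ZLASCL rejects a CFROM that is either zero or NaN with the same "parameter number 4" message, so the message alone does not distinguish the two cases.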

Initially, I found that disabling optimization when compiling zsafescal.f fixed the problem. This was true regardless of whether I disabled optimization for just this one file or for the whole project. This led me to suspect that zsafescal() was doing something unusual that was tripping up the compiler's optimizer.

zsafescal() used the Fortran keyword "save" (which creates something similar to a C static variable) so I eliminated the use of that. This made the problem disappear even when optimization was turned on, so I felt sure I'd solved the problem.

HLSVDPRO worked for a while, then the problem came back. Even disabling optimization didn't help.

I was able to see that the NaN passed to zsafescal() was really the problem, so I investigated the callers. There are 5 calls to zsafescal() in zlanbprow.f. I tried to debug this problem by adding print statements just before each call to zsafescal() so I could see which one was passing NaN. Curious thing is, adding the print statements makes the problem disappear. In fact, this is the only way I've found I can consistently squash this problem. At first I printed the value of alpha but I found that even printing "" made the problem disappear.
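A quieter alternative to the print statements is a check that only produces output when alpha really is NaN. The sketch below is hypothetical: check_alpha() and its integer call-site label are names invented for illustration and are not part of HLSVDPRO, and it is written in free-form Fortran 2003 rather than the fixed form used by the .f files. Like the print statements, the extra call may itself perturb whatever is causing the failure, so it may hide the problem rather than locate it.

```fortran
module alpha_check
  use, intrinsic :: ieee_arithmetic, only: ieee_is_nan
  use, intrinsic :: iso_fortran_env, only: error_unit
  implicit none
contains
  ! Hypothetical helper: report which call site supplied a NaN
  ! scale factor. 'site' is a label chosen by the caller.
  subroutine check_alpha(site, alpha)
    integer, intent(in) :: site
    double precision, intent(in) :: alpha
    if (ieee_is_nan(alpha)) then
       write (error_unit, '(a,i0)') 'NaN alpha at call site ', site
    end if
  end subroutine check_alpha
end module alpha_check

program check_alpha_demo
  use, intrinsic :: ieee_arithmetic, only: ieee_value, ieee_quiet_nan
  use alpha_check
  implicit none
  double precision :: good, bad

  good = 2.0d0
  bad = ieee_value(1.0d0, ieee_quiet_nan)

  call check_alpha(1, good)   ! finite value: nothing is written
  call check_alpha(2, bad)    ! NaN: the call-site label is reported
end program check_alpha_demo
```

In use, one such call with a distinct label would go immediately before each of the five calls to zsafescal() in zlanbprow.f.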

Based on this, my guess is that there's something on the stack that should not be on the stack when zsafescal() is called. When I add a print statement, it's the print that receives the broken stack and not zsafescal(). For whatever reason, print isn't troubled by this whereas zsafescal() is.

I have further evidence to support this theory. Printing to the console is OK when testing but not OK for a release-worthy version of Analysis. So I replaced the print statements with a call to a function I wrote called donothing(). That function writes "" to /dev/null instead of to stdout. Calling donothing() just before each call to zsafescal() also makes the problem disappear. In fact, it even does so if I comment out the code that writes "" to /dev/null.

I have researched compiler flags and double checked that function declarations match (i.e. not passing single precision where double is expected) and haven't found any clues. I'm frustrated and a little embarrassed to admit that I haven't been able to nail this problem down.

For now, I'm going to leave donothing() in the code as a patch that I hope is temporary but that realistically might be long term.

Change History (1)

comment:1 Changed 5 years ago by flip

  • Resolution set to duplicate
  • Status changed from new to closed

This has been moved to the HLSVDPRO ticket tracker as issue 2.
