Analysis: Ticket #40: HLSVD GUI doesn't account for algorithmic limitation for # of data points
http://scion.duhs.duke.edu/vespa/analysis/ticket/40
<p>
The HLSVD implementation uses a mixed-radix FFT which limits it to certain values for # of data points. Other values cause the algorithm to fail (return no data). No Python version of Analysis to date has restricted # of data points to numbers on this whitelist. We need to fix that.
</p>
<p>
IDL Vespa knew about this limitation of HLSVD and fudged users' input (silently) to ensure the # of data points was valid. In the Fortran code, the numbers are read from a file that we don't have anymore. IDL Vespa hardcoded the list and the numbers below were taken from IDL Vespa's code.
</p>
<p>
The numbers in the list only go up to 1024 even though the algorithm purportedly allows 2048. This is probably because the algorithm isn't stable up to 2048 points as described in ticket <a class="closed ticket" href="http://scion.duhs.duke.edu/vespa/analysis/ticket/36" title="defect: Double precison overflow causes hang (infinte loop) in hlsvd (closed: wontfix)">#36</a>. In fact, I found the bug described in ticket <a class="closed ticket" href="http://scion.duhs.duke.edu/vespa/analysis/ticket/36" title="defect: Double precison overflow causes hang (infinte loop) in hlsvd (closed: wontfix)">#36</a> because I was iterating through values 1025-2048 to find which ones were accepted by the mixed-radix FFT.
</p>
<p>
Anyway, here's the list.
</p>
<pre class="wiki"> 1,
2, 3, 4, 5, 6, 7, 8, 9,
10, 11, 12, 13, 14, 15, 16, 17,
18, 19, 20, 21, 22, 23, 24, 25,
26, 27, 28, 30, 32, 33, 34, 35,
36, 38, 39, 40, 42, 44, 45, 46,
48, 49, 50, 51, 52, 54, 55, 56,
57, 60, 63, 64, 65, 66, 68, 69,
70, 72, 75, 76, 77, 78, 80, 81,
84, 85, 88, 90, 91, 92, 95, 96,
98, 99, 100, 102, 104, 105, 108, 110,
112, 114, 115, 117, 119, 120, 121, 125,
126, 128, 130, 132, 133, 135, 136, 138,
140, 143, 144, 146, 147, 150, 152, 153,
154, 156, 160, 161, 162, 165, 168, 169,
170, 171, 175, 176, 180, 182, 184, 187,
189, 190, 192, 195, 196, 198, 200, 204,
207, 208, 209, 210, 216, 220, 221, 224,
225, 228, 230, 231, 234, 238, 240, 242,
243, 245, 247, 250, 252, 253, 255, 256,
260, 264, 266, 270, 272, 273, 275, 276,
280, 285, 286, 288, 289, 294, 297, 299,
300, 304, 306, 308, 312, 315, 320, 322,
323, 324, 325, 330, 336, 338, 340, 342,
343, 345, 350, 351, 352, 357, 360, 361,
363, 364, 368, 374, 375, 378, 380, 384,
385, 390, 391, 392, 396, 399, 400, 405,
408, 414, 416, 418, 420, 425, 429, 432,
437, 440, 441, 442, 448, 450, 455, 456,
459, 460, 462, 468, 475, 476, 480, 483,
484, 486, 490, 494, 495, 500, 504, 506,
507, 510, 512, 513, 520, 525, 528, 529,
532, 539, 540, 544, 550, 552, 560, 567,
572, 575, 576, 578, 585, 588, 594, 600,
605, 608, 612, 616, 621, 624, 625, 630,
637, 640, 644, 648, 650, 660, 672, 675,
676, 680, 684, 686, 693, 700, 702, 704,
720, 722, 726, 728, 729, 735, 736, 748,
750, 756, 760, 765, 768, 780, 784, 792,
800, 810, 816, 819, 825, 828, 832, 833,
836, 840, 845, 847, 850, 855, 864, 867,
875, 880, 882, 884, 891, 896, 900, 912,
918, 920, 924, 931, 936, 945, 950, 952,
960, 968, 972, 975, 980, 988, 990, 1000,
1001, 1008, 1012, 1014, 1020, 1024
</pre>en-usAnalysis/vespa_logo.png
http://scion.duhs.duke.edu/vespa/analysis/ticket/40
Trac 1.0.9flipWed, 29 Feb 2012 00:04:10 GMT
http://scion.duhs.duke.edu/vespa/analysis/ticket/40#comment:1
http://scion.duhs.duke.edu/vespa/analysis/ticket/40#comment:1
<p>
The list above is not quite accurate. For instance, the HLSVD algorithm fails when fed a value of 1001, although that's in the "OK" list above.
</p>
<p>
A little reading on mixed-radix FFTs suggest that they work well when the # of data points is a factor of 2, 3, 5 and 7, otherwise the algorithm's performance is quite bad. That would explain why 1001 doesn't work, since it's factors are 7, 11, and 13.
</p>
<p>
But by that logic 1020 shouldn't work (factors: 2, 2, 3, 5, 17) but it does.
</p>
<p>
Below is some code to print out the prime factors for all numbers up to and including 1024.
</p>
<pre class="wiki">def prime_factors(n):
"""Given an integer N, returns a list of the prime factors of N. For
our convenience, the value 1 is never included in the result although
it should be.
"""
# Swiped from "deuce" here:
# http://wj32.wordpress.com/2007/12/08/prime-factorization-sieve-of-eratosthenes/#comment-235
factors = []
lastresult = n
while lastresult > 1:
c = 2
while lastresult % c > 0:
c += 1
factors.append(c)
lastresult /= c
return factors
for i in range(1, 1025):
print "%4d: %s" % (i, prime_factors(i))
</pre>
TicketflipMon, 05 Mar 2012 21:46:16 GMT
http://scion.duhs.duke.edu/vespa/analysis/ticket/40#comment:2
http://scion.duhs.duke.edu/vespa/analysis/ticket/40#comment:2
<p>
I wrote a little code to test all # data points values between 11 and 1024 inclusive. For this test, I modified HLSVD to re-enable its "error in FFT" message which indicates that the FFT failed due to an unacceptable value for # of data points. I recorded in one file values of n_data_points for which hlsvd returned data. In another file I recorded values for which hlsvd returned no data. I assumed that if hlsvd returned no data (# of singular values found == 0) that this would correlate 100% with the "error in FFT" situation. That assumption is wrong, as described below.
</p>
<p>
The values for which hlsvd is successful are enlightening. All are multiples of the following primes: 2, 3, 5, 7, 11, 13, 17, 19, and 23. From that I conclude that those are the magic limits of the mixed-radix FFT. Any number that can be factored by those primes should be acceptable to the FFT algorithm.
</p>
<p>
Of course, hlsvd is more than just an FFT algorithm, and some values for n_data_points return no data despite being acceptable according to the rules above. For these values, hlsvd does <em>not</em> print the "error in fft" message. In other words, they return no data, but I don't know why. Those values and their factors are:
</p>
<pre class="wiki"> 20: [2, 2, 5]
147: [3, 7, 7]
169: [13, 13]
170: [2, 5, 17]
196: [2, 2, 7, 7]
224: [2, 2, 2, 2, 2, 7]
230: [2, 5, 23]
234: [2, 3, 3, 13]
242: [2, 11, 11]
247: [13, 19]
253: [11, 23]
260: [2, 2, 5, 13]
280: [2, 2, 2, 5, 7]
286: [2, 11, 13]
288: [2, 2, 2, 2, 2, 3, 3]
289: [17, 17]
294: [2, 3, 7, 7]
304: [2, 2, 2, 2, 19]
306: [2, 3, 3, 17]
308: [2, 2, 7, 11]
322: [2, 7, 23]
324: [2, 2, 3, 3, 3, 3]
325: [5, 5, 13]
330: [2, 3, 5, 11]
336: [2, 2, 2, 2, 3, 7]
338: [2, 13, 13]
340: [2, 2, 5, 17]
343: [7, 7, 7]
345: [3, 5, 23]
361: [19, 19]
364: [2, 2, 7, 13]
368: [2, 2, 2, 2, 23]
374: [2, 11, 17]
384: [2, 2, 2, 2, 2, 2, 2, 3]
385: [5, 7, 11]
390: [2, 3, 5, 13]
391: [17, 23]
392: [2, 2, 2, 7, 7]
396: [2, 2, 3, 3, 11]
420: [2, 2, 3, 5, 7]
459: [3, 3, 3, 17]
480: [2, 2, 2, 2, 2, 3, 5]
490: [2, 5, 7, 7]
500: [2, 2, 5, 5, 5]
506: [2, 11, 23]
512: [2, 2, 2, 2, 2, 2, 2, 2, 2]
539: [7, 7, 11]
546: [2, 3, 7, 13]
550: [2, 5, 5, 11]
561: [3, 11, 17]
567: [3, 3, 3, 3, 7]
570: [2, 3, 5, 19]
572: [2, 2, 11, 13]
585: [3, 3, 5, 13]
595: [5, 7, 17]
598: [2, 13, 23]
605: [5, 11, 11]
616: [2, 2, 2, 7, 11]
621: [3, 3, 3, 23]
627: [3, 11, 19]
646: [2, 17, 19]
660: [2, 2, 3, 5, 11]
663: [3, 13, 17]
665: [5, 7, 19]
690: [2, 3, 5, 23]
704: [2, 2, 2, 2, 2, 2, 11]
714: [2, 3, 7, 17]
715: [5, 11, 13]
741: [3, 13, 19]
759: [3, 11, 23]
770: [2, 5, 7, 11]
782: [2, 17, 23]
798: [2, 3, 7, 19]
805: [5, 7, 23]
836: [2, 2, 11, 19]
858: [2, 3, 11, 13]
874: [2, 19, 23]
897: [3, 13, 23]
910: [2, 5, 7, 13]
935: [5, 11, 17]
936: [2, 2, 2, 3, 3, 13]
966: [2, 3, 7, 23]
968: [2, 2, 2, 11, 11]
969: [3, 17, 19]
988: [2, 2, 13, 19]
1001: [7, 11, 13]
1020: [2, 2, 3, 5, 17]
</pre><p>
Since I don't know why those values fail, I'm inclined to assume that they could be successful under the right circumstances until I have evidence to the contrary.
</p>
<p>
However, I can say with confidence that any value for n_data_points that has factors not in (2, 3, 5, 7, 11, 13, 17, 19, 23) will fail.
</p>
<p>
Figuring out why the values listed above are acceptable to the FFT but still fail is left as an exercise for the reader.
</p>
TicketflipMon, 25 Jun 2012 16:02:58 GMTstatus changed; resolution set
http://scion.duhs.duke.edu/vespa/analysis/ticket/40#comment:3
http://scion.duhs.duke.edu/vespa/analysis/ticket/40#comment:3
<ul>
<li><strong>status</strong>
changed from <em>new</em> to <em>closed</em>
</li>
<li><strong>resolution</strong>
set to <em>fixed</em>
</li>
</ul>
<p>
June 2012: the GUI was changed some months ago to limit the # of data points to values that are products of the magic primes, and that solved the problem described. It's a moot point anyway since we're now using a different HLSVD method that doesn't share the same FFT limitations.
</p>
Ticket