So here is the infamous code that does an order-64 full complex DFT in 912 additions and 240 multiplications. The code itself is in the Fortran free-format include file fft64t.i90. A simple test program that shows how the code can be used is fft64t.f90. Since I finally have this web page up and running, I can also post code that does all prime powers <= 64. Even though I now know that some of these aren't optimal, I'll post what I've got here in codelets.ZIP in case anyone is interested. My email address can be obtained by running sig.f90 on a little-endian machine with IEEE-754 and ASCII internal representation.
New stuff 1/29/04: fft27a.i90 does an order-27 full complex fft in 368 adds and 152 muls. Also test file fft27a.f90.
New stuff 1/31/04: fft27t.i90 and fft27t.f90, 380 adds & 124 muls.
New stuff 2/1/04: fft25t.i90 and fft25t.f90, 332 adds & 156 muls.
New stuff 2/2/04: fft49t.i90 and fft49t.f90, 856 adds & 472 muls.
New stuff 2/3/04: fft11t.i90 and fft11t.f90, 138 adds & 58 muls.
Same day: fft13t.i90 and fft13t.f90, 164 adds & 64 muls.
New stuff 2/5/04: fft23t.i90 and fft23t.f90, 490 adds & 258 muls.
Same day: fft29a.i90 and fft29a.f90, 586 adds & 322 muls.
New stuff 2/6/04: fft29t.i90 and fft29t.f90, 598 adds & 282 muls.
New stuff 2/7/04: fft31t.i90 and fft31t.f90, 658 adds & 234 muls.
New stuff 2/29/04: fft1024.ZIP, 25488 adds & 8480 muls.
To create this last monster file I used automatic code generation: scaled2d.f90 which is brand new today and badly in need of a maintainence cycle already.
I use the file countops.f90 to count the number of operations in a linear fft subroutine.
New stuff 4/19/04: scaled2g.f90, this avoids internal I/O during I/O to permit use with compilers that aren't in step with the 21st century. Also eliminates global variables, does AVL trees more consistently with current practice, checks user input and even has a comment or two! (But see entry for 8/22/04.)
New stuff 6/29/04: fft31a.i90 and fft31a.f90, 632 adds & 344 muls.
New stuff 7/2/04: fft37t.i90 and fft37t.f90, 764 adds & 328 muls.
New stuff 7/8/04: I was looking at Howard W. Johnson & C. Sidney Burrus, "The Design of Optimal DFT Algorithms Using Dynamic Programming", IEEE Trans. Acoust. Speech Signal Process. 31, 378 (1983) and concluded that the authors had a point, but I assumed that their point would be better made with minimum-add order-5 & 7 DFT modules. Accordingly I offer fft35.i90 and fft35.f90, 490 adds & 216 muls.
New stuff 8/17/04: conv2a.f90 generates cyclic convolution code for order n = 2**k, when input k. (But see entry for 8/22/04.)
Given an efficient cyclic convolution algorithm for power of 2 orders, we can derive efficient DFT algorithms for Fermat prime orders. We have:
fft5.i90 and fft5.f90, 30 adds & 14 muls.
fft17.i90 and fft17.f90, 246 adds & 102 muls.
fft257.i90 and fft257.f90, 9174 adds & 3750 muls.
New stuff 8/22/04: scaled2h.f90 supersedes scaled2g.f90 with a few minor bug fixes, comments, and a workaround for CVF 6.6C. Also conv2b.f90 supersedes conv2a.f90 for similar reasons. In either case search for the string 'CVF' to find out how to implement the workaround.
New stuff 12/22/04: Fortran_stuff some examples which may prove challenging for Fortran 95 compilers.
New stuff 8/23/05: I've got a new version of scaled2i.f90 that now also generates code for real-half complex DFTs and is consistent with the revised notation for SOT(2k). Also consistent is conv2c.f90 with some minor bug fixes. By popular demand, we present loopy code for real-half complex DFTs via radix-8, split-radix, and the new algorithm with SOT in real_ft.ZIP with speed and correctness testing as well as accuracy testing capability.
New stuff 11/26/05: rf16.f90 has algorithms to compute real-half complex radix-16 DFTs with and without intermediate scaling and a real-half complex radix-64 DFT with internal scaling.