So here is the infamous code that does an order-64 full complex DFT in 912
additions and 240 multiplications. The code itself is in the Fortran free-format
include file fft64t.i90. A simple test
program that shows how the code can be used is fft64t.f90. Since I
finally have this web page up and running, I can also post code that does all
prime powers <= 64. Even though I now know that some of these aren't optimal,
I'll post what I've got here in codelets.ZIP in case
anyone is interested. My email address can be obtained by running sig.f90 on a little-endian
machine with IEEE-754 and ASCII internal representation.
New stuff 1/29/04:
fft27a.i90 does an
order-27 full complex fft in 368 adds and 152 muls. Also test file fft27a.f90.
New stuff
1/31/04: fft27t.i90 and
fft27t.f90, 380 adds
& 124 muls.
New stuff 2/1/04: fft25t.i90 and fft25t.f90, 332 adds &
156 muls.
New stuff 2/2/04: fft49t.i90 and fft49t.f90, 856 adds &
472 muls.
New stuff 2/3/04: fft11t.i90 and fft11t.f90, 138 adds &
58 muls.
Same day: fft13t.i90 and fft13t.f90, 164 adds &
64 muls.
New stuff 2/5/04: fft23t.i90 and fft23t.f90, 490 adds &
258 muls.
Same day: fft29a.i90 and fft29a.f90, 586 adds &
322 muls.
New stuff 2/6/04: fft29t.i90 and fft29t.f90, 598 adds &
282 muls.
New stuff 2/7/04: fft31t.i90 and fft31t.f90, 658 adds &
234 muls.
New stuff 2/29/04: fft1024.ZIP, 25488 adds
& 8480 muls.
To create this last monster file I used automatic code
generation: scaled2d.f90 which is
brand new today and badly in need of a maintainence cycle already.
I use the
file countops.f90 to
count the number of operations in a linear fft subroutine.
New stuff 4/19/04:
scaled2g.f90, this
avoids internal I/O during I/O to permit use with compilers that aren't in step
with the 21st century. Also eliminates global variables, does AVL trees more
consistently with current practice, checks user input and even has a comment or
two! (But see entry for 8/22/04.)
New stuff 6/29/04: fft31a.i90 and fft31a.f90, 632 adds &
344 muls.
New stuff 7/2/04: fft37t.i90 and fft37t.f90, 764 adds &
328 muls.
New stuff 7/8/04: I was looking at Howard W. Johnson & C.
Sidney Burrus, "The Design of Optimal DFT Algorithms Using Dynamic Programming",
IEEE Trans. Acoust. Speech Signal Process. 31, 378 (1983) and
concluded that the authors had a point, but I assumed that their point would be
better made with minimum-add order-5 & 7 DFT modules. Accordingly I offer fft35.i90 and fft35.f90, 490 adds &
216 muls.
New stuff 8/17/04: conv2a.f90 generates
cyclic convolution code for order n = 2**k, when input k. (But see entry for
8/22/04.)
Given an efficient cyclic convolution algorithm for power of 2
orders, we can derive efficient DFT algorithms for Fermat prime orders. We
have:
fft5.i90 and fft5.f90, 30 adds & 14
muls.
fft17.i90 and
fft17.f90, 246 adds
& 102 muls.
fft257.i90 and fft257.f90, 9174 adds
& 3750 muls.
New stuff 8/22/04: scaled2h.f90 supersedes
scaled2g.f90 with a few minor bug fixes, comments, and a workaround for CVF
6.6C. Also conv2b.f90
supersedes conv2a.f90 for similar reasons. In either case search for the string
'CVF' to find out how to implement the workaround.
New stuff 12/22/04: Fortran_stuff some
examples which may prove challenging for Fortran 95 compilers.
New stuff
8/23/05: I've got a new version of scaled2i.f90 that now
also generates code for real-half complex DFTs and is consistent with the
revised notation for SOT(2k). Also consistent is conv2c.f90 with some minor
bug fixes. By popular demand, we present loopy code for real-half complex DFTs
via radix-8, split-radix, and the new algorithm with SOT in real_ft.ZIP with speed
and correctness testing as well as accuracy testing capability.
New stuff
11/26/05: rf16.f90 has
algorithms to compute real-half complex radix-16 DFTs with and without
intermediate scaling and a real-half complex radix-64 DFT with internal scaling.