The Constant Q Transform

Benjamin Blankertz
 
 

[gzipped ps-file  or  pdf-file]



Introduction

The constant Q transform as introduced in [Brown 91] is very close related to the Fourier transform. Like the Fourier transform a constant Q transform is a bank of filters, but in contrast to the former it has geometrically spaced center frequencies $f_k= f_0\cdot2^{\frac{k}{b}}$  (k=0,...), where b dictates the number of filters per octave. To make the filter domains adjectant one chooses the bandwidth of the k-th filter as $\Delta^{\text{cq}}_k=f_{k+1}-f_k= f_k(2^{\frac{1}{b}}-1)$. This yields a constant ratio of frequency to resolution $Q=\frac{f_k}{\Delta^{\text{cq}}_k}=(2^{\frac{1}{b}}-1)^{-1}$. What makes the constant Q transform so useful is that by an appropriate choice for f0 (minimal center frequency) and b the center frequencies directly correspond to musical notes. For instance choosing b=12 and f0 as the frequency of midinote 0 makes the k-th cq-bin correspond the midinote number k.

Another nice feature of the constant Q transform is its increasing time resolution towards higher frequencies. This resembles the situation in our auditory system. It is not only the digital computer that needs more time to perceive the frequency of a low tone but also our auditory sense. This is related to music usually being less agitated in the lower registers.


Deriving Filter Parameters for the Constant Q Transform

To derive the calculation of the constant Q transform of some sequence x we begin with an inspection of the familiar formula of a Fourier filter at z
(1) \begin{displaymath}\sum_{n<N} x[n]\;e^{-2\pi i n z / N}\end{displaymath}
Each component of the constant Q transform, in the sequel called cq-bin, will be calculated as such a filter, but suitable values for z and window length N have to be found in order to match the properties discussed above. The bandwidth of the filter (1) is $\Delta^{\text{ft}}_z= \nicefrac{f_s}{N}$
(fs denotes the sampling rate) independently of z. Thus the desired bandwidth $\Delta^{\text{cq}}_k= \nicefrac{f_k}{Q}$ can be realized by choosing a window of length $N_k= \frac{f_s}{\Delta^{\text{cq}}_k} = Q\frac{f_s}{f_k}$. The frequency to resolution ratio of the filter in (1) is $\frac{f_z}{\Delta^{\text{ft}}_z} = z$. To achieve a constant value Q for the frequency to resolution ratio of each cq-bin one has to select z:= Q. Thus for integer values Q the k-th cq-bin is the Q-th DFT-bin with window length $Q\frac{f_s}{f_k}$.

Summerizing we get the following recipe for the calculation of a constant Q transform: First choose minimal frequency f0 and the number of bins per octave b according to the requirements of the application. The maximal frequency $f_{\text{max}}$ only affects the number of cq-bins to be calculated1):
\begin{gather}K := \ulcorner b\cdot\log_2(\frac{f_{\text{max}}}{f_0})\urcorner\......N_k}\sum_{n<N_k}x[n]\;w_{N_k}[n]\;e^{-2\pi i n \nicefrac{Q}{N_k}}\end{gather}     [Sorry: equation numbers should be (2) to (5)]

To reduce spectral leakage (cf. [Harris 78]), it is advisable to use the filter in conjunction with some window function:
<w${\langlew_N[n]\;:\;n<N\rangle}$ is some analysis window of length N. We followed Judith Brown by using Hamming windows.



1)$\ulcorner x\urcorner$ denotes the least interger greater than or equal to x.


Direct Implementation

For comparison first the usual FT-algorithm is given, not in its compact form
   ft= (x .* win) * exp(-2*pi*i*(0:N-1)'*(0:N-1)/N),
but with a loop.

function ft= slowFT(x, N)

for k=0:N-1;
   ft(k+1)= x(1:N) * (hamming(N) .* exp( -2*pi*i*k*(0:N-1)'/N)) / N;
end

And here is the direct implementation of the constant Q transform.

function cq= slowQ(x, minFreq, maxFreq, bins, fs)

Q= 1/(2(1/bins)-1);
for k=1:ceil(bins*log2(maxFreq/minFreq));
   N= round(Q*fs/(minFreq*2((k-1)/bins)));
   cq(k)= x(1:N) * (hamming(N) .* exp( -2*pi*i*Q*(0:N-1)'/N)) / N;
end


An Efficient Algorithm

Since the calculation of the constant Q transform according to the formula (5) is very time consuming, an efficient algorithm is highly desirable. Using matrix multiplication the constant Q transform of a row vector x of length N ($N\ge N_k$ for all k<K) is
(6) \begin{displaymath}x^{\text{cq}}= x \cdot {T^{*}}\end{displaymath}
where T* is the complex conjugate of the temporal kernel2)$T= (T_{nk})_{\begin{substack}n<N,\\ k<K \end{substack}}$
(7) \begin{displaymath}T_{nk}:=\begin{cases}\frac{1}{N_k} \, w_{N_k}[n] \; e^{......k}}& \text{if } n<N_k \\0 & \text{otherwise}\end{cases}\end{displaymath}
Since the temporal kernel is independent of the input signal x one can speed up successive constant Q transforms by precalculating T*. But this is very memory consuming and since there are many non vanishing elements in T the calculation of the matrix product $x\cdot{T^{*}}$ still takes quite a while.
Luckily Judith Brown and Miller Puckette came up with a very clever idea for improving the calculation [Brown and Puckette 92]. The idea is to carry out the matrix multiplication in the spectral domain. Since the windowed complex exponentials of the temporal kernel have a DFT that vanishes almost everywhere except for the immediate vicinity of the corresponding frequency the spectral kernel
(8) \begin{displaymath}S:= \text{DFT}(T)\quad \text{(one dimensional DFTs applied columnwise)}\end{displaymath}
is a sparse matrix (after eliminating components below some threshold value). This fact can be exploited for the calculation of $x^{\text{cq}}$ owing to the identity
(9) \begin{displaymath}\sum_{n<N} x[n] \; {y[n]^{*}} \;=\;\frac{1}{N}\sum_{n<N} x^{\text{ft}}[n] \; {y^{\text{ft}}[n]^{*}}\end{displaymath}
where x and y are sequences of length N and $x^{\text{ft}}$$y^{\text{ft}}$ denote their unnormalized discrete Fourier transform. Applying this identity to the formula of constant Q transform (5) using definitions (7) and (8) yields
\begin{displaymath}x^{\text{cq}}[k]= \frac{1}{N} \sum_{n<N} x^{\text{ft}}[n] \; {S_{nk}^{*}}\end{displaymath}
or equivalently in matrix notation
(10) \begin{displaymath}x^{\text{cq}}= \frac{1}{N} \; x^{\text{ft}}\cdot {S^{*}}.\end{displaymath}
Due to the sparseness of S the calculation of the product $x^{\text{ft}}\cdot{S^{*}}$ involves essentially less multiplications than $x\cdot{T^{*}}$.
The Fourier transforms that arise in the efficient algorithm should of course be calculated using FFTs. To this end N is chosen as the lowest power of 2 greater than or equal to N0 (which is the maximum of all Nk). The calculation of the spectral kernel is quite expensive, but having done this once all succeeding constant Q transforms are performed much faster.


2) In (7) we center the filter domains on the left for the ease of notation. Right-centering is more appropriate for real-time applications. Middle-centering has the advantage of making the spectral kernel (8) real.


Implementation of the Efficient Algorithm

function sparKernel= sparseKernel(minFreq, maxFreq, bins, fs, thresh)

if nargin<5 thresh= 0.0054; end    % for Hamming window

Q= 1/(2^(1/bins)-1);                                                     (3)
K= ceil( bins * log2(maxFreq/minFreq) );                                 (2)
fftLen= 2^nextpow2( ceil(Q*fs/minFreq) );
tempKernel= zeros(fftLen, 1);
sparKernel= [];
for k= K:-1:1;
   len= ceil( Q * fs / (minFreq*2^((k-1)/bins)) );                       (4)
   tempKernel(1:len)= hamming(len)/len .* exp(2*pi*i*Q*(0:len-1)'/len);  (7)
   specKernel= fft(tempKernel);                                          (8)
   specKernel(find(abs(specKernel)<=thresh))= 0;
   sparKernel= sparse([specKernel sparKernel]);
end
sparKernel= conj(sparKernel) / fftLen;                                  (10).1
 

function cq= constQ(x, sparKernel)    % x must be a row vector

cq= fft(x,size(sparKernel,1)) * sparKernel;                             (10).2
 

A matlab implementation of the efficient algorithm for the constant Q transform [Brown and Puckette 92] coded by Benjamin Blankertz. The sparseKernel has only to be calculated once. This might take some seconds. After that, constant Q transforms of any row vector x can be done very efficiently by calling constQ. The function hamming is included in the dsp-toolbox. It returns the Hamming window of the given length:

 hamming(len)= 0.46-0.54*cos(2*pi*(0:len-1)'/len).


Bibliography



mail to: <blanker@math.uni-muenster.de>                                 go to: dsp-page                                      go to: home-page