Diagram
The DFT Band Coder is based on the following pipeline: a PCM sound vector (read from a wav file) is split into frames and the FFT of each frame is computed, yielding its real and imaginary parts. These values enter the coder, which quantizes them with the number of bits chosen by the user. The quantized values are written to a .dft file, which can be passed to a decoder to obtain a playable 16-bit wav.
One of the main functions of the DFT Band Coder is the coder (function DFTcoding in DFT.py). It receives the path to a wav file (originalFile), the number of bands into which the DFT is divided (numBands), the frame length (N), the number of bits per sample used for quantization (bits) and the output name (codedFile). All of these have default values for quick tests: originalFile is drumsA.wav in the sounds directory, numBands is 5, N is 1024, bits is 16 and codedFile is yourfile.dft.
First, it creates an Output directory if no folder with that name exists, and adjusts the output path so that the file is saved under this directory. It then informs the user through the terminal that encoding has started and reads the input wav file originalFile using the function wavread.
After that, the function enframes the audio signal, applying a triangular window to avoid artifacts at the frame edges, and obtains the number of frames to encode.
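The enframing step can be sketched as follows. This is a minimal illustration, not the code in DFT.py: the function name enframe, the hop size (50% overlap) and the zero-padding of the last frame are all assumptions, since the text does not state them.

```python
import numpy as np

def enframe(x, N=1024, hop=None):
    """Split x into frames of length N and apply a triangular window.

    hop is hypothetical: the text does not give the overlap, so it
    defaults to N // 2 (50% overlap).
    """
    x = np.asarray(x, dtype=float)
    if hop is None:
        hop = N // 2
    # Number of frames needed to cover the whole signal (at least one).
    num_frames = max(1, int(np.ceil((len(x) - N) / hop)) + 1)
    # Zero-pad so that the last frame is complete (assumption).
    padded = np.zeros((num_frames - 1) * hop + N)
    padded[:len(x)] = x
    window = np.bartlett(N)  # triangular window with zero end points
    frames = np.stack([padded[i * hop:i * hop + N] * window
                       for i in range(num_frames)])
    return frames, num_frames
```

For example, enframe(x, N=1024) on a signal of 4096 samples returns 7 windowed frames of 1024 samples each.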
Before encoding the values of the loaded PCM sound, the function creates a binary file using the functions defined in the previous section. It writes a header with the information needed to decode the body of the file: the sampling frequency (needed to write the wav file when decoding), the frame length (needed to unframe the signal), the number of bits used for quantization, the number of bands into which the FFT is divided and the number of frames in the file.
Then, for every frame, it computes the FFT, keeps half of it and splits it into real and imaginary parts. It divides these parts into the defined number of bands and encodes every sample by applying a gain that expands the higher bands (which should make better use of the quantization levels) and then quantizing it with the function midtread_quantizer_2. This function returns the signal quantized to the number of levels set by the user. It prints both the real and imaginary parts of the frame's half FFT and writes them to the file.
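The per-frame encoding can be sketched as below. It is an illustration under stated assumptions rather than the actual DFTcoding code: midtread_quantize is a generic uniform midtread quantizer (not the project's midtread_quantizer_2), the spectrum is scaled by 1/N so that its values stay within [-1, 1] for input samples in [-1, 1], and the default band gains (powers of two) are hypothetical.

```python
import numpy as np

def midtread_quantize(x, bits, x_max=1.0):
    """Generic uniform midtread quantizer (requires bits >= 2).

    Maps values in [-x_max, x_max] to integer codes 0 .. 2**bits - 2.
    """
    levels = 2 ** bits - 1          # odd number of levels, so zero is a level
    half = (levels - 1) // 2
    step = x_max / half
    idx = np.round(np.clip(x, -x_max, x_max) / step).astype(int)
    return idx + half               # shift to non-negative codes

def encode_frame(frame, num_bands=5, bits=16, band_gains=None):
    """Return quantized real and imaginary codes for one frame."""
    frame = np.asarray(frame, dtype=float)
    N = len(frame)
    spectrum = np.fft.rfft(frame) / N   # N/2 + 1 bins; 1/N scaling is an assumption
    if band_gains is None:
        band_gains = 2.0 ** np.arange(num_bands)   # hypothetical gains
    # Build one gain value per bin, constant inside each band.
    gains = np.empty(len(spectrum))
    bands = np.array_split(np.arange(len(spectrum)), num_bands)
    for band, g in zip(bands, band_gains):
        gains[band] = g
    real_codes = midtread_quantize(spectrum.real * gains, bits)
    imag_codes = midtread_quantize(spectrum.imag * gains, bits)
    return real_codes, imag_codes
```

With N = 1024, each call yields two arrays of 513 codes, matching the N/2+1 values per part stored for every frame.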
Every 10 frames, the encoding status is displayed in the terminal, and a message is also displayed when encoding has finished.
Finally, the function flushes the stream, closes the file and returns the path to the coded file.
The bitstream of a .dft file consists of a header with the information needed for decoding, followed by a body with the quantized frames. The header contains:
- 16 bits to represent the sampling frequency.
- 12 bits to represent the length of each frame (N).
- 8 bits to represent the number of bits used for the quantization (R).
- 8 bits to represent the number of bands divisions of the FFT.
- 26 bits to represent the number of frames of the file.
Total number of bits for the header: 70 bits.
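The header layout above can be illustrated with a small packing sketch. The field names, the most-significant-bit-first ordering and the helper functions pack_header and unpack_header are assumptions made for illustration; the project writes the file with its own bit-level functions.

```python
# Field widths taken from the list above; the names are hypothetical.
HEADER_FIELDS = (
    ("fs", 16),          # sampling frequency
    ("N", 12),           # frame length
    ("bits", 8),         # bits used for quantization (R)
    ("num_bands", 8),    # number of band divisions of the FFT
    ("num_frames", 26),  # number of frames in the file
)

def pack_header(**values):
    """Return the header as a string of '0'/'1' characters (MSB first)."""
    out = ""
    for name, width in HEADER_FIELDS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError("%s=%d does not fit in %d bits" % (name, value, width))
        out += format(value, "0%db" % width)
    return out

def unpack_header(bit_string):
    """Inverse of pack_header: read the fields back in the same order."""
    fields = {}
    pos = 0
    for name, width in HEADER_FIELDS:
        fields[name] = int(bit_string[pos:pos + width], 2)
        pos += width
    return fields
```

The field widths also bound the accepted parameters: a 12-bit frame length allows N up to 4095, and a 16-bit sampling frequency allows values up to 65535 Hz.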
For each frame:
- R*(N/2+1) bits for the quantized real part of the FFT.
- R*(N/2+1) bits for the quantized imaginary part of the FFT.
Total number of bits for the body = (number of frames)*(R*(N+2)) bits.
Total number of bits for the file: 70 + (number of frames)*(R*(N+2)) + padding bits.
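As a worked example of these formulas, the following sketch computes the size of a coded file. The function name is hypothetical, and padding to a whole number of bytes is an assumption about how the padding bits are chosen.

```python
def dft_file_size_bits(num_frames, N=1024, R=16, header_bits=70):
    """Return (bits before padding, padding bits) for a .dft file."""
    body_bits = num_frames * R * (N + 2)   # real + imaginary parts per frame
    unpadded = header_bits + body_bits
    padding = (-unpadded) % 8              # assumption: pad to a byte boundary
    return unpadded, padding
```

For 100 frames with the default N = 1024 and R = 16, the body takes 100*16*1026 = 1641600 bits, so the file needs 1641670 bits plus 2 padding bits, that is 205209 bytes.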
The other main function of the DFT Band coder is the decoder (function DFTdecoding in DFT.py). This function receives a path to a .dft file (filename).
The decoder first checks that the input file exists and creates the Output directory if it does not exist. It then informs the user that decoding has started.
To read the file it uses our function fread, contained in UtilFunctions. This function reads the header and, for every frame, the quantized real and imaginary parts of the signal.
The function then dequantizes the signal using the function midtrad_dequantizer, undoes the gain applied by the coder, reconstructs the negative-frequency part of the spectrum and applies the inverse DFT. As in the coder, the user sees status information in the terminal.
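The decoding of one frame can be sketched as follows. This is an illustration, not the project's decoder: midtread_dequantize is a generic dequantizer, and the code assumes non-negative midtread codes, a spectrum that was scaled by 1/N before quantization, an even frame length N and hypothetical power-of-two band gains.

```python
import numpy as np

def midtread_dequantize(codes, bits, x_max=1.0):
    """Map integer codes 0 .. 2**bits - 2 back to values in [-x_max, x_max]."""
    half = (2 ** bits - 2) // 2
    step = x_max / half
    return (np.asarray(codes) - half) * step

def decode_frame(real_codes, imag_codes, N, num_bands=5, bits=16, band_gains=None):
    """Rebuild one time-domain frame of even length N from its quantized half spectrum."""
    if band_gains is None:
        band_gains = 2.0 ** np.arange(num_bands)   # hypothetical gains
    n_bins = N // 2 + 1
    gains = np.empty(n_bins)
    for band, g in zip(np.array_split(np.arange(n_bins), num_bands), band_gains):
        gains[band] = g
    # Dequantize and undo the per-band gain.
    half_spec = (midtread_dequantize(real_codes, bits)
                 + 1j * midtread_dequantize(imag_codes, bits)) / gains
    # A real signal has a conjugate-symmetric spectrum, so the negative
    # frequencies are the mirrored conjugates of bins N/2-1 .. 1.
    full_spec = np.concatenate([half_spec, np.conj(half_spec[-2:0:-1])])
    # Undo the assumed 1/N scaling applied before quantization.
    return np.fft.ifft(full_spec).real * N
```

The mirrored part has N/2-1 bins, so together with the N/2+1 stored bins the full spectrum has exactly N bins before the inverse DFT.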
After that, it joins all frames into an output vector and checks for clipping; if the signal clips, it normalizes the output sound vector. Finally, it writes the output sound to a wav file using the function wavwrite and returns the path of the decoded file.
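The final steps can be sketched as below. The helper names, the overlap-add with a given hop size, the clipping limit of 1.0 and the conversion to 16-bit integers are assumptions for illustration; the project relies on its own unframing code and on wavwrite.

```python
import numpy as np

def join_frames(frames, hop):
    """Overlap-add frames (one per row) spaced hop samples apart."""
    frames = np.asarray(frames, dtype=float)
    num_frames, N = frames.shape
    out = np.zeros((num_frames - 1) * hop + N)
    for i, frame in enumerate(frames):
        out[i * hop:i * hop + N] += frame
    return out

def normalize_if_clipping(x, limit=1.0):
    """Scale the signal down only if its peak exceeds the limit."""
    x = np.asarray(x, dtype=float)
    peak = np.max(np.abs(x)) if len(x) else 0.0
    if peak > limit:
        return x * (limit / peak)
    return x

def to_int16(x):
    """Convert samples in [-1, 1] to 16-bit PCM values."""
    return np.round(np.clip(x, -1.0, 1.0) * 32767).astype(np.int16)
```

Normalizing only when the peak exceeds the limit leaves signals that do not clip at their decoded level.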