Diagram
The implemented fixed-bitrate band coder allocates bits to bands according to the energy of each band. Its pipeline is as follows: a PCM sound vector (read from a wav file) is enframed to obtain chunks of the sound, and the MDCT and the FFT of each chunk are computed. The coder then allocates bits using the dB SPL of each frequency of each frame and uses this allocation to encode the bands of the frame's MDCT depending on their energy. These MDCT frames are written to a .bands file, which can be passed to a decoder to obtain a playable 16-bit wav (this stands in for an audio player that would play the file in real time).
One of the main functions of this module is the coder (function bandCoding in Bands.py). It receives a path to a wav file (originalFile), the bitrate it should achieve (bitrate), the frame length (N) and the output name (codedFile). These parameters have default values to make quick tests easier (originalFile is 'drumsA.wav' in the sounds directory, bitrate is 128000, N is 1024, and codedFile is 'yourfile.bands').
First it creates an Output directory if one does not already exist, and adapts the output path so that the file is saved under this directory. It then informs the user through the terminal that encoding has started, defines some variables and reads the wav input file originalFile using the function wavread.
Then, in order to measure the dB SPL, it computes max_fft (the reference for 96 dB) as the maximum of the FFT of a pure 1000 Hz tone. It enframes the signal with the enframe function and obtains the number of frames generated, in order to iterate through them later.
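The calibration step can be sketched as follows. This is a minimal illustration that uses only the standard library, not the code in Bands.py: the helper names (`half_dft_magnitude`, `to_db_spl`), the naive DFT and the short frame length are assumptions made to keep the example small.

```python
import cmath
import math

def half_dft_magnitude(frame):
    """Magnitude of the first N/2 bins of a naive DFT (O(N^2), illustration only)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame)))
        for k in range(n // 2)
    ]

def to_db_spl(magnitudes, max_fft, floor=1e-12):
    """Map magnitudes to dB SPL so that a magnitude equal to max_fft reads 96 dB."""
    return [96.0 + 20.0 * math.log10(max(m, floor) / max_fft) for m in magnitudes]

fs = 8000  # hypothetical sampling frequency, chosen so 1000 Hz falls exactly on a bin
N = 64     # hypothetical frame length (the coder's default is 1024)
tone = [math.sin(2 * math.pi * 1000 * i / fs) for i in range(N)]

# Reference level: the spectral peak of a full-scale 1000 Hz tone.
max_fft = max(half_dft_magnitude(tone))

# A tone at half the amplitude should read about 6 dB below the reference.
quiet = [0.5 * s for s in tone]
spl = to_db_spl(half_dft_magnitude(quiet), max_fft)
peak_spl = max(spl)
```

With this mapping, any spectrum is expressed relative to the full-scale tone, so halving the amplitude lowers the peak by 20·log10(2) dB.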
It opens the stream and the file and writes the header, which holds the information needed to decode the body of the file: sampling frequency, frame length, bitrate (although it is not strictly necessary), scale bits for the gain and number of frames.
Then it encodes every frame as follows. It first selects the frame from the matrix of the enframed sound, takes half of its FFT and computes its magnitude. It initialises the gain vector and the bit allocation vector. Then, if the frame has energy, it allocates the bits using the half FFT of the frame in dB SPL.
To allocate the bits it uses the function allocate from the utilFunctions module. Since this coder does not use auditory masking models, the input SPL is the FFT of the frame in dB SPL, whereas the Perceptual coder uses the Signal to Masker Ratio.
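The text does not show how allocate distributes the bits, so the following is only one common way to do energy-based allocation, not the actual function from utilFunctions: a greedy loop that repeatedly gives one more bit to the band with the highest remaining level and then lowers that level by about 6 dB (the approximate SNR gain of one quantizer bit). The function name, its parameters and the 15-bit cap (the largest value that fits in a 4-bit allocation field) are assumptions.

```python
def greedy_allocate(band_spl, band_sizes, bit_budget, max_bits=15):
    """Greedy energy-based allocation (illustrative, NOT utilFunctions.allocate).

    band_spl   -- level of each band in dB SPL
    band_sizes -- number of MDCT coefficients in each band
    bit_budget -- bits available for the coefficients of one frame
    """
    levels = list(band_spl)
    alloc = [0] * len(levels)
    remaining = bit_budget
    while True:
        # Bands that can still take one more bit within the budget and the cap.
        candidates = [
            b for b in range(len(levels))
            if alloc[b] < max_bits and band_sizes[b] <= remaining
        ]
        if not candidates:
            break
        best = max(candidates, key=lambda b: levels[b])
        alloc[best] += 1
        remaining -= band_sizes[best]  # one more bit for every coefficient in the band
        levels[best] -= 6.02           # each bit lowers quantization noise by ~6 dB
    return alloc

sizes = [4, 4, 8]  # hypothetical band sizes
alloc = greedy_allocate(band_spl=[60.0, 80.0, 40.0], band_sizes=sizes, bit_budget=40)
spent = sum(a * n for a, n in zip(alloc, sizes))
```

In this toy frame the loudest band receives the most bits and the quietest band receives none, which is the behaviour the text describes for an energy-driven allocation.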
Given this allocation of bits, which depends on the energy of each band in the frame, it uses another function from utilFunctions called p_encode. This function quantizes the Bark bands of the MDCT of each frame with a midtread quantizer, using the bits allocated before. It also returns a quantized gain factor for each band, which acts as a dynamic compander so that the quantization levels are used more effectively.
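A minimal sketch of midtread quantization with a per-band gain is shown below. It is not p_encode itself: the function names are hypothetical, the gain is taken as the peak of the band, and the gain is left unquantized here, whereas the coder stores it with the scale bits.

```python
def midtread_quantize(x, bits):
    """Quantize x in [-1, 1] to a signed integer code; zero is always representable."""
    if bits <= 0:
        return 0
    half = (1 << (bits - 1)) - 1    # codes run from -half to +half (2**bits - 1 levels)
    x = max(-1.0, min(1.0, x))      # clip to the quantizer range
    return int(round(x * half))

def encode_band(values, bits):
    """Normalize a band by its peak (the gain) and quantize each coefficient."""
    gain = max(abs(v) for v in values)
    if gain == 0.0 or bits <= 0:
        return 0.0, [0] * len(values)
    return gain, [midtread_quantize(v / gain, bits) for v in values]

def decode_band(gain, codes, bits):
    """Invert encode_band: map codes back to [-1, 1] and re-apply the gain."""
    half = (1 << (bits - 1)) - 1 if bits > 0 else 0
    if half == 0:
        return [0.0] * len(codes)
    return [gain * c / half for c in codes]

band = [0.02, -0.012, 0.005, 0.0]   # hypothetical MDCT coefficients of one band
gain, codes = encode_band(band, bits=4)
decoded = decode_band(gain, codes, bits=4)
max_error = max(abs(a - b) for a, b in zip(band, decoded))
```

Scaling by the gain before quantizing lets a quiet band use the full range of codes, so the error stays within half a quantization step of the band's own peak rather than of full scale.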
Once it has the allocated bits, the quantized MDCT and the gain, it writes these values to the file: first the gain, then the bit allocation array and finally, for each band, its quantized values, but only if any bits were allocated to it.
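The write order for one frame can be sketched with a simple bit string. This is an illustration under assumptions, not the stream code used in Bands.py: the helper names are hypothetical, fields are written MSB first, and signed quantizer codes are stored with an offset so that they become non-negative.

```python
def write_bits(value, width):
    """Return value as a fixed-width, MSB-first bit string."""
    return format(value, "0{}b".format(width))

def write_frame(gains, alloc, bands, scale_bits):
    """Serialize one frame: gains, then allocation, then bands with bits allocated."""
    out = [write_bits(g, scale_bits) for g in gains]
    out += [write_bits(a, 4) for a in alloc]
    for a, codes in zip(alloc, bands):
        if a == 0:
            continue                   # nothing is written for bands without bits
        offset = (1 << (a - 1)) - 1    # assumed mapping of signed codes to 0..2**a - 2
        out += [write_bits(c + offset, a) for c in codes]
    return "".join(out)

def read_frame(bits, band_sizes, scale_bits):
    """Parse a frame written by write_frame; empty bands come back as zeros."""
    pos = 0

    def take(width):
        nonlocal pos
        value = int(bits[pos:pos + width], 2)
        pos += width
        return value

    n = len(band_sizes)
    gains = [take(scale_bits) for _ in range(n)]
    alloc = [take(4) for _ in range(n)]
    bands = []
    for a, size in zip(alloc, band_sizes):
        if a == 0:
            bands.append([0] * size)
        else:
            offset = (1 << (a - 1)) - 1
            bands.append([take(a) - offset for _ in range(size)])
    return gains, alloc, bands

gains, alloc, bands = [3, 0], [3, 0], [[3, -3, 0, 1], [0, 0]]
frame_bits = write_frame(gains, alloc, bands, scale_bits=4)
decoded = read_frame(frame_bits, band_sizes=[4, 2], scale_bits=4)
```

Because the allocation array is written before the coefficients, the reader knows how many bits to take for each band and can skip bands with an allocation of zero.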
Every 10 frames, information about the encoding status is displayed in the terminal. When it has finished encoding, it also displays information.
The coder ends by flushing the stream, closing the file and returning the path to the coded file.
The bitstream format of the .bands files consists of a header with the information needed to decode the file, followed by a body with the quantized frames and the information needed to decode each frame.
Header:
- 16 bits to represent the sampling frequency.
- 12 bits to represent the length of each frame (N).
- 19 bits to represent the bitrate.
- 4 bits to represent the scale bits of the gain.
- 26 bits to represent the number of frames of the file.
Total number of bits for the header: 77 bits.
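The header layout above can be checked with a small packing sketch. The field names, the MSB-first order and the padding of the header alone to a whole number of bytes are assumptions for illustration; in the actual file the body follows the header directly and padding is only added at the end.

```python
HEADER_FIELDS = [          # (hypothetical name, width in bits), in the order listed above
    ("fs", 16),
    ("frame_length", 12),
    ("bitrate", 19),
    ("scale_bits", 4),
    ("n_frames", 26),
]

def pack_header(values):
    """Pack the header fields MSB first; return (bytes, number of meaningful bits)."""
    acc = 0
    total = 0
    for name, width in HEADER_FIELDS:
        v = values[name]
        if not 0 <= v < (1 << width):
            raise ValueError("{}={} does not fit in {} bits".format(name, v, width))
        acc = (acc << width) | v
        total += width
    pad = (-total) % 8     # zero bits appended to reach a byte boundary
    acc <<= pad
    return acc.to_bytes((total + pad) // 8, "big"), total

def unpack_header(data):
    """Recover the fields from bytes produced by pack_header."""
    total = sum(width for _, width in HEADER_FIELDS)
    acc = int.from_bytes(data, "big") >> (len(data) * 8 - total)
    out = {}
    for name, width in reversed(HEADER_FIELDS):
        out[name] = acc & ((1 << width) - 1)
        acc >>= width
    return out

header = {"fs": 44100, "frame_length": 1024, "bitrate": 128000,
          "scale_bits": 4, "n_frames": 500}   # scale_bits and n_frames are example values
data, n_bits = pack_header(header)
restored = unpack_header(data)
```

The field widths also bound the parameters: for example, a 16-bit sampling frequency field cannot represent values above 65535 Hz, and a 12-bit frame length field cannot represent values above 4095.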
For each frame:
- bands*(scale bits) bits to represent the gain array.
- bands*4 bits to represent the bit allocation array.
- For each band:
  - (bit allocation of the band)*(length of the band) bits for the quantized MDCT of the band.
Note: If the sampling frequency is 44100 Hz, then the number of bands is 25.
The total number of bits for the body varies considerably depending on the parameters.
Total number of bits for the file: 77 bits of the header + bits of the body + padding bits.
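The size formulas above can be turned into a short calculation. The function names and the example allocations are hypothetical, and the padding is assumed to round the file up to a whole number of bytes.

```python
def frame_bits(alloc, band_sizes, scale_bits):
    """Bits used by one frame: gain array + allocation array + quantized bands."""
    bands = len(alloc)
    gain_bits = bands * scale_bits
    alloc_bits = bands * 4
    coeff_bits = sum(a * n for a, n in zip(alloc, band_sizes))
    return gain_bits + alloc_bits + coeff_bits

def file_bits(frames_alloc, band_sizes, scale_bits, header_bits=77):
    """Total file size in bits, padded up to a byte boundary (assumed padding rule)."""
    body = sum(frame_bits(alloc, band_sizes, scale_bits) for alloc in frames_alloc)
    total = header_bits + body
    return total + (-total) % 8

band_sizes = [4, 4, 8]                 # toy example with 3 bands instead of 25
frames = [[3, 7, 0], [0, 0, 0]]        # second frame is silent: no coefficient bits
per_frame = [frame_bits(a, band_sizes, scale_bits=4) for a in frames]
total = file_bits(frames, band_sizes, scale_bits=4)
```

Even a silent frame still costs the gain and allocation arrays, which is why the body size depends on the number of frames as well as on the allocations.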
The second main function of the Band Coder is the decoder (function bandDecoder in Bands.py). This function receives a path to a .bands file as an input parameter.
The decoder checks that the file exists and creates the Output directory if it does not exist. It adapts the output path and prints information. It defines the Birdie Reduction Constant (used later) and then reads the header to obtain the variables needed to decode the file. It prints more information and then starts decoding the frames.
To decode a frame, it first reads the gain factor and then the bit allocation. It initialises certain values that will be needed later and starts reading the values of the MDCT bands of the frame (if bits were allocated to a band; otherwise it just sets its values to zero).
Once the values of the frame have been read from the file, it dequantizes them (unless the value is zero and the bit allocation of the band is zero).
It then applies the gain (dequantizing it and multiplying each value of the MDCT by it) and applies the birdie reduction (if any band has an allocation of 0 bits, the values of the last band are replaced by those of the last band of the previous frame multiplied by the birdie ramp constant; this should remove an artifact called "birdies").
It finally applies the inverse MDCT to the decoded MDCT frame and saves the result for joining the frames later.
Once all the frames have been decoded and stored in a vector, some information is printed and the frames are joined by overlapping them in the same way as they were divided. As no window was applied in the enframing (triangular or otherwise), the sound is normalized to obtain values between -1 and 1.
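The joining and normalization step can be sketched as follows. The function names and the hop of half a frame are assumptions, and the frames are passed through unchanged here (no MDCT or inverse MDCT) so that only the effect of overlapping without a window is visible.

```python
def enframe(signal, n, hop):
    """Split a signal into overlapping frames of length n, advancing by hop samples."""
    return [signal[start:start + n] for start in range(0, len(signal) - n + 1, hop)]

def overlap_add(frames, hop):
    """Join frames by summing them at the positions they were taken from."""
    n = len(frames[0])
    out = [0.0] * (hop * (len(frames) - 1) + n)
    for i, frame in enumerate(frames):
        for j, v in enumerate(frame):
            out[i * hop + j] += v
    return out

def normalize(signal):
    """Scale the signal so that its peak absolute value is 1."""
    peak = max(abs(v) for v in signal)
    return [v / peak for v in signal] if peak > 0 else list(signal)

signal = [0.5] * 16                    # constant toy signal
frames = enframe(signal, n=8, hop=4)   # hypothetical 50 % overlap
joined = overlap_add(frames, hop=4)
result = normalize(joined)
```

Where two unwindowed frames overlap, their samples add up (0.5 + 0.5 in this example), so the joined signal exceeds the original level and has to be rescaled to stay within -1 and 1.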