Welcome to the SCAV Tools Wiki

We created this wiki to explain how to use SCAV Tools and the theory behind it.

What is SCAV Tools?

SCAV Tools is a Python-based interface for encoding an audio file using different approaches. It is intended for educational purposes. It is open source so that anyone can modify the functions as much as they please and easily evaluate the effect of their changes.

It was developed by Joan Barceló and Joan Cañellas as the final project of the subject "Audio and Voice Codification Systems" (Sistemes de Codificació d'Audio i Veu in Catalan, abbreviated as SCAV), which is taught at the Universitat Pompeu Fabra (UPF) in Barcelona as one of the mandatory subjects of the "Audiovisual Systems Engineering" degree. Most of the code is translated from the MATLAB assignments delivered during the course, which were partially based on skeleton code that we had to complete. Other parts of the code (interface code and some utilities) are adapted from the sms-tools package developed by the Music Technology Group of the UPF.

User Guide

Installation

In order to use these tools you have to install version 2.7.* of Python and the following modules: numpy, matplotlib, scipy and bitstring.

Unix-based operating systems (Ubuntu, Mac OS X...) already have Python installed, but not the modules:

In Ubuntu (strongly recommended), you can install all these modules by typing in the Terminal:

$ sudo apt-get install python-dev python-numpy python-matplotlib python-scipy python-bitstring

In OS X, install these modules by typing in the Terminal:

$ pip install numpy matplotlib scipy bitstring

Although we do not recommend using Windows for Python (there is no support from the developers), it can be downloaded here.

To install the modules from the Windows terminal [1]:

cd C:/Python/Scripts/
pip.exe install <package-name>
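
Once the modules are installed, a quick way to confirm that Python can find them is to try importing each one. The following is a minimal sketch using only the standard library; the helper name missing_modules is our own for illustration and is not part of SCAV Tools.

```python
import importlib

# Modules listed in the installation instructions above.
REQUIRED = ["numpy", "matplotlib", "scipy", "bitstring"]


def missing_modules(names):
    """Return the subset of `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing modules: " + ", ".join(missing))
    else:
        print("All required modules are installed.")
```

If the script reports missing modules, install them with the commands above and run it again.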

Usage

This application is developed in Python. To run it, follow these steps:

  1. Go to the models_interface directory using the terminal (e.g. $ cd scav-tools/software/models_interface/)
  2. Execute the interface of the application using the following command: $ python models_GUI.py

The interface has a separate tab for each functionality, as listed below.


Quantizer Tab

In this tab we can quantize a PCM sound using a number of bits that we choose.

This tab uses the functions of the Quantizer module from models/scav-tools. For more information about how it works and the output produced, see the Quantizer wiki page.

First of all choose a mono .wav file using the Browse button. This input file can be played with the Play button next to the Browse button. Then you can choose the number of bits and finally press the "Quantize" button to produce a result. Once it is computed, the output file can be played using the Play button at the bottom. In the terminal you should see some information about the encoding status.

Apart from generating a coded file (.quant) and the decoded (.wav) file, this computation shows the plot of the waveform and the magnitude spectrogram of the input and output sound, the plot of the SNR over time (in frames of 1024 samples), the bitrate of the coded file (in kbps), the Compression Ratio, the total SNR and the time of execution.
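
The figures reported in every coder tab (SNR per frame, bitrate and compression ratio) can be sketched as follows. This is a minimal illustration under textbook definitions, not the code SCAV Tools runs: the function names, the handling of a trailing partial frame and the exact formulas are assumptions.

```python
import math


def snr_db(reference, decoded):
    """Signal-to-noise ratio in dB between two equal-length sequences."""
    signal_energy = sum(x * x for x in reference)
    noise_energy = sum((x - y) ** 2 for x, y in zip(reference, decoded))
    if noise_energy == 0:
        return float("inf")   # identical signals: no noise at all
    if signal_energy == 0:
        return float("-inf")  # silent reference but noisy output
    return 10.0 * math.log10(signal_energy / noise_energy)


def framewise_snr_db(reference, decoded, frame_len=1024):
    """SNR of each non-overlapping frame; a trailing partial frame is kept."""
    n = min(len(reference), len(decoded))
    return [snr_db(reference[i:i + frame_len], decoded[i:i + frame_len])
            for i in range(0, n, frame_len)]


def bitrate_kbps(coded_bits, num_samples, fs):
    """Average bitrate of the coded file in kilobits per second."""
    duration_s = num_samples / float(fs)
    return coded_bits / duration_s / 1000.0


def compression_ratio(original_bits, coded_bits):
    """How many times smaller the coded file is than the original."""
    return original_bits / float(coded_bits)
```

For example, one second of 16-bit mono audio at 44100 Hz takes 705600 bits; quantizing it to 8 bits per sample gives 352800 bits, that is, bitrate_kbps(352800, 44100, 44100) = 352.8 kbps and a compression ratio of 2.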


DFT Coder Tab

In this tab we can encode a PCM sound using a DFT-based band coder.

This tab uses the functions of the DFT module from models/scav-tools. For more information about how it works and the output produced, see the DFT wiki page.

First of all choose a mono .wav file using the Browse button. This input file can be played with the Play button next to the Browse button. Then you can choose the number of bands, the window size of each frame and the number of bits per sample, and finally press the "Apply Codec" button to encode the file. Once it is computed, the output file can be played using the Play button at the bottom. In the terminal you should see some information about the encoding status.

Apart from generating a coded file (.dft) and the decoded (.wav) file, this computation shows the plot of the waveform and the magnitude spectrogram of the input and output sound, the plot of the SNR over time (in frames of 1024 samples), the bitrate of the coded file (in kbps), the Compression Ratio, the total SNR and the time of execution.


MDCT Coder Tab

In this tab we can encode a PCM sound using an MDCT-based band coder.

This tab uses the functions of the MDCT module from models/scav-tools. For more information about how it works and the output produced, see the MDCT wiki page.

First of all choose a mono .wav file using the Browse button. This input file can be played with the Play button next to the Browse button. Then you can choose the number of bands, the window size of each frame and the number of bits per sample, and finally press the "Apply Codec" button to encode the file. Once it is computed, the output file can be played using the Play button at the bottom. In the terminal you should see some information about the encoding status.

Apart from generating a coded file (.mdct) and the decoded (.wav) file, this computation shows the plot of the waveform and the magnitude spectrogram of the input and output sound, the plot of the SNR over time (in frames of 1024 samples), the bitrate of the coded file (in kbps), the Compression Ratio, the total SNR and the time of execution.


Bands Coder Tab

In this tab we can encode a PCM sound using a band-based coder to achieve a certain bitrate without perceptual modeling.

This tab uses the functions of the Bands module from models/scav-tools. For more information about how it works and the output produced, see the Bands wiki page.

First of all choose a mono .wav file using the Browse button. This input file can be played with the Play button next to the Browse button. Then you can choose the desired bitrate and the window size of each frame, and finally press the "Apply Codec" button to encode the file. Once it is computed, the output file can be played using the Play button at the bottom. In the terminal you should see some information about the encoding status.

Apart from generating a coded file (.bands) and the decoded (.wav) file, this computation shows the plot of the waveform and the magnitude spectrogram of the input and output sound, the plot of the SNR over time (in frames of 1024 samples), the bitrate of the coded file (in kbps), the Compression Ratio, the total SNR and the time of execution.


Perceptual Coder Tab

In this tab we can encode a PCM sound using a band-based coder to achieve a certain bitrate, with perceptual modeling to mask the quantization noise.

This tab uses the functions of the Perceptual module from models/scav-tools. For more information about how it works and the output produced, see the Perceptual wiki page.

First of all choose a mono .wav file using the Browse button. This input file can be played with the Play button next to the Browse button. Then you can choose the desired bitrate and the window size of each frame, and finally press the "Apply Codec" button to encode the file. Once it is computed, the output file can be played using the Play button at the bottom. In the terminal you should see some information about the encoding status.

Apart from generating a coded file (.perc) and the decoded (.wav) file, this computation shows the plot of the waveform and the magnitude spectrogram of the input and output sound, the plot of the SNR over time (in frames of 1024 samples), the bitrate of the coded file (in kbps), the Compression Ratio, the total SNR and the time of execution.


Decoder Tab

In this tab we can decode any file encoded with a scav-tools encoder, which allows users to share scav-tools generated files and decode them later.

This tab uses the decoding functions from all the modules in models/scav-tools, that is, Quantizer Decoder, DFT Decoder, MDCT Decoder, Bands Decoder and Perceptual Decoder.

First of all choose a coded file (.quant, .dft, .mdct, .bands or .perc), then press the "Decode" button to compute the decoding. Finally the output file can be played.

After decoding, the app shows where the output file has been saved.

Modules of the package

The source code of the modules can be found in the software directory of scav-tools. They are divided into two main blocks: models, which contains the computation code for each coder; and models_interface, which provides the interface code for each coder. Here we list them all with a short description. For complete information about the models, there is a main wiki page for each models module.

Models

Quantizer module

This is an overview, see main page for details.

The Quantizer module provides tools to quantize a PCM signal using a certain number of bits per sample. It includes the functions:

  • Quantizer: Main function of the module. Given the path to the PCM input .wav file, the number of bits desired and the output name, it should quantize the input PCM using the desired bits into a .quant file which can be decoded with the Decoder function.
  • Decoder: Given the path to the .quant file to decode, this function should generate a playable .wav file from the output of the Quantizer main function.
  • quantize_num: Function used by the Quantizer main function. Given the number to quantize and the bits used, it returns an integer representable with the desired bits using a midtread quantizer scheme.
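
A midtread quantizer has an odd number of levels, so that zero is one of them and silence is reproduced exactly. The sketch below shows one common textbook formulation for inputs in the range [-1, 1]. It is not the quantize_num implementation: the function names, the signed-index output and the clipping rule are assumptions made for illustration.

```python
def midtread_quantize(x, bits):
    """Map x in [-1, 1] to a signed integer index using `bits` bits.

    The quantizer has 2**bits - 1 levels (an odd number), so an input
    of zero maps to index 0 and is reproduced exactly.
    """
    if bits < 2:
        raise ValueError("a midtread quantizer needs at least 2 bits")
    levels = 2 ** bits - 1            # odd number of levels
    max_index = 2 ** (bits - 1) - 1   # largest magnitude index
    magnitude = min(abs(x), 1.0)      # clip overload to full scale
    index = min(int((levels * magnitude + 1.0) / 2.0), max_index)
    return -index if x < 0 else index


def midtread_dequantize(index, bits):
    """Map a signed integer index back to a value in [-1, 1]."""
    levels = 2 ** bits - 1
    return 2.0 * index / levels
```

With 3 bits there are 7 levels: midtread_quantize(-0.5, 3) gives -2, which dequantizes to -4/7. Inside [-1, 1] the reconstruction error never exceeds half a step, that is, 1 / (2**bits - 1).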

DFT module

This is an overview, see main page for details.

The DFT module provides tools to encode a PCM signal using a frequency band approach. It works better for tonal sounds, because it adapts the quantization of each band so that, in theory, every band has an equal energy distribution. It includes the functions:

  • DFTcoding: Main function of the module. Given the path to the PCM input .wav file, the number of bands desired, the length of each DFT frame, the number of bits used to quantize each sample of the real part and imaginary part of the DFT frame and the output name, it should encode the input PCM using the desired bits into a .dft file which can be decoded with the DFTdecoding function.
  • DFTdecoding: Given the path to the .dft file to decode, this function should generate a playable .wav file from the output of the DFTcoding function.

MDCT module

This is an overview, see main page for details.

The MDCT module provides tools to encode a PCM signal using a frequency band approach, but using the Modified Discrete Cosine Transform. It works in a similar way to the DFT module, but as the MDCT only has a real part, it produces smaller files (the DFT stores real and imaginary parts, so its files are twice as long). It includes the functions:

  • MDCTcoding: Main function of the module. Given the path to the PCM input .wav file, the number of bands desired, the length of each MDCT frame, the number of bits used to quantize each sample of the MDCT frame and the output name, it should encode the input PCM using the desired bits into a .mdct file which can be decoded with the MDCTdecoding function.
  • MDCTdecoding: Given the path to the .mdct file to decode, this function should generate a playable .wav file from the output of the MDCTcoding function.
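
To illustrate why the MDCT yields only real coefficients (N per frame of 2N samples) and still allows perfect reconstruction, here is a minimal sketch of the transform with 50% overlap-add. It is written for clarity, not speed, and is not the SCAV Tools code: the names mdct_frame, imdct_frame and analysis_synthesis, the sine window, the direct O(N^2) formula and the 2/N normalization are all assumptions. N is expected to be even.

```python
import math


def sine_window(frame_len):
    """Sine window; satisfies w[n]^2 + w[n + N]^2 = 1 for frame_len = 2N."""
    return [math.sin(math.pi * (n + 0.5) / frame_len)
            for n in range(frame_len)]


def mdct_frame(frame):
    """Window a frame of 2N samples and return its N real MDCT coefficients."""
    two_n = len(frame)
    n_coefs = two_n // 2
    window = sine_window(two_n)
    n0 = 0.5 + n_coefs / 2.0  # phase offset that makes the aliasing cancel
    return [sum(window[n] * frame[n] *
                math.cos(math.pi / n_coefs * (n + n0) * (k + 0.5))
                for n in range(two_n))
            for k in range(n_coefs)]


def imdct_frame(coefs):
    """Return the 2N windowed, time-aliased samples for N MDCT coefficients."""
    n_coefs = len(coefs)
    two_n = 2 * n_coefs
    window = sine_window(two_n)
    n0 = 0.5 + n_coefs / 2.0
    return [2.0 / n_coefs * window[n] *
            sum(coefs[k] * math.cos(math.pi / n_coefs * (n + n0) * (k + 0.5))
                for k in range(n_coefs))
            for n in range(two_n)]


def analysis_synthesis(signal, n_coefs):
    """MDCT followed by IMDCT with 50% overlap-add; rebuilds `signal`."""
    # Pad one hop of zeros on each side so every sample sits in two frames.
    padded = [0.0] * n_coefs + list(signal) + [0.0] * n_coefs
    remainder = len(padded) % n_coefs
    if remainder:
        padded += [0.0] * (n_coefs - remainder)
    output = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * n_coefs + 1, n_coefs):
        rebuilt = imdct_frame(mdct_frame(padded[start:start + 2 * n_coefs]))
        for n, value in enumerate(rebuilt):
            output[start + n] += value  # aliasing cancels between neighbours
    return output[n_coefs:n_coefs + len(signal)]
```

Each frame alone is aliased after the inverse transform; only the sum of two overlapping frames recovers the original samples. In a coder, the quantization step would sit between mdct_frame and imdct_frame.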

Bands module

This is an overview, see main page for details.

The Bands module provides tools to encode a PCM signal using a Bark band approach, also based on the MDCT but allocating bits to each Bark band depending on its energy. It does not use perceptual auditory masking, so it does not take into account that some noise might be masked. It includes the functions:

  • bandCoding: Main function of the module. Given the path to the PCM input .wav file, the bitrate desired, the length of each MDCT frame, and the output name, it should encode the input PCM achieving the desired bitrate into a .bands file which can be decoded with the bandDecoder function.
  • bandDecoder: Given the path to the .bands file to decode, this function should generate a playable .wav file from the output of the bandCoding function.
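
Grouping coefficients into Bark bands requires a mapping from frequency in Hertz to the Bark scale. The sketch below uses the Zwicker and Terhardt approximation, which is one of several published formulas; whether SCAV Tools uses this one is an assumption, and the function names are chosen only for illustration.

```python
import math


def hz_to_bark(freq_hz):
    """Approximate Bark value of a frequency (Zwicker and Terhardt formula)."""
    return (13.0 * math.atan(0.00076 * freq_hz)
            + 3.5 * math.atan((freq_hz / 7500.0) ** 2))


def fft_bin_to_bark(bin_index, fft_size, fs):
    """Bark value of the centre frequency of an FFT bin."""
    freq_hz = bin_index * fs / float(fft_size)
    return hz_to_bark(freq_hz)


def band_of_each_bin(fft_size, fs):
    """Integer Bark band index for every bin from 0 up to fft_size / 2."""
    return [int(fft_bin_to_bark(k, fft_size, fs))
            for k in range(fft_size // 2 + 1)]
```

Because the Bark scale is roughly linear at low frequencies and logarithmic at high ones, low bands contain only a few bins while high bands contain many.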

Perceptual module

This is an overview, see main page for details.

The Perceptual module provides tools to encode a PCM signal using a Bark band approach, also based on the MDCT but allocating bits to each Bark band depending on its SMR. The SMR (Signal to Mask Ratio) is the difference between the FFT of the signal (in SPL) and the computed masking spectrum. It includes the functions:

  • perceptual: Main function of the module. Given the path to the PCM input .wav file, the bitrate desired, the output name and the length of each MDCT frame, it should encode the input PCM achieving the desired bitrate into a .perc file which can be decoded with the percDecoder function.
  • percDecoder: Given the path to the .perc file to decode, this function should generate a playable .wav file from the output of the perceptual function.

UtilFunctions module

This is an overview, see main page for more details.

This module includes some functions that are common in some of the modules. It also includes functions adapted from sms-tools.

  • isPower2: Given a number, it checks whether it is a power of two, in order to ensure a correct computation of the FFT.
  • wavread: Given a path to a .wav file it extracts the PCM vector representation and its sampling frequency (fs). We have modified the function such that if a file is stereo, only the first channel is returned.
  • wavplay: Given a path to a .wav file it uses the tools provided by the operating system to play it through the default audio device.
  • wavwrite: Given a sound PCM vector, its fs and the output path it saves the vector into a .wav file.
  • peakDetection: Given a magnitude spectrum vector and a threshold, it returns the peak locations of the spectrum vector that are over the threshold.
  • lenOfFile: Given a path to a file, it returns the size of the file in kbits.
  • fwrite: Writes a variable (number or array) to a chosen file using a desired amount of bits.
  • fread: Reads a variable (number or array) that is expressed in a certain number of bits from a chosen file.
  • openStream: Initialises a global variable Stream, used to write bits to the files.
  • flushStream: Empties all the bits left in the global variable Stream, zero-padding them because of the byte alignment in Python.
  • signalToNoiseRatio: Given two arrays that represent an input and output signal and a frame length it computes the total Signal to Noise Ratio (SNR) between input and output and also a vector of the SNR calculated in frames of the desired length.
  • bark: Converts a linear frequency in Hertz to the Bark scale.
  • fftbark: Converts a fft bin to bark scale using the fft size and the fs.
  • Schroeder: Calculates the masking spectrum for a given frequency SPL frame.
  • midtread_quantizer: Quantizes a number with the number of bits specified by using a midtread schema.
  • midtread_quantizer_2: Improved version of the midtread_quantizer, which can also quantize arrays and indicate overload in debugging phases.
  • midtread_dequantizer: Dequantizes a value or an array quantized with midtread_quantizer (version one or two) using a specified number of bits.
  • mdct: Returns the Modified Discrete Cosine Transform of an input signal.
  • imdct: Returns the synthesized signal of an input MDCT vector.
  • p_encode: Quantizes an MDCT frame using a vector that represents the bit allocation for each band.
  • enframe: Given a signal, a window and an overlap size it divides the signal into windowed frames using overlap.
  • allocate: Given a Signal to Mask Ratio (if a masking based bit allocation is desired) or a frequency SPL (if a band energy based bit allocation is desired), the number of bits used for the gain (to subtract them from the bitrate), the frame length and the fs, it allocates bits to each band dynamically by giving one bit to each band for each 6 dB in the SMR or SPL.
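
The "one bit for each 6 dB" rule used by allocate can be sketched as a greedy loop: each extra bit lowers the quantization noise by about 6 dB, so the band with the highest remaining level receives the next bit. This is a simplified illustration and not the allocate function itself: the name allocate_bits, the per-band (rather than per-coefficient) budget, the cap per band and the stopping rule are assumptions.

```python
def allocate_bits(levels_db, bit_budget, max_bits_per_band=16):
    """Greedy allocation of `bit_budget` bits across bands.

    `levels_db` holds one level per band in dB (an SMR or an SPL).
    Bands at or below 0 dB receive no bits.
    """
    remaining = list(levels_db)   # level not yet covered in each band
    bits = [0] * len(levels_db)
    while bit_budget > 0:
        candidates = [b for b in range(len(bits))
                      if remaining[b] > 0 and bits[b] < max_bits_per_band]
        if not candidates:
            break                 # nothing left that needs more bits
        best = max(candidates, key=lambda b: remaining[b])
        bits[best] += 1
        remaining[best] -= 6.0    # one bit covers roughly 6 dB
        bit_budget -= 1
    return bits
```

For instance, allocate_bits([12.0, 6.0, -3.0], 10) returns [2, 1, 0]: the third band is already below 0 dB, and the loop stops with bits to spare once the other two are covered.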

Models Interface

models_GUI

This script is the main entry point of the interface. It calls the other scripts of the program to show the different tabs.

Quantizer frame and function

  • Quantizer_frame: Displays the Quantizer tab. It displays file search, file play, number of bits, compute button and play computed file button. After computing it displays bitrate, compression ratio, SNR and time of encoding.
  • Quantizer_function: Calls the Quantizer module to quantize and decode an audio file. It measures the bitrate, compression ratio, SNR and time of encoding, and plots SNR over time, original waveform, encoded waveform, original spectrum and encoded spectrum.

DFT frame and function

  • DFT_frame: Displays the DFT tab. It displays file search, file play, number of bands, window size of each frame, number of bits per band, compute button and play computed file button. After computing it displays bitrate, compression ratio, SNR and time of encoding.
  • DFT_function: Calls the DFT module to encode and decode an audio file. It measures the bitrate, compression ratio, SNR and time of encoding, and plots SNR over time, original waveform, encoded waveform, original spectrum and encoded spectrum.

MDCT frame and function

  • MDCT_frame: Displays the MDCT tab. It displays file search, file play, number of bands, window size of each frame, number of bits per band, compute button and play computed file button. After computing it displays bitrate, compression ratio, SNR and time of encoding.
  • MDCT_function: Calls the MDCT module to encode and decode an audio file. It measures the bitrate, compression ratio, SNR and time of encoding, and plots SNR over time, original waveform, encoded waveform, original spectrum and encoded spectrum.

Bands frame and function

  • Bands_frame: Displays the Bands tab. It displays file search, file play, desired bitrate, window size of each frame, compute button and play computed file button. After computing it displays bitrate, compression ratio, SNR and time of encoding.
  • Bands_function: Calls the Bands module to encode and decode an audio file. It measures the bitrate, compression ratio, SNR and time of encoding, and plots SNR over time, original waveform, encoded waveform, original spectrum and encoded spectrum.

Perceptual frame and function

  • Perceptual_frame: Displays the Perceptual tab. It displays file search, file play, desired bitrate, window size of each frame, compute button and play computed file button. After computing it displays bitrate, compression ratio, SNR and time of encoding.
  • Preceptual_function: Calls the Perceptual module to encode and decode an audio file. It measures the bitrate, compression ratio, SNR and time of encoding, and plots SNR over time, original waveform, encoded waveform, original spectrum and encoded spectrum.

Decoder frame and function

  • Decoder_frame: Displays the Decoder tab. It displays file search, decode button and play computed file button. After computing it displays where the decoded file has been saved.
  • Decoder_function: Reads the extension of the input file and calls the matching decoder to return a decoded audio file (in .wav).
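
Choosing a decoder from the file extension can be sketched with a lookup table. This is only an illustration of the idea: the name pick_decoder and the table are hypothetical, and the table stores the decoder names listed in the Decoder Tab section rather than the real functions, whose signatures are not documented here.

```python
import os

# Extension -> decoder name, as listed in the Decoder Tab section.
DECODERS = {
    ".quant": "Quantizer Decoder",
    ".dft": "DFT Decoder",
    ".mdct": "MDCT Decoder",
    ".bands": "Bands Decoder",
    ".perc": "Perceptual Decoder",
}


def pick_decoder(path):
    """Return the name of the decoder that handles the file at `path`."""
    extension = os.path.splitext(path)[1].lower()
    try:
        return DECODERS[extension]
    except KeyError:
        raise ValueError("unsupported file extension: %r" % extension)
```

In a real dispatcher the dictionary values would be the decoding functions themselves, so that the lookup result can be called directly with the input path.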

References:
  1. StackOverflow: How do I install Python packages on Windows? http://stackoverflow.com/a/23833666
