Tools (dsptoolbox.tools)

Tools

This module contains general DSP utilities. These functions operate exclusively on arrays and primitive data types rather than custom classes.

dsptoolbox.tools.convert_sample_representation(values: ndarray[tuple[Any, ...], dtype[_ScalarT]] | bytes, input_format: str, output_format: str, cast_output: bool = True, output_in_bytes: bool = False) tuple[ndarray[tuple[Any, ...], dtype[_ScalarT]] | bytes, float, float]

This function takes in an array of audio samples and turns it into the desired sample output format. It always clips the input to the maximum allowed range.

Parameters:
valuesNDArray, bytes

Values to convert. If passed as bytes, the output is a flat array in the same order as the input, without reordering.

input_formatstr, {“f32”, “f64”, “i8”, “i16”, “i24”, “i32”, “u8”, “u16”, “u24”, “u32”}

Input format for the samples. If the input is a byte array, the samples will be read using numpy.frombuffer(). In the case of “i24” and “u24”, the input is expected to have 3-byte samples and the endianness of the current platform.

output_formatstr, {“f32”, “f64”, “i8”, “i16”, “i24”, “i32”, “u8”, “u16”, “u24”, “u32”}

Output format for the samples.

cast_outputbool, optional

When True, the output vector is cast to the data type equivalent to the output format. This raises an AssertionError if numpy does not support the cast (for “i24” and “u24”) AND the output is not in bytes. Without casting, the data type of the output is always np.float64. Default: True.

output_in_bytesbool, optional

When True, the array is returned as its bytes representation, as produced by numpy.ndarray.tobytes(). For “i24” and “u24” with cast_output=True, the bytes always use the endianness of the current platform, with 3 bytes per sample in C ordering. Default: False.

Returns:
outputNDArray or bytes

Vector with samples in the desired format.

equilibriumfloat

Value that represents equilibrium in the output sample format.

span_valuefloat

Maximum distance from the equilibrium to the ends of the dynamic range in the output sample format. Use equilibrium ± span_value to find the range of values.

Notes

  • Dithering is advised when lowering the bit depth; this function does not apply it.

  • Passing the same format as input and output will raise an AssertionError.

  • i refers to signed integer and u means unsigned integer.
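
As a rough illustration of how equilibrium and span_value describe an integer sample format, the following dependency-free sketch maps floating-point samples in [-1, 1] onto a PCM grid. The helper name float_to_pcm and the choice span = 2**(bits - 1) - 1 are assumptions made for this example; the values actually returned by convert_sample_representation may differ.

```python
def float_to_pcm(samples, bits, signed=True):
    """Hypothetical helper: map floats in [-1, 1] to integer PCM codes."""
    span = float(2 ** (bits - 1) - 1)           # assumed symmetric span
    equilibrium = 0.0 if signed else float(2 ** (bits - 1))
    out = []
    for s in samples:
        s = max(-1.0, min(1.0, s))              # clip to the allowed range
        out.append(int(round(equilibrium + s * span)))
    return out, equilibrium, span


codes, eq, span = float_to_pcm([0.0, 1.0, -1.0, 2.0], bits=16)
# The representable range is equilibrium ± span_value
print(codes, eq - span, eq + span)
```

The last input sample (2.0) lies outside the allowed range and is clipped before conversion, mirroring the clipping behaviour described above.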

dsptoolbox.tools.erb_frequencies(freq_range_hz=[20, 20000], resolution: float = 1, reference_frequency_hz: float = 1000) ndarray[tuple[Any, ...], dtype[float64]]

Get frequencies that are linearly spaced on the ERB frequency scale. This implementation was taken and adapted from the pyfar package. See references.

Parameters:
freq_range_hzarray-like, optional

The lower and upper frequency limits in Hz between which the frequency vector is computed. Default: [20, 20e3].

resolutionfloat, optional

The frequency resolution in ERB units. 1 returns frequencies that are spaced by 1 ERB unit, a value of 0.5 would return frequencies that are spaced by 0.5 ERB units. Default: 1.

reference_frequency_hzfloat, optional

The reference frequency in Hz relative to which the frequency vector is constructed. Default: 1000.

Returns:
frequenciesNDArray[np.float64]

The frequencies in Hz that are linearly distributed on the ERB scale with a spacing given by resolution ERB units.

References

  • The pyfar package: https://github.com/pyfar/pyfar

  • B. C. J. Moore, An introduction to the psychology of hearing, (Leiden, Boston, Brill, 2013), 6th ed.

  • V. Hohmann, “Frequency analysis and synthesis using a gammatone filterbank,” Acta Acust. united Ac. 88, 433-442 (2002).

  • P. L. Søndergaard, and P. Majdak, “The auditory modeling toolbox,” in The technology of binaural listening, edited by J. Blauert (Heidelberg et al., Springer, 2013) pp. 33-56.
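
The idea of spacing frequencies linearly on the ERB scale can be sketched without any dependencies. The conversion below uses the ERB-number formula commonly attributed to Glasberg and Moore, 21.4 * log10(1 + 0.00437 f); whether erb_frequencies uses exactly these constants is an assumption, and the function names are hypothetical.

```python
import math


def hz_to_erb(f_hz):
    # ERB-number scale (assumed constants, see lead-in)
    return 21.4 * math.log10(1.0 + 0.00437 * f_hz)


def erb_to_hz(erb):
    return (10.0 ** (erb / 21.4) - 1.0) / 0.00437


def erb_spaced(freq_range_hz=(20.0, 20000.0), resolution=1.0, reference_hz=1000.0):
    """Hypothetical sketch: regular ERB grid passing through reference_hz."""
    lo, hi = (hz_to_erb(f) for f in freq_range_hz)
    ref = hz_to_erb(reference_hz)
    k_min = math.ceil((lo - ref) / resolution)    # first grid point inside the range
    k_max = math.floor((hi - ref) / resolution)   # last grid point inside the range
    return [erb_to_hz(ref + k * resolution) for k in range(k_min, k_max + 1)]


freqs = erb_spaced()
print(len(freqs), freqs[0], freqs[-1])
```

In this sketch the reference frequency is always part of the grid, and neighbouring frequencies are exactly resolution ERB units apart.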

dsptoolbox.tools.fractional_octave_frequencies(num_fractions=1, frequency_range=(20, 20000.0), return_cutoff=False) tuple[ndarray[tuple[Any, ...], dtype[float64]], ndarray[tuple[Any, ...], dtype[float64]], tuple[ndarray[tuple[Any, ...], dtype[float64]], ndarray[tuple[Any, ...], dtype[float64]]]] | tuple[ndarray[tuple[Any, ...], dtype[float64]], ndarray[tuple[Any, ...], dtype[float64]]]

Return the octave center frequencies according to the IEC 61260-1:2014 standard. This implementation has been taken from the pyfar package. See references.

For numbers of fractions other than 1 and 3, only the exact center frequencies are returned, since nominal frequencies are not specified by corresponding standards.

Parameters:
num_fractionsint, optional

The number of bands an octave is divided into. E.g., 1 refers to octave bands and 3 to third-octave bands. The default is 1.

frequency_rangearray, tuple

The lower and upper frequency limits, the default is frequency_range=(20, 20e3).

Returns:
nominalarray, float

The nominal center frequencies in Hz specified in the standard. Nominal frequencies are only returned for octave bands and third octave bands. Otherwise, an empty array is returned.

exactarray, float

The exact center frequencies in Hz, resulting in a uniform distribution of frequency bands over the frequency range.

cutoff_freqtuple, array, float

The lower and upper critical frequencies in Hz of the bandpass filters for each band as a tuple corresponding to (f_lower, f_upper). Only returned when return_cutoff=True.
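
The exact center and cutoff frequencies follow the base-ten formulation of IEC 61260-1, where the octave ratio is G = 10^(3/10) and the reference frequency is 1000 Hz. The sketch below covers only odd numbers of fractions and does not compute nominal frequencies; the function name and the way bands are selected from the frequency range are assumptions for illustration.

```python
import math

G = 10.0 ** (3.0 / 10.0)   # octave ratio (base-ten system)
F_REF = 1000.0             # reference frequency in Hz


def exact_bands(num_fractions=1, frequency_range=(20.0, 20e3)):
    """Hypothetical sketch for odd num_fractions only."""
    b = num_fractions
    lo, hi = frequency_range
    # Band indices whose exact center frequency lies inside the range
    x_min = math.ceil(b * math.log(lo / F_REF, G))
    x_max = math.floor(b * math.log(hi / F_REF, G))
    centers = [F_REF * G ** (x / b) for x in range(x_min, x_max + 1)]
    half_band = G ** (1.0 / (2 * b))
    cutoffs = [(fc / half_band, fc * half_band) for fc in centers]
    return centers, cutoffs


centers, cutoffs = exact_bands(1)
print(centers)
```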

References

dsptoolbox.tools.fractional_octave_smoothing(vector: ndarray[tuple[Any, ...], dtype[float64]], bin_spacing_octaves: float | None = None, num_fractions: int = 3, window_type='hann', window_vec: ndarray[tuple[Any, ...], dtype[float64]] | None = None, clip_values: bool = False) ndarray[tuple[Any, ...], dtype[float64]]

Smooths a vector using interpolation to a logarithmic scale. Usually done for smoothing of frequency data. This implementation is taken from the pyfar package, see references.

Parameters:
vectorNDArray[np.float64]

Vector to be smoothed. It is assumed that the first axis is to be smoothed.

bin_spacing_octavesfloat, None, optional

Spacing between frequency bins in octaves. If None, it is assumed that the vector is linearly spaced. Default: None.

num_fractionsint, optional

Fraction of an octave to be smoothed across. Default: 3 (third-octave band).

window_typestr, optional

Type of window to be used. See scipy.signal.windows.get_window for valid types. If the window is ‘gaussian’, the parameter passed will be interpreted as alpha and not sigma. Default: ‘hann’.

window_vecNDArray[np.float64], optional

Window vector to be used as a window. window_type should be set to None if this direct window is going to be used. Default: None.

clip_valuesbool, optional

When True, negative values are clipped to 0. Default: False.

Returns:
vec_finalNDArray[np.float64]

Vector after smoothing.

References

  • Tylka, Joseph & Boren, Braxton & Choueiri, Edgar. (2017). A Generalized Method for Fractional-Octave Smoothing of Transfer Functions that Preserves Log-Frequency Symmetry. Journal of the Audio Engineering Society. 65. 239-245. 10.17743/jaes.2016.0053.

  • https://github.com/pyfar/pyfar

dsptoolbox.tools.framed_signal(time_data: ndarray[tuple[Any, ...], dtype[float64]], window_length_samples: int, step_size: int, keep_last_frames: bool = True) ndarray[tuple[Any, ...], dtype[float64]]

This function turns a signal into (possibly) overlapping time frames. The original data gets copied.

Parameters:
time_dataNDArray[np.float64]

Signal with shape (time samples, channels).

window_length_samplesint

Window length in samples.

step_sizeint

Step size (also called hop length) in samples.

keep_last_framesbool, optional

When True, the last frames (probably with zero-padding) are kept. Otherwise, no frames with zero padding are included. Default: True.

Returns:
time_data_framedNDArray[np.float64]

Framed signal with shape (time samples, frames, channels).

Notes

  • Perfect reconstruction from this representation can be achieved when the signal is zero-padded at the edges where the window does not yet meet the COLA condition. Otherwise, these sections might be distorted.
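
The framing operation itself can be pictured with a small single-channel sketch. It returns a list of frames rather than an array with shape (time samples, frames, channels), and the rule used to decide how many zero-padded frames are kept is an assumption; framed_signal may count trailing frames differently.

```python
def frame_1d(x, window_length, step, keep_last_frames=True):
    """Hypothetical single-channel sketch; returns a list of frames (copies)."""
    frames = []
    start = 0
    while start < len(x):
        chunk = list(x[start:start + window_length])
        if len(chunk) < window_length:
            if not keep_last_frames:
                break                                      # drop incomplete frames
            chunk += [0.0] * (window_length - len(chunk))  # zero-pad the tail
        frames.append(chunk)
        start += step
    return frames


print(frame_1d([1, 2, 3, 4, 5], window_length=4, step=2))
print(frame_1d([1, 2, 3, 4, 5], window_length=4, step=2, keep_last_frames=False))
```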

dsptoolbox.tools.frequency_crossover(crossover_region_hz: list[float], logarithmic: bool = True)

Return a callable that can be used to extract values from a crossover to use on frequency data. This uses a Hann window function to generate the crossover. It is a “fade-in”, i.e., the values are 0 below the low frequency and rise to 1 at the high frequency of the crossover.

Parameters:
crossover_region_hzlist with length 2

Frequency range for which to create the crossover.

logarithmicbool, optional

When True, the crossover is defined logarithmically on the frequency axis. Default: True.

Returns:
callable

Callable that produces values from the crossover function. The input should always be in Hz. It can take float or NDArray[np.float64] and returns the same type.
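
A Hann-shaped fade-in of this kind might look like the following sketch. It handles scalars only, and the name hann_fade_in as well as the exact mapping of frequency to window position are assumptions rather than the library's implementation.

```python
import math


def hann_fade_in(crossover_region_hz, logarithmic=True):
    """Hypothetical sketch returning a scalar-only callable."""
    f_lo, f_hi = crossover_region_hz

    def axis(f):
        return math.log(f) if logarithmic else f

    def crossover(f_hz):
        if f_hz <= f_lo:
            return 0.0
        if f_hz >= f_hi:
            return 1.0
        t = (axis(f_hz) - axis(f_lo)) / (axis(f_hi) - axis(f_lo))
        return 0.5 * (1.0 - math.cos(math.pi * t))  # rising half of a Hann window

    return crossover


fade = hann_fade_in([100.0, 1000.0])
print(fade(50.0), fade(math.sqrt(100.0 * 1000.0)), fade(2000.0))
```

With logarithmic=True the midpoint of the fade (value 0.5) sits at the geometric mean of the two crossover frequencies; with logarithmic=False it sits at their arithmetic mean.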

dsptoolbox.tools.from_db(x: float | ndarray[tuple[Any, ...], dtype[float64]], amplitude_output: bool)

Get the values in their amplitude or power form from dB.

Parameters:
xfloat, NDArray[np.float64]

Values in dB.

amplitude_outputbool

When True, the values are returned in their linear form. Otherwise, the squared (power) form is returned.

Returns:
float, NDArray[np.float64]

Converted values
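
The standard dB relations behind this conversion are 10^(x/20) for amplitude and 10^(x/10) for power. The scalar sketch below states them directly; the function name is hypothetical.

```python
def from_db_sketch(x_db, amplitude_output):
    """Hypothetical scalar sketch of the dB-to-linear conversion."""
    divisor = 20.0 if amplitude_output else 10.0
    return 10.0 ** (x_db / divisor)


print(from_db_sketch(20.0, amplitude_output=True))    # amplitude ratio
print(from_db_sketch(20.0, amplitude_output=False))   # power ratio
```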

dsptoolbox.tools.get_exact_value_at_frequency(freqs_hz: ndarray[tuple[Any, ...], dtype[float64]], y: ndarray[tuple[Any, ...], dtype[Any]], f: float = 1000.0)

Return the exact value at the queried frequency f (1 kHz by default), obtained by linear interpolation.

Parameters:
freqs_hzNDArray[np.float64]

Frequency vector in Hz. It is assumed to be in ascending order.

yNDArray[np.float64]

Values to use for the interpolation.

ffloat, optional

Frequency to query. Default: 1000.

Returns:
float

Queried value.
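
Linear interpolation at a single query frequency can be sketched as follows. The function name is hypothetical, and raising an error for out-of-range queries is a choice made for this example, not documented behaviour of get_exact_value_at_frequency.

```python
from bisect import bisect_right


def value_at_frequency(freqs_hz, y, f=1000.0):
    """Hypothetical sketch; freqs_hz must be ascending with at least 2 entries."""
    if not freqs_hz[0] <= f <= freqs_hz[-1]:
        raise ValueError("f lies outside the frequency vector")
    # Index of the right-hand neighbour of f
    i = min(bisect_right(freqs_hz, f), len(freqs_hz) - 1)
    f0, f1 = freqs_hz[i - 1], freqs_hz[i]
    weight = (f - f0) / (f1 - f0)
    return y[i - 1] + weight * (y[i] - y[i - 1])


print(value_at_frequency([500.0, 2000.0], [0.0, 3.0]))
```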

dsptoolbox.tools.get_smoothing_factor_ema(relaxation_time_s: float, sampling_rate_hz: int, accuracy: float = 0.95)

This computes the smoothing factor needed for a single-pole IIR, or exponential moving averager. The returned value (alpha) should be used as follows:

y[n] = alpha * x[n] + (1 - alpha) * y[n-1]

Parameters:
relaxation_time_sfloat

Time for the step response to stabilize around the given value (with the given accuracy).

sampling_rate_hzint

Sampling rate to be used.

accuracyfloat, optional

Accuracy that the step response must reach after the relaxation time; e.g., 0.95 means the response is within 5% of its final value of 1. It must lie in the open interval ]0, 1[. Default: 0.95.

Returns:
alphafloat

Smoothing value for the exponential smoothing.
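
For the recursion above with zero initial state, the unit step response after N samples is 1 - (1 - alpha)^N. Requiring this to equal the accuracy after N = relaxation_time_s * sampling_rate_hz samples gives alpha = 1 - (1 - accuracy)^(1/N). The sketch below applies this derivation and checks it by simulation; it is one plausible formula, and the library may use a slightly different convention.

```python
def smoothing_factor(relaxation_time_s, sampling_rate_hz, accuracy=0.95):
    """Hypothetical sketch derived from the step response of the recursion."""
    n = relaxation_time_s * sampling_rate_hz   # samples available to settle
    return 1.0 - (1.0 - accuracy) ** (1.0 / n)


alpha = smoothing_factor(0.01, 1000)   # 10 samples to reach 95 %
y = 0.0
for _ in range(10):                    # unit step input, x[n] = 1
    y = alpha * 1.0 + (1.0 - alpha) * y
print(alpha, y)
```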

Notes

dsptoolbox.tools.interpolate_fr(f_interp: ndarray[tuple[Any, ...], dtype[float64]], fr_interp: ndarray[tuple[Any, ...], dtype[float64]], f_target: ndarray[tuple[Any, ...], dtype[float64]], mode: str | None = None, interpolation_scheme: str = 'linear') ndarray[tuple[Any, ...], dtype[float64]]

Interpolate one frequency response to a new frequency vector.

Parameters:
f_interpNDArray[np.float64]

Frequency vector of the frequency response that should be interpolated.

fr_interpNDArray[np.float64]

Frequency response to be interpolated.

f_targetNDArray[np.float64]

Target frequency vector.

modestr {“db2amplitude”, “amplitude2db”, “power2db”, “power2amplitude”, “amplitude2power”}, None, optional

Convert between amplitude, power or dB representation during the interpolation step. For instance, using the mode “db2amplitude” means input in dB, interpolation in the amplitude spectrum, output in dB. Available modes are “db2amplitude”, “amplitude2db”, “power2db”, “power2amplitude”, “amplitude2power”. Pass None to avoid any conversion. Default: None.

interpolation_schemestr {“linear”, “quadratic”, “cubic”}, optional

Type of interpolation to use. See scipy.interpolate.interp1d for details. Choose from “quadratic” or “cubic” splines, or “linear”. Default: “linear”.

Returns:
NDArray[np.float64]

New interpolated frequency response corresponding to f_target vector.

Notes

  • The input is always assumed to be already sorted.

  • In case f_target has values outside the boundaries of f_interp, 0 is used as the fill value. For interpolation in dB, fill values are the vector’s edges.

  • The interpolation is always done along the first (outer) axis of the vector.

  • When converting to dB, the default clipping value of to_db is used.

  • Theoretical thoughts on interpolating an amplitude or power frequency response:

    • Interpolating complex or dB values is not very precise when the results are compared in terms of the amplitude or power spectrum.

    • Interpolation can be done with amplitude or power representation with similar precision.

    • Changing the frequency resolution on a linear scale means zero-padding or trimming the underlying time series. For an amplitude representation, i.e. spectrum or spectral density, the values must be scaled using the factor old_length/new_length. This ensures that the RMS values (amplitude spectrum) are still correct, and that integrating the new power spectral density still yields the signal's total energy, i.e. Parseval’s theorem still holds. For the power representation, the same applies with the squared factor.

    • A direct FFT-result which is not in physical units needs rescaling depending on the normalization scheme used during the FFT -> IFFT (in the complex/amplitude representation):

      • Forward: scaling factor old_length/new_length.

      • Backward: no rescaling.

      • Orthogonal: scaling factor (old_length/new_length)**0.5

    • Interpolating the (amplitude or power) spectrum to a logarithmically spaced frequency vector can be done without rescaling (the underlying transformation in the time domain would be warping). Doing so for the (amplitude or power) spectral density only retains its validity if the new spectrum is weighted exponentially with increasing frequency, since each bin contains the energy of a larger “frequency band” (this changes the physical units of the spectral density). This weighting ensures that integrating the power spectral density over frequency still yields the energy of the signal (Parseval).

    • Assuming a different time window in each frequency resolution would require knowing the specific windows in order to rescale correctly. Assuming the same time window while zero-padding in the time domain would mean that no rescaling has to be applied.
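
The rescaling rules for the "forward" and "backward" normalizations listed above can be checked numerically. The sketch below uses a naive O(N^2) DFT written for this example (not a library call) and zero-pads a short signal to twice its length, so every second bin of the new spectrum coincides with a bin of the old one.

```python
import cmath


def dft(x):
    """Unnormalized ("backward") DFT, O(N^2), for illustration only."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


x = [1.0, 2.0, 0.5, -1.0]
x_padded = x + [0.0] * len(x)            # doubles the frequency resolution

backward_old, backward_new = dft(x), dft(x_padded)
forward_old = [v / len(x) for v in backward_old]
forward_new = [v / len(x_padded) for v in backward_new]

factor = len(x) / len(x_padded)          # old_length / new_length
# Backward normalization: shared bins agree without rescaling
err_backward = max(abs(backward_new[2 * k] - backward_old[k]) for k in range(len(x)))
# Forward normalization: old values times old_length / new_length match the new ones
err_forward = max(abs(forward_new[2 * k] - forward_old[k] * factor) for k in range(len(x)))
print(err_backward, err_forward)
```

Both errors are at the level of floating-point rounding, which matches the "Backward: no rescaling" and "Forward: scaling factor old_length/new_length" entries.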

dsptoolbox.tools.log_frequency_vector(frequency_range_hz: list[float], n_bins_per_octave: int) ndarray[tuple[Any, ...], dtype[float64]]

Obtain a logarithmically spaced frequency vector with a specified number of frequency bins per octave.

Parameters:
frequency_range_hzlist[float]

List of length 2 defining the frequency range in Hz. The lowest frequency should be above 0.

n_bins_per_octaveint

Number of frequency bins in each octave.

Returns:
NDArray[np.float64]

Log-spaced frequency vector
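
A vector with a fixed number of bins per octave follows f_k = f_low * 2^(k / n_bins_per_octave). The sketch below builds such a vector; how the library treats the upper limit when the range is not a whole number of bins is not stated above, so stopping at the last bin that does not exceed the upper limit is an assumption.

```python
import math


def log_spaced(frequency_range_hz, n_bins_per_octave):
    """Hypothetical sketch of a log-spaced frequency vector."""
    f_lo, f_hi = frequency_range_hz
    n_octaves = math.log2(f_hi / f_lo)
    # Small epsilon guards against rounding just below an integer bin count
    n_points = int(math.floor(n_octaves * n_bins_per_octave + 1e-9)) + 1
    return [f_lo * 2.0 ** (k / n_bins_per_octave) for k in range(n_points)]


vec = log_spaced([100.0, 400.0], 3)
print(vec)
```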

dsptoolbox.tools.log_mean(x: ndarray[tuple[Any, ...], dtype[float64]], axis: int = 0)

Get the mean value while using a logarithmic x-axis. It is assumed that x is initially linearly-spaced.

Parameters:
xNDArray[np.float64]

Vector for which to obtain the mean.

axisint, optional

Axis along which to compute the mean.

Returns:
float or NDArray[np.float64]

Logarithmic mean along the selected axis.

dsptoolbox.tools.next_power_2(number, mode: str = 'closest') int

This function returns the power of 2 closest to the given number.

Parameters:
numberint, float

Number for which to find the closest power of 2.

modestr {‘closest’, ‘floor’, ‘ceil’}, optional

‘closest’ gives the closest value. ‘floor’ returns the next smaller power of 2 and ‘ceil’ the next larger. Default: ‘closest’.

Returns:
number_2int

Next power of 2 according to the selected mode.
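
The three modes can be illustrated with a short sketch. Here 'closest' is interpreted as rounding the base-2 exponent, which is an assumption; the library could instead pick the power of 2 with the smallest absolute difference.

```python
import math


def power_of_two(number, mode="closest"):
    """Hypothetical sketch; 'closest' rounds on a log2 scale (assumption)."""
    exponent = math.log2(number)
    if mode == "floor":
        exponent = math.floor(exponent)
    elif mode == "ceil":
        exponent = math.ceil(exponent)
    elif mode == "closest":
        exponent = round(exponent)
    else:
        raise ValueError("mode must be 'closest', 'floor' or 'ceil'")
    return 2 ** int(exponent)


print(power_of_two(1000), power_of_two(1000, "floor"), power_of_two(1000, "ceil"))
```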

dsptoolbox.tools.reconstruct_from_framed_signal(td_framed: ndarray[tuple[Any, ...], dtype[float64]], step_size: int, window: str | ndarray[tuple[Any, ...], dtype[float64]] | None = None, original_signal_length: int | None = None, safety_threshold: float = 0.0001) ndarray[tuple[Any, ...], dtype[float64]]

Converts a framed signal back into its vector representation.

Parameters:
td_framedNDArray[np.float64]

Framed signal with shape (time samples, frames, channels).

step_sizeint

Step size in samples between frames (also known as hop length).

windowstr, NDArray[np.float64], optional

Window (if applies). Pass None to avoid using a window during reconstruction. Default: None.

original_signal_lengthint, optional

When not None, the output is padded or trimmed to this length. Default: None.

safety_thresholdfloat, optional

When reconstructing the signal with a window, very small window values can lead to instabilities. This safety threshold avoids dividing by values below it. Default: 1e-4.

Dividing by 1e-4 is the same as amplifying by 80 dB.

Returns:
tdNDArray[np.float64]

Reconstructed signal with shape (time samples, channels).
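
Reconstruction from overlapping frames is typically done by overlap-add, followed by a division by the summed window. The single-channel sketch below shows where a safety threshold enters; the function name, the list-of-frames layout and the handling of samples below the threshold are assumptions, not the library's code.

```python
def overlap_add(frames, step, window, original_length=None, safety_threshold=1e-4):
    """Hypothetical single-channel sketch of normalized overlap-add."""
    frame_len = len(frames[0])
    total = step * (len(frames) - 1) + frame_len
    acc = [0.0] * total     # sum of the (already windowed) frames
    norm = [0.0] * total    # sum of the window at every output sample
    for i, frame in enumerate(frames):
        for n in range(frame_len):
            acc[i * step + n] += frame[n]
            norm[i * step + n] += window[n]
    # Undo the summed window, leaving samples untouched where it is too small
    out = [a / w if w > safety_threshold else a for a, w in zip(acc, norm)]
    if original_length is not None:
        out = (out + [0.0] * original_length)[:original_length]
    return out


frames = [[1, 2, 3, 4], [3, 4, 5, 0], [5, 0, 0, 0]]
print(overlap_add(frames, step=2, window=[1.0] * 4, original_length=5))
```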

dsptoolbox.tools.scale_spectrum(spectrum: ndarray[tuple[Any, ...], dtype[float64]] | ndarray[tuple[Any, ...], dtype[complex128]], scaling: SpectrumScaling, time_length_samples: int, sampling_rate_hz: int, window: ndarray[tuple[Any, ...], dtype[float64]] | None = None) ndarray[tuple[Any, ...], dtype[float64]]

Scale the spectrum directly from the unscaled (“backward” normalization) (R)FFT. If a window was applied, it is necessary to compute the right scaling factor.

Parameters:
spectrumNDArray[np.float64] | NDArray[np.complex128]

Spectrum to scale. It is assumed that the frequency bins are along the first dimension. No FFT normalization should have been applied to it.

scalingSpectrumScaling

Type of scaling to use. Using a power representation returns the squared spectrum.

time_length_samplesint

Original length of the time data.

sampling_rate_hzint

Sampling rate.

windowNDArray[np.float64], None, optional

Applied window when obtaining the spectrum. It is necessary to compute the correct scaling factor. In case of None, “boxcar” window is assumed. Default: None.

Returns:
NDArray[np.float64] | NDArray[np.complex128]

Scaled spectrum

Notes

  • The amplitude spectrum shows the RMS value of each frequency in the signal.

  • Integrating the power spectral density over the frequency spectrum delivers the total energy contained in the signal (Parseval’s theorem).
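
The first note can be verified with a small example: for a one-sided spectrum, multiplying the unscaled magnitude by sqrt(2) / sum(window) turns each non-DC, non-Nyquist bin into the RMS value of the corresponding sinusoid. The sketch below uses a naive DFT written for this purpose and covers only this one scaling; the members of SpectrumScaling are not listed above, so no correspondence to a specific member is implied.

```python
import cmath
import math


def rfft_unscaled(x):
    """Naive one-sided DFT without normalization, for illustration only."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n // 2 + 1)]


def amplitude_spectrum_rms(spectrum, window):
    """Hypothetical sketch: one-sided RMS amplitude spectrum."""
    gain = sum(window)                      # equals N for a boxcar window
    scaled = [abs(v) * math.sqrt(2.0) / gain for v in spectrum]
    scaled[0] /= math.sqrt(2.0)             # DC has no mirrored bin
    if len(window) % 2 == 0:
        scaled[-1] /= math.sqrt(2.0)        # neither does the Nyquist bin
    return scaled


n = 8
x = [2.0 * math.cos(2.0 * math.pi * t / n) for t in range(n)]  # amplitude 2 at bin 1
rms = amplitude_spectrum_rms(rfft_unscaled(x), [1.0] * n)
print(rms[1], 2.0 / math.sqrt(2.0))
```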

dsptoolbox.tools.time_smoothing(x: ndarray[tuple[Any, ...], dtype[float64]], sampling_rate_hz: int, ascending_time_s: float, descending_time_s: float | None = None) ndarray[tuple[Any, ...], dtype[float64]]

Smoothing for a time series with independent ascending and descending times using an exponential moving average. It works on 1D and 2D arrays. The smoothing is always applied along the longest axis.

If no descending time is provided, ascending_time_s is used for both increasing and decreasing values.

Parameters:
xNDArray[np.float64]

Vector to apply smoothing to.

sampling_rate_hzint

Sampling rate of the time series x.

ascending_time_sfloat

Corresponds to the needed time for achieving a 95% accuracy of the step response when the samples are increasing in value. Pass 0. in order to avoid any smoothing for rising values.

descending_time_sfloat, None, optional

As ascending_time_s but for descending values. If None, ascending_time_s is applied. Default: None.

Returns:
NDArray[np.float64]

Smoothed time series.
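
An exponential moving average with separate rise and fall constants can be sketched for a 1D sequence as follows. The smoothing factor uses the 95% step-response criterion mentioned above; the initial state (first input sample) and the treatment of equal consecutive values as "descending" are assumptions of this example.

```python
def _alpha(time_s, sampling_rate_hz, accuracy=0.95):
    # A time of 0 disables smoothing (alpha = 1)
    n = time_s * sampling_rate_hz
    return 1.0 if n <= 0 else 1.0 - (1.0 - accuracy) ** (1.0 / n)


def smooth(x, sampling_rate_hz, ascending_time_s, descending_time_s=None):
    """Hypothetical 1D sketch of asymmetric exponential smoothing."""
    if descending_time_s is None:
        descending_time_s = ascending_time_s
    up = _alpha(ascending_time_s, sampling_rate_hz)
    down = _alpha(descending_time_s, sampling_rate_hz)
    y = [float(x[0])]                        # assumed initial state
    for value in x[1:]:
        a = up if value > y[-1] else down    # pick the factor by direction
        y.append(a * value + (1.0 - a) * y[-1])
    return y


print(smooth([0, 1, 1, 0], sampling_rate_hz=1, ascending_time_s=0.0, descending_time_s=1.0))
```

In this call rising values pass through unsmoothed, while the final drop to 0 only decays to about 5% of the previous level after one sample, as expected for a descending time of one sample period.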

dsptoolbox.tools.to_db(x: ndarray[tuple[Any, ...], dtype[_ScalarT]], amplitude_input: bool, dynamic_range_db: float | None = None, min_value: float | None = 2.2250738585072014e-308) ndarray[tuple[Any, ...], dtype[float64]]

Convert to dB from amplitude or power representation. Clipping small values can be activated in order to avoid -inf dB outcomes.

Parameters:
xNDArray

Array to convert to dB.

amplitude_inputbool

Set to True if the values in x are in their linear form. False means they have been already squared, i.e., they are in their power form.

dynamic_range_dbfloat, None, optional

If specified, a dynamic range in dB for the vector is applied by finding its largest value and clipping to max - dynamic_range_db. This will always overwrite min_value if specified. Pass None to ignore. Default: None.

min_valuefloat, None, optional

Minimum value to clip x before converting into dB in order to avoid np.nan or -np.inf in the output. Pass None to ignore. Default: np.finfo(np.float64).smallest_normal.

Returns:
NDArray[np.float64]

New array or float in dB.
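
The interplay of dynamic_range_db and min_value can be pictured with a list-based sketch. It uses 20*log10 for amplitude and 10*log10 for power input; taking the absolute value before the logarithm and the function name are assumptions. Unlike numpy, math.log10 raises an error for 0, which is why the unclipped branch is marked accordingly.

```python
import math


def to_db_sketch(x, amplitude_input, dynamic_range_db=None,
                 min_value=2.2250738585072014e-308):
    """Hypothetical list-based sketch of the dB conversion with clipping."""
    factor = 20.0 if amplitude_input else 10.0
    floor = min_value
    if dynamic_range_db is not None:
        # dynamic_range_db takes precedence over min_value
        floor = max(abs(v) for v in x) * 10.0 ** (-dynamic_range_db / factor)
    if floor is None:
        return [factor * math.log10(abs(v)) for v in x]   # fails for zeros
    return [factor * math.log10(max(abs(v), floor)) for v in x]


print(to_db_sketch([1.0, 0.1, 0.0], amplitude_input=True, dynamic_range_db=40.0))
```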

dsptoolbox.tools.warp_frequency(freqs_hz: ndarray[tuple[Any, ...], dtype[float64]], sampling_rate_hz: int, warping_factor: float)

Warp a frequency vector as shown in [1].

Parameters:
freqs_hzNDArray[np.float64]

Frequency vector to warp.

sampling_rate_hzint

Sampling rate to assume during warping.

warping_factorfloat

Warping factor. It must lie in the open interval ]-1, 1[.

Notes

  • The formula presented in [1] has been modified with a negative sign for lambda in order to match the warping formulation used in this Python package.

  • Negative lambda values increase the resolution for lower frequencies, while positive values expand higher frequencies.

References

  • [1]: Germán Ramos, José J. López, Basilio Pueo. Cascaded warped-FIR and FIR filter structure for loudspeaker equalization with low computational cost requirements. Digital Signal Processing, Volume 19, Issue 3, 2009, Pages 393-409, ISSN 1051-2004, https://doi.org/10.1016/j.dsp.2008.01.003.

dsptoolbox.tools.wrap_phase(phase_vector: ndarray[tuple[Any, ...], dtype[float64]]) ndarray[tuple[Any, ...], dtype[float64]]

Wraps phase between [-np.pi, np.pi[ after it has been unwrapped. This works for 1D and 2D arrays, more dimensions have not been tested.

Parameters:
phase_vectorNDArray[np.float64]

Phase vector for which to wrap the phase.

Returns:
NDArray[np.float64]

Wrapped phase vector.
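
Wrapping to the half-open interval [-π, π[ is commonly written as a shifted modulo operation. The list-based sketch below shows that formulation; it is not necessarily how wrap_phase is implemented.

```python
import math


def wrap(phase):
    """Hypothetical sketch: wrap each phase value to [-pi, pi[."""
    return [(p + math.pi) % (2.0 * math.pi) - math.pi for p in phase]


wrapped = wrap([0.0, 2.5 * math.pi, -2.5 * math.pi, -math.pi])
print(wrapped)
```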