search.selection

The selection package handles candidate selection for peak groups using various algorithms and strategies.

alphadia.search.selection.selection

Main candidate selection implementation for DIA data analysis.

alphadia.search.selection.config_df

Configuration DataFrames for selection parameters.

alphadia.search.selection.fft

Fast Fourier Transform operations for signal processing.

alphadia.search.selection.kernel

Kernel-based operations for candidate selection.

alphadia.search.selection.utils

Utility functions for candidate selection operations.

Main candidate selection implementation for DIA data analysis.

class alphadia.search.selection.selection.CandidateSelection(dia_data: TimsTOFTranspose | AlphaRawBase | MzML | Sciex | Thermo, precursors_flat: DataFrame, fragments_flat: DataFrame, config: CandidateSelectionConfig, rt_column: str, mobility_column: str, precursor_mz_column: str, fragment_mz_column: str, fwhm_rt: float = 5.0, fwhm_mobility: float = 0.012)

Bases: object

__init__(dia_data: TimsTOFTranspose | AlphaRawBase | MzML | Sciex | Thermo, precursors_flat: DataFrame, fragments_flat: DataFrame, config: CandidateSelectionConfig, rt_column: str, mobility_column: str, precursor_mz_column: str, fragment_mz_column: str, fwhm_rt: float = 5.0, fwhm_mobility: float = 0.012) → None

Select candidates for MS2 extraction based on MS1 features.

Parameters:
  • dia_data (DiaData) – dia data object

  • precursors_flat (pd.DataFrame) – flattened precursor dataframe

  • fragments_flat (pd.DataFrame) – flattened fragment dataframe

  • config (CandidateSelectionConfig) – config object

  • rt_column (str) – name of the rt column in the precursor dataframe

  • mobility_column (str) – name of the mobility column in the precursor dataframe

  • precursor_mz_column (str) – name of the precursor mz column in the precursor dataframe

  • fragment_mz_column (str) – name of the fragment mz column in the fragment dataframe

  • fwhm_rt (float, optional) – full width at half maximum in RT dimension for the GaussianKernel, by default 5.0

  • fwhm_mobility (float, optional) – full width at half maximum in mobility dimension for the GaussianKernel, by default 0.012

Configuration DataFrames for selection parameters.

class alphadia.search.selection.config_df.CandidateContainer(*args, **kwargs)

Bases: CandidateContainer

class_type = jitclass.CandidateContainer#75e5dd9f7d10<precursor_idx:array(uint32, 1d, C),rank:array(uint8, 1d, C),score:array(float32, 1d, C),scan_center:array(uint32, 1d, C),scan_start:array(uint32, 1d, C),scan_stop:array(uint32, 1d, C),frame_center:array(uint32, 1d, C),frame_start:array(uint32, 1d, C),frame_stop:array(uint32, 1d, C)>

class alphadia.search.selection.config_df.CandidateSelectionConfig

Bases: JITConfig

__init__()

Base class for creating numba compatible config objects.

validate()

Validates the config object. Note that this class is not meant to be instantiated. Classes inheriting from JITConfig must implement their own validate method.

class alphadia.search.selection.config_df.CandidateSelectionConfigJIT(*args, **kwargs)

Bases: CandidateSelectionConfigJIT

Numba compatible config object for the HybridCandidate class. Please see the documentation of the CandidateSelectionConfig class for more information on the parameters and their default values.

candidate_count: int64
center_fraction: float64
class_type = jitclass.CandidateSelectionConfigJIT#75e5dd9f4fd0<rt_tolerance:float64,precursor_mz_tolerance:float64,fragment_mz_tolerance:float64,mobility_tolerance:float64,isotope_tolerance:float64,peak_len_rt:float64,sigma_scale_rt:float64,peak_len_mobility:float64,sigma_scale_mobility:float64,candidate_count:int64,top_k_precursors:int64,top_k_fragments:int64,exclude_shared_ions:bool,kernel_size:int64,f_mobility:float64,f_rt:float64,center_fraction:float64,min_size_mobility:int64,min_size_rt:int64,max_size_mobility:int64,max_size_rt:int64,group_channels:bool,use_weighted_score:bool,join_close_candidates:bool,join_close_candidates_scan_threshold:float64,join_close_candidates_cycle_threshold:float64,feature_std:array(float64, 1d, C),feature_mean:array(float64, 1d, C),feature_weight:array(float64, 1d, C)>
exclude_shared_ions: bool
f_mobility: float64
f_rt: float64
feature_mean: Array(float64, 1, 'C', False, aligned=True)
feature_std: Array(float64, 1, 'C', False, aligned=True)
feature_weight: Array(float64, 1, 'C', False, aligned=True)
fragment_mz_tolerance: float64
group_channels: bool
isotope_tolerance: float64
join_close_candidates: bool
join_close_candidates_cycle_threshold: float64
join_close_candidates_scan_threshold: float64
kernel_size: int64
max_size_mobility: int64
max_size_rt: int64
min_size_mobility: int64
min_size_rt: int64
mobility_tolerance: float64
peak_len_mobility: float64
peak_len_rt: float64
precursor_mz_tolerance: float64
rt_tolerance: float64
sigma_scale_mobility: float64
sigma_scale_rt: float64
top_k_fragments: int64
top_k_precursors: int64
use_weighted_score: bool

class alphadia.search.selection.config_df.PrecursorFlatContainer(*args, **kwargs)

Bases: PrecursorFlatContainer

candidate_start_idx: Array(uint32, 1, 'C', False, aligned=True)
candidate_stop_idx: Array(uint32, 1, 'C', False, aligned=True)
charge: Array(uint8, 1, 'C', False, aligned=True)
class_type = jitclass.PrecursorFlatContainer#75e5dd9f5750<precursor_idx:array(uint32, 1d, C),frag_start_idx:array(uint32, 1d, C),frag_stop_idx:array(uint32, 1d, C),candidate_start_idx:array(uint32, 1d, C),candidate_stop_idx:array(uint32, 1d, C),charge:array(uint8, 1d, C),rt:array(float32, 1d, C),mobility:array(float32, 1d, C),mz:array(float32, 1d, C),isotopes:array(float32, 2d, C)>
frag_start_idx: Array(uint32, 1, 'C', False, aligned=True)
frag_stop_idx: Array(uint32, 1, 'C', False, aligned=True)
isotopes: Array(float32, 2, 'C', False, aligned=True)
mobility: Array(float32, 1, 'C', False, aligned=True)
mz: Array(float32, 1, 'C', False, aligned=True)
precursor_idx: Array(uint32, 1, 'C', False, aligned=True)
rt: Array(float32, 1, 'C', False, aligned=True)

alphadia.search.selection.config_df.candidate_container_to_df(candidate_container: CandidateContainer) → DataFrame

Convert a CandidateContainer to pd.DataFrame.

Fast Fourier Transform operations for signal processing.

exception alphadia.search.selection.fft.NumbaContextOnly

Bases: Exception

alphadia.search.selection.fft.convolve_fourier(dense, kernel)

Numba helper function to apply a gaussian filter to a 2d or 3d dense matrix.

Parameters:
  • dense (np.ndarray) – Array of shape (…, n_scans, n_frames)

  • kernel (np.ndarray) – Array of shape (i, j)

Returns:

Array of shape (…, n_scans, n_frames) containing the filtered dense stack.

Return type:

np.ndarray
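The following is a rough NumPy sketch of Fourier-domain smoothing over the last two axes, added for illustration only. It is not the alphadia implementation, which is a Numba helper. The function name `convolve_fourier_sketch`, the zero-padding to the full linear-convolution size, and the centered cropping back to the input shape are all assumptions.

```python
import numpy as np


def convolve_fourier_sketch(dense: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Smooth the last two axes of `dense` with `kernel` via the FFT.

    Hypothetical stand-in, not the alphadia function: zero-pads to the full
    linear-convolution size, multiplies in the Fourier domain, and crops the
    result back to the input shape.
    """
    n_scans, n_frames = dense.shape[-2:]
    k_scans, k_frames = kernel.shape

    # Padding to the full linear-convolution size avoids circular wrap-around.
    full_shape = (n_scans + k_scans - 1, n_frames + k_frames - 1)

    dense_f = np.fft.rfft2(dense, s=full_shape, axes=(-2, -1))
    kernel_f = np.fft.rfft2(kernel, s=full_shape, axes=(-2, -1))
    full = np.fft.irfft2(dense_f * kernel_f, s=full_shape, axes=(-2, -1))

    # Crop the centered region so the output matches the input shape.
    start_s = (k_scans - 1) // 2
    start_f = (k_frames - 1) // 2
    return full[..., start_s:start_s + n_scans, start_f:start_f + n_frames]


# A single spike convolved with a normalized kernel reproduces the kernel.
dense = np.zeros((2, 9, 9))
dense[:, 4, 4] = 1.0
kernel = np.ones((3, 3)) / 9.0
smoothed = convolve_fourier_sketch(dense, kernel)
```

The kernel spectrum broadcasts over any leading axes of `dense`, which is why the same call covers both the 2d and the 3d case.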

Kernel-based operations for candidate selection.

class alphadia.search.selection.kernel.GaussianKernel(dia_data: TimsTOFTransposeJIT | AlphaRawJIT, fwhm_rt: float = 10.0, sigma_scale_rt: float = 1.0, fwhm_mobility: float = 0.03, sigma_scale_mobility: float = 1.0, kernel_height: int = 30, kernel_width: int = 30)

Bases: object

__init__(dia_data: TimsTOFTransposeJIT | AlphaRawJIT, fwhm_rt: float = 10.0, sigma_scale_rt: float = 1.0, fwhm_mobility: float = 0.03, sigma_scale_mobility: float = 1.0, kernel_height: int = 30, kernel_width: int = 30)

Create a two-dimensional gaussian filter kernel for the RT and mobility dimensions of a DIA dataset. First, the observed standard deviation is scaled by a linear factor. Second, the standard deviation is scaled by the resolution of the respective dimension.

As a result, sigma_scale is independent of both the resolution of the data and the FWHM of the peaks.

Parameters:
  • dia_data (DiaDataJIT) – dia_data jit object.

  • fwhm_rt (float) – Full width at half maximum in RT dimension of the peaks in the spectrum.

  • sigma_scale_rt (float) – Scaling factor for the standard deviation in RT dimension.

  • fwhm_mobility (float) – Full width at half maximum in mobility dimension of the peaks in the spectrum.

  • sigma_scale_mobility (float) – Scaling factor for the standard deviation in mobility dimension.

  • kernel_size (int) – Kernel shape in pixels. The kernel will be a square of size (kernel_size, kernel_size). Should be even and will be rounded up to the next even number if necessary.
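The two scaling steps described for GaussianKernel can be sketched as plain arithmetic. This is an illustration under assumptions, not the library code: the helper name `sigma_in_pixels` is hypothetical, and "scaled by the resolution" is read here as dividing by the step size of the dimension (cycle length in seconds for RT, 1/K_0 step for mobility). Only the FWHM-to-sigma relation for a Gaussian is a general fact.

```python
import math

# For a Gaussian, FWHM = 2 * sqrt(2 * ln 2) * sigma, roughly 2.355 * sigma.
FWHM_TO_SIGMA = 1.0 / (2.0 * math.sqrt(2.0 * math.log(2.0)))


def sigma_in_pixels(fwhm: float, sigma_scale: float, resolution: float) -> float:
    """Hypothetical helper: convert a FWHM in physical units to a sigma in pixels."""
    sigma = fwhm * FWHM_TO_SIGMA  # observed standard deviation
    sigma *= sigma_scale          # step 1: linear scaling factor
    return sigma / resolution     # step 2: express in units of the data resolution (assumed division)


# Made-up example values: a 10 s FWHM sampled with a 2 s cycle.
rt_sigma = sigma_in_pixels(fwhm=10.0, sigma_scale=1.0, resolution=2.0)
```

Under this reading, the same sigma_scale gives a comparably shaped kernel regardless of peak width or sampling rate, since both are normalized out before the factor is applied.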

determine_mobility_sigma(mobility_resolution: float)

Determine the standard deviation of the gaussian kernel in mobility dimension. The standard deviation will be scaled to the resolution of the raw data.

Parameters:

mobility_resolution (float) – Resolution of the mobility dimension in 1/K_0.

Returns:

Standard deviation of the gaussian kernel in mobility dimension scaled to the resolution of the raw data.

Return type:

float

determine_rt_sigma(cycle_length_seconds: float)

Determine the standard deviation of the gaussian kernel in RT dimension. The standard deviation will be scaled to the resolution of the raw data.

Parameters:

cycle_length_seconds (float) – Cycle length of the duty cycle in seconds.

Returns:

Standard deviation of the gaussian kernel in RT dimension scaled to the resolution of the raw data.

Return type:

float

static gaussian_kernel_2d(size_x: int, size_y: int, sigma_x: float, sigma_y: float)

Create a 2D gaussian kernel with a given size and standard deviation.

Parameters:
  • size (int) – Width and height of the kernel matrix.

  • sigma_x (float) – Standard deviation of the gaussian kernel in x direction. This will correspond to the RT dimension.

  • sigma_y (float) – Standard deviation of the gaussian kernel in y direction. This will correspond to the mobility dimension.

Returns:

weights – 2D gaussian kernel matrix of shape (size, size).

Return type:

np.ndarray, dtype=np.float32
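As an illustration of what such a kernel looks like, here is a minimal NumPy sketch that builds an axis-aligned 2D Gaussian as an outer product of two 1D Gaussians. It is not the alphadia implementation. The function name `gaussian_kernel_2d_sketch`, the row and column layout, and the normalization to unit sum are assumptions.

```python
import numpy as np


def gaussian_kernel_2d_sketch(
    size_x: int, size_y: int, sigma_x: float, sigma_y: float
) -> np.ndarray:
    """Hypothetical axis-aligned 2D Gaussian kernel, normalized to sum to 1."""
    # Pixel coordinates centered on zero along each axis.
    x = np.arange(size_x) - (size_x - 1) / 2.0
    y = np.arange(size_y) - (size_y - 1) / 2.0

    gx = np.exp(-0.5 * (x / sigma_x) ** 2)
    gy = np.exp(-0.5 * (y / sigma_y) ** 2)

    # Rows follow y (mobility), columns follow x (RT); this layout is an assumption.
    weights = np.outer(gy, gx)
    return (weights / weights.sum()).astype(np.float32)


kernel = gaussian_kernel_2d_sketch(size_x=30, size_y=20, sigma_x=4.0, sigma_y=2.0)
```

With a diagonal covariance the 2D density factorizes, so the outer product gives the same shape as evaluating a multivariate normal on the pixel grid.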

get_dense_matrix(verbose: bool = True) → ndarray

Calculate the gaussian kernel for the given data set and parameters.

Parameters:

verbose (bool) – If True, log information about the data set and the kernel.

Returns:

Two-dimensional gaussian kernel.

Return type:

np.ndarray

alphadia.search.selection.kernel.multivariate_normal(x: ndarray, mu: ndarray, sigma: ndarray)

Probability density function of the multivariate normal distribution.

This is most likely a very inefficient implementation of the multivariate normal distribution. It is only used for creating the gaussian kernel and will only be called a few times for small kernels.

Parameters:
  • x (np.ndarray) – (N, D,)

  • mu (np.ndarray) – (1, D,)

  • sigma (np.ndarray) – (D, D,)

Returns:

array of shape (N,) with the density at each point

Return type:

np.ndarray, float32
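For reference, the density with the documented shapes can be written directly in NumPy. This is a sketch of the textbook formula, not the alphadia code; the function name `multivariate_normal_sketch` is hypothetical.

```python
import numpy as np


def multivariate_normal_sketch(
    x: np.ndarray, mu: np.ndarray, sigma: np.ndarray
) -> np.ndarray:
    """Density of N(mu, sigma) at each row of x.

    Shapes follow the documented ones: x (N, D), mu (1, D), sigma (D, D).
    """
    d = x.shape[1]
    diff = x - mu  # (N, D), mu broadcasts over rows
    sigma_inv = np.linalg.inv(sigma)

    # Squared Mahalanobis distance of each row from the mean.
    mahalanobis = np.einsum("nd,de,ne->n", diff, sigma_inv, diff)

    norm = np.sqrt((2.0 * np.pi) ** d * np.linalg.det(sigma))
    return (np.exp(-0.5 * mahalanobis) / norm).astype(np.float32)


points = np.array([[0.0, 0.0], [1.0, 0.0]])
density = multivariate_normal_sketch(points, np.zeros((1, 2)), np.eye(2))
```

For the standard 2D normal, the density at the origin is 1 / (2π), and one unit away it drops by a factor of exp(-0.5).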

Utility functions for candidate selection operations.

alphadia.search.selection.utils.amean1(array)

alphadia.search.selection.utils.assemble_isotope_mz(mono_mz, charge, isotope_intensity)

Assemble the isotope m/z values from the precursor m/z and the isotope offsets.
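The underlying arithmetic can be sketched as follows. This is an assumption about the general idea rather than the library's code: the helper `isotope_mz_sketch` is hypothetical, it takes an isotope count instead of the `isotope_intensity` argument of the real function, and it assumes isotope peaks are spaced by the 13C/12C mass difference divided by the charge.

```python
# Approximate mass difference between 13C and 12C in Da.
ISOTOPE_MASS_DIFF = 1.0033548


def isotope_mz_sketch(mono_mz: float, charge: int, n_isotopes: int) -> list[float]:
    """Hypothetical helper: m/z of the first `n_isotopes` isotope peaks."""
    # Each additional neutron shifts the m/z by the mass difference over the charge.
    return [mono_mz + i * ISOTOPE_MASS_DIFF / charge for i in range(n_isotopes)]


# Made-up example: a doubly charged precursor at m/z 500.
mz_values = isotope_mz_sketch(mono_mz=500.0, charge=2, n_isotopes=3)
```

For a charge of 2, consecutive isotope peaks in this sketch sit roughly 0.5 m/z apart.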

alphadia.search.selection.utils.astd1(array)

alphadia.search.selection.utils.find_peaks_1d(a: ndarray, top_n: int = 3) → tuple[ndarray, ndarray, ndarray]

Accepts a dense representation and returns the top_n highest peaks (three by default).

alphadia.search.selection.utils.find_peaks_2d(a: ndarray, top_n: int = 3) → tuple[ndarray, ndarray, ndarray]

Accepts a dense representation and returns the top_n highest peaks (three by default).
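To make the idea of top-n peak picking concrete, here is a small 1D NumPy sketch. It is illustrative only: the function name `top_peaks_1d_sketch`, the local-maximum criterion, and the two-array return value are assumptions and do not mirror the three-array return of the functions documented here.

```python
import numpy as np


def top_peaks_1d_sketch(a: np.ndarray, top_n: int = 3) -> tuple[np.ndarray, np.ndarray]:
    """Hypothetical helper: indices and heights of the highest local maxima."""
    # An interior point is a peak if it exceeds its left neighbor
    # and is at least as large as its right neighbor.
    interior = a[1:-1]
    is_peak = (interior > a[:-2]) & (interior >= a[2:])
    peak_idx = np.flatnonzero(is_peak) + 1

    # Sort peaks by height, highest first, and keep at most top_n of them.
    order = np.argsort(a[peak_idx])[::-1][:top_n]
    peak_idx = peak_idx[order]
    return peak_idx, a[peak_idx]


signal = np.array([0.0, 2.0, 1.0, 5.0, 1.0, 3.0, 0.0, 4.0, 0.0])
idx, height = top_peaks_1d_sketch(signal, top_n=3)
```

A 2D variant would apply the same idea with a neighborhood comparison along both the scan and frame axes.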

alphadia.search.selection.utils.slice_manual(inst, slices)

alphadia.search.selection.utils.symetric_limits_2d(a, scan_center, dia_cycle_center, f_mobility=0.95, f_rt=0.95, center_fraction=0.01, min_size_mobility=3, max_size_mobility=20, min_size_rt=1, max_size_rt=10)

alphadia.search.selection.utils.wrap0(value, limit)

alphadia.search.selection.utils.wrap1(values, limit)