search.scoring

The scoring package scores peak group candidates by calculating features for each candidate.

`alphadia.search.scoring.scoring`	Main Implementation of Candidate Scoring System.
`alphadia.search.scoring.config`	Configuration Module for Candidate Scoring.
`alphadia.search.scoring.containers`	Data containers for the scoring pipeline.
`alphadia.search.scoring.output`	Output Handling for Candidate Scoring.
`alphadia.search.scoring.scoring_utils`	Utility functions for scoring calculations in AlphaDIA.
`alphadia.search.scoring.utils`	Utility Functions for Candidate Scoring.
`alphadia.search.scoring.features.features_utils`	Utility functions for feature calculations.
`alphadia.search.scoring.features.fragment_features`	Feature extraction for fragment ions.
`alphadia.search.scoring.features.location_features`	Location-based features for candidate scoring.
`alphadia.search.scoring.features.precursor_features`	Feature extraction for precursor ions.
`alphadia.search.scoring.features.profile_features`	Profile-based features for elution and mobility patterns.
`alphadia.search.scoring.features.reference_features`	Reference-based features comparing against library spectra.

Main Implementation of Candidate Scoring System.

class alphadia.search.scoring.scoring.CandidateScoring(*, dia_data: TimsTOFTranspose | AlphaRawBase | MzML | Sciex | Thermo, precursors_flat: DataFrame, fragments_flat: DataFrame, rt_column: str, mobility_column: str, precursor_mz_column: str, fragment_mz_column: str, config: CandidateScoringConfig | None = None, quadrupole_calibration: SimpleQuadrupole | None = None)

Bases: object

Calculate features for each precursor candidate used in scoring.

__init__(*, dia_data: TimsTOFTranspose | AlphaRawBase | MzML | Sciex | Thermo, precursors_flat: DataFrame, fragments_flat: DataFrame, rt_column: str, mobility_column: str, precursor_mz_column: str, fragment_mz_column: str, config: CandidateScoringConfig | None = None, quadrupole_calibration: SimpleQuadrupole | None = None)

Initialize the candidate scoring step. The resulting features can then be used for scoring, calibration and quantification.

Parameters:

dia_data (DiaData) – DIA data object.
precursors_flat (pd.DataFrame) – A DataFrame containing precursor information. The DataFrame will be validated by using the alphadia.validation.schemas.precursors_flat schema.
fragments_flat (pd.DataFrame) – A DataFrame containing fragment information. The DataFrame will be validated by using the alphadia.validation.schemas.fragments_flat schema.
rt_column (str) – The name of the column in precursors_flat containing the RT information. Set this to rt_calibrated if the data has been calibrated.
mobility_column (str) – The name of the column in precursors_flat containing the mobility information. Set this to mobility_calibrated if the data has been calibrated.
precursor_mz_column (str) – The name of the column in precursors_flat containing the precursor m/z information. Set this to mz_calibrated if the data has been calibrated.
fragment_mz_column (str) – The name of the column in fragments_flat containing the fragment m/z information. Set this to mz_calibrated if the data has been calibrated.
config (CandidateScoringConfig, default=None) – A Numba JIT compatible object containing the configuration for candidate scoring. If None, the default configuration is used.
quadrupole_calibration (SimpleQuadrupole, default=None) – An object containing the quadrupole calibration information. If None, an uncalibrated quadrupole is used. The object must provide a jit attribute that returns a Numba JIT compiled instance of the calibration function.

assemble_fragments() → FragmentContainer

Assemble the Numba JIT compatible fragment container from a fragment dataframe.

If not present, the cardinality column will be added to the fragment dataframe and set to 1. Then the fragment dataframe is validated using the validate.fragments_flat() schema.

Returns: fragment_container – A Numba JIT compatible fragment container.
Return type: fragments.FragmentContainer

assemble_score_group_container(candidates_df: DataFrame) → ScoreGroupContainer

Assemble the Numba JIT compatible score group container from a candidate dataframe.

If not present, the rank column will be added to the candidate dataframe. Score groups are then calculated using the calculate_score_groups() function. If CandidateScoringConfig.score_grouped is True, all channels will be grouped into a single score group; otherwise, each channel is scored separately.

The candidate dataframe is validated using the validate.candidates() schema.

Parameters: candidates_df (pd.DataFrame) – A DataFrame containing the candidates.
Returns: score_group_container – A Numba JIT compatible score group container.
Return type: ScoreGroupContainer

collect_candidates(candidates_df: DataFrame, psm_proto_df: OutputPsmDF, feature_columns: list[str] | None = None, candidate_columns: list[str] | None = None, precursor_df_columns: list[str] | None = None) → DataFrame

Collect the features from the score group container and return a DataFrame.

Parameters:

candidates_df (pd.DataFrame) – A DataFrame containing the features for each candidate.
psm_proto_df (OutputPsmDF) – A Numba JIT compatible OutputPsmDF object containing the features for each candidate.
feature_columns (list[str], default=None) – The columns to use for the features. If None, the DEFAULT_FEATURE_COLUMNS will be used.
candidate_columns (list[str], default=None) – The columns to use for the candidates. If None, the DEFAULT_CANDIDATE_COLUMNS will be used.
precursor_df_columns (list[str], default=None) – The columns to use for the precursor DataFrame. If None, the DEFAULT_PRECURSOR_COLUMNS will be used.

Returns:

candidates_psm_df – A DataFrame containing the features for each candidate.

Return type:

pd.DataFrame

collect_fragments(candidates_df: DataFrame, psm_proto_df) → DataFrame

Collect the fragment-level features from the score group container and return a DataFrame.

Parameters:

candidates_df (pd.DataFrame) – A DataFrame containing the features for each candidate.
psm_proto_df (OutputPsmDF) – A Numba JIT compatible OutputPsmDF object containing the fragment-level features for each candidate.

Returns:

fragment_psm_df – A DataFrame containing the features for each fragment.

Return type:

pd.DataFrame

property config: CandidateScoringConfig: Get the configuration object.

property dia_data: TimsTOFTranspose | AlphaRawBase | MzML | Sciex | Thermo: Get the raw mass spec data as a DiaData object.

property fragments_flat_df: DataFrame: Get the DataFrame containing fragment information.

static merge_candidate_data(df: DataFrame, candidates_df: DataFrame, candidate_columns: list[str] | None = None): Merge candidate_columns from candidates_df into df.

static merge_precursor_data(df: DataFrame, precursors_flat_df: DataFrame, rt_column: str, mobility_column: str, precursor_mz_column: str, precursor_df_columns: list[str] | None = None): Merge rt_column, mobility_column, precursor_mz_column and precursor_df_columns from precursors_flat_df into df.

property precursors_flat_df: DataFrame: Get the DataFrame containing precursor information.

property quadrupole_calibration: SimpleQuadrupole: Get the quadrupole calibration object.

Configuration Module for Candidate Scoring.

class alphadia.search.scoring.config.CandidateScoringConfig

Bases: JITConfig

Config object for CandidateScoring.

__init__(): Create the default config for CandidateScoring.

property collect_fragments: bool: Collect fragment features. Default: collect_fragments = False

property exclude_shared_ions: bool: When multiplexing is used, some fragments are shared by the same peptide across different labels. This setting removes fragments that are shared by more than one channel. Default: exclude_shared_ions = True

property experimental_xic: bool: Use experimental XIC features. Default: experimental_xic = False

property fragment_mz_tolerance: float: The fragment m/z tolerance in ppm. Default: fragment_mz_tolerance = 15

property precursor_mz_tolerance: float: The precursor m/z tolerance in ppm. Default: precursor_mz_tolerance = 10

property quant_all: bool: Quantify all fragments in the quantification window. Default: quant_all = False

property quant_window: int: The quantification window size in cycles. The area is calculated from scan_center - quant_window to scan_center + quant_window. Default: quant_window = 3

property reference_channel: int: When multiplexing is used, a reference channel can be defined for calculating reference-channel-dependent features. The channel is identified as defined in the channel column of the precursor dataframe. If set to -1, no reference channel is used. Default: reference_channel = -1

property score_grouped: bool: When multiplexing is used, some grouped features are calculated by taking all channels into account. Default: score_grouped = False

property top_k_fragments: int: The number of fragments to consider for scoring. The top_k_fragments most intense fragments are used. Default: top_k_fragments = 12

property top_k_isotopes: int: The number of precursor isotopes to consider for scoring. The first top_k_isotopes most intense isotopes are used. Default: top_k_isotopes = 4

validate(): Validate all properties of the config object. Should be called whenever a property is changed.
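The quant_window setting describes a symmetric integration window around the center of the peak. The following minimal sketch illustrates that arithmetic with NumPy, assuming the window is inclusive at both ends and clipped at the profile boundaries. quant_window_area is a hypothetical helper, not part of alphadia, and the real implementation may treat the edges differently.

```python
import numpy as np


def quant_window_area(profile: np.ndarray, center: int, quant_window: int = 3) -> float:
    """Hypothetical helper: sum intensities from center - quant_window
    to center + quant_window (inclusive), clipped to the profile bounds."""
    start = max(center - quant_window, 0)
    # +1 makes the upper bound inclusive; min() clips at the end of the profile
    stop = min(center + quant_window + 1, profile.shape[0])
    return float(profile[start:stop].sum())


# Usage: an elution profile with its apex at index 4
profile = np.array([0, 1, 2, 5, 9, 5, 2, 1, 0, 0], dtype=np.float32)
print(quant_window_area(profile, center=4))  # 1 + 2 + 5 + 9 + 5 + 2 + 1 = 25.0
```

With the default quant_window = 3, the window spans 2 * 3 + 1 = 7 points when it does not touch the profile edges.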

class alphadia.search.scoring.config.CandidateScoringConfigJIT(*args, **kwargs)

Bases: CandidateScoringConfigJIT

class_type = jitclass.CandidateScoringConfigJIT#75e511313f90<collect_fragments:bool,score_grouped:bool,exclude_shared_ions:bool,top_k_fragments:uint32,top_k_isotopes:uint32,reference_channel:int16,quant_window:uint32,quant_all:bool,precursor_mz_tolerance:float32,fragment_mz_tolerance:float32,experimental_xic:bool>

Data containers for the scoring pipeline.

This module provides specialized container classes for organizing and managing scoring data, including candidate information and score group structures.

Output Handling for Candidate Scoring.

class alphadia.search.scoring.output.OutputPsmDF(*args, **kwargs)

Bases: OutputPsmDF

class_type = jitclass.OutputPsmDF#75e51185d310<valid:array(bool, 1d, C),precursor_idx:array(uint32, 1d, C),rank:array(uint8, 1d, C),features:array(float32, 2d, C),fragment_precursor_idx:array(uint32, 2d, C),fragment_rank:array(uint8, 2d, C),fragment_mz_library:array(float32, 2d, C),fragment_mz:array(float32, 2d, C),fragment_mz_observed:array(float32, 2d, C),fragment_height:array(float32, 2d, C),fragment_intensity:array(float32, 2d, C),fragment_mass_error:array(float32, 2d, C),fragment_correlation:array(float32, 2d, C),fragment_position:array(uint8, 2d, C),fragment_number:array(uint8, 2d, C),fragment_type:array(uint8, 2d, C),fragment_charge:array(uint8, 2d, C),fragment_loss_type:array(uint8, 2d, C)>

Utility functions for scoring calculations in AlphaDIA.

This module provides numba-accelerated utility functions for various scoring calculations, including correlation coefficients, profile normalization, and statistical operations.

alphadia.search.scoring.scoring_utils.correlation_coefficient(x: ndarray, ys: ndarray) → ndarray

Calculate the correlation coefficient between x and each y in ys.

Returns a numpy array of the same length as ys.

Parameters:

x (np.ndarray[float32, ndim=1]) – Base array of shape (n,)
ys (np.ndarray[float32, ndim=2]) – Array of shape (m, n) containing arrays to correlate with x

Returns:

Array of shape (m,) containing correlation coefficients. Returns 0 for cases where either x or y has zero variance.

Return type:

np.ndarray[float32, ndim=1]
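To illustrate the zero-variance rule, here is a minimal NumPy sketch, assuming the coefficient is Pearson's r. correlation_coefficient_sketch is a hypothetical name; the alphadia function is Numba-compiled and may differ in detail.

```python
import numpy as np


def correlation_coefficient_sketch(x: np.ndarray, ys: np.ndarray) -> np.ndarray:
    """Pearson correlation of x (n,) with each row of ys (m, n); 0 where variance is zero."""
    # Center x and every row of ys
    x_c = x - x.mean()
    ys_c = ys - ys.mean(axis=1, keepdims=True)
    # Covariance numerators (m,) and products of the vector norms (m,)
    num = ys_c @ x_c
    denom = np.sqrt((x_c ** 2).sum() * (ys_c ** 2).sum(axis=1))
    out = np.zeros(ys.shape[0], dtype=np.float32)
    nonzero = denom > 0
    out[nonzero] = num[nonzero] / denom[nonzero]
    return out


x = np.array([1, 2, 3, 4], dtype=np.float32)
ys = np.array([[2, 4, 6, 8], [4, 3, 2, 1], [5, 5, 5, 5]], dtype=np.float32)
# Rows: perfectly correlated, anti-correlated, constant (zero variance -> 0)
result = correlation_coefficient_sketch(x, ys)
```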

alphadia.search.scoring.scoring_utils.median_axis(array: ndarray, axis: int = 0) → ndarray

Calculate the median along a specified axis.

Parameters:

array (np.ndarray[float32, ndim=2]) – Input array
axis (int, optional) – Axis along which to calculate median. Default is 0.

Returns:

Array of medians along the specified axis

Return type:

np.ndarray[float32, ndim=1]

alphadia.search.scoring.scoring_utils.normalize_profiles(intensity_slice: ndarray, center_dilations: int = 1) → ndarray

Calculate normalized intensity profiles from a dense array.

Parameters:

intensity_slice (np.ndarray[float32, ndim=2]) – Array where the first dimension represents different measurements and the subsequent dimensions represent mz and rt.
center_dilations (int, optional) – Number of points to consider around the center for normalization. Default is 1.

Returns:

Array of normalized intensity profiles with the same shape as the input, where profiles with zero center intensity are set to zero.

Return type:

np.ndarray[float32, ndim=2]
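A minimal sketch of one way such a normalization can work, assuming a 2D input of shape (n_profiles, n_points), that the center is the middle point of each profile, and that each profile is divided by its mean intensity within center_dilations points of the center. normalize_profiles_sketch is hypothetical and is not alphadia's implementation.

```python
import numpy as np


def normalize_profiles_sketch(intensity_slice: np.ndarray, center_dilations: int = 1) -> np.ndarray:
    """Divide each row by its mean intensity around the center; rows with zero center intensity become 0."""
    n_points = intensity_slice.shape[1]
    center = n_points // 2
    lo = max(center - center_dilations, 0)
    hi = min(center + center_dilations + 1, n_points)
    # Mean intensity in the center window of each profile, shape (n_profiles,)
    center_intensity = intensity_slice[:, lo:hi].mean(axis=1)
    out = np.zeros_like(intensity_slice, dtype=np.float32)
    nonzero = center_intensity > 0
    out[nonzero] = intensity_slice[nonzero] / center_intensity[nonzero, None]
    return out


profiles = np.array([[0, 2, 4, 2, 0], [1, 0, 0, 0, 1]], dtype=np.float32)
normalized = normalize_profiles_sketch(profiles)
```

The second profile has no intensity around its center, so it is set to zero rather than divided by zero.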

Utility Functions for Candidate Scoring.

alphadia.search.scoring.utils.calculate_score_groups(input_df: DataFrame, group_channels: bool = False)

Calculate score groups for DIA multiplexing.

On the candidate selection level, score groups are used to group ions across channels. On the scoring level, score groups are used to group channels of the same precursor and rank together.

This function makes sure that all precursors within a score group have the same elution_group_idx, decoy status and rank if available. If group_channels is True, different channels of the same precursor will be grouped together.

Parameters:

input_df (pandas.DataFrame) – Precursor dataframe. Must contain columns ‘elution_group_idx’ and ‘decoy’. Can contain ‘rank’ column.
group_channels (bool) – If True, precursors from the same elution group will be grouped together while separating different ranks and decoy status.

Returns:

score_groups – Updated precursor dataframe with score_group_idx column.

Return type:

pandas.DataFrame

Example

A precursor may share a score group with other precursors of the same elution_group_idx if only the channel differs. A different rank or decoy status always leads to a different score group. In the table below, all precursors share the same elution_group_idx.

rank	decoy	channel	score_group_idx (group_channels = False)	score_group_idx (group_channels = True)
0	0	0	0	0
0	0	4	1	0
1	0	0	2	1
1	1	0	3	2
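The grouping rules in this example can be reproduced with a pandas groupby. This is a minimal sketch, assuming score groups are numbered in order of first appearance; score_groups_sketch is a hypothetical helper, not the actual calculate_score_groups implementation.

```python
import pandas as pd


def score_groups_sketch(input_df: pd.DataFrame, group_channels: bool = False) -> pd.DataFrame:
    """Assign score_group_idx so that elution group, decoy status and rank always separate groups."""
    keys = ["elution_group_idx", "decoy"]
    if "rank" in input_df.columns:
        keys.append("rank")
    # Without channel grouping, every channel gets its own score group
    if not group_channels and "channel" in input_df.columns:
        keys.append("channel")
    df = input_df.copy()
    df["score_group_idx"] = df.groupby(keys, sort=False).ngroup()
    return df


example = pd.DataFrame(
    {
        "elution_group_idx": [0, 0, 0, 0],
        "rank": [0, 0, 1, 1],
        "decoy": [0, 0, 0, 1],
        "channel": [0, 4, 0, 0],
    }
)
separate = score_groups_sketch(example, group_channels=False)
grouped = score_groups_sketch(example, group_channels=True)
```

The two results match the last two columns of the table above.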

alphadia.search.scoring.utils.candidate_features_to_candidates(candidate_features_df: DataFrame, optional_columns: list[str] | None = None)

Create candidates_df from candidate_features_df.

Parameters: candidate_features_df (pd.DataFrame) – candidate_features_df
Returns: candidate_df – candidates_df
Return type: pd.DataFrame

alphadia.search.scoring.utils.fragment_correlation(fragments_profile)

Calculates a safe correlation matrix for a given fragment profile.

Parameters: fragments_profile (np.ndarray) – array of shape (n_fragments, n_observations, n_data_points)
Returns: array of shape (n_observations, n_fragments, n_fragments)
Return type: np.ndarray
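A minimal NumPy sketch of a "safe" pairwise correlation, assuming "safe" means that pairs involving a zero-variance profile receive 0 instead of NaN. fragment_correlation_sketch is a hypothetical name; the alphadia function is Numba-compiled and may differ.

```python
import numpy as np


def fragment_correlation_sketch(fragments_profile: np.ndarray) -> np.ndarray:
    """(n_fragments, n_observations, n_data_points) -> (n_observations, n_fragments, n_fragments)."""
    n_fragments, n_observations, _ = fragments_profile.shape
    out = np.zeros((n_observations, n_fragments, n_fragments), dtype=np.float32)
    for obs in range(n_observations):
        # Profiles of all fragments for this observation, centered per fragment
        p = fragments_profile[:, obs, :].astype(np.float64)
        p = p - p.mean(axis=1, keepdims=True)
        norms = np.sqrt((p ** 2).sum(axis=1))
        cov = p @ p.T
        denom = np.outer(norms, norms)
        # Pairs involving a constant profile keep the initial value 0
        mask = denom > 0
        out[obs][mask] = cov[mask] / denom[mask]
    return out


# Three fragments, one observation, four data points each
profiles = np.array([[[1, 2, 3, 4]], [[2, 4, 6, 8]], [[1, 1, 1, 1]]], dtype=np.float32)
corr = fragment_correlation_sketch(profiles)
```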

alphadia.search.scoring.utils.fragment_correlation_different(x: ndarray, y: ndarray)

Calculates a safe correlation matrix between two sets of fragment profiles.

Parameters:

x (np.ndarray) – array of shape (n_fragments_x, n_observations, n_data_points)
y (np.ndarray) – array of shape (n_fragments_y, n_observations, n_data_points)

Returns:

output – array of shape (n_observations, n_fragments_x, n_fragments_y)

Return type:

np.ndarray

alphadia.search.scoring.utils.frame_profile_1d(x)

alphadia.search.scoring.utils.frame_profile_2d(x)

alphadia.search.scoring.utils.merge_missing_columns(left_df: DataFrame, right_df: DataFrame, right_columns: list, on: list = None, how: str = 'left')

Merge missing columns from right_df into left_df.

Merging is performed only for columns not yet present in left_df.

Parameters:

left_df (pandas.DataFrame) – Left dataframe
right_df (pandas.DataFrame) – Right dataframe
right_columns (list) – List of columns to merge from right_df into left_df
on (list, optional) – List of columns to merge on, by default None
how (str, optional) – How to merge, by default ‘left’

Returns:

Merged left dataframe

Return type:

pandas.DataFrame
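The "merge only what is missing" behavior can be sketched with pandas' DataFrame.merge. This is a minimal sketch, assuming that when on is None the frames are joined on the columns they share (pandas' own default); merge_missing_columns_sketch is hypothetical and may differ from the alphadia function.

```python
import pandas as pd


def merge_missing_columns_sketch(left_df, right_df, right_columns, on=None, how="left"):
    """Merge only those right_columns that left_df does not already contain."""
    missing = [c for c in right_columns if c not in left_df.columns]
    if not missing:
        # Nothing to add: return the left frame unchanged
        return left_df
    if on is None:
        # Mirror pandas' default: join on the columns both frames share
        on = [c for c in left_df.columns if c in right_df.columns]
    on = list(on)
    return left_df.merge(right_df[on + missing], on=on, how=how)


left = pd.DataFrame({"precursor_idx": [0, 1, 2], "rt": [1.0, 2.0, 3.0]})
right = pd.DataFrame({"precursor_idx": [0, 1, 2], "rt": [9.0, 9.0, 9.0], "decoy": [0, 0, 1]})
# "rt" already exists in left and is kept; only "decoy" is merged in
merged = merge_missing_columns_sketch(left, right, ["rt", "decoy"], on=["precursor_idx"])
```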

alphadia.search.scoring.utils.multiplex_candidates(candidates_df: DataFrame, precursors_flat_df: DataFrame, remove_decoys: bool = True, channels: list[int] | None = None)

Takes a candidates dataframe and a precursors dataframe and returns a multiplexed candidates dataframe. All original candidates are retained. For missing candidates, the best scoring candidate in the elution group is used and multiplexed across all missing channels.

Parameters:

candidates_df (pd.DataFrame) – Candidates dataframe as returned by hybridselection.CandidateSelection
precursors_flat_df (pd.DataFrame) – Precursors dataframe
remove_decoys (bool, optional) – If True, remove decoys from the precursors dataframe, by default True
channels (list[int], optional) – List of channels to include in the multiplexed candidates dataframe. If None, the default channels [0, 4, 8, 12] are used.

Returns:

Multiplexed candidates dataframe

Return type:

pd.DataFrame

alphadia.search.scoring.utils.or_envelope_1d(x)

alphadia.search.scoring.utils.or_envelope_2d(x)

alphadia.search.scoring.utils.save_corrcoeff(x: array, y: array)

A safe way to calculate the correlation coefficient between two one-dimensional arrays.

Parameters:

x (np.array) – One-dimensional array of shape (n,)
y (np.array) – One-dimensional array of shape (n,)

Returns:

Correlation coefficient between x and y

Return type:

float

alphadia.search.scoring.utils.scan_profile_1d(x)

alphadia.search.scoring.utils.scan_profile_2d(x)

alphadia.search.scoring.utils.slice(inst, slices)

alphadia.search.scoring.utils.tile(a, n)

The quadrupole module contains a quadrupole calibration for a DIA dataset.

class alphadia.search.scoring.quadrupole.SimpleQuadrupole(cycle)

Bases: object

__init__(cycle)

Wrapper for fitting the quadrupole transfer efficiency.

Parameters:

cycle (np.ndarray) – The DIA cycle as defined in the Bruker file.

Properties:

jit (SimpleQuadrupoleJit) – Jitclass for predicting quadrupole transfer efficiency.

fit(P, S, X, y)

Fit the quadrupole transfer efficiency.

Parameters:

P (np.ndarray) – Precursor index for N datapoints
S (np.ndarray) – Scan index for N datapoints
X (np.ndarray) – m/z value for N datapoints
y (np.ndarray) – Quadrupole transfer efficiency for N datapoints

Returns:

self – Fitted SimpleQuadrupole object

Return type:

SimpleQuadrupole

get_calibrated_cycle(treshold=0.01): Calculate an updated cycle based on the fitted quadrupole transfer efficiency and the treshold parameter.

get_params(deep: bool = True)

predict(P, S, X)

Predict the quadrupole transfer efficiency.

Parameters:

P (np.ndarray) – Precursor index for N datapoints
S (np.ndarray) – Scan index for N datapoints
X (np.ndarray) – m/z value for N datapoints

set_params(**params)

class alphadia.search.scoring.quadrupole.SimpleQuadrupoleJit(*args, **kwargs)

Bases: SimpleQuadrupoleJit

class_type = jitclass.SimpleQuadrupoleJit#75e51135b0d0<cycle:array(float64, 4d, C),cycle_calibrated:array(float64, 4d, C),dia_mz_cycle_calibrated:array(float64, 2d, C),sigma:array(float64, 1d, C),delta_mu:array(float64, 1d, C)>

alphadia.search.scoring.quadrupole.calculate_observation_importance_single(template)

alphadia.search.scoring.quadrupole.calculate_template_single(qtf, dense_precursor_mz, isotope_intensity)

alphadia.search.scoring.quadrupole.expand_cycle(cycle, lower_mz, upper_mz)

alphadia.search.scoring.quadrupole.logistic(x: array, mu: float, sigma: float)

Numba implementation of the logistic function.

Parameters:

x (np.array) – Input array of shape (n_samples,)
mu (float) – Mean of the logistic function
sigma (float) – Standard deviation of the logistic function

Returns:

Logistic function evaluated for every element in x of shape (n_samples,)

Return type:

np.array
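A minimal NumPy sketch of the logistic function, assuming the standard form 1 / (1 + exp(-(x - mu) / sigma)), in which sigma acts as a scale parameter. The second function is a hypothetical illustration of how a rising and a falling logistic edge can be combined into a smooth rectangular window, such as a quadrupole isolation window; it is not necessarily how logistic_rectangle is implemented.

```python
import numpy as np


def logistic_sketch(x, mu, sigma):
    """Standard logistic curve centered at mu with scale sigma (assumed form)."""
    return 1.0 / (1.0 + np.exp(-(np.asarray(x, dtype=np.float64) - mu) / sigma))


def logistic_rectangle_sketch(mu1, mu2, sigma1, sigma2, x):
    """Hypothetical smooth box: rising edge at mu1 times falling edge at mu2."""
    return logistic_sketch(x, mu1, sigma1) * (1.0 - logistic_sketch(x, mu2, sigma2))


# A 25 Th wide window from 400 to 425 m/z with soft edges
mz = np.array([380.0, 400.0, 412.5, 425.0, 445.0])
window = logistic_rectangle_sketch(400.0, 425.0, 0.5, 0.5, mz)
```

Inside the window the transmission is close to 1, it is about 0.5 at each edge, and it falls toward 0 outside.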

alphadia.search.scoring.quadrupole.logistic_rectangle(mu1, mu2, sigma1, sigma2, x)

alphadia.search.scoring.quadrupole.quadrupole_transfer_function_single(quadrupole_calibration_jit, observation_indices, scan_indices, isotope_mz)

Calculate quadrupole transfer function for a given set of observations and scans.

Parameters:

quadrupole_calibration_jit (alphadia.quadrupole.SimpleQuadrupoleJit) – Quadrupole calibration jit object
observation_indices (np.ndarray) – Array of observation indices, shape (n_observations,)
scan_indices (np.ndarray) – Array of scan indices, shape (n_scans,)
isotope_mz (np.ndarray) – Array of precursor isotope m/z values, shape (n_isotopes)

Returns:

intensity – Array of predicted intensity values, shape (n_isotopes, n_observations, n_scans)

Return type:

np.ndarray

Features

Utility functions for feature calculations.

alphadia.search.scoring.features.features_utils.cosine_similarity_a1(template_intensity, fragments_intensity)

alphadia.search.scoring.features.features_utils.weighted_center_mean(single_dense_representation, scan_center, frame_center)

alphadia.search.scoring.features.features_utils.weighted_center_mean_2d(dense_representation, scan_center, frame_center)

Feature extraction for fragment ions.

alphadia.search.scoring.features.fragment_features.center_envelope_1d(x: ndarray)

Applies an interference correction envelope to a collection of 1D arrays. Numba function which operates in place.

Parameters: x (np.ndarray) – Array of shape (a, b) where a is the number of arrays and b is the length of each array. Dimension b must be odd.

alphadia.search.scoring.features.fragment_features.fragment_features(dense_fragments: ndarray, fragments_frame_profile: ndarray, frame_rt: ndarray, observation_importance: ndarray, template: ndarray, fragments: ndarray, feature_array: Array(float32, 1, 'C', False, aligned=True), quant_window: uint32 = 3, quant_all: bool = False)

alphadia.search.scoring.features.fragment_features.fragment_mobility_correlation(fragments_scan_profile, template_scan_profile, observation_importance, fragment_intensity)

alphadia.search.scoring.features.fragment_features.weighted_center_of_mass(single_dense_representation)

alphadia.search.scoring.features.fragment_features.weighted_center_of_mass_1d(dense_representation)

alphadia.search.scoring.features.fragment_features.weighted_mean_a1(array, weight_mask)

Takes an array of shape (a, b) and a mask of shape (a, b) and returns an array of shape (a,) where each element is the weighted mean of the corresponding masked row in the array.

Parameters:

array (np.ndarray) – array of shape (a, b)
weight_mask (np.ndarray) – array of shape (a, b)

Returns:

array of shape (a)

Return type:

np.ndarray
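The behavior of weighted_mean_a1 can be sketched in NumPy as follows, assuming that rows whose weights sum to zero return 0. weighted_mean_a1_sketch is a hypothetical name, not alphadia's Numba implementation.

```python
import numpy as np


def weighted_mean_a1_sketch(array: np.ndarray, weight_mask: np.ndarray) -> np.ndarray:
    """Row-wise weighted mean of array (a, b) with weights weight_mask (a, b) -> (a,)."""
    weight_sum = weight_mask.sum(axis=1)
    weighted = (array * weight_mask).sum(axis=1)
    out = np.zeros(array.shape[0], dtype=np.float64)
    # Avoid division by zero for rows without any weight
    nonzero = weight_sum > 0
    out[nonzero] = weighted[nonzero] / weight_sum[nonzero]
    return out


values = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
mask = np.array([[1.0, 0.0, 1.0], [0.0, 0.0, 0.0]])
means = weighted_mean_a1_sketch(values, mask)
```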

Location-based features for candidate scoring.

alphadia.search.scoring.features.location_features.location_features(jit_data, scan_start, scan_stop, scan_center, frame_start, frame_stop, frame_center, feature_array)

Feature extraction for precursor ions.

alphadia.search.scoring.features.precursor_features.precursor_features(isotope_mz: ndarray, isotope_intensity: ndarray, dense_precursors: ndarray, observation_importance, template: ndarray, feature_array: ndarray)

Profile-based features for elution and mobility patterns.

alphadia.search.scoring.features.profile_features.profile_features(dia_data, fragment_intensity, fragment_type, observation_importance, fragments_scan_profile, fragments_frame_profile, template_scan_profile, template_frame_profile, scan_start, scan_stop, frame_start, frame_stop, feature_array, experimental_xic)

Reference-based features comparing against library spectra.

alphadia.search.scoring.features.reference_features.reference_features(reference_observation_importance, reference_fragments_scan_profile, reference_fragments_frame_profile, reference_template_scan_profile, reference_template_frame_profile, observation_importance, fragments_scan_profile, fragments_frame_profile, template_scan_profile, template_frame_profile, fragment_lib_intensity)