search.scoring¶
The scoring package performs scoring of peak group candidates by calculating their features.
- Main Implementation of Candidate Scoring System.
- Configuration Module for Candidate Scoring.
- Data containers for the scoring pipeline.
- Output Handling for Candidate Scoring.
- Utility functions for scoring calculations in AlphaDIA.
- Utility Functions for Candidate Scoring.
- Utility functions for feature calculations.
- Feature extraction for fragment ions.
- Location-based features for candidate scoring.
- Feature extraction for precursor ions.
- Profile-based features for elution and mobility patterns.
- Reference-based features comparing against library spectra.
Main Implementation of Candidate Scoring System.
- class alphadia.search.scoring.scoring.CandidateScoring(*, dia_data: TimsTOFTranspose | AlphaRawBase | MzML | Sciex | Thermo, precursors_flat: DataFrame, fragments_flat: DataFrame, rt_column: str, mobility_column: str, precursor_mz_column: str, fragment_mz_column: str, config: CandidateScoringConfig | None = None, quadrupole_calibration: SimpleQuadrupole | None = None)[source]¶
Bases: object
Calculate features for each precursor candidate used in scoring.
- __init__(*, dia_data: TimsTOFTranspose | AlphaRawBase | MzML | Sciex | Thermo, precursors_flat: DataFrame, fragments_flat: DataFrame, rt_column: str, mobility_column: str, precursor_mz_column: str, fragment_mz_column: str, config: CandidateScoringConfig | None = None, quadrupole_calibration: SimpleQuadrupole | None = None)[source]¶
Initialize candidate scoring step. The features can then be used for scoring, calibration and quantification.
- Parameters:
dia_data (DiaData) – DIA data object.
precursors_flat (pd.DataFrame) – A DataFrame containing precursor information. The DataFrame will be validated by using the alphadia.validation.schemas.precursors_flat schema.
fragments_flat (pd.DataFrame) – A DataFrame containing fragment information. The DataFrame will be validated by using the alphadia.validation.schemas.fragments_flat schema.
rt_column (str) – The name of the column in precursors_flat containing the RT information. Set this to rt_calibrated once the data has been calibrated.
mobility_column (str) – The name of the column in precursors_flat containing the mobility information. Set this to mobility_calibrated once the data has been calibrated.
precursor_mz_column (str) – The name of the column in precursors_flat containing the precursor m/z information. Set this to mz_calibrated once the data has been calibrated.
fragment_mz_column (str) – The name of the column in fragments_flat containing the fragment m/z information. Set this to mz_calibrated once the data has been calibrated.
config (CandidateScoringConfig, default = None) – A Numba jit compatible object containing the configuration for the candidate scoring. If None, the default configuration will be used.
quadrupole_calibration (SimpleQuadrupole, default=None) – An object containing the quadrupole calibration information. If None, an uncalibrated quadrupole will be used. The object must have a jit property which returns a Numba JIT compiled instance of the calibration function.
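The four column arguments follow the calibration state of the library. A minimal sketch of selecting them is given here. Only the `*_calibrated` names come from the parameter descriptions; the helper `scoring_column_kwargs` and the uncalibrated `*_library` names are assumptions made for illustration.

```python
def scoring_column_kwargs(calibrated: bool) -> dict[str, str]:
    """Return the column-name keyword arguments for CandidateScoring.

    The "*_calibrated" names follow the parameter descriptions.
    The "*_library" names are hypothetical placeholders for uncalibrated data.
    """
    suffix = "calibrated" if calibrated else "library"
    return {
        "rt_column": f"rt_{suffix}",
        "mobility_column": f"mobility_{suffix}",
        "precursor_mz_column": f"mz_{suffix}",
        "fragment_mz_column": f"mz_{suffix}",
    }


# The returned dict can be unpacked into the constructor call together
# with the data arguments, e.g. CandidateScoring(..., **kwargs).
kwargs = scoring_column_kwargs(calibrated=True)
print(kwargs)
```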
- assemble_fragments() FragmentContainer[source]¶
Assemble the Numba JIT compatible fragment container from a fragment dataframe.
If not present, the cardinality column will be added to the fragment dataframe and set to 1. Then the fragment dataframe is validated using the validate.fragments_flat() schema.
- Returns:
fragment_container – A Numba JIT compatible fragment container.
- Return type:
fragments.FragmentContainer
- assemble_score_group_container(candidates_df: DataFrame) ScoreGroupContainer[source]¶
Assemble the Numba JIT compatible score group container from a candidate dataframe.
If not present, the rank column will be added to the candidate dataframe. Then score groups are calculated using the calculate_score_groups() function. If CandidateScoringConfig.score_grouped is set, all channels will be grouped into a single score group. Otherwise, each channel will be scored separately.
The candidate dataframe is validated using the validate.candidates() schema.
- Parameters:
candidates_df (pd.DataFrame) – A DataFrame containing the candidates.
- Returns:
score_group_container – A Numba JIT compatible score group container.
- Return type:
ScoreGroupContainer
- collect_candidates(candidates_df: DataFrame, psm_proto_df: OutputPsmDF, feature_columns: list[str] | None = None, candidate_columns: list[str] | None = None, precursor_df_columns: list[str] | None = None) DataFrame[source]¶
Collect the features from the score group container and return a DataFrame.
- Parameters:
candidates_df (pd.DataFrame) – A DataFrame containing the features for each candidate.
psm_proto_df (OutputPsmDF) – A Numba JIT compatible OutputPsmDF object containing the features for each candidate.
feature_columns (list[str], default=None) – The columns to use for the features. If None, the DEFAULT_FEATURE_COLUMNS will be used.
candidate_columns (list[str], default=None) – The columns to use for the candidates. If None, the DEFAULT_CANDIDATE_COLUMNS will be used.
precursor_df_columns (list[str], default=None) – The columns to use for the precursor DataFrame. If None, the DEFAULT_PRECURSOR_COLUMNS will be used.
- Returns:
candidates_psm_df – A DataFrame containing the features for each candidate.
- Return type:
pd.DataFrame
- collect_fragments(candidates_df: DataFrame, psm_proto_df) DataFrame[source]¶
Collect the fragment-level features from the score group container and return a DataFrame.
- Parameters:
candidates_df (pd.DataFrame) – A DataFrame containing the features for each candidate.
psm_proto_df (OutputPsmDF) – A Numba JIT compatible OutputPsmDF object containing the collected fragment data.
- Returns:
fragment_psm_df – A DataFrame containing the features for each fragment.
- Return type:
pd.DataFrame
- property config: CandidateScoringConfig¶
Get the configuration object.
- property dia_data: TimsTOFTranspose | AlphaRawBase | MzML | Sciex | Thermo¶
Get the raw mass spec data as a DiaData object.
- property fragments_flat_df: DataFrame¶
Get the DataFrame containing fragment information.
- static merge_candidate_data(df: DataFrame, candidates_df: DataFrame, candidate_columns: list[str] | None = None)[source]¶
Merge candidate_columns from candidates_df into df.
- static merge_precursor_data(df: DataFrame, precursors_flat_df: DataFrame, rt_column: str, mobility_column: str, precursor_mz_column: str, precursor_df_columns: list[str] | None = None)[source]¶
Merge rt_column, mobility_column, precursor_mz_column, precursor_df_columns from precursors_flat_df into df.
- property precursors_flat_df: DataFrame¶
Get the DataFrame containing precursor information.
- property quadrupole_calibration: SimpleQuadrupole¶
Get the quadrupole calibration object.
Configuration Module for Candidate Scoring.
- class alphadia.search.scoring.config.CandidateScoringConfig[source]¶
Bases: JITConfig
Config object for CandidateScoring.
- property collect_fragments: bool¶
Collect fragment features. Default: collect_fragments = False
- property exclude_shared_ions: bool¶
When multiplexing is used, some fragments are shared by the same peptide with different labels. This setting removes fragments that are shared by more than one channel. Default: exclude_shared_ions = True
- property experimental_xic: bool¶
Use experimental XIC features. Default: experimental_xic = False
- property fragment_mz_tolerance: float¶
The fragment m/z tolerance in ppm. Default: fragment_mz_tolerance = 15
- property precursor_mz_tolerance: float¶
The precursor m/z tolerance in ppm. Default: precursor_mz_tolerance = 10
- property quant_all: bool¶
Quantify all fragments in the quantification window. Default: quant_all = False
- property quant_window: int¶
The quantification window size in cycles. The area is calculated from scan_center - quant_window to scan_center + quant_window. Default: quant_window = 3
- property reference_channel: int¶
When multiplexing is used, a reference channel can be defined for calculating reference-channel-dependent features. The channel is identified by its value in the channel column of the precursor dataframe. If set to -1, no reference channel is used. Default: reference_channel = -1
- property score_grouped: bool¶
When multiplexing is used, some grouped features are calculated taking into account all channels. Default: score_grouped = False
- property top_k_fragments: int¶
The number of fragments to consider for scoring. The top_k_fragments most intense fragments are used. Default: top_k_fragments = 12
- property top_k_isotopes: int¶
The number of precursor isotopes to consider for scoring. The first top_k_isotopes most intense isotopes are used. Default: top_k_isotopes = 4
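The tolerance and window settings above translate into concrete bounds. The following is a minimal sketch, assuming the ppm tolerance is applied symmetrically around the library m/z and that the quantification window is clipped to the available cycles. `ppm_window` and `quant_bounds` are hypothetical helpers and not part of alphadia.

```python
def ppm_window(mz: float, tolerance_ppm: float) -> tuple[float, float]:
    """Lower and upper m/z bound for a symmetric tolerance given in ppm."""
    delta = mz * tolerance_ppm * 1e-6
    return mz - delta, mz + delta


def quant_bounds(scan_center: int, quant_window: int, n_cycles: int) -> tuple[int, int]:
    """Inclusive range scan_center +/- quant_window, clipped to [0, n_cycles - 1]."""
    start = max(scan_center - quant_window, 0)
    stop = min(scan_center + quant_window, n_cycles - 1)
    return start, stop


# 15 ppm is the documented default for fragment_mz_tolerance.
low, high = ppm_window(500.0, 15)
print(f"{low:.4f} - {high:.4f}")

# 3 is the documented default for quant_window.
print(quant_bounds(scan_center=10, quant_window=3, n_cycles=12))
```

At 500 m/z a 15 ppm tolerance corresponds to 0.0075 m/z on either side, so the absolute window widens with increasing m/z.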
- class alphadia.search.scoring.config.CandidateScoringConfigJIT(*args, **kwargs)[source]¶
Bases: CandidateScoringConfigJIT
- class_type = jitclass.CandidateScoringConfigJIT#75e511313f90<collect_fragments:bool,score_grouped:bool,exclude_shared_ions:bool,top_k_fragments:uint32,top_k_isotopes:uint32,reference_channel:int16,quant_window:uint32,quant_all:bool,precursor_mz_tolerance:float32,fragment_mz_tolerance:float32,experimental_xic:bool>¶
Data containers for the scoring pipeline.
This module provides specialized container classes for organizing and managing scoring data, including candidate information and score group structures.
Output Handling for Candidate Scoring.
- class alphadia.search.scoring.output.OutputPsmDF(*args, **kwargs)[source]¶
Bases: OutputPsmDF
- class_type = jitclass.OutputPsmDF#75e51185d310<valid:array(bool, 1d, C),precursor_idx:array(uint32, 1d, C),rank:array(uint8, 1d, C),features:array(float32, 2d, C),fragment_precursor_idx:array(uint32, 2d, C),fragment_rank:array(uint8, 2d, C),fragment_mz_library:array(float32, 2d, C),fragment_mz:array(float32, 2d, C),fragment_mz_observed:array(float32, 2d, C),fragment_height:array(float32, 2d, C),fragment_intensity:array(float32, 2d, C),fragment_mass_error:array(float32, 2d, C),fragment_correlation:array(float32, 2d, C),fragment_position:array(uint8, 2d, C),fragment_number:array(uint8, 2d, C),fragment_type:array(uint8, 2d, C),fragment_charge:array(uint8, 2d, C),fragment_loss_type:array(uint8, 2d, C)>¶
Utility functions for scoring calculations in AlphaDIA.
This module provides numba-accelerated utility functions for various scoring calculations, including correlation coefficients, profile normalization, and statistical operations.
- alphadia.search.scoring.scoring_utils.correlation_coefficient(x: ndarray, ys: ndarray) ndarray[source]¶
Calculate the correlation coefficient between x and each y in ys.
Returns a numpy array of the same length as ys.
- Parameters:
x (np.ndarray[float32, ndim=1]) – Base array of shape (n,)
ys (np.ndarray[float32, ndim=2]) – Array of shape (m, n) containing arrays to correlate with x
- Returns:
Array of shape (m,) containing correlation coefficients. Returns 0 for cases where either x or y has zero variance.
- Return type:
np.ndarray[float32, ndim=1]
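The zero-variance rule can be illustrated with a pure-Python sketch. It assumes the Pearson correlation is meant and uses nested lists in place of float32 arrays; it is not the Numba implementation itself.

```python
import math


def correlation_coefficient(x: list[float], ys: list[list[float]]) -> list[float]:
    """Pearson correlation of x with every row of ys; 0.0 on zero variance."""
    n = len(x)
    mean_x = sum(x) / n
    dx = [v - mean_x for v in x]
    var_x = sum(d * d for d in dx)

    result = []
    for y in ys:
        mean_y = sum(y) / n
        dy = [v - mean_y for v in y]
        var_y = sum(d * d for d in dy)
        if var_x == 0.0 or var_y == 0.0:
            # A flat profile has no defined correlation; report 0 instead of NaN.
            result.append(0.0)
            continue
        cov = sum(a * b for a, b in zip(dx, dy))
        result.append(cov / math.sqrt(var_x * var_y))
    return result


x = [1.0, 2.0, 3.0]
ys = [
    [2.0, 4.0, 6.0],  # rises with x
    [3.0, 2.0, 1.0],  # falls with x
    [5.0, 5.0, 5.0],  # flat, zero variance
]
print(correlation_coefficient(x, ys))
```

Returning 0 rather than NaN keeps downstream feature arrays free of missing values when a fragment profile is constant.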
- alphadia.search.scoring.scoring_utils.median_axis(array: ndarray, axis: int = 0) ndarray[source]¶
Calculate the median along a specified axis.
- Parameters:
array (np.ndarray[float32, ndim=2]) – Input array
axis (int, optional) – Axis along which to calculate median. Default is 0.
- Returns:
Array of medians along the specified axis
- Return type:
np.ndarray[float32, ndim=1]
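Numba's `np.median` accepts only the array argument, which is presumably why a dedicated helper exists for the axis-wise case. The sketch below shows the intended semantics on nested lists using only the standard library; it is an illustration, not the Numba code.

```python
from statistics import median


def median_axis(array: list[list[float]], axis: int = 0) -> list[float]:
    """Median of a 2D nested list along axis 0 (per column) or axis 1 (per row)."""
    if axis == 0:
        # zip(*array) iterates over the columns of the nested list.
        return [median(column) for column in zip(*array)]
    if axis == 1:
        return [median(row) for row in array]
    raise ValueError("axis must be 0 or 1 for a 2D array")


data = [
    [1.0, 5.0],
    [3.0, 7.0],
    [2.0, 9.0],
]
print(median_axis(data, axis=0))  # one value per column
print(median_axis(data, axis=1))  # one value per row
```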
- alphadia.search.scoring.scoring_utils.normalize_profiles(intensity_slice: ndarray, center_dilations: int = 1) ndarray[source]¶
Calculate normalized intensity profiles from dense array.
- Parameters:
intensity_slice (np.ndarray[float32, ndim=2]) – Array where first dimension represents different measurements, and subsequent dimensions represent mz and rt
center_dilations (int, optional) – Number of points to consider around center for normalization. Default is 1.
- Returns:
Array of normalized intensity profiles with same shape as input, where profiles with zero center intensity are set to zero
- Return type:
np.ndarray[float32, ndim=2]
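One way to read this description is sketched below. It assumes that the center is the middle index of each profile and that normalization divides the profile by the mean intensity of the `center_dilations` points on either side of it; both are assumptions, and the actual implementation may differ.

```python
def normalize_profiles(
    intensity_slice: list[list[float]], center_dilations: int = 1
) -> list[list[float]]:
    """Scale each profile by the mean intensity around its center point."""
    n_points = len(intensity_slice[0])
    center = n_points // 2
    start = max(center - center_dilations, 0)
    stop = min(center + center_dilations + 1, n_points)

    normalized = []
    for profile in intensity_slice:
        window = profile[start:stop]
        center_intensity = sum(window) / len(window)
        if center_intensity == 0:
            # Nothing to normalize against: the whole profile is set to zero.
            normalized.append([0.0] * n_points)
        else:
            normalized.append([value / center_intensity for value in profile])
    return normalized


profiles = [
    [0.0, 2.0, 4.0, 2.0, 0.0],  # signal at the center
    [1.0, 0.0, 0.0, 0.0, 1.0],  # no signal at the center
]
for row in normalize_profiles(profiles):
    print(row)
```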
Utility Functions for Candidate Scoring.
- alphadia.search.scoring.utils.calculate_score_groups(input_df: DataFrame, group_channels: bool = False)[source]¶
Calculate score groups for DIA multiplexing.
On the candidate selection level, score groups are used to group ions across channels. On the scoring level, score groups are used to group channels of the same precursor and rank together.
This function makes sure that all precursors within a score group have the same elution_group_idx, decoy status and rank if available. If group_channels is True, different channels of the same precursor will be grouped together.
- Parameters:
input_df (pandas.DataFrame) – Precursor dataframe. Must contain columns ‘elution_group_idx’ and ‘decoy’. Can contain ‘rank’ column.
group_channels (bool) – If True, precursors from the same elution group will be grouped together while separating different ranks and decoy status.
- Returns:
score_groups – Updated precursor dataframe with score_group_idx column.
- Return type:
pandas.DataFrame
Example
A precursor with the same elution_group_idx might be grouped with other precursors if only the channel is different. Different rank and decoy status will always lead to different score groups.
| elution_group_idx | rank | decoy | channel | group_channels = False | group_channels = True |
|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 0 |
| 0 | 0 | 0 | 4 | 1 | 0 |
| 0 | 1 | 0 | 0 | 2 | 1 |
| 0 | 1 | 1 | 0 | 3 | 2 |
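The grouping rule can be sketched without pandas. The version below uses a list of dicts in place of the precursor dataframe and reproduces the score group indices of the example above. The order of the key fields and the numbering by sorted key are assumptions.

```python
def calculate_score_groups(rows: list[dict], group_channels: bool = False) -> list[int]:
    """Return a score_group_idx for every row (a dict standing in for a DataFrame row)."""

    def key(row: dict) -> tuple:
        base = (row["elution_group_idx"], row["decoy"], row.get("rank", 0))
        # Without channel grouping, every channel forms its own score group.
        return base if group_channels else base + (row["channel"],)

    # Number the distinct keys in sorted order so the indices are deterministic.
    index_of = {k: i for i, k in enumerate(sorted({key(r) for r in rows}))}
    return [index_of[key(r)] for r in rows]


rows = [
    {"elution_group_idx": 0, "rank": 0, "decoy": 0, "channel": 0},
    {"elution_group_idx": 0, "rank": 0, "decoy": 0, "channel": 4},
    {"elution_group_idx": 0, "rank": 1, "decoy": 0, "channel": 0},
    {"elution_group_idx": 0, "rank": 1, "decoy": 1, "channel": 0},
]
print(calculate_score_groups(rows, group_channels=False))
print(calculate_score_groups(rows, group_channels=True))
```

With `group_channels=True` the first two rows differ only in channel and therefore share a score group, while rank and decoy status still separate the remaining rows.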
- alphadia.search.scoring.utils.candidate_features_to_candidates(candidate_features_df: DataFrame, optional_columns: list[str] | None = None)[source]¶
Create candidates_df from candidate_features_df.
- Parameters:
candidate_features_df (pd.DataFrame) – candidate_features_df
- Returns:
candidate_df – candidates_df
- Return type:
pd.DataFrame
- alphadia.search.scoring.utils.fragment_correlation(fragments_profile)[source]¶
Calculates a safe correlation matrix for a given fragment profile.
- Parameters:
fragments_profile (np.ndarray) – array of shape (n_fragments, n_observations, n_data_points)
- Returns:
array of shape (n_observations, n_fragments, n_fragments)
- Return type:
np.ndarray
- alphadia.search.scoring.utils.fragment_correlation_different(x: ndarray, y: ndarray)[source]¶
Calculates a safe correlation matrix between two fragment profiles.
- Parameters:
x (np.ndarray) – array of shape (n_fragments, n_observations, n_data_points)
y (np.ndarray) – array of shape (n_fragments, n_observations, n_data_points)
- Returns:
output – array of shape (n_observations, n_fragments_x, n_fragments_y)
- Return type:
np.ndarray
- alphadia.search.scoring.utils.merge_missing_columns(left_df: DataFrame, right_df: DataFrame, right_columns: list, on: list = None, how: str = 'left')[source]¶
Merge missing columns from right_df into left_df.
Merging is performed only for columns not yet present in left_df.
- Parameters:
left_df (pandas.DataFrame) – Left dataframe
right_df (pandas.DataFrame) – Right dataframe
right_columns (list) – List of columns to merge from right_df into left_df
on (list, optional) – List of columns to merge on, by default None
how (str, optional) – How to merge, by default ‘left’
- Returns:
Merged left dataframe
- Return type:
pandas.DataFrame
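The "only missing columns" rule is the part that differs from a plain merge. A minimal sketch on lists of dicts follows, assuming the key columns in `on` are unique in the right-hand data and covering only the `how='left'` case.

```python
def merge_missing_columns(
    left_rows: list[dict], right_rows: list[dict], right_columns: list[str], on: list[str]
) -> list[dict]:
    """Left-merge only those right_columns that the left rows do not have yet."""
    existing = set(left_rows[0]) if left_rows else set()
    missing = [column for column in right_columns if column not in existing]
    if not missing:
        return [dict(row) for row in left_rows]

    # Index the right rows by their key so each left row needs one lookup.
    lookup = {tuple(row[k] for k in on): row for row in right_rows}

    merged = []
    for row in left_rows:
        match = lookup.get(tuple(row[k] for k in on))
        new_row = dict(row)
        for column in missing:
            # Unmatched left rows keep a placeholder, as in a left join.
            new_row[column] = match[column] if match is not None else None
        merged.append(new_row)
    return merged


left = [{"precursor_idx": 0, "rt": 1.0}, {"precursor_idx": 1, "rt": 2.0}]
right = [{"precursor_idx": 0, "rt": 99.0, "decoy": 0}]
print(merge_missing_columns(left, right, ["rt", "decoy"], on=["precursor_idx"]))
```

The `rt` values of the left rows are kept because that column already exists; only `decoy` is taken from the right-hand data.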
- alphadia.search.scoring.utils.multiplex_candidates(candidates_df: DataFrame, precursors_flat_df: DataFrame, remove_decoys: bool = True, channels: list[int] | None = None)[source]¶
Takes a candidates dataframe and a precursors dataframe and returns a multiplexed candidates dataframe. All original candidates will be retained. For missing candidates, the best scoring candidate in the elution group will be used and multiplexed across all missing channels.
- Parameters:
candidates_df (pd.DataFrame) – Candidates dataframe as returned by hybridselection.CandidateSelection
precursors_flat_df (pd.DataFrame) – Precursors dataframe
remove_decoys (bool, optional) – If True, remove decoys from the precursors dataframe, by default True
channels (List[int], optional) – List of channels to include in the multiplexed candidates dataframe, by default [0,4,8,12]
- Returns:
Multiplexed candidates dataframe
- Return type:
pd.DataFrame
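The fill-in step can be sketched on plain dicts. This is a simplified illustration: the `score` field used to pick the best candidate is a hypothetical column name, and the lookup of channel-specific precursors in precursors_flat_df as well as decoy removal are left out.

```python
def multiplex_candidates(candidates: list[dict], channels: list[int]) -> list[dict]:
    """Fill missing channels of each elution group from its best-scoring candidate."""
    by_group: dict[int, list[dict]] = {}
    for candidate in candidates:
        by_group.setdefault(candidate["elution_group_idx"], []).append(candidate)

    # All original candidates are retained.
    result = [dict(candidate) for candidate in candidates]
    for group in by_group.values():
        best = max(group, key=lambda c: c["score"])  # "score" is a hypothetical field
        present = {c["channel"] for c in group}
        for channel in channels:
            if channel not in present:
                # Reuse the best candidate for the channel that has none.
                result.append({**best, "channel": channel})
    return result


candidates = [
    {"elution_group_idx": 0, "channel": 0, "score": 0.2},
    {"elution_group_idx": 0, "channel": 4, "score": 0.9},
]
for row in multiplex_candidates(candidates, channels=[0, 4, 8]):
    print(row)
```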
- alphadia.search.scoring.utils.save_corrcoeff(x: array, y: array)[source]¶
Safe way to calculate the correlation coefficient between two one-dimensional arrays.
- Parameters:
x (np.array) – One-dimensional array of shape (n,)
y (np.array) – One-dimensional array of shape (n,)
- Returns:
Correlation coefficient between x and y
- Return type:
float
The quadrupole module contains a quadrupole calibration for a DIA dataset.
- class alphadia.search.scoring.quadrupole.SimpleQuadrupole(cycle)[source]¶
Bases: object
- __init__(cycle)[source]¶
Wrapper for fitting the quadrupole transfer efficiency.
- Parameters:
cycle (np.ndarray) – The DIA cycle as defined in the Bruker file
- Properties:
jit (SimpleQuadrupoleJit) – Jitclass for predicting quadrupole transfer efficiency.
- fit(P, S, X, y)[source]¶
Fit the quadrupole transfer efficiency.
- Parameters:
P (np.ndarray) – Precursor index for N datapoints
S (np.ndarray) – Scan index for N datapoints
X (np.ndarray) – m/z value for N datapoints
y (np.ndarray) – Quadrupole transfer efficiency for N datapoints
- Returns:
self – Fitted SimpleQuadrupole object
- Return type:
SimpleQuadrupole
- get_calibrated_cycle(treshold=0.01)[source]¶
Calculate an updated cycle based on the fitted quadrupole transfer efficiency and the threshold given by treshold.
- class alphadia.search.scoring.quadrupole.SimpleQuadrupoleJit(*args, **kwargs)[source]¶
Bases: SimpleQuadrupoleJit
- class_type = jitclass.SimpleQuadrupoleJit#75e51135b0d0<cycle:array(float64, 4d, C),cycle_calibrated:array(float64, 4d, C),dia_mz_cycle_calibrated:array(float64, 2d, C),sigma:array(float64, 1d, C),delta_mu:array(float64, 1d, C)>¶
- alphadia.search.scoring.quadrupole.calculate_template_single(qtf, dense_precursor_mz, isotope_intensity)[source]¶
- alphadia.search.scoring.quadrupole.logistic(x: array, mu: float, sigma: float)[source]¶
Numba implementation of the logistic function
- Parameters:
x (np.array) – Input array of shape (n_samples,)
mu (float) – Mean of the logistic function
sigma (float) – Standard deviation of the logistic function
- Returns:
Logistic function evaluated for every element in x of shape (n_samples,)
- Return type:
np.array
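For orientation, a pure-Python sketch of a logistic curve with these parameters is shown below. It assumes the standard form 1 / (1 + exp(-(x - mu) / sigma)), in which sigma acts as the scale of the transition; the Numba implementation may be parameterized differently.

```python
import math


def logistic(x: list[float], mu: float, sigma: float) -> list[float]:
    """Evaluate 1 / (1 + exp(-(x - mu) / sigma)) for every element of x."""
    return [1.0 / (1.0 + math.exp(-(value - mu) / sigma)) for value in x]


# The curve passes through 0.5 at x == mu and rises towards 1 for larger x.
values = logistic([498.0, 500.0, 502.0], mu=500.0, sigma=1.0)
print([round(v, 3) for v in values])
```

A smaller sigma gives a sharper edge, which is the behaviour one would want when modelling the boundary of a quadrupole isolation window.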
- alphadia.search.scoring.quadrupole.quadrupole_transfer_function_single(quadrupole_calibration_jit, observation_indices, scan_indices, isotope_mz)[source]¶
Calculate quadrupole transfer function for a given set of observations and scans.
- Parameters:
quadrupole_calibration_jit (alphadia.search.scoring.quadrupole.SimpleQuadrupoleJit) – Quadrupole calibration jit object
observation_indices (np.ndarray) – Array of observation indices, shape (n_observations,)
scan_indices (np.ndarray) – Array of scan indices, shape (n_scans,)
isotope_mz (np.ndarray) – Array of precursor isotope m/z values, shape (n_isotopes,)
- Returns:
intensity – Array of predicted intensity values, shape (n_isotopes, n_observations, n_scans)
- Return type:
np.ndarray
Features¶
Utility functions for feature calculations.
- alphadia.search.scoring.features.features_utils.cosine_similarity_a1(template_intensity, fragments_intensity)[source]¶
- alphadia.search.scoring.features.features_utils.weighted_center_mean(single_dense_representation, scan_center, frame_center)[source]¶
- alphadia.search.scoring.features.features_utils.weighted_center_mean_2d(dense_representation, scan_center, frame_center)[source]¶
Feature extraction for fragment ions.
- alphadia.search.scoring.features.fragment_features.center_envelope_1d(x: ndarray)[source]¶
Applies an interference correction envelope to a collection of 1D arrays. Numba function which operates in place.
- Parameters:
x (np.ndarray) – Array of shape (a, b) where a is the number of arrays and b is the length of each array. It is mandatory that dimension b is odd.
- alphadia.search.scoring.features.fragment_features.fragment_features(dense_fragments: ndarray, fragments_frame_profile: ndarray, frame_rt: ndarray, observation_importance: ndarray, template: ndarray, fragments: ndarray, feature_array: Array(float32, 1, 'C', False, aligned=True), quant_window: uint32 = 3, quant_all: bool = False)[source]¶
- alphadia.search.scoring.features.fragment_features.fragment_mobility_correlation(fragments_scan_profile, template_scan_profile, observation_importance, fragment_intensity)[source]¶
- alphadia.search.scoring.features.fragment_features.weighted_center_of_mass(single_dense_representation)[source]¶
- alphadia.search.scoring.features.fragment_features.weighted_center_of_mass_1d(dense_representation)[source]¶
- alphadia.search.scoring.features.fragment_features.weighted_mean_a1(array, weight_mask)[source]¶
Takes an array of shape (a, b) and a mask of shape (a, b) and returns an array of shape (a,) where each element is the weighted mean of the corresponding masked row in the array.
- Parameters:
array (np.ndarray) – array of shape (a, b)
weight_mask (np.ndarray) – array of shape (a, b)
- Returns:
array of shape (a,)
- Return type:
np.ndarray
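The row-wise weighted mean can be sketched on nested lists. The handling of rows whose weights sum to zero (returning 0.0) is an assumption, since the description does not state it.

```python
def weighted_mean_a1(array: list[list[float]], weight_mask: list[list[float]]) -> list[float]:
    """Weighted mean of every row of array, using the matching row of weight_mask."""
    means = []
    for row, weights in zip(array, weight_mask):
        total_weight = sum(weights)
        if total_weight == 0:
            # Fully masked row: report 0.0 instead of dividing by zero (assumed behaviour).
            means.append(0.0)
        else:
            means.append(sum(v * w for v, w in zip(row, weights)) / total_weight)
    return means


array = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
mask = [[1.0, 0.0, 1.0], [0.0, 0.0, 0.0]]
print(weighted_mean_a1(array, mask))
```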
Location-based features for candidate scoring.
- alphadia.search.scoring.features.location_features.location_features(jit_data, scan_start, scan_stop, scan_center, frame_start, frame_stop, frame_center, feature_array)[source]¶
Feature extraction for precursor ions.
- alphadia.search.scoring.features.precursor_features.precursor_features(isotope_mz: ndarray, isotope_intensity: ndarray, dense_precursors: ndarray, observation_importance, template: ndarray, feature_array: ndarray)[source]¶
Profile-based features for elution and mobility patterns.
- alphadia.search.scoring.features.profile_features.profile_features(dia_data, fragment_intensity, fragment_type, observation_importance, fragments_scan_profile, fragments_frame_profile, template_scan_profile, template_frame_profile, scan_start, scan_stop, frame_start, frame_stop, feature_array, experimental_xic)[source]¶
Reference-based features comparing against library spectra.
- alphadia.search.scoring.features.reference_features.reference_features(reference_observation_importance, reference_fragments_scan_profile, reference_fragments_frame_profile, reference_template_scan_profile, reference_template_frame_profile, observation_importance, fragments_scan_profile, fragments_frame_profile, template_scan_profile, template_frame_profile, fragment_lib_intensity)[source]¶