_fdrx

The _fdrx module contains experimental functionality for false discovery rate control.

This module implements a base class for semi-supervised FDR estimation using targets and decoys. It is flexible with regard to the features, the type of classifier, and the type of identifications (precursors, peptides, proteins).

class alphadia.fdr._fdrx.base.TargetDecoyFDR(classifier: BaseEstimator, feature_columns: list, decoy_column: str = 'decoy', competition_columns: list | None = None)

Bases: object

__init__(classifier: BaseEstimator, feature_columns: list, decoy_column: str = 'decoy', competition_columns: list | None = None)

Target Decoy FDR estimation using a classifier.

This class supports target decoy competition as well as fragment competition.

Parameters:
  • classifier (sklearn.base.BaseEstimator) – The classifier to use for target decoy estimation.

  • feature_columns (list) – The columns to use as features for the classifier.

  • decoy_column (str, default='decoy') – The column to use as decoy information.

  • competition_columns (list | None, default=None) – Perform target decoy competition on these columns. Only the best PSM for each group will be kept.

fit_classifier(psm_df: DataFrame)

Fit the classifier on the PSMs.

Parameters:

psm_df (pd.DataFrame) – The dataframe containing the PSMs.

fit_predict_qval(psm_df: DataFrame, fragments_df: DataFrame | None = None, cycle: ndarray | None = None)

Fit the classifier, predict the decoy probabilities and calculate q-values.

Parameters:
  • psm_df (pd.DataFrame) – The dataframe containing the PSMs.

  • fragments_df (pd.DataFrame, default=None) – The dataframe containing the fragments.

  • cycle (np.ndarray, default=None) – The DIA cycle for the fragments.

Returns:

The input dataframe with q-values and PEPs added.

Return type:

pd.DataFrame

predict_classifier(psm_df: DataFrame)

Predict the decoy probability for the PSMs.

Parameters:

psm_df (pd.DataFrame) – The dataframe containing the PSMs.

Returns:

The decoy probabilities for the PSMs, in the same order as the rows of the input dataframe.

Return type:

np.ndarray

predict_qval(psm_df: DataFrame, fragments_df: DataFrame | None = None, dia_cycle: ndarray | None = None, competition_heuristic: float = 0.1) -> DataFrame

Calculate q-values for scored identifications.

Parameters:
  • psm_df (pd.DataFrame) – The dataframe containing the PSMs.

  • fragments_df (pd.DataFrame, default=None) – The dataframe containing the fragments.

  • dia_cycle (np.ndarray, default=None) – The DIA cycle for the fragments.

  • competition_heuristic (float, default=0.10) – The q-value threshold for fragment competition. Only precursors with q-values below this threshold will be considered for fragment competition.

Returns:

The input dataframe with q-values and PEPs added.

Return type:

pd.DataFrame

alphadia.fdr._fdrx.stats.add_q_values(df: DataFrame, decoy_proba_column: str = 'decoy_proba', decoy_column: str = 'decoy', qval_column: str = 'qval', r_target_decoy: float = 1.0)

Calculates q-values for a dataframe containing PSMs.

Parameters:
  • df (pd.DataFrame) – The dataframe containing the PSMs.

  • decoy_proba_column (str, default='decoy_proba') – The name of the column containing the probability of being a decoy. Values should be between 0 and 1, with 1 being a decoy.

  • decoy_column (str, default='decoy') – The name of the column containing the decoy information. Decoys are expected to be 1 and targets 0.

  • qval_column (str, default='qval') – The name of the column to store the q-values in.

Returns:

The dataframe containing the q-values in column qval.

Return type:

pd.DataFrame
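
As a rough illustration of a target-decoy q-value calculation, the sketch below ranks PSMs by decoy probability, estimates the FDR at each rank as cumulative decoys over cumulative targets, and converts the result to q-values. The function name `sketch_add_q_values` is hypothetical. The sketch ignores `r_target_decoy`, whose role is not described above, and it returns the rows in sorted order. The actual alphadia implementation may differ on each of these points.

```python
import numpy as np
import pandas as pd


def sketch_add_q_values(df, decoy_proba_column="decoy_proba",
                        decoy_column="decoy", qval_column="qval"):
    """Hypothetical re-implementation, for illustration only."""
    # Rank PSMs from most target-like (low decoy probability) to least.
    out = df.sort_values(decoy_proba_column, kind="stable").reset_index(drop=True)
    is_decoy = out[decoy_column].to_numpy().astype(bool)

    # FDR at each rank: decoys accepted so far / targets accepted so far.
    n_decoys = np.cumsum(is_decoy)
    n_targets = np.maximum(np.cumsum(~is_decoy), 1)  # avoid division by zero
    fdr = n_decoys / n_targets

    # q-value: the lowest FDR at which this PSM would still be accepted.
    out[qval_column] = np.minimum.accumulate(fdr[::-1])[::-1]
    return out


psms = pd.DataFrame({
    "decoy_proba": [0.4, 0.1, 0.3, 0.2, 0.5],
    "decoy": [0, 0, 1, 0, 1],
})
scored = sketch_add_q_values(psms)
```

Here the two best-scoring targets receive a q-value of 0, and every PSM ranked at or after the first decoy receives a q-value above 0.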

alphadia.fdr._fdrx.stats.fdr_to_q_values(fdr_values: ndarray)

Converts FDR values to q-values. Takes an ascending sorted array of FDR values and converts them to q-values. For every element, the lowest FDR at which it would be accepted is used as its q-value.

Parameters:

fdr_values (np.ndarray) – The FDR values to convert.

Returns:

The q-values.

Return type:

np.ndarray
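
One common way to express "the lowest FDR at which an element would be accepted" is a running minimum taken from the end of the array. The sketch below assumes the FDR values are ordered from best to worst identification. The helper name `sketch_fdr_to_q_values` is hypothetical and is not necessarily how alphadia computes the result.

```python
import numpy as np


def sketch_fdr_to_q_values(fdr_values):
    """Hypothetical helper: running minimum taken from the end of the array."""
    fdr_values = np.asarray(fdr_values, dtype=float)
    # Reverse, take the cumulative minimum, then reverse back.
    return np.minimum.accumulate(fdr_values[::-1])[::-1]


# The raw FDR estimate can dip again after a run of targets.
fdr = np.array([0.0, 0.5, 0.25, 0.4, 0.3])
q_values = sketch_fdr_to_q_values(fdr)
# q_values is [0.0, 0.25, 0.25, 0.3, 0.3]
```

The resulting q-values never decrease, so accepting everything below a q-value threshold always selects a contiguous prefix of the ranked list.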

alphadia.fdr._fdrx.stats.get_pep(psm_df: DataFrame, score_column: str = 'decoy_proba', decoy_column: str = 'decoy', score_std: float = 0.01, pep_granularity: int = 1000, kernel_size: int = 20)

Implementation of a very simple nonparametric PEP estimation using a Gaussian kernel.

Parameters:
  • psm_df (pd.DataFrame) – The dataframe containing the PSMs.

  • score_column (str, default='decoy_proba') – The name of the column containing the score to use for the selection.

  • decoy_column (str, default='decoy') – The name of the column containing the decoy information.

  • score_std (float, default=0.01) – The standard deviation of the Gaussian kernel.

  • pep_granularity (int, default=1000) – The number of bins to use for the score histogram.

  • kernel_size (int, default=20) – The size of the kernel to use for the convolution.

Returns:

The PEP values with same shape and order as the input dataframe.

Return type:

np.ndarray
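
To make the idea concrete, here is a minimal sketch of a histogram-based PEP estimate, under the assumption that the PEP at a given score is approximated by the smoothed decoy density divided by the smoothed target density, clipped to [0, 1]. The function name `sketch_pep`, its array-based signature, and this particular ratio are assumptions for illustration. The actual `get_pep` may bin, smooth, or normalise differently.

```python
import numpy as np


def sketch_pep(scores, is_decoy, score_std=0.01, n_bins=1000, kernel_size=20):
    """Hypothetical PEP estimate from smoothed target and decoy histograms."""
    scores = np.asarray(scores, dtype=float)
    is_decoy = np.asarray(is_decoy, dtype=bool)

    # Histogram targets and decoys on a shared grid over [0, 1].
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    target_hist, _ = np.histogram(scores[~is_decoy], bins=edges)
    decoy_hist, _ = np.histogram(scores[is_decoy], bins=edges)

    # Gaussian kernel sampled on the bin grid and normalised to sum to 1.
    bin_width = 1.0 / n_bins
    offsets = (np.arange(kernel_size) - (kernel_size - 1) / 2) * bin_width
    kernel = np.exp(-0.5 * (offsets / score_std) ** 2)
    kernel /= kernel.sum()

    # Smooth both histograms with the same kernel.
    target_smooth = np.convolve(target_hist, kernel, mode="same")
    decoy_smooth = np.convolve(decoy_hist, kernel, mode="same")

    # Ratio of densities per bin, guarded against empty target bins.
    pep_per_bin = np.clip(decoy_smooth / np.maximum(target_smooth, 1e-12), 0.0, 1.0)

    # Look up the bin of every PSM; output keeps the input order.
    bin_idx = np.clip(np.digitize(scores, edges) - 1, 0, n_bins - 1)
    return pep_per_bin[bin_idx]


scores = np.array([0.0505] * 90 + [0.9505] * 30 + [0.9505] * 10)
is_decoy = np.array([False] * 120 + [True] * 10)
pep = sketch_pep(scores, is_decoy)
```

In this toy input, the low-scoring region contains no decoys and gets a PEP of 0, while the high-scoring region holds 10 decoys for 30 targets and gets a PEP of about 1/3.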

alphadia.fdr._fdrx.stats.keep_best(df: DataFrame, score_column: str = 'decoy_proba', group_columns: list[str] | None = None)

Keep the best PSM for each group of PSMs with the same precursor_idx. This function is used to select the best candidate PSM for each precursor. If group_columns is set to ['channel', 'elution_group_idx'], it is used for target decoy competition.

Parameters:
  • df (pd.DataFrame) – The dataframe containing the PSMs.

  • score_column (str) – The name of the column containing the score to use for the selection.

  • group_columns (list[str] | None, default=None) – The columns to use for the grouping. If None, ['channel', 'precursor_idx'] is used.

Returns:

The dataframe containing the best PSM for each group.

Return type:

pd.DataFrame
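
The selection described above can be sketched with plain pandas. The function name `sketch_keep_best` is hypothetical, and the sketch assumes that a lower `decoy_proba` is better and that `None` falls back to the documented default grouping. The toy column values are made up for illustration.

```python
import pandas as pd


def sketch_keep_best(df, score_column="decoy_proba", group_columns=None):
    """Hypothetical re-implementation, for illustration only."""
    if group_columns is None:
        # Assumed fallback, taken from the documented default.
        group_columns = ["channel", "precursor_idx"]
    # Lowest decoy probability first, then keep the first row of each group.
    best = (
        df.sort_values(score_column, kind="stable")
        .drop_duplicates(subset=group_columns, keep="first")
        .sort_index()  # restore the original row order
        .reset_index(drop=True)
    )
    return best


candidates = pd.DataFrame({
    "channel": [0, 0, 0, 0],
    "precursor_idx": [7, 7, 8, 8],
    "elution_group_idx": [3, 3, 3, 3],
    "decoy": [0, 0, 1, 1],
    "decoy_proba": [0.30, 0.05, 0.60, 0.90],
})

# Best candidate per precursor (default grouping).
best_per_precursor = sketch_keep_best(candidates)

# Target decoy competition: target and decoy share an elution group.
after_competition = sketch_keep_best(
    candidates, group_columns=["channel", "elution_group_idx"]
)
```

With the default grouping, one row survives per precursor. Grouping by elution group instead lets the target (precursor 7) and its decoy (precursor 8) compete, and only the target with the lowest decoy probability remains.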