_fdrx
The _fdrx module contains experimental functionality for false discovery rate control.
This module implements a base class for semi-supervised FDR estimation using targets and decoys. It is flexible with regard to the features, the type of classifier and the type of identifications (precursors, peptides, proteins).
- class alphadia.fdr._fdrx.base.TargetDecoyFDR(classifier: BaseEstimator, feature_columns: list, decoy_column: str = 'decoy', competition_columns: list | None = None)
Bases: object
- __init__(classifier: BaseEstimator, feature_columns: list, decoy_column: str = 'decoy', competition_columns: list | None = None)
Target Decoy FDR estimation using a classifier.
This class supports target decoy competition as well as fragment competition.
- Parameters:
classifier (sklearn.base.BaseEstimator) – The classifier to use for target decoy estimation.
feature_columns (list) – The columns to use as features for the classifier.
decoy_column (str, default='decoy') – The column to use as decoy information.
competition_columns (list | None, default=None) – Perform target decoy competition on these columns. Only the best PSM for each group will be kept.
- fit_classifier(psm_df: DataFrame)
Fit the classifier on the PSMs.
- Parameters:
psm_df (pd.DataFrame) – The dataframe containing the PSMs.
- fit_predict_qval(psm_df: DataFrame, fragments_df: DataFrame | None = None, cycle: ndarray | None = None)
Fit the classifier, predict the decoy probabilities and calculate q-values.
- Parameters:
psm_df (pd.DataFrame) – The dataframe containing the PSMs.
fragments_df (pd.DataFrame, default=None) – The dataframe containing the fragments.
cycle (np.ndarray, default=None) – The DIA cycle for the fragments.
- Returns:
The input dataframe with q-values and PEPs added.
- Return type:
pd.DataFrame
- predict_classifier(psm_df: DataFrame)
Predict the decoy probability for the PSMs.
- Parameters:
psm_df (pd.DataFrame) – The dataframe containing the PSMs.
- Returns:
The decoy probabilities for the PSMs with same shape and order as the input dataframe.
- Return type:
np.ndarray
- predict_qval(psm_df: DataFrame, fragments_df: DataFrame | None = None, dia_cycle: ndarray | None = None, competition_heuristic: float = 0.1) → DataFrame
Calculate q-values for scored identifications.
- Parameters:
psm_df (pd.DataFrame) – The dataframe containing the PSMs.
fragments_df (pd.DataFrame, default=None) – The dataframe containing the fragments.
dia_cycle (np.ndarray, default=None) – The DIA cycle for the fragments.
competition_heuristic (float, default=0.1) – The q-value threshold for fragment competition. Only precursors with q-values below this threshold will be considered for fragment competition.
- Returns:
The input dataframe with q-values and PEPs added.
- Return type:
pd.DataFrame
- alphadia.fdr._fdrx.stats.add_q_values(df: DataFrame, decoy_proba_column: str = 'decoy_proba', decoy_column: str = 'decoy', qval_column: str = 'qval', r_target_decoy: float = 1.0)
Calculates q-values for a dataframe containing PSMs.
- Parameters:
df (pd.DataFrame) – The dataframe containing the PSMs.
decoy_proba_column (str, default='decoy_proba') – The name of the column containing the probability of being a decoy. Values should be between 0 and 1, with 1 being a decoy.
decoy_column (str, default='decoy') – The name of the column containing the decoy information. Decoys are expected to be 1 and targets 0.
qval_column (str, default='qval') – The name of the column to store the q-values in.
- Returns:
The dataframe with the q-values stored in the column named by qval_column.
- Return type:
pd.DataFrame
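The calculation described above can be sketched with pandas and NumPy. This is a minimal illustration and not the library's implementation: the function name add_q_values_sketch is hypothetical, it assumes the FDR at each rank is estimated as cumulative decoys divided by cumulative targets, and it leaves out r_target_decoy because the role of that argument is not described on this page.

```python
import numpy as np
import pandas as pd


def add_q_values_sketch(
    df: pd.DataFrame,
    decoy_proba_column: str = "decoy_proba",
    decoy_column: str = "decoy",
    qval_column: str = "qval",
) -> pd.DataFrame:
    """Hypothetical stand-in for add_q_values (assumed logic, see text)."""
    # Most confident identifications (lowest decoy probability) come first.
    df = df.sort_values(decoy_proba_column, ascending=True).reset_index(drop=True)

    decoy = df[decoy_column].to_numpy(dtype=float)
    decoy_cumsum = np.cumsum(decoy)
    target_cumsum = np.cumsum(1.0 - decoy)

    # Assumed FDR estimate per rank; the maximum guards against division by zero.
    fdr = decoy_cumsum / np.maximum(target_cumsum, 1.0)

    # q-value: the lowest FDR at which a row would still be accepted.
    df[qval_column] = np.minimum.accumulate(fdr[::-1])[::-1]
    return df


scored = pd.DataFrame(
    {
        "decoy_proba": [0.05, 0.9, 0.2, 0.6, 0.1],
        "decoy": [0, 1, 0, 1, 0],
    }
)
result = add_q_values_sketch(scored)
print(result)
```

In this toy input the three targets rank ahead of both decoys, so their q-values are 0, while the two decoys receive 1/3 and 2/3.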
- alphadia.fdr._fdrx.stats.fdr_to_q_values(fdr_values: ndarray)
Converts FDR values to q-values. Takes an ascending sorted array of FDR values and converts them to q-values. For every element, the lowest FDR at which it would be accepted is used as its q-value.
- Parameters:
fdr_values (np.ndarray) – The FDR values to convert.
- Returns:
The q-values.
- Return type:
np.ndarray
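As an illustration of the conversion, the following sketch takes the running minimum of the FDR values from the end of the array towards the start. The name fdr_to_q_values_sketch is hypothetical, and the sketch assumes the input is ordered from the most to the least confident identification; it is not taken from the library source.

```python
import numpy as np


def fdr_to_q_values_sketch(fdr_values: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in: q-value = lowest FDR at this rank or any later rank."""
    fdr_values = np.asarray(fdr_values, dtype=float)
    # Reverse, take the cumulative minimum, then restore the original order.
    return np.minimum.accumulate(fdr_values[::-1])[::-1]


fdr = np.array([0.0, 0.5, 0.33, 0.25, 0.4])
q_values = fdr_to_q_values_sketch(fdr)
print(q_values)  # [0.   0.25 0.25 0.25 0.4 ]
```

The raw FDR estimate can go down again at later ranks (0.5, then 0.33, then 0.25). The q-values are non-decreasing because each element inherits the best FDR reachable by accepting it together with everything ranked after it up to that point.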
- alphadia.fdr._fdrx.stats.get_pep(psm_df: DataFrame, score_column: str = 'decoy_proba', decoy_column: str = 'decoy', score_std: float = 0.01, pep_granularity: int = 1000, kernel_size: int = 20)
Implementation of a very simple nonparametric PEP estimation using a Gaussian kernel.
- Parameters:
psm_df (pd.DataFrame) – The dataframe containing the PSMs.
score_column (str, default='decoy_proba') – The name of the column containing the score to use for the selection.
decoy_column (str, default='decoy') – The name of the column containing the decoy information.
score_std (float, default=0.01) – The standard deviation of the Gaussian kernel.
pep_granularity (int, default=1000) – The number of bins to use for the score histogram.
kernel_size (int, default=20) – The size of the kernel to use for the convolution.
- Returns:
The PEP values with same shape and order as the input dataframe.
- Return type:
np.ndarray
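One way to picture a histogram-based PEP estimate with Gaussian smoothing is sketched below. Everything here is an assumption made for illustration: the name get_pep_sketch is hypothetical, scores are assumed to lie in [0, 1], kernel_size is interpreted as the number of bins on each side of the kernel centre, and the PEP is approximated as the smoothed decoy density divided by the smoothed target density. The actual implementation may differ on each of these points.

```python
import numpy as np


def get_pep_sketch(
    scores: np.ndarray,
    is_decoy: np.ndarray,
    score_std: float = 0.01,
    pep_granularity: int = 1000,
    kernel_size: int = 20,
) -> np.ndarray:
    """Hypothetical nonparametric PEP estimate (assumed logic, see text)."""
    scores = np.asarray(scores, dtype=float)
    is_decoy = np.asarray(is_decoy, dtype=bool)

    # Histogram targets and decoys on a shared grid over [0, 1].
    edges = np.linspace(0.0, 1.0, pep_granularity + 1)
    target_hist, _ = np.histogram(scores[~is_decoy], bins=edges)
    decoy_hist, _ = np.histogram(scores[is_decoy], bins=edges)

    # Gaussian kernel with standard deviation score_std, expressed in score units.
    bin_width = 1.0 / pep_granularity
    offsets = np.arange(-kernel_size, kernel_size + 1) * bin_width
    kernel = np.exp(-0.5 * (offsets / score_std) ** 2)
    kernel /= kernel.sum()

    # Smooth both histograms with the same kernel.
    target_smooth = np.convolve(target_hist, kernel, mode="same")
    decoy_smooth = np.convolve(decoy_hist, kernel, mode="same")

    # Assumed PEP per bin: decoy density over target density, clipped to [0, 1].
    pep_per_bin = np.clip(decoy_smooth / np.maximum(target_smooth, 1e-12), 0.0, 1.0)

    # Look up the bin of every input score, preserving input order.
    bin_idx = np.clip(np.digitize(scores, edges) - 1, 0, pep_granularity - 1)
    return pep_per_bin[bin_idx]


scores = np.array([0.02, 0.03, 0.05, 0.95, 0.97])
is_decoy = np.array([False, False, False, True, True])
pep = get_pep_sketch(scores, is_decoy)
print(pep)
```

With this well-separated toy input, scores in the target-only region get a PEP of 0 and scores in the decoy-only region get a PEP of 1. Real score distributions overlap, which is where the smoothing matters.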
- alphadia.fdr._fdrx.stats.keep_best(df: DataFrame, score_column: str = 'decoy_proba', group_columns: list[str] | None = None)
Keep the best PSM for each group of PSMs with the same precursor_idx. This function is used to select the best candidate PSM for each precursor. If group_columns is set to ['channel', 'elution_group_idx'], it is used for target decoy competition.
- Parameters:
df (pd.DataFrame) – The dataframe containing the PSMs.
score_column (str) – The name of the column containing the score to use for the selection.
group_columns (list[str], default=['channel', 'precursor_idx']) – The columns to use for the grouping.
- Returns:
The dataframe containing the best PSM for each group.
- Return type:
pd.DataFrame