outputtransform

The outputtransform package contains functionality to perform the consensus workflow, which combines the individual search workflows that make up a joint search plan.

This includes:

protein inference
false-discovery-rate correction
quantification
spectral library generation.

`alphadia.outputtransform.df_builders`
`alphadia.outputtransform.outputaccumulator`	Output Accumulator: classes that accumulate the information from the output folders of the alphadia pipeline one folder at a time.
`alphadia.outputtransform.protein_fdr`
`alphadia.outputtransform.search_plan_output`
`alphadia.outputtransform.utils`

alphadia.outputtransform.df_builders.build_run_internal_df(folder_path: str)

Build stat dataframe for a single run.

Parameters:: folder_path (str) – Path (from the base directory of the output_folder attribute of the SearchStep class) to the directory containing the raw file and the managers
Returns:: Dataframe containing the statistics
Return type:: pd.DataFrame

alphadia.outputtransform.df_builders.build_run_stat_df(folder: str, raw_name: str, run_df: DataFrame, channels: list[int] | None = None)

Build stat dataframe for a single run.

Parameters:

folder (str) – Directory containing the raw file and the managers
raw_name (str) – Name of the raw file
run_df (pd.DataFrame) – Dataframe containing the precursor data
channels (List[int], optional) – List of channels to include in the output, default=[0]

Returns:

Dataframe containing the statistics

Return type:

pd.DataFrame

alphadia.outputtransform.df_builders.log_stat_df(stat_df: DataFrame)

Log the statistics dataframe to the console.

Parameters:: stat_df (pd.DataFrame) – statistics dataframe

alphadia.outputtransform.df_builders.transfer_library_stat_df(transfer_library: SpecLibBase) → DataFrame

Create a statistics dataframe for the transfer library.

Parameters:: transfer_library (SpecLibBase) – transfer library
Returns:: statistics dataframe
Return type:: pd.DataFrame

Output Accumulator

This module contains classes that accumulate the information from the output folders of the alphadia pipeline one folder at a time. This is useful when a large number of output folders must be combined into a single object or library, which can be difficult to do in one pass because of memory constraints. The module follows a broadcast-subscriber pattern: the AccumulationBroadcaster class loops over the output folders, creates a SpecLibBase object from each one, and broadcasts it to the subscribers.

Classes

BaseAccumulator: Base class for accumulator classes, which subscribe to the linear accumulation of a list of output folders. It has two methods, update and post_process.
AccumulationBroadcaster: Class that loops over output folders in a linear fashion to prevent having all the output folders in memory at the same time.
TransferLearningAccumulator: Class that accumulates the information from the output folders for fine-tuning by selecting the top keep_top precursors and their fragments from all the output folders.
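
The broadcast-subscriber pattern described above can be illustrated with a minimal, dependency-free sketch. The classes below are stand-ins written for this example and are not the alphadia implementations: `Broadcaster`, `CountingAccumulator`, and the `load_folder` callable are hypothetical, plain lists replace SpecLibBase objects, and the loop runs sequentially rather than across several processes.

```python
class BaseAccumulator:
    """Stand-in for the documented base class: subscribers override both methods."""

    def update(self, info):
        raise NotImplementedError

    def post_process(self):
        raise NotImplementedError


class CountingAccumulator(BaseAccumulator):
    """Hypothetical subscriber that counts entries across all folders."""

    def __init__(self):
        self.n_entries = 0
        self.finished = False

    def update(self, info):
        # Called once per folder with that folder's data only.
        self.n_entries += len(info)

    def post_process(self):
        # Called once, after the last folder has been broadcast.
        self.finished = True


class Broadcaster:
    """Stand-in broadcaster: loads one folder at a time and notifies subscribers."""

    def __init__(self, folder_list, load_folder):
        self.folder_list = folder_list
        self.load_folder = load_folder  # hypothetical loader callable
        self.subscribers = []

    def subscribe(self, subscriber):
        self.subscribers.append(subscriber)

    def run(self):
        for folder in self.folder_list:
            info = self.load_folder(folder)  # only this folder's data is held here
            for subscriber in self.subscribers:
                subscriber.update(info)
        for subscriber in self.subscribers:
            subscriber.post_process()


# Fake "folders": a dict lookup replaces reading output files from disk.
fake_outputs = {"run_a": ["pep1", "pep2"], "run_b": ["pep3"]}
broadcaster = Broadcaster(list(fake_outputs), fake_outputs.__getitem__)
counter = CountingAccumulator()
broadcaster.subscribe(counter)
broadcaster.run()
print(counter.n_entries)
```

Because each subscriber only ever sees one folder's data per `update` call, peak memory depends on the largest single folder plus whatever state the subscribers choose to keep.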

class alphadia.outputtransform.outputaccumulator.AccumulationBroadcaster(folder_list: list, number_of_processes: int, processing_kwargs: dict)

Bases: object

Class that loops over output folders in a linear fashion so that only one folder is in memory at a time, and broadcasts the output of each folder to the subscribers.

__init__(folder_list: list, number_of_processes: int, processing_kwargs: dict)

run()

subscribe(subscriber: BaseAccumulator)

class alphadia.outputtransform.outputaccumulator.BaseAccumulator

Bases: object

Base class for accumulator classes, which subscribe to the linear accumulation of a list of output folders.

post_process() → None: Called after all output folders have been processed.

update(info: SpecLibBase) → None

Called when a new output folder is obtained.

Parameters:: info (SpecLibBase) – The information from the output folder.

class alphadia.outputtransform.outputaccumulator.TransferLearningAccumulator(keep_top: int = 3, norm_delta_max: bool = True, precursor_correlation_cutoff: float = 0.5, fragment_correlation_ratio: float = 0.75)

Bases: BaseAccumulator

__init__(keep_top: int = 3, norm_delta_max: bool = True, precursor_correlation_cutoff: float = 0.5, fragment_correlation_ratio: float = 0.75)

TransferLearningAccumulator is used to accumulate the information from the output folders for fine-tuning by selecting the top keep_top precursors and their fragments from all the output folders. The score currently used for ranking is the probA.

Parameters:

keep_top (int, optional) – The number of top precursors to keep, by default 3
norm_delta_max (bool, optional) –
If True, advanced normalization of retention times will be performed. Retention times are normalized using the calibrated deviation from the library at the start of the gradient and max normalization at the end of the gradient.

If False, max normalization will be performed. By default True
precursor_correlation_cutoff (float, optional) – Only precursors with a median fragment correlation above this cutoff will be used for MS2 learning, by default 0.5
fragment_correlation_ratio (float, optional) – The cutoff for the fragment correlation relative to the median fragment correlation for a precursor, by default 0.75

post_process(): Post process the consensus_speclibase by normalizing retention times.

update(speclibase: SpecLibBase)

Update the consensus_speclibase with the information from the speclibase.

Parameters:: speclibase (SpecLibBase) – The information from the output folder.
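
The keep_top selection can be pictured as a "top N per group" step. The sketch below is an illustration only: it works on plain dictionaries instead of a SpecLibBase, and the key names `precursor`, `run`, and `score` are hypothetical. Whether a lower or a higher score ranks first is not stated in this documentation, so the direction is exposed as a parameter.

```python
from collections import defaultdict


def keep_top_precursors(rows, keep_top=3, lowest_first=True):
    """Keep the `keep_top` best-ranked observations of each precursor.

    rows: list of dicts with the hypothetical keys 'precursor' and 'score'.
    lowest_first: assumption; set to False if higher scores are better.
    """
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["precursor"]].append(row)

    selected = []
    for key in sorted(grouped):
        ranked = sorted(
            grouped[key], key=lambda r: r["score"], reverse=not lowest_first
        )
        selected.extend(ranked[:keep_top])
    return selected


# Three observations of one precursor and one observation of another.
rows = [
    {"precursor": "PEPTIDEK_2", "run": "a", "score": 0.1},
    {"precursor": "PEPTIDEK_2", "run": "b", "score": 0.3},
    {"precursor": "PEPTIDEK_2", "run": "c", "score": 0.2},
    {"precursor": "SAMPLER_3", "run": "a", "score": 0.5},
]
top_two = keep_top_precursors(rows, keep_top=2)
print([(r["precursor"], r["run"]) for r in top_two])
```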

alphadia.outputtransform.outputaccumulator.build_speclibflat_from_quant(folder: str, mandatory_precursor_columns: list[str] | None = None, optional_precursor_columns: list[str] | None = None, charged_frag_types: list[str] | None = None) → SpecLibFlat

Build a SpecLibFlat object from quantification output data stored in a folder for transfer learning.

Parameters:

folder (str) – The output folder to be parsed.
mandatory_precursor_columns (list[str], optional) – The columns to be selected from the precursor dataframe
optional_precursor_columns (list[str], optional) – Additional optional columns to include if present

Returns:

A spectral library object containing the parsed data

Return type:

SpecLibFlat

alphadia.outputtransform.outputaccumulator.error_callback(e)

alphadia.outputtransform.outputaccumulator.ms2_quality_control(spec_lib_base: SpecLibBase, precursor_correlation_cutoff: float = 0.5, fragment_correlation_ratio: float = 0.75)

Perform quality control for transfer learning by filtering out precursors with low median fragment correlation and fragments with low correlation.

Parameters:

spec_lib_base (SpecLibBase) – The SpecLibBase object to be filtered.
precursor_correlation_cutoff (float) – Only precursors with a median fragment correlation above this cutoff will be used for MS2 learning. Default is 0.5.
fragment_correlation_ratio (float) – The cutoff for the fragment correlation relative to the median fragment correlation for a precursor. Default is 0.75.

Returns:

The SpecLibBase object with the precursors and fragments that pass the quality control filters.

Return type:

SpecLibBase
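
One way to read the two thresholds is sketched below on plain Python data. This is an interpretation, not the alphadia code: the function name is hypothetical, a dictionary of fragment correlations stands in for the SpecLibBase, and it is assumed that a fragment passes when its correlation exceeds `fragment_correlation_ratio` times the precursor's median fragment correlation.

```python
from statistics import median


def ms2_quality_control_sketch(
    fragment_correlations,
    precursor_correlation_cutoff=0.5,
    fragment_correlation_ratio=0.75,
):
    """Filter precursors and fragments by correlation.

    fragment_correlations: dict mapping a precursor name to the list of
    correlations of its fragments (hypothetical stand-in for a library).
    """
    kept = {}
    for name, correlations in fragment_correlations.items():
        if not correlations:
            continue  # nothing to evaluate for this precursor
        precursor_median = median(correlations)
        if precursor_median <= precursor_correlation_cutoff:
            continue  # the whole precursor is dropped
        # Assumption: the fragment threshold scales with the precursor median.
        threshold = fragment_correlation_ratio * precursor_median
        kept[name] = [c for c in correlations if c > threshold]
    return kept


example = {
    "good_precursor": [0.9, 0.8, 0.3],  # median 0.8, so 0.3 is removed
    "poor_precursor": [0.2, 0.4, 0.3],  # median 0.3, so it is dropped
}
filtered = ms2_quality_control_sketch(example)
print(filtered)
```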

alphadia.outputtransform.outputaccumulator.normalize_rt_delta_max(spec_lib_base: SpecLibBase) → SpecLibBase

Normalize the retention times of the precursors in the SpecLibBase object using delta max normalization.

Parameters:: spec_lib_base (SpecLibBase) – The SpecLibBase object to be normalized.
Returns:: The SpecLibBase object with the retention times normalized using delta max normalization.
Return type:: SpecLibBase

alphadia.outputtransform.outputaccumulator.normalize_rt_max(spec_lib_base: SpecLibBase) → SpecLibBase

Normalize the retention times of the precursors in the SpecLibBase object using max normalization.

Parameters:: spec_lib_base (SpecLibBase) – The SpecLibBase object to be normalized.
Returns:: The SpecLibBase object with the retention times normalized using max normalization.
Return type:: SpecLibBase
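
As a rough illustration, and assuming that "max normalization" means dividing every retention time by the largest observed retention time, the idea can be sketched on a plain list. The function name is hypothetical and the delta max variant is not covered here.

```python
def normalize_rt_max_sketch(retention_times):
    """Scale retention times so that the largest value maps to 1.0.

    Assumption: max normalization divides by the maximum observed value.
    """
    max_rt = max(retention_times)
    return [rt / max_rt for rt in retention_times]


normalized = normalize_rt_max_sketch([5.0, 10.0, 20.0])
print(normalized)
```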

alphadia.outputtransform.protein_fdr.perform_protein_fdr(psm_df: DataFrame, figure_path: str) → DataFrame: Perform protein FDR on PSM dataframe
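
This page does not describe how the protein FDR is computed. As general background only, a generic target-decoy q-value calculation can be sketched as follows. The function name, the score direction (higher is better), and the decoy flags are assumptions made for this illustration; it is not presented as alphadia's implementation.

```python
def target_decoy_q_values(scores, is_decoy):
    """Generic target-decoy q-values, returned in the input order.

    scores: one score per entry, higher means more confident (assumption).
    is_decoy: one boolean per entry, True for decoy entries.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    # FDR estimate at each rank: decoys so far divided by targets so far.
    fdrs = []
    n_targets = 0
    n_decoys = 0
    for i in order:
        if is_decoy[i]:
            n_decoys += 1
        else:
            n_targets += 1
        fdrs.append(n_decoys / max(n_targets, 1))

    # q-value: smallest FDR at this rank or at any lower-scoring rank.
    q_sorted = []
    running_min = float("inf")
    for fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        q_sorted.append(running_min)
    q_sorted.reverse()

    q_values = [0.0] * len(scores)
    for rank, i in enumerate(order):
        q_values[i] = q_sorted[rank]
    return q_values


q = target_decoy_q_values([0.9, 0.8, 0.7, 0.6], [False, False, True, False])
print(q)
```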

class alphadia.outputtransform.search_plan_output.SearchPlanOutput(config: Config, output_folder: str)

Bases: object

INTERNAL_OUTPUT = 'internal'

LIBRARY_OUTPUT = 'speclib.mbr'

PG_OUTPUT = 'protein_groups'

PRECURSOR_OUTPUT = 'precursors'

PSM_INPUT = 'psm'

STAT_OUTPUT = 'stat'

TRANSFER_MODEL = 'peptdeep.transfer'

TRANSFER_OUTPUT = 'speclib.transfer'

TRANSFER_STATS_OUTPUT = 'stats.transfer'

__init__(config: Config, output_folder: str)

Combine individual searches and build combined outputs.

In alphaDIA the search plan orchestrates the library building preparation, schedules the individual searches and combines the individual outputs into a single output.

The SearchPlanOutput class is responsible for combining the individual search outputs into a single output.

This includes:

combining the individual precursor tables
building the output stat table
performing protein grouping
performing protein FDR
performing label-free quantification
building the spectral library

Parameters:

config (Config) – Configuration object
output_folder (str) – Output folder

build(folder_list: list[str], base_spec_lib: SpecLibBase | None)

Build output from a list of search outputs.

The following files are written to the output folder:

precursor.tsv
protein_groups.tsv
stat.tsv
speclib.mbr.hdf

Parameters:

folder_list (List[str]) – List of folders containing the search outputs
base_spec_lib (base.SpecLibBase, optional) – Base spectral library

alphadia.outputtransform.utils.apply_output_column_names(df: DataFrame) → DataFrame

Convert internal column names to output names and filter to only mapped columns.

Only columns that are present in INTERNAL_TO_OUTPUT_MAPPING are kept in the output. This ensures that output files only contain the defined output columns.

Parameters:: df (pd.DataFrame) – Dataframe with internal column names
Returns:: Dataframe with output column names applied, containing only mapped columns
Return type:: pd.DataFrame
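
The rename-and-filter behaviour can be illustrated without a dataframe. In the sketch below, rows are plain dictionaries, the function name is hypothetical, and `EXAMPLE_MAPPING` is an invented stand-in: the actual contents of INTERNAL_TO_OUTPUT_MAPPING are not shown on this page.

```python
# Invented mapping for illustration; not the real INTERNAL_TO_OUTPUT_MAPPING.
EXAMPLE_MAPPING = {
    "internal_a": "output.a",
    "internal_b": "output.b",
}


def apply_output_column_names_sketch(records, mapping):
    """Rename mapped keys and drop every key that has no mapping."""
    return [
        {mapping[key]: value for key, value in record.items() if key in mapping}
        for record in records
    ]


records = [{"internal_a": 1, "internal_b": 2, "scratch_column": 3}]
renamed = apply_output_column_names_sketch(records, EXAMPLE_MAPPING)
print(renamed)
```

Filtering to mapped keys means that helper columns created during processing never reach the written output unless they are added to the mapping deliberately.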

alphadia.outputtransform.utils.apply_protein_inference(psm_df: DataFrame, inference_strategy: Literal['library', 'maximum_parsimony', 'heuristic'], group_level: str) → DataFrame

Apply protein inference strategy to PSM dataframe.

Parameters:

psm_df (pd.DataFrame) – PSM dataframe
inference_strategy (Literal["library", "maximum_parsimony", "heuristic"]) – Inference strategy: ‘library’, ‘maximum_parsimony’, or ‘heuristic’
group_level (str) – Grouping level: ‘proteins’ or ‘genes’

Returns:

PSM dataframe with protein grouping applied

Return type:

pd.DataFrame
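
For readers unfamiliar with parsimony-based inference, the general idea can be sketched as a greedy set cover: repeatedly pick the protein that explains the most peptides that are still unassigned. This is a common approximation to maximum parsimony and is shown as background only; this page does not state how the 'maximum_parsimony' strategy is implemented in alphadia, and the function name and input layout below are assumptions.

```python
def greedy_parsimony(protein_to_peptides):
    """Assign every peptide to one protein using a greedy set cover.

    protein_to_peptides: dict mapping a protein name to the set of peptides
    it could explain (hypothetical input layout).
    Returns a dict mapping each peptide to its assigned protein.
    """
    remaining = {p: set(peps) for p, peps in protein_to_peptides.items()}
    unassigned = set().union(*remaining.values())
    assignment = {}

    while unassigned:
        # Ties are broken alphabetically so the result is deterministic.
        best = max(sorted(remaining), key=lambda p: len(remaining[p] & unassigned))
        covered = remaining.pop(best) & unassigned
        for peptide in covered:
            assignment[peptide] = best
        unassigned -= covered
    return assignment


candidates = {
    "P1": {"pep_a", "pep_b", "pep_c"},
    "P2": {"pep_b", "pep_c"},  # fully explained by P1, so never selected
    "P3": {"pep_d"},
}
assignment = greedy_parsimony(candidates)
print(sorted(set(assignment.values())))
```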

alphadia.outputtransform.utils.get_channels_from_config(config: dict) → list[int]

Extract and compute channel list from configuration.

Parameters:: config (dict) – Configuration dictionary containing search and multiplexing settings
Returns:: Sorted list of channel integers
Return type:: list[int]

alphadia.outputtransform.utils.load_psm_files_from_folders(folder_list: list[str], psm_file_name: str) → list[DataFrame]

Load PSM files from multiple folders.

Parameters:

folder_list (list[str]) – List of folders containing PSM files
psm_file_name (str) – Name of the PSM file (without extension)

Returns:

List of PSM dataframes from all folders

Return type:

list[pd.DataFrame]

alphadia.outputtransform.utils.log_protein_fdr_summary(psm_df: DataFrame) → None

Log summary statistics for protein FDR results.

Parameters:: psm_df (pd.DataFrame) – Precursor table with protein grouping and FDR filtering applied

alphadia.outputtransform.utils.merge_quant_levels_to_psm(psm_df: DataFrame, lfq_results: dict[str, DataFrame], quantlevel_configs: list) → DataFrame

Merge quantification results from all levels back to the precursor table.

Parameters:

psm_df (pd.DataFrame) – Precursor table to merge quantification data into
lfq_results (dict[str, pd.DataFrame]) – Dictionary containing quantification results for each level
quantlevel_configs (list) – List of LFQOutputConfig objects defining quantification levels

Returns:

Updated precursor table with merged quantification data

Return type:

pd.DataFrame

alphadia.outputtransform.utils.prepare_psm_dataframe(psm_df: DataFrame) → DataFrame

Prepare PSM dataframe by cleaning modification columns and hashing precursors.

Parameters:: psm_df (pd.DataFrame) – Raw PSM dataframe
Returns:: Prepared PSM dataframe with hashed precursor information
Return type:: pd.DataFrame

alphadia.outputtransform.utils.read_df(path_no_format, file_format='parquet')

Read dataframe from disk with chosen file format.

Parameters:

path_no_format (str) – File to read from disk without file format
file_format (str, default = 'parquet') – File format for loading the file. Available options: [‘parquet’, ‘tsv’]

Returns:

loaded dataframe from disk

Return type:

pd.DataFrame

alphadia.outputtransform.utils.write_df(df: DataFrame, path_no_format: str, file_format: str = 'parquet') → None

Write dataframe to disk with chosen file format.

Parameters:

df (pd.DataFrame) – Dataframe to save to disk
path_no_format (str) – Path for file without format
file_format (str, default = 'parquet') – File format for writing the file. Available options: [‘parquet’, ‘tsv’]