libtransform

This module implements the library transform pipeline. It allows loading, manipulating and saving alphabase libraries through a chain of transformations.

Abstract classes for the library transformations.

class alphadia.libtransform.base.ProcessingPipeline(steps: list[ProcessingStep])[source]

Bases: object

__init__(steps: list[ProcessingStep]) None[source]

Processing pipeline for loading and transforming spectral libraries.

The pipeline is a list of ProcessingStep objects. Each step is called in order and the output of the previous step is passed to the next step.

Example:

pipeline = ProcessingPipeline([
    DynamicLoader(),
    PrecursorInitializer(),
    AnnotateFasta(fasta_path_list),
    IsotopeGenerator(),
    DecoyGenerator(),
    RTNormalization()
])

library = pipeline(input_path)

class alphadia.libtransform.base.ProcessingStep[source]

Bases: object

__init__() None[source]

Base class for processing steps. Each implementation must implement the validate and forward methods. Processing steps can be chained together in a ProcessingPipeline.

forward(*args: Any) Any[source]

Run the processing step on the input object.

validate(*args: Any) bool[source]

Validate the input object.
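The validate/forward contract can be illustrated with a minimal, self-contained sketch. The classes below (SketchStep, StripStep, UppercaseStep, SketchPipeline) are hypothetical stand-ins and not the alphadia classes. Calling a step through __call__ and raising a ValueError when validation fails are assumptions made for illustration only.

```python
from typing import Any


class SketchStep:
    """Hypothetical stand-in for a processing step (not the alphadia class)."""

    def validate(self, *args: Any) -> bool:
        raise NotImplementedError

    def forward(self, *args: Any) -> Any:
        raise NotImplementedError

    def __call__(self, *args: Any) -> Any:
        # Assumption: a step refuses to run on input that fails validation.
        if not self.validate(*args):
            raise ValueError(f"{type(self).__name__}: input failed validation")
        return self.forward(*args)


class StripStep(SketchStep):
    def validate(self, text: Any) -> bool:
        return isinstance(text, str)

    def forward(self, text: str) -> str:
        return text.strip()


class UppercaseStep(SketchStep):
    def validate(self, text: Any) -> bool:
        return isinstance(text, str)

    def forward(self, text: str) -> str:
        return text.upper()


class SketchPipeline:
    """Hypothetical pipeline: each step receives the previous step's output."""

    def __init__(self, steps: list) -> None:
        self.steps = steps

    def __call__(self, data: Any) -> Any:
        for step in self.steps:
            data = step(data)
        return data


pipeline = SketchPipeline([StripStep(), UppercaseStep()])
result = pipeline("  peptide  ")
```

Keeping validation separate from the transformation lets a pipeline fail early, at the step whose input is wrong, rather than deep inside a later transformation.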

class alphadia.libtransform.loader.DynamicLoader(modification_mapping: dict | None = None)[source]

Bases: ProcessingStep

__init__(modification_mapping: dict | None = None) None[source]

Load a spectral library from a file. The file type is dynamically inferred from the file ending. Expects a str as input and will return a SpecLibBase object.

Supported file types are:

  • Alphabase hdf5 files – The library is loaded into a SpecLibBase object and immediately returned.

  • Long format csv files – The classical spectral library format as returned by MSFragger. It will be imported and converted to a SpecLibBase format. This might require additional parsing information.

forward(input_path: str) SpecLibBase[source]

Load the spectral library from the input path. The file type is dynamically inferred from the file ending.

validate(input: str) bool[source]

Validate the input object. It is expected that the input is a path to a file which exists.
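Inferring the loader from the file ending can be sketched with the standard library alone. The mapping LOADER_BY_ENDING and the functions infer_library_type and is_valid_input are hypothetical names; the file endings that alphadia actually accepts may differ from the ones assumed here.

```python
import os

# Hypothetical mapping from file ending to a loader label.
# The endings actually accepted by alphadia may differ.
LOADER_BY_ENDING = {
    ".hdf": "alphabase_hdf",
    ".hdf5": "alphabase_hdf",
    ".csv": "long_format_csv",
}


def infer_library_type(input_path: str) -> str:
    """Pick a loader label from the file ending (case-insensitive)."""
    _, ending = os.path.splitext(input_path)
    try:
        return LOADER_BY_ENDING[ending.lower()]
    except KeyError:
        raise ValueError(f"Unsupported file ending: {ending!r}") from None


def is_valid_input(input_path) -> bool:
    """Mirror the documented check: the input is a path to an existing file."""
    return isinstance(input_path, str) and os.path.isfile(input_path)
```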

class alphadia.libtransform.fasta_digest.FastaDigest(enzyme: str = 'trypsin', fixed_modifications: list[str] | None = None, variable_modifications: list[str] | None = None, missed_cleavages: int = 1, precursor_len: list[int] | None = None, precursor_charge: list[int] | None = None, precursor_mz: list[int] | None = None, max_var_mod_num: int = 1)[source]

Bases: ProcessingStep

__init__(enzyme: str = 'trypsin', fixed_modifications: list[str] | None = None, variable_modifications: list[str] | None = None, missed_cleavages: int = 1, precursor_len: list[int] | None = None, precursor_charge: list[int] | None = None, precursor_mz: list[int] | None = None, max_var_mod_num: int = 1) None[source]

Digest a FASTA file into a spectral library. Expects a List[str] object as input and will return a SpecLibBase object.

forward(input: list[str]) SpecLibBase[source]

Run the processing step on the input object.

validate(input: list[str]) bool[source]

Validate the input object.

class alphadia.libtransform.prediction.PeptDeepPrediction(use_gpu: bool = True, mp_process_num: int = 8, fragment_mz: list[int] | None = None, nce: int = 25, instrument: str = 'Lumos', peptdeep_model_path: str | None = None, peptdeep_model_type: str | None = None, fragment_types: list[str] | None = None, max_fragment_charge: int = 2, predict_charge: bool = False, min_charge_probability: float = 0.1)[source]

Bases: ProcessingStep

__init__(use_gpu: bool = True, mp_process_num: int = 8, fragment_mz: list[int] | None = None, nce: int = 25, instrument: str = 'Lumos', peptdeep_model_path: str | None = None, peptdeep_model_type: str | None = None, fragment_types: list[str] | None = None, max_fragment_charge: int = 2, predict_charge: bool = False, min_charge_probability: float = 0.1) None[source]

Predict the retention time of a spectral library using PeptDeep.

Parameters:
  • use_gpu (bool, optional) – Use GPU for prediction. Default is True.

  • mp_process_num (int, optional) – Number of processes to use for prediction. Default is 8.

  • fragment_mz (List[int], optional) – MZ range for fragment prediction. Default is [100, 2000].

  • nce (int, optional) – Normalized collision energy for prediction. Default is 25.

  • instrument (str, optional) – Instrument type for prediction. Default is “Lumos”. Must be a valid PeptDeep instrument.

  • peptdeep_model_path (str, optional) – Path to a folder containing PeptDeep models. If not provided, the default models will be used.

  • peptdeep_model_type (str, optional) – Use other peptdeep models provided by the peptdeep model manager. Default is None, which means the default model provided by peptdeep (e.g. “generic” for version 1.4.0) is used. Possible values are [‘generic’, ‘phospho’, ‘digly’].

  • fragment_types (list[str], optional) – Fragment types to predict. Default is [“b”, “y”].

  • max_fragment_charge (int, optional) – Maximum charge state to predict. Default is 2.

  • predict_charge (bool, optional) – Whether to predict charge states using PeptDeep’s charge model. Default is False.

  • min_charge_probability (float, optional) – Minimum probability threshold for including a charge state. Default is 0.1. Uses peptdeep’s charge range as defined by the loaded model.

forward(input: SpecLibBase) SpecLibBase[source]

Run the processing step on the input object.

validate(input: list[str]) bool[source]

Validate the input object.

class alphadia.libtransform.decoy.DecoyGenerator(decoy_type: str = 'diann', mp_process_num: int = 8)[source]

Bases: ProcessingStep

__init__(decoy_type: str = 'diann', mp_process_num: int = 8) None[source]

Generate decoys for the spectral library. Expects a SpecLibBase object as input and will return a SpecLibBase object.

Parameters:

decoy_type (str, optional) – Type of decoys to generate. Currently only pseudo_reverse and diann are supported. Default is diann.

forward(input: SpecLibBase) SpecLibBase[source]

Generate decoys for the spectral library.

validate(input: SpecLibBase) bool[source]

Validate the input object. It is expected that the input is a SpecLibBase object.
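As an illustration of the pseudo_reverse decoy type, the sketch below uses the common definition of a pseudo-reversed peptide: the sequence is reversed while the C-terminal residue stays in place. This is an assumption about the general technique, not a description of the alphadia implementation. The function name pseudo_reverse is hypothetical, and modifications are ignored.

```python
def pseudo_reverse(sequence: str) -> str:
    """Reverse a peptide sequence but keep its C-terminal residue fixed.

    Assumption: this is the common definition of pseudo-reverse decoys;
    the library's own handling (e.g. of modifications) may differ.
    """
    if len(sequence) < 2:
        return sequence
    return sequence[:-1][::-1] + sequence[-1]


decoy = pseudo_reverse("PEPTIDEK")
```

Keeping the C-terminal residue fixed preserves the enzymatic cleavage site, so the decoy keeps the same length, composition and precursor mass as the target.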

class alphadia.libtransform.flatten.FlattenLibrary(top_k_fragments: int = 12, min_fragment_intensity: float = 0.01)[source]

Bases: ProcessingStep

__init__(top_k_fragments: int = 12, min_fragment_intensity: float = 0.01) None[source]

Convert a SpecLibBase object into a SpecLibFlat object.

Parameters:
  • top_k_fragments (int, optional) – Number of top fragments to keep. Default is 12.

  • min_fragment_intensity (float, optional) – Minimum intensity threshold for fragments. Default is 0.01.

forward(input: SpecLibBase) SpecLibFlat[source]

Convert a SpecLibBase object into a SpecLibFlat object.

validate(input: SpecLibBase) bool[source]

Validate the input object. It is expected that the input is a SpecLibBase object.
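The effect of top_k_fragments and min_fragment_intensity can be sketched for a single precursor. The helper select_fragments is hypothetical. Applying the threshold inclusively, and applying it before the top-k selection, are assumptions rather than documented behavior.

```python
def select_fragments(intensities, top_k=12, min_intensity=0.01):
    """Return the indices of the kept fragments, most intense first.

    Assumptions: the threshold is inclusive and is applied before
    the top-k selection.
    """
    passing = [i for i, value in enumerate(intensities) if value >= min_intensity]
    # Stable sort: fragments with equal intensity keep their original order.
    passing.sort(key=lambda i: intensities[i], reverse=True)
    return passing[:top_k]


kept = select_fragments([0.5, 0.005, 1.0, 0.2], top_k=2)
```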

class alphadia.libtransform.flatten.InitFlatColumns[source]

Bases: ProcessingStep

__init__() None[source]

Initialize the columns of a SpecLibFlat object for alphadia search. The calibratable columns mz_library, rt_library and mobility_library will be initialized with the first matching column in the input dataframe.

forward(input: SpecLibFlat) SpecLibFlat[source]

Initialize the columns of a SpecLibFlat object for alphadia search.

validate(input: SpecLibFlat) bool[source]

Validate the input object. It is expected that the input is a SpecLibFlat object.

class alphadia.libtransform.flatten.LogFlatLibraryStats[source]

Bases: ProcessingStep

__init__() None[source]

Log basic statistics of a SpecLibFlat object.

forward(input: SpecLibFlat) SpecLibFlat[source]

Log basic statistics of the input SpecLibFlat object.

validate(input: SpecLibFlat) bool[source]

Validate the input object. It is expected that the input is a SpecLibFlat object.

class alphadia.libtransform.harmonize.AnnotateFasta(fasta_path_list: list[str], drop_unannotated: bool = True)[source]

Bases: ProcessingStep

__init__(fasta_path_list: list[str], drop_unannotated: bool = True) None[source]

Annotate the precursor dataframe with protein information from a FASTA file.

Expects a SpecLibBase object as input and will return a SpecLibBase object.

Parameters:
  • fasta_path_list (List[str]) – List of paths to FASTA files. Multiple files can be provided and will be merged into a single protein dataframe.

  • drop_unannotated (bool, optional) – Drop all precursors which could not be annotated by the FASTA file. Default is True.

forward(input: SpecLibBase) SpecLibBase[source]

Annotate the precursor dataframe with protein information from a FASTA file.

validate(input: SpecLibBase) bool[source]

Validate the input object. It is expected that the input is a SpecLibBase object and that all FASTA files exist.

class alphadia.libtransform.harmonize.IsotopeGenerator(n_isotopes: int = 4, mp_process_num: int = 8)[source]

Bases: ProcessingStep

__init__(n_isotopes: int = 4, mp_process_num: int = 8) None[source]

Generate isotope information for the spectral library. Expects a SpecLibBase object as input and will return a SpecLibBase object.

Parameters:

n_isotopes (int, optional) – Number of isotopes to generate. Default is 4.

forward(input: SpecLibBase) SpecLibBase[source]

Generate isotope information for the spectral library.

validate(input: SpecLibBase) bool[source]

Validate the input object. It is expected that the input is a SpecLibBase object.

class alphadia.libtransform.harmonize.PrecursorInitializer(drop_decoys: bool = False)[source]

Bases: ProcessingStep

__init__(drop_decoys: bool = False) None[source]

Initialize alphabase spectral library with precursor information.

Expects a SpecLibBase object as input and will return a SpecLibBase object. This step is required for all spectral libraries and will add the precursor_idx, decoy, channel and elution_group_idx columns to the precursor dataframe.

Parameters:

drop_decoys (bool, optional) – Drop decoys from the library during initialization. Default is False. Set to True to allow FASTA annotation of libraries that already contain decoys.

forward(input: SpecLibBase) SpecLibBase[source]

Initialize the precursor dataframe with the precursor_idx, decoy, channel and elution_group_idx columns.

validate(input: SpecLibBase) bool[source]

Validate the input object. It is expected that the input is a SpecLibBase object.

class alphadia.libtransform.harmonize.RTNormalization[source]

Bases: ProcessingStep

__init__() None[source]

Normalize the retention time of the spectral library. Expects a SpecLibBase object as input and will return a SpecLibBase object.

forward(input: SpecLibBase) SpecLibBase[source]

Normalize the retention time of the spectral library.

validate(input: SpecLibBase) bool[source]

Validate the input object. It is expected that the input is a SpecLibBase object.

class alphadia.libtransform.multiplex.MultiplexLibrary(multiplex_mapping: list, input_channel: str | int | None = None)[source]

Bases: ProcessingStep

__init__(multiplex_mapping: list, input_channel: str | int | None = None)[source]

Initialize the MultiplexLibrary step.

forward(input: SpecLibBase) SpecLibBase[source]

Apply the MultiplexLibrary step to the input object.

validate(input: SpecLibBase) bool[source]

Validate the input object. It is expected that the input is a SpecLibBase object.

class alphadia.libtransform.mbr.IndexBuilder(target_keys: ndarray, target_fallback_keys: ndarray, fallback_lookup_keys: ndarray, specific_lookup_keys: ndarray)[source]

Bases: object

Build and apply lookup indices with fallback and specific matching.

This class computes indices that map each target to a value in a lookup table. It uses a two-level lookup strategy:

  1. Fallback: Every target gets an index based on its fallback key (elution_group_idx).

  2. Specific: Targets whose primary key (mod_seq_charge_hash) exists in the specific lookup table get marked for override.

Both lookups use pandas hash-based indexing for O(n) average case complexity.

Parameters:
  • target_keys (np.ndarray) – Primary keys for specific lookup (e.g., lib mod_seq_charge_hash).

  • target_fallback_keys (np.ndarray) – Fallback keys for each target (e.g., lib elution_group_idx).

  • fallback_lookup_keys (np.ndarray) – Keys in fallback lookup table (e.g., PSM elution_group_idx).

  • specific_lookup_keys (np.ndarray) – Keys in specific lookup table (e.g., PSM mod_seq_charge_hash).

__init__(target_keys: ndarray, target_fallback_keys: ndarray, fallback_lookup_keys: ndarray, specific_lookup_keys: ndarray) None[source]

apply(fallback_values: ndarray, specific_values: ndarray) ndarray[source]

Apply precomputed indices to get values.

Parameters:
  • fallback_values (np.ndarray) – Values from the fallback lookup table.

  • specific_values (np.ndarray) – Values from the specific lookup table.

Returns:

Result array with fallback values, overridden by specific values where available.

Return type:

np.ndarray
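The two-level lookup can be sketched in plain Python. This sketch swaps in dictionaries for the pandas hash-based indexing mentioned above and lists for NumPy arrays, so it shows the idea rather than the class's implementation. The functions build_indices and apply_indices are hypothetical, and the sketch assumes that lookup keys are unique and that every fallback key of a target exists in the fallback table.

```python
def build_indices(target_keys, target_fallback_keys,
                  fallback_lookup_keys, specific_lookup_keys):
    """Precompute, per target, a fallback position and a specific position."""
    # Assumption: keys are unique within each lookup table.
    fallback_pos = {key: i for i, key in enumerate(fallback_lookup_keys)}
    specific_pos = {key: i for i, key in enumerate(specific_lookup_keys)}
    # Assumption: every target's fallback key exists in the fallback table.
    fallback_idx = [fallback_pos[key] for key in target_fallback_keys]
    # -1 marks targets without a match in the specific table.
    specific_idx = [specific_pos.get(key, -1) for key in target_keys]
    return fallback_idx, specific_idx


def apply_indices(fallback_idx, specific_idx, fallback_values, specific_values):
    """Start from fallback values, then override where a specific match exists."""
    result = [fallback_values[i] for i in fallback_idx]
    for target, pos in enumerate(specific_idx):
        if pos >= 0:
            result[target] = specific_values[pos]
    return result


# Three targets in two elution groups; only hash 102 has a specific entry.
fallback_idx, specific_idx = build_indices(
    target_keys=[101, 102, 103],
    target_fallback_keys=[1, 1, 2],
    fallback_lookup_keys=[1, 2],
    specific_lookup_keys=[102],
)
values = apply_indices(fallback_idx, specific_idx, [10.0, 20.0], [11.5])
```

Because the indices are computed once, they can be reused for several value columns (for example RT and protein group) without repeating the key matching.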

class alphadia.libtransform.mbr.MbrLibraryBuilder(fdr: float = 0.01, keep_decoys: bool = False)[source]

Bases: ProcessingStep

__init__(fdr: float = 0.01, keep_decoys: bool = False) None[source]

Base class for processing steps. Each implementation must implement the validate and forward methods. Processing steps can be chained together in a ProcessingPipeline.

forward(psm_df: DataFrame, base_library: SpecLibBase) SpecLibBase[source]

Build MBR library from PSM results and base library.

Parameters:
  • psm_df (pd.DataFrame) – PSM results with columns: elution_group_idx, decoy, qval, rt_observed, pg, mod_seq_charge_hash.

  • base_library (SpecLibBase) – Base spectral library containing target precursors.

Returns:

MBR library with RT and protein group assignments.

Return type:

SpecLibBase

Notes

MBR library generation procedure:

  1. Filter PSMs by FDR threshold.

  2. Get elution groups that passed FDR (targets only, or targets+decoys).

  3. Filter base library to those elution groups.

  4. Generate decoys if keep_decoys=True, then rehash so each precursor has a unique mod_seq_charge_hash.

  5. Assign RT and protein groups to each precursor.

RT and protein group assignment strategy:

  • If a precursor’s mod_seq_charge_hash was identified in PSM, use its specific RT/pg.

  • Otherwise, fall back to the median RT / first pg of its elution group.

This ensures identified precursors get their observed values while unidentified precursors (e.g., decoys after rehashing) inherit sensible defaults from their elution group.
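The RT part of the procedure described in these notes can be sketched with plain dictionaries instead of DataFrames. The function assign_rt and the output key rt are hypothetical. The sketch assumes an inclusive q-value cut, and it omits decoy handling and protein group assignment.

```python
from collections import defaultdict
from statistics import median


def assign_rt(psms, library, fdr=0.01):
    """Assign an RT to each library precursor from PSMs passing the FDR cut."""
    # Assumption: the q-value cut is inclusive.
    passing = [p for p in psms if p["qval"] <= fdr]

    # Fallback level: median observed RT per elution group.
    rts_by_group = defaultdict(list)
    for p in passing:
        rts_by_group[p["elution_group_idx"]].append(p["rt_observed"])
    group_rt = {group: median(rts) for group, rts in rts_by_group.items()}

    # Specific level: observed RT per identified precursor hash.
    specific_rt = {p["mod_seq_charge_hash"]: p["rt_observed"] for p in passing}

    result = []
    for precursor in library:
        group = precursor["elution_group_idx"]
        if group not in group_rt:
            continue  # the elution group did not pass the FDR cut
        rt = specific_rt.get(precursor["mod_seq_charge_hash"], group_rt[group])
        result.append({**precursor, "rt": rt})
    return result


psms = [
    {"elution_group_idx": 1, "mod_seq_charge_hash": 101, "qval": 0.001, "rt_observed": 100.0},
    {"elution_group_idx": 1, "mod_seq_charge_hash": 102, "qval": 0.005, "rt_observed": 104.0},
    {"elution_group_idx": 2, "mod_seq_charge_hash": 201, "qval": 0.200, "rt_observed": 300.0},
]
library = [
    {"elution_group_idx": 1, "mod_seq_charge_hash": 101},
    {"elution_group_idx": 1, "mod_seq_charge_hash": 999},  # e.g. a rehashed decoy
    {"elution_group_idx": 2, "mod_seq_charge_hash": 201},
]
mbr = assign_rt(psms, library)
```

In this example the precursor with hash 101 keeps its own observed RT, the unidentified precursor 999 inherits the median of its elution group, and elution group 2 is dropped because its only PSM fails the FDR cut.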

validate(psm_df: DataFrame, base_library: SpecLibBase) bool[source]

Validate the input objects. It is expected that psm_df is a DataFrame and base_library is a SpecLibBase object.