workflow

class alphadia.workflow.base.WorkflowBase(instance_name: str, config: Config, quant_path: str = None)[source]

Bases: object

Base class for all workflows. This class is responsible for creating the workflow folder. It also initializes the calibration_manager and fdr_manager for the workflow.

CALIBRATION_MANAGER_PKL_NAME = 'calibration_manager.pkl'
FDR_MANAGER_PKL_NAME = 'fdr_manager.pkl'
OPTIMIZATION_MANAGER_PKL_NAME = 'optimization_manager.pkl'
RAW_FILE_MANAGER_PKL_NAME = 'raw_file_manager.pkl'
TIMING_MANAGER_PKL_NAME = 'timing_manager.pkl'
__init__(instance_name: str, config: Config, quant_path: str = None) None[source]
Parameters:
  • instance_name (str) – Name for the particular workflow instance, e.g. the name of the raw file

  • config (Config) – Configuration for the workflow.

  • quant_path (str) – path to directory holding quant folders, relevant for distributed searches

property calibration_manager: CalibrationManager

Calibration manager for the workflow. Owns the RT, IM, MZ calibration and the calibration data

property config: Config

Configuration for the workflow.

property dia_data: TimsTOFTranspose | AlphaRawBase | MzML | Sciex | Thermo

DIA data for the workflow. Owns the DIA data

load(dia_data_path: str, spectral_library: SpecLibFlat) None[source]
property optimization_manager: OptimizationManager

Optimization manager for the workflow. Owns the optimization data

property path: str

Path to the workflow folder, e.g. first_search/quant/raw_file_xyz.raw

property spectral_library: SpecLibFlat | None

Spectral library for the workflow. Owns the spectral library data

property timing_manager: TimingManager

Timing manager for the workflow. Owns the timing data

This module is responsible for creating and storing the configuration.

It allows updating the default configuration with one or more other configuration objects. The order of configs holds significance, with configurations later in the sequence overwriting previous values. Lists are always overwritten completely.

On demand, the current config can be visualized in a tree-like structure.

class alphadia.workflow.config.Config(data: dict = None, name: str = 'default')[source]

Bases: UserDict

Dict-like config class that can read from and write to yaml and json files and allows updating with other config objects.

TODO: this class should be read-only, but currently mutable value elements can be mutated.

__init__(data: dict = None, name: str = 'default') None[source]
copy()[source]
from_json(path: str) None[source]
from_yaml(path: str) None[source]
set_value(key: str | tuple[str, ...], path: str | list[str]) None[source]

Set a config key.

Only certain keys are allowed to be set. Use a tuple key for nested access, e.g. ("library_prediction", "peptdeep_model_path").

to_json(path: str) None[source]
to_yaml(path: str) None[source]
update(configs: list[Config], do_print: bool = False)[source]

Updates the config with one or more other config objects.

The order of configs is significant: when the same key appears in several configs, the value from the later config takes precedence.

All changes to the default config are tracked and stored in a separate dictionary to enable convenient visualization of the changes.

Parameters:
  • configs (list of configs) – List of config objects to update the current config with. The order of the configs is important (last one wins).

  • do_print (bool, optional) – Whether to print the modified config. Default is False.
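The merge rule described above (later configs win, lists are overwritten completely) can be sketched with plain dictionaries. This is a minimal illustration and not the Config implementation: merge_configs, _merge_into and all config keys shown are hypothetical, and change tracking is left out.

```python
from copy import deepcopy


def _merge_into(target: dict, source: dict) -> None:
    """Recursively copy values from source into target (in place)."""
    for key, value in source.items():
        if isinstance(value, dict) and isinstance(target.get(key), dict):
            # Nested dicts are merged key by key.
            _merge_into(target[key], value)
        else:
            # Scalars and lists replace the previous value entirely.
            target[key] = deepcopy(value)


def merge_configs(default: dict, overrides: list[dict]) -> dict:
    """Return a new dict: default updated by each override in order."""
    result = deepcopy(default)
    for override in overrides:
        _merge_into(result, override)
    return result


# Hypothetical keys, for illustration only.
default = {"search": {"tolerance": 15, "channels": [0, 4, 8]}, "threads": 4}
merged = merge_configs(
    default,
    [
        {"search": {"channels": [0]}},
        {"search": {"tolerance": 10}, "threads": 8},
    ],
)
```

Here the list under "channels" is replaced rather than extended, and the input dictionaries are left untouched because every value is deep-copied.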

workflow.managers

Base class for Managers.

In AlphaDIA, a “manager” is a stateful object, and can be saved/loaded from disk. Additionally, it may offer functionality to change its state.

class alphadia.workflow.managers.base.BaseManager(path: None | str = None, load_from_file: bool = True, figure_path: None | str = None, reporter: None | Pipeline | Backend = None)[source]

Bases: object

__init__(path: None | str = None, load_from_file: bool = True, figure_path: None | str = None, reporter: None | Pipeline | Backend = None)[source]

Base class for all managers which handle parts of the workflow.

Parameters:
  • path (str, optional) – Path to the manager pickle on disk.

  • load_from_file (bool, optional) – If True, the manager will be loaded from file if it exists.

property is_loaded_from_file

Check if the manager was loaded from file.

load()[source]

Load the state from pickle file.

property path

Path to the manager pickle on disk.

save()[source]

Save the state to pickle file.
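The save/load behaviour of a manager can be pictured with a small pickle-backed class. The sketch below is an assumption about the general pattern only: SketchManager and its state attribute are hypothetical names, and the real BaseManager may persist different data and handle errors differently.

```python
import os
import pickle
import tempfile


class SketchManager:
    """Hypothetical stand-in for a stateful, pickle-backed manager."""

    def __init__(self, path=None, load_from_file=True):
        self.path = path
        self.is_loaded_from_file = False
        self.state = {}
        # Only load when requested and when a pickle already exists.
        if load_from_file and path is not None and os.path.exists(path):
            self.load()

    def save(self):
        """Write the state to the pickle file, if a path is set."""
        if self.path is None:
            return
        with open(self.path, "wb") as handle:
            pickle.dump(self.state, handle)

    def load(self):
        """Read the state back from the pickle file."""
        with open(self.path, "rb") as handle:
            self.state = pickle.load(handle)
        self.is_loaded_from_file = True


with tempfile.TemporaryDirectory() as tmp_dir:
    pickle_path = os.path.join(tmp_dir, "sketch_manager.pkl")
    first = SketchManager(pickle_path)  # nothing on disk yet
    first.state["rt_error"] = 30.0
    first.save()
    second = SketchManager(pickle_path)  # picks up the saved state
    restored = (second.is_loaded_from_file, second.state)
```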

class alphadia.workflow.managers.calibration_manager.CalibrationEstimators[source]

Bases: object

String constants for calibration estimators.

MOBILITY = 'mobility'
MZ = 'mz'
RT = 'rt'
class alphadia.workflow.managers.calibration_manager.CalibrationGroups[source]

Bases: object

String constants for calibration groups.

FRAGMENT = 'fragment'
PRECURSOR = 'precursor'
class alphadia.workflow.managers.calibration_manager.CalibrationManager(path: None | str = None, load_from_file: bool = True, has_ms1: bool = True, has_mobility: bool = True, **kwargs)[source]

Bases: BaseManager

__init__(path: None | str = None, load_from_file: bool = True, has_ms1: bool = True, has_mobility: bool = True, **kwargs)[source]

Contains, updates and applies all calibrations for a single run.

Calibrations are grouped into calibration groups. Each calibration group is applied to a single data structure (precursor dataframe, fragment dataframe, etc.). Each calibration group contains multiple estimators which each calibrate a single property (mz, rt, etc.). Each estimator is a Calibration object which contains the estimator function.

Parameters:
  • path (str, default=None) – Path where the current parameter set is saved to and loaded from.

  • load_from_file (bool, default=True) – If True, the manager will be loaded from file if it exists.

  • has_ms1 (bool, default=True) – If True, the calibration manager will include MS1 calibration. This will include an MS1 estimator in the precursor group.

  • has_mobility (bool, default=True) – If True, the calibration manager will include mobility calibration. This will include a mobility estimator in the precursor group.

  • kwargs – Will be passed to the parent class BaseManager, need to be valid keyword arguments.

property estimator_groups: dict[str, dict[str, CalibrationEstimator]]

Dictionary of calibration groups, mapping each group name to a dictionary of its estimators keyed by estimator name.

fit(df: DataFrame, group_name: str, plot: bool = True, figure_path: None | str = None)[source]

Fit all estimators in a calibration group.

Parameters:
  • df (pandas.DataFrame) – Dataframe containing the input and target columns

  • group_name (str) – Name of the calibration group

  • plot (bool, default=True) – If True, a plot of the calibration is generated.

  • figure_path (str, default=None) – If set, the generated plot is saved to the given path.

get_estimator(group_name: str, estimator_name: str) CalibrationEstimator | None[source]

Get an estimator from a calibration group.

Parameters:
  • group_name (str) – Name of the calibration group

  • estimator_name (str) – Name of the estimator

Returns:

The estimator object or None if not found

Return type:

CalibrationEstimator | None

predict(df: DataFrame, group_name: str)[source]

Predict all estimators in a calibration group.

Parameters:
  • df (pandas.DataFrame) – Dataframe containing the input and target columns

  • group_name (str) – Name of the calibration group
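The group/estimator layout used by fit and predict can be sketched without pandas. Everything below is illustrative: OffsetEstimator is a hypothetical estimator that learns a constant median offset, the column names are made up, and a dict of lists stands in for the dataframe. Only the group name 'precursor' and the estimator names 'mz' and 'rt' are taken from the constants documented above.

```python
from statistics import median


class OffsetEstimator:
    """Hypothetical estimator: learns one median offset between two columns."""

    def __init__(self, input_column, target_column, output_column):
        self.input_column = input_column
        self.target_column = target_column
        self.output_column = output_column
        self.offset = 0.0

    def fit(self, table):
        inputs = table[self.input_column]
        targets = table[self.target_column]
        self.offset = median(t - i for i, t in zip(inputs, targets))

    def predict(self, table):
        # Writes the calibrated values into a new column of the table.
        table[self.output_column] = [i + self.offset for i in table[self.input_column]]


# group name -> estimator name -> estimator
estimator_groups = {
    "precursor": {
        "mz": OffsetEstimator("mz_library", "mz_observed", "mz_calibrated"),
        "rt": OffsetEstimator("rt_library", "rt_observed", "rt_calibrated"),
    }
}


def fit(table, group_name):
    """Fit every estimator in the named group."""
    for estimator in estimator_groups[group_name].values():
        estimator.fit(table)


def predict(table, group_name):
    """Apply every estimator in the named group."""
    for estimator in estimator_groups[group_name].values():
        estimator.predict(table)


table = {
    "mz_library": [500.0, 600.0, 700.0],
    "mz_observed": [500.5, 600.5, 700.5],
    "rt_library": [10.0, 20.0, 30.0],
    "rt_observed": [12.0, 22.0, 32.0],
}
fit(table, "precursor")
predict(table, "precursor")
```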

setup_estimator_groups(calibration_config: dict[str, dict[str, dict[str, str | int | list[str]]]])[source]

Load calibration configuration.

Each calibration config is a list of calibration groups which consist of multiple estimators. For each estimator the model and model_args are used to request a model from the calibration_model_provider and to initialize it. The estimator is then initialized with the Calibration class and added to the group.

Parameters:

calibration_config (CalibrationConfig) – Calibration configuration

Example

Create a calibration manager with a single group and a single estimator:

calibration_manager = CalibrationManager()
calibration_manager.setup_estimator_groups({
    'mz_calibration': [
        {
            'name': 'mz',
            'model': 'LOESSRegression',
            'model_args': { 'n_kernels': 2 },
            'input_columns': [CalibCols.MZ_LIBRARY],
            'target_columns': [CalibCols.MZ_OBSERVED],
            'output_columns': [CalibCols.MZ_CALIBRATED],
            'transform_deviation': 1e6
        }
    ]
})
class alphadia.workflow.managers.fdr_manager.FDRManager(feature_columns: list, classifier_base: Classifier, config: Config, dia_cycle: None | ndarray = None, path: None | str = None, load_from_file: bool = True, random_state: int | None = None, **kwargs)[source]

Bases: BaseManager

__init__(feature_columns: list, classifier_base: Classifier, config: Config, dia_cycle: None | ndarray = None, path: None | str = None, load_from_file: bool = True, random_state: int | None = None, **kwargs)[source]

Contains, updates and applies classifiers for target-decoy competition-based false discovery rate (FDR) estimation.

Parameters:
  • feature_columns (list) – List of feature columns to use for the classifier

  • classifier_base (object) – Base classifier object to use for the FDR estimation

  • config (Config) – The workflow configuration object

  • dia_cycle (None | np.ndarray) – DIA cycle information, if applicable. If None, no DIA cycle information is used.

  • path (str, optional) – Path to the manager pickle on disk.

  • load_from_file (bool, optional) – If True, the manager will be loaded from file if it exists.

  • random_state (int, optional) – Random state for reproducibility.

property current_version
fit_predict(features_df: DataFrame, decoy_strategy: Literal['precursor', 'precursor_channel_wise', 'channel'], competitive: bool, df_fragments: DataFrame | None = None, decoy_channel: int = -1, version: int = -1)[source]

Fit the classifier and perform FDR estimation.

Parameters:
  • features_df (pd.DataFrame) – DataFrame containing the features to use for the classifier. Must contain the columns specified in self.feature_columns.

  • decoy_strategy (Literal["precursor", "precursor_channel_wise", "channel"]) – The decoy strategy.

  • competitive (bool) – Whether competitive scoring should be used.

  • df_fragments (None | pd.DataFrame) – Dataframe containing the fragments to use for the classifier. If None, no fragments are used.

  • decoy_channel (int) – Channel to use for decoy competition if decoy_strategy is “channel”. Defaults to -1, which means no decoy channel is used.

  • version (int) – Version of the classifier to use. If -1, uses the latest version. Defaults to -1.

Notes

The classifier_hash must be identical for every call of fit_predict for self._current_version to give the right index in self.classifier_store.

get_classifier(available_columns: list, version: int = -1) Classifier[source]

Get the classifier for a given set of feature columns and version. If no matching classifier is found in the store, the base classifier is returned instead.

Parameters:
  • available_columns (list) – List of feature columns

  • version (int) – Version of the classifier to get

Returns:

Classifier object

Return type:

object

load_classifier_store(path: None | str = None)[source]

Loads the classifier store from disk.

Parameters:

path (None | str) – Location of the classifier to load. Loads from alphadia/constants/classifier if None.

save_classifier_store(path: None | str = None, version: int = -1)[source]

Saves the classifier store to disk.

Parameters:
  • path (None | str) – Where to save the classifier. Saves to alphadia/constants/classifier if None.

  • version (int) – Version of the classifier to save. Takes the last classifier if -1 (default)

alphadia.workflow.managers.fdr_manager.column_hash(columns)[source]
alphadia.workflow.managers.fdr_manager.get_group_columns(competitive: bool, group_channels: bool) list[str][source]

Determine the group columns based on competitiveness and channel grouping.

Parameters:
  • competitive (bool) – If True, group candidates eluting at the same time by grouping them under the same ‘elution_group_idx’.

  • group_channels (bool) – If True and ‘competitive’ is also True, further groups candidates by ‘channel’.

Returns:

A list of column names to be used for grouping in the analysis. If competitive, this could be either [‘elution_group_idx’, ‘channel’] or [‘elution_group_idx’] depending on the group_channels flag. If not competitive, the list will always be [‘precursor_idx’].

Return type:

list
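The return values described above translate directly into a small function. This is a re-statement of the documented behaviour as a sketch, not the library source; sketch_group_columns is a hypothetical name.

```python
def sketch_group_columns(competitive: bool, group_channels: bool) -> list[str]:
    """Mirror the documented choice of grouping columns."""
    if competitive:
        if group_channels:
            return ["elution_group_idx", "channel"]
        return ["elution_group_idx"]
    # Without competition, every precursor forms its own group.
    return ["precursor_idx"]
```

Note that group_channels has no effect when competitive is False.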

class alphadia.workflow.managers.optimization_manager.OptimizationManager(config: None | Config = None, gradient_length: None | float = None, path: None | str = None, load_from_file: bool = True, **kwargs)[source]

Bases: BaseManager

__init__(config: None | Config = None, gradient_length: None | float = None, path: None | str = None, load_from_file: bool = True, **kwargs)[source]

Base class for all managers which handle parts of the workflow.

Parameters:
  • path (str, optional) – Path to the manager pickle on disk.

  • load_from_file (bool, optional) – If True, the manager will be loaded from file if it exists.

classifier_version: int
fwhm_mobility: float
fwhm_rt: float
mobility_error: float
ms1_error: float
ms2_error: float
num_candidates: int
rt_error: float
score_cutoff: float
update(*, ms1_error: float | None = None, ms2_error: float | None = None, rt_error: float | None = None, mobility_error: float | None = None, num_candidates: int | None = None, classifier_version: int | None = None, fwhm_rt: float | None = None, fwhm_mobility: float | None = None, score_cutoff: float | None = None)[source]

Update the optimization parameters with the values passed as keyword arguments.
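A keyword-only update with None defaults is commonly used so that callers can change a subset of parameters. The sketch below assumes that None means "leave unchanged", which the documentation above does not state explicitly. SketchOptimizationState and its default values are hypothetical, and only three of the documented parameters are shown.

```python
from dataclasses import dataclass


@dataclass
class SketchOptimizationState:
    """Hypothetical holder for a few of the optimization parameters."""

    ms1_error: float = 30.0
    rt_error: float = 300.0
    num_candidates: int = 1

    def update(self, *, ms1_error=None, rt_error=None, num_candidates=None):
        """Set only the parameters that were passed (assumed semantics)."""
        candidates = {
            "ms1_error": ms1_error,
            "rt_error": rt_error,
            "num_candidates": num_candidates,
        }
        for name, value in candidates.items():
            if value is not None:
                setattr(self, name, value)


state = SketchOptimizationState()
state.update(ms1_error=10.0, num_candidates=3)
```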

Manager handling the raw data file and its statistics.

class alphadia.workflow.managers.raw_file_manager.RawFileManager(config: None | Config = None, path: None | str = None, load_from_file: bool = False, **kwargs)[source]

Bases: BaseManager

__init__(config: None | Config = None, path: None | str = None, load_from_file: bool = False, **kwargs)[source]

Handles raw file loading and contains information on the raw file.

get_dia_data_object(dia_data_path: str) TimsTOFTranspose | AlphaRawBase | MzML | Sciex | Thermo[source]

Get the correct data class depending on the file extension of the DIA data file.

Parameters:

dia_data_path (str) – Path to the DIA data file

Returns:

object containing the DIA data

Return type:

DiaData
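Choosing a data class from the file extension can be sketched as a lookup table. The mapping below is an assumption for illustration: the class names come from the return annotation above, but which extension maps to which class is not stated in this reference. The sketch returns class names as strings so that it runs without the library installed.

```python
import os

# Assumed extension-to-class-name mapping, for illustration only.
ASSUMED_READERS = {
    ".d": "TimsTOFTranspose",
    ".hdf": "AlphaRawBase",
    ".mzml": "MzML",
    ".wiff": "Sciex",
    ".raw": "Thermo",
}


def pick_reader_name(dia_data_path: str) -> str:
    """Return the assumed reader class name for a DIA data path."""
    # Strip trailing separators so directory-style inputs still resolve.
    trimmed = dia_data_path.rstrip("/\\")
    extension = os.path.splitext(trimmed)[1].lower()
    if extension not in ASSUMED_READERS:
        raise ValueError(f"Unsupported file extension: {extension!r}")
    return ASSUMED_READERS[extension]
```

The lookup is case-insensitive, so "run_01.RAW" and "run_01.raw" resolve to the same entry.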

class alphadia.workflow.managers.timing_manager.TimingManager(path: None | str = None, load_from_file: bool = True, **kwargs)[source]

Bases: BaseManager

__init__(path: None | str = None, load_from_file: bool = True, **kwargs)[source]

Contains and updates timing information for the individual stages of the workflow.

set_end_time(workflow_stage: str)[source]

Stores the end time of the given stage of the workflow in the timings attribute and calculates the duration. Also saves the timing manager to disk.

Parameters:

workflow_stage (str) – The name under which the timing will be stored in the timings dict

set_start_time(workflow_stage: str)[source]

Stores the start time of the given stage of the workflow in the timings attribute. Also saves the timing manager to disk.

Parameters:

workflow_stage (str) – The name under which the timing will be stored in the timings dict
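The start/end bookkeeping can be sketched with a plain dictionary. The layout of the timings dict (keys "start", "end", "duration") and the class name SketchTimings are assumptions, and saving to disk is omitted.

```python
import time


class SketchTimings:
    """Hypothetical stand-in that records per-stage timings in a dict."""

    def __init__(self):
        self.timings = {}

    def set_start_time(self, workflow_stage: str) -> None:
        self.timings[workflow_stage] = {"start": time.time()}

    def set_end_time(self, workflow_stage: str) -> None:
        # Requires set_start_time to have been called for this stage.
        entry = self.timings[workflow_stage]
        entry["end"] = time.time()
        entry["duration"] = entry["end"] - entry["start"]


sketch_timings = SketchTimings()
sketch_timings.set_start_time("optimization")
sketch_timings.set_end_time("optimization")
```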