workflow¶
- class alphadia.workflow.base.WorkflowBase(instance_name: str, config: Config, quant_path: str = None)[source]¶
Bases: object
Base class for all workflows. This class is responsible for creating the workflow folder. It also initializes the calibration_manager and fdr_manager for the workflow.
- CALIBRATION_MANAGER_PKL_NAME = 'calibration_manager.pkl'¶
- FDR_MANAGER_PKL_NAME = 'fdr_manager.pkl'¶
- OPTIMIZATION_MANAGER_PKL_NAME = 'optimization_manager.pkl'¶
- RAW_FILE_MANAGER_PKL_NAME = 'raw_file_manager.pkl'¶
- TIMING_MANAGER_PKL_NAME = 'timing_manager.pkl'¶
- __init__(instance_name: str, config: Config, quant_path: str = None) None[source]¶
- Parameters:
instance_name (str) – Name for the particular workflow instance, e.g. the name of the raw file
config (Config) – Configuration for the workflow.
quant_path (str) – path to directory holding quant folders, relevant for distributed searches
- property calibration_manager: CalibrationManager¶
Calibration manager for the workflow. Owns the RT, IM, MZ calibration and the calibration data
- property dia_data: TimsTOFTranspose | AlphaRawBase | MzML | Sciex | Thermo¶
DIA data for the workflow. Owns the DIA data
- property optimization_manager: OptimizationManager¶
Optimization manager for the workflow. Owns the optimization data
- property path: str¶
Path to the workflow folder, e.g. first_search/quant/raw_file_xyz.raw
- property spectral_library: SpecLibFlat | None¶
Spectral library for the workflow. Owns the spectral library data
- property timing_manager: TimingManager¶
Timing manager for the workflow. Owns the timing data
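As a rough illustration of how the workflow folder and the pickle-name constants above might fit together, the sketch below joins a quant directory, an instance name, and a manager file name. The helpers `workflow_folder` and `manager_pickle_path` are hypothetical and not part of alphadia; only the constant values and the example path are taken from the listing above. That manager pickles live directly inside the workflow folder is an assumption.

```python
import os

# Constant values copied from the WorkflowBase listing above.
CALIBRATION_MANAGER_PKL_NAME = "calibration_manager.pkl"
TIMING_MANAGER_PKL_NAME = "timing_manager.pkl"


def workflow_folder(quant_path: str, instance_name: str) -> str:
    """Hypothetical helper: one folder per workflow instance."""
    return os.path.join(quant_path, instance_name)


def manager_pickle_path(folder: str, pkl_name: str) -> str:
    """Hypothetical helper: assumes pickles sit directly in the workflow folder."""
    return os.path.join(folder, pkl_name)


folder = workflow_folder(os.path.join("first_search", "quant"), "raw_file_xyz.raw")
calibration_pkl = manager_pickle_path(folder, CALIBRATION_MANAGER_PKL_NAME)
timing_pkl = manager_pickle_path(folder, TIMING_MANAGER_PKL_NAME)
```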
This module is responsible for creating and storing the configuration.
It allows updating the default configuration with one or more other configuration objects. The order of configs holds significance, with configurations later in the sequence overwriting previous values. Lists are always overwritten completely.
On demand, the current config can be visualized in a tree-like structure.
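The overwrite rules described above can be sketched with plain dictionaries. This is a minimal illustration, not the Config implementation: `merge_config` and all config keys shown are hypothetical. It assumes nested dicts are merged key by key while every other value, including lists, is replaced as a whole.

```python
import copy


def merge_config(base: dict, override: dict) -> dict:
    """Return a new dict in which values from `override` replace those in `base`.

    Nested dicts are merged key by key; any other value, including a list,
    is replaced completely.
    """
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical keys, chosen only for illustration.
default = {"search": {"tolerance": 5, "channels": [0, 4, 8]}, "threads": 4}
first = {"search": {"tolerance": 10}}
second = {"search": {"tolerance": 15, "channels": [0]}}

result = default
for cfg in (first, second):  # later configs overwrite earlier ones
    result = merge_config(result, cfg)
```

Here `second` wins for `tolerance`, and the `channels` list is replaced rather than extended.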
- class alphadia.workflow.config.Config(data: dict = None, name: str = 'default')[source]¶
Bases: UserDict
Dict-like config class that can read from and write to yaml and json files and allows updating with other config objects.
TODO: this class should be read-only, but currently mutable value elements can be mutated.
- set_value(key: str | tuple[str, ...], path: str | list[str]) None[source]¶
Set a config key.
Only certain keys are allowed to be set. Use a tuple key for nested access, e.g. (“library_prediction”, “peptdeep_model_path”).
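Nested access with a tuple key can be pictured as walking a dictionary one level per tuple element. The sketch below is an assumption about the general idea, not the body of set_value: the function `set_nested`, the `ALLOWED_KEYS` allow-list, and the example path value are hypothetical.

```python
# Hypothetical allow-list; the real set of permitted keys is not documented here.
ALLOWED_KEYS = {("library_prediction", "peptdeep_model_path")}


def set_nested(config: dict, key, value) -> None:
    """Set `value` at `key`, where `key` is a string or a tuple of strings."""
    path = (key,) if isinstance(key, str) else tuple(key)
    if path not in ALLOWED_KEYS:
        raise KeyError(f"key {path!r} may not be set")
    node = config
    for part in path[:-1]:  # descend to the parent of the final key
        node = node[part]
    node[path[-1]] = value


config = {"library_prediction": {"peptdeep_model_path": None}}
set_nested(config, ("library_prediction", "peptdeep_model_path"), "/models/custom")
```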
- update(configs: list[Config], do_print: bool = False)[source]¶
Updates the config with one or more other config objects.
The order of the configs matters: values from configs later in the sequence override values set by earlier ones.
All changes to the default config are tracked and stored in a separate dictionary to enable convenient visualization of the changes.
- Parameters:
configs (list of configs) – List of config objects to update the current config with. The order of the configs is important (last one wins).
do_print (bool, optional) – Whether to print the modified config. Default is False.
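One way to picture the change tracking is a dictionary that maps each modified key path to its default and updated value. The sketch below computes such a dictionary after the fact by comparing two plain dicts. The function `changed_values` and the keys are hypothetical, and the actual class may record changes differently, for example while updating.

```python
def changed_values(default: dict, updated: dict, prefix: tuple = ()) -> dict:
    """Map key paths to (default, updated) pairs for every leaf that differs."""
    changes = {}
    for key, new in updated.items():
        old = default.get(key)
        path = prefix + (key,)
        if isinstance(new, dict) and isinstance(old, dict):
            changes.update(changed_values(old, new, path))
        elif new != old:
            changes[path] = (old, new)
    return changes


# Hypothetical keys, chosen only for illustration.
default = {"general": {"threads": 4, "verbose": False}, "channels": [0, 4]}
updated = {"general": {"threads": 8, "verbose": False}, "channels": [0]}
changes = changed_values(default, updated)
```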
workflow.managers¶
Base class for Managers.
In AlphaDIA, a “manager” is a stateful object, and can be saved/loaded from disk. Additionally, it may offer functionality to change its state.
- class alphadia.workflow.managers.base.BaseManager(path: None | str = None, load_from_file: bool = True, figure_path: None | str = None, reporter: None | Pipeline | Backend = None)[source]¶
Bases: object
- __init__(path: None | str = None, load_from_file: bool = True, figure_path: None | str = None, reporter: None | Pipeline | Backend = None)[source]¶
Base class for all managers which handle parts of the workflow.
- Parameters:
path (str, optional) – Path to the manager pickle on disk.
load_from_file (bool, optional) – If True, the manager will be loaded from file if it exists.
- property is_loaded_from_file¶
Check if the manager was loaded from file.
- property path¶
Path to the manager pickle on disk.
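The save/load behaviour of a manager can be sketched with the standard pickle module. `SketchManager` below is a hypothetical stand-in, not the alphadia BaseManager. The `save` method name and the `state` attribute are assumptions; only `path`, `load_from_file`, and `is_loaded_from_file` mirror names from the listing above.

```python
import os
import pickle
import tempfile


class SketchManager:
    """Hypothetical stateful object that can be persisted to a pickle file."""

    def __init__(self, path=None, load_from_file=True):
        self.path = path
        self.is_loaded_from_file = False
        self.state = {}
        # Restore the state only if requested and a pickle already exists.
        if load_from_file and path is not None and os.path.exists(path):
            with open(path, "rb") as handle:
                self.state = pickle.load(handle)
            self.is_loaded_from_file = True

    def save(self):
        if self.path is None:
            return  # nothing to persist without a path
        with open(self.path, "wb") as handle:
            pickle.dump(self.state, handle)


with tempfile.TemporaryDirectory() as tmp:
    pkl = os.path.join(tmp, "sketch_manager.pkl")
    first = SketchManager(path=pkl)  # no file yet, starts empty
    first.state["rt_error"] = 30.0
    first.save()
    second = SketchManager(path=pkl)  # picks up the saved state
    fresh = SketchManager(path=pkl, load_from_file=False)  # ignores the file
```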
- class alphadia.workflow.managers.calibration_manager.CalibrationEstimators[source]¶
Bases: object
String constants for calibration estimators.
- MOBILITY = 'mobility'¶
- MZ = 'mz'¶
- RT = 'rt'¶
- class alphadia.workflow.managers.calibration_manager.CalibrationGroups[source]¶
Bases: object
String constants for calibration groups.
- FRAGMENT = 'fragment'¶
- PRECURSOR = 'precursor'¶
- class alphadia.workflow.managers.calibration_manager.CalibrationManager(path: None | str = None, load_from_file: bool = True, has_ms1: bool = True, has_mobility: bool = True, **kwargs)[source]¶
Bases: BaseManager
- __init__(path: None | str = None, load_from_file: bool = True, has_ms1: bool = True, has_mobility: bool = True, **kwargs)[source]¶
Contains, updates and applies all calibrations for a single run.
Calibrations are grouped into calibration groups. Each calibration group is applied to a single data structure (precursor dataframe, fragment dataframe, etc.). Each calibration group contains multiple estimators which each calibrate a single property (mz, rt, etc.). Each estimator is a Calibration object which contains the estimator function.
- Parameters:
path (str, default=None) – Path where the current parameter set is saved to and loaded from.
load_from_file (bool, default=True) – If True, the manager will be loaded from file if it exists.
has_ms1 (bool, default=True) – If True, the calibration manager will include MS1 calibration. This will include an MS1 estimator in the precursor group.
has_mobility (bool, default=True) – If True, the calibration manager will include mobility calibration. This will include a mobility estimator in the precursor group.
kwargs – Will be passed to the parent class BaseManager, need to be valid keyword arguments.
- property estimator_groups: dict[str, dict[str, CalibrationEstimator]]¶
Dictionary of calibration groups. Each group name maps to a dictionary of its estimators, keyed by estimator name.
- fit(df: DataFrame, group_name: str, plot: bool = True, figure_path: None | str = None)[source]¶
Fit all estimators in a calibration group.
- Parameters:
df (pandas.DataFrame) – Dataframe containing the input and target columns
group_name (str) – Name of the calibration group
plot (bool, default=True) – If True, a plot of the calibration is generated.
figure_path (str, default=None) – If set, the generated plot is saved to the given path.
- get_estimator(group_name: str, estimator_name: str) CalibrationEstimator | None[source]¶
Get an estimator from a calibration group.
- Parameters:
group_name (str) – Name of the calibration group
estimator_name (str) – Name of the estimator
- Returns:
The estimator object or None if not found
- Return type:
CalibrationEstimator | None
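The group-then-estimator lookup can be pictured as two dictionary levels. The following is a sketch under assumptions: the placeholder strings stand in for estimator objects, and the choice of estimators per group is only illustrative. The group and estimator names reuse the string constants listed above.

```python
# Placeholder strings stand in for estimator objects.
estimator_groups = {
    "precursor": {"mz": "precursor-mz", "rt": "precursor-rt", "mobility": "precursor-mobility"},
    "fragment": {"mz": "fragment-mz"},
}


def get_estimator_sketch(groups: dict, group_name: str, estimator_name: str):
    """Return the estimator, or None if the group or the estimator is missing."""
    return groups.get(group_name, {}).get(estimator_name)


found = get_estimator_sketch(estimator_groups, "precursor", "rt")
missing_estimator = get_estimator_sketch(estimator_groups, "fragment", "rt")
missing_group = get_estimator_sketch(estimator_groups, "unknown", "mz")
```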
- predict(df: DataFrame, group_name: str)[source]¶
Predict all estimators in a calibration group.
- Parameters:
df (pandas.DataFrame) – Dataframe containing the input and target columns
group_name (str) – Name of the calibration group
- setup_estimator_groups(calibration_config: dict[str, dict[str, dict[str, str | int | list[str]]]])[source]¶
Load calibration configuration.
Each calibration config is a list of calibration groups which consist of multiple estimators. For each estimator the model and model_args are used to request a model from the calibration_model_provider and to initialize it. The estimator is then initialized with the Calibration class and added to the group.
- Parameters:
calibration_config (CalibrationConfig) – Calibration configuration
Example
Create a calibration manager with a single group and a single estimator:
    calibration_manager = CalibrationManager()
    calibration_manager.load_config({
        'mz_calibration': [
            {
                'name': 'mz',
                'model': 'LOESSRegression',
                'model_args': {
                    'n_kernels': 2
                },
                'input_columns': [CalibCols.MZ_LIBRARY],
                'target_columns': [CalibCols.MZ_OBSERVED],
                'output_columns': [CalibCols.MZ_CALIBRATED],
                'transform_deviation': 1e6
            }
        ]
    })
- class alphadia.workflow.managers.fdr_manager.FDRManager(feature_columns: list, classifier_base: Classifier, config: Config, dia_cycle: None | ndarray = None, path: None | str = None, load_from_file: bool = True, random_state: int | None = None, **kwargs)[source]¶
Bases: BaseManager
- __init__(feature_columns: list, classifier_base: Classifier, config: Config, dia_cycle: None | ndarray = None, path: None | str = None, load_from_file: bool = True, random_state: int | None = None, **kwargs)[source]¶
Contains, updates and applies classifiers for target-decoy competition-based false discovery rate (FDR) estimation.
- Parameters:
feature_columns (list) – List of feature columns to use for the classifier
classifier_base (object) – Base classifier object to use for the FDR estimation
config (Config) – The workflow configuration object
dia_cycle (None | np.ndarray) – DIA cycle information, if applicable. If None, no DIA cycle information is used.
path (str, optional) – Path to the manager pickle on disk.
load_from_file (bool, optional) – If True, the manager will be loaded from file if it exists.
random_state (int, optional) – Random state for reproducibility.
- property current_version¶
- fit_predict(features_df: DataFrame, decoy_strategy: Literal['precursor', 'precursor_channel_wise', 'channel'], competitive: bool, df_fragments: DataFrame | None = None, decoy_channel: int = -1, version: int = -1)[source]¶
Fit the classifier and perform FDR estimation.
- Parameters:
features_df (pd.DataFrame) – DataFrame containing the features to use for the classifier. Must contain the columns specified in self.feature_columns.
decoy_strategy (Literal["precursor", "precursor_channel_wise", "channel"]) – The decoy strategy.
competitive (bool) – Whether competitive scoring should be used.
df_fragments (None | pd.DataFrame) – Dataframe containing the fragments to use for the classifier. If None, no fragments are used.
decoy_channel (int) – Channel to use for decoy competition if decoy_strategy is “channel”. Defaults to -1, which means no decoy channel is used.
version (int) – Version of the classifier to use. If -1, uses the latest version. Defaults to -1.
Notes
The classifier_hash must be identical for every call of fit_predict for self._current_version to give the right index in self.classifier_store.
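For readers unfamiliar with target-decoy FDR estimation, the sketch below shows the generic textbook calculation on a list of scores. It is not the FDRManager implementation: in the manager the scores would come from a trained classifier, and the function `q_values` and the example data are hypothetical.

```python
def q_values(scores, is_decoy):
    """Return one q-value per item, in input order.

    Items are ranked by descending score. The FDR at each rank is estimated
    as (#decoys so far) / (#targets so far); the q-value is the smallest FDR
    at that rank or at any lower-scoring rank.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    fdr = []
    n_decoy = n_target = 0
    for i in order:
        if is_decoy[i]:
            n_decoy += 1
        else:
            n_target += 1
        fdr.append(n_decoy / max(n_target, 1))
    # Enforce monotonicity from the lowest-scoring rank upwards.
    for k in range(len(fdr) - 2, -1, -1):
        fdr[k] = min(fdr[k], fdr[k + 1])
    q = [0.0] * len(scores)
    for rank, i in enumerate(order):
        q[i] = fdr[rank]
    return q


scores = [0.9, 0.8, 0.7, 0.6, 0.5]
is_decoy = [False, False, True, False, True]
q = q_values(scores, is_decoy)
```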
- get_classifier(available_columns: list, version: int = -1) Classifier[source]¶
Gets the classifier for a given set of feature columns and version. If the classifier is not found in the store, gets the base classifier instead.
- Parameters:
available_columns (list) – List of feature columns
version (int) – Version of the classifier to get
- Returns:
Classifier object
- Return type:
object
- load_classifier_store(path: None | str = None)[source]¶
Loads the classifier store from disk.
- Parameters:
path (None | str) – Location of the classifier to load. Loads from alphadia/constants/classifier if None.
- save_classifier_store(path: None | str = None, version: int = -1)[source]¶
Saves the classifier store to disk.
- Parameters:
path (None | str) – Where to save the classifier. Saves to alphadia/constants/classifier if None.
version (int) – Version of the classifier to save. Takes the last classifier if -1 (default)
- alphadia.workflow.managers.fdr_manager.get_group_columns(competitive: bool, group_channels: bool) list[str][source]¶
Determine the group columns based on competitiveness and channel grouping.
- Parameters:
competitive (bool) – If True, group candidates eluting at the same time by grouping them under the same ‘elution_group_idx’.
group_channels (bool) – If True and ‘competitive’ is also True, further groups candidates by ‘channel’.
- Returns:
A list of column names to be used for grouping in the analysis. If competitive, this could be either [‘elution_group_idx’, ‘channel’] or [‘elution_group_idx’] depending on the group_channels flag. If not competitive, the list will always be [‘precursor_idx’].
- Return type:
list
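The return values described above amount to a small decision table. The sketch below reproduces it under the hypothetical name `group_columns_sketch`; it follows the documented behaviour and is not copied from the library source.

```python
def group_columns_sketch(competitive: bool, group_channels: bool) -> list[str]:
    """Choose grouping columns as described for get_group_columns."""
    if competitive:
        if group_channels:
            return ["elution_group_idx", "channel"]
        return ["elution_group_idx"]
    # Without competition, group_channels has no effect.
    return ["precursor_idx"]
```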
- class alphadia.workflow.managers.optimization_manager.OptimizationManager(config: None | Config = None, gradient_length: None | float = None, path: None | str = None, load_from_file: bool = True, **kwargs)[source]¶
Bases: BaseManager
- __init__(config: None | Config = None, gradient_length: None | float = None, path: None | str = None, load_from_file: bool = True, **kwargs)[source]¶
Base class for all managers which handle parts of the workflow.
- Parameters:
path (str, optional) – Path to the manager pickle on disk.
load_from_file (bool, optional) – If True, the manager will be loaded from file if it exists.
- classifier_version: int¶
- fwhm_mobility: float¶
- fwhm_rt: float¶
- mobility_error: float¶
- ms1_error: float¶
- ms2_error: float¶
- num_candidates: int¶
- rt_error: float¶
- score_cutoff: float¶
- update(*, ms1_error: float | None = None, ms2_error: float | None = None, rt_error: float | None = None, mobility_error: float | None = None, num_candidates: int | None = None, classifier_version: int | None = None, fwhm_rt: float | None = None, fwhm_mobility: float | None = None, score_cutoff: float | None = None)[source]¶
Update the stored optimization parameters with the values passed as keyword arguments.
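A keyword-only update with None defaults is commonly read as "change only what was passed". The sketch below illustrates that pattern on a hypothetical `OptimizationStateSketch` class with three of the attributes listed above. That None leaves a value unchanged is an assumption about the real manager, and the initial values are made up.

```python
class OptimizationStateSketch:
    """Hypothetical holder for a few optimization parameters."""

    def __init__(self):
        # Initial values are illustrative only.
        self.ms1_error = 30.0
        self.rt_error = 600.0
        self.num_candidates = 1

    def update(self, *, ms1_error=None, rt_error=None, num_candidates=None):
        """Overwrite only the parameters that were passed explicitly."""
        passed = {
            "ms1_error": ms1_error,
            "rt_error": rt_error,
            "num_candidates": num_candidates,
        }
        for name, value in passed.items():
            if value is not None:  # assumption: None means "leave unchanged"
                setattr(self, name, value)


state = OptimizationStateSketch()
state.update(ms1_error=10.0, num_candidates=3)
```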
Manager handling the raw data file and its statistics.
- class alphadia.workflow.managers.raw_file_manager.RawFileManager(config: None | Config = None, path: None | str = None, load_from_file: bool = False, **kwargs)[source]¶
Bases: BaseManager
- __init__(config: None | Config = None, path: None | str = None, load_from_file: bool = False, **kwargs)[source]¶
Handles raw file loading and contains information on the raw file.
- get_dia_data_object(dia_data_path: str) TimsTOFTranspose | AlphaRawBase | MzML | Sciex | Thermo[source]¶
Get the correct data class depending on the file extension of the DIA data file.
- Parameters:
dia_data_path (str) – Path to the DIA data file
- Returns:
object containing the DIA data
- Return type:
DiaData
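Choosing a data class from the file extension can be sketched as a dictionary lookup. The extension-to-class mapping below is an assumption made for illustration and may not match the real dispatch. Class names are given as strings taken from the return annotation above, and `dia_class_name` is a hypothetical helper.

```python
import os

# Assumed mapping from lower-cased extension to class name; the real
# dispatch in RawFileManager may differ or cover more formats.
EXTENSION_TO_CLASS_NAME = {
    ".d": "TimsTOFTranspose",
    ".raw": "Thermo",
    ".mzml": "MzML",
    ".wiff": "Sciex",
}


def dia_class_name(dia_data_path: str) -> str:
    """Return the assumed class name for a DIA data path."""
    # Strip trailing separators so directory-style paths like 'run.d/' work.
    extension = os.path.splitext(dia_data_path.rstrip("/\\"))[1].lower()
    if extension not in EXTENSION_TO_CLASS_NAME:
        raise ValueError(f"unsupported file extension: {extension!r}")
    return EXTENSION_TO_CLASS_NAME[extension]
```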
- class alphadia.workflow.managers.timing_manager.TimingManager(path: None | str = None, load_from_file: bool = True, **kwargs)[source]¶
Bases: BaseManager
- __init__(path: None | str = None, load_from_file: bool = True, **kwargs)[source]¶
Contains and updates timing information for the portions of the workflow.