calibration

The calibration module provides functionality to calibrate measured quantities like mass-to-charge ratios, retention times, and ion mobilities.

Models for calibration.

class alphadia.calibration.models.LOESSRegression(n_kernels: int = 6, kernel_size: float = 2.0, polynomial_degree: int = 2, *, uniform: bool = False)

Bases: BaseEstimator, RegressorMixin

A scikit-learn estimator that implements LOESS-style local polynomial regression. The number of basis functions (kernels) can be set explicitly, which allows for faster and cheaper training and inference.

Parameters:

n_kernels (int) – default = 6, The number of local polynomial functions used to approximate the data. By default, the location and extent of the kernels are chosen so that each kernel contains an equal number of datapoints from the training set (see uniform).
kernel_size (float) – default = 2, A factor increasing the kernel size to overlap with the neighboring kernel.
polynomial_degree (int) – default = 2, Degree of the polynomial functions used for the local approximation.
uniform (bool) – default = False, If True, the kernels are distributed uniformly over the input space. If False, the kernels are distributed to contain an equal number of datapoints. For every kernel at least polynomial_degree + 1 datapoints are required.
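The two kernel placement strategies selected by uniform can be sketched with NumPy alone. The helper kernel_centers below is hypothetical and is not alphaDIA's implementation; it assumes that equal-count kernels are centered on the mean of each chunk of sorted samples and that uniform kernels are centered on the midpoints of equally wide bins.

```python
import numpy as np


def kernel_centers(x, n_kernels=6, uniform=False):
    """Return one center per kernel (illustrative helper, not alphaDIA code)."""
    x = np.sort(np.asarray(x, dtype=float).ravel())
    if uniform:
        # Equally wide bins over the input range; centers are the bin midpoints.
        edges = np.linspace(x[0], x[-1], n_kernels + 1)
        return (edges[:-1] + edges[1:]) / 2
    # Chunks holding (nearly) the same number of samples; centers are chunk means.
    chunks = np.array_split(x, n_kernels)
    return np.array([chunk.mean() for chunk in chunks])


rng = np.random.default_rng(0)
# Skewed data: most points lie close to zero.
x = rng.exponential(scale=1.0, size=600)

equal_count = kernel_centers(x, n_kernels=6)
uniform_centers = kernel_centers(x, n_kernels=6, uniform=True)

# Number of training points falling into each equally wide bin.
counts, _ = np.histogram(x, bins=np.linspace(x.min(), x.max(), 7))
```

On skewed data the uniform bins far from the bulk of the points contain few samples, which is why the requirement of at least polynomial_degree + 1 datapoints per kernel matters most when uniform is True.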

__init__(n_kernels: int = 6, kernel_size: float = 2.0, polynomial_degree: int = 2, *, uniform: bool = False)

Initialize the LOESS regression model.

fit(x: ndarray, y: ndarray) → LOESSRegression

Fit the model based on the provided training data.

Parameters:

x (np.ndarray) – float, of shape (n_samples,) or (n_samples, 1), Training data. Note that only a single feature is supported at the moment.
y (np.ndarray) – float, of shape (n_samples,) or (n_samples, 1), Target values.

Returns:

self – Returns the fitted estimator.

Return type:

object

predict(x: ndarray) → ndarray

Predict using the LOESS model.

Parameters:

x (np.ndarray) – float, of shape (n_samples,) or (n_samples, 1) Feature data. Note that only a single feature is supported at the moment.

Returns:

y (np.ndarray) – float, of shape (n_samples,), Target values.

set_fit_request(*, x: bool | None | str = '$UNCHANGED$') → LOESSRegression

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

x (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for x parameter in fit.

Returns:

self – The updated object.

Return type:

object

set_predict_request(*, x: bool | None | str = '$UNCHANGED$') → LOESSRegression

Configure whether metadata should be requested to be passed to the predict method.

The options for each parameter are:

True: metadata is requested, and passed to predict if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to predict.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

x (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for x parameter in predict.

Returns:

self – The updated object.

Return type:

object

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → LOESSRegression

Configure whether metadata should be requested to be passed to the score method.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

Returns:

self – The updated object.

Return type:

object

alphadia.calibration.models.construct_polynomial_regression(degree: int = 2, *, include_bias: bool = False) → Pipeline

Create a polynomial regression model.
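The signature suggests a scikit-learn Pipeline that expands the input into polynomial features before a linear fit. The function sketch_polynomial_regression below is a hypothetical reconstruction under that assumption, not the library's source; the step names "poly" and "linear" are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures


def sketch_polynomial_regression(degree=2, *, include_bias=False):
    """Polynomial feature expansion followed by ordinary least squares."""
    return Pipeline(
        [
            # include_bias=False leaves the intercept to LinearRegression.
            ("poly", PolynomialFeatures(degree=degree, include_bias=include_bias)),
            ("linear", LinearRegression()),
        ]
    )


# Noise-free quadratic: y = 1 + 2x + 3x^2
x = np.linspace(-1.0, 1.0, 50).reshape(-1, 1)
y = 1.0 + 2.0 * x.ravel() + 3.0 * x.ravel() ** 2

model = sketch_polynomial_regression(degree=2).fit(x, y)
pred = model.predict(np.array([[0.5]]))  # close to 2.75
```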

Calibration estimator module.

class alphadia.calibration.estimator.CalibrationEstimator(name: str, model: LOESSRegression | LinearRegression | Pipeline, input_columns: list[str], target_columns: list[str], output_columns: list[str], transform_deviation: None | str | float = None)

Bases: object

A single estimator for a property.

__init__(name: str, model: LOESSRegression | LinearRegression | Pipeline, input_columns: list[str], target_columns: list[str], output_columns: list[str], transform_deviation: None | str | float = None)

A single estimator for a property (mz, rt, etc.).

Calibration is performed by modeling the deviation of input values (e.g. mz_library) from an observed property (e.g. mz_observed) using a function (e.g. LinearRegression). Once calibrated, calibrated values (e.g. mz_calibrated) can be predicted from input values (e.g. mz_library). Additional explanatory variables (e.g. rt_library) can be added to the input values to improve the calibration.

Parameters:

name (str) – Name of the estimator for logging and plotting e.g. ‘mz’
model (CalibrationModel) – The estimator object instance which must have a fit and predict method. This will usually be a sklearn estimator or a custom estimator.
input_columns (list[str]) – The columns of the dataframe that are used as input for the estimator e.g. [‘mz_library’]. The first column is the property which should be calibrated, additional columns can be used as explaining variables e.g. [‘mz_library’, ‘rt_library’].
target_columns (list[str]) – The columns of the dataframe that are used as target for the estimator e.g. [‘mz_observed’]. At the moment only one target column is supported.
output_columns (list[str]) – The columns of the dataframe that are used as output for the estimator e.g. [‘mz_calibrated’]. At the moment only one output column is supported.
transform_deviation (None | str | float) – If set to a valid float, the deviation is expressed as a fraction of the input value, scaled by this factor, e.g. 1e6 for ppm. If set to None, the deviation is expressed in absolute units.
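The effect of transform_deviation can be illustrated with a small NumPy sketch. The function deviation below is hypothetical; it assumes the deviation is taken as target minus input and that a relative deviation is that difference divided by the input value and multiplied by the factor. The sign convention used by the library may differ.

```python
import numpy as np


def deviation(input_values, target_values, transform_deviation=None):
    """Absolute or relative deviation of targets from inputs (assumed convention)."""
    input_values = np.asarray(input_values, dtype=float)
    target_values = np.asarray(target_values, dtype=float)
    absolute = target_values - input_values
    if transform_deviation is None:
        # Absolute units, e.g. Th for m/z.
        return absolute
    # Fraction of the input value, scaled by the factor (1e6 gives ppm).
    return absolute / input_values * float(transform_deviation)


mz_library = np.array([500.0, 1000.0])
mz_observed = np.array([500.005, 1000.005])

abs_dev = deviation(mz_library, mz_observed)       # approximately [0.005, 0.005]
ppm_dev = deviation(mz_library, mz_observed, 1e6)  # approximately [10.0, 5.0]
```

The same absolute offset corresponds to a smaller relative deviation at higher m/z, which is why a relative scale such as ppm is the usual choice for mass accuracy.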

calc_deviation(df: DataFrame) → ndarray

Calculate the deviations between the input, target and calibrated values.

Parameters:

df (pd.DataFrame) – Dataframe containing the input and target columns

Returns:

Array of shape (n_samples, 3 + n_input_columns). The second dimension contains the observed deviation, calibrated deviation, residual deviation and the input values.

Return type:

np.ndarray
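The layout of the returned array can be sketched as follows. The definitions used here are assumptions for illustration: the observed deviation is taken as target minus input, the calibrated deviation as calibrated value minus input, and the residual deviation as their difference. All values are made up.

```python
import numpy as np

mz_library = np.array([500.0, 600.0, 700.0])           # first input column
rt_library = np.array([10.0, 20.0, 30.0])              # additional input column
mz_observed = np.array([500.004, 600.006, 700.005])    # target column
mz_calibrated = np.array([500.003, 600.005, 700.006])  # model output

observed_dev = mz_observed - mz_library
calibrated_dev = mz_calibrated - mz_library
residual_dev = observed_dev - calibrated_dev

# Three deviation columns followed by one column per input.
inputs = np.column_stack([mz_library, rt_library])
deviation = np.column_stack([observed_dev, calibrated_dev, residual_dev, inputs])
```

With two input columns the result has 3 + 2 = 5 columns, and column index 2 holds the residual deviation.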

ci(df: DataFrame, ci: float = 0.95) → float

Calculate the residual deviation at the given confidence interval.

Parameters:

df (pandas.DataFrame) – Dataframe containing the input and target columns
ci (float, default=0.95) – Confidence level, e.g. 0.95 for a 95% interval.

Returns:

The residual deviation after calibration at the given confidence level.

Return type:

float
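One plausible way to reduce residual deviations to a single number is to take the two-sided percentile bounds for the requested level and average their absolute values. The function residual_ci below is a hypothetical sketch of that reading and may not match the library's exact computation.

```python
import numpy as np


def residual_ci(residual_deviation, ci=0.95):
    """Mean absolute value of the two-sided percentile bounds (assumed definition)."""
    # For ci=0.95 these are the 2.5th and 97.5th percentiles.
    bounds = [100 * (1 - ci) / 2, 100 * (1 + ci) / 2]
    lower, upper = np.percentile(residual_deviation, bounds)
    return float(np.mean(np.abs([lower, upper])))


# Evenly spaced residuals between -1 and 1.
residuals = np.linspace(-1.0, 1.0, 201)

wide = residual_ci(residuals, ci=0.95)   # close to 0.95
narrow = residual_ci(residuals, ci=0.5)  # close to 0.5
```

A value computed this way can serve as a tolerance around calibrated values: a higher confidence level gives a wider tolerance.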

fit(df: DataFrame, *, plot: bool = True, figure_path: str | None = None) → None

Fit the estimator based on the input and target columns of the dataframe.

Parameters:

df (pd.DataFrame) – Dataframe containing the input and target columns
plot (bool, default=True) – If True, a plot of the calibration is generated.
figure_path (str | None, default=None) – If not None, a plot of the calibration is generated and saved to this path.

Returns:

Array of shape (n_input_columns, ) containing the mean absolute deviation of the residual deviation at the given confidence interval

Return type:

np.ndarray

classmethod from_file(file_name: str) → CalibrationEstimator

Load the estimator from a pickle file.

Parameters:

file_name (str) – Path to the pickle file

predict(df: DataFrame, *, inplace: bool = True) → ndarray | None

Perform a prediction based on the input columns of the dataframe.

Parameters:

df (pd.DataFrame) – Dataframe containing the input and target columns
inplace (bool, default=True) – If True, the prediction is added as a new column to the dataframe.

Returns:

Array of shape (n_samples, ) containing the prediction

Return type:

np.ndarray

save(file_name: str) → None

Save the estimator to a pickle file.

Parameters:

file_name (str) – Path to the pickle file
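The save and from_file pair follows the usual pickle round-trip pattern. The class TinyEstimator below is a hypothetical stand-in that only stores two attributes; it pickles its state as a plain dictionary, which is an assumption made to keep the sketch self-contained and may differ from what CalibrationEstimator writes to disk.

```python
import os
import pickle
import tempfile


class TinyEstimator:
    """Hypothetical stand-in with a save / from_file pair."""

    def __init__(self, name, offset=0.0):
        self.name = name
        self.offset = offset

    def save(self, file_name):
        # Write the state needed to rebuild the object.
        with open(file_name, "wb") as handle:
            pickle.dump({"name": self.name, "offset": self.offset}, handle)

    @classmethod
    def from_file(cls, file_name):
        # Read the state back and construct a new instance from it.
        with open(file_name, "rb") as handle:
            state = pickle.load(handle)
        return cls(**state)


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "mz_estimator.pkl")
    TinyEstimator("mz", offset=0.002).save(path)
    restored = TinyEstimator.from_file(path)
```

Unpickling can execute arbitrary code, so pickle files should only be loaded from trusted sources.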

class alphadia.calibration.estimator.CalibrationModelProvider

Bases: object

A provider for calibration models that can be used in the calibration process.

__init__()

Provides a collection of scikit-learn compatible models for calibration.

get_model(model_name: str) → type[LOESSRegression | LinearRegression | Pipeline]

Get a model template by name.

Parameters:

model_name (str) – Name of the model

Returns:

The model template which must have a fit and predict method.

Return type:

type[CalibrationModel]

register_model(model_name: str, model_template: type[LOESSRegression | LinearRegression | Pipeline]) → None

Register a model template under the given name.

Parameters:

model_name (str) – Name of the model
model_template (type[CalibrationModel]) – The model template which must have a fit and predict method.
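The register and lookup pattern described here can be sketched as a dictionary from names to classes. Both SketchModelProvider and MeanModel are hypothetical and written in plain Python; the error handling for unknown names is an assumption, not documented behavior.

```python
class SketchModelProvider:
    """Hypothetical registry mapping model names to model classes."""

    def __init__(self):
        self._models = {}

    def register_model(self, model_name, model_template):
        # Store the class itself, not an instance.
        self._models[model_name] = model_template

    def get_model(self, model_name):
        if model_name not in self._models:
            raise KeyError(
                f"Unknown model {model_name!r}; registered: {sorted(self._models)}"
            )
        return self._models[model_name]


class MeanModel:
    """Hypothetical template with the required fit and predict methods."""

    def fit(self, x, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, x):
        return [self.mean_ for _ in x]


provider = SketchModelProvider()
provider.register_model("MeanModel", MeanModel)

# get_model returns a class, so it is instantiated before fitting.
model = provider.get_model("MeanModel")()
prediction = model.fit([1, 2, 3], [2.0, 4.0, 6.0]).predict([10, 20])
```

Returning the template rather than an instance means every caller gets a fresh, unfitted model.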