calibration
The calibration module provides functionality to calibrate measured quantities like mass-to-charge ratios, retention times, and ion mobilities.
Models for calibration.
- class alphadia.calibration.models.LOESSRegression(n_kernels: int = 6, kernel_size: float = 2.0, polynomial_degree: int = 2, *, uniform: bool = False)
Bases: BaseEstimator, RegressorMixin
scikit-learn estimator which implements a LOESS-style local polynomial regression. The number of basis functions or kernels can be explicitly defined, which allows for faster and cheaper training and inference.
- Parameters:
n_kernels (int) – default = 6, The number of local polynomial functions used to approximate the data. By default, the location and extent of the kernels are chosen so that each kernel contains an equal number of datapoints from the training set.
kernel_size (float) – default = 2, A factor increasing the kernel size to overlap with the neighboring kernel.
polynomial_degree (int) – default = 2, Degree of the polynomial functions used for the local approximation.
uniform (bool) – default = False, If True, the kernels are distributed uniformly over the input space. If False, the kernels are distributed to contain an equal number of datapoints. For every kernel at least polynomial_degree + 1 datapoints are required.
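The two placement strategies selected by uniform can be illustrated with a small, self-contained sketch. This is not alphadia's implementation: the helper names uniform_centers and equal_count_centers are hypothetical, and using bin midpoints and chunk medians as kernel centers is an assumption made only for illustration.

```python
import statistics


def uniform_centers(x, n_kernels):
    """Hypothetical helper: centers at the midpoints of equal-width bins."""
    lo, hi = min(x), max(x)
    width = (hi - lo) / n_kernels
    return [lo + width * (i + 0.5) for i in range(n_kernels)]


def equal_count_centers(x, n_kernels):
    """Hypothetical helper: centers at the medians of equally sized chunks."""
    xs = sorted(x)
    n = len(xs)
    if n < n_kernels:
        raise ValueError("need at least one datapoint per kernel")
    centers = []
    for i in range(n_kernels):
        # Integer arithmetic splits the sorted data into near-equal chunks.
        chunk = xs[i * n // n_kernels:(i + 1) * n // n_kernels]
        centers.append(statistics.median(chunk))
    return centers


# Skewed data: most points lie below 1, a few lie far out.
x = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 5.0, 10.0, 20.0, 40.0]
uniform = uniform_centers(x, 3)
by_count = equal_count_centers(x, 3)
```

On skewed data like this, equal-count placement puts two of the three centers in the dense region below 1, while uniform placement spreads them evenly over the full range.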
- __init__(n_kernels: int = 6, kernel_size: float = 2.0, polynomial_degree: int = 2, *, uniform: bool = False)
Initialize the LOESS regression model.
- fit(x: ndarray, y: ndarray) LOESSRegression
Fit the model based on the provided training data.
- Parameters:
x (np.ndarray, float) – of shape (n_samples,) or (n_samples, 1), Training data. Note that only a single feature is supported at the moment.
y (np.ndarray, float) – of shape (n_samples,) or (n_samples, 1), Target values.
- Returns:
self – Returns the fitted estimator.
- Return type:
object
- predict(x: ndarray) ndarray
Predict using the LOESS model.
- Parameters:
x (np.ndarray, float) – of shape (n_samples,) or (n_samples, 1), Feature data. Note that only a single feature is supported at the moment.
- Returns:
y (np.ndarray, float) – of shape (n_samples,), Target values.
- set_fit_request(*, x: bool | None | str = '$UNCHANGED$') LOESSRegression
Configure whether metadata should be requested to be passed to the fit method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
x (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for x parameter in fit.
- Returns:
self – The updated object.
- Return type:
object
- set_predict_request(*, x: bool | None | str = '$UNCHANGED$') LOESSRegression
Configure whether metadata should be requested to be passed to the predict method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to predict if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to predict.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
x (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for x parameter in predict.
- Returns:
self – The updated object.
- Return type:
object
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') LOESSRegression
Configure whether metadata should be requested to be passed to the score method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.
- Returns:
self – The updated object.
- Return type:
object
- alphadia.calibration.models.construct_polynomial_regression(degree: int = 2, *, include_bias: bool = False) Pipeline
Create a polynomial regression model.
Calibration estimator module.
- class alphadia.calibration.estimator.CalibrationEstimator(name: str, model: LOESSRegression | LinearRegression | Pipeline, input_columns: list[str], target_columns: list[str], output_columns: list[str], transform_deviation: None | str | float = None)
Bases: object
A single estimator for a property.
- __init__(name: str, model: LOESSRegression | LinearRegression | Pipeline, input_columns: list[str], target_columns: list[str], output_columns: list[str], transform_deviation: None | str | float = None)
A single estimator for a property (mz, rt, etc.).
Calibration is performed by modeling the deviation of input values (e.g. mz_library) from an observed property (e.g. mz_observed) using a function (e.g. LinearRegression). Once the estimator is fitted, calibrated values (e.g. mz_calibrated) can be predicted from input values (e.g. mz_library). Additional explanatory variables (e.g. rt_library) can be added to the input values to improve the calibration.
- Parameters:
name (str) – Name of the estimator for logging and plotting e.g. ‘mz’
model (CalibrationModel) – The estimator object instance which must have a fit and predict method. This will usually be a sklearn estimator or a custom estimator.
input_columns (list[str]) – The columns of the dataframe that are used as input for the estimator e.g. [‘mz_library’]. The first column is the property which should be calibrated; additional columns can be used as explanatory variables e.g. [‘mz_library’, ‘rt_library’].
target_columns (list[str]) – The columns of the dataframe that are used as target for the estimator e.g. [‘mz_observed’]. At the moment only one target column is supported.
output_columns (list[str]) – The columns of the dataframe that are used as output for the estimator e.g. [‘mz_calibrated’]. At the moment only one output column is supported.
transform_deviation (None | str | float, default=None) – If set to a valid float, the deviation is expressed as a fraction of the input value, scaled by that factor, e.g. 1e6 for ppm. If set to None, the deviation is expressed in absolute units.
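The effect of transform_deviation can be sketched with plain arithmetic. The helper below is hypothetical and only covers the None and float cases described above; the sign convention (observed minus input) is an assumption and is not taken from alphadia's source.

```python
def deviation(input_value, observed_value, transform_deviation=None):
    """Hypothetical helper: deviation of an observed value from its input value."""
    absolute = observed_value - input_value  # assumed sign convention
    if transform_deviation is None:
        # Absolute units, e.g. Th for m/z.
        return absolute
    # Relative units: fraction of the input value times the scaling factor.
    return absolute / input_value * transform_deviation


mz_library = 500.0
mz_observed = 500.005

abs_dev = deviation(mz_library, mz_observed)       # about 0.005 in absolute units
ppm_dev = deviation(mz_library, mz_observed, 1e6)  # about 10 ppm
```

Expressing m/z deviations in ppm keeps them comparable across the mass range, since the absolute error of many instruments grows with m/z.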
- calc_deviation(df: DataFrame) ndarray
Calculate the deviations between the input, target and calibrated values.
- Parameters:
df (pd.DataFrame) – Dataframe containing the input and target columns
- Returns:
Array of shape (n_samples, 3 + n_input_columns). The second dimension contains the observed deviation, calibrated deviation, residual deviation and the input values.
- Return type:
np.ndarray
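The column layout of the returned array can be pictured with a pure-Python sketch. The function deviation_rows is hypothetical, and the definitions used here (observed = target minus first input, calibrated = calibrated value minus first input, residual = observed minus calibrated) are assumptions chosen to match the description above, shown in absolute units.

```python
def deviation_rows(inputs, targets, calibrated):
    """Hypothetical helper: one row per sample with 3 + n_input_columns entries."""
    rows = []
    for x, y, c in zip(inputs, targets, calibrated):
        observed_dev = y - x[0]      # target vs. the property being calibrated
        calibrated_dev = c - x[0]    # model prediction vs. the same property
        residual_dev = observed_dev - calibrated_dev
        rows.append([observed_dev, calibrated_dev, residual_dev, *x])
    return rows


# Two samples with two input columns each, e.g. mz_library and rt_library.
inputs = [[500.0, 10.0], [600.0, 20.0]]
targets = [500.5, 600.25]
calibrated = [500.25, 600.25]

table = deviation_rows(inputs, targets, calibrated)
```

Each row has 3 + 2 = 5 entries here; a residual of zero in the second row means the calibrated value matches the observation exactly.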
- ci(df: DataFrame, ci: float = 0.95) float
Calculate the residual deviation at the given confidence interval.
- Parameters:
df (pandas.DataFrame) – Dataframe containing the input and target columns
ci (float, default=0.95) – confidence interval
- Returns:
the confidence interval of the residual deviation after calibration
- Return type:
float
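One plausible reading of this value is the average magnitude of the lower and upper percentile bounds that enclose the central ci fraction of the residuals. The sketch below follows that reading; it is an assumption rather than a description of alphadia's code, and percentile and residual_ci are hypothetical helpers written with the standard library only.

```python
def percentile(values, q):
    """Linear-interpolation percentile for q in [0, 100]."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100
    lower = int(pos)
    upper = min(lower + 1, len(xs) - 1)
    frac = pos - lower
    return xs[lower] + (xs[upper] - xs[lower]) * frac


def residual_ci(residuals, ci=0.95):
    """Hypothetical helper: mean absolute bound of the central ci interval."""
    lo = percentile(residuals, 100 * (1 - ci) / 2)
    hi = percentile(residuals, 100 * (1 + ci) / 2)
    return (abs(lo) + abs(hi)) / 2


residuals = [-2.0, -1.0, 0.0, 1.0, 2.0]
half_width = residual_ci(residuals, ci=0.5)  # bounds at the 25th and 75th percentiles
```

A value like this can serve as a search tolerance: a narrower residual distribution after calibration yields a smaller number.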
- fit(df: DataFrame, *, plot: bool = True, figure_path: str | None = None) None
Fit the estimator based on the input and target columns of the dataframe.
- Parameters:
df (pd.DataFrame) – Dataframe containing the input and target columns
plot (bool, default=True) – If True, a plot of the calibration is generated.
figure_path (str, default=None) – If not None, a plot of the calibration is generated and saved.
- Returns:
Array of shape (n_input_columns, ) containing the mean absolute deviation of the residual deviation at the given confidence interval
- Return type:
np.ndarray
- classmethod from_file(file_name: str) CalibrationEstimator
Load the estimator from a pickle file.
- Parameters:
file_name (str) – Path to the pickle file
- predict(df: DataFrame, *, inplace: bool = True) ndarray | None
Perform a prediction based on the input columns of the dataframe.
- Parameters:
df (pd.DataFrame) – Dataframe containing the input and target columns
inplace (bool, default=True) – If True, the prediction is added as a new column to the dataframe.
- Returns:
Array of shape (n_samples, ) containing the prediction
- Return type:
np.ndarray
- class alphadia.calibration.estimator.CalibrationModelProvider
Bases: object
A provider for calibration models that can be used in the calibration process.
- get_model(model_name: str) type[LOESSRegression | LinearRegression | Pipeline]
Get a model template by name.
- Parameters:
model_name (str) – Name of the model
- Returns:
The model template which must have a fit and predict method.
- Return type:
type[CalibrationModel]
- register_model(model_name: str, model_template: type[LOESSRegression | LinearRegression | Pipeline]) None
Register a model template with a given name.
- Parameters:
model_name (str) – Name of the model
model_template (type[CalibrationModel]) – The model template which must have a fit and predict method.
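The register/get pattern described here can be sketched as a name-to-class registry. ModelRegistry and MeanModel are hypothetical stand-ins written for illustration; they are not alphadia classes, and raising ValueError for an unknown name is an assumption about reasonable behavior.

```python
class ModelRegistry:
    """Hypothetical stand-in for a provider that maps names to model templates."""

    def __init__(self):
        self._templates = {}

    def register_model(self, model_name, model_template):
        # Store the class itself (a template), not an instance.
        self._templates[model_name] = model_template

    def get_model(self, model_name):
        if model_name not in self._templates:
            raise ValueError(f"Unknown model: {model_name}")
        return self._templates[model_name]


class MeanModel:
    """Toy model with fit and predict: always predicts the training mean."""

    def fit(self, x, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, x):
        return [self.mean_ for _ in x]


registry = ModelRegistry()
registry.register_model("mean", MeanModel)

# The registry returns a class, so each caller gets a fresh instance.
model = registry.get_model("mean")()
model.fit([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
prediction = model.predict([10.0, 20.0])
```

Storing templates rather than instances means fitted state is never shared between estimators that request the same model name.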