DIA Transfer Learning for Dimethyl-Labeled Samples

This tutorial was created using AlphaDIA 1.10.1. Please be aware that details may differ in your version.

This guide demonstrates how to use AlphaDIA’s transfer learning capabilities to analyze samples whose retention time and fragmentation behaviour must be learned from the data. We will use replicates of dimethyl-labeled samples with a three-step workflow that is fully supported from AlphaDIA 1.10.0 onwards via both the GUI and the CLI.

The integrated workflow eliminates the need for multiple separate searches and consists of:

  1. Transfer Learning step: Generates a custom PeptDeep model fine-tuned to your specific samples, predicting retention times, fragmentation patterns, and charge state of your dimethyl-labeled peptides.

  2. First search step: Uses the fine-tuned model to build a spectral library that contains peptide information with accurate mass, retention time, and fragmentation predictions.

  3. Second search step: Uses the library created in the previous step for cross-run quantification (Match Between Runs), providing the most accurate and comprehensive results.

The entire process is fully automated, with data seamlessly transferred between steps and optimal parameters applied at each phase. You configure the search workflow in the settings tab, and AlphaDIA automatically handles information passing between search steps and activates required settings for transfer learning and MBR.

Prerequisites

This guide requires AlphaDIA 1.10.0 or higher.

Before starting, ensure you have:

  • A machine with at least 64 gigabytes of memory

  • Test data available for download here (replicates of dimethyl-labeled tryptic HeLa digests with light isotope)

  • Valid AlphaDIA installation with GUI (one-click installer recommended)

  • The BundledExecutionEngine selected as your execution engine

Make sure you have a project folder set up that contains the raw data, the FASTA file, and an output folder.
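The project folder check above can be sketched as a small pre-flight script. This is a minimal sketch and not part of AlphaDIA: the function name `check_project`, the `.raw` and `.fasta`/`.fa` suffixes, and the `output` subfolder name are assumptions chosen for illustration, so adjust them to your own layout.

```python
from pathlib import Path


def check_project(project_dir, raw_suffixes=(".raw",),
                  fasta_suffixes=(".fasta", ".fa"), output_name="output"):
    """Return a list of problems found in the project folder (empty if none).

    All suffixes and the output folder name are illustrative assumptions,
    not values prescribed by AlphaDIA.
    """
    project = Path(project_dir)
    if not project.is_dir():
        return [f"project folder not found: {project}"]

    entries = list(project.iterdir())
    problems = []

    # Raw data may be files or folders depending on the vendor format.
    if not any(p.suffix.lower() in raw_suffixes for p in entries):
        problems.append(f"no raw data with suffix {raw_suffixes} found")

    # The FASTA file is used for library prediction.
    if not any(p.is_file() and p.suffix.lower() in fasta_suffixes for p in entries):
        problems.append(f"no FASTA file with suffix {fasta_suffixes} found")

    # The output folder receives the results of all search steps.
    if not (project / output_name).is_dir():
        problems.append(f"output folder '{output_name}' is missing")

    return problems


if __name__ == "__main__":
    for problem in check_project("."):
        print("WARNING:", problem)
```

Running the script from inside the project folder prints one warning per missing item and nothing when the layout looks complete.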

Setting Up Your Project

1. Configure Input/Output

Launch AlphaDIA and configure your inputs. Set your output folder to a location of your choice where results will be stored. Select all raw files and add them to the file list. You’ll also need to add the FASTA file, which will be used for library prediction.

3. Start the Workflow

With your settings configured, click the “Run Workflow” button to start the integrated multistep search process. The workflow executes in three automated phases.

Results

1. Output Folder

After the workflow completes, your results will be organized as follows:

The final analysis results will be located in the root of your output folder. These are the primary files you’ll use for your research.

Intermediate results from earlier steps are stored in subfolders named transfer and library, allowing you to examine the results of individual stages if needed.

For downstream analysis and interpretation, you’ll primarily work with the results stored in the root of the output folder:

  • The precursor-level file precursors.tsv, which contains detailed information about all identified peptide precursors, including retention times, intensities, and confidence metrics.

  • The protein matrix file pg.matrix.tsv, which summarizes protein-level quantification across all your samples, making it ideal for comparative analyses.
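As a starting point for comparative analyses, the protein matrix can be summarized with the Python standard library alone. The following is a minimal sketch under stated assumptions: it assumes pg.matrix.tsv is tab-separated with one identifier column (called `pg` here, which is an assumption) and one intensity column per run. Check the header of your own file before relying on it; `count_quantified` is a hypothetical helper, not an AlphaDIA function.

```python
import csv


def count_quantified(matrix_path, id_columns=("pg",)):
    """Count protein groups with a positive intensity in each sample column.

    `id_columns` names the non-sample columns; the default is an assumption
    about the file layout and may need to be adapted.
    """
    with open(matrix_path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        sample_columns = [c for c in (reader.fieldnames or []) if c not in id_columns]
        counts = {column: 0 for column in sample_columns}
        for row in reader:
            for column in sample_columns:
                value = (row[column] or "").strip()
                try:
                    intensity = float(value)
                except ValueError:
                    continue  # empty or non-numeric cell: not quantified
                # NaN fails this comparison, so it is skipped as well.
                if intensity > 0:
                    counts[column] += 1
    return counts


if __name__ == "__main__":
    for run, n in count_quantified("pg.matrix.tsv").items():
        print(f"{run}\t{n} protein groups quantified")
```

Counting only positive, numeric values treats empty cells, zeros, and NaN as missing, which is a deliberate simplification for a quick completeness check.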

If you wish to use the custom PeptDeep models developed during your analysis for future projects, you can find them in transfer/peptdeep.transfer. These models can be valuable for analyzing similar samples in the future.
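To confirm that a finished run produced the layout described above, a short check can be sketched as follows. The file and folder names are taken from this guide; the helper `missing_outputs` itself is hypothetical and only tests for existence, not for content.

```python
from pathlib import Path

# Paths relative to the output folder, as described in this guide.
EXPECTED_OUTPUTS = [
    "precursors.tsv",              # precursor-level results
    "pg.matrix.tsv",               # protein group matrix
    "transfer",                    # intermediate results of the transfer learning step
    "library",                     # intermediate results of the first search step
    "transfer/peptdeep.transfer",  # fine-tuned PeptDeep models
]


def missing_outputs(output_dir, expected=EXPECTED_OUTPUTS):
    """Return the expected paths that do not exist below output_dir."""
    root = Path(output_dir)
    return [relative for relative in expected if not (root / relative).exists()]


if __name__ == "__main__":
    missing = missing_outputs(".")
    if missing:
        print("Missing outputs:", ", ".join(missing))
    else:
        print("All expected outputs are present.")
```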

2. Search Performance

The stat.tsv file provides a quick overview of the number of precursors and protein groups identified across your samples, along with other relevant performance metrics. This is useful for quickly assessing the success of your analysis.

| Run | Precursors (Transfer) | Precursors (First Search) | Precursors (Final) |
| --- | --- | --- | --- |
| 20240408_OA1_Evo12_31min_TiHe_SA_H032_E32_F-40_B3 | 39,425 | 57,757 | 68,195 |
| 20240408_OA1_Evo12_31min_TiHe_SA_H032_E32_F-40_B2 | 39,062 | 58,165 | 68,192 |
| 20240408_OA1_Evo12_31min_TiHe_SA_H032_E32_F-40_B1 | 34,720 | 60,075 | 69,072 |

These numbers demonstrate how each step of the workflow increases the number of identified precursors: per run, the count rises from roughly 35,000 to 39,000 after the transfer learning step, to roughly 58,000 to 60,000 after the first search, and to roughly 68,000 to 69,000 in the final result.
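The gain per step can be worked out directly from the precursor counts reported above. The sketch below only does this arithmetic; the counts are copied from this guide, and `relative_gain` is a helper introduced here for illustration.

```python
# Precursor counts per run: (transfer step, first search, final result).
COUNTS = {
    "20240408_OA1_Evo12_31min_TiHe_SA_H032_E32_F-40_B3": (39425, 57757, 68195),
    "20240408_OA1_Evo12_31min_TiHe_SA_H032_E32_F-40_B2": (39062, 58165, 68192),
    "20240408_OA1_Evo12_31min_TiHe_SA_H032_E32_F-40_B1": (34720, 60075, 69072),
}


def relative_gain(before, after):
    """Percentage increase from `before` to `after`."""
    return (after - before) / before * 100


if __name__ == "__main__":
    for run, (transfer, first, final) in COUNTS.items():
        print(
            f"{run}: "
            f"first search vs. final {relative_gain(first, final):.1f}%, "
            f"transfer vs. final {relative_gain(transfer, final):.1f}%"
        )
```

For these three replicates the final result contains about 73% to 99% more precursors than the transfer learning step, and about 15% to 18% more than the first search.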

The individual performance metrics of the PeptDeep model are stored in transfer/stats.transfer.tsv. The Evaluation guide here explains how to analyze and visualize the learned PeptDeep model.