Output format¶

Single-Step Search¶

For a standard single-step search, all output files are written directly to the output directory specified with the -o flag:

output/
├── stats.tsv
├── precursors.parquet
├── pg.matrix.parquet
├── internal.tsv
├── speclib.hdf
├── speclib.mbr.hdf
├── frozen_config.yaml
├── quant/
│   ├── <raw_file_1>/
│   │   ├── psm.parquet
│   │   └── frag.parquet
│   ├── <raw_file_2>/
│   │   ├── psm.parquet
│   │   └── frag.parquet
│   └── ...
└── figures/
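
As a quick sanity check after a run, the layout above can be verified programmatically. The following is a minimal sketch, not part of AlphaDIA: the helper name `missing_outputs` and the choice of which files to require are assumptions made for illustration (optional files such as `speclib.mbr.hdf` are deliberately left out).

```python
from pathlib import Path

# File names taken from the layout above; which ones to require is an
# assumption of this sketch, so adjust the list to your configuration.
EXPECTED_FILES = [
    "stats.tsv",
    "precursors.parquet",
    "pg.matrix.parquet",
    "frozen_config.yaml",
]


def missing_outputs(output_dir, expected=EXPECTED_FILES):
    """Return the expected file names that are absent from output_dir."""
    root = Path(output_dir)
    return [name for name in expected if not (root / name).is_file()]
```

Calling `missing_outputs("output")` returns an empty list when every listed file is present in the directory passed with -o.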

Multi-Step Search¶

AlphaDIA supports multi-step searches to improve identification rates through transfer learning and match-between-runs (MBR). When these features are enabled, the output is organized into subdirectories, with each step producing its own intermediate results. The output of the final step will always be in the root of the output directory.

Transfer Learning Step (`transfer_step_enabled: true`)¶

When transfer learning is enabled, an initial search is performed to train sample-specific PeptDeep models. Intermediate results are stored in output/transfer/ and include:

Training data for the neural network models (speclib.transfer.hdf)
Trained PeptDeep models (peptdeep.transfer/)
Statistics from the transfer learning process (stats.transfer.tsv)

The final results will still be saved in the output folder.

Match Between Runs (`mbr_step_enabled: true`)¶

When MBR is enabled, a two-pass search strategy is used. The first pass performs an initial search to build a sample-specific MBR library. Intermediate results are stored in output/library/, including:

The MBR library built from first-pass identifications (speclib.mbr.hdf)
Statistics from the library building step

The final results will still be saved in the output folder.
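
To see which optional steps left intermediate results behind, the two subdirectories named above can be probed directly. This is a hedged sketch: `intermediate_dirs` and the `STEP_DIRS` mapping are hypothetical names introduced here, and only the directory names `transfer` and `library` come from the description above.

```python
from pathlib import Path

# Step label -> subdirectory of the output folder, as described above.
# The labels "transfer" and "mbr" are chosen for this sketch only.
STEP_DIRS = {"transfer": "transfer", "mbr": "library"}


def intermediate_dirs(output_dir):
    """Map each optional step to its intermediate directory, if it exists."""
    root = Path(output_dir)
    return {
        step: root / sub
        for step, sub in STEP_DIRS.items()
        if (root / sub).is_dir()
    }
```

An empty result indicates a single-step search, or an output folder whose intermediate directories were removed.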

Output Files¶

Overview¶

File	Description
`precursors.parquet`	Main output with precursor-level information, quantification, and scoring
`stats.tsv`	Summary statistics and quality metrics per run/channel
`pg.matrix.parquet`	Protein group quantification matrix across all samples
`peptide.matrix.parquet`	Peptide-level quantification matrix (if enabled)
`precursor.matrix.parquet`	Precursor-level quantification matrix (if enabled)
`internal.tsv`	Internal statistics and metadata from the search
`speclib.hdf`	Input spectral library (may be reannotated or predicted)
`speclib.mbr.hdf`	MBR library containing all identified precursors
`speclib.transfer.hdf`	Fragment quantities extracted from search results for transfer learning
`frozen_config.yaml`	Complete configuration snapshot for reproducibility
`quant/`	Per-file quantification data for checkpointing
`figures/`	Quality control figures and visualizations

`precursors.parquet`¶

The main output file containing precursor-level identifications with scoring, quantification, and metadata.

Format: one row per identified precursor per run.

Columns¶

Column	Description	Unit
Raw Level
`raw.name`	Name of the raw file/run	-
Precursor Level
`precursor.idx`	Unique index for the precursor in the library (consistent only within a search; may vary across searches due to filtering or raw files)	-
`precursor.elution_group_idx`	Index of the elution group (precursors eluting together; consistent only within a search)	-
`precursor.sequence`	Peptide sequence	-
`precursor.charge`	Precursor charge state	-
`precursor.mods`	Modification types (e.g., Phospho@S; semicolon-separated)	-
`precursor.mod_sites`	Modification positions in the sequence (e.g., 5; semicolon-separated, corresponds to mods)	-
`precursor.mod_seq_hash`	Hash of modified sequence (peptide level; stable across searches for comparison)	-
`precursor.mod_seq_charge_hash`	Hash of modified sequence with charge (precursor level; stable across searches for comparison)	-
`precursor.rank`	Rank of this precursor in the search candidates	-
`precursor.naa`	Number of amino acids in the sequence	count
`precursor.mz.library`	Calculated (theoretical) m/z based on peptide sequence and modifications	-
`precursor.mz.observed`	Observed m/z	-
`precursor.mz.calibrated`	Calibrated m/z	-
`precursor.rt.library`	Library-annotated retention time (predicted or empirical)	seconds
`precursor.rt.observed`	Observed retention time	seconds
`precursor.rt.calibrated`	Calibrated retention time	seconds
`precursor.rt.fwhm`	Full width at half maximum of the RT peak	seconds
`precursor.mobility.library`	Library-annotated ion mobility (predicted or empirical)	mobility units
`precursor.mobility.observed`	Observed ion mobility	mobility units
`precursor.mobility.calibrated`	Calibrated ion mobility	mobility units
`precursor.mobility.fwhm`	Full width at half maximum of the mobility peak	mobility units
`precursor.intensity`	Quantified intensity (LFQ intensity if enabled)	arbitrary units
`precursor.qval`	Q-value (FDR-corrected p-value)	-
`precursor.proba`	Decoy probability score from classifier (range 0-1). Lower scores indicate higher probability of a target hit.	-
`precursor.score`	Raw score from scoring function	-
`precursor.channel`	Channel number (0 for label-free)	-
`precursor.decoy`	Decoy flag (0=target, 1=decoy)	-
Peptide Level
`peptide.intensity`	Peptide-level intensity (if peptide-level LFQ enabled)	arbitrary units
Protein Group Level
`pg.name`	Protein group identifier	-
`pg.proteins`	Protein accessions in the group (semicolon-separated)	-
`pg.genes`	Gene names associated with the protein group	-
`pg.master_protein`	Representative protein in the group	-
`pg.qval`	Protein group q-value	-
`pg.intensity`	Protein group intensity (if LFQ enabled)	arbitrary units
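
Because `precursor.mods` and `precursor.mod_sites` are parallel semicolon-separated strings, it is often convenient to pair them up. The sketch below assumes that each site is written as an integer and that an unmodified precursor has an empty string in both columns; `pair_modifications` is a hypothetical helper, not an AlphaDIA function.

```python
def pair_modifications(mods, mod_sites):
    """Pair each modification name with its site.

    Both arguments are the raw semicolon-separated strings from the
    precursor.mods and precursor.mod_sites columns.
    """
    if not mods:
        return []  # unmodified precursor (assumed to be an empty string)
    names = mods.split(";")
    sites = mod_sites.split(";")
    if len(names) != len(sites):
        raise ValueError("mods and mod_sites have different lengths")
    # Sites are assumed to be integers in this sketch.
    return [(name, int(site)) for name, site in zip(names, sites)]
```

For example, `pair_modifications("Phospho@S", "5")` yields a single pair of the modification name and its site.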

Notes:

Mobility columns are only present for ion mobility data
precursor.mz.calibrated is calculated only if the MS1 spectra in the raw file follow a DIA cycle. It is currently not calculated when using the Rust extraction backend.
LFQ intensities are only present when label-free quantification is enabled
Decoy precursors (precursor.decoy=1) are typically filtered out unless keep_decoys is enabled
The precursor.proba value represents the decoy probability score (lower is better for target hits)
Identifiers for comparison: Use precursor.mod_seq_hash (peptide level) or precursor.mod_seq_charge_hash (precursor level) to match identifications across different searches. These hashes are stable and based on the modified sequence, making them suitable for comparing results between runs, experiments, or analysis versions. In contrast, precursor.idx and precursor.elution_group_idx are search-specific and should not be used for cross-search comparisons
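
The notes above translate into two common operations: keeping confident target precursors, and matching identifications between searches by hash. The following is a minimal pandas sketch. The function names and the 0.01 q-value cutoff are assumptions for illustration; only the column names come from the table above.

```python
import pandas as pd


def confident_targets(precursors, max_qval=0.01):
    """Keep target precursors at or below the q-value cutoff.

    The default cutoff of 0.01 is an assumption of this sketch.
    """
    is_target = precursors["precursor.decoy"] == 0
    is_confident = precursors["precursor.qval"] <= max_qval
    return precursors[is_target & is_confident]


def shared_precursors(search_a, search_b):
    """Return the precursor-level hashes present in both searches."""
    key = "precursor.mod_seq_charge_hash"
    return set(search_a[key]) & set(search_b[key])
```

In practice each table would be loaded with `pd.read_parquet("output/precursors.parquet")`, which requires a parquet engine such as pyarrow to be installed. Using `precursor.mod_seq_hash` as the key instead gives the peptide-level overlap.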

`stats.tsv`¶

The stats.tsv file contains summary statistics and quality metrics for each run and channel in the analysis. It provides insights into the search results, calibration quality, and general performance metrics.

Format: one row per run/channel combination.

Columns¶

Column	Description	Unit
Raw Level
`raw.name`	Name of the raw file/run	-
`raw.gradient_length`	Total duration of the gradient	seconds
`raw.cycle_length`	Number of scans per cycle	count
`raw.cycle_duration`	Average duration of each cycle	seconds
`raw.cycle_number`	Total number of cycles in the run	count
`raw.ms2_range_min`	Minimum MS2 m/z value measured	-
`raw.ms2_range_max`	Maximum MS2 m/z value measured	-
Search Level
`search.channel`	Channel number (0 for label-free, or channel numbers for multiplexed data)	-
`search.precursors`	Number of identified precursors in this run/channel	count
`search.proteins`	Number of unique protein groups identified in this run/channel	count
`search.fwhm_rt`	Mean full width at half maximum of peaks in retention time	seconds
`search.fwhm_mobility`	Mean FWHM of peaks in mobility dimension (ion mobility data only)	mobility units
Optimization Level
`optimization.ms2_error`	Final MS2 mass error tolerance used	ppm
`optimization.ms1_error`	Final MS1 mass error tolerance used	ppm
`optimization.rt_error`	Final retention time tolerance used	seconds
`optimization.mobility_error`	Final ion mobility tolerance used (ion mobility data only)	mobility units
Calibration Level
`calibration.ms2_bias`	Median mass bias for fragment ions	ppm
`calibration.ms2_variance`	Median mass variance for fragment ions	ppm
`calibration.ms1_bias`	Median mass bias for precursor ions	ppm
`calibration.ms1_variance`	Median mass variance for precursor ions	ppm

Notes:

Some columns may be NaN if the corresponding measurements or calibrations were not performed
For label-free data, there will typically be one row per run with channel=0
For multiplexed data, there will be multiple rows per run (one for each channel)
Important: The search.precursors and search.proteins counts are identification statistics (precursors/proteins that passed protein FDR), whereas the quantification matrices (see below) hold quantification statistics (the subset that also passed the additional quality filters for LFQ). Typically, about 3-4% of identified precursors lack quantification values because of insufficient fragment quality, poor correlation, or failing directLFQ thresholds. This is expected behavior: identification is the broader criterion, and quantification imposes stricter quality requirements.
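
Since stats.tsv is a plain tab-separated file with one row per run/channel, the identification counts can be read with the standard library alone. This is a sketch under the assumption that `search.channel` and `search.precursors` are always numeric; the helper name `precursors_per_run` is hypothetical.

```python
import csv


def precursors_per_run(stats_file):
    """Read an open stats.tsv and return identification counts.

    The result maps (raw.name, search.channel) to search.precursors.
    """
    reader = csv.DictReader(stats_file, delimiter="\t")
    counts = {}
    for row in reader:
        # float() first, so values written as "1200.0" are accepted too.
        channel = int(float(row["search.channel"]))
        counts[(row["raw.name"], channel)] = int(float(row["search.precursors"]))
    return counts
```

A typical call would be `with open("output/stats.tsv", newline="") as f: counts = precursors_per_run(f)`.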

`pg.matrix.parquet`¶

The protein group quantification matrix provides protein-level quantification across all samples. It contains one row per protein group and one column per sample.

Important: This matrix contains only protein groups with valid quantification values. The number of non-zero entries per sample may be slightly lower (by roughly 0.3-0.8%) than the search.proteins count in stats.tsv, which reports all identified proteins. The difference represents proteins that were identified but could not be quantified due to insufficient fragment data or quality.

`peptide.matrix.parquet`¶

The peptide quantification matrix provides peptide-level quantification across all samples (when peptide-level LFQ is enabled). It contains one row per peptide and one column per sample.

Important: This matrix contains only peptides with valid quantification values. Peptides that were identified but failed quality filters for LFQ will have missing (NaN) values or may be absent from the matrix entirely.

`precursor.matrix.parquet`¶

The precursor quantification matrix provides precursor-level quantification across all samples (when precursor-level LFQ is enabled). It contains one row per precursor and one column per sample.

Important: This matrix contains only precursors with valid quantification values. The number of non-zero entries per sample is typically about 3-4% lower than the search.precursors count in stats.tsv. The difference represents precursors that were identified but failed quantification quality filters such as:

Poor fragment quality or correlation (below min_correlation threshold)
Insufficient fragments (fewer than min_k_fragments)
Insufficient non-missing values (below min_nonnan threshold for directLFQ)

This is expected behavior and reflects the distinction between identification (passing FDR) and quantification (passing additional quality requirements).
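
The gap between identification and quantification can be measured by counting the valid entries per sample in a matrix and comparing them with the counts in stats.tsv. The sketch below assumes the caller knows which columns are sample columns, since the identifier columns of the matrix are not specified here. Both function names are hypothetical.

```python
import pandas as pd


def quantified_counts(matrix, sample_columns):
    """Count entries per sample that are neither missing nor zero."""
    values = matrix[sample_columns]
    return (values.notna() & (values != 0)).sum()


def unquantified_fraction(identified, quantified):
    """Fraction of identified entities without a quantification value."""
    return 1 - quantified / identified
```

Here `identified` would be the search.precursors (or search.proteins) value for a run, and `quantified` the matching entry from `quantified_counts`.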

`internal.tsv`¶

Internal statistics and timing information from the search process.

`speclib.hdf`¶

The input spectral library as it was loaded and preprocessed. This includes isotope calculation, library prediction, retention time normalization, and FASTA annotation.

`speclib.mbr.hdf`¶

The match-between-runs (MBR) output library, which contains all precursors identified in the search and is used for second-step quantification. It is only created when general.save_mbr_library: true is set in the configuration, and it contains:

All high-confidence precursor identifications from the search
Empirically optimized retention times and ion mobilities

`speclib.transfer.hdf`¶

The transfer learning library, which contains the training data for sample-specific model refinement. It contains:

High-confidence precursor identifications
All requested fragment types (not just top fragments)
Observed intensities (not predicted values)
Empirically measured retention times and ion mobilities

`quant/` folder¶

The quant/ folder contains per-file quantification results used for checkpointing and distributed searches.

Structure:

quant/
├── <raw_file_1>/
│   ├── psm.parquet      # PSM-level data for the raw file
│   └── frag.parquet     # Fragment-level data for the raw file
├── <raw_file_2>/
│   ├── psm.parquet
│   └── frag.parquet
└── ...

If the files psm.parquet and frag.parquet are available in the quant/ folder, these values will be reused when reuse_quant is enabled in the configuration. This allows for efficient re-analysis without re-extracting quantification data from raw files.

See the restarting documentation for more details on using the --quant-dir parameter and reuse_quant configuration.
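
Before enabling reuse_quant, it can help to confirm which runs actually have both files in place. This is a minimal sketch based only on the folder structure shown above. `reusable_runs` is a hypothetical helper and does not reproduce AlphaDIA's own checks.

```python
from pathlib import Path


def reusable_runs(quant_dir):
    """Return the run folders that hold both psm.parquet and frag.parquet."""
    return sorted(
        run.name
        for run in Path(quant_dir).iterdir()
        if run.is_dir()
        and (run / "psm.parquet").is_file()
        and (run / "frag.parquet").is_file()
    )
```

Runs missing from the returned list have no complete checkpoint in that folder.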
