Output format
Single-Step Search
For a standard single-step search, all output files are written directly to the output directory specified with the -o flag:
output/
├── stats.tsv
├── precursors.parquet
├── pg.matrix.parquet
├── internal.tsv
├── speclib.hdf
├── speclib.mbr.hdf
├── frozen_config.yaml
├── quant/
│   ├── <raw_file_1>/
│   │   ├── psm.parquet
│   │   └── frag.parquet
│   ├── <raw_file_2>/
│   │   ├── psm.parquet
│   │   └── frag.parquet
│   └── ...
└── figures/
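A quick way to confirm that a finished search produced this layout is to check for each entry on disk. The sketch below is not part of AlphaDIA; `missing_outputs` is a hypothetical helper, and the file names are copied from the listing above. Note that `speclib.mbr.hdf` is only written when `general.save_mbr_library: true` is set, so its absence is not necessarily an error.

```python
from pathlib import Path

# Names taken from the single-step directory listing in this document.
EXPECTED_FILES = [
    "stats.tsv",
    "precursors.parquet",
    "pg.matrix.parquet",
    "internal.tsv",
    "speclib.hdf",
    "speclib.mbr.hdf",  # optional: depends on general.save_mbr_library
    "frozen_config.yaml",
]
EXPECTED_DIRS = ["quant", "figures"]


def missing_outputs(output_dir):
    """Return the expected files and folders that are absent from output_dir.

    Hypothetical helper for illustration; folders are reported with a
    trailing slash so they are easy to tell apart from files.
    """
    root = Path(output_dir)
    missing = [name for name in EXPECTED_FILES if not (root / name).is_file()]
    missing += [name + "/" for name in EXPECTED_DIRS if not (root / name).is_dir()]
    return missing
```

Calling `missing_outputs("output")` on a complete single-step result should return an empty list.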
Multi-Step Search
AlphaDIA supports multi-step searches to improve identification rates through transfer learning and match-between-runs (MBR). When these features are enabled, the output is organized into subdirectories, with each step producing its own intermediate results. The output of the final step will always be in the root of the output directory.
Transfer Learning Step (transfer_step_enabled: true)
When transfer learning is enabled, an initial search is performed to train sample-specific PeptDeep models. The intermediate results are stored in output/transfer/, which include:
- Training data for the neural network models (speclib.transfer.hdf)
- Trained PeptDeep models (peptdeep.transfer/)
- Statistics from the transfer learning process (stats.transfer.tsv)
The final results will still be saved in the output folder.
Match Between Runs (mbr_step_enabled: true)
When MBR is enabled, a two-pass search strategy is used. The first pass performs an initial search to build a sample-specific MBR library. Intermediate results are stored in output/library/, including:
- The MBR library built from first-pass identifications (speclib.mbr.hdf)
- Statistics from the library building step
The final results will still be saved in the output folder.
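The mapping from enabled steps to intermediate folders can be summarized in a few lines. This is only an illustrative sketch: `intermediate_dirs` is a hypothetical function, not an AlphaDIA API, and it encodes nothing beyond the two folder names stated above (`transfer/` and `library/`).

```python
from pathlib import Path


def intermediate_dirs(output_dir, transfer_step_enabled=False, mbr_step_enabled=False):
    """List the intermediate result folders implied by the enabled steps.

    Hypothetical helper: folder names follow this document, and the final
    results are always expected in output_dir itself.
    """
    root = Path(output_dir)
    dirs = []
    if transfer_step_enabled:
        dirs.append(root / "transfer")  # transfer learning step
    if mbr_step_enabled:
        dirs.append(root / "library")  # MBR library building step
    return dirs
```

With both steps enabled, `intermediate_dirs("output", True, True)` yields `output/transfer` and `output/library`; with neither, the list is empty and all results sit directly in the output folder.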
Output Files
Overview
| File | Description |
|---|---|
| precursors.parquet | Main output with precursor-level information, quantification, and scoring |
| stats.tsv | Summary statistics and quality metrics per run/channel |
| pg.matrix.parquet | Protein group quantification matrix across all samples |
| peptide.matrix.parquet | Peptide-level quantification matrix (if enabled) |
| precursor.matrix.parquet | Precursor-level quantification matrix (if enabled) |
| internal.tsv | Internal statistics and metadata from the search |
| speclib.hdf | Input spectral library (may be reannotated or predicted) |
| speclib.mbr.hdf | MBR library containing all identified precursors |
| speclib.transfer.hdf | Fragment quantities extracted from search results for transfer learning |
| frozen_config.yaml | Complete configuration snapshot for reproducibility |
| quant/ | Per-file quantification data for checkpointing |
| figures/ | Quality control figures and visualizations |
precursors.parquet
The main output file containing precursor-level identifications with scoring, quantification, and metadata.
Format: one row per identified precursor per run.
Columns
| Column | Description | Unit |
|---|---|---|
| Raw Level | | |
| | Name of the raw file/run | - |
| Precursor Level | | |
| precursor.idx | Unique index for the precursor in the library (consistent only within a search; may vary across searches due to filtering or raw files) | - |
| precursor.elution_group_idx | Index of the elution group (precursors eluting together; consistent only within a search) | - |
| | Peptide sequence | - |
| | Precursor charge state | - |
| | Modification types (e.g., Phospho@S; semicolon-separated) | - |
| | Modification positions in the sequence (e.g., 5; semicolon-separated, corresponds to mods) | - |
| precursor.mod_seq_hash | Hash of modified sequence (peptide level; stable across searches for comparison) | - |
| precursor.mod_seq_charge_hash | Hash of modified sequence with charge (precursor level; stable across searches for comparison) | - |
| | Rank of this precursor in the search candidates | - |
| | Number of amino acids in the sequence | count |
| | Calculated (theoretical) m/z based on peptide sequence and modifications | - |
| | Observed m/z | - |
| precursor.mz.calibrated | Calibrated m/z | - |
| | Library-annotated retention time (predicted or empirical) | seconds |
| | Observed retention time | seconds |
| | Calibrated retention time | seconds |
| | Full width at half maximum of the RT peak | seconds |
| | Library-annotated ion mobility (predicted or empirical) | mobility units |
| | Observed ion mobility | mobility units |
| | Calibrated ion mobility | mobility units |
| | Full width at half maximum of the mobility peak | mobility units |
| | Quantified intensity (LFQ intensity if enabled) | arbitrary units |
| | Q-value (FDR-corrected p-value) | - |
| precursor.proba | Decoy probability score from classifier (range 0-1). Lower scores indicate higher probability of a target hit. | - |
| | Raw score from scoring function | - |
| | Channel number (0 for label-free) | - |
| | Decoy flag (0=target, 1=decoy) | - |
| Peptide Level | | |
| | Peptide-level intensity (if peptide-level LFQ enabled) | arbitrary units |
| Protein Group Level | | |
| | Protein group identifier | - |
| | Protein accessions in the group (semicolon-separated) | - |
| | Gene names associated with the protein group | - |
| | Representative protein in the group | - |
| | Protein group q-value | - |
| | Protein group intensity (if LFQ enabled) | arbitrary units |
Notes:
- Mobility columns are only present for ion mobility data.
- precursor.mz.calibrated is calculated only if the MS1 spectra in the raw file follow a DIA cycle. It is currently not calculated when using the rust extraction backend.
- LFQ intensities are only present when label-free quantification is enabled.
- Decoy precursors (decoy=1) are typically filtered out unless keep_decoys is enabled.
- The precursor.proba value represents the decoy probability score (lower is better for target hits).
- Identifiers for comparison: Use precursor.mod_seq_hash (peptide level) or precursor.mod_seq_charge_hash (precursor level) to match identifications across different searches. These hashes are stable and based on the modified sequence, making them suitable for comparing results between runs, experiments, or analysis versions. In contrast, precursor.idx and precursor.elution_group_idx are search-specific and should not be used for cross-search comparisons.
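As an illustration of the cross-search comparison described in the notes, the sketch below counts precursors shared between two result tables using `precursor.mod_seq_charge_hash`. The function name `compare_searches` and the toy tables are assumptions made for this example; with real output, each table would typically be loaded with `pd.read_parquet(".../precursors.parquet")`.

```python
import pandas as pd

# Column name as documented above; stable across searches.
HASH_COLUMN = "precursor.mod_seq_charge_hash"


def compare_searches(search_a, search_b, key=HASH_COLUMN):
    """Count precursors shared between, and unique to, two result tables.

    Hypothetical helper: each table may list a precursor once per run,
    so the hashes are de-duplicated before comparison.
    """
    ids_a = set(search_a[key])
    ids_b = set(search_b[key])
    return {
        "shared": len(ids_a & ids_b),
        "only_a": len(ids_a - ids_b),
        "only_b": len(ids_b - ids_a),
    }


# Toy tables standing in for two precursors.parquet files (made-up hashes).
search_a = pd.DataFrame({HASH_COLUMN: [11, 22, 33, 33]})
search_b = pd.DataFrame({HASH_COLUMN: [22, 33, 44]})
overlap = compare_searches(search_a, search_b)
```

Passing `key="precursor.mod_seq_hash"` performs the same comparison at the peptide level. Joining on `precursor.idx` instead would not be meaningful, since that index is only consistent within a single search.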
stats.tsv
The stats.tsv file contains summary statistics and quality metrics for each run and channel in the analysis.
It provides insights into the search results, calibration quality, and general performance metrics.
Format: one row per run/channel combination.
Columns
| Column | Description | Unit |
|---|---|---|
| Raw Level | | |
| | Name of the raw file/run | - |
| | Total duration of the gradient | seconds |
| | Number of scans per cycle | count |
| | Average duration of each cycle | seconds |
| | Total number of cycles in the run | count |
| | Minimum MS2 m/z value measured | - |
| | Maximum MS2 m/z value measured | - |
| Search Level | | |
| | Channel number (0 for label-free, or channel numbers for multiplexed data) | - |
| search.precursors | Number of identified precursors in this run/channel | count |
| search.proteins | Number of unique protein groups identified in this run/channel | count |
| | Mean full width at half maximum of peaks in retention time | seconds |
| | Mean FWHM of peaks in mobility dimension (ion mobility data only) | mobility units |
| Optimization Level | | |
| | Final MS2 mass error tolerance used | ppm |
| | Final MS1 mass error tolerance used | ppm |
| | Final retention time tolerance used | seconds |
| | Final ion mobility tolerance used (ion mobility data only) | mobility units |
| Calibration Level | | |
| | Median mass bias for fragment ions | ppm |
| | Median mass variance for fragment ions | ppm |
| | Median mass bias for precursor ions | ppm |
| | Median mass variance for precursor ions | ppm |
Notes:
- Some columns may be NaN if the corresponding measurements or calibrations were not performed.
- For label-free data, there will typically be one row per run with channel=0.
- For multiplexed data, there will be multiple rows per run (one for each channel).
Important: The search.precursors and search.proteins counts represent identification statistics (precursors/proteins that passed protein FDR), while the quantification matrices (see below) contain quantification statistics (a subset that passed additional quality filters for LFQ). Typically ~3-4% of identified precursors may lack quantification values due to insufficient fragment quality, poor correlation, or failing directLFQ thresholds. This is expected behavior and indicates the difference between identification (broader) and quantification (stricter quality requirements).
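Because stats.tsv is a plain tab-separated file, it can be inspected with pandas. The sketch below is a minimal example under stated assumptions: the column names `channel`, `search.precursors`, and `search.proteins` are taken from the text above, while `run`, `example_metric`, and the toy values are hypothetical placeholders for this illustration.

```python
import io

import pandas as pd

# Toy stand-in for stats.tsv. "run" and "example_metric" are hypothetical
# column names; the empty last field in the first row is read as NaN.
TOY_STATS = (
    "run\tchannel\tsearch.precursors\tsearch.proteins\texample_metric\n"
    "file_a\t0\t1000\t200\t\n"
    "file_b\t0\t1200\t210\t1.5\n"
)


def load_stats(tsv_source):
    """Read a stats.tsv-style table; accepts a path or a file-like object."""
    return pd.read_csv(tsv_source, sep="\t")


def precursors_per_channel(stats):
    """Sum the identified precursor counts over all runs for each channel."""
    return stats.groupby("channel")["search.precursors"].sum().to_dict()


def columns_with_missing(stats):
    """Names of columns containing at least one NaN entry."""
    return stats.columns[stats.isna().any()].tolist()


stats = load_stats(io.StringIO(TOY_STATS))
per_channel = precursors_per_channel(stats)
incomplete = columns_with_missing(stats)
```

For real output, `load_stats("output/stats.tsv")` would replace the in-memory toy table. Checking for NaN columns first avoids treating a skipped calibration as a zero.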
pg.matrix.parquet
The protein group quantification matrix provides protein-level quantification across all samples. It contains one row per protein group and one column per sample.
Important: This matrix contains only protein groups with valid quantification values. The number of non-zero entries per sample may be slightly lower (~0.3-0.8%) than the search.proteins count in stats.tsv, which reports all identified proteins. The difference represents proteins that were identified but could not be quantified due to insufficient fragment data or quality.
peptide.matrix.parquet
The peptide quantification matrix provides peptide-level quantification across all samples (when peptide-level LFQ is enabled). It contains one row per peptide and one column per sample.
Important: This matrix contains only peptides with valid quantification values. Peptides that were identified but failed quality filters for LFQ will have missing (NaN) values or may be absent from the matrix entirely.
precursor.matrix.parquet
The precursor quantification matrix provides precursor-level quantification across all samples (when precursor-level LFQ is enabled). It contains one row per precursor and one column per sample.
Important: This matrix contains only precursors with valid quantification values. The number of non-zero entries per sample will be lower (~3-4%) than the search.precursors count in stats.tsv. The difference represents precursors that were identified but failed quantification quality filters such as:
- Poor fragment quality or correlation (below min_correlation threshold)
- Insufficient fragments (fewer than min_k_fragments)
- Insufficient non-missing values (below min_nonnan threshold for directLFQ)
This is expected behavior and reflects the distinction between identification (passing FDR) and quantification (passing additional quality requirements).
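The gap between identification and quantification can be checked per sample by counting usable matrix entries and dividing by the identification count from stats.tsv. The following is a sketch only: `quantified_fraction` is a hypothetical helper, and it assumes the matrix has one column per sample whose name is known to the caller. The exact column naming in the real file is not specified here.

```python
import pandas as pd


def quantified_fraction(matrix, identified_counts):
    """Fraction of identified precursors with a usable quantity, per sample.

    matrix: one row per precursor, one column per sample (assumed layout).
    identified_counts: mapping of sample column name to its
        search.precursors count from stats.tsv.
    An entry counts as quantified if it is neither NaN nor zero.
    """
    fractions = {}
    for sample, n_identified in identified_counts.items():
        values = matrix[sample]
        n_quantified = int((values.notna() & (values != 0)).sum())
        fractions[sample] = n_quantified / n_identified
    return fractions


# Toy matrix with made-up intensities; file_a has one zero, file_b one NaN.
matrix = pd.DataFrame(
    {
        "file_a": [10.0, 0.0, 5.0, 2.0],
        "file_b": [1.0, 2.0, None, 4.0],
    }
)
fractions = quantified_fraction(matrix, {"file_a": 4, "file_b": 4})
```

On real data, values slightly below 1.0 would be consistent with the expected loss described above; a much lower fraction may be worth investigating.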
internal.tsv
Internal statistics and timing information from the search process.
speclib.hdf
The input spectral library as it was loaded and preprocessed. This includes isotope calculation, library prediction, retention time normalization, and FASTA annotation.
speclib.mbr.hdf
The match-between-runs (MBR) output library containing all precursors which were identified in the search for second-step quantification. It includes:
- All high-confidence precursor identifications from the search
- Empirically optimized retention times and ion mobilities
This file is only created when general.save_mbr_library: true is set in the configuration.
speclib.transfer.hdf
The transfer learning library containing training data for sample-specific model refinement. It includes:
- High-confidence precursor identifications
- All requested fragment types (not just top fragments)
- Observed intensities (not predicted values)
- Empirically measured retention times and ion mobilities
quant/ folder
The quant/ folder contains per-file quantification results used for checkpointing and distributed searches.
Structure:
quant/
├── <raw_file_1>/
│   ├── psm.parquet    # PSM-level data for the raw file
│   └── frag.parquet   # Fragment-level data for the raw file
├── <raw_file_2>/
│   ├── psm.parquet
│   └── frag.parquet
└── ...
If the files psm.parquet and frag.parquet are available in the quant/ folder, these values will be reused when reuse_quant is enabled in the configuration. This allows for efficient re-analysis without re-extracting quantification data from raw files.
See the restarting documentation for more details on using the --quant-dir parameter and reuse_quant configuration.
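Before relying on reuse_quant, it can help to see which runs actually have both checkpoint files. The sketch below only inspects the folder layout described above; `reusable_runs` is a hypothetical helper and does not reproduce AlphaDIA's own checks.

```python
from pathlib import Path

# Both files must be present for a run's quantification to be reusable,
# according to the description above.
REQUIRED = ("psm.parquet", "frag.parquet")


def reusable_runs(quant_dir):
    """Return the sorted names of run folders that contain both checkpoint files."""
    root = Path(quant_dir)
    if not root.is_dir():
        return []
    return sorted(
        run.name
        for run in root.iterdir()
        if run.is_dir() and all((run / name).is_file() for name in REQUIRED)
    )
```

Runs missing from the returned list would need to be extracted from the raw files again.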