Output format

Output Files

Overview

| File | Description |
| --- | --- |
| precursors.parquet | Main output with precursor-level information, quantification, and scoring |
| stats.tsv | Summary statistics and quality metrics per run/channel |
| pg.matrix.parquet | Protein group quantification matrix across all samples |
| peptide.matrix.parquet | Peptide-level quantification matrix (if enabled) |
| precursor.matrix.parquet | Precursor-level quantification matrix (if enabled) |
| internal.tsv | Internal statistics and metadata from the search |
| speclib.hdf | Input spectral library (may be reannotated or predicted) |
| speclib.mbr.hdf | MBR library containing all identified precursors |
| speclib.transfer.hdf | Fragment quantities extracted from search results for transfer learning |
| frozen_config.yaml | Complete configuration snapshot for reproducibility |
| quant/ | Per-file quantification data for checkpointing |
| figures/ | Quality control figures and visualizations |

precursors.parquet

The main output file containing precursor-level identifications with scoring, quantification, and metadata.

Format: one row per identified precursor per run.

Columns

Raw Level

| Column | Description | Unit |
| --- | --- | --- |
| raw.name | Name of the raw file/run | - |

Precursor Level

| Column | Description | Unit |
| --- | --- | --- |
| precursor.idx | Unique index for the precursor in the library (consistent only within a search; may vary across searches due to filtering or raw files) | - |
| precursor.elution_group_idx | Index of the elution group (precursors eluting together; consistent only within a search) | - |
| precursor.sequence | Peptide sequence | - |
| precursor.charge | Precursor charge state | - |
| precursor.mods | Modification types (e.g., Phospho@S; semicolon-separated) | - |
| precursor.mod_sites | Modification positions in the sequence (e.g., 5; semicolon-separated, corresponds to mods) | - |
| precursor.mod_seq_hash | Hash of modified sequence (peptide level; stable across searches for comparison) | - |
| precursor.mod_seq_charge_hash | Hash of modified sequence with charge (precursor level; stable across searches for comparison) | - |
| precursor.rank | Rank of this precursor in the search candidates | - |
| precursor.naa | Number of amino acids in the sequence | count |
| precursor.mz.library | Calculated (theoretical) m/z based on peptide sequence and modifications | - |
| precursor.mz.observed | Observed m/z | - |
| precursor.mz.calibrated | Calibrated m/z | - |
| precursor.rt.library | Library-annotated retention time (predicted or empirical) | seconds |
| precursor.rt.observed | Observed retention time | seconds |
| precursor.rt.calibrated | Calibrated retention time | seconds |
| precursor.rt.fwhm | Full width at half maximum of the RT peak | seconds |
| precursor.mobility.library | Library-annotated ion mobility (predicted or empirical) | mobility units |
| precursor.mobility.observed | Observed ion mobility | mobility units |
| precursor.mobility.calibrated | Calibrated ion mobility | mobility units |
| precursor.mobility.fwhm | Full width at half maximum of the mobility peak | mobility units |
| precursor.intensity | Quantified intensity (LFQ intensity if enabled) | arbitrary units |
| precursor.qval | Q-value (FDR-corrected p-value) | - |
| precursor.proba | Decoy probability score from classifier (range 0-1). Lower scores indicate higher probability of a target hit. | - |
| precursor.score | Raw score from scoring function | - |
| precursor.channel | Channel number (0 for label-free) | - |
| precursor.decoy | Decoy flag (0=target, 1=decoy) | - |

Peptide Level

| Column | Description | Unit |
| --- | --- | --- |
| peptide.intensity | Peptide-level intensity (if peptide-level LFQ enabled) | arbitrary units |

Protein Group Level

| Column | Description | Unit |
| --- | --- | --- |
| pg.name | Protein group identifier | - |
| pg.proteins | Protein accessions in the group (semicolon-separated) | - |
| pg.genes | Gene names associated with the protein group | - |
| pg.master_protein | Representative protein in the group | - |
| pg.qval | Protein group q-value | - |
| pg.intensity | Protein group intensity (if LFQ enabled) | arbitrary units |
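The precursor.mods and precursor.mod_sites columns are parallel semicolon-separated lists, so their entries can be paired by position. The following is a minimal Python sketch of that pairing. It assumes that both strings always hold the same number of entries and that an unmodified precursor is stored as an empty string. The helper name pair_modifications is illustrative and not part of the software.

```python
def pair_modifications(mods: str, mod_sites: str) -> list[tuple[str, int]]:
    """Pair each modification with its site, e.g. ("Phospho@S", 5)."""
    if not mods:
        return []  # assumption: unmodified precursors carry an empty string
    names = mods.split(";")
    sites = mod_sites.split(";")
    if len(names) != len(sites):
        raise ValueError(f"mismatched entries: {mods!r} vs {mod_sites!r}")
    return [(name, int(site)) for name, site in zip(names, sites)]


# Illustrative values only, following the "Phospho@S" / "5" pattern above.
print(pair_modifications("Phospho@S;Oxidation@M", "5;9"))
# [('Phospho@S', 5), ('Oxidation@M', 9)]
```

The sketch raises an error on mismatched lengths rather than silently truncating, so a malformed row is noticed early.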

Notes:

  • Mobility columns are only present for ion mobility data

  • precursor.mz.calibrated is calculated only if the MS1 spectra in the raw file follow a DIA cycle. Currently not calculated when using the rust extraction backend.

  • LFQ intensities are only present when label-free quantification is enabled

  • Decoy precursors (precursor.decoy=1) are typically filtered out unless keep_decoys is enabled

  • The precursor.proba value represents the decoy probability score (lower is better for target hits)

  • Identifiers for comparison: Use precursor.mod_seq_hash (peptide level) or precursor.mod_seq_charge_hash (precursor level) to match identifications across different searches. These hashes are derived from the modified sequence and are therefore stable, which makes them suitable for comparing results between runs, experiments, or analysis versions. In contrast, precursor.idx and precursor.elution_group_idx are search-specific and should not be used for cross-search comparisons.
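The cross-search comparison described in the last note can be sketched with pandas. This is an illustration rather than a prescribed workflow: the 1% q-value cutoff, the function names, and the toy tables are assumptions. In practice the two tables would come from pd.read_parquet on each search's precursors.parquet.

```python
import pandas as pd

HASH = "precursor.mod_seq_charge_hash"


def confident_targets(df: pd.DataFrame, max_qval: float = 0.01) -> pd.DataFrame:
    """Keep target precursors at or below the q-value cutoff (assumed 1%)."""
    keep = (df["precursor.decoy"] == 0) & (df["precursor.qval"] <= max_qval)
    return df.loc[keep]


def compare_searches(a: pd.DataFrame, b: pd.DataFrame) -> dict[str, int]:
    """Count precursors found in both searches or in only one of them."""
    # A set collapses repeats, since each precursor has one row per run.
    ids_a = set(confident_targets(a)[HASH])
    ids_b = set(confident_targets(b)[HASH])
    return {
        "shared": len(ids_a & ids_b),
        "only_a": len(ids_a - ids_b),
        "only_b": len(ids_b - ids_a),
    }


# Toy stand-ins for two precursors.parquet tables (hash values are made up).
search_a = pd.DataFrame({
    HASH: [11, 22, 33, 44],
    "precursor.decoy": [0, 0, 0, 1],
    "precursor.qval": [0.001, 0.005, 0.2, 0.001],
})
search_b = pd.DataFrame({
    HASH: [22, 33, 55],
    "precursor.decoy": [0, 0, 0],
    "precursor.qval": [0.002, 0.004, 0.009],
})
print(compare_searches(search_a, search_b))
# {'shared': 1, 'only_a': 1, 'only_b': 2}
```

Matching on precursor.idx instead would give misleading overlaps, because that index is only consistent within a single search.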

stats.tsv

The stats.tsv file contains summary statistics and quality metrics for each run and channel in the analysis. It provides insights into the search results, calibration quality, and general performance metrics.

Format: one row per run/channel combination.

Columns

Raw Level

| Column | Description | Unit |
| --- | --- | --- |
| raw.name | Name of the raw file/run | - |
| raw.gradient_length | Total duration of the gradient | seconds |
| raw.cycle_length | Number of scans per cycle | count |
| raw.cycle_duration | Average duration of each cycle | seconds |
| raw.cycle_number | Total number of cycles in the run | count |
| raw.ms2_range_min | Minimum MS2 m/z value measured | - |
| raw.ms2_range_max | Maximum MS2 m/z value measured | - |

Search Level

| Column | Description | Unit |
| --- | --- | --- |
| search.channel | Channel number (0 for label-free, or channel numbers for multiplexed data) | - |
| search.precursors | Number of identified precursors in this run/channel | count |
| search.proteins | Number of unique protein groups identified in this run/channel | count |
| search.fwhm_rt | Mean full width at half maximum of peaks in retention time | seconds |
| search.fwhm_mobility | Mean FWHM of peaks in mobility dimension (ion mobility data only) | mobility units |

Optimization Level

| Column | Description | Unit |
| --- | --- | --- |
| optimization.ms2_error | Final MS2 mass error tolerance used | ppm |
| optimization.ms1_error | Final MS1 mass error tolerance used | ppm |
| optimization.rt_error | Final retention time tolerance used | seconds |
| optimization.mobility_error | Final ion mobility tolerance used (ion mobility data only) | mobility units |

Calibration Level

| Column | Description | Unit |
| --- | --- | --- |
| calibration.ms2_bias | Median mass bias for fragment ions | ppm |
| calibration.ms2_variance | Median mass variance for fragment ions | ppm |
| calibration.ms1_bias | Median mass bias for precursor ions | ppm |
| calibration.ms1_variance | Median mass variance for precursor ions | ppm |

Notes:

  • Some columns may be NaN if the corresponding measurements or calibrations were not performed

  • For label-free data, there will typically be one row per run with channel=0

  • For multiplexed data, there will be multiple rows per run (one for each channel)

  • Important: The search.precursors and search.proteins counts are identification statistics (precursors/proteins that passed protein FDR). The quantification matrices (see below) report quantification statistics, a subset that also passed the additional quality filters for LFQ. Typically ~3-4% of identified precursors lack quantification values because of insufficient fragment quality, poor correlation, or failing directLFQ thresholds. This is expected behavior: identification is the broader criterion, and quantification has stricter quality requirements.
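As an illustration of the notes above, the sketch below parses a stats.tsv-style table with the Python standard library and reports which columns were not measured for each run/channel row. The table content is a toy stand-in with only four of the documented columns. Treating an empty field the same way as a literal NaN is an assumption about how missing values are written, and read_stats is a hypothetical helper name.

```python
import csv
import io
import math

TEXT_COLUMNS = {"raw.name"}

# Toy stand-in for the text of stats.tsv; real files have more columns.
STATS_TSV = (
    "raw.name\tsearch.channel\tsearch.precursors\tsearch.fwhm_mobility\n"
    "run_a\t0\t50000\tNaN\n"
    "run_b\t0\t48000\t\n"
)


def read_stats(handle) -> list[dict]:
    """Parse tab-separated rows, converting non-text fields to float."""
    rows = []
    for record in csv.DictReader(handle, delimiter="\t"):
        row = {}
        for key, value in record.items():
            if key in TEXT_COLUMNS:
                row[key] = value
            elif value is None or not value.strip():
                row[key] = math.nan  # assumption: empty field = not measured
            else:
                row[key] = float(value)  # float("NaN") also yields nan
        rows.append(row)
    return rows


rows = read_stats(io.StringIO(STATS_TSV))
for row in rows:
    # List the columns that hold no measurement for this run/channel.
    missing = [k for k, v in row.items() if isinstance(v, float) and math.isnan(v)]
    print(row["raw.name"], int(row["search.channel"]), int(row["search.precursors"]), missing)
```

For a real file, the io.StringIO object would be replaced by an open file handle, for example open(path, newline="").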

pg.matrix.parquet

The protein group quantification matrix provides protein-level quantification across all samples. It contains one row per protein group and one column per sample.

Important: This matrix contains only protein groups with valid quantification values. The number of non-zero entries per sample may be slightly lower (~0.3-0.8%) than the search.proteins count in stats.tsv, which reports all identified proteins. The difference represents proteins that were identified but could not be quantified due to insufficient fragment data or quality.

peptide.matrix.parquet

The peptide quantification matrix provides peptide-level quantification across all samples (when peptide-level LFQ is enabled). It contains one row per peptide and one column per sample.

Important: This matrix contains only peptides with valid quantification values. Peptides that were identified but failed quality filters for LFQ will have missing (NaN) values or may be absent from the matrix entirely.

precursor.matrix.parquet

The precursor quantification matrix provides precursor-level quantification across all samples (when precursor-level LFQ is enabled). It contains one row per precursor and one column per sample.

Important: This matrix contains only precursors with valid quantification values. The number of non-zero entries per sample is typically ~3-4% lower than the search.precursors count in stats.tsv. The difference represents precursors that were identified but failed quantification quality filters such as:

  • Poor fragment quality or correlation (below min_correlation threshold)

  • Insufficient fragments (fewer than min_k_fragments)

  • Insufficient non-missing values (below min_nonnan threshold for directLFQ)

This is expected behavior and reflects the distinction between identification (passing FDR) and quantification (passing additional quality requirements).
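The gap between identified and quantified precursors can be checked per sample with a short pandas sketch. Several details here are assumptions rather than documented behavior: that the sample columns of the matrix are named after raw.name, that the identifier column is called "precursor", and that the identified counts were read from search.precursors in stats.tsv. The numbers are toy values.

```python
import pandas as pd


def quantified_fraction(matrix: pd.DataFrame, identified: dict[str, int]) -> pd.Series:
    """Fraction of identified precursors with a non-zero quantity, per sample."""
    samples = [c for c in matrix.columns if c in identified]
    values = matrix[samples]
    # NaN != 0 evaluates to True, so missing values must be excluded explicitly.
    quantified = (values.notna() & (values != 0)).sum()
    return quantified / pd.Series(identified)[samples]


# Toy stand-in for precursor.matrix.parquet.
matrix = pd.DataFrame({
    "precursor": ["p1", "p2", "p3", "p4"],  # hypothetical identifier column
    "run_a": [1.0e5, 2.0e5, float("nan"), 4.0e5],
    "run_b": [1.5e5, 0.0, 3.0e5, 3.5e5],
})
# Toy stand-in for search.precursors per run, as reported in stats.tsv.
identified = {"run_a": 4, "run_b": 5}

fractions = quantified_fraction(matrix, identified)
print(fractions)
```

In this toy case both runs have three quantified precursors, giving fractions of 0.75 and 0.6. On real data, values a few percent below 1 would match the behavior described above.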

internal.tsv

Internal statistics and timing information from the search process.

speclib.hdf

The input spectral library as it was loaded and preprocessed. This includes isotope calculation, library prediction, retention time normalization, and FASTA annotation.

speclib.mbr.hdf

The match-between-runs (MBR) output library, containing all precursors identified in the search, for use in second-step quantification.

  • All high-confidence precursor identifications from the search

  • Empirically optimized retention times and ion mobilities

  • Only created when general.save_mbr_library: true in the configuration

speclib.transfer.hdf

The transfer learning library, containing training data for sample-specific model refinement. It includes:

  • High-confidence precursor identifications

  • All requested fragment types (not just top fragments)

  • Observed intensities (not predicted values)

  • Empirically measured retention times and ion mobilities

quant/ folder

The quant/ folder contains per-file quantification results used for checkpointing and distributed searches.

Structure:

quant/
├── <raw_file_1>/
│   ├── psm.parquet      # PSM-level data for the raw file
│   └── frag.parquet     # Fragment-level data for the raw file
├── <raw_file_2>/
│   ├── psm.parquet
│   └── frag.parquet
└── ...

If psm.parquet and frag.parquet are present in the quant/ folder, their contents are reused when reuse_quant is enabled in the configuration. This allows efficient re-analysis without re-extracting quantification data from the raw files.
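Before restarting a search, it can help to see which raw files already have both checkpoint files. The sketch below only inspects the folder layout shown above. It does not reproduce the software's own reuse logic, and reusable_runs is a hypothetical helper name. The demo builds a temporary quant/ folder with one complete and one incomplete run.

```python
import tempfile
from pathlib import Path

REQUIRED = ("psm.parquet", "frag.parquet")


def reusable_runs(quant_dir: Path) -> list[str]:
    """Names of raw-file subfolders that contain both checkpoint files."""
    return sorted(
        sub.name
        for sub in quant_dir.iterdir()
        if sub.is_dir() and all((sub / name).is_file() for name in REQUIRED)
    )


# Demo on a throwaway folder: run_a is complete, run_b lacks frag.parquet.
with tempfile.TemporaryDirectory() as tmp:
    quant = Path(tmp) / "quant"
    for run, files in {"run_a": REQUIRED, "run_b": ("psm.parquet",)}.items():
        (quant / run).mkdir(parents=True)
        for name in files:
            (quant / run / name).touch()
    found = reusable_runs(quant)

print(found)
# ['run_a']
```

Runs missing from the returned list would need their quantification data extracted from the raw files again.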

See the restarting documentation for more details on using the --quant-dir parameter and reuse_quant configuration.