Fix issues with MaxQuant and Fragpipe tables by ammarcsj · Pull Request #88 · MannLabs/alphaquant

ammarcsj · 2025-03-14T11:54:42Z

Addressing issue #85

Changed the intable config for the GUI back to the default version also used in the python version
Update config for MaxQuant evidence.txt
Extend sample name extraction in GUI to parquet files
Extend sample name extraction in GUI to wideformat tables (MaxQuant peptides.txt and fragpipe)
Disable ML for wideformat tables
Add quicktests that run short AlphaQuant runs on MaxQuant (MQ), Fragpipe, DIA-NN, and Spectronaut input tables
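
The sample name extraction for long-format and wide-format tables listed above could look roughly like the following sketch. It assumes a long-format table stores sample names in a single column, and a wide-format table encodes them in column headers through a shared prefix or suffix. The function names, the toy tables, and the `"Intensity "` prefix are hypothetical, and the sketch uses the standard-library `csv` module rather than AlphaQuant's own reader utilities.

```python
import csv
import io


def sample_names_from_long_table(handle, sample_column, sep="\t"):
    """Collect unique sample names from one column, streaming row by row."""
    reader = csv.DictReader(handle, delimiter=sep)
    return {row[sample_column] for row in reader}


def sample_names_from_wide_table(handle, quant_pre_or_suffix, sep="\t"):
    """Derive sample names from headers that start or end with a marker."""
    # Only the header row is read; the data rows are never touched.
    headers = next(csv.reader(handle, delimiter=sep))
    names = set()
    for col in headers:
        if col.startswith(quant_pre_or_suffix):
            names.add(col[len(quant_pre_or_suffix):])
        elif col.endswith(quant_pre_or_suffix):
            names.add(col[: -len(quant_pre_or_suffix)])
    return names


# Hypothetical toy inputs standing in for real search-engine output.
long_table = "Experiment\tIntensity\nS1\t10\nS2\t20\nS1\t30\n"
wide_table = "Sequence\tIntensity S1\tIntensity S2\nPEPTIDE\t10\t20\n"

long_names = sample_names_from_long_table(io.StringIO(long_table), "Experiment")
wide_names = sample_names_from_wide_table(io.StringIO(wide_table), "Intensity ")
```

Slicing the marker off the start or end of the header, instead of replacing it wherever it occurs, leaves a sample name intact when it happens to contain the marker text itself.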

github-actions · 2025-03-14T14:20:00Z

The following feedback could not be added to specific lines, but still contains valuable information:

Looking at the code changes, I've identified several issues that need to be addressed:

[
  {
    "change_id": 1,
    "file_name": "./alphaquant/diffquant/condpair_analysis.py",
    "start_line": 237,
    "end_line": 238,
    "proposed_code": "def check_if_has_precursor_nodes(condpair_node):\n    try:\n        return condpair_node.children[0].children[0].children[0].children[0].type == "mod_seq_charge"\n    except:\n        return False",
    "comment": "The original code doesn't handle cases where the node structure might not have the expected depth. This can cause IndexError exceptions when processing certain data types. The try-except block provides a safer way to check for precursor nodes."
  },
  {
    "change_id": 2,
    "file_name": "./alphaquant/config/quant_reader_config.yaml",
    "start_line": 145,
    "end_line": 154,
    "proposed_code": "maxquant_evidence:\n  format: longtable\n  sample_ID: Experiment\n  quant_ID:\n    precursor: Intensity\n  protein_cols:\n   - Gene names\n  ion_hierarchy:\n    precursor:\n      order: [SEQ, MOD, CHARGE]\n      mapping:\n        SEQ:\n          - Sequence\n        MOD:\n          - Mass\n        CHARGE:\n          - Charge\n  filters:\n    reverse:\n      param: Reverse\n      comparator: "!="\n      value: "+"\n    contaminant:\n      param: Potential contaminant\n      comparator: "!="\n      value: "+"\n  ml_level: SEQ\n  use_iontree: False",
    "comment": "The MaxQuant evidence configuration was updated to use a hierarchical structure for ion representation rather than a flat list of ion_cols. This improves compatibility with the AlphaQuant pipeline and provides better support for proper precursor quantification. The filters for reverse and contaminant proteins were also added to improve data quality."
  },
  {
    "change_id": 3,
    "file_name": "./alphaquant/run_pipeline.py",
    "start_line": 124,
    "end_line": 125,
    "proposed_code": "    input_type, config_dict, _ = config_dict_loader.get_input_type_and_config_dict(input_file_original, input_type_to_use)\n    annotation_file = load_annotation_file(input_file_original, input_type, annotation_columns)\n    use_ml = check_if_table_supports_ml(config_dict)",
    "comment": "The original code doesn't capture the config_dict from the return value, which is needed later to check if ML is supported. Also, a new use_ml variable is introduced to track whether ML should be used for the current input type."
  },
  {
    "change_id": 4,
    "file_name": "./alphaquant/run_pipeline.py",
    "start_line": 240,
    "end_line": 240,
    "proposed_code": "def check_if_table_supports_ml(config_dict):\n    return config_dict["format"] == "longtable"",
    "comment": "Added a helper function to check if machine learning is supported for the current input format. Currently, ML is only supported for longtable formats."
  },
  {
    "change_id": 5,
    "file_name": "./alphaquant/run_pipeline.py",
    "start_line": 130,
    "end_line": 136,
    "proposed_code": "        input_file_reformat = load_ptm_input_file(input_file = input_file_original, input_type_to_use = "spectronaut_ptm_fragion", results_dir = results_dir, samplemap_df = samplemap_df, modification_type = modification_type, organism = organism)\n        if use_ml:\n            ml_input_file = load_ml_info_file(input_file_original, input_type, modification_type)\n\n    elif "fragment_precursorfiltered.matrix" in input_file_original:\n        alphadia_tableprocessor = aq_table_alphadiareader.AlphaDIAFragTableProcessor(input_file_original)\n        input_file_reformat = alphadia_tableprocessor.input_file_reformat\n        if use_ml:\n            ml_input_file = alphadia_tableprocessor.ml_info_file",
    "comment": "Modified the code to check if ML is supported before loading ML information files. This prevents errors when trying to use ML with formats that don't support it."
  },
  {
    "change_id": 6,
    "file_name": "./alphaquant/run_pipeline.py",
    "start_line": 142,
    "end_line": 143,
    "proposed_code": "        input_file_reformat = load_input_file(input_file_original, input_type)\n        if use_ml:\n            ml_input_file = load_ml_info_file(input_file_original, input_type)",
    "comment": "Added the same ML support check in the general case, ensuring ML info is only loaded when the format supports it."
  },
  {
    "change_id": 7,
    "file_name": "./alphaquant/utils/reader_utils.py",
    "start_line": 7,
    "end_line": 10,
    "proposed_code": "def read_file(file_path, decimal=".", usecols=None, chunksize=None, sep=None, nrows=None):\n    file_path = str(file_path)\n    if ".parquet" in file_path:\n        if nrows is not None:\n            LOGGER.warning(f"nrows parameter is set, but not supported for parquet files. Ignoring nrows parameter.")\n        return _read_parquet_file(file_path, usecols=usecols, chunksize=chunksize)",
    "comment": "Added 'nrows' parameter to the read_file function to allow reading only a subset of rows. This is helpful for examining large files or for testing. The function also adds a warning when nrows is used with parquet files since that format doesn't directly support it."
  },
  {
    "change_id": 8,
    "file_name": "./alphaquant/utils/reader_utils.py",
    "start_line": 28,
    "end_line": 30,
    "proposed_code": "            usecols=usecols,\n            encoding="latin1",\n            chunksize=chunksize,\n            nrows=nrows,\n        )",
    "comment": "Added the nrows parameter to the pandas.read_csv call to enable partial file reading."
  },
  {
    "change_id": 9,
    "file_name": "./alphaquant/ui/dashboard_parts_run_pipeline.py",
    "start_line": 886,
    "end_line": 903,
    "proposed_code": "				input_file = self.path_analysis_file.value\n				_, config_dict, sep = config_dict_loader.get_input_type_and_config_dict(input_file)\n				if config_dict["format"] == "longtable":\n					sample_column = config_dict["sample_ID"]\n					sample_names = set()\n\n					for chunk in aq_reader_utils.read_file(input_file, sep=sep, usecols=[sample_column], chunksize=400000):\n						sample_names.update(chunk[sample_column].unique())\n					self.sample_names = sample_names\n				elif config_dict["format"] == "widetable":\n					# Read the headers first to identify sample columns\n					headers = aq_reader_utils.read_file(input_file, sep=sep, nrows=0).columns.tolist()\n\n					quant_pre_or_suffix = config_dict.get("quant_pre_or_suffix")\n					# Filter headers to find those with the prefix or suffix\n					sample_columns = [\n						col for col in headers if (\n							col.startswith(quant_pre_or_suffix) or\n							col.endswith(quant_pre_or_suffix)\n						)\n					]\n					self.sample_names = set([col.replace(quant_pre_or_suffix, "") for col in sample_columns])\n				else:\n					print("ERROR: Could not idenfity sample names in the input file.")\n					self.run_pipeline_error.object = "Could not idenfity sample names . Please check your input file."\n					self.run_pipeline_error.visible = True",
    "comment": "Added support for wide format tables in the UI sample detection code. Previously it only worked with longtable formats. The code now checks the format type and uses different methods to extract sample names based on the table format."
  }
]
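
As an illustration of the depth check discussed in change 1, the sketch below walks down the first child at each level and stops early when a level is missing, so no exception handling is needed. This is a variation on the proposed try/except, not the code that was merged. The `Node` class, the helper name, and the intermediate node types (`"gene"`, `"seq"`, `"mod_seq"`) are hypothetical stand-ins; only the `"mod_seq_charge"` type and the depth of four come from the proposal.

```python
class Node:
    """Hypothetical stand-in for a tree node with a type and children."""

    def __init__(self, node_type, children=None):
        self.type = node_type
        self.children = children or []


def has_node_type_at_depth(root, depth, expected_type):
    """Follow the first child `depth` times and compare the node type."""
    node = root
    for _ in range(depth):
        if not node.children:
            # The tree is shallower than expected, so the check fails cleanly.
            return False
        node = node.children[0]
    return node.type == expected_type


# A tree that reaches precursor level, and one that stops two levels early.
deep = Node(
    "condpair",
    [Node("gene", [Node("seq", [Node("mod_seq", [Node("mod_seq_charge")])])])],
)
shallow = Node("condpair", [Node("gene", [Node("seq")])])

deep_result = has_node_type_at_depth(deep, 4, "mod_seq_charge")
shallow_result = has_node_type_at_depth(shallow, 4, "mod_seq_charge")
```

Checking for children explicitly avoids a bare `except`, which would also hide unrelated errors such as a misspelled attribute name.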

mschwoer

Apparently, the code-review bot also prefers ruff-formatted code :-p

alphaquant/ui/dashboard_parts_run_pipeline.py

alphaquant/run_pipeline.py

ammarcsj added 10 commits March 13, 2025 16:48

change intable config

9018758

parse samplenames from wide format tables

ae28065

adapt sample reading to .parquet

a40428d

add nrows param to parquet reader

cda5f2c

set ml to false if wideformat table

bbe29dc

handle cases when there are not precursor nodes

dbc7bd8

update maxquant evidence config

ec63518

run small testsets for different search engines

86000b7

add different input file test

945020d

change datashare folder

b1cc25d

ammarcsj marked this pull request as ready for review March 14, 2025 14:12

ammarcsj requested a review from mschwoer March 14, 2025 14:13

ammarcsj added the code-review label Mar 14, 2025

mschwoer approved these changes Mar 14, 2025

ammarcsj merged commit 7c67554 into main Mar 17, 2025
5 checks passed

ammarcsj deleted the investigate_mq_fragpipe_issues branch March 17, 2025 13:13
