Sample_ID validation #125

@nathanweeks

The Illumina Sequencing Sample Sheet Format Specifications document cited in the sample-sheet code:

# From the section "Character Encoding" in the Illumina format specification.
#
# https://www.illumina.com/content/dam/illumina-marketing/
# documents/products/technotes/
# sequencing-sheet-format-specifications-technical-note-970-2017-004.pdf

explicitly mentions additional restrictions on Sample_ID column values:

The field for the Sample_ID column has special character restrictions as only alphanumeric (ASCII codes 48-57, 65-90, and 97-122), dash (ASCII code 45), and underscore (ASCII code 95) are permitted. The Sample_ID length is limited to 100 characters maximum.

The sample_sheet validation code currently allows some invalid Sample_ID values (e.g., containing +) that some tools (like bcl2fastq) reject. Could the sample_sheet validation code be enhanced to detect Sample_IDs that don't conform to the Illumina spec?
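For illustration, a check along these lines might look like the sketch below. The name `is_valid_sample_id` is hypothetical and is not part of the sample_sheet API. The character set and the 100-character limit come from the spec text quoted above; rejecting an empty Sample_ID is an extra assumption that the quoted passage does not state.

```python
import re

# Hypothetical helper for illustration; not part of the sample_sheet package.
# Allowed characters per the quoted spec: A-Z, a-z, 0-9, dash, underscore.
# Maximum length per the quoted spec: 100 characters.
# Assumption: an empty Sample_ID is treated as invalid (minimum length 1).
_SAMPLE_ID_PATTERN = re.compile(r"[A-Za-z0-9_-]{1,100}")


def is_valid_sample_id(sample_id: str) -> bool:
    """Return True if sample_id meets the quoted Illumina restrictions."""
    # fullmatch requires the whole string to match, so a trailing newline
    # or any other stray character causes rejection.
    return _SAMPLE_ID_PATTERN.fullmatch(sample_id) is not None
```

With this sketch, `is_valid_sample_id("Sample_1-A")` is accepted, while `is_valid_sample_id("Sample+1")` and any ID longer than 100 characters are rejected.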
