Description
The Illumina Sequencing Sample Sheet Format Specifications document cited in the sample-sheet code:
sample-sheet/sample_sheet/__init__.py
Lines 58 to 62 in 06d2566
    # From the section "Character Encoding" in the Illumina format specification.
    #
    # https://www.illumina.com/content/dam/illumina-marketing/
    # documents/products/technotes/
    # sequencing-sheet-format-specifications-technical-note-970-2017-004.pdf
explicitly mentions additional restrictions on Sample_ID column values:
> The field for the Sample_ID column has special character restrictions as only alphanumeric (ASCII codes 48-57, 65-90, and 97-122), dash (ASCII code 45), and underscore (ASCII code 95) are permitted. The Sample_ID length is limited to 100 characters maximum.
The sample_sheet validation code currently allows some invalid Sample_ID values (e.g., containing +) that some tools (like bcl2fastq) reject. Could the sample_sheet validation code be enhanced to detect Sample_IDs that don't conform to the Illumina spec?
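
For illustration, here is a minimal sketch of what such a check might look like, based only on the spec text quoted above. The helper name `is_valid_sample_id` is hypothetical and not part of the sample_sheet API, and rejecting an empty Sample_ID is my assumption (the quoted passage only states a maximum length).

```python
import re

# Derived from the quoted spec text: ASCII letters, digits, dash, and
# underscore only, with at most 100 characters.
# Assumption: an empty Sample_ID is rejected too (hence {1,100}).
_SAMPLE_ID_RE = re.compile(r"[A-Za-z0-9_-]{1,100}")


def is_valid_sample_id(sample_id: str) -> bool:
    """Hypothetical helper; not part of the sample_sheet API."""
    # fullmatch requires the whole string to match, so a trailing
    # newline or any other stray character is rejected.
    return _SAMPLE_ID_RE.fullmatch(sample_id) is not None


for candidate in ("Sample_1-A", "Sample+1", "a" * 101):
    print(repr(candidate[:12]), is_valid_sample_id(candidate))
```

An explicit ASCII character class is used here rather than `str.isalnum()`, because `isalnum()` also accepts non-ASCII letters and digits that fall outside the ranges the spec lists.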