@HaidYi HaidYi commented Jan 6, 2026

Add test data for nf-core/pacsomatic

Summary

This PR adds test data for the nf-core/pacsomatic pipeline, consisting of downsampled mitochondrial genome BAM files from GIAB HG008 (paired tumor-normal samples).

Contents

  • Test data: Downsampled tumor/normal mitochondrial BAM files (testdata/)
  • Reference files: MT genome FASTA and target BED file (reference/)
  • Samplesheet: Input CSV following pacsomatic schema
  • Documentation: Comprehensive README with dataset details and usage instructions

Checklist

  • Test data is minimal size
  • README follows nf-core standards
  • Files follow naming conventions
  • Samplesheet matches pipeline schema

@HaidYi HaidYi requested review from famosab and jfy133 January 6, 2026 19:57
@HaidYi HaidYi self-assigned this Jan 6, 2026
Member

@jfy133 jfy133 left a comment

praise

Overall really good documentation! I've made some suggestions just to make it really solid :)

README.md Outdated

### Sample Information

The test dataset includes paired tumor-normal samples from the Genome in a Bottle (GIAB) reference sample **HG008**:
Member

Could you provide a link to the original source files, and a description of how they were downsampled?

README.md Outdated

#### Reference Files (`reference/`)

- `MT.fa`: Human mitochondrial genome reference sequence (Homo sapiens mitochondrion, complete genome, accession: J01415.2, length: 16,569 bp)
Member

Could you include a link to the original source?

README.md Outdated
#### Reference Files (`reference/`)

- `MT.fa`: Human mitochondrial genome reference sequence (Homo sapiens mitochondrion, complete genome, accession: J01415.2, length: 16,569 bp)
- `MT_target.bed`: BED file defining target regions on the mitochondrial genome for variant calling
Member

Please note whether you made it manually.

README.md Outdated
--outdir results
```

## Rationale for Test Data Selection
Member

I would place this section with the description of the data itself (it's a bit out of context after you've described how to run the test data).

README.md Outdated

#### Sequencing Data (`testdata/`)

- `HG008_Downsample_MT_tumor.bam`: Downsampled tumor BAM file containing mitochondrial reads
Member

Describe how this was generated (which tool/command and version were used for alignment, filtering, etc.) to allow reconstruction in an 'emergency'.

@jfy133 I am responsible for generating the tiny HG008_Downsample_MT_xx.bam files. Taking the HG008 normal sample as an example, my procedure was as follows:
1. Get the total read count of the HG008 normal sample:
samtools view -c -f 4 ${HG008_Normal_hifibam}
2. Calculate the fraction that yields about 10K reads out of the total, e.g. 0.01/66.8 = 0.00015 (read counts in millions).
3. Use that fraction to generate the downsampled 10K unaligned BAM:
samtools view -b -s 0.00015 ${HG008_Normal_hifibam} > ./HG008-N_Downsample_10K.bam
4. Align the downsampled unaligned BAM to the hg38 reference with pbmm2 to produce the aligned BAM, Patient_HG008_Downsample_10K_N.align.bam.
5. Extract the MT-region aligned BAM:
samtools view -b Patient_HG008_Downsample_10K_N.align.bam MT | samtools sort -@ 16 -T HG008_Downsample_10K_N.tmp -o HG008_Downsample_MT_normal.align.bam
6. Use Picard to convert the tiny MT-region aligned BAM back to an unaligned BAM:
java -jar ${PICARD} RevertSam INPUT=HG008_Downsample_MT_normal.align.bam OUTPUT=HG008_Downsample_MT_normal.bam
The commands above used samtools/1.17 and picard/2.9.4; pbmm2 is from smrttools/25.1.
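
For step 2, the subsampling fraction could also be computed rather than worked out by hand. The following is only a sketch under assumptions: `subsample_fraction` is a hypothetical helper name (not part of samtools), and it rounds to five decimal places to match the example value above. Keep in mind that samtools reads the integer part of the `-s` value as the random seed and the fractional part as the fraction of reads to keep, so `0.00015` implies seed 0.

```shell
#!/bin/sh
# Hypothetical helper (not part of samtools): print target/total rounded to
# five decimal places, in a form that can be passed to `samtools view -s`.
subsample_fraction() {
    target=$1   # desired number of reads, e.g. 10000
    total=$2    # total number of reads, e.g. the output of `samtools view -c`
    # LC_ALL=C keeps "." as the decimal separator regardless of locale.
    LC_ALL=C awk -v t="$target" -v n="$total" 'BEGIN {
        if (n <= 0) exit 1          # refuse a zero or negative total
        printf "%.5f\n", t / n
    }'
}

# Example with the numbers quoted above (10K reads out of 66.8 million):
subsample_fraction 10000 66800000
```

With those inputs the helper prints `0.00015`, the value used in step 3. The fixed five-decimal rounding is a simplification; a much larger input file would need more decimal places to avoid rounding the fraction to zero.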

Author

HaidYi commented Jan 7, 2026

@jfy133 Thank you for the valuable comments. @wzhang42 Please check the comments and answer the data generation-related ones.

wzhang42 commented Jan 7, 2026

@HaidYi, I have replied above.

Author

HaidYi commented Jan 8, 2026

@jfy133 The readme is updated based on the answers of @wzhang42 and your comments.

Member

@jfy133 jfy133 left a comment

Great, thank you both!

@HaidYi HaidYi merged commit 1735bc3 into nf-core:pacsomatic Jan 9, 2026