Add test data for nf-core/pacsomatic #1826
jfy133
left a comment
README.md
Outdated
### Sample Information

The test dataset includes paired tumor-normal samples from the Genome in a Bottle (GIAB) reference sample **HG008**:
Could you provide a link to the original source files, and a description of how they were downsampled?
README.md
Outdated
#### Reference Files (`reference/`)

- `MT.fa`: Human mitochondrial genome reference sequence (Homo sapiens mitochondrion, complete genome, accession: J01415.2, length: 16,569 bp)
Could you include a link to the original source?
The original HG008 data are from GIAB (https://www.nist.gov/programs-projects/cancer-genome-bottle). I downloaded the normal and tumor data from the GIAB FTP site (https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/data_somatic/HG008/Liss_lab/PacBio_Revio_20240125/).
README.md
Outdated
#### Reference Files (`reference/`)

- `MT.fa`: Human mitochondrial genome reference sequence (Homo sapiens mitochondrion, complete genome, accession: J01415.2, length: 16,569 bp)
- `MT_target.bed`: BED file defining target regions on the mitochondrial genome for variant calling
Please note whether you made it manually.
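If the BED file simply spans the whole mitochondrial reference, it could be regenerated from `MT.fa` instead of being written by hand. The following is only a minimal sketch under that assumption: the thread does not say how `MT_target.bed` was actually produced or which regions it contains, and `fasta_to_bed` is a hypothetical helper name, not part of the PR.

```shell
# Hypothetical helper (assumption, not from the PR): print one BED line per
# FASTA record, covering the whole sequence. BED coordinates are 0-based and
# half-open, so a sequence of length N becomes "name<TAB>0<TAB>N".
fasta_to_bed() {
  awk '
    /^>/ {
      # New record: flush the previous one, then take the first word of the header as the name.
      if (name != "") printf "%s\t0\t%d\n", name, len
      name = substr($1, 2); len = 0; next
    }
    {
      sub(/\r$/, "")          # tolerate Windows line endings
      len += length($0)       # accumulate sequence length across lines
    }
    END { if (name != "") printf "%s\t0\t%d\n", name, len }
  ' "$1"
}
```

Usage would be along the lines of `fasta_to_bed reference/MT.fa > reference/MT_target.bed`. The contig name in the output is whatever the FASTA header starts with, so it needs to match the name used in the BAM files.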
README.md
Outdated
    --outdir results
    ```

## Rationale for Test Data Selection
I would place this section with the description of the data itself (it's a bit out of context after you've described how to run the test data).
README.md
Outdated
#### Sequencing Data (`testdata/`)

- `HG008_Downsample_MT_tumor.bam`: Downsampled tumor BAM file containing mitochondrial reads
Describe how these were generated (which tool/command, with version, was used for alignment, filtering, etc.) to allow reconstruction in an 'emergency'.
@jfy133, I was responsible for generating the tiny HG008_Downsample_MT_xx.bam files. Taking the HG008 normal sample as an example, my procedure was as follows:
1.) Get the total read count of the HG008 normal sample (the input HiFi BAM is unaligned, so `-f 4` matches every read):
samtools view -c -f 4 ${HG008_Normal_hifibam}
2.) Calculate the fraction of reads needed to keep about 10K reads, i.e. target reads divided by total reads (both in millions here), e.g. 0.01/66.8 ≈ 0.00015.
3.) Use that fraction to generate the downsampled (about 10K reads) unaligned BAM:
samtools view -b -s 0.00015 ${HG008_Normal_hifibam} > ./HG008-N_Downsample_10K.bam
4.) Use pbmm2 to align the downsampled unaligned BAM to the hg38 reference, producing the aligned BAM Patient_HG008_Downsample_10K_N.align.bam.
5.) Extract the MT-region reads from the aligned BAM and sort them:
samtools view -b Patient_HG008_Downsample_10K_N.align.bam MT | samtools sort -@ 16 -T HG008_Downsample_10K_N.tmp -o HG008_Downsample_MT_normal.align.bam
6.) Use Picard to convert the tiny MT-region aligned BAM back to an unaligned BAM:
java -jar ${PICARD} RevertSam INPUT=HG008_Downsample_MT_normal.align.bam OUTPUT=HG008_Downsample_MT_normal.bam
samtools/1.17 and picard/2.9.4 were used for the commands above; pbmm2 is from smrttools/25.1.
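The arithmetic in step 2 can be scripted so the fraction does not have to be worked out by hand. This is a minimal sketch rather than the procedure actually used in the PR: `subsample_fraction` is a hypothetical helper name, and the read counts are the approximate figures quoted above (10K target reads, 66.8 million total reads).

```shell
# Hypothetical helper (assumption, not from the PR): compute the subsampling
# fraction for a target read count, rounded to five decimal places.
#   $1 = target number of reads, $2 = total number of reads
subsample_fraction() {
  awk -v t="$1" -v n="$2" 'BEGIN { printf "%.5f\n", t / n }'
}

# Figures from the thread: about 10,000 reads out of about 66.8 million.
subsample_fraction 10000 66800000   # prints 0.00015
```

The result can then be passed to `samtools view -s`. In that option the integer part of the value seeds the random number generator and the part after the decimal point is the fraction of reads kept, so `-s 0.00015` means seed 0 with a fraction of 0.00015. Recording the seed along with the fraction helps make the subsample reproducible.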
@HaidYi, I have replied above.
jfy133
left a comment
Great, thank you both!
Add test data for nf-core/pacsomatic
Summary
This PR adds test data for the nf-core/pacsomatic pipeline, consisting of downsampled mitochondrial genome BAM files from GIAB HG008 (paired tumor-normal samples).
Contents
- Sequencing Data (`testdata/`)
- Reference Files (`reference/`)
Checklist