Description
Hi,
ARIBA ran into a strange error while running against the VirulenceFinder database:
[E::hts_idx_push] Unsorted positions on sequence # 1: 109 followed by 11
OSError: building of index for /scratch/shadow/tmpr7wt7j_c/ariba_virulencefinder/ariba_virulencefinder/read_store.gz failed
I worked out that read_store.gz is sorted incorrectly because one of the genes has no cluster information. I changed read_store.py to sort correctly even when cluster information is missing, but then it failed at a later step:
_init_and_run_clusters reference_names=self.cluster_ids[cluster_name],
KeyError: ''
Obviously, because the cluster name was missing. :)
Then I started digging around and made this small test:
mkdir vftest
cd vftest
ariba getref virulencefinder out.virulencefinder
ariba prepareref -f out.virulencefinder.fa -m out.virulencefinder.tsv ./test
cd test
cat 02.cdhit.clusters.tsv | awk '{$1="";print}' | tr " " "\n" | sort | uniq > cluster_file
grep ">" 02.cdhit.all.fa | sed 's/>//g' | sort > all_file
wc -l all_file
wc -l cluster_file
diff cluster_file all_file
Output of the last three lines:
5558 all_file
5554 cluster_file //cluster file contains one empty line in the beginning
1d0 //this is the empty line
< //this is the empty line
718a718
> csnA_4_KJ922517
973a974
> eltIIAB_c8_1_AASRQF010000005
4943a4945
> stx2_122_CP022279_122
5082a5085
> stx2b_O128_24196_97_95_AJ567995_95
5157a5161
> stx2h_O102_STEC299_122_CP022279_122
So the issue arises because one or more of those 5 sequences (in my case stx2h_O102_STEC299_122_CP022279_122) are found in my sequencing reads but are not part of any cluster. When read_store is built, their entries contain no cluster name, which makes the script fail.
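For anyone who wants to repeat the check without the shell pipeline, here is a rough Python sketch of the same comparison. It is not part of ARIBA; the function names are made up for illustration, and it assumes that each line of 02.cdhit.clusters.tsv is a cluster name followed by whitespace-separated member names, and that sequence names in 02.cdhit.all.fa contain no spaces.

```python
def read_cluster_members(clusters_tsv):
    """Collect every sequence name listed as a member of any cluster.

    Assumption: column 1 is the cluster name, all remaining
    whitespace-separated columns are member sequence names.
    """
    members = set()
    with open(clusters_tsv) as handle:
        for line in handle:
            fields = line.split()
            members.update(fields[1:])  # skip the cluster name itself
    return members


def read_fasta_names(fasta):
    """Collect the sequence names from the FASTA header lines."""
    names = set()
    with open(fasta) as handle:
        for line in handle:
            if line.startswith(">"):
                header = line[1:].strip()
                if header:
                    # Assumption: the name is the first token of the header.
                    names.add(header.split()[0])
    return names


def unclustered_sequences(clusters_tsv, fasta):
    """Return sorted names present in the FASTA but absent from every cluster."""
    return sorted(read_fasta_names(fasta) - read_cluster_members(clusters_tsv))
```

Calling `unclustered_sequences("02.cdhit.clusters.tsv", "02.cdhit.all.fa")` from inside the prepareref output directory should list the same sequences that the diff above marks with `>`.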
ariba version
ARIBA version: 2.14.6
External dependencies:
bowtie2 2.2.5 /srv/data/tools/anaconda3/envs/env_cge_update/bin/bowtie2
cdhit 4.8.1 /srv/data/tools/anaconda3/envs/env_cge_update/bin/cd-hit-est
nucmer 3.1 /srv/data/tools/anaconda3/envs/env_cge_update/bin/nucmer
spades 3.15.5 /srv/data/tools/anaconda3/envs/env_cge_update/bin/spades.py
External dependencies OK: True
Python version:
3.9.15 | packaged by conda-forge | (main, Nov 22 2022, 08:45:29)
[GCC 10.4.0]
Python packages:
ariba 2.14.6 /srv/data/tools/anaconda3/envs/env_cge_update/lib/python3.9/site-packages/ariba/__init__.py
bs4 4.11.1 /srv/data/tools/anaconda3/envs/env_cge_update/lib/python3.9/site-packages/bs4/__init__.py
dendropy 4.5.2 /srv/data/tools/anaconda3/envs/env_cge_update/lib/python3.9/site-packages/dendropy/__init__.py
pyfastaq 3.17.0 /srv/data/tools/anaconda3/envs/env_cge_update/lib/python3.9/site-packages/pyfastaq/__init__.py
pymummer 0.11.0 /srv/data/tools/anaconda3/envs/env_cge_update/lib/python3.9/site-packages/pymummer/__init__.py
pysam 0.18.0 /srv/data/tools/anaconda3/envs/env_cge_update/lib/python3.9/site-packages/pysam/__init__.py
Python packages OK: True
Everything looks OK: True