Add deacon and deacon DM #7473
base: main
Changes from 7 commits
130daa1
0e1bba1
438cb98
aae6216
6e9014f
4e11579
8474b57
a7fc57f
19ddd38
f0ca31f
18b9283
8eff8c0
57dfe9b
142e8e3
42ee2f2
6288d4e
3c9a09d
3a31713
f423773
3e6211d
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,10 @@ | ||
| categories: | ||
| - Data Managers | ||
| - Metagenomics | ||
| homepage_url: https://github.com/bede/deacon | ||
| description: Data manager for Deacon index files | ||
| long_description: Data manager for Deacon index files | ||
| name: deacon_build_database | ||
| owner: iuc | ||
| remote_repository_url: https://github.com/galaxyproject/tools-iuc/tree/main/data_managers/data_manager_deacon | ||
| type: unrestricted |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,149 @@ | ||
| <tool id="deacon_build_database" name="Deacon" tool_type="manage_data" version="@TOOL_VERSION@+galaxy@VERSION_SUFFIX@" profile="@PROFILE@"> | ||
| <description>database builder</description> | ||
| <macros> | ||
| <!-- On update, run a local test with `test` set to something other than "true" --> | ||
| <token name="@TOOL_VERSION@">0.12.0</token> | ||
| <token name="@VERSION_SUFFIX@">0</token> | ||
| <token name="@PROFILE@">24.1</token> | ||
| </macros> | ||
| <requirements> | ||
| <requirement type="package" version="@TOOL_VERSION@">deacon</requirement> | ||
| </requirements> | ||
| <command detect_errors="exit_code"><![CDATA[ | ||
| mkdir -p '$out_file.extra_files_path' && | ||
| #if $test != "true" | ||
| #if $input.is_select == "prebuild" | ||
| #if $download == "human" | ||
| wget -P '$out_file.extra_files_path' 'https://zenodo.org/records/17288185/files/panhuman-1.k31w15.idx' && | ||
| #else | ||
| wget -P '$out_file.extra_files_path' 'https://objectstorage.uk-london-1.oraclecloud.com/n/lrbvkel2wjot/b/human-genome-bucket/o/deacon/3/panmouse-1.k31w15.idx' && | ||
| #end if | ||
| #else | ||
| wget -P '$out_file.extra_files_path' '$link' && | ||
| #end if | ||
| #else | ||
| touch '$out_file.extra_files_path'/test.idx && | ||
| #end if | ||
| cp '$dmjson' '$out_file' | ||
| ]]></command> | ||
| <configfiles> | ||
| <configfile name="dmjson"><![CDATA[ | ||
| #from datetime import datetime | ||
| #set time=datetime.now().strftime("%Y-%m-%d") | ||
| { | ||
| "data_tables":{ | ||
| "deacon":[ | ||
| { | ||
| #if $input.is_select == "prebuild" | ||
| #if $download == "human" | ||
| "path":"panhuman-1.k31w15.idx", | ||
| #else | ||
| "path":"panmouse-1.k31w15.idx", | ||
| #end if | ||
| #else | ||
| "path":"${str($link).rstrip('/').split('/')[-1]}", | ||
| #end if | ||
| #if $input.is_select == "prebuild" | ||
| "dbkey":"@TOOL_VERSION@", | ||
| #else | ||
| "dbkey":"custom $time", | ||
| #end if | ||
| #if $input.is_select == "prebuild" | ||
| #if $download == "human" | ||
| "name":"panhuman-1 (k=31, w=15)", | ||
| #else | ||
| "name":"panmouse-1 (k=31, w=15, e=0.5)", | ||
| #end if | ||
| #else | ||
| "name":"$name", | ||
| #end if | ||
| #if $input.is_select == "prebuild" | ||
| "version":"@TOOL_VERSION@", | ||
| #else | ||
| "version":"$version", | ||
| #end if | ||
| #if $input.is_select == "prebuild" | ||
| "value":"pre-build $time", | ||
| #else | ||
| "value":"custom $time", | ||
| #end if | ||
| #if $input.is_select == "prebuild" | ||
| "format_version":"3", | ||
| #else | ||
| "format_version":"$format_version", | ||
| #end if | ||
| #if $input.is_select == "prebuild" | ||
| "note":"Pre-build index files from the devs of deacon" | ||
| #else | ||
| "note":"$note" | ||
| #end if | ||
| } | ||
| ] | ||
| } | ||
| }]]> | ||
| </configfile> | ||
| </configfiles> | ||
| <inputs> | ||
| <conditional name="input"> | ||
| <param name="is_select" type="select" label="Choose how to add data to the DM"> | ||
| <option value="url">Copy an index file from a URL</option> | ||
| <option value="prebuild" selected="true">Download a pre-build file</option> | ||
| </param> | ||
| <when value="prebuild"> | ||
| <param name="download" type="select" label="Select which pre-build should be downloaded" help="See help section for more information"> | ||
| <option value="human">panhuman-1 (k=31, w=15)</option> | ||
| <option value="mouse">panmouse-1 (k=31, w=15, e=0.5)</option> | ||
| </param> | ||
| </when> | ||
| <when value="url"> | ||
| <param name="link" type="text" label="Input the URL to download an index file"/> | ||
| <param name="name" type="text" label="Set the name for the entry in the DM" help="See the names of the pre-build files for an example, and add the values used to build the file in brackets."/> | ||
| <param name="version" type="text" label="Which Deacon version was used to build the file" help="State the tool version used to build the index file"/> | ||
| <param name="format_version" type="text" label="Set the index format version" help="The current index format version for Deacon v.@TOOL_VERSION@ is 3"/> | ||
| <param name="note" type="text" label="Add a note" help="Free-text notes, for example where the data comes from and who created it"/> | ||
| </when> | ||
| </conditional> | ||
| <param name="test" type="hidden"/> | ||
| </inputs> | ||
| <outputs> | ||
| <data name="out_file" format="data_manager_json" /> | ||
| </outputs> | ||
| <tests> | ||
| <test expect_num_outputs="1"> | ||
| <conditional name="input"> | ||
| <param name="is_select" value="prebuild"/> | ||
| <param name="download" value="human"/> | ||
| </conditional> | ||
| <param name="test" value="true"/> | ||
| <output name="out_file"> | ||
| <assert_contents> | ||
| <has_text text='"format_version":"3"'/> | ||
| <has_text text='"name":"panhuman-1 (k=31, w=15)"'/> | ||
| </assert_contents> | ||
| </output> | ||
| </test> | ||
| <test expect_num_outputs="1"> | ||
| <conditional name="input"> | ||
| <param name="is_select" value="url"/> | ||
| <param name="link" value="https://zenodo.org/records/17288185/files/panhuman-1.k31w15.idx"/> | ||
| <param name="name" value="panhuman-1 (k=31, w=15)"/> | ||
| <param name="version" value="0.12.0"/> | ||
| <param name="format_version" value="3"/> | ||
| <param name="note" value="test"/> | ||
| </conditional> | ||
| <param name="test" value="true"/> | ||
| <output name="out_file"> | ||
| <assert_contents> | ||
| <has_text text='"format_version":"3"'/> | ||
| <has_text text='"name":"panhuman-1 (k=31, w=15)"'/> | ||
| </assert_contents> | ||
| </output> | ||
| </test> | ||
| </tests> | ||
| <help><![CDATA[ | ||
| Download pre-built index files for Deacon, or download other Deacon index files via URL. | ||
| ]]></help> | ||
| <citations> | ||
| <citation type="doi">10.1101/2025.06.09.658732</citation> | ||
| </citations> | ||
| </tool> | ||
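In URL mode, `wget -P` saves the download under the last component of the link, so the `path` entry in the JSON has to name that same component. A minimal Python sketch of the derivation follows. The helper name `index_filename` is hypothetical, and the sketch assumes a plain link with no query string.

```python
def index_filename(link: str) -> str:
    """Return the final path component of a download link (hypothetical helper)."""
    # Trim whitespace and any trailing slash, then keep what follows the last "/".
    return link.strip().rstrip("/").rsplit("/", 1)[-1]


url = "https://zenodo.org/records/17288185/files/panhuman-1.k31w15.idx"
print(index_filename(url))

# str.strip("/") only trims slashes at both ends, so it returns this URL unchanged
# and cannot be used to pick out the file name.
print(url.strip("/") == url)
```

Splitting on `/` rather than stripping it is the point of the example: stripping never removes the directory part of the link.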
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,22 @@ | ||
| <data_managers> | ||
| <data_manager tool_file="data_manager/deacon_datamanager.xml" id="deacon_build_database"> | ||
| <data_table name="deacon"> | ||
| <output> | ||
| <column name="value"/> | ||
| <column name="dbkey"/> | ||
| <column name="name"/> | ||
| <column name="version"/> | ||
| <column name="path" output_ref="out_file"> | ||
| <move type="file"> | ||
| <source>${path}</source> | ||
| <target base="${GALAXY_DATA_MANAGER_DATA_PATH}">deacon/${value}/${path}</target> | ||
| </move> | ||
| <value_translation>${GALAXY_DATA_MANAGER_DATA_PATH}/deacon/${value}/${path}</value_translation> | ||
| <value_translation type="function">abspath</value_translation> | ||
| </column> | ||
| <column name="format_version"/> | ||
| <column name="note"/> | ||
| </output> | ||
| </data_table> | ||
| </data_manager> | ||
| </data_managers> |
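The `value_translation` pattern above combines the data path with the `value` and `path` columns. The Python sketch below only illustrates that substitution and is not how Galaxy implements it. The `DATA_PATH` value and the helper name `translated_path` are assumptions made for the example.

```python
from string import Template

# Assumed location; Galaxy supplies the real GALAXY_DATA_MANAGER_DATA_PATH.
DATA_PATH = "/galaxy/tool-data"

# Same pattern as the <value_translation> element in the data manager config.
TARGET = Template("${GALAXY_DATA_MANAGER_DATA_PATH}/deacon/${value}/${path}")


def translated_path(entry: dict) -> str:
    """Fill the pattern from one data table entry (hypothetical helper)."""
    return TARGET.substitute(GALAXY_DATA_MANAGER_DATA_PATH=DATA_PATH, **entry)


entry = {"value": "pre-build 2025-11-17", "path": "panhuman-1.k31w15.idx"}
print(translated_path(entry))
```

Because `value` is used as a directory name, the space in `pre-build 2025-11-17` ends up in the final path, as the test `.loc` rows in this PR show.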
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,4 @@ | ||
| db_download_xxxx-xx-xx 0.12.0 deacon human index db 0.12.0 /tmp/tmpf_hplx2a/galaxy-dev/tool-data/deacon/0.12.0/test.idx 3 Testing | ||
| pre-build 2025-11-17 0.12.0 panhuman-1 (k=31, w=15) 0.12.0 /home/sf373/sf373/galaxy/tool-data/deacon/pre-build 2025-11-17/panhuman-1.k31w15.idx 3 Pre-build index files from the devs of deacon | ||
| custom 2025-11-17 custom 2025-11-17 panhuman-1 (k=31, w=15) 0.12.0 /home/sf373/sf373/galaxy/tool-data/deacon/custom 2025-11-17/panhuman-1.k31w15.idx 3 test | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,9 @@ | ||
| #This is a sample file distributed with Galaxy that enables tools | ||
| #to use the deacon database. | ||
| # | ||
| #<unique_build_id> <dbkey> <display_name> <version> <file_base_path> <index_format_version> <note_like_who_did_create_the_db> | ||
| #The <version> column indicates the deacon version that generated the database | ||
| # | ||
| #deacon_db 0.12.0 Deacon_database 0.12.0 /mnt/galaxyIndices/deacon_database/test.idx 3 just for the test |
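The sample file describes one tab-separated row per index, with seven columns in the order declared for the `deacon` table. As a rough Python sketch, such a file could be read as below. The function name `read_loc` is hypothetical and this is not Galaxy's own loader.

```python
# Column order as declared for the `deacon` data table in this PR.
COLUMNS = ["value", "dbkey", "name", "version", "path", "format_version", "note"]


def read_loc(text: str) -> list:
    """Parse .loc content into a list of dicts, skipping comments and blank lines."""
    entries = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != len(COLUMNS):
            raise ValueError(
                f"expected {len(COLUMNS)} tab-separated fields, got {len(fields)}"
            )
        entries.append(dict(zip(COLUMNS, fields)))
    return entries


sample = (
    "#This is a comment line and is ignored\n"
    "deacon_db\t0.12.0\tDeacon_database\t0.12.0\t"
    "/mnt/galaxyIndices/deacon_database/test.idx\t3\tjust for the test\n"
)
for entry in read_loc(sample):
    print(entry["name"], entry["path"], entry["format_version"])
```

Raising on a wrong field count makes a row with spaces in place of tabs fail loudly instead of producing a shifted entry.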
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,6 @@ | ||
| <tables> | ||
| <table name="deacon" comment_char="#" allow_duplicate_entries="False"> | ||
| <columns>value, dbkey, name, version, path, format_version, note</columns> | ||
| <file path="tool-data/deacon.loc" /> | ||
| </table> | ||
| </tables> | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,7 @@ | ||
| <tables> | ||
| <!-- Location of deacon indexes for testing --> | ||
| <table name="deacon" comment_char="#" allow_duplicate_entries="False"> | ||
| <columns>value, dbkey, name, version, path, format_version, note</columns> | ||
| <file path="${__HERE__}/test-data/deacon.loc" /> | ||
| </table> | ||
| </tables> |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,13 @@ | ||
| name: deacon | ||
| owner: iuc | ||
| description: filters DNA sequences in FASTA/Q files and streams using accelerated minimizer comparison | ||
| homepage_url: https://github.com/bede/deacon | ||
| long_description: | | ||
| Filter sequences using accelerated minimizer comparison with query sequence(s), | ||
| emitting either matching sequences (search mode), or sequences without matches (deplete mode). | ||
| Sequences match when they share enough distinct minimizers with the indexed query to exceed chosen | ||
| absolute and relative thresholds. | ||
| remote_repository_url: https://github.com/galaxyproject/tools-iuc/tree/main/tools/deacon | ||
| type: unrestricted | ||
| categories: | ||
| - Metagenomics |
Do they provide MD5 hashes for checking? I'm worried about data corruption.
Also strange that they do not gzip the files?
I don't know, but I can ask upstream whether they have them or would be willing to upload them to Zenodo. And no, they don't use gzip since the index files are small. The human index file is only around 4 GB.
Asking does not hurt.
Hi there, Deacon author here.
The `panhuman-1` index is deposited on Zenodo, but `panmouse-1` is not. I am happy to put `panmouse-1` onto Zenodo in a manner consistent with `panhuman-1` if desired. Please note that Zenodo downloads are much slower (at least in the UK) than the S3 bucket downloads, which is why Deacon defaults to using object storage with `deacon index fetch <name>`, added in 0.13.0.
Maybe just .md5 files? The intention is only to verify the integrity of the download.
The advantage for us is: guaranteed stable URLs.
If we care about Zenodo: to add .md5 files I would need to make new versions of those records. They seem to recommend using the Zenodo REST API, which provides checksums for all files like so:
Makes sense, I will put panmouse-1 (and all public indexes) on Zenodo
Which other indexes are there? Since we are starting with the newest version of the tool, public index files with format version 1 or 2 cannot be used here. If you are only doing this for Galaxy, you can save yourself some work, since we only need the format version 3 index files :)
Cool trick.
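For reference, the integrity check discussed in this thread could be sketched in Python as follows. It assumes the expected MD5 digest has already been obtained from somewhere trusted, such as a published `.md5` file or the Zenodo record. The helper names `md5sum` and `is_intact` are hypothetical, and the demo file only stands in for a real index.

```python
import hashlib
import os
import tempfile


def md5sum(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Chunked reads keep memory use flat even for multi-GB index files.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path: str, expected_md5: str) -> bool:
    """Compare the file's digest with the expected one (case-insensitive)."""
    return md5sum(path) == expected_md5.strip().lower()


# Demo with a small stand-in file instead of a real Deacon index.
with tempfile.TemporaryDirectory() as tmp:
    index = os.path.join(tmp, "demo.idx")
    with open(index, "wb") as handle:
        handle.write(b"not a real index")
    expected = hashlib.md5(b"not a real index").hexdigest()
    print(is_intact(index, expected))   # digest matches
    print(is_intact(index, "0" * 32))   # digest does not match
```

MD5 is used here only to detect a corrupted download, which is the stated intention in the thread; it is not a defence against deliberate tampering.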