83 changes: 83 additions & 0 deletions tools/hyphy/hyphy_cln.xml
@@ -0,0 +1,83 @@
<tool id="hyphy_cln" name="HyPhy-CLN" version="@TOOL_VERSION@+galaxy@VERSION_SUFFIX@" profile="@PROFILE@">
<description>Clean and normalize alignment</description>
<macros>
<import>macros.xml</import>
</macros>
<expand macro="bio_tools"/>
<expand macro="requirements"/>
<command detect_errors="exit_code"><![CDATA[
@SYMLINK_FILES_NO_TREE@
@HYPHYMP@ cln
--alignment input.${input_file.extension}
--code '$gencodeid'
--filtering-method '$filtering_method'
--output '$output_file'
]]></command>
<inputs>
<param name="input_file" type="data" format="fasta,fasta.gz,nex,nexus,phylip,mega" label="Input alignment file" help="An in-frame codon alignment in one of the formats supported by HyPhy (FASTA, NEXUS, PHYLIP, or MEGA)" />
<expand macro="gencode"/>
<param argument="--filtering-method" type="select" label="Filter duplicates/gaps" help="How to handle duplicate sequences, gap-only sites, and stop codons">
<option value="No/No" selected="true">Keep all sequences and sites</option>
<option value="No/Yes">Keep all sequences, filter sites with nothing but gaps</option>
<option value="Yes/No">Filter duplicate sequences but keep all sites</option>
<option value="Yes/Yes">Filter duplicate sequences and sites with nothing but gaps</option>
<option value="Disallow stops">Filter duplicate sequences and mask stop codons with gaps</option>
</param>
</inputs>
<outputs>
<data name="output_file" format="fasta" label="${tool.name} on ${on_string}: Cleaned alignment" />
</outputs>
<tests>
<test expect_num_outputs="1">
<param name="input_file" value="conv-in1.fa"/>
<param name="filtering_method" value="No/No"/>
<output name="output_file">
<assert_contents>
<has_text text=">epi_isl_1041406_hCoV_19_USA_NY_PRL_2021_02_08_05H12_2021" />
</assert_contents>
</output>
</test>
<test expect_num_outputs="1">
<param name="input_file" value="cln-stop-codons.fa"/>
<param name="filtering_method" value="Disallow stops"/>
<output name="output_file">
<assert_contents>
<has_text text=">GoodSeq" />
<has_text text="ATGGCGACC" />
<has_text text=">StopSeq" />
<has_text text="ATG---GCG" />
<not_has_text text=">GoodSeqDup" />
</assert_contents>
</output>
</test>
</tests>
<help><![CDATA[
HyPhy-CLN: Clean and Normalize Alignment
========================================

**What does this tool do?**

This tool reads a sequence alignment and normalizes it: it cleans sequence identifiers and, depending on the selected option, removes duplicate sequences, removes gap-only sites, and/or masks stop codons. The result is an alignment suitable for downstream HyPhy analyses.

**Options**

* **Genetic Code**: The genetic code used to interpret codons, including which codons are treated as stops.
* **Filter duplicates/gaps**:

  * **Keep all sequences and sites**: No filtering is performed.
  * **Keep all sequences, filter sites with nothing but gaps**: Removes sites (columns) that contain only gaps.
  * **Filter duplicate sequences but keep all sites**: Removes duplicate sequences, i.e. identical sequences, whether their names differ or match.
  * **Filter duplicate sequences and sites with nothing but gaps**: Removes both duplicate sequences and gap-only sites.
  * **Filter duplicate sequences and mask stop codons with gaps**: Removes duplicate sequences and replaces each stop codon with gaps (``---``); sequences that contain stop codons are kept.

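**Example**

The following illustrates stop-codon masking (the last filtering option above), using the tool's test file ``cln-stop-codons.fa`` as input::

    >GoodSeq
    ATGGCGACC
    >StopSeq
    ATGTAGGCG
    >GoodSeqDup
    ATGGCGACC

Here ``GoodSeqDup`` duplicates ``GoodSeq`` and is dropped, and the ``TAG`` stop codon in ``StopSeq`` is replaced with gaps, so the cleaned alignment should contain sequences along these lines (exact line wrapping and ordering may differ)::

    >GoodSeq
    ATGGCGACC
    >StopSeq
    ATG---GCG
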
**Input**

* An in-frame codon alignment in FASTA, NEXUS, PHYLIP, or MEGA format.

**Output**

* A cleaned FASTA alignment.

]]></help>
<expand macro="citations" />
</tool>
6 changes: 6 additions & 0 deletions tools/hyphy/test-data/cln-stop-codons.fa
@@ -0,0 +1,6 @@
>GoodSeq
ATGGCGACC
>StopSeq
ATGTAGGCG
>GoodSeqDup
ATGGCGACC