Releases: PacificBiosciences/HiPhase

HiPhase v1.6.0

20 Jan 20:28
74dbc7e

Changes

  • Added support for RNA sequencing datasets, which required or was facilitated by the following changes
  • Changed the quality values of local re-alignment to be fixed values that are half of those from global re-alignment, removing local base qualities from consideration. In aggregate, this had a negligible to slightly beneficial impact on accuracy.
  • Reduced run-time of high-coverage blocks by collapsing identical allele assignments (reads) with a corresponding weight
  • Replaced the linear post-phasing block breaker with a non-linear version
    • Added a new parameter, --min-connecting-reads, that controls the minimum number of reads that are required to connect two variants. The default is 1, which reflects the behavior of the previous linear version. Increasing this value can substantially reduce switch flip errors at the cost of shorter phase blocks.
    • This non-linear approach can create overlapping phase blocks, but each variant is only assigned to a single block. This is particularly common with RNA-seq datasets, which may contain overlapping mappings in disjoint phase blocks.
    • This approach slightly reduces errors in WGS samples. In RNA samples, this substantially reduces switch flip errors caused by erroneous block joins, especially with higher --min-connecting-reads values.
  • Added a new option, --optimize-variant-order, which enables a non-linear search traversal that is beneficial for RNA-seq reads
  • Added a new option, --preset, which sets and overrides multiple CLI options at once. Added an RNA-seq preset, which is described in the user guide.
  • Multi-threading has been refactored to better distribute work across threads for the broader range of input types
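
The read-collapsing change above can be illustrated with a minimal sketch. It assumes each read is reduced to a vector of per-variant allele calls; the `AlleleAssignment` type and `collapse_assignments` function are hypothetical names for illustration and are not taken from the HiPhase source.

```rust
use std::collections::BTreeMap;

/// Allele calls for one read across the variants of a block:
/// Some(0) = first allele, Some(1) = second allele, None = variant not covered.
/// Hypothetical representation, not HiPhase's internal type.
type AlleleAssignment = Vec<Option<u8>>;

/// Collapses identical assignments into (assignment, weight) pairs, where the
/// weight is the number of reads that share that exact assignment.
/// A BTreeMap keeps the output order deterministic.
fn collapse_assignments(reads: &[AlleleAssignment]) -> Vec<(AlleleAssignment, u32)> {
    let mut counts: BTreeMap<&AlleleAssignment, u32> = BTreeMap::new();
    for read in reads {
        *counts.entry(read).or_insert(0) += 1;
    }
    counts
        .into_iter()
        .map(|(assignment, weight)| (assignment.clone(), weight))
        .collect()
}
```

In a high-coverage block, many reads tend to carry the same calls, so a downstream solver could score each unique assignment once and multiply by its weight instead of visiting every read.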

HiPhase v1.5.0

27 Mar 14:29
926c925

Changes

  • Reduced memory footprint for large putative phase blocks by adjusting algorithms for storing read segment variants. Overall, this change has a negligible impact on memory usage for a typical human WGS dataset. However, sample types with many more heterozygous variants per phase block (e.g., mouse) have significantly lower peak memory usage (>80% reduction in tests).
  • Minor tweaks to phasing methods to reduce overhead of tracking candidate phase solutions
  • Internal results are highly similar, but not identical, to results for v1.4.5

HiPhase v1.4.5

25 Sep 13:10
d39543a

Fixed

  • Fixed an error where BAM phase tags were not always properly removed prior to re-tagging, leading to a run-time error and exit

HiPhase v1.4.4

15 Aug 18:25
a80258e

Fixed

  • Fixed an error where phasing information that was present in input files would be copied through to output files if it was not overwritten by HiPhase phasing results. HiPhase will now automatically remove this phasing information to prevent accidental mixing of phase results.
    • For VCF files, any genotypes not phased by HiPhase will be switched to unphased and sorted by allele index (e.g., 1|0 -> 0/1). The "FORMAT:PS" and "FORMAT:PF" tags will either be removed entirely if the whole record is unphased or set to "." for partially phased records.
    • For BAM files, the "HP" and "PS" tags will be removed for any unphased records.
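
As a rough illustration of the VCF genotype rewrite described above, the sketch below converts one GT string to unphased form with sorted allele indices. The function name `unphase_genotype` is hypothetical, and placing missing alleles (".") first is an assumption; the release notes do not say how HiPhase orders them.

```rust
/// Converts a single VCF GT string to unphased form with allele indices in
/// ascending order, e.g. "1|0" -> "0/1".
/// Assumption: missing alleles (".") are kept and sorted first.
fn unphase_genotype(gt: &str) -> String {
    // Split on either the phased ('|') or unphased ('/') separator.
    let mut alleles: Vec<&str> = gt.split(|c: char| c == '|' || c == '/').collect();
    // Sort numerically; a non-numeric allele parses to None, which orders before Some(_).
    alleles.sort_by_key(|a| a.parse::<u32>().ok());
    alleles.join("/")
}
```

For example, `unphase_genotype("2|1")` yields `1/2`, and a haploid genotype such as `1` passes through unchanged.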

HiPhase v1.4.3

08 Aug 19:05
52ab5c0

Fixed

  • Replaced a panic caused by a chromosome appearing in a VCF but not in the BAM file with a more descriptive error message
  • Fixed an error caused by a multi-sample VCF with a mixture of haploid and diploid genotypes

HiPhase v1.4.2

07 May 13:07
9b01d7e

Changes

  • Removes a 1 basepair shift from tandem repeat region calculation to support anchor base changes in TRGT v1.0.0; internal results are nearly identical before and after this change

HiPhase v1.4.1

01 May 21:10
ad1539f

Changes

  • Reclassifies warnings during VCF writing to debug (Resolves #32)
  • Adds a section to the quickstart guide on resource requirements

HiPhase v1.4.0

14 Feb 18:25
99b92ac

Changes

  • Major changes to dual-mode allele assignment: Prior to this version, global realignment would revert to local realignment if the CPU cost (in seconds) exceeded a user-provided threshold. While this was useful for fast-tracking noisy phase blocks, it could lead to non-deterministic output because CPU costs can vary. The thresholding has been reworked such that global realignment will revert to local realignment for an individual mapping if the edit distance exceeds a user-provided threshold (default: 500). Additionally, global realignment will revert to local realignment for the remainder of a putative phase block if too many reads have reverted to local realignment (default: 50%, minimum number of failures: 50 mappings). This has the following downstream impact on results:
    • All results from HiPhase are fully deterministic from run to run.
    • Baseline quality scores for local realignment have been adjusted to scale at the same relative ratios as those from global realignment.
      • When running HiPhase on only small variants (i.e., local realignment mode only), this tended to slightly increase the number of switch flip errors relative to v1.3.0.
      • When running HiPhase on small, structural, and tandem repeat variants (recommended), we observed a small decrease in switch flip errors relative to v1.3.0.
    • Relative to v1.3.0, we observed reduced run-time costs for all tests (~25% reduction in both CPU time and wall-clock time, on average).
    • The number of mappings processed through global and local realignment is now tracked in the --stats-file.
  • Global realignment is now on by default, reflecting our overall recommended usage of HiPhase. This can be disabled with the --disable-global-realignment option.
  • CLI changes: The CLI has been updated to reflect the above algorithmic changes. These new CLI options have been added to reflect the changes:
    • --disable-global-realignment - This option will disable all global realignments; it is recommended if only small variant files are available for phasing
    • --global-realignment-max-ed <DISTANCE> - Controls the maximum allowed edit distance before reverting an individual mapping to local realignment (default: 500)
    • --max-global-failure-ratio <FRAC> - Controls the maximum allowed failure rate for global realignment before reverting the rest of the phase block to local realignment (default: 50%)
    • --global-failure-count <COUNT> - Controls the minimum number of failures required before the failure rate check is enabled (default: 50)
    • --global-realignment-cputime <SECONDS> - Deprecated, this option is now hidden on the CLI. It will produce a warning if used but has no impact on the downstream results.
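
The reworked thresholding can be sketched as a small state machine. This is an illustration under stated assumptions, not the HiPhase implementation: `RealignConfig`, `Mode`, and `BlockTracker` are hypothetical names, the defaults are taken from the option list above, and using strict comparisons for "exceeds" is a guess at the exact boundary behavior.

```rust
/// Thresholds mirroring the CLI defaults listed in the release notes.
/// Struct and field names are hypothetical.
#[derive(Debug, Clone, Copy)]
struct RealignConfig {
    max_edit_distance: usize, // --global-realignment-max-ed
    max_failure_ratio: f64,   // --max-global-failure-ratio
    min_failure_count: usize, // --global-failure-count
}

impl Default for RealignConfig {
    fn default() -> Self {
        RealignConfig { max_edit_distance: 500, max_failure_ratio: 0.5, min_failure_count: 50 }
    }
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Mode {
    Global,
    Local,
}

/// Tracks global realignment outcomes within one putative phase block.
struct BlockTracker {
    config: RealignConfig,
    attempted: usize,
    failed: usize,
    block_reverted: bool,
}

impl BlockTracker {
    fn new(config: RealignConfig) -> Self {
        BlockTracker { config, attempted: 0, failed: 0, block_reverted: false }
    }

    /// Returns the mode used for one mapping, given the edit distance that
    /// global realignment produced for it.
    fn record_mapping(&mut self, edit_distance: usize) -> Mode {
        // Once the block has reverted, every remaining mapping uses local mode.
        if self.block_reverted {
            return Mode::Local;
        }
        self.attempted += 1;
        let mode = if edit_distance > self.config.max_edit_distance {
            self.failed += 1;
            Mode::Local
        } else {
            Mode::Global
        };
        // The ratio check only applies after the minimum failure count is reached.
        if self.failed >= self.config.min_failure_count {
            let ratio = self.failed as f64 / self.attempted as f64;
            if ratio > self.config.max_failure_ratio {
                self.block_reverted = true;
            }
        }
        mode
    }
}
```

Because every decision depends only on edit distances and counts, never on elapsed CPU time, the same input always takes the same path, which is the property that makes the output deterministic.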

HiPhase v1.3.0

02 Feb 14:25
94118a4

Changes

  • Relaxes the requirements for SV deletion and insertion events such that they no longer require an alternate or reference allele, respectively, to have length 1

Internal changes

  • The interface for variant creation was modified to reduce panics from invalid variant construction. This modification changes all the return types for the various Variant::new*(...) functions from Variant to Result<Variant, VariantError>.
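
The general pattern of a fallible constructor can be sketched as follows. Only the return type `Result<Variant, VariantError>` comes from the note above; the struct fields, the error variants, and the specific validation checks are hypothetical and chosen purely for illustration.

```rust
use std::fmt;

/// Hypothetical error variants; the real VariantError may differ.
#[derive(Debug, PartialEq, Eq)]
enum VariantError {
    EmptyAllele,
    IdenticalAlleles,
}

impl fmt::Display for VariantError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            VariantError::EmptyAllele => write!(f, "alleles must be non-empty"),
            VariantError::IdenticalAlleles => write!(f, "reference and alternate alleles are identical"),
        }
    }
}

impl std::error::Error for VariantError {}

/// Hypothetical, simplified variant record.
#[derive(Debug, PartialEq, Eq)]
struct Variant {
    position: u64,
    ref_allele: Vec<u8>,
    alt_allele: Vec<u8>,
}

impl Variant {
    /// Returns an error for invalid input instead of panicking, so the caller
    /// decides whether to skip the record, log it, or abort.
    fn new(position: u64, ref_allele: Vec<u8>, alt_allele: Vec<u8>) -> Result<Variant, VariantError> {
        if ref_allele.is_empty() || alt_allele.is_empty() {
            return Err(VariantError::EmptyAllele);
        }
        if ref_allele == alt_allele {
            return Err(VariantError::IdenticalAlleles);
        }
        Ok(Variant { position, ref_allele, alt_allele })
    }
}
```

A caller can then propagate the error with `?` or match on it, rather than relying on an assertion inside the constructor.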

Fixed

  • SV events with a placeholder ALT sequence (e.g., <DEL>, <INS>) are now properly ignored by HiPhase instead of creating an error.

HiPhase v1.2.1

26 Jan 18:37
dff3c47

Fixed

  • Fixed a rare issue where reference alleles with stripped IUPAC codes were throwing errors due to reference mismatch
  • Fixed an issue where variants preceding a GraphWFA region were not ignored, potentially leading to aberrant graph structure