Implement common schemas #18

@jagedn

Following #11, one of the most "difficult" parts of the plugin is creating and maintaining Java records for schemas.

It might be useful for the plugin to offer some common schemas out of the box, for example FastQRecord (and also more generic records such as StringMap).

For example, a pipeline to convert raw FASTQ text to Parquet:

include { toParquet; toFastq } from 'plugin/nf-parquet'

workflow {
    channel.fromPath('data/HI.4549.004.index_10.ANN0830_R2.fastq')
        .splitFastq( record: true )
        .map{ record ->
            toFastq(record)
        }
        .toParquet('data/HI.4549.004.index_10.ANN0830_R2.parquet', ['schema':'fastq'])
        .view{ record -> record }

}

Here the pipeline uses splitFastq to read a FASTQ file, converts each Nextflow Map record into an internal Java record with toFastq, and writes those Java records to a Parquet file using the schema parameter instead of the record parameter.
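As a rough illustration of the Map-to-record step, a built-in FASTQ record could look something like the sketch below. This is not the plugin's actual code: the record name `FastqRecord` and the `fromMap` helper are hypothetical, and the field names are assumed to mirror the keys that `splitFastq( record: true )` puts in each Map (`readHeader`, `readString`, `qualityHeader`, `qualityString`).

```java
import java.util.Map;

// Hypothetical built-in record; field names are assumed to match the keys
// of the Map emitted by Nextflow's splitFastq( record: true ).
record FastqRecord(String readHeader,
                   String readString,
                   String qualityHeader,
                   String qualityString) {

    // Hypothetical helper: builds the record from a Nextflow-style Map.
    // Keys that are absent from the Map become null fields.
    static FastqRecord fromMap(Map<String, ?> map) {
        return new FastqRecord(
                text(map, "readHeader"),
                text(map, "readString"),
                text(map, "qualityHeader"),
                text(map, "qualityString"));
    }

    // Converts a Map value to String, tolerating missing keys.
    private static String text(Map<String, ?> map, String key) {
        Object value = map.get(key);
        return value == null ? null : value.toString();
    }
}
```

With something like this shipped in the plugin, users would not need to write or compile their own record for FASTQ data.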

schema can be one of:

  • stringMap
  • fastq
  • fasta
  • ....
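One possible way to resolve the `schema` name to a built-in record is a simple lookup table, sketched below. Everything here is an assumption for illustration: the `CommonSchemas` class, the `resolve` method, and the shape of each record (in particular the `FastaRecord` fields) are hypothetical and not taken from the plugin.

```java
import java.util.Map;
import java.util.TreeSet;

// Hypothetical registry of built-in schemas, keyed by the names listed above.
final class CommonSchemas {

    // Placeholder record shapes, assumed for illustration only.
    record StringMapRecord(Map<String, String> values) {}
    record FastqRecord(String readHeader, String readString,
                       String qualityHeader, String qualityString) {}
    record FastaRecord(String id, String desc, String seqString) {}

    private static final Map<String, Class<? extends Record>> BY_NAME = Map.of(
            "stringMap", StringMapRecord.class,
            "fastq", FastqRecord.class,
            "fasta", FastaRecord.class);

    private CommonSchemas() {}

    // Returns the record class for a schema name, or fails with a message
    // listing the known names so a typo is easy to spot.
    static Class<? extends Record> resolve(String name) {
        Class<? extends Record> type = BY_NAME.get(name);
        if (type == null) {
            throw new IllegalArgumentException(
                    "Unknown schema '" + name + "', expected one of "
                            + new TreeSet<>(BY_NAME.keySet()));
        }
        return type;
    }
}
```

An unknown name fails fast here, which would point users with custom formats back to the existing `record` parameter.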

Custom/proprietary formats are out of scope for this issue and will still require the current approach.

@Hoohm, what do you think? Would this cover your use case?
