step_input
parameter) allowing users to create any dependency between tasks.
LabxPipe provides a single lxpipe command with multiple sub-commands. Running a pipeline typically involves these sub-commands:
The output of multiple pipelines executed using lxpipe run
can be combined to merge gene counts or create profiles and trackhubs with the following sub-commands:
See examples to understand how each sub-command works.
See JSON files in config/pipelines
of this repository.
Pipeline JSON file | Description |
---|---|
mrna_seq.json | mRNA-seq. |
mrna_seq_profiling_bam.json | mRNA-seq. Genomic coverage profiles using GeneAbacus. BAM and SAM outputs. |
mrna_seq_no_db.json | mRNA-seq. No LabxDB. |
mrna_seq_with_plotting.json | mRNA-seq. Plotting non-mapped reads. Demonstrates step_input. |
mrna_seq_cufflinks.json | mRNA-seq. Replaces GeneAbacus with Cufflinks. |
chip_seq.json | ChIP-seq. Bowtie2 and Samtools to uniquify reads. |
chip_seq_user_function.json | ChIP-seq. Bowtie2 and Samtools to uniquify reads. Genomic coverage profiles using GeneAbacus. Peak-calling using MACS3 employing a user-defined step/function. |
The following demonstrates how to apply the mrna_seq.json pipeline. It requires:
AGR000850
and AGR000912
/plus/data/seq/by_run/AGR000850
├── 23_009_R1.fastq.zst
└── 23_009_R2.fastq.zst
/plus/data/seq/by_run/AGR000912
├── 65_009_R1.fastq.zst
└── 65_009_R2.fastq.zst
Note: mrna_seq_no_db.json
demonstrates how to use LabxPipe without LabxDB: it only requires FASTQ files (in path_seq_run
directory, see above).
Requirements:
mrna_seq_no_db.json
doesn't require LabxDB.
path_star_index
.
Start pipeline:
lxpipe run --pipeline mrna_seq.json \
--worker 2 \
--processor 16
Output is written to the path_output directory.
Create report:
lxpipe report --pipeline mrna_seq.json
The report file mrna_seq.xlsx should be created in the same directory as mrna_seq.json.
Extract output file(s) to use them directly, for instance to load them in IGV. For example:
lxpipe extract --pipeline mrna_seq.json \
--files aligning,accepted_hits.sam.zst \
--label
lxpipe extract --pipeline mrna_seq.json \
--files profiling,genome_plus.bw \
--label \
--reference \
--suffix
Use -d/--dry_run to test the extract command before applying it.
Merge gene/mRNA counts generated by GeneAbacus in the counting directory:
lxpipe merge-count --pipeline mrna_seq.json \
--step counting
Create a trackhub. Requirements:
Execute in a separate directory:
lxpipe trackhub --runs AGR000850,AGR000912 \
--species_ucsc danRer11 \
--path_genome /plus/scratch/sai/annots/danrer_genome_all_ensembl_grcz11_ucsc_chroms_chrom_length.tab \
--path_mapping /plus/scratch/sai/annots/ChromosomeMappings/GRCz11_ensembl2UCSC.txt \
--input_sam \
--bam_names accepted_hits.sam.zst \
--make_config \
--make_trackhub \
--make_bigwig \
--processor 16
The directory is then ready to be shared by a web server for display in the UCSC Genome Browser.
Parameters can be defined globally. See the config directory of this repository for examples.
Parameters are defined first globally (see above), then per pipeline, then per replicate/run, and then per step/function. The latest definition takes precedence: path_seq_run
defined in /etc/hts/labxpipe.json
is used by default, but if path_seq_run
is defined in the pipeline file, it will be used instead.
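The precedence rule described above can be sketched in Python as a chain of dictionary updates. This is only an illustration of the rule, not LabxPipe's actual implementation: the function name resolve_params and the example paths are hypothetical.

```python
def resolve_params(global_params, pipeline_params, run_params, step_params):
    """Merge parameter levels; later (more specific) levels override earlier ones."""
    resolved = {}
    for level in (global_params, pipeline_params, run_params, step_params):
        resolved.update(level)
    return resolved


# Hypothetical values, for illustration only
global_params = {"path_seq_run": "/data/seq/by_run", "logging_level": "info"}
pipeline_params = {"path_seq_run": "/scratch/seq/by_run"}

params = resolve_params(global_params, pipeline_params, {}, {})
# The pipeline-level path_seq_run replaces the global one;
# logging_level is inherited from the global level.
print(params)
```

Parameters that a more specific level does not define simply fall through from the level above.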
Main parameters
Parameter | Type |
---|---|
name | string |
path_output | string |
path_seq_run | string |
path_local_steps | string |
path_annots | string |
path_bowtie2_index | string |
path_bwa-mem2_index | string |
path_minimap2_index | string |
path_star_index | string |
fastq_exts | []strings |
adaptors | {} |
logging_level | string |
run_refs | []strings |
replicate_refs | []strings |
ref_info_source | []strings |
ref_infos | {} |
analysis | [{}, {}, ...] |
Parameters for all steps
Parameter | Type |
---|---|
step_name | string |
step_function | string |
step_desc | string |
force | boolean |
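To show how these parameters might fit together, the following sketch builds a small pipeline description in Python and serializes it to JSON. It only uses parameter names from the tables above. The nesting of steps under analysis, the paths, the pipeline name and the logging_level value are assumptions made for illustration; the JSON files in config/pipelines remain the reference.

```python
import json

# Hypothetical pipeline description built from documented parameter names
pipeline = {
    "name": "mrna_seq_example",
    "path_output": "/tmp/lxpipe_output",
    "logging_level": "info",
    "run_refs": ["AGR000850", "AGR000912"],
    # Assumption: each entry of "analysis" describes one step
    "analysis": [
        {"step_name": "preparing", "step_function": "readknead", "force": False},
        {"step_name": "aligning", "step_function": "star", "force": False},
        {"step_name": "counting", "step_function": "geneabacus", "force": False},
    ],
}

# Serialize to the JSON text that would be saved as a pipeline file
text = json.dumps(pipeline, indent=4)
print(text)
```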
Step-specific parameters
| Step | Synonym | Parameter | Type |
|---|---|---|---|
| readknead | preparing | options | []strings |
| | | ops_r1 | [{}, {}, ...] |
| | | ops_r2 | [{}, {}, ...] |
| | | plot_fastq_in | boolean |
| | | plot_fastq | boolean |
| | | fastq_out | boolean |
| | | zip_fastq_out | string |
| bowtie2 | genomic_aligning | options | []strings |
| | | index | string |
| | | output | string |
| | | output_unfiltered | string |
| | | compress_sam | boolean |
| | | compress_sam_cmd | string |
| | | create_bam◆ | boolean |
| | | index_bam◆ | boolean |
| bwa-mem2 | | options | []strings |
| | | index | string |
| | | output | string |
| | | compress_output | boolean |
| | | compress_output_cmd | string |
| | | create_bam◆ | boolean |
| | | index_bam◆ | boolean |
| minimap2 | | options | []strings |
| | | index | string |
| | | output | string |
| | | compress_output | boolean |
| | | compress_output_cmd | string |
| | | create_bam◆ | boolean |
| | | index_bam◆ | boolean |
| star | aligning | options | []strings |
| | | index | string |
| | | output_type | []strings |
| | | compress_sam | boolean |
| | | compress_sam_cmd | string |
| | | compress_unmapped | boolean |
| | | compress_unmapped_cmd | string |
| cufflinks | | options | []strings |
| | | inputs | [{}, {}, ...] |
| | | features | [{}, {}, ...] |
| geneabacus | counting | options | []strings |
| | | inputs | [{}, {}, ...] |
| | | path_annots | string |
| | | features | [{}, {}, ...] |
| samtools_sort | | options | []strings |
| | | sort_by_name_bam | boolean |
| samtools_uniquify | | options | []strings |
| | | sort_by_name_bam | boolean |
| | | index_bam | boolean |
| cleaning | | steps | [{}, {}, ...] |
◆ indicates exclusive options. For example, either create_bam
or index_bam
can be used, but not both.
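As an illustration of the exclusivity rule, a pipeline author could check step parameters before launching a run. The helper below is a hypothetical pre-flight check written for this example; it is not part of LabxPipe, which may handle conflicting options differently.

```python
# Options marked with ◆ in the table above
EXCLUSIVE = ("create_bam", "index_bam")


def check_exclusive(step_params, exclusive=EXCLUSIVE):
    """Return the enabled exclusive options; raise ValueError if more than one is set."""
    enabled = [name for name in exclusive if step_params.get(name)]
    if len(enabled) > 1:
        raise ValueError("Exclusive options set together: " + ", ".join(enabled))
    return enabled


# A step enabling only one of the two options passes the check
print(check_exclusive({"options": [], "create_bam": True}))
```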
Sample-specific parameters. Automatically populated if using LabxDB or sourced from ref_infos
. These parameters can be changed manually in any step (for example setting paired
to false
will ignore second reads in that step).
Parameter | Type |
---|---|
label_short | string |
paired | boolean |
directional | boolean |
r1_strand | string |
quality_scores | string |
In addition to the provided steps/functions, e.g. bowtie2, star or geneabacus, users can define their own steps, usable in LabxPipe pipelines. LabxPipe will import user-defined steps:
Written in Python
One step per file with the .py
extension located in the directory defined by path_local_steps
Each step, defined in its own file, requires:
A functions variable listing the step name(s)
A run function with the 3 parameters path_in, path_out and params
For example:
functions = ['macs3']

def run(path_in, path_out, params):
    ...
Example of a user-defined function providing peak-calling using MACS3 is available in config/user_steps/macs3.py
in this repository.
Example of a pipeline using the MACS3 step is available in config/pipelines/chip_seq_user_function.json
in this repository.
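For a smaller starting point than the MACS3 step, here is a sketch of a complete user-defined step file that counts lines in text files. Only the functions variable and the run(path_in, path_out, params) signature come from the requirements above. Treating path_in and path_out as directories, reading params like a dictionary, the suffix parameter, the output file name and the return value are all assumptions made for this example.

```python
import os

# Step name(s) provided by this file
functions = ['count_lines']


def run(path_in, path_out, params):
    # Assumption: path_in and path_out are directories, params behaves like a dict
    suffix = params.get('suffix', '.txt')  # 'suffix' is a hypothetical parameter
    os.makedirs(path_out, exist_ok=True)
    counts = {}
    for fname in sorted(os.listdir(path_in)):
        if fname.endswith(suffix):
            with open(os.path.join(path_in, fname)) as f:
                counts[fname] = sum(1 for _ in f)
    # Write one "file name<TAB>line count" row per input file
    with open(os.path.join(path_out, 'line_counts.tab'), 'w') as fout:
        for fname, n in counts.items():
            fout.write(f'{fname}\t{n}\n')
    # Returned for convenience when testing; LabxPipe may not use it
    return counts
```

Saved as count_lines.py in the directory defined by path_local_steps, such a file would follow the one-step-per-file convention described above.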
lxpipe demultiplex
Demultiplex reads based on barcode sequences from the Second barcode field in LabxDB.
Demultiplexing is performed using ReadKnead. The most important part of demultiplexing is the ReadKnead pipeline. Pipelines are identified using the Adapter 3' field in LabxDB.
Example for simple demultiplexing. The first nucleotides at the 5' end of read 1 are used as barcodes (the Adapter 3'
field is set to sRNA 1.5
in LabxDB for these samples) with the following pipeline:
{
    "sRNA 1.5": {
        "R1": [
            {
                "name": "demultiplex",
                "end": 5,
                "max_mismatch": 1
            }
        ],
        "R2": null
    }
}
The barcode sequences are added by LabxPipe using the Second barcode
field in LabxDB.
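To make the max_mismatch setting concrete, here is a conceptual Python sketch of barcode assignment at the 5' end of a read. It is not ReadKnead's implementation: the function names, the barcodes, the sample names and the choice to reject ambiguous reads are assumptions for illustration.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def assign_sample(read, barcodes, max_mismatch=1):
    """Return (sample, read without barcode), or (None, read) if no or several barcodes match."""
    hits = []
    for sample, barcode in barcodes.items():
        prefix = read[:len(barcode)]
        if len(prefix) == len(barcode) and hamming(prefix, barcode) <= max_mismatch:
            hits.append((sample, barcode))
    if len(hits) != 1:
        # Unmatched or ambiguous reads are left unassigned in this sketch
        return None, read
    sample, barcode = hits[0]
    return sample, read[len(barcode):]


barcodes = {"sample_A": "ACGT", "sample_B": "TTAG"}  # hypothetical barcodes
# The read starts with ACGA, one mismatch away from ACGT
print(assign_sample("ACGAGGCTTA", barcodes))
```

With max_mismatch set to 0, the same read would stay unassigned because its first four nucleotides differ from every barcode.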
Example for iCLIP demultiplexing. In Vejnar et al., iCLIP is demultiplexed (the Adapter 3'
field is set to TruSeq-DMS+A Index
in LabxDB for these samples) using the following pipeline:
{
    "TruSeq-DMS+A Index": {
        "R1": [
            {
                "name": "clip",
                "end": 5,
                "length": 4,
                "add_clipped": true
            },
            {
                "name": "trim",
                "end": 3,
                "algo": "bktrim",
                "min_sequence": 5,
                "keep": ["trim_exact", "trim_align"]
            },
            {
                "name": "length",
                "min_length": 6
            },
            {
                "name": "demultiplex",
                "end": 3,
                "max_mismatch": 1,
                "length_ligand": 2
            },
            {
                "name": "length",
                "min_length": 15
            }
        ],
        "R2": null
    }
}
The pipeline is stored in demux_truseq_dms_a.json. The barcode sequences are added by LabxPipe using the Second barcode field in LabxDB. (NB: the published demultiplexed data were generated using "algo": "align" with a minimum score of 80 instead of "algo": "bktrim".)
The pipeline was then tested by running:
lxpipe demultiplex --bulk HHYLKADXX \
--path_demux_ops demux_truseq_dms_a.json \
--path_seq_prepared prepared \
--demux_nozip \
--processor 1 \
--demux_verbose_level 20 \
--no_readonly
This output is very verbose: for every read, the output of every step of the demultiplexing pipeline is reported. To get consistent output, --processor must be set to 1. Output is written to the local directory prepared.
Finally, once the pipeline is validated, run the demultiplexing (data is written to the path_seq_prepared directory):
lxpipe demultiplex --bulk HHYLKADXX \
--path_demux_ops demux_truseq_dms_a.json \
--processor 10
LabxPipe is distributed under the Mozilla Public License Version 2.0 (see /LICENSE).
Copyright © 2013-2023 Charles E. Vejnar