Nextflow and nf-core on Negishi¶
Nextflow is a workflow manager that lets you describe
complex data analysis pipelines in a portable, reproducible way. nf-core
is a community-curated collection of production-grade Nextflow pipelines for
genomics, transcriptomics, epigenomics, and more, including rnaseq, sarek,
ampliseq, mag, atacseq, and many others.
This guide walks you through installing Nextflow, installing the nf-core tools,
configuring both for SLURM and Apptainer/Singularity on Negishi, and launching
your first pipeline. The same steps work on Bell, Gautschi, and Gilbreth with
minor edits to partition names and resource ceilings.
Quick start¶
If you have used Nextflow before and just want the commands:
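A condensed sketch of the commands (the conda module name and the `nf-core` environment name follow the detailed sections below; adjust to taste):

```shell
# Nextflow launcher into ~/bin
mkdir -p ~/bin && cd ~/bin
curl -s https://get.nextflow.io | bash
chmod +x nextflow
export PATH="$HOME/bin:$PATH"

# nf-core CLI (optional, only needed for the nf-core helper commands)
module load conda
conda create -y -n nf-core -c bioconda -c conda-forge nf-core
conda activate nf-core
```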
Then copy the minimal Negishi config
below, edit the two REQUIRED lines, and launch a pipeline from a compute node:
Read on for the step-by-step version.
Installing Nextflow¶
Nextflow is a single self-contained JAR wrapped in a small shell script. You only need Java 17 or newer to run it.
1. Check your `java` version. You need Java 17 or newer. If `openjdk` is not available or reports an older version, see the FAQ below for installing Java via SDKMAN.
2. Download the Nextflow launcher. This downloads the `nextflow` launcher into `~/bin/`.
3. Make it executable and add it to your `PATH`.
4. Verify the installation.
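The steps above can be sketched as follows (installing into `~/bin/` follows this guide's convention; `get.nextflow.io` is Nextflow's official installer endpoint):

```shell
# 1. Check Java (Nextflow needs 17 or newer)
java -version

# 2. Download the launcher into ~/bin/
mkdir -p ~/bin && cd ~/bin
curl -s https://get.nextflow.io | bash

# 3. Make it executable and put ~/bin on PATH
chmod +x ~/bin/nextflow
export PATH="$HOME/bin:$PATH"   # add this line to ~/.bashrc to make it permanent

# 4. Verify
nextflow -version
```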
Tip
To update Nextflow in the future, run nextflow self-update. Nextflow also
writes per-user state (pipeline cache, work directories for plugins) under
~/.nextflow/. If your home directory is tight on quota, set
export NXF_HOME=$RCAC_SCRATCH/.nextflow in your ~/.bashrc to move it to scratch.
Installing nf-core tools¶
The nf-core Python package gives you a command-line tool for discovering,
downloading, linting, and launching nf-core pipelines. It is separate from
Nextflow itself. You do not strictly need it to run a pipeline, but it makes
common tasks much easier.
nf-core tools requires Python 3.8 or newer. Negishi's system python3 is
3.6.8 (too old), but loading the conda module gives you conda itself plus a
recent Python (3.12). Use it to create a dedicated conda environment for
nf-core tools.
1. Load the `conda` module. Verify that you have a recent Python on `PATH`.
2. Create a dedicated environment for nf-core tools. This pulls `nf-core` from the bioconda channel and resolves all its Python dependencies in one shot.
3. Activate the environment and verify. `nf-core pipelines list` prints every released nf-core pipeline with its latest version and a short description.
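The three steps above, as commands (the environment name `nf-core` and the bioconda/conda-forge channel pair are the usual convention; adjust if your group standardizes differently):

```shell
# 1. Load the conda module and check Python
module load conda
python3 --version          # expect a recent Python, e.g. 3.12.x

# 2. Create a dedicated environment with nf-core tools from bioconda
conda create -y -n nf-core -c bioconda -c conda-forge nf-core

# 3. Activate and verify
conda activate nf-core
nf-core --version
nf-core pipelines list
```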
Example nf-core pipelines list output (click to expand)
Note
In a new shell, reload the module and then reactivate the environment:
module load conda && conda activate nf-core. You do not need the
environment active to run a pipeline with nextflow run nf-core/<pipeline>;
it is only required for the nf-core CLI itself.
Tip
By default, conda writes environments to $HOME/.conda/envs/, which can
quickly exhaust your home quota. Point envs_dirs at scratch before
creating the environment:
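For example (the `conda-envs` directory name is arbitrary; any path under scratch works):

```shell
# Store new conda environments on scratch instead of $HOME/.conda/envs/
conda config --add envs_dirs "$RCAC_SCRATCH/conda-envs"
```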
A minimal working config for Negishi¶
Nextflow reads a config file (-c) to learn how to submit jobs, where to cache
containers, and how much CPU, memory, and walltime to request for each process.
nf-core pipelines tag every process with standardized labels
(process_single, process_low, process_medium, process_high,
process_long, process_high_memory). Your job is to map those labels to
sensible Negishi resource requests.
The block below is a starting my_negishi.config that handles SLURM
submission, container setup, and defaults for all standard nf-core process
labels. Save it next to your launch script and pass it with -c my_negishi.config.
Before first use, edit the two lines marked REQUIRED:
- `config_profile_contact`: your Purdue email.
- `clusterOptions`: your SLURM account. Run `slist` on Negishi to see the accounts you belong to.
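A sketch of such a config is below. The partition names (`cpu`, `highmem`), the per-label CPU/memory/time tiers, and the retryable exit codes are illustrative assumptions to adapt, not authoritative Negishi values; the two REQUIRED lines are marked:

```groovy
params {
    config_profile_description = 'Purdue RCAC Negishi cluster profile'
    config_profile_contact     = 'you@purdue.edu'    // REQUIRED: your Purdue email
    config_profile_url         = 'https://www.rcac.purdue.edu/compute/negishi'

    // Cluster-wide ceilings; individual labels stay below these
    max_cpus   = 128
    max_memory = 256.GB
    max_time   = 336.h
}

process {
    executor       = 'slurm'
    queue          = 'cpu'
    clusterOptions = '-A myallocation'   // REQUIRED: your SLURM account (run `slist`)

    // Retry on exit codes typical of out-of-memory or walltime kills
    errorStrategy = { task.exitStatus in [104, 134, 137, 139, 140, 143] ? 'retry' : 'finish' }
    maxRetries    = 2

    withLabel:process_single { cpus = 1;  memory = 4.GB;   time = 4.h  }
    withLabel:process_low    { cpus = 4;  memory = 16.GB;  time = 8.h  }
    withLabel:process_medium { cpus = 16; memory = 64.GB;  time = 12.h }
    withLabel:process_high   { cpus = 32; memory = 128.GB; time = 24.h }
    withLabel:process_long   { time = 96.h }
    withLabel:process_high_memory {
        queue  = 'highmem'               // 1 TB nodes, 1-day walltime cap
        memory = 512.GB
        time   = 24.h
    }
}

singularity {
    enabled    = true
    autoMounts = true
    cacheDir   = "${System.getenv('RCAC_SCRATCH')}/singularity-cache"
}

executor {
    queueSize       = 50                 // max jobs queued in SLURM at once
    submitRateLimit = '10 sec'
}
```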
Note
The process_high_memory label routes jobs to the highmem partition
(1 TB nodes, 1-day walltime cap). This is handled automatically by pipelines
that tag memory-hungry steps with the standard nf-core labels; you do not need
to edit the pipeline to take advantage of it.
Warning
Negishi GPUs are AMD and are not currently useful for most bioinformatics pipelines, which assume CUDA. Do not expect nf-core GPU-tagged processes to run.
Running your first pipeline¶
A good smoke test is nf-core/demo, a tiny pipeline
that only exercises FastQC and MultiQC on a handful of small FASTQ files. It
finishes in a few minutes and confirms that SLURM submission, Apptainer, and
your config are all wired up correctly.
The Nextflow "head" process is long-running; it stays alive for the entire pipeline, submitting and monitoring child jobs. Do not run it on a login node. Submit it as its own low-resource SLURM job.
Submit it from the directory containing my_negishi.config:
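A sketch of the `run_demo.sh` launcher job (the account name, partition, and resource numbers are placeholders to edit before submitting):

```shell
# Write the head-job script; the head job itself needs almost no resources.
cat > run_demo.sh <<'EOF'
#!/bin/bash
#SBATCH -A myallocation        # REQUIRED: your SLURM account (see `slist`)
#SBATCH -p cpu                 # assumed default Negishi CPU partition
#SBATCH --cpus-per-task=1
#SBATCH --mem=4G
#SBATCH --time=4:00:00
#SBATCH --job-name=nf-demo

# Nextflow launcher installed earlier into ~/bin
export PATH="$HOME/bin:$PATH"
export NXF_SINGULARITY_CACHEDIR="$RCAC_SCRATCH/singularity-cache"

nextflow run nf-core/demo \
    -profile test,singularity \
    -c my_negishi.config \
    --outdir "$RCAC_SCRATCH/nf-demo-results"
EOF
```

Submit it with `sbatch run_demo.sh` and watch progress with `squeue -u $USER` or by tailing the SLURM output file.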
When the run finishes, you should see a directory structure like this under
$RCAC_SCRATCH/nf-demo-results/:
Open multiqc_report.html in a browser to confirm the run produced real output.
Tip
Once the demo runs cleanly, swap nf-core/demo for a real pipeline and drop
the test profile. For example:
`nextflow run nf-core/rnaseq -profile singularity -c my_negishi.config --input samplesheet.csv --outdir ${RCAC_SCRATCH}/rnaseq --genome GRCh38`.
Every nf-core pipeline documents its required parameters on its page at
nf-co.re/pipelines.
Best practices¶
- Run Nextflow from scratch, not home. The `work/` directory Nextflow creates holds intermediate files for every task and can grow to hundreds of gigabytes. Launch from `$RCAC_SCRATCH/<project>/` so it lands on scratch.
- Reuse the image cache. Setting `NXF_SINGULARITY_CACHEDIR` (or `singularity.cacheDir` in the config) to a stable path in scratch means subsequent runs reuse the downloaded container images instead of re-pulling.
- Use `-resume`. If a run fails partway through, add `-resume` to your next `nextflow run` command and it will pick up from the last successful task instead of restarting from scratch.
- Clean up `work/` when done. Once you have copied the final results out, `rm -rf work/ .nextflow*` inside the launch directory. It is disposable.
- Pin pipeline versions with `-r <tag>` (e.g. `-r 3.14.0`) so reruns are reproducible.
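Put together, a typical launch reflecting these practices might look like this (the pipeline, version tag, and paths are illustrative):

```shell
# Launch from scratch with a stable image cache and a pinned pipeline version;
# -resume reuses tasks that already completed in a previous attempt.
cd "$RCAC_SCRATCH/rnaseq-project"
export NXF_SINGULARITY_CACHEDIR="$RCAC_SCRATCH/singularity-cache"

nextflow run nf-core/rnaseq \
    -r 3.14.0 \
    -profile singularity \
    -c my_negishi.config \
    -resume \
    --input samplesheet.csv \
    --outdir "$RCAC_SCRATCH/rnaseq-project/results"
```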
FAQs¶
What is the difference between Nextflow and nf-core? [click to expand]
Nextflow is the workflow engine: the language and runtime that execute pipelines. nf-core is a community project that publishes a curated set of production-quality Nextflow pipelines (and a Python CLI for working with them). You install Nextflow to run any Nextflow pipeline; you install nf-core tools in addition if you want the discovery, download, and linting helpers.
Can I launch pipelines directly from a login node? [click to expand]
No. The Nextflow head process runs for as long as the pipeline runs (hours or
days), and login nodes are for short interactive work only. Always submit
nextflow run from inside an sbatch script (as shown in the run_demo.sh
example above) or from an sinteractive session on a compute node.
How do I use a different cluster (Bell, Gautschi, Gilbreth)? [click to expand]
Copy my_negishi.config to my_<cluster>.config and edit:
- `config_profile_description` / `config_profile_url`: cluster name and URL.
- `max_cpus`, `max_memory`, `max_time`: match the target cluster's largest compute node and walltime limits.
- `process.queue`: default partition on that cluster.
- `process.withLabel:process_high_memory { queue = ... }`: the cluster's high-memory partition name (if any).
Everything else (SLURM executor, Apptainer, retry logic, label tiers) transfers unchanged.
What should I do when `java` is not available? [click to expand]
If the java version you need is not available as a module, install it with
SDKMAN:
1. Install SDKMAN.
2. Load SDKMAN.
3. Install Java 17.
4. Verify the Java installation.
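The four steps, as commands (`17.0.10-tem` is an example version identifier; run `sdk list java` to see what is currently offered):

```shell
# 1. Install SDKMAN (downloads the installer from get.sdkman.io)
curl -s "https://get.sdkman.io" | bash

# 2. Load SDKMAN into the current shell
source "$HOME/.sdkman/bin/sdkman-init.sh"

# 3. Install a Java 17 distribution
sdk install java 17.0.10-tem

# 4. Verify
java -version
```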
Pipeline fails with 'Failed to pull singularity image'. What now? [click to expand]
This is almost always a transient network or cache issue. Check:
- `NXF_SINGULARITY_CACHEDIR` (or `singularity.cacheDir`) points to a directory you can write to. Scratch is a safe choice.
- You are on a compute node with outbound network access (all Negishi compute nodes have it by default).
- Disk quota on scratch has not been exceeded.
Then rerun with `-resume` to continue from where it failed without losing
completed tasks.
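A typical fix, assuming a cache on scratch (paths are illustrative):

```shell
# Ensure the cache directory exists and is writable, then resume the run
export NXF_SINGULARITY_CACHEDIR="$RCAC_SCRATCH/singularity-cache"
mkdir -p "$NXF_SINGULARITY_CACHEDIR"

nextflow run nf-core/demo \
    -profile test,singularity \
    -c my_negishi.config \
    --outdir "$RCAC_SCRATCH/nf-demo-results" \
    -resume
```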