Tutorial

Tutorial leader: Pablo Prieto Barja <>


Participants interested in the Nextflow tutorials will need the following pieces of software installed on their laptop:

  • Unix-compatible OS (Linux, OSX, etc)
  • Java 7 or 8
  • Docker engine (optional)
  • BLAST+ (optional)

A Vagrant box is available that provides a pre-configured environment with all the required tools. Instructions for downloading and setting it up are below.

Getting ready

In order to use the Vagrant box, you will need Vagrant itself and a virtual machine provider installed on your laptop.

Clone the UPPNEX15 virtual machine repository (this project) to a convenient location by using the command:
git clone

Change to the UPPNEX15-vm folder and launch vagrant:

cd UPPNEX15-vm/
vagrant up

The first time you run it, it will automatically download the virtual machine required by the tutorial. This may take some minutes to complete, so be patient. When it has booted up and the configuration steps have finished, log in to the VM instance:

vagrant ssh 

You are now inside the virtual machine. Verify that Nextflow is working by entering the command:

nextflow -version 



I would not say that Nextflow is a DSL, but rather a framework oriented toward data-driven pipelines or workflows. The idea is that any bioinformatician is (or will be at some point…) something of a Linux hacker, in the sense that we end up working a lot with Linux pipes to construct more or less complicated instructions or one-liners that do a task, simply and fast.

cat sequence | blast -in - | head -n 10 | t_coffee > result
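
In Nextflow, a one-liner like this becomes the body of a process, with channels carrying data in and out. The following is a minimal sketch of that idea, not code from the tutorial: the process and channel names are illustrative, and `blast` and `t_coffee` are assumed to be on the PATH.

```nextflow
// Illustrative sketch: the shell one-liner above wrapped in a Nextflow process.
// 'sequence' is assumed to be an input file in the launch directory.
sequences = Channel.fromPath('sequence')

process alignTopHits {
    input:
    file seq from sequences

    output:
    file 'result' into results

    """
    cat $seq | blast -in - | head -n 10 | t_coffee > result
    """
}

// Print the path of each produced result file
results.subscribe { println "Result: $it" }
```

Note that the process body is the same shell pipe as before; Nextflow only adds the declaration of what goes in and what comes out.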

In this tutorial you will see how Nextflow takes advantage of Linux as the integration platform for data science. The main features that Nextflow aims to deliver are:

  • Fast prototyping
  • Smooth integration with the Linux world
  • High level parallelization model
  • Portability and reproducibility
  • Error handling and crash recovery
  • Debugging

This tutorial will go through all these features, showing code snippets, script examples, pipelines written in Nextflow, commands, and a bit of the structure of the framework. Let's start by having a look at this diagram showing the architecture of Nextflow, to get a better understanding of what is behind it and what you can expect from its execution. These are the different abstraction levels that Nextflow contains:

Your code interacts directly with the script interpreter, which reads all scripts and code within Nextflow. The only thing we touch that can affect the inner layers of Nextflow is the configuration, which tells the task dispatcher which executor type to use (i.e. local, grid engine, …). We do not need to interact with the file system layer or the parallelization layer, for instance.

To understand the philosophy behind Nextflow we have to talk about the dataflow programming paradigm, a declarative model for the execution of concurrent processes. Processes wait for data, and when an input set is ready the process is executed. Processes communicate through dataflow variables (i.e. async FIFO queues called channels). This is a list of the main primitive elements used in Nextflow that the tutorial will cover:

  • Closures: references to pieces of code
  • Processes: run any piece of script/command/code
  • Channels: unidirectional async queues that allow processes to communicate
  • Operators: transform channels content
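
A toy script can show all four primitives together. This is an illustrative sketch only; the variable and process names are made up for the example.

```nextflow
// Closure: a reference to a piece of code, assigned to a variable
square = { it * it }

// Channel: an async FIFO queue that feeds values downstream
nums = Channel.from(1, 2, 3)

// Operator: transforms the channel content, here by applying the closure
squared = nums.map(square)

// Process: runs any piece of script/command for each value received
process echoValue {
    echo true

    input:
    val x from squared

    """
    echo Received: $x
    """
}
```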

The following sections will cover the tutorial:


Official Nextflow site:

Nextflow forum:!forum/nextflow

Nextflow github repository:

Nextflow at Gitter:

Topic revision: r4 - 2015-01-20 - EInfraMPS2015Org