Luigi Tutorial

[Figure: luigi_screenshot.png, a screenshot of Luigi's graphical web interface]

Tutorial leader: Samuel Lampa

Preface

Luigi is a batch workflow system written in Python and developed at Spotify, where it is used to compute machine-learning powered music recommendation lists, top lists etc.

Luigi is one of the relatively few batch workflow systems that support running both normal command-line jobs and Hadoop jobs in the same workflow (in this tutorial, we will focus only on the command-line part).

Luigi workflows are developed in an object-oriented fashion in Python code, and are executed and controlled from the command line. While a workflow is running, its status can be followed in a web browser, in the graphical web interface that Luigi ships with (as shown in the picture above).

Luigi is a little special compared to most other workflow solutions, in that the dependency graph is by default defined by hard-coding the upstream task inside each task that depends on it. In this regard, Luigi tasks are quite similar to functions in functional programming, where each function knows everything needed to provide its answer, including all the other functions that need to be executed to get there.
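The "functional" idea described above can be illustrated with a small plain-Python model. This is only a sketch and deliberately does not use the Luigi library: the base class `Task`, the helper `build()`, and the two example tasks are hypothetical names made up for this illustration. Only the method names `requires()` and `run()` are borrowed from Luigi's vocabulary, and the real Luigi methods work with file targets rather than return values.

```python
class Task:
    """Minimal stand-in for a workflow task (NOT Luigi's actual base class)."""

    def requires(self):
        # By default a task has no upstream task.
        return None

    def run(self, upstream_result):
        raise NotImplementedError


def build(task):
    """Run the upstream task first (recursively), then the task itself."""
    upstream = task.requires()
    upstream_result = build(upstream) if upstream is not None else None
    return task.run(upstream_result)


class WriteGreeting(Task):
    # Hypothetical first step: produces some data from nothing.
    def run(self, upstream_result):
        return "hello world"


class Shout(Task):
    # Hypothetical second step: transforms the upstream result.
    def requires(self):
        # The upstream dependency is hard-coded inside the task itself,
        # much like a function that calls another function by name.
        return WriteGreeting()

    def run(self, upstream_result):
        return upstream_result.upper()


# Asking for the last task is enough; everything upstream follows from it.
result = build(Shout())
print(result)
```

Note that in this model, placing another step between `WriteGreeting` and `Shout` would require editing the `Shout` class itself, since that is where the dependency is written down.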

There are ways to override these hard-coded dependencies to create other workflows. But since workflows in bioinformatics often need more flexibility than that, and need to be easy to augment with e.g. extra filtering steps anywhere in a workflow, we will in this tutorial show how to extend the "functional" design of Luigi into a more "data flow"-like design, which most other workflow engines follow.
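To give a rough feeling for what a "data flow"-like design means, here is a small plain-Python model in which tasks do not name their upstream task; the connections are made from outside, when the workflow is defined. This is a sketch under assumptions, not Luigi's API and not necessarily how the tutorial pages implement it: `Task`, `build()`, the `upstream` attribute and the three example tasks are all hypothetical names chosen for this illustration.

```python
class Task:
    """Minimal stand-in for a workflow task (NOT Luigi's actual base class)."""

    def __init__(self):
        # Not hard-coded: assigned from outside when the workflow is wired up.
        self.upstream = None

    def run(self, data):
        raise NotImplementedError


def build(task):
    """Run whatever is connected upstream first, then the task itself."""
    data = build(task.upstream) if task.upstream is not None else None
    return task.run(data)


class WriteGreeting(Task):
    def run(self, data):
        return "hello world"


class DropVowels(Task):
    # Hypothetical extra filtering step.
    def run(self, data):
        return "".join(ch for ch in data if ch not in "aeiou")


class Shout(Task):
    def run(self, data):
        return data.upper()


# Workflow 1: greeting -> shout
greet, shout = WriteGreeting(), Shout()
shout.upstream = greet
plain = build(shout)

# Workflow 2: greeting -> filter -> shout, without editing any task class
greet2, filt, shout2 = WriteGreeting(), DropVowels(), Shout()
filt.upstream = greet2
shout2.upstream = filt
filtered = build(shout2)

print(plain)
print(filtered)
```

The point of the design is that the same task classes are reused unchanged in both workflows; only the wiring code differs.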

Prerequisites

  • For the Luigi tutorial you will need an UPPMAX account (UPPMAX is our local HPC center).
    • To apply for an account, go to https://supr.snic.se/ and select “Register New Person”.
    • Note that you need to apply using an institutional email address (university, company, governmental etc.); gmail, hotmail etc. will not be accepted.
    • After your application has been approved, you can log into the system and select “View and Manage Projects”, where you can request membership in projects. Here you should enter “g2015001” and, on the subsequent page, click “Request”. The request will then be handled manually by us.
    • When your account is approved you will receive a link to a temporary password. You can only click this link once to receive your password so make sure to note it. If you connect to UPPMAX from abroad, see http://www.uppmax.uu.se/using-the-uppmax-gateway
    • Login to the UPPMAX system is done via SSH2. There are guides available at http://www.uppmax.uu.se/support
    • If you have any problems or questions regarding UPPMAX accounts, please contact Martin Dahlö <martin.dahlo@scilifelab.uu.se>
  • If you want to install Luigi locally you can have a look at http://www.uppmax.uu.se/automating-workflows-using-the-luigi-batch-workflow-system , or follow the steps below!

Installing Luigi

To install Luigi, either in your UPPMAX account or on your local laptop, see this separate page:

Tutorial

The tutorial is divided into the following sections, created as separate pages for a better overview *(Note: These pages are meant to be opened as separate tabs in the browser and worked through one by one, in order!)*:

Some useful resources

Topic revision: r14 - 2015-01-20 - EInfraMPS2015Org
 