flowr: streamlining computing workflows

try this

install.packages("devtools")
devtools::install_github("sahilseth/flowr")
## OR
install.packages("flowr") 
library(flowr) ## load the library
setup() ## copy flowr bash script
run('sleep', execute=TRUE, platform='moab')
## OR from terminal
# flowr run sleep execute=TRUE platform=moab


[image: a flood]

a deluge of data

flowr

streamlining computing workflows


What do we need?

What don't we need?


a flowr recipe needs:

a flow matrix (flowmat): the shell commands for each step

a flow definition (flowdef): the resources and dependencies for each step

five terms one needs to know about

Submission type

Given a bunch of shell commands for a step, how should the jobs be submitted?

five terms one needs to know about

Dependency type:

In what fashion should the downstream step wait for previous step(s)?
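As an illustrative sketch (the values below are examples, not taken from the bundled abcd pipeline), a flow definition pairs each step with a submission type (`sub_type`: run one command, or scatter several in parallel) and a dependency type (`dep_type`: how it waits on its `prev_jobs`):

```
jobname  sub_type  prev_jobs  dep_type
sleep    scatter   none       none
tmp      scatter   sleep      serial
merge    serial    tmp        gather
```

Here merge uses gather: it waits for all three tmp jobs and collapses them into a single job, matching the 3-to-1 pattern in the status tables later in this deck.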

An example chart showing a typical pipeline

plot of chunk flow_overview

let's get started

#install.packages(devtools)
#devtools::install_github("sahilseth/flowr")
## OR
#install.packages("flowr")
library(flowr)
setup()
Consider adding ~/bin to your PATH variable in .bashrc.
export PATH=$PATH:$HOME/bin
You may now use all R functions using 'flowr' from shell.

load some example data

exdata = file.path(system.file(package = "flowr"), "pipelines")
flowmat = as.flowmat(file.path(exdata, "abcd.tsv"))
flowdef = as.flowdef(file.path(exdata, "abcd.def"))
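For orientation, a flow matrix is a tab-delimited table with (at least) the three columns flowr detects below: a grouping column, a job name, and a command. This sketch is illustrative and is not the actual contents of abcd.tsv:

```
samplename  jobname  cmd
sample1     sleep    sleep 5
sample1     tmp      head -c 100 /dev/urandom > tmp1
sample1     merge    cat tmp* > merged
```

One row per command; rows sharing a jobname become parallel jobs of the same step, and the samplename column groups rows into independent flows.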

stitch it

fobj <- to_flow(x = flowmat, def = flowdef)
input x is data.frame
##--- Getting default values for missing parameters...
Using `samplename` as the grouping column
Using `jobname` as the jobname column
Using `cmd` as the cmd column
Using flow_base_path default: ~/flowr
##--- Checking flow definition and flow matrix for consistency...
##--- Detecting platform...
Platform supplied, this will override defaults from flow_definition...
##--- flowr submission...
Working on... sample1
Test Successful!
You may check this folder for consistency. Also you may re-run submit with execute=TRUE
 ~/flowr/example1-sample1-20150706-21-50-08-AuNGnTHi

plot it

plot_flow(fobj)

plot of chunk plotit

test it

submit_flow(fobj)

submit it

submit_flow(fobj, execute = TRUE)
Flow has been submitted. Track it from terminal using:
flowr::status(x="~/flowr/type1-20150520-15-18-46-sySOzZnE")
OR
flowr status x=~/flowr/type1-20150520-15-18-46-sySOzZnE
$ flowr status x=~/flowr/sample1-20150619-07-43-28-OTpuKaMz
Flowr: streamlining workflows
Showing status of: ~/flowr/sample1-20150619-07-43-28-OTpuKaMz
|          | total| started| completed| exit_status|
|:---------|-----:|-------:|---------:|-----------:|
|001.sleep |     3|       3|         1|           0|
|002.tmp   |     3|       1|         1|           0|
|003.merge |     1|       0|         0|           0|

all in one go

Here is an example:

flowr run sleep execute=TRUE platform=moab

simple yet powerful status()

$ flowr status x=sample1
Showing status of: ./sample1-20150619-07-34-17-lykJ4pdf
|          | total| started| completed| exit_status|
|:---------|-----:|-------:|---------:|-----------:|
|001.sleep |     3|       3|         3|           0|
|002.tmp   |     3|       3|         3|           0|
|003.merge |     1|       1|         1|           0|
Showing status of: ./sample1-20150619-07-43-28-OTpuKaMz
|          | total| started| completed| exit_status|
|:---------|-----:|-------:|---------:|-----------:|
|001.sleep |     3|       3|         3|           0|
|002.tmp   |     3|       3|         3|           0|
|003.merge |     1|       0|         0|           0|

simple yet powerful status()

status() is designed to work much like ls does in the terminal

flowr run sleep execute=TRUE flow_base_path="~/flowr/sleep"
flowr status x=~/flowr/sleep ## parent folder with 3 flows inside
Showing status of: /rsrch2/iacs/iacs_dep/sseth/flowr/sleep
|          | total| started| completed| exit_status|
|:---------|-----:|-------:|---------:|-----------:|
|001.sleep |     9|       9|         6|           0|
|002.tmp   |     9|       6|         6|           0|
|003.merge |     3|       1|         1|           0|
|004.size  |     3|       1|         1|           0|
flowr status x=~/flowr/sleep/sample1* ## get status of all of them

stopping flows

flowr kill_flow wd=~/flowr/sample1-20150619-07-53-58-ySuYo5t0

rerun partially completed flows

flowr rerun_flow x=~/flowr/sample1-20150619-11-41-50-eXa0insg start_from=tmp
Extracting commands from previous run.
Hope the reason for previous failure was fixed...
Subsetting... get stuff to run starting tmp
Using flow_base_path default: ~/flowr

clean organization and structure

├── 001.sleep
│   ├── 001.sleep
│   ├── sleep_cmd_1.sh
│   ├── sleep_cmd_2.sh
│   └── sleep_cmd_3.sh
├── 002.tmp
│   ├── 002.tmp
│   ├── tmp_cmd_1.sh
│   ├── tmp_cmd_2.sh
│   └── tmp_cmd_3.sh
├── 003.merge
│   ├── 003.merge
│   └── merge_cmd_1.sh
├── 004.size
│   ├── 004.size
│   └── size_cmd_1.sh

clean organization and structure

├── example1-flow_design.pdf
├── flow_details.rda
├── flow_details.txt
├── flow_status.txt
├── tmp
│   ├── merge1
│   ├── tmp1_1
│   ├── tmp1_2
│   └── tmp1_3
└── trigger
    ├── trigger_001.sleep_1.txt
    ├── trigger_001.sleep_2.txt
    ├── trigger_001.sleep_3.txt
    ├── trigger_002.tmp_1.txt
    ├── trigger_002.tmp_2.txt
    ├── trigger_002.tmp_3.txt
    ├── trigger_003.merge_1.txt
    └── trigger_004.size_1.txt

mixing local and HPCC jobs

platforms supported

flowr supports common HPC schedulers (for example moab, used in the examples above) as well as running jobs locally.

flowr shell script: universal use

syntax: flowr function parameters

flowr rnorm n=100
Loading required package: shape
Flowr: streamlining workflows
2.277249 0.3188005 -0.9658285 0.4719445 
....
## load help file for knitr
funr knitr::knit
## OR use
funr knitr::knit -h

more examples

links for more info:

Acknowledgements