install.packages("devtools")
devtools::install_github("sahilseth/flowr")
## OR
install.packages("flowr")
library(flowr) ## load the library
setup() ## copy flowr bash script
run('sleep', execute=TRUE, platform='moab')
## OR from terminal
# flowr run sleep execute=TRUE platform=moab
# flowr

Faced with a deluge of data, flowr helps in streamlining computing workflows.
flowr has two ingredients:

- **flow_mat**: a table of shell commands to run (one row per command)
- **flow_def**: a flow definition describing how those commands are stitched together and submitted

The flow definition answers two questions, using submission types (`scatter`, `sequential`/`serial`) and dependency types (`gather`, `serial`, `burst`):
Given a bunch of shell commands for a step, how should the jobs be submitted?

- `serial` (or `sequential`): submit them one after the other
- `scatter`: submit all of them at the same time, executing them in parallel

In what fashion should the downstream step wait for the previous step(s)?

- `gather`: wait for N jobs in the previous step to complete
- `serial`: when the ith job in the previous step completes, start the ith job in the current step
- `burst`: when the previous step completes (and had a single job), start N jobs of the current step
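As a sketch of how the two ingredients fit together, one might build them as plain data frames. The `samplename`, `jobname`, and `cmd` columns match the defaults flowr reports later in this document; the `sub_type`, `dep_type`, and `prev_jobs` column names are illustrative assumptions, not checked against the package:

```r
## a minimal flow_mat: one row per shell command
flowmat = data.frame(
  samplename = "sample1",
  jobname    = c("sleep", "sleep", "sleep", "merge"),
  cmd        = c("sleep 1", "sleep 2", "sleep 3", "cat tmp* > merged.txt"),
  stringsAsFactors = FALSE)

## a minimal flow_def: one row per step, wiring the submission
## and dependency types described above (column names assumed)
flowdef = data.frame(
  jobname   = c("sleep", "merge"),
  sub_type  = c("scatter", "serial"),   ## sleep jobs run in parallel
  prev_jobs = c("none", "sleep"),
  dep_type  = c("none", "gather"),      ## merge waits for all sleep jobs
  stringsAsFactors = FALSE)
```

In this sketch the three `sleep` commands would be scattered in parallel, and `merge` would start only once all of them complete (`gather`).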
#install.packages(devtools)
#devtools::install_github("sahilseth/flowr")
## OR
#install.packages("flowr")
library(flowr)
setup()
Consider adding ~/bin to your PATH variable in .bashrc.
export PATH=$PATH:$HOME/bin
You may now use all R functions using 'flowr' from shell.
exdata = file.path(system.file(package = "flowr"), "pipelines")
flowmat = as.flowmat(file.path(exdata, "abcd.tsv"))
flowdef = as.flowdef(file.path(exdata, "abcd.def"))
fobj <- to_flow(x = flowmat, def = flowdef)
input x is data.frame
##--- Getting default values for missing parameters...
Using `samplename` as the grouping column
Using `jobname` as the jobname column
Using `cmd` as the cmd column
Using flow_base_path default: ~/flowr
##--- Checking flow definition and flow matrix for consistency...
##--- Detecting platform...
Platform supplied, this will override defaults from flow_definition...
##--- flowr submission...
Working on... sample1
Test Successful!
You may check this folder for consistency. Also you may re-run submit with execute=TRUE
~/flowr/example1-sample1-20150706-21-50-08-AuNGnTHi
plot_flow(fobj)
submit_flow(fobj)
submit_flow(fobj, execute = TRUE)
Flow has been submitted. Track it from terminal using:
flowr::status(x="~/flowr/type1-20150520-15-18-46-sySOzZnE")
OR
flowr status x=~/flowr/type1-20150520-15-18-46-sySOzZnE
$ flowr status x=~/flowr/sample1-20150619-07-43-28-OTpuKaMz
Flowr: streamlining workflows
Showing status of: ~/flowr/sample1-20150619-07-43-28-OTpuKaMz
| | total| started| completed| exit_status|
|:---------|-----:|-------:|---------:|-----------:|
|001.sleep | 3| 3| 1| 0|
|002.tmp | 3| 1| 1| 0|
|003.merge | 1| 0| 0| 0|
The run command creates and submits a flow object. Here is an example:
flowr run sleep execute=TRUE platform=moab
$ flowr status x=sample1
Showing status of: ./sample1-20150619-07-34-17-lykJ4pdf
| | total| started| completed| exit_status|
|:---------|-----:|-------:|---------:|-----------:|
|001.sleep | 3| 3| 3| 0|
|002.tmp | 3| 3| 3| 0|
|003.merge | 1| 1| 1| 0|
Showing status of: ./sample1-20150619-07-43-28-OTpuKaMz
| | total| started| completed| exit_status|
|:---------|-----:|-------:|---------:|-----------:|
|001.sleep | 3| 3| 3| 0|
|002.tmp | 3| 3| 3| 0|
|003.merge | 1| 0| 0| 0|
`status()` shows a summary of all the flows in a folder; it is designed to work much like `ls` does in the terminal.
flowr run sleep execute=TRUE flow_base_path="~/flowr/sleep"
flowr status x=~/flowr/sleep ## parent folder with 3 flows inside
Showing status of: /rsrch2/iacs/iacs_dep/sseth/flowr/sleep
| | total| started| completed| exit_status|
|:---------|-----:|-------:|---------:|-----------:|
|001.sleep | 9| 9| 6| 0|
|002.tmp | 9| 6| 6| 0|
|003.merge | 3| 1| 1| 0|
|004.size | 3| 1| 1| 0|
flowr status x=~/flowr/sleep/sample1* ## get the status of all of them
`kill_flow` fetches the jobid of each job and kills them:

flowr kill_flow wd=~/flowr/sample1-20150619-07-53-58-ySuYo5t0
flowr rerun_flow x=~/flowr/sample1-20150619-11-41-50-eXa0insg start_from=tmp
Extracting commands from previous run.
Hope the reason for previous failure was fixed...
Subsetting... get stuff to run starting tmp
Using flow_base_path default: ~/flowr
├── 001.sleep
│ ├── 001.sleep
│ ├── sleep_cmd_1.sh
│ ├── sleep_cmd_2.sh
│ └── sleep_cmd_3.sh
├── 002.tmp
│ ├── 002.tmp
│ ├── tmp_cmd_1.sh
│ ├── tmp_cmd_2.sh
│ └── tmp_cmd_3.sh
├── 003.merge
│ ├── 003.merge
│ └── merge_cmd_1.sh
├── 004.size
│ ├── 004.size
│ └── size_cmd_1.sh
├── example1-flow_design.pdf
├── flow_details.rda
├── flow_details.txt
├── flow_status.txt
├── tmp
│ ├── merge1
│ ├── tmp1_1
│ ├── tmp1_2
│ └── tmp1_3
└── trigger
├── trigger_001.sleep_1.txt
├── trigger_001.sleep_2.txt
├── trigger_001.sleep_3.txt
├── trigger_002.tmp_1.txt
├── trigger_002.tmp_2.txt
├── trigger_002.tmp_3.txt
├── trigger_003.merge_1.txt
└── trigger_004.size_1.txt
Supported platforms include `local` and `moab`.
Syntax: `flowr function parameters`

Using `-h` or a missing argument loads the R help file. For example:

flowr rnorm n=100
Loading required package: shape
Flowr: streamlining workflows
2.277249 0.3188005 -0.9658285 0.4719445
....
## load help file for knitr
funr knitr::knit
## OR use
funr knitr::knit -h