---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
# rush
*rush* is a package for parallel and distributed computing in R.
It evaluates R expressions asynchronously on a cluster of workers and provides shared storage between the workers.
The shared storage is a [Redis](https://redis.io) database.
The package supports both a centralized and a decentralized network architecture.
The centralized network has a single controller (`Rush`) and multiple workers (`RushWorker`).
The controller creates the tasks centrally and distributes them to the workers.
The decentralized network has no controller.
Each worker samples its own tasks and shares the results asynchronously with the other workers.
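The difference between the two architectures can be illustrated with a toy, single-process model. This is only a sketch and does not use rush or Redis: an R environment stands in for the shared database, and the names `store`, `central_worker` and `decentral_worker` are hypothetical.

```{r}
# Toy, single-process model of the two architectures (not the rush API).
# An environment stands in for the shared Redis database.
fun = function(x1, x2, ...) list(y = x1 + x2)

store = new.env()
store$queue = list()     # tasks waiting to be evaluated (centralized only)
store$finished = list()  # results written back by all workers

# Centralized: the controller creates the tasks and pushes them to the queue.
store$queue = list(list(x1 = 3, x2 = 5), list(x1 = 4, x2 = 6))

# A centralized worker pops tasks from the queue until it is empty.
central_worker = function(store) {
  while (length(store$queue) > 0) {
    xs = store$queue[[1]]
    store$queue = store$queue[-1]
    store$finished[[length(store$finished) + 1]] = c(xs, do.call(fun, xs))
  }
}

# A decentralized worker has no queue: it samples its own tasks and writes
# each result to the shared store, where other workers could read it.
decentral_worker = function(store, n_tasks) {
  for (i in seq_len(n_tasks)) {
    xs = list(x1 = runif(1), x2 = runif(1))
    store$finished[[length(store$finished) + 1]] = c(xs, do.call(fun, xs))
  }
}

central_worker(store)
decentral_worker(store, n_tasks = 2)
length(store$finished)
```

In rush the workers run in separate processes and the store is Redis, so the evaluations happen concurrently rather than one after another as in this model.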
## Features
* Parallelize arbitrary R expressions.
* Centralized and decentralized network architecture.
* Small overhead of a few milliseconds per task.
* Easily start local workers with [`processx`](https://CRAN.R-project.org/package=processx).
* Start workers on any platform with a batch script.
* Designed to work with [`data.table`](https://CRAN.R-project.org/package=data.table).
* Results are cached in the R session to minimize read and write operations.
* Detect and recover from worker failures.
* Start heartbeats to monitor workers on remote machines.
* Snapshot the in-memory database to disk.
* Store [`lgr`](https://CRAN.R-project.org/package=lgr) messages of the workers in the Redis database.
* Light on dependencies.
## Install
Install the development version from GitHub.
```{r eval = FALSE}
remotes::install_github("mlr-org/rush")
```
Rush also requires a [Redis](https://redis.io/docs/latest/operate/oss_and_stack/install/install-stack/) server, which must be installed separately.
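Before using rush, it can help to check that a Redis server is reachable. A minimal sketch, assuming the server runs with the connection settings that `redux` uses by default (a local server on the default port); the variable name `redis_ok` is only illustrative.

```{r}
# TRUE if a Redis server answers with redux's default connection settings.
redis_ok = redux::redis_available()
if (!redis_ok) {
  message("No Redis server found. Start one (e.g. with `redis-server`) before using rush.")
}
```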
## Centralized Rush Network
![](man/figures/README-flow.png)
*Centralized network with a single controller and three workers.*
```{r, include=FALSE}
# Start from an empty Redis database when rendering this README.
config = redux::redis_config()
r = redux::hiredis(config)
r$FLUSHDB()
```
The example below evaluates a simple function in a centralized network.
The `network_id` identifies the controller instance and its workers in the network.
The `config` holds the parameters for the connection to Redis; `redux::redis_config()` without arguments uses the default settings, i.e. a Redis server on the local machine.
```{r}
library(rush)
config = redux::redis_config()
rush = Rush$new(network_id = "test", config)
rush
```
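If Redis runs on another machine, the connection parameters can be set explicitly. A minimal sketch; the host address below is a placeholder, not a real server, and the chunk is not evaluated for that reason.

```{r eval = FALSE}
# Connection settings for a Redis server on another machine.
# The host address is a placeholder; replace it with your own.
config = redux::redis_config(host = "192.168.1.10", port = 6379)
```

The resulting object can then be passed to `Rush$new()` in place of the default `config` used above.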
Next, we define a function that we want to evaluate on the workers.
```{r}
fun = function(x1, x2, ...) {
  list(y = x1 + x2)
}
```
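Before handing the function to the workers, it can be sanity-checked in the current session. This sketch assumes a task is a named list of arguments, as in the `push_tasks()` call below; the variable `xs` is only illustrative.

```{r}
fun = function(x1, x2, ...) {
  list(y = x1 + x2)
}

# Evaluate a single task locally by passing its named list as arguments.
xs = list(x1 = 3, x2 = 5)
do.call(fun, xs)
```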
We start two workers.
```{r}
rush$start_local_workers(fun = fun, n_workers = 2)
```
Now we push two tasks to the workers and wait for them to finish.
```{r}
xss = list(list(x1 = 3, x2 = 5), list(x1 = 4, x2 = 6))
keys = rush$push_tasks(xss)
rush$wait_for_tasks(keys)
```
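For more tasks, the list of tasks can be generated instead of written by hand. A minimal sketch that builds one task per combination of `x1` and `x2`; the names `grid` and `xss_grid` are hypothetical.

```{r}
# One task per combination of x1 and x2 (here 2 x 2 = 4 tasks).
grid = expand.grid(x1 = c(3, 4), x2 = c(5, 6))

# Turn each row into a named list with the same shape as the tasks in `xss`.
xss_grid = Map(function(x1, x2) list(x1 = x1, x2 = x2), grid$x1, grid$x2)
length(xss_grid)
```

The result has the same structure as `xss` and could be passed to `rush$push_tasks()` in the same way.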
Finally, we fetch the results of the finished tasks.
```{r}
rush$fetch_finished_tasks()
```
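To check the distributed results, the same tasks can be evaluated sequentially in the current session. A sketch, assuming each task is passed to `fun` as named arguments; the name `y_expected` is hypothetical. The values should match the `y` results fetched from the workers.

```{r}
fun = function(x1, x2, ...) {
  list(y = x1 + x2)
}
xss = list(list(x1 = 3, x2 = 5), list(x1 = 4, x2 = 6))

# Sequential reference: evaluate every task locally and keep y.
y_expected = vapply(xss, function(xs) do.call(fun, xs)$y, numeric(1))
y_expected
```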
## Decentralized Rush Network
![](man/figures/README-flow-2.png)
*Decentralized network with four workers.*