DLedger is a Raft-based Java library for building a highly available, highly durable, strongly consistent commitlog. The test runs concurrent operations against DLedger from different nodes in a DLedger cluster and checks that the operations preserve the consistency properties defined in the test. During the test, various nemeses can be added to interfere with the operations.
Currently, the only checker is Set. Given a set of concurrent unique appends to the DLedger commitlog followed by a final read, it verifies that every successfully appended element is present in the read, and that the read contains only elements for which an append was attempted.
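The rule the Set checker applies can be illustrated with a few set operations. The following is a minimal Python sketch, not the checker used by the test itself; the function name `check_set` and the result keys are hypothetical, loosely modeled on the categories a Jepsen-style set checker reports.

```python
def check_set(attempted, acknowledged, final_read):
    """Classify elements of a set test (illustrative sketch only).

    attempted    -- every element for which an append was invoked
    acknowledged -- the elements whose append returned success
    final_read   -- the elements returned by the final read
    """
    attempted = set(attempted)
    acknowledged = set(acknowledged)
    read = set(final_read)

    # Acknowledged appends that are missing from the final read.
    lost = acknowledged - read
    # Elements in the read that no client ever tried to append.
    unexpected = read - attempted
    # Appends that were not acknowledged (failed or timed out) but
    # still show up in the read. These do not violate the property.
    recovered = (read & attempted) - acknowledged

    return {
        "valid": not lost and not unexpected,
        "lost": sorted(lost),
        "unexpected": sorted(unexpected),
        "recovered": sorted(recovered),
    }


if __name__ == "__main__":
    result = check_set(
        attempted=range(6),
        acknowledged=[0, 1, 2, 3],
        final_read=[0, 1, 2, 4],
    )
    print(result)
```

In this example element 3 was acknowledged but is absent from the read, so the run is reported as invalid. Element 4 was never acknowledged yet appears in the read, which is allowed because an append was attempted for it.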
- Prepare one control node and five db nodes, and ensure that the control node can log into every db node over SSH.
- Install Clojure, Jepsen and clojure-control on the control node.
- Edit the nodes, control.clj and src/dledger_jepsen_test/core.clj files to set the hostname, user name and store path. These values are currently hardcoded in the program.
- Deploy the dledger server with clojure-control on the control node:
control run dledger build
control run dledger deploy
- Run the test:
lein run test --nodes-file ./nodes
or execute ./run_test.sh
In one shell, we start the five nodes and the controller using docker compose.
cd docker
./up.sh --dev
In another shell, use docker exec -it chaos-control bash to enter the controller, then run:
control run dledger build
control run dledger deploy
./run_test.sh
See lein run test --help for options.
nemesis
--nemesis NAME, which nemesis to run. The default value is partition-random-halves. The available nemeses are:
- partition-random-node: isolates a single node from the rest of the network.
- partition-random-halves: cuts the network into randomly chosen halves.
- kill-random-processes: kills random processes and restarts them.
- crash-random-nodes: crashes random nodes and restarts them (kills processes and drops caches).
- hammer-time: pauses random nodes with SIGSTOP/SIGCONT.
- bridge: a grudge which cuts the network in half, but preserves a node in the middle which has uninterrupted bidirectional connectivity to both components.
- partition-majorities-ring: every node can see a majority, but no node sees the same majority as any other. Randomly orders nodes into a ring.
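To make the partition descriptions above more concrete, here is a minimal Python sketch of how such "grudges" (a map from each node to the set of nodes it cannot reach) could be computed. It is an illustration under stated assumptions, not the code the test runs: the function names and the node names `n1` to `n5` are hypothetical, and the majorities ring is left out for brevity.

```python
import random


def complete_grudge(components):
    """Map each node to every node outside its own component."""
    all_nodes = set().union(*components)
    grudge = {}
    for component in components:
        for node in component:
            grudge[node] = all_nodes - set(component)
    return grudge


def random_halves(nodes, rng=random):
    """Shuffle the nodes and cut them into two components."""
    shuffled = list(nodes)
    rng.shuffle(shuffled)
    mid = len(shuffled) // 2
    return [shuffled[:mid], shuffled[mid:]]


def isolate_random_node(nodes, rng=random):
    """Put one randomly chosen node in a component of its own."""
    nodes = list(nodes)
    victim = rng.choice(nodes)
    return [[victim], [n for n in nodes if n != victim]]


def bridge_grudge(nodes):
    """Cut the nodes in half but keep one middle node connected to all."""
    nodes = list(nodes)
    mid = len(nodes) // 2
    left, bridge, right = nodes[:mid], nodes[mid], nodes[mid + 1:]
    grudge = {n: set(right) for n in left}
    grudge.update({n: set(left) for n in right})
    grudge[bridge] = set()  # the bridge node drops traffic from nobody
    return grudge


if __name__ == "__main__":
    cluster = ["n1", "n2", "n3", "n4", "n5"]
    print(complete_grudge(random_halves(cluster)))
    print(complete_grudge(isolate_random_node(cluster)))
    print(bridge_grudge(cluster))
```

With five nodes, the bridge sketch leaves `n3` able to talk to everyone, while `n1` and `n2` cannot reach `n4` and `n5`, and vice versa.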
Other options:
--rate HZ, approximate number of requests per second, per thread. The default value is 10.
--concurrency NUMBER, the number of workers (clients). The default value is 5.
--time-limit TIME, test time limit. The default value is 60.
--interval TIME, nemesis interval. The default value is 15.
--test-count TIMES, number of times to run the test. The default value is 1.
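As a rough illustration of how these options combine, the Python sketch below assembles a full command line from the documented defaults. The helper `build_test_command` is hypothetical and not part of the project. It assumes the nemesis flag is spelled `--nemesis` and that the nodes file is `./nodes`, as in the run step above.

```python
import shlex


def build_test_command(
    nemesis="partition-random-halves",
    rate=10,
    concurrency=5,
    time_limit=60,
    interval=15,
    test_count=1,
    nodes_file="./nodes",
):
    """Return the test invocation as an argument list (illustrative only)."""
    return [
        "lein", "run", "test",
        "--nodes-file", nodes_file,
        "--nemesis", nemesis,
        "--rate", str(rate),
        "--concurrency", str(concurrency),
        "--time-limit", str(time_limit),
        "--interval", str(interval),
        "--test-count", str(test_count),
    ]


if __name__ == "__main__":
    # Print a copy-pasteable command that runs the bridge nemesis three times.
    print(shlex.join(build_test_command(nemesis="bridge", test_count=3)))
```

Calling the helper with no arguments yields the same behaviour as the defaults listed above, so only the options that differ need to be passed.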