Parallel evaluation of a CQL query on multiple Cassandra nodes in a cluster #2

acharal · 2016-07-23T08:00:18Z

Motivation

It would be great to implement a parallel version of the Cassandra connector. Assume that the Semagrow execution engine spans a cluster of nodes and that each node can execute part of the execution plan in parallel. Assume also that each Semagrow node is colocated with a Cassandra node. Then a single CQL query can be processed in parallel by all the colocated Cassandra and Semagrow nodes, with each physical node doing its share of the work on local data.

Suggested Solution

An easy way to retrieve the data local to a Cassandra node is to use the CQL token function. The same technique is used by the spark-cassandra-connector (see, for example, CqlTokenRange and CassandraTableScanRDD). Each Cassandra node receives a rewritten CQL query with token-range restrictions added to the WHERE clause. For example, suppose that there are 3 nodes in the cluster and the initial CQL query is

SELECT event_description 
FROM events
WHERE event_category = 'Alerts'

The i-th node will then receive a query of the form

SELECT event_description 
FROM events
WHERE token(event_name) >= x_i AND token(event_name) < y_i AND event_category = 'Alerts'
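
A minimal sketch of the query rewriting, assuming Cassandra's default Murmur3Partitioner (tokens from -2^63 to 2^63 - 1) and that event_name is the partition key of events. The helper names split_token_ring and rewrite_query are hypothetical and not part of any connector. The sketch splits the whole ring evenly into n half-open ranges [x_i, y_i); such an even split only lines up with node-local data if the nodes' tokens happen to be evenly spaced. Depending on the schema, the non-key predicate on event_category may also require ALLOW FILTERING or a secondary index.

```python
# Hypothetical sketch assuming the Murmur3Partitioner token space.
MIN_TOKEN = -2**63       # smallest Murmur3 token
MAX_TOKEN = 2**63 - 1    # largest Murmur3 token


def split_token_ring(n):
    """Split the whole token ring into n contiguous half-open ranges [start, end).

    The last range has end=None, meaning "up to and including MAX_TOKEN",
    because MAX_TOKEN + 1 lies outside the token space.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    span = MAX_TOKEN - MIN_TOKEN + 1          # 2**64 tokens in total
    bounds = [MIN_TOKEN + (span * i) // n for i in range(n)]
    return [(bounds[i], bounds[i + 1] if i + 1 < n else None) for i in range(n)]


def rewrite_query(columns, table, partition_key, predicate, token_range):
    """Add a token(...) range restriction in front of the original predicate."""
    start, end = token_range
    conds = [f"token({partition_key}) >= {start}"]
    if end is not None:
        conds.append(f"token({partition_key}) < {end}")
    conds.append(predicate)
    return f"SELECT {columns} FROM {table} WHERE " + " AND ".join(conds)


# One rewritten query per (hypothetical) participating node.
for rng in split_token_ring(3):
    print(rewrite_query("event_description", "events", "event_name",
                        "event_category = 'Alerts'", rng))
```

Each printed query has the form shown above; the last one has no upper token bound, so the ranges together cover the entire ring.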

Ideally, the token range [x_i, y_i) matches the data local to the i-th node, so there is no network exchange. However, if not every Cassandra node participates in a Semagrow computation, some nodes will receive a query with tokens outside their own range. The Cassandra cluster handles such a query by finding which nodes own those tokens and transferring the data to the node that handles the query.
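
The assignment of ranges to participating nodes could look roughly like the following sketch. For simplicity it assumes one token per Cassandra node (no vnodes) and a hypothetical ring dictionary mapping node names to tokens; in practice the token map would come from the driver's cluster metadata. Cassandra treats the range owned by a node as (previous token, own token], i.e. start-exclusive and end-inclusive, so this sketch emits > and <= rather than the >= and < of the example query. Ranges whose owner does not participate are handed round-robin to participating nodes, and the cluster then fetches that data remotely, as described above. The functions owned_ranges, assign_ranges and token_predicate are illustrative names only.

```python
def owned_ranges(ring):
    """Map each node to the token ranges it owns as primary replica.

    ring: hypothetical dict of node name -> its single token (no vnodes).
    Each range is (start, end), meaning start < token <= end; None marks
    an unbounded side (used for the wrap-around range of the first node).
    """
    ordered = sorted(ring.items(), key=lambda kv: kv[1])
    result = {node: [] for node in ring}
    for i, (node, token) in enumerate(ordered):
        if i == 0:
            # The lowest-token node also owns the wrap-around part of the
            # ring: everything above the highest token, plus everything up
            # to and including its own token.
            highest = ordered[-1][1]
            result[node].append((highest, None))
            result[node].append((None, token))
        else:
            result[node].append((ordered[i - 1][1], token))
    return result


def assign_ranges(ring, participants):
    """Assign every owned range to a participating node.

    A range stays with its owner if that node participates; otherwise it
    is handed round-robin to a participating node, which then reads it
    through the cluster (i.e. over the network).
    """
    if not participants:
        raise ValueError("at least one participating node is required")
    helpers = sorted(participants)
    plan = {node: [] for node in helpers}
    k = 0
    for node, ranges in sorted(owned_ranges(ring).items()):
        for rng in ranges:
            if node in participants:
                plan[node].append(rng)
            else:
                plan[helpers[k % len(helpers)]].append(rng)
                k += 1
    return plan


def token_predicate(partition_key, rng):
    """Render a (start, end] range as a CQL token restriction."""
    start, end = rng
    conds = []
    if start is not None:
        conds.append(f"token({partition_key}) > {start}")
    if end is not None:
        conds.append(f"token({partition_key}) <= {end}")
    return " AND ".join(conds)
```

For example, with the ring {"n1": -100, "n2": 0, "n3": 100} and only n1 and n2 participating, n3's range (0, 100] is handed to n1, which reads it through the cluster rather than locally.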

I hope the suggestion is at least sound.

acharal added the enhancement label Jul 23, 2016