Include the URL of the launchpad RFE: https://blueprints.launchpad.net/dragonflow/+spec/cassandra-support
Apache Cassandra [1] is a distributed key-value store that is widely used in large-scale, real-time internet applications at companies such as Netflix, Reddit, and The Weather Channel.
[1] https://cassandra.apache.org/
Its performance is strong and, according to university research reports [2], it generally outperforms comparable data stores.
[2] https://www.planetcassandra.org/nosql-performance-benchmarks/
Besides performance, it also has many noticeable advantages such as
Currently, we implement the control plane for Redis clustering inside Dragonflow, which is actually beyond the scope of the Dragonflow project. The reason we implemented the db-api layer is that we do not want to maintain the details of the data backend, as that is not the responsibility of the Dragonflow project.
The disadvantage of Cassandra is that it needs an external mechanism for PUB/SUB, for example, Zookeeper or ZeroMQ. The latter has already been implemented in Dragonflow, so it is usable for now.
Note that Cassandra runs on the JVM.
This section highlights some internal mechanisms of Cassandra that will greatly help Dragonflow scale out and be put into production.
- You can adjust ReplicationFactor to keep multiple replicas across data centers.
- You can adjust ConsistencyLevel to use different algorithms, such as Quorum.
- Every node in the cluster is identical. There are no Master or Slave roles.
- Data written to a Cassandra node is first appended to the append-only CommitLog and then fsynced to disk. You can also adjust the fsync policy. This guarantees durability.
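The interaction between ReplicationFactor and ConsistencyLevel can be illustrated with a little arithmetic. The following is a minimal sketch, not Cassandra code: the helper names `quorum` and `is_strongly_consistent` are hypothetical, and it assumes a single data center, where a quorum is floor(RF / 2) + 1 replicas and a read is guaranteed to see the latest write when the number of read replicas plus write replicas exceeds the replication factor.

```python
def quorum(replication_factor: int) -> int:
    """Number of replicas that must respond for a QUORUM operation."""
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    return replication_factor // 2 + 1


def is_strongly_consistent(read_replicas: int, write_replicas: int,
                           replication_factor: int) -> bool:
    """True when every read set overlaps every write set (R + W > RF)."""
    return read_replicas + write_replicas > replication_factor


if __name__ == "__main__":
    rf = 3  # illustrative value, not a recommendation
    q = quorum(rf)
    print(f"RF={rf}: quorum={q}, tolerates {rf - q} replica(s) down")
    print("QUORUM reads + QUORUM writes consistent:",
          is_strongly_consistent(q, q, rf))
```

With a replication factor of 3, for instance, quorum reads and writes each need two replicas, so the cluster can lose one replica and still serve consistent results.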
You only need to specify a set of nodes in the configuration, using remote_db_hosts in the [df] section. The nodes automatically form a Quorum-like cluster with the replication and consistency settings you specify in the Cassandra configuration.
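As an illustration of the configuration described above, here is a minimal sketch that reads remote_db_hosts from a [df] section using Python's standard configparser. It is not how Dragonflow itself loads options. The comma-separated host:port format, the sample addresses, and the helper name `parse_remote_db_hosts` are assumptions made for this example.

```python
import configparser

SAMPLE_CONF = """
[df]
# Hypothetical addresses; 9042 is Cassandra's default native transport port.
remote_db_hosts = 10.0.0.11:9042,10.0.0.12:9042,10.0.0.13:9042
"""


def parse_remote_db_hosts(conf_text):
    """Return a list of (host, port) tuples from the [df] section.

    Assumes each entry has the form host:port, separated by commas.
    """
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    raw = parser.get("df", "remote_db_hosts")
    hosts = []
    for entry in raw.split(","):
        entry = entry.strip()
        if not entry:
            continue  # skip empty items such as a trailing comma
        host, _, port = entry.rpartition(":")
        hosts.append((host, int(port)))
    return hosts


if __name__ == "__main__":
    for host, port in parse_remote_db_hosts(SAMPLE_CONF):
        print(host, port)
```

Listing several nodes gives a client more than one contact point, so it can still reach the cluster if one of the listed nodes is down.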
Although this topic is beyond the scope of Dragonflow, the following official Cassandra links guide users in tuning Cassandra and the JVM.
Monitoring in production shows that data store operations in Dragonflow are read-intensive. This is actually a characteristic of Neutron rather than of Dragonflow: most data store operations in Neutron are highly concurrent reads.
Here is another link [3] that provides hints on how to optimize the JVM in Cassandra for read-heavy workloads.
[3] https://www.planetcassandra.org/blog/cassandra-tuning-the-jvm-for-read-heavy-workloads/
Except where otherwise noted, this document is licensed under Creative Commons Attribution 3.0 License.