The URL of your launchpad Blueprint:
https://blueprints.launchpad.net/dragonflow/+spec/keep-db-consistency
This blueprint proposes solutions for keeping data consistent between the Neutron DB and the Dragonflow DB, as well as between the Dragonflow DB and the Dragonflow local controller cache. Neutron DB stores the virtualized network topology of OpenStack clouds. It is a reliable relational data store and can be considered the master DB in the Dragonflow distributed architecture. The question therefore becomes how to guarantee the atomicity of DB operations that span two distinct kinds of database within a single API session. Here we propose a simple and feasible distributed lock solution, to be improved later based on evaluation in production. We also provide an effective data synchronization mechanism to resolve the data inconsistency problems that may occur in the communication among the three types of data stores described below.
Currently, the Dragonflow architecture has three types of data stores. The first is Neutron DB, responsible for storing the virtualized network topology. This database is relational and all operations on it are based on atomic transactions. It is considered the master DB of Dragonflow. The second is a distributed NoSQL DB responsible for storing the Dragonflow-related topology, a subset of Neutron DB. It is considered the slave DB of Dragonflow. Its API, encapsulated by Dragonflow, generally lacks atomic transactions, which can cause inconsistency. The third is the Dragonflow local controller cache, a subset of the Dragonflow NoSQL DB populated by the selective topology storage mechanism; it lets the local controller access the data it cares about much faster.
There are several problems related to DB inconsistency in the current codebase. The pseudo-code or description demonstrating each inconsistency is as follows:

Scenario 1:
    with transaction of Neutron DB:
        call Neutron DB API
        call Dragonflow DB API
When an exception is thrown from the Dragonflow DB API, the Neutron DB transaction is nevertheless committed. We need to trigger a re-sync to make sure the two DBs reach the same state. This phenomenon is observed when the Dragonflow DB is disconnected after the Neutron DB transaction has been committed.
Scenario 2:

    with transaction of Neutron DB:
        call Neutron DB API
        call _some_operation(...)

    def _some_operation(...):
        call get_key(...)
        update the values
        call set_key(...)
After the Neutron DB transaction is committed, concurrent calls to the _some_operation function can still cause inconsistency, because the get/update/set sequence in that function is not atomic as a whole. This phenomenon is observed in multi-node deployments.
Scenario 3:
- Dragonflow local controller subscribes to the data it is concerned with;
- Neutron plugin publishes newly created data;
- Dragonflow local controller receives the data and stores it in its cache.
After the Dragonflow local controller subscribes to the data it is concerned with, the pub/sub connection between the local controller and the pub/sub system may go down because of a network failure or a breakdown of the pub/sub system itself, so the local controller will miss the new data. In another case, an internal exception occurs while the local controller is processing a received pub message, and the new data is dropped.
Scenario 4:
1. A VM comes online on a Dragonflow local controller host, and it is the first VM of a tenant on that host;
2. The local controller fetches all the data belonging to that tenant from the Dragonflow NoSQL DB.

If, after the VM comes online, the read/write connection between the local controller and the Dragonflow NoSQL DB goes down because of a network failure or a problem with the DB itself (restart or node crash), the local controller will miss the tenant data it is concerned with.
To solve the problems discussed in Scenario 2, the general idea is to introduce a distributed lock mechanism at the core plugin layer. The distributed lock protects the API context and prevents concurrent write operations on the same records in the database. For the problems discussed in Scenarios 1, 3 and 4, we need an effective data synchronization mechanism.
To solve these problems, a SQL-based distributed lock mechanism is introduced. A carefully designed SQL transaction can serve as an external atomic lock that protects all the dependent database operations, both on Neutron DB and on the DF distributed NoSQL server, within a given API context. This greatly reduces the complexity of building another sophisticated synchronization mechanism from scratch.
The distributed lock is tenant-based and each tenant has its own lock in the database. This design allows a certain degree of concurrency.
In Neutron plugin:
When an API request is being processed:

    Acquire the distributed lock for the Neutron object.
    Start a Neutron DB transaction for the network operations.
    Do the Neutron DB operations.
    Do the DF DB operations.
    Emit messages via PUB/SUB.
    Release the distributed lock.
    def CUD_object(self, context, obj):
        nb_lock = lock_db.DBLock(context.tenant_id)
        with nb_lock:  # serialize all writes of this tenant
            with db_api.autonested_transaction(context.session):
                modified_obj = super(Plugin, self).CUD_object(context, obj)
                self.nb_api.CUD_object(name=modified_obj['id'],
                                       topic=modified_obj['tenant_id'],
                                       obj=modified_obj)
        return modified_obj
Alternatively, the lock can be applied with a decorator:

    @lock_db.wrap_db_lock()
    def CUD_object(self, context, obj):
        pass
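The decorator itself is not detailed in this spec; a minimal sketch of what wrap_db_lock could look like, assuming the tenant-scoped DBLock context manager used above (module path is an assumption):

    import functools

    from dragonflow.db.neutron import lock_db  # assumed module path


    def wrap_db_lock():
        """Acquire the per-tenant distributed lock around the wrapped method."""
        def decorator(func):
            @functools.wraps(func)
            def wrapper(self, context, *args, **kwargs):
                # serialize all writes of this tenant across API workers
                with lock_db.DBLock(context.tenant_id):
                    return func(self, context, *args, **kwargs)
            return wrapper
        return decorator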
As noted above, the spec adds a new table for the distributed lock in Neutron DB. The table is designed as follows:
Attribute   | Type     | Description
------------|----------|------------------------------------------
object_uuid | String   | primary key
lock        | Boolean  | True means it is locked
session_id  | String   | generated for a given API session
created_at  | DateTime | the time when the lock record is created
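For illustration, the table could be declared as a SQLAlchemy model along these lines (a sketch; the model and table names are assumptions, not the final implementation):

    import sqlalchemy as sa
    from sqlalchemy.ext import declarative

    Base = declarative.declarative_base()


    class DFLockedObject(Base):
        """One row per lockable object (here: one per tenant)."""
        __tablename__ = 'dflockedobjects'

        object_uuid = sa.Column(sa.String(36), primary_key=True)
        lock = sa.Column(sa.Boolean, default=False)   # True means it is locked
        session_id = sa.Column(sa.String(36))         # set per API session
        created_at = sa.Column(sa.DateTime)           # lock record creation time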
We discuss the data synchronization mechanism from two aspects:
- Neutron plugin;
- Dragonflow local controller.
When the Neutron plugin receives a creation call for an object (router/network/subnet/port, etc.):
    Start Neutron DB transaction for the creation operations.
    with session.begin(subtransactions=True):
        Do Neutron DB operations.
        try:
            Do DF DB operations.
            Emit messages via PUB/SUB.
        except:
            rollback Neutron DB operations.
            raise creation exception.
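A hedged Python sketch of this creation path, using create_network as the example (the nb_api and notifier calls stand in for the DF DB and pub/sub operations and are illustrative, not the final API):

    def create_network(self, context, network):
        with context.session.begin(subtransactions=True):
            result = super(Plugin, self).create_network(context, network)
            try:
                # write the new object into DF DB, then publish it
                self.nb_api.create_lswitch(result['id'], result['tenant_id'])
                self.notifier.publish('network', 'create', result)
            except Exception:
                # re-raising inside the transaction block rolls back the
                # Neutron DB operations; the creation exception propagates
                raise
        return result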
When the Neutron plugin receives an update/delete call for an object (router/network/subnet/port, etc.):
    Start Neutron DB transaction for the DB operations.
    with session.begin(subtransactions=True):
        Do Neutron DB operations.
        try:
            Do DF DB operations.
            Emit messages via PUB/SUB.
        except:
            raise update/delete exception.
When the DB driver and the pub/sub driver find that the read/write connection between the Neutron plugin and the DF DB, as well as the pub/sub connection between the Neutron plugin and the pub/sub system, have recovered, the drivers should send the Neutron plugin a recover message. The Neutron plugin should process the message as follows:
Start handling the recover message:

    pull data from DF DB.
    pull data from Neutron DB.
    compare the two data sets.
    if an object create/update/delete is found:
        do DF DB operations.
        Emit messages via PUB/SUB.
During the data pulling and comparison period, both databases keep changing dynamically. For example, new port data may have been written into Neutron DB, and the DB comparison thread may read the port from both Neutron DB and DF DB before the port data is written into DF DB; the port would then be considered a pending create even though its data will reach DF DB shortly. In another case, if the comparison finds that a port in Neutron DB is newer than in DF DB, the port may be updated again while we act on it; if our write lands after the newer update, the new port data in DF DB would be overwritten by the old port data. We therefore start a green thread to compare Neutron DB and DF DB periodically, and introduce a verification mechanism in the Neutron plugin. If this turns out to be a performance bottleneck for the Neutron plugin, we could consider third-party software (such as an additional process or system OM tools) to do this work.
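A sketch of how the periodic comparison green thread could be spawned; eventlet is assumed here, as it is commonly used in Neutron, and the names are illustrative:

    import eventlet


    def start_db_comparison_loop(compare_once, interval=60):
        """Spawn a green thread that runs one comparison pass per interval."""
        def _loop():
            while True:
                try:
                    compare_once()  # one full Neutron DB vs. DF DB pass
                except Exception:
                    # keep the loop alive; a failed pass is retried next tick
                    pass
                eventlet.sleep(interval)
        return eventlet.spawn(_loop)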
For the verification mechanism used by the DB comparison, we can mark and cache the create/update/delete status of each object the first time the comparison sees it. If, after the second comparison pass, the status of an object is still unchanged, we can confirm its create/update/delete status and attempt the corresponding operations. If the status has changed, we refresh the cached status with the latest one and wait for the next comparison pass. In other words, two comparison passes are needed to confirm the status of an object.
When performing the corresponding operations after confirming an object's status, the plugin should first acquire the distributed lock. After the lock is acquired:
- If the status is create, try to read the object from DF DB. If the object still does not exist, create it in DF DB. If the object already exists in DF DB, it may have been updated during the comparison, so we no longer consider it a pending create and delete its status from the cache.
- If the status is update, try to read the object from DF DB. If the object does not exist (it may have been deleted during the comparison), or it exists but has changed (it may have been updated during the comparison), delete its status from the cache; otherwise, update the object in DF DB.
- If the status is delete, delete the object from DF DB directly.
After the above processing is done, the lock will be released.
Start db comparison periodically:

    pull data from DF DB.
    pull data from Neutron DB.
    compare by object version.
    if an object create/update/delete is found:
        verify the object.
        if the object status is confirmed:
            try:
                db_lock = lock_db.DBLock(context.tenant_id)
                with db_lock:
                    read and check the object from DF DB.
                    if everything is confirmed:
                        do DF DB operations.
                    delete the object from the cache.
            except:
                raise exception.
        else:
            refresh the object status in the cache.
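The two-pass verification described above could be kept in a small status cache; a minimal sketch, with illustrative names:

    class VerificationCache(object):
        """Confirm an object's status only when two successive comparison
        passes observe the same tentative create/update/delete status."""

        def __init__(self):
            self._pending = {}  # object_uuid -> tentative status

        def confirm(self, object_uuid, status):
            if self._pending.get(object_uuid) == status:
                # same status seen twice in a row: confirmed
                del self._pending[object_uuid]
                return True
            # first sighting, or the status changed: remember and wait
            self._pending[object_uuid] = status
            return False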
Since the periodic DB comparison should not run uncoordinated on every node, we elect a master Neutron plugin. We can reuse the distributed lock mechanism discussed above for the election: we define a dedicated primary key for the Neutron plugin election in the distributed lock table, and we store one record for each Neutron plugin in DF DB. The record looks like this (the total_number attribute is only present in the master plugin's record):
Attribute    | Type     | Description
-------------|----------|--------------------------------
plugin_name  | String   | primary key
role         | String   | master or normal
status       | String   | active or down
hash_factor  | int      | used for load balancing
total_number | int      | total number of active plugins
plugin_time  | DateTime | the latest update time
The election process should be like this:
    def get_master_neutron_plugin(context):
        nb_lock = lock_db.DBLock(context.election_key)
        with nb_lock:
            if db_api.get_master_plugin_name() == self.plugin_name:
                # still the master: refresh the heartbeat timestamp
                db_api.set_master_plugin_time(self.current_time)
                return True
            elif self.current_time > db_api.get_master_plugin_time() + timeout:
                # the old master timed out: take over the master role
                db_api.set_master_plugin_time(self.current_time)
                db_api.set_master_plugin_name(self.plugin_name)
                return True
            else:
                return False
In a multi-node environment, we should load-balance the DB comparison work across the Neutron plugins; this makes the work more effective and avoids a single-node bottleneck.
The master Neutron plugin assigns a hash factor to each active plugin and stores the assignment in DF DB. For example, if there are three active plugins (including the master) A, B and C, the master could assign hash factor 0 to A, 1 to B and 2 to C. If a new plugin D joins, the master assigns hash factor 3 to D. If plugin B then breaks down, the master reassigns hash factor 0 to A, 1 to C and 2 to D. Each plugin calculates a hash value from the object_uuid of every object and takes it modulo the total number of active plugins; if the result equals the plugin's own hash factor, the plugin processes the object, otherwise it skips it.
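A sketch of the partitioning check each plugin would run per object; a stable digest is used so every plugin computes the same hash value, and the names are illustrative:

    import hashlib


    def should_process(object_uuid, my_hash_factor, total_active_plugins):
        """Return True if this plugin owns the object under the current
        hash factor assignment."""
        digest = hashlib.md5(object_uuid.encode('utf-8')).hexdigest()
        return int(digest, 16) % total_active_plugins == my_hash_factor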
When the local controller finds that its connections to the DF DB and the pub/sub system have recovered, it should process the recover message:

Start handling the recover message:

    get the tenant list according to the local VM ports.
    pull data from DF DB according to the tenant list.
    compare the pulled data with the local cache.
    if an object create/update/delete is found:
        notify the local apps.
When the local controller receives a notification message from the pub/sub system, it compares the version id in the message with that of the corresponding object in its cache. If the version id in the notification is newer, it performs the update; otherwise, it ignores the message.
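A sketch of that version check, assuming version ids can be compared for "newer" (e.g. monotonically increasing values) and using illustrative names:

    def handle_object_notification(cache, message):
        """Apply a pub/sub notification only if it is newer than the cache."""
        obj = cache.get(message['object_uuid'])
        if obj is None or message['version_id'] > obj['version_id']:
            # newer than what we have cached: apply the update
            cache[message['object_uuid']] = message['object']
        # otherwise the notification is stale and is ignored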
We should also start a green thread in the local controller to compare the local cache with DF DB periodically, using a verification mechanism similar to the Neutron plugin's.
We add a version_id value to every object. It is generated when the object is created and regenerated whenever the object is updated. We add an additional table in Neutron DB to store the version info of each object:
Attribute  | Type   | Description
-----------|--------|------------------
obj_uuid   | String | primary key
version_id | String | object version id
We also add a version_id attribute to each object in DF DB. When we create/update an object, we should proceed like this:
Start creating an object:

    db_lock = lock_db.DBLock(context.tenant_id)
    with db_lock:
        create the object in Neutron DB.
        generate and write the version id into Neutron DB.
        create the object in DF DB with the version_id.

Start updating an object:

    db_lock = lock_db.DBLock(context.tenant_id)
    with db_lock:
        update the object in Neutron DB.
        compare-and-swap the version id in Neutron DB.
        update the object in DF DB with the version_id.
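The compare-and-swap step can be expressed as a guarded UPDATE; a sketch with SQLAlchemy, where the ObjectVersion model is assumed to map the version table above:

    def bump_version(session, obj_uuid, old_version, new_version):
        """Bump the version only if nobody changed it since we read it."""
        rows = session.query(ObjectVersion).filter_by(
            obj_uuid=obj_uuid, version_id=old_version).update(
            {'version_id': new_version})
        if not rows:
            # someone else updated the object concurrently
            raise RuntimeError('version compare-and-swap failed for %s'
                               % obj_uuid)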
With the version_id added to each object, we can judge whether an object has been updated just by comparing version_ids, which makes the DB comparison much more efficient.
If we want to reuse the ML2 core plugin, we should develop a Dragonflow mechanism driver for it. The driver should implement the DF DB operations and the pub/sub operations, and the DB consistency logic should also live in the driver.
For the DB operations, our Dragonflow mechanism driver should implement the object_precommit and object_postcommit methods. The object_precommit method must not block the main process; raising an exception there rolls back the enclosing Neutron DB transaction. For object creation, the object_postcommit method should raise a MechanismDriverError on failure so that the ML2 plugin deletes the resource. For object update/delete, the object_postcommit method can swallow its internal exceptions, because ML2 does not handle them in the current implementation.
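A hedged sketch of these hooks for the network resource; the ML2 module paths follow the driver API of this era, while the nb_api calls are placeholders for the DF DB operations:

    from neutron.plugins.ml2 import driver_api as api
    from neutron.plugins.ml2.common import exceptions as ml2_exc


    class DFMechanismDriver(api.MechanismDriver):

        def initialize(self):
            self.nb_api = None  # DF DB client, set up here in practice

        def create_network_precommit(self, context):
            # runs inside the Neutron DB transaction: raising here rolls
            # the transaction back, so keep this fast and non-blocking
            pass

        def create_network_postcommit(self, context):
            try:
                self.nb_api.create_lswitch(context.current['id'],
                                           context.current['tenant_id'])
            except Exception:
                # ML2 deletes the just-created resource on this error
                raise ml2_exc.MechanismDriverError()

        def update_network_postcommit(self, context):
            try:
                self.nb_api.update_lswitch(context.current['id'],
                                           context.current['tenant_id'])
            except Exception:
                pass  # ML2 ignores update/delete postcommit failures today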
We should add these db consistency functions into our Dragonflow mechanism driver:
- handle the recover message.
- periodic DB comparison.
- master Neutron plugin election.
- load balancing across multiple Neutron plugins.