SkyDive integration
Relevant launchpad RFE:
https://bugs.launchpad.net/dragonflow/+bug/1749429
Currently we do not have an easy way to visualize the way Dragonflow sees the
topology and the relations between topology elements. This view is important
both for operators using Dragonflow, and for developers trying to debug the
system.
To solve this, we would like to leverage one of the Skydive project's core
capabilities: topology visualization. This will allow us to provide the
overall topology view (the Skydive model) and to use Skydive for all the
different ways of visualizing and dissecting that information (the Skydive
view).
There is a talk on Skydive from the Austin summit in 2016.
Problem Description
Add the ability to view the topology as known by Dragonflow in a graphical way.
This feature is needed to make it easier to debug issues in flows and to
understand the behaviour of Dragonflow.
Implementation stages:
- Have a view of the topology of a specific chassis (a node running the
Dragonflow controller)
- Have real-time updates of the information (topology elements added/removed)
- Allow aggregation of the information and have a global view of the
topology, which means the view of the entire network as Dragonflow sees it
- Allow filtering and dissecting of the information to show just specific
parts
- Allow simulation or tracing of packets in the system on the graphical
interface
All of these features are supported by the Skydive project, but for other
agent types.
What we need to do is implement our own agent (or another way to send the
information to the analyzer) that works out-of-band with respect to the
operation of the controller (or at least does not block it for long periods).
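As an illustration of the "at least not block it for long periods" option, the
sketch below hands updates to a background thread through a bounded queue, so
the caller never waits on the analyzer. The class name NonBlockingPublisher and
the send callable are hypothetical and are not part of Dragonflow or Skydive.
This is only a minimal sketch of the idea, not the proposed implementation.

```python
import queue
import threading


class NonBlockingPublisher:
    """Hypothetical helper: queue updates and send them from a worker thread."""

    def __init__(self, send, maxsize=100):
        self._send = send                      # callable that talks to the analyzer
        self._queue = queue.Queue(maxsize=maxsize)
        self.dropped = 0                       # updates discarded because the queue was full
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def publish(self, update):
        """Enqueue an update without blocking; return False if it was dropped."""
        try:
            self._queue.put_nowait(update)
            return True
        except queue.Full:
            self.dropped += 1
            return False

    def _run(self):
        while True:
            update = self._queue.get()
            try:
                self._send(update)
            except Exception:
                # A real service would log the failure and possibly retry.
                pass
            finally:
                self._queue.task_done()

    def flush(self):
        """Block until every queued update has been handed to send()."""
        self._queue.join()
```

With this shape, a slow analyzer only delays the worker thread. When the queue
fills up, updates are dropped and counted rather than stalling the caller.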
Another goal is to give an administrator (rather than a developer) easy
access to the system and an easy way to debug it.
Proposed Change
Add one Skydive analyzer with one (or more) Skydive agents.
The agents are responsible for collecting the topology information,
translating it to the Skydive model structure, and sending it to the analyzer.
The analyzer, in turn, is responsible for aggregating the information and
displaying it, with or without filtering, as requested by the end user (be it
a developer, integrator, or operator).
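The translation step performed by an agent could look roughly like the sketch
below. It turns logical switches and logical ports into a nodes-and-edges
structure with metadata. The input field names (id, name, lswitch), the function
name to_graph, and the output keys are assumptions made for illustration. They
only loosely follow the node/edge-with-metadata shape of a graph model and
should not be read as the actual Dragonflow or Skydive schema.

```python
def to_graph(lswitches, lports):
    """Translate (hypothetical) Dragonflow objects into graph nodes and edges."""
    nodes = []
    edges = []

    for ls in lswitches:
        nodes.append({
            "ID": ls["id"],
            "Metadata": {"Name": ls["name"], "Type": "lswitch"},
        })
    known_switches = {ls["id"] for ls in lswitches}

    for lp in lports:
        nodes.append({
            "ID": lp["id"],
            "Metadata": {"Name": lp["name"], "Type": "lport"},
        })
        # Only link the port when its switch is part of this snapshot;
        # a dangling reference would otherwise produce a broken edge.
        if lp["lswitch"] in known_switches:
            edges.append({
                "ID": lp["lswitch"] + "->" + lp["id"],
                "Parent": lp["lswitch"],
                "Child": lp["id"],
                "Metadata": {"RelationType": "ownership"},
            })

    return {"Nodes": nodes, "Edges": edges}
```

The agent would then serialize the returned structure and send it to the
analyzer. The transport is left out here on purpose.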
The agents will run on the Dragonflow Controller nodes as a separate
service (not within the controller) for several reasons:
- We do not want to affect the performance of the controller. Each update
sent to the analyzer may take up to a few seconds, during which the
controller would not be able to service other requests
- The code running in the controller must be monkey_patched (as it is using
the oslo infrastructure and the etcd driver, which both require
monkey_patching). This imposes various limitations on our code, e.g.
the asyncio loops and selectors are limited or behave badly
Integration should be done in several stages:
- Create a basic service that runs every given period and sends an update
of the elements in the system to the analyzer.
- Support topology updates, including removal of objects from the DB, and
reflect them in the topology view in Skydive.
- Handle cases of disconnect/re-authentication between the collector and the
analyzer
- Handle cases of disconnect/reconnect of the collector and the nb_db
- Add a mechanism through which the skydive_service is notified of
objects that were added to or removed from the topology, to provide an
experience that is closer to real time than periodic updates allow.
- Specify an API for Dragonflow applications to add custom information to
the topology view (e.g. port-behind-port) and relevant metadata to be
used in the view filtering.
- Improve the visualization (custom icons, etc.)
- Add some SkyDive views / filters.
- In the nb_db the router only points to the internal network ports, so the
view we get is not complete. We would like it to point to its gateway
port as well. To achieve that we would have to add some kind of proxy
object (as we have the table name and owner ID) to be able to retrieve this
gateway port from the nb_db.
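The first two integration stages above (a periodic update, plus removal of
objects that disappeared from the DB) can be sketched as a loop that compares
the current snapshot with the previous one and sends only the differences. The
names diff_snapshots and run_service, and the fetch and send callables, are
hypothetical. How the snapshot is read from the nb_db and how updates reach the
analyzer are deliberately left abstract.

```python
import time


def diff_snapshots(previous, current):
    """Compare two {object_id: object} snapshots of the topology."""
    added = {k: v for k, v in current.items() if k not in previous}
    updated = {
        k: v for k, v in current.items()
        if k in previous and previous[k] != v
    }
    removed = sorted(k for k in previous if k not in current)
    return added, updated, removed


def run_service(fetch, send, interval, max_cycles=None, sleep=time.sleep):
    """Poll fetch() every `interval` seconds and send() only what changed.

    max_cycles and sleep are injectable so the loop can be bounded and
    exercised without real delays; None means run forever.
    """
    previous = {}
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        current = fetch()
        added, updated, removed = diff_snapshots(previous, current)
        if added or updated or removed:
            send(added, updated, removed)
        previous = current
        cycles += 1
        sleep(interval)
```

Sending differences rather than the full snapshot is a design choice of this
sketch. It lets the same send path be reused later, when notifications replace
polling.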
Open issues / feature discussion:
- How do we get the topology change notifications? Because the service is an
external application, the pubsub feature is disabled in it, so we need a
different way of getting these notifications.
- One solution may be to use the pubsub mechanism after all, but that will
require rewriting (at least) the etcd pub/sub subscriber driver to use, e.g.,
the etcd3gw or etcd3 library, so that it works for both monkey-patched and
unpatched code and we can support both use cases.
I believe that a separate spec would have to cover these:
- What views / filters do we want to supply - need to investigate how to
define them.
- If possible, we would like to support the option to emulate the passing
of a packet through the system. Is it supported by Skydive? If so, how?
- If possible, we would like to support visualization of packet tracing
in our system. Is it supported by Skydive? If so, how?
What is required on our side to visualize it in Skydive?
References