.. Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. Convention for heading levels in Neutron devref: ======= Heading 0 (reserved for the title in a document) ------- Heading 1 ~~~~~~~ Heading 2 +++++++ Heading 3 ''''''' Heading 4 (Avoid deeper levels because they do not render well.) Open vSwitch L2 Agent ===================== This Agent uses the `Open vSwitch`_ virtual switch to create L2 connectivity for instances, along with bridges created in conjunction with OpenStack Nova for filtering. ovs-neutron-agent can be configured to use different networking technologies to create project isolation. These technologies are implemented as ML2 type drivers which are used in conjunction with the Open vSwitch mechanism driver. VLAN Tags --------- .. image:: images/under-the-hood-scenario-1-ovs-compute.png .. _Open vSwitch: http://openvswitch.org GRE Tunnels ----------- GRE Tunneling is documented in depth in the `Networking in too much detail `_ by RedHat. VXLAN Tunnels ------------- VXLAN is an overlay technology which encapsulates MAC frames at layer 2 into a UDP header. More information can be found in `The VXLAN wiki page. `_ Geneve Tunnels -------------- Geneve uses UDP as its transport protocol and is dynamic in size using extensible option headers. It is important to note that currently it is only supported in newer kernels. (kernel >= 3.18, OVS version >=2.4) More information can be found in the `Geneve RFC document. `_ Bridge Management ----------------- In order to make the agent capable of handling more than one tunneling technology, to decouple the requirements of segmentation technology from project isolation, and to preserve backward compatibility for OVS agents working without tunneling, the agent relies on a tunneling bridge, or br-tun, and the well known integration bridge, or br-int. All VM VIFs are plugged into the integration bridge. VM VIFs on a given virtual network share a common "local" VLAN (i.e. not propagated externally). The VLAN id of this local VLAN is mapped to the physical networking details realizing that virtual network. For virtual networks realized as VXLAN/GRE tunnels, a Logical Switch (LS) identifier is used to differentiate project traffic on inter-HV tunnels. A mesh of tunnels is created to other Hypervisors in the cloud. These tunnels originate and terminate on the tunneling bridge of each hypervisor, leaving br-int unaffected. Port patching is done to connect local VLANs on the integration bridge to inter-hypervisor tunnels on the tunnel bridge. For each virtual network realized as a VLAN or flat network, a veth or a pair of patch ports is used to connect the local VLAN on the integration bridge with the physical network bridge, with flow rules adding, modifying, or stripping VLAN tags as necessary, thus preserving backward compatibility with the way the OVS agent used to work prior to the tunneling capability (for more details, please look at https://review.openstack.org/#/c/4367). Bear in mind, that this design decision may be overhauled in the future to support existing VLAN-tagged traffic (coming from NFV VMs for instance) and/or to deal with potential QinQ support natively available in the Open vSwitch. Tackling the Network Trunking use case -------------------------------------- Rationale ~~~~~~~~~ At the time the first design for the OVS agent came up, trunking in OpenStack was merely a pipe dream. Since then, lots has happened in the OpenStack platform, and many many deployments have gone into production since early 2012. In order to address the `vlan-aware-vms `_ use case on top of Open vSwitch, the following aspects must be taken into account: * Design complexity: starting afresh is always an option, but a complete rearchitecture is only desirable under some circumstances. After all, customers want solutions...yesterday. It is noteworthy that the OVS agent design is already relatively complex, as it accommodates a number of deployment options, especially in relation to security rules and/or acceleration. * Upgrade complexity: being able to retrofit the existing design means that an existing deployment does not need to go through a forklift upgrade in order to expose new functionality; alternatively, the desire of avoiding a migration requires a more complex solution that is able to support multiple modes of operations; * Design reusability: ideally, a proposed design can easily apply to the various technology backends that the Neutron L2 agent supports: Open vSwitch and Linux Bridge. * Performance penalty: no solution is appealing enough if it is unable to satisfy the stringent requirement of high packet throughput, at least in the long term. * Feature compatibility: VLAN `transparency `_ is for better or for worse intertwined with vlan awareness. The former is about making the platform not interfere with the tag associated to the packets sent by the VM, and let the underlay figure out where the packet needs to be sent out; the latter is about making the platform use the vlan tag associated to packet to determine where the packet needs to go. Ideally, a design choice to satisfy the awareness use case will not have a negative impact for solving the transparency use case. Having said that, the two features are still meant to be mutually exclusive in their application, and plugging subports into networks whose vlan-transparency flag is set to True might have unexpected results. In fact, it would be impossible from the platform's point of view discerning which tagged packets are meant to be treated 'transparently' and which ones are meant to be used for demultiplexing (in order to reach the right destination). The outcome might only be predictable if two layers of vlan tags are stacked up together, making guest support even more crucial for the combined use case. It is clear by now that an acceptable solution must be assessed with these issues in mind. The potential solutions worth enumerating are: * VLAN interfaces: in layman's terms, these interfaces allow to demux the traffic before it hits the integration bridge where the traffic will get isolated and sent off to the right destination. This solution is `proven `_ to work for both iptables-based and native ovs security rules (credit to Rawlin Peters). This solution has the following design implications: * Design complexity: this requires relative small changes to the existing OVS design, and it can work with both iptables and native ovs security rules. * Upgrade complexity: in order to employ this solution no major upgrade is necessary and thus no potential dataplane disruption is involved. * Design reusability: VLAN interfaces can easily be employed for both Open vSwitch and Linux Bridge. * Performance penalty: using VLAN interfaces means that the kernel must be involved. For Open vSwitch, being able to use a fast path like DPDK would be an unresolved issue (`Kernel NIC interfaces `_ are not on the roadmap for distros and OVS, and most likely will never be). Even in the absence of an extra bridge, i.e. when using native ovs firewall, and with the advent of userspace connection tracking that would allow the `stateful firewall driver `_ to work with DPDK, the performance gap between a pure userspace DPDK capable solution and a kernel based solution will be substantial, at least under certain traffic conditions. * Feature compatibility: in order to keep the design simple once VLAN interfaces are adopted, and yet enable VLAN transparency, Open vSwitch needs to support QinQ, which is currently lacking as of 2.5 and with no ongoing plan for integration. * Going full openflow: in layman's terms, this means programming the dataplane using OpenFlow in order to provide tenant isolation, and packet processing. This solution has the following design implications: * Design complexity: this requires a big rearchitecture of the current Neutron L2 agent solution. * Upgrade complexity: existing deployments will be unable to work correctly unless one of the actions take place: a) the agent can handle both the 'old' and 'new' way of wiring the data path; b) a dataplane migration is forced during a release upgrade and thus it may cause (potentially unrecoverable) dataplane disruption. * Design reusability: a solution for Linux Bridge will still be required to avoid widening the gap between Open vSwitch (e.g. OVS has DVR but LB does not). * Performance penalty: using Open Flow will allow to leverage the user space and fast processing given by DPDK, but at a considerable engineering cost nonetheless. Security rules will have to be provided by a `learn based firewall `_ to fully exploit the capabilities of DPDK, at least until `user space `_ connection tracking becomes available in OVS. * Feature compatibility: with the adoption of Open Flow, tenant isolation will no longer be provided by means of local vlan provisioning, thus making the requirement of QinQ support no longer strictly necessary for Open vSwitch. * Per trunk port OVS bridge: in layman's terms, this is similar to the first option, in that an extra layer of mux/demux is introduced between the VM and the integration bridge (br-int) but instead of using vlan interfaces, a combination of a new per port OVS bridge and patch ports to wire this new bridge with br-int will be used. This solution has the following design implications: * Design complexity: the complexity of this solution can be considered in between the above mentioned options in that some work is already available since `Mitaka `_ and the data path wiring logic can be partially reused. * Upgrade complexity: if two separate code paths are assumed to be maintained in the OVS agent to handle regular ports and ports participating a trunk with no ability to convert from one to the other (and vice versa), no migration is required. This is done at a cost of some loss of flexibility and maintenance complexity. * Design reusability: a solution to support vlan trunking for the Linux Bridge mech driver will still be required to avoid widening the gap with Open vSwitch (e.g. OVS has DVR but LB does not). * Performance penalty: from a performance standpoint, the adoption of a trunk bridge relieves the agent from employing kernel interfaces, thus unlocking the full potential of fast packet processing. That said, this is only doable in combination with a native ovs firewall. At the time of writing the only DPDK enabled firewall driver is the learn based one available in the `networking-ovs-dpdk repo `_; * Feature compatibility: the existing local provisioning logic will not be affected by the introduction of a trunk bridge, therefore use cases where VMs are connected to a vlan transparent network via a regular port will still require QinQ support from OVS. To summarize: * VLAN interfaces (A) are compelling because will lead to a relatively contained engineering cost at the expense of performance. The Open vSwitch community will need to be involved in order to deliver vlan transparency. Irrespective of whether this strategy is chosen for Open vSwitch or not, this is still the only viable approach for Linux Bridge and thus pursued to address Linux Bridge support for VLAN trunking. To some extent, this option can also be considered a fallback strategy for OVS deployments that are unable to adopt DPDK. * Open Flow (B) is compelling because it will allow Neutron to unlock the full potential of Open vSwitch, at the expense of development and operations effort. The development is confined within the boundaries of the Neutron community in order to address vlan awareness and transparency (as two distinct use cases, ie. to be adopted separately). Stateful firewall (based on ovs conntrack) limits the adoption for DPDK at the time of writing, but a learn-based firewall can be a suitable alternative. Obviously this solution is not compliant with iptables firewall. * Trunk Bridges (C) tries to bring the best of option A and B together as far as OVS development and performance are concerned, but it comes at the expense of maintenance complexity and loss of flexibility. A Linux Bridge solution would still be required and, QinQ support will still be needed to address vlan transparency. All things considered, as far as OVS is concerned, option (C) is the most promising in the medium term. Management of trunks and ports within trunks will have to be managed differently and, to start with, it is sensible to restrict the ability to update ports (i.e. convert) once they are bound to a particular bridge (integration vs trunk). Security rules via iptables rules is obviously not supported, and never will be. Option (A) for OVS could be pursued in conjunction with Linux Bridge support, if the effort is seen particularly low hanging fruit. However, a working solution based on this option positions the OVS agent as a sub-optminal platform for performance sensitive applications in comparison to other accelerated or SDN-controller based solutions. Since further data plane performance improvement is hindered by the extra use of kernel resources, this option is not at all appealing in the long term. Embracing option (B) in the long run may be complicated by the adoption of option (C). The development and maintenance complexity involved in Option (C) and (B) respectively poses the existential question as to whether investing in the agent-based architecture is an effective strategy, especially if the end result would look a lot like other maturing alternatives. Implementation VLAN Interfaces (Option A) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ This implementation doesn't require any modification of the vif-drivers since Nova will plug the vif of the VM the same way as it does for traditional ports. Trunk port creation +++++++++++++++++++ A VM is spawned passing to Nova the port-id of a parent port associated with a trunk. Nova/libvirt will create the tap interface and will plug it into br-int or into the firewall bridge if using iptables firewall. In the external-ids of the port Nova will store the port ID of the parent port. The OVS agent detects that a new vif has been plugged. It gets the details of the new port and wires it. The agent configures it in the same way as a traditional port: packets coming out from the VM will be tagged using the internal VLAN ID associated to the network, packets going to the VM will be stripped of the VLAN ID. After wiring it successfully the OVS agent will send a message notifying Neutron server that the parent port is up. Neutron will send back to Nova an event to signal that the wiring was successful. If the parent port is associated with one or more subports the agent will process them as described in the next paragraph. Subport creation ++++++++++++++++ If a subport is added to a parent port but no VM was booted using that parent port yet, no L2 agent will process it (because at that point the parent port is not bound to any host). When a subport is created for a parent port and a VM that uses that parent port is already running, the OVS agent will create a VLAN interface on the VM tap using the VLAN ID specified in the subport segmentation id. There's a small possibility that a race might occur: the firewall bridge might be created and plugged while the vif is not there yet. The OVS agent needs to check if the vif exists before trying to create a subinterface. Let's see how the models differ when using the iptables firewall or the ovs native firewall. Iptables Firewall ''''''''''''''''' :: +----------------------------+ | VM | | eth0 eth0.100 | +-----+-----------------+----+ | | +---+---+ +-----+-----+ | tap1 |-------| tap1.100 | +---+---+ +-----+-----+ | | | | +---+---+ +---+---+ | qbr1 | | qbr2 | +---+---+ +---+---+ | | | | +-----+-----------------+----+ | port 1 port 2 | | (tag 3) (tag 5) | | br-int | +----------------------------+ Let's assume the subport is on network2 and uses segmentation ID 100. In the case of hybrid plugging the OVS agent will have to create the firewall bridge (qbr2), create tap1.100 and plug it into qbr2. It will connect qbr2 to br-int and set the subport ID in the external-ids of port 2. *Inbound traffic from the VM point of view* The untagged traffic will flow from port 1 to eth0 through qbr1. For the traffic coming out of port 2, the internal VLAN ID of network2 will be stripped. The packet will then go untagged through qbr2 where iptables rules will filter the traffic. The tag 100 will be pushed by tap1.100 and the packet will finally get to eth0.100. *Outbound traffic from the VM point of view* The untagged traffic will flow from eth0 to port1 going through qbr1 where firewall rules will be applied. Traffic tagged with VLAN 100 will leave eth0.100, go through tap1.100 where the VLAN 100 is stripped. It will reach qbr2 where iptables rules will be applied and go to port 2. The internal VLAN of network2 will be pushed by br-int when the packet enters port2 because it's a tagged port. OVS Firewall case ''''''''''''''''' :: +----------------------------+ | VM | | eth0 eth0.100 | +-----+-----------------+----+ | | +---+---+ +-----+-----+ | tap1 |-------| tap1.100 | +---+---+ +-----+-----+ | | | | | | +-----+-----------------+----+ | port 1 port 2 | | (tag 3) (tag 5) | | br-int | +----------------------------+ When a subport is created the OVS agent will create the VLAN interface tap1.100 and plug it into br-int. Let's assume the subport is on network2. *Inbound traffic from the VM point of view* The traffic will flow untagged from port 1 to eth0. The traffic going out from port 2 will be stripped of the VLAN ID assigned to network2. It will be filtered by the rules installed by the firewall and reach tap1.100. tap1.100 will tag the traffic using VLAN 100. It will then reach the VM's eth0.100. *Outbound traffic from the VM point of view* The untagged traffic will flow and reach port 1 where it will be tagged using the VLAN ID associated to the network. Traffic tagged with VLAN 100 will leave eth0.100 reach tap1.100 where VLAN 100 will be stripped. It will then reach port2. It will be filtered by the rules installed by the firewall on port 2. Then the packets will be tagged using the internal VLAN associated to network2 by br-int since port 2 is a tagged port. Parent port deletion ++++++++++++++++++++ Deleting a port that is an active parent in a trunk is forbidden. If the parent port has no trunk associated (it's a "normal" port), it can be deleted. The OVS agent doesn't need to perform any action, the deletion will result in a removal of the port data from the DB. Trunk deletion ++++++++++++++ When Nova deletes a VM, it deletes the VM's corresponding Neutron ports only if they were created by Nova when booting the VM. In the vlan-aware-vm case the parent port is passed to Nova, so the port data will remain in the DB after the VM deletion. Nova will delete the VIF of the VM (in the example tap1) as part of the VM termination. The OVS agent will detect that deletion and notify the Neutron server that the parent port is down. The OVS agent will clean up the corresponding subports as explained in the next paragraph. The deletion of a trunk that is used by a VM is not allowed. The trunk can be deleted (leaving the parent port intact) when the parent port is not used by any VM. After the trunk is deleted, the parent port can also be deleted. Subport deletion ++++++++++++++++ Removing a subport that is associated with a parent port that was not used to boot any VM is a no op from the OVS agent perspective. When a subport associated with a parent port that was used to boot a VM is deleted, the OVS agent will take care of removing the firewall bridge if using iptables firewall and the port on br-int. Implementation Trunk Bridge (Option C) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ This implementation is based on this `etherpad `_. Credits to Bence Romsics. The option use_veth_interconnection=true won't be supported, it will probably be deprecated soon, see [1]. The IDs used for bridge and port names are truncated. :: +--------------------------------+ | VM | | eth0 eth0.100 | +-----+--------------------+-----+ | | +-----+--------------------------+ | tap1 | | tbr-trunk-id | | | | tpt-parent-id spt-subport-id | | (tag 100) | +-----+-----------------+--------+ | | | | | | +-----+-----------------+---------+ | tpi-parent-id spi-subport-id | | (tag 3) (tag 5) | | | | br-int | +---------------------------------+ tpt-parent-id: trunk bridge side of the patch port that implements a trunk. tpi-parent-id: int bridge side of the patch port that implements a trunk. spt-subport-id: trunk bridge side of the patch port that implements a subport. spi-subport-id: int bridge side of the patch port that implements a subport. [1] https://bugs.launchpad.net/neutron/+bug/1587296 Trunk creation ++++++++++++++ A VM is spawned passing to Nova the port-id of a parent port associated with a trunk. Neutron will pass to Nova the bridge where to plug the vif as part of the vif details. The os-vif driver creates the trunk bridge tbr-trunk-id if it does not exist in plug(). It will create the tap interface tap1 and plug it into tbr-trunk-id setting the parent port ID in the external-ids. The OVS agent will be monitoring the creation of ports on the trunk bridges. When it detects that a new port has been created on the trunk bridge, it will do the following: :: ovs-vsctl add-port tbr-trunk-id tpt-parent-id -- set Interface tpt-parent-id type=patch options:peer=tpi-parent-id ovs-vsctl add-port br-int tpi-parent-id tag=3 -- set Interface tpi-parent-id type=patch options:peer=tpt-parent-id A patch port is created to connect the trunk bridge to the integration bridge. tpt-parent-id, the trunk bridge side of the patch is not associated to any tag. It will carry untagged traffic. tpi-parent-id, the br-int side the patch port is tagged with VLAN 3. We assume that the trunk is on network1 that on this host is associated with VLAN 3. The OVS agent will set the trunk ID in the external-ids of tpt-parent-id and tpi-parent-id. If the parent port is associated with one or more subports the agent will process them as described in the next paragraph. Subport creation ++++++++++++++++ If a subport is added to a parent port but no VM was booted using that parent port yet, the agent won't process the subport (because at this point there's no node associated with the parent port). When a subport is added to a parent port that is used by a VM the OVS agent will create a new patch port: :: ovs-vsctl add-port tbr-trunk-id spt-subport-id tag=100 -- set Interface spt-subport-id type=patch options:peer=spi-subport-id ovs-vsctl add-port br-int spi-subport-id tag=5 -- set Interface spi-subport-id type=patch options:peer=spt-subport-id This patch port connects the trunk bridge to the integration bridge. spt-subport-id, the trunk bridge side of the patch is tagged using VLAN 100. We assume that the segmentation ID of the subport is 100. spi-subport-id, the br-int side of the patch port is tagged with VLAN 5. We assume that the subport is on network2 that on this host uses VLAN 5. The OVS agent will set the subport ID in the external-ids of spt-subport-id and spi-subport-id. *Inbound traffic from the VM point of view* The traffic coming out of tpi-parent-id will be stripped by br-int of VLAN 3. It will reach tpt-parent-id untagged and from there tap1. The traffic coming out of spi-subport-id will be stripped by br-int of VLAN 5. It will reach spt-subport-id where it will be tagged with VLAN 100 and it will then get to tap1 tagged. *Outbound traffic from the VM point of view* The untagged traffic coming from tap1 will reach tpt-parent-id and from there tpi-parent-id where it will be tagged using VLAN 3. The traffic tagged with VLAN 100 from tap1 will reach spt-subport-id. VLAN 100 will be stripped since spt-subport-id is a tagged port and the packet will reach spi-subport-id, where it's tagged using VLAN 5. Parent port deletion ++++++++++++++++++++ Deleting a port that is an active parent in a trunk is forbidden. If the parent port has no trunk associated, it can be deleted. The OVS agent doesn't need to perform any action. Trunk deletion ++++++++++++++ When Nova deletes a VM, it deletes the VM's corresponding Neutron ports only if they were created by Nova when booting the VM. In the vlan-aware-vm case the parent port is passed to Nova, so the port data will remain in the DB after the VM deletion. Nova will delete the port on the trunk bridge where the VM is plugged. The L2 agent will detect that and delete the trunk bridge. It will notify the Neutron server that the parent port is down. The deletion of a trunk that is used by a VM is not allowed. The trunk can be deleted (leaving the parent port intact) when the parent port is not used by any VM. After the trunk is deleted, the parent port can also be deleted. Subport deletion ++++++++++++++++ The OVS agent will delete the patch port pair corresponding to the subport deleted. Agent resync ~~~~~~~~~~~~ During resync the agent should check that all the trunk and subports are still valid. It will delete the stale trunk and subports using the procedure specified in the previous paragraphs according to the implementation. Further Reading --------------- * `Darragh O'Reilly - The Open vSwitch plugin with VLANs `_