[NB DB] NB OVN BGP Agent: Design of the BGP Driver with kernel routing

Purpose

The addition of a BGP driver enables the OVN BGP agent to expose virtual machine (VM) and load balancer (LB) IP addresses through the BGP dynamic routing protocol when these VMs and LBs are either associated with a floating IP (FIP) or booted/created on a provider network. The same functionality is available on project (tenant) networks when a specific configuration option is set.

This document presents the design decisions behind the NB BGP Driver for the Networking OVN BGP agent.

Overview

With the growing popularity of virtualized and containerized workloads, it is common to use pure Layer 3 spine and leaf network deployments in data centers. The benefits of this practice include reduced scaling complexity, smaller failure domains, and less broadcast traffic.

The Northbound driver for OVN BGP agent is a Python-based daemon that runs on each OpenStack Controller and Compute node. The agent monitors the Open Virtual Network (OVN) northbound database for certain VM and floating IP (FIP) events. When these events occur, the agent notifies the FRR BGP daemon (bgpd) to advertise the IP address or FIP associated with the VM. The agent also triggers actions that route the external traffic to the OVN overlay. Unlike its predecessor, the Southbound driver for OVN BGP agent, the Northbound driver uses the northbound database API, which is more stable than the southbound database API because it is isolated from internal changes to core OVN.

Note

The northbound OVN BGP agent driver is only intended for N/S traffic. E/W traffic works exactly the same as before, i.e., VMs are connected through Geneve tunnels.

The agent provides a multi-driver implementation that allows you to configure it for specific infrastructure running on top of OVN, for instance OpenStack or Kubernetes/OpenShift. This design simplicity enables the agent to implement different drivers, depending on what OVN NB DB events are being watched (watcher examples at ovn_bgp_agent/drivers/openstack/watchers/), and what actions are triggered in reaction to them (driver examples at ovn_bgp_agent/drivers/openstack/XXXX_driver.py, implementing ovn_bgp_agent/drivers/driver_api.py).

A driver implements the support for BGP capabilities. It ensures that both VMs and LBs on provider networks, or with associated Floating IPs, are exposed through BGP. In addition, VMs on tenant networks can also be exposed if the expose_tenant_networks configuration option is enabled (see the configuration sketch below). To control which tenant networks are exposed, another option can be used: address_scopes. If not set, all the tenant networks are exposed, while if it is configured with a (set of) address_scopes, only the tenant networks whose address_scope matches are exposed.
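
For illustration, a minimal bgp-agent.conf fragment enabling these options could look like the following (a sketch; the address scope UUID is just a placeholder, taken from the sample configuration later in this document):

[DEFAULT]
expose_tenant_networks=True
# Optionally, limit exposure to tenant networks in specific address scopes
address_scopes=2237917c7b12489a84de4ef384a2bcae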

A common driver API is defined exposing these methods:

  • expose_ip and withdraw_ip: exposes or withdraws IPs for local OVN ports.

  • expose_remote_ip and withdraw_remote_ip: exposes or withdraws IPs through another node when the VM or pods are running on a different node. For example, this is used for VMs on tenant networks, where the traffic needs to be injected through the OVN router gateway port.

  • expose_subnet and withdraw_subnet: exposes or withdraws subnets through the local node.

Proposed Solution

To support BGP functionality the NB OVN BGP Agent includes a new driver that performs the steps required for exposing the IPs through BGP on the correct nodes and steering the traffic to/from the node from/to the OVN overlay. To configure the OVN BGP agent to use the northbound OVN BGP driver, in the bgp-agent.conf file, set the value of driver to nb_ovn_bgp_driver.

This driver requires a watcher to react to the BGP-related events. In this case, BGP actions are triggered by events related to the Logical_Switch_Port, Logical_Router_Port, and Load_Balancer OVN NB DB tables. The information in these tables is modified when VMs and LBs are created and deleted, and when FIPs are associated to and disassociated from them.
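
For reference, the contents of these watched tables can be inspected on a node with access to the OVN NB DB by using the OVN tools, for example:

$ ovn-nbctl list Logical_Switch_Port
$ ovn-nbctl list Logical_Router_Port
$ ovn-nbctl list Load_Balancer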

Then, the agent performs these actions to ensure the VMs are reachable through BGP:

  • Traffic between nodes or BGP Advertisement: These are the actions needed to expose the BGP routes and make sure all the nodes know how to reach the VM/LB IP on the nodes. This is exactly the same as in the initial OVN BGP Driver (see [SB DB] OVN BGP Agent: Design of the BGP Driver with kernel routing).

  • Traffic within a node or redirecting traffic to/from OVN overlay (wiring): These are the actions needed to redirect the traffic to/from a VM to the OVN Neutron networks, either when traffic reaches the node where the VM is or on its way out of the node.

The code for the NB BGP driver is located at ovn_bgp_agent/drivers/openstack/nb_ovn_bgp_driver.py, and its associated watcher can be found at ovn_bgp_agent/drivers/openstack/watchers/nb_bgp_watcher.py.

Note this new driver also allows different ways of wiring the node to the OVN overlay. These are configurable through the option exposing_method. The methods covered by this design are: underlay (the default, relying on kernel routing), vrf (kernel routing with EVPN/VRFs, see the EVPN Advertisement section below), and ovn (OVN routing instead of kernel routing, described in [NB DB] NB OVN BGP Agent: Design of the BGP Driver with OVN routing).
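
As an illustration, and assuming the option lives in the [DEFAULT] section like the other agent options shown in the sample configuration later in this document, selecting the wiring method could look like:

[DEFAULT]
exposing_method=underlay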

OVN NB DB Events

The watcher associated with the BGP driver detects the relevant events on the OVN NB DB and calls the driver functions that configure BGP and Linux kernel networking accordingly.

Note

Linux kernel networking is used when the default exposing_method (underlay) or vrf is used. If ovn is used instead, OVN routing is used instead of kernel routing. For more details on this, see [NB DB] NB OVN BGP Agent: Design of the BGP Driver with OVN routing.

The following events are watched and handled by the BGP watcher:

  • VMs or LBs created/deleted on provider networks

  • FIPs association/disassociation to VMs or LBs

  • VMs or LBs created/deleted on tenant networks (if the expose_tenant_networks configuration option is enabled, or if expose_ipv6_gua_tenant_networks is enabled to only expose IPv6 GUA ranges)

    Note

    If the expose_tenant_networks flag is enabled, the status of expose_ipv6_gua_tenant_networks does not matter, as all the tenant IPs are advertised.

The NB BGP watcher reacts to the following events:

  • Logical_Switch_Port

  • Logical_Router_Port

  • Load_Balancer

Besides the previously existing OVNLBEvent class, the NB BGP watcher has new event classes named LSPChassisEvent and LRPChassisEvent, which all the events watched by the NB BGP driver use as their base classes (inherit from).

The specific events the watcher reacts to are:

  • LogicalSwitchPortProviderCreateEvent: Detects when a VM or an amphora LB port, i.e., a logical switch port of type "" (empty double quotes) or virtual, comes up or gets attached to the OVN chassis where the agent is running. If the port is on a provider network, the driver calls the expose_ip driver method to perform the needed actions to expose the port (wire and advertise). If the port is on a tenant network, the driver dismisses the event.

  • LogicalSwitchPortProviderDeleteEvent: Detects when a VM or an amphora LB port, i.e., a logical switch port of type "" (empty double quotes) or virtual, goes down or gets detached from the OVN chassis where the agent is running. If the port is on a provider network, the driver calls the withdraw_ip driver method to perform the needed actions to withdraw the port (withdraw and unwire). If the port is on a tenant network, the driver dismisses the event.

  • LogicalSwitchPortFIPCreateEvent: Similar to LogicalSwitchPortProviderCreateEvent but focusing on changes to the FIP information in the Logical Switch Port external_ids. It calls the expose_fip driver method to perform the needed actions to expose the floating IP (wire and advertise).

  • LogicalSwitchPortFIPDeleteEvent: Same as the previous one but for withdrawing FIPs. In this case it is similar to LogicalSwitchPortProviderDeleteEvent, but instead calls the withdraw_fip driver method to perform the needed actions to withdraw the floating IP (withdraw and unwire).

  • LocalnetCreateDeleteEvent: Detects the creation/deletion of OVN localnet ports, which indicates the creation/deletion of provider networks. This triggers a resync (sync method) action to perform the base configuration needed for the provider networks, such as OVS flows or ARP/NDP configuration.

  • ChassisRedirectCreateEvent: Similar to LogicalSwitchPortProviderCreateEvent but with the focus on logical router ports, such as the Distributed Router Ports (cr-lrps), instead of logical switch ports. The driver calls expose_ip, which performs additional steps to also expose IPs related to the cr-lrps, such as the ovn-lb or IPs in tenant networks. The watcher match checks the chassis information in the status field, which requires OVN 23.09 or later.

  • ChassisRedirectDeleteEvent: Similar to LogicalSwitchPortProviderDeleteEvent but with the focus on logical router ports, such as the Distributed Router Ports (cr-lrps), instead of logical switch ports. The driver calls withdraw_ip, which performs additional steps to also withdraw IPs related to the cr-lrps, such as the ovn-lb or IPs in tenant networks. The watcher match checks the chassis information in the status field, which requires OVN 23.09 or later.

  • LogicalSwitchPortSubnetAttachEvent: Detects Logical Switch Ports of type router (connecting a Logical Switch to a Logical Router) and checks whether the associated router is scheduled on the local chassis, i.e., whether the cr-lrp of the router is located on the local chassis. If that is the case, the expose_subnet driver method is called, which is in charge of the wiring needed for the IPs on that subnet (set of IP routes and rules).

  • LogicalSwitchPortSubnetDetachEvent: Similar to LogicalSwitchPortSubnetAttachEvent but for unwiring the subnet, so it calls the withdraw_subnet driver method.

  • LogicalSwitchPortTenantCreateEvent: Detects when a logical switch port of type "" (empty double quotes) or virtual comes up or gets attached, similar to LogicalSwitchPortProviderCreateEvent. It checks whether the network associated with the VM is exposed in the local chassis (meaning its cr-lrp is also local). If that is the case, it calls expose_remote_ip, which manages the advertising of the IP; there is no need for wiring, as that is done when the subnet is exposed by the LogicalSwitchPortSubnetAttachEvent event.

  • LogicalSwitchPortTenantDeleteEvent: Similar to LogicalSwitchPortTenantCreateEvent but for withdrawing IPs, calling withdraw_remote_ip.

  • OVNLBCreateEvent: Detects Load_Balancer events and processes them only if the Load_Balancer entry has associated VIPs and the router is local to the chassis. If the VIP or router is added to a provider network, the driver calls expose_ovn_lb_vip to expose and wire the VIP or router. If the VIP or router is added to a tenant network, the driver calls expose_ovn_lb_vip to only expose the VIP or router. If a floating IP is added, then the driver calls expose_ovn_lb_fip to expose and wire the FIP.

  • OVNLBDeleteEvent: If the VIP or router is removed from a provider network, the driver calls withdraw_ovn_lb_vip to withdraw and unwire the VIP or router. If the VIP or router is removed from a tenant network, the driver calls withdraw_ovn_lb_vip to only withdraw the VIP or router. If a floating IP is removed, then the driver calls withdraw_ovn_lb_fip to withdraw and unwire the FIP.

Driver Logic

The NB BGP driver is in charge of the networking configuration ensuring that VMs and LBs on provider networks or with FIPs can be reached through BGP (N/S traffic). In addition, if the expose_tenant_networks flag is enabled, VMs in tenant networks are reachable too, although not directly through the node where they are created but through one of the network gateway chassis nodes. The same happens with expose_ipv6_gua_tenant_networks, but only for IPv6 GUA ranges. In addition, if the config option address_scopes is set, only the tenant networks with a matching address_scope are exposed.

Note

To be able to expose tenant networks, OVN 23.09 or newer is required.

To accomplish the network configuration and advertisement, the driver ensures:

  • VM and LB IPs can be advertised in a node where the traffic can be injected into the OVN overlay: either the node that hosts the VM or the node where the router gateway port is scheduled (see the Limitations section).

  • After the traffic reaches the specific node, kernel networking redirects the traffic to the OVN overlay, if the default underlay exposing method is used.

BGP Advertisement

The OVN BGP Agent (both SB and NB drivers) is in charge of triggering FRR (IP routing protocol suite for Linux which includes protocol daemons for BGP, OSPF, RIP, among others) to advertise/withdraw directly connected routes via BGP. To do that, when the agent starts, it ensures that:

  • FRR local instance is reconfigured to leak routes for a new VRF. To do that it uses the vtysh shell. It connects to the existing FRR socket (--vty_socket option) and executes the next commands, passing them through a file (-c FILE_NAME option):

    router bgp {{ bgp_as }}
      address-family ipv4 unicast
        import vrf {{ vrf_name }}
      exit-address-family
    
      address-family ipv6 unicast
        import vrf {{ vrf_name }}
      exit-address-family
    
    router bgp {{ bgp_as }} vrf {{ vrf_name }}
      bgp router-id {{ bgp_router_id }}
      address-family ipv4 unicast
        redistribute connected
      exit-address-family
    
      address-family ipv6 unicast
        redistribute connected
      exit-address-family
    
  • There is a VRF created (the one leaked in the previous step), by default with name bgp-vrf.

  • There is a dummy interface (by default named bgp-nic) associated with the previously created VRF device (see the sketch after this list).

  • ARP/NDP is enabled at the OVS provider bridges by adding an IP to them.
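
As an illustration, the kernel-side configuration that the agent ensures is roughly equivalent to the following commands (a sketch; the names and the routing table id correspond to the bgp_vrf, bgp_nic and bgp_vrf_table_id options shown in the sample configuration later in this document):

$ ip link add bgp-vrf type vrf table 10
$ ip link set bgp-vrf up
$ ip link add bgp-nic type dummy
$ ip link set bgp-nic master bgp-vrf
$ ip link set bgp-nic up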

Then, to expose the VM/LB IPs as they are created (or upon initialization or re-sync), since the FRR configuration has the redistribute connected option enabled, the only action needed to expose an IP (or withdraw it) is to add it to (or remove it from) the bgp-nic dummy interface. The agent then relies on Zebra to do the BGP advertisement, as Zebra detects the addition/deletion of the IP on the local interface and advertises/withdraws the route:

$ ip addr add IPv4/32 dev bgp-nic
$ ip addr add IPv6/128 dev bgp-nic
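
Withdrawing follows the same pattern: removing the address from the dummy interface makes Zebra withdraw the route, for example:

$ ip addr del IPv4/32 dev bgp-nic
$ ip addr del IPv6/128 dev bgp-nic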

Note

As we also want to be able to expose VMs connected to tenant networks (when the expose_tenant_networks or expose_ipv6_gua_tenant_networks configuration options are enabled), there is a need to expose the Neutron router gateway port (cr-lrp on OVN) so that the traffic to VMs in tenant networks is injected into the OVN overlay through the node that is hosting that port.

EVPN Advertisement (expose method vrf)

When using the vrf exposing method, the OVN BGP Agent is in charge of triggering FRR (IP routing protocol suite for Linux which includes protocol daemons for BGP, OSPF, RIP, among others) to advertise/withdraw directly connected and kernel routes via BGP.

To do that, when the agent starts, it will search for all provider networks and configure them.

To be exposed, a provider network must match these criteria:

  • The provider network can be matched to the bridge mappings as defined in the running Open vSwitch instance (e.g. ovn-bridge-mappings="physnet1:br-ex")

  • The provider network has been configured by an admin with at least a VNI, and the VPN type has been configured too, with value l3.

    For example (when using the OVN tools):

    $ ovn-nbctl set logical-switch neutron-cd5d6fa7-3ed7-452b-8ce9-1490e2d377c8 external_ids:"neutron_bgpvpn\:type"=l3
    $ ovn-nbctl set logical-switch neutron-cd5d6fa7-3ed7-452b-8ce9-1490e2d377c8 external_ids:"neutron_bgpvpn\:vni"=1001
    $ ovn-nbctl list logical-switch | less
    ...
    external_ids        : {.. "neutron_bgpvpn:type"=l3, "neutron_bgpvpn:vni"="1001" ..}
    name                : neutron-cd5d6fa7-3ed7-452b-8ce9-1490e2d377c8
    ...
    

    It is also possible to configure this using the Neutron BGP VPN API.

Initialization sequence per VRF

Once the networks have been initialized, the driver waits for the first IP to be exposed before actually exposing the VRF on the host.

Once a VRF is exposed on the host, the following will be done (per VRF):

  1. Create EVPN related devices

    • Create VRF device, using the VNI number as name suffix: vrf-1001

      $ ip link add vrf-1001 type vrf table 1001
      
    • Create the VXLAN device, using the VNI number as the vxlan id, as well as for the name suffix: vxlan-1001

      $ ip link add vxlan-1001 type vxlan id 1001 dstport 4789 local LOOPBACK_IP nolearning
      
    • Create the Bridge device, where the vxlan device is connected, and associate it to the created vrf, also using the VNI number as name suffix: br-1001

      $ ip link add name br-1001 type bridge stp_state 0
      $ ip link set br-1001 master vrf-1001
      $ ip link set vxlan-1001 master br-1001
      
  2. Reconfigure local FRR instance (frr.conf) to ensure the new VRF is exposed. To do that it uses the vtysh shell. It connects to the existing FRR socket (--vty_socket option) and executes the next commands, passing them through a file (-c FILE_NAME option):

    vrf {{ vrf_name }}
        vni {{ vni }}
    exit-vrf
    router bgp {{ bgp_as }} vrf {{ vrf_name }}
      bgp router-id {{ bgp_router_id }}
      address-family ipv4 unicast
        redistribute connected
        redistribute kernel
      exit-address-family
    
      address-family ipv6 unicast
        redistribute connected
        redistribute kernel
      exit-address-family
      address-family l2vpn evpn
        advertise ipv4 unicast
        advertise ipv6 unicast
        rd {{ local_ip }}:{{ vni }}
      exit-address-family
    
  3. Connect EVPN to OVN overlay so that traffic can be redirected from the node to the OVN virtual networking. It needs to connect the VRF to the OVS provider bridge:

    • Create a veth device that will be used for routing between the VRF and OVN, using the UUID of the localnet port in the logical switch port table, and connect it to OVS (in this example the UUID of the localnet port is 12345678-1234-1234-1234-123456789012, and the first 11 characters will be used in the interface name):

      $ ip link add name vrf12345678-12 type veth peer name ovs12345678-12
      $ ovs-vsctl add-port br-ex ovs12345678-12
      $ ip link set up dev ovs12345678-12
      
    • For EVPN l3 mode (only supported mode currently), it will attach the vrf side to the vrf:

      $ ip link set vrf12345678-12 master vrf-1001
      $ ip link set up dev vrf12345678-12
      

      And it will add routing IPs on the veth interface, so the kernel is able to do L3 routing within the VRF. By default it will add a 169.254.x.x address based on the VNI/VLAN.

      If possible, it will use the DHCP options to determine whether an actually configured router IP address can be used in addition to the 169.254.x.x address:

      $ ip address add 10.0.0.1/32 dev vrf12345678-12  # router option from dhcp opts
      $ ip address add 169.254.0.123/32 dev vrf12345678-12  # generated 169.254.x.x address for vlan 123
      $ ip -6 address add fd53:d91e:400:7f17::7b/128 dev vrf12345678-12  # generated ipv6 address for vlan 123
      
  4. Add needed OVS flows into the OVS provider bridge (e.g., br-ex) to redirect the traffic back from OVN to the proper VRF, based on the subnet CIDR and the router gateway port MAC address.

    $ ovs-ofctl add-flow br-ex cookie=0x3e7,priority=900,ip,in_port=<OVN_PATCH_PORT_ID>,actions=mod_dl_dst:VETH|VLAN_MAC,NORMAL
    
  5. If CONF.anycast_evpn_gateway_mode is enabled, it will make sure that the MAC address on the vrf12345678-12 interface is the same on all nodes, using the VLAN id and VNI id as an offset while generating the MAC address.

    $ ip link set address 02:00:03:e7:00:7b dev vrf12345678-12  # generated mac for vni 1001 and vlan 123
    
    # Replace link local address and update to generated vlan mac (used for ipv6 router advertisements)
    $ ip -6 address del <some fe80::/10 address> dev vrf12345678-12
    $ ip -6 address add fe80::200:3e7:65/64 dev vrf12345678-12
    
  6. If IPv6 subnets are defined (checked in the DHCP options once again), then configure FRR to handle neighbor discovery (and send router advertisements for us):

    interface {{ vrf_intf }}
     {% if is_dhcpv6 %}
     ipv6 nd managed-config-flag
     {% endif %}
     {% for server in dns_servers %}
     ipv6 nd rdnss {{ server }}
     {% endfor %}
     ipv6 nd prefix {{ prefix }}
     no ipv6 nd suppress-ra
    exit
    
  7. Then, finally, add the routes to expose to the VRF. Since we use full kernel routing in this VRF, we also expose the MAC address that belongs to each route, so we do not rely on ARP proxies in OVN.

    $ ip route add 10.0.0.5/32 dev vrf12345678-12
    $ ip route show table 1001 | grep veth
    local 10.0.0.1 dev vrf12345678-12 proto kernel scope host src 10.0.0.1
    10.0.0.5 dev vrf12345678-12 scope link
    local 169.254.0.123 dev vrf12345678-12 proto kernel scope host src 169.254.0.123
    
    $ ip neigh add 10.0.0.5 dev vrf12345678-12 lladdr fa:16:3e:7d:50:ad nud permanent
    $ ip neigh show vrf vrf-1001 | grep veth
    10.0.0.5 dev vrf12345678-12 lladdr fa:16:3e:7d:50:ad PERMANENT
    fe80::f816:3eff:fe7d:50ad dev vrf12345678-12 lladdr fa:16:3e:7d:50:ad STALE
    

Note

The VRF is not associated with a single OpenStack tenant; it can be shared with other provider networks too. When using VLAN provider networks, one can connect multiple networks to the same VNI, effectively placing them in the same VRF, routed and handled through the kernel and FRR.

Note

As we also want to be able to expose VMs connected to tenant networks (when the expose_tenant_networks or expose_ipv6_gua_tenant_networks configuration options are enabled), there is a need to expose the Neutron router gateway port (cr-lrp on OVN) so that the traffic to VMs in tenant networks is injected into the OVN overlay through the node that is hosting that port.

Traffic flow from tenant networks

By default neutron enables SNAT on routers (because that is typically what you’d use the routers for). This has some side effects that might not be all that convenient; for one, all connections initiated from VMs in tenant networks will be externally identified with the IP of the cr-lrp.

The VMs in the tenant networks are still reachable through their own IPs and return traffic flows as expected as well, but this source NAT behaviour is just not really what one would expect.

To prevent tenant networks from being exposed if SNAT is enabled, one can set the configuration option require_snat_disabled_for_tenant_networks to True.

This will check if the cr-lrp has SNAT disabled for that subnet, and prevent announcement of those tenant networks.
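
For example, a minimal bgp-agent.conf fragment with this option (assuming it lives in the [DEFAULT] section like the other agent options shown later in this document) could look like:

[DEFAULT]
require_snat_disabled_for_tenant_networks=True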

Note

Neutron adds IPv6 subnets without NAT, so even though the IPv4 subnets of those tenant networks might have NAT enabled, the IPv6 subnets might still be exposed, as they have no NAT enabled.

To disable the SNAT on a neutron router, one could simply run this command:

$ openstack router set --disable-snat --external-gateway <provider_network> <router>

Traffic Redirection to/from OVN

Besides the VM/LB IP being exposed in a specific node (either the one hosting the VM/LB or the one with the OVN router gateway port), the OVN BGP Agent is in charge of configuring the linux kernel networking and OVS so that the traffic can be injected into the OVN overlay, and vice versa. To do that, when the agent starts, it ensures that:

  • ARP/NDP is enabled on the OVS provider bridges by adding an IP to them

  • There is a routing table associated with each OVS provider bridge (adds an entry at /etc/iproute2/rt_tables, see the example after this list)

  • If the provider network is a VLAN network, a VLAN device connected to the bridge is created, and it has ARP and NDP enabled.

  • Extra OVS flows at the OVS provider bridges are cleaned up
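
As an example of the routing table entry created per provider bridge (a sketch; the table number 201 matches the sample agent output shown later in this document):

$ cat /etc/iproute2/rt_tables
...
201     br-ex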

Then, either upon events or due to (re)sync (regularly or during start up), it:

  • Adds an IP rule to apply specific routing table routes, in this case the one associated to the OVS provider bridge:

    $ ip rule
    0:      from all lookup local
    1000:   from all lookup [l3mdev-table]
    32000:  from all to IP lookup br-ex  # br-ex is the OVS provider bridge
    32000:  from all to CIDR lookup br-ex  # for VMs in tenant networks
    32766:  from all lookup main
    32767:  from all lookup default
    
  • Adds an IP route at the OVS provider bridge routing table so that the traffic is routed to the OVS provider bridge device:

    $ ip route show table br-ex
    default dev br-ex scope link
    CIDR via CR-LRP_IP dev br-ex  # for VMs in tenant networks
    CR-LRP_IP dev br-ex scope link  # for the VM in tenant network redirection
    IP dev br-ex scope link  # IPs on provider or FIPs
    
  • Adds a static ARP entry for the OVN Distributed Gateway Ports (cr-lrps) so that the traffic is steered to OVN via br-int – this is because OVN does not reply to ARP requests outside its L2 network:

    $ ip neigh
    ...
    CR-LRP_IP dev br-ex lladdr CR-LRP_MAC PERMANENT
    ...
    
  • For IPv6, instead of the static ARP entry, an NDP proxy is added, for the same reason:

    $ ip -6 neigh add proxy CR-LRP_IP dev br-ex
    
  • Finally, in order to properly send the traffic from the OVN overlay to kernel networking, so that it can be sent out of the node, the OVN BGP Agent needs to add a new flow at the OVS provider bridges so that the destination MAC address is changed to the MAC address of the OVS provider bridge (actions=mod_dl_dst:OVN_PROVIDER_BRIDGE_MAC,NORMAL):

    $ sudo ovs-ofctl dump-flows br-ex
    cookie=0x3e7, duration=77.949s, table=0, n_packets=0, n_bytes=0, priority=900,ip,in_port="patch-provnet-1" actions=mod_dl_dst:3a:f7:e9:54:e8:4d,NORMAL
    cookie=0x3e7, duration=77.937s, table=0, n_packets=0, n_bytes=0, priority=900,ipv6,in_port="patch-provnet-1" actions=mod_dl_dst:3a:f7:e9:54:e8:4d,NORMAL
    

Driver API

The NB BGP driver implements the driver_api.py interface with the following functions:

  • expose_ip: creates all the IP rules and routes, and OVS flows needed to redirect the traffic to OVN overlay. It also ensures that FRR exposes the required IP by using BGP.

  • withdraw_ip: removes the configuration (IP rules/routes, OVS flows) from expose_ip method to withdraw the exposed IP.

  • expose_subnet: adds kernel networking configuration (IP rules and route) to ensure traffic can go from the node to the OVN overlay (and back) for IPs within the tenant subnet CIDR.

  • withdraw_subnet: removes kernel networking configuration added by expose_subnet.

  • expose_remote_ip: exposes VM tenant network IPs through BGP on the chassis hosting the OVN gateway port of the router to which the VM is connected. It ensures traffic directed to the VM IP arrives at this node by exposing the IP through BGP locally. The previous steps in expose_subnet ensure the traffic is redirected to the OVN overlay after it arrives on the node.

  • withdraw_remote_ip: removes the configuration added by expose_remote_ip.

And in addition, the driver also implements extra methods for the FIPs and the OVN load balancers:

  • expose_fip and withdraw_fip which are equivalent to expose_ip and withdraw_ip but for FIPs.

  • expose_ovn_lb_vip: adds kernel networking configuration to ensure traffic is forwarded from the node with the associated cr-lrp to the OVN overlay, as well as to expose the VIP through BGP in that node.

  • withdraw_ovn_lb_vip: removes the above steps to stop advertising the load balancer VIP.

  • expose_ovn_lb_fip and withdraw_ovn_lb_fip: for exposing the FIPs associated with OVN load balancers. This is similar to expose_fip/withdraw_fip, but taking into account that the FIP must be exposed on the node with the cr-lrp of the router associated with the load balancer.

Agent deployment

The BGP mode (for both NB and SB drivers) exposes the VMs and LBs in provider networks or with FIPs, as well as VMs on tenant networks if expose_tenant_networks or expose_ipv6_gua_tenant_networks configuration options are enabled.

There is a need to deploy the agent in all the nodes where VMs can be created as well as in the networker nodes (i.e., where OVN router gateway ports can be allocated):

  • For VMs and Amphora load balancers on provider networks or with FIPs, the IP is exposed on the node where the VM (or amphora) is deployed. Therefore the agent needs to be running on the compute nodes.

  • For VMs on tenant networks (with expose_tenant_networks or expose_ipv6_gua_tenant_networks configuration options enabled), the agent needs to be running on the networker nodes. In OpenStack, with OVN networking, the N/S traffic to the tenant VMs (without FIPs) needs to go through the networking nodes, more specifically the one hosting the Distributed Gateway Port (chassisredirect OVN port (cr-lrp)), connecting the provider network to the OVN virtual router. Hence, the VM IPs are advertised through BGP in that node, and from there it follows the normal path to the OpenStack compute node where the VM is located — through the tunnel.

  • Similarly, for OVN load balancers the IPs are exposed on the networker node. In this case the ARP request for the VIP is replied to by the OVN router gateway port, therefore the traffic needs to be injected into the OVN overlay at that point too. Therefore the agent needs to be running on the networker nodes for OVN load balancers.

As an example of how to start the OVN BGP Agent on the nodes, see the commands below:

$ python setup.py install
$ cat bgp-agent.conf
# sample configuration that can be adapted based on needs
[DEFAULT]
debug=True
reconcile_interval=120
expose_tenant_networks=True
# expose_ipv6_gua_tenant_networks=True
# for SB DB driver
driver=ovn_bgp_driver
# for NB DB driver
#driver=nb_ovn_bgp_driver
bgp_AS=64999
bgp_nic=bgp-nic
bgp_vrf=bgp-vrf
bgp_vrf_table_id=10
ovsdb_connection=tcp:127.0.0.1:6640
address_scopes=2237917c7b12489a84de4ef384a2bcae

[ovn]
ovn_nb_connection = tcp:172.17.0.30:6641
ovn_sb_connection = tcp:172.17.0.30:6642

[agent]
root_helper=sudo ovn-bgp-agent-rootwrap /etc/ovn-bgp-agent/rootwrap.conf
root_helper_daemon=sudo ovn-bgp-agent-rootwrap-daemon /etc/ovn-bgp-agent/rootwrap.conf

$ sudo bgp-agent --config-dir bgp-agent.conf
Starting BGP Agent...
Loaded chassis 51c8480f-c573-4c1c-b96e-582f9ca21e70.
BGP Agent Started...
Ensuring VRF configuration for advertising routes
Configuring br-ex default rule and routing tables for each provider network
Found routing table for br-ex with: ['201', 'br-ex']
Sync current routes.
Add BGP route for logical port with ip 172.24.4.226
Add BGP route for FIP with ip 172.24.4.199
Add BGP route for CR-LRP Port 172.24.4.221
....

Note

If you only want to expose the IPv6 GUA tenant IPs, then remove the option expose_tenant_networks and add expose_ipv6_gua_tenant_networks=True instead.

Note

If you want to filter the tenant networks to be exposed by some specific address scopes, add the list of address scopes to the address_scopes configuration option. If no filtering should be applied, just remove the line.

Note that the OVN BGP Agent operates under the following assumptions:

  • A dynamic routing solution, in this case FRR, is deployed and advertises/withdraws routes added/deleted to/from certain local interfaces, in this case the ones associated with the VRF created to that end. As only VM and load balancer IPs need to be advertised, FRR needs to be configured with the proper filtering so that only /32 (or /128 for IPv6) IPs are advertised. A sample config for FRR is:

    frr version 7.5
    frr defaults traditional
    hostname cmp-1-0
    log file /var/log/frr/frr.log debugging
    log timestamp precision 3
    service integrated-vtysh-config
    line vty
    
    router bgp 64999
      bgp router-id 172.30.1.1
      bgp log-neighbor-changes
      bgp graceful-shutdown
      no bgp default ipv4-unicast
      no bgp ebgp-requires-policy
    
      neighbor uplink peer-group
      neighbor uplink remote-as internal
      neighbor uplink password foobar
      neighbor enp2s0 interface peer-group uplink
      neighbor enp3s0 interface peer-group uplink
    
      address-family ipv4 unicast
        redistribute connected
        neighbor uplink activate
        neighbor uplink allowas-in origin
        neighbor uplink prefix-list only-host-prefixes out
      exit-address-family
    
      address-family ipv6 unicast
        redistribute connected
        neighbor uplink activate
        neighbor uplink allowas-in origin
        neighbor uplink prefix-list only-host-prefixes out
      exit-address-family
    
    ip prefix-list only-default permit 0.0.0.0/0
    ip prefix-list only-host-prefixes permit 0.0.0.0/0 ge 32
    
    route-map rm-only-default permit 10
      match ip address prefix-list only-default
      set src 172.30.1.1
    
    ip protocol bgp route-map rm-only-default
    
    ipv6 prefix-list only-default permit ::/0
    ipv6 prefix-list only-host-prefixes permit ::/0 ge 128
    
    route-map rm-only-default permit 11
      match ipv6 address prefix-list only-default
      set src f00d:f00d:f00d:f00d:f00d:f00d:f00d:0004
    
    ipv6 protocol bgp route-map rm-only-default
    
    ip nht resolve-via-default
    
  • The relevant provider OVS bridges are created and configured with a loopback IP address (e.g., 1.1.1.1/32 for IPv4), and proxy ARP/NDP is enabled on their kernel interface, for example as sketched below.
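
    A minimal sketch of this pre-configuration (assuming br-ex as the provider bridge and 1.1.1.1/32 as the loopback address; adapt to your environment):

    $ ip addr add 1.1.1.1/32 dev br-ex
    $ sysctl -w net.ipv4.conf.br-ex.proxy_arp=1
    $ sysctl -w net.ipv6.conf.br-ex.proxy_ndp=1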

Limitations

The following limitations apply:

  • OVN 23.09 or later is needed to support exposing tenant networks IPs and OVN loadbalancers.

  • There is no API to decide what to expose; all VMs/LBs on provider networks or with floating IPs associated with them are exposed. For the VMs in the tenant networks, use the address_scopes flag to filter which subnets to expose, which also prevents having overlapping IPs.

  • In the currently implemented exposing methods (underlay and ovn) there is no support for overlapping CIDRs, so this must be avoided, e.g., by using address scopes and subnet pools.

  • For the default exposing method (underlay), but also with the vrf exposing method, the network traffic is steered by kernel routing (IP routes and rules), therefore OVS-DPDK, where the kernel space is skipped, is not supported. With the ovn exposing method the routing is done at the OVN level, so this limitation does not exist. More details in [NB DB] NB OVN BGP Agent: Design of the BGP Driver with OVN routing.

  • For the default exposing method (underlay), but also with the vrf exposing method, the network traffic is steered by kernel routing (IP routes and rules), therefore SR-IOV, where the hypervisor is skipped, is not supported. With the ovn exposing method the routing is done at the OVN level, so this limitation does not exist. More details in [NB DB] NB OVN BGP Agent: Design of the BGP Driver with OVN routing.

  • In OpenStack with OVN networking the N/S traffic to the ovn-octavia VIPs on the provider or the FIPs associated with the VIPs on tenant networks needs to go through the networking nodes (the ones hosting the Neutron Router Gateway Ports, i.e., the chassisredirect cr-lrp ports, for the router connecting the load balancer members to the provider network). Therefore, the entry point into the OVN overlay needs to be one of those networking nodes, and consequently the VIPs (or FIPs to VIPs) are exposed through them. From those nodes the traffic will follow the normal tunneled path (Geneve tunnel) to the OpenStack compute node where the selected member is located.