Upgrade OVN to 22.03 on Focal

Charmed OpenStack supports OVN version 22.03 starting with OpenStack Ussuri. Operators of Focal-based clouds that are not yet running this version are strongly encouraged to upgrade to it in order to benefit from important bug fixes and software enhancements.

In particular, the procedure described on this page aims to prevent OVN data plane downtime during the upgrade to 22.03. Such downtime stems from an upstream OVN issue and can disrupt networking for all cloud VMs.

Important

Read this entire document before making any changes to your cloud.

It is recommended that the upgrade be tested in a staging environment before the steps are applied to a production cloud.

Prerequisites

Ensure the following prerequisites are satisfied before making any changes.

Juju version

Juju is running the latest stable version of its major and minor release (e.g. 2.9.x). This pertains to all three Juju components: client, controller model, and workload model. Juju upgrade documentation is available, but quick guidance is also given below.

First ensure that the client context is the cloud’s controller and model (check with the juju whoami command). The essential commands are then:

sudo snap refresh juju
juju upgrade-controller
juju upgrade-model

Channel charms

OVN is managed by channel charms (the ovn-central and ovn-chassis charms). If it is not, migrate those two charms away from legacy charms by applying the special procedure All charms: migration to channels.

Caution

As the above migration document states, when performing the migration to channel charms, ensure that the currently running OVN version does not change.

Procedure

Set the election timer

To minimise timeout issues during the upgrade, set the OVS database server election timer to its maximum value:

juju config ovn-central ovsdb-server-election-timer=30

For background information, see section Raft leader election timeout of this document.

Allow the ovn-central application to settle; monitor it with the juju status ovn-central command.
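One way to decide that the application has settled is to wait until every ovn-central unit reports an active workload and an idle agent. The sketch below assumes the default tabular layout of juju status (unit name, workload state, agent state in the first three columns). The function names units_settled and wait_for_settle, and the 10-second polling interval, are illustrative choices rather than part of any tool.

```shell
# Succeed only if every ovn-central unit row on stdin is "active" / "idle".
# Assumes unit rows look like: <unit> <workload> <agent> <machine> ...
units_settled() {
    awk '
        $1 ~ /^ovn-central\/[0-9]+\*?$/ {
            seen++
            if ($2 != "active" || $3 != "idle") bad++
        }
        END { if (seen > 0 && bad == 0) exit 0; exit 1 }
    '
}

# Poll juju status until the check above passes (interval is an assumption).
wait_for_settle() {
    until juju status ovn-central | units_settled; do
        sleep 10
    done
}
```

Calling wait_for_settle from a shell whose Juju context is the cloud's model blocks until the units are ready. Reading the status output by eye remains equally valid.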

Ensure package requirements

Ensure that select packages are up to date on the cloud’s OVN and Neutron units.

OVN

Perform the package upgrades on all OVN units by running commands across the ovn-chassis and ovn-central applications:

juju run -a ovn-chassis 'apt update && apt -y install \
   --only-upgrade openvswitch-common ovn-common'
juju run -a ovn-central 'apt update && apt -y install \
   --only-upgrade openvswitch-common ovn-common'

Note

Some clouds may be running ovn-dedicated-chassis as opposed to ovn-chassis. In that case, substitute the application name in the commands accordingly.

Neutron

Perform the package upgrades on all neutron-api units by running commands across the neutron-api application:

juju run -a neutron-api 'apt -y install ubuntu-cloud-keyring'
juju run -a neutron-api -- add-apt-repository \
   'deb http://ubuntu-cloud.archive.canonical.com/ubuntu focal-updates/ovn-22.03 main'
juju run -a neutron-api -- 'apt update; apt -y install \
   --only-upgrade neutron-common openvswitch-common --option Dpkg::Options::="--force-confold"'

Note

For the rationale behind manually enabling the UCA pocket for OVN 22.03 on neutron-api units see LP #1992770.

Fail-safe mode on OVN < 22.03

To prevent an OVN data plane outage during the upgrade to 22.03, the ovn-controller daemon must be placed into fail-safe mode. This section corresponds to upstream’s documented fail-safe method.

First stop the ovn-northd daemon:

juju run -a ovn-central 'systemctl stop ovn-northd'

Secondly, identify the Southbound database leader unit (see the Querying OVN page for guidance).
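As a rough illustration, the leader can also be found by running the Southbound cluster/status query on every ovn-central unit and picking out the unit whose role is leader. This sketch assumes that juju run -a prints each unit's Stdout before its UnitId, as in the sample outputs on this page. The function names sb_leader_from_output and find_sb_leader are hypothetical.

```shell
# Print the UnitId whose preceding cluster/status output says "Role: leader".
# Assumes each unit's Stdout block is followed by its UnitId line.
sb_leader_from_output() {
    awk '
        $1 == "Role:"   { role = $2 }
        $1 == "UnitId:" { if (role == "leader") print $2; role = "" }
    '
}

# Query all ovn-central units and report the Southbound leader unit.
find_sb_leader() {
    juju run -a ovn-central \
        'ovs-appctl -t /var/run/ovn/ovnsb_db.ctl cluster/status OVN_Southbound' \
        | sb_leader_from_output
}
```

Running find_sb_leader should print a single unit name, such as ovn-central/2. If it prints nothing or several names, inspect the cluster manually before continuing.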

Finally, manually set the northd version to an arbitrary string. The ovn-controller processes will detect this change and adapt to be able to understand the data that the upgraded northd daemon will subsequently insert into the database (use the Southbound leader unit found above):

juju run -u <sb-db-leader-unit> 'ovn-sbctl set sb-global . options:northd_internal_version="<string>"'

An example invocation of the above if the Southbound leader unit is ovn-central/2:

juju run -u ovn-central/2 'ovn-sbctl set sb-global . options:northd_internal_version="safe"'

The above command contains the string ‘safe’. Any string will suffice provided that it is different from the current OVN version.

Perform the upgrade

The guidance below includes verification steps to help ensure a smooth upgrade.

ovn-central

Prior to upgrading the ovn-central application, change its software sources to ‘distro’ and change the charm’s channel to ‘22.03/stable’:

juju refresh ovn-central --channel 22.03/stable \
   --config <(printf "ovn-central:\n source: \"distro\"")

Now upgrade the application by selecting the UCA pocket for OVN 22.03 on Focal:

juju config ovn-central ovn-source=cloud:focal-ovn-22.03

As before, allow the ovn-central application to settle; monitor it with the juju status ovn-central command.

Verify: database migration

Ensure that the upgraded Northbound and Southbound database schemas match the expected (target) version. An example set of commands is provided below.

The Northbound database’s target version and actual version, respectively:

juju run -a ovn-central 'ovsdb-tool schema-version /usr/share/ovn/ovn-nb.ovsschema'

Stdout: |
6.1.0
UnitId: ovn-central/0
Stdout: |
6.1.0
UnitId: ovn-central/1
Stdout: |
6.1.0
UnitId: ovn-central/2

juju run -a ovn-central 'ovsdb-client get-schema-version unix:/var/run/ovn/ovnnb_db.sock OVN_Northbound'

Stdout: |
6.1.0
UnitId: ovn-central/0
Stdout: |
6.1.0
UnitId: ovn-central/1
Stdout: |
6.1.0
UnitId: ovn-central/2

The Southbound database’s target version and actual version, respectively:

juju run -a ovn-central 'ovsdb-tool schema-version /usr/share/ovn/ovn-sb.ovsschema'

Stdout: |
20.21.0
UnitId: ovn-central/0
Stdout: |
20.21.0
UnitId: ovn-central/2
Stdout: |
20.21.0
UnitId: ovn-central/1

juju run -a ovn-central 'ovsdb-client get-schema-version unix:/var/run/ovn/ovnsb_db.sock OVN_Southbound'

Stdout: |
20.21.0
UnitId: ovn-central/0
Stdout: |
20.21.0
UnitId: ovn-central/1
Stdout: |
20.21.0
UnitId: ovn-central/2

If the versions do not match, the database migration may not have succeeded (see the log files under /var/log/ovn on the ovn-central units).
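The comparison of target and actual versions can be scripted. The following is a minimal sketch, assuming the juju run output contains no other x.y.z strings besides the schema versions. The function names single_version and schemas_match are hypothetical.

```shell
# Print the one x.y.z version found on stdin; fail if there are none or several.
single_version() {
    versions=$(grep -Eo '[0-9]+\.[0-9]+\.[0-9]+' | sort -u)
    [ "$(printf '%s\n' "$versions" | grep -c .)" -eq 1 ] && printf '%s\n' "$versions"
}

# Succeed only if both outputs agree on a single, identical version.
schemas_match() {
    target=$(printf '%s\n' "$1" | single_version) || return 1
    actual=$(printf '%s\n' "$2" | single_version) || return 1
    [ "$target" = "$actual" ]
}

# Example use for the Northbound database:
# nb_target=$(juju run -a ovn-central 'ovsdb-tool schema-version /usr/share/ovn/ovn-nb.ovsschema')
# nb_actual=$(juju run -a ovn-central 'ovsdb-client get-schema-version unix:/var/run/ovn/ovnnb_db.sock OVN_Northbound')
# schemas_match "$nb_target" "$nb_actual" && echo "Northbound schema matches"
```

The same check applies to the Southbound database by substituting its schema file, socket, and database name.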

Verify: cluster status

Check the status of the Northbound and Southbound database clusters. It is expected that one unit has Role: leader and the others have Role: follower. An example set of commands is provided below.

The Northbound database cluster:

juju run -a ovn-central 'ovs-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/status OVN_Northbound' | egrep "Server ID|Role|Leader"

Server ID: 2a92 (2a9226b6-7a57-411a-94ee-092aa6a19e40)
Role: follower
Leader: bc3a
Server ID: adb2 (adb28a73-4e21-492c-81d0-f51adc6665a4)
Role: follower
Leader: bc3a
Server ID: bc3a (bc3a26b1-14c0-4133-b2c3-d8f64e4b722d)
Role: leader
Leader: self

The Southbound database cluster:

juju run -a ovn-central 'ovs-appctl -t /var/run/ovn/ovnsb_db.ctl cluster/status OVN_Southbound' | egrep "Server ID|Role|Leader"

Server ID: 8849 (8849b07b-cc32-47cf-8800-ed89fbc7db94)
Role: follower
Leader: fa7e
Server ID: 50b7 (50b7f34e-b295-4329-8d29-47039f697365)
Role: follower
Leader: fa7e
Server ID: fa7e (fa7e81bb-90e9-4c87-8ce4-cedcd54c6150)
Role: leader
Leader: self
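The expectation of exactly one leader, with all other members as followers, can be checked mechanically. This is a small sketch that reads the filtered cluster/status output shown above. The function name cluster_roles_ok is hypothetical.

```shell
# Succeed only if stdin shows exactly one "Role: leader" and every
# other Role line is "follower".
cluster_roles_ok() {
    awk '
        $1 == "Role:" {
            total++
            if ($2 == "leader") leaders++
            else if ($2 != "follower") other++
        }
        END { if (total > 0 && leaders == 1 && other == 0) exit 0; exit 1 }
    '
}

# Example use for the Southbound cluster:
# juju run -a ovn-central \
#     'ovs-appctl -t /var/run/ovn/ovnsb_db.ctl cluster/status OVN_Southbound' \
#     | cluster_roles_ok && echo "Southbound cluster roles look healthy"
```

A failing check does not say what is wrong; it only signals that the full cluster/status output deserves a closer look.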

ovn-chassis

To upgrade the ovn-chassis application, change the charm’s channel to ‘22.03/stable’ and then select the UCA pocket for OVN 22.03 on Focal:

juju refresh ovn-chassis --channel 22.03/stable
juju config ovn-chassis ovn-source=cloud:focal-ovn-22.03

Verify: network agents

Ensure that all network agents are “alive” and “up”:

openstack network agent list

Sample output:

+---------------+----------------------+---------------+---------------+-------+-------+-------------------------------+
| ID            | Agent Type           | Host          | Avail... Zone | Alive | State | Binary                        |
+---------------+----------------------+---------------+---------------+-------+-------+-------------------------------+
| xxxx-xxxx-... | OVN Controller agent | xxxx-xxxx-... |               | :-)   | UP    | ovn-controller                |
| xxxx-xxxx-... | OVN Metadata agent   | xxxx-xxxx-... |               | :-)   | UP    | networking-ovn-metadata-agent |
| xxxx-xxxx-... | OVN Controller agent | xxxx-xxxx-... |               | :-)   | UP    | ovn-controller                |
| xxxx-xxxx-... | OVN Metadata agent   | xxxx-xxxx-... |               | :-)   | UP    | networking-ovn-metadata-agent |
+---------------+----------------------+---------------+---------------+-------+-------+-------------------------------+
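On clouds with many hypervisors, the table can be filtered so that only problem agents are shown. The sketch below assumes the default table output, with columns named Agent Type, Host, Alive, and State as in the sample above. The function name unhealthy_agents is hypothetical.

```shell
# Read the agent table on stdin and print "<host>: <agent type>" for every
# agent whose Alive cell is not ":-)" or whose State cell is not "UP".
unhealthy_agents() {
    awk -F'|' '
        function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
        NF < 3 { next }                      # skip the +----+ border lines
        !header {                            # first data row is the header
            for (i = 1; i <= NF; i++) {
                name = trim($i)
                if (name == "Alive") alive = i
                if (name == "State") state = i
                if (name == "Host") host = i
                if (name == "Agent Type") atype = i
            }
            header = 1
            next
        }
        trim($alive) != ":-)" || trim($state) != "UP" {
            print trim($host) ": " trim($atype)
        }
    '
}

# Example use:
# openstack network agent list | unhealthy_agents
```

Empty output means every listed agent is alive and up.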

Other resources

Raft leader election timeout

The Raft leader election timeout is a crucial factor in the upgrade. It is governed by the ovn-central charm’s ovsdb-server-election-timer configuration option, whose default value is ‘4’ (seconds).

The wall clock time that a database (Northbound or Southbound) cluster leader spends on the upgrade process must not exceed the election timer. If it does, the database unit attempting the upgrade (schema conversion) will be evicted from the cluster, which prevents its results from being stored. This scenario leads to an endless retry loop.

Conversion happens when the DB services start up after the package upgrades. To prevent the aforementioned retry loop, the startup scripts have a hardcoded 30-second timeout. Therefore:

  1. the maximum effective value for the ovsdb-server-election-timer option is ‘30’

  2. an alternative upgrade path would be needed if the conversion cannot succeed within that maximum

There is no template answer for what the value of the option should be. External factors (e.g. server performance characteristics, load, database size, number of records) all have a role to play.

See the upstream mailing list thread for a discussion on the topic. Issue LP #2013344 raises concerns about the option’s default value being too small.