Upgrade OVN to 22.03 on Focal¶
Charmed OpenStack supports OVN version 22.03 starting with OpenStack Ussuri. It is strongly recommended that clouds running on Focal nodes that do not yet use this version upgrade to it in order to benefit from important bug fixes and software enhancements.
In particular, the procedure described on this page aims to prevent OVN data plane downtime during the upgrade to 22.03. Such downtime results from an upstream OVN issue and can cause network disruption to all cloud VMs.
Important
Read this entire document before making any changes to your cloud.
It is recommended that the upgrade be tested in a staged environment prior to applying the steps to a production cloud.
Prerequisites¶
Ensure the following prerequisites are satisfied before making any changes.
Juju version¶
Juju is running the latest stable version of its major and minor release (e.g. 2.9.x). This pertains to all three Juju components: client, controller model, and workload model. Juju upgrade documentation is available, but quick guidance is also given below.
First ensure that the client context is the cloud’s controller and model (check with command juju whoami). The essential commands are then:
sudo snap refresh juju
juju upgrade-controller
juju upgrade-model
Channel charms¶
OVN is managed by channel charms (charms ovn-central and ovn-chassis). If it is not, migrate away from legacy charms by applying the special procedure "All charms: migration to channels" to those two charms.
Caution
As the above migration document states, when performing the migration to channel charms, ensure that the currently running OVN version does not change.
Procedure¶
Set the election timer¶
To minimise timeout issues during the upgrade, set the OVS database server election timer to its maximum value:
juju config ovn-central ovsdb-server-election-timer=30
For background information, see section Raft leader election timeout of this document.
Allow the ovn-central application to settle; monitor its progress with the juju status ovn-central command.
Ensure package requirements¶
Ensure that select packages are up to date on the cloud’s OVN and Neutron units.
OVN¶
Perform the package upgrades on all OVN units by running commands across the ovn-chassis and ovn-central applications:
juju run -a ovn-chassis 'apt update && apt -y install \
--only-upgrade openvswitch-common ovn-common'
juju run -a ovn-central 'apt update && apt -y install \
--only-upgrade openvswitch-common ovn-common'
Note
Some clouds may be running ovn-dedicated-chassis as opposed to ovn-chassis.
Neutron¶
Perform the package upgrades on all neutron-api units by running commands across the neutron-api application:
juju run -a neutron-api 'apt -y install ubuntu-cloud-keyring'
juju run -a neutron-api -- add-apt-repository \
'deb http://ubuntu-cloud.archive.canonical.com/ubuntu focal-updates/ovn-22.03 main'
juju run -a neutron-api -- 'apt update; apt -y install \
--only-upgrade neutron-common openvswitch-common --option Dpkg::Options::="--force-confold"'
Note
For the rationale behind manually enabling the UCA pocket for OVN 22.03 on neutron-api units, see LP #1992770.
Fail-safe mode on OVN < 22.03¶
To prevent an OVN data plane outage during the upgrade to 22.03, the ovn-controller daemon must be placed into fail-safe mode. This section corresponds to upstream’s documented fail-safe method.
First stop the ovn-northd daemon:
juju run -a ovn-central 'systemctl stop ovn-northd'
Secondly, identify the Southbound database leader unit (see the Querying OVN page for guidance).
Finally, manually set the northd version to an arbitrary string. The ovn-controller processes will detect this change and adapt to be able to understand the data that the upgraded northd daemon will subsequently insert into the database (use the Southbound leader unit found above):
juju run -u <sb-db-leader-unit> 'ovn-sbctl set sb-global . options:northd_internal_version="<string>"'
An example invocation of the above if the Southbound leader unit is ovn-central/2:
juju run -u ovn-central/2 'ovn-sbctl set sb-global . options:northd_internal_version="safe"'
The above command contains the string ‘safe’. Any string will suffice provided that it is different from the current OVN version.
Perform the upgrade¶
To ensure a smooth migration, guidance is provided below that includes verification steps.
ovn-central¶
Prior to upgrading the ovn-central application, change its software sources to ‘distro’ and change the charm’s channel to ‘22.03/stable’:
juju refresh ovn-central --channel 22.03/stable \
--config <(printf "ovn-central:\n source: \"distro\"")
Now upgrade the application by selecting the UCA pocket for OVN 22.03 on Focal:
juju config ovn-central ovn-source=cloud:focal-ovn-22.03
As before, allow the ovn-central application to settle; monitor its progress with the juju status ovn-central command.
Verify: database migration¶
Ensure that the upgraded Northbound and Southbound database schemas match the expected (target) version. An example set of commands is provided below.
The Northbound database’s target version and actual version, respectively:
juju run -a ovn-central 'ovsdb-tool schema-version /usr/share/ovn/ovn-nb.ovsschema'
Stdout: |
6.1.0
UnitId: ovn-central/0
Stdout: |
6.1.0
UnitId: ovn-central/1
Stdout: |
6.1.0
UnitId: ovn-central/2
juju run -a ovn-central 'ovsdb-client get-schema-version unix:/var/run/ovn/ovnnb_db.sock OVN_Northbound'
Stdout: |
6.1.0
UnitId: ovn-central/0
Stdout: |
6.1.0
UnitId: ovn-central/1
Stdout: |
6.1.0
UnitId: ovn-central/2
The Southbound database’s target version and actual version, respectively:
juju run -a ovn-central 'ovsdb-tool schema-version /usr/share/ovn/ovn-sb.ovsschema'
Stdout: |
20.21.0
UnitId: ovn-central/0
Stdout: |
20.21.0
UnitId: ovn-central/2
Stdout: |
20.21.0
UnitId: ovn-central/1
juju run -a ovn-central 'ovsdb-client get-schema-version unix:/var/run/ovn/ovnsb_db.sock OVN_Southbound'
Stdout: |
20.21.0
UnitId: ovn-central/0
Stdout: |
20.21.0
UnitId: ovn-central/1
Stdout: |
20.21.0
UnitId: ovn-central/2
If versions do not match it might be that the database migration did not succeed (see log files under /var/log/ovn on the ovn-central units).
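The comparison of target and actual versions can also be scripted. The following is a minimal sketch only: schema_in_sync is a hypothetical helper name that is not part of OVN or the charms, and it does nothing more than compare the two version strings printed by the ovsdb-tool and ovsdb-client commands shown above.

```shell
# Hypothetical helper (not shipped with OVN or the charms).
# $1 = target schema version, $2 = actual schema version.
# Prints the result and returns non-zero on a mismatch or on empty input.
schema_in_sync() {
    if [ -n "$1" ] && [ "$1" = "$2" ]; then
        echo "match: $2"
    else
        echo "MISMATCH: target=${1:-unknown} actual=${2:-unknown}"
        return 1
    fi
}

# Example use on an ovn-central unit (Northbound database):
#   schema_in_sync \
#     "$(ovsdb-tool schema-version /usr/share/ovn/ovn-nb.ovsschema)" \
#     "$(ovsdb-client get-schema-version unix:/var/run/ovn/ovnnb_db.sock OVN_Northbound)"
```

For the Southbound database, substitute ovn-sb.ovsschema, ovnsb_db.sock and OVN_Southbound, as in the commands above.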
Verify: cluster status¶
Check the status of the Northbound and Southbound database clusters. It is expected that one unit has Role: leader and the others have Role: follower. An example set of commands is provided below.
The Northbound database cluster:
juju run -a ovn-central 'ovs-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/status OVN_Northbound' | egrep "Server ID|Role|Leader"
Server ID: 2a92 (2a9226b6-7a57-411a-94ee-092aa6a19e40)
Role: follower
Leader: bc3a
Server ID: adb2 (adb28a73-4e21-492c-81d0-f51adc6665a4)
Role: follower
Leader: bc3a
Server ID: bc3a (bc3a26b1-14c0-4133-b2c3-d8f64e4b722d)
Role: leader
Leader: self
The Southbound database cluster:
juju run -a ovn-central 'ovs-appctl -t /var/run/ovn/ovnsb_db.ctl cluster/status OVN_Southbound' | egrep "Server ID|Role|Leader"
Server ID: 8849 (8849b07b-cc32-47cf-8800-ed89fbc7db94)
Role: follower
Leader: fa7e
Server ID: 50b7 (50b7f34e-b295-4329-8d29-47039f697365)
Role: follower
Leader: fa7e
Server ID: fa7e (fa7e81bb-90e9-4c87-8ce4-cedcd54c6150)
Role: leader
Leader: self
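The expectation of one leader with the remaining units as followers can be checked mechanically. The sketch below assumes the filtered cluster/status output shown above is supplied on standard input; count_raft_roles is a hypothetical helper name, not an OVN or Juju command.

```shell
# Hypothetical helper: reads cluster/status output on stdin, prints a tally
# of the reported roles, and returns non-zero unless exactly one unit
# reports "Role: leader".
count_raft_roles() {
    awk '
        /Role: leader/   { leaders++ }
        /Role: follower/ { followers++ }
        END {
            printf "leaders=%d followers=%d\n", leaders, followers
            exit (leaders == 1 ? 0 : 1)
        }'
}

# Example use from the Juju client (Southbound database):
#   juju run -a ovn-central \
#     'ovs-appctl -t /var/run/ovn/ovnsb_db.ctl cluster/status OVN_Southbound' \
#     | count_raft_roles
```

With the three-unit sample output above, the helper would print leaders=1 followers=2.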
ovn-chassis¶
To upgrade the ovn-chassis application, change the charm’s channel to ‘22.03/stable’ and then select the UCA pocket for OVN 22.03 on Focal:
juju refresh ovn-chassis --channel 22.03/stable
juju config ovn-chassis ovn-source=cloud:focal-ovn-22.03
Verify: network agents¶
Ensure that all network agents are “alive” and “up”:
openstack network agent list
Sample output:
+---------------+----------------------+---------------+---------------+-------+-------+-------------------------------+
| ID            | Agent Type           | Host          | Avail... Zone | Alive | State | Binary                        |
+---------------+----------------------+---------------+---------------+-------+-------+-------------------------------+
| xxxx-xxxx-... | OVN Controller agent | xxxx-xxxx-... |               | :-)   | UP    | ovn-controller                |
| xxxx-xxxx-... | OVN Metadata agent   | xxxx-xxxx-... |               | :-)   | UP    | networking-ovn-metadata-agent |
| xxxx-xxxx-... | OVN Controller agent | xxxx-xxxx-... |               | :-)   | UP    | ovn-controller                |
| xxxx-xxxx-... | OVN Metadata agent   | xxxx-xxxx-... |               | :-)   | UP    | networking-ovn-metadata-agent |
+---------------+----------------------+---------------+---------------+-------+-------+-------------------------------+
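On clouds with many agents, scanning the table by eye is error-prone. The sketch below reads the table from standard input and flags any row whose Alive column is not ":-)" or whose State column is not "UP". It assumes the default table layout and column order shown in the sample output; check_agents is a hypothetical helper name.

```shell
# Hypothetical helper: reads the output of "openstack network agent list"
# (default table format) on stdin. Assumes the column order
# ID | Agent Type | Host | Zone | Alive | State | Binary, as in the sample.
# Returns non-zero if any agent is unhealthy or if no agent rows are found.
check_agents() {
    awk -F'|' '
        function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
        # Data rows have at least 8 "|"-separated fields; skip the header row.
        NF >= 8 && trim($2) != "ID" {
            total++
            if (trim($6) != ":-)" || trim($7) != "UP") {
                bad++
                print "unhealthy: " trim($3) " on " trim($4)
            }
        }
        END {
            printf "%d of %d agents healthy\n", total - bad, total
            exit ((bad > 0 || total == 0) ? 1 : 0)
        }'
}

# Example use:
#   openstack network agent list | check_agents
```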
Other resources¶
Raft leader election timeout¶
The Raft leader election timeout is a crucial factor in the upgrade. It is governed by the ovn-central charm’s ovsdb-server-election-timer configuration option, whose default value is ‘4’ (seconds).
The amount of wall clock time a database (Northbound or Southbound) cluster leader consumes during the upgrade process must not exceed the election timer. If it does, the database unit attempting the upgrade (schema conversion) will be evicted from the cluster, thereby preventing its results from being stored. This scenario will lead to an endless retry loop.
Conversion happens when the database services start up after the package upgrades. To prevent the aforementioned retry loop, the startup scripts have a hardcoded 30-second timeout. Therefore:
the maximum effective value for the ovsdb-server-election-timer option is ‘30’
an alternative upgrade path would be needed if the conversion cannot succeed within that maximum
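As an illustration of the ceiling described above, a candidate value can be checked before it is applied. This is a sketch only: check_election_timer is a hypothetical helper name, and the limit of 30 is taken from this page rather than queried from the charm.

```shell
# Hypothetical helper: validates a candidate election timer value (seconds).
# The ceiling of 30 comes from the startup script timeout described above.
check_election_timer() {
    case "$1" in
        ''|*[!0-9]*)
            echo "not a whole number of seconds: '$1'"
            return 2
            ;;
    esac
    if [ "$1" -gt 30 ]; then
        echo "$1 exceeds the effective maximum of 30"
        return 1
    fi
    echo "ok: $1"
}

# Example use:
#   check_election_timer 30 && \
#     juju config ovn-central ovsdb-server-election-timer=30
```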
There is no template answer for what the value of the option should be. External factors (e.g. server performance characteristics, load, database size, number of records) all have a role to play.
See the upstream mailing list thread for a discussion on the topic. Issue LP #2013344 raises concerns about the option’s default value being too small.