Wallaby Series Release Notes

14.3.0-590

Prelude

Environment file collectd-write-qdr.yaml no longer specifies a default CollectdAmqpInstances hash.

The default ovsdb-server deployment mode has been switched from active/backup with Pacemaker to the native active/active RAFT clustering.

New Features

  • Added support for Unbound to forward DNS resolution requests to other DNS resolvers (DNS resolver forwarding).

  • Added new heat role specific parameter option ‘DdpPackage’ to select the required DDP Package.

  • Added a new heat role-specific parameter OVNAvailabilityZone to set availability zones for OVN. This parameter replaces setting availability zones through OVNCMSOptions.

  • Added parameter OVNEncapTos to indicate the value to be applied to the OVN tunnel interface’s option:tos, as specified in the Open_vSwitch database Interface table. The default value is “0”. “inherit” allows copying the inner ToS into the outer packet header.
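
    For illustration, a minimal environment file sketch using the “inherit” value described above:

    parameter_defaults:
      OVNEncapTos: 'inherit'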

  • Add IronicDefaultBootInterface parameter to allow users to set / override the default boot interface used by ironic. This may not work if a hardware type does not support the set boot interface. This overrides create-time defaults. The ordered union of the enabled boot interfaces and hardware type determines, under normal circumstances, what the default will be.

  • Added a new parameter CeilometerTenantNameDiscovery. Enabling this parameter identifies user and project names using the resource UUIDs for every polled sample. Upon a successful discovery, the identified names are added to the corresponding sample.

  • Since genisoimage was removed from CentOS 9 / RHEL 9, nova’s default mkisofs_cmd option no longer works. In the RHEL/CentOS realm, mkisofs is an alternatives alias that maps to either xorriso (on 9) or genisoimage (on 8).

  • Added a new OctaviaLogOffloadProtocol setting that allows selecting either UDP (default) or TCP as the protocol for log offloading.

  • The new settings OctaviaMultiVcpuFlavorProperties and OctaviaMultiVcpuFlavorId allow configuring parameters of the new Octavia multi-vCPU flavor feature.

  • Added the Octavia TLS parameters.

  • Added a new parameter OVNOvsdbProbeInterval to configure ovsdb_probe_interval for neutron ml2-ovn plugin and ovn metadata agent.

  • Added a new parameter OVNOvsdbProbeInterval to configure OVSDB Connection.probe_interval. This requires setting a single Connection entry for all RAFT servers, which listens on all interfaces. To address the security implications, iptables rules are set to limit traffic to the proper subnet.

  • RabbitMQ can be configured to run in FIPS mode via the new configuration option RabbitFIPS. The default value is false.

  • To support Glance Distributed Image Import, worker_self_reference_url is now configured with the internal API URL of each node where glance-api will run and the glance-direct image import method is enabled.

  • Added the NeutronAgentDownTime parameter to configure the neutron server agent_down_time option: the number of seconds after which an agent is regarded as down; it should be at least twice report_interval, to be sure the agent is down for good. agent_down_time is a neutron-server option set by the neutron::server class, while report_interval is a neutron agent option set by the neutron class.

  • The new ApacheTimeout parameter has been added, which determines the timeout used for IO operations in Apache.

  • Add option to override the default corosync token_timeout value. There are cases where the default allotted time (10s) is not enough. This only works during cluster setup (first deployment).

  • Logging for the designate bind backend is now more fully configured. DNS query logging can be enabled by setting DesignateBindQueryLogging to true.
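
    A minimal sketch of an environment file enabling query logging as described above:

    parameter_defaults:
      DesignateBindQueryLogging: true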

  • ‘dns_domain_ports’ extension driver is now enabled by default and this allows ‘dns_domain’ to be set for ports.

  • Neutron can now be configured to support secure RBAC using EnforceSecureRbac. Note, you may not be able to use this until Neutron upstream has support for common RBAC personas.

  • The new parameter EnforceSecureRbac has been added to enforce authorization based on common RBAC personas. Currently in glance the support is only available for project-admin, project-member and project-reader personas and system personas will come in a later release.

  • When deploying a new HA overcloud, the mysql/galera service can now be configured to use mariabackup for State Snapshot Transfers (SST) by configuring the new Heat parameter MysqlGaleraSSTMethod. Mariabackup SST uses a dedicated SQL user with the appropriate grants to transfer the database content across nodes. The user credentials can be configured via two additional Heat parameters MysqlMariabackupUser and MysqlMariabackupPassword.
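
    An illustrative sketch; the credential values are placeholders, not defaults:

    parameter_defaults:
      MysqlGaleraSSTMethod: mariabackup
      MysqlMariabackupUser: mariabackup            # placeholder
      MysqlMariabackupPassword: ReplaceWithSecret  # placeholder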

  • Two instances of the glance-api service are now deployed per the recommendations outlined in OSSN-0090. The user facing service does not provide access to image location data, whereas a new internal glance-api service provides location data to administrators and services that need it (e.g. cinder and nova), and is accessible via the admin and internal keystone endpoints.

  • The new HorizonHstsHeaderValue parameter has been added. When this parameter is set, haproxy adds the HTTP Strict-Transport-Security header to HTTP responses to enforce SSL.

  • New configuration IronicDefaultBootMode allows changing the default boot mode to use for bare metal instances. The default for now remains bios for legacy BIOS boot but may switch to uefi in the future.
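
    A minimal sketch using the boot modes named in this note:

    parameter_defaults:
      IronicDefaultBootMode: uefi   # or bios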

  • Added parameter IronicEnableNovaPowerNotifications (defaults to: true). The parameter controls the [nova]/send_power_notifications option in ironic.conf which is used to enable/disable the power state change callbacks to nova.

  • The new KeystoneNotificationDriver parameter has been added. This parameter overrides the global NotificationDriver parameter and allows customizing the notification driver only in Keystone, which is required to use the notification listener function in Barbican.

  • Added the LeappPreRebootCommand parameter to the tripleo-packages service. This is a list of commands to be executed just before rebooting the node to perform the leapp upgrade. This allows, for example, removing kernel parameters to avoid affecting the leapp reboot.
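
    An illustrative sketch only; the command shown is a hypothetical example of dropping a kernel argument before the leapp reboot:

    parameter_defaults:
      LeappPreRebootCommand:
        - grubby --update-kernel=ALL --remove-args=hugepages   # hypothetical command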

  • Containerized Libvirt swtpm logs will be placed into /var/log/containers/libvirt/swtpm host path.

  • This change adds functionality to enable modular libvirt daemons. Each of these daemons runs in its own container, and the default configuration now uses modular libvirt daemons instead of the monolithic libvirt daemon. Here is the list of libvirt daemons added in this change: - virtnodedevd - virtproxyd - virtqemud - virtsecretd - virtstoraged

    It’s possible to define individual log filters for each one of these daemons using the following new parameters: - LibvirtVirtlogdLogFilters - LibvirtVirtsecretdLogFilters - LibvirtVirtnodedevdLogFilters - LibvirtVirtstoragedLogFilters - LibvirtVirtqemudLogFilters - LibvirtVirtproxydLogFilters

    More information regarding modular libvirt daemons is available at https://libvirt.org/daemons.html.

  • Add support for overriding the default cipher used by galera. This is useful for cases like FIPS where the default ‘AES128-SHA256’ is not allowed.

  • Introduce new role-specific boolean parameters {{role.name}}NetworkConfigUpdate. When {{role.name}}NetworkConfigUpdate is True, existing network configurations will be updated. By default this is False and only new deployments will have the networks configured. This parameter is role based only, with no global option.
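
    A minimal sketch, assuming a role named Compute (the role name is an assumption):

    parameter_defaults:
      ComputeNetworkConfigUpdate: true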

  • New config options for Neutron logging service plugin configuration were added. There are options added for L3 Agent: NeutronL3AgentLoggingRateLimit, NeutronL3AgentLoggingBurstLimit, NeutronL3AgentLoggingLocalOutputLogBase, for OVS agent: NeutronOVSAgentLoggingRateLimit, NeutronOVSAgentLoggingBurstLimit, NeutronOVSAgentLoggingLocalOutputLogBase and for ML2/OVN backend: NeutronOVNLoggingRateLimit, NeutronOVNLoggingBurstLimit, NeutronOVNLoggingLocalOutputLogBase.

  • Add NovaShowHostStatus to allow overriding API policies to access the compute host status in the requested Nova server details. The default value ‘hidden’ allows only admins to access it. Setting it to ‘all’ (‘unknown-only’) without additional fine-grained tuning of NovaApiHostStatusPolicy shows the full (limited) host_status to the system/project readers.

    Add NovaApiHostStatusPolicy that defines a custom API policy for os_compute_api:servers:show:host_status and os_compute_api:servers:show:host_status:unknown-only. These rules, or roles, replace the admins-only policies based on the given NovaShowHostStatus: ‘unknown-only’ shows the limited host status UNKNOWN whenever a heartbeat was not received within the configured threshold, and ‘all’ also reveals UP, DOWN, or MAINTENANCE statuses in the Nova server details. Finally, NovaShowHostStatus: ‘hidden’ puts it back to being visible only for admins. Additional policies specified using NovaApiPolicies get merged with this policy.
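
    For illustration, a minimal environment file sketch using one of the values documented above:

    parameter_defaults:
      NovaShowHostStatus: 'unknown-only'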

  • With conditional monitoring enabled in OVN, the southbound ovsdb-server spends a lot of time handling monitoring and sending updates to all of its connected clients, consuming a lot of CPU. With the monitor-all option, ovn-controllers do not enable conditional monitoring, thereby reducing the load on the southbound ovsdb-server.

  • Added support for PMD load based sleeping. It can be configured via the role specific THT OvsPmdSleepMax parameter to set other_config:pmd-maxsleep in OVS.
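
    A minimal sketch, assuming a DPDK compute role named ComputeOvsDpdk and passing the role-specific value through the role’s Parameters map (role name and value are illustrative assumptions):

    parameter_defaults:
      ComputeOvsDpdkParameters:
        OvsPmdSleepMax: 500   # illustrative value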

  • The new PlacementPolicies parameter has been added.

  • A heat parameter IronicPowerStateChangeTimeout has been added which sets the number of seconds to wait for power operations to complete, i.e., so that a baremetal node is in the desired power state. If timed out, the power operation is considered a failure. The default is 60 seconds, which is the same as the current Ironic default.

  • Added pure_iscsi_cidr, pure_host_personality and eradicate_on_delete support for the Pure Storage FlashArray Cinder driver.

  • Added NovaDisableComputeServiceCheckForFfu parameter to configure nova::workarounds::disable_compute_service_check_for_ffu to disable the service version check workaround for FFU.

  • DeploymentServerBlacklist parameter now supports both heat and actual hostnames.

  • Adding Hugepages role parameter

    Hugepages management was always a manual step done by operators via the TripleO parameter KernelArgs. This is error prone and causes confusion.

    The new Hugepages parameter allows operators to define hugepages as a dictionary, making it easier to read and follow.

    To prevent unintended changes, there are multiple validations before applying a change:

    • We convert the current running configuration to an actual dictionary that we validate the new format against.

    • If no change is necessary, even though the format might not be the same, there’s no kernel_args update.

    • By default, we don’t remove hugepages in place except when operators specifically set ReconfigureHugepages to true.

    This change is also opening the door to more automation and automatic tuning.

  • This changes the ServiceNetMap and VipSubnetMap interfaces to allow for server side env merging. This would, for example, allow adding a network for a new service without having to specify the complete ServiceNetMap in the parameter_defaults section of an environment file.

Known Issues

  • To operate well at scale, it is important that OVS 2.17+ is used when deploying with RAFT clustering. Specifically, python-ovs >= 2.17.1 is required.

Upgrade Notes

  • Operators using the audit service must change the way they provide custom configuration, and use a new “AuditdConfig” dict in the parameter_defaults.

  • Changes the ironic PXE container TFTP service from in.tftpd to use the dnsmasq TFTP service. This is because the in.tftpd service is not anticipated to be carried by Linux distributions moving forward, and dnsmasq is actively maintained.

  • When upgrading an environment that uses collectd-write-qdr.yaml, the CollectdAmqpInstances defaults previously specified need to be added to an administrator-provided environment file and used during the overcloud deploy process.

  • When upgrading from a non-RAFT deployment, the old Pacemaker ovn-dbs-bundle containers will still exist and need to be cleaned up. They will not interfere with the function of the cluster, as all services connecting to ovsdb-server will be configured to connect to the server’s individual IP addresses and not the Pacemaker ovsdb-server VIP.

  • Delayed Nova Compute script uses /run/nova/startup for its state file instead of /run.

  • The new support for mariabackup SST for the mysql/galera service is currently limited to new overcloud deployments. Doing a stack update to change SST method from rsync to mariabackup or the other way around is currently not supported.

  • A new OS::TripleO::Services::GlanceApiInternal service is introduced to handle deploying the internal instance of the glance-api service. When upgrading an overcloud deployed with a custom roles file, the new GlanceApiInternal service must be added to every role that includes the GlanceApi service. Roles that include the GlanceApiEdge service should not include the new GlanceApiInternal service.

    Deployment of the new internal glance-api service is generally transparent, and includes updating glance’s endpoints in the keystone catalog. In a Distributed Compute Node (DCN) deployment, the control plane and all DCN sites need to be updated in order to fully deploy the new internal glance-api service.
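
    For illustration, a fragment of a custom roles file with the new service added alongside GlanceApi; the role name and the other services shown are placeholders:

    - name: Controller
      ServicesDefault:
        - OS::TripleO::Services::GlanceApi
        - OS::TripleO::Services::GlanceApiInternal
        # ... remaining services unchanged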

  • Mistral has been removed as it was deprecated in Wallaby and is no longer in use.

  • For Nova computes that need to keep running EL8, you can replace the OS::TripleO::Services::NovaLibvirt service with OS::TripleO::Services::NovaLibvirtLegacy in its role files to run the monolithic libvirt. Unlike the modular daemons consumable with EL9 computes, that legacy service should only be used for Train to Wallaby skip-level (fast-forward) upgrades, and should not be used in new deployments.

  • To re-enable the QEMU driver features lost after the previous minor update, such as virsh cpu-stats, or volume attachments for existing Nova Compute instances, those need to be live-migrated (or cold-migrated) to either of the newly updated Nova Compute hosts.

  • Redis is now disabled by default in new deployments, so existing deployments have to delete the redis resource in pacemaker prior to upgrade, or include the new environment file ha-redis.yaml if they still implicitly depend on redis.

  • The templates to install ReaR via heat have been removed. ReaR can be installed using Ansible via the command openstack overcloud backup --setup-rear.

  • The StackAction/StackUpdateType parameters have been removed because they have no significance with deployment using ephemeral heat.

  • The default boot mode for ironic deployed nodes is now uefi when no boot mode is explicitly set in the node’s driver_info, capabilities, or instance_info configuration. To restore the previous default, set the heat parameter IronicDefaultBootMode to bios.

  • The default UEFI iPXE bootfile is now snponly.efi. The boolean parameter IronicIPXEUefiSnpOnly was added to allow custom configuration. When set to true snponly is used, when false the previous default ipxe.efi is used. See bug: 1959726.

  • With the change to ServiceNetMap/VipSubnetMap interface, existing environments where they are overridden have to specify ‘merge’ strategy for the parameters in a new ‘parameter_merge_strategies’ section.
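
    A sketch of such an environment file; the ServiceNetMap entry shown is illustrative only:

    parameter_merge_strategies:
      ServiceNetMap: merge

    parameter_defaults:
      ServiceNetMap:
        NovaLibvirtNetwork: internal_api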

  • Zaqar has been removed as it was deprecated in Wallaby and is no longer in use on the undercloud. Additionally, it has not been supported in the overcloud.

Deprecation Notes

  • All of the hiera values for the service configuration are deprecated, and replaced by a new “AuditdConfig” dict to be passed in the parameter_defaults.

  • The tripleo-heat-templates parameter DnsServers has been deprecated.

    The dns_nameservers from the ctlplane subnets has been used by default for overcloud node nameservers for a long time, see: https://review.opendev.org/579582.

    Since Wallaby, network configuration is applied prior to the Heat stack create, during overcloud node provisioning. In this case the THT parameter DnsServers is not available when network configuration is applied. Effectively, the DnsServers parameter cannot be used in Wallaby and later releases.

  • The parameter SshServerOptionsOverrides has been deprecated since Ussuri. Use SshServerOptions to partially override sshd_config.

  • The following parameters have been deprecated and have no effect.

    • ManilaIsilonDriverHandlesShareServers

    • ManilaVNXDriverHandlesShareServers

    • ManilaVMAXDriverHandlesShareServers

  • The ManilaCephFSCephFSEnableSnapshots parameter has been deprecated, and has no effect now. Manila always enables snapshot support in Ceph FS backend since Wallaby.

  • Using environments/enable-designate.yaml has been deprecated in favor of environments/services/designate.yaml, the current location for environment files that enable TripleO components.

  • The GlanceShowMultipleLocations parameter is deprecated.

  • This change deprecates the nova-libvirt-container-puppet.yaml heat template, which configures the monolithic libvirt daemon. The newly added heat template for modular libvirt daemons will be used to configure libvirt services in different containers.

  • This change removes NetworkDeploymentActions and {{role.name}}NetworkDeploymentActions, since we can no longer rely on the Heat stack action when using ephemeral Heat in TripleO.

  • With the switch to ephemeral heat for the overcloud, the UndercloudMinion is no longer viable. Deploying UndercloudMinion is not supported anymore and the environment files to enable its deployment are dropped.

Bug Fixes

  • Adds the port used for directly accessing Ironic-Inspector using TLS, 13050, to the list of ports to permit inbound connections on.

  • Parameters FrrBgpAsn and FrrOvnBgpAgentAsn should be configured to a common integer value. The default value for FrrBgpAsn has been updated from 65000 to 64999, to be aligned with the default value of FrrOvnBgpAgentAsn.
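
    A sketch aligning both parameters on a common value, as recommended above:

    parameter_defaults:
      FrrBgpAsn: 64999
      FrrOvnBgpAgentAsn: 64999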

  • This fixes LP#1964733 and addresses the deprecation/abandonment of the puppet-auditd module.

  • Fixed wrong usage of the PasswordMinLen parameter and the PasswordWarnAge parameter.

  • Cinder NVMe-oF: Use the right port 4420 instead of 4460 and add the appropriate iptables rule for LVM+nvmet to work.

  • Cinder NVMe-oF: Cinder nodes were not loading the nvme-fabrics kernel module, so NVMe-oF would not work correctly on controller nodes.

  • Remove the processes plugin from the default collectd plugins list as it can cause logging to be flooded with messages such as ‘procs_running not found’.

  • The collectd-write-qdr.yaml no longer specifies a default CollectdAmqpInstances hash. When specified, it was not possible to override the parameter, resulting in a combined hash of the default values and the administrator’s custom values, which could lead to unexpected issues.

  • Delayed Nova Compute script uses /run/nova/startup for its state file instead of /run.

  • Fixes an issue where gateway ping validations performed during deployment would fail. When setting the ManageNetworks parameter to false and no gateway was configured, the list of gateway IP addresses to ping would include empty strings for networks with no gateway. The validation would attempt to run a ping command without the address to ping, which caused the deployment to fail. See bug: 1973866.

  • Upon including “environments/metrics/ceilometer-write-qdr.yaml” in the deploy command, the “CeilometerAgentIpmi” service does not start. If enabled, this service is supposed to run on individual compute nodes to gather IPMI sensor data using the “hardware.*” pollster. Updated every Compute role file where the “CeilometerAgentIpmi” service was found missing.

    Add the “--logfile” parameter to the ceilometer-polling command to log the process output to the “ipmi.log” file.

    The IPMI agent must run inside a privileged container, which is required to execute ipmitool commands on the host and gather power metrics. Without privileges, oslo.privsep.daemon reports “OSError: [Errno 1] Operation not permitted”.

  • Default of the NovaSyncPowerStateInterval parameter has been changed from 0 to 600, to use the default value consistent with the one defined in nova.

  • Fix missing roles for Octavia services.

  • When we install libvirt on a host, the system parameter fs.aio-max-nr is set to 1048576. Since we containerized libvirtd, we lost this system parameter. We now make sure it’s defined by adding it from the nova-libvirt-common template.

  • For HA services managed by pacemaker, it is now possible during a minor update to change Heat parameters ClusterCommonTag or ClusterFullTag to reconfigure the intermediate container image name used internally by pacemaker. This is achieved by running external_update_tasks with tag ha_image_update. A new pre-check ensures that when changing those Heat parameters, the external update tasks must have run prior to the regular update tasks.

  • The undercloud now disables [nova]/send_power_notifications in the ironic service. This fixes an issue where ironic-conductor on the undercloud would try to report power state changes to nova and fail because the nova service is not running on the undercloud. See bug: 2000308.

  • The neutron agent report interval was recently changed from the 30s default to 300s. This caused issues with timeouts when provisioning baremetal nodes. A new parameter IronicNeutronAgentReportInterval has been added with a default of 30s so that the report interval specifically for the networking baremetal agent is restored. See bug: 1940838.

  • The TripleO Nova Libvirt service unit no longer manages the QEMU driver cgroups via systemd, but delegates that to libvirt. As a result, newly created Nova Compute instances no longer experience problems with volume attachments, or with executing virsh commands in the libvirt podman container, after the libvirt service restarts multiple times.

  • The [oslo_messaging_rabbit] heartbeat_in_pthread parameter is set to False to workaround some known issues with non-wsgi services like nova-compute. In case the parameter should be overridden, use ExtraConfig or <role>ExtraConfig.

  • Do not change ownership recursively for Swift. This was required when deployments upgraded from baremetal to containerized deployments. However, by now all deployments should be containerized, and running a recursive chown against a large amount of data might time out during upgrades.

  • Avoid Octavia HAProxy logs showing “[ssl_c_s_dn]” instead of the client certificate DN string. TripleO uses Octavia’s own default user_log_format setting now if possible.

Other Notes

  • A new parameter MlnxSDNToken has been added to authenticate to the SDN controller.

  • Steps are taken to minimize chances of confusion between the default block storage volume type established by the CinderDefaultVolumeType parameter, and cinder’s own __DEFAULT__ volume type.

    In a new deployment where no volumes exist, cinder’s __DEFAULT__ type is deleted because it is redundant. In an upgrade scenario, if volumes exist then the __DEFAULT__ type’s description is updated to indicate the actual default volume type is the one established by the CinderDefaultVolumeType parameter.

  • Parameter DhcpAgentNotification is set to False by default now. It should be set to True when the Neutron DHCP agent is going to be deployed. It shouldn’t be enabled with the ML2/OVN backend.
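
    A minimal sketch for deployments that do deploy the Neutron DHCP agent:

    parameter_defaults:
      DhcpAgentNotification: true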

  • “podman image prune” is no longer used on the undercloud to remove unused images during the undercloud update/upgrade. With the usage of ephemeral Heat, not all images will always be used by running or stopped containers, so “podman image prune” should not be used to clean up the local container image storage. Images that are no longer being used can still be removed individually with “podman rmi”.

14.3.0

New Features

  • The libvirt driver has added support for hardware-offloaded OVS with vDPA (vhost Data Path Acceleration) type interfaces. vDPA allows virtio net interfaces to be presented to the guest while the datapath can be offloaded to a software or hardware implementation. This enables high performance networking with the portability of standard virtio interfaces.

    Nova added support for vhost-vdpa devices in Wallaby.

  • Added OVN DBs clustering support. In this service model, a clustered database runs across multiple hosts in multi-active mode.

  • To help operators protect their workload, they can now enable the KernelArgsDeferReboot role parameter. This will prevent the tripleo-kernel ansible module from automatically rebooting nodes even if KernelArgs were changed unexpectedly.

  • Enable image copy for multiple RBD Glance stores

    Previously when using multiple RBD glance stores the operator was responsible for copying the image to all stores. Nova-compute now has the ability to automatically copy an image to the local glance store when required. This change enables the feature and adds the following role specific parameters to control the behaviour.

    • NovaGlanceRbdCopyPollInterval

    • NovaGlanceRbdCopyTimeout

Upgrade Notes

  • Upgrades from OVN non-HA and OVN DBs pacemaker to OVN DBs clustered are currently not supported.

Security Issues

  • The OVN database servers in an OVN DBs clustering and TLS-everywhere deployment will listen on all IP addresses (0.0.0.0). This is a caveat that can only be addressed once RHBZ 1952038 is fixed.

Bug Fixes

  • NFSv4.2 has been available for a long time and is the default in RHEL/CentOS 8. The default for NovaNfsVersion has been changed from v4 to v4.2 accordingly.

14.2.0

Prelude

Enablement of data collection and transportation to an STF instance is now handled via existing templates.

New Features

  • The following parameters add support for mounting Cinder’s image conversion directory on an external NFS share; an illustrative sketch follows the list.

    • CinderImageConversionNfsShare

    • CinderImageConversionNfsOptions
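
    A minimal sketch of such an environment file; the share address and mount options shown are placeholders, not defaults:

    parameter_defaults:
      CinderImageConversionNfsShare: 192.168.122.1:/export/conversion   # placeholder
      CinderImageConversionNfsOptions: 'rw,sync'                        # placeholder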

  • The glance_api_cron container has been introduced, which executes the db purge job for the Glance service. Use GlanceCronDbPurge* parameters to override cron parameters.

  • The new MemcacheUseAdvancedPool parameter is added, which enables usage of the advanced pool for memcached connections in keystone middleware. This parameter is set to true by default to avoid bursting connections in some services like neutron.

  • When the nova_virtlogd container gets restarted, the instance console auth files will not be reopened again by virtlogd. As a result, either instances need to be restarted or live migrated to a different compute node to get new console log messages logged again. Usually, on receipt of SIGUSR1, virtlogd will re-exec() its binary while maintaining all current logs and clients. This allows for live upgrades of the virtlogd service on non-containerized environments, where updates happen just by doing an RPM update. To reduce the likelihood of this in a containerized environment, virtlogd should only be restarted on manual request, or on compute node reboot. It should not be restarted on a minor update without migrating instances off. This introduces a nova_virtlogd_wrapper container and virtlogd wrapper script, to only restart virtlogd on either manual or compute node restart.

  • Add support for OVS DPDK pmd auto balance parameters. This feature adds 3 new role specific THT parameters to set pmd-auto-lb-load-threshold, pmd-auto-lb-improvement-threshold, and pmd-auto-lb-rebal-interval in OVS through OvsPmdLoadThreshold, OvsPmdImprovementThreshold and OvsPmdRebalInterval respectively.
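
    A sketch assuming a DPDK compute role named ComputeOvsDpdk whose role-specific values are passed through the role’s Parameters map; the role name and threshold values are illustrative only:

    parameter_defaults:
      ComputeOvsDpdkParameters:
        OvsPmdLoadThreshold: 70          # illustrative
        OvsPmdImprovementThreshold: 25   # illustrative
        OvsPmdRebalInterval: 2           # illustrative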

  • Introduce new parameter to configure OVS PMD Auto Load Balance for OVS DPDK

  • New parameter RbdDiskCachemodes allows overriding the disk cache modes for RBD. Defaults to [‘network=writeback’].
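
    A minimal sketch restating the documented default; other modes follow the same list form:

    parameter_defaults:
      RbdDiskCachemodes:
        - 'network=writeback'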

  • Added Heat container tear down to the HeatEphemeral service to occur during upgrades. This will convert an undercloud from non-ephemeral heat to ephemeral heat when the service is enabled.

Upgrade Notes

  • When upgrading a deployment with the use of enable-stf.yaml, add the following files to your overcloud deployment command in order to maintain the existing services defined in enable-stf.yaml.

    • environments/metrics/collectd-write-qdr.yaml

    • environments/metrics/ceilometer-write-qdr.yaml

    • environments/metrics/qdr-edge-only.yaml

Bug Fixes

  • On the compute nodes, ssl certificates currently get created for libvirt, qemu-default, qemu-vnc and qemu-nbd. This is not required because all of the services use the same NovaLibvirtNetwork network and therefore multiple certificates for the same hostname get created. Also, from the qemu point of view, if the default_tls_x509_cert_dir and default_tls_x509_verify parameters get set for all certificates, there is no need to specify any of the other *_tls* config options. From Secure live migration with QEMU-native TLS:

    The intention (of libvirt) is that you can just use the default_tls_x509_* config attributes so that you don’t need to set any other *_tls* parameters, unless you need different certificates for some services. The rationale for that is that some services (e.g. migration / NBD) are only exposed to internal infrastructure; while some services (VNC, Spice) might be exposed publicly, so might need different certificates. For OpenStack this does not matter, though, we will stick with the defaults.

    Therefore, with this change InternalTLSNbdCAFile, InternalTLSVncCAFile and InternalTLSQemuCAFile get removed (they defaulted to /etc/ipa/ca.crt anyway) and just InternalTLSCAFile is used.

    Also, all certificates get created when EnableInternalTLS is true, and all SSL certificates are mounted from the host. This prevents certificate information from being unavailable in a qemu process’s container environment if features get switched later, which has shown to be problematic.

Other Notes

  • Using enable-stf.yaml now defines the expected configuration in OpenStack for use with Service Telemetry Framework. Removal of the defined resource_registry now requires passing additional environment files to enable the preferred data collectors and transport architecture, providing better flexibility to support additional architectures in the future.

  • These parameters can now be set per-role - DnfStreams, UpgradeInitCommand, UpgradeLeappCommandOptions, UpgradeLeappDevelSkip, UpgradeLeappToRemove, UpgradeLeappToInstall

14.1.2

New Features

  • The parameters CephHciOsdCount and CephHciOsdType were added in order to support the derive parameters feature for hyperconverged deployments when using cephadm.

14.1.0

Prelude

It’s not necessary to install ceph-ansible nor prepare a Ceph container when configuring external Ceph in Wallaby and newer. External ceph configuration is done with TripleO (not cephadm nor ceph-ansible) and should be executed using the related environment file.

New Features

  • Added TripleO support for the Unbound DNS resolver service.

  • Adds a new IronicInspectorStorageBackend parameter that can be used to set the storage backend for introspection data.

  • New environments are added at environments/disable-heat.yaml and environments/disable-neutron.yaml which can be used to disable those services.

  • The new parameter GlanceCinderMountPointBase has been added which will be used for mounting NFS volumes on glance nodes. When glance uses cinder as store and cinder backend is NFS, this parameter must be set to match cinder’s mount point.

  • Added new options for deploying Barbican with PKCS#11 backends: BarbicanPkcs11CryptoTokenLabels and BarbicanPkcs11CryptoOsLockingOk

  • The new GlanceCinderVolumeType parameter has been added, which is required when configuring multiple cinder stores as glance backends.

  • The logic to configure the connection from barbican to nShield HSMs has been augmented to parse a nshield_hsms parameter, which allows the specification of multiple HSMs. The underlying ansible role (ansible-role-thales-hsm) will configure the HSMs in load sharing mode to provide HA.

  • The OS::TripleO::{{role.name}}::PreNetworkConfig resource has been restored. This resource can be used to implement any configuration steps executed before network configurations are applied.

  • It is now possible to deploy Ceph with TripleO using cephadm.

  • New CinderRpcResponseTimeout and CinderApiWsgiTimeout parameters provide a means for configuring Cinder’s RPC response and WSGI connection timeouts, respectively.

  • The Cinder Backup service can be switched from running active/passive under pacemaker, to active-active mode where it runs simultaneously on every node on which it’s deployed. Note that the service will be restarted when switching modes, which will interrupt any backup operations currently in progress.

  • A new CinderBackupCompressionAlgorithm parameter supports specifying the compression algorithm used by Cinder Backup backends that support the feature. The parameter defaults to zlib, which is Cinder’s default value.

  • Two new parameters are added to control the concurrency of Cinder’s backup and restore operations:

    • CinderBackupWorkers

    • CinderBackupMaxOperations

  • Adds support for configuring the cinder-backup service with a Google Cloud Storage (GCS) backend, or an Amazon S3 backend.

  • The cinder-backup service can be configured to store backups on external Ceph clusters defined by the CephExternalMultiConfig parameter. New CinderBackupRbdClusterName and CinderBackupRbdClientUserName parameters can be specified, which override the default CephClusterName and CephClientUserName values respectively.

  • A new CinderRbdMultiConfig parameter may be used to configure additional cinder RBD backends on external Ceph clusters defined by the CephExternalMultiConfig parameter.

  • The environment file environments/external-ceph.yaml has been created and can be used when an external Ceph cluster is used.

  • Added FRR as a new TripleO service. This service allows cloud operators to deploy pure L3 control plane via BGP protocol. This has the following benefits:

    • Obtain multiple routes on multiple uplinks

    • BGP used for ECMP load balancing and BFD for resiliency

    • Advertise routes to API endpoints

    • Less L2 traffic

    Please refer to Install and Configure FRRouter specification for more information.

  • QemuDefaultTLSVerify will allow operators to enable or disable TLS client certificate verification. Enabling this option will reject any client who does not have a certificate signed by the CA in /etc/pki/qemu/ca-cert.pem. The default is true and matches libvirt’s. We will want to disable this by default in train.

  • The LibvirtDebug parameter has been added to enable or disable debug logging of libvirtd and virtlogd.

  • Now the debug logging of libvirtd and virtlogd is enabled automatically when the Debug parameter is true.

  • The manila_api_cron container has been introduced, which executes the db purge job for the Manila service. Use ManilaCronDbPurge* parameters to override cron parameters.

  • Added the possibility to configure the OVN DBs monitor interval in THT via OVNDBSPacemakerMonitorInterval (default 30s). Under load, this can create extra stress and since the timeout has already been bumped, it makes sense to bump this interval to a higher value as a trade off between detecting a failure and stressing the service.

  • Introducing the following parameters:

    • NovaComputeForceRawImages

    • NovaComputeUseCowImages

    • NovaComputeLibvirtPreAllocateImages

    • NovaComputeImageCacheManagerInterval

    • NovaComputeImageCacheRemoveUnusedBaseImages

    • NovaComputeImageCacheRemoveUnusedResizedMinimumAge

    • NovaComputeImageCachePrecacheConcurrency

  • When a node has hugepages enabled, we can help with live migrations by enabling NovaLiveMigrationPermitPostCopy and NovaLiveMigrationPermitAutoConverge. These flags are automatically enabled if hugepages are detected, but operators can override these settings.

  • Add the following parameters to tune the behavior of nova-scheduler to achieve better distribution of instances.

    • NovaSchedulerHostSubsetSize

    • NovaSchedulerShuffleBestSameWeighedHosts

  • Introduce a new compute role based parameter NovaGlanceEnableRbdDownload to enable direct download if rbd is used for glance but the compute is using local ephemeral storage, allowing nova-compute to download the images in this scenario directly from the glance ceph pool via rbd, instead of going through the glance api. If NovaGlanceEnableRbdDownload is set, by default the global RBD glance parameters CephClientUserName, GlanceRbdPoolName and CephClusterName are used for the ceph.conf. Glance supports multiple storage backends which can be configured using GlanceMultistoreConfig. If additional RBD glance backends are configured, NovaGlanceRbdDownloadMultistoreID can be used to point to the hash key (backend ID) of GlanceMultistoreConfig to use. If CephClientUserName or GlanceRbdPoolName are not set in the GlanceMultistoreConfig, the global values of those parameters will be used.

  • Add the NovaLibvirtMaxQueues role parameter to set [libvirt]/max_queues in nova.conf of the compute. The default 0 corresponds to not set, meaning the legacy limits based on the reported kernel major version will be used.

  • security-group logging is now supported under ML2/OVN. A more detailed explanation can be found in bug 1914757.

  • Adds pre_deploy_step_tasks support, which are run after kolla files are set up and podman is configured, but before any deployment task or external deployment task. The use case is being able to start containers before any deployment task.

  • Add parameter NovaSchedulerQueryPlacementForRoutedNetworkAggregates that allows the scheduler to verify whether the requested networks or the port are related to Neutron routed networks with some specific segments to use. In this case, the routed networks prefilter will require the related aggregates to be reported in Placement, so only hosts within the requested aggregates would be accepted. In order to support this behaviour, operators need to set the [scheduler]/query_placement_for_routed_network_aggregates configuration option which defaults to False.

  • The keystone_cron container was reintroduced to run the trust_flush job, which removes expired or soft-deleted trusts from the keystone database.

  • The KeystoneEnableDBPurge parameter was re-added, to enable or disable the purge job for Keystone.

  • The following parameters were added, to configure parameters about trust_flush cron job.

    • KeystoneCronTrustFlushEnsure

    • KeystoneCronTrustFlushMinute

    • KeystoneCronTrustFlushHour

    • KeystoneCronTrustFlushMonthday

    • KeystoneCronTrustFlushMonth

    • KeystoneCronTrustFlushWeekday

    • KeystoneCronTrustFlushMaxDelay

    • KeystoneCronTrustFlushDestination

    • KeystoneCronTrustFlushUser

  • Added PTP parameters for timemaster service configuration on overcloud compute nodes. Timemaster will use already present chrony parameters. PTPMessageTransport and PTPInterfaces are newly added.

Upgrade Notes

  • All service Debug parameters are now booleans as expected by oslo. This helps in proper validation and service template composition complexities.

  • The Keepalived service has been removed. The OS::Tripleo::Service::Keepalived resource should be removed during update/upgrade.

  • The iscsi deploy interface is no longer enabled by default in ironic, making the direct deploy interface the default. You will need to update your nodes to the direct deploy before upgrading or re-enable the iscsi deploy in IronicEnabledDeployInterfaces (but note that it is going to be deprecated in the future).

  • The IronicImageDownloadSource parameter has been changed to http by default making ironic cache glance images and serve them via a local HTTP server. Set the parameter to swift to return the previous behavior of relying on swift temporary URLs.

  • The NovaHWMachineType parameter now defaults x86_64 based instances to the unversioned q35 machine type. The remaining architecture machine type defaults are provided directly by OpenStack Nova.

    An environments/nova-hw-machine-type-upgrade.yaml environment file has been provided to pin NovaHWMachineType to the previous versioned machine type defaults during an upgrade.

    When the upgrade of the overcloud is complete the following OpenStack Nova documentation should then be used to ensure a machine type is recorded for all existing instances before the new NovaHWMachineType default can be used in the environment.

    https://docs.openstack.org/nova/latest/admin/hw-machine-type.html#update

  • Users of the OS::TripleO::Network::Ports::RedisVipPort and OS::TripleO::Network::Ports::OVNDBsVipPort interfaces must update their templates. The interfaces have been removed, and the management of these virtual IPs has been moved to the tripleo-heat-templates service template.

    This change will typically affect deployments using already deployed servers. Typically the virtual IPs for Redis and OVNDBs were overridden using the deployed-neutron-port template. For example:

    resource_registry:
      OS::TripleO::Network::Ports::RedisVipPort: /usr/share/openstack-tripleo-heat-templates/deployed-server/deployed-neutron-port.yaml
      OS::TripleO::Network::Ports::OVNDBsVipPort: /usr/share/openstack-tripleo-heat-templates/deployed-server/deployed-neutron-port.yaml
    
    parameter_defaults:
      DeployedServerPortMap:
        redis_virtual_ip:
          fixed_ips:
            - ip_address: 192.168.100.10
          subnets:
            - cidr: 192.168.100.0/24
          network:
            tags:
              - 192.168.100.0/24
        ovn_dbs_virtual_ip:
          fixed_ips:
            - ip_address: 192.168.100.11
          subnets:
            - cidr: 192.168.100.0/24
          network:
            tags:
              - 192.168.100.0/24
    

    This will have to be changed. The following example shows how to replicate the above configuration:

    parameter_defaults:
      RedisVirtualFixedIPs:
        - ip_address: 192.168.100.10
          use_neutron: false
      OVNDBsVirtualFixedIPs:
        - ip_address: 192.168.100.11
          use_neutron: false
    
  • The legacy DefaultPasswords interface to use passwords from heat resources has been removed as we don’t use it anymore.

  • The OVNVifType parameter has been removed because the parameter was not used in Neutron.

  • The following two services have been removed, and should be removed from role data during upgrade.

    • OS::TripleO::Services::CinderBackendVRTSHyperScale

    • OS::TripleO::Services::VRTSHyperScale

  • Remove deprecated OS::TripleO::Services::CinderBackendDellEMCXTREMIOIscsi. Use OS::TripleO::Services::CinderBackendDellEMCXtremio instead.

Deprecation Notes

  • The IronicInspectorUseSwift parameter has been deprecated in favor of IronicInspectorStorageBackend and will be removed in a future release.

  • The BarbicanPkcs11CryptoTokenLabel option has been deprecated and replaced with the BarbicanPkcs11CryptoTokenLabels option.

  • Some parameters within ThalesVars have been deprecated. These are - thales_hsm_ip_address and thales_hsm_config_location. See environments/barbican-backend-pkcs11-thales.yaml for details.

  • Ceph Deployment using Ceph versions older than Octopus is deprecated.

  • The CephOsdPercentageMin parameter has been deprecated and has a new default of 0 so that the validation is not run. There is no need to fail the deployment early if a percentage of the OSDs are not running because the Ceph pools created for OpenStack can now be created even if there are 0 OSDs as the PG number is no longer required on pool creation. TripleO no longer waits for OSD creation and instead only queues the request for OSD creation with the ceph orchestrator.

  • The environment file environments/ceph-ansible/ceph-ansible-external.yaml has been deprecated and will be removed in X.

  • The interfaces OS::TripleO::Network::Ports::RedisVipPort and OS::TripleO::Network::Ports::OVNDBsVipPort have been removed. The resources are no longer used in the overcloud heat stack.

  • Support for the Veritas HyperScale Driver has been removed.

Bug Fixes

  • Now the ExtraConfigPre resource and the NodeExtraConfig resource are executed after network configurations are applied on nodes. This is consistent with the previous version that used the heat software deployment mechanism instead of config-download.

  • The default value of CinderNfsSnapshotSupport has been changed from true to false, to be consistent with the default value in cinder.

  • Previously, access to the sshd run by the nova-migration-target container was only limited via the sshd_config. While login was not possible from other networks, the service was reachable via all networks. This change limits the access to the NovaLibvirt and NovaApi networks, which are used for cold and live migration.

  • Nova vnc configuration right now uses NovaVncProxyNetwork, NovaLibvirtNetwork and NovaApiNetwork to configure the different components (novnc proxy, nova-compute and libvirt) for vnc. If one of the networks gets changed from internal_api, the service configuration between libvirt, nova-compute and the novnc proxy becomes inconsistent and the console is broken. This was changed to use just NovaLibvirtNetwork for configuring the vnc endpoints, and NovaVncProxyNetwork is removed completely.

  • Decrease Swift proxy timeouts for GET/HEAD requests using a new parameter named SwiftProxyRecoverableNodeTimeout. The default node timeout is 10 seconds in Swift, however this has been set to 60 seconds in TripleO in case there are slow nodes. However, this affects all requests - GET, HEAD and PUT. GET/HEAD requests are typically much faster, thus it makes sense to use a lower timeout to recover earlier from node failures. This will increase stability, because the proxy can select another backend node to retry the request.

  • Bug #1915800: Add support for ports filtering in XtremIO driver.

Other Notes

  • The CephPoolDefaultPgNum parameter default is now 16. The Ceph pg_autoscaler is enabled by default in the supported versions of Ceph, though the parameter CephPoolDefaultPgNum may still be used as desired.

  • The default value of the parameter ‘RabbitAdditionalErlArgs’ was updated to include the new options ‘+sbwtdcpu none +sbwtdio none’ which disables busy-wait for dirty cpu schedulers and dirty i/o schedulers respectively. This aligns with the flags recommended by RabbitMQ upstream (https://www.rabbitmq.com/runtime.html#busy-waiting).

14.0.0

New Features

  • Added the MemcachedMaxConnections setting with a default of 8192 maximum connections in order to allow an operator to override that value in environments where memcached is heavily solicited.

  • The aodh_api_cron container has been added to run aodh-expirer command periodically, to remove expired alarms from Aodh database. Use AodhExpire* parameters to override cron parameters.

  • The new AodhAlarmHistoryTTL parameter has been added, which defines the TTL of alarm histories in aodh. This parameter is set to 86400 by default.

  • Support deploying multiple Cinder Netapp Storage backends. CinderNetappBackendName is enhanced to support a list of backend names, and a new CinderNetappMultiConfig parameter provides a way to specify parameter values for each backend.

  • Introducing the new NovaSchedulerEnabledFilters based on the new nova parameter filter_scheduler.enabled_filters.

  • The parameter NovaComputeStartupDelay allows the operator to delay the startup of nova-compute after a compute node reboot. When all the overcloud nodes are rebooted at the same time, it can take a few minutes for the Ceph cluster to get into a healthy state. This delay will prevent the instances from booting before the Ceph cluster is healthy.

  • The NovaApiMaxLimit parameter allows the operator to set Nova API max_limit using a Heat parameter in their templates.

  • New Heat parameter ClusterFullTag controls how we configure pacemaker container image names for the HA services. Compared to the previous parameter ClusterCommonTag, this new naming convention allows any part of the container image name to change during a minor update, without service disruption (e.g., registryA/namespaceA/imgA:tagA to registryB/namespaceB/imgB:tagB). This new parameter ClusterFullTag is enabled by default.

  • Refresh Swift ring files without restarting containers. This makes it possible to update rings without service restarts, lowering the overhead for updates.

  • The SwiftHashPrefix parameter allows the operator to set Swift swift_hash_path_prefix using a Heat parameter in their templates.

  • OVN now supports VXLAN network type for tenant networks.

Known Issues

  • Cell_v2 discovery has been moved from the nova-compute|nova-ironic containers as this requires nova api database credentials which must not be configured for the nova-compute service. As a result scale-up deployments which explicitly omit the Controller nodes will need to make alternative arrangements to run cell_v2 discovery. Either the nova-manage command can be run manually after scale-up, or an additional helper node using the NovaManage role can be deployed that will be used for this task instead of a Controller node. See Bug: 1786961 and Bug: 1871482.

Upgrade Notes

  • The following parameters have been removed since they have had no effect.

    • NovaDbSyncTimeout

    • ExtractedPlacementEnabled

  • The EnableEtcdInternalTLS parameter’s default value changes from false to true. The change is related to the fact that novajoin is deprecated, and the functionality associated with the EnableEtcdInternalTLS parameter is not required when TLS is deployed using the tripleo-ansible ansible module.

  • The AdminEmail parameter has been removed because it has had no effect since TripleO had bootstrap support implemented.

  • Support for Sahara service has been removed.

Deprecation Notes

  • The EnableEtcdInternalTLS parameter is deprecated. It was added to support a workaround that is necessary when novajoin is used to deploy TLS, but novajoin itself is deprecated. The workaround is not necessary when TLS is deployed using the tripleo-ansible ansible module.

  • Deprecating NovaSchedulerDefaultFilters, it’s replaced with the new setting, NovaSchedulerEnabledFilters.

  • Zaqar services are deprecated for removal.

Bug Fixes

  • When deploying a spine-and-leaf (L3 routed architecture) with TLS enabled for internal endpoints, the deployment would fail because some roles are not connected to the network mapped to the service in ServiceNetMap. To fix this issue a role specific parameter {{role.name}}ServiceNetMap is introduced (defaults to: {}). The role specific ServiceNetMap parameter allows the operator to override one or more service network mappings per-role. For example:

    ComputeLeaf2ServiceNetMap:
      NovaLibvirtNetwork: internal_api_leaf2
    

    The role specific {{role.name}}ServiceNetMap override is merged with the global ServiceNetMap when it’s passed as a value to the {{role.name}}ServiceChain resources, and the {{role.name}} resource groups so that the correct network for this role is mapped to the service.

    Closes bug: 1904482.

  • Certificates get merged into the containers using the kolla_config mechanism. If a certificate changes, or e.g. UseTLSTransportForNbd gets disabled and enabled at a later point, the containers running the qemu process miss the required certificates and live migration fails. This change moves to using bind mounts for the certificates and, in the case of UseTLSTransportForNbd, creates the required certificates even if UseTLSTransportForNbd is set to False. With this, UseTLSTransportForNbd can be enabled/disabled as the required bind mounts/certificates are already present.

  • https://review.opendev.org/q/I8df21d5d171976cbb8670dc5aef744b5fae657b2 introduced THT parameters to set libvirt/cpu_mode. The patch incorrectly set NovaLibvirtCPUMode to the ‘none’ string, which resulted in puppet-nova not handling the default cases correctly and setting libvirt/cpu_mode to none, which results in the ‘qemu64’ CPU model, which is highly buggy and undesirable for production usage. This changes the default to the recommended CPU mode ‘host-model’, for various benefits documented elsewhere.

  • When using RHSM Service (deployment/rhsm/rhsm-baremetal-ansible.yaml) based registration of the overcloud nodes and enabling KSM using NovaComputeEnableKsm=True, the overcloud deployment will fail because the RHSM registration and the ksm task run as host_prep tasks. Enabling/disabling ksm is now handled in deploy step 1.

  • In case of a cellv2 multicell environment, nova-metadata is the only httpd managed service on the cell controller role. In case of tls-everywhere, it is required that the cell controller host has the needed metadata to be able to request the HTTP certificates. Otherwise the getcert request fails with “Insufficient ‘add’ privilege to add the entry ‘krbprincipalname=HTTP/cell1-cellcontrol-0….’”

  • Do not relabel Swift files on every container (re-)start. These will be relabeled already in step 3 preventing additional delays.

Other Notes

  • A new parameter called RabbitTCPBacklog that specifies the maximum tcp backlog for RabbitMQ has been added. The value defaults to 4096 to match previous behavior and can be modified by operators for the needs of larger scale deployments.