Train Series Release Notes

9.3.2-21

New Features

  • Adds a new flag, docker_disable_default_network, which defaults to no. Docker uses 172.17.0.0/16 by default for bridge networking on docker0, and this might cause routing problems for operator networks. Setting this flag to yes disables Docker’s bridge networking. This feature will be enabled by default from the Wallaby 12.0.0 release.
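
    For example, a single globals.yml entry opts in to this behaviour (shown for illustration):

      docker_disable_default_network: "yes"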

Upgrade Notes

  • The baremetal role now uses the CentOS 8 package repository for Docker CE (previously CentOS 7).

Security Issues

  • Adds mitigation for the Apache Log4j2 Remote Code Execution (RCE) Vulnerability in Elasticsearch - CVE-2021-44228.

Bug Fixes

  • Fixes an issue when Swift is deployed with the S3 Token Middleware enabled. LP#1862765

  • Fixed an issue on Debian/Ubuntu where Docker was configured after startup, which resulted in iptables rules being created before they could be disabled. LP#1923203

  • Fixes iscsid failing in current CentOS 8 based images due to the pid file being needlessly set. LP#1933033

  • Fixes being unable to connect to the Zun console when kolla_enable_tls_external is true. Access to the console of any Zun container would fail in this configuration. This fix sets the protocol for the wsproxy base_url in zun.conf according to the value of kolla_enable_tls_external. LP#1957117

  • Adds a new flag, docker_disable_ip_forward, which defaults to no and can be used (by setting it to yes) to disable Docker’s ip-forward option, which makes Docker set the net.ipv4.ip_forward sysctl to 1. This protects against inadvertently creating hosts that forward all traffic. LP#1931615

  • Adds back the option to configure the RabbitMQ clustering interface via Kolla. LP#1900160 <https://bugs.launchpad.net/kolla-ansible/+bug/1900160>

  • Fixes an issue seen when using Jinja2 3.1.0.

9.3.2

Bug Fixes

  • Fixed a bug where sriov_agent.ini was not copied due to a permission denied error. LP#1923467

  • Fixed an issue where the Docker Python SDK 5.0.0 was failing due to missing six; introduced a constraint to install a version lower than 5.x. LP#1928915

  • Fixes random failures when upgrading RabbitMQ clusters of more than two nodes. LP#1930293.

  • Fixes a potential issue with Alertmanager in non-HA deployments. In this scenario, the peer gossip protocol is now disabled and Alertmanager won’t try to form a cluster with non-existent other instances. LP#1926463

  • Fixes some configuration issues around Barbican logging. LP#1891343

  • Fixes some configuration issues around Cinder logging. LP#1916752

  • Fixes an issue where keepalived was not recreated during an upgrade if its configuration was unchanged. LP#1928362

  • Fixes an issue with executing kolla-ansible when installed via pip install --user. LP#1915527

  • Removes whitespace around equal signs in zookeeper.cfg which was preventing the zkCleanup.sh script from running correctly.

9.3.1

Bug Fixes

  • Fixes the wrong configuration of the ovs-dpdk service, which broke kolla-ansible deployment. For more details please see bug 1908850.

9.3.0

New Features

  • Adds a new flag, docker_disable_default_iptables_rules, which defaults to no. Docker manipulates iptables rules by default to provide network isolation, and this might cause problems if the host already has an iptables-based firewall. A common problem is that Docker sets the default policy of the FORWARD chain in the filter table to DROP. Setting docker_disable_default_iptables_rules to yes disables Docker’s iptables manipulation. This feature will be enabled by default from the Victoria 11.0.0 release.
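
    As an illustration, the corresponding globals.yml entry would be:

      docker_disable_default_iptables_rules: "yes"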

  • Improves performance of the common role by generating all fluentd configuration in a single file.

  • Improves performance of the common role by generating all logrotate configuration in a single file.

Upgrade Notes

  • The default value of REST_API_REQUIRED_SETTINGS was synchronized with Horizon. You may want to review settings exposed by the updated configuration.

Security Issues

  • The admin-openrc.sh file generated by kolla-ansible post-deploy was previously created with root:root ownership and 644 permissions. This would allow anyone with access to the same directory to read the file, including the admin credentials. The ownership of admin-openrc.sh is now set to the user executing kolla-ansible, and the file is assigned a mode of 600. This change can be applied by running kolla-ansible post-deploy.

Bug Fixes

  • Adds support for using bifrost-deploy behind a proxy. It uses the existing container_proxy variable.

  • Fixes handling of /dev/kvm permissions to be more robust against host-level actions. LP#1681461

  • Reworks the Keystone Fernet bootstrap, which tended to fail on multinode setups. See bug 1846789 for details.

  • IPv6 fully-routed topology (/128 addressing) is now allowed (where applicable). LP#1848941

  • Fixes deployment of fluentd without any enabled OpenStack services. LP#1867953

  • This patch adds kolla-ansible internal logrotate configuration for Logstash. Logstash 2.4 uses the logrotate configuration integrated in the container, which tries to rotate logs in /var/log/logstash, while kolla-ansible deployed Logstash logs to /var/log/kolla/logstash. LP#1886787

  • Fixes --configdir parameter to apply to default passwords.yml location. LP#1887180

  • Fluentd now logs to /var/log/kolla/fluentd/fluentd.log instead of to stdout. LP#1888852

  • Fixes deploy-containers action missing for the Masakari role. LP#1889611

  • Fixed an issue where the keystone container would be stuck in a restart loop with a message that the Fernet key is stale. LP#1895723

  • Fixes the haproxy_single_service_split template to work with the default mode (http). LP#1896591

  • Fixed invalid fernet cron file path on Debian/Ubuntu from /var/spool/cron/crontabs/root/fernet-cron to /var/spool/cron/crontabs/root. LP#1898765

  • Adds with_first_found to the placement role for the placement-api WSGI configuration, to allow users to override it. LP#1898766

  • RabbitMQ services are now restarted serially to avoid a split brain. LP#1904702

  • Fixes LP#1906796 by adding the notice and note log levels to the Monasca log-metrics drop configuration.

  • Fixes Swift’s stop action. It will no longer try to start swift-object-updater container again. LP#1906944

  • Fixes an issue with the kolla-ansible prechecks command with Docker 20.10. LP#1907436

  • Fixes an issue with kolla-ansible mariadb_recovery when the mariadb container does not exist on one or more hosts. LP#1907658

  • Fixes Freezer deployment failing when kolla_dev_mod is used. LP#1888242

  • Fixes issues with some CloudKitty commands trying to connect to an external TLS endpoint using HTTP. LP#1888544

  • Fixes an issue where Docker may fail to start if iptables is not installed. LP#1899060

  • Fixes an issue during deleting evacuated instances with encrypted block devices. LP#1891462

  • Fixes an issue where Keystone Fernet key rotation may fail due to permission denied error if the Keystone rotation happens before the Keystone container starts. LP#1888512

  • Fixes an issue with Keystone startup when Fernet key rotation does not occur within the configured interval. This may happen due to one of the Keystone hosts being down at the scheduled time of rotation, or due to uneven intervals between cron jobs. LP#1895723

  • Fixes an issue where Grafana instances would race to bootstrap the Grafana DB. See LP#1888681.

  • Fixes LP#1892210 where the number of open connections to Memcached from neutron-server would grow over time until reaching the maximum set by memcached_connection_limit (5000 by default), at which point the Memcached instance would stop working.

  • Fixes an issue where, when Kafka default topic creation was used to create a Kafka topic, no redundant replicas were created in a multi-node cluster. LP#1888522. This affects Monasca, which uses Kafka, and was previously masked by the legacy Kafka client used by Monasca, which has since been upgraded in Ussuri. Monasca users with multi-node Kafka clusters should consult the Kafka documentation to increase the number of replicas.

  • Fixes an issue where the br_netfilter kernel module was not loaded on compute hosts. LP#1886796

  • Prevents adding a new Keystone host to an existing cluster when not targeting all Keystone hosts (e.g. due to --limit or --serial arguments), to avoid overwriting existing Fernet keys. LP#1891364

  • Reduces the use of SQLAlchemy connection pooling to improve service reliability during a failover of the controller with the internal VIP. LP#1896635

  • No longer configures the Prometheus OpenStack exporter to use the prometheus Docker volume, which was never required.

9.2.0

New Features

  • Adds ability to provide a custom elasticsearch config.

  • Adds Elasticsearch Curator for managing aggregated log data.

  • Kolla Ansible now checks that the local Ansible Python environment is coherent, i.e. that the Ansible in use can see Kolla Ansible. LP#1856346

Upgrade Notes

  • Avoids unnecessary fact gathering using the setup module. This should improve the performance of environments using fact caching and the Ansible smart fact gathering policy. See blueprint for details.

  • Adds elasticsearch_use_v6 and kibana_use_v6 flags which can be set to true to deploy the elasticsearch6 and kibana6 images on CentOS 7 or 8. These flags are true by default on CentOS 8, and false elsewhere. The services should be upgraded from 5.x to 6.x via kolla-ansible upgrade elasticsearch,kibana, and this can be used to provide an Elasticsearch 6.x cluster that is compatible between CentOS 7 and 8.
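
    An illustrative globals.yml sketch for opting in explicitly ahead of the CentOS 8 migration:

      elasticsearch_use_v6: true
      kibana_use_v6: true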

  • In the previous stable release, the octavia user was no longer given the admin role in the admin project, and a task was added to remove the role during upgrades. However, the octavia configuration was not updated to use the service project, causing load balancer creation to fail.

    There is also an issue for existing deployments in simply switching to the service project. While existing load balancers appear to continue to work, creating new load balancers fails due to the security group belonging to the admin project. For this reason, Train and Stein have been reverted to use the admin project by default, while from the Ussuri release the service project will be used by default.

    To provide flexibility, an octavia_service_auth_project variable has been added. In the Train and Stein releases this is set to admin by default, and from Ussuri it will be set to service by default. For users of Train and Stein, octavia_service_auth_project may be set to service in order to avoid a breaking change during the Ussuri upgrade.

    To switch an existing deployment from using the admin project to the service project, it will at least be necessary to create the required security group in the service project, and update octavia_amp_secgroup_list to this group’s ID. Ideally the Amphora flavor and network would also be recreated in the service project, although this does not appear to be necessary for operation, and will impact existing Amphorae.

    See bug 1873176 for details.
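
    As a sketch, Train users opting in early (after creating the required security group in the service project) would set something like:

      # remember to also point octavia_amp_secgroup_list at the security
      # group created in the service project (see above)
      octavia_service_auth_project: "service"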

  • Changes the default value of kibana_elasticsearch_ssl_verify from false to true. LP#1885110

  • Apache ZooKeeper will now be automatically deployed whenever Apache Storm is enabled.

  • When deploying Monasca with Logstash 6 (the default for CentOS 8), any custom Logstash 2 configuration for Monasca will need to be updated to work with Logstash 6. Please consult the documentation.

Bug Fixes

  • Fixes Kibana deployment with the new E*K stack (6+). LP#1799689

  • Removes the chrony package and AppArmor profile from the Docker host if containerized chrony is enabled. LP#1882513

  • Escapes table names in the MariaDB upgrade procedure. LP#1883141

  • Fixes an issue with Manila deployment starting openvswitch and neutron-openvswitch-agent containers when enable_manila_backend_generic was set to False. LP#1884939

  • Fixes the Elasticsearch Curator cron schedule run. LP#1885732

  • Fixes an incorrect configuration for nova-conductor when a custom Nova policy was applied, preventing the nova_conductor container from starting successfully. LP#1886170

  • Adds missing “become: true” to some VMware-related tasks: Copying VMware vCenter CA file and Copying over nsx.ini.

  • Fixes Nova deployment failing when kolla_dev_mod is used.

  • In line with clients for other services used by Magnum, the Cinder and Octavia clients now also use endpoint_type = internalURL. Similarly, these services also use the globally defined openstack_region_name.

  • Fixes the default CloudKitty configuration, which included the gnocchi_collector and keystone_fetcher options that were deprecated in Stein and removed in Train. See bug 1876985 for details.

  • Fixes an issue with Cinder upgrades that would cause online schema migration to fail. LP#1880753

  • Fixes the Cyborg API container failing to load the API paste file. For details please see bug 1874028.

  • Fixes the configuration of the etcd service so that its protocol is independent of the value of the internal_protocol parameter. The etcd service is not load balanced by HAProxy, so there is no proxy layer to do TLS termination when internal_protocol is configured to be https.

  • Fixes an issue where fernet_token_expiry would fail the pre-checks despite being set to a valid value. Please see bug 1856021 for more details.

  • The kolla_logs Docker volume is now mounted into the Elasticsearch container to expose logs which were previously written erroneously to the container filesystem (bug 1859162). It is up to the user to migrate any existing logs if they so desire and this should be done before applying this fix.

  • In the previous stable release, the octavia user was no longer given the admin role in the admin project, and a task was added to remove the role during upgrades. However, the octavia configuration was not updated to use the service project, causing load balancer creation to fail. See upgrade notes for details. LP#1873176

  • Fixes an issue with RabbitMQ where tags would be removed from the openstack user after deploying Nova. This prevents the user from accessing the RabbitMQ management UI. LP#1875786

  • Adds a new variable fluentd_elasticsearch_cacert, which defaults to the value of openstack_cacert. If set, this will be used to set the path of the CA certificate bundle used by Fluentd when communicating with Elasticsearch. LP#1885109

  • Improves error reporting in kolla-genpwd and kolla-mergepwd when input files are not in the expected format. LP#1880220.

  • Fixes Magnum trust operations in multi-region deployments.

  • Deploys Apache ZooKeeper when Apache Storm is enabled explicitly. Previously, ZooKeeper would only be deployed if Apache Kafka was also enabled, which is often done implicitly by enabling Monasca.

  • When deploying Elasticsearch 6 (the default for CentOS 8), Logstash 2 was deployed by default, which is not compatible with Elasticsearch 6. Logstash 6 is now deployed by default when using CentOS 8 containers.

9.1.0

New Features

  • Adds support for CentOS 8 as a host Operating System and base container image. This is the only major version of CentOS supported from the Ussuri release. The Train release supports both CentOS 7 and 8 hosts, and provides a route for migration.

  • Adds Object Storage service (Swift) support for Ironic.

  • Adds a new variable, openstack_tag, which is used as the default Docker image tag in place of openstack_release. The default value is openstack_release, with a suffix set via openstack_tag_suffix. The suffix is empty except on CentOS 8 where it is set to -centos8. This allows for the availability of images based on CentOS 7 and 8.
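
    As a sketch of how these variables combine (values shown are illustrative for a Train deployment on CentOS 8):

      openstack_release: "train"
      openstack_tag_suffix: "-centos8"  # empty by default except on CentOS 8
      # resulting default image tag: openstack_tag == "train-centos8"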

Upgrade Notes

  • Some images are supported by CentOS 7 but lack suitable packages in CentOS 8, and are not supported for CentOS 8. See Kolla release notes for details.

  • Adds a rabbitmq_use_3_7_24_on_centos7 flag which can be set to true to deploy the rabbitmq-3.7.24 image on CentOS 7. The image should be deployed via kolla-ansible upgrade, and can be used to provide a RabbitMQ cluster that is compatible with the CentOS 8 rabbitmq image.
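
    For example, on CentOS 7 hosts preparing for the migration (illustrative globals.yml entry; the image is then rolled out via kolla-ansible upgrade):

      rabbitmq_use_3_7_24_on_centos7: true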

  • Support for the SCSI target daemon (tgtd) has been removed for CentOS/RHEL 8. The default value of cinder_target_helper is now lioadm on CentOS/RHEL 8, but remains as tgtadm on other platforms.

  • The octavia user is no longer given the admin role in the admin project. Octavia does not require this role and instead uses the octavia user with the admin role in the service project. During an upgrade the octavia user is removed from the admin project. See bug 1873176 for details.

Security Issues

  • Fixes leak of RabbitMQ password into Ansible logs. LP#1865840

Bug Fixes

  • Fixes the Cyborg conductor failing to communicate with Placement. See bug 1873717.

  • Fixes the Cyborg agent failing to start the privsep daemon by adding the privileged capability for the Cyborg agent. See bug 1873715.

  • Adds necessary region_name to octavia.conf when enable_barbican is set to true. LP#1867926

  • Adds /etc/timezone to Debian/Ubuntu containers. LP#1821592

  • Fixes an issue with Nova live migration not using migration_interface_address even when TLS was not used. When migrating an instance to a newly added compute host, if addressing depended on /etc/hosts and it had not been updated on the source compute host to include the new compute host, live migration would fail. This did not affect DNS-based name resolution. Similarly, Nova live migration would fail if the address in DNS or /etc/hosts was not the same as migration_interface_address due to user customization. LP#1729566

  • Fixes QEMU loading of ceph.conf (permission error). LP#1861513

  • Removes /run bind mounts in Neutron services, which caused host-level dbus errors, and adds /run/netns for neutron-dhcp-agent and neutron-l3-agent. LP#1861792

  • Fixes an issue where old fluentd configuration files would persist in the container across restarts despite being removed from the node_custom_config directory. LP#1862211

  • Uses a more permissive regex to remove the offending 127.0.1.1 line from /etc/hosts. LP#1862739

  • Each Prometheus mysqld exporter now points to its local mysqld instance (MariaDB) instead of the VIP address. LP#1863041

  • Cinder Backup now has access to kernel modules so it can load e.g. the iscsi_tcp module. LP#1863094

  • Makes the RabbitMQ hostname address resolution precheck stricter by requiring unique resolution, to avoid later issues. LP#1863363

  • Fixes the protocol used by neutron-metadata-agent to connect to the Nova metadata service. This possibly affected internal TLS setups. LP#1864615

  • Fixes haproxy role to avoid restarting haproxy service multiple times in a single Ansible run. LP#1864810 LP#1875228

  • Fixes an issue with deploying Grafana when using IPv6. LP#1866141

  • Fixes elasticsearch deployment in IPv6 environments. LP#1866727

  • Fixes failure to deploy telegraf with monitoring of zookeeper due to wrong variable being referenced. LP#1867179

  • Fixes a Ceph deployment reconfiguration error where the Gathering OSDs step would fail due to the Kolla-Ansible user not having access to /var/lib/ceph/osd/_FSID_/whoami. LP#1867946

  • Fixes the missing glance_ca_certificates_file variable in glance.conf. LP#1869133

  • Fixes designate-worker to not use etcd as its coordination backend, because it is not supported by Designate (no group membership support available via tooz). LP#1872205

  • Fixes source-IP-based load balancing for Horizon when using the “split” HAProxy service template.

  • Fixes issue where HAProxy would have no backend servers in its config files when using the “split” config template style.

  • Manages Nova scheduler workers through the openstack_service_workers variable. LP#1873753

  • Removes the meta field of the Swift rings from the default rsync_module template. Having it set by default and undocumented could lead to unexpected behavior, since the Swift documentation states that this field is not processed.

  • Fixes the Elasticsearch URL scheme used by fluentd when kolla_enable_tls_internal is true.

  • Fixes an issue with HAProxy prechecks when scaling out using --limit or --serial. LP#1868986.

  • Fixes an issue with the HAProxy monitor VIP precheck when some instances of HAProxy are running and others are not. See bug 1866617.

  • Fixes MariaDB issues in multinode scenarios which affected deployment, reconfiguration, upgrade and Galera cluster resizing. They usually manifested as WSREP issues in various places and could lead to the need to recover the Galera cluster. Note these issues were due to how MariaDB was handled during Kolla Ansible runs and did not affect the Galera cluster during normal operations unless MariaDB was later touched by Kolla Ansible. Users wishing to run actions on their Galera clusters using Kolla Ansible are strongly advised to update. For details please see the following Launchpad bug records: bug 1857908 and bug 1859145.

  • Fixes an issue with Nova when deploying new compute hosts using --limit. LP#1869371.

  • Adapts Octavia to the latest dual CA certificate configuration. The following files should exist in /etc/kolla/config/octavia/:

    • client.cert-and-key.pem

    • client_ca.cert.pem

    • server_ca.cert.pem

    • server_ca.key.pem

    See the Octavia documentation for details on generating these files.

  • Since OpenStack services can now be configured to use TLS-enabled REST endpoints, URLs should be constructed using the {{ internal_protocol }} and {{ external_protocol }} configuration parameters.

  • Constructs service REST API URLs using kolla_internal_fqdn instead of kolla_internal_vip_address. Otherwise, SSL validation would fail when certificates are issued using domain names.

  • Fixes an issue with the kolla-ansible stop command where it may fail trying to stop non-existent containers. LP#1868596.

  • Fixes gnocchi-api script name for Ubuntu/Debian binary deployments. LP#1861688

  • Fixes an issue where host configuration tasks (sysctl, loading kernel modules) could be performed during the kolla-ansible genconfig command. See bug 1860161 for details.

  • Fixes an issue where openstack_release was set to master by default, resulting in containers tagged master being deployed. It has been changed to train. The same applies to kolla_source_version, which affects development mode. See bug 1866054 for details.

  • Fixes an issue with port prechecks for the Placement service. See bug 1861189 for details.

  • Removes the [http]/max-row-limit = 10000 setting from the default InfluxDB configuration, which resulted in the CloudKitty v1 API returning only 10000 dataframes when using InfluxDB as a storage backend. See bug 1862358 for details.

  • Skydive’s API and the web UI now rely on Keystone for authentication. Only users in the Keystone project defined by skydive_admin_tenant_name will be able to authenticate. See LP#1870903 <https://launchpad.net/bugs/1870903> for more details.

  • masakari-monitor will now use the internal API to reach masakari-api. LP#1858431

  • Switches the endpoint_type used by Octavia to communicate with the Barbican service from public to internal. See bug 1875618 for details.

9.0.1

Bug Fixes

  • External Ceph: also copies the Cinder keyring to nova-compute. Since Train, nova-compute also needs the Cinder key in case the rbd user is set to Cinder, because volume/pool checks have been moved to use the rbd Python library. Fixes LP#1859408

  • Adds configuration to set also_notifies within the pools.yaml file when using the Infoblox backend for Designate.

    Pushing a DNS NOTIFY packet to the master does not cause the DNS update to be propagated to other nodes within the cluster. This means each node needs a DNS NOTIFY packet; otherwise users may be given a stale DNS record if they query any worker node. For details please see bug 1855085

  • Fixes an issue with Docker client timeouts where Docker reports ‘Read timed out’. The client timeout may be configured via docker_client_timeout. The default timeout has been increased to 120 seconds. See bug for details.

  • Fixes an issue with fluentd parsing of WSGI logs for Aodh, Masakari, Qinling, Vitrage and Zun. See bug 1720371 for details.

  • Fixes glance_api to run as privileged and adds missing mounts so it can use an iscsi cinder backend as its store. LP#1855695

  • When upgrading from Rocky to Stein HAProxy configuration moves from using a single configuration to assembling a file from snippets for each service. Applying the HAProxy tag to the entire play ensures that HAProxy configuration is generated for all services when the HAProxy tag is specified. For details please see bug 1855094.

  • Fixes an issue with the ironic_ipxe container serving instance images. See bug 1856194 for details.

  • Fixes templating of Prometheus configuration when Alertmanager is disabled. In a deployment where Prometheus is enabled and Alertmanager is disabled, templating the Prometheus configuration would fail because the variable prometheus_alert_rules does not contain the files key. LP#1854540

9.0.0

Prelude

The Kolla Ansible 9.0.0 release is the first release in the Train cycle. Highlights include support for deployment of the Masakari instance High Availability service and Qinling Function as a Service. It is now possible to deploy multiple Nova cells, and the control plane may be configured to communicate via IPv6.

New Features

  • Adds a new kolla-ansible subcommand: deploy-containers. This action only performs the container comparison and deploys new containers if that comparison detects that a change is needed. It should be used to quickly roll out updated container images when no configuration changes are needed.

  • Adds variables horizon_wsgi_processes and horizon_wsgi_threads to configure the number of processes and threads for WSGI in the Horizon container.

  • Adds configuration variables kolla_enable_tls_internal, kolla_internal_fqdn_cert, and kolla_internal_fqdn_cacert to optionally enable TLS encryption for OpenStack endpoints on the internal API network.
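
    A minimal globals.yml sketch for enabling internal TLS; the certificate paths are illustrative and should point at the operator’s own files:

      kolla_enable_tls_internal: "yes"
      kolla_internal_fqdn_cert: "/etc/kolla/certificates/haproxy-internal.pem"
      kolla_internal_fqdn_cacert: "/etc/kolla/certificates/haproxy-internal-ca.crt"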

  • Adds support for overriding the dnsmasq.conf configuration file used by the Neutron DHCP agent via {{ node_custom_config}}/neutron/dnsmasq.conf or {{ node_custom_config }}/neutron/{{ inventory_hostname }}/dnsmasq.conf.

  • Adds support for deploying Qinling. Qinling is an OpenStack project to provide “Function as a Service”. This project aims to provide a platform to support serverless functions.

  • Adds support to Kolla-Ansible for deploying the CloudKitty InfluxDB storage backend.

  • Kolla Ansible can now configure the deployed Docker engine for Zun. Enable docker_configure_for_zun (disabled by default to retain backwards compatibility).

  • Adds support for providing custom configuration options for the Docker daemon via the docker_custom_config variable (JSON formatted).
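
    A minimal sketch, assuming docker_custom_config holds a dictionary that is rendered into /etc/docker/daemon.json (the option names shown are standard Docker daemon options, chosen only as an example):

      docker_custom_config:
        debug: true
        log-opts:
          max-file: "5"
          max-size: "50m"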

  • The Neutron port_forwarding service plugin and L3 extension can be enabled with the enable_neutron_port_forwarding variable.
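
    For example (illustrative globals.yml entry):

      enable_neutron_port_forwarding: "yes"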

  • Merge action plugins (for config/ini and yaml files) now allow relative imports in the same way that the upstream template module does, e.g. one can now include a subtemplate from the same directory as the base template.

  • HAProxy - Add the ability to define custom HAProxy services in {{ node_custom_config }}/haproxy/services.d/

  • Adds support for deployment in an IPv6 networking environment.

    For details of IPv6 support consult the relevant docs.

  • Adds support for configuring libvirt with TLS support. This allows for secure communication between nova-compute and libvirt as well as between libvirt on different hypervisors, during live-migration. The default configuration passes data in plain text, over TCP, without authentication.

  • Adds support for deploying Masakari, the instance high availability service.

  • Cinder’s coordination backend can now be configured via the cinder_coordination_backend variable. Coordination is optional; the backend can be set to either redis or etcd (see the example after the following note).

  • Adds support for configuring a coordination backend for Designate via the designate_coordination_backend variable. Coordination is mandatory when multiple workers are deployed as in a multinode environment. Possible values are redis or etcd.
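
    An illustrative globals.yml sketch covering both this and the preceding Cinder note; the chosen backend service (Redis or etcd) must itself be enabled in the deployment:

      cinder_coordination_backend: "etcd"
      designate_coordination_backend: "redis"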

  • Adds initial support for deployment of multiple Nova cells. Cells allow Nova clouds to scale to large sizes, by sharding the compute hosts into multiple pools called cells.

    This feature is still evolving, and the associated configuration may be liable to change in the next release to support more advanced deployment topologies.

  • Adds support for deploying Prometheus blackbox exporter

    An example blackbox-exporter module has been added (disabled by default) called os_endpoint. This allows for the probing of endpoints over HTTP and HTTPS. This can be used to monitor that OpenStack endpoints return a status code of either 200 or 300, and the word ‘versions’ in the payload.

  • Adds support for passing extra options to Prometheus.

  • It is now possible to pass RABBITMQ_SERVER_ADDITIONAL_ERL_ARGS to RabbitMQ server’s Erlang VM via the newly introduced rabbitmq_server_additional_erl_args variable. See Kolla Ansible docs RabbitMQ section for details.
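
    For example (the value shown is an arbitrary Erlang VM scheduler setting, given only as an illustration, not a recommendation):

      rabbitmq_server_additional_erl_args: "+S 2:2"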

  • Adds a standardised method to configure notifications for different services.

  • Adds support for configuring additional Docker volumes for Kolla containers. These are configured via <service_name>_extra_volumes.
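
    A minimal sketch, assuming the variable takes a list of Docker volume specifications; the service and host path shown are hypothetical:

      nova_compute_extra_volumes:
        - "/opt/custom-scripts:/opt/custom-scripts:ro"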

  • Adds a new variable to be used by the Swift role, swift_extra_ring_files. It allows passing additional ring files to be deployed in the context of a multi-policy setup.

  • Adds support for Swift Recon.

    Adds the necessary configuration to the Swift account, container and object configuration files to enable the Swift recon cli.

    In order to give the object server on each Swift host access to the recon files, a Docker volume is mounted into each container which generates them. The volume is then mounted read only into the object server container. Note that multiple containers append to the same file. This should not be a problem since Swift uses a lock when appending.

    Example usage: sudo docker exec swift_object_server swift-recon --all

  • Adds support for the Swift S3 API, enabled via the enable_swift_s3api flag.

  • Adds support for configuration of a trusted CA certificate file. The CA bundle file must be added to both the Horizon and Kolla Toolbox containers for this to work correctly.

  • Adds support for iSCSI-based (including LVM) Cinder volumes to Zun deployment.

Upgrade Notes

  • Updates the minimum required version of Ansible to 2.6.

  • Removes support for RabbitMQ from the Bifrost container. During the Train cycle, Bifrost switched its default to use JSON-RPC rather than RabbitMQ for internal Ironic communication. This simplifies the deployment and should improve reliability.

  • Sets RabbitMQ cluster_partition_handling to pause_minority. This is to avoid split-brain. The setting can be overridden using custom config. Note this new config requires at least a 3-node RabbitMQ cluster to provide HA (High Availability). See the production architecture guide for more info.

  • Modifies the default storage backend for Cloudkitty to InfluxDB, to match the default in Cloudkitty from Stein onwards. This is controlled via cloudkitty_storage_backend. To use the previous default, set cloudkitty_storage_backend to sqlalchemy. See bug 1838641 for details.
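
    For example, to retain the previous behaviour (illustrative globals.yml entry):

      cloudkitty_storage_backend: "sqlalchemy"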

  • Modifies the default value for openstack_release from auto to the name of the release (e.g. train), or master on the master branch. The value of auto will no longer detect the version of the kolla-ansible Python package.

  • Docker engine configuration changes are now applied in /etc/docker/daemon.json file instead of altering the systemd unit (which gets removed if present).

  • Increases the default value of docker_graceful_timeout from 10 to 60. This sets the time that docker will wait for a container to gracefully stop before issuing a KILL signal.

  • RHEL-based targets no longer require EPEL repository. It can be safely removed from target hosts if not used otherwise.

  • InfluxDB TSI is now enabled by default. It is recommended for all customers by InfluxData. If you do not want to enable it, you can set the variable influxdb_enable_tsi to False in globals.yml. Instructions to migrate existing data to the new, disk-based format can be found at https://docs.influxdata.com/influxdb/v1.7/administration/upgrading/ If you do not follow the migration procedure, InfluxDB should continue to work, but this is not recommended.
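
    For example, to opt out (illustrative globals.yml entry):

      influxdb_enable_tsi: False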

  • The Keystone fernet key rotation scheduling algorithm has been modified to avoid issues with over-rotation of keys.

    The variables fernet_token_expiry, fernet_token_allow_expired_window and fernet_key_rotation_interval may be set to configure the token expiry and key rotation schedule.

    By default, fernet_token_expiry is 86400, fernet_token_allow_expired_window is 172800, and fernet_key_rotation_interval is the sum of these two variables. This allows for the minimum number of active keys - 3.

    See bug 1809469 for details.
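
    The defaults described above correspond to the following values, shown only for illustration (they do not need to be set explicitly):

      fernet_token_expiry: 86400
      fernet_token_allow_expired_window: 172800
      fernet_key_rotation_interval: 259200  # sum of the two values above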

  • MariaDB is now exposed via HAProxy on the database_port and not the mariadb_port. Out of the box these are both the same, but if you have customised mariadb_port so that it is different to the database_port and you have a service talking to it via HAProxy on that port then you should review your configuration.

  • Modifies the path for custom configuration of swift.conf from /etc/kolla/config/swift/<service>.conf to /etc/kolla/config/swift/<service>/swift.conf, to avoid a collision with custom configuration for <service>.conf. Here, <service> may be proxy-server, account-*, container-* or object-*.

  • Freezer now uses MariaDB as the default database backend.

    Elasticsearch remains as an optional backend due to the requirement of Freezer to use Elasticsearch version 2.3.0. Elasticsearch in kolla-ansible is 5.6.x and that doesn’t work with Freezer.

    New variables have been added: freezer_database_backend, freezer_database_name, freezer_database_user, freezer_database_address, freezer_elasticsearch_replicas, freezer_es_protocol, freezer_es_address, freezer_es_port

  • The default connection limit for HAProxy backends is 2000; however, MariaDB defaults to a maximum of 10000 connections. The HAProxy backend limit has been changed to match the MariaDB limit.

    ‘haproxy_max_connections’ has also been increased to 40000 to accommodate this.

  • When installing kolla-ansible from source, the kolla_ansible python module must now be installed in addition to the python dependencies listed in requirements.txt. This is done via:

    pip install /path/to/kolla-ansible
    

    If the git repository is in the current directory, use the following to avoid installing the package from PyPI:

    pip install ./kolla-ansible
    
  • Changes the database backup procedure to use mariabackup which is compatible with MariaDB 10.3. The qpress based compression used previously is now replaced with gzip. The documented restore procedure has been modified accordingly. See the Mariabackup documentation for further information.

  • Removes the cinder_iscsi_helper variable which was deprecated in the Stein cycle in favour of cinder_target_helper.

  • The Heat role has stopped disabling deprecated plugins. To apply this change to existing deployments, the file /etc/kolla/heat-engine/_deprecated.yaml is automatically removed during the upgrade.

  • Removes support for installing Docker using the legacy Docker packages, and the variable docker_legacy_packages. Docker is now always installed using the Community Edition (CE) packages.

  • The nova-consoleauth service is no longer deployed. This has been deprecated in nova since Rocky and has not been used by other nova services since.

  • The legacy upgrade method for Nova has been removed in favour of the rolling upgrade which has been the default since Stein. nova_enable_rolling_upgrade should no longer be set.

  • Support for deployment of OracleLinux containers has been removed.

  • In Train, Tacker started using the local filesystem to store VNF packages and CSAR files. Kolla Ansible provides no shared filesystem capabilities, hence only one instance of each Tacker service is deployed, all on the same host. Previous multinode deployments will be scaled down when running an upgrade.

Deprecation Notes

  • Deprecates support for deploying Ceph. In a future release support for deploying Ceph will be removed from Kolla Ansible. Prior to this we will ensure a migration path to another tool such as Ceph Ansible is available. For new deployments it is recommended to use another tool to deploy Ceph to avoid a future migration. This can be integrated with OpenStack by following the external Ceph guide.

  • Support for deploying ONOS integration config is deprecated. In the Ussuri cycle it will be removed. The Neutron (networking) project does not support ONOS at all since 2017.

  • Support for deploying OpenDaylight (ODL) is deprecated. In the Ussuri cycle support for deploying ODL will be removed. The version of ODL provided by Kolla has not been supported by Neutron since the Rocky release. It is recommended to use ODL upstream documentation to get it deployed.

  • Configuring Docker daemon via docker_custom_option (used in systemd unit file) is deprecated in favour of docker_custom_config variable which adds options to /etc/docker/daemon.json.

  • The enable_xtrabackup variable is deprecated in favour of enable_mariabackup.

  • The Neutron LBaaS project was retired and support for it has been removed from Kolla-Ansible.

  • The enable_cadf_notifications variable is deprecated. CADF is the default notification format in keystone. To enable keystone notifications, users should now set keystone_default_notifications_topic_enabled to yes or enable Ceilometer via enable_ceilometer.

Bug Fixes

  • Adds system hostnames to /etc/hosts, if different from short hostnames. This can fix live migration of Nova instances in some contexts. See bug 1830023 for details.

  • When etcd is used with cinder_coordination_backend and/or designate_coordination_backend, the config has been changed to use the etcd3gw (aka etcd3+http) tooz coordination driver instead of etcd3, due to issues with the latter’s availability and stability. etcd3 does not handle eventlet-based services well, such as cinder’s and designate’s. See bugs 1852086 and 1854932 for details. See also the tooz change introducing etcd3gw.

  • Fixes an issue where a failure in pulling an image could lead to a container being removed and not replaced. See bug 1852572 for details.

  • Fixes Swift volume mounting failing on kernel 4.19 and later due to removal of nobarrier from XFS mount options. See bug 1800132 for details.

Other Notes

  • The upstream-deprecated Nova RetryFilter has been removed from Blazar-enabled and fake Nova config. It has no effect since Queens.

  • While Kolla Ansible now avoids duplicating Nova cells when messaging or database connection information are changed, operators of existing deployments should perform a manual cleanup of duplicate cells using the nova-manage cell_v2 command from a container running the nova_api image, leaving only two cells, one named cell0 and another one with the right connection information.

  • Tempest no longer disables IPv6 tests. The upstream default is used now.