Wallaby Series (6.5.0 - 7.0.x) Release Notes

7.1.0-6

Bug Fixes

  • Fixes UEFI NVRAM record handling with efibootmgr so that UTF-16 encoded data is accepted and handled. Such data is to be expected, as UEFI NVRAM records are UTF-16 encoded.

  • Fixes handling of UEFI NVRAM records to allow for unexpected characters in the response, so it is non-fatal to Ironic.

  • Fixes, or at least lessens, the case where a running Ironic agent can stack up numerous lookup requests against an Ironic deployment when a node is locked. In particular, this is because the lookup also drives generation of the agent token, which requires the conductor to allocate a worker, generate the token, and return the result to the API client. Ironic’s retry logic will now wait up to 60 seconds, and if an HTTP Conflict (409) message is received, the agent will automatically pause lookup operations for thirty seconds as opposed to continuing to attempt lookups, which could needlessly create more work for the Ironic deployment.
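
The pause-on-conflict behaviour described in the last note above can be sketched as follows. This is only an illustration, not the agent's actual code: lookup_with_backoff, do_lookup, and max_attempts are hypothetical names, and the sleep function is injectable so the sketch can be exercised without waiting. The 60 second retry window is not modelled.

```python
import time

CONFLICT_PAUSE_SECONDS = 30  # pause length taken from the note above


def lookup_with_backoff(do_lookup, max_attempts=5, sleep=time.sleep):
    """Call do_lookup() until it succeeds or attempts run out.

    do_lookup is a hypothetical callable returning (status_code, body).
    """
    for _ in range(max_attempts):
        status, body = do_lookup()
        if status == 200:
            return body
        if status == 409:
            # The node is locked: pause instead of queueing more work
            # for the conductor.
            sleep(CONFLICT_PAUSE_SECONDS)
    return None
```

Passing a list's append method as sleep records the pauses instead of performing them, which is convenient when experimenting.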

7.1.0

Bug Fixes

  • Fixes a minor issue with the regular expression used for UEFI duplicate entry cleanup which was introduced in a prior change to refactor the cleanup operation to avoid UEFI firmware which treats deletion of entries after addition as an invalid operation.

  • Fixes cases where duplicates may not be found in the UEFI firmware NVRAM boot entry table by explicitly looking for, and deleting, matching labels in advance of creating the EFI boot loader entry.

  • If the CSV file used for the bootloader hint does not have a BOM, reading its content fails because the utf-16 codec is too generic. The agent now falls back to utf-16-le, as little endian is most commonly used.

  • Fixes configuring UEFI boot when the EFI partition is located on a devicemapper device.

  • Fixes GenericHardwareManager to find network information for bonded interfaces if they exist.

  • Fixes a race on software RAID creation: since the creation of partitions is asynchronous, we need to wait for all udev events to be processed before we can use the partitions to create an md device.

  • Fixes an issue where partitions are not visible due to an incorrect call to have the partition table re-read.

  • Fixes an issue where partitions are not visible due to an incorrect call to have the partition table re-read during raid configuration creation.

  • Fixes handling of Software RAID device discovery so RAID device Names and Events field values do not inadvertently cause the command to return unexpected output. Previously this could cause a deployment to fail when handling UEFI partitions.

  • Fixes an issue when the EFI partition UUID is not set and an attempt to edit /etc/fstab is made.

  • Fixes handling of a Partition UUID being returned instead of a Partition’s UUID when the OS may not return the Partition’s UUID in time. These two fields are typically referred to as PARTUUID and UUID, respectively. Often these sorts of issues arise under heavy IO load. We now scan, and identify which “UUID” we identified, and update a Linux fstab entry appropriately. For more information, please see story #2009881.

  • Recent releases of Red Hat grub2 will always fail when installing to EFI paths, to encourage a transition to the signed shim bootloader. Partition image deploys avoid calling grub2-install with the preserve-efi-assets functions. Deploying whole disk images doesn’t require grub2-install. This leaves whole disk images installed onto softraid devices, which still call grub2-install. Running grub2-install is still attempted in this one remaining case, but any failures are now ignored.

  • Fixes failures with handling of Multipath IO devices where Active/Passive storage arrays are in use. Previously, “standby” paths could result in IO errors causing cleaning to terminate. The agent now explicitly attempts to handle and account for multipaths based upon the MPIO data available. This requires the multipath and multipathd utilities to be present in the ramdisk. These are supplied by the device-mapper-multipath or multipath-tools packages, and are not required for the agent’s use.

  • Fixes non-ideal behavior when performing cleaning where Active/Active MPIO devices would ultimately be cleaned once per IO path, instead of once per backend device.

  • Fixes discovering WWN/serial numbers for devicemapper devices.
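
The BOM fallback described in the bootloader CSV note above can be sketched in a few lines. This is a minimal illustration rather than the agent's code, and read_bootloader_csv is a hypothetical name. It relies on Python's text-mode utf-16 decoder raising UnicodeError when a stream does not start with a BOM.

```python
def read_bootloader_csv(path):
    """Return the text of a UTF-16 CSV file, with or without a BOM."""
    try:
        # The generic codec relies on a BOM to pick the byte order.
        with open(path, "r", encoding="utf-16") as f:
            return f.read()
    except UnicodeError:
        # No BOM: assume little endian, as the note above suggests.
        with open(path, "r", encoding="utf-16-le") as f:
            return f.read()
```

Catching UnicodeError rather than a broader exception keeps unrelated failures, such as a missing file, visible to the caller.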

Other Notes

  • The agent will now attempt to collect any multipath path information and upload it to the agent ramdisk, if the tooling is present.

7.0.2

New Features

  • Heartbeats to the conductor are grouped when they are scheduled or requested within a time interval of five seconds to avoid sending them in quick succession.

  • Adds the capability into the agent to read and act upon bootloader CSV files which serve as authoritative indicators of what bootloader to load instead of leaning towards utilizing the default.
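
The heartbeat grouping described above can be pictured with a small sketch. It is an assumption about the general idea, not the agent's heartbeater: HeartbeatGrouper, request, and poll are hypothetical names, and time is passed in explicitly so the behaviour is easy to follow.

```python
class HeartbeatGrouper:
    """Collapse heartbeat requests made within min_interval seconds."""

    def __init__(self, min_interval=5.0):
        self.min_interval = min_interval
        self._last_sent = float("-inf")
        self._pending = False

    def request(self):
        # Any number of requests before the next send count as one.
        self._pending = True

    def poll(self, now):
        """Return True if a heartbeat should be sent at time `now`."""
        if self._pending and now - self._last_sent >= self.min_interval:
            self._pending = False
            self._last_sent = now
            return True
        return False
```

Requests that arrive inside the five second window are not dropped; they stay pending and are served by the next heartbeat that the window allows.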

Known Issues

  • If multiple bootloader CSV files are present on the EFI filesystem, the first CSV file discovered will be utilized. The Ironic team considers multiple files to be a defect in the image being deployed. This may be changed in the future.

Bug Fixes

  • Fixes an issue with bootloader installation on a software RAID by checking if the ESP is already mounted.

  • Fixes an issue where a quick succession of heartbeats exposes a race condition in the conductor’s RPC handling.

  • Fixes fall-back to sysrq when powering off or rebooting the node from inside a container.

  • Fixes an error with UEFI based deployments where using a partition image on an NVMe device was previously failing due to the different device name pattern.

  • Fixes an issue where the NTP time sync at the IPA startup via chronyd is not immediate (which can break time sensitive components such as the generation of a TLS certificate).

  • Fixes failures with disk image conversions which result in memory allocation or input/output errors due to memory limitations by limiting the number of available memory allocation pools to a non-dynamic reasonable number which should not exceed the available system memory.

  • The lshw package version B.02.19.2-5 on CentOS 8.4 and 8.5 contains a bug that prevents the size of individual memory banks from being reported, with the result that the total memory size would be reported as 0 in some places. The total memory size is now taken from lshw’s total memory size output (which does not suffer from the same problem) when available.

  • Mirrors the previously disconnected EFI system partitions (ESPs) in UEFI software RAID setups. Disconnected ESPs can lead to nodes booting with outdated kernel parameters or the UEFI firmware not finding bootable kernels at all.

  • Fixes nodes failing after deployment completes due to issues in the Grub2 EFI loader entry addition where a BOOT.CSV file provides the authoritative pointer to the bootloader to be used for booting the OS. The base issue with Grub2 is that it would update the UEFI bootloader NVRAM entries with whatever is present in a vendor specific BOOT.CSV or BOOTX64.CSV file. In some cases, a baremetal machine can crash when this occurs. More information can be found at story 2008962.
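
The lshw workaround described above, preferring the reported total and only summing individual banks when no total is available, might look like the sketch below. The dictionary shape (a "size" key on the memory node and a "children" list of banks) is an assumption modelled on lshw's JSON output, and total_memory_bytes is a hypothetical name.

```python
def total_memory_bytes(memory_node):
    """Return total memory in bytes from an lshw-like memory node."""
    total = memory_node.get("size")
    if total:
        # The total is reported even when per-bank sizes are missing.
        return int(total)
    # Fall back to summing the banks; missing sizes count as zero.
    banks = memory_node.get("children", [])
    return sum(int(bank.get("size", 0)) for bank in banks)
```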

7.0.1

Bug Fixes

  • Fixes initial logging before configuration is loaded to re-log anything recorded for the purposes of troubleshooting. This is necessary as systemd does not report stdout from a process launch as part of the process’s logging. Now messages will be re-logged once the configuration has been loaded.

  • No longer crashes if MAC address cannot be determined for one of the network interfaces.

  • Adds a call to “udevadm settle” in write_image.sh. After GPT and MBR are destroyed, systemd-udevd gets triggered, which may hold /dev/sda open, preventing qemu-img from writing its image.
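
One way to re-log early messages once configuration is available, as the first note in this list describes, is the standard library's logging.handlers.MemoryHandler. The sketch below is not necessarily how the agent does it; the logger name and replay_early_logs are hypothetical.

```python
import logging
import logging.handlers

# Buffer early records; with no target set, flush() keeps them in memory.
early = logging.handlers.MemoryHandler(capacity=1000)
log = logging.getLogger("early_startup_demo")  # hypothetical logger name
log.setLevel(logging.INFO)
log.addHandler(early)

log.info("starting before configuration is loaded")


def replay_early_logs(target):
    """Re-log buffered records through the configured handler."""
    early.setTarget(target)
    early.flush()
    # From now on, log straight to the configured handler.
    log.removeHandler(early)
    log.addHandler(target)
```

Calling replay_early_logs with the handler created from the loaded configuration emits the buffered records in their original order.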

7.0.0

New Features

  • Adds support for NVMe-specific storage cleaning to IPA. Currently this is implemented by using nvme-cli format functionality. Crypto Erase is used if supported by the device, otherwise the code falls back to User Data Erase. The operators can control NVMe cleaning by using deploy.enable_nvme_erase config option which controls agent_enable_nvme_erase internal setting in driver_internal_info.
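
The choice between Crypto Erase and User Data Erase can be sketched as below. The sketch only builds a command line and does not run it. It assumes that bit 2 of the controller's FNA field reports Crypto Erase support and that the nvme-cli format subcommand selects the erase mode with --ses (1 for User Data Erase, 2 for Crypto Erase); nvme_format_command is a hypothetical name, not the agent's function.

```python
def nvme_format_command(device, fna):
    """Build an nvme-cli format command for the given device.

    fna is the controller's Format NVM Attributes value (assumed input).
    """
    crypto_supported = bool(fna & 0b100)
    # Prefer Crypto Erase, otherwise fall back to User Data Erase.
    ses = 2 if crypto_supported else 1
    return ["nvme", "format", device, "--ses={}".format(ses)]
```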

Known Issues

  • Logic around virtual media device validation is now much more strict, and may not work in all cases. Should you discover a case, please provide the output from lsblk -P -O with a virtual media device attached to the Ironic development community via Storyboard.

  • Internal logic to copy configuration data from virtual media now requires the boot_method=vmedia flag to be set on the kernel command line of the bootloader for the virtual media. Operators crafting custom boot ISOs should ensure that the appropriate command line is being added in any custom build processes.
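
Operators checking a custom boot ISO may find it useful to see how a boot_method=vmedia check on the kernel command line could work. The sketch below is a simplified illustration with hypothetical function names; it does not handle quoted values and is not the agent's parser.

```python
def parse_kernel_cmdline(cmdline):
    """Split a kernel command line into a dict of parameters."""
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        # Flags without a value, such as "quiet", map to None.
        params[key] = value if sep else None
    return params


def booted_from_vmedia(cmdline):
    """Return True if the command line carries boot_method=vmedia."""
    return parse_kernel_cmdline(cmdline).get("boot_method") == "vmedia"
```

On a running system the input string would typically come from /proc/cmdline.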

Upgrade Notes

  • It is no longer possible to enable the so-called standalone mode, in which the agent does not communicate with ironic. This mode is only useful for local testing; enabling it in production is always wrong. The ironic team does not support using ironic-python-agent as a standalone application outside of the normal workflow.

Security Issues

  • Addresses a potential vector in which a system-authenticated malicious actor could leverage data left on disk in some limited cases to make the API of the ironic-python-agent attackable, or possibly break cleaning processes to prevent the machine from being returned to the available pool. Please see story 2008749 for more information.

Bug Fixes

  • Adds validation of Virtual Media devices in order to prevent existing partitions on the system from being considered as potential sources of IPA configuration data.

  • Adds a check to the configuration load from virtual media to ensure it only occurs when the machine booted from virtual media.

  • IPA will now successfully clean configuration when it encounters a software RAID array that was previously created using entire devices instead of partitions.

  • IPA now properly checks if the root partition is already mounted. See Story 2008631 for details.

  • Fixes an issue where metadata erasure cleaning fails for partitions because the read-only file isn’t found, while it is available at the base device. Adds a check for the base device file on failure. See story 2008696.

  • Fixes incorrect root partition UUID after streaming a raw partition image.

  • Increases the memory usage limit for the qemu-img convert command to 2 GiB. See Story 2008667 for details.