Ussuri Series Release Notes (6.0.0 - 6.1.x)

6.1.3-4

Bug Fixes

  • In case the CSV file used for the bootloader hint does not have BOM we fail reading its content as utf-16 codec is too generic. Fail over to utf-16-le as Little Endian is mostly used.

6.1.3

Bug Fixes

  • Fixes a minor issue with the regular expression used for UEFI duplicate entry cleanup which was introduced in a prior change to refactor the cleanup operation to avoid UEFI firmware which treats deletion of entries after addition as an invalid operation.

  • Fixes cases where duplicates may not be found in the UEFI firmware NVRAM boot entry table by explicitly looking for, and deleting for matching labels in advance of creating the EFI boot loader entry.

  • Fixes an issue where partitions are not visible due to a incorrect call to have the partition table re-read.

  • Fixes an issue where partitions are not visible due to an incorrect call to have the partition table re-read during raid configuration creation.

6.1.2

New Features

  • Adds an configuration option which can be encoded into the ramdisk itself or the PXE parameters being provided to instruct the agent to ignore bootloader installation or configuration failures. This functionality is useful to work around well-intentioned hardware which is auto-populating all possible device into the UEFI nvram firmware in order to try and help ensure the machine boots. Except, this can also mean any explict configuration attempt will fail. Operators needing this bypass can use the ipa-ignore-bootloader-failure configuration option on the PXE command line or utilize the ignore_bootloader_failure option for the Ramdisk configuration. In a future version of ironic, this setting may be able to be overriden by ironic node level configuration.

  • Adds the capability into the agent to read and act upon bootloader CSV files which serve as authoritative indicators of what bootloader to load instead of leaning towards utilizing the default.

Known Issues

  • If multiple bootloader CSV files are present on the EFI filesystem, the first CSV file discovered will be utilized. The Ironic team considers multiple files to be a defect in the image being deployed. This may be changed in the future.

Bug Fixes

  • Setting the new ipa-ignore-bootloader-failure config option prevents errors due to bootloader installation failure generated by automatic bootloader entries configuration from multiple attached devices.

  • The system file system configuration file for Linux machines, the /etc/fstab file is now updated to include a reference to the EFI partition in the case of a partition image base deployment. Without this reference, images deployed using partition images could end up in situations where upgrading the bootloader could fail.

  • Fixes an issue where the bootloader installation can fail on a software RAID volume when no root_device hint is set. See Story 2007905

  • Fixes an issue with the IntelCnaHardwareManager which prevented hardware managers with lower priority to be executed and therefore may blocked the initialization and collection of hardware these managers are supposed to take care of.

  • Fixes an error with UEFI based deployments where using a partition image a NVMe device was previously failing due to the different device name pattern.

  • Fixes a bug where the partitions created during software RAID setup are cleaned too early and therefore may prevent the proper cleaning of the md superblocks. Leaving superblocks behind will impact the creation of new md devices later on.

  • Fixes retry logic issues with the Agent Lookup which can result in the lookup failing prematurely before being completed, typically resulting in an abrupt end to the agent logging and potentially weird errors like TypeError being reported on the agent process standard error output. For more information see bug 2007968.

  • Detects md component devices by their UUID, rather than by scanning the output of mdadm. This will prevent that devices miss md superblock cleanup when they are currently not part of an array.

  • Fixes failures with disk image conversions which result in memory allocation or input/output errors due to memory limitations by limiting the number of available memory allocation pools to a non-dynamic reasonable number which should not exceed the available system memory.

  • The lshw package version B.02.19.2-5 on CentOS 8.4 and 8.5 contains a bug that prevents the size of individual memory banks from being reported, with the result that the total memory size would be reported as 0 in some places. The total memory size is now taken from lshw’s total memory size output (which does not suffer from the same problem) when available.

  • Fixes the agent’s EFI boot handling such that EFI assets from a partition image are preserved and used instead of overridden. This should permit operators to use Secure Boot with partition images IF the assets are already present in the partition image.

  • Fixes nodes failing after deployment completes due to issues in the Grub2 EFI loader entry addition where a BOOT.CSV file provides the authoritative pointer to the bootloader to be used for booting the OS. The base issue with Grub2 is that it would update the UEFI bootloader NVRAM entries with whatever is present in a vendor specific BOOT.CSV or BOOTX64.CSV file. In some cases, a baremetal machine can crash when this occurs. More information can be found at story 2008962.

  • Increase memory usage limit for qemu-img convert command to 2 GiB. See Story 2008667 for details.

6.1.1

Bug Fixes

  • Fixes deployment failures when the image download is interrupted mid-stream while the contents are being downloaded. Previously retries were limited to only opening the initial connection.

  • Fixes the return value of the apply_configuration deploy step: the agent RAID interface expects the final RAID configuration to be returned.

  • Fixes the short timeout retries interval, which was previously 5 seconds, to a length that will allow the agent to retry after a network interruption. The time between retries is now 10 seconds, and the number of retries are set to 9 to help ensure intermittent network outages do not cause recoverable failures.

  • Fixes an issue with high cpu usage caused by ironic-python-agent greenthread eventlent implementation.

    Using eventlet.sleep(0.1) instead of eventlet.sleep(0) gives other processes of IPA more cpu time to run.

  • Speeds up going from inspection to cleaning with fast-track enabled by caching hardware information between the steps.

  • Fixes serializing exceptions originating from ironic-lib. Previously an attempt to do so would result in a TypeError, for example: Object of type ‘InstanceDeployFailure’ is not JSON serializable.

  • Fixes an issue with the ironic-python-agent where we would call to setup the bootloader, which is necessary with software raid, but also attempt to clean up iSCSI. This can cause issues when using the direct deploy_interface. Now the agent will only clean up iSCSI connections if iSCSI was explicitly started. For more information, please see story 2007937.

  • Fixes failure to detect a hung file download connection in the event that the kernel has not rapidly detected that the remote server has hung up the socket. This can happen when there is intermittent and transient connectivity issues such as those that can occur due to LACP failure response hold-downs timers in switching fabrics.

6.1.0

New Features

  • Adds support for the agent to receive, store, and return an agent token from the Ironic deployment to help secure use of the ironic API /v1/heartbeat endpoint, as well as the API of the ironic-python-agent ramdisk.

  • Target devices for software RAID can now be specified in the form of device hints (same as for root devices) in the physical_disks parameter of a logical disk configuration.

  • Adds a feature where IPA will utilize a [DEFAULT]ntp_server or ipa-ntp-server kernel command line argument to cause the agent to attempt to sync the clock to the NTP source. The agent also attempts to sync the software clock to the NTP time source, and assert an update to the hardware clock prior to powering the machine off. Please note, if your system clock is set to local time as opposed to UTC, this may result in undesirable behavior.

  • Adds UEFI boot support for Software RAID, and for partition table creation based upon boot mode in use.

Upgrade Notes

  • The minimum supported versions of the ironic API is now 1.31, corresponding to the latest available in the Ocata release. All versions before that one are not supported anymore.

  • The type of the partition table created for Software RAID is now based upon the boot mode in use (GPT for UEFI or if explicitly passed via the instance’s capabilities or the node’s properties, otherwise MSDOS). The amount of reserved space on the drives now also depends on the boot mode (128MiB for UEFI/GPT, 8MiB for BIOS/GPT, and one sector otherwise).

Security Issues

  • The salt was generated using random and the module it’s not in compliance with FIPS 140-2. Now we let the salt be automatically generated by the crypt function (it will use the strongest method available).

Bug Fixes

  • Fixes an issue with deployment ramdisks running in UEFI boot mode where dual-boot images may cause the logic to prematurely exit before UEFI parameters can be updated. Internal checks for a BIOS bootloader will always return False now when the machine is in UEFI mode.

  • Fixes an issue where secondary GPT partition tables were not being updated after the ironic-python-agent wrote the disk image to the target disk. The agent now unconditionally attempts to repair the secondary partition table. Previously, software RAID volumes would report errors upon restart.

  • Fixes error handling if efibootmgr is not present in ramdisk. See story for more details.

  • Provides timeout and retries when establishing a connection to download an image in the standby extension. Reduces probability of an image download getting stuck in the event of network problems.

    The default timeout is 60 seconds and can be set via the ipa-image-download-connection-timeout kernel parameter. The default number of retries is 2 and can be set via the ipa-image-download-connection-retries parameter.

  • Fixes risk of potential active node thundering heard by introducing jitter handling into the ironic-collect-introspection-data. By default, the jitter will cause the introspection_daemon_post_interval configuration parameter based time value to be honored between in a range of 70% to 120% of the desired time window.

    Should failures occur after the initial connection and start of the daemon mode for introspection data collection, the fallback is a maximum of 400% of the introspection daemon post interval.

  • The salt now will be automatically generated by the crypt function.

  • Rescans partitions on a software RAID device that gets restarted when installing boot loader.

  • Fixes an issue where the agent was failing to rescan the device deployed upon before checking uefi contents. This would occur with an iSCSI based deployment, as partition management operations are performed by the conductor, and not locally.

  • No longer tries to use GRUB2 for configuring boot for whole disk images with an EFI partition present but only marked as boot (not esp).