Yoga Series Release Notes

13.0.3

Bug Fixes

  • Fixes an issue where a failure notification got stuck in the running status after a timeout. LP#1996835

13.0.2

Bug Fixes

  • Fixes an issue caused by a user sending a malformed host notification that lacks the host status. Such a notification would prevent the host from being brought back from maintenance until manual intervention or until the notification expired. LP#1960619

13.0.1

Bug Fixes

  • Fixes “Instance stopping fails randomly due to already stopped instances”. LP#1980736

12.0.0

New Features

  • Nova compute service “disable reason” is now set in case of host or process failure. It can be customised per type of failure via config.

Bug Fixes

  • Fixes the Masakari API so that it returns proper error codes to the user for invalid requests instead of 500. LP#1932194

11.0.0

New Features

  • Operators sometimes want to temporarily disable the instance HA function. This version adds an ‘enabled’ attribute to segments. If a segment’s ‘enabled’ value is set to False, all notifications for that segment are ignored and no recovery methods are executed.
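A minimal Python sketch of the segment ‘enabled’ check described above. The Segment class and handle_notification() are hypothetical stand-ins, not Masakari's engine code. The only behaviour taken from this note is that a disabled segment's notifications are ignored and no recovery runs.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical, simplified failover segment record."""
    name: str
    enabled: bool = True


def handle_notification(segment: Segment, notification: dict) -> str:
    """Return the status a notification would end up in (sketch only)."""
    if not segment.enabled:
        # Disabled segment: the notification is ignored and no
        # recovery method is executed.
        return "ignored"
    # Placeholder for running the segment's recovery method.
    return "running"


print(handle_notification(Segment("seg-a"), {"type": "COMPUTE_HOST"}))
print(handle_notification(Segment("seg-b", enabled=False), {"type": "COMPUTE_HOST"}))
```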

Upgrade Notes

  • The default value of the [oslo_policy] policy_file config option has been changed from policy.json to policy.yaml. Operators who use customized or previously generated static policy JSON files (which are not needed by default) should generate new policy files or convert them to YAML format. Use the oslopolicy-convert-json-to-yaml tool to convert a JSON policy file to a YAML-formatted one in a backward-compatible way.
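To show what the format change amounts to, here is a rough Python sketch that rewrites a flat JSON policy file as YAML. It assumes the file is a flat mapping of rule names to check strings and relies on the fact that a JSON string literal is also a valid YAML double-quoted scalar. It does not reproduce any extra processing oslopolicy-convert-json-to-yaml may do, so use the tool for real migrations. The rule names in the usage line are placeholders.

```python
import json


def policy_json_to_yaml(json_text: str) -> str:
    """Rewrite a flat JSON policy file as YAML (rough sketch).

    Assumes the file is a flat mapping of rule names to check strings.
    """
    rules = json.loads(json_text)
    lines = []
    for name, check in rules.items():
        # json.dumps() yields a quoted string that YAML also accepts,
        # giving safe quoting for both keys and values.
        lines.append(f"{json.dumps(name)}: {json.dumps(check)}")
    return "\n".join(lines) + "\n"


print(policy_json_to_yaml('{"admin_api": "role:admin", "context_is_admin": "role:admin"}'), end="")
```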

Deprecation Notes

  • Use of JSON policy files was deprecated by the oslo.policy library during the Victoria development cycle. As a result, this deprecation is being noted in the Wallaby cycle with an anticipated future removal of support by oslo.policy. As such, operators will need to convert to YAML policy files. Please see the upgrade notes for details on migration of any custom policy files.

Bug Fixes

  • Fixes the /v1/ API path, which returned 404 ResourceNotFound and thereby prevented microversion discovery. LP#1685145

  • Allows segment description to contain new line characters. LP#1776385

  • Fixes Masakari Engine so that it no longer tries to stop an already stopped instance and fails with 409 from Nova. LP#1782517

  • Adds reserved_host to all aggregates of the failing host, instead of just the first one. LP#1856164

  • Fixes Masakari Engine so that it no longer waits for the timeout when the evacuation is known to have failed. LP#1859406 (This fix was already included in the first Victoria release, 10.0.0, but was not previously mentioned in the release notes.)

  • Fixes API microversion reporting to report the latest supported microversion. LP#1882516

  • Fixes an issue where a periodic task in Masakari Engine could loop forever querying Nova API following a failed evacuation. LP#1897888
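The LP#1856164 fix above changes the reserved_host handling from "first aggregate only" to "every aggregate of the failing host". Here is a minimal sketch of that loop. It uses an in-memory mapping of aggregate name to member hosts as a hypothetical stand-in for Nova host aggregates; the real engine goes through the Nova API.

```python
def add_reserved_host_to_aggregates(failed_host, reserved_host, aggregates):
    """Add reserved_host to every aggregate that contains failed_host.

    'aggregates' maps aggregate name -> set of member hosts (hypothetical
    in-memory stand-in). Returns the names of the aggregates changed.
    """
    changed = []
    for name, hosts in aggregates.items():
        # Visit every aggregate, not just the first match.
        if failed_host in hosts and reserved_host not in hosts:
            hosts.add(reserved_host)
            changed.append(name)
    return changed


aggregates = {"ssd": {"compute-1", "compute-2"}, "gpu": {"compute-1"}, "az2": {"compute-3"}}
print(add_reserved_host_to_aggregates("compute-1", "reserved-1", aggregates))
```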

10.0.0

New Features

  • Adds the ha_enabled_instance_metadata_key config option to the host_failure and instance_failure config groups. This option allows operators to override the default HA_Enabled instance metadata key, which controls the behaviour of Masakari towards the instance. This way, different keys can be used for different failure types (host vs instance failures).
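The following sketch shows how a per-failure-type key changes which instances count as HA-enabled. The HA_ENABLED_KEYS mapping stands in for the two config groups, and the custom key name Instance_HA is hypothetical. Treating "True" and "true" alike is also an assumption of the sketch.

```python
# Hypothetical values of ha_enabled_instance_metadata_key in the
# [host_failure] and [instance_failure] groups: host failures keep the
# default HA_Enabled key, instance failures use a custom key.
HA_ENABLED_KEYS = {
    "host_failure": "HA_Enabled",
    "instance_failure": "Instance_HA",
}


def is_ha_enabled(metadata: dict, failure_type: str) -> bool:
    """Return True if the instance metadata opts in for this failure type."""
    key = HA_ENABLED_KEYS[failure_type]
    # Case-insensitive comparison is an assumption of this sketch.
    return str(metadata.get(key, "")).lower() == "true"


vm_metadata = {"HA_Enabled": "True"}
print(is_ha_enabled(vm_metadata, "host_failure"))      # True
print(is_ha_enabled(vm_metadata, "instance_failure"))  # False
```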

Bug Fixes

  • Fixes the validation of compute host existence so that it checks the compute service list instead of the hypervisor list. Masakari needs the Nova compute service hostname to match the hostname used in the Pacemaker cluster and registered through the API in order to process hostmonitor failover notifications correctly.
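A sketch of the corrected check: look the host up among Nova compute services rather than hypervisors. The reported hypervisor hostname can differ from the compute service host name (for example, a fully qualified name versus a short one), which is presumably why the service list is the right source here. The entries mimic the binary and host fields of Nova's os-services records; fetching them from Nova is out of scope, and compute_host_exists() is a hypothetical name.

```python
def compute_host_exists(hostname: str, compute_services: list) -> bool:
    """Check hostname against Nova compute services (not hypervisors)."""
    return any(
        svc.get("binary") == "nova-compute" and svc.get("host") == hostname
        for svc in compute_services
    )


services = [
    {"binary": "nova-compute", "host": "compute-1"},
    {"binary": "nova-scheduler", "host": "controller-1"},
]
print(compute_host_exists("compute-1", services))     # True
print(compute_host_exists("controller-1", services))  # False
```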

9.0.0

Upgrade Notes

  • Python 2.7 support has been dropped. The last release of Masakari to support Python 2.7 is OpenStack Train. The minimum version of Python now supported by Masakari is Python 3.6.

7.0.0

Prelude

Added the new masakari-status upgrade check tool.

New Features

  • A new framework for the masakari-status upgrade check command has been added. This framework allows adding various checks that can be run before a Masakari upgrade to ensure that the upgrade can be performed safely.

  • Added support for emitting event notifications whenever a user interacts with the Masakari RESTful APIs. The emitted notifications are documented at sample_payloads.

    To enable this feature, set the driver config option under the oslo_messaging_notifications section as shown below:

    [oslo_messaging_notifications]
    driver = log
    

    Note: Possible values are messaging, messagingv2, routing, log, test, noop. Notifications can be completely disabled by setting the driver value to noop.

  • Added support for recording the recovery workflow details of a notification, which are returned by the GET /notifications/{notification_id} API in the new microversion 1.1.

    For example, the response to GET /notifications/<notification_uuid> will contain the recovery_workflow_details parameter, as shown in notification_details.

    A new config section has been added to the Masakari conf file for configuring the back end used by the taskflow driver:

    [taskflow]
    # The back end for storing recovery_workflow details of the notification.
    # (string value)
    
    connection = mysql+pymysql://root:admin@127.0.0.1/<db name>?charset=utf8
    
    # where <db name> can be a new database, or you can also specify the
    # masakari database.
    

    Operators should run the masakari-manage db sync command to add the new database tables required for storing recovery_workflow_details.

    Note: When you run masakari-manage db sync, make sure you have notification_driver=taskflow_driver set in masakari.conf.

Upgrade Notes

  • Operators can now use the new CLI tool masakari-status upgrade check to check whether a Masakari deployment can be safely upgraded from the N-1 to the N release.
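Here is a minimal sketch of how an upgrade check framework can aggregate results. It assumes each check returns a success, warning or failure code and that the command exits with the most severe one. This is illustrative plain Python, not the masakari-status implementation, and both checks in the usage lines are hypothetical.

```python
from enum import IntEnum
from typing import Callable, List, Tuple


class Code(IntEnum):
    """Check result codes; a higher value is more severe."""
    SUCCESS = 0
    WARNING = 1
    FAILURE = 2


def run_upgrade_checks(checks: List[Tuple[str, Callable[[], Code]]]) -> int:
    """Run each named check, print its result, and return the worst code."""
    worst = Code.SUCCESS
    for name, check in checks:
        result = check()
        print(f"{name}: {result.name}")
        worst = max(worst, result)
    return int(worst)


# Hypothetical checks for illustration only.
checks = [
    ("database schema", lambda: Code.SUCCESS),
    ("policy file format", lambda: Code.WARNING),
]
exit_code = run_upgrade_checks(checks)  # prints both results, returns 1
```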

6.0.0.0rc1

New Features

  • Masakari now supports policy in code, which means that if operators don’t need to modify any of the default policy rules, they do not need a policy file. Operators can modify or generate a policy.yaml.sample file to override specific policy rules from their defaults.

    Masakari is now configured to work with two oslo.policy CLI scripts that have been added:

    • The first of these can be called like oslopolicy-list-redundant --namespace masakari and will output a list of policy rules in policy.[json|yaml] that match the project defaults. These rules can be removed from the policy file as they have no effect there.

    • The second script can be called like oslopolicy-policy-generator --namespace masakari --output-file policy-merged.yaml and will populate the policy-merged.yaml file with the effective policy. This is the merged result of project defaults and config file overrides.

    NOTE: Default policy.json file is now removed as Masakari now uses default policies. A policy file is only needed if overriding one of the defaults.

  • Operators can now customize the workflows that process each type of failure notification (host, instance and process) as per their requirements. The following new config section for customized recovery flows has been added in a new conf file, masakari-custom-recovery-methods.conf:

    • [taskflow_driver_recovery_flows]

    The following new config options have been added under [taskflow_driver_recovery_flows]:

    • ‘instance_failure_recovery_tasks’ is a dict of tasks which will recover instance failure.

    • ‘process_failure_recovery_tasks’ is a dict of tasks which will recover process failure.

    • ‘host_auto_failure_recovery_tasks’ is a dict of tasks which will recover host failure for auto recovery.

    • ‘host_rh_failure_recovery_tasks’ is a dict of tasks which will recover host failure for rh (reserved_host) recovery on the failed host.
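The sketch below shows how a dict-valued option such as ‘host_auto_failure_recovery_tasks’ might drive a workflow. It assumes the dict maps the phases ‘pre’, ‘main’ and ‘post’ to ordered lists of task names. The phase names and the task functions are assumptions for illustration, not Masakari's shipped task set, so check masakari-custom-recovery-methods.conf for the real values.

```python
# Hypothetical task implementations; each records its name in a log.
def make_task(name):
    def task(log):
        log.append(name)
    return task


TASKS = {
    name: make_task(name)
    for name in ("disable_compute_service", "evacuate_instances", "confirm_evacuation")
}


def run_recovery_flow(flow: dict) -> list:
    """Run a flow's tasks phase by phase and return the execution log.

    'flow' is assumed to map 'pre', 'main' and 'post' to ordered task names.
    """
    log = []
    for phase in ("pre", "main", "post"):
        for task_name in flow.get(phase, []):
            TASKS[task_name](log)
    return log


host_auto_flow = {
    "pre": ["disable_compute_service"],
    "main": ["evacuate_instances"],
    "post": ["confirm_evacuation"],
}
print(run_recovery_flow(host_auto_flow))
```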

6.0.0.0b3

New Features

  • Masakari has been enabled for mutable config. The option below can be reloaded by sending SIGHUP to the correct process.

    ‘retry_notification_new_status_interval’: the reloaded value applies to the processing of unfinished notifications.

6.0.0.0b2

Upgrade Notes

  • WSGI application script masakari-wsgi is now available. It allows running the masakari APIs using a WSGI server of choice (for example nginx and uwsgi, apache2 with mod_proxy_uwsgi or gunicorn). The eventlet-based servers are still available, but the WSGI options will allow greater deployment flexibility.

6.0.0.0b1

New Features

  • Operators can now purge soft-deleted records from the database tables. The following command has been added to purge the records:

    masakari-manage db purge --age_in_days <days> --max_rows <rows>

    NOTE: Notification database records are purged on the basis of the updated_at and status fields (finished, ignored, failed), as these records are not automatically soft-deleted by the system.
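Here is a sketch of the selection rule the note describes for notifications: a record is eligible if its status is finished, ignored or failed and it was last updated more than age_in_days ago. The selection is capped at max_rows. The rows are hypothetical dicts, not Masakari's database models, and the function name is illustrative.

```python
from datetime import datetime, timedelta

# Terminal notification statuses eligible for purging, per the note above.
PURGEABLE_STATUSES = {"finished", "ignored", "failed"}


def select_notifications_for_purge(rows, age_in_days, max_rows, now=None):
    """Select notification rows that a purge run would delete.

    Rows are hypothetical dicts with 'status' and 'updated_at' keys.
    """
    now = now or datetime.now()
    cutoff = now - timedelta(days=age_in_days)
    eligible = [
        row for row in rows
        if row["status"] in PURGEABLE_STATUSES and row["updated_at"] < cutoff
    ]
    # Cap the selection at max_rows.
    return eligible[:max_rows]


now = datetime(2024, 1, 31)
rows = [
    {"id": 1, "status": "finished", "updated_at": datetime(2024, 1, 1)},
    {"id": 2, "status": "running", "updated_at": datetime(2024, 1, 1)},
    {"id": 3, "status": "failed", "updated_at": datetime(2024, 1, 30)},
]
print([r["id"] for r in select_notifications_for_purge(rows, 7, 100, now)])  # [1]
```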

4.0.0

Prelude

A domain name is needed when using Keystone v3 to create a Keystone session; if it is not provided, an InvalidInput exception is raised. Two new options, “os_user_domain_name” and “os_project_domain_name”, with the default value “default”, have been added to fix the issue.

New Features

  • Operators can decide whether error instances should be evacuated along with other instances from a failed source compute node. A new config option, ignore_instances_in_error_state, has been added for this. When set to True, Masakari skips the recovery of error instances; otherwise, it evacuates error instances as well from a failed source compute node.

    To use this feature, the following config option needs to be set under the host_failure section in the ‘masakari.conf’ file:

    [host_failure]
    ignore_instances_in_error_state = False
    

    The default value for this config option is set to False.

  • Implemented workflows for the ‘auto_priority’ and ‘rh_priority’ recovery methods in case of host failure recovery. Operators can now set a failover segment’s recovery_method to ‘auto_priority’ or ‘rh_priority’. With ‘auto_priority’, the ‘auto’ workflow is executed first to recover the instances from the failed compute host; if it fails to recover the instances, the ‘reserved_host’ workflow is tried. With ‘rh_priority’, the ‘reserved_host’ workflow is executed first; if it fails to recover the instances, the ‘auto’ workflow is tried.
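The fallback order described above can be sketched as a small lookup table plus a loop that stops at the first successful workflow. Each workflow here is a hypothetical callable that returns True on success; the real workflows are taskflow-based and considerably more involved.

```python
from typing import Callable, Dict, List

# Order in which host failure workflows are tried for each recovery method.
WORKFLOW_ORDER = {
    "auto": ["auto"],
    "reserved_host": ["reserved_host"],
    "auto_priority": ["auto", "reserved_host"],
    "rh_priority": ["reserved_host", "auto"],
}


def recover_host(method: str, workflows: Dict[str, Callable[[], bool]]) -> List[str]:
    """Try workflows in priority order; stop at the first success.

    Returns the names of the workflows that were attempted.
    """
    attempted = []
    for name in WORKFLOW_ORDER[method]:
        attempted.append(name)
        if workflows[name]():
            break
    return attempted


# Hypothetical outcome: 'auto' fails, 'reserved_host' succeeds.
workflows = {"auto": lambda: False, "reserved_host": lambda: True}
print(recover_host("auto_priority", workflows))  # ['auto', 'reserved_host']
print(recover_host("rh_priority", workflows))    # ['reserved_host']
```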

Deprecation Notes

  • The masakari_topic config option is now deprecated and will be removed in the Queens release.

Bug Fixes

  • Fixes bug 1693728, a race condition in which, after an instance has been evacuated to another host, a user might perform actions on that instance that give ConfirmEvacuationTask a wrong instance vm_state, resulting in notification failure.

    To fix this issue, the following config option has been added under the DEFAULT section in the ‘masakari.conf’ file:

    [DEFAULT]
    host_failure_recovery_threads = 3
    

    This config option sets the number of threads used for evacuating instances.
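A sketch of bounding evacuation concurrency with a fixed worker count such as host_failure_recovery_threads. It uses Python's concurrent.futures.ThreadPoolExecutor as a stand-in. Masakari's engine may use a different concurrency primitive, and evacuate() is a hypothetical placeholder for the Nova call.

```python
from concurrent.futures import ThreadPoolExecutor


def evacuate(instance_id: str) -> str:
    # Placeholder for the call that asks Nova to evacuate one instance.
    return f"{instance_id}: evacuated"


def evacuate_all(instance_ids, host_failure_recovery_threads=3):
    """Evacuate instances concurrently with a bounded worker pool."""
    with ThreadPoolExecutor(max_workers=host_failure_recovery_threads) as pool:
        # map() yields results in input order, regardless of finish order.
        return list(pool.map(evacuate, instance_ids))


print(evacuate_all(["vm-1", "vm-2", "vm-3", "vm-4"]))
```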

3.0.0.0rc1

New Features

  • Added the _process_unfinished_notifications periodic task to process notifications that are in the error or new state. It runs at the interval defined by the new config option ‘process_unfinished_notifications_interval’ (default 120 seconds). Notifications in the ‘new’ status are picked up based on another new config option, ‘retry_notification_new_status_interval’ (default 60 seconds).

    To change the execution interval of the periodic task, set the following config option to the desired value under the ‘DEFAULT’ section in the ‘masakari.conf’ file:

    [DEFAULT]
    process_unfinished_notifications_interval = 120
    

    To change the time after which notifications stuck in the ‘NEW’ state are identified, set the following config option to the desired value under the ‘DEFAULT’ section in the ‘masakari.conf’ file:

    [DEFAULT]
    retry_notification_new_status_interval = 60
    
  • Operators can decide, using the new config option ‘evacuate_all_instances’, whether all instances or only those containing the metadata key ‘HA_Enabled=True’ should be evacuated from a failed source compute node. When set to True, all instances are evacuated from a failed source compute node: instances containing the ‘HA_Enabled=True’ metadata key are evacuated first, followed by the remaining ones. When set to False, only instances containing the ‘HA_Enabled=True’ metadata key are evacuated.

    To use this feature, the following config option needs to be set under the host_failure section in the ‘masakari.conf’ file:

    [host_failure]
    evacuate_all_instances = True
    
  • Operators can decide, using the new config option ‘process_all_instances’, whether all instances or only those containing the metadata key ‘HA_Enabled=True’ should be taken into account when recovering from instance failure events. When set to True, instance failure recovery actions are executed for an instance regardless of whether it contains the ‘HA_Enabled=True’ metadata key. When set to False, instance failure recovery actions are executed only for instances that contain the ‘HA_Enabled=True’ metadata key.

    To use this feature, the following config option needs to be set under the instance_failure section in the ‘masakari.conf’ file:

    [instance_failure]
    process_all_instances = True
    
  • Operators can now decide, based on the new config option ‘add_reserved_host_to_aggregate’, whether or not to add a reserved_host to all host aggregates that the failed compute host belongs to.

    To use this feature, the following config option needs to be set under the host_failure section in the ‘masakari.conf’ file:

    [host_failure]
    add_reserved_host_to_aggregate = True
    
  • Implemented the workflow for the ‘reserved_host’ recovery method in case of host failure. Operators can now create or update a failover segment with the ‘reserved_host’ recovery method, in addition to the existing ‘auto’ method. When the ‘reserved_host’ recovery_method is set on a failover segment, operators should also add one or more hosts with the reserved flag set to True.
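The _process_unfinished_notifications task described at the start of this list selects notifications roughly as sketched below. Notifications in the error state are always picked up, while those in the new state are picked up only once they have waited longer than retry_notification_new_status_interval seconds. Two details are assumptions of the sketch: that error notifications are retried unconditionally, and that the age is measured from a 'created_at' field. The rows are hypothetical dicts.

```python
from datetime import datetime, timedelta


def notifications_to_process(notifications, now,
                             retry_notification_new_status_interval=60):
    """Pick notifications for the periodic task (sketch).

    Rows are hypothetical dicts with 'status' and 'created_at' keys.
    """
    threshold = now - timedelta(seconds=retry_notification_new_status_interval)
    return [
        n for n in notifications
        if n["status"] == "error"
        or (n["status"] == "new" and n["created_at"] < threshold)
    ]


now = datetime(2024, 1, 1, 12, 0, 0)
pending = [
    {"id": 1, "status": "error", "created_at": now},
    {"id": 2, "status": "new", "created_at": now - timedelta(seconds=30)},
    {"id": 3, "status": "new", "created_at": now - timedelta(seconds=90)},
    {"id": 4, "status": "finished", "created_at": now - timedelta(seconds=600)},
]
print([n["id"] for n in notifications_to_process(pending, now)])  # [1, 3]
```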

Bug Fixes

  • Fixes bug 1645699 so that the following APIs return correct response codes:

    • POST /v1/notifications - old_response: 200, new_response: 202

    • DELETE /v1/notifications - old_response: 404, new_response: 405

    • PUT /v1/notifications/<notification_uuid> - old_response: 404, new_response: 405

    • POST /v1/segments/<segment_uuid>/hosts - old_response: 200, new_response: 201

    • DELETE /v1/segments/<segment_uuid>/hosts/<host_uuid> - old_response: 200, new_response: 204

    • POST /v1/segments - old_response: 200, new_response: 201

    • DELETE /v1/segments/<segment_uuid> - old_response: 200, new_response: 204

2.0.0

New Features

  • Added the following new REST APIs for Masakari operators:

    • GET /v1/segments - Returns list of all failover segments.

    • GET /v1/segments/<segment_uuid> - Returns specific failover segment with uuid.

    • POST /v1/segments - Creates a new failover segment.

    • PUT /v1/segments/<segment_uuid> - Updates a failover segment by uuid.

    • DELETE /v1/segments/<segment_uuid> - Deletes a failover segment by uuid.

  • Added the following new REST APIs for Masakari operators:

    • GET /v1/segments/<segment_uuid>/hosts - Returns list of all hosts associated with failover segment.

    • GET /v1/segments/<segment_uuid>/hosts/<host_uuid> - Returns specific host from the failover segment with uuid.

    • POST /v1/segments/<segment_uuid>/hosts - Creates a new host in a failover segment.

    • PUT /v1/segments/<segment_uuid>/hosts/<host_uuid> - Updates a host in a failover segment by uuid.

    • DELETE /v1/segments/<segment_uuid>/hosts/<host_uuid> - Deletes a host from a failover segment by uuid.

  • Added the following new REST APIs related to notifications:

    • GET /v1/notifications - Returns list of all notifications.

    • GET /v1/notifications/<notification_uuid> - Returns specific notification with uuid.

    • POST /v1/notifications - Creates a new notification.
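Here is a sketch of calling the endpoints listed above with Python's standard library. It only builds the requests and does not send them. The endpoint URL, the port and the token are hypothetical placeholders. The request body shape (a "segment" object with name, recovery_method and service_type) is an assumption to verify against the Masakari API reference for your release.

```python
import json
import urllib.request

# Hypothetical endpoint and token; use your own deployment's values.
MASAKARI_URL = "http://controller:15868/v1"
TOKEN = "<keystone-token>"


def build_request(method, path, body=None):
    """Build (but do not send) a request against the Masakari v1 API."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(MASAKARI_URL + path, data=data, method=method)
    req.add_header("X-Auth-Token", TOKEN)
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req


segment_uuid = "SEGMENT_UUID"  # placeholder for a real segment UUID
create_segment = build_request(
    "POST", "/segments",
    {"segment": {"name": "segment1", "recovery_method": "auto",
                 "service_type": "COMPUTE"}},
)
list_hosts = build_request("GET", f"/segments/{segment_uuid}/hosts")
```

Passing one of these objects to urllib.request.urlopen() would send it. The sketch stops short of that so it runs without a live deployment.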

Other Notes

  • Adopted oslo-config-generator to generate sample config files. New config options in the Masakari code should be registered in masakari/conf/opts.py. A deprecated option should specify a deprecated group even if its group did not change; otherwise, the deprecated group defaults to ‘DEFAULT’.