2023.1 Series Release Notes

6.0.0

Upgrade Notes

  • The default value of the [oslo_policy] policy_file config option has been changed from policy.json to policy.yaml. Operators who use customized or previously generated static policy JSON files (which are not needed by default) should generate new policy files or convert them to YAML format. Use the oslopolicy-convert-json-to-yaml tool to convert a JSON policy file to YAML in a backward-compatible way.
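For a custom policy file that is a flat mapping of rule names to rule strings, the shape of the conversion can be sketched as below. This is only an illustration and is not the oslopolicy-convert-json-to-yaml tool, which should be preferred for real deployments. The function name and the sample rule names are hypothetical.

```python
import json


def policy_json_to_yaml(json_text):
    """Render a flat JSON policy mapping as YAML text.

    Assumes every key and value is a string. JSON string literals are
    also valid YAML double-quoted scalars, so json.dumps does the quoting.
    """
    rules = json.loads(json_text)
    lines = []
    for name, rule in sorted(rules.items()):
        if not isinstance(rule, str):
            raise ValueError("unsupported non-string rule for %r" % name)
        lines.append("%s: %s" % (json.dumps(name), json.dumps(rule)))
    return "\n".join(lines) + "\n"


# Hypothetical rule names, for illustration only.
sample = '{"example:rule_b": "role:admin", "example:rule_a": "rule:example:rule_b"}'
print(policy_json_to_yaml(sample), end="")
```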

Deprecation Notes

  • Use of JSON policy files was deprecated by the oslo.policy library during the Victoria development cycle. As a result, this deprecation is being noted in the Wallaby cycle with an anticipated future removal of support by oslo.policy. Operators will therefore need to convert to YAML policy files. Please see the upgrade notes for details on migrating any custom policy files.

4.0.0.0rc1

Prelude

Many operations in the decision engine block on I/O. Such I/O operations can significantly stall the execution of a sequential application. To reduce this potential bottleneck, a general-purpose decision engine threadpool is introduced.

New Features

  • Added a new threadpool for the decision engine that contributors can use to improve the performance of many operations, primarily I/O-bound ones. The number of workers used by the decision engine threadpool can be configured to scale with the available infrastructure using the watcher_decision_engine.max_general_workers config option. Documentation for contributors on using this threadpool effectively is available online: https://docs.openstack.org/watcher/latest/contributor/concurrency.html

  • The compute (Nova) data model is now built using the decision engine threadpool, significantly reducing the total time required to build it.
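The benefit of a threadpool for I/O-bound work can be sketched with the Python standard library. This is not Watcher's own threadpool class; it uses concurrent.futures as a stand-in, and fetch_node, build_model and MAX_GENERAL_WORKERS are hypothetical names chosen for the illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Stand-in for the watcher_decision_engine.max_general_workers option.
MAX_GENERAL_WORKERS = 4


def fetch_node(name):
    """Hypothetical blocking call, e.g. one API request per compute node."""
    time.sleep(0.05)  # simulate waiting on I/O
    return {"hostname": name}


def build_model(node_names):
    """Fetch all nodes concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=MAX_GENERAL_WORKERS) as pool:
        return list(pool.map(fetch_node, node_names))


nodes = build_model(["node-%d" % i for i in range(8)])
print(len(nodes))
```

Because the workers spend most of their time waiting, eight simulated requests finish in roughly the time of two sequential ones when four workers are available.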

Upgrade Notes

  • Python 2.7 support has been dropped. Last release of Watcher to support py2.7 is OpenStack Train. The minimum version of Python now supported by Watcher is Python 3.6.

3.0.0.0rc1

New Features

  • Added a force field to Audit. Users can pass --force to enable the new option when launching an audit. If force is True, the audit will be executed even if an action plan is ongoing. The new audit may create an incorrect action plan if both use the same data model.

  • API calls made while building the Compute data model will be retried upon failure. The number of failures allowed before giving up and the time to wait before reattempting are configurable through the api_call_retries and api_query_timeout parameters in the [collector] group. The default is 10 retries with 1 second between attempts.

  • All datasources can now be configured to retry retrieving a metric upon encountering an error. The wait time between attempts can be adjusted in the configuration. These configuration options are in the [watcher_datasources] group and are named query_max_retries and query_timeout.

  • Allow using a file to override the metric map. The metric map of each datasource is overridden as soon as the datasource is created by the manager. The override comes from a file whose path is provided by the watcher_decision_engine/metric_map_path setting in the config file. The file contains one map per datasource, whose keys are the metric names as recognized by Watcher and whose values are the real names of the metrics in the datasource. This setting defaults to /etc/watcher/metric_map.yaml, and the presence of this file is optional.

  • Improved interface for datasource baseclass that better defines expected values and types for parameters and return types of all abstract methods. This allows all strategies to work with every datasource provided the metrics are configured for that given datasource.

  • Watcher now supports configuring which datasource to use and in which order. This configuration is done by specifying datasources in the watcher_datasources section:

    [watcher_datasources]
    datasources = gnocchi,monasca,ceilometer

    Specific strategies can override this order and use datasources which are not listed in the global preference.

  • Grafana has been added as a datasource that can be used for collecting metrics. The configuration options allow specifying which metrics are available and how they are stored in Grafana, so that it can be used no matter how Grafana is configured. The configuration can be done in the usual configuration file, but it is recommended to configure most options in the YAML file for metrics. For a complete walkthrough on configuring Grafana see: https://docs.openstack.org/watcher/latest/datasources/grafana.html

  • Watcher can get resource information such as totals, allocation ratios and reserved amounts from the Placement API. The following fields have been added to the Watcher data model:

    • vcpu_reserved: The amount of CPU a node has reserved for its own use.

    • vcpu_ratio: CPU allocation ratio.

    • memory_mb_reserved: The amount of memory a node has reserved for its own use.

    • memory_ratio: Memory allocation ratio.

    • disk_gb_reserved: The amount of disk a node has reserved for its own use.

    • disk_ratio: Disk allocation ratio.

    We also add some new properties:

    • vcpu_capacity: The amount of vCPU available, taking the allocation ratio into account and excluding the reserved amount.

    • memory_mb_capacity: The amount of memory available, taking the allocation ratio into account and excluding the reserved amount.

    • disk_gb_capacity: The amount of disk available, taking the allocation ratio into account and excluding the reserved amount.

  • Added strategy “node resource consolidation”. This strategy consolidates VMs onto as few nodes as possible by VM migration. Users can set an input parameter to decide how the destination node is selected.

  • Added a show data model API for Watcher. New in version 1.3. Users can run the ‘openstack optimize datamodel list’ command to view the current data model information in memory. Adding ‘--audit <Audit_UUID>’ shows the data model filtered by the scope of that audit, ‘--detail’ shows detailed information about the current data model, and ‘--type <type>’ specifies the type of data model. The default type is ‘compute’. In the future, the ‘storage’ and ‘baremetal’ types will be supported.

  • Added a keystone_client group that lets users configure ‘interface’ and ‘region_name’ in watcher.conf. The default value of ‘interface’ is ‘admin’.

  • Added a Placement API helper to Watcher. Watcher can now get information about resource providers, which can be used for the data model and strategies. A placement_client config group with the options ‘api_version’, ‘interface’ and ‘region_name’ has also been added. The default values for ‘api_version’ and ‘interface’ are 1.29 and ‘public’, respectively.

  • A Watcher strategy can now select a specific planner other than the default. A strategy can set the planner property to specify its own planner.
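The relationship between the new reserved and ratio fields and the capacity properties listed above can be sketched as follows. The formula, (total - reserved) * allocation ratio, is the one Placement uses for inventory capacity; that Watcher's properties compute exactly this is an assumption here, and the NodeResources class is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class NodeResources:
    """Hypothetical container mirroring some of the new data model fields."""
    vcpus: int
    vcpu_reserved: int
    vcpu_ratio: float
    memory_mb: int
    memory_mb_reserved: int
    memory_ratio: float

    @property
    def vcpu_capacity(self):
        # Assumed formula: (total - reserved) * allocation ratio.
        return (self.vcpus - self.vcpu_reserved) * self.vcpu_ratio

    @property
    def memory_mb_capacity(self):
        # Same assumed formula, applied to memory.
        return (self.memory_mb - self.memory_mb_reserved) * self.memory_ratio


node = NodeResources(vcpus=16, vcpu_reserved=2, vcpu_ratio=4.0,
                     memory_mb=65536, memory_mb_reserved=4096,
                     memory_ratio=1.5)
print(node.vcpu_capacity)       # 56.0
print(node.memory_mb_capacity)  # 92160.0
```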

Upgrade Notes

  • If Gnocchi was configured with a custom number of retries and/or a custom timeout, the configuration needs to be moved into the [watcher_datasources] group instead of the [gnocchi_client] group.

  • The minimum required version of the [nova_client]/api_version value is now enforced to be 2.56 which is available since the Queens version of the Nova compute service.

    A watcher-status upgrade check has been added for this.

  • A Watcher API WSGI application script, watcher-api-wsgi, is now available. It is auto-generated by pbr and allows running the API service with a WSGI server (for example Nginx and uWSGI).

Deprecation Notes

  • The configuration options for query retries in [gnocchi_client] are deprecated and the option in [watcher_datasources] should now be used.

  • The new strategy baseclass has significant changes in method parameters, and any out-of-tree strategies will have to be adapted.

  • Several strategies have changed the node parameter to compute_node to be better aligned with terminology. These strategies include basic_consolidation and workload_stabilization. The node parameter will remain supported during the Train release and will be removed in the subsequent release.

  • Using the watcher/api/app.wsgi script is deprecated and it will be removed in the U release. Please switch to the automatically generated watcher-api-wsgi script instead.
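The datasource ordering and the query_max_retries and query_timeout options of the [watcher_datasources] group described in this release can be pictured together as in the sketch below. It is not Watcher's implementation: the function names, the default values, the metric name, and the choice to return None once all attempts have failed are assumptions made for the illustration.

```python
import time


def query_with_retries(query, max_retries=10, timeout=1.0, sleep=time.sleep):
    """Call query() until it succeeds or max_retries attempts have failed.

    max_retries and timeout stand in for query_max_retries and
    query_timeout; the defaults here are placeholders, not Watcher's.
    """
    for attempt in range(max_retries):
        try:
            return query()
        except Exception:
            # Wait before the next attempt, but not after the last one.
            if attempt + 1 < max_retries:
                sleep(timeout)
    return None


def first_available(datasources, order, metric, max_retries=3, timeout=0.0):
    """Try datasources in the configured order; return the first result."""
    for name in order:
        result = query_with_retries(lambda: datasources[name](metric),
                                    max_retries=max_retries, timeout=timeout)
        if result is not None:
            return name, result
    return None, None


def broken(metric):
    """Hypothetical datasource that is always unreachable."""
    raise ConnectionError("backend unreachable")


sources = {"gnocchi": broken, "monasca": lambda metric: 42.0}
print(first_available(sources, ["gnocchi", "monasca"], "host_cpu_usage"))
```

Here the first datasource in the preference list fails on every attempt, so the query falls through to the next one in the order.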

2.0.0

Prelude

Added the new watcher-status upgrade check tool.

New Features

  • Baremetal Model gets Audit scope with an ability to exclude Ironic nodes.

  • Added start_time and end_time fields to the audits table. Users can set the start time and/or end time when creating a CONTINUOUS audit.

  • A new framework for the watcher-status upgrade check command has been added. This framework allows adding various checks that can be run before a Watcher upgrade to verify that the upgrade can be performed safely.

  • Watcher supports API microversions starting with the Stein cycle. From now on, all API changes should preserve backward compatibility. To specify the API version, operators should use the OpenStack-API-Version HTTP header. To find out the minimum and maximum versions supported by the API, operators can access the /v1 resource, and the Watcher API will return the appropriate headers in the response.

  • Watcher consumes Nova notifications to update its internal Compute CDM (Cluster Data Model). The supported notifications are listed below.

    pre-existing:

    • service.update

    • instance.update

    • instance.delete.end

    new:

    • instance.lock

    • instance.unlock

    • instance.pause.end

    • instance.power_off.end

    • instance.power_on.end

    • instance.resize_confirm.end

    • instance.restore.end

    • instance.resume.end

    • instance.shelve.end

    • instance.shutdown.end

    • instance.suspend.end

    • instance.unpause.end

    • instance.unrescue.end

    • instance.unshelve.end

    • instance.rebuild.end

    • instance.rescue.end

    • instance.create.end

    • instance.live_migration_force_complete.end

    • instance.live_migration_post_dest.end

    • instance.soft_delete.end

    • service.create

    • service.delete

  • Added a new config option ‘action_execution_rule’, which is a dict type. Its key is the strategy name and its value is ‘ALWAYS’ or ‘ANY’. ‘ALWAYS’ means the callback function returns True as usual. ‘ANY’ means the return value depends on the result of the previous action’s execution: if the previous action failed, the callback returns True and the engine continues to run the next action; if the previous action succeeded, the callback returns False and the next action is skipped. For strategies that are not in ‘action_execution_rule’, the callback always returns True. If your strategy needs this feature, add the following section to the watcher.conf file:

    [watcher_workflow_engines.taskflow]
    action_execution_rule = {'your strategy name': 'ANY'}

  • For a large cloud infrastructure, retrieving data from Nova may take a long time. To avoid retrieving too much data from Nova, the compute data model is now built according to the scope of the audit.
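The decision made by the ‘action_execution_rule’ callback described above can be sketched as a small function. The function name and its arguments are hypothetical; only the ‘ALWAYS’ and ‘ANY’ semantics come from the note.

```python
# Mirrors the config example: strategy name mapped to its rule.
ACTION_EXECUTION_RULE = {"your strategy name": "ANY"}


def should_run_next(strategy_name, previous_succeeded,
                    rules=ACTION_EXECUTION_RULE):
    """Return True if the next action should run, per the described rule."""
    # Strategies that are not listed behave like 'ALWAYS'.
    rule = rules.get(strategy_name, "ALWAYS")
    if rule == "ANY":
        # Continue only while the previous action failed; stop after a success.
        return not previous_succeeded
    return True


print(should_run_next("your strategy name", previous_succeeded=False))
print(should_run_next("your strategy name", previous_succeeded=True))
print(should_run_next("some other strategy", previous_succeeded=True))
```

With ‘ANY’, a chain of alternative actions stops as soon as one of them succeeds, which suits strategies where later actions are fallbacks for earlier ones.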

Upgrade Notes

  • Operators can now use the new CLI tool watcher-status upgrade check to check whether a Watcher deployment can be safely upgraded from the N-1 to the N release.

Deprecation Notes

  • The Ceilometer datasource has been deprecated because its API was deprecated in the Ocata cycle. Watcher continued to support Ceilometer for some releases after Ocata to let users migrate to the Gnocchi or Monasca datasources. Ceilometer support will be removed starting with the Train release.

  • Watcher removes support for Nova legacy notifications because Nova will deprecate them.

1.11.0

New Features

  • Watcher services can be launched in HA mode. From now on, the Watcher Decision Engine and Watcher Applier services may be deployed on different nodes to run in active-active or active-passive mode. Any ONGOING Audits or Action Plans will be CANCELLED if the service they are executed on is restarted.

1.10.0

New Features

  • Added the ability to exclude instances from the audit scope based on project_id. Instances from a particular OpenStack project can now be excluded from an audit by defining the scope in audit templates.

  • Added a strategy for maintenance of a single compute node without interrupting the user’s applications. If a backup node is given, the strategy first migrates all instances from the maintenance node to the backup node. If no backup node is provided, it migrates all instances and relies on nova-scheduler to choose the destination.

1.9.0

New Features

  • Audits now have a ‘name’ field, which is friendlier to end users. An audit’s name cannot exceed 63 characters.

  • When building the compute CDM, Watcher now uses the whole scope of the cluster, which includes all instances. Excluded instances are filtered out when migrating during the audit.

  • Watcher gained the ability to calculate multiple global efficacy indicators during an audit’s execution. Global efficacy can now be calculated for many resource types (such as volumes, instances and network) if the strategy supports efficacy indicators.

  • Added notifications about action plan cancellation. Event-based plugins now know when the cancellation of an action plan has started and completed.

  • Instance cold migration logic now uses the Nova Migrate Server (migrate action) API, which has supported a host option since v2.56.

Upgrade Notes

  • The Nova API version is now set to 2.56 by default. This version is required for the migrate action of migration type cold with the destination_node parameter to work.

Bug Fixes

  • The migrate action of migration type cold with destination_node parameter was fixed. Before fixing, it booted an instance in the service project as a migrated instance.

1.7.0

New Features

  • Adds an audit scoper for the storage data model. Watcher users can now specify an audit scope for the storage CDM in the same manner as the compute scope.

  • Adds baremetal data model in Watcher

  • Added a way to check state of strategy before audit’s execution. Administrator can use “watcher strategy state <strategy_name>” command to get information about metrics’ availability, datasource’s availability and CDM’s availability.

  • Added storage capacity balance strategy.

  • Added strategy “Zone migration” and its goal “Hardware maintenance”. The strategy automatically migrates many instances and volumes efficiently with minimum downtime.

1.6.0

New Features

  • Each CDM collector can have its own CDM scoper now. This changed Scope JSON schema definition for the audit template POST data. Please see audit template create help message in python-watcherclient.

1.5.0

New Features

  • The existing workload_balance strategy was based only on the CPU workload of VMs. This feature improves the strategy: through the “metrics” input parameter, it decides whether to migrate a VM based on CPU or memory utilisation.

1.4.0

New Features

  • Add notifications related to Action object.

  • Added the functionality to filter out instances which have metadata field ‘optimize’ set to False. For now, this is only available for the basic_consolidation strategy (if “check_optimize_metadata” configuration option is enabled).

  • Added a binding between the apscheduler job and the Watcher decision engine service. This will allow HA support to be provided in the future.

  • Enhancement of vm_workload_consolidation strategy by using ‘memory.resident’ metric in place of ‘memory.usage’, as memory.usage shows the memory usage inside guest-os and memory.resident represents volume of RAM used by instance on host machine.

  • Watcher continuous audits can now be created with a cron interval. For example, the optional argument ‘--interval "*/5 * * * *"’ launches the audit every 5 minutes. These jobs are executed on a best-effort basis, so we recommend a cron interval of at least one minute.

  • Added a description property for dynamic actions. Admins can see detailed information about any specific action.

  • Added Gnocchi support as data source for metrics. Administrator can change data source for each strategy using config file.

  • Actions are now validated with JSONSchema instead of voluptuous.

  • Added strategy to identify and migrate a Noisy Neighbour - a low-priority VM that negatively affects the performance of a high-priority VM by over utilising Last Level Cache.

  • Add notifications related to Service object.

  • Added volume migrate action
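The ‘optimize’ metadata filter mentioned in this release can be pictured with the sketch below. It is not the basic_consolidation code: the function name and the dict-based instance representation are hypothetical, and treating the string "False" like the boolean False is an assumption based on Nova metadata values being strings.

```python
def filter_optimizable(instances, check_optimize_metadata=True):
    """Drop instances whose metadata marks them as not to be optimized.

    Each instance is a dict with an optional "metadata" dict.
    """
    if not check_optimize_metadata:
        # Filtering is opt-in, mirroring the configuration option.
        return list(instances)
    kept = []
    for instance in instances:
        value = instance.get("metadata", {}).get("optimize", True)
        # Assumption: "False", "false" and False all disable optimization.
        if str(value).strip().lower() == "false":
            continue
        kept.append(instance)
    return kept


servers = [
    {"name": "vm-1", "metadata": {"optimize": "False"}},
    {"name": "vm-2", "metadata": {}},
    {"name": "vm-3"},
]
print([s["name"] for s in filter_optimizable(servers)])
```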

1.3.0

New Features

  • Add action for compute node power on/off

  • Added cinder cluster data model

1.1.0

New Features

  • Added SUSPENDED audit state

1.0.0

New Features

  • Add notifications related to Action plan object.

0.34.0

New Features

  • Add notifications related to Audit object.

  • Watcher can continuously optimise the OpenStack cloud for a specific strategy or goal by periodically triggering an audit, which generates an action plan and runs it automatically.

  • Centralise all configuration options for Watcher.

  • Watcher database can now be upgraded thanks to Alembic.

  • Provides a generic way to define the scope of an audit. The set of audited resources will be called “Audit scope” and will be defined in each audit template (which contains the audit settings).

  • The graph model describes how VMs are associated to compute hosts. This allows for seeing relationships upfront between the entities and hence can be used to identify hot/cold spots in the data centre and influence a strategy decision.

  • Watcher supports multiple metrics backends and relies on Ceilometer and Monasca.

  • Watcher can now run specific actions in parallel improving the performance dramatically when executing an action plan.

  • Check the creation time of the action plan, and set its state to SUPERSEDED if it has expired.

  • Provide a versioned notification mechanism in Watcher. Whenever a Watcher object is created, updated or deleted, a versioned notification is automatically sent, where relevant, to allow an event-driven style of architecture within Watcher. It also gives other services and/or 3rd party software (e.g. monitoring solutions or rules engines) the ability to react to such events.

  • Add a service supervisor to watch Watcher daemons.

  • All Watcher objects have been refactored to support OVO (oslo.versionedobjects), which was a prerequisite for implementing versioned notifications.
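The action plan expiry check mentioned in this release (setting the state to SUPERSEDED once a plan has expired) could look roughly like the following. The function name, the expiry argument and the sample state are assumptions for the illustration; the note itself only states that the creation time is checked.

```python
from datetime import datetime, timedelta, timezone


def resolve_state(state, created_at, expiry, now=None):
    """Return SUPERSEDED if the plan is older than expiry, else its state."""
    now = now or datetime.now(timezone.utc)
    if now - created_at > expiry:
        return "SUPERSEDED"
    return state


# Fixed timestamps so the example is deterministic.
now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
old_plan = datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc)
new_plan = datetime(2024, 1, 2, 6, 0, tzinfo=timezone.utc)
one_day = timedelta(hours=24)

print(resolve_state("RECOMMENDED", old_plan, one_day, now=now))
print(resolve_state("RECOMMENDED", new_plan, one_day, now=now))
```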

0.29.0

New Features

  • Added a standard way to both declare and fetch configuration options so that whenever the administrator generates the Watcher configuration sample file, it contains the configuration options of the plugins that are currently available.

  • Added a generic scoring engine module, which standardizes interactions with scoring engines through a common API. A scoring engine can be used by different strategies, which improves code and data model reuse.

  • Added an in-memory cache of the cluster model built up and kept fresh via notifications from services of interest in addition to periodic syncing logic.

  • Added a way to add a new action without having to amend the source code of the default planner.

  • Added a way to create periodic audit to be able to continuously optimise the cloud infrastructure.

  • Added a way to compare the efficacy of different strategies for a given optimisation goal.

  • Added a way to return the list of available goals depending on which strategies have been deployed on the node where the decision engine is running.

  • Allow the decision engine to pass strategy parameters, such as an optimisation threshold, to the selected strategy, and allow the strategy to provide parameter information to the end user.

  • Copy all audit templates parameters into audit instead of having a reference to the audit template.

  • Added a strategy that monitors if there is a higher load on some hosts compared to other hosts in the cluster and re-balances the work across hosts to minimise the standard deviation of the loads in the cluster.

  • Added a new strategy based on the airflow of servers. This strategy makes decisions to migrate VMs to make the airflow uniform.

  • Added policies to handle user rights to access Watcher API.

  • Added a strategy based on the VM workloads of hypervisors. This strategy makes decisions to migrate workloads so that the total VM workload of each hypervisor is balanced when the total VM workload of a hypervisor reaches a threshold.
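The idea behind the strategy in this release that re-balances work to minimise the standard deviation of host loads can be sketched with a single greedy step. This is not the strategy's actual algorithm; the function name, the data layout and the exhaustive search over single moves are assumptions made to illustrate the objective.

```python
from statistics import pstdev


def best_migration(host_loads, vm_loads):
    """Pick the single VM move that lowers the load std. deviation the most.

    host_loads: {host: total load}; vm_loads: {vm: (host, load)}.
    Returns (vm, destination, new_stdev), or None if no move helps.
    """
    best = None
    best_dev = pstdev(list(host_loads.values()))
    for vm, (src, load) in vm_loads.items():
        for dst in host_loads:
            if dst == src:
                continue
            # Simulate the move on a copy of the current loads.
            trial = dict(host_loads)
            trial[src] -= load
            trial[dst] += load
            dev = pstdev(list(trial.values()))
            if dev < best_dev:
                best, best_dev = (vm, dst, dev), dev
    return best


hosts = {"h1": 90.0, "h2": 10.0}
vms = {"vm1": ("h1", 40.0), "vm2": ("h1", 50.0)}
print(best_migration(hosts, vms))
```

In this example moving vm1 to h2 leaves both hosts at a load of 50, so the deviation drops from 40 to 0 and that move is selected.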