Xena Series Release Notes

7.4.0

Upgrade Notes

  • The default value of the [oslo_policy] policy_file config option has been changed from policy.json to policy.yaml. Operators who use customized or previously generated static policy JSON files (which are not needed by default) should generate new policy files or convert them to YAML format. Use the oslopolicy-convert-json-to-yaml tool to convert a JSON policy file to a YAML policy file in a backward-compatible way.
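
As a rough illustration of what such a conversion involves, the sketch below rewrites a flat JSON policy mapping as YAML text using only the Python standard library. It is not the oslopolicy-convert-json-to-yaml tool, which remains the recommended path for real deployments. The function name and the sample rule names are hypothetical.

```python
import json


def policy_json_to_yaml(json_text: str) -> str:
    """Render a flat JSON policy mapping as YAML text.

    Illustrative only: assumes every rule value is a string and relies
    on JSON string literals generally being valid YAML double-quoted
    scalars.
    """
    rules = json.loads(json_text)
    lines = []
    for name, check in rules.items():
        if not isinstance(check, str):
            raise ValueError(f"unsupported rule value for {name!r}")
        # json.dumps quotes and escapes both the key and the value.
        lines.append(f"{json.dumps(name)}: {json.dumps(check)}")
    return "\n".join(lines) + "\n"


# Hypothetical rule names, used only for demonstration.
sample = '{"get alarm": "rule:admin_or_owner", "list templates": ""}'
converted = policy_json_to_yaml(sample)
```

The sketch only changes the file syntax. It makes no attempt to reconcile custom rules with the defaults registered in code.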

Deprecation Notes

  • Use of JSON policy files was deprecated by the oslo.policy library during the Victoria development cycle. As a result, this deprecation is being noted in the Wallaby cycle with an anticipated future removal of support by oslo.policy. As such, operators will need to convert to YAML policy files. Please see the upgrade notes for details on migration of any custom policy files.

7.3.0

New Features

  • A new Cetus Datasource has been introduced to include Cetus entities (cluster and pod) in the Vitrage Entity Graph. Cetus is a self-developed solution for running k8s on OpenStack. It can automatically create multiple instances and automatically deploy multiple k8s clusters on those instances. Because Cetus combines a self-developed OpenStack project with a multi-cluster k8s project, it can be operated through OpenStack authentication. Cetus mainly includes cetus.cluster, cetus.pod and cetus.node, corresponding to the k8s cluster, pod and node. A cetus.node is a VM instance or a bare-metal instance in OpenStack, so the Cetus datasource includes only cetus.cluster and cetus.pod. At this point, Cetus entities are extracted using the PULL approach, based on a periodic snapshot query to the Cetus API for the current list of Cetus entities.
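
The PULL approach can be pictured with a minimal sketch: query the full entity list and compare it with the result of the previous poll. This is not the Vitrage driver code. The fetch function, the entity payload and the id values are stand-ins, since the Cetus API client is not shown in these notes.

```python
from typing import Callable, Dict, List, Set, Tuple

Entity = Dict[str, str]


def pull_snapshot(fetch_all: Callable[[], List[Entity]],
                  known_ids: Set[str]) -> Tuple[Set[str], Set[str], Set[str]]:
    """Query the full entity list once and compare it with the last poll.

    Returns (added_ids, removed_ids, current_ids).
    """
    current_ids = {entity["id"] for entity in fetch_all()}
    added = current_ids - known_ids
    removed = known_ids - current_ids
    return added, removed, current_ids


def fake_fetch() -> List[Entity]:
    """Stand-in for a snapshot query; the payload shape is assumed."""
    return [
        {"id": "cluster-1", "type": "cetus.cluster"},
        {"id": "pod-7", "type": "cetus.pod"},
    ]


added, removed, known = pull_snapshot(fake_fetch, {"cluster-1", "pod-3"})
```

A caller would run such a function on a timer and carry the returned id set into the next poll.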

7.1.0

New Features

  • A new TMF API 639 datasource has been added. It can handle both topology snapshots and subsequent updates, as described in the TMF API 639 specification.

7.0.0

Upgrade Notes

  • The deprecated os-region-name option has been dropped.

Deprecation Notes

  • The region-name option in the keystone_client configuration is deprecated. Use region_name instead.

6.0.0

New Features

  • Starting with the Train release, Vitrage supports database migrations. This means that, starting with the U release, you will be able to upgrade Vitrage from the previous release.

Upgrade Notes

  • Python 2.7 support has been dropped. The last release of vitrage to support py2.7 is OpenStack Train. The minimum version of Python now supported by vitrage is Python 3.6.

5.0.0

New Features

  • Added a new API to show vitrage status.

  • Added a new API to list all vitrage template versions supported.

  • A new Kapacitor Datasource was added, to handle alerts coming from Kapacitor. Kapacitor is the alerting engine of the TICK Stack. It is built on an open source core and processes host or instance metrics stored in InfluxDB to generate alerts.

  • A new Monasca Datasource has been introduced to include Monasca alarms in the Vitrage Entity Graph. Monasca is a Monitoring as a Service solution offering a centralized monitoring sink for metrics gathered by Monasca Agents at many infrastructure levels. Moreover, it provides an alarm management API that enables defining alarms based on collected metrics. This change is the first stage of integration with Monasca. At this point, Monasca entities are extracted using the PULL approach, based on a periodic snapshot query to the Monasca Alarm API for the current list of alarm entities. In the future, a PUSH approach based on Monasca notifications will be implemented. The current implementation requires that the metrics associated with a given alarm contain information about the resource type and ID, which is required for associating alarms with entities in the Vitrage Entity Graph. This additional information should be included in the form of metric dimensions, specifically resource_type and resource_id. Dimensions can be defined in the Monasca agent configuration.

  • Added support to overwrite existing template when adding one.

  • Added support to show and delete template by name.
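
For the Monasca datasource described in this section, the role of the resource_type and resource_id dimensions can be sketched as follows. The alarm payload shape (a metrics list whose items carry a dimensions mapping) and the sample values are assumptions made for illustration. This is not the datasource's actual code.

```python
from typing import Optional, Tuple


def resource_for_alarm(alarm: dict) -> Optional[Tuple[str, str]]:
    """Return (resource_type, resource_id) from the first metric that
    carries both dimensions, or None if no metric does."""
    for metric in alarm.get("metrics", []):
        dims = metric.get("dimensions", {})
        if "resource_type" in dims and "resource_id" in dims:
            return dims["resource_type"], dims["resource_id"]
    return None


# Hypothetical alarm payloads.
tagged_alarm = {
    "id": "alarm-1",
    "metrics": [
        {"name": "cpu.idle_perc",
         "dimensions": {"resource_type": "nova.host",
                        "resource_id": "compute-0"}},
    ],
}
untagged_alarm = {
    "id": "alarm-2",
    "metrics": [{"name": "cpu.idle_perc",
                 "dimensions": {"hostname": "compute-0"}}],
}

target = resource_for_alarm(tagged_alarm)
missing = resource_for_alarm(untagged_alarm)
```

An alarm whose metrics lack the two dimensions yields None here, which mirrors the stated requirement: without them the alarm cannot be associated with a graph entity.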

4.3.0

Prelude

Vitrage Stein release contains some significant changes:

  • New and simplified template language! The new templates are shorter and much easier to understand and reuse: https://docs.openstack.org/vitrage/latest/contributor/vitrage-templates.html

  • Added a Trove datasource and a Zaqar notifier.

  • New APIs for querying Vitrage services and for resource count.

  • Performance improvements and faster data retrieval. The memory signature and processing runtime were significantly reduced.

4.2.0

New Features

  • Datasource end messages, previously used to notify the processor that get_all finished successfully, are no longer used and have been removed.

Upgrade Notes

  • The default Cinder API version has been changed to v3. It is fully compatible with API v2. If you need to use Cinder API v2, set cinder_version='2' in the Vitrage configuration file.

Deprecation Notes

  • Config option initialization_interval is deprecated and no longer used, due to the removal of datasource end messages.

  • Config option initialization_max_retries is deprecated and no longer used, due to the removal of datasource end messages.

4.1.0

New Features

  • Added a new API to list all vitrage services present in the system.

  • A new Zaqar notifier was added, in order to send alarms from Vitrage to the Zaqar messaging framework.

  • Added support for a yaml configuration file that maps the Prometheus alert labels to a corresponding Vitrage resource with specific properties (id or other unique properties).

  • Added support for get_all alerts from Prometheus Alertmanager.

  • Added support for parameters in Vitrage templates. A template may contain one or more parameters that are assigned with actual values upon template creation. This enables easy reuse of common templates.

  • Introduced template version 3, a shorter, more fluent template language. The overall appearance of the template yaml was improved, condition definitions were revised, and relationship declarations were removed.
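
The template parameters feature listed above amounts to substituting values into a template when it is created. The sketch below shows the idea on an already-parsed template. The get_param(...) placeholder spelling, the template layout and the helper name are assumptions for illustration, so consult the template documentation for the exact syntax.

```python
import re

# Assumed placeholder form: get_param(parameter_name)
_PARAM = re.compile(r"get_param\((\w+)\)")


def resolve_params(node, values):
    """Recursively replace get_param(name) placeholders in a parsed template."""
    if isinstance(node, dict):
        return {key: resolve_params(val, values) for key, val in node.items()}
    if isinstance(node, list):
        return [resolve_params(item, values) for item in node]
    if isinstance(node, str):
        # A missing parameter raises KeyError instead of passing silently.
        return _PARAM.sub(lambda m: str(values[m.group(1)]), node)
    return node


# Hypothetical template fragment.
template = {
    "parameters": {"alarm_name": {"description": "alarm to raise"}},
    "entities": {"host_alarm": {"name": "get_param(alarm_name)"}},
}
resolved = resolve_params(template, {"alarm_name": "High CPU load"})
```

The same template can then be added several times with different parameter values, which is the reuse the feature is meant to enable.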

4.0.0

Prelude

Added new tool vitrage-status upgrade check.

New Features

  • A new framework for the vitrage-status upgrade check command was added. This framework allows adding various checks that can be run before a Vitrage upgrade to verify that the upgrade can be performed safely.

  • The collector service was removed to simplify the architecture and improve performance at scale. The vitrage-collector service was removed, and vitrage-graph is now responsible for executing the drivers. This allows drivers to take advantage of Python generators (yield) and conserve memory.

  • Use Nova versioned notifications instead of the legacy, unversioned ones. This behavior is controlled by the use_nova_versioned_notifications configuration option.

  • A new resource count API with support for queries and group-by. It allows retrieving quick summaries of graph nodes.

  • The resource list API now supports using a query.

  • A new Trove Datasource has been introduced to include Trove entities (database instances and clusters) in the Vitrage Entity Graph. Trove is a Database as a Service solution offering database lifecycle management (automated provisioning, configuration, backups, clustering, etc.). Adding the datasource to Vitrage enables detecting problems at lower levels of the infrastructure that may affect running databases, and reacting to identified issues, e.g. scaling the database up/out or live-migrating virtual machines from a failed compute node. This change is the first stage of integration with Trove. At this point, Trove entities are extracted using the PULL approach, based on a periodic snapshot query to the Trove API for the current list of Trove entities. In the future, a PUSH approach based on Trove notifications will be implemented.
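
The resource count feature listed above (a query plus group-by over graph nodes) can be approximated in a few lines. This is a conceptual sketch over plain dictionaries, not the API implementation. The property names and sample vertices are assumptions chosen for illustration.

```python
from collections import Counter
from typing import Callable, Dict, Iterable


def count_resources(vertices: Iterable[dict],
                    group_by: str,
                    query: Callable[[dict], bool] = lambda v: True
                    ) -> Dict[str, int]:
    """Count the vertices that match the query, grouped by one property."""
    counts = Counter(v.get(group_by, "unknown") for v in vertices if query(v))
    return dict(counts)


# Hypothetical graph vertices.
vertices = [
    {"vitrage_type": "nova.instance", "vitrage_operational_state": "OK"},
    {"vitrage_type": "nova.instance", "vitrage_operational_state": "ERROR"},
    {"vitrage_type": "nova.host", "vitrage_operational_state": "OK"},
]

by_type = count_resources(vertices, "vitrage_type")
healthy_by_type = count_resources(
    vertices,
    "vitrage_type",
    query=lambda v: v.get("vitrage_operational_state") == "OK",
)
```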

Upgrade Notes

  • Operators can now use the new CLI tool vitrage-status upgrade check to check whether a Vitrage deployment can be safely upgraded from the N-1 to the N release.

Deprecation Notes

  • The static_physical datasource was removed. Please use the static datasource instead.

  • Resource list GET is deprecated, use POST instead.

3.1.0

Prelude

Vitrage Rocky release contains significant infrastructure changes that bring a lot of value to the end user. The main ones are:

  • Graph fast-failover and better HA support.

  • High-scale support. The graph was tested to work with over 100,000 entities.

  • Alarm and RCA history.

In addition, we added Kubernetes and Prometheus datasources.

New Features

  • The Alarm and RCA History feature allows saving and querying historical alarms and exploring their root cause. A new set of parameters in the alarm list API and a new history API allow users to query the data saved in the Vitrage schema in the DB.

  • Add support for more aodh alarm types - composite, gnocchi_aggregation_by_metrics_threshold and gnocchi_aggregation_by_resources_threshold.

  • High availability of active standby vitrage-graph is better supported. A fast fail-over is implemented by storing all the required in-memory state data in mysql. Vitrage-graph initializes quickly upon failover without requesting any updates.

  • Added a new datasource for a Kubernetes cluster running as a workload on OpenStack. We support Kubernetes on top of Nova.

  • A new Prometheus Datasource was added, to handle alerts coming from Prometheus. Prometheus is an open-source systems monitoring and alerting toolkit, with exporters that export different metrics to Prometheus and an Alertmanager that handles alerts sent by the Prometheus server.

  • Support for graphs with more than 100,000 vertices has been added and tested. See high-scale configuration document.

Known Issues

  • As part of Rocky fast-failover support, vitrage-graph is now reloaded from the database. This causes an issue with datasources using caches that can become outdated after vitrage-graph restart, or if more than one vitrage-collector is used. Please avoid running multiple vitrage-collector services.

Bug Fixes

  • Added support for Networkx version 2.1

3.0.0

New Features

  • Added a command line tool that serves as a scaffold for creating a new datasource.

  • Added a new Mock datasource, which can mock an entire graph and allows testing large scale stability as well as performance.

  • The collector service was changed to run on demand instead of periodically, hence it can now be run in active-active mode. This is part of a larger design to improve high availability.

  • Oslo service was replaced by cotyledon, so Vitrage uses real threads and multiprocessing. This change removes unnecessary complications of using eventlets and timers.

  • Created a dedicated process for the api handler, for better handling api calls under stress.

  • Support get_changes in the static datasource

  • The static datasource now supports changes in existing yaml files, and updates the graph accordingly.

Bug Fixes

  • Many bug fixes related to performance and stability.

2.1.0

Deprecation Notes

  • The static_physical datasource is deprecated. Please use the static datasource instead.

2.0.0

Prelude

Vitrage Queens release contains many new features and bug fixes.

  • Major changes were made in Vitrage templates. Version 2 was introduced, and it includes features like:

    • templates that contain only topology definitions

    • regular expressions

    • functions (get_attr)

    The templates are now stored in a database, and there is a new API for adding and deleting them.

  • Support was added for webhook registration on Vitrage alarms. The registered webhooks will be notified on every alarm state change.

  • There are significant performance enhancements, mostly around parallel evaluation of the Vitrage templates.

New Features

  • Created a new API to query alarm counts, with an optional parameter to query for all_tenants. The path for the api is /v1/alarm/count/.

  • Support Aodh Gnocchi threshold alarm.

  • Added the option to import external “definition only” templates (containing only entities and relationships) into a template and use the imported definitions to create scenarios.

  • Added an Event Persistor service that listens to the RabbitMQ2 (on a different topic) and asynchronously writes the events to a relational database. All events are stored after the filter/enrich phase.

  • Support querying Vitrage healthcheck status in Vitrage client and displaying it in the console.

  • Fixed multiple non-working tempest tests and added new ones, increasing the number of running tests.

  • A version field was added to the metadata section of Vitrage templates, to allow future changes that are not backward-compatible. The default version is 1.

  • Support nested heat stacks in heat datasource.

  • Parallel evaluation of Vitrage templates. The user can now specify a number of workers to evaluate the templates. Each such worker holds a clone of the graph and will evaluate a portion of the template scenarios. The number of workers defaults to the number of available cores.

  • Persisting of the current active actions. Scenarios that execute the same actions are considered overlapping. Vitrage keeps track of these actions; this information was previously held in memory and is now stored in the DB. This allows for parallel evaluators.

  • Register default policies in code. Support the community goal for the Queens release.

  • Refactored the execute-mistral action. All input parameters should appear under an input section. The change takes effect in template version 2. execute-mistral actions from version 1 are automatically converted to the new format.

  • The aodh datasource has been rewritten by using Aodh client, as Ceilometer API is being removed. It also supports ceilometer datasource with an older OpenStack version that contains the Ceilometer API.

  • A new SNMP parsing service was added. The service parses alarms reported from SNMP managed systems and sends them to the OpenStack message bus, for further processing by specific alarm datasources.

  • Integration with SQLAlchemy. This allows data and state to be kept after restarts, and will also allow a shared data store for multiple processes.

  • Support functions in Vitrage templates version 2. The first supported function is get_attr which allows retrieving attributes from the matched entity in the graph. As the first stage this function is supported only for execute_mistral action.

  • Backend support for alarm show API. Return the alarm properties for a specific alarm. Alarm is fetched according to vitrage_id parameter. Path for the api is /v1/alarm/_id_.

  • Support mark_down action for instances via calling Nova reset-state API.

  • Added a mandatory type property to the templates metadata section in version 2. The type should be one of {standard, definition, equivalence}.

  • Added support for template add and template delete. Templates can now be added/removed by the API while vitrage is running (no restart is required). Templates are stored in the database and remain after restarting vitrage. Adding/removing a template at runtime performs a live update to the entity graph.

  • Template fields can now contain regular expressions.

  • Added support to register webhooks to the database. When these webhooks are added and the webhook notifier option is enabled via the config file, Vitrage will send notifications regarding alarm state changes to the registered webhooks.
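
The parallel evaluation feature listed above gives each worker a portion of the template scenarios, with the worker count defaulting to the number of available cores. These notes do not say how the scenarios are divided, so the round-robin split below is an assumption, and the function and scenario names are hypothetical.

```python
import os
from typing import List, Optional, Sequence


def partition_scenarios(scenarios: Sequence[str],
                        workers: Optional[int] = None) -> List[List[str]]:
    """Split scenarios round-robin into one bucket per worker."""
    if workers is None:
        # Default to the number of available cores, falling back to 1.
        workers = os.cpu_count() or 1
    if workers < 1:
        raise ValueError("workers must be at least 1")
    return [list(scenarios[i::workers]) for i in range(workers)]


buckets = partition_scenarios(["s1", "s2", "s3", "s4", "s5"], workers=2)
```

Each bucket would then be evaluated by a worker holding its own clone of the graph, as the note describes.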

1.8.0

New Features

  • A new vitrage-collector service was added, in order to separate the datasource loading from the Vitrage graph processing. This change allows better HA support and will enable storing the datasources events in a database in order to support alarm history.

  • Vitrage now supports authentication with KeyCloak server using OpenId Connect protocol.

  • A new service was added to Vitrage, the machine learning service. Together with it, the first machine learning plugin was added: the Jaccard Correlation plugin. This plugin listens to RabbitMQ, receiving messages about the creation and deletion of alarms, and learns the correlation between each pair of alarms, in order to recommend the creation of new templates in the future.

  • Integration with Mistral (the OpenStack workflow service). The Vitrage user can specify in a Vitrage template that if a certain condition is met (like an alarm on the host, or a combination of a few alarms) a Mistral workflow should be executed. The workflow can be used, for example, to take corrective actions.
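
The Jaccard Correlation plugin mentioned in this section is named after the Jaccard index, the size of the intersection of two sets divided by the size of their union. The sketch below computes it for every pair of alarms. It assumes each alarm is represented by the set of time windows in which it was active, which is a modelling choice made here and not a description of the plugin's internals.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def jaccard(a: Set[int], b: Set[int]) -> float:
    """Jaccard index |a & b| / |a | b|; defined as 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_scores(active: Dict[str, Set[int]]) -> Dict[Tuple[str, str], float]:
    """Score every unordered pair of alarm names (sorted for stable keys)."""
    return {
        (x, y): jaccard(active[x], active[y])
        for x, y in combinations(sorted(active), 2)
    }


# Hypothetical alarms and the time windows in which each was active.
active_windows = {
    "cpu_high": {1, 2, 3, 4},
    "mem_high": {2, 3, 4, 5},
    "disk_full": {9},
}
scores = pairwise_scores(active_windows)
```

Pairs with a high score are candidates for a template that links the two alarms.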

1.7.0

New Features

  • Added osprofiler support. OSProfiler is an OpenStack cross-project profiling library. It allows the user to generate a trace per request which is processed in multiple services, and then generate a tree of calls which is intuitive to understand.

1.6.0

New Features

  • A new SNMP notifier was added, in order to send SNMP traps from Vitrage. SNMP traps are sent to subscribed targets when a Vitrage deduced alarm is raised. This notifier allows external systems to listen to alarms raised by Vitrage. The new notifier is pluggable, so anyone can add an implementation.

  • The Vitrage ID feature is a change in the way we create an entity Vitrage ID. Instead of the ID being created as a concatenation of different entity fields, it is generated as a standard OpenStack UUID. This also allows future history support.

  • Support definition of entity equivalence. If the equivalence between A and B is configured, all scenarios linked to entity A will be expanded to equivalent scenarios, i.e. the same conditions and actions with the referenced entity A replaced by entity B. One use case of equivalence is treating equivalent alarms from multiple datasources as the same, without duplicating scenario definitions in templates.

  • The Multi Tenancy feature allows different tenants to see their own entities for each api command. The feature allows an admin to add the --all-tenants parameter in order to see all the entities for that api command.

  • The Not Operator feature adds support for the not operator to the template language, in addition to the and and or operators. The not operator allows scenarios such as:

    • Support of High Availability scenarios

    • Support of negative scenarios, for example raising an alarm on a host that has no cpu_alarm on it.
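
The entity equivalence feature in this section expands every scenario linked to entity A into an equivalent scenario for entity B. A minimal sketch of that expansion follows. The scenario representation (a dictionary with a single entity reference), the alarm names and the action strings are hypothetical simplifications.

```python
from typing import Dict, List, Set


def expand_scenarios(scenarios: List[dict],
                     equivalences: Dict[str, Set[str]]) -> List[dict]:
    """Add a copy of each scenario for every equivalent of its entity."""
    expanded = []
    for scenario in scenarios:
        expanded.append(scenario)
        for other in sorted(equivalences.get(scenario["entity"], ())):
            # Same action, with the referenced entity swapped.
            expanded.append({**scenario, "entity": other})
    return expanded


# Hypothetical scenario and equivalence definitions.
scenarios = [{"entity": "zabbix_cpu_alarm", "action": "set_state SUBOPTIMAL"}]
equivalences = {"zabbix_cpu_alarm": {"nagios_cpu_alarm"}}
expanded = expand_scenarios(scenarios, equivalences)
```

With this expansion the template author writes the scenario once, and the equivalent alarm from the other datasource triggers the same action.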

1.5.0

Prelude

The main focus of Vitrage in the Ocata version was to enhance the integration with external projects, to support more use cases, and to give more value to the user.

New Features

  • A new Collectd Datasource was added, to handle notifications coming from collectd. collectd is a fast system statistics collection daemon, with plugins that collect different metrics. We tested the DPDK plugin, which can trigger alarms such as interface failure or noisy neighbors.

  • A new Doctor Datasource was added in order to support the OPNFV Doctor Inspector requirements. This datasource handles notifications sent from the Doctor monitor. If a compute.host.down notification arrives, the Doctor datasource will create an alarm on the host in Vitrage, call nova force-down API and create deduced alarms on the relevant instances and applications.

  • The Vitrage Static Datasource is meant to define in a yaml file cloud resources that cannot be retrieved dynamically. Switches are a good example, as currently no OpenStack project provides information about them. In Newton, only switches could be defined in the static yaml files. In Ocata the file definition was enhanced, so the user can define practically everything. The new schema is a subset of the Vitrage evaluator templates schema, to make it easier to use and maintain.

  • The Aodh Datasource is used to collect alarms from Aodh and pass them to Vitrage, so Vitrage can correlate them with other alarms in the system. In Ocata we added support for receiving immediate notifications on alarm state changes from Aodh. This allows Vitrage to act immediately in case Aodh detects a problem and there is a need to trigger new alarms (e.g. on an application) or modify the states of resources.

Deprecation Notes

  • The static_physical file format is deprecated. Please use the new static file format instead.