Rocky Series Release Notes

3.1.0

Prelude

The Vitrage Rocky release contains significant infrastructure changes that bring considerable value to end users. The main ones are:

  • Graph fast-failover and better HA support.

  • High-scale support. The graph was tested to work with over 100,000 entities.

  • Alarm and RCA history.

In addition, we added Kubernetes and Prometheus datasources.

New Features

  • The Alarm and RCA History feature allows saving and querying historical alarms and exploring their root causes. A new set of parameters in the alarm list API and a new history API allow users to query the data saved in the Vitrage schema in the database.

  • Added support for more Aodh alarm types: composite, gnocchi_aggregation_by_metrics_threshold and gnocchi_aggregation_by_resources_threshold.

  • High availability of an active-standby vitrage-graph deployment is better supported. Fast fail-over is implemented by storing all the required in-memory state in MySQL, so vitrage-graph initializes quickly upon failover without requesting any updates.

  • Added a new datasource for a Kubernetes cluster running as a workload on OpenStack. Kubernetes is supported on top of Nova.

  • A new Prometheus datasource was added to handle alerts coming from Prometheus. Prometheus is an open-source systems monitoring and alerting toolkit: exporters export various metrics to Prometheus, and Alertmanager handles the alerts sent by the Prometheus server.

  • Support for graphs with more than 100,000 vertices has been added and tested. See the high-scale configuration document.
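The fast fail-over item above stores the in-memory state in the database so that a standby vitrage-graph can start from it instead of requesting updates. A minimal sketch of that idea follows, assuming a plain dict as the graph state and sqlite3 as a stand-in for MySQL; the table name and the init_db, save_snapshot and load_snapshot functions are hypothetical and do not reflect Vitrage's actual schema or code.

```python
import json
import sqlite3

# Hypothetical sketch: sqlite3 stands in for MySQL, and a plain dict stands in
# for the in-memory entity graph. Table and function names are illustrative only.


def init_db(conn):
    conn.execute(
        "CREATE TABLE IF NOT EXISTS graph_snapshot ("
        "id INTEGER PRIMARY KEY, data TEXT NOT NULL)"
    )


def save_snapshot(conn, graph):
    # Persist the whole in-memory state as a single JSON row (id=1),
    # overwriting any previous snapshot.
    conn.execute(
        "INSERT OR REPLACE INTO graph_snapshot (id, data) VALUES (1, ?)",
        (json.dumps(graph, sort_keys=True),),
    )
    conn.commit()


def load_snapshot(conn):
    # A standby instance rebuilds its state from the stored row instead of
    # asking every datasource for a full update; empty graph if none saved.
    row = conn.execute("SELECT data FROM graph_snapshot WHERE id = 1").fetchone()
    return json.loads(row[0]) if row else {"vertices": {}, "edges": []}


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    init_db(conn)
    active_state = {
        "vertices": {"host-1": {"type": "nova.host"}, "vm-1": {"type": "nova.instance"}},
        "edges": [["host-1", "vm-1", "contains"]],
    }
    save_snapshot(conn, active_state)     # written by the active instance
    standby_state = load_snapshot(conn)   # read by the standby on failover
    print(standby_state == active_state)
```

INSERT OR REPLACE is SQLite syntax; a MySQL version of this sketch would use REPLACE INTO or INSERT ... ON DUPLICATE KEY UPDATE instead.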

Known Issues

  • As part of Rocky fast-failover support, vitrage-graph is now reloaded from the database. This causes an issue with datasources that use caches, which can become outdated after a vitrage-graph restart or when more than one vitrage-collector is used. Please avoid running multiple vitrage-collector services.

Bug Fixes

  • Added support for NetworkX version 2.1.

3.0.0

New Features

  • Added a command-line tool that scaffolds the creation of a new datasource.

  • Added a new Mock datasource, which can mock an entire graph and allows testing large scale stability as well as performance.

  • The collector service was changed to run on demand instead of periodically, so it can now run in active-active mode. This is part of a larger design to improve high availability.

  • Oslo service was replaced by Cotyledon, so Vitrage now uses real threads and multiprocessing. This change removes the unnecessary complications of using eventlets and timers.

  • Created a dedicated process for the API handler, to better handle API calls under stress.

  • Added support for get_changes in the static datasource.

  • The static datasource now supports changes in existing YAML files, and updates the graph accordingly.
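Supporting changes in existing static YAML files means working out what changed between the old and new contents before updating the graph. A minimal sketch of such a diff follows, assuming each file has already been parsed into a dict mapping an entity id to its properties; diff_entities is a hypothetical helper for illustration, not Vitrage's get_changes implementation.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: entity id -> properties parsed from a static YAML file.
Entity = Dict[str, str]


def diff_entities(
    old: Dict[str, Entity], new: Dict[str, Entity]
) -> Tuple[List[str], List[str], List[str]]:
    """Return (added, removed, updated) entity ids, each sorted."""
    added = sorted(new.keys() - old.keys())      # present only in the new file
    removed = sorted(old.keys() - new.keys())    # dropped from the file
    updated = sorted(                            # same id, different properties
        k for k in old.keys() & new.keys() if old[k] != new[k]
    )
    return added, removed, updated


if __name__ == "__main__":
    before = {"switch-1": {"type": "switch", "state": "OK"},
              "switch-2": {"type": "switch", "state": "OK"}}
    after = {"switch-1": {"type": "switch", "state": "ERROR"},
             "switch-3": {"type": "switch", "state": "OK"}}
    print(diff_entities(before, after))
```

Each of the three lists would then map to a graph operation (add vertex, remove vertex, update vertex), so only the entities that actually changed are touched.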

Bug Fixes

  • Many bug fixes related to performance and stability.