Glossary¶
This page explains the different terms used in the Watcher system.
They are sorted in alphabetical order.
Action¶
An Action is what enables Watcher to transform the current state of a Cluster after an Audit.
An Action is an atomic task which changes the current state of a target Managed resource of the OpenStack Cluster such as:
- Live migration of an instance from one compute node to another compute node with Nova
- Changing the power level of a compute node (ACPI level, ...)
- Changing the current state of a compute node (enable or disable) with Nova
In most cases, an Action triggers some concrete commands on an existing OpenStack module (Nova, Neutron, Cinder, Ironic, etc.).
An Action has a life-cycle and its current state may be one of the following:
- PENDING : the Action has not been executed yet by the Watcher Applier
- ONGOING : the Action is currently being processed by the Watcher Applier
- SUCCEEDED : the Action has been executed successfully
- FAILED : an error occurred while trying to execute the Action
- DELETED : the Action is still stored in the Watcher database but is not returned any more through the Watcher APIs.
- CANCELLED : the Action was in PENDING or ONGOING state and was cancelled by the Administrator
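The life-cycle above can be sketched as a small enumeration plus the one transition rule the list states explicitly (only PENDING or ONGOING Actions can be cancelled). This is an illustrative sketch, not Watcher's own code: the names `ActionState`, `CANCELLABLE` and `cancel` are assumptions made for the example.

```python
from enum import Enum


class ActionState(Enum):
    """Life-cycle states of an Action, as listed in this glossary."""
    PENDING = "PENDING"
    ONGOING = "ONGOING"
    SUCCEEDED = "SUCCEEDED"
    FAILED = "FAILED"
    DELETED = "DELETED"
    CANCELLED = "CANCELLED"


# Per the glossary, only PENDING or ONGOING Actions can be cancelled.
CANCELLABLE = frozenset({ActionState.PENDING, ActionState.ONGOING})


def cancel(state: ActionState) -> ActionState:
    """Return CANCELLED if the Action may be cancelled, else raise ValueError."""
    if state not in CANCELLABLE:
        raise ValueError(f"cannot cancel an Action in state {state.value}")
    return ActionState.CANCELLED
```

Other transitions (for example, which states may lead to DELETED) are not specified here, so the sketch leaves them out.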
Some default implementations are provided, but it is possible to develop new implementations which are dynamically loaded by Watcher at launch time.
Action Plan¶
An Action Plan specifies a flow of Actions that should be executed in order to satisfy a given Goal. It also contains an estimated global efficacy alongside a set of efficacy indicators.
An Action Plan is generated by Watcher when an Audit is successful, which implies that the Strategy that was used has found a Solution to achieve the Goal of this Audit.
In the default implementation of Watcher, an action plan is composed of a list of successive Actions (i.e., a Workflow of Actions belonging to a unique branch).
However, Watcher provides abstract interfaces for many of its components, allowing other implementations to generate and handle more complex Action Plan(s) composed of two types of Action Item(s):
- simple Actions: atomic tasks, which cannot be split into smaller tasks or commands from an OpenStack point of view.
- composite Actions: which are composed of several simple Actions ordered in sequential and/or parallel flows.
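The two kinds of Action Item can be modelled with a composite structure. The sketch below is only an illustration under assumed names (`SimpleAction`, `CompositeAction`, `simple_actions`, and the `"sequential"`/`"parallel"` labels are hypothetical, not Watcher interfaces); it shows how a composite Action nests simple Actions and how the nesting can be flattened.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class SimpleAction:
    """An atomic task that cannot be split further."""
    name: str


@dataclass
class CompositeAction:
    """Several Action Items ordered in a sequential or parallel flow."""
    mode: str  # "sequential" or "parallel" (hypothetical labels)
    children: List["ActionItem"] = field(default_factory=list)


ActionItem = Union[SimpleAction, CompositeAction]


def simple_actions(item: ActionItem) -> List[str]:
    """Return the names of all simple Actions under an item, depth first."""
    if isinstance(item, SimpleAction):
        return [item.name]
    names: List[str] = []
    for child in item.children:
        names.extend(simple_actions(child))
    return names


# Example: disable a node, then migrate two instances in parallel.
plan = CompositeAction("sequential", [
    SimpleAction("disable_node"),
    CompositeAction("parallel", [
        SimpleAction("migrate_a"),
        SimpleAction("migrate_b"),
    ]),
])
```

Here `simple_actions(plan)` walks the tree and lists the three atomic tasks in declaration order.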
An Action Plan may be described using standard workflow model description formats such as Business Process Model and Notation 2.0 (BPMN 2.0) or Unified Modeling Language (UML).
To see the life-cycle and description of Action Plan states, visit the Action Plan state machine.
Administrator¶
The Administrator is any user who has admin access on the OpenStack cluster. This user is allowed to create new projects for tenants, create new users and assign roles to each user.
The Administrator usually has remote access to any host of the cluster in order to change the configuration and restart any OpenStack service, including Watcher.
In the context of Watcher, the Administrator is a role for users which allows them to run any Watcher commands, such as:
- Create/Delete an Audit Template
- Launch an Audit
- Get the Action Plan
- Launch a recommended Action Plan manually
- Archive previous Audits and Action Plans
The Administrator is also allowed to modify any Watcher configuration files and to restart Watcher services.
Audit¶
In the Watcher system, an Audit is a request for optimizing a Cluster.
The optimization is done in order to satisfy one Goal on a given Cluster.
For each Audit, the Watcher system generates an Action Plan.
To see the life-cycle and description of Audit states, visit the Audit State machine.
Audit Template¶
An Audit may be launched several times with the same settings (Goal, thresholds, ...). Therefore it makes sense to save those settings in some sort of Audit preset object, which is known as an Audit Template.
An Audit Template contains at least the Goal of the Audit.
It may also contain some error handling settings indicating whether:
- Watcher Applier stops the entire operation
- Watcher Applier performs a rollback
and how many retries should be attempted before failure occurs (the latter can itself be complex: for example, a scenario in which many Actions fail on the first attempt but ultimately succeed).
Moreover, an Audit Template may contain some settings related to the level of automation for the Action Plan that will be generated by the Audit. A flag will indicate whether the Action Plan will be launched automatically or will need a manual confirmation from the Administrator.
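As a rough illustration, the settings described above could be grouped in a preset object like the following. All field names (`on_error`, `max_retries`, `auto_trigger`) and their defaults are assumptions made for this sketch; they are not taken from the Watcher data model.

```python
from dataclasses import dataclass
from enum import Enum


class OnError(Enum):
    STOP = "stop"          # the Applier stops the entire operation
    ROLLBACK = "rollback"  # the Applier performs a rollback


@dataclass(frozen=True)
class AuditTemplate:
    goal: str                         # the only setting described as mandatory
    on_error: OnError = OnError.STOP  # hypothetical default
    max_retries: int = 0              # retries before an Action counts as failed
    auto_trigger: bool = False        # False: the Administrator must confirm


def needs_confirmation(template: AuditTemplate) -> bool:
    """True when the generated Action Plan must be launched manually."""
    return not template.auto_trigger
```

Making the dataclass frozen reflects the idea of a reusable preset: the same template can back several Audits without being modified between runs.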
Availability Zone¶
Please read the official OpenStack definition of an Availability Zone.
Cluster¶
A Cluster is a set of physical machines which provide compute, storage and networking resources and are managed by the same OpenStack Controller node. A Cluster represents a set of resources that a cloud provider is able to offer to its customers.
A data center may contain several clusters.
The Cluster may be divided into one or several Availability Zone(s).
Cluster Data Model (CDM)¶
A Cluster Data Model (or CDM) is a logical representation of the current state and topology of the Cluster Managed resources.
It is represented as a set of Managed resources (which may be a simple tree or a flat list of key-value pairs), which enables Watcher Strategies to know the current relationships between the different resources of the Cluster during an Audit and enables the Strategy to request information such as:
- What compute nodes are in a given Audit Scope?
- What Instances are hosted on a given compute node?
- What is the current load of a compute node?
- What is the current free memory of a compute node?
- What is the network link between two compute nodes?
- What is the available bandwidth on a given network link?
- What is the current space available on a given virtual disk of a given Instance?
- What is the current state of a given Instance?
- ...
In a word, this data model enables the Strategy to know:
- the current topology of the Cluster
- the current capacity for each Managed resource
- the current amount of used/free space for each Managed resource
- the current state of each Managed resource
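To make the kind of questions listed above concrete, here is a minimal in-memory sketch that answers two of them (which Instances are hosted on a compute node, and how much memory is free on it). The classes `ComputeNode`, `Instance` and `ClusterModel` and their fields are hypothetical and much simpler than any real Cluster Data Model.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ComputeNode:
    name: str
    memory_mb: int  # total memory capacity of the node


@dataclass
class Instance:
    name: str
    host: str       # name of the compute node hosting this instance
    memory_mb: int  # memory reserved by this instance


class ClusterModel:
    """Toy model holding compute nodes and the instances placed on them."""

    def __init__(self, nodes: List[ComputeNode], instances: List[Instance]):
        self.nodes: Dict[str, ComputeNode] = {n.name: n for n in nodes}
        self.instances: List[Instance] = list(instances)

    def instances_on(self, node_name: str) -> List[str]:
        """What Instances are hosted on a given compute node?"""
        return [i.name for i in self.instances if i.host == node_name]

    def free_memory_mb(self, node_name: str) -> int:
        """What is the current free memory of a compute node?"""
        used = sum(i.memory_mb for i in self.instances if i.host == node_name)
        return self.nodes[node_name].memory_mb - used


model = ClusterModel(
    [ComputeNode("node-1", 8192), ComputeNode("node-2", 4096)],
    [Instance("vm-1", "node-1", 2048), Instance("vm-2", "node-1", 1024)],
)
```

A Strategy working against such a model only depends on the query methods, not on where the data came from, which is the decoupling a pivot data model is meant to provide.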
In the Watcher project, we aim at providing a generic and basic Cluster Data Model for each Goal, usable in the associated Strategies through plugins called cluster data model collectors (or CDMCs). These CDMCs are responsible for loading their associated CDM and keeping it up to date by listening to events and by periodically rebuilding it from the ground up. They are also directly accessible from the strategy classes. These CDMs are used to:
- simplify the development of a new Strategy for a given Goal when there already are some existing Strategies associated to the same Goal
- avoid duplicating the same code in several Strategies associated to the same Goal
- have a better consistency between the different Strategies for a given Goal
- avoid any strong coupling with any external Cluster Data Model (the proposed data model acts as a pivot data model)
There may be various generic and basic Cluster Data Models proposed in Watcher helpers, each of them being adapted to achieving a given Goal:
- For example, for a Goal which aims at optimizing the network resources the Strategy may need to know which resources are communicating together.
- Whereas for a Goal which aims at optimizing thermal and power conditions, the Strategy may need to know the location of each compute node in the racks and the location of each rack in the room.
Note however that a developer can use his/her own Cluster Data Model if the proposed data model does not fit his/her needs as long as the Strategy is able to produce a Solution for the requested Goal. For example, a developer could rely on the Nova Data Model to optimize some compute resources.
The Cluster Data Model may be persisted in any appropriate storage system (SQL database, NoSQL database, JSON file, XML File, In Memory Database, ...). As of now, an in-memory model is built and maintained in the background in order to accelerate the execution of strategies.
Cluster History¶
The Cluster History contains all the previously collected timestamped data such as metrics and events associated to any managed resource of the Cluster.
Just like the Cluster Data Model, this history may be used by any Strategy in order to find the best Solution during an Audit.
In the Watcher project, a generic Cluster History API is proposed with some helper classes in order to:
- share a common measurement (events or metrics) naming based on what is defined in Ceilometer. See the full list of available measurements
- share common meter types (Cumulative, Delta, Gauge) based on what is defined in Ceilometer. See the full list of meter types
- simplify the development of a new Strategy
- avoid duplicating the same code in several Strategies
- have a better consistency between the different Strategies
- avoid any strong coupling with any external metrics/events storage system (the proposed API and measurement naming system acts as a pivot format)
Note however that a developer can use his/her own history management system if the Ceilometer system does not fit his/her needs as long as the Strategy is able to produce a Solution for the requested Goal.
The Cluster History data may be persisted in any appropriate storage system (InfluxDB, OpenTSDB, MongoDB,...).
Controller Node¶
A controller node is a machine that typically runs the following core OpenStack services:
- Keystone: for identity and service management
- Cinder scheduler: for volumes management
- Glance controller: for image management
- Neutron controller: for network management
- Nova controller: for global compute resources management with services such as nova-scheduler, nova-conductor and nova-network.
In many configurations, Watcher will reside on a controller node even if it can potentially be hosted on a dedicated machine.
Compute node¶
Please read the official OpenStack definition of a Compute Node.
Customer¶
A Customer is the person or company which subscribes to the cloud provider offering. A customer may have several Project(s) hosted on the same Cluster or dispatched on different clusters.
In the private cloud context, the Customers are different groups within the same organization (different departments, project teams, branch offices and so on). Cloud infrastructure includes the ability to precisely track each customer’s service usage so that it can be charged back to them, or at least reported to them.
Goal¶
A Goal is a human readable, observable and measurable end result having one objective to be achieved.
Here are some examples of Goals:
- minimize the energy consumption
- minimize the number of compute nodes (consolidation)
- balance the workload among compute nodes
- minimize the license cost (some software has a licensing model which is based on the number of sockets or cores where the software is deployed)
- find the most appropriate moment for a planned maintenance on a given group of hosts (which may be an entire availability zone): power supply replacement, cooling system replacement, hardware modification, ...
Host Aggregate¶
Please read the official OpenStack definition of a Host Aggregate.
Instance¶
A running virtual machine, or a virtual machine in a known state such as suspended, that can be used like a hardware server.
Managed resource¶
A Managed resource is one instance of Managed resource type in a topology with particular properties and dependencies on other Managed resources (relationships).
For example, a Managed resource can be one virtual machine (i.e., an instance) hosted on a compute node and connected to another virtual machine through a network link (represented also as a Managed resource in the Cluster Data Model).
Managed resource type¶
A Managed resource type is a type of hardware or software element of the Cluster that the Watcher system can act on.
Here are some examples of Managed resource types:
- Nova Host Aggregates
- Nova Servers
- Cinder Volumes
- Neutron Routers
- Neutron Networks
- Neutron load-balancers
- Sahara Hadoop Cluster
- ...
It can be any of the official list of available resource types defined in OpenStack for HEAT.
Efficacy Indicator¶
An efficacy indicator is a single value that gives an indication on how the solution produced by a given strategy performed. These efficacy indicators are specific to a given goal and are usually used to compute the global efficacy of the resulting action plan.
In Watcher, these efficacy indicators are specified alongside the goal they relate to. When a strategy (which always relates to a goal) is executed, it produces a solution containing the efficacy indicators specified by the goal. Once this solution has been translated by the Watcher Planner into an action plan, its indicators and global efficacy are stored and become accessible through the Watcher API.
Efficacy Specification¶
An efficacy specification is a contract that is associated to each Goal that defines the various efficacy indicators a strategy achieving the associated goal should provide within its solution. Indeed, each solution proposed by a strategy will be validated against this contract before calculating its global efficacy.
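The idea of validating a solution against this contract can be sketched as a simple check that every required indicator is present. The class name, the `validate` method and the indicator names used below are hypothetical; the actual validation performed by Watcher may check more than presence.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet


@dataclass(frozen=True)
class EfficacySpecification:
    """Contract listing the indicators a solution must provide (sketch)."""
    required_indicators: FrozenSet[str]

    def validate(self, indicators: Dict[str, float]) -> None:
        """Raise ValueError if any required indicator is missing."""
        missing = self.required_indicators - set(indicators)
        if missing:
            raise ValueError(
                "missing efficacy indicators: " + ", ".join(sorted(missing))
            )


# Hypothetical indicator names for an energy-saving goal.
spec = EfficacySpecification(frozenset({"energy_gain", "sla_violations"}))
spec.validate({"energy_gain": 12.5, "sla_violations": 0.0})  # passes silently
```

Only after such a check succeeds would it make sense to compute a global efficacy from the indicator values.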
Optimization Efficacy¶
The Optimization Efficacy is the objective measure of how much of the Goal has been achieved with respect to the constraints and SLAs defined by the Customer.
The way efficacy is evaluated will depend on the Goal to achieve.
Of course, the efficacy will be relevant only as long as the Action Plan is relevant (i.e., the current state of the Cluster has not changed in a way that a new Audit would need to be launched).
For example, if the Goal is to lower the energy consumption, the Efficacy will be computed using several efficacy indicators (KPIs):
- the percentage of energy gain (which must be the highest possible)
- the number of SLA violations (which must be the lowest possible)
- the number of virtual machine migrations (which must be the lowest possible)
All those indicators are computed within a given timeframe, which is the time taken to execute the whole Action Plan.
The efficacy also enables the Administrator to objectively compare different Strategies for the same goal and same workload of the Cluster.
Project¶
Projects represent the base unit of “ownership” in OpenStack, in that all resources in OpenStack should be owned by a specific project. In OpenStack Identity, a project must be owned by a specific domain.
Please read the official OpenStack definition of a Project.
Scoring Engine¶
A Scoring Engine is an executable that has a well-defined input, a well-defined output, and performs a purely mathematical task. That is, the calculation does not depend on the environment in which it is running - it would produce the same result anywhere.
Because there might be multiple algorithms used to build a particular data model (and therefore a scoring engine), the usage of a scoring engine might vary. A metainfo field is supposed to contain any information which might be needed by the user of a given scoring engine.
SLA¶
SLA means Service Level Agreement.
The resources are negotiated between the Customer and the Cloud Provider in a contract.
Most of the time, this contract is composed of two documents:
- SLA: Service Level Agreement
- SLO: Service Level Objective
Note that the SLA is more general than the SLO in the sense that the former specifies what service is to be provided, how it is supported, times, locations, costs, performance, and responsibilities of the parties involved while the SLO focuses on more measurable characteristics such as availability, throughput, frequency, response time or quality.
You can also read the Wikipedia page for SLA which provides a good definition.
SLA violation¶
An SLA violation happens when an SLA defined with a given Customer could not be respected by the cloud provider within the timeframe defined by the official contract document.
SLO¶
A Service Level Objective (SLO) is a key element of an SLA between a service provider and a Customer. SLOs are agreed as a means of measuring the performance of the Service Provider and are outlined as a way of avoiding disputes between the two parties based on misunderstanding.
You can also read the Wikipedia page for SLO which provides a good definition.
Solution¶
A Solution is the result of execution of a strategy (i.e., an algorithm). Each solution is composed of many pieces of information:
- A set of actions generated by the strategy in order to achieve the goal of an associated audit.
- A set of efficacy indicators as defined by the associated goal
- A global efficacy which is computed by the associated goal using the aforementioned efficacy indicators.
A Solution is different from an Action Plan because it contains the non-scheduled list of Actions which is produced by a Strategy. In other words, the list of Actions in a Solution has not yet been re-ordered by the Watcher Planner.
Note that some algorithms (i.e. Strategies) may generate several Solutions. This gives rise to the problem of determining which Solution should be applied.
Two approaches to dealing with this can be envisaged:
- fully automated mode: only the Solution with the highest ranking (i.e., the highest Optimization Efficacy) will be sent to the Watcher Planner and translated into concrete Actions.
- manual mode: several Solutions are proposed to the Administrator with a detailed measurement of the estimated Optimization Efficacy and he/she decides which one will be launched.
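The two modes above differ only in who makes the final choice. A minimal sketch, assuming each Solution carries a single numeric efficacy estimate where higher is better (the `Solution` class and both function names are hypothetical):

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Solution:
    name: str
    efficacy: float  # estimated Optimization Efficacy, higher is better


def pick_automatically(solutions: List[Solution]) -> Solution:
    """Fully automated mode: keep only the highest-ranked Solution."""
    return max(solutions, key=lambda s: s.efficacy)


def rank_for_administrator(solutions: List[Solution]) -> List[Solution]:
    """Manual mode: present all Solutions, best first, for a human to choose."""
    return sorted(solutions, key=lambda s: s.efficacy, reverse=True)


candidates = [Solution("a", 0.4), Solution("b", 0.9), Solution("c", 0.7)]
```

With these candidates, the automated mode would forward solution "b", while the manual mode would show the Administrator "b", "c" and "a" in that order.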
Strategy¶
A Strategy is an algorithm implementation which is able to find a Solution for a given Goal.
There may be several potential strategies which are able to achieve the same Goal. This is why it is possible to configure which specific Strategy should be used for each goal.
Some strategies may provide better optimization results but may take more time to find an optimal Solution.
Watcher Applier¶
This component is in charge of executing the Action Plan built by the Watcher Decision Engine.
See: System Architecture for more details on this component.
Watcher Database¶
This database stores all the Watcher domain objects which can be requested by the Watcher API or the Watcher CLI:
- Audit templates
- Audits
- Action plans
- Actions
- Goals
Here, the Watcher domain is “optimization of some resources provided by an OpenStack system”.
See System Architecture for more details on this component.
Watcher Decision Engine¶
This component is responsible for computing a set of potential optimization Actions in order to fulfill the Goal of an Audit.
It first reads the parameters of the Audit from the associated Audit Template and knows the Goal to achieve.
It then selects the most appropriate Strategy depending on how Watcher was configured for this Goal.
The Strategy is then executed and generates a set of Actions which are scheduled in time by the Watcher Planner (i.e., it generates an Action Plan).
See System Architecture for more details on this component.
Watcher Planner¶
The Watcher Planner is part of the Watcher Decision Engine.
This module takes the set of Actions generated by a Strategy and designs a workflow which defines how to schedule those Actions in time and what the prerequisite conditions are for each Action.
It is important to schedule Actions in time in order to prevent overload of the Cluster while applying the Action Plan. For example, it is important not to migrate too many instances at the same time in order to avoid network congestion which may decrease the SLA for Customers.
It is also important to schedule Actions in order to avoid security issues such as denial of service on core OpenStack services.
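One simple way to limit how many Actions run at the same time is to split them into fixed-size batches that are executed one after another. The sketch below illustrates that idea only; it is not the algorithm used by the Watcher Planner, and `schedule_in_batches` and `max_parallel` are names invented for the example.

```python
from typing import List, Sequence


def schedule_in_batches(actions: Sequence[str], max_parallel: int) -> List[List[str]]:
    """Group actions into consecutive batches of at most max_parallel items.

    Actions inside a batch may run in parallel; batches run sequentially.
    """
    if max_parallel < 1:
        raise ValueError("max_parallel must be at least 1")
    return [
        list(actions[start:start + max_parallel])
        for start in range(0, len(actions), max_parallel)
    ]


migrations = ["migrate_vm1", "migrate_vm2", "migrate_vm3", "migrate_vm4", "migrate_vm5"]
batches = schedule_in_batches(migrations, max_parallel=2)
```

With five migrations and at most two in parallel, the result is three batches, the last one holding the single remaining migration. A real planner would also need to honour the prerequisite conditions of each Action, which this sketch ignores.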
Some default implementations are provided, but it is possible to develop new implementations which are dynamically loaded by Watcher at launch time.
See System Architecture for more details on this component.