Watcher Overload standard deviation algorithm¶

Synopsis¶

display name: Workload stabilization

goal: workload_balancing

Workload Stabilization control using live migration

This is a workload stabilization strategy based on a standard deviation algorithm. The goal is to determine whether a cluster is overloaded and, if so, to respond by migrating VMs to stabilize the cluster.

This strategy has been tested in a small (32 nodes) cluster.

It assumes that live migrations are possible in your cluster.

Requirements¶

Metrics¶

The workload_stabilization strategy requires the following metrics:

metric	service name	plugins	comment
`compute.node.cpu.percent`	ceilometer	none	Requires the `compute_monitors` option to be set to `cpu.virt_driver` in nova.conf.
`hardware.memory.used`	ceilometer	SNMP
`cpu`	ceilometer	none
`instance_ram_usage`	ceilometer	none

Cluster data model¶

Default Watcher’s Compute cluster data model:

Nova cluster data model collector

The Nova cluster data model collector creates an in-memory representation of the resources exposed by the compute service.

Actions¶

Default Watcher’s actions:

action	description
`migration`	Migrates a server to a destination nova-compute host

This action allows you to migrate a server to another destination compute host. Migration type ‘live’ can only be used for migrating active VMs. Migration type ‘cold’ can be used for migrating non-active VMs as well as active VMs, which will be shut down while migrating.

The action schema is:
schema = Schema({
 'resource_id': str,  # should be a UUID
 'migration_type': str,  # choices -> "live", "cold"
 'destination_node': str,
 'source_node': str,
})
The resource_id is the UUID of the server to migrate. The source_node and destination_node parameters are the hostnames of the source and destination compute hosts, respectively. The list of available compute hosts is returned by this command: nova service-list --binary nova-compute

Note

Nova API version must be 2.56 or above if destination_node parameter is given.
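The input parameters for a migration action can be checked before they are submitted. The following is a minimal sketch using only the Python standard library; it is not Watcher's own validation code. The helper name `validate_migration_params` is hypothetical, and the choice to treat `source_node` and `destination_node` as optional is an assumption based on the note above, which implies that `destination_node` may be omitted.

```python
import uuid

# Hypothetical helper, not part of Watcher: checks a parameter dict
# against the schema shown above.
MIGRATION_TYPES = ("live", "cold")
REQUIRED_KEYS = ("resource_id", "migration_type")
# Assumption: the node parameters may be left out.
OPTIONAL_KEYS = ("destination_node", "source_node")


def validate_migration_params(params):
    """Return a list of problems; an empty list means the dict looks valid."""
    problems = []
    for key in REQUIRED_KEYS:
        if not isinstance(params.get(key), str):
            problems.append(f"{key} must be a string")
    for key in OPTIONAL_KEYS:
        if key in params and not isinstance(params[key], str):
            problems.append(f"{key} must be a string")

    resource_id = params.get("resource_id")
    if isinstance(resource_id, str):
        try:
            uuid.UUID(resource_id)  # raises ValueError for malformed input
        except ValueError:
            problems.append("resource_id must be a UUID")

    migration_type = params.get("migration_type")
    if isinstance(migration_type, str) and migration_type not in MIGRATION_TYPES:
        problems.append("migration_type must be 'live' or 'cold'")
    return problems
```

Returning a list of problems instead of raising on the first one makes it easier to report every issue in a hand-written action at once.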


Planner¶

Default Watcher’s planner:

Weight planner implementation

This implementation builds actions with parents in accordance with their weights. Actions with a higher weight are scheduled before the others. There are two config options to configure: action_weights and parallelization.

Limitations

This planner requires the action_weights and parallelization config options to be well tuned.
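To illustrate how weights and parallelization could interact, here is a simplified sketch. It is not the planner's actual code. It assumes that `action_weights` maps an action type to a number, that `parallelization` maps an action type to the maximum number of actions of that type in one batch, and that each batch takes the previous batch as its parents. The function name `plan`, the dict layout of an action, and the weight values in the test data are all hypothetical.

```python
from itertools import groupby


def plan(actions, action_weights, parallelization):
    """Return (action name, parent names) pairs in scheduling order.

    Assumption: higher weight first, and each batch of at most
    parallelization[action_type] actions depends on the previous batch.
    """
    # Sort by descending weight; the type name breaks ties so that
    # actions of one type stay adjacent for groupby.
    ordered = sorted(
        actions,
        key=lambda a: (-action_weights[a["action_type"]], a["action_type"]),
    )
    planned = []
    parents = []
    for action_type, group in groupby(ordered, key=lambda a: a["action_type"]):
        group = list(group)
        size = parallelization[action_type]
        for start in range(0, len(group), size):
            batch = group[start:start + size]
            for action in batch:
                planned.append((action["name"], [p["name"] for p in parents]))
            parents = batch
    return planned
```

In this sketch, a low parallelization value yields long chains of small batches, while a high value lets many actions of the same type share the same parents, which is why both options need tuning together.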

Configuration¶

Strategy parameters are:

parameter	type	default Value	description
`metrics`	array	["instance_cpu_usage", "instance_ram_usage"]	Metrics used as rates of cluster loads.
`thresholds`	object	{"instance_cpu_usage": 0.2, "instance_ram_usage": 0.2}	Dict where each key is a metric and each value is a trigger value.
`weights`	object	{"instance_cpu_usage_weight": 1.0, "instance_ram_usage_weight": 1.0}	These weights are used to calculate the common standard deviation. Each weight name consists of the meter name followed by the _weight suffix.
`instance_metrics`	object	{"instance_cpu_usage": "compute.node.cpu.percent", "instance_ram_usage": "hardware.memory.used"}	Mapping to get hardware statistics using instance metrics.
`host_choice`	string	retry	Method used to choose hosts. The available methods are cycle, retry and fullsearch. Cycle iterates over the hosts in a cycle. Retry picks hosts at random (the count is defined by the retry_count option). Fullsearch returns every host from the list.
`retry_count`	number	1	Number of randomly selected hosts to return.
`periods`	object	{"instance": 720, "node": 600}	These periods are used to get statistic aggregation for instance and host metrics. The period is simply a repeating interval of time into which the samples are grouped for aggregation. Watcher uses only the last period of all received ones.
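The three `host_choice` methods can be pictured with a short sketch. This is an illustration of the behaviour described in the table, not the strategy's implementation. The function name `choose_hosts` and the `rng` parameter are hypothetical, and capping `retry_count` at the number of hosts is an assumption.

```python
import itertools
import random


def choose_hosts(hosts, host_choice="retry", retry_count=1, rng=random):
    """Return candidate destination hosts for the given method (sketch)."""
    if host_choice == "cycle":
        # Endless iterator that walks the host list round-robin.
        return itertools.cycle(hosts)
    if host_choice == "retry":
        # Random hosts without repetition; the cap is an assumption.
        return rng.sample(hosts, min(retry_count, len(hosts)))
    if host_choice == "fullsearch":
        # Every host is a candidate.
        return list(hosts)
    raise ValueError(f"unknown host_choice: {host_choice!r}")
```

Under this reading, fullsearch examines the most candidates per decision, while retry with a small `retry_count` examines only a few randomly chosen ones.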

Efficacy Indicator¶

[{'name': 'released_nodes_ratio', 'description': 'Ratio of released compute nodes divided by the total number of enabled compute nodes.', 'unit': '%', 'value': 0}]

Algorithm¶

A description of the overload algorithm and the role of standard deviation is available here: https://specs.openstack.org/openstack/watcher-specs/specs/newton/implemented/sd-strategy.html
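As a rough illustration of the idea, the sketch below computes the standard deviation of normalized host loads per metric, compares it with the configured thresholds, and combines the per-metric values using the weights. It is a simplified reading of the approach, not the strategy's code. The function names are hypothetical, and the use of the population standard deviation and of a strict "greater than" comparison are assumptions.

```python
import math


def standard_deviation(values):
    """Population standard deviation of a list of numbers."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return math.sqrt(variance)


def overloaded_metrics(host_loads, thresholds):
    """Return metrics whose spread across hosts exceeds the threshold.

    host_loads maps a metric name to the normalized load of each host.
    """
    return [
        metric
        for metric, threshold in thresholds.items()
        if standard_deviation(host_loads[metric]) > threshold
    ]


def weighted_sd(host_loads, weights):
    """Combine per-metric deviations using the '<metric>_weight' entries."""
    return sum(
        standard_deviation(loads) * weights[f"{metric}_weight"]
        for metric, loads in host_loads.items()
    )
```

With loads of 0.9, 0.1, 0.1 and 0.1 on four hosts, the deviation is about 0.35, which is above the default threshold of 0.2, so such a cluster would count as unbalanced for that metric in this sketch.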

How to use it?¶

$ openstack optimize audittemplate create \
  at1 workload_balancing --strategy workload_stabilization

$ openstack optimize audit create -a at1 \
  -p thresholds='{"instance_ram_usage": 0.05}' \
  -p metrics='["instance_ram_usage"]'
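When audits are created from a script, the JSON values passed with `-p` can be generated instead of typed by hand. The following sketch builds the same kind of command line as above. The helper name `audit_create_command` is hypothetical, and it only assembles a string; it does not call Watcher.

```python
import json
import shlex


def audit_create_command(template, parameters):
    """Build an 'openstack optimize audit create' command line (sketch)."""
    args = ["openstack", "optimize", "audit", "create", "-a", template]
    for name, value in parameters.items():
        # Each strategy parameter is passed as name=<JSON value>.
        args += ["-p", f"{name}={json.dumps(value)}"]
    # Quote every argument so the JSON survives shell parsing.
    return " ".join(shlex.quote(arg) for arg in args)


print(audit_create_command(
    "at1",
    {"thresholds": {"instance_ram_usage": 0.05},
     "metrics": ["instance_ram_usage"]},
))
```

Serializing with `json.dumps` avoids quoting mistakes in nested values such as the `thresholds` dict.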

External Links¶

Watcher Overload standard deviation algorithm spec

Watcher 12.0.1.dev8