Workload Stabilization Strategy
Synopsis
display name: Workload stabilization
goal: workload_balancing
Workload Stabilization control using live migration
This workload stabilization strategy uses the standard deviation of resource usage among the hosts as a measure of how balanced the cluster is. The goal is to determine whether the cluster is overloaded and, if so, to respond by migrating VMs to stabilize it.
The standard deviation is determined using normalized CPU and/or memory usage values, which are scaled to a range between 0 and 1 based on the usage metrics in the data sources.
A standard deviation of 0 means that your cluster’s resources are perfectly balanced, with all usage values identical. A standard deviation of 0.5, by contrast, indicates completely unbalanced resource usage, where some resources are heavily utilized and others are not used at all.
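To make the two boundary values concrete, here is a minimal sketch that computes the standard deviation of normalized usage values with Python's standard library. The host values are invented for illustration, and the choice of the population formula (`statistics.pstdev`) is an assumption, made because it is the formula for which 0.5 is the largest possible result on values between 0 and 1.

```python
import statistics


def usage_stddev(normalized_usages):
    """Population standard deviation of per-host usage values in [0, 1]."""
    return statistics.pstdev(normalized_usages)


# Hypothetical clusters: every host at 40% usage, versus half the hosts
# idle and half fully loaded.
balanced = [0.4, 0.4, 0.4, 0.4]
unbalanced = [0.0, 0.0, 1.0, 1.0]

print(usage_stddev(balanced))    # 0.0: perfectly balanced
print(usage_stddev(unbalanced))  # 0.5: the maximum for values in [0, 1]
```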
This strategy has been tested in a small (32-node) cluster.
It assumes that live migrations are possible in your cluster.
Requirements
Metrics
The workload_stabilization strategy requires the following metrics:
| metric | description |
|---|---|
| instance_ram_usage | RAM usage in an instance, as a float in megabytes |
| instance_cpu_usage | CPU usage in an instance, as a float between 0 and 100 representing the total CPU usage as a percentage |
| host_ram_usage | RAM usage in a compute node, as a float in megabytes |
| host_cpu_usage | CPU usage in a compute node, as a float between 0 and 100 representing the total CPU usage as a percentage |
Cluster data model
Default Watcher’s Compute cluster data model:
Nova cluster data model collector
The Nova cluster data model collector creates an in-memory representation of the resources exposed by the compute service.
Actions
Default Watcher’s actions:
| action | description |
|---|---|
| migration | Migrates a server to a destination nova-compute host |
This action allows you to migrate a server to another compute destination host. Migration type ‘live’ can only be used for migrating active VMs. Migration type ‘cold’ can be used for migrating non-active VMs as well as active VMs, which will be shut down while migrating.
The action schema is:
    schema = Schema({
        'resource_id': str,  # should be a UUID
        'migration_type': str,  # choices -> "live", "cold"
        'destination_node': str,
        'source_node': str,
    })

The resource_id is the UUID of the server to migrate. The source_node and destination_node parameters are respectively the source and the destination compute hostname.
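As an illustration of the constraints noted in the schema comments, the sketch below checks a parameter dict using only the standard library. It is a stand-in written for this page, not Watcher's own validation code: the function name `check_migration_params` and the sample values are hypothetical, and because the schema above does not say which keys are mandatory, the node names are only checked when present.

```python
import uuid

MIGRATION_TYPES = ("live", "cold")


def check_migration_params(params):
    """Return a list of problems found in a migration parameter dict."""
    problems = []
    try:
        # resource_id should be the UUID of the server to migrate.
        uuid.UUID(str(params.get("resource_id")))
    except ValueError:
        problems.append("resource_id is not a valid UUID")
    if params.get("migration_type") not in MIGRATION_TYPES:
        problems.append("migration_type must be 'live' or 'cold'")
    # Assumption: node names are optional, but must be strings if given.
    for key in ("source_node", "destination_node"):
        if key in params and not isinstance(params[key], str):
            problems.append(f"{key} must be a string")
    return problems


# Hypothetical example values.
params = {
    "resource_id": "9a5b1f0e-3c1d-4a8e-9a53-5e2f7c1b2d44",
    "migration_type": "live",
    "source_node": "compute-1",
    "destination_node": "compute-2",
}
print(check_migration_params(params))  # an empty list means no problems
```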
Note
Nova API version must be 2.56 or above if destination_node parameter is given.
Planner
Default Watcher’s planner:
Weight planner implementation
This implementation builds actions with parents in accordance with weights. Actions with a higher weight are scheduled before the others. There are two config options to set: action_weights and parallelization.
Limitations
This planner requires the action_weights and parallelization config options to be well tuned.
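The interplay of the two options can be pictured with a deliberately simplified sketch. It is not the planner's implementation (which builds a graph of actions with parents); it only orders actions by weight and cuts each action type into batches. The function name `plan_batches`, the action type names, the weights, and the reading of parallelization as "how many actions of one type may run together" are all assumptions made for illustration.

```python
from itertools import groupby


def plan_batches(actions, action_weights, parallelization):
    """Order actions by descending weight, then batch them per action type."""
    ordered = sorted(
        actions,
        key=lambda action: action_weights[action["action_type"]],
        reverse=True,
    )
    batches = []
    for action_type, group in groupby(ordered, key=lambda a: a["action_type"]):
        group = list(group)
        size = parallelization[action_type]
        # Assumption: at most `size` actions of this type run in parallel.
        for start in range(0, len(group), size):
            batches.append(group[start:start + size])
    return batches


# Hypothetical actions and settings.
actions = [
    {"action_type": "other_action", "id": 1},
    {"action_type": "migration", "id": 2},
    {"action_type": "migration", "id": 3},
    {"action_type": "migration", "id": 4},
]
batches = plan_batches(
    actions,
    action_weights={"migration": 3, "other_action": 1},
    parallelization={"migration": 2, "other_action": 1},
)
for batch in batches:
    print([action["id"] for action in batch])
```

With these made-up settings, a badly chosen weight or batch size directly changes which actions run first and how many run at once, which is why the two options need tuning.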
Configuration
Strategy parameters are:
| parameter | type | default value | description |
|---|---|---|---|
| metrics | array | ["instance_cpu_usage", "instance_ram_usage"] | Metrics used as rates of cluster loads. |
| thresholds | object | {"instance_cpu_usage": 0.2, "instance_ram_usage": 0.2} | Dict where each key is a metric and each value is a trigger value. The strategy only looks for an action plan when the standard deviation of the usage of one of the resources included in the metrics, taken as a normalized usage between 0 and 1 among the hosts, is higher than the threshold. The standard deviation of a perfectly balanced cluster would be 0, while that of a totally unbalanced one would be 0.5, which should be the maximum value. |
| weights | object | {"instance_cpu_usage_weight": 1.0, "instance_ram_usage_weight": 1.0} | These weights are used to calculate the common standard deviation when optimizing resource usage. The name of each weight is the meter name followed by the _weight suffix. A higher value means the metric is prioritized when calculating an optimal resulting cluster distribution. |
| instance_metrics | object | {"instance_cpu_usage": "host_cpu_usage", "instance_ram_usage": "host_ram_usage"} | Maps each instance metric listed in the metrics parameter to the compute node metric that represents usage of the same resource. |
| host_choice | string | retry | Method used to choose hosts when analyzing destinations for instances. The available methods are cycle, retry and fullsearch. Cycle iterates over the hosts in a cycle. Retry picks some hosts at random (the count is defined by the retry_count option). Fullsearch returns every host in the list. |
| retry_count | number | 1 | Number of randomly returned hosts. |
| periods | object | {"instance": 720, "node": 600} | Time, in seconds, over which statistical values of resource usage are collected for instance and host metrics. Watcher uses the last period to calculate resource usage. |
| granularity | number | 300 | NOT RECOMMENDED TO MODIFY: The time between two measures in an aggregated timeseries of a metric. |
| aggregation_method | object | {"instance": 'mean', "compute_node": 'mean'} | NOT RECOMMENDED TO MODIFY: Function used to aggregate multiple measures into an aggregated value. |
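The way the threshold and weight settings interact can be sketched as follows. This is an illustration under stated assumptions, not the strategy's code: the per-host load values are invented, the helper names `metrics_over_threshold` and `weighted_stddev` are hypothetical, the population standard deviation is assumed, and combining the per-metric deviations as a weighted sum is one plausible reading of "common standard deviation".

```python
import statistics


def metrics_over_threshold(host_loads, thresholds):
    """Return the metrics whose std dev across hosts exceeds its threshold."""
    exceeded = []
    for metric, threshold in thresholds.items():
        values = [load[metric] for load in host_loads]
        if statistics.pstdev(values) > threshold:
            exceeded.append(metric)
    return exceeded


def weighted_stddev(host_loads, metrics, weights):
    """Assumed combination: weighted sum of the per-metric std devs."""
    total = 0.0
    for metric in metrics:
        values = [load[metric] for load in host_loads]
        total += weights[metric + "_weight"] * statistics.pstdev(values)
    return total


# Hypothetical normalized loads (0 to 1) for three hosts.
host_loads = [
    {"instance_cpu_usage": 0.9, "instance_ram_usage": 0.5},
    {"instance_cpu_usage": 0.1, "instance_ram_usage": 0.5},
    {"instance_cpu_usage": 0.5, "instance_ram_usage": 0.5},
]
thresholds = {"instance_cpu_usage": 0.2, "instance_ram_usage": 0.2}
weights = {"instance_cpu_usage_weight": 1.0, "instance_ram_usage_weight": 1.0}

# CPU usage is spread unevenly, RAM usage is identical on every host.
print(metrics_over_threshold(host_loads, thresholds))
print(weighted_stddev(host_loads, list(thresholds), weights))
```

In this example only the CPU metric exceeds its threshold, so only CPU imbalance would trigger a search for an action plan.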
Efficacy Indicator
Global efficacy indicator:
[{'name': 'live_migrations_count', 'description': 'Ratio of migrated virtual machines to audited virtual machines', 'unit': '%', 'value': 0}]
Other efficacy indicators of the goal are:
instance_migrations_count
: The number of VM migrations to be performed
instances_count
: The total number of audited instances in strategy
standard_deviation_after_audit
: The value of resulted standard deviation
standard_deviation_before_audit
: The value of original standard deviation
Algorithm
A description of the overload algorithm and the role of the standard deviation is available here: https://specs.openstack.org/openstack/watcher-specs/specs/newton/implemented/sd-strategy.html
How to use it?
$ openstack optimize audittemplate create \
at1 workload_balancing --strategy workload_stabilization
$ openstack optimize audit create -a at1 \
-p thresholds='{"instance_ram_usage": 0.05}' \
-p metrics='["instance_ram_usage"]'
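The `-p` values are JSON documents, so quoting them by hand is easy to get wrong. The sketch below builds an equivalent `audit create` command line from Python data. The helper name `audit_create_command` is hypothetical; it uses only the subcommand and the `-a` and `-p` options shown above, and it prints the command rather than running it.

```python
import json
import shlex


def audit_create_command(template, parameters):
    """Build a shell-quoted 'audit create' command with JSON parameters."""
    cmd = ["openstack", "optimize", "audit", "create", "-a", template]
    for name, value in parameters.items():
        # Each strategy parameter is passed as name=<JSON value>.
        cmd += ["-p", f"{name}={json.dumps(value)}"]
    return shlex.join(cmd)


print(audit_create_command(
    "at1",
    {
        "thresholds": {"instance_ram_usage": 0.05},
        "metrics": ["instance_ram_usage"],
    },
))
```

The quoting placed by `shlex.join` may look different from the hand-written command, but the shell parses both into the same arguments.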
External Links
None