Workload Stabilization Strategy

Synopsis

display name: Workload stabilization

goal: workload_balancing

Workload Stabilization control using live migration

This workload stabilization strategy uses the standard deviation as a measure of how evenly resource usage is balanced across a cluster. The goal is to determine whether the cluster is overloaded and, if so, to respond by migrating VMs to stabilize it.

The standard deviation is determined using normalized CPU and/or memory usage values, which are scaled to a range between 0 and 1 based on the usage metrics in the data sources.

A standard deviation of 0 means that your cluster’s resources are perfectly balanced, with all usage values being identical. By contrast, a standard deviation of 0.5 indicates completely unbalanced resource usage, where some resources are heavily utilized and others are not used at all.
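
As a rough illustration of these two extremes, the sketch below computes the population standard deviation of normalized per-host usage values with Python's statistics.pstdev. The usage values are made up, and the helper name usage_std_dev is hypothetical; the exact formula Watcher applies is not reproduced here.

```python
from statistics import pstdev


def usage_std_dev(normalized_usages):
    """Population standard deviation of per-host usage values in [0, 1]."""
    return pstdev(normalized_usages)


# Made-up normalized usage values, one per host.
balanced = [0.5, 0.5, 0.5, 0.5]    # every host equally loaded
unbalanced = [0.0, 1.0, 0.0, 1.0]  # half the hosts idle, half saturated

print(usage_std_dev(balanced))    # 0.0
print(usage_std_dev(unbalanced))  # 0.5
```

With values confined to the range 0 to 1, the spread is largest when half the hosts sit at 0 and half at 1, which is why 0.5 is the upper bound.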

This strategy has been tested in a small (32-node) cluster.

It assumes that live migrations are possible in your cluster.

Requirements

Metrics

The workload_stabilization strategy requires the following metrics:

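==================  ============================================================
metric              description
==================  ============================================================
instance_ram_usage  RAM usage in an instance, as a float in megabytes
instance_cpu_usage  CPU usage in an instance, as a float between 0 and 100
                    representing the total CPU usage as a percentage
host_ram_usage      RAM usage in a compute node, as a float in megabytes
host_cpu_usage      CPU usage in a compute node, as a float between 0 and 100
                    representing the total CPU usage as a percentage
==================  ============================================================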

Cluster data model

Watcher’s default Compute cluster data model:

Nova cluster data model collector

The Nova cluster data model collector creates an in-memory representation of the resources exposed by the compute service.

Actions

Watcher’s default actions:

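=========  ====================================================
action     description
=========  ====================================================
migration  Migrates a server to a destination nova-compute host
=========  ====================================================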

This action lets you migrate a server to another destination compute host. Migration type ‘live’ can only be used for migrating active VMs. Migration type ‘cold’ can be used for migrating non-active VMs as well as active VMs, which will be shut down while migrating.

The action schema is:

schema = Schema({
    'resource_id': str,  # should be a UUID
    'migration_type': str,  # choices -> "live", "cold"
    'destination_node': str,
    'source_node': str,
})

The resource_id is the UUID of the server to migrate. The source_node and destination_node parameters are the source and destination compute hostnames, respectively.

Note

The Nova API version must be 2.56 or above if the destination_node parameter is given.
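
To show what input parameters matching this schema might look like, here is a minimal sketch in plain Python. The helper validate_migration_params, the UUID, and the hostnames are hypothetical; the helper treats all four keys as required, which is an assumption of this sketch and may be stricter than the schema itself.

```python
import uuid

MIGRATION_TYPES = ("live", "cold")
FIELDS = ("resource_id", "migration_type", "destination_node", "source_node")


def validate_migration_params(params):
    """Hypothetical helper: check a dict against the fields of the schema."""
    for key in FIELDS:
        # Assumption: every field is required and must be a string.
        if not isinstance(params.get(key), str):
            raise ValueError(f"{key} must be a string")
    uuid.UUID(params["resource_id"])  # raises ValueError if not a UUID
    if params["migration_type"] not in MIGRATION_TYPES:
        raise ValueError("migration_type must be 'live' or 'cold'")
    return params


params = validate_migration_params({
    "resource_id": "6ae05517-a512-462d-9d83-90c313b5a8ff",  # made-up UUID
    "migration_type": "live",
    "destination_node": "compute-2",  # made-up hostname
    "source_node": "compute-1",       # made-up hostname
})
print(params["migration_type"])
```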

Planner

Watcher’s default planner:

Weight planner implementation

This implementation builds actions with parents in accordance with weights. Sets of actions with a higher weight are scheduled before the others. Two config options control this behavior: action_weights and parallelization.

Limitations

  • This planner requires the action_weights and parallelization config options to be well tuned.
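
The ordering idea can be sketched as follows. This is a simplified illustration, not the planner's implementation: it produces ordered batches rather than parent links, the weight and parallelization values are made up, and it assumes parallelization caps how many actions of one type run together.

```python
def schedule(actions, action_weights, parallelization):
    """Order action types by descending weight, then split each type
    into batches no larger than its parallelization limit."""
    batches = []
    types = sorted({a["type"] for a in actions},
                   key=lambda t: (-action_weights[t], t))
    for action_type in types:
        group = [a for a in actions if a["type"] == action_type]
        limit = parallelization[action_type]
        for i in range(0, len(group), limit):
            batches.append(group[i:i + limit])
    return batches


# Hypothetical option values, for illustration only.
action_weights = {"migrate": 3, "resize": 2, "nop": 0}
parallelization = {"migrate": 2, "resize": 2, "nop": 1}

actions = [
    {"id": 1, "type": "nop"},
    {"id": 2, "type": "migrate"},
    {"id": 3, "type": "resize"},
    {"id": 4, "type": "migrate"},
    {"id": 5, "type": "migrate"},
]

batches = schedule(actions, action_weights, parallelization)
batch_ids = [[a["id"] for a in batch] for batch in batches]
print(batch_ids)  # [[2, 4], [5], [3], [1]]
```

The three migrate actions come first because they carry the highest weight, and they are split into two batches because the assumed limit for that type is 2.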

Configuration

Strategy parameters are:

==================  ======  ========================================  ========================================================
parameter           type    default value                             description
==================  ======  ========================================  ========================================================
metrics             array   ["instance_cpu_usage",                    Metrics used as rates of cluster loads.
                            "instance_ram_usage"]

thresholds          object  {"instance_cpu_usage": 0.2,               Dict where each key is a metric and each value is a
                            "instance_ram_usage": 0.2}                trigger value. The strategy only looks for an action
                                                                      plan when the standard deviation of the usage of one of
                                                                      the resources listed in metrics, taken as a normalized
                                                                      usage between 0 and 1 across the hosts, is higher than
                                                                      the threshold. A perfectly balanced cluster has a
                                                                      standard deviation of 0; a totally unbalanced one has
                                                                      0.5, which should be the maximum value.

weights             object  {"instance_cpu_usage_weight": 1.0,        These weights are used to calculate the common standard
                            "instance_ram_usage_weight": 1.0}         deviation when optimizing resource usage. Each weight
                                                                      name consists of the meter name and the _weight suffix.
                                                                      Higher values mean the metric is prioritized when
                                                                      calculating the optimal resulting cluster distribution.

instance_metrics    object  {"instance_cpu_usage": "host_cpu_usage",  Maps each instance metric listed in the metrics
                            "instance_ram_usage": "host_ram_usage"}   parameter to the compute node metric that represents
                                                                      usage of the same resource.

host_choice         string  retry                                     Method used to choose hosts when analyzing destinations
                                                                      for instances. The available methods are cycle, retry
                                                                      and fullsearch. cycle iterates over the hosts in a
                                                                      cycle. retry picks some hosts at random (the count is
                                                                      defined by the retry_count option). fullsearch returns
                                                                      every host in the list.

retry_count         number  1                                         Number of hosts returned at random.

periods             object  {"instance": 720, "node": 600}            Time, in seconds, over which statistical values of
                                                                      resource usage are collected for instance and host
                                                                      metrics. Watcher uses the last period to calculate
                                                                      resource usage.

granularity         number  300                                       NOT RECOMMENDED TO MODIFY: The time between two
                                                                      measures in an aggregated timeseries of a metric.

aggregation_method  object  {"instance": "mean",                      NOT RECOMMENDED TO MODIFY: Function used to aggregate
                            "compute_node": "mean"}                   multiple measures into an aggregated value.
==================  ======  ========================================  ========================================================
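
The three host_choice methods can be pictured as a generator that hands back a list of candidate destination hosts each time an instance is examined. The sketch below is an interpretation of the descriptions above, with a hypothetical function name and made-up hostnames; it is not taken from the strategy's source.

```python
import itertools
import random


def yield_hosts(hosts, host_choice="retry", retry_count=1):
    """Yield a list of candidate destination hosts on every iteration."""
    if host_choice == "cycle":
        # One host at a time, wrapping around at the end of the list.
        for host in itertools.cycle(hosts):
            yield [host]
    elif host_choice == "retry":
        # retry_count hosts picked at random, without repetition.
        while True:
            yield random.sample(hosts, retry_count)
    elif host_choice == "fullsearch":
        # Every host, every time.
        while True:
            yield list(hosts)
    else:
        raise ValueError(f"unknown host_choice: {host_choice}")


hosts = ["node-1", "node-2", "node-3"]  # made-up hostnames

cycler = yield_hosts(hosts, "cycle")
first_four = [next(cycler) for _ in range(4)]
print(first_four)  # [['node-1'], ['node-2'], ['node-3'], ['node-1']]

print(next(yield_hosts(hosts, "retry", retry_count=2)))  # two random hosts
print(next(yield_hosts(hosts, "fullsearch")))            # all three hosts
```

Under this reading, fullsearch is the most thorough and the most expensive option, while retry with a small retry_count trades coverage for speed.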

Efficacy Indicator

Global efficacy indicator:

[{'name': 'live_migrations_count', 'description': 'Ratio of migrated virtual machines to audited virtual machines', 'unit': '%', 'value': 0}]

Other efficacy indicators of the goal are:

  • instance_migrations_count: The number of VM migrations to be performed

  • instances_count: The total number of instances audited by the strategy

  • standard_deviation_after_audit: The resulting standard deviation

  • standard_deviation_before_audit: The original standard deviation
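
Taking the description of the global indicator at face value (a ratio of migrated to audited virtual machines, expressed in %), the arithmetic could look like the sketch below. The function name is hypothetical, and deriving the value from instance_migrations_count and instances_count is an assumption, not something the text above confirms.

```python
def migration_ratio(instance_migrations_count, instances_count):
    """Percentage of audited instances that the action plan would migrate."""
    if instances_count == 0:
        return 0.0  # nothing audited, so nothing to report
    return 100.0 * instance_migrations_count / instances_count


# Made-up figures: 3 migrations planned out of 12 audited instances.
print(migration_ratio(3, 12))  # 25.0
```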

Algorithm

You can find a description of the overload algorithm and the role of the standard deviation here: https://specs.openstack.org/openstack/watcher-specs/specs/newton/implemented/sd-strategy.html

How to use it?

$ openstack optimize audittemplate create \
  at1 workload_balancing --strategy workload_stabilization

$ openstack optimize audit create -a at1 \
  -p thresholds='{"instance_ram_usage": 0.05}' \
  -p metrics='["instance_ram_usage"]'
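
To make the thresholds value in the audit above concrete, the sketch below checks whether the standard deviation of normalized per-host RAM usage exceeds 0.05. The host figures and the helper name are made up, and normalizing by dividing used memory by total memory is an assumption about how usage is scaled to the range 0 to 1.

```python
from statistics import pstdev


def needs_rebalancing(used_mb, total_mb, threshold):
    """True when the std dev of normalized per-host usage exceeds threshold."""
    normalized = [used / total for used, total in zip(used_mb, total_mb)]
    return pstdev(normalized) > threshold


# Made-up figures for three hosts with 65536 MB of RAM each.
total = [65536, 65536, 65536]
uneven = [16384, 32768, 49152]  # 25%, 50% and 75% used
even = [32000, 32768, 33000]    # all close to 50% used

print(needs_rebalancing(uneven, total, 0.05))  # True
print(needs_rebalancing(even, total, 0.05))    # False
```

A threshold as low as 0.05 makes the strategy react to fairly small imbalances, whereas the default of 0.2 tolerates more spread before an action plan is sought.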