Load Balancer Member Respawning

As a cloud operator, when a load balancer member node fails, I want the load balancer to stop directing traffic to the failed member and a replacement member to be spawned.

Fault class

  • Hardware failure

  • Software error

  • Network failure

OpenStack projects used

  • OpenStack Aodh (telemetry alarm service)

  • OpenStack Heat (orchestration)

  • OpenStack Octavia (load balancer as a service)

Remediation class

  • Reactive

Fault detection

From the Octavia admin guide:

Octavia will use the health information from the underlying load balancing application to determine the health of members. This information will be streamed to the Octavia database and made available via the status tree or other API methods.
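
As a rough illustration of how the status tree could be consumed, the sketch below walks a sample tree and collects members whose operating status is ERROR. The nesting (statuses, loadbalancer, listeners, pools, members) is an assumption modeled on the Octavia v2 API, and every ID is made up; check the layout against the API reference for the deployed release before relying on it.

```python
# Illustrative status tree. The layout is an assumption modeled on the
# Octavia v2 API, and all IDs below are invented for this example.
SAMPLE_TREE = {
    "statuses": {
        "loadbalancer": {
            "id": "lb-1",
            "operating_status": "DEGRADED",
            "listeners": [
                {
                    "id": "listener-1",
                    "pools": [
                        {
                            "id": "pool-1",
                            "members": [
                                {"id": "member-a", "operating_status": "ONLINE"},
                                {"id": "member-b", "operating_status": "ERROR"},
                            ],
                        }
                    ],
                }
            ],
        }
    }
}


def failed_members(tree, bad_status="ERROR"):
    """Return (pool_id, member_id) pairs whose operating_status matches."""
    failed = []
    lb = tree.get("statuses", {}).get("loadbalancer", {})
    for listener in lb.get("listeners", []):
        for pool in listener.get("pools", []):
            for member in pool.get("members", []):
                if member.get("operating_status") == bad_status:
                    failed.append((pool["id"], member["id"]))
    return failed


print(failed_members(SAMPLE_TREE))
```

In a real deployment the tree would come from the Octavia API rather than a literal dictionary; the helper name failed_members is hypothetical.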

In addition, an Aodh alarm is defined to detect load balancer member node failure and to notify Heat through its alarm action. The loadbalancer_member_health alarm rule type was added to Aodh in April 2019. At the time of writing, a patch is under review to add a Heat resource that creates this alarm type from Heat templates. This document is intended to be updated later with sample Heat templates.
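
Until a Heat resource is available, an alarm of this type would have to be created directly against Aodh. The sketch below only builds the request body as a Python dictionary. The field names (loadbalancer_member_health_rule, pool_id, stack_id, autoscaling_group_id) and the trust+heat:// action are assumptions based on Aodh's usual "<type>_rule" convention and its CLI options, so verify them against the Aodh documentation for the release in use. The helper name and all IDs are hypothetical.

```python
import json


def build_member_health_alarm(name, pool_id, stack_id, autoscaling_group_id):
    """Build an alarm body for a loadbalancer_member_health alarm.

    Assumption: field names follow Aodh's "<type>_rule" convention and
    have not been checked against a specific Aodh release.
    """
    return {
        "name": name,
        "type": "loadbalancer_member_health",
        "loadbalancer_member_health_rule": {
            "pool_id": pool_id,                            # Octavia pool to watch
            "stack_id": stack_id,                          # Heat stack owning the members
            "autoscaling_group_id": autoscaling_group_id,  # group Heat should repair
        },
        # Assumed action scheme that lets Aodh notify Heat.
        "alarm_actions": ["trust+heat://"],
        "repeat_actions": False,
    }


alarm = build_member_health_alarm(
    name="lb-member-health",
    pool_id="POOL_ID",            # placeholder
    stack_id="STACK_ID",          # placeholder
    autoscaling_group_id="ASG_ID",  # placeholder
)
print(json.dumps(alarm, indent=2))
```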

Inputs, decision-making, and remediation

  • Octavia’s built-in behavior automatically stops directing traffic to the unresponsive member node.

  • Heat receives the Aodh alarm regarding the unresponsive member node, and according to the behavior defined in the stack template, spawns a new instance to replace the unresponsive member node.

  • Octavia detects when the new member node is operational and begins directing some traffic to the new node.
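
The three steps above can be sketched as a toy model. This is not how Octavia or Heat is implemented: the pool is a plain dictionary, the status strings are labels for the model only, and the member names and helper functions are hypothetical.

```python
def routable(pool):
    """Members that receive traffic: only those marked ONLINE."""
    return [name for name, status in pool.items() if status == "ONLINE"]


def replace_member(pool, failed, replacement):
    """Model of Heat's reaction to the alarm: drop the failed member and
    add a replacement that is not yet serving traffic."""
    updated = {n: s for n, s in pool.items() if n != failed}
    updated[replacement] = "OFFLINE"
    return updated


pool = {"web-1": "ONLINE", "web-2": "ONLINE"}

# Step 1: the health check fails, so traffic stops going to web-2.
pool["web-2"] = "ERROR"
after_failure = routable(pool)

# Step 2: the alarm reaches Heat, which spawns a replacement.
pool = replace_member(pool, failed="web-2", replacement="web-3")
after_replacement = routable(pool)

# Step 3: the new member passes its health check and starts taking traffic.
pool["web-3"] = "ONLINE"
after_recovery = routable(pool)

print(after_failure, after_replacement, after_recovery)
```

The point of the model is the ordering: traffic is withdrawn from the failed member before any replacement exists, and the replacement only receives traffic once it is reported healthy.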

Existing implementation(s)

A demo video is available here.

Future work

Dependencies