Replacing failed hardware

It is essential to plan and know how to replace failed hardware in your cluster without compromising your cloud environment.

Consider the following to help establish a hardware replacement plan:

What type of node am I replacing hardware on?
Can the hardware replacement be done without the host going down? For example, a single disk in a RAID-10.
If the host does have to be brought down for the hardware replacement, how should the resources on that host be handled?

If you have a Compute (nova) host that has a disk failure on a RAID-10, you can swap the failed disk without powering the host down. On the other hand, if the RAM has failed, you would have to power the host down. Having a plan in place for how you will manage these types of events is a vital part of maintaining your OpenStack environment.
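The two cases above can be recorded as a small lookup so that the decision is written down before a failure happens. This is only a sketch: the names `Downtime`, `KNOWN_CASES`, and `downtime_for` are hypothetical, and only the two components named in the text are filled in. Anything else is reported as unknown rather than guessed.

```python
from enum import Enum


class Downtime(Enum):
    """Whether replacing a component requires powering the host down."""
    NONE = "hot-swap, host stays up"
    REQUIRED = "host must be powered down"
    UNKNOWN = "not covered here, check your hardware documentation"


# Only the two examples from the text are listed (hypothetical keys).
# Extend this table from your own hardware documentation.
KNOWN_CASES = {
    "raid10_disk": Downtime.NONE,   # single failed disk in a RAID-10
    "ram": Downtime.REQUIRED,       # failed memory module
}


def downtime_for(component: str) -> Downtime:
    """Return the recorded downtime requirement for a failed component."""
    return KNOWN_CASES.get(component.lower(), Downtime.UNKNOWN)
```

For example, `downtime_for("ram")` returns `Downtime.REQUIRED`, while a component that is not in the table returns `Downtime.UNKNOWN` and should be looked up before the replacement is scheduled.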

For a Compute host, shut down instances on the host before it goes down. For a Block Storage (cinder) host using non-redundant storage, shut down any instances with volumes attached that require that mount point. Unmount the drive within your operating system and re-mount the drive once the Block Storage host is back online.
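The handling described above can be sketched as a function that returns an ordered checklist. The function name, the node type strings, and the `redundant_storage` flag are assumptions made for illustration. The steps for Compute hosts and for Block Storage hosts with non-redundant storage follow the text, and node types the text does not cover are rejected instead of being given invented steps.

```python
from typing import List


def replacement_steps(node_type: str,
                      host_goes_down: bool,
                      redundant_storage: bool = True) -> List[str]:
    """Build an ordered checklist for a hardware replacement.

    node_type is "compute" or "block_storage" (hypothetical labels).
    """
    if node_type not in ("compute", "block_storage"):
        # The text only covers these two host types.
        raise ValueError(f"no guidance recorded for node type: {node_type}")

    if not host_goes_down:
        # Example from the text: a single disk in a RAID-10.
        return ["replace the failed part while the host stays up"]

    non_redundant_cinder = (node_type == "block_storage"
                            and not redundant_storage)
    steps: List[str] = []

    if node_type == "compute":
        steps.append("shut down instances on the host")
    elif non_redundant_cinder:
        steps.append("shut down instances with attached volumes "
                     "that require the mount point")
        steps.append("unmount the drive in the operating system")
    # Assumption: a Block Storage host with redundant storage needs no
    # extra preparation; the text does not describe that case.

    steps.append("power down the host and replace the failed hardware")
    steps.append("bring the host back online")

    if non_redundant_cinder:
        steps.append("re-mount the drive")
    return steps
```

Calling `replacement_steps("block_storage", host_goes_down=True, redundant_storage=False)` yields the shut down, unmount, replace, bring online, and re-mount sequence in that order. Such a checklist is meant as a planning aid and does not run any commands against the cloud.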

openstack-ansible 31.1.0.dev56