
Recover a compute host failure

The following procedure recovers from a Compute node failure when shared storage is used.

Note

If shared storage is not used, data can be copied from the /var/lib/nova/instances directory on the failed Compute node ${FAILED_NODE} to another node ${RECEIVING_NODE} before performing the following procedure. Note that this method is not supported.

  1. Re-launch all instances on the failed node.

  2. Invoke the MariaDB command line tool.

  3. Generate a list of instance UUIDs hosted on the failed node:

    mysql> select uuid from instances where host = '${FAILED_NODE}' and deleted = 0;
    
  4. Set instances on the failed node to be hosted on a different node:

    mysql> update instances set host = '${RECEIVING_NODE}' where host = '${FAILED_NODE}' \
    and deleted = 0;
    
  5. Hard reboot each instance that was hosted on the failed node, as listed by the earlier select query, to regenerate its XML files:

    # nova reboot --hard $INSTANCE_UUID
    
  6. Check that each instance has booted successfully and reached the login prompt, then find the volumes attached to the instances:

    mysql> select nova.instances.uuid as instance_uuid, cinder.volumes.id \
    as volume_uuid, cinder.volumes.status, cinder.volumes.attach_status, \
    cinder.volumes.mountpoint, cinder.volumes.display_name from \
    cinder.volumes inner join nova.instances on cinder.volumes.instance_uuid=nova.instances.uuid \
    where nova.instances.host = '${FAILED_NODE}';
    
  7. If rows are found, detach and re-attach the volumes using the values listed in the previous query:

    # nova volume-detach $INSTANCE_UUID $VOLUME_UUID && \
      nova volume-attach $INSTANCE_UUID $VOLUME_UUID $VOLUME_MOUNTPOINT
    
  8. Rebuild or replace the failed node as described in Add a compute host.
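
When many instances are affected, the UUID listing, the hard reboots, and the volume detach and re-attach can be scripted instead of typed one at a time. The following is a minimal sketch, not part of the supported procedure. It assumes that the Compute database is named nova, that the mysql client can connect without extra options, and that the nova client is already authenticated. The function names are hypothetical.

```shell
#!/bin/sh
# Sketch only. Assumptions (not from the procedure above): the database
# name "nova", password-less mysql client access, and an authenticated
# nova client. All function names are hypothetical.

# Print one instance UUID per line for the given failed host.
# -N suppresses the column header, -B selects plain batch output.
list_instance_uuids() {
    failed_node=$1
    mysql -N -B -e \
        "select uuid from instances where host = '${failed_node}' and deleted = 0;" \
        nova
}

# Hard reboot every instance UUID read from standard input.
# nova reads from /dev/null so it cannot consume the UUID list.
reboot_instances() {
    while read -r uuid; do
        [ -n "$uuid" ] || continue
        nova reboot --hard "$uuid" </dev/null ||
            echo "reboot failed: $uuid" >&2
    done
}

# Detach and re-attach volumes. Expects whitespace-separated rows on
# standard input: instance_uuid volume_uuid mountpoint
reattach_volumes() {
    while read -r instance_uuid volume_uuid mountpoint; do
        [ -n "$volume_uuid" ] || continue
        nova volume-detach "$instance_uuid" "$volume_uuid" </dev/null &&
            nova volume-attach "$instance_uuid" "$volume_uuid" "$mountpoint" </dev/null
    done
}
```

One possible usage is to save the output of list_instance_uuids to a file before the host is updated in the database, then feed that file to reboot_instances afterwards. The rows for reattach_volumes would have to be taken from the volume query, reduced to the instance UUID, volume UUID, and mount point columns.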