.. _troubleshooting:

======================
Troubleshooting Ironic
======================

Nova returns "No valid host was found" Error
============================================

Sometimes the Nova Conductor log file "nova-conductor.log" or a message
returned from the Nova API contains the following error::

    NoValidHost: No valid host was found. There are not enough hosts available.

"No valid host was found" means that the Nova Scheduler could not find a bare
metal node suitable for booting the new instance.

This in turn usually means some mismatch between the resources that Nova
expects to find and the resources that Ironic advertised to Nova.

A few things should be checked in this case:

#. Inspection must have completed successfully, or you must have entered the
   required Ironic node properties manually. For each node in the
   ``available`` state in ``ironic node-list --provision-state available``
   use::

       ironic node-show <node-uuid>

   and make sure that the ``properties`` JSON field has valid values for the
   keys ``cpus``, ``cpu_arch``, ``memory_mb`` and ``local_gb``.

#. Check that the Nova flavor you are using matches the properties of the
   available Ironic nodes. Use::

       nova flavor-show <flavor-name>

   to compare. If you're using exact match filters in the Nova Scheduler,
   make sure the flavor and the node properties match exactly. Any extra
   specs in the flavor should map to ``node.properties['capabilities']``.

#. Make sure that enough nodes are in the ``available`` state according to
   ``ironic node-list --provision-state available``.

#. Make sure the nodes you're going to deploy to are not in maintenance mode.
   Again, use ``ironic node-list`` to check. A node automatically going into
   maintenance mode usually means wrong power credentials for this node.
   Check them and then remove maintenance mode::

       ironic node-set-maintenance <node-uuid> off

#. After making changes to nodes in Ironic, it takes time for those changes
   to propagate from Ironic to Nova. Check that::

       nova hypervisor-stats

   correctly shows the total amount of resources in your system. You can also
   check ``nova hypervisor-list`` to see the status of individual Ironic
   nodes as reported to Nova, and correlate the Nova "hypervisor hostname"
   with the Ironic node UUID.

#. If none of the above helped, check the Ironic conductor log carefully to
   see if any conductor-related errors are the root cause of "No valid host
   was found". If the log contains
   "Error in deploy of node <node-uuid>: [Errno 28] ..." messages, the
   conductor ran into an error during deployment (errno 28 is the operating
   system's "No space left on device" error). Check the log carefully, fix
   or work around the problem, and then try again.

Patching the Deploy Ramdisk
===========================

When debugging a problem with deployment and/or inspection you may want to
quickly apply a change to the ramdisk to see if it helps. You can inject your
code and/or SSH keys during the ramdisk build (how depends on the way you
built your ramdisk), but it's also possible to quickly modify an already
built ramdisk.

Create an empty directory and unpack the ramdisk content there::

    mkdir unpack
    cd unpack
    gzip -dc /path/to/the/ramdisk | cpio -id

The last command unpacks the whole Linux file system tree into the current
directory. Now you can modify any files you want. The actual location of the
files will depend on the way you've built the ramdisk.

After you've made the modifications, pack the whole content of the current
directory back::

    find . | cpio -H newc -o > /path/to/the/new/ramdisk

.. note::
   You don't need to modify the kernel (e.g.
   ``tinyipa-master.vmlinuz``), only the ramdisk part.

.. note::
   For a CoreOS-based ramdisk you also need to unpack and pack back the
   squashfs archive inside the unpacked ramdisk.

API Errors
==========

The ``debug_tracebacks_in_api`` config option may be set to return tracebacks
in the API response for all 4xx and 5xx errors.

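As a minimal sketch, the option could be enabled in
``/etc/ironic/ironic.conf`` as shown below. The ``[DEFAULT]`` group is an
assumption based on common Ironic sample configurations and is not stated in
this guide, so verify it against the configuration reference for your
release. Tracebacks can expose internal details, so this is best limited to
debugging sessions.

```ini
# Assumption: debug_tracebacks_in_api is registered in the [DEFAULT] group.
[DEFAULT]
# Return server tracebacks in API responses for 4xx and 5xx errors.
debug_tracebacks_in_api = True
```

The ironic-api service most likely needs a restart for the change to take
effect.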

Retrieving logs from the deploy ramdisk
=======================================

When troubleshooting deployments (especially in case of a deploy failure)
it's important to have access to the deploy ramdisk logs to be able to
identify the source of the problem. By default, Ironic will retrieve the
logs from the deploy ramdisk when the deployment fails and save them on the
local filesystem at ``/var/log/ironic/deploy``.

To change this behavior, operators can make the following changes to
``/etc/ironic/ironic.conf`` under the ``[agent]`` group:

* ``deploy_logs_collect``: Whether Ironic should collect the deployment
  logs on deployment. Valid values for this option are:

  * ``on_failure`` (**default**): Retrieve the deployment logs upon a
    deployment failure.

  * ``always``: Always retrieve the deployment logs, even if the
    deployment succeeds.

  * ``never``: Disable retrieving the deployment logs.

* ``deploy_logs_storage_backend``: The name of the storage backend where
  the logs will be stored. Valid values for this option are:

  * ``local`` (**default**): Store the logs in the local filesystem.

  * ``swift``: Store the logs in Swift.

* ``deploy_logs_local_path``: The path to the directory where the
  logs should be stored, used when ``deploy_logs_storage_backend``
  is set to ``local``. By default logs will be stored at
  **/var/log/ironic/deploy**.

* ``deploy_logs_swift_container``: The name of the Swift container to
  store the logs, used when ``deploy_logs_storage_backend`` is set to
  ``swift``. By default **ironic_deploy_logs_container**.

* ``deploy_logs_swift_days_to_expire``: Number of days before a log object
  is marked as expired in Swift. If None, the logs will be kept forever
  or until manually deleted. Used when ``deploy_logs_storage_backend``
  is set to ``swift``. By default **30** days.

When the logs are collected, Ironic will store a *tar.gz* file containing
all the logs according to the ``deploy_logs_storage_backend``
configuration option.

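To illustrate how these options combine, here is a minimal sketch of an
``[agent]`` section that collects logs after every deployment and stores them
in Swift. The container name and expiry are simply the documented defaults
written out explicitly, and the choice of ``always`` is only for the example.

```ini
[agent]
# Collect ramdisk logs after every deployment, not only on failure.
deploy_logs_collect = always
# Store the resulting tar.gz archives in Swift instead of on local disk.
deploy_logs_storage_backend = swift
# The two values below are the documented defaults, shown for clarity.
deploy_logs_swift_container = ironic_deploy_logs_container
deploy_logs_swift_days_to_expire = 30
```

With ``deploy_logs_storage_backend = local``, the two Swift options are
unused and ``deploy_logs_local_path`` determines where the archives are
written.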

All log objects will be named with the following pattern::

    <node-uuid>[_<instance-uuid>]_<timestamp>.tar.gz

.. note::
   The *instance_uuid* field is not required for deploying a node when
   Ironic is configured to be used in standalone mode. If present it
   will be appended to the name.

Accessing the log data
----------------------

When storing in the local filesystem
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When storing the logs in the local filesystem, the log files can
be found at the path configured in the ``deploy_logs_local_path``
configuration option. For example, to find the logs from the node
``5e9258c4-cfda-40b6-86e2-e192f523d668``:

.. code-block:: bash

  $ ls /var/log/ironic/deploy | grep 5e9258c4-cfda-40b6-86e2-e192f523d668
  5e9258c4-cfda-40b6-86e2-e192f523d668_88595d8a-6725-4471-8cd5-c0f3106b6898_2016-08-08-13:52:12.tar.gz
  5e9258c4-cfda-40b6-86e2-e192f523d668_db87f2c5-7a9a-48c2-9a76-604287257c1b_2016-08-08-14:07:25.tar.gz

.. note::
   When saving the logs to the filesystem, operators may want to enable
   some form of rotation for the logs to avoid disk space problems.

When storing in Swift
~~~~~~~~~~~~~~~~~~~~~

When using Swift, operators can associate the objects in the
container with the nodes in Ironic and search for the logs for the node
``5e9258c4-cfda-40b6-86e2-e192f523d668`` using the **prefix** parameter.
For example:

.. code-block:: bash

  $ swift list ironic_deploy_logs_container -p 5e9258c4-cfda-40b6-86e2-e192f523d668
  5e9258c4-cfda-40b6-86e2-e192f523d668_88595d8a-6725-4471-8cd5-c0f3106b6898_2016-08-08-13:52:12.tar.gz
  5e9258c4-cfda-40b6-86e2-e192f523d668_db87f2c5-7a9a-48c2-9a76-604287257c1b_2016-08-08-14:07:25.tar.gz

To download a specific log from Swift, do:

.. code-block:: bash

  $ swift download ironic_deploy_logs_container "5e9258c4-cfda-40b6-86e2-e192f523d668_db87f2c5-7a9a-48c2-9a76-604287257c1b_2016-08-08-14:07:25.tar.gz"
  5e9258c4-cfda-40b6-86e2-e192f523d668_db87f2c5-7a9a-48c2-9a76-604287257c1b_2016-08-08-14:07:25.tar.gz [auth 0.341s, headers 0.391s, total 0.391s, 0.531 MB/s]

The contents of the log file
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The log is just a ``.tar.gz`` file that can be extracted as:

.. code-block:: bash

  $ tar xvf <file path>

The contents of the file may differ slightly depending on the distribution
that the deploy ramdisk is using:

* For distributions using ``systemd`` there will be a file called
  **journal** which contains all the system logs collected via the
  ``journalctl`` command.

* For other distributions, the ramdisk will collect all the contents of
  the ``/var/log`` directory.

For all distributions, the log file will also contain the output of
the following commands (if present): ``ps``, ``df``, ``ip addr`` and
``iptables``.

Here's one example when extracting the content of a log file for a
distribution that uses ``systemd``:

.. code-block:: bash

  $ tar xvf 5e9258c4-cfda-40b6-86e2-e192f523d668_88595d8a-6725-4471-8cd5-c0f3106b6898_2016-08-08-13:52:12.tar.gz
  df
  ps
  journal
  ip_addr
  iptables
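
Because every archive name ends with the collection timestamp, picking the
most recent archive for a node can be scripted. The following is a minimal
sketch and not part of Ironic: the function name ``latest_deploy_log`` is
hypothetical, and it assumes the logs are stored locally with names like
those in the listings above.

```shell
# latest_deploy_log NODE_UUID [LOG_DIR]
# Hypothetical helper: print the path of the newest deploy log archive for a
# node, or nothing if no archive exists. LOG_DIR defaults to the documented
# default of deploy_logs_local_path.
latest_deploy_log() {
    node_uuid=$1
    log_dir=${2:-/var/log/ironic/deploy}
    for path in "$log_dir"/"$node_uuid"_*.tar.gz; do
        # Skip the unexpanded glob pattern when nothing matches.
        [ -e "$path" ] || continue
        name=${path##*/}          # strip the directory part
        stamp=${name%.tar.gz}     # strip the extension
        stamp=${stamp##*_}        # keep only the trailing timestamp field
        printf '%s\t%s\n' "$stamp" "$path"
    done | sort | tail -n 1 | cut -f 2
}
```

For example, ``tar xvf "$(latest_deploy_log 5e9258c4-cfda-40b6-86e2-e192f523d668)"``
would extract the newest archive for that node. Sorting on the timestamp
field rather than the whole name matters because the optional instance UUID
sits in front of the timestamp and would otherwise decide the order.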