rdo-manager: fail to discover nodes with "instack-ironic-deployment --discover-nodes": ERROR: Data pre-processing failed

Environment:
instack-undercloud-2.1.1-dev13.el7.centos.noarch
instack-0.0.6-1.el7ost.noarch

Steps to reproduce:
1. Install the undercloud.
2. Register a node.
3. Attempt to discover the node(s) with instack-ironic-deployment --discover-nodes

Result:
Preparing for deployment...
Discovering nodes.
Sending node ID 7bb4b5c7-bb32-4d47-950c-d125a6ddec7f to discoverd for discovery ... DONE.
Polling discoverd for discovery results ...
Result for node 7bb4b5c7-bb32-4d47-950c-d125a6ddec7f is ... ERROR: Data pre-processing failed
Prepared.

Checking the console: the host drops into dracut-emergency.

Expected result:
The discovery should complete without errors.
The packages listed seem to be inconsistent with OSP-d. Please retest off the puddle and set the product/component appropriately.
Created attachment 1029476 [details] dmesg journalctl rdsosreport.txt
This happens when there are no disks configured on the host being discovered.
Please get the discoverd and ramdisk logs as described in https://repos.fedorapeople.org/repos/openstack-m/docs/master/troubleshooting/troubleshooting-nodes.html#where-are-the-logs

The logs provided do not contain the required information. Also, please provide the version of openstack-ironic-discoverd.
Created attachment 1029543 [details] system.journal from the system
What's this binary file? I'm expecting to see the discoverd logs and the files from /var/log/ironic-discoverd/ramdisk
This is the only existing file under /var/log. Running "sudo journalctl -u openstack-ironic-discoverd -u openstack-ironic-discoverd-dnsmasq" as in the provided URL - returns an empty log.
The RPM version: openstack-ironic-discoverd-1.1.0-1.el7ost.noarch
You can gunzip the file and look through its content with: journalctl --file <path to the file>.
Created attachment 1029544 [details] output from "journalctl -u openstack-ironic-discoverd.service -u openstack-ironic-discoverd-dnsmasq.service"
So, in the discoverd logs: "value for local_gb is missing or malformed:", which means that ramdisk did not send hard drive size. Now there should be files called "log" and "discoverd-log" in the ramdisk logs tarball, could you please find them?
The issue is caused by ironic-discoverd always expecting a volume on the node. Before RAID configuration there is no volume on the node, so discovery fails. But RAID configuration cannot be started before discovery, because the target RAID configuration is tied to the deployment profile, which is selected for each node based on the discovered facts.
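To illustrate the failure mode, here is a minimal Python sketch. It is not the actual ironic-discoverd code and not the patch itself: the function name validate_local_gb, the exception class, and the allow_missing_disk switch are all hypothetical, and only the error text is taken from the discoverd log. It shows how a strict check on local_gb rejects a node that has no volume yet, and how a relaxed check could let discovery continue so that RAID can be configured afterwards.

```python
# Illustrative sketch only: NOT the actual ironic-discoverd implementation.
# validate_local_gb, PreprocessingError and allow_missing_disk are
# hypothetical names introduced for this example.


class PreprocessingError(Exception):
    """Raised when the data posted by the ramdisk cannot be used."""


def validate_local_gb(node_info, allow_missing_disk=False):
    """Return the disk size reported by the ramdisk as an integer.

    node_info is assumed to be the dict posted by the discovery ramdisk.
    With allow_missing_disk=True, an absent or empty local_gb is treated
    as "no volume yet" and reported as 0 instead of failing.
    """
    raw = node_info.get("local_gb")
    try:
        size = int(raw)  # int(None) raises TypeError, int("") raises ValueError
        if size < 0:
            raise ValueError("negative disk size")
    except (TypeError, ValueError):
        if allow_missing_disk and raw in (None, ""):
            return 0
        raise PreprocessingError(
            "value for local_gb is missing or malformed: %r" % (raw,)
        )
    return size
```

With the strict default, a diskless node (no local_gb in the posted data) raises PreprocessingError, which matches the "Data pre-processing failed" result seen above. With allow_missing_disk=True the same node would be accepted with a size of 0.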
Patches
- for discoverd: https://review.openstack.org/#/c/185896/
- for discovery-ramdisk: https://review.openstack.org/#/c/186033/
- for the ironic-discoverd element in instack-undercloud: https://review.gerrithub.io/#/c/234408/
Commits in midstream repos
- for discoverd: https://github.com/rdo-management/ironic-discoverd/commit/12eddeaaba8c0244c06957f769db05e6162a0dd7
- for discovery-ramdisk: https://github.com/rdo-management/ironic-discoverd/commit/66d9d00c6b7b47a2c1a578fc5243b7bbd535a0ed
- for the ironic-discoverd element in instack-undercloud: https://github.com/rdo-management/instack-undercloud/commit/8e7f62768599dfc9a0f5eef51fdd3223a3da9fad
(In reply to Dmitry Tantsur from comment #11)
> So, in the discoverd logs: "value for local_gb is missing or malformed:",
> which means that ramdisk did not send hard drive size. Now there should be
> files called "log" and "discoverd-log" in the ramdisk logs tarball, could
> you please find them?

We hit the same error message today in a virtual machine:
'value for local_gb is missing or malformed:'

However, that was due to the disk being 40 GB while the partition was only 1 GB.
(In reply to hrosnet from comment #15)
> (In reply to Dmitry Tantsur from comment #11)
> > So, in the discoverd logs: "value for local_gb is missing or malformed:",
> > which means that ramdisk did not send hard drive size. Now there should be
> > files called "log" and "discoverd-log" in the ramdisk logs tarball, could
> > you please find them?
>
> We hit the same error message today in a virtual machine:
> 'value for local_gb is missing or malformed:'
>
> However, that was due to the disk being 40 GB while the partition was only 1 GB.

Sorry, I hadn't dug deep enough. It seems that this is linked to something else. I'll open another bug if need be.
closed, no need for needinfo.