Bug 1312187 - Ironic fails to find the disk to write image to
Summary: Ironic fails to find the disk to write image to
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-ironic-python-agent
Version: 8.0 (Liberty)
Hardware: x86_64
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ga
Target Release: 8.0 (Liberty)
Assignee: Dmitry Tantsur
QA Contact: Raviv Bar-Tal
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-02-26 03:07 UTC by Sai Sindhur Malleni
Modified: 2016-05-02 14:24 UTC (History)

Fixed In Version: openstack-ironic-python-agent-1.1.0-6.el7ost
Doc Type: Bug Fix
Doc Text:
Sometimes, hard drives were not available in time for a deployment ramdisk run. Consequently, the deployment failed if the ramdisk was unable to find the required root device. With this update, the "udev settle" command is executed before enumerating disks in the ramdisk, and the deployment no longer fails due to the missing root device.
Clone Of:
Environment:
Last Closed: 2016-04-07 21:31:26 UTC
Target Upstream Version:
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
OpenStack gerrit | 284796 | 0 | 'None' | MERGED | Wait for udev to settle before listing the block devices | 2020-08-10 16:40:19 UTC
Red Hat Product Errata | RHEA-2016:0603 | 0 | normal | SHIPPED_LIVE | Red Hat OpenStack Platform 8 Enhancement Advisory | 2016-04-08 00:53:53 UTC

Description Sai Sindhur Malleni 2016-02-26 03:07:42 UTC
Description of problem:
Overcloud deployment always fails after the retries time out. In a deployment with 3 controllers and 2 compute nodes, at most 4 of the nodes are deployed before the stack create fails. The nodes that fail to deploy are not necessarily the same each time. Introspection completes without any complaints, but the following error appears frequently in the ironic debug logs. Some nodes that show the error are installed successfully on subsequent attempts, but others still fail after the retries.

Failed to deploy instance: Failed to start the iSCSI target to deploy the node 79aa5db0-df3a-4f2e-9e6b-45820b18a97c. Error: {u'message': u'Error finding the disk or partition device to deploy the image onto: No suitable device was found for deployment - root device hints were not provided and all found block devices are smaller than 4294967296B [].', u'code': 404, u'type': u'DeviceNotFound', u'details': u'No suitable device was found for deployment - root device hints were not provided and all found block devices are smaller than 4294967296B [].'}

The systems are PowerEdge R610 servers with 500 GB hard disks and, as noted, introspection does not fail.
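
The empty list "[]" at the end of the error suggests that no block devices were visible at all when the root device was chosen, rather than that the disks were too small. A minimal sketch of that kind of selection follows, assuming a "smallest disk of at least 4 GiB" rule when no root device hints are given. The names pick_root_device, BlockDevice and the local DeviceNotFound class are hypothetical and are not taken from the IPA source.

```python
from collections import namedtuple

# 4 GiB, the threshold quoted in the error message (4294967296B).
MIN_ROOT_DISK_BYTES = 4 * 1024 ** 3

# Hypothetical record for one enumerated disk.
BlockDevice = namedtuple("BlockDevice", ["name", "size"])


class DeviceNotFound(Exception):
    """Hypothetical stand-in for the agent's DeviceNotFound error."""


def pick_root_device(devices, min_size=MIN_ROOT_DISK_BYTES):
    """Return the name of the smallest device that is at least min_size bytes."""
    candidates = sorted(
        (d for d in devices if d.size >= min_size), key=lambda d: d.size
    )
    if not candidates:
        # With an empty device list this reports "[]", as in the log above.
        raise DeviceNotFound(
            "all found block devices are smaller than %dB %s"
            % (min_size, [d.name for d in devices])
        )
    return candidates[0].name
```

With a 500 GB disk present the function returns its name. With no devices enumerated yet it raises, which matches the symptom reported here.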

 


How reproducible: Always on the environment I tried


Steps to Reproduce:
1. Build undercloud
2. Finish introspection
3. Deploy overcloud

Actual results: Deploy fails


Expected results: Overcloud deploys


Additional info:
Using the latest OSP 8 bits. Local_boot option wasn't used

Comment 2 Lucas Alvares Gomes 2016-02-29 15:26:17 UTC
So I've tested the ramdisk, and apparently the problem is just the timing of when the IPA service starts.

I first tried modifying openstack-ironic-python-agent.service in the ramdisk to Require systemd-udev-settle.service [0], but that didn't work.

Manually modifying the ramdisk to trigger udevadm and then settle in the code actually worked for me, e.g.:

# udevadm trigger --verbose --dry-run --type=devices --subsystem-match=scsi_disk
# udevadm settle

I will work on a patch for IPA to do it.

[0] https://github.com/systemd/systemd/blob/master/units/systemd-udev-settle.service.in
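
For illustration, waiting for udev before enumerating disks could look roughly like the sketch below. It is not the merged patch: the function names and the injectable run parameter are assumptions made so the example can be exercised without real hardware. Only the udevadm settle call and the standard lsblk options (-P, -b, -d, -o) are real commands.

```python
import shlex
import subprocess


def udev_settle(run=subprocess.check_output):
    """Block until the udev event queue is empty."""
    run(["udevadm", "settle"])


def parse_lsblk_pairs(output):
    """Parse `lsblk -P` output (KEY="value" pairs) and keep whole disks only."""
    disks = []
    for line in output.splitlines():
        if not line.strip():
            continue
        fields = dict(token.split("=", 1) for token in shlex.split(line))
        if fields.get("TYPE") == "disk":
            disks.append(
                {"name": "/dev/" + fields["KNAME"], "size": int(fields["SIZE"])}
            )
    return disks


def list_disks(run=subprocess.check_output):
    """Settle udev first, then enumerate block devices with sizes in bytes."""
    udev_settle(run)
    out = run(["lsblk", "-Pbd", "-o", "KNAME,SIZE,TYPE"])
    if isinstance(out, bytes):
        out = out.decode("utf-8")
    return parse_lsblk_pairs(out)
```

Calling list_disks() in the ramdisk would then only enumerate devices after pending udev events have been processed, which is the ordering described above.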

Comment 3 Lucas Alvares Gomes 2016-03-02 17:24:40 UTC
Fix was approved upstream and backported [0]

[0] https://code.engineering.redhat.com/gerrit/#/c/68938/

Comment 6 Raviv Bar-Tal 2016-04-07 08:05:01 UTC
Hi Sindhur,
I don't really know how to verify this bug; there is not enough information or hardware specification.

Can you tell if the fix did solve the problem for you?
If so I'll verify we have the current package in the release and close this bug.

Thanks

Comment 7 Sai Sindhur Malleni 2016-04-07 12:54:25 UTC
Raviv Bar-Tal,

I tried the latest puddle/poodle and did not hit the issue. In fact, the performance team has several working OSP 8 deployments and hasn't seen this issue since the fix.

Comment 8 errata-xmlrpc 2016-04-07 21:31:26 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-0603.html

