Bug 1939362 - Baremetal provisioning is failing with error: 'No partition with UUID <> found on device /dev/sda'
Summary: Baremetal provisioning is failing with error: No partition with UUID <> found...
Keywords:
Status: CLOSED NEXTRELEASE
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-ironic-python-agent
Version: 17.0 (Wallaby)
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: ---
Assignee: Steve Baker
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-03-16 08:46 UTC by Sandeep Yadav
Modified: 2022-08-23 22:47 UTC (History)

Fixed In Version: openstack-ironic-python-agent-7.0.1-0.20210406124832.3123406.el8ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-04-20 14:58:20 UTC
Target Upstream Version:
Embargoed:



Links
System | ID | Private | Priority | Status | Summary | Last Updated
OpenStack gerrit | 781564 | 0 | None | NEW | Fix root UUID for streamed partition images | 2021-03-19 01:36:28 UTC
Red Hat Issue Tracker | OSP-3052 | 0 | None | None | None | 2022-08-23 22:47:53 UTC

Description Sandeep Yadav 2021-03-16 08:46:11 UTC
Description of problem:

Baremetal provisioning is failing with 'command_error': {'type': 'DeviceNotFound', 'code': 404, 'message': 'Error finding the disk or partition device to deploy the image onto', 'details': 'No partition with UUID 8868a6f9-a572-42e9-8dc0-7a277dce7856 found on device /dev/sda'}

Version-Release number of selected component (if applicable):

RHOSP 17, Downstream pipeline - OSP17 Integration line


Actual results: Baremetal provisioning is failing

Expected results: Baremetal provisioning should pass


Additional info:

This issue occurs in the downstream pipeline (OSP17 Integration line). We have a CI job that tests deployment on real baremetal hardware. There are 4 baremetal nodes in this environment, and they fail at the "openstack overcloud node provision" step.

Log snippets (ironic conductor, then ironic-python-agent on the node):


~~~
2021-03-14 20:27:15.432 7 ERROR ironic.conductor.utils [req-6e729687-c394-416b-a91b-da4033a2ee78 - - - - -] Node 20d52ee9-8342-4dcb-a2c8-e6b7c93fd374 failed deploy step {}. Error: Failed to install a bootloader when deploying node 20d52ee9-8342-4dcb-a2c8-e6b7c93fd374. Error: No partition with UUID 8868a6f9-a572-42e9-8dc0-7a277dce7856 found on device /dev/sda: ironic.common.exception.InstanceDeployFailure: Failed to install a bootloader when deploying node 20d52ee9-8342-4dcb-a2c8-e6b7c93fd374. Error: No partition with UUID 8868a6f9-a572-42e9-8dc0-7a277dce7856 found on device /dev/sda

2021-03-14 20:27:18.793 7 ERROR ironic.conductor.utils [req-4211d61f-18d4-4eef-bf2e-660795ba9abf - - - - -] Node 416a9b96-3920-45aa-840a-f16a445d65e1 failed deploy step {}. Error: Failed to install a bootloader when deploying node 416a9b96-3920-45aa-840a-f16a445d65e1. Error: No partition with UUID 8868a6f9-a572-42e9-8dc0-7a277dce7856 found on device /dev/sda: ironic.common.exception.InstanceDeployFailure: Failed to install a bootloader when deploying node 416a9b96-3920-45aa-840a-f16a445d65e1. Error: No partition with UUID 8868a6f9-a572-42e9-8dc0-7a277dce7856 found on device /dev/sda

2021-03-14 20:27:20.446 7 ERROR ironic.conductor.utils [req-6f7c6bd2-552e-42fb-a3c2-13965680bc9c - - - - -] Node 03be0314-36ca-4f92-9578-b630f805df53 failed deploy step {}. Error: Failed to install a bootloader when deploying node 03be0314-36ca-4f92-9578-b630f805df53. Error: No partition with UUID 8868a6f9-a572-42e9-8dc0-7a277dce7856 found on device /dev/sda: ironic.common.exception.InstanceDeployFailure: Failed to install a bootloader when deploying node 03be0314-36ca-4f92-9578-b630f805df53. Error: No partition with UUID 8868a6f9-a572-42e9-8dc0-7a277dce7856 found on device /dev/sda

2021-03-14 20:27:21.894 7 ERROR ironic.conductor.utils [req-146f12eb-db86-4531-9183-87edc022b5ec - - - - -] Node a4aa0679-09ca-4708-8ce6-f82b314b6bee failed deploy step {}. Error: Failed to install a bootloader when deploying node a4aa0679-09ca-4708-8ce6-f82b314b6bee. Error: No partition with UUID 8868a6f9-a572-42e9-8dc0-7a277dce7856 found on device /dev/sda: ironic.common.exception.InstanceDeployFailure: Failed to install a bootloader when deploying node a4aa0679-09ca-4708-8ce6-f82b314b6bee. Error: No partition with UUID 8868a6f9-a572-42e9-8dc0-7a277dce7856 found on device /dev/sda
~~~

~~~
Mar 14 20:33:22 host-10-9-120-141 ironic-python-agent[1221]: 2021-03-14 20:33:22.070 1221 DEBUG ironic_python_agent.extensions.image [-] First fallback detection attempt for locating partition via UUID 8868a6f9-a572-42e9-8dc0-7a277dce7856 failed. Error: Unexpected error while running command.
                                                             Command: findfs UUID=8868a6f9-a572-42e9-8dc0-7a277dce7856
                                                             Exit code: 1
                                                             Stdout: ''
                                                             Stderr: "findfs: unable to resolve 'UUID=8868a6f9-a572-42e9-8dc0-7a277dce7856'\n" _get_partition /usr/lib/python3.6/site-packages/ironic_python_agent/extensions/image.py:108
~~~
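The failing fallback in the agent log can be reproduced by hand on the node. The following is only a minimal sketch and not the ironic-python-agent code: `find_device_by_uuid` and its `run` parameter are hypothetical names introduced here. It assumes nothing beyond the `findfs UUID=<uuid>` invocation shown in the log, which exits non-zero when no filesystem carries that UUID.

```python
import subprocess


def find_device_by_uuid(uuid, run=subprocess.run):
    """Return the device path findfs reports for a filesystem UUID, or None.

    `run` is injectable (hypothetical parameter) so the lookup can be
    exercised without a real disk.
    """
    result = run(
        ["findfs", "UUID=" + uuid],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # text mode; also works on Python 3.6
    )
    if result.returncode != 0:
        # Same situation as in the log: findfs could not resolve the UUID.
        return None
    return result.stdout.strip()
```

On an affected node, calling `find_device_by_uuid("8868a6f9-a572-42e9-8dc0-7a277dce7856")` would be expected to return `None`, matching the exit code 1 above.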

Comment 2 Steve Baker 2021-03-16 19:42:14 UTC
This will need some investigation, but the current theory is that the behavior of findfs has changed.

Comment 3 Steve Baker 2021-03-17 04:25:15 UTC
The deploy log shows that the UUID of /dev/sda2 is assumed to remain 8868a6f9-a572-42e9-8dc0-7a277dce7856 while partitions are deleted and recreated and overcloud-full.raw is written onto it. After the image is written, however, the UUID changes to c9107bc1-f707-4417-9bf4-5609495d67c0:

~~~
Mar 14 20:33:21 host-10-9-120-141 ironic-python-agent[1221]: 2021-03-14 20:33:21.923 1221 DEBUG ironic_lib.utils [-] Command stdout is: "KNAME="sda" MODEL="INTEL SSDSC1BG20" SIZE="200049647616" ROTA="0" TYPE="disk" UUID="" PARTUUID=""
                                                             KNAME="sda1" MODEL="" SIZE="1048576" ROTA="0" TYPE="part" UUID="2021-03-14-20-19-04-00" PARTUUID="5d9c5f07-01"
                                                             KNAME="sda2" MODEL="" SIZE="197568495616" ROTA="0" TYPE="part" UUID="c9107bc1-f707-4417-9bf4-5609495d67c0" PARTUUID="5d9c5f07-02"
                                                             KNAME="sdb" MODEL="INTEL SSDSC1BG20" SIZE="200049647616" ROTA="0" TYPE="disk" UUID="" PARTUUID=""
~~~

Later attempts to find the root device fail because they search for 8868a6f9-a572-42e9-8dc0-7a277dce7856, which no longer exists on the disk.

I'll keep looking into why this is happening.
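To make the mismatch in comment 3 concrete, here is a small sketch that parses the KEY="value" listing from the deploy log and looks up a partition by filesystem UUID. It is an illustration only, not the agent's implementation: `parse_lsblk_pairs` and `find_partition_by_uuid` are hypothetical helper names, and `SAMPLE` is the listing from the log with the log prefix stripped.

```python
import shlex

# Device listing copied from the deploy log (log prefix removed).
SAMPLE = '''
KNAME="sda" MODEL="INTEL SSDSC1BG20" SIZE="200049647616" ROTA="0" TYPE="disk" UUID="" PARTUUID=""
KNAME="sda1" MODEL="" SIZE="1048576" ROTA="0" TYPE="part" UUID="2021-03-14-20-19-04-00" PARTUUID="5d9c5f07-01"
KNAME="sda2" MODEL="" SIZE="197568495616" ROTA="0" TYPE="part" UUID="c9107bc1-f707-4417-9bf4-5609495d67c0" PARTUUID="5d9c5f07-02"
KNAME="sdb" MODEL="INTEL SSDSC1BG20" SIZE="200049647616" ROTA="0" TYPE="disk" UUID="" PARTUUID=""
'''


def parse_lsblk_pairs(output):
    """Turn lines of KEY="value" pairs into a list of dicts, one per device."""
    devices = []
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        fields = {}
        # shlex handles the quoting, including values that contain spaces.
        for token in shlex.split(line):
            key, _, value = token.partition("=")
            fields[key] = value
        devices.append(fields)
    return devices


def find_partition_by_uuid(devices, uuid):
    """Return /dev/<KNAME> of the partition with this filesystem UUID, or None."""
    for dev in devices:
        if dev.get("TYPE") == "part" and dev.get("UUID") == uuid:
            return "/dev/" + dev["KNAME"]
    return None


devices = parse_lsblk_pairs(SAMPLE)
stale = find_partition_by_uuid(devices, "8868a6f9-a572-42e9-8dc0-7a277dce7856")
current = find_partition_by_uuid(devices, "c9107bc1-f707-4417-9bf4-5609495d67c0")
```

With this listing, `stale` is `None` and `current` is `/dev/sda2`: the lookup only succeeds when it uses the UUID that the written image actually carries.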

Comment 6 Steve Baker 2021-03-21 19:54:48 UTC
The upstream fix has merged.

