Bug 1939362

Summary: Baremetal provisioning is failing with error: No partition with UUID <> found on device /dev/sda
Product: Red Hat OpenStack
Reporter: Sandeep Yadav <sandyada>
Component: openstack-ironic-python-agent
Assignee: Steve Baker <sbaker>
Status: CLOSED NEXTRELEASE
QA Contact:
Severity: urgent
Docs Contact:
Priority: urgent
Version: 17.0 (Wallaby)
CC: sbaker
Target Milestone: ---
Keywords: Triaged
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: openstack-ironic-python-agent-7.0.1-0.20210406124832.3123406.el8ost
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-04-20 14:58:20 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Sandeep Yadav 2021-03-16 08:46:11 UTC
Description of problem:

Baremetal provisioning is failing with 'command_error': {'type': 'DeviceNotFound', 'code': 404, 'message': 'Error finding the disk or partition device to deploy the image onto', 'details': 'No partition with UUID 8868a6f9-a572-42e9-8dc0-7a277dce7856 found on device /dev/sda'}

Version-Release number of selected component (if applicable):

RHOSP 17, Downstream pipeline - OSP17 Integration line


Actual results: Baremetal provisioning is failing

Expected results: Baremetal provisioning should pass


Additional info:

This issue is happening in the Downstream pipeline - OSP17 Integration line. We have a CI job that tests deployment on real baremetal hardware. There are 4 baremetal nodes in this environment, and all of them fail at the "openstack overcloud node provision" step.

Log snippet:-


~~~
2021-03-14 20:27:15.432 7 ERROR ironic.conductor.utils [req-6e729687-c394-416b-a91b-da4033a2ee78 - - - - -] Node 20d52ee9-8342-4dcb-a2c8-e6b7c93fd374 failed deploy step {}. Error: Failed to install a bootloader when deploying node 20d52ee9-8342-4dcb-a2c8-e6b7c93fd374. Error: No partition with UUID 8868a6f9-a572-42e9-8dc0-7a277dce7856 found on device /dev/sda: ironic.common.exception.InstanceDeployFailure: Failed to install a bootloader when deploying node 20d52ee9-8342-4dcb-a2c8-e6b7c93fd374. Error: No partition with UUID 8868a6f9-a572-42e9-8dc0-7a277dce7856 found on device /dev/sda

2021-03-14 20:27:18.793 7 ERROR ironic.conductor.utils [req-4211d61f-18d4-4eef-bf2e-660795ba9abf - - - - -] Node 416a9b96-3920-45aa-840a-f16a445d65e1 failed deploy step {}. Error: Failed to install a bootloader when deploying node 416a9b96-3920-45aa-840a-f16a445d65e1. Error: No partition with UUID 8868a6f9-a572-42e9-8dc0-7a277dce7856 found on device /dev/sda: ironic.common.exception.InstanceDeployFailure: Failed to install a bootloader when deploying node 416a9b96-3920-45aa-840a-f16a445d65e1. Error: No partition with UUID 8868a6f9-a572-42e9-8dc0-7a277dce7856 found on device /dev/sda

2021-03-14 20:27:20.446 7 ERROR ironic.conductor.utils [req-6f7c6bd2-552e-42fb-a3c2-13965680bc9c - - - - -] Node 03be0314-36ca-4f92-9578-b630f805df53 failed deploy step {}. Error: Failed to install a bootloader when deploying node 03be0314-36ca-4f92-9578-b630f805df53. Error: No partition with UUID 8868a6f9-a572-42e9-8dc0-7a277dce7856 found on device /dev/sda: ironic.common.exception.InstanceDeployFailure: Failed to install a bootloader when deploying node 03be0314-36ca-4f92-9578-b630f805df53. Error: No partition with UUID 8868a6f9-a572-42e9-8dc0-7a277dce7856 found on device /dev/sda

2021-03-14 20:27:21.894 7 ERROR ironic.conductor.utils [req-146f12eb-db86-4531-9183-87edc022b5ec - - - - -] Node a4aa0679-09ca-4708-8ce6-f82b314b6bee failed deploy step {}. Error: Failed to install a bootloader when deploying node a4aa0679-09ca-4708-8ce6-f82b314b6bee. Error: No partition with UUID 8868a6f9-a572-42e9-8dc0-7a277dce7856 found on device /dev/sda: ironic.common.exception.InstanceDeployFailure: Failed to install a bootloader when deploying node a4aa0679-09ca-4708-8ce6-f82b314b6bee. Error: No partition with UUID 8868a6f9-a572-42e9-8dc0-7a277dce7856 found on device /dev/sda
~~~

~~~
Mar 14 20:33:22 host-10-9-120-141 ironic-python-agent[1221]: 2021-03-14 20:33:22.070 1221 DEBUG ironic_python_agent.extensions.image [-] First fallback detection attempt for locating partition via UUID 8868a6f9-a572-42e9-8dc0-7a277dce7856 failed. Error: Unexpected error while running command.
                                                             Command: findfs UUID=8868a6f9-a572-42e9-8dc0-7a277dce7856
                                                             Exit code: 1
                                                             Stdout: ''
                                                             Stderr: "findfs: unable to resolve 'UUID=8868a6f9-a572-42e9-8dc0-7a277dce7856'\n" _get_partition /usr/lib/python3.6/site-packages/ironic_python_agent/extensions/image.py:108
~~~
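
The failing lookup in the agent log above can be reproduced by hand. This is a minimal sketch, assuming `findfs` from util-linux is available on the ramdisk. `find_device_by_uuid` and its `run` parameter are hypothetical names used for illustration, not the agent's actual `_get_partition` code:

```python
import subprocess


def find_device_by_uuid(uuid, run=subprocess.run):
    """Return the device path findfs resolves for a filesystem UUID, or None.

    Hypothetical helper: `run` is injectable so the lookup can be
    exercised without real hardware.
    """
    result = run(
        ["findfs", f"UUID={uuid}"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
        check=False,
    )
    # findfs exits non-zero when it cannot resolve the tag, as in the log above.
    if result.returncode != 0:
        return None
    return result.stdout.strip() or None
```

A `None` result for 8868a6f9-a572-42e9-8dc0-7a277dce7856 would correspond to the "unable to resolve" error shown in the log.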

Comment 2 Steve Baker 2021-03-16 19:42:14 UTC
This will need some investigation, but the current theory is that the behavior of findfs has changed.

Comment 3 Steve Baker 2021-03-17 04:25:15 UTC
The deploy log shows that the UUID of /dev/sda2 stays at, and is assumed to be, 8868a6f9-a572-42e9-8dc0-7a277dce7856 while partitions are deleted and recreated and overcloud-full.raw is written onto it. After the image is written, however, the UUID changes to c9107bc1-f707-4417-9bf4-5609495d67c0:

Mar 14 20:33:21 host-10-9-120-141 ironic-python-agent[1221]: 2021-03-14 20:33:21.923 1221 DEBUG ironic_lib.utils [-] Command stdout is: "KNAME="sda" MODEL="INTEL SSDSC1BG20" SIZE="200049647616" ROTA="0" TYPE="disk" UUID="" PARTUUID=""
                                                             KNAME="sda1" MODEL="" SIZE="1048576" ROTA="0" TYPE="part" UUID="2021-03-14-20-19-04-00" PARTUUID="5d9c5f07-01"
                                                             KNAME="sda2" MODEL="" SIZE="197568495616" ROTA="0" TYPE="part" UUID="c9107bc1-f707-4417-9bf4-5609495d67c0" PARTUUID="5d9c5f07-02"
                                                             KNAME="sdb" MODEL="INTEL SSDSC1BG20" SIZE="200049647616" ROTA="0" TYPE="disk" UUID="" PARTUUID=""

Later attempts to find the root device fail because it is searching for 8868a6f9-a572-42e9-8dc0-7a277dce7856, which no longer exists.

I'll keep looking into why this is happening.
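
For anyone triaging a similar failure, the mismatch can be checked mechanically from the block device listing. This is a rough sketch, assuming the listing is in the KEY="value" pair format seen in the log excerpt in this comment. `parse_lsblk_pairs` and `find_partition_by_uuid` are hypothetical helper names, not functions from ironic-python-agent or ironic-lib:

```python
import shlex

# Device listing copied from the deploy log in this comment.
LSBLK_OUTPUT = '''\
KNAME="sda" MODEL="INTEL SSDSC1BG20" SIZE="200049647616" ROTA="0" TYPE="disk" UUID="" PARTUUID=""
KNAME="sda1" MODEL="" SIZE="1048576" ROTA="0" TYPE="part" UUID="2021-03-14-20-19-04-00" PARTUUID="5d9c5f07-01"
KNAME="sda2" MODEL="" SIZE="197568495616" ROTA="0" TYPE="part" UUID="c9107bc1-f707-4417-9bf4-5609495d67c0" PARTUUID="5d9c5f07-02"
KNAME="sdb" MODEL="INTEL SSDSC1BG20" SIZE="200049647616" ROTA="0" TYPE="disk" UUID="" PARTUUID=""
'''


def parse_lsblk_pairs(output):
    """Parse KEY="value" lines into a list of dicts, one per device."""
    devices = []
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        # shlex keeps quoted values with spaces (e.g. MODEL) in one token.
        devices.append(dict(tok.split("=", 1) for tok in shlex.split(line)))
    return devices


def find_partition_by_uuid(devices, uuid):
    """Return /dev/<KNAME> of the partition with this filesystem UUID, or None."""
    for dev in devices:
        if dev.get("TYPE") == "part" and dev.get("UUID") == uuid:
            return "/dev/" + dev["KNAME"]
    return None


devices = parse_lsblk_pairs(LSBLK_OUTPUT)
expected = "8868a6f9-a572-42e9-8dc0-7a277dce7856"  # UUID the deploy searches for
actual = "c9107bc1-f707-4417-9bf4-5609495d67c0"    # UUID present after the image write
print(find_partition_by_uuid(devices, expected))
print(find_partition_by_uuid(devices, actual))
```

With the listing above, the expected UUID matches no partition, while the new UUID resolves to /dev/sda2.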

Comment 6 Steve Baker 2021-03-21 19:54:48 UTC
The upstream fix has merged