Description of problem:
Baremetal provisioning is failing with:
'command_error': {'type': 'DeviceNotFound', 'code': 404, 'message': 'Error finding the disk or partition device to deploy the image onto', 'details': 'No partition with UUID 8868a6f9-a572-42e9-8dc0-7a277dce7856 found on device /dev/sda'}

Version-Release number of selected component (if applicable):
RHOSP 17, Downstream pipeline - OSP17 Integration line

Actual results:
Baremetal provisioning is failing

Expected results:
Baremetal provisioning should pass

Additional info:
This issue occurs in the Downstream pipeline - OSP17 Integration line. We have a CI job that tests deployment on real baremetal. There are 4 baremetal nodes in this environment, and all of them fail at the "openstack overcloud node provision" step.

Log snippet:-
~~~
2021-03-14 20:27:15.432 7 ERROR ironic.conductor.utils [req-6e729687-c394-416b-a91b-da4033a2ee78 - - - - -] Node 20d52ee9-8342-4dcb-a2c8-e6b7c93fd374 failed deploy step {}. Error: Failed to install a bootloader when deploying node 20d52ee9-8342-4dcb-a2c8-e6b7c93fd374. Error: No partition with UUID 8868a6f9-a572-42e9-8dc0-7a277dce7856 found on device /dev/sda: ironic.common.exception.InstanceDeployFailure: Failed to install a bootloader when deploying node 20d52ee9-8342-4dcb-a2c8-e6b7c93fd374. Error: No partition with UUID 8868a6f9-a572-42e9-8dc0-7a277dce7856 found on device /dev/sda
2021-03-14 20:27:18.793 7 ERROR ironic.conductor.utils [req-4211d61f-18d4-4eef-bf2e-660795ba9abf - - - - -] Node 416a9b96-3920-45aa-840a-f16a445d65e1 failed deploy step {}. Error: Failed to install a bootloader when deploying node 416a9b96-3920-45aa-840a-f16a445d65e1. Error: No partition with UUID 8868a6f9-a572-42e9-8dc0-7a277dce7856 found on device /dev/sda: ironic.common.exception.InstanceDeployFailure: Failed to install a bootloader when deploying node 416a9b96-3920-45aa-840a-f16a445d65e1. Error: No partition with UUID 8868a6f9-a572-42e9-8dc0-7a277dce7856 found on device /dev/sda
2021-03-14 20:27:20.446 7 ERROR ironic.conductor.utils [req-6f7c6bd2-552e-42fb-a3c2-13965680bc9c - - - - -] Node 03be0314-36ca-4f92-9578-b630f805df53 failed deploy step {}. Error: Failed to install a bootloader when deploying node 03be0314-36ca-4f92-9578-b630f805df53. Error: No partition with UUID 8868a6f9-a572-42e9-8dc0-7a277dce7856 found on device /dev/sda: ironic.common.exception.InstanceDeployFailure: Failed to install a bootloader when deploying node 03be0314-36ca-4f92-9578-b630f805df53. Error: No partition with UUID 8868a6f9-a572-42e9-8dc0-7a277dce7856 found on device /dev/sda
2021-03-14 20:27:21.894 7 ERROR ironic.conductor.utils [req-146f12eb-db86-4531-9183-87edc022b5ec - - - - -] Node a4aa0679-09ca-4708-8ce6-f82b314b6bee failed deploy step {}. Error: Failed to install a bootloader when deploying node a4aa0679-09ca-4708-8ce6-f82b314b6bee. Error: No partition with UUID 8868a6f9-a572-42e9-8dc0-7a277dce7856 found on device /dev/sda: ironic.common.exception.InstanceDeployFailure: Failed to install a bootloader when deploying node a4aa0679-09ca-4708-8ce6-f82b314b6bee. Error: No partition with UUID 8868a6f9-a572-42e9-8dc0-7a277dce7856 found on device /dev/sda
~~~

~~~
Mar 14 20:33:22 host-10-9-120-141 ironic-python-agent[1221]: 2021-03-14 20:33:22.070 1221 DEBUG ironic_python_agent.extensions.image [-] First fallback detection attempt for locating partition via UUID 8868a6f9-a572-42e9-8dc0-7a277dce7856 failed. Error: Unexpected error while running command.
Command: findfs UUID=8868a6f9-a572-42e9-8dc0-7a277dce7856
Exit code: 1
Stdout: ''
Stderr: "findfs: unable to resolve 'UUID=8868a6f9-a572-42e9-8dc0-7a277dce7856'\n" _get_partition /usr/lib/python3.6/site-packages/ironic_python_agent/extensions/image.py:108
~~~
This will need some investigation, but the current theory is that the behavior of findfs has changed.
The deploy log shows that the UUID of /dev/sda2 remains and is assumed to be 8868a6f9-a572-42e9-8dc0-7a277dce7856 when deleting partitions, recreating partitions, and writing overcloud-full.raw onto it. But after the image is written, the UUID changes to c9107bc1-f707-4417-9bf4-5609495d67c0:

Mar 14 20:33:21 host-10-9-120-141 ironic-python-agent[1221]: 2021-03-14 20:33:21.923 1221 DEBUG ironic_lib.utils [-] Command stdout is: "KNAME="sda" MODEL="INTEL SSDSC1BG20" SIZE="200049647616" ROTA="0" TYPE="disk" UUID="" PARTUUID=""
KNAME="sda1" MODEL="" SIZE="1048576" ROTA="0" TYPE="part" UUID="2021-03-14-20-19-04-00" PARTUUID="5d9c5f07-01"
KNAME="sda2" MODEL="" SIZE="197568495616" ROTA="0" TYPE="part" UUID="c9107bc1-f707-4417-9bf4-5609495d67c0" PARTUUID="5d9c5f07-02"
KNAME="sdb" MODEL="INTEL SSDSC1BG20" SIZE="200049647616" ROTA="0" TYPE="disk" UUID="" PARTUUID=""

Later attempts to find the root device fail because the agent searches for 8868a6f9-a572-42e9-8dc0-7a277dce7856, which no longer exists. I'll keep looking into why this is happening.
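For anyone who wants to check the mismatch against their own deploy logs, here is a minimal sketch that parses the KEY="value" pairs from the lsblk output above and looks up a UUID. The helper names `parse_lsblk_pairs` and `find_by_uuid` are hypothetical and written only for illustration; this is not the lookup code used by ironic-python-agent.

```python
import shlex

# lsblk pair output copied from the deploy log above, one device per line.
SAMPLE = """\
KNAME="sda" MODEL="INTEL SSDSC1BG20" SIZE="200049647616" ROTA="0" TYPE="disk" UUID="" PARTUUID=""
KNAME="sda1" MODEL="" SIZE="1048576" ROTA="0" TYPE="part" UUID="2021-03-14-20-19-04-00" PARTUUID="5d9c5f07-01"
KNAME="sda2" MODEL="" SIZE="197568495616" ROTA="0" TYPE="part" UUID="c9107bc1-f707-4417-9bf4-5609495d67c0" PARTUUID="5d9c5f07-02"
KNAME="sdb" MODEL="INTEL SSDSC1BG20" SIZE="200049647616" ROTA="0" TYPE="disk" UUID="" PARTUUID=""
"""


def parse_lsblk_pairs(output):
    """Turn KEY="value" lines into a list of dicts, one per device."""
    devices = []
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        fields = {}
        # shlex handles quoted values containing spaces, e.g. the MODEL field.
        for token in shlex.split(line):
            key, sep, value = token.partition("=")
            if sep:
                fields[key] = value
        devices.append(fields)
    return devices


def find_by_uuid(devices, uuid):
    """Return the KNAME whose UUID or PARTUUID equals uuid, else None."""
    if not uuid:
        return None
    for dev in devices:
        if uuid in (dev.get("UUID"), dev.get("PARTUUID")):
            return dev.get("KNAME")
    return None


if __name__ == "__main__":
    devices = parse_lsblk_pairs(SAMPLE)
    expected = "8868a6f9-a572-42e9-8dc0-7a277dce7856"  # UUID the deploy searches for
    actual = "c9107bc1-f707-4417-9bf4-5609495d67c0"    # UUID present after the image write
    print("expected UUID found on:", find_by_uuid(devices, expected))
    print("actual UUID found on:", find_by_uuid(devices, actual))
```

Against the logged output, the expected UUID matches no device while the new UUID resolves to sda2, which is consistent with the findfs failure in the description.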
The upstream fix has merged.