Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 2308366

Summary: [17.1][ironic] - overcloud node deployment broken with 4k native disks
Product: Red Hat OpenStack Reporter: Matt Flusche <mflusche>
Component: diskimage-builder Assignee: Julia Kreger <jkreger>
Status: CLOSED CURRENTRELEASE QA Contact:
Severity: urgent Docs Contact:
Priority: urgent    
Version: 17.1 (Wallaby) CC: alisci, apevec, cmayapka, dhill, elicohen, jkreger, knoha, mariel, mburns, parthee, pweeks, sbaker, schhabdi, tvainio
Target Milestone: z5 Keywords: TestOnly, Triaged
Target Release: 17.1   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: diskimage-builder-3.31.1-17.1.20240905210750.0576fad.el9ost openstack-ironic-python-agent-7.1.1-17.1.20240918110803.0211fa9.el9ost Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2025-01-15 11:30:35 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 2305061    
Bug Blocks:    

Description Matt Flusche 2024-08-28 18:43:49 UTC
Description of problem:

This was addressed in older releases but seems to be back in 17.1 deployments. ref: https://bugzilla.redhat.com/show_bug.cgi?id=1430435

Errors from the ironic-python-agent journal (I'll attach the full logs): the image is streamed to the device successfully, but the agent then fails to read the partitions:


Aug 28 06:14:09 host-10-151-154-171 ironic-python-agent[3013]: 2024-08-28 06:14:09.718 3013 DEBUG oslo_concurrency.processutils [-] CMD "lsblk -Pbia -oKNAME,MODEL,SIZE,ROTA,TYPE,UUID,PARTUUID" returned: 0 in 0.034s execute /usr/lib/python3.9/site-packages/oslo_concurrency/processutils.py:422
Aug 28 06:14:09 host-10-151-154-171 ironic-python-agent[3013]: 2024-08-28 06:14:09.747 3013 DEBUG ironic_lib.utils [-] Command stdout is: "KNAME="sda" MODEL="RAID" SIZE="1599741100032" ROTA="0" TYPE="disk" UUID="" PARTUUID=""


Aug 28 06:14:12 host-10-151-154-171 ironic-python-agent[3013]: 2024-08-28 06:14:12.005 3013 DEBUG ironic_python_agent.extensions.standby [-] Preparing image overcloud-hardened-uefi-full.raw prepare_image /usr/lib/python3.9/site-packages/ironic_python_agent/extensions/standby.py:681

Aug 28 06:14:12 host-10-151-154-171 ironic-python-agent[3013]: 2024-08-28 06:14:12.386 3013 DEBUG oslo_concurrency.processutils [-] Running cmd (subprocess): lsblk -Pbia -oKNAME,MODEL,SIZE,ROTA,TYPE,UUID,PARTUUID execute /usr/lib/python3.9/site-packages/oslo_concurrency/processutils.py:384
Aug 28 06:14:12 host-10-151-154-171 ironic-python-agent[3013]: 2024-08-28 06:14:12.423 3013 DEBUG oslo_concurrency.processutils [-] CMD "lsblk -Pbia -oKNAME,MODEL,SIZE,ROTA,TYPE,UUID,PARTUUID" returned: 0 in 0.037s execute /usr/lib/python3.9/site-packages/oslo_concurrency/processutils.py:422
Aug 28 06:14:12 host-10-151-154-171 ironic-python-agent[3013]: 2024-08-28 06:14:12.454 3013 DEBUG ironic_lib.utils [-] Command stdout is: "KNAME="sda" MODEL="RAID" SIZE="1599741100032" ROTA="0" TYPE="disk" UUID="" PARTUUID=""
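For reference, the `lsblk -Pbia` output quoted above is a sequence of KEY="value" pairs per device line. A minimal sketch of how such a line can be parsed (this helper is my own illustration, not code from ironic-python-agent):

```python
import re

# lsblk -P ("pairs") output emits KEY="value" tokens, one device per line.
# Values may contain spaces (e.g. MODEL="DELL RAID"), so match up to the
# closing quote rather than splitting on whitespace.
_PAIR_RE = re.compile(r'(\w+)="([^"]*)"')

def parse_lsblk_pairs(line: str) -> dict:
    """Parse one line of `lsblk -P` output into a column -> value dict."""
    return dict(_PAIR_RE.findall(line))

# The disk line from the journal above: UUID and PARTUUID are empty because
# the partition table on the freshly imaged 4k disk could not be read.
line = 'KNAME="sda" MODEL="RAID" SIZE="1599741100032" ROTA="0" TYPE="disk" UUID="" PARTUUID=""'
info = parse_lsblk_pairs(line)
```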


Aug 28 06:14:23 host-10-151-154-171 ironic-python-agent[3013]: 2024-08-28 06:14:23.393 3013 INFO ironic_python_agent.extensions.standby [-] Image streamed onto device /dev/sda in 10.58223032951355 seconds
Aug 28 06:14:23 host-10-151-154-171 ironic-python-agent[3013]: 2024-08-28 06:14:23.421 3013 DEBUG ironic_python_agent.extensions.standby [-] Verifying image at /dev/sda against sha256 checksum e09921921e96a7e5bb076635e3553a5a21728281d1ceedabdc82b469e65aa79c verify_image /usr/lib/python3.9/site-packages/ironic_python_agent/extensions/standby.py:385


Aug 28 06:14:23 host-10-151-154-171 ironic-python-agent[3013]: 2024-08-28 06:14:23.655 3013 DEBUG ironic_lib.utils [-] Command stdout is: "Model: DELL RAID (scsi)
                                                               Disk /dev/sda: 1600GB
                                                               Sector size (logical/physical): 4096B/4096B
                                                               Partition Table: unknown
                                                               Disk Flags:
                                                               " _log /usr/lib/python3.9/site-packages/ironic_lib/utils.py:99
Aug 28 06:14:23 host-10-151-154-171 ironic-python-agent[3013]: 2024-08-28 06:14:23.736 3013 DEBUG ironic_lib.utils [-] Command stderr is: "Error: /dev/sda: unrecognised disk label
                                                               " _log /usr/lib/python3.9/site-packages/ironic_lib/utils.py:100
Aug 28 06:14:23 host-10-151-154-171 ironic-python-agent[3013]: 2024-08-28 06:14:23.762 3013 ERROR ironic_lib.disk_utils [-] Failed to fix GPT partition on disk /dev/sda for node None. Error: Unexpected error while running command.
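For context (my analysis, not stated in the logs): GPT keeps its primary header at LBA 1 and a backup header at the last LBA, so the byte offsets depend on the logical sector size. An image whose partition table was written assuming 512-byte sectors places the primary header at byte 512, while tools on a 4096-byte-native disk look for it at byte 4096, which is consistent with the "unrecognised disk label" failure above. A sketch of the offset arithmetic:

```python
def gpt_primary_header_offset(logical_block_size: int) -> int:
    """Byte offset of the primary GPT header, which lives at LBA 1."""
    return 1 * logical_block_size

def gpt_backup_header_offset(disk_bytes: int, logical_block_size: int) -> int:
    """Byte offset of the backup GPT header, which lives at the last LBA."""
    last_lba = disk_bytes // logical_block_size - 1
    return last_lba * logical_block_size

# An image built for 512-byte sectors puts the primary header at byte 512 ...
assert gpt_primary_header_offset(512) == 512
# ... but a 4k-native disk expects it at byte 4096, so probing finds nothing.
assert gpt_primary_header_offset(4096) == 4096
```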


I can reproduce the issue on a virtual overcloud node by adding the following disk config to the node's libvirt domain (note the logical_block_size and physical_block_size settings):

    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/var/lib/libvirt/images/ssd2/overcloud17-node1.qcow2'/>
      <backingStore/>
      <blockio logical_block_size='4096' physical_block_size='4096' discard_granularity='4096'/>
      <target dev='vda' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x06' slot='0x00' function='0x0'/>
    </disk>
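To confirm the guest actually sees a 4k-native disk, the queue limits can be read from sysfs. A minimal sketch; the helper name and the `sys_root` parameter are my own additions so the lookup can be pointed at a test tree:

```python
from pathlib import Path

def sector_sizes(dev: str, sys_root: str = "/sys") -> tuple:
    """Return (logical, physical) sector sizes in bytes for block device `dev`."""
    queue = Path(sys_root) / "block" / dev / "queue"
    logical = int((queue / "logical_block_size").read_text())
    physical = int((queue / "physical_block_size").read_text())
    return (logical, physical)

# On the reproducer VM above, sector_sizes("vda") should report (4096, 4096);
# on an ordinary 512-byte-sector disk it would report (512, 512).
```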

Version-Release number of selected component (if applicable):
17.1 current

How reproducible:
100%

Steps to Reproduce:
1. Deploy an overcloud node onto a disk with 4096-byte logical/physical sectors (see the libvirt disk config above).


Expected results:
4k native disk support

Additional info:

full logs will be uploaded

Comment 5 Steve Baker 2024-08-29 22:51:19 UTC
*** Bug 2291316 has been marked as a duplicate of this bug. ***

Comment 10 Steve Baker 2024-09-01 22:10:58 UTC
Adding a depends-on to the BZ tracking the ability to rebuild overcloud images with different block-device layouts, which will allow an existing overcloud image to be rebuilt with a 4k sector size.

Comment 51 Eliad Cohen 2025-01-14 20:52:59 UTC
A KB article (https://access.redhat.com/articles/7096924) describes a workaround for this issue. We consider it verified on the merit of those who have used the article.