Bug 2222981
| Summary: | Overcloud deploy fails when mounting config drive on 4k disks | ||
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | nalmond |
| Component: | openstack-ironic-python-agent | Assignee: | Julia Kreger <jkreger> |
| Status: | CLOSED MIGRATED | QA Contact: | |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | ||
| Version: | 16.2 (Train) | CC: | jkreger, pweeks, sbaker |
| Target Milestone: | --- | Keywords: | Triaged |
| Target Release: | --- | Flags: | jkreger: needinfo- |
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | No Doc Update |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2024-01-04 17:25:47 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
|
Description
nalmond
2023-07-14 17:34:36 UTC
There doesn't seem to be a single clear path forward. There are a few different and distinct things going on here.

1) Obviously a filesystem to underlying block IO device incompatibility. Realistically, there is no "fix" for this; we can only work around it and prevent such a case later on in the code path.
2) Changing the default type to vfat fails, because the configuration drive ends up being too small on a non-4k system and promptly explodes.

The inherent challenge is that we support a few different ways of getting a configuration drive:

1) We get a pre-prepared binary payload from the client, be it Nova, Metalsmith, OpenStackSDK, or even python-ironicclient, and the contents are written out byte for byte as requested by the original requester.
2) We can be sent chunks of the data, and then assemble a fresh configuration drive payload to write to disk.

There is a third issue, though, with this bug. Ironic doesn't present a configuration parameter named ``config_drive_format``. Nova does[0]. Which leaves us in an odd place.

Thoughts on paths forward:

1) I do suspect we should clone this out to RHEL and see if they can resolve iso9660 being unfriendly to 4k devices, since there is already such a huge build-up of writers to such volumes.
2) I also think we might need to look at transforming the payload, given we have so many *different* ways of getting payloads to support.

Further team discussion and research is required.

[0]: https://opendev.org/openstack/nova/src/branch/master/nova/conf/configdrive.py#L18

Adding an upstream bug.

Could we please get the output of the following command from the customer's system:

sudo blockdev --report /path/to/device

Specifically, we need to make sure we understand which field is different, since it seems odd that this would also be presenting now and this way. If we can get it from an existing deployed machine which deployed without issues, and from the machine they are attempting to deploy to, that would be helpful.
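The incompatibility described in the first item above could, in principle, be detected before a configuration drive is written. The following is a minimal Python sketch, assuming the failure occurs when the device's logical sector size is larger than the 2048-byte sector that ISO 9660 uses. The function names are hypothetical and are not part of ironic-python-agent; `blockdev --getss` is the util-linux call that reports the logical sector size.

```python
import subprocess

# ISO 9660 defines a 2048-byte logical sector.
ISO9660_SECTOR_SIZE = 2048


def logical_sector_size(device: str) -> int:
    """Return the logical sector size in bytes, as reported by `blockdev --getss`.

    Requires util-linux and sufficient privileges to query the device.
    """
    result = subprocess.run(
        ["blockdev", "--getss", device],
        check=True,
        capture_output=True,
        text=True,
    )
    return int(result.stdout.strip())


def iso9660_at_risk(sector_size: int) -> bool:
    """Hypothetical check: True when the device sector is larger than an
    ISO 9660 sector, which is the assumed trigger for the mount failure."""
    return sector_size > ISO9660_SECTOR_SIZE
```

Under this assumption, a caller would run `iso9660_at_risk(logical_sector_size("/dev/sda"))` and fall back to another handling path when it returns True. Which fallback is appropriate is exactly the open question in this bug.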
Thanks! Here are the blockdev outputs:

working node:

[heat-admin@ctrl1 ~]$ sudo blockdev --report /dev/sda2
RO    RA   SSZ   BSZ   StartSec            Size   Device
rw  8192   512  4096     411648         1048576   /dev/sda2

non-working (4k) node:

[root@gen16gpu0 ~]# sudo blockdev --report /dev/sda2
RO    RA   SSZ   BSZ   StartSec            Size   Device
rw  8192  4096  4096     411648         1048576   /dev/sda2

Greetings, we've updated the upstream patch, which is pending CI and review upstream. If upstream agrees to the path forward, it will take a little time to get this into the product, but given we've not seen this exact behavior with in-kernel block device drivers yet, I suspect we may be paving over a kernel bug with the third party driver. Regardless, initial upstream feedback was positive and in agreement, since it seems like a logical constraint of behavior in some applications.

Greetings, we won't be able to fix this in OSP 17.x, much less OSP 16.x, as it involves changes to stable libraries which are essentially frozen in time at this point. We anticipate this fix will be available in OSP 18. As this issue was rooted in a third party driver, we recommend you engage the hardware manufacturer regarding their supplied driver. |
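To pin down which field differs between the two nodes, the two reports can be compared mechanically. The following is a minimal Python sketch, assuming the two-line `blockdev --report` layout (one header row, one device row) quoted in this thread; the helper names are hypothetical.

```python
def parse_blockdev_report(text: str) -> dict:
    """Map each header column of a single-device report to its value."""
    rows = [line.split() for line in text.strip().splitlines() if line.strip()]
    header, values = rows[0], rows[1]
    return dict(zip(header, values))


def differing_fields(a: dict, b: dict) -> dict:
    """Return {column: (value_in_a, value_in_b)} for columns that differ."""
    return {key: (a[key], b.get(key)) for key in a if a[key] != b.get(key)}


# Reports copied from the working and non-working nodes in this bug.
WORKING = """
RO RA SSZ BSZ StartSec Size Device
rw 8192 512 4096 411648 1048576 /dev/sda2
"""

FAILING = """
RO RA SSZ BSZ StartSec Size Device
rw 8192 4096 4096 411648 1048576 /dev/sda2
"""

diff = differing_fields(parse_blockdev_report(WORKING),
                        parse_blockdev_report(FAILING))
print(diff)  # {'SSZ': ('512', '4096')}
```

Only SSZ, the logical sector size, differs: 512 bytes on the working node and 4096 bytes on the non-working node. BSZ is 4096 on both.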