Bug 2222981
| Summary: | Overcloud deploy fails when mounting config drive on 4k disks | ||
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | nalmond |
| Component: | openstack-ironic-python-agent | Assignee: | Julia Kreger <jkreger> |
| Status: | ON_DEV --- | QA Contact: | |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | ||
| Version: | 16.2 (Train) | CC: | jkreger, sbaker |
| Target Milestone: | --- | Keywords: | Triaged |
| Target Release: | --- | Flags: | jkreger: needinfo- |
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | No Doc Update | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | Type: | Bug | |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
|
Description
nalmond
2023-07-14 17:34:36 UTC
There doesn't seem to be a single clear path forward. There are a few distinct things going on here.

1) There is an incompatibility between the filesystem and the underlying block I/O device. Realistically, there is no "fix" for this; we can only work around it and prevent such a case later in the code path.
2) Changing the default type to vfat fails, because the configuration drive ends up being too small on a non-4k system and promptly explodes.

The inherent challenge is that we support a few different ways of getting a configuration drive:

1) We get a pre-prepared binary payload from the client, be it Nova, Metalsmith, OpenStackSDK, or even python-ironicclient, and the contents are written out byte for byte as requested by the original requester.
2) We can be sent chunks of the data, and then assemble a fresh configuration drive payload to write to disk.

There is a third issue with this bug, though. Ironic doesn't present a configuration parameter named ``config_drive_format``. Nova does[0]. That leaves us in an odd place.

Thoughts on paths forward:

1) I suspect we should clone this out to RHEL and see if they can resolve iso9660 being unfriendly to 4k devices, since there are already so many existing writers of such volumes.
2) I also think we might need to look at transforming the payload, given we have so many *different* ways of getting payloads to support.

Further team discussion and research is required.

[0]: https://opendev.org/openstack/nova/src/branch/master/nova/conf/configdrive.py#L18

Adding an upstream bug.

Could we please get the output of the following command from the customer's system:

    sudo blockdev --report /path/to/device

Specifically, we need to make sure we understand which field is different, since it seems odd that this would be presenting now and in this way. If we can get it from an existing machine which deployed without issues, and from the machine they are attempting to deploy to, that would be helpful.
Thanks! Here are the blockdev outputs:

working node:

    [heat-admin@ctrl1 ~]$ sudo blockdev --report /dev/sda2
    RO    RA   SSZ   BSZ   StartSec            Size   Device
    rw  8192   512  4096     411648         1048576   /dev/sda2

non-working (4k) node:

    [root@gen16gpu0 ~]# sudo blockdev --report /dev/sda2
    RO    RA   SSZ   BSZ   StartSec            Size   Device
    rw  8192  4096  4096     411648         1048576   /dev/sda2
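
To see which field differs between the two reports, something like the following Python sketch could be used. It is only an illustration: the helper names `parse_blockdev_report` and `diff_reports` are hypothetical (they are not part of Ironic or util-linux), and it assumes the whitespace-separated column layout shown in the outputs above.

```python
def parse_blockdev_report(text):
    """Parse `blockdev --report` output into a list of dicts keyed by column header.

    Assumes the first non-empty line is the header and that columns are
    separated by whitespace, as in the outputs pasted in this bug.
    """
    lines = [ln.split() for ln in text.strip().splitlines() if ln.strip()]
    header, rows = lines[0], lines[1:]
    return [dict(zip(header, row)) for row in rows]


def diff_reports(working, failing):
    """Return {column: (working_value, failing_value)} for columns that differ."""
    return {
        col: (working[col], failing.get(col))
        for col in working
        if working[col] != failing.get(col)
    }


# Report text copied from the two nodes in this bug (shell prompts removed).
WORKING = """\
RO    RA   SSZ   BSZ   StartSec            Size   Device
rw  8192   512  4096     411648         1048576   /dev/sda2
"""

FAILING = """\
RO    RA   SSZ   BSZ   StartSec            Size   Device
rw  8192  4096  4096     411648         1048576   /dev/sda2
"""

working_row = parse_blockdev_report(WORKING)[0]
failing_row = parse_blockdev_report(FAILING)[0]
differences = diff_reports(working_row, failing_row)
print(differences)  # {'SSZ': ('512', '4096')}
```

On these two reports the only column that differs is SSZ, the logical sector size: 512 on the working node and 4096 on the non-working one. BSZ, StartSec, and Size are identical.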