Bug 1915965 - Running kpartx after growing multipathed partition sometimes yields "device-mapper: resume ioctl on mpatha4 failed: Invalid argument"
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: device-mapper-multipath
Version: 8.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: 8.0
Assignee: Ben Marzinski
QA Contact: Lin Li
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-01-13 20:28 UTC by Jonathan Lebon
Modified: 2021-09-06 15:20 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-01-28 17:21:10 UTC
Type: Bug
Target Upstream Version:
Embargoed:



Description Jonathan Lebon 2021-01-13 20:28:09 UTC
Description of problem:

In RHCOS on multipath, we grow the root partition on first boot using sfdisk, and then run kpartx to force a partition reread:

https://github.com/coreos/fedora-coreos-config/blob/9afd0b9f8faab379b78426aa44c68503e7a7d777/overlay.d/05core/usr/lib/dracut/modules.d/40ignition-ostree/coreos-growpart#L58-L61

Sometimes (maybe 20% of the time), kpartx fails with:

> device-mapper: resume ioctl on mpatha4  failed: Invalid argument

Version-Release number of selected component (if applicable):

device-mapper-multipath-0.8.4-5.el8.x86_64

How reproducible:

About 20% of the time.

Steps to Reproduce:

I don't have a simple reproducer right now apart from re-running RHCOS 4.7 on multipath multiple times. This can be done with `coreos-assembler` like this:

```
cosa run --qemu-image path/to/rhcos-4.7.qcow2 --qemu-multipath -c --kargs 'rd.multipath=default root=/dev/disk/by-label/dm-mpath-root rd.break'
```

You can get the RHCOS qcow2 from https://mirror.openshift.com/pub/openshift-v4/x86_64/dependencies/rhcos/ and use cosa as a container as described in the README at https://github.com/coreos/coreos-assembler.

Actual results:

Sometimes fails with EINVAL.

Expected results:

Never fails with EINVAL.

Additional info:

The error message looks similar to https://access.redhat.com/solutions/201973, which was for el5. Not sure if there was a regression there, or if the root cause is entirely different.

For now, we're planning to work around this by running kpartx multiple times until it succeeds.
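
A minimal sketch of that retry workaround, not the actual coreos-growpart code. The `retry` helper, the attempt count, the delay, and the `/dev/mapper/mpatha` path in the usage comment are all illustrative assumptions:

```shell
# Hypothetical helper: run a command until it succeeds, giving up after
# max_attempts tries and sleeping `delay` seconds between tries.
retry() {
    retry_max=$1; shift
    retry_delay=$1; shift
    retry_attempt=1
    while ! "$@"; do
        if [ "$retry_attempt" -ge "$retry_max" ]; then
            return 1    # every attempt failed
        fi
        retry_attempt=$((retry_attempt + 1))
        sleep "$retry_delay"
    done
}

# Usage (device path is an assumption, not taken from the report):
#   retry 5 1 kpartx -u /dev/mapper/mpatha
```

Bounding the number of attempts keeps first boot from hanging forever if kpartx fails for a reason other than a transient race.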

Comment 1 Ben Marzinski 2021-01-18 23:18:52 UTC
It's possible that your issue is that your kpartx call is getting messed up by the automatic kpartx call udev runs to resize the partitions. /lib/udev/rules.d/66-kpartx.rules should trigger a kpartx update on every reasonable change event to the multipath device. Writing to the partition table with sfdisk should do that. It does for me on a RHEL system, and the kpartx partition is automatically updated. It seems possible that manually running kpartx at the same time could attempt to update the device while the udev-triggered update is in progress. In that case, it could race and try to suspend the device when it had already been suspended by the other process.

So if I'm right, the issue is that there is no synchronization between your script and the udev daemon. DM devices must be in the suspended state while updating their device tables, and if the device is already in the suspended state when you try to suspend it to update the table, dm will fail with exactly the error message you saw. If this is indeed what is happening, then the easiest solution is to not run kpartx, and trust udev to do this. If you needed to guarantee that udev has run, you could call "udevadm settle". If you don't want to wait for udevadm settle to run, but do want to guarantee that kpartx has run, then you might be left with what you have: keep running kpartx until it succeeds. This is, of course, assuming that what's going on here is what I outlined above. But as long as /lib/udev/rules.d/66-kpartx.rules exists, udev should try to run kpartx in this case, and racing kpartx calls will occasionally cause this issue.
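
For illustration, the "trust udev, then settle" option could look roughly like the sketch below. This is only a sketch under assumptions: the `wait_for_partition` helper, the 30-second default timeout, and the `/dev/mapper/mpatha4` path in the usage comment are hypothetical, and the `UDEVADM` override exists only so the helper can be exercised without a running udev daemon.

```shell
# Hypothetical helper: wait for udev to drain its event queue (which would
# include the kpartx run triggered by 66-kpartx.rules), then check that the
# expected partition mapping is present.
wait_for_partition() {
    wfp_dev=$1              # partition device node to look for
    wfp_timeout=${2:-30}    # seconds to wait for udev (assumed default)
    "${UDEVADM:-udevadm}" settle --timeout="$wfp_timeout" || return 1
    [ -e "$wfp_dev" ]
}

# Usage after rewriting the partition table with sfdisk
# (device path is an assumption):
#   wait_for_partition /dev/mapper/mpatha4 || echo "partition not ready" >&2
```

Unlike the retry loop, this never calls kpartx itself, so it cannot race with the udev-triggered update.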

Comment 2 Ben Marzinski 2021-01-27 15:33:00 UTC
Does this look like it could be the issue to you?

Comment 3 Jonathan Lebon 2021-01-28 17:21:10 UTC
Ahh thanks for the details. I wasn't aware of 66-kpartx.rules. In that case, your theory is indeed highly plausible.
In the end, we were able to drop the code that was hitting this entirely, so we can close this issue.

Thanks!

