Bug 1949076 - Add missing ordering with initrd-cleanup.service in dracut's multipathd.service
Summary: Add missing ordering with initrd-cleanup.service in dracut's multipathd.service
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: dracut
Version: 8.3
Hardware: All
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: beta
Target Release: 8.5
Assignee: Lukáš Nykrýn
QA Contact: Petr Matyáš
URL:
Whiteboard:
Duplicates: 1933038 1951002 2018269
Depends On:
Blocks: 1916117 1951002 1985975
Reported: 2021-04-13 11:54 UTC by Peter Rajnoha
Modified: 2023-04-26 15:31 UTC (History)

Fixed In Version: dracut-049-151.git20210719.el8
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1985975
Environment:
Last Closed: 2021-11-09 19:38:39 UTC
Type: Bug
Target Upstream Version:
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | dracutdevs dracut pull 1394 | 0 | None | Merged | fix(multipath): stop multipath before udev db cleanup | 2022-02-04 09:19:19 UTC
IBM Linux Technology Center | 191116 | 0 | None | None | None | 2021-04-21 08:09:31 UTC
Red Hat Knowledge Base (Solution) | 5901081 | 0 | None | None | None | 2021-07-09 18:57:16 UTC
Red Hat Product Errata | RHBA-2021:4394 | 0 | None | None | None | 2021-11-09 19:38:58 UTC

Description Peter Rajnoha 2021-04-13 11:54:38 UTC
We need to add missing ordering in multipathd.service that is used in dracut (/usr/lib/dracut/modules.d/90multipath/multipathd.service) so the Before and Conflicts settings contain:

  Before=local-fs-pre.target initrd-cleanup.service
  Conflicts=initrd-cleanup.service

(The initrd-cleanup.service is added.)

This is important so that multipathd in the initramfs has all required dependencies in place before systemd starts stopping them, such as the udevd service, which multipathd needs in order to create the /dev content and populate the udev database so that it can be reused after we switch to the root fs. All device-mapper based devices, including device-mapper-multipath, reuse the udev db from the initramfs after switching to the root fs.

This came out of bug #1933038 comment #16 (and comments later) where this was proved to fix the issue.

(The bug #1933038 comment #16 also has shutdown.target besides initrd-cleanup.service in the dependencies, but I think that is not necessary for initramfs, but only for multipathd.service that is used in rootfs. And right now, initramfs uses its own multipathd.service, not the one distributed with device-mapper-multipath that is used in rootfs.)
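
The ordering requested above can be checked mechanically. The following is a minimal sketch, not part of dracut: the function names are hypothetical, and it assumes the unit file text is passed in as a string and that Before= and Conflicts= may each appear on several lines with space-separated values, which systemd accumulates. It does not track section headers such as [Unit].

```python
TARGET = "initrd-cleanup.service"


def unit_dependencies(unit_text, key):
    """Collect every value of a repeated, space-separated directive."""
    values = []
    for raw in unit_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments (systemd accepts '#' and ';').
        if not line or line.startswith(("#", ";")):
            continue
        name, sep, value = line.partition("=")
        if sep and name.strip() == key:
            values.extend(value.split())
    return values


def has_initrd_cleanup_ordering(unit_text):
    """True if both Before= and Conflicts= name initrd-cleanup.service."""
    return (TARGET in unit_dependencies(unit_text, "Before")
            and TARGET in unit_dependencies(unit_text, "Conflicts"))
```

To use it, read /usr/lib/dracut/modules.d/90multipath/multipathd.service and pass its contents to has_initrd_cleanup_ordering(). A unit that only has Before=local-fs-pre.target and Conflicts=shutdown.target is reported as missing the ordering.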

Comment 1 Ben Marzinski 2021-04-21 14:29:33 UTC
*** Bug 1951002 has been marked as a duplicate of this bug. ***

Comment 2 Ben Marzinski 2021-04-21 14:31:11 UTC
Bug 1951002 is yet another instance of this issue.

Comment 3 IBM Bug Proxy 2021-04-21 14:37:14 UTC
------- Comment From thorsten.diehl.com 2021-04-20 04:15 EDT-------
Hi Benjamin M.,
amazing! That makes sense and worked for me: across 20 reboots, where I previously failed 100% of the time, I now have 0% failures.

Here's what I did:
1. patching /usr/lib/dracut/modules.d/90multipath/multipathd.service:
--- multipathd.service.orig	2018-10-08 15:38:33.000000000 +0200
+++ /usr/lib/dracut/modules.d/90multipath/multipathd.service	2021-04-20 10:01:03.807343324 +0200
@@ -4,8 +4,10 @@
 Wants=systemd-udev-trigger.service systemd-udev-settle.service local-fs-pre.target
 After=systemd-udev-trigger.service systemd-udev-settle.service
 Before=local-fs-pre.target
+Before=initrd-cleanup.service
 DefaultDependencies=no
 Conflicts=shutdown.target
+Conflicts=initrd-cleanup.service
 ConditionKernelCommandLine=!nompath
 ConditionKernelCommandLine=!rd.multipath=0
 ConditionKernelCommandLine=!rd_NO_MULTIPATH
2. rebuilding the initrd, zipl
3. rebooting several times; in all cases the expected partition was found and used/mounted.
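
The manual edit in step 1 could also be scripted. The sketch below is an illustration only, not the change shipped in dracut: the function name is hypothetical, and it assumes the unit already contains at least one Before= and one Conflicts= line, as the unit in the patch above does. It adds a new line after the last existing one for each directive and leaves an already-patched unit unchanged.

```python
TARGET = "initrd-cleanup.service"


def add_initrd_cleanup_ordering(unit_text):
    """Return unit_text with Before=/Conflicts= on initrd-cleanup.service added."""
    lines = unit_text.splitlines()
    for key in ("Before", "Conflicts"):
        prefix = key + "="
        # Positions of all existing lines for this directive.
        indexes = [i for i, line in enumerate(lines)
                   if line.strip().startswith(prefix)]
        if not indexes:
            raise ValueError("no %s line to extend" % prefix)
        present = any(TARGET in lines[i].strip()[len(prefix):].split()
                      for i in indexes)
        if not present:
            # Insert directly after the last existing line of this directive.
            lines.insert(indexes[-1] + 1, prefix + TARGET)
    return "\n".join(lines) + "\n"
```

Writing the result back to the unit file does not change the running system by itself; rebuilding the initrd and running zipl, as in steps 2 and 3, is still needed.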

If this workaround (I am not sure whether this is really a fix) finds its way into RHEL8.4 zStream and RHEL8.5+, the suggested upstream device-mapper-multipath change should still be considered and evaluated for RHEL9. (But as long as this systemd tuning works, I will not insist on it, of course.)

------- Comment From thorsten.diehl.com 2021-04-20 11:18 EDT-------
This method works (as expected) also very well, if I have a large LVM on top (48 LUNs via 4 paths each via zfcp.conf; 48 PVs on single partitions, 2 VGs, 3 LVs, mountpoints /opt, /home and swap).

------- Comment From thorsten.diehl.com 2021-04-20 13:36 EDT-------
(In reply to comment #15)
> (In reply to IBM Bug Proxy from comment #5)
> > If this workaround (not sure, whether this is really a fix) finds its way
> > into RHEL8.4 zStream and RHEL8.5+, the suggested upstream
> > device-mapper-multipath change should still be considered and evaluated for
> > RHEL9. (But as long as this systemd tune works, I will not insist in, sure.)
>
> The upstream fix you mentioned is already in the rhel-9 and fedora 34+
> packages.  Its goal is to fix the cases where multipath devices are created
> but not properly initialized, where the goal of the dracut fix is to make
> sure the devices are properly initialized in the first place. While I think
> that the dracut fix is the proper fix for your situation (and all the cases
> I've seen where devices aren't getting initialized), there is still a
> possibility of devices not getting properly initialized for other reasons,
> so it makes sense to have both fixes.
>
> Unless you have any objections, I'll close this bug as a duplicate of Bug
> 1949076?

Well, it depends.
https://bugzilla.redhat.com/show_bug.cgi?id=1933038 describes the same problem as I have.
https://bugzilla.redhat.com/show_bug.cgi?id=1949076 describes the dracut solution.
What will happen with bug 1933038?
(The background of my question is: on which RH bug can I track the implementation of the dracut fix, so that I can reference that RH bug in our bugzilla?)

------- Comment From thorsten.diehl.com 2021-04-21 03:59 EDT-------
OK, then it's ok for me to close this bug as a duplicate of Bug 1949076 on RH side.

Comment 4 IBM Bug Proxy 2021-04-21 16:20:43 UTC
------- Comment From thorsten.diehl.com 2021-04-21 12:16 EDT-------
@Red Hat: My proposal for kbase article - feel free to tune:

Under very rare conditions, one or more of your multipathed LUNs might not become available on reboot.
This might cause dracut to drop into an emergency shell.
The symptom is described in more detail in https://bugzilla.redhat.com/show_bug.cgi?id=1933038 and in https://bugzilla.redhat.com/show_bug.cgi?id=1951002.
To avoid this and get the system up and running, one possible (temporary) workaround is to start the system with a single CPU only, e.g. by reconfiguration in the respective hypervisor or by adding a kernel parameter " nr_cpus=1" during boot. After successful boot, one can add the missing ordering in multipathd.service that is used in dracut (/usr/lib/dracut/modules.d/90multipath/multipathd.service) by adding the following entries:

Before=initrd-cleanup.service

as mentioned in https://bugzilla.redhat.com/show_bug.cgi?id=1949076#c0

Rebuilding the initrd and rewriting the boot loader is required to make these changes effective.
When the system has been rebooted, the reduction of the number of CPUs can be removed.

Comment 5 Harald Hoyer 2021-04-29 09:30:38 UTC
Upstream PR https://github.com/dracutdevs/dracut/pull/1394

Comment 6 Peter Rajnoha 2021-05-25 20:31:34 UTC
*** Bug 1933038 has been marked as a duplicate of this bug. ***

Comment 11 IBM Bug Proxy 2021-07-07 10:50:57 UTC
------- Comment From thorsten.diehl.com 2021-07-07 06:46 EDT-------
Is there already an errata / RHBA available for that issue?

Comment 12 IBM Bug Proxy 2021-07-07 11:00:57 UTC
------- Comment From thorsten.diehl.com 2021-07-07 06:52 EDT-------
any updates here?

Comment 13 IBM Bug Proxy 2021-07-13 14:10:55 UTC
------- Comment From tstaudt.com 2021-07-13 10:10 EDT-------
Please find below the Business Justification to provide this fix also for RHEL 8.4.0.z stream.
Thanks.

------------ template for zstream request (as of 02/07/2020) --------------
1. Origin of z-stream request:
a. {customer, partner, Red Hat product, Red Hat business opportunity, etc}

Customer

2.Stream(s) where inclusion is requested (all that apply):

RHEL 8.4.0.z

3.List of bugs required to resolve this request fully:
a. Where the nature of the resolution requires inclusion of patchsets from
multiple bugs, or across multiple streams, how is resolving this
justified / required over the opportunity to instead resolve multiple
issues for more than one requestor?

Red Hat Bug 1949076 - Add missing ordering with initrd-cleanup.service in dracut's multipathd.service

4.Requested resolution target date:

next z-stream

5.Architectures where the problem is present:

s390x

6.Has the problem been seen by real customers in the field:

Yes

7.How many customers are experiencing the problem:

TBD

8.How often is the problem encountered:

TBD

9.What is the impact where encountered:

Partition cannot be mounted or accessed, restart or reboot will not work

10.What workarounds are possible, and how do they impact customer/partner
business:

Manual edits of systemd unit files are possible, but no valid solution for an enterprise system

11.What is the consequence if not resolved by the requested target date:

Customer dissatisfaction, might move away from using RHEL

12.What is the level of risk inherent in fixing the problem with available
patchset:
a. historical sensitivity of the code area to code changes, longevity and
stability of patchset; please consult with the bug assignee for an
engineering assessment on this

low

13.What are the consequences to the customer/partner/Red Hat business
if this is not resolved by the requested time, or in z-stream at all?

Customer dissatisfaction, might move away from using RHEL

14.Driver updates, hardware enablement, and new functionality/features are excluded from Z-stream by policy.  If your request requires inclusion of new functionality or feature
a. How is existing behaviour maintained, and how likely would follow up
fixes be required in order to finish the new feature?
b. Who is signing off on the policy waiver to request the inclusion against
policy

n/a

------------ end of template for zstream request ------------------------

Comment 21 Petr Matyáš 2021-08-16 12:41:28 UTC
Verified on dracut-049-188.git20210802.el8.x86_64

Comment 23 IBM Bug Proxy 2021-09-07 16:41:13 UTC
------- Comment From thorsten.diehl.com 2021-09-07 12:37 EDT-------
I reinstalled RHEL8.5 Nightly 0726, kernel 4.18.0-323.el8, with dracut version 049-136.git20210426.el8.
The problem occurred very frequently, as expected.

Then I updated to dracut version 049-188.git20210802.el8, rebuilt the initrd and ran zipl.
Before reboot, I had a look into /usr/lib/dracut/modules.d/90multipath/multipathd.service, which showed the code with added statements as in the above described workaround.
The problem is now fixed! I was not able to reproduce it any more (during 100 reboots). Thanks.
Closing this bug.
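
The version strings in this comment can be compared against the "Fixed In Version" field of this bug (dracut-049-151.git20210719.el8). The sketch below is a simplified illustration under stated assumptions: the function names are hypothetical, it compares only the two leading numeric fields of a version-release string, and it is not RPM's real version comparison algorithm. Builds from other streams may number their releases differently, so treat the result as a hint only.

```python
import re

# Taken from the "Fixed In Version" field, without the package name.
FIXED_IN = "049-151.git20210719.el8"


def build_key(version_release):
    """Extract (version, build number) from e.g. '049-188.git20210802.el8'."""
    match = re.match(r"(\d+)-(\d+)", version_release)
    if match is None:
        raise ValueError("unexpected version string: %r" % version_release)
    return int(match.group(1)), int(match.group(2))


def contains_fix(version_release):
    """True if the build is at or after the build named in FIXED_IN."""
    return build_key(version_release) >= build_key(FIXED_IN)
```

With the two builds mentioned above, contains_fix() is false for 049-136.git20210426.el8 and true for 049-188.git20210802.el8, which matches the observed behaviour.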

Comment 26 errata-xmlrpc 2021-11-09 19:38:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (dracut bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:4394

Comment 27 David Teigland 2022-01-05 15:11:02 UTC
*** Bug 2018269 has been marked as a duplicate of this bug. ***

