Bug 1886229 - Multipath support for RHCOS sysroot
Summary: Multipath support for RHCOS sysroot
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: RHCOS
Version: 4.6
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: ---
Target Release: 4.7.0
Assignee: Jonathan Lebon
QA Contact: Michael Nguyen
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-10-08 02:15 UTC by mkumatag
Modified: 2024-10-01 16:57 UTC
CC: 16 users

Fixed In Version:
Doc Type: Enhancement
Doc Text:
Feature: RHCOS now supports multipath on the primary disk.
Reason: Multipathing allows stronger resilience to hardware failure.
Result: Users can now set up RHCOS on top of multipath to achieve higher host availability.
Clone Of:
Environment:
Last Closed: 2021-02-24 15:23:52 UTC
Target Upstream Version:
Embargoed:


Links:
  GitHub coreos/coreos-assembler pull 2009 (closed): kola: add multipath test (last updated 2021-02-08 22:06:18 UTC)
  Red Hat Product Errata RHSA-2020:5633 (last updated 2021-02-24 15:24:25 UTC)

Description mkumatag 2020-10-08 02:15:52 UTC
Description of problem:

Include multipath support in RHCOS, including at the dracut level. In production environments it is very common for VMs to be backed by SAN storage, where multipath is always recommended for failover and redundancy.

Hence this task is to enable multipath support for the boot and sysroot devices.

Comment 1 Ben Howard 2020-10-09 14:17:01 UTC
The required dracut support hasn't landed in RHEL yet and therefore can't land in RHCOS.

We need https://github.com/dracutdevs/dracut/pull/780

Comment 3 Micah Abbott 2020-10-09 14:27:03 UTC
We'll track this on the RHCOS side, targeted for 4.7.  But it will be up to RHEL to ship this as part of dracut before RHCOS can use it.

Comment 4 Prashanth Sundararaman 2020-10-15 18:26:36 UTC
I filed https://bugzilla.redhat.com/show_bug.cgi?id=1888779 to get dracut updated in RHEL.

Comment 5 Micah Abbott 2020-10-25 18:34:44 UTC
We are working on higher-priority items for the 4.6 release; marking with the UpcomingSprint keyword.

Comment 6 Prashanth Sundararaman 2020-11-05 15:15:09 UTC
In addition to dracut being patched, we would also need this to support multipath on RHCOS: https://github.com/openshift/os/issues/426

Comment 7 Micah Abbott 2020-12-04 22:20:34 UTC
The PR https://github.com/dracutdevs/dracut/pull/780 is included in dracut-049-100.git20201120.el8, which is included in RHCOS 47.83.202012021342-0.

Additionally, the issue in comment #6 was addressed in ignition-2.8.0-1.rhaos4.7.gitdb4d30d.el8 (perhaps in an earlier version)

I am going to move this to POST and ask that the reporter try with a recent RHCOS 4.7 build, if possible.

Comment 8 Prashanth Sundararaman 2020-12-05 01:46:08 UTC
I have tried multipath with zVM by enabling it through machine config and one of the power team testers also tried it on powerVM and it works.

Manju - can you confirm that multipath provisioning works on ppc64le?

Comment 9 Micah Abbott 2020-12-10 17:17:25 UTC
(In reply to Prashanth Sundararaman from comment #8)
> I have tried multipath with zVM by enabling it through machine config and
> one of the power team testers also tried it on powerVM and it works.

This is encouraging; I'll move this to MODIFIED

Comment 11 mkumatag 2020-12-14 07:10:26 UTC
(In reply to Prashanth Sundararaman from comment #8)
> I have tried multipath with zVM by enabling it through machine config and
> one of the power team testers also tried it on powerVM and it works.
> 

I depend on Archana's test for the ppc64le platform
> Manju - can you confirm that multipath provisioning works on ppc64le?

aprabhak

Comment 17 Prashanth Sundararaman 2021-01-12 21:06:57 UTC
I see this error too, but only when `rd.multipath=default` and `root=/dev/disk/by-label/dm-mpath-root` are added as separate ostree commits. If they are applied at the same time and the node is rebooted, it works fine.
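
For reference, since applying the two kargs separately is what triggers the error above, a day-2 change would normally deliver both in a single MachineConfig so they land in one rpm-ostree deployment. A minimal sketch only (the object name and the worker role label are illustrative, not taken from this bug):

# Sketch: one MachineConfig carrying both multipath kargs for the worker pool.
cat <<'EOF' | oc apply -f -
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 99-worker-enable-multipath    # hypothetical name
  labels:
    machineconfiguration.openshift.io/role: worker
spec:
  kernelArguments:
    - rd.multipath=default
    - root=/dev/disk/by-label/dm-mpath-root
EOF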

Comment 18 Prashanth Sundararaman 2021-01-12 22:30:25 UTC
One more issue I am seeing: when I try to set up multipath on a libvirt IPI-installed OCP system and provide the kargs and reboot, the system enters the emergency shell. I couldn't find anything in the journal logs pointing to an obvious cause. The difference in the libvirt setup is that it is based on the QEMU qcow disk image, so coreos-installer is not involved. Any ideas on why this happens would be appreciated.

Comment 19 Micah Abbott 2021-01-13 14:49:40 UTC
Based on the last few comments, I'm not convinced this is fully functional and issue-free. I'm going to set this back to ASSIGNED and drop it from the errata.

Let's figure out how to make this work reliably.

Comment 21 Jonathan Lebon 2021-01-13 22:26:51 UTC
Prashanth, could you test out RHCOS on top of https://github.com/openshift/os/pull/484 and https://github.com/coreos/fedora-coreos-config/pull/815? This fixes the multipath issues for me on x86_64 with `--qemu-multipath`, though if you have a more realistic setup, it'd be great to validate it.
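
For anyone reproducing this locally, a hedged sketch of the QEMU-based check referenced above, assuming a coreos-assembler workdir with a built image and that `--qemu-multipath` is accepted by `cosa run` (kola qemuexec) in the coreos-assembler version in use:

# Sketch only: boot the latest build with the primary disk presented over
# multiple paths, per the `--qemu-multipath` option mentioned above.
cosa run --qemu-multipath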

Comment 22 Prashanth Sundararaman 2021-01-15 16:54:11 UTC
OK, I just figured out why it doesn't work on libvirt. The virtual disks on libvirt have neither the "SCSI_IDENT" nor the "ID_WWN" property, and one of them is needed for multipath; otherwise the device is excluded. The multipath.conf has this by default:

blacklist_exceptions {
        property "(SCSI_IDENT_|ID_WWN)"
}

So this is a non-issue.
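
For reference, a quick way to check whether a given disk carries one of those udev properties (a sketch using standard udev tooling; /dev/sda is just an example device):

# Example only: list the properties that multipath's blacklist_exceptions
# matches on for a candidate disk (substitute the device in question).
udevadm info --query=property /dev/sda | grep -E 'SCSI_IDENT_|ID_WWN'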

Comment 23 Micah Abbott 2021-01-15 20:36:32 UTC
Higher-priority work has prevented this issue from being resolved; adding the UpcomingSprint keyword.

Comment 24 Jonathan Lebon 2021-01-15 21:36:21 UTC
CI coverage added in https://github.com/coreos/coreos-assembler/pull/2009.

Comment 25 Prashanth Sundararaman 2021-01-15 23:29:56 UTC
Tested multipath with the latest s390x RHCOS (47.83.202101150912-0) on zVM (the bare-metal equivalent) and it works fine.

Comment 27 Michael Nguyen 2021-01-22 13:41:23 UTC
Verified on RHCOS 47.83.202101161239-0.  This boot image is included in registry.ci.openshift.org/ocp/release:4.7.0-0.nightly-2021-01-22-085446
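
For reference, a rough way to spot-check a node once the multipath MachineConfig has rolled out (a sketch only; <node> is a placeholder and cluster-admin access is assumed):

# Sketch: confirm the kargs took effect and the root filesystem sits on a
# device-mapper multipath target.
oc debug node/<node> -- chroot /host sh -c 'cat /proc/cmdline; multipath -ll; findmnt /sysroot'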

Comment 30 errata-xmlrpc 2021-02-24 15:23:52 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633

Comment 31 Aleksey Usov 2021-02-25 05:58:36 UTC
Hello everyone. Does anyone know how to apply this to bare metal? Do I have to create a live Ignition config to set up the /dev/mapper/mpatha device, or something else? I tried booting into a live environment (where coreos-installer is supposed to be run from), but there was only /dev/mapper/control in /dev/mapper. Not sure how to proceed.

Comment 32 Jonathan Lebon 2021-03-01 16:14:06 UTC
Hi Aleksey, see the documentation changes in https://github.com/openshift/openshift-docs/pull/28972.

Comment 33 Aleksey Usov 2021-03-10 04:40:04 UTC
(In reply to Jonathan Lebon from comment #32)
> Hi Aleksey, see the documentation changes in
> https://github.com/openshift/openshift-docs/pull/28972.

Thanks for the useful link. However, it's for IBM/Z. So I have a few questions:

  1. For another platform I would have to omit rd.zfcp and rd.znet, correct?
  2. I used coreos.inst.install_dev=/dev/mapper/mpath when bootstrapping all nodes. So all that is left to do is to create a MachineConfig that enables multipath support, no reinstall necessary?
  3. Since device naming (mpatha, mpathb, etc) depends on the order LUNs are discovered, if there are more than one, how do I ensure that CoreOS installs on a specific LUN? For IBM/Z, it says to use rd.zfcp, but I don't have that option on Cisco UCS and we are going to present multiple LUNs to the same nodes.
  4. How do I initiate a scan for newly attached LUNs? Is there a way to configure multipathd (or something else) to do it automatically?

Comment 34 Jonathan Lebon 2021-03-10 15:15:22 UTC
(In reply to Aleksey Usov from comment #33)
> (In reply to Jonathan Lebon from comment #32)
> > Hi Aleksey, see the documentation changes in
> > https://github.com/openshift/openshift-docs/pull/28972.
> 
> Thanks for the useful link. However, it's for IBM/Z. So I have a few
> questions:
> 
>   1. For another platform I would have to omit rd.zfcp and rd.znet, correct?

Right, you can ignore the Z-specific bits in there. The important part is the MachineConfig containing the `root` and `rd.multipath` kernel arguments.

>   2. I used coreos.inst.install_dev=/dev/mapper/mpath when bootstrapping all
> nodes. So all that is left to do is to create a MachineConfig that enables
> multipath support, no reinstall necessary?

Correct, enabling multipath via MC does not require any reinstallation.

Note BTW that while installing onto a multipathed target works, it doesn't actually help much because on reboot into the installed system, the node is no longer in multipath mode until the MachineConfig is applied (and the node is rebooted once more). (And it's not currently supported to add `--append-karg` at install time for the `root` and `rd.multipath` kargs).

>   3. Since device naming (mpatha, mpathb, etc) depends on the order LUNs are
> discovered, if there are more than one, how do I ensure that CoreOS installs
> on a specific LUN? For IBM/Z, it says to use rd.zfcp, but I don't have that
> option on Cisco UCS and we are going to present multiple LUNs to the same
> nodes.

As noted above, installation doesn't need to be done on a multipathed target. Which target device to use to install isn't really something RHCOS can help with. That said, for multipath if you know the WWN of the target device, you can use the `/dev/disk/by-id/wwn-...` symlink.
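
As an illustration of that suggestion, a sketch only (the WWN and Ignition URL below are placeholders, not values from this bug):

# Sketch: install to a specific LUN via its WWN symlink instead of an
# mpath name; both the WWN and the Ignition location are placeholders.
coreos-installer install /dev/disk/by-id/wwn-0x5000c500a1b2c3d4 \
    --ignition-url https://example.com/worker.ign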

>   4. How do I initiate a scan for newly attached LUNs? Is there a way to
> configure multipathd (or something else) to do it automatically?

Hmm, I'm not sure. I would check the CLI documentation (or a more brute-force approach which might work is just restarting `multipathd.service`).
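
For completeness, a commonly used approach outside of anything RHCOS-specific, offered only as a sketch and not confirmed in this thread: trigger a SCSI rescan, then let multipathd pick up the new paths.

# Sketch of a common approach (run as root); not confirmed in this thread.
for host in /sys/class/scsi_host/host*; do
    echo '- - -' > "$host/scan"
done
systemctl restart multipathd.service   # the brute-force option mentioned above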

Comment 35 Aleksey Usov 2021-03-10 19:49:39 UTC
(In reply to Jonathan Lebon from comment #34)
> Right, you can ignore the Z-specific bits in there. The important part is
> the MachineConfig containing the `root` and `rd.multipath` kernel arguments.
> [...]
> Correct, enabling multipath via MC does not require any reinstallation.

So I tried applying that MachineConfig, but the first node in the MachineConfigPool is stuck updating and won't switch back to schedulable. The MCP and the node itself don't report any issues, and `multipath -ll` now displays all multipath devices, as expected. Any thoughts on where to look?

Comment 36 Aleksey Usov 2021-03-10 20:04:16 UTC
Update: I noticed that a machine config daemon pod on that node is in CrashLoopBackOff and the following is in its logs:

Mar 10 19:47:06 ************************************ systemd[1]: rpm-ostreed.service: Failed with result 'exit-code'.
Mar 10 19:47:06 ************************************ systemd[1]: Failed to start rpm-ostree System Management Daemon.
Mar 10 19:47:06 ************************************ systemd[1]: rpm-ostreed.service: Consumed 146ms CPU time
Mar 10 19:52:32 ************************************ systemd[1]: Starting rpm-ostree System Management Daemon...
Mar 10 19:52:32 ************************************ rpm-ostree[28008]: Reading config file '/etc/rpm-ostreed.conf'
Mar 10 19:52:32 ************************************ rpm-ostree[28008]: error: Couldn't start daemon: Error setting up sysroot: loading sysroot: Unexpected state: /run/ostree-booted found, but no /boot/loader directory
Mar 10 19:52:32 ************************************ systemd[1]: rpm-ostreed.service: Main process exited, code=exited, status=1/FAILURE
Mar 10 19:52:32 ************************************ systemd[1]: rpm-ostreed.service: Failed with result 'exit-code'.
Mar 10 19:52:32 ************************************ systemd[1]: Failed to start rpm-ostree System Management Daemon.
Mar 10 19:52:32 ************************************ systemd[1]: rpm-ostreed.service: Consumed 145ms CPU time

Has anyone seen this before? A quick search suggests that the initrd needs to be regenerated, but I'll see what else I can dig up before I take any action.

Comment 37 Aleksey Usov 2021-03-10 21:38:09 UTC
I also noticed that /boot was empty and the boot.mount unit was failing. I was able to fix the problem by running `mount /dev/disk/by-label/boot /boot && reboot` as root. Since it doesn't happen on all nodes, I'm inclined to think it is a race condition.
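
For readers hitting the same symptom, the workaround above as a sketch:

# Sketch of the workaround described in this comment (run as root).
systemctl status boot.mount                  # confirm the unit failed
mount /dev/disk/by-label/boot /boot          # remount /boot from its label
reboot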

Comment 38 Jonathan Lebon 2021-03-11 15:53:47 UTC
(In reply to Aleksey Usov from comment #37)
> I also noticed that /boot was empty and boot.mount unit was failing. Was
> able to fix the problem by executing mount /dev/disk/by-label/boot /boot &&
> reboot as root. Since it doesn't happen on all nodes, I'm inclined to think
> that it is a race condition.

Interesting. Can you file a new RHBZ about this with the full logs from the machine? Likely something with udev or a race between the MCO->rpm-ostree and multipathd.

Comment 39 Aleksey Usov 2021-03-16 17:45:40 UTC
(In reply to Jonathan Lebon from comment #34)
> [...]
> As noted above, installation doesn't need to be done on a multipathed
> target. Which target device to use to install isn't really something RHCOS
> can help with. That said, for multipath if you know the WWN of the target
> device, you can use the `/dev/disk/by-id/wwn-...` symlink.

So I tried using the /dev/disk/by-id/wwn-... device, but got an error:

Error: checking for exclusive access to /dev/disk/by-id/wwn-...
Caused by: couldn't reread partition table: device may not support partitions
Caused by: EINVAL: Invalid argument

For some reason, /dev/disk/by-id/wwn-... doesn't work even though both it and /dev/mapper/mpatha are symlinks that reference the same device /dev/dm-0.

Any suggestions will be appreciated.

Comment 40 Jonathan Lebon 2021-03-17 15:37:12 UTC
> For some reason, /dev/disk/by-id/wwn-... doesn't work even though both it and /dev/mapper/mpatha are symlinks that reference the same device /dev/dm-0.

Ahh yup, that's a bug in coreos-installer (see https://github.com/coreos/coreos-installer/pull/499).

For now, one suggestion is to *not* turn on multipath at installation time (since as mentioned above, it's not truly turned on until after the kargs are added on the installed system). Then, you should be able to use the WWN symlinks (which will just point to one of the underlying devices; we don't really care which).

Comment 41 Aleksey Usov 2021-03-17 16:05:48 UTC
(In reply to Jonathan Lebon from comment #40)
> > For some reason, /dev/disk/by-id/wwn-... doesn't work even though both it and /dev/mapper/mpatha are symlinks that reference the same device /dev/dm-0.
> 
> Ahh yup, that's a bug in coreos-installer (see
> https://github.com/coreos/coreos-installer/pull/499).
> 
> For now, one suggestion is to *not* turn on multipath at installation time
> (since as mentioned above, it's not truly turned on until after the kargs
> are added on the installed system). Then, you should be able to use the WWN
> symlinks (which will just point to one of the underlying devices; we don't
> really care which).

Thank you for the quick reply. So skip rd.multipath=default at installation time, but apply it via a MachineConfig at post-install, correct?

Comment 42 Jonathan Lebon 2021-03-17 18:51:57 UTC
(In reply to Aleksey Usov from comment #41)
> Thank you for the quick reply. So skip rd.multipath=default at installation
> time, but apply it via a MachineConfig at post-install, correct?

Yes, exactly.
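
Putting the last few comments together: install with no multipath kargs (targeting one underlying path, e.g. by its WWN symlink as sketched earlier), then enable multipath day-2. A sketch of the post-install step only; the filename refers to the hypothetical MachineConfig sketched earlier in this bug:

# Sketch: apply the multipath MachineConfig post-install and let the MCO
# reboot the nodes into multipath mode.
oc apply -f 99-worker-enable-multipath.yaml
oc get mcp worker -w    # watch the pool roll out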

Comment 43 Neil Girard 2021-03-18 11:29:40 UTC
Is there a BZ related to the fix (https://github.com/coreos/coreos-installer/pull/499) or an ETA on when that will be available?  My customer had the same issue and for now is doing the workaround mentioned above (to not use multipath at install time).

Comment 44 Jonathan Lebon 2021-03-18 15:20:23 UTC
(In reply to Neil Girard from comment #43)
> Is there a BZ related to the fix
> (https://github.com/coreos/coreos-installer/pull/499) or an ETA on when that
> will be available?  My customer had the same issue and for now is doing the
> workaround mentioned above (to not use multipath at install time).

Not at this time. I think this is more of a documentation issue. The use case for installing directly to a device-mapper target is a bit special (see the original upstream case here: https://github.com/coreos/coreos-installer/issues/91). For multipath specifically, the first boot of the installed system must not be multipathed, as mentioned above. By extension, it is simpler to leave multipathing off during the installation phase as well. In the future, we may support multipath starting from installation time; coreos-installer supports this (and the patch above fixes the related code), but there's more work needed in Ignition and RHCOS.

Comment 47 Red Hat Bugzilla 2023-09-15 00:49:20 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days

