Description of problem:
An installation with multipath parameters in the parmfile (rd.multipath=default coreos.inst.install_dev=/dev/mapper/mpatha) and hostnames in the parmfile fails. The installation ends in an emergency shell. The network (IP) is configured, but name resolution is not working; pinging another system by IP works.

The same installation (with MP parameters) works if IP addresses are specified instead of hostnames in the parmfile. It also works with hostnames in the parmfile if rd.multipath=default is removed and sda is used instead of /dev/mapper/mpatha. It looks like the MP parameters break the correct setup of name resolution during installation. Not sure if it should be there, but there is no /etc/resolv.conf in the booted Linux (emergency shell).

Version-Release number of selected component (if applicable):
oc version
Client Version: 4.8.0-0.nightly-s390x-2021-06-18-055818
Server Version: 4.8.0-0.nightly-s390x-2021-06-18-055818
Kubernetes Version: v1.21.0-rc.0+120883f

How reproducible:
Install a node with MP parameters and hostnames in the parmfile.

Steps to Reproduce:
1.
2.
3.

Actual results:
Installation ends in an emergency shell.

Expected results:
Installation process works.

Additional info:
Created attachment 1792825 [details] error snapshot
Hi Jonathan, Is this a possible regression caused by https://github.com/coreos/fedora-coreos-config/pull/1011 ? Like the original description says, if the ignition url is configured with a hostname, the coreos-installer errors out. If configured with an ip address, it works. Thanks Prashanth
Setting "Blocker-" after discussing with the team. Based on these reasons: 1. configuring multipath as a day 2 operation still works 2. specifying ip address instead of hostname works
Hmm, I'm not sure how this could be multipath related. It looks a lot like https://bugzilla.redhat.com/show_bug.cgi?id=1967483, except in the initrd. Full logs from the initrd would be helpful, esp. NetworkManager.
Funnily enough, coreos-livepxe-rootfs.service succeeds, so it is able to resolve the hostname there, but not when running coreos-installer.
(In reply to Jonathan Lebon from comment #4)
> Hmm, I'm not sure how this could be multipath related.
> It looks a lot like https://bugzilla.redhat.com/show_bug.cgi?id=1967483,
> except in the initrd.

Sorry, this is incorrect. This BZ matches rhbz#1967483 in that respect as well, since coreos-installer.service runs in the real root. (I'm so used to "emergency shell" referring to the initrd emergency shell that my brain jumped to that. :) )
Today I did some testing of a custom rhcos-4.8 (with https://github.com/coreos/coreos-installer/pull/564); the Ignition config gets downloaded from github.com, and the system works without any DNS issues. (Here is the cmdline:
```
Kernel command line: rd.neednet=1 dfltcc=off random.trust_cpu=on rd.znet=qeth,0.0.bdf0,0.0.bdf1,0.0.bdf2,layer2=1,portno=0 console=ttysclp0 ip=172.18.142.3::172.18.0.1:255.254.0.0:coreos:encbdf0:off nameserver=172.18.0.1 coreos.inst=yes coreos.inst.insecure=yes coreos.inst.ignition_url=https://raw.githubusercontent.com/nikita-dubrovskii/s390x-ignition-configs/master/ignition.ign coreos.live.rootfs_url=http://172.18.10.243/rhcos-48.84.202106231130-0-live-rootfs.s390x.img zfcp.allow_lun_scan=0 cio_ignore=all,!condev rd.zfcp=0.0.1903,0x500507630910d435,0x408240d100000000 rd.zfcp=0.0.1943,0x500507630914d435,0x408240d100000000 coreos.inst.install_dev=sda coreos.inst.mpath=yes
```
)

Using another zVM/Linux as the HTTP server with the Ignition config also works (http://m1314001.lnxne.boe:8080/ignition/ignition.ign). But using http://bastion.ocp-m1314001.lnxne.boe:8080/ignition/ignition.ign doesn't work, so I guess there is something wrong with the bastion node's config (as you can see, the same m1314001 is used as the HTTP server).
> Using another zVM/Linux as http-server with ignition config - also works
> (http://m1314001.lnxne.boe:8080/ignition/ignition.ign). But using
> http://bastion.ocp-m1314001.lnxne.boe:8080/ignition/ignition.ign - doesn't
> work, so i guess there is smth wrong with bastion node's config (as you can
> see same m1314001 is used as http-server).

That's interesting, thanks for the tests. I did some interactive debugging via screenshare with @madeel on this, and indeed we saw the install pass without multipath enabled and fail with it enabled. I'm still not sure how multipath can affect DNS resolution, unless it simply makes an existing race easier to trigger. If that's the case, then it might be helped by https://github.com/coreos/coreos-installer/pull/565.

I've made a scratch build with that patch:
http://brew-task-repos.usersys.redhat.com/repos/scratch/jlebon/coreos-installer/0.9.0/7.pr565.rhaos4.8.el8/s390x/

Re-hosted RPMs in a public space if you don't have VPN access:
https://jlebon.fedorapeople.org/coreos-installer-0.9.0-7.pr565.rhaos4.8.el8.s390x.rpm
https://jlebon.fedorapeople.org/coreos-installer-bootinfra-0.9.0-7.pr565.rhaos4.8.el8.s390x.rpm

Developers with access to an s390x machine who can reproduce this bug should be able to build an RHCOS image with those RPMs and test that.
OK, looks like I've got what's wrong here:

1) With 'coreos.inst.install_dev=/dev/mapper/mpatha rd.multipath=default' and `hostname.com/ignition.conf` in the parm file: coreos-installer cannot fetch the Ignition config (DNS), but first CoreOS tries to propagate 'multipath.conf' to '/sysroot', so we end up with a failure:

```
coreos-propagate-multipath-conf[926]: cp: cannot create regular file '/sysroot/etc/multipath.conf': Read-only file system
systemd[1]: coreos-propagate-multipath-conf.service: Main process exited, code=exited, status=1/FAILURE
...
systemd[1]: Reached target Emergency Mode.
```

2) With `coreos.inst.install_dev=/dev/mapper/mpatha rd.multipath=default` and `1.2.3.4/ignition.conf` in the parm file: coreos-installer can fetch the Ignition config (no DNS needed), but fails with `kpartx` (propagation of 'multipath.conf' to '/sysroot' also failed):

```
coreos-propagate-multipath-conf[926]: cp: cannot create regular file '/sysroot/etc/multipath.conf': Read-only file system
...
systemd[1]: Reached target Emergency Mode.
...
[   23.522376] coreos-installer-service[1859]: device-mapper: resume ioctl on mpatha4 failed: Invalid argument
[   23.522453] coreos-installer-service[1859]: resume failed on mpatha4
[   23.811211] coreos-installer-service[1859]: Error: getting partition table for /dev/mapper/mpatha
[   23.811374] coreos-installer-service[1859]: Caused by:
[   23.811395] coreos-installer-service[1859]: "kpartx" "-u" "-n" "/dev/dm-0" failed with exit code: 1
Failed to start CoreOS Installer.
```

If we take a look at /etc/resolv.conf without multipath, we have a valid config:
```
search lnxne.boe
nameserver 172.18.0.1
```

But with `rd.multipath=default` it's empty; systemd had already failed, so to me it does not look like a DNS issue. And installing this way also makes no sense: during firstboot CoreOS starts without multipath, so I don't see any reason for installing CoreOS with `rd.multipath=default` right now.

I would consider this not a bug, or at least not a DNS bug.
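For anyone hitting the same emergency shell, a quick check is whether /sysroot is still mounted read-only at that point, which would explain the cp failure above. This is just a sketch: the `is_ro` helper is hypothetical (not part of the image), and it assumes `findmnt` from util-linux is available in the live environment:

```shell
# Hypothetical helper: report whether a given mount point carries the 'ro'
# mount option. Returns non-zero if the path is not a mount point at all.
is_ro() {
  findmnt -rno OPTIONS "$1" 2>/dev/null | tr ',' '\n' | grep -qx 'ro'
}

# Mirrors the failure above: cp could not write /sysroot/etc/multipath.conf.
if is_ro /sysroot; then
  echo "/sysroot is mounted read-only"
else
  echo "/sysroot is writable or not mounted"
fi
```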
(In reply to Nikita Dubrovskii (IBM) from comment #9) > Ok, look's like i've got what' wrong here: > > 1) with 'coreos.inst.install_dev=/dev/mapper/mpatha rd.multipath=default' > and `hostname.com/ignition.conf` and in the parm file: What is that karg? Do you mean `ip=...`? Can you show the full parmfile you used? > coreos-installer cannot fetch ignition (DNS), but! at first coreos tries to > propagate 'multipat.conf' to the '/sysroot', so we end up with a failure: > > ``` > coreos-propagate-multipath-conf[926]: cp: cannot create regular file > '/sysroot/etc/multipath.conf': Read-only file system > systemd[1]: coreos-propagate-multipath-conf.service: Main process exited, > code=exited, status=1/FAILURE > ... > > systemd[1]: Reached target Emergency Mode. > ``` Ouch good catch. So we continue on to the real root even if the service failed. > 2) with `coreos.inst.install_dev=/dev/mapper/mpatha rd.multipath=default` > and `1.2.3.4/ignition.conf` in the parm file: > coreos-installer can fetch ignition (no DNS) , but fails with `kpartx` > (propagation of 'multipat.conf' to the '/sysroot' also failed): > > ``` > coreos-propagate-multipath-conf[926]: cp: cannot create regular file > '/sysroot/etc/multipath.conf': Read-only file system > ... > systemd[1]: Reached target Emergency Mode. > ... > > [ 23.522376] coreos-installer-service[1859]: device-mapper: resume ioctl > on mpatha4 failed: Invalid argument > [ 23.522453] coreos-installer-service[1859]: resume failed on mpatha4 > [ 23.811211] coreos-installer-service[1859]: Error: getting partition > table for /dev/mapper/mpatha > [ 23.811374] coreos-installer-service[1859]: Caused by: > [ 23.811395] coreos-installer-service[1859]: "kpartx" "-u" "-n" > "/dev/dm-0" failed with exit code: 1 > Failed to start CoreOS Installer. 
> ```
>
> If we take a look at /etc/resolv.conf without multipath, we have valid
> config:
> ```
> search lnxne.boe
> nameserver 172.18.0.1
> ```
>
> But with `rd.multipath=default` it's empty, systemd already had failed, so
> for me it looks not like a DNS issue.

OK, so I think there are two issues here:
1. `coreos-propagate-multipath-conf.service` doesn't have
```
OnFailure=emergency.target
OnFailureJobMode=isolate
```
2. We have no ordering between `coreos-propagate-multipath-conf.service` and `sysroot-etc.mount`.

<time passes>

Filed: https://github.com/coreos/fedora-coreos-config/pull/1077

Can you try that out?

> And installing this way also makes no sense - during fristboot coreos starts
> without multipath,
> so i don't see any reason for installing coreos with `rd.multipath=default`
> right now.

It's valid to turn on multipath at installation time so that coreos-installer can copy the content on top of the multipath target (for the same reasons as https://github.com/coreos/fedora-coreos-config/pull/1011). coreos-installer should support this already (see e.g. https://github.com/coreos/coreos-installer/pull/499), but if we hit issues with kpartx there, let's work on fixing them.
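For illustration, the two fixes listed above could look roughly like this as directives in the unit file (a sketch of the approach only, not the exact contents of the PR):

```
# coreos-propagate-multipath-conf.service (sketch, not the actual patch)
[Unit]
# 1. Fail into the emergency target instead of silently continuing to boot:
OnFailure=emergency.target
OnFailureJobMode=isolate
# 2. Order the copy after /sysroot/etc is actually mounted writable:
After=sysroot-etc.mount
```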
Hi Muhammad, do you think this bug will be resolved before the end of this sprint (July 3rd)? If not, can we set "Reviewed-in-Sprint"?
Hi Dan, The root cause is still not clear, so please set the reviewed flag.
(In reply to Jonathan Lebon from comment #10)
> (In reply to Nikita Dubrovskii (IBM) from comment #9)
> > Ok, look's like i've got what' wrong here:
> >
> > 1) with 'coreos.inst.install_dev=/dev/mapper/mpatha rd.multipath=default'
> > and `hostname.com/ignition.conf` and in the parm file:
>
> What is that karg? Do you mean `ip=...`? Can you show the full parmfile you
> used?

No, it's not an IP here but a hostname:
```
ip=172.18.142.3::172.18.0.1:255.254.0.0:coreos:encbdf0:off nameserver=172.18.0.1 coreos.inst=yes coreos.inst.ignition_url=http://m1314001.lnxne.boe:8080/ignition/ignition.ign
```

> OK, so I think there are two issues here:
> 1. `coreos-propagate-multipath-conf.service` doesn't have
>
> ```
> OnFailure=emergency.target
> OnFailureJobMode=isolate
> ```
> 2. We have no ordering between `coreos-propagate-multipath-conf.service` and
> `sysroot-etc.mount`.
>
> <times passes>
>
> Filed: https://github.com/coreos/fedora-coreos-config/pull/1077
>
> Can you try that out?

Tried it; works as expected:
- the system can be installed using DNS (`coreos.inst.ignition_url=http://m1314001.lnxne.boe:8080/ignition/ignition.ign`)
- the system can be installed using an IP (`coreos.inst.ignition_url=http://172.18.10.243/ignition.ign`)

> with kpartx there, let's work on fixing them.

Here is the PR for the kpartx issue: https://github.com/coreos/coreos-installer/pull/566
Hi Muhammad, do you think this bug will move past ON_QA by the end of this Sprint? If not, can we add "reviewed-in-sprint" flag?
Hi Jonathan, do you know when the fix [https://github.com/coreos/fedora-coreos-config/pull/1077] will be picked up by RHCOS?
Will try to get it in the next 4.8 bootimage bump.
Latest RHCOS 4.9 build should have the necessary patches for this bug, so should be ready to be verified. Muhammad, can you verify it's fixed?
Sorry for the confusion on this. It has to stay in POST until the 4.9 bootimage bump PR gets merged.
Setting reviewed-in-sprint as we are waiting for OpenShift to pick up the RHCOS PR
Hi Muhammad, do you think this bug will reach ON_QA by the end of this sprint (August 14th)? If not, can we add "reviewed-in-sprint" flag?
Hi Dan, the fix has landed in 4.9, though it still needs to be tested, so you can add the "reviewed-in-sprint" flag.
I successfully installed two nodes with rd.multipath=default in the parmfile:

rd.neednet=1 rd.multipath=default console=ttysclp0 coreos.inst.install_dev=/dev/mapper/mpatha coreos.live.rootfs_url=http://bistro.lnxne.boe/redhat/alkl/rhcos/nightly/PE/rhcos-49.84.202108041448-0/rhcos-49.84.202108041448-0-live-rootfs.s390x.img coreos.inst.ignition_url=http://bastion.m3558001.lnxne.boe:8080/ignition/worker.ign ip=10.107.1.52::10.107.1.51:255.255.255.0::ence383:none nameserver=10.107.1.51 zfcp.allow_lun_scan=0 cio_ignore=all,!condev rd.znet=qeth,0.0.e383,0.0.e384,0.0.e385,layer2=1 rd.zfcp=0.0.1c42,0x5001738030290140,0x0002000000000000 rd.zfcp=0.0.1c02,0x5001738030290140,0x0002000000000000 rd.zfcp=0.0.1c42,0x5001738030290151,0x0002000000000000 rd.zfcp=0.0.1c02,0x5001738030290151,0x0002000000000000

On worker node (without Day-2 operation):
-----------------------------------------
[core@bootstrap-0 ~]$ lsblk
NAME     MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda        8:0    0 112.2G  0 disk
|-sda3     8:3    0   384M  0 part /boot
`-sda4     8:4    0 111.8G  0 part
sdb        8:16   0 112.2G  0 disk
|-sdb3     8:19   0   384M  0 part
`-sdb4     8:20   0 111.8G  0 part
sdc        8:32   0 112.2G  0 disk
|-sdc3     8:35   0   384M  0 part
`-sdc4     8:36   0 111.8G  0 part /sysroot
sdd        8:48   0 112.2G  0 disk
|-sdd3     8:51   0   384M  0 part
`-sdd4     8:52   0 111.8G  0 part

[core@bootstrap-0 ~]$ cat /proc/cmdline
random.trust_cpu=on ignition.platform.id=metal $ignition_firstboot ostree=/ostree/boot.0/rhcos/98ec6ea1fe3b3ff8df599883fdae7041851b624a27b00bb70a3f8488616b6e93/0 zfcp.allow_lun_scan=0 cio_ignore=all,!condev rd.znet=qeth,0.0.e383,0.0.e384,0.0.e385,layer2=1 rd.zfcp=0.0.1c42,0x5001738030290140,0x0002000000000000 rd.zfcp=0.0.1c02,0x5001738030290140,0x0002000000000000 rd.zfcp=0.0.1c42,0x5001738030290151,0x0002000000000000 rd.zfcp=0.0.1c02,0x5001738030290151,0x0002000000000000 root=UUID=7c1ea61a-5c64-4a97-807f-f8f55f949faa rw rootflags=prjquota

[core@bootstrap-0 ~]$ lszdev
TYPE        ID                                              ON   PERS  NAMES
zfcp-host   0.0.1c02                                        yes  no
zfcp-host   0.0.1c42                                        yes  no
zfcp-lun    0.0.1c02:0x5001738030290140:0x0002000000000000  yes  no    sdd sg3
zfcp-lun    0.0.1c02:0x5001738030290151:0x0002000000000000  yes  no    sdc sg2
zfcp-lun    0.0.1c42:0x5001738030290140:0x0002000000000000  yes  no    sda sg0
zfcp-lun    0.0.1c42:0x5001738030290151:0x0002000000000000  yes  no    sdb sg1
qeth        0.0.e383:0.0.e384:0.0.e385                      yes  no    ence383
generic-ccw 0.0.0009                                        yes  no
----------------------------------------------------------------------------------

BUT: On one system, I have run the installation 3 times, on the other system 2 times, to have success. The failed installations stop in the emergency shell, but DNS / hostname / IP / FCP look good:

bash-4.4# lsblk
NAME        MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
loop0         7:0    0   7.9G  0 loop  /run/ephemeral
loop1         7:1    0 771.9M  0 loop  /sysroot
sda           8:0    0 112.2G  0 disk
`-mpatha    253:0    0 112.2G  0 mpath
  |-mpatha3 253:1    0   384M  0 part
  `-mpatha4 253:2    0 111.8G  0 part
sdb           8:16   0 112.2G  0 disk
`-mpatha    253:0    0 112.2G  0 mpath
  |-mpatha3 253:1    0   384M  0 part
  `-mpatha4 253:2    0 111.8G  0 part
sdc           8:32   0 112.2G  0 disk
`-mpatha    253:0    0 112.2G  0 mpath
  |-mpatha3 253:1    0   384M  0 part
  `-mpatha4 253:2    0 111.8G  0 part
sdd           8:48   0 112.2G  0 disk
`-mpatha    253:0    0 112.2G  0 mpath
  |-mpatha3 253:1    0   384M  0 part
  `-mpatha4 253:2    0 111.8G  0 part

bash-4.4# lszdev
TYPE        ID                                              ON   PERS  NAMES
zfcp-host   0.0.1c02                                        yes  no
zfcp-host   0.0.1c42                                        yes  no
zfcp-lun    0.0.1c02:0x5001738030290140:0x0002000000000000  yes  no    sdd sg3
zfcp-lun    0.0.1c02:0x5001738030290151:0x0002000000000000  yes  no    sdc sg2
zfcp-lun    0.0.1c42:0x5001738030290140:0x0002000000000000  yes  no    sdb sg1
zfcp-lun    0.0.1c42:0x5001738030290151:0x0002000000000000  yes  no    sda sg0
qeth        0.0.e383:0.0.e384:0.0.e385                      yes  no    ence383
generic-ccw 0.0.0009                                        yes  no

bash-4.4# ping -c 4 bastion.m3558001.lnxne.boe
PING bastion.m3558001.lnxne.boe (172.18.160.1) 56(84) bytes of data.
64 bytes from 172.18.160.1 (172.18.160.1): icmp_seq=1 ttl=64 time=0.175 ms
64 bytes from 172.18.160.1 (172.18.160.1): icmp_seq=2 ttl=64 time=0.203 ms
64 bytes from 172.18.160.1 (172.18.160.1): icmp_seq=3 ttl=64 time=0.184 ms
64 bytes from 172.18.160.1 (172.18.160.1): icmp_seq=4 ttl=64 time=0.207 ms

--- bastion.m3558001.lnxne.boe ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3090ms
rtt min/avg/max/mdev = 0.175/0.192/0.207/0.016 ms
bash-4.4#

For some reason, the image was not downloaded. @madeel mentioned that this could be related to https://bugzilla.redhat.com/show_bug.cgi?id=1991928. On my system, there is only one NIC configured.
After Day-2 Operation:

Last login: Wed Aug 11 09:32:37 2021 from 10.107.1.51
[core@bootstrap-0 ~]$ lsblk
NAME        MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
sda           8:0    0 112.2G  0 disk
`-mpatha    253:0    0 112.2G  0 mpath
  |-mpatha3 253:1    0   384M  0 part  /boot
  `-mpatha4 253:2    0 111.8G  0 part  /sysroot
sdb           8:16   0 112.2G  0 disk
`-mpatha    253:0    0 112.2G  0 mpath
  |-mpatha3 253:1    0   384M  0 part  /boot
  `-mpatha4 253:2    0 111.8G  0 part  /sysroot
sdc           8:32   0 112.2G  0 disk
`-mpatha    253:0    0 112.2G  0 mpath
  |-mpatha3 253:1    0   384M  0 part  /boot
  `-mpatha4 253:2    0 111.8G  0 part  /sysroot
sdd           8:48   0 112.2G  0 disk
`-mpatha    253:0    0 112.2G  0 mpath
  |-mpatha3 253:1    0   384M  0 part  /boot
  `-mpatha4 253:2    0 111.8G  0 part  /sysroot

[core@bootstrap-0 ~]$ sudo multipath -ll
mpatha (20017380030290193) dm-0 IBM,2810XIV
size=112G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
`-+- policy='service-time 0' prio=50 status=active
  |- 1:0:0:2 sdc 8:32 active ready running
  |- 1:0:1:2 sdd 8:48 active ready running
  |- 0:0:0:2 sda 8:0  active ready running
  `- 0:0:1:2 sdb 8:16 active ready running
The Z team tested the specific error described in the initial comment, and it has been fixed; however, other bugs have come up that we think may be related to BZ 1991928 (https://bugzilla.redhat.com/show_bug.cgi?id=1991928), so we are tracking those in that bug. Moving this to VERIFIED for now.
The fix for this bug will not be delivered to customers until it lands in an updated bootimage. That process is tracked in bug 1981999, which is in state ASSIGNED. Moving this bug back to POST.
Setting needinfo- as there are other networking related patches coming with the tracker bug 1981999. We will wait for the bootimage bump.
Boot image bump is merged, moving to MODIFIED
Adding "reviewed-in-sprint" as the bug will not be resolved before the end of this sprint.
Hi Muhammad, do you think this bug is still waiting for the PRs to merge? If so, we may want to add "reviewed-in-sprint"
Hi Dan, we can no longer reproduce the problem mentioned in the BZ. We can close this.
Closing per Muhammad's Comment 33
The fix for this bug will not be delivered to customers until it lands in an updated bootimage. That process is tracked in bug 1981999, which is in state POST. Moving this bug back to POST.
Bot is linked with the bootimage bug and therefore cannot close. Adding "reviewed-in-sprint"
The fix for this bug has landed in a bootimage bump, as tracked in bug 1981999 (now in status MODIFIED). Moving this bug to MODIFIED.
This has been verified.
I believe Doug's Comment 40 refers back to Comment 33 regarding the fact that we can no longer reproduce the problem; however, we were unable to close this bug as it is linked to BZ 1981999 (which is VERIFIED)
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:3759