Bug 1598740

Summary: On app pod restart, mpath device name is not mapped/created for some blockvolumes in the new initiator side
Product: [Red Hat Storage] Red Hat Gluster Storage
Component: gluster-block
Version: cns-3.10
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Status: CLOSED ERRATA
Keywords: ZStream
Target Milestone: ---
Target Release: OCS 3.11.1
Fixed In Version: atomic-openshift-3.11.23-1
Doc Type: If docs needed, set a value
Reporter: Neha Berry <nberry>
Assignee: Prasanna Kumar Kalever <prasanna.kalever>
QA Contact: Neha Berry <nberry>
CC: bgoyal, hchiramm, jsafrane, kramdoss, madam, pkarampu, pprakash, prasanna.kalever, rhs-bugs, sankarshan, vbellur, xiubli
Clones: 1599742 (view as bug list)
Last Closed: 2019-02-07 03:38:29 UTC
Type: Bug
Bug Depends On: 1599217, 1599742, 1609703, 1637413, 1637422
Bug Blocks: 1641915, 1644154
Attachments: journalctl output (flags: none)

Description Neha Berry 2018-07-06 11:03:08 UTC
On app pod restart, the mpath device name is not mapped/created for some block volumes on the new initiator side


Description of problem:
+++++++++++++++++++++++++++
We had an OCP 3.10 + CNS 3.10 setup with around 19 app pods with block devices attached. We started deleting some of the app pods, which resulted in them re-spinning on the same or other nodes.

On logging in to the initiator nodes, we saw that on one of them only 7 mpath devices got created, against the expected 9 mpaths for the 9 attached volumes.

# ll /dev/disk/by-path/ip*|wc -l    <------- 9 devices with 3 paths each
27
# multipath -ll|grep 'status='|wc -l  <------- lists only 7 devices with 3 paths each
21
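The mismatch between the two counts above can be narrowed down to the specific devices. A minimal sketch of such a cross-check, assuming a standard RHEL 7 initiator layout; `check_mpath_coverage` and the `DEV_BYPATH`/`SYSFS` override variables are hypothetical helpers added here for illustration:

```shell
#!/bin/sh
# For each SCSI device reachable via an ip-* by-path link, flag any that
# no device-mapper map claims as a slave. DEV_BYPATH and SYSFS are
# illustrative overrides; on a real initiator the defaults apply.
check_mpath_coverage() {
    for link in "${DEV_BYPATH:-/dev/disk/by-path}"/ip-*; do
        [ -e "$link" ] || continue                  # glob did not match
        dev=$(basename "$(readlink -f "$link")")    # e.g. sdag
        if ls "${SYSFS:-/sys}"/block/dm-*/slaves/"$dev" >/dev/null 2>&1; then
            echo "$dev mapped"
        else
            echo "$dev UNMAPPED"
        fi
    done | sort -u
}
check_mpath_coverage
```

On the affected node this kind of check would print the two devices (sdaf, sdag) as UNMAPPED while the other seven show up as mapped.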


App pods on the node whose block devices got mounted individually (no mpath)
====================================================
blk-20-1-rg97p
/dev/sdag                                                                                          3.0G  234M  2.8G   8% /var/lib/mongodb/data

blk-14-1-cbnws
/dev/sdaf                                                                                          3.0G  236M  2.8G   8% /var/lib/mongodb/data


[root@dhcp41-238 pod_deletion]# rpm -qa|grep multipath
device-mapper-multipath-libs-0.4.9-119.el7.x86_64
device-mapper-multipath-0.4.9-119.el7.x86_64
[root@dhcp41-238 pod_deletion]# uname -a
Linux dhcp41-238.lab.eng.blr.redhat.com 3.10.0-862.6.3.el7.x86_64 #1 SMP Fri Jun 15 17:57:37 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux
[root@dhcp41-238 pod_deletion]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.5 (Maipo)
[root@dhcp41-238 pod_deletion]# 


Complete logs along with setup details will be attached to this bug. 

Some logs from initiator
++++++++++++
# journalctl -r|grep sdag
Jul 05 22:34:59 dhcp41-238.lab.eng.blr.redhat.com kernel: XFS (sdag): Ending clean mount
Jul 05 22:34:59 dhcp41-238.lab.eng.blr.redhat.com kernel: XFS (sdag): Mounting V5 Filesystem
Jul 05 22:34:58 dhcp41-238.lab.eng.blr.redhat.com kernel: sd 71:0:0:0: [sdag] Attached SCSI disk
Jul 05 22:34:58 dhcp41-238.lab.eng.blr.redhat.com kernel: sd 71:0:0:0: [sdag] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Jul 05 22:34:58 dhcp41-238.lab.eng.blr.redhat.com kernel: sd 71:0:0:0: [sdag] Mode Sense: 23 00 00 00
Jul 05 22:34:58 dhcp41-238.lab.eng.blr.redhat.com kernel: sd 71:0:0:0: [sdag] Write Protect is off
Jul 05 22:34:58 dhcp41-238.lab.eng.blr.redhat.com kernel: sd 71:0:0:0: [sdag] 6291456 512-byte logical blocks: (3.22 GB/3.00 GiB)

# journalctl -r|grep sdaf
Jul 05 22:34:56 dhcp41-238.lab.eng.blr.redhat.com kernel: XFS (sdaf): Ending clean mount
Jul 05 22:34:56 dhcp41-238.lab.eng.blr.redhat.com kernel: XFS (sdaf): Mounting V5 Filesystem
Jul 05 22:34:54 dhcp41-238.lab.eng.blr.redhat.com kernel: sd 70:0:0:0: [sdaf] Attached SCSI disk
Jul 05 22:34:54 dhcp41-238.lab.eng.blr.redhat.com kernel: sd 70:0:0:0: [sdaf] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Jul 05 22:34:54 dhcp41-238.lab.eng.blr.redhat.com kernel: sd 70:0:0:0: [sdaf] Mode Sense: 23 00 00 00
Jul 05 22:34:54 dhcp41-238.lab.eng.blr.redhat.com kernel: sd 70:0:0:0: [sdaf] Write Protect is off
Jul 05 22:34:54 dhcp41-238.lab.eng.blr.redhat.com kernel: sd 70:0:0:0: [sdaf] 6291456 512-byte logical blocks: (3.22 GB/3.00 GiB)

For devices which have mpaths assigned, we do not see "Mounting V5 Filesystem" against the individual sdX device, which is correct (the filesystem is mounted via the dm device instead). Only for the above 2 devices do we see the filesystem being mounted directly on the sdX path.
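Whether a mountpoint ended up on a raw sdX path or on a multipath node can be read off its mount source. A small sketch; `classify_source` is a hypothetical helper, and the `findmnt` call shown in the comment assumes util-linux is installed on the node:

```shell
#!/bin/sh
# Classify a mount source as multipath-backed or a raw single path.
# On the affected node one would feed it the source of the mongodb mount,
# e.g.: classify_source "$(findmnt -n -o SOURCE /var/lib/mongodb/data)"
classify_source() {
    case "$1" in
        /dev/dm-*|/dev/mapper/*) echo "multipath" ;;
        /dev/sd*)                echo "raw-path" ;;  # the buggy case here
        *)                       echo "other" ;;
    esac
}
classify_source "${1:-/dev/sdag}"
```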

[root@dhcp41-238 pod_deletion]# lsblk -S
NAME HCTL       TYPE VENDOR   MODEL             REV TRAN
sda  0:0:0:0    disk VMware   Virtual disk     1.0  
sdb  0:0:1:0    disk VMware   Virtual disk     1.0  
sdc  0:0:2:0    disk VMware   Virtual disk     1.0  
sdd  0:0:3:0    disk VMware   Virtual disk     1.0  
sde  0:0:4:0    disk VMware   Virtual disk     1.0  
sdf  0:0:5:0    disk VMware   Virtual disk     1.0  
sdg  60:0:0:0   disk LIO-ORG  TCMU device      0002 iscsi
sdh  61:0:0:0   disk LIO-ORG  TCMU device      0002 iscsi
sdi  62:0:0:0   disk LIO-ORG  TCMU device      0002 iscsi
sdj  63:0:0:0   disk LIO-ORG  TCMU device      0002 iscsi
sdk  64:0:0:0   disk LIO-ORG  TCMU device      0002 iscsi
sdl  65:0:0:0   disk LIO-ORG  TCMU device      0002 iscsi
sdm  57:0:0:0   disk LIO-ORG  TCMU device      0002 iscsi
sdn  58:0:0:0   disk LIO-ORG  TCMU device      0002 iscsi
sdo  59:0:0:0   disk LIO-ORG  TCMU device      0002 iscsi
sdp  66:0:0:0   disk LIO-ORG  TCMU device      0002 iscsi
sdq  67:0:0:0   disk LIO-ORG  TCMU device      0002 iscsi
sdr  68:0:0:0   disk LIO-ORG  TCMU device      0002 iscsi
sds  45:0:0:0   disk LIO-ORG  TCMU device      0002 iscsi
sdt  46:0:0:0   disk LIO-ORG  TCMU device      0002 iscsi
sdu  47:0:0:0   disk LIO-ORG  TCMU device      0002 iscsi
sdv  48:0:0:0   disk LIO-ORG  TCMU device      0002 iscsi
sdw  49:0:0:0   disk LIO-ORG  TCMU device      0002 iscsi
sdx  50:0:0:0   disk LIO-ORG  TCMU device      0002 iscsi
sdy  51:0:0:0   disk LIO-ORG  TCMU device      0002 iscsi
sdz  52:0:0:0   disk LIO-ORG  TCMU device      0002 iscsi
sr0  3:0:0:0    rom  NECVMWar VMware SATA CD00 1.00 sata
sdaa 53:0:0:0   disk LIO-ORG  TCMU device      0002 iscsi
sdab 54:0:0:0   disk LIO-ORG  TCMU device      0002 iscsi
sdac 55:0:0:0   disk LIO-ORG  TCMU device      0002 iscsi
sdad 56:0:0:0   disk LIO-ORG  TCMU device      0002 iscsi
sdae 69:0:0:0   disk LIO-ORG  TCMU device      0002 iscsi
sdaf 70:0:0:0   disk LIO-ORG  TCMU device      0002 iscsi
sdag 71:0:0:0   disk LIO-ORG  TCMU device      0002 iscsi
[root@dhcp41-238 pod_deletion]# 

Version-Release number of selected component (if applicable):
+++++++++++++++++++++++++++++++
# rpm -qa|grep multipath
device-mapper-multipath-libs-0.4.9-119.el7.x86_64
device-mapper-multipath-0.4.9-119.el7.x86_64
[root@dhcp41-238 pod_deletion]# uname -a
Linux dhcp41-238.lab.eng.blr.redhat.com 3.10.0-862.6.3.el7.x86_64 #1 SMP Fri Jun 15 17:57:37 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux
[root@dhcp41-238 pod_deletion]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.5 (Maipo)

 
[root@dhcp43-31 pod_deletion]# oc rsh glusterfs-storage-8j4db rpm -qa|grep targetcli
targetcli-2.1.fb46-6.el7_5.noarch
[root@dhcp43-31 pod_deletion]# oc rsh glusterfs-storage-8j4db rpm -qa|grep gluster
glusterfs-client-xlators-3.8.4-54.12.el7rhgs.x86_64
glusterfs-fuse-3.8.4-54.12.el7rhgs.x86_64
glusterfs-geo-replication-3.8.4-54.12.el7rhgs.x86_64
glusterfs-libs-3.8.4-54.12.el7rhgs.x86_64
glusterfs-3.8.4-54.12.el7rhgs.x86_64
glusterfs-api-3.8.4-54.12.el7rhgs.x86_64
glusterfs-cli-3.8.4-54.12.el7rhgs.x86_64
glusterfs-server-3.8.4-54.12.el7rhgs.x86_64
gluster-block-0.2.1-20.el7rhgs.x86_64
[root@dhcp43-31 pod_deletion]# oc version
oc v3.10.10
kubernetes v1.10.0+b81c8f8
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://dhcp43-31.lab.eng.blr.redhat.com:8443
openshift v3.10.10
kubernetes v1.10.0+b81c8f8
[root@dhcp43-31 pod_deletion]# 



How reproducible:
This is not 100% reproducible and is seen intermittently. As we can see, only 2 devices out of the 9 didn't get an mpath device name.

Steps to Reproduce:
1. Configure a setup with app pods which have block devices mounted
2. Delete a set of 10 app pods in one go
3. Confirm that the app pods are re-spun on either the same or other available nodes
4. For each block device, check the multipath -ll, /dev/disk/by-path/ip-* and iscsiadm -m session output on the initiator side

Actual results:
+++++++++++
2 block devices (with 3 paths each) are logged in, but no mpath device name is assigned. Thus, on the app pod, the individual device, e.g. /dev/sdag, got mounted instead of a /dev/mapper/mpath* device.

Expected results:
++++++++++++++
All iSCSI devices which are logged in with more than one path should get an mpath device name assigned, and multipath -ll should display all the devices. Multipath is enabled on the node.

Comment 7 Jan Safranek 2018-07-10 13:30:26 UTC
Created attachment 1457822 [details]
journalctl output

Neha reproduced the bug with a build with enhanced logging from comment #2. OpenShift attached three paths of one volume:

/dev/disk/by-path/ip-10.70.46.1:3260-iscsi-iqn.2016-12.org.gluster-block:d2a42cc7-6a07-47e0-9b96-c25706d2fad2-lun-0 -> ../../sdl
/dev/disk/by-path/ip-10.70.46.175:3260-iscsi-iqn.2016-12.org.gluster-block:d2a42cc7-6a07-47e0-9b96-c25706d2fad2-lun-0 -> ../../sds
/dev/disk/by-path/ip-10.70.46.75:3260-iscsi-iqn.2016-12.org.gluster-block:d2a42cc7-6a07-47e0-9b96-c25706d2fad2-lun-0 -> ../../sdp

And it mounted /dev/sds instead of multipath's /dev/dm-*.


Looking into the logs, I can see OpenShift indeed initiated the attach, but it timed out waiting for 10.70.46.1 and 10.70.46.75:

Jul 10 15:56:32 dhcp46-175.lab.eng.blr.redhat.com atomic-openshift-node[2453]: I0710 15:56:32.454227    2453 iscsi_util.go:314] iscsi: dev /dev/disk/by-path/ip-10.70.46.1:3260-iscsi-iqn.2016-12.org.gluster-block:d2a42cc7-6a07-47e0-9b96-c25706d2fad2-lun-0 err Could not attach disk: Timeout after 10s

Jul 10 15:56:42 dhcp46-175.lab.eng.blr.redhat.com atomic-openshift-node[2453]: I0710 15:56:42.217983    2453 iscsi_util.go:314] iscsi: dev /dev/disk/by-path/ip-10.70.46.75:3260-iscsi-iqn.2016-12.org.gluster-block:d2a42cc7-6a07-47e0-9b96-c25706d2fad2-lun-0 err Could not attach disk: Timeout after 10s


Only 10.70.46.175 succeeds:
Jul 10 15:56:43 dhcp46-175.lab.eng.blr.redhat.com atomic-openshift-node[2453]: I0710 15:56:43.375884    2453 iscsi_util.go:318] iscsi: dev /dev/disk/by-path/ip-10.70.46.175:3260-iscsi-iqn.2016-12.org.gluster-block:d2a42cc7-6a07-47e0-9b96-c25706d2fad2-lun-0 added to devicepath

Since only the *last* one succeeded, OpenShift quickly checked that there is no /sys/block/dm-* device that has /sys/block/dm-X/slaves/sds (i.e. it considered the path as not part of a multipath map) and mounted it directly.
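The sysfs test described here can be reproduced from a shell. A sketch only; the real check lives in OpenShift's Go code, and the `SYSFS` override variable is an illustrative knob for testing outside a real initiator:

```shell
#!/bin/sh
# Mirror the check: does any /sys/block/dm-X/slaves/ entry reference
# this SCSI device? If not, the path is treated as single-path and
# mounted directly -- which is what happened to sds.
is_multipath_slave() {
    dev="$1"
    for dm in "${SYSFS:-/sys}"/block/dm-*; do
        [ -d "$dm" ] || continue
        if [ -e "$dm/slaves/$dev" ]; then
            echo "$dev is a slave of $(basename "$dm")"
            return 0
        fi
    done
    echo "$dev has no dm holder"
    return 1
}
is_multipath_slave "${1:-sds}" || true
```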

There are several issues with this approach:

1. iscsi target or initiator is slow to attach the volume (that's intended, it's a stress test, right?)
2. OpenShift does not wait a while for multipath to evaluate a device.
3. OpenShift has no configurable parameter for attach timeout, 10s is hardcoded.
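Point 2 could be addressed by polling briefly for a dm holder rather than checking once. A sketch of that idea only; the function name, the timings, and the `SYSFS` override are illustrative assumptions, not OpenShift's actual fix:

```shell
#!/bin/sh
# Poll up to $2 seconds for a dm-* map to claim device $1 before
# falling back to a direct mount, instead of a single instant check.
wait_for_dm_holder() {
    dev="$1"; budget="${2:-10}"; waited=0
    until ls "${SYSFS:-/sys}"/block/dm-*/slaves/"$dev" >/dev/null 2>&1; do
        if [ "$waited" -ge "$budget" ]; then
            echo "no holder for $dev after ${budget}s"
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    echo "holder found for $dev after ${waited}s"
}
wait_for_dm_holder "${1:-sds}" "${2:-1}" || true
```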

Comment 27 Prasanna Kumar Kalever 2018-11-10 17:26:34 UTC
Humble, could you please add the fixed in version ? Thanks!

Comment 28 Humble Chirammal 2018-11-13 07:12:27 UTC
(In reply to Prasanna Kumar Kalever from comment #27)
> Humble, could you please add the fixed in version ? Thanks!

atomic-openshift-3.11.23-1 and above.

Comment 31 errata-xmlrpc 2019-02-07 03:38:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0285