Bug 1598740
| Summary: | On app pod restart, mpath device name is not mapped/created for some blockvolumes in the new initiator side | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Neha Berry <nberry> |
| Component: | gluster-block | Assignee: | Prasanna Kumar Kalever <prasanna.kalever> |
| Status: | CLOSED ERRATA | QA Contact: | Neha Berry <nberry> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | cns-3.10 | CC: | bgoyal, hchiramm, jsafrane, kramdoss, madam, pkarampu, pprakash, prasanna.kalever, rhs-bugs, sankarshan, vbellur, xiubli |
| Target Milestone: | --- | Keywords: | ZStream |
| Target Release: | OCS 3.11.1 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | atomic-openshift-3.11.23-1 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| | 1599742 | Environment: | |
| Last Closed: | 2019-02-07 03:38:29 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1599217, 1599742, 1609703, 1637413, 1637422 | | |
| Bug Blocks: | 1641915, 1644154 | | |
Description
Neha Berry
2018-07-06 11:03:08 UTC
Created attachment 1457822: journalctl output

Neha reproduced the bug using the build with enhanced logging from comment #2. OpenShift attached three paths of one volume:

/dev/disk/by-path/ip-10.70.46.1:3260-iscsi-iqn.2016-12.org.gluster-block:d2a42cc7-6a07-47e0-9b96-c25706d2fad2-lun-0 -> ../../sdl
/dev/disk/by-path/ip-10.70.46.175:3260-iscsi-iqn.2016-12.org.gluster-block:d2a42cc7-6a07-47e0-9b96-c25706d2fad2-lun-0 -> ../../sds
/dev/disk/by-path/ip-10.70.46.75:3260-iscsi-iqn.2016-12.org.gluster-block:d2a42cc7-6a07-47e0-9b96-c25706d2fad2-lun-0 -> ../../sdp

It then mounted /dev/sds instead of the multipath device /dev/dm-*.

The logs show that OpenShift did initiate the attach, but it timed out waiting for 10.70.46.1 and 10.70.46.75:

Jul 10 15:56:32 dhcp46-175.lab.eng.blr.redhat.com atomic-openshift-node[2453]: I0710 15:56:32.454227 2453 iscsi_util.go:314] iscsi: dev /dev/disk/by-path/ip-10.70.46.1:3260-iscsi-iqn.2016-12.org.gluster-block:d2a42cc7-6a07-47e0-9b96-c25706d2fad2-lun-0 err Could not attach disk: Timeout after 10s
Jul 10 15:56:42 dhcp46-175.lab.eng.blr.redhat.com atomic-openshift-node[2453]: I0710 15:56:42.217983 2453 iscsi_util.go:314] iscsi: dev /dev/disk/by-path/ip-10.70.46.75:3260-iscsi-iqn.2016-12.org.gluster-block:d2a42cc7-6a07-47e0-9b96-c25706d2fad2-lun-0 err Could not attach disk: Timeout after 10s

Only 10.70.46.175 succeeded:

Jul 10 15:56:43 dhcp46-175.lab.eng.blr.redhat.com atomic-openshift-node[2453]: I0710 15:56:43.375884 2453 iscsi_util.go:318] iscsi: dev /dev/disk/by-path/ip-10.70.46.175:3260-iscsi-iqn.2016-12.org.gluster-block:d2a42cc7-6a07-47e0-9b96-c25706d2fad2-lun-0 added to devicepath

Because the only path that succeeded was the *last* one, OpenShift immediately checked whether any /sys/block/dm-* device lists it as /sys/block/dm-X/slaves/sds, found none (i.e., concluded the path is not part of a multipath device), and mounted it. There are several issues with this approach:

1. The iSCSI target or initiator is slow to attach the volume (that's intended, since this is a stress test, right?).
2. OpenShift does not give multipath any time to evaluate a newly attached device.
3. OpenShift has no configurable attach timeout; 10s is hardcoded.

Humble, could you please add the Fixed In Version? Thanks!

(In reply to Prasanna Kumar Kalever from comment #27)
> Humble, could you please add the Fixed In Version? Thanks!

atomic-openshift-3.11.23-1 and above.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0285