Bug 1419607 - iscsi logout even still other pods use iscsi on same node
Summary: iscsi logout even still other pods use iscsi on same node
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Storage
Version: 3.4.0
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 3.4.z
Assignee: hchen
QA Contact: Jianwei Hou
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2017-02-06 15:15 UTC by Aleks Lazic
Modified: 2020-04-15 15:13 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1426775 1426778 (view as bug list)
Environment:
Last Closed: 2017-03-15 20:02:35 UTC
Target Upstream Version:



Links
System | ID | Private | Priority | Status | Summary | Last Updated
Origin (Github) | 12990 | 0 | None | None | None | 2017-02-16 14:57:47 UTC
Red Hat Product Errata | RHBA-2017:0512 | 0 | normal | SHIPPED_LIVE | OpenShift Container Platform 3.4.1.10, 3.3.1.17, and 3.2.1.28 bug fix update | 2017-03-16 00:01:17 UTC

Description Aleks Lazic 2017-02-06 15:15:43 UTC
Description of problem:

We created several iSCSI LUNs and the PVs for these LUNs.
Several projects consume these PVs via PVCs.
The pods run on 2 of the 3 nodes. This means that 2 iSCSI mounts were on 1 node.

All of these nodes had iSCSI sessions.

We then scaled down one pod on the node with 2 iSCSI mounts.
The second pod then crashed because the iSCSI session was gone as well.

It looks like DetachDisk in
https://github.com/openshift/origin/blob/85eb37b34f0657631592356d020cef5a58470f8e/vendor/k8s.io/kubernetes/pkg/volume/iscsi/iscsi_util.go#L165

performs the logout too early.

Version-Release number of selected component (if applicable):

3.3

Comment 2 Aleks Lazic 2017-02-07 09:39:17 UTC
Hi.

The customer has taken a deep look at the GitHub sources, and we have seen the following.

The call chain starts in the function DetachDisk

https://github.com/openshift/origin/blob/master/vendor/k8s.io/kubernetes/pkg/volume/iscsi/iscsi_util.go#L165

which calls the function getDevicePrefixRefCount
https://github.com/openshift/origin/blob/master/vendor/k8s.io/kubernetes/pkg/volume/iscsi/iscsi_util.go#L182
https://github.com/openshift/origin/blob/master/vendor/k8s.io/kubernetes/pkg/volume/iscsi/iscsi_util.go#L66


which calls mounter.List()

https://github.com/openshift/origin/blob/master/vendor/k8s.io/kubernetes/pkg/util/mount/mount_linux.go#L156

which calls listProcMounts
which calls readProcMounts twice
which calls readProcMountsFrom

https://github.com/openshift/origin/blob/master/vendor/k8s.io/kubernetes/pkg/util/mount/mount_linux.go#L284

These are the relevant lines:

###
hash := adler32.New()
...
*out = append(*out, mp)
...
return hash.Sum32(), nil
###

They create the hashes that are compared in

https://github.com/openshift/origin/blob/master/vendor/k8s.io/kubernetes/pkg/util/mount/mount_linux.go#L264

Based on the result of this comparison, iscsiadm logout is called.

https://github.com/openshift/origin/blob/master/vendor/k8s.io/kubernetes/pkg/volume/iscsi/iscsi_util.go#L192

We now think that the logout is called even if there are still mounts on this node from running pods.
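
The behaviour we would expect can be sketched as a reference count over the remaining mounts: log out of the session only when no mounted path still belongs to it. The sketch below is an illustration under assumptions, not the code in iscsi_util.go. The function names countRefs and shouldLogout, the portal, the IQN, and the path layout are all hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// countRefs returns how many mounted paths start with prefix.
func countRefs(mountPaths []string, prefix string) int {
	n := 0
	for _, p := range mountPaths {
		if strings.HasPrefix(p, prefix) {
			n++
		}
	}
	return n
}

// shouldLogout reports whether the iSCSI session identified by
// sessionPrefix has no mounts left among the remaining paths.
func shouldLogout(remaining []string, sessionPrefix string) bool {
	return countRefs(remaining, sessionPrefix) == 0
}

func main() {
	// Hypothetical session prefix (portal plus IQN) and path layout.
	session := "/plugins/iscsi/10.0.0.1:3260-iqn.2017-02.com.example:target"

	// One pod was scaled down; the other pod's LUN is still mounted.
	remaining := []string{session + "-lun-1"}
	fmt.Println(shouldLogout(remaining, session)) // prints "false"

	// Nothing from this session is mounted any more.
	fmt.Println(shouldLogout(nil, session)) // prints "true"
}
```

Note that a plain string prefix can also match an unrelated session whose name merely begins with the same characters, so a real check would need a stricter comparison.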

Comment 15 Troy Dawson 2017-02-21 22:31:11 UTC
This bug needs to be cloned twice. We need one bug each for OCP 3.5, 3.4, and 3.3.

Comment 16 Troy Dawson 2017-02-24 20:26:01 UTC
This has been merged into ocp and is in OCP v3.4.1.8 or newer.

Comment 18 Jianwei Hou 2017-02-27 06:03:24 UTC
Verified on 
openshift v3.3.1.15
kubernetes v1.3.0+52492b4
etcd 2.3.0+git

Steps:
1. Create two Pods on the same node that mount iSCSI volumes over the same session. Make sure their only difference is the LUN number.
2. Verify that both Pods are running.
3. Delete one of the Pods.
4. Verify that the remaining Pod is still running.
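
As a possible aid for step 1, the following sketch checks that two volume names differ only in their LUN number. It assumes, purely for illustration, that each name ends in "-lun-<number>". The helper names splitLUN and sameSessionDifferentLUN and the sample names are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// splitLUN splits a name of the assumed form "<session>-lun-<number>"
// into its session part and its LUN part.
func splitLUN(name string) (session, lun string, ok bool) {
	i := strings.LastIndex(name, "-lun-")
	if i < 0 {
		return "", "", false
	}
	return name[:i], name[i+len("-lun-"):], true
}

// sameSessionDifferentLUN reports whether a and b share the session
// part and differ only in the LUN number.
func sameSessionDifferentLUN(a, b string) bool {
	sa, la, okA := splitLUN(a)
	sb, lb, okB := splitLUN(b)
	return okA && okB && sa == sb && la != lb
}

func main() {
	// Hypothetical names for the two test volumes.
	a := "10.0.0.1:3260-iqn.2017-02.com.example:target-lun-0"
	b := "10.0.0.1:3260-iqn.2017-02.com.example:target-lun-1"
	fmt.Println(sameSessionDifferentLUN(a, b)) // prints "true"
}
```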

Comment 19 Jianwei Hou 2017-02-27 06:05:04 UTC
Also verified on 
openshift v3.4.1.8
kubernetes v1.4.0+776c994
etcd 3.1.0-rc.0

Comment 21 errata-xmlrpc 2017-03-15 20:02:35 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:0512

