Bug 1419607

Summary: iSCSI logout even though other pods still use iSCSI on the same node
Product: OpenShift Container Platform
Reporter: Aleks Lazic <aleks>
Component: Storage
Assignee: hchen
Status: CLOSED ERRATA
QA Contact: Jianwei Hou <jhou>
Severity: high
Priority: unspecified
Version: 3.4.0
CC: aos-bugs, bchilds, bmchugh, eparis, hchen, tdawson
Target Release: 3.4.z
Hardware: x86_64
OS: Linux
Clones: 1426775, 1426778
Type: Bug
Last Closed: 2017-03-15 20:02:35 UTC

Description Aleks Lazic 2017-02-06 15:15:43 UTC
Description of problem:

We have created several iSCSI LUNs and the PVs for these LUNs.
We have several projects which consume these PVs via PVCs.
The two pods run on two of the three nodes. This means that two iSCSI mounts were on one node.

There were iSCSI sessions on all of these nodes.

We then scaled down one pod on the node with the two iSCSI mounts.
The second pod then crashed because the iSCSI session was gone as well.

It looks like the DetachDisk in
https://github.com/openshift/origin/blob/85eb37b34f0657631592356d020cef5a58470f8e/vendor/k8s.io/kubernetes/pkg/volume/iscsi/iscsi_util.go#L165

performs the logout too early.
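
To illustrate what we mean, here is a minimal Go sketch (the device paths are made-up examples, not our environment and not the OpenShift code): two LUNs exposed through the same portal and IQN share a single iSCSI session, so a logout issued while detaching one of them also takes the session away from the other.

###
package main

import (
	"fmt"
	"strings"
)

// Two hypothetical iSCSI block devices that differ only in their LUN number.
// They point at the same portal and IQN and therefore use the same session.
func main() {
	devA := "/dev/disk/by-path/ip-192.168.0.10:3260-iscsi-iqn.2017-02.com.example:storage-lun-0"
	devB := "/dev/disk/by-path/ip-192.168.0.10:3260-iscsi-iqn.2017-02.com.example:storage-lun-1"

	// Strip the trailing "-lun-N" to get the part that identifies the session.
	sessionPrefix := func(dev string) string {
		if i := strings.LastIndex(dev, "-lun-"); i != -1 {
			return dev[:i]
		}
		return dev
	}

	// Same prefix => same session: a logout while detaching devA also
	// removes the session that devB still needs.
	fmt.Println(sessionPrefix(devA) == sessionPrefix(devB)) // prints: true
}
###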

Version-Release number of selected component (if applicable):

3.3

Comment 2 Aleks Lazic 2017-02-07 09:39:17 UTC
Hi.

The customer has taken a deep look into the GitHub sources and we have seen the following.

Starting in the function DetachDisk

https://github.com/openshift/origin/blob/master/vendor/k8s.io/kubernetes/pkg/volume/iscsi/iscsi_util.go#L165

the function getDevicePrefixRefCount is called
https://github.com/openshift/origin/blob/master/vendor/k8s.io/kubernetes/pkg/volume/iscsi/iscsi_util.go#L182
https://github.com/openshift/origin/blob/master/vendor/k8s.io/kubernetes/pkg/volume/iscsi/iscsi_util.go#L66


which calls mounter.List()

https://github.com/openshift/origin/blob/master/vendor/k8s.io/kubernetes/pkg/util/mount/mount_linux.go#L156

which calls listProcMounts
which calls readProcMounts twice
which calls readProcMountsFrom

https://github.com/openshift/origin/blob/master/vendor/k8s.io/kubernetes/pkg/util/mount/mount_linux.go#L284

These are the relevant lines:

###
hash := adler32.New()
...
*out = append(*out, mp)
...
return hash.Sum32(), nil
###

which computes the hashes that are compared in

https://github.com/openshift/origin/blob/master/vendor/k8s.io/kubernetes/pkg/util/mount/mount_linux.go#L264
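
As far as we can tell, the two reads and the hash comparison are a consistency check on /proc/mounts. A simplified sketch of that pattern (not the upstream mount_linux.go code; the helper name is ours):

###
package main

import (
	"fmt"
	"hash/adler32"
	"os"
)

// Read a file repeatedly until two consecutive reads produce the same
// adler32 checksum, i.e. the mount table did not change while being read.
func consistentRead(filename string, maxTries int) ([]byte, error) {
	prev, err := os.ReadFile(filename)
	if err != nil {
		return nil, err
	}
	for i := 0; i < maxTries; i++ {
		cur, err := os.ReadFile(filename)
		if err != nil {
			return nil, err
		}
		if adler32.Checksum(prev) == adler32.Checksum(cur) {
			return cur, nil
		}
		prev = cur
	}
	return nil, fmt.Errorf("no consistent read of %s after %d tries", filename, maxTries)
}

func main() {
	data, err := consistentRead("/proc/mounts", 3)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("read %d bytes from /proc/mounts\n", len(data))
}
###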

Based on the mount list obtained this way, the iscsiadm logout is then called.

https://github.com/openshift/origin/blob/master/vendor/k8s.io/kubernetes/pkg/volume/iscsi/iscsi_util.go#L192

We now think that the logout is called even though there are still mounts on this node from running pods.
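
Put differently, the logic we are looking at seems to boil down to a prefix-based reference count over the current mounts. Here is a much simplified paraphrase (not the actual iscsi_util.go code; the type, the sample device paths, and the mount points below are made up for illustration):

###
package main

import (
	"fmt"
	"strings"
)

// One entry per line of /proc/mounts; the real code gets this list
// from mounter.List().
type mountPoint struct {
	Device string
	Path   string
}

// Count how many current mounts use a device that shares the given
// per-session prefix (the device path with its "-lun-N" suffix removed).
func devicePrefixRefCount(mounts []mountPoint, devicePrefix string) int {
	refCount := 0
	for _, mp := range mounts {
		if strings.HasPrefix(mp.Device, devicePrefix) {
			refCount++
		}
	}
	return refCount
}

func main() {
	// Hypothetical state of the node: two pods, two LUNs, one session.
	mounts := []mountPoint{
		{
			Device: "/dev/disk/by-path/ip-192.168.0.10:3260-iscsi-iqn.2017-02.com.example:storage-lun-0",
			Path:   "/var/lib/origin/pods/pod-a/volumes/kubernetes.io~iscsi/pv-a",
		},
		{
			Device: "/dev/disk/by-path/ip-192.168.0.10:3260-iscsi-iqn.2017-02.com.example:storage-lun-1",
			Path:   "/var/lib/origin/pods/pod-b/volumes/kubernetes.io~iscsi/pv-b",
		},
	}
	prefix := "/dev/disk/by-path/ip-192.168.0.10:3260-iscsi-iqn.2017-02.com.example:storage"

	// We would expect "iscsiadm ... --logout" to run only once this count
	// drops to zero; our observation is that the logout happens while the
	// remaining pod's mount still keeps the count non-zero.
	fmt.Println(devicePrefixRefCount(mounts, prefix)) // prints: 2
}
###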

Comment 15 Troy Dawson 2017-02-21 22:31:11 UTC
This bug needs to be cloned twice. We need one bug each for OCP 3.5, 3.4 and 3.3.

Comment 16 Troy Dawson 2017-02-24 20:26:01 UTC
This has been merged into OCP and is in OCP v3.4.1.8 or newer.

Comment 18 Jianwei Hou 2017-02-27 06:03:24 UTC
Verified on 
openshift v3.3.1.15
kubernetes v1.3.0+52492b4
etcd 2.3.0+git

Steps:
1. Create two Pods on the same node that mount iSCSI volumes over the same session. Make sure their only difference is the LUN number.
2. Verify both Pods are running.
3. Delete one of the Pods.
4. Check that the remaining Pod is still running.

Comment 19 Jianwei Hou 2017-02-27 06:05:04 UTC
Also verified on 
openshift v3.4.1.8
kubernetes v1.4.0+776c994
etcd 3.1.0-rc.0

Comment 21 errata-xmlrpc 2017-03-15 20:02:35 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:0512