Description of problem:
We created several iSCSI LUNs and the corresponding PVs for these LUNs. Several projects consume these PVs via PVCs. The pods run on 2 of the 3 nodes, which means 2 iSCSI mounts were on 1 node; all of these nodes had active iSCSI sessions. We then scaled down one of the pods on the node with the 2 iSCSI mounts. The second pod then crashed because the iSCSI session was gone as well. It looks like DetachDisk in https://github.com/openshift/origin/blob/85eb37b34f0657631592356d020cef5a58470f8e/vendor/k8s.io/kubernetes/pkg/volume/iscsi/iscsi_util.go#L165 performs the logout too early.

Version-Release number of selected component (if applicable): 3.3
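As a rough illustration of what we suspect is happening (this is not the actual origin/kubernetes code; aside from DetachDisk and the iscsiadm call, all names and values below are placeholders of ours), the detach path appears to boil down to: unmount the pod's mount, count the remaining references to the same target, and log the session out when the count hits zero.

###
package main

import (
	"fmt"
	"os/exec"
)

// countMountsWithPrefix stands in for the real reference counting done by
// getDevicePrefixRefCount in iscsi_util.go (sketched in a later comment);
// it is stubbed here only so that the control flow below compiles on its own.
func countMountsWithPrefix(devicePrefix string) (int, error) {
	return 0, nil // placeholder
}

// detachDiskSketch is a simplified, hypothetical rendering of the suspected
// flow in DetachDisk: unmount, count remaining mounts that still belong to
// the same target, and log the whole session out when the count drops to
// zero. Our report is that this logout fires while another pod on the node
// still depends on the same session.
func detachDiskSketch(mntPath, devicePrefix, portal, iqn string) error {
	if out, err := exec.Command("umount", mntPath).CombinedOutput(); err != nil {
		return fmt.Errorf("unmount %s failed: %v: %s", mntPath, err, out)
	}
	refCount, err := countMountsWithPrefix(devicePrefix)
	if err != nil {
		return err
	}
	if refCount == 0 {
		// The step that takes the session away from the still-running pod.
		out, err := exec.Command("iscsiadm", "-m", "node",
			"-p", portal, "-T", iqn, "--logout").CombinedOutput()
		if err != nil {
			return fmt.Errorf("iscsiadm logout failed: %v: %s", err, out)
		}
	}
	return nil
}

func main() {
	// Illustrative values only.
	_ = detachDiskSketch(
		"/var/lib/kubelet/pods/<uid>/volumes/kubernetes.io~iscsi/vol1",
		"10.0.0.1:3260-iqn.2016-01.com.example:storage",
		"10.0.0.1:3260",
		"iqn.2016-01.com.example:storage")
}
###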
Hi. The customer has taken a deep look into the GitHub sources and we have seen the following.

It starts in the function DetachDisk
https://github.com/openshift/origin/blob/master/vendor/k8s.io/kubernetes/pkg/volume/iscsi/iscsi_util.go#L165
which calls the function getDevicePrefixRefCount
https://github.com/openshift/origin/blob/master/vendor/k8s.io/kubernetes/pkg/volume/iscsi/iscsi_util.go#L182
https://github.com/openshift/origin/blob/master/vendor/k8s.io/kubernetes/pkg/volume/iscsi/iscsi_util.go#L66
which calls mounter.List()
https://github.com/openshift/origin/blob/master/vendor/k8s.io/kubernetes/pkg/util/mount/mount_linux.go#L156
which calls listProcMounts, which calls readProcMounts twice, which calls readProcMountsFrom
https://github.com/openshift/origin/blob/master/vendor/k8s.io/kubernetes/pkg/util/mount/mount_linux.go#L284

These are the relevant lines:
###
hash := adler32.New()
...
*out = append(*out, mp)
...
return hash.Sum32(), nil
###
They create the hashes that are compared in
https://github.com/openshift/origin/blob/master/vendor/k8s.io/kubernetes/pkg/util/mount/mount_linux.go#L264
Based on the result of this comparison, the iscsiadm logout is called.
https://github.com/openshift/origin/blob/master/vendor/k8s.io/kubernetes/pkg/volume/iscsi/iscsi_util.go#L192

We now think that the logout is called even though there are still mounts on this node from running pods.
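To make the chain above concrete, here is a minimal, self-contained Go sketch of the two pieces traced above: a /proc/mounts read that checksums its content with adler32 (so that two consecutive reads can be compared for consistency, as in the quoted lines), and a prefix-based reference count over the resulting mount list. The names used here (mountPoint, readMounts, devicePrefixRefCount) and the details of the prefix match are our simplification, not the upstream implementation; the linked sources remain authoritative.

###
package main

import (
	"fmt"
	"hash/adler32"
	"io/ioutil"
	"strings"
)

// Minimal local stand-in for the mount entries returned by mounter.List();
// the real type lives in pkg/util/mount.
type mountPoint struct {
	Device string
	Path   string
	Type   string
}

// readMounts sketches the readProcMountsFrom idea quoted above: parse the
// mount table and compute an adler32 checksum over its lines so that two
// consecutive reads can be compared for a consistent snapshot.
func readMounts(path string) ([]mountPoint, uint32, error) {
	content, err := ioutil.ReadFile(path)
	if err != nil {
		return nil, 0, err
	}
	hash := adler32.New()
	var out []mountPoint
	for _, line := range strings.Split(string(content), "\n") {
		fields := strings.Fields(line)
		if len(fields) < 3 { // the real parser checks the full field count
			continue
		}
		hash.Write([]byte(line))
		out = append(out, mountPoint{Device: fields[0], Path: fields[1], Type: fields[2]})
	}
	return out, hash.Sum32(), nil
}

// devicePrefixRefCount sketches the prefix-based reference count that
// DetachDisk relies on: how many current mounts still refer to a path
// starting with the given prefix. Whether the real code matches against the
// mount's device or its path, and how the prefix is built, is glossed over
// here and should be read from the linked iscsi_util.go.
func devicePrefixRefCount(mounts []mountPoint, devicePrefix string) int {
	refCount := 0
	for _, mp := range mounts {
		if strings.HasPrefix(mp.Path, devicePrefix) {
			refCount++
		}
	}
	return refCount
}

func main() {
	mounts, checksum, err := readMounts("/proc/mounts")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("read %d mounts (checksum %d)\n", len(mounts), checksum)
	// Illustrative prefix only.
	fmt.Println("refs:", devicePrefixRefCount(mounts, "/var/lib/kubelet/plugins/kubernetes.io/iscsi/"))
}
###

As far as we can tell, the prefix-based count exists because one iSCSI session can back several LUNs/devices, and the logout should only happen once no mount for any of them remains. Our suspicion is that the count reaches zero (or the logout is otherwise triggered) while another running pod on the node still depends on the same session.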
This bug needs to be cloned twice, so that we have one bug each for OCP 3.5, 3.4, and 3.3.
This has been merged into OCP and is in OCP v3.4.1.8 or newer.
Verified on:
openshift v3.3.1.15
kubernetes v1.3.0+52492b4
etcd 2.3.0+git

Steps:
1. Create two pods on the same node that mount the same iSCSI volume over the same session; make sure their only difference is the LUN number (a sketch of such a pair of pod specs follows below).
2. Verify that both pods are running.
3. Delete one of the pods.
4. The remaining pod is still running.
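For reference, a sketch of step 1 expressed with the Go client types. The image, portal, IQN, and node name are illustrative values, and the import paths reflect current upstream packaging rather than the exact vendoring used in 3.3; the two pods differ only in the Lun field, so they share a single iSCSI session on the node.

###
package main

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// iscsiPod builds a pod that mounts one LUN from the shared target; the two
// verification pods are identical except for the LUN number.
func iscsiPod(name string, lun int32) *v1.Pod {
	return &v1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: name},
		Spec: v1.PodSpec{
			NodeName: "node1", // pin both pods to the same node (illustrative)
			Containers: []v1.Container{{
				Name:    "app",
				Image:   "registry.access.redhat.com/rhel7", // illustrative image
				Command: []string{"sleep", "3600"},
				VolumeMounts: []v1.VolumeMount{{
					Name:      "iscsi-vol",
					MountPath: "/mnt/iscsi",
				}},
			}},
			Volumes: []v1.Volume{{
				Name: "iscsi-vol",
				VolumeSource: v1.VolumeSource{
					ISCSI: &v1.ISCSIVolumeSource{
						TargetPortal: "10.0.0.1:3260",                   // illustrative portal
						IQN:          "iqn.2016-01.com.example:storage", // illustrative IQN
						Lun:          lun,                               // the only difference
						FSType:       "ext4",
					},
				},
			}},
		},
	}
}

func main() {
	podA := iscsiPod("iscsi-pod-a", 0)
	podB := iscsiPod("iscsi-pod-b", 1)
	fmt.Println(podA.Name, podB.Name) // create these with oc/kubectl or client-go
}
###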
Also verified on openshift v3.4.1.8 kubernetes v1.4.0+776c994 etcd 3.1.0-rc.0
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:0512