Description of problem:
We created several iSCSI LUNs and the PVs for these LUNs.
We have several projects which consume these PVs via PVCs.
The 2 pods run on 2 of the 3 nodes. This means that 2 iSCSI mounts were on 1 node.
There were iSCSI sessions on all of these nodes.
We then scaled down one pod on the node with 2 iSCSI mounts.
Now the second pod crashes because the iSCSI session is also gone.
It looks like the DetachDisk in
performs the logout too early.
Version-Release number of selected component (if applicable):
The customer has taken a deep look into the GitHub sources and we have seen the following.
At the beginning of the function DetachDisk, the function getDevicePrefixRefCount is called,
which calls the function getDevicePrefixRefCount
which calls mounter.List()
which calls listProcMounts
which calls readProcMounts twice
which calls readProcMountsFrom
Here are these lines:
hash := adler32.New()
*out = append(*out, mp)
return hash.Sum32(), nil
which creates the hashes which are compared in
Based on the result of this comparison, the iscsiadm logout is called.
We now think that the logout is called even if there are still mounts on this node from running pods.
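The expected behaviour can be sketched as a reference count over mount paths that share a session prefix. This is an illustration only, not the actual DetachDisk code: mountPoint, countPrefixRefs, shouldLogout and the mount paths are all hypothetical, and the sketch assumes that mounts belonging to one iSCSI session share a common path prefix and differ only in the LUN suffix.

```go
package main

import (
	"fmt"
	"strings"
)

// mountPoint is a hypothetical, reduced form of a mount table entry.
type mountPoint struct {
	Device string
	Path   string
}

// countPrefixRefs counts the mounts whose path starts with prefix.
func countPrefixRefs(mounts []mountPoint, prefix string) int {
	refs := 0
	for _, mp := range mounts {
		if strings.HasPrefix(mp.Path, prefix) {
			refs++
		}
	}
	return refs
}

// shouldLogout reports whether the session identified by prefix is unused,
// given the mounts that remain after the unmount.
func shouldLogout(remaining []mountPoint, prefix string) bool {
	return countPrefixRefs(remaining, prefix) == 0
}

func main() {
	// Hypothetical layout: two LUNs reached through the same portal and IQN.
	prefix := "/mnt/iscsi/192.0.2.10:3260-iqn.2017-01.example:target"
	mounts := []mountPoint{
		{Device: "/dev/sdb", Path: prefix + "-lun-0"},
		{Device: "/dev/sdc", Path: prefix + "-lun-1"},
	}

	// The pod using LUN 0 is removed; LUN 1 is still mounted.
	remaining := mounts[1:]
	fmt.Println(shouldLogout(remaining, prefix)) // false: keep the session

	// Only after the last mount is gone is a logout safe.
	fmt.Println(shouldLogout(nil, prefix)) // true
}
```

With this check, scaling down one of two pods on a node would leave the session in place for the remaining pod.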
This bug needs to be cloned twice. We need one bug each for OCP 3.5, 3.4 and 3.3.
This has been merged into OCP and is in OCP v220.127.116.11 or newer.
1. Create two Pods on the same node that mount the same iSCSI volume over the same session. Make sure their only difference is the LUN number.
2. Verify Pods are both running.
3. Delete one of the Pods.
4. The remaining Pod is still running.
Also verified on
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.