Bug 1595763

Summary: "iSCSI Login negotiation failed" messages in logs while draining pods from one node to another (depends on gluster-block bug 1597320)
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Neha Berry <nberry>
Component: kubernetes
Assignee: Prasanna Kumar Kalever <prasanna.kalever>
Status: CLOSED DUPLICATE
QA Contact: Arun Kumar <arukumar>
Severity: high
Docs Contact:
Priority: high
Version: cns-3.10
CC: amark, jmulligan, kramdoss, madam, pkarampu, pprakash, prasanna.kalever, rhs-bugs, vbellur, xiubli
Target Milestone: ---
Keywords: Tracking
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1597320 (view as bug list)
Environment:
Last Closed: 2020-03-12 12:27:20 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1597320
Bug Blocks:

Description Neha Berry 2018-06-27 13:39:56 UTC
Description of problem:
++++++++++++
We were trying to reproduce the issue from bug https://bugzilla.redhat.com/show_bug.cgi?id=1550279.

The setup was using OCP3.10+CNS3.10 and had multiple APP pods with block devices attached.

As per the test case steps, we were constantly draining pods from nodes. After one such drain attempt, while the pods were coming up on another node, it was seen that for 2-3 devices one path failed to log in: only 2 out of 3 paths were actually logged in from the new initiator node. The logs also showed "iSCSI Login negotiation failed" messages.

As per the gluster-block dev:
"
I can notice that a few of the devices have < 3 paths. Also the dmesg has

[67483.712820] Initiator is requesting CSG: 1, has not been successfully authenticated, and the Target is enforcing iSCSI Authentication, login failed.                                                            
[67483.713583] iSCSI Login negotiation failed.                                   
[67483.714266]  connection20:0: detected conn error (1020)                       
[67483.739375] scsi host53: iSCSI Initiator over TCP/IP

This means that in your setup you have hit the authentication issue. In order to further debug the exact device and show that the CHAP credentials were not supplied to that path, we need the machine kept alive.

"
As seen from the output below, multipath -ll listed 2 paths for those devices instead of 3:

mpathe (360014059f617f130d7c47d390aa82389) dm-60 LIO-ORG ,TCMU device     
size=2.0G features='1 queue_if_no_path' hwhandler='0' wp=rw
|-+- policy='round-robin 0' prio=1 status=active
| `- 93:0:0:0 sdau 66:224 active ready running
`-+- policy='round-robin 0' prio=1 status=enabled
  `- 95:0:0:0 sdaw 67:0   active ready running
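To spot affected devices quickly, the path lines per mpath device in the multipath -ll output above can be counted. A minimal sketch, assuming the usual output layout (one "mpathX ..." header per device, one H:C:T:L tuple per path line) and the expected path count of 3; this is illustrative only, not part of any shipped tooling:

```python
import re

def count_paths(multipath_ll_output):
    """Count path lines per mpath device in `multipath -ll` output.

    Heuristic (assumption): a device section starts at a line beginning
    with 'mpath'; a path line contains an H:C:T:L tuple like '93:0:0:0'.
    """
    counts = {}
    current = None
    for line in multipath_ll_output.splitlines():
        m = re.match(r"^(mpath\S+)", line)
        if m:
            current = m.group(1)
            counts[current] = 0
        elif current and re.search(r"\b\d+:\d+:\d+:\d+\b", line):
            counts[current] += 1
    return counts

# Sample taken from the output shown in this bug.
sample = """\
mpathe (360014059f617f130d7c47d390aa82389) dm-60 LIO-ORG ,TCMU device
size=2.0G features='1 queue_if_no_path' hwhandler='0' wp=rw
|-+- policy='round-robin 0' prio=1 status=active
| `- 93:0:0:0 sdau 66:224 active ready running
`-+- policy='round-robin 0' prio=1 status=enabled
  `- 95:0:0:0 sdaw 67:0   active ready running
"""

# Flag any device with fewer than the expected 3 paths.
for dev, n in count_paths(sample).items():
    if n < 3:
        print(f"{dev}: only {n} of 3 expected paths logged in")
```

Run against the sample above, this flags mpathe with 2 paths, matching the failed-login symptom described here.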


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Out of 4 nodes, disabled scheduling on 3 nodes.
2. Created 15 mongodb app pods, which in turn used block PVCs.
3. Drained one node.
4. All pods moved to the new node.
5. Repeated the above steps multiple times. In one attempt, it was seen that a device had only 2 paths instead of the expected 3.

Actual results:
++++++++++++
Some devices faced login issues and had only 2 paths instead of 3.

Expected results:
On the new node, all 3 paths should be restored after a drain from the old node.

Additional info:

Currently we do not have fresh logs. The logs from bug https://bugzilla.redhat.com/show_bug.cgi?id=1550279 include the dmesg logs from the nodes.

http://rhsqe-repo.lab.eng.blr.redhat.com/cns/bugs/BZ-1550279/

Comment 8 Humble Chirammal 2019-07-09 09:20:26 UTC
Changing the status to ASSIGNED to reflect the status of the gluster-block bug on which this bugzilla depends.