Bug 1595763 - "iSCSI Login negotiation failed" messages in logs while draining pods from one node to another (depends on gluster-block bug 1597320 )
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: kubernetes
Version: cns-3.10
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Assignee: Humble Chirammal
QA Contact: Prasanth
Depends On: 1597320
Reported: 2018-06-27 13:39 UTC by Neha Berry
Modified: 2020-02-18 02:18 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1597320
Last Closed:
Target Upstream Version:

Attachments

System ID Priority Status Summary Last Updated
Red Hat Bugzilla 1597320 high ASSIGNED [Tracking] "iSCSI Login negotiation failed" messages in logs while draining pods from one node to another 2019-11-22 15:42:10 UTC

Internal Links: 1597320

Description Neha Berry 2018-06-27 13:39:56 UTC
Description of problem:
We were trying to reproduce the issue from Bug 1550279 (https://bugzilla.redhat.com/show_bug.cgi?id=1550279).

The setup was using OCP 3.10 + CNS 3.10 and had multiple app pods with block devices attached.

As per the test case steps, we were repeatedly draining pods from nodes. After one such drain attempt, while the pods were coming up on another node, one path failed to log in for 2-3 devices; only 2 of the 3 paths were actually logged in from the new initiator node. The logs also showed "iSCSI Login negotiation failed" messages.

As per the gluster-block developer:
I can notice that a few of the devices have fewer than 3 paths. Also, dmesg shows:

[67483.712820] Initiator is requesting CSG: 1, has not been successfully authenticated, and the Target is enforcing iSCSI Authentication, login failed.                                                            
[67483.713583] iSCSI Login negotiation failed.                                   
[67483.714266]  connection20:0: detected conn error (1020)                       
[67483.739375] scsi host53: iSCSI Initiator over TCP/IP

This means that in your setup you have hit the authentication issue. To debug further, we need to identify the exact device and show that CHAP credentials were not supplied to that path, so we need the machine kept alive.
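The CHAP check described above can be sketched mechanically. This is a minimal, hypothetical helper (not from the bug report) that scans the node records printed by open-iscsi's iscsiadm -m node -o show and flags any record whose auth method is not CHAP; the IQNs in the sample are illustrative only.

```shell
# Hypothetical helper: given `iscsiadm -m node -o show` output on stdin,
# print the IQN of every node record whose auth method is not CHAP
# (i.e. a portal that would fail login against a CHAP-enforcing target).
find_non_chap_nodes() {
  awk '
    /^node\.name = /                      { name = $3 }
    /^node\.session\.auth\.authmethod = / { auth = $3 }
    /^# END RECORD/ {
      if (auth != "CHAP") print name
      name = ""; auth = ""
    }'
}

# Illustrative two-record sample: one portal with CHAP, one without.
sample=$(cat <<'EOF'
# BEGIN RECORD 6.2.0.874
node.name = iqn.2016-12.org.gluster-block:aaaa-bbbb
node.session.auth.authmethod = CHAP
# END RECORD
# BEGIN RECORD 6.2.0.874
node.name = iqn.2016-12.org.gluster-block:cccc-dddd
node.session.auth.authmethod = None
# END RECORD
EOF
)

printf '%s\n' "$sample" | find_non_chap_nodes
# prints: iqn.2016-12.org.gluster-block:cccc-dddd
```

Any IQN printed here would be a candidate for the path that fails login with "Target is enforcing iSCSI Authentication".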

As seen from the output, multipath -ll listed 2 paths for those devices instead of 3:

mpathe (360014059f617f130d7c47d390aa82389) dm-60 LIO-ORG ,TCMU device     
size=2.0G features='1 queue_if_no_path' hwhandler='0' wp=rw
|-+- policy='round-robin 0' prio=1 status=active
| `- 93:0:0:0 sdau 66:224 active ready running
`-+- policy='round-robin 0' prio=1 status=enabled
  `- 95:0:0:0 sdaw 67:0   active ready running
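The path shortfall can be detected mechanically from the stanza above. A minimal sketch, assuming only that multipath -ll prints one line per path carrying an H:C:T:L tuple such as "93:0:0:0" (the helper name is hypothetical):

```shell
# Hypothetical helper: count the path lines (the ones carrying an
# H:C:T:L tuple like "93:0:0:0") in one `multipath -ll` device stanza.
count_paths() {
  grep -Ec '[0-9]+:[0-9]+:[0-9]+:[0-9]+ +sd[a-z]+'
}

# The degraded stanza quoted in this bug:
stanza=$(cat <<'EOF'
mpathe (360014059f617f130d7c47d390aa82389) dm-60 LIO-ORG ,TCMU device
size=2.0G features='1 queue_if_no_path' hwhandler='0' wp=rw
|-+- policy='round-robin 0' prio=1 status=active
| `- 93:0:0:0 sdau 66:224 active ready running
`-+- policy='round-robin 0' prio=1 status=enabled
  `- 95:0:0:0 sdaw 67:0   active ready running
EOF
)

printf '%s\n' "$stanza" | count_paths   # prints 2; a healthy device shows 3
```

Running this over every stanza on the new initiator node would list exactly which devices lost a path after the drain.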

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Out of 4 nodes, disabled scheduling on 3 nodes.
2. Created 15 MongoDB app pods, which in turn used block PVCs.

3. Drained one node
4. All pods moved to the new node
5. Repeated the above steps multiple times. In one attempt, a device had only 2 paths instead of the expected 3.

Actual results:
Some devices faced login issues and had only 2 paths instead of 3.

Expected results:
On the new node, all 3 paths should be restored after a drain from the old node.

Additional info:

Currently we do not have fresh logs. Bug 1550279 (https://bugzilla.redhat.com/show_bug.cgi?id=1550279) has the dmesg logs from the nodes.


Comment 8 Humble Chirammal 2019-07-09 09:20:26 UTC
Changing the status to ASSIGNED to reflect the status of the gluster-block bug on which this bugzilla depends.
