643236 – iscsi: get nopout and conn errors.

Bug 643236 - iscsi: get nopout and conn errors.

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 6
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	6.0
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	rc
Target Release:	---
Assignee:	Mike Christie
QA Contact:	Gris Ge
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2010-10-15 03:18 UTC by Mike Christie
Modified:	2011-05-19 12:31 UTC
CC List:	3 users
Fixed In Version:	kernel-2.6.32-112.el6
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2011-05-19 12:31:55 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHSA-2011:0542	0	normal	SHIPPED_LIVE	Important: Red Hat Enterprise Linux 6.1 kernel security, bug fix and enhancement update	2011-05-19 11:58:07 UTC

Description Mike Christie 2010-10-15 03:18:30 UTC

Description of problem:


The scsi layer is sending too many commands to the iscsi layer (more than target->can_queue). The iscsi layer can then end up using all the IO structs for scsi command IO. If the target sends a nop as a ping to us, we will not have a struct to use for the reply. In /var/log/messages you will see:

Could not send nopout

This may be followed by a conn error 1011 or 1020 if the target then decides to drop the session because the nop was dropped.
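
A quick way to check whether a host is seeing these messages is to count them in the log. This is only a sketch: the function name scan_iscsi_log is made up for illustration, and the "conn error (N)" wording in the second pattern is an assumption about how the kernel prints these errors, so adjust the patterns to what your log actually shows.

```shell
# scan_iscsi_log [LOGFILE]
# Hypothetical helper: prints how many lines in LOGFILE mention a dropped
# nopout and how many mention conn error 1011 or 1020.
# LOGFILE defaults to /var/log/messages.
scan_iscsi_log() {
    logfile=${1:-/var/log/messages}
    # grep -c prints the number of matching lines. It exits non-zero when
    # nothing matches, so "|| true" keeps that from being treated as a failure.
    nop=$(grep -c 'Could not send nopout' "$logfile" 2>/dev/null || true)
    # Assumed message format: "... detected conn error (1011)".
    err=$(grep -c -E 'conn error \((1011|1020)\)' "$logfile" 2>/dev/null || true)
    echo "nopout_failures=${nop:-0} conn_errors=${err:-0}"
}
```

Running scan_iscsi_log with no argument reads /var/log/messages. A non-zero nopout_failures count together with conn errors would match the pattern described above.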



Additional info:


This can be worked around by setting the node.session.cmds_max value higher than the target's command window.

Log out of the targets. Set the value in /etc/iscsi/iscsid.conf, then rerun the discovery command and log back in to the targets.

Or log out of the targets and run:

iscsiadm -m node -o update -n node.session.cmds_max -v $NEW_VALUE

Then relogin.
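
The logout, update, and relogin steps for a single node record could be wrapped up roughly like this. It is a sketch, not a tested procedure: the function name set_cmds_max and the ISCSIADM override variable are made up for illustration, and the target name and portal shown in the comment are placeholders. It assumes a session to the target is currently logged in, because a failed logout stops the sequence.

```shell
# Command to run. Override it (for example ISCSIADM=echo) to preview the
# calls without touching any sessions.
ISCSIADM=${ISCSIADM:-iscsiadm}

# set_cmds_max TARGET PORTAL NEW_VALUE
# Hypothetical helper: log out of one node record, update
# node.session.cmds_max on it, then log back in.
set_cmds_max() {
    target=$1
    portal=$2
    value=$3
    # -u logs out of the session for this target and portal.
    "$ISCSIADM" -m node -T "$target" -p "$portal" -u || return 1
    # Update the stored record; the new value is used at the next login.
    "$ISCSIADM" -m node -T "$target" -p "$portal" -o update \
        -n node.session.cmds_max -v "$value" || return 1
    # -l logs back in with the updated record.
    "$ISCSIADM" -m node -T "$target" -p "$portal" -l
}

# Example call with placeholder values:
#   set_cmds_max iqn.2001-05.com.example:target1 192.0.2.10:3260 128
```

Pick NEW_VALUE so that it is higher than the target's command window, as described above.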

Comment 2 RHEL Program Management 2010-10-15 03:28:45 UTC

Thank you for your bug report. This issue was evaluated for inclusion
in the current release of Red Hat Enterprise Linux. Unfortunately, we
are unable to address this request in the current release. Because we
are in the final stage of Red Hat Enterprise Linux 6 development, only
significant, release-blocking issues involving serious regressions and
data corruption can be considered.

If you believe this issue meets the release blocking criteria as
defined and communicated to you by your Red Hat Support representative,
please ask your representative to file this issue as a blocker for the
current release. Otherwise, ask that it be evaluated for inclusion in
the next minor release of Red Hat Enterprise Linux.

Comment 4 RHEL Program Management 2010-10-21 14:29:52 UTC

This request was evaluated by Red Hat Product Management for inclusion
in a Red Hat Enterprise Linux maintenance release. Product Management has 
requested further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed 
products. This request is not yet committed for inclusion in an Update release.

Comment 6 Aristeu Rozanski 2011-02-03 17:24:53 UTC

Patch(es) available on kernel-2.6.32-112.el6

Comment 10 Gris Ge 2011-03-09 03:24:19 UTC

Mike,

I failed to reproduce this issue by reducing node.session.cmds_max to 16 and running heavy IO against that iscsi disk for about 2 hours.

The network latency between target and initiator is about 200ms.
10 'perl -e while(1){}' processes are running to consume CPU.

I never got a chance to see any 'Could not send nopout' error.

Can you advise me on how to reproduce this problem?

Comment 11 Mike Christie 2011-03-10 00:41:22 UTC

It is a little difficult.

What target are you using? It is easiest to hit with an EqualLogic target.


You need to use bnx2i or cxgb3i (iscsi_tcp does not show the problem). Make sure your IO test is set to send more than cmds_max IOs. Your target also has to support that many IOs.

So I think most targets support at least 32 cmds. So set

node.session.cmds_max = 16
node.session.queue_depth = 32

(Either set this in iscsid.conf and then rerun the iscsiadm discovery command so the new iscsid.conf values are used, or run iscsiadm -m node -o update -n $NAME_OF_SETTING_ABOVE -v $VALUE_ABOVE on an existing target portal record.)


Then we want to send more than cmds_max IOs. With this command we would send about 64:

disktest -PT -T130 -h1 -K64 -B256k -ID /dev/sdXYZ



If you cannot find an EQL target, then use the settings above, run the command above, and do

cat /sys/class/scsi_host/hostX/host_busy

That value should always be less than cmds_max if the problem is fixed. If the value is larger than cmds_max then you hit the problem.
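
The host_busy comparison can be scripted so it can be polled while the IO test runs. This is a rough sketch: the function name check_host_busy is made up for illustration, hostX is left as a placeholder exactly as above, and the value 16 in the example loop assumes the cmds_max setting suggested earlier.

```shell
# check_host_busy FILE CMDS_MAX
# Hypothetical helper: FILE is a host_busy attribute such as
# /sys/class/scsi_host/hostX/host_busy.
# Returns 0 if the value is not larger than CMDS_MAX, 1 if it is larger,
# and 2 if FILE cannot be read.
check_host_busy() {
    busy=$(cat "$1") || return 2
    if [ "$busy" -gt "$2" ]; then
        echo "host_busy=$busy exceeds cmds_max=$2"
        return 1
    fi
    echo "host_busy=$busy within cmds_max=$2"
}

# Example: poll once per second and stop at the first violation.
#   while check_host_busy /sys/class/scsi_host/hostX/host_busy 16; do
#       sleep 1
#   done
```

If the loop ever stops with an "exceeds" line, more commands were queued to the host than cmds_max allows, which is the condition described above.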

Comment 12 Gris Ge 2011-04-28 08:29:47 UTC

Mike,

We have an Emulex be2iscsi at hand (not configured).
Does that hit this problem?

Comment 13 Mike Christie 2011-04-29 02:30:17 UTC

Yeah you would hit the problem with that driver.

I think you can just do a sanity check. I tested it here and I believe Chelsio tested it too (they are the ones that pinged me about merging the patch and tested it upstream).

Getting the timing right is really hard. I do not think it is worth your time to try to replicate the problem. As long as there are no regressions it should be ok.

Comment 14 Gris Ge 2011-04-29 03:53:18 UTC

Code reviewed.
Patch applied.

iscsi basic function was tested by errata and it passed the test.

No hardware available, so sanity only.

Comment 15 errata-xmlrpc 2011-05-19 12:31:55 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0542.html
