Bug 2046533

Summary: nvme-cli: nvme connect-all returns Failed to write to /dev/nvme-fabrics
Product: Red Hat Enterprise Linux 8
Component: nvme-cli
Version: 8.6
Status: CLOSED WONTFIX
Severity: unspecified
Priority: medium
Reporter: Marco Patalano <mpatalan>
Assignee: Maurizio Lombardi <mlombard>
QA Contact: Marco Patalano <mpatalan>
CC: arun.c, jbrassow, jmeneghi, thomasberryiif
Keywords: Triaged
Target Milestone: rc
Hardware: Unspecified
OS: Unspecified
Whiteboard: NVMe_P2
Type: Bug
Last Closed: 2023-07-26 07:28:16 UTC

Description Marco Patalano 2022-01-26 21:50:18 UTC
Description of problem: I am encountering two issues when running nvme connect-all in my NVMe-TCP environment: misleading "is already connected" messages and spurious "Failed to write to /dev/nvme-fabrics" errors. In the first scenario, /etc/nvme/discovery.conf looks as follows:

# cat /etc/nvme/discovery.conf 
-t tcp -a 172.16.0.101 -s 4420
-t tcp -a 172.16.1.101 -s 4420
-t tcp -a 172.16.0.102 -s 4420
-t tcp -a 172.16.1.102 -s 4420
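
For reference, nvme connect-all processes each of these lines as a discovery request followed by a connect for every discovery log entry; a rough per-entry equivalent (the subsystem NQN is the one reported by list-subsys below) would be:

# nvme discover -t tcp -a 172.16.0.101 -s 4420
# nvme connect -t tcp -a 172.16.0.101 -s 4420 -n nqn.1992-08.com.netapp:sn.f9f91ad1ea5811ebb38f00a098cbcac6:subsystem.tcp_nvme_ss_1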

No connections have been made at this point and nvme list is empty:
# nvme list
Node                  SN                   Model                                    Namespace Usage                      Format           FW Rev  
--------------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- --------

I then issue nvme connect-all:

# nvme connect-all
traddr=172.16.1.101 is already connected
traddr=172.16.0.101 is already connected
traddr=172.16.1.102 is already connected
traddr=172.16.0.102 is already connected
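
Despite the "already connected" messages, the controllers are newly created; their state can be spot-checked in sysfs (assuming the controller names nvme0-nvme3 reported by list-subsys below), and each reports "live":

# for c in /sys/class/nvme/nvme[0-3]; do echo "$c: $(cat $c/state)"; done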

The connections are made successfully, but no connections existed before the command ran, so the "is already connected" messages are misleading and would, I believe, be confusing to the customer.

# nvme list-subsys /dev/nvme0n1
nvme-subsys0 - NQN=nqn.1992-08.com.netapp:sn.f9f91ad1ea5811ebb38f00a098cbcac6:subsystem.tcp_nvme_ss_1
\
 +- nvme0 tcp traddr=172.16.1.101 trsvcid=4420 live optimized
 +- nvme1 tcp traddr=172.16.0.101 trsvcid=4420 live optimized
 +- nvme2 tcp traddr=172.16.1.102 trsvcid=4420 live non-optimized
 +- nvme3 tcp traddr=172.16.0.102 trsvcid=4420 live non-optimized

In the next scenario, I specify the host-traddr:

# cat /etc/nvme/discovery.conf 
-t tcp -a 172.16.0.101 -w 172.16.0.110 -s 4420
-t tcp -a 172.16.1.101 -w 172.16.1.110 -s 4420
-t tcp -a 172.16.0.102 -w 172.16.0.110 -s 4420
-t tcp -a 172.16.1.102 -w 172.16.1.110 -s 4420
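
Before running connect-all, I confirm that the -w host-traddr values are addresses assigned to local interfaces, e.g.:

# ip -br addr show | grep -E '172\.16\.[01]\.110'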

I then issue nvme connect-all (the connections were removed with nvme disconnect-all prior to this test):

# nvme connect-all
Failed to write to /dev/nvme-fabrics: Connection timed out
Failed to write to /dev/nvme-fabrics: Connection timed out
Failed to write to /dev/nvme-fabrics: Connection timed out
Failed to write to /dev/nvme-fabrics: Connection timed out

From the output above, the command appears to fail, yet all the connections are in fact successful:

# nvme list-subsys /dev/nvme0n1
nvme-subsys0 - NQN=nqn.1992-08.com.netapp:sn.f9f91ad1ea5811ebb38f00a098cbcac6:subsystem.tcp_nvme_ss_1
\
 +- nvme0 tcp traddr=172.16.0.101 trsvcid=4420 host_traddr=172.16.0.110 live optimized
 +- nvme1 tcp traddr=172.16.1.101 trsvcid=4420 host_traddr=172.16.1.110 live optimized
 +- nvme2 tcp traddr=172.16.0.102 trsvcid=4420 host_traddr=172.16.0.110 live non-optimized
 +- nvme3 tcp traddr=172.16.1.102 trsvcid=4420 host_traddr=172.16.1.110 live non-optimized
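
The kernel log confirms the same thing: nvme-tcp prints a "new ctrl" line for each association that actually completes (exact wording varies by kernel version), which suggests the write error is cosmetic:

# dmesg | grep -i 'new ctrl'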


Version-Release number of selected component (if applicable):
# rpm -qa nvme-cli
nvme-cli-1.16-3.el8.x86_64

How reproducible: Often


Steps to Reproduce:
1. see above

Comment 1 John Meneghini 2022-08-30 12:22:19 UTC
Marco, what's the storage array used to find this bug? Dell EMC? I think this is a long-standing problem that either needs to be fixed upstream or pushed back to the vendor. I don't think we see this problem with anything but this specific vendor's array.

Comment 2 TimmyJ 2022-12-29 09:54:36 UTC
Did you manage to fix the write error? I am hitting a similar error and wonder what solution you found.

Comment 4 John Meneghini 2023-04-17 20:48:20 UTC
Marco, is this still a problem with RHEL 9.2?

Can we mark this as fixed in the current release?

Note: nvme/tcp is not supported in RHEL 8

Comment 5 Marco Patalano 2023-04-19 13:14:23 UTC
Hello John,

For RHEL-9.2, I no longer see the following message:

Failed to write to /dev/nvme-fabrics: Connection timed out

However, for both RHEL-8.8 and RHEL-9.2, I continue to see these messages when issuing "nvme connect-all":

# nvme connect-all
traddr=172.18.210.60 is already connected
traddr=172.18.210.61 is already connected
traddr=172.18.220.61 is already connected
traddr=172.18.220.60 is already connected

As mentioned previously, this may be confusing to customers as we did not establish any connections prior to issuing the command.
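
To rule out stale state, each run starts from a clean host (nvme list-subsys returns nothing and nvme list shows only its header beforehand):

# nvme disconnect-all
# nvme list-subsys
# nvme list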

Finally, for RHEL-8.8, I continue to see the following, but only when using a Dell PowerStore array on the backend:

# nvme connect-all
Failed to write to /dev/nvme-fabrics: Connection timed out
Failed to write to /dev/nvme-fabrics: Connection timed out
Failed to write to /dev/nvme-fabrics: Connection timed out
Failed to write to /dev/nvme-fabrics: Connection timed out
Failed to write to /dev/nvme-fabrics: Connection timed out
Failed to write to /dev/nvme-fabrics: Connection timed out

Since NVMe-TCP is tech preview in RHEL-8, I am OK with closing this BZ. However, should I open a separate BZ to determine if the messaging for successful connections needs to be fixed?

Marco

Comment 7 RHEL Program Management 2023-07-26 07:28:16 UTC
After evaluating this issue, there are no plans to address it further or fix it in an upcoming release.  Therefore, it is being closed.  If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.