Description of problem:

I am encountering some issues when running nvme connect-all in my NVMe-TCP environment.

In the first scenario, discovery.conf looks as follows:

# cat /etc/nvme/discovery.conf
-t tcp -a 172.16.0.101 -s 4420
-t tcp -a 172.16.1.101 -s 4420
-t tcp -a 172.16.0.102 -s 4420
-t tcp -a 172.16.1.102 -s 4420

No connections have been made at this point and nvme list is empty:

# nvme list
Node                  SN                   Model                                    Namespace Usage                      Format           FW Rev
--------------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- --------

I then issue nvme connect-all:

# nvme connect-all
traddr=172.16.1.101 is already connected
traddr=172.16.0.101 is already connected
traddr=172.16.1.102 is already connected
traddr=172.16.0.102 is already connected

The connections are made successfully, but I believe the "is already connected" messages would be confusing to a customer, since no connections existed before the command was run.

# nvme list-subsys /dev/nvme0n1
nvme-subsys0 - NQN=nqn.1992-08.com.netapp:sn.f9f91ad1ea5811ebb38f00a098cbcac6:subsystem.tcp_nvme_ss_1
\
 +- nvme0 tcp traddr=172.16.1.101 trsvcid=4420 live optimized
 +- nvme1 tcp traddr=172.16.0.101 trsvcid=4420 live optimized
 +- nvme2 tcp traddr=172.16.1.102 trsvcid=4420 live non-optimized
 +- nvme3 tcp traddr=172.16.0.102 trsvcid=4420 live non-optimized

In the next scenario, I also specify the host-traddr (-w):

# cat /etc/nvme/discovery.conf
-t tcp -a 172.16.0.101 -w 172.16.0.110 -s 4420
-t tcp -a 172.16.1.101 -w 172.16.1.110 -s 4420
-t tcp -a 172.16.0.102 -w 172.16.0.110 -s 4420
-t tcp -a 172.16.1.102 -w 172.16.1.110 -s 4420

I then issue nvme connect-all (the connections were removed with nvme disconnect-all prior to this test):

# nvme connect-all
Failed to write to /dev/nvme-fabrics: Connection timed out
Failed to write to /dev/nvme-fabrics: Connection timed out
Failed to write to /dev/nvme-fabrics: Connection timed out
Failed to write to /dev/nvme-fabrics: Connection timed out

From the output above it appears that the command fails, yet all of the connections are actually successful:

# nvme list-subsys /dev/nvme0n1
nvme-subsys0 - NQN=nqn.1992-08.com.netapp:sn.f9f91ad1ea5811ebb38f00a098cbcac6:subsystem.tcp_nvme_ss_1
\
 +- nvme0 tcp traddr=172.16.0.101 trsvcid=4420 host_traddr=172.16.0.110 live optimized
 +- nvme1 tcp traddr=172.16.1.101 trsvcid=4420 host_traddr=172.16.1.110 live optimized
 +- nvme2 tcp traddr=172.16.0.102 trsvcid=4420 host_traddr=172.16.0.110 live non-optimized
 +- nvme3 tcp traddr=172.16.1.102 trsvcid=4420 host_traddr=172.16.1.110 live non-optimized

Version-Release number of selected component (if applicable):

# rpm -qa nvme-cli
nvme-cli-1.16-3.el8.x86_64

How reproducible:
Often

Steps to Reproduce:
1. See above.
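For completeness, a single discovery.conf record can also be exercised by hand, since each line of the file carries the same fabric options that nvme-cli accepts on the command line. A minimal sketch, assuming the first portal from the config above:

# nvme discover -t tcp -a 172.16.0.101 -s 4420       # query the discovery controller directly
# nvme connect-all -t tcp -a 172.16.0.101 -s 4420    # connect to every subsystem that record reports
# nvme list-subsys                                   # confirm the resulting controllers and ANA states

Running one record at a time like this makes it easier to see which portal produces the misleading "is already connected" message on an otherwise clean host.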
Marco, what storage array was used to find this bug? DellEMC? I think this is a long-standing problem that either needs to be fixed upstream or pushed back to the vendor. I don't think we see this problem with anything but this specific vendor's array.
Did you manage to fix the write error? I have a similar error and I wonder what solution you found.
Marco, is this still a problem with RHEL 9.2? Can we mark this as fixed in the current release? Note: nvme/tcp is not supported in RHEL 8.
Hello John,

For RHEL-9.2, I do not see the following messages:

Failed to write to /dev/nvme-fabrics: Connection timed out

However, for both RHEL-8.8 and RHEL-9.2, I continue to see these messages when issuing "nvme connect-all":

# nvme connect-all
traddr=172.18.210.60 is already connected
traddr=172.18.210.61 is already connected
traddr=172.18.220.61 is already connected
traddr=172.18.220.60 is already connected

As mentioned previously, this may be confusing to customers, as we did not establish any connections prior to issuing the command.

Finally, for RHEL-8.8, I continue to see the following only when using the Powerstore on the backend:

# nvme connect-all
Failed to write to /dev/nvme-fabrics: Connection timed out
Failed to write to /dev/nvme-fabrics: Connection timed out
Failed to write to /dev/nvme-fabrics: Connection timed out
Failed to write to /dev/nvme-fabrics: Connection timed out
Failed to write to /dev/nvme-fabrics: Connection timed out
Failed to write to /dev/nvme-fabrics: Connection timed out

Since NVMe-TCP is tech preview in RHEL-8, I am OK with closing this BZ. However, should I open a separate BZ to determine if the messaging for successful connections needs to be fixed?

Marco
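As a side note, one quick way to confirm that the timeouts above are a messaging problem rather than real connection failures is to read the controller state from sysfs after connect-all returns. A minimal sketch, assuming the standard transport, address, and state attributes exported under /sys/class/nvme by the nvme core driver:

# for c in /sys/class/nvme/nvme*; do echo "$c: $(cat $c/transport) $(cat $c/address) $(cat $c/state)"; done    # one line per controller: transport, traddr/trsvcid, and live/connecting/etc.

A controller that actually timed out would not show up as "live" here, so this gives a second data point beyond the "Failed to write to /dev/nvme-fabrics" messages.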
After evaluating this issue, there are no plans to address it further or fix it in an upcoming release. Therefore, it is being closed. If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.