Description of problem: I've noted this very strange, but very reproducible problem, and I think I have a simple fix for it. The problem is that when a RHEL server is rebooted and one of the targets is initially unreachable because the network isn't quite up, strange behavior occurs. When the target is eventually reachable, the first thing sent over the socket is a syslog message (seriously, I have traces). Upon inspecting the code, and looking at the messages occuring during the failure to connect, I found a case where close() will be called multiple times on the same socket. The attached patch fixes the problem. Sequence of code looks like this: - iscsid starts, reads config file, forks child discovery process child process: 1. sendtargets_discovery_process(): calls iscsi_connect() 2. iscsi_connect(): the connect() call fails printing "cannot make connection ..." to /var/log/messages file; NOTE: at this point, close(sock) is called, but the value of 'sock' (the fd for the socket) is returned in session->socket_fd 3. sendtargets_discovery_process(): iscsi_connect() returned '0', so 'then' clause is executed; "Connection to Discovery Address A.B.C.D failed" is written to /var/log/messages; goto set_address; code executes and calls iscsi_disconnect() 4. iscsi_disconnect(): since session->socket_fd is a real value, close() is again called on the same value as was done in #2; session->socket_fd = -1 5. Sequence #1-#4 is repeated another 3 times 6. Sequence #1 is repeated, but this time iscsi_connect() is successful; session->socket_fd is set to 'sock' value, and 1 is returned 7. sendtargets_discovery_process(): iscsi_connect() returns '1', so 'then' clause is skipped and logmsg() is called with "Connected to Discovery Address A.B.C.D"; this message somehow doesn't go to syslog socket though, but instead goes to session->socket_fd, which goes to the target. The target sees this garbage packet, and disconnects the session 8. sendtargets_discovery_process(): iscsi_login() is called 9. iscsi_login(): read() is called, and it exists with '0' value; this calls logmsg() which calls syslog() with "Connection to Discovery Address A.B.C.D closed". This again goes over the socket to the target. Also, iscsi_host_err() is called, with "Login I/O error, failed to receive a PDU\n", which also goes over the socket; a value of LOGIN_IO_ERROR is returned from iscsi_login() 10. sendtargets_discovery_process(): iscsi_login() returns with LOGIN_IO_ERROR, and logmsg() is called with "retrying discovery login to A.B.C.D"; it's interesting that it at least appears that this message goes both to /var/log/messages _and_ across the socket (???) Version-Release number of selected component (if applicable): RHEL4 U3 beta iscsi-initiator-utils-4.0.3.0-3 How reproducible: - Happens every time on my IBM Penguin ALTUS 1000E (dual x86_64) Steps to Reproduce: 1. Put one or more DiscoveryAddress entries in /etc/iscsi.conf 2. Boot machine 3. Observe syslog message goes over iscsi socket, followed by iSCSI packets; target rejects garbage and disconnects the session; process continues with the initiator unable to connect to that target Actual results: Initiator unable to connect to target, continues to send syslog message before valid iSCSI PDUs Expected results: Valid iSCSI login sequence from initiator for discovery session Additional info: I've only repro'd this on x86_64 but the patch seems to fix an arch independent behavior.
Created attachment 122853 [details] Patch against RHEL4 U3 beta that fixes this problem
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2006-0495.html