Description of problem:
We have a single iSCSI-based external RAID device connected to an RHEL4 system via a straight cable connection (no switches or hubs). The /etc/iscsi.conf contains a very simple configuration: a single line with the DiscoveryAddress of the RAID system. This configuration worked fine with RHEL3 and still works just fine, except for the following error messages, which now show up on the console and in /var/log/messages.

Approximately every 10 seconds the following appears in /var/log/messages:

Aug 17 10:23:51 alcserv3 kernel: iscsi-sfnet:host5: Connect failed with rc - 113: No route to host
Aug 17 10:23:51 alcserv3 kernel: iscsi-sfnet:host5: establish_session failed. Could not connect to target
Aug 17 10:23:51 alcserv3 kernel: iscsi-sfnet:host5: Waiting 10 seconds before next login attempt

Version-Release number of selected component (if applicable):
Linux alcserv3.psfc.mit.edu 2.6.9-34.0.2.ELsmp #1 SMP Fri Jun 30 10:33:58 EDT 2006 i686 i686 i386 GNU/Linux

How reproducible:
We have two servers similarly configured and they both exhibit the same behavior.

Steps to Reproduce:
1. Directly attach the iSCSI device.
2. Create an iscsi.conf with a single line containing the DiscoveryAddress of the iSCSI device.
3. service iscsi start

Actual results:
iSCSI device works fine
messages logged every ten seconds

Expected results:
iSCSI device works fine
no messages

Additional info:
This is more of a nuisance issue, since the iSCSI device is functioning well. The message log just gets clogged with these recurring messages.
There is a different bug around the forcedeth driver 6.10 losing connectivity, after which the same messages show up. Reloading the forcedeth driver fixes it until it reoccurs. I will file a different bug report for that.
This should be fixed in RHEL 4.5 when it comes out. We will now retry the login 3 times, and if we cannot log in we will stop retrying. So you will see an error message for each of the 3 retries, but it will no longer continue to spew errors and fill up the logs.
But it should be retrying if it really lost the connection. I read the original bug report as saying that they have a working connection and get timeout messages anyway. That would be something to fix if it is really what is happening. However, if their iSCSI box is just not keeping up with the default protocol timing, then you can adjust the timeouts to your liking: check out the comments in the stock /etc/iscsi.conf that explain LoginTimeout, IdleTimeout, ActiveTimeout and PingTimeout.

Michael Will
(In reply to comment #4)
> But it should be retrying if it really lost the connection.

The code does. If we had a connection and lost it, we continue to retry. We only limit retries on the initial login if we never get an initial connection.

> I read the original bug report as that they have a working connection and get
> timeout messages anyways. That would be something to fix if it is really what is
> happening.

I did not read the bug that way, and that would not make much sense if they are also saying the iscsi device works. The problem is that we can log into some portals, and for some we cannot get an initial login. In RHEL3, we would give up eventually, and in some cases, because the driver did failover, it was completely hidden until you tried to fail over. In the RHEL4 driver we do not do failover, and the initial login retry limit was accidentally removed.
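
To make the retry policy described in this thread concrete, here is a rough Python sketch. It is only an illustration and not the driver's code (the real driver is kernel C). The function name, the log strings, and the reading of "retry the login 3 times" as three failed attempts in total are all assumptions made for this example.

```python
from typing import Iterable, List

# Limit mentioned in the thread; the constant name is hypothetical.
MAX_INITIAL_ATTEMPTS = 3


def simulate_logins(outcomes: Iterable[bool],
                    max_initial_attempts: int = MAX_INITIAL_ATTEMPTS) -> List[str]:
    """Replay login outcomes (True = success) for one portal and return log lines.

    Assumed policy, paraphrased from the comments above:
    - if a session was ever established, failures are retried indefinitely;
    - if no session was ever established, stop after max_initial_attempts failures.
    """
    log: List[str] = []
    ever_connected = False
    initial_failures = 0

    for ok in outcomes:
        if ok:
            ever_connected = True
            log.append("login ok")
            continue

        log.append("connect failed")
        if ever_connected:
            # We had a connection and lost it: keep retrying.
            continue

        # Never connected to this portal: count against the initial limit.
        initial_failures += 1
        if initial_failures >= max_initial_attempts:
            log.append("giving up")
            break

    return log


if __name__ == "__main__":
    # A portal that is never reachable is abandoned after three failures.
    print(simulate_logins([False] * 10))
    # A portal that connected once keeps being retried after it drops.
    print(simulate_logins([True, False, False, False, False]))
```

Under these assumptions, an unreachable portal produces a bounded number of error messages instead of one every 10 seconds, while a session that was established and then lost is still retried.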
Yes, the iscsi device worked just fine with the same configuration with RHEL3 and RHEL4.4 but RHEL4.4 produced the errors in /var/log/messages. We have been running RHEL4.5 for some time now and the problem has gone away. I would have reported this but I posted this bug report almost a year ago and there was no activity on it until now so I completely forgot about it. We lived with the messages since the iscsi connection still worked fine. In any event, the problem has gone away with 4.5. Thanks!