Description of problem:
When performing a full restart of a Spacewalk Proxy, the clients connecting to that proxy fail to restore their OSAD connection to the Proxy server once the server is back online. To restore the OSAD connection, one has to remove /etc/sysconfig/rhn/osad-auth.conf and then run service osad restart. A normal service osad restart on its own is unable to restart the service.

Version-Release number of selected component (if applicable):
Latest Spacewalk 1.3 packages.

How reproducible:
Restart the Spacewalk Proxy via shutdown -r now and wait until the proxy comes back online. The Spacewalk client does not return to an online status. The Spacewalk OSAD logs show the following error: "Ignoring Delayed Stanza". The logs also do not show that the client ever disconnected.

Steps to Reproduce:
1. Restart Spacewalk Proxy via shutdown -r now.
2. Wait for proxy to come back online.
3. Issue will be seen.

Actual results:
OSAD does not reconnect.

Expected results:
OSAD comes back online after proxy restart.
Any update as to the status of this bug report?
No update. This bug has very low priority for me. But just a guess: can you try synchronizing the time on both the Spacewalk server and the Spacewalk Proxy and see if this still happens?
All hosts sync their time with NTP, and the clocks match.
No update.
Updated to reflect that this issue is also seen in the current 1.4 version.
Aligning under space16.
The patch in https://bugzilla.redhat.com/show_bug.cgi?id=664491#c37 may well fix this issue as well:

--- osad.py.rpmorig_osad-5.10.24-1.el6 2011-10-31 19:01:15.362182682 +0100
+++ osad.py 2011-11-07 10:15:26.462966152 +0100
@@ -20,6 +20,7 @@
 import string
 from rhn import rpclib
 import random
+import socket
 
 from up2date_client.config import initUp2dateConfig
 from up2date_client import config
@@ -210,6 +211,9 @@
         c.client_id = self._client_name
         c.shared_key = self._shared_key
         c.time_drift = self._time_drift
+        c._sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
+        c._sock.setsockopt(socket.SOL_TCP, socket.TCP_KEEPIDLE, 1800)
+        c._sock.setsockopt(socket.SOL_TCP, socket.TCP_KEEPCNT, 3)
 
         # Update the jabber ID
         systemid = open(self._systemid_file).read()

If the server side connection is closed, but the client still has its end open, then when the keepalive timeout is exceeded (about 30 minutes) the kernel terminates the connection, and osad tries reconnecting.
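For anyone who wants to try the keepalive settings outside osad, here is a minimal standalone sketch of the same three socket options. The function names enable_keepalive and worst_case_detection_seconds are made up for illustration and are not part of osad. The 75 second probe interval is an assumption based on the usual Linux default (net.ipv4.tcp_keepalive_intvl), since the patch does not set TCP_KEEPINTVL.

```python
import socket


def enable_keepalive(sock, idle_seconds=1800, probe_count=3):
    """Enable TCP keepalive on an existing TCP socket.

    Defaults mirror the values in the patch; the function itself is
    a hypothetical helper, not osad code.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # TCP_KEEPIDLE and TCP_KEEPCNT are not defined on every platform,
    # so only set them where the socket module exposes them.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_seconds)
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probe_count)
    return sock


def worst_case_detection_seconds(idle_seconds, probe_count, interval_seconds=75):
    """Idle period plus every unanswered probe.

    interval_seconds=75 is an assumed kernel default, not a value
    taken from the patch.
    """
    return idle_seconds + probe_count * interval_seconds
```

Under that assumption, the patch values give 1800 + 3 * 75 seconds before the kernel drops a dead connection, which is in line with the "about 30 minutes" estimate above.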
Thank you, mark. The patches from bug 664491 were applied to Spacewalk master (with whitespace fixed) as 586766a1030c9607ea97b9d9b87196d9e760541f and 0c859a376540c6ea2cbcf87d09164f76c30a475d.
s/mark/Mark/. Sorry about that, it was an unfortunate typo.
Spacewalk 1.6 has been released.