Bug 691847 - Spacewalk Proxy Full Restart Causes OSAD to stop working
Summary: Spacewalk Proxy Full Restart Causes OSAD to stop working
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Spacewalk
Classification: Community
Component: Proxy Server
Version: 1.4
Hardware: x86_64
OS: Linux
Priority: low
Severity: high
Target Milestone: ---
Assignee: Jan Pazdziora
QA Contact: Red Hat Satellite QA List
URL:
Whiteboard:
Depends On:
Blocks: space16
Reported: 2011-03-29 16:12 UTC by JDavis4102
Modified: 2011-12-22 16:51 UTC

Fixed In Version: osad-5.10.29-1
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-12-22 16:51:15 UTC
Embargoed:



Description JDavis4102 2011-03-29 16:12:31 UTC
Description of problem:
After a full restart of a Spacewalk Proxy, the clients connecting through that proxy fail to restore their OSAD connection to the proxy once it is back online. To restore the OSAD connection, one has to remove /etc/sysconfig/rhn/osad-auth.conf and then run service osad restart. A plain service osad restart on its own does not bring the connection back.


Version-Release number of selected component (if applicable):

Latest Spacewalk 1.3 packages.


How reproducible:

Restart Spacewalk Proxy via shutdown -r now.
Wait until proxy comes back online.
The Spacewalk client does not return to an online status.
The Spacewalk OSAD logs show the following error: "Ignoring Delayed Stanza"
The logs also never show that the client disconnected.


Steps to Reproduce:
1. Restart Spacewalk Proxy via shutdown -r now.
2. Wait for proxy to come back online.
3. Observe that the client does not return to an online status.
  
Actual results:
OSAD not reconnecting

Expected results:
OSAD comes back online after proxy restart.

Comment 1 JDavis4102 2011-04-05 22:49:30 UTC
Any update as to the status of this bug report?

Comment 2 Miroslav Suchý 2011-04-06 08:30:23 UTC
No update. This bug has very low priority for me.
Just a guess, though: can you try synchronizing the time on both the Spacewalk server and the Spacewalk Proxy and see if this still happens?

Comment 3 JDavis4102 2011-04-06 16:13:58 UTC
All hosts sync their time with NTP, and their clocks agree.

Comment 4 JDavis4102 2011-04-18 21:21:03 UTC
Any update as to the status of this bug report?

Comment 5 Miroslav Suchý 2011-04-19 06:34:49 UTC
No update.

Comment 6 JDavis4102 2011-06-08 15:24:49 UTC
Updated to reflect that this issue is also seen in the current 1.4 version.

Comment 7 Jan Pazdziora 2011-07-20 11:53:09 UTC
Aligning under space16.

Comment 8 Mark Huth 2011-11-09 03:17:11 UTC
The patch in https://bugzilla.redhat.com/show_bug.cgi?id=664491#c37 may well fix this issue as well:

--- osad.py.rpmorig_osad-5.10.24-1.el6  2011-10-31 19:01:15.362182682 +0100
+++ osad.py     2011-11-07 10:15:26.462966152 +0100
@@ -20,6 +20,7 @@
 import string
 from rhn import rpclib
 import random
+import socket

 from up2date_client.config import initUp2dateConfig
 from up2date_client import config
@@ -210,6 +211,9 @@
         c.client_id = self._client_name
         c.shared_key = self._shared_key
         c.time_drift = self._time_drift
+        c._sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
+        c._sock.setsockopt(socket.SOL_TCP, socket.TCP_KEEPIDLE, 1800)
+        c._sock.setsockopt(socket.SOL_TCP, socket.TCP_KEEPCNT, 3)

         # Update the jabber ID
         systemid = open(self._systemid_file).read()

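If the server-side connection is closed but the client still has its end open, then once the connection has been idle for the keepalive time (1800 seconds, about 30 minutes) the kernel starts sending keepalive probes. When three probes in a row go unanswered, the kernel terminates the connection, and osad tries to reconnect.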
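
The keepalive settings from the patch can be tried outside of osad with a small standalone sketch. This is not the osad code itself: the helper names enable_keepalive and detection_delay are hypothetical, and the 75-second probe interval used as a fallback is the usual Linux default (net.ipv4.tcp_keepalive_intvl), which the patch leaves unchanged. TCP_KEEPIDLE and TCP_KEEPCNT are not available on every platform, so the sketch sets them only where Python exposes them.

```python
import socket


def enable_keepalive(sock, idle=1800, count=3):
    """Enable TCP keepalive on sock with the same values as the patch."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # TCP_KEEPIDLE and TCP_KEEPCNT exist on Linux but not on every
    # platform, so only set them where the socket module defines them.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)


def detection_delay(idle, count, interval):
    """Rough number of seconds before the kernel drops a silent peer:
    the idle time plus one probe interval per unanswered probe."""
    return idle + count * interval


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    enable_keepalive(s)
    # Ask the kernel which probe interval it will use; assume 75 s
    # (the usual Linux default) if the option is not exposed.
    if hasattr(socket, "TCP_KEEPINTVL"):
        interval = s.getsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL)
    else:
        interval = 75
    print("dead peer detected after about %d seconds"
          % detection_delay(1800, 3, interval))
    s.close()
```

With a 75-second probe interval, detection_delay(1800, 3, 75) gives 2025 seconds, so a client would notice a vanished proxy after roughly 34 minutes rather than exactly 30. Lowering the idle time would shorten that wait at the cost of more probe traffic on idle connections.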

Comment 9 Jan Pazdziora 2011-12-09 09:50:13 UTC
Thank you, mark.

Patches from bug 664491 were applied to Spacewalk master (with whitespace fixed) as 586766a1030c9607ea97b9d9b87196d9e760541f and 0c859a376540c6ea2cbcf87d09164f76c30a475d.

Comment 10 Jan Pazdziora 2011-12-12 09:03:55 UTC
s/mark/Mark/. Sorry about that, it was an unfortunate typo.

Comment 11 Milan Zázrivec 2011-12-22 16:51:15 UTC
Spacewalk 1.6 has been released.

