Description of problem:
osa-dispatcher fails while generating a log entry which is listed below. This causes clients/proxy server to not receive actions via the osad service.

Log: /var/log/rhn/osa-dispatcher.log

2012/05/01 02:59:45 -07:00 5981 0.0.0.0: osad/jabber_lib.main('ERROR', 'Error caught:')
2012/05/01 02:59:45 -07:00 5981 0.0.0.0: osad/jabber_lib.main('ERROR', 'Traceback (most recent call last):\n File "/usr/share/rhn/osad/jabber_lib.py", line 120, in main\n self.process_forever(c)\n File "/usr/share/rhn/osad/jabber_lib.py", line 178, in process_forever\n self.process_once(client)\n File "/usr/share/rhn/osad/osa_dispatcher.py", line 182, in process_once\n client.retrieve_roster()\n File "/usr/share/rhn/osad/jabber_lib.py", line 714, in retrieve_roster\n stanza = self.get_one_stanza()\n File "/usr/share/rhn/osad/jabber_lib.py", line 786, in get_one_stanza\n self.process(timeout=tm)\n File "/usr/share/rhn/osad/jabber_lib.py", line 1048, in process\n self._parser.Parse(data)\n File "/usr/lib/python2.4/site-packages/jabber/xmlstream.py", line 269, in unknown_endtag\n self.dispatch(self._mini_dom)\n File "/usr/share/rhn/osad/jabber_lib.py", line 814, in _orig_dispatch\n jabber.Client.dispatch(self, stanza)\n File "/usr/lib/python2.4/site-packages/jabber/jabber.py", line 290, in dispatch\n else: handler[\'func\'](self,stanza)\n File "/usr/share/rhn/osad/jabber_lib.py", line 369, in dispatch\n callback(client, stanza)\n File "/usr/share/rhn/osad/dispatcher_client.py", line 145, in _message_callback\n sig = self._check_signature_from_message(stanza, actions)\n File "/usr/share/rhn/osad/jabber_lib.py", line 1310, in _check_signature_from_message\n sig = self._check_signature(stanza, actions=actions)\n File "/usr/share/rhn/osad/dispatcher_client.py", line 69, in _check_signature\n row = lookup_client_by_name(x_client_id)\n File "/usr/share/rhn/osad/dispatcher_client.py", line 214, in lookup_client_by_name\n raise InvalidClientError(client_name)\nInvalidClientError: 910f7d384484c87b\n')
2012/05/01 02:59:55 -07:00 5981 0.0.0.0: osad/jabber_lib.__init__
2012/05/01 02:59:55 -07:00 5981 0.0.0.0: osad/jabber_lib.setup_connection('Connected to jabber server', 'Removed for security')
2012/05/01 02:59:55 -07:00 5981 0.0.0.0: osad/jabber_lib.register('ERROR', 'Invalid password')

Version-Release number of selected component (if applicable):
osa-dispatcher-5.10.44-1.el5

How reproducible:
Not sure how to reproduce this error as it is random on the Spacewalk server. This started happening with Spacewalk 1.6. Before 1.6 I was not getting this error.

Steps to Reproduce:
Unable to provide.

Actual results:
osad service stops working. Pings via web ui work but actions are not picked up.

Expected results:
Doesn't generate error logs and service stays up and actions are picked up within 10 seconds.

Additional info:
In order to get the osa-dispatcher service back online I have to perform the following on the application server:

    service jabberd stop
    service osa-dispatcher stop
    rm -f /var/lib/jabberd/db/*
    service jabberd start
    sleep 10
    service osa-dispatcher start
Any updates as to what might be the root cause?
What does the following SQL, run on your Spacewalk server, return?

    select jabber_id, password from rhnPushDispatcher;
It appears to be showing two (2) entries with the same password. One is showing my new hostname and the other is showing my old hostname. The hostname was changed during my last upgrade. Could this be the cause of the issue?
(In reply to comment #3)
> It appears to be showing two (2) entries with the same password. One is
> showing my new hostname and the other is showing my old hostname. The
> hostname was changed during my last upgrade. Could this be the cause of the
> issue?

Most likely yes. You need to:

1. Stop jabberd & osa-dispatcher, and remove jabberd's database.
2. Log into your database and run:
       delete from rhnPushDispatcher;
       commit;
3. Start jabberd & osa-dispatcher again.
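The steps above can be sketched as a small shell script. This is only a sketch under assumptions: the jabberd database path and the 10-second pause are taken from the original report, `run`, `stop_and_clear` and `start_services` are hypothetical helper names, and the script defaults to a dry run that prints commands instead of executing them. The SQL step is left manual because how you connect to the database depends on your installation.

```shell
#!/bin/sh
# Sketch only: wraps the recovery steps from this thread in two functions.
# Dry-run by default (commands are printed, not executed); set DRY_RUN=0
# and run as root to execute them for real.
DRY_RUN="${DRY_RUN:-1}"
# Path taken from the original report; adjust if your jabberd differs.
JABBERD_DB_DIR="${JABBERD_DB_DIR:-/var/lib/jabberd/db}"

# run (hypothetical helper): print the command in dry-run mode, else execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Step 1: stop both services and remove jabberd's database files.
stop_and_clear() {
    run service jabberd stop
    run service osa-dispatcher stop
    run rm -f "$JABBERD_DB_DIR"/*
}

# Step 3: start the services again. The 10-second pause comes from the
# reporter's own workaround, not from the steps in the reply.
start_services() {
    run service jabberd start
    run sleep 10
    run service osa-dispatcher start
}

# Step 2 (delete from rhnPushDispatcher; commit;) is deliberately left
# manual: run it in your database client between "stop" and "start".
case "${1:-}" in
    stop)  stop_and_clear ;;
    start) start_services ;;
esac
```

Saved under a name of your choice, the script would be called once with `stop`, then the SQL from step 2 run by hand, then the script called again with `start`. Reviewing the dry-run output first is a cheap way to confirm the paths before anything is deleted.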
I have deleted the rows from the table and will wait a couple of weeks to determine whether this resolves the issue. Initial testing suggests it may have, but it is hard to tell: the failure happens at random and can take a full two weeks to appear.
Everything seems to be working as expected with the above fix.
Great, thank you. I'm closing this report.