Bug 1222868
Summary: | OSA-Dispatcher and OSAD are not executing remote commands after upgrading from 2.2 to 2.3 | ||
---|---|---|---|
Product: | [Community] Spacewalk | Reporter: | Ahmad Al-Masry <ahmad.almasry> |
Component: | Server | Assignee: | Gennadii Altukhov <galtukho> |
Status: | CLOSED INSUFFICIENT_DATA | QA Contact: | Red Hat Satellite QA List <satqe-list> |
Severity: | unspecified | Docs Contact: | |
Priority: | unspecified | ||
Version: | 2.3 | CC: | afinkelsrc, ahmad.almasry, bernhard.lichtinger, fadia.marei, good.dr.ahmad, loay.abdallatif, mihai.petracovici, mueller, tlestach |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2016-05-09 12:33:50 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 1310034, 1484117 |
Description
Ahmad Al-Masry
2015-05-19 10:31:08 UTC
I had the same problem: only about 40 of my 195 systems had an OSAD status of online. Comparing the verbose osad logs of an online system and an offline system, I think the problem lies in the roster management of osa-dispatcher.

osad on online system shows:
2015-07-22 15:27:01 jabber_lib._roster_callback: Updating the roster <iq type='result' id='iq-request-c92a47-2'><query xmlns = 'jabber:iq:roster' ><item ask='subscribe' jid='rhn-dispatcher-sat@SPACEWALK_SERVER' subscription='from' /></query></iq>
2015-07-22 15:27:01 jabber_lib.send_presence: rhn-dispatcher-sat@SPACEWALK_SERVER subscribed
2015-07-22 15:27:01 jabber_lib.register_callback: <bound method Client._presence_callback of <osad.osad_client.Client instance at 0x8d8e18>> presence None None None None
2015-07-22 15:27:01 jabber_lib.register_callback: <bound method Client._message_callback of <osad.osad_client.Client instance at 0x8d8e18>> message None None None None
2015-07-22 15:27:01 jabber_lib.subscribe_to_presence: Subscribed to {}
2015-07-22 15:27:01 jabber_lib.subscribe_to_presence: Subscribed both {}
2015-07-22 15:27:01 jabber_lib.subscribe_to_presence: Subscribed none {}
2015-07-22 15:27:01 jabber_lib.subscribe_to_presence: Subscribed from {'rhn-dispatcher-sat@SPACEWALK_SERVER': {'ask': u'subscribe', 'jid': 'rhn-dispatcher-sat@SPACEWALK_SERVER', 'subscription': u'from'}}
2015-07-22 15:27:01 jabber_lib.subscribe_to_presence: Subscribed from + ask=subscribe
2015-07-22 15:27:01 jabber_lib.send_presence: None None
2015-07-22 15:27:01 jabber_lib.process_forever:
2015-07-22 15:27:01 jabber_lib.process: 180
2015-07-22 15:27:01 jabber_lib._roster_callback: Updating the roster <iq to='osad-a914df551c@SPACEWALK_SERVER/osad' type='set' id='lyfo3fy3ep0w7dtziloywuifbaudde8ydnmg3lkh'><query xmlns = 'jabber:iq:roster' ><item ask='subscribe' jid='rhn-dispatcher-sat@SPACEWALK_SERVER' subscription='from' /></query></iq>
2015-07-22 15:27:09 jabber_lib.process: 180

osad on offline system shows:
2015-07-22 15:03:20 jabber_lib.register_callback: <bound method Client._roster_callback of <osad.osad_client.Client instance at 0x9d8e18>> iq None None None None
2015-07-22 15:03:20 jabber_lib.process: None
2015-07-22 15:03:20 jabber_lib._roster_callback: Updating the roster <iq type='result' id='iq-request-0381de-2'><query xmlns = 'jabber:iq:roster' ><item jid='rhn-dispatcher-sat@SPACEWALK_SERVER' subscription='to' /></query></iq>
2015-07-22 15:03:20 jabber_lib.register_callback: <bound method Client._presence_callback of <osad.osad_client.Client instance at 0x9d8e18>> presence None None None None
2015-07-22 15:03:20 jabber_lib.register_callback: <bound method Client._message_callback of <osad.osad_client.Client instance at 0x9d8e18>> message None None None None
2015-07-22 15:03:20 jabber_lib.subscribe_to_presence: Subscribed to {'rhn-dispatcher-sat@SPACEWALK_SERVER': {'jid': 'rhn-dispatcher-sat@SPACEWALK_SERVER', 'subscription': u'to'}}
2015-07-22 15:03:20 jabber_lib.subscribe_to_presence: Subscribed both {}
2015-07-22 15:03:20 jabber_lib.subscribe_to_presence: Subscribed none {}
2015-07-22 15:03:20 jabber_lib.subscribe_to_presence: Subscribed from {}
2015-07-22 15:03:20 jabber_lib.subscribe_to_presence: Subscribed to
2015-07-22 15:03:20 jabber_lib.send_presence: None None
2015-07-22 15:03:20 jabber_lib.process_forever:
2015-07-22 15:03:20 jabber_lib.process: 180
2015-07-22 15:03:20 jabber_lib._presence_callback: osad-46ac861523@SPACEWALK_SERVER/osad rhn-dispatcher-sat@SPACEWALK_SERVER/superclient None
2015-07-22 15:03:20 jabber_lib._presence_callback: Node is available rhn-dispatcher-sat@SPACEWALK_SERVER/superclient None
2015-07-22 15:03:20 jabber_lib.subscribe_to_presence: Subscribed to {'rhn-dispatcher-sat@SPACEWALK_SERVER': {'jid': 'rhn-dispatcher-sat@SPACEWALK_SERVER', 'subscription': u'to'}}
2015-07-22 15:03:20 jabber_lib.subscribe_to_presence: Subscribed both {}
2015-07-22 15:03:20 jabber_lib.subscribe_to_presence: Subscribed none {}
2015-07-22 15:03:20 jabber_lib.subscribe_to_presence: Subscribed from {}
2015-07-22 15:03:20 jabber_lib.subscribe_to_presence: Subscribed to
2015-07-22 15:03:26 jabber_lib.process: 180

The logs between the "jabber_lib.process: 180" lines repeat every few seconds forever.

My conclusion is that the offline systems fail to get a proper roster subscription with osa-dispatcher, so osa-dispatcher regards them as offline. It then doesn't even try to send ping requests or notify the system to run rhn_check.

My workaround is:
1. stop osa-dispatcher
2. stop jabberd
3. delete everything in /var/lib/jabberd/db
4. start jabberd
5. wait for all osad to reconnect to jabberd
6. start osa-dispatcher
7. All systems are online again (according to table rhnPushClient)

I am seeing this in Spacewalk 2.4 as well and can confirm that the workaround above works. Disabling the roster module in jabberd (commenting it out in /etc/jabberd/sm.xml) also prevents this issue from appearing, and I could still ping and schedule actions.

Deleting /var/lib/jabberd/db will cause you trouble when spacewalk-proxies are connected. By deleting /var/lib/jabberd/db you lose all subscriptions of clients connected to spacewalk-proxy. They won't notice the problem until the osad service is restarted on the affected clients.

This already merged pull request might resolve the problem: https://github.com/spacewalkproject/spacewalk/pull/287

Taking...

Looks like it works well now. I cannot reproduce the bug after upgrading from SW 2.2 to 2.3. OSAD status is online and I can execute remote commands. My steps:
1) Install SW 2.2
2) Register client, try to execute remote command. Result - OK
3) Upgrade SW from 2.2 to 2.3 https://fedorahosted.org/spacewalk/wiki/HowToUpgrade
4) Status for osad of registered client is online, can execute remote command as well.

No information provided for 2 months. The scenario as described works for us. I am closing this BZ with INSUFFICIENT_DATA.
This BZ closed some time during 2.5, 2.6 or 2.7. Adding to 2.7 tracking bug.

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days
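
The roster comparison made in the thread (online clients log subscription='from' with ask='subscribe' for the dispatcher JID, offline clients log subscription='to' with no ask) could be automated across many client logs. The following is only a rough sketch under assumptions: the function names roster_items and matches_offline_pattern are hypothetical, the parsing relies on the single-quoted attribute style seen in the two excerpts above, and the "offline pattern" is a heuristic drawn from those two excerpts alone, not a documented osad or jabberd rule.

```python
import re

# Matches one <item .../> element as printed in osad's roster log lines.
ITEM_RE = re.compile(r"<item\s+([^>]*?)/?>")
# Matches name='value' pairs (the excerpts in this report use single quotes).
ATTR_RE = re.compile(r"([\w:-]+)\s*=\s*'([^']*)'")

# Two roster lines copied from the log excerpts in this report.
ONLINE_LINE = (
    "2015-07-22 15:27:01 jabber_lib._roster_callback: Updating the roster "
    "<iq type='result' id='iq-request-c92a47-2'>"
    "<query xmlns = 'jabber:iq:roster' >"
    "<item ask='subscribe' jid='rhn-dispatcher-sat@SPACEWALK_SERVER' "
    "subscription='from' /></query></iq>"
)
OFFLINE_LINE = (
    "2015-07-22 15:03:20 jabber_lib._roster_callback: Updating the roster "
    "<iq type='result' id='iq-request-0381de-2'>"
    "<query xmlns = 'jabber:iq:roster' >"
    "<item jid='rhn-dispatcher-sat@SPACEWALK_SERVER' "
    "subscription='to' /></query></iq>"
)


def roster_items(log_text):
    """Return one attribute dict per roster <item> found in the log text."""
    items = []
    for line in log_text.splitlines():
        # Only roster stanzas are of interest; skip every other log line.
        if "jabber:iq:roster" not in line:
            continue
        for match in ITEM_RE.finditer(line):
            items.append(dict(ATTR_RE.findall(match.group(1))))
    return items


def matches_offline_pattern(item):
    """Heuristic from this thread only: subscription='to' and no pending ask."""
    return item.get("subscription") == "to" and "ask" not in item


if __name__ == "__main__":
    for label, line in (("online", ONLINE_LINE), ("offline", OFFLINE_LINE)):
        for item in roster_items(line):
            print(label, item.get("jid"), item.get("subscription"),
                  "suspect" if matches_offline_pattern(item) else "ok")
```

To check a real client, the contents of its verbose osad log could be passed to roster_items in place of the sample lines. A match only indicates that the log resembles the offline excerpt above; it does not confirm the root cause.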