Description of problem:
I cannot connect to the Satellite webUI after a few days. The log shows:
osad/jabber_lib.main('Unable to connect to jabber servers, sleeping 10 seconds',)

Version-Release number of selected component (if applicable):
sat530

How reproducible:
I saw this three times on one machine.

Steps to Reproduce:
1. clean i386 RHEL4U8 + sat52 with external oracle
2. upgrade to sat530
3. wait a few days

Actual results:
Service Temporarily Unavailable

Expected results:
webUI works

Additional info:
tail /var/log/rhn/osa-dispatcher.log
2009/07/08 07:20:26 -04:00 30903 0.0.0.0: osad/jabber_lib.__init__
2009/07/08 07:20:26 -04:00 30903 0.0.0.0: osad/jabber_lib.print_message('socket error',)
2009/07/08 07:20:26 -04:00 30903 0.0.0.0: osad/jabber_lib.print_message('Could not connect to jabber server', 'hp-bl460c-02.rhts.bos.redhat.com')
2009/07/08 07:20:26 -04:00 30903 0.0.0.0: osad/jabber_lib.setup_connection('Could not connect to any jabber server',)
2009/07/08 07:20:26 -04:00 30903 0.0.0.0: osad/jabber_lib.main('Unable to connect to jabber servers, sleeping 10 seconds',)

---
I restarted the satellite last time and it worked for the next few days. The problem has now appeared again.
jabberd is stopped:

[root@hp-bl460c-02 ~]# rhn-satellite status
jabberd router is stopped
osa-dispatcher (pid 30904) is running...
lock file found but no process running for pid 31435
httpd (pid 31456 7059 7058 7057 7056 7055 7054 7053 7052) is running...
2009-07-08 08:28:28 Monitoring: ----------- InstallSoftwareConfig STATUS ---------------
2009-07-08 08:28:28 Monitoring: ----------- GenerateNotifConfig STATUS ---------------
2009-07-08 08:28:28 Monitoring: ----------- NotifEscalator STATUS ---------------
2009-07-08 08:28:28 Monitoring: ----------- NotifLauncher STATUS ---------------
2009-07-08 08:28:28 Monitoring: ----------- Notifier STATUS ---------------
2009-07-08 08:28:29 Monitoring: ----------- AckProcessor STATUS ---------------
2009-07-08 08:28:29 Monitoring: ----------- TSDBLocalQueue STATUS ---------------
2009-07-08 08:28:29 MonitoringScout: ----------- InstallSoftwareConfig STATUS ---------------
2009-07-08 08:28:29 MonitoringScout: ----------- NPBootstrap STATUS ---------------
2009-07-08 08:28:29 MonitoringScout: ----------- SputLite STATUS ---------------
2009-07-08 08:28:30 MonitoringScout: ----------- Dequeuer STATUS ---------------
2009-07-08 08:28:30 MonitoringScout: ----------- Dispatcher STATUS ---------------
rhn-search is running (31703).
cobblerd (pid 6986) is running...
RHN Taskomatic is running (31814).
jabberd is being restarted due to logrotate. I was able to replicate it by running:

logrotate -f /etc/logrotate.conf

I reviewed all the contents of /etc/logrotate* and see nothing there that tells logrotate to rotate and HUP jabberd. Very weird. Still looking.
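The manual review of /etc/logrotate* could be repeated with a small script. This is only a sketch and not part of the fix: it lists lines inside logrotate script blocks (prerotate, postrotate, firstaction, lastaction) that look like they restart or signal a service. The keyword list and the function names are assumptions made for illustration, and a clean result does not prove that no script touches jabberd.

```python
import re
from pathlib import Path

# Assumed heuristic: words that suggest a script restarts or signals a service.
SUSPECT = re.compile(r"\b(restart|reload|condrestart|kill|killall|HUP)\b", re.IGNORECASE)

# logrotate directives that open a script block; each block ends with "endscript".
SCRIPT_START = {"prerotate", "postrotate", "firstaction", "lastaction"}


def find_suspect_script_lines(config_text):
    """Return (line_number, line) pairs inside script blocks that match SUSPECT."""
    hits = []
    in_script = False
    for number, raw in enumerate(config_text.splitlines(), start=1):
        line = raw.strip()
        if line in SCRIPT_START:
            in_script = True
        elif line == "endscript":
            in_script = False
        elif in_script and SUSPECT.search(line):
            hits.append((number, line))
    return hits


def scan_logrotate_configs(paths):
    """Scan the given files and directories; return {path: hits} for files with hits."""
    results = {}
    for entry in map(Path, paths):
        files = sorted(entry.iterdir()) if entry.is_dir() else [entry]
        for path in files:
            if not path.is_file():
                continue
            try:
                text = path.read_text(errors="replace")
            except OSError:
                continue  # unreadable file, skip it
            hits = find_suspect_script_lines(text)
            if hits:
                results[str(path)] = hits
    return results
```

Calling scan_logrotate_configs(["/etc/logrotate.conf", "/etc/logrotate.d"]) as root prints nothing by itself; it returns a dictionary of file names and suspicious lines to go through by hand.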
Fix in Spacewalk repo, master 172090659d5a0b6cba91299617794e369672ace4, VADER 7b2dc374f28c04f68c4045520cd4354bfc6c7e59.
Moving to ON_QA
Yes, the fix is in ProgAGoGo-1.11.5-2.el4sat.noarch.rpm and ProgAGoGo-1.11.5-2.el5sat.noarch.rpm.
Pulling from ON_QA, until the fix for bugzilla 516073 hits the compose.
Verified in stage on xen5. I forced logrotate to rotate even with 1k of log size, and verified that monitoring was restarted and that jabberd and tomcat survived.
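A check like the one above could be scripted roughly as follows. This is a hedged sketch, not the procedure actually used for verification: it parses text in the format of the `rhn-satellite status` output quoted earlier in this report and lists services reported as stopped. The function names are hypothetical, and the line format is assumed from that single sample, so other versions may print something different.

```python
import re

# Matches lines such as "cobblerd (pid 6986) is running..." or
# "jabberd router is stopped"; the format is assumed from the sample output.
STATUS_LINE = re.compile(
    r"^(?P<name>.+?)(?: \(pid [\d ]+\))? is (?P<state>running|stopped)"
)


def parse_status(output):
    """Map each service name found in the status output to 'running' or 'stopped'."""
    states = {}
    for raw in output.splitlines():
        match = STATUS_LINE.match(raw.strip())
        if match:
            states[match.group("name")] = match.group("state")
    return states


def stopped_services(output):
    """Return the names of services reported as stopped, in sorted order."""
    return sorted(name for name, state in parse_status(output).items() if state == "stopped")
```

Running the forced rotation, capturing the output of `rhn-satellite status` afterwards, and passing it to stopped_services() should give an empty list on a fixed system. Lines that do not match the assumed format, such as the Monitoring STATUS banners, are ignored.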
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHEA-2009-1434.html