Bug 1336426

Summary: on RHEL7 jabberd (and osa-dispatcher because of that) fails to start
Product: [Community] Spacewalk
Reporter: Jan Hutař <jhutar>
Component: Server
Assignee: Ondrej Gajdusek <ogajduse>
Status: CLOSED WORKSFORME
QA Contact: Red Hat Satellite QA List <satqe-list>
Severity: medium
Priority: unspecified
Version: 2.4
CC: areed, bharrington, florin.samareanu, francesco.santagata, frbnrw, max, ogajduse, pascal.pascher, pstudeni, tlestach
Keywords: Patch
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Type: Bug
Last Closed: 2017-06-27 14:49:54 UTC
Bug Blocks: 1310034, 1484117

Description Jan Hutař 2016-05-16 12:47:21 UTC
Description of problem:
On RHEL7, jabberd (and osa-dispatcher because of that) fails to start.


Version-Release number of selected component (if applicable):
Current SW nightly:
  jabberd-2.3.6-1.el7.x86_64
  osa-dispatcher-5.11.70-1.el7.noarch
  glibc-2.17-106.el7_2.6.x86_64


How reproducible:
often


Steps to Reproduce:
1. Restart SW on RHEL7


Actual results:
jabberd-* services failing


Expected results:
Should work


Additional info:
A few observations: the issue is more frequent when I do `rhn-satellite restart` than when I do `rhn-satellite stop; sleep 5; rhn-satellite start`.

When I added strace to the service file /usr/lib/systemd/system/jabberd-c2s.service, like this: "ExecStart=/usr/bin/strace -- /usr/bin/c2s -c /etc/jabberd/c2s.xml", I was not able to reproduce the problem. Maybe strace introduced a delay somewhere?

In the end, after some retries, I got a detailed log with "ExecStart=/usr/bin/c2s -D -c /etc/jabberd/c2s.xml".
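
For reference, the debug run described above can also be applied without editing the packaged unit file, by using a systemd drop-in (the file name 10-debug.conf is only an example; the empty ExecStart= line is needed to clear the packaged ExecStart before overriding it):

  # /etc/systemd/system/jabberd-c2s.service.d/10-debug.conf
  [Service]
  ExecStart=
  ExecStart=/usr/bin/c2s -D -c /etc/jabberd/c2s.xml

  # systemctl daemon-reload
  # systemctl restart jabberd-c2s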

Comment 1 Jan Hutař 2016-05-16 12:49:08 UTC
Workaround is to restart jabberd and osa-dispatcher manually:

  # systemctl restart jabberd
  # systemctl restart osa-dispatcher

A less-tested workaround is to run `rhn-satellite stop; sleep 5; rhn-satellite start` instead of `rhn-satellite restart`.

Comment 4 Jan Hutař 2016-05-16 12:56:02 UTC
Note that although the log complains about DB inconsistency, the following does not fix the issue:

  # rm -rf /var/lib/jabberd/db/*
  # rhn-satellite restart

Comment 7 pascal.pascher 2016-06-25 18:30:05 UTC
Same problem here with a Spacewalk 2.5 installation (though on CentOS 7):

If I restart the Spacewalk VM, jabberd doesn't start up and, as a result, the osa-dispatcher service fails. It works after manually restarting jabberd and osa-dispatcher. Here is the startup log:

jabberd/router[1253]: starting up
jabberd/s2s[1254]: starting up (interval=3, queue=60, keepalive=0, idle=86400)
jabberd/router[1253]: process id is 1253, written to /var/lib/jabberd/pid/router.pid
jabberd/s2s[1254]: process id is 1254, written to /var/lib/jabberd/pid/s2s.pid
jabberd/s2s[1254]: attempting connection to router at ::1, port=5347
jabberd/s2s[1254]: [6] [router] write error: Connection refused (111)
jabberd/s2s[1254]: connection to router closed
jabberd/s2s[1254]: attempting reconnect (3 left)
jabberd/sm[1251]: starting up
jabberd/sm[1251]: process id is 1251, written to /var/lib/jabberd/pid/sm.pid
jabberd/sm[1251]: loading 'db' storage module
jabberd/router[1253]: loaded user table (1 users)
jabberd/router[1253]: loaded filters (0 rules)
jabberd/router[1253]: [::, port=5347] listening for incoming connections
c2s: Sat Jun 25 18:56:35 2016 [notice] modules search path: /usr/lib64/jabberd
c2s: Sat Jun 25 18:56:35 2016 [info] loading 'db' authreg module
sm: unable to join the environment
c2s: unable to allocate memory for the lock table
c2s: PANIC: Cannot allocate memory
c2s: Sat Jun 25 18:56:36 2016 [critical] db: corruption detected! close all jabberd processes and run db_recover
systemd: jabberd-c2s.service: main process exited, code=exited, status=2/INVALIDARGUMENT
systemd: Unit jabberd-c2s.service entered failed state.
systemd: jabberd-c2s.service failed.
systemd: Stopped Jabber Server.
systemd: Stopping Jabber Server...
systemd: Stopping Jabber IM Session Manager...
systemd: Stopping Jabber Server To Server Connector...
jabberd/s2s[1254]: attempting connection to router at ::1, port=5347
jabberd/s2s[1254]: shutting down
jabberd/router[1253]: [::1, port=42448] connect
jabberd/s2s[1254]: connection to router closed
jabberd/router[1253]: [::1, port=42448] disconnect
systemd: Stopped Jabber Server To Server Connector.
jabberd/router[1253]: shutting down
osa-dispatcher: Spacewalk 1296 2016/06/25 18:56:37 +02:00: ('Error connecting to jabber server: Unable to connect to the host and port specified. See https://access.redhat.com/solutions/327903 for more information. ',)
osa-dispatcher: Spacewalk 1296 2016/06/25 18:56:37 +02:00: ('Error caught:',)
osa-dispatcher: ERROR: unhandled exception occurred: (unicode argument expected, got 'str').
systemd: osa-dispatcher.service: control process exited, code=exited status=255
systemd: Failed to start OSA Dispatcher daemon.
systemd: Unit osa-dispatcher.service entered failed state.
systemd: osa-dispatcher.service failed.

- Just executing 'spacewalk-service restart' doesn't fix the issue.
- What does work, as mentioned before by Jan Hutař, is a 'systemctl restart jabberd' followed by a 'systemctl restart osa-dispatcher'.
- I also tried following the advice at 'https://access.redhat.com/solutions/327903' but it didn't change anything (a generic DB recovery sketch follows below, for reference).
- Executing 'spacewalk-setup-jabberd' doesn't help either.
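
For anyone else hitting the "corruption detected! close all jabberd processes and run db_recover" message in the c2s log above, the usual Berkeley DB recovery attempt looks roughly like the following (a sketch only; db_recover ships with the libdb utilities and -h points it at the jabberd database directory). As Comment 4 already notes, clearing or recovering the database did not resolve this particular problem:

  # spacewalk-service stop
  # db_recover -v -h /var/lib/jabberd/db
  # spacewalk-service start

If db_recover is run as root, check the ownership of the files under /var/lib/jabberd/db afterwards so the jabber user can still access them.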

Comment 8 Aaron 2016-07-07 00:31:29 UTC
I ran `alternatives --config java` and removed all Java versions except Java 8, and my services started working again.

Comment 9 Aaron 2016-07-07 00:48:41 UTC
(In reply to Aaron from comment #8)
> I ran `alternatives --config java` and removed all Java versions except
> Java 8, and my services started working again.

Unfortunately this broke the webui.

Comment 10 Pavel Studeník 2016-07-15 14:07:14 UTC
I found an AVC denial while restarting the osa-dispatcher service. It probably isn't associated with this bug.


# systemctl restart osa-dispatcher.service
...
type=SERVICE_STOP msg=audit(1468591032.022:1734): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=osa-dispatcher comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
type=AVC msg=audit(1468591032.089:1735): avc:  denied  { write } for  pid=31826 comm="osa-dispatcher" name="osad" dev="dm-0" ino=101055545 scontext=system_u:system_r:osa_dispatcher_t:s0 tcontext=system_u:object_r:usr_t:s0 tclass=dir
type=SYSCALL msg=audit(1468591032.089:1735): arch=c000003e syscall=87 success=no exit=-13 a0=207a5e0 a1=ebb8 a2=81b4 a3=7ffd12208a20 items=0 ppid=1 pid=31826 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="osa-dispatcher" exe="/usr/bin/python2.7" subj=system_u:system_r:osa_dispatcher_t:s0 key=(null)
type=SERVICE_START msg=audit(1468591032.313:1736): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=osa-dispatcher comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
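
As noted, the denial is probably unrelated to this bug, but for reference, an AVC like the one above is usually inspected with the standard SELinux tooling (ausearch and audit2allow; the module name local_osad below is arbitrary, and the proper fix for a genuine denial would normally belong in spacewalk-selinux rather than a local module):

  # ausearch -m avc -ts recent -c osa-dispatcher
  # ausearch -m avc -c osa-dispatcher | audit2allow -M local_osad
  # semodule -i local_osad.pp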

Comment 12 francesco.santagata 2016-07-21 15:52:29 UTC
I can confirm everything, using 7.2 and the latest updates.

Comment 13 Florin Samareanu 2016-08-16 06:30:22 UTC
Adding a +1. Workaround confirmed as well: stop jabberd, start jabberd, restart osa-dispatcher.

Comment 14 max 2016-09-05 19:45:31 UTC
Can also confirm, using the latest CentOS 7 release and Spacewalk 2.5.

Comment 15 Brian Harrington 2016-09-28 01:22:29 UTC
Potential fix submitted here:

https://github.com/spacewalkproject/spacewalk/pull/462

Before:

[root@spacewalk rhn]# /usr/sbin/spacewalk-service restart 
Shutting down spacewalk services...
Stopping RHN Taskomatic...
Stopped RHN Taskomatic.
Stopping cobblerd (via systemctl):                         [  OK  ]
Redirecting to /bin/systemctl stop  rhn-search.service
Redirecting to /bin/systemctl stop  osa-dispatcher.service
Redirecting to /bin/systemctl stop  httpd.service
Redirecting to /bin/systemctl stop  tomcat.service
Redirecting to /bin/systemctl stop  jabberd.service
Redirecting to /bin/systemctl stop  postgresql.service
Done.
Starting spacewalk services...
Redirecting to /bin/systemctl start  postgresql.service
Redirecting to /bin/systemctl start  jabberd.service
Redirecting to /bin/systemctl start  tomcat.service
Waiting for tomcat to be ready ...
Redirecting to /bin/systemctl start  httpd.service
Redirecting to /bin/systemctl start  osa-dispatcher.service
Job for osa-dispatcher.service failed because the control process exited with error code. See "systemctl status osa-dispatcher.service" and "journalctl -xe" for details.
Redirecting to /bin/systemctl start  rhn-search.service
Starting cobblerd (via systemctl):                         [  OK  ]
Starting RHN Taskomatic...
Done.
[root@spacewalk rhn]# systemctl status osa-dispatcher
● osa-dispatcher.service - OSA Dispatcher daemon
   Loaded: loaded (/usr/lib/systemd/system/osa-dispatcher.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Wed 2016-09-28 00:39:49 UTC; 27min ago
  Process: 5679 ExecStart=/usr/sbin/osa-dispatcher --pid-file /var/run/osa-dispatcher.pid (code=exited, status=255)
  Process: 5676 ExecStartPre=/bin/rm -f /var/run/osa-dispatcher.pid (code=exited, status=0/SUCCESS)
 Main PID: 4007 (code=killed, signal=TERM)

Sep 28 00:39:48 spacewalk.lab.libcore.so systemd[1]: Starting OSA Dispatcher daemon...
Sep 28 00:39:49 spacewalk.lab.libcore.so osa-dispatcher[5679]: Spacewalk 5679 2016/09/28 00:39:49 -00:00: ('Error connecting to jabber server: Unable to connect to the host and ...ation. ',)
Sep 28 00:39:49 spacewalk.lab.libcore.so osa-dispatcher[5679]: Spacewalk 5679 2016/09/28 00:39:49 -00:00: ('Error caught:',)
Sep 28 00:39:49 spacewalk.lab.libcore.so osa-dispatcher[5679]: ERROR: unhandled exception occurred: (unicode argument expected, got 'str').
Sep 28 00:39:49 spacewalk.lab.libcore.so systemd[1]: osa-dispatcher.service: control process exited, code=exited status=255
Sep 28 00:39:49 spacewalk.lab.libcore.so systemd[1]: Failed to start OSA Dispatcher daemon.
Sep 28 00:39:49 spacewalk.lab.libcore.so systemd[1]: Unit osa-dispatcher.service entered failed state.
Sep 28 00:39:49 spacewalk.lab.libcore.so systemd[1]: osa-dispatcher.service failed.
Hint: Some lines were ellipsized, use -l to show in full.



After:
[root@spacewalk rhn]# /usr/sbin/spacewalk-service restart
Shutting down spacewalk services...
Stopping RHN Taskomatic...
Stopped RHN Taskomatic.
Stopping cobblerd (via systemctl):                         [  OK  ]
Redirecting to /bin/systemctl stop  rhn-search.service
Redirecting to /bin/systemctl stop  osa-dispatcher.service
Redirecting to /bin/systemctl stop  httpd.service
Redirecting to /bin/systemctl stop  tomcat.service
Redirecting to /bin/systemctl stop  jabberd.service
Redirecting to /bin/systemctl stop  postgresql.service
Done.
Starting spacewalk services...
Redirecting to /bin/systemctl start  postgresql.service
Redirecting to /bin/systemctl start  jabberd.service
Redirecting to /bin/systemctl start  tomcat.service
Waiting for tomcat to be ready ...
Redirecting to /bin/systemctl start  httpd.service
Redirecting to /bin/systemctl start  osa-dispatcher.service
Redirecting to /bin/systemctl start  rhn-search.service
Starting cobblerd (via systemctl):                         [  OK  ]
Starting RHN Taskomatic...
Done.
[root@spacewalk rhn]# systemctl status  osa-dispatcher
● osa-dispatcher.service - OSA Dispatcher daemon
   Loaded: loaded (/usr/lib/systemd/system/osa-dispatcher.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/osa-dispatcher.service.d
           └─10-dependency.conf
   Active: active (running) since Wed 2016-09-28 01:13:53 UTC; 38s ago
  Process: 7684 ExecStart=/usr/sbin/osa-dispatcher --pid-file /var/run/osa-dispatcher.pid (code=exited, status=0/SUCCESS)
  Process: 7681 ExecStartPre=/bin/rm -f /var/run/osa-dispatcher.pid (code=exited, status=0/SUCCESS)
 Main PID: 7686 (osa-dispatcher)
   CGroup: /system.slice/osa-dispatcher.service
           └─7686 /usr/bin/python -s /usr/sbin/osa-dispatcher --pid-file /var/run/osa-dispatcher.pid

Sep 28 01:13:53 spacewalk.lab.libcore.so systemd[1]: Starting OSA Dispatcher daemon...
Sep 28 01:13:53 spacewalk.lab.libcore.so systemd[1]: Started OSA Dispatcher daemon.
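
The "After" output shows the fix being delivered as a drop-in, /etc/systemd/system/osa-dispatcher.service.d/10-dependency.conf. The exact contents live in the pull request and are not quoted in this report; purely as an assumption, a minimal ordering drop-in of this kind would look something like:

  # /etc/systemd/system/osa-dispatcher.service.d/10-dependency.conf (contents assumed, see the PR)
  [Unit]
  After=jabberd.service
  Wants=jabberd.service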

Comment 16 max 2016-09-30 13:19:51 UTC
I tried the fix from the pull request (modifying the base unit file) and found it didn't work for me. Interestingly though, a 60-second ExecStartPre sleep does do the job, and the services now start cleanly on system/service start/restart. It almost seems as if osa-dispatcher is trying to connect to jabberd-c2s before it has completely started, causing mayhem with both.
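
A sketch of that workaround as a drop-in, assuming a flat 60-second delay before osa-dispatcher starts (the file name 10-sleep.conf is arbitrary):

  # /etc/systemd/system/osa-dispatcher.service.d/10-sleep.conf
  [Service]
  ExecStartPre=/usr/bin/sleep 60

  # systemctl daemon-reload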

Comment 17 frbnrw 2016-12-06 08:44:07 UTC
Hi

I set up a completely new system with CentOS Linux release 7.2.1511 (Core) and Spacewalk 2.6.

There was a lot of trouble during the installation process, and later I saw that jabberd and osa-dispatcher had not started.

In the logs I found:
kernel: sm[2824]: segfault at 7f98e9a6cfd8 ip 00007f00e5c66b55 sp 00007ffc413a4bb0 error 4 in libdb-4.8.so[7f00e5ba7000+17d000]

After a reboot, tomcat was up, but jabberd and osa-dispatcher were not.

I can confirm that
  # systemctl restart jabberd
  # systemctl restart osa-dispatcher
can help (after 1 or 2 tries).


In my humble opinion, sm is the reason why the other processes die.

For me, the following solution worked:

# add Restart=always and RestartSec=5
# /usr/lib/systemd/system/jabberd-sm.service
[Service]
User=jabber
ExecStart=/usr/bin/sm -c /etc/jabberd/sm.xml
Restart=always
RestartSec=5

# add tomcat.service to the Requires= line
# /usr/lib/systemd/system/jabberd.service
Requires=tomcat.service jabberd-router.service jabberd-sm.service jabberd-c2s.service jabberd-s2s.service


# add Restart=always and RestartSec=5
# /usr/lib/systemd/system/osa-dispatcher.service
[Service]
Type=forking
EnvironmentFile=-/etc/sysconfig/osa-dispatcher
PIDFile=/var/run/osa-dispatcher.pid
ExecStart=/usr/sbin/osa-dispatcher --pid-file /var/run/osa-dispatcher.pid
ExecStartPre=/bin/rm -f /var/run/osa-dispatcher.pid
Restart=always
RestartSec=5
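
After changing unit files like the ones above, systemd has to re-read them before the changes take effect. Note also that edits under /usr/lib/systemd/system can be overwritten by package updates, so a drop-in under /etc/systemd/system/<unit>.service.d/ is the more durable place for overrides like these:

  # systemctl daemon-reload
  # systemctl restart jabberd osa-dispatcher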

Comment 18 Ondrej Gajdusek 2017-06-27 14:49:54 UTC
I cannot reproduce this bug, running Spacewalk nightly on RHEL 7.

I tried these steps:
# after fresh installation
spacewalk-service status # every service is active
spacewalk-service restart # every service is active

spacewalk-service stop
spacewalk-service start # everything ok

Tried on two machines, same results.

The patch mentioned in Comment 15 was sent as a PR, which has since been closed; see the resolution: https://github.com/spacewalkproject/spacewalk/pull/462#issuecomment-307578187

We've decided to close this bug. Feel free to comment or reopen this bug if you can reproduce it against nightly.


Additional info:

# rpm -qa |grep spacewalk |sort
spacewalk-admin-2.6.1-1.el7.noarch
spacewalk-backend-2.7.114-1.el7.noarch
spacewalk-backend-app-2.7.114-1.el7.noarch
spacewalk-backend-applet-2.7.114-1.el7.noarch
spacewalk-backend-config-files-2.7.114-1.el7.noarch
spacewalk-backend-config-files-common-2.7.114-1.el7.noarch
spacewalk-backend-config-files-tool-2.7.114-1.el7.noarch
spacewalk-backend-iss-2.7.114-1.el7.noarch
spacewalk-backend-iss-export-2.7.114-1.el7.noarch
spacewalk-backend-libs-2.7.114-1.el7.noarch
spacewalk-backend-package-push-server-2.7.114-1.el7.noarch
spacewalk-backend-server-2.7.114-1.el7.noarch
spacewalk-backend-sql-2.7.114-1.el7.noarch
spacewalk-backend-sql-postgresql-2.7.114-1.el7.noarch
spacewalk-backend-tools-2.7.114-1.el7.noarch
spacewalk-backend-xml-export-libs-2.7.114-1.el7.noarch
spacewalk-backend-xmlrpc-2.7.114-1.el7.noarch
spacewalk-base-2.7.3-1.el7.noarch
spacewalk-base-minimal-2.7.3-1.el7.noarch
spacewalk-base-minimal-config-2.7.3-1.el7.noarch
spacewalk-branding-2.7.4-1.el7.noarch
spacewalk-certs-tools-2.7.1-1.el7.noarch
spacewalk-common-2.7.2-1.el7.noarch
spacewalk-config-2.7.2-1.el7.noarch
spacewalk-dobby-2.7.3-1.el7.noarch
spacewalk-doc-indexes-2.5.2-1.el7.noarch
spacewalk-html-2.7.3-1.el7.noarch
spacewalk-java-2.7.86-1.el7.noarch
spacewalk-java-config-2.7.86-1.el7.noarch
spacewalk-java-lib-2.7.86-1.el7.noarch
spacewalk-java-postgresql-2.7.86-1.el7.noarch
spacewalk-postgresql-2.7.2-1.el7.noarch
spacewalk-repo-2.6-0.el7.noarch
spacewalk-reports-2.7.5-1.el7.noarch
spacewalk-schema-2.7.24-1.el7.noarch
spacewalk-search-2.7.5-1.el7.noarch
spacewalk-selinux-2.7.1-1.el7.noarch
spacewalk-setup-2.7.9-1.el7.noarch
spacewalk-setup-jabberd-2.7.1-1.el7.noarch
spacewalk-setup-postgresql-2.7.3-1.el7.noarch
spacewalk-taskomatic-2.7.86-1.el7.noarch
spacewalk-usix-2.7.5-1.el7.noarch
spacewalk-utils-2.7.15-1.el7.noarch

Comment 19 Eric Herget 2017-09-28 18:07:12 UTC
This BZ was closed some time during 2.5, 2.6, or 2.7. Adding to the 2.7 tracking bug.