Bug 1336426 - on RHEL7 jabberd (and osa-dispatcher because of that) fails to start
Summary: on RHEL7 jabberd (and osa-dispatcher because of that) fails to start
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Spacewalk
Classification: Community
Component: Server
Version: 2.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Ondrej Gajdusek
QA Contact: Red Hat Satellite QA List
URL:
Whiteboard:
Depends On:
Blocks: spacewalk-review space27
Reported: 2016-05-16 12:47 UTC by Jan Hutař
Modified: 2017-09-28 18:07 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-06-27 14:49:54 UTC



Description Jan Hutař 2016-05-16 12:47:21 UTC
Description of problem:
On RHEL7 jabberd (and osa-dispatcher because of that) fails to start


Version-Release number of selected component (if applicable):
Current SWnightly:
  jabberd-2.3.6-1.el7.x86_64
  osa-dispatcher-5.11.70-1.el7.noarch
  glibc-2.17-106.el7_2.6.x86_64


How reproducible:
often


Steps to Reproduce:
1. Restart SW on RHEL7


Actual results:
jabberd-* services failing


Expected results:
Should work


Additional info:
An impression: the issue is more frequent when I run `rhn-satellite restart` than when I run `rhn-satellite stop; sleep 5; rhn-satellite start`.

When I added strace to the service file /usr/lib/systemd/system/jabberd-c2s.service like this: "ExecStart=/usr/bin/strace -- /usr/bin/c2s -c /etc/jabberd/c2s.xml", I was not able to reproduce the issue. Maybe strace introduced a delay somewhere?

In the end, after several retries, I got a detailed log with "ExecStart=/usr/bin/c2s -D -c /etc/jabberd/c2s.xml".

Comment 1 Jan Hutař 2016-05-16 12:49:08 UTC
Workaround is to restart jabberd and osa-dispatcher manually:

  # systemctl restart jabberd
  # systemctl restart osa-dispatcher

A less-tested workaround is to run `rhn-satellite stop; sleep 5; rhn-satellite start` instead of `rhn-satellite restart`.
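
The stop/pause/start workaround can be wrapped in a small helper. This is only a minimal sketch and not part of Spacewalk: the function name `restart_with_pause` and the `SPACEWALK_CTL` override variable are hypothetical, and the 5-second default is simply the value tried above.

```shell
#!/bin/sh
# Sketch: stop all Spacewalk services, pause, then start them again.
# SPACEWALK_CTL (hypothetical) selects the control command; it defaults
# to rhn-satellite as used in this report.
restart_with_pause() {
    pause="${1:-5}"                       # seconds to wait between stop and start
    ctl="${SPACEWALK_CTL:-rhn-satellite}"
    "$ctl" stop || return 1               # give up if the stop step fails
    sleep "$pause"
    "$ctl" start
}
```

Run as root, for example `restart_with_pause 5`. A longer pause may be needed if the race is timing-dependent; that has not been verified here.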

Comment 4 Jan Hutař 2016-05-16 12:56:02 UTC
Note that although the log complains about DB inconsistency, clearing the jabberd database does not fix the issue:

  # rm -rf /var/lib/jabberd/db/*
  # rhn-satellite restart

Comment 7 pascal.pascher 2016-06-25 18:30:05 UTC
Same problem here with a Spacewalk 2.5 installation (though on CentOS 7):

If I restart the Spacewalk VM, jabberd doesn't start up and, as a result, the osa-dispatcher service fails. It works after manually restarting jabberd and osa-dispatcher. Here is the startup log:

jabberd/router[1253]: starting up
jabberd/s2s[1254]: starting up (interval=3, queue=60, keepalive=0, idle=86400)
jabberd/router[1253]: process id is 1253, written to /var/lib/jabberd/pid/router.pid
jabberd/s2s[1254]: process id is 1254, written to /var/lib/jabberd/pid/s2s.pid
jabberd/s2s[1254]: attempting connection to router at ::1, port=5347
jabberd/s2s[1254]: [6] [router] write error: Connection refused (111)
jabberd/s2s[1254]: connection to router closed
jabberd/s2s[1254]: attempting reconnect (3 left)
jabberd/sm[1251]: starting up
jabberd/sm[1251]: process id is 1251, written to /var/lib/jabberd/pid/sm.pid
jabberd/sm[1251]: loading 'db' storage module
jabberd/router[1253]: loaded user table (1 users)
jabberd/router[1253]: loaded filters (0 rules)
jabberd/router[1253]: [::, port=5347] listening for incoming connections
c2s: Sat Jun 25 18:56:35 2016 [notice] modules search path: /usr/lib64/jabberd
c2s: Sat Jun 25 18:56:35 2016 [info] loading 'db' authreg module
sm: unable to join the environment
c2s: unable to allocate memory for the lock table
c2s: PANIC: Cannot allocate memory
c2s: Sat Jun 25 18:56:36 2016 [critical] db: corruption detected! close all jabberd processes and run db_recover
systemd: jabberd-c2s.service: main process exited, code=exited, status=2/INVALIDARGUMENT
systemd: Unit jabberd-c2s.service entered failed state.
systemd: jabberd-c2s.service failed.
systemd: Stopped Jabber Server.
systemd: Stopping Jabber Server...
systemd: Stopping Jabber IM Session Manager...
systemd: Stopping Jabber Server To Server Connector...
jabberd/s2s[1254]: attempting connection to router at ::1, port=5347
jabberd/s2s[1254]: shutting down
jabberd/router[1253]: [::1, port=42448] connect
jabberd/s2s[1254]: connection to router closed
jabberd/router[1253]: [::1, port=42448] disconnect
systemd: Stopped Jabber Server To Server Connector.
jabberd/router[1253]: shutting down
osa-dispatcher: Spacewalk 1296 2016/06/25 18:56:37 +02:00: ('Error connecting to jabber server: Unable to connect to the host and port specified. See https://access.redhat.com/solutions/327903 for more information. ',)
osa-dispatcher: Spacewalk 1296 2016/06/25 18:56:37 +02:00: ('Error caught:',)
osa-dispatcher: ERROR: unhandled exception occurred: (unicode argument expected, got 'str').
systemd: osa-dispatcher.service: control process exited, code=exited status=255
systemd: Failed to start OSA Dispatcher daemon.
systemd: Unit osa-dispatcher.service entered failed state.
systemd: osa-dispatcher.service failed.

- Just executing 'spacewalk-service restart' doesn't fix the issue.
- What does work, as mentioned before by Jan Hutař, is 'systemctl restart jabberd' followed by 'systemctl restart osa-dispatcher'.
- I also tried following the advice at 'https://access.redhat.com/solutions/327903', but it didn't change anything.
- Executing 'spacewalk-setup-jabberd' doesn't help either.

Comment 8 Aaron 2016-07-07 00:31:29 UTC
Run `alternatives --config java`.
I removed all Java versions except 8 and my services started working again.

Comment 9 Aaron 2016-07-07 00:48:41 UTC
(In reply to Aaron from comment #8)
> Run alternatives --config java
> Remove all Java version but 8 and my services started working again.

Unfortunately this broke the webui.

Comment 10 Pavel Studeník 2016-07-15 14:07:14 UTC
I found an AVC message while restarting the osa-dispatcher service. It is probably not related to this bug.


>> systemctl restart osa-dispatcher.service
...
type=SERVICE_STOP msg=audit(1468591032.022:1734): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=osa-dispatcher comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
type=AVC msg=audit(1468591032.089:1735): avc:  denied  { write } for  pid=31826 comm="osa-dispatcher" name="osad" dev="dm-0" ino=101055545 scontext=system_u:system_r:osa_dispatcher_t:s0 tcontext=system_u:object_r:usr_t:s0 tclass=dir
type=SYSCALL msg=audit(1468591032.089:1735): arch=c000003e syscall=87 success=no exit=-13 a0=207a5e0 a1=ebb8 a2=81b4 a3=7ffd12208a20 items=0 ppid=1 pid=31826 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="osa-dispatcher" exe="/usr/bin/python2.7" subj=system_u:system_r:osa_dispatcher_t:s0 key=(null)
type=SERVICE_START msg=audit(1468591032.313:1736): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=osa-dispatcher comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'

Comment 12 francesco.santagata 2016-07-21 15:52:29 UTC
I confirm everything, using 7.2 and latest updates

Comment 13 Florin Samareanu 2016-08-16 06:30:22 UTC
Adding a +1. Workaround confirmed as well: stop jabberd, start jabberd, restart osa-dispatcher.

Comment 14 max 2016-09-05 19:45:31 UTC
Can also confirm, using latest CentOS 7 release and Spacewalk 2.5.

Comment 15 Brian Harrington 2016-09-28 01:22:29 UTC
Potential fix submitted here:

https://github.com/spacewalkproject/spacewalk/pull/462

Before:

[root@spacewalk rhn]# /usr/sbin/spacewalk-service restart 
Shutting down spacewalk services...
Stopping RHN Taskomatic...
Stopped RHN Taskomatic.
Stopping cobblerd (via systemctl):                         [  OK  ]
Redirecting to /bin/systemctl stop  rhn-search.service
Redirecting to /bin/systemctl stop  osa-dispatcher.service
Redirecting to /bin/systemctl stop  httpd.service
Redirecting to /bin/systemctl stop  tomcat.service
Redirecting to /bin/systemctl stop  jabberd.service
Redirecting to /bin/systemctl stop  postgresql.service
Done.
Starting spacewalk services...
Redirecting to /bin/systemctl start  postgresql.service
Redirecting to /bin/systemctl start  jabberd.service
Redirecting to /bin/systemctl start  tomcat.service
Waiting for tomcat to be ready ...
Redirecting to /bin/systemctl start  httpd.service
Redirecting to /bin/systemctl start  osa-dispatcher.service
Job for osa-dispatcher.service failed because the control process exited with error code. See "systemctl status osa-dispatcher.service" and "journalctl -xe" for details.
Redirecting to /bin/systemctl start  rhn-search.service
Starting cobblerd (via systemctl):                         [  OK  ]
Starting RHN Taskomatic...
Done.
[root@spacewalk rhn]# systemctl status osa-dispatcher
● osa-dispatcher.service - OSA Dispatcher daemon
   Loaded: loaded (/usr/lib/systemd/system/osa-dispatcher.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Wed 2016-09-28 00:39:49 UTC; 27min ago
  Process: 5679 ExecStart=/usr/sbin/osa-dispatcher --pid-file /var/run/osa-dispatcher.pid (code=exited, status=255)
  Process: 5676 ExecStartPre=/bin/rm -f /var/run/osa-dispatcher.pid (code=exited, status=0/SUCCESS)
 Main PID: 4007 (code=killed, signal=TERM)

Sep 28 00:39:48 spacewalk.lab.libcore.so systemd[1]: Starting OSA Dispatcher daemon...
Sep 28 00:39:49 spacewalk.lab.libcore.so osa-dispatcher[5679]: Spacewalk 5679 2016/09/28 00:39:49 -00:00: ('Error connecting to jabber server: Unable to connect to the host and ...ation. ',)
Sep 28 00:39:49 spacewalk.lab.libcore.so osa-dispatcher[5679]: Spacewalk 5679 2016/09/28 00:39:49 -00:00: ('Error caught:',)
Sep 28 00:39:49 spacewalk.lab.libcore.so osa-dispatcher[5679]: ERROR: unhandled exception occurred: (unicode argument expected, got 'str').
Sep 28 00:39:49 spacewalk.lab.libcore.so systemd[1]: osa-dispatcher.service: control process exited, code=exited status=255
Sep 28 00:39:49 spacewalk.lab.libcore.so systemd[1]: Failed to start OSA Dispatcher daemon.
Sep 28 00:39:49 spacewalk.lab.libcore.so systemd[1]: Unit osa-dispatcher.service entered failed state.
Sep 28 00:39:49 spacewalk.lab.libcore.so systemd[1]: osa-dispatcher.service failed.
Hint: Some lines were ellipsized, use -l to show in full.



After:
[root@spacewalk rhn]# /usr/sbin/spacewalk-service restart
Shutting down spacewalk services...
Stopping RHN Taskomatic...
Stopped RHN Taskomatic.
Stopping cobblerd (via systemctl):                         [  OK  ]
Redirecting to /bin/systemctl stop  rhn-search.service
Redirecting to /bin/systemctl stop  osa-dispatcher.service
Redirecting to /bin/systemctl stop  httpd.service
Redirecting to /bin/systemctl stop  tomcat.service
Redirecting to /bin/systemctl stop  jabberd.service
Redirecting to /bin/systemctl stop  postgresql.service
Done.
Starting spacewalk services...
Redirecting to /bin/systemctl start  postgresql.service
Redirecting to /bin/systemctl start  jabberd.service
Redirecting to /bin/systemctl start  tomcat.service
Waiting for tomcat to be ready ...
Redirecting to /bin/systemctl start  httpd.service
Redirecting to /bin/systemctl start  osa-dispatcher.service
Redirecting to /bin/systemctl start  rhn-search.service
Starting cobblerd (via systemctl):                         [  OK  ]
Starting RHN Taskomatic...
Done.
[root@spacewalk rhn]# systemctl status  osa-dispatcher
● osa-dispatcher.service - OSA Dispatcher daemon
   Loaded: loaded (/usr/lib/systemd/system/osa-dispatcher.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/osa-dispatcher.service.d
           └─10-dependency.conf
   Active: active (running) since Wed 2016-09-28 01:13:53 UTC; 38s ago
  Process: 7684 ExecStart=/usr/sbin/osa-dispatcher --pid-file /var/run/osa-dispatcher.pid (code=exited, status=0/SUCCESS)
  Process: 7681 ExecStartPre=/bin/rm -f /var/run/osa-dispatcher.pid (code=exited, status=0/SUCCESS)
 Main PID: 7686 (osa-dispatcher)
   CGroup: /system.slice/osa-dispatcher.service
           └─7686 /usr/bin/python -s /usr/sbin/osa-dispatcher --pid-file /var/run/osa-dispatcher.pid

Sep 28 01:13:53 spacewalk.lab.libcore.so systemd[1]: Starting OSA Dispatcher daemon...
Sep 28 01:13:53 spacewalk.lab.libcore.so systemd[1]: Started OSA Dispatcher daemon.

Comment 16 max 2016-09-30 13:19:51 UTC
I tried the fix from the pull request (modifying the base unit file) and found it didn't work for me. Interestingly, though, a 60-second sleep in ExecStartPre does the job, and the service now starts cleanly on system/service start/restart. It almost seems as if osa-dispatcher is trying to connect to jabberd-c2s before it has completely started, causing mayhem with both.
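
If the cause really is a startup race, as guessed above, a fixed 60-second sleep could be replaced by polling a readiness check. The following is a minimal sketch under that assumption; `wait_until` is a hypothetical helper, not something shipped with Spacewalk or jabberd.

```shell
#!/bin/sh
# Sketch: poll a readiness check instead of sleeping for a fixed time.
# wait_until TIMEOUT INTERVAL COMMAND [ARGS...]
# Returns 0 as soon as COMMAND succeeds, 1 if TIMEOUT seconds pass first.
wait_until() {
    timeout="$1"; interval="$2"; shift 2
    waited=0
    while ! "$@"; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1                      # check never succeeded in time
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done
    return 0
}
```

For example: `wait_until 60 2 systemctl is-active --quiet jabberd-c2s.service && systemctl start osa-dispatcher.service`. Note that `is-active` only shows that the unit is running, not that c2s is already accepting connections, so a check that probes the listening port may be more reliable.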

Comment 17 frbnrw 2016-12-06 08:44:07 UTC
Hi

I set up a completely new system with CentOS Linux release 7.2.1511 (Core) and Spacewalk 2.6.

The installation process caused a lot of trouble, and later I saw that jabberd and osa-dispatcher were not started.

In the logs I found 
kernel: sm[2824]: segfault at 7f98e9a6cfd8 ip 00007f00e5c66b55 sp 00007ffc413a4bb0 error 4 in libdb-4.8.so[7f00e5ba7000+17d000

After a reboot, tomcat was up but jabberd and osa-dispatcher were not.

I can confirm that
 systemctl restart jabberd
 systemctl restart osa-dispatcher
can help (after 1 or 2 tries).


In my humble opinion, sm is the reason the other processes die.

I found the following solution:

# add Restart=always RestartSec=5
/usr/lib/systemd/system/jabberd-sm.service
[Service]
User=jabber
ExecStart=/usr/bin/sm -c /etc/jabberd/sm.xml
Restart=always
RestartSec=5

# add tomcat.service to Requires=
/usr/lib/systemd/system/jabberd.service
Requires=tomcat.service jabberd-router.service jabberd-sm.service jabberd-c2s.service jabberd-s2s.service


# add Restart=always RestartSec=5
/usr/lib/systemd/system/osa-dispatcher.service
[Service]
Type=forking
EnvironmentFile=-/etc/sysconfig/osa-dispatcher
PIDFile=/var/run/osa-dispatcher.pid
ExecStart=/usr/sbin/osa-dispatcher --pid-file /var/run/osa-dispatcher.pid
ExecStartPre=/bin/rm -f /var/run/osa-dispatcher.pid
Restart=always
RestartSec=5
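
The same Restart settings could also be applied through drop-in files, in the style of the /etc/systemd/system/osa-dispatcher.service.d directory shown in Comment 15, rather than by editing the packaged units under /usr/lib/systemd/system, which package updates may overwrite. This is a minimal sketch: the helper name `write_restart_dropin` and the file name `10-restart.conf` are hypothetical choices.

```shell
#!/bin/sh
# Sketch: write a systemd drop-in that adds Restart=always / RestartSec=5
# to a unit without touching the packaged unit file.
# write_restart_dropin UNIT [BASE_DIR]; BASE_DIR defaults to /etc/systemd/system.
write_restart_dropin() {
    unit="$1"
    base="${2:-/etc/systemd/system}"
    dir="$base/$unit.d"
    mkdir -p "$dir" || return 1
    # 10-restart.conf is an arbitrary, hypothetical file name.
    cat > "$dir/10-restart.conf" <<'EOF'
[Service]
Restart=always
RestartSec=5
EOF
}
```

As root, something like `write_restart_dropin jabberd-sm.service && write_restart_dropin osa-dispatcher.service && systemctl daemon-reload` would apply it. Whether automatic restarts merely mask the underlying sm crash is a separate question.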

Comment 18 Ondrej Gajdusek 2017-06-27 14:49:54 UTC
I cannot reproduce this bug when running Spacewalk nightly on RHEL 7.

I tried these steps:
# after fresh installation
spacewalk-service status # every service is active
spacewalk-service restart # every service is active

spacewalk-service stop
spacewalk-service start # everything ok

Tried on two machines, same results.

The patch mentioned in Comment 15, sent as a PR, has been closed; see the resolution: https://github.com/spacewalkproject/spacewalk/pull/462#issuecomment-307578187

We've decided to close this bug. Feel free to comment or reopen this bug if you can reproduce it against nightly.


Additional info:

# rpm -qa |grep spacewalk |sort
spacewalk-admin-2.6.1-1.el7.noarch
spacewalk-backend-2.7.114-1.el7.noarch
spacewalk-backend-app-2.7.114-1.el7.noarch
spacewalk-backend-applet-2.7.114-1.el7.noarch
spacewalk-backend-config-files-2.7.114-1.el7.noarch
spacewalk-backend-config-files-common-2.7.114-1.el7.noarch
spacewalk-backend-config-files-tool-2.7.114-1.el7.noarch
spacewalk-backend-iss-2.7.114-1.el7.noarch
spacewalk-backend-iss-export-2.7.114-1.el7.noarch
spacewalk-backend-libs-2.7.114-1.el7.noarch
spacewalk-backend-package-push-server-2.7.114-1.el7.noarch
spacewalk-backend-server-2.7.114-1.el7.noarch
spacewalk-backend-sql-2.7.114-1.el7.noarch
spacewalk-backend-sql-postgresql-2.7.114-1.el7.noarch
spacewalk-backend-tools-2.7.114-1.el7.noarch
spacewalk-backend-xml-export-libs-2.7.114-1.el7.noarch
spacewalk-backend-xmlrpc-2.7.114-1.el7.noarch
spacewalk-base-2.7.3-1.el7.noarch
spacewalk-base-minimal-2.7.3-1.el7.noarch
spacewalk-base-minimal-config-2.7.3-1.el7.noarch
spacewalk-branding-2.7.4-1.el7.noarch
spacewalk-certs-tools-2.7.1-1.el7.noarch
spacewalk-common-2.7.2-1.el7.noarch
spacewalk-config-2.7.2-1.el7.noarch
spacewalk-dobby-2.7.3-1.el7.noarch
spacewalk-doc-indexes-2.5.2-1.el7.noarch
spacewalk-html-2.7.3-1.el7.noarch
spacewalk-java-2.7.86-1.el7.noarch
spacewalk-java-config-2.7.86-1.el7.noarch
spacewalk-java-lib-2.7.86-1.el7.noarch
spacewalk-java-postgresql-2.7.86-1.el7.noarch
spacewalk-postgresql-2.7.2-1.el7.noarch
spacewalk-repo-2.6-0.el7.noarch
spacewalk-reports-2.7.5-1.el7.noarch
spacewalk-schema-2.7.24-1.el7.noarch
spacewalk-search-2.7.5-1.el7.noarch
spacewalk-selinux-2.7.1-1.el7.noarch
spacewalk-setup-2.7.9-1.el7.noarch
spacewalk-setup-jabberd-2.7.1-1.el7.noarch
spacewalk-setup-postgresql-2.7.3-1.el7.noarch
spacewalk-taskomatic-2.7.86-1.el7.noarch
spacewalk-usix-2.7.5-1.el7.noarch
spacewalk-utils-2.7.15-1.el7.noarch

Comment 19 Eric Herget 2017-09-28 18:07:12 UTC
This BZ closed some time during 2.5, 2.6 or 2.7.  Adding to 2.7 tracking bug.

