Bug 1336426
Summary: | on RHEL7 jabberd (and osa-dispatcher because of that) fails to start | ||
---|---|---|---|
Product: | [Community] Spacewalk | Reporter: | Jan Hutař <jhutar> |
Component: | Server | Assignee: | Ondrej Gajdusek <ogajduse> |
Status: | CLOSED WORKSFORME | QA Contact: | Red Hat Satellite QA List <satqe-list> |
Severity: | medium | Docs Contact: | |
Priority: | unspecified | ||
Version: | 2.4 | CC: | areed, bharrington, florin.samareanu, francesco.santagata, frbnrw, max, ogajduse, pascal.pascher, pstudeni, tlestach |
Target Milestone: | --- | Keywords: | Patch |
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2017-06-27 14:49:54 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 1310034, 1484117 |
Description
Jan Hutař
2016-05-16 12:47:21 UTC
Workaround is to restart jabberd and osa-dispatcher manually:

# systemctl restart jabberd
# systemctl restart osa-dispatcher

A less-tested workaround is to run `rhn-satellite stop; sleep 5; rhn-satellite start` instead of `rhn-satellite restart`.

Note that although the log complains about DB inconsistency, this does not fix the issue:

# rm -rf /var/lib/jabberd/db/*
# rhn-satellite restart

Same problem here with a Spacewalk 2.5 installation (though on CentOS 7): If I restart the Spacewalk VM, jabberd doesn't start up and as a result the osa-dispatcher service fails. It works after manually restarting jabberd and osa-dispatcher. Here is the startup log:

jabberd/router[1253]: starting up
jabberd/s2s[1254]: starting up (interval=3, queue=60, keepalive=0, idle=86400)
jabberd/router[1253]: process id is 1253, written to /var/lib/jabberd/pid/router.pid
jabberd/s2s[1254]: process id is 1254, written to /var/lib/jabberd/pid/s2s.pid
jabberd/s2s[1254]: attempting connection to router at ::1, port=5347
jabberd/s2s[1254]: [6] [router] write error: Connection refused (111)
jabberd/s2s[1254]: connection to router closed
jabberd/s2s[1254]: attempting reconnect (3 left)
jabberd/sm[1251]: starting up
jabberd/sm[1251]: process id is 1251, written to /var/lib/jabberd/pid/sm.pid
jabberd/sm[1251]: loading 'db' storage module
jabberd/router[1253]: loaded user table (1 users)
jabberd/router[1253]: loaded filters (0 rules)
jabberd/router[1253]: [::, port=5347] listening for incoming connections
c2s: Sat Jun 25 18:56:35 2016 [notice] modules search path: /usr/lib64/jabberd
c2s: Sat Jun 25 18:56:35 2016 [info] loading 'db' authreg module
sm: unable to join the environment
c2s: unable to allocate memory for the lock table
c2s: PANIC: Cannot allocate memory
c2s: Sat Jun 25 18:56:36 2016 [critical] db: corruption detected! close all jabberd processes and run db_recover
systemd: jabberd-c2s.service: main process exited, code=exited, status=2/INVALIDARGUMENT
systemd: Unit jabberd-c2s.service entered failed state.
systemd: jabberd-c2s.service failed.
systemd: Stopped Jabber Server.
systemd: Stopping Jabber Server...
systemd: Stopping Jabber IM Session Manager...
systemd: Stopping Jabber Server To Server Connector...
jabberd/s2s[1254]: attempting connection to router at ::1, port=5347
jabberd/s2s[1254]: shutting down
jabberd/router[1253]: [::1, port=42448] connect
jabberd/s2s[1254]: connection to router closed
jabberd/router[1253]: [::1, port=42448] disconnect
systemd: Stopped Jabber Server To Server Connector.
jabberd/router[1253]: shutting down
osa-dispatcher: Spacewalk 1296 2016/06/25 18:56:37 +02:00: ('Error connecting to jabber server: Unable to connect to the host and port specified. See https://access.redhat.com/solutions/327903 for more information. ',)
osa-dispatcher: Spacewalk 1296 2016/06/25 18:56:37 +02:00: ('Error caught:',)
osa-dispatcher: ERROR: unhandled exception occurred: (unicode argument expected, got 'str').
systemd: osa-dispatcher.service: control process exited, code=exited status=255
systemd: Failed to start OSA Dispatcher daemon.
systemd: Unit osa-dispatcher.service entered failed state.
systemd: osa-dispatcher.service failed.

- Just executing a 'spacewalk-service restart' doesn't fix the issue.
- What does work, as mentioned before by Jan Hutař, is a 'systemctl restart jabberd' followed by a 'systemctl restart osa-dispatcher'.
- I also tried following the advice at 'https://access.redhat.com/solutions/327903' but it didn't change anything.
- Executing 'spacewalk-setup-jabberd' doesn't help either.

Run alternatives --config java
Remove all Java versions but 8 and my services started working again.

(In reply to Aaron from comment #8)
> Run alternatives --config java
> Remove all Java version but 8 and my services started working again.
Unfortunately this broke the webui.

I found an AVC message while restarting the osa-dispatcher service. It is probably not associated with this bug.
>> systemctl restart osa-dispatcher.service
...
type=SERVICE_STOP msg=audit(1468591032.022:1734): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=osa-dispatcher comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
type=AVC msg=audit(1468591032.089:1735): avc: denied { write } for pid=31826 comm="osa-dispatcher" name="osad" dev="dm-0" ino=101055545 scontext=system_u:system_r:osa_dispatcher_t:s0 tcontext=system_u:object_r:usr_t:s0 tclass=dir
type=SYSCALL msg=audit(1468591032.089:1735): arch=c000003e syscall=87 success=no exit=-13 a0=207a5e0 a1=ebb8 a2=81b4 a3=7ffd12208a20 items=0 ppid=1 pid=31826 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="osa-dispatcher" exe="/usr/bin/python2.7" subj=system_u:system_r:osa_dispatcher_t:s0 key=(null)
type=SERVICE_START msg=audit(1468591032.313:1736): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=osa-dispatcher comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
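
The manual workaround from the description (restart jabberd first, then osa-dispatcher) could be scripted roughly as below. This is only a sketch under stated assumptions: the function name `restart_in_order`, the retry count, and the delay are illustrative and not part of Spacewalk, and the `SYSTEMCTL` variable exists purely so the sequence can be dry-run without touching real services.

```shell
#!/bin/sh
# Sketch only: restart jabberd, then osa-dispatcher, retrying a few times.
# SYSTEMCTL is overridable so the sequence can be dry-run (e.g. SYSTEMCTL=echo).
SYSTEMCTL="${SYSTEMCTL:-systemctl}"

# restart_in_order [TRIES] [DELAY_SECONDS]
# Returns 0 as soon as both restarts succeed in one attempt, 1 otherwise.
# TRIES and DELAY_SECONDS are hypothetical defaults, not recommended values.
restart_in_order() {
    tries="${1:-3}"
    delay="${2:-5}"
    i=1
    while [ "$i" -le "$tries" ]; do
        # osa-dispatcher is only restarted if the jabberd restart succeeded.
        if "$SYSTEMCTL" restart jabberd && "$SYSTEMCTL" restart osa-dispatcher; then
            return 0
        fi
        i=$((i + 1))
        # Pause before the next attempt, but not after the last one.
        [ "$i" -le "$tries" ] && sleep "$delay"
    done
    return 1
}
```

Calling `restart_in_order 3 5` as root would attempt the pair of restarts up to three times, five seconds apart. Note that a successful `systemctl restart` only means the unit started; it does not prove that osa-dispatcher actually connected to jabberd.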
I confirm everything, using 7.2 and latest updates

Adding a +1. Workaround confirmed as well: stop jabberd, start jabberd, restart osa-dispatcher.

Can also confirm, using latest CentOS 7 release and Spacewalk 2.5.

Potential fix submitted here: https://github.com/spacewalkproject/spacewalk/pull/462

Before:

[root@spacewalk rhn]# /usr/sbin/spacewalk-service restart
Shutting down spacewalk services...
Stopping RHN Taskomatic...
Stopped RHN Taskomatic.
Stopping cobblerd (via systemctl): [ OK ]
Redirecting to /bin/systemctl stop rhn-search.service
Redirecting to /bin/systemctl stop osa-dispatcher.service
Redirecting to /bin/systemctl stop httpd.service
Redirecting to /bin/systemctl stop tomcat.service
Redirecting to /bin/systemctl stop jabberd.service
Redirecting to /bin/systemctl stop postgresql.service
Done.
Starting spacewalk services...
Redirecting to /bin/systemctl start postgresql.service
Redirecting to /bin/systemctl start jabberd.service
Redirecting to /bin/systemctl start tomcat.service
Waiting for tomcat to be ready ...
Redirecting to /bin/systemctl start httpd.service
Redirecting to /bin/systemctl start osa-dispatcher.service
Job for osa-dispatcher.service failed because the control process exited with error code. See "systemctl status osa-dispatcher.service" and "journalctl -xe" for details.
Redirecting to /bin/systemctl start rhn-search.service
Starting cobblerd (via systemctl): [ OK ]
Starting RHN Taskomatic...
Done.

[root@spacewalk rhn]# systemctl status osa-dispatcher
● osa-dispatcher.service - OSA Dispatcher daemon
   Loaded: loaded (/usr/lib/systemd/system/osa-dispatcher.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Wed 2016-09-28 00:39:49 UTC; 27min ago
  Process: 5679 ExecStart=/usr/sbin/osa-dispatcher --pid-file /var/run/osa-dispatcher.pid (code=exited, status=255)
  Process: 5676 ExecStartPre=/bin/rm -f /var/run/osa-dispatcher.pid (code=exited, status=0/SUCCESS)
 Main PID: 4007 (code=killed, signal=TERM)

Sep 28 00:39:48 spacewalk.lab.libcore.so systemd[1]: Starting OSA Dispatcher daemon...
Sep 28 00:39:49 spacewalk.lab.libcore.so osa-dispatcher[5679]: Spacewalk 5679 2016/09/28 00:39:49 -00:00: ('Error connecting to jabber server: Unable to connect to the host and ...ation. ',)
Sep 28 00:39:49 spacewalk.lab.libcore.so osa-dispatcher[5679]: Spacewalk 5679 2016/09/28 00:39:49 -00:00: ('Error caught:',)
Sep 28 00:39:49 spacewalk.lab.libcore.so osa-dispatcher[5679]: ERROR: unhandled exception occurred: (unicode argument expected, got 'str').
Sep 28 00:39:49 spacewalk.lab.libcore.so systemd[1]: osa-dispatcher.service: control process exited, code=exited status=255
Sep 28 00:39:49 spacewalk.lab.libcore.so systemd[1]: Failed to start OSA Dispatcher daemon.
Sep 28 00:39:49 spacewalk.lab.libcore.so systemd[1]: Unit osa-dispatcher.service entered failed state.
Sep 28 00:39:49 spacewalk.lab.libcore.so systemd[1]: osa-dispatcher.service failed.
Hint: Some lines were ellipsized, use -l to show in full.

After:

[root@spacewalk rhn]# /usr/sbin/spacewalk-service restart
Shutting down spacewalk services...
Stopping RHN Taskomatic...
Stopped RHN Taskomatic.
Stopping cobblerd (via systemctl): [ OK ]
Redirecting to /bin/systemctl stop rhn-search.service
Redirecting to /bin/systemctl stop osa-dispatcher.service
Redirecting to /bin/systemctl stop httpd.service
Redirecting to /bin/systemctl stop tomcat.service
Redirecting to /bin/systemctl stop jabberd.service
Redirecting to /bin/systemctl stop postgresql.service
Done.
Starting spacewalk services...
Redirecting to /bin/systemctl start postgresql.service
Redirecting to /bin/systemctl start jabberd.service
Redirecting to /bin/systemctl start tomcat.service
Waiting for tomcat to be ready ...
Redirecting to /bin/systemctl start httpd.service
Redirecting to /bin/systemctl start osa-dispatcher.service
Redirecting to /bin/systemctl start rhn-search.service
Starting cobblerd (via systemctl): [ OK ]
Starting RHN Taskomatic...
Done.

[root@spacewalk rhn]# systemctl status osa-dispatcher
● osa-dispatcher.service - OSA Dispatcher daemon
   Loaded: loaded (/usr/lib/systemd/system/osa-dispatcher.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/osa-dispatcher.service.d
           └─10-dependency.conf
   Active: active (running) since Wed 2016-09-28 01:13:53 UTC; 38s ago
  Process: 7684 ExecStart=/usr/sbin/osa-dispatcher --pid-file /var/run/osa-dispatcher.pid (code=exited, status=0/SUCCESS)
  Process: 7681 ExecStartPre=/bin/rm -f /var/run/osa-dispatcher.pid (code=exited, status=0/SUCCESS)
 Main PID: 7686 (osa-dispatcher)
   CGroup: /system.slice/osa-dispatcher.service
           └─7686 /usr/bin/python -s /usr/sbin/osa-dispatcher --pid-file /var/run/osa-dispatcher.pid

Sep 28 01:13:53 spacewalk.lab.libcore.so systemd[1]: Starting OSA Dispatcher daemon...
Sep 28 01:13:53 spacewalk.lab.libcore.so systemd[1]: Started OSA Dispatcher daemon.

I tried the fix as per the pull request (modifying the base unit file) and found it didn't work for me. Interestingly though, a 60 second PreExec sleep does do the job and it now cleanly starts on system/service start/restart.
It almost seems as if osa-dispatcher is trying to connect to jabberd-c2s before it has completely started, causing mayhem with both.

Hi, I set up a completely new system with CentOS Linux release 7.2.1511 (Core) and Spacewalk 2.6. The installation process caused a lot of trouble, and later I saw that jabberd and osa-dispatcher had not started. In the logs I found:

kernel: sm[2824]: segfault at 7f98e9a6cfd8 ip 00007f00e5c66b55 sp 00007ffc413a4bb0 error 4 in libdb-4.8.so[7f00e5ba7000+17d000

After a reboot, tomcat was up but jabberd and osa-dispatcher were not. I can confirm that

systemctl restart jabberd
systemctl restart osa-dispatcher

can help (after 1 or 2 tries). In my humble opinion, sm is the reason the other processes die. I found the following solution:

# add Restart=always RestartSec=5
/usr/lib/systemd/system/jabberd-sm.service
[Service]
User=jabber
ExecStart=/usr/bin/sm -c /etc/jabberd/sm.xml
Restart=always
RestartSec=5

# tomcat.service
/usr/lib/systemd/system/jabberd.service
Requires=tomcat.service jabberd-router.service jabberd-sm.service jabberd-c2s.service jabberd-s2s.service

# add Restart=always RestartSec=5
/usr/lib/systemd/system/osa-dispatcher.service
[Service]
Type=forking
EnvironmentFile=-/etc/sysconfig/osa-dispatcher
PIDFile=/var/run/osa-dispatcher.pid
ExecStart=/usr/sbin/osa-dispatcher --pid-file /var/run/osa-dispatcher.pid
ExecStartPre=/bin/rm -f /var/run/osa-dispatcher.pid
Restart=always
RestartSec=5

I cannot reproduce this bug when running Spacewalk nightly on RHEL 7. I tried these steps:

# after fresh installation
spacewalk-service status # every service is active
spacewalk-service restart # every service is active
spacewalk-service stop
spacewalk-service start # everything ok

Tried on two machines, same results.

The patch mentioned in Comment 15, sent as a PR, has been closed; see the resolution: https://github.com/spacewalkproject/spacewalk/pull/462#issuecomment-307578187

We've decided to close this bug.
Feel free to comment on or reopen this bug if you can reproduce it against nightly.

Additional info:

# rpm -qa |grep spacewalk |sort
spacewalk-admin-2.6.1-1.el7.noarch
spacewalk-backend-2.7.114-1.el7.noarch
spacewalk-backend-app-2.7.114-1.el7.noarch
spacewalk-backend-applet-2.7.114-1.el7.noarch
spacewalk-backend-config-files-2.7.114-1.el7.noarch
spacewalk-backend-config-files-common-2.7.114-1.el7.noarch
spacewalk-backend-config-files-tool-2.7.114-1.el7.noarch
spacewalk-backend-iss-2.7.114-1.el7.noarch
spacewalk-backend-iss-export-2.7.114-1.el7.noarch
spacewalk-backend-libs-2.7.114-1.el7.noarch
spacewalk-backend-package-push-server-2.7.114-1.el7.noarch
spacewalk-backend-server-2.7.114-1.el7.noarch
spacewalk-backend-sql-2.7.114-1.el7.noarch
spacewalk-backend-sql-postgresql-2.7.114-1.el7.noarch
spacewalk-backend-tools-2.7.114-1.el7.noarch
spacewalk-backend-xml-export-libs-2.7.114-1.el7.noarch
spacewalk-backend-xmlrpc-2.7.114-1.el7.noarch
spacewalk-base-2.7.3-1.el7.noarch
spacewalk-base-minimal-2.7.3-1.el7.noarch
spacewalk-base-minimal-config-2.7.3-1.el7.noarch
spacewalk-branding-2.7.4-1.el7.noarch
spacewalk-certs-tools-2.7.1-1.el7.noarch
spacewalk-common-2.7.2-1.el7.noarch
spacewalk-config-2.7.2-1.el7.noarch
spacewalk-dobby-2.7.3-1.el7.noarch
spacewalk-doc-indexes-2.5.2-1.el7.noarch
spacewalk-html-2.7.3-1.el7.noarch
spacewalk-java-2.7.86-1.el7.noarch
spacewalk-java-config-2.7.86-1.el7.noarch
spacewalk-java-lib-2.7.86-1.el7.noarch
spacewalk-java-postgresql-2.7.86-1.el7.noarch
spacewalk-postgresql-2.7.2-1.el7.noarch
spacewalk-repo-2.6-0.el7.noarch
spacewalk-reports-2.7.5-1.el7.noarch
spacewalk-schema-2.7.24-1.el7.noarch
spacewalk-search-2.7.5-1.el7.noarch
spacewalk-selinux-2.7.1-1.el7.noarch
spacewalk-setup-2.7.9-1.el7.noarch
spacewalk-setup-jabberd-2.7.1-1.el7.noarch
spacewalk-setup-postgresql-2.7.3-1.el7.noarch
spacewalk-taskomatic-2.7.86-1.el7.noarch
spacewalk-usix-2.7.5-1.el7.noarch
spacewalk-utils-2.7.15-1.el7.noarch

This BZ closed some time during 2.5, 2.6 or 2.7. Adding to 2.7 tracking bug.
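
For anyone still hitting the start-up race suspected in the comments above (osa-dispatcher connecting before jabberd-c2s is ready), a polling wait is one possible alternative to a fixed 60-second sleep. The sketch below rests on several assumptions that are not confirmed anywhere in this report: that c2s listens on TCP port 5222 (the usual XMPP client port), that bash is installed for its `/dev/tcp` redirection, and that a successful TCP connect is a good enough readiness signal. The names `wait_until` and `c2s_port_open` are hypothetical helpers, not Spacewalk or jabberd tools.

```shell
#!/bin/sh
# Sketch only: poll a readiness check instead of sleeping a fixed 60 seconds.

# wait_until TIMEOUT INTERVAL COMMAND [ARGS...]
# Re-runs COMMAND every INTERVAL seconds until it succeeds (returns 0) or
# TIMEOUT seconds of sleeping have accumulated (returns 1).
# TIMEOUT and INTERVAL must be whole numbers.
wait_until() {
    timeout="$1"
    interval="$2"
    shift 2
    elapsed=0
    while ! "$@" >/dev/null 2>&1; do
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1
        fi
        sleep "$interval"
        elapsed=$((elapsed + interval))
    done
    return 0
}

# Hypothetical probe: succeeds if a TCP connection to HOST:PORT can be opened.
# Relies on bash's /dev/tcp redirection, so bash must be available.
# Port 5222 is an assumption (usual XMPP client port), not taken from this bug.
c2s_port_open() {
    bash -c 'exec 3<>"/dev/tcp/$0/$1"' "${1:-localhost}" "${2:-5222}"
}
```

A possible invocation would be `wait_until 60 1 c2s_port_open localhost 5222 && systemctl restart osa-dispatcher`. Unlike the fixed sleep, this returns as soon as the port accepts connections, and it fails visibly if jabberd never comes up.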