Bug 467717 - RHN Satellite fails to start or stop properly with run-level changes
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Spacewalk
Classification: Community
Component: Server
Version: 0.3
Hardware: All
OS: Linux
Priority: low
Severity: low
Target Milestone: ---
Assignee: Michael Mráka
QA Contact: Shannon Hughes
URL:
Whiteboard:
Depends On:
Blocks: space04
Reported: 2008-10-20 14:04 UTC by Michael Mráka
Modified: 2009-01-22 16:30 UTC (History)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2009-01-22 16:30:05 UTC
Embargoed:



Description Michael Mráka 2008-10-20 14:04:17 UTC
+++ This bug was initially created as a clone of Bug #462839 +++

Description of problem:
-----------------------

osa-dispatcher, rhn-database, jabberd, and tomcat5 are started multiple times or stopped multiple times via rc scripts during run-level changes.  This causes the system to send out numerous web tracebacks during a system shutdown, and the web service to keep returning a "Service Temporarily Unavailable" page after the system has been started again.

All of these services install their own rc scripts.  The rhn-satellite rc script calls all of their rc scripts directly.  During a run-level change, rc will call each of these scripts and the rhn-satellite script will call them a second time.

The problem with shutdown is that rc calls the rhn-database script to shut down the database before rhn-satellite is called.  Shutting down the database while rhn-satellite is still trying to use it causes web tracebacks to be generated.
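
To see which service rc stops first, the stop priorities declared in the init scripts can be compared. This is only a sketch: it assumes each script carries the usual "# chkconfig: <levels> <start> <stop>" header line, and show_priorities and its directory argument are hypothetical names used for illustration. rc runs the K links in ascending numeric order, so a lower stop priority is stopped earlier.

```shell
#!/bin/sh
# Print "<stop> <start> <service>" for every init script that declares a
# chkconfig header, sorted by stop priority (lowest is stopped first).
# The directory argument is a hypothetical hook for testing; it
# defaults to /etc/init.d.
show_priorities() {
    init_dir=${1:-/etc/init.d}
    for script in "$init_dir"/*; do
        [ -f "$script" ] || continue
        awk -v name="${script##*/}" '
            /^#[ \t]*chkconfig:/ {
                # Strip the prefix so the fields become: levels start stop
                sub(/^#[ \t]*chkconfig:[ \t]*/, "")
                print $3, $2, name
                exit
            }
        ' "$script"
    done | sort -n
}
```

Running show_priorities with no argument and looking for rhn-database and rhn-satellite in the output shows which of the two is stopped first.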

Even if we use chkconfig to turn these services off (for example rhn-database), the tracebacks are still generated, because each rcX.d directory still contains a K link that kills the process:

[root@dhcp7-119 ~]# chkconfig --list rhn-database
rhn-database   	0:off	1:off	2:off	3:off	4:off	5:off	6:off
[root@dhcp7-119 ~]# ls /etc/rc*.d/ | grep rhn-database
K25rhn-database
K25rhn-database
K25rhn-database
K25rhn-database
K25rhn-database
K25rhn-database
K25rhn-database
[root@dhcp7-119 ~]# 
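
The same check can be made per runlevel, so it is visible which directory holds which link. A minimal sketch follows; list_rc_links and its second argument are hypothetical names, and the second argument exists only so the function can be pointed at a test tree instead of /etc.

```shell
#!/bin/sh
# List the S (start) and K (kill) links of one service in each rcN.d
# directory, one "<directory> <link>" pair per line.
list_rc_links() {
    svc=$1
    rc_root=${2:-/etc}
    for dir in "$rc_root"/rc[0-6].d; do
        [ -d "$dir" ] || continue
        for link in "$dir"/[SK][0-9][0-9]"$svc"; do
            # An unmatched glob stays literal, so confirm the entry exists.
            [ -e "$link" ] || [ -L "$link" ] || continue
            printf '%s %s\n' "${dir##*/}" "${link##*/}"
        done
    done
}
```

For example, "list_rc_links rhn-database" would print one line per runlevel directory that still contains a link for that service.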

How reproducible:
-----------------
Always, whenever the runlevel is changed

Steps to Reproduce:
-------------------
In Satellite 5.1.1, issue commands such as "init 4", "init 5", or "init 3" to switch run levels.

Actual results:
---------------

Tracebacks are sent to the configured email account. They contain messages like the following [a few traceback messages from a system reboot are attached in the file traceback-on-runlevel-changes.txt]:

"Caused by: java.sql.SQLException: Connections could not be acquired from the underlying database!"

"Caused by: com.mchange.v2.resourcepool.CannotAcquireResourceException: A ResourcePool could not acquire a resource from its primary factory or source."

"Caused by: java.sql.SQLException: No more data to read from socket"

"Caused by: java.sql.SQLException: ORA-01089: immediate shutdown in progress - no operations are permitted "

Expected results:
-----------------

There should not be any tracebacks, and runlevel switches should complete cleanly.

Additional info:
----------------

The only workaround is to remove the chkconfig registrations (or to exclude these services from the rhn-satellite init script). That means running chkconfig --del for each of these components to remove all symlinks from the rc?.d directories, letting rhn-satellite manage their starting and stopping.  This has to be repeated every time updates for these packages are installed, because the packages' postinstall scripts run chkconfig --add for their rc scripts regardless of whether the package is being installed or updated.

	# chkconfig --del tomcat5
	# chkconfig --del rhn-database
	# chkconfig --del jabberd
	# chkconfig --del osa-dispatcher
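
Since the workaround has to be reapplied after every package update, it could be wrapped in a small function. This is a sketch under assumptions: deregister_services, CHKCONFIG, and INIT_DIR are hypothetical names introduced here, with the two variables acting only as overrides for a dry run.

```shell
#!/bin/sh
# Services named in this report; rhn-satellite itself stays registered.
SERVICES="tomcat5 rhn-database jabberd osa-dispatcher"

# Remove the rc?.d links of each listed service.
# CHKCONFIG (hypothetical override) defaults to the real chkconfig;
# INIT_DIR (hypothetical override) defaults to /etc/init.d.
deregister_services() {
    for svc in $SERVICES; do
        # Skip services whose init script is absent on this host.
        [ -e "${INIT_DIR:-/etc/init.d}/$svc" ] || continue
        "${CHKCONFIG:-chkconfig}" --del "$svc"
    done
}
```

Setting CHKCONFIG=echo before calling deregister_services prints the arguments that would be passed instead of changing anything.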


--- Additional comment ---

Hi,

I checked the default states of all these init scripts as set when the
packages are installed, and reset them to those defaults.

# chkconfig --list | egrep "rhn-satellite|rhn-database|tomcat5|osa-dispatcher|jabberd|httpd"
jabberd         0:off   1:off   2:on    3:on    4:on    5:on    6:off
httpd           0:off   1:off   2:off   3:off   4:off   5:off   6:off
rhn-database    0:off   1:off   2:on    3:on    4:on    5:on    6:off
osa-dispatcher  0:off   1:off   2:on    3:on    4:on    5:on    6:off
tomcat5         0:off   1:off   2:off   3:off   4:off   5:off   6:off
rhn-satellite   0:off   1:off   2:off   3:on    4:on    5:on    6:off
#

After this I tried switching between runlevels 3, 4, and 5 and didn't get a
traceback. But tracebacks are generated when we switch to runlevel 6:
a total of 7 when I did so. I suspect this is because services that are
already stopped are being stopped/killed again.

Comment 1 Michael Mráka 2008-10-21 13:11:58 UTC
The right fix seems to be to move to independent services managed via chkconfig.

The rhn-satellite (and rhn-proxy) scripts will be moved to /sbin as helper scripts for manual (re)start.
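
For illustration only, such a helper could look roughly like the sketch below. It is not the script that was committed: the start order is an assumption (database first, stop in reverse), and manage_services, reverse_words, SERVICE_CMD, and INIT_DIR are hypothetical names.

```shell
#!/bin/sh
# Assumed start order (database first); stop uses the reverse order.
SERVICES="rhn-database jabberd tomcat5 satellite-httpd osa-dispatcher"

# Print the arguments in reverse order on one line.
reverse_words() {
    rev=""
    for w in "$@"; do rev="$w${rev:+ $rev}"; done
    printf '%s\n' "$rev"
}

# Start or stop every installed service via the service command.
# SERVICE_CMD and INIT_DIR are hypothetical overrides for dry runs.
manage_services() {
    action=$1
    case $action in
        start) list=$SERVICES ;;
        stop)  list=$(reverse_words $SERVICES) ;;
        *)     echo "usage: manage_services {start|stop}" >&2; return 2 ;;
    esac
    for svc in $list; do
        # Only touch services that are actually installed.
        [ -x "${INIT_DIR:-/etc/init.d}/$svc" ] || continue
        "${SERVICE_CMD:-service}" "$svc" "$action"
    done
}
```

Because each service is also registered with chkconfig on its own, a helper like this would only be needed for manual restarts, not for run-level changes.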

Comment 2 Michael Mráka 2008-10-21 13:15:32 UTC
Fixed in git, between

commit 9fa6f49c106530424afd5aa431637038884e4eeb
    467717 - modified %changelog entries
...
commit fc366f4839195c555b40881cf8cfdba36bdab5e2
    467717 - fixed osad and osa-dispatcher init script description

Comment 3 Shannon Hughes 2008-11-05 19:39:42 UTC
I am seeing chkconfig --list show the services set to off. Is this correct? init X fails.

Comment 4 Michael Mráka 2008-11-06 11:16:24 UTC
Services should be on.

Steps to reproduce:
1. yum install spacewalk
2. spacewalk-setup
3. chkconfig --list | egrep "rhn-satellite|rhn-database|tomcat5|osa-dispatcher|jabberd|satellite-httpd|oracle"
4. reboot
5. check if services started: 
   for i in rhn-satellite rhn-database tomcat5 osa-dispatcher jabberd satellite-httpd oracle-xe ; do [ -x /etc/init.d/$i ] && service $i status; done 

Expected results:

# chkconfig --list | egrep "rhn-satellite|rhn-database|tomcat5|osa-dispatcher|jabberd|satellite-httpd|oracle"
oracle-xe       0:off   1:off   2:on    3:on    4:on    5:on    6:off
satellite-httpd 0:off   1:off   2:off   3:on    4:on    5:on    6:off
tomcat5         0:off   1:off   2:off   3:on    4:on    5:on    6:off

# for i in rhn-satellite rhn-database tomcat5 osa-dispatcher jabberd satellite-httpd oracle-xe ; do [ -x /etc/init.d/$i ] && service $i status; done
/etc/init.d/tomcat5 is already running (15838)
httpd (pid 6625 6624 6623 6622 6621 6620 6619 6618 6615) is running...


LSNRCTL for Linux: Version 10.2.0.1.0 - Production on 06-NOV-2008 12:11:11

Copyright (c) 1991, 2005, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=EXTPROC_FOR_XE)))
STATUS of the LISTENER
------------------------
Alias                     LISTENER
Version                   TNSLSNR for Linux: Version 10.2.0.1.0 - Production
Start Date                29-OCT-2008 11:35:47
Uptime                    8 days 0 hr. 35 min. 24 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Default Service           XE
Listener Parameter File   /usr/lib/oracle/xe/app/oracle/product/10.2.0/server/network/admin/listener.ora
Listener Log File         /usr/lib/oracle/xe/app/oracle/product/10.2.0/server/network/log/listener.log
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=EXTPROC_FOR_XE)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=xen70.englab.brq.redhat.com)(PORT=1521)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=xen70.englab.brq.redhat.com)(PORT=9000))(Presentation=HTTP)(Session=RAW))
Services Summary...
Service "PLSExtProc" has 1 instance(s).
  Instance "PLSExtProc", status UNKNOWN, has 1 handler(s) for this service...
Service "XE" has 1 instance(s).
  Instance "XE", status READY, has 1 handler(s) for this service...
Service "XEXDB" has 1 instance(s).
  Instance "XE", status READY, has 1 handler(s) for this service...
Service "XE_XPT" has 1 instance(s).
  Instance "XE", status READY, has 1 handler(s) for this service...
The command completed successfully
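
The chkconfig check in step 3 can also be automated. A minimal sketch, assuming the "N:on" / "N:off" column format shown above; report_disabled is a hypothetical name.

```shell
#!/bin/sh
# Read `chkconfig --list` output on stdin and print the name of every
# service that is off in runlevel 3, 4, or 5.
report_disabled() {
    awk '{
        for (i = 2; i <= NF; i++)
            if ($i ~ /^[345]:off$/) { print $1; next }
    }'
}
```

It would be used as: chkconfig --list | egrep "tomcat5|osa-dispatcher|jabberd|satellite-httpd|oracle" | report_disabled. Empty output means all matched services are on in runlevels 3 to 5.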

Comment 5 Miroslav Suchý 2009-01-15 12:50:46 UTC
[root@dhcp77-206 tmp]# chkconfig --list rhn-database
error reading information on service rhn-database: No such file or directory
[root@dhcp77-206 tmp]# chkconfig --list tomcat5
tomcat5         0:off   1:off   2:off   3:on    4:on    5:on    6:off
[root@dhcp77-206 tmp]# chkconfig --list osa-dispatcher
osa-dispatcher  0:off   1:off   2:on    3:on    4:on    5:on    6:off
[root@dhcp77-206 tmp]# chkconfig --list jabberd
jabberd         0:off   1:off   2:on    3:on    4:on    5:on    6:off
VERIFIED

