Bug 475999
Summary: | Mint not shutdown by initscript | | |
---|---|---|---|
Product: | Red Hat Enterprise MRG | Reporter: | Jan Sarenik <jsarenik> |
Component: | cumin | Assignee: | Nuno Santos <nsantos> |
Status: | CLOSED ERRATA | QA Contact: | Jan Sarenik <jsarenik> |
Severity: | high | Docs Contact: | |
Priority: | medium | | |
Version: | beta | CC: | aortega, iboverma, jsarenik |
Target Milestone: | 1.1.1 | | |
Target Release: | --- | | |
Hardware: | All | | |
OS: | Linux | | |
Whiteboard: | | | |
Fixed In Version: | cumin-0.1.3073-1 | Doc Type: | Bug Fix |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2009-04-21 16:18:58 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Attachments: | | | |
Description
Jan Sarenik
2008-12-11 14:29:13 UTC
My theory, based on some of the instances where I've seen mint-server fail to exit, is that this occurs because of http://bugzilla.redhat.com/show_bug.cgi?id=476038, db deadlocks that can freeze mint threads.

The bug is still valid in cumin-0.1.2986-1.el5, so I doubt it is closely connected to 476038. After looking into the sources in svn trunk, it is clear that cumin sends SIGTERM to the mint-server process it previously started via Popen in 'start_mint_processes' of the 'trunk/cumin/python/cumin/tools.py' file. I am wondering whether mint-server is still running under the 'pop.pid' PID which the above-mentioned function returns, or whether it gets restarted in the meantime with a different process ID...

If I manually kill mint-server with SIGTERM right after '/etc/init.d/cumin stop', it exits cleanly, so I do not suspect it is hanging:

    # /etc/init.d/cumin stop; pkill mint-server

I will continue to examine it more deeply next week.

In change 3001, I have added some logic to more carefully kill the mint process and check for confirmation.

Still valid on cumin-0.1.3021-1.el5. After stopping cumin via '/etc/init.d/cumin stop', the mint process starts eating all the CPU.

Jan, just to make sure: did you also reinstall the schema from scratch? I.e., use cumin-database-destroy, then cumin-database-init?

Sure I did. Again, here is what I have just done (on rev 3030):

    #(postgresql set up and running)
    yum -y remove cumin
    yum -y install cumin
    cumin-database-destroy
    cumin-database-init
    cumin-admin add-user test
    /etc/init.d/cumin start
    firefox http://localhost:45672/
    # log in, possibly add local broker
    # if qpidd is running, log out
    /etc/init.d/cumin stop
    pgrep mint-server # running
    sleep 20
    pgrep mint-server # still running
    top # though it's not eating all the CPU
        # as in previous version
    pkill mint-server
    pgrep mint-server # not running anymore

BTW I am testing it on my local RHEL-5.2 running in a chroot (not jailed via vserver or anything like it).
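For readers following the thread, the idea of killing the child "more carefully" and checking for confirmation (as mentioned for change 3001) can be sketched roughly as below. This is not the cumin code: the function name `stop_child`, its parameters, and the escalation to SIGKILL are assumptions made for illustration, and the sketch uses the modern Python 3 standard library rather than the Python version cumin shipped with.

```python
import subprocess
import sys
import time


def stop_child(proc, wait_seconds=5.0, poll_interval=0.1):
    """Send SIGTERM to a Popen child and wait for confirmation that it exited.

    Hypothetical helper, not taken from cumin. Returns the child's exit
    code. If the child is still alive after wait_seconds, it is killed
    with SIGKILL as a last resort.
    """
    if proc.poll() is not None:
        return proc.returncode  # child already exited

    proc.terminate()  # sends SIGTERM on POSIX systems

    # Poll until the child confirms exit or the deadline passes.
    deadline = time.monotonic() + wait_seconds
    while time.monotonic() < deadline:
        if proc.poll() is not None:
            return proc.returncode
        time.sleep(poll_interval)

    proc.kill()  # SIGKILL: the child ignored or could not handle SIGTERM
    return proc.wait()
```

Lengthening `wait_seconds` corresponds to the "extend the wait time" idea discussed in the thread. Whether to escalate to SIGKILL at all is a design choice the thread does not settle.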
The fact that it's no longer eating all the CPU is a very positive development. I'm now comfortable (a) release noting this for 1.1 and (b) extending the time over which we attempt to kill the mint-server subprocess in the cumin process. I extended the wait time in change 3036.

Reopening for 1.1.1 so we make sure to flush out any still-hidden issues.

Fixed at revision 3068: added a handler for SIGTERM that catches the signal from the initscript (it's doing "killproc $servicename -TERM") and shuts down the mint process properly.

Created attachment 329746
Locally built RPM for testing fix
You can use this temporary RPM to test while the fix does not show up in the candidates repo.
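
The fix described for revision 3068 (a SIGTERM handler in the parent that shuts down the child it started) follows a common pattern, sketched below. This is an illustration only and not the actual cumin change: the `Supervisor` class, its method names, and the timeout values are hypothetical, and the code assumes Python 3 on a POSIX system.

```python
import signal
import subprocess
import sys
import time


class Supervisor:
    """Owns one child process and stops it when the parent gets SIGTERM.

    Hypothetical sketch; names and defaults are assumptions.
    """

    def __init__(self, child_argv):
        self.child = subprocess.Popen(child_argv)
        self.stopping = False
        # Must be called from the main thread of the parent process.
        signal.signal(signal.SIGTERM, self._on_sigterm)

    def _on_sigterm(self, signum, frame):
        # Keep the handler small: record the request, let run() act on it.
        self.stopping = True

    def run(self, poll_interval=0.2):
        # Loop until asked to stop or until the child exits on its own.
        while not self.stopping and self.child.poll() is None:
            time.sleep(poll_interval)
        return self.shutdown()

    def shutdown(self, wait_seconds=5.0):
        # Forward the termination request to the child and wait for it.
        if self.child.poll() is None:
            self.child.terminate()
            try:
                self.child.wait(timeout=wait_seconds)
            except subprocess.TimeoutExpired:
                self.child.kill()
                self.child.wait()
        return self.child.returncode
```

Without such a handler, the default action for SIGTERM ends the parent immediately, and a child started via Popen can be left running, which matches the symptom reported above.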
Verified on RHEL5.3 i386. Thanks for fixing it!

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2009-0434.html