Bug 247291 - shutdown while processing relocation request results in node reboot
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: rgmanager
Version: 5.0
Hardware: All / OS: Linux
Priority: urgent / Severity: urgent
Assigned To: Lon Hohberger
Reported: 2007-07-06 14:18 EDT by Lon Hohberger
Modified: 2009-04-16 18:18 EDT

Fixed In Version: RHBA-2007-0580
Doc Type: Bug Fix
Last Closed: 2007-11-07 11:46:15 EST

Description Lon Hohberger 2007-07-06 14:18:16 EDT
Description of problem:

On one node, run 'clusvcadm -r foo' in a loop; this relocates the service
back and forth between nodes.

On one of the nodes the service is bouncing between, run 'while : ; do service
rgmanager start; sleep 60; service rgmanager stop; done'. Eventually you will get this:

Jul  6 13:57:55 lisa rgmanager: [17227]: <notice> Shutting down Cluster Service Manager...
Jul  6 13:57:55 lisa clurgmgrd[16434]: <notice> Shutting down
Jul  6 13:57:56 lisa clurgmgrd[16434]: <notice> Shutdown complete, exiting
Jul  6 13:57:56 lisa kernel: clurgmgrd[17239]: segfault at 0000000000000000 rip 0000000000415cd8 rsp 0000000044605f30 error 4
Jul  6 13:57:56 lisa kernel: dlm: rgmanager: group leave failed -512 0
Jul  6 13:57:56 lisa clurgmgrd[16433]: <crit> Watchdog: Daemon died, rebooting...
Jul  6 13:57:56 lisa dlm_controld[3676]: open "/sys/kernel/dlm/rgmanager/control" error -1 2
Jul  6 13:57:56 lisa dlm_controld[3676]: open "/sys/kernel/dlm/rgmanager/event_done" error -1 2
Jul  6 13:57:56 lisa kernel: md: stopping all md devices.
Jul  6 13:57:57 lisa kernel: Synchronizing SCSI cache for disk sda:


Version-Release number of selected component (if applicable): 5.1 beta


How reproducible: difficult
Comment 1 Lon Hohberger 2007-07-16 13:34:38 EDT
I think this can be solved by tracking all threads (even simple ones) and making
sure they're cleaned up in the exit path.  I will test this soon.
Comment 3 RHEL Product and Program Management 2007-07-20 16:07:13 EDT
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.
Comment 4 Lon Hohberger 2007-07-23 14:20:45 EDT
Test setup:
* 5 node cluster
* 2 exclusive services (test1, test2)

Reproduce case:
* on node 1: 
   while :; do clusvcadm -r test1; done
* on node 2:
   while :; do clusvcadm -r test2; done
* on node 3 (**):
   while :; do service rgmanager stop; service rgmanager start; sleep 30; done

**: This needs to be one of the nodes the services are being relocated to.
Comment 5 Lon Hohberger 2007-07-24 14:51:54 EDT
Patches in RHEL5, RHEL51, head.
Comment 8 errata-xmlrpc 2007-11-07 11:46:15 EST
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0580.html
