Bug 247291 - shutdown while processing relocation request results in node reboot
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: rgmanager
Hardware: All
OS: Linux
Priority: urgent
Severity: urgent
Assigned To: Lon Hohberger
Reported: 2007-07-06 14:18 EDT by Lon Hohberger
Modified: 2009-04-16 18:18 EDT

Fixed In Version: RHBA-2007-0580
Doc Type: Bug Fix
Last Closed: 2007-11-07 11:46:15 EST

Description Lon Hohberger 2007-07-06 14:18:16 EDT
Description of problem:

On one node, if you run 'clusvcadm -r foo' in a loop, it bounces the service
back and forth between nodes.

On one of the nodes the service is bouncing between, if you run 'while : ; do
service rgmanager start; sleep 60; service rgmanager stop; done', you will
eventually get this:

Jul  6 13:57:55 lisa rgmanager: [17227]: <notice> Shutting down Cluster Service
Jul  6 13:57:55 lisa clurgmgrd[16434]: <notice> Shutting down
Jul  6 13:57:56 lisa clurgmgrd[16434]: <notice> Shutdown complete, exiting
Jul  6 13:57:56 lisa kernel: clurgmgrd[17239]: segfault at 0000000000000000 rip 0000000000415cd8 rsp 0000000044605f30 error 4
Jul  6 13:57:56 lisa kernel: dlm: rgmanager: group leave failed -512 0
Jul  6 13:57:56 lisa clurgmgrd[16433]: <crit> Watchdog: Daemon died, rebooting...
Jul  6 13:57:56 lisa dlm_controld[3676]: open "/sys/kernel/dlm/rgmanager/control" error -1 2
Jul  6 13:57:56 lisa dlm_controld[3676]: open "/sys/kernel/dlm/rgmanager/event_done" error -1 2
Jul  6 13:57:56 lisa kernel: md: stopping all md devices.
Jul  6 13:57:57 lisa kernel: Synchronizing SCSI cache for disk sda:

Version-Release number of selected component (if applicable): 5.1 beta

How reproducible: difficult
Comment 1 Lon Hohberger 2007-07-16 13:34:38 EDT
I think this can be solved by tracking all threads (even simple ones) and making
sure they're cleaned up in the exit path.  I will test this soon.
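The log is consistent with this idea: the faulting task (17239) is not the daemon's main PID (16434), and it dereferences address 0 right after "Shutdown complete, exiting", which suggests a thread that was still running while the exit path tore down shared state. Below is a minimal sketch of "track every thread and join it on exit", assuming POSIX threads and C11 atomics. It is not the actual rgmanager patch; every name in it (tracked_thread_create, tracked_threads_join_all, worker, demo_shutdown) is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical registry: every thread the daemon starts, including short
 * "simple" helpers, is recorded so the exit path can wait for it. */
struct tracked_thread {
    pthread_t tid;
    struct tracked_thread *next;
};

static struct tracked_thread *tracked_head = NULL;
static pthread_mutex_t tracked_lock = PTHREAD_MUTEX_INITIALIZER;
static atomic_int shutting_down = 0;
static atomic_int threads_finished = 0;

/* Start a thread and record it. Returns 0 on success, otherwise an error
 * number; new threads are refused once shutdown has begun. */
static int tracked_thread_create(void *(*fn)(void *), void *arg)
{
    struct tracked_thread *t = malloc(sizeof(*t));
    if (t == NULL)
        return ENOMEM;

    pthread_mutex_lock(&tracked_lock);
    int rc = atomic_load(&shutting_down) ? EAGAIN
                                         : pthread_create(&t->tid, NULL, fn, arg);
    if (rc == 0) {
        t->next = tracked_head;   /* push onto the registry */
        tracked_head = t;
    } else {
        free(t);
    }
    pthread_mutex_unlock(&tracked_lock);
    return rc;
}

/* Example worker: polls the shutdown flag, then exits cleanly. */
static void *worker(void *arg)
{
    (void)arg;
    struct timespec pause = { 0, 1000000 };   /* 1 ms */
    while (!atomic_load(&shutting_down))
        nanosleep(&pause, NULL);              /* stand-in for real work */
    atomic_fetch_add(&threads_finished, 1);
    return NULL;
}

/* Exit path: signal shutdown, take the whole list under the lock, then
 * join each thread outside the lock. Returns the number joined. */
static int tracked_threads_join_all(void)
{
    atomic_store(&shutting_down, 1);

    pthread_mutex_lock(&tracked_lock);
    struct tracked_thread *t = tracked_head;
    tracked_head = NULL;
    pthread_mutex_unlock(&tracked_lock);

    int joined = 0;
    while (t != NULL) {
        struct tracked_thread *next = t->next;
        int rc = pthread_join(t->tid, NULL);
        assert(rc == 0);
        (void)rc;
        free(t);
        t = next;
        joined++;
    }
    return joined;
}

/* Demo: start n workers, run the exit path, and report how many workers
 * had finished by the time it returned. */
static int demo_shutdown(int n)
{
    int started = 0;
    for (int i = 0; i < n; i++)
        if (tracked_thread_create(worker, NULL) == 0)
            started++;
    int joined = tracked_threads_join_all();
    assert(joined == started);
    (void)joined;
    return atomic_load(&threads_finished);
}
```

In a daemon, tracked_threads_join_all() would run before shared state is freed and before the "Shutdown complete, exiting" message, so no helper can touch torn-down data. Joining outside the lock keeps a thread that is itself trying to register a helper from deadlocking against the exit path; such late registrations get EAGAIN instead.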
Comment 3 RHEL Product and Program Management 2007-07-20 16:07:13 EDT
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update release.
Comment 4 Lon Hohberger 2007-07-23 14:20:45 EDT
Test setup:
* 5 node cluster
* 2 exclusive services (test1, test2)

Reproduce case:
* on node 1: 
   while :; do clusvcadm -r test1; done
* on node 2:
   while :; do clusvcadm -r test2; done
* on node 3 (**):
   while :; do service rgmanager stop; service rgmanager start; sleep 30; done

**: This needs to be one of the nodes the services are being relocated to.
Comment 5 Lon Hohberger 2007-07-24 14:51:54 EDT
Patches in RHEL5, RHEL51, head.
Comment 8 errata-xmlrpc 2007-11-07 11:46:15 EST
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

