+++ This bug was initially created as a clone of Bug #440645 +++

This is a clone of 440645 for EL5, as the fix has not been applied there previously.

Description of problem:
Under certain circumstances the quotaoff command may hang while disabling quotas for an NFS-mounted volume (e.g. if the server or rpc.rquotad is unavailable). This causes the /usr/share/cluster/fs.sh agent to hang while shutting down the resource. quotaoff should not hang, but if quotas are not enabled or wanted for the file system concerned, it is better to avoid the problem by simply not running it.

Version-Release number of selected component (if applicable):
rgmanager-2.0.52

How reproducible:
100% for certain configurations

Steps to Reproduce:
1. Configure a file system resource that mounts an NFS server
2. Activate the resource on a node
3. Make the NFS server go away
4. Try to shut down or relocate the service

Actual results:
fs.sh hangs at quotaoff

Expected results:
fs.sh does not hang, and shutdown / relocation completes successfully

Additional info:
This was originally seen in a case where nodes were mounting their own NFS exports. When the service was shut down, the NFS server stopped first, causing the quotaoff to hang. After discussion with Lon, he provided a patch (attached) to work around this by not running quotaoff if quotas are not configured for a given file system.
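The workaround described above can be sketched in shell as follows. This is a minimal, hypothetical sketch and not the actual fs.sh code (the function names are invented for illustration): derive the quota flags from the resource's mount options, and skip quotaoff entirely when neither usrquota nor grpquota is present.

```shell
# Hypothetical sketch of the workaround; NOT the actual fs.sh implementation.
# Map the resource's mount options to quotaon/quotaoff flags: "u", "g", "ug", or "".
quota_flags_from_opts() {
    # $1: comma-separated mount options, e.g. "rw,usrquota,grpquota"
    flags=""
    case ",$1," in *,usrquota,*) flags="${flags}u" ;; esac
    case ",$1," in *,grpquota,*) flags="${flags}g" ;; esac
    echo "$flags"
}

stop_quotas() {
    # $1: mount options   $2: mountpoint
    flags=$(quota_flags_from_opts "$1")
    if [ -n "$flags" ]; then
        # Only reached when quotas were requested; against an unreachable
        # NFS server, this quotaoff call is where the hang could occur.
        quotaoff -"$flags" "$2"
    fi
    # With no quota options, quotaoff is never invoked and cannot hang.
}
```

With this check in place, a file system resource configured without usrquota/grpquota never touches quotaoff at stop time, which sidesteps the hang when the NFS server has already gone away.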
Created attachment 449826 [details]
avoid running quotaoff when stopping nfs services without quotas enabled

Patch to avoid running quotaoff when stopping nfs services without quotas enabled; running quotaoff while the NFS server is unavailable can cause hangs. Ported from the RHEL4 fix in bug 440645. This has not been tested against the original hang yet, as I can't reproduce the original issue; I have, however, tested the patch for service functionality.
Merged
How to test

We were not able to reproduce the hang as described; however, the fix is still pertinent, since the fs.sh patch fixes the fact that we were always calling quotaoff, even when quotas were not in use. Consequently, we are testing for patch correctness, as opposed to hang resolution, since we were not able to reliably reproduce the hang.

1) Create a 2+ node cluster.

2) Set up syslog so that it redirects local4 to /var/log/rgmanager:

FOR SYSLOG:
echo "local4.* /var/log/rgmanager" >> /etc/syslog.conf
service syslog restart

FOR RSYSLOG:
echo "local4.* /var/log/rgmanager" >> /etc/rsyslog.conf
service rsyslog restart

3) Set up rgmanager's logging so that it logs debug messages to local4 in cluster.conf:

<rm log_facility="local4" log_level="7" >
  ...
</rm>

4) Add a service with a file system resource. Ensure that neither usrquota nor grpquota is specified in the mount options:

<rm log_facility="local4" log_level="7" >
  <service name="test" >
    <fs name="fs-test" device="/dev/sdb3" mountpoint="/mnt/test" />
  </service>
</rm>

5) Enable the service.

6) Check the output of 'quota -v'. There should be no output related to the file system added in step (4).

7) Disable the service.
* On old versions of rgmanager, quotaoff was always called when unmounting, which was the cause of this issue.
* On the new version of rgmanager incorporating the fix, there should NOT be a log message describing quotas being disabled prior to unmounting the file system.

8) Add quota options to the file system resource. Simply add "usrquota,grpquota" to the options attribute of the file system resource:

<fs name="fs-test" device="/dev/sdb3" options="usrquota,grpquota" mountpoint="/mnt/test" />

9) Enable the service.

10) Check the output of 'quota -v'.
* On old and new versions of rgmanager, there should be output related to the file system added in step (4):

[root@rhel5-1 ~]# quota -v
Disk quotas for user root (uid 0):
     Filesystem  blocks  quota  limit  grace  files  quota  limit  grace
      /dev/sda2   17700      0      0             6      0      0

11) Disable the service.
* On old versions of rgmanager, quotaoff was always called when unmounting, which was the cause of this issue.
* On the new version of rgmanager incorporating the fix, there SHOULD be a log message describing quotas being disabled prior to unmounting the file system:

<debug> Turning off quotas for /mnt/test
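As a quick cross-check for steps (6) and (10), the mount options actually in effect can be read back from /proc/mounts. The helper below is hypothetical (invented for this test procedure, not part of rgmanager); it reports whether a mountpoint was mounted with either quota option:

```shell
# Hypothetical verification helper for the test procedure above; NOT part of rgmanager.
opts_have_quota() {
    # $1: comma-separated mount options, e.g. "rw,usrquota,grpquota"
    echo "$1" | grep -Eq '(^|,)(usr|grp)quota(,|$)'
}

check_mountpoint() {
    # $1: mountpoint, e.g. /mnt/test
    # Field 2 of /proc/mounts is the mountpoint, field 4 the options.
    opts=$(awk -v mp="$1" '$2 == mp { print $4 }' /proc/mounts)
    if opts_have_quota "$opts"; then
        echo "quota options present: expect the quotaoff debug message on stop"
    else
        echo "no quota options: quotaoff should be skipped on stop"
    fi
}
```

Running `check_mountpoint /mnt/test` after steps (5) and (9) should print the second and first message respectively, matching what 'quota -v' and the rgmanager debug log are expected to show.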
Tested package rgmanager-2.0.52-19.el5; no hang was reproduced. The patch is working OK: quotaoff is called only if quotas are enabled.

service WITHOUT usrquota,grpquota fs options:

### starting service
May 13 04:07:28 a1 clurgmgrd[32686]: <notice> Starting disabled service service:test
May 13 04:07:28 a1 clurgmgrd: [32686]: <info> mounting /dev/loop0 on /mnt/test
May 13 04:07:28 a1 clurgmgrd: [32686]: <debug> mount -t ext3 /dev/loop0 /mnt/test
May 13 04:07:28 a1 clurgmgrd: [32686]: <info> quotaopts =
May 13 04:07:28 a1 clurgmgrd[32686]: <notice> Service service:test started
May 13 04:07:38 a1 clurgmgrd[32686]: <debug> 1 events processed

### stopping service
May 13 04:08:18 a1 clurgmgrd[32686]: <notice> Stopping service service:test
May 13 04:08:19 a1 clurgmgrd: [32686]: <info> unmounting /mnt/test
May 13 04:08:19 a1 clurgmgrd[32686]: <notice> Service service:test is disabled
May 13 04:08:29 a1 clurgmgrd[32686]: <debug> 1 events processed

service WITH usrquota,grpquota fs options:

### starting service
May 13 04:12:36 a1 clurgmgrd[918]: <notice> Starting disabled service service:test
May 13 04:12:36 a1 clurgmgrd: [918]: <info> mounting /dev/loop0 on /mnt/test
May 13 04:12:36 a1 clurgmgrd: [918]: <debug> mount -t ext3 -o usrquota,grpquota /dev/loop0 /mnt/test
May 13 04:12:36 a1 clurgmgrd: [918]: <info> quotaopts = gu
May 13 04:12:36 a1 clurgmgrd: [918]: <info> Enabling Quotas on /mnt/test
May 13 04:12:36 a1 clurgmgrd: [918]: <debug> quotaon -gu /mnt/test
May 13 04:12:36 a1 clurgmgrd[918]: <notice> Service service:test started
May 13 04:12:46 a1 clurgmgrd[918]: <debug> 1 events processed

### stopping service
May 13 04:13:57 a1 clurgmgrd[918]: <notice> Stopping service service:test
May 13 04:13:58 a1 clurgmgrd: [918]: <debug> Turning off quotas for /mnt/test
May 13 04:13:58 a1 clurgmgrd: [918]: <info> unmounting /mnt/test
May 13 04:13:58 a1 clurgmgrd[918]: <notice> Service service:test is disabled
May 13 04:14:08 a1 clurgmgrd[918]: <debug> 1 events processed
An advisory has been issued which should resolve the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-1000.html