Bug 637678 - service failover hangs at quotaoff in /usr/share/cluster/fs.sh
Summary: service failover hangs at quotaoff in /usr/share/cluster/fs.sh
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: rgmanager
Version: 5.5
Hardware: All
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: ---
Assignee: Lon Hohberger
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On: 440645
Blocks: 694731
Reported: 2010-09-27 05:48 UTC by Ryan Mitchell
Modified: 2018-11-14 13:49 UTC
CC List: 10 users

Fixed In Version: rgmanager-2.0.52-16.el5
Doc Type: Bug Fix
Doc Text:
Clone Of: 440645
Environment:
Last Closed: 2011-07-21 10:43:27 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
avoid running quotaoff when stopping nfs services without quotas enabled (1.79 KB, patch)
2010-09-27 06:08 UTC, Ryan Mitchell
no flags


Links
System:       Red Hat Product Errata
ID:           RHSA-2011:1000
Private:      0
Priority:     normal
Status:       SHIPPED_LIVE
Summary:      Low: rgmanager security, bug fix, and enhancement update
Last Updated: 2011-07-21 10:43:18 UTC

Description Ryan Mitchell 2010-09-27 05:48:33 UTC
+++ This bug was initially created as a clone of Bug #440645 +++

This is a clone of 440645 for RHEL 5, as the fix has not previously been applied there.

Description of problem:
Under certain circumstances the quotaoff command may hang while disabling quotas
for an NFS-mounted volume (e.g. if the server or rpc.quotad is unavailable).

This causes the /usr/share/cluster/fs.sh agent to hang while shutting down the resource.

Quotaoff should not hang, but if quotas are not enabled or wanted for the file
system concerned, it is better to avoid the problem by simply not running it.
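One way to express "simply not running it" is to decide from the resource's mount options alone whether quotaoff is needed at stop time. The following is a minimal sketch, not the code from the attached patch: `quota_opts_for` and `stop_quotas` are hypothetical names, and treating a bare `quota` option as user quota is an assumption. The flag string it builds (`gu` for `usrquota,grpquota`) is meant to match the `quotaopts = gu` value that fs.sh logs in Comment 8.

```bash
#!/bin/bash
# Hypothetical helper (not from fs.sh): map a comma-separated mount option
# string to quota flags for quotaon/quotaoff. Prints "" if no quotas are wanted.
quota_opts_for() {
    local opts="$1" user=0 group=0 flags="" opt
    local IFS=','
    for opt in $opts; do
        case "$opt" in
            quota|usrquota) user=1 ;;          # assumption: "quota" means user quota
            grpquota)       group=1 ;;
            noquota)        user=0; group=0 ;; # explicit opt-out clears both
        esac
    done
    [ "$group" -eq 1 ] && flags="${flags}g"
    [ "$user" -eq 1 ] && flags="${flags}u"
    echo "$flags"
}

# Hypothetical stop-path guard: run quotaoff only when quota flags exist, so a
# file system without quotas never reaches quotaoff during a stop.
stop_quotas() {
    local mountpoint="$1" opts="$2" flags
    flags=$(quota_opts_for "$opts")
    if [ -z "$flags" ]; then
        return 0    # no quota options: skip quotaoff entirely
    fi
    echo "Turning off quotas for $mountpoint"
    quotaoff -"$flags" "$mountpoint"
}
```

For example, `stop_quotas /mnt/test "usrquota,grpquota"` runs `quotaoff -gu /mnt/test`, while `stop_quotas /mnt/test "rw"` returns immediately without invoking quotaoff.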

Version-Release number of selected component (if applicable):
rgmanager-2.0.52

How reproducible:
100% for certain configurations

Steps to Reproduce:
1. Configure a file system resource that mounts an NFS server
2. Activate the resource on a node
3. Make the NFS server go away
4. Try to shut down or relocate the service
  
Actual results:
fs.sh hangs at quotaoff

Expected results:
fs.sh does not hang, and shutdown/relocation completes successfully

Additional info:
This was originally seen in a case where nodes were mounting their own NFS
exports. When the service was shut down, the NFS server stopped first, causing
quotaoff to hang. After discussion, Lon provided a patch (attached) that works
around this by not running quotaoff if quotas are not configured for a given
file system.

Comment 1 Ryan Mitchell 2010-09-27 06:08:18 UTC
Created attachment 449826
avoid running quotaoff when stopping nfs services without quotas enabled

Patch to avoid running quotaoff when stopping NFS services without quotas enabled. Running quotaoff can cause hangs if the NFS server is unavailable.

Ported from RHEL4 bug 440645. This has not yet been tested against the hang, as I cannot reproduce the original issue; however, I have tested the patch's functionality with services.

Comment 4 Lon Hohberger 2011-04-07 13:56:45 UTC
Merged

Comment 7 Lon Hohberger 2011-05-09 18:17:32 UTC
How to test

We were not able to reproduce the hang as described; however, the fix is still pertinent, because the fs.sh patch stops quotaoff from being called unconditionally, even when quotas are not in use. Consequently, we are testing for patch correctness rather than hang resolution, since we could not reliably reproduce the hang.

1) Create a 2+ node cluster

2) Set up syslog so that it redirects local4 to /var/log/rgmanager:

  FOR SYSLOG:

   echo "local4.* /var/log/rgmanager" >> /etc/syslog.conf
   service syslog restart

  FOR RSYSLOG:

   echo "local4.* /var/log/rgmanager" >> /etc/rsyslog.conf
   service rsyslog restart
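Re-running the `echo ... >>` commands above appends a duplicate rule each time. A minimal idempotent variant, assuming the rule should appear exactly once in the file; `add_local4_rule` is a hypothetical name, and the config path is passed in so it can be tried on a scratch copy first.

```bash
#!/bin/bash
# Hypothetical helper: append the local4 -> /var/log/rgmanager rule to a
# syslog-style config file only if that exact line is not already present.
add_local4_rule() {
    local conf="$1"
    local rule='local4.* /var/log/rgmanager'
    # -x: whole-line match, -F: fixed string (so "*" and "." are literal)
    if grep -qxF "$rule" "$conf" 2>/dev/null; then
        return 0    # already configured; leave the file untouched
    fi
    echo "$rule" >> "$conf"
}
```

For example, `add_local4_rule /etc/rsyslog.conf && service rsyslog restart` can be run repeatedly without growing the file.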

3) Configure rgmanager in cluster.conf to log debug messages to local4:

   <rm log_facility="local4" log_level="7" >
      ...
   </rm>

4) Add a service with a file system resource.  Ensure that neither usrquota nor grpquota is specified in the mount options.

   <rm log_facility="local4" log_level="7" >
     <service name="test" >
       <fs name="fs-test" device="/dev/sdb3" mountpoint="/mnt/test" />
     </service>
   </rm>

5) Enable the service.

6) Check the output of 'quota -v'.  There should be no output related to the file system added in step (4).

7) Disable the service.
   * On old versions of rgmanager, quotaoff was always called when unmounting,
     which was the cause of this issue.
   * On the new version of rgmanager incorporating the fix, there should
     NOT be a log message describing quotas being disabled prior to
     unmounting the file system.

8) Add quota options to the file system resource.  Simply add "usrquota,grpquota" to the options attribute of the file system resource:

     <fs name="fs-test" device="/dev/sdb3" options="usrquota,grpquota" mountpoint="/mnt/test" />

9) Enable the service.

10) Check the output of 'quota -v'.

  * On old and new versions of rgmanager, there should be output related to the file system added in step (4):
    [root@rhel5-1 ~]# quota -v
    Disk quotas for user root (uid 0): 
     Filesystem  blocks   quota   limit   grace   files   quota   limit   grace
      /dev/sda2   17700       0       0               6       0       0

11) Disable the service.
   * On old versions of rgmanager, quotaoff was always called when unmounting,
     which was the cause of this issue.
   * On the new version of rgmanager incorporating the fix, there SHOULD
     be a log message describing quotas being disabled prior to unmounting
     the file system:

      <debug> Turning off quotas for /mnt/test
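Steps 7 and 11 both come down to whether a `Turning off quotas for <mountpoint>` line is logged before the matching `unmounting <mountpoint>` line. A minimal checker, assuming the /var/log/rgmanager file from step (2) and the message wording seen in the logs in Comment 8; `quotaoff_logged_before_unmount` is a hypothetical name. It looks only at the most recent unmount of the given mount point, so it can be run after step 7 (expected to fail) and again after step 11 (expected to succeed) against the same log.

```bash
#!/bin/bash
# Hypothetical checker: exit 0 if the most recent "unmounting <mp>" line was
# preceded (since the previous unmount) by "Turning off quotas for <mp>".
# Lines are matched by suffix so /mnt/test does not also match /mnt/test2.
quotaoff_logged_before_unmount() {
    local log="$1" mp="$2"
    awk -v mp="$mp" '
        function ends_with(line, s) {
            return length(line) >= length(s) &&
                   substr(line, length(line) - length(s) + 1) == s
        }
        ends_with($0, "Turning off quotas for " mp) { pending = 1 }
        ends_with($0, "unmounting " mp) { found = 1; last = pending; pending = 0 }
        END { exit !(found && last) }
    ' "$log"
}
```

For example, after step 11: `quotaoff_logged_before_unmount /var/log/rgmanager /mnt/test && echo "quotaoff ran before unmount"`.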

Comment 8 Martin Juricek 2011-05-13 09:32:06 UTC
Tested package rgmanager-2.0.52-19.el5; no hang was reproduced.
The patch works correctly: quotaoff is called only if quotas are enabled.

service WITHOUT usrquota,grpquota fs options:
### starting service
May 13 04:07:28 a1 clurgmgrd[32686]: <notice> Starting disabled service service:test
May 13 04:07:28 a1 clurgmgrd: [32686]: <info> mounting /dev/loop0 on /mnt/test
May 13 04:07:28 a1 clurgmgrd: [32686]: <debug> mount -t ext3  /dev/loop0 /mnt/test
May 13 04:07:28 a1 clurgmgrd: [32686]: <info> quotaopts =
May 13 04:07:28 a1 clurgmgrd[32686]: <notice> Service service:test started
May 13 04:07:38 a1 clurgmgrd[32686]: <debug> 1 events processed
### stopping service
May 13 04:08:18 a1 clurgmgrd[32686]: <notice> Stopping service service:test
May 13 04:08:19 a1 clurgmgrd: [32686]: <info> unmounting /mnt/test
May 13 04:08:19 a1 clurgmgrd[32686]: <notice> Service service:test is disabled
May 13 04:08:29 a1 clurgmgrd[32686]: <debug> 1 events processed


service WITH usrquota,grpquota fs options:
### starting service
May 13 04:12:36 a1 clurgmgrd[918]: <notice> Starting disabled service service:test
May 13 04:12:36 a1 clurgmgrd: [918]: <info> mounting /dev/loop0 on /mnt/test
May 13 04:12:36 a1 clurgmgrd: [918]: <debug> mount -t ext3 -o usrquota,grpquota /dev/loop0 /mnt/test
May 13 04:12:36 a1 clurgmgrd: [918]: <info> quotaopts = gu
May 13 04:12:36 a1 clurgmgrd: [918]: <info> Enabling Quotas on /mnt/test
May 13 04:12:36 a1 clurgmgrd: [918]: <debug> quotaon -gu /mnt/test
May 13 04:12:36 a1 clurgmgrd[918]: <notice> Service service:test started
May 13 04:12:46 a1 clurgmgrd[918]: <debug> 1 events processed
### stopping service
May 13 04:13:57 a1 clurgmgrd[918]: <notice> Stopping service service:test
May 13 04:13:58 a1 clurgmgrd: [918]: <debug> Turning off quotas for /mnt/test
May 13 04:13:58 a1 clurgmgrd: [918]: <info> unmounting /mnt/test
May 13 04:13:58 a1 clurgmgrd[918]: <notice> Service service:test is disabled
May 13 04:14:08 a1 clurgmgrd[918]: <debug> 1 events processed
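The conclusion above, that quotaoff runs only when quotas are enabled, can be checked mechanically against one start/stop cycle of the log: the logged `quotaopts` value should be non-empty exactly when a `Turning off quotas` line is present. A sketch, assuming each log excerpt covers a single start/stop cycle as in the two captures above; `check_quota_cycle` is a hypothetical name.

```bash
#!/bin/bash
# Hypothetical checker for one start/stop cycle of rgmanager log output.
# Exit 0: quotaoff was logged if and only if quotaopts was non-empty.
# Exit 1: mismatch. Exit 2: no "quotaopts =" line found at all.
check_quota_cycle() {
    local log="$1"
    awk '
        /quotaopts =/ {
            opts = $0
            sub(/.*quotaopts =[ ]*/, "", opts)   # keep only the flag string
            enabled = (opts != "")
            seen_opts = 1
        }
        /Turning off quotas for / { turned_off = 1 }
        END {
            if (!seen_opts) exit 2
            exit !(enabled == turned_off)
        }
    ' "$log"
}
```

Run against each capture separately, both should exit 0; a log where quotaoff appears despite an empty `quotaopts` exits 1.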

Comment 11 errata-xmlrpc 2011-07-21 10:43:27 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-1000.html

