Bug 192784 - relocation of ext3 service fails with heavy client load
Status: CLOSED CURRENTRELEASE
Product: Red Hat Cluster Suite
Classification: Red Hat
Component: rgmanager
Version: 4
Hardware/OS: All Linux
Priority: medium   Severity: medium
Assigned To: Lon Hohberger
QA Contact: Cluster QE
Blocks: 180185
Reported: 2006-05-22 19:00 EDT by Corey Marthaler
Modified: 2009-04-16 16:20 EDT
CC List: 1 user

Fixed In Version: RHCS4U4
Doc Type: Bug Fix
Last Closed: 2006-12-14 12:38:02 EST


Attachments
Change ordering so that umount is retried *before* reclaim broadcast (1.29 KB, patch)
2006-07-26 11:20 EDT, Lon Hohberger
Full fs.sh which is compatible with current rgmanager builds (25.91 KB, text/plain)
2006-07-26 11:25 EDT, Lon Hohberger

Description Corey Marthaler 2006-05-22 19:00:12 EDT
Description of problem:
I had two NFS clients running a pretty heavy I/O load against one GFS and one ext3
filesystem, both part of the same NFS service. I then attempted to relocate that
service and it failed. I tried this with force_umount both on and off, and it
failed each time. Without any NFS client I/O this scenario works.
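
For reference, the relocation here is requested with clusvcadm; the same command shows
up in the qarshd log in comment 1. The target node below is just an example:

    # clusvcadm -r nfs1 -m link-01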

May 22 12:34:08 link-02 clurgmgrd[4733]: <notice> Stopping service nfs1
May 22 12:34:08 link-02 clurgmgrd: [4733]: <info> Removing IPv4 address 10.15.89.209 from eth0
May 22 12:34:18 link-02 clurgmgrd: [4733]: <info> Removing export: *:/mnt/link0
May 22 12:34:18 link-02 clurgmgrd: [4733]: <info> unmounting /dev/mapper/LINK_128-LINK_1280 (/mnt/link0)
May 22 12:34:18 link-02 clurgmgrd: [4733]: <notice> Forcefully unmounting /mnt/link0
May 22 12:34:22 link-02 clurgmgrd: [4733]: <info> unmounting /dev/mapper/LINK_128-LINK_1280 (/mnt/link0)
May 22 12:34:22 link-02 clurgmgrd: [4733]: <notice> Forcefully unmounting /mnt/link0
May 22 12:34:23 link-02 clurgmgrd: [4733]: <err> 'umount /dev/mapper/LINK_128-LINK_1280' failed (/mnt/link0), error=0
May 22 12:34:23 link-02 clurgmgrd[4733]: <notice> stop on clusterfs "LINK_1280" returned 2 (invalid argument(s))
May 22 12:34:23 link-02 clurgmgrd[4733]: <crit> #12: RG nfs1 failed to stop; intervention required
May 22 12:34:23 link-02 clurgmgrd[4733]: <notice> Service nfs1 is failed
May 22 12:34:23 link-02 clurgmgrd[4733]: <alert> #2: Service nfs1 returned failure code.  Last Owner: link-02
May 22 12:34:23 link-02 clurgmgrd[4733]: <alert> #4: Administrator intervention required.
May 22 12:35:21 link-02 lock_gulmd_core[3267]: "Magma::6368" is logged out. fd:13
 
[root@link-02 ~]# clustat
Member Status: Quorate

  Member Name                              Status
  ------ ----                              ------
  link-01                                  Online, rgmanager
  link-02                                  Online, Local, rgmanager
  link-08                                  Online, rgmanager

  Service Name         Owner (Last)                   State
  ------- ----         ----- ------                   -----
  nfs1                 (link-02)                      failed

Version-Release number of selected component (if applicable):
[root@link-02 ~]# uname -ar
Linux link-02 2.6.9-34.0.1.ELsmp #1 SMP Wed May 17 16:59:36 EDT 2006 x86_64 x86_64 x86_64 GNU/Linux
[root@link-02 ~]# rpm -q rgmanager
rgmanager-1.9.46-0

<rm>
    <failoverdomains>
      <failoverdomain name="LINK_128_domain" ordered="0" restricted="0">
        <failoverdomainnode name="link-01" priority="1"/>
        <failoverdomainnode name="link-02" priority="1"/>
        <failoverdomainnode name="link-08" priority="1"/>
      </failoverdomain>
    </failoverdomains>
    <resources>
      <ip address="10.15.89.209" monitor_link="1"/>
      <clusterfs device="/dev/LINK_128/LINK_1280" force_unmount="1" fsid="6731"
fstype="gfs" mountpoint="/mnt/link0" name="LINK_1280" options=""/>
      <fs device="/dev/LINK_128/LINK_1281" force_fsck="0" force_unmount="1"
fsid="4139" fstype="ext3" mountpoint="/mnt/link1" name="LINK_1281" options=""/>
      <nfsexport name="LINK_128 nfs exports"/>
      <nfsclient name="*" options="rw" target="*"/>
    </resources>
    <service autostart="1" domain="LINK_128_domain" name="nfs1">
      <clusterfs ref="LINK_1280">
        <nfsexport ref="LINK_128 nfs exports">
          <nfsclient ref="*"/>
        </nfsexport>
      </clusterfs>
      <fs ref="LINK_1281">
        <nfsexport ref="LINK_128 nfs exports">
          <nfsclient ref="*"/>
        </nfsexport>
      </fs>
      <ip ref="10.15.89.209"/>
    </service>
  </rm>


How reproducible:
every time
Comment 1 Nate Straz 2006-05-23 18:35:28 EDT
I hit this too while running tests on the RHEL4-U3 errata.

May 23 17:24:31 tank-01 qarshd[18474]: Running cmdline: clusvcadm -r nfs_service -m tank-02
May 23 17:24:32 tank-01 clurgmgrd[8507]: <notice> Stopping service nfs_service
May 23 17:24:32 tank-01 clurgmgrd: [8507]: <info> Removing IPv4 address 10.15.89.203 from eth0
May 23 17:24:42 tank-01 clurgmgrd: [8507]: <info> Removing export: *:/mnt/gfs1
May 23 17:24:42 tank-01 clurgmgrd: [8507]: <info> Removing export: *:/mnt/ext3
May 23 17:24:42 tank-01 clurgmgrd: [8507]: <info> unmounting /mnt/ext3
May 23 17:24:46 tank-01 clurgmgrd: [8507]: <info> unmounting /mnt/ext3
May 23 17:24:46 tank-01 clurgmgrd: [8507]: <err> 'umount /mnt/ext3' failed, error=0
May 23 17:24:46 tank-01 clurgmgrd[8507]: <notice> stop on fs "tank-cluster1" returned 2 (invalid argument(s))

Here is the rm section of my cluster.conf:
<rm>
    <failoverdomains>
      <failoverdomain name="tank-cluster_domain" ordered="0" restricted="0">
        <failoverdomainnode name="tank-01" priority="1"/>
        <failoverdomainnode name="tank-02" priority="1"/>
        <failoverdomainnode name="tank-03" priority="1"/>
        <failoverdomainnode name="tank-04" priority="1"/>
        <failoverdomainnode name="tank-05" priority="1"/>
      </failoverdomain>
    </failoverdomains>
    <resources>
      <ip address="10.15.89.203" monitor_link="1"/>
      <clusterfs device="/dev/tank-cluster/tank-cluster0" force_unmount="0"
fsid="7989" fstype="gfs" mountpoint="/mnt/gfs1" name="tank-cluster0" options=""/>
      <fs device="/dev/tank-cluster/tank-cluster1" force_fsck="0"
force_unmount="0" fsid="8364" fstype="ext3" mountpoint="/mnt/ext3"
name="tank-cluster1" options=""/>
      <nfsexport name="tank-cluster nfs exports"/>
      <nfsclient name="*" options="rw" target="*"/>
    </resources>
    <service autostart="1" domain="tank-cluster_domain" name="nfs_service">
      <clusterfs ref="tank-cluster0">
        <nfsexport ref="tank-cluster nfs exports">
          <nfsclient ref="*"/>
        </nfsexport>
      </clusterfs>
      <fs ref="tank-cluster1">
        <nfsexport ref="tank-cluster nfs exports">
          <nfsclient ref="*"/>
        </nfsexport>
      </fs>
      <ip ref="10.15.89.203"/>
    </service>
  </rm>
Comment 3 Lon Hohberger 2006-06-15 17:35:19 EDT
This is probably related to the NFS changes we need to get in for U4; Wendy and
I are pretty much set right now, but she's still investigating a bit, I think.
Comment 5 Lon Hohberger 2006-07-05 16:35:36 EDT
Moving to needinfo_reporter; does this happen on the current spin?
Comment 6 Corey Marthaler 2006-07-06 13:15:49 EDT
I did see this again with the latest. 
rgmanager-1.9.51-0
2.6.9-39.1.ELsmp

Will test with the nfslock="1" flag to see if that makes this problem go away.
Comment 7 Corey Marthaler 2006-07-25 15:56:00 EDT
Currently testing the nfslock flag workaround while running the latest RHEL4 U4
cluster regression tests.

How will customers know to use this flag? Where is it documented?

This flag is not only missing from the cluster GUI, it also causes
cluster.conf to fail validation:

Relax-NG validity error : Extra element rm in interleave
/etc/cluster/cluster.conf:46: element rm: Relax-NG validity error : Element
cluster failed to validate content
/etc/cluster/cluster.conf fails to validate
Comment 8 Corey Marthaler 2006-07-25 18:09:08 EDT
I still appear to see this issue even with the nfslock flag. I'll leave the tank
cluster in this state overnight if you'd like to check it out.

Attempted to relocate from tank-01 to tank-05.

Jul 25 16:28:26 tank-01 clurgmgrd[389]: <notice> status on nfsclient "*" returned 127 (unspecified)
Jul 25 16:28:26 tank-01 bash: [21380]: <info> Removing export: *:/mnt/tank1
Jul 25 16:28:26 tank-01 bash: [21380]: <info> Adding export: *:/mnt/tank1 (fsid=9468,rw)
Jul 25 16:28:36 tank-01 clurgmgrd[389]: <notice> status on nfsclient "*" returned 127 (unspecified)
Jul 25 16:28:36 tank-01 bash: [21490]: <info> Removing export: *:/mnt/tank2
Jul 25 16:28:36 tank-01 bash: [21490]: <info> Adding export: *:/mnt/tank2 (fsid=661,rw)
Jul 25 16:28:53 tank-01 clurgmgrd[389]: <notice> Stopping service nfs1
Jul 25 16:28:53 tank-01 clurgmgrd: [389]: <info> Removing IPv4 address 10.15.89.203 from eth0
Jul 25 16:29:03 tank-01 clurgmgrd: [389]: <info> Removing export: *:/mnt/tank2
Jul 25 16:29:03 tank-01 clurgmgrd: [389]: <warning> Dropping node-wide NFS locks
Jul 25 16:29:03 tank-01 clurgmgrd: [389]: <info> Sending reclaim notifications via tank-01
Jul 25 16:29:03 tank-01 rpc.statd[21750]: Version 1.0.6 Starting
Jul 25 16:29:03 tank-01 rpc.statd[21750]: Flags: No-Daemon Notify-Only
Jul 25 16:29:03 tank-01 rpc.statd[21750]: statd running as root. chown /tmp/statd-tank-01.21703/sm to choose different user
Jul 25 16:29:06 tank-01 rpc.statd[21750]: Caught signal 15, un-registering and exiting.
Jul 25 16:29:06 tank-01 clurgmgrd: [389]: <info> Removing export: *:/mnt/tank1
Jul 25 16:29:06 tank-01 clurgmgrd: [389]: <info> unmounting /mnt/tank1
Jul 25 16:29:10 tank-01 clurgmgrd: [389]: <info> unmounting /mnt/tank1
Jul 25 16:29:10 tank-01 clurgmgrd: [389]: <err> 'umount /mnt/tank1' failed, error=0
Jul 25 16:29:10 tank-01 clurgmgrd[389]: <notice> stop on fs "tank-cluster0" returned 2 (invalid argument(s))
Jul 25 16:29:10 tank-01 clurgmgrd[389]: <crit> #12: RG nfs1 failed to stop; intervention required
Jul 25 16:29:10 tank-01 clurgmgrd[389]: <notice> Service nfs1 is failed
Jul 25 16:29:11 tank-01 clurgmgrd[389]: <alert> #2: Service nfs1 returned failure code.  Last Owner: tank-01
Jul 25 16:29:11 tank-01 clurgmgrd[389]: <alert> #4: Administrator intervention required.
Jul 25 17:01:01 tank-01 crond(pam_unix)[22077]: session opened for user root by (uid=0)
Jul 25 17:01:01 tank-01 crond(pam_unix)[22077]: session closed for user root


Jul 25 16:29:11 tank-05 clurgmgrd[25902]: <err> #43: Service nfs1 has failed; can not start.


[root@tank-01 ~]# clustat
Member Status: Quorate

  Member Name                              Status
  ------ ----                              ------
  tank-01                                  Online, Local, rgmanager
  tank-02                                  Online, rgmanager
  tank-03                                  Online, rgmanager
  tank-04                                  Online, rgmanager
  tank-05                                  Online, rgmanager

  Service Name         Owner (Last)                   State
  ------- ----         ----- ------                   -----
  nfs1                 (tank-01)                      failed




[root@tank-01 ~]# rpm -q rgmanager
rgmanager-1.9.51-0
[root@tank-01 ~]# uname -ar
Linux tank-01 2.6.9-42.ELsmp #1 SMP Wed Jul 12 23:27:17 EDT 2006 i686 i686 i386 GNU/Linux


Comment 9 Lon Hohberger 2006-07-26 11:18:53 EDT
Ok, when I logged in, I retried a manual service stop:

[root@tank-01 ~]# rg_test test /etc/cluster/cluster.conf stop service nfs1
Running in test mode.
Stopping nfs1...
<debug>  10.15.89.203 is not configured
<info>   Removing export: *:/mnt/tank2
<warning>Dropping node-wide NFS locks
<info>   Sending reclaim notifications via tank-01
<debug>  Not umounting /dev/mapper/tank--cluster-tank--cluster1 (clustered file system)
<info>   Removing export: *:/mnt/tank1
<info>   unmounting /mnt/tank1
umount: /mnt/tank1: device is busy
umount: /mnt/tank1: device is busy
<info>   unmounting /mnt/tank1
umount: /mnt/tank1: device is busy
umount: /mnt/tank1: device is busy
<err>    'umount /mnt/tank1' failed, error=0

It didn't work.  So, I did things manually:

[root@tank-01 ~]# killall -9 lockd
[root@tank-01 ~]# umount /mnt/tank1/

... and it worked fine.

This led me to notice an ordering problem in fs.sh: it was broadcasting reclaim
notifications *before* retrying the unmount; this should be done *after*
retrying the unmount.
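
For illustration, the corrected ordering looks roughly like the sketch below. The
function names are placeholders, not the actual fs.sh code in the attached patch;
the point is only that the umount retries now happen before the reclaim broadcast:

    # Sketch only: placeholder helpers, not the literal fs.sh code.

    drop_nfs_locks() {
        # fs.sh drops node-wide NFS locks roughly like the manual
        # workaround above.
        killall -9 lockd 2>/dev/null
    }

    send_reclaim_broadcast() {
        # Placeholder: fs.sh does this by starting rpc.statd in
        # notify-only mode (see the rpc.statd lines in comment 8).
        :
    }

    stop_fs_sketch() {
        local mp="$1"
        local try

        drop_nfs_locks

        # Retry the umount FIRST ...
        for try in 1 2 3; do
            umount "$mp" && break
            sleep 2
        done

        # ... and only broadcast lock-reclaim notifications AFTER the
        # umount attempts, so reclaiming clients cannot re-pin the mount.
        send_reclaim_broadcast
    }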

Note that in the cluster configuration, the service should have the
'nfslock="1"' flag, not the <fs> tag.  This does *not* affect this particular
bug, however.  The reason this flag exists at all is that the entire model of
how NFS locks are handled is very likely going to change in the next update due
to the work that Wendy is doing upstream.
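
For example, using the service definition from comment 1's cluster.conf, the flag
would go on the <service> element, roughly like this (sketch only, not a validated
configuration):

    <service autostart="1" domain="tank-cluster_domain" name="nfs_service" nfslock="1">
      ...
    </service>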

Comment 10 Lon Hohberger 2006-07-26 11:20:29 EDT
Created attachment 133075 [details]
Change ordering so that umount is retried *before* reclaim broadcast
Comment 11 Lon Hohberger 2006-07-26 11:25:00 EDT
Created attachment 133076 [details]
Full fs.sh which is compatible with current rgmanager builds

If you want to test without a full respin of rgmanager, just copy this to
/usr/share/cluster/ on all cluster nodes.
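
For example, from one node, assuming the tank-0x host names from the logs above and
root SSH access to each node (adjust names and paths as needed):

    for node in tank-01 tank-02 tank-03 tank-04 tank-05; do
        scp fs.sh root@${node}:/usr/share/cluster/fs.sh
    done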
Comment 12 Lon Hohberger 2006-07-27 14:59:16 EDT
Hit again.
Comment 13 Lon Hohberger 2006-08-02 14:31:55 EDT
The original problem has more or less been solved.

The problem now is that we have hit a bug in the base kernel which prevents
umount entirely if an export has heavy client I/O.  While references were
previously held primarily by lockd (which has been solved via killing
lockd/sending reclaims), there seems to be an incorrect reference count or a
leak somewhere in the kernel.

When this problem is hit, it is impossible to umount the file system without
rebooting the node.  As such, if a user needs to perform a relocation of an NFS
service while there is heavy I/O occurring to that NFS service's exports, it is
a good idea to edit the cluster configuration and enable the option to reboot if
unmount fails.  This corresponds to adding 'self_fence="1"' to the <fs> resource
in cluster.conf.
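
As a sketch, using the <fs> resource from comment 1's cluster.conf, that would look
something like the following; only self_fence="1" is new, the other attributes are
copied from that example:

      <fs device="/dev/tank-cluster/tank-cluster1" force_fsck="0" force_unmount="0"
          fsid="8364" fstype="ext3" mountpoint="/mnt/ext3" name="tank-cluster1"
          options="" self_fence="1"/>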

While rebooting a node is certainly sub-optimal, it will help minimize downtime
for that particular service.
Comment 14 Lon Hohberger 2006-08-02 14:33:47 EDT
To be more clear, the 'reboot-if-unmount-fails' option can be configured from
within system-config-cluster in the file system dialog box.
