434591 – gfs service relocation can cause its nfs client io to fail/time out

Keywords:
Status:	CLOSED DUPLICATE of bug 252335
Alias:	None
Product:	Red Hat Cluster Suite
Classification:	Retired
Component:	rgmanager
Sub Component:
Version:	4
Hardware:	All
OS:	Linux
Priority:	low
Severity:	low
Target Milestone:	---
Assignee:	Lon Hohberger
QA Contact:	Cluster QE
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2008-02-22 23:17 UTC by Corey Marthaler
Modified:	2009-04-16 20:35 UTC
CC List:	2 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2008-02-25 14:38:24 UTC
Embargoed:

Description Corey Marthaler 2008-02-22 23:17:58 UTC

Description of problem:
There may already be a bug open for this, but I couldn't find one.

I had NFS I/O running to two filesystems (ext3 and gfs) while relocating the
service around the cluster. On the most recent relocation, the client I/O to
the gfs filesystem failed. An ls afterwards took roughly 3-5 minutes to
return.

Command run on hayes-01:
Feb 22 16:59:49 hayes-01 qarshd[23533]: Running cmdline: clusvcadm -r nfs1 -m
hayes-03

The service was running on hayes-02:
Feb 22 16:59:49 hayes-02 clurgmgrd[12469]: <notice> Stopping service nfs1
Feb 22 16:59:49 hayes-02 clurgmgrd: [12469]: <info> Removing IPv4 address
10.15.89.209 from eth0
Feb 22 16:59:59 hayes-02 clurgmgrd: [12469]: <info> Removing export: *:/mnt/hayes0
Feb 22 16:59:59 hayes-02 clurgmgrd: [12469]: <warning> Dropping node-wide NFS locks
Feb 22 16:59:59 hayes-02 clurgmgrd: [12469]: <info> unmounting
/dev/mapper/HAYES-HAYES0 (/mnt/hayes0)
Feb 22 16:59:59 hayes-02 clurgmgrd: [12469]: <info> Removing export: *:/mnt/hayes1
Feb 22 16:59:59 hayes-02 clurgmgrd: [12469]: <info> unmounting /mnt/hayes1
Feb 22 16:59:59 hayes-02 clurgmgrd[12469]: <notice> Service nfs1 is stopped

Service is supposed to relocate to hayes-03:
Feb 22 17:00:00 hayes-03 clurgmgrd[13157]: <notice> Starting stopped service nfs1
Feb 22 17:00:00 hayes-03 clurgmgrd: [13157]: <info> mounting
/dev/mapper/HAYES-HAYES1 on /mnt/hayes1
Feb 22 17:00:00 hayes-03 kernel: kjournald starting.  Commit interval 5 seconds
Feb 22 17:00:00 hayes-03 kernel: EXT3 FS on dm-3, internal journal
Feb 22 17:00:00 hayes-03 kernel: EXT3-fs: mounted filesystem with ordered data mode.
Feb 22 17:00:00 hayes-03 clurgmgrd: [13157]: <info> Adding export: *:/mnt/hayes1
(fsid=7777,rw)
Feb 22 17:00:00 hayes-03 kernel: GFS: Trying to join cluster "lock_dlm",
"HAYES:HAYES0"
Feb 22 17:00:02 hayes-03 kernel: GFS: fsid=HAYES:HAYES0.0: Joined cluster. Now
mounting FS...
Feb 22 17:00:02 hayes-03 kernel: GFS: fsid=HAYES:HAYES0.0: jid=0: Trying to
acquire journal lock...
Feb 22 17:00:02 hayes-03 kernel: GFS: fsid=HAYES:HAYES0.0: jid=0: Looking at
journal...
Feb 22 17:00:02 hayes-03 kernel: GFS: fsid=HAYES:HAYES0.0: jid=0: Done
Feb 22 17:00:02 hayes-03 kernel: GFS: fsid=HAYES:HAYES0.0: jid=1: Trying to
acquire journal lock...
Feb 22 17:00:02 hayes-03 kernel: GFS: fsid=HAYES:HAYES0.0: jid=1: Looking at
journal...
Feb 22 17:00:02 hayes-03 kernel: GFS: fsid=HAYES:HAYES0.0: jid=1: Done
Feb 22 17:00:02 hayes-03 kernel: GFS: fsid=HAYES:HAYES0.0: jid=2: Trying to
acquire journal lock...
Feb 22 17:00:02 hayes-03 kernel: GFS: fsid=HAYES:HAYES0.0: jid=2: Looking at
journal...
Feb 22 17:00:02 hayes-03 kernel: GFS: fsid=HAYES:HAYES0.0: jid=2: Done
Feb 22 17:00:02 hayes-03 clurgmgrd: [13157]: <info> Adding export: *:/mnt/hayes0
(fsid=8868,rw)
Feb 22 17:00:02 hayes-03 clurgmgrd: [13157]: <info> Adding IPv4 address
10.15.89.209 to eth0
Feb 22 17:00:03 hayes-03 clurgmgrd[13157]: <notice> Service nfs1 started
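
To put a number on how long the service address was unavailable, the two
"IPv4 address" log lines above can be compared. This is only a rough sketch:
the year 2008 is taken from the report date because syslog lines carry no
year, the helper name syslog_time is made up for illustration, and it assumes
the clocks on hayes-02 and hayes-03 are in sync.

```python
from datetime import datetime

# Assumption: year taken from the bug's report date; syslog omits it.
ASSUMED_YEAR = "2008"


def syslog_time(line):
    """Parse the leading 'Mon DD HH:MM:SS' stamp of a syslog line."""
    return datetime.strptime(ASSUMED_YEAR + " " + line[:15], "%Y %b %d %H:%M:%S")


ip_removed = ("Feb 22 16:59:49 hayes-02 clurgmgrd: [12469]: <info> "
              "Removing IPv4 address 10.15.89.209 from eth0")
ip_added = ("Feb 22 17:00:02 hayes-03 clurgmgrd: [13157]: <info> "
            "Adding IPv4 address 10.15.89.209 to eth0")

# Seconds between the address leaving hayes-02 and appearing on hayes-03.
gap_seconds = (syslog_time(ip_added) - syslog_time(ip_removed)).total_seconds()
print(gap_seconds)
```

By these log stamps the address was down for a matter of seconds, far less
than the 3-5 minute ls delay described above.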

Client I/O failure:
[accordion_quick] accordion(): cache_open(accrdfile4, 4162, 0666) failed: Stale
NFS file handle
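
For reference, "Stale NFS file handle" is the message for errno ESTALE. A
test client could tell this failure apart from other I/O errors along the
following lines; this is a hedged sketch, not what accordion actually does,
and classify_io_error is a hypothetical helper name.

```python
import errno


def classify_io_error(exc):
    """Return 'stale' for ESTALE, 'other' for any other OSError."""
    if exc.errno == errno.ESTALE:
        return "stale"
    return "other"


# Simulated errors; a real client would catch these from open()/read().
stale = OSError(errno.ESTALE, "Stale NFS file handle")
missing = OSError(errno.ENOENT, "No such file or directory")
print(classify_io_error(stale), classify_io_error(missing))
```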


Version-Release number of selected component (if applicable):
2.6.9-67.ELsmp
rgmanager-1.9.72-1

NFS client:
2.6.9-42.ELhugemem

Comment 1 Corey Marthaler 2008-02-22 23:19:31 UTC

Here is my resource section:

<rm>
    <failoverdomains>
      <failoverdomain name="HAYES_domain" ordered="0" restricted="0">
        <failoverdomainnode name="hayes-01" priority="1"/>
        <failoverdomainnode name="hayes-02" priority="1"/>
        <failoverdomainnode name="hayes-03" priority="1"/>
      </failoverdomain>
    </failoverdomains>
    <resources>
      <ip address="10.15.89.209" monitor_link="1"/>
      <clusterfs device="/dev/HAYES/HAYES0" force_unmount="1" self_fence="1"
fsid="8868" fstype="gfs" mountpoint="/mnt/hayes0" name="HAYES0" options=""/>
      <fs device="/dev/HAYES/HAYES1" force_fsck="0" force_unmount="1"
self_fence="1" fsid="7777" fstype="ext3" mountpoint="/mnt/hayes1" name="HAYES1"
options=""/>
      <nfsexport name="HAYES nfs exports"/>
      <nfsclient name="*" options="rw" target="*"/>
    </resources>
    <service autostart="1" domain="HAYES_domain" name="nfs1" nfslock="1">
      <clusterfs ref="HAYES0">
        <nfsexport ref="HAYES nfs exports">
          <nfsclient ref="*"/>
        </nfsexport>
      </clusterfs>
      <fs ref="HAYES1">
        <nfsexport ref="HAYES nfs exports">
          <nfsclient ref="*"/>
        </nfsexport>
      </fs>
      <ip ref="10.15.89.209"/>
    </service>
  </rm>
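
Since the fsid of an export is part of what NFS clients see in their file
handles, it can help to check which fsid each filesystem resource carries.
The sketch below does that with Python's standard XML parser; the embedded
fragment is a trimmed copy of the resources above (only the attributes used
here), and exported_filesystems is an illustrative name, not an rgmanager
tool.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the <resources> block from the config above.
RM_XML = """
<rm>
  <resources>
    <clusterfs device="/dev/HAYES/HAYES0" fsid="8868" fstype="gfs"
               mountpoint="/mnt/hayes0" name="HAYES0"/>
    <fs device="/dev/HAYES/HAYES1" fsid="7777" fstype="ext3"
        mountpoint="/mnt/hayes1" name="HAYES1"/>
  </resources>
</rm>
"""


def exported_filesystems(xml_text):
    """Return (name, fstype, mountpoint, fsid) for each fs/clusterfs resource."""
    root = ET.fromstring(xml_text.strip())
    found = []
    for elem in root.find("resources"):
        if elem.tag in ("fs", "clusterfs"):
            found.append((elem.get("name"), elem.get("fstype"),
                          elem.get("mountpoint"), elem.get("fsid")))
    return found


for entry in exported_filesystems(RM_XML):
    print(entry)
```

The values agree with the "Adding export" log lines in the description
(fsid=8868 for /mnt/hayes0, fsid=7777 for /mnt/hayes1).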


The relocation itself did appear to succeed:
[root@hayes-03 etc]# clustat
Member Status: Quorate

  Member Name                              Status
  ------ ----                              ------
  hayes-01                                 Online, rgmanager
  hayes-02                                 Online, rgmanager
  hayes-03                                 Online, Local, rgmanager

  Service Name         Owner (Last)                   State
  ------- ----         ----- ------                   -----
  nfs1                 hayes-03                       started

Comment 2 Lon Hohberger 2008-02-25 14:38:24 UTC


*** This bug has been marked as a duplicate of 252335 ***