Bug 784933

Summary: exportfs agent doubles rmtab on each relocation
Product: Red Hat Enterprise Linux 6 Reporter: Jaroslav Kortus <jkortus>
Component: resource-agents Assignee: David Vossel <dvossel>
Status: CLOSED ERRATA QA Contact: Cluster QE <mspqa-list>
Severity: high Docs Contact:
Priority: high    
Version: 6.2 CC: agk, cluster-maint, ddumas, dvossel, fdinitto, gh05t.7id37, lhh, mnovacek
Target Milestone: rc Keywords: TechPreview
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: resource-agents-3.9.2-29.el6 Doc Type: Technology Preview
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2013-11-21 05:17:04 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Jaroslav Kortus 2012-01-26 17:48:49 UTC
Description of problem:
When exportfs relocates the exported share, the size of rmtab doubles.
This is probably due to the restore_rmtab call in the agent, which runs
"cat  ${rmtab_backup} >> /var/lib/nfs/rmtab".

During shutdown this file is grepped and the result is written to the rmtab backup. This way the file doubles in size on each graceful relocation, eventually making the service unavailable once the grow and copy operations become too slow to complete in time.

Probably some sort of sort | uniq would help here, as the file was full of duplicated entries.
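
To illustrate the growth described above, here is a minimal sketch of the stop/start cycle run against scratch files rather than the real /var/lib/nfs/rmtab. It is a simplified model, not the agent's code: the client address, the scratch paths and the relocate_append helper are made up for illustration, and it assumes the same rmtab file is seen before and after each relocation.

```shell
#!/bin/sh
# Simplified model of the unpatched stop/start cycle on scratch files.
# All paths, the client address and the function name are illustrative.
workdir=$(mktemp -d)
rmtab="$workdir/rmtab"
backup="$workdir/rmtab.backup"
export_dir="/mnt/vedder0"

# One client entry, in the host:directory:count layout used by rmtab.
printf '%s\n' "192.168.100.50:${export_dir}:0x00000001" > "$rmtab"

relocate_append() {
    # stop: save the entries that belong to this export
    grep ":${export_dir}:" "$rmtab" > "$backup"
    # start: append them back, which is where the duplicates come from
    cat "$backup" >> "$rmtab"
}

for i in 1 2 3; do
    relocate_append
done
append_lines=$(wc -l < "$rmtab" | tr -d ' ')

# Collapsing duplicates with sort | uniq brings the file back to one entry.
sort "$rmtab" | uniq > "$rmtab.dedup"
dedup_lines=$(wc -l < "$rmtab.dedup" | tr -d ' ')

echo "after 3 append cycles: $append_lines lines; deduplicated: $dedup_lines"
rm -rf "$workdir"
```

In this model the line count goes 1, 2, 4, 8 over three cycles, while the deduplicated copy stays at a single entry.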


Version-Release number of selected component (if applicable):
pacemaker-1.1.6-3.el6.x86_64


How reproducible:
100%

Steps to Reproduce:
1. setup nfs server + nfs export of gfs2 filesystem (see below)
2. mount the share from the client (1 entry now in /var/lib/nfs/rmtab)
3. relocate the service (crm resource move nfsgroup)
4. see the entry doubled in /var/lib/nfs/rmtab
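
For step 4, a quick way to confirm the duplication is to compare the total and unique entry counts. The sketch below runs on a scratch file with made-up sample entries; on a real server the rmtab variable would point at /var/lib/nfs/rmtab instead.

```shell
#!/bin/sh
# Count total versus unique rmtab entries and list the repeated ones.
# The scratch file and the sample entries are illustrative only.
rmtab=$(mktemp)
printf '%s\n' \
    '192.168.100.50:/mnt/vedder0:0x00000001' \
    '192.168.100.50:/mnt/vedder0:0x00000001' \
    '192.168.100.51:/mnt/vedder0:0x00000001' > "$rmtab"

total=$(wc -l < "$rmtab" | tr -d ' ')
unique=$(sort -u "$rmtab" | wc -l | tr -d ' ')
echo "total entries: $total, unique entries: $unique"

# uniq -c prefixes each distinct line with its count; keep counts above 1.
sort "$rmtab" | uniq -c | awk '$1 > 1'
rm -f "$rmtab"
```

If the two counts differ, the file contains duplicated entries.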
  
Actual results:
rmtab growing on each relocation until the service cannot be relocated any more.

Expected results:
file not growing and not containing duplicate entries

Additional info:
crm configure show
node node01
node node02
node node03
primitive ClusterIP ocf:heartbeat:IPaddr2 \
        params ip="192.168.100.11" cidr_netmask="32" \
        op monitor interval="30s"
primitive datadir ocf:heartbeat:exportfs \
        params clientspec="*" directory="/mnt/vedder0" fsid="4" options="all_squash,rw"
primitive gfs2 ocf:heartbeat:Filesystem \
        params device="/dev/rhts_cluster/vedder0" directory="/mnt/vedder0" fstype="gfs2" options="noatime"
primitive nfsserver ocf:heartbeat:nfsserver \
        params nfs_init_script="/etc/init.d/nfs" nfs_shared_infodir="/mnt/vedder0/nfs" nfs_ip="192.168.100.11" nfs_notify_cmd="/usr/sbin/sm-notify"
group nfsgroup nfsserver datadir ClusterIP \
        meta target-role="Started"
clone gfs2clone gfs2 \
        meta target-role="Started"

Comment 3 Andrew Beekhof 2012-01-29 10:00:31 UTC
Looks like an issue with the agent.  Re-assigning.

Comment 4 Chris Feist 2012-02-25 00:15:57 UTC
Jaroslav,

Do you know what version of resource-agents you had installed?  Was it just what was included in 6.2?  I'm having trouble getting entries into /var/lib/nfs/rmtab; can you also let me know what kernel you're using?

Thanks!
Chris

Comment 5 Jaroslav Kortus 2012-02-27 10:54:35 UTC
If I remember correctly, it was 6.2 GA; the same applies to the kernel.

Comment 10 gh05t.7id37 2012-08-24 21:19:33 UTC
I was facing the same problem in a failover cluster implementation based on RHEL 6.2 and using the following packages:

pacemaker-libs-1.1.6-3.el6.x86_64
pacemaker-cluster-libs-1.1.6-3.el6.x86_64
pacemaker-1.1.6-3.el6.x86_64
pacemaker-cli-1.1.6-3.el6.x86_64

corosync-1.4.1-4.el6_2.2.x86_64
corosynclib-1.4.1-4.el6_2.2.x86_64

heartbeat-libs-3.0.4-1.el6.x86_64
heartbeat-3.0.4-1.el6.x86_64

I fixed the problem by changing the script /usr/lib/ocf/resource.d/heartbeat/exportfs and inserting the following two lines of code:

grep -v ":${OCF_RESKEY_directory}:" /var/lib/nfs/rmtab > /var/lib/nfs/rmtab.tmp
mv   -f /var/lib/nfs/rmtab.tmp /var/lib/nfs/rmtab

The two lines above are inserted before the following line:

cat  ${rmtab_backup} >> /var/lib/nfs/rmtab
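
The workaround above can be tried in isolation with the sketch below, which wraps the same three commands in a function that takes its paths as arguments. The function name, the scratch files and the sample entries are hypothetical; in the agent the paths are /var/lib/nfs/rmtab and ${rmtab_backup}.

```shell
#!/bin/sh
# Sketch of the workaround on scratch files: drop this export's entries
# from rmtab first, then append the backup, so repeated restores do not
# accumulate duplicates. Names and sample data are illustrative.
restore_rmtab_workaround() {
    # $1 = rmtab file, $2 = backup file, $3 = exported directory
    grep -v ":$3:" "$1" > "$1.tmp"
    mv -f "$1.tmp" "$1"
    cat "$2" >> "$1"
}

workdir=$(mktemp -d)
rmtab="$workdir/rmtab"
backup="$workdir/rmtab.backup"

# rmtab holds one entry for the relocated export and one for another export.
printf '%s\n' \
    '192.168.100.50:/mnt/a:0x00000001' \
    '192.168.100.50:/mnt/b:0x00000001' > "$rmtab"
printf '%s\n' '192.168.100.50:/mnt/a:0x00000001' > "$backup"

for i in 1 2 3; do
    restore_rmtab_workaround "$rmtab" "$backup" /mnt/a
done
lines_after=$(wc -l < "$rmtab" | tr -d ' ')
echo "entries after 3 restores: $lines_after"
rm -rf "$workdir"
```

Note that this approach replaces the export's entries with the contents of the backup rather than merging the two, so entries for other exports are left untouched.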

Comment 13 David Vossel 2013-08-06 18:27:04 UTC
This issue has been resolved in the latest resource-agents build.  The upstream patch related to this issue can be found here.

https://github.com/ClusterLabs/resource-agents/commit/bbc90e9de8636609842fb01219e8d9c789d8a623

Comment 14 David Vossel 2013-08-06 19:25:13 UTC
This has been fixed as a result of the heartbeat agent refresh.

Comment 18 michal novacek 2013-10-15 11:03:44 UTC
I have verified that the /var/lib/nfs/rmtab file size does not grow
exponentially with the patched version of resource-agents-3.9.2-40.el6.x86_64
after moving the nfs server 10 times.


setup of the cluster and resources is as follows:

---------
virt-021# pcs status
Cluster name: STSRHTS11429
Last updated: Tue Oct 15 11:42:28 2013
Last change: Tue Oct 15 11:36:21 2013 via cibadmin on virt-022
Stack: cman
Current DC: virt-022 - partition with quorum
Version: 1.1.10-14.el6-368c726
3 Nodes configured
7 Resources configured

Online: [ virt-020 virt-021 virt-022 ]

Full list of resources:
 virt-fencing   (stonith:fence_xvm):    Started virt-020 
 Resource Group: ha-nfsserver
     vip        (ocf::heartbeat:IPaddr2):       Started virt-021 
     nfs-server (ocf::heartbeat:nfsserver):     Started virt-021 
     nfs-export (ocf::heartbeat:exportfs):      Started virt-021 
 Clone Set: nfs-shared-fs-clone [nfs-shared-fs]
     Started: [ virt-020 virt-021 virt-022 ]
---------
virt-021# pcs resource show vip nfs-server nfs-export nfs-shared-fs-clone
 Resource: vip (class=ocf provider=heartbeat type=IPaddr2)
  Attributes: ip=10.34.70.217 cidr_netmask=23 
  Operations: monitor interval=30s (vip-monitor-interval-30s)

 Resource: nfs-server (class=ocf provider=heartbeat type=nfsserver)
  Attributes: nfs_ip=10.34.70.217 nfs_init_script=/etc/init.d/nfs \
nfs_shared_infodir=/mnt/nfs nfs_notify_cmd=/usr/sbin/sm-notify 
  Operations: monitor interval=30s (nfs-server-monitor-interval-30s)

 Resource: nfs-export (class=ocf provider=heartbeat type=exportfs)
  Attributes: directory=/mnt clientspec=* options=rw,async,no_all_squash fsid=238 
  Operations: monitor interval=60s (nfs-export-monitor-interval-60s)
 Clone: nfs-shared-fs-clone

 Resource: nfs-shared-fs (class=ocf provider=heartbeat type=Filesystem)
   Attributes: device=/dev/sda directory=/mnt fstype=gfs2 options= 
   Operations: monitor interval=30s (nfs-shared-fs-monitor-interval-30s)
---------
virt-021# ls -l /var/lib/nfs/rmtab
-rw-r-----. 1 root root 0 Oct 15 11:42 /var/lib/nfs/rmtab

mounted nfs share from outside of the cluster with this command:
# mount 10.34.70.217:/mnt /exports -o nfsvers=3

---------
virt-021# ls -l /var/lib/nfs/rmtab
-rw-r-----. 1 root root 27 Oct 15 11:45 /var/lib/nfs/rmtab


WITHOUT A PATCH (resource-agents-3.9.2-22.el6.x86_64):
======================================================
virt-021# grep -A 12 'restore_rmtab()' \
    /usr/lib/ocf/resource.d/heartbeat/exportfs 
restore_rmtab() {
    local rmtab_backup
    if [ ${OCF_RESKEY_rmtab_backup} != "none" ]; then
        rmtab_backup="${OCF_RESKEY_directory}/${OCF_RESKEY_rmtab_backup}"
        if [ -r ${rmtab_backup} ]; then
            cat  ${rmtab_backup} >> /var/lib/nfs/rmtab
            ocf_log debug "Restored `wc -l ${rmtab_backup}` rmtab entries from ${rmtab_backup}."
        else
            ocf_log warn "rmtab backup ${rmtab_backup} not found or not readable."
        fi
    fi
}

virt-021# for a in $(seq 1 5); do \
pcs resource move ha-nfsserver; sleep 5; \
pcs resource move ha-nfsserver; sleep 5; \
pcs constraint remove $(pcs constraint ref ha-nfsserver | grep cli); echo $a;\
done

virt-021# ls -l /var/lib/nfs/rmtab 
-rw-r-----. 1 root root 182655 Oct 15 12:42 /var/lib/nfs/rmtab


PATCHED VERSION (resource-agents-3.9.2-40.el6.x86_64)
=====================================================
virt-021# grep -A 12 'restore_rmtab()' \
    /usr/lib/ocf/resource.d/heartbeat/exportfs 
restore_rmtab() {
    local rmtab_backup
    if [ ${OCF_RESKEY_rmtab_backup} != "none" ]; then
        rmtab_backup="${OCF_RESKEY_directory}/${OCF_RESKEY_rmtab_backup}"
        if [ -r ${rmtab_backup} ]; then
            local tmpf=`mktemp`
            sort -u ${rmtab_backup} /var/lib/nfs/rmtab > $tmpf &&
                install -o root -m 644 $tmpf /var/lib/nfs/rmtab
            rm -f $tmpf
            ocf_log debug "Restored `wc -l ${rmtab_backup}` rmtab entries from ${rmtab_backup}."
        else
            ocf_log warn "rmtab backup ${rmtab_backup} not found or not readable."
        fi

virt-021# for a in $(seq 1 5); do \
pcs resource move ha-nfsserver; sleep 5; \
pcs resource move ha-nfsserver; sleep 5; \
pcs constraint remove $(pcs constraint ref ha-nfsserver | grep cli); echo $a;\
done

virt-021# ls -l /var/lib/nfs/rmtab
-rw-r--r--. 1 root root 27 Oct 15 12:58 /var/lib/nfs/rmtab
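
The behaviour of the patched restore step can also be checked without a cluster. The following is a sketch on scratch files, assuming a simplified stop step that greps the export's entries into a backup. The restore_rmtab_merge name, the paths and the client address are made up, and "-o root" is dropped from the install call so that the sketch runs unprivileged.

```shell
#!/bin/sh
# Sketch of the patched merge (sort -u into a temp file, then install)
# run for 10 simulated relocations on scratch files.
restore_rmtab_merge() {
    # $1 = rmtab file, $2 = backup file
    tmpf=$(mktemp)
    sort -u "$2" "$1" > "$tmpf" &&
        install -m 644 "$tmpf" "$1"
    rm -f "$tmpf"
}

workdir=$(mktemp -d)
rmtab="$workdir/rmtab"
backup="$workdir/rmtab.backup"
printf '%s\n' '10.0.0.5:/mnt:0x00000001' > "$rmtab"
size_before=$(wc -c < "$rmtab" | tr -d ' ')

i=0
while [ "$i" -lt 10 ]; do
    grep ':/mnt:' "$rmtab" > "$backup"      # simplified stop step
    restore_rmtab_merge "$rmtab" "$backup"  # patched start step
    i=$((i + 1))
done
size_after=$(wc -c < "$rmtab" | tr -d ' ')
lines_after=$(wc -l < "$rmtab" | tr -d ' ')

echo "size before: $size_before bytes, after 10 cycles: $size_after bytes"
rm -rf "$workdir"
```

Because sort -u emits each distinct line once, the merged file keeps the same size no matter how many times the backup is restored.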

Comment 20 errata-xmlrpc 2013-11-21 05:17:04 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1541.html