Bug 1091102 - The pacemaker nfsserver resource agent's execution of sm-notify fails during startup
Summary: The pacemaker nfsserver resource agent's execution of sm-notify fails during startup
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: resource-agents
Version: 6.6
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: David Vossel
QA Contact: Cluster QE
URL:
Whiteboard:
Duplicates: 1091474
Depends On: 1091101
Blocks:
 
Reported: 2014-04-24 22:04 UTC by David Vossel
Modified: 2015-08-24 06:36 UTC
CC List: 7 users

Fixed In Version: resource-agents-3.9.5-8.el6
Doc Type: Bug Fix
Doc Text:
Previously, Pacemaker's nfsserver resource agent was unable to properly perform NFSv3 network status monitor (NSM) state notifications. As a consequence, NFSv3 clients could not reclaim file locks after server relocation or recovery. This update introduces the nfsnotify resource agent, thanks to which NSM notifications can be sent correctly, thus allowing NFSv3 clients to reclaim file locks.
Clone Of: 1091101
Environment:
Last Closed: 2014-10-14 05:00:55 UTC
Target Upstream Version:
Embargoed:


Attachments: none


Links
System                             ID              Private  Priority  Status        Summary                                         Last Updated
Red Hat Knowledge Base (Solution)  876993          0        None      None          None                                            Never
Red Hat Product Errata             RHBA-2014:1428  0        normal    SHIPPED_LIVE  resource-agents bug fix and enhancement update  2014-10-14 01:06:18 UTC

Description David Vossel 2014-04-24 22:04:45 UTC
+++ This bug was initially created as a clone of Bug #1091101 +++

Description of problem:

The nfsserver resource-agent's call to sm-notify during startup fails because we do not properly maintain file permissions on the statd folder.

If, during a failover, the server needs to notify a client using sm-notify, sm-notify prints the following warning:

Apr 24 17:52:40 rhel7-node1 lrmd[2771]: notice: operation_finished: nfs-daemon_start_0:3848:stderr [ sm-notify: Failed to delete: could not open original file /var/lib/nfs/statd/sm.ha/sm.bak/rhel7-node3: Permission denied ]
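
For anyone hitting this, a quick way to inspect the permissions sm-notify is tripping over might look like the following (a sketch, assuming the stock RHEL layout where the statd data lives under /var/lib/nfs/statd and rpc.statd/sm-notify drop privileges to the unprivileged rpcuser account; the exact path follows the nfs_shared_infodir binding the agent sets up):

# ls -ld /var/lib/nfs/statd /var/lib/nfs/statd/sm.ha /var/lib/nfs/statd/sm.ha/sm.bak
# stat -c '%U:%G %a %n' /var/lib/nfs/statd/sm.ha/sm.bak/*

If the entries under sm.ha/sm.bak are owned by root with modes the unprivileged account cannot read, sm-notify fails to remove them and prints the "Permission denied" warning shown above.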


How reproducible:
100%

Steps to Reproduce:

1. Deploy this pacemaker scenario for setting up an NFS server and file export: https://github.com/davidvossel/phd/blob/master/scenarios/nfs-basic.scenario
2. Mount the export on a node outside of the cluster and grab a file lock on some file within the export.

I did this:

flock /root/nfsshare/clientdatafile -c "sleep 10000"

3. Put whichever node is hosting the NFS server in standby:

pcs cluster standby

4. Watch sm-notify fail during NFS startup on whichever node the NFS server moves to.
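
One way to watch for the failure in step 4 (a sketch, assuming the cluster daemons log through syslog to /var/log/messages, as in the lrmd line quoted in the description):

# tail -f /var/log/messages | grep -i 'sm-notify'

The "Permission denied" stderr from the nfs-daemon start operation should show up on whichever node the resource group lands on.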

Actual results:

sm-notify does not properly delete notify entries.


Expected results:

sm-notify properly deletes notify entries after the notify is complete.

--- Additional comment from David Vossel on 2014-04-24 18:04:12 EDT ---

There is a patch posted for this upstream.

https://github.com/ClusterLabs/resource-agents/pull/414

Comment 2 David Vossel 2014-05-08 15:43:48 UTC
There's an upstream pull request related to this issue.

https://github.com/ClusterLabs/resource-agents/pull/420

Comment 3 David Vossel 2014-05-08 15:44:40 UTC
*** Bug 1091474 has been marked as a duplicate of this bug. ***

Comment 7 michal novacek 2014-07-25 17:26:28 UTC
I have verified (using instructions from comment #6) that sm-notify works
correctly after NFS server failover with the new nfs-notify resource agent
from resource-agents-3.9.5-11.el6.x86_64.

----

nfs-client# mount | grep shared
10.34.70.136:/mnt/shared/1 on /exports/1 type nfs (rw,vers=3,addr=10.34.70.136)

nfs-client# flock /exports/1/urandom -c 'sleep 10000'
...

# tshark -i eth0 -R nlm 
Running as user "root" and group "root". This could be dangerous.
Capturing on eth0
 10.523062 10.34.71.133 -> 10.34.70.136 NLM 330 V4 LOCK Call FH:0x6c895d9c svid:137 pos:0-0
 10.523350 10.34.70.136 -> 10.34.71.133 NLM 106 V4 LOCK Reply (Call In 52)
<failover occurs>
 29.301472 10.34.71.133 -> 10.34.70.136 NLM 330 V4 LOCK Call FH:0x6c895d9c svid:137 pos:0-0
 32.303873 10.34.71.133 -> 10.34.70.136 NLM 330 V4 LOCK Call FH:0x6c895d9c svid:137 pos:0-0
 32.332312 10.34.70.136 -> 10.34.71.133 NLM 106 V4 LOCK Reply (Call In 120)

# tshark -i eth0 -R stat
Running as user "root" and group "root". This could be dangerous.
Capturing on eth0
<failover occurs>
 27.793019 10.34.70.136 -> 10.34.71.133 STAT 142 V1 NOTIFY Call
 27.793204 10.34.71.133 -> 10.34.70.136 STAT 66 V1 NOTIFY Reply (Call In 75)
 27.793440 10.34.70.136 -> 10.34.71.133 STAT 142 V1 NOTIFY Call
 27.793672 10.34.71.133 -> 10.34.70.136 STAT 66 V1 NOTIFY Reply (Call In 77)


Obtaining another lock fails, so the lock is still being held by the original
process:
nfs-client# flock --nonblock /exports/1/urandom -c 'sleep 10'
nfs-client# echo $?
1
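
As an extra cross-check (not part of the original verification, just a sketch): for NFSv3, locks held on behalf of clients are managed by lockd on the server, so the reclaimed lock can also be inspected on the node currently hosting the nfsserver resource via /proc/locks:

virt-136# grep POSIX /proc/locks

A POSIX advisory entry covering the device:inode of the exported file indicates the client's lock survived the failover.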


The cluster configuration is as follows:
virt-136# pcs status 
Cluster name: STSRHTS24129
Last updated: Fri Jul 25 19:03:58 2014
Last change: Fri Jul 25 18:55:14 2014
Stack: cman
Current DC: virt-137 - partition with quorum
Version: 1.1.11-97629de
2 Nodes configured
10 Resources configured


Online: [ virt-136 virt-137 ]

Full list of resources:

 fence-virt-136 (stonith:fence_xvm):    Started virt-136 
 fence-virt-137 (stonith:fence_xvm):    Started virt-137 
 fence-virt-138 (stonith:fence_xvm):    Started virt-136 
 Resource Group: hanfs
     mnt-shared (ocf::heartbeat:Filesystem):    Started virt-136 
     nfs-daemon (ocf::heartbeat:nfsserver):     Started virt-136 
     export-root        (ocf::heartbeat:exportfs):      Started virt-136 
     export0    (ocf::heartbeat:exportfs):      Started virt-136 
     export1    (ocf::heartbeat:exportfs):      Started virt-136 
     vip        (ocf::heartbeat:IPaddr2):       Started virt-136 
     nfs-notify (ocf::heartbeat:nfsnotify):     Started virt-136 

virt-136# pcs resource show hanfs
 Group: hanfs
  Resource: mnt-shared (class=ocf provider=heartbeat type=Filesystem)
   Attributes: device=/dev/shared/shared0 directory=/mnt/shared fstype=ext4 options= force_unmount=safe 
   Operations: start interval=0s timeout=60 (mnt-shared-start-timeout-60)
               stop interval=0s timeout=60 (mnt-shared-stop-timeout-60)
               monitor interval=30s (mnt-shared-monitor-interval-30s)
  Resource: nfs-daemon (class=ocf provider=heartbeat type=nfsserver)
   Attributes: nfs_ip=10.34.70.136 nfs_shared_infodir=/mnt/shared/nfs nfs_no_notify=True 
   Operations: start interval=0s timeout=40 (nfs-daemon-start-timeout-40)
               stop interval=0s timeout=20s (nfs-daemon-stop-timeout-20s)
               monitor interval=30s (nfs-daemon-monitor-interval-30s)
  Resource: export-root (class=ocf provider=heartbeat type=exportfs)
   Attributes: directory=/mnt/shared clientspec=* options=rw,sync fsid=304 
   Operations: start interval=0s timeout=40 (export-root-start-timeout-40)
               stop interval=0s timeout=120 (export-root-stop-timeout-120)
               monitor interval=10 timeout=20 (export-root-monitor-interval-10)
  Resource: export0 (class=ocf provider=heartbeat type=exportfs)
   Attributes: directory=/mnt/shared/0 clientspec=* options=rw,sync fsid=1 
   Operations: start interval=0s timeout=40 (export0-start-timeout-40)
               stop interval=0s timeout=120 (export0-stop-timeout-120)
               monitor interval=10 timeout=20 (export0-monitor-interval-10)
  Resource: export1 (class=ocf provider=heartbeat type=exportfs)
   Attributes: directory=/mnt/shared/1 clientspec=* options=rw,sync fsid=2 
   Operations: start interval=0s timeout=40 (export1-start-timeout-40)
               stop interval=0s timeout=120 (export1-stop-timeout-120)
               monitor interval=10 timeout=20 (export1-monitor-interval-10)
  Resource: vip (class=ocf provider=heartbeat type=IPaddr2)
   Attributes: ip=10.34.70.136 cidr_netmask=23 
   Operations: start interval=0s timeout=20s (vip-start-timeout-20s)
               stop interval=0s timeout=20s (vip-stop-timeout-20s)
               monitor interval=30s (vip-monitor-interval-30s)
  Resource: nfs-notify (class=ocf provider=heartbeat type=nfsnotify)
   Attributes: source_host=pool-10-34-70-136.cluster-qe.lab.eng.brq.redhat.com 
   Operations: start interval=0s timeout=90 (nfs-notify-start-timeout-90)
               stop interval=0s timeout=90 (nfs-notify-stop-timeout-90)
               monitor interval=30 timeout=90 (nfs-notify-monitor-interval-30)
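
For reference, the nfs-notify member of the hanfs group shown above can be created with pcs along these lines (a sketch that simply mirrors the displayed attributes and timeouts; the resource name and source_host value come from this test cluster and are not prescriptive):

virt-136# pcs resource create nfs-notify ocf:heartbeat:nfsnotify \
              source_host=pool-10-34-70-136.cluster-qe.lab.eng.brq.redhat.com \
              op start timeout=90 op stop timeout=90 op monitor interval=30 timeout=90 \
              --group hanfs

Note that the nfsserver resource in the same group runs with nfs_no_notify=True, so client notification after failover is left entirely to the nfsnotify agent.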

Comment 8 errata-xmlrpc 2014-10-14 05:00:55 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-1428.html

