Bug 1346733

Summary: nfsserver: systemd integration does not clean up properly
Product: Red Hat Enterprise Linux 7
Reporter: Daniel Kobras <d.kobras>
Component: resource-agents
Assignee: Oyvind Albrigtsen <oalbrigt>
Status: CLOSED ERRATA
QA Contact: cluster-qe <cluster-qe>
Severity: medium
Priority: unspecified
Version: 7.2
CC: agk, cluster-maint, fdinitto, jruemker, mnovacek
Target Milestone: rc
Target Release: ---
Hardware: All
OS: Linux
Fixed In Version: resource-agents-3.9.5-78.el7
Last Closed: 2016-11-04 00:04:15 UTC
Type: Bug

Attachments:
Prevent stray daemons and busy mount points left over from nfsserver

Description Daniel Kobras 2016-06-15 09:28:38 UTC
Created attachment 1168270 [details]
Prevent stray daemons and busy mount points left over from nfsserver

Description of problem:

resource-agents version 3.9.5-54.6 introduced a patch to integrate the nfsserver agent with systemd. Making use of this feature leads to the following problems:

- nfsserver starts the nfs-server.service systemd unit. nfs-server.service 'Wants' auth-rpcgss-module.service, which in turn 'Wants' rpc-gssd.service. The rpc.gssd daemon is used by the NFS client, not the NFS server, and is not stopped when nfs-server.service is stopped. This leaves behind a stray rpc.gssd daemon that keeps the rpc_pipefs mount point busy. Consequently, if the resource agent's nfs_shared_infodir feature is used, the bind-mounted rpc_pipefs cannot be unmounted cleanly on 'stop'.
- With the systemd integration, the nfsserver agent tries to call its 'start' action from its 'monitor' action. Pacemaker calls the agent's 'monitor' action multiple times and in various states. In particular, 'monitor' is called to confirm that the agent is stopped; in that state, none of the configured constraints are met. If the resource agent's nfs_shared_infodir feature is used, the directory is likely to be created in the wrong filesystem, and rpc_pipefs is therefore bind-mounted to the wrong mount point.

In a typical active/passive setup with an underlying Filesystem resource, both problems combined prevent proper migration of Filesystem and nfsserver resources, and leave both nodes in a FAILED state.
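
For reference, the dependency chain and the leftover state can be inspected with commands along these lines (shown without output; unit names are the stock nfs-utils/systemd ones):

# Why starting nfs-server.service also pulls in rpc-gssd:
systemctl show -p Wants nfs-server.service
systemctl show -p Wants auth-rpcgss-module.service

# After stopping nfs-server.service, look for a stray rpc.gssd and a busy rpc_pipefs mount:
systemctl is-active rpc-gssd.service
pgrep -a rpc.gssd
findmnt -t rpc_pipefs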

Version-Release number of selected component (if applicable): >= 3.9.5-54.6


How reproducible: always


Steps to Reproduce:
1. Create 'Filesystem' resource on shared storage
2. Create nfsserver resource with 'nfs_shared_infodir' pointing to directory within 'Filesystem' as configured above, and 'nfs_init_script' left unset.
3. Group Filesystem and nfsserver resource.
4. Start cluster.
5. Initiate failover.
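
A minimal pcs sketch of such a setup (resource and group names are illustrative; device and paths follow the QA configuration further below):

pcs resource create shared-fs ocf:heartbeat:Filesystem \
    device=/dev/shared/shared0 directory=/mnt/shared fstype=ext4
pcs resource create nfs-daemon ocf:heartbeat:nfsserver \
    nfs_shared_infodir=/mnt/shared/nfs
pcs resource group add nfs-group shared-fs nfs-daemon
# trigger the failover
pcs resource move nfs-group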

Actual results: rpc.gssd left running, busy rpc_pipefs mount on 'Filesystem', no clean failover.


Expected results: Clean failover of resources to other node. No stray services or mount points left behind.


Additional info:

We've fixed the behaviour for us with the following two changes to the nfsserver agent script (see attached patch):

- Unconditionally stop rpc-gssd from nfsserver_stop (ugly!)
- Remove calls to nfsserver_start from nfsserver_monitor.
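
Roughly, the attached patch amounts to changes of the following shape (a sketch only; the attachment is authoritative, and the snippet below is illustrative rather than a copy of the agent code):

# 1) In the agent's stop path, stop rpc-gssd before the rpc_pipefs bind mount
#    under nfs_shared_infodir is unmounted, so nothing keeps it busy:
systemctl stop rpc-gssd.service > /dev/null 2>&1
umount "$rpcpipefs_dir"    # $rpcpipefs_dir: the agent's rpc_pipefs mount point

# 2) In nfsserver_monitor(), the nfsserver_start calls are removed so that
#    'monitor' only reports status and Pacemaker alone decides when to start.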

This seems to work for us, but I'm unsure to what extent it is good enough for everyone. In particular, I fail to understand why the nfsserver_start calls were introduced into nfsserver_monitor in the first place instead of leaving this task to Pacemaker itself. There must have been a good reason, but I'm not seeing it.

Comment 3 Oyvind Albrigtsen 2016-07-13 14:57:23 UTC
https://github.com/ClusterLabs/resource-agents/pull/810

Comment 4 Oyvind Albrigtsen 2016-07-13 14:58:14 UTC
Working with patch from last comment.

Comment 6 Daniel Kobras 2016-07-18 15:50:34 UTC
(In reply to Oyvind Albrigtsen from comment #4)
> Working with patch from last comment.

Sorry, while checking the mentioned version from https://github.com/ClusterLabs/resource-agents/pull/810, I noticed that I had overlooked one prerequisite for triggering the problem: /etc/krb5.keytab needs to exist on the system to make systemd automagically fire up rpc.gssd alongside rpc.nfsd.
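
For reference, a quick way to check and reproduce that trigger on a test node (assuming, as in current nfs-utils, that rpc-gssd.service is gated on the keytab via a ConditionPathExists; verify on the target system):

# Inspect the unit's condition on the keytab:
systemctl cat rpc-gssd.service | grep -i '^Condition'
# An empty keytab is enough to satisfy the condition for testing:
touch /etc/krb5.keytab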

The version in the pull request fixes the calls to nfsserver_start from nfsserver_monitor, but failover is still broken because of a lingering rpc.gssd after nfsserver_stop. Admittedly, though, this might be better fixed in one of the systemd unit files than in the resource agent.
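
If the unit-file route were taken instead, one illustrative (untested) option would be a drop-in that ties rpc-gssd's lifetime to the NFS server, so that stopping nfs-server.service also stops rpc.gssd:

# /etc/systemd/system/rpc-gssd.service.d/stop-with-nfs-server.conf
[Unit]
PartOf=nfs-server.service

# reload systemd to pick up the drop-in; note this would also affect
# NFS-client use of rpc.gssd on the same host:
systemctl daemon-reload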

Comment 7 Oyvind Albrigtsen 2016-07-21 11:27:22 UTC
New build to stop rpc-gssd before unmounting "rpcpipefs_dir".

Comment 8 michal novacek 2016-09-02 09:07:27 UTC
I have verified that clean failover for nfsserver is performed (rpc_pipefs is
unmounted and rpc.gssd is not left behind) with resource-agents-3.9.5-81.el7-x86_64.

----

common setup:

 * setup cluster (1)
 * setup ha nfs group resource (2)
 * touch /etc/krb5.keytab on all nodes
 * check that rpc.gssd is running on the node with the group

before the patch (resource-agents-3.9.5-54.el7-x86_64)
=======================================================
[root@virt-005 ~]# pcs resource show
 Clone Set: dlm-clone [dlm]
     Started: [ virt-004 virt-005 virt-021 virt-022 ]
 Clone Set: clvmd-clone [clvmd]
     Started: [ virt-004 virt-005 virt-021 virt-022 ]
 Resource Group: hanfs-ap
     havg       (ocf::heartbeat:LVM):   Started virt-005
     mnt-shared (ocf::heartbeat:Filesystem):    Started virt-005
     nfs-daemon (ocf::heartbeat:nfsserver):     Started virt-005
     export-root        (ocf::heartbeat:exportfs):      Started virt-005
     export--mnt-shared-0       (ocf::heartbeat:exportfs):      Started virt-005
     export--mnt-shared-1       (ocf::heartbeat:exportfs):      Started virt-005
     vip        (ocf::heartbeat:IPaddr2):       Started virt-005
     nfs-notify (ocf::heartbeat:nfsnotify):     Started virt-005

[root@virt-005 ~]# mount | grep rpc
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw,relatime)
sunrpc on /mnt/shared/nfs/rpc_pipefs type rpc_pipefs (rw,relatime)

[root@virt-005 ~]# ps axf | grep rpc.gssd
 5085 ?        Ss     0:00 /usr/sbin/rpc.gssd

[root@virt-005 ~]# pcs resource move hanfs-ap
Warning: Creating location constraint cli-ban-hanfs-ap-on-virt-005 with a score
of -INFINITY for resource hanfs-ap on node virt-005.  This will prevent
hanfs-ap from running on virt-005 until the constraint is removed. This will be
the case even if virt-005 is the last node in the cluster.

>>> hanfs-ap is running on another node
[root@virt-005 ~]# pcs resource show
 Clone Set: dlm-clone [dlm]
     Started: [ virt-004 virt-005 virt-021 virt-022 ]
 Clone Set: clvmd-clone [clvmd]
     Started: [ virt-004 virt-005 virt-021 virt-022 ]
 Resource Group: hanfs-ap
     havg       (ocf::heartbeat:LVM):   Started virt-004
     mnt-shared (ocf::heartbeat:Filesystem):    Started virt-004
     nfs-daemon (ocf::heartbeat:nfsserver):     Started virt-004
     export-root        (ocf::heartbeat:exportfs):      Started virt-004
     export--mnt-shared-0       (ocf::heartbeat:exportfs):      Started virt-004
     export--mnt-shared-1       (ocf::heartbeat:exportfs):      Started virt-004
     vip        (ocf::heartbeat:IPaddr2):       Started virt-004
     nfs-notify (ocf::heartbeat:nfsnotify):     Started virt-004

>>> but rpc_pipefs is not unmounted and rpc.gssd is left behind
[root@virt-005 ~]# mount | grep rpc
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw,relatime)
sunrpc on /mnt/shared/nfs/rpc_pipefs type rpc_pipefs (rw,relatime)

[root@virt-005 ~]# ps axf | grep rpc.gssd
 5085 ?        Ss     0:00 /usr/sbin/rpc.gssd


corrected version (resource-agents-3.9.5-81.el7-x86_64)
=======================================================
[root@virt-005 ~]# pcs resource show
 Clone Set: dlm-clone [dlm]
     Started: [ virt-004 virt-005 virt-021 virt-022 ]
 Clone Set: clvmd-clone [clvmd]
     Started: [ virt-004 virt-005 virt-021 virt-022 ]
 Resource Group: hanfs-ap
     havg       (ocf::heartbeat:LVM):   Started virt-005
     mnt-shared (ocf::heartbeat:Filesystem):    Started virt-005
     nfs-daemon (ocf::heartbeat:nfsserver):     Started virt-005
     export-root        (ocf::heartbeat:exportfs):      Started virt-005
     export--mnt-shared-0       (ocf::heartbeat:exportfs):      Started virt-005
     export--mnt-shared-1       (ocf::heartbeat:exportfs):      Started virt-005
     vip        (ocf::heartbeat:IPaddr2):       Started virt-005
     nfs-notify (ocf::heartbeat:nfsnotify):     Started virt-005

[root@virt-005 ~]# mount | grep rpc
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw,relatime)
sunrpc on /mnt/shared/nfs/rpc_pipefs type rpc_pipefs (rw,relatime)

[root@virt-005 ~]# ps axf | grep rpc.gssd
 5085 ?        Ss     0:00 /usr/sbin/rpc.gssd

[root@virt-005 ~]# pcs resource move hanfs-ap
Warning: Creating location constraint cli-ban-hanfs-ap-on-virt-005 with a score
of -INFINITY for resource hanfs-ap on node virt-005.  This will prevent
hanfs-ap from running on virt-005 until the constraint is removed. This will be
the case even if virt-005 is the last node in the cluster.

>>> hanfs-ap is running on another node
[root@virt-005 ~]# pcs resource show
 Clone Set: dlm-clone [dlm]
     Started: [ virt-004 virt-005 virt-021 virt-022 ]
 Clone Set: clvmd-clone [clvmd]
     Started: [ virt-004 virt-005 virt-021 virt-022 ]
 Resource Group: hanfs-ap
     havg       (ocf::heartbeat:LVM):   Started virt-004
     mnt-shared (ocf::heartbeat:Filesystem):    Started virt-004
     nfs-daemon (ocf::heartbeat:nfsserver):     Started virt-004
     export-root        (ocf::heartbeat:exportfs):      Started virt-004
     export--mnt-shared-0       (ocf::heartbeat:exportfs):      Started virt-004
     export--mnt-shared-1       (ocf::heartbeat:exportfs):      Started virt-004
     vip        (ocf::heartbeat:IPaddr2):       Started virt-004
     nfs-notify (ocf::heartbeat:nfsnotify):     Started virt-004

>>> rpc.gssd is cleaned up and rpc_pipefs is no longer mounted
[root@virt-005 ~]# ps axf | grep rpc.gssd
18099 pts/0    S+     0:00          \_ grep --color=auto rpc.gssd
[root@virt-005 ~]# mount | grep rpc
[root@virt-005 ~]#

----
>>(1)
Cluster name: STSRHTS2803
Stack: corosync
Current DC: virt-022 (version 1.1.15-10.el7-e174ec8) - partition with quorum
Last updated: Fri Sep  2 10:09:17 2016          Last change: Fri Sep  2 09:44:59 2016 by root via crm_resource on virt-004

4 nodes and 20 resources configured

Online: [ virt-004 virt-005 virt-021 virt-022 ]

Full list of resources:

 fence-virt-004 (stonith:fence_xvm):    Started virt-004
 fence-virt-005 (stonith:fence_xvm):    Started virt-005
 fence-virt-021 (stonith:fence_xvm):    Started virt-021
 fence-virt-022 (stonith:fence_xvm):    Started virt-022
 Clone Set: dlm-clone [dlm]
     Started: [ virt-004 virt-005 virt-021 virt-022 ]
 Clone Set: clvmd-clone [clvmd]
     Started: [ virt-004 virt-005 virt-021 virt-022 ]
 Resource Group: hanfs-ap
     havg       (ocf::heartbeat:LVM):   Started virt-005
     mnt-shared (ocf::heartbeat:Filesystem):    Started virt-005
     nfs-daemon (ocf::heartbeat:nfsserver):     Started virt-005
     export-root        (ocf::heartbeat:exportfs):      Started virt-005
     export--mnt-shared-0       (ocf::heartbeat:exportfs):      Started virt-005
     export--mnt-shared-1       (ocf::heartbeat:exportfs):      Started virt-005
     vip        (ocf::heartbeat:IPaddr2):       Started virt-005
     nfs-notify (ocf::heartbeat:nfsnotify):     Started virt-005

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled


>>(2)
[root@virt-004 ~]# pcs resource show --full
 Clone: dlm-clone
  Meta Attrs: interleave=true ordered=true 
  Resource: dlm (class=ocf provider=pacemaker type=controld)
   Operations: start interval=0s timeout=90 (dlm-start-interval-0s)
               stop interval=0s timeout=100 (dlm-stop-interval-0s)
               monitor interval=30s on-fail=fence (dlm-monitor-interval-30s)
 Clone: clvmd-clone
  Meta Attrs: interleave=true ordered=true 
  Resource: clvmd (class=ocf provider=heartbeat type=clvm)
   Attributes: with_cmirrord=1
   Operations: start interval=0s timeout=90 (clvmd-start-interval-0s)
               stop interval=0s timeout=90 (clvmd-stop-interval-0s)
               monitor interval=30s on-fail=fence (clvmd-monitor-interval-30s)
 Group: hanfs-ap
  Resource: havg (class=ocf provider=heartbeat type=LVM)
   Attributes: exclusive=true partial_activation=false volgrpname=shared
   Operations: start interval=0s timeout=30 (havg-start-interval-0s)
               stop interval=0s timeout=30 (havg-stop-interval-0s)
               monitor interval=10 timeout=30 (havg-monitor-interval-10)
  Resource: mnt-shared (class=ocf provider=heartbeat type=Filesystem)
   Attributes: device=/dev/shared/shared0 directory=/mnt/shared fstype=ext4 options=
   Operations: start interval=0s timeout=60 (mnt-shared-start-interval-0s)
               stop interval=0s timeout=60 (mnt-shared-stop-interval-0s)
               monitor interval=30s (mnt-shared-monitor-interval-30s)
  Resource: nfs-daemon (class=ocf provider=heartbeat type=nfsserver)
   Attributes: nfs_shared_infodir=/mnt/shared/nfs nfs_no_notify=true
   Operations: stop interval=0s timeout=20s (nfs-daemon-stop-interval-0s)
               monitor interval=30s (nfs-daemon-monitor-interval-30s)
               start interval=0s timeout=90s (nfs-daemon-start-interval-0s)
  Resource: export-root (class=ocf provider=heartbeat type=exportfs)
   Attributes: directory=/mnt/shared clientspec=* options=rw fsid=170
   Operations: start interval=0s timeout=40 (export-root-start-interval-0s)
               stop interval=0s timeout=120 (export-root-stop-interval-0s)
               monitor interval=10 timeout=20 (export-root-monitor-interval-10)
  Resource: export--mnt-shared-0 (class=ocf provider=heartbeat type=exportfs)
   Attributes: directory=/mnt/shared/0 clientspec=* options=rw fsid=1
   Operations: start interval=0s timeout=40 (export--mnt-shared-0-start-interval-0s)
               stop interval=0s timeout=120 (export--mnt-shared-0-stop-interval-0s)
               monitor interval=10 timeout=20 (export--mnt-shared-0-monitor-interval-10)
  Resource: export--mnt-shared-1 (class=ocf provider=heartbeat type=exportfs)
   Attributes: directory=/mnt/shared/1 clientspec=* options=rw fsid=2
   Operations: start interval=0s timeout=40 (export--mnt-shared-1-start-interval-0s)
               stop interval=0s timeout=120 (export--mnt-shared-1-stop-interval-0s)
               monitor interval=10 timeout=20 (export--mnt-shared-1-monitor-interval-10)
  Resource: vip (class=ocf provider=heartbeat type=IPaddr2)
   Attributes: ip=10.34.70.86 cidr_netmask=23
   Operations: start interval=0s timeout=20s (vip-start-interval-0s)
               stop interval=0s timeout=20s (vip-stop-interval-0s)
               monitor interval=30s (vip-monitor-interval-30s)
  Resource: nfs-notify (class=ocf provider=heartbeat type=nfsnotify)
   Attributes: source_host=pool-10-34-70-86.cluster-qe.lab.eng.brq.redhat.com
   Operations: start interval=0s timeout=90 (nfs-notify-start-interval-0s)
               stop interval=0s timeout=90 (nfs-notify-stop-interval-0s)
               monitor interval=30 timeout=90 (nfs-notify-monitor-interval-30)

Comment 9 John Ruemker 2016-09-07 23:31:55 UTC
*** Bug 1343737 has been marked as a duplicate of this bug. ***

Comment 11 errata-xmlrpc 2016-11-04 00:04:15 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-2174.html