Bug 1384955

Summary: when the nfsserver resource stops rpcbind, non-clustered services that depend on it stop too
Product: Red Hat Enterprise Linux 7
Reporter: Josef Zimek <pzimek>
Component: resource-agents
Assignee: Oyvind Albrigtsen <oalbrigt>
Status: CLOSED ERRATA
QA Contact: cluster-qe <cluster-qe>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 7.4
CC: agk, cluster-maint, fdinitto, jruemker, mnovacek
Target Milestone: rc
Keywords: Patch, Reproducer
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: resource-agents-3.9.5-84.el7
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-08-01 14:55:11 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Josef Zimek 2016-10-14 12:24:14 UTC
Description of problem:

The nfsserver resource agent stops the rpcbind daemon as part of its stop sequence. This also stops other, non-clustered services that use rpcbind (NIS / ypbind). This behaviour is not desired, as it causes downtime.



# systemctl is-enabled ypbind
enabled

One of ypbind's dependencies is rpcbind:

# systemctl list-dependencies ypbind | grep rpcbind
* |-rpcbind.service
*   | |-rpcbind.socket
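
The same relationship can be checked from the other direction (illustrative commands, not part of the original report; output varies per host): listing the reverse dependencies of rpcbind should show ypbind among the units systemd takes down with it, and inspecting ypbind's unit shows which dependency type makes the stop propagate.

# systemctl list-dependencies --reverse rpcbind.service
# systemctl show -p Requires ypbind.service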



The resource agent (ocf::heartbeat:nfsserver) explicitly starts (line 736) and stops (line 900) rpcbind:

# grep -n -E "(start|stop) rpcbind" /usr/lib/ocf/resource.d/heartbeat/nfsserver
736:            nfs_exec start rpcbind
900:            nfs_exec stop rpcbind > /dev/null 2>&1
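
One way an agent could avoid the side effect is sketched below. This is purely an illustration under the assumption that the agent only needs to stop rpcbind if it started it itself; it is not the code from the upstream pull request, and the marker file name is hypothetical.

# Sketch only -- not the actual resource-agents patch.
# Start path: remember whether rpcbind was already active.
if systemctl is-active --quiet rpcbind; then
        touch /var/run/nfsserver.rpcbind-was-active   # hypothetical marker file
else
        nfs_exec start rpcbind
fi

# Stop path: leave rpcbind alone if something else was already using it.
if [ ! -f /var/run/nfsserver.rpcbind-was-active ]; then
        nfs_exec stop rpcbind > /dev/null 2>&1
fi
rm -f /var/run/nfsserver.rpcbind-was-active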



Version-Release number of selected component (if applicable):
resource-agents-3.9.5-54.el7_2.17.x86_64


How reproducible:
always

Steps to Reproduce:
1. pacemaker based cluster with nfsserver resource
2. NIS/ypbind running on cluster node (not configured as cluster service)
3. stop nfsserver resource


Actual results:
NIS gets stopped due to rpcbind stop

Expected results:
Stopping nfsserver resource doesn't affect services outside cluster

Additional info:

Node1:
Clustered NFS service (active node)
Non-clustered NIS service running.

Node2:
Clustered NFS service (passive node)

When the clustered NFS service moves from node1 to node2, the NIS service on node1 gets stopped.
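
To trigger that failover explicitly, the resource group can be moved; the group name hanfs-ap is taken from the verification configuration below, and the target node name is only an example. Before the fix, ypbind on node1 then reports inactive:

[root@host-134 ~]# pcs resource move hanfs-ap host-142
[root@host-134 ~]# systemctl is-active ypbind
inactive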

Comment 2 Oyvind Albrigtsen 2016-11-01 15:09:48 UTC
https://github.com/ClusterLabs/resource-agents/pull/869

Comment 5 michal novacek 2017-06-06 12:20:48 UTC
I have verified that the ypbind service is no longer stopped (as a result of
nfsserver killing rpcbind) when the nfsserver resource is stopped with
resource-agents-3.9.5-104.el7.

---

Common setup:

* set up a local NIS server on one of the nodes using
   this howto: https://access.redhat.com/solutions/7247
* check that ypbind service is running
> $ systemctl is-active ypbind
> active
* configure cluster with active/passive nfs [1], [2]
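
[1] and [2] refer to the pcs-config and pcs-status dumps below. A compressed sketch of how the hanfs-ap group from that configuration could be built, reconstructed from the dump rather than copied from the actual setup (operation timeouts and the two export--mnt-shared-* children are omitted):

# pcs resource create havg ocf:heartbeat:LVM volgrpname=shared exclusive=true --group hanfs-ap
# pcs resource create mnt-shared ocf:heartbeat:Filesystem device=/dev/shared/shared0 directory=/mnt/shared fstype=ext4 --group hanfs-ap
# pcs resource create nfs-daemon ocf:heartbeat:nfsserver nfs_shared_infodir=/mnt/shared/nfs nfs_no_notify=true --group hanfs-ap
# pcs resource create export-root ocf:heartbeat:exportfs clientspec='*' directory=/mnt/shared fsid=354 options=rw --group hanfs-ap
# pcs resource create vip ocf:heartbeat:IPaddr2 ip=10.15.107.148 cidr_netmask=22 --group hanfs-ap
# pcs resource create nfs-notify ocf:heartbeat:nfsnotify source_host=dhcp-107-148.lab.msp.redhat.com --group hanfs-ap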

before the patch (resource-agents-3.9.5-80.el7)
===============================================

[root@host-134 ~]# pcs resource disable nfs-daemon

[root@host-134 ~]# systemctl is-active ypbind
inactive

[root@host-134 ~]# ypcat passwd
No such map passwd.byname. Reason: Can't bind to server which serves this domain


after the patch (resource-agents-3.9.5-104.el7)
===============================================

[root@host-134 ~]# pcs resource disable nfs-daemon

[root@host-134 ~]# systemctl is-active ypbind
active

[root@host-134 ~]# ypcat passwd
test:x:1000:1000::/home/test:/bin/bash
testmonkey:x:1001:1001::/home/testmonkey:/bin/bash


----

> (2) pcs-status
[root@host-134 ~]# pcs status
Cluster name: STSRHTS10447
Stack: corosync
Current DC: host-143 (version 1.1.16-9.el7-94ff4df) - partition with quorum
Last updated: Tue Jun  6 07:10:16 2017
Last change: Tue Jun  6 07:10:05 2017 by hacluster via crmd on host-143

3 nodes configured
17 resources configured

Online: [ host-134 host-142 host-143 ]

Full list of resources:

 fence-host-134 (stonith:fence_xvm):    Started host-142
 fence-host-142 (stonith:fence_xvm):    Started host-143
 fence-host-143 (stonith:fence_xvm):    Started host-134
 Clone Set: dlm-clone [dlm]
     Started: [ host-134 host-142 host-143 ]
 Clone Set: clvmd-clone [clvmd]
     Started: [ host-134 host-142 host-143 ]
 Resource Group: hanfs-ap
     havg       (ocf::heartbeat:LVM):   Started host-134
     mnt-shared (ocf::heartbeat:Filesystem):    Started host-134
     nfs-daemon (ocf::heartbeat:nfsserver):     Started host-134
     export-root        (ocf::heartbeat:exportfs):      Started host-134
     export--mnt-shared-0       (ocf::heartbeat:exportfs):      Started host-134
     export--mnt-shared-1       (ocf::heartbeat:exportfs):      Started host-134
     vip        (ocf::heartbeat:IPaddr2):       Started host-134
     nfs-notify (ocf::heartbeat:nfsnotify):     Started host-134

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

> (1) pcs-config
[root@host-134 ~]# pcs config
Cluster Name: STSRHTS10447
Corosync Nodes:
 host-134 host-142 host-143
Pacemaker Nodes:
 host-134 host-142 host-143

Resources:
 Clone: dlm-clone
  Meta Attrs: interleave=true ordered=true 
  Resource: dlm (class=ocf provider=pacemaker type=controld)
   Operations: monitor interval=30s on-fail=fence (dlm-monitor-interval-30s)
               start interval=0s timeout=90 (dlm-start-interval-0s)
               stop interval=0s timeout=100 (dlm-stop-interval-0s)
 Clone: clvmd-clone
  Meta Attrs: interleave=true ordered=true 
  Resource: clvmd (class=ocf provider=heartbeat type=clvm)
   Attributes: with_cmirrord=1
   Operations: monitor interval=30s on-fail=fence (clvmd-monitor-interval-30s)
               start interval=0s timeout=90 (clvmd-start-interval-0s)
               stop interval=0s timeout=90 (clvmd-stop-interval-0s)
 Group: hanfs-ap
  Resource: havg (class=ocf provider=heartbeat type=LVM)
   Attributes: exclusive=true partial_activation=false volgrpname=shared
   Operations: monitor interval=10 timeout=30 (havg-monitor-interval-10)
               start interval=0s timeout=30 (havg-start-interval-0s)
               stop interval=0s timeout=30 (havg-stop-interval-0s)
  Resource: mnt-shared (class=ocf provider=heartbeat type=Filesystem)
   Attributes: device=/dev/shared/shared0 directory=/mnt/shared fstype=ext4 options=
   Operations: monitor interval=30s (mnt-shared-monitor-interval-30s)
               start interval=0s timeout=60 (mnt-shared-start-interval-0s)
               stop interval=0s timeout=60 (mnt-shared-stop-interval-0s)
  Resource: nfs-daemon (class=ocf provider=heartbeat type=nfsserver)
   Attributes: nfs_no_notify=true nfs_shared_infodir=/mnt/shared/nfs
   Operations: monitor interval=30s (nfs-daemon-monitor-interval-30s)
               start interval=0s timeout=90s (nfs-daemon-start-interval-0s)
               stop interval=0s timeout=20s (nfs-daemon-stop-interval-0s)
  Resource: export-root (class=ocf provider=heartbeat type=exportfs)
   Attributes: clientspec=* directory=/mnt/shared fsid=354 options=rw
   Operations: monitor interval=10 timeout=20 (export-root-monitor-interval-10)
               start interval=0s timeout=40 (export-root-start-interval-0s)
               stop interval=0s timeout=120 (export-root-stop-interval-0s)
  Resource: export--mnt-shared-0 (class=ocf provider=heartbeat type=exportfs)
   Attributes: clientspec=* directory=/mnt/shared/0 fsid=1 options=rw
   Operations: monitor interval=10 timeout=20 (export--mnt-shared-0-monitor-interval-10)
               start interval=0s timeout=40 (export--mnt-shared-0-start-interval-0s)
               stop interval=0s timeout=120 (export--mnt-shared-0-stop-interval-0s)
  Resource: export--mnt-shared-1 (class=ocf provider=heartbeat type=exportfs)
   Attributes: clientspec=* directory=/mnt/shared/1 fsid=2 options=rw
   Operations: monitor interval=10 timeout=20 (export--mnt-shared-1-monitor-interval-10)
               start interval=0s timeout=40 (export--mnt-shared-1-start-interval-0s)
               stop interval=0s timeout=120 (export--mnt-shared-1-stop-interval-0s)
  Resource: vip (class=ocf provider=heartbeat type=IPaddr2)
   Attributes: cidr_netmask=22 ip=10.15.107.148
   Operations: monitor interval=30s (vip-monitor-interval-30s)
               start interval=0s timeout=20s (vip-start-interval-0s)
               stop interval=0s timeout=20s (vip-stop-interval-0s)
  Resource: nfs-notify (class=ocf provider=heartbeat type=nfsnotify)
   Attributes: source_host=dhcp-107-148.lab.msp.redhat.com
   Operations: monitor interval=30 timeout=90 (nfs-notify-monitor-interval-30)
               start interval=0s timeout=90 (nfs-notify-start-interval-0s)
               stop interval=0s timeout=90 (nfs-notify-stop-interval-0s)

Stonith Devices:
 Resource: fence-host-134 (class=stonith type=fence_xvm)
  Attributes: pcmk_host_check=static-list pcmk_host_list=host-134 pcmk_host_map=host-134:host-134.virt.lab.msp.redhat.com
  Operations: monitor interval=60s (fence-host-134-monitor-interval-60s)
 Resource: fence-host-142 (class=stonith type=fence_xvm)
  Attributes: pcmk_host_check=static-list pcmk_host_list=host-142 pcmk_host_map=host-142:host-142.virt.lab.msp.redhat.com
  Operations: monitor interval=60s (fence-host-142-monitor-interval-60s)
 Resource: fence-host-143 (class=stonith type=fence_xvm)
  Attributes: pcmk_host_check=static-list pcmk_host_list=host-143 pcmk_host_map=host-143:host-143.virt.lab.msp.redhat.com
  Operations: monitor interval=60s (fence-host-143-monitor-interval-60s)
Fencing Levels:

Location Constraints:
Ordering Constraints:
  start dlm-clone then start clvmd-clone (kind:Mandatory)
  start clvmd-clone then start hanfs-ap (kind:Mandatory)
Colocation Constraints:
  clvmd-clone with dlm-clone (score:INFINITY)
  hanfs-ap with clvmd-clone (score:INFINITY)
Ticket Constraints:

Alerts:
 No alerts defined

Resources Defaults:
 No defaults set
Operations Defaults:
 No defaults set

Cluster Properties:
 cluster-infrastructure: corosync
 cluster-name: STSRHTS10447
 dc-version: 1.1.16-9.el7-94ff4df
 have-watchdog: false
 last-lrm-refresh: 1496751005
 no-quorum-policy: freeze

Quorum:
  Options:

Comment 6 errata-xmlrpc 2017-08-01 14:55:11 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:1844