Bug 1356866
| Field | Value |
|---|---|
| Summary | NFS services are started up in the monitor action of the ocf:heartbeat:nfsserver resource agent |
| Product | Red Hat Enterprise Linux 7 |
| Component | resource-agents |
| Version | 7.2 |
| Hardware | x86_64 |
| OS | Linux |
| Status | CLOSED ERRATA |
| Severity | high |
| Priority | high |
| Reporter | samirjafferali |
| Assignee | Oyvind Albrigtsen <oalbrigt> |
| QA Contact | cluster-qe <cluster-qe> |
| CC | agk, cfeist, cluster-maint, djansa, fdinitto, gjose, mnovacek, pbokoc, pzimek, samirjafferali, sbradley |
| Target Milestone | rc |
| Keywords | ZStream |
| Fixed In Version | resource-agents-3.9.5-78.el7 |
| Doc Type | Bug Fix |
| Doc Text | A previous update caused Pacemaker to start NFS services during monitor actions such as "pcs resource debug-monitor". This update fixes the nfsserver resource agent, and monitor actions in Pacemaker no longer automatically start these services. |
| Environment | CentOS Linux release 7.2.1511 (Core) |
| Clones | 1370385 (view as bug list) |
| Bug Blocks | 1370385 |
| Last Closed | 2016-11-04 00:04:21 UTC |
| Type | Bug |
Tested and working patch: https://github.com/ClusterLabs/resource-agents/pull/836
I have verified that the monitor action does not try to start the NFS server with
resource-agents-3.9.5-81.el7.x86_64.
-----
Common setup:
* set up a cluster (1)
* set up an NFS resource group and disable the nfsserver resource in that group (2); a hedged reconstruction of the pcs commands follows this list
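For reference, a sketch of commands that would produce the disabled nfsserver resource in (2); the attribute values come from the "pcs resource show --full" output below, but the exact commands QA used are an assumption:

# hedged reconstruction, not the exact QA commands
pcs resource create nfs-daemon ocf:heartbeat:nfsserver \
    nfs_shared_infodir=/mnt/shared/nfs nfs_no_notify=true \
    rpcpipefs_dir=/var/lib/nfs2 --group hanfs-ap
# disable only the nfsserver member so debug-monitor runs against a stopped resource
pcs resource disable nfs-daemon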
before the patch resource-agents-3.9.5-54.el7.x86_64
====================================================
[root@virt-021 ~]# pcs resource debug-monitor nfs-daemon
Error performing operation: Argument list too long
Operation monitor for nfs-daemon (ocf:heartbeat:nfsserver) returned 7
> stderr: DEBUG: * nfs-server.service - NFS server and services
> stderr: Loaded: loaded (/usr/lib/systemd/system/nfs-server.service; disabled; vendor preset: disabled)
> stderr: Active: inactive (dead) since Fri 2016-09-02 11:39:04 CEST; 24min ago
> stderr: Process: 29295 ExecStopPost=/usr/sbin/exportfs -f (code=exited, status=0/SUCCESS)
> stderr: Process: 29292 ExecStopPost=/usr/sbin/exportfs -au (code=exited, status=0/SUCCESS)
> stderr: Process: 29291 ExecStop=/usr/sbin/rpc.nfsd 0 (code=exited, status=0/SUCCESS)
> stderr: Process: 24993 ExecStart=/usr/sbin/rpc.nfsd $RPCNFSDARGS (code=exited, status=0/SUCCESS)
> stderr: Process: 24990 ExecStartPre=/usr/sbin/exportfs -r (code=exited, status=0/SUCCESS)
> stderr: Main PID: 24993 (code=exited, status=0/SUCCESS)
> stderr:
> stderr: Sep 02 11:35:42 virt-021.cluster-qe.lab.eng.brq.redhat.com systemd[1]: Starting NFS server and services...
> stderr: Sep 02 11:35:42 virt-021.cluster-qe.lab.eng.brq.redhat.com systemd[1]: Started NFS server and services.
> stderr: Sep 02 11:39:04 virt-021.cluster-qe.lab.eng.brq.redhat.com systemd[1]: Stopping NFS server and services...
> stderr: Sep 02 11:39:04 virt-021.cluster-qe.lab.eng.brq.redhat.com systemd[1]: Stopped NFS server and services.
[root@virt-021 ~]# echo $?
7
after the patch resource-agents-3.9.5-81.el7.x86_64
===================================================
[root@virt-021 ~]# pcs resource show
Clone Set: dlm-clone [dlm]
Started: [ virt-004 virt-005 virt-021 virt-022 ]
Clone Set: clvmd-clone [clvmd]
Started: [ virt-004 virt-005 virt-021 virt-022 ]
Resource Group: hanfs-ap
havg (ocf::heartbeat:LVM): Started virt-004
mnt-shared (ocf::heartbeat:Filesystem): Started virt-004
nfs-daemon (ocf::heartbeat:nfsserver): Stopped (disabled)
export-root (ocf::heartbeat:exportfs): Stopped
export--mnt-shared-0 (ocf::heartbeat:exportfs): Stopped
export--mnt-shared-1 (ocf::heartbeat:exportfs): Stopped
vip (ocf::heartbeat:IPaddr2): Stopped
nfs-notify (ocf::heartbeat:nfsnotify): Stopped
[root@virt-021 ~]# pcs resource debug-monitor nfs-daemon
Error performing operation: Argument list too long
Operation monitor for nfs-daemon (ocf:heartbeat:nfsserver) returned 7
> stderr: INFO: Status: rpcbind
> stderr: INFO: Status: nfs-mountd
> stderr: ocf-exit-reason:nfs-mountd is not running
[root@virt-021 ~]# echo $?
7
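For context, exit code 7 here is OCF_NOT_RUNNING, the correct monitor result for a cleanly stopped resource, while 0 is OCF_SUCCESS (resource running). A minimal check along these lines (assuming the disabled nfs-daemon resource from the setup above):

pcs resource debug-monitor nfs-daemon >/dev/null 2>&1; rc=$?
# 7 = OCF_NOT_RUNNING: stopped, as expected for a disabled resource
# 0 = OCF_SUCCESS: running (on a disabled resource this would indicate the old, broken behavior)
[ "$rc" -eq 7 ] && echo "monitor correctly reports stopped" || echo "unexpected rc=$rc"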
----
>> (1)
[root@virt-021 ~]# pcs status
Cluster name: STSRHTS2803
Stack: corosync
Current DC: virt-022 (version 1.1.15-10.el7-e174ec8) - partition with quorum
Last updated: Fri Sep 2 11:58:44 2016 Last change: Fri Sep 2 11:55:02 2016 by root via crm_resource on virt-021
4 nodes and 20 resources configured: 2 resources DISABLED and 0 BLOCKED from being started due to failures
Online: [ virt-004 virt-005 virt-021 virt-022 ]
Full list of resources:
fence-virt-004 (stonith:fence_xvm): Started virt-004
fence-virt-005 (stonith:fence_xvm): Started virt-021
fence-virt-021 (stonith:fence_xvm): Started virt-022
fence-virt-022 (stonith:fence_xvm): Started virt-005
Clone Set: dlm-clone [dlm]
Started: [ virt-004 virt-005 virt-021 virt-022 ]
Clone Set: clvmd-clone [clvmd]
Started: [ virt-004 virt-005 virt-021 virt-022 ]
Resource Group: hanfs-ap
havg (ocf::heartbeat:LVM): Started virt-004
mnt-shared (ocf::heartbeat:Filesystem): Started virt-004
> nfs-daemon (ocf::heartbeat:nfsserver): Stopped (disabled)
export-root (ocf::heartbeat:exportfs): Stopped
export--mnt-shared-0 (ocf::heartbeat:exportfs): Stopped
export--mnt-shared-1 (ocf::heartbeat:exportfs): Stopped
vip (ocf::heartbeat:IPaddr2): Stopped
nfs-notify (ocf::heartbeat:nfsnotify): Stopped
Failed Actions:
* nfs-daemon_monitor_30000 on virt-021 'not running' (7): call=99, status=complete, exitreason='nfs-idmapd is not running',
last-rc-change='Fri Sep 2 11:35:39 2016', queued=0ms, exec=0ms
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
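A hedged sketch of how a cluster like (1) could be bootstrapped with RHEL 7 pcs; the node names and cluster name come from the status output above, but the exact commands and the fence_xvm parameters are assumptions:

pcs cluster auth virt-004 virt-005 virt-021 virt-022
pcs cluster setup --name STSRHTS2803 virt-004 virt-005 virt-021 virt-022
pcs cluster start --all
# one fence_xvm stonith device per node (parameters are an assumption)
pcs stonith create fence-virt-004 fence_xvm port=virt-004 pcmk_host_list=virt-004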
>> (2)
[root@virt-021 ~]# pcs resource show --full
Clone: dlm-clone
Meta Attrs: interleave=true ordered=true
Resource: dlm (class=ocf provider=pacemaker type=controld)
Operations: start interval=0s timeout=90 (dlm-start-interval-0s)
stop interval=0s timeout=100 (dlm-stop-interval-0s)
monitor interval=30s on-fail=fence (dlm-monitor-interval-30s)
Clone: clvmd-clone
Meta Attrs: interleave=true ordered=true
Resource: clvmd (class=ocf provider=heartbeat type=clvm)
Attributes: with_cmirrord=1
Operations: start interval=0s timeout=90 (clvmd-start-interval-0s)
stop interval=0s timeout=90 (clvmd-stop-interval-0s)
monitor interval=30s on-fail=fence (clvmd-monitor-interval-30s)
Group: hanfs-ap
Resource: havg (class=ocf provider=heartbeat type=LVM)
Attributes: exclusive=true partial_activation=false volgrpname=shared
Operations: start interval=0s timeout=30 (havg-start-interval-0s)
stop interval=0s timeout=30 (havg-stop-interval-0s)
monitor interval=10 timeout=30 (havg-monitor-interval-10)
Resource: mnt-shared (class=ocf provider=heartbeat type=Filesystem)
Attributes: device=/dev/shared/shared0 directory=/mnt/shared fstype=ext4 options=
Operations: start interval=0s timeout=60 (mnt-shared-start-interval-0s)
stop interval=0s timeout=60 (mnt-shared-stop-interval-0s)
monitor interval=30s (mnt-shared-monitor-interval-30s)
Resource: nfs-daemon (class=ocf provider=heartbeat type=nfsserver)
Attributes: nfs_shared_infodir=/mnt/shared/nfs nfs_no_notify=true rpcpipefs_dir=/var/lib/nfs2
Meta Attrs: target-role=Stopped
Operations: stop interval=0s timeout=20s (nfs-daemon-stop-interval-0s)
monitor interval=30s (nfs-daemon-monitor-interval-30s)
start interval=0s timeout=90s (nfs-daemon-start-interval-0s)
Resource: export-root (class=ocf provider=heartbeat type=exportfs)
Attributes: directory=/mnt/shared clientspec=* options=rw fsid=170
Operations: start interval=0s timeout=40 (export-root-start-interval-0s)
stop interval=0s timeout=120 (export-root-stop-interval-0s)
monitor interval=10 timeout=20 (export-root-monitor-interval-10)
Resource: export--mnt-shared-0 (class=ocf provider=heartbeat type=exportfs)
Attributes: directory=/mnt/shared/0 clientspec=* options=rw fsid=1
Operations: start interval=0s timeout=40 (export--mnt-shared-0-start-interval-0s)
stop interval=0s timeout=120 (export--mnt-shared-0-stop-interval-0s)
monitor interval=10 timeout=20 (export--mnt-shared-0-monitor-interval-10)
Resource: export--mnt-shared-1 (class=ocf provider=heartbeat type=exportfs)
Attributes: directory=/mnt/shared/1 clientspec=* options=rw fsid=2
Operations: start interval=0s timeout=40 (export--mnt-shared-1-start-interval-0s)
stop interval=0s timeout=120 (export--mnt-shared-1-stop-interval-0s)
monitor interval=10 timeout=20 (export--mnt-shared-1-monitor-interval-10)
Resource: vip (class=ocf provider=heartbeat type=IPaddr2)
Attributes: ip=10.34.70.86 cidr_netmask=23
Operations: start interval=0s timeout=20s (vip-start-interval-0s)
stop interval=0s timeout=20s (vip-stop-interval-0s)
monitor interval=30s (vip-monitor-interval-30s)
Resource: nfs-notify (class=ocf provider=heartbeat type=nfsnotify)
Attributes: source_host=pool-10-34-70-86.cluster-qe.lab.eng.brq.redhat.com
Operations: start interval=0s timeout=90 (nfs-notify-start-interval-0s)
stop interval=0s timeout=90 (nfs-notify-stop-interval-0s)
monitor interval=30 timeout=90 (nfs-notify-monitor-interval-30)
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-2174.html
Description of problem:

When pacemaker is configured for an active/passive NFS cluster using the /usr/lib/ocf/resource.d/heartbeat/nfsserver resource agent, it seems to start the NFS services when performing the monitor action, regardless of whether the services are supposed to be running on that node.

This is the passive node, which currently has no NFS services running on it:

==============
[root@hcluster2-nfs2 ~]# pcs resource show nfs-daemon
Resource: nfs-daemon (class=ocf provider=heartbeat type=nfsserver)
Attributes: nfs_shared_infodir=/san/nfs-fs/nfsinfo nfs_no_notify=true rpcpipefs_dir=/var/lib/rpc_pipefs
Operations: start interval=0s timeout=40 (nfs-daemon-start-interval-0s)
stop interval=0s timeout=20s (nfs-daemon-stop-interval-0s)
monitor interval=10 timeout=20s (nfs-daemon-monitor-interval-10)
[root@hcluster2-nfs2 ~]# pcs resource debug-stop nfs-daemon
Operation stop for nfs-daemon (ocf:heartbeat:nfsserver) returned 0
> stderr: INFO: Stopping NFS server ...
> stderr: DEBUG:
> stderr: INFO: Stop: threads
> stderr: INFO: Stop: rpc-statd
> stderr: INFO: Stop: nfs-idmapd
> stderr: DEBUG: * nfs-idmapd.service - NFSv4 ID-name mapping service
> stderr: Loaded: loaded (/usr/lib/systemd/system/nfs-idmapd.service; static; vendor preset: disabled)
> stderr: Active: inactive (dead) since Fri 2016-07-15 01:50:31 CDT; 25min ago
> stderr: Process: 19622 ExecStart=/usr/sbin/rpc.idmapd $RPCIDMAPDARGS (code=exited, status=0/SUCCESS)
> stderr: Main PID: 19624 (code=killed, signal=TERM)
> stderr:
> stderr: Jul 15 01:49:39 hcluster2-nfs2 systemd[1]: Starting NFSv4 ID-name mapping service...
> stderr: Jul 15 01:49:39 hcluster2-nfs2 systemd[1]: Started NFSv4 ID-name mapping service.
> stderr: Jul 15 01:49:40 hcluster2-nfs2 systemd[1]: Started NFSv4 ID-name mapping service.
> stderr: Jul 15 01:50:31 hcluster2-nfs2 systemd[1]: Stopping NFSv4 ID-name mapping service...
> stderr: Jul 15 01:50:31 hcluster2-nfs2 systemd[1]: Stopped NFSv4 ID-name mapping service.
> stderr: Jul 15 01:50:31 hcluster2-nfs2 systemd[1]: Stopped NFSv4 ID-name mapping service.
> stderr: Jul 15 02:15:19 hcluster2-nfs2 systemd[1]: Stopped NFSv4 ID-name mapping service.
> stderr: Jul 15 02:16:20 hcluster2-nfs2 systemd[1]: Stopped NFSv4 ID-name mapping service.
> stderr: INFO: Stop: nfs-mountd
> stderr: INFO: Stop: rpcbind
> stderr: INFO: NFS server stopped
[root@hcluster2-nfs2 ~]#
==============

Performing "pcs resource debug-monitor nfs-daemon" should therefore confirm that the resource is not started on this node; instead, it seems to start the services if they are not running and then reports that the resource is running:

==============
[root@hcluster2-nfs2 ~]# pcs resource debug-monitor nfs-daemon
Operation monitor for nfs-daemon (ocf:heartbeat:nfsserver) returned 0
> stdout: # AUTOGENERATED by /usr/lib/ocf/resource.d/heartbeat/nfsserver high availability resource-agent
> stdout: * var-lib-nfs-rpc_pipefs.mount - RPC Pipe File System
> stdout: Loaded: loaded (/usr/lib/systemd/system/var-lib-nfs-rpc_pipefs.mount; static; vendor preset: disabled)
> stdout: Active: inactive (dead) since Fri 2016-07-15 02:16:02 CDT; 3min 10s ago
> stdout: Where: /var/lib/nfs/rpc_pipefs
> stdout: What: sunrpc
> stdout: * rpc-statd.service - NFS status monitor for NFSv2/3 locking.
> stdout: Loaded: loaded (/usr/lib/systemd/system/rpc-statd.service; static; vendor preset: disabled)
> stdout: Active: inactive (dead) since Fri 2016-07-15 01:50:31 CDT; 28min ago
> stdout: Process: 19612 ExecStart=/usr/sbin/rpc.statd --no-notify $STATDARGS (code=exited, status=0/SUCCESS)
> stdout: Main PID: 19613 (code=exited, status=0/SUCCESS)
> stdout:
> stdout: Jul 15 01:49:39 hcluster2-nfs2 rpc.statd[19613]: Flags: TI-RPC
> stdout: Jul 15 01:49:39 hcluster2-nfs2 systemd[1]: Started NFS status monitor for NFSv2/3 locking..
> stdout: Jul 15 01:49:40 hcluster2-nfs2 systemd[1]: Started NFS status monitor for NFSv2/3 locking..
> stdout: Jul 15 01:50:31 hcluster2-nfs2 systemd[1]: Stopping NFS status monitor for NFSv2/3 locking....
> stdout: Jul 15 01:50:31 hcluster2-nfs2 systemd[1]: Stopped NFS status monitor for NFSv2/3 locking..
> stdout: Jul 15 01:50:32 hcluster2-nfs2 systemd[1]: Stopped NFS status monitor for NFSv2/3 locking..
> stdout: Jul 15 02:15:19 hcluster2-nfs2 systemd[1]: Stopped NFS status monitor for NFSv2/3 locking..
> stdout: Jul 15 02:15:19 hcluster2-nfs2 systemd[1]: Stopped NFS status monitor for NFSv2/3 locking..
> stdout: Jul 15 02:16:20 hcluster2-nfs2 systemd[1]: Stopped NFS status monitor for NFSv2/3 locking..
> stdout: Jul 15 02:16:20 hcluster2-nfs2 systemd[1]: Stopped NFS status monitor for NFSv2/3 locking..
> stdout: * rpc-statd.service - NFS status monitor for NFSv2/3 locking.
> stdout: Loaded: loaded (/usr/lib/systemd/system/rpc-statd.service; static; vendor preset: disabled)
> stdout: Active: active (running) since Fri 2016-07-15 02:19:13 CDT; 262ms ago
> stdout: Process: 26554 ExecStart=/usr/sbin/rpc.statd --no-notify $STATDARGS (code=exited, status=0/SUCCESS)
> stdout: Main PID: 26555 (rpc.statd)
> stdout: CGroup: /system.slice/rpc-statd.service
> stdout: `-26555 /usr/sbin/rpc.statd --no-notify --no-notify
> stdout:
> stdout: Jul 15 02:19:13 hcluster2-nfs2 systemd[1]: Starting NFS status monitor for NFSv2/3 locking....
> stdout: Jul 15 02:19:13 hcluster2-nfs2 rpc.statd[26555]: Version 1.3.0 starting
> stdout: Jul 15 02:19:13 hcluster2-nfs2 rpc.statd[26555]: Flags: TI-RPC
> stdout: Jul 15 02:19:13 hcluster2-nfs2 systemd[1]: Started NFS status monitor for NFSv2/3 locking..
> stdout: Jul 15 02:19:13 hcluster2-nfs2 systemd[1]: Started NFS status monitor for NFSv2/3 locking..
> stderr: INFO: Status: rpcbind
> stderr: INFO: Starting NFS server ...
> stderr: INFO: Start: rpcbind i: 10
> stderr: INFO: Start: v3locking: 0
> stderr: INFO: Start: nfs-mountd i: 10
> stderr: INFO: Start: nfs-idmapd i: 10
> stderr: DEBUG: * nfs-idmapd.service - NFSv4 ID-name mapping service
> stderr: Loaded: loaded (/usr/lib/systemd/system/nfs-idmapd.service; static; vendor preset: disabled)
> stderr: Active: active (running) since Fri 2016-07-15 02:19:13 CDT; 95ms ago
> stderr: Process: 26566 ExecStart=/usr/sbin/rpc.idmapd $RPCIDMAPDARGS (code=exited, status=0/SUCCESS)
> stderr: Main PID: 26568 (rpc.idmapd)
> stderr: CGroup: /system.slice/nfs-idmapd.service
> stderr: `-26568 /usr/sbin/rpc.idmapd
> stderr:
> stderr: Jul 15 02:19:13 hcluster2-nfs2 systemd[1]: Starting NFSv4 ID-name mapping service...
> stderr: Jul 15 02:19:13 hcluster2-nfs2 systemd[1]: Started NFSv4 ID-name mapping service.
> stderr: Jul 15 02:19:13 hcluster2-nfs2 systemd[1]: Started NFSv4 ID-name mapping service.
> stderr: INFO: Start: rpc-statd i: 10
> stderr: DEBUG:
> stderr: INFO: NFS server started
> stderr: INFO: Status: nfs-mountd
> stderr: INFO: Status: nfs-idmapd
> stderr: DEBUG: * nfs-idmapd.service - NFSv4 ID-name mapping service
> stderr: Loaded: loaded (/usr/lib/systemd/system/nfs-idmapd.service; static; vendor preset: disabled)
> stderr: Active: active (running) since Fri 2016-07-15 02:19:13 CDT; 161ms ago
> stderr: Process: 26566 ExecStart=/usr/sbin/rpc.idmapd $RPCIDMAPDARGS (code=exited, status=0/SUCCESS)
> stderr: Main PID: 26568 (rpc.idmapd)
> stderr: CGroup: /system.slice/nfs-idmapd.service
> stderr: `-26568 /usr/sbin/rpc.idmapd
> stderr:
> stderr: Jul 15 02:19:13 hcluster2-nfs2 systemd[1]: Starting NFSv4 ID-name mapping service...
> stderr: Jul 15 02:19:13 hcluster2-nfs2 systemd[1]: Started NFSv4 ID-name mapping service.
> stderr: Jul 15 02:19:13 hcluster2-nfs2 systemd[1]: Started NFSv4 ID-name mapping service.
> stderr: INFO: Status: rpc-statd
> stderr: DEBUG: * nfs-server.service - NFS server and services
> stderr: Loaded: loaded (/usr/lib/systemd/system/nfs-server.service; disabled; vendor preset: disabled)
> stderr: Active: active (exited) since Fri 2016-07-15 02:19:13 CDT; 102ms ago
> stderr: Process: 19965 ExecStopPost=/usr/sbin/exportfs -f (code=exited, status=0/SUCCESS)
> stderr: Process: 19963 ExecStopPost=/usr/sbin/exportfs -au (code=exited, status=0/SUCCESS)
> stderr: Process: 19962 ExecStop=/usr/sbin/rpc.nfsd 0 (code=exited, status=0/SUCCESS)
> stderr: Process: 26572 ExecStart=/usr/sbin/rpc.nfsd $RPCNFSDARGS (code=exited, status=0/SUCCESS)
> stderr: Process: 26570 ExecStartPre=/usr/sbin/exportfs -r (code=exited, status=0/SUCCESS)
> stderr: Main PID: 26572 (code=exited, status=0/SUCCESS)
> stderr:
> stderr: Jul 15 02:19:13 hcluster2-nfs2 systemd[1]: Starting NFS server and services...
> stderr: Jul 15 02:19:13 hcluster2-nfs2 systemd[1]: Started NFS server and services.
> stderr: Jul 15 02:19:13 hcluster2-nfs2 systemd[1]: Started NFS server and services.
[root@hcluster2-nfs2 ~]#
==============

Like other resource agents, the monitor function should only return the status of the service instead of starting it. This causes a number of issues; for example, when you run "pcs resource cleanup", the cluster detects the service up on both nodes and attempts recovery:

==============
[root@hcluster2-nfs1 ~]# grep -B1 -i Too /var/log/messages | tail -5
Jul 14 22:25:16 hcluster2-nfs1 pengine[2301]: error: Resource nfs-daemon (ocf::nfsserver) is active on 2 nodes attempting recovery
Jul 14 22:25:16 hcluster2-nfs1 pengine[2301]: warning: See http://clusterlabs.org/wiki/FAQ#Resource_is_Too_Active for more information.
--
Jul 14 22:25:16 hcluster2-nfs1 pengine[2301]: error: Resource nfs-daemon (ocf::nfsserver) is active on 2 nodes attempting recovery
Jul 14 22:25:16 hcluster2-nfs1 pengine[2301]: warning: See http://clusterlabs.org/wiki/FAQ#Resource_is_Too_Active for more information.
[root@hcluster2-nfs1 ~]#
==============

Version-Release number of selected component (if applicable):
resource-agents-3.9.5-54.el7_2.10.x86_64

How reproducible:
Set up an active/passive NFS cluster as per https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/High_Availability_Add-On_Administration/ch-nfsserver-HAAA.html and run "pcs resource debug-monitor nfs-daemon" on the passive node.

Steps to Reproduce:
See above.

Actual results:
The NFS services are started on the passive node.
Expected results:
The monitor should instead exit with a non-zero return code and report an error stating that the service is not running.

Additional info:
I believe this may have been caused by some of the fixes implemented in https://rhn.redhat.com/errata/RHBA-2016-0217.html, specifically the one related to BZ#1304370.
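To make the expected behavior concrete, a minimal sketch of a status-only monitor in OCF shell style; the service names follow the agent's log messages above, but this is illustrative only, not the shipped nfsserver agent or the actual patch:

#!/bin/sh
. "${OCF_ROOT:-/usr/lib/ocf}/lib/heartbeat/ocf-shellfuncs"

# A monitor must report state and never change it; starting services
# belongs exclusively in the start action.
nfsserver_monitor_sketch() {
    for svc in rpcbind nfs-mountd nfs-idmapd rpc-statd nfs-server; do
        if ! systemctl is-active "$svc" >/dev/null 2>&1; then
            ocf_exit_reason "$svc is not running"
            return $OCF_NOT_RUNNING   # 7: cleanly stopped
        fi
    done
    return $OCF_SUCCESS               # 0: all required services are up
}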