Bug 1173193

Summary: nfsserver resource agent times out at first but starts after being cleaned up
Product: Red Hat Enterprise Linux 7
Component: resource-agents
Version: 7.1
Reporter: michal novacek <mnovacek>
Assignee: Fabio Massimo Di Nitto <fdinitto>
QA Contact: cluster-qe <cluster-qe>
Docs Contact:
CC: agk, cluster-maint, fdinitto, xin_chen
Status: CLOSED ERRATA
Severity: medium
Priority: medium
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: resource-agents-3.9.5-45.el7
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-11-19 04:41:10 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Attachments:
  'pcs cluster report' command output (flags: none)

Description michal novacek 2014-12-11 16:23:22 UTC
Created attachment 967291
'pcs cluster report' command output

Description of problem:
I have an HA NFS scenario set up according to
https://github.com/davidvossel/phd/blob/master/scenarios/nfs-active-passive.scenario.
When I try to start the group, the nfsserver resource agent does not start (it
times out according to the log). However, once 'pcs resource cleanup' is run,
nfsserver starts happily.

Version-Release number of selected component (if applicable):
resource-agents-3.9.5-38.el7.x86_64
nfs-utils-1.3.0-0.5.el7.x86_64
kernel-3.10.0-210.el7.x86_64

How reproducible: most of the time

Steps to Reproduce:
1. create hanfs group
2. try to start it using 'pcs resource enable hanfs'

Actual results: the nfsserver resource agent does not start until 'pcs resource
cleanup' is run on it.

Expected results: nfsserver starts on the first attempt.

Additional info:
A cluster exhibiting this problem can be provided.
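
For reference, a minimal sketch of the pcs commands behind the steps above. The
resource and group names are illustrative, the parameters are borrowed from the
verified configuration in comment 12, and the linked scenario script builds a
fuller stack (LVM, exportfs, etc.):

    # illustrative only -- a trimmed-down version of the nfs-active-passive scenario
    pcs resource create nfs-vip ocf:heartbeat:IPaddr2 ip=10.34.71.205 cidr_netmask=23 \
        --group hanfs --disabled
    pcs resource create nfs-fs ocf:heartbeat:Filesystem device=/dev/shared/shared0 \
        directory=/mnt/shared fstype=ext4 --group hanfs --disabled
    pcs resource create nfs-daemon ocf:heartbeat:nfsserver \
        nfs_shared_infodir=/mnt/shared/nfs nfs_ip=10.34.71.205 --group hanfs --disabled

    pcs resource enable hanfs          # step 2: the nfsserver start times out here
    pcs resource cleanup nfs-daemon    # after cleanup the agent starts without problems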

Comment 10 David Vossel 2015-04-29 15:22:58 UTC
Patch:
https://github.com/ClusterLabs/resource-agents/pull/607
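
The pull request updates the default operation timeouts shipped with the
nfsserver agent. Until a build carrying the fix is installed, a possible
workaround (a sketch only, assuming the resource is named nfs-server as in the
configuration in comment 12, where start already uses a 90s timeout) is to set a
longer start timeout explicitly:

    # override the start timeout on the existing resource
    pcs resource update nfs-server op start interval=0s timeout=90s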

Comment 12 michal novacek 2015-08-13 14:20:55 UTC
I have verified that the nfsserver resource agent has the new default timeouts
introduced by the patch in comment #10 in resource-agents-3.9.5-50.el7.x86_64,
and that these timeouts are sufficient for the daemon to start and/or move.
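
For completeness, the defaults shipped by the installed agent can be checked by
querying its OCF metadata, and the operations actually configured on the
resource can be listed with pcs; a rough sketch (output format varies between
pacemaker/pcs versions):

    [root@virt-151 ~]# crm_resource --show-metadata ocf:heartbeat:nfsserver | grep -A 8 '<actions>'
    [root@virt-151 ~]# pcs resource show nfs-server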

---

[root@virt-151 ~]# pcs config
Cluster Name: STSRHTS14613
Corosync Nodes:
 virt-151 virt-152 virt-157
Pacemaker Nodes:
 virt-151 virt-152 virt-157

Resources: 
 Clone: dlm-clone
  Meta Attrs: interleave=true ordered=true 
  Resource: dlm (class=ocf provider=pacemaker type=controld)
   Operations: start interval=0s timeout=90 (dlm-start-timeout-90)
               stop interval=0s timeout=100 (dlm-stop-timeout-100)
               monitor interval=30s on-fail=fence (dlm-monitor-interval-30s)
 Clone: clvmd-clone
  Meta Attrs: interleave=true ordered=true 
  Resource: clvmd (class=ocf provider=heartbeat type=clvm)
   Attributes: with_cmirrord=1 
   Operations: start interval=0s timeout=90 (clvmd-start-timeout-90)
               stop interval=0s timeout=90 (clvmd-stop-timeout-90)
               monitor interval=30s on-fail=fence (clvmd-monitor-interval-30s)
 Group: ha-nfsserver
  Resource: vip (class=ocf provider=heartbeat type=IPaddr2)
   Attributes: ip=10.34.71.205 cidr_netmask=23 
   Operations: start interval=0s timeout=20s (vip-start-timeout-20s)
               stop interval=0s timeout=20s (vip-stop-timeout-20s)
               monitor interval=30s (vip-monitor-interval-30s)
  Resource: havg (class=ocf provider=heartbeat type=LVM)
   Attributes: volgrpname=shared exclusive=true 
   Operations: start interval=0s timeout=30 (havg-start-timeout-30)
               stop interval=0s timeout=30 (havg-stop-timeout-30)
               monitor interval=10 timeout=30 (havg-monitor-interval-10)
  Resource: nfs-shared-fs (class=ocf provider=heartbeat type=Filesystem)
   Attributes: device=/dev/shared/shared0 directory=/mnt/shared fstype=ext4 options= 
   Operations: start interval=0s timeout=60 (nfs-shared-fs-start-timeout-60)
               stop interval=0s timeout=60 (nfs-shared-fs-stop-timeout-60)
               monitor interval=30s (nfs-shared-fs-monitor-interval-30s)
  Resource: nfs-server (class=ocf provider=heartbeat type=nfsserver)
   Attributes: nfs_shared_infodir=/mnt/shared0/nfs nfs_ip=10.34.71.205 
   Operations: stop interval=0s timeout=60s (nfs-server-stop-timeout-60s)
               monitor interval=30s (nfs-server-monitor-interval-30s)
               start interval=0s timeout=90s (nfs-server-start-timeout-90s)
  Resource: nfs-export (class=ocf provider=heartbeat type=exportfs)
   Attributes: directory=/mnt/shared clientspec=* options=rw fsid=220 
   Operations: start interval=0s timeout=40 (nfs-export-start-timeout-40)
               stop interval=0s timeout=120 (nfs-export-stop-timeout-120)
               monitor interval=10 timeout=20 (nfs-export-monitor-interval-10)

Stonith Devices: 
 Resource: fence-virt-151 (class=stonith type=fence_xvm)
  Attributes: action=reboot debug=1 pcmk_host_check=static-list pcmk_host_list=virt-151 pcmk_host_map=virt-151:virt-151.cluster-qe.lab.eng.brq.redhat.com 
  Operations: monitor interval=60s (fence-virt-151-monitor-interval-60s)
 Resource: fence-virt-152 (class=stonith type=fence_xvm)
  Attributes: action=reboot debug=1 pcmk_host_check=static-list pcmk_host_list=virt-152 pcmk_host_map=virt-152:virt-152.cluster-qe.lab.eng.brq.redhat.com 
  Operations: monitor interval=60s (fence-virt-152-monitor-interval-60s)
 Resource: fence-virt-157 (class=stonith type=fence_xvm)
  Attributes: action=reboot debug=1 pcmk_host_check=static-list pcmk_host_list=virt-157 pcmk_host_map=virt-157:virt-157.cluster-qe.lab.eng.brq.redhat.com 
  Operations: monitor interval=60s (fence-virt-157-monitor-interval-60s)
Fencing Levels: 

Location Constraints:
Ordering Constraints:
  start dlm-clone then start clvmd-clone (kind:Mandatory) (id:order-dlm-clone-clvmd-clone-mandatory)
Colocation Constraints:
  clvmd-clone with dlm-clone (score:INFINITY) (id:colocation-clvmd-clone-dlm-clone-INFINITY)

Cluster Properties:
 cluster-infrastructure: corosync
 cluster-name: STSRHTS14613
 dc-version: 1.1.13-44eb2dd
 have-watchdog: false
 last-lrm-refresh: 1439470378
 no-quorum-policy: freeze

Comment 15 errata-xmlrpc 2015-11-19 04:41:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-2190.html