Bug 1173193

Summary: nfsserver resource agent times out at first but starts after being cleaned up
Product: Red Hat Enterprise Linux 7
Component: resource-agents
Version: 7.1
Reporter: michal novacek <mnovacek>
Assignee: Fabio Massimo Di Nitto <fdinitto>
QA Contact: cluster-qe <cluster-qe>
Docs Contact:
CC: agk, cluster-maint, fdinitto, xin_chen
Status: CLOSED ERRATA
Severity: medium
Priority: medium
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: resource-agents-3.9.5-45.el7
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-11-19 04:41:10 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Attachments:
  'pcs cluster report' command output (flags: none)

Description michal novacek 2014-12-11 16:23:22 UTC
Created attachment 967291
'pcs cluster report' command output

Description of problem:
I have an HA NFS scenario set up according to
https://github.com/davidvossel/phd/blob/master/scenarios/nfs-active-passive.scenario.
When I try to start the group, the nfsserver resource agent does not start (it
times out according to the log). However, once 'pcs resource cleanup' is run,
nfsserver starts happily.

Version-Release number of selected component (if applicable):
resource-agents-3.9.5-38.el7.x86_64
nfs-utils-1.3.0-0.5.el7.x86_64
kernel-3.10.0-210.el7.x86_64

How reproducible: most of the time

Steps to Reproduce:
1. create hanfs group
2. try to start it using 'pcs resource enable hanfs'

Actual results: the nfsserver resource agent does not start until 'pcs resource
cleanup' is run on it.

Expected results: nfsserver starts on the first attempt.

Additional info:
A cluster exhibiting this problem can be provided.
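
For reference, a minimal sketch of the pcs commands behind the steps above. The
resource and group names are illustrative, the parameters are borrowed from the
verified configuration in comment 12, and the linked scenario script builds a
fuller stack (LVM, exportfs, etc.):

    # illustrative only -- a trimmed-down version of the nfs-active-passive scenario
    pcs resource create nfs-vip ocf:heartbeat:IPaddr2 ip=10.34.71.205 cidr_netmask=23 \
        --group hanfs --disabled
    pcs resource create nfs-fs ocf:heartbeat:Filesystem device=/dev/shared/shared0 \
        directory=/mnt/shared fstype=ext4 --group hanfs --disabled
    pcs resource create nfs-daemon ocf:heartbeat:nfsserver \
        nfs_shared_infodir=/mnt/shared/nfs nfs_ip=10.34.71.205 --group hanfs --disabled

    pcs resource enable hanfs          # step 2: the nfsserver start times out here
    pcs resource cleanup nfs-daemon    # after cleanup the agent starts without problems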

Comment 10 David Vossel 2015-04-29 15:22:58 UTC
Patch:
https://github.com/ClusterLabs/resource-agents/pull/607
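
The pull request updates the default operation timeouts shipped with the
nfsserver agent. Until a build carrying the fix is installed, a possible
workaround (a sketch only, assuming the resource is named nfs-server as in the
configuration in comment 12, where start already uses a 90s timeout) is to set a
longer start timeout explicitly:

    # override the start timeout on the existing resource
    pcs resource update nfs-server op start interval=0s timeout=90s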

Comment 12 michal novacek 2015-08-13 14:20:55 UTC
I have verified that the nfsserver resource agent has the new default timeouts
introduced by the patch in comment #10 in resource-agents-3.9.5-50.el7.x86_64,
and that these timeouts are sufficient for the daemon to start and/or move.
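
For completeness, the defaults shipped by the installed agent can be checked by
querying its OCF metadata, and the operations actually configured on the
resource can be listed with pcs; a rough sketch (output format varies between
pacemaker/pcs versions):

    [root@virt-151 ~]# crm_resource --show-metadata ocf:heartbeat:nfsserver | grep -A 8 '<actions>'
    [root@virt-151 ~]# pcs resource show nfs-server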

---

[root@virt-151 ~]# pcs config
Cluster Name: STSRHTS14613
Corosync Nodes:
 virt-151 virt-152 virt-157
Pacemaker Nodes:
 virt-151 virt-152 virt-157

Resources: 
 Clone: dlm-clone
  Meta Attrs: interleave=true ordered=true 
  Resource: dlm (class=ocf provider=pacemaker type=controld)
   Operations: start interval=0s timeout=90 (dlm-start-timeout-90)
               stop interval=0s timeout=100 (dlm-stop-timeout-100)
               monitor interval=30s on-fail=fence (dlm-monitor-interval-30s)
 Clone: clvmd-clone
  Meta Attrs: interleave=true ordered=true 
  Resource: clvmd (class=ocf provider=heartbeat type=clvm)
   Attributes: with_cmirrord=1 
   Operations: start interval=0s timeout=90 (clvmd-start-timeout-90)
               stop interval=0s timeout=90 (clvmd-stop-timeout-90)
               monitor interval=30s on-fail=fence (clvmd-monitor-interval-30s)
 Group: ha-nfsserver
  Resource: vip (class=ocf provider=heartbeat type=IPaddr2)
   Attributes: ip=10.34.71.205 cidr_netmask=23 
   Operations: start interval=0s timeout=20s (vip-start-timeout-20s)
               stop interval=0s timeout=20s (vip-stop-timeout-20s)
               monitor interval=30s (vip-monitor-interval-30s)
  Resource: havg (class=ocf provider=heartbeat type=LVM)
   Attributes: volgrpname=shared exclusive=true 
   Operations: start interval=0s timeout=30 (havg-start-timeout-30)
               stop interval=0s timeout=30 (havg-stop-timeout-30)
               monitor interval=10 timeout=30 (havg-monitor-interval-10)
  Resource: nfs-shared-fs (class=ocf provider=heartbeat type=Filesystem)
   Attributes: device=/dev/shared/shared0 directory=/mnt/shared fstype=ext4 options= 
   Operations: start interval=0s timeout=60 (nfs-shared-fs-start-timeout-60)
               stop interval=0s timeout=60 (nfs-shared-fs-stop-timeout-60)
               monitor interval=30s (nfs-shared-fs-monitor-interval-30s)
  Resource: nfs-server (class=ocf provider=heartbeat type=nfsserver)
   Attributes: nfs_shared_infodir=/mnt/shared0/nfs nfs_ip=10.34.71.205 
   Operations: stop interval=0s timeout=60s (nfs-server-stop-timeout-60s)
               monitor interval=30s (nfs-server-monitor-interval-30s)
               start interval=0s timeout=90s (nfs-server-start-timeout-90s)
  Resource: nfs-export (class=ocf provider=heartbeat type=exportfs)
   Attributes: directory=/mnt/shared clientspec=* options=rw fsid=220 
   Operations: start interval=0s timeout=40 (nfs-export-start-timeout-40)
               stop interval=0s timeout=120 (nfs-export-stop-timeout-120)
               monitor interval=10 timeout=20 (nfs-export-monitor-interval-10)

Stonith Devices: 
 Resource: fence-virt-151 (class=stonith type=fence_xvm)
  Attributes: action=reboot debug=1 pcmk_host_check=static-list pcmk_host_list=virt-151 pcmk_host_map=virt-151:virt-151.cluster-qe.lab.eng.brq.redhat.com 
  Operations: monitor interval=60s (fence-virt-151-monitor-interval-60s)
 Resource: fence-virt-152 (class=stonith type=fence_xvm)
  Attributes: action=reboot debug=1 pcmk_host_check=static-list pcmk_host_list=virt-152 pcmk_host_map=virt-152:virt-152.cluster-qe.lab.eng.brq.redhat.com 
  Operations: monitor interval=60s (fence-virt-152-monitor-interval-60s)
 Resource: fence-virt-157 (class=stonith type=fence_xvm)
  Attributes: action=reboot debug=1 pcmk_host_check=static-list pcmk_host_list=virt-157 pcmk_host_map=virt-157:virt-157.cluster-qe.lab.eng.brq.redhat.com 
  Operations: monitor interval=60s (fence-virt-157-monitor-interval-60s)
Fencing Levels: 

Location Constraints:
Ordering Constraints:
  start dlm-clone then start clvmd-clone (kind:Mandatory) (id:order-dlm-clone-clvmd-clone-mandatory)
Colocation Constraints:
  clvmd-clone with dlm-clone (score:INFINITY) (id:colocation-clvmd-clone-dlm-clone-INFINITY)

Cluster Properties:
 cluster-infrastructure: corosync
 cluster-name: STSRHTS14613
 dc-version: 1.1.13-44eb2dd
 have-watchdog: false
 last-lrm-refresh: 1439470378
 no-quorum-policy: freeze

Comment 15 errata-xmlrpc 2015-11-19 04:41:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-2190.html