Bug 1151180
Summary:            support HA NFSv4 Active/Passive use case
Product:            Red Hat Enterprise Linux 7
Component:          resource-agents
Version:            7.0
Hardware:           Unspecified
OS:                 Unspecified
Status:             CLOSED ERRATA
Severity:           high
Priority:           high
Target Milestone:   rc
Keywords:           ZStream
Reporter:           David Vossel <dvossel>
Assignee:           David Vossel <dvossel>
QA Contact:         cluster-qe <cluster-qe>
CC:                 abeekhof, agk, cluster-maint, djansa, fdinitto, jruemker, mnovacek, sbonnevi, sbradley
Fixed In Version:   resource-agents-3.9.5-33.el7
Doc Type:           Bug Fix
Clones:             1158901 (view as bug list)
Bug Depends On:     1159234, 1182692
Bug Blocks:         1158901
Type:               Bug
Last Closed:        2015-03-05 08:00:37 UTC
Attachments:        lock.py (Python script used to lock files; attachment 985119)
Description
David Vossel
2014-10-09 17:51:25 UTC
I have verified that the NFS server can be relocated between nodes without
impacting the clients' ability to access and lock files on the NFS share. For
file locking I have checked that the lock is still held by the same process
after the NFS server has moved.

The versions are as follows:

resource-agents-3.9.5-40.el7.x86_64
nfs-utils-1.3.0-0.8.el7.x86_64

The Python script used to lock files is attached as lock.py.

-------------------

The driver node has the server's /mnt/shared/0 mounted at /exports/0 over
NFSv4, and /mnt/shared/1 mounted at /exports/1 over NFSv3:

driver$ mount | grep exports
10.15.107.226:/mnt/shared/0 on /exports/0 type nfs4 (rw,relatime,vers=4.0,rsize=131072,wsize=131072,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=10.15.105.29,local_lock=none,addr=10.15.107.226)
10.15.107.226:/mnt/shared/1 on /exports/1 type nfs (rw,relatime,vers=3,rsize=131072,wsize=131072,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.15.107.226,mountvers=3,mountport=20048,mountproto=udp,local_lock=none,addr=10.15.107.226)
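For reference, client mounts like the ones above could be created with
commands along these lines (a sketch, not from the original report; the
virtual IP and paths are the ones shown above, and default mount options are
assumed):

    # NFSv4 mount of the first export
    mount -t nfs4 10.15.107.226:/mnt/shared/0 /exports/0
    # NFSv3 mount of the second export
    mount -t nfs -o vers=3 10.15.107.226:/mnt/shared/1 /exports/1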
-------------------

Cluster configuration:

[root@host-030 ~]# pcs config
Cluster Name: STSRHTS30662
Corosync Nodes:
 host-030 host-031
Pacemaker Nodes:
 host-030 host-031

Resources:
 Clone: dlm-clone
  Meta Attrs: interleave=true ordered=true
  Resource: dlm (class=ocf provider=pacemaker type=controld)
   Operations: start interval=0s timeout=90 (dlm-start-timeout-90)
               stop interval=0s timeout=100 (dlm-stop-timeout-100)
               monitor interval=30s on-fail=fence (dlm-monitor-interval-30s)
 Clone: clvmd-clone
  Meta Attrs: interleave=true ordered=true
  Resource: clvmd (class=ocf provider=heartbeat type=clvm)
   Attributes: with_cmirrord=1
   Operations: start interval=0s timeout=90 (clvmd-start-timeout-90)
               stop interval=0s timeout=90 (clvmd-stop-timeout-90)
               monitor interval=30s on-fail=fence (clvmd-monitor-interval-30s)
 Group: hanfs
  Resource: havg (class=ocf provider=heartbeat type=LVM)
   Attributes: volgrpname=shared exclusive=true
   Operations: start interval=0s timeout=30 (havg-start-timeout-30)
               stop interval=0s timeout=30 (havg-stop-timeout-30)
               monitor interval=10 timeout=30 (havg-monitor-interval-10)
  Resource: mnt-shared (class=ocf provider=heartbeat type=Filesystem)
   Attributes: device=/dev/shared/shared0 directory=/mnt/shared fstype=ext4 options=
   Operations: start interval=0s timeout=60 (mnt-shared-start-timeout-60)
               stop interval=0s timeout=60 (mnt-shared-stop-timeout-60)
               monitor interval=30s (mnt-shared-monitor-interval-30s)
  Resource: nfs-daemon (class=ocf provider=heartbeat type=nfsserver)
   Attributes: nfs_shared_infodir=/mnt/shared/nfs nfs_no_notify=true
   Operations: stop interval=0s timeout=20s (nfs-daemon-stop-timeout-20s)
               monitor interval=30s (nfs-daemon-monitor-interval-30s)
               start interval=0s timeout=90s (nfs-daemon-start-timeout-90s)
  Resource: export-root (class=ocf provider=heartbeat type=exportfs)
   Attributes: directory=/mnt/shared clientspec=* options=rw,sync fsid=106
   Operations: start interval=0s timeout=40 (export-root-start-timeout-40)
               stop interval=0s timeout=120 (export-root-stop-timeout-120)
               monitor interval=10 timeout=20 (export-root-monitor-interval-10)
  Resource: export0 (class=ocf provider=heartbeat type=exportfs)
   Attributes: directory=/mnt/shared/0 clientspec=* options=rw,sync fsid=1
   Operations: start interval=0s timeout=40 (export0-start-timeout-40)
               stop interval=0s timeout=120 (export0-stop-timeout-120)
               monitor interval=10 timeout=20 (export0-monitor-interval-10)
  Resource: export1 (class=ocf provider=heartbeat type=exportfs)
   Attributes: directory=/mnt/shared/1 clientspec=* options=rw,sync fsid=2
   Operations: start interval=0s timeout=40 (export1-start-timeout-40)
               stop interval=0s timeout=120 (export1-stop-timeout-120)
               monitor interval=10 timeout=20 (export1-monitor-interval-10)
  Resource: vip (class=ocf provider=heartbeat type=IPaddr2)
   Attributes: ip=10.15.107.226 cidr_netmask=22
   Operations: start interval=0s timeout=20s (vip-start-timeout-20s)
               stop interval=0s timeout=20s (vip-stop-timeout-20s)
               monitor interval=30s (vip-monitor-interval-30s)
  Resource: nfs-notify (class=ocf provider=heartbeat type=nfsnotify)
   Attributes: source_host=dhcp-107-226.lab.msp.redhat.com
   Operations: start interval=0s timeout=90 (nfs-notify-start-timeout-90)
               stop interval=0s timeout=90 (nfs-notify-stop-timeout-90)
               monitor interval=30 timeout=90 (nfs-notify-monitor-interval-30)

Stonith Devices:
 Resource: fence-host-030 (class=stonith type=fence_xvm)
  Attributes: action=reboot debug=1 pcmk_host_check=static-list pcmk_host_list=host-030 pcmk_host_map=host-030:host-030.virt.lab.msp.redhat.com
  Operations: monitor interval=60s (fence-host-030-monitor-interval-60s)
 Resource: fence-host-031 (class=stonith type=fence_xvm)
  Attributes: action=reboot debug=1 pcmk_host_check=static-list pcmk_host_list=host-031 pcmk_host_map=host-031:host-031.virt.lab.msp.redhat.com
  Operations: monitor interval=60s (fence-host-031-monitor-interval-60s)
Fencing Levels:

Location Constraints:
Ordering Constraints:
 start dlm-clone then start clvmd-clone (kind:Mandatory) (id:order-dlm-clone-clvmd-clone-mandatory)
Colocation Constraints:
 clvmd-clone with dlm-clone (score:INFINITY) (id:colocation-clvmd-clone-dlm-clone-INFINITY)

Cluster Properties:
 cluster-infrastructure: corosync
 cluster-name: STSRHTS30662
 dc-version: 1.1.12-a14efad
 have-watchdog: false
 last-lrm-refresh: 1422353023
 no-quorum-policy: freeze

[root@host-030 ~]# pcs status
Cluster name: STSRHTS30662
Last updated: Tue Jan 27 14:06:01 2015
Last change: Tue Jan 27 10:47:36 2015
Stack: corosync
Current DC: host-030 (1) - partition with quorum
Version: 1.1.12-a14efad
2 Nodes configured
14 Resources configured

Online: [ host-030 host-031 ]

Full list of resources:

 fence-host-030 (stonith:fence_xvm): Started host-030
 fence-host-031 (stonith:fence_xvm): Started host-031
 Clone Set: dlm-clone [dlm]
     Started: [ host-030 host-031 ]
 Clone Set: clvmd-clone [clvmd]
     Started: [ host-030 host-031 ]
 Resource Group: hanfs
     havg (ocf::heartbeat:LVM): Started host-030
     mnt-shared (ocf::heartbeat:Filesystem): Started host-030
     nfs-daemon (ocf::heartbeat:nfsserver): Started host-030
     export-root (ocf::heartbeat:exportfs): Started host-030
     export0 (ocf::heartbeat:exportfs): Started host-030
     export1 (ocf::heartbeat:exportfs): Started host-030
     vip (ocf::heartbeat:IPaddr2): Started host-030
     nfs-notify (ocf::heartbeat:nfsnotify): Started host-030

PCSD Status:
 host-030: Online
 host-031: Online

Daemon Status:
 corosync: active/enabled
 pacemaker: active/enabled
 pcsd: active/enabled
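For reference, a group like hanfs above could be assembled with pcs commands
along these lines (a sketch, not from the original report; resource names and
attributes are taken from the configuration above, and the non-default
operation timeouts shown there are omitted):

    pcs resource create havg ocf:heartbeat:LVM volgrpname=shared exclusive=true --group hanfs
    pcs resource create mnt-shared ocf:heartbeat:Filesystem device=/dev/shared/shared0 \
        directory=/mnt/shared fstype=ext4 --group hanfs
    pcs resource create nfs-daemon ocf:heartbeat:nfsserver \
        nfs_shared_infodir=/mnt/shared/nfs nfs_no_notify=true --group hanfs
    pcs resource create export-root ocf:heartbeat:exportfs directory=/mnt/shared \
        clientspec='*' options=rw,sync fsid=106 --group hanfs
    # export0 and export1 follow the same exportfs pattern with fsid=1 and fsid=2
    pcs resource create vip ocf:heartbeat:IPaddr2 ip=10.15.107.226 cidr_netmask=22 --group hanfs
    pcs resource create nfs-notify ocf:heartbeat:nfsnotify \
        source_host=dhcp-107-226.lab.msp.redhat.com --group hanfs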
---------------

On the driver, check that the file can be locked exclusively by only one
process:

host-029:~/sts-pacemaker/QAlib $ python ./unittests/lock.py -f /exports/0/urandom
Trying to acquire lock of file /exports/0/urandom with lockf (wait=False)...
Lock acquired.
Press Enter to unlock.^Z
[1]+ Stopped python ./unittests/lock.py -f /exports/0/urandom
host-029:~/sts-pacemaker/QAlib $ bg
[1]+ python ./unittests/lock.py -f /exports/0/urandom &
[1]+ Stopped python ./unittests/lock.py -f /exports/0/urandom
host-029:~/sts-pacemaker/QAlib $ python ./unittests/lock.py -f /exports/0/urandom
Trying to acquire lock of file /exports/0/urandom with lockf (wait=False)...
Traceback (most recent call last):
  File "./unittests/lock.py", line 81, in <module>
    if lock.acquire(wait=options.wait):
  File "./common.py", line 2035, in acquire
    raise QALockErrorException("Could not create lock: %s" % e)
common.QALockErrorException: Could not create lock: [Errno 11] Resource temporarily unavailable
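The attached lock.py is not reproduced in this report. Judging from the
output above, it takes a non-blocking lockf lock and holds it until told to
release; a minimal sketch of the same kind of check (hypothetical, Python 2
as used at the time, not the attached script) could look like this:

    #!/usr/bin/env python
    # Minimal non-blocking lockf check (hypothetical sketch, not the attached lock.py).
    import fcntl
    import sys

    f = open(sys.argv[1], 'r+')
    try:
        # Exclusive, non-blocking: raises IOError with errno 11 (EAGAIN)
        # if another process already holds the lock.
        fcntl.lockf(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except IOError as e:
        print("Could not create lock: %s" % e)
        sys.exit(1)
    print("Lock acquired. Press Enter to unlock.")
    raw_input()
    fcntl.lockf(f, fcntl.LOCK_UN)
    f.close()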
---------------

On the node where the NFS server is running, the file lock is recognized by
the kernel and by NFS. Note that the statd state below lives under
/mnt/shared/nfs (the nfs_shared_infodir of the nfs-daemon resource), i.e. on
the shared volume, so the list of clients to notify moves with the server:

[root@host-030 ~]# ls -i /mnt/shared/0/urandom /mnt/shared/1/urandom
39 /mnt/shared/0/urandom
40 /mnt/shared/1/urandom
[root@host-030 ~]# cat /proc/locks
1: POSIX ADVISORY WRITE 1365 fd:02:40 0 EOF
2: POSIX ADVISORY WRITE 1368 fd:02:39 0 EOF
...
[root@host-030 ~]# find /mnt/shared/nfs/statd/
/mnt/shared/nfs/statd/
/mnt/shared/nfs/statd/sm.ha
/mnt/shared/nfs/statd/sm.ha/sm.bak
/mnt/shared/nfs/statd/sm.ha/sm
/mnt/shared/nfs/statd/sm.ha/state
/mnt/shared/nfs/statd/sm.bak
/mnt/shared/nfs/statd/sm
/mnt/shared/nfs/statd/sm/host-029.virt.lab.msp.redhat.com
/mnt/shared/nfs/statd/state
/mnt/shared/nfs/statd/nfsnotify.bu
/mnt/shared/nfs/statd/nfsnotify.bu/sm.bak
/mnt/shared/nfs/statd/nfsnotify.bu/sm
/mnt/shared/nfs/statd/nfsnotify.bu/sm/host-029.virt.lab.msp.redhat.com
/mnt/shared/nfs/statd/nfsnotify.bu/state

-------------------

Issue 'pcs resource move':

[root@host-030 ~]# pcs resource move hanfs
[root@host-030 ~]# pcs status
Cluster name: STSRHTS30662
Last updated: Tue Jan 27 14:20:19 2015
Last change: Tue Jan 27 14:20:07 2015
Stack: corosync
Current DC: host-030 (1) - partition with quorum
Version: 1.1.12-a14efad
2 Nodes configured
14 Resources configured

Online: [ host-030 host-031 ]

Full list of resources:

 fence-host-030 (stonith:fence_xvm): Started host-030
 fence-host-031 (stonith:fence_xvm): Started host-031
 Clone Set: dlm-clone [dlm]
     Started: [ host-030 host-031 ]
 Clone Set: clvmd-clone [clvmd]
     Started: [ host-030 host-031 ]
 Resource Group: hanfs
     havg (ocf::heartbeat:LVM): Started host-031
     mnt-shared (ocf::heartbeat:Filesystem): Started host-031
     nfs-daemon (ocf::heartbeat:nfsserver): Started host-031
     export-root (ocf::heartbeat:exportfs): Started host-031
     export0 (ocf::heartbeat:exportfs): Started host-031
     export1 (ocf::heartbeat:exportfs): Started host-031
     vip (ocf::heartbeat:IPaddr2): Started host-031
     nfs-notify (ocf::heartbeat:nfsnotify): Started host-031

PCSD Status:
 host-030: Online
 host-031: Online

Daemon Status:
 corosync: active/enabled
 pacemaker: active/enabled
 pcsd: active/enabled

-------------------

Check again that the lock is still held by the original process (i.e., no new
process can get the exclusive lock):

host-029:~/sts-pacemaker/QAlib $ python ./unittests/lock.py -f /exports/0/urandom
Trying to acquire lock of file /exports/0/urandom with lockf (wait=False)...
Traceback (most recent call last):
  File "./unittests/lock.py", line 81, in <module>
    if lock.acquire(wait=options.wait):
  File "./common.py", line 2035, in acquire
    raise QALockErrorException("Could not create lock: %s" % e)
common.QALockErrorException: Could not create lock: [Errno 11] Resource temporarily unavailable

-------------------

Check that the lock is recognized by the kernel and by NFS on the new node
where the NFS server is running:

[root@host-031 ~]# cat /proc/locks
1: POSIX ADVISORY WRITE 19799 fd:02:39 0 EOF
1: POSIX ADVISORY WRITE 19790 fd:02:40 0 EOF
...
[root@host-031 ~]# find /mnt/shared/nfs/statd/
/mnt/shared/nfs/statd/
/mnt/shared/nfs/statd/sm.ha
/mnt/shared/nfs/statd/sm.ha/sm.bak
/mnt/shared/nfs/statd/sm.ha/sm
/mnt/shared/nfs/statd/sm.ha/sm/host-029.virt.lab.msp.redhat.com
/mnt/shared/nfs/statd/sm.ha/state
/mnt/shared/nfs/statd/sm.bak
/mnt/shared/nfs/statd/sm
/mnt/shared/nfs/statd/sm/host-029.virt.lab.msp.redhat.com
/mnt/shared/nfs/statd/state
/mnt/shared/nfs/statd/nfsnotify.bu
/mnt/shared/nfs/statd/nfsnotify.bu/sm.bak
/mnt/shared/nfs/statd/nfsnotify.bu/sm
/mnt/shared/nfs/statd/nfsnotify.bu/sm/host-029.virt.lab.msp.redhat.com
/mnt/shared/nfs/statd/nfsnotify.bu/state
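The fd:02:39 and fd:02:40 fields in the /proc/locks output are
MAJOR:MINOR:INODE values (hex device numbers, decimal inode), matching the
inodes reported by ls -i on host-030. A quick hypothetical cross-check, not
part of the original verification:

    # %D prints the device number in hex and %i the inode in decimal,
    # for comparison against the fd:02:39 / fd:02:40 entries above.
    stat -c 'dev=%D inode=%i file=%n' /mnt/shared/0/urandom /mnt/shared/1/urandom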
-------------------

The lock is gone as soon as the process holding it releases it:

host-029:~/sts-pacemaker/QAlib $ fg
python ./unittests/lock.py -f /exports/0/urandom
Trying to release lock of file /exports/0/urandom...
Lock released for file /exports/0/urandom <closed file '/exports/0/urandom', mode 'r+' at 0x128cdb0>

Created attachment 985119 [details]
lock.py, the Python script used to lock files
Since the problem described in this bug report should be resolved in a recent
advisory, it has been closed with a resolution of ERRATA. For information on
the advisory, and where to find the updated files, follow the link below. If
the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-0351.html