Description of problem:
-----------------------
4-node cluster, 4 clients accessing the export via NFSv4. Kill NFS-Ganesha on any node. The grace period should be entered, and any and all IO should halt for 90 seconds. I observed that the other clients continued running their IO, which is unusual. This may be a regression introduced in the latest pacemaker/corosync bits.

============================================================================

Ken Gaillot writes in email:

> clone RA is created with
>
> pcs resource create nfs-grace ocf:heartbeat:ganesha_grace --clone meta notify=true

With the above command, pcs puts the notify=true meta-attribute on the primitive instead of the clone. Looking at the pcs help, that seems expected (--clone notify=true would put it on the clone, meta notify=true puts it on the primitive). If you drop the "meta" above, I think it will work again.

If that exact command worked on 7.3, pcs behavior might have changed. Double-check, and if so, I'll ask the pcs devs to look into it.

============================================================================

Indeed, changing the resource create command from `pcs resource create nfs-grace ocf:heartbeat:ganesha_grace --clone meta notify=true` to `pcs resource create nfs-grace ocf:heartbeat:ganesha_grace --clone notify=true` restores the original behavior seen in RHEL 7.3 and earlier. Please check with the pcs devs; we will confirm that the changed command also works correctly on RHEL 7.3.
REVIEW: https://review.gluster.org/17534 (common-ha: surviving ganesha.nfsd not put in grace on fail-over) posted (#1) for review on release-3.10 by Kaleb KEITHLEY (kkeithle)
COMMIT: https://review.gluster.org/17534 committed in release-3.10 by Kaleb KEITHLEY (kkeithle)
------
commit ee1a7560f8c27cc2721347dae37729aba9bac2d6
Author: Kaleb S. KEITHLEY <kkeithle>
Date:   Tue Jun 13 07:36:50 2017 -0400

    common-ha: surviving ganesha.nfsd not put in grace on fail-over

    A behavior change is seen in the new HA in RHEL 7.4 Beta.

    Up to now, clone RAs have been created with "pcs resource create ... meta notify=true". Their notify method is invoked with pre-start or post-stop when one of the clone RAs is started or stopped.

    In 7.4 Beta, we observe that the notify method is not invoked when one of the clones is stopped (or started).

    Ken Gaillot, one of the pacemaker devs, wrote:

      With the above command, pcs puts the notify=true meta-attribute on the primitive instead of the clone. Looking at the pcs help, that seems expected (--clone notify=true would put it on the clone, meta notify=true puts it on the primitive). If you drop the "meta" above, I think it will work again.

    And indeed, his suggested fix does work on both RHEL 7.4 Beta and RHEL 7.3, and presumably on Fedora.

    Change-Id: Idbb539f1366df6d39f77431c357dff4e53a2df6d
    BUG: 1461019
    Signed-off-by: Kaleb S. KEITHLEY <kkeithle>
    Reviewed-on: https://review.gluster.org/17534
    Smoke: Gluster Build System <jenkins.org>
    Reviewed-by: soumya k <skoduri>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
This bug is reported against a version of Gluster that is no longer maintained (or has been EOL'd). See https://www.gluster.org/release-schedule/ for the versions currently maintained.

As a result, this bug is being closed.

If the bug persists on a maintained version of Gluster or against the mainline Gluster repository, request that it be reopened and that the Version field be marked appropriately.