Bug 1461019 - [Ganesha] : Grace period is not being adhered to on RHEL 7.4; Clients continue running IO even during grace.
Status: MODIFIED
Product: GlusterFS
Classification: Community
Component: common-ha
Version: 3.10
Hardware: x86_64 Linux
Priority: unspecified
Severity: high
Assigned To: Kaleb KEITHLEY
Keywords: Triaged
Depends On: 1457179
Blocks: glusterfs-3.10.4
Reported: 2017-06-13 07:30 EDT by Kaleb KEITHLEY
Modified: 2017-07-05 17:11 EDT (History)
Clone Of: 1457179
Type: Bug


Comment 1 Kaleb KEITHLEY 2017-06-13 07:32:51 EDT
Description of problem:
-----------------------

4-node cluster, 4 clients accessing the export via NFSv4.

Kill NFS-Ganesha on any node.

Grace period should be entered and any and all IO should halt for 90 seconds.

I observed that the other clients continued running their IO, which is unexpected.

This may be a regression introduced in the latest pacemaker/corosync bits.

============================================================================

Ken Gaillot writes in email:
> clone RA is created with
>
>   pcs resource create nfs-grace ocf:heartbeat:ganesha_grace --clone meta notify=true

With the above command, pcs puts the notify=true meta-attribute on the
primitive instead of the clone. Looking at the pcs help, that seems
expected (--clone notify=true would put it on the clone, meta
notify=true puts it on the primitive). If you drop the "meta" above, I
think it will work again.

If that exact command worked on 7.3, pcs behavior might have changed.
Double-check, and if so, I'll ask the pcs devs to look into it.

============================================================================
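To make the distinction in the email concrete, here is a minimal sketch of the two placements of the notify meta-attribute. The CIB fragments and element ids are hypothetical illustrations, not copied from a real cluster, and `notify_location` is a helper invented for this sketch.

```sh
#!/bin/sh
# Hypothetical CIB fragments; the ids are illustrative only.

# Attribute on the primitive (what "--clone meta notify=true" is reported to produce).
cib_on_primitive='<clone id="nfs-grace-clone">
  <primitive id="nfs-grace" class="ocf" provider="heartbeat" type="ganesha_grace">
    <meta_attributes id="nfs-grace-meta_attributes">
      <nvpair id="nfs-grace-meta_attributes-notify" name="notify" value="true"/>
    </meta_attributes>
  </primitive>
</clone>'

# Attribute on the clone (what "--clone notify=true" is reported to produce).
cib_on_clone='<clone id="nfs-grace-clone">
  <primitive id="nfs-grace" class="ocf" provider="heartbeat" type="ganesha_grace"/>
  <meta_attributes id="nfs-grace-clone-meta_attributes">
    <nvpair id="nfs-grace-clone-meta_attributes-notify" name="notify" value="true"/>
  </meta_attributes>
</clone>'

# Print "primitive", "clone", or "absent" depending on where name="notify" appears.
# Assumes one XML element per line, as in the fragments above.
notify_location() {
  printf '%s\n' "$1" | awk '
    /<primitive[^>]*\/>/ { next }          # self-closing primitive has no children
    /<primitive/         { in_prim = 1 }
    /<\/primitive>/      { in_prim = 0 }
    /name="notify"/      { print (in_prim ? "primitive" : "clone"); found = 1 }
    END                  { if (!found) print "absent" }
  '
}

notify_location "$cib_on_primitive"
notify_location "$cib_on_clone"
```

On a live cluster the same check could presumably be fed from `cibadmin --query --scope resources`, assuming its output keeps one element per line.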

Indeed changing the resource create from

  `pcs resource create nfs-grace ocf:heartbeat:ganesha_grace --clone meta notify=true`

to

  `pcs resource create nfs-grace ocf:heartbeat:ganesha_grace --clone notify=true`

restores the original behavior seen in RHEL 7.3 and earlier.

Please check with the pcs devs and we will confirm that the changed command also works correctly on RHEL 7.3.
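For scripts that still carry the old form, the change described above can be sketched as a small text rewrite. This is only an illustration: `fix_clone_notify` is a hypothetical helper, it handles only the exact adjacent `--clone meta notify=true` form from this report, and it is not the patch that was posted for review.

```sh
#!/bin/sh
# Rewrite "--clone meta notify=true" to "--clone notify=true" on stdin.
# If other meta options follow notify=true they would move to the clone as well,
# so review any changed line by hand.
fix_clone_notify() {
  sed -E 's/--clone[[:space:]]+meta[[:space:]]+notify=true/--clone notify=true/g'
}

old='pcs resource create nfs-grace ocf:heartbeat:ganesha_grace --clone meta notify=true'
printf '%s\n' "$old" | fix_clone_notify
```

Lines that already use the clone-level form pass through unchanged.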
Comment 2 Worker Ant 2017-06-13 07:52:07 EDT
REVIEW: https://review.gluster.org/17534 (common-ha: surviving ganesha.nfsd not put in grace on fail-over) posted (#1) for review on release-3.10 by Kaleb KEITHLEY (kkeithle@redhat.com)
Comment 3 Worker Ant 2017-06-14 07:46:38 EDT
COMMIT: https://review.gluster.org/17534 committed in release-3.10 by Kaleb KEITHLEY (kkeithle@redhat.com) 
------
commit ee1a7560f8c27cc2721347dae37729aba9bac2d6
Author: Kaleb S. KEITHLEY <kkeithle@redhat.com>
Date:   Tue Jun 13 07:36:50 2017 -0400

    common-ha: surviving ganesha.nfsd not put in grace on fail-over
    
    Behavior change is seen in new HA in RHEL 7.4 Beta. Up to now clone
    RAs have been created with "pcs resource create ... meta notify=true".
    Their notify method is invoked with pre-start or post-stop when one of
    the clone RAs is started or stopped.
    
    In 7.4 Beta we observe that the notify method is not invoked when
    one of the clones is stopped (or started).
    
    Ken Gaillot, one of the pacemaker devs, wrote:
      With the above command, pcs puts the notify=true meta-attribute
      on the primitive instead of the clone. Looking at the pcs help,
      that seems expected (--clone notify=true would put it on the clone,
      meta notify=true puts it on the primitive). If you drop the "meta"
      above, I think it will work again.
    
    And indeed his suggested fix does work on both RHEL 7.4 Beta and RHEL
    7.3 and presumably Fedora.
    
    Change-Id: Idbb539f1366df6d39f77431c357dff4e53a2df6d
    BUG: 1461019
    Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
    Reviewed-on: https://review.gluster.org/17534
    Smoke: Gluster Build System <jenkins@build.gluster.org>
    Reviewed-by: soumya k <skoduri@redhat.com>
    NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
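
The notify behavior described in the commit message can be sketched roughly as follows. Pacemaker passes the notification phase and operation to a resource agent in the `OCF_RESKEY_CRM_meta_notify_type` and `OCF_RESKEY_CRM_meta_notify_operation` environment variables. The function names and the `enter_grace` placeholder are hypothetical, and the choice of pre-start and post-stop is taken from the commit message wording, not from the ganesha_grace source.

```sh
#!/bin/sh
# Sketch of a notify handler for a cloned resource agent.

enter_grace() {
  # Hypothetical placeholder: the real agent's mechanism is not shown in this report.
  echo "entering grace"
}

grace_notify() {
  # Combine phase (pre/post) and operation (start/stop) into one key.
  mode="${OCF_RESKEY_CRM_meta_notify_type}-${OCF_RESKEY_CRM_meta_notify_operation}"
  case "$mode" in
    pre-start|post-stop) enter_grace ;;
    *) : ;;   # other notifications are ignored in this sketch
  esac
  return 0
}

# Demo: simulate the notification sent after a peer clone instance stops.
OCF_RESKEY_CRM_meta_notify_type=post
OCF_RESKEY_CRM_meta_notify_operation=stop
grace_notify
```

If notify=true sits on the primitive and not on the clone, as in this bug, a handler like this is never called, so the surviving servers never enter grace.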
