Bug 1334092
Summary: | [NFS-Ganesha] : stonith-enabled option not set with new versions of cman, pacemaker, corosync and pcs
---|---
Product: | [Red Hat Storage] Red Hat Gluster Storage
Reporter: | Ambarish <asoman>
Component: | nfs-ganesha
Assignee: | Kaleb KEITHLEY <kkeithle>
Status: | CLOSED ERRATA
QA Contact: | Shashank Raj <sraj>
Severity: | high
Docs Contact: |
Priority: | unspecified
Version: | rhgs-3.1
CC: | asoman, asrivast, jthottan, kgaillot, kkeithle, ndevos, nlevinki, rcyriac, rhinduja, sashinde, skoduri, sraj
Target Milestone: | ---
Keywords: | ZStream
Target Release: | RHGS 3.1.3
Hardware: | x86_64
OS: | Linux
Whiteboard: |
Fixed In Version: | glusterfs-3.7.9-7
Doc Type: | Bug Fix
Doc Text: | This update includes a new version of Pacemaker that contains changes related to the selection of the Designated Co-ordinator (DC). This updated Pacemaker version caused attempts to set the stonith-enabled property to fail, which meant that set-up and operation did not behave as expected. The setup process now waits for DC selection to complete before setting the stonith-enabled property and continuing with the remainder of the setup.
Story Points: | ---
Clone Of: |
: | 1336945 (view as bug list)
Environment: |
Last Closed: | 2016-06-23 05:21:34 UTC
Type: | Bug
Regression: | ---
Mount Type: | ---
Documentation: | ---
CRM: |
Verified Versions: |
Category: | ---
oVirt Team: | ---
RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: |
Target Upstream Version: |
Embargoed: |
Bug Depends On: |
Bug Blocks: | 1311817, 1336945, 1336947, 1336948
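The Doc Text above says the fix makes setup wait for DC election to complete before setting the stonith-enabled property. A minimal illustrative sketch of that wait, assuming `pcs` is on the PATH; `get_current_dc` and `wait_for_dc` are hypothetical helper names, not the shipped ganesha-ha code:

```shell
# Read `pcs status` text on stdin and print the elected DC's hostname.
# Prints nothing while no DC has been elected (the "Current DC:" line
# is absent or reads "Current DC: NONE", which has no "(version ...)").
get_current_dc() {
    sed -n 's/^Current DC: \([^ ]*\) (.*/\1/p'
}

# Poll `pcs status` until a DC appears; only then is it safe to run
# `pcs property set stonith-enabled=false`.
wait_for_dc() {
    tries=0
    while [ "$tries" -lt 30 ]; do
        dc=$(pcs status 2>/dev/null | get_current_dc)
        if [ -n "$dc" ]; then
            echo "DC elected: $dc"
            return 0
        fi
        sleep 2
        tries=$((tries + 1))
    done
    echo "timed out waiting for DC election" >&2
    return 1
}
```

The polling-with-timeout shape matters here: setting the property before election completes is exactly the failure mode this bug describes.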
Description (Ambarish, 2016-05-08 10:02:36 UTC)
I upgraded my setup with the same versions and I am able to reproduce the issue:

    [root@dhcp43-33 ~]# rpm -qa|grep pacemaker
    pacemaker-1.1.14-8.el6.x86_64
    pacemaker-libs-1.1.14-8.el6.x86_64
    pacemaker-cli-1.1.14-8.el6.x86_64
    pacemaker-cluster-libs-1.1.14-8.el6.x86_64
    [root@dhcp43-33 ~]# rpm -qa|grep pcs
    pcsc-lite-libs-1.5.2-15.el6.x86_64
    pcs-0.9.148-7.el6.x86_64
    [root@dhcp43-33 ~]# rpm -qa|grep cman
    cman-3.0.12.1-78.el6.x86_64
    [root@dhcp43-33 ~]# rpm -qa|grep corosync
    corosync-1.4.7-5.el6.x86_64
    corosynclib-1.4.7-5.el6.x86_64

The ganesha setup is successful, but if we check the pcs status, the nodes are shown in the Stopped state with the message "WARNING: no stonith devices and stonith-enabled is not false":

    [root@dhcp43-33 ~]# pcs status
    Cluster name: G1462802414.82
    WARNING: no stonith devices and stonith-enabled is not false
    Last updated: Tue May 10 01:06:25 2016
    Last change: Tue May 10 01:01:23 2016 by root via cibadmin on dhcp43-33.lab.eng.blr.redhat.com
    Stack: cman
    Current DC: dhcp43-33.lab.eng.blr.redhat.com (version 1.1.14-8.el6-70404b0) - partition with quorum
    4 nodes and 16 resources configured
    Online: [ dhcp42-11.lab.eng.blr.redhat.com dhcp42-78.lab.eng.blr.redhat.com dhcp43-33.lab.eng.blr.redhat.com dhcp43-40.lab.eng.blr.redhat.com ]
    Full list of resources:
     Clone Set: nfs_setup-clone [nfs_setup]
         Stopped: [ dhcp42-11.lab.eng.blr.redhat.com dhcp42-78.lab.eng.blr.redhat.com dhcp43-33.lab.eng.blr.redhat.com dhcp43-40.lab.eng.blr.redhat.com ]
     Clone Set: nfs-mon-clone [nfs-mon]
         Started: [ dhcp42-11.lab.eng.blr.redhat.com dhcp42-78.lab.eng.blr.redhat.com dhcp43-33.lab.eng.blr.redhat.com dhcp43-40.lab.eng.blr.redhat.com ]
     Clone Set: nfs-grace-clone [nfs-grace]
         Started: [ dhcp42-11.lab.eng.blr.redhat.com dhcp42-78.lab.eng.blr.redhat.com dhcp43-33.lab.eng.blr.redhat.com dhcp43-40.lab.eng.blr.redhat.com ]
     dhcp43-33.lab.eng.blr.redhat.com-cluster_ip-1 (ocf::heartbeat:IPaddr): Stopped
     dhcp43-40.lab.eng.blr.redhat.com-cluster_ip-1 (ocf::heartbeat:IPaddr): Stopped
     dhcp42-11.lab.eng.blr.redhat.com-cluster_ip-1 (ocf::heartbeat:IPaddr): Stopped
     dhcp42-78.lab.eng.blr.redhat.com-cluster_ip-1 (ocf::heartbeat:IPaddr): Stopped
    PCSD Status:
      dhcp43-33.lab.eng.blr.redhat.com: Online
      dhcp43-40.lab.eng.blr.redhat.com: Online
      dhcp42-11.lab.eng.blr.redhat.com: Online
      dhcp42-78.lab.eng.blr.redhat.com: Online

All the corresponding services are up and running:

    [root@dhcp43-33 yum.repos.d]# service nfs-ganesha status
    ganesha.nfsd (pid 14058) is running...
    [root@dhcp43-33 yum.repos.d]# service pcsd status
    pcsd (pid 14809) is running...
    [root@dhcp43-33 yum.repos.d]# service pacemaker status
    pacemakerd (pid 14755) is running...
    [root@dhcp43-33 yum.repos.d]# service corosync status
    corosync (pid 14472) is running...

The volume can be exported properly, but it cannot be mounted. The following messages can be seen in /var/log/messages:

    May 10 01:01:23 dhcp43-33 pengine[14765]: error: Resource start-up disabled since no STONITH resources have been defined
    May 10 01:01:23 dhcp43-33 pengine[14765]: error: Either configure some or disable STONITH with the stonith-enabled option
    May 10 01:01:23 dhcp43-33 pengine[14765]: error: NOTE: Clusters with shared data need STONITH to ensure data integrity

The pcs property status shows the property is not set:

    [root@dhcp43-33 ~]# pcs property show stonith-enabled
    Cluster Properties:

I tried setting it manually:

    [root@dhcp43-33 ~]# pcs property set stonith-enabled=false
    [root@dhcp43-33 ~]# pcs property show stonith-enabled
    Cluster Properties:
     stonith-enabled: false

After that, pcs status shows the status properly:

    [root@dhcp43-33 ~]# pcs status
    Cluster name: G1462802414.82
    Last updated: Tue May 10 01:19:18 2016
    Last change: Tue May 10 01:18:21 2016 by root via cibadmin on dhcp43-33.lab.eng.blr.redhat.com
    Stack: cman
    Current DC: dhcp43-33.lab.eng.blr.redhat.com (version 1.1.14-8.el6-70404b0) - partition with quorum
    4 nodes and 16 resources configured
    Online: [ dhcp42-11.lab.eng.blr.redhat.com dhcp42-78.lab.eng.blr.redhat.com dhcp43-33.lab.eng.blr.redhat.com dhcp43-40.lab.eng.blr.redhat.com ]
    Full list of resources:
     Clone Set: nfs_setup-clone [nfs_setup]
         Started: [ dhcp42-11.lab.eng.blr.redhat.com dhcp42-78.lab.eng.blr.redhat.com dhcp43-33.lab.eng.blr.redhat.com dhcp43-40.lab.eng.blr.redhat.com ]
     Clone Set: nfs-mon-clone [nfs-mon]
         Started: [ dhcp42-11.lab.eng.blr.redhat.com dhcp42-78.lab.eng.blr.redhat.com dhcp43-33.lab.eng.blr.redhat.com dhcp43-40.lab.eng.blr.redhat.com ]
     Clone Set: nfs-grace-clone [nfs-grace]
         Started: [ dhcp42-11.lab.eng.blr.redhat.com dhcp42-78.lab.eng.blr.redhat.com dhcp43-33.lab.eng.blr.redhat.com dhcp43-40.lab.eng.blr.redhat.com ]
     dhcp43-33.lab.eng.blr.redhat.com-cluster_ip-1 (ocf::heartbeat:IPaddr): Started dhcp43-33.lab.eng.blr.redhat.com
     dhcp43-40.lab.eng.blr.redhat.com-cluster_ip-1 (ocf::heartbeat:IPaddr): Started dhcp43-40.lab.eng.blr.redhat.com
     dhcp42-11.lab.eng.blr.redhat.com-cluster_ip-1 (ocf::heartbeat:IPaddr): Started dhcp42-11.lab.eng.blr.redhat.com
     dhcp42-78.lab.eng.blr.redhat.com-cluster_ip-1 (ocf::heartbeat:IPaddr): Started dhcp42-78.lab.eng.blr.redhat.com
    PCSD Status:
      dhcp43-33.lab.eng.blr.redhat.com: Online
      dhcp43-40.lab.eng.blr.redhat.com: Online
      dhcp42-11.lab.eng.blr.redhat.com: Online
      dhcp42-78.lab.eng.blr.redhat.com: Online

After that I am able to mount the volume and perform I/O from the mount point. Changing the title accordingly, as setting up ganesha doesn't fail.

I set up a four-node cluster with RHEL 6.8 Beta. I used the default HA components available from the 6.8 HA channel, i.e.:

    pcs-0.9.139-9.el6_7.2.x86_64
    pacemaker-1.1.12-8.el6_7.2.x86_64
    corosync-1.4.7-5.el6.x86_64
    cman-3.0.12.1-78.el6.x86_64

I have nothing in my logs about failing to set stonith. We did have an issue with some of our RHEL 7 installs getting older versions than what was in the HA channel, and I requested that we be sure we were getting the correct (latest) versions. But I'm not sure why we're trying to use newer versions than what's in the HA channel.
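The manual workaround shown above (`pcs property set stonith-enabled=false` when the property was never set) can be scripted defensively. A minimal sketch; `stonith_property_is_set` and `ensure_stonith_disabled` are hypothetical helper names, and the second function assumes `pcs` is on the PATH:

```shell
# Read `pcs property show stonith-enabled` output on stdin; succeed only
# if the property line is actually present (in the failure reported here,
# `pcs property show stonith-enabled` prints just "Cluster Properties:").
stonith_property_is_set() {
    grep -q '^[[:space:]]*stonith-enabled:'
}

# Apply the workaround from the comments: set stonith-enabled=false only
# when it was never set, then show the resulting property.
ensure_stonith_disabled() {
    if ! pcs property show stonith-enabled | stonith_property_is_set; then
        pcs property set stonith-enabled=false
    fi
    pcs property show stonith-enabled
}
```

Note the comment below from the Pacemaker side still applies: disabling STONITH is only appropriate where fencing is deliberately not configured; the supported fix is to configure and test fencing devices.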
Requesting input from Ken Gaillot or Andy Beekhof. (Too bad there's no way to put needinfo on more than one person.)

To be clear(er): I'm not sure why we're trying to use newer versions than what's in the HA channel for RHEL 6.

RHEL 6.8 does have 1.1.14-8; not sure why the channel isn't showing that.

I tried installing/updating the pcs and pacemaker packages on an ISO-installed RHGS 3.1.2 after subscribing to the RHEL 6 HA channel, and it pulls the latest versions of pcs and pacemaker:

    subscription-manager repos --enable=rhel-6-server-rpms --enable=rhel-scalefs-for-rhel-6-server-rpms --enable=rhs-3-for-rhel-6-server-rpms --enable=rh-gluster-3-nfs-for-rhel-6-server-rpms --enable=rhel-ha-for-rhel-6-server-rpms

    [root@dhcp43-67 yum.repos.d]# yum install pacemaker
    Dependencies Resolved
    =====================================================================================================
     Package                 Arch    Version               Repository                      Size
    =====================================================================================================
    Installing:
     pacemaker               x86_64  1.1.14-8.el6          rhel-ha-for-rhel-6-server-rpms  461 k
    Installing for dependencies:
     cifs-utils              x86_64  4.8.1-20.el6          rhel-6-server-rpms               65 k
     libqb                   x86_64  0.17.1-2.el6          rhel-ha-for-rhel-6-server-rpms   71 k
     libtool-ltdl            x86_64  2.2.6-15.5.el6        rhel-6-server-rpms               44 k
     pacemaker-cli           x86_64  1.1.14-8.el6          rhel-ha-for-rhel-6-server-rpms  230 k
     pacemaker-cluster-libs  x86_64  1.1.14-8.el6          rhel-ha-for-rhel-6-server-rpms   84 k
     pacemaker-libs          x86_64  1.1.14-8.el6          rhel-ha-for-rhel-6-server-rpms  478 k
     perl-TimeDate           noarch  1:1.16-13.el6         rhel-6-server-rpms               37 k
     resource-agents         x86_64  3.9.5-34.el6_8.2      rhel-ha-for-rhel-6-server-rpms  386 k
     samba-common            x86_64  3.6.509-169.6.el6rhs  rhs-3-for-rhel-6-server-rpms     10 M
     samba-winbind           x86_64  3.6.509-169.6.el6rhs  rhs-3-for-rhel-6-server-rpms    2.2 M
     samba-winbind-clients   x86_64  3.6.509-169.6.el6rhs  rhs-3-for-rhel-6-server-rpms    2.0 M

    [root@dhcp43-67 yum.repos.d]# yum update pcs
    Dependencies Resolved
    =====================================================================================================
     Package         Arch    Version        Repository                      Size
    =====================================================================================================
    Updating:
     pcs             x86_64  0.9.148-7.el6  rhel-ha-for-rhel-6-server-rpms  5.3 M
    Updating for dependencies:
     python-clufter  x86_64  0.56.2-1.el6   rhel-ha-for-rhel-6-server-rpms  352 k

So in this case we need the fix for this bug; otherwise all customers updating to the pcs and pacemaker packages available in the RHEL 6 base HA channel will hit this issue. Proposing it as a blocker for 3.1.3.

It's actually the previous behavior that could be considered a bug; Red Hat does not support HA clusters without properly configured fencing. The correct fix for this issue is to configure and test fencing devices.

Verified this bug with the latest glusterfs-3.7.9-7 and nfs-ganesha-2.3.1-7 builds, and the issue reported in this bug is fixed. Earlier, after setting up the ganesha cluster, the stonith-enabled value was not getting set, because of which the nodes remained in the Stopped state. With the latest build it is getting set, and no issues related to stonith-enabled are seen:

    [root@dhcp43-119 ~]# pcs property show
    Cluster Properties:
     cluster-infrastructure: cman
     dc-version: 1.1.14-8.el6-70404b0
     have-watchdog: false
     no-quorum-policy: ignore
     stonith-enabled: false
    Node Attributes:
     dhcp42-33.lab.eng.blr.redhat.com: grace-active=1
     dhcp43-119.lab.eng.blr.redhat.com: grace-active=1

However, there is a new bug for RHEL 6.8 which still makes the nodes stay in the Stopped state, along with other grace-related failures; it is being tracked under the bug below:

https://bugzilla.redhat.com/show_bug.cgi?id=1341567

Based on the above observation, marking this bug as Verified.

Requested doc text provided.

The user doesn't need to wait. (This isn't a user-visible change, per se.) The setup process (initiated by issuing a `gluster nfs-ganesha enable` command) has been fixed so that it waits as necessary.

I've made a slight change to the doc text. Otherwise it looks fine.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1240