Bug 1334092
| Summary: | [NFS-Ganesha] : stonith-enabled option not set with new versions of cman,pacemaker,corosync and pcs | |||
|---|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Ambarish <asoman> | |
| Component: | nfs-ganesha | Assignee: | Kaleb KEITHLEY <kkeithle> | |
| Status: | CLOSED ERRATA | QA Contact: | Shashank Raj <sraj> | |
| Severity: | high | Docs Contact: | ||
| Priority: | unspecified | |||
| Version: | rhgs-3.1 | CC: | asoman, asrivast, jthottan, kgaillot, kkeithle, ndevos, nlevinki, rcyriac, rhinduja, sashinde, skoduri, sraj | |
| Target Milestone: | --- | Keywords: | ZStream | |
| Target Release: | RHGS 3.1.3 | |||
| Hardware: | x86_64 | |||
| OS: | Linux | |||
| Whiteboard: | ||||
| Fixed In Version: | glusterfs-3.7.9-7 | Doc Type: | Bug Fix | |
| Doc Text: | This update includes a new version of Pacemaker that contains changes related to the selection of the Designated Co-ordinator (DC). This updated Pacemaker version caused attempts to set the stonith-enabled property to fail, which meant that set-up and operation did not behave as expected. The setup process now waits for DC selection to complete before setting the stonith-enabled property and continuing with the remainder of the setup. | Story Points: | --- | |
| Clone Of: | ||||
| Clones: | 1336945 | Environment: | ||
| Last Closed: | 2016-06-23 05:21:34 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | ||||
| Bug Blocks: | 1311817, 1336945, 1336947, 1336948 | |||
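The Doc Text says the fix makes the setup wait for DC selection to complete before setting stonith-enabled. The following is a minimal sketch of that ordering, not the code shipped in glusterfs-3.7.9-7: `wait_until` and `disable_stonith_after_dc` are hypothetical helper names, and the sketch assumes that `crmadmin --dc_lookup` exits non-zero while no DC has been elected.

```shell
#!/bin/sh
# Sketch only; not the code shipped in glusterfs-3.7.9-7.

# wait_until MAX_TRIES INTERVAL COMMAND [ARGS...]
# Runs COMMAND until it exits 0; gives up after MAX_TRIES failed attempts.
wait_until() {
    _wu_max=$1
    _wu_interval=$2
    shift 2
    _wu_tries=0
    while ! "$@" >/dev/null 2>&1; do
        _wu_tries=$((_wu_tries + 1))
        if [ "$_wu_tries" -ge "$_wu_max" ]; then
            return 1
        fi
        sleep "$_wu_interval"
    done
    return 0
}

# Hypothetical setup step: only write to the CIB once a DC has been elected.
# Assumes `crmadmin --dc_lookup` exits non-zero while there is no DC.
disable_stonith_after_dc() {
    if ! wait_until 60 1 crmadmin --dc_lookup; then
        echo "no DC elected; stonith-enabled not set" >&2
        return 1
    fi
    pcs property set stonith-enabled=false
}
```

Bounding the number of attempts keeps the setup from hanging forever if the cluster never forms; the 60 tries at 1-second intervals are arbitrary values chosen for illustration.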
I upgraded my setup to the same versions and I am able to reproduce the issue:
[root@dhcp43-33 ~]# rpm -qa|grep pacemaker
pacemaker-1.1.14-8.el6.x86_64
pacemaker-libs-1.1.14-8.el6.x86_64
pacemaker-cli-1.1.14-8.el6.x86_64
pacemaker-cluster-libs-1.1.14-8.el6.x86_64
[root@dhcp43-33 ~]# rpm -qa|grep pcs
pcsc-lite-libs-1.5.2-15.el6.x86_64
pcs-0.9.148-7.el6.x86_64
[root@dhcp43-33 ~]# rpm -qa|grep cman
cman-3.0.12.1-78.el6.x86_64
[root@dhcp43-33 ~]# rpm -qa|grep corosync
corosync-1.4.7-5.el6.x86_64
corosynclib-1.4.7-5.el6.x86_64
Ganesha setup is successful, but pcs status shows the resources on all nodes as Stopped, with the message "WARNING: no stonith devices and stonith-enabled is not false":
[root@dhcp43-33 ~]# pcs status
Cluster name: G1462802414.82
WARNING: no stonith devices and stonith-enabled is not false
Last updated: Tue May 10 01:06:25 2016 Last change: Tue May 10 01:01:23 2016 by root via cibadmin on dhcp43-33.lab.eng.blr.redhat.com
Stack: cman
Current DC: dhcp43-33.lab.eng.blr.redhat.com (version 1.1.14-8.el6-70404b0) - partition with quorum
4 nodes and 16 resources configured
Online: [ dhcp42-11.lab.eng.blr.redhat.com dhcp42-78.lab.eng.blr.redhat.com dhcp43-33.lab.eng.blr.redhat.com dhcp43-40.lab.eng.blr.redhat.com ]
Full list of resources:
Clone Set: nfs_setup-clone [nfs_setup]
Stopped: [ dhcp42-11.lab.eng.blr.redhat.com dhcp42-78.lab.eng.blr.redhat.com dhcp43-33.lab.eng.blr.redhat.com dhcp43-40.lab.eng.blr.redhat.com ]
Clone Set: nfs-mon-clone [nfs-mon]
Started: [ dhcp42-11.lab.eng.blr.redhat.com dhcp42-78.lab.eng.blr.redhat.com dhcp43-33.lab.eng.blr.redhat.com dhcp43-40.lab.eng.blr.redhat.com ]
Clone Set: nfs-grace-clone [nfs-grace]
Started: [ dhcp42-11.lab.eng.blr.redhat.com dhcp42-78.lab.eng.blr.redhat.com dhcp43-33.lab.eng.blr.redhat.com dhcp43-40.lab.eng.blr.redhat.com ]
dhcp43-33.lab.eng.blr.redhat.com-cluster_ip-1 (ocf::heartbeat:IPaddr): Stopped
dhcp43-40.lab.eng.blr.redhat.com-cluster_ip-1 (ocf::heartbeat:IPaddr): Stopped
dhcp42-11.lab.eng.blr.redhat.com-cluster_ip-1 (ocf::heartbeat:IPaddr): Stopped
dhcp42-78.lab.eng.blr.redhat.com-cluster_ip-1 (ocf::heartbeat:IPaddr): Stopped
PCSD Status:
dhcp43-33.lab.eng.blr.redhat.com: Online
dhcp43-40.lab.eng.blr.redhat.com: Online
dhcp42-11.lab.eng.blr.redhat.com: Online
dhcp42-78.lab.eng.blr.redhat.com: Online
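To spot this symptom without reading the whole `pcs status` output, a small filter can list the resources reported as Stopped. This is a sketch that assumes the plain-text layout shown above (`Clone Set:` headers followed by `Started:`/`Stopped:` lines, and primitive resources ending in `): Stopped`); `stopped_resources` is a hypothetical helper, and the layout may differ between pcs versions.

```shell
#!/bin/sh
# Hypothetical helper: reads `pcs status` text on stdin and prints the name of
# every clone set that has a Stopped: line and every primitive reported as Stopped.
# Assumes the plain-text layout shown in this report.
stopped_resources() {
    awk '
        # Remember which clone set the following Started:/Stopped: lines belong to.
        /^[[:space:]]*Clone Set:/ { clone = $3; next }
        /^[[:space:]]*Stopped:/   { if (clone != "") print clone; next }
        # Primitive lines look like "name (class::provider:type): State [node]".
        /\): / { clone = ""; if ($NF == "Stopped") print $1 }
    '
}
```

Run as `pcs status | stopped_resources`; on the output above it would list `nfs_setup-clone` and the four `cluster_ip-1` resources. The stonith warning itself can be checked with `pcs status | grep -q 'stonith-enabled is not false'`.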
All the corresponding services are up and running:
[root@dhcp43-33 yum.repos.d]# service nfs-ganesha status
ganesha.nfsd (pid 14058) is running...
[root@dhcp43-33 yum.repos.d]# service pcsd status
pcsd (pid 14809) is running...
[root@dhcp43-33 yum.repos.d]# service pacemaker status
pacemakerd (pid 14755) is running...
[root@dhcp43-33 yum.repos.d]# service corosync status
corosync (pid 14472) is running...
The volume can be exported properly but cannot be mounted.
The following messages can be seen in /var/log/messages:
May 10 01:01:23 dhcp43-33 pengine[14765]: error: Resource start-up disabled since no STONITH resources have been defined
May 10 01:01:23 dhcp43-33 pengine[14765]: error: Either configure some or disable STONITH with the stonith-enabled option
May 10 01:01:23 dhcp43-33 pengine[14765]: error: NOTE: Clusters with shared data need STONITH to ensure data integrity
>>>> pcs property show returns no value for stonith-enabled:
[root@dhcp43-33 ~]# pcs property show stonith-enabled
Cluster Properties:
>>>> Tried setting it manually:
[root@dhcp43-33 ~]# pcs property set stonith-enabled=false
[root@dhcp43-33 ~]# pcs property show stonith-enabled
Cluster Properties:
stonith-enabled: false
After that, pcs status shows the expected status:
[root@dhcp43-33 ~]# pcs status
Cluster name: G1462802414.82
Last updated: Tue May 10 01:19:18 2016 Last change: Tue May 10 01:18:21 2016 by root via cibadmin on dhcp43-33.lab.eng.blr.redhat.com
Stack: cman
Current DC: dhcp43-33.lab.eng.blr.redhat.com (version 1.1.14-8.el6-70404b0) - partition with quorum
4 nodes and 16 resources configured
Online: [ dhcp42-11.lab.eng.blr.redhat.com dhcp42-78.lab.eng.blr.redhat.com dhcp43-33.lab.eng.blr.redhat.com dhcp43-40.lab.eng.blr.redhat.com ]
Full list of resources:
Clone Set: nfs_setup-clone [nfs_setup]
Started: [ dhcp42-11.lab.eng.blr.redhat.com dhcp42-78.lab.eng.blr.redhat.com dhcp43-33.lab.eng.blr.redhat.com dhcp43-40.lab.eng.blr.redhat.com ]
Clone Set: nfs-mon-clone [nfs-mon]
Started: [ dhcp42-11.lab.eng.blr.redhat.com dhcp42-78.lab.eng.blr.redhat.com dhcp43-33.lab.eng.blr.redhat.com dhcp43-40.lab.eng.blr.redhat.com ]
Clone Set: nfs-grace-clone [nfs-grace]
Started: [ dhcp42-11.lab.eng.blr.redhat.com dhcp42-78.lab.eng.blr.redhat.com dhcp43-33.lab.eng.blr.redhat.com dhcp43-40.lab.eng.blr.redhat.com ]
dhcp43-33.lab.eng.blr.redhat.com-cluster_ip-1 (ocf::heartbeat:IPaddr): Started dhcp43-33.lab.eng.blr.redhat.com
dhcp43-40.lab.eng.blr.redhat.com-cluster_ip-1 (ocf::heartbeat:IPaddr): Started dhcp43-40.lab.eng.blr.redhat.com
dhcp42-11.lab.eng.blr.redhat.com-cluster_ip-1 (ocf::heartbeat:IPaddr): Started dhcp42-11.lab.eng.blr.redhat.com
dhcp42-78.lab.eng.blr.redhat.com-cluster_ip-1 (ocf::heartbeat:IPaddr): Started dhcp42-78.lab.eng.blr.redhat.com
PCSD Status:
dhcp43-33.lab.eng.blr.redhat.com: Online
dhcp43-40.lab.eng.blr.redhat.com: Online
dhcp42-11.lab.eng.blr.redhat.com: Online
dhcp42-78.lab.eng.blr.redhat.com: Online
and I am able to mount the volume and perform I/O from the mount point.
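The manual workaround above can be applied only when the property is not already false. A minimal sketch, assuming the `pcs property show stonith-enabled` output format shown above (only the `Cluster Properties:` header when the property is unset); `stonith_enabled_value` and `ensure_stonith_disabled` are hypothetical helper names. As noted later in this report, Red Hat does not support HA clusters without properly configured fencing, so this is a workaround rather than a recommended configuration.

```shell
#!/bin/sh
# Hypothetical helper: reads `pcs property show` output on stdin and prints the
# value of stonith-enabled. Exits non-zero when the property is not listed
# (as in the empty "Cluster Properties:" output above).
stonith_enabled_value() {
    awk -F': ' '
        $1 ~ /^[[:space:]]*stonith-enabled$/ { print $2; found = 1 }
        END { if (!found) exit 1 }
    '
}

# The manual workaround from this report, applied only when needed.
ensure_stonith_disabled() {
    current=$(pcs property show stonith-enabled | stonith_enabled_value)
    if [ "$current" != "false" ]; then
        pcs property set stonith-enabled=false
    fi
}
```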
Changing the title accordingly, as setting up Ganesha doesn't fail.
I set up a four-node cluster with RHEL 6.8 Beta. I used the default HA components available from the 6.8 HA channel, i.e.:
pcs-0.9.139-9.el6_7.2.x86_64
pacemaker-1.1.12-8.el6_7.2.x86_64
corosync-1.4.7-5.el6.x86_64
cman-3.0.12.1-78.el6.x86_64
I have nothing in my logs about failing to set stonith.
We did have an issue with some of our RHEL 7 installs getting older versions than what was in the HA channel, and I requested that we make sure we were getting the correct (latest) versions. But I'm not sure why we're trying to use newer versions than what's in the HA channel.
Requesting input from Ken Gaillot or Andy Beekhof. (Too bad there's no way to put needinfo on more than one person.)
To be clear(er): I'm not sure why we're trying to use newer versions than what's in the HA channel for RHEL 6.
RHEL 6.8 does have 1.1.14-8; not sure why the channel isn't showing that.
I tried installing/updating the pcs and pacemaker packages on an ISO-installed RHGS 3.1.2 after subscribing to the RHEL 6 HA channel, and it pulls the latest versions of pcs and pacemaker:
subscription-manager repos --enable=rhel-6-server-rpms --enable=rhel-scalefs-for-rhel-6-server-rpms --enable=rhs-3-for-rhel-6-server-rpms --enable=rh-gluster-3-nfs-for-rhel-6-server-rpms --enable=rhel-ha-for-rhel-6-server-rpms
--------------------------------------------------------------------------
[root@dhcp43-67 yum.repos.d]# yum install pacemaker
Dependencies Resolved
================================================================================
Package Arch Version Repository Size
================================================================================
Installing:
pacemaker x86_64 1.1.14-8.el6 rhel-ha-for-rhel-6-server-rpms 461 k
Installing for dependencies:
cifs-utils x86_64 4.8.1-20.el6 rhel-6-server-rpms 65 k
libqb x86_64 0.17.1-2.el6 rhel-ha-for-rhel-6-server-rpms 71 k
libtool-ltdl x86_64 2.2.6-15.5.el6 rhel-6-server-rpms 44 k
pacemaker-cli x86_64 1.1.14-8.el6 rhel-ha-for-rhel-6-server-rpms 230 k
pacemaker-cluster-libs x86_64 1.1.14-8.el6 rhel-ha-for-rhel-6-server-rpms 84 k
pacemaker-libs x86_64 1.1.14-8.el6 rhel-ha-for-rhel-6-server-rpms 478 k
perl-TimeDate noarch 1:1.16-13.el6 rhel-6-server-rpms 37 k
resource-agents x86_64 3.9.5-34.el6_8.2 rhel-ha-for-rhel-6-server-rpms 386 k
samba-common x86_64 3.6.509-169.6.el6rhs rhs-3-for-rhel-6-server-rpms 10 M
samba-winbind x86_64 3.6.509-169.6.el6rhs rhs-3-for-rhel-6-server-rpms 2.2 M
samba-winbind-clients x86_64 3.6.509-169.6.el6rhs rhs-3-for-rhel-6-server-rpms 2.0 M
-----------------------------------------------------------------------------
[root@dhcp43-67 yum.repos.d]# yum update pcs
Dependencies Resolved
================================================================================
Package Arch Version Repository Size
================================================================================
Updating:
pcs x86_64 0.9.148-7.el6 rhel-ha-for-rhel-6-server-rpms 5.3 M
Updating for dependencies:
python-clufter x86_64 0.56.2-1.el6 rhel-ha-for-rhel-6-server-rpms 352 k
-----------------------------------------------------------------------------
So in this case we need the fix for this bug; otherwise all customers who update to the pcs and pacemaker packages available in the RHEL 6 base HA channel will hit this issue. Proposing it as a blocker for 3.1.3.
It's actually the previous behavior that could be considered a bug; Red Hat does not support HA clusters without properly configured fencing. The correct fix for this issue is to configure and test fencing devices.
Verified this bug with the latest glusterfs-3.7.9-7 and nfs-ganesha-2.3.1-7 builds; the issue reported in this bug is fixed. Earlier, after setting up the ganesha cluster, the stonith-enabled value was not getting set, and because of that the nodes remained in the stopped state. With the latest build it is set, and no issues related to stonith-enabled are seen:
[root@dhcp43-119 ~]# pcs property show
Cluster Properties:
cluster-infrastructure: cman
dc-version: 1.1.14-8.el6-70404b0
have-watchdog: false
no-quorum-policy: ignore
stonith-enabled: false
Node Attributes:
dhcp42-33.lab.eng.blr.redhat.com: grace-active=1
dhcp43-119.lab.eng.blr.redhat.com: grace-active=1
However, there is a new bug on RHEL 6.8 that still leaves nodes in the stopped state, along with other grace-related failures; it is being tracked under the following bug:
https://bugzilla.redhat.com/show_bug.cgi?id=1341567
Based on the above observation, marking this bug as Verified.
Requested doc text provided.
The user doesn't need to wait. (This isn't a user-visible change, per se.)
The setup process (initiated by issuing a `gluster nfs-ganesha enable` command) has been fixed so that it waits as necessary. I've made a slight change to the doc text. Otherwise it looks fine.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2016:1240
Description of problem:
-----------------------
I tried setting up Ganesha (v2.3.1-4) on RHGS 3.1.3 layered over RHEL 6.8. Everything goes through fine (cluster setup, authentication, Ganesha enabling, etc.), but pcs status shows nodes as "stopped":
*Snippet from distaf logs*:
2016-05-06 23:30:50,961 INFO run root.lab.eng.bos.redhat.com (cp): pcs status
2016-05-06 23:30:54,313 INFO run RETCODE: 0
2016-05-06 23:30:54,314 INFO run STDOUT: Cluster name: G1462557101.26
WARNING: no stonith devices and stonith-enabled is not false
Last updated: Fri May 6 14:00:51 2016 Last change: Fri May 6 14:00:17 2016 by root via cibadmin on gqas001.sbu.lab.eng.bos.redhat.com
Stack: cman
Current DC: gqas015.sbu.lab.eng.bos.redhat.com (version 1.1.14-8.el6-70404b0) - partition with quorum
4 nodes and 16 resources configured
Online: [ gqas001.sbu.lab.eng.bos.redhat.com gqas014.sbu.lab.eng.bos.redhat.com gqas015.sbu.lab.eng.bos.redhat.com gqas016.sbu.lab.eng.bos.redhat.com ]
Full list of resources:
Clone Set: nfs_setup-clone [nfs_setup]
Stopped: [ gqas001.sbu.lab.eng.bos.redhat.com gqas014.sbu.lab.eng.bos.redhat.com gqas015.sbu.lab.eng.bos.redhat.com gqas016.sbu.lab.eng.bos.redhat.com ]
Clone Set: nfs-mon-clone [nfs-mon]
Started: [ gqas001.sbu.lab.eng.bos.redhat.com gqas014.sbu.lab.eng.bos.redhat.com gqas015.sbu.lab.eng.bos.redhat.com gqas016.sbu.lab.eng.bos.redhat.com ]
Clone Set: nfs-grace-clone [nfs-grace]
Started: [ gqas001.sbu.lab.eng.bos.redhat.com gqas014.sbu.lab.eng.bos.redhat.com gqas015.sbu.lab.eng.bos.redhat.com gqas016.sbu.lab.eng.bos.redhat.com ]
gqas001.sbu.lab.eng.bos.redhat.com-cluster_ip-1 (ocf::heartbeat:IPaddr): Stopped
gqas014.sbu.lab.eng.bos.redhat.com-cluster_ip-1 (ocf::heartbeat:IPaddr): Stopped
gqas015.sbu.lab.eng.bos.redhat.com-cluster_ip-1 (ocf::heartbeat:IPaddr): Stopped
gqas016.sbu.lab.eng.bos.redhat.com-cluster_ip-1 (ocf::heartbeat:IPaddr): Stopped
PCSD Status:
gqas001.sbu.lab.eng.bos.redhat.com: Online
gqas014.sbu.lab.eng.bos.redhat.com: Online
gqas015.sbu.lab.eng.bos.redhat.com: Online
gqas016.sbu.lab.eng.bos.redhat.com: Online
I tried downgrading pacemaker, cman, pcs and corosync, and that gives a clean automation run: setup is successful (pcs status is good).
Version-Release number of selected component (if applicable):
--------------------------------------------------------------
[root@gqas001 yum.repos.d]# rpm -qa|grep cman
cman-3.0.12.1-78.el6.x86_64
[root@gqas001 yum.repos.d]# rpm -qa|grep pcs
pcs-0.9.148-7.el6.x86_64
[root@gqas001 yum.repos.d]# rpm -qa|grep pacemaker
pacemaker-libs-1.1.14-8.el6.x86_64
pacemaker-cli-1.1.14-8.el6.x86_64
pacemaker-cluster-libs-1.1.14-8.el6.x86_64
pacemaker-1.1.14-8.el6.x86_64
[root@gqas001 yum.repos.d]# rpm -qa|grep corosync
corosync-1.4.7-5.el6.x86_64
corosynclib-1.4.7-5.el6.x86_64
[root@gqas001 yum.repos.d]#
How reproducible:
----------------
3/3
Steps to Reproduce:
-------------------
1. Run yum install pacemaker cman pcs ccs resource-agents corosync. This fetches the latest versions of all these packages.
2. Run the Ganesha setup via distaf. It fails with the error above.
3. Downgrade the packages and rerun.
Actual results:
--------------
Ganesha setup fails on the latest versions of the pacemaker, cman, pcs and corosync packages.
Expected results:
-----------------
Ganesha setup should be successful with the latest versions of the pacemaker, cman, pcs and corosync packages.
Additional info:
----------------
Testbed: RHEL 6.8
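This report ties the behaviour change to the Pacemaker version: 1.1.12-8.el6_7.2 did not show the problem, while 1.1.14-8.el6 did. A quick pre-check before running the setup could compare the installed version against 1.1.14. This is a sketch: `version_ge` and `check_pacemaker` are hypothetical helpers, it assumes GNU coreutils `sort -V` for version ordering, and treating 1.1.14 as the threshold is only an inference from the two versions seen in this report (intermediate versions were not tested here).

```shell
#!/bin/sh
# version_ge A B: succeeds when version A sorts at or after version B.
# Assumes GNU coreutils `sort -V`.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical check: warn if the installed Pacemaker is at least 1.1.14,
# the version this report saw the problem with (an inferred threshold).
check_pacemaker() {
    ver=$(rpm -q --qf '%{VERSION}\n' pacemaker) || return 2
    if version_ge "$ver" 1.1.14; then
        echo "pacemaker $ver: use glusterfs-3.7.9-7 or later for the nfs-ganesha setup"
    fi
}
```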