Description of problem:
I created a GlusterFS cluster of 8 nodes. I then tried to create an nfs-ganesha cluster on top of it. The nfs-ganesha process has come up, but the pcs cluster has failed.

Version-Release number of selected component (if applicable):
glusterfs-3.7.0-2.el6rhs.x86_64
nfs-ganesha-2.2.0-0.el6.x86_64

How reproducible:
Tried for the first time.

Steps to Reproduce:
1. Take 8 VMs.
2. Create a volume of type 6x2 and start it.
3. Bring up nfs-ganesha after completing the prerequisites.

Actual results:
The nodes in one subnet are nfs1, nfs2, nfs3 and nfs4. pcs status on these nodes:

nfs1
Cluster name: ganesha-ha-360
Last updated: Tue Jun 2 16:19:37 2015
Last change: Tue Jun 2 16:15:11 2015
Stack: cman
Current DC: nfs1 - partition WITHOUT quorum
Version: 1.1.11-97629de
8 Nodes configured
32 Resources configured

Online: [ nfs1 nfs2 nfs3 nfs4 ]
OFFLINE: [ nfs5 nfs6 nfs7 nfs8 ]

Full list of resources:

 Clone Set: nfs-mon-clone [nfs-mon]
     Stopped: [ nfs1 nfs2 nfs3 nfs4 nfs5 nfs6 nfs7 nfs8 ]
 Clone Set: nfs-grace-clone [nfs-grace]
     Stopped: [ nfs1 nfs2 nfs3 nfs4 nfs5 nfs6 nfs7 nfs8 ]
 nfs1-cluster_ip-1   (ocf::heartbeat:IPaddr):   Stopped
 nfs1-trigger_ip-1   (ocf::heartbeat:Dummy):    Stopped
 nfs2-cluster_ip-1   (ocf::heartbeat:IPaddr):   Stopped
 nfs2-trigger_ip-1   (ocf::heartbeat:Dummy):    Stopped
 nfs3-cluster_ip-1   (ocf::heartbeat:IPaddr):   Stopped
 nfs3-trigger_ip-1   (ocf::heartbeat:Dummy):    Stopped
 nfs4-cluster_ip-1   (ocf::heartbeat:IPaddr):   Stopped
 nfs4-trigger_ip-1   (ocf::heartbeat:Dummy):    Stopped
 nfs5-cluster_ip-1   (ocf::heartbeat:IPaddr):   Stopped
 nfs5-trigger_ip-1   (ocf::heartbeat:Dummy):    Stopped
 nfs6-cluster_ip-1   (ocf::heartbeat:IPaddr):   Stopped
 nfs6-trigger_ip-1   (ocf::heartbeat:Dummy):    Stopped
 nfs7-cluster_ip-1   (ocf::heartbeat:IPaddr):   Stopped
 nfs7-trigger_ip-1   (ocf::heartbeat:Dummy):    Stopped
 nfs8-cluster_ip-1   (ocf::heartbeat:IPaddr):   Stopped
 nfs8-trigger_ip-1   (ocf::heartbeat:Dummy):    Stopped
---
nfs5, nfs6, nfs7 and nfs8 are on the other subnet.

nfs5
Cluster name: ganesha-ha-360
Last updated: Tue Jun 2 21:43:16 2015
Last change: Tue Jun 2 21:37:54 2015
Stack: cman
Current DC: nfs5 - partition WITHOUT quorum
Version: 1.1.11-97629de
8 Nodes configured
0 Resources configured

Online: [ nfs5 nfs6 nfs7 nfs8 ]
OFFLINE: [ nfs1 nfs2 nfs3 nfs4 ]

Full list of resources:

route on nfs1:
[root@nfs1 ~]# route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
10.70.36.0      *               255.255.254.0   U     0      0        0 eth0
link-local      *               255.255.0.0     U     1002   0        0 eth0
default         10.70.37.254    0.0.0.0         UG    0      0        0 eth0

route on nfs5:
[root@nfs5 ~]# route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
10.70.44.0      *               255.255.252.0   U     0      0        0 eth0
link-local      *               255.255.0.0     U     1002   0        0 eth0
default         10.70.47.254    0.0.0.0         UG    0      0        0 eth0

Expected results:
The nfs-ganesha cluster should come up.

Additional info:
Peer status:

[root@nfs1 ~]# gluster peer status
Number of Peers: 7

Hostname: 10.70.37.77
Uuid: 052ac999-2cc6-4aaa-9371-9b5a09455952
State: Peer in Cluster (Connected)

Hostname: 10.70.37.69
Uuid: f67b2790-37b6-49be-b49e-6cb9fa42fac7
State: Peer in Cluster (Connected)

Hostname: 10.70.37.76
Uuid: 3fa0fbaf-f852-4dba-8957-4f5fda5bff7a
State: Peer in Cluster (Connected)

Hostname: 10.70.46.180
Uuid: 937cc093-b4e1-42c4-8fd6-71d2bc106e3e
State: Peer in Cluster (Connected)

Hostname: 10.70.46.185
Uuid: e5601ea6-0787-46ef-891f-f97637ea3beb
State: Peer in Cluster (Connected)

Hostname: 10.70.46.172
Uuid: 667a09ea-e32b-4300-aa46-73b78906db39
State: Peer in Cluster (Connected)

Hostname: 10.70.46.179
Uuid: 486f6fc1-e8a5-4bcd-a677-82a1c1bf7e0e
State: Peer in Cluster (Connected)

ganesha-ha.conf as updated on all nodes:

nfs1
# Name of the HA cluster created.
HA_NAME="ganesha-ha-360"
# The server from which you intend to mount
# the shared volume.
HA_VOL_SERVER="nfs1"
# The subset of nodes of the Gluster Trusted Pool
# that forms the ganesha HA cluster. IP/Hostname
# is specified.
HA_CLUSTER_NODES="nfs1,nfs2,nfs3,nfs4,nfs5,nfs6,nfs7,nfs8"
# Virtual IPs of each of the nodes specified above.
VIP_nfs1="10.70.36.217"
VIP_nfs2="10.70.36.218"
VIP_nfs3="10.70.36.219"
VIP_nfs4="10.70.36.220"
VIP_nfs5="10.70.44.92"
VIP_nfs6="10.70.44.93"
VIP_nfs7="10.70.44.94"
VIP_nfs8="10.70.44.95"
---
nfs2
# Name of the HA cluster created.
HA_NAME="ganesha-ha-360"
# The server from which you intend to mount
# the shared volume.
HA_VOL_SERVER="nfs1"
# The subset of nodes of the Gluster Trusted Pool
# that forms the ganesha HA cluster. IP/Hostname
# is specified.
HA_CLUSTER_NODES="nfs1,nfs2,nfs3,nfs4,nfs5,nfs6,nfs7,nfs8"
# Virtual IPs of each of the nodes specified above.
VIP_nfs1="10.70.36.217"
VIP_nfs2="10.70.36.218"
VIP_nfs3="10.70.36.219"
VIP_nfs4="10.70.36.220"
VIP_nfs5="10.70.44.92"
VIP_nfs6="10.70.44.93"
VIP_nfs7="10.70.44.94"
VIP_nfs8="10.70.44.95"
---
nfs3
# Name of the HA cluster created.
HA_NAME="ganesha-ha-360"
# The server from which you intend to mount
# the shared volume.
HA_VOL_SERVER="nfs1"
# The subset of nodes of the Gluster Trusted Pool
# that forms the ganesha HA cluster. IP/Hostname
# is specified.
HA_CLUSTER_NODES="nfs1,nfs2,nfs3,nfs4,nfs5,nfs6,nfs7,nfs8"
# Virtual IPs of each of the nodes specified above.
VIP_nfs1="10.70.36.217"
VIP_nfs2="10.70.36.218"
VIP_nfs3="10.70.36.219"
VIP_nfs4="10.70.36.220"
VIP_nfs5="10.70.44.92"
VIP_nfs6="10.70.44.93"
VIP_nfs7="10.70.44.94"
VIP_nfs8="10.70.44.95"
---
nfs4
# Name of the HA cluster created.
HA_NAME="ganesha-ha-360"
# The server from which you intend to mount
# the shared volume.
HA_VOL_SERVER="nfs1"
# The subset of nodes of the Gluster Trusted Pool
# that forms the ganesha HA cluster. IP/Hostname
# is specified.
HA_CLUSTER_NODES="nfs1,nfs2,nfs3,nfs4,nfs5,nfs6,nfs7,nfs8"
# Virtual IPs of each of the nodes specified above.
VIP_nfs1="10.70.36.217"
VIP_nfs2="10.70.36.218"
VIP_nfs3="10.70.36.219"
VIP_nfs4="10.70.36.220"
VIP_nfs5="10.70.44.92"
VIP_nfs6="10.70.44.93"
VIP_nfs7="10.70.44.94"
VIP_nfs8="10.70.44.95"
---
nfs5
# Name of the HA cluster created.
HA_NAME="ganesha-ha-360"
# The server from which you intend to mount
# the shared volume.
HA_VOL_SERVER="nfs1"
# The subset of nodes of the Gluster Trusted Pool
# that forms the ganesha HA cluster. IP/Hostname
# is specified.
HA_CLUSTER_NODES="nfs1,nfs2,nfs3,nfs4,nfs5,nfs6,nfs7,nfs8"
# Virtual IPs of each of the nodes specified above.
VIP_nfs1="10.70.36.217"
VIP_nfs2="10.70.36.218"
VIP_nfs3="10.70.36.219"
VIP_nfs4="10.70.36.220"
VIP_nfs5="10.70.44.92"
VIP_nfs6="10.70.44.93"
VIP_nfs7="10.70.44.94"
VIP_nfs8="10.70.44.95"
---
nfs6
# Name of the HA cluster created.
HA_NAME="ganesha-ha-360"
# The server from which you intend to mount
# the shared volume.
HA_VOL_SERVER="nfs1"
# The subset of nodes of the Gluster Trusted Pool
# that forms the ganesha HA cluster. IP/Hostname
# is specified.
HA_CLUSTER_NODES="nfs1,nfs2,nfs3,nfs4,nfs5,nfs6,nfs7,nfs8"
# Virtual IPs of each of the nodes specified above.
VIP_nfs1="10.70.36.217"
VIP_nfs2="10.70.36.218"
VIP_nfs3="10.70.36.219"
VIP_nfs4="10.70.36.220"
VIP_nfs5="10.70.44.92"
VIP_nfs6="10.70.44.93"
VIP_nfs7="10.70.44.94"
VIP_nfs8="10.70.44.95"
---
nfs7
# Name of the HA cluster created.
HA_NAME="ganesha-ha-360"
# The server from which you intend to mount
# the shared volume.
HA_VOL_SERVER="nfs1"
# The subset of nodes of the Gluster Trusted Pool
# that forms the ganesha HA cluster. IP/Hostname
# is specified.
HA_CLUSTER_NODES="nfs1,nfs2,nfs3,nfs4,nfs5,nfs6,nfs7,nfs8"
# Virtual IPs of each of the nodes specified above.
VIP_nfs1="10.70.36.217"
VIP_nfs2="10.70.36.218"
VIP_nfs3="10.70.36.219"
VIP_nfs4="10.70.36.220"
VIP_nfs5="10.70.44.92"
VIP_nfs6="10.70.44.93"
VIP_nfs7="10.70.44.94"
VIP_nfs8="10.70.44.95"
---
nfs8
# Name of the HA cluster created.
HA_NAME="ganesha-ha-360"
# The server from which you intend to mount
# the shared volume.
HA_VOL_SERVER="nfs1"
# The subset of nodes of the Gluster Trusted Pool
# that forms the ganesha HA cluster. IP/Hostname
# is specified.
HA_CLUSTER_NODES="nfs1,nfs2,nfs3,nfs4,nfs5,nfs6,nfs7,nfs8"
# Virtual IPs of each of the nodes specified above.
VIP_nfs1="10.70.36.217"
VIP_nfs2="10.70.36.218"
VIP_nfs3="10.70.36.219"
VIP_nfs4="10.70.36.220"
VIP_nfs5="10.70.44.92"
VIP_nfs6="10.70.44.93"
VIP_nfs7="10.70.44.94"
VIP_nfs8="10.70.44.95"
---
Please add sosreports from one node in each of the two subnets, along with the contents of /var/log/messages. Can you provide the IP of any of these machines, just to verify that all the requirements were met?
What is the subnet mask? Thanks. I'm going to double check with the HA people, but I'm guessing that HA doesn't work across subnets like this.
The output of the route command has been added to the Description section above.
Created attachment 1034212 [details] sosreport of nfs1
8-node cluster setup fails even when the VMs are in the same subnet.

nfs5
root     12930     1  0 01:33 ?        00:00:00 /usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT -p /var/run/ganesha.nfsd.pid
root     15696 22181  6 02:01 pts/0    00:00:00 ssh nfs5 ps -eaf | grep ganesha
root     15703 15698  2 02:01 ?        00:00:00 bash -c ps -eaf | grep ganesha
root     15713 15703  0 02:01 ?        00:00:00 grep ganesha
---
nfs6
root     10168     1  0 01:33 ?        00:00:00 /usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT -p /var/run/ganesha.nfsd.pid
root     11264 11259  2 02:01 ?        00:00:00 bash -c ps -eaf | grep ganesha
root     11274 11264  0 02:01 ?        00:00:00 grep ganesha
---
nfs7
root     17098     1  0 01:33 ?        00:00:00 /usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT -p /var/run/ganesha.nfsd.pid
root     18191 18186  1 02:01 ?        00:00:00 bash -c ps -eaf | grep ganesha
root     18201 18191  0 02:01 ?        00:00:00 grep ganesha
---
nfs8
root     13708     1  0 01:33 ?        00:00:00 /usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT -p /var/run/ganesha.nfsd.pid
root     14799 14794  0 02:01 ?        00:00:00 bash -c ps -eaf | grep ganesha
root     14809 14799  0 02:01 ?        00:00:00 grep ganesha
---
nfs9
root      2385     1  0 04:06 ?        00:00:01 /usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT -p /var/run/ganesha.nfsd.pid
root      3133  3130  3 04:34 ?        00:00:00 bash -c ps -eaf | grep ganesha
root      3139  3133  0 04:34 ?        00:00:00 grep ganesha
---
nfs10
root      2308     1  0 04:06 ?        00:00:01 /usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT -p /var/run/ganesha.nfsd.pid
root      3461  3458  3 04:34 ?        00:00:00 bash -c ps -eaf | grep ganesha
root      3465  3461  0 04:34 ?        00:00:00 grep ganesha
---
nfs11
root      2288     1  0 04:06 ?        00:00:01 /usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT -p /var/run/ganesha.nfsd.pid
root      3433  3430  0 04:34 ?        00:00:00 bash -c ps -eaf | grep ganesha
root      3437  3433  0 04:34 ?        00:00:00 grep ganesha
---
nfs12
root      2311     1  0 04:06 ?        00:00:01 /usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT -p /var/run/ganesha.nfsd.pid
root      3455  3452  2 04:34 ?        00:00:00 bash -c ps -eaf | grep ganesha
root      3459  3455  0 04:34 ?        00:00:00 grep ganesha
---

[root@nfs5 ~]# pcs status
Cluster name: new-ganesha
Last updated: Fri Jun 5 02:02:46 2015
Last change: Fri Jun 5 01:39:01 2015
Stack: cman
Current DC: nfs5 - partition WITHOUT quorum
Version: 1.1.11-97629de
8 Nodes configured
32 Resources configured

Online: [ nfs5 nfs6 nfs7 nfs8 ]
OFFLINE: [ nfs10 nfs11 nfs12 nfs9 ]

Full list of resources:

 Clone Set: nfs-mon-clone [nfs-mon]
     Stopped: [ nfs10 nfs11 nfs12 nfs5 nfs6 nfs7 nfs8 nfs9 ]
 Clone Set: nfs-grace-clone [nfs-grace]
     Stopped: [ nfs10 nfs11 nfs12 nfs5 nfs6 nfs7 nfs8 nfs9 ]
 nfs5-cluster_ip-1    (ocf::heartbeat:IPaddr):   Stopped
 nfs5-trigger_ip-1    (ocf::heartbeat:Dummy):    Stopped
 nfs6-cluster_ip-1    (ocf::heartbeat:IPaddr):   Stopped
 nfs6-trigger_ip-1    (ocf::heartbeat:Dummy):    Stopped
 nfs7-cluster_ip-1    (ocf::heartbeat:IPaddr):   Stopped
 nfs7-trigger_ip-1    (ocf::heartbeat:Dummy):    Stopped
 nfs8-cluster_ip-1    (ocf::heartbeat:IPaddr):   Stopped
 nfs8-trigger_ip-1    (ocf::heartbeat:Dummy):    Stopped
 nfs9-cluster_ip-1    (ocf::heartbeat:IPaddr):   Stopped
 nfs9-trigger_ip-1    (ocf::heartbeat:Dummy):    Stopped
 nfs10-cluster_ip-1   (ocf::heartbeat:IPaddr):   Stopped
 nfs10-trigger_ip-1   (ocf::heartbeat:Dummy):    Stopped
 nfs11-cluster_ip-1   (ocf::heartbeat:IPaddr):   Stopped
 nfs11-trigger_ip-1   (ocf::heartbeat:Dummy):    Stopped
 nfs12-cluster_ip-1   (ocf::heartbeat:IPaddr):   Stopped
 nfs12-trigger_ip-1   (ocf::heartbeat:Dummy):    Stopped

Installed ganesha packages:

nfs5
nfs-ganesha-2.2.0-0.el6.x86_64
nfs-ganesha-debuginfo-2.2.0-0.el6.x86_64
nfs-ganesha-gluster-2.2.0-0.el6.x86_64
glusterfs-ganesha-3.7.0-3.el6rhs.x86_64
---
nfs6
nfs-ganesha-2.2.0-0.el6.x86_64
nfs-ganesha-debuginfo-2.2.0-0.el6.x86_64
glusterfs-ganesha-3.7.0-3.el6rhs.x86_64
nfs-ganesha-gluster-2.2.0-0.el6.x86_64
---
nfs7
nfs-ganesha-debuginfo-2.2.0-0.el6.x86_64
nfs-ganesha-gluster-2.2.0-0.el6.x86_64
nfs-ganesha-2.2.0-0.el6.x86_64
glusterfs-ganesha-3.7.0-3.el6rhs.x86_64
---
nfs8
nfs-ganesha-debuginfo-2.2.0-0.el6.x86_64
glusterfs-ganesha-3.7.0-3.el6rhs.x86_64
nfs-ganesha-2.2.0-0.el6.x86_64
nfs-ganesha-gluster-2.2.0-0.el6.x86_64
---
nfs9
glusterfs-ganesha-3.7.0-3.el6rhs.x86_64
nfs-ganesha-2.2.0-0.el6.x86_64
nfs-ganesha-debuginfo-2.2.0-0.el6.x86_64
nfs-ganesha-gluster-2.2.0-0.el6.x86_64
---
nfs10
nfs-ganesha-2.2.0-0.el6.x86_64
nfs-ganesha-gluster-2.2.0-0.el6.x86_64
glusterfs-ganesha-3.7.0-3.el6rhs.x86_64
nfs-ganesha-debuginfo-2.2.0-0.el6.x86_64
---
nfs11
nfs-ganesha-gluster-2.2.0-0.el6.x86_64
nfs-ganesha-2.2.0-0.el6.x86_64
nfs-ganesha-debuginfo-2.2.0-0.el6.x86_64
glusterfs-ganesha-3.7.0-3.el6rhs.x86_64
---
nfs12
nfs-ganesha-gluster-2.2.0-0.el6.x86_64
nfs-ganesha-2.2.0-0.el6.x86_64
nfs-ganesha-debuginfo-2.2.0-0.el6.x86_64
glusterfs-ganesha-3.7.0-3.el6rhs.x86_64
---

service pcsd status:

nfs5
pcsd (pid 11575) is running...
---
nfs6
pcsd (pid 9122) is running...
---
nfs7
pcsd (pid 16083) is running...
---
nfs8
pcsd (pid 12693) is running...
---
nfs9
pcsd (pid 1580) is running...
---
nfs10
pcsd (pid 1600) is running...
---
nfs11
pcsd (pid 1600) is running...
---
nfs12
pcsd (pid 1586) is running...
---

[root@nfs5 ~]# route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
10.70.44.0      *               255.255.252.0   U     0      0        0 eth0
link-local      *               255.255.0.0     U     1002   0        0 eth0
default         10.70.47.254    0.0.0.0         UG    0      0        0 eth0

[root@nfs9 ~]# route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
10.70.44.0      *               255.255.252.0   U     0      0        0 eth0
link-local      *               255.255.0.0     U     1002   0        0 eth0
default         10.70.47.254    0.0.0.0         UG    0      0        0 eth0
Created attachment 1034996 [details] sosreport of nfs5
Created attachment 1034997 [details] sosreport of nfs9
So, my least favorite thing to do is say "It worked for me."

Eight RHEL6.6 machines (actually yum update appears to have updated them to 6.7 beta). gluster and ganesha were installed from rhgs builds at http://download.eng.bos.redhat.com/brewroot/packages/*. The gluster testvol is an eight-brick replica 2 volume. gluster and ganesha are running on all eight nodes.

# pcs status
Cluster name: ganesha-ha-42
Last updated: Tue Jun 9 17:06:25 2015
Last change: Tue Jun 9 17:05:58 2015
Stack: cman
Current DC: r6node1 - partition with quorum
Version: 1.1.11-97629de
8 Nodes configured
32 Resources configured

Online: [ r6node1 r6node2 r6node3 r6node4 r6node5 r6node6 r6node7 r6node8 ]

Full list of resources:

 Clone Set: nfs-mon-clone [nfs-mon]
     Started: [ r6node1 r6node2 r6node3 r6node4 r6node5 r6node6 r6node7 r6node8 ]
 Clone Set: nfs-grace-clone [nfs-grace]
     Started: [ r6node1 r6node2 r6node3 r6node4 r6node5 r6node6 r6node7 r6node8 ]
 r6node1-cluster_ip-1   (ocf::heartbeat:IPaddr):   Started r6node1
 r6node1-trigger_ip-1   (ocf::heartbeat:Dummy):    Started r6node1
 r6node2-cluster_ip-1   (ocf::heartbeat:IPaddr):   Started r6node2
 r6node2-trigger_ip-1   (ocf::heartbeat:Dummy):    Started r6node2
 r6node3-cluster_ip-1   (ocf::heartbeat:IPaddr):   Started r6node3
 r6node3-trigger_ip-1   (ocf::heartbeat:Dummy):    Started r6node3
 r6node4-cluster_ip-1   (ocf::heartbeat:IPaddr):   Started r6node4
 r6node4-trigger_ip-1   (ocf::heartbeat:Dummy):    Started r6node4
 r6node5-cluster_ip-1   (ocf::heartbeat:IPaddr):   Started r6node5
 r6node5-trigger_ip-1   (ocf::heartbeat:Dummy):    Started r6node5
 r6node6-cluster_ip-1   (ocf::heartbeat:IPaddr):   Started r6node6
 r6node6-trigger_ip-1   (ocf::heartbeat:Dummy):    Started r6node6
 r6node7-cluster_ip-1   (ocf::heartbeat:IPaddr):   Started r6node7
 r6node7-trigger_ip-1   (ocf::heartbeat:Dummy):    Started r6node7
 r6node8-cluster_ip-1   (ocf::heartbeat:IPaddr):   Started r6node8
 r6node8-trigger_ip-1   (ocf::heartbeat:Dummy):    Started r6node8

I found the IP addrs of your nodes from the attached sos report, but they are all turned off (are they? Usually I can ssh to them, but not today), so I couldn't investigate your setup on those machines.
I tried again and I still see the issue:

[root@nfs5 ~]# gluster nfs-ganesha enable
Enabling NFS-Ganesha requires Gluster-NFS to be disabled across the trusted pool. Do you still want to continue? (y/n) y
ganesha enable : success

[root@nfs5 ~]# pcs status
Cluster name: sitter
Last updated: Fri Jun 12 00:47:31 2015
Last change: Thu Jun 11 23:31:39 2015
Stack: cman
Current DC: nfs5 - partition WITHOUT quorum
Version: 1.1.11-97629de
8 Nodes configured
32 Resources configured

Online: [ nfs5 nfs6 nfs7 nfs8 ]
OFFLINE: [ nfs10 nfs11 nfs12 nfs9 ]

Full list of resources:

 Clone Set: nfs-mon-clone [nfs-mon]
     Stopped: [ nfs10 nfs11 nfs12 nfs5 nfs6 nfs7 nfs8 nfs9 ]
 Clone Set: nfs-grace-clone [nfs-grace]
     Stopped: [ nfs10 nfs11 nfs12 nfs5 nfs6 nfs7 nfs8 nfs9 ]
 nfs5-cluster_ip-1    (ocf::heartbeat:IPaddr):   Stopped
 nfs5-trigger_ip-1    (ocf::heartbeat:Dummy):    Stopped
 nfs6-cluster_ip-1    (ocf::heartbeat:IPaddr):   Stopped
 nfs6-trigger_ip-1    (ocf::heartbeat:Dummy):    Stopped
 nfs7-cluster_ip-1    (ocf::heartbeat:IPaddr):   Stopped
 nfs7-trigger_ip-1    (ocf::heartbeat:Dummy):    Stopped
 nfs8-cluster_ip-1    (ocf::heartbeat:IPaddr):   Stopped
 nfs8-trigger_ip-1    (ocf::heartbeat:Dummy):    Stopped
 nfs9-cluster_ip-1    (ocf::heartbeat:IPaddr):   Stopped
 nfs9-trigger_ip-1    (ocf::heartbeat:Dummy):    Stopped
 nfs10-cluster_ip-1   (ocf::heartbeat:IPaddr):   Stopped
 nfs10-trigger_ip-1   (ocf::heartbeat:Dummy):    Stopped
 nfs11-cluster_ip-1   (ocf::heartbeat:IPaddr):   Stopped
 nfs11-trigger_ip-1   (ocf::heartbeat:Dummy):    Stopped
 nfs12-cluster_ip-1   (ocf::heartbeat:IPaddr):   Stopped
 nfs12-trigger_ip-1   (ocf::heartbeat:Dummy):    Stopped
[root@nfs5 ~]#

nfs5
root     18633     1  0 Jun11 ?        00:00:03 /usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT -p /var/run/ganesha.nfsd.pid
---------
nfs6
root     15862     1  0 Jun11 ?        00:00:03 /usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT -p /var/run/ganesha.nfsd.pid
---------
nfs7
root     16173     1  0 Jun11 ?        00:00:03 /usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT -p /var/run/ganesha.nfsd.pid
---------
nfs8
root     16198     1  0 Jun11 ?        00:00:03 /usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT -p /var/run/ganesha.nfsd.pid
---------
nfs9
root     16781     1  0 Jun11 ?        00:00:03 /usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT -p /var/run/ganesha.nfsd.pid
---------
nfs10
root     24116     1  0 Jun11 ?        00:00:03 /usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT -p /var/run/ganesha.nfsd.pid
---------
nfs11
root     29442     1  0 02:00 ?        00:00:03 /usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT -p /var/run/ganesha.nfsd.pid
---------
nfs12
root     25046     1  0 02:00 ?        00:00:03 /usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT -p /var/run/ganesha.nfsd.pid

Sending you the setup details in a mail.
I suspect there's an issue with the udp multicast packets that corosync uses to communicate.

One web site says:

By default, Corosync uses IP multicast to communicate between nodes:

  mcastaddr: 239.192.26.170
  mcastport: 5405

Either configure your firewall to allow multicast traffic:

  # iptables -A INPUT -p igmp -j ACCEPT
  # iptables -A INPUT -m addrtype --dst-type MULTICAST -j ACCEPT
  # iptables -A INPUT -p udp -m state --state NEW -m multiport --dports 5404,5405 -j ACCEPT

or switch to unicast.

Please confirm that the switch/router between these systems is configured to route multicast udp.

I've asked the HA team for clarification about the need to use an /etc/corosync/corosync.conf to specify an mcastaddr in this configuration.

In the meantime, a four-node cluster on these nodes, with all IP addresses and the virt IPaddr addresses in the same range, works. An eight-node cluster with all IP addresses and virt IPaddr addresses in the same range works on a Class C network.
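To make the "switch to unicast" option concrete, here is a rough sketch of how the same cluster could be re-created over UDP unicast with pcs. This is untested on these machines and is only an illustration of the transport change (normally ganesha-ha.sh creates the cluster for you), so treat the exact invocation as an assumption:

  # destroys the existing cluster definition on all nodes
  pcs cluster destroy --all
  # re-create the cluster using udpu (UDP unicast) instead of multicast
  pcs cluster setup --name ganesha-ha-360 \
      nfs1 nfs2 nfs3 nfs4 nfs5 nfs6 nfs7 nfs8 \
      --transport udpu
  pcs cluster start --all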
Saurabh's eight-node setup looks okay generally. Four nodes (5-8 or 9-12) work fine, e.g. by trimming the config file to four nodes. With eight nodes, though, nodes 5-8 are not seeing nodes 9-12. corosync uses udp multicast to communicate. I need to test whether the multicast packets are getting through. I suspect a router that's not forwarding udp multicast packets.
I tried Mellanox's sockperf tool to send udp multicast from nfs5 to servers on nfs6 and nfs12. E.g. server = `sockperf server -i 239.192.26.170 -p 8642 --mc-rx-if $ip`, client = `sockperf ping-pong -i 239.192.26.170 -p 7201 --mc-tx-if $ip`.

Between nfs5 and nfs6:

nfs6% sockperf server -i 239.192.26.170 -p 8642 --mc-rx-if 10.70.46.25
sockperf: == version #2.5.244 ==
sockperf: [SERVER] listen on:
[ 0] IP = 239.192.26.170 PORT = 8642 # UDP
sockperf: Warmup stage (sending a few dummy messages)...
sockperf: [tid 4105] using recvfrom() to block on socket(s)

nfs5% sockperf ping-pong -i 239.192.26.170 -p 8642 --mc-tx-if 10.70.46.8
sockperf: == version #2.5.244 ==
sockperf[CLIENT] send on:sockperf: using recvfrom() to block on socket(s)
[ 0] IP = 239.192.26.170 PORT = 8642 # UDP
sockperf: Warmup stage (sending a few dummy messages)...
sockperf: Starting test...
sockperf: Test end (interrupted by timer)
sockperf: Test ended
sockperf: [Total Run] RunTime=1.100 sec; SentMessages=5304; ReceivedMessages=5303
sockperf: ========= Printing statistics for Server No: 0
sockperf: [Valid Duration] RunTime=1.000 sec; SentMessages=4813; ReceivedMessages=4813
sockperf: ====> avg-lat=103.706 (std-dev=13.710)
sockperf: # dropped messages = 0; # duplicated messages = 0; # out-of-order messages = 0
sockperf: Summary: Latency is 103.706 usec
sockperf: Total 4813 observations; each percentile contains 48.13 observations
sockperf: ---> <MAX> observation = 209.171
sockperf: ---> percentile 99.99 = 209.171
sockperf: ---> percentile 99.90 = 173.791
sockperf: ---> percentile 99.50 = 157.659
sockperf: ---> percentile 99.00 = 148.532
sockperf: ---> percentile 95.00 = 131.088
sockperf: ---> percentile 90.00 = 122.464
sockperf: ---> percentile 75.00 = 109.089
sockperf: ---> percentile 50.00 = 100.158
sockperf: ---> percentile 25.00 = 95.149
sockperf: ---> <MIN> observation = 54.134

=============================================================

Between nfs5 and nfs12:

nfs12% sockperf server -i 239.192.26.170 -p 8642 --mc-rx-if 10.70.46.25
sockperf: == version #2.5.244 ==
sockperf: [SERVER] listen on:
[ 0] IP = 239.192.26.170 PORT = 8642 # UDP

nfs5% sockperf ping-pong -i 239.192.26.170 -p 8642 --mc-tx-if 10.70.46.8
sockperf: == version #2.5.244 ==
sockperf[CLIENT] send on:sockperf: using recvfrom() to block on socket(s)
[ 0] IP = 239.192.26.170 PORT = 8642 # UDP
sockperf: Warmup stage (sending a few dummy messages)...
sockperf: Starting test...
sockperf: Test end (interrupted by timer)
sockperf: Test ended
sockperf: No messages were received from the server. Is the server down?
(In reply to Kaleb KEITHLEY from comment #14)
> I suspect there's an issue with the udp multicast packets that corosync uses
> to communicate.
>
> One web site says:
>
> By default, Corosync uses IP multicast to communicate between nodes:
>
>   mcastaddr: 239.192.26.170
>   mcastport: 5405
>
> Either configure your firewall to allow multicast traffic:
>
>   # iptables -A INPUT -p igmp -j ACCEPT
>   # iptables -A INPUT -m addrtype --dst-type MULTICAST -j ACCEPT
>   # iptables -A INPUT -p udp -m state --state NEW -m multiport --dports 5404,5405 -j ACCEPT
>
> or switch to unicast.
>
> Please confirm that the switch/router between these systems is configured to
> route multicast udp.
>
> I've asked for clarification from the HA team about the need to use an
> /etc/corosync/corosync.conf to specify an mcastaddr in this configuration.
>
> In the mean time a four node cluster on these nodes, w/ all IP addresses and
> the virt IPaddr addresses in the same range, works. An eight node cluster w/
> all IP addresses and virt IPaddr addresses in the same range works a Class C
> network.

Kaleb,

Are you suggesting this as a workaround? If so, where exactly do we need to apply it: on the VMs, or on the physical nodes hosting the VMs?

I'm asking because at present we are not sure where exactly the packets are getting dropped, whether on the switch, the physical host, or the VM. From my end I tried to collect information about packet drops from the physical nodes and VMs using "netstat -i", but it didn't report any drops on any of the VMs or physical nodes. If you want to check the physical nodes hosting the VMs, I can send you the credentials.

Thanks,
Saurabh
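One way to narrow down where the multicast traffic is being dropped (a suggestion only, using the default corosync multicast address and port quoted above) is to capture on each layer while the cluster tries to form: inside a VM on one side, on that VM's hypervisor, and inside a VM on the other side. If the packets show up on a hypervisor but not inside its guest, the drop is local to that host; if they never reach the far side at all, suspect the switch/router.

  # interface name is an assumption; adjust per host (eth0 inside the VMs,
  # the bridge device on the hypervisors)
  tcpdump -n -i eth0 'udp port 5405 and dst 239.192.26.170'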
jfriesse suggests: ... you can try omping.
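For reference, a sketch of how omping could be used here (my own illustration, not jfriesse's exact instructions): run the same command on all eight nodes at roughly the same time and compare the unicast and multicast response lines; nodes that show unicast replies but no multicast replies are the ones not receiving the multicast traffic.

  # run concurrently on nfs5..nfs12; uses omping's own default multicast
  # group/port (corosync's can be given with -m/-p instead, preferably
  # with the cluster stopped)
  omping nfs5 nfs6 nfs7 nfs8 nfs9 nfs10 nfs11 nfs12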
Able to set up the 8-node ganesha cluster.
Hi Kaleb,

The doc text is updated. Please review it and share your technical review comments. If it looks OK, please sign off on it.

I have not included the following information that you added; let me know if this should be included as well:

"(I suspect that two sets of VMs on the subnet but in different hosts need special routing to make this work, as the pacemaker/corosync team says that participating nodes on different subnets is a supported/working configuration."

Regards,
Bhavana
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2015-1495.html
closed, clear needinfo