+++ This bug was initially created as a clone of Bug #1101942 +++

Description of problem:
When building a new volume on a brand new server I am unable to peer probe a 2nd node when creating a distributed volume.

Version-Release number of selected component (if applicable):
glusterfs-libs-3.5.1-1.el6.x86_64
glusterfs-cli-3.5.1-1.el6.x86_64
glusterfs-3.5.1-1.el6.x86_64
glusterfs-fuse-3.5.1-1.el6.x86_64
glusterfs-server-3.5.1-1.el6.x86_64
glusterfs-api-3.5.1-1.el6.x86_64

How reproducible:
1. Install the above packages from the Gluster repo.
2. Create a new volume with 1 node.
3. Try to peer probe the 2nd node.

Actual results:
Hostname: 172.16.242.241
Uuid: ad09b6f9-9e71-462b-bec0-bcdc93db221b
State: Probe Sent to Peer (Connected)
On this node, gluster peer status times out with no output.

Expected results:
The 2nd node should join the pool.

Additional info:
This makes GlusterFS UNUSABLE as a distributed filesystem!
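As a rough sketch, steps 2 and 3 correspond to commands along these lines (the node names, volume name and brick path here are placeholders, not my exact setup):

  # on the first node: create and start a single-brick volume
  gluster volume create testvol transport tcp node1:/export/brick1 force
  gluster volume start testvol
  # then try to bring the second node into the pool
  gluster peer probe node2
  gluster peer status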
*** Bug 1101942 has been marked as a duplicate of this bug. ***
Peer probing is done by glusterd, changing components.
Hi Richard,

I have tried to reproduce this, but can not. This is what I've done:

1. install two servers with RHEL-6.5
2. install the glusterfs-server-3.5.1 packages (+deps) from download.gluster.org
3. create a volume with one brick on the 1st server
4. peer probe the second server

The above procedure works fine for me. Could you explain the differences with your environment/test?

Thanks,
Niels
Not really much to tell, other than that I run Scientific Linux release 6.5 rather than RHEL, but in theory that's the same ;-)

I run all my test systems on VirtualBox on top of an SL-6.5 host OS. They have their own private back-end NIC set up just for Gluster to communicate over, and a front-end NIC for everything else. The back-end NIC is a VirtualBox network. My Gluster volume serves both NFS and FUSE mount points on node #1.

Everything works fine with 3.4.x, but when I upgrade to 3.5 I'm unable to get anything to work. Come to think of it, everything from glusterfs-351beta2 upwards has been a problem... glusterfs-351beta1 worked fine as far as I can remember.

Do you want to see any logs/files from the servers? If so, which ones?
OK, I've just tested glusterfs-351beta1 and it is partly broken for me. With a new distributed volume I can't peer probe additional nodes, but a fresh replicated volume can be set up with two nodes, and the peer probe process works fine for that replicated setup. However, I've not tried adding a 3rd and 4th node to that setup yet. I will try a test with glusterfs-351beta2.
OK, I've found the cause: enabling quotas causes it to fail.

I've just built two brand new SL-6.5 servers using all the defaults and selected the "Minimal Server" profile. I then ran the following commands on both servers:

1) yum -y upgrade
2) service iptables stop; chkconfig iptables off; chkconfig --del iptables
3) cd /etc/yum.repos.d
4) wget http://download.gluster.org/pub/gluster/glusterfs/3.5/LATEST/EPEL.repo/glusterfs-epel.repo
5) yum -y install glusterfs-server glusterfs-fuse
6) service glusterd start

And then on Server #1 I ran this to create a volume:

gluster volume create md0 transport tcp 192.168.1.21:/mnt force
gluster volume start md0
gluster volume quota md0 enable
gluster peer probe 192.168.1.20
gluster peer status

If I remove the quota enable line from this process, everything works as expected. Hope this helps pinpoint the issue.

Thanks,
Rich
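For clarity, the sequence that does work for me is the same one with the quota enable line dropped; I haven't verified yet whether quota can safely be enabled after the probe has succeeded:

gluster volume create md0 transport tcp 192.168.1.21:/mnt force
gluster volume start md0
gluster peer probe 192.168.1.20
gluster peer status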
Still broken with the packages from glusterfs-351beta2-epel.repo.

Thanks,
Rich
Thanks, Rich! The steps to reproduce in comment #6 should be sufficient to check what is happening exactly. I'm moving this to the 'quota' component so that the right developers get it on their radar. I do not expect to see this fixed in 3.5.2, unless someone posts a patch very soon. We will keep this bug report updated when we have more details.
Hi,

Has there been any update on this? I'm still unable to re-assemble a distributed volume with 3.6.3beta1. I can peer probe OK, I just can't mount the volume.

Thanks,
Rich
KP, do you know about this issue related to quota? If not you, who would?
I've disabled quotas on a volume to test, and I still get this repeated in my logs when I try to mount the volume after a reboot of all volume nodes:

[2015-02-25 09:31:56.385028] I [MSGID: 100030] [glusterfsd.c:2018:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.6.3beta1 (args: /usr/sbin/glusterfs --acl --volfile-server=10.64.0.1 --volfile-id=pxe /var/lib/data)
[2015-02-25 09:31:56.474138] I [graph.c:269:gf_add_cmdline_options] 0-pxe-md-cache: adding option 'cache-posix-acl' for volume 'pxe-md-cache' with value 'true'
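For reference, the mount that produces those log entries is roughly equivalent to the following (reconstructed from the args in the log lines above, so treat the exact invocation as an assumption):

# FUSE mount of the 'pxe' volume with ACL support
mount -t glusterfs -o acl 10.64.0.1:/pxe /var/lib/data
# which ends up running:
/usr/sbin/glusterfs --acl --volfile-server=10.64.0.1 --volfile-id=pxe /var/lib/data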
I can't test this on 3.7.0 as quotas don't even work for me: https://bugzilla.redhat.com/show_bug.cgi?id=1117888
Hi Richard,

Can you check in 3.7.11 and please let us know if you still face the issue?

--
Thanks,
Manikandan.
I'm afraid I no longer use Gluster... it did not meet my needs, so I've moved to using LizardFS. If I get the chance to set up a couple of VMs to retest the above steps in the next few weeks, I'll let you know the outcome.
Hi,

Thanks, Richard, for your input. We tested on our setup, and since the bug is not reproducible in the latest version, we are closing this bug.

--
Thanks & regards,
Manikandan Selvaganesh.