+++ This bug was initially created as a clone of Bug #1101942 +++

Description of problem:
When building a new volume on a brand new server I am unable to peer probe a 2nd node when creating a distributed volume.

Version-Release number of selected component (if applicable):
glusterfs-libs-3.5.1-1.el6.x86_64
glusterfs-cli-3.5.1-1.el6.x86_64
glusterfs-3.5.1-1.el6.x86_64
glusterfs-fuse-3.5.1-1.el6.x86_64
glusterfs-server-3.5.1-1.el6.x86_64
glusterfs-api-3.5.1-1.el6.x86_64

How reproducible:
1. Install the above packages from the Gluster repo.
2. Create a new volume with 1 node.
3. Try to peer probe the 2nd node.

Actual results:
Hostname: 172.16.242.241
Uuid: ad09b6f9-9e71-462b-bec0-bcdc93db221b
State: Probe Sent to Peer (Connected)
On this node, gluster peer status times out with no output.

Expected results:
The 2nd node should join the pool.

Additional info:
This makes GlusterFS UNUSABLE as a distributed filesystem!
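As a rough sketch, steps 2 and 3 correspond to commands along these lines (the node names, volume name and brick path here are placeholders, not my exact setup):

  # on the first node: create and start a single-brick volume
  gluster volume create testvol transport tcp node1:/export/brick1 force
  gluster volume start testvol
  # then try to bring the second node into the pool
  gluster peer probe node2
  gluster peer status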
*** Bug 1101942 has been marked as a duplicate of this bug. ***
Peer probing is done by glusterd, changing components.
Hi Richard,

I have tried to reproduce this, but can not. This is what I've done:

1. install two servers with RHEL-6.5
2. install the glusterfs-server-3.5.1 packages (+deps) from download.gluster.org
3. create a volume with one brick on the 1st server
4. peer probe the second server

The above procedure works fine for me. Could you explain the differences with your environment/test?

Thanks,
Niels
Not really much to tell, other than that I run Scientific Linux release 6.5 rather than RHEL, but in theory that's the same ;-)

I run all my test systems on VirtualBox on top of an SL-6.5 host OS. They have their own private back-end NIC set up just for Gluster to communicate over, and a front-end NIC for everything else. The back-end NIC is a VirtualBox network. My Gluster volume serves both NFS and FUSE mount points on node #1.

Everything works fine with 3.4.x, but when I upgrade to 3.5 I'm unable to get anything to work. Come to think of it, everything from glusterfs-351beta2 upwards has been a problem... glusterfs-351beta1 worked fine as far as I can remember.

Do you want to see any logs/files from the servers? If so, which ones?
OK, I've just tested glusterfs-351beta1 and it is partly broken for me. With a new distributed volume I can't peer probe additional nodes, but a fresh replicated volume can be set up with two nodes, and the peer probe process works fine for that replicated setup. However, I've not tried adding a 3rd and 4th node to that setup yet. I will try a test with glusterfs-351beta2.
OK, I've found the cause: enabling quotas causes it to fail.

I've just built two brand new SL-6.5 servers using all the defaults and selected the "Minimal Server" profile. I then ran the following commands on both servers:

1) yum -y upgrade
2) service iptables stop; chkconfig iptables off; chkconfig --del iptables
3) cd /etc/yum.repos.d
4) wget http://download.gluster.org/pub/gluster/glusterfs/3.5/LATEST/EPEL.repo/glusterfs-epel.repo
5) yum -y install glusterfs-server glusterfs-fuse
6) service glusterd start

And then on Server #1 I ran this to create a volume:

gluster volume create md0 transport tcp 192.168.1.21:/mnt force
gluster volume start md0
gluster volume quota md0 enable
gluster peer probe 192.168.1.20
gluster peer status

If I remove the quota enable line from this process, everything works as expected. Hope this helps pinpoint the issue.

Thanks,
Rich
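For clarity, the sequence that does work for me is the same one with the quota enable line dropped; I haven't verified yet whether quota can safely be enabled after the probe has succeeded:

gluster volume create md0 transport tcp 192.168.1.21:/mnt force
gluster volume start md0
gluster peer probe 192.168.1.20
gluster peer status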
Still broken with the packages from glusterfs-351beta2-epel.repo.

Thanks,
Rich
Thanks, Rich! The steps to reproduce in comment #6 should be sufficient to check what is happening exactly. I'm moving this to the 'quota' component so that the right developers get it on their radar. I do not expect to see this fixed in 3.5.2, unless someone posts a patch very soon. We will keep this bug report updated when we have more details.
Hi,

Has there been any update on this? I'm still unable to re-assemble a distributed volume with 3.6.3beta1. I can peer probe OK, I just can't mount the volume.

Thanks,
Rich
KP, do you know about this issue related to quota? If not you, who would?
I've disabled quotas on a volume to test, and I still get this repeated in my logs when I try to mount the volume after a reboot of all volume nodes:

[2015-02-25 09:31:56.385028] I [MSGID: 100030] [glusterfsd.c:2018:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.6.3beta1 (args: /usr/sbin/glusterfs --acl --volfile-server=10.64.0.1 --volfile-id=pxe /var/lib/data)
[2015-02-25 09:31:56.474138] I [graph.c:269:gf_add_cmdline_options] 0-pxe-md-cache: adding option 'cache-posix-acl' for volume 'pxe-md-cache' with value 'true'
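For reference, the mount that produces those log entries is roughly equivalent to the following (reconstructed from the args in the log lines above, so treat the exact invocation as an assumption):

# FUSE mount of the 'pxe' volume with ACL support
mount -t glusterfs -o acl 10.64.0.1:/pxe /var/lib/data
# which ends up running:
/usr/sbin/glusterfs --acl --volfile-server=10.64.0.1 --volfile-id=pxe /var/lib/data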
I can't test this on 3.7.0 as quotas don't even work for me: https://bugzilla.redhat.com/show_bug.cgi?id=1117888
Hi Richard,

Can you check in 3.7.11 and please let us know if you still face the issue?

--
Thanks,
Manikandan.
I'm afraid I no longer use Gluster... it did not meet my needs, so I've moved to using LizardFS. If I get the chance to set up a couple of VMs to retest the above steps in the next few weeks, I'll let you know the outcome.
Hi,

Thanks, Richard, for your input. We tested on our setup, and since the bug is not reproducible in the latest version, we are closing this bug.

--
Thanks & regards,
Manikandan Selvaganesh.