Bug 1109872 - quota:peer probe fails after adding the new node to the existing cluster with quota enabled
Summary: quota:peer probe fails after adding the new node to the existing cluster with quota enabled
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: glusterd
Version: mainline
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Assignee: Kaushal
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: 1108505
 
Reported: 2014-06-16 14:19 UTC by Kaushal
Modified: 2014-11-11 08:35 UTC
CC List: 9 users

Fixed In Version: glusterfs-3.6.0beta1
Doc Type: Bug Fix
Doc Text:
Clone Of: 1108505
Environment:
Last Closed: 2014-11-11 08:35:14 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Kaushal 2014-06-16 14:19:52 UTC
+++ This bug was initially created as a clone of Bug #1108505 +++

Description of problem:
I have an existing cluster of four RHSS nodes. I created a volume in this cluster with volume type 6x2 and set some options that are used for NFS.
Now, if I try to peer probe a new RHSS node, the probe fails.

Version-Release number of selected component (if applicable):
glusterfs-3.6.0.15-1.el6rhs.x86_64

How reproducible:
Observed during this test run.

Steps to Reproduce:
1. Create a volume of type 6x2 and start it.
2. Set the NFS-related options such as nfs.rpc-auth-allow/reject and nfs.export-dirs/nfs.export-dir, and enable quota.
3. gluster peer probe a new RHSS node --- step 3 fails (see the example commands below).
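
For reference, a command sequence for steps 2 and 3 might look roughly like the following. This is a sketch, not the exact commands that were run: the volume name, option values and peer address are taken from the volume info and probe output quoted later in this report.

# gluster volume set dist-rep nfs.rpc-auth-allow '*.lab.eng.blr.redhat.com'
# gluster volume set dist-rep nfs.rpc-auth-reject 10.70.35.33
# gluster volume set dist-rep nfs.export-dirs on
# gluster volume set dist-rep nfs.export-dir '/1(rhsauto054.lab.eng.blr.redhat.com),/2(172.16.0.0/27)'
# gluster volume quota dist-rep enable
# gluster peer probe 10.70.37.13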

Actual results:
[root@nfs1 ~]# gluster peer probe 10.70.37.13
peer probe: failed: Probe returned with unknown errno -1

[root@nfs1 ~]# gluster peer status
Number of Peers: 4

Hostname: 10.70.37.215
Uuid: db4a5cde-f048-4796-84dd-19ba9ca98e6f
State: Peer in Cluster (Connected)

Hostname: 10.70.37.44
Uuid: 7f8f341e-4274-40f0-ae83-bde70365d2f4
State: Peer in Cluster (Connected)

Hostname: 10.70.37.201
Uuid: 9512d008-9dd8-4a5b-bf8c-983862a86c4a
State: Peer in Cluster (Connected)

Hostname: 10.70.37.13
Uuid: ccaeac50-ad54-43ef-a5a2-5a7e17666936
State: Probe Sent to Peer (Connected)


Expected results:
peer probe should succeed

Additional info:
IP of the node that is already part of the cluster and from which the command was executed:
inet 10.70.37.62/23 brd 10.70.37.255 scope global eth0

IP of the new node:
inet 10.70.37.13/23 brd 10.70.37.255 scope global eth0


Peer status from another node of the existing cluster:
[root@nfs2 ~]# gluster peer status
Number of Peers: 3

Hostname: 10.70.37.62
Uuid: bd23f0cb-d64a-4ddb-8543-6e1bbc812c7d
State: Peer in Cluster (Connected)

Hostname: 10.70.37.44
Uuid: 7f8f341e-4274-40f0-ae83-bde70365d2f4
State: Peer in Cluster (Connected)

Hostname: 10.70.37.201
Uuid: 9512d008-9dd8-4a5b-bf8c-983862a86c4a
State: Peer in Cluster (Connected)


--- Additional comment from Saurabh on 2014-06-13 16:22:11 IST ---

Santosh and I re-ran the tests to narrow down the issue; we have seen a peer probe fail only once or twice. Otherwise, the recent peer probe attempts have been successful. These latest trials were on a new volume with the same nfs options set.

--- Additional comment from Saurabh on 2014-06-13 16:32:10 IST ---

So, probably my bad: I didn't test with quota in the latest trials, whereas while filing the bz I had quota enabled on the volume.
Hence, I tried again with quota enabled and the peer probe failed, as can be seen in the results mentioned below.

Please do not lower the priority. I am changing the summary as well.


Results of the latest trial:
[root@nfs1 ~]# gluster peer probe 10.70.37.13
peer probe: failed: Probe returned with unknown errno -1
[root@nfs1 ~]# gluster volume info dist-rep
 
Volume Name: dist-rep
Type: Distributed-Replicate
Volume ID: 7ab235ad-a666-44b3-a46f-d3321f3eb4d6
Status: Started
Snap Volume: no
Number of Bricks: 7 x 2 = 14
Transport-type: tcp
Bricks:
Brick1: 10.70.37.62:/bricks/d1r1
Brick2: 10.70.37.215:/bricks/d1r2
Brick3: 10.70.37.44:/bricks/d2r1
Brick4: 10.70.37.201:/bricks/d2r2
Brick5: 10.70.37.62:/bricks/d3r1
Brick6: 10.70.37.215:/bricks/d3r2
Brick7: 10.70.37.44:/bricks/d4r1
Brick8: 10.70.37.201:/bricks/d4r2
Brick9: 10.70.37.62:/bricks/d5r1
Brick10: 10.70.37.215:/bricks/d5r2
Brick11: 10.70.37.44:/bricks/d6r1
Brick12: 10.70.37.201:/bricks/d6r2
Brick13: 10.70.37.62:/bricks/d1r1-add
Brick14: 10.70.37.215:/bricks/d1r2-add
Options Reconfigured:
features.quota: on
nfs.export-dir: /1(rhsauto054.lab.eng.blr.redhat.com),/2(172.16.0.0/27)
nfs.export-dirs: on
nfs.rpc-auth-reject: 10.70.35.33
nfs.rpc-auth-allow: *.lab.eng.blr.redhat.com
[root@nfs1 ~]# gluster peer status
Number of Peers: 4

Hostname: 10.70.37.44
Uuid: 7f8f341e-4274-40f0-ae83-bde70365d2f4
State: Peer in Cluster (Connected)

Hostname: 10.70.37.201
Uuid: 9512d008-9dd8-4a5b-bf8c-983862a86c4a
State: Peer in Cluster (Connected)

Hostname: 10.70.37.215
Uuid: db4a5cde-f048-4796-84dd-19ba9ca98e6f
State: Peer in Cluster (Connected)

Hostname: 10.70.37.13
Uuid: ccaeac50-ad54-43ef-a5a2-5a7e17666936
State: Probe Sent to Peer (Disconnected)

Comment 1 Anand Avati 2014-06-16 14:26:32 UTC
REVIEW: http://review.gluster.org/8082 (glusterd: Use blocking quotad start only on quota enable) posted (#1) for review on master by Kaushal M (kaushal)

Comment 2 Anand Avati 2014-06-17 02:37:06 UTC
COMMIT: http://review.gluster.org/8082 committed in master by Krishnan Parthasarathi (kparthas) 
------
commit 84d370774cdbc6847f4f2f64a7f47abb27a7471b
Author: Kaushal M <kaushal>
Date:   Mon Jun 16 18:56:18 2014 +0530

    glusterd: Use blocking quotad start only on quota enable
    
    Having quotad always use the blocking runner variant is
    problematic. In some cases where quotad was started from the epoll
    thread, this led to a deadlock which left glusterd
    unresponsive.
    
    This patch makes the default quotad_start function use the non-blocking
    runner_nowait variant. The blocking start is used only when quotad is
    started by the quota enable command.
    
    Change-Id: Ib27042748d69ea28e68badcfaddf61589aae4eba
    BUG: 1109872
    Signed-off-by: Kaushal M <kaushal>
    Reviewed-on: http://review.gluster.org/8082
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Krishnan Parthasarathi <kparthas>
    Tested-by: Krishnan Parthasarathi <kparthas>
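
To make the distinction in the commit message concrete, below is a minimal, self-contained C sketch. It is not glusterd source: the helper names and the /bin/true placeholder are illustrative, and inside glusterd the same blocking-versus-fire-and-forget choice is made through the runner framework (runner_run() versus runner_run_nowait()). The point it demonstrates is why a blocking start is acceptable from the CLI-driven quota-enable path but dangerous from the event thread.

/*
 * Conceptual sketch only -- not glusterd code. It contrasts a blocking
 * daemon start (the parent waits for the child to exit) with a
 * fire-and-forget start. "/bin/true" stands in for the real quotad
 * command line.
 */
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for launching quotad. */
static pid_t
spawn_daemon (void)
{
        pid_t pid = fork ();

        if (pid == 0) {
                execlp ("/bin/true", "true", (char *)NULL);
                _exit (127); /* only reached if exec fails */
        }
        return pid; /* -1 if fork failed */
}

/*
 * Blocking start: fine from a CLI-driven code path such as
 * "gluster volume quota <vol> enable", but if invoked from the
 * epoll/event thread it stalls that thread until the child exits --
 * the deadlock described in the commit message.
 */
static int
start_blocking (void)
{
        int   status = 0;
        pid_t pid    = spawn_daemon ();

        if (pid < 0)
                return -1;
        if (waitpid (pid, &status, 0) < 0) /* parent blocks here */
                return -1;
        return WIFEXITED (status) ? WEXITSTATUS (status) : -1;
}

/* Non-blocking start: launch the child and return immediately. */
static int
start_nowait (void)
{
        return (spawn_daemon () < 0) ? -1 : 0;
}

int
main (void)
{
        printf ("blocking start returned %d\n", start_blocking ());
        /* The nowait child is left for the kernel to reap when this
         * short-lived example exits. */
        printf ("nowait start returned   %d\n", start_nowait ());
        return 0;
}

The patch keeps the blocking form only for the quota enable path, presumably so the CLI command can report whether quotad actually came up, and switches every other caller to the nowait form.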

Comment 3 Niels de Vos 2014-09-22 12:43:04 UTC
A beta release for GlusterFS 3.6.0 has been made available [1]. Please verify whether this release solves the problem reported in this bug. If the glusterfs-3.6.0beta1 release does not resolve this issue, leave a comment in this bug and move the status to ASSIGNED. If this release fixes the problem for you, leave a note and change the status to VERIFIED.

Packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure (possibly an "updates-testing" repository) for your distribution.

[1] http://supercolony.gluster.org/pipermail/gluster-users/2014-September/018836.html
[2] http://supercolony.gluster.org/pipermail/gluster-users/

Comment 4 Niels de Vos 2014-11-11 08:35:14 UTC
This bug is being closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-3.6.1, please reopen this bug report.

glusterfs-3.6.1 has been announced [1], and packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://supercolony.gluster.org/pipermail/gluster-users/2014-November/019410.html
[2] http://supercolony.gluster.org/mailman/listinfo/gluster-users

