Bug 1108505
Summary: quota: peer probe fails after adding the new node to the existing cluster with quota enabled

| Field | Value | Field | Value |
|---|---|---|---|
| Product | [Red Hat Storage] Red Hat Gluster Storage | Reporter | Saurabh <saujain> |
| Component | glusterd | Assignee | Kaushal <kaushal> |
| Status | CLOSED ERRATA | QA Contact | Saurabh <saujain> |
| Severity | high | Docs Contact | |
| Priority | high | | |
| Version | rhgs-3.0 | CC | amukherj, asrivast, kaushal, kparthas, mzywusko, nlevinki, nsathyan, ssamanta, vbellur |
| Target Milestone | --- | | |
| Target Release | RHGS 3.0.0 | | |
| Hardware | x86_64 | | |
| OS | Linux | | |
| Fixed In Version | glusterfs-3.6.0.19-1 | Doc Type | Bug Fix |
Doc Text:

Cause: The way quotad was started on the new peer during a peer probe led to glusterd becoming deadlocked.

Consequence: Because glusterd was deadlocked, the peer probe command failed.

Fix: Quotad is now started in a non-blocking way during peer probe, and no longer blocks glusterd.

Result: Peer probe completes successfully.
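The Doc Text above describes a classic self-deadlock: the peer-probe path held glusterd's big lock while waiting for quotad to start, but quotad's connect-back handling needed that same lock. The following is a minimal toy model in Python, purely illustrative and not glusterd code; the function name `peer_probe`, the threading primitives, and the timeout values are all assumptions made for the sketch. It shows why waiting for the daemon while holding the lock hangs, and why a fire-and-forget start does not:

```python
import threading

def peer_probe(blocking_quotad_start: bool) -> bool:
    """Toy model of the bug. The probe path holds a 'big lock'; quotad's
    connect-back handler needs the same lock. Waiting for the handler while
    still holding the lock deadlocks (modeled here as a timeout).
    Returns True if the probe completed."""
    big_lock = threading.Lock()
    connected = threading.Event()

    def quotad_connect_back():
        with big_lock:          # the handler also needs the big lock
            connected.set()

    with big_lock:              # probe path acquires the big lock
        t = threading.Thread(target=quotad_connect_back, daemon=True)
        t.start()
        if blocking_quotad_start:
            # Old behaviour: wait for quotad while still holding the lock.
            # The handler can never acquire it, so this times out.
            if not connected.wait(timeout=0.3):
                return False    # deadlocked (simulated by the timeout)
    # New behaviour: the lock was released first, so the connect-back
    # handler can now acquire it and signal completion.
    connected.wait(timeout=1.0)
    return connected.is_set()
```

Per the Doc Text, the actual fix takes the second path: quotad is launched without the probe path waiting on it, so the lock is released before any quotad callback needs it.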
| Story Points | --- | | |
| Clone Of | | : | 1109872 (view as bug list) |
| Last Closed | 2014-09-22 19:41:07 UTC | Type | Bug |
| Regression | --- | Mount Type | --- |
| Bug Depends On | 1109872 | Bug Blocks | |
Description (Saurabh, 2014-06-12 07:20:07 UTC)
Created attachment 907965 [details]
sosreport of existing rhs node
Created attachment 907967 [details]
sosreport of new rhss node
Santosh and I reran the tests to narrow down the issue; the peer probe failed only once or twice, and the more recent probe attempts have succeeded. Those later trials used a new volume with the same NFS options set. Based on #comment5, I would like to request removing the blocker flag and lowering the priority, but we can't close the bug, since the issue did happen and we are not yet clear on why it is no longer reproducing.

Probably my mistake: I did not test with quota in the latest trials, whereas when filing the BZ I had quota enabled on the volume. I have now retried with quota enabled, and the peer probe failed again, as can be seen in the results below. Please do not lower the priority. Changing the summary as well.

Results of the latest trial:

```
[root@nfs1 ~]# gluster peer probe 10.70.37.13
peer probe: failed: Probe returned with unknown errno -1

[root@nfs1 ~]# gluster volume info dist-rep

Volume Name: dist-rep
Type: Distributed-Replicate
Volume ID: 7ab235ad-a666-44b3-a46f-d3321f3eb4d6
Status: Started
Snap Volume: no
Number of Bricks: 7 x 2 = 14
Transport-type: tcp
Bricks:
Brick1: 10.70.37.62:/bricks/d1r1
Brick2: 10.70.37.215:/bricks/d1r2
Brick3: 10.70.37.44:/bricks/d2r1
Brick4: 10.70.37.201:/bricks/d2r2
Brick5: 10.70.37.62:/bricks/d3r1
Brick6: 10.70.37.215:/bricks/d3r2
Brick7: 10.70.37.44:/bricks/d4r1
Brick8: 10.70.37.201:/bricks/d4r2
Brick9: 10.70.37.62:/bricks/d5r1
Brick10: 10.70.37.215:/bricks/d5r2
Brick11: 10.70.37.44:/bricks/d6r1
Brick12: 10.70.37.201:/bricks/d6r2
Brick13: 10.70.37.62:/bricks/d1r1-add
Brick14: 10.70.37.215:/bricks/d1r2-add
Options Reconfigured:
features.quota: on
nfs.export-dir: /1(rhsauto054.lab.eng.blr.redhat.com),/2(172.16.0.0/27)
nfs.export-dirs: on
nfs.rpc-auth-reject: 10.70.35.33
nfs.rpc-auth-allow: *.lab.eng.blr.redhat.com

[root@nfs1 ~]# gluster peer status
Number of Peers: 4

Hostname: 10.70.37.44
Uuid: 7f8f341e-4274-40f0-ae83-bde70365d2f4
State: Peer in Cluster (Connected)

Hostname: 10.70.37.201
Uuid: 9512d008-9dd8-4a5b-bf8c-983862a86c4a
State: Peer in Cluster (Connected)

Hostname: 10.70.37.215
Uuid: db4a5cde-f048-4796-84dd-19ba9ca98e6f
State: Peer in Cluster (Connected)

Hostname: 10.70.37.13
Uuid: ccaeac50-ad54-43ef-a5a2-5a7e17666936
State: Probe Sent to Peer (Disconnected)
```

Vivek, I don't know whom to assign this to, so I am assigning it to you. Please assign it to the Quota team.

Regards,
Santosh

This is another instance of quotad start causing glusterd to deadlock, similar to bug 1095585.

Downstream patch - https://code.engineering.redhat.com/gerrit/#/c/27049/

Verification results:

```
[root@nfs2 ~]# gluster volume info dist-rep | grep quota
features.quota-deem-statfs: off
features.quota: on

[root@nfs4 ~]# gluster peer probe rhsauto005.lab.eng.blr.redhat.com
peer probe: success.

[root@nfs3 ~]# gluster peer status
Number of Peers: 4

Hostname: 10.70.37.62
Uuid: ad345a97-3d00-4960-a620-d89f1f715dc0
State: Peer in Cluster (Connected)

Hostname: 10.70.37.215
Uuid: b9eded1c-fbae-4e9b-aa31-26a06e747d83
State: Peer in Cluster (Connected)

Hostname: 10.70.37.201
Uuid: 542bf4aa-b6b5-40c3-82bf-f344fb637a99
State: Peer in Cluster (Connected)

Hostname: rhsauto005.lab.eng.blr.redhat.com
Uuid: 5f0ccbd1-bec3-4c37-be35-6ce38647398c
State: Peer in Cluster (Connected)
```

Hence, moving this BZ to VERIFIED.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2014-1278.html