| Summary: | glusterd hangs on big lock | ||||||||
|---|---|---|---|---|---|---|---|---|---|
| Product: | Red Hat Gluster Storage | Reporter: | Anand Avati <aavati> | ||||||
| Component: | glusterfs | Assignee: | Vijaikumar Mallikarjuna <vmallika> | ||||||
| Status: | CLOSED ERRATA | QA Contact: | ssamanta | ||||||
| Severity: | high | Docs Contact: | |||||||
| Priority: | high | ||||||||
| Version: | 2.1 | CC: | bhubbard, chrisw, gluster-bugs, grajaiya, kparthas, mzywusko, nsathyan, psriniva, sasundar, sdharane, smohan, ssamanta, vagarwal, vbellur, vmallika | ||||||
| Target Milestone: | --- | Keywords: | ZStream | ||||||
| Target Release: | RHGS 2.1.2 | ||||||||
| Hardware: | Unspecified | ||||||||
| OS: | Unspecified | ||||||||
| Whiteboard: | |||||||||
| Fixed In Version: | glusterfs-3.4.0.49rhs-1.el6rhs | Doc Type: | Bug Fix | ||||||
| Doc Text: |
Previously, glusterd would become unresponsive when it disconnected from one of its peers while a gluster CLI command was in execution. With this fix, glusterd does not become unresponsive in such a scenario.
|
Story Points: | --- | ||||||
| Clone Of: | 1037849 | Environment: | |||||||
| Last Closed: | 2014-02-25 08:07:37 UTC | Type: | Bug | ||||||
| Regression: | --- | Mount Type: | --- | ||||||
| Documentation: | --- | CRM: | |||||||
| Verified Versions: | Category: | --- | |||||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||
| Bug Depends On: | 1037849 | ||||||||
| Bug Blocks: | |||||||||
Comment 3
Pavithra
2014-01-03 11:05:55 UTC
Doc-text looks good to me.

Created attachment 849167 [details]
Verification logs
The following steps were done to reproduce the issue.

Testcase
=========
1. Create a distribute volume with 3 nodes and start the volume.

[root@rhsauto015 brick1]# gluster volume info
Volume Name: testvol2
Type: Distribute
Volume ID: 61259fc6-d2f6-410e-8832-ea529cf709de
Status: Started
Number of Bricks: 3
Transport-type: tcp
Bricks:
Brick1: 10.70.37.7:/rhs/brick1/ex10
Brick2: 10.70.36.245:/rhs/brick1/ex9
Brick3: 10.70.37.10:/rhs/brick1/ex8

2. Block the glusterd traffic by setting iptables rules on the glusterd port on all nodes.

[root@rhsauto015 ~]# iptables -I INPUT 1 -p tcp --dport 24007 -j DROP
[root@rhsauto015 ~]# iptables -I OUTPUT 1 -p tcp --dport 24007 -j DROP

3. Through a script, simultaneously connect to the nodes and execute the CLI commands (gluster volume info and gluster peer status, continuously, 10000 times).

4. glusterd hangs after some iterations while fetching the "gluster peer status" information from the other nodes. There is no glusterd crash.

I will raise a new bug for this issue. Attached verification logs. Marking it "Assigned" as glusterd still hangs.

Steps to reproduce (since the description doesn't provide any):
1) Set the following iptables rule
#Blocks incoming messages to glusterd
iptables -I INPUT 1 -p tcp --dport 513:65535 -j DROP
2) Execute the volume-profile-info command periodically, in a loop (say)
3) Remove the above iptables rule.
#Unblocks
iptables -D INPUT 1

The issue was that the big lock was being held while submitting RPC requests. This code path was active in the volume-profile command, and hence the hang (actually a deadlock) was observed. The fix addresses only this issue. The steps in comment#6 are not related to this issue.

Sobhan,
What do you mean when you say glusterd is hung? How did you confirm that glusterd was indeed hung?
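The big-lock problem described above (the lock being held while an RPC request is submitted) can be illustrated with a small Python sketch. This is not glusterd code: `big_lock`, `submit_rpc`, `rpc_callback` and `run_command` are hypothetical names, and the assumption that the reply path needs the same lock is only one plausible way such a hang can arise, not something this report confirms. A timeout is used so the sketch reports a stall instead of actually hanging.

```python
import threading

# Hypothetical stand-in for glusterd's big lock (a plain, non-recursive lock).
big_lock = threading.Lock()


def rpc_callback(reply_ready):
    # Assumption: the reply handler must take the big lock before it can
    # mark the request as completed.
    with big_lock:
        reply_ready.set()


def submit_rpc(reply_ready):
    # Hypothetical RPC submission: the reply is handled on another thread.
    worker = threading.Thread(target=rpc_callback, args=(reply_ready,), daemon=True)
    worker.start()
    return worker


def run_command(release_lock_before_submit, timeout=0.5):
    """Return True if the request completed, False if it stalled."""
    reply_ready = threading.Event()
    with big_lock:
        if not release_lock_before_submit:
            # Problem pattern: submit and wait while still holding the lock,
            # so the reply handler can never acquire it.
            submit_rpc(reply_ready)
            return reply_ready.wait(timeout)
    # Safer pattern: the lock is dropped before the request goes out.
    submit_rpc(reply_ready)
    return reply_ready.wait(timeout)
```

Calling `run_command(release_lock_before_submit=False)` stalls until the timeout, while `run_command(release_lock_before_submit=True)` completes, which mirrors the difference between the hang and the fixed behaviour at a very coarse level.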
Based on https://bugzilla.redhat.com/show_bug.cgi?id=1037851#c6, running the command in a loop 10000 times is a stress test :)

Krishnan P,
I waited 10-15 sec, by which time the response should definitely have been received by the local glusterd daemon, to make sure glusterd was not hung. I have tried a number of iterations (i.e. 100) while blocking the glusterd port using iptables rules, and glusterd did not hang.

Verification Information:

Test-1
=======
1. Create a distribute volume using 3 server nodes.
2. Set the following iptables rule (i.e. one which still allows incoming ssh, dns, http etc.)
#Blocks incoming messages to glusterd
iptables -I INPUT 1 -p tcp --dport 513:65535 -j DROP
3. Executed volume profile info/volume profile start/volume profile info/volume profile stop along with other commands like gluster peer status/gluster volume status in parallel on different nodes (100 times). Put a sleep of 2, 10, 15 sec in between the commands to make sure there is no issue due to timeouts.

Test-2
======
1. Unblocked the iptables rules (iptables -F/iptables -D INPUT 1).
2. Executed volume profile info/volume profile start/volume profile info/volume profile stop along with other commands like gluster peer status/gluster volume status in parallel on different nodes (100 times). Put a sleep of 2, 10, 15 sec in between the commands to make sure there is no issue due to timeouts.
3. Verify that glusterd does not hang.

Verified with following Build:
=============================
glusterfs 3.4.0.57rhs

Number of RHSS nodes:
====================
3

Created attachment 851033 [details]
Verification Logs-2
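The verification approach above (run a CLI command repeatedly, pause between runs, and treat a missing response after 10-15 sec as a hang) could be scripted roughly as follows. This is a minimal sketch, not the script used for the attached logs: `run_with_deadline` and its parameters are hypothetical names, and the deadline and pause values are taken from the comment only as examples.

```python
import subprocess
import time


def run_with_deadline(cmd, iterations, deadline_s, pause_s=0.0):
    """Run cmd repeatedly; return the iteration indices that exceeded deadline_s."""
    stalled = []
    for i in range(iterations):
        try:
            # subprocess.run kills the child and raises TimeoutExpired
            # if it has not finished within deadline_s seconds.
            subprocess.run(
                cmd,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=deadline_s,
            )
        except subprocess.TimeoutExpired:
            stalled.append(i)
        time.sleep(pause_s)
    return stalled


# Example call on a node with the gluster CLI installed (not executed here):
# run_with_deadline(["gluster", "peer", "status"], iterations=100,
#                   deadline_s=15, pause_s=2)
```

An empty result means every invocation returned within the deadline; any listed index marks an iteration where the CLI did not respond in time and is worth checking against the glusterd logs.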
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2014-0208.html