Bug 763263 (GLUSTER-1531)
Summary: | FINODELK bailing out | | |
---|---|---|---|
Product: | [Community] GlusterFS | Reporter: | Anush Shetty <anush> |
Component: | rdma | Assignee: | Raghavendra G <raghavendra> |
Status: | CLOSED WORKSFORME | QA Contact: | |
Severity: | high | Docs Contact: | |
Priority: | low | | |
Version: | mainline | CC: | gluster-bugs, raghavendra, vijay |
Target Milestone: | --- | | |
Target Release: | --- | | |
Hardware: | All | | |
OS: | Linux | | |
Whiteboard: | | | |
Fixed In Version: | | Doc Type: | Bug Fix |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | | Type: | --- |
Regression: | RTNR | Mount Type: | fuse |
Documentation: | DNR | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Attachments: | | | |
Description
Anush Shetty
2010-09-06 06:14:17 UTC
Created attachment 304

*** Bug 1462 has been marked as a duplicate of this bug. ***

Not able to reproduce the issue. State dump also does not indicate anything being blocked.

*** Bug 1811 has been marked as a duplicate of this bug. ***

* The bug is reproducible even on a two-node afr setup.
* Looking at the logs on the client side, afr is not sending an unlock for one of the locks granted to it; strangely, it sends the unlock call to one of its children but not the other. This was verified by putting trace between client and afr. Server-side logs also confirm that the unlock call has not been received.

*** Bug 1559 has been marked as a duplicate of this bug. ***

*** Bug 1881 has been marked as a duplicate of this bug. ***

Another symptom of this bug is the ping timer expiring on rdma setups (bugs 2060 and 2043). rdma has a quota mechanism for managing the receive buffers. To send a request, non-zero quota must be available, and this quota is returned to the pool once a reply has come back. Since finodelk requests are blocked on the server side, their quota is not returned, so the ping request could not be transmitted to the server, resulting in expiry of the ping timer.

On looking through the logs, I found that afr did not send an UNLOCK request to one of its children. Below are the logs. Observe the lock requests on ino=32623548.
[2010-11-08 00:54:03.954115] I [trace.c:1364:trace_finodelk] trace-0: 9485: volume=rdma-replicate-0, (fd =0x2aaaae181138, ino=32623548}, cmd=SETLK, type=WRITE, start=0, len=0, pid=18446744073606080992)
[2010-11-08 00:54:03.954151] I [trace.c:1364:trace_finodelk] trace-1: 9485: volume=rdma-replicate-0, (fd =0x2aaaae181138, ino=32623548}, cmd=SETLK, type=WRITE, start=0, len=0, pid=18446744073606080992)
[2010-11-08 00:54:03.954228] I [trace.c:1301:trace_finodelk_cbk] trace-0: 9483: op_ret=-1, op_errno=11
[2010-11-08 00:54:03.954252] I [trace.c:1301:trace_finodelk_cbk] trace-1: 9483: op_ret=-1, op_errno=11
[2010-11-08 00:54:03.954269] I [trace.c:1364:trace_finodelk] trace-0: 9483: volume=rdma-replicate-0, (fd =0x2aaaae181090, ino=46006322}, cmd=SETLKW, type=WRITE, start=0, len=0, pid=18446744073670823160)
[2010-11-08 00:54:03.954351] I [trace.c:1183:trace_fxattrop_cbk] trace-0: 9482: (op_ret=0, op_errno=22)
[2010-11-08 00:54:03.954425] I [trace.c:1183:trace_fxattrop_cbk] trace-1: 9482: (op_ret=0, op_errno=22)
[2010-11-08 00:54:03.954441] I [trace.c:1364:trace_finodelk] trace-0: 9482: volume=rdma-replicate-0, (fd =0x2aaaae181090, ino=46006322}, cmd=SETLK, type=UNLOCK, start=0, len=0, pid=18446744072335397008)
[2010-11-08 00:54:03.954468] I [trace.c:1364:trace_finodelk] trace-1: 9482: volume=rdma-replicate-0, (fd =0x2aaaae181090, ino=46006322}, cmd=SETLK, type=UNLOCK, start=0, len=0, pid=18446744072335397008)
[2010-11-08 00:54:03.954502] I [trace.c:840:trace_flush_cbk] trace-1: 9458: (op_ret=0, op_errno=0)
[2010-11-08 00:54:03.954521] I [trace.c:1364:trace_finodelk] trace-1: 9458: volume=rdma-replicate-0, (fd =0x2aaaae18103c, ino=32623972}, cmd=SETLK, type=UNLOCK, start=0, len=0, pid=18446744072292244719)
[2010-11-08 00:54:03.954553] I [trace.c:1301:trace_finodelk_cbk] trace-0: 9485: op_ret=-1, op_errno=11
[2010-11-08 00:54:03.954575] I [trace.c:1301:trace_finodelk_cbk] trace-1: 9485: op_ret=0, op_errno=0
[2010-11-08 00:54:03.954590] I [trace.c:1364:trace_finodelk] trace-1: 9485: volume=rdma-replicate-0, (fd =0x2aaaae181138, ino=32623548}, cmd=SETLK, type=UNLOCK, start=0, len=0, pid=0)
[2010-11-08 00:54:03.954621] I [trace.c:1301:trace_finodelk_cbk] trace-0: 9483: op_ret=0, op_errno=0
[2010-11-08 00:54:03.954640] I [trace.c:1364:trace_finodelk] trace-1: 9483: volume=rdma-replicate-0, (fd =0x2aaaae181090, ino=46006322}, cmd=SETLKW, type=WRITE, start=0, len=0, pid=13935544)
[2010-11-08 00:54:03.954710] I [trace.c:1424:trace_lookup] trace-0: 9486: (loc {path=/glusterfs/build/conf1702.sh, ino=46006322})
[2010-11-08 00:54:03.954755] I [trace.c:1301:trace_finodelk_cbk] trace-1: 9482: op_ret=0, op_errno=0
[2010-11-08 00:54:03.954790] I [trace.c:1424:trace_lookup] trace-1: 9486: (loc {path=/glusterfs/build/conf1702.sh, ino=46006322})
[2010-11-08 00:54:03.954832] I [trace.c:1301:trace_finodelk_cbk] trace-0: 9482: op_ret=0, op_errno=0
[2010-11-08 00:54:03.954832] I [trace.c:1301:trace_finodelk_cbk] trace-0: 9482: op_ret=0, op_errno=0
[2010-11-08 00:54:03.954882] I [trace.c:1301:trace_finodelk_cbk] trace-1: 9459: op_ret=0, op_errno=0
[2010-11-08 00:54:03.954902] I [trace.c:1822:trace_flush] trace-1: 9459: (*fd=0x2aaaae18103c)
[2010-11-08 00:54:03.955049] I [trace.c:1301:trace_finodelk_cbk] trace-1: 9458: op_ret=0, op_errno=0
[2010-11-08 00:54:03.955123] I [trace.c:1301:trace_finodelk_cbk] trace-1: 9485: op_ret=0, op_errno=0
[2010-11-08 00:54:03.955139] I [trace.c:1364:trace_finodelk] trace-0: 9485: volume=rdma-replicate-0, (fd =0x2aaaae181138, ino=32623548}, cmd=SETLKW, type=WRITE, start=0, len=0, pid=18446744072283501362)

As can be seen from the logs, a SETLK request was sent on both trace-0 and trace-1, followed by an UNLOCK request only on trace-1.

Ignore my previous comment relating this bug to bugs 2060 and 2043; I was wrong in my analysis. Also ignore the comment about afr not sending UNLOCK to the server.

The bug is no longer reproducible on the latest git pull (26cedae57d5b7cb8). Hence marking this bug as resolved.
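
When reading a trace log like the one quoted in this report, it can help to pair each finodelk request with its callback per translator instance and frame number. The following is a minimal sketch of that idea, not part of GlusterFS: the function name `pair_finodelk`, the regular expressions, and the assumption that replies arrive in request order for a given (translator, frame) pair are all illustrative. The sample lines are the frame 9485 entries copied from the log in this report.

```python
import re
from collections import defaultdict, deque

# Patterns for the two trace line shapes seen in this report (assumed stable).
REQ = re.compile(
    r"\[trace\.c:\d+:trace_finodelk\] (trace-\d+): (\d+): "
    r".*?ino=(\d+)\}, cmd=(\w+), type=(\w+)"
)
CBK = re.compile(
    r"\[trace\.c:\d+:trace_finodelk_cbk\] (trace-\d+): (\d+): "
    r"op_ret=(-?\d+), op_errno=(\d+)"
)

# Frame 9485 lines from the log in this report.
SAMPLE = """\
[2010-11-08 00:54:03.954115] I [trace.c:1364:trace_finodelk] trace-0: 9485: volume=rdma-replicate-0, (fd =0x2aaaae181138, ino=32623548}, cmd=SETLK, type=WRITE, start=0, len=0, pid=18446744073606080992)
[2010-11-08 00:54:03.954151] I [trace.c:1364:trace_finodelk] trace-1: 9485: volume=rdma-replicate-0, (fd =0x2aaaae181138, ino=32623548}, cmd=SETLK, type=WRITE, start=0, len=0, pid=18446744073606080992)
[2010-11-08 00:54:03.954553] I [trace.c:1301:trace_finodelk_cbk] trace-0: 9485: op_ret=-1, op_errno=11
[2010-11-08 00:54:03.954575] I [trace.c:1301:trace_finodelk_cbk] trace-1: 9485: op_ret=0, op_errno=0
[2010-11-08 00:54:03.954590] I [trace.c:1364:trace_finodelk] trace-1: 9485: volume=rdma-replicate-0, (fd =0x2aaaae181138, ino=32623548}, cmd=SETLK, type=UNLOCK, start=0, len=0, pid=0)
[2010-11-08 00:54:03.955123] I [trace.c:1301:trace_finodelk_cbk] trace-1: 9485: op_ret=0, op_errno=0
[2010-11-08 00:54:03.955139] I [trace.c:1364:trace_finodelk] trace-0: 9485: volume=rdma-replicate-0, (fd =0x2aaaae181138, ino=32623548}, cmd=SETLKW, type=WRITE, start=0, len=0, pid=18446744072283501362)
"""


def pair_finodelk(lines):
    """Match finodelk requests to callbacks, FIFO per (translator, frame)."""
    pending = defaultdict(deque)  # (translator, frame) -> requests awaiting a reply
    replies = []
    for line in lines:
        m = REQ.search(line)
        if m:
            xl, frame, ino, cmd, typ = m.groups()
            pending[(xl, frame)].append((int(ino), cmd, typ))
            continue
        m = CBK.search(line)
        if m:
            xl, frame, ret, err = m.groups()
            queue = pending[(xl, frame)]
            if queue:  # callbacks whose request precedes the excerpt are skipped
                ino, cmd, typ = queue.popleft()
                replies.append((xl, int(frame), ino, cmd, typ, int(ret), int(err)))
    unanswered = [
        (xl, int(frame), *req)
        for (xl, frame), queue in pending.items()
        for req in queue
    ]
    return replies, unanswered


replies, unanswered = pair_finodelk(SAMPLE.splitlines())
for xl, frame, ino, cmd, typ, ret, err in replies:
    print(f"{xl} frame={frame} ino={ino} {cmd}/{typ} -> op_ret={ret} op_errno={err}")
for xl, frame, ino, cmd, typ in unanswered:
    print(f"{xl} frame={frame} ino={ino} {cmd}/{typ} -> no reply in excerpt")
```

On the sample lines, this pairing shows the non-blocking SETLK for frame 9485 failing on trace-0 with op_errno=11 (EAGAIN on Linux) and succeeding on trace-1, the UNLOCK going only to trace-1, and a blocking SETLKW then being issued on trace-0. That reading would be consistent with the missing-UNLOCK analysis being withdrawn in the final comment, though the report itself does not spell out the reason.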