Bug 802039 - [glusterfs-3.3.0qa26]: blocked posix locks in the server
Summary: [glusterfs-3.3.0qa26]: blocked posix locks in the server
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: locks
Version: mainline
Hardware: Unspecified
OS: Unspecified
high
high
Target Milestone: ---
Assignee: krishnan parthasarathi
QA Contact: Raghavendra Bhat
URL:
Whiteboard:
Depends On:
Blocks: 817967
TreeView+ depends on / blocked
 
Reported: 2012-03-10 13:27 UTC by Raghavendra Bhat
Modified: 2015-12-01 16:45 UTC (History)
3 users (show)

Fixed In Version: glusterfs-3.4.0
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-07-24 17:59:08 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions: release-3.3(638a4740cc553c96bc01d1dfe4a2b7acf0b406e6)
Embargoed:


Attachments (Terms of Use)

Description Raghavendra Bhat 2012-03-10 13:27:22 UTC
Description of problem:
2x2 distributed replicate volume, with 1 fuse and 1 nfs client. On fuse was running rdd, ping_pong, fs-perf test one after another in a loop. Rebooted one of the bricks and restarted it.

ping_pong running on the fuse client hung. (ping_pong was also getting errors before hanging since it was started)

Version-Release number of selected component (if applicable):


How reproducible:
always

Steps to Reproduce:
1. create a distributed replicate volume, start it and mount it
2.  run ping_pong on the fuse client
3. bring one of the bricks down and then bring it up after some time.
  
Actual results:

ping_piong gets blocked

Expected results:

ping_pong should not block and continue to get locks

Additional info:

statedump of the server having blocked locks.

[1./export-xfs/mirror.itable.active.1999]
gfid=275f3060-b089-43f1-8930-36a3582e0a6c
nlookup=579
ref=3
ia_type=1

[xlator.features.locks.mirror-locks.inode]
path=/file
mandatory=0
posixlk-count=5
posixlk.posixlk[0](ACTIVE)=type=WRITE, whence=0, start=96, len=1, pid = 18890, owner=4e6e35ff74ffffff, transport=0x10dd9690, , granted at Sat 
Mar 10 03:02:41 2012

posixlk.posixlk[1](ACTIVE)=type=WRITE, whence=0, start=103, len=1, pid = 18899, owner=4e6e35ff74ffffff, transport=0x10dd9690, , granted at Sat
 Mar 10 03:02:41 2012

posixlk.posixlk[2](ACTIVE)=type=WRITE, whence=0, start=97, len=1, pid = 18890, owner=4e6e35ff74ffffff, transport=0x10dd9690, , granted at Sat 
Mar 10 03:02:41 2012

posixlk.posixlk[3](ACTIVE)=type=READ, whence=0, start=0, len=0, pid = 18899, owner=4e6e35ff74ffffff, transport=0x10feec10, , granted at Sat Ma
r 10 03:09:59 2012

posixlk.posixlk[4](BLOCKED)=type=WRITE, whence=0, start=96, len=1, pid = 18899, owner=4e6e35ff74ffffff, transport=0x10feec10, , blocked at Sat Mar 10 03:09:59 2012


[1./export-xfs/mirror.itable.active.2000]
gfid=23599b35-4d8b-401c-a493-768a0c19fad8
nlookup=582
ref=5
ia_type=2

[xlator.features.locks.mirror-locks.inode]
path=/samba/lib/ccan/tally
mandatory=0

[1./export-xfs/mirror.itable.active.2001]
gfid=32121b6f-1080-4410-9022-469dd3f46dc6
nlookup=576
ref=4
:

server logs where blocked locks were found:


2012-03-10 02:57:11.368580] I [server.c:52:grace_time_handler] 0-mirror-server: grace timer expired
[2012-03-10 02:57:11.368681] I [server-helpers.c:630:server_connection_destroy] 0-mirror-server: destroyed connection of node131-19679-2012/03
/10-02:49:10:171926-mirror-client-3-0
[2012-03-10 02:57:12.368882] I [server.c:52:grace_time_handler] 0-mirror-server: grace timer expired
[2012-03-10 02:57:12.368976] I [server-helpers.c:630:server_connection_destroy] 0-mirror-server: destroyed connection of node130-17561-2012/03
/10-02:49:31:050508-mirror-client-3-0
[2012-03-10 02:57:12.368991] I [server.c:52:grace_time_handler] 0-mirror-server: grace timer expired
[2012-03-10 02:57:12.369004] I [server-helpers.c:630:server_connection_destroy] 0-mirror-server: destroyed connection of RHEL6.1-8777-2012/03/
10-02:49:47:085068-mirror-client-3-0
[2012-03-10 03:02:20.755382] I [server.c:622:server_rpc_notify] 0-mirror-server: disconnecting connectionfrom 10.1.11.104:997
[2012-03-10 03:02:20.755525] I [server.c:631:server_rpc_notify] 0-mirror-server: starting a grace timer for tcp.mirror-server
[2012-03-10 03:02:31.408647] I [server.c:52:grace_time_handler] 0-mirror-server: grace timer expired
[2012-03-10 03:02:31.408772] I [server-helpers.c:474:do_fd_cleanup] 0-mirror-server: fd cleanup on /file
[2012-03-10 03:02:31.408810] I [server-helpers.c:474:do_fd_cleanup] 0-mirror-server: fd cleanup on /in
[2012-03-10 03:02:31.409164] I [server-helpers.c:630:server_connection_destroy] 0-mirror-server: destroyed connection of RHEL6.1-2225-2012/03/09-02:06:40:494326-mirror-client-3-18
[2012-03-10 03:02:35.888654] I [server.c:622:server_rpc_notify] 0-mirror-server: disconnecting connectionfrom 10.1.11.131:1013
[2012-03-10 03:02:35.888720] I [server.c:631:server_rpc_notify] 0-mirror-server: starting a grace timer for tcp.mirror-server
[2012-03-10 03:02:36.947712] I [glusterfsd-mgmt.c:64:mgmt_cbk_spec] 0-mgmt: Volume file changed
[2012-03-10 03:02:36.974833] I [server.c:622:server_rpc_notify] 0-mirror-server: disconnecting connectionfrom 10.1.11.130:1009
[2012-03-10 03:02:36.974866] I [server.c:631:server_rpc_notify] 0-mirror-server: starting a grace timer for tcp.mirror-server
[2012-03-10 03:02:36.979345] I [server.c:622:server_rpc_notify] 0-mirror-server: disconnecting connectionfrom 10.1.11.145:1017
[2012-03-10 03:02:36.979400] I [server.c:631:server_rpc_notify] 0-mirror-server: starting a grace timer for tcp.mirror-server
[2012-03-10 03:02:38.003381] I [glusterfsd-mgmt.c:64:mgmt_cbk_spec] 0-mgmt: Volume file changed
[2012-03-10 03:02:38.003455] I [glusterfsd-mgmt.c:1299:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing
[2012-03-10 03:02:38.003785] I [glusterfsd-mgmt.c:1299:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing
[2012-03-10 03:02:41.882657] I [server-handshake.c:569:server_setvolume] 0-mirror-server: accepted client from 10.1.11.104:993 (version: 3.3.0qa26)
[2012-03-10 03:02:41.976401] I [server-handshake.c:569:server_setvolume] 0-mirror-server: accepted client from 10.1.11.131:1011 (version: 3.3.0qa26)
[2012-03-10 03:02:42.020283] I [server-handshake.c:569:server_setvolume] 0-mirror-server: accepted client from 10.1.11.130:1008 (version: 3.3.0qa26)
[2012-03-10 03:02:42.031175] I [server-handshake.c:569:server_setvolume] 0-mirror-server: accepted client from 10.1.11.145:1013 (version: 3.3.0qa26)
[2012-03-10 03:02:46.410339] I [server.c:52:grace_time_handler] 0-mirror-server: grace timer expired
[2012-03-10 03:02:46.410444] I [server-helpers.c:630:server_connection_destroy] 0-mirror-server: destroyed connection of node131-19694-2012/03/10-02:56:25:690380-mirror-client-3-0
[2012-03-10 03:02:47.410582] I [server.c:52:grace_time_handler] 0-mirror-server: grace timer expired
[2012-03-10 03:02:47.410693] I [server-helpers.c:630:server_connection_destroy] 0-mirror-server: destroyed connection of node130-17622-2012/03/10-02:56:46:569325-mirror-client-3-0
[2012-03-10 03:02:47.410721] I [server.c:52:grace_time_handler] 0-mirror-server: grace timer expired
[2012-03-10 03:02:47.410740] I [server-helpers.c:630:server_connection_destroy] 0-mirror-server: destroyed connection of RHEL6.1-8787-2012/03/10-02:57:02:665588-mirror-client-3-0
:


Also fuse client is logging too much about the registering grace timer and grce-timer getting expired.

2012-03-10 08:15:51.119192] W [client.c:2011:client_rpc_notify] 46-mirror-client-2: Registering a grace timer
[2012-03-10 08:15:51.122198] W [client.c:2011:client_rpc_notify] 42-mirror-client-2: Registering a grace timer
[2012-03-10 08:15:51.125312] W [client.c:2011:client_rpc_notify] 40-mirror-client-2: Registering a grace timer
[2012-03-10 08:15:51.127181] W [client.c:2011:client_rpc_notify] 38-mirror-client-2: Registering a grace timer
[2012-03-10 08:15:51.130176] W [client.c:2011:client_rpc_notify] 32-mirror-client-2: Registering a grace timer
[2012-03-10 08:15:51.133261] W [client.c:2011:client_rpc_notify] 31-mirror-client-2: Registering a grace timer
[2012-03-10 08:15:51.136168] W [client.c:2011:client_rpc_notify] 34-mirror-client-2: Registering a grace timer
[2012-03-10 08:15:51.139166] W [client.c:2011:client_rpc_notify] 33-mirror-client-2: Registering a grace timer
[2012-03-10 08:15:51.141172] W [client.c:2011:client_rpc_notify] 28-mirror-client-2: Registering a grace timer
[2012-03-10 08:15:51.144165] W [client.c:2011:client_rpc_notify] 23-mirror-client-2: Registering a grace timer
[2012-03-10 08:15:51.147170] W [client.c:2011:client_rpc_notify] 22-mirror-client-2: Registering a grace timer
[2012-03-10 08:15:51.150174] W [client.c:2011:client_rpc_notify] 16-mirror-client-2: Registering a grace timer
[2012-03-10 08:15:51.153377] W [client.c:2011:client_rpc_notify] 29-mirror-client-2: Registering a grace timer
[2012-03-10 08:15:51.155177] W [client.c:2011:client_rpc_notify] 24-mirror-client-2: Registering a grace timer
[2012-03-10 08:15:51.158166] W [client.c:2011:client_rpc_notify] 52-mirror-client-2: Registering a grace timer
[2012-03-10 08:15:51.161253] W [client.c:2011:client_rpc_notify] 21-mirror-client-2: Registering a grace timer
[2012-03-10 08:15:51.164197] W [client.c:2011:client_rpc_notify] 13-mirror-client-2: Registering a grace timer
[2012-03-10 08:15:51.166181] W [client.c:2011:client_rpc_notify] 9-mirror-client-2: Registering a grace timer
[2012-03-10 08:15:52.172189] W [client.c:2011:client_rpc_notify] 54-mirror-client-2: Registering a grace timer
[2012-03-10 08:15:52.402671] W [client.c:112:client_grace_timeout] 35-mirror-client-2: client grace timer expired, updating the lk-version to 562
[2012-03-10 08:15:52.402722] W [client.c:112:client_grace_timeout] 3-mirror-client-2: client grace timer expired, updating the lk-version to 1881
[2012-03-10 08:15:52.404982] W [client.c:112:client_grace_timeout] 62-mirror-client-2: client grace timer expired, updating the lk-version to 17
[2012-03-10 08:15:53.178180] W [client.c:2011:client_rpc_notify] 47-mirror-client-2: Registering a grace timer
[2012-03-10 08:15:53.410586] W [client.c:112:client_grace_timeout] 57-mirror-client-2: client grace timer expired, updating the lk-version to 118
[2012-03-10 08:15:54.184213] W [client.c:2011:client_rpc_notify] 45-mirror-client-2: Registering a grace timer
[2012-03-10 08:15:54.186190] W [client.c:2011:client_rpc_notify] 44-mirror-client-2: Registering a grace timer
[2012-03-10 08:15:54.189204] W [client.c:2011:client_rpc_notify] 27-mirror-client-2: Registering a grace timer
[2012-03-10 08:15:54.192202] W [client.c:2011:client_rpc_notify] 0-mirror-client-2: Registering a grace timer
[2012-03-10 08:15:54.194209] W [client.c:2011:client_rpc_notify] 25-mirror-client-2: Registering a grace timer
[2012-03-10 08:15:54.197352] W [client.c:2011:client_rpc_notify] 19-mirror-client-2: Registering a grace timer
[2012-03-10 08:15:54.200194] W [client.c:2011:client_rpc_notify] 20-mirror-client-2: Registering a grace timer
[2012-03-10 08:15:54.202200] W [client.c:2011:client_rpc_notify] 17-mirror-client-2: Registering a grace timer
[2012-03-10 08:15:54.205181] W [client.c:2011:client_rpc_notify] 10-mirror-client-2: Registering a grace timer
[2012-03-10 08:15:54.208181] W [client.c:2011:client_rpc_notify] 15-mirror-client-2: Registering a grace timer
[2012-03-10 08:15:54.210215] W [client.c:2011:client_rpc_notify] 7-mirror-client-2: Registering a grace timer
[2012-03-10 08:15:54.416538] W [client.c:112:client_grace_timeout] 39-mirror-client-2: client grace timer expired, updating the lk-version to 646
[2012-03-10 08:15:54.416586] W [client.c:112:client_grace_timeout] 58-mirror-client-2: client grace timer expired, updating the lk-version to 105

Comment 1 Amar Tumballi 2012-03-12 09:46:16 UTC
please update these bugs w.r.to 3.3.0qa27, need to work on it as per target milestone set.

Comment 2 Vijay Bellur 2012-05-24 07:22:47 UTC
Fixed as part of bug 810502. Can you please confirm?

Thanks,
Vijay

Comment 3 Raghavendra Bhat 2012-05-24 07:37:45 UTC
Checked on release-3.3 (git head: 638a4740cc553c96bc01d1dfe4a2b7acf0b406e6). Ran ping_pong from 2 fuse client parallely. Repeated the tests multiple times with both lock-heal on and off. Did not hit the issue and ping_pong completed without hang evertytime on all the mounts. Can be closed.


Note You need to log in before you can comment on or make changes to this bug.