Bug 845754

Summary: untar linux kernel tree, untars 15 entries in 1 minute.
Product: [Community] GlusterFS Reporter: Pranith Kumar K <pkarampu>
Component: replicateAssignee: Pranith Kumar K <pkarampu>
Status: CLOSED CURRENTRELEASE QA Contact:
Severity: high Docs Contact:
Priority: unspecified    
Version: mainlineCC: gluster-bugs, jdarcy, rfortier
Target Milestone: ---Keywords: Triaged
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: glusterfs-3.4.0 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 853679 (view as bug list) Environment:
Last Closed: 2013-07-24 17:22:34 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 853679, 895528    

Description Pranith Kumar K 2012-08-04 15:12:35 UTC
Description of problem:
Used git-bisect to figure out that the following commit caused the problem.

20:33:06  ⚡ git bisect good
5a3d849b8563067d35c1422e43e605bd9533f3c2 is the first bad commit
commit 5a3d849b8563067d35c1422e43e605bd9533f3c2
Author: Anand Avati <avati>
Date:   Tue Jun 26 21:06:39 2012 -0700

   cluster/afr: post-op-delay support
let me just verify that it is indeed the bad-one.

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.
  
Actual results:


Expected results:


Additional info:

Comment 1 Pranith Kumar K 2012-08-04 16:46:18 UTC
This is happening because of the following sequence of operations:
1) Write: takes a full lock on file with fd address.
2) setattr: issues blocking lock based on frame->address. Because of the delayed changelog the unlock happens after a second. Some how logs show 2 seconds delay though. Still need to figure that out. This delay is the reason for the bug.
3) Flush: takes a full lock on file with fd address. (this is not contributing to the bug)

[2012-08-04 16:36:47.691995] I [server-helpers.c:807:get_frame_from_request] 0-server: 30 - ccad140100000000
[2012-08-04 16:36:47.692093] D [io-threads.c:266:iot_schedule] 0-r2-io-threads: FINODELK scheduled as normal fop
[2012-08-04 16:36:47.692558] I [server-helpers.c:807:get_frame_from_request] 0-server: 34 - ccad140100000000
[2012-08-04 16:36:47.692693] D [io-threads.c:266:iot_schedule] 0-r2-io-threads: FXATTROP scheduled as slow fop
[2012-08-04 16:36:47.694386] I [server-helpers.c:807:get_frame_from_request] 0-server: 13 - 0000000000000000
[2012-08-04 16:36:47.694515] D [io-threads.c:266:iot_schedule] 0-r2-io-threads: WRITE scheduled as slow fop
[2012-08-04 16:36:47.695786] I [server-helpers.c:807:get_frame_from_request] 0-server: 29 - 98b8f48baf7f0000
[2012-08-04 16:36:47.695903] D [io-threads.c:266:iot_schedule] 0-r2-io-threads: INODELK scheduled as normal fop
[2012-08-04 16:36:47.696479] I [server-helpers.c:807:get_frame_from_request] 0-server: 29 - 98b8f48baf7f0000
[2012-08-04 16:36:47.696565] D [io-threads.c:266:iot_schedule] 0-r2-io-threads: INODELK scheduled as normal fop
[2012-08-04 16:36:49.508681] I [server-helpers.c:807:get_frame_from_request] 0-server: 34 - ccad140100000000
[2012-08-04 16:36:49.508798] D [io-threads.c:266:iot_schedule] 0-r2-io-threads: FXATTROP scheduled as slow fop
[2012-08-04 16:36:49.509421] I [server-helpers.c:807:get_frame_from_request] 0-server: 30 - ccad140100000000
[2012-08-04 16:36:49.509483] D [io-threads.c:266:iot_schedule] 0-r2-io-threads: FINODELK scheduled as normal fop
[2012-08-04 16:36:49.509530] D [inodelk.c:313:__inode_unlock_lock] 0-r2-locks:  Matching lock found for unlock 0-
9223372036854775807, by ccad140100000000 on 0xc53bb0
[2012-08-04 16:36:49.510362] I [server-helpers.c:807:get_frame_from_request] 0-server: 38 - 0000000000000000
[2012-08-04 16:36:49.510462] D [io-threads.c:266:iot_schedule] 0-r2-io-threads: SETATTR scheduled as normal fop
[2012-08-04 16:36:49.511198] I [server-helpers.c:807:get_frame_from_request] 0-server: 29 - 98b8f48baf7f0000
[2012-08-04 16:36:49.511298] D [io-threads.c:266:iot_schedule] 0-r2-io-threads: INODELK scheduled as normal fop
[2012-08-04 16:36:49.511452] I [server-helpers.c:807:get_frame_from_request] 0-server: 29 - bcc0f48baf7f0000
[2012-08-04 16:36:49.511609] D [io-threads.c:2380:__iot_workers_scale] 0-r2-io-threads: scaled threads to 2 (queue_size=2/2)
[2012-08-04 16:36:49.511672] D [io-threads.c:266:iot_schedule] 0-r2-io-threads: INODELK scheduled as normal fop
[2012-08-04 16:36:49.511759] D [inodelk.c:313:__inode_unlock_lock] 0-r2-locks:  Matching lock found for unlock 9223372036854775806-9223372036854775807, by 98b8f48baf7f0000 on 0xc53bb0
[2012-08-04 16:36:49.512325] I [server-helpers.c:807:get_frame_from_request] 0-server: 38 - 0000000000000000
[2012-08-04 16:36:49.512410] D [io-threads.c:266:iot_schedule] 0-r2-io-threads: SETATTR scheduled as normal fop
[2012-08-04 16:36:49.512978] I [server-helpers.c:807:get_frame_from_request] 0-server: 29 - bcc0f48baf7f0000
[2012-08-04 16:36:49.513074] D [io-threads.c:266:iot_schedule] 0-r2-io-threads: INODELK scheduled as normal fop
[2012-08-04 16:36:49.513117] D [inodelk.c:313:__inode_unlock_lock] 0-r2-locks:  Matching lock found for unlock 9223372036854775806-9223372036854775807, by bcc0f48baf7f0000 on 0xc53bb0
[2012-08-04 16:36:49.513232] I [server-helpers.c:807:get_frame_from_request] 0-server: 29 - 74b0f48baf7f0000
[2012-08-04 16:36:49.513318] D [io-threads.c:266:iot_schedule] 0-r2-io-threads: INODELK scheduled as normal fop
[2012-08-04 16:36:49.513914] I [server-helpers.c:807:get_frame_from_request] 0-server: 38 - 0000000000000000
[2012-08-04 16:36:49.514028] D [io-threads.c:266:iot_schedule] 0-r2-io-threads: SETATTR scheduled as normal fop
[2012-08-04 16:36:49.514284] D [posix.c:262:posix_do_chmod] 0-r2-posix: /gfs/r2_0/linux-3.0.1/.gitignore (Function not implemented)
[2012-08-04 16:36:49.513117] D [inodelk.c:313:__inode_unlock_lock] 0-r2-locks:  Matching lock found for unlock 9223372036854775806-9223372036854775807, by bcc0f48baf7f0000 on 0xc53bb0
[2012-08-04 16:36:49.513232] I [server-helpers.c:807:get_frame_from_request] 0-server: 29 - 74b0f48baf7f0000
[2012-08-04 16:36:49.513318] D [io-threads.c:266:iot_schedule] 0-r2-io-threads: INODELK scheduled as normal fop
[2012-08-04 16:36:49.513914] I [server-helpers.c:807:get_frame_from_request] 0-server: 38 - 0000000000000000
[2012-08-04 16:36:49.514028] D [io-threads.c:266:iot_schedule] 0-r2-io-threads: SETATTR scheduled as normal fop
[2012-08-04 16:36:49.514284] D [posix.c:262:posix_do_chmod] 0-r2-posix: /gfs/r2_0/linux-3.0.1/.gitignore (Function not implemented)
[2012-08-04 16:36:49.514887] I [server-helpers.c:807:get_frame_from_request] 0-server: 29 - 74b0f48baf7f0000
[2012-08-04 16:36:49.514983] D [io-threads.c:266:iot_schedule] 0-r2-io-threads: INODELK scheduled as normal fop
[2012-08-04 16:36:49.515117] I [server-helpers.c:807:get_frame_from_request] 0-server: 30 - ccad140100000000
[2012-08-04 16:36:49.515186] D [io-threads.c:266:iot_schedule] 0-r2-io-threads: FINODELK scheduled as normal fop
[2012-08-04 16:36:49.515265] D [inodelk.c:313:__inode_unlock_lock] 0-r2-locks:  Matching lock found for unlock 9223372036854775806-9223372036854775807, by 74b0f48baf7f0000 on 0xc53bb0
[2012-08-04 16:36:49.515806] I [server-helpers.c:807:get_frame_from_request] 0-server: 27 - 0000000000000000
[2012-08-04 16:36:49.515928] D [io-threads.c:266:iot_schedule] 0-r2-io-threads: LOOKUP scheduled as fast fop
[2012-08-04 16:36:49.516435] I [server-helpers.c:807:get_frame_from_request] 0-server: 15 - 2c3736089ab3ced2
[2012-08-04 16:36:49.516520] D [io-threads.c:266:iot_schedule] 0-r2-io-threads: FLUSH scheduled as normal fop
[2012-08-04 16:36:49.517220] I [server-helpers.c:807:get_frame_from_request] 0-server: 30 - ccad140100000000
[2012-08-04 16:36:49.517295] D [io-threads.c:266:iot_schedule] 0-r2-io-threads: FINODELK scheduled as normal fop
[2012-08-04 16:36:49.517330] D [inodelk.c:313:__inode_unlock_lock] 0-r2-locks:  Matching lock found for unlock 0-9223372036854775807, by ccad140100000000 on 0xc53bb0

Comment 2 Pranith Kumar K 2012-08-04 17:17:09 UTC
I think the steps were a bit confusing in the comment above:
Here are the steps:
1) Write: takes a full lock on file with fd address.
Completes fop and unwinds, write-post-op is delayed by 1 second.
2) setattr: issues non-blocking lock with frame->address. This fails because of the full lock by write in prev step. Now a blocking lock is issued . Because of the delayed changelog the unlock for the write-fop's full lock happens after a second. This delay is the reason for the bug. Some how logs show 2 seconds delay though. Still need to figure that out.

Comment 3 Jeff Darcy 2012-10-31 21:02:21 UTC
http://review.gluster.org/3975