Bug 765188 (GLUSTER-3456) - [b6e3e9c480be4226925b51c5e9ee0c368aa94a6d]: client hanging
Summary: [b6e3e9c480be4226925b51c5e9ee0c368aa94a6d]: client hanging
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: GLUSTER-3456
Product: GlusterFS
Classification: Community
Component: replicate
Version: pre-release
Hardware: x86_64
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Assignee: Pranith Kumar K
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2011-08-21 18:33 UTC by Raghavendra Bhat
Modified: 2011-08-29 07:44 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions: glusterfs-3.3beta



Description Raghavendra Bhat 2011-08-21 18:33:07 UTC
After the crash in bug 3455 (where one client crashed and the other client was still running), I tried to remove the contents with rm -rf, but it hung. This is what the statedump shows:


[global.callpool.stack.1]
global.callpool.stack.1.uid=0
global.callpool.stack.1.gid=0
global.callpool.stack.1.pid=17773
global.callpool.stack.1.unique=1072429
global.callpool.stack.1.op=LOOKUP
global.callpool.stack.1.type=1
global.callpool.stack.1.cnt=3

[global.callpool.stack.1.frame.1]
global.callpool.stack.1.frame.1.ref_count=1
global.callpool.stack.1.frame.1.translator=fuse
global.callpool.stack.1.frame.1.complete=0

[global.callpool.stack.1.frame.2]
global.callpool.stack.1.frame.2.ref_count=0
global.callpool.stack.1.frame.2.translator=mirror-stat-prefetch
global.callpool.stack.1.frame.2.complete=0
global.callpool.stack.1.frame.2.parent=mirror
global.callpool.stack.1.frame.2.wind_from=io_stats_lookup
global.callpool.stack.1.frame.2.wind_to=FIRST_CHILD(this)->fops->lookup
global.callpool.stack.1.frame.2.unwind_to=io_stats_lookup_cbk

[global.callpool.stack.1.frame.3]
global.callpool.stack.1.frame.3.ref_count=1
global.callpool.stack.1.frame.3.translator=mirror
global.callpool.stack.1.frame.3.complete=0
global.callpool.stack.1.frame.3.parent=fuse
global.callpool.stack.1.frame.3.wind_from=fuse_lookup_resume
global.callpool.stack.1.frame.3.wind_to=xl->fops->lookup
global.callpool.stack.1.frame.3.unwind_to=fuse_lookup_cbk

There are still many lookups hanging (not just in stat-prefetch but also in afr).


[global.callpool.stack.4]
global.callpool.stack.4.uid=0
global.callpool.stack.4.gid=0
global.callpool.stack.4.pid=17324
global.callpool.stack.4.unique=1072029
global.callpool.stack.4.op=LOOKUP
global.callpool.stack.4.type=1
global.callpool.stack.4.cnt=10

[global.callpool.stack.4.frame.1]
global.callpool.stack.4.frame.1.ref_count=1
global.callpool.stack.4.frame.1.translator=fuse
global.callpool.stack.4.frame.1.complete=0

[global.callpool.stack.4.frame.2]
global.callpool.stack.4.frame.2.ref_count=0
global.callpool.stack.4.frame.2.translator=mirror-client-1
global.callpool.stack.4.frame.2.complete=1
global.callpool.stack.4.frame.2.parent=mirror-replicate-0
global.callpool.stack.4.frame.2.wind_from=afr_lookup
global.callpool.stack.4.frame.2.wind_to=priv->children[i]->fops->lookup
global.callpool.stack.4.frame.2.unwind_from=client3_1_lookup_cbk
global.callpool.stack.4.frame.2.unwind_to=afr_lookup_cbk

[global.callpool.stack.4.frame.3]
global.callpool.stack.4.frame.3.ref_count=0
global.callpool.stack.4.frame.3.translator=mirror-client-0
global.callpool.stack.4.frame.3.complete=1
global.callpool.stack.4.frame.3.parent=mirror-replicate-0
global.callpool.stack.4.frame.3.wind_from=afr_lookup
global.callpool.stack.4.frame.3.wind_to=priv->children[i]->fops->lookup
global.callpool.stack.4.frame.3.unwind_from=client3_1_lookup_cbk
global.callpool.stack.4.frame.3.unwind_to=afr_lookup_cbk

[global.callpool.stack.4.frame.4]
global.callpool.stack.4.frame.4.ref_count=0
global.callpool.stack.4.frame.4.translator=mirror-replicate-0
global.callpool.stack.4.frame.4.complete=0
global.callpool.stack.4.frame.4.parent=mirror-write-behind
global.callpool.stack.4.frame.4.wind_from=default_lookup
global.callpool.stack.4.frame.4.wind_to=FIRST_CHILD(this)->fops->lookup
global.callpool.stack.4.frame.4.unwind_to=default_lookup_cbk

[global.callpool.stack.4.frame.5]
global.callpool.stack.4.frame.5.ref_count=1
global.callpool.stack.4.frame.5.translator=mirror-write-behind
global.callpool.stack.4.frame.5.complete=0
global.callpool.stack.4.frame.5.parent=mirror-read-ahead
global.callpool.stack.4.frame.5.wind_from=default_lookup
global.callpool.stack.4.frame.5.wind_to=FIRST_CHILD(this)->fops->lookup
global.callpool.stack.4.frame.5.unwind_to=default_lookup_cbk

[global.callpool.stack.4.frame.6]
global.callpool.stack.4.frame.6.ref_count=1
global.callpool.stack.4.frame.6.translator=mirror-read-ahead
global.callpool.stack.4.frame.6.complete=0
global.callpool.stack.4.frame.6.parent=mirror-io-cache
global.callpool.stack.4.frame.6.wind_from=ioc_lookup
global.callpool.stack.4.frame.6.wind_to=FIRST_CHILD (this)->fops->lookup
global.callpool.stack.4.frame.6.unwind_to=ioc_lookup_cbk

[global.callpool.stack.4.frame.7]
global.callpool.stack.4.frame.7.ref_count=1
global.callpool.stack.4.frame.7.translator=mirror-io-cache
global.callpool.stack.4.frame.7.complete=0
global.callpool.stack.4.frame.7.parent=mirror-quick-read
global.callpool.stack.4.frame.7.wind_from=qr_lookup
global.callpool.stack.4.frame.7.wind_to=FIRST_CHILD(this)->fops->lookup
global.callpool.stack.4.frame.7.unwind_to=qr_lookup_cbk
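
To read these frames: the two client frames (frame.2 and frame.3) show complete=1, i.e. their unwinds came back, while mirror-replicate-0 (frame.4) is still complete=0, meaning afr received both child replies but never unwound to its parent. Below is a minimal, self-contained C model (ours, not glusterfs code; call_count and complete are toy fields mirroring the statedump) of how a frame can stay complete=0 forever once more unwinds are expected than winds were actually issued, which per comment 2 is what happened to afr's flush during self-heal:

    #include <stdio.h>

    /* toy call frame: 'complete' mirrors the statedump flag */
    struct frame {
            int call_count;            /* unwinds still expected */
            int complete;
    };

    /* stands in for a fop callback: the last unwind completes the frame */
    static void fop_cbk(struct frame *f)
    {
            if (--f->call_count == 0)
                    f->complete = 1;
    }

    int main(void)
    {
            /* two unwinds expected, but only one wind ever issued */
            struct frame f = { .call_count = 2, .complete = 0 };

            fop_cbk(&f);               /* the only callback that arrives */

            printf("complete=%d (hung if still 0)\n", f.complete);
            return 0;
    }

With call_count initialized to 2 but only one wind issued, the counter never reaches zero and every parent frame up the stack stays complete=0, which is exactly the picture in the dump above.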

Comment 1 Anand Avati 2011-08-22 06:41:20 UTC
CHANGE: http://review.gluster.com/294 (Change-Id: I66362a3087a635fb7b759d7836a1f6564a6a7fc9) merged in master by Vijay Bellur (vijay)

Comment 2 Raghavendra Bhat 2011-08-29 04:44:58 UTC
The problem was that, earlier, we used to send the flush on the source and then on all the sinks. But before sending the flush to the source, we would already have cleared the pending xattrs of the sinks, and thus all the sinks would also have become sources.

So the flush was sent only to the sources; the other stack wind to the sink (for the flush fop) never happened, while we were still expecting 2 unwinds before continuing. The client therefore hung, since the second stack unwind never arrived.

Now we keep all the sources and sinks in the success array and call the stack wind of flush exactly once for each source as well as each sink.

Thus flush is now sent to both the sources and the sinks, and the hang is no longer seen.
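
Below is a minimal, self-contained sketch of the corrected bookkeeping (ours, for illustration; it is not the merged patch from comment 1, and success[], CHILD_COUNT and flush_cbk are illustrative names patterned after afr). The point is that the expected-unwind count and the winds themselves are now derived from the same success array, so every counted unwind corresponds to a wind that was actually issued:

    #include <stdio.h>

    #define CHILD_COUNT 2              /* one source, one sink */

    static int call_count;

    /* stands in for afr's flush callback: the last unwind lets the
     * frame complete */
    static void flush_cbk(int child)
    {
            printf("unwind from child %d\n", child);
            if (--call_count == 0)
                    printf("all unwinds received; frame can complete\n");
    }

    int main(void)
    {
            /* after the fix, sources AND sinks are both recorded here */
            int success[CHILD_COUNT] = { 1, 1 };
            int i;

            /* pass 1: count expected unwinds from the success array */
            call_count = 0;
            for (i = 0; i < CHILD_COUNT; i++)
                    if (success[i])
                            call_count++;

            /* pass 2: issue the winds from the SAME array */
            for (i = 0; i < CHILD_COUNT; i++)
                    if (success[i])
                            flush_cbk(i);  /* wind + (toy) synchronous unwind */

            return 0;
    }

Because the set used for counting and the set used for winding can no longer diverge, clearing the sinks' pending xattrs before the flush no longer strands the frame.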

