After the crash of 3455 (where one client crashed and the other client was still running), I tried to remove the contents with rm -rf, but it hung. This is what the statedump says:

[global.callpool.stack.1]
global.callpool.stack.1.uid=0
global.callpool.stack.1.gid=0
global.callpool.stack.1.pid=17773
global.callpool.stack.1.unique=1072429
global.callpool.stack.1.op=LOOKUP
global.callpool.stack.1.type=1
global.callpool.stack.1.cnt=3

[global.callpool.stack.1.frame.1]
global.callpool.stack.1.frame.1.ref_count=1
global.callpool.stack.1.frame.1.translator=fuse
global.callpool.stack.1.frame.1.complete=0

[global.callpool.stack.1.frame.2]
global.callpool.stack.1.frame.2.ref_count=0
global.callpool.stack.1.frame.2.translator=mirror-stat-prefetch
global.callpool.stack.1.frame.2.complete=0
global.callpool.stack.1.frame.2.parent=mirror
global.callpool.stack.1.frame.2.wind_from=io_stats_lookup
global.callpool.stack.1.frame.2.wind_to=FIRST_CHILD(this)->fops->lookup
global.callpool.stack.1.frame.2.unwind_to=io_stats_lookup_cbk

[global.callpool.stack.1.frame.3]
global.callpool.stack.1.frame.3.ref_count=1
global.callpool.stack.1.frame.3.translator=mirror
global.callpool.stack.1.frame.3.complete=0
global.callpool.stack.1.frame.3.parent=fuse
global.callpool.stack.1.frame.3.wind_from=fuse_lookup_resume
global.callpool.stack.1.frame.3.wind_to=xl->fops->lookup
global.callpool.stack.1.frame.3.unwind_to=fuse_lookup_cbk

Many other lookups are still hung (not just in stat-prefetch but also in afr).

[global.callpool.stack.4]
global.callpool.stack.4.uid=0
global.callpool.stack.4.gid=0
global.callpool.stack.4.pid=17324
global.callpool.stack.4.unique=1072029
global.callpool.stack.4.op=LOOKUP
global.callpool.stack.4.type=1
global.callpool.stack.4.cnt=10

[global.callpool.stack.4.frame.1]
global.callpool.stack.4.frame.1.ref_count=1
global.callpool.stack.4.frame.1.translator=fuse
global.callpool.stack.4.frame.1.complete=0

[global.callpool.stack.4.frame.2]
global.callpool.stack.4.frame.2.ref_count=0
global.callpool.stack.4.frame.2.translator=mirror-client-1
global.callpool.stack.4.frame.2.complete=1
global.callpool.stack.4.frame.2.parent=mirror-replicate-0
global.callpool.stack.4.frame.2.wind_from=afr_lookup
global.callpool.stack.4.frame.2.wind_to=priv->children[i]->fops->lookup
global.callpool.stack.4.frame.2.unwind_from=client3_1_lookup_cbk
global.callpool.stack.4.frame.2.unwind_to=afr_lookup_cbk

[global.callpool.stack.4.frame.3]
global.callpool.stack.4.frame.3.ref_count=0
global.callpool.stack.4.frame.3.translator=mirror-client-0
global.callpool.stack.4.frame.3.complete=1
global.callpool.stack.4.frame.3.parent=mirror-replicate-0
global.callpool.stack.4.frame.3.wind_from=afr_lookup
global.callpool.stack.4.frame.3.wind_to=priv->children[i]->fops->lookup
global.callpool.stack.4.frame.3.unwind_from=client3_1_lookup_cbk
global.callpool.stack.4.frame.3.unwind_to=afr_lookup_cbk

[global.callpool.stack.4.frame.4]
global.callpool.stack.4.frame.4.ref_count=0
global.callpool.stack.4.frame.4.translator=mirror-replicate-0
global.callpool.stack.4.frame.4.complete=0
global.callpool.stack.4.frame.4.parent=mirror-write-behind
global.callpool.stack.4.frame.4.wind_from=default_lookup
global.callpool.stack.4.frame.4.wind_to=FIRST_CHILD(this)->fops->lookup
global.callpool.stack.4.frame.4.unwind_to=default_lookup_cbk

[global.callpool.stack.4.frame.5]
global.callpool.stack.4.frame.5.ref_count=1
global.callpool.stack.4.frame.5.translator=mirror-write-behind
global.callpool.stack.4.frame.5.complete=0
global.callpool.stack.4.frame.5.parent=mirror-read-ahead
global.callpool.stack.4.frame.5.wind_from=default_lookup
global.callpool.stack.4.frame.5.wind_to=FIRST_CHILD(this)->fops->lookup
global.callpool.stack.4.frame.5.unwind_to=default_lookup_cbk

[global.callpool.stack.4.frame.6]
global.callpool.stack.4.frame.6.ref_count=1
global.callpool.stack.4.frame.6.translator=mirror-read-ahead
global.callpool.stack.4.frame.6.complete=0
global.callpool.stack.4.frame.6.parent=mirror-io-cache
global.callpool.stack.4.frame.6.wind_from=ioc_lookup
global.callpool.stack.4.frame.6.wind_to=FIRST_CHILD (this)->fops->lookup
global.callpool.stack.4.frame.6.unwind_to=ioc_lookup_cbk

[global.callpool.stack.4.frame.7]
global.callpool.stack.4.frame.7.ref_count=1
global.callpool.stack.4.frame.7.translator=mirror-io-cache
global.callpool.stack.4.frame.7.complete=0
global.callpool.stack.4.frame.7.parent=mirror-quick-read
global.callpool.stack.4.frame.7.wind_from=qr_lookup
global.callpool.stack.4.frame.7.wind_to=FIRST_CHILD(this)->fops->lookup
global.callpool.stack.4.frame.7.unwind_to=qr_lookup_cbk
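For anyone who wants to pick the stuck frames out of a dump like this without reading it by eye, here is a minimal Python sketch. It is not a GlusterFS tool: the names parse_callpool and pending_frames are hypothetical, and it assumes the raw dump file has one key=value entry per line and uses only the global.callpool.stack.N[.frame.M].key form seen in this report.

```python
import re
from collections import defaultdict

# Matches stack-level keys such as "global.callpool.stack.4.op=LOOKUP" and
# frame-level keys such as "global.callpool.stack.4.frame.2.complete=1".
_KEY = re.compile(
    r"global\.callpool\.stack\.(\d+)\.(?:frame\.(\d+)\.)?([A-Za-z_]+)=(.*)"
)


def parse_callpool(text):
    """Return {stack_no: {"info": {...}, "frames": {frame_no: {...}}}}."""
    stacks = defaultdict(lambda: {"info": {}, "frames": defaultdict(dict)})
    for line in text.splitlines():
        m = _KEY.fullmatch(line.strip())
        if not m:
            continue  # skips "[...]" section headers, blank lines and prose
        stack_no, frame_no, key, value = m.groups()
        if frame_no is None:
            stacks[int(stack_no)]["info"][key] = value
        else:
            stacks[int(stack_no)]["frames"][int(frame_no)][key] = value
    return stacks


def pending_frames(stacks):
    """Yield (stack_no, op, translator) for every frame with complete=0."""
    for stack_no in sorted(stacks):
        stack = stacks[stack_no]
        op = stack["info"].get("op", "?")
        for frame_no in sorted(stack["frames"]):
            frame = stack["frames"][frame_no]
            if frame.get("complete") == "0":
                yield stack_no, op, frame.get("translator", "?")


# A shortened excerpt of stack 4 from this report, used as sample input.
sample = """\
[global.callpool.stack.4]
global.callpool.stack.4.op=LOOKUP
[global.callpool.stack.4.frame.2]
global.callpool.stack.4.frame.2.translator=mirror-client-1
global.callpool.stack.4.frame.2.complete=1
[global.callpool.stack.4.frame.4]
global.callpool.stack.4.frame.4.translator=mirror-replicate-0
global.callpool.stack.4.frame.4.complete=0
"""

for stack_no, op, translator in pending_frames(parse_callpool(sample)):
    print(stack_no, op, translator)  # prints: 4 LOOKUP mirror-replicate-0
```

On the excerpt, both client frames have completed while the replicate frame has not, which points at the replicate translator as the layer that is still waiting.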
CHANGE: http://review.gluster.com/294 (Change-Id: I66362a3087a635fb7b759d7836a1f6564a6a7fc9) merged in master by Vijay Bellur (vijay)
The problem was that we used to send the flush first to the source and then to all the sinks. But before sending the flush to the source, we had already cleared the pending xattrs of the sinks, so all the sinks had become sources as well. As a result, the flush was sent only to the sources: the other stack wind of the flush fop, to the sink, never happened, while we were still expecting 2 unwinds before continuing. The client hung because the second stack unwind never arrived. Now we keep all the sources and sinks in the success array and call the stack wind of flush only once, covering all sources as well as sinks. The flush is therefore sent to both source and sink, and the hang is no longer seen.
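The mismatch between expected and actual replies can be illustrated with a toy model. This is a sketch in Python, not the AFR code: the function flush_completes and its parameters are hypothetical, each wound child is assumed to reply exactly once, and which of the two children was skipped before the fix is chosen arbitrarily for illustration.

```python
def flush_completes(expected_replies, wind_targets):
    """Return True if the pending-reply counter reaches zero.

    expected_replies: number of unwinds the caller waits for.
    wind_targets: children the flush is actually wound to; each one is
    assumed to reply (unwind) exactly once.
    """
    call_count = expected_replies
    for _child in wind_targets:
        call_count -= 1  # one unwind per wind
    return call_count == 0


children = ["mirror-client-0", "mirror-client-1"]  # one source, one sink

# Before the fix: two unwinds are expected, but the flush is wound to only
# one child, so the counter stops at 1 and the caller never continues.
old = flush_completes(expected_replies=len(children), wind_targets=children[:1])

# After the fix: sources and sinks are all in the success array, so the
# flush is wound to every child a reply is expected from.
new = flush_completes(expected_replies=len(children), wind_targets=children)

print(old, new)  # prints: False True
```

The point of the sketch is only that the number of winds must equal the number of unwinds being waited for; any path that skips a wind without lowering the expected count leaves the frame pending.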