Description of problem:
A pure replicate volume with 3 replicas. 3 fuse clients and 3 nfs clients, each executing a different tool in a loop (rdd, fs-perf-test, ping_pong, create-bench, etc.). Volume set operations were being executed in parallel, and a brick was brought down and then brought back up. The client executing fs-perf-test hung (even df -h hung when statfs hit that mount point).

According to the statedump there were 230929 call stacks in progress.

[global.callpool]
callpool_address=0x1757cf0
callpool.cnt=230929

[global.callpool.stack.1]
uid=0
gid=0
pid=0
unique=0
type=0
cnt=1

[global.callpool.stack.1.frame.1]
ref_count=0
translator=fuse
complete=0

[global.callpool.stack.2]
uid=0
gid=0
pid=0
unique=0
type=0
cnt=11

[global.callpool.stack.2.frame.1]
ref_count=1
translator=fuse
complete=0

[global.callpool.stack.2.frame.2]
ref_count=0
translator=mirror-replicate-0
complete=0
parent=mirror-replicate-0
wind_from=afr_open_cbk
wind_to=this->fops->ftruncate
unwind_to=afr_open_ftruncate_cbk

[global.callpool.stack.2.frame.3]
ref_count=0
translator=mirror-client-2
complete=1
parent=mirror-replicate-0

Version-Release number of selected component (if applicable):

How reproducible:
always

Steps to Reproduce:
1. Create a replicate volume, start it and mount it via multiple clients (fuse, nfs).
2. Run different tools in a loop on different mounts (ping_pong, fs-perf-test, rdd, create-bench, threaded-io, etc.).
3. After some hours, the client running fs-perf-test hangs.

Actual results:
The fuse client running fs-perf-test hangs.

Expected results:
The fuse client should not hang.

Additional info:

[2012-03-12 01:13:27.621882] W [fuse-bridge.c:3590:fuse_migrate_fd] 0-glusterfs-fuse: open on gfid (b372810c-0cfb-4bd1-a11e-461c6cd115c1) failed (Cannot allocate memory)
[2012-03-12 01:13:27.623179] I [afr-common.c:1313:afr_launch_self_heal] 15-mirror-replicate-0: background data self-heal triggered. path: , reason: lookup detected pending operations
[2012-03-12 01:13:27.638233] I [afr-self-heal-data.c:738:afr_sh_data_fix] 15-mirror-replicate-0: no active sinks for performing self-heal on file
[2012-03-12 01:13:27.647158] I [afr-self-heal-common.c:2037:afr_self_heal_completion_cbk] 15-mirror-replicate-0: background data self-heal completed on
[2012-03-12 01:13:27.651514] W [fd.c:804:__fd_ctx_set] (-->/usr/local/lib/glusterfs/3.3.0qa27/xlator/performance/write-behind.so(wb_open_cbk+0x190) [0x7f129b462995] (-->/usr/local/lib/glusterfs/3.3.0qa27/xlator/performance/write-behind.so(wb_file_create+0x1e1) [0x7f129b45d607] (-->/usr/local/lib/libglusterfs.so.0(fd_ctx_set+0xb5) [0x7f129fec5ee0]))) 15-: 0xb15cfbc mirror-write-behind
[2012-03-12 01:13:27.651586] W [fd.c:804:__fd_ctx_set] (-->/usr/local/lib/glusterfs/3.3.0qa27/xlator/performance/write-behind.so(wb_open_cbk+0x3c6) [0x7f129b462bcb] (-->/usr/local/lib/glusterfs/3.3.0qa27/xlator/performance/read-ahead.so(ra_open_cbk+0x28a) [0x7f129b24f7c8] (-->/usr/local/lib/libglusterfs.so.0(fd_ctx_set+0xb5) [0x7f129fec5ee0]))) 15-: 0xb15cfbc mirror-read-ahead
[2012-03-12 01:13:27.651605] W [read-ahead.c:110:ra_open_cbk] 15-mirror-read-ahead: cannot set read-ahead context information in fd (0xb15cfbc)
[2012-03-12 01:13:27.651677] W [fuse-bridge.c:3590:fuse_migrate_fd] 0-glusterfs-fuse: open on gfid (de28f094-18d5-49e2-8b9c-39af139a083a) failed (Cannot allocate memory)
[2012-03-12 01:13:27.652473] I [afr-common.c:1313:afr_launch_self_heal] 15-mirror-replicate-0: background data self-heal triggered. path: , reason: lookup detected pending operations
[2012-03-12 01:13:27.665738] I [afr-self-heal-data.c:738:afr_sh_data_fix] 15-mirror-replicate-0: no active sinks for performing self-heal on file
[2012-03-12 01:13:27.674267] I [afr-self-heal-common.c:2037:afr_self_heal_completion_cbk] 15-mirror-replicate-0: background data self-heal completed on
[2012-03-12 01:13:27.677538] W [fd.c:804:__fd_ctx_set] (-->/usr/local/lib/glusterfs/3.3.0qa27/xlator/performance/write-behind.so(wb_open_cbk+0x190) [0x7f129b462995] (-->/usr/local/lib/glusterfs/3.3.0qa27/xlator/performance/write-behind.so(wb_file_create+0x1e1) [0x7f129b45d607] (-->/usr/local/lib/libglusterfs.so.0(fd_ctx_set+0xb5) [0x7f129fec5ee0]))) 15-: 0xb15d020 mirror-write-behind
[2012-03-12 01:13:27.677616] W [fd.c:804:__fd_ctx_set] (-->/usr/local/lib/glusterfs/3.3.0qa27/xlator/performance/write-behind.so(wb_open_cbk+0x3c6) [0x7f129b462bcb] (-->/usr/local/lib/glusterfs/3.3.0qa27/xlator/performance/read-ahead.so(ra_open_cbk+0x28a) [0x7f129b24f7c8] (-->/usr/local/lib/libglusterfs.so.0(fd_ctx_set+0xb5) [0x7f129fec5ee0]))) 15-: 0xb15d020 mirror-read-ahead
[2012-03-12 01:13:27.677637] W [read-ahead.c:110:ra_open_cbk] 15-mirror-read-ahead: cannot set read-ahead context information in fd (0xb15d020)
[2012-03-12 01:13:27.677708] W [fuse-bridge.c:3590:fuse_migrate_fd] 0-glusterfs-fuse: open on gfid (ca814e91-e1f9-4f00-9d42-51953993c0a9) failed (Cannot allocate memory)
[2012-03-12 01:13:27.679087] I [afr-common.c:1313:afr_launch_self_heal] 15-mirror-replicate-0: background data self-heal triggered. path: , reason: lookup detected pending operations
[2012-03-12 01:13:27.694692] I [afr-self-heal-data.c:738:afr_sh_data_fix] 15-mirror-replicate-0: no active sinks for performing self-heal on file
[2012-03-12 01:13:27.697798] W [client3_1-fops.c:1228:client3_1_inodelk_cbk] 15-mirror-client-0: remote operation failed: No such file or directory
[2012-03-12 01:13:27.697829] E [afr-lk-common.c:568:afr_unlock_inodelk_cbk] 15-mirror-replicate-0: : unlock failed on 0, reason: No such file or directory
[2012-03-12 01:13:27.703460] I [afr-self-heal-common.c:2037:afr_self_heal_com
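
With a callpool this large, it can help to see which translators the incomplete frames are sitting in. The following is a minimal sketch, not part of GlusterFS: it assumes the statedump file has one [section] header or key=value pair per line, as in the excerpt from the description, and the names parse_sections and summarize_callpool are made up for illustration.

```python
import re
from collections import Counter

# Section header such as "[global.callpool.stack.2.frame.1]".
SECTION_RE = re.compile(r"^\[(?P<name>[^\]]+)\]$")
# Frame sections only, as opposed to per-stack or global sections.
FRAME_RE = re.compile(r"^global\.callpool\.stack\.\d+\.frame\.\d+$")


def parse_sections(text):
    """Yield (section_name, fields) for every [section] in the dump text."""
    name, fields = None, {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = SECTION_RE.match(line)
        if match:
            if name is not None:
                yield name, fields
            name, fields = match.group("name"), {}
        elif "=" in line and name is not None:
            # Split on the first "=" only; values may contain "->" etc.
            key, _, value = line.partition("=")
            fields[key.strip()] = value.strip()
    if name is not None:
        yield name, fields


def summarize_callpool(text):
    """Return (callpool.cnt, Counter of incomplete frames per translator)."""
    total = None
    pending = Counter()
    for name, fields in parse_sections(text):
        if name == "global.callpool":
            total = int(fields.get("callpool.cnt", 0))
        elif FRAME_RE.match(name) and fields.get("complete") == "0":
            pending[fields.get("translator", "?")] += 1
    return total, pending


# Abbreviated copy of the excerpt from this report.
SAMPLE = """\
[global.callpool]
callpool_address=0x1757cf0
callpool.cnt=230929
[global.callpool.stack.1]
cnt=1
[global.callpool.stack.1.frame.1]
ref_count=0
translator=fuse
complete=0
[global.callpool.stack.2]
cnt=11
[global.callpool.stack.2.frame.1]
ref_count=1
translator=fuse
complete=0
[global.callpool.stack.2.frame.2]
ref_count=0
translator=mirror-replicate-0
complete=0
wind_to=this->fops->ftruncate
[global.callpool.stack.2.frame.3]
ref_count=0
translator=mirror-client-2
complete=1
"""

if __name__ == "__main__":
    total, pending = summarize_callpool(SAMPLE)
    print("call stacks reported:", total)
    for translator, count in pending.most_common():
        print(translator, count)
```

Run against the full statedump instead of SAMPLE, the per-translator counts should show where the pending frames accumulate, for example in fuse versus mirror-replicate-0.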
Not reproducible anymore. Hence removing from the blocker list.
http://review.gluster.org/3566