Bug 802414 - [glusterfs-3.3.0qa27]: glusterfs client hung when fs-perf-test was executed
Summary: [glusterfs-3.3.0qa27]: glusterfs client hung when fs-perf-test was executed
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: core
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Assignee: Vijay Bellur
QA Contact: Raghavendra Bhat
URL:
Whiteboard:
Depends On:
Blocks: 850501
Reported: 2012-03-12 13:50 UTC by Raghavendra Bhat
Modified: 2013-07-24 17:57 UTC

Fixed In Version: glusterfs-3.4.0
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-07-24 17:57:26 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Description Raghavendra Bhat 2012-03-12 13:50:31 UTC
Description of problem:
Setup: a pure replicate volume with 3 replicas, mounted by 3 fuse clients and 3 nfs clients. Each client was executing a different tool in a loop (rdd, fs-perf-test, ping_pong, create-bench, etc.). Volume set operations were being executed in parallel, and one brick was brought down and then brought back up.

The client executing fs-perf-test hung (even df -h hung when it issued statfs on that mount point). According to the statedump, there were 230929 call stacks in progress.

[global.callpool]
callpool_address=0x1757cf0
callpool.cnt=230929

[global.callpool.stack.1]
uid=0
gid=0
pid=0
unique=0
type=0
cnt=1

[global.callpool.stack.1.frame.1]
ref_count=0
translator=fuse
complete=0

[global.callpool.stack.2]
uid=0
gid=0
pid=0
unique=0
type=0
cnt=11

[global.callpool.stack.2.frame.1]
ref_count=1
translator=fuse
complete=0

[global.callpool.stack.2.frame.2]
ref_count=0
translator=mirror-replicate-0
complete=0
parent=mirror-replicate-0
wind_from=afr_open_cbk
wind_to=this->fops->ftruncate
unwind_to=afr_open_ftruncate_cbk

[global.callpool.stack.2.frame.3]
ref_count=0
translator=mirror-client-2
complete=1
parent=mirror-replicate-0


Version-Release number of selected component (if applicable):


How reproducible:

always
Steps to Reproduce:
1. Create a replicate volume, start it, and mount it via multiple clients (fuse, nfs).
2. Run different tools in a loop on different mounts (ping_pong, fs-perf-test, rdd, create-bench, threaded-io, etc.).
3. After some hours, the client running fs-perf-test hangs.
  
Actual results:

fuse client running fs-perf-test hangs
Expected results:
fuse client should not hang

Additional info:

[2012-03-12 01:13:27.621882] W [fuse-bridge.c:3590:fuse_migrate_fd] 0-glusterfs-fuse: open on gfid (b372810c-0cfb-4bd1-a11e-461c6cd115c1) failed (Cannot allocate memory)
[2012-03-12 01:13:27.623179] I [afr-common.c:1313:afr_launch_self_heal] 15-mirror-replicate-0: background  data self-heal triggered. path: , reason: lookup detected pending operations
[2012-03-12 01:13:27.638233] I [afr-self-heal-data.c:738:afr_sh_data_fix] 15-mirror-replicate-0: no active sinks for performing self-heal on file
[2012-03-12 01:13:27.647158] I [afr-self-heal-common.c:2037:afr_self_heal_completion_cbk] 15-mirror-replicate-0: background  data self-heal completed on
[2012-03-12 01:13:27.651514] W [fd.c:804:__fd_ctx_set] (-->/usr/local/lib/glusterfs/3.3.0qa27/xlator/performance/write-behind.so(wb_open_cbk+0x190) [0x7f129b462995] (-->/usr/local/lib/glusterfs/3.3.0qa27/xlator/performance/write-behind.so(wb_file_create+0x1e1) [0x7f129b45d607] (-->/usr/local/lib/libglusterfs.so.0(fd_ctx_set+0xb5) [0x7f129fec5ee0]))) 15-: 0xb15cfbc mirror-write-behind
[2012-03-12 01:13:27.651586] W [fd.c:804:__fd_ctx_set] (-->/usr/local/lib/glusterfs/3.3.0qa27/xlator/performance/write-behind.so(wb_open_cbk+0x3c6) [0x7f129b462bcb] (-->/usr/local/lib/glusterfs/3.3.0qa27/xlator/performance/read-ahead.so(ra_open_cbk+0x28a) [0x7f129b24f7c8] (-->/usr/local/lib/libglusterfs.so.0(fd_ctx_set+0xb5) [0x7f129fec5ee0]))) 15-: 0xb15cfbc mirror-read-ahead
[2012-03-12 01:13:27.651605] W [read-ahead.c:110:ra_open_cbk] 15-mirror-read-ahead: cannot set read-ahead context information in fd (0xb15cfbc)
[2012-03-12 01:13:27.651677] W [fuse-bridge.c:3590:fuse_migrate_fd] 0-glusterfs-fuse: open on gfid (de28f094-18d5-49e2-8b9c-39af139a083a) failed (Cannot allocate memory)
[2012-03-12 01:13:27.652473] I [afr-common.c:1313:afr_launch_self_heal] 15-mirror-replicate-0: background  data self-heal triggered. path: , reason: lookup detected pending operations
[2012-03-12 01:13:27.665738] I [afr-self-heal-data.c:738:afr_sh_data_fix] 15-mirror-replicate-0: no active sinks for performing self-heal on file
[2012-03-12 01:13:27.674267] I [afr-self-heal-common.c:2037:afr_self_heal_completion_cbk] 15-mirror-replicate-0: background  data self-heal completed on
[2012-03-12 01:13:27.677538] W [fd.c:804:__fd_ctx_set] (-->/usr/local/lib/glusterfs/3.3.0qa27/xlator/performance/write-behind.so(wb_open_cbk+0x190) [0x7f129b462995] (-->/usr/local/lib/glusterfs/3.3.0qa27/xlator/performance/write-behind.so(wb_file_create+0x1e1) [0x7f129b45d607] (-->/usr/local/lib/libglusterfs.so.0(fd_ctx_set+0xb5) [0x7f129fec5ee0]))) 15-: 0xb15d020 mirror-write-behind
[2012-03-12 01:13:27.677616] W [fd.c:804:__fd_ctx_set] (-->/usr/local/lib/glusterfs/3.3.0qa27/xlator/performance/write-behind.so(wb_open_cbk+0x3c6) [0x7f129b462bcb] (-->/usr/local/lib/glusterfs/3.3.0qa27/xlator/performance/read-ahead.so(ra_open_cbk+0x28a) [0x7f129b24f7c8] (-->/usr/local/lib/libglusterfs.so.0(fd_ctx_set+0xb5) [0x7f129fec5ee0]))) 15-: 0xb15d020 mirror-read-ahead
[2012-03-12 01:13:27.677637] W [read-ahead.c:110:ra_open_cbk] 15-mirror-read-ahead: cannot set read-ahead context information in fd (0xb15d020)
[2012-03-12 01:13:27.677708] W [fuse-bridge.c:3590:fuse_migrate_fd] 0-glusterfs-fuse: open on gfid (ca814e91-e1f9-4f00-9d42-51953993c0a9) failed (Cannot allocate memory)
[2012-03-12 01:13:27.679087] I [afr-common.c:1313:afr_launch_self_heal] 15-mirror-replicate-0: background  data self-heal triggered. path: , reason: lookup detected pending operations
[2012-03-12 01:13:27.694692] I [afr-self-heal-data.c:738:afr_sh_data_fix] 15-mirror-replicate-0: no active sinks for performing self-heal on file
[2012-03-12 01:13:27.697798] W [client3_1-fops.c:1228:client3_1_inodelk_cbk] 15-mirror-client-0: remote operation failed: No such file or directory
[2012-03-12 01:13:27.697829] E [afr-lk-common.c:568:afr_unlock_inodelk_cbk] 15-mirror-replicate-0: : unlock failed on 0, reason: No such file or directory
[2012-03-12 01:13:27.703460] I [afr-self-heal-common.c:2037:afr_self_heal_com
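
The same warnings repeat throughout the client log, so a tally by severity and function can show which failures dominate. The following is a rough sketch, not taken from the original report. It assumes each log entry sits on a single line in the form "[timestamp] LEVEL [file:line:function] message" as seen above; the regular expression and the tally helper are hypothetical, and the sample lines are copied from this log.

```python
import re
from collections import Counter

# Assumed entry layout: [timestamp] LEVEL [file:line:function] message
LOG_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] (?P<level>[A-Z]) "
    r"\[(?P<file>[^:\]]+):(?P<line>\d+):(?P<func>[^\]]+)\] (?P<msg>.*)$"
)


def tally(lines):
    """Count log entries per (severity, function); skip lines that do not match."""
    counts = Counter()
    for line in lines:
        m = LOG_RE.match(line.strip())
        if m:
            counts[(m.group("level"), m.group("func"))] += 1
    return counts


# A few entries copied from the client log in this report.
SAMPLE_LOG = [
    "[2012-03-12 01:13:27.621882] W [fuse-bridge.c:3590:fuse_migrate_fd] 0-glusterfs-fuse: open on gfid (b372810c-0cfb-4bd1-a11e-461c6cd115c1) failed (Cannot allocate memory)",
    "[2012-03-12 01:13:27.651605] W [read-ahead.c:110:ra_open_cbk] 15-mirror-read-ahead: cannot set read-ahead context information in fd (0xb15cfbc)",
    "[2012-03-12 01:13:27.651677] W [fuse-bridge.c:3590:fuse_migrate_fd] 0-glusterfs-fuse: open on gfid (de28f094-18d5-49e2-8b9c-39af139a083a) failed (Cannot allocate memory)",
    "[2012-03-12 01:13:27.697829] E [afr-lk-common.c:568:afr_unlock_inodelk_cbk] 15-mirror-replicate-0: : unlock failed on 0, reason: No such file or directory",
]

log_counts = tally(SAMPLE_LOG)
for (level, func), count in log_counts.most_common():
    print(level, func, count)
```

Grouping by function rather than by full message keeps entries that differ only in gfid or fd address in the same bucket.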

Comment 1 Vijay Bellur 2012-05-25 05:37:30 UTC
Not reproducible anymore. Hence removing from the blocker list.

Comment 2 Amar Tumballi 2012-10-11 09:38:05 UTC
http://review.gluster.org/3566

