Jonathan, can you retry the operation without the filter translator and let us know if you still face the issue? We are trying to understand whether the cause of the problem is associated with the filter translator.

Avati
Running an rsync of ~20 million small files crashes the client. The client sits there at 100% CPU. strace reveals:

futex(0xcd32a70, FUTEX_WAIT, 2, NULL

The client translator layout is:

testing/features/filter -> cluster/distribute -> cluster/replicate -> protocol/client -> protocol/server -> performance/io-cache -> performance/io-threads -> features/locks -> storage/posix

The server does not crash or lock up; the client does.
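For what it's worth, a trace like the futex line above can be captured by attaching strace to the spinning client process while it sits at 100% CPU; a minimal sketch, assuming the client is the usual glusterfs binary (the PID lookup is illustrative):

  # follow all threads of the already-running client
  strace -f -p $(pidof glusterfs)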
I will test that tomorrow. Right now I am running:

for i in {1..1000000}; do ls -alR /gluster/mount; done

against ~2TB of small files, which has crashed the client in the past. If this passes, I'll run the rsyncs again with the filter translator removed.
The recursive ls also caused the issue. I've removed the filter translator and am running the ls test again.
After removing the filter translator, we still see 100% CPU usage on the client, and the filesystem is no longer responding. strace reveals:

epoll_wait(3,

on both the servers and the client.
Restarting the servers does not recover the client. I'm going to run a test with the following layout:

testing/features/filter -> cluster/distribute -> cluster/replicate -> protocol/client -> protocol/server -> features/locks -> storage/posix
The attempted restart of glusterfsd resulted in a defunct process on both servers.
"testing/features/filter -> cluster/distribute -> cluster/replicate -> protocol/client -> protocol/server -> features/locks -> storage/posix" still causes the client to stop responding when running the for loop listing content. Additionally, if the client process is killed before the servers are restarted this does not result in a defunct process.
> Additionally, if the client process is killed before the servers are restarted
> this does not result in a defunct process.

Can you get us gdb backtraces of the process threads when it is in the high CPU usage mode, or when hung?

Avati
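A minimal way to grab those backtraces, assuming the client was built with debug symbols (the PID lookup below is illustrative):

  gdb -p $(pidof glusterfs)
  (gdb) thread apply all backtrace full
  (gdb) detach
  (gdb) quit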
I just found this in DEBUG logging:

[..... running .....]
[2009-07-09 18:11:36] D [inode.c:572:__inode_replace] fuse/inode: inode(428478) replaced (428478
[2009-07-09 18:11:37] D [inode.c:572:__inode_replace] fuse/inode: inode(428482) replaced (428482
[2009-07-09 18:11:37] D [inode.c:572:__inode_replace] fuse/inode: inode(428249) replaced (428249
[2009-07-09 18:11:37] D [inode.c:572:__inode_replace] fuse/inode: inode(428481) replaced (428481
[2009-07-09 18:11:37] D [inode.c:572:__inode_replace] fuse/inode: inode(2148348676) replaced (2148348676
[2009-07-09 18:11:37] D [inode.c:572:__inode_replace] fuse/inode: inode(428493) replaced (428493
[2009-07-09 18:11:37] D [inode.c:572:__inode_replace] fuse/inode: inode(2148348682) replaced (2148348682
[2009-07-09 18:11:37] D [inode.c:572:__inode_replace] fuse/inode: inode(428499) replaced (428499
[2009-07-09 18:11:37] D [client-protocol.c:4746:client_lookup_cbk] iops_lun_1: LOOKUP 2188059948/6000 (/path/to/dir): inode number changed from 428499 to 428244
[2009-07-09 18:11:37] D [client-protocol.c:4746:client_lookup_cbk] iops_lun_0: LOOKUP 2188059948/6000 (/path/to/dir): inode number changed from 428787 to 428499
Sorry, after the client_lookup_cbk calls the process goes to 100% and the fs stops responding. This looks like an issue with cluster/replicate.
(In reply to comment #10)
> Sorry, after the client_lookup_cbk calls the process goes to 100% and the fs
> stops responding. This looks like an issue with cluster/replicate.

Jonathan, we would really be interested to see a backtrace of all the threads when the FS is at 100% CPU. Could you do that for us, or give us remote login access to your setup so that we can inspect it in gdb?
I have just verified that it looks like an issue with cluster/replicate. I removed the second subvolume from the cluster/replicate volume and have not experienced the issue running the same ls loop. A gdb backtrace is on the way (I'm going to let this test run a little while longer).
Having run the test(s) successfully with only one cluster/replicate subvolume, I'm unable to reproduce this issue anymore. I'm going to start another rsync against this volume, which originally caused/exposed the issue, and hopefully it re-creates it. There is another user on IRC who can reproduce the issue; I will prod them to add comments and hopefully get a gdb backtrace for us (assuming I cannot reproduce the situation myself).
I was also able to reproduce the issue under GlusterFS 2.0.2 using the following volfile. Currently attempting to reproduce and generate a gdb backtrace.

volume one01.brick01
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.2.101
  option remote-subvolume brick01
end-volume

volume two01.brick01
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.1.101
  option remote-subvolume brick01
end-volume

volume afr01
  type cluster/afr
  option read-subvolume one01.brick01
  subvolumes one01.brick01 two01.brick01
end-volume

## Add writeback feature
volume write-behind_afr01
  type performance/write-behind
# option aggregate-size 1MB
  option cache-size 3MB
  option flush-behind on
  subvolumes afr01
end-volume

### Add IO-Cache feature
volume iocache_afr01
  type performance/io-cache
  option cache-size 64MB              # default is 32MB
  option page-size 1MB                # 128KB is default option
  option force-revalidate-timeout 2   # default is 1
  subvolumes write-behind_afr01
end-volume

### Add readahead feature
volume readahead_afr01
  type performance/read-ahead
  option page-count 16                # cache per file = (page-count x page-size)
  subvolumes iocache_afr01
end-volume
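For context, a client volfile like the one above is typically mounted along these lines in 2.0.x (the volfile path, log file, and mount point here are assumptions, not the exact invocation used):

  glusterfs --volfile=/etc/glusterfs/client.vol \
            --log-file=/var/log/glusterfs/client.log --log-level=DEBUG \
            /mnt/glusterfs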
[Switching to Thread 47931321951968 (LWP 31120)]
0x00002b97e114f888 in __lll_mutex_lock_wait () from /lib64/libpthread.so.0
(gdb) bt
#0  0x00002b97e114f888 in __lll_mutex_lock_wait () from /lib64/libpthread.so.0
#1  0x00002b97e114b8a5 in _L_mutex_lock_107 () from /lib64/libpthread.so.0
#2  0x00002b97e114b333 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00002b97e0d146ac in inode_unref (inode=0x20015ab0) at inode.c:374
#4  0x00002b97e0d1dc48 in fd_unref (fd=0x2aaab17d2e50) at fd.c:461
#5  0x00002b97e2154363 in afr_sh_data_flush_cbk (frame=0x20a25680, cookie=<value optimized out>, this=0x1ff52c50, op_ret=<value optimized out>, op_errno=<value optimized out>) at afr-self-heal-data.c:99
#6  0x00002b97e1f1df47 in client_flush_cbk (frame=0x2aaaac41dc20, hdr=0x2aaaacaa0b70, hdrlen=<value optimized out>, iobuf=<value optimized out>) at client-protocol.c:4578
#7  0x00002b97e1f1d8aa in protocol_client_pollin (this=0x1ff52890, trans=0x1ff57630) at client-protocol.c:6169
#8  0x00002b97e1f24372 in notify (this=0x1ff58a70, event=2, data=0x1ff57630) at client-protocol.c:6213
#9  0x00002aaaaaaafdeb in socket_event_handler (fd=<value optimized out>, idx=2, data=0x1ff57630, poll_in=1, poll_out=0, poll_err=0) at socket.c:814
#10 0x00002b97e0d1f5a5 in event_dispatch_epoll (event_pool=0x1ff4d5e0) at event.c:804
#11 0x0000000000403b59 in main (argc=6, argv=0x7fffc9fcc978) at glusterfsd.c:1226
(gdb) backtrace full
#0  0x00002b97e114f888 in __lll_mutex_lock_wait () from /lib64/libpthread.so.0
No symbol table info available.
#1  0x00002b97e114b8a5 in _L_mutex_lock_107 () from /lib64/libpthread.so.0
No symbol table info available.
#2  0x00002b97e114b333 in pthread_mutex_lock () from /lib64/libpthread.so.0
No symbol table info available.
#3  0x00002b97e0d146ac in inode_unref (inode=0x20015ab0) at inode.c:374
        table = (inode_table_t *) 0x1ff58a70
#4  0x00002b97e0d1dc48 in fd_unref (fd=0x2aaab17d2e50) at fd.c:461
        refcount = <value optimized out>
        __FUNCTION__ = "fd_unref"
#5  0x00002b97e2154363 in afr_sh_data_flush_cbk (frame=0x20a25680, cookie=<value optimized out>, this=0x1ff52c50, op_ret=<value optimized out>, op_errno=<value optimized out>) at afr-self-heal-data.c:99
        sh = (afr_self_heal_t *) 0x20cd21d8
        call_count = -4
#6  0x00002b97e1f1df47 in client_flush_cbk (frame=0x2aaaac41dc20, hdr=0x2aaaacaa0b70, hdrlen=<value optimized out>, iobuf=<value optimized out>) at client-protocol.c:4578
        fn = (ret_fn_t) 0xe87991bc
        _parent = (call_frame_t *) 0x1ff58a70
        op_errno = -1332406208
#7  0x00002b97e1f1d8aa in protocol_client_pollin (this=0x1ff52890, trans=0x1ff57630) at client-protocol.c:6169
        conf = (client_conf_t *) 0x1ff57190
        ret = 0
        iobuf = (struct iobuf *) 0x0
        hdr = 0x2aaaacaa0b70 ""
        hdrlen = 32
#8  0x00002b97e1f24372 in notify (this=0x1ff58a70, event=2, data=0x1ff57630) at client-protocol.c:6213
        handshake = <value optimized out>
        ret = <value optimized out>
        child_down = <value optimized out>
        was_not_down = <value optimized out>
        trans = (transport_t *) 0x0
        conn = <value optimized out>
        conf = (client_conf_t *) 0x1ff57190
        parent = <value optimized out>
        __FUNCTION__ = "notify"
#9  0x00002aaaaaaafdeb in socket_event_handler (fd=<value optimized out>, idx=2, data=0x1ff57630, poll_in=1, poll_out=0, poll_err=0) at socket.c:814
        this = (transport_t *) 0x1ff58a70
        priv = (socket_private_t *) 0x1ff57a50
        ret = -4
#10 0x00002b97e0d1f5a5 in event_dispatch_epoll (event_pool=0x1ff4d5e0) at event.c:804
        events = (struct epoll_event *) 0x1ff5a100
        i = 3
        ret = 4
        __FUNCTION__ = "event_dispatch_epoll"
#11 0x0000000000403b59 in main (argc=6, argv=0x7fffc9fcc978) at glusterfsd.c:1226
        ctx = <value optimized out>
        cmd_args = <value optimized out>
        stbuf = {st_dev = 0, st_ino = 47931309588925, st_nlink = 0, st_mode = 1243904375, st_uid = 0, st_gid = 0, pad0 = 0, st_rdev = 47931309596326, st_size = 140736582173824, st_blksize = 47931309588864, st_blocks = 140736582173887, st_atim = {tv_sec = 140736582173872, tv_nsec = 140736582173864}, st_mtim = {tv_sec = 47931311753976, tv_nsec = 1}, st_ctim = {tv_sec = 0, tv_nsec = 4198641}, __unused = {47931318545606, 140736582173936, 47931321924464}}
        tmp_logfile = '\0' <repeats 1023 times>
        timestr = '\0' <repeats 255 times>
        utime = 1247351137
        tm = <value optimized out>
        ret = <value optimized out>
        lim = {rlim_cur = 18446744073709551615, rlim_max = 18446744073709551615}
        specfp = <value optimized out>
        graph = (xlator_t *) 0x0
        trav = (xlator_t *) 0x1ff4c060
        fuse_volume_found = 0
        xl_count = 0
        __FUNCTION__ = "main"
(gdb) info registers
rax            0xfffffffffffffffc   -4
rbx            0x1ff58a70           536185456
rcx            0xffffffffffffffff   -1
rdx            0x2                  2
rsi            0x0                  0
rdi            0x1ff58a70           536185456
rbp            0x7990               0x7990
rsp            0x7fffc9fcb998       0x7fffc9fcb998
r8             0x2aaab0951840       46912595368000
r9             0xe87991bc           3900281276
r10            0x0                  0
r11            0x202                514
r12            0x1ff58a70           536185456
r13            0x1ff52c50           536161360
r14            0x0                  0
r15            0x1                  1
rip            0x2b97e114f888       0x2b97e114f888 <__lll_mutex_lock_wait+24>
eflags         0x202                [ IF ]
cs             0x33                 51
ss             0x2b                 43
ds             0x0                  0
es             0x0                  0
fs             0x0                  0
gs             0x0                  0
(gdb)
thread apply all backtrace

Thread 3 (Thread 1092397376 (LWP 31121)):
#0  0x00002b97e13f6611 in nanosleep () from /lib64/libc.so.6
#1  0x00002b97e1429b24 in usleep () from /lib64/libc.so.6
#2  0x00002b97e0d12dd4 in gf_timer_proc (ctx=0x1ff4c010) at timer.c:177
#3  0x00002b97e11492f7 in start_thread () from /lib64/libpthread.so.0
#4  0x00002b97e142fe3d in clone () from /lib64/libc.so.6

Thread 2 (Thread 1112729920 (LWP 31126)):
#0  __dentry_search_arbit (inode=0x21bc4e80) at inode.c:798
#1  0x00002b97e0d1495e in inode_path (inode=0x20de66d0, name=0x2b97e1e4e028 "60", bufp=0x4252e070) at inode.c:859
#2  0x00002b97e278dd96 in fuse_loc_fill (loc=0x217c7938, state=0x217c7920, ino=0, par=4974744968, name=0x2b97e1e4e028 "60") at fuse-bridge.c:248
#3  0x00002b97e2790de5 in fuse_lookup (req=0x20386510, par=4974744968, name=0x2b97e1e4e028 "60") at fuse-bridge.c:423
#4  0x00002b97e2792f6d in fuse_thread_proc (data=<value optimized out>) at fuse-bridge.c:2480
#5  0x00002b97e11492f7 in start_thread () from /lib64/libpthread.so.0
#6  0x00002b97e142fe3d in clone () from /lib64/libc.so.6

Thread 1 (Thread 47931321951968 (LWP 31120)):
#0  0x00002b97e114f888 in __lll_mutex_lock_wait () from /lib64/libpthread.so.0
#1  0x00002b97e114b8a5 in _L_mutex_lock_107 () from /lib64/libpthread.so.0
#2  0x00002b97e114b333 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00002b97e0d146ac in inode_unref (inode=0x20015ab0) at inode.c:374
#4  0x00002b97e0d1dc48 in fd_unref (fd=0x2aaab17d2e50) at fd.c:461
#5  0x00002b97e2154363 in afr_sh_data_flush_cbk (frame=0x20a25680, cookie=<value optimized out>, this=0x1ff52c50, op_ret=<value optimized out>, op_errno=<value optimized out>) at afr-self-heal-data.c:99
#6  0x00002b97e1f1df47 in client_flush_cbk (frame=0x2aaaac41dc20, hdr=0x2aaaacaa0b70, hdrlen=<value optimized out>, iobuf=<value optimized out>) at client-protocol.c:4578
#7  0x00002b97e1f1d8aa in protocol_client_pollin (this=0x1ff52890, trans=0x1ff57630) at client-protocol.c:6169
#8  0x00002b97e1f24372 in notify (this=0x1ff58a70, event=2, data=0x1ff57630) at client-protocol.c:6213
#9  0x00002aaaaaaafdeb in socket_event_handler (fd=<value optimized out>, idx=2, data=0x1ff57630, poll_in=1, poll_out=0, poll_err=0) at socket.c:814
#10 0x00002b97e0d1f5a5 in event_dispatch_epoll (event_pool=0x1ff4d5e0) at event.c:804
#11 0x0000000000403b59 in main (argc=6, argv=0x7fffc9fcc978) at glusterfsd.c:1226
(gdb) thread apply all backtrace full

Thread 3 (Thread 1109051712 (LWP 4066)):
#0  0x00002ab177e25611 in nanosleep () from /lib64/libc.so.6
No symbol table info available.
#1  0x00002ab177e58b24 in usleep () from /lib64/libc.so.6
No symbol table info available.
#2  0x00002ab177741dd4 in gf_timer_proc (ctx=0x24f0010) at timer.c:177
        now = 1247508494661553
        now_tv = {tv_sec = 1247508494, tv_usec = 661553}
        event = (gf_timer_t *) 0x24feef0
        reg = (gf_timer_registry_t *) 0x24f99d0
        __FUNCTION__ = "gf_timer_proc"
#3  0x00002ab177b782f7 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#4  0x00002ab177e5ee3d in clone () from /lib64/libc.so.6
No symbol table info available.

Thread 2 (Thread 1119541568 (LWP 4071)):
#0  __dentry_search_arbit (inode=0x2aaac09e1010) at inode.c:799
        dentry = (dentry_t *) 0x0
        trav = (dentry_t *) 0x2aaabcb18600
#1  0x00002ab17774395e in inode_path (inode=0x2aaac09e1010, name=0x2ab1788bf028 "60", bufp=0x42bad070) at inode.c:859
        table = (inode_table_t *) 0x24fca70
        trav = (dentry_t *) 0x2aaabcb1b0a0
        i = 121758588324
        size = <value optimized out>
        ret = <value optimized out>
        len = <value optimized out>
        buf = <value optimized out>
        __FUNCTION__ = "inode_path"
#2  0x00002ab1791bcd96 in fuse_loc_fill (loc=0x2aaac09dd6c8, state=0x2aaac09dd6b0, ino=0, par=3780234481, name=0x2ab1788bf028 "60") at fuse-bridge.c:248
        inode = (inode_t *) 0x0
        parent = (inode_t *) 0x2aaac09e1010
        ret = <value optimized out>
        path = 0x0
        __FUNCTION__ = "fuse_loc_fill"
#3  0x00002ab1791bfde5 in fuse_lookup (req=0x2aaabcaeff80, par=3780234481, name=0x2ab1788bf028 "60") at fuse-bridge.c:423
        state = (fuse_state_t *) 0x1
        ret = <value optimized out>
        __FUNCTION__ = "fuse_lookup"
#4  0x00002ab1791c1f6d in fuse_thread_proc (data=<value optimized out>) at fuse-bridge.c:2480
        mount_point = <value optimized out>
        this = (xlator_t *) 0x24f1b20
        priv = (fuse_private_t *) 0x24fcb40
        res = 43
        iobuf = (struct iobuf *) 0x24f15a8
        chan_size = 135168
        __FUNCTION__ = "fuse_thread_proc"
#5  0x00002ab177b782f7 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#6  0x00002ab177e5ee3d in clone () from /lib64/libc.so.6
No symbol table info available.

Thread 1 (Thread 46941711767264 (LWP 4065)):
#0  0x00002ab177e5f228 in epoll_wait () from /lib64/libc.so.6
No symbol table info available.
#1  0x00002ab17774e4f2 in event_dispatch_epoll (event_pool=0x24f15e0) at event.c:850
        events = (struct epoll_event *) 0x24fe100
        i = 1
        ret = 1
        __FUNCTION__ = "event_dispatch_epoll"
#2  0x0000000000403b59 in main (argc=6, argv=0x7fff3359cf48) at glusterfsd.c:1226
        ctx = <value optimized out>
        cmd_args = <value optimized out>
        stbuf = {st_dev = 0, st_ino = 46941699404221, st_nlink = 0, st_mode = 1243904375, st_uid = 0, st_gid = 0, pad0 = 0, st_rdev = 46941699411622, st_size = 140734054910544, st_blksize = 46941699404160, st_blocks = 140734054910607, st_atim = {tv_sec = 140734054910592, tv_nsec = 140734054910584}, st_mtim = {tv_sec = 46941701569272, tv_nsec = 1}, st_ctim = {tv_sec = 0, tv_nsec = 4198641}, __unused = {46941708360902, 140734054910656, 46941711739760}}
        tmp_logfile = '\0' <repeats 1023 times>
        timestr = '\0' <repeats 255 times>
        utime = 1247505826
        tm = <value optimized out>
        ret = <value optimized out>
        lim = {rlim_cur = 18446744073709551615, rlim_max = 18446744073709551615}
        specfp = <value optimized out>
        graph = (xlator_t *) 0x0
        trav = (xlator_t *) 0x24f0060
        fuse_volume_found = 0
        xl_count = 0
        __FUNCTION__ = "main"
#0  0x00002ab177e5f228 in epoll_wait () from /lib64/libc.so.6

(gdb) thread apply all backtrace

Thread 3 (Thread 1109051712 (LWP 4066)):
#0  0x00002ab177e25611 in nanosleep () from /lib64/libc.so.6
#1  0x00002ab177e58b24 in usleep () from /lib64/libc.so.6
#2  0x00002ab177741dd4 in gf_timer_proc (ctx=0x24f0010) at timer.c:177
#3  0x00002ab177b782f7 in start_thread () from /lib64/libpthread.so.0
#4  0x00002ab177e5ee3d in clone () from /lib64/libc.so.6

Thread 2 (Thread 1119541568 (LWP 4071)):
#0  __dentry_search_arbit (inode=0x2aaac09e1010) at inode.c:799
#1  0x00002ab17774395e in inode_path (inode=0x2aaac09e1010, name=0x2ab1788bf028 "60", bufp=0x42bad070) at inode.c:859
#2  0x00002ab1791bcd96 in fuse_loc_fill (loc=0x2aaac09dd6c8, state=0x2aaac09dd6b0, ino=0, par=3780234481, name=0x2ab1788bf028 "60") at fuse-bridge.c:248
#3  0x00002ab1791bfde5 in fuse_lookup (req=0x2aaabcaeff80, par=3780234481, name=0x2ab1788bf028 "60") at fuse-bridge.c:423
#4  0x00002ab1791c1f6d in fuse_thread_proc (data=<value optimized out>) at fuse-bridge.c:2480
#5  0x00002ab177b782f7 in start_thread () from /lib64/libpthread.so.0
#6  0x00002ab177e5ee3d in clone () from /lib64/libc.so.6

Thread 1 (Thread 46941711767264 (LWP 4065)):
#0  0x00002ab177e5f228 in epoll_wait () from /lib64/libc.so.6
#1  0x00002ab17774e4f2 in event_dispatch_epoll (event_pool=0x24f15e0) at event.c:850
#2  0x0000000000403b59 in main (argc=6, argv=0x7fff3359cf48) at glusterfsd.c:1226
#0  0x00002ab177e5f228 in epoll_wait () from /lib64/libc.so.6
Please note this appears to have been happening in versions 2.0.0 through 2.0.3. I'm updating the version field back to 2.0.3.
Jonathan, thanks for all the detailed reports. We are on it and will update here when this bug gets fixed (or a patch is submitted).
(In reply to comment #17)
> (gdb) thread apply all backtrace full
>
> Thread 2 (Thread 1119541568 (LWP 4071)):
> #0  __dentry_search_arbit (inode=0x2aaac09e1010) at inode.c:799
> #1  0x00002ab17774395e in inode_path (inode=0x2aaac09e1010, name=0x2ab1788bf028 "60", bufp=0x42bad070) at inode.c:859
> #2  0x00002ab1791bcd96 in fuse_loc_fill (loc=0x2aaac09dd6c8, state=0x2aaac09dd6b0, ino=0, par=3780234481, name=0x2ab1788bf028 "60") at fuse-bridge.c:248
> #3  0x00002ab1791bfde5 in fuse_lookup (req=0x2aaabcaeff80, par=3780234481, name=0x2ab1788bf028 "60") at fuse-bridge.c:423
> #4  0x00002ab1791c1f6d in fuse_thread_proc (data=<value optimized out>) at fuse-bridge.c:2480
> #5  0x00002ab177b782f7 in start_thread () from /lib64/libpthread.so.0
> #6  0x00002ab177e5ee3d in clone () from /lib64/libc.so.6

Hi Jonathan,

The above trace is the part that should be causing the 100% CPU usage and the glusterfs hang. Can you try applying patch http://patches.gluster.com/patch/742/ and see if it fixes the whole problem? This patch should fix the issue, IMO.

Regards,
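For reference, applying it on top of a source tree could look roughly like this (the source directory is illustrative, the -p level depends on how the patch was generated, and the URL may serve an HTML page rather than the raw patch, in which case save the patch from that page first):

  cd /path/to/glusterfs-source
  wget -O 742.patch http://patches.gluster.com/patch/742/
  patch -p1 < 742.patch
  ./configure && make && make install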
Building and running tests now. Running a full test case and confirming it's working will likely take overnight.
I took the 2.0.4 tag and applied this patch. So far, we have not run into the issue, which we had been able to reproduce easily with previous commits. I'll be doing more testing, but this looks to fix the issue we were seeing.
(In reply to comment #22)
> I took the 2.0.4 tag and applied this patch. So far, we have not run into the
> issue that we have been able to reproduce easily with previous commits. I'll be
> doing more testing but this looks to fix the issue we were seeing.

Hi Jonathan,

Can you check for the line 'possible infinite loop, breaking' in the log file? That can confirm that this case was actually hit earlier.

Thanks,
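Something like the following should show whether that line appears (the log path is an assumption; check whatever --log-file the client was started with):

  grep 'possible infinite loop, breaking' /var/log/glusterfs/*.log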
I will schedule some time next week to test for the log entries. Sorry for the delay.
(In reply to comment #24)
> I will schedule some time next week to test for the log entries. Sorry for the
> delay.

Hi Jonathan,

Are you running into this in recent releases, specifically 2.0.9 and 3.0.0? We'd like an update on this bug to determine its future status.

Thanks
If you see this error in 3.0.x, kindly reopen this bug.