+++ This bug was initially created as a clone of Bug #822067 +++

Created attachment 584900 [details]
multi threaded program running on one of the fuse clients

Description of problem:
3x2 distributed replicate volume. 2 fuse clients. 1 client executing threaded-io and the other client executing dbench. Volume set operations were running, and a brick from each replica pair was brought down at intervals. Gave volume start force after some time and did volume heal (also find | xargs stat).

glusterfs brick crashed with the following backtrace.

Core was generated by `/usr/local/sbin/glusterfsd -s localhost --volfile-id mirror.hyperspace.mnt-sda8'.
Program terminated with signal 6, Aborted.
#0  0x00007fb15b802d05 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
64      ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
        in ../nptl/sysdeps/unix/sysv/linux/raise.c
(gdb) bt
#0  0x00007fb15b802d05 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x00007fb15b806ab6 in abort () at abort.c:92
#2  0x00007fb15b7fb7c5 in __assert_fail (assertion=0x7fb1571e22b1 "!\"uuid null\"", file=<value optimized out>, line=1790, function=<value optimized out>) at assert.c:81
#3  0x00007fb1571dc2a9 in mq_fetch_child_size_and_contri (frame=0x7fb15a601100, cookie=0x7fb15a807e88, this=0x186ccd0, op_ret=0, op_errno=0, xdata=0x0) at ../../../../../xlators/features/marker/src/marker-quota.c:1790
#4  0x00007fb15c56529a in default_setxattr_cbk (frame=0x7fb15a807e88, cookie=0x7fb15a807724, this=0x186baa0, op_ret=0, op_errno=0, xdata=0x0) at ../../../libglusterfs/src/defaults.c:284
#5  0x00007fb157602bb9 in iot_setxattr_cbk (frame=0x7fb15a807724, cookie=0x7fb15a8100e0, this=0x186a810, op_ret=0, op_errno=0, xdata=0x0) at ../../../../../xlators/performance/io-threads/src/io-threads.c:1627
#6  0x00007fb15c56529a in default_setxattr_cbk (frame=0x7fb15a8100e0, cookie=0x7fb15a807bd8, this=0x18696b0, op_ret=0, op_errno=0, xdata=0x0) at ../../../libglusterfs/src/defaults.c:284
#7  0x00007fb157a390eb in posix_acl_setxattr_cbk (frame=0x7fb15a807bd8, cookie=0x7fb15a80a784, this=0x18684d0, op_ret=0, op_errno=0, xdata=0x0) at ../../../../../xlators/system/posix-acl/src/posix-acl.c:1802
#8  0x00007fb157c54ca3 in posix_setxattr (frame=0x7fb15a80a784, this=0x1867170, loc=0x7fb15a4d1074, dict=0x7fb15a47415c, flags=0, xdata=0x0) at ../../../../../xlators/storage/posix/src/posix.c:2417
#9  0x00007fb157a39391 in posix_acl_setxattr (frame=0x7fb15a807bd8, this=0x18684d0, loc=0x7fb15a4d1074, xattr=0x7fb15a47415c, flags=0, xdata=0x0) at ../../../../../xlators/system/posix-acl/src/posix-acl.c:1821
#10 0x00007fb15c56d32e in default_setxattr (frame=0x7fb15a8100e0, this=0x18696b0, loc=0x7fb15a4d1074, dict=0x7fb15a47415c, flags=0, xdata=0x0) at ../../../libglusterfs/src/defaults.c:889
#11 0x00007fb157602e15 in iot_setxattr_wrapper (frame=0x7fb15a807724, this=0x186a810, loc=0x7fb15a4d1074, dict=0x7fb15a47415c, flags=0, xdata=0x0) at ../../../../../xlators/performance/io-threads/src/io-threads.c:1636
#12 0x00007fb15c58748e in call_resume_wind (stub=0x7fb15a4d1034) at ../../../libglusterfs/src/call-stub.c:2531
#13 0x00007fb15c58ed9b in call_resume (stub=0x7fb15a4d1034) at ../../../libglusterfs/src/call-stub.c:4151
#14 0x00007fb1575f890d in iot_worker (data=0x18814f0) at ../../../../../xlators/performance/io-threads/src/io-threads.c:131
#15 0x00007fb15bef8d8c in start_thread (arg=0x7fb154d47700) at pthread_create.c:304
#16 0x00007fb15b8b504d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#17 0x0000000000000000 in ?? ()
(gdb) f 3
#3  0x00007fb1571dc2a9 in mq_fetch_child_size_and_contri (frame=0x7fb15a601100, cookie=0x7fb15a807e88, this=0x186ccd0, op_ret=0, op_errno=0, xdata=0x0) at ../../../../../xlators/features/marker/src/marker-quota.c:1790
1790            GF_UUID_ASSERT (local->loc.gfid);
(gdb) l
1785            mq_set_ctx_updation_status (local->ctx, _gf_false);
1786
1787            if (uuid_is_null (local->loc.gfid))
1788                    uuid_copy (local->loc.gfid, local->loc.inode->gfid);
1789
1790            GF_UUID_ASSERT (local->loc.gfid);
1791
1792            STACK_WIND (frame, mq_update_inode_contribution, FIRST_CHILD(this),
1793                        FIRST_CHILD(this)->fops->lookup, &local->loc, newdict);
1794
(gdb) p local->loc
$1 = {path = 0x1d89f10 "/clients/client12/~dmtmp/COREL/GRAPHIC1.CDR", name = 0x1d89f2f "GRAPHIC1.CDR", inode = 0x7fb155a08bec, parent = 0x7fb1559f8980, gfid = '\000' <repeats 15 times>, pargfid = "\aZ\354\345\343\aF\243\243\065\025\006\275 \234\215"}
(gdb) p *local->loc.inode
$2 = {table = 0x188ecb0, gfid = '\000' <repeats 15 times>, lock = 1, nlookup = 0, ref = 1, ia_type = IA_INVAL, fd_list = {next = 0x7fb155a08c1c, prev = 0x7fb155a08c1c}, dentry_list = {next = 0x7fb155a08c2c, prev = 0x7fb155a08c2c}, hash = {next = 0x7fb155a08c3c, prev = 0x7fb155a08c3c}, list = {next = 0x7fb155a0777c, prev = 0x188ed10}, _ctx = 0x7fb14cda6cc0}
(gdb) info thr
  23 Thread 31833  0x00007fb15bf0139d in fsync () at ../sysdeps/unix/syscall-template.S:82
  22 Thread 31864  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:216
  21 Thread 31868  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:216
  20 Thread 31838  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:216
  19 Thread 31817  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:216
  18 Thread 31865  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:216
  17 Thread 31869  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:216
  16 Thread 31839  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:216
  15 Thread 31840  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:216
  14 Thread 31835  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:216
  13 Thread 31867  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:216
  12 Thread 31837  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:216
  11 Thread 31866  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:216
  10 Thread 31836  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:216
  9 Thread 31773  0x00007fb15bf014bd in nanosleep () at ../sysdeps/unix/syscall-template.S:82
  8 Thread 31834  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:216
  7 Thread 31768  0x00007fb15b8b56a3 in epoll_wait () at ../sysdeps/unix/syscall-template.S:82
  6 Thread 31771  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
  5 Thread 31816  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
  4 Thread 31770  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
  3 Thread 31769  do_sigwait (set=<value optimized out>, sig=0x7fb15a28eeb8) at ../nptl/sysdeps/unix/sysv/linux/../../../../../sysdeps/unix/sysv/linux/sigwait.c:65
  2 Thread 31820  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:216
* 1 Thread 31863  0x00007fb15b802d05 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
(gdb) t 23
[Switching to thread 23 (Thread 31833)]#0  0x00007fb15bf0139d in fsync () at ../sysdeps/unix/syscall-template.S:82
82      ../sysdeps/unix/syscall-template.S: No such file or directory.
        in ../sysdeps/unix/syscall-template.S
(gdb) bt
#0  0x00007fb15bf0139d in fsync () at ../sysdeps/unix/syscall-template.S:82
#1  0x00007fb157c543a4 in posix_fsync (frame=0x7fb15a80889c, this=0x1867170, fd=0x1d8e29c, datasync=0, xdata=0x0) at ../../../../../xlators/storage/posix/src/posix.c:2346
#2  0x00007fb15c56deab in default_fsync (frame=0x7fb15a80d890, this=0x18684d0, fd=0x1d8e29c, flags=0, xdata=0x0) at ../../../libglusterfs/src/defaults.c:929
#3  0x00007fb15c56deab in default_fsync (frame=0x7fb15a81c668, this=0x18696b0, fd=0x1d8e29c, flags=0, xdata=0x0) at ../../../libglusterfs/src/defaults.c:929
#4  0x00007fb1575fe583 in iot_fsync_wrapper (frame=0x7fb15a81e958, this=0x186a810, fd=0x1d8e29c, datasync=0, xdata=0x0) at ../../../../../xlators/performance/io-threads/src/io-threads.c:1020
#5  0x00007fb15c587436 in call_resume_wind (stub=0x7fb15a4de810) at ../../../libglusterfs/src/call-stub.c:2522
#6  0x00007fb15c58ed9b in call_resume (stub=0x7fb15a4de810) at ../../../libglusterfs/src/call-stub.c:4151
#7  0x00007fb1575f890d in iot_worker (data=0x18814f0) at ../../../../../xlators/performance/io-threads/src/io-threads.c:131
#8  0x00007fb15bef8d8c in start_thread (arg=0x7fb155765700) at pthread_create.c:304
#9  0x00007fb15b8b504d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#10 0x0000000000000000 in ?? ()
(gdb) f 1
#1  0x00007fb157c543a4 in posix_fsync (frame=0x7fb15a80889c, this=0x1867170, fd=0x1d8e29c, datasync=0, xdata=0x0) at ../../../../../xlators/storage/posix/src/posix.c:2346
2346            op_ret = fsync (_fd);
(gdb)

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Create a 3x2 distribute replicate volume, start it and mount it via 2 fuse clients.
2. Run a multi-threaded application (attached) on one fuse client and dbench on the other client.
3. Do volume set operations in parallel.
4. Bring down a brick from each replica pair at regular intervals (300 seconds), sleep for some time and do volume start force.
5. Trigger volume heal both from the gluster cli and with find | xargs stat.

Actual results:
glusterfs brick crashed

Expected results:
glusterfs brick should not crash

Additional info:

gluster volume info

Volume Name: mirror
Type: Distributed-Replicate
Volume ID: c15b0415-46ec-485d-a1c6-989783bb154a
Status: Started
Number of Bricks: 3 x 2 = 6
Transport-type: tcp
Bricks:
Brick1: hyperspace:/mnt/sda7/export4
Brick2: hyperspace:/mnt/sda8/export4
Brick3: hyperspace:/mnt/sda7/export5
Brick4: hyperspace:/mnt/sda8/export5
Brick5: hyperspace:/mnt/sda7/export6
Brick6: hyperspace:/mnt/sda8/export6
Options Reconfigured:
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
features.quota: on
performance.quick-read: on
performance.read-ahead: on
performance.stat-prefetch: off
features.limit-usage: /:250GB

FIL (security.capability) ==> -1 (No data available)
[2012-05-16 13:33:37.766524] I [server3_1-fops.c:823:server_getxattr_cbk] 0-mirror-server: 66632: GETXATTR /clients/client7/~dmtmp/SEED/LARGE.FIL (security.capability) ==> -1 (No data available)
[2012-05-16 13:33:37.771436] I [server3_1-fops.c:823:server_getxattr_cbk] 0-mirror-server: 66641: GETXATTR /clients/client8/~dmtmp/SEED/LARGE.FIL (security.capability) ==> -1 (No data available)
[2012-05-16 13:33:37.994634] W [marker-quota.c:2047:mq_inspect_directory_xattr] 0-mirror-marker: cannot add a new contribution node
[2012-05-16 13:33:37.996264] I [server3_1-fops.c:823:server_getxattr_cbk] 0-mirror-server: 66929: GETXATTR /clients/client5/~dmtmp/ACCESS/FASTENER.MDB (security.capability) ==> -1 (No data available)
[2012-05-16 13:33:38.079332] I [server3_1-fops.c:823:server_getxattr_cbk] 0-mirror-server: 67020: GETXATTR /clients/client13/~dmtmp/ACCESS/FASTENER.MDB (security.capability) ==> -1 (No data available)
[2012-05-16 13:33:38.123713] W [marker-quota.c:2047:mq_inspect_directory_xattr] 0-mirror-marker: cannot add a new contribution node
[2012-05-16 13:33:38.226573] I [server3_1-fops.c:823:server_getxattr_cbk] 0-mirror-server: 67172: GETXATTR /clients/client21/~dmtmp/ACCESS/SALES.PRN (security.capability) ==> -1 (No data available)
[2012-05-16 13:33:38.290289] I [server3_1-fops.c:823:server_getxattr_cbk] 0-mirror-server: 67245: GETXATTR /clients/client21/~dmtmp/ACCESS/SALES.PRN (security.capability) ==> -1 (No data available)
[2012-05-16 13:33:38.406110] I [server3_1-fops.c:823:server_getxattr_cbk] 0-mirror-server: 67404: GETXATTR /clients/client18/~dmtmp/WORDPRO/LWPSAV0.TMP (security.capability) ==> -1 (No data available)
[2012-05-16 13:33:38.440258] W [marker-quota.c:2047:mq_inspect_directory_xattr] 0-mirror-marker: cannot add a new contribution node
[2012-05-16 13:33:38.476349] I [server3_1-fops.c:823:server_getxattr_cbk] 0-mirror-server: 67497: GETXATTR /clients/client18/~dmtmp/WORDPRO/LWPSAV0.TMP (security.capability) ==> -1 (No data available)
[2012-05-16 13:33:38.528169] I [server3_1-fops.c:823:server_getxattr_cbk] 0-mirror-server: 67562: GETXATTR /clients/client20/~dmtmp/SEED/SMALL.FIL (security.capability) ==> -1 (No data available)
[2012-05-16 13:33:38.540610] I [server3_1-fops.c:823:server_getxattr_cbk] 0-mirror-server: 67580: GETXATTR /clients/client20/~dmtmp/SEED/SMALL.FIL (security.capability) ==> -1 (No data available)
pending frames:

patchset: git://git.gluster.com/glusterfs.git
signal received: 6
time of crash: 2012-05-16 13:33:38
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3git
/lib/x86_64-linux-gnu/libc.so.6(+0x33d80)[0x7fb15b802d80]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x35)[0x7fb15b802d05]

--- Additional comment from rabhat on 2012-05-16 05:01:17 EDT ---

Created attachment 584902 [details]
header file for the program attached

--- Additional comment from amarts on 2012-07-11 06:16:02 EDT ---

fixed in patch @ http://review.gluster.com/3567
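
For readers following the gdb session: the listing at marker-quota.c:1787-1790 copies local->loc.inode->gfid into local->loc.gfid when the latter is null, then asserts that the result is non-null. The dump shows that both gfids are all zeros (the inode has ia_type = IA_INVAL), so the copy cannot help and the assertion aborts the brick. The snippet below is only a minimal sketch of a non-aborting check for that situation. It is not the change made in the patch referenced above, whose contents are not shown in this report, and every name in it (gfid_t, sketch_inode, sketch_loc, gfid_is_null, loc_fill_gfid) is a hypothetical stand-in rather than a real GlusterFS definition.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for the structures printed by gdb above.
 * These are NOT the real GlusterFS types. */
typedef unsigned char gfid_t[16];

struct sketch_inode {
    gfid_t gfid;
};

struct sketch_loc {
    const char          *path;
    struct sketch_inode *inode;
    gfid_t               gfid;
};

/* Returns 1 when all 16 bytes of the gfid are zero, 0 otherwise. */
int gfid_is_null(const gfid_t g)
{
    static const gfid_t zero = {0};
    return memcmp(g, zero, sizeof(gfid_t)) == 0;
}

/* Same first step as lines 1787-1788 of the listing: fall back to the
 * inode's gfid when loc->gfid is null. Instead of asserting afterwards,
 * report failure so the caller can skip the operation.
 * Returns 0 when loc->gfid is usable, -1 when no gfid could be found. */
int loc_fill_gfid(struct sketch_loc *loc)
{
    assert(loc != NULL); /* a NULL loc is a programming error, not bad data */

    if (gfid_is_null(loc->gfid) && loc->inode != NULL)
        memcpy(loc->gfid, loc->inode->gfid, sizeof(gfid_t));

    if (gfid_is_null(loc->gfid)) {
        fprintf(stderr, "skipping %s: gfid is null\n",
                loc->path ? loc->path : "(no path)");
        return -1;
    }
    return 0;
}
```

In this sketch a caller would test the return value of loc_fill_gfid() and unwind with an error instead of issuing the lookup. Whether skipping is the right behaviour for the quota marker is a design question that this report does not settle.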
This bug is not seen in the current master branch (which will be branched as RHS 2.1.0 soon). Before considering it for fixing, we want to confirm that the bug still exists on RHS servers. If it cannot be reproduced, we would like to close it.
https://code.engineering.redhat.com/gerrit/64 fixes the issue.
dbench was run overnight with added load from find and stat, with graph changes made along the way. The crash was not reproducible.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2013-1262.html