Bug 848318 - [27ae1677eb2a6ed4a04bda0df5cc92f2780c11ed]: glusterfs server crashed since loc->gfid was NULL
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: glusterfs
Version: unspecified
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
: ---
Assignee: Raghavendra Bhat
QA Contact: Sachidananda Urs
URL:
Whiteboard:
Depends On: 822067
Blocks:
 
Reported: 2012-08-15 09:15 UTC by Vidya Sakar
Modified: 2013-09-23 22:33 UTC

Fixed In Version: glusterfs-3.4.0qa5-1
Doc Type: Bug Fix
Doc Text:
Clone Of: 822067
Environment:
Last Closed: 2013-09-23 22:33:04 UTC
Embargoed:



Description Vidya Sakar 2012-08-15 09:15:56 UTC
+++ This bug was initially created as a clone of Bug #822067 +++

Created attachment 584900: multi-threaded program running on one of the FUSE clients

Description of problem:
Setup: a 3x2 distributed-replicate volume with 2 FUSE clients. One client was running threaded-io and the other was running dbench. Volume set operations were running in parallel, and one brick from each replica pair was brought down at intervals. After some time, volume start force was issued, followed by volume heal (and also find | xargs stat).

The glusterfs brick process crashed with the following backtrace:

Core was generated by `/usr/local/sbin/glusterfsd -s localhost --volfile-id mirror.hyperspace.mnt-sda8'.
Program terminated with signal 6, Aborted.
#0  0x00007fb15b802d05 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
64	../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
	in ../nptl/sysdeps/unix/sysv/linux/raise.c
(gdb) bt
#0  0x00007fb15b802d05 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x00007fb15b806ab6 in abort () at abort.c:92
#2  0x00007fb15b7fb7c5 in __assert_fail (assertion=0x7fb1571e22b1 "!\"uuid null\"", file=<value optimized out>, line=1790, 
    function=<value optimized out>) at assert.c:81
#3  0x00007fb1571dc2a9 in mq_fetch_child_size_and_contri (frame=0x7fb15a601100, cookie=0x7fb15a807e88, this=0x186ccd0, op_ret=0, op_errno=0, 
    xdata=0x0) at ../../../../../xlators/features/marker/src/marker-quota.c:1790
#4  0x00007fb15c56529a in default_setxattr_cbk (frame=0x7fb15a807e88, cookie=0x7fb15a807724, this=0x186baa0, op_ret=0, op_errno=0, xdata=0x0)
    at ../../../libglusterfs/src/defaults.c:284
#5  0x00007fb157602bb9 in iot_setxattr_cbk (frame=0x7fb15a807724, cookie=0x7fb15a8100e0, this=0x186a810, op_ret=0, op_errno=0, xdata=0x0)
    at ../../../../../xlators/performance/io-threads/src/io-threads.c:1627
#6  0x00007fb15c56529a in default_setxattr_cbk (frame=0x7fb15a8100e0, cookie=0x7fb15a807bd8, this=0x18696b0, op_ret=0, op_errno=0, xdata=0x0)
    at ../../../libglusterfs/src/defaults.c:284
#7  0x00007fb157a390eb in posix_acl_setxattr_cbk (frame=0x7fb15a807bd8, cookie=0x7fb15a80a784, this=0x18684d0, op_ret=0, op_errno=0, 
    xdata=0x0) at ../../../../../xlators/system/posix-acl/src/posix-acl.c:1802
#8  0x00007fb157c54ca3 in posix_setxattr (frame=0x7fb15a80a784, this=0x1867170, loc=0x7fb15a4d1074, dict=0x7fb15a47415c, flags=0, xdata=0x0)
    at ../../../../../xlators/storage/posix/src/posix.c:2417
#9  0x00007fb157a39391 in posix_acl_setxattr (frame=0x7fb15a807bd8, this=0x18684d0, loc=0x7fb15a4d1074, xattr=0x7fb15a47415c, flags=0, 
    xdata=0x0) at ../../../../../xlators/system/posix-acl/src/posix-acl.c:1821
#10 0x00007fb15c56d32e in default_setxattr (frame=0x7fb15a8100e0, this=0x18696b0, loc=0x7fb15a4d1074, dict=0x7fb15a47415c, flags=0, 
    xdata=0x0) at ../../../libglusterfs/src/defaults.c:889
#11 0x00007fb157602e15 in iot_setxattr_wrapper (frame=0x7fb15a807724, this=0x186a810, loc=0x7fb15a4d1074, dict=0x7fb15a47415c, flags=0, 
    xdata=0x0) at ../../../../../xlators/performance/io-threads/src/io-threads.c:1636
#12 0x00007fb15c58748e in call_resume_wind (stub=0x7fb15a4d1034) at ../../../libglusterfs/src/call-stub.c:2531
#13 0x00007fb15c58ed9b in call_resume (stub=0x7fb15a4d1034) at ../../../libglusterfs/src/call-stub.c:4151
#14 0x00007fb1575f890d in iot_worker (data=0x18814f0) at ../../../../../xlators/performance/io-threads/src/io-threads.c:131
#15 0x00007fb15bef8d8c in start_thread (arg=0x7fb154d47700) at pthread_create.c:304
#16 0x00007fb15b8b504d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#17 0x0000000000000000 in ?? ()
(gdb) f 3
#3  0x00007fb1571dc2a9 in mq_fetch_child_size_and_contri (frame=0x7fb15a601100, cookie=0x7fb15a807e88, this=0x186ccd0, op_ret=0, op_errno=0, 
    xdata=0x0) at ../../../../../xlators/features/marker/src/marker-quota.c:1790
1790	        GF_UUID_ASSERT (local->loc.gfid);
(gdb) l
1785	        mq_set_ctx_updation_status (local->ctx, _gf_false);
1786	
1787	        if (uuid_is_null (local->loc.gfid))
1788	                uuid_copy (local->loc.gfid, local->loc.inode->gfid);
1789	
1790	        GF_UUID_ASSERT (local->loc.gfid);
1791	
1792	        STACK_WIND (frame, mq_update_inode_contribution, FIRST_CHILD(this),
1793	                    FIRST_CHILD(this)->fops->lookup, &local->loc, newdict);
1794	
(gdb) p local->loc
$1 = {path = 0x1d89f10 "/clients/client12/~dmtmp/COREL/GRAPHIC1.CDR", name = 0x1d89f2f "GRAPHIC1.CDR", inode = 0x7fb155a08bec, 
  parent = 0x7fb1559f8980, gfid = '\000' <repeats 15 times>, pargfid = "\aZ\354\345\343\aF\243\243\065\025\006\275 \234\215"}
(gdb) p *local->loc.inode
$2 = {table = 0x188ecb0, gfid = '\000' <repeats 15 times>, lock = 1, nlookup = 0, ref = 1, ia_type = IA_INVAL, fd_list = {
    next = 0x7fb155a08c1c, prev = 0x7fb155a08c1c}, dentry_list = {next = 0x7fb155a08c2c, prev = 0x7fb155a08c2c}, hash = {
    next = 0x7fb155a08c3c, prev = 0x7fb155a08c3c}, list = {next = 0x7fb155a0777c, prev = 0x188ed10}, _ctx = 0x7fb14cda6cc0}
(gdb) info thr
  23 Thread 31833  0x00007fb15bf0139d in fsync () at ../sysdeps/unix/syscall-template.S:82
  22 Thread 31864  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:216
  21 Thread 31868  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:216
  20 Thread 31838  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:216
  19 Thread 31817  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:216
  18 Thread 31865  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:216
  17 Thread 31869  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:216
  16 Thread 31839  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:216
  15 Thread 31840  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:216
  14 Thread 31835  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:216
  13 Thread 31867  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:216
  12 Thread 31837  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:216
  11 Thread 31866  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:216
  10 Thread 31836  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:216
  9 Thread 31773  0x00007fb15bf014bd in nanosleep () at ../sysdeps/unix/syscall-template.S:82
  8 Thread 31834  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:216
  7 Thread 31768  0x00007fb15b8b56a3 in epoll_wait () at ../sysdeps/unix/syscall-template.S:82
  6 Thread 31771  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
  5 Thread 31816  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
  4 Thread 31770  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
  3 Thread 31769  do_sigwait (set=<value optimized out>, sig=0x7fb15a28eeb8)
    at ../nptl/sysdeps/unix/sysv/linux/../../../../../sysdeps/unix/sysv/linux/sigwait.c:65
  2 Thread 31820  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:216
* 1 Thread 31863  0x00007fb15b802d05 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
(gdb) t 23
[Switching to thread 23 (Thread 31833)]#0  0x00007fb15bf0139d in fsync () at ../sysdeps/unix/syscall-template.S:82
82	../sysdeps/unix/syscall-template.S: No such file or directory.
	in ../sysdeps/unix/syscall-template.S
(gdb) bt
#0  0x00007fb15bf0139d in fsync () at ../sysdeps/unix/syscall-template.S:82
#1  0x00007fb157c543a4 in posix_fsync (frame=0x7fb15a80889c, this=0x1867170, fd=0x1d8e29c, datasync=0, xdata=0x0)
    at ../../../../../xlators/storage/posix/src/posix.c:2346
#2  0x00007fb15c56deab in default_fsync (frame=0x7fb15a80d890, this=0x18684d0, fd=0x1d8e29c, flags=0, xdata=0x0)
    at ../../../libglusterfs/src/defaults.c:929
#3  0x00007fb15c56deab in default_fsync (frame=0x7fb15a81c668, this=0x18696b0, fd=0x1d8e29c, flags=0, xdata=0x0)
    at ../../../libglusterfs/src/defaults.c:929
#4  0x00007fb1575fe583 in iot_fsync_wrapper (frame=0x7fb15a81e958, this=0x186a810, fd=0x1d8e29c, datasync=0, xdata=0x0)
    at ../../../../../xlators/performance/io-threads/src/io-threads.c:1020
#5  0x00007fb15c587436 in call_resume_wind (stub=0x7fb15a4de810) at ../../../libglusterfs/src/call-stub.c:2522
#6  0x00007fb15c58ed9b in call_resume (stub=0x7fb15a4de810) at ../../../libglusterfs/src/call-stub.c:4151
#7  0x00007fb1575f890d in iot_worker (data=0x18814f0) at ../../../../../xlators/performance/io-threads/src/io-threads.c:131
#8  0x00007fb15bef8d8c in start_thread (arg=0x7fb155765700) at pthread_create.c:304
#9  0x00007fb15b8b504d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#10 0x0000000000000000 in ?? ()
(gdb) f 1
#1  0x00007fb157c543a4 in posix_fsync (frame=0x7fb15a80889c, this=0x1867170, fd=0x1d8e29c, datasync=0, xdata=0x0)
    at ../../../../../xlators/storage/posix/src/posix.c:2346
2346	                op_ret = fsync (_fd);
(gdb) 
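
The failure in frame 3 can be sketched in isolation. In the gdb output above, both local->loc.gfid and local->loc.inode->gfid are all zeroes (the inode also shows ia_type = IA_INVAL and nlookup = 0), so the uuid_copy fallback at line 1788 copies a null gfid and GF_UUID_ASSERT aborts the brick. The following is only a minimal sketch of a non-fatal alternative, not the actual patch: mock_inode, mock_loc, gfid_is_null and resolve_gfid are hypothetical stand-ins introduced here and are not glusterfs APIs.

```c
#include <errno.h>
#include <string.h>

/* Hypothetical stand-ins for the real glusterfs types; only the
 * gfid fields matter for this illustration. */
typedef unsigned char gfid_t[16];

struct mock_inode { gfid_t gfid; };
struct mock_loc   { gfid_t gfid; struct mock_inode *inode; };

/* Returns 1 when all 16 bytes are zero, similar in spirit to uuid_is_null(). */
static int gfid_is_null(const gfid_t g)
{
    static const gfid_t zero = {0};
    return memcmp(g, zero, sizeof(gfid_t)) == 0;
}

/*
 * Same fallback as marker-quota.c lines 1787-1788 in the listing above,
 * but a gfid that is still null afterwards is reported to the caller
 * instead of being asserted on.
 * Returns 0 on success, -EINVAL if no gfid could be resolved.
 */
static int resolve_gfid(struct mock_loc *loc)
{
    if (gfid_is_null(loc->gfid) && loc->inode != NULL)
        memcpy(loc->gfid, loc->inode->gfid, sizeof(gfid_t));

    return gfid_is_null(loc->gfid) ? -EINVAL : 0;
}
```

With a check of this shape, a caller could unwind the frame with an error when resolve_gfid() fails rather than winding a lookup with a null gfid. Whether that is the right behaviour for the marker translator is an assumption of this sketch.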





Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Create a 3x2 distributed-replicate volume, start it, and mount it on 2 FUSE clients.
2. Run the multi-threaded application (attached) on one client and dbench on the other.
3. Run volume set operations in parallel.
4. Bring down one brick from each replica pair at regular intervals (300 seconds), sleep for some time, and then run volume start force.
5. Trigger healing both with volume heal from the gluster CLI and with find | xargs stat.
  
Actual results:

glusterfs brick crashed

Expected results:

glusterfs brick should not crash

Additional info:
gluster volume info
 
Volume Name: mirror
Type: Distributed-Replicate
Volume ID: c15b0415-46ec-485d-a1c6-989783bb154a
Status: Started
Number of Bricks: 3 x 2 = 6
Transport-type: tcp
Bricks:
Brick1: hyperspace:/mnt/sda7/export4
Brick2: hyperspace:/mnt/sda8/export4
Brick3: hyperspace:/mnt/sda7/export5
Brick4: hyperspace:/mnt/sda8/export5
Brick5: hyperspace:/mnt/sda7/export6
Brick6: hyperspace:/mnt/sda8/export6
Options Reconfigured:
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
features.quota: on
performance.quick-read: on
performance.read-ahead: on
performance.stat-prefetch: off
features.limit-usage: /:250GB


FIL (security.capability) ==> -1 (No data available)
[2012-05-16 13:33:37.766524] I [server3_1-fops.c:823:server_getxattr_cbk] 0-mirror-server: 66632: GETXATTR /clients/client7/~dmtmp/SEED/LARGE.FIL (security.capability) ==> -1 (No data available)
[2012-05-16 13:33:37.771436] I [server3_1-fops.c:823:server_getxattr_cbk] 0-mirror-server: 66641: GETXATTR /clients/client8/~dmtmp/SEED/LARGE.FIL (security.capability) ==> -1 (No data available)
[2012-05-16 13:33:37.994634] W [marker-quota.c:2047:mq_inspect_directory_xattr] 0-mirror-marker: cannot add a new contribution node
[2012-05-16 13:33:37.996264] I [server3_1-fops.c:823:server_getxattr_cbk] 0-mirror-server: 66929: GETXATTR /clients/client5/~dmtmp/ACCESS/FASTENER.MDB (security.capability) ==> -1 (No data available)
[2012-05-16 13:33:38.079332] I [server3_1-fops.c:823:server_getxattr_cbk] 0-mirror-server: 67020: GETXATTR /clients/client13/~dmtmp/ACCESS/FASTENER.MDB (security.capability) ==> -1 (No data available)
[2012-05-16 13:33:38.123713] W [marker-quota.c:2047:mq_inspect_directory_xattr] 0-mirror-marker: cannot add a new contribution node
[2012-05-16 13:33:38.226573] I [server3_1-fops.c:823:server_getxattr_cbk] 0-mirror-server: 67172: GETXATTR /clients/client21/~dmtmp/ACCESS/SALES.PRN (security.capability) ==> -1 (No data available)
[2012-05-16 13:33:38.290289] I [server3_1-fops.c:823:server_getxattr_cbk] 0-mirror-server: 67245: GETXATTR /clients/client21/~dmtmp/ACCESS/SALES.PRN (security.capability) ==> -1 (No data available)
[2012-05-16 13:33:38.406110] I [server3_1-fops.c:823:server_getxattr_cbk] 0-mirror-server: 67404: GETXATTR /clients/client18/~dmtmp/WORDPRO/LWPSAV0.TMP (security.capability) ==> -1 (No data available)
[2012-05-16 13:33:38.440258] W [marker-quota.c:2047:mq_inspect_directory_xattr] 0-mirror-marker: cannot add a new contribution node
[2012-05-16 13:33:38.476349] I [server3_1-fops.c:823:server_getxattr_cbk] 0-mirror-server: 67497: GETXATTR /clients/client18/~dmtmp/WORDPRO/LWPSAV0.TMP (security.capability) ==> -1 (No data available)
[2012-05-16 13:33:38.528169] I [server3_1-fops.c:823:server_getxattr_cbk] 0-mirror-server: 67562: GETXATTR /clients/client20/~dmtmp/SEED/SMALL.FIL (security.capability) ==> -1 (No data available)
[2012-05-16 13:33:38.540610] I [server3_1-fops.c:823:server_getxattr_cbk] 0-mirror-server: 67580: GETXATTR /clients/client20/~dmtmp/SEED/SMALL.FIL (security.capability) ==> -1 (No data available)
pending frames:

patchset: git://git.gluster.com/glusterfs.git
signal received: 6
time of crash: 2012-05-16 13:33:38
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3git
/lib/x86_64-linux-gnu/libc.so.6(+0x33d80)[0x7fb15b802d80]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x35)[0x7fb15b802d05]

--- Additional comment from rabhat on 2012-05-16 05:01:17 EDT ---

Created attachment 584902: header file for the attached program

--- Additional comment from amarts on 2012-07-11 06:16:02 EDT ---

Fixed in patch http://review.gluster.com/3567

Comment 2 Amar Tumballi 2012-08-23 06:45:12 UTC
This bug is not seen in the current master branch (which will be branched as RHS 2.1.0 soon). Before considering it for a fix, we want to confirm that the bug still exists on RHS servers. If it cannot be reproduced, we would like to close it.

Comment 3 Raghavendra Bhat 2012-10-17 08:53:26 UTC
https://code.engineering.redhat.com/gerrit/64 fixes the issue.

Comment 4 Sachidananda Urs 2013-01-09 07:23:30 UTC
dbench was run overnight with added load from 'find' and 'stat', with graph changes made along the way. The crash was not reproducible.

Comment 6 Scott Haines 2013-09-23 22:33:04 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. 

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1262.html

