Bug 767224 - [289c2902d6a81f7a5b6da04c24cc955bd5427178] client crash with segfault at inode_link in inode.c
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: GlusterFS
Classification: Community
Component: replicate
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Assignee: Pranith Kumar K
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2011-12-13 15:05 UTC by Rahul C S
Modified: 2015-12-01 16:45 UTC
CC List: 3 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-04-17 13:46:44 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:


Attachments
glusterfs logs (3.99 MB, application/x-compressed-tar)
2011-12-13 15:05 UTC, Rahul C S

Description Rahul C S 2011-12-13 15:05:57 UTC
Created attachment 546245 [details]
glusterfs logs

Description of problem:
The crash happened on the second dbench run with the xattr option enabled, while a stat was being run on the mountpoint from another client.

A brick process was also taken down and brought back up before the stat.

Core was generated by `/usr/local/sbin/glusterfs --volfile-id=vol --volfile-server=dagobah mount/'.
Program terminated with signal 11, Segmentation fault.
#0  __pthread_mutex_lock (mutex=0x100000000) at pthread_mutex_lock.c:50
50	pthread_mutex_lock.c: No such file or directory.
	in pthread_mutex_lock.c
(gdb) bt
#0  __pthread_mutex_lock (mutex=0x100000000) at pthread_mutex_lock.c:50
#1  0x00007f9a72352c80 in inode_link (inode=0x1905900, parent=0x7f9a595a604c, name=0x7f9a600600f1 "clients", iatt=0x7f9a5ac5742c)
    at ../../../libglusterfs/src/inode.c:812
#2  0x00007f9a703edcac in fuse_entry_cbk (frame=0x7f9a706b4878, cookie=0x7f9a7092adfc, this=0x1895e30, op_ret=0, op_errno=22, inode=0x1905900, 
    buf=0x7f9a5ac5742c) at ../../../../../xlators/mount/fuse/src/fuse-bridge.c:195
#3  0x00007f9a703ee1ec in fuse_lookup_cbk (frame=0x7f9a706b4878, cookie=0x7f9a7092adfc, this=0x1895e30, op_ret=0, op_errno=22, inode=0x1905900, 
    stat=0x7f9a5ac5742c, dict=0x7f9a5acd404c, postparent=0x7f9a7092adfc) at ../../../../../xlators/mount/fuse/src/fuse-bridge.c:280
#4  0x00007f9a6e3285e0 in client3_1_xattrop_cbk (req=0x7f9a5acd404c, iov=0x7f9a5acd408c, count=1, myframe=0x7f9a7092adfc)
    at ../../../../../xlators/protocol/client/src/client3_1-fops.c:1425
#5  0x00007f9a721189c6 in rpc_clnt_handle_reply (clnt=0x7f9a6009c800, pollin=0x190e580) at ../../../../rpc/rpc-lib/src/rpc-clnt.c:789
#6  0x00007f9a72118d28 in rpc_clnt_notify (trans=0x7f9a6009cad0, mydata=0x7f9a6009c830, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x190e580)
    at ../../../../rpc/rpc-lib/src/rpc-clnt.c:908
#7  0x00007f9a72114e3d in rpc_transport_notify (this=0x7f9a6009cad0, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x190e580)
    at ../../../../rpc/rpc-lib/src/rpc-transport.c:498
#8  0x00007f9a6ef8a359 in socket_event_poll_in (this=0x7f9a6009cad0) at ../../../../../rpc/rpc-transport/socket/src/socket.c:1675
#9  0x00007f9a6ef8a8cd in socket_event_handler (fd=43, idx=23, data=0x7f9a6009cad0, poll_in=1, poll_out=0, poll_err=0)
    at ../../../../../rpc/rpc-transport/socket/src/socket.c:1790
#10 0x00007f9a723694b9 in event_dispatch_epoll_handler (event_pool=0x18952d0, events=0x189aa40, i=0) at ../../../libglusterfs/src/event.c:794
#11 0x00007f9a723696d3 in event_dispatch_epoll (event_pool=0x18952d0) at ../../../libglusterfs/src/event.c:856
#12 0x00007f9a72369a45 in event_dispatch (event_pool=0x18952d0) at ../../../libglusterfs/src/event.c:956
#13 0x0000000000407d83 in main (argc=4, argv=0x7fff4c81e6f8) at ../../../glusterfsd/src/glusterfsd.c:1601
(gdb) info thr
  10 Thread 7365  0x00007f9a718174f9 in __libc_readv (fd=5, vector=0x7f9a671d2e50, count=2) at ../sysdeps/unix/sysv/linux/readv.c:56
  9 Thread 7549  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
  8 Thread 7364  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
  7 Thread 7671  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
  6 Thread 7363  0x00007f9a71add4bd in nanosleep () at ../sysdeps/unix/syscall-template.S:82
  5 Thread 7658  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
  4 Thread 7360  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
  3 Thread 7562  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
  2 Thread 7359  do_sigwait (set=<value optimized out>, sig=0x7f9a703e3eb8)
    at ../nptl/sysdeps/unix/sysv/linux/../../../../../sysdeps/unix/sysv/linux/sigwait.c:65
* 1 Thread 7358  __pthread_mutex_lock (mutex=0x100000000) at pthread_mutex_lock.c:50
(gdb) f 1
#1  0x00007f9a72352c80 in inode_link (inode=0x1905900, parent=0x7f9a595a604c, name=0x7f9a600600f1 "clients", iatt=0x7f9a5ac5742c)
    at ../../../libglusterfs/src/inode.c:812
812	        pthread_mutex_lock (&table->lock);
(gdb) l
807	                return NULL;
808	        }
809	
810	        table = inode->table;
811	
812	        pthread_mutex_lock (&table->lock);
813	        {
814	                linked_inode = __inode_link (inode, parent, name, iatt);
815	
816	                if (linked_inode)
(gdb) p *inode
$1 = {table = 0x100000000, gfid = "\002\000\000\000\001\000\000\000p\201\221\001\000\000\000", lock = 25984256, nlookup = 26229184, ref = 0, 
  ia_type = IA_INVAL, fd_list = {next = 0x1, prev = 0xbaadf00d}, dentry_list = {next = 0x0, prev = 0xd1}, hash = {next = 0x9800000057, 
    prev = 0x6008c29000000000}, list = {next = 0xcafebabe00007f9a, prev = 0x0}, _ctx = 0xd942de7300000001}
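
The inode being linked here is clearly stale: its table field is 0x100000000, and the fd_list/list pointers carry the 0xbaadf00d / 0xcafebabe poison patterns, which suggests the inode was destroyed (or its memory recycled) while the lookup reply was still in flight. inode_link() dereferences inode->table and immediately locks table->lock, so the stale pointer turns directly into the SIGSEGV in frame #0. A simplified sketch of that path (reconstructed from the listing above; the part after the lock is from memory of that code and may differ slightly):

inode_t *
inode_link (inode_t *inode, inode_t *parent, const char *name,
            struct iatt *iatt)
{
        inode_table_t *table        = NULL;
        inode_t       *linked_inode = NULL;

        if (!inode)
                return NULL;

        table = inode->table;                /* in this core: 0x100000000 */

        /* Note: a NULL check on table would not help here, because the
         * pointer is garbage rather than NULL.  The only real protection
         * is guaranteeing the inode cannot be destroyed while a lookup
         * reply still references it (i.e. holding a ref across the call). */

        pthread_mutex_lock (&table->lock);   /* frame #0: mutex == 0x100000000 */
        {
                linked_inode = __inode_link (inode, parent, name, iatt);

                if (linked_inode)
                        __inode_ref (linked_inode);
        }
        pthread_mutex_unlock (&table->lock);

        return linked_inode;
}
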
(gdb) f 3
#3  0x00007f9a703ee1ec in fuse_lookup_cbk (frame=0x7f9a706b4878, cookie=0x7f9a7092adfc, this=0x1895e30, op_ret=0, op_errno=22, inode=0x1905900, 
    stat=0x7f9a5ac5742c, dict=0x7f9a5acd404c, postparent=0x7f9a7092adfc) at ../../../../../xlators/mount/fuse/src/fuse-bridge.c:280
280	        fuse_entry_cbk (frame, cookie, this, op_ret, op_errno, inode, stat);
(gdb) p *state
$8 = {pool = 0x18953a0, this = 0x1895e30, itable = 0x0, loc = {path = 0x7f9a600600f0 "/clients", name = 0x7f9a600600f1 "clients", inode = 0x7f9a595a60e0, 
    parent = 0x7f9a595a604c, gfid = '\000' <repeats 15 times>, pargfid = '\000' <repeats 15 times>}, loc2 = {path = 0x0, name = 0x0, inode = 0x0, 
    parent = 0x0, gfid = '\000' <repeats 15 times>, pargfid = '\000' <repeats 15 times>}, finh = 0x7f9a60006860, flags = 0, off = 0, size = 0, nlookup = 0, 
  fd = 0x0, dict = 0x0, name = 0x0, is_revalidate = 1 '\001', truncate_needed = _gf_false, lock = 1, lk_owner = 0, resolve = {type = 0, ino = 0, gen = 0, 
    par = 0, fd = 0x0, path = 0x7f9a60034170 "/clients", bname = 0x7f9a6000b3b0 "clients", gfid = "\275,MY$fI;\275z\202\210\212ϸF", 
    pargfid = '\000' <repeats 15 times>, "\001", resolved = 0x0, op_ret = 0, op_errno = 0, deep_loc = {path = 0x0, name = 0x0, inode = 0x0, parent = 0x0, 
      gfid = '\000' <repeats 15 times>, pargfid = '\000' <repeats 15 times>}, components = 0x0, comp_count = 0}, resolve2 = {type = 0, ino = 0, gen = 0, 
    par = 0, fd = 0x0, path = 0x0, bname = 0x0, gfid = '\000' <repeats 15 times>, pargfid = '\000' <repeats 15 times>, resolved = 0x0, op_ret = 0, 
    op_errno = 0, deep_loc = {path = 0x0, name = 0x0, inode = 0x0, parent = 0x0, gfid = '\000' <repeats 15 times>, pargfid = '\000' <repeats 15 times>}, 
    components = 0x0, comp_count = 0}, loc_now = 0x0, resolve_now = 0x0, resume_fn = 0x7f9a703ee1f3, valid = 0, mask = 0, rdev = 0, mode = 0, attr = {
    ia_ino = 0, ia_gfid = '\000' <repeats 15 times>, ia_dev = 0, ia_type = IA_INVAL, ia_prot = {suid = 0 '\000', sgid = 0 '\000', sticky = 0 '\000', 
      owner = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, group = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, other = {read = 0 '\000', 
        write = 0 '\000', exec = 0 '\000'}}, ia_nlink = 0, ia_uid = 0, ia_gid = 0, ia_rdev = 0, ia_size = 0, ia_blksize = 0, ia_blocks = 0, ia_atime = 0, 
    ia_atime_nsec = 0, ia_mtime = 0, ia_mtime_nsec = 0, ia_ctime = 0, ia_ctime_nsec = 0}, lk_lock = {l_type = 0, l_whence = 0, l_start = 0, l_len = 0, 
    l_pid = 0, l_owner = 0}, vector = {iov_base = 0x0, iov_len = 0}, gfid = '\000' <repeats 15 times>}
(gdb) p *state->loc.inode
$9 = {table = 0x7f9a68007e30, gfid = "\275,MY$fI;\275z\202\210\212ϸF", lock = 1, nlookup = 0, ref = 43, ia_type = IA_IFDIR, fd_list = {
    next = 0x7f9a595a6110, prev = 0x7f9a595a6110}, dentry_list = {next = 0x7f9a5945504c, prev = 0x7f9a5945504c}, hash = {next = 0x7f9a5940c490, 
    prev = 0x7f9a5940c490}, list = {next = 0x7f9a595a60ac, prev = 0x7f9a595aa0d8}, _ctx = 0x18fac50}
(gdb) p *state->loc.inode->table
$10 = {lock = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __list = {__prev = 0x0, __next = 0x0}}, 
    __size = '\000' <repeats 39 times>, __align = 0}, hashsize = 14057, name = 0x7f9a6803b7e0 "vol/inode", root = 0x7f9a595a604c, xl = 0x7f9a6008c290, 
  lru_limit = 0, inode_hash = 0x7f9a59354030, name_hash = 0x7f9a5931d030, active = {next = 0x7f9a595abe54, prev = 0x7f9a595a60ac}, active_size = 44, lru = {
    next = 0x7f9a68007ea8, prev = 0x7f9a68007ea8}, lru_size = 0, purge = {next = 0x7f9a68007ec0, prev = 0x7f9a68007ec0}, purge_size = 0, 
  inode_pool = 0x7f9a68076cf0, dentry_pool = 0x7f9a680212b0, fd_mem_pool = 0x7f9a68040cc0}
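
Two things stand out in this frame: state->itable is NULL, and the inode argument delivered to the callback (0x1905900, the poisoned one) is not the same object as state->loc.inode (0x7f9a595a60e0), which still looks healthy (valid table, ref = 43, IA_IFDIR). Only the inode coming back up the callback chain is corrupted. A simplified sketch of the fuse lookup callback with a hypothetical cross-check added (the check and the surrounding simplification are illustrative, not the real fuse-bridge.c code):

static int
fuse_lookup_cbk (call_frame_t *frame, void *cookie, xlator_t *this,
                 int32_t op_ret, int32_t op_errno, inode_t *inode,
                 struct iatt *stat, dict_t *dict, struct iatt *postparent)
{
        fuse_state_t *state = frame->root->state;

        /* In this core: inode == 0x1905900 (poisoned), while
         * state->loc.inode == 0x7f9a595a60e0 is still valid.
         * Hypothetical cross-check, purely for illustration: */
        if ((op_ret == 0) && state->loc.inode && (inode != state->loc.inode))
                gf_log (this->name, GF_LOG_WARNING,
                        "lookup reply carries inode %p, but the inode being "
                        "resolved is %p", inode, state->loc.inode);

        /* the real code hands the reply to fuse_entry_cbk(), which calls
         * inode_link() on the (here, stale) inode -- see frame #2 above */
        fuse_entry_cbk (frame, cookie, this, op_ret, op_errno, inode, stat);

        return 0;
}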

After a lot of failed setxattr & getxattr calls, I am seeing warnings like these:
[2011-12-13 18:17:50.013317] W [client3_1-fops.c:899:client3_1_getxattr_cbk] 4-vol-client-5: remote operation failed: No data available. Path: (null)
[2011-12-13 18:17:50.017972] W [client3_1-fops.c:899:client3_1_getxattr_cbk] 4-vol-client-4: remote operation failed: No data available. Path: (null)
[2011-12-13 18:17:50.070850] W [client3_1-fops.c:899:client3_1_getxattr_cbk] 4-vol-client-0: remote operation failed: No data available. Path: (null)
[2011-12-13 18:17:50.075770] W [client3_1-fops.c:899:client3_1_getxattr_cbk] 4-vol-client-1: remote operation failed: No data available. Path: (null)
[2011-12-13 18:17:50.283978] W [client3_1-fops.c:899:client3_1_getxattr_cbk] 4-vol-client-2: remote operation failed: No data available. Path: (null)
[2011-12-13 18:17:50.412274] W [client3_1-fops.c:899:client3_1_getxattr_cbk] 4-vol-client-0: remote operation failed: No data available. Path: (null)
[2011-12-13 18:17:50.415153] W [client3_1-fops.c:899:client3_1_getxattr_cbk] 4-vol-client-1: remote operation failed: No data available. Path: (null)
[2011-12-13 18:17:50.448554] W [client3_1-fops.c:899:client3_1_getxattr_cbk] 4-vol-client-2: remote operation failed: No data available. Path: (null)
[2011-12-13 18:17:50.581526] W [client3_1-fops.c:899:client3_1_getxattr_cbk] 4-vol-client-2: remote operation failed: No data available. Path: (null)
[2011-12-13 18:17:51.614044] W [client3_1-fops.c:899:client3_1_getxattr_cbk] 4-vol-client-2: remote operation failed: No data available. Path: (null)
[2011-12-13 18:17:51.646347] W [client3_1-fops.c:899:client3_1_getxattr_cbk] 4-vol-client-1: remote operation failed: No data available. Path: (null)
[2011-12-13 18:17:51.646820] W [client3_1-fops.c:899:client3_1_getxattr_cbk] 4-vol-client-0: remote operation failed: No data available. Path: (null)
These show up in the client log file just before the crash.
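
For what it's worth, the "No data available" text in those warnings is just strerror(ENODATA), i.e. the bricks are failing the getxattr with ENODATA and the client translator logs it with a NULL path (glibc prints a NULL %s argument as "(null)"). A rough sketch of what that logging site in client3_1_getxattr_cbk() looks like; the rsp/local names are assumptions based on the usual pattern in client3_1-fops.c:

        if (rsp.op_ret == -1) {
                /* op_errno == ENODATA -> strerror() yields "No data
                 * available"; local->loc.path is NULL here, which glibc's
                 * printf renders as "(null)" in the messages above */
                gf_log (this->name, GF_LOG_WARNING,
                        "remote operation failed: %s. Path: %s",
                        strerror (rsp.op_errno), local->loc.path);
        }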


How reproducible:
Not so frequent

Steps to Reproduce:
1. Distributed replicate volume (3x2).
2. Enabled & disabled quota/geo-replication with stat-prefetch on/off to force some graph changes.
3. From a fuse client, ran dbench -s -F -S --stat-check 10.
4. Killed a brick process & brought it back up.
5. From another fuse client, ran find . | xargs stat.
6. From the first fuse client, ran dbench -s -F -S -x --one-byte-write-fix --stat-check 10.
7. This crashed the client.

Actual results:
The dbench test errored out & the client crashed.

Expected results:
The dbench test should complete.

Additional info:
I have attached the logs for bricks & clients.

Comment 1 Pranith Kumar K 2012-01-11 02:51:03 UTC
Could you reproduce this under valgrind?

Comment 2 Rahul C S 2012-02-24 07:47:05 UTC
No, I did not hit it again. Will try one more time.

Comment 4 Vijay Bellur 2012-04-17 13:46:44 UTC
Not reproducible anymore. Shwetha has tested this several times.

