Bug 804606 - [glusterfs-3.3.0qa28]: glusterfs client crashed when all the bricks were down
Status: CLOSED CURRENTRELEASE
Product: GlusterFS
Classification: Community
Component: replicate
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Assigned To: Pranith Kumar K
QA Contact: Raghavendra Bhat
Reported: 2012-03-19 08:11 EDT by Raghavendra Bhat
Modified: 2015-12-01 11:45 EST

Fixed In Version: glusterfs-3.5.0
Doc Type: Bug Fix
Last Closed: 2014-04-17 07:38:13 EDT
Description Raghavendra Bhat 2012-03-19 08:11:08 EDT
Description of problem:
A 3-replica volume was mounted by 3 FUSE and 3 NFS clients. One of the FUSE clients crashed when all the bricks of the volume were down. (Two of the bricks had crashed, and one brick had been killed before those crashes to test self-heal.)

This is the backtrace of the crash.

GNU gdb (GDB) Red Hat Enterprise Linux (7.2-50.el6)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/local/sbin/glusterfs...done.
[New Thread 21101]
[New Thread 21103]
[New Thread 21109]
[New Thread 21107]
[New Thread 21102]
[New Thread 21108]
[New Thread 21104]
Reading symbols from /usr/local/lib/libglusterfs.so.0...done.
Loaded symbols for /usr/local/lib/libglusterfs.so.0
Reading symbols from /usr/local/lib/libgfrpc.so.0...done.
Loaded symbols for /usr/local/lib/libgfrpc.so.0
Reading symbols from /usr/local/lib/libgfxdr.so.0...done.
Loaded symbols for /usr/local/lib/libgfxdr.so.0
Reading symbols from /lib64/libdl.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libdl.so.2
Reading symbols from /lib64/libpthread.so.0...(no debugging symbols found)...done.
[Thread debugging using libthread_db enabled]
Loaded symbols for /lib64/libpthread.so.0
Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /usr/local/lib/glusterfs/3.3.0qa28/xlator/mount/fuse.so...done.
Loaded symbols for /usr/local/lib/glusterfs/3.3.0qa28/xlator/mount/fuse.so
Reading symbols from /usr/local/lib/glusterfs/3.3.0qa28/rpc-transport/socket.so...done.
Loaded symbols for /usr/local/lib/glusterfs/3.3.0qa28/rpc-transport/socket.so
Reading symbols from /lib64/libnss_files.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libnss_files.so.2
Reading symbols from /usr/local/lib/glusterfs/3.3.0qa28/xlator/protocol/client.so...done.
Loaded symbols for /usr/local/lib/glusterfs/3.3.0qa28/xlator/protocol/client.so
Reading symbols from /usr/local/lib/glusterfs/3.3.0qa28/xlator/cluster/replicate.so...done.
Loaded symbols for /usr/local/lib/glusterfs/3.3.0qa28/xlator/cluster/replicate.so
Reading symbols from /usr/local/lib/glusterfs/3.3.0qa28/xlator/performance/write-behind.so...done.
Loaded symbols for /usr/local/lib/glusterfs/3.3.0qa28/xlator/performance/write-behind.so
Reading symbols from /usr/local/lib/glusterfs/3.3.0qa28/xlator/performance/read-ahead.so...done.
Loaded symbols for /usr/local/lib/glusterfs/3.3.0qa28/xlator/performance/read-ahead.so
Reading symbols from /usr/local/lib/glusterfs/3.3.0qa28/xlator/performance/io-cache.so...done.
Loaded symbols for /usr/local/lib/glusterfs/3.3.0qa28/xlator/performance/io-cache.so
Reading symbols from /usr/local/lib/glusterfs/3.3.0qa28/xlator/performance/quick-read.so...done.
Loaded symbols for /usr/local/lib/glusterfs/3.3.0qa28/xlator/performance/quick-read.so
Reading symbols from /usr/local/lib/glusterfs/3.3.0qa28/xlator/performance/md-cache.so...done.
Loaded symbols for /usr/local/lib/glusterfs/3.3.0qa28/xlator/performance/md-cache.so
Reading symbols from /usr/local/lib/glusterfs/3.3.0qa28/xlator/debug/io-stats.so...done.
Loaded symbols for /usr/local/lib/glusterfs/3.3.0qa28/xlator/debug/io-stats.so
Reading symbols from /lib64/libgcc_s.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libgcc_s.so.1
Core was generated by `/usr/local/sbin/glusterfs --volfile-id=mirror --volfile-server=10.16.156.15 /mn'.
Program terminated with signal 6, Aborted.
#0  0x0000003a29c32885 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.47.el6_2.5.x86_64 libgcc-4.4.6-3.el6.x86_64
(gdb) bt
#0  0x0000003a29c32885 in raise () from /lib64/libc.so.6
#1  0x0000003a29c34065 in abort () from /lib64/libc.so.6
#2  0x0000003a29c2b9fe in __assert_fail_base () from /lib64/libc.so.6
#3  0x0000003a29c2bac0 in __assert_fail () from /lib64/libc.so.6
#4  0x00007f4fc4063c94 in afr_sh_save_child_iatts_from_policy (children=0x7f4fb800fe50, bufs=0x7f4fb800a510, save=0x7f4fbc1acb30, 
    child_count=3) at ../../../../../xlators/cluster/afr/src/afr-self-heal-common.c:1588
#5  0x00007f4fc4063f11 in afr_sh_children_lookup_done (frame=0x7f4fc747a894, this=0x1f3e270, op_ret=0, op_errno=107)
    at ../../../../../xlators/cluster/afr/src/afr-self-heal-common.c:1635
#6  0x00007f4fc4062f62 in afr_sh_common_lookup_cbk (frame=0x7f4fc747a894, cookie=0x1, this=0x1f3e270, op_ret=0, op_errno=107, 
    inode=0x7f4fb652d174, buf=0x7fffcd1c1d90, xattr=0x0, postparent=0x7fffcd1c1d20)
    at ../../../../../xlators/cluster/afr/src/afr-self-heal-common.c:1316
#7  0x00007f4fc42cd597 in client3_1_lookup_cbk (req=0x7f4fb7ef51ec, iov=0x7fffcd1c1fa0, count=1, myframe=0x7f4fc7681c64)
    at ../../../../../xlators/protocol/client/src/client3_1-fops.c:2185
#8  0x00007f4fc8626ab4 in saved_frames_unwind (saved_frames=0x205b210) at ../../../../rpc/rpc-lib/src/rpc-clnt.c:387
#9  0x00007f4fc8626b63 in saved_frames_destroy (frames=0x205b210) at ../../../../rpc/rpc-lib/src/rpc-clnt.c:405
#10 0x00007f4fc862711d in rpc_clnt_connection_cleanup (conn=0x1f91a40) at ../../../../rpc/rpc-lib/src/rpc-clnt.c:567
#11 0x00007f4fc8627bfe in rpc_clnt_notify (trans=0x1fa1550, mydata=0x1f91a40, event=RPC_TRANSPORT_DISCONNECT, data=0x1fa1550)
    at ../../../../rpc/rpc-lib/src/rpc-clnt.c:870
#12 0x00007f4fc8623e78 in rpc_transport_notify (this=0x1fa1550, event=RPC_TRANSPORT_DISCONNECT, data=0x1fa1550)
    at ../../../../rpc/rpc-lib/src/rpc-transport.c:498
#13 0x00007f4fc510a1c7 in socket_event_poll_err (this=0x1fa1550) at ../../../../../rpc/rpc-transport/socket/src/socket.c:694
#14 0x00007f4fc510e880 in socket_event_handler (fd=14, idx=7, data=0x1fa1550, poll_in=1, poll_out=0, poll_err=16)
    at ../../../../../rpc/rpc-transport/socket/src/socket.c:1808
#15 0x00007f4fc887e0c4 in event_dispatch_epoll (event_pool=0x1dc8c30) at ../../../libglusterfs/src/event.c:816
#16 0x00007f4fc887e2e7 in event_pool_new (count=1) at ../../../libglusterfs/src/event.c:893
#17 0x00007f4fc887e672 in list_del (old=0xffffffff29a21188) at ../../../libglusterfs/src/list.h:61
#18 0x0000000000407ecd in main (argc=4, argv=0x7fffcd1c2618) at ../../../glusterfsd/src/glusterfsd.c:1609
(gdb) f 4
#4  0x00007f4fc4063c94 in afr_sh_save_child_iatts_from_policy (children=0x7f4fb800fe50, bufs=0x7f4fb800a510, save=0x7f4fbc1acb30, 
    child_count=3) at ../../../../../xlators/cluster/afr/src/afr-self-heal-common.c:1588
1588	        GF_ASSERT (saved);
(gdb) p saved
$1 = _gf_false
(gdb) l afr_sh_save_child_iatts_from_policy
1567	
1568	void
1569	afr_sh_save_child_iatts_from_policy (int32_t *children, struct iatt *bufs,
1570	                                     struct iatt *save,
1571	                                     unsigned int child_count)
1572	{
1573	        int             i = 0;
1574	        int             child = 0;
1575	        gf_boolean_t    saved = _gf_false;
1576	
(gdb) 
1577	        GF_ASSERT (save);
1578	        //if iatt buf with gfid exists sets it
1579	        for (i = 0; i < child_count; i++) {
1580	                child = children[i];
1581	                if (child == -1)
1582	                        break;
1583	                *save = bufs[child];
1584	                saved = _gf_true;
1585	                if (!uuid_is_null (save->ia_gfid))
1586	                        break;
(gdb) 
1587	        }
1588	        GF_ASSERT (saved);
1589	}
1590	
1591	void
1592	afr_get_children_of_fresh_parent_dirs (afr_self_heal_t *sh,
1593	                                       unsigned int child_count)
1594	{
1595	        afr_children_intersection_get (sh->success_children,
1596	                                       sh->fresh_parent_dirs,
(gdb) p children[0]
$2 = -1
(gdb) p children[1]
$3 = -1
(gdb) p children[2]
$4 = -1
(gdb) p child_count
$5 = 3
(gdb) info thr
  7 Thread 0x7f4fc5d19700 (LWP 21104)  0x0000003a2a40b3dc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  6 Thread 0x7f4fbe29a700 (LWP 21108)  0x0000003a29cddae7 in readv () from /lib64/libc.so.6
  5 Thread 0x7f4fc711b700 (LWP 21102)  0x0000003a2a40f245 in sigwait () from /lib64/libpthread.so.0
  4 Thread 0x7f4fc4eee700 (LWP 21107)  0x0000003a2a40eccd in nanosleep () from /lib64/libpthread.so.0
  3 Thread 0x7f4fbd899700 (LWP 21109)  0x0000003a2a40e4ed in read () from /lib64/libpthread.so.0
  2 Thread 0x7f4fc671a700 (LWP 21103)  0x0000003a2a40b3dc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
* 1 Thread 0x7f4fc83f0700 (LWP 21101)  0x0000003a29c32885 in raise () from /lib64/libc.so.6
(gdb) bt
#0  0x0000003a29c32885 in raise () from /lib64/libc.so.6
#1  0x0000003a29c34065 in abort () from /lib64/libc.so.6
#2  0x0000003a29c2b9fe in __assert_fail_base () from /lib64/libc.so.6
#3  0x0000003a29c2bac0 in __assert_fail () from /lib64/libc.so.6
#4  0x00007f4fc4063c94 in afr_sh_save_child_iatts_from_policy (children=0x7f4fb800fe50, bufs=0x7f4fb800a510, save=0x7f4fbc1acb30, 
    child_count=3) at ../../../../../xlators/cluster/afr/src/afr-self-heal-common.c:1588
#5  0x00007f4fc4063f11 in afr_sh_children_lookup_done (frame=0x7f4fc747a894, this=0x1f3e270, op_ret=0, op_errno=107)
    at ../../../../../xlators/cluster/afr/src/afr-self-heal-common.c:1635
#6  0x00007f4fc4062f62 in afr_sh_common_lookup_cbk (frame=0x7f4fc747a894, cookie=0x1, this=0x1f3e270, op_ret=0, op_errno=107, 
    inode=0x7f4fb652d174, buf=0x7fffcd1c1d90, xattr=0x0, postparent=0x7fffcd1c1d20)
    at ../../../../../xlators/cluster/afr/src/afr-self-heal-common.c:1316
#7  0x00007f4fc42cd597 in client3_1_lookup_cbk (req=0x7f4fb7ef51ec, iov=0x7fffcd1c1fa0, count=1, myframe=0x7f4fc7681c64)
    at ../../../../../xlators/protocol/client/src/client3_1-fops.c:2185
#8  0x00007f4fc8626ab4 in saved_frames_unwind (saved_frames=0x205b210) at ../../../../rpc/rpc-lib/src/rpc-clnt.c:387
#9  0x00007f4fc8626b63 in saved_frames_destroy (frames=0x205b210) at ../../../../rpc/rpc-lib/src/rpc-clnt.c:405
#10 0x00007f4fc862711d in rpc_clnt_connection_cleanup (conn=0x1f91a40) at ../../../../rpc/rpc-lib/src/rpc-clnt.c:567
#11 0x00007f4fc8627bfe in rpc_clnt_notify (trans=0x1fa1550, mydata=0x1f91a40, event=RPC_TRANSPORT_DISCONNECT, data=0x1fa1550)
    at ../../../../rpc/rpc-lib/src/rpc-clnt.c:870
#12 0x00007f4fc8623e78 in rpc_transport_notify (this=0x1fa1550, event=RPC_TRANSPORT_DISCONNECT, data=0x1fa1550)
    at ../../../../rpc/rpc-lib/src/rpc-transport.c:498
#13 0x00007f4fc510a1c7 in socket_event_poll_err (this=0x1fa1550) at ../../../../../rpc/rpc-transport/socket/src/socket.c:694
#14 0x00007f4fc510e880 in socket_event_handler (fd=14, idx=7, data=0x1fa1550, poll_in=1, poll_out=0, poll_err=16)
    at ../../../../../rpc/rpc-transport/socket/src/socket.c:1808
#15 0x00007f4fc887e0c4 in event_dispatch_epoll (event_pool=0x1dc8c30) at ../../../libglusterfs/src/event.c:816
#16 0x00007f4fc887e2e7 in event_pool_new (count=1) at ../../../libglusterfs/src/event.c:893
#17 0x00007f4fc887e672 in list_del (old=0xffffffff29a21188) at ../../../libglusterfs/src/list.h:61
#18 0x0000000000407ecd in main (argc=4, argv=0x7fffcd1c2618) at ../../../glusterfsd/src/glusterfsd.c:1609
(gdb) f 5
#5  0x00007f4fc4063f11 in afr_sh_children_lookup_done (frame=0x7f4fc747a894, this=0x1f3e270, op_ret=0, op_errno=107)
    at ../../../../../xlators/cluster/afr/src/afr-self-heal-common.c:1635
1635	                afr_sh_save_child_iatts_from_policy (sh->fresh_children,
(gdb) l
1630	                sh->op_failed = 1;
1631	                afr_sh_purge_entry (frame, this);
1632	        } else if (!afr_conflicting_iattrs (sh->buf, sh->fresh_children,
1633	                                            priv->child_count, local->loc.path,
1634	                                            this->name)) {
1635	                afr_sh_save_child_iatts_from_policy (sh->fresh_children,
1636	                                                     sh->buf, &sh->entrybuf,
1637	                                                     priv->child_count);
1638	                afr_update_gfid_from_iatts (sh->sh_gfid_req, sh->buf,
1639	                                            sh->fresh_children,
(gdb) quit
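
The session above shows why the assert fires: every entry of sh->fresh_children is -1, so the loop in afr_sh_save_child_iatts_from_policy breaks on its first iteration and saved is still _gf_false at line 1588. Below is a minimal sketch of one possible way to handle that case, assuming the caller could check a return value instead of relying on GF_ASSERT. It is not the patch posted for this bug. The names demo_iatt, demo_gfid_is_null and demo_save_child_iatt are hypothetical stand-ins for struct iatt, uuid_is_null and the AFR function, so the snippet compiles without the GlusterFS headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct iatt; only the gfid matters here. */
struct demo_iatt {
        unsigned char ia_gfid[16];
};

/* Plays the role of uuid_is_null: true when all 16 bytes are zero. */
static bool
demo_gfid_is_null (const unsigned char gfid[16])
{
        static const unsigned char zero[16] = {0};

        return memcmp (gfid, zero, sizeof (zero)) == 0;
}

/*
 * Same loop as afr_sh_save_child_iatts_from_policy, but it returns
 * false when no child is usable (children[0] == -1) instead of
 * asserting. What the caller does on false is left open in this sketch.
 */
static bool
demo_save_child_iatt (const int *children, const struct demo_iatt *bufs,
                      struct demo_iatt *save, unsigned int child_count)
{
        unsigned int i = 0;
        int          child = 0;
        bool         saved = false;

        assert (save != NULL);
        for (i = 0; i < child_count; i++) {
                child = children[i];
                if (child == -1)
                        break;          /* end of the valid children */
                *save = bufs[child];
                saved = true;
                if (!demo_gfid_is_null (save->ia_gfid))
                        break;          /* prefer an iatt that has a gfid */
        }
        return saved;
}
```

Called with children = {-1, -1, -1} and child_count = 3, as in the core dump, demo_save_child_iatt returns false and leaves *save untouched. A caller in the position of afr_sh_children_lookup_done could then fail the self-heal rather than abort the client.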



Actual results:
The glusterfs client hit an assertion failure during self-heal because all the bricks were down.

Expected results:
The glusterfs client should not crash.

Additional info:
Comment 1 Jeff Darcy 2012-10-31 10:12:21 EDT
http://review.gluster.org/3092 has been posted for this (but for some reason incorrectly shows up as "rfc" in Gerrit).
Comment 2 Niels de Vos 2014-04-17 07:38:13 EDT
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.5.0, please reopen this bug report.

glusterfs-3.5.0 has been announced on the Gluster Developers mailing list [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/6137
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user
