Bug 787671

Summary: glusterfs client crashed with SIGABRT while mounting.
Product: [Community] GlusterFS
Reporter: M S Vishwanath Bhat <vbhat>
Component: core
Assignee: Pranith Kumar K <pkarampu>
Status: CLOSED CURRENTRELEASE
Severity: high
Priority: urgent
Version: mainline
CC: gluster-bugs, mzywusko
Hardware: Unspecified
OS: Unspecified
Fixed In Version: glusterfs-3.4.0
Doc Type: Bug Fix
Last Closed: 2013-07-24 17:56:30 UTC
Bug Blocks: 817967
Attachments: fuse client log

Description M S Vishwanath Bhat 2012-02-06 13:28:07 UTC
Created attachment 559632: fuse client log

Description of problem:
Created a 2*3 distribute-replicate volume and created some data on the mountpoint. Deleted .glusterfs from the back-end and unmounted the client. When I tried to mount again, the fuse client crashed.

Version-Release number of selected component (if applicable):
glusterfs-3.3.0qa21

How reproducible:
1/1

Steps to Reproduce:
1. Create and start a 2*3 distribute-replicate volume.
2. Mount the volume via fuse.
3. Download the Linux kernel and untar it on the mountpoint.
4. On one leg of dht, remove .glusterfs from the back-end, i.e. from its three servers.
5. Run ls and find on the mountpoint; both return nothing.
6. Unmount and mount again; the client process crashes but leaves no core.
7. Run the mount command from gdb.
  
Actual results:
The client process crashed with the following backtrace.

Program received signal SIGABRT, Aborted.
0x000000334ca32905 in raise () from /lib64/libc.so.6
(gdb) bt
#0  0x000000334ca32905 in raise () from /lib64/libc.so.6
#1  0x000000334ca340e5 in abort () from /lib64/libc.so.6
#2  0x000000334ca2b9be in __assert_fail_base () from /lib64/libc.so.6
#3  0x000000334ca2ba80 in __assert_fail () from /lib64/libc.so.6
#4  0x00007ffff260267e in afr_build_parent_loc (parent=0x67d318, child=0x7fffe4001128) at afr-dir-write.c:57
#5  0x00007ffff2605c30 in afr_mkdir (frame=0x7ffff69be270, this=0x65ae70, loc=0x7fffe4001128, mode=16877, params=0x680c20) at afr-dir-write.c:758
#6  0x00007ffff23ab4af in dht_selfheal_dir_mkdir (frame=0x7ffff69bd454, loc=0x7fffe4001128, layout=0x682530, force=0) at dht-selfheal.c:434
#7  0x00007ffff23ac8f0 in dht_selfheal_directory (frame=0x7ffff69bd454, dir_cbk=0x7ffff23b56ff <dht_lookup_selfheal_cbk>, loc=0x7fffe4001128, layout=0x682530) at dht-selfheal.c:855
#8  0x00007ffff23b7a1a in dht_lookup_dir_cbk (frame=0x7ffff69bd454, cookie=0x7ffff69bdfc0, this=0x65ce50, op_ret=0, op_errno=0, inode=0x7ffff00ff04c, stbuf=0x684048, xattr=0x678d80, postparent=0x6840b8) at dht-common.c:504
#9  0x00007ffff2652b9f in afr_lookup_done (frame=0x7ffff69bdfc0, this=0x65c170) at afr-common.c:1741
#10 0x00007ffff265329a in afr_lookup_cbk (frame=0x7ffff69bdfc0, cookie=0x2, this=0x65c170, op_ret=0, op_errno=0, inode=0x7ffff00ff04c, buf=0x7fffffffd8e0, xattr=0x67a790, postparent=0x7fffffffd870) at afr-common.c:1904
#11 0x00007ffff2893138 in client3_1_lookup_cbk (req=0x7ffff044a908, iov=0x7ffff044a948, count=1, myframe=0x7ffff69be1c4) at client3_1-fops.c:2292
#12 0x00007ffff7b676a4 in rpc_clnt_handle_reply (clnt=0x66f490, pollin=0x677700) at rpc-clnt.c:790
#13 0x00007ffff7b67a2b in rpc_clnt_notify (trans=0x66f810, mydata=0x66f4c0, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x677700) at rpc-clnt.c:909
#14 0x00007ffff7b63c08 in rpc_transport_notify (this=0x66f810, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x677700) at rpc-transport.c:498
#15 0x00007ffff36d323d in socket_event_poll_in (this=0x66f810) at socket.c:1675
#16 0x00007ffff36d37c1 in socket_event_handler (fd=17, idx=6, data=0x66f810, poll_in=1, poll_out=0, poll_err=0) at socket.c:1790
#17 0x00007ffff7dbc76c in event_dispatch_epoll_handler (event_pool=0x649b80, events=0x64efe0, i=2) at event.c:794
#18 0x00007ffff7dbc98f in event_dispatch_epoll (event_pool=0x649b80) at event.c:856
#19 0x00007ffff7dbcd1a in event_dispatch (event_pool=0x649b80) at event.c:956
#20 0x0000000000407c2e in main (argc=5, argv=0x7fffffffdfd8) at glusterfsd.c:1601




Expected results:
There should be no crash, and ls and find should show the actual contents.

Additional info:

Trace entries from the client log:


[2012-02-06 07:45:25.694868] W [client3_1-fops.c:644:client3_1_statfs_cbk] 0-hosdu-client-0: remote operation failed: No such file or directory
[2012-02-06 07:45:25.695000] W [client3_1-fops.c:644:client3_1_statfs_cbk] 0-hosdu-client-2: remote operation failed: No such file or directory
[2012-02-06 07:45:25.695028] W [client3_1-fops.c:644:client3_1_statfs_cbk] 0-hosdu-client-1: remote operation failed: No such file or directory
[2012-02-06 07:45:25.695040] W [dht-diskusage.c:53:dht_du_info_cbk] 0-hosdu-dht: failed to get disk info from hosdu-replicate-0
[2012-02-06 07:45:25.696093] I [afr-common.c:1825:afr_set_root_inode_on_first_lookup] 0-hosdu-replicate-1: added root inode
[2012-02-06 07:45:25.696848] I [dht-layout.c:600:dht_layout_normalize] 0-hosdu-dht: found anomalies in /. holes=1 overlaps=0
pending frames:
frame : type(1) op(NULL)
frame : type(1) op(NULL)
frame : type(1) op(NULL)

patchset: git://git.gluster.com/glusterfs.git
signal received: 6
time of crash: 2012-02-06 07:45:25
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.3.0qa21
/lib64/libc.so.6[0x334ca32980]
/lib64/libc.so.6(gsignal+0x35)[0x334ca32905]
/lib64/libc.so.6(abort+0x175)[0x334ca340e5]
/lib64/libc.so.6[0x334ca2b9be]
/lib64/libc.so.6(__assert_perror_fail+0x0)[0x334ca2ba80]
/usr/local/lib/glusterfs/3.3.0qa21/xlator/cluster/replicate.so(afr_build_parent_loc+0x44)[0x7f12ac77167e]
/usr/local/lib/glusterfs/3.3.0qa21/xlator/cluster/replicate.so(afr_mkdir+0x421)[0x7f12ac774c30]
/usr/local/lib/glusterfs/3.3.0qa21/xlator/cluster/distribute.so(dht_selfheal_dir_mkdir+0x583)[0x7f12ac51a4af]
/usr/local/lib/glusterfs/3.3.0qa21/xlator/cluster/distribute.so(dht_selfheal_directory+0x25e)[0x7f12ac51b8f0]
/usr/local/lib/glusterfs/3.3.0qa21/xlator/cluster/distribute.so(dht_lookup_dir_cbk+0x9f3)[0x7f12ac526a1a]
/usr/local/lib/glusterfs/3.3.0qa21/xlator/cluster/replicate.so(+0x61b9f)[0x7f12ac7c1b9f]
/usr/local/lib/glusterfs/3.3.0qa21/xlator/cluster/replicate.so(afr_lookup_cbk+0xed)[0x7f12ac7c229a]
/usr/local/lib/glusterfs/3.3.0qa21/xlator/protocol/client.so(client3_1_lookup_cbk+0x6ff)[0x7f12aca02138]
/usr/local/lib/libgfrpc.so.0(rpc_clnt_handle_reply+0x211)[0x7f12b1cd66a4]
/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x2bd)[0x7f12b1cd6a2b]
/usr/local/lib/libgfrpc.so.0(rpc_transport_notify+0x130)[0x7f12b1cd2c08]
/usr/local/lib/glusterfs/3.3.0qa21/rpc-transport/socket.so(socket_event_poll_in+0x54)[0x7f12ad84223d]
/usr/local/lib/glusterfs/3.3.0qa21/rpc-transport/socket.so(socket_event_handler+0x21d)[0x7f12ad8427c1]
/usr/local/lib/libglusterfs.so.0(+0x4b76c)[0x7f12b1f2b76c]
/usr/local/lib/libglusterfs.so.0(+0x4b98f)[0x7f12b1f2b98f]
/usr/local/lib/libglusterfs.so.0(event_dispatch+0x88)[0x7f12b1f2bd1a]
/usr/local/sbin/glusterfs(main+0x238)[0x407c2e]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x334ca1ecdd]
/usr/local/sbin/glusterfs[0x403e99]


I have attached the client logfile and have archived other logs.

Comment 1 Anand Avati 2012-02-29 10:47:26 UTC
CHANGE: http://review.gluster.com/2825 (libglusterfs: Handle loc_copy for nameless loc) merged in master by Vijay Bellur (vijay)
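
The change title suggests that copying a loc that has no path or name (a "nameless" loc, which as far as I understand is identified only by its gfid) was not handled properly. The following is only a minimal sketch of that idea, not the merged patch: struct sketch_loc, sketch_loc_copy and sketch_loc_wipe are hypothetical stand-ins that carry just the fields needed for illustration and do not reproduce the real loc_t or loc_copy.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified location object; NOT the real GlusterFS loc_t. */
struct sketch_loc {
    char *path;              /* full path, or NULL for a nameless loc */
    const char *name;        /* basename inside path, or NULL */
    unsigned char gfid[16];  /* identifier that is valid even without a path */
};

/* Copy src into dst. Returns 0 on success, -1 on error.
 * A NULL path is accepted as valid input (nameless loc), not as a failure. */
int sketch_loc_copy(struct sketch_loc *dst, const struct sketch_loc *src)
{
    if (dst == NULL || src == NULL)
        return -1;

    memset(dst, 0, sizeof(*dst));
    memcpy(dst->gfid, src->gfid, sizeof(dst->gfid));

    if (src->path != NULL) {
        size_t len = strlen(src->path) + 1;

        dst->path = malloc(len);
        if (dst->path == NULL)
            return -1;
        memcpy(dst->path, src->path, len);

        /* Recompute name so that it points into dst's own buffer. */
        if (src->name != NULL) {
            char *slash = strrchr(dst->path, '/');
            dst->name = (slash != NULL) ? slash + 1 : dst->path;
        }
    }
    return 0;
}

/* Release the memory owned by a copied loc. */
void sketch_loc_wipe(struct sketch_loc *loc)
{
    free(loc->path);
    loc->path = NULL;
    loc->name = NULL;
}
```

The point of the sketch is that the path and name fields are touched only when the source actually has them, so a gfid-only loc copies cleanly instead of being reported as an error.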

Comment 2 Anand Avati 2012-03-01 16:44:50 UTC
CHANGE: http://review.gluster.com/2826 (cluster/afr: Handle errors in build_parent_loc) merged in master by Vijay Bellur (vijay)
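
The backtrace in the description shows the abort coming from an assertion inside afr_build_parent_loc, called from afr_mkdir. The report does not show which condition that assertion checked, so the sketch below simply assumes a child location without a usable path. It illustrates the general pattern the change title describes (return an error and let the caller fail the operation, rather than aborting the client) and is not the merged patch: struct mini_loc, sketch_build_parent_loc and sketch_mkdir are hypothetical names.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified location; NOT the real GlusterFS loc_t. */
struct mini_loc {
    char *path;   /* NULL when the location has no usable path */
};

/* Derive the parent directory of child into parent.
 * Returns 0 on success or a negative errno value; it never aborts. */
int sketch_build_parent_loc(struct mini_loc *parent,
                            const struct mini_loc *child)
{
    const char *slash;
    size_t len;

    if (parent == NULL)
        return -EINVAL;
    parent->path = NULL;

    /* An assert() here would bring down the whole client on bad input. */
    if (child == NULL || child->path == NULL)
        return -EINVAL;

    slash = strrchr(child->path, '/');
    if (slash == NULL)
        return -EINVAL;

    /* The parent of "/name" is "/"; otherwise cut at the last slash. */
    len = (slash == child->path) ? 1 : (size_t)(slash - child->path);
    parent->path = malloc(len + 1);
    if (parent->path == NULL)
        return -ENOMEM;
    memcpy(parent->path, child->path, len);
    parent->path[len] = '\0';
    return 0;
}

/* Hypothetical caller: propagate the failure instead of crashing. */
int sketch_mkdir(const struct mini_loc *loc)
{
    struct mini_loc parent;
    int ret = sketch_build_parent_loc(&parent, loc);

    if (ret < 0)
        return ret;   /* the caller would unwind with this errno */

    /* The directory creation itself is omitted in this sketch. */
    free(parent.path);
    return 0;
}
```

With this shape, a location that cannot yield a parent makes the single operation fail with an errno, and the mount process keeps running.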

Comment 3 M S Vishwanath Bhat 2012-05-11 10:50:39 UTC
Did not find the glusterfs client crashing after remount.