Bug 762601 (GLUSTER-869) - replicate crash in selfheal
Summary: replicate crash in selfheal
Keywords:
Status: CLOSED DUPLICATE of bug 762547
Alias: GLUSTER-869
Product: GlusterFS
Classification: Community
Component: replicate
Version: mainline
Hardware: All
OS: Windows
urgent
high
Target Milestone: ---
Assignee: Vikas Gorur
QA Contact:
URL:
Whiteboard:
: GLUSTER-953 (view as bug list)
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2010-04-29 01:22 UTC by Harshavardhana
Modified: 2015-12-01 16:45 UTC (History)
6 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:


Attachments (Terms of Use)

Description Harshavardhana 2010-04-28 22:26:36 UTC
Loaded symbols for /lib/libnss_files.so.2
Reading symbols from /lib/libgcc_s.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/libgcc_s.so.1
Core was generated by `/usr/local/sbin/glusterfs --log-level=NORMAL --volume-name=iocache --volfile=/u'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007fedee0a6005 in afr_sh_data_fix (frame=0x59316e8, this=0x616d00) at afr-self-heal-data.c:750
750             orig_local->cont.lookup.xattr = dict_ref (orig_local->cont.lookup.xattrs[sh->source]);
(gdb) p orig_local->cont.lookup.xattrs
$1 = (dict_t **) 0x0
(gdb) bt
#0  0x00007fedee0a6005 in afr_sh_data_fix (frame=0x59316e8, this=0x616d00) at afr-self-heal-data.c:750
#1  0x00007fedee0a6292 in afr_sh_data_fstat_cbk (frame=0x59316e8, cookie=0x0, this=0x616d00, op_ret=<value optimized out>, op_errno=<value optimized out>,
    buf=0x7fffc983cc70) at afr-self-heal-data.c:828
#2  0x00007fedee2de93a in client_fstat_cbk (frame=0x57cf530, hdr=<value optimized out>, hdrlen=<value optimized out>, iobuf=<value optimized out>)
    at client-protocol.c:4344
#3  0x00007fedee2ca4da in protocol_client_pollin (this=0x613f40, trans=0x61f500) at client-protocol.c:6827
#4  0x00007fedee2d9c90 in notify (this=0x0, event=<value optimized out>, data=0x61f500) at client-protocol.c:6946
#5  0x00007fedef471493 in xlator_notify (xl=0x613f40, event=2, data=0x61f500) at xlator.c:924
#6  0x00007fedec801690 in socket_event_handler (fd=<value optimized out>, idx=32, data=0x61f500, poll_in=1, poll_out=0, poll_err=0) at socket.c:831
#7  0x00007fedef48bdbf in event_dispatch_epoll_handler (event_pool=0x6094f0) at event.c:804
#8  event_dispatch_epoll (event_pool=0x6094f0) at event.c:867
#9  0x000000000040451d in main (argc=5, argv=0x7fffc983dbe8) at glusterfsd.c:1425
(gdb) p orig_local->cont.lookup
$2 = {inode = 0x0, buf = {st_dev = 0, st_ino = 0, st_nlink = 0, st_mode = 0, st_uid = 0, st_gid = 0, __pad0 = 0, st_rdev = 0, st_size = 121047,
    st_blksize = 0, st_blocks = 0, st_atim = {tv_sec = 0, tv_nsec = 0}, st_mtim = {tv_sec = 0, tv_nsec = 0}, st_ctim = {tv_sec = 0, tv_nsec = 0},
    __unused = {0, 0, 0}}, read_child_buf = {st_dev = 0, st_ino = 0, st_nlink = 0, st_mode = 0, st_uid = 0, st_gid = 0, __pad0 = 0, st_rdev = 0,
    st_size = 0, st_blksize = 0, st_blocks = 0, st_atim = {tv_sec = 0, tv_nsec = 0}, st_mtim = {tv_sec = 0, tv_nsec = 0}, st_ctim = {tv_sec = 0,
      tv_nsec = 0}, __unused = {0, 0, 0}}, postparent = {st_dev = 0, st_ino = 0, st_nlink = 0, st_mode = 0, st_uid = 0, st_gid = 0, __pad0 = 0,
    st_rdev = 0, st_size = 0, st_blksize = 0, st_blocks = 0, st_atim = {tv_sec = 0, tv_nsec = 0}, st_mtim = {tv_sec = 0, tv_nsec = 0}, st_ctim = {
      tv_sec = 0, tv_nsec = 0}, __unused = {0, 0, 0}}, ino = 0, gen = 0, parent_ino = 0, xattr = 0x0, xattrs = 0x0, is_revalidate = _gf_false}
(gdb)

Comment 1 Harshavardhana 2010-04-29 01:22:29 UTC
gluster006 was rebooted after which we see this crash


[2010-04-29 01:15:47] E [saved-frames.c:165:saved_frames_unwind] gluster006-3: forced unwinding frame type(1) op(LOOKUP)
[2010-04-29 01:15:47] E [saved-frames.c:165:saved_frames_unwind] gluster006-3: forced unwinding frame type(1) op(LOOKUP)
[2010-04-29 01:15:47] E [saved-frames.c:165:saved_frames_unwind] gluster006-3: forced unwinding frame type(2) op(PING)
[2010-04-29 01:15:47] N [client-protocol.c:6994:notify] gluster006-3: disconnected
[2010-04-29 01:15:47] E [saved-frames.c:165:saved_frames_unwind] gluster006-1: forced unwinding frame type(1) op(LOOKUP)
[2010-04-29 01:15:47] E [saved-frames.c:165:saved_frames_unwind] gluster006-1: forced unwinding frame type(1) op(LOOKUP)
[2010-04-29 01:15:47] E [saved-frames.c:165:saved_frames_unwind] gluster006-1: forced unwinding frame type(1) op(LOOKUP)
[2010-04-29 01:15:47] E [saved-frames.c:165:saved_frames_unwind] gluster006-1: forced unwinding frame type(2) op(PING)
[2010-04-29 01:15:47] N [client-protocol.c:6994:notify] gluster006-1: disconnected
[2010-04-29 01:15:54] E [client-protocol.c:415:client_ping_timer_expired] gluster006-3: Server 192.168.101.56:10002 has not responded in the last 42 seconds, disconnecting.
[2010-04-29 01:15:59] E [socket.c:762:socket_connect_finish] gluster006-3: connection to 192.168.101.56:10002 failed (No route to host)
[2010-04-29 01:16:08] E [socket.c:762:socket_connect_finish] gluster006-1: connection to 192.168.101.56:10002 failed (Connection timed out)
[2010-04-29 01:16:08] E [socket.c:762:socket_connect_finish] gluster006-1: connection to 192.168.101.56:10002 failed (Connection timed out)
[2010-04-29 01:16:18] E [socket.c:762:socket_connect_finish] gluster006-3: connection to 192.168.101.56:10002 failed (Connection timed out)
[2010-04-29 01:16:36] E [client-protocol.c:415:client_ping_timer_expired] gluster006-2: Server 192.168.101.56:10002 has not responded in the last 42 seconds, disconnecting.
[2010-04-29 01:16:36] E [saved-frames.c:165:saved_frames_unwind] gluster006-2: forced unwinding frame type(1) op(LOOKUP)
[2010-04-29 01:16:36] E [saved-frames.c:165:saved_frames_unwind] gluster006-2: forced unwinding frame type(1) op(LOOKUP)
[2010-04-29 01:16:36] E [saved-frames.c:165:saved_frames_unwind] gluster006-2: forced unwinding frame type(1) op(STAT)
[2010-04-29 01:16:36] E [saved-frames.c:165:saved_frames_unwind] gluster006-2: forced unwinding frame type(2) op(PING)
[2010-04-29 01:16:36] N [client-protocol.c:6994:notify] gluster006-2: disconnected
[2010-04-29 01:16:40] N [client-protocol.c:6246:client_setvolume_cbk] gluster006-2: Connected to 192.168.101.56:10002, attached to remote volume 'brick2'.
[2010-04-29 01:16:41] N [client-protocol.c:6246:client_setvolume_cbk] gluster006-2: Connected to 192.168.101.56:10002, attached to remote volume 'brick2'.
[2010-04-29 01:16:41] N [client-protocol.c:6246:client_setvolume_cbk] gluster006-3: Connected to 192.168.101.56:10002, attached to remote volume 'brick3'.
[2010-04-29 01:16:41] E [afr-self-heal-common.c:1214:sh_missing_entries_create] mirror-8: no missing files - /working/archive2.log. proceeding to metadata check
pending frames:
frame : type(1) op(LOOKUP)

patchset: v3.0.4avail1
signal received: 11
time of crash: 2010-04-29 01:16:41
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.0.4avail1
/lib/libc.so.6[0x7fedeed19290]
/usr/local/lib/glusterfs/3.0.4avail1/xlator/cluster/replicate.so(afr_sh_data_fix+0x1d5)[0x7fedee0a6005]
/usr/local/lib/glusterfs/3.0.4avail1/xlator/cluster/replicate.so(afr_sh_data_fstat_cbk+0xa2)[0x7fedee0a6292]
/usr/local/lib/glusterfs/3.0.4avail1/xlator/protocol/client.so(client_fstat_cbk+0x1aa)[0x7fedee2de93a]
/usr/local/lib/glusterfs/3.0.4avail1/xlator/protocol/client.so(protocol_client_pollin+0xca)[0x7fedee2ca4da]
/usr/local/lib/glusterfs/3.0.4avail1/xlator/protocol/client.so(notify+0xe0)[0x7fedee2d9c90]
/usr/local/lib/libglusterfs.so.0(xlator_notify+0x43)[0x7fedef471493]
/usr/local/lib/glusterfs/3.0.4avail1/transport/socket.so(socket_event_handler+0xe0)[0x7fedec801690]
/usr/local/lib/libglusterfs.so.0[0x7fedef48bdbf]
/usr/local/sbin/glusterfs(main+0x8cd)[0x40451d]
/lib/libc.so.6(__libc_start_main+0xe6)[0x7fedeed055c6]
/usr/local/sbin/glusterfs[0x402a59]

Comment 2 Amar Tumballi 2010-05-04 08:07:00 UTC
This issue came up with the patch for bug 762547. Currently this is fixed by reverting the patch.

*** This bug has been marked as a duplicate of bug 815 ***

Comment 3 Shehjar Tikoo 2010-05-31 04:12:14 UTC
*** Bug 953 has been marked as a duplicate of this bug. ***

Comment 4 Raghavendra Bhat 2010-06-15 06:25:37 UTC
Bug 815 which is the duplicate if this bug has been checked and moved to verified status.


Note You need to log in before you can comment on or make changes to this bug.