| Summary: | replicate crash in selfheal | ||
|---|---|---|---|
| Product: | [Community] GlusterFS | Reporter: | Harshavardhana <fharshav> |
| Component: | replicate | Assignee: | Vikas Gorur <vikas> |
| Status: | CLOSED DUPLICATE | QA Contact: | |
| Severity: | high | Docs Contact: | |
| Priority: | urgent | ||
| Version: | mainline | CC: | amarts, chida, cww, gluster-bugs, rabhat, shehjart |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | All | ||
| OS: | Windows | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | Bug Fix | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | Type: | --- | |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
gluster006 was rebooted after which we see this crash [2010-04-29 01:15:47] E [saved-frames.c:165:saved_frames_unwind] gluster006-3: forced unwinding frame type(1) op(LOOKUP) [2010-04-29 01:15:47] E [saved-frames.c:165:saved_frames_unwind] gluster006-3: forced unwinding frame type(1) op(LOOKUP) [2010-04-29 01:15:47] E [saved-frames.c:165:saved_frames_unwind] gluster006-3: forced unwinding frame type(2) op(PING) [2010-04-29 01:15:47] N [client-protocol.c:6994:notify] gluster006-3: disconnected [2010-04-29 01:15:47] E [saved-frames.c:165:saved_frames_unwind] gluster006-1: forced unwinding frame type(1) op(LOOKUP) [2010-04-29 01:15:47] E [saved-frames.c:165:saved_frames_unwind] gluster006-1: forced unwinding frame type(1) op(LOOKUP) [2010-04-29 01:15:47] E [saved-frames.c:165:saved_frames_unwind] gluster006-1: forced unwinding frame type(1) op(LOOKUP) [2010-04-29 01:15:47] E [saved-frames.c:165:saved_frames_unwind] gluster006-1: forced unwinding frame type(2) op(PING) [2010-04-29 01:15:47] N [client-protocol.c:6994:notify] gluster006-1: disconnected [2010-04-29 01:15:54] E [client-protocol.c:415:client_ping_timer_expired] gluster006-3: Server 192.168.101.56:10002 has not responded in the last 42 seconds, disconnecting. [2010-04-29 01:15:59] E [socket.c:762:socket_connect_finish] gluster006-3: connection to 192.168.101.56:10002 failed (No route to host) [2010-04-29 01:16:08] E [socket.c:762:socket_connect_finish] gluster006-1: connection to 192.168.101.56:10002 failed (Connection timed out) [2010-04-29 01:16:08] E [socket.c:762:socket_connect_finish] gluster006-1: connection to 192.168.101.56:10002 failed (Connection timed out) [2010-04-29 01:16:18] E [socket.c:762:socket_connect_finish] gluster006-3: connection to 192.168.101.56:10002 failed (Connection timed out) [2010-04-29 01:16:36] E [client-protocol.c:415:client_ping_timer_expired] gluster006-2: Server 192.168.101.56:10002 has not responded in the last 42 seconds, disconnecting. [2010-04-29 01:16:36] E [saved-frames.c:165:saved_frames_unwind] gluster006-2: forced unwinding frame type(1) op(LOOKUP) [2010-04-29 01:16:36] E [saved-frames.c:165:saved_frames_unwind] gluster006-2: forced unwinding frame type(1) op(LOOKUP) [2010-04-29 01:16:36] E [saved-frames.c:165:saved_frames_unwind] gluster006-2: forced unwinding frame type(1) op(STAT) [2010-04-29 01:16:36] E [saved-frames.c:165:saved_frames_unwind] gluster006-2: forced unwinding frame type(2) op(PING) [2010-04-29 01:16:36] N [client-protocol.c:6994:notify] gluster006-2: disconnected [2010-04-29 01:16:40] N [client-protocol.c:6246:client_setvolume_cbk] gluster006-2: Connected to 192.168.101.56:10002, attached to remote volume 'brick2'. [2010-04-29 01:16:41] N [client-protocol.c:6246:client_setvolume_cbk] gluster006-2: Connected to 192.168.101.56:10002, attached to remote volume 'brick2'. [2010-04-29 01:16:41] N [client-protocol.c:6246:client_setvolume_cbk] gluster006-3: Connected to 192.168.101.56:10002, attached to remote volume 'brick3'. [2010-04-29 01:16:41] E [afr-self-heal-common.c:1214:sh_missing_entries_create] mirror-8: no missing files - /working/archive2.log. proceeding to metadata check pending frames: frame : type(1) op(LOOKUP) patchset: v3.0.4avail1 signal received: 11 time of crash: 2010-04-29 01:16:41 configuration details: argp 1 backtrace 1 dlfcn 1 fdatasync 1 libpthread 1 llistxattr 1 setfsid 1 spinlock 1 epoll.h 1 xattr.h 1 st_atim.tv_nsec 1 package-string: glusterfs 3.0.4avail1 /lib/libc.so.6[0x7fedeed19290] /usr/local/lib/glusterfs/3.0.4avail1/xlator/cluster/replicate.so(afr_sh_data_fix+0x1d5)[0x7fedee0a6005] /usr/local/lib/glusterfs/3.0.4avail1/xlator/cluster/replicate.so(afr_sh_data_fstat_cbk+0xa2)[0x7fedee0a6292] /usr/local/lib/glusterfs/3.0.4avail1/xlator/protocol/client.so(client_fstat_cbk+0x1aa)[0x7fedee2de93a] /usr/local/lib/glusterfs/3.0.4avail1/xlator/protocol/client.so(protocol_client_pollin+0xca)[0x7fedee2ca4da] /usr/local/lib/glusterfs/3.0.4avail1/xlator/protocol/client.so(notify+0xe0)[0x7fedee2d9c90] /usr/local/lib/libglusterfs.so.0(xlator_notify+0x43)[0x7fedef471493] /usr/local/lib/glusterfs/3.0.4avail1/transport/socket.so(socket_event_handler+0xe0)[0x7fedec801690] /usr/local/lib/libglusterfs.so.0[0x7fedef48bdbf] /usr/local/sbin/glusterfs(main+0x8cd)[0x40451d] /lib/libc.so.6(__libc_start_main+0xe6)[0x7fedeed055c6] /usr/local/sbin/glusterfs[0x402a59] This issue came up with the patch for bug 762547. Currently this is fixed by reverting the patch. *** This bug has been marked as a duplicate of bug 815 *** *** Bug 953 has been marked as a duplicate of this bug. *** |
Loaded symbols for /lib/libnss_files.so.2 Reading symbols from /lib/libgcc_s.so.1...(no debugging symbols found)...done. Loaded symbols for /lib/libgcc_s.so.1 Core was generated by `/usr/local/sbin/glusterfs --log-level=NORMAL --volume-name=iocache --volfile=/u'. Program terminated with signal 11, Segmentation fault. #0 0x00007fedee0a6005 in afr_sh_data_fix (frame=0x59316e8, this=0x616d00) at afr-self-heal-data.c:750 750 orig_local->cont.lookup.xattr = dict_ref (orig_local->cont.lookup.xattrs[sh->source]); (gdb) p orig_local->cont.lookup.xattrs $1 = (dict_t **) 0x0 (gdb) bt #0 0x00007fedee0a6005 in afr_sh_data_fix (frame=0x59316e8, this=0x616d00) at afr-self-heal-data.c:750 #1 0x00007fedee0a6292 in afr_sh_data_fstat_cbk (frame=0x59316e8, cookie=0x0, this=0x616d00, op_ret=<value optimized out>, op_errno=<value optimized out>, buf=0x7fffc983cc70) at afr-self-heal-data.c:828 #2 0x00007fedee2de93a in client_fstat_cbk (frame=0x57cf530, hdr=<value optimized out>, hdrlen=<value optimized out>, iobuf=<value optimized out>) at client-protocol.c:4344 #3 0x00007fedee2ca4da in protocol_client_pollin (this=0x613f40, trans=0x61f500) at client-protocol.c:6827 #4 0x00007fedee2d9c90 in notify (this=0x0, event=<value optimized out>, data=0x61f500) at client-protocol.c:6946 #5 0x00007fedef471493 in xlator_notify (xl=0x613f40, event=2, data=0x61f500) at xlator.c:924 #6 0x00007fedec801690 in socket_event_handler (fd=<value optimized out>, idx=32, data=0x61f500, poll_in=1, poll_out=0, poll_err=0) at socket.c:831 #7 0x00007fedef48bdbf in event_dispatch_epoll_handler (event_pool=0x6094f0) at event.c:804 #8 event_dispatch_epoll (event_pool=0x6094f0) at event.c:867 #9 0x000000000040451d in main (argc=5, argv=0x7fffc983dbe8) at glusterfsd.c:1425 (gdb) p orig_local->cont.lookup $2 = {inode = 0x0, buf = {st_dev = 0, st_ino = 0, st_nlink = 0, st_mode = 0, st_uid = 0, st_gid = 0, __pad0 = 0, st_rdev = 0, st_size = 121047, st_blksize = 0, st_blocks = 0, st_atim = {tv_sec = 0, tv_nsec = 0}, st_mtim = {tv_sec = 0, tv_nsec = 0}, st_ctim = {tv_sec = 0, tv_nsec = 0}, __unused = {0, 0, 0}}, read_child_buf = {st_dev = 0, st_ino = 0, st_nlink = 0, st_mode = 0, st_uid = 0, st_gid = 0, __pad0 = 0, st_rdev = 0, st_size = 0, st_blksize = 0, st_blocks = 0, st_atim = {tv_sec = 0, tv_nsec = 0}, st_mtim = {tv_sec = 0, tv_nsec = 0}, st_ctim = {tv_sec = 0, tv_nsec = 0}, __unused = {0, 0, 0}}, postparent = {st_dev = 0, st_ino = 0, st_nlink = 0, st_mode = 0, st_uid = 0, st_gid = 0, __pad0 = 0, st_rdev = 0, st_size = 0, st_blksize = 0, st_blocks = 0, st_atim = {tv_sec = 0, tv_nsec = 0}, st_mtim = {tv_sec = 0, tv_nsec = 0}, st_ctim = { tv_sec = 0, tv_nsec = 0}, __unused = {0, 0, 0}}, ino = 0, gen = 0, parent_ino = 0, xattr = 0x0, xattrs = 0x0, is_revalidate = _gf_false} (gdb)