Bug 762063 (GLUSTER-331)

Summary: 2 servers replicating re-exporting samba share both crash within minutes
Product: [Community] GlusterFS
Reporter: Mark <mark>
Component: locks
Assignee: Pavan Vilas Sondur <pavan>
Status: CLOSED WORKSFORME
Severity: high
Priority: urgent
Version: 2.0.7
CC: anush, gluster-bugs, hauser, vijay
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Regression: RTNR
Attachments:
  Node1 server and client logs
  Node2 server and client logs

Description Mark 2009-10-23 13:11:52 UTC
glusterfsd.log (similar on both machines)
==============

pending frames:
frame : type(1) op(LK)

patchset: v2.0.7
signal received: 11
time of crash: 2009-10-23 13:11:07
configuration details:
argp 1
backtrace 1
db.h 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 2.0.7
/lib64/libc.so.6[0x3350830280]
/usr/lib64/glusterfs/2.0.7/xlator/features/locks.so(__delete_lock+0x7)[0x2b5c2bd2fd07]
/usr/lib64/glusterfs/2.0.7/xlator/features/locks.so[0x2b5c2bd2ff69]
/usr/lib64/glusterfs/2.0.7/xlator/features/locks.so(pl_setlk+0x79)[0x2b5c2bd307c9]
/usr/lib64/glusterfs/2.0.7/xlator/features/locks.so(pl_lk+0x15c)[0x2b5c2bd30dec]
/usr/lib64/glusterfs/2.0.7/xlator/protocol/server.so(server_lk+0x1eb)[0x2b5c2bf43b9b]
/usr/lib64/glusterfs/2.0.7/xlator/protocol/server.so(protocol_server_pollin+0x90)[0x2b5c2bf3ccd0]
/usr/lib64/glusterfs/2.0.7/xlator/protocol/server.so(notify+0xcb)[0x2b5c2bf3cdab]
/usr/lib64/glusterfs/2.0.7/transport/socket.so(socket_event_handler+0xd3)[0x2aaaaaaafdf3]
/usr/lib64/libglusterfs.so.0[0x2b5c2b093755]
/usr/sbin/glusterfsd(main+0x9e8)[0x403fa8]
/lib64/libc.so.6(__libc_start_main+0xf4)[0x335081d974]
/usr/sbin/glusterfsd[0x4025c9]
---------

glusterfsd.vol (server)
==============

volume posix
  type storage/posix
  option directory /mnt/sdb1
end-volume

volume brick
  type features/locks
  subvolumes posix
end-volume

volume server
  type protocol/server
  option transport-type tcp
  option auth.addr.brick.allow *
  subvolumes brick
end-volume


glusterfs.vol (client)
=============

volume client1
  type protocol/client
  option transport-type tcp
  option remote-host 192.168.25.31
  option remote-subvolume brick
end-volume

volume client2
  type protocol/client
  option transport-type tcp
  option remote-host 192.168.25.32
  option remote-subvolume brick
end-volume

volume replicate
  type cluster/replicate
  subvolumes client1 client2
end-volume
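For reference, the setup described by these volfiles can be brought up roughly as follows. This is a sketch, not taken from the report: the volfile paths, mount point, and Samba share name are assumptions, and the helper function below is hypothetical.

```shell
# smb_share_stanza: print a minimal smb.conf stanza for re-exporting a
# GlusterFS mount point over Samba. Share name and path are assumptions.
smb_share_stanza() {
  name=$1
  path=$2
  printf '[%s]\n   path = %s\n   read only = no\n' "$name" "$path"
}

# Typical sequence on each node (requires glusterfs 2.0.x installed):
#   glusterfsd -f /etc/glusterfs/glusterfsd.vol            # start the brick server
#   glusterfs -f /etc/glusterfs/glusterfs.vol /mnt/glusterfs   # mount the replicated volume
#   smb_share_stanza gluster /mnt/glusterfs >> /etc/samba/smb.conf
```

The stanza generator keeps the smb.conf edit explicit; whether the share is writable (`read only = no`) matters here, since the crash is in the locks translator and write access from Samba clients is what drives the LK traffic.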

Comment 1 Pavan Vilas Sondur 2009-11-13 03:42:52 UTC
Mark, can you send us the log files and the backtrace of this crash?

Comment 2 Mark 2009-11-17 19:57:48 UTC
Created attachment 103 [details]
Node1 server and client logs

Comment 3 Mark 2009-11-17 20:00:19 UTC
Created attachment 104 [details]
Node2 server and client logs

Comment 4 Mark 2009-11-17 20:00:53 UTC
(In reply to comment #1)
> Mark, can you send us the log files and the backtrace of this crash?

how do I get the backtrace?

Comment 5 Pavan Vilas Sondur 2009-11-24 08:56:10 UTC
The backtrace can be obtained from the core file using gdb:
gdb -c <core-file> <glusterfs-binary>
and then run the 'bt' command at the gdb prompt.
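The same steps can be run non-interactively; a small sketch (the function name is made up for illustration, and the binary path is the default install location for 2.0.x):

```shell
# glusterfs_bt: dump a backtrace from a glusterfsd core file without an
# interactive gdb session.
glusterfs_bt() {
  core=$1
  bin=${2:-/usr/sbin/glusterfsd}
  if [ ! -f "$core" ]; then
    echo "core file not found: $core" >&2
    return 1
  fi
  # --batch runs the -ex commands and exits; 'bt full' also prints locals.
  gdb --batch -ex "bt full" -ex "thread apply all bt" "$bin" -c "$core"
}

# Example: glusterfs_bt /core.12345
```

If the frames come out as '??' (as in Comment 6 below, when both edits are read together), the symbols are not resolving; running gdb against the exact glusterfsd binary that produced the core, with its debug symbols installed, usually fixes that.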

I tried reproducing this issue and was unable to crash glusterfs when its mount point is re-exported as a Samba share. I saw your client logs and there are plenty of messages indicating a possible 'split brain' (files modified in the backend directories directly, not through the mount point). Was the backend accessed and were files modified there, resulting in a split brain?
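One way to check for split brain is to inspect the replicate (AFR) changelog extended attributes directly on the backend bricks. A sketch, assuming the brick path from the server volfile above and the `trusted.afr.*` xattr naming used by the replicate translator (the helper name and example path are illustrative):

```shell
# check_afr: print the AFR changelog xattrs for a file as stored on a
# backend brick. Nonzero pending counters for the same file on both
# bricks suggest a split brain. Requires getfattr from the 'attr' package.
check_afr() {
  getfattr -d -m 'trusted.afr' -e hex "$1"
}

# Example: check_afr /mnt/sdb1/path/to/suspect/file
```

Running this on the same file on both nodes' bricks and comparing the counters shows whether each replica is accusing the other of having pending writes.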

Comment 6 Mark 2009-12-02 16:47:05 UTC
(In reply to comment #5)
> The backtrace can be obtained from the core file using gdb:
> gdb -c <core-file> <glusterfs-binary>
> and then run the 'bt' command at the gdb prompt.
>
> I tried reproducing this issue and was unable to crash glusterfs when its
> mount point is re-exported as a Samba share. I saw your client logs and
> there are plenty of messages indicating a possible 'split brain' (files
> modified in the backend directories directly, not through the mount point).
> Was the backend accessed and were files modified there, resulting in a
> split brain?

Node1 Back Trace:

(gdb) bt
#0  0x00002b5c2bd2fd07 in ?? ()
#1  0x00002b5c2bd2ff69 in ?? ()
#2  0x0000000000000098 in ?? ()
#3  0x000000104ae19d5a in ?? ()
#4  0x0000000011696fb0 in ?? ()
#5  0x000000001165b550 in ?? ()
#6  0x0000000000000000 in ?? ()

Node2 Back Trace:

(gdb) bt
#0  0x00002b9f7e775d07 in ?? ()
#1  0x00002b9f7e775f69 in ?? ()
#2  0x0000000000000098 in ?? ()
#3  0x000000104ae15adc in ?? ()
#4  0x0000000012f43e20 in ?? ()
#5  0x0000000012fc0900 in ?? ()
#6  0x0000000000000000 in ?? ()

Comment 7 Pavan Vilas Sondur 2010-02-02 03:57:37 UTC
The logs show a slew of 'split brain' messages. Were any files modified directly on the backend? I am unable to reproduce this crash; in fact, glusterfs mounts re-exported over Samba have been well tested. Can we have remote access to the core if possible, since the backtrace provided is not of much use?

Comment 8 Pavan Vilas Sondur 2010-02-16 08:43:42 UTC
Closing this bug due to lack of sufficient data. Please re-open it if the issue surfaces again.