Bug 761773 (GLUSTER-41)

Summary: FreeBSD and Linux servers crash (triggered by moving the write-behind location in the config)
Product: [Community] GlusterFS
Reporter: Basavanagowda Kanur <gowda>
Component: scripts
Assignee: Harshavardhana <fharshav>
Status: CLOSED WONTFIX
Severity: medium
Priority: low
Version: pre-2.0
CC: amarts, cww, gluster-bugs
Hardware: All
OS: FreeBSD
Doc Type: Bug Fix

Description Basavanagowda Kanur 2009-06-23 18:53:02 UTC
[Migrated from savannah BTS] - bug 25208 [https://savannah.nongnu.org/bugs/?25208]
Mon 29 Dec 2008 10:05:38 PM GMT, original submission by swankier <swankier>:

Linux server:

2008-12-29 14:00:32 D [inode.c:455:__inode_create] array/inode: create inode(0)
2008-12-29 14:00:32 D [inode.c:280:__inode_activate] array/inode: activating inode(0), lru=0/1024 active=2 purge=0
2008-12-29 14:00:32 D [server-protocol.c:3288:server_lookup_resume] array: LOOKUP '1/100M'
2008-12-29 14:00:32 D [inode.c:94:__dentry_hash] array/inode: dentry hashed 100M (6)
2008-12-29 14:00:32 D [inode.c:299:__inode_passivate] array/inode: passivating inode(6) lru=1/1024 active=1 purge=0
2008-12-29 14:00:32 D [inode.c:280:__inode_activate] array/inode: activating inode(6), lru=0/1024 active=2 purge=0
2008-12-29 14:00:32 D [server-protocol.c:4157:server_truncate_resume] array: TRUNCATE '/100M (6)'
2008-12-29 14:00:32 E [posix.c:2420:posix_xattrop] nas1-ns: /100M: No such file or directory
pending frames:
frame : type(1) op(TRUNCATE)
frame : type(1) op(TRUNCATE)

Signal received: 11
configuration details:argp 1
backtrace 1
db.h 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
tv_nsec 1
package-string: glusterfs 1.4.0tla
/lib64/libc.so.6[0x397ec301b0]
/usr/local/lib/glusterfs/1.4.0tla/xlator/cluster/unify.so(unify_truncate+0xd8)[0x2ae40b4afd28]
/usr/local/lib/glusterfs/1.4.0tla/xlator/cluster/afr.so(afr_truncate_wind+0x12a)[0x2ae40baf4b1a]
/usr/local/lib/glusterfs/1.4.0tla/xlator/cluster/afr.so(afr_write_pending_pre_op_cbk+0xc6)[0x2ae40baf9b76]
/usr/local/lib/libglusterfs.so.0(default_xattrop_cbk+0x20)[0x2ae40abed230]
/usr/local/lib/glusterfs/1.4.0tla/xlator/protocol/client.so(client_xattrop_cbk+0xe8)[0x2ae40b6c82c8]
/usr/local/lib/glusterfs/1.4.0tla/xlator/protocol/client.so(protocol_client_pollin+0xc1)[0x2ae40b6c2ff1]
/usr/local/lib/glusterfs/1.4.0tla/xlator/protocol/client.so(notify+0x13a)[0x2ae40b6c94ca]
/usr/local/lib/glusterfs/1.4.0tla/transport/socket.so[0x2aaaaaaadd6e]
/usr/local/lib/libglusterfs.so.0[0x2ae40abfea75]
glusterfsd(main+0x8f9)[0x4032f9]
/lib64/libc.so.6(__libc_start_main+0xf4)[0x397ec1d8b4]
glusterfsd[0x402099]
---------
Segmentation fault (core dumped)

===========
FreeBSD server:

2008-12-29 14:00:08 D [server-protocol.c:3414:server_forget] nas2: FORGET '2895920637882553344'
2008-12-29 14:00:08 D [server-protocol.c:4500:server_xattrop_resume] nas2: XATTROP '/ (1)'
2008-12-29 14:00:08 D [server-protocol.c:5712:server_entrylk_resume] nas2: ENTRYLK '/ (1) '
2008-12-29 14:00:08 D [server-protocol.c:4793:server_opendir_resume] nas2: OPENDIR '/ (2892807421867065345)'
2008-12-29 14:00:08 D [server-protocol.c:3288:server_lookup_resume] nas2: LOOKUP '0//'
2008-12-29 14:00:08 D [server-protocol.c:5000:server_readdir] nas2: READDIR 'fd=0; offset=0; size=4096
2008-12-29 14:00:08 D [server-protocol.c:5000:server_readdir] nas2: READDIR 'fd=0; offset=23; size=4096
2008-12-29 14:00:11 D [server-protocol.c:3288:server_lookup_resume] nas2: LOOKUP '0//'
2008-12-29 14:00:11 D [server-protocol.c:4793:server_opendir_resume] nas2: OPENDIR '/ (2892807421867065345)'
2008-12-29 14:00:11 D [server-protocol.c:3288:server_lookup_resume] nas2: LOOKUP '0//'
2008-12-29 14:00:11 D [server-protocol.c:5000:server_readdir] nas2: READDIR 'fd=0; offset=0; size=4096
2008-12-29 14:00:11 D [server-protocol.c:5000:server_readdir] nas2: READDIR 'fd=0; offset=23; size=4096
2008-12-29 14:00:11 D [inode.c:455:__inode_create] nas2/inode: create inode(2885374594414280704)
2008-12-29 14:00:11 D [inode.c:280:__inode_activate] nas2/inode: activating inode(0), lru=1024/2 active=0 purge=671803624
2008-12-29 14:00:11 D [server-protocol.c:3288:server_lookup_resume] nas2: LOOKUP '2891443975909015553//100M'
2008-12-29 14:00:11 D [inode.c:94:__dentry_hash] nas2/inode: dentry hashed 100M (577943598352302083)
2008-12-29 14:00:11 D [inode.c:299:__inode_passivate] nas2/inode: passivating inode(4294967299) lru=1024/1 active=0 purge=671924606
2008-12-29 14:00:11 D [server-protocol.c:5712:server_entrylk_resume] nas2: ENTRYLK '/ (2892753520027500545) '
2008-12-29 14:00:11 D [server-protocol.c:4500:server_xattrop_resume] nas2: XATTROP '/ (1)'
2008-12-29 14:00:11 D [inode.c:280:__inode_activate] nas2/inode: activating inode(3), lru=1024/2 active=0 purge=671803624
pending frames:
frame : type(1) op(UNLINK)

Signal received: 11
configuration details:db.h 1
dlfcn 1
libpthread 1
spinlock 1
extattr.h 1
package-string: glusterfs 1.4.0tla
Segmentation fault (core dumped)
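
The FreeBSD dump above prints no frame addresses, unlike the Linux one. A minimal sketch for getting a backtrace out of the core file with gdb; the glusterfsd path and core file location are assumptions, adjust them for the actual install:

gdb /usr/local/sbin/glusterfsd glusterfsd.core   # open the core against the unstripped binary (paths assumed)
# inside gdb:
#   bt full                  -- full backtrace of the crashing thread
#   thread apply all bt      -- backtraces for every thread
#   info registers           -- register state at the point of the fault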

======
Linux server config (write-behind moved to sit directly above the nas2 volume instead of at the very top of the stack):

[root@nas1 glusterfs]# cat /usr/local/etc/glusterfs/glusterfs-server.vol
volume nas1brick1
type storage/posix
option directory /data/brick1
end-volume

volume nas1brick1p
type features/posix-locks
subvolumes nas1brick1
end-volume

volume nas1brick1t
type performance/io-threads
option thread-count 4 # default is 1
subvolumes nas1brick1p
end-volume

volume nas1brick2
type storage/posix
option directory /data/brick2
end-volume

volume nas1brick2p
type features/posix-locks
subvolumes nas1brick2
end-volume

volume nas1brick2t
type performance/io-threads
option thread-count 4 # default is 1
subvolumes nas1brick2p
end-volume

volume nas1-ns
type storage/posix
option directory /data/ns
end-volume

volume nas1-nsp
type features/posix-locks
subvolumes nas1-ns
end-volume

volume nas1-nst
type performance/io-threads
option thread-count 4 # default is 1
subvolumes nas1-nsp
end-volume

volume nas1
type cluster/unify
subvolumes nas1brick1t nas1brick2t
option namespace nas1-nst
option scheduler rr
end-volume

volume nas2
type protocol/client
option transport-type tcp/client
option remote-host 10.10.1.185
option remote-subvolume nas2
end-volume

volume nas2w
type performance/write-behind
option aggregate-size 2MB
option flush-behind off
subvolumes nas2
end-volume

volume array
type cluster/afr
subvolumes nas2w nas1
option favorite-child nas2
end-volume

volume server
type protocol/server
option transport-type tcp
subvolumes array
option auth.addr.array.allow *
end-volume

========
FreeBSD server config (unchanged from when the setup was not crashing):

[2:04pm] root@nas2 [/usr/local/src/glusterfs]111: cat /usr/local/etc/glusterfs/glusterfs-server.vol
volume brick1
type storage/posix
option directory /storage
end-volume

volume brick1p
type features/locks
subvolumes brick1
end-volume

volume nas2
type performance/io-threads
option thread-count 4 # default is 1
subvolumes brick1p
end-volume

### Add network serving capability to above brick.
volume server
type protocol/server
subvolumes nas2
option transport-type tcp
option auth.addr.nas2.allow *
end-volume
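
Both FreeBSD crashes above happen right after XATTROP requests, and the build banner lists extattr.h. One quick check (my suggestion, not part of the original report) is whether the backend filesystem under /storage accepts extended attributes at all; a minimal sketch using the stock FreeBSD extattr tools, with the brick path taken from the config above:

setextattr user testkey testvalue /storage   # set a user-namespace attribute on the brick root
getextattr user testkey /storage             # read it back
lsextattr user /storage                      # list user-namespace attributes
rmextattr user testkey /storage              # clean up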
--------------------------------------------------------------------------------
Mon 29 Dec 2008 10:07:09 PM GMT, comment #1 by swankier <swankier>:

This is tla version 819.
--------------------------------------------------------------------------------
Mon 29 Dec 2008 10:10:59 PM GMT, comment #2 by swankier <swankier>:

Another example crash on the FreeBSD side:

2008-12-29 14:09:12 D [server-protocol.c:5000:server_readdir] nas2: READDIR 'fd=0; offset=0; size=4096
2008-12-29 14:09:12 D [server-protocol.c:5000:server_readdir] nas2: READDIR 'fd=0; offset=23; size=4096
2008-12-29 14:09:12 D [inode.c:455:__inode_create] nas2/inode: create inode(2885374594414280704)
2008-12-29 14:09:12 D [inode.c:280:__inode_activate] nas2/inode: activating inode(0), lru=1024/2 active=0 purge=671803624
2008-12-29 14:09:12 D [server-protocol.c:3288:server_lookup_resume] nas2: LOOKUP '2891443701031108609//100M'
2008-12-29 14:09:12 D [inode.c:94:__dentry_hash] nas2/inode: dentry hashed 100M (2885699053423689731)
2008-12-29 14:09:12 D [inode.c:299:__inode_passivate] nas2/inode: passivating inode(4294967299) lru=1024/1 active=0 purge=671924606
2008-12-29 14:09:12 D [inode.c:280:__inode_activate] nas2/inode: activating inode(3), lru=1024/2 active=0 purge=671728835
2008-12-29 14:09:12 D [server-protocol.c:5541:server_inodelk_resume] nas2: INODELK '/100M (3)'
2008-12-29 14:09:12 D [common.c:513:pl_setlk] brick1p: Lock (pid=18) 0 - 0 => OK
2008-12-29 14:09:12 D [server-protocol.c:3288:server_lookup_resume] nas2: LOOKUP '2891444250786922497//100M'
2008-12-29 14:09:12 D [server-protocol.c:3688:server_open_resume] nas2: OPEN '/100M (2892754447740436483)'
2008-12-29 14:09:12 D [server-protocol.c:3793:server_readv] nas2: READV 'fd=0; offset=0; size=4096
2008-12-29 14:09:12 D [server-protocol.c:4500:server_xattrop_resume] nas2: XATTROP '/100M (3)'
2008-12-29 14:09:12 D [server-protocol.c:5541:server_inodelk_resume] nas2: INODELK '/100M (3)'
2008-12-29 14:09:12 D [common.c:513:pl_setlk] brick1p: Unlock (pid=18) 0 - 0 => OK
2008-12-29 14:09:12 D [server-protocol.c:4021:server_flush] nas2: FLUSH 'fd=0'
2008-12-29 14:09:12 D [server-protocol.c:3923:server_release] nas2: RELEASE 'fd=0'
2008-12-29 14:09:12 D [inode.c:299:__inode_passivate] nas2/inode: passivating inode(4294967299) lru=1024/1 active=0 purge=671924606
2008-12-29 14:09:12 D [server-protocol.c:5712:server_entrylk_resume] nas2: ENTRYLK '/ (2892753520027500545) '
2008-12-29 14:09:12 D [server-protocol.c:4500:server_xattrop_resume] nas2: XATTROP '/ (1)'
2008-12-29 14:09:12 D [inode.c:280:__inode_activate] nas2/inode: activating inode(3), lru=1024/2 active=0 purge=671803624
pending frames:
frame : type(1) op(UNLINK)

Signal received: 11
configuration details:db.h 1
dlfcn 1
libpthread 1
spinlock 1
extattr.h 1
package-string: glusterfs 1.4.0tla
Segmentation fault (core dumped)

--------------------------------------------------------------------------------

Mon 29 Dec 2008 10:27:02 PM GMT, comment #3 by swankier <swankier>:

This happens when I try to work with a file created with:

dd if=/dev/zero of=100M count=1 bs=100M

The FreeBSD side crashed while the file was being created.

I deleted the file from the AFR; however, the 100M file still exists on the FreeBSD server. If I try to delete the file from the FreeBSD server via gluster, it crashes.

I can delete the 100M file from the backend filesystem directly to get gluster working again. Creating the 100M file again afterwards crashes the server again.
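
For reference, a sketch of the backend cleanup described above, with the brick paths taken from the two configs (these commands run directly against the servers' backend directories, not through the mount; the paths are assumptions based on the configs):

rm /storage/100M                                            # FreeBSD server (nas2): remove the leftover copy from the brick
ls -l /data/brick1/100M /data/brick2/100M /data/ns/100M     # Linux server (nas1): check the unify bricks and namespace for stale copies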

--------------------------------------------------------------------------------
Mon 29 Dec 2008 10:47:56 PM GMT, comment #4 by swankier <swankier>:

This is also occurring with "touch".

--------------------------------------------------------------------------------
Mon 29 Dec 2008 11:28:10 PM GMT, comment #5 by swankier <swankier>:

After starting all the servers without any xlators at all, the crash continued to happen.

I stopped using debug logging on the FreeBSD server and saw the following errors:

2008-12-29 15:19:25 E [server-protocol.c:2431:server_stub_resume] server: OPEN (/me) on nas2 returning error: -1 (2)
2008-12-29 15:19:25 E [server-protocol.c:2533:server_stub_resume] server: XATTROP (/me) on nas2 returning error: -1 (2)
2008-12-29 15:20:07 E [server-protocol.c:2567:server_stub_resume] server: UNLINK (/me) on nas2 returning error: -1 (2)

This was after an attempted "touch me" and "rm me".

After this, whatever had been out of sync was back in sync and the servers behaved normally again.
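
If the number in parentheses in those error lines is the op_errno (an assumption on my part), errno 2 is ENOENT, which matches the earlier posix_xattrop "No such file or directory" error on the Linux side. A quick way to decode such values:

python -c "import os; print(os.strerror(2))"   # prints: No such file or directory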

Comment 1 Amar Tumballi 2009-11-13 16:42:12 UTC
The development team is now working only on the 3.0.x branch, and the earlier versions will support only GNU/Linux and Solaris (for the server). FreeBSD support will be taken up later. Please reopen this bug if the issue is still seen after the 3.0.0 release.