Bug 761773 (GLUSTER-41) - FreeBSD and Linux crash (by moving write-behind location in config)
Summary: FreeBSD and Linux crash (by moving write-behind location in config)
Keywords:
Status: CLOSED WONTFIX
Alias: GLUSTER-41
Product: GlusterFS
Classification: Community
Component: scripts
Version: pre-2.0
Hardware: All
OS: FreeBSD
Priority: low
Severity: medium
Target Milestone: ---
Assignee: Harshavardhana
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2009-06-23 18:53 UTC by Basavanagowda Kanur
Modified: 2015-03-23 01:03 UTC (History)
3 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:



Description Basavanagowda Kanur 2009-06-23 18:53:02 UTC
[Migrated from savannah BTS] - bug 25208 [https://savannah.nongnu.org/bugs/?25208]
Mon 29 Dec 2008 10:05:38 PM GMT, original submission by swankier <swankier>:

Linux server:

2008-12-29 14:00:32 D [inode.c:455:__inode_create] array/inode: create inode(0)
2008-12-29 14:00:32 D [inode.c:280:__inode_activate] array/inode: activating inode(0), lru=0/1024 active=2 purge=0
2008-12-29 14:00:32 D [server-protocol.c:3288:server_lookup_resume] array: LOOKUP '1/100M'
2008-12-29 14:00:32 D [inode.c:94:__dentry_hash] array/inode: dentry hashed 100M (6)
2008-12-29 14:00:32 D [inode.c:299:__inode_passivate] array/inode: passivating inode(6) lru=1/1024 active=1 purge=0
2008-12-29 14:00:32 D [inode.c:280:__inode_activate] array/inode: activating inode(6), lru=0/1024 active=2 purge=0
2008-12-29 14:00:32 D [server-protocol.c:4157:server_truncate_resume] array: TRUNCATE '/100M (6)'
2008-12-29 14:00:32 E [posix.c:2420:posix_xattrop] nas1-ns: /100M: No such file or directory
pending frames:
frame : type(1) op(TRUNCATE)
frame : type(1) op(TRUNCATE)

Signal received: 11
configuration details:argp 1
backtrace 1
db.h 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
tv_nsec 1
package-string: glusterfs 1.4.0tla
/lib64/libc.so.6[0x397ec301b0]
/usr/local/lib/glusterfs/1.4.0tla/xlator/cluster/unify.so(unify_truncate+0xd8)[0x2ae40b4afd28]
/usr/local/lib/glusterfs/1.4.0tla/xlator/cluster/afr.so(afr_truncate_wind+0x12a)[0x2ae40baf4b1a]
/usr/local/lib/glusterfs/1.4.0tla/xlator/cluster/afr.so(afr_write_pending_pre_op_cbk+0xc6)[0x2ae40baf9b76]
/usr/local/lib/libglusterfs.so.0(default_xattrop_cbk+0x20)[0x2ae40abed230]
/usr/local/lib/glusterfs/1.4.0tla/xlator/protocol/client.so(client_xattrop_cbk+0xe8)[0x2ae40b6c82c8]
/usr/local/lib/glusterfs/1.4.0tla/xlator/protocol/client.so(protocol_client_pollin+0xc1)[0x2ae40b6c2ff1]
/usr/local/lib/glusterfs/1.4.0tla/xlator/protocol/client.so(notify+0x13a)[0x2ae40b6c94ca]
/usr/local/lib/glusterfs/1.4.0tla/transport/socket.so[0x2aaaaaaadd6e]
/usr/local/lib/libglusterfs.so.0[0x2ae40abfea75]
glusterfsd(main+0x8f9)[0x4032f9]
/lib64/libc.so.6(__libc_start_main+0xf4)[0x397ec1d8b4]
glusterfsd[0x402099]
---------
Segmentation fault (core dumped)

===========
FreeBSD server:

2008-12-29 14:00:08 D [server-protocol.c:3414:server_forget] nas2: FORGET '2895920637882553344'
2008-12-29 14:00:08 D [server-protocol.c:4500:server_xattrop_resume] nas2: XATTROP '/ (1)'
2008-12-29 14:00:08 D [server-protocol.c:5712:server_entrylk_resume] nas2: ENTRYLK '/ (1) '
2008-12-29 14:00:08 D [server-protocol.c:4793:server_opendir_resume] nas2: OPENDIR '/ (2892807421867065345)'
2008-12-29 14:00:08 D [server-protocol.c:3288:server_lookup_resume] nas2: LOOKUP '0//'
2008-12-29 14:00:08 D [server-protocol.c:5000:server_readdir] nas2: READDIR 'fd=0; offset=0; size=4096
2008-12-29 14:00:08 D [server-protocol.c:5000:server_readdir] nas2: READDIR 'fd=0; offset=23; size=4096
2008-12-29 14:00:11 D [server-protocol.c:3288:server_lookup_resume] nas2: LOOKUP '0//'
2008-12-29 14:00:11 D [server-protocol.c:4793:server_opendir_resume] nas2: OPENDIR '/ (2892807421867065345)'
2008-12-29 14:00:11 D [server-protocol.c:3288:server_lookup_resume] nas2: LOOKUP '0//'
2008-12-29 14:00:11 D [server-protocol.c:5000:server_readdir] nas2: READDIR 'fd=0; offset=0; size=4096
2008-12-29 14:00:11 D [server-protocol.c:5000:server_readdir] nas2: READDIR 'fd=0; offset=23; size=4096
2008-12-29 14:00:11 D [inode.c:455:__inode_create] nas2/inode: create inode(2885374594414280704)
2008-12-29 14:00:11 D [inode.c:280:__inode_activate] nas2/inode: activating inode(0), lru=1024/2 active=0 purge=671803624
2008-12-29 14:00:11 D [server-protocol.c:3288:server_lookup_resume] nas2: LOOKUP '2891443975909015553//100M'
2008-12-29 14:00:11 D [inode.c:94:__dentry_hash] nas2/inode: dentry hashed 100M (577943598352302083)
2008-12-29 14:00:11 D [inode.c:299:__inode_passivate] nas2/inode: passivating inode(4294967299) lru=1024/1 active=0 purge=671924606
2008-12-29 14:00:11 D [server-protocol.c:5712:server_entrylk_resume] nas2: ENTRYLK '/ (2892753520027500545) '
2008-12-29 14:00:11 D [server-protocol.c:4500:server_xattrop_resume] nas2: XATTROP '/ (1)'
2008-12-29 14:00:11 D [inode.c:280:__inode_activate] nas2/inode: activating inode(3), lru=1024/2 active=0 purge=671803624
pending frames:
frame : type(1) op(UNLINK)

Signal received: 11
configuration details:db.h 1
dlfcn 1
libpthread 1
spinlock 1
extattr.h 1
package-string: glusterfs 1.4.0tla
Segmentation fault (core dumped)

======
Linux server config (moved write-behind above nas2 volume instead of at the very top):

[root@nas1 glusterfs]# cat /usr/local/etc/glusterfs/glusterfs-server.vol
volume nas1brick1
type storage/posix
option directory /data/brick1
end-volume

volume nas1brick1p
type features/posix-locks
subvolumes nas1brick1
end-volume

volume nas1brick1t
type performance/io-threads
option thread-count 4 # default is 1
subvolumes nas1brick1p
end-volume

volume nas1brick2
type storage/posix
option directory /data/brick2
end-volume

volume nas1brick2p
type features/posix-locks
subvolumes nas1brick2
end-volume

volume nas1brick2t
type performance/io-threads
option thread-count 4 # default is 1
subvolumes nas1brick2p
end-volume

volume nas1-ns
type storage/posix
option directory /data/ns
end-volume

volume nas1-nsp
type features/posix-locks
subvolumes nas1-ns
end-volume

volume nas1-nst
type performance/io-threads
option thread-count 4 # default is 1
subvolumes nas1-nsp
end-volume

volume nas1
type cluster/unify
subvolumes nas1brick1t nas1brick2t
option namespace nas1-nst
option scheduler rr
end-volume

volume nas2
type protocol/client
option transport-type tcp/client
option remote-host 10.10.1.185
option remote-subvolume nas2
end-volume

volume nas2w
type performance/write-behind
option aggregate-size 2MB
option flush-behind off
subvolumes nas2
end-volume

volume array
type cluster/afr
subvolumes nas2w nas1
option favorite-child nas2
end-volume

volume server
type protocol/server
option transport-type tcp
subvolumes array
option auth.addr.array.allow *
end-volume
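For comparison, the summary says the crash appeared after moving write-behind from "the very top" down to just above the nas2 client volume. A hypothetical reconstruction of the earlier, non-crashing placement (the volume name "arrayw" is assumed, not taken from the report) would put write-behind above the AFR volume, just below protocol/server:

```
volume array
type cluster/afr
subvolumes nas2 nas1
option favorite-child nas2
end-volume

volume arrayw
type performance/write-behind
option aggregate-size 2MB
option flush-behind off
subvolumes array
end-volume

volume server
type protocol/server
option transport-type tcp
subvolumes arrayw
option auth.addr.arrayw.allow *
end-volume
```

In that layout writes are aggregated once, above the replication translator, rather than only on the path to one replica as in the crashing config.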

========
FreeBSD server config (unchanged since the setup was not crashing):

[2:04pm] root@nas2 [/usr/local/src/glusterfs]111: cat /usr/local/etc/glusterfs/glusterfs-server.vol
volume brick1
type storage/posix
option directory /storage
end-volume

volume brick1p
type features/locks
subvolumes brick1
end-volume

volume nas2
type performance/io-threads
option thread-count 4 # default is 1
subvolumes brick1p
end-volume

### Add network serving capability to above brick.
volume server
type protocol/server
subvolumes nas2
option transport-type tcp
option auth.addr.nas2.allow *
end-volume
--------------------------------------------------------------------------------
Mon 29 Dec 2008 10:07:09 PM GMT, comment #1 by swankier <swankier>:

this is tla version 819
--------------------------------------------------------------------------------
Mon 29 Dec 2008 10:10:59 PM GMT, comment #2 by swankier <swankier>:

another example crash on the FreeBSD side:

2008-12-29 14:09:12 D [server-protocol.c:5000:server_readdir] nas2: READDIR 'fd=0; offset=0; size=4096
2008-12-29 14:09:12 D [server-protocol.c:5000:server_readdir] nas2: READDIR 'fd=0; offset=23; size=4096
2008-12-29 14:09:12 D [inode.c:455:__inode_create] nas2/inode: create inode(2885374594414280704)
2008-12-29 14:09:12 D [inode.c:280:__inode_activate] nas2/inode: activating inode(0), lru=1024/2 active=0 purge=671803624
2008-12-29 14:09:12 D [server-protocol.c:3288:server_lookup_resume] nas2: LOOKUP '2891443701031108609//100M'
2008-12-29 14:09:12 D [inode.c:94:__dentry_hash] nas2/inode: dentry hashed 100M (2885699053423689731)
2008-12-29 14:09:12 D [inode.c:299:__inode_passivate] nas2/inode: passivating inode(4294967299) lru=1024/1 active=0 purge=671924606
2008-12-29 14:09:12 D [inode.c:280:__inode_activate] nas2/inode: activating inode(3), lru=1024/2 active=0 purge=671728835
2008-12-29 14:09:12 D [server-protocol.c:5541:server_inodelk_resume] nas2: INODELK '/100M (3)'
2008-12-29 14:09:12 D [common.c:513:pl_setlk] brick1p: Lock (pid=18) 0 - 0 => OK
2008-12-29 14:09:12 D [server-protocol.c:3288:server_lookup_resume] nas2: LOOKUP '2891444250786922497//100M'
2008-12-29 14:09:12 D [server-protocol.c:3688:server_open_resume] nas2: OPEN '/100M (2892754447740436483)'
2008-12-29 14:09:12 D [server-protocol.c:3793:server_readv] nas2: READV 'fd=0; offset=0; size=4096
2008-12-29 14:09:12 D [server-protocol.c:4500:server_xattrop_resume] nas2: XATTROP '/100M (3)'
2008-12-29 14:09:12 D [server-protocol.c:5541:server_inodelk_resume] nas2: INODELK '/100M (3)'
2008-12-29 14:09:12 D [common.c:513:pl_setlk] brick1p: Unlock (pid=18) 0 - 0 => OK
2008-12-29 14:09:12 D [server-protocol.c:4021:server_flush] nas2: FLUSH 'fd=0'
2008-12-29 14:09:12 D [server-protocol.c:3923:server_release] nas2: RELEASE 'fd=0'
2008-12-29 14:09:12 D [inode.c:299:__inode_passivate] nas2/inode: passivating inode(4294967299) lru=1024/1 active=0 purge=671924606
2008-12-29 14:09:12 D [server-protocol.c:5712:server_entrylk_resume] nas2: ENTRYLK '/ (2892753520027500545) '
2008-12-29 14:09:12 D [server-protocol.c:4500:server_xattrop_resume] nas2: XATTROP '/ (1)'
2008-12-29 14:09:12 D [inode.c:280:__inode_activate] nas2/inode: activating inode(3), lru=1024/2 active=0 purge=671803624
pending frames:
frame : type(1) op(UNLINK)

Signal received: 11
configuration details:db.h 1
dlfcn 1
libpthread 1
spinlock 1
extattr.h 1
package-string: glusterfs 1.4.0tla
Segmentation fault (core dumped)

--------------------------------------------------------------------------------

Mon 29 Dec 2008 10:27:02 PM GMT, comment #3 by swankier <swankier>:

this happens when I try to work with a file created with

dd if=/dev/zero of=100M count=1 bs=100M

the FreeBSD side crashed while it was being created.

I deleted the file from the AFR, however the 100M file still exists on the FreeBSD server. If I try to delete the file from the FreeBSD server via gluster, it crashes.

I can delete the 100M file from the filesystem directly to get gluster working again. Trying to make the 100M file again afterwards will crash the server again.

--------------------------------------------------------------------------------
Mon 29 Dec 2008 10:47:56 PM GMT, comment #4 by swankier <swankier>:

this is also occurring with "touch"
--------------------------------------------------------------------------------
Mon 29 Dec 2008 11:28:10 PM GMT, comment #5 by swankier <swankier>:

after starting all servers without any xlators at all, the crash continued to happen.

I stopped using debug on the FreeBSD server and saw the following errors:

2008-12-29 15:19:25 E [server-protocol.c:2431:server_stub_resume] server: OPEN (/me) on nas2 returning error: -1 (2)
2008-12-29 15:19:25 E [server-protocol.c:2533:server_stub_resume] server: XATTROP (/me) on nas2 returning error: -1 (2)
2008-12-29 15:20:07 E [server-protocol.c:2567:server_stub_resume] server: UNLINK (/me) on nas2 returning error: -1 (2)

this was after an attempted "touch me" and "rm me".

after this, whatever was out of sync came back into sync and the servers behaved normally again.

Comment 1 Amar Tumballi 2009-11-13 16:42:12 UTC
The development team is now working only on the 3.0.x branch, and the earlier versions will only support GNU/Linux and Solaris (for the server). FreeBSD support will be taken up later. Please reopen the bug if the issue is still seen after the 3.0.0 release.

