Bug 761771 (GLUSTER-39)

Summary: glusterfs--mainline--2.5--patch-797 received signal 11
Product: [Community] GlusterFS Reporter: Basavanagowda Kanur <gowda>
Component: unify Assignee: Amar Tumballi <amarts>
Status: CLOSED CURRENTRELEASE QA Contact:
Severity: medium Docs Contact:
Priority: low    
Version: pre-2.0 CC: gluster-bugs, gowda, vraman
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:

Description Basavanagowda Kanur 2009-06-23 14:15:12 UTC
[Migrated from savannah BTS] - bug 24927 [https://savannah.nongnu.org/bugs/?24972]

Sun 30 Nov 2008 02:50:07 PM GMT, original submission by Emmet Brown <chatmann>:

Hi, on a unify setup, gluster crashed on two nodes with the following message. The error occurred during a test in which 24 processes spread across various nodes were trying to read the same file. Yes, this is nasty, but I think the crash should not occur anyway.

Thanks,
Chris.
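
The access pattern described above (many readers opening the same file at once) can be approximated on a single client with a short script. This is only a sketch: the original test ran 24 processes spread over several nodes, the exact test program is not given in the report, and the helper name `read_concurrently` is made up for illustration.

```python
import subprocess
import sys

# One-line reader run in each child process: read the whole file, print its size.
READER = "import sys; print(len(open(sys.argv[1], 'rb').read()))"


def read_concurrently(path, n_procs=24):
    """Start n_procs independent processes that all read `path` at once.

    Returns the byte count reported by each reader, in start order.
    """
    # Launch every reader before waiting on any of them so the reads overlap.
    procs = [
        subprocess.Popen([sys.executable, "-c", READER, path],
                         stdout=subprocess.PIPE, text=True)
        for _ in range(n_procs)
    ]
    sizes = []
    for p in procs:
        out, _ = p.communicate()
        if p.returncode != 0:
            raise RuntimeError(f"reader exited with status {p.returncode}")
        sizes.append(int(out))
    return sizes
```

Pointing `path` at a file under the glusterfs mount point (here `/mnt/data`) and running the same script from several nodes at the same time would come closer to the reported scenario.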

2008-11-30 15:28:49 E [fuse-bridge.c:468:fuse_entry_cbk] glusterfs-fuse: 632212: (34) /chris/simuspace/samap/maps/cmbXXX/params_x50mcmb.dat => -1 (2)

TLA Repo Revision: glusterfs--mainline--2.5--patch-797
Time : 2008-11-30 15:28:49
Signal Number : 11

/usr/local/sbin/glusterfs -f /etc/glusterfs/gluster.vol -l /usr/local/var/log/glusterfs/glusterfs.log -L WARNING /mnt/data
volume fuse
type mount/fuse
option direct-io-mode 1
option entry-timeout 1
option attr-timeout 1
option mount-point /mnt/data
subvolumes unify
end-volume

volume unify
type cluster/unify
option nufa.limits.min-free-disk 5%
option nufa.local-volume-name brick
option scheduler nufa
option namespace client-ns
subvolumes client-02 client-03 brick client-05 client-06 client-07
end-volume

volume client-ns
type protocol/client
option remote-subvolume brick-ns
option remote-host cosmo
option transport-type tcp/client
end-volume

volume client-07
type protocol/client
option remote-subvolume brick
option remote-host saturn
option transport-type tcp/client
end-volume

volume client-06
type protocol/client
option remote-subvolume brick
option remote-host jupiter
option transport-type tcp/client
end-volume

volume client-05
type protocol/client
option remote-subvolume brick
option remote-host mars
option transport-type tcp/client
end-volume

volume client-03
type protocol/client
option remote-subvolume brick
option remote-host venus
option transport-type tcp/client
end-volume

volume client-02
type protocol/client
option remote-subvolume brick
option remote-host mercury
option transport-type tcp/client
end-volume

volume server
type protocol/server
option auth.ip.brick.allow *
option transport-type tcp/server
subvolumes brick
end-volume

volume brick
type performance/io-threads
subvolumes posix
end-volume

volume posix
type storage/posix
option directory /data
end-volume
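
The volume definitions above follow a simple block layout (`volume NAME`, `type`, `option`, `subvolumes`, `end-volume`), so the translator graph can be checked mechanically. The following is a minimal sketch of such a check, not the parser glusterfs itself uses; the function names `parse_volfile` and `unresolved_subvolumes` are hypothetical, and volumes referenced through options (such as `option namespace client-ns`) are not followed.

```python
def parse_volfile(text):
    """Parse 'volume ... end-volume' blocks into a dict keyed by volume name."""
    volumes = {}
    current = None
    for raw in text.splitlines():
        words = raw.split()
        if not words or words[0].startswith("#"):
            continue  # skip blank lines and comments
        key = words[0]
        if key == "volume":
            current = {"type": None, "options": {}, "subvolumes": []}
            volumes[words[1]] = current
        elif key == "end-volume":
            current = None
        elif current is None:
            continue  # ignore stray lines outside a volume block
        elif key == "type":
            current["type"] = words[1]
        elif key == "option":
            current["options"][words[1]] = " ".join(words[2:])
        elif key == "subvolumes":
            current["subvolumes"] = words[1:]
    return volumes


def unresolved_subvolumes(volumes):
    """Return (parent, child) pairs where child is not a defined volume."""
    return [(name, sub)
            for name, vol in volumes.items()
            for sub in vol["subvolumes"]
            if sub not in volumes]
```

In the definitions quoted in this report, every name listed under `subvolumes` is defined somewhere in the dump, so a check like this would not be expected to flag anything here; it is mainly useful for spotting typos when adapting older configuration files.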

frame : type(1) op(35)
frame : type(1) op(35)
frame : type(1) op(35)
frame : type(1) op(11)

/lib64/libc.so.6[0x3d1a8301b0]
/usr/local/lib/glusterfs/1.3.12/xlator/cluster/unify.so(unify_ns_truncate_cbk+0x7c)[0x2aaaab2e529c]
/usr/local/lib/glusterfs/1.3.12/xlator/protocol/client.so(client_stat_cbk+0xc4)[0x2aaaab0d7954]
/usr/local/lib/glusterfs/1.3.12/xlator/protocol/client.so(notify+0x922)[0x2aaaab0d4d82]
/usr/local/lib/libglusterfs.so.0(sys_epoll_iteration+0xbb)[0x2ab76373a22b]
/usr/local/lib/libglusterfs.so.0(poll_iteration+0x79)[0x2ab7637395e9]
[glusterfs](main+0x67c)[0x4026bc]
/lib64/libc.so.6(__libc_start_main+0xf4)[0x3d1a81d8b4]
[glusterfs][0x401b69]
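
Each frame in the backtrace above has the shape `module(symbol+offset)[address]`, with the symbol part missing when it could not be resolved. When comparing crashes across reports it can help to split the frames into fields. A rough sketch follows, assuming each frame sits on one line (a frame wrapped across two lines has to be joined first); the name `parse_frame` is hypothetical.

```python
import re

# module is either a path or a bracketed program name such as "[glusterfs]";
# the "(symbol+0xOFFSET)" part is optional; the address is always present.
FRAME_RE = re.compile(
    r"^(?P<module>\[[^\]]+\]|[^\[(]+)"
    r"(?:\((?P<symbol>[^+)]+)\+(?P<offset>0x[0-9a-fA-F]+)\))?"
    r"\[(?P<address>0x[0-9a-fA-F]+)\]$"
)


def parse_frame(line):
    """Split one backtrace line into its fields; return None if it is not a frame."""
    m = FRAME_RE.match(line.strip())
    if m is None:
        return None
    offset = m.group("offset")
    return {
        "module": m.group("module"),
        "symbol": m.group("symbol"),
        "offset": int(offset, 16) if offset else None,
        "address": int(m.group("address"), 16),
    }
```

Applied to the second frame of this trace, the helper yields the symbol `unify_ns_truncate_cbk` in `unify.so`, which is the function the closing comment of this bug refers to.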

--------------------------------------------------------------------------------
Tue 03 Feb 2009 11:09:34 AM GMT, comment #1 by Amar Tumballi:

Hi Emmet,
Can you please upgrade to 2.0.0rc1 and check whether this behavior is fixed there? A few bugs that were known in the 1.3.x branch have been fixed in the new branch for the 2.0.x releases.

Regards,

--------------------------------------------------------------------------------

Wed 11 Mar 2009 05:39:04 AM GMT, comment #2 by Leandro Martelli <martellix>:

I'm having the same error with 2.0rc4, 2.0git and mainline-3.0-patch-928.

My configuration had multiple (11) nufa volume entries, one for each client, adapted from an older 1.3.10 config file (where I couldn't use the `localhost` option to specify the local disk).
I noticed that the last host worked (the last of the nufa entries), while the others didn't. After many experiments with kernel version, libc, etc. (checking differences between these hosts), I copied the content of a non-working nufa to a file on the failing host, mounted the nufa directory again, and the error was gone.

Finally, I deleted all nufa configs and added only one nufa with `localhost`, which solved the problem.

Comment 1 Amar Tumballi 2009-06-24 16:40:29 UTC
truncate should not be called over the namespace (NS) volume. That code is no longer there, so closing the bug.