Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 761798 (GLUSTER-66)

Summary:	deadlock in fetch_spec when mounting gluster
Product:	[Community] GlusterFS	Reporter:	Basavanagowda Kanur <gowda>
Component:	core	Assignee:	Raghavendra G <raghavendra>
Status:	CLOSED CURRENTRELEASE	QA Contact:
Severity:	low	Docs Contact:
Priority:	low
Version:	mainline	CC:	gluster-bugs
Target Milestone:	---
Target Release:	---
Hardware:	All
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix

Description Basavanagowda Kanur 2009-06-25 06:50:10 UTC

[Migrated from savannah BTS] - bug 26054 [https://savannah.nongnu.org/bugs/?26054]


Mon 30 Mar 2009 10:00:47 PM GMT, original submission by Erick Tryzelaar <erickt>:

We've got an autofs-loaded gluster that's deadlocking while trying to mount a gluster fs. We got this stack trace:

#0 0x00002aaaaaf37766 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00002aaaaabeb43d in event_dispatch_epoll (event_pool=0x607320) at event.c:830
#2 0x00000000004046b4 in fetch_spec (ctx=0x607010) at fetch-spec.c:200
#3 0x00000000004032da in main (argc=5, argv=0x7fffa3520ce8) at glusterfsd.c:289

However, the spec file is local, not remote. It was mounted with:

/bin/mount -t glusterfs -s -o rw,intr /etc/glusterfs/client.vol /mnt/gluster

Which spawned off this:

/usr/sbin/glusterfs --log-level=WARNING --volfile-server=/etc/glusterfs/client.vol --volfile-server-port=6996 /mnt/gluster

I'm not sure if you need this, but according to gdb, the ctx used in fetch_spec is this:

$2 = {cmd_args = {volfile_server = 0x100000001 <Address 0x100000001 out of bounds>, volume_file = 0x0, log_level = GF_LOG_NONE,
log_file = 0x60735000000000 <Address 0x60735000000000 out of bounds>, volfile_server_port = 0, volfile_server_transport = 0x0,
pid_file = 0x0, no_daemon_mode = 0, run_id = 0x9ea648a000000000 <Address 0x9ea648a000000000 out of bounds>, debug_mode = -1,
xlator_options = {next = 0xa0862aa0ffffffff, prev = 0xa19af790ffffffff}, fuse_direct_io_mode_flag = -1,
volfile_check = -880207200, fuse_entry_timeout = -4.9620051038398657e+87, fuse_attribute_timeout = -6.8227563537816586e+88,
volume_name = 0xd6fe7420ffffffff <Address 0xd6fe7420ffffffff out of bounds>, non_local = -1,
icon_name = 0xdafed1a0ffffffff <Address 0xdafed1a0ffffffff out of bounds>, fuse_nodev = -1, fuse_nosuid = -608137200,
mount_point = 0xdcdeb3a0ffffffff <Address 0xdcdeb3a0ffffffff out of bounds>,
volfile_id = 0xdda9ac90ffffffff <Address 0xdda9ac90ffffffff out of bounds>},
process_uuid = 0xdebe95a0ffffffff <Address 0xdebe95a0ffffffff out of bounds>, specfp = 0xdf898e90ffffffff,
pidfp = 0xe09e77a0ffffffff, fin = -1 '', timer = 0xe27e59a0ffffffff, ib = 0xe3495290ffffffff, pool = 0xe45e3ba0ffffffff,
graph = 0xe5293490ffffffff, top = 0xe6475820ffffffff, event_pool = 0xe7125110ffffffff, lock = {__data = {__lock = -1,
__count = 3894884896, __owner = -1, __nusers = 3908186896, __kind = -1, __spins = -368632800, __list = {
__prev = 0xead21510ffffffff, __next = 0xebe6fe20ffffffff}}, __size = " :'�\0203� \034\a�\020\025� �",
__align = -1718340819410223105}, xl_count = -1, volfile_checksum = 3971086096}

--------------------------------------------------------------------------------
Tue 31 Mar 2009 06:18:28 AM GMT, comment #1 by Raghavendra <raghavendra>:

The --volfile-server option should specify a server. If the volume specification is present locally, either pass it with the -f option, or set --volfile-server to localhost and run a glusterfs server locally.

--------------------------------------------------------------------------------
Tue 31 Mar 2009 10:29:35 PM GMT, comment #2 by Erick Tryzelaar <erickt>:

I've figured out how this happened. We configured automount through an NIS server, with a line like this:

gluster -fstype=glusterfs :/etc/glusterfs/client.vol

Then we tried to go to the mountpoint without /etc/glusterfs/client.vol existing. I'm guessing that since the file didn't exist, gluster interpreted "/etc/glusterfs/client.vol" as a hostname and passed "--volfile-server=/etc/glusterfs/client.vol". When the file does exist, /usr/sbin/glusterfs is started with "--volfile=/etc/glusterfs/client.vol" and mounts correctly. It's repeatable: if we later remove the client.vol file and restart gluster, it hangs again.
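
The behaviour guessed at above can be illustrated with a small C sketch. This is not the actual mount helper code; the names `classify_lenient`, `classify_strict` and `enum source_kind` are hypothetical, and the rule "an argument containing '/' must be a path" is an assumption made for the example. The lenient variant falls back to treating a missing file as a server name, which matches the hang described; the strict variant reports the missing file instead.

```c
#include <assert.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical result of classifying the mount source argument. */
enum source_kind {
    SOURCE_VOLFILE,         /* would be passed as a local volume file */
    SOURCE_SERVER,          /* would be passed as --volfile-server */
    SOURCE_MISSING_VOLFILE  /* looks like a path, but no such file exists */
};

/* Returns 1 if path names an existing regular file, 0 otherwise. */
static int is_regular_file(const char *path)
{
    struct stat st;
    return stat(path, &st) == 0 && S_ISREG(st.st_mode);
}

/* Lenient rule (the behaviour guessed at in the comment above): anything
 * that is not an existing regular file is assumed to be a server name. */
enum source_kind classify_lenient(const char *arg)
{
    assert(arg != NULL);
    return is_regular_file(arg) ? SOURCE_VOLFILE : SOURCE_SERVER;
}

/* Stricter rule (an assumption for this sketch): an argument containing
 * '/' can only be a path, so a missing file is reported as an error
 * instead of being retried as a host. */
enum source_kind classify_strict(const char *arg)
{
    assert(arg != NULL);
    if (is_regular_file(arg))
        return SOURCE_VOLFILE;
    if (strchr(arg, '/') != NULL)
        return SOURCE_MISSING_VOLFILE;
    return SOURCE_SERVER;
}
```

With a path that does not exist, `classify_lenient` returns `SOURCE_SERVER`, while `classify_strict` returns `SOURCE_MISSING_VOLFILE`, so the caller can fail early rather than try to connect.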

--------------------------------------------------------------------------------
Wed 01 Apr 2009 07:34:17 AM GMT, comment #3 by Raghavendra <raghavendra>:

You were correct in your diagnosis. A fix is on its way to the repository.

Comment 1 Raghavendra G 2009-06-29 03:01:27 UTC

The problem was due to glusterfs interpreting the path to the volume spec file as a remote server address and trying to connect to it. A fix has been committed in 96b687b9b8d58fc70dfaaed42dbe1b35799117f8, which adds a new option type INTERNET_ADDRESS and the corresponding validation code. Since remote-host has to be of type INTERNET_ADDRESS, the initialization of protocol/client now fails in the scenario above, which prevents the deadlock.
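
For illustration only, a check of this kind might look like the sketch below. It is not the code from the commit; the function names `looks_like_internet_address` and `is_hostname` are hypothetical, and the accepted forms (an IPv4 or IPv6 literal, or a hostname made of letters, digits, hyphens and dots, without a trailing dot) are assumptions chosen for the example. The point is that a value such as a file path contains '/' and is rejected before any connection attempt.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <ctype.h>
#include <netinet/in.h>
#include <string.h>

/* Returns 1 if s consists of dot-separated labels of letters, digits and
 * hyphens, with no empty label and no label starting or ending in '-'. */
static int is_hostname(const char *s)
{
    size_t len = strlen(s);
    size_t label_len = 0;
    size_t i;

    if (len == 0 || len > 253)
        return 0;

    for (i = 0; i < len; i++) {
        unsigned char c = (unsigned char)s[i];

        if (c == '.') {
            /* A dot must close a non-empty label that does not end in '-'. */
            if (label_len == 0 || s[i - 1] == '-')
                return 0;
            label_len = 0;
        } else if (isalnum(c) || c == '-') {
            if (c == '-' && label_len == 0)
                return 0;
            if (++label_len > 63)
                return 0;
        } else {
            return 0; /* '/', spaces and other characters are rejected */
        }
    }
    return label_len > 0 && s[len - 1] != '-';
}

/* Hypothetical validator: accepts IPv4/IPv6 literals and plain hostnames. */
int looks_like_internet_address(const char *s)
{
    unsigned char buf[sizeof(struct in6_addr)];

    assert(s != NULL);
    if (inet_pton(AF_INET, s, buf) == 1)
        return 1;
    if (inet_pton(AF_INET6, s, buf) == 1)
        return 1;
    return is_hostname(s);
}
```

Under these assumptions, "localhost", "192.168.1.10" and "::1" are accepted, while "/etc/glusterfs/client.vol" is rejected, so an option validated this way would fail at initialization instead of being used as a remote host.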