[Migrated from savannah BTS] - bug 26054 [https://savannah.nongnu.org/bugs/?26054]

Mon 30 Mar 2009 10:00:47 PM GMT, original submission by Erick Tryzelaar <erickt>:

We've got an autofs-loaded gluster that's deadlocking while trying to mount a gluster fs. We got this stack trace:

#0 0x00002aaaaaf37766 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00002aaaaabeb43d in event_dispatch_epoll (event_pool=0x607320) at event.c:830
#2 0x00000000004046b4 in fetch_spec (ctx=0x607010) at fetch-spec.c:200
#3 0x00000000004032da in main (argc=5, argv=0x7fffa3520ce8) at glusterfsd.c:289

However, the spec file is local, not remote. It was mounted with:

/bin/mount -t glusterfs -s -o rw,intr /etc/glusterfs/client.vol /mnt/gluster

Which spawned off this:

/usr/sbin/glusterfs --log-level=WARNING --volfile-server=/etc/glusterfs/client.vol --volfile-server-port=6996 /mnt/gluster

I'm not sure if you need this, but the ctx that is used in fetch_spec is this according to gdb:

$2 = {cmd_args = {volfile_server = 0x100000001 <Address 0x100000001 out of bounds>, volume_file = 0x0, log_level = GF_LOG_NONE, log_file = 0x60735000000000 <Address 0x60735000000000 out of bounds>, volfile_server_port = 0, volfile_server_transport = 0x0, pid_file = 0x0, no_daemon_mode = 0, run_id = 0x9ea648a000000000 <Address 0x9ea648a000000000 out of bounds>, debug_mode = -1, xlator_options = {next = 0xa0862aa0ffffffff, prev = 0xa19af790ffffffff}, fuse_direct_io_mode_flag = -1, volfile_check = -880207200, fuse_entry_timeout = -4.9620051038398657e+87, fuse_attribute_timeout = -6.8227563537816586e+88, volume_name = 0xd6fe7420ffffffff <Address 0xd6fe7420ffffffff out of bounds>, non_local = -1, icon_name = 0xdafed1a0ffffffff <Address 0xdafed1a0ffffffff out of bounds>, fuse_nodev = -1, fuse_nosuid = -608137200, mount_point = 0xdcdeb3a0ffffffff <Address 0xdcdeb3a0ffffffff out of bounds>, volfile_id = 0xdda9ac90ffffffff <Address 0xdda9ac90ffffffff out of bounds>}, process_uuid = 0xdebe95a0ffffffff <Address 0xdebe95a0ffffffff out of bounds>, specfp = 0xdf898e90ffffffff, pidfp = 0xe09e77a0ffffffff, fin = -1 '', timer = 0xe27e59a0ffffffff, ib = 0xe3495290ffffffff, pool = 0xe45e3ba0ffffffff, graph = 0xe5293490ffffffff, top = 0xe6475820ffffffff, event_pool = 0xe7125110ffffffff, lock = {__data = {__lock = -1, __count = 3894884896, __owner = -1, __nusers = 3908186896, __kind = -1, __spins = -368632800, __list = { __prev = 0xead21510ffffffff, __next = 0xebe6fe20ffffffff}}, __size = " :'�\0203� \034\a�\020\025� �", __align = -1718340819410223105}, xl_count = -1, volfile_checksum = 3971086096}

--------------------------------------------------------------------------------
Tue 31 Mar 2009 06:18:28 AM GMT, comment #1 by Raghavendra <raghavendra>:

The --volfile-server option should specify a server. If the volume specification is present locally, either specify it with the -f option, or set --volfile-server to localhost and run a glusterfs server locally.

--------------------------------------------------------------------------------
Tue 31 Mar 2009 10:29:35 PM GMT, comment #2 by Erick Tryzelaar <erickt>:

I've figured out how this happened. We configured automount through an NIS server, with a line like this:

gluster -fstype=glusterfs :/etc/glusterfs/client.vol

Then we tried to go to the mountpoint without /etc/glusterfs/client.vol existing. I'm guessing that since the file didn't exist, gluster interpreted "/etc/glusterfs/client.vol" as a hostname and passed "--volfile-server=/etc/glusterfs/client.vol". When the file does exist, /usr/sbin/gluster is started with "--volfile=/etc/glusterfs/client.vol" and mounts correctly. It's repeatable: if we later remove the client.vol file and restart gluster, it hangs again.

--------------------------------------------------------------------------------
Wed 01 Apr 2009 07:34:17 AM GMT, comment #3 by Raghavendra <raghavendra>:

You were correct in your diagnosis. A fix is on its way to the repository.
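The fallback described in comment #2 can be sketched in C as follows. This is only an illustration of the reported behaviour, not the actual mount helper: the names `spec_kind`, `classify_spec`, and `build_volfile_arg` are hypothetical, and the use of `stat()` as the existence check is an assumption. It shows why a missing client.vol ends up on the command line as `--volfile-server=...`.

```c
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical names; the real mount helper is not shown in this report. */
enum spec_kind { SPEC_LOCAL_VOLFILE, SPEC_VOLFILE_SERVER };

/* Treat the spec as a local volfile only if it is an existing regular file.
 * Anything else (including a path to a missing file) falls through to
 * "remote server", which is the behaviour that triggered this bug. */
enum spec_kind classify_spec(const char *spec)
{
    struct stat st;

    if (stat(spec, &st) == 0 && S_ISREG(st.st_mode))
        return SPEC_LOCAL_VOLFILE;
    return SPEC_VOLFILE_SERVER;
}

/* Build the glusterfs argument that corresponds to the classification.
 * Returns the value of snprintf(). */
int build_volfile_arg(const char *spec, char *out, size_t out_len)
{
    const char *opt = (classify_spec(spec) == SPEC_LOCAL_VOLFILE)
                          ? "--volfile"
                          : "--volfile-server";

    return snprintf(out, out_len, "%s=%s", opt, spec);
}
```

With this logic, calling `build_volfile_arg()` on a path whose file has been removed yields a `--volfile-server=` argument carrying a file path, matching the command line in the original submission.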
The problem was caused by glusterfs interpreting the path to the volume spec file as a remote server address and trying to connect to it. A fix has been committed in 96b687b9b8d58fc70dfaaed42dbe1b35799117f8, which adds a new option type, INTERNET_ADDRESS, and the corresponding validation code. Since remote-host has to be of type INTERNET_ADDRESS, the initialization of protocol/client now fails in the case described above, which prevents the deadlock.
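For readers who want a feel for what such a check involves, here is a minimal sketch in C. It is not the validation code from the commit above, which is not reproduced in this report. The function names `is_hostname` and `is_internet_address` are hypothetical, and the rule used (an IPv4/IPv6 literal accepted by `inet_pton()`, or a hostname made of dot-separated labels of letters, digits, and hyphens) is an assumption about what "internet address" should mean.

```c
#include <arpa/inet.h>
#include <ctype.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: accept dot-separated labels of [A-Za-z0-9-],
 * each 1..63 characters, not starting or ending with '-'. */
static int is_hostname(const char *s)
{
    size_t len = strlen(s);
    size_t label_len = 0;

    if (len == 0 || len > 253)
        return 0;

    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)s[i];

        if (c == '.') {
            /* Reject an empty label or a label ending in '-'. */
            if (label_len == 0 || s[i - 1] == '-')
                return 0;
            label_len = 0;
        } else if (isalnum(c) || c == '-') {
            /* Reject a leading '-' or an over-long label. */
            if ((c == '-' && label_len == 0) || ++label_len > 63)
                return 0;
        } else {
            return 0; /* '/', ':', whitespace, and so on */
        }
    }
    /* Reject a trailing '.' or a trailing '-'. */
    return label_len > 0 && s[len - 1] != '-';
}

/* Returns 1 if s is an IPv4/IPv6 literal or a plausible hostname, else 0. */
int is_internet_address(const char *s)
{
    unsigned char buf[sizeof(struct in6_addr)];

    if (s == NULL)
        return 0;
    if (inet_pton(AF_INET, s, buf) == 1 || inet_pton(AF_INET6, s, buf) == 1)
        return 1;
    return is_hostname(s);
}
```

Under these assumptions, `is_internet_address("/etc/glusterfs/client.vol")` returns 0 because of the `/` characters, so a client configured with that value as remote-host would be rejected at initialization instead of waiting on a connection that can never succeed.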