Bug 532426 - mount.gfs2 hangs on cluster-3.0.3
Status: CLOSED WORKSFORME
Product: Fedora
Classification: Fedora
Component: cluster
Version: 12
Hardware: x86_64 Linux
Priority: low, Severity: medium
Assigned To: David Teigland
QA Contact: Fedora Extras Quality Assurance
Reported: 2009-11-02 04:54 EST by Dan Candea
Modified: 2010-09-22 05:55 EDT
CC: 9 users

Last Closed: 2010-09-22 05:55:21 EDT

Attachments
attachment 367099: node logs and debug info (20.25 KB, application/gzip), 2009-11-02 04:55 EST, Dan Candea
attachment 367100: node logs and debug info (13.92 KB, application/gzip), 2009-11-02 04:56 EST, Dan Candea
attachment 367101: node-techno logs and debug info (7.56 KB, application/gzip), 2009-11-02 04:57 EST, Dan Candea

Description Dan Candea 2009-11-02 04:54:12 EST
User-Agent:       Mozilla/5.0 (X11; U; Linux x86_64; en-US) AppleWebKit/532.3 (KHTML, like Gecko) Chrome/4.0.223.6 Safari/532.3

Hi all,

I would appreciate some help.

I have set up cluster 3.0.3 with a 2.6.31 kernel. All went well until I tried a gfs2 mount. The mount hangs without an error, and gfs_control dump reports nothing about it:

gfs_control dump
1256941054 logging mode 3 syslog f 160 p 6 logfile p 6 /var/log/cluster/gfs_controld.log
1256941054 gfs_controld 3.0.3 started
1256941054 /cluster/gfs_controld/@plock_ownership is 1
1256941054 /cluster/gfs_controld/@plock_rate_limit is 0
1256941054 logging mode 3 syslog f 160 p 6 logfile p 6 /var/log/cluster/gfs_controld.log
1256941054 group_mode 3 compat 0


An strace of the mount command suggests that gfs_controld is not responding:

brk(0)                                  = 0x7f86d9054000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f86d71aa000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f86d71a9000
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY)      = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=27308, ...}) = 0
mmap(NULL, 27308, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f86d71a2000
close(3)                                = 0
open("/lib/libc.so.6", O_RDONLY)        = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\340\346\1\0\0\0\0\0@"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=1338408, ...}) = 0
mmap(NULL, 3446712, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f86d6c46000
mprotect(0x7f86d6d86000, 2097152, PROT_NONE) = 0
mmap(0x7f86d6f86000, 20480, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x140000) = 0x7f86d6f86000
mmap(0x7f86d6f8b000, 18360, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f86d6f8b000
close(3)                                = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f86d71a1000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f86d71a0000
arch_prctl(ARCH_SET_FS, 0x7f86d71a06f0) = 0
open("/dev/urandom", O_RDONLY)          = 3
read(3, "k\6\244\266U\3731\237"..., 8)  = 8
close(3)                                = 0
mprotect(0x7f86d6f86000, 16384, PROT_READ) = 0
mprotect(0x7f86d73b7000, 4096, PROT_READ) = 0
mprotect(0x7f86d71ab000, 4096, PROT_READ) = 0
munmap(0x7f86d71a2000, 27308)           = 0
brk(0)                                  = 0x7f86d9054000
brk(0x7f86d9076000)                     = 0x7f86d9076000
lstat("/dev", {st_mode=S_IFDIR|0755, st_size=3300, ...}) = 0
lstat("/dev/mapper", {st_mode=S_IFDIR|0755, st_size=80, ...}) = 0
lstat("/dev/mapper/san", {st_mode=S_IFBLK|0640, st_rdev=makedev(253, 0), ...}) = 0
brk(0x7f86d9075000)                     = 0x7f86d9075000
lstat("/var", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
lstat("/var/www", {st_mode=S_IFDIR|0755, st_size=26, ...}) = 0
lstat("/var/www/superstore.to", {st_mode=S_IFDIR|0755, st_size=17, ...}) = 0
lstat("/var/www/superstore.to/data", {st_mode=S_IFDIR|0755, st_size=6, ...}) = 0
stat("/var/www/superstore.to/data", {st_mode=S_IFDIR|0755, st_size=6, ...}) = 0
open("/dev/mapper/san", O_RDONLY)       = 3
lseek(3, 65536, SEEK_SET)               = 65536
read(3, "\1\26\31p\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0d\0\0\0\0\0\0\7\t\0\0\7l\0"..., 512) = 512
close(3)                                = 0
rt_sigprocmask(SIG_BLOCK, [INT], [], 8) = 0
socket(PF_FILE, SOCK_STREAM, 0)         = 3
connect(3, {sa_family=AF_FILE, path=@"gfsc_sock"...}, 12) = 0
write(3, "\\o\\o\1\0\1\0\7\0\0\0\0\0\0\0`p\0\0\0\0\0\0\0\0\0\0\0\0\0\0s"..., 28768) = 28768
read(3,
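
A minimal way to double-check that gfs_controld is actually servicing its socket, rather than just holding the connection open (a sketch; assumes netstat and ps are available on the node):

netstat -lxp | grep gfsc_sock                 # is the abstract socket @gfsc_sock listed, and which pid owns it?
ps -o pid,stat,wchan:20,cmd -C gfs_controld   # is the daemon alive, and what is it blocked in?

The strace shows that connect() and the 28768-byte write() both succeed, so the daemon accepted the request; the hang is in waiting for its reply.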


Any idea how to proceed further?

Thank you.

Reproducible: Always

Steps to Reproduce:
1. start the cluster

ulimit -c unlimited
modprobe dlm
modprobe gfs2
mount -t configfs none /sys/kernel/config

cman_tool join
fence_node -U
groupd
fenced
dlm_controld
gfs_controld
fence_tool join


2. try to mount a gfs2 partition

Actual Results:  
mount hangs and gfs_controld is not responding
Comment 1 Dan Candea 2009-11-02 04:55:59 EST
Created attachment 367099 [details]
node logs and debug info
Comment 2 Dan Candea 2009-11-02 04:56:22 EST
Created attachment 367100 [details]
node logs and debug info
Comment 3 Dan Candea 2009-11-02 04:57:00 EST
Comment on attachment 367099 [details]
node logs and debug info

node trompeten
Comment 4 Dan Candea 2009-11-02 04:57:27 EST
Created attachment 367101 [details]
node-techno logs and debug info
Comment 5 Fabio Massimo Di Nitto 2009-11-02 05:20:32 EST
Hi,

Something isn't right in this bug report. What is your base distribution? Are you trying to run cluster 3.0.3 on top of RHEL 5.4?

Why are you running the daemons manually instead of using the init script? groupd doesn't need to start at all in cluster 3 unless you are performing rolling upgrade operations.
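
For reference, the equivalent manual startup sequence for cluster 3 without groupd would be (a sketch based on the step list in the description, approximating what the cman init script does):

ulimit -c unlimited
modprobe dlm
modprobe gfs2
mount -t configfs none /sys/kernel/config

cman_tool join        # starts corosync and joins the cluster
fenced
dlm_controld
gfs_controld
fence_tool join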
Comment 6 Dan Candea 2009-11-02 05:41:11 EST
I installed it manually from source so I can trace the problem.
I started groupd because that was the way I did it in cluster2; there is little documentation for cluster3. I just started the cluster without groupd, and the behavior is the same.
Comment 7 Steve Whitehouse 2009-11-02 05:48:09 EST
I recommend that you run:

udevadm monitor --environment --kernel

on the node on which you do the mount, and then paste the results from that into this bz after running a mount which hangs. That should tell us a bit more about what is going on.
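
For example (device and mount point taken from the strace in the description; run each command in its own terminal):

# terminal 1: watch for kernel uevents
udevadm monitor --environment --kernel

# terminal 2: the mount that hangs
mount -t gfs2 /dev/mapper/san /var/www/superstore.to/data

If the mount gets as far as the kernel, gfs2 should emit uevents as the filesystem comes up; complete silence suggests mount.gfs2 never reached the mount(2) syscall.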
Comment 8 Dan Candea 2009-11-02 05:57:06 EST
I don't receive anything; does this mean there is no uevent from the mount command?

techno ~ # udevadm monitor --environment --kernel
monitor will print the received events for:
KERNEL - the kernel uevent
Comment 9 Steve Whitehouse 2009-11-02 06:03:43 EST
It is the kernel which is supposed to produce the uevents, not the mount command, so it looks like the issue is very early in the mount sequence, in this case.
Comment 10 Dan Candea 2009-11-02 06:14:05 EST
It is a 2.6.31 kernel, as requested on the cluster website. Is there a special feature I should check in the kernel?
Comment 11 Steve Whitehouse 2009-11-05 04:40:15 EST
No, and I don't think that the kernel is the issue since it looks like the mount process doesn't get to the point of actually trying to talk to the kernel (otherwise you'd get uevents reported). The problem looks like it occurs earlier than that.
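
In that case the place to look is the conversation between mount.gfs2 and gfs_controld rather than the kernel. Some probes that may help (a sketch; these are standard cluster 3 tools, and gfs_control dump is the one already posted above):

gfs_control ls        # mountgroups the daemon knows about
gfs_control dump      # daemon debug log
group_tool ls         # fence/dlm/gfs group state

If no mountgroup for the filesystem ever appears while mount.gfs2 is blocked, the join request is stuck inside gfs_controld.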
Comment 12 Bug Zapper 2009-11-16 09:52:24 EST
This bug appears to have been reported against 'rawhide' during the Fedora 12 development cycle.
Changing version to '12'.

More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping
Comment 13 Steve Whitehouse 2010-08-09 11:15:30 EDT
Have you solved this issue yet? I'd suggest upgrading to a more recent Fedora at least.

From what I can see above, it seems that the problem might be caused by gfs_controld not running, which may be because the cman package isn't installed or working.

Let us know if you are continuing to have problems, otherwise we'll close this.
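
A quick way to check that theory (a sketch; both are standard tools):

cman_tool status       # should report cluster name, node count and quorum
pidof gfs_controld     # should print a pid if the daemon is running

If cman_tool status fails, the membership layer itself is not up and gfs_controld has nothing to join.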
Comment 14 Dan Candea 2010-08-10 07:24:47 EDT
I rolled back to cluster2, so the issue is still there. cluster3 was compiled from source and gfs_controld was running, but I suspect it didn't respond.
Comment 15 Steve Whitehouse 2010-09-22 05:28:12 EDT
I still don't understand what is going on here. Can you explain exactly which versions of the software don't work, and exactly what you are doing to reproduce the problem?
Comment 16 Dan Candea 2010-09-22 05:49:27 EDT
The mount.gfs2 command hung on cluster-3.0.3 with kernel 2.6.31.
I recently tried cluster 3.0.16 with kernel 2.6.32 and everything is working fine. I think you should close the bug; it was an issue I had last year, and the new cluster sources are working fine.
Comment 17 Steve Whitehouse 2010-09-22 05:55:21 EDT
Ok, let us know if you have any more issues.
