Bug 559735 - GFS2 mount fails incorrectly after correctly failed second-mount attempt
Summary: GFS2 mount fails incorrectly after correctly failed second-mount attempt
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: cman
Version: 5.4
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Robert Peterson
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On: 590000
Blocks:
Reported: 2010-01-28 21:02 UTC by Issue Tracker
Modified: 2018-11-14 17:55 UTC

Fixed In Version: cman-2.0.115-55.el5
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-01-13 22:31:38 UTC
Target Upstream Version:
Embargoed:


Attachments
Proposed patch (606 bytes, patch)
2010-09-22 21:32 UTC, Robert Peterson


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2011:0036 0 normal SHIPPED_LIVE cman bug-fix and enhancement update 2011-01-12 17:39:38 UTC

Description Issue Tracker 2010-01-28 21:02:25 UTC
Escalated to Bugzilla from IssueTracker

Comment 1 Toure Dunnon 2010-01-28 21:05:11 UTC
(/dev/sda is a partition that has a GFS2 file system with enough journals.)

[root@rhel5-1 ~]# mount /dev/sda /mnt
[root@rhel5-1 ~]# mkdir /tmp/t1
[root@rhel5-1 ~]# mount -o ro /dev/sda /tmp/t1
/sbin/mount.gfs2: /dev/sda already mounted or /tmp/t1 busy

However, the customer can mount the device a second time if the second mount is read-write instead of read-only.

Furthermore, after the failed read-only mount attempt, if they try to mount the partition read-write again, it fails like this:

[root@rhel5-1 ~]# mkdir /tmp/t2
[root@rhel5-1 ~]# mount /dev/sda /tmp/t2
/sbin/mount.gfs2: error mounting /dev/sda on /tmp/t2: Invalid argument
[root@rhel5-1 ~]# mount /dev/sda /tmp/t2  <-- *and if I try again, the error msg is different*
/sbin/mount.gfs2: mount point already used or other mount in progress
/sbin/mount.gfs2: error mounting lockproto lock_dlm
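
For anyone replaying this, the sequence above can be sketched as a small script. It is only a sketch: the variable names GFS2_DEV, GFS2_MNT and REPRO_RUN and the helper functions are made up for illustration, and by default it only prints the commands. Actually running them needs root and a working cluster with the GFS2 device available.

```shell
#!/bin/sh
# Dry-run sketch of the reproduction steps from this comment.
# GFS2_DEV, GFS2_MNT and REPRO_RUN are illustrative names, not part of any tool.
GFS2_DEV="${GFS2_DEV:-/dev/sda}"
GFS2_MNT="${GFS2_MNT:-/mnt}"

# Print each command; execute it only when REPRO_RUN=1.
step() {
    echo "+ $*"
    if [ "${REPRO_RUN:-0}" = "1" ]; then
        "$@" || echo "  (exit status $?)"
    fi
}

repro() {
    step mount "$GFS2_DEV" "$GFS2_MNT"     # first mount, expected to succeed
    step mkdir -p /tmp/t1 /tmp/t2
    step mount -o ro "$GFS2_DEV" /tmp/t1   # expected to fail: already mounted or busy
    step mount "$GFS2_DEV" /tmp/t2         # reported above to fail with "Invalid argument"
    step mount "$GFS2_DEV" /tmp/t2         # reported above to fail with a different message
}

repro
```

Setting REPRO_RUN=1 in the environment makes the script execute each step and report any non-zero exit status instead of stopping.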

Comment 2 Robert Peterson 2010-02-05 17:13:02 UTC
I'm guessing that this is related to group membership and all
due to clustering.  As a test, can you try changing the gfs
locking protocol to lock_nolock temporarily and try these same
commands again?
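
One way to run that test, as a sketch only: the gfs2 `lockproto=` mount option overrides the lock protocol for a single mount without touching the superblock. The helper name nolock_mount_cmd is made up for illustration, and the device and mount point are the ones from comment 1. This is only safe while no other node has the file system mounted.

```shell
#!/bin/sh
# Hypothetical helper: print the mount command that uses lock_nolock for one mount.
# It only prints the command; run the output by hand, as root, on a single node.
nolock_mount_cmd() {
    # $1 = device, $2 = mount point
    echo "mount -t gfs2 -o lockproto=lock_nolock $1 $2"
}

nolock_mount_cmd /dev/sda /mnt
```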

Comment 4 Robert Peterson 2010-07-01 22:28:42 UTC
Resetting NEEDINFO since bugzilla reset it.

Comment 5 Steve Whitehouse 2010-07-02 09:54:12 UTC
I think it's just a case of an issue in the error path so that the group membership isn't correctly set in this case. I think it's clear enough how to reproduce this so we can test it and either close or fix the bug according to the results.

Comment 7 Robert Peterson 2010-09-20 21:39:39 UTC
I recreated this problem and did some debugging on it.
Here is what I know so far:

(1) The problem is _not_ that the second mount as -ro fails
because gfs2 behaves the same way as other file systems.  For
example, here's ext3:

[root@roth-01 ~]# mount -text3 /dev/roth_vg/roth_lv /mnt/gfs2
[root@roth-01 ~]# mount -o ro -text3 /dev/roth_vg/roth_lv /tmp/t1
mount: /dev/roth_vg/roth_lv already mounted or /tmp/t1 busy

(2) GFS2 behaves the same as other file systems with regard to
mounting to a second location as rw as well.  For example:

[root@roth-01 ~]# mount -tgfs2 /dev/roth_vg/roth_lv /mnt/gfs2
[root@roth-01 ~]# mount -tgfs2 /dev/roth_vg/roth_lv /tmp/t1
[root@roth-01 ~]# umount /tmp/t1
[root@roth-01 ~]# umount /mnt/gfs2

(3) Toure's subsequent mount problem doesn't seem to recreate when
I have the latest gfs2-utils and kernel 2.6.18-222.el5 running:

[root@roth-01 ~]# mount -tgfs2 /dev/roth_vg/roth_lv /mnt/gfs2
[root@roth-01 ~]# mount -o ro -tgfs2 /dev/roth_vg/roth_lv /tmp/t1
/sbin/mount.gfs2: /dev/mapper/roth_vg-roth_lv already mounted or /tmp/t1 busy
[root@roth-01 ~]# mount -tgfs2 /dev/roth_vg/roth_lv /tmp/t2
[root@roth-01 ~]# umount /tmp/t2
[root@roth-01 ~]# umount /mnt/gfs2

(4) However, after the above scenario, I have trouble mounting
the original mount point:

[root@roth-01 ~]# mount -tgfs2 /dev/roth_vg/roth_lv /mnt/gfs2
/sbin/mount.gfs2: error mounting /dev/mapper/roth_vg-roth_lv on /mnt/gfs2: Invalid argument

accompanied by these dmesgs:

lock_dlm: no mount options, (u)mount helpers not installed
GFS2: fsid=: can't mount proto=lock_dlm, table=bobs_roth:roth_lv, hostdata=

which is odd, because:
[root@roth-01 ~]# ls -l /sbin/umount.gfs2
-rwxr-xr-x 1 root root 37160 Sep 20 09:46 /sbin/umount.gfs2
[root@roth-01 ~]# ls -l /sbin/mount.gfs2
-rwxr-xr-x 1 root root 40552 Sep 20 09:46 /sbin/mount.gfs2

The gfs_controld daemon had this to say about the most recent
scenario:

[root@roth-01 ~]# group_tool dump gfs
1285017809 config_no_withdraw 0
1285017809 config_no_plock 0
1285017809 config_plock_rate_limit 100
1285017809 config_plock_ownership 0
1285017809 config_drop_resources_time 10000
1285017809 config_drop_resources_count 10
1285017809 config_drop_resources_age 10000
1285017809 protocol 1.0.0
1285017809 listen 1
1285017809 cpg 4
1285017809 groupd 5
1285017809 uevent 6
1285017809 plocks 8
1285017809 plock need_fsid_translation 1
1285017809 plock cpg message size: 336 bytes
1285017809 setup done
1285017843 client 6: join /mnt/gfs2 gfs2 lock_dlm bobs_roth:roth_lv rw /dev/mapper/roth_vg-roth_lv
1285017843 mount: /mnt/gfs2 gfs2 lock_dlm bobs_roth:roth_lv rw /dev/mapper/roth_vg-roth_lv
1285017843 roth_lv cluster name matches: bobs_roth
1285017843 roth_lv do_mount: rv 0
1285017843 groupd cb: set_id roth_lv 10001
1285017843 groupd cb: start roth_lv type 2 count 1 members 1 
1285017843 roth_lv start 3 init 1 type 2 member_count 1
1285017843 roth_lv add member 1
1285017843 roth_lv total members 1 master_nodeid -1 prev -1
1285017843 roth_lv start_first_mounter
1285017843 roth_lv start_done 3
1285017843 notify_mount_client: nodir not found for lockspace roth_lv
1285017843 notify_mount_client: ccs_disconnect
1285017843 notify_mount_client: hostdata=jid=0:id=65537:first=1
1285017843 groupd cb: finish roth_lv
1285017843 roth_lv finish 3 needs_recovery 0
1285017843 roth_lv set /sys/fs/gfs2/bobs_roth:roth_lv/lock_module/block to 0
1285017843 roth_lv set open /sys/fs/gfs2/bobs_roth:roth_lv/lock_module/block error -1 2
1285017843 kernel: add@ bobs_roth:roth_lv
1285017843 roth_lv ping_kernel_mount 0
1285017843 kernel: change@ bobs_roth:roth_lv
1285017843 roth_lv kernel_recovery_done_first first_done 0
1285017843 kernel: change@ bobs_roth:roth_lv
1285017843 roth_lv kernel_recovery_done_first first_done 0
1285017843 kernel: change@ bobs_roth:roth_lv
1285017843 roth_lv kernel_recovery_done_first first_done 0
1285017843 kernel: change@ bobs_roth:roth_lv
1285017843 roth_lv kernel_recovery_done_first first_done 1
1285017843 roth_lv receive_recovery_done from 1 needs_recovery 0
1285017843 roth_lv set /sys/fs/gfs2/bobs_roth:roth_lv/lock_module/block to 0
1285017843 client 6: mount_result /mnt/gfs2 gfs2 0
1285017843 roth_lv got_mount_result: ci 6 result 0 another 0 first_mounter 1 opts 9
1285017843 roth_lv send_mount_status kernel_mount_error 0 first_mounter 1
1285017843 client 6 fd 9 dead
1285017843 roth_lv receive_mount_status from 1 len 288 last_cb 3
1285017843 roth_lv _receive_mount_status from 1 kernel_mount_error 0 first_mounter 1 opts 9
1285017860 client 6: join /tmp/t1 gfs2 lock_dlm bobs_roth:roth_lv ro /dev/mapper/roth_vg-roth_lv
1285017860 mount: /tmp/t1 gfs2 lock_dlm bobs_roth:roth_lv ro /dev/mapper/roth_vg-roth_lv
1285017860 roth_lv add_another_mountpoint dir /tmp/t1 dev /dev/mapper/roth_vg-roth_lv ci 6
1285017860 roth_lv do_mount: rv -114
1285017860 client 6: mount_result /tmp/t1 gfs2 -1
1285017860 roth_lv got_mount_result: ci 6 result -1 another -114 first_mounter 1 opts 1
1285017860 Assertion failed on line 2164 of file recover.c
Assertion:  "found"

1285017860 client 6 fd 9 dead
1285017902 client 6: join /tmp/t2 gfs2 lock_dlm bobs_roth:roth_lv rw /dev/mapper/roth_vg-roth_lv
1285017902 mount: /tmp/t2 gfs2 lock_dlm bobs_roth:roth_lv rw /dev/mapper/roth_vg-roth_lv
1285017902 roth_lv add_another_mountpoint dir /tmp/t2 dev /dev/mapper/roth_vg-roth_lv ci 6
1285017902 roth_lv do_mount: rv -114
1285017902 client 6: mount_result /tmp/t2 gfs2 0
1285017902 roth_lv got_mount_result: ci 6 result 0 another -114 first_mounter 1 opts 1
1285017902 client 6 fd 9 dead
1285017974 client 6: leave /tmp/t2 gfs2 0
1285017974 roth_lv removed mountpoint /tmp/t2, more remaining
1285017974 client 6 fd 9 dead
1285017974 client 6 fd -1 dead
1285017978 kernel: remove@ bobs_roth:roth_lv
1285017978 roth_lv ping_kernel_mount 0
1285017978 client 6: leave /mnt/gfs2 gfs2 0
1285017978 roth_lv removed mountpoint /mnt/gfs2, more remaining
1285017978 client 6 fd 9 dead
1285017978 client 6 fd -1 dead
1285018103 client 6: join /mnt/gfs2 gfs2 lock_dlm bobs_roth:roth_lv rw /dev/mapper/roth_vg-roth_lv
1285018103 mount: /mnt/gfs2 gfs2 lock_dlm bobs_roth:roth_lv rw /dev/mapper/roth_vg-roth_lv
1285018103 roth_lv add_another_mountpoint dir /mnt/gfs2 dev /dev/mapper/roth_vg-roth_lv ci 6
1285018103 roth_lv do_mount: rv -114
1285018103 client 6: mount_result /mnt/gfs2 gfs2 -1
1285018103 roth_lv got_mount_result: ci 6 result -1 another -114 first_mounter 1 opts 1
1285018103 Assertion failed on line 2164 of file recover.c
Assertion:  "found"

1285018103 client 6 fd 9 dead
1285018392 client 6: dump

I haven't really dug through the gfs_controld log; I'll look at
it in the morning.  In the meantime, I'm adding Dave T,
gfs_controld expert, to the cc list.
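
To make the dump above easier to scan, here is a rough awk filter that keeps only the do_mount return values and the assertion failures. It is a sketch: summarize_dump is a made-up helper name, and it assumes the dump was saved as text in the format shown above. On Linux, errno 114 is EALREADY, so an rv of -114 would fit the add_another_mountpoint lines, although nothing in the log says so explicitly.

```shell
#!/bin/sh
# Summarize a saved `group_tool dump gfs` log read from stdin.
# summarize_dump is a hypothetical helper name.
summarize_dump() {
    awk '
        # e.g. "1285017860 roth_lv do_mount: rv -114"
        $3 == "do_mount:" && $4 == "rv" { print $1, "do_mount rv", $5 }
        # e.g. "1285017860 Assertion failed on line 2164 of file recover.c"
        $2 == "Assertion" && $3 == "failed" { print $1, "assertion", $NF ":" $6; n++ }
        # Total number of assertion failures seen (0 if none).
        END { print "assertions:", n + 0 }
    '
}

# Example with three lines copied from the dump above.
summarize_dump <<'EOF'
1285017843 roth_lv do_mount: rv 0
1285017860 roth_lv do_mount: rv -114
1285017860 Assertion failed on line 2164 of file recover.c
EOF
```

Fed the whole dump, this puts each failed mount_result next to the do_mount return value and assertion that share its timestamp.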

Comment 8 Robert Peterson 2010-09-22 21:32:31 UTC
Created attachment 449057
Proposed patch

This patch seems to fix the problem.

Comment 9 Robert Peterson 2010-09-22 21:34:43 UTC
I'd like Dave Teigland to look at the patch before I ship it.

Comment 10 David Teigland 2010-09-22 22:04:48 UTC
Looks good, thanks.

Comment 11 Robert Peterson 2010-09-23 13:38:44 UTC
I pushed the patch to the RHEL56 branch of the cluster git tree
for inclusion into 5.6.  This bug does not recreate in RHEL6
and the patched code does not exist in RHEL6 or upstream.
Therefore, there should be no upstream or RHEL6 crosswrites
needed.  It was tested on system roth-01.  Changing status
to POST until this gets built.

Comment 13 Nate Straz 2010-11-17 20:59:05 UTC
Verified that second mount attempt does not fail according to Bob's instructions.
Used cman-2.0.115-63.el5.

Comment 15 errata-xmlrpc 2011-01-13 22:31:38 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0036.html

