Bug 559735
Summary: GFS2 mount fails incorrectly after correctly failed second-mount attempt
Product: Red Hat Enterprise Linux 5
Reporter: Issue Tracker <tao>
Component: cman
Assignee: Robert Peterson <rpeterso>
Status: CLOSED ERRATA
QA Contact: Cluster QE <mspqa-list>
Severity: high
Priority: high
Version: 5.4
CC: adas, bmarzins, cluster-maint, edamato, iannis, swhiteho, tao, tdunnon, teigland
Target Milestone: rc
Hardware: All
OS: Linux
Fixed In Version: cman-2.0.115-55.el5
Doc Type: Bug Fix
Last Closed: 2011-01-13 22:31:38 UTC
Bug Depends On: 590000

Description
Issue Tracker
2010-01-28 21:02:25 UTC
(/dev/sda is a partition that has a GFS2 file system, with enough journals.)

[root@rhel5-1 ~]# mount /dev/sda /mnt
[root@rhel5-1 ~]# mkdir /tmp/t1
[root@rhel5-1 ~]# mount -o ro /dev/sda /tmp/t1
/sbin/mount.gfs2: /dev/sda already mounted or /tmp/t1 busy

But the customer can do that if it is read-write instead of read-only. Furthermore, after the failed read-only mount attempt, if they try to mount the partition read-write again, it fails like this:

[root@rhel5-1 ~]# mkdir /tmp/t2
[root@rhel5-1 ~]# mount /dev/sda /tmp/t2
/sbin/mount.gfs2: error mounting /dev/sda on /tmp/t2: Invalid argument
[root@rhel5-1 ~]# mount /dev/sda /tmp/t2 <-- *and if I try again, the error msg is different*
/sbin/mount.gfs2: mount point already used or other mount in progress
/sbin/mount.gfs2: error mounting lockproto lock_dlm

I'm guessing that this is related to group membership and all due to clustering. As a test, can you try changing the gfs locking protocol to lock_nolock temporarily and try these same commands again?

Resetting NEEDINFO since bugzilla reset it.

I think it's just a case of an issue in the error path so that the group membership isn't correctly set in this case. I think it's clear enough how to reproduce this so we can test it and either close or fix the bug according to the results.

I recreated this problem and did some debugging on it. Here is what I know so far:

(1) The problem is _not_ that the second mount as -ro fails, because gfs2 behaves the same way as other file systems in that case. For example, here's ext3:

[root@roth-01 ~]# mount -text3 /dev/roth_vg/roth_lv /mnt/gfs2
[root@roth-01 ~]# mount -o ro -text3 /dev/roth_vg/roth_lv /tmp/t1
mount: /dev/roth_vg/roth_lv already mounted or /tmp/t1 busy

(2) GFS2 also behaves the same as other file systems with regard to mounting to a second location as rw.
For example:

[root@roth-01 ~]# mount -tgfs2 /dev/roth_vg/roth_lv /mnt/gfs2
[root@roth-01 ~]# mount -tgfs2 /dev/roth_vg/roth_lv /tmp/t1
[root@roth-01 ~]# umount /tmp/t1
[root@roth-01 ~]# umount /mnt/gfs2

(3) Toure's subsequent mount problem doesn't seem to recreate when I have the latest gfs2-utils and kernel 2.6.18-222.el5 running:

[root@roth-01 ~]# mount -tgfs2 /dev/roth_vg/roth_lv /mnt/gfs2
[root@roth-01 ~]# mount -o ro -tgfs2 /dev/roth_vg/roth_lv /tmp/t1
/sbin/mount.gfs2: /dev/mapper/roth_vg-roth_lv already mounted or /tmp/t1 busy
[root@roth-01 ~]# mount -tgfs2 /dev/roth_vg/roth_lv /tmp/t2
[root@roth-01 ~]# umount /tmp/t2
[root@roth-01 ~]# umount /mnt/gfs2

(4) However, after the above scenario, I have trouble mounting the original mount point:

[root@roth-01 ~]# mount -tgfs2 /dev/roth_vg/roth_lv /mnt/gfs2
/sbin/mount.gfs2: error mounting /dev/mapper/roth_vg-roth_lv on /mnt/gfs2: Invalid argument

accompanied by these dmesgs:

lock_dlm: no mount options, (u)mount helpers not installed
GFS2: fsid=: can't mount proto=lock_dlm, table=bobs_roth:roth_lv, hostdata=

which is odd, because:

[root@roth-01 ~]# ls -l /sbin/umount.gfs2
-rwxr-xr-x 1 root root 37160 Sep 20 09:46 /sbin/umount.gfs2
[root@roth-01 ~]# ls -l /sbin/mount.gfs2
-rwxr-xr-x 1 root root 40552 Sep 20 09:46 /sbin/mount.gfs2

The gfs_controld daemon had this to say about the most recent scenario:

[root@roth-01 ~]# group_tool dump gfs
1285017809 config_no_withdraw 0
1285017809 config_no_plock 0
1285017809 config_plock_rate_limit 100
1285017809 config_plock_ownership 0
1285017809 config_drop_resources_time 10000
1285017809 config_drop_resources_count 10
1285017809 config_drop_resources_age 10000
1285017809 protocol 1.0.0
1285017809 listen 1
1285017809 cpg 4
1285017809 groupd 5
1285017809 uevent 6
1285017809 plocks 8
1285017809 plock need_fsid_translation 1
1285017809 plock cpg message size: 336 bytes
1285017809 setup done
1285017843 client 6: join /mnt/gfs2 gfs2 lock_dlm bobs_roth:roth_lv rw /dev/mapper/roth_vg-roth_lv
1285017843 mount: /mnt/gfs2 gfs2 lock_dlm bobs_roth:roth_lv rw /dev/mapper/roth_vg-roth_lv
1285017843 roth_lv cluster name matches: bobs_roth
1285017843 roth_lv do_mount: rv 0
1285017843 groupd cb: set_id roth_lv 10001
1285017843 groupd cb: start roth_lv type 2 count 1 members 1
1285017843 roth_lv start 3 init 1 type 2 member_count 1
1285017843 roth_lv add member 1
1285017843 roth_lv total members 1 master_nodeid -1 prev -1
1285017843 roth_lv start_first_mounter
1285017843 roth_lv start_done 3
1285017843 notify_mount_client: nodir not found for lockspace roth_lv
1285017843 notify_mount_client: ccs_disconnect
1285017843 notify_mount_client: hostdata=jid=0:id=65537:first=1
1285017843 groupd cb: finish roth_lv
1285017843 roth_lv finish 3 needs_recovery 0
1285017843 roth_lv set /sys/fs/gfs2/bobs_roth:roth_lv/lock_module/block to 0
1285017843 roth_lv set open /sys/fs/gfs2/bobs_roth:roth_lv/lock_module/block error -1 2
1285017843 kernel: add@ bobs_roth:roth_lv
1285017843 roth_lv ping_kernel_mount 0
1285017843 kernel: change@ bobs_roth:roth_lv
1285017843 roth_lv kernel_recovery_done_first first_done 0
1285017843 kernel: change@ bobs_roth:roth_lv
1285017843 roth_lv kernel_recovery_done_first first_done 0
1285017843 kernel: change@ bobs_roth:roth_lv
1285017843 roth_lv kernel_recovery_done_first first_done 0
1285017843 kernel: change@ bobs_roth:roth_lv
1285017843 roth_lv kernel_recovery_done_first first_done 1
1285017843 roth_lv receive_recovery_done from 1 needs_recovery 0
1285017843 roth_lv set /sys/fs/gfs2/bobs_roth:roth_lv/lock_module/block to 0
1285017843 client 6: mount_result /mnt/gfs2 gfs2 0
1285017843 roth_lv got_mount_result: ci 6 result 0 another 0 first_mounter 1 opts 9
1285017843 roth_lv send_mount_status kernel_mount_error 0 first_mounter 1
1285017843 client 6 fd 9 dead
1285017843 roth_lv receive_mount_status from 1 len 288 last_cb 3
1285017843 roth_lv _receive_mount_status from 1 kernel_mount_error 0 first_mounter 1 opts 9
1285017860 client 6: join /tmp/t1 gfs2 lock_dlm bobs_roth:roth_lv ro /dev/mapper/roth_vg-roth_lv
1285017860 mount: /tmp/t1 gfs2 lock_dlm bobs_roth:roth_lv ro /dev/mapper/roth_vg-roth_lv
1285017860 roth_lv add_another_mountpoint dir /tmp/t1 dev /dev/mapper/roth_vg-roth_lv ci 6
1285017860 roth_lv do_mount: rv -114
1285017860 client 6: mount_result /tmp/t1 gfs2 -1
1285017860 roth_lv got_mount_result: ci 6 result -1 another -114 first_mounter 1 opts 1
1285017860 Assertion failed on line 2164 of file recover.c Assertion: "found"
1285017860 client 6 fd 9 dead
1285017902 client 6: join /tmp/t2 gfs2 lock_dlm bobs_roth:roth_lv rw /dev/mapper/roth_vg-roth_lv
1285017902 mount: /tmp/t2 gfs2 lock_dlm bobs_roth:roth_lv rw /dev/mapper/roth_vg-roth_lv
1285017902 roth_lv add_another_mountpoint dir /tmp/t2 dev /dev/mapper/roth_vg-roth_lv ci 6
1285017902 roth_lv do_mount: rv -114
1285017902 client 6: mount_result /tmp/t2 gfs2 0
1285017902 roth_lv got_mount_result: ci 6 result 0 another -114 first_mounter 1 opts 1
1285017902 client 6 fd 9 dead
1285017974 client 6: leave /tmp/t2 gfs2 0
1285017974 roth_lv removed mountpoint /tmp/t2, more remaining
1285017974 client 6 fd 9 dead
1285017974 client 6 fd -1 dead
1285017978 kernel: remove@ bobs_roth:roth_lv
1285017978 roth_lv ping_kernel_mount 0
1285017978 client 6: leave /mnt/gfs2 gfs2 0
1285017978 roth_lv removed mountpoint /mnt/gfs2, more remaining
1285017978 client 6 fd 9 dead
1285017978 client 6 fd -1 dead
1285018103 client 6: join /mnt/gfs2 gfs2 lock_dlm bobs_roth:roth_lv rw /dev/mapper/roth_vg-roth_lv
1285018103 mount: /mnt/gfs2 gfs2 lock_dlm bobs_roth:roth_lv rw /dev/mapper/roth_vg-roth_lv
1285018103 roth_lv add_another_mountpoint dir /mnt/gfs2 dev /dev/mapper/roth_vg-roth_lv ci 6
1285018103 roth_lv do_mount: rv -114
1285018103 client 6: mount_result /mnt/gfs2 gfs2 -1
1285018103 roth_lv got_mount_result: ci 6 result -1 another -114 first_mounter 1 opts 1
1285018103 Assertion failed on line 2164 of file recover.c Assertion: "found"
1285018103 client 6 fd 9 dead
1285018392 client 6: dump

I haven't really dug through the gfs_controld log; I'll look at it in the morning. In the meantime, I'm adding Dave T, gfs_controld expert, to the cc list.

Created attachment 449057
Proposed patch
This patch seems to fix the problem.
I'd like Dave Teigland to look at the patch before I ship it.

Looks good, thanks.

I pushed the patch to the RHEL56 branch of the cluster git tree for inclusion into 5.6. This bug does not recreate in RHEL6 and the patched code does not exist in RHEL6 or upstream. Therefore, there should be no upstream or RHEL6 crosswrites needed. It was tested on system roth-01. Changing status to POST until this gets built.

Verified that second mount attempt does not fail according to Bob's instructions. Used cman-2.0.115-63.el5.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0036.html
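
For anyone re-checking this on another node, the failure signature in the gfs_controld dump earlier in this report is a do_mount return of -114 followed by a mount_result of -1 and the recover.c assertion. The value -114 would correspond to -EALREADY on Linux if it is a negated errno, which is an assumption here and not something the report confirms. The following is a minimal sketch that pulls just those entries out of a saved dump. The function name summarize_gfs_dump is hypothetical, and it assumes the dump was saved with one entry per line and that the field layout matches the log shown above.

```shell
#!/bin/sh
# Hypothetical helper: summarize mount attempts in a saved
# `group_tool dump gfs` log. Assumes one log entry per line,
# with the field positions seen in the dump in this report.
summarize_gfs_dump() {
    awk '
    # "<ts> client N: join <dir> gfs2 <proto> <table> <ro|rw> <dev>"
    $4 == "join"         { printf "%s join %s %s\n", $1, $5, $9 }
    # "<ts> <fs> do_mount: rv <code>"
    $3 == "do_mount:"    { printf "%s do_mount rv=%s\n", $1, $5 }
    # "<ts> client N: mount_result <dir> gfs2 <result>"
    $4 == "mount_result" { printf "%s mount_result %s result=%s\n", $1, $5, $7 }
    # "<ts> Assertion failed on line <n> of file <file> ..."
    $2 == "Assertion"    { printf "%s ASSERTION %s:%s\n", $1, $9, $6 }
    ' "$1"
}

# Example (on a cluster node):
#   group_tool dump gfs > /tmp/gfs_dump.txt
#   summarize_gfs_dump /tmp/gfs_dump.txt
```

A mount_result of -1 paired with an ASSERTION entry at the same timestamp matches the two failing attempts in the dump above (/tmp/t1 read-only, then the remount of /mnt/gfs2).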