Description of problem:
gfs2_tool unfreeze hangs

Version-Release number of selected component (if applicable):
RHEL5 with the 2.6.18-132 kernel

How reproducible:
Always

Steps to Reproduce:
1. mount -tgfs2 /dev/exxon_vg/exxon_lv /mnt/gfs2
2. gfs2_tool freeze /mnt/gfs2
3. echo foo > /mnt/gfs2/gronk &
4. gfs2_tool unfreeze /mnt/gfs2

Actual results:
Unfreeze hangs permanently

Expected results:
Unfreeze should work

Additional info:
Did a little debugging; it never even makes it to ->unfreeze_fs, some other lock above must be holding on.

Also xfs_io (which I was using to try the unfreeze) can't open the mountpoint:

open("/mnt/test", O_RDONLY <HANG>

gfs2_tool also hangs up at:

stat("/mnt/test", <HANG>

but a direct twiddle of the sysfs file works fine.

gfs2_tool is stuck down a path like this:

gfs2_tool D ffff81000101d480 0 3442 3441 (NOTLB)
 ffff81012a4f1cf8 0000000000000086 0007810100000007 ffffffff801248d7
 000001a400000005 0000000000000007 ffff81013a9770c0 ffff810104796100
 0000012dfdcbec62 0000000000002a04 ffff81013a9772a8 00000003000001a4
Call Trace:
 [<ffffffff801248d7>] avc_has_perm+0x43/0x55
 [<ffffffff88650150>] :gfs2:just_schedule+0x0/0xe
 [<ffffffff88650159>] :gfs2:just_schedule+0x9/0xe
 [<ffffffff80063ac7>] __wait_on_bit+0x40/0x6e
 [<ffffffff88650150>] :gfs2:just_schedule+0x0/0xe
 [<ffffffff80063b61>] out_of_line_wait_on_bit+0x6c/0x78
 [<ffffffff8009dbe3>] wake_bit_function+0x0/0x23
 [<ffffffff8865014b>] :gfs2:gfs2_glock_wait+0x2b/0x30
 [<ffffffff8865d861>] :gfs2:gfs2_getattr+0x85/0xc4
 [<ffffffff8865d859>] :gfs2:gfs2_getattr+0x7d/0xc4
 [<ffffffff8000dfe9>] vfs_getattr+0x2d/0xa9
 [<ffffffff80027fe7>] vfs_stat_fd+0x32/0x4a
 [<ffffffff800bd0b0>] utrace_quiescent+0x20f/0x256
 [<ffffffff80022d97>] sys_newstat+0x19/0x31
 [<ffffffff8005d229>] tracesys+0x71/0xe0
 [<ffffffff8005d28d>] tracesys+0xd5/0xe0

and xfs_io like this:

xfs_io D ffff810001004400 0 3618 3615 (NOTLB)
 ffff810128925c88 0000000000000082 000001a400000071 ffff810128f4f100
 ffff81013ebc0c80 0000000000000004 ffff810128f4f100 ffffffff802f0ae0
 0000014672a7fb54 00000000000913bb ffff810128f4f2e8 0000000000100000
Call Trace:
 [<ffffffff801248d7>] avc_has_perm+0x43/0x55
 [<ffffffff88650150>] :gfs2:just_schedule+0x0/0xe
 [<ffffffff88650159>] :gfs2:just_schedule+0x9/0xe
 [<ffffffff80063ac7>] __wait_on_bit+0x40/0x6e
 [<ffffffff88650150>] :gfs2:just_schedule+0x0/0xe
 [<ffffffff80063b61>] out_of_line_wait_on_bit+0x6c/0x78
 [<ffffffff8009dbe3>] wake_bit_function+0x0/0x23
 [<ffffffff8865014b>] :gfs2:gfs2_glock_wait+0x2b/0x30
 [<ffffffff8865f250>] :gfs2:gfs2_permission+0x83/0xd3
 [<ffffffff8865f248>] :gfs2:gfs2_permission+0x7b/0xd3
 [<ffffffff8000d5da>] permission+0x81/0xc8
 [<ffffffff80011f05>] may_open+0x65/0x22f
 [<ffffffff8001aafb>] open_namei+0x2c4/0x6d5
 [<ffffffff80066bcd>] do_page_fault+0x4fe/0x830
 [<ffffffff80026cdb>] do_filp_open+0x1c/0x38
 [<ffffffff80019676>] do_sys_open+0x44/0xbe
 [<ffffffff8005d28d>] tracesys+0xd5/0xe0

and the "echo" process is this:

bash R running task 0 3383 2927 (NOTLB)
bash D ffff810104634338 0 3614 2999 3615 (NOTLB)
 ffff8101289e3ae8 0000000000000086 0000000000000000 0000000000000000
 0000000000000000 0000000000000007 ffff81013f590860 ffff810128f4f860
 00000146725cc4fe 00000000000402e2 ffff81013f590a48 00000001800133e6
Call Trace:
 [<ffffffff80025390>] find_or_create_page+0x22/0x72
 [<ffffffff88650150>] :gfs2:just_schedule+0x0/0xe
 [<ffffffff88650159>] :gfs2:just_schedule+0x9/0xe
 [<ffffffff80063ac7>] __wait_on_bit+0x40/0x6e
 [<ffffffff88650150>] :gfs2:just_schedule+0x0/0xe
 [<ffffffff80063b61>] out_of_line_wait_on_bit+0x6c/0x78
 [<ffffffff8009dbe3>] wake_bit_function+0x0/0x23
 [<ffffffff8865014b>] :gfs2:gfs2_glock_wait+0x2b/0x30
 [<ffffffff88666f0d>] :gfs2:gfs2_do_trans_begin+0xd6/0x144
 [<ffffffff886536bf>] :gfs2:gfs2_createi+0x114/0xd28
 [<ffffffff8012d391>] sidtab_context_to_sid+0x93/0x1d9
 [<ffffffff801312b8>] security_compute_sid+0x307/0x329
 [<ffffffff801248d7>] avc_has_perm+0x43/0x55
 [<ffffffff8865e80f>] :gfs2:gfs2_create+0x65/0x143
 [<ffffffff8865360e>] :gfs2:gfs2_createi+0x63/0xd28
 [<ffffffff80039e74>] vfs_create+0xe6/0x158
 [<ffffffff8001a9d4>] open_namei+0x19d/0x6d5
 [<ffffffff80026cdb>] do_filp_open+0x1c/0x38
 [<ffffffff80019676>] do_sys_open+0x44/0xbe
 [<ffffffff8005d28d>] tracesys+0xd5/0xe0

I suppose it's holding something that the stat & open need, but it's blocked by the freeze.

-Eric
Created attachment 334446
Don't let gfs2_tool touch the mountpoint

gfs2_tool tries to stat() the mountpoint in order to get the device number and, from that, locate the sysfs freeze tunable. If the filesystem is frozen and another process then takes an exclusive lock on the mountpoint (e.g. touch creating a new file in the root directory), gfs2_tool hangs behind this lock at the stat() call. This patch makes freeze/unfreeze stat the block device instead of the mountpoint.
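For readers following along, the idea behind the fix can be sketched as below. This is not the attached patch (gfs2_tool is written in C); it is a minimal Python illustration that assumes the device can be looked up from /proc/mounts-style text. The helper names device_for_mountpoint and device_number are hypothetical and do not come from gfs2-utils.

```python
import os
import stat


def device_for_mountpoint(mountpoint, mounts_text):
    """Return the device field of the gfs2 entry mounted at `mountpoint`.

    `mounts_text` is text in /proc/mounts format. Reading that file does
    not stat() the mountpoint, so it should not queue behind a lock held
    on the frozen filesystem. (Assumption: paths contain no escaped
    characters such as \\040.)
    """
    target = os.path.normpath(mountpoint)
    device = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        if os.path.normpath(fields[1]) == target and fields[2] == "gfs2":
            device = fields[0]  # a later mount shadows an earlier one
    return device


def device_number(device_path):
    """stat() the block device node itself and return (major, minor)."""
    st = os.stat(device_path)
    if not stat.S_ISBLK(st.st_mode):
        raise ValueError("%s is not a block device" % device_path)
    # For a device node the device number is in st_rdev, not st_dev.
    return os.major(st.st_rdev), os.minor(st.st_rdev)
```

A caller would read /proc/mounts, pass its contents to device_for_mountpoint("/mnt/gfs2", ...), and hand the result to device_number(). Both steps only touch /proc and the device node under /dev, never the frozen mountpoint.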
Patch looks good to me.
Checked in patch to RHEL5, STABLE3 and master.
What version of gfs2-utils is this fixed in? Is there a scratch build somewhere? Using gfs2-utils-0.1.53-1.el5_3.2 and this appears to still be happening.
(In reply to comment #6)
> What version of gfs2-utils is this fixed in?

You could check gfs2-utils-0.1.55-1.el5. This bug is not listed in the changelog for that version, but it was checked into the tree before it was built.
(In reply to comment #7)
> (In reply to comment #6)
> > What version of gfs2-utils is this fixed in?
>
> You could check gfs2-utils-0.1.55-1.el5. This bug is not listed in the
> changelog for that version, but it was checked into the tree before it was
> built.

I don't know where to find that. I'll just be patient and wait for the errata release. :-) Thanks.
Tested on RHEL5.4 Snapshot 3 with gfs2-utils-0.1.61-1.el5, arch: x86_64 in kvm

# mount -tgfs2 /dev/VolGroup00/gfs2 /mnt/gfs2
# gfs2_tool freeze /mnt/gfs2
# echo foo > /mnt/gfs2/gronk &
[1] 2379
# gfs2_tool unfreeze /mnt/gfs2
[1]+ Done echo foo > /mnt/gfs2/gronk
# cat /mnt/gfs2/gronk
foo

Moving to VERIFIED
Should we be able to take cLVM snapshots now as a result of this being fixed?
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-1337.html