Bug 217438
| Summary: | scrolling kernel requests to mark mirror regions | | |
|---|---|---|---|
| Product: | [Retired] Red Hat Cluster Suite | Reporter: | Corey Marthaler <cmarthal> |
| Component: | cmirror | Assignee: | Jonathan Earl Brassow <jbrassow> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Cluster QE <mspqa-list> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 4 | CC: | agk, cfeist, dwysocha, jbrassow, mbroz, prockai |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2008-08-05 21:40:48 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
**Description** Corey Marthaler 2006-11-27 21:54:08 UTC
Possibly: 1) recovery complete

Is the fix for this in lvm2-2.02.17-1/lvm2-cluster-2.02.17-1? I'm still seeing these in the latest build.

This bug still exists. I did essentially what was done in the original report, that is, created a cmirror and began writing to it before it was fully synced (in this case I used GFS). I then noticed that the copy percent had gotten stuck at 88%. This causes the filesystem to deadlock.

```
[root@link-08 ~]# lvs -a -o +devices
  LV                                    VG             Attr   LSize   Origin Snap% Move Log                             Copy% Devices
  fail_primary_synced_2_legs            helter_skelter mwi-ao 800.00M                  fail_primary_synced_2_legs_mlog 88.00 fail_primary_synced_2_legs_mimage_0(0),fail_primary_synced_2_legs_mimage_1(0)
  [fail_primary_synced_2_legs_mimage_0] helter_skelter iwi-ao 800.00M                                                        /dev/sde1(0)
  [fail_primary_synced_2_legs_mimage_1] helter_skelter iwi-ao 800.00M                                                        /dev/sdf1(0)
  [fail_primary_synced_2_legs_mlog]     helter_skelter lwi-ao   4.00M                                                        /dev/sdb1(0)
[...]
```

```
Apr  4 09:50:12 link-08 kernel: dm-cmirror: Attempt to mark a region 0/6t9xT1Dd which is being recovered.
Apr  4 09:50:12 link-08 kernel: dm-cmirror: Current recoverer: 1
Apr  4 09:50:12 link-08 kernel: dm-cmirror: Mark requester : 4
Apr  4 09:50:12 link-08 kernel: dm-cmirror: Delaying mark to region 0, due to recovery
Apr  4 09:50:12 link-08 kernel: dm-cmirror: Attempt to mark a region 0/6t9xT1Dd which is being recovered.
Apr  4 09:50:12 link-08 kernel: dm-cmirror: Current recoverer: 1
Apr  4 09:50:12 link-08 kernel: dm-cmirror: Mark requester : 4
Apr  4 09:50:12 link-08 kernel: dm-cmirror: Delaying mark to region 0, due to recovery
Apr  4 09:50:13 link-08 kernel: dm-cmirror: Attempt to mark a region 0/6t9xT1Dd which is being recovered.
Apr  4 09:50:13 link-08 kernel: dm-cmirror: Current recoverer: 1
Apr  4 09:50:13 link-08 kernel: dm-cmirror: Mark requester : 4
Apr  4 09:50:13 link-08 kernel: dm-cmirror: Delaying mark to region 0, due to recovery
Apr  4 09:50:13 link-08 kernel: dm-cmirror: Attempt to mark a region 0/6t9xT1Dd which is being recovered.
Apr  4 09:50:13 link-08 kernel: dm-cmirror: Current recoverer: 1
Apr  4 09:50:13 link-08 kernel: dm-cmirror: Mark requester : 4
Apr  4 09:50:13 link-08 kernel: dm-cmirror: Delaying mark to region 0, due to recovery
Apr  4 09:50:14 link-08 kernel: dm-cmirror: Attempt to mark a region 0/6t9xT1Dd which is being recovered.
Apr  4 09:50:14 link-08 kernel: dm-cmirror: Current recoverer: 1
Apr  4 09:50:14 link-08 kernel: dm-cmirror: Mark requester : 4
Apr  4 09:50:14 link-08 kernel: dm-cmirror: Delaying mark to region 0, due to recovery
Apr  4 09:50:14 link-08 kernel: dm-cmirror: Attempt to mark a region 0/6t9xT1Dd which is being recovered.
Apr  4 09:50:14 link-08 kernel: dm-cmirror: Current recoverer: 1
Apr  4 09:50:14 link-08 kernel: dm-cmirror: Mark requester : 4
Apr  4 09:50:14 link-08 kernel: dm-cmirror: Delaying mark to region 0, due to recovery
Apr  4 09:50:15 link-08 kernel: dm-cmirror: Attempt to mark a region 0/6t9xT1Dd which is being recovered.
Apr  4 09:50:15 link-08 kernel: dm-cmirror: Current recoverer: 1
Apr  4 09:50:15 link-08 kernel: dm-cmirror: Mark requester : 4
Apr  4 09:50:15 link-08 kernel: dm-cmirror: Delaying mark to region 0, due to recovery
[...]
```

All the other nodes in the cluster say this:

```
Apr  4 14:13:12 link-07 kernel: dm-cmirror: Delaying mark to region 0, due to recovery
```

These are the current running versions:

```
[root@link-08 ~]# uname -ar
Linux link-08 2.6.9-50.ELsmp #1 SMP Tue Mar 6 18:04:58 EST 2007 x86_64 x86_64 x86_64 GNU/Linux
[root@link-08 ~]# rpm -q lvm2
lvm2-2.02.21-3.el4
[root@link-08 ~]# rpm -q lvm2-cluster
lvm2-cluster-2.02.21-3.el4
[root@link-08 ~]# rpm -q cmirror-kernel
cmirror-kernel-2.6.9-27.0
```

Hit another variation of this while attempting to umount a cmirror with GFS on top. In this case, the filesystem wasn't created until after the mirror was in sync, and the same was true of the I/O. Then a failure scenario was attempted, and again, no I/O was done until the mirror was in sync.

[...]
```
Apr  4 12:01:20 link-02 kernel: dm-cmirror: Attempt to mark a region 0/rto3kYQN which is being recovered.
Apr  4 12:01:20 link-02 kernel: dm-cmirror: Current recoverer: 3
Apr  4 12:01:20 link-02 kernel: dm-cmirror: Mark requester : 1
Apr  4 12:01:20 link-02 kernel: dm-cmirror: Delaying mark to region 0, due to recovery
Apr  4 12:01:21 link-02 kernel: dm-cmirror: Attempt to mark a region 0/rto3kYQN which is being recovered.
Apr  4 12:01:21 link-02 kernel: dm-cmirror: Current recoverer: 3
Apr  4 12:01:21 link-02 kernel: dm-cmirror: Mark requester : 1
Apr  4 12:01:21 link-02 kernel: dm-cmirror: Delaying mark to region 0, due to recovery
Apr  4 12:01:21 link-02 kernel: dm-cmirror: Attempt to mark a region 0/rto3kYQN which is being recovered.
Apr  4 12:01:21 link-02 kernel: dm-cmirror: Current recoverer: 3
Apr  4 12:01:21 link-02 kernel: dm-cmirror: Mark requester : 1
Apr  4 12:01:21 link-02 kernel: dm-cmirror: Delaying mark to region 0, due to recovery
[...]
```

Here is the stuck umount attempt:

```
umount        D 0000000000000000     0  6369   6376                (NOTLB)
000001003196fa48 0000000000000006 0000010001013a80 0000000100000008
0000000100000000 ffffffffffffffff 00000000fa6bba34 00000000a023f7aa
000001002fea77f0 0000000000000d1a
Call Trace:<ffffffff8030cea7>{io_schedule+38} <ffffffff8015a8f0>{wait_on_page_bit+180}
<ffffffff8015a796>{page_wake_function+0} <ffffffff8015a796>{page_wake_function+0}
<ffffffff8015a977>{wait_on_page_writeback_range+104} <ffffffff8015a69c>{__filemap_fdatawrite_range+95}
<ffffffffa02400dc>{:gfs:gfs_sync_buf+54} <ffffffffa024d8cb>{:gfs:inode_go_sync+87}
<ffffffffa024b83c>{:gfs:gfs_glock_drop_th+203} <ffffffffa0249fe5>{:gfs:run_queue+314}
<ffffffffa024ad8c>{:gfs:gfs_glock_dq+400} <ffffffffa024add4>{:gfs:gfs_glock_dq_uninit+9}
<ffffffffa026c9a5>{:gfs:gfs_statfs_sync+359} <ffffffffa026c9f5>{:gfs:gfs_make_fs_ro+63}
<ffffffffa0264100>{:gfs:gfs_put_super+630} <ffffffff8017f6cd>{generic_shutdown_super+198}
<ffffffffa02618fe>{:gfs:gfs_kill_sb+41} <ffffffff8017f5ee>{deactivate_super+95}
<ffffffff801952e2>{sys_umount+925} <ffffffff80182a67>{sys_newstat+17}
<ffffffff80110d91>{error_exit+0} <ffffffff8011026a>{system_call+126}
```

Proposing blocker flag, as we're starting to hit this more often.

*** Bug 235252 has been marked as a duplicate of this bug. ***

I've pushed the check for recovery/mark collision into the flush function, delaying the check until it is necessary. Testing has begun on the code, but the package has not yet been built.

assigned -> post

Bringing out of POST because I missed part of the check-in, so it was incomplete. Corey is testing the complete version now. After a cursory check, I checked in the rest of the code and put this bug back into POST.

post -> assigned

assigned -> post

post -> modified

Built 4/5/07.

Have not seen this issue with the latest fix. Marking verified.

Fixed in current release (4.7).