Description of problem:

Scenario: Kill both logs of synced 2 leg redundant log mirror(s)

********* Mirror hash info for this scenario *********
* names: syncd_both_logs_2legs_2logs_1
* sync: 1
* leg devices: /dev/sdf1 /dev/sdd1
* log devices: /dev/sdb1 /dev/sdg1
* failpv(s): /dev/sdb1 /dev/sdg1
* failnode(s): taft-01
* additional snap: /dev/sdf1
* leg fault policy: remove
* log fault policy: allocate
******************************************************

Creating mirror(s) on taft-01...
taft-01: lvcreate --mirrorlog mirrored -m 1 -n syncd_both_logs_2legs_2logs_1 -L 600M helter_skelter /dev/sdf1:0-1000 /dev/sdd1:0-1000 /dev/sdb1:0-150 /dev/sdg1:0-150

Creating a snapshot volume of each of the mirrors

PV=/dev/sdg1 syncd_both_logs_2legs_2logs_1_mlog_mimage_1: 1.4
PV=/dev/sdb1 syncd_both_logs_2legs_2logs_1_mlog_mimage_0: 1.4
PV=/dev/sdg1 syncd_both_logs_2legs_2logs_1_mlog_mimage_1: 1.4
PV=/dev/sdb1 syncd_both_logs_2legs_2logs_1_mlog_mimage_0: 1.4

Waiting until all mirrors become fully syncd...
0/1 mirror(s) are fully synced: ( 80.67% )
1/1 mirror(s) are fully synced: ( 100.00% )

Creating ext on top of mirror(s) on taft-01...
mke2fs 1.39 (29-May-2006)
Mounting mirrored ext filesystems on taft-01...

Writing verification files (checkit) to mirror(s) on...
---- taft-01 ----
checkit starting with:
CREATE
Num files: 100
Random Seed: 7619
Verify XIOR Stream: /tmp/checkit_syncd_both_logs_2legs_2logs_1
Working dir: /mnt/syncd_both_logs_2legs_2logs_1/checkit

<start name="taft-01_syncd_both_logs_2legs_2logs_1" pid="19564" time="Mon Sep 27 14:12:58 2010" type="cmd" />
Sleeping 10 seconds to get some outsanding EXT I/O locks before the failure

Verifying files (checkit) on mirror(s) on...
---- taft-01 ----
checkit starting with:
VERIFY
Verify XIOR Stream: /tmp/checkit_syncd_both_logs_2legs_2logs_1
Working dir: /mnt/syncd_both_logs_2legs_2logs_1/checkit

Disabling device sdb on taft-01
Disabling device sdg on taft-01

[DEADLOCK]

[root@taft-01 ~]# dmsetup ls
helter_skelter-snap0-cow (253, 10)
helter_skelter-syncd_both_logs_2legs_2logs_1_mlog_mimage_0-missing_0_0 (253, 12)
helter_skelter-syncd_both_logs_2legs_2logs_1_mimage_1 (253, 6)
helter_skelter-syncd_both_logs_2legs_2logs_1_mlog_mimage_1 (253, 3)
helter_skelter-syncd_both_logs_2legs_2logs_1_mimage_0 (253, 5)
helter_skelter-syncd_both_logs_2legs_2logs_1_mlog_mimage_0 (253, 2)
helter_skelter-snap0 (253, 8)
helter_skelter-syncd_both_logs_2legs_2logs_1 (253, 7)
helter_skelter-syncd_both_logs_2legs_2logs_1_mlog (253, 4)
helter_skelter-syncd_both_logs_2legs_2logs_1-real (253, 9)
helter_skelter-syncd_both_logs_2legs_2logs_1_mlog_mimage_1-missing_0_0 (253, 11)

Log from taft-01:

taft-01 qarshd[7632]: Running cmdline: echo offline > /sys/block/sdb/device/state &
taft-01 qarshd[7635]: Talking to peer 10.15.80.47:44602
taft-01 qarshd[7635]: Running cmdline: echo offline > /sys/block/sdg/device/state &
taft-01 kernel: sd 1:0:0:6: rejecting I/O to offline device
[...]
taft-01 kernel: sd 1:0:0:1: rejecting I/O to offline device
taft-01 kernel: device-mapper: raid1: All sides of mirror have failed.
taft-01 lvm[6327]: Another thread is handling an event. Waiting...
taft-01 lvm[6327]: Log device 253:4 has failed (D).
taft-01 lvm[6327]: Device failure in helter_skelter-syncd_both_logs_2legs_2logs_1-real.
taft-01 lvm[6327]: Couldn't find device with uuid xe95w7-0iAN-RE2G-mPaw-mXXY-eqzP-WDTdjq.
taft-01 kernel: sd 1:0:0:1: rejecting I/O to offline device
[...]
taft-01 kernel: sd 1:0:0:1: rejecting I/O to offline device
taft-01 qarshd[7639]: Running cmdline: pvs -a
taft-01 kernel: sd 1:0:0:1: rejecting I/O to offline device
[...]
taft-01 kernel: sd 1:0:0:6: rejecting I/O to offline device
taft-01 lvm[6327]: Mirror log status: 2 of 2 images failed - switching to core
taft-01 kernel: device-mapper: raid1: All sides of mirror have failed.
taft-01 lvm[6327]: Monitoring mirror device helter_skelter-syncd_both_logs_2legs_2logs_1_mlog for events.
taft-01 lvm[6327]: Another thread is handling an event. Waiting...
taft-01 lvm[6327]: Another thread is handling an event. Waiting...
taft-01 kernel: INFO: task pdflush:301 blocked for more than 120 seconds.
taft-01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
taft-01 kernel: pdflush D ffffffff80150839 0 301 87 302 300 (L-TLB)
taft-01 kernel: ffff81021ec41b30 0000000000000046 ffff81021504d380 ffff81021fb73200
taft-01 kernel: ffff81021fb73200 0000000000000009 ffff81021eccc820 ffff8101fff15100
taft-01 kernel: 00000053c6a5ed6c 000000000004fa30 ffff81021eccca08 0000000100000000
taft-01 kernel: Call Trace:
taft-01 kernel: [<ffffffff8006e1db>] do_gettimeofday+0x40/0x90
taft-01 kernel: [<ffffffff80028b14>] sync_page+0x0/0x43
taft-01 kernel: [<ffffffff800637ea>] io_schedule+0x3f/0x67
taft-01 kernel: [<ffffffff80028b52>] sync_page+0x3e/0x43
taft-01 kernel: [<ffffffff8006392e>] __wait_on_bit_lock+0x36/0x66
taft-01 kernel: [<ffffffff8003fc56>] __lock_page+0x5e/0x64
taft-01 kernel: [<ffffffff800a0a06>] wake_bit_function+0x0/0x23
taft-01 kernel: [<ffffffff8001cf60>] mpage_writepages+0x14f/0x37d
taft-01 kernel: [<ffffffff88050479>] :ext3:ext3_ordered_writepage+0x0/0x198
taft-01 kernel: [<ffffffff8005ac6f>] do_writepages+0x29/0x2f
taft-01 kernel: [<ffffffff8002fc33>] __writeback_single_inode+0x1ae/0x328
taft-01 kernel: [<ffffffff80020ff4>] sync_sb_inodes+0x1b5/0x26f
taft-01 kernel: [<ffffffff800a07c0>] keventd_create_kthread+0x0/0xc4
taft-01 kernel: [<ffffffff80050ffd>] writeback_inodes+0x82/0xd8
taft-01 kernel: [<ffffffff800c9714>] wb_kupdate+0xd4/0x14e
taft-01 kernel: [<ffffffff8005663c>] pdflush+0x0/0x1fb
taft-01 kernel: [<ffffffff8005678d>] pdflush+0x151/0x1fb
taft-01 kernel: [<ffffffff800c9640>] wb_kupdate+0x0/0x14e
taft-01 kernel: [<ffffffff8003290a>] kthread+0xfe/0x132
taft-01 kernel: [<ffffffff8009d64e>] request_module+0x0/0x14d
taft-01 kernel: [<ffffffff8005dfb1>] child_rip+0xa/0x11
taft-01 kernel: [<ffffffff800a07c0>] keventd_create_kthread+0x0/0xc4
taft-01 kernel: [<ffffffff8003280c>] kthread+0x0/0x132
taft-01 kernel: [<ffffffff8005dfa7>] child_rip+0x0/0x11
taft-01 kernel:
taft-01 kernel: INFO: task dmeventd:7545 blocked for more than 120 seconds.
taft-01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
taft-01 kernel: dmeventd D ffffffff80150839 0 7545 1 7678 7544 (NOTLB)
taft-01 kernel: ffff81020a3dbbd8 0000000000000086 0000000100000000 ffff8101ffd13080
taft-01 kernel: 0000000000000000 0000000000000008 ffff810209ca07e0 ffff8101fff15100
taft-01 kernel: 0000005247e193ae 000000000008e6ed ffff810209ca09c8 0000000100000000
taft-01 kernel: Call Trace:
taft-01 kernel: [<ffffffff80129317>] avc_alloc_node+0x3a/0x187
taft-01 kernel: [<ffffffff880eb530>] :dm_mod:dev_suspend+0x0/0x18c
taft-01 kernel: [<ffffffff8009daa3>] flush_cpu_workqueue+0x7f/0xad
taft-01 kernel: [<ffffffff800a09d8>] autoremove_wake_function+0x0/0x2e
taft-01 kernel: [<ffffffff8009dadf>] flush_workqueue+0xe/0x87
taft-01 kernel: [<ffffffff8810d63c>] :dm_mirror:mirror_presuspend+0x11d/0x126
taft-01 kernel: [<ffffffff880e8bb4>] :dm_mod:suspend_targets+0x33/0x43
taft-01 kernel: [<ffffffff880e85e6>] :dm_mod:dm_suspend+0x89/0x2db
taft-01 kernel: [<ffffffff8008cfa1>] default_wake_function+0x0/0xe
taft-01 kernel: [<ffffffff880eb58d>] :dm_mod:dev_suspend+0x5d/0x18c
taft-01 kernel: [<ffffffff880ebed8>] :dm_mod:ctl_ioctl+0x210/0x25b
taft-01 kernel: [<ffffffff80042181>] do_ioctl+0x55/0x6b
taft-01 kernel: [<ffffffff80030204>] vfs_ioctl+0x457/0x4b9
taft-01 kernel: [<ffffffff8004c633>] sys_ioctl+0x59/0x78
taft-01 kernel: [<ffffffff8005d28d>] tracesys+0xd5/0xe0
taft-01 kernel:
taft-01 kernel: INFO: task kmirrord:7483 blocked for more than 120 seconds.
taft-01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
taft-01 kernel: kmirrord D ffffffff80150839 0 7483 87 7484 6075 (L-TLB)
taft-01 kernel: ffff81020a455bb0 0000000000000046 ffff81020b08d668 ffff81020a455be0
taft-01 kernel: 0000000000000011 000000000000000a ffff8101ffdcf820 ffff8101ffdd87a0
taft-01 kernel: 00000051fdc32143 0000000000023daa ffff8101ffdcfa08 0000000300000000
taft-01 kernel: Call Trace:
taft-01 kernel: [<ffffffff8006e1db>] do_gettimeofday+0x40/0x90
taft-01 kernel: [<ffffffff8005a7d6>] getnstimeofday+0x10/0x28
taft-01 kernel: [<ffffffff800637ea>] io_schedule+0x3f/0x67
taft-01 kernel: [<ffffffff880ec4f8>] :dm_mod:sync_io+0xb6/0xf3
taft-01 kernel: [<ffffffff880ec76d>] :dm_mod:dm_io+0xb7/0xf1
taft-01 kernel: [<ffffffff880ec17d>] :dm_mod:vm_get_page+0x0/0x42
taft-01 kernel: [<ffffffff880ec0ce>] :dm_mod:vm_next_page+0x0/0x17
taft-01 kernel: [<ffffffff881016e3>] :dm_log:disk_flush+0x43/0x6a
taft-01 kernel: [<ffffffff8810ef4e>] :dm_mirror:do_mirror+0x93a/0xcbb
taft-01 kernel: [<ffffffff8008bae4>] find_busiest_group+0x20d/0x621
taft-01 kernel: [<ffffffff80062ff8>] thread_return+0x62/0xfe
taft-01 kernel: [<ffffffff8002e23f>] __wake_up+0x38/0x4f
taft-01 kernel: [<ffffffff8810e614>] :dm_mirror:do_mirror+0x0/0xcbb
taft-01 kernel: [<ffffffff8004d6b3>] run_workqueue+0x94/0xe4
taft-01 kernel: [<ffffffff80049eee>] worker_thread+0x0/0x122
taft-01 kernel: [<ffffffff800a07c0>] keventd_create_kthread+0x0/0xc4
taft-01 kernel: [<ffffffff80049fde>] worker_thread+0xf0/0x122
taft-01 kernel: [<ffffffff8008cfa1>] default_wake_function+0x0/0xe
taft-01 kernel: [<ffffffff800a07c0>] keventd_create_kthread+0x0/0xc4
taft-01 kernel: [<ffffffff8003290a>] kthread+0xfe/0x132
taft-01 kernel: [<ffffffff8005dfb1>] child_rip+0xa/0x11
taft-01 kernel: [<ffffffff800a07c0>] keventd_create_kthread+0x0/0xc4
taft-01 kernel: [<ffffffff8003280c>] kthread+0x0/0x132
taft-01 kernel: [<ffffffff8005dfa7>] child_rip+0x0/0x11

Version-Release number of selected component (if applicable):
2.6.18-194.11.3.el5

lvm2-2.02.73-2.el5 BUILT: Mon Aug 30 06:36:20 CDT 2010
lvm2-cluster-2.02.73-2.el5 BUILT: Mon Aug 30 06:38:05 CDT 2010
device-mapper-1.02.54-2.el5 BUILT: Fri Sep 10 12:00:05 CDT 2010
cmirror-1.1.39-10.el5 BUILT: Wed Sep 8 16:32:05 CDT 2010
kmod-cmirror-0.1.22-3.el5 BUILT: Tue Dec 22 13:39:47 CST 2009

How reproducible:
Every time
[root@taft-01 ~]# lvs -a -o +devices
  LV                                            VG             Attr   LSize   Origin                        Snap% Log                                Copy%  Convert Devices
  snap0                                         helter_skelter swi-a- 252.00M syncd_both_logs_2legs_2logs_1 0.00                                                    /dev/sdf1(150)
  syncd_both_logs_2legs_2logs_1                 helter_skelter owi-a- 600.00M                                     syncd_both_logs_2legs_2logs_1_mlog 100.00         syncd_both_logs_2legs_2logs_1_mimage_0(0),syncd_both_logs_2legs_2logs_1_mimage_1(0)
  [syncd_both_logs_2legs_2logs_1_mimage_0]      helter_skelter iwi-ao 600.00M                                                                                       /dev/sdf1(0)
  [syncd_both_logs_2legs_2logs_1_mimage_1]      helter_skelter iwi-ao 600.00M                                                                                       /dev/sdd1(0)
  [syncd_both_logs_2legs_2logs_1_mlog]          helter_skelter mwi-ao 4.00M                                                                          100.00         syncd_both_logs_2legs_2logs_1_mlog_mimage_0(0),syncd_both_logs_2legs_2logs_1_mlog_mimage_1(0)
  [syncd_both_logs_2legs_2logs_1_mlog_mimage_0] helter_skelter iwi-ao 4.00M                                                                                         /dev/sdb1(0)
  [syncd_both_logs_2legs_2logs_1_mlog_mimage_1] helter_skelter iwi-ao 4.00M                                                                                         /dev/sdg1(0)
Just a note that the mirror in comment #0 & #1 has a snapshot volume associated with it; however, this issue has also been reproduced without a snapshot involved.
Please identify the test; I'm not sure which one it is:

kill_primary_synced_2_legs
kill_primary_synced_3_legs
kill_primary_synced_core_log_2_legs
kill_secondary_synced_2_legs
kill_secondary_synced_3_legs
kill_secondary_synced_core_log_2_legs
kill_secondary_non_synced_2_legs
kill_secondary_non_synced_3_legs
kill_secondary_non_synced_core_log_2_legs
kill_log_synced_2_legs
kill_log_synced_3_legs
kill_log_non_synced_2_legs
kill_log_non_synced_3_legs
kill_primary_synced_4_legs
kill_primary_synced_core_log_4_legs
kill_secondary_synced_4_legs
kill_secondary_synced_core_log_4_legs
kill_secondary_non_synced_4_legs
kill_log_synced_4_legs
kill_log_non_synced_4_legs
kill_primary_and_log_synced_2_legs
kill_primary_and_log_synced_3_legs
kill_secondary_and_log_synced_2_legs
kill_secondary_and_log_synced_4_legs
kill_multiple_legs_synced_3_legs
kill_multiple_legs_synced_4_legs

[jbrassow@silver sts-root]$ grep syncd_both_logs_2legs_2logs_1 -r lvm2
[jbrassow@silver sts-root]$
Created attachment 453256 [details]
Output of 'lvconvert --repair --use-policies vg/lv -vvvv'

***
[root@bp-01 ~]# devices
  LV                 Copy%  Devices
  LogVol00                  /dev/sda2(0)
  LogVol01                  /dev/sda2(4451)
  lv                 100.00 lv_mimage_0(0),lv_mimage_1(0)
  [lv_mimage_0]             /dev/sdb1(0)
  [lv_mimage_1]             /dev/sdc1(0)
  [lv_mlog]          100.00 lv_mlog_mimage_0(0),lv_mlog_mimage_1(0)
  [lv_mlog_mimage_0]        /dev/sdf1(0)
  [lv_mlog_mimage_1]        /dev/sdg1(0)

[root@bp-01 ~]# ps -C dmeventd
PID TTY TIME CMD

[root@bp-01 ~]# off.sh sd[fg]
Turning off sdf
Turning off sdg

[root@bp-01 ~]# dmsetup status
vg-lv: 0 10485760 mirror 2 253:5 253:6 10240/10240 1 AA 3 disk 253:4 D
vg-lv_mimage_1: 0 10485760 linear
vg-lv_mimage_0: 0 10485760 linear
vg-lv_mlog_mimage_1: 0 8192 linear
vg-lv_mlog: 0 8192 mirror 2 253:2 253:3 7/8 1 DD 1 core
vg-lv_mlog_mimage_0: 0 8192 linear
VolGroup00-LogVol01: 0 20578304 linear
VolGroup00-LogVol00: 0 291700736 linear

[root@bp-01 ~]# lvconvert --repair --use-policies vg/lv -vvvv >& output.txt
***

lvconvert seems to be trying to suspend vg/lv - which will fail because the kernel target is blocking I/O (because its log has failed). I thought we had this logic right. Did something go missing in the rebase?
In '_lvconvert_mirrors_repair', after removing the entire log, I pause (in gdb) before going on to attempt re-adding the mirrored log and I see the following:

[root@bp-01 ~]# dmsetup table
vg-lv: 0 10485760 mirror core 3 1024 nosync block_on_error 2 253:5 0 253:6 0
vg-lv_mimage_1: 0 10485760 linear 8:33 2048
vg-lv_mimage_0: 0 10485760 linear 8:17 2048
vg-lv_mlog_mimage_1-missing_0_0: 0 8192 error
vg-lv_mlog_mimage_1: 0 8192 linear 253:8 0
vg-lv_mlog_mimage_0: 0 8192 linear 253:9 0
vg-lv_mlog_mimage_0-missing_0_0: 0 8192 error

The devices there seem to prevent us from adding the mirror log. Allowed to continue, we see the log is replaced, but it is a linear (not mirrored) log:

[root@bp-01 ~]# devices
  /dev/sdf1: open failed: No such device or address
  /dev/sdg1: open failed: No such device or address
  LV            Copy%  Devices
  LogVol00             /dev/sda2(0)
  LogVol01             /dev/sda2(4451)
  lv            100.00 lv_mimage_0(0),lv_mimage_1(0)
  [lv_mimage_0]        /dev/sdb1(0)
  [lv_mimage_1]        /dev/sdc1(0)
  [lv_mlog]            /dev/sde1(0)

[root@bp-01 ~]# dmsetup ls
vg-lv (253, 7)
vg-lv_mimage_1 (253, 6)
vg-lv_mimage_0 (253, 5)
vg-lv_mlog_mimage_1-missing_0_0 (253, 8)
vg-lv_mlog_mimage_1 (253, 3)
vg-lv_mlog (253, 4)
vg-lv_mlog_mimage_0 (253, 2)
vg-lv_mlog_mimage_0-missing_0_0 (253, 9)
VolGroup00-LogVol01 (253, 1)
VolGroup00-LogVol00 (253, 0)

... and we still have the old mlog devices left around.

This seems to be just one possible outcome. There seem to be other ways of failing. I'm not sure if they are all related to DM devices that are not being removed.
gdb is bouncing all over the place... I can correlate the line numbers, but it isn't doing them in the proper sequence:

(gdb) n
703 if (!log_count)
(gdb)
1250 if (!(lp->failed_pvs = _failed_pv_list(lv->vg)))
(gdb)
703 if (!log_count)
(gdb)
1250 if (!(lp->failed_pvs = _failed_pv_list(lv->vg)))
(gdb)
703 if (!log_count)
(gdb)
1265 if (failed_mirrors) {
(gdb)
1262 if (lp->mirrors == 1)
(gdb)
1265 if (failed_mirrors) {
(gdb)
1272 if (!_lv_update_log_type(cmd, lp, lv, lp->failed_pvs,

The _lv_update_log_type at 1272 is already too late... it skips over the one at 1259. Probably have to compile w/o optimization.
After attempting to remove the mirrored log
  _lvconvert_mirrors_repair -> _lv_update_log_type -> remove_mirror_log -> _remove_mirror_images -> replace_lv_with_error_segment
- followed by suspend

Why is there no new table loaded for the _mlog, which should be an error target?

[from dmsetup ls]
vg-lv_mlog_mimage_1 (253, 3)
vg-lv_mlog (253, 4)
vg-lv_mlog_mimage_0 (253, 2)

[from dmsetup info]
Name: vg-lv_mlog
State: SUSPENDED
Read Ahead: 256
Tables present: LIVE

And after the resume, why do I have -missing_ devices? What possible good are they? Especially since they aren't used?

[from dmsetup ls]
vg-lv_mlog_mimage_1-missing_0_0 (253, 8)
vg-lv_mlog_mimage_1 (253, 3)
vg-lv_mlog (253, 4)
vg-lv_mlog_mimage_0 (253, 2)
vg-lv_mlog_mimage_0-missing_0_0 (253, 9)

[from dmsetup table]
vg-lv_mlog_mimage_1-missing_0_0: 0 8192 error
vg-lv_mlog_mimage_1: 0 8192 linear 253:8 0
vg-lv_mlog: 0 8192 mirror core 3 1024 nosync block_on_error 2 253:2 0 253:3 0
vg-lv_mlog_mimage_0: 0 8192 linear 253:9 0
vg-lv_mlog_mimage_0-missing_0_0: 0 8192 error

Where did this behavior come from?
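For anyone who wants to spot leftover -missing_ mappings without reading the listing by eye, something along these lines might help. It is only a sketch: the function names are made up for illustration, and it assumes the "name (major, minor)" layout seen in the 'dmsetup ls' output pasted in this report.

```python
import re

# Matches one 'dmsetup ls' line as pasted in this report, e.g.
#   vg-lv_mlog_mimage_1-missing_0_0 (253, 8)
_LS_LINE = re.compile(r"^(\S+)\s+\((\d+),\s*(\d+)\)$")


def parse_dmsetup_ls(text):
    """Return {mapping name: (major, minor)} for every line that matches."""
    mappings = {}
    for line in text.splitlines():
        match = _LS_LINE.match(line.strip())
        if match:
            mappings[match.group(1)] = (int(match.group(2)), int(match.group(3)))
    return mappings


def leftover_missing(mappings):
    """Return the sorted names of '-missing_' error mappings still present."""
    return sorted(name for name in mappings if "-missing_" in name)


# Sample copied from the 'dmsetup ls' output taken after the resume.
SAMPLE_LS = """\
vg-lv_mlog_mimage_1-missing_0_0 (253, 8)
vg-lv_mlog_mimage_1 (253, 3)
vg-lv_mlog (253, 4)
vg-lv_mlog_mimage_0 (253, 2)
vg-lv_mlog_mimage_0-missing_0_0 (253, 9)
"""

if __name__ == "__main__":
    for name in leftover_missing(parse_dmsetup_ls(SAMPLE_LS)):
        print("leftover mapping:", name)
```

On a live system the text would come from running 'dmsetup ls' rather than from the embedded sample.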
Jon, about gdb jumping around: you need to compile with -O0 instead of -O2 to get the runs straight. Optimized code does not usually follow the control flow of the program straightforwardly. I will have a look at replace_lv_with_error_segment. Since you can reproduce this problem, do you think it would be possible to script this down as a regression test? It would be very valuable, both for hunting this down and for making sure it does not regress in the future... It seems you don't even need dmeventd to reproduce the problem, is that right?
Jon, what version are you using here? I can kill a mirrored log and have lvconvert --repair clean it up (and even replace the legs) here, without hitting any bugs. That is with the current CVS, although nothing has changed recently in this respect. The ...-missing_0_0 mappings do not appear (they may appear during the repair temporarily, but after it is done, there's no debris left).
I know about gdb (it's why I mentioned recompiling), but I'm trying to stick with the RPMs. I'll abandon the RPMs soon in favor of CVS head.

Versions:
lvm2-2.02.73-2.el5
lvm2-cluster-2.02.73-2.el5
device-mapper-multipath-0.4.7-39.el5
device-mapper-1.02.54-2.el5
device-mapper-event-1.02.54-2.el5

The components necessary to reproduce are:
1) mirror with a mirrored log [from comment #11 you have this]
2) I/O must be occurring at time of failure [I mount and do loops of kernel untars]
3) kill both devices containing the mirrored log

I know you are doing 1 & 3 - perhaps not 2?

Ok, I won't worry about the -missing_* mappings for now. Even w/o that, there is still no new table for _mlog - which should be error after the above mentioned suspend.
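As a starting point for scripting the three components above, here is a rough dry-run sketch. It only builds the command strings already seen in this report (the lvcreate form from comment #0, the sysfs offline trick from the qarshd log, and the lvconvert invocation) and executes nothing. The function names, the default size, and the assumption that PVs are named like /dev/sdX1 are all mine; the I/O load step is left as a placeholder because the exact commands were never posted.

```python
import os


def disk_of(pv):
    """Map a partition such as /dev/sdf1 to its disk name (sdf).

    Assumption: simple sdXN naming, as used on the machines in this report.
    """
    return os.path.basename(pv).rstrip("0123456789")


def build_repro_plan(vg, lv, leg_pvs, log_pvs, size="600M"):
    """Return the reproduction steps as a list of shell command strings."""
    plan = []
    # 1) mirror with a mirrored log
    pvs = " ".join(list(leg_pvs) + list(log_pvs))
    plan.append(
        "lvcreate --mirrorlog mirrored -m 1 -n %s -L %s %s %s" % (lv, size, vg, pvs)
    )
    # 2) I/O must be occurring at the time of failure (placeholder only)
    plan.append("# start an I/O load on /dev/%s/%s here" % (vg, lv))
    # 3) kill both devices containing the mirrored log
    for pv in log_pvs:
        plan.append("echo offline > /sys/block/%s/device/state" % disk_of(pv))
    # Wait for the failure to show up in the status, then repair by hand.
    plan.append("dmsetup status")
    plan.append("lvconvert --repair --use-policies %s/%s" % (vg, lv))
    return plan


if __name__ == "__main__":
    steps = build_repro_plan(
        "vg", "lv", ["/dev/sdb1", "/dev/sdc1"], ["/dev/sdf1", "/dev/sdg1"]
    )
    print("\n".join(steps))
```

Printing the plan first makes it easy to review the device names before anything is taken offline.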
Yes, the bug can be hit without going through dmeventd. I move dmeventd so I can run 'lvconvert --repair' directly.
I should mention too that if you are going to run 'lvconvert --repair', I typically wait for 'dmsetup status' to show that the devices have failed before running the command (note the 'D's):

[root@bp-01 ~]# dmsetup status
vg-lv: 0 10485760 mirror 2 253:5 253:6 10240/10240 1 AA 3 disk 253:4 D
vg-lv_mimage_1: 0 10485760 linear
vg-lv_mimage_0: 0 10485760 linear
vg-lv_mlog_mimage_1-missing_0_0: 0 8192 error
vg-lv_mlog_mimage_1: 0 8192 linear
vg-lv_mlog: 0 8192 mirror 2 253:2 253:3 7/8 1 DD 1 core
vg-lv_mlog_mimage_0: 0 8192 linear
vg-lv_mlog_mimage_0-missing_0_0: 0 8192 error
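The "wait for the 'D's" check could be automated roughly as follows. This is a sketch, not a reference parser: the field layout (device count, devices, sync ratio, a health string such as AA or DD, then the log type and, for a disk log, its device and health character) is inferred only from the two mirror status lines pasted in this thread, and the function names are hypothetical.

```python
def parse_mirror_status(line):
    """Parse one 'dmsetup status' line; return a dict for mirror targets, else None."""
    name, _, rest = line.partition(":")
    fields = rest.split()
    # fields: start, length, target type, then target-specific status
    if len(fields) < 3 or fields[2] != "mirror":
        return None
    ndev = int(fields[3])
    info = {
        "name": name.strip(),
        "devices": fields[4:4 + ndev],
        "sync": fields[4 + ndev],
        "leg_health": fields[6 + ndev],
        "log_type": fields[8 + ndev],
        "log_health": "",
    }
    if info["log_type"] == "disk":
        # a disk log is followed by its device and a health character
        info["log_health"] = fields[10 + ndev]
    return info


def failed_mirrors(status_text):
    """Return names of mirrors whose legs or log show a 'D' (dead) flag."""
    failed = []
    for line in status_text.splitlines():
        info = parse_mirror_status(line)
        if info and "D" in info["leg_health"] + info["log_health"]:
            failed.append(info["name"])
    return failed


# Sample copied from the status output above.
SAMPLE_STATUS = """\
vg-lv: 0 10485760 mirror 2 253:5 253:6 10240/10240 1 AA 3 disk 253:4 D
vg-lv_mimage_1: 0 10485760 linear
vg-lv_mlog_mimage_1-missing_0_0: 0 8192 error
vg-lv_mlog: 0 8192 mirror 2 253:2 253:3 7/8 1 DD 1 core
"""

if __name__ == "__main__":
    print(failed_mirrors(SAMPLE_STATUS))
```

A wrapper script could poll 'dmsetup status' until failed_mirrors() returns a non-empty list and only then run 'lvconvert --repair'.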
The following change fixes the problem:

--- old-upstream/lib/metadata/mirror.c 2010-10-14 18:38:37.000000000 +0200
+++ new-upstream/lib/metadata/mirror.c 2010-10-14 18:38:37.000000000 +0200
@@ -905,6 +905,11 @@ static int _remove_mirror_images(struct
 		return 0;
 	}
 
+	if (!vg_write(detached_log_lv->vg)) {
+		log_error("intermediate VG write failed.");
+		return 0;
+	}
+
 	/*
 	 * Flush all I/Os held by mirrored log.
 	 */
@@ -915,6 +920,12 @@ static int _remove_mirror_images(struct
 		return 0;
 	}
 
+	if (!vg_commit(detached_log_lv->vg)) {
+		if (!resume_lv(detached_log_lv->vg->cmd, detached_log_lv))
+			stack;
+		return_0;
+	}
+
 	if (!resume_lv(detached_log_lv->vg->cmd, detached_log_lv)) {
 		log_error("Failed to resume %s",
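To make the ordering introduced by the patch easier to see, here is a toy model of the control flow. It is not LVM code: the step names merely mirror the calls in the diff, the suspend step between the two hunks is an assumption based on the "followed by suspend" note earlier in this report, and the Recorder class is a hypothetical stand-in used only to log the call order.

```python
def remove_log_sequence(ops):
    """Toy model: write metadata, suspend, commit, resume.

    If the commit fails, the device is still resumed before giving up,
    as in the second hunk of the patch.
    """
    if not ops.vg_write():
        return False
    if not ops.suspend():  # assumed step, not visible in the diff context
        return False
    if not ops.vg_commit():
        ops.resume()  # do not leave the detached log suspended
        return False
    return ops.resume()


class Recorder:
    """Hypothetical stand-in that records calls and can fail chosen steps."""

    def __init__(self, fail=()):
        self.calls = []
        self.fail = set(fail)

    def _step(self, name):
        self.calls.append(name)
        return name not in self.fail

    def vg_write(self):
        return self._step("vg_write")

    def suspend(self):
        return self._step("suspend")

    def vg_commit(self):
        return self._step("vg_commit")

    def resume(self):
        return self._step("resume")


if __name__ == "__main__":
    ok = Recorder()
    print(remove_log_sequence(ok), ok.calls)
    bad = Recorder(fail={"vg_commit"})
    print(remove_log_sequence(bad), bad.calls)
```

The point of the model is only that a resume happens on both the success path and the commit-failure path, so the detached log is never left suspended.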
Jon, The H_S test case is "kill_both_logs_2_legs_2_logs". It doesn't show up because it's been commented out in lvm2/lib/helter_skelter/HS_Scenarios.pm due to this issue.
New patch created, tested, and sent to lvm-devel
Created attachment 453534 [details] Patch that was sent to lvm-devel
Fix in lvm2-2.02.74-1.el5.
Fix verified in lvm2-2.02.74-1.el5.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2011-0052.html