Description of problem:

May 26 15:01:20 taft-04 qarshd[7933]: Running cmdline: lvconvert -m 0 helter_skelter/nonsyncd_log_4legs_1
May 26 15:01:20 taft-04 lvm[7244]: No longer monitoring mirror device helter_skelter-nonsyncd_log_4legs_1 for events
May 26 15:01:20 taft-04 kernel: lvconvert[7934]: segfault at 0000000000000010 rip 000000000045008a rsp 00007fff6591cc20 error

This was while running single machine device failure testing:

Scenario: Kill disk log of non synced 4 leg mirror(s)

****** Mirror hash info for this scenario ******
* name:        nonsyncd_log_4legs
* sync:        0
* num mirrors: 1
* disklog:     /dev/sdg1
* failpv(s):   /dev/sdg1
* leg devices: /dev/sde1 /dev/sdf1 /dev/sdh1 /dev/sdd1
************************************************

Creating mirror(s) on taft-04...
taft-04: lvcreate -m 3 -n nonsyncd_log_4legs_1 -L 600M helter_skelter /dev/sde1:0-1000 /dev/sdf1:0-1000 /dev/sdh1:0-1000 /dev/sdd1:0-1000 /dev/sdg1:0-150

Continuing on without fully syncd mirrors, currently at...
        ( 1=2.83% )

Creating ext on top of mirror(s) on taft-04...
mke2fs 1.39 (29-May-2006)
Mounting mirrored ext filesystems on taft-04...

Writing verification files (checkit) to mirror(s) on...
        ---- taft-04 ----

<start name="taft-04_1" pid="8732" time="Tue May 26 15:00:35 2009" type="cmd" />
Sleeping 10 seconds to get some outsanding EXT I/O locks before the failure
Verifying files (checkit) on mirror(s) on...
        ---- taft-04 ----

Disabling device sdg on taft-04

Attempting I/O to cause mirror down conversion(s) on taft-04
10+0 records in
10+0 records out
41943040 bytes (42 MB) copied, 0.151064 seconds, 278 MB/s
Verifying the down conversion of the failed mirror(s)
  /dev/sdg1: open failed: No such device or address
Verifying FAILED device /dev/sdg1 is *NOT* in the volume(s)
  /dev/sdg1: open failed: No such device or address
Verifying LOG device /dev/sdg1 is *NOT* in the linear(s)
  /dev/sdg1: open failed: No such device or address
Verifying LEG device /dev/sde1 *IS* in the volume(s)
  /dev/sdg1: open failed: No such device or address
Verifying LEG device /dev/sdf1 *IS* in the volume(s)
  /dev/sdg1: open failed: No such device or address
Verifying LEG device /dev/sdh1 *IS* in the volume(s)
  /dev/sdg1: open failed: No such device or address
Verifying LEG device /dev/sdd1 *IS* in the volume(s)
  /dev/sdg1: open failed: No such device or address
Verify the dm devices associated with /dev/sdg1 are no longer present
Verify that the mirror image order remains the same after the down conversion
  /dev/sdg1: open failed: No such device or address
  /dev/sdg1: open failed: No such device or address
  /dev/sdg1: open failed: No such device or address
  /dev/sdg1: open failed: No such device or address
Verifying files (checkit) on mirror(s) on...
        ---- taft-04 ----

Enabling device sdg on taft-04
Recreating PVs /dev/sdg1
  WARNING: Volume group helter_skelter is not consistent
  WARNING: Volume Group helter_skelter is not consistent
  WARNING: Volume group helter_skelter is not consistent
Extending the recreated PVs back into VG helter_skelter
Since we can't yet up convert existing mirrors, down converting to linear(s)
on taft-04 before re-converting back to original mirror(s)
couldn't down convert mirror to linear

Version-Release number of selected component (if applicable):
2.6.18-149.el5
lvm2-2.02.46-2.el5
device-mapper-1.02.32-1.el5

I'll attempt to reproduce and add more info...
Created attachment 345664: log from taft-04 before the segfault
Reproduced this.

May 27 14:52:47 taft-04 qarshd[19692]: Running cmdline: lvconvert -m 0 helter_skelter/nonsyncd_log_4legs_1
May 27 14:52:47 taft-04 lvm[7341]: No longer monitoring mirror device helter_skelter-nonsyncd_log_4legs_1 for events
May 27 14:52:48 taft-04 kernel: lvconvert[19693]: segfault at 0000000000000010 rip 000000000045008a rsp 00007fff93dcb0d0 error 4
I also reproduced this with lvm2-2.02.46-2.el5. I was able to gather a core and produce this backtrace:

Core was generated by `lvconvert -m 0 /dev/mirror_sanity/mirror_2_linear'.
Program terminated with signal 11, Segmentation fault.
[New process 18262]
#0 0x000000000045008a in _lv_read_ahead_single (lv=<value optimized out>, data=0x7fff6fc0726c) at metadata/metadata.c:1427
1427 dev_get_read_ahead(seg_pv(seg, 0)->dev, &seg_read_ahead);
(gdb) bt
#0 0x000000000045008a in _lv_read_ahead_single (lv=<value optimized out>, data=0x7fff6fc0726c) at metadata/metadata.c:1427
#1 0x000000000044eb0e in _lv_postorder_visit (lv=0x1c48550, fn=0x450050 <_lv_read_ahead_single>, data=0x7fff6fc0726c) at metadata/metadata.c:1334
#2 0x000000000044ebf6 in _lv_postorder (lv=0x1c485e8, fn=0x7fff6fc071ec, data=0x0) at metadata/metadata.c:1352
#3 0x000000000044ec79 in lv_calculate_readhead (lv=0x1c485e8) at metadata/metadata.c:1440
#4 0x000000000042aa14 in _lv_info (cmd=0x1c144b0, lv=0x1c48550, with_mknodes=0, info=0x7fff6fc07320, with_open_count=0, with_read_ahead=0, by_uuid_only=0) at activate/activate.c:475
#5 0x000000000042ac0b in lv_info (cmd=0x1c485e8, lv=0x7fff6fc071ec, info=0x0, with_open_count=-1400596269, with_read_ahead=29670480) at activate/activate.c:486
#6 0x000000000042b63f in _lv_activate (cmd=0x1c144b0, lvid_s=<value optimized out>, exclusive=0, filter=1) at activate/activate.c:1088
#7 0x0000000000469c59 in _file_lock_resource (cmd=0x1c144b0, resource=0x7fff6fc085d0 "T0RbQZw61u7Z53rz2kzlsWRAYuhd6i7QEMdSY2hUk5vFROdAvMJhg5khzhbcmNs0", flags=57) at locking/file_locking.c:258
#8 0x0000000000446ef7 in _lock_vol (cmd=0x1c144b0, resource=0x7fff6fc085d0 "T0RbQZw61u7Z53rz2kzlsWRAYuhd6i7QEMdSY2hUk5vFROdAvMJhg5khzhbcmNs0", flags=57) at locking/locking.c:349
#9 0x0000000000447330 in lock_vol (cmd=0x1c144b0, vol=0x1c3a500 "T0RbQZw61u7Z53rz2kzlsWRAYuhd6i7QEMdSY2hUk5vFROdAvMJhg5khzhbcmNs0", flags=57) at locking/locking.c:401
#10 0x00000000004558c9 in _delete_lv (mirror_lv=0x1c3a2f0, lv=0x1c3a500) at metadata/mirror.c:370
#11 0x0000000000456156 in _remove_mirror_images (lv=0x1c3a2f0, num_removed=1, removable_pvs=<value optimized out>, remove_log=1, collapse=0, removed=0x7fff6fc08814) at metadata/mirror.c:627
#12 0x0000000000456655 in remove_mirror_images (lv=0x1c3a2f0, num_mirrors=<value optimized out>, removable_pvs=0x0, remove_log=1) at metadata/mirror.c:673
#13 0x0000000000410e22 in lvconvert_single (cmd=0x1c144b0, lv=0x1c3a2f0, handle=0x7fff6fc088e0) at lvconvert.c:578
#14 0x00000000004116f5 in lvconvert (cmd=0x1c144b0, argc=<value optimized out>, argv=<value optimized out>) at lvconvert.c:876
#15 0x00000000004183aa in lvm_run_command (cmd=0x1c144b0, argc=1, argv=0x7fff6fc0ab68) at lvmcmdline.c:1007
#16 0x0000000000418798 in lvm2_main (argc=4, argv=0x7fff6fc0ab68) at lvmcmdline.c:1343
#17 0x00002b7face7b994 in __libc_start_main (main=0x42a780 <main>, argc=4, ubp_av=0x7fff6fc0ab68, init=<value optimized out>, fini=<value optimized out>, rtld_fini=<value optimized out>, stack_end=0x7fff6fc0ab58) at libc-start.c:231
#18 0x000000000040da69 in _start ()
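Frames #0 to #3 show lv_calculate_readhead() reaching the crashing _lv_read_ahead_single() callback through _lv_postorder() and _lv_postorder_visit(). In other words, the readahead callback is applied to LVs by a postorder walk. The following is a minimal sketch of that traversal pattern only. All of the types and functions (struct lv_node, visit_fn, lv_postorder_visit, record_visit) are hypothetical simplifications, not LVM2 code. In this model every sub-LV, such as a mirror image, is visited before the LV that references it. The callback therefore also runs on an image whose mapping may have been swapped for an error segment.

```c
#include <stddef.h>

/* Hypothetical, simplified model: an LV's segment references sub-LVs
 * (e.g. mirror images). Leaf LVs simply have area_count == 0 here. */
#define MAX_SUB_LVS 8

struct lv_node {
	const char *name;
	size_t area_count;                   /* number of sub-LVs referenced */
	struct lv_node *areas[MAX_SUB_LVS];  /* the referenced sub-LVs */
};

typedef int (*visit_fn)(struct lv_node *lv, void *data);

/* Postorder walk: visit every sub-LV first, then the LV itself.
 * Stops and returns 0 as soon as the callback fails, else returns 1.
 * No cycle detection: assumes the LVs form a tree. */
static int lv_postorder_visit(struct lv_node *lv, visit_fn fn, void *data)
{
	size_t s;

	for (s = 0; s < lv->area_count; s++)
		if (!lv_postorder_visit(lv->areas[s], fn, data))
			return 0;

	return fn(lv, data);
}

/* Example callback: record the order in which LVs are visited. */
struct visit_log {
	const char *order[16];
	size_t n;
};

static int record_visit(struct lv_node *lv, void *data)
{
	struct visit_log *vl = data;

	if (vl->n >= sizeof(vl->order) / sizeof(vl->order[0]))
		return 0;  /* log full: stop the walk */
	vl->order[vl->n++] = lv->name;
	return 1;
}
```

Called on a mirror LV with two image sub-LVs, record_visit sees both images first and the mirror LV last. A callback that assumes every visited LV has a usable first area will therefore also be run on the images.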
Created attachment 345818: core dump corresponding to comment #3

Here's the core dump from my x86_64 system.
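This looks like a bug in the new readahead code. During vgreduce, the failed mirror image is replaced with an error segment, and this segment type always sets the segment's area_count to 0. The code therefore cannot assume that the first area is always present, as _lv_read_ahead_single() does when it calls dev_get_read_ahead(seg_pv(seg, 0)->dev, &seg_read_ahead).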
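The following is a minimal, self-contained sketch of the kind of guard this calls for. The types struct pv, struct seg_area and struct lv_segment, and the helper seg_first_pv_read_ahead(), are hypothetical simplified stand-ins, not LVM2's real structures or the upstream patch. The sketch only shows that a segment with area_count == 0, like the error segment left behind by vgreduce, must be rejected before area 0 is dereferenced.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for LVM2's internal types. */
enum area_type { AREA_UNASSIGNED, AREA_PV, AREA_LV };

struct pv {
	uint32_t read_ahead;      /* device read-ahead, in sectors */
};

struct seg_area {
	enum area_type type;
	struct pv *pv;            /* meaningful only when type == AREA_PV */
};

struct lv_segment {
	uint32_t area_count;      /* 0 for an "error" segment */
	struct seg_area *areas;   /* may be NULL when area_count == 0 */
};

/*
 * Read-ahead contribution of one segment, taken from the device behind
 * its first area. Returns 1 and sets *read_ahead on success; returns 0
 * and leaves *read_ahead untouched when there is no PV-backed area 0.
 */
static int seg_first_pv_read_ahead(const struct lv_segment *seg,
				   uint32_t *read_ahead)
{
	/* The missing check: an error segment has no areas at all. */
	if (seg->area_count == 0 || seg->areas == NULL)
		return 0;

	/* Area 0 may also be a sub-LV or unassigned rather than a PV. */
	if (seg->areas[0].type != AREA_PV || seg->areas[0].pv == NULL)
		return 0;

	*read_ahead = seg->areas[0].pv->read_ahead;
	return 1;
}
```

A caller computing an LV's read-ahead would skip any segment for which the helper returns 0, instead of dereferencing area 0. The actual fix is the patch posted to lvm-devel and may differ in detail.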
Patch sent for review here https://www.redhat.com/archives/lvm-devel/2009-May/msg00237.html
*** Bug 502648 has been marked as a duplicate of this bug. ***
Fixed upstream, setting bug to POST for now.
Fixed in lvm2-2_02_46-3_el5.
Fix verified in lvm2-2.02.46-8.el5.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-1393.html