Description of problem:

I recently upgraded from FC9 to FC10 via Anaconda and found that 2 out of 3 LVMs that I had no longer mount. Both arrays are pairs of identical hard drives with JFS partitions on them, and they worked properly from FC5 through FC9. This is the first time they have given me a problem. Yum shows everything up to date as well. The volumes are recognized by the LVM manager and appear to be intact, but I get a series of errors when trying to mount.

dmesg shows messages like:

device-mapper: table: 253:2: striped: Couldn't parse stripe destination
device-mapper: ioctl: error adding target to table

vgchange shows:

[root@asparagus ~]# vgchange -a y
File descriptor 4 left open
File descriptor 9 left open
File descriptor 10 left open
File descriptor 11 left open
device-mapper: reload ioctl failed: No such device or address
device-mapper: reload ioctl failed: No such device or address
device-mapper: reload ioctl failed: No such device or address
device-mapper: reload ioctl failed: No such device or address
device-mapper: reload ioctl failed: No such device or address
5 logical volume(s) in volume group "250GBx2" now active
device-mapper: reload ioctl failed: No such device or address
device-mapper: reload ioctl failed: No such device or address
device-mapper: reload ioctl failed: No such device or address
3 logical volume(s) in volume group "750GBx2" now active
2 logical volume(s) in volume group "VolGroup00" now active

vgscan finds them ok:

[root@asparagus ~]# vgscan
File descriptor 4 left open
File descriptor 9 left open
File descriptor 10 left open
File descriptor 11 left open
Reading all physical volumes. This may take a while...
Found volume group "250GBx2" using metadata type lvm2
Found volume group "750GBx2" using metadata type lvm2
Found volume group "VolGroup00" using metadata type lvm2

pvscan finds them ok:

[root@asparagus ~]# pvscan
File descriptor 4 left open
File descriptor 9 left open
File descriptor 10 left open
File descriptor 11 left open
PV /dev/sdf1   VG 250GBx2      lvm2 [232.88 GB / 0 free]
PV /dev/sdg1   VG 250GBx2      lvm2 [232.88 GB / 0 free]
PV /dev/sde1   VG 750GBx2      lvm2 [698.63 GB / 0 free]
PV /dev/sdd1   VG 750GBx2      lvm2 [698.63 GB / 0 free]
PV /dev/sda2   VG VolGroup00   lvm2 [37.97 GB / 32.00 MB free]
Total: 5 [1.86 TB] / in use: 5 [1.86 TB] / in no VG: 0 [0 ]

lvscan shows them ok as well:

[root@asparagus ~]# lvscan
File descriptor 4 left open
File descriptor 9 left open
File descriptor 10 left open
File descriptor 11 left open
ACTIVE   '/dev/250GBx2/250GBx2_Vol00' [100.00 GB] inherit
ACTIVE   '/dev/250GBx2/250GBx2_Vol01' [100.00 GB] inherit
ACTIVE   '/dev/250GBx2/250GBx2_Vol02' [100.00 GB] inherit
ACTIVE   '/dev/250GBx2/250GBx2_Vol03' [100.00 GB] inherit
ACTIVE   '/dev/250GBx2/250GBx2_Vol04' [65.77 GB] inherit
ACTIVE   '/dev/750GBx2/750GBx2_Vol00' [500.00 GB] inherit
ACTIVE   '/dev/750GBx2/750GBx2_Vol01' [500.00 GB] inherit
ACTIVE   '/dev/750GBx2/750GBx2_Vol02' [397.27 GB] inherit
ACTIVE   '/dev/VolGroup00/LogVol00' [36.00 GB] inherit
ACTIVE   '/dev/VolGroup00/LogVol01' [1.94 GB] inherit

Version-Release number of selected component (if applicable):
lvm2-2.02.39-6.fc10.i386
device-mapper-multipath-0.4.8-7.fc10.i386
device-mapper-devel-1.02.27-6.fc10.i386
device-mapper-libs-1.02.27-6.fc10.i386
device-mapper-1.02.27-6.fc10.i386
dmraid-libs-1.0.0.rc15-2.fc10.i386
dmraid-1.0.0.rc15-2.fc10.i386

How reproducible: Very

Steps to Reproduce:
1. Boot
2. Check dmesg for errors
3. Unable to mount volumes
Booting from the FC10 rescue CD recognizes all LVMs properly. Other than the kernel modules loaded being different, the only other notable difference is that the rescue kernel is i586 (instead of i686). I tried making an initrd with wait-scsi-scan, and an additional 60-second sleep before the stabilization init line, to no avail. The problem seems to surface at:

lvm vgscan --ignorelockingfailure

Also noteworthy: /boot is on /dev/sdc (not sda). The failing LVMs are on sata_sil and sata_promise RAID controllers. The functioning LVM is just a normal PATA drive.
Created attachment 325866 [details]
-vvv output of vgchange activating the VG .. shows ioctl error

The important part of the output is:

Loading 250GBx2-250GBx2_Vol00 table
Adding target: 0 209715200 striped 2 256 8:81 384 8:97 384
dm table (253:2) OF [16384]
dm reload (253:2) NF [16384]
device-mapper: reload ioctl failed: No such device or address
Created attachment 326024 [details]
LVM dump results

Hopefully this lvmdump provides some more information.
Not related to FS type. I re-created one of the LVMs with ext3 partitions and still had the same results (still striped, though). Maybe something changed with striped between FC9 and FC10?
There are probably two subsequent problems:

1) The initrd for some reason wrongly activates the LVs in the 250GBx2 and 750GBx2 VGs with empty tables. Tables from the lvmdump:

250GBx2-250GBx2_Vol04:
250GBx2-250GBx2_Vol03:
250GBx2-250GBx2_Vol02:
250GBx2-250GBx2_Vol01:
250GBx2-250GBx2_Vol00:
750GBx2-750GBx2_Vol02:
750GBx2-750GBx2_Vol01:
750GBx2-750GBx2_Vol00:
VolGroup00-LogVol01: 0 4063232 linear 8:2 75497856
VolGroup00-LogVol00: 0 75497472 linear 8:2 384

2) lvm is slightly confused, and the "vgchange" command will not activate the correct mapping.

If you manually remove all volumes with an empty table (see "dmsetup table", then "dmsetup remove 250GBx2-250GBx2_Vol04" etc.) and run "vgchange -a y" again, the striped device will probably be back for the moment.

Anyway, 1) is I think a mkinitrd problem, but for 2) lvm should handle this situation properly (at least print a correct error message and not show a wrong number of activated volumes).
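The manual cleanup described above can be scripted. A minimal sketch, assuming (as in the listing above) that "dmsetup table" prints nothing after the colon for a device activated with an empty table; run as root:

```shell
#!/bin/sh
# Sketch: find dm devices whose loaded table is empty, remove them,
# then retry VG activation.
dmsetup table | awk 'NF == 1 { sub(/:$/, "", $1); print $1 }' |
while read -r dev; do
    echo "removing empty-table device: $dev"
    dmsetup remove "$dev"
done
vgchange -a y
```

The awk filter keeps only lines consisting of a bare "name:" (no table words after it), which is how the stuck volumes appear in the lvmdump.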
I tried both of your suggestions (remove, and re-activate). The end results are the same:

vgchange -a y 250GBx2
device-mapper: reload ioctl failed: No such device or address
device-mapper: reload ioctl failed: No such device or address
device-mapper: reload ioctl failed: No such device or address
3 logical volume(s) in volume group "250GBx2" now active

dmsetup table still shows empty tables for the above:

dmsetup table | grep 250
250GBx2-250GBx2_Vol04:
250GBx2-250GBx2_Vol03:
250GBx2-250GBx2_Vol02:
250GBx2-250GBx2_Vol01:
250GBx2-250GBx2_Vol00:

device-mapper still shows:

device-mapper: table: 253:4: striped: Couldn't parse stripe destination
device-mapper: ioctl: error adding target to table

I'm wondering if something changed in the 'striped' part of lvm/device-mapper land. I could try loading in a 'good' table, but I'm not sure why disaster recovery would be able to read/generate the table ok while a normal boot does not.
Where does the initial table get loaded from? Is it scanned/deduced, or stored in /etc/lvm/? If I take a table from a successful rescue-CD boot (via lvmdump) and load it into the current unsuccessful boot, will its state be saved after the next reboot? Or would reloading a table possibly make things worse?
This is definitely a striping-related issue. If I set up the LVM as linear, dmsetup shows:

dmsetup table
250GBx2-250GBx2_Vol00: 0 486539264 linear 8:81 384

The system won't even let me create a striped disk in the same way (lvcreate fails):

/sbin/lvcreate -n 250GBx2_Vol00 -l 119236 -i 2 -I 4 250GBx2

results in (dmesg):

device-mapper: table: 253:2: striped: Couldn't parse stripe destination
device-mapper: ioctl: error adding target to table

and (/var/log/messages):

lvm[15662]: Aborting. Failed to activate new LV to wipe the start of it.
Created attachment 327893 [details]
lvcreate failing to create a striped disk

This is the -vvvvvv output of using lvcreate to create a striped disk, using the command:

/sbin/lvcreate -vvvvvv -n 250GBx2_Vol00 -l 119236 -i 2 -I 4 250GBx2
Created attachment 327988 [details]
lvcreate successfully creating a striped LVM in RESCUE mode

If you diff this with the failed lvcreate from NON-rescue mode you can see there are slight differences before things go horribly wrong. Most differences follow along the lines of offsets and where metadata is located (23552 vs 26112). I can't really explain the differences.
You say this fails:

Adding target: 0 209715200 striped 2 256 8:81 384 8:97 384

Look at the lvmdump. (You need to run it with '-a' to get more info.)

brw-rw---- 1 root disk 8, 81 2008-12-05 17:42 sdf1
brw-rw---- 1 root disk 8, 97 2008-12-05 17:42 sdg1

So check *carefully* what is on those two disks: how big they are compared to the size lvm thinks they are, whether you can read/seek to the end, etc. Then, if you can get the devices to work correctly by booting a different way, run lvmdump -a again and compare the output.
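The size check suggested above can be sketched directly from the failing table. With 2 stripes of 209715200 total sectors and a data offset of 384 sectors, each member partition must hold at least 209715200/2 + 384 sectors; a minimal sketch (run as root, device names taken from this report):

```shell
#!/bin/sh
# Sketch: compare each stripe member's real size against what the
# striped table "0 209715200 striped 2 256 8:81 384 8:97 384" requires.
need=$(( 209715200 / 2 + 384 ))      # sectors required per stripe member
for dev in /dev/sdf1 /dev/sdg1; do
    have=$(blockdev --getsz "$dev")  # device size in 512-byte sectors
    echo "$dev: $have sectors (need >= $need)"
done
```

If either partition reports fewer sectors than required, the kernel will reject the stripe target even though the metadata on the PVs reads fine.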
Created attachment 328117 [details]
lvmdump -a in rescue mode. Devices are correctly mapped.
Created attachment 328118 [details]
lvmdump -a in normal mode. Devices are incorrectly mapped.
Created attachment 328119 [details]
diff -w -r output of the 2 lvmdump -a commands

I'm not sure where to even begin with the diff output. Rescue mode would naturally differ in output (by device), and timestamps on files/directories cause lots of differences. Does anyone see anything interesting?
Same problem here, with FakeRAID using nForce 3 stripe:

00:09.0 IDE interface [0101]: nVidia Corporation CK8S Serial ATA Controller (v2.5) [10de:00ee] (rev a2)
00:0a.0 IDE interface [0101]: nVidia Corporation CK8S Serial ATA Controller (v2.5) [10de:00e3] (rev a2)

I will provide a comparison with a multiboot on the same hardware using:
- Fedora 10 x86_64 with failing LVM
- CentOS 5.2 i686 with accurate LVM
Created attachment 328182 [details]
lvmdump -a on Fedora 10 x86_64 with failing LVM
Created attachment 328183 [details]
lvmdump -a on CentOS 5.2 with accurate LVM
(In reply to comment #15)
> same problem here with : FakeRaid using nForce 3 stripe

No, this is a different problem. The mapping is really (over)complicated here... You have 3 VGs on the system mapped over an Nvidia dmraid (striped) mapping. IOW: 7 nvidia_cebfcccgN volumes, some of which are PVs for the 3 LVM VolGroup0[0-2] groups.

In the CentOS lvmdump, there is one striped dmraid device (nvidia_cebfcccg), over it 7 nvidia devices (nvidia_cebfcccgp1,5,6,7,8,9,1), and volumes 7, 8, 10 are PVs for the three VolGroup0[0-2].

In the Fedora lvmdump, dmraid activated the Nvidia volumes differently: there is the same striped dmraid device (nvidia_cebfcccg), but nvidia volumes 5,6,7,8,9 are now stacked over a new nvidia volume 2 (nvidia_cebfcccgp2). Only VolGroup00 and 02 are activated here (VolGroup01 is missing; is this the reported problem?). Maybe it is just the order of activation in initscripts...

If you run "vgchange -a y" manually after Fedora boots, is VolGroup01 then activated? If not, please can you post the output of "vgchange -vvvv -a y" here?

(Anyway, the problem in comments #15 - #17 is different: it is dmraid (fake raid) related, while the former problem is pure LVM mapping.)
(In reply to comment #18)
> (In reply to comment #15)
...
> Only VolGoup00 and 02 is activated here (VolGroup01 is missing - is this the
> reported problem?).

Yes, but there is also nvidia_cebfcccg5, which is reported as NTFS by fdisk but cannot be mounted:

[root@kwizatz ~]# LANG=C; mount /dev/mapper/nvidia_cebfcccg5 /mnt/
mount: you must specify the filesystem type

> Maybe just order of activation in initscripts...
>
> If you run manually "vgchange -a y" after the Fedora boots, is the VolGroup01
> now activated?

[root@kwizatz ~]# vgchange -a y
1 logical volume(s) in volume group "VolGroup02" now active

So, nope ): VolGroup02 was already activated.

> If not, please can you post output of "vgchange -vvvv -a y" here?

Output following.

> (Anyway the problem in comment #15 - #17 is different - dmraid (fake raid)
> related, former problem is pure LVM mapping.)

Should I report another bug?
Created attachment 328195 [details]
output of vgchange -vvvv -a y on kwizatz
Well, the command was not exactly correct... :-)

#lvmcmdline.c:914 Processing: vgchange -vvvv -a y 2
..
#toollib.c:493 Volume group "2" not found

But the real problem is that the scan doesn't find the PV correctly:

#device/dev-io.c:439 Opened /dev/dm-6 RO O_DIRECT
#device/dev-io.c:134 /dev/dm-6: block size is 2048 bytes
#label/label.c:184 /dev/dm-6: No label detected

and here there should be a PV label. There isn't, because dmraid wrongly shifts the device to a completely different offset.

CentOS:
253:0 -> nvidia_cebfcccg: 0 796593920 striped 2 128 8:16 0 8:0 0
253:6 -> nvidia_cebfcccgp8: 0 41945652 linear 253:0 609008148

Fedora:
253:0 -> nvidia_cebfcccg: 0 796593920 striped 2 128 8:16 0 8:0 0
253:2 -> nvidia_cebfcccgp2: 0 670745880 linear 253:0 125837145
253:6 -> nvidia_cebfcccgp8: 0 41945652 linear 253:2 609008148

So the device is mapped through the nvidia_cebfcccgp2 device, which is shifted by 125837145 sectors! This is clearly a dmraid bug (or incompatibility?); I'll clone the bug for dmraid.
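The shift follows from simple sector arithmetic over the two tables above: on CentOS, p8 starts 609008148 sectors into the raw striped device, while on Fedora it starts 609008148 sectors into p2, which itself starts 125837145 sectors into the striped device. A sketch with the offsets taken from those tables:

```shell
#!/bin/sh
# Sketch: compute where nvidia_cebfcccgp8 effectively starts on the
# raw striped device (253:0) in each layout.
centos_start=609008148    # CentOS: p8 mapped directly onto 253:0
p2_start=125837145        # Fedora: start of p2 within 253:0
p8_in_p2=609008148        # Fedora: start of p8 within p2
fedora_start=$(( p2_start + p8_in_p2 ))
echo "CentOS p8 start: $centos_start"
echo "Fedora p8 start: $fedora_start"
echo "shift: $(( fedora_start - centos_start )) sectors"   # 125837145
```

Any filesystem or PV label probe at the expected offset therefore lands 125837145 sectors short of the real data, which is exactly why /dev/dm-6 shows "No label detected".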
Maybe it is bug 474697 ... Heinz, is the comment #21 a known nvidia dmraid related bug?
I have a feeling my issue is related to device-mapper and above, rather than lvm2. The ioctl error is coming from the device-mapper library. It's strange that both rescue and non-rescue boots are using the same versions of all libraries (except that dm-crypt isn't loaded in non-rescue mode, but I'm not using encrypted filesystems). Is there any more information I can provide?
Created attachment 328224 [details]
fdisk -l output in non-rescue mode (= bad)

I've noticed that fdisk reports the 'disk identifier' for the failed LVM drives as:

Disk identifier: 0x00000000

I'm wondering whether this is a cause or a symptom. I need to look further into what this field actually reports.
Actually, scratch the above. The 2 drives with 0x00000000 are not initialized. The 2 750GB drives have proper identifiers. I wonder if they are being assembled in the correct sequence.
The initialization sequence is not important for lvm; it scans all available devices anyway when searching for PVs. What does blkid say about the disk partitions? (I'll look into the logs tomorrow, I just read this bugzilla backwards :-)
I had to run blkid -g first to clean up tonnes of stray entries. The relevant remaining entries (manually grouped by me, by drive) are below. I notice that even though I can access the lvm data in rescue mode, the /dev/mapper links still aren't fully created, but the /dev/dm-X entries are.

NON-RESCUE MODE (= bad)

250GBx2:
/dev/sde1: UUID="OtffJ0-cMsT-NXgE-ZAOw-jTtr-Mt1C-eMAoU9" TYPE="lvm2pv"
/dev/sdd: UUID="c03adc3f-5491-7966-6ee5-aa226def322f" TYPE="mdraid"
/dev/sdd1: UUID="lN2KYQ-7m6W-3bTT-JlAv-g4QT-bJ3B-4kJHg7" TYPE="lvm2pv"

750GBx2:
/dev/sdf1: UUID="df43b3b7-bb8c-4496-80ec-847c7e38168a" TYPE="jfs"
/dev/sdg1: UUID="YeGfVk-ZOWy-Y9Od-7GOm-Tt0M-RKQt-4wDbhh" TYPE="lvm2pv"

VolGroup00:
/dev/mapper/VolGroup00-LogVol01: TYPE="swap" UUID="a3ab00d9-ba28-4003-978d-b5ef92d15cad"
/dev/mapper/VolGroup00-LogVol00: UUID="0359eb7f-b018-48c8-a8fb-0310d1c2af03" TYPE="ext3"
/dev/VolGroup00/LogVol00: UUID="0359eb7f-b018-48c8-a8fb-0310d1c2af03" SEC_TYPE="ext2" TYPE="ext3"
/dev/VolGroup00/LogVol01: TYPE="swap" UUID="a3ab00d9-ba28-4003-978d-b5ef92d15cad"
/dev/dm-0: UUID="0359eb7f-b018-48c8-a8fb-0310d1c2af03" TYPE="ext3"
/dev/dm-1: TYPE="swap" UUID="a3ab00d9-ba28-4003-978d-b5ef92d15cad"
/dev/sda2: UUID="RCU1gd-BkpC-jhcC-GNqG-sCRw-cmCb-YdJAwu" TYPE="lvm2pv"

RESCUE MODE (= better, but not quite right)

250GBx2:
/dev/sde1: UUID="OtffJ0-cMsT-NXgE-ZAOw-jTtr-Mt1C-eMAoU9" TYPE="lvm2pv"
/dev/sdd: UUID="c03adc3f-5491-7966-6ee5-aa226def322f" TYPE="mdraid"
/dev/sdd1: UUID="lN2KYQ-7m6W-3bTT-JlAv-g4QT-bJ3B-4kJHg7" TYPE="lvm2pv"

750GBx2:
/dev/sdf1: UUID="df43b3b7-bb8c-4496-80ec-847c7e38168a" TYPE="jfs"
/dev/sdg1: UUID="YeGfVk-ZOWy-Y9Od-7GOm-Tt0M-RKQt-4wDbhh" TYPE="lvm2pv"
/dev/dm-1: LABEL="BIG_DISK1" UUID="6a029782-70c9-42fa-a00c-84f18c6065ec" TYPE="jfs"
/dev/dm-2: LABEL="BIG_DISK2" UUID="42c5322e-0354-4c94-8e6f-02a774a1a20b" TYPE="jfs"
/dev/dm-3: LABEL="BIG_DISK3" UUID="e863b4b7-7cc4-4730-9214-56e8ba42bad0" TYPE="jfs"

VolGroup00:
/dev/mapper/VolGroup00-LogVol01: TYPE="swap" UUID="a3ab00d9-ba28-4003-978d-b5ef92d15cad"
/dev/mapper/VolGroup00-LogVol00: UUID="0359eb7f-b018-48c8-a8fb-0310d1c2af03" TYPE="ext3"
/dev/VolGroup00/LogVol00: UUID="0359eb7f-b018-48c8-a8fb-0310d1c2af03" SEC_TYPE="ext2" TYPE="ext3"
/dev/VolGroup00/LogVol01: TYPE="swap" UUID="a3ab00d9-ba28-4003-978d-b5ef92d15cad"
Well, there is no problem in the lvm code; it does everything ok. The problem is that when it tries to activate the *correct* stripe table, the involved devices (for example 8:81, 8:97) are rejected. The same commands work perfectly in rescue mode. The devices apparently exist and are readable (metadata was read from them!). I guess some other subsystem is using the devices, not allowing device-mapper exclusive access in the kernel. (Btw, the blkid labels are strange too...)

And what are these devices?

lrwxrwxrwx 1 root root 0 2009-01-03 16:19 md0 -> ../devices/virtual/block/md0
lrwxrwxrwx 1 root root 0 2009-01-03 16:15 md_d0 -> ../devices/virtual/block/md_d0
lrwxrwxrwx 1 root root 0 2009-01-03 16:15 md_d1 -> ../devices/virtual/block/md_d1

Please can you attach the output of "ls -lR /sys/devices/virtual/block" and "cat /proc/mdstat"?

(Note to myself: lvmdump should include this; there is only a partial listing now.)
Created attachment 328299 [details]
Results of ls -lR /sys/devices/virtual/block (non-rescue mode = bad)

Results of mdstat are:

Personalities :
md_d0 : inactive sdg[1](S)
      244198464 blocks
md_d1 : inactive sde[1](S)
      732574464 blocks
unused devices: <none>
Created attachment 328300 [details]
Results of ls -lR /sys/devices/virtual/block (rescue mode = better)

Results of mdstat are:

Personalities : [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] [linear]
unused devices: <none>
That md0 array must be coming from:

> cat /etc/mdadm.conf
ARRAY /dev/md0 level=raid0 num-devices=2 UUID=9c38044f:4dd0f870:61bd87ff:b4d95cda

From what I can gather it's always been there; the timestamp on the file dates back to 2007. I'm not really sure what *should* be in there for dm raid configurations.
(In reply to comment #29)
> Personalities :
> md_d0 : inactive sdg[1](S)
> 244198464 blocks
>
> md_d1 : inactive sde[1](S)
> 732574464 blocks

That's probably the problem. There may have been some change in the MD subsystem which locks devices, or Fedora is too clever in detecting md devices and tries to activate old arrays; no idea yet, I need to reproduce it somewhere.

Anyway, could you try

mdadm --stop /dev/md_d0

(or somehow stop the md device; best to unload the md modules to be sure, etc.), then manually remove the dm devices with empty tables if present (dmsetup remove) and try "vgchange -a y" again. Any change now?
You are correct. I had to remove /dev/md_d0 (250GBx2) and /dev/md_d1 (750GBx2) and re-activate with vgchange, and the tables came back:

[root@asparagus ~]# dmsetup table
250GBx2-250GBx2_Vol00: 0 976781312 striped 2 8 8:81 384 8:97 384
750GBx2-750GBx2_Vol02: 0 833126400 striped 2 256 8:65 1048576384 8:49 1048576384
750GBx2-750GBx2_Vol01: 0 1048576000 striped 2 256 8:65 524288384 8:49 524288384
750GBx2-750GBx2_Vol00: 0 1048576000 striped 2 256 8:65 384 8:49 384
VolGroup00-LogVol01: 0 4063232 linear 8:2 75497856
VolGroup00-LogVol00: 0 75497472 linear 8:2 384

How do I stop the md devices from assembling permanently? I moved the mdadm.conf file out of the way and rebooted, and had to perform the above steps over again.
Great. Removing mdadm.conf probably will not help... I am not sure how to safely wipe the md signature on the devices (mdadm --zero-superblock or something like it should allow it, but I am not sure there is no side effect; please be very careful, I do not know exactly what it really wipes).

It is also possible that the initrd now has some hardcoded md activation command. The initrd you can fix easily: deactivate the md discs and recreate it using mkinitrd again, by running

mkinitrd -f /boot/initrd-$(uname -r).img $(uname -r)

or just run

/usr/libexec/plymouth/plymouth-update-initrd

(install plymouth-scripts if the script is not there).
I wonder if this part of the initrd is intended to cover this scenario (just not gracefully). In initrd/etc/lvm/lvm.conf:

# By default, LVM2 will ignore devices used as components of
# software RAID (md) devices by looking for md superblocks.
# 1 enables; 0 disables.
md_component_detection = 1

I wonder if this is new for the fc10 lvm.conf.
(In reply to comment #35)
> in initrd/etc/lvm/lvm.conf:
>
> # By default, LVM2 will ignore devices used as components of
> # software RAID (md) devices by looking for md superblocks.
> # 1 enables; 0 disables.
> md_component_detection = 1
> I wonder if this is new for fc10 lvm.conf.

No. This flag tells lvm to check whether a drive has an md signature (so it doesn't allow creating a *new* lvm mapping over an MD array member). Your problem is that the kernel rejects an already-prepared mapping table (in a lower layer); this flag cannot have any influence here, since it applies before the mapping is prepared, and your mapping had already passed the md superblock check in userspace. (Moreover, the md member is the whole device /dev/sdg, while the PV is /dev/sdg1. It is apparently some mix with an old configuration.)

(And the kernel error message is misleading here; we should probably change it for this situation...)
(In reply to comment #22)
> Maybe it is bug 474697 ...
>
> Heinz, is the comment #21 a known nvidia dmraid related bug?

Referring to your "shifted by 125837145 sectors" comment: I wonder if there could be another partition not being discovered/activated before p2. This is not a known dmraid bug. We'd need the metadata retrieved with

dmraid -rD ; tar jcvf nvidia-bz474074-raid0.tar.bz2 *.{dat,offset,size}

attached here to analyze whether the partition discovery/activation is bogus.

BTW (Richie): does "dmraid -pay ; kpartx -a /dev/mapper/nvidia_cebfcccg" do the right thing?

On the MD front: is the nvidia metadata detected by dmraid just legacy, and hence superfluous and obviously invalid with respect to the partition tables? In that case, "dmraid -rE" would be appropriate before creating an MD array, to remove it.
I cloned bug #479116 for the dmraid nvidia problem. The md problem is not related to this - there is no dmraid mapping on this system.
I was able to use mdadm to zero the superblocks, and it was non-destructive to the existing data/LVM settings. I made a backup just in case, though.

Before:

# mdadm --examine --scan -v
ARRAY /dev/md1 level=raid0 num-devices=2 UUID=3fdc3ac0:66799154:22aae56e:2f32ef6d
   devices=/dev/sde,/dev/sdd
ARRAY /dev/md0 level=raid0 num-devices=2 UUID=9c3a2629:0b2bc36c:3f81e48d:ff31ceca
   devices=/dev/sdg

# vgchange -a n 250GBx2
0 logical volume(s) in volume group "250GBx2" now active
# vgchange -a n 750GBx2
0 logical volume(s) in volume group "750GBx2" now active
# mdadm --zero-superblock /dev/sdg
# mdadm --zero-superblock /dev/sde
# mdadm --zero-superblock /dev/sdd
# vgchange -a y
1 logical volume(s) in volume group "250GBx2" now active
3 logical volume(s) in volume group "750GBx2" now active
# mount -a
# mdadm --examine --scan -v
< no results, and mount points check out ok >
BTW, thanks for all your help. It never dawned on me that leftover MD data would suddenly wake up and destroy my LVM setup. Originally I was testing different setups to compare performance (onboard RAID vs MD vs LVM). I hope something good will come of my pain (like a less cryptic error coming from vgchange), so no one else suffers from this (= my own self-induced pain).
The current lvm2 code would warn (and wipe the md superblock automatically) when creating a new mapping over an md array member; in the worst case it fails during creation. So the situation here should not happen at all... Here it was an unfortunate combination of old tools (when creating the PVs) and a new system (when activating). I have already fixed lvmdump to contain proper diagnostic data; maybe we should add a FAQ entry too...
What's even more bizarre is that it was failing to make striped LVMs, but not un-striped ones. I guess the striped module is a little more finicky?
(In reply to comment #42)
> Whats even more bizarre that it was failing making striped LVM's, but not
> un-striped LVM's. I guess the striped module is a little more finicky?

No, the test should be exactly the same in the kernel. You just had the luck that the MD signature was only on the disks used in stripes :-) If the allocation lands on a disk which is not locked, it works; a stripe always allocates N physical devices according to the number of stripes, and one failed device is enough to cause this message.

(dm_get_device() is the kernel function which failed; it is used in all dm targets when parsing a device string.)