Description of problem:
A pair of disks in Linux mdraid software RAID 1. /boot is on RAID 1 (md127). / is on LVM, and the LVM PV is on another RAID 1 (md126). This worked fine on FC28. After upgrading to FC29, the OS is not bootable with the new FC29 kernel.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Upgrade FC28 to FC29
2. Boot the new FC29 kernel
3. Boot fails

Actual results:

Expected results:

Additional info:
Appears to be a duplicate of the FC28 bug 1575762. Reverting the shipped FC29 packages to the following makes it work again:
lvm2-libs-2.02.175-1.fc27.x86_64.rpm
lvm2-2.02.175-1.fc27.x86_64.rpm
device-mapper-libs-1.02.144-1.fc27.x86_64.rpm
device-mapper-event-libs-1.02.144-1.fc27.x86_64.rpm
device-mapper-event-1.02.144-1.fc27.x86_64.rpm
device-mapper-1.02.144-1.fc27.x86_64.rpm

I would guess that the shipped FC29 versions of these packages do not include the fix that was originally applied in FC28: the problem looks exactly the same and is worked around with exactly the same FC27 packages that were used as the workaround for FC28 (until the FC28 packages were fixed).
Can you see any error or warning messages during boot? What MD metadata version are you using? (Check the output of "mdadm --examine <underlying md component>".) If you are dropped to the debug shell at boot after the error, please try to collect the output of "pvs -vvvv". Thanks.
[root@fileserver alexlang]# mdadm --examine /dev/sda1
/dev/sda1:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 5d60811d:9299b67c:75b83b5b:2b17a8bb
  Creation Time : Wed Mar 26 14:30:04 2008
     Raid Level : raid5
  Used Dev Size : 962081536 (917.51 GiB 985.17 GB)
     Array Size : 2886244608 (2752.54 GiB 2955.51 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 127

    Update Time : Mon Nov 26 17:49:02 2018
          State : clean
Internal Bitmap : present
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0
       Checksum : 174893de - correct
         Events : 520831

         Layout : left-symmetric
     Chunk Size : 256K

      Number   Major   Minor   RaidDevice State
this     0       8        1        0      active sync   /dev/sda1

   0     0       8        1        0      active sync   /dev/sda1
   1     1       8       17        1      active sync   /dev/sdb1
   2     2       8       33        2      active sync   /dev/sdc1
   3     3       8       49        3      active sync   /dev/sdd1
Using the working lvm2 packages (as mentioned above), I've attached the result of pvs -vvvv. If you need the same files with the "non-working" packages, that will take a bit more time, but I think I can do it. Just let me know.
Created attachment 1508681 [details] pvs -vvvv files as requested
Created attachment 1508682 [details] pvs -vvvv files as requested
Created attachment 1508683 [details] pvs -vvvv files as requested
(In reply to Alex Lang from comment #3) > Using the working lvm2 packages (as mentioned above), I've attached the > result of pvs -vvvv. If you need the same files with the "non-working" > files, that will take a bit more time but I think I can do it. Just let me > know. Yes, please try to collect those too, I'd like to see and compare the filtering results inside LVM.
Created attachment 1509722 [details] pvs -vvvv from the bad lvm and libmapper
Created attachment 1509723 [details] This is mdadm output for the failing array for reference
Created attachment 1509724 [details] This is mdadm output for the failing array for reference
Peter: As requested, find three new file attachments. One is the pvs -vvvv output, and the other two are the output of mdadm --detail /dev/md126 and /dev/md127. Note that the underlying hardware is exactly the same. However, to get these files I used the Fedora 29 Live USB drive with the stock lvm and device-mapper RPMs that shipped with Fedora 29.
Created attachment 1509792 [details] Patch to consider md metadata version 0.90 to read from end of disk

Thanks for the logs. The problem is in MD component filtering after changes in the lvm2 code that handles disk reading and caching. The issue here is that MD metadata version 0.90 is placed at the end of the disk, just like version 1.0, which we already fixed here: https://sourceware.org/git/?p=lvm2.git;a=commit;h=3fd75d1bcd714b02fb2b843d1928b2a875402f37 So we just need to include version 0.90 there. I've tried that myself and reproduced the problem. With the version we have in F29, the md components are not detected; with the patch, they're detected correctly, but only if lvmetad is not used. I'll attach the logs in a while...
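For context (this is an illustrative sketch, not part of the patch): which md metadata versions keep their superblock at the end of the device, per md(4), can be summarized as:

```python
def md_superblock_at_end(metadata_version: str) -> bool:
    """Return True if the given md metadata version stores its
    superblock near the end of the device (per md(4)). Components
    of such arrays can only be recognized by also reading the end
    of the device, which is what the filtering fix needs to do.
    """
    # 0.90 and 1.0: superblock at/near the end of the device.
    # 1.1: superblock at the start; 1.2: 4K from the start.
    return metadata_version in ("0.90", "1.0")

# Versions whose components need an end-of-device read:
assert md_superblock_at_end("0.90") and md_superblock_at_end("1.0")
assert not md_superblock_at_end("1.1") and not md_superblock_at_end("1.2")
```

This is why a fix covering only 1.0 still misses arrays created with the old 0.90 format, as in this report.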
Created attachment 1509793 [details] Patched MD filter to consider version 0.90 metadata - without lvmetad Without lvmetad, both sda and sdb (which are under version 0.9 md) are correctly detected as md components...
Created attachment 1509794 [details] Patched MD filter to consider version 0.90 metadata - with lvmetad With lvmetad, one of the MD components with 0.9 md metadata is not correctly detected...
(In reply to Peter Rajnoha from comment #14)
> Created attachment 1509794 [details]
> Patched MD filter to consider version 0.90 metadata - with lvmetad

#device/dev-io.c:658         Closed /dev/sda
#filters/filter-partitioned.c:30         filter partitioned deferred /dev/sda
#filters/filter-signature.c:31         filter signature deferred /dev/sda
#filters/filter-md.c:102         filter md deferred /dev/sda
#label/label.c:684         Processing data from device /dev/sda 8:0 fd 5 block 0x55e425b4c700
#label/label.c:372         Scan filtering /dev/sda
#device/dev-io.c:336         /dev/sda: using cached size 262144 sectors
#device/dev-io.c:336         /dev/sda: using cached size 262144 sectors
#label/label.c:318         /dev/sda: lvm2 label detected at sector 1

I'm not quite sure though why this happens. Dave, any clues? This happens only for version 0.90 md metadata. From man md(4), I can see:

"The common format — known as version 0.90 — has a superblock that is 4K long and is written into a 64K aligned block that starts at least 64K and less than 128K from the end of the device (i.e. to get the address of the superblock round the size of the device down to a multiple of 64K and then subtract 64K)."

Is this consistent with our assumptions about reading the ends of disks vs. the bcache?
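To make the quoted md(4) rule concrete (just an illustrative sketch, not lvm2 code): round the device size down to a multiple of 64K, then subtract 64K.

```python
SB_ALIGN = 64 * 1024  # 64K alignment unit from md(4)

def md090_superblock_offset(dev_size_bytes: int) -> int:
    """Byte offset of the md 0.90 superblock: device size rounded
    down to a multiple of 64K, minus 64K (per md(4)). The resulting
    superblock starts at least 64K and less than 128K from the end."""
    return (dev_size_bytes // SB_ALIGN) * SB_ALIGN - SB_ALIGN

# For a 1 MiB device the superblock sits exactly 64K below the end:
assert md090_superblock_offset(1024 * 1024) == 983040
# An unaligned size rounds down first, so the superblock can be
# anywhere between 64K and 128K from the end of the device:
assert md090_superblock_offset(1024 * 1024 + 12345) == 983040
```

So any scan that only reads a fixed-size region at the very end of the device has to cover at least the last 128K to be sure of hitting a 0.90 superblock.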
I think there might be some inconsistent information in what I was reading about which md versions have the superblock at the end, so we'll need to fix this to recognize 0.90 also. I'll take a look at the lvmetad case; that may be missing the md special case that is in place for normal scanning.
I've pushed your patch to the stable and master branches (doing a full scan when 0.90 md devices exist, in addition to 1.0).

Without lvmetad, the way it works in label_scan():

1. dev_cache_scan() gets a list of devs on the system; this includes the md device and the component devs.
2. label_scan checks each of those devs (in sysfs) to see if it's md 0.9/1.0, and if any are, it sets the flag use_full_md_check.
3. Device scanning is run, sees use_full_md_check, and reads both the start and end of every device, so it detects md components.

When using lvmetad, "pvscan --cache <dev>" does not include step 2, so use_full_md_check is never set. That means step 3 will only read the start of the device and will not detect the 0.9/1.0 component devs. I expect this would result in detecting duplicate PVs, which would cause lvmetad to be disabled.
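The decision in steps 2–3 can be sketched like this (a simplified illustration, not the actual lvm2 code; the function name and inputs are made up):

```python
def needs_full_md_check(md_metadata_versions):
    """Steps 2-3 above, simplified: if any md array on the system
    uses metadata 0.90 or 1.0 (superblock at the end of the device),
    every device scan must read the device end too. Otherwise the
    component devices look like plain PVs and show up as duplicates
    of the PV on the md array itself."""
    return any(v in ("0.90", "1.0") for v in md_metadata_versions)

# e.g. metadata versions reported via sysfs for the arrays found
# in step 1 (here: one 0.90 array forces the full check):
assert needs_full_md_check(["0.90", "1.2"]) is True
assert needs_full_md_check(["1.2"]) is False
```

The lvmetad bug described above is exactly the case where this check is skipped, so the flag is never set and the end-of-device read never happens.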
This commit uses the md sysfs detection to force full scans in pvscan with lvmetad (like is done in label_scan when not using lvmetad): https://sourceware.org/git/?p=lvm2.git;a=commitdiff;h=a01e1fec0fe7c2fa61577c0e636e907cde7279ea Since lvmetad is not used for activation during boot, that commit is probably not necessary for the original issue reported in this bug. For that, the fix mentioned above, to recognize 0.90, is probably sufficient: https://sourceware.org/git/?p=lvm2.git;a=commitdiff;h=0e42ebd6d4012d210084a9ccf8d76f853726de3c
I just upgraded my home server to CentOS 7.6, and I think I hit the same problem. Is that possible? I disabled lvmetad, and now it works again. With lvmetad, it complains about duplicate PVs. This is a RAID 1 setup with version 0.90 metadata. Do I need to open a separate RHEL 7.6 bug? Regards, Andy
Peter, I haven't heard from you (on this ticket) in a while, and I haven't seen any of the components updated in the FC29 repos. However, if and when you are ready, I can update my FC29 "live" installation on my USB stick and try it out.
This should be fixed in lvm2-2.02.183-1.fc29 which was pushed to stable already: https://bodhi.fedoraproject.org/updates/FEDORA-2018-4f678211c1
Updated my Fedora 29 Live USB and things looked good! So I allowed the system itself to update to the new lvm2 packages, and that also worked well. I'm a happy camper now.