Bug 1653032

Summary: In Fedora 29, OS is unbootable with MDRaid and LVM2
Product: Fedora
Reporter: Alex Lang <alexclang1>
Component: lvm2
Assignee: LVM and device-mapper development team <lvm-team>
Status: CLOSED CURRENTRELEASE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high
Priority: unspecified
Version: 29
CC: agk, ajschorr, alexclang1, anprice, bmarzins, bmr, cfeist, heinzm, jbrassow, jonathan, kzak, lvm-team, mcsontos, msnitzer, prajnoha, prockai, teigland, zkabelac
Hardware: x86_64   
OS: Linux   
Last Closed: 2019-08-19 23:09:20 UTC
Type: Bug
Attachments:
pvs -vvvv files as requested
pvs -vvvv files as requested
pvs -vvvv files as requested
pvs -vvvv from the bad lvm and libmapper
This is mdadm output for the failing array for reference
This is mdadm output for the failing array for reference
Patch to consider md metadata version 0.90 to read from end of disk
Patched MD filter to consider version 0.90 metadata - without lvmetad
Patched MD filter to consider version 0.90 metadata - with lvmetad

Description Alex Lang 2018-11-24 21:48:01 UTC
Description of problem:

I have a pair of storage devices in Linux mdraid software RAID 1. /boot is on one RAID 1 array (md127); / is on LVM, and that LVM sits on another RAID 1 array (md126).

Originally this worked fine in FC28.

After upgrading to FC29, the OS is not bootable with the new FC29 kernel.

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Upgrade FC28 to FC29
2. Boot the new FC29 kernel
3. The system cannot boot

Actual results:


Expected results:


Additional info:

This appears to be a duplicate of the FC28 bug 1575762. Reverting the shipped FC29 packages to the following makes it work again:

lvm2-libs-2.02.175-1.fc27.x86_64.rpm
lvm2-2.02.175-1.fc27.x86_64.rpm
device-mapper-libs-1.02.144-1.fc27.x86_64.rpm
device-mapper-event-libs-1.02.144-1.fc27.x86_64.rpm
device-mapper-event-1.02.144-1.fc27.x86_64.rpm
device-mapper-1.02.144-1.fc27.x86_64.rpm

I would guess that the shipped FC29 versions of these packages do not include the fix that eventually went into FC28, because the problem looks exactly the same and is worked around by exactly the same FC27 packages that served as the workaround for FC28 (until the FC28 packages were fixed).
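The revert itself boils down to an rpm downgrade plus an initramfs rebuild; a hedged sketch, assuming the listed FC27 RPMs have been downloaded into the current directory:

  # Downgrade the installed FC29 packages to the older FC27 builds in one transaction
  rpm -Uvh --oldpackage lvm2-2.02.175-1.fc27.x86_64.rpm \
      lvm2-libs-2.02.175-1.fc27.x86_64.rpm \
      device-mapper-1.02.144-1.fc27.x86_64.rpm \
      device-mapper-libs-1.02.144-1.fc27.x86_64.rpm \
      device-mapper-event-1.02.144-1.fc27.x86_64.rpm \
      device-mapper-event-libs-1.02.144-1.fc27.x86_64.rpm
  # Rebuilding the initramfs is likely needed so early boot uses the downgraded binaries too
  dracut -f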

Comment 1 Peter Rajnoha 2018-11-26 09:01:14 UTC
Can you see any error or warning messages during boot?

What MD metadata version are you using? (check output of "mdadm --examine <underlying md component>")

If you're dropped to the debug shell at boot after the error, please try to collect the output of "pvs -vvvv". Thanks.
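(For example, from the emergency shell something like "pvs -vvvv > /run/pvs-debug.txt 2>&1" will capture it to a file that you can copy off after booting with the older packages.)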

Comment 2 Alex Lang 2018-11-26 22:51:32 UTC
[root@fileserver alexlang]# mdadm --examine /dev/sda1
/dev/sda1:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 5d60811d:9299b67c:75b83b5b:2b17a8bb
  Creation Time : Wed Mar 26 14:30:04 2008
     Raid Level : raid5
  Used Dev Size : 962081536 (917.51 GiB 985.17 GB)
     Array Size : 2886244608 (2752.54 GiB 2955.51 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 127

    Update Time : Mon Nov 26 17:49:02 2018
          State : clean
Internal Bitmap : present
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0
       Checksum : 174893de - correct
         Events : 520831

         Layout : left-symmetric
     Chunk Size : 256K

      Number   Major   Minor   RaidDevice State
this     0       8        1        0      active sync   /dev/sda1

   0     0       8        1        0      active sync   /dev/sda1
   1     1       8       17        1      active sync   /dev/sdb1
   2     2       8       33        2      active sync   /dev/sdc1
   3     3       8       49        3      active sync   /dev/sdd1

Comment 3 Alex Lang 2018-11-26 22:57:33 UTC
Using the working lvm2 packages (as mentioned above), I've attached the result of pvs -vvvv. If you need the same output with the non-working packages, that will take a bit more time, but I think I can do it. Just let me know.

Comment 4 Alex Lang 2018-11-26 22:59:59 UTC
Created attachment 1508681 [details]
pvs -vvvv files as requested

Comment 5 Alex Lang 2018-11-26 23:00:31 UTC
Created attachment 1508682 [details]
pvs -vvvv files as requested

Comment 6 Alex Lang 2018-11-26 23:02:31 UTC
Created attachment 1508683 [details]
pvs -vvvv files as requested

Comment 7 Peter Rajnoha 2018-11-27 10:20:37 UTC
(In reply to Alex Lang from comment #3)
> Using the working lvm2 packages (as mentioned above), I've attached the
> result of pvs -vvvv. If you need the same files with the "non-working"
> files, that will take a bit more time but I think I can do it. Just let me
> know.

Yes, please try to collect those too; I'd like to see and compare the filtering results inside LVM.

Comment 8 Alex Lang 2018-11-29 00:17:01 UTC
Created attachment 1509722 [details]
pvs -vvvv from the bad lvm and libmapper

Comment 9 Alex Lang 2018-11-29 00:18:12 UTC
Created attachment 1509723 [details]
This is mdadm output for the failing array for reference

Comment 10 Alex Lang 2018-11-29 00:18:46 UTC
Created attachment 1509724 [details]
This is mdadm output for the failing array for reference

Comment 11 Alex Lang 2018-11-29 00:21:26 UTC
Peter: As requested, find three new file attachments. One is the pvs -vvvv output and the other two are the mdadm --detail output for /dev/md126 and /dev/md127.

Note that the underlying hardware is exactly the same. However, to get these files I used a Fedora 29 Live USB drive with the stock lvm2 and device-mapper RPMs that shipped with Fedora 29.

Comment 12 Peter Rajnoha 2018-11-29 11:55:42 UTC
Created attachment 1509792 [details]
Patch to consider md metadata version 0.90 to read from end of disk

Thanks for the logs.

The problem is in MD component filtering after the changes in the lvm2 code that handles disk reading and caching. The issue here is that MD metadata version 0.9 is placed at the end of the disk, just like version 1.0, which we already fixed here:

https://sourceware.org/git/?p=lvm2.git;a=commit;h=3fd75d1bcd714b02fb2b843d1928b2a875402f37

So we just need to include version 0.9 there.
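For reference, the metadata version a running array uses can also be read from sysfs (md126/md127 are simply the array names from this report):

  cat /sys/block/md126/md/metadata_version
  # expected to print "0.90" for the arrays in this report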

I've tried that myself and reproduced it. With the version we have in F29, the md components are not detected; with the patch, they're detected correctly, but only if lvmetad is not used. I'll attach the logs in a while...

Comment 13 Peter Rajnoha 2018-11-29 11:59:25 UTC
Created attachment 1509793 [details]
Patched MD filter to consider version 0.90 metadata - without lvmetad

Without lvmetad, both sda and sdb (which are under a version 0.9 md array) are correctly detected as md components...

Comment 14 Peter Rajnoha 2018-11-29 12:00:50 UTC
Created attachment 1509794 [details]
Patched MD filter to consider version 0.90 metadata - with lvmetad

With lvmetad, one of the MD components with 0.9 md metadata is not correctly detected...

Comment 15 Peter Rajnoha 2018-11-29 12:04:59 UTC
(In reply to Peter Rajnoha from comment #14)
> Created attachment 1509794 [details]
> Patched MD filter to consider version 0.90 metadata - with lvmetad


#device/dev-io.c:658           Closed /dev/sda                                                                                                                                                                                                 
#filters/filter-partitioned.c:30            filter partitioned deferred /dev/sda
#filters/filter-signature.c:31            filter signature deferred /dev/sda
#filters/filter-md.c:102           filter md deferred /dev/sda


#label/label.c:684           Processing data from device /dev/sda 8:0 fd 5 block 0x55e425b4c700                                                                                                                                                
#label/label.c:372           Scan filtering /dev/sda           
#device/dev-io.c:336         /dev/sda: using cached size 262144 sectors
#device/dev-io.c:336         /dev/sda: using cached size 262144 sectors
#label/label.c:318         /dev/sda: lvm2 label detected at sector 1


I'm not quite sure though why this happens. Dave, any clues?

This happens only with the version 0.9 md metadata. From the md(4) man page, I can see:

"The common format — known as version 0.90 — has a superblock that is 4K long and is written into a 64K aligned block that starts at least 64K and less than 128K from the end of the device  (i.e.  to  get  the  address  of  the superblock  round  the  size  of  the  device down to a multiple of 64K and then subtract 64K)."

Is this consistent with our assumptions about reading the end of disks vs. the bcache?

Comment 16 David Teigland 2018-11-29 16:09:36 UTC
I think the information I was reading about which md versions put a superblock at the end of the device was inconsistent, so we'll need to fix this to recognize 0.9 also.

I'll take a look at the lvmetad case; it may be missing the md special case that is in place for normal scanning.

Comment 17 David Teigland 2018-11-29 18:55:02 UTC
I've pushed your patch to the stable and master branches (doing a full scan when 0.90 md devices exist, in addition to 1.0).

Without lvmetad, this is how it works in label_scan():

1. dev_cache_scan() gets a list of devs on the system; this includes the md device and the component devs.

2. label_scan checks each of those devs (in sysfs) to see if it's md 0.9/1.0, and if any are, it sets the flag use_full_md_check.

3. device scanning is run, sees use_full_md_check and reads both the start and end of every device, so it detects md components.

When using lvmetad, the pvscan --cache <dev> does not include step 2, so use_full_md_check is never set.  That means step 3 will only read the start of the device and not detect the 0.9/1.0 component devs.  I expect this would result in detecting duplicate PVs, which would cause lvmetad to be disabled.
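A rough shell analogue of the check in step 2 (just an illustrative sketch, not lvm2's actual code):

  need_full_md_check=0
  for v in /sys/block/md*/md/metadata_version; do
      case "$(cat "$v" 2>/dev/null)" in
          0.90|1.0) need_full_md_check=1 ;;   # end-of-device superblock, so scan device ends too
      esac
  done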

Comment 18 David Teigland 2018-11-29 20:14:44 UTC
This commit uses the md sysfs detection to force full scans in pvscan with lvmetad (as is done in label_scan when not using lvmetad):

https://sourceware.org/git/?p=lvm2.git;a=commitdiff;h=a01e1fec0fe7c2fa61577c0e636e907cde7279ea

Since lvmetad is not used for activation during boot, that commit is probably not necessary for the original issue reported in this bug.  For that, the fix mentioned above, to recognize 0.90, is probably sufficient:

https://sourceware.org/git/?p=lvm2.git;a=commitdiff;h=0e42ebd6d4012d210084a9ccf8d76f853726de3c

Comment 19 Andrew Schorr 2018-12-06 01:49:01 UTC
I just upgraded my home server to CentOS 7.6, and I think I experienced the same problem. Is that possible? I disabled lvmetad, and now it works again. With lvmetad, it complains about duplicate PVs. This is a RAID 1 setup with version 0.90 metadata. Do I need to open a separate RHEL 7.6 bug?
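In case it helps anyone else hitting the same thing, disabling lvmetad on CentOS/RHEL 7 generally amounts to something like this (a sketch; not necessarily exactly what I ran):

  # In /etc/lvm/lvm.conf set:  use_lvmetad = 0
  systemctl disable --now lvm2-lvmetad.socket lvm2-lvmetad.service
  dracut -f    # so the initramfs picks up the new lvm.conf as well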

Regards,
Andy

Comment 20 Alex Lang 2018-12-21 17:17:13 UTC
Peter,

I haven't heard from you (on this ticket) in a while, and I haven't seen any of the components updated in the FC29 repos. However, if and when you are ready, I can update my FC29 "live" installation on my USB stick and try it out.

Comment 21 Marian Csontos 2019-01-02 12:31:55 UTC
This should be fixed in lvm2-2.02.183-1.fc29, which has already been pushed to stable:

https://bodhi.fedoraproject.org/updates/FEDORA-2018-4f678211c1
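Once it reaches the mirrors, pulling it in should be roughly (a sketch; the advisory ID is taken from the Bodhi link above):

  dnf upgrade --refresh lvm2
  # or pull the whole advisory explicitly:
  dnf upgrade --advisory=FEDORA-2018-4f678211c1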

Comment 22 Alex Lang 2019-01-07 21:40:15 UTC
Updated my Fedora 29 Live USB and things looked good! So I allowed the system itself to update to the new lvm2 packages, and that also worked well. I'm a happy camper now.