Bug 1602173
| Field | Value |
|---|---|
| Summary: | Kernel bugcheck on LV conversion from RAID6 to RAID5 |
| Product: | [Community] LVM and device-mapper |
| Reporter: | Douglas Paul <doug-rh> |
| Component: | lvm2 |
| Assignee: | LVM and device-mapper development team <lvm-team> |
| lvm2 sub component: | Changing Logical Volumes |
| QA Contact: | cluster-qe <cluster-qe> |
| Status: | CLOSED CURRENTRELEASE |
| Severity: | unspecified |
| Priority: | unspecified |
| CC: | agk, heinzm, jbrassow, msnitzer, prajnoha, zkabelac |
| Version: | 2.02.179 |
| Flags: | rule-engine: lvm-technical-solution? rule-engine: lvm-test-coverage? |
| Hardware: | x86_64 |
| OS: | Linux |
| Last Closed: | 2018-07-20 14:42:31 UTC |
| Type: | Bug |
Description

Douglas Paul, 2018-07-18 00:04:38 UTC
Additional dmesg log from LV creation until the bugcheck:

(lvcreate)

```
[174109.655087] device-mapper: raid: Superblocks created for new raid set
[174109.661767] md/raid:mdX: not clean -- starting background reconstruction
[174109.666772] md/raid:mdX: device dm-124 operational as raid disk 0
[174109.671694] md/raid:mdX: device dm-179 operational as raid disk 1
[174109.676459] md/raid:mdX: device dm-208 operational as raid disk 2
[174109.681185] md/raid:mdX: device dm-210 operational as raid disk 3
[174109.685906] md/raid:mdX: device dm-212 operational as raid disk 4
[174109.690438] md/raid:mdX: device dm-214 operational as raid disk 5
[174109.696012] md/raid:mdX: raid level 6 active with 6 out of 6 devices, algorithm 8
[174109.799204] mdX: bitmap file is out of date, doing full recovery
[174109.901833] md: resync of RAID array mdX
[174120.746961] md: mdX: resync done.
```

(lvconvert #1)

```
[174163.621595] md/raid:mdX: device dm-124 operational as raid disk 0
[174163.626293] md/raid:mdX: device dm-179 operational as raid disk 1
[174163.630954] md/raid:mdX: device dm-208 operational as raid disk 2
[174163.635530] md/raid:mdX: device dm-210 operational as raid disk 3
[174163.639992] md/raid:mdX: device dm-212 operational as raid disk 4
[174163.644374] md/raid:mdX: device dm-214 operational as raid disk 5
[174163.649437] md/raid:mdX: raid level 6 active with 6 out of 6 devices, algorithm 8
[174164.601362] md/raid:mdX: device dm-124 operational as raid disk 0
[174164.605900] md/raid:mdX: device dm-179 operational as raid disk 1
[174164.610393] md/raid:mdX: device dm-208 operational as raid disk 2
[174164.614821] md/raid:mdX: device dm-210 operational as raid disk 3
[174164.619149] md/raid:mdX: device dm-212 operational as raid disk 4
[174164.623360] md/raid:mdX: device dm-214 operational as raid disk 5
[174164.628161] md/raid:mdX: raid level 6 active with 6 out of 6 devices, algorithm 18
[174164.939292] md: reshape of RAID array mdX
[174179.978504] md: mdX: reshape done.
```
(lvconvert #2)

```
[174203.005817] md/raid:mdX: not clean -- starting background reconstruction
[174203.010048] md/raid:mdX: device dm-124 operational as raid disk 0
[174203.014219] md/raid:mdX: device dm-179 operational as raid disk 1
[174203.018203] md/raid:mdX: device dm-208 operational as raid disk 2
[174203.021943] md/raid:mdX: device dm-210 operational as raid disk 3
[174203.025475] md/raid:mdX: device dm-212 operational as raid disk 4
[174203.028877] md/raid:mdX: device dm-214 operational as raid disk 5
[174203.032852] md/raid:mdX: raid level 6 active with 6 out of 6 devices, algorithm 18
[174203.836002] md/raid:mdX: not clean -- starting background reconstruction
```

Tested ok on Fedora 27 (kernel 4.17.6-100.fc27.x86_64 / LVM version: 2.02.175(2) (2017-10-06)). Which distribution is this? Please share 'uname -r' and 'lvm version', thanks.

As in the description, this is with LVM tools 2.02.173 and 2.02.179. The uname -r was in the bugcheck report in dmesg, 4.14.52-gentoo.

The distribution is Gentoo. Version 2.02.173 was from their package; I updated to 2.02.179 myself to test it.

It seems to be adding an additional extent during the second run of lvconvert, and I don't understand why (at least this is what I see from the LVM VG config backups).

Is the 4.14 series of kernels supported? I can try building a 4.17 series kernel to see if that works better.
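An aside for readers following the logs above: the "algorithm N" values in the md assembly messages name the parity layout of the array, which is how the raid6_zr -> raid6_ls_6 -> raid5_ls progression shows up in dmesg. A small decoding sketch; the numeric constants are taken from the kernel's drivers/md/raid5.h, and the mapping to lvm2 segment-type names is my reading of that header, so treat it as an assumption:

```python
import re

# ALGORITHM_* constants from the mainline kernel's drivers/md/raid5.h,
# with the lvm2 segment type they correspond to (assumed mapping).
MD_ALGORITHMS = {
    0: "left_asymmetric",        # raid5_la
    1: "right_asymmetric",       # raid5_ra
    2: "left_symmetric",         # raid5_ls (the default raid5 layout)
    3: "right_symmetric",        # raid5_rs
    8: "rotating_zero_restart",  # raid6_zr
    18: "left_symmetric_6",      # raid6_ls_6: raid5_ls layout plus a dedicated Q drive
}

def decode(line):
    """Name the 'algorithm N' value found in an md dmesg line, if any."""
    m = re.search(r"algorithm (\d+)", line)
    return MD_ALGORITHMS.get(int(m.group(1)), "unknown") if m else None

print(decode("md/raid:mdX: raid level 6 active with 6 out of 6 devices, algorithm 8"))
# -> rotating_zero_restart
print(decode("md/raid:mdX: raid level 5 active with 5 out of 5 devices, algorithm 2"))
# -> left_symmetric
```

Reading the logs with this table, comment #1's first lvconvert switches algorithm 8 (raid6_zr) to 18 (raid6_ls_6), and the later 4.17.8 logs end at algorithm 2 (raid5_ls), matching the conversion path the tool reports.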
I just realized that 'lvm version' is a command:

```
LVM version: 2.02.179(2) (2018-06-18)
Library version: 1.02.148 (2018-06-18)
Driver version: 4.37.0
Configuration: ./configure --prefix=/usr --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu --mandir=/usr/share/man --infodir=/usr/share/info --datadir=/usr/share --sysconfdir=/etc --localstatedir=/var/lib --disable-dependency-tracking --docdir=/usr/share/doc/lvm2-2.02.179 --htmldir=/usr/share/doc/lvm2-2.02.179/html --enable-readline --disable-selinux --enable-pkgconfig --with-confdir=/etc --exec-prefix= --sbindir=/sbin --with-staticdir=/sbin --libdir=/lib64 --with-usrlibdir=/usr/lib64 --with-default-dm-run-dir=/run --with-default-run-dir=/run/lvm --with-default-locking-dir=/run/lock/lvm --with-default-pid-dir=/run --enable-udev_rules --enable-udev_sync --with-udevdir=/lib/udev/rules.d --disable-lvmlockd-sanlock --disable-udev-systemd-background-jobs --with-systemdsystemunitdir=/lib/systemd/system --enable-dmeventd --enable-cmdlib --enable-applib --enable-fsadm --enable-lvmetad --with-mirrors=internal --with-snapshots=internal --with-thin=internal --with-cache=internal --with-thin-check=/sbin/thin_check --with-cache-check=/sbin/cache_check --with-thin-dump=/sbin/thin_dump --with-cache-dump=/sbin/cache_dump --with-thin-repair=/sbin/thin_repair --with-cache-repair=/sbin/cache_repair --with-thin-restore=/sbin/thin_restore --with-cache-restore=/sbin/cache_restore --with-lvm1=none --with-clvmd=none --with-cluster=none CLDFLAGS=-Wl,-O1 -Wl,--as-needed
```

(In reply to Douglas Paul from comment #3)
> As in the description, this is with LVM tools 2.02.173 and 2.02.179. The uname -r was in the bugcheck report in dmesg, 4.14.52-gentoo.
>
> The distribution is Gentoo. Version 2.02.173 was from their package; I updated to 2.02.179 myself to test it.
>
> It seems to be adding an additional extent during the second run of lvconvert, and I don't understand why (at least this is what I see from the LVM VG config backups).
That's expected behaviour: it is adding out-of-place reshape space, used during the reshape from raid6(_zr) to raid6_ls_6 in order to avoid writing over data in place.

> Is the 4.14 series of kernels supported? I can try building a 4.17 series kernel to see if that works better.

Yes, those are ok. Try a newer kernel though, to test whether it makes a difference in your test case.

My confusion is that the first phase of lvconvert seems to be fine, and I do end up with a raid6_ls_6.

I notice I forgot to attach the vgcfgbackup files made before and after the second command. I misread them at first, but what seems to happen is that the final extent in the segment (I guess originally added for the reshape) is moved into its own segment. Could that trigger the kernel into thinking the array is unclean? It seems to me that it should be equivalent ...

In any case, I don't know why it does this split. I would have expected it just to remove the segment containing the Q syndrome (the P should be the same as the parity for RAID5)? (This is why I suspected a problem in the user-space tools.)

The bugcheck I am hitting seems to be due to the array being unclean at the point of takeover, I guess. It seems each time I call lvconvert, it first reassembles the RAID as it currently is, then starts the reshape process, so I see two RAID assembly operations in the dmesg: first with the current format and second with the target format (see comment #1).

The second time I run lvconvert (after the reshape, for the takeover) it was complaining (in dmesg) that the array is unclean at that point. Then, at the point of the second assembly operation, I again get the message about it being unclean, then it hits the bugcheck and freezes the array (the repair in progress never completes).
I will try again with 4.17.8, and if it still fails, I guess I will build a debug version of the LVM tools to understand why it is doing this segment splitting, or try to find out why the kernel thinks the array is not clean.

Created attachment 1460029 [details]
pruned vgcfgbackup before failing command
Created attachment 1460030 [details]
pruned vgcfgbackup 'after' failing command
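The parity premise in comment #7 above is correct: RAID6's P syndrome is the same plain byte-wise XOR that RAID5 uses for its parity, and Q is the additional Reed-Solomon syndrome. A toy sketch illustrating this on in-memory blocks (just the arithmetic, not lvm2 or md code):

```python
import os
from functools import reduce

def xor_parity(blocks):
    """Byte-wise XOR of equal-sized data blocks.

    This is both RAID5's parity and RAID6's P syndrome; RAID6 merely
    adds a second, independent Q syndrome (a Reed-Solomon code word).
    """
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

# Four data blocks of a hypothetical stripe.
data = [os.urandom(64) for _ in range(4)]

raid5_parity = xor_parity(data)
raid6_p = xor_parity(data)          # P is computed the same way,
assert raid5_parity == raid6_p      # so dropping Q leaves a valid RAID5 stripe.

# Sanity check: XOR parity lets us rebuild any single lost block.
rebuilt = xor_parity(data[1:] + [raid5_parity])
assert rebuilt == data[0]
```

This is why the reporter expected the takeover to simply drop the Q SubLV; the extra segment he saw instead is explained in the next comment as reshape space, not parity data.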
(In reply to Douglas Paul from comment #7)
> My confusion is that the first phase of lvconvert seems to be fine, and I do end up with a raid6_ls_6.
>
> I notice I forgot to attach the vgcfgbackup files done before and after the second command. I misread them at first, but what seems to happen is that the final extent in the segment (I guess originally added for the reshape) is moved into its own segment. Could that trigger the kernel into thinking the array is unclean? It seems to me that it should be equivalent ...
>
> In any case, I don't know why it does this split. I would have expected it just to remove the segment containing the Q syndrome (the P should be the same as the parity for RAID5)? (this is why I suspected a problem in the user-space tools)

The 2 segments of each rimage LV result from allocating/moving the out-of-place reshape space at the proper offset (it is either at the beginning or at the end, depending on whether a forward reshape adding stripes or a backward reshape removing stripes is needed).

> The bugcheck I am hitting seems to be due to the array being unclean at the point of takeover, I guess.

By that you mean the respective "md: resync of ..." messages, which result from unconditionally requesting synchronization at each md array activation; this can be a no-op when the array already finished synchronizing before.

> It seems each time I call lvconvert, it first reassembles the RAID as it currently is, then starts the reshape process, so I see two RAID assembly operations in the dmesg: first with the current format and second with the target format. (see comment #1)

Device-mapper has an active and an inactive mapping table slot per mapped device (i.e. a raid device in this case). A mapping table defines mapped device address segments with start sector and length in sectors, a mapping target (e.g. "raid", "striped", ...) and target parameters in ASCII format. The mapping table in the active slot processes mapped device I/O.
When the lvconvert command performs a conversion, the new mapping defining e.g. a reshape (via the "raid" target parameters passed in) is loaded into the inactive slot, allocating all necessary resources (memory, threads, ...). This avoids resource constraints, such as failing allocations, causing blocked I/O. When that succeeds, lvm2 swaps the inactive table with the active one, quiescing mapped device I/O before doing so and unquiescing afterwards.

The 2 consecutive MD array assembly messages result from lvm2 performing this load cycle twice in order to keep the lvm2 userspace metadata in sync with the kernel. First it passes the reshape configuration in (e.g. for a raid6_zr -> raid6_ls_6 conversion), which is stored in the raid superblocks on the rmeta SubLVs (see 'lvs -o+devices $vg' for those); secondly it removes that information and reloads. This is a simplification to show the principle.

> The second time I run lvconvert (after the reshape, for the takeover) it was complaining (in dmesg) that the array is unclean at that point. Then, at the point of the second assembly operation, I again get the message about it being unclean, then it hits the bugcheck and freezes the array (the repair in progress never completes).

The BUG_ON triggered shows that the reshape has finished but the old and new raid levels aren't the same, which shouldn't be the case on takeover. Please report if a newer kernel still fails for you, thanks.

> I will try again with 4.17.8, and if it still fails, I guess I will build a debug version of the LVM tools to understand why it is doing this segment splitting, or try to find out why the kernel thinks the array is not clean.

As mentioned, any reload involves requesting synchronization, causing the respective kernel message, which may be a no-op when synchronization already finished before. Let's see a newer kernel...
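The two-slot mechanism described above can be sketched as a toy model. This is hypothetical Python mirroring the load/suspend/resume cycle (as exposed by the dmsetup commands of the same names), not any real device-mapper API; table strings are illustrative placeholders:

```python
class MappedDevice:
    """Toy model of device-mapper's per-device mapping table slots."""

    def __init__(self, table):
        self.active = table    # table currently mapping I/O
        self.inactive = None   # staging slot for the next table
        self.suspended = False

    def load(self, table):
        # Like 'dmsetup load': the new table is parsed and its resources
        # allocated in the inactive slot; a failure here leaves the
        # active mapping, and therefore in-flight I/O, untouched.
        self.inactive = table

    def suspend(self):
        self.suspended = True  # quiesce I/O on the active table

    def resume(self):
        # Like 'dmsetup resume': swap inactive -> active, then unquiesce.
        if self.inactive is not None:
            self.active, self.inactive = self.inactive, None
        self.suspended = False

dev = MappedDevice("raid raid6_zr ...")    # placeholder table strings
dev.load("raid raid6_ls_6 ...")            # reshape table staged, I/O still on old table
dev.suspend()
dev.resume()
print(dev.active)                          # -> raid raid6_ls_6 ...
```

The design point the comment makes is visible in load(): all fallible allocation happens before the swap, so the quiesced window around resume() only covers an atomic pointer exchange.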
The suspicious (to me) kernel messages I mentioned getting only in the second lvconvert were these:

```
[174203.005817] md/raid:mdX: not clean -- starting background reconstruction
```

Running 4.17.8, the results are definitely different. On the first lvconvert, I get this:

```
[  514.269796] md/raid:mdX: device dm-226 operational as raid disk 0
...
[  514.269801] md/raid:mdX: device dm-234 operational as raid disk 4
[  514.269802] md/raid:mdX: device dm-236 operational as raid disk 5
[  514.270462] md/raid:mdX: raid level 6 active with 6 out of 6 devices, algorithm 8
[  515.331847] md: reshape of RAID array mdX
[  517.767100] md/raid:mdX: device dm-226 operational as raid disk 0
...
[  517.767105] md/raid:mdX: device dm-234 operational as raid disk 4
[  517.767106] md/raid:mdX: device dm-236 operational as raid disk 5
[  517.767718] md/raid:mdX: raid level 6 active with 6 out of 6 devices, algorithm 18
[  517.832551] md: mdX: reshape interrupted.
[  518.756190] md: reshape of RAID array mdX
[  532.952524] md: mdX: reshape done.
```

Looks fine. In the second lvconvert, I get a new warning during the command execution:

```
# lvconvert --type raid5 Depot/reshape_test
  Using default stripesize 64.00 KiB.
  Replaced LV type raid5 (same as raid5_ls) with possible type raid5_ls.
  Repeat this command to convert to raid5 after an interim conversion has finished.
Are you sure you want to convert raid6_ls_6 LV Depot/reshape_test to raid5_ls type? [y/n]: y
  WARNING: Sync status for Depot/reshape_test is inconsistent.   <==== NEW WARNING
  Logical volume Depot/reshape_test successfully converted.
```

And in dmesg:

```
[  542.454610] md/raid:mdX: not clean -- starting background reconstruction
[  542.454648] md/raid:mdX: device dm-226 operational as raid disk 0
...
[  542.454652] md/raid:mdX: device dm-234 operational as raid disk 4
[  542.454653] md/raid:mdX: device dm-236 operational as raid disk 5
[  542.455270] md/raid:mdX: raid level 6 active with 6 out of 6 devices, algorithm 18
[  543.371373] md: resync of RAID array mdX
[  543.371387] md: mdX: resync done.
[  543.877618] md/raid:mdX: device dm-226 operational as raid disk 0
...
[  543.877623] md/raid:mdX: device dm-234 operational as raid disk 4
[  543.878445] md/raid:mdX: raid level 5 active with 5 out of 5 devices, algorithm 2
[  544.924005] md/raid:mdX: not clean -- starting background reconstruction
[  544.924060] md/raid:mdX: device dm-226 operational as raid disk 0
...
[  544.924069] md/raid:mdX: device dm-234 operational as raid disk 4
[  544.924862] md/raid:mdX: raid level 5 active with 5 out of 5 devices, algorithm 2
[  545.780715] md: resync of RAID array mdX
[  545.780732] md: mdX: resync done.
```

Looks fine to me. So I guess there is something missing in the 4.14 series of kernels. Maybe some fix needs to be backported, or LVM reshaping needs to be disabled on those kernels?

For an extra check, I did a dd from /dev/urandom after the LV was created and checked the data after the full conversion to RAID5, and the data matched fine. And looking at a new vgcfgbackup, the segments look sane, with the reshape data cleaned up (they have no extra flag on the LV type).

Yes, there are activation-related patches. Please rely on distro-supported kernels or use the newer kernel you built.