Bug 1381996
Summary: TypeError: SynchronizedABCMeta object argument after ** must be a mapping, not NoneType (Intel firmware RAID discovery fails when RAID set is 'migrating')
Product: Fedora
Component: libblockdev
Version: 25
Hardware: x86_64
OS: Unspecified
Status: CLOSED ERRATA
Severity: unspecified
Priority: unspecified
Reporter: Petr Schindler <pschindl>
Assignee: Vratislav Podzimek <vpodzime>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Docs Contact:
CC: anaconda-maint-list, awilliam, bugzilla, dledford, g.kaviyarasu, jonathan, kparal, mkolman, pschindl, robatino, sbueno, vanmeeuwen+fedora, vpodzime
Target Milestone: ---
Target Release: ---
Keywords: CommonBugs, Reopened
Whiteboard: abrt_hash:1651e9b4756458551e31c84d300a2185d5241b409256d1eb4b1811b7800c7322; AcceptedBlocker https://fedoraproject.org/wiki/Common_F25_bugs#some-raid-fail
Fixed In Version: libblockdev-1.9-7.fc25
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-11-01 11:48:18 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1277289
Attachments: (listed in the description below)
Description
Petr Schindler
2016-10-05 13:27:54 UTC
Created attachment 1207593 [details]
File: anaconda-tb
Created attachment 1207594 [details]
File: anaconda.log
Created attachment 1207595 [details]
File: dnf.librepo.log
Created attachment 1207596 [details]
File: environ
Created attachment 1207597 [details]
File: hawkey.log
Created attachment 1207598 [details]
File: lsblk_output
Created attachment 1207599 [details]
File: nmcli_dev_list
Created attachment 1207600 [details]
File: os_info
Created attachment 1207601 [details]
File: program.log
Created attachment 1207602 [details]
File: storage.log
Created attachment 1207603 [details]
File: syslog
Created attachment 1207604 [details]
File: ifcfg.log
Created attachment 1207605 [details]
File: packaging.log
This bug seems to happen only when RAID5 or RAID1 is used (with firmware RAID). I was able to install with RAID0 and without RAID on this computer. I tried to wipefs -a all the disks in the RAID before creating it, but it didn't help. There is Intel firmware RAID on that computer. I propose this as a blocker as it violates the Beta criterion: "The installer must be able to detect and install to hardware or firmware RAID storage devices."

argh, the mdadm --examine -E output for pschindl's fwraid set has some wacky value format that mine didn't:

    13:21:54,566 INFO program: Running [10] mdadm --examine -E /dev/sda
    ...
        RAID Level : 5 <-- 5
        Members : 3 <-- 3
        Slots : [UUU] <-- [UUU]
        Failed disk : none
        This Slot : 0
        Array Size : 468873216 (223.58 GiB 240.06 GB)
        Per Dev Size : 234436872 (111.79 GiB 120.03 GB)
        Sector Offset : 0
        Num Stripes : 915768
        Chunk Size : 128 KiB <-- 128 KiB
        Reserved : 0
        Migrate State : initialize
        Map State : normal <-- uninitialized
    ...

Note all those X <-- Y values. I dunno why the hell mdadm decided to do that, but that's the problem, I think:

    13:21:54,952 DEBUG blivet: blockdev.md.examine error: Failed to parse chunk size from mdexamine data: Failed to parse size spec: 128 KiB <-- 128 KiB

Adding Doug, see if he's aware of why this output from mdadm would differ between two Intel firmware RAID setups.

We think your array was still in the process of migrating from one RAID level to another when you ran this test (I guess you tested RAID 0, then just 'changed' the set's RAID level rather than recreating it as RAID 1?). That's why you get those <-- values: according to the mdadm source / devs, they only show up when the device "is in the middle of a migration". On that basis I'd be -1 blocker on this, as it's a pretty bad idea to try to run an install to a RAID set that's in the middle of a level migration, and I don't think we need to block on that. Could you check whether it works once the migration is complete (i.e. once 'mdadm --examine -E /dev/sda' no longer shows those <-- values)?

As Adam said - looks like a reshape. Doing that just before an install is a really bad idea, even if it probably would work. That said, Anaconda should be able to cope with it, but I agree it shouldn't be a blocker. Jes

In fact, it's probably just that the set was still being *initialized*. Note in the output:

    Migrate State : initialize

Looking at the source code (and my output), that is 'idle' if there's no 'migration' going on. It looks like initialization of the set is basically treated as a 'migration', so we get the <-- formatted output showing both maps even though there isn't a level change or anything going on.

Source, btw: https://github.com/neilbrown/mdadm/blob/master/super-intel.c#L1132 -- `print_imsm_dev` is the function that prints the --examine output; you can see that we get the <-- format if there is a `map2`. We can also see in the Migrate State and Map State code pretty much what's going on there. https://github.com/neilbrown/mdadm/blob/master/super-intel.c#L697 -- `get_imsm_map` gets the maps, and helpfully notes that "A device can have 2 maps if it is in the middle of a migration."

So actually I guess the firmware RAID interface let you out while the set was still initializing, which seems kinda like bad design, but still not something to block Beta on, I don't think. Jes: when you say "Anaconda should be able to cope with it" - should it just go ahead and install to a set that's migrating?
Or should it notice that the set's in a migration state and either wait for it to complete or just warn the user to try again when it's done?

If there's a fragile period of initialization, like setting up a write-intent bitmap or something, then I can see waiting for that. But for a reshape, the array is supposed to be immediately reliable to use. Either a reshape or a resync could take hours. I'm kinda surprised this hasn't been bumped into before now, though.

Chris: the code for analyzing mdraid sets changed substantially between F24 and F25, it's all just different now; the old code may not have had a problem with this (it parsed mdadm output somewhat differently).

(In reply to Adam Williamson from comment #19)
> So actually I guess the firmware RAID interface let you out while the set
> was still initializing, which seems kinda like bad design, but still not
> something to block Beta on, I don't think.

It's very common for the firmware to initialize the control elements of the array and then do the parity/block initialization in the background. I wasn't aware that the imsm code displayed an init in such a way, but the action itself is more or less routine.

Yeah, I think we all more or less understand what's going on now; we just have to teach libblockdev to handle the variant output. I'll try to come up with a patch today; if it's too hard (I learn C by doing!) I'll leave it to vpodzime. :P But for Beta I think a commonbugs note is enough; it'd probably be good to fix this properly for Final.

Doug / Jes: out of all the migrate states - MIGR_INIT, MIGR_REBUILD, MIGR_VERIFY, MIGR_GEN_MIGR, MIGR_STATE_CHANGE, and MIGR_REPAIR - do you think anaconda should just happily go ahead and install to a set in any of those states? Or should it refuse to install when the set is in some of them? I'm figuring we should just go ahead and accept MIGR_INIT, as it seems to be a pretty normal state, but I'm not sure about the others.

I tried to erase both disks used in RAID1 (dd if=/dev/zero ...) and even after this it still keeps crashing. I can't find a way to install on RAID1 or RAID5. I just tried to boot the F24 installer with the same disk settings and it works fine. Installation finished without errors.

pschindl: you just have to wait for the initial resync of the RAID set to complete (which will take a good hour or two, depending on how big / fast your disks are). You can watch the progress with 'mdadm --detail /dev/md126' (or whatever the node is). Once the sync completes, the 'mdadm --examine /dev/sda' output will no longer have the X <-- Y values, and anaconda will work OK (at least, it did for me; I tested).

Discussed at the 2016-10-06 Fedora 25 Beta Go/No-Go meeting, acting as a blocker review meeting: https://meetbot-raw.fedoraproject.org/teams/f25-beta-go_no_go-meeting/f25-beta-go_no_go-meeting.2016-10-06-17.00.html . Rejected as a Beta blocker but accepted as a Final blocker: we think a commonbugs note advising users to wait for the sync operation to complete is sufficient for Beta, but for Final, at least the fairly common 'initial sync' case should be handled smoothly.

pschindl: can you test with https://www.happyassassin.net/updates/1381996.1.img and see if that helps? Create a new RAID > 0 set (1 or 5, anything with redundancy), immediately boot the installer with the updates image, flip to a console and run 'mdadm -e --examine /dev/sda', and make sure you see the lines with "foo <-- bar" values (especially the Chunk Size one), then run the install and see if it works.
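For illustration only: the parse failure quoted earlier ("Failed to parse size spec: 128 KiB <-- 128 KiB") only requires truncating the value at the " <-- " marker before the usual size parsing runs. The C sketch below shows that normalization in isolation; the helper name is hypothetical and this is not the actual libblockdev change (the real fix is in the pull request linked in the next comment).

/* Hypothetical helper: normalize an mdadm --examine value of the form
 * "X <-- Y", which IMSM sets emit while they have two maps (migration or
 * initialization in progress). Keeps only the first value so the normal
 * size/number parsing can run. Illustration only, not the libblockdev patch. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static char *normalize_migrating_value(const char *value)
{
    const char *arrow = strstr(value, " <-- ");
    size_t len = arrow ? (size_t)(arrow - value) : strlen(value);
    char *result = malloc(len + 1);
    if (!result)
        return NULL;
    memcpy(result, value, len);
    result[len] = '\0';
    return result;
}

int main(void)
{
    /* Example value taken from the traceback in this bug. */
    char *chunk = normalize_migrating_value("128 KiB <-- 128 KiB");
    if (chunk) {
        printf("normalized chunk size spec: '%s'\n", chunk);  /* "128 KiB" */
        free(chunk);
    }
    return 0;
}

Once normalized this way, the value parses the same for a migrating set as for an idle one.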
I can't do it as my test box isn't working right now (I killed its EFI var store, I think).

PR: https://github.com/rhinstaller/libblockdev/pull/133
Scratch build: http://koji.fedoraproject.org/koji/taskinfo?taskID=16200095

I tested the updates image from comment 29 and it solves the problem for me. Installation went fine with both RAID1 and RAID5.

Huh, seems like the Bodhi <-> Bugzilla integration is on the fritz: vpodzime submitted an update marked as fixing this, but the bug didn't get commented on or updated. The update is https://bodhi.fedoraproject.org/updates/FEDORA-2016-4720fedecc . There are a couple of other changes besides my patch in it, so I'll build an installer image to test it.

libblockdev-1.9-7.fc25 has been pushed to the Fedora 25 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-4720fedecc

libblockdev-1.9-7.fc25 has been pushed to the Fedora 25 stable repository. If problems still persist, please make note of it in this bug report.

Petr, can you verify that this works with the next nightly? Thanks!

I tested this with Fedora-25-20161031.n.1, which contains libblockdev-1.9-7. It fixes the problem. Both RAID1 and RAID5 work for me now.
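As a footnote to the workaround discussed above (wait for the initial resync to finish before installing): the kernel also exposes an md array's current sync/migration activity through sysfs, so it can be checked without polling 'mdadm --detail'. A rough sketch follows, assuming the array node is md126 (typical for IMSM firmware RAID sets, but not guaranteed); it is purely illustrative and not something anaconda does.

/* Read /sys/block/<md>/md/sync_action and report whether the array is idle.
 * Returns 1 if idle, 0 if a resync/reshape/etc. is in progress, -1 on error.
 * The node name "md126" below is only an example. */
#include <stdio.h>
#include <string.h>

static int array_is_idle(const char *md_name)
{
    char path[256];
    char action[64] = "";
    FILE *f;

    snprintf(path, sizeof(path), "/sys/block/%s/md/sync_action", md_name);
    f = fopen(path, "r");
    if (!f)
        return -1;  /* not an md device, or attribute not present */
    if (!fgets(action, sizeof(action), f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    action[strcspn(action, "\n")] = '\0';
    return strcmp(action, "idle") == 0;
}

int main(void)
{
    int idle = array_is_idle("md126");
    if (idle == 1)
        puts("initial resync finished; safe to proceed");
    else if (idle == 0)
        puts("array is still syncing/migrating; wait before installing");
    else
        puts("could not read sync_action");
    return 0;
}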