Bug 1381996 - TypeError: SynchronizedABCMeta object argument after ** must be a mapping, not NoneType (Intel firmware RAID discovery fails when RAID set is 'migrating')
Summary: TypeError: SynchronizedABCMeta object argument after ** must be a mapping, not NoneType
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: libblockdev
Version: 25
Hardware: x86_64
OS: Unspecified
Target Milestone: ---
Assignee: Vratislav Podzimek
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard: abrt_hash:1651e9b4756458551e31c84d300...
Keywords: CommonBugs, Reopened
Depends On:
Blocks: F25FinalBlocker
 
Reported: 2016-10-05 13:27 UTC by Petr Schindler
Modified: 2016-11-01 11:48 UTC
CC List: 13 users

Clone Of:
Last Closed: 2016-11-01 11:48:18 UTC


Attachments
File: anaconda-tb (258.18 KB, text/plain)
2016-10-05 13:28 UTC, Petr Schindler
no flags Details
File: anaconda.log (12.36 KB, text/plain)
2016-10-05 13:28 UTC, Petr Schindler
no flags Details
File: dnf.librepo.log (11.02 KB, text/plain)
2016-10-05 13:28 UTC, Petr Schindler
no flags Details
File: environ (482 bytes, text/plain)
2016-10-05 13:28 UTC, Petr Schindler
no flags Details
File: hawkey.log (900 bytes, text/plain)
2016-10-05 13:28 UTC, Petr Schindler
no flags Details
File: lsblk_output (1.95 KB, text/plain)
2016-10-05 13:28 UTC, Petr Schindler
no flags Details
File: nmcli_dev_list (1.57 KB, text/plain)
2016-10-05 13:28 UTC, Petr Schindler
no flags Details
File: os_info (449 bytes, text/plain)
2016-10-05 13:28 UTC, Petr Schindler
no flags Details
File: program.log (24.24 KB, text/plain)
2016-10-05 13:28 UTC, Petr Schindler
no flags Details
File: storage.log (59.09 KB, text/plain)
2016-10-05 13:28 UTC, Petr Schindler
no flags Details
File: syslog (108.13 KB, text/plain)
2016-10-05 13:28 UTC, Petr Schindler
no flags Details
File: ifcfg.log (2.26 KB, text/plain)
2016-10-05 13:28 UTC, Petr Schindler
no flags Details
File: packaging.log (887 bytes, text/plain)
2016-10-05 13:28 UTC, Petr Schindler
no flags Details

Description Petr Schindler 2016-10-05 13:27:54 UTC
Description of problem:
This crash appeared right after anaconda started. There is firmware RAID on this machine (not sure if it is relevant).

Version-Release number of selected component:
anaconda-25.20.4-1

The following was filed automatically by anaconda:
anaconda 25.20.4-1 exception report
Traceback (most recent call first):
  File "/usr/lib/python3.5/site-packages/blivet/populator/helpers/disk.py", line 74, in run
    device = self._device_class(name, **kwargs)
  File "/usr/lib/python3.5/site-packages/blivet/populator/populator.py", line 306, in handle_device
    device = helper_class(self, info).run()
  File "/usr/lib/python3.5/site-packages/blivet/threads.py", line 45, in run_with_lock
    return m(*args, **kwargs)
  File "/usr/lib/python3.5/site-packages/blivet/populator/populator.py", line 518, in _populate
    self.handle_device(dev)
  File "/usr/lib/python3.5/site-packages/blivet/threads.py", line 45, in run_with_lock
    return m(*args, **kwargs)
  File "/usr/lib/python3.5/site-packages/blivet/populator/populator.py", line 451, in populate
    self._populate()
  File "/usr/lib/python3.5/site-packages/blivet/threads.py", line 45, in run_with_lock
    return m(*args, **kwargs)
  File "/usr/lib/python3.5/site-packages/blivet/blivet.py", line 271, in reset
    self.devicetree.populate(cleanup_only=cleanup_only)
  File "/usr/lib/python3.5/site-packages/blivet/threads.py", line 45, in run_with_lock
    return m(*args, **kwargs)
  File "/usr/lib/python3.5/site-packages/blivet/osinstall.py", line 1175, in storage_initialize
    storage.reset()
  File "/usr/lib64/python3.5/threading.py", line 862, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib64/python3.5/site-packages/pyanaconda/threads.py", line 251, in run
    threading.Thread.run(self, *args, **kwargs)
TypeError: SynchronizedABCMeta object argument after ** must be a mapping, not NoneType

Additional info:
addons:         com_redhat_kdump, com_redhat_docker
cmdline:        /usr/bin/python3  /sbin/anaconda
cmdline_file:   BOOT_IMAGE=/syslinux/vmlinuz inst.stage2=hd:UUID=2A80-6F31 quiet
executable:     /sbin/anaconda
hashmarkername: anaconda
kernel:         4.8.0-0.rc7.git0.1.fc25.x86_64
product:        Fedora
release:        Cannot get release name.
type:           anaconda
version:        25

Comment 1 Petr Schindler 2016-10-05 13:28:00 UTC
Created attachment 1207593 [details]
File: anaconda-tb

Comment 2 Petr Schindler 2016-10-05 13:28:02 UTC
Created attachment 1207594 [details]
File: anaconda.log

Comment 3 Petr Schindler 2016-10-05 13:28:03 UTC
Created attachment 1207595 [details]
File: dnf.librepo.log

Comment 4 Petr Schindler 2016-10-05 13:28:05 UTC
Created attachment 1207596 [details]
File: environ

Comment 5 Petr Schindler 2016-10-05 13:28:06 UTC
Created attachment 1207597 [details]
File: hawkey.log

Comment 6 Petr Schindler 2016-10-05 13:28:08 UTC
Created attachment 1207598 [details]
File: lsblk_output

Comment 7 Petr Schindler 2016-10-05 13:28:09 UTC
Created attachment 1207599 [details]
File: nmcli_dev_list

Comment 8 Petr Schindler 2016-10-05 13:28:10 UTC
Created attachment 1207600 [details]
File: os_info

Comment 9 Petr Schindler 2016-10-05 13:28:12 UTC
Created attachment 1207601 [details]
File: program.log

Comment 10 Petr Schindler 2016-10-05 13:28:14 UTC
Created attachment 1207602 [details]
File: storage.log

Comment 11 Petr Schindler 2016-10-05 13:28:16 UTC
Created attachment 1207603 [details]
File: syslog

Comment 12 Petr Schindler 2016-10-05 13:28:18 UTC
Created attachment 1207604 [details]
File: ifcfg.log

Comment 13 Petr Schindler 2016-10-05 13:28:19 UTC
Created attachment 1207605 [details]
File: packaging.log

Comment 14 Petr Schindler 2016-10-05 14:05:12 UTC
This bug seems to happen only when RAID5 or RAID1 is used (with firmware RAID). I was able to install with RAID0 and without RAID on this computer. I tried running wipefs -a on all the disks in the RAID before creating it, but it didn't help.

There is Intel firmware raid on that computer.

I propose this as a blocker as it violates the beta criterion: "The installer must be able to detect and install to hardware or firmware RAID storage devices."

Comment 15 Adam Williamson 2016-10-05 15:41:51 UTC
argh, the mdadm --examine -E output for pschindl's fwraid set has some wacky value format that mine didn't:

13:21:54,566 INFO program: Running [10] mdadm --examine -E /dev/sda ...

...

     RAID Level : 5 <-- 5
        Members : 3 <-- 3
          Slots : [UUU] <-- [UUU]
    Failed disk : none
      This Slot : 0
     Array Size : 468873216 (223.58 GiB 240.06 GB)
   Per Dev Size : 234436872 (111.79 GiB 120.03 GB)
  Sector Offset : 0
    Num Stripes : 915768
     Chunk Size : 128 KiB <-- 128 KiB
       Reserved : 0
  Migrate State : initialize
      Map State : normal <-- uninitialized

...

Note all those X <-- Y values, I dunno why the hell mdadm decided to do that but that's the problem, I think:

13:21:54,952 DEBUG blivet: blockdev.md.examine error: Failed to parse chunk size from mdexamine data: Failed to parse size spec: 128 KiB <-- 128 KiB
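The failure above is just the parser choking on the dual-value format. A minimal Python sketch of the needed tolerance (not the actual libblockdev code, which is C; `normalize_value` and `parse_size` are hypothetical helper names):

```python
import re

# Binary-size suffixes as mdadm prints them (e.g. "128 KiB").
_UNITS = {"B": 1, "KiB": 1024, "MiB": 1024 ** 2, "GiB": 1024 ** 3}

def normalize_value(raw):
    """Keep only the current value from the 'current <-- target'
    decoration mdadm emits while an IMSM set is migrating."""
    return raw.split("<--")[0].strip()

def parse_size(spec):
    """Parse a size spec like '128 KiB' into bytes."""
    m = re.fullmatch(r"(\d+(?:\.\d+)?)\s*(B|KiB|MiB|GiB)", spec)
    if m is None:
        raise ValueError("Failed to parse size spec: %s" % spec)
    return int(float(m.group(1)) * _UNITS[m.group(2)])

# Stripping the decoration first lets the failing value parse cleanly:
chunk_size = parse_size(normalize_value("128 KiB <-- 128 KiB"))
```

The same normalization applies to non-size fields like `Map State : normal <-- uninitialized`, where only the left-hand (current) value matters.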

Comment 16 Chris Murphy 2016-10-05 15:55:15 UTC
Adding Doug, see if he's aware of why this output from mdadm would differ between two Intel firmware raid setups.

Comment 17 Adam Williamson 2016-10-05 16:07:56 UTC
We think your array was still in the process of migrating from one RAID level to another when you ran this test (I guess you tested RAID 0, then just 'changed' the set's RAID level rather than recreating it at RAID 1?). That's why you get those <-- values: according to mdadm source / devs, they only show up when the device "is in the middle of a migration".

On that basis I'd be -1 blocker on this, as it's a pretty bad idea to try and run an install to a RAID set that's in the middle of a level migration and I don't think we need to block on that. Could you check if it works once the migration is complete (i.e. once an 'mdadm --examine -E /dev/sda' doesn't show those <-- values)?

Comment 18 Jes Sorensen 2016-10-05 16:22:50 UTC
As Adam said - looks like a reshape. Doing that just before an install is a
really bad idea, even if it probably would work.

That said Anaconda should be able to cope with it, but I agree it shouldn't
be a blocker.

Jes

Comment 19 Adam Williamson 2016-10-05 16:24:20 UTC
in fact, it's probably just that the set was still being *initialized*. Note in the output:

  Migrate State : initialize

looking at the source code (and my output), that is 'idle' if there's no 'migration' going on. It looks like initialization of the set is basically treated as a 'migration', so we get the <-- formatted output showing both maps even though there isn't a level change or anything going on.

Source, btw:

https://github.com/neilbrown/mdadm/blob/master/super-intel.c#L1132

`print_imsm_dev` is the function that prints the --examine output, you can see that we get the <-- format if there is a `map2`. We can also see in the Migrate State and Map State code pretty much what's going on there.

https://github.com/neilbrown/mdadm/blob/master/super-intel.c#L697

`get_imsm_map` gets the maps, and helpfully notes that "A device can have 2 maps if it is in the middle of a migration."

So actually I guess the firmware RAID interface let you out while the set was still initializing, which seems kinda like bad design, but still not something to block Beta on I don't think.

Comment 20 Adam Williamson 2016-10-05 16:26:06 UTC
Jes: when you say "Anaconda should be able to cope with it" - should it just go ahead and install to a set that's migrating? Or should it notice that the set's in a migration state and either wait for it to complete or just warn the user to try again when it's done?

Comment 21 Chris Murphy 2016-10-05 16:53:10 UTC
If there's a fragile period of initialization like setting up a write intent bitmap or something, then I can see waiting for that. But for reshape, it's supposed to be immediately reliable to use the array. Either a reshape or resync could take hours. I'm kinda surprised this hasn't been bumped into before now though.

Comment 22 Adam Williamson 2016-10-05 17:02:45 UTC
Chris: the code for analyzing mdraid sets changed substantially between F24 and F25, it's all just different now; the old code may not have had a problem with this (it parsed mdadm output somewhat differently).

Comment 23 Doug Ledford 2016-10-05 18:21:42 UTC
(In reply to Adam Williamson from comment #19)
> So actually I guess the firmware RAID interface let you out while the set
> was still initializing, which seems kinda like bad design, but still not
> something to block Beta on I don't think.

It's very common for the firmware to initialize the control elements of the array and then do the parity/block initialize in the background.  I wasn't aware that the imsm code displayed an init in such a way, but the action itself is more or less routine.

Comment 24 Adam Williamson 2016-10-05 19:00:03 UTC
yeah, I think we all more or less understand what's going on now; we just have to teach libblockdev to handle the variant output. I'll try and come up with a patch today; if it's too hard (I learn C by doing!) I'll leave it to vpodzime. :P But for Beta I think a commonbugs note is enough, and it'd probably be good to fix this properly for Final.

Doug / Jes, out of all the migrate states - MIGR_INIT, MIGR_REBUILD, MIGR_VERIFY, MIGR_GEN_MIGR, MIGR_STATE_CHANGE, and MIGR_REPAIR - do you think anaconda should just happily go ahead and install to a set in any of those states? Or should it refuse to install when the set is in some of them?

I'm figuring we should just go ahead and accept MIGR_INIT as it seems to be a pretty normal state, but I'm not sure about the others.

Comment 25 Petr Schindler 2016-10-06 08:57:09 UTC
I tried to erase both disks used in RAID1 (dd if=/dev/zero ...) and even after this it still keeps crashing. I can't find a way to install on RAID1 or RAID5.

Comment 26 Petr Schindler 2016-10-06 09:32:47 UTC
I just tried to boot F24 installer with the same disk settings and it works fine. Installation finished without errors.

Comment 27 Adam Williamson 2016-10-06 15:41:34 UTC
pschindl: you just have to wait for the initial resync of the RAID set to complete (which will take a good hour or two, depending how big / fast your disks are). You can watch the progress with 'mdadm --detail /dev/md126' (or whatever the node is). Once the sync completes, the 'mdadm --examine /dev/sda' output will no longer have the X <-- Y values, and anaconda will work OK (at least, it did for me, I tested).
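That wait can be scripted. A hedged Python sketch (the helper name is made up; the `Resync Status` line is the field mdadm --detail typically prints during the initial sync, but the exact format should be verified against your mdadm version):

```python
def resync_progress(detail_output):
    """Return the resync percentage from `mdadm --detail` output,
    or None once no 'Resync Status' line is present (sync finished
    or never started)."""
    for line in detail_output.splitlines():
        if "Resync Status" in line:
            # e.g. "    Resync Status : 42% complete"
            return int(line.split(":")[1].strip().split("%")[0])
    return None

# Sample output fragment for illustration:
sample = """/dev/md126:
      Raid Level : raid1
    Resync Status : 42% complete
"""
```

In practice the text would come from something like `subprocess.run(["mdadm", "--detail", "/dev/md126"], capture_output=True, text=True).stdout`, polled in a loop until it returns None.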

Comment 28 Adam Williamson 2016-10-06 20:52:35 UTC
Discussed at 2016-10-06 Fedora 25 Beta Go/No-Go meeting, acting as a blocker review meeting: https://meetbot-raw.fedoraproject.org/teams/f25-beta-go_no_go-meeting/f25-beta-go_no_go-meeting.2016-10-06-17.00.html . Rejected as a Beta blocker but accepted as a Final blocker: we think a commonbugs note advising to wait for the sync operation to complete is sufficient for Beta, but for Final, at least the fairly common 'initial sync' case should be handled smoothly.

Comment 29 Adam Williamson 2016-10-25 22:17:02 UTC
pschindl: can you test with https://www.happyassassin.net/updates/1381996.1.img and see if that helps? Create a new RAID > 0 set (1 or 5, anything with redundancy), immediately boot the installer with the updates image, flip to a console and run 'mdadm --examine /dev/sda' and make sure you see the lines with "foo <-- bar" values (especially the Chunk Size one), then run the install and see if it works. I can't do it as my test box isn't working right now (I killed its EFI var store, I think).

PR: https://github.com/rhinstaller/libblockdev/pull/133
Scratch build: http://koji.fedoraproject.org/koji/taskinfo?taskID=16200095

Comment 30 Petr Schindler 2016-10-26 12:21:50 UTC
I tested the updates image from comment 29 and it solves the problem for me. Installation went fine both with RAID1 and RAID5.

Comment 31 Adam Williamson 2016-10-27 21:38:12 UTC
Huh, seems like Bodhi <-> Bugzilla integration is on the fritz: vpodzime submitted an update marked as fixing this, but this bug didn't get a comment or a status update.

Update is https://bodhi.fedoraproject.org/updates/FEDORA-2016-4720fedecc . There are a couple of other changes besides my patch in it, so I'll build an installer image to test it.

Comment 32 Fedora Update System 2016-10-28 14:55:23 UTC
libblockdev-1.9-7.fc25 has been pushed to the Fedora 25 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-4720fedecc

Comment 33 Fedora Update System 2016-10-31 07:37:42 UTC
libblockdev-1.9-7.fc25 has been pushed to the Fedora 25 stable repository. If problems still persist, please make note of it in this bug report.

Comment 34 Adam Williamson 2016-10-31 15:39:29 UTC
Petr, can you verify that this works with the next nightly? Thanks!

Comment 35 Petr Schindler 2016-11-01 11:48:18 UTC
I tested this with Fedora-25-20161031.n.1 which contains libblockdev-1.9-7. It fixes the problem. Both RAID{1,5} work for me now.

