Bug 1263539

Summary: F23 alpha Installation crashes on system with BIOS fake RAID (DDF meta data)
Product: Fedora
Reporter: Martin Wilck <martin.wilck>
Component: python-blivet
Assignee: Blivet Maintenance Team <blivet-maint-list>
Status: CLOSED EOL
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: urgent
Docs Contact:
Priority: unspecified
Version: 23
CC: anaconda-maint-list, blivet-maint-list, brucemartin10, chorn, g.kaviyarasu, jonathan, vanmeeuwen+fedora
Target Milestone: ---
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Clones: 1263550 (view as bug list)
Environment:
Last Closed: 2016-12-20 14:38:40 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1263550
Attachments (flags: none):
  anaconda bug report
  result of booting F23 alpha live image with rd.md=1 rd.dm=0 rd.md.ddf=1 rd.auto=1
  screenshot from RHEL 7.1

Description Martin Wilck 2015-09-16 06:34:21 UTC
Created attachment 1073902 [details]
anaconda bug report

Description of problem:
Installation crashes on system with BIOS fake RAID (DDF meta data)

Version-Release number of selected component (if applicable):
23.17-1.fc23

How reproducible:
always

Steps to Reproduce:
1. install on Fujitsu PRIMERGY RX1330 (or other model) with BIOS fake RAID

Actual results:
See attached logs

Expected results:
Installation is successful

Additional info:
The fake RAID volumes are visible in the system (on the command line) and configured with dmraid. I had actually expected them to be configured with MD. But that's a minor point compared to the fact that installation doesn't work at all.

Comment 1 Martin Wilck 2015-09-16 13:01:22 UTC
Traceback (most recent call last):
  File "/usr/lib64/python3.4/site-packages/pyanaconda/threads.py", line 253, in run
    threading.Thread.run(self, *args, **kwargs)
  File "/usr/lib64/python3.4/threading.py", line 868, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib64/python3.4/site-packages/pyanaconda/timezone.py", line 76, in time_initialize
    threadMgr.wait(THREAD_STORAGE)
  File "/usr/lib64/python3.4/site-packages/pyanaconda/threads.py", line 116, in wait
    self.raise_if_error(name)
  File "/usr/lib64/python3.4/site-packages/pyanaconda/threads.py", line 171, in raise_if_error
    raise exc_info[0](exc_info[1]).with_traceback(exc_info[2])
  File "/usr/lib64/python3.4/site-packages/pyanaconda/threads.py", line 253, in run
    threading.Thread.run(self, *args, **kwargs)
  File "/usr/lib64/python3.4/threading.py", line 868, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.4/site-packages/blivet/osinstall.py", line 1123, in storageInitialize
    storage.reset()
  File "/usr/lib/python3.4/site-packages/blivet/blivet.py", line 279, in reset
    self.devicetree.populate(cleanupOnly=cleanupOnly)
  File "/usr/lib/python3.4/site-packages/blivet/devicetree.py", line 554, in populate
    self._populator.populate(cleanupOnly=cleanupOnly)
  File "/usr/lib/python3.4/site-packages/blivet/populator.py", line 1597, in populate
    self._populate()
  File "/usr/lib/python3.4/site-packages/blivet/populator.py", line 1660, in _populate
    self.addUdevDevice(dev)
  File "/usr/lib/python3.4/site-packages/blivet/populator.py", line 713, in addUdevDevice
    device = self.addUdevDMDevice(info)
  File "/usr/lib/python3.4/site-packages/blivet/populator.py", line 304, in addUdevDMDevice
    slave_devices = self._addSlaveDevices(info)
  File "/usr/lib/python3.4/site-packages/blivet/populator.py", line 273, in _addSlaveDevices
    raise DeviceTreeError(msg)
blivet.errors.DeviceTreeError: failed to add slave ddf1_4c53492020202020808627c3000000004711471100001450 of device ddf1_4c53492020202020808627c3000000004711471100001450p7

Comment 2 Martin Wilck 2015-09-16 13:04:58 UTC
It appears that the above stack re-raises a Python exception that had occurred earlier in blivet:

02:20:55,111 DEBUG blivet:              Populator.addUdevDevice: name: ddf1_4c53492020202020808627c3000000004711471100001450 ; info: {'DEVLINKS': '/dev/disk/by-id/dm-uuid-DMRAID-ddf1_4c53492020202020808627c3000000004711471100001450 '
             '/dev/disk/by-id/dm-name-ddf1_4c53492020202020808627c3000000004711471100001450',
 'DEVNAME': '/dev/dm-3',
 'DEVPATH': '/devices/virtual/block/dm-3',
 'DEVTYPE': 'disk',
 'DM_NAME': 'ddf1_4c53492020202020808627c3000000004711471100001450',
 'DM_SUSPENDED': '0',
 'DM_UDEV_DISABLE_DM_RULES_FLAG': '1',
 'DM_UDEV_DISABLE_SUBSYSTEM_RULES_FLAG': '1',
 'DM_UDEV_PRIMARY_SOURCE_FLAG': '1',
 'DM_UDEV_RULES_VSN': '2',
 'DM_UUID': 'DMRAID-ddf1_4c53492020202020808627c3000000004711471100001450',
 'ID_PART_TABLE_TYPE': 'gpt',
 'ID_PART_TABLE_UUID': '1525de8a-5805-4727-8745-0a8fa5e62a67',
 'MAJOR': '253',
 'MINOR': '3',
 'MPATH_SBIN_PATH': '/sbin',
 'SUBSYSTEM': 'block',
 'TAGS': ':systemd:',
 'USEC_INITIALIZED': '33755393'} ;
02:20:55,112 INFO blivet: scanning ddf1_4c53492020202020808627c3000000004711471100001450 (/sys/devices/virtual/block/dm-3)...
02:20:55,113 DEBUG blivet:                DeviceTree.getDeviceByName: name: ddf1_4c53492020202020808627c3000000004711471100001450 ; incomplete: False ; hidden: False ;
02:20:55,114 DEBUG blivet:                DeviceTree.getDeviceByName returned None
02:20:55,114 INFO blivet: ddf1_4c53492020202020808627c3000000004711471100001450 is a device-mapper device
02:20:55,115 DEBUG blivet:               Populator.addUdevDMDevice: name: ddf1_4c53492020202020808627c3000000004711471100001450 ;
02:20:55,116 DEBUG blivet:                  DeviceTree.getDeviceByName: name: sda ; incomplete: False ; hidden: False ;
02:20:55,118 DEBUG blivet:                  DeviceTree.getDeviceByName returned existing 232.89 GiB disk sda (17) with existing dmraidmember
02:20:55,119 DEBUG blivet:                  DeviceTree.getDeviceByName: name: sdb ; incomplete: False ; hidden: False ;
02:20:55,120 DEBUG blivet:                  DeviceTree.getDeviceByName returned existing 232.89 GiB disk sdb (23) with existing dmraidmember
02:20:55,122 DEBUG blivet:                 DeviceTree.getDeviceByName: name: ddf1_4c53492020202020808627c3000000004711471100001450 ; incomplete: False ; hidden: False ;
02:20:55,123 DEBUG blivet:                 DeviceTree.getDeviceByName returned None
02:20:55,123 DEBUG blivet: lvm filter: adding ddf1_4c53492020202020808627c3000000004711471100001450 to the reject list
02:20:55,123 WARN blivet: ignoring dm device ddf1_4c53492020202020808627c3000000004711471100001450
02:20:55,124 DEBUG blivet: no device obtained for ddf1_4c53492020202020808627c3000000004711471100001450
02:20:55,124 DEBUG blivet:               DeviceTree.getDeviceByName: name: ddf1_4c53492020202020808627c3000000004711471100001450 ; incomplete: False ; hidden: False ;
02:20:55,125 DEBUG blivet:               DeviceTree.getDeviceByName returned None
02:20:55,126 ERR blivet: failure scanning device ddf1_4c53492020202020808627c3000000004711471100001450p7: could not add slave ddf1_4c53492020202020808627c3000000004711471100001450
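
For reference, the "slave" relationship that fails here comes from sysfs: the slave of the ...p7 partition device is the dmraid array itself, which blivet had just decided to ignore. A minimal illustration of that lookup (not blivet's actual _addSlaveDevices code; the device names are the ones from the log above):

import os

def list_slaves(node, sysfs="/sys/class/block"):
    """Return the kernel names of the devices a block device sits on top of."""
    slaves_dir = os.path.join(sysfs, node, "slaves")
    return sorted(os.listdir(slaves_dir)) if os.path.isdir(slaves_dir) else []

# e.g. list_slaves("dm-3") should return ['sda', 'sdb'] for the array above;
# the ...p7 partition's only slave is that ignored array, which is why
# DeviceTree.getDeviceByName() returns None and populate() fails.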

Comment 3 David Lehman 2015-09-16 13:33:33 UTC
As far as I know, DDF metadata is not supported by the installer or by mdadm. My understanding is that it does not work well even with dmraid.

Comment 4 Martin Wilck 2015-09-16 13:46:24 UTC
That's not true. DDF works quite well with both mdadm and dmraid. Several distributions, including RHEL 6 (not certain about RHEL 7), will offer installation on these fake RAIDs by default. Red Hat based distros have only offered dmraid in the past; other distros have switched to mdadm lately.
 
I personally favor mdadm over dmraid because of the advanced RAID management features (https://raid.wiki.kernel.org/index.php/DDF_Fake_RAID), but I guess that's a matter of taste. mdadm is preferred for IMSM meta data these days, so preferring it for DDF, too, would be consistent.

However, on F22 and F23 the fake RAID sets can be activated with neither dmraid nor mdadm during installation, and that's a disaster.

Looking at the udev rules and dracut code in Fedora, it looks as if mdadm support for DDF is generally available. But blivet/anaconda messes it all up.

In F22, anaconda silently added the command line options "rd.dm=0 rd.md=0" to suppress automatic software RAID assembly by udev, then activated dmraid RAID sets later in the boot process via systemd (fedora-dmraid-activation.service). In F23, "rd.dm=0" seems to have been dropped again. The result in both F22 and F23 is that when anaconda is up, the fake RAID devices are under control of dmraid. Not sure why anaconda doesn't offer them for installation though.
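
For what it's worth, which of these options actually took effect is easy to check from a shell in the installer environment. A minimal sketch (not anaconda code; purely illustrative):

def raid_boot_options(cmdline="/proc/cmdline"):
    """Return the dmraid/md-related options the kernel was booted with."""
    with open(cmdline) as f:
        opts = f.read().split()
    return [o for o in opts if o.startswith(("rd.dm", "rd.md", "nodmraid"))]

# On F22 installer media this would be expected to include 'rd.dm=0' and
# 'rd.md=0'; on F23 apparently only 'rd.md=0' (per the observation above).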

Comment 5 Martin Wilck 2015-09-16 14:04:20 UTC
Created attachment 1074036 [details]
result of booting F23 alpha live image with rd.md=1 rd.dm=0 rd.md.ddf=1 rd.auto=1

Comment 6 Martin Wilck 2015-09-16 14:12:20 UTC
With md activated (comment #5), the anaconda backtrace is the same as in the corresponding case described in bug 1263550. It bails out because of the missing RAID array name. 

  File "/usr/lib/python3.4/site-packages/blivet/populator.py", line 1660, in _populate
    self.addUdevDevice(dev)
  File "/usr/lib/python3.4/site-packages/blivet/populator.py", line 724, in addUdevDevice
    device = self.addUdevPartitionDevice(info)
  File "/usr/lib/python3.4/site-packages/blivet/populator.py", line 404, in addUdevPartitionDevice
    name = blockdev.md.name_from_node(name)
  File "/usr/lib64/python3.4/site-packages/gi/overrides/BlockDev.py", line 416, in wrapped
    raise transform[1](msg)
gi.overrides.BlockDev.MDRaidError: No name found for the node 'md126p1'

Firstly, looking up a RAID array name for a partition is obviously wrong. Secondly, this RAID array actually has no name associated with it (no MD_DEVNAME, as printed by mdadm -D --export $MDDEV). However, that's no reason not to use it: the array has a valid device node /dev/md126 and a valid UUID, and either could be used as a "name" in the UI.
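
A minimal sketch of the fallback suggested above (not blivet's actual code; it assumes, as mdadm normally arranges, that named arrays get a /dev/md/<name> symlink pointing at the kernel node):

import os

def md_display_name(node, md_dir="/dev/md"):
    """Best-effort display name for an MD node such as 'md126'."""
    if os.path.isdir(md_dir):
        for name in os.listdir(md_dir):
            target = os.path.realpath(os.path.join(md_dir, name))
            if os.path.basename(target) == node:
                return name   # array has a proper name (MD_DEVNAME)
    return node               # unnamed array: fall back to the node (or UUID)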

Comment 7 Christian Horn 2015-09-16 19:10:19 UTC
Just for understanding:

- I would assume that the installation could be started if the disk block device were completely cleared beforehand, i.e. by booting a live DVD and running "cat /dev/zero >/dev/sda". Correct?
- Not sure how standardized/open the RAID signature on the disks is
- I think you would like to not only see the signature ignored/overwritten, but actively picked up and used.
- Reminder to self: I guess our driver is doing here what upstream does, but it might be worth checking (i.e. whether the desired behaviour is not implemented at all vs. just not activated/configured)

Comment 8 Martin Wilck 2015-09-17 06:51:53 UTC
(In reply to Christian Horn from comment #7)

> - I would assume that the installation could be started if the disk block
> device were completely cleared beforehand, i.e. by booting a live DVD and
> running "cat /dev/zero >/dev/sda". Correct?

Probably, yes. But that's not my intention.

> - Not sure how standardized/open the RAID signature on the disks is

DDF is a well-known standard and is supported by both dmraid and mdadm. In fact, it's the only vendor-neutral standard for RAID metadata that I'm aware of.

http://www.snia.org/tech_activities/standards/curr_standards/ddf

(Not to say that these standards are perfect; IMHO they leave too much room for ambiguity and "vendor specific" stuff.)

> - I think you would like to not only see the signature ignored/overwritten,
> but actively picked up and used.

Exactly. And it works: I have it working at home under CentOS, OpenSUSE, and SLES.

> - Reminder to self: I guess our driver is doing here what upstream does, but
> it might be worth checking (i.e. whether the desired behaviour is not
> implemented at all vs. just not activated/configured)

It's not a driver or low-level tool problem. Several components need to be looked at:

 1 mdadm (or dmraid) needs to have proper DDF support. I would claim that this is almost completely done (a quick sanity check is sketched after this list).

 2 udev rules need to be set up to automatically activate this kind of RAID array during boot. The dmraid and mdadm packages both come with their own set of udev rules for exactly this task, and I am certain that they work well. It's important to make sure that they don't interfere with each other, and to either have a clear preference or give the admin the opportunity to choose.

 3 During boot (initial RAM disk), dracut changes some udev rules using its own parameters (nodmraid, rd.md, rd.md.ddf, etc.). It's necessary to check that dracut code, too. dracut currently seems to have basic support for both dmraid and md/DDF (otherwise there would be no need for the rd.md.ddf parameter). Also, the "dracut" command needs to set up the initrd such that the required RAID tools are available during boot (IMO that's done; it's also needed for root on regular md RAID).

 4 Then there is systemd, which has its own ideas about device setup. On F22, it starts "fedora-dmraid-activation.service". This is a pretty obvious indication that Fedora is supposed to have some sort of BIOS RAID support. I don't quite understand why this service is necessary, as there are udev rules for automatic setup already.

 5 During installation, the game is different again because anaconda is doing device setup according to its own rules. 

 6 The boot loader needs to be checked; it has to support boot loader installation on an MD RAID or dmraid device.
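
A minimal sanity check for items 1 and 2 above (assumes mdadm is installed; illustrative only, the output parsing is deliberately crude):

import subprocess

def ddf_arrays_mdadm_sees():
    """ARRAY lines with DDF metadata that mdadm detects on the member disks."""
    out = subprocess.check_output(["mdadm", "--examine", "--scan"],
                                  universal_newlines=True)
    return [line for line in out.splitlines() if "metadata=ddf" in line]

def assembled_md_nodes():
    """md devices the kernel has actually assembled right now."""
    with open("/proc/mdstat") as f:
        return [line.split()[0] for line in f if line.startswith("md")]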

I will install my system with Fedora on a USB disk and run some more experiments.

Comment 9 Martin Wilck 2015-09-17 07:46:39 UTC
Created attachment 1074318 [details]
screenshot from RHEL 7.1

I just checked RHEL 7.1 and it offers the BIOS RAID set for installation, using dmraid.

However, it doesn't seem to work properly (existing GPT partitions on the RAID set aren't properly detected and displayed), so I'd rather not continue here.

Comment 10 Bruce Martin 2016-03-17 15:54:52 UTC
From Greater Montreal Canada area...

I have been trying to install the GNOME desktop in a Fedora 23 x86-64 Server installation.

According to Fedora's Instructions (https://docs.fedoraproject.org/en-US/Fedora/13/html/Installation_Guide/sn-switching-to-gui-login.html#sn-enabling-repos)

It informs me that it cannot continue because Yum has been deprecated. (The test was at the end of February 2016.)

These instructions, then, appear themselves to be "deprecated" or, more likely, simply obsolete.

There may exist a current equivalent to do this with rsync or the other protocol that is being used to replace Yum, but if so, I am not aware of it.

YUMEX, as it happens, has packages that can go with it, so it appears to be able to work as a nice front end for rsync and the other protocol, in addition to Yum - Bravo! That looks good to me!

However, I wonder if Yumex would be capable of running its normal screens before I get the whole Gnome desktop running, so I can use that to download and install added packages as needed? If so, Bravo again!

If not, could Yumex be upgraded yet again to allow this to work (or to be installable as a groupinstall with its dependencies to do that)?

One of the reasons for needing the GUI desktop is that this new server is intended to accommodate many hardware upgrades, such as USB3 firmware RAID boxes, to be installed progressively over time.

It will also need to accept and mount hard drives in a "toaster", i.e. a USB3 external hard drive dock which may contain up to 2 physical hard drives and/or SSDs, to be swapped on a regular, ongoing basis.

Comment 11 Martin Wilck 2016-03-21 11:13:56 UTC
(In reply to Bruce Martin from comment #10)
> From Greater Montreal Canada area...

Greetings from Westfalia, Germany ... you are describing an interesting phenomenon, but I fail to see the relation to this BZ. IMO you should open a new bug.

Comment 12 Fedora End Of Life 2016-11-24 12:30:49 UTC
This message is a reminder that Fedora 23 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 23. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora  'version'
of '23'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not
able to fix it before Fedora 23 reached end of life. If you would still like
to see this bug fixed and are able to reproduce it against a later version
of Fedora, you are encouraged to change the 'version' to a later Fedora
version before this bug is closed, as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 13 Fedora End Of Life 2016-12-20 14:38:40 UTC
Fedora 23 changed to end-of-life (EOL) status on 2016-12-20. Fedora 23 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.