Bug 1382274 - gi.overrides.BlockDev.DMError: Failed to group_set
Summary: gi.overrides.BlockDev.DMError: Failed to group_set
Keywords:
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: dmraid
Version: 25
Hardware: x86_64
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: LVM and device-mapper development team
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard: abrt_hash:500877468c4bf0e8f85279744b8...
Depends On:
Blocks:
 
Reported: 2016-10-06 08:28 UTC by Petr Schindler
Modified: 2017-12-12 11:06 UTC (History)
15 users (show)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2017-12-12 11:06:24 UTC
Type: ---
Embargoed:


Attachments
File: anaconda-tb (208.65 KB, text/plain), 2016-10-06 08:28 UTC, Petr Schindler
File: anaconda.log (11.75 KB, text/plain), 2016-10-06 08:28 UTC, Petr Schindler
File: dnf.librepo.log (11.02 KB, text/plain), 2016-10-06 08:28 UTC, Petr Schindler
File: environ (482 bytes, text/plain), 2016-10-06 08:28 UTC, Petr Schindler
File: hawkey.log (900 bytes, text/plain), 2016-10-06 08:28 UTC, Petr Schindler
File: lsblk_output (1.95 KB, text/plain), 2016-10-06 08:28 UTC, Petr Schindler
File: nmcli_dev_list (1.56 KB, text/plain), 2016-10-06 08:28 UTC, Petr Schindler
File: os_info (449 bytes, text/plain), 2016-10-06 08:28 UTC, Petr Schindler
File: program.log (6.80 KB, text/plain), 2016-10-06 08:28 UTC, Petr Schindler
File: storage.log (20.02 KB, text/plain), 2016-10-06 08:28 UTC, Petr Schindler
File: syslog (113.53 KB, text/plain), 2016-10-06 08:28 UTC, Petr Schindler
File: ifcfg.log (2.25 KB, text/plain), 2016-10-06 08:28 UTC, Petr Schindler
File: packaging.log (885 bytes, text/plain), 2016-10-06 08:28 UTC, Petr Schindler
output of dmraid -b (166 bytes, text/plain), 2016-10-18 13:59 UTC, Petr Schindler
output of dmraid -l (467 bytes, text/plain), 2016-10-18 14:00 UTC, Petr Schindler
output of dmraid -s (235 bytes, text/plain), 2016-10-18 14:01 UTC, Petr Schindler
output of lsblk --fs (1.34 KB, text/plain), 2016-10-18 14:02 UTC, Petr Schindler
first MiB of /dev/sda (1.00 MB, application/octet-stream), 2016-10-18 14:03 UTC, Petr Schindler
first MiB of /dev/sdb (1.00 MB, application/octet-stream), 2016-10-18 14:03 UTC, Petr Schindler
last MiB of sda and sdb (1.19 KB, application/x-xz), 2016-10-19 14:20 UTC, Kamil Páral

Description Petr Schindler 2016-10-06 08:28:15 UTC
Description of problem:
Anaconda crashed right after start. This is probably caused by the disks being set up as RAID 1 (firmware RAID).

Version-Release number of selected component:
anaconda-25.20.4-1

The following was filed automatically by anaconda:
anaconda 25.20.4-1 exception report
Traceback (most recent call first):
  File "/usr/lib64/python3.5/site-packages/gi/overrides/BlockDev.py", line 441, in wrapped
    raise transform[1](msg)
  File "/usr/lib/python3.5/site-packages/blivet/populator/helpers/dmraid.py", line 56, in run
    rs_names = blockdev.dm.get_member_raid_sets(name, uuid, major, minor)
  File "/usr/lib/python3.5/site-packages/blivet/populator/populator.py", line 345, in handle_format
    helper_class(self, info, device).run()
  File "/usr/lib/python3.5/site-packages/blivet/threads.py", line 45, in run_with_lock
    return m(*args, **kwargs)
  File "/usr/lib/python3.5/site-packages/blivet/populator/populator.py", line 318, in handle_device
    self.handle_format(info, device)
  File "/usr/lib/python3.5/site-packages/blivet/threads.py", line 45, in run_with_lock
    return m(*args, **kwargs)
  File "/usr/lib/python3.5/site-packages/blivet/populator/populator.py", line 518, in _populate
    self.handle_device(dev)
  File "/usr/lib/python3.5/site-packages/blivet/threads.py", line 45, in run_with_lock
    return m(*args, **kwargs)
  File "/usr/lib/python3.5/site-packages/blivet/populator/populator.py", line 451, in populate
    self._populate()
  File "/usr/lib/python3.5/site-packages/blivet/threads.py", line 45, in run_with_lock
    return m(*args, **kwargs)
  File "/usr/lib/python3.5/site-packages/blivet/blivet.py", line 271, in reset
    self.devicetree.populate(cleanup_only=cleanup_only)
  File "/usr/lib/python3.5/site-packages/blivet/threads.py", line 45, in run_with_lock
    return m(*args, **kwargs)
  File "/usr/lib/python3.5/site-packages/blivet/osinstall.py", line 1175, in storage_initialize
    storage.reset()
  File "/usr/lib64/python3.5/threading.py", line 862, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib64/python3.5/site-packages/pyanaconda/threads.py", line 251, in run
    threading.Thread.run(self, *args, **kwargs)
gi.overrides.BlockDev.DMError: Failed to group_set

Additional info:
addons:         com_redhat_kdump, com_redhat_docker
cmdline:        /usr/bin/python3  /sbin/anaconda
cmdline_file:   BOOT_IMAGE=/images/pxeboot/vmlinuz inst.stage2=hd:LABEL=Fedora-S-dvd-x86_64-25 rd.live.check quiet
executable:     /sbin/anaconda
hashmarkername: anaconda
kernel:         4.8.0-0.rc7.git0.1.fc25.x86_64
product:        Fedora
release:        Cannot get release name.
type:           anaconda
version:        25
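
The traceback shows blivet's dmraid populator helper asking libblockdev's dm plugin which RAID sets the member device belongs to; the DMError is raised by the gi override that converts the plugin's error into a Python exception. Below is a minimal sketch to reproduce just that call outside anaconda, assuming libblockdev's GObject-introspection overrides are available; the typelib version and the two init helpers named here are assumptions, and only the get_member_raid_sets call itself is taken from the traceback:

    # Minimal reproduction of the failing libblockdev call; run as root.
    import gi
    gi.require_version("BlockDev", "2.0")   # typelib version assumed for F25
    from gi.repository import BlockDev as blockdev

    # Load just the dm plugin (init helper names assumed; blivet initializes
    # a larger plugin set the same way during storage setup).
    blockdev.ensure_init(blockdev.plugin_specs_from_names(("dm",)), None)

    try:
        # The same call blivet makes in populator/helpers/dmraid.py; the
        # uuid may be empty if udev reported none, and 8/0 are the
        # conventional major/minor numbers for sda.
        rs_names = blockdev.dm.get_member_raid_sets("sda", "", 8, 0)
        print("member of RAID sets:", rs_names)
    except blockdev.DMError as err:
        # On this machine the call fails with "Failed to group_set".
        print("dmraid grouping failed:", err)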

Comment 1 Petr Schindler 2016-10-06 08:28:21 UTC
Created attachment 1207841 [details]
File: anaconda-tb

Comment 2 Petr Schindler 2016-10-06 08:28:22 UTC
Created attachment 1207842 [details]
File: anaconda.log

Comment 3 Petr Schindler 2016-10-06 08:28:24 UTC
Created attachment 1207843 [details]
File: dnf.librepo.log

Comment 4 Petr Schindler 2016-10-06 08:28:26 UTC
Created attachment 1207844 [details]
File: environ

Comment 5 Petr Schindler 2016-10-06 08:28:27 UTC
Created attachment 1207845 [details]
File: hawkey.log

Comment 6 Petr Schindler 2016-10-06 08:28:28 UTC
Created attachment 1207846 [details]
File: lsblk_output

Comment 7 Petr Schindler 2016-10-06 08:28:30 UTC
Created attachment 1207847 [details]
File: nmcli_dev_list

Comment 8 Petr Schindler 2016-10-06 08:28:31 UTC
Created attachment 1207848 [details]
File: os_info

Comment 9 Petr Schindler 2016-10-06 08:28:32 UTC
Created attachment 1207849 [details]
File: program.log

Comment 10 Petr Schindler 2016-10-06 08:28:34 UTC
Created attachment 1207850 [details]
File: storage.log

Comment 11 Petr Schindler 2016-10-06 08:28:36 UTC
Created attachment 1207851 [details]
File: syslog

Comment 12 Petr Schindler 2016-10-06 08:28:38 UTC
Created attachment 1207852 [details]
File: ifcfg.log

Comment 13 Petr Schindler 2016-10-06 08:28:39 UTC
Created attachment 1207853 [details]
File: packaging.log

Comment 14 Petr Schindler 2016-10-06 08:59:29 UTC
This bug appears whenever I want to use firmware RAID on this computer. I tried to erase both disks (there are just two) and it didn't help.

The strange thing is that when I run lsblk, those two disks aren't listed as part of an md* device (there is no such device at all).

Comment 15 Petr Schindler 2016-10-06 09:37:52 UTC
I tested this with Fedora 24 and this bug appears there too. So it's not new.

I propose this as a blocker even though I think it could be some problem with the hardware (I think we tested F24 on this computer before, but I'm not sure). Non-functional RAID is a violation of the Beta criterion: "The installer must be able to detect and install to hardware or firmware RAID storage devices."

Comment 16 Adam Williamson 2016-10-06 15:36:23 UTC
This is NVIDIA firmware RAID?

Comment 17 Adam Williamson 2016-10-06 15:47:48 UTC
"Strange thing is that when I run lsblk, those two disks aren't listed as part of md* device (there is no such device at all)."

This looks like a firmware RAID case where dmraid rather than mdraid is used, so yes, you won't get any md devices :)

Comment 18 David Lehman 2016-10-06 15:48:47 UTC
Looks to me like dmraid doesn't like your array.

Here's the list of block devices reported by udev when anaconda probes storage:

08:26:10,995 INFO blivet: devices to scan: ['sdc', 'sdc1', 'sdc2', 'sda', 'sdb', 'sr0', 'loop0', 'loop1', 'loop2', 'live-rw', 'live-base']


And here's the lsblk output:

08:26:32,553 INFO program: Running... lsblk --perms --fs --bytes
08:26:32,577 INFO program: NAME                SIZE OWNER GROUP MODE       NAME        FSTYPE                        LABEL                  UUID                                 MOUNTPOINT
08:26:32,578 INFO program: loop1         2147483648 root  disk  brw-rw---- loop1       ext4                          Anaconda               9674df00-c062-44b6-b861-21eab39708ab
08:26:32,578 INFO program: |-live-base   2147483648 root  disk  brw-rw---- |-live-base ext4                          Anaconda               9674df00-c062-44b6-b861-21eab39708ab
08:26:32,578 INFO program: `-live-rw     2147483648 root  disk  brw-rw---- `-live-rw   ext4                          Anaconda               9674df00-c062-44b6-b861-21eab39708ab /
08:26:32,579 INFO program: sdb         500107862016 root  disk  brw-rw---- sdb         promise_fasttrack_raid_member
08:26:32,579 INFO program: sr0           1073741312 root  cdrom brw-rw---- sr0
08:26:32,579 INFO program: loop2          536870912 root  disk  brw-rw---- loop2       DM_snapshot_cow
08:26:32,579 INFO program: `-live-rw     2147483648 root  disk  brw-rw---- `-live-rw   ext4                          Anaconda               9674df00-c062-44b6-b861-21eab39708ab /
08:26:32,579 INFO program: loop0          409735168 root  disk  brw-rw---- loop0       squashfs
08:26:32,579 INFO program: sdc          15552479232 root  disk  brw-rw---- sdc         iso9660                       Fedora-S-dvd-x86_64-25 2016-10-05-05-31-59-00               /run/install/repo
08:26:32,580 INFO program: |-sdc2           5447680 root  disk  brw-rw---- |-sdc2      vfat                          ANACONDA               61C8-BEAF
08:26:32,580 INFO program: `-sdc1        2042626048 root  disk  brw-rw---- `-sdc1      iso9660                       Fedora-S-dvd-x86_64-25 2016-10-05-05-31-59-00
08:26:32,580 INFO program: sda          80026361856 root  disk  brw-rw---- sda         promise_fasttrack_raid_member
08:26:32,580 DEBUG program: Return code: 0


So it looks like sda and sdb are members of a promise fasttrack array that dmraid for some reason did not choose to activate during system startup.


Reassigning to dmraid for further investigation...

Comment 19 Adam Williamson 2016-10-06 20:57:48 UTC
Discussed at 2016-10-06 Fedora 25 Beta Go/No-Go meeting, acting as a blocker review meeting: https://meetbot-raw.fedoraproject.org/teams/f25-beta-go_no_go-meeting/f25-beta-go_no_go-meeting.2016-10-06-17.00.html . Rejected as a blocker: this was reported very late so it's hard to make a definitive call, but given the lateness of the report and the feeling we have that this may be an issue with the RAID set rather than a genuine bug, we decided to reject it as a Beta blocker. If we investigate further and determine that it's a genuine bug that may affect many dmraid cases, it may become a Final blocker.

pschindl, dlehman said there are a few things you can get that might help us figure out what's going on:

<dlehman> heinz knows better than I do, but he might start w/ 'dmraid -rv ; dmraid -sv'
<dlehman> or try to run whatever part of systemd should have activated the array during bootup
<dlehman> and see what happens
<dlehman> dmraid -l might be of interest (to see if 'pdc' is in the output)
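
A small sketch that collects those outputs in one file (the command list comes from the suggestions in this report; the output path is arbitrary):

    # Gather the suggested dmraid diagnostics; run as root.
    import subprocess

    CMDS = (
        ["dmraid", "-rv"],  # RAID member devices, verbose
        ["dmraid", "-sv"],  # RAID set status, verbose
        ["dmraid", "-l"],   # supported metadata formats (look for 'pdc')
        ["dmraid", "-b"],   # block devices and sizes
    )

    with open("/tmp/dmraid-diagnostics.txt", "w") as out:
        for cmd in CMDS:
            proc = subprocess.run(cmd, stdout=subprocess.PIPE,
                                  stderr=subprocess.STDOUT,
                                  universal_newlines=True)
            out.write("### " + " ".join(cmd) + "\n")
            out.write(proc.stdout + "\n")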

Comment 20 Heinz Mauelshagen 2016-10-06 23:17:53 UTC
Yes, output of "dmraid -s"/"dmraid -b" would be useful to get.

If this is pdc, you may want to retrieve the first MiB of both
disks for further analysis and attach them.

If you just want to get rid of any RAID metadata, wipe the first MiB of each component disk unless "wipefs --all /dev/sd[ab]" does it for you.
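
A minimal way to capture those first-MiB dumps, equivalent to dd with bs=1M count=1 (device names follow this report; the output paths are arbitrary):

    # Dump the first MiB of each suspected RAID member for analysis; run as root.
    MIB = 1024 * 1024

    for dev in ("sda", "sdb"):
        with open("/dev/" + dev, "rb") as disk, \
             open("/tmp/" + dev + "-first-mib.bin", "wb") as dump:
            dump.write(disk.read(MIB))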

Comment 21 Petr Schindler 2016-10-18 13:59:49 UTC
Created attachment 1211750 [details]
output of dmraid -b

Comment 22 Petr Schindler 2016-10-18 14:00:31 UTC
Created attachment 1211751 [details]
output of dmraid -l

Comment 23 Petr Schindler 2016-10-18 14:01:37 UTC
Created attachment 1211752 [details]
output of dmraid -s

Comment 24 Petr Schindler 2016-10-18 14:02:14 UTC
Created attachment 1211753 [details]
output of lsblk --fs

Comment 25 Petr Schindler 2016-10-18 14:03:09 UTC
Created attachment 1211754 [details]
first MiB of /dev/sda

Comment 26 Petr Schindler 2016-10-18 14:03:52 UTC
Created attachment 1211755 [details]
first MiB of /dev/sdb

Comment 27 Heinz Mauelshagen 2016-10-18 14:55:14 UTC
(In reply to Petr Schindler from comment #26)
> Created attachment 1211755 [details]
> first MiB of /dev/sdb

Petr, both dumps of sda and sdb contain zeroes?
Please check/recreate.

Just to confirm:
do you have a Promise FakeRAID BIOS on the machine which discovers the RAID set ok?

Comment 28 Kamil Páral 2016-10-19 08:48:53 UTC
(In reply to Heinz Mauelshagen from comment #27)
> Petr, both dumps of sda and sdb contain zeroes?
> Please check/recreate.

I checked that, that's really the contents of sda and sdb, all zeroes. I don't know how firmware raid works, but shouldn't there be some kind of raid signature? How does lsblk know the disk is "promise_fasttrack_raid_member"?

> 
> Just to confirm:
> do you have a Promise FakeRAID BIOS on the machine which discovers the RAID
> set ok?

Sorry, I don't understand. The board is an Asus M5A97 PRO, and when I set the disk controller to RAID mode and set up RAID 1 in the integrated tool (I tried both fast and full initialization), everything looks in order in that integrated tool. I should also mention this worked for us in the past, but when we tried it now with an older release (F24), it didn't work either. It is possible this might be a hardware failure (RAID controller fried or something), but I don't know how to tell.

Comment 29 Heinz Mauelshagen 2016-10-19 13:25:24 UTC
(In reply to Kamil Páral from comment #28)
> (In reply to Heinz Mauelshagen from comment #27)
> > Petr, both dumps of sda and sdb contain zeroes?
> > Please check/recreate.
> 
> I checked that, that's really the contents of sda and sdb, all zeroes. I
> don't know how firmware raid works, but shouldn't there be some kind of raid
> signature? How does lsblk know the disk is "promise_fasttrack_raid_member"?
> 

I was mistaken about the metadata location, sorry.

We need the last MiB of each component device attached.
That's where lsblk found the identifier "Promise Technology, Inc." causing
"promise_fasttrack_raid_member" to be displayed
(see libblkid/src/superblocks/promise_raid.c in the util-linux package).
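
For illustration, a rough sketch of that check: read the last MiB of each member disk and look for the Promise magic string libblkid keys on. libblkid actually probes specific sector offsets from the end of the device (see promise_raid.c); scanning the whole last MiB here is a simplification.

    # Look for the Promise FastTrack metadata magic in the last MiB of each
    # member disk (simplified compared to libblkid's fixed-offset probes).
    import os

    MAGIC = b"Promise Technology, Inc."
    MIB = 1024 * 1024

    for dev in ("/dev/sda", "/dev/sdb"):
        with open(dev, "rb") as disk:
            size = disk.seek(0, os.SEEK_END)  # block device size in bytes
            disk.seek(size - MIB)
            tail = disk.read(MIB)
        offset = tail.find(MAGIC)
        if offset >= 0:
            print("%s: Promise magic found %d bytes from the end"
                  % (dev, MIB - offset))
        else:
            print("%s: no Promise magic in the last MiB" % dev)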


> > 
> > Just to confirm:
> > do you have a Promise FakeRAID BIOS on the machine which discovers the RAID
> > set ok?
> 
> Sorry, I don't understand. The board is Asus M5A97 PRO, and when I set RAID
> as disk controller and set up RAID 1 in the integrated tool (I tried both
> fast and full initialization), everything looks in order in that integrated
> tool. 

With "integrated tool", you're likely referring to the BIOS RAID support/utility on the Motherboard (or a Promise FakeRAID controller plugged in) selected by some hot key (combination) during POST, which allows to boot off such software RAID and to manage it (i.e. setting it up/displaying information on it).

> I should also mention this worked for us in the past, but when we
> tried it now with older release (F24), it didn't work either. It is possible
> this might a hardware failure (raid controller fried or something), but I
> don't know how to distinguish that.

The controller doesn't seem to be the reason for it, since presumably you have reliable access to the two disks from both the BIOS RAID utility and Linux.

You aren't noticing any disk SMART errors, are you?
Use "for d in /dev/sd[ab];do smartctl -l error $d;done" to see their error logs.

Once I have the metadata, I can analyse if this is a bug in dmraid and tell more.

BTW: are you able to cause this failure on a different system or is it singular?

Comment 30 Kamil Páral 2016-10-19 14:20:56 UTC
Created attachment 1212170 [details]
last MiB of sda and sdb

Comment 31 Kamil Páral 2016-10-19 14:24:22 UTC
(In reply to Heinz Mauelshagen from comment #29)
> We need the last MiB of each component device attached.

Attached.

> With "integrated tool", you're likely referring to the BIOS RAID
> support/utility on the Motherboard 

Yes.

> (or a Promise FakeRAID controller plugged in) 

Nope, no external controller.

> selected by some hot key (combination) during POST, which allows to boot
> off such software RAID and to manage it (i.e. setting it up/displaying
> information on it).

Exactly.

> You aren't noticing any disk SMART errors, are you?
> Use "for d in /dev/sd[ab];do smartctl -l error $d;done" to see their error
> logs.

Smart was disabled, I had to enable it with "-s on". This is the output:

$ smartctl -s on -l error /dev/sda
smartctl 6.5 2016-05-07 r4318 [x86_64-linux-4.8.1-1.fc25.x86_64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF ENABLE/DISABLE COMMANDS SECTION ===
SMART Enabled.

=== START OF READ SMART DATA SECTION ===
SMART Error Log not supported

$ smartctl -s on -l error /dev/sdb
smartctl 6.5 2016-05-07 r4318 [x86_64-linux-4.8.1-1.fc25.x86_64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF ENABLE/DISABLE COMMANDS SECTION ===
SMART Enabled.

=== START OF READ SMART DATA SECTION ===
SMART Error Log Version: 1
No Errors Logged

> BTW: are you able to cause this failure on a different system or is it
> singular?

We don't have non-Intel firmware RAID available in any other system, so I can't really tell. We haven't seen this error with Intel firmware RAID.

Comment 32 Heinz Mauelshagen 2016-10-19 16:38:41 UTC
Kamil,

I am able to analyse further with your metadata as of comment #30.

Your sda doesn't support error logging, whereas sdb does, so we can't be sure about sda's sanity.

Comment 33 Kamil Páral 2016-10-20 07:46:00 UTC
sda is Intel SSD SC2CT08. I'm surprised it doesn't support SMART. But I've done numerous installations to it (in non-RAID mode) recently and had no issues with it. However, if you can't find any other error or have a strong suspicion of a disk failure, I'll replace it with a different drive and try again.

Comment 34 Kamil Páral 2016-10-20 08:53:53 UTC
Interestingly, I can see the SMART values for sda in gnome-disks; it shows them without problems, and all values are marked as OK. Even the short self-test passed. So the drive is probably OK and it's just some smartctl issue.

Comment 35 Fedora End Of Life 2017-11-16 18:47:53 UTC
This message is a reminder that Fedora 25 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 25. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora  'version'
of '25'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version'
to a later Fedora version.

Thank you for reporting this issue; we are sorry that we were not able to
fix it before Fedora 25 reached end of life. If you would still like to
see this bug fixed and are able to reproduce it against a later version
of Fedora, you are encouraged to change the 'version' to a later Fedora
version before this bug is closed, as described in the policy above.

Although we aim to fix as many bugs as possible during every release's
lifetime, sometimes those efforts are overtaken by events. Often a
more recent Fedora release includes newer upstream software that fixes
bugs or makes them obsolete.

Comment 36 Fedora End Of Life 2017-12-12 11:06:24 UTC
Fedora 25 changed to end-of-life (EOL) status on 2017-12-12. Fedora 25 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

