Bug 731177

Summary: DeviceTreeError: MD RAID device md127 not in devicetree after scanning all slaves
Product: Fedora
Reporter: Rudolf Kastl <che666>
Component: anaconda
Assignee: David Lehman <dlehman>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Priority: unspecified
Version: 16
CC: anaconda-maint-list, awilliam, bloch, bugzilla, dledford, fedora, jonathan, rtguille, vanmeeuwen+fedora, xaphir
Hardware: x86_64
OS: Unspecified
Whiteboard: abrt_hash:519a87963383940344afd288ccb0d183395158a2c73bad8dd1d9cc5cc268015e AcceptedBlocker
Doc Type: Bug Fix
Last Closed: 2011-09-23 00:38:17 UTC
Bug Depends On: 736530    
Bug Blocks: 713564    
Attachments:
  anaconda log, alpha kde, selinux=0, reaches 'custom layout'
  dmesg, f16 alpha kde, selinux=0
  messages, f16 alpha kde, selinux=0
  f16 beta tc2 kde, anaconda exception
  Comment (oversized comment 3, moved to an attachment)

Description Rudolf Kastl 2011-08-16 20:18:44 UTC
abrt version: 2.0.5
executable:     /usr/bin/python
hashmarkername: anaconda
kernel:         3.0.0-1.fc16.x86_64
product:        Fedora
reason:         DeviceTreeError: MD RAID device md127 not in devicetree after scanning all slaves
time:           Tue Aug 16 16:18:24 2011
version:        16

description:
:The following was filed automatically by anaconda:
:anaconda 16.14.5 exception report
:Traceback (most recent call first):
:  File "/usr/lib64/python2.7/site-packages/pyanaconda/storage/devicetree.py", line 753, in addUdevMDDevice
:    "scanning all slaves" % name)
:  File "/usr/lib64/python2.7/site-packages/pyanaconda/storage/devicetree.py", line 999, in addUdevDevice
:    device = self.addUdevMDDevice(info)
:  File "/usr/lib64/python2.7/site-packages/pyanaconda/storage/devicetree.py", line 1832, in _populate
:    self.addUdevDevice(dev)
:  File "/usr/lib64/python2.7/site-packages/pyanaconda/storage/devicetree.py", line 1789, in populate
:    self._populate(progressWindow)
:  File "/usr/lib64/python2.7/site-packages/pyanaconda/storage/__init__.py", line 474, in reset
:    cleanupOnly=cleanupOnly)
:  File "/usr/lib64/python2.7/site-packages/pyanaconda/storage/__init__.py", line 103, in storageInitialize
:    storage.reset()
:  File "/usr/lib64/python2.7/site-packages/pyanaconda/dispatch.py", line 348, in dispatch
:    self.dir = self.steps[self.step].target(self.anaconda)
:  File "/usr/lib64/python2.7/site-packages/pyanaconda/dispatch.py", line 235, in go_forward
:    self.dispatch()
:  File "/usr/lib64/python2.7/site-packages/pyanaconda/gui.py", line 1198, in nextClicked
:    self.anaconda.dispatch.go_forward()
:DeviceTreeError: MD RAID device md127 not in devicetree after scanning all slaves

Comment 1 Chris Lumens 2011-08-24 15:47:15 UTC
Can you please attach /tmp/anaconda-tb-* to this bug?  It should have been attached automatically.

Comment 2 Brian Lane 2011-08-29 20:26:24 UTC
*** Bug 733865 has been marked as a duplicate of this bug. ***

Comment 3 Reartes Guillermo 2011-08-29 22:47:04 UTC
Created attachment 915361 [details]
Comment

(This comment was longer than 65,535 characters and has been moved to an attachment by Red Hat Bugzilla).

Comment 4 David Lehman 2011-08-30 13:53:13 UTC
(In reply to comment #3)
> Coming from 733865. The first time I used the 'save' option but it seems
> not to work properly.

Thanks. Please, in the future, do not paste large amounts of text directly into bug report comments. Use the "Add an attachment" link instead.

Comment 5 Reartes Guillermo 2011-09-02 17:55:46 UTC
Tried with the beta TC and no changes.
I also tried it on a laptop with a single hard disk, and it also fails.

Am I supposed to use an unpartitioned disk? (Currently I cannot test this.)

Comment 6 David Lehman 2011-09-02 18:25:13 UTC
(In reply to comment #5)
> Tried with the beta TC and no changes.
> I also tried it on a laptop with a single hard disk, and it also fails.

I assume it is a different failure. If so, please file a bug for that failure.

> 
> Am I supposed to use an unpartitioned disk? (Currently I cannot test this.)

You should be able to use your disks as they are.


Can you run some commands to collect some information?

  ls /sys/class/block/md*

  cat /proc/mdstat

Thanks.

Comment 7 David Lehman 2011-09-02 19:53:14 UTC
There is no need for you to provide the requested information. I have reproduced the failure in a test environment and am working on a solution. Something has changed in the mdadm package that broke our scanning of devices.

Comment 8 Adam Williamson 2011-09-02 20:10:06 UTC
Impact of this appears to be that anaconda will crash if you have a pre-existing mdraid device, both live and non-live. Proposing as a Beta blocker:

"The installer must be able to create and install to software, hardware or BIOS RAID-0, RAID-1 or RAID-5 partitions for anything except /boot "

The criterion is slightly ambiguous WRT pre-existing partitions, but if anything this failure is *worse*.

Comment 9 Adam Williamson 2011-09-07 21:54:16 UTC
Any news on this one, David?

Comment 10 David Lehman 2011-09-07 22:31:22 UTC
I got distracted working on the non-live version of this bug. In that situation, mdadm crashes after mostly stopping the array, but then a blkid process is left stuck trying to probe the not-yet-removed /dev/md0. I filed a bug against mdadm to try to get help figuring out what is happening: bug 736521
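
For anyone else chasing this, one way to spot the stuck process (a sketch; it simply looks for a blkid command in uninterruptible sleep) is:

  ps axo pid,stat,wchan:20,cmd | grep '[b]lkid'

A STAT of 'D' means the process is blocked inside the kernel, which is what "stuck" means here.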

I am going to look into the live variant again now.

Comment 11 David Lehman 2011-09-07 23:26:59 UTC
The live version's problem is related to selinux. Here are the syslog entries from mdadm trying to start the array:

Sep  7 19:12:34 localhost kernel: [   36.687948] type=1400 audit(1315437142.158:4): avc:  denied  { create } for  pid=555 comm="mdadm" name="0" scontext=system_u:system_r:mdadm_t:s0-s0:c0.c1023 tcontext=system_u:object_r:mdadm_var_run_t:s0 tclass=lnk_file
Sep  7 19:12:34 localhost kernel: [   37.686153] md: bind<sda5>
Sep  7 19:12:34 localhost kernel: [   37.874116] md: bind<sda4>
Sep  7 19:12:34 localhost kernel: [   37.900842] type=1400 audit(1315437143.371:5): avc:  denied  { module_request } for  pid=556 comm="mdadm" kmod="md-level-1" scontext=system_u:system_r:mdadm_t:s0-s0:c0.c1023 tcontext=system_u:system_r:kernel_t:s0 tclass=system
Sep  7 19:12:34 localhost kernel: [   37.910393] bio: create slab <bio-1> at 1
Sep  7 19:12:34 localhost kernel: [   37.938256] md: personality for level 1 is not loaded!

It is failing to create a symlink somewhere in /var/run and then failing to load the md raid1 kernel module, both times blocked by SELinux.
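
For local experimentation only (the proper fix belongs in selinux-policy), the denials could in principle be converted into a local policy module with audit2allow, for example:

  grep mdadm /var/log/audit/audit.log | audit2allow -M mdadmlocal
  semodule -i mdadmlocal.pp

This sketch assumes auditd is writing to /var/log/audit/audit.log; on the live image, where the AVCs land in the kernel log as shown above, audit2allow -d can read them from dmesg instead.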

Comment 12 Adam Williamson 2011-09-08 00:19:52 UTC
I think it should be migrated to /run rather than /var/run, right?

...never mind, /var/run is a symlink to /run...although that probably means things should use /run directly.

Comment 13 Reartes Guillermo 2011-09-08 17:10:55 UTC
I added the parameter selinux=0 to the CD-ROM's syslinux bootloader and was able to get further in the install process, but it fails again later.

When using selinux=0, it asks for the LUKS passphrase (I skip it because I do not use it yet; I lost the testing passphrase and may delete it safely if necessary), and then I reach the custom layout, but another anaconda exception is raised and the installation attempt ends there.

I will attach new logs in a few moments.

Comment 14 Reartes Guillermo 2011-09-08 17:13:12 UTC
Created attachment 522168 [details]
anaconda log, alpha kde, selinux=0, reaches 'custom layout'

Comment 15 Reartes Guillermo 2011-09-08 17:14:28 UTC
Created attachment 522169 [details]
dmesg, f16 alpha kde, selinux=0

Comment 16 Reartes Guillermo 2011-09-08 17:15:44 UTC
Created attachment 522170 [details]
messages, f16 alpha kde, selinux=0

Comment 17 David Lehman 2011-09-08 17:34:33 UTC
(In reply to comment #13)
> I added the parameter selinux=0 to the CD-ROM's syslinux bootloader and was
> able to get further in the install process, but it fails again later.
> 
> When using selinux=0, it asks for the LUKS passphrase (I skip it because I
> do not use it yet; I lost the testing passphrase and may delete it safely if
> necessary), and then I reach the custom layout, but another anaconda
> exception is raised and the installation attempt ends there.
> 
> I will attach new logs in a few moments.

You are hitting bug 727814. It is a known bug and will be resolved in the F16 Beta.

Comment 18 Doug Ledford 2011-09-09 00:21:05 UTC
(In reply to comment #12)
> I think it should be migrated to /run rather than /var/run , right?
> 
> ...never mind, /var/run is a symlink to /run...although that probably means
> things should use /run directly.

This is probably the source of the SELinux denials.  The old rules point to /var/run/mdadm, and since /var/run is now a symlink to /run, either the SELinux policy needs to be updated to use /run, or it has already been updated to use /run but does not account for /var/run pointing at /run, and so denies mdadm's attempt to use /var/run.  I'll look into building an mdadm that uses /run for f16.
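
A quick way to tell which of those two cases applies (a diagnostic sketch, assuming the standard policycoreutils tools are in the image) is to compare the contexts the policy expects for the two paths:

  matchpathcon /var/run/mdadm /run/mdadm
  ls -Zd /run /var/run

If the two matchpathcon answers differ, the file context rules have not caught up with the /var/run -> /run move.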

Comment 19 Adam Williamson 2011-09-09 17:28:00 UTC
Discussed at the 2011-09-09 blocker review meeting. Agreed that this is a blocker under the criterion "The installer must be able to create and install to software, hardware or BIOS RAID-0, RAID-1 or RAID-5 partitions for anything except /boot ". We agreed this criterion should be read as requiring anaconda to correctly recognize and install to pre-existing RAID arrays, as well as creating new ones - especially in light of the mention of BIOS arrays, which are always pre-existing.

Comment 20 Reartes Guillermo 2011-09-10 16:52:27 UTC
Created attachment 522538 [details]
f16 beta tc2 kde, anaconda exception

Downloaded and tried with the new beta-tc2 image (kde-cdrom).
Used default boot parameters, and found that it still crashes.

Comment 21 Andy Burns 2011-09-12 11:06:45 UTC
I encountered this same issue; my machine has multiple mdraid and LVM devices. Booting with enforcing=0 allowed a workaround ...

Comment 22 Adam Williamson 2011-09-14 01:10:08 UTC
I can confirm the fix looks good. I built a live image with anaconda 16.17 and selinux-policy 3.10.0-28. I do not see this traceback on the console when starting liveinst, and I can select my BIOS RAID array in advanced devices and 'continue': previously, anaconda would exit showing the traceback at that point, now it continues.

I can't complete a test install as the RAID array is my production laptop F15 install, but at least I can say this specific issue looks to be fixed.

Comment 23 xaphir 2011-09-21 20:40:58 UTC
Fedora-16-Beta-x86_64-netinst.iso (Sept 15th release) is failing with the "MD RAID device not in devicetree" bug.  selinux=0 was set when it occurred.

Comment 24 Adam Williamson 2011-09-22 00:51:35 UTC
please file a new bug for that, with more details and logs.

Comment 25 xaphir 2011-09-22 07:25:03 UTC
731177 is the bug that was filed from the machine that encountered the blocker, and it looked at the time like it had uploaded the debug log.

Comment 26 Adam Williamson 2011-09-23 00:38:17 UTC
The fix went out in selinux-policy 3.10.0-28, so marking this as CLOSED.

Comment 27 Adam Williamson 2011-10-26 02:14:49 UTC
I just hit this with the pre-RC1 image tflink built with anaconda 16.23, not sure what's going on there...

Comment 28 David Lehman 2011-10-26 16:00:38 UTC
(In reply to comment #27)
> I just hit this with the pre-RC1 image tflink built with anaconda 16.23, not
> sure what's going on there...

Unfortunately, this error (like a lot of storage-related errors) can be caused by a variety of problems -- it is a very general failure. Unless you're also seeing AVC denials from mdadm commands, I'd say you're seeing something different.

Comment 29 Thomas 2011-12-08 15:54:33 UTC
Hitting the same bug with anaconda 16.25. Anaconda breaks with "DeviceTreeError: MD RAID device md127 not in devicetree after scanning all slaves"

Here's the partitioning information from the kickstart file:

# wipe the disks and install the bootloader to the MBR
zerombr
clearpart --all
bootloader --location mbr

part biosboot --fstype=biosboot --size=1
part /boot --fstype=ext3 --size=2048 --asprimary --ondrive=sda
part /boot1 --fstype=ext3 --size=2048 --asprimary --ondrive=sdb

# one RAID member partition per disk, grown to fill the remaining space
part raid.11 --ondrive=sda --size=1 --grow
part raid.21 --ondrive=sdb --size=1 --grow

# RAID1 across both members, used as the LVM physical volume
raid pv.01 --device=md0 --level=1 raid.11 raid.21

volgroup vgos pv.01

logvol / --fstype=ext4 --name=lvol_root --vgname=vgos --size=6144 --maxsize=6144
logvol /opt --fstype=ext4 --name=lvol_opt --vgname=vgos --size=10240 --maxsize=10240
logvol /usr --fstype=ext4 --name=lvol_usr --vgname=vgos --size=8192 --maxsize=8192
logvol /var --fstype=ext4 --name=lvol_var --vgname=vgos --size=8192 --maxsize=8192
logvol /tmp --fstype=ext4 --name=lvol_tmp --vgname=vgos --size=2048 --maxsize=2048
logvol /export --fstype=ext4 --name=lvol_home --vgname=vgos --size=204800 --maxsize=204800
logvol swap --name=lvol_swap --vgname=vgos --maxsize=4096 --size=4096


After hitting this error, I went to the shell on screen 2 and I can see that there's a device md127 in /proc/mdstat.

Running pvs -v, vgs -v and lvs -v, I can see that everything expected from the kickstart file is there, but anaconda breaks.

I need an unattended install and this makes installation impossible! I tried setting selinux=0 during kickstart but it doesn't help at all.

Regards

Thomas

Comment 30 Doug Ledford 2011-12-08 16:21:35 UTC
If you're hitting this with current anaconda then it is not the same issue that this bug addressed (even if the error message is the same).  Please open a new bug.

Comment 31 David Lehman 2011-12-08 16:36:14 UTC
(In reply to comment #29)
> Hitting the same bug with anaconda 16.25. Anaconda breaks with
> "DeviceTreeError: MD RAID device md127 not in devicetree after scanning all
> slaves"
> 
> After hitting this error, I went to the shell on screen 2 and I can see that
> there's a device md127 in /proc/mdstat.

The usual cause for this error nowadays is an incomplete or degraded array that mdadm has automatically started. You can remove the old raid metadata using wipefs.
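
For example (a sketch with placeholder device names -- substitute your actual member partitions, and stop the array first if it is running):

  mdadm --stop /dev/md127
  wipefs /dev/sda2                 # with no options it only lists the signatures
  wipefs -a /dev/sda2 /dev/sdb2    # -a actually erases them

Only do this if the old array's contents are disposable, obviously.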

> 
> Running pvs -v, vgs -v and lvs -v, I can see that everything expected from
> the kickstart file is there, but anaconda breaks.

If you're hitting this error then anaconda has created none of the devices specified in your kickstart file yet.

Comment 32 Doug Ledford 2011-12-08 16:50:50 UTC
(In reply to comment #31)
> (In reply to comment #29)
> > Hitting the same bug with anaconda 16.25. Anaconda breaks with
> > "DeviceTreeError: MD RAID device md127 not in devicetree after scanning all
> > slaves"
> > 
> > After hitting this error, I went to the shell on screen 2 and I can see that
> > there's a device md127 in /proc/mdstat.
> 
> The usual cause for this error nowadays is an incomplete or degraded array that
> mdadm has automatically started. You can remove the old raid metadata using
> wipefs.

No need to use wipefs for an md raid superblock.  That's like swatting a fly with a Mack truck.  Just use mdadm --zero-superblock (or is it --zerosuperblock, I sometimes forget where Neil decides to use dashes...he's not totally consistent on that).
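
For the record, it is --zero-superblock. A minimal sketch, again with placeholder member devices (the array must be stopped before its superblocks can be zeroed):

  mdadm --stop /dev/md127
  mdadm --zero-superblock /dev/sda2 /dev/sdb2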

> > 
> > Running pvs -v, vgs -v and lvs -v, I can see that everything expected from
> > the kickstart file is there, but anaconda breaks.
> 
> If you're hitting this error then anaconda has created none of the devices
> specified in your kickstart file yet.

Actually, and this is the reason I told him to create a new bug, I suspect it created the device, but because the kickstart file still assumes the device mapping is numerical while mdadm eschews device numbers, the wrong device number was created. Of course, I could be wrong, but I didn't think anaconda allowed udev assembly of existing arrays, and so didn't think md127 would exist unless anaconda assembled it.

Comment 33 David Lehman 2011-12-08 17:05:16 UTC
(In reply to comment #32)
> (In reply to comment #31)
> > (In reply to comment #29)
> > > Hitting the same bug with anaconda 16.25. Anaconda breaks with
> > > "DeviceTreeError: MD RAID device md127 not in devicetree after scanning all
> > > slaves"
> > > 
> > > After hitting this error, I went to the shell on screen 2 and I can see that
> > > there's a device md127 in /proc/mdstat.
> > 
> > The usual cause for this error nowadays is an incomplete or degraded array that
> > mdadm has automatically started. You can remove the old raid metadata using
> > wipefs.
> 
> No need to use wipefs for an md raid superblock.  That's like swatting a fly
> with a Mack truck.  Just use mdadm --zero-superblock (or is it
> --zerosuperblock, I sometimes forget where Neil decides to use dashes...he's
> not totally consistent on that).

wipefs just removes metadata like md superblocks. Maybe like swatting a fly with a rolled up newspaper instead of a flyswatter :)

> 
> > > 
> > > Running pvs -v, vgs -v and lvs -v, I can see that everything expected from
> > > the kickstart file is there, but anaconda breaks.
> > 
> > If you're hitting this error then anaconda has created none of the devices
> > specified in your kickstart file yet.
> 
> Actually, and this is the reason I told him to create a new bug, I suspect it
> created the device, but because the kickstart file still assumes the device
> mapping is numerical while mdadm eschews device numbers, the wrong device
> number was created. Of course, I could be wrong, but I didn't think anaconda
> allowed udev assembly of existing arrays, and so didn't think md127 would
> exist unless anaconda assembled it.

Something changed in the anaconda runtime environment which, IIRC, causes the mdadm-specific udev hacks to not be put into place until too late. Either way, mdadm is definitely auto-starting everything, including unusable arrays, in F16.
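
An easy way to check whether that has happened on a given box (md127 is just the name mdadm typically assigns an auto-assembled array; adjust to taste):

  cat /proc/mdstat
  mdadm --detail /dev/md127

An array sitting there inactive that you never asked for is the telltale sign.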

Comment 34 David Lehman 2011-12-08 17:09:27 UTC
(In reply to comment #29)
> Hitting the same bug with anaconda 16.25. Anaconda breaks with
> "DeviceTreeError: MD RAID device md127 not in devicetree after scanning all
> slaves"

What I failed to make clear in my other comment was that the error you're hitting can only occur when scanning the system's initial configuration -- before creating any devices. This scan is not performed after creating devices.

Comment 35 Doug Ledford 2011-12-08 17:31:03 UTC
OK, but that's still a valid problem that deserves its own bug report, and it's still an anaconda issue.

Comment 36 Fedora Update System 2012-03-06 15:00:47 UTC
anaconda-17.12-1.fc17 has been submitted as an update for Fedora 17.
https://admin.fedoraproject.org/updates/anaconda-17.12-1.fc17