Bug 876441

Summary: Early crash while evaluating existing RAID arrays
Product: Fedora
Version: 18
Component: anaconda
Reporter: Jes Sorensen <Jes.Sorensen>
Assignee: David Lehman <dlehman>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Status: CLOSED ERRATA
Severity: urgent
Priority: unspecified
CC: anaconda-maint-list, awilliam, dledford, g.kaviyarasu, Jes.Sorensen, jonathan, jsmith.fedora, kparal, mschmidt, sbueno, systemd-maint, tiansworld, vanmeeuwen+fedora
Keywords: CommonBugs, Reopened
Hardware: Unspecified
OS: Unspecified
Whiteboard: https://fedoraproject.org/wiki/Common_F18_bugs#broken-lvm-raid-btrfs AcceptedNTH
Doc Type: Bug Fix
Type: Bug
Last Closed: 2012-12-20 12:39:08 UTC
Bug Blocks: 752665    
Attachments (Description / Flags):
  X.log (none)
  anaconda.log (none)
  ifcfg.log (none)
  packaging.log (none)
  program.log (none)
  storage.log (none)
  syslog (none)
  anaconda-tb (none)
  X.log smoketest (none)
  anaconda.log smoketest (none)
  ifcfg.log smoketest (none)
  packaging.log smoketest (none)
  program.log smoketest (none)
  storage.log smoketest (none)
  syslog smoketest (none)
  anaconda-tb smoketest (none)

Description Jes Sorensen 2012-11-14 07:30:24 UTC
Description of problem:
When trying to run the F18 TC8 install DVD image, Anaconda never comes up;
instead a small window flashes for a split second and I am left with a
blank X screen.

Switching to VC1, I see a message that says 'pane is dead'.

Version-Release number of selected component (if applicable):
Fedora 18 TC8

How reproducible:
Every time

Steps to Reproduce:
1. Insert TC8 DVD
2. Run 'install Fedora'
3. Wait about 30 seconds, watching the cursor on a blank GUI screen

Comment 1 Jes Sorensen 2012-11-14 07:31:31 UTC
Created attachment 644598 [details]
X.log

Comment 2 Jes Sorensen 2012-11-14 07:32:37 UTC
Created attachment 644599 [details]
anaconda.log

Comment 3 Jes Sorensen 2012-11-14 07:33:04 UTC
Created attachment 644600 [details]
ifcfg.log

Comment 4 Jes Sorensen 2012-11-14 07:33:30 UTC
Created attachment 644601 [details]
packaging.log

Comment 5 Jes Sorensen 2012-11-14 07:33:52 UTC
Created attachment 644602 [details]
program.log

Comment 6 Jes Sorensen 2012-11-14 07:34:29 UTC
Created attachment 644603 [details]
storage.log

Comment 7 Jes Sorensen 2012-11-14 07:35:19 UTC
Created attachment 644604 [details]
syslog

Comment 8 Jes Sorensen 2012-11-14 07:40:58 UTC
Created attachment 644606 [details]
anaconda-tb

Comment 9 Adam Williamson 2012-11-14 07:46:14 UTC
From the traceback:

anaconda 18.24 exception report
Traceback (most recent call first):
  File "/usr/lib64/python2.7/site-packages/pyanaconda/storage/devicelibs/mdraid.py", line 264, in md_node_from_name
    raise MDRaidError("md_node_from_name failed: %s" % e)
  File "/usr/lib64/python2.7/site-packages/pyanaconda/storage/devices.py", line 2839, in updateSysfsPath
    md_node = mdraid.md_node_from_name(self.name)
  File "/usr/lib64/python2.7/site-packages/pyanaconda/storage/devicetree.py", line 1449, in handleUdevMDMemberFormat
    md_array.updateSysfsPath()
  File "/usr/lib64/python2.7/site-packages/pyanaconda/storage/devicetree.py", line 1663, in handleUdevDeviceFormat
    self.handleUdevMDMemberFormat(info, device)
  File "/usr/lib64/python2.7/site-packages/pyanaconda/storage/devicetree.py", line 1096, in addUdevDevice
    self.handleUdevDeviceFormat(info, device)
  File "/usr/lib64/python2.7/site-packages/pyanaconda/storage/devicetree.py", line 1902, in _populate
    self.addUdevDevice(dev)
  File "/usr/lib64/python2.7/site-packages/pyanaconda/storage/devicetree.py", line 1850, in populate
    self._populate()
  File "/usr/lib64/python2.7/site-packages/pyanaconda/storage/__init__.py", line 398, in reset
    self.devicetree.populate(cleanupOnly=cleanupOnly)
  File "/usr/lib64/python2.7/site-packages/pyanaconda/storage/__init__.py", line 104, in storageInitialize
    storage.reset()
  File "/usr/lib64/python2.7/threading.py", line 504, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/lib64/python2.7/site-packages/pyanaconda/threads.py", line 91, in run
    threading.Thread.run(self, *args, **kwargs)
MDRaidError: md_node_from_name failed: [Errno 22] Invalid argument: '/dev/md/'

Local variables in innermost frame:
e: [Errno 22] Invalid argument: '/dev/md/'
named_path: /dev/md/
name: 

So we're in the RAID code here. dlehman! Note this bug is stopping Jes from helping to fix our *other* RAID bug, so it'd be appreciated if you could give it a bit of priority. And hey, it may even rank as a blocker in itself, once we have more data.
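
The locals above explain the crash: the array's name is empty, so the path being resolved degenerates to the /dev/md/ directory itself, and readlink() on a directory fails with EINVAL, exactly the "[Errno 22] Invalid argument: '/dev/md/'" in the traceback. A minimal sketch of the failing helper, reconstructed from the traceback and its locals (the real mdraid.py may differ in detail):

import os

class MDRaidError(Exception):
    pass

def md_node_from_name(name):
    # /dev/md/<name> is normally a symlink to ../mdNNN; resolving it
    # yields the kernel node name. With an empty array name the path
    # is just the "/dev/md/" directory, and os.readlink() on a
    # directory raises OSError(errno=EINVAL), i.e. "[Errno 22]
    # Invalid argument: '/dev/md/'".
    named_path = "/dev/md/" + name
    try:
        node = os.path.basename(os.readlink(named_path))
    except OSError as e:
        raise MDRaidError("md_node_from_name failed: %s" % e)
    return node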

Comment 10 Jes Sorensen 2012-11-14 07:50:54 UTC
OK, this is actually TC7, not TC8.

Comment 11 Jes Sorensen 2012-11-14 08:29:24 UTC
Adam gave me a pointer to the latest smoketest image; same thing. It looked
like anaconda 18.29 on the screen before the kapow.

Uploading logs in a second.

Comment 12 Jes Sorensen 2012-11-14 08:30:08 UTC
Created attachment 644633 [details]
X.log smoketest

Comment 13 Jes Sorensen 2012-11-14 08:31:03 UTC
Created attachment 644634 [details]
anaconda.log smoketest

Comment 14 Jes Sorensen 2012-11-14 08:31:53 UTC
Created attachment 644635 [details]
ifcfg.log smoketest

Comment 15 Jes Sorensen 2012-11-14 08:32:32 UTC
Created attachment 644642 [details]
packaging.log smoketest

Comment 16 Jes Sorensen 2012-11-14 08:33:13 UTC
Created attachment 644646 [details]
program.log smoketest

Comment 17 Jes Sorensen 2012-11-14 08:33:41 UTC
Created attachment 644647 [details]
storage.log smoketest

Comment 18 Jes Sorensen 2012-11-14 08:34:08 UTC
Created attachment 644648 [details]
syslog smoketest

Comment 19 Jes Sorensen 2012-11-14 08:34:52 UTC
Created attachment 644649 [details]
anaconda-tb smoketest

Comment 20 Adam Williamson 2012-11-15 17:57:26 UTC
dlehman says: "there's no md-specific data in md127's udevdb entry. so that's a udev or environment issue. there should be several MD_ things, but there are none"

kay says: "sorry, i'm entirely clueless about raid. but i'm not really surprised that their hacks don't always work out :) i have never personally used md or lvm, any specific pointers?"

This could possibly just be a more explosive case of the mdadm/udev/systemd problem from https://bugzilla.redhat.com/show_bug.cgi?id=873576 , but I'm not sure.

CCing relevant folks.

Comment 21 Jes Sorensen 2012-11-15 18:11:49 UTC
md127 is a normal md array; there is nothing special about it. I realize
udev/systemd loves to blame mdadm for things, but the array is fine.

This is on an F18 box that was upgraded from F17 via yum.

[root@monkeybay grub.d]# mdadm --detail /dev/md127
/dev/md127:
        Version : 1.0
  Creation Time : Thu Apr 26 16:02:31 2012
     Raid Level : raid1
     Array Size : 4883444 (4.66 GiB 5.00 GB)
  Used Dev Size : 4883444 (4.66 GiB 5.00 GB)
   Raid Devices : 2
  Total Devices : 2
    Persistence : Superblock is persistent

    Update Time : Wed Nov 14 09:25:21 2012
          State : clean 
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

           Name : monkeybay:42  (local to host monkeybay)
           UUID : 39e60b6a:6125de1b:cee8da1d:c1eb7dd8
         Events : 89

    Number   Major   Minor   RaidDevice State
       0       8       72        0      active sync   /dev/sde8
       1       8       88        1      active sync   /dev/sdf8

Comment 22 Jes Sorensen 2012-11-15 18:14:18 UTC
I should note that this one comes across to me as similar to bug 862073,
which, by the way, has sat untouched since I reported it against Alpha six weeks ago :(

Comment 23 Doug Ledford 2012-11-15 18:52:41 UTC
(In reply to comment #20)
> dlehman says: "there's no md-specific data in md127's udevdb entry. so
> that's a udev or environment issue. there should be several MD_ things, but
> there are none"

From my reading of the storage log, anaconda listed md127 as a device to be scanned, but never actually scanned it. If it doesn't scan it, I would not expect it to have any MD_ environment entries.

Also from my reading, anaconda scans constituent devices and then tries to add them to raid arrays when, from what I can tell, they are already part of raid arrays and should not be added to anything.  Mdadm does the right thing and refuses to add them to anything, but that just strikes me as wrong.

Normally, mdadm populates the MD_ entries via the 64-md-raid.rules file on a running system, via the call to mdadm --detail --export $tempnode. So if Anaconda is doing its own scanning and not running the default udev rules, it is out of our hands.
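
For illustration, here is a sketch of reading those MD_ properties the same way the udev rule does, by calling mdadm --detail --export and parsing its KEY=VALUE output. md_export_properties is a hypothetical helper name, not anaconda or mdadm code:

import subprocess

def md_export_properties(device):
    # Invoke the same call 64-md-raid.rules uses to populate the udev
    # database (mdadm --detail --export) and return the MD_* pairs as
    # a dict. An array assembled with too few members to start
    # typically yields nothing useful here.
    try:
        out = subprocess.check_output(
            ["mdadm", "--detail", "--export", device])
    except (OSError, subprocess.CalledProcessError):
        return {}
    props = {}
    for line in out.decode().splitlines():
        key, _, value = line.partition("=")
        if key.startswith("MD_"):
            props[key] = value
    return props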

Comment 24 David Lehman 2012-11-16 23:42:20 UTC
(In reply to comment #23)
> (In reply to comment #20)
> > dlehman says: "there's no md-specific data in md127's udevdb entry. so
> > that's a udev or environment issue. there should be several MD_ things, but
> > there are none"
> 
> From my reading of the storage log, anaconda listed md127 as a device to be
> scanned, but never actually scanned it.  If it doesn't scan it, then I would
> expect it not to have an MD_ environment entries.

We tried to scan it, but because it is an md device without any MD_ entries in udev, the scan failed; that failure is the cause of the crash.

08:25:21,643 DEBUG storage:          DeviceTree.addUdevDevice: info: {'ANACBIN': '/sbin',
 'DEVNAME': 'md127',
 'DEVPATH': '/devices/virtual/block/md127',
 'DEVTYPE': 'disk',
 'MAJOR': '9',
 'MINOR': '127',
 'MPATH_SBIN_PATH': '/sbin',
 'SUBSYSTEM': 'block',
 'SYSTEMD_READY': '0',
 'TAGS': ':systemd:',
 'USEC_INITIALIZED': '761967',
 'name': 'md127',
 'symlinks': [],
 'sysfs_path': '/devices/virtual/block/md127'} ; name: md127 ;


Other md devices in the same storage.log have the MD_ data in udev. The lack of that data in md127's udevdb entry is the basic problem that needs to be fixed, IMO.

> Normally, mdadm populates the MD_ entries via the 64-md-raid.rules file on a
> running system via the call to mdadm --detail --export $tempnode, so if
> Anaconda is doing its own scanning, and not running default udev rules, then
> it goes out of our hands.

The mdadm udev rules are indeed enabled in the installer environment, but this md device is apparently not triggering them.
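
Given the udevdb dump above, one defensive option (a sketch of the idea, not the fix that actually shipped) is to refuse to treat a device as a usable md array unless its udev info carries MD_ keys:

def udev_device_is_usable_md(info):
    # 'info' is a udev property dict like the one quoted above.
    # Healthy arrays in the same storage.log carry MD_LEVEL,
    # MD_DEVNAME, etc.; md127 carries none, so a check like this
    # would let addUdevDevice skip or flag the array instead of
    # crashing later in md_node_from_name. Hypothetical helper name.
    return any(key.startswith("MD_") for key in info)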

Comment 25 Jes Sorensen 2012-11-19 09:12:23 UTC
I was looking a bit at how this happens, and it seems to be a valid setup.
Basically if you assemble an array without sufficient devices to be able
to start it, you will end up with an array that reports udev info back this
way.

IMHO Anaconda should be able to notice the array is incomplete and list it
as such.

It is a scenario likely to happen in the field when someone adds previously
used drives to a box and tries to install on it.
[root@monkeybay ~]# mdadm -A /dev/md43 /dev/sdg7
mdadm: /dev/md43 assembled from 1 drive - not enough to start the array.
[root@monkeybay ~]# udevadm info --name=md43
P: /devices/virtual/block/md43
N: md43
E: DEVNAME=/dev/md43
E: DEVPATH=/devices/virtual/block/md43
E: DEVTYPE=disk
E: MAJOR=9
E: MINOR=43
E: SUBSYSTEM=block
E: SYSTEMD_READY=0
E: TAGS=:systemd:
E: UDISKS_PRESENTATION_NOPOLICY=1
E: USEC_INITIALIZED=4420974364
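
One way an installer could detect this state, sketched under the assumption that the kernel's md/array_state sysfs attribute is available (it reads 'inactive' for an array assembled with too few members); this is one possible detection mechanism, not the fix that shipped:

import os

def md_array_is_active(sysfs_path):
    # sysfs_path is e.g. '/sys/devices/virtual/block/md43'. An md
    # device node that exists but never started reports 'inactive'
    # in md/array_state.
    state_file = os.path.join(sysfs_path, "md", "array_state")
    try:
        with open(state_file) as f:
            return f.read().strip() not in ("inactive", "clear")
    except (IOError, OSError):
        return False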

Comment 26 Adam Williamson 2012-11-19 09:28:03 UTC
anaconda is intended to handle such cases and not crash, yeah, but we're not considering that a high priority for Beta.

If you work around this bug, can you help us figure out https://bugzilla.redhat.com/show_bug.cgi?id=876789? That's the remaining bug with the RAID-1 case.

Comment 27 Jes Sorensen 2012-11-19 10:07:46 UTC
Having the installer crash like this with no explanation is rather
unacceptable IMHO. It should be fairly easy to at least ignore the
array so anaconda doesn't crash.
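
In the spirit of that suggestion, a guard might look like the sketch below. This is hypothetical code illustrating the requested behavior, not the change that eventually landed in anaconda-18.37.4:

import logging

# Import path taken from the traceback in comment 9.
from pyanaconda.storage.devicelibs.mdraid import MDRaidError

log = logging.getLogger("storage")

def add_md_array_safely(devicetree, md_array):
    # Ignore arrays whose sysfs path cannot be resolved (e.g. an
    # incomplete array with an empty name) rather than letting
    # MDRaidError take down the installer.
    try:
        md_array.updateSysfsPath()
    except MDRaidError as e:
        log.warning("ignoring unusable md array %s: %s", md_array.name, e)
        devicetree._removeDevice(md_array)  # assumed cleanup hook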

Comment 28 Adam Williamson 2012-11-22 05:40:10 UTC
commonbugs -> we need to document that incomplete RAID, LVM, btrfs etc multi-disk devices may cause crashes.

Comment 29 Adam Williamson 2012-12-18 20:13:28 UTC
Proposing as NTH for final: the bug here is in evaluating existing incomplete RAID arrays, which probably isn't serious enough to block release, but certainly worth fixing.

Comment 30 Kamil Páral 2012-12-18 20:25:45 UTC
+1 NTH

Comment 31 Jared Smith 2012-12-18 20:35:53 UTC
+1 NTH

Comment 32 Adam Williamson 2012-12-18 20:57:14 UTC
Thanks for the votes, folks; accepted as NTH.

Comment 33 Fedora Update System 2012-12-19 01:34:37 UTC
anaconda-18.37.4-1.fc18 has been submitted as an update for Fedora 18.
https://admin.fedoraproject.org/updates/anaconda-18.37.4-1.fc18

Comment 34 Fedora Update System 2012-12-19 22:41:24 UTC
Package anaconda-18.37.4-1.fc18:
* should fix your issue,
* was pushed to the Fedora 18 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing anaconda-18.37.4-1.fc18'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2012-20677/anaconda-18.37.4-1.fc18
then log in and leave karma (feedback).

Comment 35 Fedora Update System 2012-12-20 05:36:08 UTC
anaconda-18.37.4-1.fc18 has been pushed to the Fedora 18 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 36 Kamil Páral 2012-12-20 10:36:57 UTC
Jes, can you please verify this is fixed with smoke9 image?
http://dl.fedoraproject.org/pub/alt/qa/20121219_f18-smoke9/

Thanks!

Comment 37 TianShixiong 2012-12-20 11:15:19 UTC
(In reply to comment #35)
> anaconda-18.37.4-1.fc18 has been pushed to the Fedora 18 stable repository. 
> If problems still persist, please make note of it in this bug report.

An L10n-related problem occurs after this update.
See bug https://bugzilla.redhat.com/show_bug.cgi?id=634385

Comment 38 Jes Sorensen 2012-12-20 11:33:49 UTC
(In reply to comment #36)
> Jes, can you please verify this is fixed with smoke9 image?
> http://dl.fedoraproject.org/pub/alt/qa/20121219_f18-smoke9/
> 
> Thanks!

Kamil,

I'm in panic-before-the-holidays mode here today, so it's unlikely I will
get to this before the new year :(

It did work in TC3 though.

Jes

Comment 39 Kamil Páral 2012-12-20 12:39:08 UTC
If this worked in TC3, I take that as verification. Closing this bug. Thanks.