Bug 876441 - Early crash while evaluating existing RAID arrays
Early crash while evaluating existing RAID arrays
Status: CLOSED ERRATA
Product: Fedora
Classification: Fedora
Component: anaconda (Show other bugs)
18
Unspecified Unspecified
unspecified Severity urgent
: ---
: ---
Assigned To: David Lehman
Fedora Extras Quality Assurance
https://fedoraproject.org/wiki/Common...
: CommonBugs, Reopened
Depends On:
Blocks: F18-accepted/F18FinalFreezeExcept
  Show dependency treegraph
 
Reported: 2012-11-14 02:30 EST by Jes Sorensen
Modified: 2012-12-20 07:39 EST (History)
13 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2012-12-20 07:39:08 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)
X.log (32.68 KB, text/plain)
2012-11-14 02:31 EST, Jes Sorensen
no flags Details
anaconda.log (3.77 KB, text/plain)
2012-11-14 02:32 EST, Jes Sorensen
no flags Details
ifcfg.log (631 bytes, text/plain)
2012-11-14 02:33 EST, Jes Sorensen
no flags Details
packaging.log (135 bytes, text/plain)
2012-11-14 02:33 EST, Jes Sorensen
no flags Details
program.log (6.62 KB, text/plain)
2012-11-14 02:33 EST, Jes Sorensen
no flags Details
storage.log (212.39 KB, text/plain)
2012-11-14 02:34 EST, Jes Sorensen
no flags Details
syslog (126.76 KB, text/plain)
2012-11-14 02:35 EST, Jes Sorensen
no flags Details
anaconda-tb (415.99 KB, text/plain)
2012-11-14 02:40 EST, Jes Sorensen
no flags Details
X.log smoketest (32.67 KB, text/plain)
2012-11-14 03:30 EST, Jes Sorensen
no flags Details
anaconda.log smoketest (4.41 KB, text/x-log)
2012-11-14 03:31 EST, Jes Sorensen
no flags Details
ifcfg.log smoketest (631 bytes, text/plain)
2012-11-14 03:31 EST, Jes Sorensen
no flags Details
packaging.log smoketest (135 bytes, text/plain)
2012-11-14 03:32 EST, Jes Sorensen
no flags Details
program.log smoketest (174.96 KB, text/plain)
2012-11-14 03:33 EST, Jes Sorensen
no flags Details
storage.log smoketest (522.79 KB, text/plain)
2012-11-14 03:33 EST, Jes Sorensen
no flags Details
syslog smoketest (125.95 KB, application/octet-stream)
2012-11-14 03:34 EST, Jes Sorensen
no flags Details
anaconda-tb smoketest (906.70 KB, text/plain)
2012-11-14 03:34 EST, Jes Sorensen
no flags Details

  None (edit)
Description Jes Sorensen 2012-11-14 02:30:24 EST
Description of problem:
When trying to run the TC18 install DVD image, Anaconda never gets up,
instead there is a small window for about a microsecond and I am left with
a blank X screen.

Switching to VC1 I see a message that says 'pane is dead'.

Version-Release number of selected component (if applicable):
Fedora 18 TC8

How reproducible:
Every time

Steps to Reproduce:
1. Insert TC8 DVD
2. Run 'install Fedora'
3. Wait about 30 seconds to watch the cursor on a blank gui screen
Comment 1 Jes Sorensen 2012-11-14 02:31:31 EST
Created attachment 644598 [details]
X.log
Comment 2 Jes Sorensen 2012-11-14 02:32:37 EST
Created attachment 644599 [details]
anaconda.log
Comment 3 Jes Sorensen 2012-11-14 02:33:04 EST
Created attachment 644600 [details]
ifcfg.log
Comment 4 Jes Sorensen 2012-11-14 02:33:30 EST
Created attachment 644601 [details]
packaging.log
Comment 5 Jes Sorensen 2012-11-14 02:33:52 EST
Created attachment 644602 [details]
program.log
Comment 6 Jes Sorensen 2012-11-14 02:34:29 EST
Created attachment 644603 [details]
storage.log
Comment 7 Jes Sorensen 2012-11-14 02:35:19 EST
Created attachment 644604 [details]
syslog
Comment 8 Jes Sorensen 2012-11-14 02:40:58 EST
Created attachment 644606 [details]
anaconda-tb
Comment 9 Adam Williamson 2012-11-14 02:46:14 EST
From the traceback:

anaconda 18.24 exception report
Traceback (most recent call first):
  File "/usr/lib64/python2.7/site-packages/pyanaconda/storage/devicelibs/mdraid.py", line 264, in md_node_from_name
    raise MDRaidError("md_node_from_name failed: %s" % e)
  File "/usr/lib64/python2.7/site-packages/pyanaconda/storage/devices.py", line 2839, in updateSysfsPath
    md_node = mdraid.md_node_from_name(self.name)
  File "/usr/lib64/python2.7/site-packages/pyanaconda/storage/devicetree.py", line 1449, in handleUdevMDMemberFormat
    md_array.updateSysfsPath()
  File "/usr/lib64/python2.7/site-packages/pyanaconda/storage/devicetree.py", line 1663, in handleUdevDeviceFormat
    self.handleUdevMDMemberFormat(info, device)
  File "/usr/lib64/python2.7/site-packages/pyanaconda/storage/devicetree.py", line 1096, in addUdevDevice
    self.handleUdevDeviceFormat(info, device)
  File "/usr/lib64/python2.7/site-packages/pyanaconda/storage/devicetree.py", line 1902, in _populate
    self.addUdevDevice(dev)
  File "/usr/lib64/python2.7/site-packages/pyanaconda/storage/devicetree.py", line 1850, in populate
    self._populate()
  File "/usr/lib64/python2.7/site-packages/pyanaconda/storage/__init__.py", line 398, in reset
    self.devicetree.populate(cleanupOnly=cleanupOnly)
  File "/usr/lib64/python2.7/site-packages/pyanaconda/storage/__init__.py", line 104, in storageInitialize
    storage.reset()
  File "/usr/lib64/python2.7/threading.py", line 504, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/lib64/python2.7/site-packages/pyanaconda/threads.py", line 91, in run
    threading.Thread.run(self, *args, **kwargs)
MDRaidError: md_node_from_name failed: [Errno 22] Invalid argument: '/dev/md/'

Local variables in innermost frame:
e: [Errno 22] Invalid argument: '/dev/md/'
named_path: /dev/md/
name: 

So we're in the RAID code here. dlehman! Note this bug is stopping Jes from helping to fix our *other* RAID bug, so it'd be appreciated if you could give it a bit of priority. And hey, it may even rank a blocker in itself, once we have more data.
Comment 10 Jes Sorensen 2012-11-14 02:50:54 EST
Ok this is actually TC7, not TC8.
Comment 11 Jes Sorensen 2012-11-14 03:29:24 EST
Adam gave me a pointer to the latest smoketest image - same thing. Looked like
anaconda 18.29 on the screen before the kapow.

Uploading logs in a second.
Comment 12 Jes Sorensen 2012-11-14 03:30:08 EST
Created attachment 644633 [details]
X.log smoketest
Comment 13 Jes Sorensen 2012-11-14 03:31:03 EST
Created attachment 644634 [details]
anaconda.log smoketest
Comment 14 Jes Sorensen 2012-11-14 03:31:53 EST
Created attachment 644635 [details]
ifcfg.log smoketest
Comment 15 Jes Sorensen 2012-11-14 03:32:32 EST
Created attachment 644642 [details]
packaging.log smoketest
Comment 16 Jes Sorensen 2012-11-14 03:33:13 EST
Created attachment 644646 [details]
program.log smoketest
Comment 17 Jes Sorensen 2012-11-14 03:33:41 EST
Created attachment 644647 [details]
storage.log smoketest
Comment 18 Jes Sorensen 2012-11-14 03:34:08 EST
Created attachment 644648 [details]
syslog smoketest
Comment 19 Jes Sorensen 2012-11-14 03:34:52 EST
Created attachment 644649 [details]
anaconda-tb smoketest
Comment 20 Adam Williamson 2012-11-15 12:57:26 EST
dlehman says: "there's no md-specific data in md127's udevdb entry. so that's a udev or environment issue. there should be several MD_ things, but there are none"

kay says: "sorry, i'm entirely clueless about raid. but i'm not really surprised that their hacks don't always work out :) i have never personally used md or lvm, any specific pointers?"

This could possibly just be a more explosive case of the mdadm/udev/systemd problem from https://bugzilla.redhat.com/show_bug.cgi?id=873576 , but I'm not sure.

CCing relevant folks.
Comment 21 Jes Sorensen 2012-11-15 13:11:49 EST
md127 is a normal md array, nothing special about it. I realize udev/systemd
loves to blame mdadm for things, but the array is fine.

This being on an F18 box that was upgraded from F17 via yum.

[root@monkeybay grub.d]# mdadm --detail /dev/md127
/dev/md127:
        Version : 1.0
  Creation Time : Thu Apr 26 16:02:31 2012
     Raid Level : raid1
     Array Size : 4883444 (4.66 GiB 5.00 GB)
  Used Dev Size : 4883444 (4.66 GiB 5.00 GB)
   Raid Devices : 2
  Total Devices : 2
    Persistence : Superblock is persistent

    Update Time : Wed Nov 14 09:25:21 2012
          State : clean 
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

           Name : monkeybay:42  (local to host monkeybay)
           UUID : 39e60b6a:6125de1b:cee8da1d:c1eb7dd8
         Events : 89

    Number   Major   Minor   RaidDevice State
       0       8       72        0      active sync   /dev/sde8
       1       8       88        1      active sync   /dev/sdf8
Comment 22 Jes Sorensen 2012-11-15 13:14:18 EST
I should note that this one comes across to me similar to 862073, which btw.
has sat untouched since I reported it against Alpha six weeks ago :(
Comment 23 Doug Ledford 2012-11-15 13:52:41 EST
(In reply to comment #20)
> dlehman says: "there's no md-specific data in md127's udevdb entry. so
> that's a udev or environment issue. there should be several MD_ things, but
> there are none"

From my reading of the storage log, anaconda listed md127 as a device to be scanned, but never actually scanned it.  If it doesn't scan it, then I would expect it not to have an MD_ environment entries.

Also from my reading, anaconda scans constituent devices and then tries to add them to raid arrays when, from what I can tell, they are already part of raid arrays and should not be added to anything.  Mdadm does the right thing and refuses to add them to anything, but that just strikes me as wrong.

Normally, mdadm populates the MD_ entries via the 64-md-raid.rules file on a running system via the call to mdadm --detail --export $tempnode, so if Anaconda is doing its own scanning, and not running default udev rules, then it goes out of our hands.
Comment 24 David Lehman 2012-11-16 18:42:20 EST
(In reply to comment #23)
> (In reply to comment #20)
> > dlehman says: "there's no md-specific data in md127's udevdb entry. so
> > that's a udev or environment issue. there should be several MD_ things, but
> > there are none"
> 
> From my reading of the storage log, anaconda listed md127 as a device to be
> scanned, but never actually scanned it.  If it doesn't scan it, then I would
> expect it not to have an MD_ environment entries.

We tried to scan it, but it being an md device without any MD_ entries in udev made it fail, which is the cause of the crash.

08:25:21,643 DEBUG storage:          DeviceTree.addUdevDevice: info: {'ANACBIN': '/sbin',
 'DEVNAME': 'md127',
 'DEVPATH': '/devices/virtual/block/md127',
 'DEVTYPE': 'disk',
 'MAJOR': '9',
 'MINOR': '127',
 'MPATH_SBIN_PATH': '/sbin',
 'SUBSYSTEM': 'block',
 'SYSTEMD_READY': '0',
 'TAGS': ':systemd:',
 'USEC_INITIALIZED': '761967',
 'name': 'md127',
 'symlinks': [],
 'sysfs_path': '/devices/virtual/block/md127'} ; name: md127 ;


Other md devices in the same storage.log have the MD_ data in udev. The lack of said data in md127's udevdb entry is the basic problem that needs to get fixed IMO.

> Normally, mdadm populates the MD_ entries via the 64-md-raid.rules file on a
> running system via the call to mdadm --detail --export $tempnode, so if
> Anaconda is doing its own scanning, and not running default udev rules, then
> it goes out of our hands.

The mdadm udev rules are indeed enabled in the installer environment, but this md device is apparently not triggering them.
Comment 25 Jes Sorensen 2012-11-19 04:12:23 EST
I was looking a bit at how this happens, and it seems to be a valid setup.
Basically if you assemble an array without sufficient devices to be able
to start it, you will end up with an array that reports udev info back this
way.

IMHO Anaconda should be able to notice the array is incomplete and list it
as such.

It is a scenario likely to happen in the field when someone adds previously
used drives to a box and tries to install it.

[root@monkeybay ~]# mdadm -A /dev/md43 /dev/sdg7
mdadm: /dev/md43 assembled from 1 drive - not enough to start the array.
[root@monkeybay ~]# udevadm info --name=md43
P: /devices/virtual/block/md43
N: md43
E: DEVNAME=/dev/md43
E: DEVPATH=/devices/virtual/block/md43
E: DEVTYPE=disk
E: MAJOR=9
E: MINOR=43
E: SUBSYSTEM=block
E: SYSTEMD_READY=0
E: TAGS=:systemd:
E: UDISKS_PRESENTATION_NOPOLICY=1
E: USEC_INITIALIZED=4420974364
Comment 26 Adam Williamson 2012-11-19 04:28:03 EST
anaconda is intended to handle such cases and not crash, yeah, but we're not considering that a high priority for Beta.

If you work around this bug, can you help us figure out https://bugzilla.redhat.com/show_bug.cgi?id=876789 ? that's the remaining bug with the RAID-1 case.
Comment 27 Jes Sorensen 2012-11-19 05:07:46 EST
Having the installer crash like this with no explanation is rather
unacceptable IMHO. It should be fairly easy to at least ignore the
array so anaconda doesn't crash.
Comment 28 Adam Williamson 2012-11-22 00:40:10 EST
commonbugs -> we need to document that incomplete RAID, LVM, btrfs etc multi-disk devices may cause crashes.
Comment 29 Adam Williamson 2012-12-18 15:13:28 EST
Proposing as NTH for final: the bug here is in evaluating existing incomplete RAID arrays, which probably isn't serious enough to block release, but certainly worth fixing.
Comment 30 Kamil Páral 2012-12-18 15:25:45 EST
+1 NTH
Comment 31 Jared Smith 2012-12-18 15:35:53 EST
+1 NTH
Comment 32 Adam Williamson 2012-12-18 15:57:14 EST
Thanks for votes, folks, accepted as NTH.
Comment 33 Fedora Update System 2012-12-18 20:34:37 EST
anaconda-18.37.4-1.fc18 has been submitted as an update for Fedora 18.
https://admin.fedoraproject.org/updates/anaconda-18.37.4-1.fc18
Comment 34 Fedora Update System 2012-12-19 17:41:24 EST
Package anaconda-18.37.4-1.fc18:
* should fix your issue,
* was pushed to the Fedora 18 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing anaconda-18.37.4-1.fc18'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2012-20677/anaconda-18.37.4-1.fc18
then log in and leave karma (feedback).
Comment 35 Fedora Update System 2012-12-20 00:36:08 EST
anaconda-18.37.4-1.fc18 has been pushed to the Fedora 18 stable repository.  If problems still persist, please make note of it in this bug report.
Comment 36 Kamil Páral 2012-12-20 05:36:57 EST
Jes, can you please verify this is fixed with smoke9 image?
http://dl.fedoraproject.org/pub/alt/qa/20121219_f18-smoke9/

Thanks!
Comment 37 TianShixiong 2012-12-20 06:15:19 EST
(In reply to comment #35)
> anaconda-18.37.4-1.fc18 has been pushed to the Fedora 18 stable repository. 
> If problems still persist, please make note of it in this bug report.

A L10n related problem occurs after this update.
See bug https://bugzilla.redhat.com/show_bug.cgi?id=634385
Comment 38 Jes Sorensen 2012-12-20 06:33:49 EST
(In reply to comment #36)
> Jes, can you please verify this is fixed with smoke9 image?
> http://dl.fedoraproject.org/pub/alt/qa/20121219_f18-smoke9/
> 
> Thanks!

Kamal,

In a panic before the holidays mode here today, it's unlikely I will get to
this before the new year :(

It did work in TC3 though.

Jes
Comment 39 Kamil Páral 2012-12-20 07:39:08 EST
If this worked in TC3, I take it as a verification. Closing this bug. Thanks.

Note You need to log in before you can comment on or make changes to this bug.