Bug 876441
| Field | Value |
|---|---|
| Summary | Early crash while evaluating existing RAID arrays |
| Product | Fedora |
| Component | anaconda |
| Version | 18 |
| Status | CLOSED ERRATA |
| Severity | urgent |
| Priority | unspecified |
| Reporter | Jes Sorensen <Jes.Sorensen> |
| Assignee | David Lehman <dlehman> |
| QA Contact | Fedora Extras Quality Assurance <extras-qa> |
| CC | anaconda-maint-list, awilliam, dledford, g.kaviyarasu, Jes.Sorensen, jonathan, jsmith.fedora, kparal, mschmidt, sbueno, systemd-maint, tiansworld, vanmeeuwen+fedora |
| Keywords | CommonBugs, Reopened |
| Hardware | Unspecified |
| OS | Unspecified |
| Whiteboard | https://fedoraproject.org/wiki/Common_F18_bugs#broken-lvm-raid-btrfs AcceptedNTH |
| Doc Type | Bug Fix |
| Type | Bug |
| Bug Blocks | 752665 |
| Last Closed | 2012-12-20 12:39:08 UTC |
Description (Jes Sorensen, 2012-11-14 07:30:24 UTC)
Created attachment 644598 [details] - X.log
Created attachment 644599 [details] - anaconda.log
Created attachment 644600 [details] - ifcfg.log
Created attachment 644601 [details] - packaging.log
Created attachment 644602 [details] - program.log
Created attachment 644603 [details] - storage.log
Created attachment 644604 [details] - syslog
Created attachment 644606 [details] - anaconda-tb
From the traceback:

```
anaconda 18.24 exception report
Traceback (most recent call first):
  File "/usr/lib64/python2.7/site-packages/pyanaconda/storage/devicelibs/mdraid.py", line 264, in md_node_from_name
    raise MDRaidError("md_node_from_name failed: %s" % e)
  File "/usr/lib64/python2.7/site-packages/pyanaconda/storage/devices.py", line 2839, in updateSysfsPath
    md_node = mdraid.md_node_from_name(self.name)
  File "/usr/lib64/python2.7/site-packages/pyanaconda/storage/devicetree.py", line 1449, in handleUdevMDMemberFormat
    md_array.updateSysfsPath()
  File "/usr/lib64/python2.7/site-packages/pyanaconda/storage/devicetree.py", line 1663, in handleUdevDeviceFormat
    self.handleUdevMDMemberFormat(info, device)
  File "/usr/lib64/python2.7/site-packages/pyanaconda/storage/devicetree.py", line 1096, in addUdevDevice
    self.handleUdevDeviceFormat(info, device)
  File "/usr/lib64/python2.7/site-packages/pyanaconda/storage/devicetree.py", line 1902, in _populate
    self.addUdevDevice(dev)
  File "/usr/lib64/python2.7/site-packages/pyanaconda/storage/devicetree.py", line 1850, in populate
    self._populate()
  File "/usr/lib64/python2.7/site-packages/pyanaconda/storage/__init__.py", line 398, in reset
    self.devicetree.populate(cleanupOnly=cleanupOnly)
  File "/usr/lib64/python2.7/site-packages/pyanaconda/storage/__init__.py", line 104, in storageInitialize
    storage.reset()
  File "/usr/lib64/python2.7/threading.py", line 504, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/lib64/python2.7/site-packages/pyanaconda/threads.py", line 91, in run
    threading.Thread.run(self, *args, **kwargs)
MDRaidError: md_node_from_name failed: [Errno 22] Invalid argument: '/dev/md/'

Local variables in innermost frame:
e: [Errno 22] Invalid argument: '/dev/md/'
named_path: /dev/md/
name:
```

So we're in the RAID code here. dlehman! Note this bug is stopping Jes from helping to fix our *other* RAID bug, so it would be appreciated if you could give it a bit of priority. And hey, it may even rank as a blocker in itself, once we have more data.

Ok, this is actually TC7, not TC8. Adam gave me a pointer to the latest smoketest image; same thing. It looked like anaconda 18.29 on the screen before the kapow. Uploading logs in a second.

Created attachment 644633 [details] - X.log smoketest
Created attachment 644634 [details] - anaconda.log smoketest
Created attachment 644635 [details] - ifcfg.log smoketest
Created attachment 644642 [details] - packaging.log smoketest
Created attachment 644646 [details] - program.log smoketest
Created attachment 644647 [details] - storage.log smoketest
Created attachment 644648 [details] - syslog smoketest
Created attachment 644649 [details] - anaconda-tb smoketest
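The innermost frame of the traceback shows `named_path` as the bare `/dev/md/` directory because the array's name came back empty. Below is a minimal sketch of that failing path, reconstructed from the traceback alone; the function body and the MDRaidError stand-in are assumptions, not pyanaconda's actual source.

```python
import os


class MDRaidError(Exception):
    """Stand-in for pyanaconda's real MDRaidError."""


def md_node_from_name(name):
    # Resolve /dev/md/<name> to the kernel node name (e.g. md127).
    # When the array has no MD_ data in udev, its name is empty, so
    # named_path is the bare /dev/md/ directory, and os.readlink() on
    # a directory raises OSError(22, 'Invalid argument') -- exactly
    # the error reported in the traceback above.
    named_path = "/dev/md/" + name
    try:
        node = os.path.basename(os.readlink(named_path))
    except OSError as e:
        raise MDRaidError("md_node_from_name failed: %s" % e)
    return node
```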
dlehman says: "there's no md-specific data in md127's udevdb entry. so that's a udev or environment issue. there should be several MD_ things, but there are none"

kay says: "sorry, i'm entirely clueless about raid. but i'm not really surprised that their hacks don't always work out :) i have never personally used md or lvm, any specific pointers?"

This could possibly just be a more explosive case of the mdadm/udev/systemd problem from https://bugzilla.redhat.com/show_bug.cgi?id=873576 , but I'm not sure. CCing relevant folks.

md127 is a normal md array; there is nothing special about it. I realize udev/systemd loves to blame mdadm for things, but the array is fine. This is on an F18 box that was upgraded from F17 via yum.

```
[root@monkeybay grub.d]# mdadm --detail /dev/md127
/dev/md127:
        Version : 1.0
  Creation Time : Thu Apr 26 16:02:31 2012
     Raid Level : raid1
     Array Size : 4883444 (4.66 GiB 5.00 GB)
  Used Dev Size : 4883444 (4.66 GiB 5.00 GB)
   Raid Devices : 2
  Total Devices : 2
    Persistence : Superblock is persistent

    Update Time : Wed Nov 14 09:25:21 2012
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

           Name : monkeybay:42  (local to host monkeybay)
           UUID : 39e60b6a:6125de1b:cee8da1d:c1eb7dd8
         Events : 89

    Number   Major   Minor   RaidDevice State
       0       8       72        0      active sync   /dev/sde8
       1       8       88        1      active sync   /dev/sdf8
```

I should note that this one comes across to me as similar to bug 862073, which, by the way, has sat untouched since I reported it against Alpha six weeks ago :(

(In reply to comment #20)
> dlehman says: "there's no md-specific data in md127's udevdb entry. so
> that's a udev or environment issue. there should be several MD_ things, but
> there are none"

From my reading of the storage log, anaconda listed md127 as a device to be scanned but never actually scanned it. If it doesn't scan it, then I would expect it not to have any MD_ environment entries. Also from my reading, anaconda scans constituent devices and then tries to add them to raid arrays when, from what I can tell, they are already part of raid arrays and should not be added to anything. Mdadm does the right thing and refuses to add them to anything, but that just strikes me as wrong. Normally, mdadm populates the MD_ entries via the 64-md-raid.rules file on a running system via the call to `mdadm --detail --export $tempnode`, so if anaconda is doing its own scanning and not running the default udev rules, then it is out of our hands.

(In reply to comment #23)
> From my reading of the storage log, anaconda listed md127 as a device to be
> scanned but never actually scanned it. If it doesn't scan it, then I would
> expect it not to have any MD_ environment entries.

We tried to scan it, but it being an md device without any MD_ entries in udev made the scan fail, which is the cause of the crash.

```
08:25:21,643 DEBUG storage: DeviceTree.addUdevDevice: info: {'ANACBIN': '/sbin', 'DEVNAME': 'md127', 'DEVPATH': '/devices/virtual/block/md127', 'DEVTYPE': 'disk', 'MAJOR': '9', 'MINOR': '127', 'MPATH_SBIN_PATH': '/sbin', 'SUBSYSTEM': 'block', 'SYSTEMD_READY': '0', 'TAGS': ':systemd:', 'USEC_INITIALIZED': '761967', 'name': 'md127', 'symlinks': [], 'sysfs_path': '/devices/virtual/block/md127'} ; name: md127 ;
```

Other md devices in the same storage.log have the MD_ data in udev.
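As an illustration of the MD_ data being discussed: on a healthy, running array, 64-md-raid.rules imports key/value pairs along these lines into the udev database. This is a hedged example, not a capture from this report's logs; the key names are the usual ones in mdadm's --detail --export output, and the values simply echo the mdadm --detail output quoted above.

```
# mdadm --detail --export /dev/md127
MD_LEVEL=raid1
MD_DEVICES=2
MD_METADATA=1.0
MD_UUID=39e60b6a:6125de1b:cee8da1d:c1eb7dd8
MD_NAME=monkeybay:42
```

It is precisely keys like these that are absent from md127's entry above.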
The lack of said data in md127's udevdb entry is the basic problem that needs to get fixed, IMO.

> Normally, mdadm populates the MD_ entries via the 64-md-raid.rules file on a
> running system via the call to mdadm --detail --export $tempnode, so if
> anaconda is doing its own scanning and not running the default udev rules,
> then it is out of our hands.

The mdadm udev rules are indeed enabled in the installer environment, but this md device is apparently not triggering them.

I was looking a bit at how this happens, and it seems to be a valid setup. Basically, if you assemble an array without enough devices to start it, you end up with an array that reports its udev info back this way. IMHO, anaconda should be able to notice that the array is incomplete and list it as such. It is a scenario likely to happen in the field when someone adds previously used drives to a box and tries to install on it.

```
[root@monkeybay ~]# mdadm -A /dev/md43 /dev/sdg7
mdadm: /dev/md43 assembled from 1 drive - not enough to start the array.
[root@monkeybay ~]# udevadm info --name=md43
P: /devices/virtual/block/md43
N: md43
E: DEVNAME=/dev/md43
E: DEVPATH=/devices/virtual/block/md43
E: DEVTYPE=disk
E: MAJOR=9
E: MINOR=43
E: SUBSYSTEM=block
E: SYSTEMD_READY=0
E: TAGS=:systemd:
E: UDISKS_PRESENTATION_NOPOLICY=1
E: USEC_INITIALIZED=4420974364
```

anaconda is intended to handle such cases and not crash, yeah, but we're not considering that a high priority for Beta. If you work around this bug, can you help us figure out https://bugzilla.redhat.com/show_bug.cgi?id=876789 ? That is the remaining bug in the RAID-1 case.

Having the installer crash like this with no explanation is rather unacceptable, IMHO. It should be fairly easy to at least ignore the array so anaconda doesn't crash.

commonbugs -> we need to document that incomplete RAID, LVM, btrfs, and other multi-disk devices may cause crashes.

Proposing as NTH for final: the bug here is in evaluating existing incomplete RAID arrays, which probably isn't serious enough to block the release but is certainly worth fixing.

+1 NTH

+1 NTH

Thanks for the votes, folks; accepted as NTH.

anaconda-18.37.4-1.fc18 has been submitted as an update for Fedora 18. https://admin.fedoraproject.org/updates/anaconda-18.37.4-1.fc18

Package anaconda-18.37.4-1.fc18:

* should fix your issue,
* was pushed to the Fedora 18 testing repository,
* should be available at your local mirror within two days.

Update it with:

```
# su -c 'yum update --enablerepo=updates-testing anaconda-18.37.4-1.fc18'
```

as soon as you are able to. Please go to the following url, then log in and leave karma (feedback): https://admin.fedoraproject.org/updates/FEDORA-2012-20677/anaconda-18.37.4-1.fc18

anaconda-18.37.4-1.fc18 has been pushed to the Fedora 18 stable repository. If problems still persist, please make note of it in this bug report.

Jes, can you please verify this is fixed with the smoke9 image? http://dl.fedoraproject.org/pub/alt/qa/20121219_f18-smoke9/ Thanks!

(In reply to comment #35)
> anaconda-18.37.4-1.fc18 has been pushed to the Fedora 18 stable repository.
> If problems still persist, please make note of it in this bug report.

An L10n-related problem occurs after this update. See bug https://bugzilla.redhat.com/show_bug.cgi?id=634385

(In reply to comment #36)
> Jes, can you please verify this is fixed with the smoke9 image?
> http://dl.fedoraproject.org/pub/alt/qa/20121219_f18-smoke9/

Kamal,

I'm in panic-before-the-holidays mode here today, so it's unlikely I will get to this before the new year :( It did work in TC3, though.

Jes
If this worked in TC3, I take it as verification. Closing this bug. Thanks.
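For reference, the defensive pattern the thread converges on (skip arrays whose udev entries carry no MD_ data instead of crashing on an empty name) could look roughly like the sketch below. This is an illustration only, not the actual change that shipped in anaconda-18.37.4-1.fc18; the add_md_device helper and the devicetree.add_device call are hypothetical, and only the shape of the info dict is taken from the logs quoted above.

```python
import logging

log = logging.getLogger("storage")


def add_md_device(devicetree, info):
    # `info` is a udev properties dict shaped like the
    # DeviceTree.addUdevDevice entries quoted in this report.
    # A partially assembled array never starts, so udev has no MD_*
    # keys for it (see the md43 example above); treat that as an
    # incomplete array and skip it rather than crashing.
    if not any(key.startswith("MD_") for key in info):
        log.warning("md device %s has no MD_ data in udev; "
                    "ignoring it as an incomplete array",
                    info.get("DEVNAME", info.get("name", "unknown")))
        return None
    # Hypothetical: hand complete arrays to the device tree as usual.
    return devicetree.add_device(info)
```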