Red Hat Bugzilla – Bug 507782
hald ended with segmentation fault
Last modified: 2009-08-20 16:55:52 EDT
Description of problem:
While booting a system installed on an MD isw RAID1 array, haldaemon exited with a segmentation fault.
Version-Release number of selected component (if applicable):
Install Fedora on a RAID1 array created in the OROM. On the first boot after installation, hald exits with a segfault. Using a recovery image, you can reproduce this by mounting the root filesystem and running hald manually.
hald: segfault at 0 ip 08080db5 sp bfbbcb40 error 4 in hald[8047000+5b000]
See attached logs from hald stdout (hald.stdout) and stderr (hald.stderr), and from strace command (hald.strace).
Created attachment 349423 [details]
logs from hald
Can you get a backtrace with debuginfo, please? Thanks.
*** Bug 508617 has been marked as a duplicate of this bug. ***
Created attachment 350099 [details]
I hit the same problem with a fresh install of F11 on x86_64 with a dmraid/isw configuration. Attaching stack trace and some preliminary debugging with symbols installed.
hald will start if I run |mdadm -S -s| to deactivate the problematic md devices first.
Same problem here since yesterday, on F11 with 126.96.36.199-191.fc11.i686.PAE:
hald: segfault at 0 ip 08080db5 sp bfe8c050 error 4 in hald[8047000+5b000]
It started after I changed my md devices to carry a partition. Previously the filesystem sat directly on md0; I created a partition inside md0 and moved to md0p1 to get rid of the "md0: unknown partition table" message at boot:
Disk /dev/md0: 500.1 GB, 500113211392 bytes
2 heads, 4 sectors/track, 122097952 cylinders
Units = cylinders of 8 * 512 = 4096 bytes
Disk identifier: 0x23d145c3
    Device Boot      Start         End      Blocks   Id  System
/dev/md0p1               1   122097952   488391806   83  Linux
After this the error occurred.
Created attachment 354675 [details]
logs from debuginfo
The problem is at line 1501 of blockdev.c:
#0 0x0000000000434de0 in hotplug_event_begin_add_blockdev (sysfs_path=0x1d01600 "/sys/devices/virtual/block/md3/md3p1",
device_file=<value optimized out>, is_partition=<value optimized out>, parent=0x1cd3b80, end_token=0x1d014f0) at blockdev.c:1501
1501 hal_device_property_set_bool (d, "volume.is_disc", strcmp (hal_device_property_get_string (parent, "storage.drive_type"), "cdrom") == 0);
Added debug output to show the parent UDI and the parent disk type.
14:33:28.334 [I] blockdev.c:915: Handling /dev/md3p1 as MD device
14:33:28.334 [I] blockdev.c:1501: MDFAIL: block.storage_device='/org/freedesktop/Hal/devices/computer'
14:33:28.334 [I] blockdev.c:1502: MDFAIL: d_type='(null)'
which points to the strcmp failing because storage.drive_type is NULL.
The attachment in comment #6 indicates that the parent itself was NULL in that case:
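To illustrate why a NULL drive type crashes at that line: strcmp dereferences both arguments, so a missing storage.drive_type leads to a read at address 0. The sketch below shows a defensive comparison. The helper name drive_type_is_cdrom is hypothetical and not part of hal, and this is not the fix that was eventually applied in this bug; it only shows the failure mode being guarded against.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of hal): returns 1 only when drive_type
 * is a non-NULL string equal to "cdrom", and 0 otherwise. A NULL input
 * is treated as "not a cdrom" instead of being passed to strcmp.
 */
int drive_type_is_cdrom(const char *drive_type)
{
    if (drive_type == NULL)
        return 0;
    return strcmp(drive_type, "cdrom") == 0;
}
```

A guard like this would only hide the symptom; the device would still be attached to the wrong parent.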
0x08080b95 in hotplug_event_begin_add_blockdev (sysfs_path=0x81145bc "/sys/devices/virtual/block/md126/md126p1",
device_file=0x81149bc "/dev/md126p1", is_partition=1, parent=0x0, end_token=0x81144b0) at blockdev.c:1501
Working back, it appears that the code that detects MD devices does not take partitions on MD devices into account.
Created attachment 354698 [details]
patch to spot MD devices and not partitions on MD devices
Simple patch to cover the case where partitions exist on MD devices.
Which version of the hal package will include this fix?
I've patched version hal-0.5.12-26.20090226git.fc12.3 with this fix and the problem disappeared.
Patch looks okay, but I have a concern that if you have more than 9 partitions this will overflow, and the same bug will bite.
For a string of /sys/devices/virtual/block/md126/md126p1
Surely we need:
sscanf (hal_util_get_last_element (sysfs_path), "md%dp%d", &md_number, &tc) == 2
sscanf (hal_util_get_last_element (sysfs_path), "md%d%c", &md_number, &tc) == 1
I'm also slightly concerned why this happened -- did md devices exist without partition numbers before? What happens if a device name without a p suffix (md126) gets pushed into a "md%dp%d" match?
Either way, this patch needs a little more work. I can look at this again tomorrow, but it would be good to discuss the patch first. Thanks.
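As a small sketch of the question above about a name without a p suffix: with standard sscanf semantics, a "md%dp%d" format applied to "md126" converts only the first number and returns 1, while "md126p1" returns 2. The helper name match_md_partition is hypothetical and is not taken from hal or from the attached patch.

```c
#include <stdio.h>

/*
 * Hypothetical helper: returns how many fields the "md%dp%d" format
 * converts from name. For a plain device name the input ends before the
 * literal 'p' can match, so only the first %d is converted.
 */
int match_md_partition(const char *name)
{
    int md_number = 0;
    int part_number = 0;
    return sscanf(name, "md%dp%d", &md_number, &part_number);
}
```

Because %d reads a whole run of digits, partition numbers above 9 (for example "md126p10") still convert as two fields.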
No, there is no problem with more than 9 partitions with that patch.
Note that we are looking for a single character after the number; we don't care which character, only that a non-numeric one is present. sscanf returns the number of conversions it matched, so if anything follows the initial digits, %c matches it (whatever it is), anything after that character is ignored, and the return value is 2, which indicates that this is not a plain MD device.
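The behaviour described here can be checked in isolation. The following is a minimal sketch that uses the same "md%d%c" format as the expression quoted earlier in this bug; the function name is_plain_md_device is an assumption for illustration and does not appear in hal.

```c
#include <stdio.h>

/*
 * Hypothetical helper: returns 1 when name is a plain MD device such as
 * "md126", and 0 when anything follows the digits (for example "md126p1")
 * or when the name does not start with "md" followed by a number.
 */
int is_plain_md_device(const char *name)
{
    int md_number = 0;
    char trailing = '\0';
    /* One conversion means digits only; two means a trailing character. */
    return sscanf(name, "md%d%c", &md_number, &trailing) == 1;
}
```

Since %c consumes only one character and the rest of the input is ignored, the number of digits in the partition suffix does not matter.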
The partition names are automatically created by kpartx, called from hotplug, and seem to always have the 'p' indicator for partitions.
md devices have had partition tables before, but with F11 and the greater prevalence of virtual machines it is more common to have these appear (is my take).
Fixed in 0.5.13-5, http://koji.fedoraproject.org/koji/taskinfo?taskID=1550358
Thanks Martin for the patch!
Gents, is there a possibility that this patch can be released on the 0.5.12-fc11 updates branch? Currently Fedora 11 is utterly useless on pretty standard RHEL-compatible equipment (HP xw4600 Workstation). I can see it's been closed as NEXTRELEASE, but I would like to use Fedora again before F12 :) With hald segfaulting immediately after install, you cannot get past firstboot without using rescue.
Building for F11 here: http://koji.fedoraproject.org/koji/taskinfo?taskID=1600361
I'll do a bodhi update when that's complete.
hal-0.5.12-29.20090226git.fc11 has been submitted as an update for Fedora 11.
hal-0.5.12-29.20090226git.fc11 has been pushed to the Fedora 11 stable repository. If problems still persist, please make note of it in this bug report.