Bug 507782 - hald ended with segmentation fault
Status: CLOSED ERRATA
Product: Fedora
Classification: Fedora
Component: hal
Version: rawhide
Hardware: i586 Linux
Priority: medium
Severity: high
Assigned To: Richard Hughes
QA Contact: Fedora Extras Quality Assurance
Duplicates: 508617
Reported: 2009-06-24 04:21 EDT by Jacek Danecki
Modified: 2009-08-20 16:55 EDT

Fixed In Version: 0.5.12-29.20090226git.fc11
Doc Type: Bug Fix
Last Closed: 2009-07-28 13:39:25 EDT


Attachments
logs from hald (202.79 KB, application/x-compressed)
2009-06-25 12:44 EDT, Jacek Danecki
stack trace (2.70 KB, text/plain)
2009-07-01 08:44 EDT, Matthew Gregan [:kinetik]
debuginfo log (7.66 KB, application/octet-stream)
2009-07-22 07:52 EDT, Jacek Danecki
patch to spot MD devices and not partitions on MD devices (1.20 KB, application/octet-stream)
2009-07-22 10:31 EDT, Martin Poole

Description Jacek Danecki 2009-06-24 04:21:23 EDT
Description of problem:
While starting a system installed on an MD isw RAID1 array, haldaemon ended with a segmentation fault.

Version-Release number of selected component (if applicable):
hal-0.5.12-26.20090226git.fc11.i586

How reproducible:
Install Fedora on a RAID1 array created in the OROM. On the first boot after installation, hald ends with a segfault. Using a recovery image, you can reproduce this by mounting the root filesystem and running hald manually.

Actual results:
hald[5058]: segfault at 0 ip 08080db5 sp bfbbcb40 error 4 in hald[8047000+5b000]
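
The kernel line above can be decoded mechanically. The following is a minimal sketch for illustration only and is not part of hal; the struct and function names are made up for this example. It assumes the usual x86 page-fault error code bits (bit 0: protection fault rather than page not present, bit 1: write rather than read, bit 2: user mode rather than kernel mode).

```c
#include <stdio.h>

/* Fields of a kernel "segfault at ..." log line (hypothetical struct). */
struct segv_info {
    unsigned long addr;   /* faulting address */
    unsigned long ip;     /* instruction pointer */
    unsigned long sp;     /* stack pointer */
    unsigned int  error;  /* x86 page-fault error code */
    unsigned long base;   /* start of the mapping containing ip */
    unsigned long size;   /* length of that mapping */
};

/* Parse one log line; returns 1 if all six fields were read, else 0. */
int parse_segv_line(const char *line, struct segv_info *out)
{
    int n = sscanf(line,
                   "%*[^:]: segfault at %lx ip %lx sp %lx error %x"
                   " in %*[^[][%lx+%lx]",
                   &out->addr, &out->ip, &out->sp, &out->error,
                   &out->base, &out->size);
    return n == 6;
}

/* Error-code bit tests (assumed x86 layout, see lead-in). */
int fault_is_protection(unsigned int error) { return (error & 1u) != 0; }
int fault_is_write(unsigned int error)      { return (error & 2u) != 0; }
int fault_is_user_mode(unsigned int error)  { return (error & 4u) != 0; }
```

Read this way, "segfault at 0 ... error 4" is a user-mode read of an unmapped page at address 0, which is what a NULL pointer dereference looks like, and ip minus the mapping base (0x08080db5 - 0x8047000) gives 0x39db5.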

Expected results:
hald starts and keeps running.

Additional info:
See attached logs from hald stdout (hald.stdout) and stderr (hald.stderr), and from strace command (hald.strace).
Comment 1 Jacek Danecki 2009-06-25 12:44:44 EDT
Created attachment 349423
logs from hald
Comment 2 Richard Hughes 2009-06-29 07:48:03 EDT
Can you get a backtrace with debuginfo, please? Thanks.
Comment 3 Richard Hughes 2009-06-29 07:48:12 EDT
*** Bug 508617 has been marked as a duplicate of this bug. ***
Comment 4 Matthew Gregan [:kinetik] 2009-07-01 08:44:31 EDT
Created attachment 350099
stack trace

I hit the same problem with a fresh install of F11 on x86_64 with a dmraid/isw configuration.  Attaching stack trace and some preliminary debugging with symbols installed.

hald will start if I run |mdadm -S -s| to deactivate the problematic md devices first.
Comment 5 Michael Weidner 2009-07-09 03:26:34 EDT
Same problem here since yesterday on F11 with kernel 2.6.29.5-191.fc11.i686.PAE:

hald[5270]: segfault at 0 ip 08080db5 sp bfe8c050 error 4 in hald[8047000+5b000]

It started after I changed my md devices to have a partition. Before, I had my filesystem directly on md0; I changed this to md0p1 by creating a partition in the md0 device to get rid of the "md0: unknown partition table" message when booting:

Disk /dev/md0: 500.1 GB, 500113211392 bytes
2 heads, 4 sectors/track, 122097952 cylinders
Units = cylinders of 8 * 512 = 4096 bytes
Disk identifier: 0x23d145c3

    Device Boot      Start         End      Blocks   Id  System
/dev/md0p1               1   122097952   488391806   83  Linux


After this the error occurred.
Comment 6 Jacek Danecki 2009-07-22 07:52:46 EDT
Created attachment 354675
debuginfo log

logs from debuginfo
Comment 7 Martin Poole 2009-07-22 10:29:34 EDT
The problem is at line 1501 of blockdev.c:

#0  0x0000000000434de0 in hotplug_event_begin_add_blockdev (sysfs_path=0x1d01600 "/sys/devices/virtual/block/md3/md3p1", 
    device_file=<value optimized out>, is_partition=<value optimized out>, parent=0x1cd3b80, end_token=0x1d014f0) at blockdev.c:1501
1501                    hal_device_property_set_bool (d, "volume.is_disc", strcmp (hal_device_property_get_string (parent, "storage.drive_type"), "cdrom") == 0);


Added debug output to show the parent UDI and the parent disk type.

14:33:28.334 [I] blockdev.c:915: Handling /dev/md3p1 as MD device
14:33:28.334 [I] blockdev.c:1501: MDFAIL: block.storage_device='/org/freedesktop/Hal/devices/computer'
14:33:28.334 [I] blockdev.c:1502: MDFAIL: d_type='(null)'

which points to the strcmp failing: the parent's storage.drive_type is NULL, so strcmp is handed a null pointer.

Attachment in comment#6 indicates there was a null parent in that case.

0x08080b95 in hotplug_event_begin_add_blockdev (sysfs_path=0x81145bc "/sys/devices/virtual/block/md126/md126p1", 
    device_file=0x81149bc "/dev/md126p1", is_partition=1, parent=0x0, end_token=0x81144b0) at blockdev.c:1501


Working back it appears that the code for spotting what is an MD device does not take into account partitions on MD devices.
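
To illustrate the crash mechanism only: passing a NULL string to strcmp is undefined behaviour, and here it shows up as a read from address 0. The sketch below is not the attached patch (which fixes the MD detection instead) and does not use the real HAL API; struct fake_device, get_drive_type and parent_is_cdrom are hypothetical stand-ins for the device object and the "storage.drive_type" lookup.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a HAL device: only the one property needed. */
struct fake_device {
    const char *drive_type;   /* NULL when "storage.drive_type" is unset */
};

/* Stand-in for the property lookup: NULL if the parent or property is missing. */
const char *get_drive_type(const struct fake_device *parent)
{
    return parent != NULL ? parent->drive_type : NULL;
}

/* Null-safe variant of the comparison made at blockdev.c:1501. */
int parent_is_cdrom(const struct fake_device *parent)
{
    const char *type = get_drive_type(parent);

    /* Check for NULL first: strcmp(NULL, "cdrom") would dereference address 0. */
    return type != NULL && strcmp(type, "cdrom") == 0;
}
```

Both failure cases seen in this report (a parent with no drive type, and a NULL parent) return 0 here instead of crashing. A guard like this would only hide the symptom; the volume would still be attached to the wrong parent.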
Comment 8 Martin Poole 2009-07-22 10:31:17 EDT
Created attachment 354698
patch to spot MD devices and not partitions on MD devices

Simple patch to cover the case where partitions exist on MD devices.
Comment 9 Jacek Danecki 2009-07-23 05:12:22 EDT
Which version of the hal package will include this fix?
Comment 10 Jacek Danecki 2009-07-23 05:58:39 EDT
I've patched version hal-0.5.12-26.20090226git.fc12.3 with this fix and the problem disappeared.
Comment 11 Richard Hughes 2009-07-23 12:31:58 EDT
Patch looks okay, but I have a concern that if you have more than 9 partitions this will overflow, and the same bug will bite.

For a string of /sys/devices/virtual/block/md126/md126p1

Surely we need:

sscanf (hal_util_get_last_element (sysfs_path), "md%dp%d", &md_number, &tc) == 2
                                                     ^^^                       ^
rather than

sscanf (hal_util_get_last_element (sysfs_path), "md%d%c", &md_number, &tc) == 1

I'm also slightly concerned why this happened -- did md devices exist without partition numbers before? What happens if a device name without a p suffix (md126) gets pushed into a "md%dp%d" match?

Either way, this patch needs a little more work. I can look at this again tomorrow, but it would be good to discuss the patch first. Thanks.
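
On the question of what happens when a name without a p suffix meets "md%dp%d": standard sscanf stops at the first directive it cannot match and returns the number of conversions completed up to that point. A minimal sketch to check this, using a hypothetical helper name (match_md_partition is not a hal function):

```c
#include <stdio.h>

/* Apply the stricter "md%dp%d" pattern to a device name.
 * Returns sscanf's result: the number of fields converted. */
int match_md_partition(const char *name, int *md_number, int *part)
{
    return sscanf(name, "md%dp%d", md_number, part);
}
```

Under these semantics "md126" yields 1 (only the array number is converted), while "md126p1" and "md126p10" both yield 2, so comparing the result with 2 separates partitions from whole devices.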
Comment 12 Martin Poole 2009-07-23 12:50:18 EDT
No, there is no problem with more than 9 partitions with that patch.

Note that we are looking for a single character after the number; we don't care which character it is, merely that a non-numeric character is present. sscanf returns the number of parameters it matches, so if anything follows the initial digits, the %c will match whatever it is, anything after that character is ignored, and the returned number will be 2, which indicates it is not a plain MD device.
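
The check described above can be tried in isolation. This is a sketch of the idea only, not the attached patch; is_plain_md is a hypothetical name, and the input is assumed to be the last element of the sysfs path (for example "md126p1" from /sys/devices/virtual/block/md126/md126p1).

```c
#include <stdio.h>

/* Returns 1 if name is a whole MD device such as "md126",
 * 0 if anything follows the number (for example "md126p1")
 * or if the name does not start with "md<number>" at all. */
int is_plain_md(const char *name)
{
    int md_number;
    char tc;   /* receives the first character after the number, if any */

    /* 1 field converted: digits then end of string, so a plain MD device.
     * 2 fields converted: some trailing character exists, so not plain. */
    return sscanf(name, "md%d%c", &md_number, &tc) == 1;
}
```

"md126p10" is rejected in the same way as "md126p1", because %c consumes just one trailing character and the rest of the string is never examined.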

The partition names are automatically created by kpartx, called from hotplug, and seem to always have the 'p' indicator for partitions.

md devices have had partition tables before, but with F11 and the greater prevalence of virtual machines it is more common to have these appear (is my take).
Comment 13 Richard Hughes 2009-07-28 13:39:25 EDT
Fixed in 0.5.13-5, http://koji.fedoraproject.org/koji/taskinfo?taskID=1550358

Thanks Martin for the patch!
Comment 14 Michael Cutler 2009-08-12 04:43:15 EDT
Gents, is there a possibility that this patch can be released on the 0.5.12-fc11 updates branch? Currently Fedora 11 is utterly useless on pretty standard RHEL-compatible equipment (HP xw4600 Workstation). I can see it's been closed as NEXTRELEASE, but I would like to use Fedora again before F12 :) With hald segfaulting immediately after install, you cannot get past firstboot without using rescue.
Comment 15 Richard Hughes 2009-08-12 04:56:05 EDT
Building for F11 here: http://koji.fedoraproject.org/koji/taskinfo?taskID=1600361

I'll do a bodhi update when that's complete.
Comment 16 Fedora Update System 2009-08-12 05:04:38 EDT
hal-0.5.12-29.20090226git.fc11 has been submitted as an update for Fedora 11.
http://admin.fedoraproject.org/updates/hal-0.5.12-29.20090226git.fc11
Comment 17 Fedora Update System 2009-08-20 16:55:37 EDT
hal-0.5.12-29.20090226git.fc11 has been pushed to the Fedora 11 stable repository.  If problems still persist, please make note of it in this bug report.