507782 – hald ended with segmentation fault

Summary: hald ended with segmentation fault

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	hal
Sub Component:
Version:	rawhide
Hardware:	i586
OS:	Linux
Priority:	medium
Severity:	high
Target Milestone:	---
Assignee:	Richard Hughes
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Duplicates (1):	508617
Depends On:
Blocks:

Reported:	2009-06-24 08:21 UTC by Jacek Danecki
Modified:	2009-08-20 20:55 UTC
CC List:	9 users
Fixed In Version:	0.5.12-29.20090226git.fc11
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2009-07-28 17:39:25 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments
logs from hald (202.79 KB, application/x-compressed) 2009-06-25 16:44 UTC, Jacek Danecki
stack trace (2.70 KB, text/plain) 2009-07-01 12:44 UTC, Matthew Gregan [:kinetik]
debuginfo log (7.66 KB, application/octet-stream) 2009-07-22 11:52 UTC, Jacek Danecki
patch to spot MD devices and not partitions on MD devices (1.20 KB, application/octet-stream) 2009-07-22 14:31 UTC, Martin Poole

Description Jacek Danecki 2009-06-24 08:21:23 UTC

Description of problem:
While booting a system installed on an MD isw RAID1 array, haldaemon ended with a segmentation fault.

Version-Release number of selected component (if applicable):
hal-0.5.12-26.20090226git.fc11.i586

How reproducible:
Install Fedora on a RAID1 array created in the OROM. On the first boot after the reboot, hald ends with a segfault. Using a recovery image, you can reproduce this by mounting the root filesystem and running hald manually.

Actual results:
hald[5058]: segfault at 0 ip 08080db5 sp bfbbcb40 error 4 in hald[8047000+5b000]

Expected results:
hald starts and keeps running.

Additional info:
See the attached logs: hald stdout (hald.stdout), hald stderr (hald.stderr), and strace output (hald.strace).

Comment 1 Jacek Danecki 2009-06-25 16:44:44 UTC

Created attachment 349423
logs from hald

Comment 2 Richard Hughes 2009-06-29 11:48:03 UTC

Can you get a backtrace with debuginfo please. Thanks.

Comment 3 Richard Hughes 2009-06-29 11:48:12 UTC

*** Bug 508617 has been marked as a duplicate of this bug. ***

Comment 4 Matthew Gregan [:kinetik] 2009-07-01 12:44:31 UTC

Created attachment 350099
stack trace

I hit the same problem with a fresh install of F11 on x86_64 with a dmraid/isw configuration.  Attaching stack trace and some preliminary debugging with symbols installed.

hald will start if I run |mdadm -S -s| to deactivate the problematic md devices first.

Comment 5 Michael Weidner 2009-07-09 07:26:34 UTC

Same problem here since yesterday on F11 with 2.6.29.5-191.fc11.i686.PAE:

hald[5270]: segfault at 0 ip 08080db5 sp bfe8c050 error 4 in hald[8047000+5b000]

It started after I changed my md devices to have a partition. Previously my filesystem was directly on md0; I changed this to md0p1 by creating a partition on the md0 device, to get rid of the "md0: unknown partition table" message when booting:

Disk /dev/md0: 500.1 GB, 500113211392 bytes
2 heads, 4 sectors/track, 122097952 cylinders
Units = cylinders of 8 * 512 = 4096 bytes
Disk identifier: 0x23d145c3

   Device Boot      Start         End      Blocks   Id  System
/dev/md0p1               1   122097952   488391806   83  Linux


After this the error occurred.

Comment 6 Jacek Danecki 2009-07-22 11:52:46 UTC

Created attachment 354675
debuginfo log

Logs captured with debuginfo installed.

Comment 7 Martin Poole 2009-07-22 14:29:34 UTC

The problem is at line 1501 of blockdev.c:

#0  0x0000000000434de0 in hotplug_event_begin_add_blockdev (sysfs_path=0x1d01600 "/sys/devices/virtual/block/md3/md3p1", 
    device_file=<value optimized out>, is_partition=<value optimized out>, parent=0x1cd3b80, end_token=0x1d014f0) at blockdev.c:1501
1501                    hal_device_property_set_bool (d, "volume.is_disc", strcmp (hal_device_property_get_string (parent, "storage.drive_type"), "cdrom") == 0);


Added debug output to show the parent udi and the parent disk type.

14:33:28.334 [I] blockdev.c:915: Handling /dev/md3p1 as MD device
14:33:28.334 [I] blockdev.c:1501: MDFAIL: block.storage_device='/org/freedesktop/Hal/devices/computer'
14:33:28.334 [I] blockdev.c:1502: MDFAIL: d_type='(null)'

which points to the strcmp failing because it is handed a NULL string.

The attachment in comment 6 indicates there was a NULL parent in that case:

0x08080b95 in hotplug_event_begin_add_blockdev (sysfs_path=0x81145bc "/sys/devices/virtual/block/md126/md126p1", 
    device_file=0x81149bc "/dev/md126p1", is_partition=1, parent=0x0, end_token=0x81144b0) at blockdev.c:1501


Working back, it appears that the code for spotting an MD device does not take partitions on MD devices into account.
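
The failure mode at the crash site can be illustrated in isolation. This is only a sketch of why a NULL drive type makes the comparison crash and how a guard would avoid it; it is not the patch attached to this bug (which changes how MD devices are detected), and `drive_type_is_cdrom` is a hypothetical helper, not a HAL function.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the comparison made at blockdev.c:1501.
 * Passing NULL to strcmp() is undefined behaviour and typically shows up
 * as a "segfault at 0", so the pointer is checked before comparing. */
static bool drive_type_is_cdrom(const char *drive_type)
{
    return drive_type != NULL && strcmp(drive_type, "cdrom") == 0;
}
```

A guard like this would only hide the symptom; the parent lookup would still be wrong for partitions on MD devices.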

Comment 8 Martin Poole 2009-07-22 14:31:17 UTC

Created attachment 354698
patch to spot MD devices and not partitions on MD devices

Simple patch to cover the case where partitions exist on MD devices.

Comment 9 Jacek Danecki 2009-07-23 09:12:22 UTC

Which version of the hal package will include this fix?

Comment 10 Jacek Danecki 2009-07-23 09:58:39 UTC

I've patched hal-0.5.12-26.20090226git.fc12.3 with this fix and the problem disappeared.

Comment 11 Richard Hughes 2009-07-23 16:31:58 UTC

The patch looks okay, but I'm concerned that if you have more than 9 partitions this will overflow, and the same bug will bite.

For a string of /sys/devices/virtual/block/md126/md126p1

Surely we need:

sscanf (hal_util_get_last_element (sysfs_path), "md%dp%d", &md_number, &tc) == 2
                                                     ^^^                       ^
rather than

sscanf (hal_util_get_last_element (sysfs_path), "md%d%c", &md_number, &tc) == 1

I'm also slightly concerned why this happened -- did md devices exist without partition numbers before? What happens if a device name without a p suffix (md126) gets pushed into a "md%dp%d" match?
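
The question about a bare md126 name can be checked in isolation. The following is a minimal sketch, not HAL code: `parse_md_partition` is a hypothetical helper that applies the "md%dp%d" pattern to a name and reports how many fields sscanf converted.

```c
#include <stdio.h>

/* Hypothetical helper (not part of HAL): applies the "md%dp%d" pattern to
 * the last path element and returns how many fields sscanf() converted.
 * An EOF result (nothing could be read at all) is folded into 0. */
static int parse_md_partition(const char *name, int *md_number, int *partition)
{
    int matched = sscanf(name, "md%dp%d", md_number, partition);
    return matched < 0 ? 0 : matched;
}
```

With this pattern, a name without the p suffix such as md126 converts only the first field and yields 1, while md126p1 yields 2, so the return value alone distinguishes the two cases.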

Either way, this patch needs a little more work. I can look at this again tomorrow, but it would be good to discuss the patch first. Thanks.

Comment 12 Martin Poole 2009-07-23 16:50:18 UTC

No, there is no problem with more than 9 partitions with that patch.

Note that we are looking for a single character after the number; we don't care what the character is, merely that a non-numeric character is present. sscanf returns the number of parameters it matched, so if there is anything after the initial digits, the %c will match whatever it is, anything after that character is ignored, and the return value will be 2, which indicates it is not a plain MD device.
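
The behaviour described above can be sketched as a small standalone check. This is an illustration under stated assumptions, not the attached patch: `classify_md_name` and the `md_kind` enum are hypothetical names, and only the "md%d%c" format string comes from this discussion.

```c
#include <stdio.h>

/* Hypothetical classification, named here only for illustration. */
enum md_kind { MD_NOT_MD = 0, MD_WHOLE_DEVICE = 1, MD_HAS_SUFFIX = 2 };

/* Applies the "md%d%c" pattern to a device name.
 * One conversion: digits only after "md", i.e. a plain MD device.
 * Two conversions: some character follows the digits (e.g. the 'p' of a
 * partition); whatever comes after that character is never examined, so
 * the number of partition digits does not matter. */
static enum md_kind classify_md_name(const char *name)
{
    int md_number = 0;
    char tc = '\0';
    int matched = sscanf(name, "md%d%c", &md_number, &tc);

    if (matched == 1)
        return MD_WHOLE_DEVICE;
    if (matched == 2)
        return MD_HAS_SUFFIX;
    return MD_NOT_MD;   /* 0 or EOF: the name does not start with md<number> */
}
```

Under this sketch, md126 is classified as a whole device, while md126p1 and md3p10 both fall into the suffix case, which is why a partition number above 9 makes no difference to the check.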

The partition names are automatically created by kpartx, called from hotplug, and seem to always have the 'p' indicator for partitions.

md devices have had partition tables before, but with F11 and the greater prevalence of virtual machines it is more common to have these appear (is my take).

Comment 13 Richard Hughes 2009-07-28 17:39:25 UTC

Fixed in 0.5.13-5, http://koji.fedoraproject.org/koji/taskinfo?taskID=1550358

Thanks Martin for the patch!

Comment 14 Michael Cutler 2009-08-12 08:43:15 UTC

Gents, is there any possibility that this patch can be released on the 0.5.12-fc11 updates branch? Currently Fedora 11 is utterly useless on pretty standard RHEL-compatible equipment (HP xw4600 Workstation). I can see it's been closed as NEXTRELEASE, but I would like to use Fedora again before F12 :) With hald segfaulting immediately after install, you cannot get past firstboot without using rescue.

Comment 15 Richard Hughes 2009-08-12 08:56:05 UTC

Building for F11 here: http://koji.fedoraproject.org/koji/taskinfo?taskID=1600361

I'll do a bodhi update when that's complete.

Comment 16 Fedora Update System 2009-08-12 09:04:38 UTC

hal-0.5.12-29.20090226git.fc11 has been submitted as an update for Fedora 11.
http://admin.fedoraproject.org/updates/hal-0.5.12-29.20090226git.fc11

Comment 17 Fedora Update System 2009-08-20 20:55:37 UTC

hal-0.5.12-29.20090226git.fc11 has been pushed to the Fedora 11 stable repository.  If problems still persist, please make note of it in this bug report.
