From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.3) Gecko/20041020

Description of problem:
hald crashes with the sk98lin module, although it worked with earlier versions of hal.

sh-3.00# hald --verbose=yes --daemon=no
<..snip..>
23:39:21.704 [I] linux/osspec.c:793: handling /sys/class/net/eth1 net
23:39:21.711 [E] linux/net_class_device.c:137: SIOCGMIIREG on eth1 failed: Bad address
23:39:21.711 [E] linux/net_class_device.c:137: SIOCGMIIREG on eth1 failed: Bad address
23:39:21.711 [W] linux/net_class_device.c:257: Error reading link info
23:39:21.711 [E] linux/net_class_device.c:137: SIOCGMIIREG on eth1 failed: Bad address
23:39:21.711 [W] linux/net_class_device.c:193: Error reading rate info
Segmentation fault

Version-Release number of selected component (if applicable):
hal-0.4.2-1.FC3

How reproducible:
Always

Steps to Reproduce:
1. service haldaemon start
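The "Error reading link info" and "Error reading rate info" warnings in this log each follow failed SIOCGMIIREG calls, i.e. hald could not read the PHY's MII registers. For orientation only, here is a sketch of how the standard IEEE 802.3 clause 22 registers can be decoded once such a read succeeds. It is not hal's code: mii_link_up() and mii_forced_speed() are hypothetical helpers, and which registers hal actually consults for link and rate is an assumption. The bit values match BMSR_LSTATUS, BMCR_SPEED100 and BMCR_SPEED1000 in <linux/mii.h>.

```c
#include <assert.h>
#include <stdint.h>

/* Standard MII register numbers and bits (IEEE 802.3 clause 22). */
#define MII_REG_BMCR    0x00    /* basic mode control register */
#define MII_REG_BMSR    0x01    /* basic mode status register  */
#define BMSR_LINK_UP    0x0004  /* link status bit in BMSR     */
#define BMCR_SPEED_LSB  0x2000  /* speed selection, bit 13     */
#define BMCR_SPEED_MSB  0x0040  /* speed selection, bit 6      */

/* Hypothetical helper: 1 if a BMSR value reports link up, else 0. */
int mii_link_up(uint16_t bmsr)
{
    return (bmsr & BMSR_LINK_UP) != 0;
}

/* Hypothetical helper: forced speed in Mbit/s encoded in a BMCR value,
 * or -1 for the reserved bit combination. Only meaningful when
 * autonegotiation is disabled; otherwise the negotiated rate has to be
 * derived from the autonegotiation registers instead. */
int mii_forced_speed(uint16_t bmcr)
{
    int msb = (bmcr & BMCR_SPEED_MSB) != 0;
    int lsb = (bmcr & BMCR_SPEED_LSB) != 0;

    if (msb && lsb)
        return -1;              /* reserved */
    if (msb)
        return 1000;
    return lsb ? 100 : 10;
}
```

For example, a BMSR value of 0x782d has the link bit set, and a BMCR value of 0x2100 (autonegotiation off, full duplex) decodes as 100 Mbit/s.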
Created attachment 108070: last fragment of the output of 'strace hald --verbose=yes --daemon=no 2>strace.hald'
I have exactly the same problem.
# lspci -v -s 02:05.0
02:05.0 Ethernet controller: 3Com Corporation 3c940 10/100/1000Base-T [Marvell] (rev 12)
	Subsystem: ASUSTeK Computer Inc. P4P800 Mainboard
	Flags: bus master, 66Mhz, medium devsel, latency 64, IRQ 217
	Memory at feaf8000 (32-bit, non-prefetchable) [size=16K]
	I/O ports at d800 [size=256]
	Capabilities: [48] Power Management version 2
	Capabilities: [50] Vital Product Data
Exactly the same problem and hardware here. See bug id 142218.
*** Bug 142218 has been marked as a duplicate of this bug. ***
Can anyone give me a backtrace? You'll need the hal-debuginfo package installed. Then run

# gdb /usr/sbin/hald

and give the command 'run --daemon=no --verbose=yes'. When the crash occurs, give the command 'backtrace' and post the output. Also see http://fedora.linux.duke.edu/wiki/index.cgi/StackTraces

Thanks,
David
Created attachment 108225: backtrace of hald
Serg: Thanks for the backtrace - please also attach the output of 'tree /sys' (you'll need the tree package for that command).
Created attachment 108239: tree /sys

This information may also help:

# mii-tool
SIOCGMIIPHY on 'eth0' failed: Bad address
no MII interfaces found
Interesting - I fixed one small glitch. I'm not sure it will solve the issue, but please try these RPMs and let me know: http://people.redhat.com/davidz/hal-testing2/ Thanks.
still doesn't work for me
ditto
After installing the package above (hal-0.4.2.cvs20041210-1.i386.rpm), the problem is still the same: I get the same segmentation fault when running "hald --verbose=yes --daemon=no".
Hmm, try running '/usr/bin/valgrind --tool=memcheck /usr/sbin/hald --daemon=no --verbose=yes' and post the output.
Here is the interesting part:

00:49:05.362 [I] linux/osspec.c:793: handling /sys/class/net/eth1 net
00:49:05.450 [E] linux/net_class_device.c:137: SIOCGMIIREG on eth1 failed: Bad address
00:49:05.451 [E] linux/net_class_device.c:137: SIOCGMIIREG on eth1 failed: Bad address
00:49:05.452 [W] linux/net_class_device.c:257: Error reading link info
00:49:05.458 [E] linux/net_class_device.c:137: SIOCGMIIREG on eth1 failed: Bad address
00:49:05.459 [W] linux/net_class_device.c:193: Error reading rate info
==4082==
==4082== Invalid read of size 2
==4082==    at 0x3E6550: (within /usr/lib/libgobject-2.0.so.0.400.8)
==4082==    by 0x3E7CBB: g_signal_emit_valist (in /usr/lib/libgobject-2.0.so.0.400.8)
==4082==    by 0x3E7F59: g_signal_emit (in /usr/lib/libgobject-2.0.so.0.400.8)
==4082==    by 0x804E1DE: ??? (device.c:742)
==4082==  Address 0x5614 is not stack'd, malloc'd or (recently) free'd
==4082==
==4082== Process terminating with default action of signal 11 (SIGSEGV)
==4082==  Access not within mapped region at address 0x5614
==4082==    at 0x3E6550: (within /usr/lib/libgobject-2.0.so.0.400.8)
==4082==    by 0x3E7CBB: g_signal_emit_valist (in /usr/lib/libgobject-2.0.so.0.400.8)
==4082==    by 0x3E7F59: g_signal_emit (in /usr/lib/libgobject-2.0.so.0.400.8)
==4082==    by 0x804E1DE: ??? (device.c:742)
==4082==
==4082== ERROR SUMMARY: 92 errors from 24 contexts (suppressed: 21 from 1)
==4082== malloc/free: in use at exit: 3688412 bytes in 13978 blocks.
==4082== malloc/free: 63738 allocs, 49760 frees, 15613063 bytes allocated.
==4082== For a detailed leak analysis, rerun with: --leak-check=yes
==4082== For counts of detected errors, rerun with: -v
*** Bug 142671 has been marked as a duplicate of this bug. ***
I have this problem too, on the same hardware, after installing the fc3 updates. Just one more observation - even when I roll back HAL to the version shipped with the release of fc3 (0.4.0-10), I still get the segmentation fault. This occurs whether I use the original kernel, or the one currently available as a fc3 update (2.6.9-1.681_FC3smp). This makes me wonder whether the bug has been introduced via one of the other fc3 updates.
Here are the fc3 packages that were installed on my host just prior to the onset of HAL seg faults: [Fri Dec 10 21:28:16 2004] up2date installing packages: ['Omni-0.9.2-1.1', 'Omni-foomatic-0.9.2-1.1', 'gaim-1.1.0-0.FC3', 'glib2-2.4.8-1.fc3', 'glib2-devel-2.4.8-1.fc3', 'gtk2-2.4.14-1.fc3', 'gtk2-devel-2.4.14-1.fc3', 'libpng-1.2.8-1.fc3', 'libpng-devel-1.2.8-1.fc3', 'libpng10-1.0.18-1.fc3', 'libpng10-devel-1.0.18-1.fc3', 'nfs-utils-1.0.6-44', 'rhpl-0.148.1-2', 'rsh-0.17-24.1', 'selinux-policy-targeted-1.17.30-2.39', 'shadow-utils-4.0.3-56', 'udev-039-10.FC3.5', 'wireless-tools-27-0.pre25.3', 'xorg-x11-6.8.1-12.FC3.21', 'xorg-x11-Mesa-libGL-6.8.1-12.FC3.21', 'xorg-x11-Mesa-libGLU-6.8.1-12.FC3.21', 'xorg-x11-deprecated-libs-6.8.1-12.FC3.21', 'xorg-x11-deprecated-libs-devel-6.8.1-12.FC3.21', 'xorg-x11-devel-6.8.1-12.FC3.21', 'xorg-x11-font-utils-6.8.1-12.FC3.21', 'xorg-x11-libs-6.8.1-12.FC3.21', 'xorg-x11-tools-6.8.1-12.FC3.21', 'xorg-x11-twm-6.8.1-12.FC3.21', 'xorg-x11-xauth-6.8.1-12.FC3.21', 'xorg-x11-xfs-6.8.1-12.FC3.21'
The segfault occurred in /usr/lib/libgobject-2.0.so.0.400.8, which is part of glib2 (glib2-2.4.8-1.fc3). https://www.redhat.com/archives/fedora-announce-list/2004-December/msg00055.html
I just rolled back to the glib2 version that was released with fc3 (2.4.7-1), and HAL no longer crashes on startup. (I downgraded with 'rpm -U --oldpackage glib2-2.4.7-1*rpm'; this leaves dangling symlinks in /usr/lib that have to be fixed manually.)
So what do we do from here? Where can we get updates on the status of this problem?
Yes, I'm unclear on whether it's a glib2 bug or whether the crash occurs in glib2 because HAL is passing it junk.
Matthias, do you know of any known regressions between glib2-2.4.8-1 and glib2-2.4.7-1?
Created attachment 108756: revert patch for gobject/gsignal.c

This small patch reverts gsignal.c to its 2.4.7 version, which makes HAL work again. Our bug may be hiding somewhere in there, but where?
*** Bug 143176 has been marked as a duplicate of this bug. ***
Created attachment 108771: more precise backtrace of hald

If I read it correctly, it segfaults here (glib-2.4.8/gobject/gsignal.c:564):

static inline void
handler_ref (Handler *handler)
{
  g_return_if_fail (handler->ref_count > 0);
  ^^^^^^^
David, I wasn't aware of any problems with the gsignal optimization in 2.4.8 until now. Looking at the patch, nothing obvious jumps out. Can you reproduce the segfault? It might be worth running the thing under valgrind to see whether the handler list becomes corrupted at some point. I'll join your efforts to debug this on Monday, as I won't be there tomorrow.
One further question: is hald using threads, so that reentrancy issues could be involved?
Hi Matthias,

No, hald is not using threads; there is some reentrancy involved, though, due to the rather asynchronous nature of how hald works. There is also a valgrind trace in comment 15.

By the way, the bug only seems to occur with the sk98lin network driver. I suspect the driver writes too much data into a struct allocated on the stack during an ioctl(), thereby corrupting memory. I will dig into the driver source to see what is happening. I'll also try allocating the struct for the ioctl on the heap to see whether that makes the crash go away.
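David's hypothesis is that sk98lin writes more than sizeof(struct ifreq) into the caller's buffer. A minimal sketch of one way to both sidestep and detect that: allocate the request on the heap rather than the stack, follow it with a guard area filled with a known byte, and check the guard after the SIOCGMIIPHY/SIOCGMIIREG calls. This is not the change that went into hal; mii_read_reg(), struct mii_data, the 256-byte guard and the 0xA5 fill value are all assumptions made for illustration. The guard only catches writes that run off the end of the struct, not writes through a stray pointer elsewhere.

```c
#define _GNU_SOURCE              /* make struct ifreq visible under strict -std flags */
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <net/if.h>
#include <linux/sockios.h>       /* SIOCGMIIPHY, SIOCGMIIREG */

/* Same layout as struct mii_ioctl_data in <linux/mii.h>. */
struct mii_data {
    uint16_t phy_id;
    uint16_t reg_num;
    uint16_t val_in;
    uint16_t val_out;
};

#define GUARD_BYTES 256          /* assumption: enough slack for an over-long write */
#define GUARD_FILL  0xA5

/* The request lives on the heap, immediately followed by the guard area. */
struct mii_request {
    struct ifreq  ifr;
    unsigned char guard[GUARD_BYTES];
};

/* Hypothetical helper (not hald's code): read MII register `reg` of
 * `ifname`. Returns 0 on success, -1 on failure. *overflow is set to 1
 * if anything was written into the guard area behind struct ifreq. */
int mii_read_reg(const char *ifname, uint16_t reg, uint16_t *value, int *overflow)
{
    struct mii_request *req;
    struct mii_data *mii;
    size_t i;
    int fd;
    int rc = -1;

    assert(ifname != NULL && value != NULL && overflow != NULL);
    *overflow = 0;
    if (strlen(ifname) >= IFNAMSIZ)
        return -1;

    fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    req = calloc(1, sizeof *req);
    if (req == NULL) {
        close(fd);
        return -1;
    }
    memset(req->guard, GUARD_FILL, sizeof req->guard);
    memcpy(req->ifr.ifr_name, ifname, strlen(ifname) + 1);

    /* The MII data overlays the union inside struct ifreq, as mii-tool does. */
    mii = (struct mii_data *) &req->ifr.ifr_ifru;

    if (ioctl(fd, SIOCGMIIPHY, &req->ifr) == 0) {    /* fills in mii->phy_id */
        mii->reg_num = reg;
        if (ioctl(fd, SIOCGMIIREG, &req->ifr) == 0) {
            *value = mii->val_out;
            rc = 0;
        }
    }

    for (i = 0; i < sizeof req->guard; i++) {
        if (req->guard[i] != GUARD_FILL) {
            *overflow = 1;
            break;
        }
    }

    free(req);
    close(fd);
    return rc;
}
```

On an affected machine one would call it for eth1 and check *overflow even when the return value is -1, since the "Bad address" failures in the logs above show the ioctl failing while something may still be written.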
OK, my next idea would be to write a function that checks the integrity of the handler list and to call it from suitable places, to catch when and how the list gets corrupted.
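The integrity check Matthias describes could look roughly like this sketch. The Handler type here is hypothetical and only loosely modelled on the struct that handler_ref() operates on in gobject/gsignal.c (glib's real definition is private and has more fields); handler_list_check() and CHECK_HANDLERS() are illustrative names, not glib API. The walk is bounded so that a list corrupted into a cycle cannot hang the check.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node type, loosely modelled on glib's private Handler. */
typedef struct Handler Handler;
struct Handler {
    Handler      *next;
    Handler      *prev;
    unsigned int  ref_count;
};

/* Walk a doubly linked handler list; return 1 if it looks sane,
 * 0 at the first inconsistency. `max_len` bounds the walk. */
int handler_list_check(const Handler *head, size_t max_len)
{
    const Handler *prev = NULL;
    const Handler *h;
    size_t n = 0;

    for (h = head; h != NULL; prev = h, h = h->next) {
        if (++n > max_len)
            return 0;            /* cycle or implausibly long list */
        if (h->ref_count == 0)
            return 0;            /* the condition handler_ref() asserts on */
        if (h->prev != prev)
            return 0;            /* back link disagrees with forward link */
    }
    return 1;
}

/* Debug-build hook: abort where the list first looks broken. */
#define CHECK_HANDLERS(head) assert(handler_list_check((head), 10000))
```

Calling CHECK_HANDLERS() before and after every insertion and removal would make the failure point at the operation that damaged the list, rather than at a later g_signal_emit().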
David,

Regarding your idea about a struct overflow on an ioctl(): my host has three Ethernet controllers, shown below (output from lspci). The 3Com one is on the motherboard, and it is *disabled* under Linux (i.e. not listed by ifconfig), so no network traffic goes through it. I'm hoping this observation will save you some time, since it means that any such corruption happens even without any Ethernet traffic.

02:05.0 Ethernet controller: 3Com Corporation 3c940 10/100/1000Base-T [Marvell] (rev 12)
02:09.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 10)
02:0a.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 10)
I've tried the approach I mentioned in comment 29. Please try the RPMs from http://people.redhat.com/davidz/cvs20050103/ If you're not on x86, you can rebuild from the SRPM.

Thanks,
David
Yes, it works for me. Many thanks,
Marek
Works fine here too (glib2-2.6). Thanks, Igor
It works for me too. Thanks,
Serg
Yes, this makes hald stop crashing for me. I still can't seem to talk to either of my CF card readers, though, but I imagine that's a different bug? When I plug either of them in, I get the following in syslog, and nothing shows up in /media/. I have seen both of these readers work (erratically) on RH9: generally they'd work once, and if I tried to use them again the next day, my only option was to reboot first.

Dazzle USB 2.0 reader (unsure of model number):
Jan 3 19:35:39 grendel kernel: usb 5-1: new full speed USB device using address 52
Jan 3 19:35:39 grendel kernel: usb 5-1: device not accepting address 52, error -71
Jan 3 19:35:39 grendel kernel: usb 5-1: new full speed USB device using address 53
Jan 3 19:35:40 grendel kernel: usb 5-1: device not accepting address 53, error -71

SanDisk ImageMate SDDR-31 USB 1.0 reader:
Jan 3 19:37:57 grendel kernel: usb 5-1: new full speed USB device using address 56
Jan 3 19:37:58 grendel kernel: usb 5-1: device not accepting address 56, error -71
Jan 3 19:37:58 grendel kernel: usb 5-1: new full speed USB device using address 57
Jan 3 19:37:58 grendel kernel: usb 5-1: device not accepting address 57, error -71

The SanDisk works fine in an (elderly, USB-1) Macintosh; the Dazzle doesn't.
Fixed for me; hald now works with glib2-2.4.8-1. Thanks.
The HAL daemon now works for me, but my CF card reader is not recognized.
This fix is in hal-0.4.5 available from Rawhide and it will also appear as a FC3 update. Closing.