Bug 135202
Description
Andrew Duggan
2004-10-10 04:15:01 UTC
Created attachment 104979 [details]
Output of hald --daemon=no --verbose=yes
Sorry about the 60-printer_remove.hal stuff - after I started seeing this
problem, I notice that the printe_removal.hal didn't start with a number so I
just guessed at one and renamed it; I also change the -x to a -r since the
--help didn't say anything about -x (looking at the output now I see that's not
right either, but I don't think its involved as the hald hanging was their
before I changed that. If you require, I will change it back and regenerate the
output.
Created attachment 104980 [details]
lspci -vvv just in case
Sorry for all the mispellings
Created attachment 104981 [details]
Output of lshal
One last thing, unplugging the WG511 does not unhang hald
Created attachment 105002 [details]
hald --daemon=no --verbose=yes
I redid the tests after reinstall hal*0.2.98.cvs20040929-3 + hal-cups-utils
here is the hald output. Plus I did this a little more scientifically. I run
this from vc3 then logged into a gnome desktop. I did 2 normal plugin, use,
unmount and unplug of my usb flash disk, then I plugged in the WG511 then the
usb flash disk again.
Created attachment 105003 [details]
lshal output 20041011
20041011 output of lshal at the end WG511 and flash disk inserted
Created attachment 105004 [details]
lspci -vvv 20041011
Created attachment 105006 [details]
dbus-monitor --system output 20041011
Here is the output of the dbus-monitor --system during the test
Sorry to deluge you with changes on this -- just thought it was better to have the test based on reference. Also FWIW I took the latest hal CVS as of 20041010, and I got the same net results, which is not surprising as there were not a lot of changes. But I put the fc3t2+ hal-*.cvs20040929-3 back before running the tests results I uploaded today. Hi, sorry for the lag and thanks for the detailed logs. According to the hald trace in comment 4, hald never receives the hotplug event with the sequence number 330 and that is what is causing the processing of hotplug events in hald to freeze. I'm wondering why that is - does (temporarily) removing the /etc/dev.d/net/hotplug.dev file make things work again? Thanks, David David, No problem -- I actually wasn't expecting such a quick response, so cool. However, I tried that tonight, and it did not make any difference (I even rebooted just to make sure) I did notice (and I should have seen it sooner) the following in my /var/log/messages ... udev: removing device node '/dev/0000:03:00.0' ... that just happens to match the end of the sysfs path for the card when plugged into the bottom slot, and is contemporary with other messages relating to the insertion of the card. I am attaching that section [messages.20041015] plus a dmesg [dmesg.20041015] output. Pardon the speculation, but since there is nothing in the log about creating /dev/0000:03:00.0 could that be involved? esp considering we having a MIA hotplug seqnum. After collecting that I ginned up a front end to /sbin/hotplug to log every invocation to debug-hotplug.log, then call /sbin/hotplug. Sorry this is still all new to me, there is probably an easier way to do that, but I would be much more comfortable looking at a VMS crash dump :-) I am attaching that [debug-hotplug.log] as well as the hald [hald-20041015] output. They are both missing the same (appearently critical) sequence number seqnum=351 so what ever that event is, at least to me, it does not look like its comming from the kernel. FWIW the messages and dmesg are from before I started down the hotplug script path. Is this maybe a hotplug{udev,kernel?} bug that hal is a downstream victim of? Also just so you have the complete picture, I had already built a hal with Martin Pitts patch to pci_bus_device.c, and that's what I am running right now. (I had to try) I would be happy to run anything in debug but I would need some guidance on where to set break or watch points. Thanks, Andrew Created attachment 105264 [details]
snip of var/log/messages showing the udev removing device node /dev/0000:03:00.0
Created attachment 105265 [details]
dmesg from the same time as the messages-20041015
Created attachment 105266 [details]
output of a front end to /sbin/hotplug
Created attachment 105267 [details]
hald --daemon=no --verbose=yes (while dbug-hotplug was being collected)
Hi David. I am sure this is either a kernel or Prism54 driver bug, so I guess you can close it. I spent the afternoon playing with the /lib/kobject.c from the kernel. After the sequence_number has been incremented there are two paths that allow an exit without actually calling call_usermodehelper() to invoke /sbin/hotplug. This is the missing hotplug event. I hacked a trival patch to kobject.c and it fixes this problem exactly. I am attaching the patch just as an fyi as this is really a kernel "bug". The other solution I am working on is a similar patch to kobject.c but to emit a "void" hotplug event if the call to kset->hotplug_ops->hotplug() fails. That is still building so I've not tested that one yet. Plus I need to verify my understanding of the env block that is being created so that any garbage created by the failed call to kset->hotplug_ops->hotplug() in the env block is cleaned up and the original argv [1] (name) gets put as an env var e.g. OLDNAME. Of course this does not address the immediate root cause of the behaviour I was seeing as I am guessing its something in the prism54 driver that is making the kset->hotplug_ops->hotplug() fail. However, if (as it seems) that HAL really prefers there be no "gaps" in the hotplug sequence numbers, then it is better that bugs in drivers (that don't even cause an Oops) should make hald virtually hang. Thanks, Andrew Created attachment 105341 [details]
hoplug sequence number patch in lib/kobject.c that greatly reduces gaps in hotplug seqnums
Dave, any comments on the patch? *** Bug 135677 has been marked as a duplicate of this bug. *** Created attachment 105412 [details]
Updated lib/kobject.patch based on feed back up stream
If you are still interested, here is an updated patch based on some feedback
from Greg KH. The big change is that sequence_number never needs to be
decremented. This one puts the seqnum in a different buffer, so it can put off
getting a sequence number until after all the conditional exits. A spot in the
envp array had to reserved for the sequence number as the call to
kset->hotplug_ops->hotplug() does not return a new value for i. It can't update
the scratch pointer either so there is no way to know where the end of the used
portion of buffer was.
Thx Andrew
Well, hald should really be able to cope with missing hotplug events so I did this. It's a sign of a driver bug for sure, but you'll see something along the lines of this in your syslog Oct 18 18:17:11 davidz hald[1145]: Timed out waiting for hotplug event 784 This fix is in hal-0.4.0-4 which is available in Rawhide tomorrow. I was also able to test this with a Cardbus card with using this driver so I'm closing as RAWHIDE. Andrew: Feel free to reopen this bug if it doesn't work for you. Thanks, David |