Bug 459523
| Field | Value |
|---|---|
| Summary | lirc_imon hangs kernel 2.6.27-0.244.rc2.git1.fc10 and later |
| Product | [Fedora] Fedora |
| Component | kernel |
| Version | rawhide |
| Hardware | x86_64 |
| OS | Linux |
| Status | CLOSED RAWHIDE |
| Severity | high |
| Priority | medium |
| Reporter | Tom Horsley <horsley1953> |
| Assignee | Jarod Wilson <jarod> |
| QA Contact | Fedora Extras Quality Assurance <extras-qa> |
| CC | kernel-maint |
| Doc Type | Bug Fix |
| Last Closed | 2008-09-09 01:15:44 UTC |
Description
Tom Horsley
2008-08-19 17:35:35 UTC
Can you try the rc4 kernel at http://koji.fedoraproject.org/koji/buildinfo?buildID=60268

Tried installing kernel-2.6.27-0.280.rc4.git4.fc10.x86_64.rpm and I see the same hang. The system is not totally frozen; it merely appears to be waiting on something that will never happen. It does reboot if I type Ctrl-Alt-Del, and it even logs a few messages after that, as though it is shutting things down.

Paying closer attention, I notice what appears to be a kernel crash walkback (or some kind of walkback, anyway) zip past the screen early in the boot process. Possibly some part of the new init script system crashes, and that is what I'm waiting on later? (I don't have the energy to get a serial console hooked up to capture it :-).

Almost the first thing the kernel prints is:

PCI 0000:05:00.0: BAR3, can't allocate resource

That may have something to do with the weird PCIe<->AGP bridge on this motherboard, which might be relevant to the problem (random guess).

The last thing I see before the hang is messages about the external USB disk, with the very last message being:

sd 6:0:0:0: Attached scsi generic sg3 type 0

but I don't think that has much to do with it, since if I unplug the disk and reboot, the last message becomes the one from before the sd 6 messages, this one about the intel8x0 sound chip:

intel8x0: clocking to 46866

So whatever is hanging happens after that. I'll attach the tail end of /var/log/messages and the /var/log/dmesg file from the last attempt to boot (which I ended with Ctrl-Alt-Del after letting it sit for a while doing nothing).

Created attachment 315057
/var/log/dmesg from the kernel that wouldn't boot
Created attachment 315058
tail of /var/log/messages from boot failure
Is there any way to add a kernel parameter or something to "single step" the boot process? I have a feeling that if I could go through the init one step at a time, I'd be able to see exactly what crashes before the next init items scroll it off the screen.

From the look of those logs, the kernel is done booting. As for debugging the boot process after that... umm, Bill?

I've watched the crash several times now, and it does get through the nash init script, says "switching to new root and running init", then says "starting udev", and right after that the crash happens. I've been trying to find where "starting udev" gets written, and I've found there isn't an /etc/rc.d/init.d/udev script, but I haven't found what is doing it instead of such a script :-).

That's from rc.sysinit; it implies that it's oopsing when loading a driver for your box. Booting with 'udevinfo' or 'udevdebug' may help.

Found my null modem cable, and used kermit to get a transcript of the boot. It looks like the lirc_imon driver is what is killing it (I'll attach the boot transcript). My HTPC case happens to include an IR remote that uses that driver (but it works fine in Fedora 9, no kernel oops). When I rename the lirc_imon driver under /lib/modules, the boot proceeds normally.

I also discovered that it isn't really hung forever. I was spending some time examining the boot log I had accumulated when I discovered the system was up and finishing the boot process, so it merely hangs for a very long time. Once the system was up, the kerneloops applet offered to send info about the crash, and I said yes, so if that worked, there may be a report somewhere out there to go with this bugzilla.

Created attachment 315159
transcript of boot from serial console
ah, that's helpful. thanks. Jarod, can you look into this?

Ew. I think I saw a similar-ish report of a system with an imon failing to boot 2.6.26+ or 2.6.27 on the lirc list in the past few days, but there were no details. That driver *did* get a substantial update recently; I'll take a look at it. Wish I had an imon here to play with, it would make this easier...

I'd be happy to send you mine, but it is built into the case, so I don't think I can get it out :-). If you want me to try swapping in a module with a gazillion printks added or something like that, I can give it a whirl.

My suspicion is that the changes added for the LCD-based imon vs. the original VFD-based imon broke something. It *could* be some change in 2.6.27 vs. earlier kernels that is causing issues for some reason, though, so the first thing to try would be the latest 2.6.26.x F9 kernel to see if it fails the same way. This kernel...

http://kojipkgs.fedoraproject.org/packages/kernel/2.6.26.3/17.fc9/

...has lirc_imon code identical to rawhide's. If that still fails, we can start throwing in printks and/or back out the LCD imon changes and see what we can see...

Two more things: 1) if you toss 'options lirc_imon debug=1' into your modprobe config, you should get some extra debug spew. 2) is your device possibly LCD-based?

Nope, no LCD on my remote. It is fairly old and came with my Zalman TNN 500AF case (http://home.att.net/~Tom.Horsley/zooty/zooty.html). I'll try the new F9 kernel when I get home today.

Bad news, I guess: the 2.6.26.3 kernel on F9 x86_64 seems to work fine with the lirc_imon driver. I even ran my program to recognize button pushes, and it seems to give me good data from the lirc device.

Okay, we've done a ton of work on the lirc drivers in the past few days, including a fairly large update to lirc_imon... Nothing has been done specifically to address this issue, but there's a chance some of those changes have fixed things... Can you try with a 2.6.27-rc5-git6 or later build?

http://kojipkgs.fedoraproject.org/packages/kernel/2.6.27/0.305.rc5.git6.fc10/

Also hoping perhaps some USB interrupt handling changes that went in might have helped...

I'm afraid that didn't help. I see what appears to be the same kernel backtrace and the same hang slightly later. This was using the kernel/2.6.27/0.305.rc5.git6.fc10/ build pointed at above.

Darn. Hrm, actually, I was incorrect: the latest lirc_imon bits aren't in that build; they're in git7 and later, which hasn't been built yet... However, I'm inclined to think the problem will persist, and what I really need to do is take a long, hard look at the backtrace and the code.

Good news! I just installed the latest rawhide with 2.6.27-0.314.rc5.git9.fc10.x86_64, and the lirc_imon driver not only doesn't hang, it even apparently works (at least I seem to get IR data from the receiver).

Hey, awesome, we got lucky! :) As it happens, I'm *just* about to submit the lirc patch series to lkml for hopeful upstream inclusion, so this is great news. If you don't mind, I'd like to add you to the lirc_imon portion with a Tested-by: line.

>If you don't mind, I'd like
>to add you to the lirc_imon portion with a Tested-by: line.
Sounds OK to me.