Bug 459523

Summary: lirc_imon hangs kernel 2.6.27-0.244.rc2.git1.fc10 and later
Product: [Fedora] Fedora
Reporter: Tom Horsley <horsley1953>
Component: kernel
Assignee: Jarod Wilson <jarod>
Status: CLOSED RAWHIDE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high
Priority: medium
Version: rawhide
CC: kernel-maint
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Last Closed: 2008-09-09 01:15:44 UTC
Attachments:
  /var/log/dmesg from the kernel that wouldn't boot
  tail of /var/log/messages from boot failure
  transcript of boot from serial console

Description Tom Horsley 2008-08-19 17:35:35 UTC
Description of problem:

Just got updates for a fresh install of F10 Alpha, and the kernel
2.6.27-0.244.rc2.git1.fc10 installed by the update won't boot. The boot
process hangs right after printing info about sd 6:0:0:0 (which I think
is the external USB drive) and just sits there forever after that,
though it does respond to Ctrl-Alt-Del.

Version-Release number of selected component (if applicable):
2.6.27-0.244.rc2.git1.fc10

How reproducible:
Every time I try to boot.

Steps to Reproduce:
1. see above
  
Actual results:
see above

Expected results:
The system boots normally.

Additional info:

Hardware on this box described at:
http://www.smolts.org/client/show/pub_600e783b-1a34-46e1-b105-e86df2bfc41f

Comment 1 Dave Jones 2008-08-26 16:53:21 UTC
Can you try the rc4 kernel at 
http://koji.fedoraproject.org/koji/buildinfo?buildID=60268

Comment 2 Tom Horsley 2008-08-26 22:41:17 UTC
Tried installing kernel-2.6.27-0.280.rc4.git4.fc10.x86_64.rpm
and I see the same hang.

The system is not totally frozen; it merely appears to be waiting on something
that will never happen. It does reboot if I type Ctrl-Alt-Del, and even
logs a few messages after that as though it is shutting things down.

Paying closer attention, I notice what appears to be a kernel crash walkback
(or some kind of walkback, anyway) zip past on the screen early in the boot
process. Possibly some part of the new init script system crashes, and that
is what I'm waiting on later? (I don't have the energy to get a serial console
hooked up to capture it :-)

Almost the first thing the kernel prints is:

PCI 0000:05:00.0: BAR3, can't allocate resource.

That may have something to do with the weird PCIE<->AGP bridge on this
motherboard, which might be relevant to the problem (random guess).

The last thing I see before the hang is messages about the external
USB disk, with the very last message being:

sd 6:0:0:0: Attached scsi generic sg3 type0

but I don't think that has much to do with it, since if I unplug the
disk and reboot, the last message then becomes the one from before
the sd 6 messages, this one about the intel8x0 sound chip:

intel8x0: clocking to 46866

So it is something that would happen after that which is hanging.

I'll attach the tail end of /var/log/messages and the /var/log/dmesg
file from the last attempt to boot (which I ended with Ctrl-Alt-Del
after letting it sit for a while doing nothing).

Comment 3 Tom Horsley 2008-08-26 22:42:23 UTC
Created attachment 315057
/var/log/dmesg from the kernel that wouldn't boot

Comment 4 Tom Horsley 2008-08-26 22:43:06 UTC
Created attachment 315058
tail of /var/log/messages from boot failure

Comment 5 Tom Horsley 2008-08-26 22:45:34 UTC
Is there any way to add a kernel parameter or something to "single step"
the boot process? I have a feeling if I could go through the init one step
at a time, I'd be able to see exactly what crashes before the next
init items scroll it off the screen.

Comment 6 Dave Jones 2008-08-26 23:06:11 UTC
From the look of those logs, the kernel is done booting.

As for debugging the boot process after that.. umm, Bill?

Comment 7 Tom Horsley 2008-08-27 01:29:36 UTC
I've watched the crash several times now, and it does get through the
nash init script, says "switching to new root and running init", then
says "starting udev", and the crash happens right after that.

I've been trying to find where "starting udev" might get written, and
I've found there isn't an /etc/rc.d/init.d/udev script, but I haven't
found what is doing it instead of such a script :-).

Comment 8 Bill Nottingham 2008-08-27 15:21:05 UTC
That's from rc.sysinit; it implies that it's oopsing when loading a driver for your box.  Booting with 'udevinfo' or 'udevdebug' may help.

Comment 9 Tom Horsley 2008-08-27 23:18:30 UTC
Found my null modem cable, and used Kermit to get a transcript of the boot.

Looks like the lirc_imon driver is what is killing it (I'll attach the boot
transcript). My HTPC case happens to include an IR remote that uses that
driver (but it works fine in Fedora 9, with no kernel oops).

When I rename the lirc_imon driver under /lib/modules, the boot proceeds
normally.

I also discovered that it isn't really hung forever. While I was spending some
time examining the boot log I had accumulated, I noticed the system had come
up and was finishing the boot process, so it merely hangs for a very long
time.

Once the system was up, the kerneloops applet offered to send info about the
crash, and I said yes, so if that worked, there may be a report somewhere
out there to go with this bugzilla.

Comment 10 Tom Horsley 2008-08-27 23:19:39 UTC
Created attachment 315159
transcript of boot from serial console

Comment 11 Dave Jones 2008-08-28 02:41:49 UTC
ah, that's helpful. thanks.
Jarod, can you take a look into this?

Comment 12 Jarod Wilson 2008-08-28 04:34:38 UTC
Ew. I think I saw a similar-ish report on the lirc list in the past few days of a system with an imon failing to boot 2.6.26+ or 2.6.27, but there were no details. That driver *did* get a substantial update recently; I'll take a look at it. Wish I had an imon here to play with, it would make this easier...

Comment 13 Tom Horsley 2008-08-28 12:32:58 UTC
I'd be happy to send you mine, but it is built into the case, so I
don't think I can get it out :-). If you want me to try swapping in a
module with a gazillion printks added or something like that, I can
give it a whirl.

Comment 14 Jarod Wilson 2008-08-28 13:43:39 UTC
My suspicion is that the changes added to support the LCD-based imon, as opposed to the original VFD-based imon, broke something. It *could* be some change in 2.6.27 vs. earlier kernels that is causing issues, though, so the first thing to try would be the latest 2.6.26.x F9 kernel, to see if it fails the same way.

This kernel...

http://kojipkgs.fedoraproject.org/packages/kernel/2.6.26.3/17.fc9/

...has the same lirc_imon code as rawhide. If that still fails, we can start throwing in printks and/or back out the LCD imon changes and see what we can see...

Comment 15 Jarod Wilson 2008-08-28 14:59:04 UTC
Two more things:

1) if you add 'options lirc_imon debug=1' to your modprobe configuration, you should get some extra debug spew.

2) is your device possibly LCD-based?

Comment 16 Tom Horsley 2008-08-28 15:03:50 UTC
Nope, no LCD on my remote. It is fairly old and came with my Zalman
TNN 500AF case (http://home.att.net/~Tom.Horsley/zooty/zooty.html).

I'll try the new F9 kernel when I get home today.

Comment 17 Tom Horsley 2008-08-28 21:37:50 UTC
Bad news, I guess: the 2.6.26.3 kernel on F9 x86_64 seems to work
fine with the lirc_imon driver. I even ran my program to recognize
button pushes, and it seems to give me good data from the lirc device.

Comment 18 Jarod Wilson 2008-09-05 20:40:11 UTC
Okay, we've done a ton of work on the lirc drivers in the past few days, including a fairly large update to lirc_imon... Nothing has been done specifically to address this issue, but there's a chance some of the changes have fixed things... Can you try a 2.6.27-rc5-git6 or later build?

http://kojipkgs.fedoraproject.org/packages/kernel/2.6.27/0.305.rc5.git6.fc10/

I'm also hoping some USB interrupt-handling changes that went in might have helped...

Comment 19 Tom Horsley 2008-09-06 00:02:03 UTC
I'm afraid that didn't help. I see what appears to be the same
kernel backtrace and the same hang slightly later. This was using
the kernel/2.6.27/0.305.rc5.git6.fc10/ pointed at above.

Comment 20 Jarod Wilson 2008-09-06 04:18:37 UTC
Darn. Hrm, actually, I was wrong: the latest lirc_imon bits aren't in that build. They're in git7 and later, which hasn't been built yet... However, I'm inclined to think the problem will persist, and what I really need to do is take a long, hard look at the backtrace and the code.

Comment 21 Tom Horsley 2008-09-08 22:11:06 UTC
Good news! I just installed latest rawhide with 2.6.27-0.314.rc5.git9.fc10.x86_64
and the lirc_imon driver not only doesn't hang, it even apparently works
(at least I seem to get IR data from the receiver).

Comment 22 Jarod Wilson 2008-09-09 01:15:44 UTC
Hey, awesome, we got lucky! :)

As it happens, I'm *just* about to submit the lirc patch series to LKML, hopefully for upstream inclusion, so this is great news. If you don't mind, I'd like to add you to the lirc_imon portion with a Tested-by: line.

Comment 23 Tom Horsley 2008-09-09 01:52:27 UTC
>If you don't mind, I'd like
>to add you to the lirc_imon portion with a Tested-by: line.

Sounds OK to me.