Bug 504402

Summary: lirc_serial is crashing kernel
Product: [Fedora] Fedora
Reporter: Sebastian Vahl <fedora>
Component: kernel
Assignee: Jarod Wilson <jarod>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high
Priority: low
Version: 10
CC: bugzilla.redhat, itamar, kernel-maint, phil4v7, quintela
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: 2.6.29.6-213.fc11
Doc Type: Bug Fix
Last Closed: 2009-07-06 23:07:10 UTC
Attachments:
oops report from kerneloops.org (flags: none)
BUG: unable to handle kernel NULL pointer dereference at (null) (flags: none)

Description Sebastian Vahl 2009-06-06 10:47:51 UTC
Description of problem:
When using a remote with lirc_serial, the kernel crashes. This happens when irw is run twice. The first time it works and the remote buttons are displayed. But after restarting irw the kernel crashes (on a tty you see a call trace) and the machine freezes. There is no way to gather information about the crash once the machine has frozen (dmesg and /var/log/messages don't get new entries when logged in via ssh).


Version-Release number of selected component (if applicable):
kernel-2.6.29.4-75.fc10.i686
lirc-0.8.5-2.fc10.i386
lirc-libs-0.8.5-2.fc10.i386

How reproducible:
always

Steps to Reproduce:
1. modprobe lirc_serial
2. irw
3. press a button on the remote, it works
4. stop irw and restart it again
5. press a button on the remote, kernel crashes
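
The steps above boil down to opening the lirc device node, closing it, and opening it again. The following is a minimal sketch of that open/close cycle in C, assuming the crash is triggered by the second open() of the device (the call trace later in this report shows lirc_dev_fop_open on the stack). The function name open_close_cycles is hypothetical, and the device path (for example /dev/lirc0) is an assumption that depends on the local setup. On an affected kernel, running this against the real device may freeze the machine.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Open and immediately close `path` `cycles` times.
 * Returns the number of opens that succeeded.
 * Hypothetical helper for illustration; not part of lirc.
 */
static int open_close_cycles(const char *path, int cycles)
{
    int ok = 0;

    for (int i = 0; i < cycles; i++) {
        /* O_NONBLOCK so the open itself never waits on the device. */
        int fd = open(path, O_RDONLY | O_NONBLOCK);
        if (fd < 0) {
            perror(path);
            continue;
        }
        ok++;
        close(fd);
    }
    return ok;
}
```

Calling open_close_cycles("/dev/lirc0", 2) from a small main() would exercise the same path as stopping and restarting irw, without needing lircd or a remote.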
  
Actual results:
crash and sometimes freeze


Expected results: 
working remote

Additional info: I'm loading lirc_serial this way via /etc/modprobe.d/lirc.conf:
alias char-major-61 lirc_serial
options lirc_serial irq=4 io=0x3f8
install lirc_serial /bin/setserial /dev/ttyS0 uart none;\
/sbin/modprobe --ignore-install lirc_serial


This kerneloops may be related to this issue. It's the same crash, but the kernel is tainted with the nvidia driver. For this bug report I removed the nvidia driver entirely, but now the machine freezes hard and I couldn't get more information (with the nvidia driver only the module crashed and the machine was still responsive): http://www.kerneloops.org/submitresult.php?number=415193

Comment 1 Chuck Ebbert 2009-06-07 08:36:51 UTC
Created attachment 346773 [details]
oops report from kerneloops.org

Comment 2 Phil 2009-06-16 02:37:27 UTC
This bug is still present in my Fedora 11 installation as well. I'm using lirc to control my set-top box with an IR Blaster in a mythtv installation. The channel change script is successful exactly once per reboot. Any attempt to change the channel a second time with lirc results in a hard lock-up that requires a power cycle.

No error messages ever make it to any of the logs, so I'm unable to provide much detail at this point. I'm including my /etc/modprobe.d/lirc.conf file, but it's basically identical to the one above.

alias char-major-61-0 lirc_i2c
alias char-major-61-1 lirc_serial
options lirc_serial irq=4 io=0x3f8 softcarrier=1
####IR setup####
install lirc_i2c /sbin/modprobe ivtv; /sbin/modprobe --ignore-install lirc_i2c
install lirc_serial setserial /dev/ttyS0 uart none; /sbin/modprobe --ignore-install lirc_serial

All I can hope for is to get a digital photo of the screen dump the next time I see it. I'm not sure why, but error messages are dumped to the screen less consistently than the machine locks up.

Comment 3 Phil 2009-06-16 03:07:45 UTC
Created attachment 348043 [details]
BUG: unable to handle kernel NULL pointer dereference at (null)

After a few attempts, my system survived long enough to send some error messages to the logs. I'm attaching them in case they're useful even though they look very similar to Sebastian's attachment.

Comment 4 Chuck Ebbert 2009-06-16 13:49:42 UTC
drivers/input/lirc/lirc_dev.c:456:
        lirc_buffer_clear(ir->buf);

drivers/input/lirc/lirc_dev.h:
static void lirc_buffer_clear(struct lirc_buffer *buf) 
{ 
        if (buf->fifo) 
                kfifo_reset(buf->fifo); 

include/linux/kfifo.h:62:
static inline void kfifo_reset(struct kfifo *fifo)
{
        unsigned long flags;

        spin_lock_irqsave(fifo->lock, flags);

looks like ir->buf->fifo->lock is NULL
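
To make the pointer chain above concrete, here is a user-space sketch in C that follows the same path (buffer, then fifo, then lock) and refuses to proceed when the lock pointer is NULL. The struct and function names (mock_kfifo, mock_lirc_buffer, mock_buffer_clear) are hypothetical stand-ins, not the kernel definitions, and an int is used in place of a spinlock. This is only an illustration of the failure mode, not the fix that was applied to the kernel, which is not shown in this report.

```c
#include <stddef.h>

/* User-space stand-ins for the kernel structures (hypothetical). */
struct mock_kfifo {
    int *lock;              /* stands in for spinlock_t *lock */
    unsigned int in;
    unsigned int out;
};

struct mock_lirc_buffer {
    struct mock_kfifo *fifo;
};

/*
 * Follows the same chain as lirc_buffer_clear() -> kfifo_reset(),
 * but checks every pointer before dereferencing it.
 * Returns 0 on success, -1 if the buffer is not fully initialised.
 * (The quoted lirc_buffer_clear() silently skips a NULL fifo; this
 * sketch reports it as an error instead.)
 */
static int mock_buffer_clear(struct mock_lirc_buffer *buf)
{
    if (buf == NULL || buf->fifo == NULL)
        return -1;
    if (buf->fifo->lock == NULL)
        return -1;          /* the case the oops appears to hit */

    *buf->fifo->lock = 1;   /* "take" the lock */
    buf->fifo->in = 0;      /* reset the fifo indices */
    buf->fifo->out = 0;
    *buf->fifo->lock = 0;   /* "release" the lock */
    return 0;
}
```

In the quoted kernel code there is no such check on fifo->lock, so a fifo whose lock was never set (or was cleared on a previous close) sends a NULL pointer straight into spin_lock_irqsave().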

Comment 5 Martin Andrews 2009-06-21 14:46:37 UTC
Same problem with my (non-tainted) F11 - I'm loading lirc_serial and lirc_i2c in the same way.

Last section of dmesg :

e100: eth0 NIC Link is Up 100 Mbps Full Duplex
ADDRCONF(NETDEV_UP): eth0: link is not ready
ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
lirc_dev: IR Remote Control driver registered, major 248 
lirc_i2c: chip 0x10020 found @ 0x18 (Hauppauge IR)
lirc_dev: lirc_register_driver: sample_rate: 10
lirc_serial: auto-detected active high receiver
lirc_dev: lirc_register_driver: sample_rate: 0
eth0: no IPv6 routers present
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<c0421b04>] __ticket_spin_lock+0x8/0x19
*pdpt = 0000000024c0f001 *pde = 0000000000000000 
Oops: 0002 [#1] SMP 
last sysfs file: /sys/devices/virtual/lirc/lirc1/dev
Modules linked in: lirc_serial lirc_i2c lirc_dev sco bridge stp llc bnep l2cap bluetooth sunrpc ip6t_REJECT nf_conntrack_ipv6 ip6table_filter ip6_tables ipv6 jfs dm_multipath uinput ivtvfb tuner_simple tuner_types tda9887 tda8290 ppdev tuner msp3400 saa7127 saa7115 dcdbas ivtv cx2341x iTCO_wdt snd_intel8x0 serio_raw v4l2_common iTCO_vendor_support pcspkr snd_ac97_codec ac97_bus i2c_i801 e100 videodev mii v4l1_compat tveeprom snd_usb_audio snd_pcm snd_timer snd_page_alloc snd_usb_lib snd_rawmidi snd_seq_device snd_hwdep snd parport_pc soundcore parport joydev ata_generic pata_acpi nouveau drm i2c_algo_bit i2c_core [last unloaded: p4_clockmod]

Pid: 2088, comm: lircd Not tainted (2.6.29.4-167.fc11.i686.PAE #1) Dimension 8300               
EIP: 0060:[<c0421b04>] EFLAGS: 00010046 CPU: 0
EIP is at __ticket_spin_lock+0x8/0x19
EAX: 00000000 EBX: 00000286 ECX: c07f8db3 EDX: 00000100
ESI: 00000000 EDI: 00000000 EBP: dfc0fe5c ESP: dfc0fe5c
 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process lircd (pid: 2088, ti=dfc0e000 task=e4edb280 task.ti=dfc0e000)
Stack:
 dfc0fe64 c0421bc7 dfc0fe74 c0716d5d e4d77b00 e4dcf400 dfc0fe84 f2d7d6c2
 00000000 e4d77b88 dfc0fea4 c04ab191 dfc09000 e4f1eae0 e4f1eae0 dfc09000
 e4f1eae0 edc0d480 dfc0fec0 c04a73e3 e4481000 00000000 e4481000 dfc09000
Call Trace:
 [<c0421bc7>] ? default_spin_lock_flags+0x8/0xd
 [<c0716d5d>] ? _spin_lock_irqsave+0x30/0x37
 [<f2d7d6c2>] ? lirc_dev_fop_open+0xb4/0x189 [lirc_dev]
 [<c04ab191>] ? chrdev_open+0x11e/0x135
 [<c04a73e3>] ? __dentry_open+0x116/0x1f9
 [<c04a756e>] ? nameidata_to_filp+0x32/0x47
 [<c04ab073>] ? chrdev_open+0x0/0x135
 [<c04b1546>] ? do_filp_open+0x34c/0x5e9
 [<c04ab802>] ? cp_new_stat64+0xe3/0xf5
 [<c05687b1>] ? strncpy_from_user+0x38/0x54
 [<c04b94c3>] ? alloc_fd+0xd0/0xdc
 [<c04a71ea>] ? do_sys_open+0x47/0xbc
 [<c04a72ab>] ? sys_open+0x23/0x2b
 [<c040955e>] ? syscall_call+0x7/0xb
Code: 4f fd ff ff 5b eb 13 56 0f b7 d2 ff 75 08 89 d9 0f b6 c0 e8 6e fd ff ff 5a 59 8d 65 f8 5b 5e 5d c3 90 90 55 ba 00 01 00 00 89 e5 <3e> 66 0f c1 10 38 f2 74 06 f3 90 8a 10 eb f6 5d c3 55 89 c2 89 
EIP: [<c0421b04>] __ticket_spin_lock+0x8/0x19 SS:ESP 0068:dfc0fe5c
---[ end trace c012546f4855adea ]---
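
The "Oops: 0002" line in the trace above is the x86 page-fault error code, and its low bits say what kind of access faulted. The sketch below decodes the three lowest bits in C. The struct pf_info and the function decode_pf_error are hypothetical names used for illustration, and the bit masks are defined locally rather than taken from kernel headers.

```c
/* Low bits of the x86 page-fault error code (defined locally for this sketch). */
#define PF_PROT  0x1UL  /* 0 = page not present, 1 = protection violation */
#define PF_WRITE 0x2UL  /* 0 = read access,      1 = write access */
#define PF_USER  0x4UL  /* 0 = kernel mode,      1 = user mode */

/* Decoded view of the error code (hypothetical helper type). */
struct pf_info {
    int protection;     /* 1 if the page was present but access was denied */
    int write;          /* 1 if the faulting access was a write */
    int user;           /* 1 if the fault happened in user mode */
};

/* Split the value printed after "Oops:" into its three low flag bits. */
static struct pf_info decode_pf_error(unsigned long code)
{
    struct pf_info info;

    info.protection = (code & PF_PROT) != 0;
    info.write = (code & PF_WRITE) != 0;
    info.user = (code & PF_USER) != 0;
    return info;
}
```

For the value 0x0002 this yields a write, in kernel mode, to a page that is not present. Together with EAX being 00000000 in the register dump, that is consistent with __ticket_spin_lock being handed a NULL lock pointer.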

Comment 6 Jarod Wilson 2009-06-30 05:06:42 UTC
Okay, finally got some time to poke at this... I was able to reproduce the problem, and I believe I've got it fixed. I can now run irw and press buttons w/o triggering an oops, kill irw, start it back up again, keep getting button presses, etc. I've pushed a build into koji which I'd appreciate folks testing to verify this is indeed fixed:

http://koji.fedoraproject.org/koji/taskinfo?taskID=1443320

Comment 7 Jarod Wilson 2009-06-30 05:07:57 UTC
Whoops, forgot this was originally posted against F10, not F11... Here's the matching F10 build too:

http://koji.fedoraproject.org/koji/taskinfo?taskID=1443322

Comment 8 Phil 2009-07-01 02:56:04 UTC
I attempted to test the -206 build this evening and was unsuccessful. I didn't make it far enough into the boot to be able to get logs or a screenshot, so I wrote down what little I could. I'm not sure if what I wrote down is sufficient or if you need more info, or if my new bug is even related to this bug or one of the other changes between -191 and -206. I suspect it's something different because I don't think I even made it to lirc_serial in the short boot-time. At any rate, the kernel panicked three times consecutively and the EIP line showed:

__validate_creds+0x1d/0x25

The call trace showed:

prepare_creds+0x13c/0x14e
sys_faccessat+0x33/0x16e
fput+0x18/0x1a
sys_access+0x15/0x17
syscall_call+0x7/0xb

I don't know if that information is useful, but if you can point me to what portion of the panic screen is most useful, I'd be happy to provide it.

Comment 9 Phil 2009-07-01 23:40:55 UTC
It looks like the -209 build fixed my kernel panics and the bug fixes you added to -206 seem to be working beautifully on the lirc_serial module. I have a functional mythtv setup back, thanks. :)

Comment 10 Jarod Wilson 2009-07-02 00:27:05 UTC
Excellent to hear, Phil, thanks for the feedback!

Comment 11 Sebastian Vahl 2009-07-06 23:07:10 UTC
I upgraded to F11 in the meantime, and with -211 everything seems to be working fine for now. So I consider this bug fixed.

Comment 12 Sebastian Vahl 2009-07-06 23:07:52 UTC
Well, I just forgot to say thanks for fixing this. :)

Comment 13 Fedora Update System 2009-07-08 12:14:00 UTC
kernel-2.6.29.6-213.fc11 has been submitted as an update for Fedora 11.
http://admin.fedoraproject.org/updates/kernel-2.6.29.6-213.fc11

Comment 14 Fedora Update System 2009-07-22 21:57:38 UTC
kernel-2.6.29.6-213.fc11 has been pushed to the Fedora 11 stable repository.  If problems still persist, please make note of it in this bug report.