Bug 243067
Summary: | Kernel panic using USB serial I/O | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 4 | Reporter: | Greg Bailey <gbailey> |
Component: | kernel | Assignee: | Pete Zaitcev <zaitcev> |
Status: | CLOSED ERRATA | QA Contact: | Martin Jenner <mjenner> |
Severity: | high | Docs Contact: | |
Priority: | high | ||
Version: | 4.5 | CC: | jbaron, jtluka, peterm, riek, syeghiay |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | i686 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2009-05-18 19:26:38 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 430698, 461297 | ||
Attachments: |
Description
Greg Bailey
2007-06-07 05:48:07 UTC
Created attachment 156429 [details]
/var/log/messages excerpt from "proxy7000" server
Created attachment 156430 [details]
/var/log/messages excerpt from "communicator1" server
I'm able to reproduce this on kernel-smp-2.6.9-55.EL. The panic occurs within a minute or two of stressing USB serial activity.

Jun 7 14:05:58 geb-test0 kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000070
Jun 7 14:05:58 geb-test0 kernel: printing eip:
Jun 7 14:05:58 geb-test0 kernel: c02004b1
Jun 7 14:05:58 geb-test0 kernel: *pde = 3572b001
Jun 7 14:05:58 geb-test0 kernel: Oops: 0000 [#1]
Jun 7 14:05:58 geb-test0 kernel: SMP
Jun 7 14:05:58 geb-test0 kernel: Modules linked in: parport_pc lp parport autofs4 i2c_dev i2c_core sunrpc ipt_LOG ipt_state ip_conntrack iptable_filter ip_tables dm_mirror dm_mod button ftdi_sio usbserial battery ac uhci_hcd ehci_hcd hw_random e1000 sky2 ext3 jbd ata_piix libata sd_mod scsi_mod
Jun 7 14:05:58 geb-test0 kernel: CPU: 0
Jun 7 14:05:58 geb-test0 kernel: EIP: 0060:[<c02004b1>] Not tainted VLI
Jun 7 14:05:58 geb-test0 kernel: EFLAGS: 00010246 (2.6.9-55.ELsmp)
Jun 7 14:05:58 geb-test0 kernel: EIP is at tty_get_baud_rate+0x3/0x3e
Jun 7 14:05:58 geb-test0 kernel: eax: 00000000 ebx: 00000000 ecx: 00000000 edx: f7f10000
Jun 7 14:05:58 geb-test0 kernel: esi: 00000000 edi: f7e86f00 ebp: f89303b4 esp: f6c0ce24
Jun 7 14:05:58 geb-test0 kernel: ds: 007b es: 007b ss: 0068
Jun 7 14:05:58 geb-test0 kernel: Process cfa631 (pid: 6397, threadinfo=f6c0c000 task=f700f9b0)
Jun 7 14:05:58 geb-test0 kernel: Stack: f7f29e00 f892e267 f7f29e00 f6a68200 f7e86f00 f73c2800 f892e1fc 00000008
Jun 7 14:05:58 geb-test0 kernel: f7f29e00 00001cb2 f73c2800 f892fa76 00000000 f7e86f00 00000000 f374b000
Jun 7 14:05:58 geb-test0 kernel: f7e86f00 c2a8f940 f7f29e00 f892eac6 00000040 00000000 00000000 f6c0ce93
Jun 7 14:05:58 geb-test0 kernel: Call Trace:
Jun 7 14:05:58 geb-test0 kernel: [<f892e267>] get_ftdi_divisor+0x19/0x25a [ftdi_sio]
Jun 7 14:05:58 geb-test0 kernel: [<f892e1fc>] change_speed+0x2d/0x7f [ftdi_sio]
Jun 7 14:05:58 geb-test0 kernel: [<f892fa76>] ftdi_set_termios+0x296/0x460 [ftdi_sio]
Jun 7 14:05:58 geb-test0 kernel: [<f892eac6>] ftdi_open+0x91/0x16b [ftdi_sio]
Jun 7 14:05:58 geb-test0 kernel: [<c016f730>] dput+0x34/0x1a7
Jun 7 14:05:58 geb-test0 kernel: [<c01678ea>] link_path_walk+0x94/0xbe
Jun 7 14:05:58 geb-test0 kernel: [<f88cf2e5>] serial_open+0xa5/0xf9 [usbserial]
Jun 7 14:05:58 geb-test0 kernel: [<c01ff270>] tty_open+0x189/0x2ad
Jun 7 14:05:58 geb-test0 kernel: [<c0162aa9>] chrdev_open+0x14c/0x187
Jun 7 14:05:58 geb-test0 kernel: [<c015a6c9>] __dentry_open+0xb7/0x18f
Jun 7 14:05:58 geb-test0 kernel: [<c015a5b0>] filp_open+0x5c/0x70
Jun 7 14:05:58 geb-test0 kernel: [<c01c3e92>] direct_strncpy_from_user+0x46/0x5d
Jun 7 14:05:58 geb-test0 kernel: [<c015a8f5>] sys_open+0x31/0x7d
Jun 7 14:05:58 geb-test0 kernel: [<c02d5ee3>] syscall_call+0x7/0xb
Jun 7 14:05:58 geb-test0 kernel: Code: 10 74 1c 89 d1 83 e1 0f 74 0b 8d 41 0f 3b 05 7c d2 33 c0 76 08 80 e6 ef 89 53 08 eb 02 89 c1 5b 8b 04 8d 00 d2 33 c0 c3 53 89 c3 <8b> 40 70 e8 bd ff ff ff 3d 00 96 00 00 75 2a 83 bb 48 09 00 00
Jun 7 14:05:58 geb-test0 kernel: <0>Fatal exception: panic in 5 seconds

Created attachment 156952 [details]
Test patch 1 - backport
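For readers following the trace above: the fault address 00000070 with eax = 00000000 at tty_get_baud_rate+0x3 is consistent with a field being read through a NULL tty pointer. The following is only a minimal user-space sketch of the kind of guard that avoids such a read. It is not taken from the attached patch, and every name in it (the fake_* structs, fake_get_baud_guarded, FAKE_DEFAULT_BAUD) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; NOT the real kernel structures. */
struct fake_termios { unsigned int c_cflag; };
struct fake_tty     { struct fake_termios *termios; };
struct fake_port    { struct fake_tty *tty; };

/* Assumed fallback value; the real driver may behave differently. */
#define FAKE_DEFAULT_BAUD 9600u

/* Toy lookup: reads through tty unconditionally, so a NULL tty would
 * fault here, much like the oops in the report. For simplicity the
 * toy treats c_cflag as if it held the baud rate directly. */
static unsigned int fake_tty_get_baud_rate(const struct fake_tty *tty)
{
    return tty->termios->c_cflag;
}

/* Guarded caller: check the pointers before handing them down. */
unsigned int fake_get_baud_guarded(const struct fake_port *port)
{
    const struct fake_tty *tty;

    assert(port != NULL);
    tty = port->tty;
    if (tty == NULL || tty->termios == NULL)
        return FAKE_DEFAULT_BAUD;   /* no tty attached: do not dereference */
    return fake_tty_get_baud_rate(tty);
}
```

With a port whose tty is NULL, fake_get_baud_guarded returns the fallback instead of faulting; with a tty attached it returns the stored value.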
The first OOPS seems like the usual problem with USB serial used as a console, without a tty corresponding to it. I cannot explain how this happens. Surely the control program cfa631 should open a normal tty... Regardless, it's easy to fix with a backport of a few NULL checks.

Putting "stress" on USB serial is ill advised. It's somewhat fragile. Worse, fixing it may be difficult. We'll see what I can do, but at least rapid and overlapping open/close must be 100% supported.

Greg, please test the kernel at this URL:
http://people.redhat.com/zaitcev/ftp/243067/

Created attachment 157014 [details]
cfa631.c program which opens/closes USB serial
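The attachment itself is not reproduced in this report. As a rough illustration of what a rapid open/close loop can look like, here is a minimal sketch. It is not the attached cfa631.c, and the function name stress_open_close and the choice of open flags are assumptions.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Opens `path` and closes it again `iterations` times in a row.
 * Returns the number of opens that succeeded. */
int stress_open_close(const char *path, int iterations)
{
    int ok = 0;

    assert(path != NULL);
    for (int i = 0; i < iterations; i++) {
        /* O_NOCTTY: do not make the device our controlling terminal.
         * O_NONBLOCK: do not wait for carrier on a serial line. */
        int fd = open(path, O_RDWR | O_NOCTTY | O_NONBLOCK);
        if (fd < 0)
            continue;   /* count only successful opens */
        ok++;
        close(fd);
    }
    return ok;
}
```

Pointing it at a USB serial node (for example /dev/ttyUSB0, an assumed path) exercises only rapid sequential open/close. Overlapping opens would need several processes or threads running the same loop at once.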
I ran my tests using the supplied 2.6.9-55.7.EL.bz243067.1smp kernel, and was unable to reproduce the crash(es) after running over 10 hours. Previously, I could crash the kernel after 5-10 minutes. This patch fixes the issue from my perspective.

Thanks for testing. Unfortunately we just missed the 4.6 submission deadline by 8 days. And in any case PM has to approve.

This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.

My candidate patch #1 was rejected by the engineering review. It's still full of bugs, just plugs bigger holes. I'm working on a better quality fix.

Updating PM score.

Created attachment 319713 [details]
Test patch 3 - smarter backport
This version dispenses with all the unnecessary tty!=NULL checking,
including those in ftdi. This is because in reality poisoning is not
used by upstream, and single stage disconnect with refcounts is employed.
The whole mechanism is covered with table_lock. In the patch, port_lock
is only an implementation detail: it avoids changing usb-serial's
structures, to permit module mix-and-match.
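The scheme described above (one table lock, reference counts, a single-stage disconnect) can be sketched in user space roughly as follows. This is an illustration under assumptions, not the code from the patch. All demo_* names are hypothetical, and a pthread mutex stands in for the kernel's locking.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define DEMO_MINORS 4

/* Hypothetical port object; refcount is protected by demo_table_lock. */
struct demo_port {
    int minor;
    int refcount;
};

static pthread_mutex_t demo_table_lock = PTHREAD_MUTEX_INITIALIZER;
static struct demo_port *demo_table[DEMO_MINORS];

/* Register a port; the table itself holds one reference. */
struct demo_port *demo_register(int minor)
{
    struct demo_port *p;

    if (minor < 0 || minor >= DEMO_MINORS)
        return NULL;
    p = calloc(1, sizeof(*p));
    if (p == NULL)
        return NULL;
    p->minor = minor;
    p->refcount = 1;

    pthread_mutex_lock(&demo_table_lock);
    if (demo_table[minor] != NULL) {        /* slot already taken */
        pthread_mutex_unlock(&demo_table_lock);
        free(p);
        return NULL;
    }
    demo_table[minor] = p;
    pthread_mutex_unlock(&demo_table_lock);
    return p;
}

/* Open path: look up and take a reference under the same lock, so a
 * concurrent disconnect cannot free the port between the two steps. */
struct demo_port *demo_get(int minor)
{
    struct demo_port *p = NULL;

    if (minor < 0 || minor >= DEMO_MINORS)
        return NULL;
    pthread_mutex_lock(&demo_table_lock);
    p = demo_table[minor];
    if (p != NULL)
        p->refcount++;
    pthread_mutex_unlock(&demo_table_lock);
    return p;
}

/* Drop a reference; returns 1 if this call freed the port, else 0. */
int demo_put(struct demo_port *p)
{
    int last;

    pthread_mutex_lock(&demo_table_lock);
    assert(p->refcount > 0);
    last = (--p->refcount == 0);
    pthread_mutex_unlock(&demo_table_lock);
    if (last)
        free(p);
    return last;
}

/* Single-stage disconnect: unlink from the table, then drop the
 * table's reference. Returns -1 if no port was registered, otherwise
 * the result of demo_put (0 means an opener still holds the port). */
int demo_disconnect(int minor)
{
    struct demo_port *p;

    if (minor < 0 || minor >= DEMO_MINORS)
        return -1;
    pthread_mutex_lock(&demo_table_lock);
    p = demo_table[minor];
    demo_table[minor] = NULL;
    pthread_mutex_unlock(&demo_table_lock);
    return p == NULL ? -1 : demo_put(p);
}
```

In this sketch, once demo_disconnect has run, later demo_get calls return NULL, while an opener that already holds a reference keeps a valid object until its own demo_put. That is why no per-call NULL checks on the object are needed.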
Greg, please test the kernel 2.6.9-78.13.EL.bz243067.2, same location:
http://people.redhat.com/zaitcev/ftp/243067/

Sorry for the delay; had to recreate my torture test environment and it's been a while... I reconfirmed the bug exists in the latest 4.7 kernel (2.6.9-78.0.5 SMP), and I can trigger a panic immediately after starting my test. After booting 2.6.9-78.13.EL.bz243067.2smp, my torture test runs fine--no panics occur. This patch fixes the issue from my perspective.

Thanks, Greg. Posted now.

Does this mean that the Conditional NAK is no longer valid for this bug? If yes, please remove it from the Developer Whiteboard.

This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.

Committed in 78.23.EL. RPMS are available at http://people.redhat.com/vgoyal/rhel4/

Patch is in -89.EL kernel.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.
http://rhn.redhat.com/errata/RHSA-2009-1024.html