Bug 132994 - kernel oops when unplugging usb serial adapter using pl2303 and mct_u232
Summary: kernel oops when unplugging usb serial adapter using pl2303 and mct_u232
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel
Version: 3.0
Hardware: i686
OS: Linux
medium
high
Target Milestone: ---
Assignee: Pete Zaitcev
QA Contact:
URL: http://user.dtcc.edu/~ctribo/stack_tr...
Whiteboard:
Depends On:
Blocks: RHEL3U8CanFix
Reported: 2004-09-20 18:38 UTC by Chris Tribo
Modified: 2007-11-30 22:07 UTC
4 users

Fixed In Version: RHSA-2006-0437
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2006-07-20 13:16:44 UTC
Target Upstream Version:
Embargoed:


Attachments
Full dmesg (13.87 KB, text/plain)
2004-09-22 23:02 UTC, Chris Tribo
full lsmod output (1.68 KB, text/plain)
2004-09-30 14:52 UTC, Chris Tribo
Big patch for refcounting and proper close() (11.04 KB, patch)
2005-03-17 22:41 UTC, Pete Zaitcev
Candidate #2 - simple plug (628 bytes, patch)
2005-03-21 19:04 UTC, Pete Zaitcev
kernel.spec diff file to add the usbserial.3 patches into the current kernel source (2.11 KB, patch)
2005-06-07 11:49 UTC, Tru Huynh
little rework kernel.spec patch (2.46 KB, patch)
2005-06-08 16:31 UTC, Tru Huynh
linux-2.4.21-usb-little_rework+msleep+msecs_to_jiffies.patch (10.95 KB, patch)
2005-06-08 16:32 UTC, Tru Huynh
strace from agetty (105.07 KB, text/plain)
2005-06-08 17:29 UTC, Tru Huynh


Links
Red Hat Product Errata RHSA-2006:0437 (priority: normal, status: SHIPPED_LIVE, last updated: 2006-07-20 13:11:00 UTC)
Summary: Important: Updated kernel packages for Red Hat Enterprise Linux 3 Update 8

Description Chris Tribo 2004-09-20 18:38:11 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.1)
Gecko/20040707

Description of problem:
Unplugged a pl2303-driven USB-to-serial adapter from the system, and
the kernel oopsed and panicked. The system was unresponsive and the
floppy drive access light stayed on constantly.

Version-Release number of selected component (if applicable):
kernel-smp-2.4.21-20.EL

How reproducible:
Didn't try

Steps to Reproduce:
1. Start up the system.
2. Start Network UPS Tools, monitoring UPSes through a USB-to-serial adapter.
3. Unplug the USB-to-serial adapter.
4. The kernel goes bye-bye; the system is completely frozen.

Expected Results:  Not a panic.

Additional info:

mct_u232                6872   0
pl2303                 13272   3
usbserial              21692   0 [mct_u232 pl2303]
usb-ohci               23176   0 (unused)
usbcore                80928   1 [mct_u232 pl2303 usbserial hid usb-ohci]

usb.c: registered new driver usbdevfs
usb.c: registered new driver hub
usb-ohci.c: USB OHCI at membase 0xd08bd000, IRQ 21
usb-ohci.c: usb-00:0a.0, NEC Corporation USB
usb.c: new USB bus registered, assigned bus number 1
hub.c: USB hub found
usb-ohci.c: USB OHCI at membase 0xd08bf000, IRQ 20
usb-ohci.c: usb-00:0a.1, NEC Corporation USB (#2)
usb.c: new USB bus registered, assigned bus number 2
hub.c: USB hub found
ehci-hcd 00:0a.2: NEC Corporation USB 2.0
usb.c: new USB bus registered, assigned bus number 3
ehci-hcd 00:0a.2: USB 2.0 enabled, EHCI 0.95, driver 2003-Jan-22
hub.c: USB hub found
usb.c: registered new driver hiddev
usb.c: registered new driver hid
hid-core.c: USB HID support drivers
hub.c: new USB device 00:0f.2-1, assigned address 2
usb.c: USB device 2 (vend/prod 0x67b/0x2303) is not claimed by any active driver.
usbserial.c: USB Serial support registered for Generic
usbserial.c: USB Serial Driver core v1.4
usbserial.c: USB Serial support registered for PL-2303
usbserial.c: PL-2303 converter detected
usbserial.c: PL-2303 converter now attached to ttyUSB0 (or usb/tts/0 for devfs)

Comment 1 Chris Tribo 2004-09-20 22:38:55 UTC
Happened again on a completely different machine. On this machine it has two mct_u232 
based adapters and no pl-2303. One of my co-workers took a screen shot of it and I have 
it up at http://user.dtcc.edu/~ctribo/stack_trace.jpg

Comment 2 Chris Tribo 2004-09-22 22:55:35 UTC
Totally reproducible. Just be using a USB-to-serial adapter, pull the plug on it, and you get an instant
panic. Here is another panic on yet another machine: http://user.dtcc.edu/~ctribo/kernel_panic.jpg

Comment 3 Chris Tribo 2004-09-22 23:02:25 UTC
Created attachment 104157 [details]
Full dmesg

full boot log

Comment 4 Chris Tribo 2004-09-30 14:52:31 UTC
Created attachment 104588 [details]
full lsmod output

Comment 5 Tru Huynh 2004-11-18 14:47:57 UTC
OOPS on x86_64 upon opening minicom:
(dual opteron 246, 4GB of ram, full update), same kernel release

NMI Watchdog detected LOCKUP on cpu0, eip fffffff801a61b0, registers:
CPU0
Pid:0, comm:swapper Not tainted
RIP: 0010:

Comment 6 Tru Huynh 2004-11-18 15:54:16 UTC
http://www.uwsg.iu.edu/hypermail/linux/kernel/0307.1/0482.html

Old issue :( This needs major work if usb-serial has not been backported
from the 2.6 series.

Tru

Comment 7 Tru Huynh 2004-11-18 16:05:23 UTC
I tried copying usb-serial.h and pl2303.{h,c} from the 2.4.27 kernel
sources (pl2303 v0.10 instead of v0.9).

The same problem occurred (oops) as soon as a key was sent through minicom.

2.6.8 comes with version 0.11, but I don't feel confident enough to do the
backport.

Tru

Comment 8 Tru Huynh 2004-11-18 16:34:49 UTC
last addition for the day: no oops under 2.4.21-15.0.4.ELsmp

minicom works fine

Tru

Comment 9 Pete Zaitcev 2005-03-17 22:17:36 UTC
This is a very nasty bug in 2.4. It boils down to usbserial's disconnect
attempting to force closes. This almost sounds reasonable, except that
this leaves all processes with open ports accessing memory which
is not there. 2.4.29 moves some NULL assignments around, but all that
does is make the oops a little less likely, and it still allows
access to freed memory.

I have developed a comprehensive fix, but it is deemed too invasive
for Marcelo's tree. The core fix is to stop trying to force closes
and just mark ports as dead. Then the actual closes by processes close the ports.
This is how it's done in 2.6 and RHEL4.

I'm going to attach the patch for the record and in case anyone wants to
test it, but please keep in mind that it's not likely to go in. For now, just try
not to disconnect open devices.
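The difference between forcing closes and marking ports dead can be modeled in userspace. The sketch below is a hypothetical, single-threaded C model, not code from the attached patch: `struct model_port` and the `port_*` functions are invented for illustration, and a real driver would protect the counter and flag with locks or atomic operations. Each open file holds a reference, disconnect sets a dead flag and drops only the device's own reference, and the memory is freed when the last reference goes away.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical model of a usbserial port (illustrative names only).
 * Single-threaded: a real driver needs locking or atomics here. */
struct model_port {
    int refcount; /* 1 for the attached device, +1 for each open() */
    int dead;     /* set by disconnect: the hardware is gone */
};

/* Device attached: the device itself holds the first reference. */
static struct model_port *port_probe(void)
{
    struct model_port *p = calloc(1, sizeof(*p));
    if (p)
        p->refcount = 1;
    return p;
}

/* Drop one reference and free the port when the last one goes.
 * Returns 1 if the port was freed, 0 otherwise. */
static int port_put(struct model_port *p)
{
    if (--p->refcount == 0) {
        free(p);
        return 1;
    }
    return 0;
}

/* open(): refuse dead ports; otherwise the open file pins the memory. */
static int port_open(struct model_port *p)
{
    if (p->dead)
        return -ENODEV;
    p->refcount++;
    return 0;
}

/* write(): a dead port fails cleanly instead of touching freed memory. */
static int port_write(struct model_port *p, int count)
{
    if (p->dead)
        return -ENODEV;
    return count; /* pretend all bytes were queued to the hardware */
}

/* disconnect: do NOT force-close users; mark dead, drop the device's ref. */
static int port_disconnect(struct model_port *p)
{
    p->dead = 1;
    return port_put(p);
}

/* close() by a process drops its reference; the last close frees. */
static int port_close(struct model_port *p)
{
    return port_put(p);
}
```

Unplugging the adapter while minicom or a UPS monitor holds the port open corresponds to calling `port_disconnect()` between `port_open()` and `port_close()`: writes start failing with `-ENODEV`, and the memory is released only by the final close.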


Comment 10 Pete Zaitcev 2005-03-17 22:41:34 UTC
Created attachment 112112 [details]
Big patch for refcounting and proper close()

This is codenamed "Little Rework" in the community, in case of googling.

Comment 11 Pete Zaitcev 2005-03-21 19:04:21 UTC
Created attachment 112187 [details]
Candidate #2 - simple plug

We may be able to plug this by simpler means. I suspect it remains racy,
but it should work for most people.

Comment 12 Pete Zaitcev 2005-03-21 23:53:45 UTC
Please download test kernel 2.4.21-31.EL.usbserial.2 for the desired
architecture and run tests:
 ftp://people.redhat.com/zaitcev/rhel3usb/
This test kernel employs the small plug fix.


Comment 13 Chris Tribo 2005-03-22 00:17:29 UTC
Nothing after 2.4.21-20.EL works on my system. All the kernels panic because they can't find the root
file system. Apparently the mdtools/raidtools drivers were updated to something that doesn't work with
the software RAID mirror that I set up in the installer. The RAID scan doesn't turn up any RAID units (there
are three). I guess it doesn't like the superblockless RAID setup.

Comment 14 Tru Huynh 2005-04-19 15:51:33 UTC
I am downloading 2.4.21-31.EL.usbserial.3; is it the latest/correct version to test?

Will this be included in U5?

Tru

Comment 15 Ernie Petrides 2005-04-19 21:37:06 UTC
Tru, U5 is already closed.  Future fixes would go into U6 (or later).

Comment 16 Tru Huynh 2005-06-07 11:47:55 UTC
I tried patching the 32.0.1.EL sources with the USB patches from usbserial.3, but
the resulting kernels (UP and SMP) on x86_64 still oops. To be precise, the UP
kernel reboots when a key is pressed in minicom, while the SMP kernel oopses.

Tru


Comment 17 Tru Huynh 2005-06-07 11:49:30 UTC
Created attachment 115181 [details]
kernel.spec diff file to add the usbserial.3 patches into the current kernel source

Comment 18 Pete Zaitcev 2005-06-07 17:57:58 UTC
Tru: your problem obviously has nothing to do with Chris', or rather, nothing to do
with the one he captured in kernel_panic.jpg, which is what this bug is tracking.

He also has another one in stack_trace.jpg, which _may_ be the same as yours.
However, it was ill-defined; there's no data to do anything about it. You should
have filed your own bug long ago, and I should have told you so. I forgot, sorry.
Anyway, this is a bug about an oops in serial_write and not a bug about mysterious
lockups upon closing.

That said, since you're shuffling patches, why don't you give the "Little
Rework" a try, as attached to this bug? It resolves all wrongs in the
way closing is done in usbserial.


Comment 19 Tru Huynh 2005-06-08 16:29:58 UTC
I have replaced the linux-2.4.21-usb-bug132994.patch by
linux-2.4.21-usb-little_rework+msleep+msecs_to_jiffies.patch
(from https://bugzilla.redhat.com/bugzilla/attachment.cgi?id=112112
+ added 2 missing functions)

The SMP kernel no longer oopses when a key is pressed inside minicom, but
minicom does not seem to send or receive anything.

strace minicom.pid:

select(4, [0 3], NULL, NULL, {0, 760000}) = 0 (Timeout)
ioctl(3, TIOCMGET, [TIOCM_DTR|TIOCM_RTS]) = 0
select(4, [0 3], NULL, NULL, {1, 0})    = 0 (Timeout)
ioctl(3, TIOCMGET, [TIOCM_DTR|TIOCM_RTS]) = 0
select(4, [0 3], NULL, NULL, {1, 0})    = 0 (Timeout)
ioctl(3, TIOCMGET, [TIOCM_DTR|TIOCM_RTS]) = 0
select(4, [0 3], NULL, NULL, {1, 0})    = 0 (Timeout)
ioctl(3, TIOCMGET, [TIOCM_DTR|TIOCM_RTS]) = 0
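The patch name above says it adds msleep() and msecs_to_jiffies(), helpers that exist in 2.6 but not in the 2.4.21 tree. The backported code in attachment 115226 is not reproduced here. As a rough guide to what msecs_to_jiffies() computes, the sketch below models the 2.6-era rounding in userspace, with the kernel's compile-time HZ replaced by a parameter `hz` and a hypothetical function name `msecs_to_jiffies_hz`; later kernels also clamp very large inputs, which this sketch omits. The conversion always rounds up, so a nonzero delay never becomes zero ticks. In 2.6, msleep() builds on this conversion and loops on schedule_timeout() until at least the requested time has passed.

```c
#include <assert.h>

/* Hypothetical userspace model of 2.6-style msecs_to_jiffies().
 * `hz` stands in for the kernel's compile-time HZ and must be nonzero. */
static unsigned long msecs_to_jiffies_hz(unsigned int m, unsigned int hz)
{
    if (hz <= 1000 && 1000 % hz == 0)
        /* each tick is a whole number of ms: round up to full ticks */
        return (m + (1000 / hz) - 1) / (1000 / hz);
    if (hz > 1000 && hz % 1000 == 0)
        /* several ticks per ms: exact multiple */
        return (unsigned long)m * (hz / 1000);
    /* general case: scale, then round up */
    return ((unsigned long)m * hz + 999) / 1000;
}
```

With `hz` = 100 (10 ms ticks), 1 to 10 ms map to one tick and 11 ms maps to two.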


Comment 20 Tru Huynh 2005-06-08 16:31:10 UTC
Created attachment 115225 [details]
little rework kernel.spec patch

Comment 21 Tru Huynh 2005-06-08 16:32:22 UTC
Created attachment 115226 [details]
linux-2.4.21-usb-little_rework+msleep+msecs_to_jiffies.patch

Comment 22 Tru Huynh 2005-06-08 17:28:22 UTC
I should definitely open a new bugzilla number....

here is what I tried:

myganne (with patched kernel) :/etc/inittab
co:2345:respawn:/sbin/agetty ttyUSB0 9600 vt100

atsukau: minicom
 CTRL-A Z for help |  9600 8N1 | NOR | Minicom 2.00.0 | VT102 |      Offline

myganne ->agetty USB0 -> pl2303 -> serial cable -> minicom ttyS0 (atsukau)

and the file containing the strace output from agetty from a blind typing
session in minicom (nothing seems to be echoing inside minicom).



Comment 23 Tru Huynh 2005-06-08 17:29:14 UTC
Created attachment 115229 [details]
strace from agetty

Comment 24 Tru Huynh 2005-06-08 17:59:40 UTC
I have filed a new bugzilla entry (bz#159862) for the oops when a key is pressed
inside minicom attached to /dev/ttyUSB0 (pl2303) on the current kernel,
2.4.21-32.0.1.ELsmp.

Comment 25 Pete Zaitcev 2005-06-28 17:25:25 UTC
OK, Tru now has his own bug, 159862, so that's good. His difficulty
is that he's trying to have getty drive the USB device, which has
its own problems because of the line discipline.

Chris, I expect you to fix your volume management problems.
The RHEL 3 kernel is definitely stable; it cannot have changes that
break that. In any case, Support should be able to get your system running.
Please let me know when you can test my kernels regarding the USB fix.


Comment 26 Chris Tribo 2005-06-28 18:02:56 UTC
what category should I file a new bug under, raidtools, mdadm or kernel?

Comment 28 Ernie Petrides 2006-04-21 04:06:39 UTC
A fix for this problem has just been committed to the RHEL3 U8
patch pool this evening (in kernel version 2.4.21-40.8.EL).


Comment 30 Joshua Giles 2006-05-30 15:21:43 UTC
A kernel has been released that contains a patch for this problem.  Please
verify if your problem is fixed with the latest available kernel from the RHEL3
public beta channel at rhn.redhat.com.

Comment 31 Ernie Petrides 2006-05-30 20:23:52 UTC
Reverting to ON_QA.

Comment 33 Red Hat Bugzilla 2006-07-20 13:16:44 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2006-0437.html


