From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.1) Gecko/20040707

Description of problem:
Unplugged a pl2303-driven USB to serial adapter from the system, and the kernel oopsed and panicked. The system was unresponsive and the floppy drive access light was on constantly.

Version-Release number of selected component (if applicable):
kernel-smp-2.4.21-20.EL

How reproducible:
Didn't try

Steps to Reproduce:
1. start up system
2. start Network UPS Tools monitoring UPSes with a USB to serial adapter
3. unplug USB to serial adapter
4. kernel go byebye, system completely frozen.

Expected Results:
Not a panic.

Additional info:

mct_u232 6872 0
pl2303 13272 3
usbserial 21692 0 [mct_u232 pl2303]
usb-ohci 23176 0 (unused)
usbcore 80928 1 [mct_u232 pl2303 usbserial hid usb-ohci]

usb.c: registered new driver usbdevfs
usb.c: registered new driver hub
usb-ohci.c: USB OHCI at membase 0xd08bd000, IRQ 21
usb-ohci.c: usb-00:0a.0, NEC Corporation USB
usb.c: new USB bus registered, assigned bus number 1
hub.c: USB hub found
usb-ohci.c: USB OHCI at membase 0xd08bf000, IRQ 20
usb-ohci.c: usb-00:0a.1, NEC Corporation USB (#2)
usb.c: new USB bus registered, assigned bus number 2
hub.c: USB hub found
ehci-hcd 00:0a.2: NEC Corporation USB 2.0
usb.c: new USB bus registered, assigned bus number 3
ehci-hcd 00:0a.2: USB 2.0 enabled, EHCI 0.95, driver 2003-Jan-22
hub.c: USB hub found
usb.c: registered new driver hiddev
usb.c: registered new driver hid
hid-core.c: USB HID support drivers
hub.c: new USB device 00:0f.2-1, assigned address 2
usb.c: USB device 2 (vend/prod 0x67b/0x2303) is not claimed by any active driver.
usbserial.c: USB Serial support registered for Generic
usbserial.c: USB Serial Driver core v1.4
usbserial.c: USB Serial support registered for PL-2303
usbserial.c: PL-2303 converter detected
usbserial.c: PL-2303 converter now attached to ttyUSB0 (or usb/tts/0 for devfs)
Happened again on a completely different machine. This machine has two mct_u232-based adapters and no pl2303. One of my co-workers took a screenshot of it and I have it up at http://user.dtcc.edu/~ctribo/stack_trace.jpg
Totally reproducible. Just use a USB to serial adapter, pull the plug on it, and you get an instant panic. Another panic on yet another machine: http://user.dtcc.edu/~ctribo/kernel_panic.jpg
Created attachment 104157 [details] Full dmesg full boot log
Created attachment 104588 [details] full lsmod output
OOPS on x86_64 upon opening minicom (dual Opteron 246, 4GB of RAM, fully updated), same kernel release:
NMI Watchdog detected LOCKUP on cpu0, eip fffffff801a61b0, registers:
CPU0 Pid:0, comm:swapper Not tainted
RIP: 0010:
http://www.uwsg.iu.edu/hypermail/linux/kernel/0307.1/0482.html
Old issue :( It needs major work if usb-serial has not been backported from the 2.6 series.
Tru
I tried copying usb-serial.h and pl2303.{h,c} from the 2.4.27 kernel sources (pl2303 v0.10 instead of 0.9). The same problem occurred (oops) as soon as a key is sent through minicom. 2.6.8 comes with version 0.11, but I don't feel confident enough to do the backport.
Tru
Last addition for the day: no oops under 2.4.21-15.0.4.ELsmp, minicom works fine.
Tru
This is a very nasty bug in 2.4. It boils down to usbserial's disconnect attempting to force closes. This almost sounds reasonable, except that it leaves all processes with open ports accessing memory which is no longer there. 2.4.29 moves some NULL assignments around, but all that does is make the oops a little less likely, while still allowing access to freed memory. I have developed a comprehensive fix, but it is deemed too invasive for Marcelo's tree. The core fix is to stop trying to force closes and just mark ports as dead. The ports are then closed when the processes holding them actually call close. This is how it's done in 2.6 & RHEL4. I'm going to attach the patch for the record and in case anyone wants to test, but please keep in mind that it's not likely to go in. Just try not to disconnect open devices for now.
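For readers who want to see the "mark dead, free on last close" idea in isolation, here is a minimal userspace sketch in C. It is not the attached patch and not the real usbserial code: `struct demo_port`, the `demo_port_*` functions and the `freed_flag` hook are all hypothetical names made up for illustration, and the sketch is single-threaded, whereas a real driver would need locking around the reference count and the dead flag.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a serial port object; NOT the kernel struct. */
struct demo_port {
    int  refcount;    /* one reference for the device, one per open handle */
    bool dead;        /* set once the hardware has been unplugged */
    int *freed_flag;  /* illustration hook: set to 1 when memory is released */
};

/* Allocate a port holding the single reference owned by the device. */
struct demo_port *demo_port_create(int *freed_flag)
{
    struct demo_port *p = calloc(1, sizeof(*p));
    if (p == NULL)
        return NULL;
    p->refcount = 1;
    p->freed_flag = freed_flag;
    return p;
}

/* Drop one reference; the memory goes away only with the last one. */
void demo_port_put(struct demo_port *p)
{
    assert(p->refcount > 0);
    if (--p->refcount == 0) {
        if (p->freed_flag != NULL)
            *p->freed_flag = 1;
        free(p);
    }
}

/* Opening takes a reference, and is refused once the device is gone. */
int demo_port_open(struct demo_port *p)
{
    if (p->dead)
        return -ENODEV;
    p->refcount++;
    return 0;
}

/* Writes to a dead port fail cleanly instead of touching the hardware. */
int demo_port_write(struct demo_port *p, const char *buf, int len)
{
    (void)buf;  /* a real driver would queue the data here */
    if (p->dead)
        return -ENODEV;
    return len;
}

/* The process closing its handle is what drops the handle's reference. */
void demo_port_close(struct demo_port *p)
{
    demo_port_put(p);
}

/* Disconnect does not force any close: it marks the port dead and drops
 * only the device's own reference. */
void demo_port_disconnect(struct demo_port *p)
{
    p->dead = true;
    demo_port_put(p);
}
```

With this shape, a process that still has the port open when the adapter is pulled keeps a valid pointer: its next write returns -ENODEV, and the structure is released only when that process calls close.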
Created attachment 112112 [details] Big patch for refcounting and proper close() This is codenamed "Little Rework" in the community, in case of googling.
Created attachment 112187 [details] Candidate #2 - simple plug We may be able to plug this by simpler means. I suspect it remains racy, but it should work for most people.
Please download test kernel 2.4.21-31.EL.usbserial.2 for your architecture and run tests. ftp://people.redhat.com/zaitcev/rhel3usb/ This test kernel uses the small plug fix.
Nothing after 2.4.21-20.EL works on my system. All the kernels panic not being able to find the root file system. Apparently the mdtools/raidtools drivers were updated to something that doesn't work with the software raid mirror that I set up in the installer. The raid scan doesn't turn up any raid units (there are three). I guess it doesn't like the superblockless raid setup.
I am downloading 2.4.21-31.EL.usbserial.3. Is it the latest/correct version to test? Will this be included in U5?
Tru
Tru, U5 is already closed. Future fixes would go into U6 (or later).
I tried to patch the 32.0.1.EL sources with the USB patches from usbserial.3, but the resulting kernel (UP and SMP) on x86_64 still oopses. To be precise, the UP kernel reboots upon a key press in minicom, while the SMP kernel oopses.
Tru
Created attachment 115181 [details] kernel.spec diff file to add the usbserial.3 patches into the current kernel source
Tru: your problem obviously has nothing to do with Chris', or, rather, nothing with the one he captured with kernel_panic.jpg, and what this bug is tracking. He also has another one in stack_trace.jpg, which _may_ be the same as yours. However, it was ill defined, there's no data to do anything about it. You should have filed your own bug long ago, and I should have told you so. I forgot, sorry. Anyway, this is a bug about oops in serial_write and not a bug about mysterious lockups upon closing. That said, since you're shuffling patches, why don't you give the "Little Rework" a try, as attached to this bug? It resolves all wrongs in the way closing is done in usbserial.
I have replaced linux-2.4.21-usb-bug132994.patch with linux-2.4.21-usb-little_rework+msleep+msecs_to_jiffies.patch (from https://bugzilla.redhat.com/bugzilla/attachment.cgi?id=112112, plus 2 missing functions added). The SMP kernel no longer oopses upon a key press inside minicom, but minicom does not seem to send or receive anything.
strace minicom.pid:
select(4, [0 3], NULL, NULL, {0, 760000}) = 0 (Timeout)
ioctl(3, TIOCMGET, [TIOCM_DTR|TIOCM_RTS]) = 0
select(4, [0 3], NULL, NULL, {1, 0}) = 0 (Timeout)
ioctl(3, TIOCMGET, [TIOCM_DTR|TIOCM_RTS]) = 0
select(4, [0 3], NULL, NULL, {1, 0}) = 0 (Timeout)
ioctl(3, TIOCMGET, [TIOCM_DTR|TIOCM_RTS]) = 0
select(4, [0 3], NULL, NULL, {1, 0}) = 0 (Timeout)
ioctl(3, TIOCMGET, [TIOCM_DTR|TIOCM_RTS]) = 0
Created attachment 115225 [details] little rework kernel.spec patch
Created attachment 115226 [details] linux-2.4.21-usb-little_rework+msleep+msecs_to_jiffies.patch
I should definitely open a new bugzilla number.... Here is what I tried:
myganne (with patched kernel) :/etc/inittab
co:2345:respawn:/sbin/agetty ttyUSB0 9600 vt100
atsukau: minicom
CTRL-A Z for help | 9600 8N1 | NOR | Minicom 2.00.0 | VT102 | Offline
myganne -> agetty USB0 -> pl2303 -> serial cable -> minicom ttyS0 (atsukau)
and the file containing the strace output from agetty from a blind typing session in minicom (nothing seems to be echoing inside minicom).
Created attachment 115229 [details] strace from agetty
I have filed a new bugzilla entry (bz#159862) for the oops upon a key press inside minicom attached to /dev/ttyUSB0 (pl2303) on the current kernel 2.4.21-32.0.1.ELsmp
OK, we've got Tru his own bug 159862, so that's good. His difficulty is that he's trying to have getty driving the USB device, which has its own problems because of line discipline. Chris, I expect you to fix your volume management problems. The RHEL 3 kernel is definitely stable, it cannot have changes which break that. In any case, Support should be able to get your system running. Please let me know when you can test my kernels regarding the USB fix.
what category should I file a new bug under, raidtools, mdadm or kernel?
A fix for this problem has just been committed to the RHEL3 U8 patch pool this evening (in kernel version 2.4.21-40.8.EL).
A kernel has been released that contains a patch for this problem. Please verify if your problem is fixed with the latest available kernel from the RHEL3 public beta channel at rhn.redhat.com.
Reverting to ON_QA.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2006-0437.html