Bug 623822

Summary: Trying to do PPP hangs my (USB) serial port
Product: [Fedora] Fedora Reporter: Chris Siebenmann <cks-rhbugzilla>
Component: kernelAssignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED WONTFIX QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high Docs Contact:
Priority: low    
Version: 13CC: anton, dougsland, gansalmon, itamar, jonathan, kernel-maint, madhu.chinakonda
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2011-06-28 14:39:16 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
lsusb -v output
none
dmesg output
none
Test program to set the tty line discipline to PPP
none
Full dmesg output after running test program none

Description Chris Siebenmann 2010-08-12 21:19:33 UTC
Description of problem:

When I try to set up a PPP connection to my Fedora 13 server, pppd starts but
then the kernel on the server stops allowing anyone to talk to the serial port.
Attempts to do so hang, logging kernel messages about 'task <whatever> blocked
for more than 120 seconds. When this happens each such process becomes
unkillable (even with 'kill -9') and increases the load average.

The serial port is a USB serial port, a Prolific Technology, Inc. PL2303.

The machine is a SMP x86_64 machine with 4 GB of RAM (and no onboard
serial ports, hence why I have to use a USB serial port).

This fails under the current Fedora 13 kernel, the most recent Fedora 12
kernel (on Fedora 13), and a self-compiled 2.6.35 kernel from kernel.org.
It worked when this machine was running Fedora 11, including the most
recent errata kernel there (2.6.30.10-105.2.23.fc11.x86_64).

When this happens, the only thing that pppd syslogs on the machine is

  Aug  9 22:57:16 hawkwind pppd[23607]: pppd 2.4.5 started by ppphawkl, uid 0

With no onboard serial port I can't test if this is any sort of general
serial + PPP failure.  pppd works in general on this server under Fedora
13; I have successfully brought up a PPP session to the server by ssh'ing
in from another machine with a serial port and modem. 

A basic test of logging in over the serial connection as a plain terminal
session and trying to do general terminal things succeeds, but I have not
tried sending any significant amount of text in either direction.

This is completely reproducible.

Version-Release number of selected component (if applicable):

kernel-2.6.33.6-147.2.4.fc13.x86_64
ppp-2.4.5-9.fc13.x86_64

The first kernel 'task blocked' message syslogged is:

Aug  9 22:57:16 hawkwind kernel: PPP generic driver version 2.4.2
Aug  9 22:59:57 hawkwind kernel: INFO: task pppd:23607 blocked for more than 120 seconds.
Aug  9 22:59:57 hawkwind kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug  9 22:59:57 hawkwind kernel: pppd          D ffff880005a157c0     0 23607  23551 0x00000080
Aug  9 22:59:57 hawkwind kernel: ffff88000add1da8 0000000000000082 ffffffff81b871e0 ffff88011702c928
Aug  9 22:59:57 hawkwind kernel: ffff88000add1d08 ffffffff81282b45 ffff88000add1d78 ffff88000add1fd8
Aug  9 22:59:57 hawkwind kernel: ffff88000add1fd8 000000000000f9b0 00000000000157c0 ffff880037ad03c8
Aug  9 22:59:57 hawkwind kernel: Call Trace:
Aug  9 22:59:57 hawkwind kernel: [<ffffffff81282b45>] ? tty_chars_in_buffer+0x15/0x17
Aug  9 22:59:57 hawkwind kernel: [<ffffffff81064431>] ? prepare_to_wait+0x67/0x74
Aug  9 22:59:57 hawkwind kernel: [<ffffffff81284249>] tty_ldisc_ref_wait+0x6f/0x91
Aug  9 22:59:57 hawkwind kernel: [<ffffffff810641bf>] ? autoremove_wake_function+0x0/0x34
Aug  9 22:59:57 hawkwind kernel: [<ffffffff8127ebea>] tty_ioctl+0x76c/0x7c0
Aug  9 22:59:57 hawkwind kernel: [<ffffffff8110d8cf>] vfs_ioctl+0x2d/0xa1
Aug  9 22:59:57 hawkwind kernel: [<ffffffff8110de38>] do_vfs_ioctl+0x47e/0x4c4
Aug  9 22:59:57 hawkwind kernel: [<ffffffff8110247e>] ? __fput+0x1c8/0x1d7
Aug  9 22:59:57 hawkwind kernel: [<ffffffff8110decf>] sys_ioctl+0x51/0x74
Aug  9 22:59:57 hawkwind kernel: [<ffffffff81009b02>] system_call_fastpath+0x16/0x1b

Eventually things escalate to:

Aug  9 23:08:29 hawkwind kernel: ------------[ cut here ]------------
Aug  9 23:08:29 hawkwind kernel: WARNING: at drivers/char/tty_io.c:1278 tty_open+0x2ac/0x50d()
Aug  9 23:08:29 hawkwind kernel: Hardware name: System Product Name
Aug  9 23:08:29 hawkwind kernel: Modules linked in: ppp_async crc_ccitt ppp_generic slhc usb_storage fuse sunrpc hwmon_vid k8temp eeprom p4_clockmod freq_table speedstep_lib sit tunnel4 ip_gre authenc esp4 xfrm4_mode_tunnel deflate zlib_deflate ctr twofish_x86_64 twofish_common camellia serpent blowfish cast5 des_generic cbc aes_x86_64 aes_generic xcbc rmd160 sha512_generic sha256_generic crypto_null af_key xt_recent ipt_LOG 8021q garp stp llc e1000 ipv6 ext2 kvm_intel kvm uinput snd_hda_codec_analog snd_hda_intel snd_hda_codec snd_hwdep pl2303 asus_atk0110 usbserial i2c_nforce2 snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc forcedeth microcode raid1 firewire_ohci firewire_core ata_generic crc_itu_t pata_amd pata_acpi sata_nv radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core [last unloaded: freq_table]
Aug  9 23:08:29 hawkwind kernel: Pid: 24017, comm: mgetty Not tainted 2.6.33.6-147.2.4.fc13.x86_64 #1
Aug  9 23:08:29 hawkwind kernel: Call Trace:
Aug  9 23:08:29 hawkwind kernel: [<ffffffff8104aecc>] warn_slowpath_common+0x77/0x8f
Aug  9 23:08:29 hawkwind kernel: [<ffffffff8104aef3>] warn_slowpath_null+0xf/0x11
Aug  9 23:08:29 hawkwind kernel: [<ffffffff8127f9b6>] tty_open+0x2ac/0x50d
Aug  9 23:08:29 hawkwind kernel: [<ffffffff81103ece>] chrdev_open+0x11d/0x13e
Aug  9 23:08:29 hawkwind kernel: [<ffffffff81103db1>] ? chrdev_open+0x0/0x13e
Aug  9 23:08:29 hawkwind kernel: [<ffffffff810ffa10>] __dentry_open+0x18d/0x2c4
Aug  9 23:08:29 hawkwind kernel: [<ffffffff811be559>] ? security_inode_permission+0x1c/0x1e
Aug  9 23:08:29 hawkwind kernel: [<ffffffff810ffc0c>] nameidata_to_filp+0x3a/0x4b
Aug  9 23:08:29 hawkwind kernel: [<ffffffff8110c64a>] do_filp_open+0x571/0xad5
Aug  9 23:08:29 hawkwind kernel: [<ffffffff810673b6>] ? hrtimer_try_to_cancel+0x3c/0x46
Aug  9 23:08:29 hawkwind kernel: [<ffffffff8120396d>] ? might_fault+0x1c/0x1e
Aug  9 23:08:29 hawkwind kernel: [<ffffffff81114fdd>] ? alloc_fd+0x76/0x11f
Aug  9 23:08:29 hawkwind kernel: [<ffffffff810ff79a>] do_sys_open+0x5e/0x10a
Aug  9 23:08:29 hawkwind kernel: [<ffffffff810ff86f>] sys_open+0x1b/0x1d
Aug  9 23:08:29 hawkwind kernel: [<ffffffff81009b02>] system_call_fastpath+0x16/0x1b
Aug  9 23:08:29 hawkwind kernel: ---[ end trace 5a438672021fa397 ]---

I will attach dmesg output (for a reboot that has not yet had this happen
to it) and 'lsusb -v' output to this bug report.

Comment 1 Chris Siebenmann 2010-08-12 21:20:21 UTC
Created attachment 438531 [details]
lsusb -v output

Comment 2 Chris Siebenmann 2010-08-12 21:23:45 UTC
Created attachment 438533 [details]
dmesg output

This is the output of 'dmesg' from the Fedora 13 kernel on
this machine. It does not have any messages about the bug
because I have not yet had the opportunity to go home and
try to bring up the PPP link.

Comment 3 Chris Siebenmann 2010-08-12 22:29:08 UTC
I believe that the correct summary of this bug may be 'attempting to set the line
discipline to PPP on a USB serial port hangs the process'. Acting on a hunch I
wrote a test program that did only this, and I can reliably produce hangs by
running it when logged in on a USB serial connection on my machine. The
kernel stack trace in the hang is not identical to the pppd stack trace,
though, although both go through tty_ldisc_ref_wait().

As a result of this I have also determined that processes appear to un-hang
again if I unplug the USB serial port.

My test program appears to run without problems on a x86 uniprocessor Fedora
13 machine (a laptop I have handy).

I will attach the test program and the dmesg output after I ran it on my
server and got the process hang.

Comment 4 Chris Siebenmann 2010-08-12 22:31:37 UTC
Created attachment 438550 [details]
Test program to set the tty line discipline to PPP

Comment 5 Chris Siebenmann 2010-08-12 22:33:32 UTC
Created attachment 438552 [details]
Full dmesg output after running test program

This is the full dmesg output from my machine after running the set-line-disc
test program while logged in on a USB serial connection. This resulted in a
hung process and a kernel message almost identical to my pppd case.

Comment 6 Bug Zapper 2011-06-01 11:33:41 UTC
This message is a reminder that Fedora 13 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 13.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '13'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 13's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 13 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 7 Bug Zapper 2011-06-28 14:39:16 UTC
Fedora 13 changed to end-of-life (EOL) status on 2011-06-25. Fedora 13 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.