Bug 590851 - [RHEL 6.0] Serial console hangs after entering username
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Version: 6.0
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: beta
Target Release: 6.0
Assigned To: John Villalovos
QA Contact: Red Hat Kernel QE team
Keywords: Regression, Reopened, TestBlocker
Duplicates: 595494
Depends On:
Blocks: 551128
Reported: 2010-05-10 15:30 EDT by John Villalovos
Modified: 2015-05-08 09:59 EDT

Doc Type: Bug Fix
Last Closed: 2010-11-11 10:54:34 EST


Attachments
/var/log/messages after sysrq-t (354.69 KB, text/plain)
2010-05-20 00:03 EDT, Evan McNabb
Initial patch to fix issue (3.03 KB, patch)
2010-05-24 21:36 EDT, John Villalovos
Updated patch to fix issue. (5.16 KB, patch)
2010-05-25 10:28 EDT, John Villalovos

Description John Villalovos 2010-05-10 15:30:20 EDT
On the system:
https://beaker.engineering.redhat.com/view/intel-s3e36-02.lab.bos.redhat.com

If I install RHEL 5.5, I can log in via the serial console.

With RHEL 6.0 Snapshot 2 installed, I can see all the output on the serial console, but when I try to log in, the serial console becomes unresponsive after I enter the username:

---------------------------------------------
<snip>
Starting sshd: [  OK  ]
ntpdate: Synchronizing with time server: [  OK  ]
Starting ntpd: [  OK  ]
Starting postfix: [  OK  ]
Starting abrt daemon: [  OK  ]
[  OK  ] crond: [  OK  ]
[  OK  ] atd: [  OK  ]
Starting anamon: [  OK  ]

Red Hat Enterprise Linux Server release 6.0 Beta (Santiago)
Kernel 2.6.32-23.el6.x86_64 on an x86_64

intel-s3e36-02.lab.bos.redhat.com login: root
<cursor>
(No output after that)
---------------------------------------------

I can still log in via SSH, though.  And if I reboot the system, output starts appearing on the serial console again.

However, running: echo hello > /dev/ttyS0
produces no output on the console.

# ps axww | grep getty
 7850 tty2     Ss+    0:00 /sbin/mingetty /dev/tty2
 7853 tty3     Ss+    0:00 /sbin/mingetty /dev/tty3
 7855 tty4     Ss+    0:00 /sbin/mingetty /dev/tty4
 7857 tty5     Ss+    0:00 /sbin/mingetty /dev/tty5
 7859 tty6     Ss+    0:00 /sbin/mingetty /dev/tty6
 8106 ttyS0    Ss+    0:00 /sbin/agetty /dev/ttyS0 115200 vt100-nav
Comment 1 Karel Zak 2010-05-10 16:25:44 EDT
Please check /var/log/messages and the dmesg output for any errors or warnings.
Comment 3 RHEL Product and Program Management 2010-05-10 17:42:52 EDT
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux major release.  Product Management has requested further
review of this request by Red Hat Engineering, for potential inclusion in a Red
Hat Enterprise Linux Major release.  This request is not yet committed for
inclusion.
Comment 4 Jane Lv 2010-05-11 03:26:48 EDT
I reproduced the same issue as John on my local 3e36 platform.

No visible error/warning message in dmesg.  I found this in /var/log/messages:

May 11 15:13:59 snapshot-2 init: serial (ttyS0) main process ended, respawning

BTW, SELinux is disabled on the system.
Comment 9 Karel Zak 2010-05-17 05:50:48 EDT

*** This bug has been marked as a duplicate of bug 568418 ***
Comment 10 Evan McNabb 2010-05-18 16:56:47 EDT
I'm also seeing this exact behavior on an s3e36 system (intel-s3e36-01.rhts.eng.rdu.redhat.com). This may be related to BZ 568418, but I don't believe it is a duplicate since there is no panic involved, just the login hang. And if I pull the kernel patch from 568418, it panics as expected.

I currently have RHEL6.0-Snapshot-4_nfs-Server with the following installed:
kernel-2.6.32-26.el6.x86_64
dracut-004-19.el6.noarch
util-linux-ng-2.17.2-2.el6.x86_64
mingetty-1.08-4.1.el6.x86_64
upstart-0.6.5-5.el6.x86_64
plymouth-0.8.3-1.el6.x86_64
initscripts-9.03.7-1.el6.x86_64

I've booted with rd_NO_PLYMOUTH but the behavior does not change.

I'm going to reopen this BZ since this issue seems to be specific to intel-s3e36 (Emerald Ridge) systems. Let me know if you'd like access to one.
Comment 11 Evan McNabb 2010-05-19 10:39:58 EDT
Bumping the sev/pri to high/high since this is blocking testing on Boxboro-EX.
Comment 12 Evan McNabb 2010-05-19 12:21:49 EDT
I installed Beta1 where login works, and then upgraded the kernel to 2.6.32-27. Logging in still works so I suspect this isn't a kernel issue, or at least not purely kernel.
Comment 14 Karel Zak 2010-05-19 17:40:27 EDT
(In reply to comment #12)
> I installed Beta1 where login works, and then upgraded the kernel to 2.6.32-27.
> Logging in still works so I suspect this isn't a kernel issue, or at least not
> purely kernel.    

Cool, now we have to find where the problem is. I guess you have util-linux-ng 2.17.1.

Please update to util-linux-ng 2.17.2 and try again to reproduce the problem (with kernel 2.6.32-27). Then try it again with an updated upstart.

BTW, I have real doubts that the problem is in login or agetty. There is no relevant change in the code between v2.17.1 and v2.17.2.
Comment 16 Ray Strode [halfline] 2010-05-19 22:22:22 EDT
Evan, there was a problem with the tty handling in plymouth-0.8.3-1.el6.x86_64 that could very well be related.

I know you said you've been able to reproduce with rd_NO_PLYMOUTH on the kernel command line, which strongly suggests it's not a plymouth issue, but just to remove all doubt, would you mind upgrading to plymouth-0.8.3-3.el6 (and rebuilding your initrd) and verifying that the problem still happens?
Comment 17 Evan McNabb 2010-05-19 22:47:42 EDT
Hi Karel, Ray,

Here's some data that might help. Looks like it's possibly a kernel-2.6.32-XX + upstart-0.6.5-5 issue?

Login succeeds:
kernel-2.6.32-27.el6.x86_64
util-linux-ng-2.17.1-1.el6.x86_64
upstart-0.6.3-5.el6.x86_64

Login fails:
kernel-2.6.32-27.el6.x86_64
util-linux-ng-2.17.1-1.el6.x86_64
upstart-0.6.5-5.el6.x86_64

Login succeeds:
kernel-2.6.34 (custom compile)
util-linux-ng-2.17.1-1.el6.x86_64
upstart-0.6.5-5.el6.x86_64
Comment 18 Ray Strode [halfline] 2010-05-19 23:02:55 EDT
So I think there was some confusion in bug 568418.  It sounds like the initially reported problem exactly matches your symptoms.

Then bug 568418 comment 18 mentioned a potentially independent kernel panic that Neil fixed, but it could be that the original problem still remains.

In that bug, Neil asked for sysrq-t output.  Can you get that?
Comment 19 Evan McNabb 2010-05-20 00:03:26 EDT
Created attachment 415307
/var/log/messages after sysrq-t
Comment 20 Ray Strode [halfline] 2010-05-20 00:17:55 EDT
Thanks, another thing that would be useful is the output of

sudo stty --file=/dev/ttyS0 -a

when the problem is happening
Comment 22 Neil Horman 2010-05-20 10:13:00 EDT
Hmm, I don't see anything that's clearly deadlocked in this setup.  I do notice, however, that this system is running the realtime policy kit daemon, which does lots of priority tuning on running processes and might be triggering some sort of priority inversion issue.  Does the problem still occur if the rtkit-daemon is disabled?
Comment 23 Ray Strode [halfline] 2010-05-20 10:23:18 EDT
That daemon is probably activated by pulseaudio, which is started by gdm.  The easiest way to disable it is probably to add " 3 " to the kernel command line to force the system to boot into runlevel 3 instead of runlevel 5.
Comment 24 Evan McNabb 2010-05-20 10:58:04 EDT
(In reply to comment #20)
> Thanks, another thing that would be useful is the output of
> 
> sudo stty --file=/dev/ttyS0 -a
> 
> when the problem is happening    

# stty --file=/dev/ttyS0 -a
speed 115200 baud; rows 0; columns 0; line = 0;
intr = ^C; quit = ^\; erase = ^?; kill = ^U; eof = ^D; eol = <undef>; eol2 = <undef>; swtch = <undef>;
start = ^Q; stop = ^S; susp = ^Z; rprnt = ^R; werase = ^W; lnext = ^V; flush = ^O; min = 1; time = 0;
-parenb -parodd cs8 hupcl -cstopb cread -clocal -crtscts cdtrdsr
-ignbrk -brkint -ignpar -parmrk -inpck -istrip -inlcr -igncr -icrnl -ixon -ixoff -iuclc -ixany -imaxbel
-iutf8
-opost -olcuc -ocrnl -onlcr -onocr -onlret -ofill -ofdel nl0 cr0 tab0 bs0 vt0 ff0
-isig -icanon -iexten -echo -echoe -echok -echonl -noflsh -xcase -tostop -echoprt -echoctl -echoke

It also hangs in runlevel 3 with rtkit disabled. Let me know if one of you would like access to an Emerald Ridge system.
Comment 25 John Villalovos 2010-05-20 15:24:00 EDT
Working with Ray Strode.

We have eliminated Plymouth as a culprit.  We booted up with the following on the kernel command line:
plymouth:debug rd_NO_PLYMOUTH rd_NOPLYMOUTH

The Plymouth daemon was not running and we saw error messages (as expected) about trying to contact the plymouth daemon.

Running: stty sane --file=/dev/ttyS0     did not make any difference.

Before attempting to log in, we saw this for: stty -a --file=/dev/ttyS0:
speed 115200 baud; rows 0; columns 0; line = 0;
intr = ^C; quit = ^\; erase = ^?; kill = ^X; eof = ^D; eol = <undef>;
eol2 = <undef>; swtch = <undef>; start = ^Q; stop = ^S; susp = ^Z; rprnt = ^R;
werase = ^W; lnext = ^V; flush = ^O; min = 1; time = 0;
-parenb -parodd cs8 hupcl -cstopb cread -clocal -crtscts cdtrdsr
-ignbrk -brkint -ignpar -parmrk -inpck -istrip -inlcr -igncr -icrnl -ixon -ixoff
-iuclc -ixany -imaxbel -iutf8
-opost -olcuc -ocrnl -onlcr -onocr -onlret -ofill -ofdel nl0 cr0 tab0 bs0 vt0 ff0
-isig -icanon -iexten -echo -echoe -echok -echonl -noflsh -xcase -tostop -echoprt
-echoctl -echoke

After entering a user name we see this for stty -a --file=/dev/ttyS0:
speed 115200 baud; rows 0; columns 0; line = 0;
intr = ^C; quit = ^\; erase = ^?; kill = ^U; eof = ^D; eol = <undef>;
eol2 = <undef>; swtch = <undef>; start = ^Q; stop = ^S; susp = ^Z; rprnt = ^R;
werase = ^W; lnext = ^V; flush = ^O; min = 1; time = 0;
-parenb -parodd cs8 hupcl -cstopb cread -clocal -crtscts cdtrdsr
-ignbrk -brkint -ignpar -parmrk -inpck -istrip -inlcr -igncr icrnl ixon ixoff
-iuclc -ixany -imaxbel -iutf8
opost -olcuc -ocrnl onlcr -onocr -onlret -ofill -ofdel nl0 cr0 tab0 bs0 vt0 ff0
isig icanon -iexten -echo echoe echok -echonl -noflsh -xcase -tostop -echoprt
-echoctl echoke

# diff -u before-login-stty-a.txt after-login-stty-a.txt 
--- before-login-stty-a.txt	2010-05-20 14:39:36.407917454 -0400
+++ after-login-stty-a.txt	2010-05-20 14:40:29.371839805 -0400
@@ -1,10 +1,10 @@
 speed 115200 baud; rows 0; columns 0; line = 0;
-intr = ^C; quit = ^\; erase = ^?; kill = ^X; eof = ^D; eol = <undef>;
+intr = ^C; quit = ^\; erase = ^?; kill = ^U; eof = ^D; eol = <undef>;
 eol2 = <undef>; swtch = <undef>; start = ^Q; stop = ^S; susp = ^Z; rprnt = ^R;
 werase = ^W; lnext = ^V; flush = ^O; min = 1; time = 0;
 -parenb -parodd cs8 hupcl -cstopb cread -clocal -crtscts cdtrdsr
--ignbrk -brkint -ignpar -parmrk -inpck -istrip -inlcr -igncr -icrnl -ixon -ixoff
+-ignbrk -brkint -ignpar -parmrk -inpck -istrip -inlcr -igncr icrnl ixon ixoff
 -iuclc -ixany -imaxbel -iutf8
--opost -olcuc -ocrnl -onlcr -onocr -onlret -ofill -ofdel nl0 cr0 tab0 bs0 vt0 ff0
--isig -icanon -iexten -echo -echoe -echok -echonl -noflsh -xcase -tostop -echoprt
--echoctl -echoke
+opost -olcuc -ocrnl onlcr -onocr -onlret -ofill -ofdel nl0 cr0 tab0 bs0 vt0 ff0
+isig icanon -iexten -echo echoe echok -echonl -noflsh -xcase -tostop -echoprt
+-echoctl echoke

Not sure if that helps :)
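
For anyone repeating this comparison, the two stty dumps can also be compared programmatically. Below is a minimal Python sketch, not part of the original investigation; the helper names stty_flags and changed_flags are made up for illustration. It assumes the input is plain `stty -a` output and only looks at the on/off settings (a leading "-" means off), skipping the speed and control-character lines.

```python
def stty_flags(text):
    """Return {setting_name: enabled} for the on/off settings in `stty -a` output."""
    flags = {}
    for line in text.splitlines():
        # Lines containing "=" or ";" carry the speed and control characters
        # (intr, kill, ...), not on/off settings, so they are skipped.
        if "=" in line or ";" in line:
            continue
        for token in line.split():
            if token.startswith("-"):
                flags[token[1:]] = False
            else:
                # Multi-valued settings such as cs8 or nl0 also land here as
                # "enabled"; they only matter if they differ between dumps.
                flags[token] = True
    return flags


def changed_flags(before, after):
    """Return sorted (name, old, new) tuples for settings whose state differs."""
    old, new = stty_flags(before), stty_flags(after)
    return sorted(
        (name, old[name], new[name])
        for name in old.keys() & new.keys()
        if old[name] != new[name]
    )
```

Fed the before-login and after-login dumps above, this should report icrnl, ixon, ixoff, opost, onlcr, isig, icanon, echoe, echok and echoke switching from off to on, which matches the diff. The change of the kill character from ^X to ^U is not reported, because control-character lines are skipped by design.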
Comment 26 John Villalovos 2010-05-20 15:55:14 EDT
Backing out the following commit fixed the issue:

commit 954287e8662d722b046c163d4d6a0a441c889274
Author: Mauro Carvalho Chehab <mchehab@redhat.com>
Date:   Wed Jan 6 12:44:02 2010 -0500

    [serial] 8250: add support for DTR/DSR hardware flow control
    
    Message-id: <4B448592.7030803@redhat.com>
    Patchwork-id: 22323
    O-Subject: [RHEL6] BZ#523848: 8250: add support for DTR/DSR hardware flow control
    Bugzilla: 523848
    RH-Acked-by: Aristeu Rozanski <aris@redhat.com>
    
    Backports a RHEL5 patch from: Aristeu Rozanski <arozansk@redhat.com>
    This patch is needed to support a certain serial printer that has small buffer
    and needs DTR/DSR flow control in order to work.
    
    v1: patch ported to RHEL6 Alpha3
    v2: patch ported to RHEL6 kernel-2.6.32-4.el6
    
    Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
    
    I don't have such hardware, so I can't test the patch to see if the printer is
    properly working.
    
    Aris,
    Could you please test it with the serial printer? I think you have access to one
    of those printers at the office.
    
    After having it properly tested, I'll try to re-submit it again upstream.
    
    ---
    
    Patch's original comments:
    
    Patch sent originally on Aug, 8 2008:
        http://lwn.net/Articles/293523/
    
    This patch adds support for DTR/DSR hardware flow control on 8250 driver on x86
    machines. It's done by adding a CDTRDSR flag to work just like CRTSCTS, which
    is not done on other architectures on purpose (so each maintainer can allocate
    it).
    
    This patch was tested with success with a serial printer configured with a
    small buffer and DTR/DSR flow control.
    
    This is based on the work of Michael Westermann
    (http://lkml.org/lkml/2007/8/31/133)
Comment 27 John Villalovos 2010-05-20 15:57:13 EDT
Mauro,

I'm giving this to you because your change appears to be causing this issue.
Comment 28 John Villalovos 2010-05-24 21:36:40 EDT
Created attachment 416255
Initial patch to fix issue
Comment 29 John Villalovos 2010-05-25 10:17:45 EDT
*** Bug 595494 has been marked as a duplicate of this bug. ***
Comment 30 John Villalovos 2010-05-25 10:26:14 EDT
The issue occurred because the previous RHEL 6 commit
(954287e8662d722b046c163d4d6a0a441c889274) had a collision with bits used in
the serial port flag field.  UFI_DSR_FLOW was using bit 25 to signify Data
Terminal Ready (DTR) / Data Set Ready (DSR) flow control usage, but bit 25 was
already being used for ASYNCB_CHECK_CD/ASYNC_CHECK_CD (check Data Carrier Detect).

The check for CLOCAL (ignore modem control lines) occurred after the check for
DTR/DSR.  If the CLOCAL flag was not set, it would then set the ASYNCB_CHECK_CD
bit (bit 25) in the serial port flags, indicating that DTR/DSR was being used
for hardware flow control, even if no hardware flow control had been enabled.

You would then experience this issue on systems where the DSR signal was not
asserted, even though no hardware flow control was enabled.  Not all systems
would experience this issue because it depended on how you had the serial port
connected and whether you had the DSR signal asserted.
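
Whether DSR is asserted on a given setup can be checked from userspace. The following is a minimal Python sketch, assuming a Linux system where the termios module exposes TIOCMGET and the TIOCM_* constants; the function names are hypothetical and this was not part of the original debugging.

```python
import fcntl
import struct
import termios

# Modem-control bits reported by the TIOCMGET ioctl, keyed by signal name.
MODEM_BITS = {
    "DTR": termios.TIOCM_DTR,
    "RTS": termios.TIOCM_RTS,
    "CTS": termios.TIOCM_CTS,
    "DCD": termios.TIOCM_CAR,
    "DSR": termios.TIOCM_DSR,
}


def decode_modem_bits(bits):
    """Map each signal name to True when its bit is set in `bits`."""
    return {name: bool(bits & mask) for name, mask in MODEM_BITS.items()}


def read_modem_lines(fd):
    """Query the modem-control lines of an already opened serial port."""
    raw = fcntl.ioctl(fd, termios.TIOCMGET, struct.pack("I", 0))
    (bits,) = struct.unpack("I", raw)
    return decode_modem_bits(bits)
```

To try it against /dev/ttyS0, the port is probably best opened with os.open("/dev/ttyS0", os.O_RDONLY | os.O_NONBLOCK | os.O_NOCTTY), since a blocking open may wait for carrier when CLOCAL is not set. If read_modem_lines(fd)["DSR"] comes back False, the system is one where the signal is not asserted.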

The patch fixes this by moving the DTR/DSR bit to bit 21 in the port flags.  It
also renames UFI_DSR_FLOW to ASYNC_DSR_FLOW/ASYNCB_DSR_FLOW to match the usage
in the 2.6.32 kernel and moves it from include/linux/serial_core.h to
include/linux/serial.h
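
The collision and the fix can be modeled in a few lines. This is a simplified Python sketch of the explanation above, not the driver code: only the bit numbers 25 and 21 and the order of the checks come from this comment, while port_flags, dsr_flow_active and the constant names OLD_DSR_FLOW_BIT and NEW_DSR_FLOW_BIT are invented for illustration.

```python
# Bit positions in the serial port flag word, as described in this comment.
ASYNCB_CHECK_CD = 25    # check Data Carrier Detect
OLD_DSR_FLOW_BIT = 25   # position used by the earlier commit (the collision)
NEW_DSR_FLOW_BIT = 21   # position used after the fix


def port_flags(clocal, dtrdsr_flow, dsr_flow_bit):
    """Build the flag word in the described order: DTR/DSR first, then CLOCAL."""
    flags = 0
    if dtrdsr_flow:
        flags |= 1 << dsr_flow_bit
    else:
        flags &= ~(1 << dsr_flow_bit)
    if clocal:
        flags &= ~(1 << ASYNCB_CHECK_CD)
    else:
        # Without CLOCAL, carrier detect is checked. If the DSR flow bit
        # shares this position, it is switched on as a side effect.
        flags |= 1 << ASYNCB_CHECK_CD
    return flags


def dsr_flow_active(flags, dsr_flow_bit):
    """True when the flag word says DSR is being used for flow control."""
    return bool(flags & (1 << dsr_flow_bit))
```

With CLOCAL off and no DTR/DSR flow control requested, dsr_flow_active(port_flags(False, False, OLD_DSR_FLOW_BIT), OLD_DSR_FLOW_BIT) is True, because setting ASYNCB_CHECK_CD sets the shared bit. The same call with NEW_DSR_FLOW_BIT is False, while the carrier-detect bit is still set as intended.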

Thanks to Mauro Carvalho Chehab for all of his assistance in helping me
troubleshoot this issue!
Comment 31 John Villalovos 2010-05-25 10:28:18 EDT
Created attachment 416417
Updated patch to fix issue.

Updated patch to fix the issue.  The previous patch still fixes the issue; this one adds a minor housekeeping fix.
Comment 35 Aristeu Rozanski 2010-05-28 16:38:53 EDT
Patch(es) available on kernel-2.6.32-31.el6
Comment 38 John Villalovos 2010-06-01 11:20:51 EDT
Intel has verified that this issue is resolved in the latest RHEL 6 kernel.
Comment 39 Evan McNabb 2010-06-01 12:36:14 EDT
I also confirmed this now works correctly on 2.6.32-31.el6.x86_64. Setting to VERIFIED.
Comment 40 releng-rhel@redhat.com 2010-11-11 10:54:34 EST
Red Hat Enterprise Linux 6.0 is now available and should resolve
the problem described in this bug report. This report is therefore being closed
with a resolution of CURRENTRELEASE. You may reopen this bug report if the
solution does not work for you.
