Bug 716920 - console fails to work in stage2
Summary: console fails to work in stage2
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: anaconda
Version: 6.2
Hardware: Unspecified
OS: Unspecified
high
high
Target Milestone: rc
: ---
Assignee: Martin Sivák
QA Contact: Release Test Team
URL:
Whiteboard:
Depends On:
Blocks: 691780
TreeView+ depends on / blocked
 
Reported: 2011-06-27 13:16 UTC by Andrew Jones
Modified: 2011-12-06 10:38 UTC (History)
2 users (show)

Fixed In Version: anaconda-13.21.123-1
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-12-06 10:38:19 UTC
Target Upstream Version:


Attachments (Terms of Use)
full stage2 strace [gzipped, unpacked form has 124MB] (655.46 KB, application/x-gzip)
2011-07-21 15:10 UTC, Martin Sivák
no flags Details
Error hapenning after we add tssetpgrp to loader (15.23 KB, image/png)
2011-07-22 13:44 UTC, Martin Sivák
no flags Details
Loader strace in text form (6.22 MB, text/plain)
2011-07-22 13:51 UTC, Martin Sivák
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2011:1565 0 normal SHIPPED_LIVE anaconda bug fix and enhancement update 2011-12-06 00:39:12 UTC

Description Andrew Jones 2011-06-27 13:16:57 UTC
When installing a 32-bit xen hvm guest using the serial console, the console stops responding at this display

              ┌──────────┤ Would you like to use VNC? ├──────────┐
              │                                                  │
              │ Text mode provides a limited set of installation │
              │ options.  It does not allow you to specify your  │
              │ own partitioning layout or package selections.   │
              │ Would you like to use VNC mode instead?          │
              │                                                  │
              │     ┌───────────┐         ┌───────────────┐      │
              │     │ Start VNC │         │ Use text mode │      │
              │     └───────────┘         └───────────────┘      │
              │                                                  │
              │                                                  │
              └──────────────────────────────────────────────────┘

How to reproduce:
1) start an installation
virt-install -v -n rhel6-32fv -l $LOCATION --vcpus=4 -r 2048 -f /dev/stripe/rhel6-32fv

2) when grub pops up in virt-viewer edit the kernel command line, add console=ttyS0,115200n8

3) connect with virsh console rhel6-32fv

4) everything seems to work fine, proceed through selections until the above selection is displayed.

results:

will not be able to select 'text mode' or even 'start vnc', the console does not respond to input

Comment 1 Andrew Jones 2011-06-27 13:19:45 UTC
This is a latest nightly build for 6.2.

Comment 2 Andrew Jones 2011-06-27 13:20:18 UTC
Same results with a PV guest, changing summary.

Comment 4 Jan Stodola 2011-07-01 14:25:13 UTC
Also reproduced on s390x and x86_64 (both bare metal and virtual machine in KVM).

If I use an incomplete kickstart to run installation interactively in text mode, then it is also not possible to select/enter any values, for example select time zone).

Even though the TUI is frozen, I'm still able to connect to the machine via ssh or switch ttys. I will send logs from the installation.

Reproduced on build RHEL6.2-20110630.n.0 with anaconda-13.21.118-1.el6

Comment 6 David Cantrell 2011-07-19 17:51:58 UTC
What happens if you pass 'serial' as a boot argument in addition to the console=ttyS0,115200n8 boot argument?

Comment 7 Andrew Jones 2011-07-20 08:50:43 UTC
(In reply to comment #6)
> What happens if you pass 'serial' as a boot argument in addition to the
> console=ttyS0,115200n8 boot argument?

Since the problem also occurs with PV guests (console=hvc0), then the console=ttyS0,115200n8 isn't really a super important data point. However, I did add serial[1] to see what happens, and I got the same failure with a 64-bit PV guest.

[1] virt-install -n test62 -p -l http://download.englab.brq.redhat.com/pub/rhel/nightly/latest-RHEL-6/6.2/Server/x86_64/os --vcpus=2 -r 1024 -f test62 -s 7 -x serial

I'm raising the priority/severity. This problem is going to slow testing that will start in one week.

Comment 8 RHEL Program Management 2011-07-20 15:00:16 UTC
This request was evaluated by Red Hat Product Management for inclusion
in a Red Hat Enterprise Linux maintenance release. Product Management has 
requested further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed 
products. This request is not yet committed for inclusion in an Update release.

Comment 9 Jan Stodola 2011-07-20 15:28:02 UTC
[anaconda root@rtt7 tmp]# loadkeys us
/usr/bin/loadkeys: (13, 'Permission denied')[anaconda root@rtt7 tmp]# strace loadkeys us 2>&1 | tail
open("/usr/lib64/python2.6/site-packages/selinux/stat.pyc", O_RDONLY) = -1 ENOENT (No such file or directory)
close(4)                                = 0
munmap(0x20000564000, 4096)             = 0
close(3)                                = 0
munmap(0x20000523000, 4096)             = 0
open("/etc/keymaps.gz", O_RDONLY)       = -1 ENOENT (No such file or directory)
write(2, "/usr/bin/loadkeys: (13, 'Permiss"..., 44/usr/bin/loadkeys: (13, 'Permission denied')) = 44
rt_sigaction(SIGINT, {SIG_DFL, [], 0}, {0x2000015d03c, [], 0}, 8) = 0
close(6)                                = 0
exit_group(1)                           = ?

Comment 10 Martin Sivák 2011-07-20 16:10:04 UTC
So what I found out is that the keyboard stops responding in the text interface only while in text or serial+text mode.

VNC or graphical are ok as well as our shell on the second VT (in text mode too).

I was able to switch to the second VT using C-A-F2 during the freeze.

The missing keymaps.gz error was strange, but probably happened only on s390.

Comment 11 Jan Stodola 2011-07-20 16:38:48 UTC
'loadkeys us' doesn't print any error when executed on an x86_64 system - tried via ssh right after anaconda asked to select text/vnc installation method and UI stopped responding:

-bash-4.1# loadkeys us
-bash-4.1# ls -l /etc/keymaps.gz
-rw-r--r--. 1 root root 12173 Jul 19 13:02 /etc/keymaps.gz

Comment 12 Martin Sivák 2011-07-21 13:59:42 UTC
So the only new information I have right now is that the anaconda process (and anamon and auditd follows them in some time, all with tty1 controlling terminal) ends up in T state after first key is pressed in any stage2 text dialog.

T is defined as "Stopped, either by a job control signal or because it is being traced." and I have no idea why we get there..

Serial and all graphical modes work ok. This affects only text mode regardless on the type of machine (I have seen it on s390, x86_64, both virtualized and physical..)

Comment 13 Martin Sivák 2011-07-21 15:10:19 UTC
Created attachment 514219 [details]
full stage2 strace [gzipped, unpacked form has 124MB]

See the end of the strace:


fstat(5, {st_mode=S_IFCHR|0666, st_rdev=makedev(1, 3), ...}) = 0
ioctl(5, SNDCTL_TMR_TIMEBASE or TCGETS, 0x7fffc6a70f60) = -1 ENOTTY (Inappropriate ioctl for device)
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f7e3c0ef000
read(5, "", 4096)                       = 0
...
close(5)                                = 0
munmap(0x7f7e3c0ef000, 4096)            = 0
fstat(4, {st_mode=S_IFREG|0444, st_size=16507, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f7...skipping...
--- SIGTTIN (Stopped (tty input)) @ 0 (0) ---
--- SIGTTIN (Stopped (tty input)) @ 0 (0) ---
read(6, 0x7fffc6a7688f, 1)              = ? ERESTARTSYS (To be restarted)

Comment 14 Martin Sivák 2011-07-21 15:50:34 UTC
Some more debug info about the state of processes couple of minutes after freeze

    64 tty1     S+     0:02 /sbin/loader --devel
  632 tty1     Z+     0:00  \_ [hald] <defunct>
  633 tty1     Z      0:00  \_ [NetworkManager] <defunct>
  660 tty2     Ss+    0:00  \_ -/bin/sh
  661 ?        Ss     0:00  \_ /sbin/loader --devel
  687 tty1     T      0:00  \_ /bin/bash /usr/bin/anaconda --stage2 http://10.34
  688 tty1     Z      0:56      \_ [strace] <defunct>
  634 ?        Ss     0:00 /sbin/hald --use-syslog
  635 ?        S      0:00  \_ hald-runner
  636 ?        Ssl    0:00 /usr/sbin/NetworkManager --pid-file=/var/run/NetworkM
  650 ?        S      0:00  \_ /sbin/dhclient -d -4 -sf /usr/libexec/nm-dhcp-cli
  642 ?        Ss     0:00 /usr/sbin/wpa_supplicant -c /etc/wpa_supplicant/wpa_s
  817 tty1     S      0:00 /sbin/auditd
  830 ?        Ss     0:00 /sbin/sshd -f /etc/ssh/sshd_config.anaconda
  831 ?        Ss     0:00  \_ sshd: root@pts/0                              
  833 pts/0    Ss     0:00      \_ -bash
  859 pts/0    R+     0:00          \_ ps axf

And descriptors just before freeze

-bash-4.1# ls -al /proc/64/fd
total 0
dr-x------. 2 root root  0 Jul 21 11:51 .
dr-xr-xr-x. 7 root root  0 Jul 21 11:39 ..
lrwx------. 1 root root 64 Jul 21 11:51 0 -> /dev/tty1
lrwx------. 1 root root 64 Jul 21 11:51 1 -> /dev/tty1
lrwx------. 1 root root 64 Jul 21 11:51 2 -> /dev/tty1
lrwx------. 1 root root 64 Jul 21 11:51 5 -> socket:[7475]
lrwx------. 1 root root 64 Jul 21 11:51 8 -> socket:[7700]
-bash-4.1# ls -al /proc/689/fd
total 0
dr-x------. 2 root root  0 Jul 21 11:41 .
dr-xr-xr-x. 7 root root  0 Jul 21 11:40 ..
lrwx------. 1 root root 64 Jul 21 11:41 0 -> /dev/tty1
lrwx------. 1 root root 64 Jul 21 11:41 1 -> /dev/tty1
l-wx------. 1 root root 64 Jul 21 11:41 10 -> /dev/tty3
l-wx------. 1 root root 64 Jul 21 11:41 11 -> /tmp/storage.log
l-wx------. 1 root root 64 Jul 21 11:41 12 -> /dev/tty3
lr-x------. 1 root root 64 Jul 21 11:41 13 -> /mnt/runtime/usr/lib/anaconda/textw/welcome_text.py
lrwx------. 1 root root 64 Jul 21 11:41 18 -> /tmp/ffih2ein5 (deleted)
l-wx------. 1 root root 64 Jul 21 11:41 2 -> /tmp/anaconda.strace
l-wx------. 1 root root 64 Jul 21 11:41 3 -> /dev/tty3
l-wx------. 1 root root 64 Jul 21 11:41 4 -> /tmp/anaconda.log
l-wx------. 1 root root 64 Jul 21 11:41 5 -> /tmp/program.log
lrwx------. 1 root root 64 Jul 21 11:41 6 -> /dev/tty
lrwx------. 1 root root 64 Jul 21 11:41 7 -> socket:[9804]
lrwx------. 1 root root 64 Jul 21 11:41 8 -> /dev/mapper/control
l-wx------. 1 root root 64 Jul 21 11:41 9 -> /tmp/ifcfg.log
-bash-4.1# ls -al /proc/688/fd
total 0
dr-x------. 2 root root  0 Jul 21 11:41 .
dr-xr-xr-x. 7 root root  0 Jul 21 11:40 ..
lrwx------. 1 root root 64 Jul 21 11:41 0 -> /dev/tty1
lrwx------. 1 root root 64 Jul 21 11:41 1 -> /dev/tty1
l-wx------. 1 root root 64 Jul 21 11:41 2 -> /tmp/anaconda.strace

# this is not loader, it was forked from it
-bash-4.1# ls -al /proc/661/fd
total 0
dr-x------. 2 root root  0 Jul 21 11:41 .
dr-xr-xr-x. 7 root root  0 Jul 21 11:40 ..
lr-x------. 1 root root 64 Jul 21 11:41 0 -> /dev/null
l-wx------. 1 root root 64 Jul 21 11:41 1 -> /dev/null
l-wx------. 1 root root 64 Jul 21 11:41 2 -> /dev/null
l-wx------. 1 root root 64 Jul 21 11:41 3 -> /dev/tty3
l-wx------. 1 root root 64 Jul 21 11:41 4 -> /tmp/anaconda.log
lrwx------. 1 root root 64 Jul 21 11:41 5 -> socket:[9399]
-bash-4.1#

Comment 16 Martin Sivák 2011-07-22 13:44:52 UTC
Created attachment 514700 [details]
Error hapenning after we add tssetpgrp to loader

It is clearly visible, that tcsetpgrp killed the stage2 process with the same errno as we saw in stage2 freeze.

Since the call was the second call after fork (fork, setpgrp, tcsetpgrp) it leads me to assumption that something changed either in kernel or glibc.

Comment 17 Martin Sivák 2011-07-22 13:51:22 UTC
Created attachment 514704 [details]
Loader strace in text form

Comment 18 Martin Sivák 2011-07-25 13:43:35 UTC
Seems like it was caused by our debugging code being called in bad place and bad order. The patch was posted for review and will get included in the build once approved.

Comment 20 Alexander Todorov 2011-08-18 12:12:19 UTC
With anaconda-13.21.130-1 / 0818.n.0 nightly I was able to successfully perform interactive install in text mode. Moving to VERIFIED.

Comment 21 errata-xmlrpc 2011-12-06 10:38:19 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2011-1565.html


Note You need to log in before you can comment on or make changes to this bug.