Bug 666332

Summary:

sshd service startup failing after destroy when guest install complete

Product:

Red Hat Enterprise Linux 6

Reporter:

Nan Zhang <nzhang>

Component:

selinux-policy

Assignee:

Miroslav Grepl <mgrepl>

Status:

CLOSED ERRATA

QA Contact:

Karel Srot <ksrot>

Severity:

medium

Priority:

low

Version:

6.1

CC:

dallan, dwalsh, dyuan, eblake, ksrot, lihuang, llim, mkenneth, mvadkert, pvrabec, rwu, tburke, virt-maint, xen-maint

Hardware:

Unspecified

OS:

Unspecified

Fixed In Version:

selinux-policy-3.7.19-138.el6

Doc Type:

Bug Fix

Last Closed:

2012-06-20 12:24:13 UTC

Attachments:

Description	Flags
sshd failed info in boot.log in guest	none
1) tarball of /etc/ssh (sshd start failed)	none
2) tarball of /etc/ssh (sshd start successful)	none

Description Nan Zhang 2010-12-30 08:03:42 UTC

Created attachment 471153 [details]
sshd failed info in boot.log in guest

Description of problem:
Install a new guest with the default installation configuration, then force off the guest. On the next boot, the sshd service fails to start. If the guest is shut down from inside the guest instead, the issue does not occur.

Version-Release number of selected component (if applicable):
kernel-2.6.32-92.el6.x86_64
qemu-kvm-0.12.1.2-2.128.el6.x86_64
libvirt-0.8.6-1.el6.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Install a guest with virt-manager and wait for the guest to boot up fully.
2. Force off the guest (virsh destroy) as soon as the guest installation completes.
3. Start the guest again.
4. Check the ssh service:
   # ps -ef | grep sshd
  
Actual results:
The sshd service fails to start.

Expected results:
The sshd service starts successfully.

Additional info:
Waiting more than about 30 seconds before forcing off the guest works around this issue.
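
Steps 2 and 3 above can be sketched as a small helper run on the host. This is only an illustration: the function name `force_cycle_guest` and the `VIRSH` override variable are hypothetical and not part of the original report, and the guest name must be supplied by the caller.

```shell
#!/bin/sh
# Hypothetical helper: force off a libvirt guest and start it again (steps 2-3).
# VIRSH can be overridden, e.g. for a dry run; it defaults to the real virsh.
VIRSH=${VIRSH:-virsh}

force_cycle_guest() {
    domain=$1
    # "virsh destroy" is an immediate power-off, not a clean guest shutdown.
    "$VIRSH" destroy "$domain" || return 1
    "$VIRSH" start "$domain"
}
```

For example, `force_cycle_guest demo` would cycle a guest named `demo`; step 4 is then run inside the guest.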

Comment 2 RHEL Program Management 2011-04-04 01:49:23 UTC

Since RHEL 6.1 External Beta has begun, and this bug remains
unresolved, it has been rejected as it is not proposed as
exception or blocker.

Red Hat invites you to ask your support representative to
propose this request, if appropriate and relevant, in the
next release of Red Hat Enterprise Linux.

Comment 3 Dave Allan 2011-06-10 02:04:04 UTC

It looks to me like this is simple disk corruption caused by forcing off the guest, but I'm reassigning to qemu, since I don't know that for sure.

Comment 4 Dor Laor 2011-06-15 13:16:32 UTC

Is there an fsck check when the guest OS boots?
Which cache mode was used (ps aux | grep qemu)?

Comment 5 Nan Zhang 2011-06-27 10:22:36 UTC

There was no fsck on the guest OS while it was booting, and the issue was reproduced on the latest build.

# rpm -q libvirt
libvirt-0.9.2-1.el6.x86_64
# rpm -q qemu-kvm
qemu-kvm-0.12.1.2-2.165.el6.x86_64

# ps aux|grep qemu
qemu     26927 22.1  8.4 795520 320040 ?       Sl   06:43   0:19 /usr/libexec/qemu-kvm -S -M rhel6.1.0 -enable-kvm -m 512 -smp 1,sockets=1,cores=1,threads=1 -name demo -uuid a0952861-25ef-3316-a6da-c87c1095bc91 -nodefconfig -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/demo.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -boot c -drive file=/var/lib/libvirt/images/demo.img,if=none,id=drive-virtio-disk0,format=raw,cache=none,aio=threads -device virtio-blk-pci,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0 -netdev tap,fd=27,id=hostnet0,vhost=on,vhostfd=29 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:d5:77:f5,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -usb -device usb-tablet,id=input0 -vnc 127.0.0.1:0 -vga cirrus -device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
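
The cache mode asked about in Comment 4 is the `cache=` value on the `-drive` option in the process listing above. A minimal sketch for pulling it out of such a listing follows; the function name `drive_cache_modes` is hypothetical, and it assumes a grep that supports `-o`.

```shell
#!/bin/sh
# Print every cache= setting found on qemu-kvm command lines read from stdin.
# Each match is printed on its own line.
drive_cache_modes() {
    grep -o 'cache=[a-z]*'
}
```

A possible use is `ps aux | grep '[q]emu-kvm' | drive_cache_modes`.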

Comment 6 Jes Sorensen 2011-07-21 09:34:23 UTC

What does the 'wait 30 seconds' in the bug report mean?

sshd is known to take some time to generate its host keys on first boot
after an install. If you kill the guest before this has happened, then
sshd is not going to work on reboot. The question is whether the sshd
boot script is smart enough to handle this case.

I would suspect this problem could be reproduced on real hardware too.

Comment 7 Nan Zhang 2011-08-01 05:00:59 UTC

Jes,
That means not killing the guest immediately after it reaches the login screen on first boot, but waiting more than 30 seconds first. If the guest is then killed and booted again, the sshd service starts up with the OS.

Comment 8 Jes Sorensen 2011-08-02 07:30:10 UTC

Nan,

Ok, I think this is an sshd problem, not a KVM problem. Could you
please try the following:

1) Recreate the problem, then once sshd has refused to start, tar up the
   contents of /etc/ssh and attach them to this BZ
2) Do the same for a case where you waited the 30 seconds and the install
   was successful

Thanks,
Jes

Comment 9 Nan Zhang 2011-08-11 07:05:53 UTC

Created attachment 517745 [details]
1) tarball of /etc/ssh (sshd start failed)

Comment 10 Nan Zhang 2011-08-11 07:07:00 UTC

Created attachment 517746 [details]
2) tarball of /etc/ssh (sshd start successful)

Comment 11 Jes Sorensen 2011-08-29 14:09:55 UTC

As expected, the key files are simply not completed, or the write to disk
isn't finished, when the crash occurs. Most of them are zero-byte files, and sshd
fails on the following reboot.

This seems to be a bug in sshd, which shouldn't just fail without warning if these
files are empty. I suggest it either try to regenerate them or provide
a useful error message offering the admin the chance to create some.

Reassigning to openssh.

Jes

[jes@red-feather tmp]$ ls -al good/etc/ssh/
total 160
drwxr-xr-x 2 jes jes   4096 Dec 28  2009 .
drwxr-xr-x 3 jes jes   4096 Aug 29 16:04 ..
-rw------- 1 jes jes 125811 Aug  8 10:25 moduli
-rw------- 1 jes jes    668 Dec 28  2009 ssh_host_dsa_key
-rw-r--r-- 1 jes jes    590 Dec 28  2009 ssh_host_dsa_key.pub
-rw------- 1 jes jes    963 Dec 28  2009 ssh_host_key
-rw-r--r-- 1 jes jes    627 Dec 28  2009 ssh_host_key.pub
-rw------- 1 jes jes   1675 Dec 28  2009 ssh_host_rsa_key
-rw-r--r-- 1 jes jes    382 Dec 28  2009 ssh_host_rsa_key.pub
-rw------- 1 jes jes   3872 Aug  8 10:25 sshd_config
[jes@red-feather tmp]$ ls -al bad/etc/ssh/
total 140
drwxr-xr-x 2 jes jes   4096 Dec 29  2009 .
drwxr-xr-x 3 jes jes   4096 Aug 29 16:04 ..
-rw------- 1 jes jes 125811 Aug  8 10:25 moduli
-rw------- 1 jes jes      0 Dec 29  2009 ssh_host_dsa_key
-rw-r--r-- 1 jes jes      0 Dec 29  2009 ssh_host_dsa_key.pub
-rw------- 1 jes jes    963 Dec 29  2009 ssh_host_key
-rw-r--r-- 1 jes jes      0 Dec 29  2009 ssh_host_key.pub
-rw------- 1 jes jes      0 Dec 29  2009 ssh_host_rsa_key
-rw-r--r-- 1 jes jes      0 Dec 29  2009 ssh_host_rsa_key.pub
-rw------- 1 jes jes   3872 Aug  8 10:25 sshd_config
[jes@red-feather tmp]$
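
The zero-byte files in the listings above can also be found with a short check like the following. This is a sketch only; the function name `find_empty_host_keys` is hypothetical, and the directory defaults to /etc/ssh.

```shell
#!/bin/sh
# List zero-byte ssh_host_* files in a directory (default: /etc/ssh).
# Prints one path per line; prints nothing if every key file has content.
find_empty_host_keys() {
    dir=${1:-/etc/ssh}
    for f in "$dir"/ssh_host_*; do
        # -f: a regular file exists; ! -s: its size is zero
        if [ -f "$f" ] && [ ! -s "$f" ]; then
            printf '%s\n' "$f"
        fi
    done
}
```

Run against the unpacked tarballs, e.g. `find_empty_host_keys bad/etc/ssh`, it should list only the empty key files.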

Comment 15 Karel Srot 2012-01-05 11:14:41 UTC

> This seems to be a bug in sshd, which shouldn't just fail without warning if these
> files are empty. I suggest it either try to regenerate them or provide
> a useful error message offering the admin the chance to create some.


Hi Petr,
how are you going to fix this? I believe a (logged) error message would be better than automatic key regeneration.

Comment 17 Petr Lautrbach 2012-01-31 16:41:14 UTC

In this particular case, there is probably still a valid protocol 1 key (ssh_host_key), so the initscript cannot know whether this is an intended configuration or a set of broken files.

If AUTOCREATE_SERVER_KEYS is set to NO, the script runs the sshd daemon. Otherwise it tries to regenerate the empty key files and will most probably fail because of SELinux restrictions on sshd keys. In that case the following message appears on the console or in /var/log/boot.log:
Generating SSH2 RSA host key: [FAILED]


When there are empty key files and a client connects, sshd writes error messages like these about the problem with the ssh keys to /var/log/messages:

Jan 31 17:25:42 rhel-6-openssh sshd[1922]: error: Could not load host key: /etc/ssh/ssh_host_rsa_key
Jan 31 17:25:42 rhel-6-openssh sshd[1922]: error: Could not load host key: /etc/ssh/ssh_host_dsa_key
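
Messages of this form can be collected from the log with a sketch like the one below. The function name `unloadable_host_keys` is hypothetical, and the pattern assumes the exact wording shown in the two log lines above.

```shell
#!/bin/sh
# Print the unique host key paths that sshd reported it could not load.
# The log path is an argument; /var/log/messages is the default.
unloadable_host_keys() {
    log=${1:-/var/log/messages}
    sed -n 's/.*error: Could not load host key: \(.*\)$/\1/p' "$log" | sort -u
}
```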

Comment 18 Petr Lautrbach 2012-02-01 10:04:55 UTC

So the sshd initscript tries to regenerate new keys, but ssh-keygen fails to write the public keys:

        if [ ! -s $RSA_KEY ]; then
                echo -n $"Generating SSH2 RSA host key: "
                rm -f $RSA_KEY
                if test ! -f $RSA_KEY && $KEYGEN -q -t rsa -f $RSA_KEY -C '' -N '' >&/dev/null; then
                   ...
                else
                        failure $"RSA key generation"
                        echo
                        exit 1
                fi


type=AVC msg=audit(1328090249.637:121): avc:  denied  { write } for  pid=1730 comm="ssh-keygen" name="ssh_host_rsa_key.pub" dev=vda3 ino=135938 scontext=unconfined_u:system_r:ssh_keygen_t:s0 tcontext=system_u:object_r:etc_t:s0 tclass=file


# matchpathcon /etc/ssh/ssh_host_*
/etc/ssh/ssh_host_dsa_key       system_u:object_r:sshd_key_t:s0
/etc/ssh/ssh_host_dsa_key.pub   system_u:object_r:etc_t:s0
/etc/ssh/ssh_host_key   system_u:object_r:sshd_key_t:s0
/etc/ssh/ssh_host_key.pub       system_u:object_r:etc_t:s0
/etc/ssh/ssh_host_rsa_key       system_u:object_r:sshd_key_t:s0
/etc/ssh/ssh_host_rsa_key.pub   system_u:object_r:etc_t:s0
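
To make the denial above easier to compare against the matchpathcon output, the relevant fields of an AVC line can be reduced to one summary line. This is a rough sketch: the function name `summarize_avc` is hypothetical, and it only handles denials that carry `comm=`, `name=`, `scontext=` and `tcontext=` in that order, as the line above does.

```shell
#!/bin/sh
# Summarise AVC denial lines read from stdin as:
#   comm name scontext tcontext
# Lines that are not AVC denials, or lack one of the fields, are skipped.
summarize_avc() {
    sed -n '/avc: *denied/ s/.*comm="\([^"]*\)".*name="\([^"]*\)".*scontext=\([^ ]*\).*tcontext=\([^ ]*\).*/\1 \2 \3 \4/p'
}
```

Here the process domain is ssh_keygen_t and the target file carries etc_t, which is the label matchpathcon lists for the .pub files.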

Comment 19 Miroslav Grepl 2012-02-01 11:58:55 UTC

Petr and I have tested this further. "$RSA_KEY.pub" is created with the "sshd_key_t" label, but there is a "restorecon" call


if [ -x /sbin/restorecon ]; then
                            /sbin/restorecon $RSA_KEY.pub

which resets the label back to etc_t, as defined in the policy. This needs to be fixed.
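
One way to see what restorecon would do to a file, without changing any label, is a dry run. The wrapper below is a hypothetical sketch: `preview_relabel` and the `RESTORECON` override variable are names made up for illustration, and it relies on restorecon's `-n` (do not change labels) and `-v` (report differences) options.

```shell
#!/bin/sh
# Hypothetical check: ask restorecon what it would relabel, without changing
# anything. -n = do not change labels, -v = report files whose label differs.
RESTORECON=${RESTORECON:-/sbin/restorecon}

preview_relabel() {
    "$RESTORECON" -n -v "$@"
}
```

For example, `preview_relabel /etc/ssh/ssh_host_rsa_key.pub` run as root on the guest.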

Comment 20 Miroslav Grepl 2012-02-28 07:05:41 UTC

I added a fix to Fedora. I am backporting it.

Comment 24 errata-xmlrpc 2012-06-20 12:24:13 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2012-0780.html