RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.
Bug 1888761 - sshd service fails to start the first time because of missing host keys
Summary: sshd service fails to start the first time because of missing host keys
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: cloud-init
Version: 8.3
Hardware: Unspecified
OS: Unspecified
low
low
Target Milestone: rc
: 8.0
Assignee: Eduardo Otubo
QA Contact: xiachen
URL:
Whiteboard:
Depends On: 1957532 1970909
Blocks:
TreeView+ depends on / blocked
 
Reported: 2020-10-15 17:26 UTC by Qin Yuan
Modified: 2021-06-23 09:24 UTC (History)
10 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-06-23 09:23:28 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
cloud-init logs (26.64 KB, application/x-xz)
2020-10-15 17:26 UTC, Qin Yuan
no flags Details

Description Qin Yuan 2020-10-15 17:26:05 UTC
Created attachment 1721882 [details]
cloud-init logs

Description of problem:
When start a VM with cloud-init, sshd service fails to start the first time because of missing host keys:

[root@test1 ~]# journalctl -u sshd
Oct 15 19:23:48 test1 systemd[1]: Starting OpenSSH server daemon...
Oct 15 19:23:48 test1 sshd[1043]: Unable to load host key: /etc/ssh/ssh_host_rsa_key
Oct 15 19:23:48 test1 sshd[1043]: Unable to load host key: /etc/ssh/ssh_host_ecdsa_key
Oct 15 19:23:48 test1 sshd[1043]: Unable to load host key: /etc/ssh/ssh_host_ed25519_key
Oct 15 19:23:48 test1 sshd[1043]: sshd: no hostkeys available -- exiting.
Oct 15 19:23:48 test1 systemd[1]: sshd.service: Main process exited, code=exited, status=1/FAILURE
Oct 15 19:23:48 test1 systemd[1]: sshd.service: Failed with result 'exit-code'.
Oct 15 19:23:48 test1 systemd[1]: Failed to start OpenSSH server daemon.
Oct 15 19:24:32 test1 systemd[1]: sshd.service: Service RestartSec=42s expired, scheduling restart.
Oct 15 19:24:32 test1 systemd[1]: sshd.service: Scheduled restart job, restart counter is at 1.
Oct 15 19:24:32 test1 systemd[1]: Stopped OpenSSH server daemon.
Oct 15 19:24:32 test1 systemd[1]: Starting OpenSSH server daemon...
Oct 15 19:24:33 test1 sshd[4766]: Server listening on 0.0.0.0 port 22.
Oct 15 19:24:33 test1 sshd[4766]: Server listening on :: port 22.
Oct 15 19:24:33 test1 systemd[1]: Started OpenSSH server daemon.
Oct 15 19:25:09 test1 sshd[4769]: Accepted password for root from 10.72.12.131 port 34770 ssh2
Oct 15 19:25:09 test1 sshd[4769]: pam_unix(sshd:session): session opened for user root by (uid=0)

In /var/log/messages:

Oct 15 19:23:38 test1 systemd[1]: Starting Initial cloud-init job (pre-networking)...
Oct 15 19:23:38 test1 systemd[1]: Starting OpenSSH ecdsa Server Key Generation...
Oct 15 19:23:38 test1 systemd[1]: Starting OpenSSH rsa Server Key Generation...
Oct 15 19:23:38 test1 systemd[1]: Starting OpenSSH ed25519 Server Key Generation...
Oct 15 19:23:38 test1 systemd[1]: sshd-keygen: Succeeded.
Oct 15 19:23:38 test1 systemd[1]: Started OpenSSH ecdsa Server Key Generation.
Oct 15 19:23:38 test1 systemd[1]: sshd-keygen: Succeeded.
Oct 15 19:23:38 test1 systemd[1]: Started OpenSSH ed25519 Server Key Generation.
Oct 15 19:23:41 test1 systemd[1]: sshd-keygen: Succeeded.
Oct 15 19:23:41 test1 systemd[1]: Started OpenSSH rsa Server Key Generation.
Oct 15 19:23:41 test1 systemd[1]: Reached target sshd-keygen.target.
Oct 15 19:23:44 test1 systemd[1]: Started Initial cloud-init job (pre-networking).
Oct 15 19:23:47 test1 systemd[1]: Starting Initial cloud-init job (metadata service crawler)...
Oct 15 19:23:48 test1 systemd[1]: Started Initial cloud-init job (metadata service crawler).
Oct 15 19:23:48 test1 systemd[1]: Starting OpenSSH server daemon...
Oct 15 19:23:48 test1 sshd[1043]: Unable to load host key: /etc/ssh/ssh_host_rsa_key
Oct 15 19:23:48 test1 sshd[1043]: Unable to load host key: /etc/ssh/ssh_host_ecdsa_key
Oct 15 19:23:48 test1 sshd[1043]: Unable to load host key: /etc/ssh/ssh_host_ed25519_key
Oct 15 19:23:48 test1 sshd[1043]: sshd: no hostkeys available -- exiting.
Oct 15 19:23:48 test1 systemd[1]: sshd.service: Main process exited, code=exited, status=1/FAILURE
Oct 15 19:23:48 test1 systemd[1]: sshd.service: Failed with result 'exit-code'.
Oct 15 19:23:48 test1 systemd[1]: Failed to start OpenSSH server daemon.

In /var/log/cloud-init.log:

2020-10-15 16:23:43,838 - stages.py[DEBUG]: Running module ssh (<module 'cloudinit.config.cc_ssh' from '/usr/lib/python3.6/site-packages/cloudinit/config/cc_ssh.py'>) with frequency once-per-instance
2020-10-15 16:23:43,840 - helpers.py[DEBUG]: Running config-ssh using lock (<FileLock using file '/var/lib/cloud/instances/6c7709d2-be62-445b-b678-a022ef6c0aa9/sem/config_ssh'>)
2020-10-15 16:23:43,846 - util.py[DEBUG]: Attempting to remove /etc/ssh/ssh_host_ecdsa_key
2020-10-15 16:23:43,846 - util.py[DEBUG]: Attempting to remove /etc/ssh/ssh_host_ecdsa_key.pub
2020-10-15 16:23:43,847 - util.py[DEBUG]: Attempting to remove /etc/ssh/ssh_host_ed25519_key
2020-10-15 16:23:43,847 - util.py[DEBUG]: Attempting to remove /etc/ssh/ssh_host_ed25519_key.pub
2020-10-15 16:23:43,847 - util.py[DEBUG]: Attempting to remove /etc/ssh/ssh_host_rsa_key
2020-10-15 16:23:43,847 - util.py[DEBUG]: Attempting to remove /etc/ssh/ssh_host_rsa_key.pub

In /etc/cloud/cloud.cfg:

ssh_deletekeys:   1
ssh_genkeytypes:  ~


As you can see, OpenSSH Server Key Generation was started almost at the same time with cloud-init job, but the keys generated by it were removed later when running cloud-init ssh module because ssh_deletekeys is 1. So when starting OpenSSH server daemon, it failed because of no hostkeys available at that time.

It's not reasonable for cloud-init to remove the hostkeys generated by sshd, sshd hostkeys generation should be executed after cloud-init is finished.


Version-Release number of selected component (if applicable):
kernel-4.18.0-233.el8.x86_64 (rhel 8.3)
cloud-init-19.4-8.el8.noarch
RHV-4.4.3

How reproducible:
100%

Steps to Reproduce:
1. Create a VM from template(latest-rhel-guest-image-8.3-infra) and enable cloud-init on RHV.
2. Run the VM.
3. Check sshd service info by running `journalctl -u sshd` after the VM is up.

Actual results:
1. sshd service failed to start the first time, and it got restarted 42s later.

Expected results:
1. sshd service should be started successfully the first time.

Additional info:
This also happens on rhel 8.2 VM.

Comment 1 Huijuan Zhao 2020-10-19 03:27:42 UTC
Can reproduce on OpenStack both in cloud-init-19.4-11.el8.noarch and cloud-init-20.3-1.el8.1.eterrell202010141324.noarch.
Give qa_ack+

Comment 2 Eduardo Otubo 2020-10-19 10:19:16 UTC
ssh_deletekeys does not remove newly generated keys, it deletes keys that are already available on the instance when it is deployed - this should be the default behavior in order to avoid having the same key across multiple instances. The problem here looks like to be a small race between sshd and cloud-init. Cloud-init needs to consume ssh keys at a point in time that they're not yet generated by sshd. Looks like an upstream bug. I'm setting the priority of this BZ to low as it doesn't prevent the startup and/or configuration. But I'll definitely look into this in the future.

Thanks for the complete report!

Comment 6 xiachen 2021-06-23 09:23:28 UTC
Close this bug based on all test PASS.

Test details in below bugs.
BZ #1957532 for RHEL8.5, cloud-init-21.1-2.el8
BZ #1963981 for RHEL8.4.0.z, cloud-init-20.3-10.el8_4.4
BZ #1963982 for RHEL8.2.0.z, cloud-init-18.5-12.el8_2.8
BZ #1970909 for RHEL9.0, cloud-init-21.1-2.el9


Note You need to log in before you can comment on or make changes to this bug.