Bug 1789719

Summary: [cloud-init][rhel-8.2.0] Failed to write ssh keys to authorized_keys for cloud-user for OpenStack instance
Product: Red Hat Enterprise Linux 8 Reporter: Huijuan Zhao <huzhao>
Component: cloud-init Assignee: Eduardo Otubo <eterrell>
Status: CLOSED NEXTRELEASE QA Contact: Huijuan Zhao <huzhao>
Severity: high Docs Contact:
Priority: high    
Version: 8.2 CC: ailan, berrange, eterrell, jgreguske, jwboyer, knoel, ribarry, vkuznets, vpoliset, xiachen, yacao, yoguo, yuxisun
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-04-21 13:00:11 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1803928, 1831646    
Bug Blocks:    
Attachments:
Description Flags
cloud-init.log none

Description Huijuan Zhao 2020-01-10 09:18:06 UTC
Created attachment 1651206
cloud-init.log

Description of problem:
When an instance with nested virtualization enabled is launched on OpenStack, cloud-init fails to write the ssh keys to authorized_keys, so login to the instance fails.

After the instance is rebooted, the ssh keys are written successfully and login succeeds.


Version-Release number of selected component (if applicable):
RHEL-8.2.0-20191219.0 (kernel-4.18.0-167.el8.x86_64)
cloud-init-18.5-8.el8.x86_64

ENV: OpenStack

How reproducible:
50%
Tested 6 times, reproduced 3 times

Steps to Reproduce:
1. Launch an instance on OpenStack with flavor ci.nested.virt.m1.medium (this flavor enables nested virtualization)
2. Log in to the instance
    # ssh cloud-user@IP

Actual results:
After step 2, login fails:
$ ssh cloud-user@IP
cloud-user@IP: Permission denied (publickey,gssapi-keyex,gssapi-with-mic).

Check the console log:
[  OK  ] Started OpenSSH server daemon.
[   19.706797] cloud-init[913]: Cloud-init v. 18.5 running 'modules:config' at Wed, 08 Jan 2020 09:51:36 +0000. Up 19.57 seconds.
[  OK  ] Started Apply the settings specified in cloud-config.
         Starting Execute cloud user/final scripts...
ci-info: no authorized ssh keys fingerprints found for user cloud-user.
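
The "no authorized ssh keys fingerprints" message can be cross-checked from a console session on the affected instance by looking at the key file itself. The following is only a minimal sketch: the function name authorized_keys_state is hypothetical, and the default path is an assumption taken from the cloud-init.log line quoted in this report.

```shell
# Hypothetical helper (not part of cloud-init): report whether a user's
# authorized_keys file is missing, empty, or has content.
# The default path is assumed from the cloud-init.log line in this report.
authorized_keys_state() {
    keys=${1:-/home/cloud-user/.ssh/authorized_keys}
    if [ ! -e "$keys" ]; then
        echo "missing"
    elif [ ! -s "$keys" ]; then
        # File exists but has zero bytes
        echo "empty"
    else
        echo "populated"
    fi
}
```

On a failed first boot this would be expected to print "empty"; after the reboot described above it should print "populated".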

Check the /var/log/cloud-init.log:
--------
206 2020-01-08 09:51:34,681 - url_helper.py[DEBUG]: [0/1] open 'http://169.254.169.254/openstack' with {'url': 'http://169.254.169.254/openstack', 'allow_redirects': True, 'method': 'GET', 'timeout': 10.0, 'headers': {'User-Agent': 'Cloud-Init/18.5'}} configuration
207 2020-01-08 09:51:34,685 - url_helper.py[DEBUG]: Calling 'http://169.254.169.254/openstack' failed [0/-1s]: request error [HTTPConnectionPool(host='169.254.169.254', port=80): Max retries exceeded with url: /openstack (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f4da6c41978>: Failed to establish a new connection: [Errno 101] Network is unreachable',))]
208 2020-01-08 09:51:34,685 - DataSourceOpenStack.py[DEBUG]: Giving up on OpenStack md from ['http://169.254.169.254/openstack'] after 0 seconds
209 2020-01-08 09:51:34,685 - util.py[WARNING]: No active metadata service found
210 2020-01-08 09:51:34,685 - util.py[DEBUG]: No active metadata service found
--------
541 2020-01-08 09:51:35,668 - util.py[DEBUG]: Writing to /home/cloud-user/.ssh/authorized_keys - wb: [600] 0 bytes
--------
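
The two symptoms in the log excerpt above (metadata service not found, and authorized_keys written with 0 bytes) can be searched for mechanically. The following is a minimal sketch, assuming the log line format shown above; the function name scan_cloud_init_log is hypothetical, and the patterns may not match other cloud-init versions.

```shell
#!/bin/sh
# Hypothetical helper (not part of cloud-init): scan a cloud-init log for the
# two symptoms quoted in this report. Patterns are copied from the excerpt
# above and are an assumption for any other cloud-init version.
scan_cloud_init_log() {
    log=${1:-/var/log/cloud-init.log}
    status=0

    # Symptom 1: the OpenStack metadata service was not reachable.
    if grep -q 'No active metadata service found' "$log"; then
        echo "metadata service not found"
        status=1
    fi

    # Symptom 2: authorized_keys was written with zero bytes of content.
    if grep -Eq 'Writing to .*/\.ssh/authorized_keys - wb: \[[0-9]+\] 0 bytes' "$log"; then
        echo "authorized_keys written empty"
        status=1
    fi

    # Returns 0 when neither symptom is present, 1 otherwise.
    return "$status"
}
```

Example: `scan_cloud_init_log /var/log/cloud-init.log` run on an affected instance; the return status is non-zero if either symptom is found.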


Expected results:
After step 2, login to the instance should succeed.

Additional info:
1. Scenario 1:  RHEL-8.2 + flavor ci.nested.virt.m1.medium
     The issue can be reproduced
2. Scenario 2:  RHEL-8.2 + other flavors with nested virtualization disabled (such as m1.medium or ci.m1.micro)
     The issue can NOT be reproduced
3. Scenario 3:  RHEL-7.8 + flavor ci.nested.virt.m1.medium
     The issue can NOT be reproduced

Comment 2 Huijuan Zhao 2020-02-04 10:15:25 UTC
This issue also occurs when nested virtualization is disabled.

Test ENV:
RHEL-8.2.0-20191219.0-x86_64 & cloud-init-18.5-9.el8 + Flavor(m1.medium)

For detailed test steps, please refer to https://bugzilla.redhat.com/show_bug.cgi?id=1785648#c3

Comment 3 Huijuan Zhao 2020-02-05 09:51:36 UTC
Update with more test scenarios:

1. Scenario 1:  RHEL-8.2 (4.18.0-167.el8.x86_64) + cloud-init-18.5-8.el8 + flavor m1.medium
     The issue can be reproduced (reproduction rate: about 50%)
2. Scenario 2:  RHEL-8.2 (4.18.0-167.el8.x86_64) + cloud-init-18.5-9.el8 + flavor m1.medium
     The issue can be reproduced (tested 2 times, reproduced both times)
3. Scenario 3:  RHEL-7.8 (3.10.0-1121.el7.x86_64) + cloud-init-18.5-6.el7 + flavors ci.nested.virt.m1.medium, m1.medium, ci.m1.micro
     The issue can NOT be reproduced (tested 4 times, did not reproduce)


Test platform: OpenStack

Test steps:
1. Launch instance A with image RHEL-8.2.0-20191219.0-x86_64 & cloud-init-18.5-8.el8, flavor m1.medium.
Login to instance A as cloud-user succeeds.

2. Log in to instance A and clean it up with the commands below
# rm -rf /var/lib/cloud /var/log/cloud-init* /var/log/messages /etc/sudoers.d/* /mnt/swapfile /var/lib/NetworkManager/dhclient-* /etc/resolv.conf
# userdel -rf cloud-user
# hostnamectl set-hostname localhost.localdomain
# sed -i '/cloud-user/d' /etc/sudoers

3. Create an image from instance A
# nova stop InstanceA    
# nova image-create --poll InstanceA imagename

4. Launch a new instance B from the image created in step 3.

5. Login to instance B as cloud-user fails:
# ssh cloud-user@IP
cloud-user@IP: Permission denied (publickey,gssapi-keyex,gssapi-with-mic).

6. Reboot instance B; login as cloud-user then succeeds.


Results from the above steps:
RHEL-8.2 (both cloud-init-18.5-8.el8 and cloud-init-18.5-9.el8) reproduces the issue at a rate of about 50%.
RHEL-7.8 did NOT reproduce the issue.

Eduardo, if you need the test ENV to debug the issue, please ping me anytime, thanks!