Bug 1707099
Summary: Unable to login to OC BM instance booted with overcloud-full image with cloud-user
Product: Red Hat OpenStack
Component: openstack-tripleo-heat-templates
Version: 15.0 (Stein)
Target Release: 15.0 (Stein)
Target Milestone: beta
Status: CLOSED ERRATA
Severity: high
Priority: high
Keywords: Regression, Triaged
Reporter: Alexander Chuzhoy <sasha>
Assignee: Bob Fournier <bfournie>
QA Contact: Alexander Chuzhoy <sasha>
CC: apevec, bfournie, dalvarez, derekh, dsneddon, dtantsur, hjensas, lhh, lmartins, mbooth, mburns, mlammon, racedoro, rpittau
Hardware: Unspecified
OS: Unspecified
Fixed In Version: openstack-tripleo-heat-templates-10.5.1-0.20190528190414.f41e7e4.1.el8ost
Doc Type: No Doc Update
Type: Bug
Last Closed: 2019-09-21 11:21:45 UTC
Description
Alexander Chuzhoy
2019-05-06 19:47:23 UTC
On the same setup I booted a regular instance with the cirros image and was able to log in as the cirros user using the public key provided with --key-name.

Sasha - on the instance we should see cloud-init adding the cloud-user, for example (on another system):

[cloud-user@instance1 log]$ sudo grep cloud-user /var/log/cloud-init.log
2019-04-10 20:28:41,370 - __init__.py[DEBUG]: Adding user cloud-user
2019-04-10 20:28:41,371 - util.py[DEBUG]: Running hidden command to protect sensitive input/output logstring: ['useradd', 'cloud-user', '--comment', 'Cloud User', '--groups', 'adm,systemd-journal,wheel', '--shell', '/bin/bash', '-m']
2019-04-10 20:28:41,481 - util.py[DEBUG]: Running command ['passwd', '-l', 'cloud-user'] with allowed return codes [0] (shell=False, capture=True)
2019-04-10 20:28:41,565 - util.py[DEBUG]: Restoring selinux mode for /home/cloud-user (recursive=True)
2019-04-10 20:28:41,567 - util.py[DEBUG]: Restoring selinux mode for /home/cloud-user/.ssh (recursive=False)
2019-04-10 20:28:41,567 - util.py[DEBUG]: Changing the ownership of /home/cloud-user/.ssh to 1000:1001
2019-04-10 20:28:41,580 - util.py[DEBUG]: Restoring selinux mode for /home/cloud-user/.ssh (recursive=True)
2019-04-10 20:28:41,581 - util.py[DEBUG]: Restoring selinux mode for /home/cloud-user/.ssh (recursive=False)
2019-04-10 20:28:41,582 - util.py[DEBUG]: Writing to /home/cloud-user/.ssh/authorized_keys - wb: [384] 433 bytes
2019-04-10 20:28:41,583 - util.py[DEBUG]: Restoring selinux mode for /home/cloud-user/.ssh/authorized_keys (recursive=False)
2019-04-10 20:28:41,584 - util.py[DEBUG]: Restoring selinux mode for /home/cloud-user/.ssh/authorized_keys (recursive=False)
2019-04-10 20:28:41,584 - util.py[DEBUG]: Changing the ownership of /home/cloud-user/.ssh/authorized_keys to 1000:1001
2019-04-10 20:28:41,585 - util.py[DEBUG]: Restoring selinux mode for /home/cloud-user/.ssh (recursive=True)
2019-04-10 20:28:44,462 - util.py[DEBUG]: Restoring selinux mode for /home/cloud-user/.ssh (recursive=True)
2019-04-10 20:28:44,463 - util.py[DEBUG]: Reading from /home/cloud-user/.ssh/authorized_keys (quiet=False)
2019-04-10 20:28:44,463 - util.py[DEBUG]: Read 433 bytes from /home/cloud-user/.ssh/authorized_keys

When you're logged in as root, can you retrieve /var/log/cloud-init.log? If you can't get that file, can you check whether the cloud-init package is installed?

Created attachment 1566779 [details]
cloud-init log
On sasha's system it looks like the cloud-user is being set up fine, but 0 bytes are written to /home/cloud-user/.ssh/authorized_keys:

[root@host-192-168-24-97 ~]# grep cloud-user /var/log/cloud-init.log
2019-05-10 18:46:09,239 - __init__.py[DEBUG]: Adding user cloud-user
2019-05-10 18:46:09,239 - util.py[DEBUG]: Running hidden command to protect sensitive input/output logstring: ['useradd', 'cloud-user', '--comment', 'Cloud User', '--groups', 'adm,systemd-journal,wheel', '--shell', '/bin/bash', '-m']
2019-05-10 18:46:09,637 - util.py[DEBUG]: Running command ['passwd', '-l', 'cloud-user'] with allowed return codes [0] (shell=False, capture=True)
2019-05-10 18:46:09,658 - util.py[DEBUG]: Restoring selinux mode for /home/cloud-user (recursive=True)
2019-05-10 18:46:09,660 - util.py[DEBUG]: Restoring selinux mode for /home/cloud-user/.ssh (recursive=False)
2019-05-10 18:46:09,660 - util.py[DEBUG]: Changing the ownership of /home/cloud-user/.ssh to 1000:1001
2019-05-10 18:46:09,661 - util.py[DEBUG]: Restoring selinux mode for /home/cloud-user/.ssh (recursive=True)
2019-05-10 18:46:09,662 - util.py[DEBUG]: Restoring selinux mode for /home/cloud-user/.ssh (recursive=False)
2019-05-10 18:46:09,662 - util.py[DEBUG]: Writing to /home/cloud-user/.ssh/authorized_keys - wb: [600] 0 bytes <======
2019-05-10 18:46:09,662 - util.py[DEBUG]: Restoring selinux mode for /home/cloud-user/.ssh/authorized_keys (recursive=False)
2019-05-10 18:46:09,663 - util.py[DEBUG]: Restoring selinux mode for /home/cloud-user/.ssh/authorized_keys (recursive=False)
2019-05-10 18:46:09,663 - util.py[DEBUG]: Changing the ownership of /home/cloud-user/.ssh/authorized_keys to 1000:1001
2019-05-10 18:46:09,663 - util.py[DEBUG]: Restoring selinux mode for /home/cloud-user/.ssh (recursive=True)
2019-05-10 18:46:11,003 - util.py[DEBUG]: Restoring selinux mode for /home/cloud-user/.ssh (recursive=True)
2019-05-10 18:46:11,004 - util.py[DEBUG]: Reading from /home/cloud-user/.ssh/authorized_keys (quiet=False)
2019-05-10 18:46:11,004 - util.py[DEBUG]: Read 0 bytes from /home/cloud-user/.ssh/authorized_keys <=====

Looks like an issue getting metadata for sasha's instance:

cloud-init-output.log:2019-05-10 18:45:49,439 - url_helper.py[WARNING]: Calling 'http://192.168.24.1/latest/meta-data/instance-id' failed [101/120s]: request error [HTTPConnectionPool(host='192.168.24.1', port=80): Max retries exceeded with url: /latest/meta-data/instance-id (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7f28a44d8390>, 'Connection to 192.168.24.1 timed out. (connect timeout=50.0)'))]
cloud-init-output.log:2019-05-10 18:46:07,460 - url_helper.py[WARNING]: Calling 'http://192.168.24.1/latest/meta-data/instance-id' failed [119/120s]: request error [HTTPConnectionPool(host='192.168.24.1', port=80): Max retries exceeded with url: /latest/meta-data/instance-id (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7f28a44d8ba8>, 'Connection to 192.168.24.1 timed out. (connect timeout=17.0)'))]

Also, there's no route to the metadata server 169.254.169.254:

[root@host-192-168-24-97 ~]# ip route
default via 192.168.24.1 dev eth0 proto dhcp metric 100
192.168.24.0/24 dev eth0 proto kernel scope link src 192.168.24.97 metric 100

versus what we see on a working instance on OSP-13:

[cloud-user@instance1 ~]$ ip route
default via 172.21.2.1 dev em1 proto dhcp metric 100
169.254.169.254 via 172.21.2.1 dev em1 proto dhcp metric 100
172.21.2.0/24 dev em1 proto kernel scope link src 172.21.2.206 metric 100

We should be using the config drive, not the metadata service. I wonder if something has changed that prevents config drives from being read. Sasha - can you log onto the instance, mount the config drive (it should be a 64M partition), and see whether the files in there contain your key? Also, can you take a look at ~/.ssh/id_rsa.pub to make sure it isn't an empty file for some reason?

So if the server is created with "--config-drive true", the issue doesn't reproduce.
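The 0-byte authorized_keys write is the key symptom in these logs. As a rough illustration (not part of the original thread; the helper name is hypothetical), a few lines of Python can scan a cloud-init.log for the write size recorded by util.py:

```python
import re

# Matches cloud-init's util.py write line, e.g.:
#   "... Writing to /home/cloud-user/.ssh/authorized_keys - wb: [600] 0 bytes"
WRITE_RE = re.compile(
    r"Writing to (?P<path>\S*authorized_keys) - wb: \[(?P<mode>\d+)\] (?P<size>\d+) bytes"
)

def authorized_keys_writes(log_text):
    """Return (path, size_in_bytes) for every authorized_keys write in a
    cloud-init.log, so a 0-byte write (no SSH key injected) stands out."""
    return [(m.group("path"), int(m.group("size")))
            for m in WRITE_RE.finditer(log_text)]
```

On a healthy instance this reports the key's actual length (433 bytes in the working example above); on the broken one it reports 0.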
Thanks for testing, Sasha. Derek found that force_config_drive is set to false in OSP-15 where previously it was true; this comes from the OVN change https://review.opendev.org/#/c/502943/. The release note for this change indicates:

features:
  - Adds ability to configure metadata agent for networking-ovn based deployments.
upgrade:
  - force_config_drive is now set to False in Nova. Instances will now fetch their metadata from the metadata service instead of from the config drive.

Using the metadata service isn't working properly. During boot the BM instance's console shows:

[   45.726833] cloud-init[1072]: 2019-05-17 19:08:55,628 - url_helper.py[WARNING]: Calling 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [7/120s]: bad status code [404]
[   47.777999] cloud-init[1072]: 2019-05-17 19:08:57,679 - url_helper.py[WARNING]: Calling 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [9/120s]: bad status code [404]
[   49.833946] cloud-init[1072]: 2019-05-17 19:08:59,735 - url_helper.py[WARNING]: Calling 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [11/120s]: bad status code [404]
[   51.884542] cloud-init[1072]: 2019-05-17 19:09:01,785 - url_helper.py[WARNING]: Calling 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [13/120s]: bad status code [404]
[   53.931933] cloud-init[1072]: 2019-05-17 19:09:03,833 - url_helper.py[WARNING]: Calling 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [15/120s]: bad status code [404]
[   56.997295] cloud-init[1072]: 2019-05-17 19:09:06,898 - url_helper.py[WARNING]: Calling 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [18/120s]: bad status code [404]
[   60.046866] cloud-init[1072]: 2019-05-17 19:09:09,948 - url_helper.py[WARNING]: Calling 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [21/120s]: bad status code [404]
[   63.103655] cloud-init[1072]: 2019-05-17 19:09:13,004 - url_helper.py[WARNING]: Calling 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [24/120s]: bad status code [404]
[   66.156222] cloud-init[1072]: 2019-05-17 19:09:16,057 - url_helper.py[WARNING]: Calling 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [28/120s]: bad status code [404]
[   69.208272] cloud-init[1072]: 2019-05-17 19:09:19,109 - url_helper.py[WARNING]: Calling 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [31/120s]: bad status code [404]
[   73.262001] cloud-init[1072]: 2019-05-17 19:09:23,163 - url_helper.py[WARNING]: Calling 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [35/120s]: bad status code [404]
[   77.490188] cloud-init[1072]: 2019-05-17 19:09:27,391 - url_helper.py[WARNING]: Calling 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [39/120s]: bad status code [404]
[   81.543769] cloud-init[1072]: 2019-05-17 19:09:31,444 - url_helper.py[WARNING]: Calling 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [43/120s]: bad status code [404]

Note: this also affects the boot time of the BM instance - it takes much longer.

Looking at the logs, the instance is indeed trying to use the correct metadata server IP, but since the route rule for 169.254.169.254 is missing it can't reach it. As a consequence, we first get no route to host for 169.254.169.254, and then the server falls back to 192.168.24.1 (the gateway) to retrieve the metadata. The route rule to reach the metadata IP should be added by Neutron during VM creation, specifically by the Neutron DHCP service. It would be interesting to verify that the metadata service is reachable after manually adding the route rule, to exclude any other networking service issues.

Daniel - per Dmitry's comment in https://review.opendev.org/#/c/502943/32/puppet/services/ovn-controller.yaml, is there any downside to reverting the change that sets force_config_drive to false?
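The route comparison above (broken OSP-15 instance vs. working OSP-13 instance) can be checked mechanically. As a sketch, not from the thread itself, this helper parses `ip route` output and reports whether the host route for the link-local metadata address is present:

```python
def has_metadata_route(ip_route_output):
    """Return True if `ip route` output contains a host route for
    169.254.169.254 (normally pushed to the instance by the Neutron
    DHCP agent as a classless static route)."""
    for line in ip_route_output.splitlines():
        fields = line.split()
        # The destination is the first field of each route line.
        if fields and fields[0] == "169.254.169.254":
            return True
    return False
```

Run against the two `ip route` dumps quoted earlier, it returns False for the broken instance and True for the OSP-13 one.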
Looks like force_config_drive is now set to true using the latest compose (5/29):

[root@undercloud-0 etc]# rpm -qa | grep tripleo-heat
openstack-tripleo-heat-templates-10.5.1-0.20190528190414.f41e7e4.1.el8ost.noarch
[root@undercloud-0 etc]# grep -r force_config_drive *
puppet/hieradata/service_configs.json:    "nova::compute::force_config_drive": true,

force_config_drive is also true on the controller nodes:

[heat-admin@controller-0 puppet]$ sudo grep -r force_config_drive *
hieradata/service_configs.json:    "nova::compute::force_config_drive": true,

Env: openstack-tripleo-heat-templates-10.5.1-0.20190528190414.f41e7e4.1.el8ost.noarch

(undercloud) [stack@undercloud-0 ~]$ sudo grep -r force_config_drive /etc/puppet/hieradata/service_configs.json
"nova::compute::force_config_drive": true

The latest test no longer sees the issue; ssh to the instance works.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:2811
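The verification step above (grepping the TripleO hieradata for force_config_drive) can also be done by reading the JSON directly. A minimal sketch, assuming the service_configs.json path seen in this bug; the helper name is hypothetical:

```python
import json

def force_config_drive_enabled(hieradata_path):
    """Return True if nova::compute::force_config_drive is true in the
    given TripleO hieradata file, e.g.
    /etc/puppet/hieradata/service_configs.json on a compute node."""
    with open(hieradata_path) as f:
        data = json.load(f)
    return bool(data.get("nova::compute::force_config_drive", False))
```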