Bug 1558641 - cloud-init creates bogus metadata route preventing metadata setup
Summary: cloud-init creates bogus metadata route preventing metadata setup
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: cloud-init
Version: 28
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Garrett Holmstrom
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard: AcceptedBlocker
Depends On:
Blocks: F28BetaBlocker
Reported: 2018-03-20 16:34 UTC by Kevin Fenzi
Modified: 2018-03-29 19:22 UTC
CC List: 13 users

Fixed In Version: cloud-init-17.1-4.fc28
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-03-29 19:22:03 UTC


Attachments
boot logs from cloud instance (70.87 KB, text/plain)
2018-03-20 21:50 UTC, Kevin Fenzi

Description Kevin Fenzi 2018-03-20 16:34:01 UTC
Using https://kojipkgs.fedoraproject.org//work/tasks/5585/25835585/Fedora-Cloud-Base-28-20180320.n.0.x86_64.qcow2, the instance boots and cloud-init runs, but cloud-init creates a bogus route that sends metadata-service traffic directly onto the local link instead of through the gateway (which of course fails).

[  OK  ] Reached target Network.
[   17.249056] cloud-init[924]: Cloud-init v. 17.1 running 'init' at Tue, 20 Mar 2018 16:26:32 +0000. Up 10.62 seconds.
[   17.250631] cloud-init[924]: ci-info: ++++++++++++++++++++++++++++++Net device info+++++++++++++++++++++++++++++++
[   17.252014] cloud-init[924]: ci-info: +--------+------+--------------+---------------+-------+-------------------+
[   17.253460] cloud-init[924]: ci-info: | Device |  Up  |   Address    |      Mask     | Scope |     Hw-Address    |
[   17.254789] cloud-init[924]: ci-info: +--------+------+--------------+---------------+-------+-------------------+
[   17.256118] cloud-init[924]: ci-info: | eth0:  | True | 172.25.64.52 | 255.255.240.0 |   .   | fa:16:3e:11:cb:02 |
[   17.257429] cloud-init[924]: ci-info: | eth0:  | True |      .       |       .       |   d   | fa:16:3e:11:cb:02 |
[   17.258761] cloud-init[924]: ci-info: |  lo:   | True |  127.0.0.1   |   255.0.0.0   |   .   |         .         |
[   17.260064] cloud-init[924]: ci-info: |  lo:   | True |      .       |       .       |   d   |         .         |
[   17.261368] cloud-init[924]: ci-info: +--------+------+--------------+---------------+-------+-------------------+
[   17.262745] cloud-init[924]: ci-info: +++++++++++++++++++++++++++++Route IPv4 info+++++++++++++++++++++++++++++
[   17.264081] cloud-init[924]: ci-info: +-------+-------------+-------------+---------------+-----------+-------+
[   17.265400] cloud-init[924]: ci-info: | Route | Destination |   Gateway   |    Genmask    | Interface | Flags |
[   17.266710] cloud-init[924]: ci-info: +-------+-------------+-------------+---------------+-----------+-------+
[   17.268085] cloud-init[924]: ci-info: |   0   |   0.0.0.0   | 172.25.64.1 |    0.0.0.0    |    eth0   |   UG  |
[   17.269401] cloud-init[924]: ci-info: |   1   | 169.254.0.0 |   0.0.0.0   |  255.255.0.0  |    eth0   |   U   |
[   17.271197] cloud-init[924]: ci-info: |   2   | 172.25.64.0 |   0.0.0.0   | 255.255.240.0 |    eth0   |   U   |
[   17.272526] cloud-init[924]: ci-info: +-------+-------------+-------------+---------------+-----------+-------+
[   17.273895] cloud-init[924]: 2018-03-20 16:26:38,796 - url_helper.py[WARNING]: Calling 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [2/120s]: request error [HTTPConnectionPool(host='169.254.169.254', port=80): Max retries exceeded with url: /2009-04-04/meta-data/instance-id (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fb01c7aaeb8>: Failed to establish a new connection: [Errno 113] No route to host',))]
[   20.320776] cloud-init[924]: 2018-03-20 16:26:41,868 - url_helper.py[WARNING]: Calling 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [5/120s]: request error [HTTPConnectionPool(host='169.254.169.254', port=80): Max retries exceeded with url: /2009-04-04/meta-data/instance-id (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fb01c7c8710>: Failed to establish a new connection: [Errno 113] No route to host',))]
...tons more of those... 
[  194.551942] cloud-init[924]: 2018-03-20 16:28:42,435 - DataSourceEc2.py[CRITICAL]: Giving up on md from ['http://169.254.169.254/2009-04-04/meta-data/instance-id'] after 126 seconds
[  194.559970] cloud-init[924]: 2018-03-20 16:28:42,443 - url_helper.py[WARNING]: Calling 'http://172.25.64.3/latest/meta-data/instance-id' failed [0/120s]: request error [HTTPConnectionPool(host='172.25.64.3', port=80): Max retries exceeded with url: /latest/meta-data/instance-id (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fb01c7c8780>: Failed to establish a new connection: [Errno 111] Connection refused',))]
...

At this point the instance is up and can be pinged, but since cloud-init couldn't reach the metadata service, no SSH keys are set up, so you cannot log in.

Eventually cloud-init times out again and sshd starts, but no SSH keys have been injected, so you still cannot log in.

Note that this is a very old OpenStack (RHOS 5). It would be good to know whether this problem also happens on newer clouds.
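The failure follows from longest-prefix matching. In the route table in the log above, 169.254.169.254 falls inside both the default route (0.0.0.0/0 via 172.25.64.1) and the on-link 169.254.0.0/16 route; the longer /16 prefix wins, so the instance tries to reach the metadata address directly on eth0 rather than through the gateway, which is consistent with the `[Errno 113] No route to host` errors. The shell sketch below models that selection. The helper names are hypothetical, and the model deliberately ignores metrics, policy routing, and any host routes a DHCP server might push.

```shell
#!/bin/sh
# Simplified model of IPv4 longest-prefix-match route selection.
# Hypothetical helpers; metrics, policy routing and DHCP host routes are ignored.

# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
    set -- $(echo "$1" | tr '.' ' ')
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Netmask for a prefix length as an integer, e.g. 16 -> 4294901760 (0xFFFF0000).
prefix_mask() {
    echo $(( (0xFFFFFFFF << (32 - $1)) & 0xFFFFFFFF ))
}

# Read "prefix/len rest-of-route" lines on stdin and print the route whose
# prefix contains address $1 and is the longest such prefix.
lookup_route() {
    target=$(ip_to_int "$1")
    best_len=-1
    best_line=
    while read -r net rest; do
        len=${net#*/}
        mask=$(prefix_mask "$len")
        netint=$(ip_to_int "${net%/*}")
        if [ $(( target & mask )) -eq $(( netint & mask )) ] &&
           [ "$len" -gt "$best_len" ]; then
            best_len=$len
            best_line="$net $rest"
        fi
    done
    echo "$best_line"
}

# Routes from the ci-info table in the log (genmask 255.255.240.0 is /20).
routes='0.0.0.0/0 via 172.25.64.1 dev eth0
169.254.0.0/16 dev eth0
172.25.64.0/20 dev eth0'

printf '%s\n' "$routes" | lookup_route 169.254.169.254
# prints: 169.254.0.0/16 dev eth0
```

Dropping the 169.254.0.0/16 line from `routes` makes the same lookup return the default route via 172.25.64.1, which is presumably the path the metadata request takes on an image without the zeroconf route.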

Comment 1 Fedora Blocker Bugs Application 2018-03-20 16:39:46 UTC
Proposed as a Blocker for 28-beta by Fedora user kevin using the blocker tracking app because:

"The cloud-init package must be functional for release blocking cloud images."

Comment 2 Adam Williamson 2018-03-20 17:06:00 UTC
+1 blocker, obviously hits the criterion right on the nose.

Comment 3 Garrett Holmstrom 2018-03-20 21:03:51 UTC
cloud-init has a disable_ec2_metadata switch one can use to block access to the EC2 metadata service, but it shouldn't be enabled by default and it *definitely* shouldn't be applied this early in the boot process.  Do you happen to have the rest of the VM's boot-time output handy?

Comment 4 Kevin Fenzi 2018-03-20 21:50:55 UTC
Created attachment 1410821
boot logs from cloud instance

Here are the logs...

Comment 5 Stephen Gallagher 2018-03-21 17:20:39 UTC
+1 blocker if this is indeed true for all installations. If it turns out that it's limited to old versions of OpenStack, I'll revise that.

Comment 6 Adam Williamson 2018-03-21 17:32:34 UTC
Agreed with Stephen there; I'm +1 assuming it's a general failure (has anyone tested EC2 yet?)

Comment 7 Mohan Boddu 2018-03-21 17:47:28 UTC
Has anyone tested with other/newer clouds? If it's happening on every instance, then

+1 Blocker

Comment 8 Adam Williamson 2018-03-21 22:29:53 UTC
That's +3, setting accepted - if it's shown that this doesn't happen on other clouds, I will drop accepted status for a revote.

Comment 9 Patrick Uiterwijk 2018-03-21 23:00:08 UTC
The route also gets added on EC2, although due to networking differences between OpenStack and EC2, it does not actually break metadata gathering there.
Regardless, this means that the networking behaviour changed, and it breaks on clouds that aren't set up for these routes.


Cloud image from Fedora-28-20180321.n.0:
Mar 21 22:52:14 ip-172-30-2-30.ec2.internal cloud-init[957]: ci-info: ++++++++++++++++++++++++++++Route IPv4 info+++++++++++++++++++++++++++++
Mar 21 22:52:14 ip-172-30-2-30.ec2.internal cloud-init[957]: ci-info: +-------+-------------+------------+---------------+-----------+-------+
Mar 21 22:52:14 ip-172-30-2-30.ec2.internal cloud-init[957]: ci-info: | Route | Destination |  Gateway   |    Genmask    | Interface | Flags |
Mar 21 22:52:14 ip-172-30-2-30.ec2.internal cloud-init[957]: ci-info: +-------+-------------+------------+---------------+-----------+-------+
Mar 21 22:52:14 ip-172-30-2-30.ec2.internal cloud-init[957]: ci-info: |   0   |   0.0.0.0   | 172.30.2.1 |    0.0.0.0    |    eth0   |   UG  |
Mar 21 22:52:14 ip-172-30-2-30.ec2.internal cloud-init[957]: ci-info: |   1   | 169.254.0.0 |  0.0.0.0   |  255.255.0.0  |    eth0   |   U   |
Mar 21 22:52:14 ip-172-30-2-30.ec2.internal cloud-init[957]: ci-info: |   2   |  172.30.2.0 |  0.0.0.0   | 255.255.255.0 |    eth0   |   U   |
Mar 21 22:52:14 ip-172-30-2-30.ec2.internal cloud-init[957]: ci-info: +-------+-------------+------------+---------------+-----------+-------+


Current live Fedora 27 cloud image:
Mar 21 22:56:42 ip-172-30-2-41.ec2.internal cloud-init[825]: ci-info: ++++++++++++++++++++++++++++Route IPv4 info+++++++++++++++++++++++++++++
Mar 21 22:56:42 ip-172-30-2-41.ec2.internal cloud-init[825]: ci-info: +-------+-------------+------------+---------------+-----------+-------+
Mar 21 22:56:42 ip-172-30-2-41.ec2.internal cloud-init[825]: ci-info: | Route | Destination |  Gateway   |    Genmask    | Interface | Flags |
Mar 21 22:56:42 ip-172-30-2-41.ec2.internal cloud-init[825]: ci-info: +-------+-------------+------------+---------------+-----------+-------+
Mar 21 22:56:42 ip-172-30-2-41.ec2.internal cloud-init[825]: ci-info: |   0   |   0.0.0.0   | 172.30.2.1 |    0.0.0.0    |    eth0   |   UG  |
Mar 21 22:56:42 ip-172-30-2-41.ec2.internal cloud-init[825]: ci-info: |   1   |  172.30.2.0 |  0.0.0.0   | 255.255.255.0 |    eth0   |   U   |
Mar 21 22:56:42 ip-172-30-2-41.ec2.internal cloud-init[825]: ci-info: +-------+-------------+------------+---------------+-----------+-------+
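The two tables above differ only in the row with destination 169.254.0.0 and genmask 255.255.0.0. When comparing many boot logs, a small filter can flag that row automatically. The sketch below assumes the pipe-separated `ci-info` layout shown in this bug; the function name and the log file name in the example are hypothetical.

```shell
#!/bin/sh
# Hypothetical filter: succeeds if a cloud-init "Route IPv4 info" table on
# stdin has a row with Destination 169.254.0.0 and Genmask 255.255.0.0.
# Assumes the pipe-separated ci-info layout shown in this bug.
ciinfo_has_zeroconf() {
    awk -F'|' '
        {
            # Field 1 is the log prefix before the first "|";
            # field 3 is Destination and field 5 is Genmask.
            dest = $3; mask = $5
            gsub(/ /, "", dest); gsub(/ /, "", mask)
            if (dest == "169.254.0.0" && mask == "255.255.0.0") found = 1
        }
        END { exit !found }
    '
}

# Example against a saved boot log (hypothetical file name):
#   ciinfo_has_zeroconf < boot.log && echo "zeroconf route present"
```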

Comment 10 Kevin Fenzi 2018-03-21 23:42:48 UTC
So, after some debug images and poking around I am pretty sure the bug is this: 

- cloud-init overwrites our /etc/sysconfig/network file with: 

# Created by cloud-init on instance boot automatically, do not edit.
#
NETWORKING=yes

- This means that 2 lines we add in the kickstart are gone: 

NOZEROCONF=yes
DEVTIMEOUT=10

- The network service starts, and since NOZEROCONF is not set, ifup-eth adds the zeroconf route:
# Add Zeroconf route.
if [ -z "${NOZEROCONF}" -a "${ISALIAS}" = "no" -a "${REALDEVICE}" != "lo" ]; then
    ip route add 169.254.0.0/16 dev ${REALDEVICE} metric $((1000 + $(cat /sys/class/net/${REALDEVICE}/ifindex))) scope link
fi

- Now the metadata route is hosed: 169.254.169.254 falls inside the new on-link 169.254.0.0/16 route.
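The ifup-eth condition quoted above can be exercised without a VM by lifting it into a function. This is a hypothetical helper, not part of initscripts: it takes the ifindex as an argument instead of reading /sys/class/net/<dev>/ifindex, and the example assumes eth0 has ifindex 2 (typical when lo is 1), which gives a route metric of 1000 + 2 = 1002.

```shell
#!/bin/sh
# Hypothetical re-statement of the zeroconf check from ifup-eth.
# Args: NOZEROCONF value, ISALIAS ("yes"/"no"), device name, ifindex.
# Prints the `ip route add` command ifup-eth would run, or nothing.
zeroconf_route_cmd() {
    nozeroconf=$1 isalias=$2 dev=$3 ifindex=$4
    if [ -z "$nozeroconf" ] && [ "$isalias" = "no" ] && [ "$dev" != "lo" ]; then
        # Same metric formula as ifup-eth: 1000 + the interface index.
        echo "ip route add 169.254.0.0/16 dev $dev metric $((1000 + ifindex)) scope link"
    fi
}

# cloud-init's rewritten /etc/sysconfig/network leaves NOZEROCONF empty:
zeroconf_route_cmd "" no eth0 2
# prints: ip route add 169.254.0.0/16 dev eth0 metric 1002 scope link

# With the kickstart's NOZEROCONF=yes still present, nothing is printed:
zeroconf_route_cmd yes no eth0 2
```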

Comment 11 Adam Williamson 2018-03-26 15:37:02 UTC
So, how are we gonna fix it?
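Comment 10 traces the root cause to cloud-init replacing /etc/sysconfig/network wholesale, which drops NOZEROCONF=yes and DEVTIMEOUT=10. One way to avoid that class of bug is to update individual keys in place rather than rewriting the file. The sketch below is a hypothetical shell illustration of that idea only; it is not the patch that shipped in cloud-init-17.1-3/-4 (cloud-init itself is written in Python), and the function name is made up. Rerunning it does not append duplicate lines, which matters for anything that may run more than once.

```shell
#!/bin/sh
# Hypothetical helper, not cloud-init's actual fix: set KEY=VALUE in a
# sysconfig-style file while keeping all other lines (for example the
# kickstart's NOZEROCONF=yes and DEVTIMEOUT=10). An existing KEY= line is
# replaced, otherwise one is appended, so rerunning it adds no duplicates.
set_sysconfig_key() {
    file=$1 key=$2 value=$3
    tmp=$(mktemp) || return 1
    if [ -f "$file" ]; then
        awk -v k="$key" -v v="$value" '
            index($0, k "=") == 1 { if (!done) print k "=" v; done = 1; next }
            { print }
            END { if (!done) print k "=" v }
        ' "$file" > "$tmp" || { rm -f "$tmp"; return 1; }
    else
        printf '%s=%s\n' "$key" "$value" > "$tmp"
    fi
    # Copy back instead of mv so the original file's mode and owner are kept.
    cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Example:
#   set_sysconfig_key /etc/sysconfig/network NETWORKING yes
```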

Comment 12 Fedora Update System 2018-03-26 20:47:36 UTC
cloud-init-17.1-3.fc28 has been submitted as an update to Fedora 28. https://bodhi.fedoraproject.org/updates/FEDORA-2018-86137b2be8

Comment 13 Adam Williamson 2018-03-27 16:12:15 UTC
The fix for this was pulled into the Beta-1.1 (Beta RC1) compose. Can anyone confirm the fix in the Cloud images from that compose? Thanks.
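For anyone checking the Beta-1.1 cloud images, one quick test is whether /etc/sysconfig/network still defines NOZEROCONF after cloud-init has run. Because ifup-eth only checks `-z "${NOZEROCONF}"`, the sketch below sources the file in a subshell and tests for a non-empty value, rather than grepping for a literal `NOZEROCONF=yes`; this assumes the initscripts read the file as shell variable assignments. The function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical check: succeeds if sourcing the given sysconfig file leaves
# NOZEROCONF non-empty, i.e. ifup-eth's [ -z "${NOZEROCONF}" ] test would
# skip adding the 169.254.0.0/16 route.
nozeroconf_set() {
    case $1 in
        */*) src=$1 ;;
        *) src=./$1 ;;   # "." would search PATH for a bare file name
    esac
    # Source in a subshell so the file's variables do not leak out.
    ( unset NOZEROCONF; . "$src" >/dev/null 2>&1; [ -n "${NOZEROCONF:-}" ] )
}

# Example on a booted instance:
#   nozeroconf_set /etc/sysconfig/network && echo "NOZEROCONF preserved"
#   ip -4 route show 169.254.0.0/16   # should print nothing on a fixed image
```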

Comment 14 Fedora Update System 2018-03-27 17:52:26 UTC
cloud-init-17.1-4.fc28 has been pushed to the Fedora 28 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2018-86137b2be8

Comment 16 Geoffrey Marr 2018-03-28 17:39:23 UTC
Tested with Fedora-Cloud-Base-28_Beta-1.1.x86_64 on EC2 and locally with testcloud, boot and login work OK!

Comment 17 Fedora Update System 2018-03-29 19:22:03 UTC
cloud-init-17.1-4.fc28 has been pushed to the Fedora 28 stable repository. If problems still persist, please make note of it in this bug report.

