Bug 1569321 - cloud-init-17.1-4.fc28 sets BOOTPROTO=none when using EC2 private network only
Summary: cloud-init-17.1-4.fc28 sets BOOTPROTO=none when using EC2 private network only
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: cloud-init
Version: 28
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Garrett Holmstrom
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard: AcceptedFreezeException
Depends On:
Blocks: F28FinalBlocker F28FinalFreezeException
Reported: 2018-04-19 03:58 UTC by Joe Doss
Modified: 2018-04-25 00:02 UTC

Fixed In Version: cloud-init-17.1-5.fc28
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-04-25 00:02:32 UTC
Type: Bug
Embargoed:


Attachments
cloud-init.log (136.70 KB, text/plain)
2018-04-19 22:09 UTC, Lars Kellogg-Stedman
journal.log (190.42 KB, text/x-vhdl)
2018-04-19 22:09 UTC, Lars Kellogg-Stedman

Description Joe Doss 2018-04-19 03:58:03 UTC
Description of problem:

Fedora 28 Atomic Beta (and F28 Beta Cloud) does not connect to the network when the instance has only private IPs. As a result, the EC2 instance fails its connectivity test and you cannot connect to it via SSH. If you create the EC2 instance with a public IP, however, it works just fine.

Looking at the instance volume from a working server, it seems that cloud-init is setting BOOTPROTO=none, so the instance does not DHCP correctly.

[fedora@worker ~]# cat /mnt/f28/etc/redhat-release 
Fedora release 28 (Twenty Eight)
[fedora@worker ~]# cat /mnt/f28/etc/sysconfig/network-scripts/ifcfg-eth0
# Created by cloud-init on instance boot automatically, do not edit.
#
BOOTPROTO=none
DEVICE=eth0
HWADDR=06:83:72:56:7c:66
ONBOOT=yes
TYPE=Ethernet
USERCTL=no
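
For anyone who wants to check a mounted volume for this symptom without reading the file by eye, here is a minimal sketch. It assumes the ifcfg file uses plain KEY=VALUE lines as in the output above. The helper names parse_ifcfg and uses_dhcp are made up for illustration and are not part of cloud-init or initscripts.

```python
# Hypothetical helpers for illustration only; not part of cloud-init.
# Assumes a simple KEY=VALUE ifcfg file like the one quoted in this report.

IFCFG_ETH0 = """\
# Created by cloud-init on instance boot automatically, do not edit.
#
BOOTPROTO=none
DEVICE=eth0
HWADDR=06:83:72:56:7c:66
ONBOOT=yes
TYPE=Ethernet
USERCTL=no
"""


def parse_ifcfg(text):
    """Parse KEY=VALUE lines into a dict, skipping comments and blank lines."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip('"')
    return settings


def uses_dhcp(settings):
    """Return True only when BOOTPROTO is explicitly set to dhcp."""
    return settings.get("BOOTPROTO", "").lower() == "dhcp"


settings = parse_ifcfg(IFCFG_ETH0)
print(settings["DEVICE"], "uses DHCP:", uses_dhcp(settings))
```

On the file shown above this reports that eth0 is not configured for DHCP, which matches the observed failure.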

Version-Release number of selected component (if applicable):

cloud-init-17.1-4.fc28

How reproducible:

Always.

Steps to Reproduce:
1. Create EC2 instance with Fedora-AtomicHost-28_Beta-1.3.x86_64-us-west-2-HVM-gp2-0 (ami-e5841f9d) without a public IP address. 
2. Observe that it doesn't pass its Instance Status Checks 

Actual results:

EC2 instance is inaccessible.

Expected results:

An EC2 instance accessible via the private network.

Additional info:

https://pagure.io/atomic-wg/issue/456

Comment 1 Fedora Blocker Bugs Application 2018-04-19 12:58:13 UTC
Proposed as a Blocker for 28-final by Fedora user dustymabe using the blocker tracking app because:

 "The cloud-init package must be functional for release blocking cloud images."

I don't know if this is really a blocker (per release criteria above), but we have found a case in ec2 where cloud-init doesn't work for instances that are requested to not have a public IP (i.e. access to them is through other instances in the same VPC).

Proposing as blocker for now and then we can just bump to FE if it doesn't meet the requirements.

Comment 2 Adam Williamson 2018-04-19 19:22:13 UTC
I'm at least +1 FE to this, not entirely sure if it merits blocker status, don't have enough experience of how big a deal this is.

Comment 3 Joe Doss 2018-04-19 20:38:36 UTC
To add some end-user perspective: we have ~200 VMs running Fedora Cloud and six of them use the public IP features in EC2. Since we rely heavily on running VMs with only private IP addresses (we access them via an SSH jump box), this bug is a huge blocker to using Fedora 28 Atomic / Cloud out of the box.

I understand that it might not fit all the release criteria for a blocker, but I wanted to add some color on why this could be a blocker for some end users consuming Fedora 28 on release.

The good news is that I built an AMI with a downgraded cloud-init and it works just fine.

[jdoss@sts71 ~]$ ssh fedora.11.156
Warning: Permanently added '10.0.11.156' (ECDSA) to the list of known hosts.
[fedora@ip-10-0-11-156 ~]$ cat /etc/redhat-release 
Fedora release 28 (Twenty Eight)
[fedora@ip-10-0-11-156 ~]$ rpm -qa |grep cloud-init
cloud-init-0.7.9-9.fc27.noarch

Maybe just reverting cloud-init back down to 0.7.9-9 is a quick path to getting this issue fixed until cloud-init-17.1-4 gets sorted out.

Comment 4 Lars Kellogg-Stedman 2018-04-19 22:09:03 UTC
Created attachment 1424309 [details]
cloud-init.log

attached cloud-init log from https://pagure.io/atomic-wg/issue/456

Comment 5 Lars Kellogg-Stedman 2018-04-19 22:09:52 UTC
Created attachment 1424310 [details]
journal.log

attached system journal from https://pagure.io/atomic-wg/issue/456

Comment 6 Lars Kellogg-Stedman 2018-04-22 02:02:55 UTC
Inspecting the obj.pkl file created by cloud-init on a failing instance, we see:

>>> import cloudinit
>>> import pickle
>>> d=pickle.load(open('obj.pkl','rb'))
>>>
>>> d.network_config
{'version': 1, 'config': [{'type': 'physical', 'name': 'eth0', 'subnets': [], 'mac_address': '12:66:92:11:cc:64'}]}

While on an instance with a public ip address, we see:

>>> import cloudinit
>>> import pickle
>>> d=pickle.load(open('obj.pkl','rb'))
>>> d.network_config
{'version': 1, 'config': [{'type': 'physical', 'name': 'eth0', 'subnets': [{'type': 'dhcp4'}], 'mac_address': '12:ae:7e:61:ab:e8'}]}

Note the difference in the contents of the "subnets" key, which is empty on the failing instance.
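
The comparison above can be sketched as a small check. This is only an illustration using the two network_config dicts quoted in this comment. The function name interfaces_without_subnets is hypothetical and is not a cloud-init API.

```python
# Hypothetical helper for illustration only; not a cloud-init API.
# The two dicts are copied from the obj.pkl output quoted in this comment.

failing = {'version': 1, 'config': [{'type': 'physical', 'name': 'eth0',
                                     'subnets': [],
                                     'mac_address': '12:66:92:11:cc:64'}]}

working = {'version': 1, 'config': [{'type': 'physical', 'name': 'eth0',
                                     'subnets': [{'type': 'dhcp4'}],
                                     'mac_address': '12:ae:7e:61:ab:e8'}]}


def interfaces_without_subnets(network_config):
    """Return names of physical interfaces whose 'subnets' list is empty or missing."""
    return [
        entry.get("name")
        for entry in network_config.get("config", [])
        if entry.get("type") == "physical" and not entry.get("subnets")
    ]


print("failing instance:", interfaces_without_subnets(failing))
print("working instance:", interfaces_without_subnets(working))
```

An interface listed by this check has no subnet entry at all, which is consistent with the BOOTPROTO=none seen in the rendered ifcfg-eth0 file.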

Comment 7 Lars Kellogg-Stedman 2018-04-22 02:16:25 UTC
This appears to be a bug in cloud-init that was fixed upstream in https://git.launchpad.net/cloud-init/commit/?id=eb292c18c3d83b9f7e5d1fd81b0e8aefaab0cc2d.

Comment 8 Lars Kellogg-Stedman 2018-04-22 02:40:16 UTC
Scratch build @ https://koji.fedoraproject.org/koji/taskinfo?taskID=26485084 that should resolve the problem. Someone let me know if it works?

Comment 9 Dusty Mabe 2018-04-22 15:27:30 UTC
This worked for my test. I created ami-da8a21a5. Others, please use this to verify that things work!

Lars, can you create a PR to cloud-init? We'll get it merged, create an official build, and get a Bodhi update for it.

Comment 10 Lars Kellogg-Stedman 2018-04-22 16:16:37 UTC
I've opened https://src.fedoraproject.org/rpms/cloud-init/pull-request/2 with the fix.

Comment 11 Joe Doss 2018-04-22 19:07:50 UTC
This worked with my tests as well:

[jdoss@sts71 ~]$ ssh fedora.11.104
Warning: Permanently added '10.0.11.104' (ECDSA) to the list of known hosts.
[fedora@ip-10-0-11-104 ~]$ cat /etc/redhat-release 
Fedora release 28 (Twenty Eight)
[fedora@ip-10-0-11-104 ~]$ rpm -qa |grep cloud-init
cloud-init-17.1-5.fc29.noarch

I am able to connect to the instance over the private network without any public IP added to the server.

Comment 12 Fedora Update System 2018-04-23 03:08:08 UTC
cloud-init-17.1-5.fc28 has been submitted as an update to Fedora 28. https://bodhi.fedoraproject.org/updates/FEDORA-2018-768f665d9d

Comment 13 František Zatloukal 2018-04-23 16:37:07 UTC
Discussed during the 2018-04-23 blocker review meeting: [1]

The decision to classify this bug as an AcceptedFreezeException and punt (delay decision) for Blocker was made:

"this is a judgment call as it depends how common ec2 instances with only private IPs are. We agree this is at least serious enough to be an FE; we don't yet have enough data to be totally sure if it's serious enough to be a blocker. If the fix presents problems, we will revisit whether to make this a blocker."

[1] https://meetbot-raw.fedoraproject.org/fedora-blocker-review/2018-04-23/f28-blocker-review.2018-04-23-16.00.log.txt

Comment 14 Fedora Update System 2018-04-23 22:52:35 UTC
cloud-init-17.1-5.fc28 has been pushed to the Fedora 28 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2018-768f665d9d

Comment 15 Kevin Fenzi 2018-04-24 00:58:38 UTC
I don't have a handy way to test this in OpenStack without a .qcow2, but I did just test today's rawhide image that has cloud-init-17.1-5.fc29.noarch with these same changes in it. It works as expected, with no network problems.

So, at least it does not appear to cause any regressions with the openstack case.

Comment 16 Neal Gompa 2018-04-24 14:23:59 UTC
This issue was afflicting me too; I'm looking forward to this being fixed. :)

Comment 17 Dusty Mabe 2018-04-24 15:10:54 UTC
I created a raw image [1] and corresponding ami-b48236cb (us-east-1) for people to use to test the fix for this bug.

[1] https://dustymabe.fedorapeople.org/cloud-init-bz1569321-f28.raw

Comment 18 Dusty Mabe 2018-04-24 15:24:59 UTC
(In reply to Dusty Mabe from comment #17)
> I created a raw image [1] and corresponding ami-b48236cb (us-east-1) for
> people to use to test the fix for this bug.
> 
> [1] https://dustymabe.fedorapeople.org/cloud-init-bz1569321-f28.raw

scratch that.. use this raw.xz image link instead: https://dustymabe.fedorapeople.org/cloud-init-bz1569321-f28.raw.xz

Comment 19 Paul Whalen 2018-04-24 18:09:58 UTC
Tested on a local private network, no issues using https://dustymabe.fedorapeople.org/cloud-init-bz1569321-f28.raw.xz

Comment 20 Adam Williamson 2018-04-24 18:11:24 UTC
Paul: can you give the update karma too, then? Thanks!

Comment 21 Joe Doss 2018-04-24 20:48:58 UTC
I tested Dusty's new AMI in us-east-1 and in us-west-2 and it works as intended.

[jdoss@sts71 ~]$ ssh fedora.11.82
Warning: Permanently added '10.0.11.82' (ECDSA) to the list of known hosts.
[fedora@ip-10-0-11-82 ~]$ cat /etc/redhat-release 
Fedora release 28 (Twenty Eight)
[fedora@ip-10-0-11-82 ~]$

Comment 22 Fedora Update System 2018-04-25 00:02:32 UTC
cloud-init-17.1-5.fc28 has been pushed to the Fedora 28 stable repository. If problems still persist, please make note of it in this bug report.

