Description of problem:
Fedora 28 Atomic Beta (and F28 Beta Cloud) does not connect to the network when the instance has only private IPs. As a result, the EC2 instance fails its connectivity test and you cannot connect to it via SSH. If you create the EC2 instance with a public IP, however, it works just fine. Looking at the instance volume from a working server (mounted at /mnt/f28), it seems that cloud-init is setting BOOTPROTO=none, so the instance does not DHCP correctly.

[fedora@worker ~]# cat /mnt/f28/etc/redhat-release
Fedora release 28 (Twenty Eight)
[fedora@worker ~]# cat /mnt/f28/etc/sysconfig/network-scripts/ifcfg-eth0
# Created by cloud-init on instance boot automatically, do not edit.
#
BOOTPROTO=none
DEVICE=eth0
HWADDR=06:83:72:56:7c:66
ONBOOT=yes
TYPE=Ethernet
USERCTL=no

Version-Release number of selected component (if applicable):
cloud-init-17.1-4.fc28

How reproducible:
Always.

Steps to Reproduce:
1. Create an EC2 instance with Fedora-AtomicHost-28_Beta-1.3.x86_64-us-west-2-HVM-gp2-0 (ami-e5841f9d) without a public IP address.
2. Observe that it doesn't pass its Instance Status Checks.

Actual results:
EC2 instance is inaccessible.

Expected results:
An EC2 instance accessible via the private network.

Additional info:
https://pagure.io/atomic-wg/issue/456
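For anyone who wants to check a mounted volume for the same symptom, here is a minimal sketch that parses an ifcfg-style file and reports whether DHCP is enabled. The helper names (parse_ifcfg, uses_dhcp) are hypothetical and not part of cloud-init or initscripts, and it assumes the simple KEY=VALUE layout shown in the description. The sample text is the ifcfg-eth0 content from the failing instance.

```python
def parse_ifcfg(text):
    """Parse KEY=VALUE lines from ifcfg-style text, skipping comments and blanks."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            # Values may optionally be double-quoted in sysconfig files.
            settings[key.strip()] = value.strip().strip('"')
    return settings


def uses_dhcp(settings):
    """Return True only when BOOTPROTO is explicitly set to dhcp (assumption:
    other values, including 'none' or a missing key, mean no DHCP)."""
    return settings.get("BOOTPROTO", "").lower() == "dhcp"


# ifcfg-eth0 as reported on the failing instance.
SAMPLE = """\
# Created by cloud-init on instance boot automatically, do not edit.
#
BOOTPROTO=none
DEVICE=eth0
HWADDR=06:83:72:56:7c:66
ONBOOT=yes
TYPE=Ethernet
USERCTL=no
"""

if __name__ == "__main__":
    cfg = parse_ifcfg(SAMPLE)
    print(cfg["DEVICE"], "uses DHCP:", uses_dhcp(cfg))
```

On a real system you would read the text from the mounted path (for example /mnt/f28/etc/sysconfig/network-scripts/ifcfg-eth0) instead of the inline sample.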
Proposed as a Blocker for 28-final by Fedora user dustymabe using the blocker tracking app because: "The cloud-init package must be functional for release blocking cloud images." I don't know if this is really a blocker (per release criteria above), but we have found a case in ec2 where cloud-init doesn't work for instances that are requested to not have a public IP (i.e. access to them is through other instances in the same VPC). Proposing as blocker for now and then we can just bump to FE if it doesn't meet the requirements.
I'm at least +1 FE to this, not entirely sure if it merits blocker status, don't have enough experience of how big a deal this is.
To add some end-user perspective: we have ~200 VMs running Fedora Cloud, and six of them use the public IP features in EC2. Since we rely heavily on running VMs with only private IP addresses (we access them via an SSH jump box), this bug is a huge blocker for using Fedora 28 Atomic / Cloud out of the box. I understand that it might not fit all the release criteria for a blocker, but I wanted to add some color on why this could be a blocker for some end users consuming Fedora 28 on release.

The good news is that I built an AMI with a downgraded cloud-init and it works just fine.

[jdoss@sts71 ~]$ ssh fedora.11.156
Warning: Permanently added '10.0.11.156' (ECDSA) to the list of known hosts.
[fedora@ip-10-0-11-156 ~]$ cat /etc/redhat-release
Fedora release 28 (Twenty Eight)
[fedora@ip-10-0-11-156 ~]$ rpm -qa |grep cloud-init
cloud-init-0.7.9-9.fc27.noarch

Maybe just reverting cloud-init back down to 0.7.9-9 is a quick path to getting this issue fixed until cloud-init-17.1-4 gets sorted out.
Created attachment 1424309 [details] cloud-init.log attached cloud-init log from https://pagure.io/atomic-wg/issue/456
Created attachment 1424310 [details] journal.log attached system journal from https://pagure.io/atomic-wg/issue/456
Inspecting the obj.pkl file created by cloud-init on a failing instance, we see:

>>> import cloudinit
>>> import pickle
>>> d=pickle.load(open('obj.pkl','rb'))
>>>
>>> d.network_config
{'version': 1, 'config': [{'type': 'physical', 'name': 'eth0', 'subnets': [], 'mac_address': '12:66:92:11:cc:64'}]}

While on an instance with a public IP address, we see:

>>> import cloudinit
>>> import pickle
>>> d=pickle.load(open('obj.pkl','rb'))
>>> d.network_config
{'version': 1, 'config': [{'type': 'physical', 'name': 'eth0', 'subnets': [{'type': 'dhcp4'}], 'mac_address': '12:ae:7e:61:ab:e8'}]}

Note the difference in the contents of the "subnets" key, which is empty on the failing instance.
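The comparison above can be automated with a small check. This is only a sketch: interfaces_without_subnets is a hypothetical helper, not a cloud-init API, and it assumes the version 1 network_config layout shown in the two dumps (a "config" list of dicts with "type", "name", and "subnets" keys). The two sample dicts are copied from those dumps.

```python
def interfaces_without_subnets(network_config):
    """Return names of physical interfaces whose 'subnets' list is empty or missing."""
    return [
        item.get("name")
        for item in network_config.get("config", [])
        if item.get("type") == "physical" and not item.get("subnets")
    ]


# network_config from the failing (private-IP-only) instance.
FAILING = {
    "version": 1,
    "config": [
        {"type": "physical", "name": "eth0", "subnets": [],
         "mac_address": "12:66:92:11:cc:64"},
    ],
}

# network_config from the instance with a public IP address.
WORKING = {
    "version": 1,
    "config": [
        {"type": "physical", "name": "eth0", "subnets": [{"type": "dhcp4"}],
         "mac_address": "12:ae:7e:61:ab:e8"},
    ],
}

if __name__ == "__main__":
    print("failing:", interfaces_without_subnets(FAILING))
    print("working:", interfaces_without_subnets(WORKING))
```

To run it against a real instance, pass in d.network_config from the unpickled obj.pkl, as in the sessions above, rather than the inline samples.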
This appears to be a bug in cloud-init that was fixed upstream in https://git.launchpad.net/cloud-init/commit/?id=eb292c18c3d83b9f7e5d1fd81b0e8aefaab0cc2d.
Scratch build @ https://koji.fedoraproject.org/koji/taskinfo?taskID=26485084 that should resolve the problem. Someone let me know if it works?
This worked for my test. I created ami-da8a21a5. Others, please use this to verify that things work! Lars, can you create a PR to cloud-init? We'll get it merged, create an official build, and get a bodhi update for it.
I've opened https://src.fedoraproject.org/rpms/cloud-init/pull-request/2 with the fix.
This worked with my tests as well:

[jdoss@sts71 ~]$ ssh fedora.11.104
Warning: Permanently added '10.0.11.104' (ECDSA) to the list of known hosts.
[fedora@ip-10-0-11-104 ~]$ cat /etc/redhat-release
Fedora release 28 (Twenty Eight)
[fedora@ip-10-0-11-104 ~]$ rpm -qa |grep cloud-init
cloud-init-17.1-5.fc29.noarch

I am able to connect to the instance over the private network without any public IP added to the server.
cloud-init-17.1-5.fc28 has been submitted as an update to Fedora 28. https://bodhi.fedoraproject.org/updates/FEDORA-2018-768f665d9d
Discussed during the 2018-04-23 blocker review meeting: [1] The decision to classify this bug as an AcceptedFreezeException and punt (delay decision) for Blocker was made: "this is a judgment call as it depends how common ec2 instances with only private IPs are. We agree this is at least serious enough to be an FE; we don't yet have enough data to be totally sure if it's serious enough to be a blocker. If the fix presents problems, we will revisit whether to make this a blocker." [1] https://meetbot-raw.fedoraproject.org/fedora-blocker-review/2018-04-23/f28-blocker-review.2018-04-23-16.00.log.txt
cloud-init-17.1-5.fc28 has been pushed to the Fedora 28 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2018-768f665d9d
I don't have a handy way to test this in openstack without a .qcow2, but I did just test today's rawhide image, which has cloud-init-17.1-5.fc29.noarch with these same changes in it. It works as expected, with no network problems. So, at least it does not appear to cause any regressions in the openstack case.
This issue was afflicting me too, I'm looking forward to this being fixed. :)
I created a raw image [1] and corresponding ami-b48236cb (us-east-1) for people to use to test the fix for this bug. [1] https://dustymabe.fedorapeople.org/cloud-init-bz1569321-f28.raw
(In reply to Dusty Mabe from comment #17)
> I created a raw image [1] and corresponding ami-b48236cb (us-east-1) for
> people to use to test the fix for this bug.
>
> [1] https://dustymabe.fedorapeople.org/cloud-init-bz1569321-f28.raw

scratch that.. use this raw.xz image link instead: https://dustymabe.fedorapeople.org/cloud-init-bz1569321-f28.raw.xz
Tested on a local private network, no issues using https://dustymabe.fedorapeople.org/cloud-init-bz1569321-f28.raw.xz
Paul: can you give the update karma too, then? Thanks!
I tested Dusty's new AMI in us-east-1 and in us-west-2 and it works as intended.

[jdoss@sts71 ~]$ ssh fedora.11.82
Warning: Permanently added '10.0.11.82' (ECDSA) to the list of known hosts.
[fedora@ip-10-0-11-82 ~]$ cat /etc/redhat-release
Fedora release 28 (Twenty Eight)
[fedora@ip-10-0-11-82 ~]$
cloud-init-17.1-5.fc28 has been pushed to the Fedora 28 stable repository. If problems still persist, please make note of it in this bug report.