Description of problem:
Image contextualization using cloud-init in Fedora 23 under OpenNebula does not set up the desired IPv4 address when the machine is newly deployed. After rebooting the VM using ctrl-alt-del over VNC, the address gets assigned correctly.

Version-Release number of selected component (if applicable):
cloud-init-0.7.6-5.20140218bzr1060.fc23.noarch
Fedora-Cloud-Base-23-20151030.x86_64.qcow2 (also tested on manually prepared .qcow2 image created using kickstart of Fedora Server).

How reproducible:
100 %

Steps to Reproduce:
1. Create a Fedora 23 image inside OpenNebula with cloud-init installed, NetworkManager disabled, network enabled, biosdevname disabled, etc. I tried the stock Fedora Cloud image, as well as manually installed F23 Server image.
2. Create a template using this image, enable network contextualization.
3. Deploy one or more instances of this template.

Actual results:
The newly deployed machine is unreachable over the IPv4 network.

Expected results:
The newly deployed machine should use the assigned IPv4 configuration.

Additional info:
- the configuration in the contextualization ISO image gets read somehow, because the hostname of the machine is set to ip-X-Y-A-B.localdomain (X.Y.A.B being the assigned IPv4 address).
- the interface eth0 is up, and gets assigned the IPv6 address including the default IPv6 gateway from the router advertisement packets on my network.
- when I click "Send Ctrl-Alt-Del" in the VNC interface, the VM reboots, and after reboot everything works as expected.
- the contents of the context.sh file are as follows (addresses and SSH key replaced):

vm# mount /dev/cdrom /mnt
vm# cat /mnt/context.sh
# Context variables generated by OpenNebula
DISK_ID='1'
ETH0_DNS='X.Y.Z.Q'
ETH0_GATEWAY='X.Y.A.1'
ETH0_IP='X.Y.A.B'
ETH0_MAC='02:00:XX:YY:AA:BB'
ETH0_MASK='255.255.255.0'
ETH0_NETWORK='X.Y.A.0'
NETWORK='YES'
SSH_PUBLIC_KEY='ssh-rsa AAAA[...]YKQ== me '
TARGET='hda'

- the problem is indeed in IPv4 addressing; eth0 has only IPv6 addresses:

vm# ifconfig eth0
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet6 1234:5678:9012:3456:0:XXff:feYY:AABB  prefixlen 64  scopeid 0x0<global>
        inet6 fe80::XXff:feYY:AABB  prefixlen 64  scopeid 0x20<link>
        ether 02:00:XX:YY:AA:BB  txqueuelen 1000  (Ethernet)
        RX packets 9903  bytes 754531 (736.8 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 413  bytes 63764 (62.2 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

- the contextualization is correctly translated to the ifcfg-eth0 file:

vm# cat /etc/sysconfig/network-scripts/ifcfg-eth0
# Created by cloud-init v. 0.7.7 on Wed, 24 Feb 2016 09:50:24 +0000
BOOTPROTO=static
GATEWAY=X.Y.A.1
DEVICE=eth0
IPADDR=X.Y.A.B
NETMASK=255.255.255.0
ONBOOT=yes

- /etc/resolv.conf contains _two_ nameservers, which, in the context of our network, means that it was probably created from DHCP data and not from context.sh:

vm# cat /etc/resolv.conf
# Generated by NetworkManager
search my.domain
nameserver X.Y.Z.Q
nameserver X.Y.Z.R
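The mismatch described above (ifcfg-eth0 holds the right address, yet eth0 carries no IPv4 address) can be checked mechanically on a freshly deployed VM. The following is only a sketch: the helper names ctx_value, ifcfg_value and has_ipv4 are made up for illustration and are not part of cloud-init or OpenNebula. It assumes the context ISO is mounted at /mnt as in the transcript above and that the ip tool from iproute2 is installed.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of cloud-init or OpenNebula.

# Print the value of KEY from a context.sh-style file (lines of KEY='value').
ctx_value() {
    # $1 = file, $2 = key
    sed -n "s/^$2='\(.*\)'\$/\1/p" "$1"
}

# Print the value of KEY from an ifcfg-style file (lines of KEY=value).
ifcfg_value() {
    # $1 = file, $2 = key
    sed -n "s/^$2=//p" "$1"
}

# Succeed if interface $1 currently carries IPv4 address $2.
has_ipv4() {
    ip -4 -o addr show dev "$1" | grep -qF "inet $2/"
}

# Typical use on the affected VM:
#   want=$(ctx_value /mnt/context.sh ETH0_IP)
#   conf=$(ifcfg_value /etc/sysconfig/network-scripts/ifcfg-eth0 IPADDR)
#   [ "$want" = "$conf" ] && echo "ifcfg-eth0 matches context.sh"
#   has_ipv4 eth0 "$want" || echo "eth0 is missing $want"
```

On a machine showing this bug, the first check would pass and the second would report the missing address until the VM is rebooted.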
OK, the problem is that cloud-init-local.service has no ordering dependency on the network service. The network service (/etc/init.d/network) is started in parallel with cloud-init-local.service, so it picks up whatever was in /etc/sysconfig/network-scripts/ifcfg-eth0 when the VM image was created and sets up the eth0 interface accordingly. Only some time later does cloud-init --local write the proper ifcfg-eth0 file, which is then used after the reboot.

I think the correct solution would be to have an explicit ordering between cloud-init-local.service and network.service:

--- /tmp/cloud-init-local.service	2016-03-16 22:39:41.570920600 +0100
+++ /usr/lib/systemd/system/cloud-init-local.service	2016-03-16 22:31:10.339623114 +0100
@@ -2,6 +2,7 @@
 Description=Initial cloud-init job (pre-networking)
 Wants=local-fs.target
 After=local-fs.target
+Before=network.service
 
 [Service]
 Type=oneshot
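As a local workaround until the package is fixed, the same ordering can be added without editing the packaged unit file, by using a systemd drop-in. This is a sketch under assumptions: the function name write_dropin and the file name 10-before-network.conf are arbitrary choices, and it presumes the image uses the legacy network.service as described above.

```shell
#!/bin/sh
# Write a systemd drop-in that orders cloud-init-local.service before
# network.service. The file name 10-before-network.conf is an arbitrary choice.
write_dropin() {
    # $1 = drop-in directory (defaults to the admin override location)
    dir=${1:-/etc/systemd/system/cloud-init-local.service.d}
    mkdir -p "$dir" || return 1
    cat > "$dir/10-before-network.conf" <<'EOF'
[Unit]
Before=network.service
EOF
}

# Typical use, as root, when preparing the image:
#   write_dropin && systemctl daemon-reload
```

A drop-in only adds to the unit's existing Before= list, so it survives package updates that replace /usr/lib/systemd/system/cloud-init-local.service.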
Apparently upstream already has a similar fix:
http://bazaar.launchpad.net/~cloud-init-dev/cloud-init/trunk/view/head:/systemd/cloud-init-local.service

They seem to use Before=network-pre.target instead of Before=network.service.

Please fix this in F23 - without this, F23 cloud images are unusable with cloud-init image contextualization. Thanks!
Further testing reveals that Before=network-pre.target does not help for F23. Only Before=network.service provides the proper ordering.
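One way to confirm which ordering is in effect on a given image is to inspect the unit's Before= property. The sketch below assumes the output format of systemctl show -p Before (a single line of the form Before=unit1 unit2 ...); the helper name orders_before is hypothetical.

```shell
#!/bin/sh
# Succeed if unit name $2 appears in a "Before=..." line as printed by
# `systemctl show -p Before UNIT`.
orders_before() {
    # $1 = property line, e.g. "Before=network.service shutdown.target"
    # $2 = unit name to look for
    for u in ${1#Before=}; do
        [ "$u" = "$2" ] && return 0
    done
    return 1
}

# Typical use:
#   line=$(systemctl show -p Before cloud-init-local.service)
#   orders_before "$line" network.service && echo "ordered before network.service"
```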
On CentOS 7 I had to use Before=NetworkManager.service. Otherwise NetworkManager would start at the same time as cloud-init-local and would start managing /etc/resolv.conf instead of using the entries from ConfigDrive in my case (since cloud-init writes those to /etc/resolv.conf directly rather than putting them in the ifcfg-* scripts where NM could pick them up).

Since OpenNebula contextualization with context.sh also appears to modify /etc/resolv.conf directly, you will run into this issue as well. See:
https://bugs.launchpad.net/cloud-init/+bug/1620845
https://bugs.launchpad.net/cloud-init/+bug/1620807
cloud-init-0.7.8-2.fc25 has been pushed to the Fedora 25 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-e96b704c39
cloud-init-0.7.8-2.fc25 has been pushed to the Fedora 25 stable repository. If problems still persist, please make note of it in this bug report.