Red Hat Bugzilla – Bug 1311655
cloud-init under OpenNebula works only after reboot
Last modified: 2016-10-18 07:29:48 EDT
Description of problem:
Image contextualization using cloud-init in Fedora 23 under OpenNebula does not set up the assigned IPv4 address when the machine is newly deployed. After rebooting the VM with Ctrl-Alt-Del over VNC, the address gets assigned correctly.
Version-Release number of selected component (if applicable):
Fedora-Cloud-Base-23-20151030.x86_64.qcow2 (also tested on manually prepared .qcow2 image created using kickstart of Fedora Server).
Steps to Reproduce:
1. Create a Fedora 23 image inside OpenNebula with cloud-init installed, NetworkManager disabled, network enabled, biosdevname disabled, etc. I tried the stock Fedora Cloud image as well as a manually installed F23 Server image.
2. Create a template using this image, enable network contextualization.
3. Deploy one or more instances of this template.
Actual results:
The newly deployed machine is unreachable over the IPv4 network.
Expected results:
The newly deployed machine should use the assigned IPv4 configuration.
Additional info:
- the configuration in the contextualization ISO image gets read somehow, because the hostname of the machine is set to ip-X-Y-A-B.localdomain (X.Y.A.B being the assigned IPv4 address).
- the interface eth0 is up, and gets assigned the IPv6 address including the default IPv6 gateway from the router advertisement packets on my network.
- when I click "Send Ctrl-Alt-Del" in the VNC interface, the VM reboots, and after reboot everything works as expected.
- the contents of the context.sh file are as follows (addresses and SSH key replaced):
vm# mount /dev/cdrom /mnt
vm# cat /mnt/context.sh
# Context variables generated by OpenNebula
SSH_PUBLIC_KEY='ssh-rsa AAAA[...]YKQ== firstname.lastname@example.org'
- the problem is indeed in IPv4 addressing:
vm# ifconfig eth0
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet6 1234:5678:9012:3456:0:XXff:feYY:AABB prefixlen 64 scopeid 0x0<global>
inet6 fe80::XXff:feYY:AABB prefixlen 64 scopeid 0x20<link>
ether 02:00:XX:YY:AA:BB txqueuelen 1000 (Ethernet)
RX packets 9903 bytes 754531 (736.8 KiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 413 bytes 63764 (62.2 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
- the contextualization is correctly translated to the ifcfg-eth0 file:
vm# cat /etc/sysconfig/network-scripts/ifcfg-eth0
# Created by cloud-init v. 0.7.7 on Wed, 24 Feb 2016 09:50:24 +0000
- /etc/resolv.conf contains _two_ nameservers, which on our network means that it was probably generated from DHCP data and not from context.sh:
vm# cat /etc/resolv.conf
# Generated by NetworkManager
OK, the problem is that cloud-init-local.service has no ordering dependency on the network service. The network service (/etc/init.d/network) is started in parallel with cloud-init-local.service, so it reads whatever was in /etc/sysconfig/network-scripts/ifcfg-eth0 when the VM image was created and configures eth0 accordingly. Only some time later does cloud-init --local write the proper ifcfg-eth0 file, which is then used after the next reboot.
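One way to confirm this race on a freshly deployed VM is to compare the address that cloud-init wrote into ifcfg-eth0 with what the interface actually carries. The following is only a minimal sketch: the helper names ifcfg_value and has_ipv4 are made up for illustration, and it assumes the generated file uses the usual IPADDR= key and that the iproute2 `ip` tool is available.

```shell
# Hypothetical helpers for illustration; not part of cloud-init or initscripts.

# Print the value of KEY ($1) from an ifcfg-style file ($2), quotes removed.
ifcfg_value() {
    key="$1"
    file="$2"
    sed -n "s/^${key}=//p" "$file" | tr -d "\"'" | head -n 1
}

# Succeed if address $1 appears in `ip -4 -o addr show` style output passed as $2.
has_ipv4() {
    printf '%s\n' "$2" | grep -qF " $1/"
}

# Example use on the VM (assumes the interface is eth0):
#   want=$(ifcfg_value IPADDR /etc/sysconfig/network-scripts/ifcfg-eth0)
#   has_ipv4 "$want" "$(ip -4 -o addr show dev eth0)" || echo "eth0 lacks $want"
```

If the file already contains the expected address but the interface does not carry it until a reboot, that is consistent with the network service having read the file before cloud-init rewrote it.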
I think the correct solution would be to have an explicit dependency between cloud-init-local.service and network.service:
--- /tmp/cloud-init-local.service 2016-03-16 22:39:41.570920600 +0100
+++ /usr/lib/systemd/system/cloud-init-local.service 2016-03-16 22:31:10.339623114 +0100
@@ -2,6 +2,7 @@
Description=Initial cloud-init job (pre-networking)
Apparently upstream already has a similar fix:
They seem to use Before=network-pre.target instead of Before=network.service. Please fix this in F23 - without this, F23 cloud images are unusable with cloud-init image contextualization. Thanks!
Further testing reveals that Before=network-pre.target does not help for F23. Only Before=network.service provides the proper ordering.
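For anyone who wants to try this ordering without editing the packaged unit file, a systemd drop-in should have the same effect. This is a sketch under assumptions, not the fix that was shipped: the function name, the drop-in file name 10-before-network.conf, and the optional directory argument are illustrative choices.

```shell
# Hypothetical helper: write a drop-in that orders cloud-init-local.service
# before the given unit (default: network.service, which worked on F23 above).
# $1 = unit to order before, $2 = systemd config dir (default /etc/systemd/system).
write_ordering_dropin() {
    before_unit="${1:-network.service}"
    systemd_dir="${2:-/etc/systemd/system}"
    dropin_dir="$systemd_dir/cloud-init-local.service.d"
    mkdir -p "$dropin_dir" || return 1
    printf '[Unit]\nBefore=%s\n' "$before_unit" > "$dropin_dir/10-before-network.conf"
}

# Example use as root inside the image, then reload so systemd picks it up:
#   write_ordering_dropin network.service && systemctl daemon-reload
```

The drop-in has to be present in the image before the first boot of a deployed VM, since the ordering only matters during that boot.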
On CentOS 7 I had to use Before=NetworkManager.service, because otherwise NetworkManager would start at the same time as cloud-init-local and would start managing /etc/resolv.conf instead of using the entries from ConfigDrive in my case (cloud-init writes those to /etc/resolv.conf directly rather than putting them in the ifcfg-* scripts where NM could pick them up).
Since cloud-init with OpenNebula's context.sh also appears to modify /etc/resolv.conf directly, you will run into this issue as well:
See: https://bugs.launchpad.net/cloud-init/+bug/1620845 and https://bugs.launchpad.net/cloud-init/+bug/1620807
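A quick way to tell whether NetworkManager has taken over /etc/resolv.conf is to look at the header comment it writes and count the nameserver entries. The helpers below are a rough sketch with made-up names (resolv_conf_origin, count_nameservers); they only distinguish the "# Generated by NetworkManager" header seen earlier in this report from everything else.

```shell
# Hypothetical helpers for illustration only.

# Print "NetworkManager" if the file ($1, default /etc/resolv.conf) carries
# the NetworkManager header comment, otherwise print "other".
resolv_conf_origin() {
    file="${1:-/etc/resolv.conf}"
    if grep -q '^# Generated by NetworkManager' "$file"; then
        echo "NetworkManager"
    else
        echo "other"
    fi
}

# Print the number of nameserver lines in the file ($1, default /etc/resolv.conf).
count_nameservers() {
    grep -c '^nameserver[[:space:]]' "${1:-/etc/resolv.conf}"
}

# Example use on the VM:
#   resolv_conf_origin; count_nameservers
```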
cloud-init-0.7.8-2.fc25 has been pushed to the Fedora 25 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-e96b704c39
cloud-init-0.7.8-2.fc25 has been pushed to the Fedora 25 stable repository. If problems still persist, please make note of it in this bug report.