Bug 1311655

Summary: cloud-init under OpenNebula works only after reboot
Product: [Fedora] Fedora Reporter: Jan "Yenya" Kasprzak <kas>
Component: cloud-init Assignee: Garrett Holmstrom <gholms>
Status: CLOSED ERRATA QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 23 CC: adimania, apevec, bert.regeer, gholms, Jan.van.Eldik, mattdm, shardy, s
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: cloud-init-0.7.8-2.fc25 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-10-18 11:29:48 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Jan "Yenya" Kasprzak 2016-02-24 16:55:35 UTC
Description of problem:
Image contextualization using cloud-init in Fedora 23 under OpenNebula does not set up the desired IPv4 address when the machine is newly deployed. After rebooting the VM using ctrl-alt-del over VNC, the address gets assigned correctly.

Version-Release number of selected component (if applicable):
cloud-init-0.7.6-5.20140218bzr1060.fc23.noarch
Fedora-Cloud-Base-23-20151030.x86_64.qcow2 (also tested on manually prepared .qcow2 image created using kickstart of Fedora Server).

How reproducible:
100 %

Steps to Reproduce:
1. Create a Fedora 23 image inside OpenNebula with cloud-init installed, NetworkManager disabled, the network service enabled, biosdevname disabled, etc. I tried the stock Fedora Cloud image as well as a manually installed F23 Server image.
2. Create a template using this image, enable network contextualization.
3. Deploy one or more instances of this template.

Actual results:
The newly deployed machine is unreachable over the IPv4 network.

Expected results:
The newly deployed machine should use the assigned IPv4 configuration.

Additional info:
- the configuration in the contextualization ISO image gets read somehow, because the hostname of the machine is set to ip-X-Y-A-B.localdomain (X.Y.A.B being the assigned IPv4 address).

- the interface eth0 is up, and gets assigned the IPv6 address including the default IPv6 gateway from the router advertisement packets on my network.

- when I click "Send Ctrl-Alt-Del" in the VNC interface, the VM reboots, and after reboot everything works as expected.

- the contents of the context.sh file are as follows (addresses and SSH key replaced):
vm# mount /dev/cdrom /mnt
vm# cat /mnt/context.sh
# Context variables generated by OpenNebula
DISK_ID='1'
ETH0_DNS='X.Y.Z.Q'
ETH0_GATEWAY='X.Y.A.1'
ETH0_IP='X.Y.A.B'
ETH0_MAC='02:00:XX:YY:AA:BB'
ETH0_MASK='255.255.255.0'
ETH0_NETWORK='X.Y.A.0'
NETWORK='YES'
SSH_PUBLIC_KEY='ssh-rsa AAAA[...]YKQ== me
'
TARGET='hda'

- the problem is indeed in IPv4 addressing:
vm# ifconfig eth0
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet6 1234:5678:9012:3456:0:XXff:feYY:AABB  prefixlen 64  scopeid 0x0<global>
        inet6 fe80::XXff:feYY:AABB  prefixlen 64  scopeid 0x20<link>
        ether 02:00:XX:YY:AA:BB  txqueuelen 1000  (Ethernet)
        RX packets 9903  bytes 754531 (736.8 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 413  bytes 63764 (62.2 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

- the contextualization is correctly translated to the ifcfg-eth0 file:
vm#  cat /etc/sysconfig/network-scripts/ifcfg-eth0 
# Created by cloud-init v. 0.7.7 on Wed, 24 Feb 2016 09:50:24 +0000
BOOTPROTO=static
GATEWAY=X.Y.A.1
DEVICE=eth0
IPADDR=X.Y.A.B
NETMASK=255.255.255.0
ONBOOT=yes

- /etc/resolv.conf contains _two_ nameservers, which, in the context of our network, means that it was probably created from DHCP data and not from context.sh:
vm# cat /etc/resolv.conf 
# Generated by NetworkManager
search my.domain
nameserver X.Y.Z.Q
nameserver X.Y.Z.R
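
The mismatch listed above (a correct ifcfg-eth0, but no IPv4 address on eth0) can be checked with a small sketch like the following. The helper name ifcfg_value is hypothetical and not part of cloud-init or the initscripts package; it only reads a KEY=value line from an ifcfg-style file.

```shell
# Hypothetical helper: print the value of KEY ($2) from an ifcfg-style FILE ($1).
# If the key is assigned more than once, the last assignment wins.
# Surrounding quotes are stripped; prints nothing if the key is absent.
ifcfg_value() {
    sed -n "s/^$2=//p" "$1" | tail -n 1 | tr -d "\"'"
}
```

One possible use, assuming the iproute2 ip tool is available in the VM: take the address from ifcfg_value /etc/sysconfig/network-scripts/ifcfg-eth0 IPADDR and look for it in the output of ip -4 -o addr show dev eth0. On a freshly deployed VM affected by this bug, the address would be in the file but missing from the interface.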

Comment 1 Jan "Yenya" Kasprzak 2016-03-16 21:46:56 UTC
OK, the problem is that cloud-init-local.service has no ordering dependency on the network service. The network service (/etc/init.d/network) is started in parallel with cloud-init-local.service, so it reads whatever was in /etc/sysconfig/network-scripts/ifcfg-eth0 when the VM image was created and sets up the eth0 interface accordingly. Only some time later does cloud-init --local create the proper ifcfg-eth0 file, which is then used after the reboot.
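
A rough way to check for the missing ordering is to look at the Before= lines of the installed unit file. The helper below is a sketch with a hypothetical name (unit_orders_before); it only inspects one file and ignores drop-ins, so the effective value reported by systemctl show -p Before cloud-init-local.service remains the authoritative answer.

```shell
# Hypothetical helper: succeed if unit FILE ($1) lists UNIT ($2) in a Before= line.
# Only the given file is inspected; drop-in directories are not considered.
unit_orders_before() {
    grep -E '^Before=' "$1" \
        | sed 's/^Before=//' \
        | tr ' ' '\n' \
        | grep -Fxq "$2"
}
```

For example, unit_orders_before /usr/lib/systemd/system/cloud-init-local.service network.service would be expected to fail on the affected package and succeed once an ordering line has been added.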

I think the correct solution would be to have an explicit dependency between cloud-init-local.service and network.service:

--- /tmp/cloud-init-local.service	2016-03-16 22:39:41.570920600 +0100
+++ /usr/lib/systemd/system/cloud-init-local.service	2016-03-16 22:31:10.339623114 +0100
@@ -2,6 +2,7 @@
 Description=Initial cloud-init job (pre-networking)
 Wants=local-fs.target
 After=local-fs.target
+Before=network.service
 
 [Service]
 Type=oneshot

Comment 2 Jan "Yenya" Kasprzak 2016-03-17 07:42:44 UTC
Apparently upstream already has a similar fix:

http://bazaar.launchpad.net/~cloud-init-dev/cloud-init/trunk/view/head:/systemd/cloud-init-local.service

They seem to use Before=network-pre.target instead of Before=network.service. Please fix this in F23 - without this, F23 cloud images are unusable with cloud-init image contextualization. Thanks!

Comment 3 Jan "Yenya" Kasprzak 2016-03-30 19:40:06 UTC
Further testing reveals that Before=network-pre.target does not help for F23. Only Before=network.service provides the proper ordering.
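
Until a fixed package is available, the same ordering could be applied with a systemd drop-in instead of editing the packaged unit file. The following is only a sketch: the function name, the file name 10-ordering.conf, and the SYSTEMD_DIR variable are assumptions made here so the function can be tried against a scratch directory; they are not part of cloud-init or systemd.

```shell
# Hypothetical helper: write a drop-in that orders cloud-init-local.service
# before the unit given as $1 (default: network.service).
# SYSTEMD_DIR is an assumption of this sketch; it defaults to /etc/systemd/system.
write_ordering_dropin() {
    before_unit=${1:-network.service}
    dir="${SYSTEMD_DIR:-/etc/systemd/system}/cloud-init-local.service.d"
    mkdir -p "$dir" || return 1
    # A drop-in only needs the section header and the directive being added.
    printf '[Unit]\nBefore=%s\n' "$before_unit" > "$dir/10-ordering.conf"
}
```

Run as root inside the image, write_ordering_dropin network.service followed by systemctl daemon-reload should have the same effect as the one-line patch from comment 1. A different unit name can be passed as the argument if another service has to be ordered after cloud-init-local.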

Comment 4 Bert JW Regeer (CTL) 2016-09-09 18:41:48 UTC
On CentOS 7 I had to use Before=NetworkManager.service; otherwise NetworkManager would start at the same time as cloud-init-local and would start managing /etc/resolv.conf instead of using the entries from ConfigDrive in my case (cloud-init writes those to /etc/resolv.conf directly rather than putting them in the ifcfg-* scripts, where NM could pick them up).

Since it looks like OpenNebula with context.sh also modifies /etc/resolv.conf directly, you'll run into this issue as well.

See: https://bugs.launchpad.net/cloud-init/+bug/1620845 and https://bugs.launchpad.net/cloud-init/+bug/1620807

Comment 5 Fedora Update System 2016-10-16 01:55:56 UTC
cloud-init-0.7.8-2.fc25 has been pushed to the Fedora 25 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-e96b704c39

Comment 6 Fedora Update System 2016-10-18 11:29:48 UTC
cloud-init-0.7.8-2.fc25 has been pushed to the Fedora 25 stable repository. If problems still persist, please make note of it in this bug report.