1311655 – cloud-init under OpenNebula works only after reboot

Summary: cloud-init under OpenNebula works only after reboot

Status:	CLOSED ERRATA
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	cloud-init
Version:	23
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Assignee:	Garrett Holmstrom
QA Contact:	Fedora Extras Quality Assurance

Reported:	2016-02-24 16:55 UTC by Jan "Yenya" Kasprzak
Modified:	2016-10-18 11:29 UTC
CC List:	8 users
Fixed In Version:	cloud-init-0.7.8-2.fc25
Last Closed:	2016-10-18 11:29:48 UTC
Type:	Bug

Description Jan "Yenya" Kasprzak 2016-02-24 16:55:35 UTC

Description of problem:
Image contextualization using cloud-init in Fedora 23 under OpenNebula does not set up the desired IPv4 address when the machine is newly deployed. After rebooting the VM using ctrl-alt-del over VNC, the address gets assigned correctly.

Version-Release number of selected component (if applicable):
cloud-init-0.7.6-5.20140218bzr1060.fc23.noarch
Fedora-Cloud-Base-23-20151030.x86_64.qcow2 (also tested with a manually prepared .qcow2 image created from a Fedora Server kickstart).

How reproducible:
100%

Steps to Reproduce:
1. Create a Fedora 23 image inside OpenNebula with cloud-init installed, NetworkManager disabled, the network service enabled, biosdevname disabled, etc. I tried both the stock Fedora Cloud image and a manually installed F23 Server image.
2. Create a template using this image, enable network contextualization.
3. Deploy one or more instances of this template.

Actual results:
The newly deployed machine is unreachable over the IPv4 network.

Expected results:
The newly deployed machine should use the assigned IPv4 configuration.

Additional info:
- the configuration in the contextualization ISO image is at least partly read, because the hostname of the machine is set to ip-X-Y-A-B.localdomain (X.Y.A.B being the assigned IPv4 address).

- the interface eth0 is up and gets an IPv6 address, including the default IPv6 gateway, from router advertisements on my network.

- when I click "Send Ctrl-Alt-Del" in the VNC interface, the VM reboots, and after reboot everything works as expected.

- the context.sh file contains the following (addresses and SSH key replaced):
vm# mount /dev/cdrom /mnt
vm# cat /mnt/context.sh
# Context variables generated by OpenNebula
DISK_ID='1'
ETH0_DNS='X.Y.Z.Q'
ETH0_GATEWAY='X.Y.A.1'
ETH0_IP='X.Y.A.B'
ETH0_MAC='02:00:XX:YY:AA:BB'
ETH0_MASK='255.255.255.0'
ETH0_NETWORK='X.Y.A.0'
NETWORK='YES'
SSH_PUBLIC_KEY='ssh-rsa AAAA[...]YKQ== me
'
TARGET='hda'

- the problem is indeed IPv4 addressing; eth0 has no IPv4 (inet) address:
vm# ifconfig eth0
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet6 1234:5678:9012:3456:0:XXff:feYY:AABB  prefixlen 64  scopeid 0x0<global>
        inet6 fe80::XXff:feYY:AABB  prefixlen 64  scopeid 0x20<link>
        ether 02:00:XX:YY:AA:BB  txqueuelen 1000  (Ethernet)
        RX packets 9903  bytes 754531 (736.8 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 413  bytes 63764 (62.2 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

- the contextualization is correctly translated to the ifcfg-eth0 file:
vm# cat /etc/sysconfig/network-scripts/ifcfg-eth0
# Created by cloud-init v. 0.7.7 on Wed, 24 Feb 2016 09:50:24 +0000
BOOTPROTO=static
GATEWAY=X.Y.A.1
DEVICE=eth0
IPADDR=X.Y.A.B
NETMASK=255.255.255.0
ONBOOT=yes
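
The mapping from context.sh to ifcfg-eth0 seen above can be approximated in a few lines of shell. This is only an illustration, assuming context.sh contains plain variable assignments (as in the example above) and can therefore be sourced; the function name context_to_ifcfg is hypothetical and this is not cloud-init's actual implementation.

```shell
#!/bin/sh
# Hypothetical helper (not part of cloud-init): print an ifcfg-eth0 body
# built from the ETH0_* variables of an OpenNebula context.sh, in the same
# key order as the file cloud-init generated in this report.
# Assumes context.sh holds plain shell assignments. Sourcing it executes
# its contents, so only use this on a trusted context image.
context_to_ifcfg() {
    ctx="$1"   # absolute path, e.g. /mnt/context.sh
    (
        # Source in a subshell so the ETH0_* variables do not leak out.
        . "$ctx"
        printf 'BOOTPROTO=static\n'
        printf 'GATEWAY=%s\n' "$ETH0_GATEWAY"
        printf 'DEVICE=eth0\n'
        printf 'IPADDR=%s\n' "$ETH0_IP"
        printf 'NETMASK=%s\n' "$ETH0_MASK"
        printf 'ONBOOT=yes\n'
    )
}
```

On the VM from this report, context_to_ifcfg /mnt/context.sh should print the same keys and values as the ifcfg-eth0 above, without the cloud-init header comment.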

- /etc/resolv.conf contains _two_ nameservers, which, in the context of our network, means that it was probably created from DHCP data rather than from context.sh (which lists only one DNS server, ETH0_DNS):
vm# cat /etc/resolv.conf 
# Generated by NetworkManager
search my.domain
nameserver X.Y.Z.Q
nameserver X.Y.Z.R
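
One way to check this suspicion is to list the nameservers in /etc/resolv.conf that do not appear in ETH0_DNS. This is a sketch that assumes ETH0_DNS holds a space-separated list of addresses (the example above contains a single one) and that context.sh can be sourced. The name extra_nameservers is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print nameservers listed in a resolv.conf that are NOT
# in ETH0_DNS from context.sh. Any output suggests resolv.conf came from
# another source (e.g. DHCP) rather than from the OpenNebula context.
# Assumes ETH0_DNS is a space-separated list of addresses.
extra_nameservers() {
    ctx="$1"       # absolute path to context.sh, e.g. /mnt/context.sh
    resolv="$2"    # usually /etc/resolv.conf
    (
        . "$ctx"   # defines ETH0_DNS in this subshell only
        awk '$1 == "nameserver" { print $2 }' "$resolv" |
        while read -r ns; do
            case " $ETH0_DNS " in
                *" $ns "*) ;;                # expected: listed in context.sh
                *) printf '%s\n' "$ns" ;;    # unexpected extra nameserver
            esac
        done
    )
}
```

With the files shown above, extra_nameservers /mnt/context.sh /etc/resolv.conf would be expected to print X.Y.Z.R.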

Comment 1 Jan "Yenya" Kasprzak 2016-03-16 21:46:56 UTC

OK, the problem is that cloud-init-local.service has no ordering dependency on the network service. The network service (/etc/init.d/network) is started in parallel with cloud-init-local.service, so it reads whatever /etc/sysconfig/network-scripts/ifcfg-eth0 contained when the VM image was created and configures eth0 accordingly. Some time later, cloud-init --local writes the correct ifcfg-eth0 file, which is only used after the next reboot.

I think the correct solution is an explicit ordering dependency between cloud-init-local.service and network.service, so that cloud-init-local.service finishes before network.service starts:

--- /tmp/cloud-init-local.service	2016-03-16 22:39:41.570920600 +0100
+++ /usr/lib/systemd/system/cloud-init-local.service	2016-03-16 22:31:10.339623114 +0100
@@ -2,6 +2,7 @@
 Description=Initial cloud-init job (pre-networking)
 Wants=local-fs.target
 After=local-fs.target
+Before=network.service
 
 [Service]
 Type=oneshot
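
Edits made directly to the packaged unit under /usr/lib/systemd/system, as in the diff above, are lost on the next cloud-init package update. The usual alternative is a drop-in file under /etc/systemd/system/cloud-init-local.service.d/. Below is a sketch of that approach; the helper name add_before_dropin, the file name ordering.conf and the UNIT_DIR variable are hypothetical choices for illustration.

```shell
#!/bin/sh
# Sketch: add a Before= ordering to a unit via a systemd drop-in instead of
# editing the packaged unit file. Files in <unit>.d/*.conf are merged into
# the unit, and Before= entries are added to the existing list.
# UNIT_DIR is a hypothetical knob; it defaults to the admin unit directory.
add_before_dropin() {
    unit="$1"      # unit to modify, e.g. cloud-init-local.service
    before="$2"    # unit(s) it must start before, e.g. network.service
    dir="${UNIT_DIR:-/etc/systemd/system}/$unit.d"
    mkdir -p "$dir" || return 1
    printf '[Unit]\nBefore=%s\n' "$before" > "$dir/ordering.conf"
}
```

As root, run add_before_dropin cloud-init-local.service network.service && systemctl daemon-reload. Afterwards, systemctl show -p Before cloud-init-local.service should list network.service.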

Comment 2 Jan "Yenya" Kasprzak 2016-03-17 07:42:44 UTC

Apparently upstream already has a similar fix:

http://bazaar.launchpad.net/~cloud-init-dev/cloud-init/trunk/view/head:/systemd/cloud-init-local.service

They seem to use Before=network-pre.target instead of Before=network.service. Please fix this in F23; without it, F23 cloud images are unusable with cloud-init image contextualization. Thanks!

Comment 3 Jan "Yenya" Kasprzak 2016-03-30 19:40:06 UTC

Further testing reveals that Before=network-pre.target does not help for F23. Only Before=network.service provides the proper ordering.
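
Before=network-pre.target only orders cloud-init-local.service ahead of units that are themselves ordered After=network-pre.target. According to systemd.special(7), network-pre.target is a passive target that network management software is expected to order itself after. Whether the F23 network.service actually does so can be checked from systemctl show output. The sketch below uses has_network_pre, a hypothetical helper that reads that output on stdin.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the output of
#   systemctl show -p After <unit>
# (read on stdin) lists network-pre.target among the unit's After= entries.
has_network_pre() {
    # Strip the "After=" prefix, put one unit per line, look for an exact match.
    sed 's/^After=//' | tr ' ' '\n' | grep -qx 'network-pre\.target'
}
```

Example: systemctl show -p After network.service | has_network_pre || echo "network.service is not ordered after network-pre.target"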

Comment 4 Bert JW Regeer (CTL) 2016-09-09 18:41:48 UTC

On CentOS 7 I had to use Before=NetworkManager.service. Otherwise NetworkManager would start at the same time as cloud-init-local and take over /etc/resolv.conf instead of using the entries from ConfigDrive (in my case), because cloud-init writes those entries directly to /etc/resolv.conf rather than putting them in the ifcfg-* scripts where NM could pick them up.

Since cloud-init with OpenNebula's context.sh also appears to write /etc/resolv.conf directly, you will likely run into this issue as well.

See: https://bugs.launchpad.net/cloud-init/+bug/1620845 and https://bugs.launchpad.net/cloud-init/+bug/1620807
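
The resolv.conf shown in the original description starts with a "# Generated by NetworkManager" header. Checking for that header is one quick way to see whether NetworkManager wrote the file last, which is the symptom described in this comment. This is a sketch: written_by_nm is a hypothetical name, and it only inspects the first line of the file.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a resolv.conf (default /etc/resolv.conf)
# starts with the header NetworkManager writes. Only the first line is checked.
written_by_nm() {
    head -n 1 "${1:-/etc/resolv.conf}" | grep -q '^# Generated by NetworkManager'
}
```

After a fresh deployment, written_by_nm && echo "resolv.conf was written by NetworkManager" indicates that NetworkManager, not the context data, set up name resolution.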

Comment 5 Fedora Update System 2016-10-16 01:55:56 UTC

cloud-init-0.7.8-2.fc25 has been pushed to the Fedora 25 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-e96b704c39

Comment 6 Fedora Update System 2016-10-18 11:29:48 UTC

cloud-init-0.7.8-2.fc25 has been pushed to the Fedora 25 stable repository. If problems still persist, please make note of it in this bug report.
