Bug 1311655 - cloud-init under OpenNebula works only after reboot
Status: CLOSED ERRATA
Product: Fedora
Classification: Fedora
Component: cloud-init
Version: 23
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: ---
Assigned To: Garrett Holmstrom
QA Contact: Fedora Extras Quality Assurance
Depends On:
Blocks:
Reported: 2016-02-24 11:55 EST by Jan "Yenya" Kasprzak
Modified: 2016-10-18 07:29 EDT
CC List: 8 users

See Also:
Fixed In Version: cloud-init-0.7.8-2.fc25
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-10-18 07:29:48 EDT
Type: Bug


Attachments: None
Description Jan "Yenya" Kasprzak 2016-02-24 11:55:35 EST
Description of problem:
Image contextualization using cloud-init in Fedora 23 under OpenNebula does not configure the assigned IPv4 address on a newly deployed machine. After the VM is rebooted (Ctrl-Alt-Del over VNC), the address is assigned correctly.

Version-Release number of selected component (if applicable):
cloud-init-0.7.6-5.20140218bzr1060.fc23.noarch
Fedora-Cloud-Base-23-20151030.x86_64.qcow2 (also tested on a manually prepared .qcow2 image installed from a Fedora Server kickstart).

How reproducible:
100%

Steps to Reproduce:
1. Create a Fedora 23 image inside OpenNebula with cloud-init installed, NetworkManager disabled, the network service enabled, biosdevname disabled, etc. I tried both the stock Fedora Cloud image and a manually installed F23 Server image.
2. Create a template using this image, enable network contextualization.
3. Deploy one or more instances of this template.

Actual results:
The newly deployed machine is unreachable over the IPv4 network.

Expected results:
The newly deployed machine should use the assigned IPv4 configuration.

Additional info:
- the configuration on the contextualization ISO image is at least partly read, because the hostname of the machine is set to ip-X-Y-A-B.localdomain (X.Y.A.B being the assigned IPv4 address).

- the interface eth0 is up and gets an IPv6 address, including the default IPv6 gateway, from the router advertisements on my network.

- when I click "Send Ctrl-Alt-Del" in the VNC interface, the VM reboots, and after reboot everything works as expected.

- the contents of the context.sh file are as follows (addresses and SSH key replaced):
vm# mount /dev/cdrom /mnt
vm# cat /mnt/context.sh
# Context variables generated by OpenNebula
DISK_ID='1'
ETH0_DNS='X.Y.Z.Q'
ETH0_GATEWAY='X.Y.A.1'
ETH0_IP='X.Y.A.B'
ETH0_MAC='02:00:XX:YY:AA:BB'
ETH0_MASK='255.255.255.0'
ETH0_NETWORK='X.Y.A.0'
NETWORK='YES'
SSH_PUBLIC_KEY='ssh-rsa AAAA[...]YKQ== me@myhost.org
'
TARGET='hda'

- the problem is indeed limited to IPv4 addressing; eth0 has IPv6 addresses but no inet (IPv4) address:
vm# ifconfig eth0
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet6 1234:5678:9012:3456:0:XXff:feYY:AABB  prefixlen 64  scopeid 0x0<global>
        inet6 fe80::XXff:feYY:AABB  prefixlen 64  scopeid 0x20<link>
        ether 02:00:XX:YY:AA:BB  txqueuelen 1000  (Ethernet)
        RX packets 9903  bytes 754531 (736.8 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 413  bytes 63764 (62.2 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

- the contextualization is correctly translated to the ifcfg-eth0 file:
vm#  cat /etc/sysconfig/network-scripts/ifcfg-eth0 
# Created by cloud-init v. 0.7.7 on Wed, 24 Feb 2016 09:50:24 +0000
BOOTPROTO=static
GATEWAY=X.Y.A.1
DEVICE=eth0
IPADDR=X.Y.A.B
NETMASK=255.255.255.0
ONBOOT=yes

- /etc/resolv.conf contains _two_ nameservers, which (in the context of our network) means it was probably generated from DHCP data rather than from context.sh:
vm# cat /etc/resolv.conf 
# Generated by NetworkManager
search my.domain
nameserver X.Y.Z.Q
nameserver X.Y.Z.R
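The observations above can be turned into a quick consistency check. The following Python sketch is illustrative only: all function names are hypothetical, the context.sh parser is a simplification (single-quoted values, no embedded quotes) and not the parser cloud-init itself uses, and it assumes ETH0_DNS holds a space-separated list of servers. It compares context.sh with the generated ifcfg-eth0, the IPv4 addresses reported by ifconfig, and the nameservers in /etc/resolv.conf. The addresses are RFC 5737 documentation placeholders.

```python
import re

# Simplified parser for OpenNebula context.sh: KEY='value', where a value may
# span several lines (as SSH_PUBLIC_KEY does). Assumes no embedded single quotes.
_ASSIGNMENT = re.compile(r"^([A-Z_][A-Z0-9_]*)='(.*?)'", re.MULTILINE | re.DOTALL)


def parse_context_sh(text: str) -> dict[str, str]:
    return dict(_ASSIGNMENT.findall(text))


def parse_ifcfg(text: str) -> dict[str, str]:
    """KEY=value pairs from an ifcfg-* file, skipping comments and blank lines."""
    pairs = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, _, value = line.partition("=")
            pairs[key] = value.strip("'\"")
    return pairs


def ipv4_addresses(ifconfig_output: str) -> list[str]:
    # "inet " with a trailing space does not match the "inet6" lines.
    return re.findall(r"^\s*inet (\d+(?:\.\d+){3})", ifconfig_output, re.MULTILINE)


def nameservers(resolv_conf: str) -> list[str]:
    return [
        fields[1]
        for fields in (line.split() for line in resolv_conf.splitlines())
        if len(fields) >= 2 and fields[0] == "nameserver"
    ]


def diagnose(context: str, ifcfg: str, ifconfig_output: str, resolv_conf: str) -> list[str]:
    """Return human-readable mismatches between context.sh and the running system."""
    ctx, cfg = parse_context_sh(context), parse_ifcfg(ifcfg)
    problems = []
    if cfg.get("IPADDR") != ctx.get("ETH0_IP"):
        problems.append("ifcfg-eth0 IPADDR does not match ETH0_IP")
    if ctx.get("ETH0_IP") not in ipv4_addresses(ifconfig_output):
        problems.append("ETH0_IP is not configured on the interface")
    if nameservers(resolv_conf) != ctx.get("ETH0_DNS", "").split():
        problems.append("resolv.conf nameservers differ from ETH0_DNS")
    return problems


if __name__ == "__main__":
    # Placeholder values shaped like the outputs in this report.
    context = "ETH0_IP='192.0.2.10'\nETH0_DNS='198.51.100.53'\n"
    ifcfg = "BOOTPROTO=static\nDEVICE=eth0\nIPADDR=192.0.2.10\nONBOOT=yes\n"
    ifconfig = (
        "eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500\n"
        "        inet6 fe80::1  prefixlen 64  scopeid 0x20<link>\n"
    )
    resolv = "nameserver 198.51.100.53\nnameserver 198.51.100.54\n"
    for problem in diagnose(context, ifcfg, ifconfig, resolv):
        print(problem)
```

With inputs shaped like this report, the check passes for ifcfg-eth0 but flags the missing IPv4 address and the DHCP-style resolv.conf. That combination is consistent with ifcfg-eth0 being correct on disk but written too late to be used on first boot.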
Comment 1 Jan "Yenya" Kasprzak 2016-03-16 17:46:56 EDT
OK, the problem is that cloud-init-local.service has no ordering dependency on the network service. The network service (/etc/init.d/network) is started in parallel with cloud-init-local.service, so it picks up whatever /etc/sysconfig/network-scripts/ifcfg-eth0 contained when the VM image was created and configures eth0 from it. Some time later, cloud-init --local writes the correct ifcfg-eth0 file, which is then used only after a reboot.

I think the correct solution would be an explicit ordering between cloud-init-local.service and network.service:

--- /tmp/cloud-init-local.service	2016-03-16 22:39:41.570920600 +0100
+++ /usr/lib/systemd/system/cloud-init-local.service	2016-03-16 22:31:10.339623114 +0100
@@ -2,6 +2,7 @@
 Description=Initial cloud-init job (pre-networking)
 Wants=local-fs.target
 After=local-fs.target
+Before=network.service
 
 [Service]
 Type=oneshot
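The race described above can be modelled as an ordering graph. The sketch below uses Python's graphlib (3.9+) to group units into batches that have no ordering between them and could therefore start in parallel. It is a simplified model, not how systemd actually schedules jobs, and the graph is reduced to the three units named in the diff; the assumption that network.service is ordered only after local-fs.target is a simplification. Before=X on a unit U is treated as equivalent to After=U on X.

```python
from graphlib import TopologicalSorter


def start_batches(after: dict[str, set[str]], before: dict[str, set[str]]) -> list[list[str]]:
    """Group units into batches whose members have no ordering between them.

    after[u]  = units that u is ordered after  (u's After= list)
    before[u] = units that u is ordered before (u's Before= list)
    """
    # TopologicalSorter expects node -> predecessors; copy so inputs are untouched.
    graph: dict[str, set[str]] = {unit: set(deps) for unit, deps in after.items()}
    for unit, successors in before.items():
        for succ in successors:
            # Before=succ on unit is the same edge as After=unit on succ.
            graph.setdefault(succ, set()).add(unit)
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches


after = {
    "cloud-init-local.service": {"local-fs.target"},
    "network.service": {"local-fs.target"},  # simplified assumption
}
print(start_batches(after, {}))
# [['local-fs.target'], ['cloud-init-local.service', 'network.service']]
print(start_batches(after, {"cloud-init-local.service": {"network.service"}}))
# [['local-fs.target'], ['cloud-init-local.service'], ['network.service']]
```

Without the Before= line, nothing keeps network.service from reading ifcfg-eth0 before cloud-init-local.service has rewritten it; with it, network.service lands in a later batch.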
Comment 2 Jan "Yenya" Kasprzak 2016-03-17 03:42:44 EDT
Apparently upstream already has a similar fix:

http://bazaar.launchpad.net/~cloud-init-dev/cloud-init/trunk/view/head:/systemd/cloud-init-local.service

They seem to use Before=network-pre.target instead of Before=network.service. Please fix this in F23; without it, F23 cloud images are unusable with cloud-init image contextualization. Thanks!
Comment 3 Jan "Yenya" Kasprzak 2016-03-30 15:40:06 EDT
Further testing reveals that Before=network-pre.target does not help for F23. Only Before=network.service provides the proper ordering.
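To see which ordering a given image actually ships, the [Unit] section of the unit file can be inspected directly. A minimal sketch, assuming a plain unit file without backslash line continuations; the function name is hypothetical. It reads only the file text, whereas on a running system `systemctl show -p Before cloud-init-local.service` also reflects any drop-ins.

```python
def unit_list_setting(unit_text: str, key: str = "Before", section: str = "Unit") -> list[str]:
    """Collect the space-separated values of every `key=` line in `section`.

    Ordering settings such as Before= may appear more than once; the values accumulate.
    Backslash line continuations are not handled in this sketch.
    """
    current = None
    values: list[str] = []
    for raw in unit_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]
        elif current == section and "=" in line:
            name, _, value = line.partition("=")
            if name.strip() == key:
                values.extend(value.split())
    return values


# The [Unit] section from the patch above, plus the start of [Service].
PATCHED = """[Unit]
Description=Initial cloud-init job (pre-networking)
Wants=local-fs.target
After=local-fs.target
Before=network.service

[Service]
Type=oneshot
"""

before = unit_list_setting(PATCHED)
print("network.service" in before)     # True
print("network-pre.target" in before)  # False
```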
Comment 4 Bert JW Regeer (CTL) 2016-09-09 14:41:48 EDT
On CentOS 7 I had to use Before=NetworkManager.service. Otherwise NetworkManager starts at the same time as cloud-init-local and takes over /etc/resolv.conf instead of using the entries from ConfigDrive (in my case). cloud-init writes those entries to /etc/resolv.conf directly rather than putting them in ifcfg-* scripts where NetworkManager could pick them up.

Since OpenNebula contextualization with context.sh also appears to modify /etc/resolv.conf directly, you'll also run into this issue:


See: https://bugs.launchpad.net/cloud-init/+bug/1620845 and https://bugs.launchpad.net/cloud-init/+bug/1620807
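Rather than editing /usr/lib/systemd/system/cloud-init-local.service, which a package update would overwrite, the same ordering could be added locally through a systemd drop-in under /etc/systemd/system/cloud-init-local.service.d/. The Python sketch below only generates such a file; the file name ordering.conf is a hypothetical choice, and listing both network.service and NetworkManager.service simply follows Comments 3 and 4. Before= lines only order units; they do not pull the listed services into the boot transaction. After writing the file, `systemctl daemon-reload` is needed for systemd to pick it up.

```python
from pathlib import Path

# Drop-ins for a unit live in /etc/systemd/system/<unit>.d/*.conf.
# "ordering.conf" is a hypothetical file name.
UNIT = "cloud-init-local.service"
DROPIN_NAME = "ordering.conf"


def render_ordering_dropin(before_units: list[str]) -> str:
    """Return a [Unit] section that orders UNIT before each listed unit."""
    return "[Unit]\n" + "".join(f"Before={unit}\n" for unit in before_units)


def write_dropin(content: str, root: Path = Path("/")) -> Path:
    """Write the drop-in below `root` (pass a scratch directory to try it without root)."""
    directory = root / "etc/systemd/system" / f"{UNIT}.d"
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / DROPIN_NAME
    path.write_text(content)
    return path


if __name__ == "__main__":
    print(render_ordering_dropin(["network.service", "NetworkManager.service"]), end="")
```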
Comment 5 Fedora Update System 2016-10-15 21:55:56 EDT
cloud-init-0.7.8-2.fc25 has been pushed to the Fedora 25 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-e96b704c39
Comment 6 Fedora Update System 2016-10-18 07:29:48 EDT
cloud-init-0.7.8-2.fc25 has been pushed to the Fedora 25 stable repository. If problems still persist, please make note of it in this bug report.
