Bug 1393094

Summary: f25: cloud-init systemd dependency loop for multi-user.target
Product: [Fedora] Fedora Reporter: Dusty Mabe <dustymabe>
Component: cloud-initAssignee: Garrett Holmstrom <gholms>
Status: CLOSED ERRATA QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 25CC: adimania, apevec, gholms, Jan.van.Eldik, lars, mattdm, shardy, s, steve.traylen, strigazi, walters
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: cloud-init-0.7.8-5.fc25 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-02-02 20:20:48 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:

Description Dusty Mabe 2016-11-08 20:43:10 UTC
=== NOTE === This bug is not causing breakage but does need to be fixed. It is not a blocker for F25.


Description of problem:

Booting cloud image with cloud-init-0.7.8-3.fc25.noarch you see the following:

Nov 08 19:10:56 digitalocean systemd[1]: multi-user.target: Found ordering cycle on multi-user.target/start
Nov 08 19:10:56 digitalocean systemd[1]: multi-user.target: Found dependency on cloud-init.target/start
Nov 08 19:10:56 digitalocean systemd[1]: multi-user.target: Found dependency on cloud-final.service/start
Nov 08 19:10:56 digitalocean systemd[1]: multi-user.target: Found dependency on multi-user.target/start
Nov 08 19:10:56 digitalocean systemd[1]: multi-user.target: Breaking ordering cycle by deleting job cloud-init.target/start
Nov 08 19:10:56 digitalocean systemd[1]: cloud-init.target: Job cloud-init.target/start deleted to break ordering cycle starting with multi-user.target/start



We need to backport this upstream fix:

https://git.launchpad.net/cloud-init/commit/?id=3705bb5964a2ff3f9a67265e6d090a112b35e40c

Comment 1 Spyros Trigazis 2017-01-17 17:06:59 UTC
Any update on this. I have noticed quite a few breaks on the atomic host. It's a bit random, but in slow vms, it happens a lot.

Comment 2 Spyros Trigazis 2017-01-18 08:46:06 UTC
The following are logs from OpenStack/Magnum's tests in the openstack infrastructure.

Two vms are created using Fedora Atomic 20170106.

In one node cloud init configures the network. [1]
On the other node it doesn't. [2]

Deployed in the same environment with the same qcow2 image.

In an environment built with the same process, we get the error mentioned in the bug description: [3]
Jan 16 13:03:02 localhost.localdomain systemd[1]: cloud-final.service: Job cloud-final.service/start deleted to break ordering cycle starting with cloud-init.target/start

[1] http://logs.openstack.org/57/417457/22/check/gate-functional-dsvm-magnum-k8s-ubuntu-xenial/f300eb2/logs/cluster-nodes/node-test_replication_controller_apis-172.24.5.8/cloud-init-output.txt.gz
[2] http://logs.openstack.org/57/417457/22/check/gate-functional-dsvm-magnum-k8s-ubuntu-xenial/f300eb2/logs/cluster-nodes/master-test_replication_controller_apis-172.24.5.5/cloud-config.txt.gz
[3] http://logs.openstack.org/57/417457/22/check/gate-functional-dsvm-magnum-swarm-ubuntu-xenial/f976e1b/logs/cluster-nodes/master-test_start_stop_container_from_api-172.24.5.8/cloud-final.txt.gz

Comment 3 Steve Traylen 2017-01-18 15:17:40 UTC
Hi,

This looks to be fixed in now release 0.7.9 version of cloud init.

Comment 4 Steve Traylen 2017-01-18 15:18:59 UTC
Please can we update 0.7.9?

Comment 5 Colin Walters 2017-01-20 19:11:29 UTC
Building https://koji.fedoraproject.org/koji/taskinfo?taskID=17346019

I'll backport to f25 soon.

Comment 6 Dusty Mabe 2017-01-23 14:40:22 UTC
> Building https://koji.fedoraproject.org/koji/taskinfo?taskID=17346019
> 
> I'll backport to f25 soon.

That build failed

Comment 8 Dusty Mabe 2017-01-23 15:27:27 UTC
also here is a scratch build for f25 if anyone wants to grab it and test things out:

https://koji.fedoraproject.org/koji/taskinfo?taskID=17393273

Comment 9 Dusty Mabe 2017-01-25 09:09:14 UTC
I found an issue with the new cloud-init - basically during the run of cloud-init it tries to set the hostname using hostnamectl but dbus hasn't started yet and it fails. We need to figure this out. 


from the debug log: 

```
2017-01-25 08:59:38,338 - util.py[DEBUG]: Running module set_hostname (<module 'cloudinit.config.cc_set_hostname' from '/usr/lib/python3.5/site-packages/cloudinit/config/cc_set_hostname.py'>) failed
Traceback (most recent call last):
  File "/usr/lib/python3.5/site-packages/cloudinit/stages.py", line 776, in _run_modules
    freq=freq)
  File "/usr/lib/python3.5/site-packages/cloudinit/cloud.py", line 54, in run
    return self._runners.run(name, functor, args, freq, clear_on_fail)
  File "/usr/lib/python3.5/site-packages/cloudinit/helpers.py", line 187, in run
    results = functor(*args)
  File "/usr/lib/python3.5/site-packages/cloudinit/config/cc_set_hostname.py", line 47, in handle
    cloud.distro.set_hostname(hostname, fqdn)
  File "/usr/lib/python3.5/site-packages/cloudinit/distros/__init__.py", line 84, in set_hostname
    self._write_hostname(writeable_hostname, self.hostname_conf_fn)
  File "/usr/lib/python3.5/site-packages/cloudinit/distros/rhel.py", line 130, in _write_hostname
    util.subp(['hostnamectl', 'set-hostname', str(hostname)])
  File "/usr/lib/python3.5/site-packages/cloudinit/util.py", line 1850, in subp
    cmd=args)
cloudinit.util.ProcessExecutionError: Unexpected error while running command.
Command: ['hostnamectl', 'set-hostname', 'cloudhost.localdomain']
Exit code: 1
Reason: -
Stdout: -
Stderr: Failed to create bus connection: No such file or directory

```


and from the journal you can see that dbus starts after cloud-init:

```
-bash-4.3# journalctl -u cloud-init | grep -P 'Starting|hostname' 
Jan 25 08:59:36 localhost.localdomain systemd[1]: Starting Initial cloud-init job (metadata service crawler)...
Jan 25 08:59:38 localhost.localdomain cloud-init[1308]: 2017-01-25 08:59:38,331 - util.py[WARNING]: Failed to set the hostname to cloudhost.localdomain (cloudhost)
Jan 25 08:59:38 localhost.localdomain cloud-init[1308]: 2017-01-25 08:59:38,338 - util.py[WARNING]: Running module set_hostname (<module 'cloudinit.config.cc_set_hostname' from '/usr/lib/python3.5/site-packages/cloudinit/config/cc_set_hostname.py'>) failed
-bash-4.3# 
-bash-4.3# 
-bash-4.3# journalctl -u dbus | head -n 2
-- Logs begin at Wed 2017-01-25 08:59:29 UTC, end at Wed 2017-01-25 09:02:04 UTC. --
Jan 25 08:59:38 localhost.localdomain systemd[1]: Started D-Bus System Message Bus.
```

We need to file a bug upstream and/or send a patch to get this fixed for us. 


I pulled the patch from the commit in the first comment of this BZ and applied it to cloud-init 0.7.8 in Fedora and built a scratch build:


https://koji.fedoraproject.org/koji/taskinfo?taskID=17405006

Comment 10 Lars Kellogg-Stedman 2017-01-25 16:27:15 UTC
cloud-init 0.7.9 includes:

  DefaultDependencies=no

Which means that it does *not* have the implicit dependency on basic.target.  So with 0.7.8, running systemd-analyze dot cloud-init.service yields:

	"cloud-init.service"->"basic.target" [color="green"];

But in 0.7.9 there is no such dependency.  dbus.service runs before basic.target, and this is why everything works in 0.7.8 but hostname setting fails in 0.7.9.

Comment 11 Spyros Trigazis 2017-01-25 16:34:41 UTC
With this build [1] from comment #9, I did a few runs in OpenStack's CI and I didn't see the error again.

[1] https://koji.fedoraproject.org/koji/taskinfo?taskID=17405006

Comment 12 Lars Kellogg-Stedman 2017-01-25 17:04:23 UTC
Related:

https://bugs.launchpad.net/ubuntu/+source/cloud-init/+bug/1629797

Comment 13 Lars Kellogg-Stedman 2017-01-25 18:06:28 UTC
And also:

https://github.com/systemd/systemd/issues/4504

Comment 14 Garrett Holmstrom 2017-01-27 02:41:18 UTC
Filed the new issue as bug 1417025.

Comment 15 Fedora Update System 2017-01-27 03:10:46 UTC
cloud-init-0.7.8-5.fc25 has been submitted as an update to Fedora 25. https://bodhi.fedoraproject.org/updates/FEDORA-2017-1f675fd070

Comment 16 Fedora Update System 2017-01-28 04:55:31 UTC
cloud-init-0.7.8-5.fc25 has been pushed to the Fedora 25 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2017-1f675fd070

Comment 17 Fedora Update System 2017-02-02 20:20:48 UTC
cloud-init-0.7.8-5.fc25 has been pushed to the Fedora 25 stable repository. If problems still persist, please make note of it in this bug report.