Bug 1440831
| Summary: | boot into 7.3.4 from 7.3.3 ISO install causes cloud-init to run | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Micah Abbott <miabbott> | ||||
| Component: | cloud-init | Assignee: | Lars Kellogg-Stedman <lars> | ||||
| Status: | CLOSED ERRATA | QA Contact: | Vratislav Hutsky <vhutsky> | ||||
| Severity: | high | Docs Contact: | |||||
| Priority: | high | ||||||
| Version: | 7.3 | CC: | dustymabe, huzhao, jlebon, lars, lfriedma, lmiksik, mbracho, miabbott, mmagr, salmy, sgirijan, walters, yacao | ||||
| Target Milestone: | rc | Keywords: | Triaged | ||||
| Target Release: | --- | Flags: | mbracho: needinfo+ | ||||
| Hardware: | Unspecified | ||||||
| OS: | Unspecified | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | cloud-init-0.7.9-5.el7 | Doc Type: | If docs needed, set a value | ||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | |||||||
| : | 1455335 (view as bug list) | Environment: | |||||
| Last Closed: | 2017-08-01 23:23:42 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Bug Depends On: | |||||||
| Bug Blocks: | 1455335 | ||||||
| Attachments: | |||||||
Description
Micah Abbott
2017-04-10 14:36:37 UTC
Created attachment 1270509 [details]
journal from boot
(In reply to Micah Abbott from comment #0)
> The cloud-init service took approximately 2 minutes before giving up, during
> which time the system could not be accessed via SSH or console.

Scratch that...I actually measured it at about 4.5 minutes

So, I *think* the issue here is that cloud-init now ships a generator: http://cloudinit.readthedocs.io/en/latest/topics/boot.html#generator

We do disable cloud-init in the interactive-defaults kickstart, but that seems to get overridden by the generator. Adding 'cloud-init=disabled' to the kernel cmdline does work around it fortunately.

As Lars mentioned, the easiest thing to do here might be to back out the generator for now, until we can figure out how to do this properly *without* affecting already installed clients.

Smoking gun:

Apr 10 11:05:57 localhost.localdomain systemd[572]: Spawned /usr/lib/systemd/system-generators/cloud-init-generator as 573.
...
Apr 10 11:05:57 localhost.localdomain systemd[572]: /usr/lib/systemd/system-generators/cloud-init-generator succeeded.
...
Apr 10 11:05:57 localhost.localdomain systemd[1]: Installed new job cloud-init.target/start as 215
...
Apr 10 11:05:57 localhost.localdomain systemd[1]: Installed new job cloud-init.service/start as 223
Apr 10 11:05:57 localhost.localdomain systemd[1]: Installed new job cloud-init-local.service/start as 221
...

for my reference, code that disables cloud-init is: https://github.com/projectatomic/rpm-ostree-toolbox/blob/master/src/py/lorax-http-repo.tmpl

Specifically https://bugzilla.redhat.com/show_bug.cgi?id=850058#c5 I don't see a major risk in backporting this to RHEL - the only thing to note is that it requires a coordinated change with spin-kickstarts to enable it for the cloud images.

I renamed the bug title since it doesn't actually have to be a bare metal install.

> I don't see a major risk in backporting this to RHEL - the only thing to note is that it requires a coordinated change with spin-kickstarts to enable it for the cloud images.
The issue is that the generator will actively re-enable services, even if previously disabled: https://git.launchpad.net/cloud-init/tree/systemd/cloud-init-generator?id=61eb03f#n132.

This made me wonder why the generator was added in the first place, which led to: https://git.launchpad.net/cloud-init/commit?id=df2d69

So, it was added so that cloud-init could be disabled from the kernel cmdline. The assumption that it should be enabled in all other cases probably makes sense in most places where cloud-init is found, though not for the AH install path of course.

This makes me wonder if we shouldn't just completely neuter the generator in Atomic Host composes, since here we do want to preserve the semantics of enabled/disabled services. E.g. in postprocess, do

ln -s /dev/null /etc/systemd/system-generators/cloud-init-generator

I don't understand why that generator was added upstream. That's not really what generators are for (IMO). The default systemd mechanism for controlling service enablement is *presets*. Anyways, I'm fine nuking the generator just for RHELAH, am also fine nuking it globally.

I've produced a scratch build of cloud-init that removes the generator from the package. I generally agree with Colin that this isn't the best way to manage services. You can find the build at: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=12991127 As soon as this bz has the appropriate acks I'll turn that into a real build.

@larsks should we ask upstream about the sanity of using generators? and see if it should be changed?

Dusty: Possibly. I think that the generator could be replaced by a ConditionKernelCommandLine and a ConditionPathExists in the unit files. I'll suggest that, but I won't hold my breath.

Incidentally, while the -4 package noted in #11 will fix the issue, I think we may have also mis-characterized the underlying problem. In #9, jlebon says:
> The issue is that the generator will actively re-enable services, even if previously disabled
This isn't actually true. If, *after* upgrading to 0.7.9-3, you disable the cloud-init services:
systemctl disable cloud-init cloud-init-local cloud-config cloud-final
They will remain disabled when you reboot the system. That is, cloud-init upstream did not fundamentally break service management, although as Colin said their decision to use a generator is questionable.
I believe the root cause is that with 0.7.9, the cloud-init services moved from multi-user.target to cloud-init.target, which means that with the new compose there would be new files in /etc/systemd/system/cloud-init.target.wants.
The generator operates by enabling/disabling cloud-init.target, rather than individual services.
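To make that distinction concrete, here is a minimal sketch of a generator that toggles cloud-init.target as a whole instead of touching the individual services. It is not the upstream script: the function name `toy_generator`, the unit path under /usr/lib/systemd/system, and passing the kernel command line as an argument (rather than reading /proc/cmdline) are all assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical sketch, NOT the upstream cloud-init-generator.
# $1: directory the generator may write units/links into
# $2: kernel command line text (a real generator would read /proc/cmdline)
toy_generator() (
    unit_dir=$1
    set -f                          # no glob expansion while word-splitting $2
    state=enabled                   # assumed default: enabled unless told otherwise
    for tok in $2; do
        case "$tok" in
            cloud-init=disabled) state=disabled ;;
            cloud-init=enabled)  state=enabled ;;
        esac
    done
    # Only the target is toggled; the services hang off the target.
    link="$unit_dir/multi-user.target.wants/cloud-init.target"
    if [ "$state" = enabled ]; then
        mkdir -p "${link%/*}"
        # Assumed unit location; adjust for the distribution in question.
        ln -sf /usr/lib/systemd/system/cloud-init.target "$link"
    else
        rm -f "$link"
    fi
    echo "$state"
)
```

Calling `toy_generator "$dir" "ro quiet cloud-init=disabled"` against a scratch directory leaves no link behind, which mirrors the kernel cmdline workaround mentioned earlier in this bug. Services disabled individually are untouched either way.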
Because of the "new" files in the package, I believe the three-way merge of /etc that happens as part of the atomic upgrade process will populate those new files into /etc as part of the upgrade.
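A simplified model of such a three-way merge shows why the new files show up. This is only a sketch under stated assumptions: it handles regular files only, the function name `merge_etc` and its argument layout are invented here, and it is not how ostree is actually implemented. The idea is to start from the new tree's defaults and then re-apply whatever the admin changed relative to the old defaults.

```shell
#!/bin/sh
# Illustrative model of a three-way /etc merge (not ostree's code).
# Usage: merge_etc OLD_DEFAULT CURRENT_ETC NEW_DEFAULT OUT
merge_etc() {
    old=$1; cur=$2; new=$3; out=$4
    mkdir -p "$out"
    # 1. Start from the defaults shipped in the new tree.
    cp -R "$new"/. "$out"/
    # 2. Re-apply local additions and modifications (current differs from old default).
    (cd "$cur" && find . -type f) | while IFS= read -r f; do
        if ! cmp -s "$cur/$f" "$old/$f" 2>/dev/null; then
            mkdir -p "$out/$(dirname "$f")"
            cp "$cur/$f" "$out/$f"
        fi
    done
    # 3. Re-apply local removals (in old default, gone from current).
    (cd "$old" && find . -type f) | while IFS= read -r f; do
        [ -e "$cur/$f" ] || rm -f "$out/$f"
    done
}
```

In this model, a file removed from multi-user.target.wants on the installed system stays removed, but a file that first appears under cloud-init.target.wants in the new defaults has no local change recorded against it, so it lands in the merged /etc.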
(Also, note that the above is *only* going to happen during an ostree compose, because the services are only enabled automatically by the spec file during an *install*. A package *upgrade* will not trigger this behavior.)

Blocker for 7.3.5.

(In reply to Lars Kellogg-Stedman from comment #14)
> Because of the "new" files in the package, I believe the three-way merge of
> /etc that happens as part of the atomic upgrade process will populate those
> new files into /etc as part of the upgrade.

Yes, that's expected. But the same thing would happen with rpm's default config management, no?

Anyways, in the big picture, I think it's simplest if cloud-init worked like every other service - use the systemd macros which honor presets. That's what the patch in https://bugzilla.redhat.com/show_bug.cgi?id=850058#c5 does.

Currently, that'd mean it was disabled by default in /usr, though we could change that to allow it to default-enable. Before in Fedora when "Cloud" was a separate edition with its own presets, that would have made sense. But since Atomic Host spans metal and cloud, we *have* to carry a delta for one or the other. Right now we're hacking around cloud-init enabling itself and forcibly disabling it on metal. My preference again is inverting this - in the cloud image kickstart, we `systemctl enable cloud-init`.

> use the systemd macros which honor presets
I agree that presets would be nifty. I wasn't sure we were making much use of systemd presets in rhel7, but if that's not true, awesome. We'll need to get appropriate presets into our images before we can modify the package and I'm not sure what that process would look like...I guess a bug against rhel-guest-image and atomic?
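For reference, a preset for the cloud images could look roughly like the file written by the sketch below. The helper name `write_cloud_preset` and the file name `80-cloud-init.preset` are hypothetical; the unit names are the four cloud-init services mentioned earlier in this bug, and the `enable <unit>` lines follow the systemd preset file format.

```shell
#!/bin/sh
# Hypothetical helper: write a preset file enabling the cloud-init units.
# $1: preset directory (e.g. a staging copy of /usr/lib/systemd/system-preset)
write_cloud_preset() {
    dir=$1
    mkdir -p "$dir"
    # File name is an assumption; systemd reads *.preset files in lexical order.
    cat > "$dir/80-cloud-init.preset" <<'EOF'
enable cloud-init-local.service
enable cloud-init.service
enable cloud-config.service
enable cloud-final.service
EOF
}
```

With a file like this present in the cloud image only, a package whose scriptlets honor presets would come up enabled there and stay disabled on a metal install that does not carry the preset.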
There's a scratch build here that mostly reverts the systemd units to what we had in previous releases: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=13120763

Still no presets, but they are once again installed to multi-user.target and the generator is gone. Can you build a new tree with this package and test the upgrade? Thanks!

I made a local scratch compose with the build that Lars linked to and I can confirm it worked in the case of:

- installing via ISO onto bare-metal, then upgrade to scratch compose
- installing via cloud-image via libvirt, then upgrade to scratch compose
- installing via cloud-image via OpenStack, then upgrade to scratch compose

I didn't do any kind of significant testing of the various features supported by cloud-init, but the most basic cases seemed to work fine.

I successfully repeated my simple tests from comment #22 using the latest RHELAH 7.4 compose, which includes cloud-init-0.7.9-9.el7.x86_64

Based on those results, I'm moving this to VERIFIED

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:2275