Bug 1272254
Summary: | Overcloud update fails due to os-collect-config restart | ||
---|---|---|---|
Product: | Red Hat OpenStack | Reporter: | Jan Provaznik <jprovazn> |
Component: | os-collect-config | Assignee: | Mike Burns <mburns> |
Status: | CLOSED ERRATA | QA Contact: | Alexander Chuzhoy <sasha> |
Severity: | high | Docs Contact: | |
Priority: | urgent | ||
Version: | 7.0 (Kilo) | CC: | apevec, augol, cylopez, dmacpher, dsavinea, glambert, hrosnet, lhh, mburns, mchappel, rhel-osp-director-maint, sasha, sbaker, yeylon, zbitter |
Target Milestone: | y2 | Keywords: | Triaged |
Target Release: | 7.0 (Kilo) | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | os-collect-config-0.1.35-4.el7ost | Doc Type: | Bug Fix |
Doc Text: |
The "os-collect-config" service on the Overcloud restarted on an RPM update. This caused Overcloud updates to fail. This fix changes the behavior so that "os-collect-config" does not restart on an RPM update. The Overcloud updates now succeed after an update of "os-collect-config". Note that "os-collect-config" gracefully restarts itself when "os-refresh-config" runs, so the restart on update is not required.
|
Last Closed: | 2015-12-21 16:56:39 UTC | Type: | Bug |
Bug Depends On: | 1274859, 1275814 | ||
Bug Blocks: |
Description
Jan Provaznik
2015-10-15 21:35:52 UTC
os-collect-config is designed to gracefully restart at the end of each run if any data changes, so the rpm spec does not need to specify a restart for the os-collect-config service.

https://github.com/openstack/os-collect-config/blob/master/os_collect_config/collect.py#L287

As an urgent fix, I would suggest releasing an os-collect-config package which doesn't restart the service.

Unfortunately os-collect-config is still being restarted:

Oct 16 04:34:51 overcloud-controller-0.localdomain os-collect-config[2003]: 2015-10-16 04:34:51.390 2003 WARNING os_collect_config.local [-] No local metadata found (['/var/lib/os-collect-config/local-data'])
Oct 16 04:35:27 overcloud-controller-0.localdomain yum[28308]: Updated: os-collect-config-0.1.35-4.el7ost.noarch
Oct 16 04:35:33 overcloud-controller-0.localdomain os-collect-config[29174]: 2015-10-16 04:35:33.061 29174 WARNING os-collect-config [-] Source [request] Unavailable.

The restart is actually triggered by the rpm being removed, not the new one being installed. The rpm script is %postun, which tells rpm what to do when the package is being removed (or upgraded). There isn't anything we can do about that other than document that users need to manually update the rpm on each host *first*, then run the stack update.

If we can't avoid a restart, then we should be able to get systemd to not kill os-collect-config's child processes. According to man systemd.kill [1], setting

[Service]
SendSIGKILL=no

would prevent os-refresh-config from being killed when os-collect-config is. This would allow the full os-refresh-config run to continue until its natural exit. The restarted os-collect-config may attempt to do another os-refresh-config run while the old one is still running, but this is fine, as os-refresh-config prevents concurrent runs with a lockfile [2]. It would be nice if we could fix this in the systemd unit rather than requiring a manual upgrade of the package.

[1] http://www.freedesktop.org/software/systemd/man/systemd.kill.html
[2] https://github.com/openstack/os-refresh-config/blob/master/os_refresh_config/os_refresh_config.py#L93

I think we should make that change to the package, but also add code in the upgrade script to set SendSIGKILL=no in the service file if it is not already present, and then do a systemctl daemon-reload so that when yum runs the %postun stanza it will not kill the existing os-collect-config. I think that will allow us to make the initial transition (from not having SendSIGKILL=no to having it) without a manual workaround. The things to watch out for would be how yum treats modified files on an uninstall (I think it renames them with a suffix instead of removing them) and how that interacts with systemd (I think it probably works, because the directory it actually starts things from just contains symlinks to the actual unit files). It should work, but there may be subtleties.

I'll look into patching the unit file in the update script too.

The fixed package works for me when upgrading puddles 2015-07-30-1 -> 2015-10-21-1. One quirk is that journalctl -u os-collect-config stops logging the orphaned os-refresh-config, so the results of the remaining update script can't be seen until heat is signalled with the full deploy_stdout. This is to be expected; it's just something to keep in mind.

Upgrading works from 7.0 to 7.2 now. The original error is very binary: either it works or it doesn't. Since it does, that's enough to mark this as verified.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2015:2651
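
The proposal earlier in this thread, having the update script set SendSIGKILL=no in the service file if it is not already present, could be sketched roughly as follows. This is only an illustration under stated assumptions, not the code that shipped in the fixed package: the function name ensure_send_sigkill_no and the sample unit text are hypothetical, and the real os-collect-config unit file may look different. The function only transforms text; writing the result back to the unit file and running systemctl daemon-reload afterwards, as the thread suggests, is left to the caller.

```python
KEY = "SendSIGKILL"
SETTING = KEY + "=no"


def ensure_send_sigkill_no(unit_text: str) -> str:
    """Return unit_text with SendSIGKILL=no set in the [Service] section.

    Hypothetical helper. It works line by line on purpose: systemd unit
    files may repeat keys (for example several ExecStart= lines), which
    strict INI parsers reject.
    """
    lines = unit_text.splitlines()
    section = None          # header of the section currently being scanned
    service_header = None   # index of the first [Service] header, if any
    found = False           # True once an existing SendSIGKILL= was rewritten

    for i, line in enumerate(lines):
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            section = stripped
            if section == "[Service]" and service_header is None:
                service_header = i
        elif section == "[Service]" and stripped.split("=", 1)[0].strip() == KEY:
            # Overwrite whatever value was there (commented-out lines are
            # not matched, because their key starts with '#').
            lines[i] = SETTING
            found = True

    if not found:
        if service_header is None:
            # No [Service] section at all: append one.
            lines += ["[Service]", SETTING]
        else:
            # Add the setting directly below the [Service] header.
            lines.insert(service_header + 1, SETTING)

    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Made-up unit text, used only to demonstrate the helper.
    sample = (
        "[Unit]\n"
        "Description=Collect metadata and run hook commands\n"
        "\n"
        "[Service]\n"
        "ExecStart=/usr/bin/os-collect-config\n"
        "\n"
        "[Install]\n"
        "WantedBy=multi-user.target\n"
    )
    print(ensure_send_sigkill_no(sample), end="")
```

Running the helper on its own output leaves the text unchanged, so an update script could call it on every upgrade without accumulating duplicate settings.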