Bug 1647956
| Summary: | [UPGRADES][14] Need a way to disable validation during undercloud upgrade re-run | ||
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Yurii Prokulevych <yprokule> |
| Component: | openstack-tripleo-heat-templates | Assignee: | Sofer Athlan-Guyot <sathlang> |
| Status: | CLOSED ERRATA | QA Contact: | Ronnie Rasouli <rrasouli> |
| Severity: | high | Docs Contact: | |
| Priority: | high | ||
| Version: | 14.0 (Rocky) | CC: | augol, ccamacho, hbrock, jamsmith, jfrancoa, jslagle, jstransk, lbezdick, mburns, rheslop, sathlang, sgolovat, ssmolyak, yprokule |
| Target Milestone: | z2 | Keywords: | Triaged, ZStream |
| Target Release: | 14.0 (Rocky) | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | openstack-tripleo-heat-templates-9.2.1-0.20190119154863.el7ost.noarch | Doc Type: | Bug Fix |
| Doc Text: | This update fixes an issue that prevented users from successfully re-running a failed OSP13-to-OSP14 upgrade of OpenStack Platform director. Some upgrade failures resulted in a state where services were not yet deployed with docker, which prevented a successful re-run of the upgrade. Now a check is performed to verify that the services are deployed under docker control, enabling a successful re-run. | | |
| Story Points: | --- | | |
| Clone Of: | | Environment: | |
| Last Closed: | 2019-04-30 17:51:14 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
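The check described in the Doc Text, combined with the "best effort only" approach agreed on in the discussion, can be sketched as a guard that runs the ironic online data migrations only when the `ironic_api` container actually exists. This is a minimal Python sketch under assumptions, not the actual tripleo-heat-templates change (which lives in Ansible tasks). The function name `run_ironic_migrations_best_effort`, the injectable `run` callable and the use of `docker inspect` as the existence check are hypothetical; only the migration command itself is taken from the failure log in this report.

```python
import subprocess
from typing import Callable, List

# Migration command copied from the failing task in this bug report.
MIGRATION_CMD = [
    "docker", "exec", "ironic_api", "ironic-dbsync",
    "--config-file", "/etc/ironic/ironic.conf", "online_data_migrations",
]

# Assumed existence check: `docker inspect` exits non-zero when the
# named container does not exist.
EXISTS_CMD = ["docker", "inspect", "--type", "container", "ironic_api"]


def _run(cmd: List[str]) -> int:
    """Run a command, discard its output and return the exit code."""
    try:
        return subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        ).returncode
    except FileNotFoundError:
        # docker binary not installed: treat like "command not found".
        return 127


def run_ironic_migrations_best_effort(
    run: Callable[[List[str]], int] = _run,
) -> str:
    """Hypothetical best-effort pre-upgrade step.

    Returns "skipped" when the container is missing (for example when the
    services are not yet deployed under docker), "ok" when the migrations
    succeed and "failed" otherwise. A missing container does not raise, so
    it cannot block a re-run of the upgrade; the caller decides what to do
    with "failed".
    """
    if run(EXISTS_CMD) != 0:
        print("ironic_api container not found, skipping online data migrations")
        return "skipped"
    return "ok" if run(MIGRATION_CMD) == 0 else "failed"
```

Injecting `run` keeps the decision logic testable without Docker: a fake runner can simulate a missing container, a successful migration or a failing one.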
Description
Yurii Prokulevych
2018-11-08 15:39:41 UTC
From what I can observe, the undercloud upgrade calls `tripleo deploy` with the `--upgrade` option, which in turn runs:

ansible-playbook -i playbook_inventory upgrade_steps_playbook.yaml --skip-tags validation

https://github.com/openstack/python-tripleoclient/blob/master/tripleoclient/v1/tripleo_deploy.py#L802

So the validations shouldn't be running... unless the tag is missing from the validation and it runs because of that.

Hi Yurii,

as mentioned by Jose[1], the ansible-playbooks are run like this during the upgrade:

ansible-playbook ... --skip-tags validation

Could you provide an example of the validation that fails?

Note: I'm currently re-running an undercloud upgrade, but it would be simpler if we had the exact error you've bumped into.

[1] but the line number is currently https://github.com/openstack/python-tripleoclient/blob/master/tripleoclient/v1/tripleo_deploy.py#L810

Hi, so it seems that killing apache before the upgrade is enough to reproduce the error:

<holser_> chem I can easily reproduce
<holser_> just killing apache before upgrade
<holser_> then it should fail on ironic pre-upgrade check
[14:59] <chem> holser_: ah, cool thanks will do

Just a piece of info: we should support skipping both `validation` and `pre-upgrade`, so a generic --skip-tags parameter, like the one the overcloud upgrade has, would probably be best.

(In reply to Sofer Athlan-Guyot from comment #2)
> Hi Yurii,
>
> so as mentioned by Jose[1] the ansible-playbooks are run like this during
> upgrade:
>
> ansible-playbook ... --skip-tags validation
>
> Could you provide an example of the validation that fails.
>
> Note, I'm currently re-running an undercloud upgrade, but It would be
> simpler if we had the exact error you've bumped into.
>
> [1] but the line number is currently
> https://github.com/openstack/python-tripleoclient/blob/master/tripleoclient/v1/tripleo_deploy.py#L810

It fails like this:
...
fatal: [undercloud-0]: FAILED! => {"changed": true, "cmd": ["docker", "exec", "ironic_api", "ironic-dbsync", "--config-file", "/etc/ironic/ironic.conf", "online_data_migrations"], "delta": "0:00:00.036968", "end": "2018-11-20 11:06:53.414249", "msg": "non-zero return code", "rc": 1, "start": "2018-11-20 11:06:53.377281", "stderr": "Error response from daemon: No such container: ironic_api", "stderr_lines": ["Error response from daemon: No such container: ironic_api"], "stdout": "", "stdout_lines": []}

So, that validation failed because it was missing this patch: https://review.openstack.org/616146, which first checks whether the container exists before running the validation. Even with that patch, it would still fail via this part of the script:

echo "Error: ironic_api container not found"
exit 1

If we want to skip the pre-upgrade tasks, I think we do need --skip-tags. Alternatively, if this causes too much trouble, we could drop the pre-upgrade migrations altogether. They are just a safety net, not a requirement, but they start bringing value in cases where users forget to run the migrations manually (I mean mainly for the overcloud, but we use the same t-h-t for undercloud services).

On a call earlier today we established that we'll probably make the pre-upgrade migrations "best effort only" (run them only when we can, and don't error if the undercloud is stopped and we can't), since failing when we can't run them seems to bring more trouble than benefit right now.

To clarify, the migrations we're talking about here shouldn't be necessary at all if users follow the docs. The intended place for users to run migrations is post-upgrade. The pre-upgrade task that stops the upgrade here is only meant as a safety net for the overcloud, in case the user forgets to run the migrations via the `external-upgrade run` command.

*** Bug 1664705 has been marked as a duplicate of this bug. ***

I introduced a fail task into the ironic-api:
(undercloud) [stack@verify-xbezdick-undercloud-0 ~]$ openstack undercloud upgrade
....
TASK [fail] *******************************************************************************************************************************************************************************************************
fatal: [verify-xbezdick-undercloud-0]: FAILED! => {"changed": false, "msg": "TEST FAIL"}
...
Now rerun with the fail task removed:
(undercloud) [stack@verify-xbezdick-undercloud-0 ~]$ openstack undercloud upgrade
...
upgrade passed
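The generic `--skip-tags` parameter proposed in the discussion (so that tasks tagged `validation` and `pre-upgrade` can both be skipped on a re-run) could be sketched as below. It only illustrates how such a tag list might be turned into the `ansible-playbook` invocation quoted from `tripleo_deploy.py`; the helper `build_upgrade_command` and its `extra_skip_tags` parameter are hypothetical and do not describe the actual python-tripleoclient code. Ansible's `--skip-tags` option accepts a comma-separated list of tags.

```python
from typing import Iterable, List

# Tag the undercloud upgrade already skips, per tripleo_deploy.py.
DEFAULT_SKIP_TAGS = ["validation"]


def build_upgrade_command(extra_skip_tags: Iterable[str] = ()) -> List[str]:
    """Hypothetical helper: build the ansible-playbook argv for the upgrade.

    Extra tags (e.g. "pre-upgrade") are merged with the default ones,
    stripped of surrounding whitespace, de-duplicated while keeping their
    order, and passed to Ansible as a single comma-separated --skip-tags value.
    """
    tags: List[str] = []
    for tag in [*DEFAULT_SKIP_TAGS, *extra_skip_tags]:
        tag = tag.strip()
        if tag and tag not in tags:
            tags.append(tag)
    return [
        "ansible-playbook",
        "-i", "playbook_inventory",
        "upgrade_steps_playbook.yaml",
        "--skip-tags", ",".join(tags),
    ]
```

For example, `build_upgrade_command(["pre-upgrade"])` ends with `--skip-tags validation,pre-upgrade`, which would let a re-run bypass the failing ironic pre-upgrade check while still skipping the validations.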
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:0878