Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1647956

Summary: [UPGRADES][14] Need a way to disable validation during undercloud upgrade re-run
Product: Red Hat OpenStack
Reporter: Yurii Prokulevych <yprokule>
Component: openstack-tripleo-heat-templates
Assignee: Sofer Athlan-Guyot <sathlang>
Status: CLOSED ERRATA
QA Contact: Ronnie Rasouli <rrasouli>
Severity: high
Docs Contact:
Priority: high
Version: 14.0 (Rocky)
CC: augol, ccamacho, hbrock, jamsmith, jfrancoa, jslagle, jstransk, lbezdick, mburns, rheslop, sathlang, sgolovat, ssmolyak, yprokule
Target Milestone: z2
Keywords: Triaged, ZStream
Target Release: 14.0 (Rocky)
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: openstack-tripleo-heat-templates-9.2.1-0.20190119154863.el7ost.noarch
Doc Type: Bug Fix
Doc Text:
This update fixes an issue that prevented users from successfully re-running a failed OSP13-to-OSP14 upgrade of OpenStack Platform director. Some upgrade failures resulted in a state where services were not yet deployed with docker, which prevented a successful re-run of the upgrade. Now a check is performed to verify that the services are deployed under docker control, enabling a successful re-run.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-04-30 17:51:14 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Yurii Prokulevych 2018-11-08 15:39:41 UTC
Description of problem:
-----------------------
Re-running a failed undercloud upgrade might fail because some of the validations are no longer valid.
So we need a way to pass a list of tags that should be skipped, in the same way as we do for the overcloud upgrade with '--skip-tags validation'.

Comment 1 Jose Luis Franco 2018-11-13 14:10:16 UTC
From what I can observe, the undercloud upgrade calls tripleo deploy with the --upgrade option, which in turn calls:
ansible-playbook -i playbook_inventory upgrade_steps_playbook.yaml --skip-tags validation

https://github.com/openstack/python-tripleoclient/blob/master/tripleoclient/v1/tripleo_deploy.py#L802

So the validations shouldn't be running, unless the tag is missing from the validation task and it runs because of that.

Comment 2 Sofer Athlan-Guyot 2018-11-20 13:41:25 UTC
Hi Yurii,

As mentioned by Jose [1], the Ansible playbooks are run like this during the upgrade:

   ansible-playbook ... --skip-tags validation

Could you provide an example of the validation that fails?

Note, I'm currently re-running an undercloud upgrade, but it would be simpler if we had the exact error you've bumped into.

[1] but the line number is currently https://github.com/openstack/python-tripleoclient/blob/master/tripleoclient/v1/tripleo_deploy.py#L810

Comment 3 Sofer Athlan-Guyot 2018-11-20 14:00:22 UTC
Hi,

so it seems that:

<holser_> chem I can easily reproduce
<holser_> just killing apache before upgrade
<holser_> then it should fail on ironic pre-upgrade check               [14:59]
<chem> holser_: ah, cool thanks will do

is enough to reproduce the error.

Comment 4 Jiri Stransky 2018-11-20 16:13:45 UTC
Just a piece of info: we should support skipping both `validation` and `pre-upgrade`, so a generic --skip-tags param like the overcloud upgrade has would probably be the best.
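
To illustrate the generic parameter suggested here, a minimal sketch of how a list of tags could be turned into a single `--skip-tags` argument. The helper name `build_upgrade_cmd` is hypothetical and is not part of python-tripleoclient; the inventory and playbook names are the ones quoted in comment 1. The command line is only printed, not executed.

```shell
#!/bin/sh
# Hypothetical helper (an assumption, not tripleoclient code): join the given
# tags with commas and print the ansible-playbook command that would skip them.
build_upgrade_cmd() {
    tags=""
    for tag in "$@"; do
        # Prepend "previous," only when a previous tag was already collected.
        tags="${tags:+$tags,}$tag"
    done
    printf 'ansible-playbook -i playbook_inventory upgrade_steps_playbook.yaml'
    # Add --skip-tags only when at least one tag was requested.
    if [ -n "$tags" ]; then
        printf ' --skip-tags %s' "$tags"
    fi
    printf '\n'
}

# Skip both tag families mentioned in this comment.
build_upgrade_cmd validation pre-upgrade
```

ansible-playbook accepts a comma-separated list for `--skip-tags`, so one parameter would be enough to cover both `validation` and `pre-upgrade`.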

Comment 5 Yurii Prokulevych 2018-11-20 16:14:21 UTC
(In reply to Sofer Athlan-Guyot from comment #2)
> Hi Yurii,
> 
> so as mentioned by Jose[1] the ansible-playbooks are run like this during
> upgrade:
> 
>    ansible-playbook ... --skip-tags validation
> 
> Could you provide an example of the validation that fails.
> 
> Note, I'm currently re-running an undercloud upgrade, but It would be
> simpler if we had the exact error you've bumped into.
> 
> [1] but the line number is currently
> https://github.com/openstack/python-tripleoclient/blob/master/tripleoclient/v1/tripleo_deploy.py#L810

It fails like:
...
    fatal: [undercloud-0]: FAILED! => {"changed": true, "cmd": ["docker", "exec", "ironic_api", "ironic-dbsync", "--config-file", "/etc/ironic/ironic.conf", "online_data_migrations"], "delta": "0:00:00.036968", "end": "2018-11-20 11:06:53.414249", "msg": "non-zero return code", "rc": 1, "start": "2018-11-20 11:06:53.377281", "stderr": "Error response from daemon: No such container: ironic_api", "stderr_lines": ["Error response from daemon: No such container: ironic_api"], "stdout": "", "stdout_lines": []}

Comment 6 Jose Luis Franco 2018-11-20 16:40:19 UTC
So, that validation failed because it was missing this patch https://review.openstack.org/616146 , which first checks whether the container exists before running the validation.

Comment 7 Jiri Stransky 2018-11-20 16:51:47 UTC
Even with that patch it would fail via this part of the script:

echo "Error: ironic_api container not found"
exit 1

If we want to skip the pre-upgrade tasks, we indeed need --skip-tags, I think.

Alternatively, if this causes too much trouble, we could drop the pre-upgrade migrations altogether. They are just a safety net, not a requirement. But they start bringing value in cases when users forget to run the migrations manually (I mean for the overcloud mainly, but we use the same t-h-t for undercloud services).

Comment 8 Jiri Stransky 2018-11-21 13:37:08 UTC
On a call earlier today we established that we'll probably make the pre-upgrade migrations "best effort only" (run only when we can, don't error if undercloud is stopped and we can't), as failing when we can't run them seems to bring more trouble than benefit right now.

To clarify, the migrations that we're talking about here shouldn't be necessary at all if the users follow the docs. The intended place where users should run migrations is post-upgrade. The pre-upgrade task which stops the upgrade here is only meant as a safety net for the overcloud, in case the user forgets to run the migrations via the `external-upgrade run` command.
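
A minimal sketch of what "best effort only" could look like for the ironic migration, assuming a plain shell wrapper rather than the actual tripleo-heat-templates task. The function name `run_ironic_migrations_best_effort` is hypothetical; the `docker exec` command is the one from the failure in comment 5.

```shell
#!/bin/sh
# Sketch only (an assumption, not the shipped t-h-t task): run the ironic
# online data migrations when the ironic_api container is running, and
# skip them without an error when it is not.
run_ironic_migrations_best_effort() {
    # "docker ps" lists running containers only; match the name exactly.
    if docker ps --format '{{.Names}}' 2>/dev/null | grep -qx 'ironic_api'; then
        docker exec ironic_api ironic-dbsync \
            --config-file /etc/ironic/ironic.conf online_data_migrations
    else
        echo "ironic_api container not found, skipping pre-upgrade migrations"
        return 0
    fi
}
```

Calling `run_ironic_migrations_best_effort` on a stopped or half-upgraded undercloud prints the skip message and returns 0, so the re-run can continue. If the container is running and the migration itself fails, the non-zero return code of `docker exec` is still passed through.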

Comment 12 Jose Luis Franco 2019-01-14 14:53:10 UTC
*** Bug 1664705 has been marked as a duplicate of this bug. ***

Comment 19 Lukas Bezdicka 2019-04-26 15:53:27 UTC
I introduced a fail task into ironic-api:
(undercloud) [stack@verify-xbezdick-undercloud-0 ~]$ openstack undercloud upgrade
....
TASK [fail] *******************************************************************************************************************************************************************************************************
fatal: [verify-xbezdick-undercloud-0]: FAILED! => {"changed": false, "msg": "TEST FAIL"}
...


Now rerun with the fail task removed:
(undercloud) [stack@verify-xbezdick-undercloud-0 ~]$ openstack undercloud upgrade
...
upgrade passed

Comment 21 errata-xmlrpc 2019-04-30 17:51:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0878