Bug 1518662

Summary: OSP11 -> OSP12 upgrade: pre-upgrade validations are preventing a re-run of the upgrade-non-controller.sh script to upgrade a compute node after a failed attempt
Product: Red Hat OpenStack Reporter: Marius Cornea <mcornea>
Component: openstack-tripleo-common Assignee: Marios Andreou <mandreou>
Status: CLOSED ERRATA QA Contact: Yurii Prokulevych <yprokule>
Severity: high Docs Contact:
Priority: urgent    
Version: 12.0 (Pike) CC: ccamacho, dbecker, jamsmith, mandreou, mburns, morazi, rhel-osp-director-maint, sathlang, slinaber
Target Milestone: z3 Keywords: TestOnly, Triaged, ZStream
Target Release: 12.0 (Pike)   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: openstack-tripleo-common-7.6.9-1.el7ost Doc Type: Bug Fix
Doc Text:
After a failed non-controller upgrade, further upgrade attempts can fail during service validation if the services stopped by the failed attempt are not running. To prevent such failures, you can skip service validation by passing the option "--skip-tags validation" to the Ansible invocation. For example: upgrade-non-controller.sh --upgrade compute-0 --ansible-opts "--skip-tags validation"
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-08-20 12:58:39 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Marius Cornea 2017-11-29 12:34:00 UTC
Description of problem:
OSP11 -> OSP12 upgrade: unable to re-run the upgrade-non-controller.sh script to upgrade a compute node after a failed attempt because the pre-upgrade validations are failing.

Version-Release number of selected component (if applicable):
openstack-tripleo-heat-templates-7.0.3-16.el7ost.noarch
openstack-tripleo-common-7.6.3-6.el7ost.noarch

How reproducible:
100%

Steps to Reproduce:
1. Deploy OSP11
2. Upgrade to OSP12
3. Complete major-upgrade-composable-steps-docker.yaml step successfully
4. Run upgrade-non-controller.sh --upgrade compute-0, which fails due to unreachable repositories:

TASK [Upgrade os-net-config] ******************************************************************************************************************************************************************************************************************
fatal: [192.168.24.11]: FAILED! => {"changed": true, "failed": true, "msg": "http://rhos-qe-mirror-brq.usersys.redhat.com/rcm-guest/puddles/OpenStack/12.0-RHEL-7/latest/RH7-RHOS-12.0/x86_64/os/Packages/python2-pbr-3.1.1-1.el7ost.noarch.rpm: [Errno -1] Package does not match intended download. Suggestion: run yum --enablerepo=rhelosp-12.0-puddle clean metadata\nTrying other mirror.\nhttp://rhos-qe-mirror-brq.usersys.redhat.com/rcm-guest/puddles/OpenStack/12.0-RHEL-7/latest/RH7-RHOS-12.0/x86_64/os/Packages/os-net-config-7.3.1-1.el7ost.noarch.rpm: [Errno -1] Package does not match intended download. Suggestion: run yum --enablerepo=rhelosp-12.0-puddle clean metadata\nTrying other mirror.\n\n\nError downloading packages:\n  os-net-config-7.3.1-1.el7ost.noarch: [Errno 256] No more mirrors to try.\n  python2-pbr-3.1.1-1.el7ost.noarch: [Errno 256] No more mirrors to try.\n\n", "rc": 1, "results": ["Loaded plugins: product-id, search-disabled-repos, subscription-manager\nThis system is not registered with an entitlement server. 
You can use subscription-manager to register.\nResolving Dependencies\n--> Running transaction check\n---> Package os-net-config.noarch 0:6.1.0-2.el7ost will be updated\n---> Package os-net-config.noarch 0:7.3.1-1.el7ost will be an update\n--> Processing Dependency: python-pbr >= 2.0.0 for package: os-net-config-7.3.1-1.el7ost.noarch\n--> Running transaction check\n---> Package python-pbr.noarch 0:1.10.0-2.el7ost will be obsoleted\n---> Package python2-pbr.noarch 0:3.1.1-1.el7ost will be obsoleting\n--> Finished Dependency Resolution\n\nDependencies Resolved\n\n================================================================================\n Package           Arch       Version             Repository               Size\n================================================================================\nInstalling:\n python2-pbr       noarch     3.1.1-1.el7ost      rhelosp-12.0-puddle     191 k\n     replacing  python-pbr.noarch 1.10.0-2.el7ost\nUpdating:\n os-net-config     noarch     7.3.1-1.el7ost      rhelosp-12.0-puddle     260 k\n\nTransaction Summary\n================================================================================\nInstall  1 Package\nUpgrade  1 Package\n\nTotal download size: 451 k\nDownloading packages:\nDelta RPMs disabled because /usr/bin/applydeltarpm not installed.\n"]}

5. Fix repositories issue

6. Re-run upgrade-non-controller.sh --upgrade compute-0

Actual results:
The command fails the pre-upgrade validation check for the neutron-openvswitch-agent service:

TASK [PreUpgrade step0,validation: Check service neutron-openvswitch-agent is running] ********************************************************************************************************************************************************
fatal: [192.168.24.11]: FAILED! => {"changed": true, "cmd": "/usr/bin/systemctl show 'neutron-openvswitch-agent' --property ActiveState | grep '\\bactive\\b'", "delta": "0:00:00.008534", "end": "2017-11-29 12:26:33.244232", "failed": true, "msg": "non-zero return code", "rc": 1, "start": "2017-11-29 12:26:33.235698", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
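
For reference, the failing check can be reproduced by hand. The sketch below wraps the grep from the "cmd" field above in a helper (the name state_is_active is hypothetical, not part of the playbook) and feeds it sample ActiveState lines. It assumes GNU grep, since \b is a GNU extension.

```shell
# Hypothetical helper: succeeds only if stdin contains the standalone word
# "active", as in "ActiveState=active". The values "inactive", "failed" and
# "activating" do not match because of the \b word boundaries.
state_is_active() {
    grep -q '\bactive\b'
}

# Sample lines in the format printed by
#   systemctl show <unit> --property ActiveState
for sample in ActiveState=active ActiveState=inactive ActiveState=failed; do
    if printf '%s\n' "$sample" | state_is_active; then
        echo "$sample: validation would pass"
    else
        echo "$sample: validation would fail (non-zero return code)"
    fi
done
```

On the compute node itself, the equivalent of the task is: systemctl show 'neutron-openvswitch-agent' --property ActiveState | state_is_active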


Expected results:
We should have a way to disable the pre-upgrade validation checks for the upgrade-non-controller.sh script to allow re-running it after a failed attempt.

Additional info:

Workaround: ssh to the compute node and manually start the services which were stopped during the failed upgrade attempt:

systemctl start neutron-openvswitch-agent

Then re-run the upgrade-non-controller.sh script.
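
If more than one service was left stopped, the manual step above could be wrapped in a small helper run on the compute node. This is only a sketch: the function name ensure_services_running is hypothetical, and the list of services to pass in depends on what the failed attempt actually stopped (only neutron-openvswitch-agent was observed in this report).

```shell
# Hypothetical helper: start each named systemd unit unless it is already
# active. Stops at the first unit that fails to start.
ensure_services_running() {
    for svc in "$@"; do
        if systemctl is-active --quiet "$svc"; then
            echo "$svc: already active"
        else
            echo "$svc: starting"
            systemctl start "$svc" || return 1
        fi
    done
}

# Example call (run as root on the compute node):
#   ensure_services_running neutron-openvswitch-agent
```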

Comment 1 Carlos Camacho 2017-11-29 14:39:55 UTC
Yes, we cannot pass --skip-validations to upgrade-non-controller.sh

Comment 2 Marios Andreou 2017-11-30 10:47:45 UTC
taking for investigation as we discussed on scrum last night thanks

Comment 3 Marios Andreou 2017-11-30 16:32:06 UTC
fwiw I don't think this is a blocker for 12 and would be nice to get into the first async release please. Upstream patch posted today needs some more testing

Comment 6 Lon Hohberger 2018-03-29 10:34:32 UTC
According to our records, this should be resolved by openstack-tripleo-common-7.6.9-3.el7ost.  This build is available now.

Comment 8 Yurii Prokulevych 2018-07-02 14:31:39 UTC
Verified with openstack-tripleo-common-7.6.13-1.el7ost.noarch

upgrade-non-controller.sh --upgrade compute-0 -O '--skip-tags validation'
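
The same invocation can be repeated over several nodes with a short wrapper. The sketch below uses the long form of the option, --ansible-opts, as given in the Doc Text. The function name and the UPGRADE_CMD override are hypothetical and exist only so the wrapper can be dry-run. Note that "--skip-tags validation" skips every task tagged validation, not only the check for the stopped service.

```shell
# Hypothetical wrapper: re-run the non-controller upgrade for each node with
# the validation-tagged tasks skipped. UPGRADE_CMD is an assumed override for
# dry runs; by default the real script is looked up in PATH.
rerun_skip_validation() {
    cmd="${UPGRADE_CMD:-upgrade-non-controller.sh}"
    for node in "$@"; do
        "$cmd" --upgrade "$node" --ansible-opts "--skip-tags validation" || return 1
    done
}

# Example call on the undercloud:
#   rerun_skip_validation compute-0
# Dry run that only prints the arguments:
#   UPGRADE_CMD=echo rerun_skip_validation compute-0
```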

Comment 11 errata-xmlrpc 2018-08-20 12:58:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2331