Created attachment 1729211
updated nova-compute-container-puppet.yaml

Description of problem:

The nova_compute container gets stuck in a restart loop after the nova_hybrid_state tasks are executed during the upgrade from OSP13 to OSP16.1 in an Instance HA environment.

~~~
[root@msufiyan-novacomputeiha-0 ~]#docker logs nova_compute
++ cat /run_command
+ CMD='/var/lib/nova/instanceha/check-run-nova-compute '
+ ARGS=
+ sudo kolla_copy_cacerts
+ [[ ! -n '' ]]
+ . kolla_extend_start
++ [[ ! -d /var/log/kolla/nova ]]
+++ stat -c %a /var/log/kolla/nova
++ [[ 2755 != \7\5\5 ]]
++ chmod 755 /var/log/kolla/nova
Running command: '/var/lib/nova/instanceha/check-run-nova-compute '
++ . /usr/local/bin/kolla_nova_extend_start
+++ [[ ! -d /var/lib/nova/instances ]]
+ echo 'Running command: '\''/var/lib/nova/instanceha/check-run-nova-compute '\'''
+ exec /var/lib/nova/instanceha/check-run-nova-compute
Traceback (most recent call last):
  File "/var/lib/nova/instanceha/check-run-nova-compute", line 191, in <module>
    connection = create_nova_connection(config.sections["placement"])
  File "/var/lib/nova/instanceha/check-run-nova-compute", line 149, in create_nova_connection
    http_log_debug=options.has_key("verbose"),
AttributeError: 'dict' object has no attribute 'has_key'
[root@msufiyan-novacomputeiha-0 ~]#
~~~

- It seems the code that installs the Instance HA script used to run nova_compute was missing from nova-compute-container-puppet.yaml. As a result, nova_compute was still using the old code [1] in file [2], which fails and restarts the container:

~~~
nova = client.Client(version,
                     region_name=options["os_region_name"][0],
                     session=keystone_session, auth=keystone_auth,
                     http_log_debug=options.has_key("verbose"),
                     endpoint_type=nova_endpoint_type)
~~~

[1] https://github.com/openstack/tripleo-heat-templates/blob/stable/queens/extraconfig/tasks/instanceha/check-run-nova-compute#L149
[2] /var/lib/nova/instanceha/check-run-nova-compute

Workaround:

1) Add the tasks below to /usr/share/openstack-tripleo-heat-templates/deployment/nova/nova-compute-container-puppet.yaml so that the new "check-run-nova-compute" is bind mounted:

~~~
# This code is partial copy of logic in podman installation
- name: is Instance HA enabled
  set_fact:
    instance_ha_enabled: {get_param: EnableInstanceHA}
- name: install Instance HA script that runs nova-compute
  when: instance_ha_enabled|bool
  copy:
    content: {get_file: ../../scripts/check-run-nova-compute}
    dest: /var/lib/nova/instanceha/check-run-nova-compute
    mode: 0755
~~~

2) Update the plan:

~~~
openstack overcloud upgrade prepare ...
~~~

3) Re-run the nova_hybrid_state task to boot the container with the new check-run-nova-compute:

~~~
nohup openstack overcloud upgrade run --stack msufiyan --playbook upgrade_steps_playbook.yaml --tags nova_hybrid_state --limit all --yes &
~~~

Additional Note:

The updated check-run-nova-compute will have:

~~~
else:
    # OSP >= Ocata
    # ArgSpec(args=['version'], varargs='args', keywords='kwargs', defaults=None)
    nova = client.Client(version,
                         region_name=region,
                         session=keystone_session, auth=keystone_auth,
                         http_log_debug="verbose" in options,  # <<<=====
                         endpoint_type=nova_endpoint_type)
~~~

Version-Release number of selected component (if applicable):
OSP16.1

How reproducible:
Every time the upgrade framework [3] is performed in an Instance HA based environment.

[3] https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.1/html-single/framework_for_upgrades_13_to_16.1/index#upgrading-controller-nodes-with-director-deployed-ceph-storage_upgrading-overcloud-standard
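
For reference, the AttributeError in the traceback above matches the removal of dict.has_key() in Python 3, which is why the old script fails and the `"verbose" in options` form does not. The snippet below is only a minimal sketch of that difference: the `options` dict and both helper function names are made up for illustration and are not taken from check-run-nova-compute.

```python
# Illustrative stand-in for the parsed config section; the values are invented.
options = {"os_region_name": ["regionOne"], "verbose": ["true"]}


def debug_enabled_old(opts):
    # Old style: dict.has_key() exists only on Python 2 dicts.
    return opts.has_key("verbose")


def debug_enabled_new(opts):
    # Membership test with "in" works on both Python 2 and Python 3.
    return "verbose" in opts


# Try the old call and record the error message instead of crashing.
try:
    debug_enabled_old(options)
    old_call_error = None
except AttributeError as exc:
    old_call_error = str(exc)

print("old call error:", old_call_error)
print("new call result:", debug_enabled_new(options))
```

Under Python 3 the old call raises the same AttributeError as in the container log, while the membership test returns a plain boolean that can be passed as http_log_debug.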
The code is present in:
http://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/rcj/DFG-upgrades-ffu-ffu-upgrade-13-16.1_director-rhel-virthost-3cont_2comp-ipv4-vxlan-HA-no-ceph-from-passed_phase2/34/undercloud-0/usr/share/openstack-tripleo-heat-templates/deployment/nova/nova-compute-container-puppet.yaml.gz

which belongs to the CI job:
https://rhos-ci-jenkins.lab.eng.tlv2.redhat.com/view/DFG/view/upgrades/view/ffu/job/DFG-upgrades-ffu-ffu-upgrade-13-16.1_director-rhel-virthost-3cont_2comp-ipv4-vxlan-HA-no-ceph-from-passed_phase2/34/

Package: openstack-tripleo-heat-templates-11.3.2-1.20200914170175.el8ost.noarch
http://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/rcj/DFG-upgrades-ffu-ffu-upgrade-13-16.1_director-rhel-virthost-3cont_2comp-ipv4-vxlan-HA-no-ceph-from-passed_phase2/34/undercloud-0/var/log/dnf.rpm.log.gz

Code validated by Sufiyan in his test environment.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat OpenStack Platform 16.1.3 bug fix and enhancement advisory), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2020:5413