Bug 2057604
| Summary: | Overcloud update converge fails after containers are restarted, some of them taking minutes to shut down and start again | | |
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Eric Nothen <enothen> |
| Component: | tripleo-ansible | Assignee: | Gregory Thiemonge <gthiemon> |
| Status: | CLOSED ERRATA | QA Contact: | Omer Schwartz <oschwart> |
| Severity: | urgent | Docs Contact: | |
| Priority: | urgent | | |
| Version: | 16.1 (Train) | CC: | bporwal, gthiemon, jelynch, lpeer, majopela, oschwart, scohen |
| Target Milestone: | z9 | Keywords: | Triaged |
| Target Release: | 16.1 (Train on RHEL 8.2) | Flags: | bporwal: needinfo? |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | tripleo-ansible-0.5.1-1.20220614153406.902c3c8.el8ost | Doc Type: | Bug Fix |
| Doc Text: | Before this update, the Load-balancing service (octavia) was restarted many times during deployments or updates. With this update, the services are restarted only when required, preventing potential interruptions of the control plane. | | |
| Story Points: | --- | | |
| Clone Of: | | Environment: | |
| Last Closed: | 2022-12-07 20:25:58 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description (Eric Nothen, 2022-02-23 17:30:22 UTC)
Added a commit for tripleo-ansible; it prevents restarting the Octavia services each time the playbook is run.

In our update job, in a build that ran from 16.1.4 -> 16.1.6, we had 2 service restarts in the converge stage. For example, the health manager logs (where we see "INFO octavia.common.config [-] /usr/bin/octavia-health-manager version 5.0.3"):
http://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/rcj/DFG-network-octavia-update-16.1_director-rhel-virthost-3cont_2comp_1ipa-ipv4-geneve-tls/97/controller-0/var/log/containers/octavia/health-manager.log.gz

We can see when that step started (2022-11-14 18:44:39.107968) in the following log:
http://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/rcj/DFG-network-octavia-update-16.1_director-rhel-virthost-3cont_2comp_1ipa-ipv4-geneve-tls/97/undercloud-0/home/stack/.tripleo/history.gz

In an update job with the fix, from our current latest_cdn puddle to our current passed_phase2 puddle, we got 1 service restart during the converge stage. Health manager logs (where we see "INFO octavia.common.config [-] /usr/bin/octavia-health-manager version 5.0.3"):
http://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/rcj/DFG-network-octavia-update-16.1_director-rhel-virthost-3cont_2comp_1ipa-ipv4-geneve-tls/99/controller-0/var/log/containers/octavia/health-manager.log.gz

We can see when that step started (2022-11-18 20:09:07.158376) in the following log:
http://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/rcj/DFG-network-octavia-update-16.1_director-rhel-virthost-3cont_2comp_1ipa-ipv4-geneve-tls/97/undercloud-0/home/stack/.tripleo/history.gz

We didn't see as many restarts as the customer did, but with the fix merged we do see an improvement in the excessive restarting of the Octavia services. That looks good to me. I am moving the BZ status to VERIFIED.
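The actual patch is not quoted in this report, but the idempotency technique it describes, restarting a containerized service only when its configuration actually changes rather than on every playbook run, can be sketched with a standard Ansible handler. The play below is a hypothetical illustration only (host group, template name, paths, and unit name are assumptions, not taken from tripleo-ansible):

```yaml
# Minimal sketch: the template task notifies the handler only when the
# rendered file differs from what is on disk, so repeated playbook runs
# leave the running container untouched.
- hosts: octavia_nodes                # hypothetical inventory group
  tasks:
    - name: Render Octavia health manager config
      ansible.builtin.template:
        src: octavia.conf.j2          # hypothetical template
        dest: /var/lib/config-data/octavia/octavia.conf
      notify: Restart octavia health manager

  handlers:
    - name: Restart octavia health manager
      ansible.builtin.systemd:
        name: tripleo_octavia_health_manager   # assumed unit name
        state: restarted
```

Because handlers fire at most once per play and only when notified, an unconditional `state: restarted` task is replaced by a change-driven restart, which is the behavior difference observed between the 2-restart and 1-restart converge runs above.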
Some info about the puddles:
- 16.1.4: RHOS-16.1-RHEL-8-20210311.n.1
- 16.1.6: RHOS-16.1-RHEL-8-20210506.n.1
- 16.1 latest_cdn, which was used in the aforementioned build: RHOS-16.1-RHEL-8-20220804.n.1
- 16.1 passed_phase2, which was used in the aforementioned build: RHOS-16.1-RHEL-8-20221116.n.1

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat OpenStack Platform 16.1.9 bug fix and enhancement advisory), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:8795