Bug 1868259
Summary: | [OVN]Upgrading from 4.5.5 to 4.6 latest nightly build failed | | |
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | huirwang |
Component: | Networking | Assignee: | Tim Rozet <trozet> |
Networking sub component: | ovn-kubernetes | QA Contact: | Anurag saxena <anusaxen> |
Status: | CLOSED ERRATA | Docs Contact: | |
Severity: | urgent | | |
Priority: | urgent | CC: | aconstan, anusaxen, rbrattai, weinliu, weliang, wsun, zzhao |
Version: | 4.6 | Keywords: | TestBlocker |
Target Milestone: | --- | | |
Target Release: | 4.6.0 | | |
Hardware: | Unspecified | | |
OS: | Unspecified | | |
Whiteboard: | | | |
Fixed In Version: | | Doc Type: | If docs needed, set a value |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2020-10-27 16:28:06 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Description
huirwang
2020-08-12 07:11:05 UTC
I think there are 2 different issues here. For the original bug, the issue is that we are not upgrading ovnkube-node before we launch the metrics pods, so metrics cannot deploy because it needs port 9103. For the other comment, it looks like something is wrong with configuring the br-ex bridge with the ovs-configuration service; however, I cannot launch an oc debug node pod on that cluster, so I'm unable to investigate further. Either way, Weinan, please open a new BZ for the issue you are encountering, as it is a separate bug.

(In reply to Tim Rozet from comment #6)
Actually, the metrics port problem might not be the real issue. I see ovs-configuration problems in the manually upgraded cluster.

It seems we can't escape the x509 error. As per the auth team: "no. It's reasonably safeish to ignore a cert error for getting logs (so we built that), but it's considerably less safe cases where users send data to the potentially unsafe endpoint". The other way, I believe, is to leverage a bastion host, which is not working for me at the moment :(

@huiran Could you try to repro this issue on Monday on a new setup and share? I had a hard time with oc debug and the bastion host on this one. Thanks

OK, so after further investigation it looks like the problem is that CNO upgrades before MCO. That means MCO never has a chance to start system OVS and run the ovs-configuration service, so OVN fails to start. We have a couple of options here:

1. Add the same detection we use for the ovs-node DS to detect whether or not OVS is running on the host, and use that to determine if we should run in local GW mode. That would allow CNO to "upgrade"; then, when MCO runs, it would reboot the node and ovn-kube would run the right way after that.
2. Move CNO to run after MCO in the upgrade path.

We need to figure out if #2 is feasible; otherwise we go with #1.

*** Bug 1868083 has been marked as a duplicate of this bug. ***

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196

Removing UpgradeBlocker from this older bug, to remove it from the suspect queue described in [1]. If you feel like this bug still needs to be a suspect, please add keyword again.

[1]: https://github.com/openshift/enhancements/pull/475
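
For readers following the thread, option 1 in the discussion above (detect whether OVS is running on the host and pick the gateway mode from that) could look roughly like the sketch below. This is only an illustration, not the code that was merged for this bug. The pid-file path, the function names, and the `"default"` label for the non-local mode are all assumptions; the bug does not say how the ovs-node DS performs its detection.

```python
from pathlib import Path

# ASSUMPTION: a conventional OVS pid-file location. The path actually checked
# by the ovs-node DaemonSet is not given anywhere in this bug.
ASSUMED_OVS_PIDFILE = Path("/var/run/openvswitch/ovs-vswitchd.pid")


def host_ovs_running(pidfile: Path = ASSUMED_OVS_PIDFILE) -> bool:
    """Hypothetical check: treat a pid file holding a number as 'OVS runs on the host'."""
    try:
        return pidfile.read_text().strip().isdigit()
    except (OSError, ValueError):
        # Missing or unreadable file: assume system OVS has not been started
        # yet, e.g. because MCO has not upgraded and rebooted the node.
        return False


def choose_gateway_mode(ovs_on_host: bool) -> str:
    """Hypothetical policy from option 1: no host OVS means local GW mode.

    "default" stands in for whatever mode the upgraded deployment would
    normally use; the bug only names local GW mode explicitly.
    """
    return "default" if ovs_on_host else "local"


if __name__ == "__main__":
    print(choose_gateway_mode(host_ovs_running()))
```

With a check like this, a node that MCO has not yet rebooted would stay in local GW mode, and the same node would switch to the other mode once system OVS is running after the reboot.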