Bug 1854175
| Summary: | [OVN] 4.4.11-x86_64 upgrade to 4.5.0-rc.6-x86_64 failed due to console operator | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Simon <skordas> |
| Component: | Networking | Assignee: | Ricardo Carrillo Cruz <ricarril> |
| Networking sub component: | ovn-kubernetes | QA Contact: | zhaozhanqi <zzhao> |
| Status: | CLOSED WONTFIX | Docs Contact: | |
| Severity: | medium | | |
| Priority: | unspecified | CC: | aos-bugs, bbennett, dmellado, jhou, jokerman, mifiedle, pweil, scuppett, spadgett, xtian, yanpzhan, yapei |
| Version: | 4.5 | Keywords: | Reopened, Upgrades |
| Target Milestone: | --- | | |
| Target Release: | 4.7.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-08-21 13:36:05 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Simon
2020-07-06 16:14:53 UTC
The console pod is unable to get the OAuth well-known endpoint. This could be a networking issue.

    2020-07-06T11:52:09.957946336Z 2020-07-06T11:52:09Z auth: error contacting auth provider (retrying in 10s): Get https://kubernetes.default.svc/.well-known/oauth-authorization-server: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)

It looks like we've seen this before, possibly. One bug notes that restarting the OVS pod on the node fixed the issue.

https://bugzilla.redhat.com/show_bug.cgi?id=1760103
https://bugzilla.redhat.com/show_bug.cgi?id=1760948

(In reply to Zac Herman from comment #3)
> It looks like we've seen this before, possibly. One bug notes restarting
> the OVS pod on the node fixed the issue.
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1760103
> https://bugzilla.redhat.com/show_bug.cgi?id=1760948

According to the install-config.yaml, this cluster is running the 'OVNKubernetes' network plugin, while the two bugs above are against 'openshift-sdn', so I have updated the subcomponent.

I hit this issue last night on an OVN cluster on Azure, with the same config this was originally reported on.
1/2 console pods are crashlooping with "error contacting auth provider".

I tried deleting the crashing console pod, but its replacement is stuck in ContainerCreating with this event:

    Warning FailedCreatePodSandBox 2s kubelet, ugdci06153010c-v4ksw-master-2 Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_console-764486b8cd-gxp5c_openshift-console_213dfedd-5d81-424b-9270-c7ea20e21b5e_0(89c6fbf63d99c330d48f08a5dbba695d29983f95ea5bf1502193dfc1ff7e6356): Multus: [openshift-console/console-764486b8cd-gxp5c]: error adding container to network "ovn-kubernetes": delegateAdd: error invoking confAdd - "ovn-k8s-cni-overlay": error in getting result from AddNetwork: CNI request failed with status 400: '[openshift-console/console-764486b8cd-gxp5c] failed to configure pod interface: timed out dumping br-int flow entries for sandbox: timed out waiting for the condition'

So, another "timed out waiting for condition".

@yadan: for my case, the upgrade was successful (console not progressing) but the console was unavailable.

(In reply to Mike Fiedler from comment #6)
> I hit this issue last night on an ovn cluster on Azure. Same config this
> was originally reported on.
> 1/2 console pods crashlooping with "error contacting auth provider"
>
> I tried deleting the crashing console pod, but its replacement stuck in
> ContainerCreating with this event:
>
> Warning FailedCreatePodSandBox 2s kubelet, ugdci06153010c-v4ksw-master-2 Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_console-764486b8cd-gxp5c_openshift-console_213dfedd-5d81-424b-9270-c7ea20e21b5e_0(89c6fbf63d99c330d48f08a5dbba695d29983f95ea5bf1502193dfc1ff7e6356): Multus: [openshift-console/console-764486b8cd-gxp5c]: error adding container to network "ovn-kubernetes": delegateAdd: error invoking confAdd - "ovn-k8s-cni-overlay": error in getting result from AddNetwork: CNI request failed with status 400: '[openshift-console/console-764486b8cd-gxp5c] failed to configure pod interface: timed out dumping br-int flow entries for sandbox: timed out waiting for the condition'
>
> So, another "timed out waiting for condition"
>
> @yadan for my case, the upgrade was successful (console not progressing) but
> the console was unavailable.

This issue likely happens on all pods on that node, not only the console pod. Could you provide the must-gather logs? Thanks. @Mike

Checked the upgrade CI: the upgrade from 4.4.11-x86_64 to 4.5.0-rc.7-x86_64 on a cluster with IPI on Azure (FIPS on) and OVN succeeded, and the console could be accessed successfully.
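As an aside on triage: long Multus/CNI event messages like the one quoted above bury the root cause at the very end. Before collecting a full must-gather, a small local sketch can pull out the pod reference and the final error clause. The sample string is abridged from the event in this bug; the parsing heuristics (first bracketed pair is the pod, last colon-separated clause is the root cause) are assumptions, not a supported tool.

```shell
#!/usr/bin/env sh
# Hedged sketch: extract the pod reference and root-cause clause from a
# FailedCreatePodSandBox event message (sample abridged from this bug).
msg='Multus: [openshift-console/console-764486b8cd-gxp5c]: error adding container to network "ovn-kubernetes": delegateAdd: error invoking confAdd - "ovn-k8s-cni-overlay": error in getting result from AddNetwork: CNI request failed with status 400: failed to configure pod interface: timed out dumping br-int flow entries for sandbox: timed out waiting for the condition'

# Pod reference: contents of the bracketed [namespace/pod] pair.
podref=$(printf '%s' "$msg" | sed -n 's/.*\[\([^]]*\)\].*/\1/p')

# Root cause: the last ": "-separated clause of the message.
rootcause=$(printf '%s' "$msg" | awk -F': ' '{print $NF}')

echo "pod:   $podref"
echo "cause: $rootcause"
```

Against a live cluster the message would come from `oc get events -n openshift-console`; here the point is only the extraction step.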
    $ oc get co
    NAME                                       VERSION      AVAILABLE   PROGRESSING   DEGRADED   SINCE
    authentication                             4.5.0-rc.7   True        False         False      146m
    cloud-credential                           4.5.0-rc.7   True        False         False      176m
    cluster-autoscaler                         4.5.0-rc.7   True        False         False      161m
    config-operator                            4.5.0-rc.7   True        False         False      105m
    console                                    4.5.0-rc.7   True        False         False      64m
    csi-snapshot-controller                    4.5.0-rc.7   True        False         False      74m
    dns                                        4.5.0-rc.7   True        False         False      169m
    etcd                                       4.5.0-rc.7   True        False         False      169m
    image-registry                             4.5.0-rc.7   True        False         False      153m
    ingress                                    4.5.0-rc.7   True        False         False      152m
    insights                                   4.5.0-rc.7   True        False         False      163m
    kube-apiserver                             4.5.0-rc.7   True        False         False      167m
    kube-controller-manager                    4.5.0-rc.7   True        False         False      168m
    kube-scheduler                             4.5.0-rc.7   True        False         False      168m
    kube-storage-version-migrator              4.5.0-rc.7   True        False         False      68m
    machine-api                                4.5.0-rc.7   True        False         False      163m
    machine-approver                           4.5.0-rc.7   True        False         False      92m
    machine-config                             4.5.0-rc.7   True        False         False      101m
    marketplace                                4.5.0-rc.7   True        False         False      62m
    monitoring                                 4.5.0-rc.7   True        False         False      88m
    network                                    4.5.0-rc.7   True        False         False      171m
    node-tuning                                4.5.0-rc.7   True        False         False      92m
    openshift-apiserver                        4.5.0-rc.7   True        False         False      62m
    openshift-controller-manager               4.5.0-rc.7   True        False         False      164m
    openshift-samples                          4.5.0-rc.7   True        False         False      91m
    operator-lifecycle-manager                 4.5.0-rc.7   True        False         False      170m
    operator-lifecycle-manager-catalog         4.5.0-rc.7   True        False         False      170m
    operator-lifecycle-manager-packageserver   4.5.0-rc.7   True        False         False      62m
    service-ca                                 4.5.0-rc.7   True        False         False      171m
    service-catalog-apiserver                  4.4.11       True        False         False      61m
    service-catalog-controller-manager         4.4.11       True        False         False      65m
    storage                                    4.5.0-rc.7   True        False         False      92m

    [zyp@MiWiFi-R1CM ~]$ oc get pod -n openshift-console
    NAME                         READY   STATUS    RESTARTS   AGE
    console-dc6dc747-57v5l       1/1     Running   0          70m
    console-dc6dc747-zwrz7       1/1     Running   0          65m
    downloads-8546cb9cff-4hhlj   1/1     Running   0          65m
    downloads-8546cb9cff-m9qlc   1/1     Running   0          70m

I downloaded 4.5.1 and upgraded from the latest 4.4 just fine:

    [ricky@localhost openshift-installer]$ oc get clusterversion
    NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
    version   4.4.12    True        True          2m45s   Working towards 4.5.1: 27% complete
    [ricky@localhost openshift-installer]$ oc get clusterversion
    NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
    version   4.5.1     True        False         49m     Cluster version is 4.5.1
    [ricky@localhost openshift-installer]$ oc get co console
    NAME      VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
    console   4.5.1     True        False         False      57m

As such, closing; it seems this is fixed or was an environmental issue.

Punting to 4.7, as I'm unable to reproduce it and it is very intermittent.

There is no supported upgrade path for ovn-kube from 4.4 to 4.5. The only customer with a supported 4.4 ovn-kube deployment is not upgrading clusters; they are reinstalling.
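A closing triage note: `oc get co` listings like the ones pasted in this thread are easier to scan mechanically than by eye. A minimal sketch that flags operators still behind the upgrade target; the sample output is abridged from the comment above, and the target version and the awk heuristic (compare the VERSION column against the target) are assumptions for illustration.

```shell
#!/usr/bin/env sh
# Hedged sketch: flag cluster operators whose VERSION column lags the
# upgrade target. Sample output abridged from the report above.
TARGET="4.5.0-rc.7"

co_output='NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE
authentication 4.5.0-rc.7 True False False 146m
console 4.5.0-rc.7 True False False 64m
network 4.5.0-rc.7 True False False 171m
service-catalog-apiserver 4.4.11 True False False 61m
service-catalog-controller-manager 4.4.11 True False False 65m
storage 4.5.0-rc.7 True False False 92m'

# Skip the header row (NR > 1); print operators not yet at the target version.
lagging=$(printf '%s\n' "$co_output" | awk -v t="$TARGET" 'NR > 1 && $2 != t {print $1}')
echo "$lagging"
```

Against a live cluster this would be fed from `oc get co --no-headers` instead of the inline sample; the point here is only the filtering step.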