Description of problem:
During an upgrade from 4.4.11-x86_64 to 4.5.0-rc.6-x86_64, the console cluster operator stays in an unavailable state.

Version-Release number of selected component (if applicable):
4.4.11-x86_64

How reproducible:
1 of 1 tries

Steps to Reproduce:
1. Upgrade 4.4.11-x86_64 to 4.5.0-rc.6-x86_64

Actual results:

$ oc get co console
NAME      VERSION      AVAILABLE   PROGRESSING   DEGRADED   SINCE
console   4.5.0-rc.6   False       True          False      106m

Name:         console
Namespace:
Labels:       <none>
Annotations:  <none>
API Version:  config.openshift.io/v1
Kind:         ClusterOperator
Metadata:
  Creation Timestamp:  2020-07-06T08:06:40Z
  Generation:          1
  Resource Version:    79114
  Self Link:           /apis/config.openshift.io/v1/clusteroperators/console
  UID:                 d0126bb1-1460-4d1a-9edb-0a30529b8e3e
Spec:
Status:
  Conditions:
    Last Transition Time: 2020-07-06T08:06:41Z  Reason: AsExpected  Status: False  Type: Degraded
    Last Transition Time: 2020-07-06T10:04:33Z  Message: SyncLoopRefreshProgressing: Working toward version 4.5.0-rc.6  Reason: SyncLoopRefresh_InProgress  Status: True  Type: Progressing
    Last Transition Time: 2020-07-06T10:04:37Z  Message: DeploymentAvailable: 2 replicas ready at version 4.5.0-rc.6  Reason: Deployment_FailedUpdate  Status: False  Type: Available
    Last Transition Time: 2020-07-06T08:06:40Z  Reason: AsExpected  Status: True  Type: Upgradeable
  Extension:  <nil>
  Related Objects:
    Group: operator.openshift.io  Name: cluster  Resource: consoles
    Group: config.openshift.io  Name: cluster  Resource: consoles
    Group: config.openshift.io  Name: cluster  Resource: infrastructures
    Group: config.openshift.io  Name: cluster  Resource: proxies
    Group: oauth.openshift.io  Name: console  Resource: oauthclients
    Group:   Name: openshift-console-operator  Resource: namespaces
    Group:   Name: openshift-console  Resource: namespaces
    Group:   Name: console-public  Namespace: openshift-config-managed  Resource: configmaps
  Versions:
    Name: operator  Version: 4.5.0-rc.6
Events:  <none>

Name:         dns
Namespace:
Labels:       <none>
Annotations:  <none>
API Version:  config.openshift.io/v1
Kind:         ClusterOperator
Metadata:
  Creation Timestamp:  2020-07-06T08:01:15Z
  Generation:          1
  Resource Version:    47582
  Self Link:           /apis/config.openshift.io/v1/clusteroperators/dns
  UID:                 e6369fae-94ad-4719-84f6-96329f087541
Spec:
Status:
  Conditions:
    Last Transition Time: 2020-07-06T08:15:24Z  Message: All desired DNS DaemonSets available and operand Namespace exists  Reason: AsExpected  Status: False  Type: Degraded
    Last Transition Time: 2020-07-06T09:17:18Z  Message: Desired and available number of DNS DaemonSets are equal  Reason: AsExpected  Status: False  Type: Progressing
    Last Transition Time: 2020-07-06T08:01:36Z  Message: At least 1 DNS DaemonSet available  Reason: AsExpected  Status: True  Type: Available
  Extension:  <nil>
  Related Objects:
    Group:   Name: openshift-dns-operator  Resource: namespaces
    Group:   Name: openshift-dns  Resource: namespaces
    Group: operator.openshift.io  Name:   Resource: DNS
  Versions:
    Name: operator  Version: 4.4.11
    Name: coredns  Version: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:1cfcdb3c2406c10e980eabd454ef2640877b15d6576e7dfae2beaf129ec94f03
    Name: openshift-cli  Version: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7cb23c271c3b40a1733f7ae366167cdb91050a449c263811c066f582b772054c
Events:  <none>

Name:         machine-config
Namespace:
Labels:       <none>
Annotations:  <none>
API Version:  config.openshift.io/v1
Kind:         ClusterOperator
Metadata:
  Creation Timestamp:  2020-07-06T08:00:50Z
  Generation:          1
  Resource Version:    49273
  Self Link:           /apis/config.openshift.io/v1/clusteroperators/machine-config
  UID:                 bf1c40f8-77e5-4838-a4b1-ea091f526f41
Spec:
Status:
  Conditions:
    Last Transition Time: 2020-07-06T09:21:46Z  Message: Cluster has deployed 4.4.11  Status: True  Type: Available
    Last Transition Time: 2020-07-06T08:01:53Z  Message: Cluster version is 4.4.11  Status: False  Type: Progressing
    Last Transition Time: 2020-07-06T09:21:46Z  Status: False  Type: Degraded
    Last Transition Time: 2020-07-06T08:01:53Z  Reason: AsExpected  Status: True  Type: Upgradeable
  Extension:
  Related Objects:
    Group:   Name: openshift-machine-config-operator  Resource: namespaces
    Group: machineconfiguration.openshift.io  Name: master  Resource: machineconfigpools
    Group: machineconfiguration.openshift.io  Name: worker  Resource: machineconfigpools
    Group: machineconfiguration.openshift.io  Name: machine-config-controller  Resource: controllerconfigs
  Versions:
    Name: operator  Version: 4.4.11
Events:  <none>

Name:         network
Namespace:
Labels:       <none>
Annotations:  network.operator.openshift.io/last-seen-state: {"DaemonsetStates":[],"DeploymentStates":[]}
API Version:  config.openshift.io/v1
Kind:         ClusterOperator
Metadata:
  Creation Timestamp:  2020-07-06T07:59:07Z
  Generation:          1
  Resource Version:    78767
  Self Link:           /apis/config.openshift.io/v1/clusteroperators/network
  UID:                 28323b4a-9fa9-4cc6-89b5-0bdc19d01894
Spec:
Status:
  Conditions:
    Last Transition Time: 2020-07-06T07:59:56Z  Status: False  Type: Degraded
    Last Transition Time: 2020-07-06T07:59:07Z  Status: True  Type: Upgradeable
    Last Transition Time: 2020-07-06T10:04:07Z  Status: False  Type: Progressing
    Last Transition Time: 2020-07-06T08:00:31Z  Status: True  Type: Available
  Extension:  <nil>
  Related Objects:
    Group:   Name: applied-cluster  Namespace: openshift-network-operator  Resource: configmaps
    Group: apiextensions.k8s.io  Name: network-attachment-definitions.k8s.cni.cncf.io  Resource: customresourcedefinitions
    Group: apiextensions.k8s.io  Name: ippools.whereabouts.cni.cncf.io  Resource: customresourcedefinitions
    Group:   Name: openshift-multus  Resource: namespaces
    Group: rbac.authorization.k8s.io  Name: multus  Resource: clusterroles
    Group:   Name: multus  Namespace: openshift-multus  Resource: serviceaccounts
    Group: rbac.authorization.k8s.io  Name: multus  Resource: clusterrolebindings
    Group: rbac.authorization.k8s.io  Name: multus-whereabouts  Resource: clusterrolebindings
    Group: rbac.authorization.k8s.io  Name: whereabouts-cni  Resource: clusterroles
    Group:   Name: cni-binary-copy-script  Namespace: openshift-multus  Resource: configmaps
    Group: apps  Name: multus  Namespace: openshift-multus  Resource: daemonsets
    Group:   Name: multus-admission-controller  Namespace: openshift-multus  Resource: services
    Group: rbac.authorization.k8s.io  Name: multus-admission-controller-webhook  Resource: clusterroles
    Group: rbac.authorization.k8s.io  Name: multus-admission-controller-webhook  Resource: clusterrolebindings
    Group: admissionregistration.k8s.io  Name: multus.openshift.io  Resource: validatingwebhookconfigurations
    Group:   Name: openshift-service-ca  Namespace: openshift-network-operator  Resource: configmaps
    Group: apps  Name: multus-admission-controller  Namespace: openshift-multus  Resource: daemonsets
    Group: monitoring.coreos.com  Name: monitor-multus-admission-controller  Namespace: openshift-multus  Resource: servicemonitors
    Group: rbac.authorization.k8s.io  Name: prometheus-k8s  Namespace: openshift-multus  Resource: roles
    Group: rbac.authorization.k8s.io  Name: prometheus-k8s  Namespace: openshift-multus  Resource: rolebindings
    Group: monitoring.coreos.com  Name: prometheus-k8s-rules  Namespace: openshift-multus  Resource: prometheusrules
    Group:   Name: openshift-ovn-kubernetes  Resource: namespaces
    Group:   Name: ovn-kubernetes-node  Namespace: openshift-ovn-kubernetes  Resource: serviceaccounts
    Group: rbac.authorization.k8s.io  Name: openshift-ovn-kubernetes-node  Resource: clusterroles
    Group: rbac.authorization.k8s.io  Name: openshift-ovn-kubernetes-node  Resource: clusterrolebindings
    Group:   Name: ovn-kubernetes-controller  Namespace: openshift-ovn-kubernetes  Resource: serviceaccounts
    Group: rbac.authorization.k8s.io  Name: openshift-ovn-kubernetes-controller  Resource: clusterroles
    Group: rbac.authorization.k8s.io  Name: openshift-ovn-kubernetes-controller  Resource: clusterrolebindings
    Group: rbac.authorization.k8s.io  Name: openshift-ovn-kubernetes-sbdb  Namespace: openshift-ovn-kubernetes  Resource: roles
    Group: rbac.authorization.k8s.io  Name: openshift-ovn-kubernetes-sbdb  Namespace: openshift-ovn-kubernetes  Resource: rolebindings
    Group:   Name: ovnkube-config  Namespace: openshift-ovn-kubernetes  Resource: configmaps
    Group:   Name: ovnkube-db  Namespace: openshift-ovn-kubernetes  Resource: services
    Group: apps  Name: ovs-node  Namespace: openshift-ovn-kubernetes  Resource: daemonsets
    Group: network.operator.openshift.io  Name: ovn  Namespace: openshift-ovn-kubernetes  Resource: operatorpkis
    Group: monitoring.coreos.com  Name: master-rules  Namespace: openshift-ovn-kubernetes  Resource: prometheusrules
    Group: monitoring.coreos.com  Name: networking-rules  Namespace: openshift-ovn-kubernetes  Resource: prometheusrules
    Group: monitoring.coreos.com  Name: monitor-ovn-master  Namespace: openshift-ovn-kubernetes  Resource: servicemonitors
    Group:   Name: ovn-kubernetes-master  Namespace: openshift-ovn-kubernetes  Resource: services
    Group: monitoring.coreos.com  Name: monitor-ovn-node  Namespace: openshift-ovn-kubernetes  Resource: servicemonitors
    Group:   Name: ovn-kubernetes-node  Namespace: openshift-ovn-kubernetes  Resource: services
    Group: rbac.authorization.k8s.io  Name: prometheus-k8s  Namespace: openshift-ovn-kubernetes  Resource: roles
    Group: rbac.authorization.k8s.io  Name: prometheus-k8s  Namespace: openshift-ovn-kubernetes  Resource: rolebindings
    Group: policy  Name: ovn-raft-quorum-guard  Namespace: openshift-ovn-kubernetes  Resource: poddisruptionbudgets
    Group: apps  Name: ovnkube-master  Namespace: openshift-ovn-kubernetes  Resource: daemonsets
    Group: apps  Name: ovnkube-node  Namespace: openshift-ovn-kubernetes  Resource: daemonsets
    Group:   Name: openshift-network-operator  Resource: namespaces
  Versions:
    Name: operator  Version: 4.4.11
Events:  <none>

Expected results:
console should be available and upgraded
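For anyone triaging this, a more compact way to pull just the operator conditions than the full describe output is a jsonpath query; this is only a sketch and the output formatting is arbitrary:

$ oc get clusteroperator console -o jsonpath='{range .status.conditions[*]}{.type}={.status} ({.reason}){"\n"}{end}'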
The console pod is unable to get the OAuth well-known endpoint. This could be a networking issue.

2020-07-06T11:52:09.957946336Z 2020-07-06T11:52:09Z auth: error contacting auth provider (retrying in 10s): Get https://kubernetes.default.svc/.well-known/oauth-authorization-server: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
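If it helps narrow this down, a rough connectivity check from inside the console pod could look like the following; this assumes a shell and curl are available in the console image, which may not hold for minimal images:

$ oc rsh -n openshift-console deployment/console \
    curl -sk --max-time 10 https://kubernetes.default.svc/.well-known/oauth-authorization-server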
It looks like we may have seen this before. One of these bugs notes that restarting the OVS pod on the node fixed the issue.

https://bugzilla.redhat.com/show_bug.cgi?id=1760103
https://bugzilla.redhat.com/show_bug.cgi?id=1760948
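For reference, the workaround those bugs describe amounts to deleting the OVS pod on the affected node so its DaemonSet recreates it. A sketch adapted to an OVN-Kubernetes cluster like this one, assuming the ovs-node pods carry the app=ovs-node label (on openshift-sdn clusters the pods live in the openshift-sdn namespace with the app=ovs label), with <affected-node> as a placeholder:

$ oc delete pod -n openshift-ovn-kubernetes -l app=ovs-node \
    --field-selector spec.nodeName=<affected-node>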
(In reply to Zac Herman from comment #3)
> It looks like we've seen this before, possibly. One bug notes restarting
> the OVS pod on the node fixed the issue.
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1760103
> https://bugzilla.redhat.com/show_bug.cgi?id=1760948

From the install-config.yaml, the cluster in this issue is running the 'OVNKubernetes' network plugin, while the two bugs above are against 'openshift-sdn', so I have updated the subcomponent.
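A quick way to confirm the active plugin directly from the cluster, without digging out the original install-config:

$ oc get network.config/cluster -o jsonpath='{.status.networkType}'
OVNKubernetes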
I hit this issue last night on an OVN cluster on Azure, the same config this was originally reported on.

1/2 console pods were crashlooping with "error contacting auth provider".

I tried deleting the crashing console pod, but its replacement got stuck in ContainerCreating with this event:

Warning  FailedCreatePodSandBox  2s  kubelet, ugdci06153010c-v4ksw-master-2  Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_console-764486b8cd-gxp5c_openshift-console_213dfedd-5d81-424b-9270-c7ea20e21b5e_0(89c6fbf63d99c330d48f08a5dbba695d29983f95ea5bf1502193dfc1ff7e6356): Multus: [openshift-console/console-764486b8cd-gxp5c]: error adding container to network "ovn-kubernetes": delegateAdd: error invoking confAdd - "ovn-k8s-cni-overlay": error in getting result from AddNetwork: CNI request failed with status 400: '[openshift-console/console-764486b8cd-gxp5c] failed to configure pod interface: timed out dumping br-int flow entries for sandbox: timed out waiting for the condition'

So, another "timed out waiting for condition".

@yadan for my case, the upgrade was successful (console not progressing) but the console was unavailable.
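In case it helps whoever picks this up, the CNI-side view of that sandbox failure usually lands in the ovnkube-node pod on the same node; a rough way to pull it (the container name is assumed from the 4.5 ovnkube-node DaemonSet and may differ):

$ oc get pods -n openshift-ovn-kubernetes -o wide --field-selector spec.nodeName=ugdci06153010c-v4ksw-master-2
$ oc logs -n openshift-ovn-kubernetes <ovnkube-node-pod-from-above> -c ovnkube-node --tail=200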
(In reply to Mike Fiedler from comment #6)
> I hit this issue last night on an ovn cluster on Azure. Same config this
> was originally reported on.
>
> 1/2 console pods crashlooping with "error contacting auth provider"
>
> I tried deleting the crashing console pod, but it's replacement stuck in
> ContainerCreating with this event:
>
> Warning FailedCreatePodSandBox 2s kubelet, ugdci06153010c-v4ksw-master-2
> Failed to create pod sandbox: rpc error: code = Unknown desc = failed to
> create pod network sandbox k8s_console-764486b8cd-gxp5c_openshift-console_
> 213dfedd-5d81-424b-9270-c7ea20e21b5e_0(89c6fbf63d99c330d48f08a5dbba695d29983f95ea5bf1502193dfc1ff7e6356):
> Multus: [openshift-console/console-764486b8cd-gxp5c]: error adding
> container to network "ovn-kubernetes": delegateAdd: error invoking confAdd -
> "ovn-k8s-cni-overlay": error in getting result from AddNetwork: CNI request
> failed with status 400: '[openshift-console/console-764486b8cd-gxp5c] failed
> to configure pod interface: timed out dumping br-int flow entries for
> sandbox: timed out waiting for the condition'
>
> So, another "timed out waiting for condition"
>
> @yadan for my case, the upgrade was successful (console not progressing) but
> the console was unavailable.

I guess this issue happens for all pods on that node, not only the console pod. Could you help provide the must-gather logs? Thanks @Mike
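For completeness, the standard collection command (the destination directory name is just an example):

$ oc adm must-gather --dest-dir=./must-gather-console-bug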
Checked the upgrade CI: the upgrade from 4.4.11-x86_64 to 4.5.0-rc.7-x86_64 on a cluster with IPI on Azure (FIPS on) and OVN succeeded, and the console could be accessed successfully.

$ oc get co
NAME                                       VERSION      AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.5.0-rc.7   True        False         False      146m
cloud-credential                           4.5.0-rc.7   True        False         False      176m
cluster-autoscaler                         4.5.0-rc.7   True        False         False      161m
config-operator                            4.5.0-rc.7   True        False         False      105m
console                                    4.5.0-rc.7   True        False         False      64m
csi-snapshot-controller                    4.5.0-rc.7   True        False         False      74m
dns                                        4.5.0-rc.7   True        False         False      169m
etcd                                       4.5.0-rc.7   True        False         False      169m
image-registry                             4.5.0-rc.7   True        False         False      153m
ingress                                    4.5.0-rc.7   True        False         False      152m
insights                                   4.5.0-rc.7   True        False         False      163m
kube-apiserver                             4.5.0-rc.7   True        False         False      167m
kube-controller-manager                    4.5.0-rc.7   True        False         False      168m
kube-scheduler                             4.5.0-rc.7   True        False         False      168m
kube-storage-version-migrator              4.5.0-rc.7   True        False         False      68m
machine-api                                4.5.0-rc.7   True        False         False      163m
machine-approver                           4.5.0-rc.7   True        False         False      92m
machine-config                             4.5.0-rc.7   True        False         False      101m
marketplace                                4.5.0-rc.7   True        False         False      62m
monitoring                                 4.5.0-rc.7   True        False         False      88m
network                                    4.5.0-rc.7   True        False         False      171m
node-tuning                                4.5.0-rc.7   True        False         False      92m
openshift-apiserver                        4.5.0-rc.7   True        False         False      62m
openshift-controller-manager               4.5.0-rc.7   True        False         False      164m
openshift-samples                          4.5.0-rc.7   True        False         False      91m
operator-lifecycle-manager                 4.5.0-rc.7   True        False         False      170m
operator-lifecycle-manager-catalog         4.5.0-rc.7   True        False         False      170m
operator-lifecycle-manager-packageserver   4.5.0-rc.7   True        False         False      62m
service-ca                                 4.5.0-rc.7   True        False         False      171m
service-catalog-apiserver                  4.4.11       True        False         False      61m
service-catalog-controller-manager         4.4.11       True        False         False      65m
storage                                    4.5.0-rc.7   True        False         False      92m

[zyp@MiWiFi-R1CM ~]$ oc get pod -n openshift-console
NAME                         READY   STATUS    RESTARTS   AGE
console-dc6dc747-57v5l       1/1     Running   0          70m
console-dc6dc747-zwrz7       1/1     Running   0          65m
downloads-8546cb9cff-4hhlj   1/1     Running   0          65m
downloads-8546cb9cff-m9qlc   1/1     Running   0          70m
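One scriptable way to confirm the console is reachable, assuming the route host resolves from wherever the check runs and skipping TLS verification:

$ curl -sk -o /dev/null -w '%{http_code}\n' \
    "https://$(oc get route console -n openshift-console -o jsonpath='{.spec.host}')"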
I downloaded 4.5.1 and upgraded from the latest 4.4 just fine:

[ricky@localhost openshift-installer]$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.4.12    True        True          2m45s   Working towards 4.5.1: 27% complete

[ricky@localhost openshift-installer]$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.5.1     True        False         49m     Cluster version is 4.5.1

[ricky@localhost openshift-installer]$ oc get co console
NAME      VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
console   4.5.1     True        False         False      57m

As such I'm closing this; it seems to be fixed or to have been an environmental issue.
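For reference, the upgrade itself was just the normal flow; a sketch assuming 4.5.1 is available in the cluster's current update channel:

$ oc adm upgrade --to 4.5.1
$ oc get clusterversion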
Punting to 4.7 as I'm unable to reproduce this and it is very intermittent.
There is no supported upgrade path for ovn-kube from 4.4 to 4.5. The only customer with a supported 4.4 ovn-kube deployment is not upgrading clusters; they are reinstalling.