Bug 2094932
Summary: | MGMT-10403 Ingress should enable single-node cluster expansion on upgraded clusters | |
---|---|---|---
Product: | OpenShift Container Platform | Reporter: | Omer Tuchfeld <otuchfel>
Component: | Networking | Assignee: | Omer Tuchfeld <otuchfel>
Networking sub component: | router | QA Contact: | Hongan Li <hongli>
Status: | CLOSED ERRATA | Severity: | urgent
Priority: | unspecified | CC: | aos-bugs, jhou, mmasters, wking, wwei
Version: | 4.11 | Target Release: | 4.11.0
Hardware: | Unspecified | OS: | Unspecified
Doc Type: | No Doc Update | Type: | Bug
Last Closed: | 2022-08-10 11:17:00 UTC | |
Description (Omer Tuchfeld, 2022-06-08 16:25:09 UTC)
Upgraded from 4.10.0-0.nightly-2022-06-08-150219 to 4.11.0-0.nightly-2022-06-15-222801, but the ingress.status.defaultPlacement is still blank.

```console
$ oc get ingress.config cluster -oyaml
<---snip--->
status:
  componentRoutes:
  - conditions:
    - lastTransitionTime: "2022-06-16T06:49:41Z"
      message: All is well
      reason: AsExpected
      status: "False"
      type: Progressing
    - lastTransitionTime: "2022-06-16T06:49:41Z"
      message: All is well
      reason: AsExpected
      status: "False"
      type: Degraded
    consumingUsers:
    - system:serviceaccount:oauth-openshift:authentication-operator
    currentHostnames:
    - oauth-openshift.apps.hongli-sno.qe.devcluster.openshift.com
    defaultHostname: oauth-openshift.apps.hongli-sno.qe.devcluster.openshift.com
    name: oauth-openshift
    namespace: openshift-authentication
    relatedObjects:
    - group: route.openshift.io
      name: oauth-openshift
      namespace: openshift-authentication
      resource: routes
  defaultPlacement: ""

$ oc get infrastructures.config.openshift.io cluster -oyaml
<---snip--->
status:
  apiServerInternalURI: https://api-int.hongli-sno.qe.devcluster.openshift.com:6443
  apiServerURL: https://api.hongli-sno.qe.devcluster.openshift.com:6443
  controlPlaneTopology: SingleReplica
  etcdDiscoveryDomain: ""
  infrastructureName: hongli-sno-2k9gn
  infrastructureTopology: SingleReplica
  platform: None
  platformStatus:
    type: None

$ oc get clusterversion/version -oyaml
<---snip--->
  history:
  - completionTime: "2022-06-16T07:22:05Z"
    image: registry.ci.openshift.org/ocp/release@sha256:bceac2ed723ce186c56b1db5e7b17cf0ef0a62e6bbfba5d545d419c3018498b2
    startedTime: "2022-06-16T06:29:58Z"
    state: Completed
    verified: true
    version: 4.11.0-0.nightly-2022-06-15-222801
  - completionTime: "2022-06-16T03:00:37Z"
    image: registry.ci.openshift.org/ocp/release@sha256:6bb01826e3996b4b792c0eed75316cfd55fd45f87fdd08a54d4953311c6ae985
    startedTime: "2022-06-16T02:22:42Z"
    state: Completed
    verified: false
    version: 4.10.0-0.nightly-2022-06-08-150219
```

Thanks for noticing this.
Can you please share the ingress operator logs from that run? Thanks.
```
2022-06-16T07:16:50.802Z ERROR operator.init ingress-operator/start.go:197 failed to handle single node 4.11 upgrade logic {"error": "failed fetching cluster nodes: nodes is forbidden: User \"system:serviceaccount:openshift-ingress-operator:ingress-operator\" cannot list resource \"nodes\" in API group \"\" at the cluster scope"}
```
So it's a permissions issue, which also explains why I didn't encounter this when testing locally: I used a kubeadmin kubeconfig. Will fix.
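If the fix follows the usual RBAC pattern, it would add a rule like the following to the ClusterRole bound to the ingress-operator service account. This is a sketch inferred from the error message; the exact manifest location and verb list are assumptions.

```yaml
# Hypothetical RBAC rule sketch: grant the ingress operator read access to
# cluster nodes, which the "cannot list resource nodes" error shows it lacks.
- apiGroups:
  - ""
  resources:
  - nodes
  verbs:
  - get
  - list
  - watch
```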
Checked with the latest CI build (no nightly build was available yet), but see new errors in the logs:

```
2022-06-21T03:32:12.813Z ERROR operator.init ingress-operator/start.go:197 failed to handle single node 4.11 upgrade logic {"error": "unable to update ingress config \"cluster\": ingresses.config.openshift.io \"cluster\" is forbidden: User \"system:serviceaccount:openshift-ingress-operator:ingress-operator\" cannot patch resource \"ingresses/status\" in API group \"config.openshift.io\" at the cluster scope"}
```

And the ingress.status.defaultPlacement is still blank (defaultPlacement: "").

Checked with the latest CI build, 4.11.0-0.nightly-2022-06-21-040754, upgrading from 4.10.0-0.nightly-2022-06-08-150219:

```console
$ oc get clusterversions
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-0.nightly-2022-06-21-040754   True        False         4m48s   Cluster version is 4.11.0-0.nightly-2022-06-21-040754

$ oc get ingresses.config.openshift.io cluster -ojson | jq '.status.defaultPlacement'
""

$ oc get deployment -n openshift-ingress -ojson | jq -r '.items[].spec.template.spec.nodeSelector | keys[] | select(. | test("node"))' | cut -d'/' -f2
worker
```

Error info:

```
2022-06-21T15:55:27.579Z ERROR operator.init ingress-operator/start.go:197 failed to handle single node 4.11 upgrade logic {"error": "unable to update ingress config \"cluster\": ingresses.config.openshift.io \"cluster\" is forbidden: User \"system:serviceaccount:openshift-ingress-operator:ingress-operator\" cannot patch resource \"ingresses/status\" in API group \"config.openshift.io\" at the cluster scope"}
2022-06-21T15:55:28.202Z ERROR operator.canary_controller wait/wait.go:155 error performing canary route check {"error": "error sending canary HTTP request: DNS error: Get \"https://canary-openshift-ingress-canary.apps.wwei-0621h.qe.devcluster.openshift.com\": dial tcp: lookup canary-openshift-ingress-canary.apps.wwei-0621h.qe.devcluster.openshift.com on 172.30.0.10:53: read udp 10.128.0.105:38668->172.30.0.10:53: read: connection refused"}
2022-06-21T15:55:28.476Z ERROR operator.ingress_controller controller/controller.go:114 got retryable error; requeueing {"after": "59m59.999992937s", "error": "IngressController may become degraded soon: DeploymentReplicasAllAvailable=False"}
```

Upgraded from 4.10.0-0.nightly-2022-06-08-150219 to 4.11.0-0.nightly-2022-06-22-015220 and the test passed:

```console
$ oc get ingress.config cluster -o=jsonpath={.status.defaultPlacement}
ControlPlane

$ oc get deployment -n openshift-ingress -ojson | jq -r '.items[].spec.template.spec.nodeSelector'
{
  "kubernetes.io/os": "linux",
  "node-role.kubernetes.io/master": ""
}
```

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069
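The second forbidden error seen during verification is the same class of problem: the service account also lacked patch on the ingresses/status subresource in the config.openshift.io group. A companion rule along these lines would cover it; again, this is a sketch inferred from the log, not the verified manifest from the fix.

```yaml
# Hypothetical RBAC rule sketch: allow the operator to patch the ingress
# config's status subresource, per the second forbidden error in the logs.
- apiGroups:
  - config.openshift.io
  resources:
  - ingresses/status
  verbs:
  - patch
```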