Bug 2094932
| Summary: | MGMT-10403 Ingress should enable single-node cluster expansion on upgraded clusters | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Omer Tuchfeld <otuchfel> |
| Component: | Networking | Assignee: | Omer Tuchfeld <otuchfel> |
| Networking sub component: | router | QA Contact: | Hongan Li <hongli> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | urgent | | |
| Priority: | unspecified | CC: | aos-bugs, jhou, mmasters, wking, wwei |
| Version: | 4.11 | | |
| Target Milestone: | --- | | |
| Target Release: | 4.11.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | No Doc Update |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2022-08-10 11:17:00 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
|
Description
Omer Tuchfeld 2022-06-08 16:25:09 UTC
Upgraded from 4.10.0-0.nightly-2022-06-08-150219 to 4.11.0-0.nightly-2022-06-15-222801, but ingress.status.defaultPlacement is still blank:
```
$ oc get ingress.config cluster -oyaml
<---snip--->
status:
  componentRoutes:
  - conditions:
    - lastTransitionTime: "2022-06-16T06:49:41Z"
      message: All is well
      reason: AsExpected
      status: "False"
      type: Progressing
    - lastTransitionTime: "2022-06-16T06:49:41Z"
      message: All is well
      reason: AsExpected
      status: "False"
      type: Degraded
    consumingUsers:
    - system:serviceaccount:oauth-openshift:authentication-operator
    currentHostnames:
    - oauth-openshift.apps.hongli-sno.qe.devcluster.openshift.com
    defaultHostname: oauth-openshift.apps.hongli-sno.qe.devcluster.openshift.com
    name: oauth-openshift
    namespace: openshift-authentication
    relatedObjects:
    - group: route.openshift.io
      name: oauth-openshift
      namespace: openshift-authentication
      resource: routes
  defaultPlacement: ""
```
```
$ oc get infrastructures.config.openshift.io cluster -oyaml
<---snip--->
status:
  apiServerInternalURI: https://api-int.hongli-sno.qe.devcluster.openshift.com:6443
  apiServerURL: https://api.hongli-sno.qe.devcluster.openshift.com:6443
  controlPlaneTopology: SingleReplica
  etcdDiscoveryDomain: ""
  infrastructureName: hongli-sno-2k9gn
  infrastructureTopology: SingleReplica
  platform: None
  platformStatus:
    type: None
```
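The upgrade logic keys off this topology: on a single-node cluster (`controlPlaneTopology: SingleReplica`), the operator is expected to set `defaultPlacement: ControlPlane`. A minimal sketch of that decision in shell, not the operator's actual Go code, with the field value reproduced from the output above:

```shell
# Expected defaultPlacement as a function of controlPlaneTopology.
# "SingleReplica" is the value reported by this SNO cluster above.
controlPlaneTopology='SingleReplica'
if [ "$controlPlaneTopology" = 'SingleReplica' ]; then
  echo 'ControlPlane'
else
  echo 'Workers'
fi
# → ControlPlane
```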
```
$ oc get clusterversion/version -oyaml
<---snip--->
history:
- completionTime: "2022-06-16T07:22:05Z"
  image: registry.ci.openshift.org/ocp/release@sha256:bceac2ed723ce186c56b1db5e7b17cf0ef0a62e6bbfba5d545d419c3018498b2
  startedTime: "2022-06-16T06:29:58Z"
  state: Completed
  verified: true
  version: 4.11.0-0.nightly-2022-06-15-222801
- completionTime: "2022-06-16T03:00:37Z"
  image: registry.ci.openshift.org/ocp/release@sha256:6bb01826e3996b4b792c0eed75316cfd55fd45f87fdd08a54d4953311c6ae985
  startedTime: "2022-06-16T02:22:42Z"
  state: Completed
  verified: false
  version: 4.10.0-0.nightly-2022-06-08-150219
```
Thanks for noticing this. Can you please share the ingress operator logs from that run? Thanks.

```
2022-06-16T07:16:50.802Z ERROR operator.init ingress-operator/start.go:197 failed to handle single node 4.11 upgrade logic {"error": "failed fetching cluster nodes: nodes is forbidden: User \"system:serviceaccount:openshift-ingress-operator:ingress-operator\" cannot list resource \"nodes\" in API group \"\" at the cluster scope"}
```

So it's a permissions issue, which also explains why I didn't encounter it when testing locally: I was using a kubeadmin kubeconfig. Will fix.
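RBAC "forbidden" errors like the one above follow a fixed pattern, so the missing grant can be read straight off the message. A small sketch (the error text is reproduced from the log line above):

```shell
# Pull the missing verb and resource out of a Kubernetes RBAC "forbidden" error.
line='User "system:serviceaccount:openshift-ingress-operator:ingress-operator" cannot list resource "nodes" in API group "" at the cluster scope'
printf '%s\n' "$line" \
  | sed -n 's/.*cannot \([a-z]*\) resource "\([^"]*\)".*/verb=\1 resource=\2/p'
# → verb=list resource=nodes
```

Whatever verb/resource pair this prints is exactly what the operator's ClusterRole is missing.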
Checked with the latest CI build (no nightly build is available yet), but I see a new error in the logs:

```
2022-06-21T03:32:12.813Z ERROR operator.init ingress-operator/start.go:197 failed to handle single node 4.11 upgrade logic {"error": "unable to update ingress config \"cluster\": ingresses.config.openshift.io \"cluster\" is forbidden: User \"system:serviceaccount:openshift-ingress-operator:ingress-operator\" cannot patch resource \"ingresses/status\" in API group \"config.openshift.io\" at the cluster scope"}
```

and ingress.status.defaultPlacement is still blank:

```
defaultPlacement: ""
```
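Both forbidden errors point at rules missing from the operator's ClusterRole: `list` on core `nodes`, and `patch` on the `ingresses/status` subresource. A hypothetical sketch of the kind of rules the fix needs to add; the role name and exact shape are assumptions, not taken from the actual fix:

```yaml
# Hypothetical fragment only -- illustrates the two grants the errors ask for.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: openshift-ingress-operator   # assumed name
rules:
- apiGroups: [""]
  resources: ["nodes"]
  verbs: ["list", "watch"]
- apiGroups: ["config.openshift.io"]
  resources: ["ingresses/status"]
  verbs: ["patch"]
```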
Checked with the latest build, 4.11.0-0.nightly-2022-06-21-040754, upgrading from 4.10.0-0.nightly-2022-06-08-150219:

```
$ oc get clusterversions
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-0.nightly-2022-06-21-040754   True        False         4m48s   Cluster version is 4.11.0-0.nightly-2022-06-21-040754

$ oc get ingresses.config.openshift.io cluster -ojson | jq '.status.defaultPlacement'
""

$ oc get deployment -n openshift-ingress -ojson | jq -r '.items[].spec.template.spec.nodeSelector | keys[] | select(. | test("node"))' | cut -d'/' -f2
worker
```
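The jq pipeline above can be exercised without a cluster by feeding it canned deployment JSON; a sketch, where the JSON is a hypothetical minimal stand-in for real `oc get deployment -ojson` output:

```shell
# Minimal stand-in for `oc get deployment -n openshift-ingress -ojson` output,
# trimmed to the fields the filter touches (hypothetical data).
json='{"items":[{"spec":{"template":{"spec":{"nodeSelector":{"kubernetes.io/os":"linux","node-role.kubernetes.io/worker":""}}}}}]}'
# Same filter as above: keep only the nodeSelector key containing "node",
# then strip the "node-role.kubernetes.io/" prefix with cut.
printf '%s' "$json" \
  | jq -r '.items[].spec.template.spec.nodeSelector | keys[] | select(. | test("node"))' \
  | cut -d'/' -f2
# → worker
```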
Error info:

```
2022-06-21T15:55:27.579Z ERROR operator.init ingress-operator/start.go:197 failed to handle single node 4.11 upgrade logic {"error": "unable to update ingress config \"cluster\": ingresses.config.openshift.io \"cluster\" is forbidden: User \"system:serviceaccount:openshift-ingress-operator:ingress-operator\" cannot patch resource \"ingresses/status\" in API group \"config.openshift.io\" at the cluster scope"}
2022-06-21T15:55:28.202Z ERROR operator.canary_controller wait/wait.go:155 error performing canary route check {"error": "error sending canary HTTP request: DNS error: Get \"https://canary-openshift-ingress-canary.apps.wwei-0621h.qe.devcluster.openshift.com\": dial tcp: lookup canary-openshift-ingress-canary.apps.wwei-0621h.qe.devcluster.openshift.com on 172.30.0.10:53: read udp 10.128.0.105:38668->172.30.0.10:53: read: connection refused"}
2022-06-21T15:55:28.476Z ERROR operator.ingress_controller controller/controller.go:114 got retryable error; requeueing {"after": "59m59.999992937s", "error": "IngressController may become degraded soon: DeploymentReplicasAllAvailable=False"}
```
Upgrade from 4.10.0-0.nightly-2022-06-08-150219 to 4.11.0-0.nightly-2022-06-22-015220 passed:

```
$ oc get ingress.config cluster -o=jsonpath={.status.defaultPlacement}
ControlPlane

$ oc get deployment -n openshift-ingress -ojson | jq -r '.items[].spec.template.spec.nodeSelector'
{
  "kubernetes.io/os": "linux",
  "node-role.kubernetes.io/master": ""
}
```
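The passing state hangs together: a `node-role.kubernetes.io/master` selector on the router deployment is exactly what `defaultPlacement: ControlPlane` means. A small sanity-check sketch, assuming the ingress config's two placement values are `ControlPlane` and `Workers`; the selector string is taken from the deployment output above:

```shell
# Map the router deployment's node-role label to the corresponding
# defaultPlacement value (ControlPlane/Workers assumed from the API enum).
selector='node-role.kubernetes.io/master'
case "${selector##*/}" in          # strip the "node-role.kubernetes.io/" prefix
  master|control-plane) echo 'ControlPlane' ;;
  *)                    echo 'Workers' ;;
esac
# → ControlPlane
```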
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069