Bug 2024946
| Summary: | Ingress Canary does not respect router sharding on default IngressController | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Simon Reber <sreber> |
| Component: | Networking | Assignee: | Grant Spence <gspence> |
| Networking sub component: | router | QA Contact: | Shudi Li <shudili> |
| Status: | CLOSED ERRATA | Docs Contact: | Jesse Dohmann <jdohmann> |
| Severity: | medium | ||
| Priority: | medium | CC: | ahardin, aos-bugs, gspence, hongli, mmasters, ramon.gordillo |
| Version: | 4.9 | ||
| Target Milestone: | --- | ||
| Target Release: | 4.11.z | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: |
Cause:
The documentation is unclear on the implications of sharding the default ingress controller. Additionally, the cluster operator didn't report an issue if the default ingress controller was not selecting the canary route.
Consequence:
Users who shard their default ingress controller subsequently break the canary route (or other routes).
Fix:
Update the documentation to be clearer about the implications of sharding the default ingress controller, and add a cluster operator error if a user causes the canary route to not be selected by the default ingress controller.
Result:
Users avoid sharding the default ingress controller in a way that breaks their cluster.
|
| Story Points: | --- | ||
| Clone Of: | Environment: | ||
| Last Closed: | 2022-09-20 16:34:44 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | 2108214 | ||
| Bug Blocks: | |||
|
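The Doc Text above turns on how a `namespaceSelector` or `routeSelector` on the default IngressController stops it from selecting the canary route. Selection is ordinary Kubernetes `matchLabels` matching; a minimal sketch of those semantics (the label values below are illustrative assumptions, not taken from a live cluster):

```python
# Minimal sketch of Kubernetes matchLabels semantics: a selector matches an
# object only if every key/value pair in the selector is present in the
# object's labels.
def match_labels(selector: dict, labels: dict) -> bool:
    return all(labels.get(key) == value for key, value in selector.items())

# The default IngressController sharded with matchLabels {"type": "sharded"}:
selector = {"type": "sharded"}

# A namespace deliberately labeled for the shard is selected...
assert match_labels(selector, {"type": "sharded", "team": "web"})

# ...but the canary route's namespace (openshift-ingress-canary) carries no
# such label, so the default ingress controller no longer selects the canary
# route, and canary checks start failing.
assert not match_labels(selector, {"kubernetes.io/metadata.name": "openshift-ingress-canary"})

# An empty selector matches everything, which is the unsharded default behavior.
assert match_labels({}, {"anything": "at-all"})
```

This is why the fix adds a cluster operator error rather than changing selection behavior: the selector is doing exactly what it was told to do.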
Description
Simon Reber
2021-11-19 14:55:10 UTC
Setting blocker- because this is not a regression, but we do need to figure out whether/how we can support this configuration.

This issue looks related to bug 2021446, so I will investigate both BZs.

https://github.com/openshift/cluster-ingress-operator/pull/723 merged on March 30, but automation didn't update the BZ status. The fix should be in nightlies since April, so I'm moving the BZ to ON_QA.

Failed to verify it with 4.11.0-0.nightly-2022-06-23-092832
Flexy id: 114986 (I will keep this cluster tonight)
kubeconfig: https://mastern-jenkins-csb-openshift-qe.apps.ocp-c1.prod.psi.redhat.com/job/ocp-common/job/Flexy-install/114986/artifact/workdir/install-dir/auth/kubeconfig
A:
1.
% oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-0.nightly-2022-06-23-092832   True        False         3h59m   Cluster version is 4.11.0-0.nightly-2022-06-23-092832
%
2. edit ingresscontroller default with namespaceSelector and nodePlacement
% oc -n openshift-ingress-operator edit ingresscontroller default
ingresscontroller.operator.openshift.io/default edited
%
3.
% oc -n openshift-ingress-operator get ingresscontroller default -oyaml | grep -A18 spec:
spec:
  clientTLS:
    clientCA:
      name: ""
    clientCertificatePolicy: ""
  httpCompression: {}
  httpEmptyRequestsPolicy: Respond
  httpErrorCodePages:
    name: ""
  namespaceSelector:
    matchLabels:
      type: sharded
  nodePlacement:
    nodeSelector:
      matchLabels:
        node-role.kubernetes.io/worker: ""
  replicas: 2
  tuningOptions: {}
  unsupportedConfigOverrides: null
%
4. delete the ingress-operator pod
% oc -n openshift-ingress-operator get pods
NAME                                READY   STATUS    RESTARTS      AGE
ingress-operator-5b4c9d69df-cpdgb   2/2     Running   2 (63m ago)   70m
% oc -n openshift-ingress-operator delete pod ingress-operator-5b4c9d69df-cpdgb
pod "ingress-operator-5b4c9d69df-cpdgb" deleted
%
5.
% oc -n openshift-ingress-operator get pods
NAME                                READY   STATUS    RESTARTS   AGE
ingress-operator-5b4c9d69df-lw78x   2/2     Running   0          17s
shudi@Shudis-MacBook-Pro 410 % oc -n openshift-ingress get pods
NAME                              READY   STATUS    RESTARTS   AGE
router-default-6485c44f88-gct4m   1/1     Running   0          2m45s
router-default-6485c44f88-w7lnt   1/1     Running   0          2m45s
%
6. ingress co was degraded
% oc get co
NAME                       VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication             4.11.0-0.nightly-2022-06-23-153912   False       False         True       28m     OAuthServerRouteEndpointAccessibleControllerAvailable: "https://oauth-openshift.apps.shudi-411awspm01.qe.devcluster.openshift.com/healthz" returned "503 Service Unavailable"
baremetal                  4.11.0-0.nightly-2022-06-23-153912   True        False         False      95m
cloud-controller-manager   4.11.0-0.nightly-2022-06-23-153912   True        False         False      97m
cloud-credential           4.11.0-0.nightly-2022-06-23-153912   True        False         False      97m
cluster-autoscaler         4.11.0-0.nightly-2022-06-23-153912   True        False         False      95m
config-operator            4.11.0-0.nightly-2022-06-23-153912   True        False         False      96m
console                    4.11.0-0.nightly-2022-06-23-153912   False       False         False      28m     RouteHealthAvailable: route not yet available, https://console-openshift-console.apps.shudi-411awspm01.qe.devcluster.openshift.com returns '503 Service Unavailable'
csi-snapshot-controller    4.11.0-0.nightly-2022-06-23-153912   True        False         False      96m
dns                        4.11.0-0.nightly-2022-06-23-153912   True        False         False      95m
etcd                       4.11.0-0.nightly-2022-06-23-153912   True        False         False      94m
image-registry             4.11.0-0.nightly-2022-06-23-153912   True        False         False      83m
ingress                    4.11.0-0.nightly-2022-06-23-153912   True        False         True       87m     The "default" ingress controller reports Degraded=True: DegradedConditions: One or more other status conditions indicate a degraded state: CanaryChecksSucceeding=False (CanaryChecksRepetitiveFailures: Canary route checks for the default ingress controller are failing)

B: Use oc apply router-internal2.yaml, then delete the ingress-operator pod, ingress co is degraded, too.
1.
% oc apply -f router-internal2.yaml
E0624 19:59:34.031859   10669 request.go:1085] Unexpected error when reading response body: net/http: request canceled (Client.Timeout or context cancellation while reading body)
Warning: resource ingresscontrollers/default is missing the kubectl.kubernetes.io/last-applied-configuration annotation which is required by oc apply. oc apply should only be used on resources created declaratively by either oc create --save-config or oc apply. The missing annotation will be patched automatically.
ingresscontroller.operator.openshift.io/default configured
%
% cat router-internal2.yaml
apiVersion: v1
items:
- apiVersion: operator.openshift.io/v1
  kind: IngressController
  metadata:
    name: default
    namespace: openshift-ingress-operator
  spec:
    domain: apps.shudi-411awspm01.qe.devcluster.openshift.com
    nodePlacement:
      nodeSelector:
        matchLabels:
          node-role.kubernetes.io/worker: ""
    routeSelector:
      matchLabels:
        type: sharded
  status: {}
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
%
2.
% oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-0.nightly-2022-06-23-153912   True        False         105m    Cluster version is 4.11.0-0.nightly-2022-06-23-153912
shudi@Shudis-MacBook-Pro 410 % oc get co
NAME                       VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication             4.11.0-0.nightly-2022-06-23-153912   False       False         True       7m36s   OAuthServerRouteEndpointAccessibleControllerAvailable: "https://oauth-openshift.apps.shudi-411awspm01.qe.devcluster.openshift.com/healthz" returned "503 Service Unavailable"
baremetal                  4.11.0-0.nightly-2022-06-23-153912   True        False         False      124m
cloud-controller-manager   4.11.0-0.nightly-2022-06-23-153912   True        False         False      126m
cloud-credential           4.11.0-0.nightly-2022-06-23-153912   True        False         False      126m
cluster-autoscaler         4.11.0-0.nightly-2022-06-23-153912   True        False         False      124m
config-operator            4.11.0-0.nightly-2022-06-23-153912   True        False         False      125m
console                    4.11.0-0.nightly-2022-06-23-153912   False       False         False      7m38s   RouteHealthAvailable: route not yet available, https://console-openshift-console.apps.shudi-411awspm01.qe.devcluster.openshift.com returns '503 Service Unavailable'
csi-snapshot-controller    4.11.0-0.nightly-2022-06-23-153912   True        False         False      125m
dns                        4.11.0-0.nightly-2022-06-23-153912   True        False         False      124m
etcd                       4.11.0-0.nightly-2022-06-23-153912   True        False         False      123m
image-registry             4.11.0-0.nightly-2022-06-23-153912   True        False         False      112m
ingress                    4.11.0-0.nightly-2022-06-23-153912   True        False         True       15m     The "default" ingress controller reports Degraded=True: DegradedConditions: One or more other status conditions indicate a degraded state: CanaryChecksSucceeding=False (CanaryChecksRepetitiveFailures: Canary route checks for the default ingress controller are failing)

Use a fresh cluster and do the test again with the oc apply command
1.
% oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.11.0-0.nightly-2022-06-25-081133 True False 23m Cluster version is 4.11.0-0.nightly-2022-06-25-081133
% oc get route -o json -n openshift-authentication oauth-openshift | jq '.status'
{
  "ingress": [
    {
      "conditions": [
        {
          "lastTransitionTime": "2022-06-27T09:37:40Z",
          "status": "True",
          "type": "Admitted"
        }
      ],
      "host": "oauth-openshift.apps.shudi-411awspm08.qe.devcluster.openshift.com",
      "routerCanonicalHostname": "router-default.apps.shudi-411awspm08.qe.devcluster.openshift.com",
      "routerName": "default",
      "wildcardPolicy": "None"
    }
  ]
}
%
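The admission state printed above can also be checked programmatically. A minimal sketch in Python, parsing the same status JSON (the helper function is mine, not an oc or OpenShift API feature):

```python
import json

# Route .status as printed above by `oc get route ... | jq '.status'`.
status_json = """
{
  "ingress": [
    {
      "conditions": [
        {"lastTransitionTime": "2022-06-27T09:37:40Z", "status": "True", "type": "Admitted"}
      ],
      "host": "oauth-openshift.apps.shudi-411awspm08.qe.devcluster.openshift.com",
      "routerCanonicalHostname": "router-default.apps.shudi-411awspm08.qe.devcluster.openshift.com",
      "routerName": "default",
      "wildcardPolicy": "None"
    }
  ]
}
"""

def admitted_by(status, router_name):
    """True if the named router has admitted the route (Admitted condition is True)."""
    for ingress in status.get("ingress", []):
        if ingress.get("routerName") != router_name:
            continue
        for cond in ingress.get("conditions", []):
            if cond.get("type") == "Admitted" and cond.get("status") == "True":
                return True
    return False

status = json.loads(status_json)
assert admitted_by(status, "default")       # admitted before the sharding change
assert not admitted_by(status, "internal")  # no such router in this status
```

After the routeSelector is applied in step 3 below, the same route's `.status` comes back as `{}`, so this check would return False for every router.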
2.
% oc get dns.config/cluster -oyaml | grep -i domain
baseDomain: shudi-411awspm08.qe.devcluster.openshift.com
%
3. oc apply -f router-internal2.yaml
% cat router-internal2.yaml
apiVersion: v1
items:
- apiVersion: operator.openshift.io/v1
  kind: IngressController
  metadata:
    name: default
    namespace: openshift-ingress-operator
  spec:
    domain: apps.shudi-411awspm08.qe.devcluster.openshift.com
    nodePlacement:
      nodeSelector:
        matchLabels:
          node-role.kubernetes.io/worker: ""
    routeSelector:
      matchLabels:
        type: sharded
  status: {}
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
% oc apply -f router-internal2.yaml
4. More than 10 minutes passed, and the oauth-openshift route still has no status:
% oc get route -o json -n openshift-authentication oauth-openshift | jq '.status'
{}
%
5.
% oc get co | grep ingress
ingress 4.11.0-0.nightly-2022-06-25-081133 True False True 52m The "default" ingress controller reports Degraded=True: DegradedConditions: One or more other status conditions indicate a degraded state: CanaryChecksSucceeding=Unknown (CanaryRouteNotAdmitted: Canary route is not admitted by the default ingress controller)
%
6.
% oc get co | egrep "authentication|ingress|monitoring"
authentication 4.11.0-0.nightly-2022-06-25-081133 False False True 30m OAuthServerRouteEndpointAccessibleControllerAvailable: "https://oauth-openshift.apps.shudi-411awspm08.qe.devcluster.openshift.com/healthz" returned "503 Service Unavailable"
ingress 4.11.0-0.nightly-2022-06-25-081133 True False True 64m The "default" ingress controller reports Degraded=True: DegradedConditions: One or more other status conditions indicate a degraded state: CanaryChecksSucceeding=Unknown (CanaryRouteNotAdmitted: Canary route is not admitted by the default ingress controller)
monitoring 4.11.0-0.nightly-2022-06-25-081133 False True True 15m Rollout of the monitoring stack failed and is degraded. Please investigate the degraded status error.
%
7.
% oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.11.0-0.nightly-2022-06-25-081133 True False 58m Cluster version is 4.11.0-0.nightly-2022-06-25-081133
%
Failed to verify it with 4.11.0-0.nightly-2022-09-14-233224
1.
% oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.11.0-0.nightly-2022-09-14-233224 True False 47m Cluster version is 4.11.0-0.nightly-2022-09-14-233224
%
2.
% oc -n openshift-ingress-operator edit ingresscontroller default
ingresscontroller.operator.openshift.io/default edited
%
3.
% oc -n openshift-ingress-operator get ingresscontroller default -oyaml | grep -A18 spec:
spec:
  clientTLS:
    clientCA:
      name: ""
    clientCertificatePolicy: ""
  httpCompression: {}
  httpEmptyRequestsPolicy: Respond
  httpErrorCodePages:
    name: ""
  namespaceSelector:
    matchLabels:
      type: sharded
  nodePlacement:
    nodeSelector:
      matchLabels:
        node-role.kubernetes.io/worker: ""
  replicas: 2
  tuningOptions: {}
  unsupportedConfigOverrides: null
%
4. Wait for the two new router pods to be created, then show the ingress-operator pod
% oc -n openshift-ingress-operator get pods
NAME READY STATUS RESTARTS AGE
ingress-operator-5b6d5b7fbc-zgj5b 2/2 Running 2 (62m ago) 68m
%
5.
% oc -n openshift-ingress-operator delete pod ingress-operator-5b6d5b7fbc-zgj5b
pod "ingress-operator-5b6d5b7fbc-zgj5b" deleted
%
% oc -n openshift-ingress-operator get pods
NAME READY STATUS RESTARTS AGE
ingress-operator-5b6d5b7fbc-gglvl 2/2 Running 0 15s
%
6.
% oc get co | grep ingress
ingress 4.11.0-0.nightly-2022-09-14-233224 True False True 71m The "default" ingress controller reports Degraded=True: DegradedConditions: One or more other status conditions indicate a degraded state: CanaryChecksSucceeding=Unknown (CanaryRouteNotAdmitted: Canary route is not admitted by the default ingress controller)
%
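The DEGRADED column and MESSAGE that `oc get co` prints above come from the ClusterOperator's status conditions. A minimal sketch of extracting the Degraded message (the condition list below abbreviates the real object to just the fields used):

```python
# Abbreviated ClusterOperator .status.conditions, mirroring the ingress
# operator output above.
conditions = [
    {"type": "Available", "status": "True"},
    {"type": "Progressing", "status": "False"},
    {
        "type": "Degraded",
        "status": "True",
        "message": 'The "default" ingress controller reports Degraded=True: '
                   "DegradedConditions: One or more other status conditions indicate "
                   "a degraded state: CanaryChecksSucceeding=Unknown "
                   "(CanaryRouteNotAdmitted: Canary route is not admitted by the "
                   "default ingress controller)",
    },
]

def degraded_message(conditions):
    """Return the Degraded condition's message if the operator is degraded, else None."""
    for cond in conditions:
        if cond["type"] == "Degraded" and cond["status"] == "True":
            return cond.get("message")
    return None

msg = degraded_message(conditions)
assert msg is not None and "CanaryRouteNotAdmitted" in msg
```

Note the reason here is CanaryRouteNotAdmitted (the new error added by this fix), as opposed to the older CanaryChecksRepetitiveFailures reason seen in the earlier runs.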
7.
% oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.11.0-0.nightly-2022-09-14-233224 True False 61m Error while reconciling 4.11.0-0.nightly-2022-09-14-233224: the cluster operator ingress is degraded
%
Facing it in a 4.11.0 SNO installation (OVN) without sharding, which prevents the installation from finishing.
ingress   4.11.0   True   False   True   22m   The "default" ingress controller reports Degraded=True: DegradedConditions: One or more other status conditions indicate a degraded state: CanaryChecksSucceeding=Unknown (CanaryRouteNotAdmitted: Canary route is not admitted by the default ingress controller)

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.11.5 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:6536