Bug 1989058
| Summary: | router pod stuck in ContainerCreating if configmap/router-client-ca-crl-default is removed and spec.clientTLS.clientCertificatePolicy is updated | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Hongan Li <hongli> |
| Component: | Networking | Assignee: | Miciah Dashiel Butler Masters <mmasters> |
| Networking sub component: | router | QA Contact: | jechen <jechen> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | high | ||
| Priority: | high | CC: | aos-bugs, jechen, mmasters |
| Version: | 4.9 | ||
| Target Milestone: | --- | ||
| Target Release: | 4.9.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2021-10-18 17:43:46 UTC | Type: | Bug |
Verified using pre-PR merge verification
Used cluster-bot: launch 4.9-ci,openshift/cluster-ingress-operator#642
$ oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.9.0-0.ci.test-2021-08-25-130000-ci-ln-wmxtwr2-latest True False 3m42s Cluster version is 4.9.0-0.ci.test-2021-08-25-130000-ci-ln-wmxtwr2-latest
#1. create cm/test-client-ca in ns openshift-config and enable mTLS in ingresscontroller/default, new router pods are created
$ oc create configmap test-client-ca --from-file=./openshift-tests-private/test/extended/testdata/router/ca-bundle.pem -n openshift-config
configmap/test-client-ca created
$ oc -n openshift-ingress-operator edit ingresscontroller/default
spec:
  clientTLS:
    clientCA:
      name: test-client-ca
    clientCertificatePolicy: Required
$ oc -n openshift-ingress get cm
NAME DATA AGE
kube-root-ca.crt 1 80m
openshift-service-ca.crt 1 80m
router-client-ca-default 1 2m17s
service-ca-bundle 1 80m
$ oc -n openshift-ingress get pod
NAME READY STATUS RESTARTS AGE
router-default-6cbfb5f886-gzjfl 1/1 Running 0 72s
router-default-6cbfb5f886-thdvc 1/1 Running 0 72s
router-default-7774747b4f-87nwf 1/1 Terminating 0 71m
router-default-7774747b4f-brn8f 1/1 Terminating 0 71m
$ oc -n openshift-ingress get pod
NAME READY STATUS RESTARTS AGE
router-default-6cbfb5f886-gzjfl 1/1 Running 0 8m54s
router-default-6cbfb5f886-thdvc 1/1 Running 0 8m54s
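The `oc edit` in step #1 is interactive. As a minimal sketch, the same clientTLS change could be expressed as a JSON merge patch. The helper name `client_tls_patch` is hypothetical and not part of any tool; it only builds the patch string and rejects policy values other than `Required` and `Optional`, the two values used in this report.

```shell
# Hypothetical helper: print a JSON merge patch for spec.clientTLS.
# Only "Required" and "Optional" (the values used in this report) are accepted.
client_tls_patch() {
  ca_name="$1"
  policy="$2"
  case "$policy" in
    Required|Optional) ;;
    *) echo "invalid clientCertificatePolicy: $policy" >&2; return 1 ;;
  esac
  printf '{"spec":{"clientTLS":{"clientCA":{"name":"%s"},"clientCertificatePolicy":"%s"}}}\n' \
    "$ca_name" "$policy"
}

# Example use against a cluster (not executed here):
# oc -n openshift-ingress-operator patch ingresscontroller/default \
#   --type=merge -p "$(client_tls_patch test-client-ca Required)"
```

Validating the policy value before patching catches a misspelled value locally instead of relying on the API server to reject it.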
#2. remove the configmap/router-client-ca-default, then update "clientCertificatePolicy" to Optional, new router pods are created and they are up running
$ oc -n openshift-ingress delete cm/router-client-ca-default
configmap "router-client-ca-default" deleted
$ oc -n openshift-ingress-operator edit ingresscontroller/default
spec:
  clientTLS:
    clientCA:
      name: test-client-ca
    clientCertificatePolicy: Optional
$ oc -n openshift-ingress get pod
NAME READY STATUS RESTARTS AGE
router-default-59b665cc4f-xf67g 1/1 Running 0 40s
router-default-59b665cc4f-zqqkl 1/1 Running 0 40s
router-default-6cbfb5f886-gzjfl 1/1 Terminating 0 11m
router-default-6cbfb5f886-thdvc 1/1 Terminating 0 11m
$ oc -n openshift-ingress get pod
NAME READY STATUS RESTARTS AGE
router-default-59b665cc4f-xf67g 1/1 Running 0 87s
router-default-59b665cc4f-zqqkl 1/1 Running 0 87s
$ oc -n openshift-ingress describe pod router-default-59b665cc4f-xf67g
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 2m41s default-scheduler Successfully assigned openshift-ingress/router-default-59b665cc4f-xf67g to ci-ln-wmxtwr2-f76d1-w4d97-worker-b-47lx8
Normal AddedInterface 2m39s multus Add eth0 [10.131.0.46/23] from openshift-sdn
Normal Pulled 2m38s kubelet Container image "registry.build01.ci.openshift.org/ci-ln-wmxtwr2/stable@sha256:90785597c84ba9f9b5ec44175eb435ea31357e9f77c5f411c1766eb18d4b7d5b" already present on machine
Normal Created 2m38s kubelet Created container router
Normal Started 2m38s kubelet Started container router
Completed fast-fix verification (pre-PR merge verification); changing the status to "verified" now.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759
Description of problem:
After removing configmap/router-client-ca-crl-default and updating spec.clientTLS.clientCertificatePolicy, deploy/router-default rolls out but the new pods are stuck in ContainerCreating status.

OpenShift release version: 4.9.0-0.nightly-2021-08-01-132055

Cluster Platform: AWS

How reproducible: 100%

Steps to Reproduce (in detail):
1. create cm/test-client-ca in ns openshift-config and enable mTLS in ingresscontroller/default:
spec:
  clientTLS:
    clientCA:
      name: test-client-ca
    clientCertificatePolicy: Required
2. ensure router pods work well
3. remove the configmap/router-client-ca-crl-default, then update "clientCertificatePolicy" to Optional.
$ oc -n openshift-ingress delete cm/router-client-ca-crl-default
$ oc -n openshift-ingress-operator edit ingresscontroller/default
spec:
  clientTLS:
    clientCA:
      name: test-client-ca
    clientCertificatePolicy: Optional

Actual results:
$ oc -n openshift-ingress get pod
NAME READY STATUS RESTARTS AGE
router-default-6b4bdb6cf4-r9m7r 0/1 ContainerCreating 0 25m
router-default-6b4bdb6cf4-wk92g 0/1 ContainerCreating 0 25m
router-default-78dcc7cbf9-272xp 1/1 Running 0 26m
$ oc -n openshift-ingress describe pod router-default-6b4bdb6cf4-r9m7r
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 16m default-scheduler Successfully assigned openshift-ingress/router-default-6b4bdb6cf4-r9m7r to ip-10-0-216-156.us-east-2.compute.internal
Warning FailedMount 14m kubelet Unable to attach or mount volumes: unmounted volumes=[client-ca-crl], unattached volumes=[service-ca-bundle stats-auth metrics-certs client-ca client-ca-crl kube-api-access-k4r75 default-certificate]: timed out waiting for the condition
Warning FailedMount 11m kubelet Unable to attach or mount volumes: unmounted volumes=[client-ca-crl], unattached volumes=[metrics-certs client-ca client-ca-crl kube-api-access-k4r75 default-certificate service-ca-bundle stats-auth]: timed out waiting for the condition
Warning FailedMount 4m57s (x3 over 9m30s) kubelet Unable to attach or mount volumes: unmounted volumes=[client-ca-crl], unattached volumes=[client-ca client-ca-crl kube-api-access-k4r75 default-certificate service-ca-bundle stats-auth metrics-certs]: timed out waiting for the condition
Warning FailedMount 2m39s kubelet Unable to attach or mount volumes: unmounted volumes=[client-ca-crl], unattached volumes=[kube-api-access-k4r75 default-certificate service-ca-bundle stats-auth metrics-certs client-ca client-ca-crl]: timed out waiting for the condition
Warning FailedMount 104s (x15 over 16m) kubelet MountVolume.SetUp failed for volume "client-ca-crl" : configmap "router-client-ca-crl-default" not found
Warning FailedMount 24s kubelet Unable to attach or mount volumes: unmounted volumes=[client-ca-crl], unattached volumes=[stats-auth metrics-certs client-ca client-ca-crl kube-api-access-k4r75 default-certificate service-ca-bundle]: timed out waiting for the condition

Expected results:
new router pods should be ready.

Impact of the problem:

Additional info:
workaround: restarting ingress operator pod
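
The failure reported here shows up in two places: pods that stay in ContainerCreating, and FailedMount events naming a configmap that was not found. As a rough sketch, both can be filtered out of `oc get pod` and `oc describe pod` output with standard tools. The function names `stuck_pods` and `missing_configmaps` are made up for illustration, and they assume the default column layout and the event message wording quoted in this report.

```shell
# Hypothetical helpers; both read command output on stdin.

# Print names of pods whose STATUS column (3rd field) is ContainerCreating.
stuck_pods() {
  awk 'NR > 1 && $3 == "ContainerCreating" { print $1 }'
}

# Print each configmap named in a 'configmap "..." not found' event message.
missing_configmaps() {
  sed -n 's/.*configmap "\([^"]*\)" not found.*/\1/p' | sort -u
}

# Example use against a cluster (not executed here):
# oc -n openshift-ingress get pod | stuck_pods
# oc -n openshift-ingress describe pod <pod-name> | missing_configmaps
```

If `missing_configmaps` prints a name, the pod is waiting on a volume whose backing configmap no longer exists, which matches the symptom in this report.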