Bug 1989058 - router pod stuck in ContainerCreating after removing configmap/router-client-ca-crl-default and updating spec.clientTLS.clientCertificatePolicy
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.9
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.9.0
Assignee: Miciah Dashiel Butler Masters
QA Contact: jechen
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-08-02 10:14 UTC by Hongan Li
Modified: 2022-08-04 22:32 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-10-18 17:43:46 UTC
Target Upstream Version:
Embargoed:




Links
System                  ID                                           Private  Priority  Status  Summary  Last Updated
Github                  openshift cluster-ingress-operator pull 642  0        None      None    None     2021-08-17 00:47:32 UTC
Red Hat Product Errata  RHSA-2021:3759                               0        None      None    None     2021-10-18 17:43:58 UTC

Description Hongan Li 2021-08-02 10:14:41 UTC
Description of problem:
After removing configmap/router-client-ca-crl-default and updating spec.clientTLS.clientCertificatePolicy, deploy/router-default rolls out, but the new pods are stuck in ContainerCreating status.

OpenShift release version:
4.9.0-0.nightly-2021-08-01-132055

Cluster Platform:
AWS

How reproducible:
100%

Steps to Reproduce (in detail):
1. create cm/test-client-ca in ns openshift-config and enable mTLS in ingresscontroller/default:
spec:
  clientTLS:
    clientCA:
      name: test-client-ca
    clientCertificatePolicy: Required

2. ensure router pods work well

3. remove the configmap/router-client-ca-crl-default, then update "clientCertificatePolicy" to Optional.

$ oc -n openshift-ingress delete cm/router-client-ca-crl-default
$ oc -n openshift-ingress-operator edit ingresscontroller/default
spec:
  clientTLS:
    clientCA:
      name: test-client-ca
    clientCertificatePolicy: Optional
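
The interactive `oc edit` in the steps above can also be expressed as a non-interactive merge patch. The following is a minimal sketch, not the reporter's exact procedure. The helper name `client_tls_patch` is hypothetical, and the `oc` invocations are left as comments because they assume a logged-in session on a disposable cluster.

```shell
#!/bin/sh
# Hypothetical helper: builds the merge-patch JSON for spec.clientTLS.
#   $1 = name of the client CA configmap in openshift-config
#   $2 = clientCertificatePolicy (Required or Optional)
client_tls_patch() {
    printf '{"spec":{"clientTLS":{"clientCA":{"name":"%s"},"clientCertificatePolicy":"%s"}}}\n' "$1" "$2"
}

# Usage against a live cluster (assumes a logged-in oc session):
#   oc -n openshift-ingress-operator patch ingresscontroller/default \
#       --type=merge -p "$(client_tls_patch test-client-ca Required)"
#   oc -n openshift-ingress delete cm/router-client-ca-crl-default
#   oc -n openshift-ingress-operator patch ingresscontroller/default \
#       --type=merge -p "$(client_tls_patch test-client-ca Optional)"
```

Building the patch in one place keeps the two policy changes identical apart from the policy value, which makes the reproduction easier to script.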


Actual results:
$ oc -n openshift-ingress get pod
NAME                              READY   STATUS              RESTARTS   AGE
router-default-6b4bdb6cf4-r9m7r   0/1     ContainerCreating   0          25m
router-default-6b4bdb6cf4-wk92g   0/1     ContainerCreating   0          25m
router-default-78dcc7cbf9-272xp   1/1     Running             0          26m

$ oc -n openshift-ingress describe pod router-default-6b4bdb6cf4-r9m7r
Events:
  Type     Reason       Age                    From               Message
  ----     ------       ----                   ----               -------
  Normal   Scheduled    16m                    default-scheduler  Successfully assigned openshift-ingress/router-default-6b4bdb6cf4-r9m7r to ip-10-0-216-156.us-east-2.compute.internal
  Warning  FailedMount  14m                    kubelet            Unable to attach or mount volumes: unmounted volumes=[client-ca-crl], unattached volumes=[service-ca-bundle stats-auth metrics-certs client-ca client-ca-crl kube-api-access-k4r75 default-certificate]: timed out waiting for the condition
  Warning  FailedMount  11m                    kubelet            Unable to attach or mount volumes: unmounted volumes=[client-ca-crl], unattached volumes=[metrics-certs client-ca client-ca-crl kube-api-access-k4r75 default-certificate service-ca-bundle stats-auth]: timed out waiting for the condition
  Warning  FailedMount  4m57s (x3 over 9m30s)  kubelet            Unable to attach or mount volumes: unmounted volumes=[client-ca-crl], unattached volumes=[client-ca client-ca-crl kube-api-access-k4r75 default-certificate service-ca-bundle stats-auth metrics-certs]: timed out waiting for the condition
  Warning  FailedMount  2m39s                  kubelet            Unable to attach or mount volumes: unmounted volumes=[client-ca-crl], unattached volumes=[kube-api-access-k4r75 default-certificate service-ca-bundle stats-auth metrics-certs client-ca client-ca-crl]: timed out waiting for the condition
  Warning  FailedMount  104s (x15 over 16m)    kubelet            MountVolume.SetUp failed for volume "client-ca-crl" : configmap "router-client-ca-crl-default" not found
  Warning  FailedMount  24s                    kubelet            Unable to attach or mount volumes: unmounted volumes=[client-ca-crl], unattached volumes=[stats-auth metrics-certs client-ca client-ca-crl kube-api-access-k4r75 default-certificate service-ca-bundle]: timed out waiting for the condition
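
The event that identifies the root cause is the `MountVolume.SetUp failed ... configmap "..." not found` message; the repeated "timed out waiting for the condition" lines only say that the mount never completed. As a rough sketch, the missing configmap names can be pulled out of `oc describe pod` output with standard text tools. The function name `missing_configmaps` is hypothetical.

```shell
#!/bin/sh
# Reads `oc describe pod` output on stdin and prints each configmap that
# the kubelet reported as missing, one name per line, without duplicates.
missing_configmaps() {
    grep -o 'configmap "[^"]*" not found' | cut -d'"' -f2 | sort -u
}

# Usage against a live cluster (assumes a logged-in oc session):
#   oc -n openshift-ingress describe pod router-default-6b4bdb6cf4-r9m7r | missing_configmaps
```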


Expected results:
The new router pods should become ready.

Impact of the problem:


Additional info:
Workaround: restart the ingress operator pod.
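
The report does not say which command was used for the workaround. One possible way to restart the operator pod is sketched below, assuming the default `openshift-ingress-operator` namespace and the `ingress-operator` deployment name; the wrapper function `restart_ingress_operator` is hypothetical.

```shell
#!/bin/sh
# Restarts the ingress operator by rolling its deployment, then waits for
# the replacement pod to become available.
# Assumes a logged-in oc session and the default operator deployment name.
restart_ingress_operator() {
    oc -n openshift-ingress-operator rollout restart deployment/ingress-operator
    oc -n openshift-ingress-operator rollout status deployment/ingress-operator
}

# Usage:
#   restart_ingress_operator
```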



Comment 1 jechen 2021-08-25 15:05:07 UTC
Verified using pre-PR merge verification

Used cluster-bot: launch 4.9-ci,openshift/cluster-ingress-operator#642

$ oc get clusterversion
NAME      VERSION                                                  AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.9.0-0.ci.test-2021-08-25-130000-ci-ln-wmxtwr2-latest   True        False         3m42s   Cluster version is 4.9.0-0.ci.test-2021-08-25-130000-ci-ln-wmxtwr2-latest


#1. create cm/test-client-ca in ns openshift-config and enable mTLS in ingresscontroller/default, new router pods are created
$ oc create configmap test-client-ca --from-file=./openshift-tests-private/test/extended/testdata/router/ca-bundle.pem  -n openshift-config
configmap/test-client-ca created

$ oc -n openshift-ingress-operator edit ingresscontroller/default
spec:
  clientTLS:
    clientCA:
      name: test-client-ca
    clientCertificatePolicy: Required

$ oc -n openshift-ingress get cm
NAME                       DATA   AGE
kube-root-ca.crt           1      80m
openshift-service-ca.crt   1      80m
router-client-ca-default   1      2m17s
service-ca-bundle          1      80m


$ oc -n openshift-ingress get pod
NAME                              READY   STATUS        RESTARTS   AGE
router-default-6cbfb5f886-gzjfl   1/1     Running       0          72s
router-default-6cbfb5f886-thdvc   1/1     Running       0          72s
router-default-7774747b4f-87nwf   1/1     Terminating   0          71m
router-default-7774747b4f-brn8f   1/1     Terminating   0          71m

$ oc -n openshift-ingress get pod
NAME                              READY   STATUS    RESTARTS   AGE
router-default-6cbfb5f886-gzjfl   1/1     Running   0          8m54s
router-default-6cbfb5f886-thdvc   1/1     Running   0          8m54s


#2. remove the configmap/router-client-ca-default, then update "clientCertificatePolicy" to Optional; new router pods are created and they are up and running
$ oc -n openshift-ingress delete cm/router-client-ca-default
configmap "router-client-ca-default" deleted


$ oc -n openshift-ingress-operator edit ingresscontroller/default
spec:
  clientTLS:
    clientCA:
      name: test-client-ca
    clientCertificatePolicy: Optional



$ oc -n openshift-ingress get pod
NAME                              READY   STATUS        RESTARTS   AGE
router-default-59b665cc4f-xf67g   1/1     Running       0          40s
router-default-59b665cc4f-zqqkl   1/1     Running       0          40s
router-default-6cbfb5f886-gzjfl   1/1     Terminating   0          11m
router-default-6cbfb5f886-thdvc   1/1     Terminating   0          11m


$ oc -n openshift-ingress get pod
NAME                              READY   STATUS    RESTARTS   AGE
router-default-59b665cc4f-xf67g   1/1     Running   0          87s
router-default-59b665cc4f-zqqkl   1/1     Running   0          87s


$ oc -n openshift-ingress describe  pod router-default-59b665cc4f-xf67g
Events:
  Type    Reason          Age    From               Message
  ----    ------          ----   ----               -------
  Normal  Scheduled       2m41s  default-scheduler  Successfully assigned openshift-ingress/router-default-59b665cc4f-xf67g to ci-ln-wmxtwr2-f76d1-w4d97-worker-b-47lx8
  Normal  AddedInterface  2m39s  multus             Add eth0 [10.131.0.46/23] from openshift-sdn
  Normal  Pulled          2m38s  kubelet            Container image "registry.build01.ci.openshift.org/ci-ln-wmxtwr2/stable@sha256:90785597c84ba9f9b5ec44175eb435ea31357e9f77c5f411c1766eb18d4b7d5b" already present on machine
  Normal  Created         2m38s  kubelet            Created container router
  Normal  Started         2m38s  kubelet            Started container router
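
The pod listings above were checked by eye. For repeated verification runs, a small filter over `oc get pod` output can flag any pod that is not fully ready and Running. This is only a sketch; the function name `unready_pods` is hypothetical, and pods in Terminating status are reported too, so it is best run after the rollout has settled.

```shell
#!/bin/sh
# Reads `oc get pod` table output on stdin and prints the name of every pod
# whose READY column is not n/n or whose STATUS is not Running.
# Exit status is 0 if no such pod was found, 1 otherwise.
unready_pods() {
    awk 'NR > 1 && NF >= 3 {
        split($2, ready, "/")
        if (ready[1] != ready[2] || $3 != "Running") { print $1; bad = 1 }
    }
    END { exit bad + 0 }'
}

# Usage against a live cluster (assumes a logged-in oc session):
#   oc -n openshift-ingress get pod | unready_pods
```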

Comment 4 jechen 2021-08-27 13:36:57 UTC
Completed fastfix verification (pre-PR merge verification); changing the status to "verified" now.

Comment 7 errata-xmlrpc 2021-10-18 17:43:46 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759

