Bug 2106264 - Static pods stuck in 'CreateContainerError' after SDN migration then rollback on Ali Cloud
Summary: Static pods stuck in 'CreateContainerError' after SDN migration then rollback...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 4.12
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.12.z
Assignee: Sascha Grunert
QA Contact: Sunil Choudhary
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2022-07-12 08:56 UTC by Peng Liu
Modified: 2023-01-30 17:31 UTC
CC List: 5 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-01-30 17:31:32 UTC
Target Upstream Version:
Embargoed:
pliu: needinfo-
pliu: needinfo-


Attachments
kubelet log (12.35 MB, text/plain)
2022-07-12 08:56 UTC, Peng Liu


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2023:0449 0 None None None 2023-01-30 17:31:34 UTC

Description Peng Liu 2022-07-12 08:56:17 UTC
Created attachment 1896284 [details]
kubelet log

Description of problem:

After doing SDN migration and rollback on a cluster on AliCloud, the static pod kube-controller-manager cannot become ready. The pods stuck in 'CreateContainerError' with error 'error reserving ctr name k8s_cluster-policy-controller_kube-controller-manager-zzhao-alisdn3-pzk57-master-2_openshift-kube-controller-manager_6b7005530f1d4a02799194a880c80087_3 for id 23b521bccdd316f1a6ab720ee9694c626a7f8c7814fc336e8439af3b37881c93: name is reserved'
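
For reference, a minimal sketch of how this error can be located on an affected node (the grep patterns are illustrative, not exact log strings):

sh-4.4# journalctl -u kubelet --no-pager | grep -i 'name is reserved'     # kubelet side of the failure
sh-4.4# journalctl -u crio --no-pager | grep -i 'reserving ctr name'      # CRI-O side of the name reservation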

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Deploy a cluster on alicloud using openshift-sdn
2. Migrate the network provider to ovn-kubernetes. https://docs.openshift.com/container-platform/4.10/networking/ovn_kubernetes_network_provider/migrate-from-openshift-sdn.html
3. Roll back the network provider to openshift-sdn after the migration completes successfully.
https://docs.openshift.com/container-platform/4.10/networking/ovn_kubernetes_network_provider/rollback-to-openshift-sdn.html

Actual results:
Some static pods cannot become ready. The affected pods can be kube-controller-manager or kube-apiserver.

Expected results:
All pods continue to work after the rollback.


Additional info:
The symptom looks similar to https://bugzilla.redhat.com/show_bug.cgi?id=1785399 
This issue doesn't happen on other platforms, such as AWS, GCP, bare metal, IBM Cloud, etc.

Comment 3 Peng Liu 2022-07-12 09:03:14 UTC
Executing 'crictl rm -r -a' on the node can recover the pod. This is also a workaround mentioned in https://bugzilla.redhat.com/show_bug.cgi?id=1785399.
I also found that the container state reported by kubectl is not synced with the actual container state on the node. The container was up and running on the node, but the container log from kubectl only contains the previous container failure.
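
For reference, a minimal sketch of the workaround from a cluster-admin shell; the node name is a placeholder, and `crictl rm -fa` (force-remove all containers) is the form used later in this bug. Note that this removes every container on the node and lets the kubelet recreate them:

$ oc debug node/<affected-master> -- chroot /host crictl ps -a --name cluster-policy-controller   # confirm the stale duplicate
$ oc debug node/<affected-master> -- chroot /host crictl rm -fa                                    # force-remove all containers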

Comment 4 Ryan Phillips 2022-07-26 18:28:43 UTC
CRI-O needs a reboot to detect the new networking setup. Is there something that reboots the nodes when there is a network change?

Comment 5 Peng Liu 2022-07-27 02:20:17 UTC
@rphillips Yes, there are two reboots during the rollback: one is triggered manually, and MCO triggers the other. Also, after seeing this problem, I tried rebooting the node with the malfunctioning pods again, but it didn't help. The pods still could not become ready after the reboot.

BTW, the kube-controller-manager and kube-apiserver pods use hostNetwork instead of CNI, so they should not be directly affected by the SDN migration, which only swaps the CNI of the cluster.

Comment 6 Sascha Grunert 2022-07-27 07:32:56 UTC
Hey Peng, I'm looking into this issue along with my team members.

(In reply to Peng Liu from comment #3)
> I also found that the container state reported by kubectl is not synced with the actual container state on the node. The container was up and running on the node, but the container log from kubectl only contains the previous container failure.

I assume that up and running means that `crictl inspect $ID | jq .status.state` reports "CONTAINER_RUNNING", right?

If that's the case, would it be possible to get the kubelet logs with increased verbosity `-v=10` as well as one of the affected container names?
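
For reference, a sketch of how that data could be collected on the affected node (the container ID is a placeholder):

sh-4.4# crictl ps -a --name cluster-policy-controller                      # list the duplicate containers and grab an ID
sh-4.4# crictl inspect <container-id> | jq .status.state                   # expect "CONTAINER_RUNNING" for the healthy copy
sh-4.4# journalctl -u kubelet --since "1 hour ago" --no-pager > /tmp/kubelet-v10.log   # kubelet logs after raising verbosity to -v=10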

Comment 7 Peng Liu 2022-07-27 11:25:54 UTC
I've prepared a cluster for online debugging.

Comment 8 Sascha Grunert 2022-07-27 12:45:49 UTC
The test setup mentioned by Peng reveals the issue: we have two cluster-policy-controller containers, where one is up and running and the other one blocks the name that the kubelet is trying to reuse for a new container:

> 47b97d9dd4801       e1bdd290e0d56520dc3f8a8af8fcdee28b1d1c99214b9ee3cb63c8faa256c4c1   Less than a second ago   Exited              cluster-policy-controller                     2                   621ba87014777       kube-controller-manager-pliu-alicloud-qp45q-master-0
> 293d816f6735e       e1bdd290e0d56520dc3f8a8af8fcdee28b1d1c99214b9ee3cb63c8faa256c4c1   48 minutes ago           Running             cluster-policy-controller                     3                   621ba87014777       kube-controller-manager-pliu-alicloud-qp45q-master-0

I'd expect the kubelet to clean up running containers before trying to create new ones, so this may be the bug here.

Peng, can you please try to reproduce the issue with the latest 4.11 and tell me if it works there?

Comment 9 Peng Liu 2022-07-28 10:46:48 UTC
I've reproduced this issue with 4.11.0-0.ci-2022-07-27-174640.

Comment 11 Sascha Grunert 2022-07-28 13:01:26 UTC
Removing the blocker and reducing the urgency since we have the `crictl rm -fa` workaround.

Comment 13 Sascha Grunert 2022-07-29 09:22:45 UTC
I had to build a new version of the patch:

- https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=46827743
- http://brew-task-repos.usersys.redhat.com/repos/scratch/sgrunert/cri-o/1.25.0/14.rhaos4.12.gitdf67c83.el8/

Peng, can you please try it again and send me the CRI-O and kubelet debug logs?
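
A sketch of one way to test the scratch build on a single node, assuming the RPM from the repo above has been copied to the host (the file name is a placeholder, and rpm-ostree only applies the override after a reboot):

sh-4.4# rpm-ostree override replace ./cri-o-*.rpm    # RPM downloaded from the brew-task-repos link above
sh-4.4# systemctl reboot                             # the override only takes effect after a reboot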

Comment 14 Peng Liu 2022-07-29 14:22:23 UTC
The new build can remove the stale container from the node. However, the container still does not become ready according to the API server.

NAME                                                READY   STATUS      RESTARTS   AGE     IP             NODE                           NOMINATED NODE   READINESS GATES
apiserver-watcher-pliu-alicloud-bxctb-master-0      1/1     Running     2          8h      10.0.101.108   pliu-alicloud-bxctb-master-0   <none>           <none>
apiserver-watcher-pliu-alicloud-bxctb-master-1      1/1     Running     3          8h      10.0.157.160   pliu-alicloud-bxctb-master-1   <none>           <none>
apiserver-watcher-pliu-alicloud-bxctb-master-2      1/1     Running     3          8h      10.0.101.109   pliu-alicloud-bxctb-master-2   <none>           <none>
kube-apiserver-guard-pliu-alicloud-bxctb-master-0   1/1     Running     0          7h8m    10.130.0.13    pliu-alicloud-bxctb-master-0   <none>           <none>
kube-apiserver-guard-pliu-alicloud-bxctb-master-1   1/1     Running     1          7h15m   10.128.0.23    pliu-alicloud-bxctb-master-1   <none>           <none>
kube-apiserver-guard-pliu-alicloud-bxctb-master-2   1/1     Running     1          7h12m   10.129.0.9     pliu-alicloud-bxctb-master-2   <none>           <none>
kube-apiserver-pliu-alicloud-bxctb-master-0         5/5     Running     14         8h      10.0.101.108   pliu-alicloud-bxctb-master-0   <none>           <none>
kube-apiserver-pliu-alicloud-bxctb-master-1         4/5     Running     16         8h      10.0.157.160   pliu-alicloud-bxctb-master-1   <none>           <none>
kube-apiserver-pliu-alicloud-bxctb-master-2         4/5     Running     16         8h      10.0.101.109   pliu-alicloud-bxctb-master-2   <none>           <none>
revision-pruner-7-pliu-alicloud-bxctb-master-0      0/1     Completed   0          7h11m   10.130.0.44    pliu-alicloud-bxctb-master-0   <none>           <none>
revision-pruner-7-pliu-alicloud-bxctb-master-1      0/1     Completed   0          7h17m   10.128.0.5     pliu-alicloud-bxctb-master-1   <none>           <none>
revision-pruner-7-pliu-alicloud-bxctb-master-2      0/1     Completed   0          7h14m   10.129.0.40    pliu-alicloud-bxctb-master-2   <none>           <none>


Here is the kubelet log https://paste.c-net.org/PilotSocial and the CRI-O log https://paste.c-net.org/MuldoonPolitely

Comment 15 Sascha Grunert 2022-08-01 07:52:11 UTC
Thank you Peng,

I would need the full CRI-O logs for the reproducer; unfortunately, it looks like we only captured one minute of them.

Another question: Does a restart of the kubelet resolve the issue?


Ryan, may I ask you to assist me with the kubelet part here? The facts right now are:

- There is a container up and running (healthy) in the API server POD after the SDN migration (this is visible via crictl for example)
- The kubelet still tries to create the container and fails because CRI-O reports "name is reserved" (see the logs above)

The kubelet caches the results, right? So I'm wondering whether a restart of the kubelet helps to recover from the failing state. It feels like a bug in the kubelet, and my CRI-O workaround (https://github.com/cri-o/cri-o/pull/6097) tries to fix it from the opposite direction.
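
For the kubelet-restart question, a minimal sketch of what could be tried on the affected node (purely a test of the cached-state theory, not a fix):

sh-4.4# systemctl restart kubelet
sh-4.4# journalctl -u kubelet -f | grep -i 'name is reserved'   # check whether the error keeps recurring after the restart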

Comment 16 Peng Liu 2022-08-01 08:41:05 UTC
@sgrunert The CRI-O log is huge after turning on the debug level. Ping me on Slack if you need an environment for debugging.

Comment 17 Peng Liu 2022-08-01 14:11:08 UTC
We have some new findings while debugging this issue. It can be reproduced without doing any SDN migration at all: a MachineConfig update can also trigger it, for instance turning on CRI-O debug logging following https://access.redhat.com/solutions/5133191. We hit this issue during SDN migration because the migration procedure also includes a MachineConfig update.
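
For reference, a sketch of how CRI-O debug logging can be turned on per the linked solution, via a ContainerRuntimeConfig (field names as I recall them; the linked article is the authoritative reference). Applying it causes exactly the kind of MachineConfig-driven rollout that triggers this bug:

$ cat <<'EOF' | oc apply -f -
apiVersion: machineconfiguration.openshift.io/v1
kind: ContainerRuntimeConfig
metadata:
  name: crio-debug-loglevel
spec:
  machineConfigPoolSelector:
    matchLabels:
      pools.operator.machineconfiguration.openshift.io/master: ""
  containerRuntimeConfig:
    logLevel: debug
EOF
$ oc get mcp master -w   # watch the masters roll out the change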

I set up the reproduction environment on Ali Cloud with QE's CI job https://mastern-jenkins-csb-openshift-qe.apps.ocp-c1.prod.psi.redhat.com/job/ocp-common/job/Flexy-install. The test environment only lasts for 12 hours.

Comment 18 Sascha Grunert 2022-08-01 14:32:36 UTC
Thanks Peng, this is the summary of the current state:

We already have a kube-apiserver container up and running in the pod while the kubelet tries to create one with the same name, resulting in the "name is reserved" error:

sh-4.4# crictl ps -a --name kube-apiserver
CONTAINER           IMAGE                                                              CREATED                  STATE               NAME                                          ATTEMPT             POD ID              POD
6305d725a6bd3       9fa866b8c15bf5e536504da71b706caf1dc0c926ed21991f69425f6e41938ba1   Less than a second ago   Exited              kube-apiserver                                3                   90943b3fa5860       kube-apiserver-pliu-alicloud-fblsj-master-0
57beb7420744b       9fa866b8c15bf5e536504da71b706caf1dc0c926ed21991f69425f6e41938ba1   11 minutes ago           Running             kube-apiserver                                4                   90943b3fa5860       kube-apiserver-pliu-alicloud-fblsj-master-0

The workaround does not work as expected, so I have to find out why. Ryan, can you double-check the kubelet logs to see if we can find the root cause in the kubelet sync loop?

All logs can be found here:
https://drive.google.com/drive/folders/1C18N-k_vsP6CMagxAYs20WSkdltKwxF6?usp=sharing

Comment 19 Sascha Grunert 2022-08-02 10:44:51 UTC
Here is a new run without any cluster modification other than applying the MachineConfig 99-master-kubelet-loglevel:

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: master
  name: 99-master-kubelet-loglevel
spec:
  config:
    ignition:
      version: 3.2.0
    systemd:
      units:
        - dropins:
            - contents: |
                [Service]
                Environment="KUBELET_LOG_LEVEL=10"
              name: 30-logging.conf
          enabled: true
          name: kubelet.service

The crictl output shows the problem:

CONTAINER           IMAGE                                                              CREATED                  STATE               NAME                                          ATTEMPT             POD ID              POD
f3a6fa482cd0b       0bc42fd43ea720db6380802cc39ce72f392e78bc0eb18afc0fa261e2ae2c8e55   Less than a second ago   Running             kube-apiserver-cert-regeneration-controller   1                   a8cc75ec372da       kube-apiserver-pliu-alicloud-2n9bd-master-0
e287689759221       0bc42fd43ea720db6380802cc39ce72f392e78bc0eb18afc0fa261e2ae2c8e55   Less than a second ago   Running             kube-apiserver-cert-syncer                    1                   a8cc75ec372da       kube-apiserver-pliu-alicloud-2n9bd-master-0
4065e776eb308       9fa866b8c15bf5e536504da71b706caf1dc0c926ed21991f69425f6e41938ba1   Less than a second ago   Exited              kube-apiserver                                1                   a8cc75ec372da       kube-apiserver-pliu-alicloud-2n9bd-master-0
ad56b755641d9       9fa866b8c15bf5e536504da71b706caf1dc0c926ed21991f69425f6e41938ba1   40 minutes ago           Running             kube-apiserver                                2                   a8cc75ec372da       kube-apiserver-pliu-alicloud-2n9bd-master-0
60b30abe43fda       0bc42fd43ea720db6380802cc39ce72f392e78bc0eb18afc0fa261e2ae2c8e55   40 minutes ago           Running             kube-apiserver-check-endpoints                1                   a8cc75ec372da       kube-apiserver-pliu-alicloud-2n9bd-master-0
d45d4d8dd6b32       0bc42fd43ea720db6380802cc39ce72f392e78bc0eb18afc0fa261e2ae2c8e55   40 minutes ago           Running             kube-apiserver-insecure-readyz                1                   a8cc75ec372da       kube-apiserver-pliu-alicloud-2n9bd-master-0

kubelet and CRI-O logs: https://drive.google.com/drive/folders/1rSvUMFP3bpROI9MVa1i19cUckKoGNwxU?usp=sharing

Comment 20 Sascha Grunert 2022-08-02 14:38:11 UTC
Found out that the revert in https://github.com/cri-o/cri-o/pull/6111 fixes the problem. We're now discussing how to proceed.

Comment 21 Sascha Grunert 2022-08-25 13:15:50 UTC
We're now working on https://github.com/cri-o/cri-o/pull/6123 to solve the problem.

Comment 22 Peng Liu 2022-09-20 01:57:12 UTC
I see the fix has been merged upstream. May I ask when we can have it in OCP?

Comment 23 Sascha Grunert 2022-09-20 07:04:40 UTC
(In reply to Peng Liu from comment #22)
> I see the fix has been merged upstream. May I ask when we can have it in OCP?

The merge into main will land in 1.26 / OCP 4.13; we can backport it to 4.12 once it's verified by QA.

Comment 24 Weibin Liang 2022-09-20 14:00:11 UTC
@sgrunert Right now we do not have v4.13 available in https://amd64.ocp.releases.ci.openshift.org/, so QE cannot do any testing. Moving the status back to ASSIGNED.

Comment 25 Sascha Grunert 2022-09-21 07:17:08 UTC
Alright, I'm opening a cherry-pick for 4.12 to be able to verify it: https://github.com/cri-o/cri-o/pull/6241

Comment 26 Sascha Grunert 2022-09-21 07:25:00 UTC
The PR has already merged; we can verify once CI has CRI-O commit 76292062.

https://github.com/cri-o/cri-o/commit/76292062dbcd6fc77569fcec45487551fa40d844
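
A sketch of how the commit level could be checked on a node once a nightly picks it up (the GitCommit on the release branch may differ from the main-branch commit linked above):

$ oc debug node/<master> -- chroot /host crio version   # look at the GitCommit field of the installed CRI-O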

Comment 27 Peng Liu 2022-11-07 07:40:05 UTC
@weliang The patch has been merged. Could you help to verify it?

Comment 28 Weibin Liang 2022-11-07 14:23:11 UTC
@pliu 

Two Alicloud cluster installation bugs blocked our verification.
https://issues.redhat.com/browse/OCPBUGS-2248
https://issues.redhat.com/browse/OCPBUGS-2388

Comment 29 Peng Liu 2022-11-08 01:27:26 UTC
Ok, let's wait for the installer fix.

Comment 31 Weibin Liang 2023-01-05 19:24:29 UTC
Verification failed on 4.13.0-0.nightly-2023-01-01-223309

[weliang@weliang openshift-tests-private]$ oc get all -n openshift-kube-apiserver
NAME                                                    READY   STATUS              RESTARTS        AGE
pod/apiserver-watcher-weliang-01053-rx8tq-master-0      1/1     Running             4               4h39m
pod/apiserver-watcher-weliang-01053-rx8tq-master-1      1/1     Running             4               4h37m
pod/apiserver-watcher-weliang-01053-rx8tq-master-2      1/1     Running             4               4h37m
pod/kube-apiserver-guard-weliang-01053-rx8tq-master-0   1/1     Running             0               134m
pod/kube-apiserver-guard-weliang-01053-rx8tq-master-1   1/1     Running             0               126m
pod/kube-apiserver-guard-weliang-01053-rx8tq-master-2   1/1     Running             0               130m
pod/kube-apiserver-weliang-01053-rx8tq-master-0         4/5     RunContainerError   25 (134m ago)   4h11m
pod/kube-apiserver-weliang-01053-rx8tq-master-1         4/5     Running             22              4h5m
pod/kube-apiserver-weliang-01053-rx8tq-master-2         4/5     RunContainerError   25 (130m ago)   4h8m
pod/revision-pruner-7-weliang-01053-rx8tq-master-0      0/1     Completed           0               135m
pod/revision-pruner-7-weliang-01053-rx8tq-master-1      0/1     Completed           0               130m
pod/revision-pruner-7-weliang-01053-rx8tq-master-2      0/1     Completed           0               132m

NAME                TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE
service/apiserver   ClusterIP   172.30.71.194   <none>        443/TCP   4h37m
[weliang@weliang openshift-tests-private]$ oc describe pod/kube-apiserver-weliang-01053-rx8tq-master-0 -n openshift-kube-apiserver
Name:                 kube-apiserver-weliang-01053-rx8tq-master-0
Namespace:            openshift-kube-apiserver
Priority:             2000001000
Priority Class Name:  system-node-critical
Node:                 weliang-01053-rx8tq-master-0/10.0.99.103
Start Time:           Thu, 05 Jan 2023 09:43:08 -0500
Labels:               apiserver=true
                      app=openshift-kube-apiserver
                      revision=7
Annotations:          kubectl.kubernetes.io/default-container: kube-apiserver
                      kubernetes.io/config.hash: 0fec70f8250dabd2139268425b6896e7
                      kubernetes.io/config.mirror: 0fec70f8250dabd2139268425b6896e7
                      kubernetes.io/config.seen: 2023-01-05T15:05:29.477569711Z
                      kubernetes.io/config.source: file
                      target.workload.openshift.io/management: {"effect": "PreferredDuringScheduling"}
Status:               Running
IP:                   10.0.99.103
IPs:
  IP:           10.0.99.103
Controlled By:  Node/weliang-01053-rx8tq-master-0
Init Containers:
  setup:
    Container ID:  cri-o://b99295901e4c00b71a9badb162fc7d10d77f47673eeecf14afe369286c8152c7
    Image:         quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:cb3c465bdfa0ebd38bdb74cd8b80f131dfaf503a4f8c120b7fcb4440eccd6a70
    Image ID:      quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:cb3c465bdfa0ebd38bdb74cd8b80f131dfaf503a4f8c120b7fcb4440eccd6a70
    Port:          <none>
    Host Port:     <none>
    Command:
      /usr/bin/timeout
      220
      /bin/bash
      -ec
    Args:
      echo "Fixing audit permissions ..."
      chmod 0700 /var/log/kube-apiserver && touch /var/log/kube-apiserver/audit.log && chmod 0600 /var/log/kube-apiserver/*
      
      LOCK=/var/log/kube-apiserver/.lock
      echo "Acquiring exclusive lock ${LOCK} ..."
      
      # Waiting for 135s max for old kube-apiserver's watch-termination process to exit and remove the lock.
      # Two cases:
      # 1. if kubelet does not start the old and new in parallel (i.e. works as expected), the flock will always succeed without any time.
      # 2. if kubelet does overlap old and new pods for up to 130s, the flock will wait and immediate return when the old finishes.
      #
      # NOTE: We can increase 135s for a bigger expected overlap. But a higher value means less noise about the broken kubelet behaviour, i.e. we hide a bug.
      # NOTE: Do not tweak these timings without considering the livenessProbe initialDelaySeconds
      exec {LOCK_FD}>${LOCK} && flock --verbose -w 135 "${LOCK_FD}" || {
        echo "$(date -Iseconds -u) kubelet did not terminate old kube-apiserver before new one" >> /var/log/kube-apiserver/lock.log
        echo -n ": WARNING: kubelet did not terminate old kube-apiserver before new one."
      
        # We failed to acquire exclusive lock, which means there is old kube-apiserver running in system.
        # Since we utilize SO_REUSEPORT, we need to make sure the old kube-apiserver stopped listening.
        #
        # NOTE: This is a fallback for broken kubelet, if you observe this please report a bug.
        echo -n "Waiting for port 6443 to be released due to likely bug in kubelet or CRI-O "
        while [ -n "$(ss -Htan state listening '( sport = 6443 or sport = 6080 )')" ]; do
          echo -n "."
          sleep 1
          (( tries += 1 ))
          if [[ "${tries}" -gt 10 ]]; then
            echo "Timed out waiting for port :6443 and :6080 to be released, this is likely a bug in kubelet or CRI-O"
            exit 1
          fi
        done
        #  This is to make sure the server has terminated independently from the lock.
        #  After the port has been freed (requests can be pending and need 60s max).
        sleep 65
      }
      # We cannot hold the lock from the init container to the main container. We release it here. There is no risk, at this point we know we are safe.
      flock -u "${LOCK_FD}"
      
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Thu, 05 Jan 2023 20:03:19 -0500
      Finished:     Thu, 05 Jan 2023 20:03:19 -0500
    Ready:          True
    Restart Count:  4
    Requests:
      cpu:        5m
      memory:     50Mi
    Environment:  <none>
    Mounts:
      /var/log/kube-apiserver from audit-dir (rw)
Containers:
  kube-apiserver:
    Container ID:  cri-o://2f29f674782ee2f975537fb06e6ba732d4829f1276509d7badd547158edc470a
    Image:         quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:cb3c465bdfa0ebd38bdb74cd8b80f131dfaf503a4f8c120b7fcb4440eccd6a70
    Image ID:      quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:cb3c465bdfa0ebd38bdb74cd8b80f131dfaf503a4f8c120b7fcb4440eccd6a70
    Port:          6443/TCP
    Host Port:     6443/TCP
    Command:
      /bin/bash
      -ec
    Args:
      LOCK=/var/log/kube-apiserver/.lock
      # We should be able to acquire the lock immediatelly. If not, it means the init container has not released it yet and kubelet or CRI-O started container prematurely.
      exec {LOCK_FD}>${LOCK} && flock --verbose -w 30 "${LOCK_FD}" || {
        echo "Failed to acquire lock for kube-apiserver. Please check setup container for details. This is likely kubelet or CRI-O bug."
        exit 1
      }
      if [ -f /etc/kubernetes/static-pod-certs/configmaps/trusted-ca-bundle/ca-bundle.crt ]; then
        echo "Copying system trust bundle ..."
        cp -f /etc/kubernetes/static-pod-certs/configmaps/trusted-ca-bundle/ca-bundle.crt /etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem
      fi
      
      exec watch-termination --termination-touch-file=/var/log/kube-apiserver/.terminating --termination-log-file=/var/log/kube-apiserver/termination.log --graceful-termination-duration=135s --kubeconfig=/etc/kubernetes/static-pod-resources/configmaps/kube-apiserver-cert-syncer-kubeconfig/kubeconfig -- hyperkube kube-apiserver --openshift-config=/etc/kubernetes/static-pod-resources/configmaps/config/config.yaml --advertise-address=${HOST_IP}  -v=2 --permit-address-sharing
      
    State:       Waiting
      Reason:    RunContainerError
    Last State:  Terminated
      Reason:    Error
      Message:   ng.go:106] unable to get PriorityClass system-node-critical: Get "https://[::1]:6443/apis/scheduling.k8s.io/v1/priorityclasses/system-node-critical": x509: certificate has expired or is not yet valid: current time 2023-01-05T17:03:56Z is before 2023-01-06T00:03:21Z. Retrying...
E0105 17:03:56.199404      14 storage_rbac.go:187] unable to initialize clusterroles: Get "https://[::1]:6443/apis/rbac.authorization.k8s.io/v1/clusterroles": x509: certificate has expired or is not yet valid: current time 2023-01-05T17:03:56Z is before 2023-01-06T00:03:21Z
W0105 17:03:56.199456      14 storage_scheduling.go:106] unable to get PriorityClass system-node-critical: Get "https://[::1]:6443/apis/scheduling.k8s.io/v1/priorityclasses/system-node-critical": x509: certificate has expired or is not yet valid: current time 2023-01-05T17:03:56Z is before 2023-01-06T00:03:21Z. Retrying...
F0105 17:03:56.199610      14 hooks.go:203] PostStartHook "scheduling/bootstrap-system-priority-classes" failed: unable to add default system priority classes: timed out waiting for the condition
E0105 17:03:56.327343      14 sdn_readyz_wait.go:107] Get "https://[::1]:6443/api/v1/namespaces/openshift-oauth-apiserver/endpoints/api": x509: certificate has expired or is not yet valid: current time 2023-01-05T17:03:56Z is before 2023-01-06T00:03:21Z
E0105 17:03:56.327629      14 sdn_readyz_wait.go:107] Get "https://[::1]:6443/api/v1/namespaces/openshift-apiserver/endpoints/api": x509: certificate has expired or is not yet valid: current time 2023-01-05T17:03:56Z is before 2023-01-06T00:03:21Z
E0105 17:03:56.327679      14 storage_rbac.go:187] unable to initialize clusterroles: Get "https://[::1]:6443/apis/rbac.authorization.k8s.io/v1/clusterroles": x509: certificate has expired or is not yet valid: current time 2023-01-05T17:03:56Z is before 2023-01-06T00:03:21Z
I0105 17:03:56.493315       1 main.go:235] Termination finished with exit code 255
I0105 17:03:56.493417       1 main.go:188] Deleting termination lock file "/var/log/kube-apiserver/.terminating"

      Exit Code:    255
      Started:      Thu, 05 Jan 2023 20:03:20 -0500
      Finished:     Thu, 05 Jan 2023 12:03:56 -0500
    Ready:          False
    Restart Count:  4
    Requests:
      cpu:      265m
      memory:   1Gi
    Liveness:   http-get https://:6443/livez delay=45s timeout=10s period=10s #success=1 #failure=3
    Readiness:  http-get https://:6443/readyz delay=10s timeout=10s period=10s #success=1 #failure=3
    Environment:
      POD_NAME:            kube-apiserver-weliang-01053-rx8tq-master-0 (v1:metadata.name)
      POD_NAMESPACE:       openshift-kube-apiserver (v1:metadata.namespace)
      STATIC_POD_VERSION:  7
      HOST_IP:              (v1:status.hostIP)
      GOGC:                100
    Mounts:
      /etc/kubernetes/static-pod-certs from cert-dir (rw)
      /etc/kubernetes/static-pod-resources from resource-dir (rw)
      /var/log/kube-apiserver from audit-dir (rw)
  kube-apiserver-cert-syncer:
    Container ID:  cri-o://46d2a285aa4af70ae2f86c39257a125cf9f2e34d000b387a9fe5eb52c4bda0af
    Image:         quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7f4d6dde7aebc339e6fc537f11c8ce94ca0d5a43d01ef18aaa21c3b957819afc
    Image ID:      quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7f4d6dde7aebc339e6fc537f11c8ce94ca0d5a43d01ef18aaa21c3b957819afc
    Port:          <none>
    Host Port:     <none>
    Command:
      cluster-kube-apiserver-operator
      cert-syncer
    Args:
      --kubeconfig=/etc/kubernetes/static-pod-resources/configmaps/kube-apiserver-cert-syncer-kubeconfig/kubeconfig
      --namespace=$(POD_NAMESPACE)
      --destination-dir=/etc/kubernetes/static-pod-certs
    State:          Running
      Started:      Thu, 05 Jan 2023 20:03:21 -0500
    Ready:          True
    Restart Count:  4
    Requests:
      cpu:     5m
      memory:  50Mi
    Environment:
      POD_NAME:       kube-apiserver-weliang-01053-rx8tq-master-0 (v1:metadata.name)
      POD_NAMESPACE:  openshift-kube-apiserver (v1:metadata.namespace)
    Mounts:
      /etc/kubernetes/static-pod-certs from cert-dir (rw)
      /etc/kubernetes/static-pod-resources from resource-dir (rw)
  kube-apiserver-cert-regeneration-controller:
    Container ID:  cri-o://0d6f91bda2f5cccedd0a27e7dcb39819f7f6e96b76bc365cf71747e0aa3e987f
    Image:         quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7f4d6dde7aebc339e6fc537f11c8ce94ca0d5a43d01ef18aaa21c3b957819afc
    Image ID:      quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7f4d6dde7aebc339e6fc537f11c8ce94ca0d5a43d01ef18aaa21c3b957819afc
    Port:          <none>
    Host Port:     <none>
    Command:
      cluster-kube-apiserver-operator
      cert-regeneration-controller
    Args:
      --kubeconfig=/etc/kubernetes/static-pod-resources/configmaps/kube-apiserver-cert-syncer-kubeconfig/kubeconfig
      --namespace=$(POD_NAMESPACE)
      -v=2
    State:          Running
      Started:      Thu, 05 Jan 2023 12:03:23 -0500
    Ready:          True
    Restart Count:  4
    Requests:
      cpu:     5m
      memory:  50Mi
    Environment:
      POD_NAMESPACE:  openshift-kube-apiserver (v1:metadata.namespace)
    Mounts:
      /etc/kubernetes/static-pod-resources from resource-dir (rw)
  kube-apiserver-insecure-readyz:
    Container ID:  cri-o://fa288fb5de0e39c4b13e3f112b947e4cb761801c89ba15ab75b6116b483f30d6
    Image:         quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7f4d6dde7aebc339e6fc537f11c8ce94ca0d5a43d01ef18aaa21c3b957819afc
    Image ID:      quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7f4d6dde7aebc339e6fc537f11c8ce94ca0d5a43d01ef18aaa21c3b957819afc
    Port:          6080/TCP
    Host Port:     6080/TCP
    Command:
      cluster-kube-apiserver-operator
      insecure-readyz
    Args:
      --insecure-port=6080
      --delegate-url=https://localhost:6443/readyz
    State:          Running
      Started:      Thu, 05 Jan 2023 12:03:24 -0500
    Ready:          True
    Restart Count:  4
    Requests:
      cpu:        5m
      memory:     50Mi
    Environment:  <none>
    Mounts:       <none>
  kube-apiserver-check-endpoints:
    Container ID:  cri-o://f2fea0945620a234b492c9d936897e6eb83b57ea780482f99dc2c32e6f62a591
    Image:         quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7f4d6dde7aebc339e6fc537f11c8ce94ca0d5a43d01ef18aaa21c3b957819afc
    Image ID:      quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7f4d6dde7aebc339e6fc537f11c8ce94ca0d5a43d01ef18aaa21c3b957819afc
    Port:          17697/TCP
    Host Port:     17697/TCP
    Command:
      cluster-kube-apiserver-operator
      check-endpoints
    Args:
      --kubeconfig
      /etc/kubernetes/static-pod-certs/configmaps/check-endpoints-kubeconfig/kubeconfig
      --listen
      0.0.0.0:17697
      --namespace
      $(POD_NAMESPACE)
      --v
      2
    State:       Running
      Started:   Thu, 05 Jan 2023 12:04:18 -0500
    Last State:  Terminated
      Reason:    Error
      Message:   W0105 17:03:50.570372       1 cmd.go:213] Using insecure, self-signed certificates
I0105 17:03:50.570747       1 crypto.go:601] Generating new CA for check-endpoints-signer@1672938230 cert, and key in /tmp/serving-cert-1963727250/serving-signer.crt, /tmp/serving-cert-1963727250/serving-signer.key
I0105 17:03:50.918995       1 observer_polling.go:159] Starting file observer
W0105 17:03:50.934666       1 builder.go:230] unable to get owner reference (falling back to namespace): pods "kube-apiserver-weliang-01053-rx8tq-master-0" is forbidden: User "system:serviceaccount:openshift-kube-apiserver:check-endpoints" cannot get resource "pods" in API group "" in the namespace "openshift-kube-apiserver"
I0105 17:03:50.934813       1 builder.go:262] check-endpoints version 4.13.0-202212240845.p0.gb6ca7dc.assembly.stream-b6ca7dc-b6ca7dcf808b9deb9a2ca8a1c67f8ceb475caf59
I0105 17:03:50.935452       1 dynamic_serving_content.go:113] "Loaded a new cert/key pair" name="serving-cert::/tmp/serving-cert-1963727250/tls.crt::/tmp/serving-cert-1963727250/tls.key"
W0105 17:03:51.720732       1 requestheader_controller.go:193] Unable to get configmap/extension-apiserver-authentication in kube-system.  Usually fixed by 'kubectl create rolebinding -n kube-system ROLEBINDING_NAME --role=extension-apiserver-authentication-reader --serviceaccount=YOUR_NS:YOUR_SA'
F0105 17:03:51.720771       1 cmd.go:138] error initializing delegating authentication: unable to load configmap based request-header-client-ca-file: configmaps "extension-apiserver-authentication" is forbidden: User "system:serviceaccount:openshift-kube-apiserver:check-endpoints" cannot get resource "configmaps" in API group "" in the namespace "kube-system"

      Exit Code:    255
      Started:      Thu, 05 Jan 2023 12:03:50 -0500
      Finished:     Thu, 05 Jan 2023 12:03:51 -0500
    Ready:          True
    Restart Count:  9
    Requests:
      cpu:      10m
      memory:   50Mi
    Liveness:   http-get https://:17697/healthz delay=10s timeout=10s period=10s #success=1 #failure=3
    Readiness:  http-get https://:17697/healthz delay=10s timeout=10s period=10s #success=1 #failure=3
    Environment:
      POD_NAME:       kube-apiserver-weliang-01053-rx8tq-master-0 (v1:metadata.name)
      POD_NAMESPACE:  openshift-kube-apiserver (v1:metadata.namespace)
    Mounts:
      /etc/kubernetes/static-pod-certs from cert-dir (rw)
      /etc/kubernetes/static-pod-resources from resource-dir (rw)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  resource-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/kubernetes/static-pod-resources/kube-apiserver-pod-7
    HostPathType:  
  cert-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/kubernetes/static-pod-resources/kube-apiserver-certs
    HostPathType:  
  audit-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/log/kube-apiserver
    HostPathType:  
QoS Class:         Burstable
Node-Selectors:    <none>
Tolerations:       op=Exists
Events:
  Type     Reason      Age    From     Message
  ----     ------      ----   ----     -------
  Normal   Pulled      4h11m  kubelet  Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:cb3c465bdfa0ebd38bdb74cd8b80f131dfaf503a4f8c120b7fcb4440eccd6a70" already present on machine
  Normal   Created     4h11m  kubelet  Created container setup
  Normal   Started     4h11m  kubelet  Started container setup
  Normal   Pulled      4h11m  kubelet  Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:cb3c465bdfa0ebd38bdb74cd8b80f131dfaf503a4f8c120b7fcb4440eccd6a70" already present on machine
  Normal   Created     4h11m  kubelet  Created container kube-apiserver
  Normal   Started     4h11m  kubelet  Started container kube-apiserver
  Normal   Pulled      4h11m  kubelet  Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7f4d6dde7aebc339e6fc537f11c8ce94ca0d5a43d01ef18aaa21c3b957819afc" already present on machine
  Normal   Created     4h11m  kubelet  Created container kube-apiserver-cert-syncer
  Normal   Started     4h11m  kubelet  Started container kube-apiserver-cert-syncer
  Normal   Pulled      4h11m  kubelet  Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7f4d6dde7aebc339e6fc537f11c8ce94ca0d5a43d01ef18aaa21c3b957819afc" already present on machine
  Normal   Started     4h11m  kubelet  Started container kube-apiserver-check-endpoints
  Normal   Started     4h11m  kubelet  Started container kube-apiserver-cert-regeneration-controller
  Normal   Created     4h11m  kubelet  Created container kube-apiserver-insecure-readyz
  Normal   Created     4h11m  kubelet  Created container kube-apiserver-cert-regeneration-controller
  Normal   Started     4h11m  kubelet  Started container kube-apiserver-insecure-readyz
  Normal   Pulled      4h11m  kubelet  Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7f4d6dde7aebc339e6fc537f11c8ce94ca0d5a43d01ef18aaa21c3b957819afc" already present on machine
  Normal   Created     4h11m  kubelet  Created container kube-apiserver-check-endpoints
  Normal   Pulled      4h11m  kubelet  Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7f4d6dde7aebc339e6fc537f11c8ce94ca0d5a43d01ef18aaa21c3b957819afc" already present on machine
  Normal   Created     3h5m   kubelet  Created container kube-apiserver-cert-syncer
  Normal   Started     3h5m   kubelet  Started container kube-apiserver-cert-syncer
  Normal   Pulled      3h5m   kubelet  Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7f4d6dde7aebc339e6fc537f11c8ce94ca0d5a43d01ef18aaa21c3b957819afc" already present on machine
  Normal   Created     3h5m   kubelet  Created container kube-apiserver-cert-regeneration-controller
  Normal   Started     3h5m   kubelet  Started container kube-apiserver-cert-regeneration-controller
  Normal   Pulled      3h5m   kubelet  Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7f4d6dde7aebc339e6fc537f11c8ce94ca0d5a43d01ef18aaa21c3b957819afc" already present on machine
  Normal   Created     3h5m   kubelet  Created container kube-apiserver-insecure-readyz
  Normal   Started     3h5m   kubelet  Started container kube-apiserver-insecure-readyz
  Normal   Pulled      3h5m   kubelet  Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7f4d6dde7aebc339e6fc537f11c8ce94ca0d5a43d01ef18aaa21c3b957819afc" already present on machine
  Normal   Created     3h5m   kubelet  Created container kube-apiserver-check-endpoints
  Normal   Started     3h5m   kubelet  Started container kube-apiserver-check-endpoints
  Normal   Pulled      173m   kubelet  Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:cb3c465bdfa0ebd38bdb74cd8b80f131dfaf503a4f8c120b7fcb4440eccd6a70" already present on machine
  Normal   Created     173m   kubelet  Created container setup
  Normal   Started     173m   kubelet  Started container setup
  Normal   Started     173m   kubelet  Started container kube-apiserver-insecure-readyz
  Normal   Created     173m   kubelet  Created container kube-apiserver
  Normal   Started     173m   kubelet  Started container kube-apiserver
  Normal   Pulled      173m   kubelet  Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7f4d6dde7aebc339e6fc537f11c8ce94ca0d5a43d01ef18aaa21c3b957819afc" already present on machine
  Normal   Created     173m   kubelet  Created container kube-apiserver-cert-syncer
  Normal   Started     173m   kubelet  Started container kube-apiserver-cert-syncer
  Normal   Pulled      173m   kubelet  Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7f4d6dde7aebc339e6fc537f11c8ce94ca0d5a43d01ef18aaa21c3b957819afc" already present on machine
  Normal   Created     173m   kubelet  Created container kube-apiserver-cert-regeneration-controller
  Normal   Pulled      173m   kubelet  Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:cb3c465bdfa0ebd38bdb74cd8b80f131dfaf503a4f8c120b7fcb4440eccd6a70" already present on machine
  Normal   Pulled      173m   kubelet  Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7f4d6dde7aebc339e6fc537f11c8ce94ca0d5a43d01ef18aaa21c3b957819afc" already present on machine
  Normal   Created     173m   kubelet  Created container kube-apiserver-insecure-readyz
  Normal   Started     173m   kubelet  Started container kube-apiserver-cert-regeneration-controller
  Normal   Created     173m   kubelet  Created container kube-apiserver-check-endpoints
  Normal   Started     173m   kubelet  Started container kube-apiserver-check-endpoints
  Warning  Unhealthy   173m   kubelet  Liveness probe failed: Get "https://10.0.99.103:17697/healthz": read tcp 10.0.99.103:56616->10.0.99.103:17697: read: connection reset by peer
  Warning  ProbeError  173m   kubelet  Readiness probe error: HTTP probe failed with statuscode: 403
body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"forbidden: User \"system:anonymous\" cannot get path \"/readyz\": RBAC: [clusterrole.rbac.authorization.k8s.io \"system:webhook\" not found, clusterrole.rbac.authorization.k8s.io \"system:openshift:public-info-viewer\" not found, clusterrole.rbac.authorization.k8s.io \"self-access-reviewer\" not found, clusterrole.rbac.authorization.k8s.io \"system:public-info-viewer\" not found, clusterrole.rbac.authorization.k8s.io \"system:oauth-token-deleter\" not found, clusterrole.rbac.authorization.k8s.io \"system:scope-impersonation\" not found]","reason":"Forbidden","details":{},"code":403}
  Warning  Unhealthy   173m  kubelet  Readiness probe failed: HTTP probe failed with statuscode: 403
  Warning  ProbeError  173m  kubelet  Liveness probe error: Get "https://10.0.99.103:17697/healthz": read tcp 10.0.99.103:56616->10.0.99.103:17697: read: connection reset by peer
body:
  Normal   Pulled      173m (x2 over 173m)  kubelet  Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7f4d6dde7aebc339e6fc537f11c8ce94ca0d5a43d01ef18aaa21c3b957819afc" already present on machine
  Warning  ProbeError  173m                 kubelet  Readiness probe error: HTTP probe failed with statuscode: 500
body: [+]ping ok
[+]log ok
[+]etcd ok
[+]etcd-readiness ok
[-]api-openshift-apiserver-available failed: reason withheld
[-]api-openshift-oauth-apiserver-available failed: reason withheld
[+]informer-sync ok
[+]poststarthook/openshift.io-openshift-apiserver-reachable ok
[+]poststarthook/openshift.io-oauth-apiserver-reachable ok
[+]poststarthook/start-kube-apiserver-admission-initializer ok
[+]poststarthook/quota.openshift.io-clusterquotamapping ok
[+]poststarthook/openshift.io-deprecated-api-requests-filter ok
[+]poststarthook/openshift.io-startkubeinformers ok
[+]poststarthook/generic-apiserver-start-informers ok
[+]poststarthook/priority-and-fairness-config-consumer ok
[+]poststarthook/priority-and-fairness-filter ok
[+]poststarthook/storage-object-count-tracker-hook ok
[+]poststarthook/start-apiextensions-informers ok
[+]poststarthook/start-apiextensions-controllers ok
[+]poststarthook/crd-informer-synced ok
[+]poststarthook/bootstrap-controller ok
[-]poststarthook/rbac/bootstrap-roles failed: reason withheld
[-]poststarthook/scheduling/bootstrap-system-priority-classes failed: reason withheld
[+]poststarthook/priority-and-fairness-config-producer ok
[+]poststarthook/start-cluster-authentication-info-controller ok
[+]poststarthook/aggregator-reload-proxy-client-cert ok
[+]poststarthook/start-kube-aggregator-informers ok
[+]poststarthook/apiservice-registration-controller ok
[+]poststarthook/apiservice-status-available-controller ok
[+]poststarthook/apiservice-wait-for-first-sync ok
[+]poststarthook/kube-apiserver-autoregistration ok
[+]autoregister-completion ok
[+]poststarthook/apiservice-openapi-controller ok
[+]poststarthook/apiservice-openapiv3-controller ok
[+]shutdown ok
readyz check failed
  Warning  ProbeError  146m  kubelet  Readiness probe error: HTTP probe failed with statuscode: 500
body: [+]ping ok
[+]log ok
[-]etcd failed: reason withheld
[+]etcd-readiness ok
[+]api-openshift-apiserver-available ok
[+]api-openshift-oauth-apiserver-available ok
[+]informer-sync ok
[+]poststarthook/openshift.io-openshift-apiserver-reachable ok
[+]poststarthook/openshift.io-oauth-apiserver-reachable ok
[+]poststarthook/start-kube-apiserver-admission-initializer ok
[+]poststarthook/quota.openshift.io-clusterquotamapping ok
[+]poststarthook/openshift.io-deprecated-api-requests-filter ok
[+]poststarthook/openshift.io-startkubeinformers ok
[+]poststarthook/generic-apiserver-start-informers ok
[+]poststarthook/priority-and-fairness-config-consumer ok
[+]poststarthook/priority-and-fairness-filter ok
[+]poststarthook/storage-object-count-tracker-hook ok
[+]poststarthook/start-apiextensions-informers ok
[+]poststarthook/start-apiextensions-controllers ok
[+]poststarthook/crd-informer-synced ok
[+]poststarthook/bootstrap-controller ok
[+]poststarthook/rbac/bootstrap-roles ok
[+]poststarthook/scheduling/bootstrap-system-priority-classes ok
[+]poststarthook/priority-and-fairness-config-producer ok
[+]poststarthook/start-cluster-authentication-info-controller ok
[+]poststarthook/aggregator-reload-proxy-client-cert ok
[+]poststarthook/start-kube-aggregator-informers ok
[+]poststarthook/apiservice-registration-controller ok
[+]poststarthook/apiservice-status-available-controller ok
[+]poststarthook/apiservice-wait-for-first-sync ok
[+]poststarthook/kube-apiserver-autoregistration ok
[+]autoregister-completion ok
[+]poststarthook/apiservice-openapi-controller ok
[+]poststarthook/apiservice-openapiv3-controller ok
[+]shutdown ok
readyz check failed
  Warning  Unhealthy   146m (x11 over 173m)  kubelet  Readiness probe failed: HTTP probe failed with statuscode: 500
  Normal   Pulled      143m                  kubelet  Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:cb3c465bdfa0ebd38bdb74cd8b80f131dfaf503a4f8c120b7fcb4440eccd6a70" already present on machine
  Normal   Created     143m                  kubelet  Created container setup
  Normal   Started     143m                  kubelet  Started container setup
  Normal   Pulled      143m                  kubelet  Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:cb3c465bdfa0ebd38bdb74cd8b80f131dfaf503a4f8c120b7fcb4440eccd6a70" already present on machine
  Normal   Created     143m                  kubelet  Created container kube-apiserver
  Normal   Started     143m                  kubelet  Started container kube-apiserver
  Normal   Pulled      143m                  kubelet  Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7f4d6dde7aebc339e6fc537f11c8ce94ca0d5a43d01ef18aaa21c3b957819afc" already present on machine
  Normal   Created     143m                  kubelet  Created container kube-apiserver-check-endpoints
  Normal   Started     143m                  kubelet  Started container kube-apiserver-cert-syncer
  Normal   Pulled      143m                  kubelet  Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7f4d6dde7aebc339e6fc537f11c8ce94ca0d5a43d01ef18aaa21c3b957819afc" already present on machine
  Normal   Created     143m                  kubelet  Created container kube-apiserver-cert-regeneration-controller
  Normal   Started     143m                  kubelet  Started container kube-apiserver-cert-regeneration-controller
  Normal   Created     143m                  kubelet  Created container kube-apiserver-cert-syncer
  Normal   Created     143m                  kubelet  Created container kube-apiserver-insecure-readyz
  Normal   Started     143m                  kubelet  Started container kube-apiserver-insecure-readyz
  Normal   Started     143m                  kubelet  Started container kube-apiserver-check-endpoints
  Normal   Pulled      143m                  kubelet  Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7f4d6dde7aebc339e6fc537f11c8ce94ca0d5a43d01ef18aaa21c3b957819afc" already present on machine
  Warning  ProbeError  143m                  kubelet  Readiness probe error: Get "https://10.0.99.103:17697/healthz": dial tcp 10.0.99.103:17697: connect: connection refused
body:
  Warning  Unhealthy   143m  kubelet  Readiness probe failed: Get "https://10.0.99.103:17697/healthz": dial tcp 10.0.99.103:17697: connect: connection refused
  Warning  ProbeError  143m  kubelet  Readiness probe error: HTTP probe failed with statuscode: 403
body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"forbidden: User \"system:anonymous\" cannot get path \"/readyz\"","reason":"Forbidden","details":{},"code":403}
  Warning  Unhealthy   143m                 kubelet  Readiness probe failed: HTTP probe failed with statuscode: 403
  Normal   Pulled      143m (x2 over 143m)  kubelet  Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7f4d6dde7aebc339e6fc537f11c8ce94ca0d5a43d01ef18aaa21c3b957819afc" already present on machine
  Warning  ProbeError  143m                 kubelet  Liveness probe error: Get "https://10.0.99.103:17697/healthz": dial tcp 10.0.99.103:17697: connect: connection refused
body:
  Warning  Unhealthy  143m                      kubelet  Liveness probe failed: Get "https://10.0.99.103:17697/healthz": dial tcp 10.0.99.103:17697: connect: connection refused
  Normal   Created    135m                      kubelet  Created container kube-apiserver-cert-regeneration-controller
  Normal   Started    135m                      kubelet  Started container kube-apiserver-insecure-readyz
  Normal   Pulled     135m                      kubelet  Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7f4d6dde7aebc339e6fc537f11c8ce94ca0d5a43d01ef18aaa21c3b957819afc" already present on machine
  Normal   Created    135m                      kubelet  Created container kube-apiserver-insecure-readyz
  Normal   Started    135m                      kubelet  Started container kube-apiserver-cert-regeneration-controller
  Normal   Created    135m (x2 over 135m)       kubelet  Created container kube-apiserver-check-endpoints
  Normal   Started    135m (x2 over 135m)       kubelet  Started container kube-apiserver-check-endpoints
  Warning  BackOff    134m (x3 over 135m)       kubelet  Back-off restarting failed container
  Normal   Pulled     134m (x3 over 135m)       kubelet  Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7f4d6dde7aebc339e6fc537f11c8ce94ca0d5a43d01ef18aaa21c3b957819afc" already present on machine
  Warning  BackOff    130m (x20 over 134m)      kubelet  Back-off restarting failed container
  Normal   Pulled     1s (x600 over <invalid>)  kubelet  Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:cb3c465bdfa0ebd38bdb74cd8b80f131dfaf503a4f8c120b7fcb4440eccd6a70" already present on machine
  Normal   Pulled     <invalid>                 kubelet  Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:cb3c465bdfa0ebd38bdb74cd8b80f131dfaf503a4f8c120b7fcb4440eccd6a70" already present on machine
  Normal   Created    <invalid>                 kubelet  Created container setup
  Normal   Started    <invalid>                 kubelet  Started container setup
  Normal   Created    <invalid>                 kubelet  Created container kube-apiserver
  Normal   Started    <invalid>                 kubelet  Started container kube-apiserver
  Normal   Pulled     <invalid>                 kubelet  Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7f4d6dde7aebc339e6fc537f11c8ce94ca0d5a43d01ef18aaa21c3b957819afc" already present on machine
  Normal   Pulled     <invalid>                 kubelet  Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:cb3c465bdfa0ebd38bdb74cd8b80f131dfaf503a4f8c120b7fcb4440eccd6a70" already present on machine
  Normal   Pulled     <invalid>                 kubelet  Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:cb3c465bdfa0ebd38bdb74cd8b80f131dfaf503a4f8c120b7fcb4440eccd6a70" already present on machine
  Normal   Created    <invalid>                 kubelet  Created container setup
  Normal   Started    <invalid>                 kubelet  Started container setup
  Normal   Created    <invalid>                 kubelet  Created container kube-apiserver
  Normal   Started    <invalid>                 kubelet  Started container kube-apiserver
  Normal   Pulled     <invalid>                 kubelet  Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7f4d6dde7aebc339e6fc537f11c8ce94ca0d5a43d01ef18aaa21c3b957819afc" already present on machine
  Normal   Created    <invalid>                 kubelet  Created container kube-apiserver-cert-syncer
  Normal   Started    <invalid>                 kubelet  Started container kube-apiserver-cert-syncer
  Normal   Pulled     <invalid>                 kubelet  Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7f4d6dde7aebc339e6fc537f11c8ce94ca0d5a43d01ef18aaa21c3b957819afc" already present on machine
[weliang@weliang openshift-tests-private]$ 
[weliang@weliang openshift-tests-private]$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.13.0-0.nightly-2023-01-01-223309   True        False         4h3m    Error while reconciling 4.13.0-0.nightly-2023-01-01-223309: the cluster operator kube-apiserver is degraded
[weliang@weliang openshift-tests-private]$
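
The failures above are x509 "not yet valid" errors rather than the original "name is reserved" error. A sketch of how that could be narrowed down on the node, by comparing the serving certificate dates with the node clock:

sh-4.4# timedatectl                                                                                 # node clock and NTP sync state
sh-4.4# echo | openssl s_client -connect localhost:6443 2>/dev/null | openssl x509 -noout -dates    # notBefore/notAfter of the kube-apiserver serving cert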

Comment 34 Weibin Liang 2023-01-17 21:37:44 UTC
Tested and verified in 4.12.0-rc.8

Comment 37 errata-xmlrpc 2023-01-30 17:31:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.12.1 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:0449

