Bug 2106264
| Summary: | Static pods stuck in 'CreateContainerError' after SDN migration then rollback on Ali Cloud | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Peng Liu <pliu> |
| Component: | Node | Assignee: | Sascha Grunert <sgrunert> |
| Node sub component: | Kubelet | QA Contact: | Sunil Choudhary <schoudha> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | high | | |
| Priority: | high | CC: | rphillips, sgrunert, talessio, weliang, zzhao |
| Version: | 4.12 | Flags: | pliu: needinfo-, pliu: needinfo- |
| Target Milestone: | --- | | |
| Target Release: | 4.12.z | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2023-01-30 17:31:32 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |
Description
Peng Liu
2022-07-12 08:56:17 UTC
Executing 'crictl rm -r -a' on the node can recover the pod. This is also a workaround mentioned in https://bugzilla.redhat.com/show_bug.cgi?id=1785399.

I also found that the container state reported by kubectl is not synced with the actual container state on the node. The container was up and running on the node, but the container log from kubectl only contains the previous container failure.

cri-o needs a reboot to detect the new networking setup. Is there something that reboots the nodes when there is a network change? @rphillips

Yes, there are 2 reboots during the rollback. One is triggered manually, and MCO triggers the other one. Also, after seeing this problem, I tried to reboot the node with the malfunctioning pods again, but it didn't help. The pods still could not become ready after the reboot. BTW, the kube-controller-manager and kube-apiserver pods use hostnetwork instead of CNI, so they should not be directly affected by the SDN migration, which only swaps the CNI of the cluster.

Hey Peng, I'm looking into this issue in addition to my team members.

(In reply to Peng Liu from comment #3)
> I also found that the container state reported by kubectl is not synced with the actual container state on the node. The container was up and running on the node, but the container log from kubectl only contains the previous container failure.

I assume that up and running means that `crictl inspect $ID | jq .status.state` reports "CONTAINER_RUNNING", right? If that's the case, would it be possible to get the kubelet logs with increased verbosity (`-v=10`) as well as one of the affected container names?

I've prepared a cluster for online debugging.

The test setup mentioned by Peng reveals the issue: we have two cluster-policy-controller containers, where one is up and running and the other one blocks the name that the kubelet tries to use when creating the container:
> 47b97d9dd4801 e1bdd290e0d56520dc3f8a8af8fcdee28b1d1c99214b9ee3cb63c8faa256c4c1 Less than a second ago Exited cluster-policy-controller 2 621ba87014777 kube-controller-manager-pliu-alicloud-qp45q-master-0
> 293d816f6735e e1bdd290e0d56520dc3f8a8af8fcdee28b1d1c99214b9ee3cb63c8faa256c4c1 48 minutes ago Running cluster-policy-controller 3 621ba87014777 kube-controller-manager-pliu-alicloud-qp45q-master-0
I'd expect that the kubelet does a cleanup of running containers before trying to create new ones, so this may be the bug here.
Peng, can you please try to reproduce the issue with the latest 4.11 and tell me if it works there?
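For reference, a minimal diagnostic and recovery sketch based on the workaround discussed above; the container name and the node access method are assumptions for illustration, not taken from this report:

# On the affected node (e.g. via `oc debug node/<node>` and `chroot /host`), list duplicate containers for the name:
crictl ps -a --name cluster-policy-controller
# Check whether the surviving container is actually running (expect "CONTAINER_RUNNING"):
crictl inspect <container-id> | jq -r .status.state
# Workaround: force-remove all containers so the kubelet can recreate them from a clean state:
crictl rm -f -a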
I've reproduced this issue with 4.11.0-0.ci-2022-07-27-174640.

I've built an experimental workaround into CRI-O to remove the container once we get the name reservation error message 10 times within 10 minutes:
- https://github.com/cri-o/cri-o/pull/6097
- https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=46802628

Can you please try the 4.12 reproducer again with the modified package cri-o-1.25.0-13.rhaos4.12.gitdf67c83.el8 and CRI-O debug logs enabled: http://brew-task-repos.usersys.redhat.com/repos/scratch/sgrunert/cri-o/1.25.0/13.rhaos4.12.gitdf67c83.el8/

You can install the RPMs locally by putting this .repo file in your /etc/yum.repos.d/ directory: http://brew-task-repos.usersys.redhat.com/repos/scratch/sgrunert/cri-o/1.25.0/13.rhaos4.12.gitdf67c83.el8/cri-o-1.25.0-13.rhaos4.12.gitdf67c83.el8-scratch.repo

RPMs and build logs can be found in the following locations:
- http://brew-task-repos.usersys.redhat.com/repos/scratch/sgrunert/cri-o/1.25.0/13.rhaos4.12.gitdf67c83.el8/aarch64/
- http://brew-task-repos.usersys.redhat.com/repos/scratch/sgrunert/cri-o/1.25.0/13.rhaos4.12.gitdf67c83.el8/ppc64le/
- http://brew-task-repos.usersys.redhat.com/repos/scratch/sgrunert/cri-o/1.25.0/13.rhaos4.12.gitdf67c83.el8/s390x/
- http://brew-task-repos.usersys.redhat.com/repos/scratch/sgrunert/cri-o/1.25.0/13.rhaos4.12.gitdf67c83.el8/x86_64/

Removing the blocker and reducing the urgency since we have the `crictl rm -fa` workaround.

I had to build a new version of the patch:
- https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=46827743
- http://brew-task-repos.usersys.redhat.com/repos/scratch/sgrunert/cri-o/1.25.0/14.rhaos4.12.gitdf67c83.el8/

Peng, can you please try it again and send me the CRI-O and kubelet debug logs?

The new build can remove the stale container from the node. However, the container still cannot become ready in the apiserver.
NAME                                                READY   STATUS      RESTARTS   AGE     IP             NODE                           NOMINATED NODE   READINESS GATES
apiserver-watcher-pliu-alicloud-bxctb-master-0      1/1     Running     2          8h      10.0.101.108   pliu-alicloud-bxctb-master-0   <none>           <none>
apiserver-watcher-pliu-alicloud-bxctb-master-1      1/1     Running     3          8h      10.0.157.160   pliu-alicloud-bxctb-master-1   <none>           <none>
apiserver-watcher-pliu-alicloud-bxctb-master-2      1/1     Running     3          8h      10.0.101.109   pliu-alicloud-bxctb-master-2   <none>           <none>
kube-apiserver-guard-pliu-alicloud-bxctb-master-0   1/1     Running     0          7h8m    10.130.0.13    pliu-alicloud-bxctb-master-0   <none>           <none>
kube-apiserver-guard-pliu-alicloud-bxctb-master-1   1/1     Running     1          7h15m   10.128.0.23    pliu-alicloud-bxctb-master-1   <none>           <none>
kube-apiserver-guard-pliu-alicloud-bxctb-master-2   1/1     Running     1          7h12m   10.129.0.9     pliu-alicloud-bxctb-master-2   <none>           <none>
kube-apiserver-pliu-alicloud-bxctb-master-0         5/5     Running     14         8h      10.0.101.108   pliu-alicloud-bxctb-master-0   <none>           <none>
kube-apiserver-pliu-alicloud-bxctb-master-1         4/5     Running     16         8h      10.0.157.160   pliu-alicloud-bxctb-master-1   <none>           <none>
kube-apiserver-pliu-alicloud-bxctb-master-2         4/5     Running     16         8h      10.0.101.109   pliu-alicloud-bxctb-master-2   <none>           <none>
revision-pruner-7-pliu-alicloud-bxctb-master-0      0/1     Completed   0          7h11m   10.130.0.44    pliu-alicloud-bxctb-master-0   <none>           <none>
revision-pruner-7-pliu-alicloud-bxctb-master-1      0/1     Completed   0          7h17m   10.128.0.5    pliu-alicloud-bxctb-master-1   <none>           <none>
revision-pruner-7-pliu-alicloud-bxctb-master-2      0/1     Completed   0          7h14m   10.129.0.40    pliu-alicloud-bxctb-master-2   <none>           <none>

Here are the kubelet log https://paste.c-net.org/PilotSocial and the crio log https://paste.c-net.org/MuldoonPolitely.

Thank you Peng. I would require the full CRI-O logs for the reproducer; unfortunately it looks like we only capture 1 minute of them. Another question: does a restart of the kubelet resolve the issue?

Ryan, may I ask you to assist me with the kubelet part here? The facts right now are:
- There is a container up and running (healthy) in the API server pod after the SDN migration (this is visible via crictl, for example).
- The kubelet still tries to create the container and fails because CRI-O reports "name is reserved" (see the logs above).

I know the kubelet caches the results, right? So I'm wondering if a restart of the kubelet helps here from the failing state. It feels like a bug in the kubelet, and my CRI-O workaround (https://github.com/cri-o/cri-o/pull/6097) tries to fix it from the opposite direction.

@sgrunert The crio log is huge after turning on the debug level. Ping me on Slack if you need an environment for debugging.

We have some new findings while debugging this issue. It can be reproduced without doing any SDN migration at all; a MachineConfig update can also trigger it, for instance turning on crio debug logging following https://access.redhat.com/solutions/5133191. We hit this issue when doing the SDN migration because the migration also includes a MachineConfig update. I set up the reproduction environment in Ali cloud with QE's CI job https://mastern-jenkins-csb-openshift-qe.apps.ocp-c1.prod.psi.redhat.com/job/ocp-common/job/Flexy-install. The test environment can only last for 12 hours.
Thanks Peng, this is the summary of the current state: we already have a kube-apiserver container up and running in the pod, while the kubelet tries to create one with the same name, resulting in the "name reserved" error:

sh-4.4# crictl ps -a --name kube-apiserver
CONTAINER       IMAGE                                                              CREATED                  STATE     NAME             ATTEMPT   POD ID          POD
6305d725a6bd3   9fa866b8c15bf5e536504da71b706caf1dc0c926ed21991f69425f6e41938ba1   Less than a second ago   Exited    kube-apiserver   3         90943b3fa5860   kube-apiserver-pliu-alicloud-fblsj-master-0
57beb7420744b   9fa866b8c15bf5e536504da71b706caf1dc0c926ed21991f69425f6e41938ba1   11 minutes ago           Running   kube-apiserver   4         90943b3fa5860   kube-apiserver-pliu-alicloud-fblsj-master-0

The workaround does not work as expected, so I have to find out why. Ryan, can you double-check the kubelet logs to see if we can find the root cause in the kubelet sync loop? All logs can be found here: https://drive.google.com/drive/folders/1C18N-k_vsP6CMagxAYs20WSkdltKwxF6?usp=sharing

We have a new run without any cluster modification other than applying the machine config 99-master-kubelet-loglevel:
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: master
  name: 99-master-kubelet-loglevel
spec:
  config:
    ignition:
      version: 3.2.0
    systemd:
      units:
      - dropins:
        - contents: |
            [Service]
            Environment="KUBELET_LOG_LEVEL=10"
          name: 30-logging.conf
        enabled: true
        name: kubelet.service
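As a usage note, here is a sketch of how a MachineConfig like the one above is typically applied and rolled out; the file name and the node name are placeholders, not taken from this report:

# Apply the MachineConfig and watch the master pool roll it out node by node:
oc apply -f 99-master-kubelet-loglevel.yaml
oc get machineconfigpool master -w
# After the pool reports UPDATED=True, confirm the kubelet drop-in landed on a node:
oc debug node/<master-node> -- chroot /host systemctl cat kubelet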
The crictl output shows the problem:
CONTAINER IMAGE CREATED STATE NAME ATTEMPT POD ID POD
f3a6fa482cd0b 0bc42fd43ea720db6380802cc39ce72f392e78bc0eb18afc0fa261e2ae2c8e55 Less than a second ago Running kube-apiserver-cert-regeneration-controller 1 a8cc75ec372da kube-apiserver-pliu-alicloud-2n9bd-master-0
e287689759221 0bc42fd43ea720db6380802cc39ce72f392e78bc0eb18afc0fa261e2ae2c8e55 Less than a second ago Running kube-apiserver-cert-syncer 1 a8cc75ec372da kube-apiserver-pliu-alicloud-2n9bd-master-0
4065e776eb308 9fa866b8c15bf5e536504da71b706caf1dc0c926ed21991f69425f6e41938ba1 Less than a second ago Exited kube-apiserver 1 a8cc75ec372da kube-apiserver-pliu-alicloud-2n9bd-master-0
ad56b755641d9 9fa866b8c15bf5e536504da71b706caf1dc0c926ed21991f69425f6e41938ba1 40 minutes ago Running kube-apiserver 2 a8cc75ec372da kube-apiserver-pliu-alicloud-2n9bd-master-0
60b30abe43fda 0bc42fd43ea720db6380802cc39ce72f392e78bc0eb18afc0fa261e2ae2c8e55 40 minutes ago Running kube-apiserver-check-endpoints 1 a8cc75ec372da kube-apiserver-pliu-alicloud-2n9bd-master-0
d45d4d8dd6b32 0bc42fd43ea720db6380802cc39ce72f392e78bc0eb18afc0fa261e2ae2c8e55 40 minutes ago Running kube-apiserver-insecure-readyz 1 a8cc75ec372da kube-apiserver-pliu-alicloud-2n9bd-master-0
kubelet and CRI-O logs: https://drive.google.com/drive/folders/1rSvUMFP3bpROI9MVa1i19cUckKoGNwxU?usp=sharing
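For completeness, kubelet and CRI-O logs like these can usually be gathered without SSH access along these lines; the node name is a placeholder:

# Kubelet journal from a specific node:
oc adm node-logs <node-name> -u kubelet > kubelet.log
# CRI-O journal from the same node:
oc adm node-logs <node-name> -u crio > crio.log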
Found out that the revert in https://github.com/cri-o/cri-o/pull/6111 fixes the problem. We're now discussing how to proceed.

We're now working on https://github.com/cri-o/cri-o/pull/6123 to solve the problem.

I see the fix has been merged upstream. May I ask when we can have it in OCP?

(In reply to Peng Liu from comment #22)
> I see the fix has been merged upstream. May I ask when we can have it in OCP?

The merge into main will land in 1.26 / OCP 4.13; we can backport it to 4.12 once it's verified by QA.

@sgrunert Right now we do not have v4.13 available in https://amd64.ocp.releases.ci.openshift.org/, so QE cannot do any testing. Moving Status back to Assigned.

Alright, I'm opening a cherry-pick for 4.15 to be able to verify it: https://github.com/cri-o/cri-o/pull/6241

The PR has already merged; we can now verify once CI has CRI-O commit 76292062: https://github.com/cri-o/cri-o/commit/76292062dbcd6fc77569fcec45487551fa40d844

@weliang The patch has been merged. Could you help to verify it?

@pliu Two Alicloud cluster installation bugs blocked our verification:
- https://issues.redhat.com/browse/OCPBUGS-2248
- https://issues.redhat.com/browse/OCPBUGS-2388

Ok, let's wait for the installer fix.

Verification failed on 4.13.0-0.nightly-2023-01-01-223309:
[weliang@weliang openshift-tests-private]$ oc get all -n openshift-kube-apiserver
NAME READY STATUS RESTARTS AGE
pod/apiserver-watcher-weliang-01053-rx8tq-master-0 1/1 Running 4 4h39m
pod/apiserver-watcher-weliang-01053-rx8tq-master-1 1/1 Running 4 4h37m
pod/apiserver-watcher-weliang-01053-rx8tq-master-2 1/1 Running 4 4h37m
pod/kube-apiserver-guard-weliang-01053-rx8tq-master-0 1/1 Running 0 134m
pod/kube-apiserver-guard-weliang-01053-rx8tq-master-1 1/1 Running 0 126m
pod/kube-apiserver-guard-weliang-01053-rx8tq-master-2 1/1 Running 0 130m
pod/kube-apiserver-weliang-01053-rx8tq-master-0 4/5 RunContainerError 25 (134m ago) 4h11m
pod/kube-apiserver-weliang-01053-rx8tq-master-1 4/5 Running 22 4h5m
pod/kube-apiserver-weliang-01053-rx8tq-master-2 4/5 RunContainerError 25 (130m ago) 4h8m
pod/revision-pruner-7-weliang-01053-rx8tq-master-0 0/1 Completed 0 135m
pod/revision-pruner-7-weliang-01053-rx8tq-master-1 0/1 Completed 0 130m
pod/revision-pruner-7-weliang-01053-rx8tq-master-2 0/1 Completed 0 132m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/apiserver ClusterIP 172.30.71.194 <none> 443/TCP 4h37m
[weliang@weliang openshift-tests-private]$ oc describe pod/kube-apiserver-weliang-01053-rx8tq-master-0 -n openshift-kube-apiserver
Name: kube-apiserver-weliang-01053-rx8tq-master-0
Namespace: openshift-kube-apiserver
Priority: 2000001000
Priority Class Name: system-node-critical
Node: weliang-01053-rx8tq-master-0/10.0.99.103
Start Time: Thu, 05 Jan 2023 09:43:08 -0500
Labels: apiserver=true
app=openshift-kube-apiserver
revision=7
Annotations: kubectl.kubernetes.io/default-container: kube-apiserver
kubernetes.io/config.hash: 0fec70f8250dabd2139268425b6896e7
kubernetes.io/config.mirror: 0fec70f8250dabd2139268425b6896e7
kubernetes.io/config.seen: 2023-01-05T15:05:29.477569711Z
kubernetes.io/config.source: file
target.workload.openshift.io/management: {"effect": "PreferredDuringScheduling"}
Status: Running
IP: 10.0.99.103
IPs:
IP: 10.0.99.103
Controlled By: Node/weliang-01053-rx8tq-master-0
Init Containers:
setup:
Container ID: cri-o://b99295901e4c00b71a9badb162fc7d10d77f47673eeecf14afe369286c8152c7
Image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:cb3c465bdfa0ebd38bdb74cd8b80f131dfaf503a4f8c120b7fcb4440eccd6a70
Image ID: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:cb3c465bdfa0ebd38bdb74cd8b80f131dfaf503a4f8c120b7fcb4440eccd6a70
Port: <none>
Host Port: <none>
Command:
/usr/bin/timeout
220
/bin/bash
-ec
Args:
echo "Fixing audit permissions ..."
chmod 0700 /var/log/kube-apiserver && touch /var/log/kube-apiserver/audit.log && chmod 0600 /var/log/kube-apiserver/*
LOCK=/var/log/kube-apiserver/.lock
echo "Acquiring exclusive lock ${LOCK} ..."
# Waiting for 135s max for old kube-apiserver's watch-termination process to exit and remove the lock.
# Two cases:
# 1. if kubelet does not start the old and new in parallel (i.e. works as expected), the flock will always succeed without any time.
# 2. if kubelet does overlap old and new pods for up to 130s, the flock will wait and immediate return when the old finishes.
#
# NOTE: We can increase 135s for a bigger expected overlap. But a higher value means less noise about the broken kubelet behaviour, i.e. we hide a bug.
# NOTE: Do not tweak these timings without considering the livenessProbe initialDelaySeconds
exec {LOCK_FD}>${LOCK} && flock --verbose -w 135 "${LOCK_FD}" || {
echo "$(date -Iseconds -u) kubelet did not terminate old kube-apiserver before new one" >> /var/log/kube-apiserver/lock.log
echo -n ": WARNING: kubelet did not terminate old kube-apiserver before new one."
# We failed to acquire exclusive lock, which means there is old kube-apiserver running in system.
# Since we utilize SO_REUSEPORT, we need to make sure the old kube-apiserver stopped listening.
#
# NOTE: This is a fallback for broken kubelet, if you observe this please report a bug.
echo -n "Waiting for port 6443 to be released due to likely bug in kubelet or CRI-O "
while [ -n "$(ss -Htan state listening '( sport = 6443 or sport = 6080 )')" ]; do
echo -n "."
sleep 1
(( tries += 1 ))
if [[ "${tries}" -gt 10 ]]; then
echo "Timed out waiting for port :6443 and :6080 to be released, this is likely a bug in kubelet or CRI-O"
exit 1
fi
done
# This is to make sure the server has terminated independently from the lock.
# After the port has been freed (requests can be pending and need 60s max).
sleep 65
}
# We cannot hold the lock from the init container to the main container. We release it here. There is no risk, at this point we know we are safe.
flock -u "${LOCK_FD}"
State: Terminated
Reason: Completed
Exit Code: 0
Started: Thu, 05 Jan 2023 20:03:19 -0500
Finished: Thu, 05 Jan 2023 20:03:19 -0500
Ready: True
Restart Count: 4
Requests:
cpu: 5m
memory: 50Mi
Environment: <none>
Mounts:
/var/log/kube-apiserver from audit-dir (rw)
Containers:
kube-apiserver:
Container ID: cri-o://2f29f674782ee2f975537fb06e6ba732d4829f1276509d7badd547158edc470a
Image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:cb3c465bdfa0ebd38bdb74cd8b80f131dfaf503a4f8c120b7fcb4440eccd6a70
Image ID: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:cb3c465bdfa0ebd38bdb74cd8b80f131dfaf503a4f8c120b7fcb4440eccd6a70
Port: 6443/TCP
Host Port: 6443/TCP
Command:
/bin/bash
-ec
Args:
LOCK=/var/log/kube-apiserver/.lock
# We should be able to acquire the lock immediatelly. If not, it means the init container has not released it yet and kubelet or CRI-O started container prematurely.
exec {LOCK_FD}>${LOCK} && flock --verbose -w 30 "${LOCK_FD}" || {
echo "Failed to acquire lock for kube-apiserver. Please check setup container for details. This is likely kubelet or CRI-O bug."
exit 1
}
if [ -f /etc/kubernetes/static-pod-certs/configmaps/trusted-ca-bundle/ca-bundle.crt ]; then
echo "Copying system trust bundle ..."
cp -f /etc/kubernetes/static-pod-certs/configmaps/trusted-ca-bundle/ca-bundle.crt /etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem
fi
exec watch-termination --termination-touch-file=/var/log/kube-apiserver/.terminating --termination-log-file=/var/log/kube-apiserver/termination.log --graceful-termination-duration=135s --kubeconfig=/etc/kubernetes/static-pod-resources/configmaps/kube-apiserver-cert-syncer-kubeconfig/kubeconfig -- hyperkube kube-apiserver --openshift-config=/etc/kubernetes/static-pod-resources/configmaps/config/config.yaml --advertise-address=${HOST_IP} -v=2 --permit-address-sharing
State: Waiting
Reason: RunContainerError
Last State: Terminated
Reason: Error
Message: ng.go:106] unable to get PriorityClass system-node-critical: Get "https://[::1]:6443/apis/scheduling.k8s.io/v1/priorityclasses/system-node-critical": x509: certificate has expired or is not yet valid: current time 2023-01-05T17:03:56Z is before 2023-01-06T00:03:21Z. Retrying...
E0105 17:03:56.199404 14 storage_rbac.go:187] unable to initialize clusterroles: Get "https://[::1]:6443/apis/rbac.authorization.k8s.io/v1/clusterroles": x509: certificate has expired or is not yet valid: current time 2023-01-05T17:03:56Z is before 2023-01-06T00:03:21Z
W0105 17:03:56.199456 14 storage_scheduling.go:106] unable to get PriorityClass system-node-critical: Get "https://[::1]:6443/apis/scheduling.k8s.io/v1/priorityclasses/system-node-critical": x509: certificate has expired or is not yet valid: current time 2023-01-05T17:03:56Z is before 2023-01-06T00:03:21Z. Retrying...
F0105 17:03:56.199610 14 hooks.go:203] PostStartHook "scheduling/bootstrap-system-priority-classes" failed: unable to add default system priority classes: timed out waiting for the condition
E0105 17:03:56.327343 14 sdn_readyz_wait.go:107] Get "https://[::1]:6443/api/v1/namespaces/openshift-oauth-apiserver/endpoints/api": x509: certificate has expired or is not yet valid: current time 2023-01-05T17:03:56Z is before 2023-01-06T00:03:21Z
E0105 17:03:56.327629 14 sdn_readyz_wait.go:107] Get "https://[::1]:6443/api/v1/namespaces/openshift-apiserver/endpoints/api": x509: certificate has expired or is not yet valid: current time 2023-01-05T17:03:56Z is before 2023-01-06T00:03:21Z
E0105 17:03:56.327679 14 storage_rbac.go:187] unable to initialize clusterroles: Get "https://[::1]:6443/apis/rbac.authorization.k8s.io/v1/clusterroles": x509: certificate has expired or is not yet valid: current time 2023-01-05T17:03:56Z is before 2023-01-06T00:03:21Z
I0105 17:03:56.493315 1 main.go:235] Termination finished with exit code 255
I0105 17:03:56.493417 1 main.go:188] Deleting termination lock file "/var/log/kube-apiserver/.terminating"
Exit Code: 255
Started: Thu, 05 Jan 2023 20:03:20 -0500
Finished: Thu, 05 Jan 2023 12:03:56 -0500
Ready: False
Restart Count: 4
Requests:
cpu: 265m
memory: 1Gi
Liveness: http-get https://:6443/livez delay=45s timeout=10s period=10s #success=1 #failure=3
Readiness: http-get https://:6443/readyz delay=10s timeout=10s period=10s #success=1 #failure=3
Environment:
POD_NAME: kube-apiserver-weliang-01053-rx8tq-master-0 (v1:metadata.name)
POD_NAMESPACE: openshift-kube-apiserver (v1:metadata.namespace)
STATIC_POD_VERSION: 7
HOST_IP: (v1:status.hostIP)
GOGC: 100
Mounts:
/etc/kubernetes/static-pod-certs from cert-dir (rw)
/etc/kubernetes/static-pod-resources from resource-dir (rw)
/var/log/kube-apiserver from audit-dir (rw)
kube-apiserver-cert-syncer:
Container ID: cri-o://46d2a285aa4af70ae2f86c39257a125cf9f2e34d000b387a9fe5eb52c4bda0af
Image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7f4d6dde7aebc339e6fc537f11c8ce94ca0d5a43d01ef18aaa21c3b957819afc
Image ID: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7f4d6dde7aebc339e6fc537f11c8ce94ca0d5a43d01ef18aaa21c3b957819afc
Port: <none>
Host Port: <none>
Command:
cluster-kube-apiserver-operator
cert-syncer
Args:
--kubeconfig=/etc/kubernetes/static-pod-resources/configmaps/kube-apiserver-cert-syncer-kubeconfig/kubeconfig
--namespace=$(POD_NAMESPACE)
--destination-dir=/etc/kubernetes/static-pod-certs
State: Running
Started: Thu, 05 Jan 2023 20:03:21 -0500
Ready: True
Restart Count: 4
Requests:
cpu: 5m
memory: 50Mi
Environment:
POD_NAME: kube-apiserver-weliang-01053-rx8tq-master-0 (v1:metadata.name)
POD_NAMESPACE: openshift-kube-apiserver (v1:metadata.namespace)
Mounts:
/etc/kubernetes/static-pod-certs from cert-dir (rw)
/etc/kubernetes/static-pod-resources from resource-dir (rw)
kube-apiserver-cert-regeneration-controller:
Container ID: cri-o://0d6f91bda2f5cccedd0a27e7dcb39819f7f6e96b76bc365cf71747e0aa3e987f
Image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7f4d6dde7aebc339e6fc537f11c8ce94ca0d5a43d01ef18aaa21c3b957819afc
Image ID: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7f4d6dde7aebc339e6fc537f11c8ce94ca0d5a43d01ef18aaa21c3b957819afc
Port: <none>
Host Port: <none>
Command:
cluster-kube-apiserver-operator
cert-regeneration-controller
Args:
--kubeconfig=/etc/kubernetes/static-pod-resources/configmaps/kube-apiserver-cert-syncer-kubeconfig/kubeconfig
--namespace=$(POD_NAMESPACE)
-v=2
State: Running
Started: Thu, 05 Jan 2023 12:03:23 -0500
Ready: True
Restart Count: 4
Requests:
cpu: 5m
memory: 50Mi
Environment:
POD_NAMESPACE: openshift-kube-apiserver (v1:metadata.namespace)
Mounts:
/etc/kubernetes/static-pod-resources from resource-dir (rw)
kube-apiserver-insecure-readyz:
Container ID: cri-o://fa288fb5de0e39c4b13e3f112b947e4cb761801c89ba15ab75b6116b483f30d6
Image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7f4d6dde7aebc339e6fc537f11c8ce94ca0d5a43d01ef18aaa21c3b957819afc
Image ID: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7f4d6dde7aebc339e6fc537f11c8ce94ca0d5a43d01ef18aaa21c3b957819afc
Port: 6080/TCP
Host Port: 6080/TCP
Command:
cluster-kube-apiserver-operator
insecure-readyz
Args:
--insecure-port=6080
--delegate-url=https://localhost:6443/readyz
State: Running
Started: Thu, 05 Jan 2023 12:03:24 -0500
Ready: True
Restart Count: 4
Requests:
cpu: 5m
memory: 50Mi
Environment: <none>
Mounts: <none>
kube-apiserver-check-endpoints:
Container ID: cri-o://f2fea0945620a234b492c9d936897e6eb83b57ea780482f99dc2c32e6f62a591
Image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7f4d6dde7aebc339e6fc537f11c8ce94ca0d5a43d01ef18aaa21c3b957819afc
Image ID: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7f4d6dde7aebc339e6fc537f11c8ce94ca0d5a43d01ef18aaa21c3b957819afc
Port: 17697/TCP
Host Port: 17697/TCP
Command:
cluster-kube-apiserver-operator
check-endpoints
Args:
--kubeconfig
/etc/kubernetes/static-pod-certs/configmaps/check-endpoints-kubeconfig/kubeconfig
--listen
0.0.0.0:17697
--namespace
$(POD_NAMESPACE)
--v
2
State: Running
Started: Thu, 05 Jan 2023 12:04:18 -0500
Last State: Terminated
Reason: Error
Message: W0105 17:03:50.570372 1 cmd.go:213] Using insecure, self-signed certificates
I0105 17:03:50.570747 1 crypto.go:601] Generating new CA for check-endpoints-signer@1672938230 cert, and key in /tmp/serving-cert-1963727250/serving-signer.crt, /tmp/serving-cert-1963727250/serving-signer.key
I0105 17:03:50.918995 1 observer_polling.go:159] Starting file observer
W0105 17:03:50.934666 1 builder.go:230] unable to get owner reference (falling back to namespace): pods "kube-apiserver-weliang-01053-rx8tq-master-0" is forbidden: User "system:serviceaccount:openshift-kube-apiserver:check-endpoints" cannot get resource "pods" in API group "" in the namespace "openshift-kube-apiserver"
I0105 17:03:50.934813 1 builder.go:262] check-endpoints version 4.13.0-202212240845.p0.gb6ca7dc.assembly.stream-b6ca7dc-b6ca7dcf808b9deb9a2ca8a1c67f8ceb475caf59
I0105 17:03:50.935452 1 dynamic_serving_content.go:113] "Loaded a new cert/key pair" name="serving-cert::/tmp/serving-cert-1963727250/tls.crt::/tmp/serving-cert-1963727250/tls.key"
W0105 17:03:51.720732 1 requestheader_controller.go:193] Unable to get configmap/extension-apiserver-authentication in kube-system. Usually fixed by 'kubectl create rolebinding -n kube-system ROLEBINDING_NAME --role=extension-apiserver-authentication-reader --serviceaccount=YOUR_NS:YOUR_SA'
F0105 17:03:51.720771 1 cmd.go:138] error initializing delegating authentication: unable to load configmap based request-header-client-ca-file: configmaps "extension-apiserver-authentication" is forbidden: User "system:serviceaccount:openshift-kube-apiserver:check-endpoints" cannot get resource "configmaps" in API group "" in the namespace "kube-system"
Exit Code: 255
Started: Thu, 05 Jan 2023 12:03:50 -0500
Finished: Thu, 05 Jan 2023 12:03:51 -0500
Ready: True
Restart Count: 9
Requests:
cpu: 10m
memory: 50Mi
Liveness: http-get https://:17697/healthz delay=10s timeout=10s period=10s #success=1 #failure=3
Readiness: http-get https://:17697/healthz delay=10s timeout=10s period=10s #success=1 #failure=3
Environment:
POD_NAME: kube-apiserver-weliang-01053-rx8tq-master-0 (v1:metadata.name)
POD_NAMESPACE: openshift-kube-apiserver (v1:metadata.namespace)
Mounts:
/etc/kubernetes/static-pod-certs from cert-dir (rw)
/etc/kubernetes/static-pod-resources from resource-dir (rw)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
resource-dir:
Type: HostPath (bare host directory volume)
Path: /etc/kubernetes/static-pod-resources/kube-apiserver-pod-7
HostPathType:
cert-dir:
Type: HostPath (bare host directory volume)
Path: /etc/kubernetes/static-pod-resources/kube-apiserver-certs
HostPathType:
audit-dir:
Type: HostPath (bare host directory volume)
Path: /var/log/kube-apiserver
HostPathType:
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: op=Exists
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Pulled 4h11m kubelet Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:cb3c465bdfa0ebd38bdb74cd8b80f131dfaf503a4f8c120b7fcb4440eccd6a70" already present on machine
Normal Created 4h11m kubelet Created container setup
Normal Started 4h11m kubelet Started container setup
Normal Pulled 4h11m kubelet Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:cb3c465bdfa0ebd38bdb74cd8b80f131dfaf503a4f8c120b7fcb4440eccd6a70" already present on machine
Normal Created 4h11m kubelet Created container kube-apiserver
Normal Started 4h11m kubelet Started container kube-apiserver
Normal Pulled 4h11m kubelet Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7f4d6dde7aebc339e6fc537f11c8ce94ca0d5a43d01ef18aaa21c3b957819afc" already present on machine
Normal Created 4h11m kubelet Created container kube-apiserver-cert-syncer
Normal Started 4h11m kubelet Started container kube-apiserver-cert-syncer
Normal Pulled 4h11m kubelet Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7f4d6dde7aebc339e6fc537f11c8ce94ca0d5a43d01ef18aaa21c3b957819afc" already present on machine
Normal Started 4h11m kubelet Started container kube-apiserver-check-endpoints
Normal Started 4h11m kubelet Started container kube-apiserver-cert-regeneration-controller
Normal Created 4h11m kubelet Created container kube-apiserver-insecure-readyz
Normal Created 4h11m kubelet Created container kube-apiserver-cert-regeneration-controller
Normal Started 4h11m kubelet Started container kube-apiserver-insecure-readyz
Normal Pulled 4h11m kubelet Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7f4d6dde7aebc339e6fc537f11c8ce94ca0d5a43d01ef18aaa21c3b957819afc" already present on machine
Normal Created 4h11m kubelet Created container kube-apiserver-check-endpoints
Normal Pulled 4h11m kubelet Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7f4d6dde7aebc339e6fc537f11c8ce94ca0d5a43d01ef18aaa21c3b957819afc" already present on machine
Normal Created 3h5m kubelet Created container kube-apiserver-cert-syncer
Normal Started 3h5m kubelet Started container kube-apiserver-cert-syncer
Normal Pulled 3h5m kubelet Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7f4d6dde7aebc339e6fc537f11c8ce94ca0d5a43d01ef18aaa21c3b957819afc" already present on machine
Normal Created 3h5m kubelet Created container kube-apiserver-cert-regeneration-controller
Normal Started 3h5m kubelet Started container kube-apiserver-cert-regeneration-controller
Normal Pulled 3h5m kubelet Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7f4d6dde7aebc339e6fc537f11c8ce94ca0d5a43d01ef18aaa21c3b957819afc" already present on machine
Normal Created 3h5m kubelet Created container kube-apiserver-insecure-readyz
Normal Started 3h5m kubelet Started container kube-apiserver-insecure-readyz
Normal Pulled 3h5m kubelet Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7f4d6dde7aebc339e6fc537f11c8ce94ca0d5a43d01ef18aaa21c3b957819afc" already present on machine
Normal Created 3h5m kubelet Created container kube-apiserver-check-endpoints
Normal Started 3h5m kubelet Started container kube-apiserver-check-endpoints
Normal Pulled 173m kubelet Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:cb3c465bdfa0ebd38bdb74cd8b80f131dfaf503a4f8c120b7fcb4440eccd6a70" already present on machine
Normal Created 173m kubelet Created container setup
Normal Started 173m kubelet Started container setup
Normal Started 173m kubelet Started container kube-apiserver-insecure-readyz
Normal Created 173m kubelet Created container kube-apiserver
Normal Started 173m kubelet Started container kube-apiserver
Normal Pulled 173m kubelet Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7f4d6dde7aebc339e6fc537f11c8ce94ca0d5a43d01ef18aaa21c3b957819afc" already present on machine
Normal Created 173m kubelet Created container kube-apiserver-cert-syncer
Normal Started 173m kubelet Started container kube-apiserver-cert-syncer
Normal Pulled 173m kubelet Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7f4d6dde7aebc339e6fc537f11c8ce94ca0d5a43d01ef18aaa21c3b957819afc" already present on machine
Normal Created 173m kubelet Created container kube-apiserver-cert-regeneration-controller
Normal Pulled 173m kubelet Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:cb3c465bdfa0ebd38bdb74cd8b80f131dfaf503a4f8c120b7fcb4440eccd6a70" already present on machine
Normal Pulled 173m kubelet Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7f4d6dde7aebc339e6fc537f11c8ce94ca0d5a43d01ef18aaa21c3b957819afc" already present on machine
Normal Created 173m kubelet Created container kube-apiserver-insecure-readyz
Normal Started 173m kubelet Started container kube-apiserver-cert-regeneration-controller
Normal Created 173m kubelet Created container kube-apiserver-check-endpoints
Normal Started 173m kubelet Started container kube-apiserver-check-endpoints
Warning Unhealthy 173m kubelet Liveness probe failed: Get "https://10.0.99.103:17697/healthz": read tcp 10.0.99.103:56616->10.0.99.103:17697: read: connection reset by peer
Warning ProbeError 173m kubelet Readiness probe error: HTTP probe failed with statuscode: 403
body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"forbidden: User \"system:anonymous\" cannot get path \"/readyz\": RBAC: [clusterrole.rbac.authorization.k8s.io \"system:webhook\" not found, clusterrole.rbac.authorization.k8s.io \"system:openshift:public-info-viewer\" not found, clusterrole.rbac.authorization.k8s.io \"self-access-reviewer\" not found, clusterrole.rbac.authorization.k8s.io \"system:public-info-viewer\" not found, clusterrole.rbac.authorization.k8s.io \"system:oauth-token-deleter\" not found, clusterrole.rbac.authorization.k8s.io \"system:scope-impersonation\" not found]","reason":"Forbidden","details":{},"code":403}
Warning Unhealthy 173m kubelet Readiness probe failed: HTTP probe failed with statuscode: 403
Warning ProbeError 173m kubelet Liveness probe error: Get "https://10.0.99.103:17697/healthz": read tcp 10.0.99.103:56616->10.0.99.103:17697: read: connection reset by peer
body:
Normal Pulled 173m (x2 over 173m) kubelet Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7f4d6dde7aebc339e6fc537f11c8ce94ca0d5a43d01ef18aaa21c3b957819afc" already present on machine
Warning ProbeError 173m kubelet Readiness probe error: HTTP probe failed with statuscode: 500
body: [+]ping ok
[+]log ok
[+]etcd ok
[+]etcd-readiness ok
[-]api-openshift-apiserver-available failed: reason withheld
[-]api-openshift-oauth-apiserver-available failed: reason withheld
[+]informer-sync ok
[+]poststarthook/openshift.io-openshift-apiserver-reachable ok
[+]poststarthook/openshift.io-oauth-apiserver-reachable ok
[+]poststarthook/start-kube-apiserver-admission-initializer ok
[+]poststarthook/quota.openshift.io-clusterquotamapping ok
[+]poststarthook/openshift.io-deprecated-api-requests-filter ok
[+]poststarthook/openshift.io-startkubeinformers ok
[+]poststarthook/generic-apiserver-start-informers ok
[+]poststarthook/priority-and-fairness-config-consumer ok
[+]poststarthook/priority-and-fairness-filter ok
[+]poststarthook/storage-object-count-tracker-hook ok
[+]poststarthook/start-apiextensions-informers ok
[+]poststarthook/start-apiextensions-controllers ok
[+]poststarthook/crd-informer-synced ok
[+]poststarthook/bootstrap-controller ok
[-]poststarthook/rbac/bootstrap-roles failed: reason withheld
[-]poststarthook/scheduling/bootstrap-system-priority-classes failed: reason withheld
[+]poststarthook/priority-and-fairness-config-producer ok
[+]poststarthook/start-cluster-authentication-info-controller ok
[+]poststarthook/aggregator-reload-proxy-client-cert ok
[+]poststarthook/start-kube-aggregator-informers ok
[+]poststarthook/apiservice-registration-controller ok
[+]poststarthook/apiservice-status-available-controller ok
[+]poststarthook/apiservice-wait-for-first-sync ok
[+]poststarthook/kube-apiserver-autoregistration ok
[+]autoregister-completion ok
[+]poststarthook/apiservice-openapi-controller ok
[+]poststarthook/apiservice-openapiv3-controller ok
[+]shutdown ok
readyz check failed
Warning ProbeError 146m kubelet Readiness probe error: HTTP probe failed with statuscode: 500
body: [+]ping ok
[+]log ok
[-]etcd failed: reason withheld
[+]etcd-readiness ok
[+]api-openshift-apiserver-available ok
[+]api-openshift-oauth-apiserver-available ok
[+]informer-sync ok
[+]poststarthook/openshift.io-openshift-apiserver-reachable ok
[+]poststarthook/openshift.io-oauth-apiserver-reachable ok
[+]poststarthook/start-kube-apiserver-admission-initializer ok
[+]poststarthook/quota.openshift.io-clusterquotamapping ok
[+]poststarthook/openshift.io-deprecated-api-requests-filter ok
[+]poststarthook/openshift.io-startkubeinformers ok
[+]poststarthook/generic-apiserver-start-informers ok
[+]poststarthook/priority-and-fairness-config-consumer ok
[+]poststarthook/priority-and-fairness-filter ok
[+]poststarthook/storage-object-count-tracker-hook ok
[+]poststarthook/start-apiextensions-informers ok
[+]poststarthook/start-apiextensions-controllers ok
[+]poststarthook/crd-informer-synced ok
[+]poststarthook/bootstrap-controller ok
[+]poststarthook/rbac/bootstrap-roles ok
[+]poststarthook/scheduling/bootstrap-system-priority-classes ok
[+]poststarthook/priority-and-fairness-config-producer ok
[+]poststarthook/start-cluster-authentication-info-controller ok
[+]poststarthook/aggregator-reload-proxy-client-cert ok
[+]poststarthook/start-kube-aggregator-informers ok
[+]poststarthook/apiservice-registration-controller ok
[+]poststarthook/apiservice-status-available-controller ok
[+]poststarthook/apiservice-wait-for-first-sync ok
[+]poststarthook/kube-apiserver-autoregistration ok
[+]autoregister-completion ok
[+]poststarthook/apiservice-openapi-controller ok
[+]poststarthook/apiservice-openapiv3-controller ok
[+]shutdown ok
readyz check failed
Warning Unhealthy 146m (x11 over 173m) kubelet Readiness probe failed: HTTP probe failed with statuscode: 500
Normal Pulled 143m kubelet Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:cb3c465bdfa0ebd38bdb74cd8b80f131dfaf503a4f8c120b7fcb4440eccd6a70" already present on machine
Normal Created 143m kubelet Created container setup
Normal Started 143m kubelet Started container setup
Normal Pulled 143m kubelet Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:cb3c465bdfa0ebd38bdb74cd8b80f131dfaf503a4f8c120b7fcb4440eccd6a70" already present on machine
Normal Created 143m kubelet Created container kube-apiserver
Normal Started 143m kubelet Started container kube-apiserver
Normal Pulled 143m kubelet Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7f4d6dde7aebc339e6fc537f11c8ce94ca0d5a43d01ef18aaa21c3b957819afc" already present on machine
Normal Created 143m kubelet Created container kube-apiserver-check-endpoints
Normal Started 143m kubelet Started container kube-apiserver-cert-syncer
Normal Pulled 143m kubelet Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7f4d6dde7aebc339e6fc537f11c8ce94ca0d5a43d01ef18aaa21c3b957819afc" already present on machine
Normal Created 143m kubelet Created container kube-apiserver-cert-regeneration-controller
Normal Started 143m kubelet Started container kube-apiserver-cert-regeneration-controller
Normal Created 143m kubelet Created container kube-apiserver-cert-syncer
Normal Created 143m kubelet Created container kube-apiserver-insecure-readyz
Normal Started 143m kubelet Started container kube-apiserver-insecure-readyz
Normal Started 143m kubelet Started container kube-apiserver-check-endpoints
Normal Pulled 143m kubelet Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7f4d6dde7aebc339e6fc537f11c8ce94ca0d5a43d01ef18aaa21c3b957819afc" already present on machine
Warning ProbeError 143m kubelet Readiness probe error: Get "https://10.0.99.103:17697/healthz": dial tcp 10.0.99.103:17697: connect: connection refused
body:
Warning Unhealthy 143m kubelet Readiness probe failed: Get "https://10.0.99.103:17697/healthz": dial tcp 10.0.99.103:17697: connect: connection refused
Warning ProbeError 143m kubelet Readiness probe error: HTTP probe failed with statuscode: 403
body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"forbidden: User \"system:anonymous\" cannot get path \"/readyz\"","reason":"Forbidden","details":{},"code":403}
Warning Unhealthy 143m kubelet Readiness probe failed: HTTP probe failed with statuscode: 403
Normal Pulled 143m (x2 over 143m) kubelet Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7f4d6dde7aebc339e6fc537f11c8ce94ca0d5a43d01ef18aaa21c3b957819afc" already present on machine
Warning ProbeError 143m kubelet Liveness probe error: Get "https://10.0.99.103:17697/healthz": dial tcp 10.0.99.103:17697: connect: connection refused
body:
Warning Unhealthy 143m kubelet Liveness probe failed: Get "https://10.0.99.103:17697/healthz": dial tcp 10.0.99.103:17697: connect: connection refused
Normal Created 135m kubelet Created container kube-apiserver-cert-regeneration-controller
Normal Started 135m kubelet Started container kube-apiserver-insecure-readyz
Normal Pulled 135m kubelet Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7f4d6dde7aebc339e6fc537f11c8ce94ca0d5a43d01ef18aaa21c3b957819afc" already present on machine
Normal Created 135m kubelet Created container kube-apiserver-insecure-readyz
Normal Started 135m kubelet Started container kube-apiserver-cert-regeneration-controller
Normal Created 135m (x2 over 135m) kubelet Created container kube-apiserver-check-endpoints
Normal Started 135m (x2 over 135m) kubelet Started container kube-apiserver-check-endpoints
Warning BackOff 134m (x3 over 135m) kubelet Back-off restarting failed container
Normal Pulled 134m (x3 over 135m) kubelet Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7f4d6dde7aebc339e6fc537f11c8ce94ca0d5a43d01ef18aaa21c3b957819afc" already present on machine
Warning BackOff 130m (x20 over 134m) kubelet Back-off restarting failed container
Normal Pulled 1s (x600 over <invalid>) kubelet Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:cb3c465bdfa0ebd38bdb74cd8b80f131dfaf503a4f8c120b7fcb4440eccd6a70" already present on machine
Normal Pulled <invalid> kubelet Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:cb3c465bdfa0ebd38bdb74cd8b80f131dfaf503a4f8c120b7fcb4440eccd6a70" already present on machine
Normal Created <invalid> kubelet Created container setup
Normal Started <invalid> kubelet Started container setup
Normal Created <invalid> kubelet Created container kube-apiserver
Normal Started <invalid> kubelet Started container kube-apiserver
Normal Pulled <invalid> kubelet Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7f4d6dde7aebc339e6fc537f11c8ce94ca0d5a43d01ef18aaa21c3b957819afc" already present on machine
Normal Pulled <invalid> kubelet Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:cb3c465bdfa0ebd38bdb74cd8b80f131dfaf503a4f8c120b7fcb4440eccd6a70" already present on machine
Normal Pulled <invalid> kubelet Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:cb3c465bdfa0ebd38bdb74cd8b80f131dfaf503a4f8c120b7fcb4440eccd6a70" already present on machine
Normal Created <invalid> kubelet Created container setup
Normal Started <invalid> kubelet Started container setup
Normal Created <invalid> kubelet Created container kube-apiserver
Normal Started <invalid> kubelet Started container kube-apiserver
Normal Pulled <invalid> kubelet Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7f4d6dde7aebc339e6fc537f11c8ce94ca0d5a43d01ef18aaa21c3b957819afc" already present on machine
Normal Created <invalid> kubelet Created container kube-apiserver-cert-syncer
Normal Started <invalid> kubelet Started container kube-apiserver-cert-syncer
Normal Pulled <invalid> kubelet Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7f4d6dde7aebc339e6fc537f11c8ce94ca0d5a43d01ef18aaa21c3b957819afc" already present on machine
[weliang@weliang openshift-tests-private]$
[weliang@weliang openshift-tests-private]$ oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.13.0-0.nightly-2023-01-01-223309 True False 4h3m Error while reconciling 4.13.0-0.nightly-2023-01-01-223309: the cluster operator kube-apiserver is degraded
[weliang@weliang openshift-tests-private]$
Tested and verified in 4.12.0-rc.8.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.12.1 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:0449