Bug 1902157
| Field | Value |
|---|---|
| Summary | The DaemonSet machine-api-termination-handler couldn't allocate Pod |
| Product | OpenShift Container Platform |
| Component | Cloud Compute |
| Cloud Compute sub component | Other Providers |
| Version | 4.7 |
| Target Release | 4.7.0 |
| Status | CLOSED ERRATA |
| Severity | medium |
| Priority | medium |
| Reporter | sunzhaohua <zhsun> |
| Assignee | Joel Speed <jspeed> |
| QA Contact | sunzhaohua <zhsun> |
| CC | mimccune |
| Hardware | Unspecified |
| OS | Unspecified |
| Doc Type | No Doc Update |
| Type | Bug |
| Last Closed | 2021-02-24 15:36:28 UTC |

Description
sunzhaohua
2020-11-27 06:36:47 UTC
Failed to verify: the machine-api-termination-handler pod is stuck in CrashLoopBackOff status.

    $ oc get po
    NAME                                           READY   STATUS             RESTARTS   AGE
    cluster-autoscaler-operator-6c655f5d8c-5xjrj   2/2     Running            0          38m
    cluster-baremetal-operator-77c9666dc4-4bdv7    1/1     Running            0          38m
    machine-api-controllers-549cf9c695-v6r9h       7/7     Running            0          42m
    machine-api-operator-5b4fcbdbb7-j7xgf          2/2     Running            0          38m
    machine-api-termination-handler-z7g49          0/1     CrashLoopBackOff   6          7m2s

    $ oc logs -f machine-api-termination-handler-z7g49
    I1130 02:45:09.603797 1 cert_rotation.go:137] Starting client certificate rotation controller
    I1130 02:45:09.703793 1 request.go:581] Throttling request took 76.347045ms, request: GET:https://api-int.zhsunaws30.qe.devcluster.openshift.com:6443/apis/apiextensions.k8s.io/v1beta1?timeout=32s
    E1130 02:45:12.012929 1 main.go:70] "msg"="Error starting termination handler" "error"="error fetching machine for node (\"ip-10-0-219-126.us-east-2.compute.internal\"): error listing machines: machines.machine.openshift.io is forbidden: User \"system:node:ip-10-0-219-126.us-east-2.compute.internal\" cannot list resource \"machines\" in API group \"machine.openshift.io\" in the namespace \"openshift-machine-api\""

It's known that the termination handlers weren't going to work initially; we are completely reimplementing the model of how they work. I've attached the three implementation PRs, and this should be ready to verify once they are all merged.

The PR associated with this has all the necessary labels and is currently waiting on CI to pass.

Verified on AWS, GCP and Azure.
clusterversion: 4.7.0-0.nightly-2020-12-09-112139

    $ oc get ds
    NAME                              DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                                  AGE
    machine-api-termination-handler   1         1         1       1            1           machine.openshift.io/interruptible-instance=   4h56m

    $ oc get po
    NAME                                           READY   STATUS    RESTARTS   AGE
    cluster-autoscaler-operator-6c74c8db9c-tkvj2   2/2     Running   0          4h34m
    cluster-baremetal-operator-6769569bd7-gxfkd    1/1     Running   0          4h34m
    machine-api-controllers-d9c44bd94-7xqtt        7/7     Running   0          4h32m
    machine-api-operator-78d65b4969-7hgwh          2/2     Running   0          4h32m
    machine-api-termination-handler-gj8pf          1/1     Running   0          6m28s

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633
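The original CrashLoopBackOff was an RBAC denial. The handler authenticated as the node user `system:node:ip-10-0-219-126.us-east-2.compute.internal`, and no rule granted that user `list` on `machines` in the `machine.openshift.io` API group. The Go program below is a minimal sketch of why such a request is refused. It uses a simplified, hypothetical model of RBAC rule matching, not OpenShift's authorizer. RBAC is additive, so a request succeeds only if some rule matches its verb, API group and resource. The types, the `allowed` and `forbiddenMessage` helpers, and both rule sets are illustrative assumptions. They are not taken from the machine-api code and do not describe the fix, which the comments above say was a reimplementation of how the termination handlers work.

```go
package main

import "fmt"

// PolicyRule is a simplified, hypothetical model of a Kubernetes RBAC rule.
// It ignores resourceNames, nonResourceURLs, namespace scoping and any
// authorizer other than RBAC.
type PolicyRule struct {
	Verbs     []string
	APIGroups []string
	Resources []string
}

// Request holds the attributes of one API call that rules are checked against.
type Request struct {
	User      string
	Verb      string
	APIGroup  string
	Resource  string
	Namespace string
}

// contains reports whether want is in list; "*" acts as a wildcard.
func contains(list []string, want string) bool {
	for _, v := range list {
		if v == "*" || v == want {
			return true
		}
	}
	return false
}

// allowed grants the request if at least one rule matches its verb, API
// group and resource. RBAC has no deny rules: no match means forbidden.
func allowed(rules []PolicyRule, req Request) bool {
	for _, r := range rules {
		if contains(r.Verbs, req.Verb) &&
			contains(r.APIGroups, req.APIGroup) &&
			contains(r.Resources, req.Resource) {
			return true
		}
	}
	return false
}

// forbiddenMessage formats a denial in the same shape as the error in the log.
func forbiddenMessage(req Request) string {
	return fmt.Sprintf("User %q cannot %s resource %q in API group %q in the namespace %q",
		req.User, req.Verb, req.Resource, req.APIGroup, req.Namespace)
}

func main() {
	// The request that failed in the CrashLoopBackOff log.
	req := Request{
		User:      "system:node:ip-10-0-219-126.us-east-2.compute.internal",
		Verb:      "list",
		APIGroup:  "machine.openshift.io",
		Resource:  "machines",
		Namespace: "openshift-machine-api",
	}

	// Hypothetical rules for the node user: core-group ("") resources only.
	nodeRules := []PolicyRule{
		{Verbs: []string{"get", "list", "watch"}, APIGroups: []string{""}, Resources: []string{"nodes", "pods"}},
	}
	// Hypothetical rule that would match the request, shown for contrast.
	machineReader := []PolicyRule{
		{Verbs: []string{"get", "list", "watch"}, APIGroups: []string{"machine.openshift.io"}, Resources: []string{"machines"}},
	}

	if !allowed(nodeRules, req) {
		fmt.Println("forbidden:", forbiddenMessage(req))
	}
	fmt.Println("allowed with machine reader rule:", allowed(machineReader, req))
}
```

Running it prints a `forbidden:` line with the same shape as the log error for the node user, followed by `allowed with machine reader rule: true`. Matching is done per request attribute. A rule granting `list` in the core group therefore never covers a resource in the `machine.openshift.io` group.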