Bug 1817465
| Summary: | timed out waiting for the condition during syncRequiredMachineConfigPools: error pool master is not ready | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Nikolaos Leandros Moraitis <nmoraiti> |
| Component: | Node | Assignee: | Harshal Patil <harpatil> |
| Status: | CLOSED DUPLICATE | QA Contact: | Sunil Choudhary <schoudha> |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 4.5 | CC: | amurdaca, aos-bugs, bparees, harpatil, jokerman, kgarriso, periklis, rphillips, wking |
| Target Milestone: | --- | | |
| Target Release: | 4.5.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-05-18 14:53:33 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Nikolaos Leandros Moraitis
2020-03-26 11:49:43 UTC
The master pool was ready and completed the upgrade. From the MCC logs:

    I0326 01:12:38.648998       1 status.go:82] Pool master: All nodes are updated with rendered-master-d03f1a11947f94bcb114b096e51147a3

Something happened after that and the node (a master) is gone:

    I0326 01:17:35.815305       1 node_controller.go:433] Pool master: node ip-10-0-138-27.us-west-2.compute.internal is now reporting unready: node ip-10-0-138-27.us-west-2.compute.internal is reporting OutOfDisk=Unknown

From looking at the logs, the MCO has not caused this. The node's conditions at the time:

    conditions: [
      {
        lastHeartbeatTime: "2020-03-26T01:16:24Z",
        lastTransitionTime: "2020-03-26T01:17:35Z",
        message: "Kubelet stopped posting node status.",
        reason: "NodeStatusUnknown",
        status: "Unknown",
        type: "MemoryPressure"
      },
      {
        lastHeartbeatTime: "2020-03-26T01:16:24Z",
        lastTransitionTime: "2020-03-26T01:17:35Z",
        message: "Kubelet stopped posting node status.",
        reason: "NodeStatusUnknown",
        status: "Unknown",
        type: "DiskPressure"
      },
      {
        lastHeartbeatTime: "2020-03-26T01:16:24Z",
        lastTransitionTime: "2020-03-26T01:17:35Z",
        message: "Kubelet stopped posting node status.",
        reason: "NodeStatusUnknown",
        status: "Unknown",
        type: "PIDPressure"
      },
      {
        lastHeartbeatTime: "2020-03-26T01:16:24Z",
        lastTransitionTime: "2020-03-26T01:17:35Z",
        message: "Kubelet stopped posting node status.",
        reason: "NodeStatusUnknown",
        status: "Unknown",
        type: "Ready"
      }
    ]

How often do we see this error in CI jobs? Could it be the cloud provider failing the machine? (That has happened before, to my knowledge.)

Moving to the node team based on Antonio's assessment of a disappearing node.

This general error (timed out waiting for the condition during syncRequiredMachineConfigPools) is showing up a lot:
https://search.svc.ci.openshift.org/?search=timed+out+waiting+for+the+condition+during+syncRequiredMachineConfigPools&maxAge=48h&context=1&type=bug%2Bjunit&name=&maxMatches=5&maxBytes=20971520&groupBy=job

In https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade/23129/artifacts/junit_operator.xml:

    Cluster operator network Degraded is True with RolloutHung: DaemonSet "openshift-multus/multus" rollout is not making progress - last change 2020-03-26T01:17:35Z
    DaemonSet "openshift-sdn/ovs" rollout is not making progress - last change 2020-03-26T01:17:36Z
    DaemonSet "openshift-sdn/sdn" rollout is not making progress - last change 2020-03-26T01:17:36Z"
    level=info msg="Cluster operator network Progressing is True with Deploying: DaemonSet "openshift-multus/multus" is not available (awaiting 1 nodes)
    DaemonSet "openshift-sdn/ovs" is not available (awaiting 1 nodes)
    DaemonSet "openshift-sdn/sdn" is not available (awaiting 1 nodes)"

*** This bug has been marked as a duplicate of bug 1834895 ***
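For anyone retracing the triage above, here is a minimal sketch (not an artifact of this bug; it assumes client-go and a reachable kubeconfig or in-cluster config) of listing nodes whose Ready condition is no longer True, the same "Kubelet stopped posting node status" state reported for ip-10-0-138-27.us-west-2.compute.internal:

```go
// Hypothetical helper, not part of the MCO or this bug's artifacts: list nodes
// whose Ready condition is not True, the state the MCC logged for the lost master.
package main

import (
	"context"
	"fmt"
	"os"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Falls back to in-cluster config when KUBECONFIG is unset.
	config, err := clientcmd.BuildConfigFromFlags("", os.Getenv("KUBECONFIG"))
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(config)

	nodes, err := client.CoreV1().Nodes().List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	for _, node := range nodes.Items {
		for _, cond := range node.Status.Conditions {
			// Ready=Unknown with reason NodeStatusUnknown matches the
			// "Kubelet stopped posting node status." symptom above.
			if cond.Type == corev1.NodeReady && cond.Status != corev1.ConditionTrue {
				fmt.Printf("%s: Ready=%s reason=%s message=%q\n",
					node.Name, cond.Status, cond.Reason, cond.Message)
			}
		}
	}
}
```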
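And a rough sketch of the kind of wait the "syncRequiredMachineConfigPools: error pool master is not ready" timeout refers to: polling the master MachineConfigPool until its Updated condition reports True. This is an illustration using the dynamic client, not the operator's actual code; the group/version/resource and condition names are the standard machineconfiguration.openshift.io/v1 ones, and the poll interval and timeout are arbitrary.

```go
// Illustrative only: wait for the master MachineConfigPool to report Updated=True,
// the readiness the "timed out waiting for the condition" failure never reached.
package main

import (
	"context"
	"fmt"
	"os"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/tools/clientcmd"
)

var mcpGVR = schema.GroupVersionResource{
	Group:    "machineconfiguration.openshift.io",
	Version:  "v1",
	Resource: "machineconfigpools",
}

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", os.Getenv("KUBECONFIG"))
	if err != nil {
		panic(err)
	}
	client := dynamic.NewForConfigOrDie(config)

	// Poll until the pool reports Updated=True or the timeout expires.
	err = wait.PollImmediate(10*time.Second, 10*time.Minute, func() (bool, error) {
		pool, err := client.Resource(mcpGVR).Get(context.TODO(), "master", metav1.GetOptions{})
		if err != nil {
			return false, err
		}
		conditions, _, err := unstructured.NestedSlice(pool.Object, "status", "conditions")
		if err != nil {
			return false, err
		}
		for _, c := range conditions {
			cond, ok := c.(map[string]interface{})
			if !ok {
				continue
			}
			if cond["type"] == "Updated" && cond["status"] == "True" {
				return true, nil
			}
		}
		return false, nil
	})
	if err != nil {
		fmt.Println("pool master is not ready:", err)
	}
}
```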