Bug 1841255
Summary: | MCO firstboot does not handle being interrupted | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Colin Walters <walters> |
Component: | Machine Config Operator | Assignee: | Colin Walters <walters> |
Status: | CLOSED ERRATA | QA Contact: | Antonio Murdaca <amurdaca> |
Severity: | urgent | Docs Contact: | |
Priority: | urgent | ||
Version: | 4.5 | CC: | amurdaca, augol, beth.white, bnemec, jlebon, pehunt, stbenjam, walters |
Target Milestone: | --- | Keywords: | Triaged |
Target Release: | 4.5.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Clone Of: | 1840222 | Environment: | |
Last Closed: | 2020-07-13 17:42:41 UTC | Type: | --- |
Bug Depends On: | 1840222 | ||
Bug Blocks: | 1840301 |
Description
Colin Walters
2020-05-28 17:49:38 UTC
Tested with 4.5.0-0.nightly-2020-06-17-234944.

There's no clear reproduction for this, as the failure is flaky anyway. I've spun up a cluster with the above nightly and made sure the changes to the unit are there (no more BindsTo):

17:25:15 [~/Downloads] export KUBECONFIG=cluster-bot-2020-06-18-142422.kubeconfig
17:25:19 [~/Downloads] oc get nodes
NAME                                         STATUS   ROLES    AGE   VERSION
ip-10-0-163-11.us-west-2.compute.internal    Ready    master   30m   v1.18.3+91d0edd
ip-10-0-163-148.us-west-2.compute.internal   Ready    master   31m   v1.18.3+91d0edd
ip-10-0-177-120.us-west-2.compute.internal   Ready    worker   20m   v1.18.3+91d0edd
ip-10-0-185-241.us-west-2.compute.internal   Ready    worker   20m   v1.18.3+91d0edd
ip-10-0-245-33.us-west-2.compute.internal    Ready    master   30m   v1.18.3+91d0edd
ip-10-0-249-229.us-west-2.compute.internal   Ready    worker   20m   v1.18.3+91d0edd
17:25:24 [~/Downloads] oc debug node ip-10-0-163-11.us-west-2.compute.internal
Error from server (NotFound): pods "node" not found
17:25:30 [~/Downloads] oc debug node/ip-10-0-163-11.us-west-2.compute.internal    1 ↵
Starting pod/ip-10-0-163-11us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.163.11
If you don't see a command prompt, try pressing enter.
sh-4.2# chroot /host bash
[root@ip-10-0-163-11 /]# systemctl cat machine-config-daemon-firstboot.service
# /etc/systemd/system/machine-config-daemon-firstboot.service
[Unit]
Description=Machine Config Daemon Firstboot
# Make sure it runs only on OSTree booted system
ConditionPathExists=/run/ostree-booted
# Removal of this file signals firstboot completion
ConditionPathExists=/etc/ignition-machine-config-encapsulated.json
# We only want to run on 4.3 clusters and above; this came from
# https://github.com/coreos/coreos-assembler/pull/768
ConditionPathExists=/sysroot/.coreos-aleph-version.json
After=ignition-firstboot-complete.service
Before=crio.service crio-wipe.service
Before=kubelet.service

[Service]
# Need oneshot to delay kubelet
Type=oneshot
ExecStart=/usr/libexec/machine-config-daemon firstboot-complete-machineconfig

[Install]
WantedBy=multi-user.target
RequiredBy=crio.service kubelet.service
[root@ip-10-0-163-11 /]#

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409
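
The manual check in the verification comment (reading the `systemctl cat` output and confirming there is no BindsTo) could be scripted along these lines. This is only a sketch: the helper name `unit_lacks_bindsto` and the temporary file path in the usage note are hypothetical and not part of the MCO or of this bug's fix. It assumes the unit text has already been saved to a file.

```shell
# Hypothetical helper (not part of the MCO): exits 0 when the unit file
# given as $1 contains no BindsTo= directive, and non-zero otherwise.
unit_lacks_bindsto() {
    # Match "BindsTo=" at the start of a line, allowing surrounding
    # whitespace; comment lines beginning with '#' do not match.
    ! grep -Eq '^[[:space:]]*BindsTo[[:space:]]*=' "$1"
}
```

On a node, after `chroot /host`, it might be used as `systemctl cat machine-config-daemon-firstboot.service > /tmp/firstboot.unit && unit_lacks_bindsto /tmp/firstboot.unit && echo "no BindsTo"`. It checks only for the presence of the directive and says nothing about whether firstboot actually survives an interruption.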