*** Bug 1847351 has been marked as a duplicate of this bug. ***
Saw https://github.com/openshift/machine-config-operator/pull/1822 is already merged into 4.5.0-0.nightly-2020-06-16-213659.

Installed a 4.4 cluster with "restart.service" using 4.4.0-0.nightly-2020-06-15-203705, then triggered an upgrade towards 4.5.0-0.nightly-2020-06-16-213659. The problem is still reproduced.

[root@preserve-jialiu-ansible ~]# oc get node
NAME                              STATUS                     ROLES           AGE   VERSION
jialiu441-6ssnn-control-plane-0   Ready                      master,worker   22h   v1.17.1+912792b
jialiu441-6ssnn-control-plane-1   Ready,SchedulingDisabled   master,worker   22h   v1.17.1+912792b
jialiu441-6ssnn-control-plane-2   Ready                      master,worker   22h   v1.17.1+912792b

[root@preserve-jialiu-ansible ~]# oc describe node | grep machineconfig
Annotations:  machineconfiguration.openshift.io/currentConfig: rendered-master-3a6779965724f02eb44e835f074bd238
              machineconfiguration.openshift.io/desiredConfig: rendered-master-3a6779965724f02eb44e835f074bd238
              machineconfiguration.openshift.io/reason:
              machineconfiguration.openshift.io/ssh: accessed
              machineconfiguration.openshift.io/state: Done
Annotations:  machineconfiguration.openshift.io/currentConfig: rendered-master-3a6779965724f02eb44e835f074bd238
              machineconfiguration.openshift.io/desiredConfig: rendered-master-88d0f91d9987c81d0569a420f55b3552
              machineconfiguration.openshift.io/reason: unexpected on-disk state validating against rendered-master-3a6779965724f02eb44e835f074bd238
              machineconfiguration.openshift.io/state: Degraded
Annotations:  machineconfiguration.openshift.io/currentConfig: rendered-master-3a6779965724f02eb44e835f074bd238
              machineconfiguration.openshift.io/desiredConfig: rendered-master-3a6779965724f02eb44e835f074bd238
              machineconfiguration.openshift.io/reason:
              machineconfiguration.openshift.io/state: Done

[root@jialiu441-6ssnn-control-plane-1 ~]# systemctl cat restart
# /etc/systemd/system/restart.service
[Unit]
ConditionFirstBoot=yes

[Service]
Type=idle
ExecStart=/sbin/reboot

[Install]
WantedBy=multi-user.target

[root@jialiu441-6ssnn-control-plane-1 ~]# journalctl --system | grep -i reboot
Jun 16 07:23:04 host-10-0-97-148.openstacklocal ignition[1124]: INFO : files: op(34): [started] processing unit "mcd-write-pivot-reboot.service"
Jun 16 07:23:04 host-10-0-97-148.openstacklocal ignition[1124]: INFO : files: op(34): op(35): [started] writing unit "mcd-write-pivot-reboot.service" at "etc/systemd/system/mcd-write-pivot-reboot.service"
Jun 16 07:23:04 host-10-0-97-148.openstacklocal ignition[1124]: INFO : files: op(34): op(35): [finished] writing unit "mcd-write-pivot-reboot.service" at "etc/systemd/system/mcd-write-pivot-reboot.service"
Jun 16 07:23:04 host-10-0-97-148.openstacklocal ignition[1124]: INFO : files: op(34): [finished] processing unit "mcd-write-pivot-reboot.service"
Jun 16 07:23:04 host-10-0-97-148.openstacklocal ignition[1124]: INFO : files: op(36): [started] enabling unit "mcd-write-pivot-reboot.service"
Jun 16 07:23:04 host-10-0-97-148.openstacklocal ignition[1124]: INFO : files: op(36): [finished] enabling unit "mcd-write-pivot-reboot.service"
Jun 16 07:23:06 jialiu441-6ssnn-control-plane-1 ignition[1124]: INFO : files: op(34): [started] processing unit "mcd-write-pivot-reboot.service"
Jun 16 07:23:06 jialiu441-6ssnn-control-plane-1 ignition[1124]: INFO : files: op(34): op(35): [started] writing unit "mcd-write-pivot-reboot.service" at "etc/systemd/system/mcd-write-pivot-reboot.service"
Jun 16 07:23:06 jialiu441-6ssnn-control-plane-1 ignition[1124]: INFO : files: op(34): op(35): [finished] writing unit "mcd-write-pivot-reboot.service" at "etc/systemd/system/mcd-write-pivot-reboot.service"
Jun 16 07:23:06 jialiu441-6ssnn-control-plane-1 ignition[1124]: INFO : files: op(34): [finished] processing unit "mcd-write-pivot-reboot.service"
Jun 16 07:23:06 jialiu441-6ssnn-control-plane-1 ignition[1124]: INFO : files: op(36): [started] enabling unit "mcd-write-pivot-reboot.service"
Jun 16 07:23:06 jialiu441-6ssnn-control-plane-1 ignition[1124]: INFO : files: op(36): [finished] enabling unit "mcd-write-pivot-reboot.service"
Jun 16 07:23:07 jialiu441-6ssnn-control-plane-1 restorecon[1478]: Relabeled /etc/systemd/system/mcd-write-pivot-reboot.service from system_u:object_r:unlabeled_t:s0 to system_u:object_r:power_unit_file_t:s0
Jun 16 07:23:08 jialiu441-6ssnn-control-plane-1 systemd[1]: Started mcd-write-pivot-reboot.service.
Jun 16 07:23:08 jialiu441-6ssnn-control-plane-1 systemd[1]: mcd-write-pivot-reboot.service: Consumed 11ms CPU time
Jun 16 07:23:13 jialiu441-6ssnn-control-plane-1 systemd[1]: Starting Reboot...
-- Reboot --
Jun 16 07:28:59 jialiu441-6ssnn-control-plane-1 root[13776]: machine-config-daemon[5042]: initiating reboot: Node will reboot into config rendered-master-3a6779965724f02eb44e835f074bd238
Jun 16 07:28:59 jialiu441-6ssnn-control-plane-1 hyperkube[1418]: I0616 07:28:59.798844 1418 factory.go:173] Using factory "raw" for container "/system.slice/machine-config-daemon-reboot.service"
Jun 16 07:28:59 jialiu441-6ssnn-control-plane-1 hyperkube[1418]: I0616 07:28:59.799287 1418 manager.go:950] Added container: "/system.slice/machine-config-daemon-reboot.service" (aliases: [], namespace: "")
Jun 16 07:28:59 jialiu441-6ssnn-control-plane-1 hyperkube[1418]: I0616 07:28:59.799510 1418 container.go:467] Start housekeeping for container "/system.slice/machine-config-daemon-reboot.service"
Jun 16 07:28:59 jialiu441-6ssnn-control-plane-1 systemd[1]: Started machine-config-daemon: Node will reboot into config rendered-master-3a6779965724f02eb44e835f074bd238.
Jun 16 07:28:59 jialiu441-6ssnn-control-plane-1 systemd-logind[1296]: System is rebooting.
Jun 16 07:28:59 jialiu441-6ssnn-control-plane-1 systemd[1]: Stopped machine-config-daemon: Node will reboot into config rendered-master-3a6779965724f02eb44e835f074bd238.
Jun 16 07:28:59 jialiu441-6ssnn-control-plane-1 systemd[1]: machine-config-daemon-reboot.service: Consumed 22ms CPU time
Jun 16 07:30:30 jialiu441-6ssnn-control-plane-1 systemd[1]: Starting Reboot...
-- Reboot --
Jun 16 07:31:47 jialiu441-6ssnn-control-plane-1 hyperkube[1430]: I0616 07:31:47.294622 1430 event.go:281] Event(v1.ObjectReference{Kind:"Node", Namespace:"", Name:"jialiu441-6ssnn-control-plane-1", UID:"jialiu441-6ssnn-control-plane-1", APIVersion:"", ResourceVersion:"", FieldPath:""}): type: 'Warning' reason: 'Rebooted' Node jialiu441-6ssnn-control-plane-1 has been rebooted, boot id: 6da77e17-0757-457e-982b-b661b2f727e9
Jun 17 03:56:44 jialiu441-6ssnn-control-plane-1 root[3610152]: machine-config-daemon[3594232]: initiating reboot: Node will reboot into config rendered-master-88d0f91d9987c81d0569a420f55b3552
Jun 17 03:56:44 jialiu441-6ssnn-control-plane-1 hyperkube[1430]: I0617 03:56:44.351852 1430 factory.go:173] Using factory "raw" for container "/system.slice/machine-config-daemon-reboot.service"
Jun 17 03:56:44 jialiu441-6ssnn-control-plane-1 hyperkube[1430]: I0617 03:56:44.352565 1430 manager.go:950] Added container: "/system.slice/machine-config-daemon-reboot.service" (aliases: [], namespace: "")
Jun 17 03:56:44 jialiu441-6ssnn-control-plane-1 hyperkube[1430]: I0617 03:56:44.354485 1430 container.go:467] Start housekeeping for container "/system.slice/machine-config-daemon-reboot.service"
Jun 17 03:56:44 jialiu441-6ssnn-control-plane-1 systemd[1]: Started machine-config-daemon: Node will reboot into config rendered-master-88d0f91d9987c81d0569a420f55b3552.
Jun 17 03:56:44 jialiu441-6ssnn-control-plane-1 systemd-logind[1294]: System is rebooting.
Jun 17 03:56:44 jialiu441-6ssnn-control-plane-1 systemd[1]: Stopped machine-config-daemon: Node will reboot into config rendered-master-88d0f91d9987c81d0569a420f55b3552.
Jun 17 03:56:44 jialiu441-6ssnn-control-plane-1 systemd[1]: machine-config-daemon-reboot.service: Consumed 22ms CPU time
Jun 17 03:58:14 jialiu441-6ssnn-control-plane-1 systemd[1]: Starting Reboot...
-- Reboot --
Jun 17 03:59:56 jialiu441-6ssnn-control-plane-1 machine-config-daemon[1348]: I0617 03:59:56.813518 1348 update.go:1404] initiating reboot: Completing firstboot provisioning to rendered-master-3a6779965724f02eb44e835f074bd238
Jun 17 03:59:56 jialiu441-6ssnn-control-plane-1 root[1585]: machine-config-daemon[1348]: initiating reboot: Completing firstboot provisioning to rendered-master-3a6779965724f02eb44e835f074bd238
Jun 17 03:59:56 jialiu441-6ssnn-control-plane-1 systemd-logind[1326]: System is rebooting.
Jun 17 03:59:56 jialiu441-6ssnn-control-plane-1 systemd[1]: machine-config-daemon-reboot.service: Consumed 25ms CPU time
Jun 17 04:00:03 jialiu441-6ssnn-control-plane-1 systemd[1]: Starting Reboot...
-- Reboot --
Jun 17 04:00:58 jialiu441-6ssnn-control-plane-1 hyperkube[1444]: I0617 04:00:58.498405 1444 event.go:281] Event(v1.ObjectReference{Kind:"Node", Namespace:"", Name:"jialiu441-6ssnn-control-plane-1", UID:"jialiu441-6ssnn-control-plane-1", APIVersion:"", ResourceVersion:"", FieldPath:""}): type: 'Warning' reason: 'Rebooted' Node jialiu441-6ssnn-control-plane-1 has been rebooted, boot id: 22f89695-1e50-4517-8da4-e1249d6566f7
Can you get more logs (full journals plus the MCD logs) or just add a kubeconfig?
(In reply to Colin Walters from comment #5)
> Can you get more logs (full journals plus the MCD logs) or just add a
> kubeconfig?

Johnny, can you help with the above?
I'm now confused as to how the 4.6 BZ got verified then.

    if state.bootstrapping {

The `if` above, where we inserted the removal of the encapsulated JSON, isn't called during an upgrade. So what seems to be happening adds up: we never run the new hack code, and thus the encapsulated JSON is still around and causes this issue. Should we just, uhm, get rid of that file first thing (bootstrapping or not) when we're running the MCD?
Created https://github.com/openshift/machine-config-operator/pull/1844 (which now targets the 4.6 BZ) for verification.
I think the verification that happened in https://bugzilla.redhat.com/show_bug.cgi?id=1842906#c55 was a lucky one:

Jun 15 14:46:48 jialiu452-p9652-control-plane-2 machine-config-daemon[1345]: I0615 14:46:48.011423 1345 update.go:1346] initiating reboot: Completing firstboot provisioning to rendered-master-5c6dfc33b0a6115a3a3c25831d6c8009

This means we got there and weren't killed on the way, so the encapsulated JSON is now gone (which is correct). The code in the PR that merged for the 4.6 bug couldn't have had an opportunity to run, since we were effectively upgrading, and the PR changes the very first initial bootstrap, which happens only on installation (maybe I'm off though, so seeking further confirmation :))
(In reply to Antonio Murdaca from comment #11)
> I'm now confused as to how the 4.6 BZ got verified then.

We were running a 4.5 -> 4.6 upgrade. Per QE's black-box test results, the issue was shown not to happen in 4.5 -> 4.6, only in 4.4 -> 4.5. So in the next round of testing, QE will only verify that the fix PR in the 4.6 bug (4.5 -> 4.6 upgrade) does not introduce any problem, so as to unblock the backport to 4.5, and will run the real functional verification in this bug (4.4 -> 4.5 upgrade).
(In reply to Johnny Liu from comment #14)
> We was running 4.5 -> 4.6 upgrade, per QE's black-box test result, it was
> proved such issue does not happen in 4.5 -> 4.6, but only 4.4 -> 4.5, so in
> the next round of testing, QE will only verify the fix PR in 4.6 bug (4.5 ->
> 4.6 upgrade) does not introduce any problem, so that unblock backport to
> 4.5, will run real functional verification in this bug (4.4 -> 4.5 upgrade).

Ohh, I see, thanks for clarifying that, +1.

We're going ahead with https://github.com/openshift/machine-config-operator/pull/1844 then, to make the 4.6 BZ happy again: https://bugzilla.redhat.com/show_bug.cgi?id=1842906

Once that merges, we'll have a fix ready for this 4.4 -> 4.5 BZ. Sounds good?
> sounds good?

+1
(In reply to Antonio Murdaca from comment #11)
> I'm now confused as to how the 4.6 BZ got verified then.

Didn't notice this earlier. The 4.6 BZ was verified by upgrading 4.5 -> 4.6 (https://bugzilla.redhat.com/show_bug.cgi?id=1842906#c55). We removed BindsTo= in 4.5 (https://github.com/openshift/machine-config-operator/pull/1762/files#diff-4eb88ea6718e4dcceecd6dfe6e4437e1L8), so the m-c-d-firstboot service would have completed when the 4.5 cluster was installed, even with an additional reboot (due to reboot.service). The best way to verify https://bugzilla.redhat.com/show_bug.cgi?id=1842906 would be to upgrade from 4.4 -> 4.6 in the environment where the bug is happening.
Verified this bug with 4.5.0-0.nightly-2020-06-20-194346, and it PASSED.

[jialiu@mylaptop ~]$ oc get co
NAME                                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.4.0-0.nightly-2020-06-22-065157   True        False         False      26m
cloud-credential                           4.4.0-0.nightly-2020-06-22-065157   True        False         False      60m
cluster-autoscaler                         4.4.0-0.nightly-2020-06-22-065157   True        False         False      29m
console                                    4.4.0-0.nightly-2020-06-22-065157   True        False         False      25m
csi-snapshot-controller                    4.4.0-0.nightly-2020-06-22-065157   True        False         False      33m
dns                                        4.4.0-0.nightly-2020-06-22-065157   True        False         False      56m
etcd                                       4.4.0-0.nightly-2020-06-22-065157   True        False         False      52m
image-registry                             4.4.0-0.nightly-2020-06-22-065157   True        False         False      32m
ingress                                    4.4.0-0.nightly-2020-06-22-065157   True        False         False      32m
insights                                   4.4.0-0.nightly-2020-06-22-065157   True        False         False      33m
kube-apiserver                             4.4.0-0.nightly-2020-06-22-065157   True        False         False      48m
kube-controller-manager                    4.4.0-0.nightly-2020-06-22-065157   True        False         False      51m
kube-scheduler                             4.4.0-0.nightly-2020-06-22-065157   True        False         False      33m
kube-storage-version-migrator              4.4.0-0.nightly-2020-06-22-065157   True        False         False      57m
machine-api                                4.4.0-0.nightly-2020-06-22-065157   True        False         False      33m
machine-config                             4.4.0-0.nightly-2020-06-22-065157   True        False         False      33m
marketplace                                4.4.0-0.nightly-2020-06-22-065157   True        False         False      32m
monitoring                                 4.4.0-0.nightly-2020-06-22-065157   True        False         False      26m
network                                    4.4.0-0.nightly-2020-06-22-065157   True        False         False      58m
node-tuning                                4.4.0-0.nightly-2020-06-22-065157   True        False         False      58m
openshift-apiserver                        4.4.0-0.nightly-2020-06-22-065157   True        False         False      33m
openshift-controller-manager               4.4.0-0.nightly-2020-06-22-065157   True        False         False      30m
openshift-samples                          4.4.0-0.nightly-2020-06-22-065157   True        False         False      29m
operator-lifecycle-manager                 4.4.0-0.nightly-2020-06-22-065157   True        False         False      57m
operator-lifecycle-manager-catalog         4.4.0-0.nightly-2020-06-22-065157   True        False         False      57m
operator-lifecycle-manager-packageserver   4.4.0-0.nightly-2020-06-22-065157   True        False         False      31m
service-ca                                 4.4.0-0.nightly-2020-06-22-065157   True        False         False      57m
service-catalog-apiserver                  4.4.0-0.nightly-2020-06-22-065157   True        False         False      58m
service-catalog-controller-manager         4.4.0-0.nightly-2020-06-22-065157   True        False         False      58m
storage                                    4.4.0-0.nightly-2020-06-22-065157   True        False         False      33m

[jialiu@mylaptop ~]$ oc adm upgrade --to-image=registry.svc.ci.openshift.org/ocp/release:4.5.0-0.nightly-2020-06-20-194346 --allow-explicit-upgrade --force
Updating to release image registry.svc.ci.openshift.org/ocp/release:4.5.0-0.nightly-2020-06-20-194346

[jialiu@mylaptop ~]$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.4.0-0.nightly-2020-06-22-065157   True        True          18s     Working towards registry.svc.ci.openshift.org/ocp/release:4.5.0-0.nightly-2020-06-20-194346: downloading update

[jialiu@mylaptop ~]$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.5.0-0.nightly-2020-06-20-194346   True        False         112m    Cluster version is 4.5.0-0.nightly-2020-06-20-194346

[jialiu@mylaptop ~]$ oc get co
NAME                                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.5.0-0.nightly-2020-06-20-194346   True        False         False      3h2m
cloud-credential                           4.5.0-0.nightly-2020-06-20-194346   True        False         False      3h36m
cluster-autoscaler                         4.5.0-0.nightly-2020-06-20-194346   True        False         False      3h5m
config-operator                            4.5.0-0.nightly-2020-06-20-194346   True        False         False      153m
console                                    4.5.0-0.nightly-2020-06-20-194346   True        False         False      121m
csi-snapshot-controller                    4.5.0-0.nightly-2020-06-20-194346   True        False         False      126m
dns                                        4.5.0-0.nightly-2020-06-20-194346   True        False         False      3h32m
etcd                                       4.5.0-0.nightly-2020-06-20-194346   True        False         False      3h28m
image-registry                             4.5.0-0.nightly-2020-06-20-194346   True        False         False      3h8m
ingress                                    4.5.0-0.nightly-2020-06-20-194346   True        False         False      3h8m
insights                                   4.5.0-0.nightly-2020-06-20-194346   True        False         False      3h9m
kube-apiserver                             4.5.0-0.nightly-2020-06-20-194346   True        False         False      3h24m
kube-controller-manager                    4.5.0-0.nightly-2020-06-20-194346   True        False         False      3h27m
kube-scheduler                             4.5.0-0.nightly-2020-06-20-194346   True        False         False      3h9m
kube-storage-version-migrator              4.5.0-0.nightly-2020-06-20-194346   True        False         False      127m
machine-api                                4.5.0-0.nightly-2020-06-20-194346   True        False         False      3h9m
machine-approver                           4.5.0-0.nightly-2020-06-20-194346   True        False         False      148m
machine-config                             4.5.0-0.nightly-2020-06-20-194346   True        False         False      116m
marketplace                                4.5.0-0.nightly-2020-06-20-194346   True        False         False      120m
monitoring                                 4.5.0-0.nightly-2020-06-20-194346   True        False         False      145m
network                                    4.5.0-0.nightly-2020-06-20-194346   True        False         False      3h34m
node-tuning                                4.5.0-0.nightly-2020-06-20-194346   True        False         False      148m
openshift-apiserver                        4.5.0-0.nightly-2020-06-20-194346   True        False         False      124m
openshift-controller-manager               4.5.0-0.nightly-2020-06-20-194346   True        False         False      146m
openshift-samples                          4.5.0-0.nightly-2020-06-20-194346   True        False         False      148m
operator-lifecycle-manager                 4.5.0-0.nightly-2020-06-20-194346   True        False         False      3h33m
operator-lifecycle-manager-catalog         4.5.0-0.nightly-2020-06-20-194346   True        False         False      3h33m
operator-lifecycle-manager-packageserver   4.5.0-0.nightly-2020-06-20-194346   True        False         False      121m
service-ca                                 4.5.0-0.nightly-2020-06-20-194346   True        False         False      3h34m
storage                                    4.5.0-0.nightly-2020-06-20-194346   True        False         False      148m

[core@jialiu44-dwtvq-control-plane-0 ~]$ journalctl --system | grep -i reboot
Jun 22 09:53:35 host-10-0-97-126.openstacklocal ignition[1122]: INFO : files: op(34): [started] processing unit "mcd-write-pivot-reboot.service"
Jun 22 09:53:35 host-10-0-97-126.openstacklocal ignition[1122]: INFO : files: op(34): op(35): [started] writing unit "mcd-write-pivot-reboot.service" at "etc/systemd/system/mcd-write-pivot-reboot.service"
Jun 22 09:53:35 host-10-0-97-126.openstacklocal ignition[1122]: INFO : files: op(34): op(35): [finished] writing unit "mcd-write-pivot-reboot.service" at "etc/systemd/system/mcd-write-pivot-reboot.service"
Jun 22 09:53:35 host-10-0-97-126.openstacklocal ignition[1122]: INFO : files: op(34): [finished] processing unit "mcd-write-pivot-reboot.service"
Jun 22 09:53:35 host-10-0-97-126.openstacklocal ignition[1122]: INFO : files: op(36): [started] enabling unit "mcd-write-pivot-reboot.service"
Jun 22 09:53:35 host-10-0-97-126.openstacklocal ignition[1122]: INFO : files: op(36): [finished] enabling unit "mcd-write-pivot-reboot.service"
Jun 22 09:53:39 jialiu44-dwtvq-control-plane-0 restorecon[1478]: Relabeled /etc/systemd/system/mcd-write-pivot-reboot.service from system_u:object_r:unlabeled_t:s0 to system_u:object_r:power_unit_file_t:s0
Jun 22 09:53:39 jialiu44-dwtvq-control-plane-0 systemd[1]: Started mcd-write-pivot-reboot.service.
Jun 22 09:53:39 jialiu44-dwtvq-control-plane-0 systemd[1]: mcd-write-pivot-reboot.service: Consumed 4ms CPU time
Jun 22 09:53:45 jialiu44-dwtvq-control-plane-0 systemd[1]: Starting Reboot...
-- Reboot --
Jun 22 09:58:52 jialiu44-dwtvq-control-plane-0 root[14273]: machine-config-daemon[7242]: initiating reboot: Node will reboot into config rendered-master-994a41b33fe4832aa73597dc0b31323e
Jun 22 09:58:52 jialiu44-dwtvq-control-plane-0 hyperkube[1414]: I0622 09:58:52.412149 1414 factory.go:173] Using factory "raw" for container "/system.slice/machine-config-daemon-reboot.service"
Jun 22 09:58:52 jialiu44-dwtvq-control-plane-0 hyperkube[1414]: I0622 09:58:52.412687 1414 manager.go:950] Added container: "/system.slice/machine-config-daemon-reboot.service" (aliases: [], namespace: "")
Jun 22 09:58:52 jialiu44-dwtvq-control-plane-0 hyperkube[1414]: I0622 09:58:52.413024 1414 container.go:467] Start housekeeping for container "/system.slice/machine-config-daemon-reboot.service"
Jun 22 09:58:52 jialiu44-dwtvq-control-plane-0 systemd[1]: Started machine-config-daemon: Node will reboot into config rendered-master-994a41b33fe4832aa73597dc0b31323e.
Jun 22 09:58:52 jialiu44-dwtvq-control-plane-0 systemd-logind[1288]: System is rebooting.
Jun 22 09:58:52 jialiu44-dwtvq-control-plane-0 systemd[1]: Stopping machine-config-daemon: Node will reboot into config rendered-master-994a41b33fe4832aa73597dc0b31323e...
Jun 22 09:58:52 jialiu44-dwtvq-control-plane-0 systemd[1]: Stopped machine-config-daemon: Node will reboot into config rendered-master-994a41b33fe4832aa73597dc0b31323e.
Jun 22 09:58:52 jialiu44-dwtvq-control-plane-0 systemd[1]: machine-config-daemon-reboot.service: Consumed 21ms CPU time
Jun 22 10:00:22 jialiu44-dwtvq-control-plane-0 systemd[1]: Starting Reboot...
-- Reboot --
Jun 22 10:02:17 jialiu44-dwtvq-control-plane-0 hyperkube[1434]: I0622 10:02:17.624501 1434 event.go:281] Event(v1.ObjectReference{Kind:"Node", Namespace:"", Name:"jialiu44-dwtvq-control-plane-0", UID:"jialiu44-dwtvq-control-plane-0", APIVersion:"", ResourceVersion:"", FieldPath:""}): type: 'Warning' reason: 'Rebooted' Node jialiu44-dwtvq-control-plane-0 has been rebooted, boot id: 98f8dab9-117f-4730-8f35-b430f9b0dc7a
Jun 22 11:26:10 jialiu44-dwtvq-control-plane-0 root[282527]: machine-config-daemon[228455]: initiating reboot: Node will reboot into config rendered-master-c564bd7f1ddf708bc43523a3cb2ac4f3
Jun 22 11:26:10 jialiu44-dwtvq-control-plane-0 hyperkube[1434]: I0622 11:26:10.295287 1434 factory.go:173] Using factory "raw" for container "/system.slice/machine-config-daemon-reboot.service"
Jun 22 11:26:10 jialiu44-dwtvq-control-plane-0 hyperkube[1434]: I0622 11:26:10.295930 1434 manager.go:950] Added container: "/system.slice/machine-config-daemon-reboot.service" (aliases: [], namespace: "")
Jun 22 11:26:10 jialiu44-dwtvq-control-plane-0 hyperkube[1434]: I0622 11:26:10.296199 1434 container.go:467] Start housekeeping for container "/system.slice/machine-config-daemon-reboot.service"
Jun 22 11:26:10 jialiu44-dwtvq-control-plane-0 systemd[1]: Started machine-config-daemon: Node will reboot into config rendered-master-c564bd7f1ddf708bc43523a3cb2ac4f3.
Jun 22 11:26:10 jialiu44-dwtvq-control-plane-0 systemd-logind[1297]: System is rebooting.
Jun 22 11:26:10 jialiu44-dwtvq-control-plane-0 systemd[1]: Stopping machine-config-daemon: Node will reboot into config rendered-master-c564bd7f1ddf708bc43523a3cb2ac4f3...
Jun 22 11:26:10 jialiu44-dwtvq-control-plane-0 systemd[1]: Stopped machine-config-daemon: Node will reboot into config rendered-master-c564bd7f1ddf708bc43523a3cb2ac4f3.
Jun 22 11:26:10 jialiu44-dwtvq-control-plane-0 systemd[1]: machine-config-daemon-reboot.service: Consumed 24ms CPU time
Jun 22 11:27:40 jialiu44-dwtvq-control-plane-0 systemd[1]: Starting Reboot...
-- Reboot --
Jun 22 11:28:43 jialiu44-dwtvq-control-plane-0 hyperkube[1491]: I0622 11:28:43.785412 1491 event.go:278] Event(v1.ObjectReference{Kind:"Node", Namespace:"", Name:"jialiu44-dwtvq-control-plane-0", UID:"jialiu44-dwtvq-control-plane-0", APIVersion:"", ResourceVersion:"", FieldPath:""}): type: 'Warning' reason: 'Rebooted' Node jialiu44-dwtvq-control-plane-0 has been rebooted, boot id: 87e3739b-0984-4b9d-bfc2-06468d7ae4df
Comment 19 was run against a UPI-on-baremetal cluster; tomorrow I will run an upgrade against UPI on vSphere for a double confirmation.
Verified the bug with 4.5.0-0.nightly-2020-06-22-201506 (upgrade from 4.4.9) and it passed.

$ oc adm upgrade --to-image registry.svc.ci.openshift.org/ocp/release:4.5.0-0.nightly-2020-06-22-201506 --force --allow-explicit-upgrade
Updating to release image registry.svc.ci.openshift.org/ocp/release:4.5.0-0.nightly-2020-06-22-201506

$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.4.9     True        True          17m     Working towards 4.5.0-0.nightly-2020-06-22-201506: 79% complete

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.5.0-0.nightly-2020-06-22-201506   True        False         13m     Cluster version is 4.5.0-0.nightly-2020-06-22-201506

$ oc get co
NAME                                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.5.0-0.nightly-2020-06-22-201506   True        False         False      66m
cloud-credential                           4.5.0-0.nightly-2020-06-22-201506   True        False         False      85m
cluster-autoscaler                         4.5.0-0.nightly-2020-06-22-201506   True        False         False      73m
config-operator                            4.5.0-0.nightly-2020-06-22-201506   True        False         False      48m
console                                    4.5.0-0.nightly-2020-06-22-201506   True        False         False      15m
csi-snapshot-controller                    4.5.0-0.nightly-2020-06-22-201506   True        False         False      11m
dns                                        4.5.0-0.nightly-2020-06-22-201506   True        False         False      77m
etcd                                       4.5.0-0.nightly-2020-06-22-201506   True        False         False      77m
image-registry                             4.5.0-0.nightly-2020-06-22-201506   True        False         False      12m
ingress                                    4.5.0-0.nightly-2020-06-22-201506   True        False         False      69m
insights                                   4.5.0-0.nightly-2020-06-22-201506   True        False         False      74m
kube-apiserver                             4.5.0-0.nightly-2020-06-22-201506   True        False         False      76m
kube-controller-manager                    4.5.0-0.nightly-2020-06-22-201506   True        False         False      80m
kube-scheduler                             4.5.0-0.nightly-2020-06-22-201506   True        False         False      76m
kube-storage-version-migrator              4.5.0-0.nightly-2020-06-22-201506   True        False         False      12m
machine-api                                4.5.0-0.nightly-2020-06-22-201506   True        False         False      74m
machine-approver                           4.5.0-0.nightly-2020-06-22-201506   True        False         False      40m
machine-config                             4.5.0-0.nightly-2020-06-22-201506   True        False         False      4m49s
marketplace                                4.5.0-0.nightly-2020-06-22-201506   True        False         False      14m
monitoring                                 4.5.0-0.nightly-2020-06-22-201506   True        False         False      37m
network                                    4.5.0-0.nightly-2020-06-22-201506   True        False         False      83m
node-tuning                                4.5.0-0.nightly-2020-06-22-201506   True        False         False      40m
openshift-apiserver                        4.5.0-0.nightly-2020-06-22-201506   True        False         False      39m
openshift-controller-manager               4.5.0-0.nightly-2020-06-22-201506   True        False         False      70m
openshift-samples                          4.5.0-0.nightly-2020-06-22-201506   True        False         False      39m
operator-lifecycle-manager                 4.5.0-0.nightly-2020-06-22-201506   True        False         False      82m
operator-lifecycle-manager-catalog         4.5.0-0.nightly-2020-06-22-201506   True        False         False      82m
operator-lifecycle-manager-packageserver   4.5.0-0.nightly-2020-06-22-201506   True        False         False      14m
service-ca                                 4.5.0-0.nightly-2020-06-22-201506   True        False         False      82m
storage                                    4.5.0-0.nightly-2020-06-22-201506   True        False         False      40m

[root@control-plane-0 ~]# journalctl --system | grep -i reboot
Jun 23 02:01:40 localhost ignition[1108]: INFO : files: op(3f): [started] processing unit "mcd-write-pivot-reboot.service"
Jun 23 02:01:40 localhost ignition[1108]: INFO : files: op(3f): op(40): [started] writing unit "mcd-write-pivot-reboot.service" at "etc/systemd/system/mcd-write-pivot-reboot.service"
Jun 23 02:01:40 localhost ignition[1108]: INFO : files: op(3f): op(40): [finished] writing unit "mcd-write-pivot-reboot.service" at "etc/systemd/system/mcd-write-pivot-reboot.service"
Jun 23 02:01:40 localhost ignition[1108]: INFO : files: op(3f): [finished] processing unit "mcd-write-pivot-reboot.service"
Jun 23 02:01:40 localhost ignition[1108]: INFO : files: op(41): [started] enabling unit "mcd-write-pivot-reboot.service"
Jun 23 02:01:40 localhost ignition[1108]: INFO : files: op(41): [finished] enabling unit "mcd-write-pivot-reboot.service"
Jun 23 02:01:42 control-plane-0 ignition[1108]: INFO : files: op(3f): [started] processing unit "mcd-write-pivot-reboot.service"
Jun 23 02:01:42 control-plane-0 ignition[1108]: INFO : files: op(3f): op(40): [started] writing unit "mcd-write-pivot-reboot.service" at "etc/systemd/system/mcd-write-pivot-reboot.service"
Jun 23 02:01:42 control-plane-0 ignition[1108]: INFO : files: op(3f): op(40): [finished] writing unit "mcd-write-pivot-reboot.service" at "etc/systemd/system/mcd-write-pivot-reboot.service"
Jun 23 02:01:42 control-plane-0 ignition[1108]: INFO : files: op(3f): [finished] processing unit "mcd-write-pivot-reboot.service"
Jun 23 02:01:42 control-plane-0 ignition[1108]: INFO : files: op(41): [started] enabling unit "mcd-write-pivot-reboot.service"
Jun 23 02:01:42 control-plane-0 ignition[1108]: INFO : files: op(41): [finished] enabling unit "mcd-write-pivot-reboot.service"
Jun 23 02:01:43 control-plane-0 restorecon[1437]: Relabeled /etc/systemd/system/mcd-write-pivot-reboot.service from system_u:object_r:unlabeled_t:s0 to system_u:object_r:power_unit_file_t:s0
Jun 23 02:01:43 control-plane-0 systemd[1]: Started mcd-write-pivot-reboot.service.
Jun 23 02:01:43 control-plane-0 systemd[1]: mcd-write-pivot-reboot.service: Consumed 6ms CPU time
Jun 23 02:01:49 control-plane-0 systemd[1]: Starting Reboot...
-- Reboot --
Jun 23 02:07:44 control-plane-0 root[11282]: machine-config-daemon[5706]: initiating reboot: Node will reboot into config rendered-master-657bb47f985de9c41d0cb8273d3b6c80
Jun 23 02:07:44 control-plane-0 systemd[1]: Started machine-config-daemon: Node will reboot into config rendered-master-657bb47f985de9c41d0cb8273d3b6c80.
Jun 23 02:07:44 control-plane-0 hyperkube[1385]: I0623 02:07:44.463809 1385 factory.go:173] Using factory "raw" for container "/system.slice/machine-config-daemon-reboot.service"
Jun 23 02:07:44 control-plane-0 hyperkube[1385]: I0623 02:07:44.464273 1385 manager.go:950] Added container: "/system.slice/machine-config-daemon-reboot.service" (aliases: [], namespace: "")
Jun 23 02:07:44 control-plane-0 hyperkube[1385]: I0623 02:07:44.464515 1385 container.go:467] Start housekeeping for container "/system.slice/machine-config-daemon-reboot.service"
Jun 23 02:07:44 control-plane-0 systemd-logind[1269]: System is rebooting.
Jun 23 02:07:44 control-plane-0 systemd[1]: Stopping machine-config-daemon: Node will reboot into config rendered-master-657bb47f985de9c41d0cb8273d3b6c80...
Jun 23 02:07:44 control-plane-0 systemd[1]: Stopped machine-config-daemon: Node will reboot into config rendered-master-657bb47f985de9c41d0cb8273d3b6c80.
Jun 23 02:07:44 control-plane-0 systemd[1]: machine-config-daemon-reboot.service: Consumed 26ms CPU time
Jun 23 02:09:14 control-plane-0 systemd[1]: Starting Reboot...
-- Reboot --
Jun 23 02:10:53 control-plane-0 hyperkube[1409]: I0623 02:10:53.766564 1409 event.go:281] Event(v1.ObjectReference{Kind:"Node", Namespace:"", Name:"control-plane-0", UID:"control-plane-0", APIVersion:"", ResourceVersion:"", FieldPath:""}): type: 'Warning' reason: 'Rebooted' Node control-plane-0 has been rebooted, boot id: 0bdf958e-ef27-4451-a18d-0685aaf29fe1
Jun 23 03:14:21 control-plane-0 root[173767]: machine-config-daemon[145700]: initiating reboot: Node will reboot into config rendered-master-cd63ef41104a49caa266c18da6ce18aa
Jun 23 03:14:21 control-plane-0 hyperkube[1409]: I0623 03:14:21.752737 1409 factory.go:173] Using factory "raw" for container "/system.slice/machine-config-daemon-reboot.service"
Jun 23 03:14:21 control-plane-0 hyperkube[1409]: I0623 03:14:21.753114 1409 manager.go:950] Added container: "/system.slice/machine-config-daemon-reboot.service" (aliases: [], namespace: "")
Jun 23 03:14:21 control-plane-0 hyperkube[1409]: I0623 03:14:21.753340 1409 container.go:467] Start housekeeping for container "/system.slice/machine-config-daemon-reboot.service"
Jun 23 03:14:21 control-plane-0 systemd[1]: Started machine-config-daemon: Node will reboot into config rendered-master-cd63ef41104a49caa266c18da6ce18aa.
Jun 23 03:14:21 control-plane-0 systemd-logind[1281]: System is rebooting.
Jun 23 03:14:21 control-plane-0 systemd[1]: Stopped machine-config-daemon: Node will reboot into config rendered-master-cd63ef41104a49caa266c18da6ce18aa.
Jun 23 03:14:21 control-plane-0 systemd[1]: machine-config-daemon-reboot.service: Consumed 24ms CPU time
Jun 23 03:15:52 control-plane-0 systemd[1]: Starting Reboot...
-- Reboot --
Jun 23 03:17:17 control-plane-0 hyperkube[1496]: I0623 03:17:17.042669 1496 event.go:278] Event(v1.ObjectReference{Kind:"Node", Namespace:"", Name:"control-plane-0", UID:"control-plane-0", APIVersion:"", ResourceVersion:"", FieldPath:""}): type: 'Warning' reason: 'Rebooted' Node control-plane-0 has been rebooted, boot id: 78c53ec7-c0ec-49d6-8712-65b246b42c9a
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:2409
Removing UpgradeBlocker from this older bug, to remove it from the suspect queue described in [1]. If you feel this bug still needs to be a suspect, please add the keyword again.

[1]: https://github.com/openshift/enhancements/pull/475