Bug 1846690 - upgrade from 4.4.6>4.5 fails: unexpected on-disk state validating against rendered-master
Summary: upgrade from 4.4.6>4.5 fails: unexpected on-disk state validating against ren...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Machine Config Operator
Version: 4.5
Hardware: x86_64
OS: Linux
high
high
Target Milestone: ---
: 4.5.0
Assignee: Antonio Murdaca
QA Contact: Johnny Liu
URL:
Whiteboard:
: 1847351 (view as bug list)
Depends On: 1842906
Blocks: 1867886
TreeView+ depends on / blocked
 
Reported: 2020-06-12 17:21 UTC by OpenShift BugZilla Robot
Modified: 2020-08-11 06:46 UTC (History)
18 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1867886 (view as bug list)
Environment:
Last Closed: 2020-07-13 17:43:59 UTC
Target Upstream Version:


Attachments (Terms of Use)


Links
System ID Priority Status Summary Last Updated
Github openshift machine-config-operator pull 1822 None closed [release-4.5] Bug 1846690: daemon: Remove encapsulated config when joining the cluster 2020-09-21 16:57:12 UTC
Github openshift machine-config-operator pull 1849 None closed [release-4.5] Bug 1846690: pkg/daemon: run 4.4 upgrade hack on MCD cluster boot 2020-09-21 16:57:12 UTC
Red Hat Product Errata RHBA-2020:2409 None None None 2020-07-13 17:44:25 UTC

Comment 3 Antonio Murdaca 2020-06-16 10:30:22 UTC
*** Bug 1847351 has been marked as a duplicate of this bug. ***

Comment 4 Johnny Liu 2020-06-17 06:01:08 UTC
Saw https://github.com/openshift/machine-config-operator/pull/1822 is already merged into 4.5.0-0.nightly-2020-06-16-213659.

Install a 4.4 cluster with "restart.service" using 4.4.0-0.nightly-2020-06-15-203705
Trigger upgrade towards 4.5.0-0.nightly-2020-06-16-213659.

The problem is still reproduced.

[root@preserve-jialiu-ansible ~]# oc get node
NAME                              STATUS                     ROLES           AGE   VERSION
jialiu441-6ssnn-control-plane-0   Ready                      master,worker   22h   v1.17.1+912792b
jialiu441-6ssnn-control-plane-1   Ready,SchedulingDisabled   master,worker   22h   v1.17.1+912792b
jialiu441-6ssnn-control-plane-2   Ready                      master,worker   22h   v1.17.1+912792b

[root@preserve-jialiu-ansible ~]# oc describe node|grep machineconfig
Annotations:        machineconfiguration.openshift.io/currentConfig: rendered-master-3a6779965724f02eb44e835f074bd238
                    machineconfiguration.openshift.io/desiredConfig: rendered-master-3a6779965724f02eb44e835f074bd238
                    machineconfiguration.openshift.io/reason: 
                    machineconfiguration.openshift.io/ssh: accessed
                    machineconfiguration.openshift.io/state: Done
Annotations:        machineconfiguration.openshift.io/currentConfig: rendered-master-3a6779965724f02eb44e835f074bd238
                    machineconfiguration.openshift.io/desiredConfig: rendered-master-88d0f91d9987c81d0569a420f55b3552
                    machineconfiguration.openshift.io/reason: unexpected on-disk state validating against rendered-master-3a6779965724f02eb44e835f074bd238
                    machineconfiguration.openshift.io/state: Degraded
Annotations:        machineconfiguration.openshift.io/currentConfig: rendered-master-3a6779965724f02eb44e835f074bd238
                    machineconfiguration.openshift.io/desiredConfig: rendered-master-3a6779965724f02eb44e835f074bd238
                    machineconfiguration.openshift.io/reason: 
                    machineconfiguration.openshift.io/state: Done

[root@jialiu441-6ssnn-control-plane-1 ~]# systemctl cat restart 
# /etc/systemd/system/restart.service
[Unit]
ConditionFirstBoot=yes
[Service]
Type=idle
ExecStart=/sbin/reboot
[Install]
WantedBy=multi-user.target

[root@jialiu441-6ssnn-control-plane-1 ~]# journalctl --system |grep -i reboot
Jun 16 07:23:04 host-10-0-97-148.openstacklocal ignition[1124]: INFO     : files: op(34): [started]  processing unit "mcd-write-pivot-reboot.service"
Jun 16 07:23:04 host-10-0-97-148.openstacklocal ignition[1124]: INFO     : files: op(34): op(35): [started]  writing unit "mcd-write-pivot-reboot.service" at "etc/systemd/system/mcd-write-pivot-reboot.service"
Jun 16 07:23:04 host-10-0-97-148.openstacklocal ignition[1124]: INFO     : files: op(34): op(35): [finished] writing unit "mcd-write-pivot-reboot.service" at "etc/systemd/system/mcd-write-pivot-reboot.service"
Jun 16 07:23:04 host-10-0-97-148.openstacklocal ignition[1124]: INFO     : files: op(34): [finished] processing unit "mcd-write-pivot-reboot.service"
Jun 16 07:23:04 host-10-0-97-148.openstacklocal ignition[1124]: INFO     : files: op(36): [started]  enabling unit "mcd-write-pivot-reboot.service"
Jun 16 07:23:04 host-10-0-97-148.openstacklocal ignition[1124]: INFO     : files: op(36): [finished] enabling unit "mcd-write-pivot-reboot.service"
Jun 16 07:23:06 jialiu441-6ssnn-control-plane-1 ignition[1124]: INFO     : files: op(34): [started]  processing unit "mcd-write-pivot-reboot.service"
Jun 16 07:23:06 jialiu441-6ssnn-control-plane-1 ignition[1124]: INFO     : files: op(34): op(35): [started]  writing unit "mcd-write-pivot-reboot.service" at "etc/systemd/system/mcd-write-pivot-reboot.service"
Jun 16 07:23:06 jialiu441-6ssnn-control-plane-1 ignition[1124]: INFO     : files: op(34): op(35): [finished] writing unit "mcd-write-pivot-reboot.service" at "etc/systemd/system/mcd-write-pivot-reboot.service"
Jun 16 07:23:06 jialiu441-6ssnn-control-plane-1 ignition[1124]: INFO     : files: op(34): [finished] processing unit "mcd-write-pivot-reboot.service"
Jun 16 07:23:06 jialiu441-6ssnn-control-plane-1 ignition[1124]: INFO     : files: op(36): [started]  enabling unit "mcd-write-pivot-reboot.service"
Jun 16 07:23:06 jialiu441-6ssnn-control-plane-1 ignition[1124]: INFO     : files: op(36): [finished] enabling unit "mcd-write-pivot-reboot.service"
Jun 16 07:23:07 jialiu441-6ssnn-control-plane-1 restorecon[1478]: Relabeled /etc/systemd/system/mcd-write-pivot-reboot.service from system_u:object_r:unlabeled_t:s0 to system_u:object_r:power_unit_file_t:s0
Jun 16 07:23:08 jialiu441-6ssnn-control-plane-1 systemd[1]: Started mcd-write-pivot-reboot.service.
Jun 16 07:23:08 jialiu441-6ssnn-control-plane-1 systemd[1]: mcd-write-pivot-reboot.service: Consumed 11ms CPU time
Jun 16 07:23:13 jialiu441-6ssnn-control-plane-1 systemd[1]: Starting Reboot...
-- Reboot --
Jun 16 07:28:59 jialiu441-6ssnn-control-plane-1 root[13776]: machine-config-daemon[5042]: initiating reboot: Node will reboot into config rendered-master-3a6779965724f02eb44e835f074bd238
Jun 16 07:28:59 jialiu441-6ssnn-control-plane-1 hyperkube[1418]: I0616 07:28:59.798844    1418 factory.go:173] Using factory "raw" for container "/system.slice/machine-config-daemon-reboot.service"
Jun 16 07:28:59 jialiu441-6ssnn-control-plane-1 hyperkube[1418]: I0616 07:28:59.799287    1418 manager.go:950] Added container: "/system.slice/machine-config-daemon-reboot.service" (aliases: [], namespace: "")
Jun 16 07:28:59 jialiu441-6ssnn-control-plane-1 hyperkube[1418]: I0616 07:28:59.799510    1418 container.go:467] Start housekeeping for container "/system.slice/machine-config-daemon-reboot.service"
Jun 16 07:28:59 jialiu441-6ssnn-control-plane-1 systemd[1]: Started machine-config-daemon: Node will reboot into config rendered-master-3a6779965724f02eb44e835f074bd238.
Jun 16 07:28:59 jialiu441-6ssnn-control-plane-1 systemd-logind[1296]: System is rebooting.
Jun 16 07:28:59 jialiu441-6ssnn-control-plane-1 systemd[1]: Stopped machine-config-daemon: Node will reboot into config rendered-master-3a6779965724f02eb44e835f074bd238.
Jun 16 07:28:59 jialiu441-6ssnn-control-plane-1 systemd[1]: machine-config-daemon-reboot.service: Consumed 22ms CPU time
Jun 16 07:30:30 jialiu441-6ssnn-control-plane-1 systemd[1]: Starting Reboot...
-- Reboot --
Jun 16 07:31:47 jialiu441-6ssnn-control-plane-1 hyperkube[1430]: I0616 07:31:47.294622    1430 event.go:281] Event(v1.ObjectReference{Kind:"Node", Namespace:"", Name:"jialiu441-6ssnn-control-plane-1", UID:"jialiu441-6ssnn-control-plane-1", APIVersion:"", ResourceVersion:"", FieldPath:""}): type: 'Warning' reason: 'Rebooted' Node jialiu441-6ssnn-control-plane-1 has been rebooted, boot id: 6da77e17-0757-457e-982b-b661b2f727e9
Jun 17 03:56:44 jialiu441-6ssnn-control-plane-1 root[3610152]: machine-config-daemon[3594232]: initiating reboot: Node will reboot into config rendered-master-88d0f91d9987c81d0569a420f55b3552
Jun 17 03:56:44 jialiu441-6ssnn-control-plane-1 hyperkube[1430]: I0617 03:56:44.351852    1430 factory.go:173] Using factory "raw" for container "/system.slice/machine-config-daemon-reboot.service"
Jun 17 03:56:44 jialiu441-6ssnn-control-plane-1 hyperkube[1430]: I0617 03:56:44.352565    1430 manager.go:950] Added container: "/system.slice/machine-config-daemon-reboot.service" (aliases: [], namespace: "")
Jun 17 03:56:44 jialiu441-6ssnn-control-plane-1 hyperkube[1430]: I0617 03:56:44.354485    1430 container.go:467] Start housekeeping for container "/system.slice/machine-config-daemon-reboot.service"
Jun 17 03:56:44 jialiu441-6ssnn-control-plane-1 systemd[1]: Started machine-config-daemon: Node will reboot into config rendered-master-88d0f91d9987c81d0569a420f55b3552.
Jun 17 03:56:44 jialiu441-6ssnn-control-plane-1 systemd-logind[1294]: System is rebooting.
Jun 17 03:56:44 jialiu441-6ssnn-control-plane-1 systemd[1]: Stopped machine-config-daemon: Node will reboot into config rendered-master-88d0f91d9987c81d0569a420f55b3552.
Jun 17 03:56:44 jialiu441-6ssnn-control-plane-1 systemd[1]: machine-config-daemon-reboot.service: Consumed 22ms CPU time
Jun 17 03:58:14 jialiu441-6ssnn-control-plane-1 systemd[1]: Starting Reboot...
-- Reboot --
Jun 17 03:59:56 jialiu441-6ssnn-control-plane-1 machine-config-daemon[1348]: I0617 03:59:56.813518    1348 update.go:1404] initiating reboot: Completing firstboot provisioning to rendered-master-3a6779965724f02eb44e835f074bd238
Jun 17 03:59:56 jialiu441-6ssnn-control-plane-1 root[1585]: machine-config-daemon[1348]: initiating reboot: Completing firstboot provisioning to rendered-master-3a6779965724f02eb44e835f074bd238
Jun 17 03:59:56 jialiu441-6ssnn-control-plane-1 systemd-logind[1326]: System is rebooting.
Jun 17 03:59:56 jialiu441-6ssnn-control-plane-1 systemd[1]: machine-config-daemon-reboot.service: Consumed 25ms CPU time
Jun 17 04:00:03 jialiu441-6ssnn-control-plane-1 systemd[1]: Starting Reboot...
-- Reboot --
Jun 17 04:00:58 jialiu441-6ssnn-control-plane-1 hyperkube[1444]: I0617 04:00:58.498405    1444 event.go:281] Event(v1.ObjectReference{Kind:"Node", Namespace:"", Name:"jialiu441-6ssnn-control-plane-1", UID:"jialiu441-6ssnn-control-plane-1", APIVersion:"", ResourceVersion:"", FieldPath:""}): type: 'Warning' reason: 'Rebooted' Node jialiu441-6ssnn-control-plane-1 has been rebooted, boot id: 22f89695-1e50-4517-8da4-e1249d6566f7

Comment 5 Colin Walters 2020-06-17 12:34:45 UTC
Can you get more logs (full journals plus the MCD logs) or just add a kubeconfig?

Comment 8 Antonio Murdaca 2020-06-18 08:41:47 UTC
(In reply to Colin Walters from comment #5)
> Can you get more logs (full journals plus the MCD logs) or just add a
> kubeconfig?

Johnny can you help with the above?

Comment 11 Antonio Murdaca 2020-06-18 12:11:40 UTC
I'm now confused as to how the 4.6 BZ got verified then.

	if state.bootstrapping {

The if above, where we inserted the removal of the encapsulated json isn't called during an upgrade. So, what it seems to be happening is correct and we never call the new hack code and thus the encapsulated json is still around and causes this issue.

Should we just, uhm, get rid of that file first thing (bootstrapping or not) when we're running the MCD?

Comment 12 Antonio Murdaca 2020-06-18 12:19:53 UTC
Created https://github.com/openshift/machine-config-operator/pull/1844 (which targets the 4.6 BZ now) for verification

Comment 13 Antonio Murdaca 2020-06-18 12:30:08 UTC
I think the verification that happened https://bugzilla.redhat.com/show_bug.cgi?id=1842906#c55 was a lucky one:

Jun 15 14:46:48 jialiu452-p9652-control-plane-2 machine-config-daemon[1345]: I0615 14:46:48.011423    1345 update.go:1346] initiating reboot: Completing firstboot provisioning to rendered-master-5c6dfc33b0a6115a3a3c25831d6c8009

This means we got there and weren't killed on the way there so the encapsulated json is now gone (which is correct)

The code in the PR that merged for the 4.6 bug couldn't had an opportunity to run since we were effectively upgrading and the PR changes the very first initial bootstrap which happens only on installation (maybe I'm off tho, so, seeking further confirmations :))

Comment 14 Johnny Liu 2020-06-18 13:54:04 UTC
(In reply to Antonio Murdaca from comment #11)
> I'm now confused as to how the 4.6 BZ got verified then.
We was running 4.5 -> 4.6 upgrade, per QE's black-box test result, it was proved such issue does not happen in 4.5 -> 4.6, but only 4.4 -> 4.5, so in the next round of testing, QE will only verify the fix PR in 4.6 bug (4.5 -> 4.6 upgrade) does not introduce any problem, so that unblock backport to 4.5, will run real functional verification in this bug (4.4 -> 4.5 upgrade).

Comment 15 Antonio Murdaca 2020-06-18 13:57:02 UTC
(In reply to Johnny Liu from comment #14)
> (In reply to Antonio Murdaca from comment #11)
> > I'm now confused as to how the 4.6 BZ got verified then.
> We was running 4.5 -> 4.6 upgrade, per QE's black-box test result, it was
> proved such issue does not happen in 4.5 -> 4.6, but only 4.4 -> 4.5, so in
> the next round of testing, QE will only verify the fix PR in 4.6 bug (4.5 ->
> 4.6 upgrade) does not introduce any problem, so that unblock backport to
> 4.5, will run real functional verification in this bug (4.4 -> 4.5 upgrade).

ohh, I see thanks for clarifying that +1

We're going ahead with https://github.com/openshift/machine-config-operator/pull/1844 then to make the 4.6 BZ happy again https://bugzilla.redhat.com/show_bug.cgi?id=1842906
Once that merges, we'll have a fix ready for this 4.4->4.5 BZ 

sounds good?

Comment 16 Johnny Liu 2020-06-18 14:00:23 UTC
> sounds good?
+1

Comment 17 Sinny Kumari 2020-06-18 14:23:31 UTC
(In reply to Antonio Murdaca from comment #11)
> I'm now confused as to how the 4.6 BZ got verified then.

Didn't notice this earlier. 4.6 BZ was verified by upgrading 4.5->4.6 (https://bugzilla.redhat.com/show_bug.cgi?id=1842906#c55). We have removed BindsTo= in 4.5 https://github.com/openshift/machine-config-operator/pull/1762/files#diff-4eb88ea6718e4dcceecd6dfe6e4437e1L8 .So, m-c-d-firstboot service would have completed when 4.5 cluster was installed with an additional reboot(due to reboot.service). Best way to verify (https://bugzilla.redhat.com/show_bug.cgi?id=1842906 would be to upgrade from 4.4->4.6 in the environment where bug is happening.

Comment 19 Johnny Liu 2020-06-22 13:32:46 UTC
Verified this bug with 4.5.0-0.nightly-2020-06-20-194346, and PASS.

[jialiu@mylaptop ~]$ oc get co
NAME                                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.4.0-0.nightly-2020-06-22-065157   True        False         False      26m
cloud-credential                           4.4.0-0.nightly-2020-06-22-065157   True        False         False      60m
cluster-autoscaler                         4.4.0-0.nightly-2020-06-22-065157   True        False         False      29m
console                                    4.4.0-0.nightly-2020-06-22-065157   True        False         False      25m
csi-snapshot-controller                    4.4.0-0.nightly-2020-06-22-065157   True        False         False      33m
dns                                        4.4.0-0.nightly-2020-06-22-065157   True        False         False      56m
etcd                                       4.4.0-0.nightly-2020-06-22-065157   True        False         False      52m
image-registry                             4.4.0-0.nightly-2020-06-22-065157   True        False         False      32m
ingress                                    4.4.0-0.nightly-2020-06-22-065157   True        False         False      32m
insights                                   4.4.0-0.nightly-2020-06-22-065157   True        False         False      33m
kube-apiserver                             4.4.0-0.nightly-2020-06-22-065157   True        False         False      48m
kube-controller-manager                    4.4.0-0.nightly-2020-06-22-065157   True        False         False      51m
kube-scheduler                             4.4.0-0.nightly-2020-06-22-065157   True        False         False      33m
kube-storage-version-migrator              4.4.0-0.nightly-2020-06-22-065157   True        False         False      57m
machine-api                                4.4.0-0.nightly-2020-06-22-065157   True        False         False      33m
machine-config                             4.4.0-0.nightly-2020-06-22-065157   True        False         False      33m
marketplace                                4.4.0-0.nightly-2020-06-22-065157   True        False         False      32m
monitoring                                 4.4.0-0.nightly-2020-06-22-065157   True        False         False      26m
network                                    4.4.0-0.nightly-2020-06-22-065157   True        False         False      58m
node-tuning                                4.4.0-0.nightly-2020-06-22-065157   True        False         False      58m
openshift-apiserver                        4.4.0-0.nightly-2020-06-22-065157   True        False         False      33m
openshift-controller-manager               4.4.0-0.nightly-2020-06-22-065157   True        False         False      30m
openshift-samples                          4.4.0-0.nightly-2020-06-22-065157   True        False         False      29m
operator-lifecycle-manager                 4.4.0-0.nightly-2020-06-22-065157   True        False         False      57m
operator-lifecycle-manager-catalog         4.4.0-0.nightly-2020-06-22-065157   True        False         False      57m
operator-lifecycle-manager-packageserver   4.4.0-0.nightly-2020-06-22-065157   True        False         False      31m
service-ca                                 4.4.0-0.nightly-2020-06-22-065157   True        False         False      57m
service-catalog-apiserver                  4.4.0-0.nightly-2020-06-22-065157   True        False         False      58m
service-catalog-controller-manager         4.4.0-0.nightly-2020-06-22-065157   True        False         False      58m
storage                                    4.4.0-0.nightly-2020-06-22-065157   True        False         False      33m

[jialiu@mylaptop ~]$ oc adm upgrade --to-image=registry.svc.ci.openshift.org/ocp/release:4.5.0-0.nightly-2020-06-20-194346 --allow-explicit-upgrade --force
Updating to release image registry.svc.ci.openshift.org/ocp/release:4.5.0-0.nightly-2020-06-20-194346

[jialiu@mylaptop ~]$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.4.0-0.nightly-2020-06-22-065157   True        True          18s     Working towards registry.svc.ci.openshift.org/ocp/release:4.5.0-0.nightly-2020-06-20-194346: downloading update

[jialiu@mylaptop ~]$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.5.0-0.nightly-2020-06-20-194346   True        False         112m    Cluster version is 4.5.0-0.nightly-2020-06-20-194346

[jialiu@mylaptop ~]$ oc get co
NAME                                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.5.0-0.nightly-2020-06-20-194346   True        False         False      3h2m
cloud-credential                           4.5.0-0.nightly-2020-06-20-194346   True        False         False      3h36m
cluster-autoscaler                         4.5.0-0.nightly-2020-06-20-194346   True        False         False      3h5m
config-operator                            4.5.0-0.nightly-2020-06-20-194346   True        False         False      153m
console                                    4.5.0-0.nightly-2020-06-20-194346   True        False         False      121m
csi-snapshot-controller                    4.5.0-0.nightly-2020-06-20-194346   True        False         False      126m
dns                                        4.5.0-0.nightly-2020-06-20-194346   True        False         False      3h32m
etcd                                       4.5.0-0.nightly-2020-06-20-194346   True        False         False      3h28m
image-registry                             4.5.0-0.nightly-2020-06-20-194346   True        False         False      3h8m
ingress                                    4.5.0-0.nightly-2020-06-20-194346   True        False         False      3h8m
insights                                   4.5.0-0.nightly-2020-06-20-194346   True        False         False      3h9m
kube-apiserver                             4.5.0-0.nightly-2020-06-20-194346   True        False         False      3h24m
kube-controller-manager                    4.5.0-0.nightly-2020-06-20-194346   True        False         False      3h27m
kube-scheduler                             4.5.0-0.nightly-2020-06-20-194346   True        False         False      3h9m
kube-storage-version-migrator              4.5.0-0.nightly-2020-06-20-194346   True        False         False      127m
machine-api                                4.5.0-0.nightly-2020-06-20-194346   True        False         False      3h9m
machine-approver                           4.5.0-0.nightly-2020-06-20-194346   True        False         False      148m
machine-config                             4.5.0-0.nightly-2020-06-20-194346   True        False         False      116m
marketplace                                4.5.0-0.nightly-2020-06-20-194346   True        False         False      120m
monitoring                                 4.5.0-0.nightly-2020-06-20-194346   True        False         False      145m
network                                    4.5.0-0.nightly-2020-06-20-194346   True        False         False      3h34m
node-tuning                                4.5.0-0.nightly-2020-06-20-194346   True        False         False      148m
openshift-apiserver                        4.5.0-0.nightly-2020-06-20-194346   True        False         False      124m
openshift-controller-manager               4.5.0-0.nightly-2020-06-20-194346   True        False         False      146m
openshift-samples                          4.5.0-0.nightly-2020-06-20-194346   True        False         False      148m
operator-lifecycle-manager                 4.5.0-0.nightly-2020-06-20-194346   True        False         False      3h33m
operator-lifecycle-manager-catalog         4.5.0-0.nightly-2020-06-20-194346   True        False         False      3h33m
operator-lifecycle-manager-packageserver   4.5.0-0.nightly-2020-06-20-194346   True        False         False      121m
service-ca                                 4.5.0-0.nightly-2020-06-20-194346   True        False         False      3h34m
storage                                    4.5.0-0.nightly-2020-06-20-194346   True        False         False      148m

[core@jialiu44-dwtvq-control-plane-0 ~]$ journalctl --system|grep -i reboot
Jun 22 09:53:35 host-10-0-97-126.openstacklocal ignition[1122]: INFO     : files: op(34): [started]  processing unit "mcd-write-pivot-reboot.service"
Jun 22 09:53:35 host-10-0-97-126.openstacklocal ignition[1122]: INFO     : files: op(34): op(35): [started]  writing unit "mcd-write-pivot-reboot.service" at "etc/systemd/system/mcd-write-pivot-reboot.service"
Jun 22 09:53:35 host-10-0-97-126.openstacklocal ignition[1122]: INFO     : files: op(34): op(35): [finished] writing unit "mcd-write-pivot-reboot.service" at "etc/systemd/system/mcd-write-pivot-reboot.service"
Jun 22 09:53:35 host-10-0-97-126.openstacklocal ignition[1122]: INFO     : files: op(34): [finished] processing unit "mcd-write-pivot-reboot.service"
Jun 22 09:53:35 host-10-0-97-126.openstacklocal ignition[1122]: INFO     : files: op(36): [started]  enabling unit "mcd-write-pivot-reboot.service"
Jun 22 09:53:35 host-10-0-97-126.openstacklocal ignition[1122]: INFO     : files: op(36): [finished] enabling unit "mcd-write-pivot-reboot.service"
Jun 22 09:53:39 jialiu44-dwtvq-control-plane-0 restorecon[1478]: Relabeled /etc/systemd/system/mcd-write-pivot-reboot.service from system_u:object_r:unlabeled_t:s0 to system_u:object_r:power_unit_file_t:s0
Jun 22 09:53:39 jialiu44-dwtvq-control-plane-0 systemd[1]: Started mcd-write-pivot-reboot.service.
Jun 22 09:53:39 jialiu44-dwtvq-control-plane-0 systemd[1]: mcd-write-pivot-reboot.service: Consumed 4ms CPU time
Jun 22 09:53:45 jialiu44-dwtvq-control-plane-0 systemd[1]: Starting Reboot...
-- Reboot --
Jun 22 09:58:52 jialiu44-dwtvq-control-plane-0 root[14273]: machine-config-daemon[7242]: initiating reboot: Node will reboot into config rendered-master-994a41b33fe4832aa73597dc0b31323e
Jun 22 09:58:52 jialiu44-dwtvq-control-plane-0 hyperkube[1414]: I0622 09:58:52.412149    1414 factory.go:173] Using factory "raw" for container "/system.slice/machine-config-daemon-reboot.service"
Jun 22 09:58:52 jialiu44-dwtvq-control-plane-0 hyperkube[1414]: I0622 09:58:52.412687    1414 manager.go:950] Added container: "/system.slice/machine-config-daemon-reboot.service" (aliases: [], namespace: "")
Jun 22 09:58:52 jialiu44-dwtvq-control-plane-0 hyperkube[1414]: I0622 09:58:52.413024    1414 container.go:467] Start housekeeping for container "/system.slice/machine-config-daemon-reboot.service"
Jun 22 09:58:52 jialiu44-dwtvq-control-plane-0 systemd[1]: Started machine-config-daemon: Node will reboot into config rendered-master-994a41b33fe4832aa73597dc0b31323e.
Jun 22 09:58:52 jialiu44-dwtvq-control-plane-0 systemd-logind[1288]: System is rebooting.
Jun 22 09:58:52 jialiu44-dwtvq-control-plane-0 systemd[1]: Stopping machine-config-daemon: Node will reboot into config rendered-master-994a41b33fe4832aa73597dc0b31323e...
Jun 22 09:58:52 jialiu44-dwtvq-control-plane-0 systemd[1]: Stopped machine-config-daemon: Node will reboot into config rendered-master-994a41b33fe4832aa73597dc0b31323e.
Jun 22 09:58:52 jialiu44-dwtvq-control-plane-0 systemd[1]: machine-config-daemon-reboot.service: Consumed 21ms CPU time
Jun 22 10:00:22 jialiu44-dwtvq-control-plane-0 systemd[1]: Starting Reboot...
-- Reboot --
Jun 22 10:02:17 jialiu44-dwtvq-control-plane-0 hyperkube[1434]: I0622 10:02:17.624501    1434 event.go:281] Event(v1.ObjectReference{Kind:"Node", Namespace:"", Name:"jialiu44-dwtvq-control-plane-0", UID:"jialiu44-dwtvq-control-plane-0", APIVersion:"", ResourceVersion:"", FieldPath:""}): type: 'Warning' reason: 'Rebooted' Node jialiu44-dwtvq-control-plane-0 has been rebooted, boot id: 98f8dab9-117f-4730-8f35-b430f9b0dc7a
Jun 22 11:26:10 jialiu44-dwtvq-control-plane-0 root[282527]: machine-config-daemon[228455]: initiating reboot: Node will reboot into config rendered-master-c564bd7f1ddf708bc43523a3cb2ac4f3
Jun 22 11:26:10 jialiu44-dwtvq-control-plane-0 hyperkube[1434]: I0622 11:26:10.295287    1434 factory.go:173] Using factory "raw" for container "/system.slice/machine-config-daemon-reboot.service"
Jun 22 11:26:10 jialiu44-dwtvq-control-plane-0 hyperkube[1434]: I0622 11:26:10.295930    1434 manager.go:950] Added container: "/system.slice/machine-config-daemon-reboot.service" (aliases: [], namespace: "")
Jun 22 11:26:10 jialiu44-dwtvq-control-plane-0 hyperkube[1434]: I0622 11:26:10.296199    1434 container.go:467] Start housekeeping for container "/system.slice/machine-config-daemon-reboot.service"
Jun 22 11:26:10 jialiu44-dwtvq-control-plane-0 systemd[1]: Started machine-config-daemon: Node will reboot into config rendered-master-c564bd7f1ddf708bc43523a3cb2ac4f3.
Jun 22 11:26:10 jialiu44-dwtvq-control-plane-0 systemd-logind[1297]: System is rebooting.
Jun 22 11:26:10 jialiu44-dwtvq-control-plane-0 systemd[1]: Stopping machine-config-daemon: Node will reboot into config rendered-master-c564bd7f1ddf708bc43523a3cb2ac4f3...
Jun 22 11:26:10 jialiu44-dwtvq-control-plane-0 systemd[1]: Stopped machine-config-daemon: Node will reboot into config rendered-master-c564bd7f1ddf708bc43523a3cb2ac4f3.
Jun 22 11:26:10 jialiu44-dwtvq-control-plane-0 systemd[1]: machine-config-daemon-reboot.service: Consumed 24ms CPU time
Jun 22 11:27:40 jialiu44-dwtvq-control-plane-0 systemd[1]: Starting Reboot...
-- Reboot --
Jun 22 11:28:43 jialiu44-dwtvq-control-plane-0 hyperkube[1491]: I0622 11:28:43.785412    1491 event.go:278] Event(v1.ObjectReference{Kind:"Node", Namespace:"", Name:"jialiu44-dwtvq-control-plane-0", UID:"jialiu44-dwtvq-control-plane-0", APIVersion:"", ResourceVersion:"", FieldPath:""}): type: 'Warning' reason: 'Rebooted' Node jialiu44-dwtvq-control-plane-0 has been rebooted, boot id: 87e3739b-0984-4b9d-bfc2-06468d7ae4df

Comment 20 Johnny Liu 2020-06-22 13:35:28 UTC
Comment 19 is run against a upi on baremetal cluster, tomorrow will run a upgrade against upi on vsphere for a double confirm.

Comment 21 jima 2020-06-23 03:42:27 UTC
Verified the bug with 4.5.0-0.nightly-2020-06-22-201506 (upgrade from 4.4.9) and passed.

$ oc adm upgrade --to-image registry.svc.ci.openshift.org/ocp/release:4.5.0-0.nightly-2020-06-22-201506 --force --allow-explicit-upgrade
Updating to release image registry.svc.ci.openshift.org/ocp/release:4.5.0-0.nightly-2020-06-22-201506

$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.4.9     True        True          17m     Working towards 4.5.0-0.nightly-2020-06-22-201506: 79% complete

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.5.0-0.nightly-2020-06-22-201506   True        False         13m     Cluster version is 4.5.0-0.nightly-2020-06-22-201506

$ oc get co
NAME                                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.5.0-0.nightly-2020-06-22-201506   True        False         False      66m
cloud-credential                           4.5.0-0.nightly-2020-06-22-201506   True        False         False      85m
cluster-autoscaler                         4.5.0-0.nightly-2020-06-22-201506   True        False         False      73m
config-operator                            4.5.0-0.nightly-2020-06-22-201506   True        False         False      48m
console                                    4.5.0-0.nightly-2020-06-22-201506   True        False         False      15m
csi-snapshot-controller                    4.5.0-0.nightly-2020-06-22-201506   True        False         False      11m
dns                                        4.5.0-0.nightly-2020-06-22-201506   True        False         False      77m
etcd                                       4.5.0-0.nightly-2020-06-22-201506   True        False         False      77m
image-registry                             4.5.0-0.nightly-2020-06-22-201506   True        False         False      12m
ingress                                    4.5.0-0.nightly-2020-06-22-201506   True        False         False      69m
insights                                   4.5.0-0.nightly-2020-06-22-201506   True        False         False      74m
kube-apiserver                             4.5.0-0.nightly-2020-06-22-201506   True        False         False      76m
kube-controller-manager                    4.5.0-0.nightly-2020-06-22-201506   True        False         False      80m
kube-scheduler                             4.5.0-0.nightly-2020-06-22-201506   True        False         False      76m
kube-storage-version-migrator              4.5.0-0.nightly-2020-06-22-201506   True        False         False      12m
machine-api                                4.5.0-0.nightly-2020-06-22-201506   True        False         False      74m
machine-approver                           4.5.0-0.nightly-2020-06-22-201506   True        False         False      40m
machine-config                             4.5.0-0.nightly-2020-06-22-201506   True        False         False      4m49s
marketplace                                4.5.0-0.nightly-2020-06-22-201506   True        False         False      14m
monitoring                                 4.5.0-0.nightly-2020-06-22-201506   True        False         False      37m
network                                    4.5.0-0.nightly-2020-06-22-201506   True        False         False      83m
node-tuning                                4.5.0-0.nightly-2020-06-22-201506   True        False         False      40m
openshift-apiserver                        4.5.0-0.nightly-2020-06-22-201506   True        False         False      39m
openshift-controller-manager               4.5.0-0.nightly-2020-06-22-201506   True        False         False      70m
openshift-samples                          4.5.0-0.nightly-2020-06-22-201506   True        False         False      39m
operator-lifecycle-manager                 4.5.0-0.nightly-2020-06-22-201506   True        False         False      82m
operator-lifecycle-manager-catalog         4.5.0-0.nightly-2020-06-22-201506   True        False         False      82m
operator-lifecycle-manager-packageserver   4.5.0-0.nightly-2020-06-22-201506   True        False         False      14m
service-ca                                 4.5.0-0.nightly-2020-06-22-201506   True        False         False      82m
storage                                    4.5.0-0.nightly-2020-06-22-201506   True        False         False      40m

[root@control-plane-0 ~]# journalctl --system | grep -i reboot
Jun 23 02:01:40 localhost ignition[1108]: INFO     : files: op(3f): [started]  processing unit "mcd-write-pivot-reboot.service"
Jun 23 02:01:40 localhost ignition[1108]: INFO     : files: op(3f): op(40): [started]  writing unit "mcd-write-pivot-reboot.service" at "etc/systemd/system/mcd-write-pivot-reboot.service"
Jun 23 02:01:40 localhost ignition[1108]: INFO     : files: op(3f): op(40): [finished] writing unit "mcd-write-pivot-reboot.service" at "etc/systemd/system/mcd-write-pivot-reboot.service"
Jun 23 02:01:40 localhost ignition[1108]: INFO     : files: op(3f): [finished] processing unit "mcd-write-pivot-reboot.service"
Jun 23 02:01:40 localhost ignition[1108]: INFO     : files: op(41): [started]  enabling unit "mcd-write-pivot-reboot.service"
Jun 23 02:01:40 localhost ignition[1108]: INFO     : files: op(41): [finished] enabling unit "mcd-write-pivot-reboot.service"
Jun 23 02:01:42 control-plane-0 ignition[1108]: INFO     : files: op(3f): [started]  processing unit "mcd-write-pivot-reboot.service"
Jun 23 02:01:42 control-plane-0 ignition[1108]: INFO     : files: op(3f): op(40): [started]  writing unit "mcd-write-pivot-reboot.service" at "etc/systemd/system/mcd-write-pivot-reboot.service"
Jun 23 02:01:42 control-plane-0 ignition[1108]: INFO     : files: op(3f): op(40): [finished] writing unit "mcd-write-pivot-reboot.service" at "etc/systemd/system/mcd-write-pivot-reboot.service"
Jun 23 02:01:42 control-plane-0 ignition[1108]: INFO     : files: op(3f): [finished] processing unit "mcd-write-pivot-reboot.service"
Jun 23 02:01:42 control-plane-0 ignition[1108]: INFO     : files: op(41): [started]  enabling unit "mcd-write-pivot-reboot.service"
Jun 23 02:01:42 control-plane-0 ignition[1108]: INFO     : files: op(41): [finished] enabling unit "mcd-write-pivot-reboot.service"
Jun 23 02:01:43 control-plane-0 restorecon[1437]: Relabeled /etc/systemd/system/mcd-write-pivot-reboot.service from system_u:object_r:unlabeled_t:s0 to system_u:object_r:power_unit_file_t:s0
Jun 23 02:01:43 control-plane-0 systemd[1]: Started mcd-write-pivot-reboot.service.
Jun 23 02:01:43 control-plane-0 systemd[1]: mcd-write-pivot-reboot.service: Consumed 6ms CPU time
Jun 23 02:01:49 control-plane-0 systemd[1]: Starting Reboot...
-- Reboot --
Jun 23 02:07:44 control-plane-0 root[11282]: machine-config-daemon[5706]: initiating reboot: Node will reboot into config rendered-master-657bb47f985de9c41d0cb8273d3b6c80
Jun 23 02:07:44 control-plane-0 systemd[1]: Started machine-config-daemon: Node will reboot into config rendered-master-657bb47f985de9c41d0cb8273d3b6c80.
Jun 23 02:07:44 control-plane-0 hyperkube[1385]: I0623 02:07:44.463809    1385 factory.go:173] Using factory "raw" for container "/system.slice/machine-config-daemon-reboot.service"
Jun 23 02:07:44 control-plane-0 hyperkube[1385]: I0623 02:07:44.464273    1385 manager.go:950] Added container: "/system.slice/machine-config-daemon-reboot.service" (aliases: [], namespace: "")
Jun 23 02:07:44 control-plane-0 hyperkube[1385]: I0623 02:07:44.464515    1385 container.go:467] Start housekeeping for container "/system.slice/machine-config-daemon-reboot.service"
Jun 23 02:07:44 control-plane-0 systemd-logind[1269]: System is rebooting.
Jun 23 02:07:44 control-plane-0 systemd[1]: Stopping machine-config-daemon: Node will reboot into config rendered-master-657bb47f985de9c41d0cb8273d3b6c80...
Jun 23 02:07:44 control-plane-0 systemd[1]: Stopped machine-config-daemon: Node will reboot into config rendered-master-657bb47f985de9c41d0cb8273d3b6c80.
Jun 23 02:07:44 control-plane-0 systemd[1]: machine-config-daemon-reboot.service: Consumed 26ms CPU time
Jun 23 02:09:14 control-plane-0 systemd[1]: Starting Reboot...
-- Reboot --
Jun 23 02:10:53 control-plane-0 hyperkube[1409]: I0623 02:10:53.766564    1409 event.go:281] Event(v1.ObjectReference{Kind:"Node", Namespace:"", Name:"control-plane-0", UID:"control-plane-0", APIVersion:"", ResourceVersion:"", FieldPath:""}): type: 'Warning' reason: 'Rebooted' Node control-plane-0 has been rebooted, boot id: 0bdf958e-ef27-4451-a18d-0685aaf29fe1
Jun 23 03:14:21 control-plane-0 root[173767]: machine-config-daemon[145700]: initiating reboot: Node will reboot into config rendered-master-cd63ef41104a49caa266c18da6ce18aa
Jun 23 03:14:21 control-plane-0 hyperkube[1409]: I0623 03:14:21.752737    1409 factory.go:173] Using factory "raw" for container "/system.slice/machine-config-daemon-reboot.service"
Jun 23 03:14:21 control-plane-0 hyperkube[1409]: I0623 03:14:21.753114    1409 manager.go:950] Added container: "/system.slice/machine-config-daemon-reboot.service" (aliases: [], namespace: "")
Jun 23 03:14:21 control-plane-0 hyperkube[1409]: I0623 03:14:21.753340    1409 container.go:467] Start housekeeping for container "/system.slice/machine-config-daemon-reboot.service"
Jun 23 03:14:21 control-plane-0 systemd[1]: Started machine-config-daemon: Node will reboot into config rendered-master-cd63ef41104a49caa266c18da6ce18aa.
Jun 23 03:14:21 control-plane-0 systemd-logind[1281]: System is rebooting.
Jun 23 03:14:21 control-plane-0 systemd[1]: Stopped machine-config-daemon: Node will reboot into config rendered-master-cd63ef41104a49caa266c18da6ce18aa.
Jun 23 03:14:21 control-plane-0 systemd[1]: machine-config-daemon-reboot.service: Consumed 24ms CPU time
Jun 23 03:15:52 control-plane-0 systemd[1]: Starting Reboot...
-- Reboot --
Jun 23 03:17:17 control-plane-0 hyperkube[1496]: I0623 03:17:17.042669    1496 event.go:278] Event(v1.ObjectReference{Kind:"Node", Namespace:"", Name:"control-plane-0", UID:"control-plane-0", APIVersion:"", ResourceVersion:"", FieldPath:""}): type: 'Warning' reason: 'Rebooted' Node control-plane-0 has been rebooted, boot id: 78c53ec7-c0ec-49d6-8712-65b246b42c9a

Comment 22 errata-xmlrpc 2020-07-13 17:43:59 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409


Note You need to log in before you can comment on or make changes to this bug.