Bug 1762868 - Updating 00-master can wedge (master never progresses)
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Machine Config Operator
Version: 4.3.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: urgent
Target Milestone: ---
Target Release: 4.3.0
Assignee: Antonio Murdaca
QA Contact: Michael Nguyen
URL:
Whiteboard:
Depends On:
Blocks: 1763205
Reported: 2019-10-17 17:20 UTC by Clayton Coleman
Modified: 2020-01-23 11:08 UTC (History)

Fixed In Version:
Doc Text:
Clone Of:
: 1763205 (view as bug list)
Environment:
Last Closed: 2020-01-23 11:07:51 UTC
Target Upstream Version:
Embargoed:



Links
System ID Private Priority Status Summary Last Updated
Github openshift machine-config-operator pull 1189 0 'None' closed Bug 1762868: revert #1177 and fix common templates in MCs 2020-11-10 08:12:55 UTC
Red Hat Product Errata RHBA-2020:0062 0 None None None 2020-01-23 11:08:19 UTC

Description Clayton Coleman 2019-10-17 17:20:48 UTC
1. Stand up 4.3 cluster
2. Edit the 00-master machine config: in the systemd unit machine-config-daemon-host.service, append "Comment" to the end of the Description line
3. Observe a machine config rollout start
4. Observe machine drain and reboot
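
For reference, the edit in step 2 amounts to the following change to the unit text. This is only an illustrative sketch: the helper name append_to_description is hypothetical, and the sample unit is shortened from the one shown in the log output in this report.

```python
def append_to_description(unit_text: str, suffix: str) -> str:
    """Return unit_text with `suffix` appended to the first Description= line.

    Hypothetical helper for illustration; it is not part of the
    machine-config-operator.
    """
    lines = unit_text.split("\n")
    for i, line in enumerate(lines):
        if line.startswith("Description="):
            lines[i] = f"{line} {suffix}"
            break  # only the first Description= line is changed
    return "\n".join(lines)


# Shortened sample of the unit from this report.
original = (
    "[Unit]\n"
    "Description=Machine Config Daemon Initial\n"
    "Before=kubelet.service\n"
)
edited = append_to_description(original, "Comment")
print(edited)
```

Text without a Description= line is returned unchanged.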

Expected:

The config rollout runs to completion.

Actual:

The MCD on the first master is stuck in an error loop and never completes. I observed that 00-master was reverted, but that did not let the rollout complete:

I1017 17:18:52.399955    3699 daemon.go:682] Current config: rendered-master-bcc64aa3038ab9329eea7ecc4a3023bf
I1017 17:18:52.399993    3699 daemon.go:683] Desired config: rendered-master-d51d2d1b5ace2ce0a43baad6b117ffe3
I1017 17:18:52.409401    3699 update.go:984] Disk currentConfig rendered-master-d51d2d1b5ace2ce0a43baad6b117ffe3 overrides node annotation rendered-master-bcc64aa3038ab9329eea7ecc4a3023bf
I1017 17:18:52.413259    3699 daemon.go:893] Validating against pending config rendered-master-d51d2d1b5ace2ce0a43baad6b117ffe3
E1017 17:18:52.416702    3699 daemon.go:1284] content mismatch for file /etc/systemd/system/machine-config-daemon-host.service: [Unit]
Description=Machine Config Daemon Initial

A: 
# This only applies to ostree (MCD) systems;
# see also https://github.com/openshift/machine-config-operator/issues/1046
ConditionPathExists=/run/ostree-booted
ConditionPathExists=/etc/pivot/image-pullspec
# If pivot exists, defer to it.  Note similar code in update.go
ConditionPathExists=!/usr/lib/systemd/system/pivot.service
After=ignition-firstboot-complete.service
Before=kubelet.service

[Service]
# Need oneshot to delay kubelet
Type=oneshot
# TODO add --from-etc-pullspec after ratcheting
ExecStart=/usr/libexec/machine-config-daemon pivot

[Install]
WantedBy=multi-user.target


B:  Comment
# This only applies to ostree (MCD) systems;
# see also https://github.com/openshift/machine-config-operator/issues/1046
ConditionPathExists=/run/ostree-booted
ConditionPathExists=/etc/pivot/image-pullspec
# If pivot exists, defer to it.  Note similar code in update.go
ConditionPathExists=!/usr/lib/systemd/system/pivot.service
After=ignition-firstboot-complete.service
Before=kubelet.service

[Service]
# Need oneshot to delay kubelet
Type=oneshot
# TODO add --from-etc-pullspec after ratcheting
ExecStart=/usr/libexec/machine-config-daemon pivot

[Install]
WantedBy=multi-user.target


E1017 17:18:52.416739    3699 writer.go:127] Marking Degraded due to: unexpected on-disk state validating against rendered-master-d51d2d1b5ace2ce0a43baad6b117ffe3
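
A note on reading the mismatch above: its layout is consistent with a common-prefix diff, where the text shared by both versions is printed once, followed by the differing remainder of each version under "A:" and "B:". The sketch below reproduces that layout under this assumption. The function name common_prefix_diff is hypothetical, and it is not confirmed here to be the code the daemon actually runs.

```python
def common_prefix_diff(a: str, b: str) -> str:
    """Print the shared prefix of a and b once, then each differing tail.

    Assumed reconstruction of the log layout, not the daemon's own code.
    """
    i = 0
    limit = min(len(a), len(b))
    while i < limit and a[i] == b[i]:
        i += 1
    return f"{a[:i]}\n\nA: {a[i:]}\n\nB: {b[i:]}\n\n"


# Shortened sample: the only difference is " Comment" on the Description line.
without_comment = (
    "[Unit]\n"
    "Description=Machine Config Daemon Initial\n"
    "Before=kubelet.service\n"
)
with_comment = (
    "[Unit]\n"
    "Description=Machine Config Daemon Initial Comment\n"
    "Before=kubelet.service\n"
)
result = common_prefix_diff(without_comment, with_comment)
print(result)
```

Read this way, the two versions in the log are identical except that B carries " Comment" at the end of the Description line, which also explains the two spaces in "B:  Comment".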

Comment 2 Antonio Murdaca 2019-10-18 11:28:13 UTC
Oh, this is a bug in how we ship templates. I'm fixing it.

Comment 3 Antonio Murdaca 2019-10-18 11:40:44 UTC
This is likely to have been introduced by https://github.com/openshift/machine-config-operator/commit/cfddeaf8e90c289a2648a346defb9443600d519d

I'm working on a fix.

Comment 5 Antonio Murdaca 2019-10-18 12:37:43 UTC
Also, this regression was caused by the patch that went in to fix the upgrade issue. I'm fixing all of this.

Comment 7 Michael Nguyen 2019-11-15 20:37:15 UTC
Verified on 4.3.0-0.nightly-2019-11-13-233341

1. oc edit mc/00-master
2. Attempt to make any change to the machine-config-daemon-host.service systemd unit
3. Verify that 00-master immediately reconciles back to the base template (no changes are kept)
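
A check like step 3 could be scripted roughly as follows. This is a sketch under assumptions: it assumes the machine config JSON stores units under spec.config.systemd.units with name and contents fields, the helper name unit_contents is hypothetical, and the sample data is made up for illustration.

```python
from typing import Optional


def unit_contents(machine_config: dict, unit_name: str) -> Optional[str]:
    """Return the contents of the named systemd unit, or None if absent.

    Assumes units live under spec.config.systemd.units (hypothetical helper).
    """
    units = (
        machine_config.get("spec", {})
        .get("config", {})
        .get("systemd", {})
        .get("units", [])
    )
    for unit in units:
        if unit.get("name") == unit_name:
            return unit.get("contents")
    return None


# Illustrative sample only; a real object would come from the cluster.
sample = {
    "metadata": {"name": "00-master"},
    "spec": {
        "config": {
            "systemd": {
                "units": [
                    {
                        "name": "machine-config-daemon-host.service",
                        "contents": "[Unit]\nDescription=Machine Config Daemon Initial\n",
                    }
                ]
            }
        }
    },
}

contents = unit_contents(sample, "machine-config-daemon-host.service")
reconciled = contents is not None and "Comment" not in contents
print(reconciled)
```

Feeding it the parsed output of oc get mc/00-master -o json before and after the edit would show whether the Description change was dropped.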

Comment 9 errata-xmlrpc 2020-01-23 11:07:51 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0062

