Bug 1762868

Summary: Updating 00-master can wedge (master never progresses)
Product: OpenShift Container Platform
Reporter: Clayton Coleman <ccoleman>
Component: Machine Config Operator
Assignee: Antonio Murdaca <amurdaca>
Status: CLOSED ERRATA
QA Contact: Michael Nguyen <mnguyen>
Severity: urgent
Priority: high
Version: 4.3.0
CC: rfairley, smilner
Target Release: 4.3.0
Hardware: Unspecified
OS: Unspecified
Doc Type: If docs needed, set a value
Last Closed: 2020-01-23 11:07:51 UTC
Type: Bug
Bug Blocks: 1763205

Description Clayton Coleman 2019-10-17 17:20:48 UTC
1. Stand up 4.3 cluster
2. Edit 00-master: in the systemd unit machine-config-daemon-host.service, change the Description to have "Comment" at the end
3. Observe a machine config rollout start
4. Observe machine drain and reboot
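
The edit in step 2 amounts to appending one word to the Description= line of the unit. A minimal Python sketch of that text change, assuming the unit contents are available as a string. The function name append_to_description and the shortened sample unit are hypothetical; on a cluster the change is made by editing the 00-master MachineConfig.

```python
def append_to_description(unit_text: str, suffix: str) -> str:
    """Append `suffix` to the first Description= line of a systemd unit."""
    lines = unit_text.split("\n")
    for i, line in enumerate(lines):
        if line.startswith("Description="):
            lines[i] = f"{line} {suffix}"
            break  # only the first Description= line is changed
    return "\n".join(lines)


# Shortened, hypothetical sample of the unit text
unit = "[Unit]\nDescription=Machine Config Daemon Initial\nBefore=kubelet.service\n"
edited = append_to_description(unit, "Comment")
print(edited)
```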

Expected:

The config rollout proceeds to completion.

Actual:

The MCD on the first master is stuck in an error loop and never completes. 00-master was observed to be reverted, but that did not result in completion:

I1017 17:18:52.399955    3699 daemon.go:682] Current config: rendered-master-bcc64aa3038ab9329eea7ecc4a3023bf
I1017 17:18:52.399993    3699 daemon.go:683] Desired config: rendered-master-d51d2d1b5ace2ce0a43baad6b117ffe3
I1017 17:18:52.409401    3699 update.go:984] Disk currentConfig rendered-master-d51d2d1b5ace2ce0a43baad6b117ffe3 overrides node annotation rendered-master-bcc64aa3038ab9329eea7ecc4a3023bf
I1017 17:18:52.413259    3699 daemon.go:893] Validating against pending config rendered-master-d51d2d1b5ace2ce0a43baad6b117ffe3
E1017 17:18:52.416702    3699 daemon.go:1284] content mismatch for file /etc/systemd/system/machine-config-daemon-host.service: [Unit]
Description=Machine Config Daemon Initial

A: 
# This only applies to ostree (MCD) systems;
# see also https://github.com/openshift/machine-config-operator/issues/1046
ConditionPathExists=/run/ostree-booted
ConditionPathExists=/etc/pivot/image-pullspec
# If pivot exists, defer to it.  Note similar code in update.go
ConditionPathExists=!/usr/lib/systemd/system/pivot.service
After=ignition-firstboot-complete.service
Before=kubelet.service

[Service]
# Need oneshot to delay kubelet
Type=oneshot
# TODO add --from-etc-pullspec after ratcheting
ExecStart=/usr/libexec/machine-config-daemon pivot

[Install]
WantedBy=multi-user.target


B:  Comment
# This only applies to ostree (MCD) systems;
# see also https://github.com/openshift/machine-config-operator/issues/1046
ConditionPathExists=/run/ostree-booted
ConditionPathExists=/etc/pivot/image-pullspec
# If pivot exists, defer to it.  Note similar code in update.go
ConditionPathExists=!/usr/lib/systemd/system/pivot.service
After=ignition-firstboot-complete.service
Before=kubelet.service

[Service]
# Need oneshot to delay kubelet
Type=oneshot
# TODO add --from-etc-pullspec after ratcheting
ExecStart=/usr/libexec/machine-config-daemon pivot

[Install]
WantedBy=multi-user.target


E1017 17:18:52.416739    3699 writer.go:127] Marking Degraded due to: unexpected on-disk state validating against rendered-master-d51d2d1b5ace2ce0a43baad6b117ffe3
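
The mismatch message above has a recognizable shape: first the text shared by both versions of the file, then "A:" followed by the remainder of one version and "B:" followed by the remainder of the other. A minimal Python sketch that reproduces this layout, as an aid to reading the log and not the MCD's actual code. The name prefix_diff and the shortened sample units are hypothetical, and the log does not say which of A or B is the on-disk copy.

```python
def prefix_diff(a: str, b: str) -> str:
    """Return the common prefix of a and b, then each string's remainder."""
    i = 0
    limit = min(len(a), len(b))
    # Advance while both strings agree character by character
    while i < limit and a[i] == b[i]:
        i += 1
    return f"{a[:i]}\n\nA: {a[i:]}\n\nB: {b[i:]}\n\n"


# Shortened, hypothetical versions of the two unit files
a = "[Unit]\nDescription=Machine Config Daemon Initial\n# This only applies\n"
b = "[Unit]\nDescription=Machine Config Daemon Initial Comment\n# This only applies\n"
report = prefix_diff(a, b)
print(report)
```

Read this way, the only difference between the two versions in the log is the trailing " Comment" on the Description= line.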

Comment 2 Antonio Murdaca 2019-10-18 11:28:13 UTC
Oh, this is a bug in how we ship templates. I'm fixing it.

Comment 3 Antonio Murdaca 2019-10-18 11:40:44 UTC
This is likely to have been introduced by https://github.com/openshift/machine-config-operator/commit/cfddeaf8e90c289a2648a346defb9443600d519d

I'm working on a fix.

Comment 5 Antonio Murdaca 2019-10-18 12:37:43 UTC
Also, this is a regression caused by the patch that went in to fix the upgrade issue. I'm fixing all of this.

Comment 7 Michael Nguyen 2019-11-15 20:37:15 UTC
Verified on 4.3.0-0.nightly-2019-11-13-233341

1. Run oc edit mc/00-master
2. Attempt to make any change to the machine-config-daemon-host.service systemd unit
3. Verify that 00-master reconciles immediately back to the base template (no changes are kept)
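
The check in step 3 can be scripted by reading the unit back out of the MachineConfig and comparing it with the expected template text. A sketch, assuming the object has been saved as JSON (for example with oc get mc/00-master -o json) and that units are listed under spec.config.systemd.units as in the Ignition config format. The helper unit_contents and the sample data are hypothetical.

```python
import json
from typing import Optional


def unit_contents(machine_config: dict, unit_name: str) -> Optional[str]:
    """Return the contents of the named systemd unit, or None if absent."""
    units = (
        machine_config.get("spec", {})
        .get("config", {})
        .get("systemd", {})
        .get("units", [])
    )
    for unit in units:
        if unit.get("name") == unit_name:
            return unit.get("contents")
    return None


# Hypothetical, heavily shortened MachineConfig for illustration
sample = json.loads("""
{
  "spec": {"config": {"systemd": {"units": [
    {"name": "machine-config-daemon-host.service",
     "contents": "[Unit]\\nDescription=Machine Config Daemon Initial\\n"}
  ]}}}
}
""")
expected = "[Unit]\nDescription=Machine Config Daemon Initial\n"
reconciled = unit_contents(sample, "machine-config-daemon-host.service") == expected
print(reconciled)
```

If the comparison is false after an edit, 00-master has not yet been reconciled back to the base template.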

Comment 9 errata-xmlrpc 2020-01-23 11:07:51 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0062