Bug 1989403

Summary: Machine-config degraded for unknown reason
Product: OpenShift Container Platform Reporter: Jaspreet Kaur <jkaur>
Component: Machine Config Operator Assignee: MCO Bug Bot <mco-triage>
Machine Config Operator sub component: Machine Config Operator QA Contact: Rio Liu <rioliu>
Status: CLOSED ERRATA Docs Contact:
Severity: high    
Priority: high CC: mkrejci, skumari, sregidor
Version: 4.6 Flags: jerzhang: needinfo? (jkaur)
Target Milestone: ---   
Target Release: 4.6.z   
Hardware: Unspecified   
OS: Unspecified   
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-09-09 01:52:52 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Bug Depends On: 1918440    
Bug Blocks:    

Description Jaspreet Kaur 2021-08-03 07:12:17 UTC
Description of problem: The user recently upgraded the cluster. Several issues appeared initially and were resolved, after which the machine-config-daemon restarted. Its logs then show:

I0803 06:06:40.894442    3790 update.go:1270] /etc/systemd/system/multi-user.target.wants/node-valid-hostname.service already exists. Not making a new symlink
I0803 06:06:40.894466    3790 update.go:1357] Writing systemd unit "nodeip-configuration.service"
I0803 06:06:40.895754    3790 update.go:1290] /etc/systemd/system/multi-user.target.wants/nodeip-configuration.service was not present. No need to remove
I0803 06:06:40.895918    3790 update.go:1279] Enabled openvswitch.service
I0803 06:06:40.895939    3790 update.go:1357] Writing systemd unit "ovs-configuration.service"
I0803 06:06:40.897502    3790 update.go:1270] /etc/systemd/system/multi-user.target.wants/ovs-configuration.service already exists. Not making a new symlink
I0803 06:06:40.897524    3790 update.go:1323] Writing systemd unit dropin "10-ovs-vswitchd-restart.conf"
I0803 06:06:40.899093    3790 update.go:1323] Writing systemd unit dropin "10-ovsdb-restart.conf"
I0803 06:06:40.900679    3790 update.go:1279] Enabled ovsdb-server.service
I0803 06:06:40.900702    3790 update.go:1323] Writing systemd unit dropin "10-mco-default-env.conf"
I0803 06:06:40.902350    3790 update.go:1323] Writing systemd unit dropin "mco-disabled.conf"
I0803 06:06:40.904271    3790 update.go:1279] Enabled usbguard.service
I0803 06:06:40.904293    3790 update.go:1121] Deleting stale data
E0803 06:06:40.904426    3790 writer.go:135] Marking Degraded due to: exit status 1
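
For triage, it can help to pull out the error-level line together with the entry logged just before it. The following is a minimal sketch, assuming the daemon log uses the usual klog header layout (level letter, MMDD, time, PID, file:line, then the message). The names `parse_line` and `errors_with_context` are hypothetical helpers for illustration and are not part of the MCO.

```python
import re
from typing import List, NamedTuple, Optional, Tuple

# klog-style header: <level><MMDD> <time> <pid> <file>:<line>] <message>
KLOG_LINE = re.compile(
    r"^(?P<level>[IWEF])(?P<date>\d{4}) (?P<time>\d{2}:\d{2}:\d{2}\.\d+)\s+"
    r"(?P<pid>\d+) (?P<file>[^:\s]+):(?P<line>\d+)\] (?P<msg>.*)$"
)


class Entry(NamedTuple):
    level: str   # I, W, E or F
    time: str    # wall-clock time as logged
    source: str  # "file.go:line" that emitted the message
    msg: str


def parse_line(line: str) -> Optional[Entry]:
    """Parse one log line; return None if it does not match the header layout."""
    m = KLOG_LINE.match(line.strip())
    if m is None:
        return None
    return Entry(m["level"], m["time"], f'{m["file"]}:{m["line"]}', m["msg"])


def errors_with_context(lines: List[str], context: int = 1) -> List[Tuple[List[Entry], Entry]]:
    """Return each E/F-level entry with up to `context` entries logged before it."""
    entries = [e for e in map(parse_line, lines) if e is not None]
    found = []
    for i, entry in enumerate(entries):
        if entry.level in ("E", "F"):
            found.append((entries[max(0, i - context):i], entry))
    return found


# The last three lines of the log excerpt above.
SAMPLE = [
    "I0803 06:06:40.904271    3790 update.go:1279] Enabled usbguard.service",
    "I0803 06:06:40.904293    3790 update.go:1121] Deleting stale data",
    "E0803 06:06:40.904426    3790 writer.go:135] Marking Degraded due to: exit status 1",
]

if __name__ == "__main__":
    for before, err in errors_with_context(SAMPLE):
        for entry in before:
            print(f"  before: {entry.source}: {entry.msg}")
        print(f"  error:  {err.source}: {err.msg}")
```

On this excerpt the only error-level entry is the one from writer.go:135, and the entry before it gives no cause beyond "exit status 1", which matches the complaint that the failure reason is not reported.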

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Upgrade cluster to 4.6
2. The upgrade fails for an unknown reason

Actual results: Machine-config is degraded for the worker pool for an unknown reason.

Expected results: Either the upgrade completes smoothly, or the failure is reported with a message that explains the cause.

Additional info:

Comment 3 Sinny Kumari 2021-08-17 14:50:44 UTC
From the MCD pod logs:
I0803 06:00:47.350972    3790 update.go:1676] Starting update from rendered-worker-7298552652d8f48f49c045e906ef5fa3 to rendered-worker-7298552652d8f48f49c045e906ef5fa3: &{osUpdate:false kargs:false fips:false passwd:false files:false units:false kernelType:false extensions:false}
I0803 06:04:00.049156    3790 update.go:1676] Running rpm-ostree [kargs --delete=audit_backlog_limit=8192 --delete=audit=1 --delete=nousb --delete=page_poison=1 --delete=pti=on --delete=vsyscall=none --append=audit_backlog_limit=8192 --append=audit=1 --append=nousb --append=page_poison=1 --append=pti=on --append=vsyscall=none]
I0803 06:04:00.285518    3790 update.go:375] Rolling back applied changes to OS due to error: exit status 1
I0803 06:04:00.285573    3790 rpm-ostree.go:261] Running captured: rpm-ostree cleanup -p

This looks like an instance of bug https://bugzilla.redhat.com/show_bug.cgi?id=1918440, where we unnecessarily update kernel arguments even though no changes were made.
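
To illustrate why the logged rpm-ostree call amounts to no change, here is a minimal sketch that splits the logged argument list into deleted and appended kernel arguments and compares the two. It only inspects the arguments from the log line above. It is not the MCO's code and not the fix from the linked bug, and `split_kargs` and `is_noop` are hypothetical names.

```python
from typing import List, Tuple

# Argument list copied from the "Running rpm-ostree [...]" log line above.
LOGGED = (
    "kargs --delete=audit_backlog_limit=8192 --delete=audit=1 --delete=nousb "
    "--delete=page_poison=1 --delete=pti=on --delete=vsyscall=none "
    "--append=audit_backlog_limit=8192 --append=audit=1 --append=nousb "
    "--append=page_poison=1 --append=pti=on --append=vsyscall=none"
)


def split_kargs(args: List[str]) -> Tuple[List[str], List[str]]:
    """Separate the values passed via --delete= from those passed via --append=."""
    deleted: List[str] = []
    appended: List[str] = []
    for arg in args:
        if arg.startswith("--delete="):
            deleted.append(arg[len("--delete="):])
        elif arg.startswith("--append="):
            appended.append(arg[len("--append="):])
    return deleted, appended


def is_noop(args: List[str]) -> bool:
    """True when every deleted kernel argument is appended again and nothing else changes."""
    deleted, appended = split_kargs(args)
    return sorted(deleted) == sorted(appended)


if __name__ == "__main__":
    deleted, appended = split_kargs(LOGGED.split())
    print("deleted: ", deleted)
    print("appended:", appended)
    print("no-op:   ", is_noop(LOGGED.split()))
```

For the logged call the deleted and appended lists contain the same six arguments, so the requested end state equals the starting state. This is consistent with the update diff in the first log line, where every field, including kargs, is false.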

Comment 8 errata-xmlrpc 2021-09-09 01:52:52 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6.44 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.