Bug 2035214

Summary: 4.7 install hangs. RHEL 7.9 worker stuck on "error enabling unit: Failed to execute operation: No such file or directory"
Product: OpenShift Container Platform
Reporter: Jatan Malde <jmalde>
Component: Machine Config Operator
Assignee: MCO Team <team-mco>
Machine Config Operator sub component: Machine Config Operator
QA Contact: Rio Liu <rioliu>
Status: CLOSED INSUFFICIENT_DATA
Docs Contact:
Severity: high
Priority: unspecified
CC: aos-bugs, dornelas, jerzhang, jkyros, mkrejci
Version: 4.7
Target Milestone: ---
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2022-02-14 18:43:19 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Jatan Malde 2021-12-23 09:58:41 UTC
Description of problem:

I have a customer who recently upgraded an OCP 4 bare-metal cluster from 4.6 to 4.7 successfully. They also have RHEL worker nodes, but those were still on Kubernetes 1.17, so they were asked to delete each node, clean it, install a fresh RHEL 7.9 OS, and run the scale-up playbook.

The playbook fails as the machine-config-daemon is not able to preset the systemd unit files. 

~~~
        "I1221 12:27:52.912779       1 update.go:1562] Writing systemd unit dropin \"mco-disabled.conf\"", 
        "I1221 12:27:52.965805       1 update.go:1633] Could not reset unit preset for zincati.service, skipping. (Error msg: error running preset on unit: Failed to execute operation: No such file or directory", 
        ")", 
        "I1221 12:27:52.966103       1 update.go:1562] Writing systemd unit dropin \"10-mco-default-env.conf\"", 
        "I1221 12:27:52.993304       1 update.go:1633] Could not reset unit preset for pivot.service, skipping. (Error msg: error running preset on unit: Failed to execute operation: No such file or directory", 
        ")", 
        "I1221 12:27:52.993559       1 update.go:1562] Writing systemd unit dropin \"10-ovsdb-restart.conf\"", 
        "I1221 12:27:52.996618       1 update.go:1562] Writing systemd unit dropin \"10-ovs-vswitchd-restart.conf\"", 
        "I1221 12:27:53.023010       1 update.go:1633] Could not reset unit preset for ovs-vswitchd.service, skipping. (Error msg: error running preset on unit: Failed to execute operation: No such file or directory", 
        ")", 
        "I1221 12:27:53.023264       1 update.go:1596] Writing systemd unit \"ovs-configuration.service\"", 
        "I1221 12:27:53.026921       1 update.go:1596] Writing systemd unit \"nodeip-configuration.service\"", 
        "I1221 12:27:53.030395       1 update.go:1596] Writing systemd unit \"node-valid-hostname.service\"", 
        "I1221 12:27:53.033854       1 update.go:1596] Writing systemd unit \"etc-NetworkManager-systemConnectionsMerged.mount\"", 
        "I1221 12:27:53.037211       1 update.go:1596] Writing systemd unit \"machine-config-daemon-pull.service\"", 
        "I1221 12:27:53.040602       1 update.go:1596] Writing systemd unit \"machine-config-daemon-firstboot.service\"", 
        "I1221 12:27:53.043655       1 update.go:1562] Writing systemd unit dropin \"10-mco-default-env.conf\"", 
        "I1221 12:27:53.046703       1 update.go:1562] Writing systemd unit dropin \"10-mco-default-madv.conf\"", 
        "I1221 12:27:53.049808       1 update.go:1596] Writing systemd unit \"kubelet.service\"", 
        "I1221 12:27:53.052936       1 update.go:1562] Writing systemd unit dropin \"mco-disabled.conf\"", 
        "I1221 12:27:53.079359       1 update.go:1633] Could not reset unit preset for docker.socket, skipping. (Error msg: error running preset on unit: Failed to execute operation: No such file or directory", 
        ")", 
        "I1221 12:27:53.079647       1 update.go:1562] Writing systemd unit dropin \"10-mco-default-env.conf\"", 
        "I1221 12:27:53.082802       1 update.go:1562] Writing systemd unit dropin \"10-mco-profile-unix-socket.conf\"", 
        "I1221 12:27:53.085720       1 update.go:1562] Writing systemd unit dropin \"10-mco-default-madv.conf\"", 
        "I1221 12:27:53.772725       1 update.go:1535] Preset systemd unit crio.service", 
        "F1221 12:27:54.214792       1 start.go:158] error enabling units: Failed to execute operation: No such file or directory"

~~~
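To make the log above easier to triage, here is a minimal Python sketch that separates the non-fatal "Could not reset unit preset ... skipping" messages from the fatal line. It assumes the standard klog header layout (severity letter, MMDD, time, PID, file:line) and uses a few lines from the excerpt with the Ansible quoting stripped. The function name `summarize` and both regular expressions are illustrative only and are not part of the MCO.

```python
import re

# klog header: severity letter (I/W/E/F), MMDD, time, PID, file:line, then the message.
KLOG_RE = re.compile(
    r"(?P<sev>[IWEF])(?P<date>\d{4}) (?P<time>\d{2}:\d{2}:\d{2}\.\d+)\s+"
    r"(?P<pid>\d+) (?P<src>[\w.]+:\d+)\] (?P<msg>.*)"
)
# Message emitted when a preset reset is skipped (wording taken from the excerpt).
PRESET_SKIP_RE = re.compile(r"Could not reset unit preset for (\S+), skipping")


def summarize(lines):
    """Return (units whose preset reset was skipped, fatal messages)."""
    skipped, fatal = [], []
    for line in lines:
        m = KLOG_RE.search(line)
        if not m:
            continue  # continuation lines such as ")" carry no klog header
        skip = PRESET_SKIP_RE.search(m.group("msg"))
        if skip:
            skipped.append(skip.group(1))
        if m.group("sev") == "F":
            fatal.append(m.group("msg"))
    return skipped, fatal


# Lines copied from the excerpt above, with the Ansible JSON quoting removed.
sample = [
    "I1221 12:27:52.965805       1 update.go:1633] Could not reset unit preset for zincati.service, skipping. (Error msg: error running preset on unit: Failed to execute operation: No such file or directory",
    ")",
    "I1221 12:27:53.772725       1 update.go:1535] Preset systemd unit crio.service",
    "F1221 12:27:54.214792       1 start.go:158] error enabling units: Failed to execute operation: No such file or directory",
]
skipped, fatal = summarize(sample)
print("skipped presets:", skipped)
print("fatal:", fatal)
```

In this excerpt the only `F`-level line is the "error enabling units" message from start.go; the preset messages are logged at `I` level and explicitly say "skipping".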

The issue looks similar to the one reported here:
https://bugzilla.redhat.com/show_bug.cgi?id=1913536

Version-Release number of MCO (Machine Config Operator) (if applicable):

Platform (AWS, VSphere, Metal, etc.):

Are you certain that the root cause of the issue being reported is the MCO (Machine Config Operator)?
Yes

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:

The must-gather and full Ansible logs are attached as well.

Comment 4 John Kyros 2022-01-10 17:39:17 UTC
Removing blocker flag -- given what we currently know it does not appear that this is a blocker. Will re-add if new information changes our understanding.

Comment 5 Yu Qi Zhang 2022-01-18 01:07:48 UTC
Also to add, the failure is not due to the preset failing. You can see that the preset succeeded. The failure came when the MCO attempted to enable the units provided via MachineConfigs. It looks like some of the systemd units that the MCO is configured to enable do not exist on the system.
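One way to check this hypothesis on the affected RHEL worker is to look for a unit file for each candidate unit. The sketch below is a rough aid under stated assumptions: the helper `missing_units` is hypothetical, the candidate names are simply units mentioned in the log in the description (the actual list of units the MCO tries to enable is not given in this bug), and only the usual system unit directories are searched. Masked units (symlinks to /dev/null) and template instances are not handled and would be reported as missing.

```python
from pathlib import Path

# Common systemd search paths for system units (assumed; a host may use others).
UNIT_DIRS = ("/etc/systemd/system", "/run/systemd/system", "/usr/lib/systemd/system")


def missing_units(units, unit_dirs=UNIT_DIRS):
    """Return the unit names that have no regular unit file in any of unit_dirs."""
    return [
        unit
        for unit in units
        if not any((Path(d) / unit).is_file() for d in unit_dirs)
    ]


# Hypothetical candidate list: names taken from the log, not from the real enable list.
candidates = ["crio.service", "kubelet.service", "zincati.service", "pivot.service"]
print("no unit file found for:", missing_units(candidates))
```

A unit that only has a drop-in directory (for example `zincati.service.d/`) but no unit file of its own would show up in this list, since only the unit file itself is checked.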

Comment 6 Yu Qi Zhang 2022-02-14 18:43:19 UTC
Given that there have been no updates, I will be closing this as insufficient data. Please reopen if this is still an issue, thanks!