Bug 1829713

Summary: [RHEL7.9] cloud-final.service failed when try "systemctl try-reload-or-restart NetworkManager.service"
Product: Red Hat Enterprise Linux 7
Reporter: Frank Liang <xiliang>
Component: cloud-init
Assignee: Eduardo Otubo <eterrell>
Status: CLOSED DUPLICATE
QA Contact: Virtualization Bugs <virt-bugs>
Severity: medium
Docs Contact:
Priority: medium
Version: 7.9
CC: eterrell, huzhao, jen, jgreguske, jnewbigin, leiwang, linl, pvlasin, ribarry, virt-maint, vkuznets, xiachen, ymao
Target Milestone: rc
Keywords: Regression
Target Release: ---
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-05-04 12:54:17 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Frank Liang 2020-04-30 07:33:48 UTC
In the latest RHEL-7.9-20200430.n.0 compose, cloud-final.service fails to start.
In RHEL 8, systemctl supports "try-reload-or-restart", but in RHEL 7 the operation is named "reload-or-try-restart".

[root@ip-10-116-2-72 system]# rpm -qa|grep cloud-init
cloud-init-19.4-5.el7.x86_64
[root@ip-10-116-2-72 system]# systemctl status cloud-final.service
● cloud-final.service - Execute cloud user/final scripts
   Loaded: loaded (/usr/lib/systemd/system/cloud-final.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Thu 2020-04-30 06:38:11 UTC; 44min ago
  Process: 1523 ExecStartPost=/usr/bin/systemctl try-reload-or-restart NetworkManager.service (code=exited, status=1/FAILURE)
  Process: 1520 ExecStartPost=/bin/echo try restart NetworkManager.service (code=exited, status=0/SUCCESS)
  Process: 1396 ExecStart=/usr/bin/cloud-init modules --mode=final (code=exited, status=0/SUCCESS)
 Main PID: 1396 (code=exited, status=0/SUCCESS)

Apr 30 06:38:11 ip-10-116-2-72.us-west-2.compute.internal ec2[1483]: 256 SHA256:eM5vX61x/F5JwNW5Vgyc/kibtrqcUUag1Ve1XXKZx+c no comment (ECDSA)
Apr 30 06:38:11 ip-10-116-2-72.us-west-2.compute.internal ec2[1483]: 256 SHA256:wnZX9HrXAXdfx/RsZ5BnARURtTTzfo9YLaVNl3iVBNc no comment (ED25519)
Apr 30 06:38:11 ip-10-116-2-72.us-west-2.compute.internal ec2[1483]: 2048 SHA256:yzD+F5cTgmqBhV8Y1xbVRuHDDc+XNHylVvAnOmhlz98 no comment (RSA)
Apr 30 06:38:11 ip-10-116-2-72.us-west-2.compute.internal cloud-init[1396]: Cloud-init v. 19.4 finished at Thu, 30 Apr 2020 06:38:11 +0000. ...conds
Apr 30 06:38:11 ip-10-116-2-72.us-west-2.compute.internal echo[1520]: try restart NetworkManager.service
Apr 30 06:38:11 ip-10-116-2-72.us-west-2.compute.internal systemctl[1523]: Unknown operation 'try-reload-or-restart'.
Apr 30 06:38:11 ip-10-116-2-72.us-west-2.compute.internal systemd[1]: cloud-final.service: control process exited, code=exited status=1
Apr 30 06:38:11 ip-10-116-2-72.us-west-2.compute.internal systemd[1]: Failed to start Execute cloud user/final scripts.
Apr 30 06:38:11 ip-10-116-2-72.us-west-2.compute.internal systemd[1]: Unit cloud-final.service entered failed state.
Apr 30 06:38:11 ip-10-116-2-72.us-west-2.compute.internal systemd[1]: cloud-final.service failed.

[root@ip-10-116-2-72 system]# rpm -qf /bin/systemctl
systemd-219-76.el7.x86_64
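
The naming difference described above can be handled by choosing the verb from the running systemd version. The following is only a sketch: `pick_reload_verb` is a hypothetical helper, and the cutoff of 229 is an assumption based on the upstream systemd changelog rather than something this report states. The report itself only confirms that systemd 219 rejects "try-reload-or-restart".

```shell
#!/bin/sh
# pick_reload_verb: hypothetical helper, not part of cloud-init or systemd.
# Prints the systemctl verb to use for a given systemd major version.
# ASSUMPTION: the verb was renamed in systemd 229; adjust if that is wrong.
pick_reload_verb() {
    ver=$1
    if [ "$ver" -ge 229 ]; then
        echo try-reload-or-restart
    else
        echo reload-or-try-restart
    fi
}

# Usage on a live host (not executed here):
#   ver=$(systemctl --version | awk 'NR==1 {print $2}')
#   systemctl "$(pick_reload_verb "$ver")" NetworkManager.service
```

On the host shown above (systemd 219) this selects the RHEL 7 spelling.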

Version-Release number of selected components (if applicable):

RHEL Version:
RHEL-7.9(3.10.0-1137.el7.x86_64)

How reproducible:
100%

Steps to Reproduce:
1. Start a RHEL-7.9 AMI on AWS and check the service status.

Actual results:
cloud-final.service failed
Expected results:
cloud-final.service does not fail

Additional info:
- The failure is most likely caused by "ci-Remove-race-condition-between-cloud-init-and-Network-v2.patch [bz#1748015]".

Comment 8 John Newbigin 2020-11-19 03:31:11 UTC
I have NetworkManager.service masked which caused this error.
I had to remove `ExecStartPost=/usr/bin/systemctl reload-or-try-restart NetworkManager.service` from cloud-final.service
(copy to /etc/systemd/system/cloud-final.service and edit)
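
The copy-and-edit workaround above can be sketched as a small shell helper. This is an illustration, not a supported fix: `strip_nm_restart` is a hypothetical name, and the sed pattern assumes the ExecStartPost line looks like the one quoted in this comment. A unit file in /etc/systemd/system takes precedence over the one in /usr/lib/systemd/system.

```shell
#!/bin/sh
# strip_nm_restart: hypothetical helper illustrating the workaround.
# Copies a unit file from $1 to $2, dropping only the ExecStartPost line
# that calls systemctl on NetworkManager.service. Other lines are kept.
strip_nm_restart() {
    src=$1   # e.g. /usr/lib/systemd/system/cloud-final.service
    dst=$2   # e.g. /etc/systemd/system/cloud-final.service
    sed '/^ExecStartPost=.*systemctl .*NetworkManager\.service/d' "$src" > "$dst"
}

# Usage as root on an affected host (not executed here):
#   strip_nm_restart /usr/lib/systemd/system/cloud-final.service \
#                    /etc/systemd/system/cloud-final.service
#   systemctl daemon-reload
```

The override copy will not pick up later changes to the packaged unit, so it is worth revisiting after cloud-init updates.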

Comment 9 Eduardo Otubo 2020-11-19 11:43:17 UTC
(In reply to John Newbigin from comment #8)
> I have NetworkManager.service masked which caused this error.
> I had to remove `ExecStartPost=/usr/bin/systemctl reload-or-try-restart
> NetworkManager.service` from cloud-final.service
> (copy to /etc/systemd/system/cloud-final.service and edit)

I believe you want to refer to this bug: "cloud-final.service fails if NetworkManager not installed." (https://bugzilla.redhat.com/show_bug.cgi?id=1898943). When you deploy a RHEL instance with cloud-init you should be using NetworkManager; otherwise it will indeed fail. The above-mentioned bug will eventually be fixed on RHEL-8* branches only. If you need more assistance, please feel free to leave a comment.

Comment 10 John Newbigin 2020-11-19 23:46:51 UTC
Fixing this in EL8 is great.
But I am on EL7.

We have NetworkManager installed because of
https://bugzilla.redhat.com/show_bug.cgi?id=1754701

I just wanted to publish my workaround to help others facing the same regression.