Bug 2003915

Summary: pre-network-manager-config failed due to timeout when static config is used
Product: Red Hat Advanced Cluster Management for Kubernetes
Reporter: Jatan Malde <jmalde>
Component: Infrastructure Operator
Assignee: Michael Filanov <mfilanov>
Status: CLOSED ERRATA
QA Contact: Chad Crum <ccrum>
Severity: high
Docs Contact: Derek <dcadzow>
Priority: unspecified
Version: rhacm-2.4
CC: aos-bugs, asegurap, cchun, ccrum, dhuynh, juhsu, mfilanov, ncarboni, oamizur, odepaz, padillon, rfreiman, sasha, trwest, yfirst
Target Milestone: ---
Flags: ccrum: qe_test_coverage-, ming: rhacm-2.4+, jmalde: needinfo-, jmalde: needinfo-
Target Release: rhacm-2.4   
Hardware: x86_64   
OS: Linux   
Last Closed: 2021-11-11 18:33:32 UTC
Type: Bug
Bug Blocks: 2014084    

Description Jatan Malde 2021-09-14 07:05:30 UTC
Version:

$ openshift-install version
Installer used from cloud.redhat.com 

Platform:

baremetal

What happened?

Single-node OCP installs with a static IP address sometimes fail because the static network configuration is not applied when the node first boots from the discovery ISO image. Since DHCP is not used and the static address was never set, the node has no external connectivity at all. The specific failure is a timeout of the pre-network-manager-config service, which is the service that should apply the static network config to NetworkManager. In the ignition file, the timeout for the service is set to 10s, and neither retries nor dependencies on any other service are defined, so once the service fails, the node never gets network connectivity.


Sep 08 09:20:55 localhost sh[1970]: changing security context of '/var/usrlocal/sbin'
Sep 08 09:20:55 localhost auditd[2055]: No plugins found, not dispatching events
Sep 08 09:20:55 localhost auditd[2055]: Init complete, auditd 3.0 listening for events (startup state enable)
Sep 08 09:20:57 localhost systemd[1]: Started RHCOS Fix SELinux Labeling For /usr/local/sbin.
Sep 08 09:21:01 localhost systemd[1]: pre-network-manager-config.service: start operation timed out. Terminating.
Sep 08 09:21:01 localhost systemd[1]: pre-network-manager-config.service: Main process exited, code=killed, status=15/TERM
Sep 08 09:21:01 localhost systemd[1]: pre-network-manager-config.service: Failed with result 'timeout'.
Sep 08 09:21:01 localhost systemd[1]: Failed to start Prepare network manager config content.
Sep 08 09:21:03 localhost systemd[1]: Started Run update-ca-trust.
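
To spot this failure quickly in a saved serial or journal log, one option is to scan for systemd's "Failed with result" lines. The Python sketch below is illustrative only: the function name failed_units is hypothetical, and the regular expression assumes the journal line format shown in the excerpt above.

```python
import re

# Matches lines such as:
#   systemd[1]: pre-network-manager-config.service: Failed with result 'timeout'.
# The pattern is an assumption based on the log excerpt in this report.
_FAILED_RE = re.compile(
    r"systemd\[1\]: (?P<unit>\S+\.service): Failed with result '(?P<result>[^']+)'"
)


def failed_units(journal_text):
    """Return (unit, result) pairs for every unit systemd reported as failed."""
    return [
        (match.group("unit"), match.group("result"))
        for match in _FAILED_RE.finditer(journal_text)
    ]
```

Running failed_units over the full serial log and filtering for a result of "timeout" should single out the pre-network-manager-config.service failure shown above.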


What did you expect to happen?

The pre-network-manager-config service should work fine and the static configuration should be set up.

How to reproduce it (as minimally and precisely as possible)?

Create a 4.8 cluster on cloud.redhat.com using the Assisted Installer and select the checkbox for single-node OpenShift.

Anything else we need to know?

When the timeout is set to 30 seconds, the configuration is applied successfully and the node network is set up.
Attaching the patched ignition file and the full serial logs from the machine.
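
For readers who want to reproduce the workaround, here is a minimal Python sketch of how the timeout could be raised in a discovery ignition file that has already been loaded as a dict. It is not the attached patch and not the fix that was merged. It assumes the standard Ignition layout (systemd.units[] entries with name and contents) and that the unit text carries the timeout as a TimeoutSec= or TimeoutStartSec= line; the function name raise_unit_timeout is hypothetical.

```python
import copy
import re

UNIT_NAME = "pre-network-manager-config.service"

# Assumption: the unit sets its timeout with TimeoutSec= or TimeoutStartSec=.
_TIMEOUT_RE = re.compile(r"^(Timeout(?:Start)?Sec)=.*$", re.MULTILINE)


def raise_unit_timeout(ignition, seconds=30, unit_name=UNIT_NAME):
    """Return a copy of the ignition dict with the unit's timeout replaced."""
    patched = copy.deepcopy(ignition)  # leave the caller's dict untouched
    for unit in patched.get("systemd", {}).get("units", []):
        if unit.get("name") != unit_name:
            continue
        contents, count = _TIMEOUT_RE.subn(
            lambda m: f"{m.group(1)}={seconds}", unit.get("contents", "")
        )
        if count == 0:
            raise ValueError(f"no timeout setting found in {unit_name}")
        unit["contents"] = contents
        return patched
    raise ValueError(f"{unit_name} not found in ignition config")
```

A caller would typically json.load the ignition file, pass the result through raise_unit_timeout, and json.dump it back before rebuilding or serving the discovery image.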

Comment 2 Nick Carboni 2021-09-22 17:44:13 UTC
This is a potential issue with the assisted installer running in cloud.redhat.com so it doesn't qualify as an OCP blocker.

Comment 4 Alexander Chuzhoy 2021-10-05 13:29:11 UTC
Hi Jatan,
What model is the machine where you see this issue?

Comment 6 Alexander Chuzhoy 2021-10-07 16:54:13 UTC
IMHO this looks similar to https://bugzilla.redhat.com/show_bug.cgi?id=2002059
Duplicate?

Comment 12 Crystal Chun 2021-10-26 19:30:28 UTC
PR for 2.4 has been merged https://github.com/openshift/assisted-service/pull/2773 

Can we get this verified and closed?

Comment 16 errata-xmlrpc 2021-11-11 18:33:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat Advanced Cluster Management 2.4 images and security updates), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:4618

Comment 17 Osher De Paz 2021-11-14 16:36:49 UTC
Hi!
Just wanted to let everyone know that console.redhat.com/cloud.redhat.com contains the relevant fix, with version v1.0.27.3.