Bug 2111526

Summary: Install fails due to masters not rebooting within timeout because resolv.conf does not exist
Product: Red Hat Advanced Cluster Management for Kubernetes
Reporter: Trey West <trwest>
Component: Infrastructure Operator
Assignee: Michael Filanov <mfilanov>
Status: CLOSED CURRENTRELEASE
QA Contact: Chad Crum <ccrum>
Severity: high
Docs Contact: Derek <dcadzow>
Priority: unspecified
Version: rhacm-2.6
CC: ccrum, trwest, yfirst, yuhe
Target Milestone: ---
Keywords: Regression, Reopened, TestBlocker
Target Release: ---
Flags: ccrum: Blocker+
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2022-10-03 20:23:57 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 2100456, 2105069, 2109967, 2111632
Bug Blocks:

Description Trey West 2022-07-27 12:42:39 UTC
Description of the problem:

When installing a 4.11 IPv6 DHCP cluster, the masters reach the rebooting stage but never rejoin the cluster because of errors during execution of the resolv-prepender script. The node's journal displays "NM resolv-prepender: NM resolv.conf still empty of nameserver" multiple times before the node attempts to pull container images and ultimately fails.
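
For triage, one way to confirm the symptom is to count how often the message quoted above appears in a saved copy of the node journal. The following is only a minimal sketch: the function name and the idea of passing the journal as a plain string are assumptions for illustration, not part of any installer or operator tooling.

```python
# Message text as quoted in this report.
EMPTY_RESOLV_MSG = "NM resolv.conf still empty of nameserver"


def count_empty_resolv_messages(journal_text: str) -> int:
    """Return the number of journal lines containing the resolv-prepender warning."""
    return sum(
        1 for line in journal_text.splitlines() if EMPTY_RESOLV_MSG in line
    )
```

A count greater than zero on a master that failed to rejoin would be consistent with the failure described here, although it does not by itself prove the root cause.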


Release version:
4.11.0-0.nightly-2022-07-26-041421

Operator snapshot version:
2.1.0-DOWNANDBACK-2022-07-25-16-01-39

OCP version:
4.11

Steps to reproduce:
1. Install a 4.11 multi-node cluster with IPv6 DHCP networking

Actual results:
Installation fails due to masters not rebooting within timeout

Expected results:
Installation completes successfully

Additional info:
Jul 26 07:34:10 mdhcp-master-0-0 nm-dispatcher[1972]: + [[ OVNKubernetes == \O\V\N\K\u\b\e\r\n\e\t\e\s ]]
Jul 26 07:34:10 mdhcp-master-0-0 nm-dispatcher[1972]: + [[ '' == \W\i\r\e\d\ \C\o\n\n\e\c\t\i\o\n ]]
Jul 26 07:34:10 mdhcp-master-0-0 nm-dispatcher[1972]: + '[' -z ']'
Jul 26 07:34:10 mdhcp-master-0-0 nm-dispatcher[1972]: + echo 'Not a DHCP4 address. Ignoring.'
Jul 26 07:34:10 mdhcp-master-0-0 nm-dispatcher[1972]: Not a DHCP4 address. Ignoring.
Jul 26 07:34:10 mdhcp-master-0-0 nm-dispatcher[1972]: + exit 0
Jul 26 07:34:10 mdhcp-master-0-0 nm-dispatcher[1973]: + '[' -z ']'
Jul 26 07:34:10 mdhcp-master-0-0 nm-dispatcher[1973]: + echo 'Not a DHCP6 address. Ignoring.'
Jul 26 07:34:10 mdhcp-master-0-0 nm-dispatcher[1973]: Not a DHCP6 address. Ignoring.
Jul 26 07:34:10 mdhcp-master-0-0 nm-dispatcher[1973]: + exit 0
Jul 26 07:34:10 mdhcp-master-0-0 nm-dispatcher[1976]: Error: Device '' not found.
Jul 26 07:34:10 mdhcp-master-0-0 nm-dispatcher[1986]: + [[ OVNKubernetes == \O\V\N\K\u\b\e\r\n\e\t\e\s ]]
Jul 26 07:34:10 mdhcp-master-0-0 nm-dispatcher[1986]: + [[ '' == \W\i\r\e\d\ \C\o\n\n\e\c\t\i\o\n ]]
Jul 26 07:34:10 mdhcp-master-0-0 nm-dispatcher[1986]: + '[' -z ']'
Jul 26 07:34:10 mdhcp-master-0-0 nm-dispatcher[1986]: + echo 'Not a DHCP4 address. Ignoring.'
Jul 26 07:34:10 mdhcp-master-0-0 nm-dispatcher[1986]: Not a DHCP4 address. Ignoring.
Jul 26 07:34:10 mdhcp-master-0-0 nm-dispatcher[1986]: + exit 0
Jul 26 07:34:10 mdhcp-master-0-0 nm-dispatcher[1987]: + '[' -z ']'
Jul 26 07:34:10 mdhcp-master-0-0 nm-dispatcher[1987]: + echo 'Not a DHCP6 address. Ignoring.'
Jul 26 07:34:10 mdhcp-master-0-0 nm-dispatcher[1987]: Not a DHCP6 address. Ignoring.
Jul 26 07:34:10 mdhcp-master-0-0 nm-dispatcher[1987]: + exit 0
Jul 26 07:34:10 mdhcp-master-0-0 nm-dispatcher[1990]: Error: Device '' not found.
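
The trace above mixes shell xtrace lines (prefixed with "+ ") with the messages the dispatcher scripts actually printed. A small sketch like the one below can separate the two and tally the printed messages. The regular expression assumes the journal line layout shown in this excerpt (month, day, time, host, `nm-dispatcher[PID]:`), and the function name is hypothetical.

```python
import re
from collections import Counter

# Assumed layout, based on the excerpt above:
# "Jul 26 07:34:10 <host> nm-dispatcher[<pid>]: <message>"
LINE_RE = re.compile(
    r"^\w{3}\s+\d+\s[\d:]{8}\s(?P<host>\S+)\s"
    r"nm-dispatcher\[(?P<pid>\d+)\]:\s(?P<msg>.*)$"
)


def summarize_dispatcher_messages(journal_text: str) -> Counter:
    """Count nm-dispatcher messages, skipping shell xtrace lines."""
    counts = Counter()
    for raw_line in journal_text.splitlines():
        match = LINE_RE.match(raw_line.strip())
        if match is None:
            continue  # not an nm-dispatcher line in the assumed layout
        message = match.group("msg")
        if message.startswith("+ "):
            continue  # xtrace output (set -x), not a printed message
        counts[message] += 1
    return counts
```

Running this over a longer journal capture makes repeated failures such as "Error: Device '' not found." easier to spot than reading the raw trace.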

Comment 2 Trey West 2022-07-27 13:04:13 UTC

*** This bug has been marked as a duplicate of bug 2105069 ***

Comment 3 Chad Crum 2022-08-03 15:21:43 UTC
I'm going to reopen this bug as a blocker and use it to track the following blocking bugs:
- BZ2100456
- BZ2105069
- BZ2111632

Once these bugs are fixed we can validate with the infrastructure operator and close this out.

Comment 4 Yuanyuan He 2022-08-18 14:49:34 UTC
Per the latest update, the fix was merged and we are waiting for the next (weekly) OCP release.

May we know exactly when the OCP release containing the merged PR will be available? Does it require a new AI build on the ACM side? Thanks!

Comment 5 Trey West 2022-08-22 12:28:16 UTC
Hi Yuanyuan,

This is the PR we are depending on for a fix in 4.11: https://github.com/openshift/machine-config-operator/pull/3287. So far it has only been merged for 4.12.

Comment 6 Chad Crum 2022-09-29 15:16:47 UTC
This was fixed and verified by QE.