Bug 2111526 - Install fails due to masters not rebooting within timeout because resolv.conf does not exist
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Advanced Cluster Management for Kubernetes
Classification: Red Hat
Component: Infrastructure Operator
Version: rhacm-2.6
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Michael Filanov
QA Contact: Chad Crum
Docs Contact: Derek
URL:
Whiteboard:
Depends On: 2100456 2105069 2109967 2111632
Blocks:
Reported: 2022-07-27 12:42 UTC by Trey West
Modified: 2022-10-20 19:55 UTC
CC List: 4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-10-03 20:23:57 UTC
Target Upstream Version:
Embargoed:
ccrum: Blocker+




Links
- Github stolostron backlog issues: 24815 (last updated 2022-08-03 17:06:09 UTC)
- Red Hat Issue Tracker: MGMTBUGSM-483 (last updated 2022-07-27 13:04:50 UTC)

Description Trey West 2022-07-27 12:42:39 UTC
Description of the problem:

When installing a 4.11 IPv6 DHCP cluster, the masters reach the rebooting stage but never rejoin the cluster because the resolv-prepender script fails during execution. The nodes' journal shows "NM resolv-prepender: NM resolv.conf still empty of nameserver" repeatedly; the nodes then attempt to pull container images and ultimately fail.
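The journal message above indicates that the prepender found no `nameserver` entry in NetworkManager's generated resolv.conf (or, per the bug title, no file at all). As a hypothetical diagnostic, not the actual resolv-prepender script, a minimal Python sketch that checks the same condition on a node might look like this. The default path `/var/run/NetworkManager/resolv.conf` and the helper name `resolv_conf_has_nameserver` are assumptions for illustration.

```python
from pathlib import Path


def resolv_conf_has_nameserver(path: str = "/var/run/NetworkManager/resolv.conf") -> bool:
    """Return True if the file exists and lists at least one nameserver.

    Hypothetical diagnostic helper; the default path is an assumption about
    where NetworkManager writes its generated resolv.conf on the node.
    """
    p = Path(path)
    if not p.is_file():
        # The failure described in the bug title: the file does not exist at all.
        return False
    for line in p.read_text().splitlines():
        fields = line.split()
        # A usable entry looks like "nameserver <address>"; commented lines
        # start with "#" and therefore never match.
        if len(fields) >= 2 and fields[0] == "nameserver":
            return True
    return False
```

On an affected master, this check would be expected to return `False` for the whole period in which the journal keeps logging "NM resolv.conf still empty of nameserver".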


Release version:
4.11.0-0.nightly-2022-07-26-041421

Operator snapshot version:
2.1.0-DOWNANDBACK-2022-07-25-16-01-39

OCP version:
4.11

Steps to reproduce:
1. Install a 4.11 multi-node cluster with IPv6 DHCP networking

Actual results:
Installation fails due to masters not rebooting within timeout

Expected results:
Installation completes successfully

Additional info:
Jul 26 07:34:10 mdhcp-master-0-0 nm-dispatcher[1972]: + [[ OVNKubernetes == \O\V\N\K\u\b\e\r\n\e\t\e\s ]]
Jul 26 07:34:10 mdhcp-master-0-0 nm-dispatcher[1972]: + [[ '' == \W\i\r\e\d\ \C\o\n\n\e\c\t\i\o\n ]]
Jul 26 07:34:10 mdhcp-master-0-0 nm-dispatcher[1972]: + '[' -z ']'
Jul 26 07:34:10 mdhcp-master-0-0 nm-dispatcher[1972]: + echo 'Not a DHCP4 address. Ignoring.'
Jul 26 07:34:10 mdhcp-master-0-0 nm-dispatcher[1972]: Not a DHCP4 address. Ignoring.
Jul 26 07:34:10 mdhcp-master-0-0 nm-dispatcher[1972]: + exit 0
Jul 26 07:34:10 mdhcp-master-0-0 nm-dispatcher[1973]: + '[' -z ']'
Jul 26 07:34:10 mdhcp-master-0-0 nm-dispatcher[1973]: + echo 'Not a DHCP6 address. Ignoring.'
Jul 26 07:34:10 mdhcp-master-0-0 nm-dispatcher[1973]: Not a DHCP6 address. Ignoring.
Jul 26 07:34:10 mdhcp-master-0-0 nm-dispatcher[1973]: + exit 0
Jul 26 07:34:10 mdhcp-master-0-0 nm-dispatcher[1976]: Error: Device '' not found.
Jul 26 07:34:10 mdhcp-master-0-0 nm-dispatcher[1986]: + [[ OVNKubernetes == \O\V\N\K\u\b\e\r\n\e\t\e\s ]]
Jul 26 07:34:10 mdhcp-master-0-0 nm-dispatcher[1986]: + [[ '' == \W\i\r\e\d\ \C\o\n\n\e\c\t\i\o\n ]]
Jul 26 07:34:10 mdhcp-master-0-0 nm-dispatcher[1986]: + '[' -z ']'
Jul 26 07:34:10 mdhcp-master-0-0 nm-dispatcher[1986]: + echo 'Not a DHCP4 address. Ignoring.'
Jul 26 07:34:10 mdhcp-master-0-0 nm-dispatcher[1986]: Not a DHCP4 address. Ignoring.
Jul 26 07:34:10 mdhcp-master-0-0 nm-dispatcher[1986]: + exit 0
Jul 26 07:34:10 mdhcp-master-0-0 nm-dispatcher[1987]: + '[' -z ']'
Jul 26 07:34:10 mdhcp-master-0-0 nm-dispatcher[1987]: + echo 'Not a DHCP6 address. Ignoring.'
Jul 26 07:34:10 mdhcp-master-0-0 nm-dispatcher[1987]: Not a DHCP6 address. Ignoring.
Jul 26 07:34:10 mdhcp-master-0-0 nm-dispatcher[1987]: + exit 0
Jul 26 07:34:10 mdhcp-master-0-0 nm-dispatcher[1990]: Error: Device '' not found.

Comment 2 Trey West 2022-07-27 13:04:13 UTC

*** This bug has been marked as a duplicate of bug 2105069 ***

Comment 3 Chad Crum 2022-08-03 15:21:43 UTC
I'm going to reopen this bug as a blocker and use it to track the following blocking bugs:
- BZ2100456
- BZ2105069
- BZ2111632

Once these bugs are fixed we can validate with the infrastructure operator and close this out.

Comment 4 Yuanyuan He 2022-08-18 14:49:34 UTC
Per the latest update, the fix was merged and we are waiting for the next (weekly) OCP release.

When exactly will the OCP release containing the merged PR be available? Does it require a new AI build on the ACM side? Thanks!

Comment 5 Trey West 2022-08-22 12:28:16 UTC
Hi Yuanyuan,

This is the PR we are depending on for a fix in 4.11: https://github.com/openshift/machine-config-operator/pull/3287. So far it has only been merged for 4.12.

Comment 6 Chad Crum 2022-09-29 15:16:47 UTC
This was fixed and verified by QE.

