Bug 1879322 - Fail to scaleup rhel worker due to /etc/resolv.conf is empty
Summary: Fail to scaleup rhel worker due to /etc/resolv.conf is empty
Keywords:
Status: CLOSED DUPLICATE of bug 1879156
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Machine Config Operator
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.6.0
Assignee: Gal Zaidman
QA Contact: Michael Nguyen
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-09-16 02:14 UTC by jima
Modified: 2020-09-21 11:43 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-09-21 11:43:50 UTC
Target Upstream Version:



Description jima 2020-09-16 02:14:28 UTC
Description of problem:
After installing OCP 4.6.0-0.nightly-2020-09-15-045818 on vSphere and scaling up a RHEL worker from a RHEL 7.7 template, the new worker's CSR could not be approved.

We found that /etc/resolv.conf contains no nameserver entry after the worker node is rebooted by the task below, so /var/log/messages is full of image pull failures.

TASK [openshift_node : Reboot the host and wait for it to come back] ***********
Tuesday 15 September 2020  21:50:18 +0800 (0:00:01.730)       0:28:33.451 ***** 
changed: [136.144.52.245] => {"changed": true, "elapsed": 37, "rebooted": true}

TASK [openshift_node : Approve node CSRs] **************************************
Tuesday 15 September 2020  21:50:58 +0800 (0:00:40.651)       0:29:14.102 ***** 
fatal: [136.144.52.245]: FAILED! => {"changed": false, "client_approve_results": ["Attempt: 1, Node jima091501-x7q9g-rhel-0 not present or CSR not yet available", "Attempt: 2, Node jima091501-x7q9g-rhel-0 not present or CSR not yet available", "Attempt: 3, Node jima091501-x7q9g-rhel-0 not present or CSR not yet available", "Attempt: 4, Node jima091501-x7q9g-rhel-0 not present or CSR not yet available", "Attempt: 5, Node jima091501-x7q9g-rhel-0 not present or CSR not yet available", "Attempt: 6, Node jima091501-x7q9g-rhel-0 not present or CSR not yet available", "Attempt: 7, Node jima091501-x7q9g-rhel-0 not present or CSR not yet available", "Attempt: 8, Node jima091501-x7q9g-rhel-0 not present or CSR not yet available", "Attempt: 9, Node jima091501-x7q9g-rhel-0 not present or CSR not yet available", ...}

# cat /etc/resolv.conf
# Generated by NetworkManager

The RHEL worker uses DHCP: its network information comes from the DHCP server, and /etc/resolv.conf is updated by NetworkManager.
We found that the NetworkManager config file 30-resolv-prepender was changed by this commit: https://github.com/openshift/machine-config-operator/commit/03c880054238fa41ef07396fdce9f03134c9f98d

When I removed the lines introduced by that commit and restarted the node, /etc/resolv.conf returned to normal and contained the correct nameserver.
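
The empty-resolver condition described above can be checked with a short script. This is only a minimal sketch, not part of openshift-ansible or the MCO; the function name `nameservers` is hypothetical, and it assumes the standard resolv.conf layout where comment lines start with `#` or `;`.

```python
from typing import List


def nameservers(resolv_conf_text: str) -> List[str]:
    """Return the addresses found on 'nameserver' lines, skipping comments."""
    servers = []
    for line in resolv_conf_text.splitlines():
        stripped = line.strip()
        # Skip blank lines and comment lines ('#' or ';').
        if not stripped or stripped[0] in "#;":
            continue
        fields = stripped.split()
        # Keep only well-formed 'nameserver <address>' entries.
        if fields[0] == "nameserver" and len(fields) >= 2:
            servers.append(fields[1])
    return servers
```

Given the file content shown above (only the "# Generated by NetworkManager" comment), the function returns an empty list. On a node it could be called with the text read from /etc/resolv.conf.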

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. Install OCP 4.6.0-0.nightly-2020-09-15-045818
2. Scale up a RHEL worker using the RHEL 7.7 template.
3.

Actual results:
The worker node's CSR could not be approved.

Expected results:
The worker node joins the cluster successfully.

Additional info:

Comment 1 Russell Teague 2020-09-16 15:38:58 UTC
This appears to be related to a change in MCO.  openshift-ansible does not control /etc/resolv.conf.

Comment 2 Yu Qi Zhang 2020-09-16 19:43:49 UTC
Passing to gzaidman@redhat.com who made the PR to take a look. @Gal would we be able to revert (or at least partially revert) the PR?

Comment 3 Ben Nemec 2020-09-16 19:49:17 UTC
There's a fix up for this already: https://github.com/openshift/machine-config-operator/pull/2094

Comment 4 jima 2020-09-21 10:25:23 UTC
Checked the issue on 4.6.0-0.nightly-2020-09-20-184226, which includes PR https://github.com/openshift/machine-config-operator/pull/2094.

Scaling up the RHEL worker now succeeds.
$ oc get nodes
NAME                          STATUS   ROLES    AGE    VERSION
jima0921-gfwnh-master-0       Ready    master   8h     v1.19.0+7f9e863
jima0921-gfwnh-master-1       Ready    master   8h     v1.19.0+7f9e863
jima0921-gfwnh-master-2       Ready    master   8h     v1.19.0+7f9e863
jima0921-gfwnh-rhel-0         Ready    worker   119m   v1.19.0+f5121a6
jima0921-gfwnh-worker-k6zbw   Ready    worker   8h     v1.19.0+7f9e863
jima0921-gfwnh-worker-mj7w7   Ready    worker   8h     v1.19.0+7f9e863

Comment 5 Gal Zaidman 2020-09-21 11:43:50 UTC

*** This bug has been marked as a duplicate of bug 1879156 ***

