Description of problem:
Installed OCP 4.6.0-0.nightly-2020-09-15-045818 and scaled up a RHEL worker from a RHEL 7.7 template on vSphere. The new worker's CSR could not be approved. After the worker node is rebooted by the task below, /etc/resolv.conf contains no nameserver entry, so /var/log/messages shows many image pull failures.

TASK [openshift_node : Reboot the host and wait for it to come back] ***********
Tuesday 15 September 2020 21:50:18 +0800 (0:00:01.730) 0:28:33.451 *****
changed: [136.144.52.245] => {"changed": true, "elapsed": 37, "rebooted": true}

TASK [openshift_node : Approve node CSRs] **************************************
Tuesday 15 September 2020 21:50:58 +0800 (0:00:40.651) 0:29:14.102 *****
fatal: [136.144.52.245]: FAILED! => {"changed": false, "client_approve_results": ["Attempt: 1, Node jima091501-x7q9g-rhel-0 not present or CSR not yet available", "Attempt: 2, Node jima091501-x7q9g-rhel-0 not present or CSR not yet available", "Attempt: 3, Node jima091501-x7q9g-rhel-0 not present or CSR not yet available", "Attempt: 4, Node jima091501-x7q9g-rhel-0 not present or CSR not yet available", "Attempt: 5, Node jima091501-x7q9g-rhel-0 not present or CSR not yet available", "Attempt: 6, Node jima091501-x7q9g-rhel-0 not present or CSR not yet available", "Attempt: 7, Node jima091501-x7q9g-rhel-0 not present or CSR not yet available", "Attempt: 8, Node jima091501-x7q9g-rhel-0 not present or CSR not yet available", "Attempt: 9, Node jima091501-x7q9g-rhel-0 not present or CSR not yet available", ...}

# cat /etc/resolv.conf
# Generated by NetworkManager

The RHEL worker uses DHCP: its network information comes from the DHCP server, and /etc/resolv.conf is written by NetworkManager.

We found that the NetworkManager config file 30-resolv-prepender was updated by this PR:
https://github.com/openshift/machine-config-operator/commit/03c880054238fa41ef07396fdce9f03134c9f98d

After I removed the lines added by that PR and restarted the node, /etc/resolv.conf was back to normal and contained the correct nameserver.

Version-Release number of selected component (if applicable):

How reproducible:
Always

Steps to Reproduce:
1. Install OCP 4.6.0-0.nightly-2020-09-15-045818
2. Scale up a RHEL worker with the RHEL 7.7 template.
3.

Actual results:
Could not approve the worker node's CSR.

Expected results:
The worker node joins the cluster successfully.

Additional info:
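For anyone reproducing this, the symptom can be checked mechanically. The following is only a minimal sketch, not part of openshift-ansible or the MCO: the helper name `nameservers` is hypothetical, and it assumes the standard resolv.conf layout in which each resolver is listed on a line of the form `nameserver <address>` and lines starting with `#` or `;` are comments.

```python
from typing import List


def nameservers(resolv_conf_text: str) -> List[str]:
    """Return the addresses listed on 'nameserver' lines of a resolv.conf.

    Hypothetical helper for illustration; it only looks at the text it is
    given and does not query NetworkManager or DHCP.
    """
    servers = []
    for line in resolv_conf_text.splitlines():
        stripped = line.strip()
        # Skip blank lines and comment lines ('#' or ';').
        if not stripped or stripped[0] in "#;":
            continue
        fields = stripped.split()
        # Only 'nameserver <address>' lines are of interest here.
        if fields[0] == "nameserver" and len(fields) >= 2:
            servers.append(fields[1])
    return servers


# The file content reported in this bug: only the NetworkManager header.
broken = "# Generated by NetworkManager\n"
if not nameservers(broken):
    print("no nameserver configured; image pulls will fail to resolve hosts")
```

On an affected node the same function could be fed the real file, for example `nameservers(open("/etc/resolv.conf").read())`; an empty list matches the state described above.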
This appears to be related to a change in MCO. openshift-ansible does not control /etc/resolv.conf.
Passing to gzaidman who made the PR to take a look. @Gal would we be able to revert (or at least partially revert) the PR?
There's a fix up for this already: https://github.com/openshift/machine-config-operator/pull/2094
Checked the issue on 4.6.0-0.nightly-2020-09-20-184226, which includes PR https://github.com/openshift/machine-config-operator/pull/2094. Scaling up the RHEL worker now succeeds.

$ oc get nodes
NAME                          STATUS   ROLES    AGE    VERSION
jima0921-gfwnh-master-0       Ready    master   8h     v1.19.0+7f9e863
jima0921-gfwnh-master-1       Ready    master   8h     v1.19.0+7f9e863
jima0921-gfwnh-master-2       Ready    master   8h     v1.19.0+7f9e863
jima0921-gfwnh-rhel-0         Ready    worker   119m   v1.19.0+f5121a6
jima0921-gfwnh-worker-k6zbw   Ready    worker   8h     v1.19.0+7f9e863
jima0921-gfwnh-worker-mj7w7   Ready    worker   8h     v1.19.0+7f9e863
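If this verification needs to be repeated on later nightlies, the node check could be scripted. This is a rough sketch under stated assumptions: `not_ready_nodes` is a hypothetical helper, it expects the plain-text output of `oc get nodes` with a header row, and it assumes the second column is the STATUS column, possibly comma-separated as in `Ready,SchedulingDisabled`.

```python
from typing import List


def not_ready_nodes(oc_get_nodes_output: str) -> List[str]:
    """Return the names of nodes whose STATUS does not include 'Ready'.

    Hypothetical helper for illustration; it parses captured text only and
    does not call the cluster API.
    """
    lines = [ln for ln in oc_get_nodes_output.splitlines() if ln.strip()]
    bad = []
    # The first non-empty line is assumed to be the column header.
    for line in lines[1:]:
        fields = line.split()
        if len(fields) < 2:
            continue
        name, status = fields[0], fields[1]
        # STATUS may carry several comma-separated conditions.
        if "Ready" not in status.split(","):
            bad.append(name)
    return bad
```

Applied to the output above it returns an empty list, since every node, including jima0921-gfwnh-rhel-0, reports Ready.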
*** This bug has been marked as a duplicate of bug 1879156 ***