Bug 1968634 - [master] [assisted operator] Installed Clusters are missing DNS setups
Summary: [master] [assisted operator] Installed Clusters are missing DNS setups
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: assisted-installer
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Igal Tsoiref
QA Contact: Omri Hochman
URL:
Whiteboard: KNI-EDGE-JUKE-4.8 AI-Team-Platform
Duplicates: 1969574 2042378
Depends On:
Blocks: 1969752 1971298
Reported: 2021-06-07 17:33 UTC by hanzhang
Modified: 2022-08-28 08:45 UTC

Fixed In Version: OCP-Metal-v1.0.22.1
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1969752 1971298
Environment:
Last Closed: 2022-08-28 08:45:59 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift assisted-service pull 1971 | 0 | None | closed | OCPBUGSM-30146 - Fixing the issue with missing data in /etc/resolv.conf after reboot of SNO host | 2021-06-14 07:52:39 UTC

Internal Links: 1969752

Description hanzhang 2021-06-07 17:33:01 UTC
Description of problem:
We used AI ocm-2.3-20210603 to install OCP 4.8.0-fc.7 and found that 16 of 1081 clusters report clusterdeployment installed=true but fail to connect to the hub (RHACM).

A scan of these clusters showed that `/etc/resolv.conf` is empty on all of the SNO nodes. We expected it to contain the DNS servers we set up in the nmstateconfig CR.

The clusters were installed, which means they had correct DNS settings during bootstrapping; otherwise they could not have downloaded the rootfs. Something must have happened after bootstrap, possibly after AI finished installing the cluster.


Version-Release number of selected component (if applicable):


How reproducible:
16/1081 => about 1.5%

Steps to Reproduce:
1. Install AI with ACM
2. Provision SNOs through zero-touch provisioning, with DNS settings in nmstateconfig
3. See that some of the clusters are not in ready status in ACM, and that `/etc/resolv.conf` is empty.

Actual results:
`/etc/resolv.conf` is empty.

Expected results:
`/etc/resolv.conf` should contain the nameservers from nmstateconfig.

Additional info:

Comment 1 hanzhang 2021-06-07 21:13:35 UTC
This may be related to the sed commands in forcedns failing:
```
# cat /etc/NetworkManager/dispatcher.d/forcedns

export IP="198.18.8.147"
if [ "$2" = "dhcp4-change" ] || [ "$2" = "dhcp6-change" ] || [ "$2" = "up" ] || [ "$2" = "connectivity-change" ]; then
    if ! grep -q "$IP" /etc/resolv.conf; then
      sed -i "s/sno00030.rdu2.scalelab.redhat.com//" /etc/resolv.conf
      sed -i "s/search /search sno00030.rdu2.scalelab.redhat.com /" /etc/resolv.conf
      sed -i "0,/nameserver/s/nameserver/nameserver $IP\nnameserver/" /etc/resolv.conf
    fi
fi


# journalctl | grep -C10 "/etc/NetworkManager/dispatcher.d/forcedns"
...
Jun 04 23:29:54 sno00030 kernel: IPv6: ADDRCONF(NETDEV_UP): enp1s0: link is not ready
Jun 04 23:29:54 sno00030 dbus-daemon[1360]: [system] Successfully activated service 'org.freedesktop.nm_dispatcher'
Jun 04 23:29:54 sno00030 systemd[1]: Started Network Manager Script Dispatcher Service.
Jun 04 23:29:54 sno00030 kernel: IPv6: ADDRCONF(NETDEV_UP): enp1s0: link is not ready
Jun 04 23:29:54 sno00030 nm-dispatcher[1605]: Error: Device '' not found.
Jun 04 23:29:54 sno00030 nm-dispatcher[1605]: Error: Device '' not found.
Jun 04 23:29:54 sno00030 nm-dispatcher[1605]: grep: /etc/resolv.conf: No such file or directory
Jun 04 23:29:54 sno00030 nm-dispatcher[1605]: sed: can't read /etc/resolv.conf: No such file or directory
Jun 04 23:29:54 sno00030 nm-dispatcher[1605]: sed: can't read /etc/resolv.conf: No such file or directory
Jun 04 23:29:54 sno00030 nm-dispatcher[1605]: sed: can't read /etc/resolv.conf: No such file or directory
Jun 04 23:29:54 sno00030 nm-dispatcher[1605]: req:2 'connectivity-change', "/etc/NetworkManager/dispatcher.d/forcedns": complete: failed with Script '/etc/NetworkManager/dispatcher.d/forcedns' exited wit$ error status 2.
Jun 04 23:29:54 sno00030 NetworkManager[1593]: <warn>  [1622849394.1499] dispatcher: (2) /etc/NetworkManager/dispatcher.d/forcedns failed (failed): Script '/etc/NetworkManager/dispatcher.d/forcedns' exit$d with error status 2.

```
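
For illustration only, here is a minimal sketch of the same grep/sed logic with a guard for the "No such file or directory" case seen in the journal. It is not the shipped forcedns script and not the fix from the linked pull request. The function name `force_dns` and its three parameters are hypothetical, and skipping quietly when the file is missing is an assumption: it only avoids the exit status 2 and does not by itself populate the file. The `0,/nameserver/` address and `\n` in the replacement rely on GNU sed, as in the original script.

```shell
#!/bin/bash
# Hypothetical helper, NOT the shipped forcedns script or the upstream fix.
# force_dns FILE IP DOMAIN
#   Puts IP in front of the first nameserver entry and DOMAIN first in the search list.
force_dns() {
    local file="$1" ip="$2" domain="$3"

    # Case from the journal: the file does not exist yet, so grep and sed fail.
    # Assumption: skipping here is acceptable; something else must create the file.
    if [ ! -f "$file" ]; then
        echo "forcedns: $file does not exist, skipping" >&2
        return 0
    fi

    # Same three edits as the script above, applied only if IP is not present yet.
    if ! grep -qF -- "$ip" "$file"; then
        sed -i "s/$domain//" "$file"
        sed -i "s/search /search $domain /" "$file"
        sed -i "0,/nameserver/s/nameserver/nameserver $ip\nnameserver/" "$file"
    fi
}
```

In a dispatcher script this would be called from the existing event branch, for example `force_dns /etc/resolv.conf "$IP" sno00030.rdu2.scalelab.redhat.com`.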

Comment 3 Igal Tsoiref 2021-06-10 08:22:45 UTC
*** Bug 1969574 has been marked as a duplicate of this bug. ***

Comment 4 hanzhang 2021-06-25 13:05:59 UTC
Verified with the latest 1k-cluster provisioning test; the issue no longer occurs.

Comment 7 Michael Filanov 2022-04-20 09:04:04 UTC
*** Bug 2042378 has been marked as a duplicate of this bug. ***
