Description of problem:
RHCOS deployment on baremetal using ISO does not pass DNS server kernel command line option

Version-Release number of selected component (if applicable):
4.5

How reproducible:
Boot baremetal from ISO and add the kernel options
ip=10.10.10.2::10.10.10.254:255.255.255.0:core0.example.com:enp1s0:none nameserver=4.4.4.41
or
ip=10.10.10.2::10.10.10.254:255.255.255.0:core0.example.com:enp1s0:none:4.4.4.41

Steps to Reproduce:
1. Boot from ISO
2. add kernel cmd line options ip=10.10.10.2::10.10.10.254:255.255.255.0:core0.example.com:enp1s0:none nameserver=4.4.4.41
3. Continue booting.
4. core0 cannot resolve names via dns

Actual results:
Node cannot resolve any dns queries

Expected results:
Node should use nameserver to resolve dns queries

Additional info:
I was unable to reproduce this using `rhcos-4.5.6-x86_64-installer.x86_64.iso` in local libvirt testing. I booted the ISO via `virt-manager`, interrupted the boot and provided `ip=192.168.124.109::192.168.124.1:255.255.255.0:core0.example.com:enp1s0:none nameserver=1.1.1.1` as additional kernel command line args. When the boot process dropped me into the emergency shell, I saw that `/etc/resolv.conf` was populated with the nameserver provided. Additionally, I was able to ping the nameserver successfully. Does this match what you have tried? Or are you past the initial ISO environment and have installed RHCOS to the disk?
Yes. The nameserver option works for the install. However, after RHCOS has been written to the disk and the node has rebooted, the IP info is persisted but the nameserver is not.

To continue the install, do the following on each node (bootstrap, masters, and workers):

ssh core@<ip of node>
sudo su -
nmcli con mod <connection> ipv4.dns "<dns server>"
nmcli con up <connection>

The DNS entries are saved and installation continues.
As far as I can tell this is a real issue in 4.5. It has to do with the way we persist initrd networking information into the real root on first boot. The legacy initscripts that are used in the initramfs write directly to resolv.conf and we don't bring that forward.

In 4.6 this is partly resolved because we use NetworkManager in the initrd for networking, and the `nm-initrd-generator` stores the nameserver information alongside the networking configuration rather than directly in resolv.conf. The part that isn't resolved is that (at least when I was testing this some time ago) the following was true:

- the `ip=${ip}::${gateway}:${netmask}:${initramfshostname}:${devname}:none:${nameserver}` syntax works
- the `ip=${ip}::${gateway}:${netmask}:${initramfshostname}:${devname}:none nameserver=${nameserver}` syntax doesn't
  - see https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/issues/391

That bug is fixed upstream, but I'd need to check whether we could get it backported, though I do believe it will be in RHEL 8.3.
@swilson, it's not ideal, but for now you could work around this by using Ignition to create a file at /etc/sysconfig/network-scripts/ifcfg-enp1s0 with the `DNS1` entry in it, like:

```
TYPE=Ethernet
BOOTPROTO=none
IPADDR=${ip}
PREFIX=${prefix}
GATEWAY=${gateway}
DEFROUTE=yes
IPV4_FAILURE_FATAL=no
NAME=enp1s0
DEVICE=enp1s0
ONBOOT=yes
```
Would be best if I added the DNS1 entry:

```
TYPE=Ethernet
BOOTPROTO=none
IPADDR=${ip}
PREFIX=${prefix}
GATEWAY=${gateway}
DEFROUTE=yes
IPV4_FAILURE_FATAL=no
NAME=enp1s0
DEVICE=enp1s0
ONBOOT=yes
DNS1=${nameserver}
```
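For anyone who wants to script the workaround, here is a minimal sketch of wrapping that ifcfg file in an Ignition config. It rests on several assumptions not stated in this bug: that the node consumes Ignition spec 2.2.0, that the file belongs on the `root` filesystem with mode 0644, and that the example values (taken from the original description, with prefix 24 standing in for netmask 255.255.255.0) fit your environment. The function names are hypothetical. In practice the file entry would likely need to be merged into the node's existing Ignition config rather than used on its own.

```python
import base64
import json


def ifcfg_contents(device, ip, prefix, gateway, nameserver):
    """Render the ifcfg file from the comment above, including DNS1."""
    lines = [
        "TYPE=Ethernet",
        "BOOTPROTO=none",
        f"IPADDR={ip}",
        f"PREFIX={prefix}",
        f"GATEWAY={gateway}",
        "DEFROUTE=yes",
        "IPV4_FAILURE_FATAL=no",
        f"NAME={device}",
        f"DEVICE={device}",
        "ONBOOT=yes",
        f"DNS1={nameserver}",
    ]
    return "\n".join(lines) + "\n"


def ignition_config(device, contents):
    """Wrap the file in an Ignition config (spec version 2.2.0 is an assumption)."""
    encoded = base64.b64encode(contents.encode()).decode()
    return {
        "ignition": {"version": "2.2.0"},
        "storage": {
            "files": [
                {
                    "filesystem": "root",
                    "path": f"/etc/sysconfig/network-scripts/ifcfg-{device}",
                    "mode": 0o644,  # serialized as decimal 420 in JSON
                    "contents": {
                        "source": "data:text/plain;charset=utf-8;base64," + encoded
                    },
                }
            ]
        },
    }


if __name__ == "__main__":
    # Example values from the original description; adjust for your network.
    text = ifcfg_contents("enp1s0", "10.10.10.2", 24, "10.10.10.254", "4.4.4.41")
    print(json.dumps(ignition_config("enp1s0", text), indent=2))
```

Base64-encoding the contents into a data URL avoids having to escape newlines in the JSON by hand.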
Other higher priority tasks and bugs have prevented us from addressing this issue; it will be addressed in an upcoming sprint. @swilson Have you been able to test the workaround in comment #4 + #5?
I have not tested the workarounds from comment #4 + #5. I manually added the DNS server as described in comment #2. Another worker machine needs to be added, so I will try the workaround using the Ignition files from #4 + #5 on that one.
Conservatively targeting for 4.7 with a low priority until we receive more information.
I discussed this with @swilson and we each ran some new local tests. It turns out the original description has an inaccuracy.

Summary:

- `ip=${ip}::${gateway}:${netmask}:${hostname}:${devname}:none nameserver=${nameserver1}` syntax works just fine
- `ip=${ip}::${gateway}:${netmask}:${hostname}:${devname}:none:${nameserver1}` has the bug where resolv.conf doesn't get updated properly

For now, if you're hitting this issue on 4.5, you can use the `nameserver=` argument to work around it.

The issue with the `ip=${ip}::${gateway}:${netmask}:${hostname}:${devname}:none:${nameserver1}` syntax is fixed in 4.6. In 4.6 we use NetworkManager in the initrd and it doesn't seem to have the same problem.

Marking as ON_QA. @swilson, do you mind testing a recent 4.6 build to verify you don't observe the problem in 4.6?
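To make the two syntaxes from the summary easy to compare, here is a small sketch that assembles each kernel argument string from its parts. The helper names are hypothetical and the example values come from the original description. It only builds strings and does not touch the boot configuration.

```python
def ip_arg(ip, gateway, netmask, hostname, devname, nameserver=None):
    """Build the ip= argument; the second field is left empty, as in this bug."""
    fields = [ip, "", gateway, netmask, hostname, devname, "none"]
    if nameserver is not None:
        # Inline form: nameserver appended as a trailing field (buggy on 4.5).
        fields.append(nameserver)
    return "ip=" + ":".join(fields)


def args_with_separate_nameserver(ip, gateway, netmask, hostname, devname, nameserver):
    """Form reported to work on 4.5: ip= plus a separate nameserver= argument."""
    base = ip_arg(ip, gateway, netmask, hostname, devname)
    return f"{base} nameserver={nameserver}"


if __name__ == "__main__":
    net = ("10.10.10.2", "10.10.10.254", "255.255.255.0", "core0.example.com", "enp1s0")
    print(args_with_separate_nameserver(*net, "4.4.4.41"))
    print(ip_arg(*net, nameserver="4.4.4.41"))
```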
It looks like https://github.com/coreos/fedora-coreos-config/pull/636 is covering this, so if the test was run against an RHCOS build we should consider it verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:4196