Bug 1870871 - [UPI] Baremetal CoreOS static IP not persisting DNS nameserver
Summary: [UPI] Baremetal CoreOS static IP not persisting DNS nameserver
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: RHCOS
Version: 4.5
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: low
Target Milestone: ---
Target Release: 4.6.0
Assignee: Dusty Mabe
QA Contact: Michael Nguyen
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-08-20 21:22 UTC by swilson
Modified: 2020-10-27 16:30 UTC (History)
6 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-10-27 16:30:13 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2020:4196 0 None None None 2020-10-27 16:30:26 UTC

Description swilson 2020-08-20 21:22:20 UTC
Description of problem: An RHCOS deployment on bare metal using the ISO does not persist the DNS nameserver passed as a kernel command-line option.


Version-Release number of selected component (if applicable): 4.5


How reproducible: Boot baremetal from ISO and add the kernel options 
ip=10.10.10.2::10.10.10.254:255.255.255.0:core0.example.com:enp1s0:none
nameserver=4.4.4.41

or 

ip=10.10.10.2::10.10.10.254:255.255.255.0:core0.example.com:enp1s0:none:4.4.4.41

Steps to Reproduce:
1. Boot from the ISO.
2. Add the kernel command-line options ip=10.10.10.2::10.10.10.254:255.255.255.0:core0.example.com:enp1s0:none nameserver=4.4.4.41
3. Continue booting.
4. core0 cannot resolve names via DNS.

Actual results: The node cannot resolve any DNS queries.


Expected results: The node should use the configured nameserver to resolve DNS queries.


Additional info:

Comment 1 Micah Abbott 2020-08-25 20:30:24 UTC
I was unable to reproduce this using `rhcos-4.5.6-x86_64-installer.x86_64.iso` in local libvirt testing.

I booted the ISO via `virt-manager`, interrupted the boot and provided  `ip=192.168.124.109::192.168.124.1:255.255.255.0:core0.example.com:enp1s0:none nameserver=1.1.1.1` as additional kernel command line args.

When the boot process dropped me into the emergency shell, I saw that `/etc/resolv.conf` was populated with the nameserver provided.  Additionally, I was able to ping the nameserver successfully.
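The check described above (confirming that `/etc/resolv.conf` lists the expected nameserver) can be scripted. The following is a minimal Python sketch, assuming standard `resolv.conf` syntax; the function name `nameservers_from_resolv_conf` is hypothetical and not part of any RHCOS tooling.

```python
from typing import List


def nameservers_from_resolv_conf(text: str) -> List[str]:
    """Return the addresses from all 'nameserver' lines in resolv.conf text."""
    servers = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comment lines ('#' or ';').
        if not line or line[0] in "#;":
            continue
        parts = line.split()
        if parts[0] == "nameserver" and len(parts) >= 2:
            servers.append(parts[1])
    return servers


# Example input; on a node you would read the real file instead.
sample = "# Generated by NetworkManager\nsearch example.com\nnameserver 1.1.1.1\n"
found = nameservers_from_resolv_conf(sample)
```

On a running node, the same function could be called with the contents of `/etc/resolv.conf`; an empty result after the first reboot would match the behaviour reported here.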


Does this match what you have tried?  Or are you past the initial ISO environment and have installed RHCOS to the disk?

Comment 2 swilson 2020-08-25 21:29:36 UTC
Yes. The nameserver option works for the install. However, after RHCOS has been written to the disk and the node has rebooted, the IP information is persisted but the nameserver is not. To continue the install, I run the following on each node (bootstrap, masters, and workers):

ssh core@<ip of node>
sudo su -
nmcli con mod <connection> ipv4.dns "<dns server>"
nmcli con up <connection>

The DNS entries are saved and installation continues.

Comment 3 Dusty Mabe 2020-08-25 21:40:01 UTC
As far as I can tell this is a real issue in 4.5. It has to do with the way we persist initrd networking information into the real root on first boot. The legacy initscripts that are used in the initramfs write directly to resolv.conf and we don't bring that forward.

In 4.6 this is partly resolved because we use NetworkManager in the initrd for networking, and the `nm-initrd-generator` stores the nameserver information alongside the networking configuration rather than directly in resolv.conf.

The part that isn't resolved is that (at least when I was testing this some time ago) the following was true:

    - the `ip=${ip}::${gateway}:${netmask}:${initramfshostname}:${devname}:none:${nameserver}` syntax works
    - the `ip=${ip}::${gateway}:${netmask}:${initramfshostname}:${devname}:none nameserver=${nameserver}` syntax doesn't
        - see https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/issues/391

That bug is fixed upstream but I'd need to check to see if we could get it backported, though I do believe it will be in RHEL 8.3.
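To make the two syntaxes easier to compare, here is a minimal Python sketch that splits an IPv4 `ip=` argument into its colon-separated fields. It assumes the field order shown in the examples above (client IP, peer, gateway, netmask, hostname, interface, autoconf method, then up to two optional DNS servers) and does not handle bracketed IPv6 addresses. The names `StaticIpArg` and `parse_ip_arg` are hypothetical.

```python
from typing import List, NamedTuple


class StaticIpArg(NamedTuple):
    client_ip: str
    peer: str
    gateway: str
    netmask: str
    hostname: str
    interface: str
    autoconf: str
    dns: List[str]


def parse_ip_arg(arg: str) -> StaticIpArg:
    """Split an IPv4 'ip=' kernel argument into named fields."""
    if not arg.startswith("ip="):
        raise ValueError("expected an argument starting with 'ip='")
    fields = arg[len("ip="):].split(":")
    if len(fields) < 7:
        raise ValueError("expected at least 7 colon-separated fields")
    # Fields 8 and 9, when present and non-empty, are DNS servers.
    dns = [f for f in fields[7:9] if f]
    return StaticIpArg(*fields[:7], dns)


with_dns = parse_ip_arg(
    "ip=10.10.10.2::10.10.10.254:255.255.255.0:core0.example.com:enp1s0:none:4.4.4.41"
)
without_dns = parse_ip_arg(
    "ip=10.10.10.2::10.10.10.254:255.255.255.0:core0.example.com:enp1s0:none"
)
```

In the first form the nameserver travels inside the `ip=` argument itself (`with_dns.dns`), while in the second it has to come from a separate `nameserver=` argument.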

Comment 4 Dusty Mabe 2020-08-25 21:59:58 UTC
@swilson, it's not ideal, but for now you could work around this by using Ignition to create the file /etc/sysconfig/network-scripts/ifcfg-enp1s0 with the `DNS1` entry in it, like:

```
TYPE=Ethernet        
BOOTPROTO=none       
IPADDR=${ip}         
PREFIX=${prefix}     
GATEWAY=${gateway}   
DEFROUTE=yes         
IPV4_FAILURE_FATAL=no
NAME=enp1s0
DEVICE=enp1s0      
ONBOOT=yes
```

Comment 5 Dusty Mabe 2020-08-25 22:01:11 UTC
Would be best if I added the DNS1 entry:

```
TYPE=Ethernet        
BOOTPROTO=none       
IPADDR=${ip}         
PREFIX=${prefix}     
GATEWAY=${gateway}   
DEFROUTE=yes         
IPV4_FAILURE_FATAL=no
NAME=enp1s0
DEVICE=enp1s0      
ONBOOT=yes
DNS1=${nameserver}
```
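As a rough illustration of how such a file could be delivered through Ignition, the Python sketch below renders the ifcfg contents and wraps them in a file entry. It assumes the Ignition spec 2.x file layout (with a `filesystem` key; spec 3.x drops that key) and a `data:` URL for the contents, so check it against the spec version your release uses. The function names are hypothetical.

```python
import ipaddress
import json
from urllib.parse import quote


def render_ifcfg(device, ip, netmask, gateway, nameserver):
    """Render ifcfg text matching the example above, including DNS1."""
    # Convert a dotted netmask such as 255.255.255.0 to a prefix length.
    prefix = ipaddress.IPv4Network(f"0.0.0.0/{netmask}").prefixlen
    lines = [
        "TYPE=Ethernet",
        "BOOTPROTO=none",
        f"IPADDR={ip}",
        f"PREFIX={prefix}",
        f"GATEWAY={gateway}",
        "DEFROUTE=yes",
        "IPV4_FAILURE_FATAL=no",
        f"NAME={device}",
        f"DEVICE={device}",
        "ONBOOT=yes",
        f"DNS1={nameserver}",
    ]
    return "\n".join(lines) + "\n"


def ignition_file_entry(device, ifcfg_text):
    """Build one storage.files entry (assumed Ignition spec 2.x layout)."""
    return {
        "filesystem": "root",
        "path": f"/etc/sysconfig/network-scripts/ifcfg-{device}",
        "mode": 0o644,  # serialized by json as decimal 420
        "contents": {"source": "data:," + quote(ifcfg_text)},
    }


ifcfg_text = render_ifcfg(
    "enp1s0", "10.10.10.2", "255.255.255.0", "10.10.10.254", "4.4.4.41"
)
entry = ignition_file_entry("enp1s0", ifcfg_text)
entry_json = json.dumps(entry, indent=2)
```

The resulting entry would go under `storage.files` in the node's Ignition config; the addresses used here are the ones from the original report.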

Comment 6 Micah Abbott 2020-09-09 20:33:55 UTC
Other higher priority tasks and bugs have prevented us from addressing this issue; it will be addressed in an upcoming sprint.


@swilson Have you been able to test the workaround in comment #4 + #5?

Comment 7 swilson 2020-09-09 20:43:10 UTC
I have not tested the workarounds from comment #4 + #5. I manually added the DNS server as described in comment #2. Another worker machine needs to be added, so I will try the workaround using the Ignition files from #4 + #5 then.

Comment 9 Micah Abbott 2020-09-14 20:10:15 UTC
Conservatively targeting for 4.7 with a low priority until we receive more information.

Comment 10 Dusty Mabe 2020-09-25 19:17:34 UTC
I discussed this with @swilson and we each ran some new local tests.

It turns out the original description has an inaccuracy. Summary:

- `ip=${ip}::${gateway}:${netmask}:${hostname}:${devname}:none nameserver=${nameserver1}` syntax works just fine
- `ip=${ip}::${gateway}:${netmask}:${hostname}:${devname}:none:${nameserver1}` has the bug where the resolv.conf doesn't get updated properly

For now, if you're hitting this issue on 4.5, you can use the `nameserver=` argument to work around it.
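A small Python sketch of assembling the kernel arguments in the form reported to work on 4.5, with the nameserver passed as a separate `nameserver=` argument rather than as a trailing field of `ip=`. The helper name `build_static_ip_args` is hypothetical.

```python
from typing import Iterable


def build_static_ip_args(ip, gateway, netmask, hostname, devname,
                         nameservers: Iterable[str]) -> str:
    """Build 'ip=... nameserver=...' kernel arguments as one string."""
    # The peer field (between ip and gateway) is left empty, as in the report.
    ip_arg = f"ip={ip}::{gateway}:{netmask}:{hostname}:{devname}:none"
    ns_args = [f"nameserver={ns}" for ns in nameservers]
    return " ".join([ip_arg] + ns_args)


kargs = build_static_ip_args(
    "10.10.10.2", "10.10.10.254", "255.255.255.0",
    "core0.example.com", "enp1s0", ["4.4.4.41"],
)
```

The resulting string is what would be appended to the kernel command line when interrupting the ISO boot.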

The issue with the `ip=${ip}::${gateway}:${netmask}:${hostname}:${devname}:none:${nameserver1}` syntax is fixed in 4.6, where we use NetworkManager in the initrd, which doesn't seem to have the same problem.

Marking as ON_QA. @swilson, do you mind testing a recent 4.6 build to verify you don't observe the problem in 4.6?

Comment 11 Colin Walters 2020-09-28 15:29:06 UTC
It looks like https://github.com/coreos/fedora-coreos-config/pull/636 is covering this, so if the test was run against an RHCOS build we should consider it verified.

Comment 14 errata-xmlrpc 2020-10-27 16:30:13 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196

