Bug 2020780
Summary: DNS settings are not applied in dual stack environment

Product: Red Hat Enterprise Linux 8
Component: nmstate
Version: 8.4
Status: CLOSED NOTABUG
Severity: high
Priority: unspecified
Reporter: Ben Nemec <bnemec>
Assignee: Fernando F. Mancera <ferferna>
QA Contact: Mingyu Shi <mshi>
CC: amalykhi, ferferna, jiji, jishi, network-qe, till
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Doc Type: If docs needed, set a value
Story Points: ---
Type: Bug
Regression: ---
Last Closed: 2021-12-02 22:38:29 UTC
Description

Ben Nemec — 2021-11-05 21:32:32 UTC

Created attachment 1840327 [details]
Handler output from applying config

Fernando F. Mancera:

Hello! What do you mean by "None of the specified DNS settings are applied to the node"? After doing the apply, if you do a show on this interface, does it show the configuration? The way nameservers are written into resolv.conf depends on how NetworkManager is configured to do it. Could you attach the NetworkManager configuration on the node? Nmstate configures the nameservers on the connection and then activates the connection; that is all. This can be caused by a bad configuration of NetworkManager or by an issue in NetworkManager itself. I will check the newly requested information and let you know. Thank you!

Ben Nemec (comment #3):

Sorry, I should probably have just referred back to the description there. I meant the changes don't show up in either resolv.conf. They are in the nmcli output for the interface, though.

The NetworkManager config is unmanaged:

```
[main]
rc-manager=unmanaged
```

And the nmcli con show output includes:

```
ipv4.dns:        11.1.1.1
ipv4.dns-search: nemebean.com
```

although ipv6 is still empty:

```
ipv6.dns:        --
ipv6.dns-search: --
```

I know we have another issue with unmanaged resolv.conf, but in that case the nameservers show up in the /var/run/ version; they just don't get populated into /etc/resolv.conf, so I don't think this is the same issue (which is too bad, because it would be nice if I could reproduce the other bug to capture logs...).
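When these settings do take effect, NetworkManager renders them into resolv.conf roughly as follows (a sketch of the expected content, based on the values above; with rc-manager=unmanaged, NetworkManager keeps its generated copy at /var/run/NetworkManager/resolv.conf and leaves /etc/resolv.conf alone):

```
# Generated by NetworkManager
search nemebean.com
nameserver 11.1.1.1
```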
Looking through the logs again, though, I think I see the problem:

```
Nov 22 21:15:17 worker-0.ostest.test.metalkube.org NetworkManager[1227]: <warn>  [1637615717.5475] dhcp4 (enp1s0): request timed out
Nov 22 21:15:17 worker-0.ostest.test.metalkube.org NetworkManager[1227]: <info>  [1637615717.5476] dhcp4 (enp1s0): state changed unknown -> timeout
Nov 22 21:15:17 worker-0.ostest.test.metalkube.org NetworkManager[1227]: <info>  [1637615717.5477] device (enp1s0): state change: ip-config -> failed (reason 'ip-config-unavailable', sys-iface-state: 'managed')
Nov 22 21:15:17 worker-0.ostest.test.metalkube.org NetworkManager[1227]: <warn>  [1637615717.5494] device (enp1s0): Activation: failed for connection 'Wired Connection'
```

That interface is only expected to get an ipv6 address, so the dhcp4 failure is normal, but it appears to be causing NetworkManager to consider the connection failed. I can't disable ipv4 either, because I'm trying to apply an ipv4 DNS server and nmstate complains if ipv4 is disabled on the interface.
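The failure chain in those journal lines (dhcp4 timeout, then device failure, then activation failure) can be pulled out mechanically. A small illustrative Python helper for triaging such logs; the regex and helper function are assumptions of this sketch, not part of the report:

```python
import re

# Journal lines from the failing activation, copied from the log excerpt above.
LOG = """\
Nov 22 21:15:17 worker-0.ostest.test.metalkube.org NetworkManager[1227]: <warn>  [1637615717.5475] dhcp4 (enp1s0): request timed out
Nov 22 21:15:17 worker-0.ostest.test.metalkube.org NetworkManager[1227]: <info>  [1637615717.5476] dhcp4 (enp1s0): state changed unknown -> timeout
Nov 22 21:15:17 worker-0.ostest.test.metalkube.org NetworkManager[1227]: <info>  [1637615717.5477] device (enp1s0): state change: ip-config -> failed (reason 'ip-config-unavailable', sys-iface-state: 'managed')
Nov 22 21:15:17 worker-0.ostest.test.metalkube.org NetworkManager[1227]: <warn>  [1637615717.5494] device (enp1s0): Activation: failed for connection 'Wired Connection'
"""

# Extract (subsystem, interface, message) triples so the chain
# "dhcp4 timeout -> device failed -> activation failed" is easy to see.
PATTERN = re.compile(r"\[\d+\.\d+\] (\w+) \(([^)]+)\): (.*)")

events = [m.groups() for m in map(PATTERN.search, LOG.splitlines()) if m]
for subsystem, iface, message in events:
    print(f"{subsystem:7s} {iface}: {message}")
```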
I also can't use the other interface, because it's bridged and trying to do the DNS config there results in:

```
Traceback (most recent call last):
  File "/usr/bin/nmstatectl", line 11, in <module>
    load_entry_point('nmstate==1.0.4', 'console_scripts', 'nmstatectl')()
  File "/usr/lib/python3.6/site-packages/nmstatectl/nmstatectl.py", line 73, in main
    return args.func(args)
  File "/usr/lib/python3.6/site-packages/nmstatectl/nmstatectl.py", line 326, in set
    return apply(args)
  File "/usr/lib/python3.6/site-packages/nmstatectl/nmstatectl.py", line 354, in apply
    args.save_to_disk,
  File "/usr/lib/python3.6/site-packages/nmstatectl/nmstatectl.py", line 407, in apply_state
    save_to_disk=save_to_disk,
  File "/usr/lib/python3.6/site-packages/libnmstate/netapplier.py", line 78, in apply
    desired_state, ignored_ifnames, current_state, save_to_disk
  File "/usr/lib/python3.6/site-packages/libnmstate/net_state.py", line 51, in __init__
    gen_conf_mode,
  File "/usr/lib/python3.6/site-packages/libnmstate/ifaces/ifaces.py", line 155, in __init__
    self._mark_vf_interface_as_absent_when_sriov_vf_decrease()
  File "/usr/lib/python3.6/site-packages/libnmstate/ifaces/ifaces.py", line 257, in _mark_vf_interface_as_absent_when_sriov_vf_decrease
    cur_iface.sriov_total_vfs != 0
AttributeError: 'OvsInternalIface' object has no attribute 'sriov_total_vfs'
```

I know there were issues with OVS bridges and nmstate, although I thought at least some of those were fixed. Maybe we have an old version, though? It looks like our handler container has 1.0.4.

Fernando F. Mancera:

Hi! Thank you for the information.

(In reply to Ben Nemec from comment #3)
> Sorry, I should probably have just referred back to the description there. I
> meant the changes don't show up in either resolv.conf. They are in the nmcli
> output for the interface though.
>
> The NetworkManager config is unmanaged:
> [main]
> rc-manager=unmanaged
>
> And the nmcli con show output includes:
> ipv4.dns:        11.1.1.1
> ipv4.dns-search: nemebean.com

It seems the configuration is being set correctly by Nmstate. Could you also show me the output of the 'nmcli c' command? If the connection is activated, the config should have been applied to resolv.conf. I will ask the NetworkManager people; it does not seem to be an Nmstate issue. Do you have a clear reproducer for this? If not, let me know and I will try to create one. Thank you!

Ben Nemec (comment #5):

Here's the output:

```
# nmcli c
NAME              UUID                                  TYPE           DEVICE
enp1s0            8710e2a6-faab-4d67-8856-4b8a70c106cc  ethernet       enp1s0
ovs-if-br-ex      35699293-f83d-4e98-b309-492aed66ea7f  ovs-interface  br-ex
br-ex             d2bee788-d65b-49c8-af2b-ee33be60ee77  ovs-bridge     br-ex
ovs-if-phys0      2cf4990f-6095-4ab8-b998-2f71533558de  ethernet       enp2s0
ovs-port-br-ex    91edfa79-345a-4684-9335-1ec34ff3a89b  ovs-port       br-ex
ovs-port-phys0    6236e3bb-7a50-4ddc-9159-410435f97c97  ovs-port       enp2s0
Wired Connection  7596e02e-2602-4a8e-a862-17e63d622d68  ethernet       --
Wired Connection  a4532e83-9bdb-448a-9aac-e429d2b45db6  ethernet       --
```

The color-coding is lost, but the enp1s0 connection is orange, while the rest of them are green. I'm not sure what that means, but I thought I'd mention it in case it's relevant.

I've been trying to figure out a reproducer, but my standalone VM where I tried to create a similar environment is having issues doing the DNS configuration at all. I haven't been able to figure out what I'm doing wrong, so I haven't been able to come up with a reproducer. Experimenting in the cluster (more on that below) hasn't narrowed down the problem either.

We did see something similar (DNS applied according to nmcli, but never shows up in resolv.conf) on a plain ipv4 cluster, so it's possible the dual stack part of this is a red herring. That was an isolated case, though, so I'm not sure whether it reproduces as consistently as in dual stack.

A couple of other notes about the node this is happening on:
- I noticed that ipv4.may-fail was false, which is a problem because enp1s0 doesn't get an ipv4 address in this env. I thought maybe that was making NM think the connection wasn't actually active, but even changing that didn't help.
- Similarly, I tried passing an ipv6 address as the server value, since this interface is ipv6-only (br-ex is the one that is dual stack), but that also didn't work.

I'm out for the next couple of days for the US holiday, but I'll be around next week to continue debugging this. I should probably start with a fresh cluster that I haven't messed with so much and get some NetworkManager logs to see why it's not applying the changes.

Fernando F. Mancera:

(In reply to Ben Nemec from comment #5)
> Here's the output:
> # nmcli c
> NAME              UUID                                  TYPE           DEVICE
> enp1s0            8710e2a6-faab-4d67-8856-4b8a70c106cc  ethernet       enp1s0
> ovs-if-br-ex      35699293-f83d-4e98-b309-492aed66ea7f  ovs-interface  br-ex
> br-ex             d2bee788-d65b-49c8-af2b-ee33be60ee77  ovs-bridge     br-ex
> ovs-if-phys0      2cf4990f-6095-4ab8-b998-2f71533558de  ethernet       enp2s0
> ovs-port-br-ex    91edfa79-345a-4684-9335-1ec34ff3a89b  ovs-port       br-ex
> ovs-port-phys0    6236e3bb-7a50-4ddc-9159-410435f97c97  ovs-port       enp2s0
> Wired Connection  7596e02e-2602-4a8e-a862-17e63d622d68  ethernet       --
> Wired Connection  a4532e83-9bdb-448a-9aac-e429d2b45db6  ethernet       --
>
> The color-coding is lost, but the enp1s0 connection is orange, while the
> rest of them are green. I'm not sure what that means, but I thought I'd
> mention it in case it's relevant.
>
> I've been trying to figure out a reproducer, but my standalone VM where I
> tried to create a similar environment is having issues doing the DNS
> configuration at all. I haven't been able to figure out what I'm doing wrong
> so I haven't been able to come up with a reproducer. Experimenting in the
> cluster (more on that below) hasn't narrowed down the problem either.
> We did see something similar (DNS applied according to nmcli, but never
> shows up in resolv.conf) on a plain ipv4 cluster, so it's possible the dual
> stack part of this is a red herring. That was an isolated case though so I'm
> not sure whether that reproduces as consistently as in dual stack.
>
> A couple of other notes about the node this is happening on:
> -I noticed that ipv4.may-fail was false, which is a problem because enp1s0
> doesn't get an ipv4 address in this env. I thought maybe that was making NM
> think the connection wasn't actually active, but even changing that didn't
> help.
> -Similarly, I tried passing an ipv6 address as the server value since this
> interface is ipv6-only (br-ex is the one that is dual stack), but that also
> didn't work.
>
> I'm out for the next couple of days for the US holiday, but I'll be around
> next week to continue debugging this. I should probably start with a fresh
> cluster that I haven't messed with so much and get some NetworkManager logs
> to see why it's not applying the changes.

Hello! Yes, the color is important. Let me explain what is happening.

When configuring static DNS on NetworkManager, e.g. "11.1.1.1", the user needs one of the following:

1. An interface with a static IP and a static default gateway route (IPv4 or IPv6, depending on the DNS server).
2. An interface with DHCP/autoconf enabled and `auto-dns: false`.

In this case, you are setting the following desired state:

```
---
dns-resolver:
  config:
    search:
    - nemebean.com
    server:
    - 11.1.1.1
interfaces:
- name: enp1s0
  type: ethernet
  state: up
  ipv4:
    enabled: true
    auto-dns: false
```

Notice that you are doing option 2, but the DNS cannot be written to resolv.conf until the interface gets a lease from the DHCP server. According to the "nmcli c" output, the connection has been activated correctly but is waiting for DHCP to get an IP address. Is there a DHCP server running on the environment? If not, you have two options again:

1. Set up a DHCP server, or check why the existing one is not working.
2. Add a static IP address and set `dhcp: false` in the desired state, then add a default gateway route similar to:

```
routes:
  config:
  - destination: 0.0.0.0/0
    next-hop-address: 11.1.1.1
    next-hop-interface: ens1f0
```

Let me know if something else is needed :-)

Ben Nemec (comment #7):

Okay thanks, that helps a bunch. I was able to get an ipv6 server working, knowing that the orange color is a problem, which in this case meant I also needed to disable ipv4 on that interface.

We may be stuck with ipv4 DNS servers on dual stack clusters, though. We can't add a route on enp1s0 because that's only used for the provisioning network. I tried doing DNS config with just an ipv4 address configured (but no route), but that complains about no suitable interface; I assume that's because there is no route.

We also can't add a route on the primary interface because it's DHCP, and nmstate will complain about a route targeting an interface with a dynamic address. I also haven't been able to configure DNS on the primary interface because it's bridged, and trying to configure that fails on "'OvsInternalIface' object has no attribute 'sriov_total_vfs'". I'm not sure if that's expected to work or not, but in 1.1.0 it's not.

I guess ideally we'd be able to do DNS configuration on br-ex, since that will have both v4 and v6 addresses, whereas the provisioning interface will only have one or the other. If there's some way to add a dummy address of the other type and have that work, then that would be an option, but I haven't been able to make that work.

For context, the network layout of the node looks like this:

enp1s0 (only used for provisioning, may or may not have any external connectivity):
- ipv6: fd00:1101::632e:289d:8a65:10c1/128
- no ipv4

br-ex (a bridge on enp2s0, provides external connectivity):
- ipv6: fd2e:6f44:5dd8:c956::17/128
- ipv4: 192.168.111.23/24

Fernando F. Mancera:

(In reply to Ben Nemec from comment #7)
> Okay thanks, that helps a bunch.
> I was able to get an ipv6 server working,
> knowing that the orange color is a problem, which in this case meant I also
> needed to disable ipv4 on that interface.
>
> We may be stuck with ipv4 DNS servers on dual stack clusters though. We
> can't add a route on enp1s0 because that's only used for the provisioning
> network. I tried doing DNS config with just an ipv4 address configured (but
> no route), but that complains about no suitable interface. I assume that's
> because there is no route.

Yes, the route is mandatory.

> We also can't add a route on the primary interface because it's DHCP and
> nmstate will complain about a route targeting an interface with a dynamic
> address. I also haven't been able to configure DNS on the primary interface
> because it's bridged, and trying to configure that fails on
> "'OvsInternalIface' object has no attribute 'sriov_total_vfs'". I'm not sure
> if that's expected to work or not, but in 1.1.0 it's not.

This is not expected; it seems like a bug. Do you have a reproducer or a desired state? Thank you!

> I guess ideally we'd be able to do DNS configuration on br-ex since that
> will have both v4 and v6 addresses, whereas the provisioning interface will
> only have one or the other. If there's some way to add a dummy address of
> the other type and have that work then that would be an option, but I
> haven't been able to make that work.
>
> For context, the network layout of the node looks like this:
> enp1s0 (only used for provisioning, may or may not have any external
> connectivity):
> - ipv6: fd00:1101::632e:289d:8a65:10c1/128
> - no ipv4
> br-ex (a bridge on enp2s0, provides external connectivity)
> - ipv6: fd2e:6f44:5dd8:c956::17/128
> - ipv4: 192.168.111.23/24

Yes, this should work, but the error you described above is probably limiting you. Let's fix that.

Ben Nemec:

Okay, I think nmstate is working as intended. My problem with br-ex was that I just swapped the name in the interface definition, and that naturally doesn't work, since br-ex is an ovs-bridge, not an interface. That does present a problem for us, because the br-ex bridge is managed by a separate script from nmstate, but it's a problem on our side, so I'm going to close this and track it in https://issues.redhat.com/browse/SDN-2555.

Thanks for your help working through this, Fernando.
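For anyone landing here with the same constraint, the static alternative Fernando describes (a static IP plus a default gateway route, with `dhcp: false`) can be assembled into a single desired state. This is an illustrative sketch using the example values from the thread; the interface name `ens1f0` comes from Fernando's route fragment, and the static address is hypothetical, not a verified configuration:

```yaml
---
dns-resolver:
  config:
    search:
    - nemebean.com
    server:
    - 11.1.1.1
routes:
  config:
  - destination: 0.0.0.0/0
    next-hop-address: 11.1.1.1
    next-hop-interface: ens1f0
interfaces:
- name: ens1f0            # placeholder interface, from Fernando's route example
  type: ethernet
  state: up
  ipv4:
    enabled: true
    dhcp: false           # static addressing instead of waiting for a DHCP lease
    address:
    - ip: 11.1.1.10       # hypothetical static address on the DNS server's subnet
      prefix-length: 24
```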