Description of problem:
Configuring a bridge and a VLAN fails with the DNS error "Failed to find suitable interface for saving DNS name servers", even though the desired state leaves the original network intact.

Version-Release number of selected component (if applicable):
nmstate 0.3.4

How reproducible:

Steps to Reproduce:
1. Have a host with the following networking set up:

[eth0]--[bond0]--[bond0.1000]--[OVS bridge]
[eth1]-/

2. Apply the following state:

interfaces:
- name: bond0.1044
  state: up
  type: vlan
  vlan:
    base-iface: bond0
    id: 1044
- bridge:
    options:
      stp:
        enabled: false
    port:
    - name: bond0.1044
  description: br-ext2 with bond0.1044
  ipv4:
    enabled: false
  name: br-ext2
  state: up
  type: linux-bridge

Actual results:
The configuration fails with:

  File "/usr/bin/nmstatectl", line 11, in <module>
    load_entry_point('nmstate==0.3.4', 'console_scripts', 'nmstatectl')()
  File "/usr/lib/python3.6/site-packages/nmstatectl/nmstatectl.py", line 67, in main
    return args.func(args)
  File "/usr/lib/python3.6/site-packages/nmstatectl/nmstatectl.py", line 267, in apply
    args.save_to_disk,
  File "/usr/lib/python3.6/site-packages/nmstatectl/nmstatectl.py", line 289, in apply_state
    save_to_disk=save_to_disk,
  File "/usr/lib/python3.6/site-packages/libnmstate/netapplier.py", line 71, in apply
    net_state = NetState(desired_state, current_state, save_to_disk)
  File "/usr/lib/python3.6/site-packages/libnmstate/net_state.py", line 58, in __init__
    self._ifaces.gen_dns_metadata(self._dns, self._route)
  File "/usr/lib/python3.6/site-packages/libnmstate/ifaces/ifaces.py", line 408, in gen_dns_metadata
    iface_metadata = dns_state.gen_metadata(self, route_state)
  File "/usr/lib/python3.6/site-packages/libnmstate/dns.py", line 116, in gen_metadata
    "name servers: %s" % server
libnmstate.error.NmstateValueError: Failed to find suitable interface for saving DNS name servers: XX.XX.XX.91

Expected results:
The configuration is applied successfully on the host.
Additional info: See https://bugzilla.redhat.com/show_bug.cgi?id=1929317 to find details about the issue.
What was the network state, as reported by "nmstatectl show", before the desired state was applied?
The state was not collected by the customer. Is "nmstatectl show" all you need or do you want to collect more data? If so, please list what you will need so we can avoid multiple exchanges with the customer.
For nmstate-0.3, these logs could help with debugging this issue:

* Before nmstate changes the network:
    * nmcli
    * nmcli d
    * nmcli c
    * ip link
    * ip addr
    * ip route
    * sudo nmstatectl show

For nmstate-1.0, the nmstate logs are sufficient.
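To avoid several round trips with the customer, the commands listed above could be gathered in one pass. The following is only a minimal sketch: the `collect` helper, the output directory, and the file naming are hypothetical and not part of nmstate. It should be run on the host before nmstate changes the network.

```python
import subprocess
from pathlib import Path

# The commands requested in this comment for nmstate-0.3 debugging.
DIAGNOSTIC_COMMANDS = [
    ["nmcli"],
    ["nmcli", "d"],
    ["nmcli", "c"],
    ["ip", "link"],
    ["ip", "addr"],
    ["ip", "route"],
    ["sudo", "nmstatectl", "show"],
]


def collect(out_dir, commands=DIAGNOSTIC_COMMANDS, run=subprocess.run):
    """Run each command and write its combined stdout/stderr to one file each.

    `run` is injectable so the helper can be exercised without touching the host.
    """
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    written = []
    for cmd in commands:
        try:
            result = run(
                cmd,
                stdout=subprocess.PIPE,
                stderr=subprocess.STDOUT,
                universal_newlines=True,  # works on Python 3.6 as well
            )
            output = result.stdout
        except OSError as exc:
            # Record a missing binary instead of aborting the whole collection.
            output = "failed to run {}: {}\n".format(" ".join(cmd), exc)
        target = out_path / ("_".join(cmd) + ".txt")
        target.write_text(output)
        written.append(target)
    return written
```

Calling `collect("nmstate-debug")` as a user who can run `sudo` would leave one text file per command in the `nmstate-debug` directory, ready to attach to the case.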
Verified with nmstate-1.0.2-6.el8 https://beaker.engineering.redhat.com/recipes/9784506/tasks/124053374/results/580711313/logs/resultoutputfile.log
I'm seeing this same issue on openshift 4.8.5 with the Kubernetes NMState operator 4.8.0-202108130208. The following yaml works on 2 of my 3 coreos nodes:

apiVersion: nmstate.io/v1beta1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: ens1f0-master1-policy
spec:
  nodeSelector:
    kubernetes.io/hostname: "master1.jfmocp4.pokprv.stglabs.ibm.com"
  desiredState:
    interfaces:
    - name: ens1f0
      description: IP config for ens1f0
      type: ethernet
      state: up
      ipv4:
        address:
        - ip: 10.28.16.140
          prefix-length: 24
        enabled: true
    dns-resolver:
      config:
        search:
        - jfmocp4.pokprv.stglabs.ibm.com
        server:
        - 10.28.20.31

while on the third I get:

message: |-
  error reconciling NodeNetworkConfigurationPolicy at desired state apply: , failed to execute nmstatectl set --no-commit --timeout 480: 'exit status 1' '' '/usr/lib/python3.6/site-packages/nmstatectl/nmstatectl.py:325: UserWarning: Using 'set' is deprecated, use 'apply' instead.
    warnings.warn("Using 'set' is deprecated, use 'apply' instead.")
  2021-09-01 09:01:52,077 root DEBUG Nmstate version: 1.0.2
  2021-09-01 09:01:52,077 root DEBUG Applying desire state: {'dns-resolver': {'config': {'search': ['jfmocp4.pokprv.stglabs.ibm.com'], 'server': ['10.28.20.31']}}, 'interfaces': [{'description': 'IP config for ens1f0', 'ipv4': {'address': [{'ip': '10.28.16.140', 'prefix-length': 24}], 'enabled': True}, 'name': 'ens1f0', 'state': 'up', 'type': 'ethernet'}]}
  2021-09-01 09:01:52,120 root DEBUG NetworkManager version 1.30.0
  2021-09-01 09:01:52,125 root DEBUG Async action: Retrieve applied config: ethernet enp34s0f3u1u6 started
  2021-09-01 09:01:52,125 root DEBUG Async action: Retrieve applied config: ethernet ens1f0 started
  2021-09-01 09:01:52,125 root DEBUG Async action: Retrieve applied config: ethernet ens1f1 started
  2021-09-01 09:01:52,125 root DEBUG Async action: Retrieve applied config: ethernet ens7f0 started
  2021-09-01 09:01:52,126 root DEBUG Async action: Retrieve applied config: ethernet enp34s0f3u1u6 finished
  2021-09-01 09:01:52,126 root DEBUG Async action: Retrieve applied config: ethernet ens1f0 finished
  2021-09-01 09:01:52,126 root DEBUG Async action: Retrieve applied config: ethernet ens1f1 finished
  2021-09-01 09:01:52,127 root DEBUG Async action: Retrieve applied config: ethernet ens7f0 finished
  2021-09-01 09:01:52,150 root DEBUG Interface lo is type unknown and will be ignored during the activation
  Traceback (most recent call last):
    File "/usr/bin/nmstatectl", line 11, in <module>
      load_entry_point('nmstate==1.0.2', 'console_scripts', 'nmstatectl')()
    File "/usr/lib/python3.6/site-packages/nmstatectl/nmstatectl.py", line 73, in main
      return args.func(args)
    File "/usr/lib/python3.6/site-packages/nmstatectl/nmstatectl.py", line 326, in set
      return apply(args)
    File "/usr/lib/python3.6/site-packages/nmstatectl/nmstatectl.py", line 354, in apply
      args.save_to_disk,
    File "/usr/lib/python3.6/site-packages/nmstatectl/nmstatectl.py", line 407, in apply_state
      save_to_disk=save_to_disk,
    File "/usr/lib/python3.6/site-packages/libnmstate/netapplier.py", line 78, in apply
      desired_state, ignored_ifnames, current_state, save_to_disk
    File "/usr/lib/python3.6/site-packages/libnmstate/net_state.py", line 72, in __init__
      self._ifaces.gen_dns_metadata(self._dns, self._route)
    File "/usr/lib/python3.6/site-packages/libnmstate/ifaces/ifaces.py", line 675, in gen_dns_metadata
      iface_metadata = dns_state.gen_metadata(self, route_state)
    File "/usr/lib/python3.6/site-packages/libnmstate/dns.py", line 117, in gen_metadata
      "name servers: %s" % server
  libnmstate.error.NmstateValueError: Failed to find suitable interface for saving DNS name servers: 10.28.20.31
  '
The dns-resolver section was added during debugging.. it also failed without that setting.
Looking at https://catalog.redhat.com/software/containers/openshift4/ose-kubernetes-nmstate-handler-rhel8/5e97379dbed8bd66f83dffb0?tag=v4.8.0-202108130208.p0.git.72dbc0e.assembly.stream&push_date=1630424276000&container-tabs=packages, this build should already contain nmstate 1.0.2-14. Would you please share the NodeNetworkState of the faulty node?
Will gladly share if you tell me how I share NodeNetworkState.. I'm pretty new to OCP :-)
Totally! So while NodeNetworkConfigurationPolicy is used to define what the network configuration should look like, and NodeNetworkConfigurationEnactment tracks the execution of every policy on every matching node, NodeNetworkState contains information about the current network state of each node. I would like to see it to confirm that everything on the problematic node is ok and ready to take the Policy. You can get it by calling `oc get nns <name_of_the_node> -o yaml`.
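For a quick look at what a NodeNetworkState reports, the same object can be fetched as JSON (`oc get nns <name_of_the_node> -o json`) and summarized. This is a rough sketch only: `summarize_nns` is a hypothetical helper, and the `status.currentState.interfaces` path is an assumption about the object layout that should be checked against the actual output.

```python
import json


def summarize_nns(nns_json):
    """Return (name, type, state) for each interface in a NodeNetworkState.

    Assumes the interfaces live under status.currentState.interfaces;
    missing keys simply yield an empty list.
    """
    doc = json.loads(nns_json)
    current = doc.get("status", {}).get("currentState", {})
    return [
        (iface.get("name"), iface.get("type"), iface.get("state"))
        for iface in current.get("interfaces", [])
    ]
```

Feeding it the saved JSON for the faulty node and for a working node makes it easier to spot an interface that is missing or in an unexpected state on only one of them.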
Found it - just need to reboot to clean up some debugging -- and I'll share "oc get nns node01 -o yaml" afterwards.
Created attachment 1823281 [details] nns from master1
Hmmm, it is just a guess, but could you extend the ens1f0 config to explicitly disable DHCP? Something like this:

- name: ens1f0
  description: IP config for ens1f0
  type: ethernet
  state: up
  ipv4:
    address:
    - ip: 10.28.16.140
      prefix-length: 24
    dhcp: false
    auto-dns: false
    enabled: true
Tried now.. didn't help:

# cat ens1f0-master1.yaml
apiVersion: nmstate.io/v1beta1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: ens1f0-master1-policy-no-dhcp
spec:
  nodeSelector:
    kubernetes.io/hostname: "master1.jfmocp4.pokprv.stglabs.ibm.com"
  desiredState:
    interfaces:
    - name: ens1f0
      description: IP config for ens1f0
      type: ethernet
      state: up
      ipv4:
        address:
        - ip: 172.20.200.103
          prefix-length: 24
        dhcp: false
        auto-dns: false
        enabled: true

# oc apply -f ens1f0-master1.yaml
nodenetworkconfigurationpolicy.nmstate.io/ens1f0-master1-policy-no-dhcp created

# oc get nnce|grep ens1f0-master1-policy|grep no-dhcp
master0.jfmocp4.pokprv.stglabs.ibm.com.ens1f0-master1-policy-no-dhcp   NodeSelectorNotMatching
master1.jfmocp4.pokprv.stglabs.ibm.com.ens1f0-master1-policy-no-dhcp   FailedToConfigure
master2.jfmocp4.pokprv.stglabs.ibm.com.ens1f0-master1-policy-no-dhcp   NodeSelectorNotMatching

# oc get nnce master1.jfmocp4.pokprv.stglabs.ibm.com.ens1f0-master1-policy-no-dhcp -o jsonpath='{.status.conditions[?(@.type=="Failing")].message}'
error reconciling NodeNetworkConfigurationPolicy at desired state apply: , failed to execute nmstatectl set --no-commit --timeout 480: 'exit status 1' '' '/usr/lib/python3.6/site-packages/nmstatectl/nmstatectl.py:325: UserWarning: Using 'set' is deprecated, use 'apply' instead.
  warnings.warn("Using 'set' is deprecated, use 'apply' instead.")
2021-09-15 12:42:04,090 root DEBUG Nmstate version: 1.0.2
2021-09-15 12:42:04,090 root DEBUG Applying desire state: {'interfaces': [{'description': 'IP config for ens1f0', 'ipv4': {'address': [{'ip': '172.20.200.103', 'prefix-length': 24}], 'auto-dns': False, 'dhcp': False, 'enabled': True}, 'name': 'ens1f0', 'state': 'up', 'type': 'ethernet'}]}
2021-09-15 12:42:04,133 root DEBUG NetworkManager version 1.30.0
2021-09-15 12:42:04,137 root DEBUG Async action: Retrieve applied config: ethernet enp34s0f3u1u6 started
2021-09-15 12:42:04,137 root DEBUG Async action: Retrieve applied config: ethernet ens7f0 started
2021-09-15 12:42:04,138 root DEBUG Async action: Retrieve applied config: ethernet enp34s0f3u1u6 finished
2021-09-15 12:42:04,139 root DEBUG Async action: Retrieve applied config: ethernet ens7f0 finished
2021-09-15 12:42:04,162 root DEBUG Interface lo is type unknown and will be ignored during the activation
Traceback (most recent call last):
  File "/usr/bin/nmstatectl", line 11, in <module>
    load_entry_point('nmstate==1.0.2', 'console_scripts', 'nmstatectl')()
  File "/usr/lib/python3.6/site-packages/nmstatectl/nmstatectl.py", line 73, in main
    return args.func(args)
  File "/usr/lib/python3.6/site-packages/nmstatectl/nmstatectl.py", line 326, in set
    return apply(args)
  File "/usr/lib/python3.6/site-packages/nmstatectl/nmstatectl.py", line 354, in apply
    args.save_to_disk,
  File "/usr/lib/python3.6/site-packages/nmstatectl/nmstatectl.py", line 407, in apply_state
    save_to_disk=save_to_disk,
  File "/usr/lib/python3.6/site-packages/libnmstate/netapplier.py", line 78, in apply
    desired_state, ignored_ifnames, current_state, save_to_disk
  File "/usr/lib/python3.6/site-packages/libnmstate/net_state.py", line 72, in __init__
    self._ifaces.gen_dns_metadata(self._dns, self._route)
  File "/usr/lib/python3.6/site-packages/libnmstate/ifaces/ifaces.py", line 675, in gen_dns_metadata
    iface_metadata = dns_state.gen_metadata(self, route_state)
  File "/usr/lib/python3.6/site-packages/libnmstate/dns.py", line 117, in gen_metadata
    "name servers: %s" % server
libnmstate.error.NmstateValueError: Failed to find suitable interface for saving DNS name servers: 10.28.20.31
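The enactment message above is long, and the line that matters is usually the last one. As a small sketch of pulling it out of `oc get nnce <name> -o json` output: both helpers below are hypothetical, and they rely only on the `status.conditions[].type` / `message` fields that the jsonpath query in this comment already uses.

```python
import json


def failing_message(nnce_json):
    """Return the message of the 'Failing' condition of an enactment, or None."""
    doc = json.loads(nnce_json)
    for condition in doc.get("status", {}).get("conditions", []):
        if condition.get("type") == "Failing":
            return condition.get("message")
    return None


def last_error_line(message):
    """Return the last non-empty line of a message, skipping a trailing lone quote."""
    lines = [line.strip() for line in (message or "").splitlines()]
    lines = [line for line in lines if line and line != "'"]
    return lines[-1] if lines else None
```

For the enactment in this comment, `last_error_line(failing_message(...))` would be expected to return the `libnmstate.error.NmstateValueError` line, which is the part worth comparing between nodes.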
@Fernando, Jan has issues configuring DNS, with an error that seems to be the same as the one tracked on this BZ. He is using nmstate 1.0.2-14. The odd thing is that this passed successfully on 2 nodes out of 3. I checked his desired state and the current state, and it all looks ok to me. Could you please check it when you get a chance, to see whether it is an issue in the config or a new bug? Thanks.
One thing I came to think of was if there was something with the network manager connection name. It's not always good to use device name as connection name.. and one thing I now noticed was that on my working nodes, I have many more connections defined for some reason. Maybe some device discovery in network manager that hasn't happened on master1 ?

[root@c83f1-app1 ip-config]# ssh master0 nmcli con
NAME                 UUID                                  TYPE      DEVICE
ens7f0               60c1cf64-75a8-4668-a41c-84ac9bedff81  ethernet  ens7f0
Wired connection 1   4e3eb403-c8c4-3f52-bde2-71857c9d2024  ethernet  enp34s0f3u1u6
ens1f0               3e7da231-6226-4246-9d7c-603f6764aaea  ethernet  ens1f0
ens1f1               e695f9e2-1675-4592-8972-acab87303803  ethernet  ens1f1
Wired connection 10  ab047018-5226-3909-86d0-868d92c61e0e  ethernet  --
Wired connection 4   6eb908f3-ae53-385b-b561-2c39ecd15138  ethernet  --
Wired connection 5   5eb86b12-8353-3688-a3b7-ed59659bc302  ethernet  --
Wired connection 6   ec923941-33af-339d-a146-b5ce1fa015dd  ethernet  --
Wired connection 7   7bcb61cc-433c-39ad-b28a-e78e76065e66  ethernet  --
Wired connection 8   67119c10-78a3-3d91-a0d4-8fd8b846d553  ethernet  --
Wired connection 9   105b26d2-625e-3084-b224-0ac7f29c221e  ethernet  --

[root@c83f1-app1 ip-config]# ssh master1 nmcli con
NAME              UUID                                  TYPE      DEVICE
Wired Connection  9c2212c7-1e17-4713-9da7-cad67a308f6f  ethernet  enp34s0f3u1u6
Wired Connection  9c2212c7-1e17-4713-9da7-cad67a308f6f  ethernet  ens7f0
ens7f0            6326e468-ae3e-43b9-af75-e03c2bbd90f8  ethernet  --

[root@c83f1-app1 ip-config]# ssh master2 nmcli con
NAME                 UUID                                  TYPE      DEVICE
ens7f0               1f4f8016-2ec6-4f6f-8eb0-b519e42b1630  ethernet  ens7f0
Wired connection 1   da4550db-ef2d-3526-8f4b-763f1459f99e  ethernet  enp34s0f3u1u6
ens1f0               d1c0f7fe-eb11-44b6-99bd-f1a1a7fac201  ethernet  ens1f0
ens1f1               298c11af-2e1e-4dc0-84aa-c462029e2409  ethernet  ens1f1
Wired connection 10  ced867e2-41b2-3d2e-b323-0c32dce7ed02  ethernet  --
Wired connection 4   5444f514-cad5-32be-a3c4-7a9c2cc17fcc  ethernet  --
Wired connection 5   cba69e9f-8608-37df-bc16-82f47dda27bb  ethernet  --
Wired connection 6   5dea7a19-50a3-35b0-80ac-e2a2c43552b7  ethernet  --
Wired connection 7   344f06ad-c2a2-3a13-bb3c-58e77a3f6fc0  ethernet  --
Wired connection 8   9a962b56-3c9f-381e-a38b-7c5e4ffc4c65  ethernet  --
Wired connection 9   7e86fe3a-07dc-3062-bc46-e476cdb3dad8  ethernet  --
.. and back to my comment about device names and connection names.. I now see that master1 has a connection named "ens7f0" with no device, and two "Wired Connection" where one has device "ens7f0" !
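Spotting this by eye is easy to miss, so here is a rough sketch of checking it mechanically. It assumes the listing is taken in nmcli's terse mode (`nmcli -t -f NAME,UUID,TYPE,DEVICE connection show`), where fields are colon-separated and literal colons are backslash-escaped; the two helper functions are hypothetical, and values that end in a backslash are not handled.

```python
import re
from collections import defaultdict

# Split on ':' unless it is escaped as '\:' (nmcli terse-mode escaping).
_FIELD_SPLIT = re.compile(r"(?<!\\):")


def parse_terse(output):
    """Parse terse nmcli output into (name, uuid, type, device) tuples."""
    rows = []
    for line in output.splitlines():
        if not line.strip():
            continue
        fields = [field.replace("\\:", ":") for field in _FIELD_SPLIT.split(line)]
        if len(fields) == 4:
            rows.append(tuple(fields))
    return rows


def profiles_on_multiple_devices(rows):
    """Map UUID -> (name, devices) for profiles active on more than one device."""
    devices = defaultdict(set)
    names = {}
    for name, uuid, _type, device in rows:
        names[uuid] = name
        if device and device != "--":  # an empty field or '--' means not active
            devices[uuid].add(device)
    return {
        uuid: (names[uuid], sorted(devs))
        for uuid, devs in devices.items()
        if len(devs) > 1
    }
```

Run against the master1 listing, this would flag "Wired Connection" (9c2212c7-1e17-4713-9da7-cad67a308f6f) as active on both enp34s0f3u1u6 and ens7f0, while master0 and master2 would come back empty.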
I am taking a look at the desired and current state. Thank you!
Sure, then let me contact the NetworkManager developers to get them involved. Could you also attach the requested information to the BZ? Just to make sure all the developers can look at everything. I think it could be better to create a BZ for this issue. In case we need a fix, a BZ will be required. Thank you!
Which requested info do you need? Please feel free to use any data I've uploaded to the support case if that's relevant.
I "fixed" this problem by

nmcli con del 9c2212c7-1e17-4713-9da7-cad67a308f6f # just removing this and rebooting didn't work, problem then re-appeared
mv /etc/NetworkManager/system-connections/default_connection.nmconnection /root/ # move away the file mentioning this UUID
systemctl restart NetworkManager

and now "nmcli con show" looks much more like the other nodes, without any duplicates. And I can apply my config settings.
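Since deleting the connection alone did not stick, it helps to find which file on disk defines the UUID. The following is a minimal sketch, assuming keyfile profiles with the `.nmconnection` suffix under /etc/NetworkManager/system-connections (the directory used above); `find_profiles_with_uuid` is a hypothetical helper, and reading those files normally requires root.

```python
from pathlib import Path


def find_profiles_with_uuid(directory, uuid):
    """Return the keyfiles in `directory` that contain a 'uuid=<uuid>' line."""
    matches = []
    for path in sorted(Path(directory).glob("*.nmconnection")):
        try:
            text = path.read_text(errors="replace")
        except OSError:
            continue  # unreadable file (e.g. not running as root): skip it
        if any(line.strip() == "uuid=" + uuid for line in text.splitlines()):
            matches.append(path)
    return matches
```

For example, `find_profiles_with_uuid("/etc/NetworkManager/system-connections", "9c2212c7-1e17-4713-9da7-cad67a308f6f")` would be expected to return the default_connection.nmconnection file on master1 and nothing on the other two nodes.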
(In reply to jan-frode from comment #59)
> I "fixed" this problem by
>
>
> nmcli con del 9c2212c7-1e17-4713-9da7-cad67a308f6f # just removing this and
> rebooting didn't work, problem then re-appeared
> mv /etc/NetworkManager/system-connections/default_connection.nmconnection
> /root/ # move away the file mentioning this UUID
> systemctl restart NetworkManager
>
> and now "nmcli con show" looks much more like the other nodes, without any
> duplicates. And I can apply my config settings.

Hi! It seems NM was picking the wrong connections for some reason. It is nice that you have a workaround, but that is not enough. I am trying to reproduce this with all the information you provided. Thank you!
So the culprit seems to have been the file /etc/NetworkManager/system-connections/default_connection.nmconnection -- this file was only on master1. Master0 and master2 have no such file. The contents of this file on master1 were:

[connection]
id=Wired Connection
uuid=9c2212c7-1e17-4713-9da7-cad67a308f6f
type=ethernet
autoconnect-retries=1
multi-connect=3
permissions=
wait-device-timeout=60000

[ethernet]
mac-address-blacklist=

[ipv4]
dhcp-timeout=90
dns=10.28.20.31;
dns-search=
may-fail=false
method=auto

[ipv6]
addr-gen-mode=eui64
dhcp-timeout=90
dns-search=
method=auto

I have no idea where this file came from.. or what I can have done differently on master1 compared to master0/2.
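Two settings in this file stand out: `multi-connect=3`, which NetworkManager documents as "multiple" (one profile may be active on several devices at once, matching the duplicate rows seen in "nmcli con"), and a statically listed DNS server combined with `method=auto`. Below is a minimal sketch for pulling those out of a keyfile with the standard library; `inspect_keyfile` is a hypothetical helper, not an nmstate or NetworkManager API.

```python
import configparser


def inspect_keyfile(text):
    """Extract a few settings of interest from NetworkManager keyfile text."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    multi_connect = int(parser.get("connection", "multi-connect", fallback="0"))
    # Keyfile lists are ';'-separated and usually end with a trailing ';'.
    dns = [s for s in parser.get("ipv4", "dns", fallback="").split(";") if s]
    return {
        "id": parser.get("connection", "id", fallback=None),
        "uuid": parser.get("connection", "uuid", fallback=None),
        "multi_connect": multi_connect,
        # 3 is the 'multiple' value of connection.multi-connect.
        "may_bind_many_devices": multi_connect == 3,
        "ipv4_method": parser.get("ipv4", "method", fallback=None),
        "ipv4_dns": dns,
    }
```

Applied to the file above, this would report `may_bind_many_devices` as true, `ipv4_method` as `auto`, and `ipv4_dns` as a list containing 10.28.20.31, the same server named in the nmstate error.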
(In reply to jan-frode from comment #61)
> So the culprit seems to have been the file
> /etc/NetworkManager/system-connections/default_connection.nmconnection --
> this file was only on master1. Master0 and master2 has no such file. The
> contents of this file on master1 was:
>
> [connection]
> id=Wired Connection
> uuid=9c2212c7-1e17-4713-9da7-cad67a308f6f
> type=ethernet
> autoconnect-retries=1
> multi-connect=3
> permissions=
> wait-device-timeout=60000
>
> [ethernet]
> mac-address-blacklist=
>
> [ipv4]
> dhcp-timeout=90
> dns=10.28.20.31;
> dns-search=
> may-fail=false
> method=auto
>
> [ipv6]
> addr-gen-mode=eui64
> dhcp-timeout=90
> dns-search=
> method=auto
>
>
> I have no idea where this file came from.. or what I can have done
> differently on master1 compared to master0/2.

I am not sure where this file came from, but it didn't come from Nmstate. The profile names in Nmstate always use the interface name. Maybe someone from CNV knows more about this?
CNV only configures host networking through nmstate. "Wired connection" sounds like it is created automatically by the OS. I don't know why it is not consistent across nodes. Was any non-nmstate configuration of the node networking (creating/editing profiles, modifying NetworkManager config, creating ifcfg files through MCO, ...) done on your cluster?
I don't think we did any non-nmstate configuration after deploying the nodes. I'm a bit new to openshift, and thought we couldn't do any direct nmcli changes on the coreos images. Only thing I know we did was to switch the mellanox connectx6 ports from infiniband to ethernet mode. So there will have been activated 4 new ethernet interfaces after the initial installation.
> we couldn't do any direct nmcli changes on the coreos images

You can't. But you could use MCO to deploy a script that uses `nmcli` on boot to configure networking. This is not something I would recommend though.

As this is unrelated to this BZ, would you open a new bug? Feel free to open it on the same component, I would then try to help you find the correct owner.
@phoracek,
> As this is unrelated to this BZ, would you open a new bug? Feel free to open it on the same component, I would then try to help you find the correct owner.

I've already submitted a new bz https://bugzilla.redhat.com/show_bug.cgi?id=2008446
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (nmstate bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2021:4157