Bug 1451261 - [DNS] - dhcp networks are often out-of-sync when adding/updating/removing the DNS configuration
Summary: [DNS] - dhcp networks are often out-of-sync when adding/updating/removing the...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: BLL.Network
Version: 4.2.0
Hardware: x86_64
OS: Linux
low
medium
Target Milestone: ovirt-4.4.0
: ---
Assignee: Edward Haas
QA Contact: Michael Burman
URL:
Whiteboard:
Depends On: 1812914
Blocks: 1160667
 
Reported: 2017-05-16 08:54 UTC by Michael Burman
Modified: 2020-05-20 20:01 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-05-20 20:01:43 UTC
oVirt Team: Network
Embargoed:
pm-rhel: ovirt-4.4+
pm-rhel: ovirt-4.5?


Attachments
Logs (2.19 MB, application/x-gzip)
2017-05-16 08:54 UTC, Michael Burman
new engine log (1.35 MB, application/x-gzip)
2017-07-04 11:08 UTC, Michael Burman
new logs2 (1.40 MB, application/x-gzip)
2018-01-15 09:35 UTC, Michael Burman


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Bugzilla | 1517638 | 0 | medium | CLOSED | Networks often reported as out of sync when switching the network's bootproto - non-mgmt network | 2021-02-22 00:41:40 UTC
oVirt gerrit | 77619 | 0 | master | MERGED | core: update VdsDynamic dns configuration after setupnetworks | 2020-08-17 11:18:34 UTC
oVirt gerrit | 80102 | 0 | master | MERGED | core: method rename | 2020-08-17 11:18:35 UTC
oVirt gerrit | 81652 | 0 | master | ABANDONED | core: fix default route issues in HostSetupNetworksCommand | 2020-08-17 11:18:35 UTC
oVirt gerrit | 81703 | 0 | master | MERGED | core: fix default route issues in HostSetupNetworksCommand | 2020-08-17 11:18:34 UTC
oVirt gerrit | 85841 | 0 | master | MERGED | engine: Change DNS sync check | 2020-08-17 11:18:34 UTC

Internal Links: 1517638

Description Michael Burman 2017-05-16 08:54:46 UTC
Created attachment 1279233 [details]
Logs

Description of problem:
[DNS] - Networks are out-of-sync when adding/updating the DNS configuration.

When adding or updating a network with a DNS configuration, the change succeeds on the host, and the ifcfg file and resolv.conf are updated properly, but the engine complains that the network(s) are out-of-sync because of a difference in the name servers between the host and the DC.

Version-Release number of selected component (if applicable):
4.2.0-0.0.master.20170514130938.gitacb8b09.el7.centos
DNS rpms on top of the master version.
http://jenkins.ovirt.org/job/ovirt-engine_master_build-artifacts-on-demand-el7-x86_64/171/
https://gerrit.ovirt.org/#/c/76497/
How reproducible:
100%

Steps to Reproduce:
1. Add 2 name servers to the ovirtmgmt network, via the 'Networks' main tab or via the setupNetworks dialog, and approve the operation

Actual results:
The host(s) were updated successfully -
cat /etc/resolv.conf
; generated by /usr/sbin/dhclient-script
search qa.lab.tlv.redhat.com lab.tlv.redhat.com tlv.redhat.com redhat.com
nameserver 10.35.28.1
nameserver 8.8.8.8
[root@vega04 ~]# cat /etc/sysconfig/network-scripts/ifcfg-ovirtmgmt 
# Generated by VDSM version 4.20.0-753.gitbdeadde.el7.centos
DEVICE=ovirtmgmt
TYPE=Bridge
DELAY=0
STP=off
ONBOOT=yes
BOOTPROTO=dhcp
MTU=1500
DEFROUTE=yes
NM_CONTROLLED=no
IPV6INIT=yes
IPV6_AUTOCONF=yes
DNS1=10.35.28.1
DNS2=8.8.8.8

But the engine complains that the network is out-of-sync with the host.

Expected results:
Networks should be synced.
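
The host-side check shown under "Actual results" can be scripted. The following is a minimal sketch, not part of VDSM or the engine: the helper names `resolv_conf_nameservers` and `ifcfg_nameservers` are hypothetical, and the sample text is a trimmed copy of the files quoted above.

```python
import re


def resolv_conf_nameservers(text):
    """Return nameserver addresses from resolv.conf content, in file order."""
    servers = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def ifcfg_nameservers(text):
    """Return the DNS1, DNS2, ... values from ifcfg content, ordered by index."""
    found = {}
    for line in text.splitlines():
        match = re.match(r"^DNS(\d+)=(\S+)$", line.strip())
        if match:
            found[int(match.group(1))] = match.group(2).strip("\"'")
    return [found[index] for index in sorted(found)]


# Trimmed copies of the files quoted in this report.
RESOLV_CONF = """\
; generated by /usr/sbin/dhclient-script
search qa.lab.tlv.redhat.com lab.tlv.redhat.com tlv.redhat.com redhat.com
nameserver 10.35.28.1
nameserver 8.8.8.8
"""
IFCFG = """\
DEVICE=ovirtmgmt
BOOTPROTO=dhcp
DNS1=10.35.28.1
DNS2=8.8.8.8
"""

requested = ["10.35.28.1", "8.8.8.8"]
print(resolv_conf_nameservers(RESOLV_CONF) == requested)
print(ifcfg_nameservers(IFCFG) == requested)
```

Both comparisons hold for the files above, which matches the report: the host configuration is correct and only the engine's view is stale.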

Comment 1 Martin Mucha 2017-05-30 11:29:44 UTC
a) vdsm cannot remove DNS servers, so you always need to set the maximum allowed number of them to avoid an out-of-sync state

b) it seems that refresh capabilities now has to be called after the network is updated in this manner. Doing so will bring the network back in sync (provided a) is met).

Comment 2 Michael Burman 2017-07-04 06:52:48 UTC
After changes to a network's DNS configuration, the engine still reports the network as out-of-sync with the host 50%-70% of the time. It looks like a refresh caps is missing, or is not always successful, after the network changes are applied. A sync is not always invoked to bring the network back in sync.
All the network changes were applied successfully on the host itself (resolv.conf and ifcfg-*), which is great, but
doing a refresh caps manually is what syncs the network.

- The override scenario (host-level configuration) is also reported as out-of-sync in 50% of the cases, although the change took place on the host.
This bug needs additional improvement.

Comment 3 Martin Mucha 2017-07-04 08:24:15 UTC
requests to vdsm are correct (proper dns configuration is sent)?
is refresh caps always called or not?

Comment 4 Michael Burman 2017-07-04 11:08:09 UTC
(In reply to Martin Mucha from comment #3)
> requests to vdsm are correct (proper dns configuration is sent)?
> is refresh caps always called or not?

Yes, I see that the requests are indeed correct, which is why the configuration is properly updated on the host.
I'm not sure. If it is always called, then maybe it is called before setupNetworks has finished, which is why it sometimes works and sometimes doesn't.
Maybe it's a timing issue: the first refresh caps was executed while the setupNetworks action wasn't yet done on the vdsm side.
When it finally finished, the host was no longer in sync with the previously obtained data, which would explain why manually re-invoking refresh caps fixed the issue.

Comment 5 Michael Burman 2017-07-04 11:08:42 UTC
Created attachment 1294159 [details]
new engine log

Comment 6 Martin Mucha 2017-07-12 11:09:04 UTC
(as explained by Dan) If you use dhcp bootproto in setupnetworks, the operation is asynchronous and might produce out-of-sync networks(ones with dhcp). Manual refresh caps is acceptable and should solve potential out-of-sync issues. If it does, it's fine. This situation must not happen for static ip configuration, which is synchronous operation and setupnetworks should never end up with out-of-sync networks when static ip is used.

Can you revisit if this bug is still reproducible with respect to this new info? IIUC it could be fine, since the described out-of-sync networks are, according to Dan's explanations, to be expected.

Comment 7 Michael Burman 2017-07-12 12:12:02 UTC
(In reply to Martin Mucha from comment #6)
> (as explained by Dan) If you use dhcp bootproto in setupnetworks, the
> operation is asynchronous and might produce out-of-sync networks(ones with
> dhcp). Manual refresh caps is acceptable and should solve potential
> out-of-sync issues. If it does, it's fine. This situation must not happen
> for static ip configuration, which is synchronous operation and
> setupnetworks should never end up with out-of-sync networks when static ip
> is used.
> 
> Can you revisit if this bug is still reproducible with respect to this new
> info? IIUC it could be fine, since described out-of-sync networks are,
> according to Dan's explanations, to be expected.

I don't understand how this is related to bootproto. I'm updating the DNS configuration on the management network with a name server,
and the network is reported as out-of-sync. Nothing has changed since the last time I tested; it's the same thing.
Networks shouldn't be out-of-sync in the engine, and you shouldn't have to do a manual refresh caps. The engine should handle this, just as when any other network property is updated while the network is attached to the host: the engine does a refresh caps and takes care of the sync.

Comment 8 Dan Kenigsberg 2017-07-19 21:43:59 UTC
with bootproto=dhcp, we know that we have a problematic asynchronous behaviour. If it happens with bootproto=static we have a bigger riddle to solve.

There is no doubt that Engine should handle the out-of-sync state. But since refresh caps fixes the issue, users have a workaround, and we know that we have a synchronization issue, not a logical problem. Hence, I'm lowering the severity.

Comment 9 Sven Kieske 2017-09-21 07:38:29 UTC
So maybe document the workaround, beside the info in this bugreport, I still don't know how to "refresh caps", so maybe tell a concrete command a user can run? THX! ;)

Comment 10 Michael Burman 2017-09-24 06:05:20 UTC
(In reply to Sven Kieske from comment #9)
> So maybe document the workaround, beside the info in this bugreport, I still
> don't know how to "refresh caps", so maybe tell a concrete command a user
> can run? THX! ;)

Sven, 'Refresh Caps' is a button in the webadmin UI. You will find it under the Hosts main tab, in the 'Management' drop-down menu.
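
For scripted use, the oVirt REST API exposes a host `refresh` action that refreshes the host's devices and capabilities. The sketch below only builds the HTTP request and does not send it. The engine URL, host id and token are placeholders, `build_refresh_caps_request` is a hypothetical helper name, and bearer-token authentication is an assumption about the setup, so check the API documentation for your engine version before relying on it.

```python
import urllib.request


def build_refresh_caps_request(engine_url, host_id, token):
    """Build (but do not send) a POST to the host 'refresh' action."""
    url = f"{engine_url.rstrip('/')}/ovirt-engine/api/hosts/{host_id}/refresh"
    return urllib.request.Request(
        url,
        data=b"<action/>",  # empty action body
        method="POST",
        headers={
            "Content-Type": "application/xml",
            "Accept": "application/xml",
            # Assumption: an SSO token was obtained separately.
            "Authorization": f"Bearer {token}",
        },
    )


# Placeholder values, not taken from this bug report.
request = build_refresh_caps_request("https://engine.example.com", "123", "TOKEN")
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would send it; TLS verification and error handling are left out of this sketch.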

Comment 11 Sven Kieske 2017-09-25 09:16:31 UTC
Hi,

thanks for the clarification!

I mean I'm not really interested myself in this at all, because my network setup differs, but I can imagine users would be pretty annoyed if they just find out about this workaround in this obscure bugtracker? Maybe add this information to the "official" docs?

Just a suggestion, though.

keep up the good work!

Comment 12 Yaniv Kaul 2017-10-15 10:02:01 UTC
Is there additional work needed on this RFE? The posted patches are both merged, are there more expected?

Comment 13 Dan Kenigsberg 2017-10-15 10:11:25 UTC
This is not an RFE, but a buggy remainder of the already-verified RFE 1160667.

It is not yet clear to me why an additional Refresh Caps is required to eliminate the out-of-sync status; this requires further research.

Comment 14 RHV bug bot 2018-01-05 16:57:52 UTC
INFO: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason:

[Open patch attached]

For more info please contact: infra

Comment 15 Michael Burman 2018-01-15 07:36:37 UTC
Edy, 
It still doesn't work on the latest d/s build 4.2.1.1-0.1.el7.

I'm wondering whether that's because your fix wasn't merged into this build, or because we are still missing a proper fix for this bug.

Test performed -
- Add a host to RHV with its original DNS; the ovirtmgmt network has no DNS settings on the network prior to adding the host
- Via Networks, edit ovirtmgmt with 2 new name servers
- Engine shows events saying that it is applying the network changes on the host
- Engine also shows events saying that the changes were successfully applied on both hosts
- On both hosts the change was successfully applied in resolv.conf
- BUT, on both hosts ovirtmgmt is reported as out-of-sync and remains so.
- In some cases, one host managed to get in sync, while the second remained out-of-sync forever. NO SPM host

- The same applies to the override DNS configuration scenario.

Failing this bug according to this, sorry.

Comment 16 Edward Haas 2018-01-15 09:20:52 UTC
http://gerrit.ovirt.org/85841 fixes a specific scenario where one defines static DNS entries on Engine and has DHCP enabled in parallel.
In such a case, setting a DNS of 8.8.8.8 on Engine, may result in having the host setting a DNS of 8.8.8.8 plus the entries coming from the DHCP response. So we end up with DC requiring 8.8.8.8 and host reporting 8.8.8.8, 1.1.1.1, 2.2.2.2.
The fix says that if DC DNS entries are included in the reported host entries, all is in sync.
This is also what I managed to reproduce on my setup.

If caps is not refreshed, then this is a different problem.
Please attach the logs so we can investigate further.
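
The rule described above (the network is in sync if every DC DNS entry is included in the entries reported by the host) can be sketched as a subset test. This is an illustration in Python, not the engine code from the gerrit change; `dns_in_sync` is a hypothetical name, and treating an empty DC list as in sync is an assumption of the sketch.

```python
def dns_in_sync(dc_nameservers, host_nameservers):
    """In sync when every DC name server is also reported by the host.

    Extra host entries (for example from a DHCP response) are ignored.
    """
    return set(dc_nameservers) <= set(host_nameservers)


# DC requires 8.8.8.8; host reports it plus two DHCP-provided entries.
print(dns_in_sync(["8.8.8.8"], ["8.8.8.8", "1.1.1.1", "2.2.2.2"]))  # True
# Host reports only the DHCP entry, as in the failing case.
print(dns_in_sync(["8.8.8.8", "10.35.28.1"], ["1.1.1.1"]))  # False
```

The set comparison ignores ordering, which is a simplification: resolvers try name servers in order, so a stricter check might also compare positions.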

Comment 17 Michael Burman 2018-01-15 09:35:25 UTC
Created attachment 1381316 [details]
new logs2

Comment 18 Michael Burman 2018-01-15 09:37:26 UTC
My scenario is the same scenario. Nothing has changed from my side of reproduction. I do the same exact steps.

Comment 19 Edward Haas 2018-02-04 11:14:10 UTC
(In reply to Michael Burman from comment #17)
> Created attachment 1381316 [details]
> new logs2

I see in the VDSM logs that two DNS entries have been set with DHCP enabled, but in caps only the entry from the DHCP response is seen, not the other two.
I'm not clear on what happened there exactly, as we use ifcfg files to set the DNS entries.
We need to debug this live.

Comment 20 Edward Haas 2018-02-04 13:02:30 UTC
Debugging the issue shows this scenario:

ifup is issued at 14:15:06,830
Connectivity check is confirmed at 14:16:01,943
But only at 14:16:11,302 the original ifup returns.

The DNS entry is updated only after ifup returns, which in this case, is about 10 seconds after the DHCP answered.
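
The intervals behind the "about 10 seconds" figure can be worked out from the three timestamps above. A small sketch using only the standard library (the variable names are illustrative):

```python
from datetime import datetime


def parse(timestamp):
    """Parse a log time of the form HH:MM:SS,mmm."""
    return datetime.strptime(timestamp, "%H:%M:%S,%f")


ifup_issued = parse("14:15:06,830")
connectivity_ok = parse("14:16:01,943")
ifup_returned = parse("14:16:11,302")

to_connectivity = (connectivity_ok - ifup_issued).total_seconds()
after_connectivity = (ifup_returned - connectivity_ok).total_seconds()

print(f"{to_connectivity:.3f}")     # 55.113
print(f"{after_connectivity:.3f}")  # 9.359
```

So any caps collected in the 9.359 seconds between the connectivity confirmation and the return of ifup can miss the DNS update.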

With the ifcfg implementation we have no mechanism that informs us when the DNS is updated, but we could trigger an event to Engine when the original ifup returns successfully.

But then, we may be overloading Engine with several events (IP received by DHCP, followed by ifup returning) causing frequent caps to be issued.
We could try to control it by having a backoff mechanism at the Engine side, to delay the caps for a few seconds, aggregating multiple events (something that is needed anyway for safety reasons).
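
A minimal sketch of such a backoff, assuming a simple quiet-period debounce; it is not existing Engine code. The class name `CapsRefreshDebouncer`, its methods and the 5-second window are hypothetical. Time is passed in explicitly so the behaviour is easy to follow.

```python
class CapsRefreshDebouncer:
    """Coalesce a burst of host events into a single caps refresh."""

    def __init__(self, quiet_period):
        self.quiet_period = quiet_period  # seconds without new events
        self._last_event = None  # time of the latest pending event, if any

    def notify(self, now):
        """Record a host event (e.g. DHCP lease obtained, ifup returned)."""
        self._last_event = now

    def should_refresh(self, now):
        """Return True once, after quiet_period seconds with no new events."""
        if self._last_event is None:
            return False
        if now - self._last_event < self.quiet_period:
            return False
        self._last_event = None  # the pending events are consumed
        return True


debouncer = CapsRefreshDebouncer(quiet_period=5.0)
debouncer.notify(now=0.0)  # first event
debouncer.notify(now=2.0)  # second event restarts the quiet period
print(debouncer.should_refresh(now=4.0))  # False: only 2 s of quiet
print(debouncer.should_refresh(now=7.0))  # True: 5 s of quiet, refresh once
print(debouncer.should_refresh(now=8.0))  # False: nothing pending
```

Two events are merged only when the quiet period is longer than the gap between them, so covering the delay observed in this comment would need a window longer than about 10 seconds, at the cost of a later refresh.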

Comment 21 Michal Skrivanek 2020-03-18 15:45:23 UTC
This bug didn't get any attention for a while, we didn't have the capacity to make any progress. If you deeply care about it or want to work on it please assign/target accordingly

Comment 23 Dominik Holler 2020-03-23 15:48:37 UTC
We have to reproduce on nmstate.

Comment 24 Michael Burman 2020-03-24 08:26:43 UTC
We depend on BZ 1812914 and nmstate BZ 1815112.
Once they are fixed, this can be retested using nmstate.

Comment 25 Michael Burman 2020-03-30 12:26:02 UTC
This finally can be tested now with latest nmstate and vdsm
BZ 1812914 is verified
BZ 1815112 is verified

Moving to ON_QA for testing

Comment 26 Michael Burman 2020-03-30 12:36:13 UTC
Verified on - vdsm-4.40.9-1.el8ev.x86_64 with 
nmstate-0.2.6-6.el8.noarch
rhvm-4.4.0-0.29.master.el8ev.noarch

Comment 27 Sandro Bonazzola 2020-05-20 20:01:43 UTC
This bugzilla is included in oVirt 4.4.0 release, published on May 20th 2020.

Since the problem described in this bug report should be
resolved in oVirt 4.4.0 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

