Created attachment 1279233 [details] Logs

Description of problem:
[DNS] - Networks are out-of-sync when adding/updating the DNS configuration.

When adding or updating a network with a DNS configuration, the change succeeds on the host and ifcfg and resolv.conf are updated properly. However, the engine complains that the network(s) are out-of-sync because the name servers differ between the host and the DC.

Version-Release number of selected component (if applicable):
4.2.0-0.0.master.20170514130938.gitacb8b09.el7.centos
DNS rpms on top of the master version:
http://jenkins.ovirt.org/job/ovirt-engine_master_build-artifacts-on-demand-el7-x86_64/171/
https://gerrit.ovirt.org/#/c/76497/

How reproducible:
100%

Steps to Reproduce:
1. Add 2 name servers to the ovirtmgmt network, via the 'Networks' main tab or via the setupNetworks dialog, and approve the operation.

Actual results:
The host(s) are updated successfully:

cat /etc/resolv.conf
; generated by /usr/sbin/dhclient-script
search qa.lab.tlv.redhat.com lab.tlv.redhat.com tlv.redhat.com redhat.com
nameserver 10.35.28.1
nameserver 8.8.8.8

[root@vega04 ~]# cat /etc/sysconfig/network-scripts/ifcfg-ovirtmgmt
# Generated by VDSM version 4.20.0-753.gitbdeadde.el7.centos
DEVICE=ovirtmgmt
TYPE=Bridge
DELAY=0
STP=off
ONBOOT=yes
BOOTPROTO=dhcp
MTU=1500
DEFROUTE=yes
NM_CONTROLLED=no
IPV6INIT=yes
IPV6_AUTOCONF=yes
DNS1=10.35.28.1
DNS2=8.8.8.8

But the engine reports that the network is out-of-sync with the host.

Expected results:
Networks should be in sync.
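When checking a host by hand, it can help to pull the name servers out of the two files quoted in the description and compare them with what the DC defines. A minimal Python sketch, assuming only the file formats shown in this report; the helper names `resolv_conf_nameservers` and `ifcfg_dns_servers` are hypothetical and are not part of VDSM or the engine:

```python
import re


def resolv_conf_nameservers(text: str) -> list:
    """Return the 'nameserver' entries of a resolv.conf file, in file order."""
    servers = []
    for line in text.splitlines():
        fields = line.split()
        # Comment lines start with ';' or '#', so fields[0] never equals 'nameserver'.
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def ifcfg_dns_servers(text: str) -> list:
    """Return the DNS1, DNS2, ... values of an ifcfg file, ordered by index."""
    found = {}
    for line in text.splitlines():
        # Accept optional single or double quotes around the value.
        m = re.match(r"""^DNS(\d+)=["']?([^"'\s]+)""", line.strip())
        if m:
            found[int(m.group(1))] = m.group(2)
    return [found[index] for index in sorted(found)]
```

With the two files above, both helpers return ['10.35.28.1', '8.8.8.8'], i.e. the host does carry the requested name servers.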
a) VDSM cannot remove DNS servers, so you always need to set the maximum allowed number of them to avoid out-of-sync.
b) It seems that refresh capabilities now has to be called after a network is updated in this manner. Doing so brings the network back to sync (provided a) is met).
After changes to the DNS configuration, the engine still reports the network as out-of-sync with the host in 50%-70% of attempts. It looks like a refresh caps is missing, or not always successful, after the network changes are applied, so the sync is not always triggered. All the network changes were applied successfully on the host itself (resolv.conf and ifcfg-*), which is great, but in those cases the network only becomes synced after a manual refresh caps.
- The override scenario (host-level configuration) is also reported as out-of-sync in 50% of cases, although the change took place on the host.
This bug needs additional work.
Are the requests to VDSM correct (is the proper DNS configuration sent)?
Is refresh caps always called or not?
(In reply to Martin Mucha from comment #3)
> Are the requests to VDSM correct (is the proper DNS configuration sent)?
> Is refresh caps always called or not?

Yes, I see that the requests are indeed correct, which is why the host is updated properly.

I'm not sure. If it is always called, then maybe it is called before setupNetworks has completed, which is why it sometimes works and sometimes doesn't. Maybe it's a timing issue: refresh caps is executed first, while the setupNetworks action is not yet done on the VDSM side. When setupNetworks finally finishes, the host no longer matches the previously obtained data, which would explain why manually re-invoking refresh caps fixes the issue.
Created attachment 1294159 [details] new engine log
(As explained by Dan) If you use the dhcp bootproto in setupNetworks, the operation is asynchronous and might produce out-of-sync networks (the ones with DHCP). A manual refresh caps is acceptable and should solve potential out-of-sync issues; if it does, that's fine. This situation must not happen with a static IP configuration, which is a synchronous operation: setupNetworks should never end up with out-of-sync networks when static IP is used.

Can you check whether this bug is still reproducible in light of this new info? IIUC it could be fine, since the described out-of-sync networks are, according to Dan's explanation, to be expected.
(In reply to Martin Mucha from comment #6)
> (As explained by Dan) If you use the dhcp bootproto in setupNetworks, the operation is asynchronous and might produce out-of-sync networks (the ones with DHCP). A manual refresh caps is acceptable and should solve potential out-of-sync issues; if it does, that's fine. This situation must not happen with a static IP configuration, which is a synchronous operation: setupNetworks should never end up with out-of-sync networks when static IP is used.
>
> Can you check whether this bug is still reproducible in light of this new info? IIUC it could be fine, since the described out-of-sync networks are, according to Dan's explanation, to be expected.

I don't understand how this is related to bootproto. I'm updating the DNS configuration of the management network with name servers, and the network is reported as out-of-sync. Nothing has changed since the last test; it's the same behavior. Networks shouldn't be out-of-sync in the engine, and users shouldn't have to do a manual refresh caps. The engine should handle this, just as it does when any other property of a network attached to a host is updated: the engine performs a refresh caps and takes care of the sync.
With bootproto=dhcp, we know that we have problematic asynchronous behaviour. If it happens with bootproto=static, we have a bigger riddle to solve. There is no doubt that the Engine should handle the out-of-sync state. But since refresh caps fixes the issue, users have a workaround, and we know that this is a synchronization issue, not a logical problem. Hence, I'm lowering the severity.
So maybe document the workaround? Besides the info in this bug report, I still don't know how to "refresh caps", so maybe give a concrete command a user can run? THX! ;)
(In reply to Sven Kieske from comment #9)
> So maybe document the workaround? Besides the info in this bug report, I still don't know how to "refresh caps", so maybe give a concrete command a user can run? THX! ;)

Sven, 'Refresh Caps' is a button in the webadmin UI: you will find it under the Hosts main tab, in the 'Management' drop-down menu.
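For a scripted alternative to the button, here is a sketch that assumes the host `refresh` action of the oVirt REST API v4 (`POST /ovirt-engine/api/hosts/{id}/refresh` with an empty `<action/>` body) is available on your engine version and that HTTP basic authentication is enabled. The engine URL, host ID and credentials are placeholders, and `build_refresh_request` is a hypothetical helper name. It only builds the request; actually sending it also requires trusting the engine's CA certificate.

```python
import base64
import urllib.request


def build_refresh_request(engine_url: str, host_id: str,
                          user: str, password: str) -> urllib.request.Request:
    """Build (but do not send) a POST to the host 'refresh' action.

    Assumption: the engine exposes /ovirt-engine/api/hosts/{id}/refresh and
    accepts HTTP basic auth. All arguments are placeholders for your setup.
    """
    url = f"{engine_url.rstrip('/')}/ovirt-engine/api/hosts/{host_id}/refresh"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url,
        data=b"<action/>",  # the action takes no parameters
        method="POST",
        headers={
            "Content-Type": "application/xml",
            "Accept": "application/xml",
            "Authorization": f"Basic {token}",
        },
    )

# To send it: urllib.request.urlopen(req, context=<ssl context trusting the engine CA>)
```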
Hi, thanks for the clarification! I'm not personally affected, because my network setup differs, but I can imagine users would be pretty annoyed if they only find out about this workaround in this obscure bug tracker. Maybe add this information to the "official" docs? Just a suggestion, though. Keep up the good work!
Is additional work needed on this RFE? Both posted patches are merged; are more expected?
This is not an RFE, but a buggy remnant of the already-verified RFE 1160667. It is not yet clear to me why an additional Refresh Caps is required to eliminate the out-of-sync status; this requires further research.
INFO: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason: [Open patch attached] For more info please contact: infra
Edy, it still doesn't work on the latest d/s build 4.2.1.1-0.1.el7.
I'm wondering whether your fix wasn't merged into this build, or whether we are still missing a proper fix for this bug.

Test performed:
- Add a host to RHV with its original DNS; the ovirtmgmt network has no DNS settings prior to adding the host.
- Via Networks, edit ovirtmgmt with 2 new name servers.
- The engine shows events saying it is applying the network changes on the host.
- The engine also shows events saying the changes were successfully applied on both hosts.
- On both hosts the change was successfully applied in resolv.conf.
- BUT, on both hosts ovirtmgmt is reported, and remains, out-of-sync.
- In some cases one host managed to get in sync, while the second remained out-of-sync forever. NO SPM host.
- The same applies to the override DNS configuration scenario.

Failing this bug accordingly, sorry.
http://gerrit.ovirt.org/85841 fixes a specific scenario where one defines static DNS entries on Engine and has DHCP enabled in parallel. In such a case, setting a DNS of 8.8.8.8 on Engine may result in the host being configured with 8.8.8.8 plus the entries coming from the DHCP response. So we end up with the DC requiring 8.8.8.8 and the host reporting 8.8.8.8, 1.1.1.1, 2.2.2.2. The fix treats the network as in sync if the DC DNS entries are included in the host's reported entries. This is also the scenario I managed to reproduce on my setup.
If caps is not refreshed, then this is a different problem. Please attach the logs so we can investigate further.
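The difference between a strict comparison and the relaxed inclusion check described above can be sketched as follows. This is an illustration, not the code from gerrit 85841: it assumes the earlier comparison was an exact list match, the function names are hypothetical, and the relaxed check ignores entry order, which the comment does not address.

```python
from typing import Iterable, Sequence


def dns_in_sync_strict(dc_servers: Sequence[str], host_servers: Sequence[str]) -> bool:
    """Hypothetical exact check: the host must report exactly the DC list."""
    return list(dc_servers) == list(host_servers)


def dns_in_sync(dc_servers: Iterable[str], host_servers: Iterable[str]) -> bool:
    """Hypothetical relaxed check: in sync if every DC entry is reported by the host.

    Extra host entries (e.g. ones added by a DHCP response) are tolerated.
    """
    return set(dc_servers).issubset(host_servers)
```

With the example above, the strict check flags DC ['8.8.8.8'] against host ['8.8.8.8', '1.1.1.1', '2.2.2.2'] as out-of-sync, while the relaxed check accepts it. A host that reports only the DHCP entry, without the DC entries, still fails the relaxed check.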
Created attachment 1381316 [details] new logs2
My scenario is the same. Nothing has changed on my side of the reproduction; I follow exactly the same steps.
(In reply to Michael Burman from comment #17)
> Created attachment 1381316 [details]
> new logs2

I see in the VDSM logs that two DNS entries have been set with DHCP enabled, but in caps only the entry from the DHCP response is seen, not the other two. I'm not clear on what exactly happened there, as we use ifcfg files to set the DNS entries. We need to debug this live.
Debugging the issue shows this scenario:
- ifup is issued at 14:15:06,830
- The connectivity check is confirmed at 14:16:01,943
- Only at 14:16:11,302 does the original ifup return.

The DNS entry is updated only after ifup returns, which in this case is about 10 seconds after DHCP answered.
With the ifcfg implementation we have no mechanism that informs us when the DNS is updated, but we could send an event to Engine when the original ifup returns successfully. However, we may then overload Engine with several events (IP received from DHCP, followed by ifup returning), causing frequent caps to be issued. We could try to control this with a backoff mechanism on the Engine side that delays the caps for a few seconds, aggregating multiple events (something that is needed anyway for safety reasons).
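The delay-and-aggregate idea can be sketched as a simple debounce: every host event pushes that host's refresh deadline forward, and a single caps refresh is issued once the host has been quiet for a while. This is a minimal, hypothetical Python sketch (class and method names are illustrative, not Engine code); times are passed in explicitly so the behaviour is easy to follow.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CapsDebouncer:
    """Hypothetical aggregator: coalesce host events into one caps refresh.

    Each event (e.g. 'IP received from DHCP', 'ifup returned') restarts the
    host's quiet-period timer; a refresh is due once `delay` seconds pass
    without further events for that host.
    """
    delay: float = 5.0
    _deadlines: Dict[str, float] = field(default_factory=dict, init=False, repr=False)

    def on_event(self, host: str, now: float) -> None:
        # Restart the quiet-period timer for this host.
        self._deadlines[host] = now + self.delay

    def due(self, now: float) -> List[str]:
        """Return hosts whose quiet period has elapsed; each is returned once."""
        ready = [host for host, deadline in self._deadlines.items() if deadline <= now]
        for host in ready:
            del self._deadlines[host]
        return sorted(ready)
```

Replaying the timeline above (events at about +55.1 s and +64.5 s after ifup was issued) with a 15-second delay yields a single refresh at about +79.5 s, after ifup has returned. A pure debounce can postpone the refresh indefinitely if events keep arriving, so a real implementation would probably also cap the total wait.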
This bug hasn't received attention for a while; we didn't have the capacity to make any progress. If you care deeply about it or want to work on it, please assign/target it accordingly.
We need to reproduce this with nmstate.
We depend on BZ 1812914 and nmstate BZ 1815112. Once they are fixed, this can be retested using nmstate.
This can finally be tested now with the latest nmstate and vdsm:
BZ 1812914 is verified
BZ 1815112 is verified
Moving to ON_QA for testing.
Verified on:
- vdsm-4.40.9-1.el8ev.x86_64 with nmstate-0.2.6-6.el8.noarch
- rhvm-4.4.0-0.29.master.el8ev.noarch
This bugzilla is included in oVirt 4.4.0 release, published on May 20th 2020. Since the problem described in this bug report should be resolved in oVirt 4.4.0 release, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.