Created attachment 859467 [details]
engine vdsm and supervdsm logs

Description of problem:
On 3.4, if a network is attached to the host, updating the network on the DC sends a setupNetworks command to sync the network on the host. If the network is updated more than once too quickly, the second internal setupNetworks fails with the error:
Operation Failed: [Resource unavailable]

Version-Release number of selected component (if applicable):
ovirt-engine-3.4.0-0.5.beta1.el6.noarch

How reproducible:
100%

Steps to Reproduce:
1. Create 2 networks and attach them to the host using setupNetworks
2. Update the first network and immediately update the second one

Actual results:
The second network never gets synced on the host

Expected results:
Both networks should be synced on the host
You might end up with the same result if you execute two consecutive 'setup networks' commands from two different clients, but now, with the multi-host network configuration and network labels features, we are likely to hit it more often. The result of such a failure is a network which is not synced on the host.

This is not a blocking issue:
1. The failure is reported via an event log specifying that the change wasn't configured on the specific host.
2. The user can sync the network via 'setup networks', as was done before when a sync failed.

As for the specific issue: VDSM holds a lock on 'setupNetworks' which rejects any concurrent call with a 'resource unavailable' error. There are a couple of ways to handle it, and it deserves its own thread on engine-devel and vdsm-devel.
Setting target release to current version for consideration and review. Please do not push non-RFE bugs to an undefined target release, so that bugs are reviewed for relevancy, fix, closure, etc.
This is an automated message. Re-targeting all non-blocker bugs still open on 3.4.0 to 3.4.1.
This is an automated message. This Bugzilla report has been opened on a version which is not maintained anymore. Please check if this bug is still relevant in oVirt 3.5.4. If it's not relevant anymore, please close it (you may use the EOL or CURRENT RELEASE resolution). If it's an RFE, please update the version to 4.0 if still relevant.
I managed to reproduce this bug on 3.6.0.1-0.1.el6 with vdsm-4.17.9-1.el7ev.noarch. I have 2 VLAN non-VM networks attached to 1 NIC, and I managed to update them quickly from non-VM to VM networks; the result was that I ended up with 2 unsynced networks.

Property - Bridged
Host - false
DC - true

Attaching vdsm.log
Created attachment 1083241 [details] vdsm log
Target release should be set once a package build is known to fix an issue. Since this bug is not in MODIFIED, the target version has been reset. Please use the target milestone to plan a fix for an oVirt release.
In oVirt testing is done on single stream by default. Therefore I'm removing the 4.0 flag. If you think this bug must be tested in 4.0 as well, please re-add the flag. Please note we might not have testing resources to handle the 4.0 clone.
(In reply to Michael Burman from comment #6)
> Created attachment 1083241 [details]
> vdsm log

#1 This log does not show the tell-tale 'concurrent network verb already executing' error. Hence, I assume that the engine-side protection actually worked.

#2 I do not understand how the error

Thread-10905::DEBUG::2015-10-15 13:41:32,803::__init__::503::jsonrpc.JsonRpcServer::(_serveRequest) Calling 'Host.setupNetworks' in bridge with {u'bondings': {}, u'networks': {u'net-2': {u'nic': u'ens1f0', u'bridged': u'false', u'mtu': u'1500'}}, u'options': {u'connectivityCheck': u'true', u'connectivityTimeout': 120}}
...
ConfigNetworkError: (21, "interface u'ens1f0' cannot be defined with this network since it is already defined with network net-1")

is related to your reproduction.

#3 The patch solves a race between two single-host setupNetworks commands. You are right that it does not solve the reported problem of a DC-level change. We change the DC-level property, then go on to change it on the host(s) to which the network is attached. Any of these host-level commands may fail and leave the host in an unsynced state.
I believe that the problem stems from our multi-host (background) operations. When changing a DC-level property, a background process starts updating all relevant hosts. Another change of a DC-level property would spawn a second background multi-host process that is likely to collide with the first one.

Solving this is not easy. We may want to fail all subsequent multi-host processes while the first one has not finished, but this may harm usability if the first one handles many slow hosts. We could instead block subsequent processes only if they handle the same hosts as a concurrent process. Another idea is to provide a mechanism that reports the current state of multi-host processes, so the user can tell whether it is safe to start a new one.
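The "block only on overlapping hosts" idea above can be sketched as a simple conflict check. This is purely illustrative; the function and its inputs are my own names, not engine code:

```python
def conflicts(new_hosts, running_operations):
    """Return the set of hosts that a new multi-host operation shares
    with any still-running operation. An empty set means the new
    operation can start safely; a non-empty set names the hosts it
    would have to wait for (or fail on).

    new_hosts          -- iterable of host ids targeted by the new operation
    running_operations -- list of host-id sets, one per running operation
    """
    overlap = set()
    for op_hosts in running_operations:
        overlap |= set(new_hosts) & set(op_hosts)
    return overlap
```

Under this policy, two DC-level changes touching disjoint host sets could proceed in parallel, while overlapping ones would be serialized (or rejected) only for the shared hosts.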
Another scenario for this bug -->
1) Attach 3 VLAN networks to a NIC via label
2) Remove the 3 networks from the DC in one action

Result:
- 1 network was removed from the host (the one with the label)
- 2 networks remain as 'unmanaged' networks on the host (the 'remove' setupNetworks command was sent, but the first operation was still busy on vdsm)

jsonrpc.Executor/6::DEBUG::2016-04-06 10:50:44,341::__init__::511::jsonrpc.JsonRpcServer::(_serveRequest) Calling 'Host.setupNetworks' in bridge with {'bondings': {}, 'networks': {'f3': {'remove': 'true'}}, 'options': {'connectivityCheck': 'true', 'connectivityTimeout': 120}}
mailbox.SPMMonitor::DEBUG::2016-04-06 10:50:44,365::storage_mailbox::733::Storage.Misc.excCmd::(_checkForMail) SUCCESS: <err> = '1+0 records in\n1+0 records out\n1024000 bytes (1.0 MB) copied, 0.0191812 s, 53.4 MB/s\n'; <rc> = 0
jsonrpc.Executor/3::DEBUG::2016-04-06 10:50:44,374::__init__::511::jsonrpc.JsonRpcServer::(_serveRequest) Calling 'Host.ping' in bridge with {}
jsonrpc.Executor/3::DEBUG::2016-04-06 10:50:44,375::__init__::539::jsonrpc.JsonRpcServer::(_serveRequest) Return 'Host.ping' in bridge with True
jsonrpc.Executor/5::DEBUG::2016-04-06 10:50:44,689::__init__::511::jsonrpc.JsonRpcServer::(_serveRequest) Calling 'Host.setupNetworks' in bridge with {'bondings': {}, 'networks': {'f2': {'remove': 'true'}}, 'options': {'connectivityCheck': 'true', 'connectivityTimeout': 120}}
jsonrpc.Executor/5::WARNING::2016-04-06 10:50:44,690::API::1459::vds::(setupNetworks) concurrent network verb already executing
jsonrpc.Executor/7::DEBUG::2016-04-06 10:50:44,693::__init__::511::jsonrpc.JsonRpcServer::(_serveRequest) Calling 'Host.ping' in bridge with {}
jsonrpc.Executor/7::DEBUG::2016-04-06 10:50:44,693::__init__::539::jsonrpc.JsonRpcServer::(_serveRequest) Return 'Host.ping' in bridge with True
jsonrpc.Executor/1::DEBUG::2016-04-06 10:50:45,027::__init__::511::jsonrpc.JsonRpcServer::(_serveRequest) Calling 'Host.setupNetworks' in bridge with {'bondings': {}, 'networks': {'f4': {'remove': 'true'}}, 'options': {'connectivityCheck': 'true', 'connectivityTimeout': 120}}

Attaching engine log

Tested on 4.0.0-0.0.master.20160404161620.git4ffd5a4.el7.centos and vdsm-4.17.999-879.git565cb2e.el7.centos.noarch
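One possible mitigation for the pattern in this log (a sketch of mine, not the fix that was eventually merged): instead of abandoning a setupNetworks request when VDSM answers "concurrent network verb already executing", the caller could retry with a short backoff until the lock holder finishes:

```python
import time

def call_with_retry(verb, attempts=3, delay=1.0):
    """Invoke a host verb, retrying while the host reports a concurrent
    network verb. `verb` is any callable; a busy host is assumed to
    raise RuntimeError('concurrent network verb already executing').
    Other errors, and the final failed attempt, propagate unchanged."""
    for i in range(attempts):
        try:
            return verb()
        except RuntimeError as e:
            if "concurrent network verb" not in str(e) or i == attempts - 1:
                raise
            time.sleep(delay)  # wait for the current verb to release the lock
```

With something like this, the second and third 'remove' requests in the log above would have been retried instead of leaving f2 and f4 as unmanaged networks; the trade-off is that retries hold the caller longer and can still time out on a genuinely stuck host.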
Created attachment 1144119 [details]
new engine log
Moving from 4.0 alpha to 4.0 beta since 4.0 alpha has been already released and bug is not ON_QA.
oVirt 4.0 beta has been released, moving to RC milestone.
Hi. I think I'm hitting this bug when deleting multiple networks in a batch. I delete, for example, 10 networks, but only 4 get deleted and the other 6 remain as "unmanaged". Since this hasn't been solved for a long time, do you know of any workarounds that I can apply? Is there any way of deleting "all" the unmanaged networks on a host, or something similar? Thanks!
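As a possible workaround (my suggestion, not a confirmed fix for this bug): the oVirt REST API exposes a per-host sync action that re-applies the engine's view of the host's networks, which should clear leftover unmanaged networks. The endpoint name and the engine URL below are assumptions; verify them against your engine's API documentation before use:

```python
def sync_all_networks_url(api_base, host_id):
    """Build the assumed 'syncallnetworks' action URL for a host.
    `api_base` is the engine API root, e.g.
    'https://engine.example.com/ovirt-engine/api' (placeholder)."""
    return "%s/hosts/%s/syncallnetworks" % (api_base.rstrip("/"), host_id)

# Example invocation (not executed here; credentials and host id are
# placeholders, and the action body format depends on your API version):
# import requests
# requests.post(sync_all_networks_url(API_BASE, HOST_ID),
#               auth=(USERNAME, PASSWORD),
#               headers={"Content-Type": "application/xml"},
#               data="<action/>")
```

The same sync is also available per host from the Administration Portal's network setup, which may be the quicker route if only a few hosts are affected.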
Eitan please add all relevant patches that you already merged for this bug report. Thanks)
(In reply to Michael Burman from comment #17) > Eitan please add all relevant patches that you already merged for this bug > report. Thanks) done
The expected behavior after the fix is to avoid collision messages in the engine when performing multiple host network requests, and to avoid failures if such a collision happens. We shouldn't see the "Can't perform setup networks because another setup networks is running" error in the engine.
We haven't seen these collisions for a long time now. Looks good. We will report a new bug if we see this or something similar again.

Verified on 4.4.0-0.27.master.el8ev with:
vdsm-4.40.7-1.el8ev.x86_64
nmstate-0.2.6-4.el8.noarch
This bugzilla is included in oVirt 4.4.0 release, published on May 20th 2020. Since the problem described in this bug report should be resolved in oVirt 4.4.0 release, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.