Description of problem:
If a user tries to attach a large number of networks, the operation takes a long time and a timeout exception is raised.
Version-Release number of selected component (if applicable):
rhev-hypervisor6-6.7-20150828.0.iso
How reproducible:
100%
Steps to Reproduce:
1. Attach 60+ networks
Actual results:
The action fails with a timeout on the engine side.
Additional info:
ovirt_store_config_retnum is called once for every file, and persisting 10 files this way takes 22 sec. Running the function once with the file list as an argument takes 2 sec, so the performance degradation is quite significant.
See also: https://bugzilla.redhat.com/show_bug.cgi?id=1193083 (a more general problem, probably not related to ovirt-node)
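To illustrate the Additional info above: the measurements suggest that most of the cost is a fixed overhead per invocation rather than per file. The following is a minimal Python sketch of the two call patterns, not the actual ovirt-node code (ovirt_store_config_retnum is a shell function). The names `persist_each`, `persist_batch`, `estimated_seconds` and the `persist` callback are hypothetical, and the 2.2 sec per-call overhead used below is only inferred from the 22 sec / 10 files figure.

```python
from typing import Callable, Iterable, List

# Hypothetical callback type: takes a list of paths and persists them.
Persist = Callable[[List[str]], None]


def persist_each(files: Iterable[str], persist: Persist) -> int:
    """Invoke persist once per file; return the number of invocations."""
    calls = 0
    for path in files:
        persist([path])
        calls += 1
    return calls


def persist_batch(files: Iterable[str], persist: Persist) -> int:
    """Invoke persist once with the whole list; return the number of invocations."""
    batch = list(files)
    if not batch:
        return 0
    persist(batch)
    return 1


def estimated_seconds(calls: int, overhead_per_call: float) -> float:
    """Rough cost model: total time is dominated by the per-call overhead."""
    return calls * overhead_per_call
```

Under this rough model, 10 per-file calls at about 2.2 sec each give about 22 sec, while one batched call costs about one overhead, which is in line with the 2 sec measured above.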
Thanks for the patch! The patch alone should not bring any improvement, because it requires the caller to call this function differently. Pavel, were any other changes performed to fix the issue?
(In reply to Fabian Deutsch from comment #1)
> Thanks for the patch!
>
> The patch alone should not bring any improvement, because it requires the
> caller to call this function differently.
> Pavel, were any other changes performed to fix the issue?
As discussed on IRC, I tested and verified the patch on a 3.5 installation. Since 3.6, a list of files is no longer accepted. I will rework it.
As the customer has closed the case I'm pushing it out to 4.0. The reasons supporting this are that a fix would change the semantics of the involved functions, and this is not something we can do in a z-stream update.
oVirt 4.0 Alpha has been released, moving to oVirt 4.0 Beta target.
Hi Pavel, Since Virt-QE has no such environment to test this issue, would you mind verifying it once it is fixed? Thanks! Hui Wang
(In reply to wanghui from comment #10)
> Hi Pavel,
>
> Since Virt-QE has no such environment to test this issue, would you mind
> verifying it once it is fixed?
>
> Thanks!
> Hui Wang
Sorry, but which environment do you need?
1) Add a host.
2) Create 100 networks (they can be tagged with non-existing VLAN tags, or with any VLAN tags if the NIC (see 4) is disconnected). This can be scripted.
3) Open "setup host networks".
4) Attach the networks to one NIC (it can be a disconnected one).
5) Wait until the networks are attached (verified) or a timeout occurs (Failed_QA).
That's it.
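Since step 2 can be scripted, here is a minimal Python sketch that only generates the network definitions (unique names with distinct VLAN tags). It is an illustration under assumptions: `make_vlan_networks`, the `scale_net` prefix and the starting tag 100 are hypothetical, and actually creating the networks on the engine (through the REST API or an SDK) is deliberately left out.

```python
from typing import Dict, List, Union

MAX_VLAN_ID = 4094  # highest usable 802.1Q VLAN ID


def make_vlan_networks(
    count: int, first_vlan: int = 100, prefix: str = "scale_net"
) -> List[Dict[str, Union[str, int]]]:
    """Return `count` network definitions with consecutive VLAN tags.

    The naming scheme and default values are assumptions for this sketch.
    """
    if count < 0:
        raise ValueError("count must not be negative")
    if first_vlan < 1 or first_vlan + count - 1 > MAX_VLAN_ID:
        raise ValueError("VLAN tags must stay within 1..%d" % MAX_VLAN_ID)
    return [
        {"name": "%s_%d" % (prefix, vlan), "vlan_id": vlan}
        for vlan in range(first_vlan, first_vlan + count)
    ]
```

For example, `make_vlan_networks(100)` yields 100 definitions tagged 100 through 199, which could then be fed to whatever tool creates the logical networks.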
Moving the needinfo to myself and setting myself as QA contact as well. I tested this bug following the steps in comment 11 with the latest released RHEV-H el7 3.6 and engine 3.6 builds, using more than 60 logical networks tagged with non-existing VLAN tags or arbitrary VLAN tags, but I could not reproduce the issue. In my test the networks were attached to one NIC successfully and no timeout occurred. I will do more checking and testing on this bug next week.
Update the bug status to MODIFIED until 3-ack+.
Actually, I did some testing on June 20, but I am not sure I can reproduce this issue 100% on RHEV 3.5.z. I tried 6 times to attach one logical network to one NIC: 4 times the network was attached successfully, and 2 times the attempt failed on a timeout. This timeout can only be seen in engine.log, _not_ in the WebUI.
Test version:
engine: 3.5.7-0.1
rhev-hypervisor: 6.7 (20151015.1.el6ev)
vdsm-4.16.27-1.el6ev.x86_64
Steps:
1. Installed RHEV-H 6.7 successfully.
2. Added RHEV-H to Engine 3.5.z (3.5 cluster).
3. RHEV-H is up on Engine.
4. Created 80+ logical networks tagged with non-existing VLAN tags.
5. Opened "setup host networks".
6. Attached the networks to one NIC.
7. Waited for the networks to be attached.
In the WebUI, the network was attached successfully all 6 times and no timeout was shown; the timeout error messages appeared only in engine.log, 2 times.
# tail -f engine.log
<snip>
2016-06-20 14:35:20,240 WARN [org.ovirt.vdsm.jsonrpc.client.internal.ResponseWorker] (ResponseWorker) Exception thrown during message processing
2016-06-20 14:35:20,240 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.SetSafeNetworkConfigVDSCommand] (ajp-/127.0.0.1:8702-8) [2b98bd51] Command org.ovirt.engine.core.vdsbroker.vdsbroker.SetSafeNetworkConfigVDSCommand return value StatusOnlyReturnForXmlRpc [mStatus=StatusForXmlRpc [mCode=5022, mMessage=Message timeout which can be caused by communication issues]]
2016-06-20 14:35:20,241 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.SetSafeNetworkConfigVDSCommand] (ajp-/127.0.0.1:8702-8) [2b98bd51] HostName = hp-dl385pg8-11.lab.eng.pek2.redhat.com
2016-06-20 14:35:20,241 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.SetSafeNetworkConfigVDSCommand] (ajp-/127.0.0.1:8702-8) [2b98bd51] Command SetSafeNetworkConfigVDSCommand(HostName = hp-dl385pg8-11.lab.eng.pek2.redhat.com, HostId = 7b6cff78-4cfb-47a0-92d0-125b89ea3266) execution failed.
Exception: VDSNetworkException: VDSGenericException: VDSNetworkException: Message timeout which can be caused by communication issues
2016-06-20 14:35:20,242 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.SetSafeNetworkConfigVDSCommand] (ajp-/127.0.0.1:8702-8) [2b98bd51] FINISH, SetSafeNetworkConfigVDSCommand, log id: 10bf2be5
2016-06-20 14:35:20,242 ERROR [org.ovirt.engine.core.bll.network.host.CommitNetworkChangesCommand] (ajp-/127.0.0.1:8702-8) [2b98bd51] Command org.ovirt.engine.core.bll.network.host.CommitNetworkChangesCommand throw Vdc Bll exception. With error message VdcBLLException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException: VDSGenericException: VDSNetworkException: Message timeout which can be caused by communication issues (Failed with error VDS_NETWORK_ERROR and code 5022)
</snip>
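Because the timeout shows up only in engine.log and not in the WebUI, a small script can help check each test run. The following is a minimal Python sketch, assuming the line layout seen in the excerpt above (timestamp, level, bracketed class name); `find_timeouts`, `LINE_RE` and `TIMEOUT_MARKER` are hypothetical names, and continuation lines without a timestamp are skipped.

```python
import re
from typing import Iterable, List, Tuple

# Layout assumed from the excerpt: "<date> <time>,<ms> <LEVEL> [<class>] ..."
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) (\w+)\s+\[([^\]]+)\]"
)
TIMEOUT_MARKER = "Message timeout which can be caused by communication issues"


def find_timeouts(lines: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Return (timestamp, level, short class name) for each timeout entry."""
    hits = []
    for line in lines:
        if TIMEOUT_MARKER not in line:
            continue
        match = LINE_RE.match(line)
        if match is None:
            continue  # continuation line without its own timestamp
        short_name = match.group(3).rsplit(".", 1)[-1]
        hits.append((match.group(1), match.group(2), short_name))
    return hits
```

It could be run against the log with something like `with open("engine.log") as f: print(find_timeouts(f))`; the path is an assumption and depends on the engine installation.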
(In reply to Ying Cui from comment #14)
> Actually, I did some testing on June 20, but I am not sure I can reproduce
> this issue 100% on RHEV 3.5.z. I tried 6 times to attach one logical network
> to one NIC: 4 times the network was attached successfully, and 2 times the
> attempt failed on a timeout. This timeout can only be seen in engine.log,
> _not_ in the WebUI.
>
Rephrasing here to avoid confusion: I tried 6 times to attach one logical network to one NIC, and all 6 attempts passed in the WebUI. The timeout error messages were generated in engine.log 2 times, but nothing failed in the WebUI.
Ryan, could you check this bug? First, I am not sure I can reproduce this issue on Node; see comment 14 and comment 15. Second, the persistence issue does not exist from RHVH 4.0 onward, so we don't need any fix on Node 4.0, and customer ticket 01561344 is already closed. Third, the main vdsm bug 1193083 is targeted to ovirt-4.1 and will not be fixed in RHV 4.0 GA. So can we close this bug in the next release, or does it depend on vdsm Bug 1193083 - [scale] Network: RHEV fails to apply high number of networks on a new hypervisor: Timeout during xml-rpc call?
Since the customer case is closed and the fix should be inherited by NGN, I'm closing this as a duplicate.
*** This bug has been marked as a duplicate of bug 1193083 ***