Bug 1296141 - Timeout if a huge number of networks is being created
Summary: Timeout if a huge number of networks is being created
Keywords:
Status: CLOSED DUPLICATE of bug 1193083
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-node-ng
Version: 3.5.5
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ovirt-4.0.1
Target Release: 4.0.0
Assignee: Fabian Deutsch
QA Contact: Ying Cui
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-01-06 12:26 UTC by Pavel Zhukov
Modified: 2020-09-10 09:30 UTC (History)
14 users

Fixed In Version: ovirt-node-ng-installer-master-2016051700.iso
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-07-11 13:43:34 UTC
oVirt Team: Node
Target Upstream Version:
Embargoed:
lsvaty: testing_plan_complete-


Attachments


Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 51422 0 None ABANDONED Do not call ovirt_store_config_retnum for every file to avoid performance degradation 2020-11-13 17:41:22 UTC

Description Pavel Zhukov 2016-01-06 12:26:57 UTC
Description of problem:
If a user tries to add a large number of networks, the operation takes a long time and a timeout exception is raised.

Version-Release number of selected component (if applicable):
rhev-hypervisor6-6.7-20150828.0.iso

How reproducible:
100%

Steps to Reproduce:
1. Attach 60+ networks

Actual results:
Action failed. With timeout on engine side

Additional info:
ovirt_store_config_retnum is called once for every file, and persisting 10 files this way takes 22 seconds. Running the function once with the whole file list as an argument takes 2 seconds, so the performance degradation is quite significant.
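The overhead pattern described above can be illustrated with a toy model (the function names echo ovirt_store_config_*, but the implementation and cost figures are hypothetical, chosen only to match the 22s-vs-2s numbers reported in this bug): each persist call pays a fixed setup cost, so calling it once per file pays that cost N times, while one call with the full list pays it once.

```python
# Toy model of the overhead described in this bug: calling a persist
# function once per file pays its fixed setup cost N times, while
# passing the whole file list pays it once. Names and costs are
# hypothetical, tuned to mirror the reported 22s vs 2s timings.

def persist(files, setup_cost=2.2, per_file_cost=0.01):
    """Pretend persist call: fixed setup cost plus a small per-file cost.
    Returns the simulated wall-clock seconds spent."""
    return setup_cost + per_file_cost * len(files)

def persist_one_by_one(files):
    # Pattern before the fix: one persist call per file.
    return sum(persist([f]) for f in files)

def persist_batched(files):
    # Pattern proposed in gerrit 51422: one call with the full list.
    return persist(files)

files = ["/etc/sysconfig/network-scripts/ifcfg-net%d" % i for i in range(10)]
slow = persist_one_by_one(files)   # ~22.1 simulated seconds
fast = persist_batched(files)      # ~2.3 simulated seconds
print("per-file: %.1fs, batched: %.1fs" % (slow, fast))
```

The per-file variant scales linearly in the fixed setup cost, which is why attaching 60+ networks (each persisting several config files) pushed the operation past the engine-side timeout.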


See also: https://bugzilla.redhat.com/show_bug.cgi?id=1193083 (a more general problem, probably not specific to ovirt-node)

Comment 1 Fabian Deutsch 2016-01-13 16:10:32 UTC
Thanks for the patch!

The patch alone should not yield any improvement, because it requires the caller to invoke the function differently.
Pavel, were any other changes made to fix the issue?

Comment 2 Pavel Zhukov 2016-01-14 10:27:10 UTC
(In reply to Fabian Deutsch from comment #1)
> Thanks for the patch!
> 
> The patch alone should not yield any improvement, because it requires the
> caller to invoke the function differently.
> Pavel, were any other changes made to fix the issue?

As discussed on IRC, I tested and verified the patch on a 3.5 installation. Since 3.6, a list of files is no longer accepted.
I will rework it.

Comment 6 Fabian Deutsch 2016-04-19 09:25:50 UTC
As the customer has closed the case I'm pushing it out to 4.0.

The reasons supporting this are that a fix would change the semantics of the involved functions, and this is not something we can do in a z-stream update.

Comment 7 Yaniv Lavi 2016-05-09 11:02:09 UTC
oVirt 4.0 Alpha has been released, moving to oVirt 4.0 Beta target.

Comment 10 wanghui 2016-05-18 03:35:38 UTC
Hi Pavel,

Since Virt-QE has no environment in which to test this issue, would you mind verifying it once it is fixed?

Thanks!
Hui Wang

Comment 11 Pavel Zhukov 2016-05-23 07:13:11 UTC
(In reply to wanghui from comment #10)
> Hi Pavel,
> 
> Since Virt-QE has no environment in which to test this issue, would you mind
> verifying it once it is fixed?
> 
> Thanks!
> Hui Wang

Sorry, but which environment do you need?
1) Add a host
2) Create 100 networks (they can be tagged with non-existent VLAN tags, or any VLAN tags if the NIC from step 4 is disconnected). This can be scripted.
3) "Setup host networks"
4) Attach the networks to one NIC (it can be a disconnected one)
5) Wait for the networks to be attached (verified) or for a timeout to occur (Failed_QA). That's it.
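Step 2 can be scripted against the oVirt REST API. The sketch below only builds the request payloads; the endpoint path, credentials, data-center name, and VLAN id range are assumptions for illustration, not taken from this bug, and the actual POST is left commented out.

```python
# Sketch of scripting step 2: build XML payloads for 100 VLAN-tagged
# logical networks to POST to the oVirt REST API networks collection.
# The endpoint (/ovirt-engine/api/networks), credentials, data-center
# name, and VLAN range 2000-2099 are illustrative assumptions.

def network_payload(name, vlan_tag, datacenter="Default"):
    """XML body for one logical network with a (possibly non-existent) VLAN tag."""
    return (
        "<network>"
        "<name>%s</name>"
        "<data_center><name>%s</name></data_center>"
        "<vlan id=\"%d\"/>"
        "</network>" % (name, datacenter, vlan_tag)
    )

# 100 networks tagged with VLAN ids 2000-2099 (assumed unused on the host).
payloads = [network_payload("scale_net_%02d" % i, 2000 + i) for i in range(100)]

# Hypothetical POST loop (requires the requests library and a real engine):
# for body in payloads:
#     requests.post("https://engine.example.com/ovirt-engine/api/networks",
#                   data=body, headers={"Content-Type": "application/xml"},
#                   auth=("admin@internal", "password"), verify=False)

print(len(payloads))
```

With the networks created, steps 3-5 (attaching them all to one NIC via "Setup host networks") exercise the bulk-persist path where the timeout was observed.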

Comment 12 Ying Cui 2016-06-08 16:35:50 UTC
Updating the needinfo to myself, and updating the QA contact to me as well.

I tested this bug following the comment 11 steps with the latest released RHEV-H el7 3.6 and engine 3.6 builds, using more than 60 logical networks with non-existent or arbitrary VLAN tags, but I could not reproduce this issue. In my test the networks attached to one NIC successfully and I did not encounter the timeout.

I will do more checking and testing on this bug next week.

Comment 13 Ying Cui 2016-06-12 12:45:25 UTC
Update the bug status to MODIFIED until 3-ack+.

Comment 14 Ying Cui 2016-07-11 09:26:04 UTC
Actually, I did some testing on June 20, but I am not sure I can reproduce this issue 100% on RHEV 3.5.z. I tested attaching one logical network to one NIC 6 times: 4 times the network attached successfully, but 2 times it failed with a timeout. The timeout could only be seen in engine.log, _not_ in the web UI.

Test version:
engine:3.5.7-0.1
rhev-hypervisor: 6.7 (20151015.1.el6ev)
vdsm-4.16.27-1.el6ev.x86_64

Steps:
1. Installed RHEV-H 6.7 successfully.
2. Added the RHEV-H host to Engine 3.5.z (3.5 cluster).
3. The RHEV-H host is up on the Engine.
4. Created 80+ logical networks tagged with non-existent VLAN tags.
5. "Setup host networks"
6. Attached the networks to one NIC.
7. Waited for the networks to be attached.

In the web UI, the networks attached successfully all 6 times and no timeout was shown; the timeout error messages appeared only in engine.log, 2 times.

# tail -f engine.log
<snip>
2016-06-20 14:35:20,240 WARN  [org.ovirt.vdsm.jsonrpc.client.internal.ResponseWorker] (ResponseWorker) Exception thrown during message processing
2016-06-20 14:35:20,240 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.SetSafeNetworkConfigVDSCommand] (ajp-/127.0.0.1:8702-8) [2b98bd51] Command org.ovirt.engine.core.vdsbroker.vdsbroker.SetSafeNetworkConfigVDSCommand return value
 StatusOnlyReturnForXmlRpc [mStatus=StatusForXmlRpc [mCode=5022, mMessage=Message timeout which can be caused by communication issues]]
2016-06-20 14:35:20,241 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.SetSafeNetworkConfigVDSCommand] (ajp-/127.0.0.1:8702-8) [2b98bd51] HostName = hp-dl385pg8-11.lab.eng.pek2.redhat.com
2016-06-20 14:35:20,241 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.SetSafeNetworkConfigVDSCommand] (ajp-/127.0.0.1:8702-8) [2b98bd51] Command SetSafeNetworkConfigVDSCommand(HostName = hp-dl385pg8-11.lab.eng.pek2.redhat.com, HostId = 7b6cff78-4cfb-47a0-92d0-125b89ea3266) execution failed. Exception: VDSNetworkException: VDSGenericException: VDSNetworkException: Message timeout which can be caused by communication issues
2016-06-20 14:35:20,242 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.SetSafeNetworkConfigVDSCommand] (ajp-/127.0.0.1:8702-8) [2b98bd51] FINISH, SetSafeNetworkConfigVDSCommand, log id: 10bf2be5
2016-06-20 14:35:20,242 ERROR [org.ovirt.engine.core.bll.network.host.CommitNetworkChangesCommand] (ajp-/127.0.0.1:8702-8) [2b98bd51] Command org.ovirt.engine.core.bll.network.host.CommitNetworkChangesCommand throw Vdc Bll exception. With error message VdcBLLException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException: VDSGenericException: VDSNetworkException: Message timeout which can be caused by communication issues (Failed with error VDS_NETWORK_ERROR and code 5022)
</snip>

Comment 15 Ying Cui 2016-07-11 09:31:40 UTC
(In reply to Ying Cui from comment #14)
> Actually, I did some testing on June 20, but I am not sure I can reproduce
> this issue 100% on RHEV 3.5.z. I tested attaching one logical network to one
> NIC 6 times: 4 times the network attached successfully, but 2 times it failed
> with a timeout. The timeout could only be seen in engine.log, _not_ the web UI.
> 

Rephrasing here to avoid confusion.

I tested attaching one logical network to one NIC 6 times; all 6 passed in the web UI. The timeout error messages were generated in engine.log 2 times, but no failure was shown in the web UI.

Comment 16 Ying Cui 2016-07-11 10:05:56 UTC
Ryan, could you check this bug?
First, I am not sure I can reproduce this issue on Node; see comment 14 and comment 15.
Second, as of RHVH 4.0 the persistence issue no longer exists, so no fix is needed on Node 4.0, and customer ticket 01561344 is already closed.
Third, the main vdsm bug 1193083 is targeted at ovirt-4.1 and will not be fixed in RHV 4.0 GA.

So we can either close this bug in the next release or make it depend on vdsm Bug 1193083 - [scale] Network: RHEV fails to apply high number of networks on a new hypervisor: Timeout during xml-rpc call.

Comment 17 Ryan Barry 2016-07-11 13:43:34 UTC
Since the customer case is closed, I'm closing this as a duplicate; the fix will be inherited by NGN.

*** This bug has been marked as a duplicate of bug 1193083 ***

