Bug 1340234 - Save network configuration takes more than 3 minute makes hypervisor non responsive in RHEV-M
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 3.6.5
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: ovirt-4.0.0-rc
Assignee: Edward Haas
QA Contact: Michael Burman
URL:
Whiteboard:
Depends On:
Blocks: 1349029
Reported: 2016-05-26 19:22 UTC by nijin ashok
Modified: 2019-11-14 08:11 UTC (History)

Fixed In Version: v4.17.31
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1349029 (view as bug list)
Environment:
Last Closed: 2016-08-23 20:16:16 UTC
oVirt Team: Network
Target Upstream Version:




Links
System | ID | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHEA-2016:1671 | normal | SHIPPED_LIVE | VDSM 4.0 GA bug fix and enhancement update | 2016-09-02 21:32:03 UTC
oVirt gerrit | 58221 | ovirt-3.5 | ABANDONED | Revert "net: always persist owned ifcfg files on ovirt node" | 2016-07-26 07:53:55 UTC
oVirt gerrit | 58298 | master | MERGED | Revert "net: always persist owned ifcfg files on ovirt node" | 2016-05-31 14:36:17 UTC
oVirt gerrit | 58299 | ovirt-3.6 | MERGED | Revert "net: always persist owned ifcfg files on ovirt node" | 2016-06-02 13:08:00 UTC

Description nijin ashok 2016-05-26 19:22:28 UTC
Description of problem:

On a RHEV-H hypervisor, "save network configuration" takes more than 3 minutes when a large number of logical networks is assigned to the host, and this causes the hypervisor to become non-responsive. This is observed only on RHEV-H, not on RHEL-H.

In the customer environment, Host.setSafeNetworkConfig takes about 4 minutes. The customer has 24 bridge networks. Any minor change, such as removing a single VLAN from the hypervisor, takes more than 3 minutes, which causes the hypervisor to become non-responsive and results in the VMs being migrated to other hypervisors.


Version-Release number of selected component (if applicable):
Red Hat Enterprise Virtualization Hypervisor release 7.2 (20160413.0.el7ev)
vdsm-4.17.26-0.el7ev.noarch

How reproducible:
100%

Steps to Reproduce:

1. Add more than 25 VLANs to a hypervisor.

2. Make any minor change, such as removing a logical network.

3. The hypervisor becomes non-responsive during the "save network configuration" process.

Actual results:

"save network configuration" makes the hypervisor non-responsive.

Expected results:

"save network configuration" should complete without making the hypervisor non-responsive.

Additional info:

Comment 2 nijin ashok 2016-05-26 19:23:17 UTC
The delay seems to come from here:

node_persist_owned_ifcfgs() {
    for f in $(find "$NET_CONF_DIR" -type f); do
        if grep -q "# Generated by VDSM version" "$f"; then
            ovirt_store_config "$f"
        fi
    done
}

ovirt_store_config() {
    for p in "$@"; do
        python <<EOP
from ovirtnode.ovirtfunctions import ovirt_store_config_retnum
ovirt_store_config_retnum("$p")

ovirtfunctions.py is loaded separately for each ifcfg file, and each load seems to take more than 4 seconds per iteration.

===
time python /usr/lib/python2.7/site-packages/ovirtnode/ovirtfunctions.py

real	0m4.041s
user	0m3.587s
sys	0m0.178s
===
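To make the cost pattern concrete, here is a minimal sketch that contrasts one interpreter launch per file (as in the loop quoted above) with a single launch that receives every path. It is an illustration only, not the change that was merged: the linked gerrit patches revert the earlier commit instead. CHILD_CODE, the function names and the file names are hypothetical stand-ins, and the real script imports ovirtnode.ovirtfunctions, which is not assumed to be installed here.

```python
import subprocess
import sys

# Stand-in for the real per-file work (assumption): the child only reports
# how many paths it received on its command line.
CHILD_CODE = "import sys; print(len(sys.argv) - 1)"


def persist_one_process_per_file(paths):
    """Mirror the quoted loop: start a fresh interpreter for every path."""
    launches = 0
    for path in paths:
        subprocess.check_output([sys.executable, "-c", CHILD_CODE, path])
        launches += 1
    return launches


def persist_single_process(paths):
    """Hypothetical alternative: hand every path to one interpreter."""
    out = subprocess.check_output([sys.executable, "-c", CHILD_CODE] + list(paths))
    return 1, int(out.decode().strip())


if __name__ == "__main__":
    files = ["ifcfg-a", "ifcfg-b", "ifcfg-c"]  # hypothetical file names
    print("launches, per-file loop:", persist_one_process_per_file(files))
    print("launches, single process:", persist_single_process(files)[0])
```

With the per-file loop, any fixed start-up cost (such as the roughly 4 seconds measured above) is paid once per ifcfg file rather than once per save.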

The customer has 52 ifcfg files to persist.

grep -ir "Generated by VDSM version" etc/sysconfig/network-scripts/|wc -l
52
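As a rough cross-check, multiplying the two figures quoted in this comment gives a total close to the "about 4 minutes" reported for the customer. This is a back-of-the-envelope sketch: it assumes every file pays the same load time that was measured on one machine, which may not hold on the customer's host. The function name is illustrative.

```python
def estimated_persist_seconds(file_count, load_seconds):
    """Total time if every file pays one full module load."""
    return file_count * load_seconds


# Figures from this comment: 52 ifcfg files, 4.041 s "real" time per load.
total = estimated_persist_seconds(52, 4.041)
print("%.1f s (about %.1f minutes)" % (total, total / 60.0))
# prints: 210.1 s (about 3.5 minutes)
```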

Comment 3 Dan Kenigsberg 2016-05-29 13:12:24 UTC
Could you attach supervdsm.log to BZ?

This bug might have been introduced by https://gerrit.ovirt.org/#/c/44929/3/vdsm/network/configurators/ifcfg.py which was required to solve bug 1252268.

Comment 5 Dan Kenigsberg 2016-05-29 13:27:21 UTC
Actually, it is more likely that it's due to https://gerrit.ovirt.org/#/q/Ibc717b86194a32c050d346e235a5c35fd318e1ff - one of the many patches done to solve bug 1203422.

Comment 7 Dan Kenigsberg 2016-05-31 14:39:11 UTC
> Dan, am i right about the patch number?

yes you are.

Comment 10 nijin ashok 2016-06-01 23:33:48 UTC
I am checking with the customer whether they can test this.

Meanwhile, I tried it in my test environment with 40 logical networks.

Before the patch, it took more than 3 minutes.

jsonrpc.Executor/0::DEBUG::2016-06-01 16:13:00,173::__init__::503::jsonrpc.JsonRpcServer::(_serveRequest) Calling 'Host.setSafeNetworkConfig' in bridge with {}
jsonrpc.Executor/0::DEBUG::2016-06-01 16:16:25,880::__init__::533::jsonrpc.JsonRpcServer::(_serveRequest) Return 'Host.setSafeNetworkConfig' in bridge with True


After applying the patch, the process finished within a few milliseconds!

jsonrpc.Executor/1::DEBUG::2016-06-01 16:23:00,187::__init__::503::jsonrpc.JsonRpcServer::(_serveRequest) Calling 'Host.setSafeNetworkConfig' in bridge with {}
jsonrpc.Executor/1::DEBUG::2016-06-01 16:23:00,228::__init__::533::jsonrpc.JsonRpcServer::(_serveRequest) Return 'Host.setSafeNetworkConfig' in bridge with True

Comment 11 nijin ashok 2016-06-01 23:46:15 UTC
Sorry, there was an error when I copied the file. The correct result after applying the patch in my test environment is:

jsonrpc.Executor/5::DEBUG::2016-06-01 16:36:00,562::__init__::503::jsonrpc.JsonRpcServer::(_serveRequest) Calling 'Host.setSafeNetworkConfig' in bridge with {}
jsonrpc.Executor/5::DEBUG::2016-06-01 16:36:03,222::__init__::533::jsonrpc.JsonRpcServer::(_serveRequest) Return 'Host.setSafeNetworkConfig' in bridge with True

 
supervdsm log

MainProcess|jsonrpc.Executor/5::DEBUG::2016-06-01 16:36:00,565::utils::671::root::(execCmd) /usr/bin/taskset --cpu-list 0-1 /usr/share/vdsm/vdsm-store-net-config unified (cwd None)
MainProcess|jsonrpc.Executor/5::DEBUG::2016-06-01 16:36:03,221::utils::689::root::(execCmd) SUCCESS: <err> = ''; <rc> = 0
MainProcess|jsonrpc.Executor/5::DEBUG::2016-06-01 16:36:03,222::supervdsmServer::123::SuperVdsm.ServerCallback::(wrapper) return setSafeNetworkConfig with None

So it now takes only about 3 seconds to complete.
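The durations can be read straight off the log timestamps. The sketch below assumes the '::'-separated layout seen in the lines quoted in this report (thread, level, timestamp, ...) and uses the "Calling"/"Return" pairs from the unpatched run and from the corrected patched run. The helper names are illustrative and are not part of vdsm.

```python
from datetime import datetime


def parse_timestamp(line):
    """Return the timestamp held in the third '::'-separated field."""
    stamp = line.split("::")[2]
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S,%f")


def call_duration(call_line, return_line):
    """Seconds elapsed between a 'Calling' line and its 'Return' line."""
    delta = parse_timestamp(return_line) - parse_timestamp(call_line)
    return delta.total_seconds()


BEFORE = (
    "jsonrpc.Executor/0::DEBUG::2016-06-01 16:13:00,173::__init__::503::"
    "jsonrpc.JsonRpcServer::(_serveRequest) Calling 'Host.setSafeNetworkConfig' in bridge with {}",
    "jsonrpc.Executor/0::DEBUG::2016-06-01 16:16:25,880::__init__::533::"
    "jsonrpc.JsonRpcServer::(_serveRequest) Return 'Host.setSafeNetworkConfig' in bridge with True",
)
AFTER = (
    "jsonrpc.Executor/5::DEBUG::2016-06-01 16:36:00,562::__init__::503::"
    "jsonrpc.JsonRpcServer::(_serveRequest) Calling 'Host.setSafeNetworkConfig' in bridge with {}",
    "jsonrpc.Executor/5::DEBUG::2016-06-01 16:36:03,222::__init__::533::"
    "jsonrpc.JsonRpcServer::(_serveRequest) Return 'Host.setSafeNetworkConfig' in bridge with True",
)

print("before patch: %.3f s" % call_duration(*BEFORE))  # before patch: 205.707 s
print("after patch:  %.3f s" % call_duration(*AFTER))   # after patch:  2.660 s
```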

Comment 17 errata-xmlrpc 2016-08-23 20:16:16 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-1671.html

