Description of problem:
On a RHEV-H hypervisor, "save network configuration" takes more than 3 minutes when a large number of logical networks is assigned to the host, and the hypervisor goes non-responsive as a result. This is observed only on RHEV-H, not on RHEL-H.

In the customer environment, Host.setSafeNetworkConfig takes about 4 minutes. The customer has 24 bridge networks. Any minor change, such as removing a single VLAN from the hypervisor, takes more than 3 minutes, which causes the hypervisor to go non-responsive and results in the VMs being migrated to another hypervisor.

Version-Release number of selected component (if applicable):
Red Hat Enterprise Virtualization Hypervisor release 7.2 (20160413.0.el7ev)
vdsm-4.17.26-0.el7ev.noarch

How reproducible:
100%

Steps to Reproduce:
1. Add more than 25 VLANs to a hypervisor.
2. Make any minor change, such as removing a logical network.
3. The hypervisor goes non-responsive during the "save network configuration" process.

Actual results:
"save network configuration" makes the hypervisor non-responsive.

Expected results:
"save network configuration" should complete without the hypervisor going non-responsive.

Additional info:
It seems like the delay is here:

node_persist_owned_ifcfgs() {
    for f in $(find "$NET_CONF_DIR" -type f); do
        if grep -q "# Generated by VDSM version" "$f"; then
            ovirt_store_config "$f"
        fi
    done
}

ovirt_store_config() {
    for p in "$@"; do
        python <<EOP
from ovirtnode.ovirtfunctions import ovirt_store_config_retnum
ovirt_store_config_retnum("$p")

A separate python process is started for each ifcfg file, and loading ovirtfunctions.py seems to take more than 4 seconds on each iteration.

===
time python /usr/lib/python2.7/site-packages/ovirtnode/ovirtfunctions.py

real 0m4.041s
user 0m3.587s
sys 0m0.178s
===

The customer has 52 ifcfg files to persist:

grep -ir "Generated by VDSM version" etc/sysconfig/network-scripts/|wc -l
52
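As a rough sanity check of the figures above, the following sketch multiplies the measured per-invocation cost by the number of files. It assumes the ~4 s load time is paid in full for every file and dominates the run. The function names are hypothetical and are not part of ovirtnode or vdsm. The second function only illustrates what paying the startup cost once would look like; it is not a description of any actual patch.

```python
def estimated_persist_seconds(num_files, per_call_seconds):
    """Total time if every file pays the full interpreter + import cost."""
    return num_files * per_call_seconds


def estimated_batched_seconds(num_files, startup_seconds, per_file_seconds=0.0):
    """Total time if the startup cost is paid once for all files.

    per_file_seconds is an assumption; it defaults to 0 because the
    per-file work was not measured separately in this report.
    """
    return startup_seconds + num_files * per_file_seconds


per_call = 4.041  # "real" time from the `time python ...` run above
files = 52        # ifcfg files generated by VDSM on the customer host

total = estimated_persist_seconds(files, per_call)
print("one python process per file: %.1f s (%.1f min)" % (total, total / 60))
print("one python process in total: %.1f s" % estimated_batched_seconds(files, per_call))
```

Under these assumptions the per-file approach comes to roughly 210 seconds, which is in line with the 3 to 4 minutes seen for Host.setSafeNetworkConfig.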
Could you attach supervdsm.log to BZ? This bug might have been introduced by https://gerrit.ovirt.org/#/c/44929/3/vdsm/network/configurators/ifcfg.py which was required to solve bug 1252268.
Actually, it is more likely that it's due to https://gerrit.ovirt.org/#/q/Ibc717b86194a32c050d346e235a5c35fd318e1ff - one of the many patches done to solve bug 1203422.
> Dan, am i right about the patch number?

Yes, you are.
I am checking whether the customer can test this. Meanwhile, I tried it in my test environment with 40 logical networks.

Before the patch, it took more than 3 minutes:

jsonrpc.Executor/0::DEBUG::2016-06-01 16:13:00,173::__init__::503::jsonrpc.JsonRpcServer::(_serveRequest) Calling 'Host.setSafeNetworkConfig' in bridge with {}
jsonrpc.Executor/0::DEBUG::2016-06-01 16:16:25,880::__init__::533::jsonrpc.JsonRpcServer::(_serveRequest) Return 'Host.setSafeNetworkConfig' in bridge with True

After applying the patch, the process finished within a few milliseconds!

jsonrpc.Executor/1::DEBUG::2016-06-01 16:23:00,187::__init__::503::jsonrpc.JsonRpcServer::(_serveRequest) Calling 'Host.setSafeNetworkConfig' in bridge with {}
jsonrpc.Executor/1::DEBUG::2016-06-01 16:23:00,228::__init__::533::jsonrpc.JsonRpcServer::(_serveRequest) Return 'Host.setSafeNetworkConfig' in bridge with True
Sorry, there was an error when I copied the file. The correct result after applying the patch in my test environment is:

jsonrpc.Executor/5::DEBUG::2016-06-01 16:36:00,562::__init__::503::jsonrpc.JsonRpcServer::(_serveRequest) Calling 'Host.setSafeNetworkConfig' in bridge with {}
jsonrpc.Executor/5::DEBUG::2016-06-01 16:36:03,222::__init__::533::jsonrpc.JsonRpcServer::(_serveRequest) Return 'Host.setSafeNetworkConfig' in bridge with True

supervdsm log:

MainProcess|jsonrpc.Executor/5::DEBUG::2016-06-01 16:36:00,565::utils::671::root::(execCmd) /usr/bin/taskset --cpu-list 0-1 /usr/share/vdsm/vdsm-store-net-config unified (cwd None)
MainProcess|jsonrpc.Executor/5::DEBUG::2016-06-01 16:36:03,221::utils::689::root::(execCmd) SUCCESS: <err> = ''; <rc> = 0
MainProcess|jsonrpc.Executor/5::DEBUG::2016-06-01 16:36:03,222::supervdsmServer::123::SuperVdsm.ServerCallback::(wrapper) return setSafeNetworkConfig with None

So it takes only about 3 seconds to complete.
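For anyone who wants to repeat the comparison on their own logs, here is a minimal sketch that computes the time between the "Calling" and "Return" lines for Host.setSafeNetworkConfig. The timestamps are copied from the vdsm.log excerpts in this bug (40 logical networks, test environment). The helper name elapsed_seconds is hypothetical and is not part of vdsm.

```python
from datetime import datetime

# vdsm.log timestamps look like "2016-06-01 16:13:00,173" (comma before milliseconds).
LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def elapsed_seconds(start, end):
    """Return the number of seconds between two vdsm.log timestamps."""
    delta = datetime.strptime(end, LOG_TIME_FORMAT) - datetime.strptime(start, LOG_TIME_FORMAT)
    return delta.total_seconds()


# Before the patch: Calling -> Return of Host.setSafeNetworkConfig
before = elapsed_seconds("2016-06-01 16:13:00,173", "2016-06-01 16:16:25,880")

# After the patch (corrected run): Calling -> Return
after = elapsed_seconds("2016-06-01 16:36:00,562", "2016-06-01 16:36:03,222")

print("before patch: %.3f s" % before)
print("after patch:  %.3f s" % after)
```

With the timestamps above this gives about 205.7 seconds before the patch and about 2.7 seconds after it.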
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHEA-2016-1671.html