Bug 1735384

Summary: 1 host non-responsive and stuck migrating a VM
Product: [oVirt] ovirt-engine
Reporter: Pascal DeMilly <pascal>
Component: General
Assignee: bugs <bugs>
Status: NEW
QA Contact: Lukas Svaty <lsvaty>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 4.3.3.5
CC: bugs, michal.skrivanek, rbarry
Target Milestone: ---
Target Release: ---
Hardware: x86_64
OS: All
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Virt
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:

Attachments (flags: none):
- screenshot of hosts page
- screen of non-responsive host virtual machines
- not able to put host in maintenance mode

Description Pascal DeMilly 2019-07-31 21:06:50 UTC
Created attachment 1596557 [details]
screenshot of hosts page

Description of problem:

I have 1 data center with 2 clusters, each with 3 hosts. One of my hosts shows 1 VM being migrated (it was the hosted-engine); however, that hosted-engine has since been successfully migrated to another host. If I open this host's detail page, I see no virtual machines assigned to it, so it looks like just a counter or some event that needs to be cleared. I cannot put that host into maintenance mode or even reboot it from the web interface. Rebooting it from the CLI doesn't clear that field.

Version-Release number of selected component (if applicable):

Software Version: 4.3.3.5-1.el7

How reproducible:

Always: whatever I try, it is still there.

Steps to Reproduce:
1.
2.
3.

Actual results:

In the events log I get: VDSM d1-c1-v1 command Get Host Capabilities failed: Message timeout which can be caused by communication issues


Expected results:


Additional info:

Comment 1 Pascal DeMilly 2019-07-31 21:08:23 UTC
Created attachment 1596558 [details]
screen of non responsive host virtual machines

Comment 2 Pascal DeMilly 2019-07-31 21:09:42 UTC
Created attachment 1596559 [details]
not able to put host in maintenance mode

Comment 3 Ryan Barry 2019-08-01 00:59:51 UTC
So, if you right-click the VM or hit the extended menu, you can select "host has been rebooted", which may clear it.

Is vdsm on the host responsive? Can it be reached from the engine? After an HE migration, I'd suspect there may be a network interruption due to a misconfiguration somewhere.

Comment 4 Pascal DeMilly 2019-08-01 16:37:28 UTC
When I choose "host has been rebooted", I get the following: Error while executing action: Cannot perform confirm 'Host has been rebooted'. Another power management action is already in progress.

I now have a second host that is unresponsive. In this case too, I was migrating it so I could update it.

Here is the vdsm.log from the first host (and yes, the host is pingable from the hosted-engine and vice versa):

2019-08-01 09:33:21,640-0700 INFO  (jsonrpc/4) [api.host] START getAllVmStats() from=::1,48452 (api:48)
2019-08-01 09:33:21,640-0700 INFO  (jsonrpc/4) [api.host] FINISH getAllVmStats return={'status': {'message': 'Done', 'code': 0}, 'statsList': (suppressed)} from=::1,48452 (api:54)
2019-08-01 09:33:21,641-0700 INFO  (jsonrpc/4) [jsonrpc.JsonRpcServer] RPC call Host.getAllVmStats succeeded in 0.00 seconds (__init__:312)
2019-08-01 09:33:30,792-0700 INFO  (periodic/1) [vdsm.api] START repoStats(domains=()) from=internal, task_id=39d30922-eab1-4fa5-8577-3abc50439f89 (api:48)
2019-08-01 09:33:30,793-0700 INFO  (periodic/1) [vdsm.api] FINISH repoStats return={} from=internal, task_id=39d30922-eab1-4fa5-8577-3abc50439f89 (api:54)
2019-08-01 09:33:36,671-0700 INFO  (jsonrpc/5) [api.host] START getAllVmStats() from=::1,48452 (api:48)
2019-08-01 09:33:36,672-0700 INFO  (jsonrpc/5) [api.host] FINISH getAllVmStats return={'status': {'message': 'Done', 'code': 0}, 'statsList': (suppressed)} from=::1,48452 (api:54)
2019-08-01 09:33:36,672-0700 INFO  (jsonrpc/5) [jsonrpc.JsonRpcServer] RPC call Host.getAllVmStats succeeded in 0.00 seconds (__init__:312)
2019-08-01 09:33:45,906-0700 INFO  (periodic/3) [vdsm.api] START repoStats(domains=()) from=internal, task_id=3d07b451-8e00-4100-9eb8-5f533b2f281b (api:48)
2019-08-01 09:33:45,907-0700 INFO  (periodic/3) [vdsm.api] FINISH repoStats return={} from=internal, task_id=3d07b451-8e00-4100-9eb8-5f533b2f281b (api:54)
2019-08-01 09:33:51,695-0700 INFO  (jsonrpc/6) [api.host] START getAllVmStats() from=::1,48452 (api:48)
2019-08-01 09:33:51,695-0700 INFO  (jsonrpc/6) [api.host] FINISH getAllVmStats return={'status': {'message': 'Done', 'code': 0}, 'statsList': (suppressed)} from=::1,48452 (api:54)
2019-08-01 09:33:51,696-0700 INFO  (jsonrpc/6) [jsonrpc.JsonRpcServer] RPC call Host.getAllVmStats succeeded in 0.00 seconds (__init__:312)

upgrade.log

MainThread::INFO::2019-07-31 18:02:42,273::netconfpersistence::231::root::(_clearDisk) Clearing netconf: /var/lib/vdsm/persistence/netconf
MainThread::INFO::2019-07-31 18:02:42,281::netconfpersistence::181::root::(save) Saved new config PersistentConfig({'AAA': {u'ipv6autoconf': False, u'nameservers': [], u'vlan': 2001, u'switch': u'legacy', u'mtu': 1500, u'bonding': u'bond0', u'dhcpv6': False, u'stp': False, u'bridged': True, u'defaultRoute': False, u'bootproto': u'none'}, 'ovirtmgmt': {u'ipv6autoconf': True, u'nameservers': [], u'bonding': u'bond1', u'mtu': 1500, u'switch': u'legacy', u'dhcpv6': False, u'stp': False, u'bridged': True, u'defaultRoute': True, u'bootproto': u'dhcp'}, 'bfit-vm': {u'ipv6autoconf': False, u'nameservers': [], u'vlan': 1, u'switch': u'legacy', u'mtu': 1500, u'bonding': u'bond0', u'dhcpv6': False, u'stp': False, u'bridged': True, u'defaultRoute': False, u'bootproto': u'none'}, 'BBB': {u'ipv6autoconf': False, u'nameservers': [], u'vlan': 2002, u'switch': u'legacy', u'mtu': 1500, u'bonding': u'bond0', u'dhcpv6': False, u'stp': False, u'bridged': True, u'defaultRoute': False, u'bootproto': u'none'}, 'nas': {u'ipv6autoconf': False, u'nameservers': [], u'nic': u'p1p1', u'ipaddr': u'192.168.4.51', u'switch': u'legacy', u'mtu': 1500, u'netmask': u'255.255.255.0', u'dhcpv6': False, u'bridged': False, u'defaultRoute': False, u'bootproto': u'none'}, 'display': {u'ipv6autoconf': False, u'nameservers': [], u'nic': u'p1p2', u'ipaddr': u'70.182.176.223', u'netmask': u'255.255.255.0', u'mtu': 1500, u'switch': u'legacy', u'dhcpv6': False, u'bridged': False, u'gateway': u'70.182.176.1', u'defaultRoute': False, u'bootproto': u'none'}, 'CCC': {u'ipv6autoconf': False, u'nameservers': [], u'vlan': 2003, u'switch': u'legacy', u'mtu': 1500, u'bonding': u'bond0', u'dhcpv6': False, u'stp': False, u'bridged': True, u'defaultRoute': False, u'bootproto': u'none'}}, {'bond0': {u'nics': [u'em3', u'em4'], u'switch': u'legacy', u'options': u'mode=1 miimon=100'}, 'bond1': {u'nics': [u'em1', u'em2'], u'switch': u'legacy', u'options': u'mode=2 miimon=100'}}, {}) to 
[/var/lib/vdsm/persistence/netconf/nets,/var/lib/vdsm/persistence/netconf/bonds,/var/lib/vdsm/persistence/netconf/devices]
MainThread::DEBUG::2019-07-31 18:02:42,281::cmdutils::133::root::(exec_cmd) /usr/share/openvswitch/scripts/ovs-ctl status (cwd None)
MainThread::DEBUG::2019-07-31 18:02:42,303::cmdutils::141::root::(exec_cmd) SUCCESS: <err> = ''; <rc> = 0
MainThread::DEBUG::2019-07-31 18:02:42,304::vsctl::68::root::(commit) Executing commands: /usr/bin/ovs-vsctl --timeout=5 --oneline --format=json -- list Bridge -- list Port -- list Interface
MainThread::DEBUG::2019-07-31 18:02:42,304::cmdutils::133::root::(exec_cmd) /usr/bin/ovs-vsctl --timeout=5 --oneline --format=json -- list Bridge -- list Port -- list Interface (cwd None)
MainThread::DEBUG::2019-07-31 18:02:42,341::cmdutils::141::root::(exec_cmd) SUCCESS: <err> = ''; <rc> = 0
MainThread::DEBUG::2019-07-31 18:02:42,342::vsctl::68::root::(commit) Executing commands: /usr/bin/ovs-vsctl --timeout=5 --oneline --format=json -- set open . external-ids:ovn-bridge-mappings=""
MainThread::DEBUG::2019-07-31 18:02:42,342::cmdutils::133::root::(exec_cmd) /usr/bin/ovs-vsctl --timeout=5 --oneline --format=json -- set open . 'external-ids:ovn-bridge-mappings=""' (cwd None)
MainThread::DEBUG::2019-07-31 18:02:42,378::cmdutils::141::root::(exec_cmd) SUCCESS: <err> = ''; <rc> = 0

supervdsm.log

MainProcess|dhclient-monitor::DEBUG::2019-07-31 18:03:18,945::cmdutils::141::root::(exec_cmd) SUCCESS: <err> = ''; <rc> = 0
MainProcess|dhclient-monitor::DEBUG::2019-07-31 18:03:18,946::cmdutils::133::root::(exec_cmd) /sbin/ip -4 route add 10.10.10.0/24 via 10.10.10.51 dev ovirtmgmt table 168430131 (cwd None)
MainProcess|dhclient-monitor::DEBUG::2019-07-31 18:03:18,954::cmdutils::141::root::(exec_cmd) SUCCESS: <err> = ''; <rc> = 0
MainProcess|dhclient-monitor::DEBUG::2019-07-31 18:03:18,955::cmdutils::133::root::(exec_cmd) /sbin/ip rule add from 10.10.10.0/24 prio 32000 table 168430131 (cwd None)
MainProcess|dhclient-monitor::DEBUG::2019-07-31 18:03:18,961::cmdutils::141::root::(exec_cmd) SUCCESS: <err> = ''; <rc> = 0
MainProcess|dhclient-monitor::DEBUG::2019-07-31 18:03:18,962::cmdutils::133::root::(exec_cmd) /sbin/ip rule add from all to 10.10.10.0/24 dev ovirtmgmt prio 32000 table 168430131 (cwd None)
MainProcess|dhclient-monitor::DEBUG::2019-07-31 18:03:18,969::cmdutils::141::root::(exec_cmd) SUCCESS: <err> = ''; <rc> = 0
MainProcess|dhclient-monitor::DEBUG::2019-07-31 18:03:18,969::supervdsm_server::106::SuperVdsm.ServerCallback::(wrapper) return add_sourceroute with None
MainProcess|hsm/init::DEBUG::2019-07-31 18:03:19,067::supervdsm_server::106::SuperVdsm.ServerCallback::(wrapper) return hbaRescan with None
MainProcess|dhclient-monitor::DEBUG::2019-07-31 18:03:23,980::supervdsm_server::99::SuperVdsm.ServerCallback::(wrapper) call remove_sourceroute with ('ovirtmgmt',) {}
MainProcess|dhclient-monitor::DEBUG::2019-07-31 18:03:23,980::cmdutils::133::root::(exec_cmd) /sbin/ip rule (cwd None)
MainProcess|dhclient-monitor::DEBUG::2019-07-31 18:03:23,993::cmdutils::141::root::(exec_cmd) SUCCESS: <err> = ''; <rc> = 0
MainProcess|dhclient-monitor::DEBUG::2019-07-31 18:03:23,995::cmdutils::133::root::(exec_cmd) /sbin/ip rule (cwd None)
MainProcess|dhclient-monitor::DEBUG::2019-07-31 18:03:24,001::cmdutils::141::root::(exec_cmd) SUCCESS: <err> = ''; <rc> = 0
MainProcess|dhclient-monitor::DEBUG::2019-07-31 18:03:24,003::cmdutils::133::root::(exec_cmd) /sbin/ip -oneline route show table 168430131 (cwd None)
MainProcess|dhclient-monitor::DEBUG::2019-07-31 18:03:24,011::cmdutils::141::root::(exec_cmd) SUCCESS: <err> = ''; <rc> = 0
MainProcess|dhclient-monitor::DEBUG::2019-07-31 18:03:24,012::sourceroute::216::root::(remove) Removing source route for device ovirtmgmt
MainProcess|dhclient-monitor::DEBUG::2019-07-31 18:03:24,012::cmdutils::133::root::(exec_cmd) /sbin/ip -4 route del 0.0.0.0/0 via 10.10.10.1 dev ovirtmgmt table 168430131 (cwd None)
MainProcess|dhclient-monitor::DEBUG::2019-07-31 18:03:24,044::cmdutils::141::root::(exec_cmd) SUCCESS: <err> = ''; <rc> = 0
MainProcess|dhclient-monitor::DEBUG::2019-07-31 18:03:24,044::cmdutils::133::root::(exec_cmd) /sbin/ip -4 route del 10.10.10.0/24 via 10.10.10.51 dev ovirtmgmt table 168430131 (cwd None)
MainProcess|dhclient-monitor::DEBUG::2019-07-31 18:03:24,068::cmdutils::141::root::(exec_cmd) SUCCESS: <err> = ''; <rc> = 0
MainProcess|dhclient-monitor::DEBUG::2019-07-31 18:03:24,069::cmdutils::133::root::(exec_cmd) /sbin/ip rule del from all to 10.10.10.0/24 dev ovirtmgmt prio 32000 table 168430131 (cwd None)
MainProcess|dhclient-monitor::DEBUG::2019-07-31 18:03:24,076::cmdutils::141::root::(exec_cmd) SUCCESS: <err> = ''; <rc> = 0
MainProcess|dhclient-monitor::DEBUG::2019-07-31 18:03:24,077::cmdutils::133::root::(exec_cmd) /sbin/ip rule del from 10.10.10.0/24 prio 32000 table 168430131 (cwd None)
MainProcess|dhclient-monitor::DEBUG::2019-07-31 18:03:24,084::cmdutils::141::root::(exec_cmd) SUCCESS: <err> = ''; <rc> = 0
MainProcess|dhclient-monitor::DEBUG::2019-07-31 18:03:24,084::supervdsm_server::106::SuperVdsm.ServerCallback::(wrapper) return remove_sourceroute with None
MainProcess|dhclient-monitor::DEBUG::2019-07-31 18:03:24,089::supervdsm_server::99::SuperVdsm.ServerCallback::(wrapper) call add_sourceroute with ('ovirtmgmt', '10.10.10.51', '255.255.255.0', '10.10.10.1') {}
MainProcess|dhclient-monitor::DEBUG::2019-07-31 18:03:24,089::sourceroute::196::root::(add) Adding source route for device ovirtmgmt
MainProcess|dhclient-monitor::DEBUG::2019-07-31 18:03:24,090::cmdutils::133::root::(exec_cmd) /sbin/ip -4 route add 0.0.0.0/0 via 10.10.10.1 dev ovirtmgmt table 168430131 (cwd None)
MainProcess|dhclient-monitor::DEBUG::2019-07-31 18:03:24,098::cmdutils::141::root::(exec_cmd) SUCCESS: <err> = ''; <rc> = 0
MainProcess|dhclient-monitor::DEBUG::2019-07-31 18:03:24,099::cmdutils::133::root::(exec_cmd) /sbin/ip -4 route add 10.10.10.0/24 via 10.10.10.51 dev ovirtmgmt table 168430131 (cwd None)
MainProcess|dhclient-monitor::DEBUG::2019-07-31 18:03:24,107::cmdutils::141::root::(exec_cmd) SUCCESS: <err> = ''; <rc> = 0
MainProcess|dhclient-monitor::DEBUG::2019-07-31 18:03:24,108::cmdutils::133::root::(exec_cmd) /sbin/ip rule add from 10.10.10.0/24 prio 32000 table 168430131 (cwd None)
MainProcess|dhclient-monitor::DEBUG::2019-07-31 18:03:24,114::cmdutils::141::root::(exec_cmd) SUCCESS: <err> = ''; <rc> = 0
MainProcess|dhclient-monitor::DEBUG::2019-07-31 18:03:24,115::cmdutils::133::root::(exec_cmd) /sbin/ip rule add from all to 10.10.10.0/24 dev ovirtmgmt prio 32000 table 168430131 (cwd None)
MainProcess|dhclient-monitor::DEBUG::2019-07-31 18:03:24,121::cmdutils::141::root::(exec_cmd) SUCCESS: <err> = ''; <rc> = 0
MainProcess|dhclient-monitor::DEBUG::2019-07-31 18:03:24,122::supervdsm_server::106::SuperVdsm.ServerCallback::(wrapper) return add_sourceroute with None
MainProcess|jsonrpc/3::DEBUG::2019-07-31 18:03:38,686::supervdsm_server::99::SuperVdsm.ServerCallback::(wrapper) call ksmTune with ({u'run': 0, u'merge_across_nodes': 1},) {}
MainProcess|jsonrpc/3::DEBUG::2019-07-31 18:03:38,687::supervdsm_server::106::SuperVdsm.ServerCallback::(wrapper) return ksmTune with None
MainProcess|dhclient-monitor::DEBUG::2019-08-01 03:56:13,491::supervdsm_server::99::SuperVdsm.ServerCallback::(wrapper) call add_sourceroute with ('ovirtmgmt', '10.10.10.51', '255.255.255.0', '10.10.10.1') {}
MainProcess|dhclient-monitor::DEBUG::2019-08-01 03:56:13,491::logutils::319::root::(_report_stats) ThreadedHandler is ok in the last 35611 seconds (max pending: 22)
MainProcess|dhclient-monitor::DEBUG::2019-08-01 03:56:13,492::sourceroute::196::root::(add) Adding source route for device ovirtmgmt
MainProcess|dhclient-monitor::DEBUG::2019-08-01 03:56:13,492::cmdutils::133::root::(exec_cmd) /sbin/ip -4 route add 0.0.0.0/0 via 10.10.10.1 dev ovirtmgmt table 168430131 (cwd None)
MainProcess|dhclient-monitor::DEBUG::2019-08-01 03:56:13,513::cmdutils::141::root::(exec_cmd) FAILED: <err> = 'RTNETLINK answers: File exists\n'; <rc> = 2
MainProcess|dhclient-monitor::DEBUG::2019-08-01 03:56:13,513::sourceroute::202::root::(add) Route already exists, addition failed,: ("IPRouteData(to='0.0.0.0/0' via='10.10.10.1' src=None family=4 device='ovirtmgmt' table='168430131')", 'RTNETLINK answers: File exists')
MainProcess|dhclient-monitor::DEBUG::2019-08-01 03:56:13,514::cmdutils::133::root::(exec_cmd) /sbin/ip rule add from 10.10.10.0/24 prio 32000 table 168430131 (cwd None)
MainProcess|dhclient-monitor::DEBUG::2019-08-01 03:56:13,521::cmdutils::141::root::(exec_cmd) SUCCESS: <err> = ''; <rc> = 0
MainProcess|dhclient-monitor::DEBUG::2019-08-01 03:56:13,522::cmdutils::133::root::(exec_cmd) /sbin/ip rule add from all to 10.10.10.0/24 dev ovirtmgmt prio 32000 table 168430131 (cwd None)
MainProcess|dhclient-monitor::DEBUG::2019-08-01 03:56:13,530::cmdutils::141::root::(exec_cmd) SUCCESS: <err> = ''; <rc> = 0
MainProcess|dhclient-monitor::DEBUG::2019-08-01 03:56:13,530::supervdsm_server::106::SuperVdsm.ServerCallback::(wrapper) return add_sourceroute with None

Comment 5 Pascal DeMilly 2019-08-01 16:43:05 UTC
Looking for errors in the same log directory, I found the following.

in mom.log

2019-07-31 17:58:42,161 - mom.RPCServer - INFO - RPC Server ending
2019-07-31 17:58:43,814 - mom.GuestManager - INFO - Guest Manager ending
2019-07-31 17:58:45,817 - mom.HostMonitor - INFO - Host Monitor ending
2019-07-31 18:03:17,064 - mom - INFO - MOM starting
2019-07-31 18:03:17,174 - mom.HostMonitor - INFO - Host Monitor starting
2019-07-31 18:03:17,175 - mom - INFO - hypervisor interface vdsmjsonrpcclient
2019-07-31 18:03:17,339 - mom - ERROR - Failed to initialize MOM threads 
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/mom/__init__.py", line 29, in run
    hypervisor_iface = self.get_hypervisor_interface()
  File "/usr/lib/python2.7/site-packages/mom/__init__.py", line 217, in get_hypervisor_interface
    return module.instance(self.config)
  File "/usr/lib/python2.7/site-packages/mom/HypervisorInterfaces/vdsmjsonrpcclientInterface.py", line 96, in instance
    return JsonRpcVdsmClientInterface()
  File "/usr/lib/python2.7/site-packages/mom/HypervisorInterfaces/vdsmjsonrpcclientInterface.py", line 31, in __init__
    self._vdsm_api = client.connect(host="localhost")
  File "/usr/lib/python2.7/site-packages/vdsm/client.py", line 157, in connect 
    raise ConnectionError(host, port, use_tls, timeout, e)
ConnectionError: Connection to localhost:54321 with use_tls=True, timeout=60 failed: [Errno 111] Connection refused 
2019-07-31 18:03:22,648 - mom - INFO - MOM starting
2019-07-31 18:03:22,769 - mom.HostMonitor - INFO - Host Monitor starting
2019-07-31 18:03:22,770 - mom - INFO - hypervisor interface vdsmjsonrpcclient
2019-07-31 18:03:23,017 - mom.GuestManager - INFO - Guest Manager starting: multi-thread
2019-07-31 18:03:23,036 - mom.Policy - INFO - Loaded policy '00-defines'
2019-07-31 18:03:23,042 - mom.Policy - INFO - Loaded policy '01-parameters'
2019-07-31 18:03:23,119 - mom.Policy - INFO - Loaded policy '02-balloon'
2019-07-31 18:03:23,243 - mom.Policy - INFO - Loaded policy '03-ksm'
2019-07-31 18:03:23,396 - mom.Policy - INFO - Loaded policy '04-cputune'
2019-07-31 18:03:23,572 - mom.Policy - INFO - Loaded policy '05-iotune'
2019-07-31 18:03:23,574 - mom.PolicyEngine - INFO - Policy Engine starting
2019-07-31 18:03:23,576 - mom.RPCServer - INFO - Using unix socket /var/run/vdsm/mom-vdsm.sock
2019-07-31 18:03:23,578 - mom.RPCServer - INFO - RPC Server starting
2019-07-31 18:03:23,581 - mom.HostMonitor - INFO - HostMonitor is ready
2019-07-31 18:03:35,119 - mom.RPCServer - INFO - ping()
2019-07-31 18:03:35,120 - mom.RPCServer - INFO - getStatistics()
2019-07-31 18:03:38,680 - mom.Controllers.KSM - INFO - Updating KSM configuration: pages_to_scan:0 merge_across_nodes:1 run:0 sleep_millisecs:0
2019-07-31 18:03:50,227 - mom.RPCServer - INFO - ping()

in supervdsm.log

restore-net::DEBUG::2019-07-31 18:02:52,519::cmdutils::133::root::(exec_cmd) /sbin/ip addr flush dev em2 scope global (cwd None)
restore-net::DEBUG::2019-07-31 18:02:52,525::cmdutils::141::root::(exec_cmd) SUCCESS: <err> = ''; <rc> = 0
restore-net::DEBUG::2019-07-31 18:02:52,526::ifcfg::488::root::(_atomicBackup) Backed up /etc/sysconfig/network-scripts/ifcfg-em2
restore-net::DEBUG::2019-07-31 18:02:52,529::ifcfg::578::root::(writeConfFile) Writing to file /etc/sysconfig/network-scripts/ifcfg-em2 configuration:
# Generated by VDSM version 4.30.17.1
DEVICE=em2
MASTER=bond1
SLAVE=yes
ONBOOT=yes
MTU=1500
DEFROUTE=no
NM_CONTROLLED=no
IPV6INIT=no

restore-net::WARNING::2019-07-31 18:02:52,530::ifcfg::270::root::(_addSourceRoute) Invalid input for source routing: name=bond1, addr=None, netmask=None, gateway=None
netlink/events::DEBUG::2019-07-31 18:02:52,535::concurrent::193::root::(run) START thread <Thread(netlink/events, started daemon 140470423701248)> (func=<bound method Monitor._scan of <vdsm.network.netlink.monitor.Monitor object at 0x7fc1e138c890>>, args=(), kwargs={})
restore-net::DEBUG::2019-07-31 18:02:52,538::cmdutils::133::root::(exec_cmd) /usr/bin/systemd-run --scope --unit=227798e3-2979-41b9-bc4a-8c7084d714e7 --slice=vdsm-dhclient /sbin/ifup bond1 (cwd None)
restore-net::DEBUG::2019-07-31 18:02:54,956::cmdutils::141::root::(exec_cmd) SUCCESS: <err> = 'Running scope as unit 227798e3-2979-41b9-bc4a-8c7084d714e7.scope.\n'; <rc> = 0
netlink/events::DEBUG::2019-07-31 18:02:57,012::concurrent::196::root::(run) FINISH thread <Thread(netlink/events, stopped daemon 140470423701248)>
netlink/events::DEBUG::2019-07-31 18:02:57,014::concurrent::193::root::(run) START thread <Thread(netlink/events, started daemon 140470423701248)> (func=<bound method Monitor._scan of <vdsm.network.netlink.monitor.Monitor object at 0x7fc1e1312890>>, args=(), kwargs={})
netlink/events::DEBUG::2019-07-31 18:02:57,015::concurrent::196::root::(run) FINISH thread <Thread(netlink/events, stopped daemon 140470423701248)>
restore-net::INFO::2019-07-31 18:02:57,016::netconfpersistence::69::root::(setBonding) Adding bond1({'nics': ['em1', 'em2'], 'switch': 'legacy', 'options': 'mode=2 miimon=100'})
restore-net::DEBUG::2019-07-31 18:02:57,017::cmdutils::133::root::(exec_cmd) /sbin/tc filter del dev bond1 pref 5000 (cwd None)
restore-net::DEBUG::2019-07-31 18:02:57,025::cmdutils::141::root::(exec_cmd) FAILED: <err> = 'RTNETLINK answers: Invalid argument\nWe have an error talking to the kernel\n'; <rc> = 2
restore-net::DEBUG::2019-07-31 18:02:57,025::cmdutils::133::root::(exec_cmd) /sbin/tc qdisc show dev bond1 (cwd None)
restore-net::DEBUG::2019-07-31 18:02:57,033::cmdutils::141::root::(exec_cmd) SUCCESS: <err> = ''; <rc> = 0
restore-net::DEBUG::2019-07-31 18:02:57,071::cmdutils::133::root::(exec_cmd) /sbin/tc qdisc show (cwd None)
restore-net::DEBUG::2019-07-31 18:02:57,079::cmdutils::141::root::(exec_cmd) SUCCESS: <err> = ''; <rc> = 0
restore-net::DEBUG::2019-07-31 18:02:57,088::cmdutils::133::root::(exec_cmd) /sbin/tc class show dev p1p1 classid 0:1388 (cwd None)
restore-net::DEBUG::2019-07-31 18:02:57,095::cmdutils::141::root::(exec_cmd) SUCCESS: <err> = ''; <rc> = 0
restore-net::DEBUG::2019-07-31 18:02:57,096::cmdutils::133::root::(exec_cmd) /sbin/tc class show dev p1p2 classid 0:1388 (cwd None)
restore-net::DEBUG::2019-07-31 18:02:57,103::cmdutils::141::root::(exec_cmd) SUCCESS: <err> = ''; <rc> = 0
restore-net::DEBUG::2019-07-31 18:02:57,240::legacy_switch::484::root::(bonds_setup) Starting bondings setup. bonds={u'bond1': {u'nics': [u'em1', u'em2'], u'switch': u'legacy', u'options': u'miimon=100 mode=2'}}, in_rollback=True
restore-net::DEBUG::2019-07-31 18:02:57,274::cmdutils::133::root::(exec_cmd) /sbin/tc qdisc show (cwd None)


Hope that helps

Thanks

Pascal

Comment 6 Pascal DeMilly 2019-08-01 18:27:17 UTC
I restarted the hosted-engine and things are starting to look better. Question: I am not in production yet. Is it recommended to restart the hosted-engine while other VMs are running? Does putting the engine in global maintenance affect other VMs?

Comment 7 Ryan Barry 2019-08-01 18:54:35 UTC
Putting it in global maintenance ensures only that ovirt-ha-agent/broker won't automatically attempt to restart the HE VM; it doesn't otherwise affect anything. Is power management configured on these hosts?
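(Not from the thread, added for reference.) Global maintenance is toggled from any hosted-engine host with the hosted-engine CLI; a minimal sketch of the standard commands, which affect only HA supervision of the engine VM, not other running VMs:

```shell
# Enable global maintenance: ovirt-ha-agent/broker will stop
# automatically restarting or migrating the hosted-engine VM.
hosted-engine --set-maintenance --mode=global

# Inspect HA state; the output should report global maintenance as enabled.
hosted-engine --vm-status

# Return to normal HA operation when done.
hosted-engine --set-maintenance --mode=none
```

These commands only change the HA agents' behavior toward the engine VM itself, which matches the explanation above.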

Comment 8 Pascal DeMilly 2019-08-01 19:43:44 UTC
Yes, they have power management set up. After rebooting the hosted-engine VM, the stuck VM finally freed up and the hosts are now up. My questions: How can I free a VM from a host from the command line? And how can I disconnect a host from the hosted engine from the command line, since the GUI didn't allow me to do anything while the host or VM was in that state?

I also have 3 VMs which are locked because: Failed to run VM AAAWMC20001888 due to a failed validation: [Cannot run VM. The VM is performing an operation on a Snapshot. Please wait for the operation to finish, and try again.] (User: admin@internal-authz).

Not sure how this happened, but how can I clear it?
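(Not an answer from the thread; a commonly used workaround, sketched under the assumption of a default engine install.) Stale VM/snapshot locks like the one above can usually be inspected and cleared on the engine machine with the bundled unlock_entity.sh utility; back up the engine database before unlocking anything:

```shell
# List currently locked entities in the engine DB (query mode, all types).
/usr/share/ovirt-engine/setup/dbutils/unlock_entity.sh -q -t all

# Clear a stuck snapshot lock by its ID, taken from the query output above.
# <snapshot-id> is a placeholder; substitute the real UUID.
/usr/share/ovirt-engine/setup/dbutils/unlock_entity.sh -t snapshot <snapshot-id>
```

After unlocking, the VM should no longer report "performing an operation on a Snapshot" and can be started again.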

Thanks

Comment 9 Ryan Barry 2019-08-05 14:13:51 UTC
It's likely that the snapshot operations are simply taking a long time. Did they ever clear?

Comment 10 Michal Skrivanek 2020-03-18 15:46:12 UTC
This bug didn't get any attention for a while, and we didn't have the capacity to make any progress on it. If you deeply care about it or want to work on it, please assign/target accordingly.

Comment 11 Michal Skrivanek 2020-03-18 15:51:04 UTC
This bug didn't get any attention for a while, and we didn't have the capacity to make any progress on it. If you deeply care about it or want to work on it, please assign/target accordingly.