Bug 1735384

Summary: 1 host is non-responsive and stuck migrating a VM

Product: [oVirt] ovirt-engine
Component: General
Version: 4.3.3.5
Hardware: x86_64
OS: All
Status: CLOSED DEFERRED
Severity: medium
Priority: unspecified
Reporter: Pascal DeMilly <pascal>
Assignee: bugs <bugs>
QA Contact: Lukas Svaty <lsvaty>
CC: bugs, michal.skrivanek, rbarry
Target Milestone: ---
Target Release: ---
Doc Type: If docs needed, set a value
Story Points: ---
Last Closed: 2020-04-01 14:47:16 UTC
Type: Bug
Regression: ---
oVirt Team: Virt
Cloudforms Team: ---

Attachments:
Created attachment 1596558 [details]
screenshot of the non-responsive host's virtual machines
Created attachment 1596559 [details]
not able to put host in maintenance mode
So, if you right-click the VM or open the extended menu, you can select "Host has been rebooted", which may clear it. Is vdsm on the host responsive? Can it be routed to from the engine? After a HE migration, I'd suspect there may be a network interruption due to a misconfiguration somewhere.

When I choose "Host has been rebooted" I get the following:

Error while executing action: Cannot perform confirm 'Host has been rebooted'. Another power management action is already in progress.

I now have a second host that is unresponsive. In this case too, I was migrating it so I could update it. Here is the vdsm.log from the first host (and yes, the host is pingable from the hosted engine and vice versa):

2019-08-01 09:33:21,640-0700 INFO (jsonrpc/4) [api.host] START getAllVmStats() from=::1,48452 (api:48)
2019-08-01 09:33:21,640-0700 INFO (jsonrpc/4) [api.host] FINISH getAllVmStats return={'status': {'message': 'Done', 'code': 0}, 'statsList': (suppressed)} from=::1,48452 (api:54)
2019-08-01 09:33:21,641-0700 INFO (jsonrpc/4) [jsonrpc.JsonRpcServer] RPC call Host.getAllVmStats succeeded in 0.00 seconds (__init__:312)
2019-08-01 09:33:30,792-0700 INFO (periodic/1) [vdsm.api] START repoStats(domains=()) from=internal, task_id=39d30922-eab1-4fa5-8577-3abc50439f89 (api:48)
2019-08-01 09:33:30,793-0700 INFO (periodic/1) [vdsm.api] FINISH repoStats return={} from=internal, task_id=39d30922-eab1-4fa5-8577-3abc50439f89 (api:54)
2019-08-01 09:33:36,671-0700 INFO (jsonrpc/5) [api.host] START getAllVmStats() from=::1,48452 (api:48)
2019-08-01 09:33:36,672-0700 INFO (jsonrpc/5) [api.host] FINISH getAllVmStats return={'status': {'message': 'Done', 'code': 0}, 'statsList': (suppressed)} from=::1,48452 (api:54)
2019-08-01 09:33:36,672-0700 INFO (jsonrpc/5) [jsonrpc.JsonRpcServer] RPC call Host.getAllVmStats succeeded in 0.00 seconds (__init__:312)
2019-08-01 09:33:45,906-0700 INFO (periodic/3) [vdsm.api] START repoStats(domains=()) from=internal,
task_id=3d07b451-8e00-4100-9eb8-5f533b2f281b (api:48) 2019-08-01 09:33:45,907-0700 INFO (periodic/3) [vdsm.api] FINISH repoStats return={} from=internal, task_id=3d07b451-8e00-4100-9eb8-5f533b2f281b (api:54) 2019-08-01 09:33:51,695-0700 INFO (jsonrpc/6) [api.host] START getAllVmStats() from=::1,48452 (api:48) 2019-08-01 09:33:51,695-0700 INFO (jsonrpc/6) [api.host] FINISH getAllVmStats return={'status': {'message': 'Done', 'code': 0}, 'statsList': (suppressed)} from=::1,48452 (api:54) 2019-08-01 09:33:51,696-0700 INFO (jsonrpc/6) [jsonrpc.JsonRpcServer] RPC call Host.getAllVmStats succeeded in 0.00 seconds (__init__:312) upgrade.log MainThread::INFO::2019-07-31 18:02:42,273::netconfpersistence::231::root::(_clearDisk) Clearing netconf: /var/lib/vdsm/persistence/netconf MainThread::INFO::2019-07-31 18:02:42,281::netconfpersistence::181::root::(save) Saved new config PersistentConfig({'AAA': {u'ipv6autoconf': False, u'nameservers': [], u'vlan': 2001, u'switch': u'legacy', u'mtu': 1500, u'bonding': u'bond0', u'dhcpv6': False, u'stp': False, u'bridged': True, u'defaultRoute': False, u'bootproto': u'none'}, 'ovirtmgmt': {u'ipv6autoconf': True, u'nameservers': [], u'bonding': u'bond1', u'mtu': 1500, u'switch': u'legacy', u'dhcpv6': False, u'stp': False, u'bridged': True, u'defaultRoute': True, u'bootproto': u'dhcp'}, 'bfit-vm': {u'ipv6autoconf': False, u'nameservers': [], u'vlan': 1, u'switch': u'legacy', u'mtu': 1500, u'bonding': u'bond0', u'dhcpv6': False, u'stp': False, u'bridged': True, u'defaultRoute': False, u'bootproto': u'none'}, 'BBB': {u'ipv6autoconf': False, u'nameservers': [], u'vlan': 2002, u'switch': u'legacy', u'mtu': 1500, u'bonding': u'bond0', u'dhcpv6': False, u'stp': False, u'bridged': True, u'defaultRoute': False, u'bootproto': u'none'}, 'nas': {u'ipv6autoconf': False, u'nameservers': [], u'nic': u'p1p1', u'ipaddr': u'192.168.4.51', u'switch': u'legacy', u'mtu': 1500, u'netmask': u'255.255.255.0', u'dhcpv6': False, u'bridged': False, u'defaultRoute': 
False, u'bootproto': u'none'}, 'display': {u'ipv6autoconf': False, u'nameservers': [], u'nic': u'p1p2', u'ipaddr': u'70.182.176.223', u'netmask': u'255.255.255.0', u'mtu': 1500, u'switch': u'legacy', u'dhcpv6': False, u'bridged': False, u'gateway': u'70.182.176.1', u'defaultRoute': False, u'bootproto': u'none'}, 'CCC': {u'ipv6autoconf': False, u'nameservers': [], u'vlan': 2003, u'switch': u'legacy', u'mtu': 1500, u'bonding': u'bond0', u'dhcpv6': False, u'stp': False, u'bridged': True, u'defaultRoute': False, u'bootproto': u'none'}}, {'bond0': {u'nics': [u'em3', u'em4'], u'switch': u'legacy', u'options': u'mode=1 miimon=100'}, 'bond1': {u'nics': [u'em1', u'em2'], u'switch': u'legacy', u'options': u'mode=2 miimon=100'}}, {}) to [/var/lib/vdsm/persistence/netconf/nets,/var/lib/vdsm/persistence/netconf/bonds,/var/lib/vdsm/persistence/netconf/devices] MainThread::DEBUG::2019-07-31 18:02:42,281::cmdutils::133::root::(exec_cmd) /usr/share/openvswitch/scripts/ovs-ctl status (cwd None) MainThread::DEBUG::2019-07-31 18:02:42,303::cmdutils::141::root::(exec_cmd) SUCCESS: <err> = ''; <rc> = 0 MainThread::DEBUG::2019-07-31 18:02:42,304::vsctl::68::root::(commit) Executing commands: /usr/bin/ovs-vsctl --timeout=5 --oneline --format=json -- list Bridge -- list Port -- list Interface MainThread::DEBUG::2019-07-31 18:02:42,304::cmdutils::133::root::(exec_cmd) /usr/bin/ovs-vsctl --timeout=5 --oneline --format=json -- list Bridge -- list Port -- list Interface (cwd None) MainThread::DEBUG::2019-07-31 18:02:42,341::cmdutils::141::root::(exec_cmd) SUCCESS: <err> = ''; <rc> = 0 MainThread::DEBUG::2019-07-31 18:02:42,342::vsctl::68::root::(commit) Executing commands: /usr/bin/ovs-vsctl --timeout=5 --oneline --format=json -- set open . external-ids:ovn-bridge-mappings="" MainThread::DEBUG::2019-07-31 18:02:42,342::cmdutils::133::root::(exec_cmd) /usr/bin/ovs-vsctl --timeout=5 --oneline --format=json -- set open . 
'external-ids:ovn-bridge-mappings=""' (cwd None) MainThread::DEBUG::2019-07-31 18:02:42,378::cmdutils::141::root::(exec_cmd) SUCCESS: <err> = ''; <rc> = 0 supervdsm.log MainProcess|dhclient-monitor::DEBUG::2019-07-31 18:03:18,945::cmdutils::141::root::(exec_cmd) SUCCESS: <err> = ''; <rc> = 0 MainProcess|dhclient-monitor::DEBUG::2019-07-31 18:03:18,946::cmdutils::133::root::(exec_cmd) /sbin/ip -4 route add 10.10.10.0/24 via 10.10.10.51 dev ovirtmgmt table 168430131 (cwd None) MainProcess|dhclient-monitor::DEBUG::2019-07-31 18:03:18,954::cmdutils::141::root::(exec_cmd) SUCCESS: <err> = ''; <rc> = 0 MainProcess|dhclient-monitor::DEBUG::2019-07-31 18:03:18,955::cmdutils::133::root::(exec_cmd) /sbin/ip rule add from 10.10.10.0/24 prio 32000 table 168430131 (cwd None) MainProcess|dhclient-monitor::DEBUG::2019-07-31 18:03:18,961::cmdutils::141::root::(exec_cmd) SUCCESS: <err> = ''; <rc> = 0 MainProcess|dhclient-monitor::DEBUG::2019-07-31 18:03:18,962::cmdutils::133::root::(exec_cmd) /sbin/ip rule add from all to 10.10.10.0/24 dev ovirtmgmt prio 32000 table 168430131 (cwd None) MainProcess|dhclient-monitor::DEBUG::2019-07-31 18:03:18,969::cmdutils::141::root::(exec_cmd) SUCCESS: <err> = ''; <rc> = 0 MainProcess|dhclient-monitor::DEBUG::2019-07-31 18:03:18,969::supervdsm_server::106::SuperVdsm.ServerCallback::(wrapper) return add_sourceroute with None MainProcess|hsm/init::DEBUG::2019-07-31 18:03:19,067::supervdsm_server::106::SuperVdsm.ServerCallback::(wrapper) return hbaRescan with None MainProcess|dhclient-monitor::DEBUG::2019-07-31 18:03:23,980::supervdsm_server::99::SuperVdsm.ServerCallback::(wrapper) call remove_sourceroute with ('ovirtmgmt',) {} MainProcess|dhclient-monitor::DEBUG::2019-07-31 18:03:23,980::cmdutils::133::root::(exec_cmd) /sbin/ip rule (cwd None) MainProcess|dhclient-monitor::DEBUG::2019-07-31 18:03:23,993::cmdutils::141::root::(exec_cmd) SUCCESS: <err> = ''; <rc> = 0 MainProcess|dhclient-monitor::DEBUG::2019-07-31 
18:03:23,995::cmdutils::133::root::(exec_cmd) /sbin/ip rule (cwd None) MainProcess|dhclient-monitor::DEBUG::2019-07-31 18:03:24,001::cmdutils::141::root::(exec_cmd) SUCCESS: <err> = ''; <rc> = 0 MainProcess|dhclient-monitor::DEBUG::2019-07-31 18:03:24,003::cmdutils::133::root::(exec_cmd) /sbin/ip -oneline route show table 168430131 (cwd None) MainProcess|dhclient-monitor::DEBUG::2019-07-31 18:03:24,011::cmdutils::141::root::(exec_cmd) SUCCESS: <err> = ''; <rc> = 0 MainProcess|dhclient-monitor::DEBUG::2019-07-31 18:03:24,012::sourceroute::216::root::(remove) Removing source route for device ovirtmgmt MainProcess|dhclient-monitor::DEBUG::2019-07-31 18:03:24,012::cmdutils::133::root::(exec_cmd) /sbin/ip -4 route del 0.0.0.0/0 via 10.10.10.1 dev ovirtmgmt table 168430131 (cwd None) MainProcess|dhclient-monitor::DEBUG::2019-07-31 18:03:24,044::cmdutils::141::root::(exec_cmd) SUCCESS: <err> = ''; <rc> = 0 MainProcess|dhclient-monitor::DEBUG::2019-07-31 18:03:24,044::cmdutils::133::root::(exec_cmd) /sbin/ip -4 route del 10.10.10.0/24 via 10.10.10.51 dev ovirtmgmt table 168430131 (cwd None) MainProcess|dhclient-monitor::DEBUG::2019-07-31 18:03:24,068::cmdutils::141::root::(exec_cmd) SUCCESS: <err> = ''; <rc> = 0 MainProcess|dhclient-monitor::DEBUG::2019-07-31 18:03:24,069::cmdutils::133::root::(exec_cmd) /sbin/ip rule del from all to 10.10.10.0/24 dev ovirtmgmt prio 32000 table 168430131 (cwd None) MainProcess|dhclient-monitor::DEBUG::2019-07-31 18:03:24,076::cmdutils::141::root::(exec_cmd) SUCCESS: <err> = ''; <rc> = 0 MainProcess|dhclient-monitor::DEBUG::2019-07-31 18:03:24,077::cmdutils::133::root::(exec_cmd) /sbin/ip rule del from 10.10.10.0/24 prio 32000 table 168430131 (cwd None) MainProcess|dhclient-monitor::DEBUG::2019-07-31 18:03:24,084::cmdutils::141::root::(exec_cmd) SUCCESS: <err> = ''; <rc> = 0 MainProcess|dhclient-monitor::DEBUG::2019-07-31 18:03:24,084::supervdsm_server::106::SuperVdsm.ServerCallback::(wrapper) return remove_sourceroute with None 
MainProcess|dhclient-monitor::DEBUG::2019-07-31 18:03:24,089::supervdsm_server::99::SuperVdsm.ServerCallback::(wrapper) call add_sourceroute with ('ovirtmgmt', '10.10.10.51', '255.255.255.0', '10.10.10.1') {} MainProcess|dhclient-monitor::DEBUG::2019-07-31 18:03:24,089::sourceroute::196::root::(add) Adding source route for device ovirtmgmt MainProcess|dhclient-monitor::DEBUG::2019-07-31 18:03:24,090::cmdutils::133::root::(exec_cmd) /sbin/ip -4 route add 0.0.0.0/0 via 10.10.10.1 dev ovirtmgmt table 168430131 (cwd None) MainProcess|dhclient-monitor::DEBUG::2019-07-31 18:03:24,098::cmdutils::141::root::(exec_cmd) SUCCESS: <err> = ''; <rc> = 0 MainProcess|dhclient-monitor::DEBUG::2019-07-31 18:03:24,099::cmdutils::133::root::(exec_cmd) /sbin/ip -4 route add 10.10.10.0/24 via 10.10.10.51 dev ovirtmgmt table 168430131 (cwd None) MainProcess|dhclient-monitor::DEBUG::2019-07-31 18:03:24,107::cmdutils::141::root::(exec_cmd) SUCCESS: <err> = ''; <rc> = 0 MainProcess|dhclient-monitor::DEBUG::2019-07-31 18:03:24,108::cmdutils::133::root::(exec_cmd) /sbin/ip rule add from 10.10.10.0/24 prio 32000 table 168430131 (cwd None) MainProcess|dhclient-monitor::DEBUG::2019-07-31 18:03:24,114::cmdutils::141::root::(exec_cmd) SUCCESS: <err> = ''; <rc> = 0 MainProcess|dhclient-monitor::DEBUG::2019-07-31 18:03:24,115::cmdutils::133::root::(exec_cmd) /sbin/ip rule add from all to 10.10.10.0/24 dev ovirtmgmt prio 32000 table 168430131 (cwd None) MainProcess|dhclient-monitor::DEBUG::2019-07-31 18:03:24,121::cmdutils::141::root::(exec_cmd) SUCCESS: <err> = ''; <rc> = 0 MainProcess|dhclient-monitor::DEBUG::2019-07-31 18:03:24,122::supervdsm_server::106::SuperVdsm.ServerCallback::(wrapper) return add_sourceroute with None MainProcess|jsonrpc/3::DEBUG::2019-07-31 18:03:38,686::supervdsm_server::99::SuperVdsm.ServerCallback::(wrapper) call ksmTune with ({u'run': 0, u'merge_across_nodes': 1},) {} MainProcess|jsonrpc/3::DEBUG::2019-07-31 
18:03:38,687::supervdsm_server::106::SuperVdsm.ServerCallback::(wrapper) return ksmTune with None
MainProcess|dhclient-monitor::DEBUG::2019-08-01 03:56:13,491::supervdsm_server::99::SuperVdsm.ServerCallback::(wrapper) call add_sourceroute with ('ovirtmgmt', '10.10.10.51', '255.255.255.0', '10.10.10.1') {}
MainProcess|dhclient-monitor::DEBUG::2019-08-01 03:56:13,491::logutils::319::root::(_report_stats) ThreadedHandler is ok in the last 35611 seconds (max pending: 22)
MainProcess|dhclient-monitor::DEBUG::2019-08-01 03:56:13,492::sourceroute::196::root::(add) Adding source route for device ovirtmgmt
MainProcess|dhclient-monitor::DEBUG::2019-08-01 03:56:13,492::cmdutils::133::root::(exec_cmd) /sbin/ip -4 route add 0.0.0.0/0 via 10.10.10.1 dev ovirtmgmt table 168430131 (cwd None)
MainProcess|dhclient-monitor::DEBUG::2019-08-01 03:56:13,513::cmdutils::141::root::(exec_cmd) FAILED: <err> = 'RTNETLINK answers: File exists\n'; <rc> = 2
MainProcess|dhclient-monitor::DEBUG::2019-08-01 03:56:13,513::sourceroute::202::root::(add) Route already exists, addition failed,: ("IPRouteData(to='0.0.0.0/0' via='10.10.10.1' src=None family=4 device='ovirtmgmt' table='168430131')", 'RTNETLINK answers: File exists')
MainProcess|dhclient-monitor::DEBUG::2019-08-01 03:56:13,514::cmdutils::133::root::(exec_cmd) /sbin/ip rule add from 10.10.10.0/24 prio 32000 table 168430131 (cwd None)
MainProcess|dhclient-monitor::DEBUG::2019-08-01 03:56:13,521::cmdutils::141::root::(exec_cmd) SUCCESS: <err> = ''; <rc> = 0
MainProcess|dhclient-monitor::DEBUG::2019-08-01 03:56:13,522::cmdutils::133::root::(exec_cmd) /sbin/ip rule add from all to 10.10.10.0/24 dev ovirtmgmt prio 32000 table 168430131 (cwd None)
MainProcess|dhclient-monitor::DEBUG::2019-08-01 03:56:13,530::cmdutils::141::root::(exec_cmd) SUCCESS: <err> = ''; <rc> = 0
MainProcess|dhclient-monitor::DEBUG::2019-08-01 03:56:13,530::supervdsm_server::106::SuperVdsm.ServerCallback::(wrapper) return add_sourceroute with None

Looking for errors in this log directory, I found this in mom.log:

2019-07-31 17:58:42,161 - mom.RPCServer - INFO - RPC Server ending
2019-07-31 17:58:43,814 - mom.GuestManager - INFO - Guest Manager ending
2019-07-31 17:58:45,817 - mom.HostMonitor - INFO - Host Monitor ending
2019-07-31 18:03:17,064 - mom - INFO - MOM starting
2019-07-31 18:03:17,174 - mom.HostMonitor - INFO - Host Monitor starting
2019-07-31 18:03:17,175 - mom - INFO - hypervisor interface vdsmjsonrpcclient
2019-07-31 18:03:17,339 - mom - ERROR - Failed to initialize MOM threads
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/mom/__init__.py", line 29, in run
    hypervisor_iface = self.get_hypervisor_interface()
  File "/usr/lib/python2.7/site-packages/mom/__init__.py", line 217, in get_hypervisor_interface
    return module.instance(self.config)
  File "/usr/lib/python2.7/site-packages/mom/HypervisorInterfaces/vdsmjsonrpcclientInterface.py", line 96, in instance
    return JsonRpcVdsmClientInterface()
  File "/usr/lib/python2.7/site-packages/mom/HypervisorInterfaces/vdsmjsonrpcclientInterface.py", line 31, in __init__
    self._vdsm_api = client.connect(host="localhost")
  File "/usr/lib/python2.7/site-packages/vdsm/client.py", line 157, in connect
    raise ConnectionError(host, port, use_tls, timeout, e)
ConnectionError: Connection to localhost:54321 with use_tls=True, timeout=60 failed: [Errno 111] Connection refused
2019-07-31 18:03:22,648 - mom - INFO - MOM starting
2019-07-31 18:03:22,769 - mom.HostMonitor - INFO - Host Monitor starting
2019-07-31 18:03:22,770 - mom - INFO - hypervisor interface vdsmjsonrpcclient
2019-07-31 18:03:23,017 - mom.GuestManager - INFO - Guest Manager starting: multi-thread
2019-07-31 18:03:23,036 - mom.Policy - INFO - Loaded policy '00-defines'
2019-07-31 18:03:23,042 - mom.Policy - INFO - Loaded policy '01-parameters'
2019-07-31 18:03:23,119 - mom.Policy - INFO - Loaded policy '02-balloon'
2019-07-31 18:03:23,243 - mom.Policy - INFO - Loaded policy '03-ksm'
2019-07-31 18:03:23,396 -
mom.Policy - INFO - Loaded policy '04-cputune' 2019-07-31 18:03:23,572 - mom.Policy - INFO - Loaded policy '05-iotune' 2019-07-31 18:03:23,574 - mom.PolicyEngine - INFO - Policy Engine starting 2019-07-31 18:03:23,576 - mom.RPCServer - INFO - Using unix socket /var/run/vdsm/mom-vdsm.sock 2019-07-31 18:03:23,578 - mom.RPCServer - INFO - RPC Server starting 2019-07-31 18:03:23,581 - mom.HostMonitor - INFO - HostMonitor is ready 2019-07-31 18:03:35,119 - mom.RPCServer - INFO - ping() 2019-07-31 18:03:35,120 - mom.RPCServer - INFO - getStatistics() 2019-07-31 18:03:38,680 - mom.Controllers.KSM - INFO - Updating KSM configuration: pages_to_scan:0 merge_across_nodes:1 run:0 sleep_millisecs:0 2019-07-31 18:03:50,227 - mom.RPCServer - INFO - ping() in supervdsm.log restore-net::DEBUG::2019-07-31 18:02:52,519::cmdutils::133::root::(exec_cmd) /sbin/ip addr flush dev em2 scope global (cwd None) restore-net::DEBUG::2019-07-31 18:02:52,525::cmdutils::141::root::(exec_cmd) SUCCESS: <err> = ''; <rc> = 0 restore-net::DEBUG::2019-07-31 18:02:52,526::ifcfg::488::root::(_atomicBackup) Backed up /etc/sysconfig/network-scripts/ifcfg-em2 restore-net::DEBUG::2019-07-31 18:02:52,529::ifcfg::578::root::(writeConfFile) Writing to file /etc/sysconfig/network-scripts/ifcfg-em2 configuration: # Generated by VDSM version 4.30.17.1 DEVICE=em2 MASTER=bond1 SLAVE=yes ONBOOT=yes MTU=1500 DEFROUTE=no NM_CONTROLLED=no IPV6INIT=no restore-net::WARNING::2019-07-31 18:02:52,530::ifcfg::270::root::(_addSourceRoute) Invalid input for source routing: name=bond1, addr=None, netmask=None, gateway=None netlink/events::DEBUG::2019-07-31 18:02:52,535::concurrent::193::root::(run) START thread <Thread(netlink/events, started daemon 140470423701248)> (func=<bound method Monitor._scan of <vdsm.network.netlink.monitor.Monitor object at 0x7fc1e138c890>>, args=(), kwargs={}) restore-net::DEBUG::2019-07-31 18:02:52,538::cmdutils::133::root::(exec_cmd) /usr/bin/systemd-run --scope 
--unit=227798e3-2979-41b9-bc4a-8c7084d714e7 --slice=vdsm-dhclient /sbin/ifup bond1 (cwd None) restore-net::DEBUG::2019-07-31 18:02:54,956::cmdutils::141::root::(exec_cmd) SUCCESS: <err> = 'Running scope as unit 227798e3-2979-41b9-bc4a-8c7084d714e7.scope.\n'; <rc> = 0 netlink/events::DEBUG::2019-07-31 18:02:57,012::concurrent::196::root::(run) FINISH thread <Thread(netlink/events, stopped daemon 140470423701248)> netlink/events::DEBUG::2019-07-31 18:02:57,014::concurrent::193::root::(run) START thread <Thread(netlink/events, started daemon 140470423701248)> (func=<bound method Monitor._scan of <vdsm.network.netlink.monitor.Monitor object at 0x7fc1e1312890>>, args=(), kwargs={}) netlink/events::DEBUG::2019-07-31 18:02:57,015::concurrent::196::root::(run) FINISH thread <Thread(netlink/events, stopped daemon 140470423701248)> restore-net::INFO::2019-07-31 18:02:57,016::netconfpersistence::69::root::(setBonding) Adding bond1({'nics': ['em1', 'em2'], 'switch': 'legacy', 'options': 'mode=2 miimon=100'}) restore-net::DEBUG::2019-07-31 18:02:57,017::cmdutils::133::root::(exec_cmd) /sbin/tc filter del dev bond1 pref 5000 (cwd None) restore-net::DEBUG::2019-07-31 18:02:57,025::cmdutils::141::root::(exec_cmd) FAILED: <err> = 'RTNETLINK answers: Invalid argument\nWe have an error talking to the kernel\n'; <rc> = 2 restore-net::DEBUG::2019-07-31 18:02:57,025::cmdutils::133::root::(exec_cmd) /sbin/tc qdisc show dev bond1 (cwd None) restore-net::DEBUG::2019-07-31 18:02:57,033::cmdutils::141::root::(exec_cmd) SUCCESS: <err> = ''; <rc> = 0 restore-net::DEBUG::2019-07-31 18:02:57,071::cmdutils::133::root::(exec_cmd) /sbin/tc qdisc show (cwd None) restore-net::DEBUG::2019-07-31 18:02:57,079::cmdutils::141::root::(exec_cmd) SUCCESS: <err> = ''; <rc> = 0 restore-net::DEBUG::2019-07-31 18:02:57,088::cmdutils::133::root::(exec_cmd) /sbin/tc class show dev p1p1 classid 0:1388 (cwd None) restore-net::DEBUG::2019-07-31 18:02:57,095::cmdutils::141::root::(exec_cmd) SUCCESS: <err> = ''; <rc> = 
0
restore-net::DEBUG::2019-07-31 18:02:57,096::cmdutils::133::root::(exec_cmd) /sbin/tc class show dev p1p2 classid 0:1388 (cwd None)
restore-net::DEBUG::2019-07-31 18:02:57,103::cmdutils::141::root::(exec_cmd) SUCCESS: <err> = ''; <rc> = 0
restore-net::DEBUG::2019-07-31 18:02:57,240::legacy_switch::484::root::(bonds_setup) Starting bondings setup. bonds={u'bond1': {u'nics': [u'em1', u'em2'], u'switch': u'legacy', u'options': u'miimon=100 mode=2'}}, in_rollback=True
restore-net::DEBUG::2019-07-31 18:02:57,274::cmdutils::133::root::(exec_cmd) /sbin/tc qdisc show (cwd None)

Hope that helps. Thanks, Pascal.

I restarted the hosted engine and things are starting to look better. Question: I am not in production yet. Is it recommended to restart the hosted engine while other VMs are running? Does putting the engine in global maintenance affect other VMs?

Putting it in global maintenance ensures only that ovirt-ha-agent/broker won't automatically attempt to restart the HE VM; it doesn't otherwise affect anything. Is power management configured on these hosts?

Yes, they have power management set up. After rebooting the hosted-engine VM, the stuck VM finally freed up and the hosts are now up. My questions are: How can I free a VM from a host from the command line? How can I disconnect a host from the hosted engine from the command line, since the GUI didn't allow me to do anything when the host or VM are in that state? I also have 3 VMs which are locked because: "Failed to run VM AAAWMC20001888 due to a failed validation: [Cannot run VM. The VM is performing an operation on a Snapshot. Please wait for the operation to finish, and try again.] (User: admin@internal-authz)." I am not sure how this happened, but how can I clear this? Thanks.

It's likely that the snapshot operations are simply taking a long time. Did they ever clear?

This bug didn't get any attention for a while; we didn't have the capacity to make any progress. If you deeply care about it or want to work on it, please assign/target accordingly.

OK, closing. Please reopen if still relevant / you want to work on it.
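On the question of freeing a host from the command line: one option (besides the ovirtsdk4 Python SDK) is the engine's REST API, which exposes a deactivate action that asks the engine to move a host into maintenance. The sketch below builds such a request with only the standard library; the engine URL, credentials, and host ID are placeholders, and actually sending the request of course requires a reachable engine and a valid TLS setup.

```python
import base64
import urllib.request


def deactivate_host_request(engine_url, host_id, user, password):
    """Build a POST request for the oVirt REST API host 'deactivate' action
    (move host to maintenance). All arguments here are placeholders for a
    real deployment; engine_url is the base, e.g.
    https://engine.example.com/ovirt-engine"""
    url = "{}/api/hosts/{}/deactivate".format(engine_url.rstrip("/"), host_id)
    token = base64.b64encode("{}:{}".format(user, password).encode()).decode()
    req = urllib.request.Request(url, data=b"<action/>", method="POST")
    req.add_header("Authorization", "Basic " + token)
    req.add_header("Content-Type", "application/xml")
    return req


# Building the request is purely local; dispatching it would be:
#   urllib.request.urlopen(req)   # only against a real, reachable engine
req = deactivate_host_request(
    "https://engine.example.com/ovirt-engine", "1234abcd",
    "admin@internal", "secret")
print(req.get_full_url())
```

Note this only asks the engine to deactivate the host; when the engine itself considers the host non-responsive (as in this bug), the API call can fail the same way the GUI does, and "Confirm host has been rebooted" or fencing remains the escape hatch.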
Created attachment 1596557 [details]
screenshot of hosts page

Description of problem:
I have 1 data center with 2 clusters, each with 3 hosts. One of my hosts shows 1 VM being migrated (it was the hosted engine); however, that hosted engine has already been successfully migrated to another host. If I open this host's detail page I can see no virtual machine assigned to it, so it looks like just a counter or some event that needs to be cleared. I cannot put that host in maintenance mode, or even reboot it from the web interface. Rebooting it from the CLI doesn't clear that field.

Version-Release number of selected component (if applicable):
Software Version: 4.3.3.5-1.el7

How reproducible:
Always; whatever I try, it is still there.

Steps to Reproduce:
1.
2.
3.

Actual results:
In the events log I get: "VDSM d1-c1-v1 command Get Host Capabilities failed: Message timeout which can be caused by communication issues"

Expected results:

Additional info:
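Both the "Get Host Capabilities failed: Message timeout" event above and the mom.log "Connection refused" traceback boil down to something not reaching vdsm over TCP (vdsm listens on port 54321). A quick first check from the engine host is a plain TCP connect; this is a minimal stdlib sketch, with the host name taken from the event as an example:

```python
import socket

VDSM_PORT = 54321  # vdsm's default listening port


def port_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.
    A refused or timed-out connect at this level matches what both the MOM
    traceback and the engine's 'Message timeout' event report."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# e.g., from the engine host, against the stuck hypervisor:
#   port_reachable("d1-c1-v1", VDSM_PORT)
```

This only proves TCP reachability; a host can pass this check and still be "non responsive" to the engine if vdsm accepts connections but fails at the TLS or jsonrpc layer.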