+++ This bug is a downstream clone. The original bug is: +++ +++ bug 1723668 +++ ====================================================================== +++ This bug was initially created as a clone of Bug #1722746 +++ Description of problem: Occasionally this kind of messages can be seen in the manager: VDSM rhvh01.example.com command Get Host Statistics failed: Internal JSON-RPC error: {'reason': '[Errno 19] vnet111 is not present in the system'} This is logged in vdsm.log: 2019-06-17 22:54:57,723+0200 INFO (jsonrpc/4) [api.virt] START shutdown(delay=u'30', message=u'System Administrator has initiated shutdown of this Virtual Machine. Virtual Machine is shutting down.', reboot=False, timeout=None, force=False) from=::ffff:10.0.0.2,43422, flow_id=515ed70d-84e4-41ca-b0bd-166413a99c26, vmId=96328f4c-b30e-4b54-95a9-a437fabeebe5 (api:48) 2019-06-17 22:54:57,724+0200 INFO (jsonrpc/4) [virt.vm] (vmId='96328f4c-b30e-4b54-95a9-a437fabeebe5') Starting powerdown (vmpowerdown:70) 2019-06-17 22:54:57,725+0200 INFO (jsonrpc/4) [api.virt] FINISH shutdown return={'status': {'message': 'Machine shutting down', 'code': 0}} from=::ffff:10.0.0.2,43422, flow_id=515ed70d-84e4-41ca-b0bd-166413a99c26, vmId=96328f4c-b30e-4b54-95a9-a437fabeebe5 (api:54) 2019-06-17 22:54:57,725+0200 INFO (jsonrpc/4) [jsonrpc.JsonRpcServer] RPC call VM.shutdown succeeded in 0.00 seconds (__init__:312) 2019-06-17 22:55:08,255+0200 INFO (jsonrpc/5) [api.host] START getAllVmStats() from=::ffff:10.0.0.2,43422 (api:48) 2019-06-17 22:55:08,378+0200 INFO (jsonrpc/5) [api.host] FINISH getAllVmStats return={'status': {'message': 'Done', 'code': 0}, 'statsList': (suppressed)} from=::ffff:10.0.0.2,43422 (api:54) 2019-06-17 22:55:08,405+0200 INFO (jsonrpc/5) [jsonrpc.JsonRpcServer] RPC call Host.getAllVmStats succeeded in 0.16 seconds (__init__:312) 2019-06-17 22:55:11,173+0200 INFO (jsonrpc/2) [api.host] START getStats() from=::ffff:10.0.0.2,43422 (api:48) 2019-06-17 22:55:11,476+0200 INFO (jsonrpc/2) [api.host] FINISH getStats error=[Errno 19] vnet111 is not present in the system from=::ffff:10.0.0.2,43422 (api:52) 2019-06-17 22:55:11,476+0200 ERROR (jsonrpc/2) [jsonrpc.JsonRpcServer] Internal server error (__init__:350) Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line 345, in _handle_request res = method(**params) File "/usr/lib/python2.7/site-packages/vdsm/rpc/Bridge.py", line 194, in _dynamicMethod result = fn(*methodArgs) File "<string>", line 2, in getStats File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 50, in method ret = func(*args, **kwargs) File "/usr/lib/python2.7/site-packages/vdsm/API.py", line 1428, in getStats multipath=True)} File "/usr/lib/python2.7/site-packages/vdsm/host/api.py", line 50, in get_stats decStats = stats.produce(first_sample, last_sample) File "/usr/lib/python2.7/site-packages/vdsm/host/stats.py", line 72, in produce stats.update(get_interfaces_stats()) File "/usr/lib/python2.7/site-packages/vdsm/host/stats.py", line 154, in get_interfaces_stats return net_api.network_stats() File "/usr/lib/python2.7/site-packages/vdsm/network/api.py", line 63, in network_stats return netstats.report() File "/usr/lib/python2.7/site-packages/vdsm/network/netstats.py", line 32, in report stats = link_stats.report() File "/usr/lib/python2.7/site-packages/vdsm/network/link/stats.py", line 34, in report stats[i.device] = i.statistics() File "/usr/lib/python2.7/site-packages/vdsm/network/link/iface.py", line 232, in statistics 'state': 'up' if self.is_oper_up() else 'down', File "/usr/lib/python2.7/site-packages/vdsm/network/link/iface.py", line 193, in is_oper_up properties = self.properties() File "/usr/lib/python2.7/site-packages/vdsm/network/link/iface.py", line 165, in properties info = link.get_link(self._dev) File "/usr/lib/python2.7/site-packages/vdsm/network/netlink/link.py", line 38, in get_link name) IOError: [Errno 19] vnet111 is not present in the system 2019-06-17 22:55:11,478+0200 INFO (jsonrpc/2) [jsonrpc.JsonRpcServer] RPC call Host.getStats failed (error -32603) in 0.31 seconds (__init__:312) Version-Release number of selected component (if applicable): vdsm-4.30.13-4.el7ev.x86_64 How reproducible: Ocasionally, when powering off VMs Steps to Reproduce: 1. Power off VMs 2. When it coincides with a getStats of the host, this error can be seen some times Actual results: Get Host Statistics failed is reported in the manager. Expected results: No errors of getting statistics of a VM in the process of shutdown. Additional info: This can be seen in engine.log: 2019-06-17 22:54:57,712+02 INFO [org.ovirt.engine.core.bll.ShutdownVmCommand] (default task-10675) [515ed70d-84e4-41ca-b0bd-166413a99c26] Running command: ShutdownVmCommand internal: false. Entities affected : ID: 96328f4c-b30e-4b54-95a9-a437fabeebe5 Type: VMAction group SHUT_DOWN_VM with role type USER 2019-06-17 22:54:57,715+02 INFO [org.ovirt.engine.core.bll.ShutdownVmCommand] (default task-10675) [515ed70d-84e4-41ca-b0bd-166413a99c26] Entered (VM 'VM01'). 2019-06-17 22:54:57,717+02 INFO [org.ovirt.engine.core.bll.ShutdownVmCommand] (default task-10675) [515ed70d-84e4-41ca-b0bd-166413a99c26] Sending shutdown command for VM 'VM01'. 2019-06-17 22:54:57,718+02 INFO [org.ovirt.engine.core.vdsbroker.DestroyVmVDSCommand] (default task-10675) [515ed70d-84e4-41ca-b0bd-166413a99c26] START, DestroyVmVDSCommand( DestroyVmVDSCommandParameters:{hostId='3c3226f2-30d5-4068-bda5-51241cd8355b', vmId='96328f4c-b30e-4b54-95a9-a437fabeebe5', secondsToWait='30', gracefully='true', reason='', ignoreNoVm='false'}), log id: 5330cafc 2019-06-17 22:54:57,721+02 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (default task-10675) [515ed70d-84e4-41ca-b0bd-166413a99c26] START, DestroyVDSCommand(HostName = rhvh01.example.com, DestroyVmVDSCommandParameters:{hostId='3c3226f2-30d5-4068-bda5-51241cd8355b', vmId='96328f4c-b30e-4b54-95a9-a437fabeebe5', secondsToWait='30', gracefully='true', reason='', ignoreNoVm='false'}), log id: 600f9351 2019-06-17 22:54:57,725+02 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (default task-10675) [515ed70d-84e4-41ca-b0bd-166413a99c26] FINISH, DestroyVDSCommand, log id: 600f9351 2019-06-17 22:54:57,728+02 INFO [org.ovirt.engine.core.vdsbroker.DestroyVmVDSCommand] (default task-10675) [515ed70d-84e4-41ca-b0bd-166413a99c26] FINISH, DestroyVmVDSCommand, return: PoweringDown, log id: 5330cafc 2019-06-17 22:54:57,755+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-10675) [515ed70d-84e4-41ca-b0bd-166413a99c26] EVENT_ID: USER_INITIATED_SHUTDOWN_VM(73), VM shutdown initiated by user01@example.local-authz on VM VM01 (Host: rhvh01.example.com). 2019-06-17 22:55:11,478+02 WARN [org.ovirt.engine.core.vdsbroker.vdsbroker.GetStatsAsyncVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-20) [] Unexpected return value: Status [code=-32603, message=Internal JSON-RPC error: {'reason': '[Errno 19] vnet111 is not present in the system'}] 2019-06-17 22:55:11,478+02 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetStatsAsyncVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-20) [] Failed in 'Get Host Statistics' method 2019-06-17 22:55:11,479+02 WARN [org.ovirt.engine.core.vdsbroker.vdsbroker.GetStatsAsyncVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-20) [] Unexpected return value: Status [code=-32603, message=Internal JSON-RPC error: {'reason': '[Errno 19] vnet111 is not present in the system'}] 2019-06-17 22:55:11,482+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-20) [] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM rhvh01.example.com command Get Host Statistics failed: Internal JSON-RPC error: {'reason': '[Errno 19] vnet111 is not present in the system'} 2019-06-17 22:55:11,482+02 ERROR [org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring] (EE-ManagedThreadFactory-engineScheduled-Thread-20) [] Unable to GetStats: VDSErrorException: VDSGenericException: VDSErrorException: Failed to Get Host Statistics, error = Internal JSON-RPC error: {'reason': '[Errno 19] vnet111 is not present in the system'}, code = -32603 2019-06-17 22:56:10,292+02 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (EE-ManagedThreadFactory-engineScheduled-Thread-74) [] VM '96328f4c-b30e-4b54-95a9-a437fabeebe5' was reported as Down on VDS '3c3226f2-30d5-4068-bda5-51241cd8355b'(rhvh01.example.com) 2019-06-17 22:56:10,293+02 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-74) [] START, DestroyVDSCommand(HostName = rhvh01.example.com, DestroyVmVDSCommandParameters:{hostId='3c3226f2-30d5-4068-bda5-51241cd8355b', vmId='96328f4c-b30e-4b54-95a9-a437fabeebe5', secondsToWait='0', gracefully='false', reason='', ignoreNoVm='true'}), log id: 54f67c5f 2019-06-17 22:56:10,299+02 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-74) [] FINISH, DestroyVDSCommand, log id: 54f67c5f 2019-06-17 22:56:10,299+02 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (EE-ManagedThreadFactory-engineScheduled-Thread-74) [] VM '96328f4c-b30e-4b54-95a9-a437fabeebe5'(VM01) moved from 'PoweringDown' --> 'Down' 2019-06-17 22:56:10,318+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-74) [] EVENT_ID: VM_DOWN(61), VM VM01 is down. Exit message: Admin shut down from the engine (Originally by Juan Orti Alcaine)
The statistic gathering is not atomic in nature, therefore, the entities that exist at the beginning of the transaction may not exist during the whole transaction period. Therefore, statistics gathering may fail just because one interface disappeared in the middle. We could make this tolerable by absorbing such errors at the interface statistics level. (Originally by edwardh)
*** Bug 1722746 has been marked as a duplicate of this bug. *** (Originally by Michael Burman)
Verified on - vdsm-4.30.29-1.el7ev.x86_64 The issue was the report. Gathering a list of devices, and executing a report on each one of them would throw an error if a device would be disconnected after the device list was gathered, but the report not yet executed. Verified by manually inserting sleep after the device was fetched from the list, but before the statistics report started - then manually removing the selected device.
Thank you Bell with helping us verifying this bug.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:3009