Bug 1744572 - [downstream clone - 4.3.6] VDSM command Get Host Statistics failed: Internal JSON-RPC error: {'reason': '[Errno 19] vnet<x> is not present in the system'}
Summary: [downstream clone - 4.3.6] VDSM command Get Host Statistics failed: Internal ...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 4.3.0
Hardware: x86_64
OS: Linux
medium
high
Target Milestone: ovirt-4.3.6
: 4.3.6
Assignee: Bell Levin
QA Contact: Michael Burman
URL:
Whiteboard:
Depends On: 1723668
Blocks:
TreeView+ depends on / blocked
 
Reported: 2019-08-22 13:01 UTC by RHV bug bot
Modified: 2019-10-10 15:37 UTC (History)
10 users (show)

Fixed In Version: vdsm-4.30.29
Doc Type: Bug Fix
Doc Text:
Previously, while powering off a virtual machine, the RHV Manager occasionally displayed an error message that the VDSM Get Host Statistics command failed due to an Internal JSON-RPC error: {'reason': '[Errno 19] vnet111 is not present in the system'}. This error occurred because an interface disappeared while the host statistics were being gathered. The current release fixes this issue by suppressing this error under these conditions.
Clone Of: 1723668
Environment:
Last Closed: 2019-10-10 15:36:50 UTC
oVirt Team: Network
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Red Hat Knowledge Base (Solution) 4242981 0 Troubleshoot None Errors command Get Host Statistics failed: Internal JSON-RPC error: {'reason': '[Errno 19] vnet111 is not present in the... 2019-08-22 13:01:44 UTC
Red Hat Product Errata RHBA-2019:3009 0 None None None 2019-10-10 15:37:06 UTC
oVirt gerrit 101385 0 master MERGED net: Fix error if net removed during stats request 2020-11-04 12:19:51 UTC
oVirt gerrit 102606 0 None MERGED net: Fix error if net removed during stats request 2020-11-04 12:19:33 UTC

Description RHV bug bot 2019-08-22 13:01:22 UTC
+++ This bug is a downstream clone. The original bug is: +++
+++   bug 1723668 +++
======================================================================

+++ This bug was initially created as a clone of Bug #1722746 +++

Description of problem:
Occasionally this kind of messages can be seen in the manager:

VDSM rhvh01.example.com command Get Host Statistics failed: Internal JSON-RPC error: {'reason': '[Errno 19] vnet111 is not present in the system'}


This is logged in vdsm.log:

2019-06-17 22:54:57,723+0200 INFO  (jsonrpc/4) [api.virt] START shutdown(delay=u'30', message=u'System Administrator has initiated shutdown of this Virtual Machine. Virtual Machine is shutting down.', reboot=False, timeout=None, force=False) from=::ffff:10.0.0.2,43422, flow_id=515ed70d-84e4-41ca-b0bd-166413a99c26, vmId=96328f4c-b30e-4b54-95a9-a437fabeebe5 (api:48)
2019-06-17 22:54:57,724+0200 INFO  (jsonrpc/4) [virt.vm] (vmId='96328f4c-b30e-4b54-95a9-a437fabeebe5') Starting powerdown (vmpowerdown:70)
2019-06-17 22:54:57,725+0200 INFO  (jsonrpc/4) [api.virt] FINISH shutdown return={'status': {'message': 'Machine shutting down', 'code': 0}} from=::ffff:10.0.0.2,43422, flow_id=515ed70d-84e4-41ca-b0bd-166413a99c26, vmId=96328f4c-b30e-4b54-95a9-a437fabeebe5 (api:54)
2019-06-17 22:54:57,725+0200 INFO  (jsonrpc/4) [jsonrpc.JsonRpcServer] RPC call VM.shutdown succeeded in 0.00 seconds (__init__:312)
2019-06-17 22:55:08,255+0200 INFO  (jsonrpc/5) [api.host] START getAllVmStats() from=::ffff:10.0.0.2,43422 (api:48)
2019-06-17 22:55:08,378+0200 INFO  (jsonrpc/5) [api.host] FINISH getAllVmStats return={'status': {'message': 'Done', 'code': 0}, 'statsList': (suppressed)} from=::ffff:10.0.0.2,43422 (api:54)
2019-06-17 22:55:08,405+0200 INFO  (jsonrpc/5) [jsonrpc.JsonRpcServer] RPC call Host.getAllVmStats succeeded in 0.16 seconds (__init__:312)
2019-06-17 22:55:11,173+0200 INFO  (jsonrpc/2) [api.host] START getStats() from=::ffff:10.0.0.2,43422 (api:48)
2019-06-17 22:55:11,476+0200 INFO  (jsonrpc/2) [api.host] FINISH getStats error=[Errno 19] vnet111 is not present in the system from=::ffff:10.0.0.2,43422 (api:52)
2019-06-17 22:55:11,476+0200 ERROR (jsonrpc/2) [jsonrpc.JsonRpcServer] Internal server error (__init__:350)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line 345, in _handle_request
    res = method(**params)
  File "/usr/lib/python2.7/site-packages/vdsm/rpc/Bridge.py", line 194, in _dynamicMethod
    result = fn(*methodArgs)
  File "<string>", line 2, in getStats
  File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 50, in method
    ret = func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/API.py", line 1428, in getStats
    multipath=True)}
  File "/usr/lib/python2.7/site-packages/vdsm/host/api.py", line 50, in get_stats
    decStats = stats.produce(first_sample, last_sample)
  File "/usr/lib/python2.7/site-packages/vdsm/host/stats.py", line 72, in produce
    stats.update(get_interfaces_stats())
  File "/usr/lib/python2.7/site-packages/vdsm/host/stats.py", line 154, in get_interfaces_stats
    return net_api.network_stats()
  File "/usr/lib/python2.7/site-packages/vdsm/network/api.py", line 63, in network_stats
    return netstats.report()
  File "/usr/lib/python2.7/site-packages/vdsm/network/netstats.py", line 32, in report
    stats = link_stats.report()
  File "/usr/lib/python2.7/site-packages/vdsm/network/link/stats.py", line 34, in report
    stats[i.device] = i.statistics()
  File "/usr/lib/python2.7/site-packages/vdsm/network/link/iface.py", line 232, in statistics
    'state': 'up' if self.is_oper_up() else 'down',
  File "/usr/lib/python2.7/site-packages/vdsm/network/link/iface.py", line 193, in is_oper_up
    properties = self.properties()
  File "/usr/lib/python2.7/site-packages/vdsm/network/link/iface.py", line 165, in properties
    info = link.get_link(self._dev)
  File "/usr/lib/python2.7/site-packages/vdsm/network/netlink/link.py", line 38, in get_link
    name)
IOError: [Errno 19] vnet111 is not present in the system
2019-06-17 22:55:11,478+0200 INFO  (jsonrpc/2) [jsonrpc.JsonRpcServer] RPC call Host.getStats failed (error -32603) in 0.31 seconds (__init__:312)

Version-Release number of selected component (if applicable):
vdsm-4.30.13-4.el7ev.x86_64


How reproducible:
Ocasionally, when powering off VMs

Steps to Reproduce:
1. Power off VMs
2. When it coincides with a getStats of the host, this error can be seen some times

Actual results:
Get Host Statistics failed is reported in the manager.

Expected results:
No errors of getting statistics of a VM in the process of shutdown.

Additional info:
This can be seen in engine.log:

2019-06-17 22:54:57,712+02 INFO  [org.ovirt.engine.core.bll.ShutdownVmCommand] (default task-10675) [515ed70d-84e4-41ca-b0bd-166413a99c26] Running command: ShutdownVmCommand internal: false. Entities affected :  ID: 96328f4c-b30e-4b54-95a9-a437fabeebe5 Type: VMAction group SHUT_DOWN_VM with role type USER
2019-06-17 22:54:57,715+02 INFO  [org.ovirt.engine.core.bll.ShutdownVmCommand] (default task-10675) [515ed70d-84e4-41ca-b0bd-166413a99c26] Entered (VM 'VM01').
2019-06-17 22:54:57,717+02 INFO  [org.ovirt.engine.core.bll.ShutdownVmCommand] (default task-10675) [515ed70d-84e4-41ca-b0bd-166413a99c26] Sending shutdown command for VM 'VM01'.
2019-06-17 22:54:57,718+02 INFO  [org.ovirt.engine.core.vdsbroker.DestroyVmVDSCommand] (default task-10675) [515ed70d-84e4-41ca-b0bd-166413a99c26] START, DestroyVmVDSCommand( DestroyVmVDSCommandParameters:{hostId='3c3226f2-30d5-4068-bda5-51241cd8355b', vmId='96328f4c-b30e-4b54-95a9-a437fabeebe5', secondsToWait='30', gracefully='true', reason='', ignoreNoVm='false'}), log id: 5330cafc
2019-06-17 22:54:57,721+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (default task-10675) [515ed70d-84e4-41ca-b0bd-166413a99c26] START, DestroyVDSCommand(HostName = rhvh01.example.com, DestroyVmVDSCommandParameters:{hostId='3c3226f2-30d5-4068-bda5-51241cd8355b', vmId='96328f4c-b30e-4b54-95a9-a437fabeebe5', secondsToWait='30', gracefully='true', reason='', ignoreNoVm='false'}), log id: 600f9351
2019-06-17 22:54:57,725+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (default task-10675) [515ed70d-84e4-41ca-b0bd-166413a99c26] FINISH, DestroyVDSCommand, log id: 600f9351
2019-06-17 22:54:57,728+02 INFO  [org.ovirt.engine.core.vdsbroker.DestroyVmVDSCommand] (default task-10675) [515ed70d-84e4-41ca-b0bd-166413a99c26] FINISH, DestroyVmVDSCommand, return: PoweringDown, log id: 5330cafc
2019-06-17 22:54:57,755+02 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-10675) [515ed70d-84e4-41ca-b0bd-166413a99c26] EVENT_ID: USER_INITIATED_SHUTDOWN_VM(73), VM shutdown initiated by user01@example.local-authz on VM VM01 (Host: rhvh01.example.com).

2019-06-17 22:55:11,478+02 WARN  [org.ovirt.engine.core.vdsbroker.vdsbroker.GetStatsAsyncVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-20) [] Unexpected return value: Status [code=-32603, message=Internal JSON-RPC error: {'reason': '[Errno 19] vnet111 is not present in the system'}]
2019-06-17 22:55:11,478+02 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetStatsAsyncVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-20) [] Failed in 'Get Host Statistics' method
2019-06-17 22:55:11,479+02 WARN  [org.ovirt.engine.core.vdsbroker.vdsbroker.GetStatsAsyncVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-20) [] Unexpected return value: Status [code=-32603, message=Internal JSON-RPC error: {'reason': '[Errno 19] vnet111 is not present in the system'}]
2019-06-17 22:55:11,482+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-20) [] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM rhvh01.example.com command Get Host Statistics failed: Internal JSON-RPC error: {'reason': '[Errno 19] vnet111 is not present in the system'}
2019-06-17 22:55:11,482+02 ERROR [org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring] (EE-ManagedThreadFactory-engineScheduled-Thread-20) [] Unable to GetStats: VDSErrorException: VDSGenericException: VDSErrorException: Failed to Get Host Statistics, error = Internal JSON-RPC error: {'reason': '[Errno 19] vnet111 is not present in the system'}, code = -32603

2019-06-17 22:56:10,292+02 INFO  [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (EE-ManagedThreadFactory-engineScheduled-Thread-74) [] VM '96328f4c-b30e-4b54-95a9-a437fabeebe5' was reported as Down on VDS '3c3226f2-30d5-4068-bda5-51241cd8355b'(rhvh01.example.com)
2019-06-17 22:56:10,293+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-74) [] START, DestroyVDSCommand(HostName = rhvh01.example.com, DestroyVmVDSCommandParameters:{hostId='3c3226f2-30d5-4068-bda5-51241cd8355b', vmId='96328f4c-b30e-4b54-95a9-a437fabeebe5', secondsToWait='0', gracefully='false', reason='', ignoreNoVm='true'}), log id: 54f67c5f
2019-06-17 22:56:10,299+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-74) [] FINISH, DestroyVDSCommand, log id: 54f67c5f
2019-06-17 22:56:10,299+02 INFO  [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (EE-ManagedThreadFactory-engineScheduled-Thread-74) [] VM '96328f4c-b30e-4b54-95a9-a437fabeebe5'(VM01) moved from 'PoweringDown' --> 'Down'
2019-06-17 22:56:10,318+02 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-74) [] EVENT_ID: VM_DOWN(61), VM VM01 is down. Exit message: Admin shut down from the engine

(Originally by Juan Orti Alcaine)

Comment 3 RHV bug bot 2019-08-22 13:01:28 UTC
The statistic gathering is not atomic in nature, therefore, the entities that exist at the beginning of the transaction may not exist during the whole transaction period.
Therefore, statistics gathering may fail just because one interface disappeared in the middle.

We could make this tolerable by absorbing such errors at the interface statistics level.

(Originally by edwardh)

Comment 6 RHV bug bot 2019-08-22 13:01:33 UTC
*** Bug 1722746 has been marked as a duplicate of this bug. ***

(Originally by Michael Burman)

Comment 10 Bell Levin 2019-09-03 07:58:57 UTC
Verified on - vdsm-4.30.29-1.el7ev.x86_64

The issue was the report. Gathering a list of devices, and executing a report on each one of them would throw an error if a device would be disconnected after the device list was gathered, but the report not yet executed.

Verified by manually inserting sleep after the device was fetched from the list, but before the statistics report started - then manually removing the selected device.

Comment 11 Michael Burman 2019-09-03 08:19:33 UTC
Thank you Bell with helping us verifying this bug.

Comment 14 errata-xmlrpc 2019-10-10 15:36:50 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:3009


Note You need to log in before you can comment on or make changes to this bug.