| Summary: | VDSNetworkException: Timeout during xml-rpc call when adding host | ||||||||
|---|---|---|---|---|---|---|---|---|---|
| Product: | [oVirt] ovirt-engine | Reporter: | Michael Burman <mburman> | ||||||
| Component: | BLL.Network | Assignee: | Dan Kenigsberg <danken> | ||||||
| Status: | CLOSED DUPLICATE | QA Contact: | Michael Burman <mburman> | ||||||
| Severity: | urgent | Docs Contact: | |||||||
| Priority: | unspecified | ||||||||
| Version: | 4.0.5.2 | CC: | bugs, fdeutsch, masayag, mburman, michal.skrivanek, obockows, pkliczew, ycui | ||||||
| Target Milestone: | ovirt-4.0.5 | Keywords: | Regression | ||||||
| Target Release: | --- | Flags: | gklein:
ovirt-4.0.z?
gklein: blocker? mburman: planning_ack? mburman: devel_ack? mburman: testing_ack? |
||||||
| Hardware: | x86_64 | ||||||||
| OS: | Linux | ||||||||
| Whiteboard: | |||||||||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |||||||
| Doc Text: | Story Points: | --- | |||||||
| Clone Of: | Environment: | ||||||||
| Last Closed: | 2016-10-30 09:24:25 UTC | Type: | Bug | ||||||
| Regression: | --- | Mount Type: | --- | ||||||
| Documentation: | --- | CRM: | |||||||
| Verified Versions: | Category: | --- | |||||||
| oVirt Team: | Network | RHEL 7.3 requirements from Atomic Host: | |||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||
| Attachments: |
|
||||||||
2016-10-25 10:01:45,363 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.HostSetupNetworksVDSCommand] (org.ovirt.thread.pool-6-thread-45) [409fafa2] START, HostSetupNetworksVDSCommand(HostName = orchid-vds2.qa.lab.tlv.redhat.com, HostSetupNetworksVdsCommandParameters:{runAsync='true', hostId='9020e220-a5db-4393-9fe3-a595dd0faa38', vds='Host[orchid-vds2.qa.lab.tlv.redhat.com,9020e220-a5db-4393-9fe3-a595dd0faa38]', rollbackOnFailure='true', connectivityTimeout='120', networks='[HostNetwork:{defaultRoute='true', bonding='false', networkName='ovirtmgmt', nicName='enp4s0', vlan='null', mtu='0', vmNetwork='true', stp='false', properties='null', ipv4BootProtocol='DHCP', ipv4Address='null', ipv4Netmask='null', ipv4Gateway='null', ipv6BootProtocol='NONE', ipv6Address='null', ipv6Prefix='null', ipv6Gateway='null', switchType='LEGACY'}]', removedNetworks='[]', bonds='[]', removedBonds='[]'}), log id: 76324159
2016-10-25 10:01:45,369 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.HostSetupNetworksVDSCommand] (org.ovirt.thread.pool-6-thread-45) [409fafa2] FINISH, HostSetupNetworksVDSCommand, log id: 76324159
2016-10-25 10:01:45,398 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.PollVDSCommand] (org.ovirt.thread.pool-6-thread-45) [409fafa2] Policy reset required for network reconfiguration
2016-10-25 10:01:45,589 INFO [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) [] Connecting to orchid-vds2.qa.lab.tlv.redhat.com/10.35.128.15
2016-10-25 10:01:50,345 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.PollVDSCommand] (org.ovirt.thread.pool-6-thread-45) [409fafa2] Command 'PollVDSCommand(HostName = orchid-vds2.qa.lab.tlv.redhat.com, VdsIdVDSCommandParametersBase:{runAsync='true', hostId='9020e220-a5db-4393-9fe3-a595dd0faa38'})' execution failed: VDSGenericException: VDSNetworkException: Timeout during xml-rpc call
2016-10-25 10:01:50,346 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.PollVDSCommand] (org.ovirt.thread.pool-6-thread-45) [409fafa2] Timeout waiting for VDSM response: Internal timeout occured
2016-10-25 10:01:52,851 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.PollVDSCommand] (org.ovirt.thread.pool-6-thread-45) [409fafa2] Command 'PollVDSCommand(HostName = orchid-vds2.qa.lab.tlv.redhat.com, VdsIdVDSCommandParametersBase:{runAsync='true', hostId='9020e220-a5db-4393-9fe3-a595dd0faa38'})' execution failed: VDSGenericException: VDSNetworkException: Timeout during xml-rpc call
Vdsm receives the request, and acts upon it after (quite a long) minute
jsonrpc.Executor/3::DEBUG::2016-10-25 10:01:45,855::__init__::530::jsonrpc.JsonRpcServer::(_handle_request) Calling 'Host.setupNetworks' in bridge with {'bondings': {}, 'networks': {'ovirtmgmt': {'ipv6autoconf': False, 'nic': 'enp4s0', 'mtu': 1500, 'switch': 'legacy', 'dhcpv6': False, 'STP': 'no', 'bridged': 'true', 'defaultRoute': True, 'bootproto': 'dhcp'}}, 'options': {'connectivityCheck': 'true', 'connectivityTimeout': 120}}
...
jsonrpc.Executor/3::DEBUG::2016-10-25 10:02:41,802::__init__::555::jsonrpc.JsonRpcServer::(_handle_request) Return 'Host.setupNetworks' in bridge with {'message': 'Done', 'code': 0}
jsonrpc.Executor/3::INFO::2016-10-25 10:02:41,802::__init__::513::jsonrpc.JsonRpcServer::(_serveRequest) RPC call Host.setupNetworks succeeded in 55.94 seconds
and answers pings
jsonrpc.Executor/0::DEBUG::2016-10-25 10:02:43,177::__init__::530::jsonrpc.JsonRpcServer::(_handle_request) Calling 'Host.ping' in bridge with {}
jsonrpc.Executor/0::DEBUG::2016-10-25 10:02:43,177::__init__::555::jsonrpc.JsonRpcServer::(_handle_request) Return 'Host.ping' in bridge with True
7 seconds before Engine declated timeout.
Which version of vdsm-jsonrpc-java was used here?
vdsm-jsonrpc-java-1.2.6-1.el7ev.noarch The one that coming from http://bob.eng.lab.tlv.redhat.com/builds/4.0/4.0.5-3/el$releasever Piotr, could you take a look? It seems awfully familiar. Dan based on the logs I see that:
Engine connected to vdsm:
2016-10-25 10:01:42,400 INFO [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) [] Connecting to orchid-vds2.qa.lab.tlv.redhat.com/10.35.128.15
engine initiated setup networks:
2016-10-25 10:01:45,149 INFO [org.ovirt.engine.core.bll.network.host.HostSetupNetworksCommand] (org.ovirt.thread.pool-6-thread-45) [409fafa2] Lock Acquired to object 'EngineLock:{exclusiveLocks='[HOST_NETWORK9020e220-a5db-4393-9fe3-a595dd0faa38=<HOST_NETWORK, ACTION_TYPE_FAILED_SETUP_NETWORKS_OR_REFRESH_IN_PROGRESS>]', sharedLocks='null'}'
vdsm received setupNetworks call:
jsonrpc.Executor/3::DEBUG::2016-10-25 10:01:45,855::__init__::530::jsonrpc.JsonRpcServer::(_handle_request) Calling 'Host.setupNetworks' in bridge with {'bondings': {}, 'networks': {'ovirtmgmt': {'ipv6autoconf': False, 'nic': 'enp4s0', 'mtu': 1500, 'switch': 'legacy', 'dhcpv6': False, 'STP': 'no', 'bridged': 'true', 'defaultRoute': True, 'bootproto': 'dhcp'}}, 'options': {'connectivityCheck': 'true', 'connectivityTimeout': 120}}
vdsm responsed to the call:
jsonrpc.Executor/3::DEBUG::2016-10-25 10:02:41,802::__init__::555::jsonrpc.JsonRpcServer::(_handle_request) Return 'Host.setupNetworks' in bridge with {'message': 'Done', 'code': 0}
jsonrpc.Executor/3::INFO::2016-10-25 10:02:41,802::__init__::513::jsonrpc.JsonRpcServer::(_serveRequest) RPC call Host.setupNetworks succeeded in 55.94 seconds
engine triggered policy reset after the setupNetworks:
2016-10-25 10:02:58,158 ERROR [org.ovirt.engine.core.bll.network.host.HostSetupNetworksCommand] (org.ovirt.thread.pool-6-thread-45) [409fafa2] Command 'org.ovirt.engine.core.bll.network.host.HostSetupNetworksCommand' failed: EngineException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException: VDSGenericException: VDSNetworkException: Policy reset (Failed with error VDS_NETWORK_ERROR and code 5022)
engine declared host as nonResponsive:
2016-10-25 10:02:58,160 WARN [org.ovirt.engine.core.vdsbroker.VdsManager] (org.ovirt.thread.pool-6-thread-22) [409fafa2] Host 'orchid-vds2.qa.lab.tlv.redhat.com' is not responding.
The timeout you mentioned is poll timeout not verb timeout. We know that after post second policy reset it seems to be OK. We know that setupNetworks unlocked the host and VdsManager declared it as not responsive. I wonder whether we have a race between setupNetworks and host monitoring.
Created attachment 1214543 [details]
detailed engine.log
Attached is the detailed engine.log which contains both engine in TRACE log level and jsonrpc in DEBUG.
(In reply to Piotr Kliczewski from comment #4) > I wonder whether we have a race between setupNetworks and host monitoring. We very well might have, but then this is not 100% reproducible as the description says. So I wonder...does it really reproduce all the time in manual scenarios? It is 100% reproducible for me at least) (In reply to Michal Skrivanek from comment #6) > (In reply to Piotr Kliczewski from comment #4) > > I wonder whether we have a race between setupNetworks and host monitoring. > > We very well might have, but then this is not 100% reproducible as the > description says. So I wonder...does it really reproduce all the time in > manual scenarios? I don't think it is reproducible on all environments, but I just got one more report from RHEV Storage QE that it was reproduced in a different environment. Looks like it the same issue - https://bugzilla.redhat.com/show_bug.cgi?id=1388778 I would say it is not the same issue but BZ #1388778 fixes this issue. (In reply to Piotr Kliczewski from comment #10) > I would say it is not the same issue but BZ #1388778 fixes this issue. So we can close-duplicate, as I've added a note there to check this scenario when verifying it. *** This bug has been marked as a duplicate of bug 1388778 *** |
Created attachment 1213739 [details] logs Description of problem: VDSNetworkException: Timeout during xml-rpc call when adding host. When adding new host or changing the management network from VM to non-VM(bridge less) we failing with the next error exception: 2016-10-25 10:02:50,589 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.PollVDSCommand] (org.ovirt.thread.pool-6-thread-45) [409fafa2] Command 'PollVDSCommand(HostName = orchid-vds2.qa.lab.tlv.redhat.com, VdsIdVD SCommandParametersBase:{runAsync='true', hostId='9020e220-a5db-4393-9fe3-a595dd0faa38'})' execution failed: VDSGenericException: VDSNetworkException: Timeout during xml-rpc call This will end up with failure to configure management network on the host, after few minutes the host will be auto recovered and will go up, BUT, the management network won't be persisted, which will cause the host not to come up after reboot. Version-Release number of selected component (if applicable): 4.0.5.2-0.2.el7ev How reproducible: 100 Steps to Reproduce: 1. Add new host to rhv-m 4.0.5.2-0.2.el7ev Actual results: - Failed to configure management network on the host - Host was auto recovered after few minutes - Management network wasn't persisted - After reboot host didn't recovered and 'ovirtmgmt' wasn't restored Expected results: Should work as expected Additional info: