Description of problem: Getting "Network error while communicating to Host" once the Set up Host networks command is called. Unable to Refresh Capabilities for the host, once we receive the error engine.log 2015-04-02 18:03:01,597 INFO [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) [] Connecting to /10.70.43.181 2015-04-02 18:03:23,096 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand] (org.ovirt.thread.pool-8-thread-42) [695670cc] Command 'org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand' return value 'org.ovirt.engine.core.vdsbroker .vdsbroker.VDSInfoReturnForXmlRpc@56477f9c' 2015-04-02 18:03:23,096 WARN [org.ovirt.vdsm.jsonrpc.client.internal.ResponseWorker] (ResponseWorker) [] Exception thrown during messa ge processing 2015-04-02 18:03:23,098 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand] (org.ovirt.thread.pool-8-thread-42) [695670cc] HostName = 10.70.43.181 2015-04-02 18:03:23,100 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand] (org.ovirt.thread.pool-8-thread-42) [695670cc] Command 'GetCapabilitiesVDSCommand(HostName = 10.70.43.181, HostId = bf1433c3-cde0-4ea7-9b08-2a9950efcb07, vds=Host[10.70.4 3.181,bf1433c3-cde0-4ea7-9b08-2a9950efcb07])' execution failed: VDSGenericException: VDSNetworkException: Message timeout which can be caused by communication issues 2015-04-02 18:03:23,135 INFO [org.ovirt.engine.core.bll.HandleVdsVersionCommand] (org.ovirt.thread.pool-8-thread-42) [1f374caf] Runnin g command: HandleVdsVersionCommand internal: true. Entities affected : ID: bf1433c3-cde0-4ea7-9b08-2a9950efcb07 Type: VDS 2015-04-02 18:03:23,137 ERROR [org.ovirt.engine.core.bll.RefreshHostCapabilitiesCommand] (org.ovirt.thread.pool-8-thread-42) [1f374caf] Command 'org.ovirt.engine.core.bll.RefreshHostCapabilitiesCommand' failed: VDSGenericException: VDSNetworkException: Message timeout w hich can be caused by communication issues 2015-04-02 18:03:23,138 ERROR [org.ovirt.engine.core.bll.RefreshHostCapabilitiesCommand] (org.ovirt.thread.pool-8-thread-42) [1f374caf] Exception: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException: VDSGenericException: VDSNetworkException: Message timeout wh ich can be caused by communication issues at org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase.proceedProxyReturnValue(BrokerCommandBase.java:183) [vdsbroker.j ar:] at org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand.executeVdsBrokerCommand(GetCapabilitiesVDSCommand.java:1 6) [vdsbroker.jar:] at org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand.executeVDSCommand(VdsBrokerCommand.java:101) [vdsbroker.jar:] at org.ovirt.engine.core.vdsbroker.VDSCommandBase.executeCommand(VDSCommandBase.java:55) [vdsbroker.jar:] at org.ovirt.engine.core.dal.VdcCommandBase.execute(VdcCommandBase.java:33) [dal.jar:] Attached trimmed vdsm.log Detector thread::WARNING::2015-04-02 17:50:52,487::protocoldetector::122::ProtocolDetector.Detector::(handle_read) Unrecognized protocol: '\n\n\nCONNECT\n' Version-Release number of selected component (if applicable): ovirt-master, vdsm-master (from nightly repo) How reproducible: Steps to Reproduce: 1. Add a host to a cluster (in my case with only gluster service enabled) 2. Add a nic to host, and assign network to second nic using Set Up host networks 3. Save Actual results: SetUpHostNetworks UI stays for in processing stage for 1-2 minutes and then gives an error about network error while communicating to host. No further operations are possible on host Expected results: Additional info:
Created attachment 1010150 [details] vdsm
Saw this reproduced on the setups of aaviram and mburman as well. Seems like one of the symptoms is that following a Setup Networks command, running getVdsCaps (either manually via commandline or via the engine's Refresh Capabilities buttom) hangs. Since Setup Networks always runs getVdsCaps in the end, clearly if that hangs then Setup Networks should time out... Michael, you said restarting the service gets getVdsCaps working again until the next Setup Networks command? Just making sure we're aware of all the symptoms to make sure the fix is right.
Hi Lior, Yes i did, but now getVdsCaps working without need to restart and 'refresh capabilities' working as well, but all SN operations are stuck in process much more then 2 minutes in my case. vdsmd service is active, but there is errors: [root@red-vds2 ~]# systemctl status vdsmd vdsmd.service - Virtual Desktop Server Manager Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled) Active: active (running) since Mon 2015-04-06 09:32:37 IDT; 2h 37min ago Process: 10869 ExecStopPost=/usr/libexec/vdsm/vdsmd_init_common.sh --post-stop (code=exited, status=0/SUCCESS) Process: 10872 ExecStartPre=/usr/libexec/vdsm/vdsmd_init_common.sh --pre-start (code=exited, status=0/SUCCESS) Main PID: 10967 (vdsm) CGroup: /system.slice/vdsmd.service ├─10967 /usr/bin/python /usr/share/vdsm/vdsm ├─11055 /usr/libexec/ioprocess --read-pipe-fd 61 --write-pipe-fd 60 --max-threads 10 --max-queued-requests 10 └─11103 /usr/libexec/ioprocess --read-pipe-fd 76 --write-pipe-fd 75 --max-threads 10 --max-queued-requests 10 Apr 06 09:37:26 red-vds2.qa.lab.tlv.redhat.com vdsm[10967]: vdsm vm.Vm ERROR vmId=`9fc4b616-8250-4003-9df3-01bbce0a10d6`::Alias not found for device type graphics during migration at destination host Apr 06 09:37:30 red-vds2.qa.lab.tlv.redhat.com vdsm[10967]: vdsm vm.Vm ERROR vmId=`9fc4b616-8250-4003-9df3-01bbce0a10d6`::desktopLock failed Traceback (most recent call last): File "/usr/share/vdsm/virt/guestagent.py", line 364, in desktopLock... Apr 06 09:37:38 red-vds2.qa.lab.tlv.redhat.com vdsm[10967]: vdsm periodic.VmDispatcher WARNING could not run <class 'virt.periodic.UpdateVolumes'> on ['9fc4b616-8250-4003-9df3-01bbce0a10d6'] Apr 06 10:02:05 red-vds2.qa.lab.tlv.redhat.com vdsm[10967]: vdsm BindingXMLRPC.RequestHandler ERROR Bad request syntax ('nfo.json\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00...0\x00\x00\x00 Apr 06 10:02:05 red-vds2.qa.lab.tlv.redhat.com vdsm[10967]: vdsm BindingXMLRPC.RequestHandler ERROR Bad request syntax ('iet></Item><Item><rasd:Caption>1024 MB of memory</rasd:Caption><rasd:Descrip.../rasd:Allocat Apr 06 10:02:05 red-vds2.qa.lab.tlv.redhat.com vdsm[10967]: vdsm BindingXMLRPC.RequestHandler ERROR Bad request syntax ('info.json\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x0...00\x00\x00\x0 Apr 06 11:02:06 red-vds2.qa.lab.tlv.redhat.com vdsm[10967]: vdsm BindingXMLRPC.RequestHandler ERROR Bad request syntax ('info.json\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x0...00\x00\x00\x0 Apr 06 11:02:06 red-vds2.qa.lab.tlv.redhat.com vdsm[10967]: vdsm BindingXMLRPC.RequestHandler ERROR Bad request syntax ('info.json\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x0...00\x00\x00\x0 Apr 06 12:02:09 red-vds2.qa.lab.tlv.redhat.com vdsm[10967]: vdsm BindingXMLRPC.RequestHandler ERROR Bad request syntax ('info.json\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x0...00\x00\x00\x0 Apr 06 12:02:09 red-vds2.qa.lab.tlv.redhat.com vdsm[10967]: vdsm BindingXMLRPC.RequestHandler ERROR Bad request syntax ('info.json\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x0...00\x00\x00\x0 Hint: Some lines were ellipsized, use -l to show in full.
There was an issue in switching policy for heartbeat during setupNetworks. The issue is not related to setupNetworks but to changes in transport configuration.
Piotr - is the issue relevant in 3.5 as well, or only 3.6?
3.6 only
Verified on - 3.6.0-0.0.master.20150412172306.git55ba764.el6 and vdsm-4.17.0-633.git7ad88bc.el7.x86_64
oVirt 3.6.0 has been released on November 4th, 2015 and should fix this issue. If problems still persist, please open a new BZ and reference this one.