Bug 1208525 - Host in Connecting state once SetupNetworks is called
Summary: Host in Connecting state once SetupNetworks is called
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: oVirt
Classification: Retired
Component: ovirt-engine-webadmin
Version: 3.6
Hardware: Unspecified
OS: Unspecified
unspecified
high
Target Milestone: ---
: 3.6.0
Assignee: Piotr Kliczewski
QA Contact: Michael Burman
URL:
Whiteboard: infra
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2015-04-02 12:51 UTC by Sahina Bose
Modified: 2016-02-10 19:30 UTC (History)
12 users (show)

Fixed In Version: ovirt-engine-3.6.0-0.0.master.20150412172306.git55ba764
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-11-04 11:29:36 UTC
oVirt Team: Infra
Embargoed:


Attachments (Terms of Use)
vdsm (112.08 KB, text/plain)
2015-04-02 12:52 UTC, Sahina Bose
no flags Details


Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 39517 0 master MERGED reset of connection policy not restored properly after setupNetworks Never
oVirt gerrit 39601 0 master MERGED jsonrpc: reset heartbeat during network setup Never

Description Sahina Bose 2015-04-02 12:51:38 UTC
Description of problem:

Getting "Network error while communicating to Host" once the Set up Host networks command is called.
Unable to Refresh Capabilities for the host, once we receive the error

engine.log 

2015-04-02 18:03:01,597 INFO  [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) [] Connecting to /10.70.43.181
2015-04-02 18:03:23,096 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand] (org.ovirt.thread.pool-8-thread-42)
 [695670cc] Command 'org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand' return value 'org.ovirt.engine.core.vdsbroker
.vdsbroker.VDSInfoReturnForXmlRpc@56477f9c'
2015-04-02 18:03:23,096 WARN  [org.ovirt.vdsm.jsonrpc.client.internal.ResponseWorker] (ResponseWorker) [] Exception thrown during messa
ge processing
2015-04-02 18:03:23,098 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand] (org.ovirt.thread.pool-8-thread-42)
 [695670cc] HostName = 10.70.43.181
2015-04-02 18:03:23,100 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand] (org.ovirt.thread.pool-8-thread-42)
 [695670cc] Command 'GetCapabilitiesVDSCommand(HostName = 10.70.43.181, HostId = bf1433c3-cde0-4ea7-9b08-2a9950efcb07, vds=Host[10.70.4
3.181,bf1433c3-cde0-4ea7-9b08-2a9950efcb07])' execution failed: VDSGenericException: VDSNetworkException: Message timeout which can be 
caused by communication issues
2015-04-02 18:03:23,135 INFO  [org.ovirt.engine.core.bll.HandleVdsVersionCommand] (org.ovirt.thread.pool-8-thread-42) [1f374caf] Runnin
g command: HandleVdsVersionCommand internal: true. Entities affected :  ID: bf1433c3-cde0-4ea7-9b08-2a9950efcb07 Type: VDS
2015-04-02 18:03:23,137 ERROR [org.ovirt.engine.core.bll.RefreshHostCapabilitiesCommand] (org.ovirt.thread.pool-8-thread-42) [1f374caf]
 Command 'org.ovirt.engine.core.bll.RefreshHostCapabilitiesCommand' failed: VDSGenericException: VDSNetworkException: Message timeout w
hich can be caused by communication issues
2015-04-02 18:03:23,138 ERROR [org.ovirt.engine.core.bll.RefreshHostCapabilitiesCommand] (org.ovirt.thread.pool-8-thread-42) [1f374caf]
 Exception: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException: VDSGenericException: VDSNetworkException: Message timeout wh
ich can be caused by communication issues
        at org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase.proceedProxyReturnValue(BrokerCommandBase.java:183) [vdsbroker.j
ar:]
        at org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand.executeVdsBrokerCommand(GetCapabilitiesVDSCommand.java:1
6) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand.executeVDSCommand(VdsBrokerCommand.java:101) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.VDSCommandBase.executeCommand(VDSCommandBase.java:55) [vdsbroker.jar:]
        at org.ovirt.engine.core.dal.VdcCommandBase.execute(VdcCommandBase.java:33) [dal.jar:]


Attached trimmed vdsm.log
Detector thread::WARNING::2015-04-02 17:50:52,487::protocoldetector::122::ProtocolDetector.Detector::(handle_read) Unrecognized protocol: '\n\n\nCONNECT\n'


Version-Release number of selected component (if applicable):
ovirt-master, vdsm-master (from nightly repo)

How reproducible:


Steps to Reproduce:
1. Add a host to a cluster (in my case with only gluster service enabled)
2. Add a nic to host, and assign network to second nic using Set Up host networks 
3. Save

Actual results:

SetUpHostNetworks UI stays for in processing stage for 1-2 minutes and then gives an error about network error while communicating to host.

No further operations are possible on host

Expected results:


Additional info:

Comment 1 Sahina Bose 2015-04-02 12:52:42 UTC
Created attachment 1010150 [details]
vdsm

Comment 2 Lior Vernia 2015-04-06 07:49:00 UTC
Saw this reproduced on the setups of aaviram and mburman as well.

Seems like one of the symptoms is that following a Setup Networks command, running getVdsCaps (either manually via commandline or via the engine's Refresh Capabilities buttom) hangs. Since Setup Networks always runs getVdsCaps in the end, clearly if that hangs then Setup Networks should time out...

Michael, you said restarting the service gets getVdsCaps working again until the next Setup Networks command?

Just making sure we're aware of all the symptoms to make sure the fix is right.

Comment 3 Michael Burman 2015-04-06 09:11:20 UTC
Hi Lior,

Yes i did, but now getVdsCaps working without need to restart and 'refresh capabilities' working as well, but all SN operations are stuck in process much more then 2 minutes in my case.


vdsmd service is active, but there is errors:
[root@red-vds2 ~]# systemctl status vdsmd
vdsmd.service - Virtual Desktop Server Manager
   Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled)
   Active: active (running) since Mon 2015-04-06 09:32:37 IDT; 2h 37min ago
  Process: 10869 ExecStopPost=/usr/libexec/vdsm/vdsmd_init_common.sh --post-stop (code=exited, status=0/SUCCESS)
  Process: 10872 ExecStartPre=/usr/libexec/vdsm/vdsmd_init_common.sh --pre-start (code=exited, status=0/SUCCESS)
 Main PID: 10967 (vdsm)
   CGroup: /system.slice/vdsmd.service
           ├─10967 /usr/bin/python /usr/share/vdsm/vdsm
           ├─11055 /usr/libexec/ioprocess --read-pipe-fd 61 --write-pipe-fd 60 --max-threads 10 --max-queued-requests 10
           └─11103 /usr/libexec/ioprocess --read-pipe-fd 76 --write-pipe-fd 75 --max-threads 10 --max-queued-requests 10

Apr 06 09:37:26 red-vds2.qa.lab.tlv.redhat.com vdsm[10967]: vdsm vm.Vm ERROR vmId=`9fc4b616-8250-4003-9df3-01bbce0a10d6`::Alias not found for device type graphics during migration at destination host
Apr 06 09:37:30 red-vds2.qa.lab.tlv.redhat.com vdsm[10967]: vdsm vm.Vm ERROR vmId=`9fc4b616-8250-4003-9df3-01bbce0a10d6`::desktopLock failed
                                                            Traceback (most recent call last):
                                                              File "/usr/share/vdsm/virt/guestagent.py", line 364, in desktopLock...
Apr 06 09:37:38 red-vds2.qa.lab.tlv.redhat.com vdsm[10967]: vdsm periodic.VmDispatcher WARNING could not run <class 'virt.periodic.UpdateVolumes'> on ['9fc4b616-8250-4003-9df3-01bbce0a10d6']
Apr 06 10:02:05 red-vds2.qa.lab.tlv.redhat.com vdsm[10967]: vdsm BindingXMLRPC.RequestHandler ERROR Bad request syntax ('nfo.json\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00...0\x00\x00\x00
Apr 06 10:02:05 red-vds2.qa.lab.tlv.redhat.com vdsm[10967]: vdsm BindingXMLRPC.RequestHandler ERROR Bad request syntax ('iet></Item><Item><rasd:Caption>1024 MB of memory</rasd:Caption><rasd:Descrip.../rasd:Allocat
Apr 06 10:02:05 red-vds2.qa.lab.tlv.redhat.com vdsm[10967]: vdsm BindingXMLRPC.RequestHandler ERROR Bad request syntax ('info.json\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x0...00\x00\x00\x0
Apr 06 11:02:06 red-vds2.qa.lab.tlv.redhat.com vdsm[10967]: vdsm BindingXMLRPC.RequestHandler ERROR Bad request syntax ('info.json\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x0...00\x00\x00\x0
Apr 06 11:02:06 red-vds2.qa.lab.tlv.redhat.com vdsm[10967]: vdsm BindingXMLRPC.RequestHandler ERROR Bad request syntax ('info.json\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x0...00\x00\x00\x0
Apr 06 12:02:09 red-vds2.qa.lab.tlv.redhat.com vdsm[10967]: vdsm BindingXMLRPC.RequestHandler ERROR Bad request syntax ('info.json\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x0...00\x00\x00\x0
Apr 06 12:02:09 red-vds2.qa.lab.tlv.redhat.com vdsm[10967]: vdsm BindingXMLRPC.RequestHandler ERROR Bad request syntax ('info.json\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x0...00\x00\x00\x0
Hint: Some lines were ellipsized, use -l to show in full.

Comment 4 Piotr Kliczewski 2015-04-07 06:49:35 UTC
There was an issue in switching policy for heartbeat during setupNetworks. The issue is not related to setupNetworks but to changes in transport configuration.

Comment 5 Oved Ourfali 2015-04-07 11:35:00 UTC
Piotr - is the issue relevant in 3.5 as well, or only 3.6?

Comment 6 Piotr Kliczewski 2015-04-07 12:36:31 UTC
3.6 only

Comment 7 Michael Burman 2015-04-19 05:54:04 UTC
Verified on - 3.6.0-0.0.master.20150412172306.git55ba764.el6
and vdsm-4.17.0-633.git7ad88bc.el7.x86_64

Comment 8 Sandro Bonazzola 2015-11-04 11:29:36 UTC
oVirt 3.6.0 has been released on November 4th, 2015 and should fix this issue.
If problems still persist, please open a new BZ and reference this one.


Note You need to log in before you can comment on or make changes to this bug.