Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1208525

Summary: Host in Connecting state once SetupNetworks is called
Product: [Retired] oVirt Reporter: Sahina Bose <sabose>
Component: ovirt-engine-webadminAssignee: Piotr Kliczewski <pkliczew>
Status: CLOSED CURRENTRELEASE QA Contact: Michael Burman <mburman>
Severity: high Docs Contact:
Priority: unspecified    
Version: 3.6CC: amureini, bazulay, bugs, ecohen, gklein, lsurette, mburman, mgoldboi, oourfali, pkliczew, rbalakri, yeylon
Target Milestone: ---   
Target Release: 3.6.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: infra
Fixed In Version: ovirt-engine-3.6.0-0.0.master.20150412172306.git55ba764 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2015-11-04 11:29:36 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Infra RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
vdsm none

Description Sahina Bose 2015-04-02 12:51:38 UTC
Description of problem:

Getting "Network error while communicating to Host" once the Set up Host networks command is called.
Unable to Refresh Capabilities for the host, once we receive the error

engine.log 

2015-04-02 18:03:01,597 INFO  [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) [] Connecting to /10.70.43.181
2015-04-02 18:03:23,096 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand] (org.ovirt.thread.pool-8-thread-42)
 [695670cc] Command 'org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand' return value 'org.ovirt.engine.core.vdsbroker
.vdsbroker.VDSInfoReturnForXmlRpc@56477f9c'
2015-04-02 18:03:23,096 WARN  [org.ovirt.vdsm.jsonrpc.client.internal.ResponseWorker] (ResponseWorker) [] Exception thrown during messa
ge processing
2015-04-02 18:03:23,098 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand] (org.ovirt.thread.pool-8-thread-42)
 [695670cc] HostName = 10.70.43.181
2015-04-02 18:03:23,100 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand] (org.ovirt.thread.pool-8-thread-42)
 [695670cc] Command 'GetCapabilitiesVDSCommand(HostName = 10.70.43.181, HostId = bf1433c3-cde0-4ea7-9b08-2a9950efcb07, vds=Host[10.70.4
3.181,bf1433c3-cde0-4ea7-9b08-2a9950efcb07])' execution failed: VDSGenericException: VDSNetworkException: Message timeout which can be 
caused by communication issues
2015-04-02 18:03:23,135 INFO  [org.ovirt.engine.core.bll.HandleVdsVersionCommand] (org.ovirt.thread.pool-8-thread-42) [1f374caf] Runnin
g command: HandleVdsVersionCommand internal: true. Entities affected :  ID: bf1433c3-cde0-4ea7-9b08-2a9950efcb07 Type: VDS
2015-04-02 18:03:23,137 ERROR [org.ovirt.engine.core.bll.RefreshHostCapabilitiesCommand] (org.ovirt.thread.pool-8-thread-42) [1f374caf]
 Command 'org.ovirt.engine.core.bll.RefreshHostCapabilitiesCommand' failed: VDSGenericException: VDSNetworkException: Message timeout w
hich can be caused by communication issues
2015-04-02 18:03:23,138 ERROR [org.ovirt.engine.core.bll.RefreshHostCapabilitiesCommand] (org.ovirt.thread.pool-8-thread-42) [1f374caf]
 Exception: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException: VDSGenericException: VDSNetworkException: Message timeout wh
ich can be caused by communication issues
        at org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase.proceedProxyReturnValue(BrokerCommandBase.java:183) [vdsbroker.j
ar:]
        at org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand.executeVdsBrokerCommand(GetCapabilitiesVDSCommand.java:1
6) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand.executeVDSCommand(VdsBrokerCommand.java:101) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.VDSCommandBase.executeCommand(VDSCommandBase.java:55) [vdsbroker.jar:]
        at org.ovirt.engine.core.dal.VdcCommandBase.execute(VdcCommandBase.java:33) [dal.jar:]


Attached trimmed vdsm.log
Detector thread::WARNING::2015-04-02 17:50:52,487::protocoldetector::122::ProtocolDetector.Detector::(handle_read) Unrecognized protocol: '\n\n\nCONNECT\n'


Version-Release number of selected component (if applicable):
ovirt-master, vdsm-master (from nightly repo)

How reproducible:


Steps to Reproduce:
1. Add a host to a cluster (in my case with only gluster service enabled)
2. Add a nic to host, and assign network to second nic using Set Up host networks 
3. Save

Actual results:

SetUpHostNetworks UI stays for in processing stage for 1-2 minutes and then gives an error about network error while communicating to host.

No further operations are possible on host

Expected results:


Additional info:

Comment 1 Sahina Bose 2015-04-02 12:52:42 UTC
Created attachment 1010150 [details]
vdsm

Comment 2 Lior Vernia 2015-04-06 07:49:00 UTC
Saw this reproduced on the setups of aaviram and mburman as well.

Seems like one of the symptoms is that following a Setup Networks command, running getVdsCaps (either manually via commandline or via the engine's Refresh Capabilities buttom) hangs. Since Setup Networks always runs getVdsCaps in the end, clearly if that hangs then Setup Networks should time out...

Michael, you said restarting the service gets getVdsCaps working again until the next Setup Networks command?

Just making sure we're aware of all the symptoms to make sure the fix is right.

Comment 3 Michael Burman 2015-04-06 09:11:20 UTC
Hi Lior,

Yes i did, but now getVdsCaps working without need to restart and 'refresh capabilities' working as well, but all SN operations are stuck in process much more then 2 minutes in my case.


vdsmd service is active, but there is errors:
[root@red-vds2 ~]# systemctl status vdsmd
vdsmd.service - Virtual Desktop Server Manager
   Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled)
   Active: active (running) since Mon 2015-04-06 09:32:37 IDT; 2h 37min ago
  Process: 10869 ExecStopPost=/usr/libexec/vdsm/vdsmd_init_common.sh --post-stop (code=exited, status=0/SUCCESS)
  Process: 10872 ExecStartPre=/usr/libexec/vdsm/vdsmd_init_common.sh --pre-start (code=exited, status=0/SUCCESS)
 Main PID: 10967 (vdsm)
   CGroup: /system.slice/vdsmd.service
           ├─10967 /usr/bin/python /usr/share/vdsm/vdsm
           ├─11055 /usr/libexec/ioprocess --read-pipe-fd 61 --write-pipe-fd 60 --max-threads 10 --max-queued-requests 10
           └─11103 /usr/libexec/ioprocess --read-pipe-fd 76 --write-pipe-fd 75 --max-threads 10 --max-queued-requests 10

Apr 06 09:37:26 red-vds2.qa.lab.tlv.redhat.com vdsm[10967]: vdsm vm.Vm ERROR vmId=`9fc4b616-8250-4003-9df3-01bbce0a10d6`::Alias not found for device type graphics during migration at destination host
Apr 06 09:37:30 red-vds2.qa.lab.tlv.redhat.com vdsm[10967]: vdsm vm.Vm ERROR vmId=`9fc4b616-8250-4003-9df3-01bbce0a10d6`::desktopLock failed
                                                            Traceback (most recent call last):
                                                              File "/usr/share/vdsm/virt/guestagent.py", line 364, in desktopLock...
Apr 06 09:37:38 red-vds2.qa.lab.tlv.redhat.com vdsm[10967]: vdsm periodic.VmDispatcher WARNING could not run <class 'virt.periodic.UpdateVolumes'> on ['9fc4b616-8250-4003-9df3-01bbce0a10d6']
Apr 06 10:02:05 red-vds2.qa.lab.tlv.redhat.com vdsm[10967]: vdsm BindingXMLRPC.RequestHandler ERROR Bad request syntax ('nfo.json\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00...0\x00\x00\x00
Apr 06 10:02:05 red-vds2.qa.lab.tlv.redhat.com vdsm[10967]: vdsm BindingXMLRPC.RequestHandler ERROR Bad request syntax ('iet></Item><Item><rasd:Caption>1024 MB of memory</rasd:Caption><rasd:Descrip.../rasd:Allocat
Apr 06 10:02:05 red-vds2.qa.lab.tlv.redhat.com vdsm[10967]: vdsm BindingXMLRPC.RequestHandler ERROR Bad request syntax ('info.json\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x0...00\x00\x00\x0
Apr 06 11:02:06 red-vds2.qa.lab.tlv.redhat.com vdsm[10967]: vdsm BindingXMLRPC.RequestHandler ERROR Bad request syntax ('info.json\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x0...00\x00\x00\x0
Apr 06 11:02:06 red-vds2.qa.lab.tlv.redhat.com vdsm[10967]: vdsm BindingXMLRPC.RequestHandler ERROR Bad request syntax ('info.json\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x0...00\x00\x00\x0
Apr 06 12:02:09 red-vds2.qa.lab.tlv.redhat.com vdsm[10967]: vdsm BindingXMLRPC.RequestHandler ERROR Bad request syntax ('info.json\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x0...00\x00\x00\x0
Apr 06 12:02:09 red-vds2.qa.lab.tlv.redhat.com vdsm[10967]: vdsm BindingXMLRPC.RequestHandler ERROR Bad request syntax ('info.json\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x0...00\x00\x00\x0
Hint: Some lines were ellipsized, use -l to show in full.

Comment 4 Piotr Kliczewski 2015-04-07 06:49:35 UTC
There was an issue in switching policy for heartbeat during setupNetworks. The issue is not related to setupNetworks but to changes in transport configuration.

Comment 5 Oved Ourfali 2015-04-07 11:35:00 UTC
Piotr - is the issue relevant in 3.5 as well, or only 3.6?

Comment 6 Piotr Kliczewski 2015-04-07 12:36:31 UTC
3.6 only

Comment 7 Michael Burman 2015-04-19 05:54:04 UTC
Verified on - 3.6.0-0.0.master.20150412172306.git55ba764.el6
and vdsm-4.17.0-633.git7ad88bc.el7.x86_64

Comment 8 Sandro Bonazzola 2015-11-04 11:29:36 UTC
oVirt 3.6.0 has been released on November 4th, 2015 and should fix this issue.
If problems still persist, please open a new BZ and reference this one.