Bug 1340651 - RHEVM 3.6 host keeps disconnecting
Summary: RHEVM 3.6 host keeps disconnecting
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: Host-Deploy
Version: 3.6.5.1
Hardware: All
OS: All
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Sandro Bonazzola
QA Contact: Pavel Stehlik
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-05-29 16:59 UTC by brascon01
Modified: 2016-05-30 09:09 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-05-30 08:53:20 UTC
oVirt Team: Infra
Embargoed:
rule-engine: planning_ack?
rule-engine: devel_ack?
rule-engine: testing_ack?


Attachments
Ovirt-Engine and VDSM logs (844.23 KB, application/x-rar)
2016-05-30 08:47 UTC, brascon01

Description brascon01 2016-05-29 16:59:27 UTC
Description of problem:

Hi All,

I hope someone can assist. I have been using RHEVM 3.5 without any issues and decided to test oVirt 3.6. I have two Dell R710 servers set up in a cluster. Since moving to RHEVM 3.6, one node keeps disconnecting at intervals of 4 or 5 minutes. The network is fine: there are no drops, nothing was changed, and the same setup worked with 3.5 without any issues. Please let me know if you need further information.

Version-Release number of selected component (if applicable):


How reproducible:

Every time. Whenever I approve the same server, it goes into the Up state, and after 4 minutes it shows Connecting again.

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Below error message:

3-9d6d-43a2-aaa1-862b9596defe]'})' execution failed: org.ovirt.vdsm.jsonrpc.client.ClientConnectionException: Connection failed
2016-05-28 23:13:10,479 ERROR [org.ovirt.engine.core.vdsbroker.HostMonitoring] (DefaultQuartzScheduler_Worker-42) [] Failure to refresh Vds runtime info: org.ovirt.vdsm.jsonrpc.client.ClientConnectionException: Connection failed
2016-05-28 23:13:10,479 ERROR [org.ovirt.engine.core.vdsbroker.HostMonitoring] (DefaultQuartzScheduler_Worker-42) [] Exception: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException: org.ovirt.vdsm.jsonrpc.client.ClientConnectionException: Connection failed
        at org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand.createNetworkException(VdsBrokerCommand.java:157) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand.executeVDSCommand(VdsBrokerCommand.java:120) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.VDSCommandBase.executeCommand(VDSCommandBase.java:65) [vdsbroker.jar:]
        at org.ovirt.engine.core.dal.VdcCommandBase.execute(VdcCommandBase.java:33) [dal.jar:]
        at org.ovirt.engine.core.vdsbroker.ResourceManager.runVdsCommand(ResourceManager.java:467) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.VdsManager.refreshCapabilities(VdsManager.java:652) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.HostMonitoring.refreshVdsRunTimeInfo(HostMonitoring.java:119) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.HostMonitoring.refresh(HostMonitoring.java:84) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.VdsManager.onTimer(VdsManager.java:227) [vdsbroker.jar:]
        at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source) [:1.7.0_101]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.7.0_101]
        at java.lang.reflect.Method.invoke(Method.java:606) [rt.jar:1.7.0_101]
        at org.ovirt.engine.core.utils.timer.JobWrapper.invokeMethod(JobWrapper.java:81) [scheduler.jar:]
        at org.ovirt.engine.core.utils.timer.JobWrapper.execute(JobWrapper.java:52) [scheduler.jar:]
        at org.quartz.core.JobRunShell.run(JobRunShell.java:213) [quartz.jar:]
        at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:557) [quartz.jar:]
Caused by: org.ovirt.vdsm.jsonrpc.client.ClientConnectionException: Connection failed
        at org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient.connect(ReactorClient.java:157) [vdsm-jsonrpc-java-client.jar:]
        at org.ovirt.vdsm.jsonrpc.client.JsonRpcClient.getClient(JsonRpcClient.java:114) [vdsm-jsonrpc-java-client.jar:]
        at org.ovirt.vdsm.jsonrpc.client.JsonRpcClient.call(JsonRpcClient.java:73) [vdsm-jsonrpc-java-client.jar:]
        at org.ovirt.engine.core.vdsbroker.jsonrpc.FutureMap.<init>(FutureMap.java:68) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.jsonrpc.JsonRpcVdsServer.getCapabilities(JsonRpcVdsServer.java:268) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand.executeVdsBrokerCommand(GetCapabilitiesVDSCommand.java:15) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand.executeVDSCommand(VdsBrokerCommand.java:110) [vdsbroker.jar:]
        ... 14 more

2016-05-28 23:13:10,480 WARN  [org.ovirt.engine.core.vdsbroker.VdsManager] (DefaultQuartzScheduler_Worker-42) [] Failed to refresh VDS, network error, continuing, vds='rhev10.test.net'(9a6659e3-9d6d-43a2-aaa1-862b9596defe): org.ovirt.vdsm.jsonrpc.client.ClientConnectionException: Connection failed
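To see how often the engine loses the host, the timestamped entries above can be filtered for `ClientConnectionException`. The following is a minimal sketch, assuming the engine log keeps the `timestamp LEVEL [logger]` prefix seen in the excerpt; the function name `connection_failures` is hypothetical and not part of oVirt.

```python
import re
from datetime import datetime

# Prefix of a timestamped engine.log entry, as seen in the excerpt above:
# "2016-05-28 23:13:10,479 ERROR [org.ovirt...HostMonitoring] ..."
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+\[(?P<logger>[^\]]+)\]"
)


def connection_failures(lines):
    """Return (timestamp, level, logger) for each timestamped entry
    that mentions ClientConnectionException.

    Stack-trace lines and "Caused by:" lines carry no timestamp prefix,
    so they are skipped.
    """
    hits = []
    for line in lines:
        match = LINE_RE.match(line)
        if match and "ClientConnectionException" in line:
            ts = datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S,%f")
            hits.append((ts, match.group("level"), match.group("logger")))
    return hits
```

Feeding it the lines of the engine log and comparing consecutive timestamps would show whether the failures really recur every 4 or 5 minutes.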

Comment 1 Oved Ourfali 2016-05-30 04:53:37 UTC
Are you using oVirt or RHEV-M?
Please test with 3.6.6, as it seems like another issue that we solved.
Also, you need to attach all logs, and also describe what you did.

Comment 2 brascon01 2016-05-30 08:45:07 UTC
(In reply to Oved Ourfali from comment #1)
> Are you using oVirt or RHEV-M?
> Please test with 3.6.6, as it seems like another issue that we solved.
> Also, you need to attach all logs, and also describe what you did.

I am using RHEV-M 3.6, version 3.6.5.3.

Can you please tell me how to update to 3.6.6? When I run engine-check-upgrade, there is no upgrade available.


We get the following events, which repeat every 4 or 5 minutes, after which the host comes back up. It is happening to only one host. I have rebuilt the host many times and upgraded the server firmware, and the network is fine. Whenever I downgrade to 3.5, there are no issues.

1-VDSM rhev10.test.net command failed: Failed to read hardware information
2-Could not get hardware information for host rhev10.test.net
3-Status of host rhev10.test.net was set to Up

The UUID is not shown in the Hardware tab in RHEV-M, but it is shown on RHEV-H:

[root@rhev10 log]# dmidecode -s system-uuid
4C4C4544-0052-5110-8039-B9C04F5A344A


########################################

[root@rhev10 log]# systemctl status libvirtd vdsmd
● libvirtd.service - Virtualization daemon
   Loaded: loaded (/usr/lib/systemd/system/libvirtd.service; enabled; vendor preset: enabled)
  Drop-In: /etc/systemd/system/libvirtd.service.d
           └─unlimited-core.conf
   Active: active (running) since Mon 2016-05-30 08:18:16 UTC; 19min ago
     Docs: man:libvirtd(8)
           http://libvirt.org
 Main PID: 2941 (libvirtd)
   CGroup: /system.slice/libvirtd.service
           └─2941 /usr/sbin/libvirtd --listen

May 30 08:30:25 rhev10.test.net libvirtd[2941]: End of file while reading data: Input/output error
May 30 08:30:25 rhev10.test.net libvirtd[2941]: End of file while reading data: Input/output error
May 30 08:31:54 rhev10.test.net libvirtd[2941]: End of file while reading data: Input/output error
May 30 08:31:54 rhev10.test.net libvirtd[2941]: End of file while reading data: Input/output error
May 30 08:33:35 rhev10.test.net libvirtd[2941]: End of file while reading data: Input/output error
May 30 08:33:35 rhev10.test.net libvirtd[2941]: End of file while reading data: Input/output error
May 30 08:35:09 rhev10.test.net libvirtd[2941]: End of file while reading data: Input/output error
May 30 08:35:09 rhev10.test.net libvirtd[2941]: End of file while reading data: Input/output error
May 30 08:36:58 rhev10.test.net libvirtd[2941]: End of file while reading data: Input/output error
May 30 08:36:58 rhev10.test.net libvirtd[2941]: End of file while reading data: Input/output error

● vdsmd.service - Virtual Desktop Server Manager
   Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled; vendor preset: enabled)
   Active: active (running) since Mon 2016-05-30 08:37:23 UTC; 10s ago
  Process: 5048 ExecStopPost=/usr/libexec/vdsm/vdsmd_init_common.sh --post-stop (code=exited, status=0/SUCCESS)
  Process: 5054 ExecStartPre=/usr/libexec/vdsm/vdsmd_init_common.sh --pre-start (code=exited, status=0/SUCCESS)
 Main PID: 5183 (vdsm)
   CGroup: /system.slice/vdsmd.service
           └─5183 /usr/bin/python /usr/share/vdsm/vdsm

May 30 08:37:25 rhev10.test.net python[5183]: DIGEST-MD5 ask_user_info()
May 30 08:37:25 rhev10.test.net python[5183]: DIGEST-MD5 client step 1
May 30 08:37:25 rhev10.test.net python[5183]: DIGEST-MD5 ask_user_info()
May 30 08:37:25 rhev10.test.net python[5183]: DIGEST-MD5 make_client_response()
May 30 08:37:25 rhev10.test.net python[5183]: DIGEST-MD5 client step 2
May 30 08:37:25 rhev10.test.net python[5183]: DIGEST-MD5 parse_server_challenge()
May 30 08:37:25 rhev10.test.net python[5183]: DIGEST-MD5 ask_user_info()
May 30 08:37:25 rhev10.test.net python[5183]: DIGEST-MD5 make_client_response()
May 30 08:37:25 rhev10.test.net python[5183]: DIGEST-MD5 client step 3
May 30 08:37:29 rhev10.test.net vdsm[5183]: vdsm vds ERROR failed to retrieve hardware info
                                            Traceback (most recent call last):
                                              File "/usr/share/vdsm/API.py", line 1342, in getHardwareInfo...
Hint: Some lines were ellipsized, use -l to show in full.
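The libvirtd entries in the status output above recur at a fairly regular pace, which can be checked by computing the gaps between their timestamps. This is a rough sketch: the function name `error_gaps` is hypothetical, the year is not present in the journal lines and is passed in as an assumption, and `%b` assumes English month abbreviations.

```python
from datetime import datetime


def error_gaps(lines, needle="End of file while reading data", year=2016):
    """Return the seconds between consecutive distinct timestamps
    of journal lines containing `needle`.

    Expects the short journal format shown above:
    "May 30 08:30:25 host unit[pid]: message"
    """
    stamps = []
    for line in lines:
        if needle not in line:
            continue
        month, day, clock = line.split()[:3]
        ts = datetime.strptime(
            f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S"
        )
        # Each error is logged twice with the same timestamp; keep one.
        if not stamps or ts != stamps[-1]:
            stamps.append(ts)
    return [int((b - a).total_seconds()) for a, b in zip(stamps, stamps[1:])]
```

Applied to the ten libvirtd lines above, this gives gaps of 89, 101, 94 and 109 seconds. That shows how often the error repeats, but not what causes it.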


Many thanks

Comment 3 brascon01 2016-05-30 08:47:22 UTC
Created attachment 1162700 [details]
Ovirt-Engine and VDSM logs

These are the logs for both VDSM and oVirt Engine. Your help will be immensely appreciated. Thanks again.

Comment 5 brascon01 2016-05-30 09:03:28 UTC
Hi Ourfali,

I see this was closed. Can you please tell me how to resolve this if it is not a bug?

thanks

Comment 6 Oved Ourfali 2016-05-30 09:09:51 UTC
(In reply to brascon01 from comment #5)
> Hi Ourfali,
> 
> i see this was closed,  can you please tell me how to resolve this if it is
> not a bug.
> 
> thanks

You need to contact support, as you're using RHEV-M.

