1329317 – Host hangs indefinitely in "Installing" stage during host-deploy

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	ovirt-engine
Classification:	oVirt
Component:	Host-Deploy
Sub Component:
Version:	3.6.5.1
Hardware:	x86_64
OS:	Linux
Priority:	urgent
Severity:	high
Target Milestone:	ovirt-3.6.6
Target Release:	3.6.6
Assignee:	Piotr Kliczewski
QA Contact:	Jiri Belka
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2016-04-21 15:27 UTC by Sven Kieske
Modified:	2016-05-30 10:56 UTC
CC List:	11 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2016-05-30 10:56:11 UTC
oVirt Team:	Infra
Embargoed:
Dependent Products:
Flags:	rule-engine: ovirt-3.6.z+ rule-engine: blocker+ mgoldboi: planning_ack+ oourfali: devel_ack+ pstehlik: testing_ack+

Attachments
log from engine (313.57 KB, text/plain) 2016-04-21 15:27 UTC, Sven Kieske	no flags
host_deploy.log (547.91 KB, text/plain) 2016-04-21 15:28 UTC, Sven Kieske	no flags
new host deploy log (547.95 KB, text/plain) 2016-04-27 11:18 UTC, Sven Kieske	no flags
new vdsm log (781.28 KB, text/plain) 2016-04-27 11:19 UTC, Sven Kieske	no flags
new engine log (3.95 MB, text/plain) 2016-04-27 11:21 UTC, Sven Kieske	no flags

Links
System	ID	Priority	Status	Summary	Last Updated
Red Hat Bugzilla	1331344	unspecified	CLOSED	Ovirt fail save network configuration	2021-02-22 00:41:40 UTC
oVirt gerrit	56178	ovirt-engine-3.6	MERGED	connect: update timeout	2020-02-07 19:45:57 UTC
oVirt gerrit	56282	ovirt-engine-3.6	MERGED	vdsbroker: check connection identifier during policy reset	2020-02-07 19:45:57 UTC
oVirt gerrit	56649	ovirt-engine-3.6	MERGED	jsonrpc: version bump	2020-02-07 19:45:57 UTC
oVirt gerrit	56908	ovirt-engine-3.6.6	MERGED	jsonrpc: version bump	2020-02-07 19:45:57 UTC
oVirt gerrit	56909	ovirt-engine-3.6.6	MERGED	connect: update timeout	2020-02-07 19:45:57 UTC
oVirt gerrit	56910	ovirt-engine-3.6.6	MERGED	vdsbroker: check connection identifier during policy reset	2020-02-07 19:45:57 UTC
oVirt gerrit	57097	ovirt-engine-3.6	MERGED	vdsbroker: timeout ping command when no response	2020-02-07 19:45:58 UTC
oVirt gerrit	57125	ovirt-engine-3.6.6	MERGED	vdsbroker: timeout ping command when no response	2020-02-07 19:45:58 UTC

Internal Links: 1331344

Description Sven Kieske 2016-04-21 15:27:39 UTC

Created attachment 1149481 [details]
log from engine

Description of problem:
I'm currently setting up a dev environment with latest engine:
3.6.5.3-1.el7.centos
during host-deploy of a centos 7.2 machine the host got stuck in webadmin gui in mode "installing".

the only thing I can select is "confirm host has been rebooted" which does not help.

I will attach engine.log and host-deploy.log for better analysis; however, there is an error with a stack trace in org.ovirt.engine.core.vdsbroker.vdsbroker.PollVDSCommand:

2016-04-21 17:08:13,410 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.PollVDSCommand] (org.ovirt.thread.pool-8-thread-40) [322b1066] Exception: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException: VDSGenericException: VDSNetworkException: Connection reset by peer

on the centos 7.2 node I see this in the log file of vdsm:

systemctl status vdsmd
● vdsmd.service - Virtual Desktop Server Manager
   Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled; vendor preset: enabled)
   Active: active (running) since Thu 2016-04-21 16:53:41 CEST; 13min ago
 Main PID: 42279 (vdsm)
   CGroup: /system.slice/vdsmd.service
           └─42279 /usr/bin/python /usr/share/vdsm/vdsm

Apr 21 16:53:41 vrnode0015.vrootdev python[42279]: DIGEST-MD5 client step 1
Apr 21 16:53:41 vrnode0015.vrootdev python[42279]: DIGEST-MD5 ask_user_info()
Apr 21 16:53:41 vrnode0015.vrootdev python[42279]: DIGEST-MD5 make_client_response()
Apr 21 16:53:41 vrnode0015.vrootdev python[42279]: DIGEST-MD5 client step 2
Apr 21 16:53:41 vrnode0015.vrootdev python[42279]: DIGEST-MD5 parse_server_challenge()
Apr 21 16:53:41 vrnode0015.vrootdev python[42279]: DIGEST-MD5 ask_user_info()
Apr 21 16:53:41 vrnode0015.vrootdev python[42279]: DIGEST-MD5 make_client_response()
Apr 21 16:53:41 vrnode0015.vrootdev python[42279]: DIGEST-MD5 client step 3
Apr 21 16:53:46 vrnode0015.vrootdev vdsm[42279]: vdsm vds.dispatcher ERROR SSL error during reading data: unexpected eof
Apr 21 16:53:46 vrnode0015.vrootdev vdsm[42279]: vdsm ProtocolDetector.SSLHandshakeDispatcher ERROR Error during handshake: unexpected eof

I was able to get rid of this error in vdsm.log by restarting the service.


Version-Release number of selected component (if applicable):
rpm -qa | grep vdsm
vdsm-4.17.23-0.el7.centos.noarch
vdsm-infra-4.17.23-0.el7.centos.noarch
vdsm-xmlrpc-4.17.23-0.el7.centos.noarch
vdsm-yajsonrpc-4.17.23-0.el7.centos.noarch
vdsm-hook-vmfex-dev-4.17.23-0.el7.centos.noarch
vdsm-cli-4.17.23-0.el7.centos.noarch
vdsm-python-4.17.23-0.el7.centos.noarch
vdsm-jsonrpc-4.17.23-0.el7.centos.noarch

rpm -qa | grep ovirt
ebay-cors-filter-1.0.1-0.1.ovirt.el7.noarch
ovirt-log-collector-3.6.0-1.el7.centos.noarch
ovirt-engine-setup-plugin-vmconsole-proxy-helper-3.6.5.3-1.el7.centos.noarch
ovirt-engine-websocket-proxy-3.6.5.3-1.el7.centos.noarch
ovirt-engine-restapi-3.6.5.3-1.el7.centos.noarch
ovirt-engine-webadmin-portal-3.6.5.3-1.el7.centos.noarch
ovirt-iso-uploader-3.6.0-1.el7.centos.noarch
ovirt-vmconsole-proxy-1.0.0-1.el7.centos.noarch
ovirt-release36-002-2.noarch
ovirt-engine-wildfly-8.2.1-1.el7.x86_64
ovirt-engine-extension-aaa-ldap-setup-1.1.2-1.el7.centos.noarch
ovirt-engine-extension-aaa-jdbc-1.0.6-1.el7.noarch
ovirt-engine-cli-3.6.2.0-1.el7.centos.noarch
ovirt-engine-lib-3.6.5.3-1.el7.centos.noarch
ovirt-engine-setup-plugin-ovirt-engine-common-3.6.5.3-1.el7.centos.noarch
ovirt-engine-setup-plugin-ovirt-engine-3.6.5.3-1.el7.centos.noarch
ovirt-engine-setup-plugin-websocket-proxy-3.6.5.3-1.el7.centos.noarch
ovirt-engine-tools-backup-3.6.5.3-1.el7.centos.noarch
ovirt-engine-extensions-api-impl-3.6.5.3-1.el7.centos.noarch
ovirt-engine-wildfly-overlay-8.0.5-1.el7.noarch
ovirt-engine-backend-3.6.5.3-1.el7.centos.noarch
ovirt-engine-userportal-3.6.5.3-1.el7.centos.noarch
ovirt-engine-3.6.5.3-1.el7.centos.noarch
ovirt-image-uploader-3.6.0-1.el7.centos.noarch
ovirt-setup-lib-1.0.1-1.el7.centos.noarch
ovirt-vmconsole-1.0.0-1.el7.centos.noarch
ovirt-host-deploy-java-1.4.1-1.el7.centos.noarch
ovirt-engine-extension-aaa-ldap-1.1.2-1.el7.centos.noarch
ovirt-engine-setup-base-3.6.5.3-1.el7.centos.noarch
ovirt-engine-setup-3.6.5.3-1.el7.centos.noarch
ovirt-engine-vmconsole-proxy-helper-3.6.5.3-1.el7.centos.noarch
ovirt-engine-tools-3.6.5.3-1.el7.centos.noarch
ovirt-engine-sdk-python-3.6.2.1-1.el7.centos.noarch
ovirt-engine-dbscripts-3.6.5.3-1.el7.centos.noarch
ovirt-host-deploy-1.4.1-1.el7.centos.noarch


How reproducible:
don't know, as the system seems not to recover from the current situation.

Steps to Reproduce:
1.
2.
3.

Actual results:
host-deploy hangs in status "installing"

Expected results:
host-deploy succeeds or at least throws an error in gui without hanging forever

Additional info:
see the attached log files

Comment 1 Sven Kieske 2016-04-21 15:28:13 UTC

Created attachment 1149482 [details]
host_deploy.log

Comment 2 Sven Kieske 2016-04-21 15:38:43 UTC

Additional information from systemctl status network:

systemctl status network
● network.service - LSB: Bring up/down networking
   Loaded: loaded (/etc/rc.d/init.d/network)
   Active: failed (Result: exit-code) since Thu 2016-04-21 16:53:39 CEST; 43min ago
     Docs: man:systemd-sysv-generator(8)

Apr 21 16:53:39 vrnode0015.vrootdev network[41758]: RTNETLINK answers: File exists
Apr 21 16:53:39 vrnode0015.vrootdev network[41758]: RTNETLINK answers: File exists
Apr 21 16:53:39 vrnode0015.vrootdev network[41758]: RTNETLINK answers: File exists
Apr 21 16:53:39 vrnode0015.vrootdev network[41758]: RTNETLINK answers: File exists
Apr 21 16:53:39 vrnode0015.vrootdev network[41758]: RTNETLINK answers: File exists
Apr 21 16:53:39 vrnode0015.vrootdev network[41758]: RTNETLINK answers: File exists
Apr 21 16:53:39 vrnode0015.vrootdev systemd[1]: network.service: control process exited, code=exited status=1
Apr 21 16:53:39 vrnode0015.vrootdev systemd[1]: Failed to start LSB: Bring up/down networking.
Apr 21 16:53:39 vrnode0015.vrootdev systemd[1]: Unit network.service entered failed state.
Apr 21 16:53:39 vrnode0015.vrootdev systemd[1]: network.service failed.

But as you might guess: the network works, at least my old ssh session still works..

Comment 3 Sandro Bonazzola 2016-04-21 15:44:46 UTC

Looks like the host lost connectivity during the deployment. Dan can you have a look at the logs? Assigning to you for further investigations.

2016-04-21 16:53:48,521 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.PollVDSCommand] (org.ovirt.thread.pool-8-thread-40) [322b1066] Error: org.ovirt.engine.core.vdsbroker.xmlrpc.XmlRpcRunTimeException: Connection issues during send request
2016-04-21 16:53:48,521 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.PollVDSCommand] (org.ovirt.thread.pool-8-thread-40) [322b1066] Exception: java.util.concurrent.ExecutionException: org.ovirt.engine.core.vdsbroker.xmlrpc.XmlRpcRunTimeException: Connection issues during send request

I see in host deploy an error like:
RuntimeError: Failed to start service 'network'

which may be related.

Oved, on the infra side, the engine should have handled the exception instead of getting stuck reporting the host as installing. Can you have a look as well?
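
The behaviour asked for here (fail visibly instead of hanging) can be illustrated with a small sketch. This is not the engine's code, which is Java; the function name `poll_until_terminal`, the state labels "up" and "install_failed", and the attempt limit are all hypothetical and only show the idea of bounding the polling and surfacing the last connection error.

```python
import time

def poll_until_terminal(poll, max_attempts, interval=2.0, sleep=time.sleep):
    """Poll a host a bounded number of times and always end in a terminal state.

    `poll` is any callable that returns True when the host answers and may
    raise ConnectionError (for example "Connection reset by peer").
    The state names returned here are illustrative labels only.
    """
    last_error = None
    for _ in range(max_attempts):
        try:
            if poll():
                return "up", None
        except ConnectionError as exc:
            # Remember the failure so it can be reported, then try again.
            last_error = exc
        sleep(interval)
    # Attempts exhausted: report a failure instead of waiting forever.
    return "install_failed", last_error

def always_reset():
    raise ConnectionResetError("Connection reset by peer")

# Demo with sleeping disabled so it finishes immediately.
status, error = poll_until_terminal(always_reset, max_attempts=5, sleep=lambda s: None)
print(status, error)
```

The point of the bounded loop is that the caller always gets a state it can show in the GUI, together with the last error seen.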

Comment 4 Red Hat Bugzilla Rules Engine 2016-04-21 15:45:37 UTC

This bug report has Keywords: Regression or TestBlocker.
Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.

Comment 5 Sven Kieske 2016-04-21 15:52:26 UTC

Hi,

vdsm spams this into vdsm.log (log is already over 80MB):

jsonrpc.Executor/4::DEBUG::2016-04-21 17:51:34,378::__init__::533::jsonrpc.JsonRpcServer::(_serveRequest) Return 'Host.ping' in bridge with True
jsonrpc.Executor/5::DEBUG::2016-04-21 17:51:34,389::__init__::503::jsonrpc.JsonRpcServer::(_serveRequest) Calling 'Host.ping' in bridge with {}
jsonrpc.Executor/5::DEBUG::2016-04-21 17:51:34,389::__init__::533::jsonrpc.JsonRpcServer::(_serveRequest) Return 'Host.ping' in bridge with True
jsonrpc.Executor/6::DEBUG::2016-04-21 17:51:34,400::__init__::503::jsonrpc.JsonRpcServer::(_serveRequest) Calling 'Host.ping' in bridge with {}
jsonrpc.Executor/6::DEBUG::2016-04-21 17:51:34,400::__init__::533::jsonrpc.JsonRpcServer::(_serveRequest) Return 'Host.ping' in bridge with True

maybe this helps too:

cat /var/log/vdsm/connectivity.log
2016-04-21 16:53:56,484:DEBUG:recent_client:True, bond0:(operstate:down speed:0 duplex:unknown), lo:(operstate:up speed:0 duplex:unknown), ;vdsmdummy;:(operstate:down speed:0 duplex:unknown), eno2:(operstate:down speed:0 duplex:unknown), eno3:(operstate:down speed:0 duplex:unknown), eno4:(operstate:down speed:0 duplex:unknown), eth0:(operstate:up speed:1000 duplex:full)
2016-04-21 17:08:29,354:DEBUG:recent_client:True, bond0:(operstate:down speed:0 duplex:unknown), lo:(operstate:up speed:0 duplex:unknown), ;vdsmdummy;:(operstate:down speed:0 duplex:unknown), eno2:(operstate:down speed:0 duplex:unknown), eno3:(operstate:down speed:0 duplex:unknown), eno4:(operstate:down speed:0 duplex:unknown), eth0:(operstate:up speed:1000 duplex:full)
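
For anyone comparing several of these connectivity.log lines, a minimal parsing sketch follows. It assumes the line layout shown above (name, then operstate, speed and duplex in parentheses); the function name `parse_connectivity_line` is made up for this example and is not part of vdsm. The sample is the first line above with the eno2 to eno4 entries left out for brevity.

```python
import re

# One interface entry looks like: eth0:(operstate:up speed:1000 duplex:full)
IFACE = re.compile(r"(\S+?):\(operstate:(\w+) speed:(\d+) duplex:(\w+)\)")

def parse_connectivity_line(line):
    """Return {interface: {"operstate", "speed", "duplex"}} for one log line."""
    return {
        name: {"operstate": state, "speed": int(speed), "duplex": duplex}
        for name, state, speed, duplex in IFACE.findall(line)
    }

sample = (
    "2016-04-21 16:53:56,484:DEBUG:recent_client:True, "
    "bond0:(operstate:down speed:0 duplex:unknown), "
    "lo:(operstate:up speed:0 duplex:unknown), "
    ";vdsmdummy;:(operstate:down speed:0 duplex:unknown), "
    "eth0:(operstate:up speed:1000 duplex:full)"
)

states = parse_connectivity_line(sample)
up = sorted(name for name, info in states.items() if info["operstate"] == "up")
print(up)
```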


actually this might be the same as BZ 1320606 (mentioned on IRC).

HTH

Comment 6 Dan Kenigsberg 2016-04-21 16:52:59 UTC

The ping flood of 1320606 is only a small part of the problem. This bug seems more like bug 1320128

Comment 7 Sven Kieske 2016-04-22 07:56:16 UTC

Here is some more information from journalctl -xe, regarding the network service problem on the host:

-- Unit network.service has begun starting up.
Apr 22 09:52:23 vrnode0015.vrootdev network[3886]: Bringing up loopback interface:  [  OK  ]
Apr 22 09:52:23 vrnode0015.vrootdev network[3886]: Bringing up interface eno1:  ERROR    : [/etc/sysconfig/network-scripts/ifup-eth] Device eno1 does not seem to be present, delaying initialization.
Apr 22 09:52:23 vrnode0015.vrootdev /etc/sysconfig/network-scripts/ifup-eth[4031]: Device eno1 does not seem to be present, delaying initialization.
Apr 22 09:52:23 vrnode0015.vrootdev network[3886]: [FAILED]
Apr 22 09:52:23 vrnode0015.vrootdev network[3886]: Bringing up interface eth0:  [  OK  ]
Apr 22 09:52:23 vrnode0015.vrootdev network[3886]: RTNETLINK answers: File exists
Apr 22 09:52:23 vrnode0015.vrootdev network[3886]: RTNETLINK answers: File exists
Apr 22 09:52:23 vrnode0015.vrootdev network[3886]: RTNETLINK answers: File exists
Apr 22 09:52:23 vrnode0015.vrootdev network[3886]: RTNETLINK answers: File exists
Apr 22 09:52:23 vrnode0015.vrootdev network[3886]: RTNETLINK answers: File exists
Apr 22 09:52:23 vrnode0015.vrootdev network[3886]: RTNETLINK answers: File exists
Apr 22 09:52:23 vrnode0015.vrootdev network[3886]: RTNETLINK answers: File exists
Apr 22 09:52:23 vrnode0015.vrootdev network[3886]: RTNETLINK answers: File exists
Apr 22 09:52:23 vrnode0015.vrootdev network[3886]: RTNETLINK answers: File exists
Apr 22 09:52:23 vrnode0015.vrootdev systemd[1]: network.service: control process exited, code=exited status=1
Apr 22 09:52:23 vrnode0015.vrootdev systemd[1]: Failed to start LSB: Bring up/down networking.
-- Subject: Unit network.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- 
-- Unit network.service has failed.
-- 
-- The result is failed.
Apr 22 09:52:23 vrnode0015.vrootdev systemd[1]: Unit network.service entered failed state.
Apr 22 09:52:23 vrnode0015.vrootdev systemd[1]: network.service failed.


actually there is no device eno1, I'm investigating my setup to see where the config file came from..

Comment 8 Sven Kieske 2016-04-22 08:18:25 UTC

Okay, the eno1 config seems to get created by dracut/initrd during setup:

head -n 1 /etc/sysconfig/network-scripts/ifcfg-eno1
# Generated by dracut initrd
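
A leftover config like this can be spotted with a short script. The sketch below is only an illustration: it assumes the ifcfg file name suffix equals the device name (the DEVICE= setting inside the file may differ), and the function name `stale_ifcfg_files` is made up for this example. On a real host the two directories would be /etc/sysconfig/network-scripts and /sys/class/net; the demo uses temporary directories so it runs anywhere.

```python
import os
import tempfile

def stale_ifcfg_files(scripts_dir, sys_net_dir):
    """List ifcfg-* files whose device has no entry under sys_net_dir."""
    present = set(os.listdir(sys_net_dir))
    stale = []
    for name in sorted(os.listdir(scripts_dir)):
        if not name.startswith("ifcfg-"):
            continue  # skip helper scripts such as ifup-eth
        device = name[len("ifcfg-"):]
        if device not in present:
            stale.append(os.path.join(scripts_dir, name))
    return stale

# Demo: eno1 has a config file but no device, as on the host in this report.
with tempfile.TemporaryDirectory() as scripts, tempfile.TemporaryDirectory() as sysnet:
    for cfg in ("ifcfg-eno1", "ifcfg-eth0", "ifcfg-lo", "ifup-eth"):
        open(os.path.join(scripts, cfg), "w").close()
    for dev in ("eth0", "lo"):
        os.mkdir(os.path.join(sysnet, dev))
    demo = [os.path.basename(path) for path in stale_ifcfg_files(scripts, sysnet)]

print(demo)
```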

my eth0 config was created via cobbler kickstart file (is this hardcoded in cobbler? will have to check later).

so moving the eno1 config file out of the way lets network.service start. But when I now start vdsm, it continuously spams the log again with:

jsonrpc.Executor/7::DEBUG::2016-04-22 10:12:16,378::__init__::503::jsonrpc.JsonRpcServer::(_serveRequest) Calling 'Host.ping' in bridge with {}
jsonrpc.Executor/7::DEBUG::2016-04-22 10:12:16,379::__init__::533::jsonrpc.JsonRpcServer::(_serveRequest) Return 'Host.ping' in bridge with True

We are talking about 182 entries per second:

grep -c "10:12:16" /var/log/vdsm/vdsm.log
182
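
The same count can be taken for every second at once rather than one grep per timestamp. This is a small sketch that assumes the vdsm.log line layout shown above; the function name `ping_entries_per_second` is made up for this example.

```python
import re
from collections import Counter

# Timestamp field of a vdsm.log line, e.g. "::2016-04-22 10:12:16,378::".
TIMESTAMP = re.compile(r"::(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3}::")

def ping_entries_per_second(lines):
    """Count log lines that mention 'Host.ping', grouped by second."""
    counts = Counter()
    for line in lines:
        if "'Host.ping'" not in line:
            continue
        match = TIMESTAMP.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

sample = [
    "jsonrpc.Executor/7::DEBUG::2016-04-22 10:12:16,378::__init__::503::"
    "jsonrpc.JsonRpcServer::(_serveRequest) Calling 'Host.ping' in bridge with {}",
    "jsonrpc.Executor/7::DEBUG::2016-04-22 10:12:16,379::__init__::533::"
    "jsonrpc.JsonRpcServer::(_serveRequest) Return 'Host.ping' in bridge with True",
]

counts = ping_entries_per_second(sample)
print(counts.most_common(3))
```

Against the real file, passing `open("/var/log/vdsm/vdsm.log")` as `lines` would give the busiest seconds via `most_common`.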

Comment 9 Sven Kieske 2016-04-22 12:49:48 UTC

I can confirm that this works for me with this engine version:

 oVirt Engine Version: 3.6.6-0.0.master.20160421175934.git6e24235.el7.centos 


kind regards

Sven

Comment 10 Oved Ourfali 2016-04-22 18:53:41 UTC

As we are struggling to find the issue, it would help to know how certain you are that it works in 3.6.6.
How many times have you tried it?

Comment 11 Sven Kieske 2016-04-25 07:49:46 UTC

well 1 time, but I can try again if you like, as I do this via rest api anyway..

Comment 12 Sven Kieske 2016-04-25 08:00:59 UTC

Hi,

I removed DC, Cluster, Local Storage and Host from the engine and readded everything via automation and, with no surprise: it worked like a charm :)

Should I test more often? But as this is done via automation I doubt I will get different results the next time...

HTH

Comment 13 Oved Ourfali 2016-04-26 11:20:14 UTC

Thanks. 
This is really puzzling me.... 

We will run our automation or Lago against 3.6.6, without any modifications, and check whether it reproduces.

Comment 14 Sven Kieske 2016-04-27 10:00:46 UTC

Hi,

hum, I redeployed everything (except a new ovirt-engine).

So I deleted all DCs etc, and erased the host completely and reinstalled via cobbler, and it failed again with above mentioned engine version.

Here is some output, will upload new logs later:

systemctl status vdsmd
● vdsmd.service - Virtual Desktop Server Manager
   Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled; vendor preset: enabled)
   Active: active (running) since Wed 2016-04-27 11:56:08 CEST; 3min 3s ago
 Main PID: 14681 (vdsm)
   CGroup: /system.slice/vdsmd.service
           └─14681 /usr/bin/python /usr/share/vdsm/vdsm

Apr 27 11:56:09 vrnode0015.vrootdev python[14681]: DIGEST-MD5 client step 1
Apr 27 11:56:09 vrnode0015.vrootdev python[14681]: DIGEST-MD5 ask_user_info()
Apr 27 11:56:09 vrnode0015.vrootdev python[14681]: DIGEST-MD5 make_client_response()
Apr 27 11:56:09 vrnode0015.vrootdev python[14681]: DIGEST-MD5 client step 2
Apr 27 11:56:09 vrnode0015.vrootdev python[14681]: DIGEST-MD5 parse_server_challenge()
Apr 27 11:56:09 vrnode0015.vrootdev python[14681]: DIGEST-MD5 ask_user_info()
Apr 27 11:56:09 vrnode0015.vrootdev python[14681]: DIGEST-MD5 make_client_response()
Apr 27 11:56:09 vrnode0015.vrootdev python[14681]: DIGEST-MD5 client step 3
Apr 27 11:56:14 vrnode0015.vrootdev vdsm[14681]: vdsm vds.dispatcher ERROR SSL error during reading data: unexpected eof
Apr 27 11:58:25 vrnode0015.vrootdev vdsm[14681]: vdsm vds ERROR connectivity check failed
                                                 Traceback (most recent call last):
                                                   File "/usr/share/vdsm/API.py", line 1641, in _rollback...
Hint: Some lines were ellipsized, use -l to show in full.

Comment 15 Sven Kieske 2016-04-27 11:18:17 UTC

Created attachment 1151293 [details]
new host deploy log

Comment 16 Sven Kieske 2016-04-27 11:19:24 UTC

Created attachment 1151294 [details]
new vdsm log

Comment 17 Sven Kieske 2016-04-27 11:21:40 UTC

Created attachment 1151295 [details]
new engine log

Comment 18 Sven Kieske 2016-04-27 11:28:36 UTC

Host status in Webadmin is "Non Operational".

Comment 19 Oved Ourfali 2016-04-27 11:35:56 UTC

According to the log it seems like a network issue, and not the original "stuck in installing" issue.
Dan - can you check if it gives you additional data? Or was that the issue you solved already a few days ago?

Comment 20 Sven Kieske 2016-04-27 11:41:29 UTC

I see that this patch was merged just yesterday for BZ 1320128 (mentioned by Dan Kenigsberg above):
https://gerrit.ovirt.org/#/c/56649/

It requires a newer version of package vdsm-jsonrpc-java:

1.1.10 instead of 1.1.9.

I still got 1.1.9-1.el7 installed. Might this be a problem?
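
A quick way to check this from the package name is sketched below. It assumes the usual name-version-release.arch layout of the rpm -qa output; the helper names `version_from_nvra` and `meets_minimum` are made up for this example. Note that comparing the versions as plain strings would wrongly rank "1.1.9" above "1.1.10", so the parts are compared as integers.

```python
def version_from_nvra(package):
    """Extract the version from 'name-version-release.arch' as printed by rpm -qa."""
    _name, version, _release_arch = package.rsplit("-", 2)
    return version

def parse_version(text):
    """Turn a dotted numeric version such as '1.1.10' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))

def meets_minimum(installed, required):
    """True if the installed version is at least the required one."""
    return parse_version(installed) >= parse_version(required)

installed = version_from_nvra("vdsm-jsonrpc-java-1.1.9-1.el7.centos.noarch")
print(installed, meets_minimum(installed, "1.1.10"))
```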

Here again is all the current version information:

engine:
rpm -qa | grep -e "vdsm\|ovirt"
ebay-cors-filter-1.0.1-0.1.ovirt.el7.noarch
ovirt-log-collector-3.6.0-1.el7.centos.noarch
vdsm-jsonrpc-java-1.1.9-1.el7.centos.noarch
ovirt-engine-lib-3.6.6-0.0.master.20160421175934.git6e24235.el7.centos.noarch
ovirt-engine-setup-plugin-ovirt-engine-3.6.6-0.0.master.20160421175934.git6e24235.el7.centos.noarch
ovirt-engine-extension-aaa-jdbc-1.0.8-0.0.master.20160415165902.gitca9be7e.el7.noarch
ovirt-engine-websocket-proxy-3.6.6-0.0.master.20160421175934.git6e24235.el7.centos.noarch
ovirt-host-deploy-java-1.4.2-0.0.master.20151122153544.gitfc808fc.el7.noarch
ovirt-engine-tools-3.6.6-0.0.master.20160421175934.git6e24235.el7.centos.noarch
ovirt-vmconsole-proxy-1.0.0-1.el7.centos.noarch
ovirt-release36-002-2.noarch
ovirt-engine-wildfly-8.2.1-1.el7.x86_64
ovirt-engine-extension-aaa-ldap-setup-1.1.2-1.el7.centos.noarch
ovirt-engine-wildfly-overlay-8.0.5-1.el7.noarch
ovirt-release36-snapshot-007-1.noarch
ovirt-engine-setup-base-3.6.6-0.0.master.20160421175934.git6e24235.el7.centos.noarch
ovirt-engine-setup-plugin-vmconsole-proxy-helper-3.6.6-0.0.master.20160421175934.git6e24235.el7.centos.noarch
ovirt-engine-setup-3.6.6-0.0.master.20160421175934.git6e24235.el7.centos.noarch
ovirt-engine-extensions-api-impl-3.6.6-0.0.master.20160421175934.git6e24235.el7.centos.noarch
ovirt-iso-uploader-3.6.1-0.0.master.20160414111648.gitd2aea1a.el7.noarch
ovirt-engine-vmconsole-proxy-helper-3.6.6-0.0.master.20160421175934.git6e24235.el7.centos.noarch
ovirt-engine-cli-3.6.2.1-0.1.20160111.git696d8ea.el7.centos.noarch
ovirt-host-deploy-1.4.2-0.0.master.20151122153544.gitfc808fc.el7.noarch
ovirt-engine-restapi-3.6.6-0.0.master.20160421175934.git6e24235.el7.centos.noarch
ovirt-engine-webadmin-portal-3.6.6-0.0.master.20160421175934.git6e24235.el7.centos.noarch
ovirt-engine-userportal-3.6.6-0.0.master.20160421175934.git6e24235.el7.centos.noarch
ovirt-engine-backend-3.6.6-0.0.master.20160421175934.git6e24235.el7.centos.noarch
ovirt-setup-lib-1.0.1-1.el7.centos.noarch
ovirt-vmconsole-1.0.0-1.el7.centos.noarch
ovirt-engine-extension-aaa-ldap-1.1.2-1.el7.centos.noarch
ovirt-engine-sdk-python-3.6.2.1-1.el7.centos.noarch
ovirt-engine-setup-plugin-ovirt-engine-common-3.6.6-0.0.master.20160421175934.git6e24235.el7.centos.noarch
ovirt-engine-setup-plugin-websocket-proxy-3.6.6-0.0.master.20160421175934.git6e24235.el7.centos.noarch
ovirt-engine-tools-backup-3.6.6-0.0.master.20160421175934.git6e24235.el7.centos.noarch
ovirt-image-uploader-3.6.1-0.0.master.20151006154111.git95ce637.el7.centos.noarch
ovirt-engine-dbscripts-3.6.6-0.0.master.20160421175934.git6e24235.el7.centos.noarch
ovirt-engine-3.6.6-0.0.master.20160421175934.git6e24235.el7.centos.noarch

node (centos 7.2):

rpm -qa | grep -e "vdsm\|ovirt"
vdsm-4.17.23-0.el7.centos.noarch
vdsm-infra-4.17.23-0.el7.centos.noarch
vdsm-xmlrpc-4.17.23-0.el7.centos.noarch
vdsm-yajsonrpc-4.17.23-0.el7.centos.noarch
vdsm-hook-vmfex-dev-4.17.23-0.el7.centos.noarch
vdsm-cli-4.17.23-0.el7.centos.noarch
vdsm-python-4.17.23-0.el7.centos.noarch
ovirt-vmconsole-1.0.0-1.el7.centos.noarch
vdsm-jsonrpc-4.17.23-0.el7.centos.noarch
ovirt-vmconsole-host-1.0.0-1.el7.centos.noarch


I'm aware there is a newer vdsm version available, both in the stable 3.6 repo:
vdsm-4.17.26-0.el7.centos.noarch.rpm     

and in the snapshot repo:

http://resources.ovirt.org/pub/ovirt-3.6-snapshot/rpm/el7/noarch/vdsm-4.17.26-8.git2c8d622.el7.centos.noarch.rpm

Might any of these 2 versions fix something?

As I see https://gerrit.ovirt.org/#/c/56649/ is not merged yet.

Comment 21 Oved Ourfali 2016-04-27 11:46:37 UTC

You reported success before they got merged, so I was wondering whether it indeed works for you.
We knew of a network issue that was addressed by the network team, but I was wondering whether infra changes are indeed needed.
We're still testing things, and we'll probably merge all patches soon.

Comment 22 Sven Kieske 2016-04-27 12:05:19 UTC

Hi,

After putting the host into maintenance mode from the engine GUI and clicking "reinstall",
the host is now in the "up" state. This does not seem to work as reliably as I would like.

Comment 23 Dan Kenigsberg 2016-04-27 15:25:40 UTC

Oved, it seems like another facet of the same issue. "ping"s do not get to Vdsm after setupNetworks. This is not really surprising, as vdsm-jsonrpc-java-1.1.10 has fixes for exactly this problem.

jsonrpc.Executor/4::DEBUG::2016-04-27 11:56:17,500::__init__::533::jsonrpc.JsonRpcServer::(_serveRequest) Return 'Host.ping' in bridge with True
Reactor thread::DEBUG::2016-04-27 11:56:17,501::stompreactor::470::protocoldetector.StompDetector::(handle_socket) Stomp detected from ('10.210.0.135', 41080)
jsonrpc.Executor/5::DEBUG::2016-04-27 11:56:17,540::__init__::503::jsonrpc.JsonRpcServer::(_serveRequest) Calling 'Host.setupNetworks' in bridge with {u'bondings': {}, u'networks': {u'ovirtmgmt': {u'nic': u'eth0', u'ipaddr': u'10.210.2.20', u'mtu': u'1500', u'netmask': u'255.255.0.0', u'STP': u'no', u'bridged': u'true', u'gateway': u'10.210.0.1', u'defaultRoute': True}}, u'options': {u'connectivityCheck': u'true', u'connectivityTimeout': 120}}
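
To make the role of connectivityCheck and connectivityTimeout in the call above concrete, here is a rough sketch of the general idea: apply the new configuration, wait up to the timeout for a client to be seen, and roll back otherwise. This is not Vdsm's implementation; the function name `apply_with_connectivity_check`, its parameters and the `FakeClock` helper are all made up for this illustration.

```python
import time

def apply_with_connectivity_check(apply, rollback, client_seen, timeout,
                                  poll_interval=1.0,
                                  clock=time.monotonic, sleep=time.sleep):
    """Apply a change, then roll it back if no client shows up in time."""
    apply()
    deadline = clock() + timeout
    while clock() < deadline:
        if client_seen():
            return True  # a client reached us, keep the new configuration
        sleep(poll_interval)
    rollback()  # nobody reached us within the timeout, restore the old state
    return False

class FakeClock:
    """Test helper so the demo does not really wait 120 seconds."""
    def __init__(self):
        self.now = 0.0
    def __call__(self):
        return self.now
    def sleep(self, seconds):
        self.now += seconds

events = []
fake = FakeClock()
ok = apply_with_connectivity_check(
    apply=lambda: events.append("apply"),
    rollback=lambda: events.append("rollback"),
    client_seen=lambda: False,  # simulates pings that never arrive
    timeout=120,
    clock=fake,
    sleep=fake.sleep,
)
print(ok, events)
```

With pings never arriving, the sketch ends in a rollback, which matches the "connectivity check failed" traceback in _rollback seen in comment 14.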


Sven, I would love to know if these problems reproduce with Piotr's recent fixes.

Comment 24 Sven Kieske 2016-04-28 06:12:47 UTC

So should I just update vdsm-jsonrpc-java to 1.1.10 or do I need to upgrade more packages? If yes to exactly which version?

As this is still a dev environment I have a lot of options (well I need to get this going soon, though.. meaning: rollout in live).

Comment 25 Dan Kenigsberg 2016-04-28 11:41:57 UTC

Yes, I believe that upgrading vdsm-jsonrpc-java and restarting Engine is enough. Piotr may correct me.

Comment 26 Sven Kieske 2016-04-28 15:03:53 UTC

unfortunately I got a hardware problem, will test tomorrow with a different hardware but same software (got HP red screen of death)..

Comment 27 Sven Kieske 2016-04-29 06:59:58 UTC

I was able to verify with the exact same hardware. At least this time it completed the host-deploy stage successfully, with vdsm-jsonrpc-java 1.1.10.

To gain more confidence I will reinstall a few more times to see if it really works reliably, because in the past it sometimes worked and sometimes did not.

HTH

Comment 28 Piotr Kliczewski 2016-04-29 08:06:36 UTC

Dan,

A few patches on the engine side also need to be applied.

The jsonrpc patches were merged and are all part of 1.1.10 (3.6). On the engine side there are:

https://gerrit.ovirt.org/56649 jsonrpc: version bump
https://gerrit.ovirt.org/56178 connect: update timeout
https://gerrit.ovirt.org/56282 vdsbroker: check connection identifier during policy reset

Comment 29 Dan Kenigsberg 2016-05-04 13:29:19 UTC

*** Bug 1332857 has been marked as a duplicate of this bug. ***

Comment 30 Dan Kenigsberg 2016-05-05 12:36:12 UTC

Moving back to Post as eedri reports 100% failure on CI.

Comment 31 Red Hat Bugzilla Rules Engine 2016-05-05 12:36:19 UTC

Target release should be placed once a package build is known to fix a issue. Since this bug is not modified, the target version has been reset. Please use target milestone to plan a fix for a oVirt release.

Comment 32 Sven Kieske 2016-05-09 07:09:39 UTC

(In reply to Dan Kenigsberg from comment #30)
> Moving back to Post as eedri report 100% failure on CI.

Hi,

is this still unstable or should this work in 3.6.6 RC1/2 ?

as far as I can see the CI in the linked gerrit patches shows +1 ?

I would like to test further and provide additional feedback.

Am I correct that now all needed patches are in 3.6.6 RC?

Comment 33 Jiri Belka 2016-05-09 16:26:03 UTC

ok, can't reproduce with 'RHEVM 3.6.6.2 [Z-STREAM] RELEASE INFO - BUILD 3.6.6-3'. host was added without problem to engine.
