Bug 1289868 - Host cannot be modified because of XML protocol not supported error
Status: CLOSED CURRENTRELEASE
Product: ovirt-engine
Classification: oVirt
Component: Frontend.WebAdmin
Version: 3.6.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ovirt-3.6.3
Target Release: 3.6.3.3
Assigned To: Moti Asayag
QA Contact: Petr Matyáš
Depends On:
Blocks:
Reported: 2015-12-09 03:58 EST by Petr Matyáš
Modified: 2016-03-11 02:24 EST
CC: 9 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-03-11 02:24:42 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Infra
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
rule-engine: ovirt-3.6.z+
rule-engine: exception+
mgoldboi: planning_ack+
oourfali: devel_ack+
pstehlik: testing_ack+


Attachments
screenshot (202.73 KB, image/png), 2015-12-09 03:58 EST, Petr Matyáš
engine, vdsm and supervdsm logs (1.13 MB, application/octet-stream), 2016-01-21 05:05 EST, Petr Matyáš
engine, vdsm, supervdsm logs; screenshot (1.35 MB, application/octet-stream), 2016-01-29 07:06 EST, Petr Matyáš
engine, vdsm, supervdsm logs; screenshot (915.56 KB, application/octet-stream), 2016-02-03 06:52 EST, Petr Matyáš
engine, vdsm, supervdsm logs (679.14 KB, application/octet-stream), 2016-02-18 04:17 EST, Petr Matyáš


External Trackers
Tracker ID Priority Status Summary Last Updated
oVirt gerrit 50254 ovirt-engine-3.6 MERGED core: Revert protocol fallback if host supports json-rpc 2015-12-23 04:10 EST
oVirt gerrit 50454 master MERGED core: Revert protocol fallback if host supports json-rpc 2015-12-22 07:22 EST
oVirt gerrit 50957 ovirt-engine-3.6.2 ABANDONED core: Revert protocol fallback if host supports json-rpc 2015-12-23 05:00 EST
oVirt gerrit 50962 refs/tags/ovirt-engine-3.6.2 ABANDONED core: Revert protocol fallback if host supports json-rpc 2015-12-23 05:02 EST
oVirt gerrit 53381 ovirt-engine-3.6 MERGED core: Block adding host with XML-RPC protocol to 3.6 cluster 2016-02-15 12:56 EST
oVirt gerrit 53382 master MERGED core: Block adding host with XML-RPC protocol to 3.6 cluster 2016-02-15 08:03 EST
oVirt gerrit 53541 ovirt-engine-3.6.3 MERGED core: Block adding host with XML-RPC protocol to 3.6 cluster 2016-02-16 01:46 EST
oVirt gerrit 53779 master MERGED webadmin: Populate host protocol on 'edit host' 2016-02-22 04:50 EST
oVirt gerrit 53825 ovirt-engine-3.6 MERGED webadmin: Populate host protocol on 'edit host' 2016-02-22 09:51 EST
oVirt gerrit 53828 ovirt-engine-3.6.3 MERGED webadmin: Populate host protocol on 'edit host' 2016-02-22 10:15 EST

Description Petr Matyáš 2015-12-09 03:58:21 EST
Created attachment 1103811 [details]
screenshot

Description of problem:
When I try to edit a host and change even only the name, an error appears after clicking the OK button, saying 'XML protocol not supported by cluster 3.6 or higher'. The host is working correctly and was installed on a clean RHEL 7.2.

Version-Release number of selected component (if applicable):
rhevm-3.6.1.1-0.1.el6.noarch
vdsm-4.17.11-0.el7ev

How reproducible:
always

Steps to Reproduce:
1. Have a correctly installed and working host in a 3.6 cluster.
2. Try to edit it.

Actual results:
error

Expected results:
host is correctly edited

Additional info:
2015-12-09 09:42:44,482 WARN  [org.ovirt.engine.core.bll.hostdeploy.UpdateVdsCommand] (ajp-/127.0.0.1:8702-3) [324eed5e] CanDoAction of action 'UpdateVds' failed for user admin@internal. Reasons: VAR__ACTION__UPDATE,VAR__TYPE__HOST,NOT_SUPPORTED_PROTOCOL_FOR_CLUSTER_VERSION
Comment 1 Oved Ourfali 2015-12-10 02:08:00 EST
In 3.6 you shouldn't get into a situation where you have an existing host in a 3.6 cluster working with XML-RPC.

Please provide steps to reproduce that will explain what you did with the host before you edit it, so that we'll be able to understand how you got to this situation.

As I see it, you can only do the following to get there:
Option 1: Add a host to a 3.6 cluster - it will work with jsonrpc.
Option 2: Add a 3.6 host to a 3.5 cluster, then update the cluster to 3.6 - this should fail if there are hosts working with XML-RPC.
Option 3: Move the host to another cluster, which is 3.6, while the host works with XML-RPC - this should fail.

Also, we need to know if that's a clean 3.6.1 environment, or a clean 3.5 environment upgraded to 3.6.1, as otherwise you might have gotten into this situation due to previous bugs.
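The engine-side check behind the NOT_SUPPORTED_PROTOCOL_FOR_CLUSTER_VERSION error can be modeled roughly as follows. This is a simplified sketch, not the engine's actual Java validation; the function name and the tuple representation of the cluster level are illustrative only:

```python
def protocol_allowed(protocol, cluster_version):
    """Toy model of the engine validation: clusters at compatibility
    level 3.6 or higher reject hosts that use the XML-RPC protocol.
    cluster_version is a (major, minor) tuple."""
    if protocol == "xmlrpc" and cluster_version >= (3, 6):
        return False
    return True
```

Under this model, a host whose stored protocol is still XML-RPC fails every edit against a 3.6 cluster, regardless of which field was changed.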
Comment 2 Oved Ourfali 2015-12-10 02:22:09 EST
Also, please provide logs.
Comment 3 Moti Asayag 2015-12-10 03:35:55 EST
Petr, please also provide the output of the following query:

  select protocol from vds_static where vds_name = 'pmatyas-host04';

or access the host resource via the API and paste its 'protocol' element value.
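For the API route, extracting the element from a saved host response is straightforward. A minimal sketch, assuming the host representation contains a top-level 'protocol' element as described above; the sample XML fragment is made up for illustration:

```python
import xml.etree.ElementTree as ET

# Made-up fragment in the shape of a GET /api/hosts/{id} response;
# only the elements relevant here are shown.
sample_host_xml = """
<host id="...">
  <name>pmatyas-host04</name>
  <protocol>xml</protocol>
</host>
"""

def host_protocol(xml_text):
    """Return the text of the host's <protocol> element, or None."""
    root = ET.fromstring(xml_text)
    element = root.find("protocol")
    return element.text if element is not None else None
```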
Comment 4 Piotr Kliczewski 2015-12-10 05:38:06 EST
There is fallback logic defined for older vdsms which do not support jsonrpc. At the time of connection we do not know whether the vdsm we connect to supports jsonrpc. We try to connect to vdsm using jsonrpc twice, with a timeout in between, and if that fails we assume we are connecting to a vdsm which does not support it and switch to xmlrpc. As a result, when the engine finds out the version of the vdsm, it can declare the host non-operational depending on the cluster level.

We need to understand why two attempts to connect to vdsm using jsonrpc failed. Please provide engine and vdsm logs so we can understand the reason.
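The fallback described above can be sketched as follows. This is a simplified illustration, not the actual engine code; `try_connect`, the retry count constant, and the timeout value are assumptions:

```python
import time

JSONRPC_ATTEMPTS = 2        # assumed: two attempts, as described above
RETRY_TIMEOUT_SECONDS = 0   # illustration only; the real timeout differs

def negotiate_protocol(try_connect):
    """Try jsonrpc a fixed number of times, then fall back to xmlrpc.

    try_connect(protocol) is a hypothetical callable returning True on
    success. Mirrors the behaviour described here: if both jsonrpc
    attempts fail, the engine assumes an old vdsm and switches to
    xmlrpc, which a 3.6 cluster later rejects.
    """
    for _attempt in range(JSONRPC_ATTEMPTS):
        if try_connect("jsonrpc"):
            return "jsonrpc"
        time.sleep(RETRY_TIMEOUT_SECONDS)
    return "xmlrpc"
```

The failure mode in this bug is exactly the second branch: transient connection failures at add-host time leave the host marked as xmlrpc.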
Comment 5 Petr Matyáš 2015-12-10 05:45:40 EST
I'm already working with Moti on this. It is also happening in our Selenium tests in Jenkins.
Comment 6 Oved Ourfali 2015-12-10 07:30:23 EST
Discussed offline.
As it is a corner case (even if it is related to what Piotr described in comment #4), we'll address it in 3.6.2.
Comment 7 Oved Ourfali 2015-12-10 08:41:01 EST
Petr - a clear reproducer, if you have one, would be great.
Comment 8 Petr Matyáš 2015-12-10 09:00:40 EST
I don't; it's happening for just one of my hosts and for every host in our Jenkins. I'll try some upgrade scenarios now.
Comment 9 Sandro Bonazzola 2015-12-23 10:08:37 EST
This bug has target milestone 3.6.2 and is in MODIFIED without a target release.
This may be perfectly correct, but please check whether the patch fixing this bug is included in ovirt-engine-3.6.2. If it is included, please set the target release to 3.6.2 and move the bug to ON_QA. Thanks.
Comment 10 Sandro Bonazzola 2016-01-14 03:31:53 EST
Sorry for the noise with assignee, something went wrong while setting target release.
Comment 11 Petr Matyáš 2016-01-21 05:05 EST
Created attachment 1116876 [details]
engine, vdsm and supervdsm logs

This still happens for one of our test suites (https://rhev-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/3.6/view/UI/job/3.6-git-rhevmCore-selenium_webadmin-sanity/)

Tested on 3.6.2-9
Comment 12 Moti Asayag 2016-01-21 06:12:34 EST
The fix didn't get into 3.6.2. Promoting the target release to 3.6.3, where the fix is already merged.
Comment 13 Petr Matyáš 2016-01-29 07:06 EST
Created attachment 1119407 [details]
engine, vdsm, supervdsm logs; screenshot

Still not fixed on 3.6.3-1
Comment 14 Red Hat Bugzilla Rules Engine 2016-01-29 07:06:26 EST
Target release should be set once a package build is known to fix an issue. Since this bug is not in MODIFIED, the target version has been reset. Please use the target milestone to plan a fix for an oVirt release.
Comment 15 Oved Ourfali 2016-01-29 12:11:31 EST
Can you specify what isn't working?
Comment 16 Moti Asayag 2016-01-31 07:13:27 EST
Piotr,
the attached vdsm.log contains the following error when attempting to detect the protocol to use:

ioprocess communication (29826)::ERROR::2016-01-20 20:22:24,745::__init__::174::IOProcessClient::(_communicate) IOProcess failure
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ioprocess/__init__.py", line 129, in _communicate
    raise Exception("FD closed")
Exception: FD closed

and also later on:

JsonRpc (StompReactor)::INFO::2016-01-20 20:29:50,286::stompreactor::153::Broker.StompAdapter::(_cmd_unsubscribe) Unsubscribe command received
JsonRpc (StompReactor)::ERROR::2016-01-20 20:29:50,299::betterAsyncore::124::vds.dispatcher::(recv) SSL error during reading data: unexpected eof
MainThread::DEBUG::2016-01-20 20:29:59,297::vdsm::71::vds::(sigtermHandler) Received signal 15

Could you advise?
Comment 17 Piotr Kliczewski 2016-02-01 06:12:16 EST
1. The issue with ioprocess has been known for some time. I asked the maintainer of the package to take a look at it, but I am not sure whether a BZ was opened.

2. It looks like we unsubscribed and closed the connection. The SSL error was raised due to the closed connection, which was expected at this time.
Comment 18 Piotr Kliczewski 2016-02-02 07:47:07 EST
Please provide both engine and vdsm logs from when the issue occurs. The provided logs are from different days, so it is impossible to correlate them.
Comment 19 Petr Matyáš 2016-02-03 06:52 EST
Created attachment 1120722 [details]
engine, vdsm, supervdsm logs; screenshot

Sorry about last time, I must have packed the wrong logs.
Comment 20 Piotr Kliczewski 2016-02-03 07:13:06 EST
From the provided logs I can see that the engine connected to the host using xmlrpc only.

Reactor thread::DEBUG::2016-02-03 13:16:28,646::bindingxmlrpc::1297::XmlDetector::(handle_socket) xml over http detected from ('10.35.161.74', 60385)

On the engine side I can only see:

2016-02-03 13:16:31,938 ERROR [org.ovirt.engine.core.bll.pm.FenceProxyLocator] (DefaultQuartzScheduler_Worker-92) [7e3fff7d] Can not run fence action on host 'host-10.35.160.31', no suitable proxy host was found.

Later I can see that the host was put into maintenance and the CanDoAction for UpdateVds failed with:

2016-02-03 13:37:46,536 WARN  [org.ovirt.engine.core.bll.hostdeploy.UpdateVdsCommand] (ajp-/127.0.0.1:8702-2) [30c5480f] CanDoAction of action 'UpdateVds' failed for user admin@internal. Reasons: VAR__ACTION__UPDATE,VAR__TYPE__HOST,NOT_SUPPORTED_PROTOCOL_FOR_CLUSTER_VERSION

The maintenance triggered a storage disconnect, which failed on vdsm:

Thread-99::ERROR::2016-02-03 13:35:54,361::hsm::2557::Storage.HSM::(disconnectStorageServer) Could not disconnect from storageServer
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/hsm.py", line 2553, in disconnectStorageServer
    conObj.disconnect()
  File "/usr/share/vdsm/storage/storageServer.py", line 447, in disconnect
    return self._mountCon.disconnect()
  File "/usr/share/vdsm/storage/storageServer.py", line 256, in disconnect
    self._mount.umount(True, True)
  File "/usr/share/vdsm/storage/mount.py", line 256, in umount
    return self._runcmd(cmd, timeout)
  File "/usr/share/vdsm/storage/mount.py", line 241, in _runcmd
    raise MountError(rc, ";".join((out, err)))
MountError: (32, ';umount: /rhev/data-center/mnt/vserver-nas02.qa.lab.tlv.redhat.com:_nas02_storage__bqLZjD__nfs__2016__02__03__13__14__55__15384: mountpoint not found\n')

but I do not see that the communication protocol was changed.

It looks like a validation issue on the engine side.
Comment 21 Petr Matyáš 2016-02-18 04:17 EST
Created attachment 1128168 [details]
engine, vdsm, supervdsm logs
Comment 22 Petr Matyáš 2016-02-18 04:18:39 EST
This issue is still not fixed in 3.6.3-3
Comment 23 Red Hat Bugzilla Rules Engine 2016-02-18 04:18:45 EST
Target release should be set once a package build is known to fix an issue. Since this bug is not in MODIFIED, the target version has been reset. Please use the target milestone to plan a fix for an oVirt release.
Comment 24 Moti Asayag 2016-02-21 17:13:43 EST
Found the exact reproducer for this bug:

1. Add host to ovirt-engine
2. In 'Edit Host' dialog, move host to 3.4 cluster level.
3. In 'Edit Host' dialog, move host to 3.6 cluster level.
4. Try to edit the host once again.

The new patch handles this problem.
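This reproducer matches a stale field in the edit dialog, which the linked "webadmin: Populate host protocol on 'edit host'" patches address: if the dialog model is built without the host's current protocol, validation runs against a default value instead of the stored one. A toy model of the before/after behaviour; all names and the default value are illustrative, not the real webadmin code:

```python
def build_edit_model(host, populate_protocol=True):
    """Toy model of the 'Populate host protocol on edit host' fix.

    Before the fix, the edit dialog model lacked the host's stored
    protocol, so the cluster-level validation compared against a stale
    default rather than the real value.
    """
    model = {"name": host["name"]}
    if populate_protocol:
        model["protocol"] = host["protocol"]   # the fix
    else:
        model["protocol"] = "xmlrpc"           # stale default (the bug)
    return model
```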
Comment 25 Petr Matyáš 2016-02-25 10:27:34 EST
Verified on 3.6.3-4
