Bug 1289868
| Summary: | Host cannot be modified because of XML protocol not supported error | | |
|---|---|---|---|
| Product: | [oVirt] ovirt-engine | Reporter: | Petr Matyáš <pmatyas> |
| Component: | Frontend.WebAdmin | Assignee: | Moti Asayag <masayag> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Petr Matyáš <pmatyas> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 3.6.1 | CC: | bugs, gklein, masayag, mgoldboi, oourfali, pkliczew, pmatyas, pstehlik, sbonazzo |
| Target Milestone: | ovirt-3.6.3 | Flags: | rule-engine: ovirt-3.6.z+, rule-engine: exception+, mgoldboi: planning_ack+, oourfali: devel_ack+, pstehlik: testing_ack+ |
| Target Release: | 3.6.3.3 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2016-03-11 07:24:42 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | Infra | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |
In 3.6 you shouldn't get into a situation where you have an existing host in a 3.6 cluster working with XMLRPC. Please provide steps to reproduce that explain what you did with the host before you edited it, so that we'll be able to understand how you got into this situation. As I see it, you can only do the following to get there:
- Option 1: Add a host to a 3.6 cluster - will work with jsonrpc.
- Option 2: Add a 3.6 host to a 3.5 cluster, and then update the cluster to 3.6 - should fail if there are hosts that work with XMLRPC.
- Option 3: Update the host's cluster to another cluster, which is 3.6, while the host works with XMLRPC - should fail.
Also, we need to know whether that is a clean 3.6.1 environment or a clean 3.5 environment upgraded to 3.6.1, as otherwise you might have gotten into this situation due to previous bugs. Also, please provide logs.

Petr, please also provide the output of the following query:
select protocol from vds_static where vds_name = 'pmatyas-host04';
or access the host resource via the API and paste its 'protocol' element value (see the first sketch further down).

There is fallback logic defined for older vdsms which do not support jsonrpc. At the time of connection we do not know whether the vdsm we connect to supports jsonrpc. We try to connect to vdsm using jsonrpc twice with a timeout in between, and if that fails we assume that we are connecting to a vdsm which does not support it and we switch to xmlrpc (see the second sketch further down). As a result, when the engine finds out about the version of the vdsm, it can declare it non-operational depending on the cluster level. We need to understand why the two attempts to connect to vdsm using jsonrpc failed. Please provide engine and vdsm logs so we can understand what the reason was.

I am already working with Moti on this. This is happening also in our selenium tests in jenkins.

Discussed offline. As I see that it is a corner case (even if it is related to what Piotr described in comment #4), we'll address that in 3.6.2. Petr - a clear reproducer, if you have any, would be great.

I don't, it's happening just for one of my hosts and for every host in our jenkins. I'll try some upgrade scenario now.

This bug has target milestone 3.6.2 and is in MODIFIED without a target release. This may be perfectly correct, but please check whether the patch fixing this bug is included in ovirt-engine-3.6.2. If it's included, please set the target release to 3.6.2 and move to ON_QA. Thanks.

Sorry for the noise with the assignee, something went wrong while setting the target release.

Created attachment 1116876 [details]
engine, vdsm and supervdsm logs

This still happens for one of our test suites (https://rhev-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/3.6/view/UI/job/3.6-git-rhevmCore-selenium_webadmin-sanity/). Tested on 3.6.2-9.

The fix didn't get into 3.6.2. Promoting the target release to 3.6.3, where the fix is already merged.
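For the 'protocol' element requested above, here is a minimal sketch of reading it from the host resource in the REST API. It is only an illustration: the engine URL, credentials and host name are placeholders for your environment, and certificate verification is disabled purely for brevity.

```python
import requests
import xml.etree.ElementTree as ET

# Placeholders - adjust for your environment.
ENGINE_API = "https://engine.example.com/ovirt-engine/api"
AUTH = ("admin@internal", "password")

# Search the hosts collection by name and print each host's 'protocol' element.
resp = requests.get(
    ENGINE_API + "/hosts",
    params={"search": "name=pmatyas-host04"},
    auth=AUTH,
    headers={"Accept": "application/xml"},
    verify=False,  # lab setup only; point this at the engine CA bundle in production
)
resp.raise_for_status()

for host in ET.fromstring(resp.content).findall("host"):
    # 'protocol' is the element the comment above asks to paste.
    print(host.findtext("name"), host.findtext("protocol"))
```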
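The jsonrpc-to-xmlrpc fallback described above lives in the engine's Java host-monitoring code; the following is only a hedged Python sketch of that logic, with hypothetical connect_jsonrpc/connect_xmlrpc placeholders and an illustrative timeout value.

```python
import time

RETRY_PAUSE = 2  # seconds between the two JSON-RPC attempts (illustrative value)

def detect_protocol(host, connect_jsonrpc, connect_xmlrpc):
    """Try JSON-RPC twice with a pause in between; if both attempts fail,
    assume an older VDSM and fall back to XML-RPC."""
    for attempt in range(2):
        try:
            return "jsonrpc", connect_jsonrpc(host)
        except ConnectionError:
            if attempt == 0:
                time.sleep(RETRY_PAUSE)
    # Both JSON-RPC attempts failed: assume the host only speaks XML-RPC.
    # As the comment above notes, if the failures were transient this wrongly
    # pins a healthy 3.6 host to XML-RPC, which a 3.6 cluster then rejects.
    return "xmlrpc", connect_xmlrpc(host)
```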
Created attachment 1119407 [details]
engine, vdsm, supervdsm logs; screenshot

Still not fixed on 3.6.3-1.
Target release should be set only once a package build is known to fix the issue. Since this bug is not in MODIFIED, the target version has been reset. Please use the target milestone to plan a fix for an oVirt release.

Can you specify what isn't working?

Piotr,
the vdsm.log (attached) contains the following error when attempting to detect the protocol to use:
ioprocess communication (29826)::ERROR::2016-01-20 20:22:24,745::__init__::174::IOProcessClient::(_communicate) IOProcess failure
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/ioprocess/__init__.py", line 129, in _communicate
raise Exception("FD closed")
Exception: FD closed
and also later on:
JsonRpc (StompReactor)::INFO::2016-01-20 20:29:50,286::stompreactor::153::Broker.StompAdapter::(_cmd_unsubscribe) Unsubscribe command received
JsonRpc (StompReactor)::ERROR::2016-01-20 20:29:50,299::betterAsyncore::124::vds.dispatcher::(recv) SSL error during reading data: unexpected eof
MainThread::DEBUG::2016-01-20 20:29:59,297::vdsm::71::vds::(sigtermHandler) Received signal 15
Could you advise?
1. The issue with ioprocess has been known for some time. I asked the maintainer of this package to take a look at it, but I am not sure whether a BZ was opened.
2. It looks like we unsubscribed and closed the connection. The SSL error was raised due to the closed connection, which was expected at this time.

Please provide both engine and vdsm logs from when the issue occurs. The logs provided are from different days, so it is impossible to correlate the two.

Created attachment 1120722 [details]
engine, vdsm, supervdsm logs; screenshot
Sorry about last time, I must have packed the wrong logs.
From the provided logs I can see that the engine connected to the host using xmlrpc only.
Reactor thread::DEBUG::2016-02-03 13:16:28,646::bindingxmlrpc::1297::XmlDetector::(handle_socket) xml over http detected from ('10.35.161.74', 60385)
On the engine side I can only see:
2016-02-03 13:16:31,938 ERROR [org.ovirt.engine.core.bll.pm.FenceProxyLocator] (DefaultQuartzScheduler_Worker-92) [7e3fff7d] Can not run fence action on host 'host-10.35.160.31', no suitable proxy host was found.
Later I can see that the host was put into maintenance, and the CanDoAction check for UpdateVds failed with:
2016-02-03 13:37:46,536 WARN [org.ovirt.engine.core.bll.hostdeploy.UpdateVdsCommand] (ajp-/127.0.0.1:8702-2) [30c5480f] CanDoAction of action 'UpdateVds' failed for user admin@internal. Reasons: VAR__ACTION__UPDATE,VAR__TYPE__HOST,NOT_SUPPORTED_PROTOCOL_FOR_CLUSTER_VERSION
The maintenance triggered a storage disconnect, which failed on vdsm:
Thread-99::ERROR::2016-02-03 13:35:54,361::hsm::2557::Storage.HSM::(disconnectStorageServer) Could not disconnect from storageServer
Traceback (most recent call last):
File "/usr/share/vdsm/storage/hsm.py", line 2553, in disconnectStorageServer
conObj.disconnect()
File "/usr/share/vdsm/storage/storageServer.py", line 447, in disconnect
return self._mountCon.disconnect()
File "/usr/share/vdsm/storage/storageServer.py", line 256, in disconnect
self._mount.umount(True, True)
File "/usr/share/vdsm/storage/mount.py", line 256, in umount
return self._runcmd(cmd, timeout)
File "/usr/share/vdsm/storage/mount.py", line 241, in _runcmd
raise MountError(rc, ";".join((out, err)))
MountError: (32, ';umount: /rhev/data-center/mnt/vserver-nas02.qa.lab.tlv.redhat.com:_nas02_storage__bqLZjD__nfs__2016__02__03__13__14__55__15384: mountpoint not found\n')
but I do not see that the communication protocol was changed.
It looks like a validation issue on the engine side.
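To make the suspected engine-side check concrete: the real validation lives in the Java UpdateVdsCommand, so the following is only an illustrative Python sketch under the assumption that the engine compares the host's stored protocol against the cluster compatibility level; the function and constant names are hypothetical.

```python
JSONRPC_REQUIRED_FROM = (3, 6)  # cluster level that requires JSON-RPC (per this bug)

def can_update_host(cluster_version, stored_protocol):
    """Reject the edit when the cluster requires JSON-RPC but the host record
    still carries XML-RPC, mirroring NOT_SUPPORTED_PROTOCOL_FOR_CLUSTER_VERSION."""
    if cluster_version >= JSONRPC_REQUIRED_FROM and stored_protocol == "xmlrpc":
        return False, "NOT_SUPPORTED_PROTOCOL_FOR_CLUSTER_VERSION"
    return True, None

# The symptom in this bug: the stored protocol appears stale, so a host that is
# actually healthy on JSON-RPC is still rejected by this check.
print(can_update_host((3, 6), "xmlrpc"))  # (False, 'NOT_SUPPORTED_PROTOCOL_FOR_CLUSTER_VERSION')
print(can_update_host((3, 6), "jsonrpc")) # (True, None)
```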
Created attachment 1128168 [details]
engine, vdsm, supervdsm logs
This issue is still not fixed in 3.6.3-3.

Target release should be set only once a package build is known to fix the issue. Since this bug is not in MODIFIED, the target version has been reset. Please use the target milestone to plan a fix for an oVirt release.

Found the exact reproducer for this bug:
1. Add a host to ovirt-engine.
2. In the 'Edit Host' dialog, move the host to a 3.4-level cluster.
3. In the 'Edit Host' dialog, move the host to a 3.6-level cluster.
4. Try to edit the host once again.

The new patch handles this problem.

Verified on 3.6.3-4.
Created attachment 1103811 [details]
screenshot

Description of problem:
When I try to edit a host and change even only the name, an error appears after clicking the OK button saying 'XML protocol not supported by cluster 3.6 or higher'. The host is working correctly and was installed on a clean RHEL 7.2.

Version-Release number of selected component (if applicable):
rhevm-3.6.1.1-0.1.el6.noarch
vdsm-4.17.11-0.el7ev

How reproducible:
always

Steps to Reproduce:
1. Have a correctly installed and working host in a 3.6 cluster.
2. Try to edit it.
3.

Actual results:
error

Expected results:
host is correctly edited

Additional info:
2015-12-09 09:42:44,482 WARN [org.ovirt.engine.core.bll.hostdeploy.UpdateVdsCommand] (ajp-/127.0.0.1:8702-3) [324eed5e] CanDoAction of action 'UpdateVds' failed for user admin@internal. Reasons: VAR__ACTION__UPDATE,VAR__TYPE__HOST,NOT_SUPPORTED_PROTOCOL_FOR_CLUSTER_VERSION