Description of problem:
Hosts in 3.6 clusters should not use xml as their protocol (but stomp instead). When, for whatever reason, a host is using xml, it blocks the cluster upgrade. In this case, there is no "proper" way for the user to change the protocol: one can either change it directly in the database or edit the host in the UI (the protocol type is not exposed for versions >= 3.6, but the protocol will nevertheless change to stomp).

Version-Release number of selected component (if applicable):

How reproducible:
100%

Steps to Reproduce:
1. Have a 3.6 cluster with a host that is set with protocol=XML
2. Upgrade the cluster compatibility version to 4.0

Actual results:
The upgrade fails, saying that there are hosts that are not supported in the new compatibility version.

Expected results:
If we do not support XML anymore, then we can change the protocol during the cluster upgrade. Otherwise, we should at least provide users with the option to change the host's protocol.

Additional info:
Actually the severity is high; the priority can be lower, as we are not sure how the host on rhev.tlv came to be set with XML.
Moti - can you take a look?
We shouldn't get into a situation where a 3.6 cluster contains a host with the XML-RPC protocol.

The only way to get to this state is by re-installing the host, or by adding a host that failed to communicate with the engine over JSON-RPC and fell back to XML-RPC.

If the host does support JSON-RPC, we should investigate why it failed to communicate with the engine via JSON-RPC. If the host doesn't support JSON-RPC, it should not be part of any 3.6 cluster.

To recover from that state, one can remove the host from the engine and add it again, so there will be another attempt to communicate with the host via JSON-RPC (no need to deal with the DB).

Are there any logs from the failed installation of that host?
(In reply to Moti Asayag from comment #3)
> We shouldn't get into a situation where cluster 3.6 contains host with
> XML-RPC protocol.
>
> The only way to get to this stage is by re-installing the host or by adding
> host which failed to communicate with the engine by JSON-RPC and fallback to
> XML-RPC.
>
> If the host does support JSON-RPC, we should investigate the reason for
> failing to communicate with the engine via JSON-RPC, else, if the host
> doesn't support JSON-RPC, it should not be part of any 3.6 cluster.
>
> In order to recover from that state - one can remove the host from the
> engine and add it again so there will be another attempt to communicate with
> the host via JSON-RPC (no need to deal with the DB).
>
> Are there any logs from the failed installation of that host ?

So, are you suggesting that we move xml-rpc based hosts to non-operational mode once we find them active in a 3.6 cluster?
(In reply to Moran Goldboim from comment #4)
> (In reply to Moti Asayag from comment #3)
> > We shouldn't get into a situation where cluster 3.6 contains host with
> > XML-RPC protocol.
> >
> > The only way to get to this stage is by re-installing the host or by adding
> > host which failed to communicate with the engine by JSON-RPC and fallback to
> > XML-RPC.
> >
> > If the host does support JSON-RPC, we should investigate the reason for
> > failing to communicate with the engine via JSON-RPC, else, if the host
> > doesn't support JSON-RPC, it should not be part of any 3.6 cluster.
> >
> > In order to recover from that state - one can remove the host from the
> > engine and add it again so there will be another attempt to communicate with
> > the host via JSON-RPC (no need to deal with the DB).
> >
> > Are there any logs from the failed installation of that host ?
>
> so, are you suggesting to move xml-rpc based hosts to non-operational mode
> once we found them active on 3.6 cluster?

Yes, we should.
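The rule agreed on above can be sketched roughly as follows. This is only an illustration, not the engine's code: the function name, the protocol constants and the status strings are hypothetical, and the cluster level is assumed to be a dotted string such as "3.6".

```python
# Hypothetical protocol labels; the real engine uses its own enum.
XML_RPC = "xmlrpc"
JSON_RPC = "jsonrpc"


def parse_version(text):
    """Turn a dotted version such as '3.6' into (3, 6) for numeric comparison."""
    return tuple(int(part) for part in text.split("."))


def host_status_for_protocol(cluster_version, protocol, current_status="up"):
    """Return the status a host should get under the proposed rule.

    A host that still uses XML-RPC in a cluster of level 3.6 or above is
    moved to non-operational; any other host keeps its current status.
    """
    if protocol == XML_RPC and parse_version(cluster_version) >= (3, 6):
        return "non-operational"
    return current_status
```

Under these assumptions, an XML-RPC host in a 3.6 cluster maps to non-operational, while a JSON-RPC host in the same cluster is left alone.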
(In reply to Moti Asayag from comment #3)
> Are there any logs from the failed installation of that host ?

I didn't look for a failed installation. It happened on rhev.tlv, maybe the log still exists there.
The fix for this bug in 4.0.x will include the following:

1. Add an upgrade script to move all 3.6 and above hosts to json.
2. Remove the fallback code which used to reconnect via xmlrpc to 3.6 hosts that failed to connect via json.

As a result, a failed attempt to communicate with a 3.6 host via json rpc during installation will end up with an installation failure.
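To illustrate the first item, here is a minimal sketch of what such a migration does, run against an in-memory SQLite database rather than the engine's PostgreSQL one. Only the vds_static table with its vds_name and protocol columns appears in this bug; the cluster table, its columns, the cluster_id join and the extra host names are assumptions made for the demo. It also assumes that protocol 0 means xml and 1 means json, and that "3.6 and above" refers to the cluster compatibility version. The real upgrade script may look quite different.

```python
import sqlite3

XML_RPC = 0   # assumption: value stored for xml hosts
JSON_RPC = 1  # assumption: value stored for json hosts


def version_key(version):
    """Turn '3.6' into (3, 6) so that versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def migrate_hosts_to_json(conn, min_version="3.6"):
    """Switch xml hosts in clusters >= min_version to json; return their names."""
    rows = conn.execute(
        "SELECT h.vds_name, c.compatibility_version "
        "FROM vds_static h JOIN cluster c ON c.cluster_id = h.cluster_id "
        "WHERE h.protocol = ?",
        (XML_RPC,),
    ).fetchall()
    # Compare versions in Python: string comparison would misorder e.g. '3.10'.
    migrated = [
        name for name, version in rows
        if version_key(version) >= version_key(min_version)
    ]
    conn.executemany(
        "UPDATE vds_static SET protocol = ? WHERE vds_name = ?",
        [(JSON_RPC, name) for name in migrated],
    )
    conn.commit()
    return sorted(migrated)


def build_demo_db():
    """Create a tiny demo schema; the cluster table and its columns are hypothetical."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE cluster (
            cluster_id INTEGER PRIMARY KEY,
            compatibility_version TEXT
        );
        CREATE TABLE vds_static (
            vds_name TEXT PRIMARY KEY,
            cluster_id INTEGER,
            protocol INTEGER
        );
        INSERT INTO cluster VALUES (1, '3.6'), (2, '3.5');
        INSERT INTO vds_static VALUES
            ('dell-r210ii-04', 1, 0),
            ('host-json', 1, 1),
            ('host-old', 2, 0);
        """
    )
    return conn
```

Running migrate_hosts_to_json(build_demo_db()) touches only the xml host in the 3.6 cluster; running it a second time on the same connection changes nothing, which is a useful property for an upgrade script.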
ok, ovirt-engine-4.0.5.2-0.2.el7ev.noarch

before engine update:

engine=# select protocol from vds_static where vds_name = 'dell-r210ii-04';
 protocol
----------
        0
(1 row)

after engine update:

engine=# select protocol from vds_static where vds_name = 'dell-r210ii-04';
 protocol
----------
        1
(1 row)

engine-setup went smoothly.
Jiri,

Could you attempt to install a 3.5 host and, after the installation fails, activate it and check that it behaves as expected:

1. Installation should fail.
2. Activation should move the host to Non-responsive.
(In reply to Moti Asayag from comment #9)
> Jiri,
>
> Could you attempt to install a 3.5 host and after its failed installation to
> activate it and see that it behaves as expected:
>
> 1. Installation should fail.
> 2. Activation should move the host to Non-responsive.

Ad 2 - the host ends up in the non-operational state, as a 3.5 host can't be managed by a 4.0 engine: 4.0 supports only the 3.6 and 4.0 cluster levels.
(In reply to Jiri Belka from comment #10)
> (In reply to Moti Asayag from comment #9)
> > Jiri,
> >
> > Could you attempt to install a 3.5 host and after its failed installation to
> > activate it and see that it behaves as expected:
> >
> > 1. Installation should fail.
> > 2. Activation should move the host to Non-responsive.
>
> Ad 2 - the host ends in non-operational state as 3.5 host can't be managed
> by 4.0 engine as 4.0 supports only 3.6 and 4.0 cluster level.

You are right: this is the expected behavior with a 3.5 host, since it also supports jsonrpc. 3.5 was the first host version which supported json-rpc, which is why the ovirt-engine managed to communicate with it.

Could you try the same with a 3.4 host? In 3.4 there is no JSON rpc support, and such a case might occur only if the admin misconfigures the repositories file on the host, or attempts to register a 3.4 rhev-h.
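The reasoning in the last few comments can be summarised in a small sketch. It is a simplification under stated assumptions, not engine code: the function name and status strings are hypothetical, and a host's version is treated as the highest cluster level it supports. The 3.5 outcome is the one reported in comment #10; the 3.4 outcome is only what is expected here and has not been verified in this bug.

```python
# From this thread: a 4.0 engine supports only the 3.6 and 4.0 cluster levels.
ENGINE_CLUSTER_LEVELS = {(3, 6), (4, 0)}
# From this thread: 3.5 was the first host version with json-rpc support.
FIRST_JSONRPC_HOST_VERSION = (3, 5)


def expected_state_after_activation(host_version):
    """Return the state a host is expected to reach when activated on a 4.0 engine."""
    version = tuple(int(part) for part in host_version.split("."))
    if version < FIRST_JSONRPC_HOST_VERSION:
        # No json-rpc on the host and no xmlrpc fallback in the engine:
        # the engine cannot talk to the host at all (expected, unverified).
        return "non-responsive"
    if version not in ENGINE_CLUSTER_LEVELS:
        # The engine can talk to the host, but cannot manage its cluster level.
        return "non-operational"
    return "up"
```

The distinction is that non-responsive means no communication at all, while non-operational means the engine reached the host and then rejected it.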