Bug 1373847 - Host that is set with protocol=xml fails cluster upgrade
Summary: Host that is set with protocol=xml fails cluster upgrade
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: BLL.Infra
Version: 4.0.4
Hardware: Unspecified
OS: Unspecified
high
medium
Target Milestone: ovirt-4.0.5
Target Release: 4.0.5.2
Assignee: Moti Asayag
QA Contact: Jiri Belka
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-09-07 09:18 UTC by Arik
Modified: 2017-01-18 07:38 UTC (History)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2017-01-18 07:38:40 UTC
oVirt Team: Infra
Embargoed:
rule-engine: ovirt-4.0.z+
mgoldboi: planning_ack+
masayag: devel_ack+
pstehlik: testing_ack+


Links
System | ID | Private | Priority | Status | Summary | Last Updated
oVirt gerrit | 65272 | 0 | master | MERGED | engine: Eliminate fallback logic when installing a host | 2016-10-14 10:19:33 UTC
oVirt gerrit | 65276 | 0 | ovirt-engine-4.0 | MERGED | engine: Eliminate fallback logic when installing a host | 2016-10-17 09:02:26 UTC
oVirt gerrit | 65527 | 0 | ovirt-engine-4.0.5 | MERGED | engine: Eliminate fallback logic when installing a host | 2016-10-17 13:31:44 UTC

Description Arik 2016-09-07 09:18:28 UTC
Description of problem:
Hosts in 3.6 clusters should not use xml as their protocol (but stomp instead).
When, for whatever reason, a host is using xml, it blocks the cluster upgrade. In this case there is no "proper" way for the user to change the protocol: one can either change it directly in the database or edit the host in the UI (the protocol type is not exposed for versions >= 3.6, but saving the host will nevertheless change the protocol to stomp).

Version-Release number of selected component (if applicable):


How reproducible:
100%

Steps to Reproduce:
1. Have a 3.6 cluster with a host that is set with protocol=XML
2. Upgrade the cluster compatibility version to 4.0
3.

Actual results:
The upgrade fails with a message saying that there are hosts that are not supported in the new compatibility version.

Expected results:
If we do not support XML anymore, then we can change the protocol during the cluster upgrade. Otherwise, we should at least provide users with the option to change the host's protocol.

Additional info:

Comment 1 Arik 2016-09-07 09:20:52 UTC
Actually the severity is high; the priority can be lower, as we are not sure how the host on rhev.tlv came to be set with XML.

Comment 2 Oved Ourfali 2016-09-15 05:37:03 UTC
Moti - can you take a look?

Comment 3 Moti Asayag 2016-09-19 10:50:01 UTC
We shouldn't get into a situation where a 3.6 cluster contains a host using the XML-RPC protocol.

The only way to get to this state is by re-installing a host, or adding a host, that failed to communicate with the engine via JSON-RPC and fell back to XML-RPC.

If the host does support JSON-RPC, we should investigate why it failed to communicate with the engine via JSON-RPC. If the host doesn't support JSON-RPC, it should not be part of any 3.6 cluster.

To recover from that state, one can remove the host from the engine and add it again, so that there will be another attempt to communicate with the host via JSON-RPC (no need to touch the DB).

Are there any logs from the failed installation of that host?

Comment 4 Moran Goldboim 2016-09-20 08:03:15 UTC
(In reply to Moti Asayag from comment #3)
> We shouldn't get into a situation where cluster 3.6 contains host with
> XML-RPC protocol.
> 
> The only way to get to this stage is by re-installing the host or by adding
> host which failed to communicate with the engine by JSON-RPC and fallback to
> XML-RPC.
> 
> If the host does support JSON-RPC, we should investigate the reason for
> failing to communicate with the engine via JSON-RPC, else, if the host
> doesn't support  JSON-RPC, it should not be part of any 3.6 cluster.
> 
> In order to recover from that state - one can remove the host from the
> engine and add it again so there will be another attempt to communicate with
> the host via JSON-RPC (no need to deal with the DB).
> 
> Are there any logs from the failed installation of that host ?

So, are you suggesting that we move XML-RPC based hosts to non-operational once we find them active in a 3.6 cluster?

Comment 5 Moti Asayag 2016-09-20 11:51:57 UTC
(In reply to Moran Goldboim from comment #4)
> (In reply to Moti Asayag from comment #3)
> > We shouldn't get into a situation where cluster 3.6 contains host with
> > XML-RPC protocol.
> > 
> > The only way to get to this stage is by re-installing the host or by adding
> > host which failed to communicate with the engine by JSON-RPC and fallback to
> > XML-RPC.
> > 
> > If the host does support JSON-RPC, we should investigate the reason for
> > failing to communicate with the engine via JSON-RPC, else, if the host
> > doesn't support  JSON-RPC, it should not be part of any 3.6 cluster.
> > 
> > In order to recover from that state - one can remove the host from the
> > engine and add it again so there will be another attempt to communicate with
> > the host via JSON-RPC (no need to deal with the DB).
> > 
> > Are there any logs from the failed installation of that host ?
> 
> so, are you suggesting to move xml-rpc based hosts to non-operational mode
> once we found them active on 3.6 cluster?

Yes, we should.

Comment 6 Arik 2016-09-25 07:18:41 UTC
(In reply to Moti Asayag from comment #3)
> Are there any logs from the failed installation of that host ?

I didn't look for a failed installation.
It happened on rhev.tlv, maybe the log still exists there.

Comment 7 Moti Asayag 2016-10-06 11:46:25 UTC
The fix for this bug in 4.0.x will include the following:

1. Add an upgrade script that moves all 3.6 and above hosts to json.
2. Remove the fallback code that reconnected 3.6 hosts via xmlrpc after a failed json connection.

As a result, a failed attempt to communicate with a 3.6 host via json rpc during installation will end in an installation failure.
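
For illustration only, a minimal sketch of the behaviour change in item 2, written in Python rather than the engine's actual Java code. The names `Protocol`, `InstallError` and `choose_protocol` are hypothetical, and the numeric values 0 (xml) and 1 (stomp) are inferred from the vds_static.protocol values reported on this bug, not taken from the engine source.

```python
from enum import Enum


class Protocol(Enum):
    XML = 0    # XML-RPC (value assumed from the vds_static output on this bug)
    STOMP = 1  # JSON-RPC over STOMP (value assumed likewise)


class InstallError(Exception):
    """Hypothetical error: the engine could not reach the host during install."""


def choose_protocol(json_rpc_ok, allow_fallback):
    """Return the protocol a newly installed 3.6 host ends up with.

    json_rpc_ok:    whether the JSON-RPC connection attempt succeeded.
    allow_fallback: True models the old behaviour, False the behaviour after the fix.
    """
    if json_rpc_ok:
        return Protocol.STOMP
    if allow_fallback:
        # Old behaviour: fall back silently, leaving an XML-RPC host in a 3.6 cluster.
        return Protocol.XML
    # After the fix: no fallback, so the installation fails instead.
    raise InstallError("JSON-RPC connection failed; installation aborted")
```

With `allow_fallback=False`, a host that cannot be reached over JSON-RPC never gets stored with protocol=xml, which is the state that blocked the cluster upgrade.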

Comment 8 Jiri Belka 2016-10-24 12:08:26 UTC
ok, ovirt-engine-4.0.5.2-0.2.el7ev.noarch

before engine update:

engine=# select protocol from vds_static where vds_name = 'dell-r210ii-04';
 protocol 
----------
        0
(1 row)

after engine update:

engine=# select protocol from vds_static where vds_name = 'dell-r210ii-04';
 protocol 
----------
        1
(1 row)


engine-setup went smoothly.
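
A rough sketch of what an upgrade step of this kind might do, using Python's sqlite3 so that it runs standalone. Only the `vds_static` table and its `protocol` and `vds_name` columns come from the queries above; the `cluster` table, the `cluster_id`, `compat_major` and `compat_minor` columns and the host `legacy-host` are hypothetical, and the real engine schema and upgrade script will differ. The meaning of 0 (xml) and 1 (json/stomp) is inferred from the before/after output.

```python
import sqlite3

XML, STOMP = 0, 1  # assumed meaning of vds_static.protocol, inferred from the output above


def migrate_hosts_to_json(conn):
    """Move XML hosts in clusters at level 3.6 or above to STOMP; return rows changed."""
    cur = conn.execute(
        """
        UPDATE vds_static
           SET protocol = ?
         WHERE protocol = ?
           AND cluster_id IN (
               SELECT cluster_id
                 FROM cluster
                WHERE compat_major > 3
                   OR (compat_major = 3 AND compat_minor >= 6)
           )
        """,
        (STOMP, XML),
    )
    conn.commit()
    return cur.rowcount


def build_demo_db():
    """Create an in-memory database with a hypothetical, simplified schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE cluster (cluster_id INTEGER PRIMARY KEY,
                              compat_major INTEGER, compat_minor INTEGER);
        CREATE TABLE vds_static (vds_name TEXT PRIMARY KEY,
                                 cluster_id INTEGER, protocol INTEGER);
        INSERT INTO cluster VALUES (1, 3, 6), (2, 3, 5);
        INSERT INTO vds_static VALUES ('dell-r210ii-04', 1, 0),
                                      ('legacy-host', 2, 0);
        """
    )
    return conn


def protocol_of(conn, name):
    """Return the stored protocol value for the named host."""
    row = conn.execute(
        "SELECT protocol FROM vds_static WHERE vds_name = ?", (name,)
    ).fetchone()
    return row[0]
```

Calling `migrate_hosts_to_json(build_demo_db())` changes only the host in the 3.6 cluster; the host in the hypothetical 3.5 cluster keeps protocol 0.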

Comment 9 Moti Asayag 2016-10-25 08:43:38 UTC
Jiri,

Could you attempt to install a 3.5 host and, after its failed installation, activate it and check that it behaves as expected:

1. Installation should fail.
2. Activation should move the host to Non-responsive.

Comment 10 Jiri Belka 2016-10-25 13:54:16 UTC
(In reply to Moti Asayag from comment #9)
> Jiri,
> 
> Could you attempt to install a 3.5 host and after its failed installation to
> activate it  and see that it behaves as expected:
> 
> 1. Installation should fail.
> 2. Activation should move the host to Non-responsive.

Ad 2 - the host ends up in the non-operational state, as a 3.5 host can't be managed by a 4.0 engine: 4.0 supports only the 3.6 and 4.0 cluster levels.

Comment 11 Moti Asayag 2016-10-25 14:26:51 UTC
(In reply to Jiri Belka from comment #10)
> (In reply to Moti Asayag from comment #9)
> > Jiri,
> > 
> > Could you attempt to install a 3.5 host and after its failed installation to
> > activate it  and see that it behaves as expected:
> > 
> > 1. Installation should fail.
> > 2. Activation should move the host to Non-responsive.
> 
> Ad 2 - the host ends in non-operational state as 3.5 host can't be managed
> by 4.0 engine as 4.0 supports only 3.6 and 4.0 cluster level.

You are right: this is the expected behavior with a 3.5 host, since it also supports JSON-RPC. 3.5 was the first host version to support JSON-RPC, which is why ovirt-engine managed to communicate with it.

Could you try the same with a 3.4 host? 3.4 has no JSON-RPC support, and such a case might occur only if the admin misconfigures the repositories file on the host or attempts to register a 3.4 rhev-h.

