Bug 1434538 - [RFE] Increasing cluster level doesn't guarantee features are enabled on upgraded hosts
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: Host-Deploy
Version: ---
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
: ---
Assignee: bugs@ovirt.org
QA Contact: Lukas Svaty
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-03-21 16:46 UTC by Jiri Belka
Modified: 2020-04-01 14:50 UTC (History)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2020-04-01 14:47:11 UTC
oVirt Team: Infra
Embargoed:
oourfali: ovirt-future?
rule-engine: planning_ack?
rule-engine: devel_ack?
rule-engine: testing_ack?



Description Jiri Belka 2017-03-21 16:46:49 UTC
Description of problem:

If a host is added to a cluster with an old compatibility level, and later the host is updated and the cluster compat level is increased, that does not mean the host has "activated" the features which the new cluster compat level (i.e. the engine $major.$minor version) introduced.

Thus, if a host - e.g. a 3.6 ngn - is added to a 3.5 cluster and the cluster is then updated to 3.6 in a 3.6 engine, the host still doesn't have, for example, ovirt-vmconsole-host-sshd running.

[root@dell-r210ii-13 ~]# ps auxww | grep vmconsole
root     28489  0.0  0.0 112648   960 ttyS1    S+   17:36   0:00 grep --color=auto vmconsole
[root@dell-r210ii-13 ~]# systemctl list-unit-files | grep vmconsole
ovirt-vmconsole-host-sshd.service                  disabled

It seems we just "trust" that vdsm supports some versions; the engine doesn't verify that the host is fully configured for the specific cluster level.

Does the engine validate that the services needed to support a specific cluster version's features are enabled and running on hosts before increasing the cluster level?

This general complaint was prompted by a non-running ovirt-vmconsole-host-sshd.service on a 3.6 host which was added to a 3.5 cluster in a 3.6 engine.

If we continue increasing cluster compat levels based only on the supported versions vdsm reports, I'm worried we will keep hitting similar issues from time to time: hosts that are formally OK but do not provide everything the specific cluster level requires, e.g. daemons enabled and running on hosts.

(IIUC this is how host deploy works: most configuration is done during the first deployment, and an upgrade just upgrades packages...)

Version-Release number of selected component (if applicable):
rhevm-3.6.10.2-0.2.el6.noarch
ovirt-host-deploy-1.4.1-1.el6ev.noarch

How reproducible:
100%

Steps to Reproduce:
1. add a 3.6 ngn host to a 3.5 cluster in a 3.6 engine
2. does ovirt-vmconsole-host-sshd run on the 3.6 ngn?
3. upgrade the cluster/dc compat level to 3.6
4. was the cluster level updated?
5. does ovirt-vmconsole-host-sshd run on the 3.6 ngn?

Actual results:
2 - nope
4 - yes
5 - nope

Expected results:
the host should be Up and the cluster compat level should be updated only if the host actually has all pieces enabled; in this case ovirt-vmconsole-host-sshd should be enabled and running

Additional info:
maybe an update should always trigger a configuration check or a reconfiguration
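
A minimal sketch of such a configuration check, in Python, under stated assumptions. The REQUIRED_SERVICES mapping and the function names are hypothetical and are not part of the engine, vdsm, or ovirt-host-deploy; the only entry is the one service this report names for 3.6. The check relies on `systemctl is-enabled` and `systemctl is-active` returning exit status 0 on success.

```python
import subprocess

# Assumption: illustrative mapping only. The report names just this one
# service for cluster level 3.6; a real list would come from each feature team.
REQUIRED_SERVICES = {
    (3, 6): ["ovirt-vmconsole-host-sshd.service"],
}


def systemctl_ok(verb, unit):
    """Return True if `systemctl <verb> <unit>` exits with status 0."""
    result = subprocess.run(
        ["systemctl", verb, unit],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def unmet_requirements(target_level, check=systemctl_ok):
    """List (unit, problem) pairs that should block raising the cluster level.

    Every service required by a level up to and including target_level
    must be both enabled and active on the host.
    """
    problems = []
    for level, units in sorted(REQUIRED_SERVICES.items()):
        if level > target_level:
            continue
        for unit in units:
            if not check("is-enabled", unit):
                problems.append((unit, "not enabled"))
            if not check("is-active", unit):
                problems.append((unit, "not active"))
    return problems
```

On the host from this report, `unmet_requirements((3, 6))` would be expected to flag ovirt-vmconsole-host-sshd.service, since the unit is listed as disabled and no such process is running. The `check` parameter exists only so the logic can be exercised without a systemd host.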

Comment 2 Oved Ourfali 2017-03-22 06:15:38 UTC
This is virt. 
In general, bugs on different features should be opened on the relevant team. It is not a generic issue.

Comment 3 Tomas Jelinek 2017-03-22 10:19:22 UTC
@Martin: is there any infrastructure which could be used to check not only the supported versions but the actual running processes and the configuration?

Comment 4 Jiri Belka 2017-03-22 12:51:30 UTC
another daemon not running - ovirt-imageio-daemon

[root@dell-r210ii-03 yum.repos.d]# systemctl list-dependencies vdsmd --plain --no-pager -a | grep ovirt
  ovirt-imageio-daemon.service
[root@dell-r210ii-03 yum.repos.d]# systemctl status vdsmd | grep Active
   Active: active (running) since Wed 2017-03-22 11:42:17 CET; 2h 7min ago
[root@dell-r210ii-03 yum.repos.d]# systemctl status ovirt-imageio-daemon | grep Active
   Active: failed (Result: start-limit) since Wed 2017-03-22 11:40:43 CET; 2h 9min ago

# systemctl show vdsmd | grep ^Wants
Wants=mom-vdsm.service system.slice ovirt-imageio-daemon.service

I suppose a "feature" requiring ovirt-imageio-daemon on the host would not work, even though the host is Up in the engine.
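
The manual check above could be scripted roughly as follows. This is only a sketch: the helper names are made up for illustration, and it assumes the `Wants=` line format shown in the `systemctl show vdsmd` output above and that `systemctl is-failed --quiet <unit>` exits with status 0 when the unit is in the failed state.

```python
import subprocess


def parse_wants(show_output):
    """Extract unit names from the Wants= line of `systemctl show <unit>`."""
    for line in show_output.splitlines():
        if line.startswith("Wants="):
            return line[len("Wants="):].split()
    return []


def systemctl_is_failed(unit):
    """Return True if systemd reports the unit as failed."""
    result = subprocess.run(["systemctl", "is-failed", "--quiet", unit])
    return result.returncode == 0


def failed_wanted_services(show_output, is_failed=systemctl_is_failed):
    """Return the wanted .service units that are currently failed."""
    return [
        unit
        for unit in parse_wants(show_output)
        if unit.endswith(".service") and is_failed(unit)
    ]
```

Fed the output of `systemctl show vdsmd` from this host, `failed_wanted_services` would be expected to report ovirt-imageio-daemon.service while vdsmd itself stays active.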

Comment 5 Tomas Jelinek 2017-03-29 10:59:40 UTC
These services are not so critical that the whole host should be marked as non-operational (from the virt perspective; maybe storage has a different opinion about it). On the other hand, it would be great if there were some way to flag the host with an exclamation mark indicating that some services are not running on it.

It would be great to have some service on the host (inside VDSM or standalone) which can provide the engine with a list of services which should be running but are not. Also, the logic for deciding whether the host is OK, OK with some missing services, or completely wrong should be implemented somewhere (either in that service at the host level or inside the engine, based on the information provided by this service).

Leaving it to infra to decide how to go forward. Once the infrastructure is in place, we can add the logic for deciding how critical the particular virt service(s) are.

I think this infrastructure will be useful now, and even more so in the future if we decide to split VDSM into several smaller services.
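
To make the three outcomes described above concrete (OK, OK with some missing services, completely wrong), here is a rough Python sketch of the decision logic. The HostHealth names and the idea of passing in a set of critical services are assumptions made for illustration; nothing like this exists in the engine or VDSM today, and which services count as critical would be decided by each team.

```python
from enum import Enum


class HostHealth(Enum):
    # Hypothetical states, not existing engine host statuses.
    OK = "ok"
    DEGRADED = "ok, some services missing"
    NON_OPERATIONAL = "non-operational"


def classify_host(not_running, critical):
    """Classify a host from the services that should be running but are not.

    not_running: unit names reported by the host-side service
    critical:    unit names whose absence makes the host unusable
    """
    not_running = set(not_running)
    if not not_running:
        return HostHealth.OK
    if not_running & set(critical):
        return HostHealth.NON_OPERATIONAL
    return HostHealth.DEGRADED
```

With an empty critical set, a host missing ovirt-vmconsole-host-sshd.service would be classified as DEGRADED, which corresponds to the "exclamation mark" case rather than a non-operational host.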

Comment 6 Oved Ourfali 2017-03-29 12:02:28 UTC
Marking that as an RFE.
Reducing severity.

Comment 7 Michal Skrivanek 2020-03-19 15:40:49 UTC
We didn't get to this bug for more than 2 years, and it's not being considered for the upcoming 4.4. It's unlikely that it will ever be addressed so I'm suggesting to close it.
If you feel this needs to be addressed and want to work on it please remove cond nack and target accordingly.

Comment 8 Michal Skrivanek 2020-04-01 14:47:11 UTC
ok, closing. Please reopen if still relevant/you want to work on it.

Comment 9 Michal Skrivanek 2020-04-01 14:50:51 UTC
ok, closing. Please reopen if still relevant/you want to work on it.

