Bug 1328382 - [DOC] Hosted engine upgrade guide should include the supportability of direct upgrade from 3.5 el6 to 3.6 el7
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: Documentation
Version: 3.6.3
Hardware: All
OS: Linux
Priority: urgent
Severity: medium
Target Milestone: ovirt-3.6.9
Target Release: ---
Assignee: Byron Gravenorst
QA Contact: Tahlia Richardson
URL:
Whiteboard:
Duplicates: 1319595 1337641 1339001 1364543 1364568
Depends On:
Blocks: 902971 1333223
 
Reported: 2016-04-19 09:24 UTC by nijin ashok
Modified: 2020-03-11 15:05 UTC
CC: 18 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-09-28 05:02:59 UTC
oVirt Team: Docs
Target Upstream Version:
Embargoed:


Attachments
Adding 3.5 host warning (6.42 KB, image/png)
2016-07-22 18:38 UTC, Marina Kalinin


Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 1319595 0 high CLOSED ha-agent not starting when upgrading the hosted engine from 6.x to 7.x 2021-02-22 00:41:40 UTC
Red Hat Knowledge Base (Solution) 2256881 0 None None After deploying additional hosts in RHEV cluster, RHEV Engine fails to start on new nodes. 2019-04-28 13:27:38 UTC
Red Hat Knowledge Base (Solution) 2300331 0 None None None 2019-11-14 08:05:37 UTC
Red Hat Knowledge Base (Solution) 2350381 0 Troubleshoot None Automatic server reboot during yum update 2019-04-28 13:27:38 UTC

Internal Links: 1319595

Description nijin ashok 2016-04-19 09:24:51 UTC
Description of problem:

As per https://bugzilla.redhat.com/show_bug.cgi?id=1319595#c4, direct upgrade from 3.5 el6 to 3.6 el7 is not supported. The supported flow is 3.5 el6 to 3.5 el7, and then upgrade to 3.6 el7. This needs to be mentioned in the documentation:

https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Virtualization/3.6/html-single/Self-Hosted_Engine_Guide/index.html#Upgrading_the_Self-Hosted_Engine_from_6_to_7

Version-Release number of selected component (if applicable):

Red Hat Enterprise Virtualization 3.6

Comment 12 Marina Kalinin 2016-05-05 21:47:13 UTC
Great. Nijin, thanks for opening this bug.
I was about to open something similar.

See my comment here:
https://bugzilla.redhat.com/show_bug.cgi?id=1311027#c2

We had an extended conversation about this with Simone, and here is what we got:
- It does not really matter when you reinstall your hosts with RHEL 7 - whether while on 3.5 or on 3.6.
- If you have a RHEL 6 host on Hosted Engine, and especially if you have a RHEV-H host, you would have the problem described in the root cause section here:
https://access.redhat.com/solutions/2300331 [1]
- P.S. Our current guide suggests https://access.redhat.com/solutions/637583 for upgrading RHEL 6 hosts to 7 instead of reinstalling. After digging into this issue, I do not think we should recommend that procedure for SHE anymore.


Action Items:
- Review, test, and fix the solution: [Eng/GSS/QE]
https://access.redhat.com/solutions/2300331
- Update the docs with the correct procedure.
- Ideally, have Engineering fix this issue by providing an el6 version of HA 1.3, to allow storage migration in a more convenient way.


[1] The problem:
The Hosted Engine storage structure changed between 3.5 and 3.6. On a RHEL 7 host, updating the ovirt-hosted-engine-ha package initiates migration of the HE storage to the new structure. On a pre-3.6 HE with RHEL 6 based hosts, however, this upgrade never happens, since the rhel6 channel does not contain those packages. A RHEL 7 host would still get the latest packages and build the new storage structure, which is not compatible with the existing setup.
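
To illustrate the mismatch, here is a minimal check that can be run on each host (a sketch only, using the standard package name; not part of any documented procedure):

  # Which ovirt-hosted-engine-ha build is installed on this host?
  rpm -q ovirt-hosted-engine-ha

  # Is a newer build available from the configured channels?
  # On a RHEL 6 host this comes back empty, which is the core of the problem.
  yum list available ovirt-hosted-engine-ha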

Comment 13 Lucy Bopf 2016-05-09 01:34:44 UTC
Assigning to Byron for review.

Comment 14 Marina Kalinin 2016-05-19 17:21:32 UTC
https://bugzilla.redhat.com/show_bug.cgi?id=1337641

Comment 16 Lucy Bopf 2016-06-02 07:08:45 UTC
*** Bug 1339001 has been marked as a duplicate of this bug. ***

Comment 19 Marina Kalinin 2016-06-20 22:02:20 UTC
The current best reference for upgrading a 3.5 HE setup is the upgrade helper:
https://access.redhat.com/labs/rhevupgradehelper/

It covers all possible options.

Comment 22 Yaniv Lavi 2016-07-14 11:21:33 UTC
*** Bug 1337641 has been marked as a duplicate of this bug. ***

Comment 23 Yaniv Lavi 2016-07-14 11:22:29 UTC
The decided flow for RHEL-H updates is:
3.5 RHEL-H 6 Cluster -> Single host upgrade to 3.5 RHEV-H 7 -> Reinstall cluster to 3.6 RHEL-H 7

The decided flow for RHEV-H updates is:
3.5 RHEV-H 6 Cluster -> Single host upgrade to 3.5 RHEV-H 7 -> Reinstall cluster to 3.6 RHEV-H 7

Didi, can you please provide the flow to cover this using an in-cluster upgrade?

Comment 24 Marina Kalinin 2016-07-14 20:52:28 UTC
I think we should not say "Single host upgrade to 3.5 RHEV-H 7". It would never be an upgrade, but a clean install in both cases. But maybe it is only semantics. :-)

Comment 25 Sandro Bonazzola 2016-07-19 08:44:37 UTC
*** Bug 1319595 has been marked as a duplicate of this bug. ***

Comment 26 Yedidyah Bar David 2016-07-20 21:02:25 UTC
I now finished testing the following flow:

1. Installed and deployed hosted-engine 3.5 on two rhel6 hosts with nfs storage
2. Moved host1 to maintenance in the web admin
3. Removed it from the engine
4. Installed RHEV-H 3.5 rhel7 on it (from a USB stick)
5. From the TUI, chose hosted-engine -> Additional host setup
6. Input host ID 3 - in 3.5, you can't reuse 1. If you have N hosts, input N+1.
7. Eventually it failed as expected, because an el7 host can't be added to an el6 cluster, prompting to retry
8. Created a new cluster "el7", moved the host to maintenance, edited it to be in cluster el7
9. Accepted "Retry"; it finished successfully
10. Upgraded it to RHEV-H 3.6: installed rhev-hypervisor7-7.2-20160711.0.el6ev.noarch.rpm on the engine (latest QE build; most likely also works with the current released build), moved the host to maintenance, chose Upgrade from the web admin
11. (Optional) Verified that it upgraded the configuration to be on the shared storage with:
egrep 'Upgrading|Successfully upgraded' /var/log/ovirt-hosted-engine-ha/agent.log
12. Moved host2 to maintenance and removed it from the engine
13. Reinstalled it with rhel7.2 and hosted-engine 3.6
14. Stopped NetworkManager, ran 'hosted-engine --deploy' as additional host
15. Accepted the default host ID 2; for more hosts, reuse the previous ID that the host had
16. Since it was the last host, it was successfully added to cluster Default. For more hosts, you need to add them to cluster el7 and Retry
17. In principle, repeat steps 12 to 16 for the rest of the hosts. I didn't have any.
18. Since the requirement is for "RHEL-H" only hosts, moved host1 to maintenance, reinstalled it with rhel7.2 and hosted-engine 3.6, ran deploy, and used the opportunity to now reuse host ID 1.
19. Upgraded the engine: set global maintenance, added the 3.6 repos, yum update rhevm\*setup\*, engine-setup
20. At that point I already had the two hosts in cluster Default. In principle, move those that are not to it.
21. Set compatibility level of cluster Default to 3.6. Got a message that I need to reboot all VMs (new for 3.6.7 I think), rebooted them (set global maintenance for engine vm prior to rebooting it and none after it was up).
22. At this point OVF_STORE was created, but the score was still 2400 on both hosts. Moved host1 to maintenance and rebooted it, then set maintenance to none. It got score 3400 and, as expected, the engine vm was shut down and started on this host.
23. Moved the other host to maintenance as well and rebooted it (actually just restarted ovirt-ha-agent and ovirt-ha-broker this time, to see that it's enough).
24. Repeat step 23 for all other hosts - I didn't have any. Then verify that all have settled down on score 3400.
25. To get rid of the now-unused host ID 3 in 'hosted-engine --vm-status', moved one host to maintenance, stopped ovirt-ha-agent, ran hosted-engine --clean-metadata --host-id=3, started ovirt-ha-agent, and activated the host. Replace '3' with N+1. Verified that 'hosted-engine --vm-status' is now clean.
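
For reference, the command-line pieces of the flow above look roughly like this (a sketch only - the host ID, package glob, and log path follow the values used in the steps above; adjust for your environment):

  # Step 11: verify the HE configuration was migrated to the shared storage
  egrep 'Upgrading|Successfully upgraded' /var/log/ovirt-hosted-engine-ha/agent.log

  # Step 19: set global maintenance (on a host), then upgrade the engine (on the engine VM)
  hosted-engine --set-maintenance --mode=global
  yum update rhevm\*setup\*
  engine-setup

  # Steps 23-24: on each upgraded el7 host, restarting the HA services is enough
  systemctl restart ovirt-ha-agent ovirt-ha-broker

  # Step 25: with the host in maintenance, drop the stale host ID, then check
  systemctl stop ovirt-ha-agent
  hosted-engine --clean-metadata --host-id=3
  systemctl start ovirt-ha-agent
  hosted-engine --vm-status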

This is very similar to the current procedure, with one important difference (except for using RHEV-H): I do all "at once" - do not upgrade hosts to 3.5/el7 (except for the RHEV-H one) and then to 3.6, but directly to 3.6/el7. This requires restarting the agent (steps 22-24), which are not needed in the current procedure. This also has a side-effect of upgrading the engine only after upgrading the hosts. Can't see any real issue with this, just in case setting needinfo on Doron in case he wants to comment/review or wants me to try some other flow.

BTW, not sure what Yaniv meant by "in cluster upgrade". It can't be "in cluster", because we do not allow a single cluster with both el6 and el7 hosts, no matter if rhev-h or rhel-h.

Comment 27 Marina Kalinin 2016-07-21 21:59:54 UTC
In general, I do like it more, as we discussed - much better than installing specific packages. I think this solution is much cleaner.
A couple of questions on the process:

(In reply to Yedidyah Bar David from comment #26)
> I now finished testing the following flow:
...
> 14. Stopped NetworkManager, ran 'hosted-engine --deploy' as additional host
Why do we need to stop NM? Is it also in the regular instructions?
..
> 21. Set compatibility level of cluster Default to 3.6. Got a message that I need to reboot all VMs (new for 3.6.7 I think), rebooted them (set global maintenance for engine vm prior to rebooting it and none after it was up).

Starting with 3.6.8 you will not need to restart the VMs in order to change the compatibility mode; you will be able to do it at runtime. All the running VMs will then get a note that they should be restarted at some point (bz#1356194).
..
> This is very similar to the current procedure, with one important difference
> (except for using RHEV-H): I do all "at once" - do not upgrade hosts to
> 3.5/el7 (except for the RHEV-H one) and then to 3.6, but directly to
> 3.6/el7. This requires restarting the agent (steps 22-24), which are not
> needed in the current procedure.
What will happen if we do not restart the agents and continue running with a 2400 score until they are restarted?

> This also has a side-effect of upgrading the
> engine only after upgrading the hosts. Can't see any real issue with this,
> just in case setting needinfo on Doron in case he wants to comment/review or
> wants me to try some other flow.
Why couldn't you upgrade the engine first? Answer: because you would not be able to add the 3.5 host to the environment. I see.
The only challenge here is: what if a customer upgrades the manager on their own first? There is no rollback for this.
I guess, if that is the case, we can use the flow suggested in the KCS:
https://access.redhat.com/solutions/2351141
> 
> BTW, not sure what Yaniv meant by "in cluster upgrade". It can't be "in
> cluster", because we do not allow a single cluster with both el6 and el7
> hosts, no matter if rhev-h or rhel-h.
Indeed, I don't know why you couldn't add it to the original cluster. I do not see anywhere that we specify the host OS at the cluster level in 3.5.
However, as long as this procedure is tested and documented, I think it is ok.

So, this would cover both the RHEL and RHEV-H upgrade flows, right?
For RHEV-H it would be the same; just use RHEV-H hosts instead of RHEL.

Comment 28 Marina Kalinin 2016-07-21 22:06:11 UTC
*** Bug 1358599 has been marked as a duplicate of this bug. ***

Comment 29 Marina Kalinin 2016-07-22 18:38:30 UTC
Created attachment 1182930 [details]
Adding 3.5 host warning

Comment 30 Marina Kalinin 2016-07-22 18:40:38 UTC
Didi, please see the screenshot.
So, the customer would see this message and will have to choose Yes to continue, right?

Comment 31 Yedidyah Bar David 2016-07-24 06:57:40 UTC
(In reply to Marina from comment #30)
> Didi, please see the screenshot.
> So, the customer would see this message and will have to choose Yes to
> continue, right?

No. At least I didn't see it during comment 26. The message in the screenshot is "It seems like your existing HE infrastructure was deployed with version 3.5 (or before) and never upgraded to current release.". The flow I used was specifically designed not to enter this state. It does this:

1. Start with all 3.5/el6 hosts
- all good
2. Replace one with a 3.5/el7 RHEV-H
- still all good
3. _Upgrade_ that one to 3.6
- Not _add_ it as a new one, but upgrade using the RHEV-H upgrade feature
- This will upgrade the hosted_storage to 3.6
4. For all the others, remove and reinstall 3.6.

Comment 32 Yedidyah Bar David 2016-07-24 07:11:01 UTC
(In reply to Marina from comment #27)
> In general, I do like it more, as we discussed - much better then installing
> specific packages. I think this solution is much cleaner.
> Couple of questions on the process:
> 
> (In reply to Yedidyah Bar David from comment #26)
> > I now finished testing the following flow:
> ...
> > 14. Stopped NetworkManager, ran 'hosted-engine --deploy' as additional host
> Why do we need to stop NM? Is it also in the regular instructions?
> ..

Didn't check, but if you don't, it will tell you to stop it and abort.
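
On an el7 host that would be (a sketch; deploy only requires NetworkManager to be stopped - disabling it is just to keep it from returning after a reboot):

  systemctl stop NetworkManager
  systemctl disable NetworkManager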

> > 21. Set compatibility level of cluster Default to 3.6. Got a message that I need to reboot all VMs (new for 3.6.7 I think), rebooted them (set global maintenance for engine vm prior to rebooting it and none after it was up).
> 
> Starting with 3.6.8 you will not need to restart the VMs in order to change
> the compatibility mode; you will be able to do it at runtime. All the running
> VMs will then get a note that they should be restarted at some point
> (bz#1356194).
> ..

Right.

> > This is very similar to the current procedure, with one important difference
> > (except for using RHEV-H): I do all "at once" - do not upgrade hosts to
> > 3.5/el7 (except for the RHEV-H one) and then to 3.6, but directly to
> 3.6/el7. This requires restarting the agent (steps 22-24), which are not
> needed in the current procedure.
> What will happen if we do not restart the agents and continue running with
> a 2400 score until they are restarted?

I didn't try that. Most likely everything will work well, though some 3.6 features may not - not sure.

I'd personally not consider a system fully upgraded until all hosts have a score of 3400. For any other case, users should have good reasons and know what they are doing.
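
A quick check for that, per host (a sketch; the score is part of the 'hosted-engine --vm-status' output used in comment 26):

  # All hosts should settle on a score of 3400 once fully upgraded.
  hosted-engine --vm-status | grep -i score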

> 
> > This also has a side-effect of upgrading the
> > engine only after upgrading the hosts. Can't see any real issue with this,
> > just in case setting needinfo on Doron in case he wants to comment/review or
> > wants me to try some other flow.
> Why couldn't you upgrade the engine first?

Because engine-setup will tell you that you can't; see bug 1311027.

> Answer: because you would not be
> able to add the 3.5 host to the environment. I see.
> The only challenge here is: what if a customer upgrades the manager on their
> own first? There is no rollback for this.
> I guess, if that is the case, we can use the flow suggested in the KCS:
> https://access.redhat.com/solutions/2351141
> > 
> > BTW, not sure what Yaniv meant by "in cluster upgrade". It can't be "in
> > cluster", because we do not allow a single cluster with both el6 and el7
> > hosts, no matter if rhev-h or rhel-h.
> Indeed, I don't know why you couldn't add it to the original cluster. I do
> not see anywhere that we specify the host OS at the cluster level in 3.5.

We do not specify it.

It's just that the engine checks the OS of one of the hosts in the cluster, if the cluster is not empty.

Try this (in 3.5):

1. Create a new cluster
2. Add an el6 host - works
3. Add an el7 host - fails
4. Remove the el6 host - works
5. Add an el7 host - works
6. Add an el6 host - fails
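
(If you prefer checking this outside the web admin, the REST API reports each host's OS, so something like the following works - illustrative only; adjust the engine FQDN and credentials, and note the exact XML layout may differ between versions:)

  curl -k -u 'admin@internal:password' -H 'Accept: application/xml' \
       'https://engine.example.com/api/hosts' | grep -i -A2 '<os'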

> However, as long as this procedure is tested and documented, I think it is
> ok.
> 
> So, this would cover both the RHEL and RHEV-H upgrade flows, right?
> For RHEV-H it would be the same; just use RHEV-H hosts instead of RHEL.

Indeed. I didn't try with RHEV-H hosts, but I can't see why it wouldn't work.

Comment 33 Yaniv Lavi 2016-08-09 11:38:47 UTC
*** Bug 1364568 has been marked as a duplicate of this bug. ***

Comment 49 Byron Gravenorst 2016-09-19 04:51:05 UTC
*** Bug 1364543 has been marked as a duplicate of this bug. ***

