Bug 1290073 - engine-setup should warn users running within hosted engine to set to maintenance
Status: CLOSED CURRENTRELEASE
Product: ovirt-engine
Classification: oVirt
Component: Setup.Engine
Version: 3.6.1.2
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ovirt-4.0.2
Target Release: 4.0.2.1
Assigned To: Lev Veyde
QA Contact: Jiri Belka
Keywords: EasyFix, ZStream
Duplicates: 1333166
Depends On:
Blocks: 902971 1359844
Reported: 2015-12-09 10:40 EST by Sandro Bonazzola
Modified: 2016-08-12 10:26 EDT
CC List: 11 users

See Also:
Fixed In Version:
Doc Type: Enhancement
Doc Text:
Feature: Warn users to put the system into global maintenance mode before running engine-setup.
Reason: Data corruption may occur if engine-setup is run while the system is not in global maintenance.
Result: When the engine runs in a hosted engine configuration and the system is not in global maintenance mode, the user is warned and the setup is aborted.
Story Points: ---
Clone Of:
Clones: 1359844
Environment:
Last Closed: 2016-08-12 10:26:29 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Integration
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
rule-engine: ovirt-4.0.z+
rule-engine: exception+
ylavi: planning_ack+
dfediuck: devel_ack+
mavital: testing_ack+


External Trackers
Tracker       ID     Priority            Status     Summary                                    Last Updated
oVirt gerrit  57270  master              MERGED     packaging: Add Hosted Engine VM detection  2016-07-06 08:54 EDT
oVirt gerrit  60264  ovirt-engine-4.0    ABANDONED  packaging: Add Hosted Engine VM detection  2016-07-06 09:43 EDT
oVirt gerrit  60265  ovirt-engine-4.0.1  ABANDONED  packaging: Add Hosted Engine VM detection  2016-07-06 09:43 EDT
oVirt gerrit  60268  None                None       None                                       2016-07-07 07:53 EDT
oVirt gerrit  60275  master              MERGED     packaging: Add Hosted Engine VM detection  2016-07-25 10:17 EDT
oVirt gerrit  61339  ovirt-engine-3.6    MERGED     packaging: Add Hosted Engine VM detection  2016-07-25 11:06 EDT
oVirt gerrit  61340  ovirt-engine-4.0    MERGED     packaging: Add Hosted Engine VM detection  2016-07-25 11:06 EDT
oVirt gerrit  61341  ovirt-engine-4.0.2  MERGED     packaging: Add Hosted Engine VM detection  2016-07-25 11:06 EDT

Description Sandro Bonazzola 2015-12-09 10:40:57 EST
Description of problem:
With a Hosted Engine running ovirt-engine 3.6.1.1, I started an upgrade but forgot to move the cluster to global maintenance.
engine-setup started the upgrade to 3.6.1.2 and, while it was handling the DB upgrade, the VM got fenced.

Running engine-setup again shows:

[ ERROR ] Failed to execute stage 'Misc configuration': function getdwhhistorytimekeepingbyvarname(unknown) does not exist
LINE 2:             select * from GetDwhHistoryTimekeepingByVarName(
                                  ^
HINT:  No function matches the given name and argument types. You might need to add explicit type casts.
[ INFO  ] Yum Performing yum transaction rollback

so the engine instance got corrupted.

engine-setup should avoid entering the misc stage and changing data on disk when it runs within a hosted engine that is not in global maintenance.


Version-Release number of selected component (if applicable):
ovirt-engine-3.6.1.1
ovirt-engine-setup-3.6.1.2

How reproducible:

Steps to Reproduce:
1. install hosted engine
2. update the engine without moving to maintenance


Actual results:
The hosted engine VM gets fenced, causing data loss.

Expected results:
engine-setup should exit if not in global maintenance


Additional info:
Comment 1 Sandro Bonazzola 2015-12-23 07:18:57 EST
Let's add a warning that points users to the documentation so they read it before starting the upgrade, and let's make sure the documentation says to move the hosts to global maintenance.
Comment 2 Yedidyah Bar David 2016-03-08 10:52:39 EST
We added logic to test whether we are running in a hosted engine in bug 1311027. Parts of it can be reused for the current bug.
Comment 3 Yaniv Kaul 2016-04-10 09:15:06 EDT
(Should be proposed for 4.0 before being backported to 3.6.x).
Comment 4 Sandro Bonazzola 2016-04-28 07:26:05 EDT
Do you think QE should check this warning in both 4.0 and 3.6?
Every oVirt bug targeted to 3.6.x must go through master first if not clearly stated that it affects 3.6.x only.
Comment 5 Yaniv Kaul 2016-05-01 03:24:23 EDT
(In reply to Sandro Bonazzola from comment #4)
> Do you think QE should check this warning in both 4.0 and 3.6?

They do, that's part of the reason for the cloning process.

> Every oVirt bug targeted to 3.6.x must go through master first if not
> clearly stated that it affects 3.6.x only.
Comment 6 Simone Tiraboschi 2016-05-05 08:17:40 EDT
*** Bug 1333166 has been marked as a duplicate of this bug. ***
Comment 7 Marina 2016-05-05 10:57:39 EDT
If possible, I vote for the engine to check maintenance itself and quit, if not enabled. Users may just skip it.
Or, at least, make the default answer [Abort] rather than [Continue].
Comment 8 Yaniv Lavi 2016-05-05 11:19:18 EDT
(In reply to Marina from comment #7)
> If possible, I vote for the engine to check maintenance itself and quit, if
> not enabled. Users may just skip it.
> Or, at least, make the default answer [Abort] rather than [Continue].

We do not want to add hosted engine specific tests in the setup, since it will make things much more complicated.
Comment 9 Yaniv Lavi 2016-07-05 04:32:40 EDT
Why do we need to test in two streams?
Comment 10 Lev Veyde 2016-07-05 09:22:28 EDT
(In reply to Yaniv Dary from comment #8)
> (In reply to Marina from comment #7)
> > If possible, I vote for the engine to check maintenance itself and quit, if
> > not enabled. Users may just skip it.
> > Or, at least, make the default answer [Abort] rather than [Continue].
> 
> We do not want to add hosted engine specific tests in the setup, since it
> will make things much more complicated.

In this particular case we don't have too many options, as we need to stop the engine-setup itself.

After internal discussions, we decided on the approach that Marina suggested, as it is the safest. If the user wants to override the check and continue, he/she is required to add a flag to the answer file.
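
For illustration only, here is a minimal Python sketch of the decision described above: abort during validation when running inside the hosted-engine VM without global maintenance, unless an override flag is present in the answer file. It is not the merged patch. The function name, the exception class, and the answer-file key `EXAMPLE/continueSetupOnHostedEngineVm` are all hypothetical, and the two detection results are passed in as plain booleans because this bug does not spell out how they are obtained.

```python
class SetupAborted(RuntimeError):
    """Raised when validation refuses to continue (hypothetical class)."""


# Hypothetical answer-file key; the real flag name is not given in this bug.
OVERRIDE_KEY = "EXAMPLE/continueSetupOnHostedEngineVm"


def validate_global_maintenance(is_hosted_engine_vm, in_global_maintenance, answers):
    """Return a list of warnings, or raise SetupAborted before any data changes.

    is_hosted_engine_vm   -- True if the engine runs inside the hosted-engine VM
    in_global_maintenance -- True if global maintenance is currently set
    answers               -- dict of values loaded from the answer file
    """
    if not is_hosted_engine_vm or in_global_maintenance:
        # Either not a hosted engine, or already safe to proceed.
        return []
    if answers.get(OVERRIDE_KEY, False):
        # The user explicitly accepted the risk through the answer file.
        return [
            "Hosted Engine setup detected without Global Maintenance; "
            "continuing because the override flag is set."
        ]
    raise SetupAborted(
        "Hosted Engine setup detected, but Global Maintenance is not set."
    )


if __name__ == "__main__":
    try:
        validate_global_maintenance(True, False, {})
    except SetupAborted as exc:
        print("[ ERROR ] Failed to execute stage 'Setup validation': %s" % exc)
```

The point of raising in a validation step is that the check runs before the misc stage, so nothing on disk or in the database has been touched when the setup stops.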
Comment 11 Sandro Bonazzola 2016-07-15 10:44:16 EDT
As Lev mentioned, we discussed the issue with GSS. Not checking whether we are running on HE without global maintenance means data corruption when the hosted engine VM is fenced.
Comment 13 Yaniv Lavi 2016-07-28 08:32:19 EDT
Should this be on qa?
Comment 14 Lev Veyde 2016-07-28 09:19:28 EDT
(In reply to Yaniv Dary from comment #13)
> Should this be on qa?

We merged it just recently, so we need to check whether a build with the patch is available.
Comment 15 Jiri Belka 2016-08-01 07:17:55 EDT
ok, ovirt-engine-setup.noarch 0:4.0.2.3-0.1.el7ev

...
[ ERROR ] It seems that you are running your engine inside of the hosted-engine VM and are not in "Global Maintenance" mode. In that case you should put the system into the "Global Maintenance" mode before running engine-setup, or the hosted-engine HA agent might kill the machine, which might corrupt your data.
[ ERROR ] Failed to execute stage 'Setup validation': Hosted Engine setup detected, but Global Maintenance is not set.
...
