Bug 1290073

Summary: engine-setup should warn users running within hosted engine to set to maintenance
Product: [oVirt] ovirt-engine
Reporter: Sandro Bonazzola <sbonazzo>
Component: Setup.Engine
Assignee: Lev Veyde <lveyde>
Status: CLOSED CURRENTRELEASE
QA Contact: Jiri Belka <jbelka>
Severity: high
Priority: high
Version: 3.6.1.2
CC: bugs, dfediuck, didi, lveyde, mavital, mkalinin, rmartins, sbonazzo, stirabos, ykaul, ylavi
Target Milestone: ovirt-4.0.2
Target Release: 4.0.2.1
Keywords: EasyFix, ZStream
Flags: rule-engine: ovirt-4.0.z+, rule-engine: exception+, ylavi: planning_ack+, dfediuck: devel_ack+, mavital: testing_ack+
Hardware: Unspecified
OS: Unspecified
Doc Type: Enhancement
Doc Text:
Feature: Warn users to put the system into global maintenance mode before running engine-setup.
Reason: Data corruption may occur if engine-setup is run while the system is not in global maintenance mode.
Result: If the engine is running in a hosted-engine configuration and the system is not in global maintenance mode, the user is warned and setup is aborted.
Clones: 1359844
Last Closed: 2016-08-12 14:26:29 UTC
Type: Bug
oVirt Team: Integration
Bug Blocks: 902971, 1359844

Description Sandro Bonazzola 2015-12-09 15:40:57 UTC
Description of problem:
With a hosted engine running ovirt-engine 3.6.1.1, I started an upgrade but forgot to move the cluster to global maintenance.
engine-setup started the upgrade to 3.6.1.2, and while it was upgrading the database the VM got fenced.

Running engine-setup again shows:

[ ERROR ] Failed to execute stage 'Misc configuration': function getdwhhistorytimekeepingbyvarname(unknown) does not exist LINE 2:             select * from GetDwhHistoryTimekeepingByVarName(                                   ^ HINT:  No function matches the given name and argument types. You might need to add explicit type casts. 
[ INFO  ] Yum Performing yum transaction rollback

so the engine instance was corrupted.

engine-setup should avoid entering the misc stage and changing data on disk if it is running within a hosted engine that is not in global maintenance.


Version-Release number of selected component (if applicable):
ovirt-engine-3.6.1.1
ovirt-engine-setup-3.6.1.2

How reproducible:

Steps to Reproduce:
1. Install hosted engine.
2. Update the engine without moving to global maintenance.


Actual results:
The hosted-engine VM gets fenced, causing data loss.

Expected results:
engine-setup should exit if the system is not in global maintenance.


Additional info:

Comment 1 Sandro Bonazzola 2015-12-23 12:18:57 UTC
Let's add a warning pointing users to the documentation so that they read it before starting the upgrade, and let's make sure the documentation says to move the hosts to global maintenance.

Comment 2 Yedidyah Bar David 2016-03-08 15:52:39 UTC
We added logic to detect whether we are running on hosted engine in bug 1311027. We can reuse parts of it for the current bug.

Comment 3 Yaniv Kaul 2016-04-10 13:15:06 UTC
(Should be proposed for 4.0 before being backported to 3.6.x).

Comment 4 Sandro Bonazzola 2016-04-28 11:26:05 UTC
Do you think QE should check this warning in both 4.0 and 3.6?
Every oVirt bug targeted to 3.6.x must go through master first, unless it is clearly stated that it affects 3.6.x only.

Comment 5 Yaniv Kaul 2016-05-01 07:24:23 UTC
(In reply to Sandro Bonazzola from comment #4)
> Do you think QE should check this warning in both 4.0 and 3.6?

They do, that's part of the reason for the cloning process.

> Every oVirt bug targeted to 3.6.x must go through master first, unless it
> is clearly stated that it affects 3.6.x only.

Comment 6 Simone Tiraboschi 2016-05-05 12:17:40 UTC
*** Bug 1333166 has been marked as a duplicate of this bug. ***

Comment 7 Marina Kalinin 2016-05-05 14:57:39 UTC
If possible, I vote for the engine to check the maintenance state itself and quit if it is not enabled. Users may just skip the warning.
Or, at least, make the default answer [Abort] rather than [Continue].

Comment 8 Yaniv Lavi 2016-05-05 15:19:18 UTC
(In reply to Marina from comment #7)
> If possible, I vote for the engine to check maintenance itself and quit, if
> not enabled. Users may just skip it.
> Or, at least, make the default answer [Abort] rather than [Continue].

We do not want to add hosted-engine-specific tests to the setup, since it would make things much more complicated.

Comment 9 Yaniv Lavi 2016-07-05 08:32:40 UTC
Why do we need to test in two streams?

Comment 10 Lev Veyde 2016-07-05 13:22:28 UTC
(In reply to Yaniv Dary from comment #8)
> (In reply to Marina from comment #7)
> > If possible, I vote for the engine to check maintenance itself and quit, if
> > not enabled. Users may just skip it.
> > Or, at least, make the default answer [Abort] rather than [Continue].
> 
> We do not want to add hosted-engine-specific tests to the setup, since it
> would make things much more complicated.

In this particular case we don't have many options, as we need to stop engine-setup itself.

After internal discussions we decided on the approach that Marina suggested, as it is the safest. If the user wants to override the check and continue, they are required to add a flag to the answer file.
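The decision above (abort unless the system is in global maintenance, with an explicit answer-file override) can be sketched as a small validation function. This is a minimal, hypothetical Python sketch, not the actual engine-setup/otopi code: the function name, the exception class and the three boolean inputs are all assumptions. In the real tool, hosted-engine detection (see bug 1311027) and the maintenance state would be gathered elsewhere, and the override would be read from an answer-file key that this bug does not name. The error text mirrors the message shown in the verification output of comment 15.

```python
class SetupValidationError(RuntimeError):
    """Hypothetical exception raised when a pre-upgrade check fails."""


def validate_hosted_engine_maintenance(is_hosted_engine: bool,
                                       in_global_maintenance: bool,
                                       override: bool = False) -> None:
    """Abort setup when running inside a hosted-engine VM outside global maintenance.

    All inputs are hypothetical placeholders: whether we run on hosted engine,
    whether global maintenance is set, and whether the user explicitly asked
    (via the answer file) to continue anyway.
    """
    if not is_hosted_engine:
        return  # Not a hosted-engine deployment: nothing to check.
    if in_global_maintenance:
        return  # HA agents will not fence the engine VM during the upgrade.
    if override:
        return  # The user explicitly accepted the risk of data corruption.
    # Default is to abort, as suggested in comment 7.
    raise SetupValidationError(
        "Hosted Engine setup detected, but Global Maintenance is not set."
    )
```

Making abort the default and requiring an explicit flag to continue means a user who skips the warning cannot accidentally start an upgrade that the HA agent may interrupt by fencing the VM.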

Comment 11 Sandro Bonazzola 2016-07-15 14:44:16 UTC
As Lev mentioned, we discussed the issue with GSS: not checking whether we're running on HE without global maintenance will mean data corruption if the hosted-engine VM is fenced.

Comment 13 Yaniv Lavi 2016-07-28 12:32:19 UTC
Should this be on qa?

Comment 14 Lev Veyde 2016-07-28 13:19:28 UTC
(In reply to Yaniv Dary from comment #13)
> Should this be on qa?

We only recently merged it, so we need to check whether a build with the patch is available.

Comment 15 Jiri Belka 2016-08-01 11:17:55 UTC
ok, ovirt-engine-setup.noarch 0:4.0.2.3-0.1.el7ev

...
[ ERROR ] It seems that you are running your engine inside of the hosted-engine VM and are not in "Global Maintenance" mode. In that case you should put the system into the "Global Maintenance" mode before running engine-setup, or the hosted-engine HA agent might kill the machine, which might corrupt your data.
[ ERROR ] Failed to execute stage 'Setup validation': Hosted Engine setup detected, but Global Maintenance is not set.
...