Description of problem:
With a Hosted Engine running ovirt-engine 3.6.1.1, I started an upgrade and forgot to move the cluster to global maintenance. engine-setup started the upgrade to 3.6.1.2, and while it was handling the DB upgrade the VM got fenced. Running engine-setup again shows:

[ ERROR ] Failed to execute stage 'Misc configuration': function getdwhhistorytimekeepingbyvarname(unknown) does not exist
LINE 2: select * from GetDwhHistoryTimekeepingByVarName(
                      ^
HINT: No function matches the given name and argument types. You might need to add explicit type casts.
[ INFO ] Yum Performing yum transaction rollback

So the engine instance got corrupted. engine-setup should avoid entering the misc stage and changing data on disk when it runs within a hosted engine that is not in maintenance.

Version-Release number of selected component (if applicable):
ovirt-engine-3.6.1.1
ovirt-engine-setup-3.6.1.2

How reproducible:

Steps to Reproduce:
1. install hosted engine
2. update the engine without moving to maintenance

Actual results:
hosted engine VM gets fenced, causing data loss

Expected results:
engine-setup should exit if not in global maintenance

Additional info:
Let's add a warning that points to the documentation and asks users to read it before starting the upgrade, and let's make sure the doc says to move the hosts to global maintenance.
We added logic to test whether we are running on hosted-engine in bug 1311027. Parts of it can be reused for the current bug.
(Should be proposed for 4.0 before being backported to 3.6.x).
Do you think QE should check this warning in both 4.0 and 3.6? Every oVirt bug targeted to 3.6.x must go through master first if not clearly stated that it affects 3.6.x only.
(In reply to Sandro Bonazzola from comment #4)
> Do you think QE should check this warning in both 4.0 and 3.6?

They do, that's part of the reason for the cloning process.

> Every oVirt bug targeted to 3.6.x must go through master first if not
> clearly stated that it affects 3.6.x only.
*** Bug 1333166 has been marked as a duplicate of this bug. ***
If possible, I vote for the engine to check maintenance itself and quit if it is not enabled. Users may just skip a warning. Or, at least, make the default answer [Abort] rather than [Continue].
(In reply to Marina from comment #7)
> If possible, I vote for the engine to check maintenance itself and quit, if
> not enabled. Users may just skip it.
> Or, at least, make the default answer as [Abort] rather than [Continue].

We do not want to add hosted-engine-specific tests to the setup, since that will make things much more complicated.
Why do we need to test in two streams?
(In reply to Yaniv Dary from comment #8)
> (In reply to Marina from comment #7)
> > If possible, I vote for the engine to check maintenance itself and quit, if
> > not enabled. Users may just skip it.
> > Or, at least, make the default answer as [Abort] rather than [Continue].
>
> We do not want to add hosted engine specific tests in the setup, since it
> will make things much more complicated.

In this particular case we don't have too many options, as we need to stop engine-setup itself. After internal discussions we decided on the approach that Marina suggested, as the safest way. If the user wants to override the check and continue, he/she is required to add a flag to the answer file.
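
To make the agreed behaviour concrete, here is a minimal sketch of the validation decision. It is not the merged patch: the `Environment` fields, the `SetupAborted` exception and the `validate_maintenance` function are hypothetical names, and how engine-setup actually detects a hosted engine or reads the override flag from the answer file is not shown. Only the wording of the error message is taken from this bug.

```python
from dataclasses import dataclass


class SetupAborted(RuntimeError):
    """Raised when validation decides that setup must not proceed."""


@dataclass
class Environment:
    # All three fields are hypothetical stand-ins for whatever the real
    # setup code detects at runtime or reads from the answer file.
    is_hosted_engine: bool
    in_global_maintenance: bool
    override_maintenance_check: bool = False


def validate_maintenance(env: Environment) -> str:
    """Return a status string, or raise SetupAborted.

    The check is meant to run during validation, i.e. before any stage
    that changes data on disk.
    """
    if not env.is_hosted_engine:
        # Nothing to check on a standalone engine.
        return "not-hosted-engine"
    if env.in_global_maintenance:
        # The HA agent will not fence the VM during the upgrade.
        return "ok"
    if env.override_maintenance_check:
        # The user explicitly accepted the risk via the answer file.
        return "overridden"
    raise SetupAborted(
        "Hosted Engine setup detected, but Global Maintenance is not set."
    )
```

The point of the design is that the unsafe path is never the default: without the explicit override, a hosted engine outside global maintenance stops the run at validation.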
As Lev mentioned, we discussed the issue with GSS. Not checking whether we're running on HE without global maintenance will mean data corruption when the hosted engine VM is fenced.
Should this be on qa?
(In reply to Yaniv Dary from comment #13)
> Should this be on qa?

We only recently merged it, so we need to check whether a build with the patch is available.
ok, ovirt-engine-setup.noarch 0:4.0.2.3-0.1.el7ev

...
[ ERROR ] It seems that you are running your engine inside of the hosted-engine VM and are not in "Global Maintenance" mode. In that case you should put the system into the "Global Maintenance" mode before running engine-setup, or the hosted-engine HA agent might kill the machine, which might corrupt your data.
[ ERROR ] Failed to execute stage 'Setup validation': Hosted Engine setup detected, but Global Maintenance is not set.
...