Bug 1167801 - engine-setup fails to upgrade when it fails to stop dwh
Summary: engine-setup fails to upgrade when it fails to stop dwh
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine-setup
Version: 3.5.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: 3.5.0
Assignee: Yedidyah Bar David
QA Contact: Petr Matyáš
URL:
Whiteboard: infra
Depends On:
Blocks: rhev35rcblocker rhev35gablocker
 
Reported: 2014-11-25 12:50 UTC by movciari
Modified: 2019-04-28 13:46 UTC

Fixed In Version:
Doc Type: Release Note
Doc Text:
When engine-setup runs on the engine machine and fails to stop the DWH process (which can happen, for example, if dwhd already died or disconnected uncleanly), setup is aborted. As a workaround, stop the ovirt-engine-dwhd service manually and run the following on the engine db: UPDATE dwh_history_timekeeping SET var_value = 0 WHERE var_name = 'DwhCurrentlyRunning'. Running engine-setup again should then succeed.
Clone Of:
Environment:
Last Closed: 2015-02-17 17:12:05 UTC
oVirt Team: Infra
Target Upstream Version:
Embargoed:


Attachments
engine upgrade log (385.08 KB, text/plain)
2014-11-25 12:50 UTC, movciari
dwhd log (800 bytes, text/plain)
2014-11-25 12:52 UTC, movciari
dwh logs (1.38 MB, application/x-gzip)
2014-11-26 12:48 UTC, Yedidyah Bar David


Links
System ID Private Priority Status Summary Last Updated
Red Hat Knowledge Base (Solution) 2599851 0 None None None 2016-09-02 20:23:04 UTC
oVirt gerrit 35583 0 master MERGED packaging: setup: unset DisconnectDwh on failure 2021-01-26 17:50:17 UTC
oVirt gerrit 35598 0 None MERGED packaging: setup: unset DisconnectDwh on failure 2021-01-26 17:50:18 UTC

Description movciari 2014-11-25 12:50:18 UTC
Created attachment 961210 [details]
engine upgrade log

Description of problem:
On my setup, where engine, dwh and reports each run on a separate server, I tried to upgrade from vt9 to vt11 and got the following error during engine-setup on the engine host:
[ INFO  ] Stage: Misc configuration
[ INFO  ] Stopping DWH service on host mo-1.rhev.lab.eng.brq.redhat.com...

[ ERROR ] dwhd is currently running. Its hostname is mo-1.rhev.lab.eng.brq.redhat.com. Please stop it before running Setup.
[ ERROR ] Failed to execute stage 'Misc configuration': dwhd is currently running
[ INFO  ] Yum Performing yum transaction rollback
[ INFO  ] Stage: Clean up
          Log file is located at /var/log/ovirt-engine/setup/ovirt-engine-setup-20141124144013-w5uq3h.log
[ INFO  ] Generating answer file '/var/lib/ovirt-engine/setup/answers/20141124144140-setup.conf'
[ INFO  ] Stage: Pre-termination
[ INFO  ] Stage: Termination
[ ERROR ] Execution of setup failed

The first time this happened, dwhd was running, so I tried to stop it manually, and I got the same error.
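
A minimal sketch of how the setup output quoted above could be scanned for its ERROR lines and for the stage that failed. The sample lines are copied from this report; the function names (find_errors, failed_stage) are hypothetical helpers and are not part of engine-setup.

```python
import re

# Sample lines copied from the setup output quoted in this report.
SAMPLE_OUTPUT = """\
[ INFO  ] Stage: Misc configuration
[ INFO  ] Stopping DWH service on host mo-1.rhev.lab.eng.brq.redhat.com...
[ ERROR ] dwhd is currently running. Its hostname is mo-1.rhev.lab.eng.brq.redhat.com. Please stop it before running Setup.
[ ERROR ] Failed to execute stage 'Misc configuration': dwhd is currently running
[ INFO  ] Stage: Clean up
[ ERROR ] Execution of setup failed
"""

# Matches "[ LEVEL ] message"; the level may be padded with spaces.
LINE_RE = re.compile(r"^\[\s*(?P<level>[A-Z]+)\s*\]\s*(?P<message>.*)$")


def find_errors(output):
    """Return the messages of all lines tagged ERROR (hypothetical helper)."""
    errors = []
    for line in output.splitlines():
        match = LINE_RE.match(line)
        if match and match.group("level") == "ERROR":
            errors.append(match.group("message"))
    return errors


def failed_stage(errors):
    """Return the stage named in a "Failed to execute stage" message, or None."""
    for message in errors:
        match = re.search(r"Failed to execute stage '([^']+)'", message)
        if match:
            return match.group(1)
    return None


if __name__ == "__main__":
    for message in find_errors(SAMPLE_OUTPUT):
        print(message)
    print("failed stage:", failed_stage(find_errors(SAMPLE_OUTPUT)))
```

The same functions could be pointed at the text of the log file named in the output, assuming its lines carry the same "[ LEVEL ]" prefix, which this report does not confirm.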

Version-Release number of selected component (if applicable):
vt11

How reproducible:
100%

Steps to Reproduce:
1. install engine on host 'a', dwh on host 'b' and reports on host 'c' - older version
2. change repos to new version and upgrade relevant packages on all hosts:
rhevm-setup on host 'a', rhevm-dwh-setup on host 'b', rhevm-reports-setup on host 'c'
3. run engine-setup on host 'a'

Actual results:
update fails

Expected results:
update should pass

Additional info:

Comment 1 movciari 2014-11-25 12:52:39 UTC
Created attachment 961211 [details]
dwhd log

Comment 2 Shirly Radco 2014-11-25 12:59:05 UTC
Did you first upgrade to 3.5 and then move dwh and reports to separate hosts?
If not, please follow the upgrade procedure as documented in:
DWH:
https://bugzilla.redhat.com/show_bug.cgi?id=1156009 

Reports:
https://bugzilla.redhat.com/show_bug.cgi?id=1156015

Comment 3 movciari 2014-11-25 13:26:41 UTC
This was installed on separate hosts from the beginning, and I'm upgrading from an older build of 3.5 to a newer build in order to test whether this will work for zstream updates once released.

Comment 4 Shirly Radco 2014-11-25 13:37:00 UTC
But you stated in the steps to reproduce that you installed an older version on hosts b and c.

They should have latest repos. 

To test the z-stream upgrade, run engine-setup again after you installed it successfully the first time.
Not sure what you used as "older" version but it might have bugs that were already fixed.

Didi, do you agree?

Comment 5 Shirly Radco 2014-11-25 13:39:53 UTC
Michal, how did you stop the dwh service?

Comment 6 movciari 2014-11-25 13:54:28 UTC
i stopped dwh service with "service ovirt-engine-dwhd stop"... that, and "/etc/init.d/ovirt-engine-dwhd stop" are the only correct ways to stop a service in a system that uses sysV init

"Older version" means an older version of 3.5 - I used build vt9...
And yes, I installed it successfully the first time, but on a somewhat older build - this should be a correct testing scenario.
If there is some bug in vt9, already fixed, that makes the update to vt11 fail, please tell me and I will wait for vt12 before verifying https://bugzilla.redhat.com/show_bug.cgi?id=1118322 and https://bugzilla.redhat.com/show_bug.cgi?id=1100205 and testing whether a zstream update could work.

i don't think running engine-setup with same build is sufficient test because setup behaves differently if there are new packages (downloading and installing new rpms, etc.)

Comment 9 Yedidyah Bar David 2014-11-26 10:21:29 UTC
Results of analysis on the machines of the reporter (Thanks, Michal!):

1. If for some reason dwhd loses contact with the engine db, it more-or-less "hangs up". That is, it does not exit, nor try to reconnect.

Not sure what the root cause was in this specific case. The first error in the log was:

2014-11-04 10:38:10|DswxDN|3L7BLF|f1VhYd|OVIRT_ENGINE_DWH|ConfigurationSync|Default|6|Java Exception|tJDBCOutput_9|org.postgresql.util.PSQLException:FATAL: terminating connection due to administrator command|1

When I later tried to reproduce by restarting pg on the engine db, I got a different error on dwh:

2014-11-26 10:45:11|YtiyXa|YtiyXa|YtiyXa|OVIRT_ENGINE_DWH|HistoryETL|Default|6|Java Exception|tJDBCInput_1|java.lang.NullPointerException:null|1
2014-11-26 10:46:00|dP658j|YtiyXa|3wzY5W|OVIRT_ENGINE_DWH|SampleTimeKeepingJob|Default|6|Java Exception|tJDBCConnection_3|org.postgresql.util.PSQLException:Connection refused. Check that the hostname and port are correct and that the postmaster is accepting TCP/IP connections.|1

I'll attach all logs.

Since this was the root cause for this bug, I changed the summary accordingly. We opened other bugs for the next steps.

It might be that dwhd does try to reconnect but somehow this does not work well. Not sure. Perhaps the best solution will be to just exit with a message to the log.

2. Running engine-setup on the engine side tries to disconnect dwh by setting DisconnectDwh to 1 and waiting a bit; if dwhd does not stop, setup fails. That's the report in this bug's description. It does not set DisconnectDwh back to 0 before failing; see bug 1168160 for that.

3. Manually restarting dwhd at that point makes it exit, because it sees that DisconnectDwh is 1. Currently it does not log this; see bug 1168141.
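
A minimal sketch of how the pipe-delimited dwhd log lines quoted in item 1 could be split into fields when looking for database errors. The field names are assumptions made for illustration (the source does not document the format), and parse_dwh_log_line and is_db_error are hypothetical helpers, not part of dwhd.

```python
from collections import namedtuple

# Field names are assumptions; the log format is not documented in this report.
FIELDS = ("timestamp", "id_1", "id_2", "id_3", "project", "job",
          "context", "priority", "kind", "origin", "message", "code")
DwhLogEntry = namedtuple("DwhLogEntry", FIELDS)

# Lines copied from the analysis above.
SAMPLE_LINES = [
    "2014-11-04 10:38:10|DswxDN|3L7BLF|f1VhYd|OVIRT_ENGINE_DWH|ConfigurationSync|Default|6|Java Exception|tJDBCOutput_9|org.postgresql.util.PSQLException:FATAL: terminating connection due to administrator command|1",
    "2014-11-26 10:45:11|YtiyXa|YtiyXa|YtiyXa|OVIRT_ENGINE_DWH|HistoryETL|Default|6|Java Exception|tJDBCInput_1|java.lang.NullPointerException:null|1",
    "2014-11-26 10:46:00|dP658j|YtiyXa|3wzY5W|OVIRT_ENGINE_DWH|SampleTimeKeepingJob|Default|6|Java Exception|tJDBCConnection_3|org.postgresql.util.PSQLException:Connection refused. Check that the hostname and port are correct and that the postmaster is accepting TCP/IP connections.|1",
]


def parse_dwh_log_line(line):
    """Split one log line into a DwhLogEntry (hypothetical helper)."""
    parts = line.rstrip("\n").split("|")
    if len(parts) < len(FIELDS):
        raise ValueError(
            "expected at least %d fields, got %d" % (len(FIELDS), len(parts)))
    head = parts[:10]
    # Re-join the middle in case the message itself contains a '|'.
    message = "|".join(parts[10:-1])
    code = parts[-1]
    return DwhLogEntry(*head, message, code)


def is_db_error(entry):
    """True if the message mentions a PostgreSQL driver exception."""
    return "PSQLException" in entry.message


if __name__ == "__main__":
    for raw in SAMPLE_LINES:
        entry = parse_dwh_log_line(raw)
        if is_db_error(entry):
            print(entry.timestamp, entry.job, entry.message)
```

Filtering on PSQLException is only a heuristic: the NullPointerException line above also followed a database restart, so a real check would need to look at neighbouring entries as well.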

Comment 10 Yedidyah Bar David 2014-11-26 10:22:29 UTC
Workaround:

Reset DisconnectDwh by running the following on the engine db:

update vdc_options set option_value='0' where option_name = 'DisconnectDwh';

and then restart dwhd.
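
A minimal sketch of the flag handling discussed in this bug: set DisconnectDwh to 1, check whether dwhd stopped, and reset the flag to 0 if it did not, so that a later dwhd restart is not blocked. This is not the engine-setup code. It uses an in-memory SQLite table as a stand-in for the vdc_options table in the PostgreSQL engine db, and every function name, including the dwh_has_stopped callback, is hypothetical.

```python
import sqlite3


def make_engine_db():
    """Create an in-memory stand-in for the engine db (assumption: SQLite, not PostgreSQL)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE vdc_options (option_name TEXT PRIMARY KEY, option_value TEXT)")
    conn.execute("INSERT INTO vdc_options VALUES ('DisconnectDwh', '0')")
    conn.commit()
    return conn


def set_disconnect_dwh(conn, value):
    """Same statement as the workaround above, with the value as a parameter."""
    conn.execute(
        "UPDATE vdc_options SET option_value = ? WHERE option_name = 'DisconnectDwh'",
        (str(value),))
    conn.commit()


def get_disconnect_dwh(conn):
    row = conn.execute(
        "SELECT option_value FROM vdc_options WHERE option_name = 'DisconnectDwh'"
    ).fetchone()
    return row[0]


def stop_dwh_for_upgrade(conn, dwh_has_stopped):
    """Ask dwhd to disconnect; reset the flag before failing if it does not stop.

    dwh_has_stopped is a hypothetical callable that returns True once dwhd
    is no longer running.
    """
    set_disconnect_dwh(conn, 1)
    try:
        if not dwh_has_stopped():
            raise RuntimeError("dwhd is currently running")
    except Exception:
        # Unset the flag on failure so a manual dwhd restart can succeed.
        set_disconnect_dwh(conn, 0)
        raise


if __name__ == "__main__":
    conn = make_engine_db()
    try:
        stop_dwh_for_upgrade(conn, lambda: False)
    except RuntimeError as exc:
        print("setup failed:", exc)
    print("DisconnectDwh =", get_disconnect_dwh(conn))
```

What happens to the flag after a successful stop is outside this sketch; it only covers the failure path.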

Comment 11 Yedidyah Bar David 2014-11-26 10:23:37 UTC
Lowering severity because there is a workaround.

Comment 12 Sandro Bonazzola 2014-11-26 10:28:04 UTC
Moving to infra since it seems to be an issue within the dwhd daemon.

Comment 13 Yedidyah Bar David 2014-11-26 10:28:34 UTC
Actually the workaround in comment 10 is for bug 1168160. For this bug, restarting dwhd should be enough (after fixing the root cause preventing it from accessing the engine's db).

Comment 14 Yedidyah Bar David 2014-11-26 12:48:01 UTC
Created attachment 961653 [details]
dwh logs

Comment 16 Julie 2014-12-09 07:53:44 UTC
This bug has missed the release notes cut-off date and will be excluded from the release notes.

Comment 17 Eyal Edri 2015-02-17 17:12:05 UTC
rhev 3.5.0 was released. closing.

Comment 18 Nicolas Ecarnot 2015-09-25 08:16:51 UTC
(In reply to Yedidyah Bar David from comment #13)
> Actually the workaround in comment 10 is for bug 1168160. For this bug,
> restarting dwhd should be enough (after fixing the root cause preventing it
> from accessing the engine's db).

I confirm this: I was upgrading from 3.5.1.1-1.el6 to 3.5.4.
engine-setup broke with the error above ([ ERROR ] dwhd is currently running).

I did nothing other than "service ovirt-engine-dwhd restart", then ran engine-setup again.
I did NOT run any SQL query.

engine-setup then ran fine.

Comment 19 Michael Watters 2017-05-02 13:24:12 UTC
This bug still occurs when attempting to upgrade from ovirt 4.0 to 4.1.

Comment 20 Sandro Bonazzola 2017-05-02 13:36:46 UTC
(In reply to Michael Watters from comment #19)
> This bug still occurs when attempting to upgrade from ovirt 4.0 to 4.1.

Can you please open a new bug on oVirt 4.1 and attach a log-collector report to it?

Comment 21 Yedidyah Bar David 2017-05-03 06:30:43 UTC
Michael opened bug 1447347.

