Bug 1397005

Summary: engine-setup fails on checking if dwhd is running
Product: [oVirt] ovirt-engine Reporter: Petr Matyáš <pmatyas>
Component: Setup.Engine Assignee: Yedidyah Bar David <didi>
Status: CLOSED WORKSFORME QA Contact: Pavel Stehlik <pstehlik>
Severity: urgent Docs Contact:
Priority: unspecified    
Version: 4.0.6 CC: bugs, didi, oourfali, pmatyas, sradco, ylavi
Target Milestone: ovirt-4.0.6 Flags: rule-engine: ovirt-4.0.z+
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-11-28 11:29:45 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Metrics RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description         Flags
engine log          none
dwhd log            none
postgresql logs     none
yum log             none
engine setup log    none

Description Petr Matyáš 2016-11-21 11:36:52 UTC
Created attachment 1222361 [details]
engine log

Description of problem:
When engine-setup is run after upgrading from 4.0.5-7 to 4.0.6-1, it fails on the check for whether dwhd is running, even though the dwhd service is not running and should already have been stopped.
DWH is local, not remote.

Version-Release number of selected component (if applicable):
4.0.6-1

How reproducible:
always

Steps to Reproduce:
1. install 4.0.5-7
2. upgrade packages to 4.0.6-1
3. run engine-setup

Actual results:
engine-setup fails on the "dwhd is running" check

Expected results:
successful engine-setup

Additional info:
[ INFO  ] Stopping dwh service
[ INFO  ] Stopping Image I/O Proxy service
[ INFO  ] Stopping websocket-proxy service
[ ERROR ] dwhd is currently running. Its hostname is pm-rh40.rhev.lab.eng.brq.redhat.com. Please stop it before running Setup.
[ ERROR ] Failed to execute stage 'Transaction setup': dwhd is currently running

Comment 1 Yedidyah Bar David 2016-11-21 12:27:32 UTC
Please attach postgresql logs. Most likely this is a duplicate of bug 1286441.

Comment 2 Yedidyah Bar David 2016-11-21 12:28:14 UTC
However, considering the number of duplicates it has, perhaps we should try to do something.

Comment 3 Yedidyah Bar David 2016-11-21 12:28:56 UTC
Please also attach the dwhd logs. Thanks.

Comment 4 Petr Matyáš 2016-11-21 12:30:34 UTC
Created attachment 1222379 [details]
dwhd log

Comment 5 Petr Matyáš 2016-11-21 12:36:31 UTC
Created attachment 1222380 [details]
postgresql logs

Comment 6 Yedidyah Bar David 2016-11-21 12:42:39 UTC
Please also attach the yum logs. Sorry :-( Thanks.

Comment 7 Petr Matyáš 2016-11-21 13:33:19 UTC
Created attachment 1222397 [details]
yum log

Comment 8 Yedidyah Bar David 2016-11-21 13:42:26 UTC
Please also attach the setup logs. Thanks.

Comment 9 Petr Matyáš 2016-11-21 14:05:36 UTC
Created attachment 1222399 [details]
engine setup log

...I should have packed the whole /var/log folder...

Comment 10 Yedidyah Bar David 2016-11-21 14:42:12 UTC
1. yum updated postgresql. yum.log:

Nov 21 10:15:35 Updated: postgresql-server-9.2.18-1.el7.x86_64

2. dwhd fails to reconnect. First error in dwhd log:

2016-11-21 10:18:47|sIAAti|yb0VIK|q0Ha1J|OVIRT_ENGINE_DWH|OsEnumUpdate|Default|6|Java Exception|tJDBCInput_4|org.postgresql.util.PSQLException:FATAL: terminating connection due to administrator command|1

Seems like it never succeeded.

3. engine-setup stops dwhd "successfully":

2016-11-21 10:35:13 DEBUG otopi.plugins.otopi.services.systemd plugin.executeRaw:863 execute-result: ('/bin/systemctl', 'stop', 'ovirt-engine-dwhd.service'), rc=0

Nothing in stdout/stderr.

4. engine-setup checks the db and sees it's still "up":

2016-11-21 10:35:13 DEBUG otopi.ovirt_engine_setup.engine_common.database database.execute:222 Result: [{'var_value': '1', 'var_datetime': None, 'var_name': 'DwhCurrentlyRunning'}]
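
The mismatch between steps 3 and 4 can be illustrated with a minimal Python sketch. It assumes the check boils down to reading the DwhCurrentlyRunning row in the shape logged above; the helper name dwh_flag_is_set is hypothetical and is not taken from the engine-setup code.

```python
def dwh_flag_is_set(rows):
    """Return True if the DwhCurrentlyRunning row says dwhd is running.

    `rows` is a list of dicts shaped like the database.execute result in
    the setup log. Hypothetical helper, not engine-setup's actual code.
    """
    for row in rows:
        if row.get('var_name') == 'DwhCurrentlyRunning':
            # The value is logged as a string; '1' is the "still up" state.
            return row.get('var_value') == '1'
    # Assumption: a missing row is treated as "not running".
    return False


# The row engine-setup read at 10:35:13, right after systemctl stop gave rc=0.
logged = [{'var_value': '1', 'var_datetime': None,
           'var_name': 'DwhCurrentlyRunning'}]
flag_set = dwh_flag_is_set(logged)
```

Under these assumptions the flag is still set although the service was stopped, which is consistent with a dwhd that lost its database connection and so could not clear the flag on shutdown.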

Shirly, please have a look. This does indeed seem like a duplicate of bug 1286441. But the fix there was supposed to work in this case: about 20 minutes passed between upgrading PG and stopping dwhd, which should have been enough for dwhd to reconnect. It might be related to the upgrade. IIRC all the duplicates there are upgrades. Perhaps dwhd with the old pg client library can't connect to the upgraded pg server.
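
For reference, the timeline can be worked out from the timestamps quoted in this comment. This is only a sketch: yum.log does not record the year, so 2016 is assumed for the package update, and the variable names are illustrative.

```python
from datetime import datetime

FMT = '%Y-%m-%d %H:%M:%S'

# yum.log: "Nov 21 10:15:35 Updated: postgresql-server-..." (year assumed).
pg_updated = datetime(2016, 11, 21, 10, 15, 35)
# First error in the dwhd log.
first_error = datetime.strptime('2016-11-21 10:18:47', FMT)
# systemctl stop ovirt-engine-dwhd.service in the setup log.
dwhd_stopped = datetime.strptime('2016-11-21 10:35:13', FMT)

since_update = dwhd_stopped - pg_updated    # time dwhd had after the PG update
since_error = dwhd_stopped - first_error    # time dwhd had after its first error

print(since_update, since_error)
```

That gives 19 minutes 38 seconds between the PostgreSQL update and the stop, and 16 minutes 26 seconds between the first dwhd error and the stop.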

Comment 12 Petr Matyáš 2016-11-28 11:29:45 UTC
This issue must have been related to some particular setup, and I can't reproduce it anymore.