1847963 – [TESTONLY] Verify dwh+grafana on a separate machine

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	ovirt-engine-dwh
Classification:	oVirt
Component:	Setup
Sub Component:
Version:	4.4.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	ovirt-4.4.2
Target Release:	4.4.2
Assignee:	Yedidyah Bar David
QA Contact:	Pavel Novotny
Docs Contact:
URL:
Whiteboard:
Depends On:	1856677
Blocks:

Reported:	2020-06-17 12:54 UTC by Yedidyah Bar David
Modified:	2020-09-18 07:13 UTC
CC List:	2 users
Fixed In Version:	ovirt-engine-dwh-4.4.2
Clone Of:
Environment:
Last Closed:	2020-09-18 07:13:23 UTC
oVirt Team:	Metrics
Embargoed:
Flags:	sbonazzo: ovirt-4.4? sbonazzo: planning_ack? sbonazzo: devel_ack+ lleistne: testing_ack+

Description Yedidyah Bar David 2020-06-17 12:54:06 UTC

In an internal discussion, we decided to support two layouts: engine+dwh+grafana all on the same machine, or engine on one machine and dwh+grafana on another.

Verify that everything works fine.

Check upgrade, engine-setup rollback, backup/restore, etc.

See also bug 1846279.

Comment 1 Yedidyah Bar David 2020-06-17 13:28:40 UTC

Another relevant flow that I tested:

- Install and setup engine (only, no dwh or grafana) on machine A
- Install and setup dwh on machine B
- Run 'engine-setup --reconfigure-optional-components' on machine A, and this time reply 'Yes' to 'Configure Grafana?'

This flow might have the advantage of splitting the load between the machines (similarly to comment 0), while having only a single machine accessed from the outside for management (so simpler firewall configuration etc., and also a bit simpler to set up). The main disadvantage is for people who want to split things for extra security: if, e.g., a bug is found in grafana that allows taking over the machine it's running on, the engine can also be attacked this way.

Comment 2 Lucie Leistnerova 2020-07-11 10:19:13 UTC

fresh install of dwh 4.4.1 - OK

backup-restore - failed
can't start postgres service

Jul 11 11:06:23 10-37-140-71 systemd[1]: postgresql.service: Start request repeated too quickly.
Jul 11 11:06:23 10-37-140-71 systemd[1]: postgresql.service: Failed with result 'start-limit-hit'.
Jul 11 11:06:23 10-37-140-71 systemd[1]: Failed to start PostgreSQL database server.

upgrade from beta 4.4.0.2-1.el8ev - failed
can't enrol certificates

Failed to execute stage 'Environment customization': str, bytes or bytearray expected, not NoneType


Tested in ovirt-engine-4.4.1.8-0.7.el8ev.noarch with ovirt-engine-dwh-setup-4.4.1.2-1.el8ev.noarch

Comment 4 Lucie Leistnerova 2020-07-11 10:36:49 UTC

Reproduction steps for backup-restore:
1. install dwh on separate machine
2. yum install ovirt-engine-tools-backup
3. engine-backup --file=/tmp/dwh.bck --log=/tmp/backup.log
4. engine-cleanup
5. engine-backup --mode=restore --provision-all-databases --file=/tmp/dwh.bck
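
The steps above could be scripted roughly as follows. This is only a sketch: the helper names `reproduction_commands` and `run_reproduction` and the `dry_run` flag are made up for illustration, the commands themselves are copied from steps 2-5, and step 1 (dwh already installed on a separate machine) is assumed to be done. By default it only prints the commands, since `engine-cleanup` is destructive.

```python
import shlex
import subprocess

BACKUP_FILE = "/tmp/dwh.bck"
BACKUP_LOG = "/tmp/backup.log"


def reproduction_commands(backup_file=BACKUP_FILE, backup_log=BACKUP_LOG):
    """Return steps 2-5 from the comment as argument lists (hypothetical helper)."""
    return [
        ["yum", "install", "ovirt-engine-tools-backup"],
        ["engine-backup", f"--file={backup_file}", f"--log={backup_log}"],
        ["engine-cleanup"],
        ["engine-backup", "--mode=restore", "--provision-all-databases",
         f"--file={backup_file}"],
    ]


def run_reproduction(dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in reproduction_commands():
        print(shlex.join(cmd))
        if not dry_run:
            # check=True stops at the first failing step.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_reproduction(dry_run=True)
```

Run it with `dry_run=False` only on a disposable test machine.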

Comment 5 Lucie Leistnerova 2020-07-11 17:31:14 UTC

Upgrade 4.3 -> 4.4 worked OK.

Comment 6 Yedidyah Bar David 2020-07-14 08:15:16 UTC

(In reply to Lucie Leistnerova from comment #4)
> Reproduction steps for backup-restore:
> 1. install dwh on separate machine
> 2. yum install ovirt-engine-tools-backup
> 3. engine-backup --file=/tmp/dwh.bck --log=/tmp/backup.log
> 4. engine-cleanup
> 5. engine-backup --mode=restore --provision-all-databases --file=/tmp/dwh.bck

There is no bug in the code; this failed for you simply because the machine was too fast :-)

By default, systemd limits service restarts to at most 5 per 10 seconds.
For various reasons, some of which are probably no longer relevant, we restart PG quite a few times during db/user provisioning, and adding the creation of the grafana user was probably the straw that broke the camel's back.

I'm not sure what the best solution is, but IMO it's not here (in dwh), so for now I'm opening a bug on (and pushing a patch for) the engine, to call 'systemctl reset-failed postgresql' after we restart it. No problem making the current bug depend on the new one, but in principle you can verify this one without any change, if the machine is a little bit slower.
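
A minimal sketch of that idea, not the actual engine patch: after each restart, clear systemd's per-unit state (which, per the systemctl documentation, includes the start rate limit counter) so that later restarts are not refused. The function name `restart_and_reset` and the injectable `run` parameter are assumptions made for illustration.

```python
import subprocess


def restart_and_reset(unit="postgresql", run=subprocess.run):
    """Restart a unit, then reset its failed state and start rate limit counter."""
    # check=True raises CalledProcessError if systemctl exits non-zero.
    run(["systemctl", "restart", unit], check=True)
    run(["systemctl", "reset-failed", unit], check=True)
```

Passing `run` as a parameter only makes the sketch easy to exercise without touching a real service.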

This was a bit hard to diagnose, because only partial logs were provided. Comment 2 does clarify the reason, but the error there does not appear in any of the attached logs. restore/postgresql-07.log does indeed show that the last 6 restarts happened within 4 seconds (the first at 09:06:19.847, the last at 09:06:23.054).
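
To illustrate why 6 restarts within 4 seconds trip the limit, here is a simplified model of a "5 starts per 10 seconds" limiter. It is a sketch under stated assumptions, not systemd's code: the function names are made up, and only the first and last timestamps come from the log; the four in between are hypothetical placeholders.

```python
from datetime import datetime


def seconds(hms):
    """Convert 'HH:MM:SS.fff' to seconds since midnight."""
    t = datetime.strptime(hms, "%H:%M:%S.%f")
    return t.hour * 3600 + t.minute * 60 + t.second + t.microsecond / 1e6


def first_rejected_start(start_times, burst=5, interval=10.0):
    """Return the index of the first start that would be refused, or None.

    Simplified model: a window opens at the first start, allows up to
    `burst` starts, and is reopened once `interval` seconds have passed.
    """
    window_begin = None
    count = 0
    for i, t in enumerate(start_times):
        if window_begin is None or t - window_begin > interval:
            window_begin = t
            count = 1
        elif count < burst:
            count += 1
        else:
            return i
    return None


# First and last values are from restore/postgresql-07.log as quoted above;
# the four middle values are HYPOTHETICAL, chosen only to fill the gap.
starts = [seconds(s) for s in (
    "09:06:19.847", "09:06:20.500", "09:06:21.100",
    "09:06:21.800", "09:06:22.400", "09:06:23.054",
)]

print(first_rejected_start(starts))
```

In this model the sixth start (index 5) is the one refused, while the same six starts spread over more than 10 seconds would all be accepted, which matches the remark that a slightly slower machine would not hit the problem.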

Comment 7 Yedidyah Bar David 2020-07-16 12:58:19 UTC

Merged the patch for bug 1856677, moving to MODIFIED.

Comment 8 Pavel Novotny 2020-09-07 23:11:43 UTC

Verified in
ovirt-engine-4.4.2.3-0.6.el8ev.noarch
ovirt-engine-dwh-4.4.2.1-1.el8ev.noarch
grafana-6.3.6-2.el8_2.x86_64

Setup: engine on machine A, DWH+Grafana on machine B.

Fresh installation - passed.
Upgrade from 4.4.1 to 4.4.2 - passed.
Backup & restore of DWH+Grafana - passed.
oVirt SSO - failed (I will file a separate bug for it, as it's not a blocker. I am currently investigating whether it's caused by the upgrade or by backup & restore, or whether it's broken even after a fresh install).

Comment 9 Sandro Bonazzola 2020-09-18 07:13:23 UTC

This bugzilla is included in oVirt 4.4.2 release, published on September 17th 2020.

Since the problem described in this bug report should be resolved in oVirt 4.4.2 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.
