Bug 1844965 - engine logs are not copied
Summary: engine logs are not copied
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-ansible-collection
Classification: oVirt
Component: hosted-engine-setup
Version: unspecified
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ovirt-4.4.3-1
Target Release: 1.2.1
Assignee: Yedidyah Bar David
QA Contact: Nikolai Sednev
URL:
Whiteboard:
Depends On: 1892378
Blocks:
Reported: 2020-06-08 06:54 UTC by Yedidyah Bar David
Modified: 2020-11-15 10:50 UTC (History)

Fixed In Version: ovirt-ansible-collection-1.2.1
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-11-06 14:01:26 UTC
oVirt Team: Integration
Embargoed:
pm-rhel: ovirt-4.4+
pelauter: planning_ack+
sbonazzo: devel_ack+
mavital: testing_ack+


Links
System: Github
ID: oVirt ovirt-ansible-collection pull 151
Private: 0
Priority: None
Status: closed
Summary: roles: hosted_engine_setup: Improve engine logs fetching
Last Updated: 2020-11-12 09:53:41 UTC

Description Yedidyah Bar David 2020-06-08 06:54:30 UTC
Description of problem:

We have code to copy /var/log/ovirt-engine from the engine VM to the host, at certain points during the deploy process, some of them conditional.

In many cases, directories (/var/log/ovirt-hosted-engine-setup/engine-logs*) are created, but left empty.

This makes it hard to investigate deploy failures.

I have a guess as to why this (sometimes) happens: We copy the logs from the VM disk image directly, not via ssh to the VM. So perhaps sometimes the logs are still cached in the VM memory and not written to disk.

A solution might be:

1. sync before fetching the logs.

2. Try also ssh to the VM for getting the logs (and do not fail if this fails, as it might be dead).
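
A minimal sketch of the two steps above, in Python and for illustration only. The helper name fetch_engine_logs_via_ssh, the root@ login, the 60-second timeout and the injectable run parameter are assumptions, not code from hosted-engine-setup. It runs sync inside the engine VM, then copies /var/log/ovirt-engine with scp, and returns False instead of raising when the VM cannot be reached.

```python
import subprocess


def fetch_engine_logs_via_ssh(engine_fqdn, dest_dir, run=subprocess.run):
    """Best-effort copy of the engine logs; returns True on success.

    Hypothetical helper: it never raises if the engine VM is unreachable.
    """
    target = f"root@{engine_fqdn}"  # assumed login, adjust as needed
    steps = [
        # 1. Flush cached log data to the VM disk.
        ["ssh", target, "sync"],
        # 2. Copy the log directory from the VM to the host.
        ["scp", "-r", f"{target}:/var/log/ovirt-engine", dest_dir],
    ]
    for cmd in steps:
        try:
            result = run(cmd, timeout=60)
        except (OSError, subprocess.TimeoutExpired):
            return False  # ssh binary missing or VM not answering
        if result.returncode != 0:
            return False
    return True
```

Because a failure is only reported through the return value, a caller could still fall back to reading the disk image when the ssh attempt does not work.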

Often, this information is needed when we time out waiting for the host to be Up. In this case, it would be nice to also try to extract from the logs the reason why it is not Up and show that to the user, instead of, or in addition to, the generic message.

Doing this correctly probably requires also some changes to the engine. For now, I guess we can simply search all ERRORs in engine.log.
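
The simple "search all ERRORs" approach could look roughly like the sketch below. The function name find_errors and the limit parameter are made up for illustration, and the code assumes that the log level appears as a standalone ERROR token on each relevant line of engine.log.

```python
import re

# Assumption: the level is written as a standalone token on the log line.
ERROR_RE = re.compile(r"\bERROR\b")


def find_errors(log_text, limit=20):
    """Return up to `limit` of the most recent lines that contain ERROR."""
    if limit <= 0:
        return []
    hits = [line for line in log_text.splitlines() if ERROR_RE.search(line)]
    return hits[-limit:]
```

It would be called on the text of a copied engine.log, and the returned lines could be printed next to the generic timeout message.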

Version-Release number of selected component (if applicable):
Since 4.3 or so

How reproducible:
Not sure, but often

Steps to Reproduce:
1. Deploy hosted-engine
2.
3.

Actual results:
Some /var/log/ovirt-hosted-engine-setup/engine-logs directories are empty

Expected results:
All of them are full, and up-to-date compared to their source in the engine VM (at the time of copying them)

Additional info:

Pushed [1] for calling sync. Didn't verify it yet.

[1] https://github.com/oVirt/ovirt-ansible-hosted-engine-setup/pull/325

Comment 1 Yedidyah Bar David 2020-06-16 13:54:04 UTC
https://github.com/oVirt/ovirt-ansible-hosted-engine-setup/pull/325 was merged yesterday.

I have now tried a deployment with it, and it still did not collect logs - it did not call 'sync'. The flow seems to have been:

- otopi calls:

2020-06-16 13:15:27,659+0300 DEBUG otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils.run:188 ansible-playbook: cmd: ['/bin/ansible-playbook', '--module-path=/usr/share/ovirt-hosted-engine-setup/ansible', '--inventory=localhost,didi-centos8-he-engine.lab.eng.tlv2.redhat.com', '--extra-vars=@/tmp/tmpru00p_pu', '--tags=bootstrap_local_vm', '--skip-tags=always', '/usr/share/ovirt-hosted-engine-setup/ansible/trigger_role.yml']

engine-setup fails (due to another bug), and this playbook terminates (and does not call sync)

- otopi calls:

2020-06-16 13:38:49,812+0300 DEBUG otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils.run:188 ansible-playbook: cmd: ['/bin/ansible-playbook', '--module-path=/usr/share/ovirt-hosted-engine-setup/ansible', '--inventory=localhost,', '--extra-vars=@/tmp/tmp30niqg_9', '--tags=final_clean', '--skip-tags=always', '/usr/share/ovirt-hosted-engine-setup/ansible/trigger_role.yml']

and this (in final_clean.yml -> fetch_engine_logs.yml) collects the logs, but does not call sync either.

Tasks that run on the engine do so with:
    delegate_to: "{{ groups.engine[0] }}"
but the last run above has only 'localhost' in its inventory. Perhaps I should also add the engine to that inventory and delegate to it. I need to see how to do that.
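
For illustration, the difference between the two --inventory arguments logged above can be sketched in Python. build_inventory_arg is a hypothetical helper, not the code in ansible_utils, and it covers only the command-line side. How the engine host ends up in the 'engine' group is not shown here.

```python
def build_inventory_arg(engine_fqdn=None):
    """Build the --inventory argument in the form seen in the logs above."""
    hosts = ["localhost"]
    if engine_fqdn:
        hosts.append(engine_fqdn)
    # Ansible treats a value containing a comma as a host list, so a
    # single-host list keeps a trailing comma.
    suffix = "," if len(hosts) == 1 else ""
    return "--inventory=" + ",".join(hosts) + suffix
```

Without an engine name this yields the form used by the final_clean run, and with one it yields the form used by the bootstrap_local_vm run.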

For now, keeping the bug on NEW, as the most important collection, in the end, is still empty.

Comment 2 Sandro Bonazzola 2020-10-26 07:43:59 UTC
Moving to 4.4.4 since we reached development freeze for 4.4.3 and this is not marked as blocker.

Comment 3 Yedidyah Bar David 2020-10-27 08:15:11 UTC
Found at least one significant reason for not being able to collect the logs:

Relevant steps, in their run order:

1. otopi [1] calls ansible [2], which also creates the local vm and sets otopi_localvm_dir, to be returned to otopi and used by it [3]

2. otopi reads the result, gets otopi_localvm_dir and sets OVEHOSTED_CORE/localVMDir in its own env.

3. Later, otopi calls ansible [4], passes OVEHOSTED_CORE/localVMDir as ansible var he_local_vm_dir.

4. [4] also tries to collect engine logs, checking the local vm dir.

If ansible [2] fails in the middle, before setting otopi_localvm_dir, otopi does not get it and so cannot pass it on, and we fail to collect the logs.
I have not yet checked how this works when deploying from cockpit.

[1] src/plugins/gr-he-ansiblesetup/core/misc.py:_closeup

[2] bootstrap_local_vm

[3] bootstrap_local_vm/02_create_local_vm.yml

[4] final_clean.yml
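
The hand-off described in steps 1-4 can be modelled with a small Python sketch. This is a simplified illustration, not the otopi plugin code: the two function names are made up, and only the key names otopi_localvm_dir, OVEHOSTED_CORE/localVMDir and he_local_vm_dir come from the description above.

```python
def env_from_bootstrap(result):
    """Steps 1-2: copy otopi_localvm_dir from the playbook result, if set."""
    env = {}
    if "otopi_localvm_dir" in result:
        env["OVEHOSTED_CORE/localVMDir"] = result["otopi_localvm_dir"]
    return env


def final_clean_vars(env):
    """Step 3: variables passed to final_clean; the directory may be missing."""
    return {"he_local_vm_dir": env.get("OVEHOSTED_CORE/localVMDir")}


# If bootstrap fails before setting otopi_localvm_dir, nothing is passed on
# and step 4 has no local VM directory to check.
failed_run = final_clean_vars(env_from_bootstrap({}))
```

In this model failed_run carries he_local_vm_dir set to None, which corresponds to the case where log collection has nothing to work with.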

Comment 4 Yedidyah Bar David 2020-10-28 15:33:31 UTC
QE: This, together with bug 1892378, should handle several more cases of a failed hosted-engine deploy with empty engine-logs directories.

For reproduction/verification, you should deploy hosted-engine as usual, but make it fail in the middle, after the local engine machine is up.

One flow I tried that does not work is pressing ^C :-(, so don't use that.

One that does work, in ovirt-system-tests, is: https://gerrit.ovirt.org/111926. It's a simple patch, which you can adapt to manual or other automated testing.

From now on, if you notice any failed hosted-engine deployment that does not provide engine-logs on the host, please open a bug. Ideally, I'd like to cover all relevant flows.

Yes, I agree that pressing ^C is a relevant flow, although I didn't open a bug for it. It's not high priority, IMO - if users pressed ^C, and deploy failed, they should know why it failed.

Comment 6 Sandro Bonazzola 2020-11-06 13:48:41 UTC
$ git tag --contains  7bde5f3
1.2.0-1
1.2.1-1

This can be tested with 4.4.3.

Comment 9 Yedidyah Bar David 2020-11-08 08:02:33 UTC
QE: I don't mind if the current bug is not explicitly verified.

I'd like, though, to raise your awareness about it.

If you do a hosted-engine deployment, and an engine-logs-* directory is empty, please open a bug.

Ideally, I'd like this to never happen.
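
A quick way to spot the problem after a deployment is to list the empty engine-logs* directories. The Python sketch below is a hypothetical check, not part of any oVirt tool. The default path is the one mentioned in this bug, and the function name is made up.

```python
import glob
import os


def empty_engine_log_dirs(base="/var/log/ovirt-hosted-engine-setup"):
    """Return engine-logs* directories under `base` that contain nothing."""
    pattern = os.path.join(base, "engine-logs*")
    return sorted(
        path for path in glob.glob(pattern)
        if os.path.isdir(path) and not os.listdir(path)
    )
```

A non-empty result after a deployment would be a reason to open a bug as requested above.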

Nikolai - setting needinfo on you as QE owner, but it's not only for you :-). Not sure who else should be aware.

Thanks.

Comment 10 Sandro Bonazzola 2020-11-11 06:45:44 UTC
This bugzilla is included in oVirt 4.4.3 release, published on November 10th 2020.

Since the problem described in this bug report should be resolved in oVirt 4.4.3 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

Comment 11 Nikolai Sednev 2020-11-15 10:48:46 UTC
alma07 ~]# ll -lsha /var/log/ovirt-hosted-engine-setup/
total 2.5M
4.0K drwx------.  4 root root 4.0K Nov  3 18:12 .
4.0K drwxr-xr-x. 20 root root 4.0K Nov 15 03:27 ..
4.0K drwx------.  3 root root 4.0K Nov  3 18:12 engine-logs-2020-11-03T16:04:34Z
4.0K drwx------.  2 root root 4.0K Nov  3 18:12 engine-logs-2020-11-03T16:12:16Z
616K -rw-r--r--.  1 root root 609K Nov  3 18:12 ovirt-hosted-engine-setup-20201103173400-nra9bo.log
804K -rw-r--r--.  1 root root 800K Nov  3 17:57 ovirt-hosted-engine-setup-ansible-bootstrap_local_vm-20201103173807-gseya8.log
144K -rw-r--r--.  1 root root 140K Nov  3 18:01 ovirt-hosted-engine-setup-ansible-create_storage_domain-20201103175958-qh0eed.log
416K -rw-r--r--.  1 root root 412K Nov  3 18:12 ovirt-hosted-engine-setup-ansible-create_target_vm-20201103180431-1540en.log
120K -rw-r--r--.  1 root root 116K Nov  3 18:12 ovirt-hosted-engine-setup-ansible-final_clean-20201103181213-rkih66.log
104K -rw-r--r--.  1 root root 100K Nov  3 17:34 ovirt-hosted-engine-setup-ansible-get_network_interfaces-20201103173410-p1h5g8.log
252K -rw-r--r--.  1 root root 245K Nov  3 17:38 ovirt-hosted-engine-setup-ansible-initial_clean-20201103173702-5ych7f.log

ovirt-hosted-engine-setup-2.4.8-1.el8ev.noarch
ovirt-hosted-engine-ha-2.4.5-1.el8ev.noarch

Comment 12 Nikolai Sednev 2020-11-15 10:50:22 UTC
alma07 ~]# ll -lsha /var/log/ovirt-hosted-engine-setup/engine-logs-2020-11-03T16:04:34Z
total 176K
4.0K drwx------.  3 root root 4.0K Nov  3 18:12 .
4.0K drwx------.  4 root root 4.0K Nov  3 18:12 ..
164K -rw-r--r--.  1 root root 158K Nov  3 18:12 messages
4.0K drwx------. 12  108  108 4.0K Nov  3 17:54 ovirt-engine

alma07 ~]# ll -lsha /var/log/ovirt-hosted-engine-setup/engine-logs-2020-11-03T16:04:34Z/ovirt-engine
total 2.0M
4.0K drwx------. 12  108  108 4.0K Nov  3 17:54 .
4.0K drwx------.  3 root root 4.0K Nov  3 18:12 ..
4.0K drwx------.  2  108  108 4.0K Sep 14 18:43 ansible
1.4M -rw-r--r--.  1  108  108 1.4M Nov  3 17:56 ansible-runner-service.log
8.0K -rw-r--r--.  1  108  108 5.3K Nov  3 17:53 boot.log
4.0K drwx------.  2  108  108 4.0K Sep 14 18:43 brick-setup
4.0K drwx------.  2  108  108 4.0K Sep 14 18:43 cinderlib
4.0K -rw-r--r--.  1  108  108  669 Nov  3 17:53 console.log
4.0K drwx------.  2  108  108 4.0K Sep 14 18:43 db-manual
4.0K drwx------.  2  108  108 4.0K Sep 14 18:43 dump
448K -rw-r--r--.  1  108  108 444K Nov  3 18:06 engine.log
4.0K drwx------.  2  108  108 4.0K Nov  3 17:54 host-deploy
4.0K drwx------.  2  108  108 4.0K Sep 14 18:43 notifier
4.0K drwx------.  2  108  108 4.0K Sep 14 18:43 ova
4.0K drwxr-xr-x.  2 root root 4.0K Jul 29 09:39 ovirt-log-collector
100K -rw-r--r--.  1  108  108  96K Nov  3 18:06 server.log
4.0K drwx------.  2 root root 4.0K Nov  3 17:48 setup
   0 -rw-r--r--.  1  108  108    0 Nov  3 17:51 ui.log

