Description of problem:

We have code that copies /var/log/ovirt-engine from the engine VM to the host at certain points during the deploy process, some of them conditional. In many cases, the directories (/var/log/ovirt-hosted-engine-setup/engine-logs*) are created but left empty. This makes it hard to investigate deploy failures.

I have a guess as to why this (sometimes) happens: we copy the logs directly from the VM disk image, not via ssh to the VM. So perhaps the logs are sometimes still cached in the VM's memory and not yet written to disk. A solution might be:

1. sync before fetching the logs.
2. Also try ssh to the VM to get the logs (and do not fail if this fails, as the VM might be dead).

Often, this information is needed when we timed out waiting for the host to be Up. In this case, it would be nice to also try to extract from the logs the reason why it's not Up and output that to the user, instead of, or in addition to, the generic message. Doing this correctly probably also requires some changes to the engine. For now, I guess we can simply search for all ERRORs in engine.log.

Version-Release number of selected component (if applicable):
Since 4.3 or so

How reproducible:
Not sure, but often

Steps to Reproduce:
1. Deploy hosted-engine

Actual results:
Some /var/log/ovirt-hosted-engine-setup/engine-logs directories are empty

Expected results:
All of them are full, and up-to-date compared to their source in the engine VM (at the time of copying them)

Additional info:
Pushed [1] for calling sync. Didn't verify it yet.

[1] https://github.com/oVirt/ovirt-ansible-hosted-engine-setup/pull/325
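The description suggests searching engine.log for ERRORs as a first step towards a better failure message. A minimal sketch of that idea follows. The function name find_error_lines and the limit parameter are hypothetical, and it assumes only that the log level appears as the separate word ERROR on each relevant line. It is not taken from the hosted-engine setup code.

```python
import re
from pathlib import Path

# Assumption: the level appears as the standalone word "ERROR" in a log line.
ERROR_RE = re.compile(r"\bERROR\b")


def find_error_lines(log_path, limit=20):
    """Return up to `limit` of the most recent lines in log_path containing ERROR.

    A missing file yields an empty list, since the copied directory may be empty.
    """
    path = Path(log_path)
    if limit <= 0 or not path.is_file():
        return []
    # errors="replace" avoids failing on stray non-UTF-8 bytes in the log.
    with path.open(errors="replace") as log_file:
        matches = [line.rstrip("\n") for line in log_file if ERROR_RE.search(line)]
    return matches[-limit:]
```

Something like find_error_lines("<engine-logs dir>/ovirt-engine/engine.log") could then be shown to the user in addition to the generic timeout message. Returning only the last few matches keeps the output short, on the assumption that the most recent errors are the most relevant.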
https://github.com/oVirt/ovirt-ansible-hosted-engine-setup/pull/325 was merged yesterday. I now tried a deploy with it, and it still did not collect logs: it did not call 'sync'. The flow seems to have been:

- otopi calls:

2020-06-16 13:15:27,659+0300 DEBUG otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils.run:188 ansible-playbook: cmd: ['/bin/ansible-playbook', '--module-path=/usr/share/ovirt-hosted-engine-setup/ansible', '--inventory=localhost,didi-centos8-he-engine.lab.eng.tlv2.redhat.com', '--extra-vars=@/tmp/tmpru00p_pu', '--tags=bootstrap_local_vm', '--skip-tags=always', '/usr/share/ovirt-hosted-engine-setup/ansible/trigger_role.yml']

engine-setup fails (due to another bug), and this playbook terminates (and does not call sync)

- otopi calls:

2020-06-16 13:38:49,812+0300 DEBUG otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils.run:188 ansible-playbook: cmd: ['/bin/ansible-playbook', '--module-path=/usr/share/ovirt-hosted-engine-setup/ansible', '--inventory=localhost,', '--extra-vars=@/tmp/tmp30niqg_9', '--tags=final_clean', '--skip-tags=always', '/usr/share/ovirt-hosted-engine-setup/ansible/trigger_role.yml']

and this (in final_clean.yml -> fetch_engine_logs.yml) collects the logs, but does not call sync either.

Places that run things on the engine do so with:

delegate_to: "{{ groups.engine[0] }}"

but the last run above has only 'localhost' in its inventory. Perhaps I should somehow also add the engine there and delegate to it. Need to see how to do that.

For now, keeping the bug on NEW, as the most important collection, the one at the end, is still empty.
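The two logged commands differ in their --inventory value: the first lists localhost and the engine FQDN, the second only 'localhost,'. The sketch below shows how such an argument could be built so that the engine is included. build_inventory_arg is a hypothetical helper, not the code ovirt-hosted-engine-setup uses. It also does not address whether listing the engine host is enough for groups.engine to be populated, which is still an open question here.

```python
def build_inventory_arg(hosts):
    """Build an ansible-playbook --inventory argument from a list of host names.

    Hypothetical helper for illustration only.
    """
    hosts = [host for host in hosts if host]
    if not hosts:
        raise ValueError("at least one host is required")
    if len(hosts) == 1:
        # A single host needs a trailing comma so that ansible treats the
        # value as a host list rather than as a path to an inventory file.
        return "--inventory={},".format(hosts[0])
    return "--inventory={}".format(",".join(hosts))
```

With ["localhost"] this gives the form seen in the final_clean run, and with ["localhost", "<engine FQDN>"] it gives the form seen in the bootstrap_local_vm run.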
Moving to 4.4.4 since we reached development freeze for 4.4.3 and this is not marked as blocker.
Found at least one significant reason for not being able to collect the logs.

Relevant steps, in their run order:

1. otopi [1] calls ansible [2], which also creates the local VM and sets otopi_localvm_dir, to be returned to otopi and used by it [3]
2. otopi reads the result, gets otopi_localvm_dir and sets OVEHOSTED_CORE/localVMDir in its own env.
3. Later, otopi calls ansible [4], passing OVEHOSTED_CORE/localVMDir as the ansible var he_local_vm_dir.
4. [4] also tries to collect engine logs, checking the local VM dir.

If ansible [2] fails in the middle, before setting otopi_localvm_dir, otopi won't get it, so it will not be able to pass it on, and we fail to collect logs.

I haven't yet checked how this works when deploying from cockpit.

[1] src/plugins/gr-he-ansiblesetup/core/misc.py:_closeup
[2] bootstrap_local_vm
[3] bootstrap_local_vm/02_create_local_vm.yml
[4] final_clean.yml
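The dependency chain in the steps above can be modelled in a few lines. This is a simplified sketch, not the actual otopi plugin code: the three function names are hypothetical, and plain dicts stand in for the ansible result and the otopi environment. Only the keys otopi_localvm_dir, OVEHOSTED_CORE/localVMDir and he_local_vm_dir come from the analysis above.

```python
LOCAL_VM_DIR_KEY = "OVEHOSTED_CORE/localVMDir"


def update_env_from_bootstrap(env, ansible_result):
    """Step 2: copy otopi_localvm_dir from the ansible result into the env."""
    local_vm_dir = ansible_result.get("otopi_localvm_dir")
    if local_vm_dir is not None:
        env[LOCAL_VM_DIR_KEY] = local_vm_dir
    return env


def final_clean_vars(env):
    """Step 3: build the variables passed to the final_clean run."""
    return {"he_local_vm_dir": env.get(LOCAL_VM_DIR_KEY)}


def can_collect_engine_logs(ansible_vars):
    """Step 4: logs can only be fetched if the local VM dir is known."""
    return bool(ansible_vars.get("he_local_vm_dir"))
```

If the bootstrap run fails before setting otopi_localvm_dir, the result dict lacks the key, the env entry is never set, and can_collect_engine_logs returns False. That is the failure mode described above.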
QE: This, together with bug 1892378, should handle several more cases of a failed hosted-engine deploy with empty engine-logs directories.

For reproduction/verification, deploy hosted-engine as usual, but make it fail in the middle, after the local engine machine is up. One flow I tried that does not work is pressing ^C :-(, so don't use that. One that does work, in ovirt-system-tests, is: https://gerrit.ovirt.org/111926. It's a simple patch, which you can adapt to manual or other automated testing.

From now on, if you notice any failed hosted-engine deployment that does not provide engine-logs on the host, please open a bug. Ideally, I'd like to cover all relevant flows. Yes, I agree that pressing ^C is a relevant flow, although I didn't open a bug for it. It's not high priority, IMO: if users pressed ^C and the deploy failed, they should know why it failed.
$ git tag --contains 7bde5f3
1.2.0-1
1.2.1-1

This can be tested with 4.4.3.
QE: I don't mind if the current bug is not explicitly verified. I'd like, though, to raise your awareness of it. If you do a hosted-engine deployment and an engine-logs-* directory is empty, please open a bug. Ideally, I'd like this to never happen.

Nikolai - setting needinfo on you as QE owner, but it's not only for you :-). Not sure who else should be aware. Thanks.
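For anyone who wants to automate the check requested above, here is a minimal sketch that lists empty engine-logs-* directories on the host. The function name empty_engine_log_dirs is hypothetical. The default path is the one used throughout this bug. Since these directories are root-only (drwx------), it has to run as root.

```python
from pathlib import Path


def empty_engine_log_dirs(base="/var/log/ovirt-hosted-engine-setup"):
    """Return the engine-logs-* directories under base that have no entries."""
    base_path = Path(base)
    if not base_path.is_dir():
        return []
    return sorted(
        directory
        for directory in base_path.glob("engine-logs-*")
        # any() stops at the first entry found, so large directories are cheap.
        if directory.is_dir() and not any(directory.iterdir())
    )
```

A non-empty return value after a failed deployment would be a reason to open a bug as requested above.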
This bugzilla is included in oVirt 4.4.3 release, published on November 10th 2020. Since the problem described in this bug report should be resolved in oVirt 4.4.3 release, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.
alma07 ~]# ll -lsha /var/log/ovirt-hosted-engine-setup/
total 2.5M
4.0K drwx------. 4 root root 4.0K Nov 3 18:12 .
4.0K drwxr-xr-x. 20 root root 4.0K Nov 15 03:27 ..
4.0K drwx------. 3 root root 4.0K Nov 3 18:12 engine-logs-2020-11-03T16:04:34Z
4.0K drwx------. 2 root root 4.0K Nov 3 18:12 engine-logs-2020-11-03T16:12:16Z
616K -rw-r--r--. 1 root root 609K Nov 3 18:12 ovirt-hosted-engine-setup-20201103173400-nra9bo.log
804K -rw-r--r--. 1 root root 800K Nov 3 17:57 ovirt-hosted-engine-setup-ansible-bootstrap_local_vm-20201103173807-gseya8.log
144K -rw-r--r--. 1 root root 140K Nov 3 18:01 ovirt-hosted-engine-setup-ansible-create_storage_domain-20201103175958-qh0eed.log
416K -rw-r--r--. 1 root root 412K Nov 3 18:12 ovirt-hosted-engine-setup-ansible-create_target_vm-20201103180431-1540en.log
120K -rw-r--r--. 1 root root 116K Nov 3 18:12 ovirt-hosted-engine-setup-ansible-final_clean-20201103181213-rkih66.log
104K -rw-r--r--. 1 root root 100K Nov 3 17:34 ovirt-hosted-engine-setup-ansible-get_network_interfaces-20201103173410-p1h5g8.log
252K -rw-r--r--. 1 root root 245K Nov 3 17:38 ovirt-hosted-engine-setup-ansible-initial_clean-20201103173702-5ych7f.log

ovirt-hosted-engine-setup-2.4.8-1.el8ev.noarch
ovirt-hosted-engine-ha-2.4.5-1.el8ev.noarch
alma07 ~]# ll -lsha /var/log/ovirt-hosted-engine-setup/engine-logs-2020-11-03T16:04:34Z
total 176K
4.0K drwx------. 3 root root 4.0K Nov 3 18:12 .
4.0K drwx------. 4 root root 4.0K Nov 3 18:12 ..
164K -rw-r--r--. 1 root root 158K Nov 3 18:12 messages
4.0K drwx------. 12 108 108 4.0K Nov 3 17:54 ovirt-engine

alma07 ~]# ll -lsha /var/log/ovirt-hosted-engine-setup/engine-logs-2020-11-03T16:04:34Z/ovirt-engine
total 2.0M
4.0K drwx------. 12 108 108 4.0K Nov 3 17:54 .
4.0K drwx------. 3 root root 4.0K Nov 3 18:12 ..
4.0K drwx------. 2 108 108 4.0K Sep 14 18:43 ansible
1.4M -rw-r--r--. 1 108 108 1.4M Nov 3 17:56 ansible-runner-service.log
8.0K -rw-r--r--. 1 108 108 5.3K Nov 3 17:53 boot.log
4.0K drwx------. 2 108 108 4.0K Sep 14 18:43 brick-setup
4.0K drwx------. 2 108 108 4.0K Sep 14 18:43 cinderlib
4.0K -rw-r--r--. 1 108 108 669 Nov 3 17:53 console.log
4.0K drwx------. 2 108 108 4.0K Sep 14 18:43 db-manual
4.0K drwx------. 2 108 108 4.0K Sep 14 18:43 dump
448K -rw-r--r--. 1 108 108 444K Nov 3 18:06 engine.log
4.0K drwx------. 2 108 108 4.0K Nov 3 17:54 host-deploy
4.0K drwx------. 2 108 108 4.0K Sep 14 18:43 notifier
4.0K drwx------. 2 108 108 4.0K Sep 14 18:43 ova
4.0K drwxr-xr-x. 2 root root 4.0K Jul 29 09:39 ovirt-log-collector
100K -rw-r--r--. 1 108 108 96K Nov 3 18:06 server.log
4.0K drwx------. 2 root root 4.0K Nov 3 17:48 setup
0 -rw-r--r--. 1 108 108 0 Nov 3 17:51 ui.log