Description of problem:
I created a backup of my hosted-engine environment and wanted to restore it on a fresh CentOS 7 (EL) setup. The documentation says that during the hosted-engine --deploy setup you are asked whether you want to execute the automatic setup of the engine, and so on. But in the oVirt 4.2 self-hosted engine setup you are only asked a minimum of questions, and then everything is set up and finished. I cannot start the backup restore, since the hosted engine is already active. So is there a new step I don't know about, since the documentation clearly does not reflect the current deployment process?

Best regards
Matthias

Version-Release number of selected component (if applicable):
ovirt 4.2

How reproducible:
100%

Steps to Reproduce:
1. Prepare a fresh host for hosted-engine deployment
2. Start hosted-engine --deploy
3. Deployment finishes successfully and no backup restore is possible

Actual results:
Expected results:
Additional info:
Can you please provide the link to documentation you read these instructions from ?
Hello. I used the online documentation of oVirt: https://www.ovirt.org/documentation/self-hosted/chap-Backing_up_and_Restoring_an_EL-Based_Self-Hosted_Environment/

The problem is that when I run the HE deployment, it runs through without stopping at the question about auto setup. So I used a workaround for "restoring" the backup: I deployed a new HE and afterwards imported the data domains (FibreChannel). Then I was able to import my VMs again. But this is only a workaround; the normal restore should be done via the deployment, right?

Best regards
Yes, the normal restore should be done using the vintage flow with the '--noansible' option. The documentation has been updated: https://github.com/oVirt/ovirt-site/pull/1594
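For reference, a minimal sketch of that flow, based on the steps reported later in this bug (treat it as a sketch, not a full procedure):

# vintage (non-ansible) deployment flow
hosted-engine --deploy --noansible
# answer "No" when asked to automatically execute engine-setup on the
# appliance, restore the backup inside the engine VM with engine-backup,
# run engine-setup there, then let the deployment on the host continue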
Do we also have a bug/RFE to allow restore using the ansible/node-zero flow? I think we need one.
1. Deployed Node 0 and added one data storage domain.
2. Backed up the engine (a minimal engine-backup sketch follows this comment).
3. Copied the backup to a safe place.
4. Reprovisioned the host and wiped out the SHE storage domain.
5. Deployed vintage SHE and answered "Automatically execute engine-setup on the engine appliance on first boot (Yes, No)[Yes]? no".
6. Connected to the engine and restored the DB using the backup files collected in step 3.
7. Failed to get an active host in the UI and answered "Retry checking host status or ignore this and continue (Retry, Ignore)[Retry]? Ignore" in the host's SHE deployment process.
8. Deployment continued and successfully finished.
9. I see that the engine is running, but in the UI the host is nonoperational and I don't see the hosted-storage that existed when I deployed Node 0 and made the backup.

It seems that the host is unable to identify the OVF_STORE volume. Sosreports from host and engine attached. Restarting the agent and broker did not resolve the issue.

alma04 ~]# systemctl status ovirt-ha-agent -l
● ovirt-ha-agent.service - oVirt Hosted Engine High Availability Monitoring Agent
   Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-agent.service; enabled; vendor preset: disabled)
   Active: active (running) since Mon 2018-05-14 19:39:08 IDT; 1min 10s ago
 Main PID: 16995 (ovirt-ha-agent)
    Tasks: 2
   CGroup: /system.slice/ovirt-ha-agent.service
           └─16995 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent

May 14 19:39:08 alma04.qa.lab.tlv.redhat.com systemd[1]: Started oVirt Hosted Engine High Availability Monitoring Agent.
May 14 19:39:08 alma04.qa.lab.tlv.redhat.com systemd[1]: Starting oVirt Hosted Engine High Availability Monitoring Agent...
May 14 19:39:31 alma04.qa.lab.tlv.redhat.com ovirt-ha-agent[16995]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config.vm ERROR Unable to identify the OVF_STORE volume, falling back to initial vm.conf. Please ensure you already added your first data domain for regular VMs
May 14 19:39:32 alma04.qa.lab.tlv.redhat.com ovirt-ha-agent[16995]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config.vm ERROR Unable to identify the OVF_STORE volume, falling back to initial vm.conf. Please ensure you already added your first data domain for regular VMs
May 14 19:39:41 alma04.qa.lab.tlv.redhat.com ovirt-ha-agent[16995]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config.vm ERROR Unable to identify the OVF_STORE volume, falling back to initial vm.conf. Please ensure you already added your first data domain for regular VMs
May 14 19:39:52 alma04.qa.lab.tlv.redhat.com ovirt-ha-agent[16995]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config.vm ERROR Unable to identify the OVF_STORE volume, falling back to initial vm.conf. Please ensure you already added your first data domain for regular VMs
May 14 19:40:02 alma04.qa.lab.tlv.redhat.com ovirt-ha-agent[16995]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config.vm ERROR Unable to identify the OVF_STORE volume, falling back to initial vm.conf. Please ensure you already added your first data domain for regular VMs
May 14 19:40:12 alma04.qa.lab.tlv.redhat.com ovirt-ha-agent[16995]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config.vm ERROR Unable to identify the OVF_STORE volume, falling back to initial vm.conf. Please ensure you already added your first data domain for regular VMs
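For context on step 2: only the restore-side commands appear verbatim in this bug, so this is a minimal sketch of how the backup would typically be taken (file and log paths are placeholders):

# on the engine VM, before reprovisioning the host
engine-backup --mode=backup --file=/root/engine-backup.tar.gz --log=/root/engine-backup.log
# copy the resulting file off the environment, e.g.
scp /root/engine-backup.tar.gz backup-host:/backups/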
Target release should be placed once a package build is known to fix an issue. Since this bug is not modified, the target version has been reset. Please use target milestone to plan a fix for an oVirt release.
Created attachment 1436363 [details] sosreport from the engine
Created attachment 1436364 [details] sosreport from alma04
Restore on the engine was made using a command like this:

engine-backup --mode=restore --log=/root/Log_nsednev --file=/root/nsednev --provision-db --provision-dwh-db --provision-reports-db --restore-permissions --he-remove-hosts --he-remove-storage-vm
Works for me with this command as it appears in the documentation:

engine-backup --mode=restore --log=/var/log/restore-engine.log --file=/root/engine-backup --provision-db --provision-dwh-db --provision-reports-db --restore-permissions
Please provide a "Fixed In Version:" entry value.
For restore on the same pair of hosts, I reprovisioned them and also wiped out the iSCSI target. Then I used "engine-backup --mode=restore --log=/root/Log_nsednev_from_alma03_rhevm_4_2_4_1 --file=/root/nsednev_from_alma03_rhevm_4_2_4_1 --provision-db --provision-dwh-db --provision-reports-db --restore-permissions --he-remove-hosts" on the engine for the restore and continued with the deployment on the host.

After restoring the engine and continuing with the deployment on the host, I saw this error:
[ ERROR ] Cannot automatically set CPU level of cluster Default: General command validation failure.

The restore finished with success on the engine, then the deployment finished too, but the hosted storage was not recovered as UP in the UI, as can be seen in the attached screenshot.
[ INFO ] Hosted Engine successfully deployed

Tested on these components:
rhvm-appliance-4.2-20180601.0.el7.noarch
ovirt-hosted-engine-ha-2.2.13-1.el7ev.noarch
ovirt-hosted-engine-setup-2.2.22-1.el7ev.noarch
Linux 3.10.0-862.6.1.el7.x86_64 #1 SMP Mon Jun 4 15:33:25 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.5 (Maipo)
ovirt-engine-setup-4.2.4.1-0.1.el7.noarch

Moving back to assigned.
Created attachment 1448306 [details] Screenshot from 2018-06-06 16-04-29.png
Created attachment 1448307 [details] broker.log
Created attachment 1448308 [details] agent.log
Created attachment 1448309 [details] vdsm.log
Created attachment 1448310 [details] ovirt-hosted-engine-setup.log
Created attachment 1448323 [details] screenshot of active ha-host with inactive score
Created attachment 1448325 [details] sosreport from alma04 ha host
Created attachment 1448326 [details] sosreport from the engine
It works for me. The difference I see between our environments is that you are using 'Intel SandyBridge IBRS Family' and I'm using 'Intel SandyBridge Family', which might cause this issue.

From your ovirt-hosted-engine-setup log:

2018-06-06 15:44:03,809+0300 DEBUG otopi.plugins.gr_he_setup.engine.add_host add_host._closeup:674 {'name': None, 'level': None, 'cpu_tune': None, 'architecture': 'x86_64', 'mode': None, 'cores': None, 'speed': None, 'id': 'Intel SandyBridge IBRS Family', 'topology': None}
2018-06-06 15:44:05,000+0300 DEBUG otopi.plugins.gr_he_setup.engine.add_host add_host._closeup:685 Cannot set CPU level of cluster Default
Traceback (most recent call last):
  File "/usr/share/ovirt-hosted-engine-setup/scripts/../plugins/gr-he-setup/engine/add_host.py", line 679, in _closeup
    cluster.update()
  File "/usr/lib/python2.7/site-packages/ovirtsdk/infrastructure/brokers.py", line 363, in update
    headers={"Correlation-Id":correlation_id, "Expect":expect}
  File "/usr/lib/python2.7/site-packages/ovirtsdk/infrastructure/proxy.py", line 68, in update
    return self.request('PUT', url, body, headers, cls=cls)
  File "/usr/lib/python2.7/site-packages/ovirtsdk/infrastructure/proxy.py", line 122, in request
    persistent_auth=self.__persistent_auth
  File "/usr/lib/python2.7/site-packages/ovirtsdk/infrastructure/connectionspool.py", line 79, in do_request
    persistent_auth)
  File "/usr/lib/python2.7/site-packages/ovirtsdk/infrastructure/connectionspool.py", line 162, in __do_request
    raise errors.RequestError(response_code, response_reason, response_body)
RequestError: status: 500
reason: Internal Server Error
detail: General command validation failure.
2018-06-06 15:44:05,078+0300 ERROR otopi.plugins.gr_he_setup.engine.add_host add_host._closeup:693 Cannot automatically set CPU level of cluster Default: General command validation failure.

From my ovirt-hosted-engine-setup log:

2018-06-07 04:18:15,888-0400 DEBUG otopi.plugins.gr_he_setup.engine.add_host add_host._closeup:674 {'name': None, 'level': None, 'cpu_tune': None, 'architecture': 'x86_64', 'mode': None, 'cores': None, 'speed': None, 'id': 'Intel SandyBridge Family', 'topology': None}
2018-06-07 04:18:17,337-0400 DEBUG otopi.context context.dumpEnvironment:859 ENVIRONMENT DUMP - BEGIN
2018-06-07 04:18:17,338-0400 DEBUG otopi.context context.dumpEnvironment:869 ENV CORE/modifiedFiles=list:'['/etc/vdsm/vdsm.conf', '/var/run/ovirt-hosted-engine-ha/vm.conf', '/etc/ovirt-hosted-engine/hosted-engine.conf', '/etc/sysconfig/iptables', '/etc/ovirt-hosted-engine/iptables.example', '/etc/ovirt-hosted-engine/firewalld/hosted-cockpit.xml', '/e

I suggest verifying the restore operation on a regular CPU family (e.g. Intel SandyBridge Family) and afterwards trying with IBRS to isolate the problem.
The issue is consistently reproduced with downstream packages on my environment. Returning to assigned.
I followed my previous steps and failed even to get connected to the HE VM over remote console using VNC:
1. Deployed Node 0 and added one data storage domain.
2. Backed up the engine.
3. Copied the backup to a safe place.
4. Reprovisioned the host and wiped out the SHE storage domain.
5. Deployed vintage SHE and answered "Automatically execute engine-setup on the engine appliance on first boot (Yes, No)[Yes]? no".
6. Failed to get any response from the VM over remote console; please see the screencast for more details.
Created attachment 1492939 [details] screencast of failed response from remote console
After some more delay the console responded and the VM booted, but I could not log in to it.

Tested on these components:
ovirt-engine-appliance-4.2-20181010.1.el7.noarch
ovirt-hosted-engine-setup-2.2.29-0.0.master.20181002122252.git9ae169e.el7.noarch
ovirt-hosted-engine-ha-2.2.19-0.0.master.20181002122327.20181002122322.gitb449616.el7.noarch

Ryan, can someone check the username/password regarding this issue on the appliance ovirt-engine-appliance-4.2-20181010.1.el7.noarch?
Apparently there is an issue with the appliance's default username/password.
The appliance uses cloud-init for configuration and doesn't have a documented/usable "default" password. Cloud-init must be used.
(In reply to Nikolai Sednev from comment #23)
> I followed my previous steps and failed even to get connected to the HE VM
> over remote console using VNC:
> 1. Deployed Node 0 and added one data storage domain.
> 2. Backed up the engine.
> 3. Copied the backup to a safe place.
> 4. Reprovisioned the host and wiped out the SHE storage domain.
> 5. Deployed vintage SHE
^^^^^^^
Please note that due to https://bugzilla.redhat.com/1620314 this will fail for sure in your case.
If you initially deployed with the node 0 flow, you should redeploy with node 0 as well when restoring the backup: it now also supports an automatic restore procedure.
You have to execute:

hosted-engine --deploy --restore-from-file=mybackup.tar.gz

> and answered "Automatically execute engine-setup on
> the engine appliance on first boot (Yes, No)[Yes]? no".
> 6. Failed to get any response from the VM over remote console; please see the
> screencast for more details.
Hi Simone, Please add 'priority' to the bug.
(In reply to Ryan Barry from comment #27)
> The appliance uses cloud-init for configuration and doesn't have a
> documented/usable "default" password. Cloud-init must be used.

It was used, but nothing happened.
(In reply to Simone Tiraboschi from comment #28)
> (In reply to Nikolai Sednev from comment #23)
> > I followed my previous steps and failed even to get connected to the HE VM
> > over remote console using VNC:
> > 1. Deployed Node 0 and added one data storage domain.
> > 2. Backed up the engine.
> > 3. Copied the backup to a safe place.
> > 4. Reprovisioned the host and wiped out the SHE storage domain.
> > 5. Deployed vintage SHE
> ^^^^^^^
>
> Please note that due to https://bugzilla.redhat.com/1620314 this will fail
> for sure in your case.
> If you initially deployed with the node 0 flow, you should redeploy with
> node 0 as well when restoring the backup: it now also supports an automatic
> restore procedure.
> You have to execute:
>
> hosted-engine --deploy --restore-from-file=mybackup.tar.gz
>
> > and answered "Automatically execute engine-setup on
> > the engine appliance on first boot (Yes, No)[Yes]? no".
> > 6. Failed to get any response from the VM over remote console; please see
> > the screencast for more details.

In this very bug I was asked specifically about restoring on vintage from Node 0; if that is not supported, then there has to be some limitation within the code.
http://pastebin.test.redhat.com/657084

1. Deployed Node 0 and added one data storage domain.
2. Added one additional ha-host.
3. Created one guest VM.
4. Backed up the engine.
5. Copied the backup to a safe place.
6. Reprovisioned the host and wiped out the SHE storage domain.
7. Deployed Node 0 using the restore functionality from the backed-up file.
8. Restore failed with the following error:

alma03 ~]# hosted-engine --deploy --restore-from-file /root/nsednev_from_alma03_rhevm_4_2_7_node_0_nfs
.
.
.
[ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "msg": "The system may not be provisioned according to the playbook results: please check the logs for the issue, fix accordingly or re-deploy from scratch.\n"}
[ ERROR ] Failed to execute stage 'Closing up': Failed executing ansible-playbook
[ ERROR ] Hosted Engine deployment failed: please check the logs for the issue, fix accordingly or re-deploy from scratch.

Logs from host attached.

Tested on these components:
ovirt-hosted-engine-ha-2.2.19-0.0.master.20181002122327.20181002122322.gitb449616.el7.noarch
ovirt-hosted-engine-setup-2.2.29-0.0.master.20181002122252.git9ae169e.el7.noarch
ovirt-engine-appliance-4.2-20181013.1.el7.noarch
Created attachment 1493704 [details] sosreport from alma03
The real issue happened here:

[ INFO ] TASK [Wait for the host to be up]
[ INFO ] ok: [localhost]
[ INFO ] TASK [Check host status]
[ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "msg": "The host has been set in non_operational status, please check engine logs, fix accordingly and re-deploy.\n"}

Now the question is why the host has been set as non-operational. Unfortunately, under sosreport-alma03.qa.lab.tlv.redhat.com-20181014162800/var/log/ovirt-hosted-engine-setup/ in the sos report we have just one file that is not that relevant.

If you still have the environment, can you please collect the whole /var/log/ovirt-hosted-engine-setup/?

According to https://github.com/sosreport/sos/issues/1243 that sos report issue was fixed in May. Can you please report your sos version?
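For example, the installed sos version can be reported with a plain RPM query on the host:

# prints the exact installed sos package version
rpm -q sos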
(In reply to Nikolai Sednev from comment #31)
> In this very bug I was asked specifically about restoring on vintage from
> Node 0; if that is not supported, then there has to be some limitation
> within the code.

Yes, --restore-from-file conflicts with --noansible.
I think the original issue has been fixed, maybe we should close this and open a new one about the non-operational host issue.
[ ERROR ] fatal: [localhost]: FAILED! => {"ansible_facts": {"ovirt_hosts": [{"address": "alma03.qa.lab.tlv.redhat.com", "affinity_labels": [], "auto_numa_status": "disable", "certificate": {"organization": "qa.lab.tlv.redhat.com", "subject": "O=qa.lab.tlv.redhat.com,CN=alma03.qa.lab.tlv.redhat.com"}, "cluster": {"href": "/ovirt-engine/api/clusters/cf113d58-d5fb-11e8-92ec-00163e7bb853", "id": "cf113d58-d5fb-11e8-92ec-00163e7bb853"}, "comment": "", "cpu": {"name": "Intel(R) Xeon(R) CPU E5-2603 v2 @ 1.80GHz", "speed": 1800.0, "topology": {"cores": 4, "sockets": 1, "threads": 1}}, "device_passthrough": {"enabled": true}, "devices": [], "external_network_provider_configurations": [], "external_status": "ok", "hardware_information": {"family": "Server", "manufacturer": "Dell Inc.", "product_name": "PowerEdge C6220 II", "serial_number": "7YDSW02", "supported_rng_sources": ["hwrng", "random"], "uuid": "4C4C4544-0059-4410-8053-B7C04F573032"}, "hooks": [], "href": "/ovirt-engine/api/hosts/0b0ef6c8-e2db-4454-a3fe-772c1a1aa5c4", "id": "0b0ef6c8-e2db-4454-a3fe-772c1a1aa5c4", "iscsi": {"initiator": "iqn.1994-05.com.redhat:7cf9a6c4224b"}, "katello_errata": [], "kdump_status": "disabled", "ksm": {"enabled": false}, "libvirt_version": {"build": 0, "full_version": "libvirt-4.5.0-10.el7_6.2", "major": 4, "minor": 5, "revision": 0}, "max_scheduling_memory": 15737028608, "memory": 33564917760, "name": "alma03.qa.lab.tlv.redhat.com", "network_attachments": [], "nics": [], "numa_nodes": [], "numa_supported": false, "os": {"custom_kernel_cmdline": "", "reported_kernel_cmdline": "BOOT_IMAGE=/vmlinuz-3.10.0-957.el7.x86_64 root=UUID=ed329962-5cd4-47d6-8c45-14b3eae4a610 ro nofb intel_iommu=on default_hugepagesz=1G hugepagesz=1G hugepages=4 hugepagesz=2M hugepages=1024 console=tty0 crashkernel=auto biosdevname=0s console=ttyS1,115200 LANG=en_US.UTF-8", "type": "RHEL", "version": {"full_version": "7.6 - 4.el7", "major": 7, "minor": 6}}, "permissions": [], "port": 54321, "power_management": {"automatic_pm_enabled": true, "enabled": true, "kdump_detection": true, "pm_proxies": [{"type": "cluster"}, {"type": "dc"}]}, "protocol": "stomp", "se_linux": {"mode": "enforcing"}, "spm": {"priority": 5, "status": "none"}, "ssh": {"fingerprint": "SHA256:ainP4gYFc2dburBuV2o4+6KF2qC8JSAUHiRHbMsVnZM", "port": 22}, "statistics": [], "status": "non_responsive", "storage_connection_extensions": [], "summary": {"active": 1, "migrating": 0, "total": 1}, "tags": [], "transparent_huge_pages": {"enabled": true}, "type": "rhel", "unmanaged_networks": [], "update_available": false, "version": {"build": 43, "full_version": "vdsm-4.20.43-1.el7ev", "major": 4, "minor": 20, "revision": 0}}]}, "attempts": 120, "changed": false, "deprecations": [{"msg": "The 'ovirt_hosts_facts' module is being renamed 'ovirt_host_facts'", "version": 2.8}]}

The issue in your case is that the hw_uuid column contains an upper-case string, while it's lowercase in vds_unique_id.
[root@nsednev-he-1 ~]# sudo -u postgres scl enable rh-postgresql95 -- psql -d engine -c "SELECT vds_name, hw_uuid, vds_id, vds_unique_id, host_name FROM vds"
could not change directory to "/root": Permission denied
           vds_name           |               hw_uuid                |                vds_id                |            vds_unique_id             |          host_name
------------------------------+--------------------------------------+--------------------------------------+--------------------------------------+------------------------------
 alma04.qa.lab.tlv.redhat.com | 4C4C4544-0059-4410-8054-B8C04F573032 | 0f8e942f-b00a-4f1b-b483-e4c43a2b4d37 | 4c4c4544-0059-4410-8054-b8c04f573032 | alma04.qa.lab.tlv.redhat.com
 alma03.qa.lab.tlv.redhat.com | 4C4C4544-0059-4410-8053-B7C04F573032 | 0b0ef6c8-e2db-4454-a3fe-772c1a1aa5c4 | 4c4c4544-0059-4410-8053-b7c04f573032 | alma03.qa.lab.tlv.redhat.com

Moving back to assigned.

ovirt-hosted-engine-setup-2.2.29-1.el7ev.noarch
ovirt-hosted-engine-ha-2.2.18-1.el7ev.noarch
rhvm-appliance-4.2-20181018.0.el7.noarch

Further to Simone's comment, the fix for https://bugzilla.redhat.com/show_bug.cgi?id=1642440, moving to vds_unique_id, will cover this case as well.
[root@alma04 ~]# vdsm-client Host getHardwareInfo
{
    "systemProductName": "PowerEdge C6220 II",
    "systemUUID": "4C4C4544-0059-4410-8054-B8C04F573032",
    "systemSerialNumber": "8YDTW02",
    "systemFamily": "Server",
    "systemManufacturer": "Dell Inc."
}

VDSM makes it uppercase.
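For illustration, a hypothetical query in the style of the one in the previous comment could flag such mismatches directly (the WHERE clause is an assumption, not a documented check):

# list hosts whose hw_uuid differs from vds_unique_id only by letter case
sudo -u postgres scl enable rh-postgresql95 -- psql -d engine -c "SELECT vds_name, hw_uuid, vds_unique_id FROM vds WHERE hw_uuid IS DISTINCT FROM vds_unique_id AND lower(hw_uuid) = lower(vds_unique_id)"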
When the uppercase string was manually changed to lowercase, the restore on the same storage also failed, since it's already in the engine's DB. Currently there is a limitation that the restore should be made to a different mount point when using NFS or Gluster.
OK, so the restore worked Node 0 to Node 0, NFS to NFS, only once the capital letters were manually changed to lowercase (the fix for https://bugzilla.redhat.com/show_bug.cgi?id=1642440), and I also had to use a different (new) storage mount point, per the limitation which I've added to https://bugzilla.redhat.com/show_bug.cgi?id=1420604.
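Given the mount-point limitation above, a minimal sketch of preparing a fresh NFS export for the restored hosted-engine storage domain (the export path is a placeholder; 36:36 is the vdsm:kvm UID/GID pair oVirt expects on NFS exports):

# on the NFS server, create a new, empty export for the restore
mkdir /exports/he_restore
chown 36:36 /exports/he_restore
chmod 0755 /exports/he_restore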
Failed to restore using these components:
ovirt-hosted-engine-setup-2.2.30-1.el7ev.noarch
ovirt-hosted-engine-ha-2.2.18-1.el7ev.noarch
Linux 3.10.0-957.el7.x86_64 #1 SMP Thu Oct 4 20:48:51 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.6 (Maipo)

[ INFO ] TASK [Wait for OVF_STORE disk content]
[ ERROR ] {u'_ansible_parsed': True, u'stderr_lines': [u'20+0 records in', u'20+0 records out', u'10240 bytes (10 kB) copied, 0.000131806 s, 77.7 MB/s', u'tar: bffc5624-14c2-4f66-9854-33e981604ca5.ovf: Not found in archive', u'tar: Exiting with failure status due to previous errors'], u'changed': True, u'end': u'2018-10-29 13:57:35.511616', u'_ansible_item_label': {u'image_id': u'2abdc4c0-d8d7-40e8-9123-b1eeeb43283b', u'name': u'OVF_STORE', u'id': u'8cf670a2-c573-4a62-9829-07359dd09b84'}, u'stdout': u'', u'failed': True, u'_ansible_item_result': True, u'msg': u'non-zero return code', u'rc': 2, u'start': u'2018-10-29 13:57:34.894457', u'attempts': 12, u'cmd': u"vdsm-client Image prepare storagepoolID=ebda3e8a-da8d-11e8-8a73-00163e7bb853 storagedomainID=c1078875-3b6d-45da-8ee5-6f0eb8efc52c imageID=8cf670a2-c573-4a62-9829-07359dd09b84 volumeID=2abdc4c0-d8d7-40e8-9123-b1eeeb43283b | grep path | awk '{ print $2 }' | xargs -I{} sudo -u vdsm dd if={} | tar -tvf - bffc5624-14c2-4f66-9854-33e981604ca5.ovf", u'item': {u'image_id': u'2abdc4c0-d8d7-40e8-9123-b1eeeb43283b', u'name': u'OVF_STORE', u'id': u'8cf670a2-c573-4a62-9829-07359dd09b84'}, u'delta': u'0:00:00.617159', u'invocation': {u'module_args': {u'warn': True, u'executable': None, u'_uses_shell': True, u'_raw_params': u"vdsm-client Image prepare storagepoolID=ebda3e8a-da8d-11e8-8a73-00163e7bb853 storagedomainID=c1078875-3b6d-45da-8ee5-6f0eb8efc52c imageID=8cf670a2-c573-4a62-9829-07359dd09b84 volumeID=2abdc4c0-d8d7-40e8-9123-b1eeeb43283b | grep path | awk '{ print $2 }' | xargs -I{} sudo -u vdsm dd if={} | tar -tvf - bffc5624-14c2-4f66-9854-33e981604ca5.ovf", u'removes': None, u'argv': None, u'creates': None, u'chdir': None, u'stdin': None}}, u'stdout_lines': [], u'stderr': u'20+0 records in\n20+0 records out\n10240 bytes (10 kB) copied, 0.000131806 s, 77.7 MB/s\ntar: bffc5624-14c2-4f66-9854-33e981604ca5.ovf: Not found in archive\ntar: Exiting with failure status due to previous errors', u'_ansible_no_log': False}
[ ERROR ] {u'_ansible_parsed': True, u'stderr_lines': [u'20+0 records in', u'20+0 records out', u'10240 bytes (10 kB) copied, 0.000111758 s, 91.6 MB/s', u'tar: bffc5624-14c2-4f66-9854-33e981604ca5.ovf: Not found in archive', u'tar: Exiting with failure status due to previous errors'], u'changed': True, u'end': u'2018-10-29 13:59:47.536466', u'_ansible_item_label': {u'image_id': u'abbe3c72-2f45-46db-8f11-750ae5053aa8', u'name': u'OVF_STORE', u'id': u'5e3bf7b6-26eb-4411-9635-2a476f899661'}, u'stdout': u'', u'failed': True, u'_ansible_item_result': True, u'msg': u'non-zero return code', u'rc': 2, u'start': u'2018-10-29 13:59:46.918694', u'attempts': 12, u'cmd': u"vdsm-client Image prepare storagepoolID=ebda3e8a-da8d-11e8-8a73-00163e7bb853 storagedomainID=c1078875-3b6d-45da-8ee5-6f0eb8efc52c imageID=5e3bf7b6-26eb-4411-9635-2a476f899661 volumeID=abbe3c72-2f45-46db-8f11-750ae5053aa8 | grep path | awk '{ print $2 }' | xargs -I{} sudo -u vdsm dd if={} | tar -tvf - bffc5624-14c2-4f66-9854-33e981604ca5.ovf", u'item': {u'image_id': u'abbe3c72-2f45-46db-8f11-750ae5053aa8', u'name': u'OVF_STORE', u'id': u'5e3bf7b6-26eb-4411-9635-2a476f899661'}, u'delta': u'0:00:00.617772', u'invocation': {u'module_args': {u'warn': True, u'executable': None, u'_uses_shell': True, u'_raw_params': u"vdsm-client Image prepare storagepoolID=ebda3e8a-da8d-11e8-8a73-00163e7bb853 storagedomainID=c1078875-3b6d-45da-8ee5-6f0eb8efc52c imageID=5e3bf7b6-26eb-4411-9635-2a476f899661 volumeID=abbe3c72-2f45-46db-8f11-750ae5053aa8 | grep path | awk '{ print $2 }' | xargs -I{} sudo -u vdsm dd if={} | tar -tvf - bffc5624-14c2-4f66-9854-33e981604ca5.ovf", u'removes': None, u'argv': None, u'creates': None, u'chdir': None, u'stdin': None}}, u'stdout_lines': [], u'stderr': u'20+0 records in\n20+0 records out\n10240 bytes (10 kB) copied, 0.000111758 s, 91.6 MB/s\ntar: bffc5624-14c2-4f66-9854-33e981604ca5.ovf: Not found in archive\ntar: Exiting with failure status due to previous errors', u'_ansible_no_log': False}
[ ERROR ] Failed to execute stage 'Closing up': Failed executing ansible-playbook
[ ERROR ] Hosted Engine deployment failed: please check the logs for the issue, fix accordingly or re-deploy from scratch.
Log file is located at /var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-20181029132147-tph9ob.log
Created attachment 1498545 [details] hosted engine setup logs
Created attachment 1498546 [details] sosreport from alma04
(In reply to Nikolai Sednev from comment #41)
> Failed to restore using these components:
> ovirt-hosted-engine-setup-2.2.30-1.el7ev.noarch
> ovirt-hosted-engine-ha-2.2.18-1.el7ev.noarch
> Linux 3.10.0-957.el7.x86_64 #1 SMP Thu Oct 4 20:48:51 UTC 2018 x86_64
> x86_64 x86_64 GNU/Linux
> Red Hat Enterprise Linux Server release 7.6 (Maipo)
>
> [ INFO ] TASK [Wait for OVF_STORE disk content]

Here it seems that, for some reason, the engine never correctly uploaded the OVF_STORE content to the new hosted-engine storage domain.
1. Deployed Node 0 over NFS.
2. Attached one ha-host.
3. Added one ISO domain.
4. Added one NFS data storage domain.
5. Created one guest VM on alma03 (the SPM host).
6. Migrated the HE VM to alma04 (a non-SPM host).
7. Set the environment into global maintenance.
8. Retrieved the backup files from the engine.
9. Reprovisioned alma04 (non-SPM).
10. Copied the backup file to the clean alma04.
11. Installed ovirt-hosted-engine-setup on alma04.
12. Tried to restore the engine during deployment using "hosted-engine --deploy --restore-from-file=/root/nsednev_from_alma04_rhevm_4_2_7_nospm" on alma04 on another NFS mount point.
13. Failed.
Created attachment 1498551 [details] engine.log
Let's focus on the latest engine.log. The engine VM has been correctly created:

2018-10-29 13:53:58,116+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-27) [] EVENT_ID: USER_ADD_VM(34), VM HostedEngine was created by admin@internal-authz.

and editing it correctly triggered an OVF update:

2018-10-29 13:55:06,456+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-57) [2042de45-2539-4a2a-bd04-f4d4902f2a12] EVENT_ID: USER_UPDATE_VM(35), VM HostedEngine configuration was updated by admin@internal-authz.
2018-10-29 13:55:06,460+02 INFO [org.ovirt.engine.core.bll.storage.ovfstore.ProcessOvfUpdateForStoragePoolCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-92) [4bdcdb1] Before acquiring and wait lock 'EngineLock:{exclusiveLocks='[ebda3e8a-da8d-11e8-8a73-00163e7bb853=OVF_UPDATE]', sharedLocks=''}'
2018-10-29 13:55:06,461+02 INFO [org.ovirt.engine.core.bll.storage.ovfstore.ProcessOvfUpdateForStoragePoolCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-92) [4bdcdb1] Lock-wait acquired to object 'EngineLock:{exclusiveLocks='[ebda3e8a-da8d-11e8-8a73-00163e7bb853=OVF_UPDATE]', sharedLocks=''}'
2018-10-29 13:55:06,462+02 INFO [org.ovirt.engine.core.bll.storage.ovfstore.ProcessOvfUpdateForStoragePoolCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-92) [4bdcdb1] Running command: ProcessOvfUpdateForStoragePoolCommand internal: true. Entities affected : ID: ebda3e8a-da8d-11e8-8a73-00163e7bb853 Type: StoragePool
2018-10-29 13:55:06,465+02 INFO [org.ovirt.engine.core.bll.UpdateVmCommand] (default task-57) [2042de45-2539-4a2a-bd04-f4d4902f2a12] Lock freed to object 'EngineLock:{exclusiveLocks='[HostedEngine=VM_NAME]', sharedLocks='[bffc5624-14c2-4f66-9854-33e981604ca5=VM]'}'
2018-10-29 13:55:06,474+02 INFO [org.ovirt.engine.core.bll.storage.ovfstore.ProcessOvfUpdateForStoragePoolCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-92) [4bdcdb1] Attempting to update VM OVFs in Data Center 'Default'
2018-10-29 13:55:06,543+02 WARN [org.ovirt.engine.core.vdsbroker.builder.vminfo.VmInfoBuildUtils] (EE-ManagedThreadFactory-engineScheduled-Thread-92) [4bdcdb1] No host NUMA nodes found for vm HostedEngine (bffc5624-14c2-4f66-9854-33e981604ca5)
2018-10-29 13:55:06,590+02 INFO [org.ovirt.engine.core.bll.storage.ovfstore.ProcessOvfUpdateForStoragePoolCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-92) [4bdcdb1] Successfully updated VM OVFs in Data Center 'Default'
2018-10-29 13:55:06,590+02 INFO [org.ovirt.engine.core.bll.storage.ovfstore.ProcessOvfUpdateForStoragePoolCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-92) [4bdcdb1] Attempting to update template OVFs in Data Center 'Default'
2018-10-29 13:55:06,594+02 INFO [org.ovirt.engine.core.bll.storage.ovfstore.ProcessOvfUpdateForStoragePoolCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-92) [4bdcdb1] Successfully updated templates OVFs in Data Center 'Default'
2018-10-29 13:55:06,595+02 INFO [org.ovirt.engine.core.bll.storage.ovfstore.ProcessOvfUpdateForStoragePoolCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-92) [4bdcdb1] Attempting to remove unneeded template/vm OVFs in Data Center 'Default'
2018-10-29 13:55:06,615+02 INFO [org.ovirt.engine.core.bll.storage.ovfstore.ProcessOvfUpdateForStoragePoolCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-92) [4bdcdb1] Successfully removed unneeded template/vm OVFs in Data Center 'Default'
2018-10-29 13:55:06,616+02 INFO [org.ovirt.engine.core.bll.storage.ovfstore.ProcessOvfUpdateForStoragePoolCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-92) [4bdcdb1] Lock freed to object 'EngineLock:{exclusiveLocks='[ebda3e8a-da8d-11e8-8a73-00163e7bb853=OVF_UPDATE]', sharedLocks=''}'
2018-10-29 13:55:06,666+02 INFO [org.ovirt.engine.core.bll.storage.ovfstore.ProcessOvfUpdateForStorageDomainCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-92) [609dcd76] Lock Acquired to object 'EngineLock:{exclusiveLocks='[bb2288b9-0303-483c-888b-2bc4c4c6a351=STORAGE]', sharedLocks='[ebda3e8a-da8d-11e8-8a73-00163e7bb853=OVF_UPDATE]'}'
2018-10-29 13:55:06,673+02 WARN [org.ovirt.engine.core.vdsbroker.builder.vminfo.VmInfoBuildUtils] (default task-57) [2042de45-2539-4a2a-bd04-f4d4902f2a12] No host NUMA nodes found for vm HostedEngine (bffc5624-14c2-4f66-9854-33e981604ca5)
2018-10-29 13:55:06,733+02 INFO [org.ovirt.engine.core.bll.storage.ovfstore.ProcessOvfUpdateForStorageDomainCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-92) [609dcd76] Running command: ProcessOvfUpdateForStorageDomainCommand internal: true. Entities affected : ID: bb2288b9-0303-483c-888b-2bc4c4c6a351 Type: StorageAction group MANIPULATE_STORAGE_DOMAIN with role type ADMIN
2018-10-29 13:55:06,772+02 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.SetVolumeDescriptionVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-92) [609dcd76] START, SetVolumeDescriptionVDSCommand( SetVolumeDescriptionVDSCommandParameters:{storagePoolId='ebda3e8a-da8d-11e8-8a73-00163e7bb853', ignoreFailoverLimit='false', storageDomainId='bb2288b9-0303-483c-888b-2bc4c4c6a351', imageGroupId='aede6f87-ed75-4f5b-92f8-776cc62bdf3e', imageId='994c1f99-77bc-4df9-a686-8c2ba197211f'}), log id: 532068ec
2018-10-29 13:55:06,773+02 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.SetVolumeDescriptionVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-92) [609dcd76] -- executeIrsBrokerCommand: calling 'setVolumeDescription', parameters:
2018-10-29 13:55:06,773+02 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.SetVolumeDescriptionVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-92) [609dcd76] ++ spUUID=ebda3e8a-da8d-11e8-8a73-00163e7bb853
2018-10-29 13:55:06,773+02 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.SetVolumeDescriptionVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-92) [609dcd76] ++ sdUUID=bb2288b9-0303-483c-888b-2bc4c4c6a351
2018-10-29 13:55:06,773+02 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.SetVolumeDescriptionVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-92) [609dcd76] ++ imageGroupGUID=aede6f87-ed75-4f5b-92f8-776cc62bdf3e
2018-10-29 13:55:06,773+02 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.SetVolumeDescriptionVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-92) [609dcd76] ++ volUUID=994c1f99-77bc-4df9-a686-8c2ba197211f
2018-10-29 13:55:06,773+02 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.SetVolumeDescriptionVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-92) [609dcd76] ++ description={"Updated":false,"Last Updated":"Sun Oct 28 12:12:24 IST 2018","Storage Domains":[{"uuid":"bb2288b9-0303-483c-888b-2bc4c4c6a351"}],"Disk Description":"OVF_STORE"}
2018-10-29 13:55:06,816+02 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.SetVolumeDescriptionVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-92) [609dcd76] FINISH, SetVolumeDescriptionVDSCommand, log id: 532068ec

and the OVF_STORE content has been uploaded through alma03, which was the SPM host (the user was redeploying on alma04):

2018-10-29 13:55:06,958+02 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.UploadStreamVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-92) [609dcd76] START, UploadStreamVDSCommand(HostName = alma03.qa.lab.tlv.redhat.com, UploadStreamVDSCommandParameters:{hostId='f2897fa0-0372-4fdb-9088-475002bb2a91'}), log id: 478e0b1e
2018-10-29 13:55:06,958+02 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.UploadStreamVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-92) [609dcd76] -- executeVdsBrokerCommand, parameters:
2018-10-29 13:55:06,958+02 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.UploadStreamVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-92) [609dcd76] ++ spUUID=ebda3e8a-da8d-11e8-8a73-00163e7bb853
2018-10-29 13:55:06,958+02 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.UploadStreamVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-92) [609dcd76] ++ sdUUID=bb2288b9-0303-483c-888b-2bc4c4c6a351
2018-10-29 13:55:06,958+02 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.UploadStreamVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-92) [609dcd76] ++ imageGUID=aede6f87-ed75-4f5b-92f8-776cc62bdf3e
2018-10-29 13:55:06,958+02 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.UploadStreamVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-92) [609dcd76] ++ volUUID=994c1f99-77bc-4df9-a686-8c2ba197211f
2018-10-29 13:55:06,958+02 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.UploadStreamVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-92) [609dcd76] ++ size=30720
2018-10-29 13:55:07,215+02 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.UploadStreamVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-92) [609dcd76] FINISH, UploadStreamVDSCommand, log id: 478e0b1e
Everything went as expected on alma03:

2018-10-29 13:55:44,905+02 INFO [org.ovirt.engine.core.bll.storage.ovfstore.UploadStreamCommand] (EE-ManagedThreadFactory-engine-Thread-261) [5811be50] Ending command 'org.ovirt.engine.core.bll.storage.ovfstore.UploadStreamCommand' successfully.
2018-10-29 13:55:44,913+02 INFO [org.ovirt.engine.core.bll.storage.ovfstore.UploadStreamCommand] (EE-ManagedThreadFactory-engine-Thread-262) [5811be50] Ending command 'org.ovirt.engine.core.bll.storage.ovfstore.UploadStreamCommand' successfully.
2018-10-29 13:55:44,924+02 INFO [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (EE-ManagedThreadFactory-engine-Thread-261) [5811be50] CommandAsyncTask::HandleEndActionResult [within thread]: endAction for action type 'UploadStream' completed, handling the result.
2018-10-29 13:55:44,924+02 INFO [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (EE-ManagedThreadFactory-engine-Thread-261) [5811be50] CommandAsyncTask::HandleEndActionResult [within thread]: endAction for action type 'UploadStream' succeeded, clearing tasks.
2018-10-29 13:55:44,924+02 INFO [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engine-Thread-261) [5811be50] SPMAsyncTask::ClearAsyncTask: Attempting to clear task '920dafb3-98eb-43d8-9683-81c59117064c'

but then if we check the content of the OVF_STORE volume on the host side:

[root@alma04 ~]# vdsm-client Image prepare storagepoolID=ebda3e8a-da8d-11e8-8a73-00163e7bb853 storagedomainID=c1078875-3b6d-45da-8ee5-6f0eb8efc52c imageID=5e3bf7b6-26eb-4411-9635-2a476f899661 volumeID=abbe3c72-2f45-46db-8f11-750ae5053aa8 | grep path | awk '{ print $2 }' | xargs -I{} sudo -u vdsm dd if={} | tar -xvf -
20+0 records in
20+0 records out
10240 bytes (10 kB) copied, 0,000141984 s, 72,1 MB/s
info.json
metadata.json
[root@alma04 ~]# cat info.json
{
  "Last Updated" : "Mon Oct 29 13:55:37 IST 2018",
  "Storage Domains" : [ {
    "uuid" : "c1078875-3b6d-45da-8ee5-6f0eb8efc52c"
  } ]
}
[root@alma04 ~]# cat metadata.json
{
  "VM Status" : {
    "f51ec4cd-5ab2-4a58-b13e-15314c8b296d" : 1
  }
}
This seems like an engine bug to me: the engine created an OVF_STORE without vm.ovf. I'm not sure if it's reproducible. Can you please simply try another time on the same host with the same backup file?
Same error, also not working.
I tried reproducing as well on Nikolai's env with exactly the same code and exactly the same backup file, and it worked as expected for me. I fear that we have some kind of barely reproducible bug on the engine side that "sometimes" forgets to update the OVF_STORE disk. I think it's the same issue that sometimes causes https://bugzilla.redhat.com/show_bug.cgi?id=1628836
Created attachment 1498613 [details] Successful execution on Nikolai's env
Worked for me just fine from NFS to iSCSI and also from NFS to NFS, when restoring over a clean host.

Tested on these components:
ovirt-hosted-engine-setup-2.2.30-1.el7ev.noarch
ovirt-hosted-engine-ha-2.2.18-1.el7ev.noarch
rhvm-appliance-4.2-20181026.1.el7.noarch

Moving to verified. Previously found issues will be tracked in https://bugzilla.redhat.com/show_bug.cgi?id=1644748.
This bugzilla is included in the oVirt 4.2.7 Async 1 release, published on November 13th 2018. Since the problem described in this bug report should be resolved in the oVirt 4.2.7 release, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.