Bug 1568841 - Can't restore hosted-engine backup at deployment
Summary: Can't restore hosted-engine backup at deployment
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-hosted-engine-setup
Classification: oVirt
Component: General
Version: 2.2.16
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ovirt-4.2.7-1
Target Release: ---
Assignee: Simone Tiraboschi
QA Contact: Nikolai Sednev
URL:
Whiteboard:
Depends On: 1642440
Blocks: 1420604 1620314 1638096 1642811 ovirt-hosteded-engine-setup-2.2.32
 
Reported: 2018-04-18 10:17 UTC by Matthias
Modified: 2018-11-13 16:12 UTC
CC List: 9 users

Fixed In Version: ovirt-hosted-engine-setup-2.2.31-1.el7ev.noarch.rpm
Doc Type: Bug Fix
Doc Text:
The documentation has been updated to add --noansible to the command line used in the hosted-engine backup restore procedure. See https://github.com/oVirt/ovirt-site/pull/1594/files
Clone Of:
Environment:
Last Closed: 2018-11-13 16:12:30 UTC
oVirt Team: Integration
Embargoed:
nsednev: needinfo+
rule-engine: ovirt-4.2+
rule-engine: blocker+


Attachments
sosreport from the engine (9.52 MB, application/x-xz)
2018-05-14 16:50 UTC, Nikolai Sednev
sosreport from alma04 (9.87 MB, application/x-xz)
2018-05-14 16:51 UTC, Nikolai Sednev
Screenshot from 2018-06-06 16-04-29.png (62.95 KB, image/png)
2018-06-06 13:07 UTC, Nikolai Sednev
broker.log (42.14 KB, text/plain)
2018-06-06 13:07 UTC, Nikolai Sednev
agent.log (42.88 KB, text/plain)
2018-06-06 13:07 UTC, Nikolai Sednev
vdsm.log (3.96 MB, text/plain)
2018-06-06 13:08 UTC, Nikolai Sednev
ovirt-hosted-engine-setup.log (525.82 KB, text/plain)
2018-06-06 13:09 UTC, Nikolai Sednev
screenshot of active ha-host with inactive score (87.02 KB, image/png)
2018-06-06 13:17 UTC, Nikolai Sednev
sosreport from alma04 ha host (9.95 MB, application/x-xz)
2018-06-06 13:21 UTC, Nikolai Sednev
sosreport from the engine (9.41 MB, application/x-xz)
2018-06-06 13:23 UTC, Nikolai Sednev
screencast of failed response from remote console (3.07 MB, application/octet-stream)
2018-10-11 13:49 UTC, Nikolai Sednev
sosreport from alma03 (9.37 MB, application/x-xz)
2018-10-14 13:33 UTC, Nikolai Sednev
hosted engine setup logs (165.53 KB, application/x-gzip)
2018-10-29 12:30 UTC, Nikolai Sednev
sosreport from alma04 (10.51 MB, application/x-xz)
2018-10-29 12:31 UTC, Nikolai Sednev
engine.log (336.63 KB, text/plain)
2018-10-29 12:58 UTC, Simone Tiraboschi
Successful execution on Nikolai's env (494.34 KB, text/plain)
2018-10-29 15:51 UTC, Simone Tiraboschi


Links
System ID Private Priority Status Summary Last Updated
Github oVirt ovirt-site pull 1594 0 'None' closed Restore procedure done using the vintage flow 2021-01-15 11:01:48 UTC
Red Hat Bugzilla 1628836 0 urgent CLOSED hosted-engine VM created with node zero misses the console device 2021-02-22 00:41:40 UTC
oVirt gerrit 91835 0 'None' MERGED ansible: support restore procedure 2021-01-15 11:01:47 UTC
oVirt gerrit 94455 0 'None' MERGED ansible: support restore procedure 2021-01-15 11:01:48 UTC

Internal Links: 1628836

Description Matthias 2018-04-18 10:17:41 UTC
Description of problem:
I created a backup of my hosted-engine environment and wanted to restore it on a freshly installed EL (CentOS 7) setup.
The problem is that the documentation says that during the hosted-engine --deploy setup you are asked whether you want to execute the automatic setup of the engine, and so on. But in the oVirt 4.2 self-hosted engine setup you are only asked a minimum of questions and then everything is set up and finished, and I cannot start the backup restore, since the hosted engine is already active.

So is there a new step I don't know about, since the documentation clearly does not reflect the current deployment process?

Best regards
Matthias

Version-Release number of selected component (if applicable):
ovirt 4.2

How reproducible:
100%

Steps to Reproduce:
1. Prepare a fresh host for hosted engine deployment
2. Start hosted-engine --deploy
3. Deployment finishes successfully and no backup restore is possible

Actual results:


Expected results:


Additional info:

Comment 1 Ido Rosenzwig 2018-04-22 11:07:05 UTC
Can you please provide a link to the documentation you read these instructions from?

Comment 2 Matthias 2018-04-23 06:18:27 UTC
Hello.
I used the online documentation of oVirt:

https://www.ovirt.org/documentation/self-hosted/chap-Backing_up_and_Restoring_an_EL-Based_Self-Hosted_Environment/

The problem is that when I run the HE deployment, it runs through without stopping at the question about automatic setup.

So I used a workaround for restoring the backup: I deployed a new HE, afterwards imported the data domains (Fibre Channel), and then I was able to import my VMs again. But this is only a workaround; the normal restore should be done through the deployment, right?

BG

Comment 3 Ido Rosenzwig 2018-04-23 11:19:30 UTC
Yes, the normal restore should be done using the vintage flow, 
using the '--noansible' option. 

The documentation has been updated:
https://github.com/oVirt/ovirt-site/pull/1594
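
(For reference, a minimal sketch of the vintage-flow restore described in the updated documentation; the file and log paths here are placeholders, not values from this bug:)

  # On the freshly prepared host: run the vintage (non-ansible) deployment and answer "No"
  # when asked whether to automatically execute engine-setup on the appliance on first boot.
  hosted-engine --deploy --noansible

  # On the engine VM, before resuming the host-side setup: restore the backup, then run engine-setup.
  engine-backup --mode=restore --file=/root/engine-backup --log=/var/log/restore-engine.log \
      --provision-db --provision-dwh-db --provision-reports-db --restore-permissions
  engine-setup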

Comment 4 Yedidyah Bar David 2018-04-29 09:30:29 UTC
Do we also have a bug/RFE to allow restore using the ansible/node-zero flow? I think we need one.

Comment 5 Nikolai Sednev 2018-05-14 16:41:04 UTC
1.Deployed Node 0 and added one data storage domain.
2.Backed up the engine.
3.Copied backup to safe place.
4.Reprovisioned the host and wiped out SHE-storage domain.
5.Deployed vintage-SHE and answered "Automatically execute engine-setup on the engine appliance on first boot (Yes, No)[Yes]? no".
6.Connected to engine and restored the DB using backup files collected in step 3.
7.Failed to get an active host in the UI and answered "Retry checking host status or ignore this and continue (Retry, Ignore)[Retry]? Ignore" in the host's SHE deployment process.
8.Deployment continued and finished successfully.
9.The engine is running, but in the UI the host is non-operational and I don't see the hosted-storage domain that existed when I deployed Node 0 and made the backup.

It seems that the host is unable to identify the OVF_STORE volume.

Sosreports from the host and the engine are attached.


Restarting the agent and broker did not resolve the issue.
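
(For reference, that restart is typically done like this; a minimal sketch using the service names shown below:)

  systemctl restart ovirt-ha-broker ovirt-ha-agent
  journalctl -u ovirt-ha-agent -n 50    # check whether the OVF_STORE error persists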

alma04 ~]# systemctl status ovirt-ha-agent -l 
● ovirt-ha-agent.service - oVirt Hosted Engine High Availability Monitoring Agent
   Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-agent.service; enabled; vendor preset: disabled)
   Active: active (running) since Mon 2018-05-14 19:39:08 IDT; 1min 10s ago
 Main PID: 16995 (ovirt-ha-agent)
    Tasks: 2
   CGroup: /system.slice/ovirt-ha-agent.service
           └─16995 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent

May 14 19:39:08 alma04.qa.lab.tlv.redhat.com systemd[1]: Started oVirt Hosted Engine High Availability Monitoring Agent.
May 14 19:39:08 alma04.qa.lab.tlv.redhat.com systemd[1]: Starting oVirt Hosted Engine High Availability Monitoring Agent...
May 14 19:39:31 alma04.qa.lab.tlv.redhat.com ovirt-ha-agent[16995]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config.vm ERROR Unable to identify the OVF_STORE volume, falling back to initial vm.conf. Please ensure you already added your first data domain for regular VMs
May 14 19:39:32 alma04.qa.lab.tlv.redhat.com ovirt-ha-agent[16995]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config.vm ERROR Unable to identify the OVF_STORE volume, falling back to initial vm.conf. Please ensure you already added your first data domain for regular VMs
May 14 19:39:41 alma04.qa.lab.tlv.redhat.com ovirt-ha-agent[16995]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config.vm ERROR Unable to identify the OVF_STORE volume, falling back to initial vm.conf. Please ensure you already added your first data domain for regular VMs
May 14 19:39:52 alma04.qa.lab.tlv.redhat.com ovirt-ha-agent[16995]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config.vm ERROR Unable to identify the OVF_STORE volume, falling back to initial vm.conf. Please ensure you already added your first data domain for regular VMs
May 14 19:40:02 alma04.qa.lab.tlv.redhat.com ovirt-ha-agent[16995]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config.vm ERROR Unable to identify the OVF_STORE volume, falling back to initial vm.conf. Please ensure you already added your first data domain for regular VMs
May 14 19:40:12 alma04.qa.lab.tlv.redhat.com ovirt-ha-agent[16995]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config.vm ERROR Unable to identify the OVF_STORE volume, falling back to initial vm.conf. Please ensure you already added your first data domain for regular VMs

Comment 6 Red Hat Bugzilla Rules Engine 2018-05-14 16:41:10 UTC
Target release should be placed once a package build is known to fix an issue. Since this bug is not modified, the target version has been reset. Please use the target milestone to plan a fix for an oVirt release.

Comment 7 Nikolai Sednev 2018-05-14 16:50:13 UTC
Created attachment 1436363 [details]
sosreport from the engine

Comment 8 Nikolai Sednev 2018-05-14 16:51:10 UTC
Created attachment 1436364 [details]
sosreport from alma04

Comment 9 Nikolai Sednev 2018-05-14 16:53:40 UTC
Restore on the engine was made using a command like this:
engine-backup --mode=restore --log=/root/Log_nsednev --file=/root/nsednev --provision-db --provision-dwh-db --provision-reports-db --restore-permissions   --he-remove-hosts --he-remove-storage-vm
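
(For completeness, the backup being restored here would typically have been taken beforehand with something like the following; the file and log names are hypothetical:)

  engine-backup --mode=backup --scope=all --file=/root/nsednev --log=/root/Log_nsednev_backup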

Comment 10 Ido Rosenzwig 2018-05-30 10:15:46 UTC
Works for me with this command as it appears in the documentation:

engine-backup --mode=restore --log=/var/log/restore-engine.log --file=/root/engine-backup --provision-db --provision-dwh-db --provision-reports-db --restore-permissions

Comment 11 Nikolai Sednev 2018-05-30 16:16:43 UTC
Please provide the "Fixed In Version:" entry value.

Comment 12 Nikolai Sednev 2018-06-06 13:06:40 UTC
For restore on the same pair of hosts, I reprovisioned them and also wiped out the iSCSI target.
Then I used "engine-backup --mode=restore --log=/root/Log_nsednev_from_alma03_rhevm_4_2_4_1 --file=/root/nsednev_from_alma03_rhevm_4_2_4_1 --provision-db --provision-dwh-db --provision-reports-db --restore-permissions --he-remove-hosts" on the engine for the restore and continued with the deployment on the host.

After restoring the engine and continuing with the deployment on the host, I saw this error:
[ ERROR ] Cannot automatically set CPU level of cluster Default:
         General command validation failure.

The restore finished successfully on the engine, and the deployment finished too, but the hosted storage was not recovered as UP in the UI, as can be seen in the attached screenshot.

[ INFO  ] Hosted Engine successfully deployed
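
(The HA state shown in the attached screenshots can also be checked directly on the host; a sketch:)

  hosted-engine --vm-status    # per-host score, engine health, and maintenance state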


Tested on these components:
rhvm-appliance-4.2-20180601.0.el7.noarch
ovirt-hosted-engine-ha-2.2.13-1.el7ev.noarch
ovirt-hosted-engine-setup-2.2.22-1.el7ev.noarch
Linux 3.10.0-862.6.1.el7.x86_64 #1 SMP Mon Jun 4 15:33:25 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.5 (Maipo)
ovirt-engine-setup-4.2.4.1-0.1.el7.noarch

Moving back to assigned.

Comment 13 Nikolai Sednev 2018-06-06 13:07:02 UTC
Created attachment 1448306 [details]
Screenshot from 2018-06-06 16-04-29.png

Comment 14 Nikolai Sednev 2018-06-06 13:07:39 UTC
Created attachment 1448307 [details]
broker.log

Comment 15 Nikolai Sednev 2018-06-06 13:07:57 UTC
Created attachment 1448308 [details]
agent.log

Comment 16 Nikolai Sednev 2018-06-06 13:08:31 UTC
Created attachment 1448309 [details]
vdsm.log

Comment 17 Nikolai Sednev 2018-06-06 13:09:05 UTC
Created attachment 1448310 [details]
ovirt-hosted-engine-setup.log

Comment 18 Nikolai Sednev 2018-06-06 13:17:08 UTC
Created attachment 1448323 [details]
screenshot of active ha-host with inactive score

Comment 19 Nikolai Sednev 2018-06-06 13:21:15 UTC
Created attachment 1448325 [details]
sosreport from alma04 ha host

Comment 20 Nikolai Sednev 2018-06-06 13:23:49 UTC
Created attachment 1448326 [details]
sosreport from the engine

Comment 21 Ido Rosenzwig 2018-06-07 08:53:07 UTC
It works for me.
The difference I see between our environments is that you are using 'Intel SandyBridge IBRS Family' and I'm using 'Intel SandyBridge Family', which might cause this issue.

from your ovirt-hosted-engine-setup log:

2018-06-06 15:44:03,809+0300 DEBUG otopi.plugins.gr_he_setup.engine.add_host add_host._closeup:674 {'name': None, 'level': None, 'cpu_tune': None, 'architecture': 'x86_64', 'mode': None, 'cores': None, 'speed': None, 'id': 'Intel SandyBridge IBRS Family', 'topology': None}
2018-06-06 15:44:05,000+0300 DEBUG otopi.plugins.gr_he_setup.engine.add_host add_host._closeup:685 Cannot set CPU level of cluster Default
Traceback (most recent call last):
  File "/usr/share/ovirt-hosted-engine-setup/scripts/../plugins/gr-he-setup/engine/add_host.py", line 679, in _closeup
    cluster.update()
  File "/usr/lib/python2.7/site-packages/ovirtsdk/infrastructure/brokers.py", line 363, in update
    headers={"Correlation-Id":correlation_id, "Expect":expect}
  File "/usr/lib/python2.7/site-packages/ovirtsdk/infrastructure/proxy.py", line 68, in update
    return self.request('PUT', url, body, headers, cls=cls)
  File "/usr/lib/python2.7/site-packages/ovirtsdk/infrastructure/proxy.py", line 122, in request
    persistent_auth=self.__persistent_auth
  File "/usr/lib/python2.7/site-packages/ovirtsdk/infrastructure/connectionspool.py", line 79, in do_request
    persistent_auth)
  File "/usr/lib/python2.7/site-packages/ovirtsdk/infrastructure/connectionspool.py", line 162, in __do_request
    raise errors.RequestError(response_code, response_reason, response_body)
RequestError: 
status: 500
reason: Internal Server Error
detail: General command validation failure.
2018-06-06 15:44:05,078+0300 ERROR otopi.plugins.gr_he_setup.engine.add_host add_host._closeup:693 Cannot automatically set CPU level of cluster Default:
General command validation failure.


from my ovirt-hosted-engine-setup log:

2018-06-07 04:18:15,888-0400 DEBUG otopi.plugins.gr_he_setup.engine.add_host add_host._closeup:674 {'name': None, 'level': None, 'cpu_tune': None, 'architecture': 'x86_64', 'mode': None, 'cores': None, 'speed': None, 'id': 'Intel SandyBridge Family', 'topology': None}
2018-06-07 04:18:17,337-0400 DEBUG otopi.context context.dumpEnvironment:859 ENVIRONMENT DUMP - BEGIN
2018-06-07 04:18:17,338-0400 DEBUG otopi.context context.dumpEnvironment:869 ENV CORE/modifiedFiles=list:'['/etc/vdsm/vdsm.conf', '/var/run/ovirt-hosted-engine-ha/vm.conf', '/etc/ovirt-hosted-engine/hosted-engine.conf', '/etc/sysconfig/iptables', '/etc/ovirt-hosted-engine/iptables.example', '/etc/ovirt-hosted-engine/firewalld/hosted-cockpit.xml', '/e

I suggest verifying the restore operation on a regular CPU family (e.g. Intel SandyBridge Family) and afterwards trying with IBRS to isolate the problem.
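
(One way to see which variant a host will report is to check the CPU flags that make the engine pick the IBRS family; a sketch, assuming the usual /proc/cpuinfo flag names on a microcode-updated host:)

  grep -m1 -o -E 'spec_ctrl|ibrs' /proc/cpuinfo     # any output suggests the IBRS CPU type will be reported
  vdsm-client Host getCapabilities | grep -i cpuModel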

Comment 22 Nikolai Sednev 2018-06-07 09:52:23 UTC
The issue is consistently reproduced with downstream packages in my environment.
Returning back to assigned.

Comment 23 Nikolai Sednev 2018-10-11 13:48:28 UTC
I've followed my previous steps and failed even to connect to the HE-VM over the remote console using VNC:
1.Deployed Node 0 and added one data storage domain.
2.Backed up the engine.
3.Copied backup to safe place.
4.Reprovisioned the host and wiped out SHE-storage domain.
5.Deployed vintage-SHE and answered "Automatically execute engine-setup on the engine appliance on first boot (Yes, No)[Yes]? no".
6.Failed to get any response from the VM over the remote console; please see the screencast for more details.

Comment 24 Nikolai Sednev 2018-10-11 13:49:29 UTC
Created attachment 1492939 [details]
screencast of failed response from remote console

Comment 25 Nikolai Sednev 2018-10-11 13:58:59 UTC
After some more delay the console responded and the VM booted, but I could not log in to it.
Tested on these components:
ovirt-engine-appliance-4.2-20181010.1.el7.noarch
ovirt-hosted-engine-setup-2.2.29-0.0.master.20181002122252.git9ae169e.el7.noarch
ovirt-hosted-engine-ha-2.2.19-0.0.master.20181002122327.20181002122322.gitb449616.el7.noarch

Ryan, can someone check the username/password regarding this issue on the appliance ovirt-engine-appliance-4.2-20181010.1.el7.noarch?

Comment 26 Nikolai Sednev 2018-10-11 13:59:52 UTC
Apparently there is an issue with the appliance's default username/password.

Comment 27 Ryan Barry 2018-10-11 14:32:02 UTC
The appliance uses cloud-init for configuration and doesn't have a documented/usable "default" password. Cloud-init must be used

Comment 28 Simone Tiraboschi 2018-10-12 07:57:16 UTC
(In reply to Nikolai Sednev from comment #23)
> I've followed my previous steps and failed even to get connected to HE-VM
> over remote console using VNC:
> 1.Deployed Node 0 and added one data storage domain.
> 2.Backed up the engine.
> 3.Copied backup to safe place.
> 4.Reprovisioned the host and wiped out SHE-storage domain.
> 5.Deployed vintage-SHE 
             ^^^^^^^
             
Please note that due to https://bugzilla.redhat.com/1620314 this will fail for sure in your case.
If you initially deployed with the node 0 flow, you should redeploy with node 0 as well when restoring the backup: it now also supports an automatic restore procedure.
You have to execute:

  hosted-engine --deploy --restore-from-file=mybackup.tar.gz

> and answered "Automatically execute engine-setup on
> the engine appliance on first boot (Yes, No)[Yes]? no".
> 6.Failed to get any response from VM over remote console, plese see the
> screencast for more details.

Comment 29 Raz Tamir 2018-10-13 09:58:09 UTC
Hi Simone,

Please add 'priority' to the bug.

Comment 30 Nikolai Sednev 2018-10-14 07:31:09 UTC
(In reply to Ryan Barry from comment #27)
> The appliance uses cloud-init for configuration and doesn't have a
> documented/usable "default" password. Cloud-init must be used

It was used, but nothing happened.

Comment 31 Nikolai Sednev 2018-10-14 07:34:49 UTC
(In reply to Simone Tiraboschi from comment #28)
> (In reply to Nikolai Sednev from comment #23)
> > I've followed my previous steps and failed even to get connected to HE-VM
> > over remote console using VNC:
> > 1.Deployed Node 0 and added one data storage domain.
> > 2.Backed up the engine.
> > 3.Copied backup to safe place.
> > 4.Reprovisioned the host and wiped out SHE-storage domain.
> > 5.Deployed vintage-SHE 
>              ^^^^^^^
>              
> Please note that due to https://bugzilla.redhat.com/1620314 in your case
> this will fail for sure.
> If you initially deployed with node 0 flow, you should redeploy with node 0
> as well restoring the backup: now it supports also an automatic restore
> procedure.
> You have to execute:
> 
>   hosted-engine --deploy --restore-from-file=mybackup.tar.gz
> 
> > and answered "Automatically execute engine-setup on
> > the engine appliance on first boot (Yes, No)[Yes]? no".
> > 6.Failed to get any response from VM over remote console, plese see the
> > screencast for more details.

In this very bug I was specifically asked about restoring on vintage from node 0; if that is not supported, then there has to be some limitation within the code.

Comment 32 Nikolai Sednev 2018-10-14 13:32:19 UTC
http://pastebin.test.redhat.com/657084

1.Deployed Node 0 and added one data storage domain.
2.Added one additional ha-host.
3.Created one guest VM.
4.Backed up the engine.
5.Copied backup to safe place.
6.Reprovisioned the host and wiped out SHE-storage domain.
7.Deployed Node 0 using the restore functionality from the backed-up file.
8.Restore failed with the following error:

alma03 ~]# hosted-engine --deploy --restore-from-file /root/nsednev_from_alma03_rhevm_4_2_7_node_0_nfs
.
.
.
[ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "msg": "The system may not be provisioned according to the playbook results: please check the logs for the issue, fix accordingly or re-deploy from scratch.\n"}
[ ERROR ] Failed to execute stage 'Closing up': Failed executing ansible-playbook
[ ERROR ] Hosted Engine deployment failed: please check the logs for the issue, fix accordingly or re-deploy from scratch.

Logs from host attached.

Tested on these components:
ovirt-hosted-engine-ha-2.2.19-0.0.master.20181002122327.20181002122322.gitb449616.el7.noarch
ovirt-hosted-engine-setup-2.2.29-0.0.master.20181002122252.git9ae169e.el7.noarch
ovirt-engine-appliance-4.2-20181013.1.el7.noarch

Comment 33 Nikolai Sednev 2018-10-14 13:33:23 UTC
Created attachment 1493704 [details]
sosreport from alma03

Comment 34 Simone Tiraboschi 2018-10-15 06:42:43 UTC
The real issue happened here:

 [ INFO  ] TASK [Wait for the host to be up]
 [ INFO  ] ok: [localhost]
 [ INFO  ] TASK [Check host status]
 [ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "msg": "The host has been set in non_operational status, please check engine logs, fix accordingly and re-deploy.\n"}

Now the question is why the host was set to non-operational.


Unfortunately, under sosreport-alma03.qa.lab.tlv.redhat.com-20181014162800/var/log/ovirt-hosted-engine-setup/ in the sos report we have just one file, which is not really relevant.

If you still have the environment, can you please collect the whole /var/log/ovirt-hosted-engine-setup/ ?

According to https://github.com/sosreport/sos/issues/1243 that sos report issue was fixed in May.
Can you please report your sos version?
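
(The sos version can be reported with, for example:)

  rpm -q sos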

Comment 35 Simone Tiraboschi 2018-10-15 06:44:09 UTC
(In reply to Nikolai Sednev from comment #31)
> In this very bug I was asked especially about restoring on vintage from node
> 0, if not supported, then there have to be some limitation within the code.

Yes, --restore-from-file conflicts with --noansible.

Comment 36 Sandro Bonazzola 2018-10-22 07:38:31 UTC
I think the original issue has been fixed, maybe we should close this and open a new one about the non-operational host issue.

Comment 37 Nikolai Sednev 2018-10-24 14:40:53 UTC
[ ERROR ] fatal: [localhost]: FAILED! => {"ansible_facts": {"ovirt_hosts": [{"address": "alma03.qa.lab.tlv.redhat.com", "affinity_labels": [], "auto_numa_status": "disable", "certificate": {"organization": "qa.lab.tlv.redhat.com", "subject": "O=qa.lab.tlv.redhat.com,CN=alma03.qa.lab.tlv.redhat.com"}, "cluster": {"href": "/ovirt-engine/api/clusters/cf113d58-d5fb-11e8-92ec-00163e7bb853", "id": "cf113d58-d5fb-11e8-92ec-00163e7bb853"}, "comment": "", "cpu": {"name": "Intel(R) Xeon(R) CPU E5-2603 v2 @ 1.80GHz", "speed": 1800.0, "topology": {"cores": 4, "sockets": 1, "threads": 1}}, "device_passthrough": {"enabled": true}, "devices": [], "external_network_provider_configurations": [], "external_status": "ok", "hardware_information": {"family": "Server", "manufacturer": "Dell Inc.", "product_name": "PowerEdge C6220 II", "serial_number": "7YDSW02", "supported_rng_sources": ["hwrng", "random"], "uuid": "4C4C4544-0059-4410-8053-B7C04F573032"}, "hooks": [], "href": "/ovirt-engine/api/hosts/0b0ef6c8-e2db-4454-a3fe-772c1a1aa5c4", "id": "0b0ef6c8-e2db-4454-a3fe-772c1a1aa5c4", "iscsi": {"initiator": "iqn.1994-05.com.redhat:7cf9a6c4224b"}, "katello_errata": [], "kdump_status": "disabled", "ksm": {"enabled": false}, "libvirt_version": {"build": 0, "full_version": "libvirt-4.5.0-10.el7_6.2", "major": 4, "minor": 5, "revision": 0}, "max_scheduling_memory": 15737028608, "memory": 33564917760, "name": "alma03.qa.lab.tlv.redhat.com", "network_attachments": [], "nics": [], "numa_nodes": [], "numa_supported": false, "os": {"custom_kernel_cmdline": "", "reported_kernel_cmdline": "BOOT_IMAGE=/vmlinuz-3.10.0-957.el7.x86_64 root=UUID=ed329962-5cd4-47d6-8c45-14b3eae4a610 ro nofb intel_iommu=on default_hugepagesz=1G hugepagesz=1G hugepages=4 hugepagesz=2M hugepages=1024 console=tty0 crashkernel=auto biosdevname=0s console=ttyS1,115200 LANG=en_US.UTF-8", "type": "RHEL", "version": {"full_version": "7.6 - 4.el7", "major": 7, "minor": 6}}, "permissions": [], "port": 54321, "power_management": {"automatic_pm_enabled": true, "enabled": true, "kdump_detection": true, "pm_proxies": [{"type": "cluster"}, {"type": "dc"}]}, "protocol": "stomp", "se_linux": {"mode": "enforcing"}, "spm": {"priority": 5, "status": "none"}, "ssh": {"fingerprint": "SHA256:ainP4gYFc2dburBuV2o4+6KF2qC8JSAUHiRHbMsVnZM", "port": 22}, "statistics": [], "status": "non_responsive", "storage_connection_extensions": [], "summary": {"active": 1, "migrating": 0, "total": 1}, "tags": [], "transparent_huge_pages": {"enabled": true}, "type": "rhel", "unmanaged_networks": [], "update_available": false, "version": {"build": 43, "full_version": "vdsm-4.20.43-1.el7ev", "major": 4, "minor": 20, "revision": 0}}]}, "attempts": 120, "changed": false, "deprecations": [{"msg": "The 'ovirt_hosts_facts' module is being renamed 'ovirt_host_facts'", "version": 2.8}]}



The issue in your case is that the hw_uuid column contains an uppercase string, while it is lowercase in vds_unique_id.

[root@nsednev-he-1 ~]# sudo -u postgres scl enable rh-postgresql95 -- psql -d engine -c "SELECT vds_name, hw_uuid, vds_id, vds_unique_id, host_name FROM vds"
could not change directory to "/root": Permission denied
            vds_name           |               hw_uuid                |                vds_id                |            vds_unique_id             |          host_name           
 ------------------------------+--------------------------------------+--------------------------------------+--------------------------------------+------------------------------
alma04.qa.lab.tlv.redhat.com | 4C4C4544-0059-4410-8054-B8C04F573032 | 0f8e942f-b00a-4f1b-b483-e4c43a2b4d37 | 4c4c4544-0059-4410-8054-b8c04f573032 | alma04.qa.lab.tlv.redhat.com
alma03.qa.lab.tlv.redhat.com | 4C4C4544-0059-4410-8053-B7C04F573032 | 0b0ef6c8-e2db-4454-a3fe-772c1a1aa5c4 | 4c4c4544-0059-4410-8053-b7c04f573032 | alma03.qa.lab.tlv.redhat.com

Moving back to assigned.
ovirt-hosted-engine-setup-2.2.29-1.el7ev.noarch
ovirt-hosted-engine-ha-2.2.18-1.el7ev.noarch
rhvm-appliance-4.2-20181018.0.el7.noarch

Further to Simone's comment, the fix for https://bugzilla.redhat.com/show_bug.cgi?id=1642440 (moving to vds_unique_id) will also cover this case.
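
(A quick way to spot this mismatch directly in the engine DB, reusing the query above; the lower() comparison is only illustrative:)

  sudo -u postgres scl enable rh-postgresql95 -- psql -d engine -c \
    "SELECT vds_name, hw_uuid, vds_unique_id, (lower(hw_uuid) = vds_unique_id) AS ids_match FROM vds"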

Comment 38 Nikolai Sednev 2018-10-24 14:43:04 UTC
[root@alma04 ~]# vdsm-client Host getHardwareInfo
 {
     "systemProductName": "PowerEdge C6220 II", 
     "systemUUID": "4C4C4544-0059-4410-8054-B8C04F573032", 
     "systemSerialNumber": "8YDTW02", 
     "systemFamily": "Server", 
     "systemManufacturer": "Dell Inc."
 }
VDSM reports it in uppercase.
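
(If needed, the lowercase form that ends up in vds_unique_id can be derived from the same output; a sketch:)

  vdsm-client Host getHardwareInfo | grep systemUUID | tr '[:upper:]' '[:lower:]'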

Comment 39 Nikolai Sednev 2018-10-24 15:16:20 UTC
When the uppercase string was manually changed to lowercase, the restore on the same storage also failed, since it is already in the engine's DB.
Currently there is a limitation that the restore must be made on a different mount point when using NFS or Gluster.

Comment 40 Nikolai Sednev 2018-10-24 15:45:48 UTC
OK, so the restore worked Node 0 to Node 0, NFS to NFS, but only when the capital letters were manually changed to lowercase (the fix for https://bugzilla.redhat.com/show_bug.cgi?id=1642440), and I also had to use a different (new) storage mount point, per the limitation I added in https://bugzilla.redhat.com/show_bug.cgi?id=1420604.
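
(Purely as an illustration of the manual workaround above, not a supported procedure: first locate the base table that actually stores hw_uuid, since vds is a view, and only then consider lowercasing it; no table name is assumed here:)

  sudo -u postgres scl enable rh-postgresql95 -- psql -d engine -c \
    "SELECT table_name FROM information_schema.columns WHERE column_name = 'hw_uuid'"
  # e.g. UPDATE <table_found_above> SET hw_uuid = lower(hw_uuid);   -- hypothetical; back up the DB first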

Comment 41 Nikolai Sednev 2018-10-29 12:07:49 UTC
Failed to restore using these components:
ovirt-hosted-engine-setup-2.2.30-1.el7ev.noarch
ovirt-hosted-engine-ha-2.2.18-1.el7ev.noarch
Linux 3.10.0-957.el7.x86_64 #1 SMP Thu Oct 4 20:48:51 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.6 (Maipo)

[ INFO  ] TASK [Wait for OVF_STORE disk content]
[ ERROR ] {u'_ansible_parsed': True, u'stderr_lines': [u'20+0 records in', u'20+0 records out', u'10240 bytes (10 kB) copied, 0.000131806 s, 77.7 MB/s', u'tar: bffc5624-14c2-4f66-9854-33e981604ca5.ovf: Not found in archive', u'tar: Exiting with failure status due to previous errors'], u'changed': True, u'end': u'2018-10-29 13:57:35.511616', u'_ansible_item_label': {u'image_id': u'2abdc4c0-d8d7-40e8-9123-b1eeeb43283b', u'name': u'OVF_STORE', u'id': u'8cf670a2-c573-4a62-9829-07359dd09b84'}, u'stdout': u'', u'failed': True, u'_ansible_item_result': True, u'msg': u'non-zero return code', u'rc': 2, u'start': u'2018-10-29 13:57:34.894457', u'attempts': 12, u'cmd': u"vdsm-client Image prepare storagepoolID=ebda3e8a-da8d-11e8-8a73-00163e7bb853 storagedomainID=c1078875-3b6d-45da-8ee5-6f0eb8efc52c imageID=8cf670a2-c573-4a62-9829-07359dd09b84 volumeID=2abdc4c0-d8d7-40e8-9123-b1eeeb43283b | grep path | awk '{ print $2 }' | xargs -I{} sudo -u vdsm dd if={} | tar -tvf - bffc5624-14c2-4f66-9854-33e981604ca5.ovf", u'item': {u'image_id': u'2abdc4c0-d8d7-40e8-9123-b1eeeb43283b', u'name': u'OVF_STORE', u'id': u'8cf670a2-c573-4a62-9829-07359dd09b84'}, u'delta': u'0:00:00.617159', u'invocation': {u'module_args': {u'warn': True, u'executable': None, u'_uses_shell': True, u'_raw_params': u"vdsm-client Image prepare storagepoolID=ebda3e8a-da8d-11e8-8a73-00163e7bb853 storagedomainID=c1078875-3b6d-45da-8ee5-6f0eb8efc52c imageID=8cf670a2-c573-4a62-9829-07359dd09b84 volumeID=2abdc4c0-d8d7-40e8-9123-b1eeeb43283b | grep path | awk '{ print $2 }' | xargs -I{} sudo -u vdsm dd if={} | tar -tvf - bffc5624-14c2-4f66-9854-33e981604ca5.ovf", u'removes': None, u'argv': None, u'creates': None, u'chdir': None, u'stdin': None}}, u'stdout_lines': [], u'stderr': u'20+0 records in\n20+0 records out\n10240 bytes (10 kB) copied, 0.000131806 s, 77.7 MB/s\ntar: bffc5624-14c2-4f66-9854-33e981604ca5.ovf: Not found in archive\ntar: Exiting with failure status due to previous errors', u'_ansible_no_log': False}
[ ERROR ] {u'_ansible_parsed': True, u'stderr_lines': [u'20+0 records in', u'20+0 records out', u'10240 bytes (10 kB) copied, 0.000111758 s, 91.6 MB/s', u'tar: bffc5624-14c2-4f66-9854-33e981604ca5.ovf: Not found in archive', u'tar: Exiting with failure status due to previous errors'], u'changed': True, u'end': u'2018-10-29 13:59:47.536466', u'_ansible_item_label': {u'image_id': u'abbe3c72-2f45-46db-8f11-750ae5053aa8', u'name': u'OVF_STORE', u'id': u'5e3bf7b6-26eb-4411-9635-2a476f899661'}, u'stdout': u'', u'failed': True, u'_ansible_item_result': True, u'msg': u'non-zero return code', u'rc': 2, u'start': u'2018-10-29 13:59:46.918694', u'attempts': 12, u'cmd': u"vdsm-client Image prepare storagepoolID=ebda3e8a-da8d-11e8-8a73-00163e7bb853 storagedomainID=c1078875-3b6d-45da-8ee5-6f0eb8efc52c imageID=5e3bf7b6-26eb-4411-9635-2a476f899661 volumeID=abbe3c72-2f45-46db-8f11-750ae5053aa8 | grep path | awk '{ print $2 }' | xargs -I{} sudo -u vdsm dd if={} | tar -tvf - bffc5624-14c2-4f66-9854-33e981604ca5.ovf", u'item': {u'image_id': u'abbe3c72-2f45-46db-8f11-750ae5053aa8', u'name': u'OVF_STORE', u'id': u'5e3bf7b6-26eb-4411-9635-2a476f899661'}, u'delta': u'0:00:00.617772', u'invocation': {u'module_args': {u'warn': True, u'executable': None, u'_uses_shell': True, u'_raw_params': u"vdsm-client Image prepare storagepoolID=ebda3e8a-da8d-11e8-8a73-00163e7bb853 storagedomainID=c1078875-3b6d-45da-8ee5-6f0eb8efc52c imageID=5e3bf7b6-26eb-4411-9635-2a476f899661 volumeID=abbe3c72-2f45-46db-8f11-750ae5053aa8 | grep path | awk '{ print $2 }' | xargs -I{} sudo -u vdsm dd if={} | tar -tvf - bffc5624-14c2-4f66-9854-33e981604ca5.ovf", u'removes': None, u'argv': None, u'creates': None, u'chdir': None, u'stdin': None}}, u'stdout_lines': [], u'stderr': u'20+0 records in\n20+0 records out\n10240 bytes (10 kB) copied, 0.000111758 s, 91.6 MB/s\ntar: bffc5624-14c2-4f66-9854-33e981604ca5.ovf: Not found in archive\ntar: Exiting with failure status due to previous errors', u'_ansible_no_log': False}
[ ERROR ] Failed to execute stage 'Closing up': Failed executing ansible-playbook
[ ERROR ] Hosted Engine deployment failed: please check the logs for the issue, fix accordingly or re-deploy from scratch.
          Log file is located at /var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-20181029132147-tph9ob.log

Comment 42 Nikolai Sednev 2018-10-29 12:30:22 UTC
Created attachment 1498545 [details]
hosted engine setup logs

Comment 43 Nikolai Sednev 2018-10-29 12:31:07 UTC
Created attachment 1498546 [details]
sosreport from alma04

Comment 44 Simone Tiraboschi 2018-10-29 12:35:13 UTC
(In reply to Nikolai Sednev from comment #41)
> Failed to restore using these components:
> ovirt-hosted-engine-setup-2.2.30-1.el7ev.noarch
> ovirt-hosted-engine-ha-2.2.18-1.el7ev.noarch
> Linux 3.10.0-957.el7.x86_64 #1 SMP Thu Oct 4 20:48:51 UTC 2018 x86_64 x86_64
> x86_64 GNU/Linux
> Red Hat Enterprise Linux Server release 7.6 (Maipo)
> 
> [ INFO  ] TASK [Wait for OVF_STORE disk content]

Here it seems that, for some reason, the engine never correctly uploaded the OVF_STORE content to the new hosted-engine storage domain.

Comment 45 Nikolai Sednev 2018-10-29 12:39:17 UTC
1.Deployed node0 over NFS.
2.Attached one ha-host.
3.Added one ISO domain.
4.Added one NFS data storage domain.
5.Created one guest VM on alma03 (SPM host).
6.Migrated the HE-VM to alma04 (non-SPM host).
7.Set the environment into global maintenance.
8.Retrieved backup files from the engine.
9.Reprovisioned alma04 (non-SPM).
10.Copied the backup file to the clean alma04.
11.Installed ovirt-hosted-engine-setup on alma04.
12.Tried to restore the engine during deployment using "hosted-engine --deploy --restore-from-file=/root/nsednev_from_alma04_rhevm_4_2_7_nospm" on alma04 with another NFS mount point.
13.Failed.

Comment 46 Simone Tiraboschi 2018-10-29 12:58:39 UTC
Created attachment 1498551 [details]
engine.log

Comment 47 Simone Tiraboschi 2018-10-29 13:08:51 UTC
Let's focus on the latest engine.log:

engine VM has been correctly created:
 2018-10-29 13:53:58,116+02 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-27) [] EVENT_ID: USER_ADD_VM(34), VM HostedEngine was created by admin@internal-authz.

and editing it correctly triggered an OVF update:
 2018-10-29 13:55:06,456+02 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-57) [2042de45-2539-4a2a-bd04-f4d4902f2a12] EVENT_ID: USER_UPDATE_VM(35), VM HostedEngine configuration was updated by admin@internal-authz.
 2018-10-29 13:55:06,460+02 INFO  [org.ovirt.engine.core.bll.storage.ovfstore.ProcessOvfUpdateForStoragePoolCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-92) [4bdcdb1] Before acquiring and wait lock 'EngineLock:{exclusiveLocks='[ebda3e8a-da8d-11e8-8a73-00163e7bb853=OVF_UPDATE]', sharedLocks=''}'
 2018-10-29 13:55:06,461+02 INFO  [org.ovirt.engine.core.bll.storage.ovfstore.ProcessOvfUpdateForStoragePoolCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-92) [4bdcdb1] Lock-wait acquired to object 'EngineLock:{exclusiveLocks='[ebda3e8a-da8d-11e8-8a73-00163e7bb853=OVF_UPDATE]', sharedLocks=''}'
 2018-10-29 13:55:06,462+02 INFO  [org.ovirt.engine.core.bll.storage.ovfstore.ProcessOvfUpdateForStoragePoolCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-92) [4bdcdb1] Running command: ProcessOvfUpdateForStoragePoolCommand internal: true. Entities affected :  ID: ebda3e8a-da8d-11e8-8a73-00163e7bb853 Type: StoragePool
 2018-10-29 13:55:06,465+02 INFO  [org.ovirt.engine.core.bll.UpdateVmCommand] (default task-57) [2042de45-2539-4a2a-bd04-f4d4902f2a12] Lock freed to object 'EngineLock:{exclusiveLocks='[HostedEngine=VM_NAME]', sharedLocks='[bffc5624-14c2-4f66-9854-33e981604ca5=VM]'}'
 2018-10-29 13:55:06,474+02 INFO  [org.ovirt.engine.core.bll.storage.ovfstore.ProcessOvfUpdateForStoragePoolCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-92) [4bdcdb1] Attempting to update VM OVFs in Data Center 'Default'
 2018-10-29 13:55:06,543+02 WARN  [org.ovirt.engine.core.vdsbroker.builder.vminfo.VmInfoBuildUtils] (EE-ManagedThreadFactory-engineScheduled-Thread-92) [4bdcdb1] No host NUMA nodes found for vm HostedEngine (bffc5624-14c2-4f66-9854-33e981604ca5)
 2018-10-29 13:55:06,590+02 INFO  [org.ovirt.engine.core.bll.storage.ovfstore.ProcessOvfUpdateForStoragePoolCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-92) [4bdcdb1] Successfully updated VM OVFs in Data Center 'Default'
 2018-10-29 13:55:06,590+02 INFO  [org.ovirt.engine.core.bll.storage.ovfstore.ProcessOvfUpdateForStoragePoolCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-92) [4bdcdb1] Attempting to update template OVFs in Data Center 'Default'
 2018-10-29 13:55:06,594+02 INFO  [org.ovirt.engine.core.bll.storage.ovfstore.ProcessOvfUpdateForStoragePoolCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-92) [4bdcdb1] Successfully updated templates OVFs in Data Center 'Default'
 2018-10-29 13:55:06,595+02 INFO  [org.ovirt.engine.core.bll.storage.ovfstore.ProcessOvfUpdateForStoragePoolCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-92) [4bdcdb1] Attempting to remove unneeded template/vm OVFs in Data Center 'Default'
 2018-10-29 13:55:06,615+02 INFO  [org.ovirt.engine.core.bll.storage.ovfstore.ProcessOvfUpdateForStoragePoolCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-92) [4bdcdb1] Successfully removed unneeded template/vm OVFs in Data Center 'Default'
 2018-10-29 13:55:06,616+02 INFO  [org.ovirt.engine.core.bll.storage.ovfstore.ProcessOvfUpdateForStoragePoolCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-92) [4bdcdb1] Lock freed to object 'EngineLock:{exclusiveLocks='[ebda3e8a-da8d-11e8-8a73-00163e7bb853=OVF_UPDATE]', sharedLocks=''}'
 2018-10-29 13:55:06,666+02 INFO  [org.ovirt.engine.core.bll.storage.ovfstore.ProcessOvfUpdateForStorageDomainCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-92) [609dcd76] Lock Acquired to object 'EngineLock:{exclusiveLocks='[bb2288b9-0303-483c-888b-2bc4c4c6a351=STORAGE]', sharedLocks='[ebda3e8a-da8d-11e8-8a73-00163e7bb853=OVF_UPDATE]'}'
 2018-10-29 13:55:06,673+02 WARN  [org.ovirt.engine.core.vdsbroker.builder.vminfo.VmInfoBuildUtils] (default task-57) [2042de45-2539-4a2a-bd04-f4d4902f2a12] No host NUMA nodes found for vm HostedEngine (bffc5624-14c2-4f66-9854-33e981604ca5)
 2018-10-29 13:55:06,733+02 INFO  [org.ovirt.engine.core.bll.storage.ovfstore.ProcessOvfUpdateForStorageDomainCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-92) [609dcd76] Running command: ProcessOvfUpdateForStorageDomainCommand internal: true. Entities affected :  ID: bb2288b9-0303-483c-888b-2bc4c4c6a351 Type: StorageAction group MANIPULATE_STORAGE_DOMAIN with role type ADMIN
 2018-10-29 13:55:06,772+02 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.SetVolumeDescriptionVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-92) [609dcd76] START, SetVolumeDescriptionVDSCommand( SetVolumeDescriptionVDSCommandParameters:{storagePoolId='ebda3e8a-da8d-11e8-8a73-00163e7bb853', ignoreFailoverLimit='false', storageDomainId='bb2288b9-0303-483c-888b-2bc4c4c6a351', imageGroupId='aede6f87-ed75-4f5b-92f8-776cc62bdf3e', imageId='994c1f99-77bc-4df9-a686-8c2ba197211f'}), log id: 532068ec
 2018-10-29 13:55:06,773+02 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.SetVolumeDescriptionVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-92) [609dcd76] -- executeIrsBrokerCommand: calling 'setVolumeDescription', parameters:
 2018-10-29 13:55:06,773+02 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.SetVolumeDescriptionVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-92) [609dcd76] ++ spUUID=ebda3e8a-da8d-11e8-8a73-00163e7bb853
 2018-10-29 13:55:06,773+02 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.SetVolumeDescriptionVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-92) [609dcd76] ++ sdUUID=bb2288b9-0303-483c-888b-2bc4c4c6a351
 2018-10-29 13:55:06,773+02 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.SetVolumeDescriptionVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-92) [609dcd76] ++ imageGroupGUID=aede6f87-ed75-4f5b-92f8-776cc62bdf3e
 2018-10-29 13:55:06,773+02 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.SetVolumeDescriptionVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-92) [609dcd76] ++ volUUID=994c1f99-77bc-4df9-a686-8c2ba197211f
 2018-10-29 13:55:06,773+02 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.SetVolumeDescriptionVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-92) [609dcd76] ++ description={"Updated":false,"Last Updated":"Sun Oct 28 12:12:24 IST 2018","Storage Domains":[{"uuid":"bb2288b9-0303-483c-888b-2bc4c4c6a351"}],"Disk Description":"OVF_STORE"}
 2018-10-29 13:55:06,816+02 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.SetVolumeDescriptionVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-92) [609dcd76] FINISH, SetVolumeDescriptionVDSCommand, log id: 532068ec


and the OVF_STORE content was uploaded through alma03, which was the SPM host (the user was redeploying on alma04):
 2018-10-29 13:55:06,958+02 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.UploadStreamVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-92) [609dcd76] START, UploadStreamVDSCommand(HostName = alma03.qa.lab.tlv.redhat.com, UploadStreamVDSCommandParameters:{hostId='f2897fa0-0372-4fdb-9088-475002bb2a91'}), log id: 478e0b1e
 2018-10-29 13:55:06,958+02 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.UploadStreamVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-92) [609dcd76] -- executeVdsBrokerCommand, parameters:
 2018-10-29 13:55:06,958+02 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.UploadStreamVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-92) [609dcd76] ++ spUUID=ebda3e8a-da8d-11e8-8a73-00163e7bb853
 2018-10-29 13:55:06,958+02 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.UploadStreamVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-92) [609dcd76] ++ sdUUID=bb2288b9-0303-483c-888b-2bc4c4c6a351
 2018-10-29 13:55:06,958+02 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.UploadStreamVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-92) [609dcd76] ++ imageGUID=aede6f87-ed75-4f5b-92f8-776cc62bdf3e
 2018-10-29 13:55:06,958+02 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.UploadStreamVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-92) [609dcd76] ++ volUUID=994c1f99-77bc-4df9-a686-8c2ba197211f
 2018-10-29 13:55:06,958+02 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.UploadStreamVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-92) [609dcd76] ++ size=30720
 2018-10-29 13:55:07,215+02 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.UploadStreamVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-92) [609dcd76] FINISH, UploadStreamVDSCommand, log id: 478e0b1e

Comment 48 Simone Tiraboschi 2018-10-29 13:31:15 UTC
Everything went as expected on alma03:
 2018-10-29 13:55:44,905+02 INFO  [org.ovirt.engine.core.bll.storage.ovfstore.UploadStreamCommand] (EE-ManagedThreadFactory-engine-Thread-261) [5811be50] Ending command 'org.ovirt.engine.core.bll.storage.ovfstore.UploadStreamCommand' successfully.
 2018-10-29 13:55:44,913+02 INFO  [org.ovirt.engine.core.bll.storage.ovfstore.UploadStreamCommand] (EE-ManagedThreadFactory-engine-Thread-262) [5811be50] Ending command 'org.ovirt.engine.core.bll.storage.ovfstore.UploadStreamCommand' successfully.
 2018-10-29 13:55:44,924+02 INFO  [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (EE-ManagedThreadFactory-engine-Thread-261) [5811be50] CommandAsyncTask::HandleEndActionResult [within thread]: endAction for action type 'UploadStream' completed, handling the result.
 2018-10-29 13:55:44,924+02 INFO  [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (EE-ManagedThreadFactory-engine-Thread-261) [5811be50] CommandAsyncTask::HandleEndActionResult [within thread]: endAction for action type 'UploadStream' succeeded, clearing tasks.
 2018-10-29 13:55:44,924+02 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engine-Thread-261) [5811be50] SPMAsyncTask::ClearAsyncTask: Attempting to clear task '920dafb3-98eb-43d8-9683-81c59117064c'


but then, if we check the content of the OVF_STORE volume on the host side:
[root@alma04 ~]# vdsm-client Image prepare storagepoolID=ebda3e8a-da8d-11e8-8a73-00163e7bb853 storagedomainID=c1078875-3b6d-45da-8ee5-6f0eb8efc52c imageID=5e3bf7b6-26eb-4411-9635-2a476f899661 volumeID=abbe3c72-2f45-46db-8f11-750ae5053aa8 | grep path | awk '{ print $2 }' | xargs -I{} sudo -u vdsm dd if={} | tar -xvf - 
20+0 records in
20+0 records out
10240 bytes (10 kB) copied, 0,000141984 s, 72,1 MB/s
info.json
metadata.json
[root@alma04 ~]# cat info.json
{
  "Last Updated" : "Mon Oct 29 13:55:37 IST 2018",
  "Storage Domains" : [ {
    "uuid" : "c1078875-3b6d-45da-8ee5-6f0eb8efc52c"
  } ]
}[root@alma04 ~]# cat metadata.json
{
  "VM Status" : {
    "f51ec4cd-5ab2-4a58-b13e-15314c8b296d" : 1
  }
}
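
(For comparison, a correctly updated OVF_STORE should contain the engine VM's OVF, bffc5624-14c2-4f66-9854-33e981604ca5.ovf in this environment, next to info.json and metadata.json; the same pipeline can list the archive without extracting it:)

  vdsm-client Image prepare storagepoolID=ebda3e8a-da8d-11e8-8a73-00163e7bb853 storagedomainID=c1078875-3b6d-45da-8ee5-6f0eb8efc52c imageID=5e3bf7b6-26eb-4411-9635-2a476f899661 volumeID=abbe3c72-2f45-46db-8f11-750ae5053aa8 | grep path | awk '{ print $2 }' | xargs -I{} sudo -u vdsm dd if={} | tar -tvf -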

Comment 49 Simone Tiraboschi 2018-10-29 13:35:53 UTC
This seems like an engine bug to me: the engine created an OVF_STORE without the VM OVF; I'm not sure if it is reproducible.

Can you please simply try another time on the same host with the same backup file?

Comment 50 Nikolai Sednev 2018-10-29 15:25:00 UTC
Same error, also not working.

Comment 51 Simone Tiraboschi 2018-10-29 15:48:51 UTC
I tried reproducing as well on Nikolai's env with exactly the same code and exactly the same backup file and it worked as expected for me.

I fear that we have some kind of barely reproducible bug on the engine side that "sometimes" forgets to update the OVF_STORE disk.
I think it's the same issue that sometimes causes https://bugzilla.redhat.com/show_bug.cgi?id=1628836

Comment 52 Simone Tiraboschi 2018-10-29 15:51:48 UTC
Created attachment 1498613 [details]
Successful execution on Nikolai's env

Comment 53 Nikolai Sednev 2018-10-31 14:26:52 UTC
It worked just fine for me from NFS to iSCSI and also from NFS to NFS, when restoring on a clean host.
Tested on these components:
ovirt-hosted-engine-setup-2.2.30-1.el7ev.noarch
ovirt-hosted-engine-ha-2.2.18-1.el7ev.noarch
rhvm-appliance-4.2-20181026.1.el7.noarch.
Moving to verified.

Previously found issues will be tracked in https://bugzilla.redhat.com/show_bug.cgi?id=1644748.

Comment 54 Sandro Bonazzola 2018-11-13 16:12:30 UTC
This bugzilla is included in oVirt 4.2.7 Async 1 release, published on November 13th 2018.

Since the problem described in this bug report should be resolved in oVirt 4.2.7 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

