Bug 1328718 - Hosted storage OVF_STORE failed to update and created a vm.conf with 'None'
Summary: Hosted storage OVF_STORE failed to update and created a vm.conf with 'None'
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: General
Version: 3.6.4
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ovirt-3.6.8
Target Release: ---
Assignee: Tal Nisan
QA Contact: Aharon Canan
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-04-20 07:15 UTC by Paul
Modified: 2017-03-01 15:00 UTC (History)
10 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-06-20 11:23:37 UTC
oVirt Team: Storage
Embargoed:
amureini: ovirt-3.6.z?
rule-engine: planning_ack?
rule-engine: devel_ack?
rule-engine: testing_ack?


Attachments
zipped vdsm and agent logs (6.36 MB, application/x-gzip)
2016-04-21 09:43 UTC, Paul

Description Paul 2016-04-20 07:15:13 UTC
==Description of problem:==

I get the following errors in my event log: 

Failed to update OVF disks 18c50ea6-4654-4525-b241-09e15acf5e99, OVF data isn't updated on those OVF stores (Data Center Default, Storage Domain hostedengine_nfs).

VDSM command failed: Could not acquire resource. Probably resource factory threw an exception.: ()

http://screencast.com/t/S8cfXMsdGM 

When I check on file there is some data, but not updated: http://screencast.com/t/hbXQFlou

I believe my problem might be related to this bug https://bugzilla.redhat.com/show_bug.cgi?id=1303316

As you can see in the screenshot, the hosted engine storage is unassigned, so both OVF_STOREs show as OK but are not linked and therefore can't be updated.


== Version-Release number of selected component (if applicable): ==

vdsm-4.17.23.2-1.el7.noarch
ovirt-hosted-engine-setup-1.3.4.0-1.el7.centos.noarch
ovirt-engine-sdk-python-3.6.3.0-1.el7.centos.noarch
ovirt-release36-007-1.noarch
ovirt-vmconsole-host-1.0.0-1.el7.centos.noarch
ovirt-vmconsole-proxy-1.0.0-1.el7.centos.noarch
ovirt-setup-lib-1.0.1-1.el7.centos.noarch
ovirt-hosted-engine-ha-1.3.5.1-1.el7.centos.noarch
ovirt-vmconsole-1.0.0-1.el7.centos.noarch
ovirt-host-deploy-1.4.1-1.el7.centos.noarch



==How reproducible:==

The error is triggered every hour.
It started after I upgraded from 3.5.
I had problems with the auto-import and had to remove the hosted storage (maybe because my storage is named hostedengine_nfs instead of hosted_storage?).


=== Actual results: ===

My hosted storage OVF disks ( http://screencast.com/t/AcdqmJWee ) are not being updated (as I understand it, they should be updated regularly).

So I wondered: maybe I can remove these OVF disks and they will be recreated automatically? (Similar to how the hosted storage domain was added back automatically after I removed it.)

 ( And for this NFS storage domain, is it normal to have 2 OVF disks? )

I removed OVF_STORE.

=== Actual results after OVF removal ===

I removed the OVF disks from the hosted engine storage as explained.
I started another server and tried several things, like putting it into maintenance and reinstalling, but I keep getting:

Apr 20 00:18:00 geisha-3 ovirt-ha-agent: WARNING:ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore:Unable to find OVF_STORE
Apr 20 00:18:00 geisha-3 journal: ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config ERROR Unable to get vm.conf from OVF_STORE, falling back to initial vm.conf
Apr 20 00:18:00 geisha-3 ovirt-ha-agent: ERROR:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config:Unable to get vm.conf from OVF_STORE, falling back to initial vm.conf
Apr 20 00:18:00 geisha-3 journal: ovirt-ha-agent ovirt_hosted_engine_ha.agent.agent.Agent ERROR Error: ''Configuration value not found: file=/var/run/ovirt-hosted-engine-ha/vm.conf, key=memSize'' - trying to restart agent

The fact that it can't find the OVF store seems logical, but now /var/run/ovirt-hosted-engine-ha/vm.conf is replaced with a file containing only "None".
I tried to set the file read-only (chown root), but this only threw an error about the file not being writable; I tried a different path, but nothing helped.
So I am afraid to touch the other running hosts, as the same might happen there and I would be unable to start the hosted engine again.

I thought the OVF would be created automatically again if it is missing, but it isn't...
Can I trigger recreation of this OVF, or add it manually somehow? Would deleting the whole hosted_storage trigger an auto-import again, including the OVF?

If this provides no solution, I guess I have to restore the removed OVF store. Would a complete database restore plus restoring the folder images/<OVF_STORE_ID> be sufficient?
Or where is the information about the OVF stores that the Web GUI shows stored?

Looking forward to resolving this OVF store issue.

== Expected results: == 

I would have expected the OVF_STORE to be able to update. I checked file permissions and all looked good (see the screenshots above).

After OVF_STORE removal I would have expected the OVF_STORE to be recreated.
Instead, as a result of the missing OVF_STORE, a vm.conf is written containing only the word 'None', making the hosted engine on this rebooted host fail.


Additional info:

Most of the above was discussed in https://www.mail-archive.com/users@ovirt.org/msg31846.html.


Any additional info can be supplied ( if available )

Comment 1 Paul 2016-04-20 15:06:08 UTC
Looks like the system does recreate the OVF :-) 
Too bad this failed again...

http://screencast.com/t/RlYCR1rk8T
http://screencast.com/t/CpcQuoKg

Failed to create OVF store disk for Storage Domain hostedengine_nfs.
The Disk with the id b6f34661-8701-4f82-a07c-ed7faab4a1b8 might be removed manually for automatic attempt to create new one. 
OVF updates won't be attempted on the created disk.

And on the hosted storage disk tab : http://screencast.com/t/ZmwjsGoQ1Xbp

Comment 2 Allon Mureinik 2016-04-21 09:22:47 UTC
Can you please attach engine and vdsm's logs?

Comment 3 Paul 2016-04-21 09:43:16 UTC
Created attachment 1149395 [details]
zipped vdsm and agent logs

Files are from the SPM host, not from the host running the hosted engine.

Comment 4 Red Hat Bugzilla Rules Engine 2016-04-21 12:14:25 UTC
Bug tickets must have version flags set prior to targeting them to a release. Please ask maintainer to set the correct version flags and only then set the target milestone.

Comment 5 Amit Aviram 2016-04-24 09:22:22 UTC
So what is the problem we want to address here? On the one hand, we have OVF files which are not being updated, which is one bug. On the other hand, we need to add functionality for recreating missing OVF files, which is another bug.

Allon, I think it would be best to open a new bug for handling the missing OVF and make this bug only about the initial problem described, which is the OVF files not being updated.

What do you think?

Comment 6 Amit Aviram 2016-04-25 08:47:20 UTC
Paul, thanks for the analysis.

It seems that the domain that contains the OVF files lost its pool in its metadata at some earlier point in time, and the problem is not related to the OVF files themselves: when trying to use these files, getting the pool using "dom.getPools()[0]" throws a "list index out of range" error, which fails the process.

You can find this error throughout the log in some other flows as well.

Can you please add all of your VDSM logs, so we can find out what caused that?

Can you please provide info regarding the storage you used, and the general setup? If the upgrade from 3.5 really caused this, it might be related to a specific storage type.

Thanks!

Comment 7 Paul 2016-04-25 09:01:03 UTC
Hi Amit,

I had serious problems upgrading to 3.6 as my hosts ran EL6.
So I first reinstalled my hosts and with migrating / reinstalling hosts I was able to get all into 3.6.

Simone Tiraboschi told me I could try removing the PoolID (not sure this is the same pool ID we are talking about). See this discussion here: 
http://lists.ovirt.org/pipermail/users/2016-February/037814.html

My hosted storage runs on NFS.

I will try to provide all VDSM logs (this will be quite a huge amount) this evening. I'll send them through wetransfer.

Is there a way to set or correct the pool manually?

Comment 8 Amit Aviram 2016-04-25 10:07:59 UTC
needinfo for Allon was dropped

Comment 9 Amit Aviram 2016-04-25 10:44:58 UTC
(In reply to Paul from comment #7)
> Hi Amit,
> 
> I had serious problems upgrading to 3.6 as my hosts ran EL6.
> So I first reinstalled my hosts and with migrating / reinstalling hosts I
> was able to get all into 3.6.
> 
> Simone Tiraboschi told me I could try remove the PoolID ( not sure this is
> the same pool ID we are talking about ) See this discussion here: 
> http://lists.ovirt.org/pipermail/users/2016-February/037814.html
> 
> My hosted storage runs on NFS.
> 
> I will try to provide all vdsm logs ( this will be quite a huge amount )
> this evening. I'll send it through wetransfer.
> 
> Is there a way to set or correct the pool manually?

Well, the metadata is saved as a file on your storage, so let's see if it has the right values. Looking at the logs, it seems that this domain's poolID is 
"00000002-0002-0002-0002-000000000385", so see if the following file exists:

"/rhev/data-center/00000002-0002-0002-0002-000000000385/88b69eba-ef4f-4dbe-ba53-20dadd424d0e/dom_md/metadata"

(I might be wrong about the exact path, but you can try to locate the domain UUID and easily find the right path)

If it does, verify that the value of "POOL_UUID" is actually "00000002-0002-0002-0002-000000000385", which is the pool that contains it.

If it is not, you can try putting this SD down, shutting down VDSM, changing the value manually, and then restarting VDSM. IIUC, that should resolve the issue.


Please let us know the results
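
A minimal sketch of how that check might look on the host (the path and pool UUID are copied from this comment; the exact mount point may differ on your setup, so verify before editing anything):

# check the pool UUID recorded in the domain metadata
grep POOL_UUID /rhev/data-center/00000002-0002-0002-0002-000000000385/88b69eba-ef4f-4dbe-ba53-20dadd424d0e/dom_md/metadata
# expected: POOL_UUID=00000002-0002-0002-0002-000000000385
# if the value is wrong: put the domain in maintenance, stop VDSM, edit the file, start VDSM again
systemctl stop vdsmd
vi /rhev/data-center/00000002-0002-0002-0002-000000000385/88b69eba-ef4f-4dbe-ba53-20dadd424d0e/dom_md/metadata
systemctl start vdsmd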

Comment 10 Paul 2016-04-25 11:01:23 UTC
Alright, I found the metadata file and updated it along with the checksum.

Should  /etc/ovirt-hosted-engine/hosted-engine.conf also contain this poolID?
=> spUUID=00000000-0000-0000-0000-000000000000
=> spUUID=00000002-0002-0002-0002-000000000385

I haven't seen any changes, but will try migrate away and restart some service later tonight and let you know.

Do you still need the complete vdsmd logs?

Comment 11 Paul 2016-04-25 11:08:03 UTC
Looks like after I added the spUUID= entry to the hosted-engine.conf the score dropped from 3400 to 2400 and hosted-engine powered down.

Comment 12 Amit Aviram 2016-04-25 11:25:47 UTC
(In reply to Paul from comment #10)
> Allright, found the metadata file and updated with the checksum too.
> 
> Should  /etc/ovirt-hosted-engine/hosted-engine.conf also contain this poolID?
> => spUUID=00000000-0000-0000-0000-000000000000
> => spUUID=00000002-0002-0002-0002-000000000385

I'm not familiar with HE. Tiraboschi, can you please help us here? Does "hosted-engine.conf" need to have info regarding this pool?

> 
> I haven't seen any changes, but will try migrate away and restart some
> service later tonight and let you know.

VDSM must be restarted for this change to take effect.

> 
> Do you still need the complete vdsmd logs?

That's fine. I'm quite positive that a wrong pool ID is what's wrong here, so you don't need to add further logs.

Comment 13 Paul 2016-04-26 07:03:16 UTC
I restarted vdsmd and there are no strange errors, but I believe the hosted storage OVF is still not updated.

1. The images in /rhev/data-center/mnt/hostedstorage..../88b69eba-ef4f-4dbe-ba53-20dadd424d0e/images/<OVF_FILE_ID> still have an old timestamp.

2. In the web GUI the images (and disk) show as follows: http://screencast.com/t/xagPevn2kJ

3. I also updated the POOL_UUID in the meta files of the OVF images which are not updated: 
http://screencast.com/t/DvpfnvVR

4. The vdsmd log has no 'ERROR' entries; the only WARN entries are:

Thread-217972::WARNING::2016-04-26 08:55:59,883::fileUtils::152::Storage.fileUtils::(createdir) Dir /var/run/vdsm/storage/88b69eba-ef4f-4dbe-ba53-20dadd424d0e already exists
Thread-217974::WARNING::2016-04-26 08:55:59,934::fileUtils::152::Storage.fileUtils::(createdir) Dir /var/run/vdsm/storage/88b69eba-ef4f-4dbe-ba53-20dadd424d0e already exists
jsonrpc.Executor/2::DEBUG::2016-04-26 08:55:59,964::lvm::290::Storage.Misc.excCmd::(cmd) SUCCESS: <err> = '  WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!\n'; <rc> = 0
periodic/26::WARNING::2016-04-26 08:56:01,669::periodic::261::virt.periodic.VmDispatcher::(__call__) could not run <class 'virt.periodic.DriveWatermarkMonitor'> on [u'78097a02-f4e7-4d28-bcfc-f011cdb48898']


The last entry, DriveWatermarkMonitor, is related to something else I believe (serial console?!), and the other warning is just about a directory that already exists.

5. agent.log has no errors (except errors I caused myself by restarting vdsmd).

Are there any other entries I might be missing, or do I need to add the pool UUID to hosted-engine.conf too?

Comment 14 Simone Tiraboschi 2016-04-26 07:59:11 UTC
(In reply to Amit Aviram from comment #12)
> (In reply to Paul from comment #10)
> > Allright, found the metadata file and updated with the checksum too.
> > 
> > Should  /etc/ovirt-hosted-engine/hosted-engine.conf also contain this poolID?
> > => spUUID=00000000-0000-0000-0000-000000000000
> > => spUUID=00000002-0002-0002-0002-000000000385
> 
> I'm not familiar with HE- tiraboshi, can you please help us here? does
> "hosted-engine.conf" needs to have info regarding this pool?

No,
it's there just for compatibility reasons.
On a 3.6 host it should be:
 spUUID=00000000-0000-0000-0000-000000000000

If you see locking errors creating the OVF_STORE volumes, can you please check this one: https://bugzilla.redhat.com/1322849 ?
The hosted-engine host_id in hosted-engine.conf and the spm_id in the engine DB are not really kept in sync (they happen to be in sync by chance only if you added the second hosted-engine host as your second host...), and this can cause locking issues.

Can you please grep for host_id in /etc/ovirt-hosted-engine/hosted-engine.conf on each host and run this query on the engine DB?
SELECT vds_spm_id_map.storage_pool_id,vds_spm_id_map.vds_spm_id,vds_spm_id_map.vds_id,vds.vds_name FROM vds_spm_id_map, vds WHERE vds_spm_id_map.vds_id = vds.vds_id;
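
A hedged sketch of how those two checks could be run (this assumes the default "engine" database name on the engine machine; adjust if yours differs):

# on each host: the HA host id used by hosted-engine
grep host_id /etc/ovirt-hosted-engine/hosted-engine.conf
# on the engine machine: the SPM ids recorded by the engine
sudo -u postgres psql engine -c "SELECT vds_spm_id_map.storage_pool_id, vds_spm_id_map.vds_spm_id, vds_spm_id_map.vds_id, vds.vds_name FROM vds_spm_id_map, vds WHERE vds_spm_id_map.vds_id = vds.vds_id;"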

Comment 15 Simone Tiraboschi 2016-04-26 08:03:28 UTC
(In reply to Paul from comment #11)
> Looks like after I added the spUUID= entry to the hosted-engine.conf the
> score dropped from 3400 to 2400 and hosted-engine powered down.

This was not a good idea: that value is there just for compatibility reasons (until 3.5 the hosted-engine storage domain was in a storage pool by itself). Setting it back to a real value will just cause the 3.5 -> 3.6 migration to trigger again, with possibly bad results. Please revert back to spUUID=00000000-0000-0000-0000-000000000000
and restart the involved services; your host should reach 3400 points again.
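
A minimal sketch of that revert, assuming systemd service names as on an EL7 host (one possible way to apply the advice above, not an official procedure):

# restore the compatibility value and restart the HA services on the host
sed -i 's/^spUUID=.*/spUUID=00000000-0000-0000-0000-000000000000/' /etc/ovirt-hosted-engine/hosted-engine.conf
systemctl restart ovirt-ha-broker ovirt-ha-agent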

Comment 16 Paul 2016-04-26 08:08:38 UTC
(In reply to Simone Tiraboschi from comment #14)
> (In reply to Amit Aviram from comment #12)
> > (In reply to Paul from comment #10)
> > > Allright, found the metadata file and updated with the checksum too.
> > > 
> > > Should  /etc/ovirt-hosted-engine/hosted-engine.conf also contain this poolID?
> > > => spUUID=00000000-0000-0000-0000-000000000000
> > > => spUUID=00000002-0002-0002-0002-000000000385
> > 
> > I'm not familiar with HE- tiraboshi, can you please help us here? does
> > "hosted-engine.conf" needs to have info regarding this pool?
> 
> No,
> it's there just for compatibility issue.
> On 3.6 host it should be:
>  spUUID=00000000-0000-0000-0000-000000000000
> 
> If you see locking errors creating the OVF_STORE volumes, can you please
> check this one https://bugzilla.redhat.com/1322849 ?
> hosted-engine host-id in hosted-engine.conf and the spm_id in the engine DB
> are not really in sync (they are in sync by chance just in case you added
> the second hosted-engine host as your second host...) and this can cause
> locking issue.
> 
> Can you please grep for host_id in
> /etc/ovirt-hosted-engine/hosted-engine.conf on each host and run this query
> on the engine DB?
> SELECT
> vds_spm_id_map.storage_pool_id,vds_spm_id_map.vds_spm_id,vds_spm_id_map.
> vds_id,vds.vds_name FROM vds_spm_id_map, vds WHERE vds_spm_id_map.vds_id =
> vds.vds_id;

I will check this tonight and let you know. I hope this solves the OVF disks not updating.

Comment 17 Paul 2016-04-26 18:37:28 UTC
(In reply to Simone Tiraboschi from comment #14)
> (In reply to Amit Aviram from comment #12)
> > (In reply to Paul from comment #10)
> > > Allright, found the metadata file and updated with the checksum too.
> > > 
> > > Should  /etc/ovirt-hosted-engine/hosted-engine.conf also contain this poolID?
> > > => spUUID=00000000-0000-0000-0000-000000000000
> > > => spUUID=00000002-0002-0002-0002-000000000385
> > 
> > I'm not familiar with HE- tiraboshi, can you please help us here? does
> > "hosted-engine.conf" needs to have info regarding this pool?
> 
> No,
> it's there just for compatibility issue.
> On 3.6 host it should be:
>  spUUID=00000000-0000-0000-0000-000000000000
> 
> If you see locking errors creating the OVF_STORE volumes, can you please
> check this one https://bugzilla.redhat.com/1322849 ?
> hosted-engine host-id in hosted-engine.conf and the spm_id in the engine DB
> are not really in sync (they are in sync by chance just in case you added
> the second hosted-engine host as your second host...) and this can cause
> locking issue.

I checked and the IDs did not match. (It might also explain why I had errors about the storage domain still being active while in maintenance.)
I have updated the IDs in the database to reflect the IDs in hosted-engine.conf.
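
For reference, a hypothetical sketch of that kind of change (the UUID and the id below are placeholders rather than values from this report, and the default "engine" database name is assumed; match vds_spm_id to the host_id from that host's hosted-engine.conf):

# placeholder values only -- substitute your own vds_id and host_id
sudo -u postgres psql engine -c "UPDATE vds_spm_id_map SET vds_spm_id = 1 WHERE vds_id = 'aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee';"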

> 
> Can you please grep for host_id in
> /etc/ovirt-hosted-engine/hosted-engine.conf on each host and run this query
> on the engine DB?
> SELECT
> vds_spm_id_map.storage_pool_id,vds_spm_id_map.vds_spm_id,vds_spm_id_map.
> vds_id,vds.vds_name FROM vds_spm_id_map, vds WHERE vds_spm_id_map.vds_id =
> vds.vds_id;

I will check tomorrow morning if there was some daily cron that triggered the update.

Thanks for this update!

Comment 18 Paul 2016-04-28 08:29:51 UTC
I see the OVF image files are still not updated.

Maybe it is also related to the hosted storage disk not being correctly linked?
http://screencast.com/t/8XUlvb9O

Logs just seem alright (no errors or warnings in agent.log):

Apr 28 10:22:18 geisha-1 ovirt-ha-broker: INFO:engine_health.CpuLoadNoEngine:VM is up on this host with healthy engine
Apr 28 10:22:19 geisha-1 ovirt-ha-agent: INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Initializing VDSM
Apr 28 10:22:19 geisha-1 ovirt-ha-agent: INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Connecting the storage
Apr 28 10:22:19 geisha-1 ovirt-ha-agent: INFO:ovirt_hosted_engine_ha.lib.storage_server.StorageServer:Connecting storage server
Apr 28 10:22:19 geisha-1 ovirt-ha-agent: INFO:ovirt_hosted_engine_ha.lib.storage_server.StorageServer:Connecting storage server
Apr 28 10:22:19 geisha-1 ovirt-ha-agent: INFO:ovirt_hosted_engine_ha.lib.storage_server.StorageServer:Refreshing the storage domain
Apr 28 10:22:19 geisha-1 ovirt-ha-agent: INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Preparing images
Apr 28 10:22:19 geisha-1 ovirt-ha-agent: INFO:ovirt_hosted_engine_ha.lib.image.Image:Preparing images
Apr 28 10:22:19 geisha-1 ovirt-ha-agent: INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Reloading vm.conf from the shared storage domain
Apr 28 10:22:19 geisha-1 ovirt-ha-agent: INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config:Trying to get a fresher copy of vm configuration from the OVF_STORE
Apr 28 10:22:20 geisha-1 ovirt-ha-broker: INFO:mgmt_bridge.MgmtBridge:Found bridge ovirtmgmt with ports
Apr 28 10:22:20 geisha-1 ovirt-ha-agent: INFO:ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore:Found OVF_STORE: imgUUID:a109befe-4b59-4447-846d-08d87c324f63, volUUID:39c5b202-feae-409f-8ff9-261c6e561a61
Apr 28 10:22:21 geisha-1 ovirt-ha-agent: INFO:ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore:Found OVF_STORE: imgUUID:055a6a70-a587-48ee-8367-f449ae2cc5ff, volUUID:5617f4ff-8cff-4d1f-882d-9fc801eeeaa0
Apr 28 10:22:21 geisha-1 ovirt-ha-agent: INFO:ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore:Extracting Engine VM OVF from the OVF_STORE
Apr 28 10:22:21 geisha-1 ovirt-ha-agent: INFO:ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore:OVF_STORE volume path: /rhev/data-center/mnt/hostedstorage.pazion.nl:_opt_hosted-engine/88b69eba-ef4f-4dbe-ba53-20dadd424d0e/images/055a6a70-a587-48ee-8367-f449ae2cc5ff/5617f4ff-8cff-4d1f-882d-9fc801eeeaa0
Apr 28 10:22:21 geisha-1 ovirt-ha-agent: INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config:Found an OVF for HE VM, trying to convert
Apr 28 10:22:21 geisha-1 ovirt-ha-agent: INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config:Got vm.conf from OVF_STORE
Apr 28 10:22:21 geisha-1 ovirt-ha-agent: INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Current state EngineUp (score: 3400)

I did notice in /var/log/messages:

Apr 26 23:55:33 geisha-1 kernel: dd: sending ioctl 80306d02 to a partition!
Apr 26 23:55:33 geisha-1 kernel: dd: sending ioctl 80306d02 to a partition!
Apr 26 23:55:33 geisha-1 kernel: dd: sending ioctl 80306d02 to a partition!

They seem related to migrating VMs or restarting the vdsmd / ovirt-ha-agent services.
Could the OVF update somehow be trying to update a wrong mount / entry point?

Comment 19 Amit Aviram 2016-05-08 06:57:49 UTC
stirabos, looks like this is out of the Storage team's scope; do you think we should move it to your team?

Comment 20 Simone Tiraboschi 2016-05-09 08:18:53 UTC
I think SLA, since the OVF_STORE volumes are not created by hosted-engine-setup or hosted-engine-ha but by the engine, as for regular hosts.

Comment 21 Tal Nisan 2016-05-09 09:02:32 UTC
Roy, can someone from your team have a look?

Comment 22 Roy Golan 2016-05-09 11:35:37 UTC
Can you please supply a summary of the findings so far if you figured out this is not a storage issue?

Comment 23 Roy Golan 2016-05-17 21:55:28 UTC
I don't see what is SLA-related here if there is a sync problem with the OVFs.

Most probably this setup suffered from bugs in the early versions, so the migration wasn't smooth. The least you can do is help with a workaround.

Comment 24 Amit Aviram 2016-05-18 11:13:13 UTC
Sorry for the delay; I will reply as soon as the feature freeze passes. We are occupied with urgent feature issues.

Comment 25 Amit Aviram 2016-05-19 09:22:42 UTC
This bug needs to be split into 2:

The bug's summary talks about the fact that missing OVFs are not recreated, which is something we need to decide whether we plan to fix. 

But that is just a consequence of manual changes the user made after their environment got messed up during the upgrade, which is the real bug here.

The first issue should be discussed in another thread, and this bug should address the problems that emerged from actual use of the system.

Paul, can you please give a summary of what happened before you deleted the OVF files? You tried to upgrade an HE environment, and after doing that the system lost sync with those files?

Comment 26 Paul 2016-05-19 09:38:39 UTC
I had problems upgrading, as my hosted-engine was on 3.6, my hosts were on EL6, and VDSM could not be updated. So I reinstalled my hosts with EL7 and, using a different cluster, I managed to get the hosts linked to the same environment.

All looked fine, but my hosted_storage did not import completely; it was shown, but locked. I removed the storage and it was automatically recreated, but again locked.

After an update to 3.6.1 or 3.6.2? (around the 1st of March) all of a sudden hosted_storage (on NFS) was imported. I guess this was because my hosted_storage had a different name (hostedengine_nfs), but anyway I got a step further :)

By this time I started receiving the following error: 
http://screencast.com/t/S8cfXMsdGM
The dates of the 2 OVF image folders and files on the filesystem were never updated. At this point I deleted the OVF files.

I tried different things to let the system recreate the OVF, like changing the IDs as suggested. I also changed the status to OK, and it looks like this is not updated to unknown or error either. http://screencast.com/t/vCx0CQiXm

I do not have any errors in the event log about all this at the moment, but I am uncertain about the state of my hosted storage, especially since I plan to upgrade to 4.0 later this year and don't know what will break.

Comment 27 Amit Aviram 2016-05-19 13:53:55 UTC
(In reply to Paul from comment #26)
> I had problems upgrading as my hosted-engine was on 3.6 and my hosts on EL6
> and vdsm could not be updated. So I reinstalled my hosts to EL7 and with
> different cluster I managed to get hosts linked to the same environment.
> 
> All looked fine, but my hosted_storage did not import completely, it was
> shown, but locked. I removed the storage and it was automatically recreated,
> but again locked.

So this is the real bug here: the hosted_storage that was created with hosts running EL6, then upgraded to EL7 with its cluster changed, could not be imported correctly.

Paul, I know it's a long shot- but do you happen to have the logs from the time you changed the cluster etc.?

> After update to 3.6.1 or 3.6.2? ( around 1st of March ) all of a sudden
> hosted_storage ( on NFS ) was imported. I guess this was because my
> hosted_storage had a different name ( hostedengine_nfs ), but anyway I got a
> step further :)
>
> By this time I started receiveing following error 
> http://screencast.com/t/S8cfXMsdGM
> The date of the 2 OVF image folders and files on filesystem were never
> updated. By this time I deleted the OVF files.
> 
> I tried different thing to let the system recreate the OVF like changing IDs
> as suggested. Also changed the status to OK, and it looks like this is not
> updated to uknown or error either. http://screencast.com/t/vCx0CQiXm
> 
> I do not have any errors in the event log about this all on the moment, but
> I am uncertain about the state of my hosted storage, especially when I plan
> to upgrade to 4.0 later this year and what will break.

We can try a workaround to solve your problem. IIUC, your hosted_storage is down, so right-click on it and choose "Destroy". This will remove the domain from oVirt, but won't delete its content.
After doing that, import the domain back to oVirt.

Comment 28 Paul 2016-05-19 14:31:27 UTC
(In reply to Amit Aviram from comment #27)
> (In reply to Paul from comment #26)
> > I had problems upgrading as my hosted-engine was on 3.6 and my hosts on EL6
> > and vdsm could not be updated. So I reinstalled my hosts to EL7 and with
> > different cluster I managed to get hosts linked to the same environment.
> > 
> > All looked fine, but my hosted_storage did not import completely, it was
> > shown, but locked. I removed the storage and it was automatically recreated,
> > but again locked.
> 
> So this is the real bug here. which is that the hosted_storage that was
> created with hosts running EL6, and was upgraded to EL7, changing its
> cluster- could not be imported correctly.
> 
> Paul, I know it's a long shot- but do you happen to have the logs from the
> time you changed the cluster etc.?

Which log files do you want? I think I can find them in my backups.
 - I installed the host around 7 February (/root/anaconda-ks.cfg)
 - so find backups from February till early March, when the storage got imported?
 (the OVF exception error about not being able to update is dated March 1st)

> 
> > After update to 3.6.1 or 3.6.2? ( around 1st of March ) all of a sudden
> > hosted_storage ( on NFS ) was imported. I guess this was because my
> > hosted_storage had a different name ( hostedengine_nfs ), but anyway I got a
> > step further :)
> >
> > By this time I started receiveing following error 
> > http://screencast.com/t/S8cfXMsdGM
> > The date of the 2 OVF image folders and files on filesystem were never
> > updated. By this time I deleted the OVF files.
> > 
> > I tried different thing to let the system recreate the OVF like changing IDs
> > as suggested. Also changed the status to OK, and it looks like this is not
> > updated to uknown or error either. http://screencast.com/t/vCx0CQiXm
> > 
> > I do not have any errors in the event log about this all on the moment, but
> > I am uncertain about the state of my hosted storage, especially when I plan
> > to upgrade to 4.0 later this year and what will break.
> 
> We can try doing a workaround to solve your problem- IIUC, your
> hosted_storage is down- so right click on it and choose "Destroy". this will
> remove the domain from oVirt, but won't delete its content.
> After doing that, import the domain back to oVirt.


To be sure I understand it right: I can 'remove' the storage disk? http://screencast.com/t/AXsl9VpPW
Or should I put the hosted storage into maintenance and destroy the complete storage?

My hosted engine is up and running and I don't want to lose it. :-)

Comment 29 Amit Aviram 2016-05-19 14:40:36 UTC
(In reply to Paul from comment #28)
> (In reply to Amit Aviram from comment #27)
> > (In reply to Paul from comment #26)
> > > I had problems upgrading as my hosted-engine was on 3.6 and my hosts on EL6
> > > and vdsm could not be updated. So I reinstalled my hosts to EL7 and with
> > > different cluster I managed to get hosts linked to the same environment.
> > > 
> > > All looked fine, but my hosted_storage did not import completely, it was
> > > shown, but locked. I removed the storage and it was automatically recreated,
> > > but again locked.
> > 
> > So this is the real bug here. which is that the hosted_storage that was
> > created with hosts running EL6, and was upgraded to EL7, changing its
> > cluster- could not be imported correctly.
> > 
> > Paul, I know it's a long shot- but do you happen to have the logs from the
> > time you changed the cluster etc.?
> 
> Which log files do you want? I think I can find them in my backups.
>  - I installed host around 7 febr. ( /root/anaconda-ks.cfg )
>  - so find backups of februari till early march when storage got imported?
>  ( ovf exception error not able to update, dated from march 1st ).

We need the VDSM logs from the moment you had EL7 installed (that includes the logs for moving this host to a new cluster, and probably some error logs from a failed storage operation that caused the SD problems).

> 
> > 
> > > After update to 3.6.1 or 3.6.2? ( around 1st of March ) all of a sudden
> > > hosted_storage ( on NFS ) was imported. I guess this was because my
> > > hosted_storage had a different name ( hostedengine_nfs ), but anyway I got a
> > > step further :)
> > >
> > > By this time I started receiveing following error 
> > > http://screencast.com/t/S8cfXMsdGM
> > > The date of the 2 OVF image folders and files on filesystem were never
> > > updated. By this time I deleted the OVF files.
> > > 
> > > I tried different thing to let the system recreate the OVF like changing IDs
> > > as suggested. Also changed the status to OK, and it looks like this is not
> > > updated to uknown or error either. http://screencast.com/t/vCx0CQiXm
> > > 
> > > I do not have any errors in the event log about this all on the moment, but
> > > I am uncertain about the state of my hosted storage, especially when I plan
> > > to upgrade to 4.0 later this year and what will break.
> > 
> > We can try doing a workaround to solve your problem- IIUC, your
> > hosted_storage is down- so right click on it and choose "Destroy". this will
> > remove the domain from oVirt, but won't delete its content.
> > After doing that, import the domain back to oVirt.
> 
> 
> To be sure I understand it right. I can 'remove' the storage disk?
> http://screencast.com/t/AXsl9VpPW
> Or should I put hosted storage to maintenance and destroy the complete
> storage?
> 
> My hosted engine is up and running and don't want to loose it. :-)

So currently there is nothing wrong with your env?

Comment 30 Paul 2016-05-19 14:53:41 UTC
(In reply to Amit Aviram from comment #29)
> (In reply to Paul from comment #28)
> > (In reply to Amit Aviram from comment #27)
> > > (In reply to Paul from comment #26)
> > > > I had problems upgrading as my hosted-engine was on 3.6 and my hosts on EL6
> > > > and vdsm could not be updated. So I reinstalled my hosts to EL7 and with
> > > > different cluster I managed to get hosts linked to the same environment.
> > > > 
> > > > All looked fine, but my hosted_storage did not import completely, it was
> > > > shown, but locked. I removed the storage and it was automatically recreated,
> > > > but again locked.
> > > 
> > > So this is the real bug here. which is that the hosted_storage that was
> > > created with hosts running EL6, and was upgraded to EL7, changing its
> > > cluster- could not be imported correctly.
> > > 
> > > Paul, I know it's a long shot- but do you happen to have the logs from the
> > > time you changed the cluster etc.?
> > 
> > Which log files do you want? I think I can find them in my backups.
> >  - I installed host around 7 febr. ( /root/anaconda-ks.cfg )
> >  - so find backups of februari till early march when storage got imported?
> >  ( ovf exception error not able to update, dated from march 1st ).
> 
> We need the VDSM logs, from the moment you had EL7 installed (that includes
> the logs for moving this host to a new cluster, and probably has some error
> logs from a failed storage operation that caused the SD to make problems)
> 

I will check my backups and try to send you the logs; they will be 100-200 MB, so I will send them through wetransfer.com to your email.

> > 
> > > 
> > > > After update to 3.6.1 or 3.6.2? ( around 1st of March ) all of a sudden
> > > > hosted_storage ( on NFS ) was imported. I guess this was because my
> > > > hosted_storage had a different name ( hostedengine_nfs ), but anyway I got a
> > > > step further :)
> > > >
> > > > By this time I started receiveing following error 
> > > > http://screencast.com/t/S8cfXMsdGM
> > > > The date of the 2 OVF image folders and files on filesystem were never
> > > > updated. By this time I deleted the OVF files.
> > > > 
> > > > I tried different thing to let the system recreate the OVF like changing IDs
> > > > as suggested. Also changed the status to OK, and it looks like this is not
> > > > updated to uknown or error either. http://screencast.com/t/vCx0CQiXm
> > > > 
> > > > I do not have any errors in the event log about this all on the moment, but
> > > > I am uncertain about the state of my hosted storage, especially when I plan
> > > > to upgrade to 4.0 later this year and what will break.
> > > 
> > > We can try doing a workaround to solve your problem- IIUC, your
> > > hosted_storage is down- so right click on it and choose "Destroy". this will
> > > remove the domain from oVirt, but won't delete its content.
> > > After doing that, import the domain back to oVirt.
> > 
> > 
> > To be sure I understand it right. I can 'remove' the storage disk?
> > http://screencast.com/t/AXsl9VpPW
> > Or should I put hosted storage to maintenance and destroy the complete
> > storage?
> > 
> > My hosted engine is up and running and don't want to loose it. :-)
> 
> So currently there is nothing wrong with your env?

The hosted engine is up, but I believe my hosted storage is not correctly imported. The status looks strange and the OVF files are not updated.
http://screencast.com/t/7jxnsXsG9mS

Comment 31 Amit Aviram 2016-05-19 15:16:16 UTC
(In reply to Paul from comment #30)
> (In reply to Amit Aviram from comment #29)
> > (In reply to Paul from comment #28)
> > > (In reply to Amit Aviram from comment #27)
> > > > (In reply to Paul from comment #26)
> > > > > I had problems upgrading as my hosted-engine was on 3.6 and my hosts on EL6
> > > > > and vdsm could not be updated. So I reinstalled my hosts to EL7 and with
> > > > > different cluster I managed to get hosts linked to the same environment.
> > > > > 
> > > > > All looked fine, but my hosted_storage did not import completely, it was
> > > > > shown, but locked. I removed the storage and it was automatically recreated,
> > > > > but again locked.
> > > > 
> > > > So this is the real bug here. which is that the hosted_storage that was
> > > > created with hosts running EL6, and was upgraded to EL7, changing its
> > > > cluster- could not be imported correctly.
> > > > 
> > > > Paul, I know it's a long shot- but do you happen to have the logs from the
> > > > time you changed the cluster etc.?
> > > 
> > > Which log files do you want? I think I can find them in my backups.
> > >  - I installed host around 7 febr. ( /root/anaconda-ks.cfg )
> > >  - so find backups of februari till early march when storage got imported?
> > >  ( ovf exception error not able to update, dated from march 1st ).
> > 
> > We need the VDSM logs, from the moment you had EL7 installed (that includes
> > the logs for moving this host to a new cluster, and probably has some error
> > logs from a failed storage operation that caused the SD to make problems)
> > 
> 
> I will check backups and try to send you logs will be 100 - 200 MB, so will
> send them through wetransfer.com to your email.
> 
> > > 
> > > > 
> > > > > After update to 3.6.1 or 3.6.2? ( around 1st of March ) all of a sudden
> > > > > hosted_storage ( on NFS ) was imported. I guess this was because my
> > > > > hosted_storage had a different name ( hostedengine_nfs ), but anyway I got a
> > > > > step further :)
> > > > >
> > > > > By this time I started receiveing following error 
> > > > > http://screencast.com/t/S8cfXMsdGM
> > > > > The date of the 2 OVF image folders and files on filesystem were never
> > > > > updated. By this time I deleted the OVF files.
> > > > > 
> > > > > I tried different thing to let the system recreate the OVF like changing IDs
> > > > > as suggested. Also changed the status to OK, and it looks like this is not
> > > > > updated to uknown or error either. http://screencast.com/t/vCx0CQiXm
> > > > > 
> > > > > I do not have any errors in the event log about this all on the moment, but
> > > > > I am uncertain about the state of my hosted storage, especially when I plan
> > > > > to upgrade to 4.0 later this year and what will break.
> > > > 
> > > > We can try doing a workaround to solve your problem- IIUC, your
> > > > hosted_storage is down- so right click on it and choose "Destroy". this will
> > > > remove the domain from oVirt, but won't delete its content.
> > > > After doing that, import the domain back to oVirt.
> > > 
> > > 
> > > To be sure I understand it right. I can 'remove' the storage disk?
> > > http://screencast.com/t/AXsl9VpPW
> > > Or should I put hosted storage to maintenance and destroy the complete
> > > storage?
> > > 
> > > My hosted engine is up and running and don't want to loose it. :-)
> > 
> > So currently there is nothing wrong with your env?
> 
> Hosted enbgine is up, but I believe my hosted storage is not correctly
> imported. Status shows strange and ovf files are not updated.
> http://screencast.com/t/7jxnsXsG9mS

Well, if you want to take the chance, you can try destroying the whole storage (not deleting the disk, as demonstrated in your screenshot) and then import the storage again. It should work in a standard env, but yours is problematic, so maybe it's worth just waiting until you'll need to update it again anyway.

Comment 32 Paul 2016-05-19 15:44:33 UTC
(In reply to Amit Aviram from comment #31)
> (In reply to Paul from comment #30)
> > (In reply to Amit Aviram from comment #29)
> > > (In reply to Paul from comment #28)
> > > > (In reply to Amit Aviram from comment #27)
> > > > > (In reply to Paul from comment #26)
> > > > > > I had problems upgrading as my hosted-engine was on 3.6 and my hosts on EL6
> > > > > > and vdsm could not be updated. So I reinstalled my hosts to EL7 and with
> > > > > > different cluster I managed to get hosts linked to the same environment.
> > > > > > 
> > > > > > All looked fine, but my hosted_storage did not import completely, it was
> > > > > > shown, but locked. I removed the storage and it was automatically recreated,
> > > > > > but again locked.
> > > > > 
> > > > > So this is the real bug here. which is that the hosted_storage that was
> > > > > created with hosts running EL6, and was upgraded to EL7, changing its
> > > > > cluster- could not be imported correctly.
> > > > > 
> > > > > Paul, I know it's a long shot- but do you happen to have the logs from the
> > > > > time you changed the cluster etc.?
> > > > 
> > > > Which log files do you want? I think I can find them in my backups.
> > > >  - I installed host around 7 febr. ( /root/anaconda-ks.cfg )
> > > >  - so find backups of februari till early march when storage got imported?
> > > >  ( ovf exception error not able to update, dated from march 1st ).
> > > 
> > > We need the VDSM logs, from the moment you had EL7 installed (that includes
> > > the logs for moving this host to a new cluster, and probably has some error
> > > logs from a failed storage operation that caused the SD to make problems)
> > > 
> > 
> > I will check backups and try to send you logs will be 100 - 200 MB, so will
> > send them through wetransfer.com to your email.
> > 
> > > > 
> > > > > 
> > > > > > After update to 3.6.1 or 3.6.2? ( around 1st of March ) all of a sudden
> > > > > > hosted_storage ( on NFS ) was imported. I guess this was because my
> > > > > > hosted_storage had a different name ( hostedengine_nfs ), but anyway I got a
> > > > > > step further :)
> > > > > >
> > > > > > By this time I started receiveing following error 
> > > > > > http://screencast.com/t/S8cfXMsdGM
> > > > > > The date of the 2 OVF image folders and files on filesystem were never
> > > > > > updated. By this time I deleted the OVF files.
> > > > > > 
> > > > > > I tried different thing to let the system recreate the OVF like changing IDs
> > > > > > as suggested. Also changed the status to OK, and it looks like this is not
> > > > > > updated to uknown or error either. http://screencast.com/t/vCx0CQiXm
> > > > > > 
> > > > > > I do not have any errors in the event log about this all on the moment, but
> > > > > > I am uncertain about the state of my hosted storage, especially when I plan
> > > > > > to upgrade to 4.0 later this year and what will break.
> > > > > 
> > > > > We can try doing a workaround to solve your problem- IIUC, your
> > > > > hosted_storage is down- so right click on it and choose "Destroy". this will
> > > > > remove the domain from oVirt, but won't delete its content.
> > > > > After doing that, import the domain back to oVirt.
> > > > 
> > > > 
> > > > To be sure I understand it right. I can 'remove' the storage disk?
> > > > http://screencast.com/t/AXsl9VpPW
> > > > Or should I put hosted storage to maintenance and destroy the complete
> > > > storage?
> > > > 
> > > > My hosted engine is up and running and don't want to loose it. :-)
> > > 
> > > So currently there is nothing wrong with your env?
> > 
> > Hosted enbgine is up, but I believe my hosted storage is not correctly
> > imported. Status shows strange and ovf files are not updated.
> > http://screencast.com/t/7jxnsXsG9mS
> 
> Well, if you want to take the chance, you can try destroy the hole storage
> (not deleting the disk, as demonstrated in your screenshot) then import the
> storage again. It should work in a standard env, but yours is problematic,
> so maybe it worth just waiting until you'll need to update it again anyway.


I sent the logs through wetransfer. Hope they make some sense.

I think I am going to see how I can install a hosted engine on an EL7 host and then import the storage domain of the VMs. 
This way I am also ready for oVirt 4.0.
Any known documentation for this step would be welcome :)

Comment 33 Amit Aviram 2016-05-26 09:21:42 UTC
(In reply to Paul from comment #32)
> (In reply to Amit Aviram from comment #31)
> > (In reply to Paul from comment #30)
> > > (In reply to Amit Aviram from comment #29)
> > > > (In reply to Paul from comment #28)
> > > > > (In reply to Amit Aviram from comment #27)
> > > > > > (In reply to Paul from comment #26)
> > > > > > > I had problems upgrading as my hosted-engine was on 3.6 and my hosts on EL6
> > > > > > > and vdsm could not be updated. So I reinstalled my hosts to EL7 and with
> > > > > > > different cluster I managed to get hosts linked to the same environment.
> > > > > > > 
> > > > > > > All looked fine, but my hosted_storage did not import completely, it was
> > > > > > > shown, but locked. I removed the storage and it was automatically recreated,
> > > > > > > but again locked.
> > > > > > 
> > > > > > So this is the real bug here. which is that the hosted_storage that was
> > > > > > created with hosts running EL6, and was upgraded to EL7, changing its
> > > > > > cluster- could not be imported correctly.
> > > > > > 
> > > > > > Paul, I know it's a long shot- but do you happen to have the logs from the
> > > > > > time you changed the cluster etc.?
> > > > > 
> > > > > Which log files do you want? I think I can find them in my backups.
> > > > >  - I installed host around 7 febr. ( /root/anaconda-ks.cfg )
> > > > >  - so find backups of februari till early march when storage got imported?
> > > > >  ( ovf exception error not able to update, dated from march 1st ).
> > > > 
> > > > We need the VDSM logs, from the moment you had EL7 installed (that includes
> > > > the logs for moving this host to a new cluster, and probably has some error
> > > > logs from a failed storage operation that caused the SD to make problems)
> > > > 
> > > 
> > > I will check backups and try to send you logs will be 100 - 200 MB, so will
> > > send them through wetransfer.com to your email.
> > > 
> > > > > 
> > > > > > 
> > > > > > > After update to 3.6.1 or 3.6.2? ( around 1st of March ) all of a sudden
> > > > > > > hosted_storage ( on NFS ) was imported. I guess this was because my
> > > > > > > hosted_storage had a different name ( hostedengine_nfs ), but anyway I got a
> > > > > > > step further :)
> > > > > > >
> > > > > > > By this time I started receiveing following error 
> > > > > > > http://screencast.com/t/S8cfXMsdGM
> > > > > > > The date of the 2 OVF image folders and files on filesystem were never
> > > > > > > updated. By this time I deleted the OVF files.
> > > > > > > 
> > > > > > > I tried different thing to let the system recreate the OVF like changing IDs
> > > > > > > as suggested. Also changed the status to OK, and it looks like this is not
> > > > > > > updated to uknown or error either. http://screencast.com/t/vCx0CQiXm
> > > > > > > 
> > > > > > > I do not have any errors in the event log about this all on the moment, but
> > > > > > > I am uncertain about the state of my hosted storage, especially when I plan
> > > > > > > to upgrade to 4.0 later this year and what will break.
> > > > > > 
> > > > > > We can try doing a workaround to solve your problem- IIUC, your
> > > > > > hosted_storage is down- so right click on it and choose "Destroy". this will
> > > > > > remove the domain from oVirt, but won't delete its content.
> > > > > > After doing that, import the domain back to oVirt.
> > > > > 
> > > > > 
> > > > > To be sure I understand it right. I can 'remove' the storage disk?
> > > > > http://screencast.com/t/AXsl9VpPW
> > > > > Or should I put hosted storage to maintenance and destroy the complete
> > > > > storage?
> > > > > 
> > > > > My hosted engine is up and running and don't want to loose it. :-)
> > > > 
> > > > So currently there is nothing wrong with your env?
> > > 
> > > Hosted enbgine is up, but I believe my hosted storage is not correctly
> > > imported. Status shows strange and ovf files are not updated.
> > > http://screencast.com/t/7jxnsXsG9mS
> > 
> > Well, if you want to take the chance, you can try destroy the hole storage
> > (not deleting the disk, as demonstrated in your screenshot) then import the
> > storage again. It should work in a standard env, but yours is problematic,
> > so maybe it worth just waiting until you'll need to update it again anyway.
> 
> 
> I sent the logs through wetransfer. Hope it makes some sense.

I will look at your logs, thanks.

> 
> I think I am going to see how I can install a hosted engine on EL7 host and
> then import storage domain of the VMs. 
> This way I am also ready for ovirt 4.0.
> Any known documents for this step would be welcome :)

You can have a look at oVirt's site; there is an explanation there about how to work with HE:
https://www.ovirt.org/documentation/how-to/hosted-engine/

Hope that helps.

Comment 34 Yaniv Lavi 2016-06-20 11:23:37 UTC
It seems that the problem was resolved. If not, please reopen with the current issues.

Comment 35 Alex Kaouris 2017-03-01 12:46:45 UTC
I was able to reproduce this on oVirt 4.0 and oVirt 4.1. 
On a clean installation, the HA agent on the first host says the following: 

MainThread::WARNING::2017-03-01 12:38:02,365::ovf_store::107::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(scan) Unable to find OVF_STORE
MainThread::ERROR::2017-03-01 12:38:02,369::config::443::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) Unable to identify the OVF_STORE volume, falling back to initial vm.conf. Please ensure you already added your first data domain for regular VMs

A data storage domain was successfully added and then the hosted engine storage was automatically imported during deploy. The above error keeps repeating for at least several hours. At some point the host is able to find the OVF_STORE. 

On oVirt Engine Version 4.1.0.4-1.el7.centos, I can only add a second host from the GUI. When adding the host, /run/ovirt-hosted-engine-ha/vm.conf is missing and the HA agent and broker are not starting. 
I will check what happens after several hours, when the first host is able to find the OVF_STORE. 

It seems that I have to wait for several hours until the first host finds the OVF_STORE, and only then add the second host, in the hope that the second host will find the OVF_STORE and get the engine vm.conf.

Comment 36 Alex Kaouris 2017-03-01 15:00:35 UTC
After approx. 2 hours the first host was able to find the OVF_STORE and there are no more errors in the logs. 

Also, I was able to reinstall the second host. This time I checked the "deploy" hosted engine option (I was not aware of that...) and the host is flagged as "can run the hosted engine VM". Seems that all is OK now.

