Bug 1303316

Summary: vm.conf does not get updated if hosted engine is installed on block storage
Product: Red Hat Enterprise Virtualization Manager
Reporter: Martin Tessun <mtessun>
Component: ovirt-hosted-engine-ha
Assignee: Simone Tiraboschi <stirabos>
Status: CLOSED ERRATA
QA Contact: Artyom <alukiano>
Severity: high
Priority: high
Version: 3.6.0
CC: amureini, bobby.prins, dfediuck, gklein, juwu, lsurette, nsoffer, sbonazzo, stirabos, tnisan, ykaul, ylavi
Target Milestone: ovirt-3.6.3
Keywords: Triaged
Target Release: 3.6.3
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Last Closed: 2016-03-09 19:50:34 UTC
Type: Bug
oVirt Team: Integration
Bug Blocks: 1283458

Description Martin Tessun 2016-01-30 17:46:15 UTC
Description of problem:
vm.conf does not get updated from OVF_STORE data if the hosted engine is installed on block storage.

Version-Release number of selected component (if applicable):
RHEV 3.6 beta3

How reproducible:
always

Steps to Reproduce:
1. Install HE on block storage
2. Wait for the OVF_STORE to be created
3. Check the agent logs

Actual results:
vm.conf cannot be updated because the OVF_STORE device is missing

Expected results:
vm.conf should be updated (and the OVF_STORE device should be activated in udev)

Additional info:

Logs:
MainThread::INFO::2016-01-30 18:31:05,070::hosted_engine::672::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images) Reloading vm.conf from the shared storage domain
MainThread::INFO::2016-01-30 18:31:05,081::config::205::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(refresh_local_conf_file) Trying to get a fresher copy of vm configuration from the OVF_STORE
MainThread::INFO::2016-01-30 18:31:05,502::ovf_store::101::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(scan) Found OVF_STORE: imgUUID:7736c611-be93-4fd1-9f82-73d4f804dabe, volUUID:68ad3016-a133-4d51-a58f-27ce000061f1
MainThread::INFO::2016-01-30 18:31:05,918::ovf_store::101::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(scan) Found OVF_STORE: imgUUID:576e711d-2432-49e0-a1e8-b85788c9528d, volUUID:513cc1ff-7c4a-4ec6-a039-67627c5b87c9
MainThread::INFO::2016-01-30 18:31:07,576::ovf_store::110::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF) Extracting Engine VM OVF from the OVF_STORE
MainThread::INFO::2016-01-30 18:31:07,576::ovf_store::117::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF) OVF_STORE volume path: /rhev/data-center/mnt/blockSD/6de103bc-84b1-404d-bf32-126ce75984d1/images/576e711d-2432-49e0-a1e8-b85788c9528d/513cc1ff-7c4a-4ec6-a039-67627c5b87c9 
MainThread::ERROR::2016-01-30 18:31:07,596::ovf_store::122::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF) Unable to extract HEVM OVF
MainThread::ERROR::2016-01-30 18:31:07,597::config::234::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(refresh_local_conf_file) Unable to get vm.conf from OVF_STORE, falling back to initial vm.conf
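To make step 3 repeatable, here is a minimal Python sketch that splits agent log lines in the `Thread::LEVEL::timestamp::module::lineno::logger::(function) message` layout seen above and collects the OVF_STORE image/volume UUID pairs reported by `scan`. The field layout is inferred from these excerpts only (not from any documented ovirt-hosted-engine-ha format), and `parse_agent_log_line` / `find_ovf_stores` are hypothetical helper names.

```python
import re
from typing import Iterable, List, NamedTuple, Optional, Tuple


class AgentLogRecord(NamedTuple):
    thread: str
    level: str
    timestamp: str
    module: str
    lineno: int
    logger: str
    function: str
    message: str


# "(scan) Found OVF_STORE: ..." -> function name and free-text message
_FUNC_MSG = re.compile(r"^\((?P<func>[^)]*)\)\s*(?P<msg>.*)$")
# Message emitted by the OVF_STORE scan, as it appears in the excerpts above
_OVF_FOUND = re.compile(
    r"Found OVF_STORE: imgUUID:(?P<img>[0-9a-f-]+), volUUID:(?P<vol>[0-9a-f-]+)"
)


def parse_agent_log_line(line: str) -> Optional[AgentLogRecord]:
    """Split one agent log line on '::' (the timestamp only contains single ':')."""
    parts = line.rstrip().split("::", 6)
    if len(parts) != 7:
        return None
    thread, level, ts, module, lineno, logger, rest = parts
    m = _FUNC_MSG.match(rest)
    if m is None or not lineno.isdigit():
        return None
    return AgentLogRecord(thread, level, ts, module, int(lineno), logger,
                          m.group("func"), m.group("msg"))


def find_ovf_stores(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (imgUUID, volUUID) pairs for every OVF_STORE the scan reported."""
    found = []
    for line in lines:
        rec = parse_agent_log_line(line)
        if rec is None or rec.function != "scan":
            continue
        m = _OVF_FOUND.search(rec.message)
        if m:
            found.append((m.group("img"), m.group("vol")))
    return found
```

Feeding it the lines above yields the two OVF_STORE images, followed by the `ERROR` record from `getEngineVMOVF`. On the nodes that never see the OVF_STORE, the list stays empty.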

Checking the file:
[root@ovirt1 ~]# ls -l /rhev/data-center/mnt/blockSD/6de103bc-84b1-404d-bf32-126ce75984d1/images/576e711d-2432-49e0-a1e8-b85788c9528d/513cc1ff-7c4a-4ec6-a039-67627c5b87c9
lrwxrwxrwx. 1 vdsm kvm 78 Jan 30 17:49 /rhev/data-center/mnt/blockSD/6de103bc-84b1-404d-bf32-126ce75984d1/images/576e711d-2432-49e0-a1e8-b85788c9528d/513cc1ff-7c4a-4ec6-a039-67627c5b87c9 -> /dev/6de103bc-84b1-404d-bf32-126ce75984d1/513cc1ff-7c4a-4ec6-a039-67627c5b87c9

Now check the device (LV):
[root@ovirt1 ~]# ls -l /dev/6de103bc-84b1-404d-bf32-126ce75984d1/
total 0
lrwxrwxrwx. 1 root root 8 Jan 30 16:44 3b66c63e-4833-4925-906a-554cfc00159a -> ../dm-28
lrwxrwxrwx. 1 root root 8 Jan 30 16:45 a43b38f9-eaa1-4ae9-8f19-b924f52e378d -> ../dm-29
lrwxrwxrwx. 1 root root 8 Jan 30 16:44 e09865c4-b7ad-4a01-a51f-0fc2bd08c2fa -> ../dm-24
lrwxrwxrwx. 1 root root 8 Jan 30 18:34 ff3fb0a9-a9b3-46a0-ae05-a435dc41c036 -> ../dm-23
lrwxrwxrwx. 1 root root 8 Jan 30 16:44 ids -> ../dm-20
lrwxrwxrwx. 1 root root 8 Jan 30 16:44 inbox -> ../dm-21
lrwxrwxrwx. 1 root root 8 Jan 30 17:49 leases -> ../dm-19
lrwxrwxrwx. 1 root root 8 Jan 30 16:44 master -> ../dm-22
lrwxrwxrwx. 1 root root 8 Jan 30 17:50 metadata -> ../dm-16
lrwxrwxrwx. 1 root root 8 Jan 30 16:44 outbox -> ../dm-17
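The OVF_STORE volume path is a symlink to /dev/6de103bc-84b1-404d-bf32-126ce75984d1/513cc1ff-7c4a-4ec6-a039-67627c5b87c9, but that entry is absent from the listing, so the link dangles. A minimal Python sketch that tells a dangling symlink apart from a missing path using only `os.path`; `classify_volume_path` is a hypothetical helper, and it only diagnoses the problem (it does not activate the LV).

```python
import os
from typing import Optional, Tuple


def classify_volume_path(path: str) -> Tuple[str, Optional[str]]:
    """Return ("ok", resolved), ("dangling", link_target) or ("missing", None)."""
    if os.path.exists(path):
        # exists() follows symlinks, so this is True only if the target exists
        return "ok", os.path.realpath(path)
    if os.path.islink(path):
        # The link itself is there, but what it points to (e.g. an inactive LV) is not
        return "dangling", os.readlink(path)
    return "missing", None
```

Run against the volume path from the log on the affected host, this would be expected to report `"dangling"` together with the /dev/<sdUUID>/<volUUID> target, matching the manual `ls` checks above.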

After activating the LV, this works on the host running the HE.

On the other nodes, which are not running the HE, the OVF_STORE is never found:
MainThread::INFO::2016-01-30 18:39:31,100::config::205::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(refresh_local_conf_file) Trying to get a fresher copy of vm configuration from the OVF_STORE
MainThread::WARNING::2016-01-30 18:39:33,291::ovf_store::105::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(scan) Unable to find OVF_STORE
MainThread::ERROR::2016-01-30 18:39:33,292::config::234::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(refresh_local_conf_file) Unable to get vm.conf from OVF_STORE, falling back to initial vm.conf

Comment 1 Martin Tessun 2016-01-30 17:52:11 UTC
Just some additional observations:

After putting the hypervisor into maintenance mode and restarting it, the HV is able to get vm.conf:

MainThread::INFO::2016-01-30 18:50:24,137::hosted_engine::710::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_sanlock) Ensuring lease for lockspace hosted-engine, host id 1 is acquired (file: /var/run/vdsm/storage/6de103bc-84b1-404d-bf32-126ce75984d1/57787128-ae20-4a74-9dda-2608a9ef6b4d/e09865c4-b7ad-4a01-a51f-0fc2bd08c2fa)
MainThread::INFO::2016-01-30 18:50:45,153::hosted_engine::744::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_sanlock) Acquired lock on host id 1
MainThread::INFO::2016-01-30 18:50:45,154::upgrade::944::ovirt_hosted_engine_ha.lib.upgrade.StorageServer::(upgrade) Host configuration is already up-to-date
MainThread::INFO::2016-01-30 18:50:45,154::hosted_engine::424::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Reloading vm.conf from the shared storage domain
MainThread::INFO::2016-01-30 18:50:45,154::config::205::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(refresh_local_conf_file) Trying to get a fresher copy of vm configuration from the OVF_STORE
MainThread::INFO::2016-01-30 18:50:46,432::ovf_store::101::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(scan) Found OVF_STORE: imgUUID:7736c611-be93-4fd1-9f82-73d4f804dabe, volUUID:68ad3016-a133-4d51-a58f-27ce000061f1
MainThread::INFO::2016-01-30 18:50:47,903::ovf_store::101::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(scan) Found OVF_STORE: imgUUID:576e711d-2432-49e0-a1e8-b85788c9528d, volUUID:513cc1ff-7c4a-4ec6-a039-67627c5b87c9
MainThread::INFO::2016-01-30 18:50:51,235::ovf_store::110::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF) Extracting Engine VM OVF from the OVF_STORE
MainThread::INFO::2016-01-30 18:50:51,236::ovf_store::117::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF) OVF_STORE volume path: /rhev/data-center/mnt/blockSD/6de103bc-84b1-404d-bf32-126ce75984d1/images/576e711d-2432-49e0-a1e8-b85788c9528d/513cc1ff-7c4a-4ec6-a039-67627c5b87c9 
MainThread::INFO::2016-01-30 18:50:51,266::config::225::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(refresh_local_conf_file) Found an OVF for HE VM, trying to convert
MainThread::INFO::2016-01-30 18:50:51,270::config::230::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(refresh_local_conf_file) Got vm.conf from OVF_STORE

Comment 4 Simone Tiraboschi 2016-02-01 08:48:02 UTC
This happens because, after a reboot, we don't call prepareImage on the OVF_STORE images: we don't know their UUIDs, and getImagesList fails because we are not yet connected to the storagePool. We need the fix for bug 1274622 backported to 3.6 in order to fix this one.

Comment 5 Simone Tiraboschi 2016-02-03 11:02:18 UTC
I don't think we can fix this for 3.6.3 while getImagesList is broken there.

Comment 6 Yaniv Lavi 2016-02-03 11:21:03 UTC
Allon, can someone have a look for a solution for 3.6?

Comment 7 Allon Mureinik 2016-02-03 12:17:16 UTC
(In reply to Yaniv Dary from comment #6)
> Allon, can someone have a look for a solution for 3.6?

Could someone actually explain the flow and what exactly fails instead of trying to ask for specific patches backports?

Comment 8 Simone Tiraboschi 2016-02-03 12:32:50 UTC
(In reply to Allon Mureinik from comment #7)
> Could someone actually explain the flow and what exactly fails instead of
> trying to ask for specific patches backports?

The flow:
https://bugzilla.redhat.com/show_bug.cgi?id=1274622#c9

The issue:
https://bugzilla.redhat.com/show_bug.cgi?id=1274622

Comment 9 Allon Mureinik 2016-02-03 12:43:51 UTC
(In reply to Simone Tiraboschi from comment #8)
> (In reply to Allon Mureinik from comment #7)
> > Could someone actually explain the flow and what exactly fails instead of
> > trying to ask for specific patches backports?
> 
> The flow:
> https://bugzilla.redhat.com/show_bug.cgi?id=1274622#c9
> 
> The issue:
> https://bugzilla.redhat.com/show_bug.cgi?id=1274622

You should import the domain, and force an update of the OVF_STORE, like Martin suggested in https://gerrit.ovirt.org/#/c/51842/.
Once you do that, store the UUID.

I don't see any reason to read the OVF_STORE in a domain outside of the pool.

Comment 10 Simone Tiraboschi 2016-02-03 12:59:25 UTC
(In reply to Allon Mureinik from comment #9)
> You should import the domain, and force an update of the OVF_STORE, like
> Martin suggested in https://gerrit.ovirt.org/#/c/51842/.
> Once you do that, store the UUID.

Here we have two distinct components involved in this flow: the engine and ovirt-ha-agent.

The engine auto-imports the hosted-engine storage domain; only after that does it generate the OVF_STORE. So the engine knows the OVF_STORE UUID, but ovirt-ha-agent doesn't.

Now let's see what happens when we reboot the host: only ovirt-ha-agent is running, there is no engine yet, and we are not connected to any storagePool.

ovirt-ha-agent has to fetch the latest engine VM configuration from the OVF_STORE, so it has to call prepareImage on it, but it doesn't know the OVF_STORE UUID.
We could call getImagesList to discover it, but that call fails due to https://bugzilla.redhat.com/show_bug.cgi?id=1274622

Comment 11 Allon Mureinik 2016-02-03 16:58:04 UTC
OK, I think I'm starting to get there...

A few more questions though - isn't the HA agent supposed to be aware of the pool? Why aren't we starting up VDSM and connecting it to the pool straight away?

Also, bug 1274622 (and its fix) are specifically about file storage. This one is about block storage, so I don't see how backporting it (assuming it were possible) would help.

Comment 12 Simone Tiraboschi 2016-02-03 17:18:24 UTC
(In reply to Allon Mureinik from comment #11)
> OK, I think I'm starting to get there...
> 
> A few more questions though - isn't the HA agent supposed to be aware of the
> pool? Why aren't we starting up VDSM and connecting it to the pool straight
> away?

No, it's not.
The hosted-engine storage domain is attached to the storagePool of the datacenter containing the hosted-engine cluster only when the engine auto-imports it (which it does only once the datacenter is up, and for that the user has to add at least one additional storage domain).
So the HA agent is agnostic of the storage pool.

> Also, bug 1274622 (and its fix) are specifically about file storage. This
> one is about block storage, so I don't see how backporting it (assuming it
> were possible) would help.

This is true: 1274622 is specific to file-based storage, but to work around it we avoid calling getImagesList and prepareImage on images we don't know about.
On NFS (and on iSCSI too) it seems that the images become available as a side effect of something else, even if we don't prepare them.

A possible, though weird and not very smart, solution is to always call getImagesList, ignoring any failure there; if, and only if, we get something back (as should happen on block devices, AFAIK), we can scan for OVF_STORE images in order to prepare them.
On file-based devices, we'll simply cross our fingers and hope that accessing the OVF_STORE keeps working as it does today, without the need to prepare the images.
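A minimal Python sketch of the best-effort workaround described in this comment. `list_images`, `is_ovf_store` and `prepare_image` are hypothetical callables standing in for the VDSM getImagesList and prepareImage calls and for however the agent recognizes an OVF_STORE image. Their signatures here are assumptions, not the real VDSM API, and this is not the code that shipped in the fix.

```python
from typing import Callable, Iterable, List


def prepare_ovf_stores(
    list_images: Callable[[], Iterable[str]],   # hypothetical getImagesList wrapper
    is_ovf_store: Callable[[str], bool],        # hypothetical OVF_STORE check
    prepare_image: Callable[[str], None],       # hypothetical prepareImage wrapper
) -> List[str]:
    """Prepare OVF_STORE images if, and only if, getImagesList returns something."""
    try:
        images = list(list_images())
    except Exception:
        # Ignore any failure, e.g. when not yet connected to the storagePool
        images = []
    prepared = []
    for img_uuid in images:
        if is_ovf_store(img_uuid):
            prepare_image(img_uuid)
            prepared.append(img_uuid)
    # An empty result means the caller falls back to reading the OVF_STORE
    # without preparing it (the "cross our fingers" case on file-based storage)
    return prepared
```

The broad `except` mirrors "ignoring all the failure there": the discovery step is treated as optional, so a failing call degrades to the current behaviour instead of blocking the agent.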

Comment 14 Julie 2016-02-22 06:10:55 UTC
If this bug requires doc text for errata release, please provide draft text in the doc text field in the following format:

Cause:
Consequence:
Fix:
Result:

The documentation team will review, edit, and approve the text.

If this bug does not require doc text, please set the 'requires_doc_text' flag to -.

Comment 15 Artyom 2016-02-25 15:25:33 UTC
Verified on ovirt-hosted-engine-ha-1.3.4.3-1.el7ev.noarch (over iSCSI storage):

MainThread::INFO::2016-02-25 17:23:50,491::config::205::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(refresh_local_conf_file) Trying to get a fresher copy of vm configuration from the OVF_STORE
MainThread::INFO::2016-02-25 17:23:50,701::ovf_store::100::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(scan) Found OVF_STORE: imgUUID:a5e49dd0-be39-4374-9e49-8dce6395b758, volUUID:6ba4dd03-8903-415b-b0d6-b46c21e3e96f
MainThread::INFO::2016-02-25 17:23:51,270::ovf_store::100::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(scan) Found OVF_STORE: imgUUID:8147492c-b641-4790-8eae-c9301f7a5e31, volUUID:e8af7c51-dc7e-4791-96f8-5904dc5d62c6
MainThread::INFO::2016-02-25 17:23:51,271::ovf_store::109::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF) Extracting Engine VM OVF from the OVF_STORE
MainThread::INFO::2016-02-25 17:23:51,272::ovf_store::116::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF) OVF_STORE volume path: /rhev/data-center/mnt/blockSD/4bf73cdc-7ee1-4309-8d69-94a29d9fe36d/images/8147492c-b641-4790-8eae-c9301f7a5e31/e8af7c51-dc7e-4791-96f8-5904dc5d62c6
MainThread::INFO::2016-02-25 17:23:51,281::config::225::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(refresh_local_conf_file) Found an OVF for HE VM, trying to convert
MainThread::INFO::2016-02-25 17:23:51,284::config::230::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(refresh_local_conf_file) Got vm.conf from OVF_STORE

Comment 17 errata-xmlrpc 2016-03-09 19:50:34 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-0422.html