Description of problem:
When VDSM is stopped, it deactivates all logical volumes in its volume group, and on startup it reactivates only the LVs it knows about, leaving unknown LVs deactivated. This breaks the hosted-engine ha-agent and ha-broker: these services run as vdsm, so they cannot re-activate their own LVs.

Version-Release number of selected component (if applicable):
vdsm-4.14.1-275.git8ddfbf0.el6.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Try to deploy hosted-engine on iSCSI.
2. When the engine deploys the host, vdsm is restarted and deactivates the LVs.
3. The ha-broker fails to access its storage.

Actual results:
vdsm deactivates all LVs in the volume group.

Expected results:
vdsm deactivates only its own LVs.
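The shutdown/startup pattern described above can be sketched as follows. This is an illustrative model only, not vdsm's actual code; all function and LV names here are hypothetical.

```python
# Illustrative sketch of the reported behavior -- NOT vdsm's real code.
# On stop, every LV in the VG is deactivated; on start, only LVs that
# vdsm knows about are reactivated, stranding the hosted-engine LVs.

def deactivate_all(vg_lvs):
    """Model of vdsm stop: mark every LV in the VG inactive."""
    return {lv: False for lv in vg_lvs}

def activate_known(state, known_lvs):
    """Model of vdsm start: reactivate only the LVs vdsm recognizes."""
    for lv in state:
        if lv in known_lvs:
            state[lv] = True
    return state

vg = ["master", "ids", "leases",
      "hosted-engine.metadata", "hosted-engine.lockspace"]
state = activate_known(deactivate_all(vg),
                       known_lvs={"master", "ids", "leases"})

# The hosted-engine LVs stay deactivated, so the ha-broker loses storage:
print([lv for lv, active in state.items() if not active])
```

Running this prints the two hosted-engine LVs as the ones left inactive, which is exactly the failure the ha-broker hits in step 3.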
Well, all lvs in vdsm's vg do belong to vdsm. The real issue is that hosted-engine is using vdsm's vg for its own lvs. So this is not a bug but the expected behavior of the system.
Can we come up with an lv tag that can be used to mark lvs that should not be managed by vdsm (even if they're part of a storage domain vg)?
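The tag-based idea above could look like the following sketch: during bulk deactivation, vdsm would skip any LV carrying an "unmanaged" tag. The tag name and function are hypothetical, not an agreed vdsm convention.

```python
# Sketch of the proposed tag filter -- the tag name is hypothetical.
# LVM tags can be attached with `lvchange --addtag`; here we just model
# the filtering step that a bulk deactivation would apply.

UNMANAGED_TAG = "OVIRT_VOL_UNMANAGED"

def lvs_to_deactivate(lv_tags):
    """lv_tags: mapping of LV name -> set of LVM tags on that LV.
    Return only the LVs vdsm is allowed to deactivate."""
    return [lv for lv, tags in lv_tags.items()
            if UNMANAGED_TAG not in tags]

print(lvs_to_deactivate({
    "leases": set(),
    "hosted-engine.metadata": {UNMANAGED_TAG},
}))
```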
(In reply to Federico Simoncelli from comment #2)
> Can we come up with an lv tag that can be used to mark lvs that should not
> be managed by vdsm (even if they're part of a storage domain vg)?

This seems to be the simplest solution, but I think we should first understand why the special lvs must be in vdsm's vg.
(In reply to Nir Soffer from comment #3)
> (In reply to Federico Simoncelli from comment #2)
> > Can we come up with an lv tag that can be used to mark lvs that should not
> > be managed by vdsm (even if they're part of a storage domain vg)?
>
> This seems to be the simplest solution, but I think we should first
> understand why the special lvs must be in vdsm vg.

I'd expect Engine to be able to create this (via vdsm) in future versions[1]. We don't want storage that is not managed via vdsm.

[1] For example, moving the hosted engine VM via live storage migration from one domain to another.
(In reply to Itamar Heim from comment #4)
> I'd expect Engine to be able to create this (via vdsm) in future
> versions[1]. we don't want storage not managed via vdsm.
>
> [1] for example, move the hosted engine VM via live storage migration from
> one domain to another.

[1] is not limited to block domains. If you want to use a new storage domain for hosted engine, it needs to be prepared for that task (for example, nfs domains also need additional special files). We'll probably be able to address that with a new storage domain version.

The problem is that (as far as I know) the LVs and the special files were never reviewed by the storage team (at least nobody sought my opinion). Therefore, at this time I cannot guarantee 100% that what we'll officially agree on will provide the same files/lvs.

What we can do right now is try not to interfere with additional files/lvs placed by other applications such as hosted engine.
I just spoke with Jiri and the two files/lvs are:

hosted-engine.metadata
hosted-engine.lockspace

I don't mind supporting these in storage domain V4 (creation and activation). As far as the current problem (activation) is concerned, we can start supporting them early by adding them to the special lvs in VDSM.

If we're committing to have this in V4 then we need to be sure that this format is set in stone and won't change. If you're not sure yet about the format, or you want to be more flexible, then we can go with the "ignore" lv tag.
In my opinion, Vdsm should not be aware of hosted-engine at all, and hosted-engine should not create files/lvs within Vdsm's storage domains.

When hosted-engine needs to place a volume in a storage domain, it should use Vdsm's api: createVolume to create it, prepareVolume to activate it.

hosted-engine may keep the volume open; vdsm does not deactivate open volumes. Alternatively, hosted-engine can call prepareVolume again if it finds that the volume has been deactivated.

One of the benefits of this approach for hosted-engine is that it provides an abstraction: hosted-engine no longer needs to care whether it's a fileSD or a blockSD.

Another benefit is that if in the future we'd like to have another highly-available VM, say "hosted-neutron", we do not need to invent new special volumes or a new SD format V5.
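The "call prepareVolume again if the volume has been deactivated" fallback suggested above could be sketched like this. `prepare_volume` here is a hypothetical stand-in for the real vdsm prepareVolume call, passed in as a callback so the retry logic stays independent of the API details.

```python
# Sketch of the re-prepare fallback, assuming a `prepare_volume` callback
# that wraps the actual vdsm prepareVolume API (hypothetical stand-in).

import errno
import os

def open_with_reprepare(path, prepare_volume, retries=1):
    """Open the volume; if its path is gone (LV deactivated),
    ask vdsm to prepare it again and retry the open."""
    for attempt in range(retries + 1):
        try:
            return os.open(path, os.O_RDWR)
        except OSError as e:
            if e.errno != errno.ENOENT or attempt == retries:
                raise
            prepare_volume()  # re-activate the volume, then retry
```

The returned file descriptor is then kept open by the agent, which also matches the first half of the proposal: vdsm does not deactivate open volumes.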
Jiri, I have to admit that comment 7 makes several good points. What do you think?
Ok, going to implement what Dan suggests in comment #7. Btw, vdsm already knows about hosted-engine, just not at the storage level.
(In reply to Dan Kenigsberg from comment #7) I think this is the best solution.
(In reply to Dan Kenigsberg from comment #7)
> In my opinion, Vdsm should not be aware of hosted-engine at all, and
> hosted-engine should not create files/lvs within Vdsm's storage domains.
>
> When hosted-engine needs to place a volume in a storage domain, it should
> use Vdsm's api: createVolume to create it, prepareVolume to activate it.
>
> hosted-engine may keep the volume open; vdsm does not deactivate open
> volumes. Alternatively, hosted-engine can call prepareVolume again if it
> finds that the volume as been deactivated.
>
> One of the benefits of this approach for hosted-engine, is that it provides
> an abstraction: hosted-engine no longer needs to care if it's a fileSD or a
> blockSD.

Do you mean that createVolume can create the following files in the domain directory?

hosted-engine.metadata
hosted-engine.lockspace

Or that we should create volumes and then symlink? Or something else?

Volume creation requires a connected pool as far as I know, and we can't have a pool connected while running the engine, only monitored domains. Does prepareVolume work without a pool?

> Another benefit is that if in the future we'd like to have another
> highly-available VM, say "hosted-neutron", we do not need to invent new
> special volumes or a new SD format V5.
In any case, I really think the hosted engine project would benefit greatly from having one person from the storage team (and maybe one from the network team) involved in the development / maintenance of the project.
There are some issues with Dan's proposal:

1) What format will the volume have? Sanlock uses the whole file as it sees fit. The hosted engine agent uses the other file in the same way. Both are about 1MB in size (except on iSCSI, where the VG has a 128MB extent size) and start with zeroed content.

2) Keeping the file/volume open at all times just to prevent VDSM from closing it is error prone and fragile. I would rather expect a flag that tells VDSM not to touch the volume (independently of who created it, or whether it is a proper volume or not) except when an explicit action is requested.

In general the design assumptions are: the hosted engine infrastructure has to work even when VDSM crashes (or is updated) or when the engine dies. So all the volumes/files have to be available, and the agent then makes sure that broker/storage/vdsm/sanlock are all ready before processing the next action in the internal state machine.

There are three things stored in the hosted engine SD: the actual disk for the engine VM and the two metadata files. The metadata files have to have atomic writes at the block level (512B or 4kiB), and nobody is allowed to touch the content except the proper services.

Federico:
> new storage format

I do not think that the specific names should be part of the format. An API to create custom metadata files with arbitrary names would be handy, though.

> were never reviewed by the storage team (at least nobody sought my opinion).

I am pretty sure I discussed the block SD design with you during the VDSM gathering in TLV, and the understanding I got was that VDSM does not touch anything without a proper label. Which seems not to be true, unfortunately... Btw, you really were approached during the initial hosted engine design, and I remember you were attending our daily phone meetings. Although probably not all of them.
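The block-level atomicity requirement above can be illustrated with a short sketch: each report is padded to exactly one sector and written with a single positioned write at a sector-aligned offset, so a reader never sees a torn record. This is a simplified model, not the hosted-engine metadata format; real deployments would also open the file with O_DIRECT, which is omitted here for portability.

```python
# Sketch of sector-granular atomic writes, assuming a 512-byte sector and
# one reserved slot per writer. Not the actual hosted-engine on-disk format.

import os

SECTOR = 512

def write_report(fd, slot, payload):
    """Pad the payload to one full sector and write it in a single
    pwrite at a sector-aligned offset (slot * SECTOR)."""
    if len(payload) > SECTOR:
        raise ValueError("report larger than one sector")
    block = payload.ljust(SECTOR, b"\0")
    os.pwrite(fd, block, slot * SECTOR)

def read_report(fd, slot):
    """Read back one slot and strip the zero padding."""
    return os.pread(fd, SECTOR, slot * SECTOR).rstrip(b"\0")
```

Because each record fits in one sector and is written in one syscall at an aligned offset, the write is atomic at the granularity the storage provides, which is the property the metadata files rely on.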
Also, if we ever decide to support a direct I/O device to hold the metadata, then the volume has to be mounted on all hosted-engine capable hosts. There will be no data corruption, because we do not use any "filesystem" and the algorithms are aware of the fact that it is a shared "whiteboard": everybody publishes its reports to a reserved spot there. Does VDSM support a setup like that?
Shouldn't this be ON_QA? I guess it wasn't referenced in the code for beta1 but it should be fixed there.
(In reply to Sandro Bonazzola from comment #15)
> Shouldn't this be ON_QA? I guess it wasn't referenced in the code for beta1
> but it should be fixed there.

Is that up to me to move it to ON_QA?
(In reply to Jiri Moskovcak from comment #16)
> (In reply to Sandro Bonazzola from comment #15)
> > Shouldn't this be ON_QA? I guess it wasn't referenced in the code for beta1
> > but it should be fixed there.
>
> Is that up to me to move it to ON_QA?

Well, if Bug-Url references the bug, I can move it automatically, but if it's not referenced, the assignee should take care of the bug's life cycle :-)
http://gerrit.ovirt.org/#/c/28237/ does ;)
oVirt 3.5 has been released and should include the fix for this issue.