Description of problem:
When VDSM is stopped, it deactivates all logical volumes in its volume group, and on startup it reactivates only the LVs it knows about, leaving unknown LVs deactivated. This breaks the hosted-engine ha-agent and ha-broker: these services run as vdsm, so they cannot re-activate their own LVs.

Version-Release number of selected component (if applicable):
vdsm-4.14.1-275.git8ddfbf0.el6.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Try to deploy hosted-engine on iSCSI.
2. When the engine deploys the host, vdsm is restarted and deactivates the LVs.
3. The ha-broker fails to access its storage.

Actual results:
vdsm deactivates all LVs in the volume group.

Expected results:
vdsm deactivates only its own LVs.
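The shutdown/startup pattern described above can be sketched as follows. This is an illustrative model only, not vdsm's actual code; all function and LV names here are hypothetical.

```python
# Illustrative sketch of the reported behavior -- NOT vdsm's real code.
# On stop, every LV in the VG is deactivated; on start, only LVs that
# vdsm knows about are reactivated, stranding the hosted-engine LVs.

def deactivate_all(vg_lvs):
    """Model of vdsm stop: mark every LV in the VG inactive."""
    return {lv: False for lv in vg_lvs}

def activate_known(state, known_lvs):
    """Model of vdsm start: reactivate only the LVs vdsm recognizes."""
    for lv in state:
        if lv in known_lvs:
            state[lv] = True
    return state

vg = ["master", "ids", "leases",
      "hosted-engine.metadata", "hosted-engine.lockspace"]
state = activate_known(deactivate_all(vg),
                       known_lvs={"master", "ids", "leases"})

# The hosted-engine LVs stay deactivated, so the ha-broker loses storage:
print([lv for lv, active in state.items() if not active])
```

Running this prints the two hosted-engine LVs as the ones left inactive, which is exactly the failure the ha-broker hits in step 3.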
Well, all lvs in vdsm's vg do belong to vdsm. The real issue is that hosted-engine is using vdsm's vg for its own lvs. So this is not a bug but the expected behavior of the system.
Can we come up with an lv tag that can be used to mark lvs that should not be managed by vdsm (even if they're part of a storage domain vg)?
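The tag-based idea above could look like the following sketch: during bulk deactivation, vdsm would skip any LV carrying an "unmanaged" tag. The tag name and function are hypothetical, not an agreed vdsm convention.

```python
# Sketch of the proposed tag filter -- the tag name is hypothetical.
# LVM tags can be attached with `lvchange --addtag`; here we just model
# the filtering step that a bulk deactivation would apply.

UNMANAGED_TAG = "OVIRT_VOL_UNMANAGED"

def lvs_to_deactivate(lv_tags):
    """lv_tags: mapping of LV name -> set of LVM tags on that LV.
    Return only the LVs vdsm is allowed to deactivate."""
    return [lv for lv, tags in lv_tags.items()
            if UNMANAGED_TAG not in tags]

print(lvs_to_deactivate({
    "leases": set(),
    "hosted-engine.metadata": {UNMANAGED_TAG},
}))
```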
(In reply to Federico Simoncelli from comment #2)
> Can we come up with an lv tag that can be used to mark lvs that should not
> be managed by vdsm (even if they're part of a storage domain vg)?

This seems to be the simplest solution, but I think we should first understand why the special lvs must be in vdsm's vg.
(In reply to Nir Soffer from comment #3)
> (In reply to Federico Simoncelli from comment #2)
> > Can we come up with an lv tag that can be used to mark lvs that should not
> > be managed by vdsm (even if they're part of a storage domain vg)?
>
> This seems to be the simplest solution, but I think we should first
> understand why the special lvs must be in vdsm vg.

I'd expect Engine to be able to create this (via vdsm) in future versions[1]. We don't want storage that is not managed via vdsm.

[1] For example, moving the hosted engine VM via live storage migration from one domain to another.
(In reply to Itamar Heim from comment #4)
> I'd expect Engine to be able to create this (via vdsm) in future
> versions[1]. we don't want storage not managed via vdsm.
>
> [1] for example, move the hosted engine VM via live storage migration from
> one domain to another.

[1] is not limited to block domains. If you want to use a new storage domain for hosted engine, it needs to be prepared for that task (for example, nfs domains also need additional special files). We'll probably be able to address that with a new storage domain version.

The problem is that (as far as I know) the LVs and the special files were never reviewed by the storage team (at least nobody sought my opinion). Therefore, at this time I cannot guarantee 100% that what we'll officially agree on will provide the same files/lvs.

What we can do right now is try not to interfere with additional files/lvs placed by other applications such as hosted engine.
I just spoke with Jiri and the two files/lvs are:

hosted-engine.metadata
hosted-engine.lockspace

I don't mind supporting these in storage domain V4 (creation and activation). As far as the current problem (activation) is concerned, we can start supporting them early by adding them to the special lvs in VDSM.

If we're committing to have this in V4 then we need to be sure that this format is set in stone and won't change. If you're not sure yet about the format, or you want to be more flexible, then we can go with the "ignore" lv tag.
In my opinion, Vdsm should not be aware of hosted-engine at all, and hosted-engine should not create files/lvs within Vdsm's storage domains.

When hosted-engine needs to place a volume in a storage domain, it should use Vdsm's api: createVolume to create it, prepareVolume to activate it.

hosted-engine may keep the volume open; vdsm does not deactivate open volumes. Alternatively, hosted-engine can call prepareVolume again if it finds that the volume has been deactivated.

One of the benefits of this approach for hosted-engine is that it provides an abstraction: hosted-engine no longer needs to care whether it's a fileSD or a blockSD.

Another benefit is that if in the future we'd like to have another highly-available VM, say "hosted-neutron", we do not need to invent new special volumes or a new SD format V5.
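The "call prepareVolume again if the volume has been deactivated" fallback suggested above could be sketched like this. `prepare_volume` here is a hypothetical stand-in for the real vdsm prepareVolume call, passed in as a callback so the retry logic stays independent of the API details.

```python
# Sketch of the re-prepare fallback, assuming a `prepare_volume` callback
# that wraps the actual vdsm prepareVolume API (hypothetical stand-in).

import errno
import os

def open_with_reprepare(path, prepare_volume, retries=1):
    """Open the volume; if its path is gone (LV deactivated),
    ask vdsm to prepare it again and retry the open."""
    for attempt in range(retries + 1):
        try:
            return os.open(path, os.O_RDWR)
        except OSError as e:
            if e.errno != errno.ENOENT or attempt == retries:
                raise
            prepare_volume()  # re-activate the volume, then retry
```

The returned file descriptor is then kept open by the agent, which also matches the first half of the proposal: vdsm does not deactivate open volumes.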
Jiri, I have to admit that comment 7 makes several good points. What do you think?
Ok, going to implement what Dan suggests in comment #7. Btw, vdsm already knows about hosted-engine, just not at the storage level.
(In reply to Dan Kenigsberg from comment #7) I think this is the best solution.
(In reply to Dan Kenigsberg from comment #7)
> In my opinion, Vdsm should not be aware of hosted-engine at all, and
> hosted-engine should not create files/lvs within Vdsm's storage domains.
>
> When hosted-engine needs to place a volume in a storage domain, it should
> use Vdsm's api: createVolume to create it, prepareVolume to activate it.
>
> hosted-engine may keep the volume open; vdsm does not deactivate open
> volumes. Alternatively, hosted-engine can call prepareVolume again if it
> finds that the volume as been deactivated.
>
> One of the benefits of this approach for hosted-engine, is that it provides
> an abstraction: hosted-engine no longer needs to care if it's a fileSD or a
> blockSD.

Do you mean that createVolume can create the following files in the domain directory?

hosted-engine.metadata
hosted-engine.lockspace

Or that we should create volumes and then symlink? Or something else?

Volume creation requires a connected pool as far as I know, and we can't have a pool connected while running the engine, only monitored domains. Does prepareVolume work without a pool?

> Another benefit is that if in the future we'd like to have another
> highly-available VM, say "hosted-neutron", we do not need to invent new
> special volumes or a new SD format V5.
In any case, I really think the hosted engine project would benefit greatly from having one person from the storage team (and maybe one from the network team) involved in the development / maintenance of the project.
There are some issues with Dan's proposal:

1) What format will the volume have? Sanlock uses the whole file as it sees fit. The hosted engine agent uses the other file in the same way. Both are about 1MB in size (except on iSCSI, where the VG has a 128MB extent size) and start with zeroed content.

2) Keeping the file/volume open at all times just to prevent VDSM from closing it is error prone and fragile. I would rather expect a flag that tells VDSM not to touch the volume (independently of who created it, or whether it is a proper volume or not) except when an explicit action is requested.

In general the design assumptions are: the hosted engine infrastructure has to work even when VDSM crashes (or is updated) or when the engine dies. So all the volumes/files have to be available, and the agent then makes sure that broker/storage/vdsm/sanlock are all ready before processing the next action in the internal state machine.

There are three things stored in the hosted engine SD: the actual disk for the engine VM and the two metadata files. The metadata files have to have atomic writes at the block level (512B or 4kiB), and nobody is allowed to touch the content except the proper services.

Federico:
> new storage format

I do not think that the specific names should be part of the format. An API to create custom metadata files with arbitrary names would be handy, though.

> were never reviewed by the storage team (at least nobody sought my opinion).

I am pretty sure I discussed the block SD design with you during the VDSM gathering in TLV, and the understanding I got was that VDSM does not touch anything without a proper label. Which seems not to be true, unfortunately... Btw, you really were approached during the initial hosted engine design, and I remember you were attending our daily phone meetings. Although probably not all of them.
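The block-level atomicity requirement above can be illustrated with a short sketch: each report is padded to exactly one sector and written with a single positioned write at a sector-aligned offset, so a reader never sees a torn record. This is a simplified model, not the hosted-engine metadata format; real deployments would also open the file with O_DIRECT, which is omitted here for portability.

```python
# Sketch of sector-granular atomic writes, assuming a 512-byte sector and
# one reserved slot per writer. Not the actual hosted-engine on-disk format.

import os

SECTOR = 512

def write_report(fd, slot, payload):
    """Pad the payload to one full sector and write it in a single
    pwrite at a sector-aligned offset (slot * SECTOR)."""
    if len(payload) > SECTOR:
        raise ValueError("report larger than one sector")
    block = payload.ljust(SECTOR, b"\0")
    os.pwrite(fd, block, slot * SECTOR)

def read_report(fd, slot):
    """Read back one slot and strip the zero padding."""
    return os.pread(fd, SECTOR, slot * SECTOR).rstrip(b"\0")
```

Because each record fits in one sector and is written in one syscall at an aligned offset, the write is atomic at the granularity the storage provides, which is the property the metadata files rely on.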
Also, if we ever decide to support a direct I/O device to hold the metadata, then the volume has to be mounted on all hosted-engine capable hosts. There will be no data corruption, because we do not use any "filesystem" and the algorithms are aware of the fact that it is a shared "whiteboard": everybody publishes its reports to a reserved spot there. Does VDSM support a setup like that?
Shouldn't this be ON_QA? I guess it wasn't referenced in the code for beta1 but it should be fixed there.
(In reply to Sandro Bonazzola from comment #15)
> Shouldn't this be ON_QA? I guess it wasn't referenced in the code for beta1
> but it should be fixed there.

Is that up to me to move it to ON_QA?
(In reply to Jiri Moskovcak from comment #16)
> (In reply to Sandro Bonazzola from comment #15)
> > Shouldn't this be ON_QA? I guess it wasn't referenced in the code for beta1
> > but it should be fixed there.
>
> Is that up to me to move it to ON_QA?

Well, if Bug-Url references the bug, I can move it automatically, but if it's not referenced, the assignee should take care of the bug's life cycle :-)
http://gerrit.ovirt.org/#/c/28237/ does ;)
oVirt 3.5 has been released and should include the fix for this issue.