Bug 1879032
Summary: | If there is no master storage domain, the engine should elect one | |
---|---|---|---
Product: | [oVirt] ovirt-engine | Reporter: | Yedidyah Bar David <didi>
Component: | BLL.Storage | Assignee: | shani <sleviim>
Status: | CLOSED CURRENTRELEASE | QA Contact: | Amit Sharir <asharir>
Severity: | unspecified | Docs Contact: |
Priority: | unspecified | |
Version: | 4.4.1 | CC: | bugs, bzlotnik, eshenitz, mavital, nsoffer, sfishbai, sleviim
Target Milestone: | ovirt-4.4.6 | Flags: | pm-rhel: ovirt-4.4+, mavital: testing_plan_complete-
Target Release: | 4.4.6.4 | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | ovirt-engine-4.4.6.4 | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2021-05-05 05:36:36 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | Storage | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: |
Description
Yedidyah Bar David 2020-09-15 09:08:20 UTC
It makes sense that the engine selects a new master. I wonder why it does not work with the current code; probably this is a result of a bad database change.

Why did you use:

'--he-remove-storage-vm'

And why did it remove the master from the database? I don't think this is a valid operation. It sounds like a bad database change that the system is not ready to handle yet.

(In reply to Nir Soffer from comment #1)
> It makes sense that engine select new master. I wonder why it does not work
> with current code, probably this is a result of bad database change.
>
> Why did you use:
>
> '--he-remove-storage-vm'
>
> And why it removed the master from the database?

It removes the engine VM and the hosted_storage domain. It was added [1] for bug 1240466. If you think this code is broken, please advise on what to do to fix it.

> I don't think this is valid operation. It sounds like bad database change
> that the system is not ready to handle yet.

What do you suggest as an alternative?

The specific flow this was used for, in the report leading to the opening of this bug, is migrating from a hosted-engine setup to a standalone/bare-metal one. Do you see risk in this, for this flow?

Please remember that when doing this db manipulation, the engine is dead - the old engine should not be used anymore (or does not even exist anymore), and all we have is the db, which was restored from backup.

[1] https://gerrit.ovirt.org/#/q/Id61ae0b05a75018ded532d7a0c38c15b4b885803,n,z

(In reply to Yedidyah Bar David from comment #2)
> (In reply to Nir Soffer from comment #1)

I don't think we support a system without a master domain. We have a way to create a new master when the current master is not accessible, but the engine is probably not ready to handle a state where there is no master domain in the db.

I'm not sure how it can be done on the engine side; maybe Benny or Eyal can recommend a way to remove the hosted storage domain in a correct way.

With bug 1576923 we should have an easy way to select a new master domain.
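The "no master domain in the db" state discussed above can be inspected directly on the engine machine. The following is a sketch only: the `engine-psql.sh` path, the `storage_domains` view, and the assumption that `storage_domain_type = 0` denotes the master domain should all be verified against your ovirt-engine version's schema.

```shell
#!/bin/sh
# Sketch: list each storage domain's recorded type in the engine database.
# Assumptions (verify against your ovirt-engine version): engine-psql.sh
# lives at this path, the storage_domains view exists, and
# storage_domain_type = 0 denotes the master data domain.
PSQL=/usr/share/ovirt-engine/dbscripts/engine-psql.sh
QUERY="SELECT storage_name, storage_domain_type FROM storage_domains;"

if [ -x "$PSQL" ]; then
    "$PSQL" -c "$QUERY"
else
    # Not on an engine machine; nothing to query.
    echo "engine-psql.sh not found; run this on the engine machine"
fi
```

If no row shows the master type, the engine is in exactly the unsupported state this bug describes.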
This can be used to select a new master domain.

I think this one was covered by Bella as a fix for this bug: https://bugzilla.redhat.com/1836034. With that fix, the hosted_storage domain can become the master. It seems to be applicable from ovirt-engine-4.4.3. The fix is available here: https://gerrit.ovirt.org/#/c/110718/

What do you think?

(In reply to shani from comment #4)
> I think this one was covered by Bella as a fix for this bug:
> https://bugzilla.redhat.com/1836034.
> By that, the hosted-storage can became the master.
> It seems to be applicable from ovirt-engine-4.4.3.

This may be the reason why this fails now. If the hosted-engine storage domain cannot be master, we can safely delete it from the database. Once this domain can be master, we cannot delete it from the db without setting another domain as master and updating the other domain's role on storage. This is tricky, since it cannot be done with a running SPM.

Is bug 1836034 about changing a non-master hosted_storage to master? Or is it also related to it being master originally, when created by a new HE deployment? IIUC only the former - it has been the master, IIUC, since we moved to node-zero HE deployment, where it is created by the engine (and not by HE code calling vdsm code directly). If I got it right, then bug 1836034 is not very relevant to the current bug, and the current bug is applicable since node-zero.

QE doesn't have enough capacity to verify this bug on the 4.4.5 release.

The bug was verified on environment hosted-engine-11:

[root@hosted-engine-11 ~]# rpm -q ovirt-engine
ovirt-engine-4.4.6.3-0.8.el8ev.noarch
[root@oncilla05 ~]# rpm -q vdsm
vdsm-4.40.60.3-1.el8ev.x86_64
[root@hosted-engine-11 ~]# rpm -qa | grep release
rhv-release-4.4.6-4-001.noarch
redhat-release-8.4-0.6.el8.x86_64

Full procedure that was done in the bug verification flow (approved by Yedidyah Bar David):

1. Deploy hosted-engine in a way that causes its hosted_storage to be the master storage domain.
2. On the engine, take a backup with: engine-backup.
3. On the host, run: hosted-engine --set-maintenance --mode=global
4. On the engine, run: engine-backup --he-remove-storage-vm --mode=restore --provision-all-databases --file=/var/lib/ovirt-engine-backup/ovirt-engine-backup-20210422144955.backup
5. On the engine, run: engine-cleanup
6. On the engine, run: engine-setup (yes to all options).
7. In order to check the new status in the UI, run on the engine: hosted-engine --vm-start (before entering the UI, check that it is up and running using: hosted-engine --vm-status).

Verification summary and conclusions: after completing the flow mentioned above, the UI showed the correct status - the engine selected some other storage domain as master (approved by Yedidyah Bar David). The expected output and the actual output were identical and correct. Bug verified.

This bugzilla is included in the oVirt 4.4.6 release, published on May 4th 2021. Since the problem described in this bug report should be resolved in the oVirt 4.4.6 release, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.

Due to QE capacity, we are not going to cover this issue in our automation.
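For reference, the command portion of the verification flow above (steps 3-7) can be collected into one annotated script. This is a sketch only: in practice the steps run on two different machines (the HE host and the engine VM), the backup filename is the example taken from the comment above, and the script skips itself when the hosted-engine CLI is not present.

```shell
#!/bin/sh
# Sketch of the verification flow from this bug (steps 3-7 above).
# Caution: the steps run on different machines in practice; they are shown
# together here only for readability. The backup path is the example
# filename from the verification comment; substitute your own.
BACKUP=/var/lib/ovirt-engine-backup/ovirt-engine-backup-20210422144955.backup

if command -v hosted-engine >/dev/null 2>&1; then
    # On the host: enter global maintenance so HA agents leave the VM alone.
    hosted-engine --set-maintenance --mode=global

    # On the engine: restore while dropping the HE VM and the hosted_storage
    # domain, then clean up and re-run setup (as in steps 4-6).
    engine-backup --he-remove-storage-vm --mode=restore \
        --provision-all-databases --file="$BACKUP"
    engine-cleanup
    engine-setup

    # Start the HE VM and confirm it is up before checking the UI.
    hosted-engine --vm-start
    hosted-engine --vm-status
else
    echo "hosted-engine CLI not found; run these steps on a real HE setup"
fi
```

After the flow completes, the UI should show that the engine elected some other storage domain as master, which is the behavior this bug's fix provides.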