Bug 1258754
Summary: | [Docs] - Add steps for cleaning up a failed HE deployment | |
---|---|---|---
Product: | Red Hat Enterprise Virtualization Manager | Reporter: | Ying Cui <ycui>
Component: | Documentation | Assignee: | Julie <juwu>
Status: | CLOSED CURRENTRELEASE | QA Contact: | Byron Gravenorst <bgraveno>
Severity: | high | Docs Contact: |
Priority: | high | |
Version: | 3.5.4 | CC: | adahms, amureini, cshao, dfediuck, fdeutsch, gklein, istein, juwu, laravot, lbopf, leiwang, lsurette, mgoldboi, mkalinin, nsednev, nsoffer, rbalakri, rbarry, srevivo, stirabos, ycui, ylavi, yzhao
Target Milestone: | ovirt-3.6.6 | |
Target Release: | --- | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | Bug Fix
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2016-07-12 05:38:07 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | Docs | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Attachments: | | |

Description
Ying Cui
2015-09-01 08:13:21 UTC
This issue should occur on RHEL as well. On RHEL, however, the user can yum remove the relevant packages, so the impact is small. On RHEV-H, we have to reinstall the whole RHEV-H until we find a workaround to clean up the host.

Created attachment 1068874 [details]
sosreport

Created attachment 1068875 [details]
varlog.tar.bz2

Any suggested workaround?

We have some hints here: http://www.ovirt.org/Hosted_Engine_Howto#Recoving_from_failed_install

The issue is here:

    2015-09-01 07:23:13 DEBUG otopi.context context._executeMethod:152 method exception
    Traceback (most recent call last):
      File "/usr/lib/python2.6/site-packages/otopi/context.py", line 142, in _executeMethod
      File "/usr/share/ovirt-hosted-engine-setup/plugins/ovirt-hosted-engine-setup/vm/configurevm.py", line 112, in _late_setup
    RuntimeError: Cannot setup Hosted Engine with other VMs running

For sure the user has to destroy the previous engine VM with
hosted-engine --vm-poweroff
and manually clean up the shared storage in order to try again.
On RHEV-H, almost all the config files are persisted only at the end of the deployment process, so nothing should really be there and a second attempt should simply work.

(In reply to Simone Tiraboschi from comment #5)
> For sure the user has to destroy the previous engine VM with
> hosted-engine --vm-poweroff
> and manually clean up the shared storage in order to try again.

This workaround works well on RHEV-H after aborting HE setup.

(In reply to Ying Cui from comment #6)
> (In reply to Simone Tiraboschi from comment #5)
> > For sure the user has to destroy the previous engine VM with
> > hosted-engine --vm-poweroff
> > and manually clean up the shared storage in order to try again.
>
> This workaround works well on RHEV-H after aborting HE setup.

We should add a button in the TUI to fix this, but I think this is not a blocker. Can you please add a release note for this?

Moran, is a 3.5.5 target correct in your view?

A button will not have the right context in the current page. I'd rather pull it into 3.6.0, where we have refactored the page and a submenu/dialog is available for these kinds of actions.

We have a workaround that requires a drop to shell, which customers are not supposed to do without GSS. Moran, please review.

Simone, from comment 0:

    RuntimeError: Cannot setup Hosted Engine with other VMs running
    2015-09-01 07:23:13 ERROR otopi.context context._executeMethod:161 Failed to execute stage 'Environment setup': Cannot setup Hosted Engine with other VMs running
    2015-09-01 07:23:13 DEBUG otopi.context context.dumpEnvironment:490 ENVIRONMENT DUMP - BEGIN

Should he-setup bring down a VM it has spawned if the HE setup is getting aborted?

(In reply to Fabian Deutsch from comment #12)
> Should he-setup bring down a VM it has spawned if the HE setup is getting
> aborted?

No, in case of a failure it's currently up to the user to do so with the hosted-engine --vm-poweroff command.
It's also up to the user to clean up the storage if needed.

Okay, thanks. I think we'll stick to how this works on RHEL as well.

Julie, can a note be added to the right documentation to tell the reader that the RHEL guidelines should be followed to clean up a host in case of an aborted hosted-engine setup?

In our docs, we don't tell users how to clean up a failed HE setup. I think the assumed knowledge was to do a fresh installation of RHEL or RHEV-H in case of a HE setup failure. In our testing, if something goes wrong, we've always spun up a fresh install to proceed with HE setup, so I'd like some clarification on the supported way forward: do we tell users to always start from a fresh installation, or do we provide clean-up procedures for RHEL and RHEV-H? Maybe someone from the support team can weigh in as well?

Hi Julie, indeed we are missing some important info in the HE guide. See bug 1293971.
I think it is not clear what to do on a failed deployment, and I personally think it should be mentioned somewhere, either in a KCS or in the official docs.
I could not find anything related in the knowledgebase and I would be happy to write a KCS, but it is not clear to me what the process should be.
(In the long term, of course, it would be preferable to have some clean-up tool.)

Simone, can you please specify the steps for cleaning up a failed HE deployment?

We have more than one RFE and we are working on it for 4.0.

In the meantime:

    hosted-engine --vm-poweroff # to power off the engine VM if running
    systemctl stop ovirt-ha-agent; systemctl stop ovirt-ha-broker; systemctl stop vdsmd
    /bin/rm /etc/ovirt-hosted-engine/hosted-engine.conf
    /bin/rm /etc/ovirt-hosted-engine/answers.conf
    /bin/rm /etc/vdsm/vdsm.conf
    /bin/rm /etc/pki/vdsm/*/*.pem
    /bin/rm /etc/pki/CA/cacert.pem
    /bin/rm /etc/pki/libvirt/*.pem
    /bin/rm /etc/pki/libvirt/private/*.pem

This only acts on the single host. The hosted-engine image is on the shared storage (an iSCSI or FC LUN, an NFS share, or a gluster volume in 3.6), and cleaning that is currently up to the user, since it lives on another system and may also be used by other hosts.

Then VDSM doesn't automatically disconnect the storage server, and the disconnectStorageServer verb, if explicitly called, can also finish with some leftover LVM volumes that could cause issues on the next attempt, so the easiest (but really ugly!!!) way is to reboot the host before the next attempt.
We have an RFE also on this:
https://bugzilla.redhat.com/show_bug.cgi?id=1149738

In theory it could be possible to redeploy with the answer file from previous attempts, but this also contains the LUN UUID and, depending on how the user cleaned the LUN, it may no longer be valid, so here too restarting from scratch is the safer option.

Assigning to Julie for review.

Julie, we'll just need to review the KCS Solution attached to this bug, and then publish it when it's ready.

(In reply to Simone Tiraboschi from comment #17)
> Then VDSM doesn't automatically disconnect the storage server, and the
> disconnectStorageServer verb, if explicitly called, can also finish with some
> leftover LVM volumes that could cause issues on the next attempt, so the
> easiest (but really ugly!!!) way is to reboot the host before the next attempt.
> We have an RFE also on this:
> https://bugzilla.redhat.com/show_bug.cgi?id=1149738

Not sure if a reboot would be sufficient. For instance, if the storage is iSCSI, we would need to clear the /var/lib/iscsi directory.
Maybe it is worth asking a storage person for advice if we want to publish this solution through the official documentation.

At this point I am publishing this:
https://access.redhat.com/solutions/2121581

Please provide the info once you talk with the storage team.

Liron - can you take a look please?

It makes sense to me, Nir - any other opinion?

Documentation Link: https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Virtualization/3.6/html/Self-Hosted_Engine_Guide/Cleaning_Up_a_Failed_Self-hosted_Engine_Deployment.html

(In reply to Liron Aravot from comment #26)
> It makes sense to me, Nir - any other opinion?

No
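
For readers who want the per-host steps from comment #17 in one place, here is a minimal sketch that wraps them in a script. It is not an official tool. The DRY_RUN variable and the run and cleanup_failed_he helpers are hypothetical names introduced for illustration, and the -f flag on /bin/rm is an addition so that files which are already missing do not produce errors. By default the script only prints what it would do.

```shell
#!/bin/sh
# Sketch of the per-host cleanup steps from comment #17.
# Dry-run by default: set DRY_RUN=0 to actually execute the commands.
# DRY_RUN, run and cleanup_failed_he are hypothetical names, not part of
# any oVirt or RHEV tooling.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

cleanup_failed_he() {
    # Power off the engine VM if it is still running.
    run hosted-engine --vm-poweroff

    # Stop the HA services and VDSM.
    run systemctl stop ovirt-ha-agent
    run systemctl stop ovirt-ha-broker
    run systemctl stop vdsmd

    # Remove the configuration files and certificates named in the thread.
    # Globs expand here if they match; otherwise the pattern stays literal
    # and rm -f ignores it (the thread itself uses plain /bin/rm).
    for path in \
        /etc/ovirt-hosted-engine/hosted-engine.conf \
        /etc/ovirt-hosted-engine/answers.conf \
        /etc/vdsm/vdsm.conf \
        /etc/pki/vdsm/*/*.pem \
        /etc/pki/CA/cacert.pem \
        /etc/pki/libvirt/*.pem \
        /etc/pki/libvirt/private/*.pem
    do
        run /bin/rm -f "$path"
    done
}

cleanup_failed_he
```

Running the script as-is prints the commands for review; running it as root with DRY_RUN=0 executes them. As noted in the thread, this only touches the single host: the shared storage still has to be cleaned by the user, and a reboot before the next attempt is the suggested way to get rid of leftover storage connections.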