Bug 1245143
Summary: | [RHEV-H] Failed to deploy Hosted Engine on a storage domain as an additional host
---|---
Product: | Red Hat Enterprise Virtualization Manager
Reporter: | cshao <cshao>
Component: | ovirt-node-plugin-hosted-engine
Assignee: | Fabian Deutsch <fdeutsch>
Status: | CLOSED ERRATA
QA Contact: | cshao <cshao>
Severity: | high
Docs Contact: |
Priority: | urgent
Version: | 3.5.4
CC: | acanan, ahino, amureini, bmcclain, cshao, fdeutsch, gklein, huiwa, istein, leiwang, lsurette, nsednev, rbarry, sbonazzo, stirabos, yaniwang, ycui, ykaul
Target Milestone: | ovirt-3.6.0-rc
Keywords: | Reopened, ZStream
Target Release: | 3.6.0
Hardware: | Unspecified
OS: | Unspecified
Whiteboard: |
Fixed In Version: | ovirt-node-plugin-hosted-engine-0.3.0-1.el7ev
Doc Type: | Bug Fix
Doc Text: |
Story Points: | ---
Clone Of: |
: | 1250400
Environment: |
Last Closed: | 2016-03-09 14:32:54 UTC
Type: | Bug
Regression: | ---
Mount Type: | ---
Documentation: | ---
CRM: |
Verified Versions: |
Category: | ---
oVirt Team: | Node
RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | ---
Target Upstream Version: |
Embargoed: |
Bug Depends On: | 1246863, 1249118, 1259247, 1260470, 1267437, 1271976, 1280268
Bug Blocks: | 1059435, 1250199, 1250400
Attachments: |
Can this bug also be reproduced by running ovirt-hosted-engine-setup on a plain RHEL host?

I think you have a configuration issue:

2015-07-21 08:56:32 DEBUG otopi.plugins.ovirt_hosted_engine_setup.vdsmd.cpu cpu._customization:137 Compatible CPU models are: [u'model_athlon', u'model_Opteron_G3', u'model_Opteron_G1', u'model_phenom', u'model_Opteron_G2']
2015-07-21 08:56:32 DEBUG otopi.plugins.otopi.dialog.human dialog.__logString:215 DIALOG:SEND The following CPU types are supported by this host:
2015-07-21 08:56:32 DEBUG otopi.plugins.otopi.dialog.human dialog.__logString:215 DIALOG:SEND - model_Opteron_G3: AMD Opteron G3
2015-07-21 08:56:32 DEBUG otopi.plugins.otopi.dialog.human dialog.__logString:215 DIALOG:SEND - model_Opteron_G2: AMD Opteron G2
2015-07-21 08:56:32 DEBUG otopi.plugins.otopi.dialog.human dialog.__logString:215 DIALOG:SEND - model_Opteron_G1: AMD Opteron G1
2015-07-21 08:56:32 DEBUG otopi.context context._executeMethod:152 method exception
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/otopi/context.py", line 142, in _executeMethod
  File "/usr/share/ovirt-hosted-engine-setup/plugins/ovirt-hosted-engine-setup/vdsmd/cpu.py", line 194, in _customization
RuntimeError: Invalid CPU type specified: None

You're failing because either no CPU type has been specified in the answer file, or the hardware is not compatible with the answer file provided by rhev-h.

Moving to node

It doesn't look like we specify the CPU type, thus it is probably the answer to an interactive question:

[fabiand@tee ovirt-node-plugin-hosted-engine (ovirt-3.5)]$ git grep -i cpu
[fabiand@tee ovirt-node-plugin-hosted-engine (ovirt-3.5)]$

Chen, can you please provide the answer file, or tell us if you answered the CPU type question?
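To make the two failure causes named above easier to tell apart, here is a minimal Python sketch that inspects an answer file and reports whether the CPU type is missing or merely incompatible with the models the host offers. This is not the code from `cpu.py`. The key name `OVEHOSTED_VDSM/cpu`, the `type:value` encoding of values (for example `str:model_Opteron_G3` or `none:None`), and the function names `read_cpu_type` and `check_answer_file` are all assumptions made for illustration.

```python
import configparser

# Assumed key under which the CPU type is stored; not confirmed by this report.
CPU_KEY = "OVEHOSTED_VDSM/cpu"


def read_cpu_type(answer_file_text):
    """Return the CPU type found in the answer file text, or None if absent."""
    # Only "=" separates key and value, so ":" inside a value is kept intact.
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.optionxform = str  # keep keys case-sensitive
    parser.read_string(answer_file_text)
    for section in parser.sections():
        raw = parser[section].get(CPU_KEY)
        if raw is None:
            continue
        # Assumed value encoding "type:value", e.g. "str:model_Opteron_G3".
        type_tag, sep, value = raw.partition(":")
        if not sep:
            return raw or None  # no type prefix: take the value as-is
        if type_tag == "none" or not value:
            return None
        return value
    return None


def check_answer_file(answer_file_text, compatible_models):
    """Classify the answer file as 'missing', 'incompatible' or 'ok'."""
    cpu = read_cpu_type(answer_file_text)
    if cpu is None:
        return "missing"  # corresponds to "Invalid CPU type specified: None"
    if cpu not in compatible_models:
        return "incompatible"
    return "ok"
```

Called with the text of the appended answer file and the list of compatible models from the setup log, a result of `"missing"` would point at an incomplete answer file rather than at the hardware.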
(In reply to Fabian Deutsch from comment #5)
> It doesn't look like we specify the CPU type, thus it is probably the answer
> to an interactive question:
>
> [fabiand@tee ovirt-node-plugin-hosted-engine (ovirt-3.5)]$ git grep -i cpu
> [fabiand@tee ovirt-node-plugin-hosted-engine (ovirt-3.5)]$
>
>
> Chen, can you please provide the answer file, or tell us if you answered the
> CPU type question?

I used the default CPU type: model_Opteron_G3: AMD Opteron G3

Did you just hit <Return> or did you enter anything?

(In reply to Fabian Deutsch from comment #10)
> Did you just hit <Return> or did you enter anything?

Yes, just hit <Return>.

The issue is there:

2015-07-21 09:09:04 DEBUG otopi.plugins.otopi.dialog.human dialog.__logString:215 DIALOG:SEND The specified storage location already contains a data domain. Is this an additional host setup (Yes, No)[Yes]?
2015-07-21 09:09:12 INFO otopi.plugins.ovirt_hosted_engine_setup.storage.storage storage._handleHostId:123 Installing on additional host

So it's an additional host, and some values (for instance the CPU type) are no longer asked because they have to match the configuration of the other hosts (otherwise live migration will not work and HA will be faulty).
Normally on additional hosts we ask to download the answer file from one of the other hosts, but here you had already appended one:

2015-07-21 09:09:23 DEBUG otopi.context context.dumpEnvironment:500 ENV CORE/configFileAppend=str:':/tmp/tmpM3gp2K'

so we assumed (maybe we could improve here!) that it was correctly and completely generated on the first host, while yours is missing the CPU type; hence the issue.

Sorry, not sure about closing. Probably we could still ask to download the answer file if the one you passed is not complete.

According to the offline discussion a possible fix is to introduce another question to ask the user if the attached file (using configappend) is the answer file from the other host.
The complete fix will also require a change on the node side to set this answer to "no", because in the RHEV-H flow we cannot attach the answer file from that other host.

(In reply to Fabian Deutsch from comment #14)
> According to the offline discussion a possible fix is to introduce another
> question to ask the user if the attached file (using configappend) is the
> answer file from the other host.
>
> The complete fix will also require a change on the node side to set this
> answer to "no", because in the RHEV-H flow we can not attach the answer file
> from that other host.

Just a comment: in the past, we discussed presenting users with a checkbox (or some other UI element) asking whether it's an additional host. That never got added, but the option to pull the answer file from the other host and use it is potentially there.

The most recent patches are preparing the following solution:
A new button is added to the RHEV-H TUI to start the deployment of an additional host. That button will trigger the deployment as described here:
[0] https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Virtualization/3.5/html/Installation_Guide/Installing_Additional_Hosts_to_a_Self-Hosted_Environment.html

Note: The TUI layout will also be changed a bit, to clarify which items are relevant for the first host deployment and which for the additional host deployment.

This approach assumes that SSH is enabled and a root password is set on the first deployed RHEV-H host with self-hosted engine. This can be achieved by going to the RHEV-M page, setting a password in the shown fields and saving that page.

Assuming the patches are applied, the flow looks as follows:
1. Install first RHEV-H host
2. Configure networking and setup HE on the first host
3. After the HE setup: Go to the RHEV-M/Engine page, set a password and hit "<Save / Register>"
4. Install an additional RHEV-H host
5. Go to the Hosted Engine page and select "Start additional host setup"
6. Proceed as described in the docs for RHEL-H [0]

The benefit is that the flow is the same on RHEL and RHEV-H.
It still has to be verified that this works as expected.

Created attachment 1058029 [details]
Layout of the RHEV-M page
This screenshot shows the page where the password needs to be set to enable this flow.
Created attachment 1058030 [details]
Layout of the Hosted Engine page
This screenshot shows the new layout of the hosted engine page to support both flows.
The flow laid out in comment 18 was tested and it seems to work.
Created attachment 1058670 [details]
he6.7-software-iscsi-failed
Let me note that the flow of setting up the initial HE host is not changed or directly affected by this change.

A separate bug was opened to track the software iSCSI issue:
Bug 1251901 - [RHEV-H] Failed to deploy Hosted Engine through specify software iscsi as storage.

This bug is covering the basic flow to add an additional host to the HE setup. The bug in comment 34 is covering the still non-working case related to the CPU types. According to comment 27, comment 31 and comment 33, this bug can be moved to verified.

This bug is blocked by bug 1267437, as it blocks HE setup. I will verify this bug after 1267437 is fixed.

This bug is still blocked by bug 1271976, as it blocks HE setup. I will verify this bug after 1271976 is fixed.

This bug is blocked by new bug 1280268, as the HE VM cannot start up automatically after HE has been configured successfully. I will verify this bug after 1280268 is fixed.

Test version:
rhev-hypervisor7-7.2-20151210.1
ovirt-node-3.6.0-0.24.20151209gitc0fa931.el7ev.noarch
ovirt-node-plugin-hosted-engine-0.3.0-4.el7ev.noarch
ovirt-hosted-engine-setup-1.3.1.2-1.el7ev.noarch
ovirt-hosted-engine-ha-1.3.3.3-1.el7ev.noarch

Test steps:
1. Install first RHEV-H host
2. Configure networking and setup HE on the first host
3. Specify NFS storage during storage configuration.
4. After the HE setup: Go to the RHEV-M/Engine page, set a password and hit "<Save / Register>"
5. Install an additional RHEV-H host
6. Go to the Hosted Engine page and select "Add this host to an existing group"
7. Proceed as described with correct steps.
8. Reboot host.

Test result:
Both hosts can still auto-register to HE after the reboot; persistence is working correctly.

So the bug is fixed; changing bug status to VERIFIED.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://rhn.redhat.com/errata/RHBA-2016-0378.html
Created attachment 1054273 [details] he-iscsi.tar.gz /var/log/*.* sosreport