Bug 1245143

Summary: [RHEV-H] Failed to deploy Hosted Engine on a storage domain as an additional host
Product: Red Hat Enterprise Virtualization Manager Reporter: cshao <cshao>
Component: ovirt-node-plugin-hosted-engine Assignee: Fabian Deutsch <fdeutsch>
Status: CLOSED ERRATA QA Contact: cshao <cshao>
Severity: high Docs Contact:
Priority: urgent    
Version: 3.5.4 CC: acanan, ahino, amureini, bmcclain, cshao, fdeutsch, gklein, huiwa, istein, leiwang, lsurette, nsednev, rbarry, sbonazzo, stirabos, yaniwang, ycui, ykaul
Target Milestone: ovirt-3.6.0-rc Keywords: Reopened, ZStream
Target Release: 3.6.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: ovirt-node-plugin-hosted-engine-0.3.0-1.el7ev Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1250400 (view as bug list) Environment:
Last Closed: 2016-03-09 14:32:54 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Node RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1246863, 1249118, 1259247, 1260470, 1267437, 1271976, 1280268    
Bug Blocks: 1059435, 1250199, 1250400    
Attachments:
Description Flags
he-iscsi.tar.gz none
Layout of the RHEV-M page none
Layout of the Hosted Engine page none
he6.7-software-iscsi-failed none

Comment 1 cshao 2015-07-21 10:07:57 UTC
Created attachment 1054273 [details]
he-iscsi.tar.gz

/var/log/*.*
sosreport

Comment 3 Fabian Deutsch 2015-07-28 09:54:55 UTC
Can this bug also be reproduced by running ovirt-hosted-engine-setup on a plain RHEL host?

Comment 4 Sandro Bonazzola 2015-07-28 10:00:08 UTC
I think you have a configuration issue:

2015-07-21 08:56:32 DEBUG otopi.plugins.ovirt_hosted_engine_setup.vdsmd.cpu cpu._customization:137 Compatible CPU models are: [u'model_athlon', u'model_Opteron_G3', u'model_Opteron_G1', u'model_phenom', u'model_Opteron_G2']
2015-07-21 08:56:32 DEBUG otopi.plugins.otopi.dialog.human dialog.__logString:215 DIALOG:SEND                 The following CPU types are supported by this host:
2015-07-21 08:56:32 DEBUG otopi.plugins.otopi.dialog.human dialog.__logString:215 DIALOG:SEND                 	 - model_Opteron_G3: AMD Opteron G3
2015-07-21 08:56:32 DEBUG otopi.plugins.otopi.dialog.human dialog.__logString:215 DIALOG:SEND                 	 - model_Opteron_G2: AMD Opteron G2
2015-07-21 08:56:32 DEBUG otopi.plugins.otopi.dialog.human dialog.__logString:215 DIALOG:SEND                 	 - model_Opteron_G1: AMD Opteron G1
2015-07-21 08:56:32 DEBUG otopi.context context._executeMethod:152 method exception
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/otopi/context.py", line 142, in _executeMethod
  File "/usr/share/ovirt-hosted-engine-setup/plugins/ovirt-hosted-engine-setup/vdsmd/cpu.py", line 194, in _customization
RuntimeError: Invalid CPU type specified: None

You're failing because no CPU type has been specified in the answer file or the hardware is not compatible with the answer file provided by rhev-h.

Moving to node
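
For illustration, the failure mode can be sketched in a few lines of Python. This is not the actual cpu.py code: validate_cpu and SUPPORTED are hypothetical names, and the model list is copied from the "supported by this host" lines in the log above. The point is only that a CPU value of None, which is what you get when nothing is in the answer file and no question is asked, cannot match any supported model.

```python
# Hypothetical sketch; not the real ovirt-hosted-engine-setup code.
# Model names are taken from the DIALOG:SEND lines in the log above.
SUPPORTED = ["model_Opteron_G3", "model_Opteron_G2", "model_Opteron_G1"]


def validate_cpu(selected, supported):
    """Return the selected CPU model, or raise if it is missing or unsupported."""
    if selected not in supported:
        # None (no answer-file entry and no interactive answer) ends up here,
        # which matches the message seen in the traceback.
        raise RuntimeError("Invalid CPU type specified: {}".format(selected))
    return selected


if __name__ == "__main__":
    print(validate_cpu("model_Opteron_G3", SUPPORTED))
    try:
        validate_cpu(None, SUPPORTED)
    except RuntimeError as err:
        print(err)
```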

Comment 5 Fabian Deutsch 2015-07-28 10:46:08 UTC
It doesn't look like we specify the CPU type, thus it is probably the answer to an interactive question:

[fabiand@tee ovirt-node-plugin-hosted-engine (ovirt-3.5)]$ git grep -i cpu
[fabiand@tee ovirt-node-plugin-hosted-engine (ovirt-3.5)]$ 


Chen, can you please provide the answer file, or tell us if you answered the CPU type question?

Comment 9 cshao 2015-07-29 08:44:49 UTC
(In reply to Fabian Deutsch from comment #5)
> It doesn't look like we specify the CPU type, thus it is probably the answer
> to an interactive question:
> 
> [fabiand@tee ovirt-node-plugin-hosted-engine (ovirt-3.5)]$ git grep -i cpu
> [fabiand@tee ovirt-node-plugin-hosted-engine (ovirt-3.5)]$ 
> 
> 
> Chen, can you please provide the answer file, or tell us if you answered the
> CPU type question?

I used the default CPU type: model_Opteron_G3: AMD Opteron G3

Comment 10 Fabian Deutsch 2015-07-29 09:42:30 UTC
Did you just hit <Return> or did you enter anything?

Comment 11 cshao 2015-07-29 09:49:15 UTC
(In reply to Fabian Deutsch from comment #10)
> Did you just hit <Return> or did you enter anything?

Yes, just hit <Return>.

Comment 12 Simone Tiraboschi 2015-07-29 15:45:54 UTC
The issue is there:

2015-07-21 09:09:04 DEBUG otopi.plugins.otopi.dialog.human dialog.__logString:215 DIALOG:SEND                 The specified storage location already contains a data domain. Is this an additional host setup (Yes, No)[Yes]? 
2015-07-21 09:09:12 INFO otopi.plugins.ovirt_hosted_engine_setup.storage.storage storage._handleHostId:123 Installing on additional host

So it's an additional host, and some values (for instance the CPU type) are no longer asked because they have to match the configuration of the other hosts (otherwise live migration will not work and HA will be faulty).

Normally on additional hosts we ask to download the answer file from one of the other hosts, but here you already appended one:
2015-07-21 09:09:23 DEBUG otopi.context context.dumpEnvironment:500 ENV CORE/configFileAppend=str:':/tmp/tmpM3gp2K'
and so we assumed (maybe we could improve here!) that it was correctly and completely generated on the first host. Yours is missing the CPU type, hence the issue.
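
A possible completeness check for an appended answer file could look like the Python sketch below. It is only a sketch under assumptions: the "KEY=type:value" entry format is inferred from the CORE/configFileAppend line in the log, the key name OVEHOSTED_VDSM/cpu and the "none" type marker are assumptions that should be checked against the setup code, and parse_answers and missing_keys are hypothetical helpers that do not exist in ovirt-hosted-engine-setup.

```python
# Hypothetical sketch: check an appended answer file for required keys.
# Assumptions (verify before relying on them):
#   - entries look like "KEY=type:value"
#   - the CPU type is stored under "OVEHOSTED_VDSM/cpu"
#   - an unset value is written with the type marker "none"
REQUIRED_KEYS = ["OVEHOSTED_VDSM/cpu"]


def parse_answers(text):
    """Parse "KEY=type:value" lines into a dict of key -> (type, value)."""
    answers = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments and section headers.
        if not line or line.startswith("#") or line.startswith("["):
            continue
        key, sep, rest = line.partition("=")
        if not sep:
            continue
        type_name, _, value = rest.partition(":")
        answers[key.strip()] = (type_name, value)
    return answers


def missing_keys(text, required=REQUIRED_KEYS):
    """Return the required keys that are absent or explicitly unset."""
    answers = parse_answers(text)
    missing = []
    for key in required:
        entry = answers.get(key)
        if entry is None or entry[0] == "none":
            missing.append(key)
    return missing


if __name__ == "__main__":
    # Illustrative content only, not taken from the attached logs.
    sample = "[environment:default]\nEXAMPLE/someKey=str:value\n"
    print(missing_keys(sample))
```

If missing_keys returns a non-empty list, the setup could fall back to asking for the answer file from the first host instead of failing later in the CPU customization stage.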

Comment 13 Simone Tiraboschi 2015-07-29 15:47:05 UTC
Sorry, not sure about closing.
Probably we could still ask to download the answerfile if the one you passed is not complete.

Comment 14 Fabian Deutsch 2015-07-29 16:08:20 UTC
According to the offline discussion a possible fix is to introduce another question to ask the user if the attached file (using configappend) is the answer file from the other host.

The complete fix will also require a change on the node side to set this answer to "no", because in the RHEV-H flow we cannot attach the answer file from that other host.

Comment 17 Ryan Barry 2015-07-30 14:51:29 UTC
(In reply to Fabian Deutsch from comment #14)
> According to the offline discussion a possible fix is to introduce another
> question to ask the user if the attached file (using configappend) is the
> answer file from the other host.
> 
> The complete fix will also require a change on the node side to set this
> answer to "no", because in the RHEV-H flow we can not attach the answer file
> from that other host.

Just a comment --

In the past, we discussed presenting users with a checkbox (or some other UI element) asking whether it's an additional host. That never got added, but the option to pull the answer file from the other host and use it is potentially there.

Comment 18 Fabian Deutsch 2015-07-31 10:56:45 UTC
The most recent patches are preparing the following solution:

A new button is added to the RHEV-H TUI to start the deployment of an additional host.
That button will trigger the deployment as described here:
[0] https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Virtualization/3.5/html/Installation_Guide/Installing_Additional_Hosts_to_a_Self-Hosted_Environment.html

Note: The TUI layout will also be changed a bit, to clarify which items are relevant to the first host deployment and which to the additional host deployment.

This approach assumes that SSH is enabled and a root password is set on the first deployed RHEV-H host with self-hosted engine.
This can be achieved by going to the RHEV-M page, setting a password in the shown fields and saving that page.

Assuming the patches are applied, the flow looks as follows:

1. Install first RHEV-H host
2. Configure networking and setup HE on the first host
3. After the HE setup: Go to the RHEV-M/Engine page, set a password and hit "<Save / Register>"

4. Install an additional RHEV-H host
5. Go to the Hosted Engine page and select "Start additional host setup"
6. Proceed as described in the docs for RHEL-H [0]

The benefit is that the flow is the same on RHEL and RHEV-H.

It still has to be verified that this works as expected.
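
Since the flow depends on SSH being reachable on the first host, a quick pre-flight check from the additional host might help when testing. The Python 3 sketch below is only an illustration: ssh_port_open is a hypothetical helper, and it only confirms that a TCP connection to the SSH port succeeds. It does not verify that a root password is set or that login works.

```python
import socket


def ssh_port_open(host, port=22, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host name and connects;
        # the socket is closed again as soon as the block exits.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution errors.
        return False
```

Calling ssh_port_open with the address of the first host before selecting the additional host setup would show early whether step 3 of the flow (setting the password on the RHEV-M/Engine page) has actually enabled SSH.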

Comment 19 Fabian Deutsch 2015-07-31 10:57:47 UTC
Created attachment 1058029 [details]
Layout of the RHEV-M page

This screenshot shows the page where the password needs to be set to enable this flow.

Comment 20 Fabian Deutsch 2015-07-31 10:58:19 UTC
Created attachment 1058030 [details]
Layout of the Hosted Engine page

This screenshot shows the new layout of the hosted engine page to support both flows.

Comment 22 Fabian Deutsch 2015-07-31 19:07:24 UTC
The flow laid out in comment 18 was tested and it seems to work.

Comment 28 cshao 2015-08-03 08:48:45 UTC
Created attachment 1058670 [details]
he6.7-software-iscsi-failed

Comment 29 Fabian Deutsch 2015-08-03 12:25:47 UTC
Let me note that the flow for setting up the initial HE host is not changed or directly affected by this change.

Comment 34 cshao 2015-08-10 09:20:33 UTC
Opened a separate bug to track the software iSCSI issue:
Bug 1251901 - [RHEV-H] Failed to deploy Hosted Engine through specify software iscsi as storage.

Comment 35 Fabian Deutsch 2015-08-10 11:20:12 UTC
This bug is covering the basic flow to add an additional host to the HE setup.

The bug in comment 34 is covering the still non-working case related to the CPU types.

According to comment 27 and comment 31 and comment 33 this bug can be moved to verified.

Comment 39 cshao 2015-10-09 07:33:26 UTC
This bug is blocked by bug 1267437, as it blocks HE setup. I will verify this bug after 1267437 is fixed.

Comment 40 cshao 2015-10-26 07:45:26 UTC
This bug is still blocked by bug 1271976, as it blocks HE setup. I will verify this bug after 1271976 is fixed.

Comment 41 cshao 2015-11-23 06:16:59 UTC
This bug is blocked by new bug 1280268, as the HE VM cannot start up automatically after HE is configured successfully. I will verify this bug after 1280268 is fixed.

Comment 42 cshao 2015-12-15 07:41:44 UTC
Test version:
rhev-hypervisor7-7.2-20151210.1
ovirt-node-3.6.0-0.24.20151209gitc0fa931.el7ev.noarch
ovirt-node-plugin-hosted-engine-0.3.0-4.el7ev.noarch
ovirt-hosted-engine-setup-1.3.1.2-1.el7ev.noarch
ovirt-hosted-engine-ha-1.3.3.3-1.el7ev.noarch


Test steps:
1. Install first RHEV-H host
2. Configure networking and setup HE on the first host
3. Specify NFS storage during storage configuration.
4. After the HE setup: Go to the RHEV-M/Engine page, set a password and hit "<Save / Register>"
5. Install an additional RHEV-H host
6. Go to the Hosted Engine page and select "Add this host to an existing group"
7. Proceed as described with correct steps.
8. Reboot host.

Test result:
Both hosts can still auto-register to HE; persistence is working correctly.

So the bug is fixed; changing bug status to VERIFIED.

Comment 44 errata-xmlrpc 2016-03-09 14:32:54 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0378.html