Bug 1409203

Summary: Login HE-VM failed if select "Yes" when the step "Add lines for the appliance itself and for this host to /etc/hosts on the engine VM?"
Product: [oVirt] ovirt-hosted-engine-setup Reporter: Yihui Zhao <yzhao>
Component: General Assignee: Simone Tiraboschi <stirabos>
Status: CLOSED CURRENTRELEASE QA Contact: Nikolai Sednev <nsednev>
Severity: high Docs Contact:
Priority: high    
Version: 2.1.0 CC: bugs, cshao, dguo, fdeutsch, huzhao, jiawu, leiwang, nsednev, pmatyas, qiyuan, rbarry, weiwang, yaniwang, ycui, yzhao
Target Milestone: ovirt-4.1.0-rc Keywords: Triaged
Target Release: 2.1.0.1 Flags: rule-engine: ovirt-4.1+
rule-engine: blocker+
rule-engine: planning_ack+
fdeutsch: devel_ack+
gklein: testing_ack+
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-02-15 15:04:30 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Integration RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1372260    
Bug Blocks: 1361511, 1379405, 1402435, 1403903, 1413928, 1434957    
Attachments:
Description Flags
LoginFailed.png none
engine.log none
/var/log/*, sosreport none
difference.png none

Description Yihui Zhao 2016-12-30 09:14:03 UTC
Created attachment 1236047 [details]
LoginFailed.png

Description of problem:
Logging in to the HE-VM fails if "Yes" is selected at the step "Add lines for the appliance itself and for this host to /etc/hosts on the engine VM?" in Cockpit.


Version-Release number of selected component (if applicable):
PXE profile: RHVH-4.0-73-20161116.0  
cockpit-ovirt-dashboard-0.10.6-1.4.3.el7ev.noarch
cockpit-ws-122-3.el7.x86_64
RHVH4.1_20161222.0
ovirt-hosted-engine-setup-2.1.0-0.0.master.git46cacd3.el7ev.noarch
ovirt-host-deploy-1.6.0-0.2.master.gitb76ad50.el7ev.noarch
ovirt-hosted-engine-ha-2.1.0-0.0.master.git118000f.el7ev.noarch
20161214.0-1.el7ev.4.0.rpm (engine-appliance-image rpm)

How reproducible:
100%

Steps to Reproduce:
1. Install redhat-virtualization-host-4.1-0.20161222.0 via anaconda GUI
2. Log in to Cockpit and deploy HE from Cockpit

Actual results:
After step 2, if I select "Yes" at the step "Add lines for the appliance itself and for this host to /etc/hosts on the engine VM?" in Cockpit, logging in to the HE-VM with the correct password fails and blocks the later steps.

Expected results:
After step 2, whether "Yes" or "No" is selected at the step "Add lines for the appliance itself and for this host to /etc/hosts on the engine VM?" in Cockpit, it should not affect logging in to the HE-VM.


Additional info:
On RHVH 4.0, the same step "Add lines for the appliance itself and for this host to /etc/hosts on the engine VM?" appears in Cockpit. If "Yes" is selected, it automatically adds lines for the RHVH host, such as "10.73.131.65 dell-per730-35.lab.eng.pek2.redhat.com", to /etc/hosts on the engine VM, and this does not affect logging in to the HE-VM.
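
For illustration, the effect expected from answering "Yes" can be sketched as follows. This is a minimal Python sketch, not the actual ovirt-hosted-engine-setup code; the function name `add_hosts_entries` is hypothetical, and the only real data is the host entry quoted above.

```python
def add_hosts_entries(hosts_text, entries):
    """Return hosts_text with an 'IP FQDN' line appended for each FQDN
    that is not already present (hypothetical helper, for illustration)."""
    known = set()
    for line in hosts_text.splitlines():
        # Drop comments, then collect every name that follows the address.
        fields = line.split("#", 1)[0].split()
        known.update(fields[1:])
    out = hosts_text
    if out and not out.endswith("\n"):
        out += "\n"
    for ip, fqdn in entries:
        if fqdn not in known:
            out += f"{ip} {fqdn}\n"
            known.add(fqdn)
    return out


print(add_hosts_entries(
    "127.0.0.1 localhost\n",
    [("10.73.131.65", "dell-per730-35.lab.eng.pek2.redhat.com")],
))
```

Running it a second time on its own output leaves the text unchanged, so the addition is idempotent.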

Comment 1 Yihui Zhao 2016-12-30 09:15:05 UTC
Created attachment 1236049 [details]
engine.log

Comment 2 Yihui Zhao 2016-12-30 09:18:26 UTC
Created attachment 1236050 [details]
/var/log/*, sosreport

Comment 3 Fabian Deutsch 2017-01-02 12:16:30 UTC
This is a crazy bug.

Any idea why this can be caused?

Comment 4 Yihui Zhao 2017-01-03 02:46:41 UTC
(In reply to Fabian Deutsch from comment #3)
> This is a crazy bug.
> 
> Any idea why this can be caused?

Hi Fabian,
   I also think this is a crazy bug.
I found that if I select "Yes" at the step "Add lines for the appliance itself and for this host to /etc/hosts on the engine VM?" in Cockpit, the HE-VM does not get the FQDN I set; it gets a DHCP-style name instead.

If I select "No", the HE-VM gets the FQDN I set.

See the attachment difference.png.

Comment 5 Yihui Zhao 2017-01-03 02:47:25 UTC
Created attachment 1236756 [details]
difference.png

Comment 6 Ryan Barry 2017-01-03 02:59:19 UTC
Can this be reproduced on the CLI? Cockpit doesn't do anything special here, so I'd expect the bug to be in the base component

Comment 7 Yihui Zhao 2017-01-03 03:01:37 UTC
(In reply to Ryan Barry from comment #6)
> Can this be reproduced on the CLI? Cockpit doesn't do anything special here,
> so I'd expect the bug to be in the base component

Checking

Comment 8 Yihui Zhao 2017-01-03 03:28:54 UTC
(In reply to Ryan Barry from comment #6)
> Can this be reproduced on the CLI? Cockpit doesn't do anything special here,
> so I'd expect the bug to be in the base component

Hi Ryan,
    It can also be reproduced on the CLI if "Yes" is selected.

Thanks,
Yihui

Comment 9 Red Hat Bugzilla Rules Engine 2017-01-03 06:14:37 UTC
Target release should be placed once a package build is known to fix an issue. Since this bug is not modified, the target version has been reset. Please use target milestone to plan a fix for an oVirt release.

Comment 10 Simone Tiraboschi 2017-01-12 15:43:19 UTC
This is not reproducible upstream on CentOS; I suspect it could be related to https://bugzilla.redhat.com/show_bug.cgi?id=1411671#c9

Comment 11 Sandro Bonazzola 2017-01-17 09:12:48 UTC
Can you please ensure hostname is properly configured and reproduce?

Comment 12 Yihui Zhao 2017-01-17 10:00:58 UTC
(In reply to Sandro Bonazzola from comment #11)
> Can you please ensure hostname is properly configured and reproduce?

Hi Sandro,
   I set the hostname and reproduced the issue. If "Yes" is selected, the HE-VM's hostname is not the one I provided.

For example, with the HE-VM FQDN set to a.redhat.com:

           vnc login to HE-VM: Red Hat Enterprise Linux Server 7.3(Maipo)
                               Kernel 3.10.0-514.2.2.el7.x86_64 on an x86_64
                               
                               dhcp-8-216 login:

Comment 13 Sandro Bonazzola 2017-01-18 13:43:30 UTC
Yihui, can you give Simone access to the host reproducing the issue?

Comment 16 Nikolai Sednev 2017-01-22 09:14:34 UTC
1-Is your "a.redhat.com" a resolvable FQDN in the redhat.com domain?
2-Is your MAC address assigned randomly from the default MAC pool during the deployment, or are you using an IP to MAC reservation in your DHCP pool?
3-Your scenario should work regardless of steps 1 and 3: when you answer "YES" to "Add lines for the appliance itself and for this host to /etc/hosts on the engine VM?", the "a.redhat.com" record should be added to /etc/hosts on both engine and hosts and always be resolvable, unless it's not being properly created on both host and the appliance. It might be that during deployment the process first checks whether "a.redhat.com" is resolvable through the DNS, regardless of local records, which might be a bug.
4-How many hosts do you have in your environment? If only a single host with the appliance, then it's a really strange bug. But with several hosts I would avoid using /etc/hosts, as all other additional hosts should have the a.redhat.com record in their /etc/hosts, which might be a bad idea for multi-host environments.
5-Could you try to reproduce with an existing DNS record and MAC to IP reservation? Let's say you have a MAC to IP reservation that also has a DNS-resolvable record, and you try to deploy your HE using that reserved MAC address and resolvable FQDN on RHEVH. Would this issue happen in that case, when answering "yes" or "no" to "Add lines for the appliance itself and for this host to /etc/hosts on the engine VM?"?
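
To tell the cases in points 1 and 3 apart, one can check whether a name is covered by a local /etc/hosts record or only by the system resolver. This is a minimal Python sketch under stated assumptions: the helper names `names_in_hosts` and `resolution_source` are hypothetical, and it does not reflect how the deployment itself checks the FQDN.

```python
import socket


def names_in_hosts(hosts_text):
    """Collect every host name listed in /etc/hosts-style text."""
    names = set()
    for line in hosts_text.splitlines():
        # Drop comments, then skip the leading address field.
        fields = line.split("#", 1)[0].split()
        names.update(fields[1:])
    return names


def resolution_source(fqdn, hosts_text, resolver=socket.getaddrinfo):
    """Return 'hosts' if fqdn has a local record, 'resolver' if only the
    system resolver (for example DNS) knows it, else 'unresolvable'."""
    if fqdn in names_in_hosts(hosts_text):
        return "hosts"
    try:
        resolver(fqdn, None)
    except socket.gaierror:
        return "unresolvable"
    return "resolver"
```

On the engine VM this could be called as `resolution_source("a.redhat.com", open("/etc/hosts").read())`. The `resolver` parameter exists only so the lookup can be replaced in tests.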

Comment 18 Yihui Zhao 2017-01-22 09:52:24 UTC
(In reply to Nikolai Sednev from comment #16)
> 1-Is your "a.redhat.com" a resolvable FQDN in the redhat.com domain?
> 2-Is your MAC address assigned randomly from the default MAC pool during
> the deployment, or are you using an IP to MAC reservation in your DHCP pool?
> 3-Your scenario should work regardless of steps 1 and 3: when you answer
> "YES" to "Add lines for the appliance itself and for this host to
> /etc/hosts on the engine VM?", the "a.redhat.com" record should be added to
> /etc/hosts on both engine and hosts and always be resolvable, unless it's
> not being properly created on both host and the appliance. It might be that
> during deployment the process first checks whether "a.redhat.com" is
> resolvable through the DNS, regardless of local records, which might be a
> bug.
> 4-How many hosts do you have in your environment? If only a single host with
> the appliance, then it's a really strange bug. But with several hosts I would
> avoid using /etc/hosts, as all other additional hosts should have the
> a.redhat.com record in their /etc/hosts, which might be a bad idea for
> multi-host environments.
> 5-Could you try to reproduce with an existing DNS record and MAC to IP
> reservation? Let's say you have a MAC to IP reservation that also has a
> DNS-resolvable record, and you try to deploy your HE using that reserved
> MAC address and resolvable FQDN on RHEVH. Would this issue happen in that
> case, when answering "yes" or "no" to "Add lines for the appliance itself
> and for this host to /etc/hosts on the engine VM?"?

1. No, it is just an FQDN I provided, so /etc/hosts has to be modified.
2. Yes, the IP and MAC address are in the DHCP pool.
3.
4. Single host.
5. I will try to reproduce this bug with an existing DNS record.

Additional info: on RHVH 4.0, it worked well with my test path.

Comment 19 Simone Tiraboschi 2017-01-25 17:32:43 UTC
You were probably hitting this one: https://bugzilla.redhat.com/1372260

Our cloud-init script restarts sshd to make the configuration effective, but due to https://bugzilla.redhat.com/1372260 'systemctl restart sshd' could hang forever, and so could our cloud-init script.

"Add lines for the appliance itself and for this host to /etc/hosts on the engine VM?" was probably just slightly changing a timeout, which decided whether or not you hit BZ#1372260.

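The failure mode described in this comment, a restart command that never returns and so blocks the script that called it, can be bounded with a timeout. This is a minimal Python sketch, not the actual cloud-init script and not the fix that was shipped; `run_with_timeout` and the 30-second value are hypothetical.

```python
import subprocess


def run_with_timeout(cmd, timeout_s):
    """Run cmd and return its exit code, or None if it did not finish
    within timeout_s seconds (the child process is killed in that case)."""
    try:
        return subprocess.run(cmd, timeout=timeout_s).returncode
    except subprocess.TimeoutExpired:
        return None


# Example call; the 30-second limit is an arbitrary illustration:
# if run_with_timeout(["systemctl", "restart", "sshd"], 30) is None:
#     print("restart did not complete in time")
```

Killing the `systemctl` client only unblocks the caller; it does not tell whether the restart job itself completed, so the caller still has to decide how to proceed.
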
Comment 20 Sandro Bonazzola 2017-01-26 16:41:09 UTC
*** Bug 1416023 has been marked as a duplicate of this bug. ***

Comment 21 Simone Tiraboschi 2017-01-27 13:40:58 UTC
*** Bug 1417196 has been marked as a duplicate of this bug. ***

Comment 22 Nikolai Sednev 2017-02-06 14:29:58 UTC
Works for me on RHEVM host with these components:
rhvm-appliance-4.1.20170126.0-1.el7ev.noarch
libvirt-client-2.0.0-10.el7_3.4.x86_64
ovirt-hosted-engine-setup-2.1.0.1-1.el7ev.noarch
sanlock-3.4.0-1.el7.x86_64
ovirt-engine-sdk-python-3.6.9.1-1.el7ev.noarch
ovirt-vmconsole-1.0.4-1.el7ev.noarch
mom-0.5.8-1.el7ev.noarch
vdsm-4.19.4-1.el7ev.x86_64
ovirt-hosted-engine-ha-2.1.0.1-1.el7ev.noarch
ovirt-host-deploy-1.6.0-1.el7ev.noarch
ovirt-imageio-common-1.0.0-0.el7ev.noarch
ovirt-vmconsole-host-1.0.4-1.el7ev.noarch
qemu-kvm-rhev-2.6.0-28.el7_3.3.x86_64
ovirt-imageio-daemon-1.0.0-0.el7ev.noarch
ovirt-setup-lib-1.1.0-1.el7ev.noarch
Linux version 3.10.0-514.6.1.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Sat Dec 10 11:15:38 EST 2016
Linux 3.10.0-514.6.1.el7.x86_64 #1 SMP Sat Dec 10 11:15:38 EST 2016 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.3 (Maipo).

Comment 23 Nikolai Sednev 2017-02-06 17:01:23 UTC
Also works for me on the latest RHVH 4.1:
rhvm-appliance-4.1.20170126.0-1.el7ev.noarch
ovirt-imageio-common-1.0.0-0.el7ev.noarch
ovirt-hosted-engine-ha-2.1.0.1-1.el7ev.noarch
ovirt-hosted-engine-setup-2.1.0.1-1.el7ev.noarch
ovirt-engine-sdk-python-3.6.9.1-1.el7ev.noarch
ovirt-host-deploy-1.6.0-1.el7ev.noarch
ovirt-vmconsole-1.0.4-1.el7ev.noarch
ovirt-node-ng-nodectl-4.1.0-0.20170104.1.el7.noarch
libvirt-client-2.0.0-10.el7_3.4.x86_64
qemu-kvm-rhev-2.6.0-28.el7_3.3.x86_64
vdsm-4.19.4-1.el7ev.x86_64
sanlock-3.4.0-1.el7.x86_64
ovirt-vmconsole-host-1.0.4-1.el7ev.noarch
mom-0.5.8-1.el7ev.noarch
ovirt-imageio-daemon-1.0.0-0.el7ev.noarch
ovirt-setup-lib-1.1.0-1.el7ev.noarch
Linux version 3.10.0-514.6.1.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Sat Dec 10 11:15:38 EST 2016
Linux 3.10.0-514.6.1.el7.x86_64 #1 SMP Sat Dec 10 11:15:38 EST 2016 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux release 7.3

Moving to verified.