Bug 1221290

Summary: [self-hosted] Can't add 2nd host into self-hosted env: The VDSM host was found in a failed state... Unable to add slot-5b to the manager
Product: Red Hat Enterprise Virtualization Manager Reporter: rhev-integ
Component: ovirt-hosted-engine-setup Assignee: Simone Tiraboschi <stirabos>
Status: CLOSED ERRATA QA Contact: Jiri Belka <jbelka>
Severity: urgent Docs Contact:
Priority: urgent    
Version: 3.5.1 CC: aburden, bazulay, ecohen, gklein, jbelka, lpeer, lsurette, oourfali, pstehlik, pzhukov, sbonazzo, stirabos, yeylon, ylavi
Target Milestone: --- Keywords: Regression, TestBlocker, ZStream
Target Release: 3.5.3 Flags: ylavi: Triaged+
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: integration
Fixed In Version: ovirt-hosted-engine-setup-1.2.4-2.el6ev Doc Type: Bug Fix
Doc Text:
Previously, in self-hosted engine environments, the HOST_ID for additional hosts was stored as a string, which prevented the Manager from properly identifying and contacting VDSM on the host and resulted in a failure to add the host. Now, the HOST_ID is stored as an integer and the host is added to the self-hosted engine as expected.
Story Points: ---
Clone Of: 1216172 Environment:
Last Closed: 2015-06-15 13:17:27 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1216172    
Bug Blocks: 1218280    
Attachments:
Description Flags
sosreport-dell-r210ii-13.rhev.lab.eng.brq.redhat.com-20150518163140.tar.xz none

Comment 3 Jiri Belka 2015-05-18 11:08:10 UTC
[ INFO  ] Still waiting for VDSM host to become operational...
[ ERROR ] The VDSM host was found in a failed state. Please check engine and bootstrap installation logs.
[ ERROR ] Unable to add hosted_engine_2 to the manager
[ INFO  ] Enabling and starting HA services
          Hosted Engine successfully set up
[ INFO  ] Stage: Clean up
[ INFO  ] Generating answer file '/var/lib/ovirt-hosted-engine-setup/answers/answers-20150518130020.conf'
[ INFO  ] Generating answer file '/etc/ovirt-hosted-engine/answers.conf'
[ INFO  ] Stage: Pre-termination
[ INFO  ] Stage: Termination

[root@dell-r210ii-13 yum.repos.d]# rpm -qa ovirt-hosted-engine-setup
ovirt-hosted-engine-setup-1.2.4-1.el7ev.noarch

still seeing:

ioprocess communication (29023)::ERROR::2015-05-18 12:59:24,381::__init__::152::IOProcessClient::(_communicate) IOProcess failure
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ioprocess/__init__.py", line 107, in _communicate
    raise Exception("FD closed")
Exception: FD closed

Comment 4 Sandro Bonazzola 2015-05-18 12:07:16 UTC
Jiri, the above ioprocess stack trace is a known issue in ioprocess, see bug #1189200

That's not the cause of "[ ERROR ] Unable to add hosted_engine_2 to the manager".

Can you attach the full sos report for this new run?

Comment 5 Jiri Belka 2015-05-18 14:35:52 UTC
Created attachment 1026733
sosreport-dell-r210ii-13.rhev.lab.eng.brq.redhat.com-20150518163140.tar.xz

Comment 6 Jiri Belka 2015-05-18 14:49:50 UTC
Ha, I put the problematic host into maintenance and used 'Reinstall', and now I see it Up. Please inspect the logs to see why the "second" try worked, thanks.

Comment 7 Sandro Bonazzola 2015-05-19 10:15:56 UTC
Jiri, we see similar issues (failure on the first run, success on the second run) when environmental / DHCP-related issues come in.
Can you check the env / DHCP?

Comment 8 Simone Tiraboschi 2015-05-19 12:35:02 UTC
Ok, found in ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-20150518125810-vvuk3a.log :

2015-05-18 12:59:24 DEBUG otopi.context context.dumpEnvironment:500 ENV NETWORK/iptablesEnable=bool:'False'
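
A minimal sketch of how a setup log could be scanned for this setting. It assumes every environment dump line has the same shape as the one quoted above (`ENV <key>=<type>:'<value>'`); the helper names `parse_env_line` and `firewall_disabled` are hypothetical and are not part of otopi or ovirt-hosted-engine-setup.

```python
import re

# Assumed line shape, taken from the single log line quoted in this comment:
#   ... ENV NETWORK/iptablesEnable=bool:'False'
ENV_LINE = re.compile(r"ENV (?P<key>\S+?)=(?P<type>\w+):'(?P<value>.*)'\s*$")


def parse_env_line(line):
    """Return (key, type, value) for an environment dump line, or None."""
    match = ENV_LINE.search(line)
    if match is None:
        return None
    return match.group("key"), match.group("type"), match.group("value")


def firewall_disabled(log_lines, key="NETWORK/iptablesEnable"):
    """True if the last logged value of `key` is the boolean False."""
    last = None
    for line in log_lines:
        parsed = parse_env_line(line)
        if parsed is not None and parsed[0] == key:
            last = parsed  # later dumps override earlier ones
    return last is not None and last[1] == "bool" and last[2] == "False"


sample = ("2015-05-18 12:59:24 DEBUG otopi.context "
          "context.dumpEnvironment:500 ENV NETWORK/iptablesEnable=bool:'False'")
print(parse_env_line(sample))
print(firewall_disabled([sample]))
```

To check a real log, the lines of the setup log file named above could be passed to `firewall_disabled` instead of the one-line sample.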

So it's just blocked by this one:
https://bugzilla.redhat.com/show_bug.cgi?id=1221148

In a few words: the second host doesn't get its firewall properly configured, so, depending on the previous firewall configuration, the engine may or may not be able to contact VDSM, hence the failure when adding the second host.

Just as a workaround, you can try to deploy the second host keeping both iptables and firewalld disabled.
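
Whether the engine can reach VDSM on the new host can be probed with a plain TCP connection attempt. The sketch below is an illustration only: it assumes VDSM listens on TCP port 54321 (the port that appears in the iptables listing later in this report), and `port_reachable` is a hypothetical helper, not part of any oVirt tool.

```python
import socket


def port_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


# Example call, to be run from the engine machine. The host name is a
# placeholder and 54321 is the assumed VDSM port:
#   port_reachable("second-host.example.com", 54321)
```

A `False` result only shows that the TCP connection failed; it does not distinguish between a firewall rule and a VDSM service that is not running.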

Comment 10 Jiri Belka 2015-05-29 10:03:30 UTC
ok, ovirt-hosted-engine-setup-1.2.4-2.el7ev.noarch

basic tests - maintenance for hosts, moving self-hosted engine, running vm

on 2nd host:

...
          Hosted Engine successfully set up
[ INFO  ] Stage: Clean up
[ INFO  ] Generating answer file '/var/lib/ovirt-hosted-engine-setup/answers/answers-20150529112728.conf'
[ INFO  ] Generating answer file '/etc/ovirt-hosted-engine/answers.conf'
[ INFO  ] Stage: Pre-termination
[ INFO  ] Stage: Termination
                                                                              
[root@dell-r210ii-13 ~]# iptables -nL INPUT
Chain INPUT (policy ACCEPT)
target     prot opt source               destination         
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0            state RELATED,ESTABLISHED
ACCEPT     icmp --  0.0.0.0/0            0.0.0.0/0           
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0           
ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0            tcp dpt:54321
ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0            tcp dpt:111
ACCEPT     udp  --  0.0.0.0/0            0.0.0.0/0            udp dpt:111
ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0            tcp dpt:22
ACCEPT     udp  --  0.0.0.0/0            0.0.0.0/0            udp dpt:161
ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0            tcp dpt:16514
ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0            multiport dports 5900:6923
ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0            multiport dports 49152:49216
REJECT     all  --  0.0.0.0/0            0.0.0.0/0            reject-with icmp-host-prohibited
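
The listing above can also be checked mechanically. The following sketch assumes the column layout of `iptables -nL` shown here and looks for an explicit ACCEPT rule for a given destination port before the first catch-all REJECT or DROP. `has_explicit_accept` is a hypothetical helper; it deliberately ignores `multiport` ranges and the bare `ACCEPT all` rule, because `-nL` without `-v` does not show whether that rule is limited to one interface.

```python
def has_explicit_accept(listing, port, proto="tcp"):
    """True if an ACCEPT rule for proto/port precedes a catch-all REJECT/DROP."""
    for line in listing.splitlines():
        fields = line.split()
        # Skip the chain header, the column header and blank lines.
        if len(fields) < 5 or fields[0] not in ("ACCEPT", "REJECT", "DROP"):
            continue
        target, rule_proto, extra = fields[0], fields[1], fields[5:]
        if target == "ACCEPT" and rule_proto == proto and f"dpt:{port}" in extra:
            return True
        if target in ("REJECT", "DROP") and rule_proto == "all":
            return False  # everything after this rule is unreachable
    return False


# Shortened copy of the listing from this comment.
sample = """\
Chain INPUT (policy ACCEPT)
target     prot opt source               destination
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0            state RELATED,ESTABLISHED
ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0            tcp dpt:54321
ACCEPT     udp  --  0.0.0.0/0            0.0.0.0/0            udp dpt:161
REJECT     all  --  0.0.0.0/0            0.0.0.0/0            reject-with icmp-host-prohibited
"""
print(has_explicit_accept(sample, 54321))
```

Because the rules are evaluated in order, the position of the port rule relative to the final REJECT matters, which is why the function stops at the first catch-all rule.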

Comment 13 Andrew Burden 2015-06-12 04:31:46 UTC
Cheers mate

Comment 15 errata-xmlrpc 2015-06-15 13:17:27 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-1108.html