Created attachment 1064276 [details]
HE bond log files

Description of problem:
Adding the host to the cluster fails when the hosted engine is set up over a VLAN or a bonded network. The following error is reported when adding the host to the cluster:

[ERROR] Cannot automatically add the host to cluster Default: Cannot add Host. Connecting to host via SSH has failed, verify that the host is reachable (IP address, routable address etc.) You may refer to the engine.log file for further details.
[ERROR] Failed to execute stage 'Closing up': Cannot add the host to cluster Default

Version-Release number of selected component (if applicable):
rhev-hypervisor6-6.7-20150813.0.el6ev
ovirt-node-3.2.3-18.el6.noarch
ovirt-node-plugin-hosted-engine-0.2.0-18.0.el6ev.noarch
ovirt-hosted-engine-setup-1.2.5.3-1.el6ev.noarch

How reproducible:
100%

Steps to Reproduce:
Scenario 1:
1. Clean install rhev-hypervisor6-6.7-20150813.0.el6ev
2. Create the network through a bond:

# cat /proc/net/bonding/bond1
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: p3p1
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: p3p1
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 00:1b:21:36:79:f0
Slave queue ID: 0

Slave Interface: p3p2
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 00:1b:21:36:79:f1
Slave queue ID: 0

3. Set up the hosted engine using the OVA type.
4. Set up the VM to rhevm.
5. Select the default cluster to add the host:
Enter the name of the cluster to which you want to add the host (Default) [Default]:

Scenario 2:
1. Clean install rhev-hypervisor6-6.7-20150813.0.el6ev
2. Create the network through a VLAN
3. Set up the hosted engine using the OVA type.
4. Set up the VM to rhevm.
5. Select the default cluster to add the host:
Enter the name of the cluster to which you want to add the host (Default) [Default]:

Actual results:
1.
The following error is reported:

[ERROR] Cannot automatically add the host to cluster Default: Cannot add Host. Connecting to host via SSH has failed, verify that the host is reachable (IP address, routable address etc.) You may refer to the engine.log file for further details.
[ERROR] Failed to execute stage 'Closing up': Cannot add the host to cluster Default

Expected results:
1. Setting up the hosted engine over a VLAN or bonded network succeeds.

Additional info:
There is no such issue when using em1 as the network.
Created attachment 1064277 [details]
HE vlan log files
2015-08-17 05:10:56 DEBUG otopi.plugins.ovirt_hosted_engine_setup.engine.add_host add_host._closeup:654 Cannot add the host to cluster Default
Traceback (most recent call last):
  File "/usr/share/ovirt-hosted-engine-setup/plugins/ovirt-hosted-engine-setup/engine/add_host.py", line 645, in _closeup
  File "/usr/lib/python2.6/site-packages/ovirtsdk/infrastructure/brokers.py", line 13280, in add
  File "/usr/lib/python2.6/site-packages/ovirtsdk/infrastructure/proxy.py", line 88, in add
  File "/usr/lib/python2.6/site-packages/ovirtsdk/infrastructure/proxy.py", line 118, in request
  File "/usr/lib/python2.6/site-packages/ovirtsdk/infrastructure/proxy.py", line 146, in __doRequest
  File "/usr/lib/python2.6/site-packages/ovirtsdk/web/connection.py", line 134, in doRequest
RequestError:
status: 409
reason: Conflict
detail: Cannot add Host. SSH authentication failed, verify authentication parameters are correct (Username/Password, public-key etc.) You may refer to the engine.log file for further details.

Not sure what "Conflict" means here. Juan?
The 409 HTTP code and the "Conflict" reason are the translation into HTTP terms of the error type associated with that error message. From ErrorMessage.java:

  VDS_CANNOT_AUTHENTICATE_TO_SERVER(ErrorType.CONFLICT)

In theory "conflict" means that there is a conflict between the requested action and the state of the system, and that the conflict can be resolved by the user:

http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html - 10.4.10

In this particular case I think you shouldn't focus on that, but rather on what the error message is saying.
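To make the mapping concrete, here is a minimal Python sketch of the translation described above. It is illustrative only, not the actual engine code (which is Java, in ErrorMessage.java), and the entries other than CONFLICT are hypothetical examples:

```python
# Illustrative sketch: each internal error type carries an HTTP status,
# so VDS_CANNOT_AUTHENTICATE_TO_SERVER(ErrorType.CONFLICT) surfaces to
# REST clients as "409 Conflict". Only the CONFLICT entry is taken from
# the comment above; the rest are hypothetical.
ERROR_TYPE_TO_HTTP = {
    "CONFLICT": 409,        # user-resolvable conflict, per RFC 2616 10.4.10
    "BAD_PARAMETERS": 400,  # hypothetical example
    "NOT_FOUND": 404,       # hypothetical example
}

def http_status(error_type):
    """Translate an internal error type into an HTTP status code."""
    return ERROR_TYPE_TO_HTTP.get(error_type, 500)

print(http_status("CONFLICT"))  # → 409
```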
Hi wanghui,

Could you please share the log from the engine VM? (/var/log/ovirt-engine/engine.log)

I was able to reproduce your report only when the hostname of rhev-h was 'localhost.localdomain', so the engine was not able to communicate with rhev-h when it tried to connect via SSH as root. It would be nice to double-check that the rhev-h hostname was reachable from rhev-m.

Steps that I made
----------------------------
#1 Installed rhev-hypervisor6-6.7-20150813.0.el6ev
#2 (TUI) Network tab: configured the hostname
#3 (TUI) Configured bond0 and assigned eth0 and eth1 to it
#4 (TUI) Hosted Engine tab: provided http://address/rhel67.iso and started the setup
#5 Installed RHEL 6.7 and set up rhevm 3.5.4.2-1.3.el6ev
   <At this stage node.localdomain and engine.localdomain are configured in both machines in /etc/hosts>
#6 Asked hosted-engine to connect to the engine and finish the setup

Everything should then be okay. If I skip step #2, I can reproduce your report of "status: 409" as described in comment#3.

Additional data (from working scenario)
===========================================
# cat /etc/redhat-release
Red Hat Enterprise Virtualization Hypervisor release 6.7 (20150813.0.el6ev)

[root@node ~]# hosted-engine --vm-status

--== Host 1 status ==--

Status up-to-date              : True
Hostname                       : node.localdomain
Host ID                        : 1
Engine status                  : {"health": "good", "vm": "up", "detail": "up"}
Score                          : 2400
Local maintenance              : False
Host timestamp                 : 10596
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=10596 (Fri Aug 21 16:08:32 2015)
    host-id=1
    score=2400
    maintenance=False
    state=EngineUp

# cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: load balancing (round-robin)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: eth0
MII Status: up
Speed: 100 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 52:54:00:59:17:ec
Slave queue ID: 0

Slave Interface: eth1
MII Status: up
Speed: 100 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 52:54:00:6a:67:2d
Slave queue ID: 0

[root@node ~]# ifconfig
bond0     Link encap:Ethernet  HWaddr 52:54:00:59:17:EC
          inet6 addr: fe80::5054:ff:fe59:17ec/64 Scope:Link
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
          RX packets:5686303 errors:0 dropped:6579 overruns:0 frame:0
          TX packets:7391175 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:6632432255 (6.1 GiB)  TX bytes:6658130300 (6.2 GiB)

eth0      Link encap:Ethernet  HWaddr 52:54:00:59:17:EC
          inet6 addr: fe80::5054:ff:fe59:17ec/64 Scope:Link
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:2843258 errors:0 dropped:2994 overruns:0 frame:0
          TX packets:3695604 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:3318764852 (3.0 GiB)  TX bytes:3329191078 (3.1 GiB)
          Interrupt:11 Base address:0x4000

eth1      Link encap:Ethernet  HWaddr 52:54:00:59:17:EC
          inet6 addr: fe80::5054:ff:fe59:17ec/64 Scope:Link
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:2843110 errors:0 dropped:3585 overruns:0 frame:0
          TX packets:3695598 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:3313670435 (3.0 GiB)  TX bytes:3328941500 (3.1 GiB)
          Interrupt:11 Base address:0x2000

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:23184 errors:0 dropped:0 overruns:0 frame:0
          TX packets:23184 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:10535033 (10.0 MiB)  TX bytes:10535033 (10.0 MiB)

rhevm     Link encap:Ethernet  HWaddr 52:54:00:59:17:EC
          inet addr:192.168.122.30  Bcast:192.168.122.255  Mask:255.255.255.0
          inet6 addr: fe80::5054:ff:fe59:17ec/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:2020935 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1588440 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:1338074021 (1.2 GiB)  TX bytes:6216474993 (5.7 GiB)

vnet0     Link encap:Ethernet  HWaddr FE:16:3E:03:BD:98
          inet6 addr: fe80::fc16:3eff:fe03:bd98/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:1647 errors:0 dropped:0 overruns:0 frame:0
          TX packets:2375 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:500
          RX bytes:505453 (493.6 KiB)  TX bytes:494602 (483.0 KiB)

Thanks!
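The /etc/hosts entries mentioned in step #5 would look roughly like the fragment below. The node address is taken from the ifconfig output above; the engine VM's address is a placeholder, since it is not shown in this report:

```
# /etc/hosts on BOTH the node and the engine VM (addresses are examples;
# the engine address is a placeholder — use your own)
192.168.122.30   node.localdomain   node
192.168.122.NN   engine.localdomain engine
```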
I also see that localhost.localdomain was used in the setup.
Can you please make sure your hostname is DNS-resolvable, and retest ASAP?
(In reply to Yaniv Dary from comment #6)
> I also see that localhost.localdomain was used in the setup.
> Can you please make sure your host name is DNS resolvable? and retest asap?

Hi Yaniv,

rhevh's hostname is localhost.localdomain, which can only be resolved locally. The self-hosted engine doc does not say that the rhevh hostname should be resolvable, and adding rhevh from the rhevm side should not need to resolve the hostname either; we only need to provide it.

So I need to make it clear: why should the rhevh hostname be DNS-resolvable?

Thanks,
Hui Wang
(In reply to wanghui from comment #7)
> (In reply to Yaniv Dary from comment #6)
> > I also see that localhost.localdomain was used in the setup.
> > Can you please make sure your host name is DNS resolvable? and retest asap?
>
> Hi Yaniv,
>
> rhevh's hostname is localhost.localdomain, which can only be resolved
> locally. The self-hosted engine doc does not say that the rhevh hostname
> should be resolvable, and adding rhevh from the rhevm side should not need
> to resolve the hostname either; we only need to provide it.
>
> So I need to make it clear: why should the rhevh hostname be DNS-resolvable?

It is our support requirement; you should test as in a customer environment. Issues that arise from an unsupported flow are not real issues. The engine and its hosts must be DNS-resolvable: when you provide a hostname, the engine needs to be able to reach the host, which requires DNS.

> Thanks,
> Hui Wang
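The requirement above can be checked programmatically: a name like localhost.localdomain does resolve, but only to loopback addresses, so another machine (the engine) can never reach the host through it. A minimal sketch, assuming nothing beyond the Python standard library:

```python
import ipaddress
import socket

def reachable_by_name(hostname):
    """True if hostname resolves to at least one non-loopback address,
    i.e. an address another machine (such as the engine) could route to."""
    try:
        addrinfo = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return False  # not resolvable at all
    return any(
        not ipaddress.ip_address(info[4][0]).is_loopback
        for info in addrinfo
    )

# "localhost" resolves, but only to 127.0.0.1/::1 — exactly the
# localhost.localdomain situation described in this bug.
print(reachable_by_name("localhost"))  # → False
```

Running the same check for the host's real DNS name from the engine VM is one quick way to verify the setup before retrying the deployment.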
Retested on rhev-hypervisor6-6.7-20150813.0.el6ev.
Set the rhevh hostname to a DNS-resolvable name.
Set the rhevm hostname to a DNS-resolvable name.
The test passes with bond (mode=1) as the rhevh network.
Virt QE has informed me this is resolved when the hostnames are DNS-resolvable. Closing.
I have one question about this issue: do we provide a way to add the host again after we resolve the hostname? By that point the HE setup process is almost finished (the engine is up). Can we provide a way to handle the "host is not reachable" issue instead of quitting directly?

Thanks,
Hui Wang
(In reply to wanghui from comment #11)
> I have one question about this issue: do we provide a way to add the host
> again after we resolve the hostname? By that point the HE setup process is
> almost finished (the engine is up). Can we provide a way to handle the
> "host is not reachable" issue instead of quitting directly?

For hosted engine the best practice is to reinstall cleanly: remove any old installation and start over.

> Thanks,
> Hui Wang