Description of problem:
I deployed HE over iSCSI on one RHEL 7.2 host, added an NFS data storage domain to it, and then tried to add an additional RHEL 7.2 hosted-engine host, which failed. In my case both hosts can also "see" two additional empty iSCSI LUNs on the iSCSI storage, only one of which is HE's LUN (the 75 GiB LUN belongs to HE; the others are empty). The IQN of the hosted-engine host was correct.

Version-Release number of selected component (if applicable):

Engine:
rhevm-doc-4.0.0-2.el7ev.noarch
rhev-guest-tools-iso-4.0-4.el7ev.noarch
rhevm-4.0.1.1-0.1.el7ev.noarch
rhevm-spice-client-x86-msi-4.0-2.el7ev.noarch
rhevm-branding-rhev-4.0.0-3.el7ev.noarch
rhevm-spice-client-x64-msi-4.0-2.el7ev.noarch
rhevm-guest-agent-common-1.0.12-2.el7ev.noarch
rhevm-dependencies-4.0.0-1.el7ev.noarch
rhevm-setup-plugins-4.0.0.1-1.el7ev.noarch
rhev-release-4.0.1-2-001.noarch
Linux version 3.10.0-327.22.2.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-4) (GCC) ) #1 SMP Thu Jun 9 10:09:10 EDT 2016
Linux 3.10.0-327.22.2.el7.x86_64 #1 SMP Thu Jun 9 10:09:10 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.2 (Maipo)

Hosts:
ovirt-engine-sdk-python-3.6.7.0-1.el7ev.noarch
qemu-kvm-rhev-2.3.0-31.el7_2.18.x86_64
mom-0.5.5-1.el7ev.noarch
ovirt-setup-lib-1.0.2-1.el7ev.noarch
libvirt-client-1.2.17-13.el7_2.5.x86_64
ovirt-vmconsole-host-1.0.4-1.el7ev.noarch
ovirt-hosted-engine-ha-2.0.1-1.el7ev.noarch
ovirt-hosted-engine-setup-2.0.1-1.el7ev.noarch
ovirt-host-deploy-1.5.1-1.el7ev.noarch
ovirt-vmconsole-1.0.4-1.el7ev.noarch
vdsm-4.18.6-1.el7ev.x86_64
sanlock-3.2.4-2.el7_2.x86_64
ovirt-imageio-daemon-0.3.0-0.el7ev.noarch
ovirt-imageio-common-0.3.0-0.el7ev.noarch
Linux version 3.10.0-327.30.1.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-4) (GCC) ) #1 SMP Wed Jul 13 22:09:46 EDT 2016
Linux 3.10.0-327.30.1.el7.x86_64 #1 SMP Wed Jul 13 22:09:46 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.2 (Maipo)

How reproducible:
100%

Steps to Reproduce:
1. Deploy HE over iSCSI and additionally add an NFS data storage domain, so that HE is imported correctly into the engine's WebUI.
2. Try adding an additional HE host via the WebUI; make sure its IQN is correct and mapped before adding it via the WebUI.

Actual results:
Host alma04.qa.lab.tlv.redhat.com reports one of the Active Storage Domains as Problematic. The host fails to be added as an additional hosted-engine host.

Expected results:
The host should be added successfully via the WebUI.

Additional info:
Sosreports from both hosts and the engine are attached (the host being added is alma04).
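The IQN pre-check in step 2 can be sketched as a small script. This is only an illustration, not part of the reproduction: `/etc/iscsi/initiatorname.iscsi` is the standard open-iscsi location for the initiator name, while the function name and the example IQN are hypothetical.

```shell
# Minimal sketch: compare the host's configured iSCSI initiator name
# against the IQN that was mapped on the storage side. The helper name
# and example IQN below are illustrative, not from this report.
check_iqn() {
    conf_file="$1"
    expected_iqn="$2"
    # Extract the value of the InitiatorName= line from the config file.
    actual=$(sed -n 's/^InitiatorName=//p' "$conf_file")
    if [ "$actual" = "$expected_iqn" ]; then
        echo "IQN matches: $actual"
        return 0
    fi
    echo "IQN mismatch: host has '$actual', storage expects '$expected_iqn'" >&2
    return 1
}

# Typical use on the host being added (expected IQN is an example):
#   check_iqn /etc/iscsi/initiatorname.iscsi iqn.1994-05.com.redhat:alma04
```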
Created attachment 1181172 [details] sosreport from engine
Created attachment 1181173 [details] sosreport from alma03
Created attachment 1181175 [details] sosreport from alma04
This seems to be just a duplicate of bug 1350763: "Add host failed - failed to configure ovirtmgmt network on host since vdsm is still on recovery". It's not hosted-engine specific.

2016-07-18 11:44:53,487 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (VdsDeploy) [148b825d] Correlation ID: 148b825d, Call Stack: null, Custom Event ID: -1, Message: Installing Host alma04.qa.lab.tlv.redhat.com. Stage: Termination.
2016-07-18 11:44:53,620 INFO  [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) [] Connecting to alma04.qa.lab.tlv.redhat.com/10.35.117.26
2016-07-18 11:44:56,620 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.TimeBoundPollVDSCommand] (org.ovirt.thread.pool-6-thread-24) [148b825d] Command 'TimeBoundPollVDSCommand(HostName = alma04.qa.lab.tlv.redhat.com, TimeBoundPollVDSCommandParameters:{runAsync='true', hostId='beff4479-67f4-425e-8484-82ac26dc2fc4'})' execution failed: VDSGenericException: VDSNetworkException: Timeout during xml-rpc call
2016-07-18 11:44:56,620 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.TimeBoundPollVDSCommand] (org.ovirt.thread.pool-6-thread-24) [148b825d] Timeout waiting for VDSM response: null
2016-07-18 11:44:56,620 ERROR [org.ovirt.vdsm.jsonrpc.client.reactors.SSLClient] (org.ovirt.thread.pool-6-thread-32) [] null: java.lang.InterruptedException
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1039) [rt.jar:1.8.0_101]
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328) [rt.jar:1.8.0_101]
    at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277) [rt.jar:1.8.0_101]
    at org.ovirt.vdsm.jsonrpc.client.reactors.stomp.SSLStompClient.waitForConnect(SSLStompClient.java:107) [vdsm-jsonrpc-java-client.jar:]
    at org.ovirt.vdsm.jsonrpc.client.reactors.stomp.SSLStompClient.sendMessage(SSLStompClient.java:78) [vdsm-jsonrpc-java-client.jar:]
    at org.ovirt.vdsm.jsonrpc.client.JsonRpcClient.call(JsonRpcClient.java:81) [vdsm-jsonrpc-java-client.jar:]
    at org.ovirt.engine.core.vdsbroker.jsonrpc.FutureMap.<init>(FutureMap.java:91) [vdsbroker.jar:]
    at org.ovirt.engine.core.vdsbroker.jsonrpc.JsonRpcVdsServer.lambda$timeBoundPoll$2(JsonRpcVdsServer.java:972) [vdsbroker.jar:]
    at org.ovirt.engine.core.vdsbroker.jsonrpc.JsonRpcVdsServer$FutureCallable.call(JsonRpcVdsServer.java:458) [vdsbroker.jar:]
    at org.ovirt.engine.core.vdsbroker.jsonrpc.JsonRpcVdsServer$FutureCallable.call(JsonRpcVdsServer.java:447) [vdsbroker.jar:]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [rt.jar:1.8.0_101]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [rt.jar:1.8.0_101]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [rt.jar:1.8.0_101]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [rt.jar:1.8.0_101]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [rt.jar:1.8.0_101]
    at java.lang.Thread.run(Thread.java:745) [rt.jar:1.8.0_101]
2016-07-18 11:45:00,602 INFO  [org.ovirt.engine.core.vdsbroker.monitoring.VmsStatisticsFetcher] (DefaultQuartzScheduler7) [39efcb4] Fetched 1 VMs from VDS 'de5723be-3605-420e-9afd-86dc0e08c606'
2016-07-18 11:45:01,621 INFO  [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) [] Connecting to alma04.qa.lab.tlv.redhat.com/10.35.117.26
2016-07-18 11:45:04,764 INFO  [org.ovirt.engine.core.bll.network.NetworkConfigurator] (org.ovirt.thread.pool-6-thread-24) [148b825d] Engine managed to communicate with VDSM agent on host 'alma04.qa.lab.tlv.redhat.com' ('beff4479-67f4-425e-8484-82ac26dc2fc4')
2016-07-18 11:45:05,736 INFO  [org.ovirt.engine.core.bll.network.host.HostSetupNetworksCommand] (org.ovirt.thread.pool-6-thread-24) [54341c1e] Lock Acquired to object 'EngineLock:{exclusiveLocks='[beff4479-67f4-425e-8484-82ac26dc2fc4=<HOST_NETWORK, ACTION_TYPE_FAILED_SETUP_NETWORKS_IN_PROGRESS>]', sharedLocks='null'}'
2016-07-18 11:45:05,829 INFO  [org.ovirt.engine.core.bll.network.host.HostSetupNetworksCommand] (org.ovirt.thread.pool-6-thread-24) [54341c1e] Running command: HostSetupNetworksCommand internal: true. Entities affected : ID: beff4479-67f4-425e-8484-82ac26dc2fc4 Type: VDSAction group CONFIGURE_HOST_NETWORK with role type ADMIN
2016-07-18 11:45:05,835 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.HostSetupNetworksVDSCommand] (org.ovirt.thread.pool-6-thread-24) [54341c1e] START, HostSetupNetworksVDSCommand(HostName = alma04.qa.lab.tlv.redhat.com, HostSetupNetworksVdsCommandParameters:{runAsync='true', hostId='beff4479-67f4-425e-8484-82ac26dc2fc4', vds='Host[alma04.qa.lab.tlv.redhat.com,beff4479-67f4-425e-8484-82ac26dc2fc4]', rollbackOnFailure='true', connectivityTimeout='120', networks='[HostNetwork:{defaultRoute='true', bonding='false', networkName='ovirtmgmt', nicName='enp3s0f0', vlan='null', mtu='0', vmNetwork='true', stp='false', properties='null', ipv4BootProtocol='DHCP', ipv4Address='null', ipv4Netmask='null', ipv4Gateway='null', ipv6BootProtocol='NONE', ipv6Address='null', ipv6Prefix='null', ipv6Gateway='null'}]', removedNetworks='[]', bonds='[]', removedBonds='[]'}), log id: 7572ebe8
2016-07-18 11:45:05,838 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.HostSetupNetworksVDSCommand] (org.ovirt.thread.pool-6-thread-24) [54341c1e] FINISH, HostSetupNetworksVDSCommand, log id: 7572ebe8

*** This bug has been marked as a duplicate of bug 1350763 ***
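Since the duplicate bug is about VDSM still being in recovery while the engine configures the network, a quick diagnostic helper could look like the sketch below. Caveat: the "Recovering from crash or Initializing" string and the use of `vdsClient` are assumptions based on typical vdsm 4.18 behaviour, not something taken from this report.

```shell
# Sketch: run a given VDSM query command and report whether its output
# contains the error VDSM typically returns while still recovering.
# Both the command and the matched string are assumptions.
vdsm_recovering() {
    output=$("$@" 2>&1)
    case "$output" in
        *"Recovering from crash or Initializing"*) return 0 ;;  # still recovering
        *) return 1 ;;                                          # recovered (or other state)
    esac
}

# Possible use on the host being added (command is an assumption):
#   if vdsm_recovering vdsClient -s 0 getVdsCaps; then
#       echo "vdsm still recovering, host add will likely time out"
#   fi
```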
These two are different bugs. Mine was found on 4.0.1, whereas https://bugzilla.redhat.com/show_bug.cgi?id=1350763#c31 was for 3.6.8 and was also verified on 2016-07-18 06:39:04 EDT. However, I still hit this bug on 2016-07-18 12:08 EDT, so this bug should be fixed for 4.0.
OK, sorry: bug 1348103 tracks the same issue for 4.0.2, so setting this as a duplicate of that instead. *** This bug has been marked as a duplicate of bug 1348103 ***
I'm not sure these are still the same: in bug 1348103 the host addition succeeded after 5 minutes, but in my case it does not; it stays in this state permanently.
(In reply to Nikolai Sednev from comment #7)
> I'm not sure these are still the same: in bug 1348103 the host addition
> succeeded after 5 minutes, but in my case it does not; it stays in this
> state permanently.

In my opinion the issue really is the same. In that case, after 5 minutes the AutoRecoveryManager kicks in and the host comes up by itself, so simply retrying on the hosted-engine side is enough to continue. Maybe we need to better understand why the AutoRecoveryManager is not enough here, but the root issue is really the same.
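The "just retry on the hosted-engine side" idea from the comment above amounts to a simple poll-until-up loop. A minimal sketch, with the actual status command left as a placeholder (it would be whatever reports the host's engine-side status, e.g. a query against the engine REST API; the function name, retry counts, and delays below are all assumptions):

```shell
# Sketch of the retry suggested in comment 8: poll a status command until
# it reports "up" (i.e. until the AutoRecoveryManager has brought the host
# up), then continue. The status command itself is a placeholder.
wait_for_host_up() {
    check_cmd="$1"          # command printing the host status ("up" when recovered)
    retries="${2:-30}"      # how many polls before giving up (assumed default)
    delay="${3:-10}"        # seconds between polls (assumed default)
    i=0
    while [ "$i" -lt "$retries" ]; do
        if [ "$($check_cmd)" = "up" ]; then
            echo "host is up after $((i * delay))s"
            return 0
        fi
        sleep "$delay"
        i=$((i + 1))
    done
    echo "host did not come up within $((retries * delay))s" >&2
    return 1
}
```

With a 10-second delay and 30 retries this covers the roughly 5 minutes the AutoRecoveryManager reportedly needs in bug 1348103.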