Bug 1364787 - Installing a 3.6 host on a 3.6 cluster cause host to stuck on activating state
Summary: Installing a 3.6 host on a 3.6 cluster cause host to stuck on activating state
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: Host-Deploy
Version: 4.0.2.4
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: urgent
Target Milestone: ovirt-4.0.2
Target Release: 4.0.2.6
Assignee: Martin Perina
QA Contact: Michael Burman
URL:
Whiteboard:
Duplicates: 1366593
Depends On:
Blocks:
Reported: 2016-08-07 13:14 UTC by sefi litmanovich
Modified: 2016-08-17 14:44 UTC (History)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2016-08-17 14:44:52 UTC
oVirt Team: Network
Embargoed:
rule-engine: ovirt-4.0.z+
rule-engine: blocker+
rule-engine: planning_ack+
danken: devel_ack+
pstehlik: testing_ack+


Attachments
engine log (497.07 KB, application/x-gzip)
2016-08-07 13:14 UTC, sefi litmanovich


Links
System | ID | Private | Priority | Status | Summary | Last Updated
oVirt gerrit | 62113 | 0 | ovirt-engine-4.0 | MERGED | engine: fix NPE related to switchType | 2016-08-09 12:08:21 UTC
oVirt gerrit | 62117 | 0 | ovirt-engine-4.0.2 | MERGED | engine: fix NPE related to switchType | 2016-08-09 12:09:24 UTC
oVirt gerrit | 62380 | 0 | master | MERGED | engine: Set default switchType if none is received from vdsm | 2016-08-16 14:16:53 UTC

Description sefi litmanovich 2016-08-07 13:14:18 UTC
Created attachment 1188377
engine log

Description of problem:

Tried to deploy a 3.6 host on rhevm-4.0.2.4-0.1.el7ev.noarch in a 3.6 DC-Cluster.
The host had previously been installed in a different 3.6 environment and was put into maintenance there (not sure whether that is related to the problem).
Deployment ended with error:
'Host host_upgrade installation failed. Failed to configure management network on the host.'
In engine log there were several messages starting with:

2016-08-07 15:11:05,194 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.CollectVdsNetworkDataAfterInstallationVDSCommand] (org.ovirt.thread.pool-6-thread-46) [5c804e8b] Failed in 'CollectVdsNetworkDataAfterInstallationVDS' method, for vds: 'host_upgrade'; host: 'monique-vds01.tlv.redhat.com': null
2016-08-07 15:11:05,194 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.CollectVdsNetworkDataAfterInstallationVDSCommand] (org.ovirt.thread.pool-6-thread-46) [5c804e8b] Command 'CollectVdsNetworkDataAfterInstallationVDSCommand(HostName = host_upgrade, CollectHostNetworkDataVdsCommandParameters:{runAsync='true', hostId='5e3c8781-2190-4f87-97fb-b31c5096e6b9', vds='Host[host_upgrade,5e3c8781-2190-4f87-97fb-b31c5096e6b9]'})' execution failed: null
2016-08-07 15:11:05,195 INFO  [org.ovirt.engine.core.utils.transaction.TransactionSupport] (org.ovirt.thread.pool-6-thread-46) [5c804e8b] transaction rolled back
2016-08-07 15:11:05,195 ERROR [org.ovirt.engine.core.bll.hostdeploy.InstallVdsInternalCommand] (org.ovirt.thread.pool-6-thread-46) [5c804e8b] Exception: org.ovirt.engine.core.common.errors.EngineException: EngineException: java.lang.NullPointerException (Failed with error ENGINE and code 5001)
        at org.ovirt.engine.core.bll.VdsHandler.handleVdsResult(VdsHandler.java:114) [bll.jar:]
        at org.ovirt.engine.core.bll.VDSBrokerFrontendImpl.runVdsCommand(VDSBrokerFrontendImpl.java:33) [bll.jar:]
        at org.ovirt.engine.core.bll.network.NetworkConfigurator.lambda$refreshNetworkConfiguration$0(NetworkConfigurator.java:144) [bll.jar:]........ (long trace, see attached log)


2016-08-07 15:11:05,206 ERROR [org.ovirt.engine.core.bll.hostdeploy.InstallVdsInternalCommand] (org.ovirt.thread.pool-6-thread-46) [5c804e8b] Host installation failed for host '5e3c8781-2190-4f87-97fb-b31c5096e6b9', 'host_upgrade': Failed to configure management network on the host
2016-08-07 15:11:05,212 INFO  [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (org.ovirt.thread.pool-6-thread-46) [5c804e8b] START, SetVdsStatusVDSCommand(HostName = host_upgrade, SetVdsStatusVDSCommandParameters:{runAsync='true', hostId='5e3c8781-2190-4f87-97fb-b31c5096e6b9', status='NonOperational', nonOperationalReason='NONE', stopSpmFailureLogged='false', maintenanceReason='null'}), log id: 3934a87b
2016-08-07 15:11:05,217 INFO  [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (org.ovirt.thread.pool-6-thread-46) [5c804e8b] FINISH, SetVdsStatusVDSCommand, log id: 3934a87b
2016-08-07 15:11:05,233 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-6-thread-46) [5c804e8b] Correlation ID: 5c804e8b, Job ID: 80d8bb10-45ac-4b7f-b5d9-51425f9616b8, Call Stack: null, Custom Event ID: -1, Message: Host host_upgrade installation failed. Failed to configure management network on the host.
2016-08-07 15:11:05,238 INFO  [org.ovirt.engine.core.bll.hostdeploy.InstallVdsInternalCommand] (org.ovirt.thread.pool-6-thread-46) [5c804e8b] Lock freed to object 'EngineLock:{exclusiveLocks='[5e3c8781-2190-4f87-97fb-b31c5096e6b9=<VDS, ACTION_TYPE_FAILED_OBJECT_LOCKED>]', sharedLocks='null'}'
2016-08-07 15:11:06,671 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand] (DefaultQuartzScheduler1) [31b0ff79] Command 'org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand' return value 'org.ovirt.engine.core.vdsbroker.vdsbroker.VDSInfoReturnForXmlRpc@1559ddc2'
2016-08-07 15:11:06,671 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand] (DefaultQuartzScheduler1) [31b0ff79] HostName = host_upgrade
2016-08-07 15:11:06,672 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand] (DefaultQuartzScheduler1) [31b0ff79] Failed in 'GetCapabilitiesVDS' method, for vds: 'host_upgrade'; host: 'monique-vds01.tlv.redhat.com': null
2016-08-07 15:11:06,672 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand] (DefaultQuartzScheduler1) [31b0ff79] Command 'GetCapabilitiesVDSCommand(HostName = host_upgrade, VdsIdAndVdsVDSCommandParametersBase:{runAsync='true', hostId='5e3c8781-2190-4f87-97fb-b31c5096e6b9', vds='Host[host_upgrade,5e3c8781-2190-4f87-97fb-b31c5096e6b9]'})' execution failed: null
2016-08-07 15:11:06,672 ERROR [org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring] (DefaultQuartzScheduler1) [31b0ff79] Failure to refresh Vds runtime info: null


Then the following error appears over and over again:

2016-08-07 15:11:06,672 ERROR [org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring] (DefaultQuartzScheduler1) [31b0ff79] Exception: java.lang.NullPointerException
        at org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerObjectsBuilder.getSwitchType(VdsBrokerObjectsBuilder.java:2165) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerObjectsBuilder.addHostNetworksAndUpdateInterfaces(VdsBrokerObjectsBuilder.java:1780) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerObjectsBuilder.updateNetworkData(VdsBrokerObjectsBuilder.java:1733) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerObjectsBuilder.updateVDSDynamicData(VdsBrokerObjectsBuilder.java:875) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand.executeVdsBrokerCommand(GetCapabilitiesVDSCommand.java:17) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand.executeVDSCommand(VdsBrokerCommand.java:110) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.VDSCommandBase.executeCommand(VDSCommandBase.java:73) [vdsbroker.jar:]
        at org.ovirt.engine.core.dal.VdcCommandBase.execute(VdcCommandBase.java:33) [dal.jar:]
        at org.ovirt.engine.core.vdsbroker.ResourceManager.runVdsCommand(ResourceManager.java:451) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.VdsManager.refreshCapabilities(VdsManager.java:653) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring.beforeFirstRefreshTreatment(HostMonitoring.java:687) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring.refreshVdsRunTimeInfo(HostMonitoring.java:129) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring.refresh(HostMonitoring.java:85) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.VdsManager.onTimer(VdsManager.java:238) [vdsbroker.jar:]
        at sun.reflect.GeneratedMethodAccessor67.invoke(Unknown Source) [:1.8.0_101]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.8.0_101]
        at java.lang.reflect.Method.invoke(Method.java:498) [rt.jar:1.8.0_101]
        at org.ovirt.engine.core.utils.timer.JobWrapper.invokeMethod(JobWrapper.java:77) [scheduler.jar:]
        at org.ovirt.engine.core.utils.timer.JobWrapper.execute(JobWrapper.java:51) [scheduler.jar:]
        at org.quartz.core.JobRunShell.run(JobRunShell.java:213) [quartz.jar:]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [rt.jar:1.8.0_101]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) [rt.jar:1.8.0_101]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [rt.jar:1.8.0_101]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [rt.jar:1.8.0_101]
        at java.lang.Thread.run(Thread.java:745) [rt.jar:1.8.0_101]

The host became non-operational. After restarting the network on the host, I tried to get capabilities so that the host's NIC would appear (there is none right now in rhevm). At that point the host became non-responsive and I tried to reinstall.
During the reinstallation the same issue occurred, only this time the host is stuck on 'Activating' and engine.log is filled with the last error mentioned above.

Version-Release number of selected component (if applicable):
rhevm-4.0.2.4-0.1.el7ev.noarch

How reproducible:
Happened to me once.

Steps to Reproduce:
1. Have a host with 3.6 vdsm-4.17.33-1.el7ev.noarch (in my case host had ovirtmgmt bridge from previous rhevm 3.6 environment it was installed on.)
2. Try to install host on a 3.6 cluster in rhevm-4.0


Actual results:
Install fails because of network configuration.

Expected results:
Install succeeds

Additional info:
Didn't see any related messages in server.log.

Comment 1 SATHEESARAN 2016-08-12 13:20:53 UTC
*** Bug 1366593 has been marked as a duplicate of this bug. ***

Comment 2 Michael Burman 2016-08-15 07:44:41 UTC
Hi, I'm testing this report on 4.0.2.6-0.1.el7ev with vdsm-4.17.33-1.el7ev.noarch.

The host installed successfully in a 3.6 cluster on rhv-m 4.0 and it is UP, but the 'ovirtmgmt' network is out of sync with the host.

The DC reports the 'legacy' switch type and the host reports nothing.

- It looks like vdsm 3.6 is missing the top-level 'switch' value and reports it only in the cfg (as 'SWITCH'), unlike vdsm 4.0.

Just to compare the output of getVdsCaps - 

vdsm 3.6 - 

networks = {'ovirtmgmt': {'addr': '10.35.128.15',
                                  'bridged': True,
                                  'cfg': {'BOOTPROTO': 'dhcp',
                                          'DEFROUTE': 'yes',
                                          'DELAY': '0',
                                          'DEVICE': 'ovirtmgmt',
                                          'HOTPLUG': 'no',
                                          'IPV6INIT': 'no',
                                          'MTU': '1500',
                                          'NM_CONTROLLED': 'no',
                                          'ONBOOT': 'yes',
                                          'STP': 'off',
                                          'SWITCH': 'legacy',
                                          'TYPE': 'Bridge'},
                                  'dhcpv4': True,
                                  'dhcpv6': False,
                                  'gateway': '10.35.128.254',
                                  'iface': 'ovirtmgmt',
                                  'ipv4addrs': ['10.35.128.15/24'],
                                  'ipv6addrs': [],
                                  'ipv6gateway': '::',
                                  'mtu': '1500',
                                  'netmask': '255.255.255.0',
                                  'ports': ['enp4s0'],
                                  'stp': 'off'}}





vdsm 4.0 - 

networks = {'ovirtmgmt': {'addr': '10.35.160.53',
                                  'bridged': True,
                                  'cfg': {'BOOTPROTO': 'dhcp',
                                          'DEFROUTE': 'yes',
                                          'DELAY': '0',
                                          'DEVICE': 'ovirtmgmt',
                                          'IPV6INIT': 'no',
                                          'MTU': '1500',
                                          'NM_CONTROLLED': 'no',
                                          'ONBOOT': 'yes',
                                          'STP': 'off',
                                          'TYPE': 'Bridge'},
                                  'dhcpv4': True,
                                  'dhcpv6': False,
                                  'gateway': '10.35.160.254',
                                  'hostQos': {'out': {'ls': {'d': 0, 'm1': 0, 'm2': 50}}},
                                  'iface': 'ovirtmgmt',
                                  'ipv4addrs': ['10.35.160.53/24'],
                                  'ipv6addrs': [],
                                  'ipv6autoconf': False,
                                  'ipv6gateway': '::',
                                  'mtu': '1500',
                                  'netmask': '255.255.255.0',
                                  'ports': ['enp4s0f0'],
                                  'stp': 'off',
                                  'switch': 'legacy'}}
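
To make the difference between the two dumps explicit, here is a minimal sketch that compares key sets at the top level and inside 'cfg'. The dictionaries are trimmed copies of the 'ovirtmgmt' entries above (shared keys mostly omitted), and the helper name key_diff is hypothetical, not part of vdsm or the engine.

```python
# Trimmed copies of the two 'ovirtmgmt' entries from the dumps above.
# Only the differing keys plus a few shared ones are kept for brevity.
caps_36 = {
    "bridged": True,
    "cfg": {"HOTPLUG": "no", "SWITCH": "legacy", "TYPE": "Bridge"},
    "stp": "off",
}
caps_40 = {
    "bridged": True,
    "cfg": {"TYPE": "Bridge"},
    "hostQos": {"out": {"ls": {"d": 0, "m1": 0, "m2": 50}}},
    "ipv6autoconf": False,
    "stp": "off",
    "switch": "legacy",
}


def key_diff(old, new):
    """Return (keys only in old, keys only in new), each as a sorted list."""
    return sorted(old.keys() - new.keys()), sorted(new.keys() - old.keys())


# Compare the top-level keys, then the nested 'cfg' keys.
only_top_36, only_top_40 = key_diff(caps_36, caps_40)
only_cfg_36, only_cfg_40 = key_diff(caps_36["cfg"], caps_40["cfg"])

print("top level, 4.0 only:", only_top_40)
print("cfg, 3.6 only:", only_cfg_36)
```

On these trimmed inputs, the top-level 'switch' key shows up only for vdsm 4.0, while 'SWITCH' shows up only inside the 3.6 cfg.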

Let me know if you would like to keep it in the same bug or if it deserves a separate one.
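
For illustration only, the idea named in the linked gerrit change "Set default switchType if none is received from vdsm" can be sketched as a lookup with a fallback. This is not the engine's Java code; the function name switch_type and the constant DEFAULT_SWITCH_TYPE are hypothetical, and using 'legacy' as the default is an assumption based on the value the DC reports above.

```python
# Assumption: 'legacy' is used as the fallback because that is the value
# the DC reports in this comment; the real engine default may differ.
DEFAULT_SWITCH_TYPE = "legacy"


def switch_type(network_entry):
    """Return the reported 'switch' value, or the default if the host omits it."""
    value = network_entry.get("switch")
    return value if value is not None else DEFAULT_SWITCH_TYPE


# A 3.6-style entry has no top-level 'switch' key, so the default applies.
entry_36 = {"bridged": True, "cfg": {"SWITCH": "legacy"}, "stp": "off"}
print(switch_type(entry_36))
```

With a fallback like this, a host that omits 'switch' is treated the same as one that reports the default, instead of yielding a missing value for the caller to trip over.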

Comment 3 Dan Kenigsberg 2016-08-16 10:28:10 UTC
Yes, please open a fresh new blocker - adding a 3.6 host to 4.0 results in ovirtmgmt out of sync.

Comment 4 Michael Burman 2016-08-16 11:49:41 UTC
Verified on - 4.0.2.6-0.1.el7ev and vdsm-4.18.11-1.el7ev.x86_64

