Bug 981960

Summary: NullPointerException on Power Management command
Product: Red Hat Enterprise Virtualization Manager
Reporter: Kiril Nesenko <knesenko>
Component: ovirt-engine
Assignee: Martin Perina <mperina>
Status: CLOSED CURRENTRELEASE
QA Contact: Artyom <alukiano>
Severity: high
Docs Contact:
Priority: high
Version: 3.3.0
CC: acathrow, alukiano, bazulay, bdagan, dfediuck, eedri, iheim, jkt, knesenko, lpeer, oramraz, pstehlik, Rhev-m-bugs, yeylon, yzaslavs
Target Milestone: ---
Keywords: Regression, TestBlocker
Target Release: 3.3.0
Hardware: x86_64
OS: Linux
Whiteboard: infra
Fixed In Version: is7
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-01-21 22:14:27 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Infra
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Kiril Nesenko 2013-07-07 10:24:34 UTC
Description of problem:

NullPointerException on Power Management command.

2013-07-07 11:51:45,083 - MainThread - plmanagement.error_fetcher - ERROR - Errors fetched from VDC(jenkins-automation-rpm-vm36.eng.lab.tlv.redhat.com): 2013-07-07 11:38:02,997 ERROR [org.ovirt.engine.core.bll.StartVdsCommand] (pool-4-thread-47) Failed to verify host cinteg35.ci.lab.tlv.redhat.com start status. Have retried 18 times with delay of 10 seconds between each retry.
2013-07-07 11:38:03,036 INFO  [org.ovirt.engine.core.bll.FenceExecutor] (pool-4-thread-47) Executing <Start> Power Management command, Proxy Host:null, Agent:ipmilan, Target Host:cinteg35.ci.lab.tlv.redhat.com, Management IP:10.35.148.109, User:admin, Options:
2013-07-07 11:38:03,036 ERROR [org.ovirt.engine.core.vdsbroker.ResourceManager] (pool-4-thread-47) CreateCommand failed: java.lang.NullPointerException
	at java.util.concurrent.ConcurrentHashMap.hash(ConcurrentHashMap.java:333) [rt.jar:1.7.0_25]
	at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:988) [rt.jar:1.7.0_25]
	at org.ovirt.engine.core.vdsbroker.ResourceManager.GetVdsManager(ResourceManager.java:192) [vdsbroker.jar:]
	at org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand.initializeVdsBroker(VdsBrokerCommand.java:47) [vdsbroker.jar:]
	at org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand.<init>(VdsBrokerCommand.java:28) [vdsbroker.jar:]
	at org.ovirt.engine.core.vdsbroker.vdsbroker.FenceVdsVDSCommand.<init>(FenceVdsVDSCommand.java:18) [vdsbroker.jar:]
	at sun.reflect.GeneratedConstructorAccessor113.newInstance(Unknown Source) [:1.7.0_25]
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) [rt.jar:1.7.0_25]
	at java.lang.reflect.Constructor.newInstance(Constructor.java:526) [rt.jar:1.7.0_25]
	at org.ovirt.engine.core.vdsbroker.ResourceManager.CreateCommand(ResourceManager.java:319) [vdsbroker.jar:]
	at org.ovirt.engine.core.vdsbroker.ResourceManager.runVdsCommand(ResourceManager.java:356) [vdsbroker.jar:]
	at org.ovirt.engine.core.bll.VDSBrokerFrontendImpl.RunVdsCommand(VDSBrokerFrontendImpl.java:33) [bll.jar:]
	at org.ovirt.engine.core.bll.FenceExecutor.runFenceAction(FenceExecutor.java:192) [bll.jar:]
	at org.ovirt.engine.core.bll.FenceExecutor.Fence(FenceExecutor.java:166) [bll.jar:]
	at org.ovirt.engine.core.bll.FenceVdsBaseCommand.handleWaitFailure(FenceVdsBaseCommand.java:367) [bll.jar:]
	at org.ovirt.engine.core.bll.FenceVdsBaseCommand.handleSingleAgent(FenceVdsBaseCommand.java:192) [bll.jar:]
	at org.ovirt.engine.core.bll.FenceVdsBaseCommand.executeCommand(FenceVdsBaseCommand.java:159) [bll.jar:]
	at org.ovirt.engine.core.bll.CommandBase.executeWithoutTransaction(CommandBase.java:1068) [bll.jar:]
	at org.ovirt.engine.core.bll.CommandBase.executeActionInTransactionScope(CommandBase.java:1153) [bll.jar:]
	at org.ovirt.engine.core.bll.CommandBase.runInTransaction(CommandBase.java:1623) [bll.jar:]
	at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInSuppressed(TransactionSupport.java:174) [utils.jar:]
	at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInScope(TransactionSupport.java:116) [utils.jar:]
	at org.ovirt.engine.core.bll.CommandBase.execute(CommandBase.java:1171) [bll.jar:]
	at org.ovirt.engine.core.bll.CommandBase.executeAction(CommandBase.java:317) [bll.jar:]

Version-Release number of selected component (if applicable):
57e65cd0946763755ef8834732bea2d3deb196e4

How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 4 Martin Perina 2013-07-10 14:38:30 UTC
I noticed something strange in engine.log: SSH Soft Fencing was not executed prior to real fencing, even with these patches applied.

I have not been able to reproduce the NullPointerException. Are there precise steps to reproduce it?

I've tried 2 scenarios:

1) I stopped VDSM on the host. After the time interval, SSH Soft Fencing was executed and the host came up without any errors.

2) I set the SSH Soft Fencing command to a value that will not start VDSM ("echo 0"), then stopped VDSM. After the time interval, SSH Soft Fencing was executed but the host did not come up, so after another time interval real fencing was executed, and after the PM restart the host came up without any errors.

Comment 5 Martin Perina 2013-07-18 07:37:26 UTC
Kiril, could you please provide some info on how to reproduce this bug?

Comment 6 Kiril Nesenko 2013-07-18 07:48:43 UTC
(In reply to Martin Perina from comment #5)
> Kiril, could you please provide some info on how to reproduce this bug?

You can take a look at the job (see comment #2) and its flow. I have no idea how to reproduce it, and it seems the exception has disappeared: I can't see it in the job anymore.

Comment 8 Artyom 2013-07-22 10:11:18 UTC
To reproduce this error, enable the concurrent option under power management with two correctly configured power management agents. The error appears in versions sf18 and is6. For more details, see bug https://bugzilla.redhat.com/show_bug.cgi?id=977689

Comment 9 Martin Perina 2013-07-26 06:19:45 UTC
Since patch http://gerrit.ovirt.org/#/c/17206/ prevents executing power management commands with proxy host set to null, it should also fix this bug. Could you please retest with patch 17206 included?
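
As a rough illustration of the idea described above (this is not the content of patch 17206, which is not reproduced in this report), the sketch below shows why a null proxy host ID leads to a NullPointerException in a ConcurrentHashMap lookup, and how a null check before the lookup avoids it. The class ProxyHostGuardSketch, its methods, and the returned status strings are hypothetical names made up for this example; they are not oVirt engine code.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a host-manager registry; not oVirt engine code.
public class ProxyHostGuardSketch {
    private final Map<UUID, String> managersByHostId = new ConcurrentHashMap<>();

    public void register(UUID hostId, String managerName) {
        managersByHostId.put(hostId, managerName);
    }

    // Unguarded lookup: ConcurrentHashMap.get(null) throws NullPointerException,
    // which matches the top frames of the stack trace in the description.
    public String lookupUnguarded(UUID proxyHostId) {
        return managersByHostId.get(proxyHostId);
    }

    // Guarded variant (assumption): refuse to run the action when no proxy host is set.
    public String runFenceAction(UUID proxyHostId, String action) {
        if (proxyHostId == null) {
            return "SKIPPED " + action + ": no proxy host selected";
        }
        String manager = managersByHostId.get(proxyHostId);
        if (manager == null) {
            return "SKIPPED " + action + ": unknown proxy host " + proxyHostId;
        }
        return "EXECUTED " + action + " via " + manager;
    }

    public static void main(String[] args) {
        ProxyHostGuardSketch sketch = new ProxyHostGuardSketch();
        UUID proxy = UUID.randomUUID();
        sketch.register(proxy, "proxy-manager");

        System.out.println(sketch.runFenceAction(proxy, "Start"));
        System.out.println(sketch.runFenceAction(null, "Start"));
        try {
            sketch.lookupUnguarded(null);
        } catch (NullPointerException e) {
            System.out.println("unguarded lookup threw NullPointerException");
        }
    }
}
```

With a guard of this kind, a missing proxy host is reported as a skipped action rather than surfacing as an exception from the map lookup.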

Comment 10 Artyom 2013-07-28 06:10:32 UTC
I can't check the patch because of bug https://bugzilla.redhat.com/show_bug.cgi?id=982266. The hosts I have only use power management types apc_snmp and ipmilan, and I need a host with two power management agents for checking.

Comment 12 Artyom 2013-08-12 13:22:57 UTC
Checked with the same power management agent configured as both the first and second PM agent, with concurrent enabled; fencing worked fine. (I have no hosts with two different power management agents of the same type.)
The proxy host is no longer null and has the correct value.
Verified on is9.1

Comment 13 Itamar Heim 2014-01-21 22:14:27 UTC
Closing - RHEV 3.3 Released

Comment 14 Itamar Heim 2014-01-21 22:22:06 UTC
Closing - RHEV 3.3 Released