Bug 975558

Summary: DestroyVmVDSCommand fails with NPE; VM cannot be stopped
Product: Red Hat Enterprise Virtualization Manager
Reporter: Ohad Basan <obasan>
Component: ovirt-engine
Assignee: Omer Frenkel <ofrenkel>
Status: CLOSED DUPLICATE
QA Contact: Tareq Alayan <talayan>
Severity: high
Priority: urgent
Version: 3.3.0
CC: acathrow, bazulay, eedri, hateya, iheim, jkt, lpeer, mgoldboi, michal.skrivanek, obasan, Rhev-m-bugs, talayan, yeylon
Target Milestone: ---
Keywords: Regression, TestBlocker, Triaged
Target Release: 3.3.0
Hardware: Unspecified
OS: Unspecified
Whiteboard: virt
Doc Type: Bug Fix
Last Closed: 2013-07-17 09:29:49 UTC
Type: Bug
Attachments:
- failed run from jenkins logs
- local run logs

Description Ohad Basan 2013-06-18 18:54:15 UTC
Description of problem:
When engine 3.3 executes the DestroyVmVDSCommand (against a host running VDSM 3.2), the VM does not stop and an NPE is thrown:

2013-06-18 21:03:36,325 ERROR [org.jboss.as.ejb3.invocation] (ajp-/127.0.0.1:8702-2) JBAS014134: EJB Invocation failed on component VdsEventListener for method public abstract void org.ovirt.engine.core.common.businessentities.IVdsEventListener.removeAsyncRunningCommand(org.ovirt.engine.core.compat.Guid): javax.ejb.EJBException: java.lang.NullPointerException
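The engine code here is Java, but the failure mode is easy to sketch. Below is a minimal illustration (in Python, the language of the attached tracebacks; all names in it are invented for this sketch and are not the actual engine source) of how removing an async-running command whose map entry is already gone can dereference a null/None value:

```python
# Hypothetical sketch of the failure mode behind the NPE: look up an
# async-running command by VM id and use the result without checking
# whether the entry exists. Names here are invented for illustration.

class Command:
    def __init__(self):
        self.cancelled = False

    def cancel(self):
        self.cancelled = True


class AsyncRunningCommands:
    def __init__(self):
        self._commands = {}  # vm_id -> command object

    def add(self, vm_id, command):
        self._commands[vm_id] = command

    def remove_unsafe(self, vm_id):
        # Buggy pattern: get() returns None when the VM id has no running
        # command, and calling a method on None blows up (the Python
        # analogue of a Java NullPointerException).
        command = self._commands.get(vm_id)
        command.cancel()
        del self._commands[vm_id]

    def remove_safe(self, vm_id):
        # Defensive variant: a missing entry is treated as already removed.
        command = self._commands.pop(vm_id, None)
        if command is not None:
            command.cancel()
```

The defensive variant mirrors the usual fix for this class of bug: tolerate a missing entry instead of dereferencing the lookup result unconditionally.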

Comment 6 Omer Frenkel 2013-06-19 13:51:01 UTC
What is the test scenario? Please provide steps to reproduce.

Comment 7 Ohad Basan 2013-06-19 13:53:21 UTC
I think the writer of the test can provide you with this info.
Tareq, can you assist?

Comment 8 Michal Skrivanek 2013-06-24 11:45:29 UTC
*** Bug 966597 has been marked as a duplicate of this bug. ***

Comment 11 Tareq Alayan 2013-07-01 06:37:22 UTC
I couldn't reproduce the NPE manually.
The test fails when it attempts to power off a VM that has a direct LUN attached to it.
1. create vm
2. attach an external disk to it (direct lun)
3. start vm
4. power off vm
5. see [1], a snapshot of the engine log from the jenkins run (logs attached) (jenkins...tar.bz2)
6. it didn't cause an NPE when I ran it manually, but take a look at the engine log snapshot [2] (attached vdsm.log, engine.log)

link to automatic test in jenkins http://jenkins.qa.lab.tlv.redhat.com:8080/view/Storage/view/3.3/job/3.3-storage_direct_lun_sanity-iscsi-rest/3/consoleFull





[1] engine log
Thread-134::DEBUG::2013-06-26 13:26:57,450::libvirtconnection::136::vds::(wrapper) Unknown libvirterror: ecode: 55 edom: 10 level: 2 message: Requested operation is not valid: cgroup CPUACCT controller is not mounted
Thread-129::DEBUG::2013-06-26 13:26:57,457::vmChannels::199::vds::(register) Add fileno 28 to listener's channels.
Thread-134::ERROR::2013-06-26 13:26:57,450::sampling::355::vm.Vm::(collect) vmId=`f7a84203-3d1b-493f-a38f-694ec79009e8`::Stats function failed: <AdvancedStatsFunction _sampleCpu at 0x1372928>
Traceback (most recent call last):
  File "/usr/share/vdsm/sampling.py", line 351, in collect
    statsFunction()
  File "/usr/share/vdsm/sampling.py", line 226, in __call__
    retValue = self._function(*args, **kwargs)
  File "/usr/share/vdsm/vm.py", line 513, in _sampleCpu
    cpuStats = self._vm._dom.getCPUStats(True, 0)
  File "/usr/share/vdsm/vm.py", line 815, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py", line 112, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 1849, in getCPUStats
    if ret is None: raise libvirtError ('virDomainGetCPUStats() failed', dom=self)
libvirtError: Requested operation is not valid: cgroup CPUACCT controller is not mounted




[2]  engine log (run locally)
Thread-134::DEBUG::2013-06-26 13:26:57,450::libvirtconnection::136::vds::(wrapper) Unknown libvirterror: ecode: 55 edom: 10 level: 2 message: Requested operation is not valid: cgroup CPUACCT controller is not mounted
Thread-129::DEBUG::2013-06-26 13:26:57,457::vmChannels::199::vds::(register) Add fileno 28 to listener's channels.
Thread-134::ERROR::2013-06-26 13:26:57,450::sampling::355::vm.Vm::(collect) vmId=`f7a84203-3d1b-493f-a38f-694ec79009e8`::Stats function failed: <AdvancedStatsFunction _sampleCpu at 0x1372928>
Traceback (most recent call last):
  File "/usr/share/vdsm/sampling.py", line 351, in collect
    statsFunction()
  File "/usr/share/vdsm/sampling.py", line 226, in __call__
    retValue = self._function(*args, **kwargs)
  File "/usr/share/vdsm/vm.py", line 513, in _sampleCpu
    cpuStats = self._vm._dom.getCPUStats(True, 0)
  File "/usr/share/vdsm/vm.py", line 815, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py", line 112, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 1849, in getCPUStats
    if ret is None: raise libvirtError ('virDomainGetCPUStats() failed', dom=self)
libvirtError: Requested operation is not valid: cgroup CPUACCT controller is not mounted
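The "cgroup CPUACCT controller is not mounted" error means libvirt could not find a mounted cpuacct cgroup hierarchy on the host, so virDomainGetCPUStats() has nothing to read. A small diagnostic sketch (assuming a cgroup v1 layout, where controllers appear as mount options on cgroup filesystem entries in /proc/mounts):

```python
def cpuacct_mounted(mounts_text):
    """Return True if a cgroup v1 hierarchy carrying the cpuacct
    controller appears in the given /proc/mounts content."""
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mountpoint, fstype, options, ...
        if len(fields) >= 4 and fields[2] == "cgroup":
            if "cpuacct" in fields[3].split(","):
                return True
    return False


if __name__ == "__main__":
    try:
        with open("/proc/mounts") as f:
            print("cpuacct mounted:", cpuacct_mounted(f.read()))
    except OSError:
        print("/proc/mounts not available on this system")
```

On a host where the controller is mounted, /proc/mounts contains a cgroup line whose options include cpuacct; if it is absent, the sampling error above is expected.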

Comment 12 Tareq Alayan 2013-07-01 06:38:34 UTC
Created attachment 767250 [details]
failed run from jenkins logs

Comment 13 Tareq Alayan 2013-07-01 06:41:23 UTC
Created attachment 767251 [details]
local run logs

Comment 14 Omer Frenkel 2013-07-01 09:29:23 UTC
Just a note: the snippets in comment 11 are from the vdsm log, not the engine log.
The vdsm.log attached in the local run does not contain this error, and neither engine log contains the reported NPE, so it does not seem related to this bug.

But if the error happens in the automatic test and not manually, what is the difference?

Comment 15 Barak 2013-07-01 12:24:55 UTC
(In reply to Tareq Alayan from comment #11)
> I couldn't reproduce the NPE manually.
> The test fails when it attempts to power off vm that have direct lun
> attached to it.
> 1. create vm
> 2. attach an external disk to it (direct lun)
> 3. start vm
> 4. power off vm
> 5. see [1] snapshot of engine log when run in jenkins (logs attached)
> (jenkins...tar.bz2)
> 6. it didn't cause a NPE when i ran manually but take a look at engine log
> snapshot[2]  (attached vdsm.log engine.log)
> [...]
> libvirtError: Requested operation is not valid: cgroup CPUACCT controller is
> not mounted

The above exception was fixed in http://gerrit.ovirt.org/#/c/16204/

Comment 16 Ohad Basan 2013-07-01 12:28:59 UTC
I think the errors got mixed up.
The problem I opened this bug about, the NPE on the DestroyVmVDS command, can be seen in the engine.log I provided.
This problem still persists.
The owner of the test can explain why the command fails on the automatic run but not on the manual one.
(different params, maybe?)

Comment 18 Omer Frenkel 2013-07-01 13:20:05 UTC
Thanks Ohad,
I meant to ask Tareq.

Tareq,
if the error happens in the automatic test and not manually, what is the difference between the environments that causes this?

Comment 19 Tareq Alayan 2013-07-14 12:34:32 UTC
Steps to reproduce:
assume: is5 is installed and running
1. create a VM with network and disk.
2. start the vm - the vm is up
3. via the sdk:
from ovirtsdk import api

# connect to the engine (URL and credentials are placeholders)
api = api.API("https://url.to.rhevm.engine", "USERNAME@internal", "PASSWORD",
              persistent_auth=False, insecure=True)

vms = api.vms.list()   # list all VMs
test1 = vms[0]         # the VM created in step 1
test1.stop()           # power off the VM

Comment 20 Omer Frenkel 2013-07-17 09:29:49 UTC
The reproduction steps described in comment 19 lead to the same error fixed in bug 974999.
I've verified with Barak Dagan, who experienced the error described here, that this is the same bug.

*** This bug has been marked as a duplicate of bug 974999 ***