Description of problem: when engine 3.3 executes the DestroyVmVDSCommand command (using a host with vdsm 3.2), the VM won't stop and an NPE is thrown:

2013-06-18 21:03:36,325 ERROR [org.jboss.as.ejb3.invocation] (ajp-/127.0.0.1:8702-2) JBAS014134: EJB Invocation failed on component VdsEventListener for method public abstract void org.ovirt.engine.core.common.businessentities.IVdsEventListener.removeAsyncRunningCommand(org.ovirt.engine.core.compat.Guid): javax.ejb.EJBException: java.lang.NullPointerException
What is the test scenario? Please provide steps to reproduce.
I think the writer of the test can provide you with this info. Tareq, can you assist?
*** Bug 966597 has been marked as a duplicate of this bug. ***
I couldn't reproduce the NPE manually. The test fails when it attempts to power off a VM that has a direct LUN attached to it.

1. create a VM
2. attach an external disk to it (direct LUN)
3. start the VM
4. power off the VM
5. see [1], a snapshot of the engine log from the jenkins run (logs attached: jenkins...tar.bz2)
6. it didn't cause an NPE when I ran it manually, but take a look at the engine log snapshot in [2] (attached vdsm.log, engine.log)

link to the automatic test in jenkins:
http://jenkins.qa.lab.tlv.redhat.com:8080/view/Storage/view/3.3/job/3.3-storage_direct_lun_sanity-iscsi-rest/3/consoleFull

[1] engine log

Thread-134::DEBUG::2013-06-26 13:26:57,450::libvirtconnection::136::vds::(wrapper) Unknown libvirterror: ecode: 55 edom: 10 level: 2 message: Requested operation is not valid: cgroup CPUACCT controller is not mounted
Thread-129::DEBUG::2013-06-26 13:26:57,457::vmChannels::199::vds::(register) Add fileno 28 to listener's channels.
Thread-134::ERROR::2013-06-26 13:26:57,450::sampling::355::vm.Vm::(collect) vmId=`f7a84203-3d1b-493f-a38f-694ec79009e8`::Stats function failed: <AdvancedStatsFunction _sampleCpu at 0x1372928>
Traceback (most recent call last):
  File "/usr/share/vdsm/sampling.py", line 351, in collect
    statsFunction()
  File "/usr/share/vdsm/sampling.py", line 226, in __call__
    retValue = self._function(*args, **kwargs)
  File "/usr/share/vdsm/vm.py", line 513, in _sampleCpu
    cpuStats = self._vm._dom.getCPUStats(True, 0)
  File "/usr/share/vdsm/vm.py", line 815, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py", line 112, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 1849, in getCPUStats
    if ret is None: raise libvirtError ('virDomainGetCPUStats() failed', dom=self)
libvirtError: Requested operation is not valid: cgroup CPUACCT controller is not mounted

[2] engine log (run locally)

Thread-134::DEBUG::2013-06-26 13:26:57,450::libvirtconnection::136::vds::(wrapper) Unknown libvirterror: ecode: 55 edom: 10 level: 2 message: Requested operation is not valid: cgroup CPUACCT controller is not mounted
Thread-129::DEBUG::2013-06-26 13:26:57,457::vmChannels::199::vds::(register) Add fileno 28 to listener's channels.
Thread-134::ERROR::2013-06-26 13:26:57,450::sampling::355::vm.Vm::(collect) vmId=`f7a84203-3d1b-493f-a38f-694ec79009e8`::Stats function failed: <AdvancedStatsFunction _sampleCpu at 0x1372928>
Traceback (most recent call last):
  File "/usr/share/vdsm/sampling.py", line 351, in collect
    statsFunction()
  File "/usr/share/vdsm/sampling.py", line 226, in __call__
    retValue = self._function(*args, **kwargs)
  File "/usr/share/vdsm/vm.py", line 513, in _sampleCpu
    cpuStats = self._vm._dom.getCPUStats(True, 0)
  File "/usr/share/vdsm/vm.py", line 815, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py", line 112, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 1849, in getCPUStats
    if ret is None: raise libvirtError ('virDomainGetCPUStats() failed', dom=self)
libvirtError: Requested operation is not valid: cgroup CPUACCT controller is not mounted
Created attachment 767250 [details] failed run from jenkins logs
Created attachment 767251 [details] local run logs
Just a note: the snippets in comment 11 are from the vdsm log, not the engine log. The attached vdsm.log in the local run does not contain this error, and neither engine log contains the reported NPE, so it doesn't seem related to this bug. But if it happens in the automatic test and not manually, what is the difference?
(In reply to Tareq Alayan from comment #11)
> I couldn't reproduce the NPE manually.
> The test fails when it attempts to power off vm that have direct lun
> attached to it.
> [steps and log snippets quoted in full from comment 11 snipped]

The above exception was fixed in http://gerrit.ovirt.org/#/c/16204/
I think there was a mix-up between the errors. The problem I opened this bug on, the "NPE on destroyVds command", can be seen in the engine.log I provided. This problem still persists. The owner of the test can provide additional information on why the command doesn't fail on manual removal but does fail on automatic removal (different params, maybe?).
Thanks Ohad, I meant to ask Tareq. Tareq, if the error happens in the automatic test and not manually, what is the difference between the environments that causes this?
steps to reproduce (assuming is5 is installed and running):
1. create a VM with a network and a disk
2. start the VM - the VM is up
3. via the SDK:

    from ovirtsdk import api
    api = api.API("https://url.to.rhevm.engine", "USERNAME@internal", "PASSWORD",
                  persistent_auth=False, insecure=True)
    vms = api.vms.list()
    test1 = vms[0]
    test1.stop()
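A side note on automating step 3: vm.stop() returns before the engine actually reports the VM as down, so an automated run typically polls the status afterwards, which is one place where automatic and manual runs can behave differently. A minimal, self-contained sketch of such a polling helper follows; the helper itself and the SDK usage shown in the comments are illustrative assumptions, not part of the test code above.

```python
import time

def wait_for(condition, timeout=60.0, interval=2.0):
    """Poll `condition` until it returns True or `timeout` seconds elapse.

    Returns True if the condition became true in time, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(interval)
    return False

# Hypothetical usage with the ovirtsdk objects from the steps above:
# test1.stop()
# stopped = wait_for(lambda: api.vms.get(id=test1.id).status.state == "down")
```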
The reproduction steps described in comment 19 lead to the same error fixed in bug 974999. I've verified with Barak Dagan, who experienced the error described here, that this is the same bug.

*** This bug has been marked as a duplicate of bug 974999 ***