Description of problem: when engine 3.3 executes the DestroyVmVDSCommand command (using a host with vdsm 3.2), the VM won't stop and an NPE is thrown:

2013-06-18 21:03:36,325 ERROR [org.jboss.as.ejb3.invocation] (ajp-/127.0.0.1:8702-2) JBAS014134: EJB Invocation failed on component VdsEventListener for method public abstract void org.ovirt.engine.core.common.businessentities.IVdsEventListener.removeAsyncRunningCommand(org.ovirt.engine.core.compat.Guid): javax.ejb.EJBException: java.lang.NullPointerException
What is the test scenario? Please provide steps to reproduce.
I think the writer of the test can provide you with this info. Tareq, can you assist?
*** Bug 966597 has been marked as a duplicate of this bug. ***
I couldn't reproduce the NPE manually. The test fails when it attempts to power off a VM that has a direct LUN attached to it.

1. create a VM
2. attach an external disk to it (direct LUN)
3. start the VM
4. power off the VM
5. see [1], a snapshot of the engine log from the jenkins run (logs attached: jenkins...tar.bz2)
6. it didn't cause an NPE when I ran it manually, but take a look at the engine log snapshot in [2] (attached vdsm.log, engine.log)

link to the automatic test in jenkins:
http://jenkins.qa.lab.tlv.redhat.com:8080/view/Storage/view/3.3/job/3.3-storage_direct_lun_sanity-iscsi-rest/3/consoleFull

[1] engine log

Thread-134::DEBUG::2013-06-26 13:26:57,450::libvirtconnection::136::vds::(wrapper) Unknown libvirterror: ecode: 55 edom: 10 level: 2 message: Requested operation is not valid: cgroup CPUACCT controller is not mounted
Thread-129::DEBUG::2013-06-26 13:26:57,457::vmChannels::199::vds::(register) Add fileno 28 to listener's channels.
Thread-134::ERROR::2013-06-26 13:26:57,450::sampling::355::vm.Vm::(collect) vmId=`f7a84203-3d1b-493f-a38f-694ec79009e8`::Stats function failed: <AdvancedStatsFunction _sampleCpu at 0x1372928>
Traceback (most recent call last):
  File "/usr/share/vdsm/sampling.py", line 351, in collect
    statsFunction()
  File "/usr/share/vdsm/sampling.py", line 226, in __call__
    retValue = self._function(*args, **kwargs)
  File "/usr/share/vdsm/vm.py", line 513, in _sampleCpu
    cpuStats = self._vm._dom.getCPUStats(True, 0)
  File "/usr/share/vdsm/vm.py", line 815, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py", line 112, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 1849, in getCPUStats
    if ret is None: raise libvirtError ('virDomainGetCPUStats() failed', dom=self)
libvirtError: Requested operation is not valid: cgroup CPUACCT controller is not mounted

[2] engine log (run locally)

Thread-134::DEBUG::2013-06-26 13:26:57,450::libvirtconnection::136::vds::(wrapper) Unknown libvirterror: ecode: 55 edom: 10 level: 2 message: Requested operation is not valid: cgroup CPUACCT controller is not mounted
Thread-129::DEBUG::2013-06-26 13:26:57,457::vmChannels::199::vds::(register) Add fileno 28 to listener's channels.
Thread-134::ERROR::2013-06-26 13:26:57,450::sampling::355::vm.Vm::(collect) vmId=`f7a84203-3d1b-493f-a38f-694ec79009e8`::Stats function failed: <AdvancedStatsFunction _sampleCpu at 0x1372928>
Traceback (most recent call last):
  File "/usr/share/vdsm/sampling.py", line 351, in collect
    statsFunction()
  File "/usr/share/vdsm/sampling.py", line 226, in __call__
    retValue = self._function(*args, **kwargs)
  File "/usr/share/vdsm/vm.py", line 513, in _sampleCpu
    cpuStats = self._vm._dom.getCPUStats(True, 0)
  File "/usr/share/vdsm/vm.py", line 815, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py", line 112, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 1849, in getCPUStats
    if ret is None: raise libvirtError ('virDomainGetCPUStats() failed', dom=self)
libvirtError: Requested operation is not valid: cgroup CPUACCT controller is not mounted
Created attachment 767250 [details] failed run from jenkins logs
Created attachment 767251 [details] local run logs
Just a note: the snippets in comment 11 are from the vdsm log, not the engine log. The attached vdsm.log in the local run does not contain this error, and neither engine log contains the reported NPE, so it doesn't seem related to this bug. But if it happens in the automatic test and not manually, what is the difference?
(In reply to Tareq Alayan from comment #11)
> I couldn't reproduce the NPE manually.
> The test fails when it attempts to power off vm that have direct lun
> attached to it.
> [steps and log snippets quoted in full from comment 11 snipped]

The above exception was fixed in http://gerrit.ovirt.org/#/c/16204/
I think there was a mix-up between the errors. The problem I opened this bug on, the "NPE on destroyVds command", can be seen in the engine.log I provided. This problem still persists. The owner of the test can provide additional information on why the command doesn't fail on manual removal but does fail on automatic removal (different params, maybe?).
Thanks Ohad, I meant to ask Tareq. Tareq, if the error happens in the automatic test and not manually, what is the difference between the environments that causes this?
steps to reproduce (assuming is5 is installed and running):
1. create a VM with a network and a disk
2. start the VM - the VM is up
3. via the SDK:

    from ovirtsdk import api
    api = api.API("https://url.to.rhevm.engine", "USERNAME@internal", "PASSWORD",
                  persistent_auth=False, insecure=True)
    vms = api.vms.list()
    test1 = vms[0]
    test1.stop()
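A side note on automating step 3: vm.stop() returns before the engine actually reports the VM as down, so an automated run typically polls the status afterwards, which is one place where automatic and manual runs can behave differently. A minimal, self-contained sketch of such a polling helper follows; the helper itself and the SDK usage shown in the comments are illustrative assumptions, not part of the test code above.

```python
import time

def wait_for(condition, timeout=60.0, interval=2.0):
    """Poll `condition` until it returns True or `timeout` seconds elapse.

    Returns True if the condition became true in time, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(interval)
    return False

# Hypothetical usage with the ovirtsdk objects from the steps above:
# test1.stop()
# stopped = wait_for(lambda: api.vms.get(id=test1.id).status.state == "down")
```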
The reproduction steps described in comment 19 lead to the same error fixed in bug 974999. I've verified with Barak Dagan, who experienced the error described here, that this is the same bug.

*** This bug has been marked as a duplicate of bug 974999 ***