Bug 994546 - [engine-backend] NullPointerException during a failure in detaching a local ISO domain
[engine-backend] NullPointerException during a failure in detaching a local I...
Status: CLOSED INSUFFICIENT_DATA
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine (Show other bugs)
3.3.0
x86_64 Unspecified
unspecified Severity medium
: ---
: 3.3.0
Assigned To: Ravi Nori
Pavel Stehlik
infra
: Triaged
Depends On:
Blocks: 974849
  Show dependency treegraph
 
Reported: 2013-08-07 09:35 EDT by Elad
Modified: 2016-02-10 14:32 EST (History)
12 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-10-08 09:59:27 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Infra
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
amureini: Triaged+
scohen: Triaged+


Attachments (Terms of Use)
logs (905.73 KB, application/x-gzip)
2013-08-07 09:35 EDT, Elad
no flags Details

  None (edit)
Description Elad 2013-08-07 09:35:28 EDT
Created attachment 783917 [details]
logs

Description of problem:
During a failure of detaching a local ISO domain from pool, I've encountered this NPE:

2013-08-07 15:56:22,460 ERROR [org.ovirt.engine.core.utils.timer.SchedulerUtilQuartzImpl] (DefaultQuartzScheduler_Worker-98) Failed to invoke scheduled method PerformLoadBalancing: java.
lang.reflect.InvocationTargetException

2013-08-07 15:56:22,631 ERROR [org.ovirt.engine.core.utils.timer.SchedulerUtilQuartzImpl] (DefaultQuartzScheduler_Worker-100) Failed to invoke scheduled method OnTimer: java.lang.reflect.InvocationTargetException
        at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source) [:1.7.0_25]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.7.0_25]
        at java.lang.reflect.Method.invoke(Method.java:606) [rt.jar:1.7.0_25]
        at org.ovirt.engine.core.utils.timer.JobWrapper.execute(JobWrapper.java:60) [scheduler.jar:]
        at org.quartz.core.JobRunShell.run(JobRunShell.java:213) [quartz.jar:]
        at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:557) [quartz.jar:]
Caused by: java.lang.NullPointerException
        at org.ovirt.engine.core.vdsbroker.VdsManager.OnTimer(VdsManager.java:212) [vdsbroker.jar:]
        ... 6 more


Version-Release number of selected component (if applicable):
rhevm-3.3.0-0.13.master.el6ev.noarch



Steps to Reproduce:
on a data center with connected storage pool and a local ISO domain is maintenance:
- detach the local ISO domain and stop ovirt-engine right after engine sends DetachStorageDomainVDSCommand to host 



Additional info:
logs
Comment 1 Allon Mureinik 2013-08-22 10:44:19 EDT
Sergey, is your recent fix related to this?
Comment 2 Sergey Gotliv 2013-08-22 14:35:25 EDT
This is the root cause of the problem: LockManager is not found so the function can't acquire the lock, this is infra issue. Scenario itself is not longer valid, because you can't attach Local Storage to Non Local DC anymore, but this exception probably san appear in another flow either.

2013-08-07 15:56:23,443 ERROR [org.ovirt.engine.core.utils.ejb.EJBUtilsStrategy] (DefaultQuartzScheduler_Worker-4) Failed to lookup resource type: LOCK_MANAGER. JNDI name: java:global/engine/bll/LockManager: java.lang.IllegalArgumentException: JBAS011857: NamingStore is null
	at org.jboss.as.naming.NamingContext.<init>(NamingContext.java:152)
	at org.jboss.as.naming.NamingContext.<init>(NamingContext.java:125)
	at org.jboss.as.naming.InitialContext.<init>(InitialContext.java:86)
	at org.jboss.as.naming.InitialContextFactory.getInitialContext(InitialContextFactory.java:44)
Comment 3 Barak 2013-09-09 08:02:24 EDT
Yair - status ?
Comment 4 Yair Zaslavsky 2013-09-09 08:39:35 EDT
I looked at the log and there were other several issues - for example

Caused by: java.sql.SQLException: javax.resource.ResourceException: IJ000451: The connection manager is shutdown: java:/ENGINEDataSource
	at org.jboss.jca.adapters.jdbc.WrapperDataSource.getConnection(WrapperDataSource.java:147)
	at org.springframework.jdbc.datasource.DataSourceUtils.doGetConnection(DataSourceUtils.java:111) [spring-jdbc.jar:3.1.3.RELEASE]
	at org.springframework.jdbc.datasource.DataSourceUtils.getConnection(DataSourceUtils.java:77) [spring-jdbc.jar:3.1.3.RELEASE]
	... 85 more
Caused by: javax.resource.ResourceException: IJ000451: The connection manager is shutdown: java:/ENGINEDataSource

Elad, do you know if you had some DB issues for example?
Did other flow behave ok after you saw this issue?
The reason I'm asking is that looks like jboss is being shutdown (I also tried to google up the JBoss AS code in the error and from what I see it has to do with JBoss being shutdown , and at the same time, the application trying to perform a JNDI lookup)
Comment 5 Elad 2013-09-10 05:50:50 EDT
(In reply to Yair Zaslavsky from comment #4)
> I looked at the log and there were other several issues - for example
> 
> Caused by: java.sql.SQLException: javax.resource.ResourceException:
> IJ000451: The connection manager is shutdown: java:/ENGINEDataSource
> 	at
> org.jboss.jca.adapters.jdbc.WrapperDataSource.
> getConnection(WrapperDataSource.java:147)
> 	at
> org.springframework.jdbc.datasource.DataSourceUtils.
> doGetConnection(DataSourceUtils.java:111) [spring-jdbc.jar:3.1.3.RELEASE]
> 	at
> org.springframework.jdbc.datasource.DataSourceUtils.
> getConnection(DataSourceUtils.java:77) [spring-jdbc.jar:3.1.3.RELEASE]
> 	... 85 more
> Caused by: javax.resource.ResourceException: IJ000451: The connection
> manager is shutdown: java:/ENGINEDataSource
> 
> Elad, do you know if you had some DB issues for example?
> Did other flow behave ok after you saw this issue?
> The reason I'm asking is that looks like jboss is being shutdown (I also
> tried to google up the JBoss AS code in the error and from what I see it has
> to do with JBoss being shutdown , and at the same time, the application
> trying to perform a JNDI lookup)

AFAIK anything else works fine..
Comment 6 Ravi Nori 2013-09-17 11:31:32 EDT
I was unable to reproduce this with upstream master. Please try to reproduce with latest/next release and let me know.
Comment 7 Elad 2013-09-25 08:01:36 EDT
(In reply to Ravi Nori from comment #6)
> I was unable to reproduce this with upstream master. Please try to reproduce
> with latest/next release and let me know.

I'm having trouble checking it on latest build, I fail to create local ISO domain:

Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 857, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 45, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 1166, in attachStorageDomain
    pool.attachSD(sdUUID)
  File "/usr/share/vdsm/storage/securable.py", line 68, in wrapper
    return f(self, *args, **kwargs)
  File "/usr/share/vdsm/storage/sp.py", line 998, in attachSD
    dom.acquireClusterLock(self.id)
  File "/usr/share/vdsm/storage/sd.py", line 487, in acquireClusterLock
    self._clusterLock.acquire(hostID)
  File "/usr/share/vdsm/storage/clusterlock.py", line 112, in acquire
    raise se.AcquireLockFailure(self._sdUUID, rc, out, err)
AcquireLockFailure: Cannot obtain lock: "id=e2fefc75-ae0c-453c-a9a0-63716107da9e, rc=1, out=[], err=['panic: [31756] validate_lease_params: bad lease/op max timeouts: (Success)']"


there is a bug on that
https://bugzilla.redhat.com/show_bug.cgi?id=994463
Comment 9 Aharon Canan 2013-09-29 08:31:11 EDT
Elad, is out for a while, 
please contact me if needed.
Comment 13 Barak 2013-09-30 16:25:35 EDT
(In reply to Aharon Canan from comment #12)
> Barak - 
> the reproduction steps in the description doesn't work?

No, See comments #6 and #11

Note You need to log in before you can comment on or make changes to this bug.