Red Hat Bugzilla – Bug 994546
[engine-backend] NullPointerException during a failure in detaching a local ISO domain
Last modified: 2016-02-10 14:32:56 EST
Created attachment 783917
Description of problem:
While detaching a local ISO domain from the pool failed, I encountered this NPE:
2013-08-07 15:56:22,460 ERROR [org.ovirt.engine.core.utils.timer.SchedulerUtilQuartzImpl] (DefaultQuartzScheduler_Worker-98) Failed to invoke scheduled method PerformLoadBalancing: java.
2013-08-07 15:56:22,631 ERROR [org.ovirt.engine.core.utils.timer.SchedulerUtilQuartzImpl] (DefaultQuartzScheduler_Worker-100) Failed to invoke scheduled method OnTimer: java.lang.reflect.InvocationTargetException
at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source) [:1.7.0_25]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.7.0_25]
at java.lang.reflect.Method.invoke(Method.java:606) [rt.jar:1.7.0_25]
at org.ovirt.engine.core.utils.timer.JobWrapper.execute(JobWrapper.java:60) [scheduler.jar:]
at org.quartz.core.JobRunShell.run(JobRunShell.java:213) [quartz.jar:]
at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:557) [quartz.jar:]
Caused by: java.lang.NullPointerException
at org.ovirt.engine.core.vdsbroker.VdsManager.OnTimer(VdsManager.java:212) [vdsbroker.jar:]
... 6 more
Version-Release number of selected component (if applicable):
Steps to Reproduce:
On a data center with a connected storage pool and a local ISO domain in maintenance:
- Detach the local ISO domain and stop ovirt-engine right after the engine sends DetachStorageDomainVDSCommand to the host.
Sergey, is your recent fix related to this?
This is the root cause of the problem: LockManager is not found, so the function can't acquire the lock. This is an infra issue. The scenario itself is no longer valid, because you can no longer attach local storage to a non-local DC, but this exception can probably appear in other flows as well.
2013-08-07 15:56:23,443 ERROR [org.ovirt.engine.core.utils.ejb.EJBUtilsStrategy] (DefaultQuartzScheduler_Worker-4) Failed to lookup resource type: LOCK_MANAGER. JNDI name: java:global/engine/bll/LockManager: java.lang.IllegalArgumentException: JBAS011857: NamingStore is null
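The failure chain described above (the LOCK_MANAGER lookup fails, then the timer hits an NPE) can be illustrated with a minimal Java sketch that checks the lookup result and skips the tick instead of dereferencing null. `GuardedTimer`, its nested `LockManager` interface, and the `Supplier`-based lookup are hypothetical stand-ins, not the real oVirt `VdsManager`, `LockManager`, or `EJBUtilsStrategy` code. It is also an assumption that the real NPE at VdsManager.java:212 comes from an unchecked lookup result like this one.

```java
import java.util.function.Supplier;
import java.util.logging.Logger;

// Hypothetical sketch; class and method names do not correspond to real oVirt code.
class GuardedTimer {
    // Hypothetical lock manager contract, standing in for the engine's LockManager.
    interface LockManager {
        boolean acquireLock(String key);

        void releaseLock(String key);
    }

    private static final Logger LOG = Logger.getLogger(GuardedTimer.class.getName());

    // Assumed behaviour: the lookup yields null when the container's naming
    // store is unavailable (e.g. during shutdown) instead of throwing.
    private final Supplier<LockManager> lockManagerLookup;

    GuardedTimer(Supplier<LockManager> lockManagerLookup) {
        this.lockManagerLookup = lockManagerLookup;
    }

    /** Runs one monitoring tick; returns false if the tick was skipped. */
    boolean onTimer(String lockKey) {
        LockManager lockManager = lockManagerLookup.get();
        if (lockManager == null) {
            // Without this check, lockManager.acquireLock(...) would throw an NPE.
            LOG.warning("LockManager unavailable, skipping timer tick for " + lockKey);
            return false;
        }
        if (!lockManager.acquireLock(lockKey)) {
            return false;
        }
        try {
            // The periodic host-monitoring work would run here.
            return true;
        } finally {
            lockManager.releaseLock(lockKey);
        }
    }

    public static void main(String[] args) {
        // Lookup that fails, as in the LOCK_MANAGER error above.
        GuardedTimer duringShutdown = new GuardedTimer(() -> null);
        System.out.println(duringShutdown.onTimer("vds-1"));

        // Lookup that succeeds with a trivial always-grant lock manager.
        GuardedTimer healthy = new GuardedTimer(() -> new LockManager() {
            @Override
            public boolean acquireLock(String key) {
                return true;
            }

            @Override
            public void releaseLock(String key) {
            }
        });
        System.out.println(healthy.onTimer("vds-1"));
    }
}
```

Skipping a single tick is cheap for a periodic job, which is why the sketch logs and returns rather than throwing.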
Yair, status?
I looked at the log and there were several other issues, for example:
Caused by: java.sql.SQLException: javax.resource.ResourceException: IJ000451: The connection manager is shutdown: java:/ENGINEDataSource
at org.springframework.jdbc.datasource.DataSourceUtils.doGetConnection(DataSourceUtils.java:111) [spring-jdbc.jar:3.1.3.RELEASE]
at org.springframework.jdbc.datasource.DataSourceUtils.getConnection(DataSourceUtils.java:77) [spring-jdbc.jar:3.1.3.RELEASE]
... 85 more
Caused by: javax.resource.ResourceException: IJ000451: The connection manager is shutdown: java:/ENGINEDataSource
Elad, do you know whether you had any DB issues, for example?
Did other flows behave OK after you saw this issue?
The reason I'm asking is that it looks like JBoss is being shut down. I also searched for the JBoss AS error code, and from what I see it occurs when JBoss is shutting down while the application is trying to perform a JNDI lookup.
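If the hypothesis above is right (scheduled jobs racing with JBoss shutdown), one common mitigation is for the job wrapper to check a shutdown flag before starting a job. The sketch below assumes that approach; `ShutdownAwareJobRunner`, `beginShutdown`, and `runIfActive` are hypothetical names and do not reproduce the engine's actual `JobWrapper`.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch; not the engine's real JobWrapper.
class ShutdownAwareJobRunner {
    // Assumed to be flipped from a container lifecycle callback
    // (for example a @PreDestroy method) before resources are torn down.
    private final AtomicBoolean shuttingDown = new AtomicBoolean(false);

    void beginShutdown() {
        shuttingDown.set(true);
    }

    /**
     * Runs the job only if shutdown has not started; returns whether it ran.
     * This narrows, but does not close, the window in which a job can start
     * just before teardown.
     */
    boolean runIfActive(Runnable job) {
        if (shuttingDown.get()) {
            return false;
        }
        job.run();
        return true;
    }

    public static void main(String[] args) {
        ShutdownAwareJobRunner runner = new ShutdownAwareJobRunner();
        AtomicInteger runs = new AtomicInteger();
        runner.runIfActive(runs::incrementAndGet); // runs
        runner.beginShutdown();
        runner.runIfActive(runs::incrementAndGet); // skipped
        System.out.println(runs.get());
    }
}
```

A job that passed the flag check can still hit a resource that is torn down mid-run, so the job itself should also tolerate a failed lookup.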
(In reply to Yair Zaslavsky from comment #4)
> Elad, do you know if you had some DB issues for example?
> Did other flow behave ok after you saw this issue?
> The reason I'm asking is that looks like jboss is being shutdown (I also
> tried to google up the JBoss AS code in the error and from what I see it has
> to do with JBoss being shutdown , and at the same time, the application
> trying to perform a JNDI lookup)
AFAIK everything else works fine.
I was unable to reproduce this with upstream master. Please try to reproduce it with the latest/next release and let me know.
(In reply to Ravi Nori from comment #6)
I'm having trouble checking it on the latest build; I fail to create a local ISO domain:
Traceback (most recent call last):
File "/usr/share/vdsm/storage/task.py", line 857, in _run
return fn(*args, **kargs)
File "/usr/share/vdsm/logUtils.py", line 45, in wrapper
res = f(*args, **kwargs)
File "/usr/share/vdsm/storage/hsm.py", line 1166, in attachStorageDomain
File "/usr/share/vdsm/storage/securable.py", line 68, in wrapper
return f(self, *args, **kwargs)
File "/usr/share/vdsm/storage/sp.py", line 998, in attachSD
File "/usr/share/vdsm/storage/sd.py", line 487, in acquireClusterLock
File "/usr/share/vdsm/storage/clusterlock.py", line 112, in acquire
raise se.AcquireLockFailure(self._sdUUID, rc, out, err)
AcquireLockFailure: Cannot obtain lock: "id=e2fefc75-ae0c-453c-a9a0-63716107da9e, rc=1, out=, err=['panic:  validate_lease_params: bad lease/op max timeouts: (Success)']"
There is an existing bug for that.
Elad is out for a while; please contact me if needed.
(In reply to Aharon Canan from comment #12)
> Barak -
> the reproduction steps in the description doesn't work?
No, see comments #6 and #11.