Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 920736

Summary: engine: java.lang.NullPointerException for SpmStatusVDS
Product: Red Hat Enterprise Virtualization Manager Reporter: Dafna Ron <dron>
Component: ovirt-engineAssignee: mkublin <mkublin>
Status: CLOSED CURRENTRELEASE QA Contact: Pavel Stehlik <pstehlik>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 3.2.0CC: acathrow, bazulay, dyasny, iheim, lpeer, pstehlik, Rhev-m-bugs, sgrinber, yeylon, ykaul, yzaslavs
Target Milestone: ---Keywords: Regression
Target Release: 3.2.0   
Hardware: x86_64   
OS: Linux   
Whiteboard: infra
Fixed In Version: sf13 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Infra RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
logs none

Description Dafna Ron 2013-03-12 15:59:24 UTC
Created attachment 709049 [details]
logs

Description of problem:

we are getting java.lang.NullPointerException for SpmStatusVDS 

Version-Release number of selected component (if applicable):

sf10

How reproducible:

100%

Steps to Reproduce:
1. in two hosts cluster with 1 NFS domain block connectivity to the storage from both hosts
2.
3.
  
Actual results:

java.lang.NullPointerException for SpmStatusVDS

Expected results:

we should not get NPE

Additional info:logs

engine: 

2013-03-12 05:21:47,176 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (QuartzScheduler_Worker-72) Failed in SpmStatusVDS method
2013-03-12 05:21:47,176 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (QuartzScheduler_Worker-72) Error code GeneralException and error message VDSGenericException: VDSErrorException: Failed to SpmStatusVDS, error 
= 
2013-03-12 05:21:47,176 ERROR [org.ovirt.engine.core.vdsbroker.VDSCommandBase] (QuartzScheduler_Worker-72) Command SpmStatusVDS execution failed. Exception: VDSErrorException: VDSGenericException: VDSErrorException: Failed to SpmStatusV
DS, error = 
2013-03-12 05:21:47,177 ERROR [org.ovirt.engine.core.vdsbroker.ResourceManager] (QuartzScheduler_Worker-72) CreateCommand failed: java.lang.NullPointerException
        at java.util.concurrent.ConcurrentHashMap.hash(ConcurrentHashMap.java:332) [rt.jar:1.7.0_09-icedtea]
        at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:987) [rt.jar:1.7.0_09-icedtea]
        at org.ovirt.engine.core.vdsbroker.ResourceManager.GetVdsManager(ResourceManager.java:199) [engine-vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand.initializeVdsBroker(VdsBrokerCommand.java:51) [engine-vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand.<init>(VdsBrokerCommand.java:32) [engine-vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetAllTasksStatusesVDSCommand.<init>(HSMGetAllTasksStatusesVDSCommand.java:22) [engine-vdsbroker.jar:]
        at sun.reflect.GeneratedConstructorAccessor98.newInstance(Unknown Source) [:1.7.0_09-icedtea]
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) [rt.jar:1.7.0_09-icedtea]
        at java.lang.reflect.Constructor.newInstance(Constructor.java:525) [rt.jar:1.7.0_09-icedtea]
        at org.ovirt.engine.core.vdsbroker.ResourceManager.CreateCommand(ResourceManager.java:333) [engine-vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.ResourceManager.runVdsCommand(ResourceManager.java:370) [engine-vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand$IrsProxyData.proceedStoragePoolStats(IrsBrokerCommand.java:231) [engine-vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand$IrsProxyData._updatingTimer_Elapsed(IrsBrokerCommand.java:183) [engine-vdsbroker.jar:]
        at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source) [:1.7.0_09-icedtea]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.7.0_09-icedtea]
        at java.lang.reflect.Method.invoke(Method.java:601) [rt.jar:1.7.0_09-icedtea]
        at org.ovirt.engine.core.utils.timer.JobWrapper.execute(JobWrapper.java:60) [engine-scheduler.jar:]
        at org.quartz.core.JobRunShell.run(JobRunShell.java:213) [quartz-2.1.2.jar:]
        at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:557) [quartz-2.1.2.jar:]


vdsm: 

Thread-66::INFO::2013-03-12 05:42:52,323::logUtils::37::dispatcher::(wrapper) Run and protect: getSpmStatus(spUUID='5e1e9d7a-ba64-48cd-84b1-a7e3e67829b7', options=None)
Thread-66::ERROR::2013-03-12 05:42:52,324::task::833::TaskManager.Task::(_setError) Task=`1ea6a0d5-10dd-4cb6-961f-1fa66d719bf9`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 840, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 38, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 603, in getSpmStatus
    pool = self.getPool(spUUID)
  File "/usr/share/vdsm/storage/hsm.py", line 313, in getPool
    raise se.StoragePoolUnknown(spUUID)
StoragePoolUnknown: Unknown pool id, pool not connected: ('5e1e9d7a-ba64-48cd-84b1-a7e3e67829b7',)
Thread-66::DEBUG::2013-03-12 05:42:52,324::task::852::TaskManager.Task::(_run) Task=`1ea6a0d5-10dd-4cb6-961f-1fa66d719bf9`::Task._run: 1ea6a0d5-10dd-4cb6-961f-1fa66d719bf9 ('5e1e9d7a-ba64-48cd-84b1-a7e3e67829b7',) {} failed - stopping t
ask

Comment 2 Yair Zaslavsky 2013-03-13 09:56:52 UTC
From what I can see at the log, 
there is an attempt to InitializeVdsBroker with a non existing id of vds.
This happens at invocation of HSMGetAllTaskStatuses during the failover.
I also took a look at where we remove at the ResourceManager the entry from the concurrent hash map - I see it is removed by RemoveVdsCommand.
During my search in code, I also saw that RemoveVdsCommand occurs during VdsCommand.InitializeVds - which performs removeVds and addVds - so here is another thing to fix, this "refresh" is not atomic, and may cause problems.
Michael, please shed more light here

Comment 3 mkublin 2013-03-17 09:33:36 UTC
Actually a problem is:
Inside IrsBrokerCommand we have a variable mCurrentVdsId, it is shared between different threads.
This variable can be set to null from different thread for example via ResetIrsVDSCommand.
The critical section are not synchronized and never were.
So this is a race by design and this is , I suppose, an oldest race - it is our moby-dick.
The problem is: monitoring of pool not stopped during reconstruct, monitoring of pool can run during spm stop.

Why we see a problem now and did not saw before, I suppose nobody take attention to exception in logs.

Comment 8 Itamar Heim 2013-06-11 08:32:13 UTC
3.2 has been released

Comment 9 Itamar Heim 2013-06-11 08:32:16 UTC
3.2 has been released

Comment 10 Itamar Heim 2013-06-11 08:33:24 UTC
3.2 has been released