Bug 853747

Summary: [Log][engine] NullPointerException in spmStart in case storage is inaccessible or not connected (Command SpmStartVDS execution failed. Exception: NullPointerException)
Product: Red Hat Enterprise Virtualization Manager
Reporter: Gadi Ickowicz <gickowic>
Component: ovirt-engine
Assignee: Federico Simoncelli <fsimonce>
Status: CLOSED CURRENTRELEASE
QA Contact: Gadi Ickowicz <gickowic>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 3.1.0
CC: abaron, amureini, dyasny, fsimonce, hateya, iheim, lpeer, nlevinki, Rhev-m-bugs, sgrinber, yeylon, ykaul
Target Milestone: ---
Target Release: 3.1.0
Hardware: All
OS: Linux
Whiteboard: storage
Fixed In Version: SI20
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2012-12-04 20:06:47 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
engine logs (flags: none)

Description Gadi Ickowicz 2012-09-02 15:36:13 UTC
Created attachment 609123
engine logs

Description of problem:
The following exception is logged when spmStart fails on a host because the command was sent while storage was not connected:

2012-09-02 18:10:38,904 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (QuartzScheduler_Worker-39) [18c0d8a4] Vds: green-vdsa.qa.lab.tlv.redhat.com
2012-09-02 18:10:38,904 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand] (QuartzScheduler_Worker-39) [18c0d8a4] Failed in SpmStartVDS method, for vds: green-vdsa.qa.lab.tlv.redhat.com; host: 10.35.102.10
2012-09-02 18:10:38,904 ERROR [org.ovirt.engine.core.vdsbroker.VDSCommandBase] (QuartzScheduler_Worker-39) [18c0d8a4] Command SpmStartVDS execution failed. Exception: NullPointerException:
2012-09-02 18:10:38,904 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStartVDSCommand] (QuartzScheduler_Worker-39) [18c0d8a4] FINISH, SpmStartVDSCommand, log id: 658c9ed5
2012-09-02 18:10:38,907 INFO  [org.ovirt.engine.core.bll.storage.SetStoragePoolStatusCommand] (QuartzScheduler_Worker-39) [5c454588] Running command: SetStoragePoolStatusCommand internal: true. Entities affected :  ID: bd560b80-c245-46b3-ad8c-b142a3460cf6 Type: StoragePool


Version-Release number of selected component (if applicable):
rhevm-3.1.0-14.el6ev.noarch

How reproducible:
?

Steps to Reproduce:
1. Block the connection between the SPM host and the master (single) storage domain
2. Check the engine logs during spmStart
  
Actual results:
A NullPointerException is visible in the engine logs. spmStart is automatically resent afterwards and succeeds.

Comment 2 Itamar Heim 2012-09-02 16:39:06 UTC
I wonder why there is no stack trace for the NPE.

Also hiding in the log: is a toString missing on StatusForXmlRpc?

2012-09-02 18:10:38,903 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (QuartzScheduler_Worker-39) [18c0d8a4] Command org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStartVDSCommand return value 
 Class Name: org.ovirt.engine.core.vdsbroker.irsbroker.OneUuidReturnForXmlRpc
mUuid                         0e4f76c6-051b-4eb1-9d5b-7a799db786f7
mStatus                       Class Name: org.ovirt.engine.core.vdsbroker.vdsbroker.StatusForXmlRpc
mCode                         0
mMessage                      OK
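
For illustration only: one common way a stack trace goes missing from a log is that the code formats just the exception's class name and message instead of passing the Throwable itself to the logger. The engine's actual logging code is not shown in this report, so the class and method names below are hypothetical and this is only a sketch of the general pattern, not a diagnosis.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

// Hypothetical sketch; not taken from the ovirt-engine sources.
class ExceptionLoggingSketch {

    // Formats only the class name and message. A NullPointerException
    // created without a message has a null getMessage(), so nothing
    // useful follows the class name.
    static String summaryOnly(Throwable t) {
        String message = t.getMessage() == null ? "" : t.getMessage();
        return "Exception: " + t.getClass().getSimpleName() + ": " + message;
    }

    // Renders the full stack trace, which is what identifies the failing line.
    static String withStackTrace(Throwable t) {
        StringWriter buffer = new StringWriter();
        t.printStackTrace(new PrintWriter(buffer, true));
        return buffer.toString();
    }

    public static void main(String[] args) {
        Throwable npe = new NullPointerException();
        System.out.println(summaryOnly(npe));
        System.out.println(withStackTrace(npe));
    }
}
```

With the first form the reader sees which exception type occurred but not where; the second form keeps the frames needed to locate the null dereference.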

Comment 4 Federico Simoncelli 2012-09-29 09:45:32 UTC
commit 8d4b56a2385fd71f2d0b4e152284371d3e519c05
Author: Federico Simoncelli <fsimonce>
Date:   Wed Sep 19 17:48:37 2012 -0400

    core: trust the SpmStart task result during election
    
    After the spmStart task ended an additional getSpmStatus was issued
    to verify whether the host really became the SPM or not.
    This second command could fail on its own for several reasons (eg:
    temporary network failure, etc.) and its result wouldn't reflect the
    actual outcome of the spmStart task. Worst scenario: the SpmStart
    task succeeded and the getSpmStatus temporarily failed; this would
    cause the engine to proceed with the election on the next host.
    
    This patch is removing the additional getSpmStatus command trusting
    the spmStart task result.
    
    Bug-Url: https://bugzilla.redhat.com/show_bug.cgi?id=853747
    Signed-off-by: Federico Simoncelli <fsimonce>
    Change-Id: I832957996226cf091b1b7fe8fa3cc7657507795a

http://gerrit.ovirt.org/#/c/8072/
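
The behaviour change described in the commit message can be sketched roughly as follows. This is a simplified model under stated assumptions, not the actual patch: the Host interface, the fakeHost helper, and both election methods are hypothetical names invented for illustration, and the real engine code is asynchronous and considerably more involved.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical model of the SPM election loop; not the ovirt-engine code.
class SpmElectionSketch {

    // Stand-in for the two calls the engine sends to a host.
    interface Host {
        String name();
        boolean spmStartTaskSucceeded();
        boolean getSpmStatus() throws Exception; // may fail transiently
    }

    // Old behaviour: after a successful spmStart task, verify with getSpmStatus.
    // A transient failure of the second call moves the election to the next host.
    static Optional<String> electWithVerification(List<Host> hosts) {
        for (Host host : hosts) {
            if (!host.spmStartTaskSucceeded()) {
                continue;
            }
            try {
                if (host.getSpmStatus()) {
                    return Optional.of(host.name());
                }
            } catch (Exception e) {
                // The status query failed on its own; the host may in fact be SPM.
            }
        }
        return Optional.empty();
    }

    // New behaviour: trust the spmStart task result, no second query.
    static Optional<String> electTrustingTask(List<Host> hosts) {
        for (Host host : hosts) {
            if (host.spmStartTaskSucceeded()) {
                return Optional.of(host.name());
            }
        }
        return Optional.empty();
    }

    // Test double: statusReachable=false simulates a temporary network failure.
    static Host fakeHost(String name, boolean startSucceeds, boolean statusReachable) {
        return new Host() {
            public String name() { return name; }
            public boolean spmStartTaskSucceeded() { return startSucceeds; }
            public boolean getSpmStatus() throws Exception {
                if (!statusReachable) {
                    throw new Exception("temporary network failure");
                }
                return startSucceeds;
            }
        };
    }

    public static void main(String[] args) {
        List<Host> hosts = Arrays.asList(
                fakeHost("hostA", true, false),
                fakeHost("hostB", true, true));
        System.out.println("with verification: " + electWithVerification(hosts));
        System.out.println("trusting task:     " + electTrustingTask(hosts));
    }
}
```

In this model hostA's spmStart task succeeds but its status query fails, so the old loop skips it and settles on hostB, which is the worst-case scenario the commit message describes. The new loop stops at hostA.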

Comment 5 Allon Mureinik 2012-10-02 14:58:06 UTC
Merged change id I832957996226cf091b1b7fe8fa3cc7657507795a

Comment 6 Gadi Ickowicz 2012-10-14 11:29:51 UTC
Verified on SI20 - no exception in the logs.

rhevm-3.1.0-20.el6ev.noarch