Created attachment 609123
engine logs

Description of problem:
The following exception occurs when spmStart fails on a host because it was sent while storage was not connected:

2012-09-02 18:10:38,904 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (QuartzScheduler_Worker-39) [18c0d8a4] Vds: green-vdsa.qa.lab.tlv.redhat.com
2012-09-02 18:10:38,904 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand] (QuartzScheduler_Worker-39) [18c0d8a4] Failed in SpmStartVDS method, for vds: green-vdsa.qa.lab.tlv.redhat.com; host: 10.35.102.10
2012-09-02 18:10:38,904 ERROR [org.ovirt.engine.core.vdsbroker.VDSCommandBase] (QuartzScheduler_Worker-39) [18c0d8a4] Command SpmStartVDS execution failed. Exception: NullPointerException:
2012-09-02 18:10:38,904 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStartVDSCommand] (QuartzScheduler_Worker-39) [18c0d8a4] FINISH, SpmStartVDSCommand, log id: 658c9ed5
2012-09-02 18:10:38,907 INFO [org.ovirt.engine.core.bll.storage.SetStoragePoolStatusCommand] (QuartzScheduler_Worker-39) [5c454588] Running command: SetStoragePoolStatusCommand internal: true. Entities affected : ID: bd560b80-c245-46b3-ad8c-b142a3460cf6 Type: StoragePool

Version-Release number of selected component (if applicable):
rhevm-3.1.0-14.el6ev.noarch

How reproducible:
?

Steps to Reproduce:
1. Block the connection between the SPM and the master (single) storage
2. Check the engine logs during spmStart

Actual results:
The exception is visible in the logs. spmStart is automatically resent afterwards and succeeds.
I wonder why there is no stack trace for the NPE.
Also hiding in the log: is a toString missing on StatusForXmlRpc?

2012-09-02 18:10:38,903 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (QuartzScheduler_Worker-39) [18c0d8a4] Command org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStartVDSCommand return value
Class Name: org.ovirt.engine.core.vdsbroker.irsbroker.OneUuidReturnForXmlRpc
mUuid 0e4f76c6-051b-4eb1-9d5b-7a799db786f7
mStatus Class Name: org.ovirt.engine.core.vdsbroker.vdsbroker.StatusForXmlRpc
mCode 0
mMessage OK
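One possible explanation for the bare "Exception: NullPointerException:" line, offered as an assumption rather than a diagnosis of the engine code: a NullPointerException created without a message returns null from getMessage(), so a log statement that prints only the class name and the message shows nothing after the colon and no trace. The sketch below uses only the JDK; the class NpeLoggingSketch and its two helper methods are hypothetical and are not part of the engine.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

// Hypothetical illustration; not the engine's logging code.
public class NpeLoggingSketch {

    // Formats only the class name and message, similar in shape to the log line above.
    static String messageOnly(Throwable t) {
        String msg = t.getMessage() == null ? "" : t.getMessage();
        return "Exception: " + t.getClass().getSimpleName() + ": " + msg;
    }

    // Formats the full stack trace, which is what is needed to locate the NPE.
    static String withStackTrace(Throwable t) {
        StringWriter sw = new StringWriter();
        t.printStackTrace(new PrintWriter(sw, true));
        return sw.toString();
    }

    public static void main(String[] args) {
        Throwable npe = new NullPointerException();
        System.out.println(messageOnly(npe));
        System.out.println(withStackTrace(npe));
    }
}
```

If this is what happens here, passing the Throwable itself to the logger (instead of only its message) would put the trace in the log.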
commit 8d4b56a2385fd71f2d0b4e152284371d3e519c05
Author: Federico Simoncelli <fsimonce>
Date:   Wed Sep 19 17:48:37 2012 -0400

    core: trust the SpmStart task result during election

    After the spmStart task ended an additional getSpmStatus was issued to
    verify whether the host really became the SPM or not. This second
    command could fail on its own for several reasons (eg: temporary
    network failure, etc.) and its result wouldn't reflect the actual
    outcome of the spmStart task.
    Worst scenario: the SpmStart task succeeded and the getSpmStatus
    temporarily failed; this would cause the engine to proceed with the
    election on the next host.
    This patch is removing the additional getSpmStatus command trusting
    the spmStart task result.

    Bug-Url: https://bugzilla.redhat.com/show_bug.cgi?id=853747
    Signed-off-by: Federico Simoncelli <fsimonce>
    Change-Id: I832957996226cf091b1b7fe8fa3cc7657507795a

http://gerrit.ovirt.org/#/c/8072/
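The change described in the commit message can be sketched roughly as follows. This is a simplified illustration, not the actual patch: the Host interface, the TaskResult enum, and the two election methods are hypothetical names invented for the example, and the real engine code is structured differently.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical model of the election loop; names are not taken from the engine.
public class SpmElectionSketch {

    enum TaskResult { SUCCESS, FAILURE }

    interface Host {
        String name();
        TaskResult spmStart();                    // outcome of the spmStart task
        boolean getSpmStatus() throws Exception;  // extra check; may fail transiently
    }

    // Old behavior: a failing getSpmStatus hides a successful spmStart,
    // so the election moves on to the next host.
    static Optional<String> electWithExtraCheck(List<Host> hosts) {
        for (Host h : hosts) {
            if (h.spmStart() != TaskResult.SUCCESS) {
                continue;
            }
            try {
                if (h.getSpmStatus()) {
                    return Optional.of(h.name());
                }
            } catch (Exception e) {
                // Transient failure treated as "not SPM"; fall through to the next host.
            }
        }
        return Optional.empty();
    }

    // New behavior: the spmStart task result alone decides the election.
    static Optional<String> electTrustingTask(List<Host> hosts) {
        for (Host h : hosts) {
            if (h.spmStart() == TaskResult.SUCCESS) {
                return Optional.of(h.name());
            }
        }
        return Optional.empty();
    }

    // Test helper: builds a host whose status check can be made to fail.
    static Host host(final String name, final TaskResult result, final boolean statusCheckFails) {
        return new Host() {
            public String name() { return name; }
            public TaskResult spmStart() { return result; }
            public boolean getSpmStatus() throws Exception {
                if (statusCheckFails) {
                    throw new IOException("temporary network failure");
                }
                return true;
            }
        };
    }

    public static void main(String[] args) {
        List<Host> hosts = Arrays.asList(
                host("host-a", TaskResult.SUCCESS, true),
                host("host-b", TaskResult.SUCCESS, false));
        System.out.println("with extra check: " + electWithExtraCheck(hosts));
        System.out.println("trusting task:    " + electTrustingTask(hosts));
    }
}
```

In the example, host-a completes spmStart successfully but its status check fails once. The old loop skips it and elects host-b, which is the "worst scenario" from the commit message; the new loop keeps host-a.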
Merged change id I832957996226cf091b1b7fe8fa3cc7657507795a
Verified on SI20 - no exception in the logs. rhevm-3.1.0-20.el6ev.noarch