Created attachment 609123
Description of problem:
The following exception occurs when spmStart fails on a host because the spmStart verb was sent while the storage was not connected:
2012-09-02 18:10:38,904 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (QuartzScheduler_Worker-39) [18c0d8a4] Vds: green-vdsa.qa.lab.tlv.redhat.com
2012-09-02 18:10:38,904 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand] (QuartzScheduler_Worker-39) [18c0d8a4] Failed in SpmStartVDS method, for vds: green-vdsa.qa.lab.tlv.redhat.com; host: 10.3
2012-09-02 18:10:38,904 ERROR [org.ovirt.engine.core.vdsbroker.VDSCommandBase] (QuartzScheduler_Worker-39) [18c0d8a4] Command SpmStartVDS execution failed. Exception: NullPointerException:
2012-09-02 18:10:38,904 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStartVDSCommand] (QuartzScheduler_Worker-39) [18c0d8a4] FINISH, SpmStartVDSCommand, log id: 658c9ed5
2012-09-02 18:10:38,907 INFO [org.ovirt.engine.core.bll.storage.SetStoragePoolStatusCommand] (QuartzScheduler_Worker-39) [5c454588] Running command: SetStoragePoolStatusCommand internal: true. Entities affected :
ID: bd560b80-c245-46b3-ad8c-b142a3460cf6 Type: StoragePool
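
The NPE itself is not very informative because the dereference happens while the return value is being unpacked. A minimal sketch, with hypothetical field handling (this is not the actual engine code), of how an unchecked field access on a partially populated XML-RPC return value can throw:

class StatusForXmlRpc {
    Integer mCode;   // may be null when the verb fails before producing a status
    String mMessage; // may be null as well
}

class OneUuidReturnForXmlRpc {
    StatusForXmlRpc mStatus;
    String mUuid;    // task uuid; may be null when spmStart was rejected
}

class SpmStartNpeSketch {
    void proceedProxyReturnValue(OneUuidReturnForXmlRpc ret) {
        // If mStatus is null (storage never connected) this dereference
        // throws the NPE seen above; if mCode is null, the auto-unboxing
        // in the comparison throws it instead.
        if (ret.mStatus.mCode != 0) {
            throw new RuntimeException("spmStart failed: " + ret.mStatus.mMessage);
        }
        java.util.UUID.fromString(ret.mUuid); // NPE here if mUuid is null
    }
}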
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Block the connection between the SPM host and the master (single) storage domain
2. Check engine logs during spmStart
Actual results:
The exception is visible in the logs. spmStart is automatically resent afterwards and succeeds.
I wonder why there is no stack trace for the NPE.
Is it hiding in the log because StatusForXmlRpc is missing a toString?
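
One common reason for a missing stack trace in Java is that the exception is formatted into the message string instead of being passed to the logger as a throwable. A sketch of the difference, using plain log4j (the engine's actual logging wrapper may differ):

import org.apache.log4j.Logger;

class LoggingSketch {
    private static final Logger log = Logger.getLogger(LoggingSketch.class);

    void run() {
        try {
            riskyCall();
        } catch (RuntimeException e) {
            // Loses the stack trace: only the class name and message
            // (null for a bare NPE) end up in the formatted text.
            log.error("Command SpmStartVDS execution failed. Exception: "
                    + e.getClass().getSimpleName() + ": " + e.getMessage());

            // Keeps the stack trace: the throwable is passed separately,
            // so the logger prints the full trace after the message.
            log.error("Command SpmStartVDS execution failed.", e);
        }
    }

    void riskyCall() { throw new NullPointerException(); }
}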
2012-09-02 18:10:38,903 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (QuartzScheduler_Worker-39) [18c0d8a4] Command org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStartVDSCommand return value
Class Name: org.ovirt.engine.core.vdsbroker.irsbroker.OneUuidReturnForXmlRpc
mStatus Class Name: org.ovirt.engine.core.vdsbroker.vdsbroker.StatusForXmlRpc
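
The dump stops at the class name of the nested status, which is consistent with a missing toString. A sketch of the kind of override that would make the code and message show up in the return-value dump (field names assumed from the XML-RPC status convention, not taken from the engine source):

class StatusForXmlRpc {
    Integer mCode;
    String mMessage;

    @Override
    public String toString() {
        // Render the fields so the command's return-value log shows the
        // actual status instead of stopping at the class name.
        return "StatusForXmlRpc [mCode=" + mCode + ", mMessage=" + mMessage + "]";
    }
}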
Author: Federico Simoncelli <firstname.lastname@example.org>
Date: Wed Sep 19 17:48:37 2012 -0400
core: trust the SpmStart task result during election
After the spmStart task ended, an additional getSpmStatus was issued
to verify whether the host really became the SPM or not.
This second command could fail on its own for several reasons (e.g.
a temporary network failure) and its result wouldn't reflect the
actual outcome of the spmStart task. Worst-case scenario: the spmStart
task succeeded and the getSpmStatus temporarily failed; this would
cause the engine to proceed with the election on the next host.
This patch removes the additional getSpmStatus command, trusting
the spmStart task result.
Signed-off-by: Federico Simoncelli <email@example.com>
Merged change id I832957996226cf091b1b7fe8fa3cc7657507795a
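
For reference, the shape of the change reduced to a sketch (names are hypothetical; this is not the merged SpmStartVDSCommand code):

class SpmElectionSketch {

    enum SpmStatus { SPM, FREE, CONTENDING }
    static class Vds { }

    boolean electBefore(Vds host) {
        boolean started = runSpmStartTask(host);
        // Extra round-trip: a transient failure of getSpmStatus here made
        // the engine move the election to the next host even though the
        // spmStart task had actually succeeded.
        return started && getSpmStatus(host) == SpmStatus.SPM;
    }

    boolean electAfter(Vds host) {
        // The spmStart task result already reflects the real outcome,
        // so it is trusted directly.
        return runSpmStartTask(host);
    }

    boolean runSpmStartTask(Vds host) { return true; }
    SpmStatus getSpmStatus(Vds host) { return SpmStatus.SPM; }
}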
Verified on SI20 - no exception in the logs.