Bug 846692

Summary: [engine-core] able to change data-center name when pool\host is not connected causes meta-data out of sync and host re-election failure for good
Product: Red Hat Enterprise Virtualization Manager
Reporter: Leonid Natapov <lnatapov>
Component: ovirt-engine
Assignee: Ayal Baron <abaron>
Status: CLOSED DUPLICATE
QA Contact: Haim <hateya>
Severity: urgent
Docs Contact:
Priority: urgent
Version: 3.1.0
CC: amureini, dron, dyasny, hateya, iheim, lpeer, Rhev-m-bugs, sgrinber, yeylon, ykaul
Target Milestone: ---
Keywords: Regression
Target Release: 3.1.0
Hardware: Unspecified
OS: Unspecified
Whiteboard: storage
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2012-08-12 12:15:13 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description    Flags
engine log     none
vdsm log       none

Description Leonid Natapov 2012-08-08 12:52:36 UTC
Created attachment 603016 [details]
engine log

[Storage] The Storage Pool's name is not updated in the metadata file after changing the DataCenter's name in the UI. As a result, SPM election fails and the DC remains in Non-Responsive status forever.

How to reproduce:
The easiest way to reproduce is with a single host in the cluster.
1. Start from a working setup with iSCSI storage.
2. Switch the host to Maintenance.
3. Go to the DC tab and edit the DataCenter.
4. Change the DataCenter's name.
5. Activate the host.

If you look at the storage pool info after activating the host (vdsClient -s 0 getStoragePoolInfo <UUID>), you'll see that the name remains unchanged. As a result, the engine and vdsm disagree on the pool's name, so an SPM cannot be elected and the DataCenter remains in Non-Responsive status forever.
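
The mismatch described above can be checked mechanically. The following is a minimal sketch, assuming the getStoragePoolInfo output consists of "key = value" lines that include a "name" entry; the exact output layout, the helper names parse_pool_info and pool_name_matches, and the sample names are assumptions made for illustration only.

```python
def parse_pool_info(text):
    """Parse 'key = value' lines into a dict; lines without '=' are skipped."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            info[key.strip()] = value.strip()
    return info


def pool_name_matches(engine_name, pool_info_text):
    """Return True if the name reported by vdsm equals the engine's DataCenter name."""
    return parse_pool_info(pool_info_text).get("name") == engine_name


# Hypothetical captured output (assumed layout); real output has more fields.
sample = "\tname = ISCSI\n\tlver = 0\n"
print(pool_name_matches("ISCSI-renamed", sample))  # False: engine and vdsm disagree
```

Running such a check after step 5 would make the out-of-sync name visible without reading the metadata by hand.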

The full engine log is attached.

2012-08-08 13:40:32,221 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-93) [1f162ad8] spm stop succeeded, continuing with spm selection
2012-08-08 13:40:32,248 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-93) [1f162ad8] starting spm on vds purple-vds2, storage pool ISCSI, prevId 2, LVER 0
2012-08-08 13:40:32,250 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStartVDSCommand] (QuartzScheduler_Worker-93) [1f162ad8] START, SpmStartVDSCommand(vdsId = 2053b160-e138-11e1-920b-23ee23087e40, storagePoolId = 7736f5e8-e136-11e1-a62d-27f1e2fc96c7, prevId=2, prevLVER=0, storagePoolFormatType=null, recoveryMode=Manual, SCSIFencing=true), log id: 60d9e39b
2012-08-08 13:40:32,251 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand] (QuartzScheduler_Worker-93) [1f162ad8] Failed in SpmStartVDS method, for vds: purple-vds2; host: purple-vds2
2012-08-08 13:40:32,252 ERROR [org.ovirt.engine.core.vdsbroker.VDSCommandBase] (QuartzScheduler_Worker-93) [1f162ad8] Command SpmStartVDS execution failed. Exception: NullPointerException:
2012-08-08 13:40:32,252 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStartVDSCommand] (QuartzScheduler_Worker-93) [1f162ad8] FINISH, SpmStartVDSCommand, log id: 60d9e39b
2012-08-08 13:40:32,259 INFO  [org.ovirt.engine.core.bll.storage.SetStoragePoolStatusCommand] (QuartzScheduler_Worker-93) [ee06ca] Running command: SetStoragePoolStatusCommand internal: true. Entities affected :  ID: 7736f5e8-e136-11e1-a62d-27f1e2fc96c7 Type: StoragePool
2012-08-08 13:40:32,295 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-93) [ee06ca] IrsBroker::Failed::GetStoragePoolInfoVDS due to: IrsSpmStartFailedException: IRSGenericException: IRSErrorException: SpmStart failed
2012-08-08 13:40:33,561 INFO  [org.ovirt.engine.core.bll.ChangeVDSClusterCommand] (ajp--127.0.0.1-8009-44) [7819075] Running command: ChangeVDSClusterCommand internal: false. Entities affected :  ID: 2588b87e-e138-11e1-9485-677d17e5653e Type: VDS,  ID: 68d7b280-e145-11e1-a5c3-5bf8a05311b3 Type: VdsGroups

Comment 1 Leonid Natapov 2012-08-08 13:27:29 UTC
A possible solution is to block DataCenter editing when its status is anything other than Up.
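
A minimal sketch of such a guard, written in Python for brevity rather than in the engine's own language. The StoragePoolStatus enum, its members, and validate_data_center_edit are hypothetical names used for illustration; they are not the engine's actual classes or validation hooks.

```python
from enum import Enum


class StoragePoolStatus(Enum):
    # Illustrative subset of statuses; the names are assumptions, not the engine's enum.
    UP = "Up"
    MAINTENANCE = "Maintenance"
    NON_RESPONSIVE = "Non-Responsive"


def validate_data_center_edit(status):
    """Return (allowed, reason); editing is allowed only while the DataCenter is Up."""
    if status is StoragePoolStatus.UP:
        return True, ""
    return False, f"DataCenter is {status.value}; editing is blocked until it is Up"


print(validate_data_center_edit(StoragePoolStatus.UP))           # (True, '')
print(validate_data_center_edit(StoragePoolStatus.MAINTENANCE))  # blocked, with a reason
```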

Comment 2 Haim 2012-08-08 13:45:13 UTC
(In reply to comment #1)
> A possible solution is to block DataCenter editing when its status is anything
> other than Up.

Leonid, please add the vdsm log.

Comment 3 Leonid Natapov 2012-08-08 14:36:53 UTC
Created attachment 603052 [details]
vdsm log

Added.

Comment 4 Dafna Ron 2012-08-12 12:15:13 UTC
This bug is related to an existing issue; closing as a duplicate of:

https://bugzilla.redhat.com/show_bug.cgi?id=845310

As the log shows, we send UpdateStoragePoolCommand, which changes storagePoolFormatType to null,
so the next SpmStartVDSCommand we send fails:

2012-08-08 17:50:31,987 INFO  [org.ovirt.engine.core.bll.storage.UpdateStoragePoolCommand] (ajp-/127.0.0.1:8009-11) [4de8eea] Running command: UpdateStoragePoolCommand internal: false. Entities affected :  ID: 0e1d85d9-ca62-46c0-a41e-7fdf69cc1c72 Type: StoragePool


2012-08-08 17:50:46,964 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStartVDSCommand] (QuartzScheduler_Worker-20) [c2dde86] START, SpmStartVDSCommand(vdsId = affc37b4-e14d-11e1-9e4a-001a4a169738, storagePoolId = 0e1d85d9-ca62-46c0-a41e-7fdf69cc1c72, prevId=-1, prevLVER=0, storagePoolFormatType=null, recoveryMode=Manual, SCSIFencing=false), log id: 29943e18
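
Entries like the one above can be picked out of an engine log automatically. This is a rough sketch that assumes the "START, SpmStartVDSCommand(...)" layout seen in these excerpts; the helper names spm_start_params and has_null_format_type are made up for illustration.

```python
import re

# Matches the parameter list of a "START, SpmStartVDSCommand(...)" log entry.
START_RE = re.compile(r"START, SpmStartVDSCommand\((?P<params>[^)]*)\)")


def spm_start_params(log_line):
    """Return the command parameters as a dict, or None if the line is not a START entry."""
    match = START_RE.search(log_line)
    if match is None:
        return None
    params = {}
    for item in match.group("params").split(","):
        key, sep, value = item.partition("=")
        if sep:
            params[key.strip()] = value.strip()
    return params


def has_null_format_type(log_line):
    """True when a START entry carries storagePoolFormatType=null."""
    params = spm_start_params(log_line)
    return params is not None and params.get("storagePoolFormatType") == "null"


line = (
    "2012-08-08 17:50:46,964 INFO  "
    "[org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStartVDSCommand] "
    "(QuartzScheduler_Worker-20) [c2dde86] START, SpmStartVDSCommand("
    "vdsId = affc37b4-e14d-11e1-9e4a-001a4a169738, "
    "storagePoolId = 0e1d85d9-ca62-46c0-a41e-7fdf69cc1c72, prevId=-1, prevLVER=0, "
    "storagePoolFormatType=null, recoveryMode=Manual, SCSIFencing=false), log id: 29943e18"
)
print(has_null_format_type(line))  # True
```

The same check also flags the SpmStartVDSCommand entry in the original description, which carries storagePoolFormatType=null as well.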

*** This bug has been marked as a duplicate of bug 845310 ***