
Bug 920636

Summary: engine: second host becomes non-operational when added to pool during createStoragePool
Product: Red Hat Enterprise Virtualization Manager
Reporter: Dafna Ron <dron>
Component: ovirt-engine
Assignee: Roy Golan <rgolan>
Status: CLOSED NOTABUG
QA Contact: Artyom <alukiano>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 3.2.0
CC: acathrow, bazulay, iheim, jkt, lpeer, pstehlik, Rhev-m-bugs, yeylon, yzaslavs
Target Milestone: ---
Keywords: Triaged
Target Release: 3.3.0
Hardware: x86_64
OS: Linux
Whiteboard: infra
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-10-20 17:08:48 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Infra
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1019461
Attachments:
Description: logs
Flags: none

Description Dafna Ron 2013-03-12 13:24:55 UTC
Created attachment 708966
logs

Description of problem:

I have two hosts: one was in maintenance and the other was up.
I created a new pool, and while the pool was being created I activated the second host.
The engine sends ConnectStoragePool to vdsm although the master domain has not been created yet, which causes the HSM host to become non-operational.

Version-Release number of selected component (if applicable):

sf10
vdsm-4.10.2-11.0.el6ev.x86_64

How reproducible:

looks like a race

Steps to Reproduce:
1. Create a new pool with one host active and one in maintenance.
2. Activate the second host during createStoragePool.
  
Actual results:

The engine already has the master domain in the DB, but creation of the domain has not finished yet. When the second host is added, the engine sends connectStoragePool with the master domain UUID from the DB, vdsm on the HSM host fails with a "no master domain" error, and the host becomes non-operational.

Expected results:

The host should not become non-operational (maybe by setting a temporary flag on the new master domain in the DB until we know it has actually been created and is active).
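
The temporary flag suggested above could look roughly like the following. This is only a minimal sketch in Python, not engine code: `DomainState`, `MasterDomainRecord` and `should_send_connect` are hypothetical names made up for illustration, and the real engine (which is written in Java) may model domain status quite differently.

```python
from dataclasses import dataclass
from enum import Enum


class DomainState(Enum):
    """Hypothetical states for a master domain row in the DB."""
    PENDING = "pending"  # recorded in the DB, creation still in progress
    ACTIVE = "active"    # creation finished, safe to connect hosts


@dataclass
class MasterDomainRecord:
    """Hypothetical DB record; a new domain starts out as PENDING."""
    domain_id: str
    state: DomainState = DomainState.PENDING


def should_send_connect(record: MasterDomainRecord) -> bool:
    """Allow connectStoragePool only once the domain is marked ACTIVE."""
    return record.state is DomainState.ACTIVE


record = MasterDomainRecord("b37a4aef-09dd-4a4e-ae1e-a3bdb12c4ba5")
print(should_send_connect(record))  # False while creation is in progress
record.state = DomainState.ACTIVE
print(should_send_connect(record))  # True once creation has finished
```

With a check like this, activating the second host during pool creation would skip or postpone the connect instead of failing and marking the host non-operational.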

Additional info: logs


2013-03-12 02:27:34,127 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand] (pool-3-thread-46) START, ConnectStoragePoolVDSCommand(HostName = gold-vdsd, HostId = 83834e1f-9e60-41b5-a9cc-16460a8a2fe2, storagePoolId = 5e1e9d7a-ba64-48cd-84b1-a7e3e67829b7, vds_spm_id = 2, masterDomainId = b37a4aef-09dd-4a4e-ae1e-a3bdb12c4ba5, masterVersion = 1), log id: 4ddc0a3f


2013-03-12 02:27:37,690 ERROR [org.ovirt.engine.core.vdsbroker.VDSCommandBase] (pool-3-thread-46) Command ConnectStoragePoolVDS execution failed. Exception: IRSNoMasterDomainException: IRSGenericException: IRSErrorException: IRSNoMasterDomainException: Cannot find master domain: 'spUUID=5e1e9d7a-ba64-48cd-84b1-a7e3e67829b7, msdUUID=b37a4aef-09dd-4a4e-ae1e-a3bdb12c4ba5'


2013-03-12 02:27:37,713 INFO  [org.ovirt.engine.core.bll.SetNonOperationalVdsCommand] (QuartzScheduler_Worker-20) [10904fbb] Running command: SetNonOperationalVdsCommand internal: true. Entities affected :  ID: 83834e1f-9e60-41b5-a9cc-16460a8a2fe2 Type: VDS
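
To check that the failing call used the same pool and master domain IDs as the connect request, the two log lines can be compared with a small script. This is a rough sketch: the regular expressions only assume the field layout visible in the excerpts above, and `extract_ids` is a hypothetical helper, not part of any oVirt tool.

```python
import re

UUID = r"[0-9a-f-]{36}"

# Patterns based only on the field layout seen in the log excerpts above.
CONNECT_RE = re.compile(
    rf"storagePoolId = (?P<pool>{UUID}).*?masterDomainId = (?P<master>{UUID})"
)
ERROR_RE = re.compile(
    rf"Cannot find master domain: 'spUUID=(?P<pool>{UUID}), msdUUID=(?P<master>{UUID})'"
)


def extract_ids(line):
    """Return (pool_id, master_domain_id) from a log line, or None."""
    for pattern in (CONNECT_RE, ERROR_RE):
        match = pattern.search(line)
        if match:
            return match.group("pool"), match.group("master")
    return None


# Shortened excerpts of the START and ERROR lines from the engine log.
CONNECT_LINE = (
    "START, ConnectStoragePoolVDSCommand(HostName = gold-vdsd, "
    "storagePoolId = 5e1e9d7a-ba64-48cd-84b1-a7e3e67829b7, vds_spm_id = 2, "
    "masterDomainId = b37a4aef-09dd-4a4e-ae1e-a3bdb12c4ba5, masterVersion = 1)"
)
ERROR_LINE = (
    "IRSNoMasterDomainException: Cannot find master domain: "
    "'spUUID=5e1e9d7a-ba64-48cd-84b1-a7e3e67829b7, "
    "msdUUID=b37a4aef-09dd-4a4e-ae1e-a3bdb12c4ba5'"
)

print(extract_ids(CONNECT_LINE) == extract_ids(ERROR_LINE))  # True
```

Both lines refer to the same pool and master domain, which is consistent with the engine sending the master domain UUID from the DB before the domain exists on storage.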

Comment 1 Dafna Ron 2013-03-12 14:06:02 UTC
Also reproduced when activating the master domain and activating a host at the same time.

reproduction: 

1. have two hosts, one SPM and one HSM with 1 NFS domain
2. put the HSM host in maintenance
3. put master domain in maintenance
4. activate the master domain + activate the host.

Comment 2 Ayal Baron 2013-03-13 10:19:24 UTC
connectStoragePool should not be run until createStoragePool is finished.
Sounds like infra should add the createStoragePool to the queue and then connect would hold until create is finished.

Michael, thoughts about this?
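
The queueing idea above can be illustrated with a per-pool lock, so that a connect for a given pool waits until a running create for the same pool has finished. This is a minimal Python sketch under stated assumptions: `PoolOperationQueue`, `create_storage_pool` and `connect_storage_pool` are hypothetical stand-ins, and the engine's actual locking and queueing infrastructure is not shown in this report.

```python
import threading
import time
from collections import defaultdict


class PoolOperationQueue:
    """Hypothetical helper that serializes operations per storage pool ID."""

    def __init__(self):
        self._guard = threading.Lock()              # protects the lock table
        self._locks = defaultdict(threading.Lock)   # one lock per pool ID

    def run(self, pool_id, operation):
        with self._guard:
            lock = self._locks[pool_id]
        with lock:  # a second operation on the same pool blocks here
            return operation()


def demo():
    events = []
    create_started = threading.Event()

    def create_storage_pool():  # stand-in for the long-running create
        events.append("create:start")
        create_started.set()
        time.sleep(0.1)
        events.append("create:end")

    def connect_storage_pool():  # stand-in for the connect sent on activation
        events.append("connect")

    queue = PoolOperationQueue()
    pool_id = "5e1e9d7a-ba64-48cd-84b1-a7e3e67829b7"

    creator = threading.Thread(target=queue.run, args=(pool_id, create_storage_pool))
    creator.start()
    create_started.wait()  # host activation arrives while create is running
    connector = threading.Thread(target=queue.run, args=(pool_id, connect_storage_pool))
    connector.start()

    creator.join()
    connector.join()
    return events


print(demo())  # ['create:start', 'create:end', 'connect']
```

Operations on different pools use different locks, so they would not block each other in this sketch.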

Comment 3 mkublin 2013-03-13 13:18:57 UTC
Agreed. Alternatively, there should be a new event, or activation of the host should be prevented until creation of the pool (AddStoragePoolWithStoragesCommand) has finished.
By the way, auto-recovery succeeds in activating the host afterwards.

Comment 5 Roy Golan 2013-09-16 14:27:03 UTC
Also the auto-recovery will activate the host after 3 minutes and it will probably be up. 

Isn't it better to succeed adding and installing the host (which is not a quick operation) first and then the system will recover for us from an intermediate state while we create the pool?

Comment 6 Andrew Cathrow 2013-10-20 14:23:30 UTC
(In reply to Roy Golan from comment #5)
> Also the auto-recovery will activate the host after 3 minutes and it will
> probably be up. 
> 
> Isn't it better to succeed adding and installing the host (which is not a
> quick operation) first and then the system will recover for us from an
> intermediate state while we create the pool?

Agree

Comment 7 Barak 2013-10-20 17:08:48 UTC
Per comments #4 #5 & #6,
moving this bug to CLOSED NOTABUG