Bug 895996

Summary: SPM doesn't switch to non-operational after blocking connectivity to storage
Product: Red Hat Enterprise Virtualization Manager Reporter: Arik <ahadas>
Component: ovirt-engine Assignee: mkublin <mkublin>
Status: CLOSED CURRENTRELEASE QA Contact: vvyazmin <vvyazmin>
Severity: high Docs Contact:
Priority: unspecified    
Version: 3.2.0 CC: acathrow, ahadas, bazulay, dyasny, hateya, iheim, lnatapov, lpeer, ofrenkel, Rhev-m-bugs, sgrinber, yeylon, ykaul, yzaslavs
Target Milestone: ---   
Target Release: 3.2.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: infra
Fixed In Version: sf5 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Infra RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 915537    
Attachments:
Description Flags
engine log none

Description Arik 2013-01-16 13:05:52 UTC
Created attachment 679539 [details]
engine log

Description of problem:
SPM doesn't become non-operational after disconnecting all hosts in the cluster from the storage

Version-Release number of selected component (if applicable):


How reproducible:
100%

Steps to Reproduce:
1. run 3 hosts in a cluster
2. block the connection to the storage from all hosts
3.
  
Actual results:
SPM doesn't become non-operational,
eventually all the 3 hosts move to 'up' state.

Expected results:
SPM becomes non-operational

Additional info:
----- Original Message -----
> From: "Omer Frenkel" <ofrenkel>
> To: "Michael Kublin" <mkublin>
> Cc: "Arik Hadas" <ahadas>
> Sent: Wednesday, January 16, 2013 2:29:33 PM
> Subject: Re: log for bug
>
>
>
> ----- Original Message -----
> > From: "Michael Kublin" <mkublin>
> > To: "Arik Hadas" <ahadas>, "Omer Frenkel"
> > <ofrenkel>
> > Sent: Wednesday, January 16, 2013 2:04:25 PM
> > Subject: Re: log for bug
> >
> >
> > Hi, I took a look at the logs.
> > For some reason we did not do InitVdsOnUp after 12:27, but this is
> > less important for your case.
>
> the relevant initVdsOnUp was at 12:28:30
>
> > I took a look around 11:59.
>
> i hope it's the same scenario..
>
> > I think that the SPM was bamba.
> > During InitVdsOnUp we failed to connect the host to the pool because
> > the master domain was missing, so I triggered a
> > reconstruct.
> > 2013-01-15 11:59:10,352 INFO
> >  [org.ovirt.engine.core.bll.storage.ReconstructMasterDomainCommand]
> > (pool-11-thread-41) [43e1763b] Running command:
> > ReconstructMasterDomainCommand internal: true. Entities affected :
> >  ID: 6ff7ee1a-eecd-4ef9-b303-d894d6f595e9 Type: Storage
> > (Thread name is changed because of using a queue)
> > Now, the master was not found so no reconstruct is done.
> > 2013-01-15 11:59:14,639 INFO
> >  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
> > (pool-11-thread-41) [43e1763b] No string for
> > RECONSTRUCT_MASTER_FAILED_NO_MASTER type. Use default Log
> >
> > In such a case ReconstructMasterDomainCommand finished with success,
> > which
> > means InitVdsOnUp succeeds - it is a bug.
>
> i am not sure how it's possible that reconstruct has succeeded, the
> storage is disconnected,
> (and there is only one domain in the pool)
>
the storage was disconnected because a wrong parameter was passed to ReconstructMasterDomainCommand:
ReconstructMasterParameters.isInActive == false. (I think this is wrong; this is my mistake.)

> if the reconstruct was really successful then maybe the host really
> should be up?
ReconstructMasterDomainCommand.isSuccessed == true because it is the last master, and it will usually be true.
This is the way it works now, after the refactoring made by the storage team.

> > Thanks guys, it is mine. Can you please open a bug, or if you have
> > already opened one, assign it to me.
> >
> > ----- Original Message -----
> > From: "Arik Hadas" <ahadas>
> > To: "Michael Kublin" <mkublin>
> > Sent: Wednesday, January 16, 2013 12:37:15 PM
> > Subject: log for bug
> >
> > at 12:27 I blocked connection to storage from all 3 hosts (knight,
> > honda, bamba)
> > knight was the SPM before the disconnection
> >
> > knight moved to status 'connecting'
> >
> > the storage domain moved to maintenance
> >
> > knight moved to status 'up'
> >
> > honda became SPM (the storage is not activated..)
> >
> > honda is cleared from being SPM
> >
> > -> all three hosts are in status 'up'
> >
>
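
The failure mode discussed in the thread above (the reconstruct command reports success even though no master domain was found, so host initialization treats the host as healthy) can be illustrated with a minimal sketch. This is not oVirt engine code: every class, method, and field name below is hypothetical and chosen only for illustration, and the stricter check shown is just one possible way to avoid the problem, not necessarily the fix that went into sf5.

```java
import java.util.Optional;

/** Hypothetical sketch only; none of these names are real oVirt engine APIs. */
final class ReconstructSketch {

    enum HostStatus { UP, NON_OPERATIONAL }

    /** Outcome of a (hypothetical) reconstruct attempt. */
    static final class ReconstructResult {
        final boolean commandSucceeded; // the flag the caller looks at
        final boolean masterAvailable;  // whether a usable master domain exists afterwards

        ReconstructResult(boolean commandSucceeded, boolean masterAvailable) {
            this.commandSucceeded = commandSucceeded;
            this.masterAvailable = masterAvailable;
        }
    }

    /**
     * Models the behaviour described in the thread: when no new master
     * domain is found, nothing is reconstructed, yet the command still
     * reports success.
     */
    static ReconstructResult reconstruct(Optional<String> newMasterDomainId) {
        return new ReconstructResult(true, newMasterDomainId.isPresent());
    }

    /** Problematic decision: trusts only the command's success flag. */
    static HostStatus statusTrustingSuccessFlag(ReconstructResult result) {
        return result.commandSucceeded ? HostStatus.UP : HostStatus.NON_OPERATIONAL;
    }

    /** Stricter decision (an assumption): also requires a usable master domain. */
    static HostStatus statusRequiringMaster(ReconstructResult result) {
        return (result.commandSucceeded && result.masterAvailable)
                ? HostStatus.UP
                : HostStatus.NON_OPERATIONAL;
    }

    public static void main(String[] args) {
        // All hosts are cut off from storage, so no master domain can be found.
        ReconstructResult result = reconstruct(Optional.empty());
        System.out.println("trusting success flag: " + statusTrustingSuccessFlag(result));
        System.out.println("requiring master:      " + statusRequiringMaster(result));
    }
}
```

With no master domain available, the first decision leaves the host UP (the actual result reported here), while the second moves it to NON_OPERATIONAL (the expected result).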

Comment 4 Leonid Natapov 2013-01-30 15:03:56 UTC
Fixed in sf5. After blocking the connection from all hosts to the SD, the SPM becomes non-operational.

Comment 5 Itamar Heim 2013-06-11 08:33:02 UTC
3.2 has been released
