Created attachment 679539
engine log
Description of problem:
SPM doesn't become non-operational after disconnecting all hosts in the cluster from the storage
Version-Release number of selected component (if applicable):
How reproducible:
100%
Steps to Reproduce:
1. run 3 hosts in a cluster
2. block the connection to the storage from all hosts
Actual results:
The SPM doesn't become non-operational;
eventually all 3 hosts move to the 'up' state.
Expected results:
SPM becomes non-operational
Additional info:
----- Original Message -----
> From: "Omer Frenkel" <ofrenkel>
> To: "Michael Kublin" <mkublin>
> Cc: "Arik Hadas" <ahadas>
> Sent: Wednesday, January 16, 2013 2:29:33 PM
> Subject: Re: log for bug
>
>
>
> ----- Original Message -----
> > From: "Michael Kublin" <mkublin>
> > To: "Arik Hadas" <ahadas>, "Omer Frenkel"
> > <ofrenkel>
> > Sent: Wednesday, January 16, 2013 2:04:25 PM
> > Subject: Re: log for bug
> >
> >
> > Hi, I took a look at the logs.
> > For some reason we did not do InitVdsOnUp after 12:27, but this is
> > less important for your case.
>
> the relevant initVdsOnUp was at 12:28:30
>
> > I took a look around 11:59.
>
> i hope it's the same scenario..
>
> > I think that the SPM was bamba.
> > During InitVdsOnUp we failed to connect the host to the pool because
> > the master domain was missing, so I triggered a
> > reconstruct.
> > 2013-01-15 11:59:10,352 INFO
> > [org.ovirt.engine.core.bll.storage.ReconstructMasterDomainCommand]
> > (pool-11-thread-41) [43e1763b] Running command:
> > ReconstructMasterDomainCommand internal: true. Entities affected :
> > ID: 6ff7ee1a-eecd-4ef9-b303-d894d6f595e9 Type: Storage
> > (Thread name is changed because of using a queue)
> > Now, the master was not found so no reconstruct is done.
> > 2013-01-15 11:59:14,639 INFO
> > [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
> > (pool-11-thread-41) [43e1763b] No string for
> > RECONSTRUCT_MASTER_FAILED_NO_MASTER type. Use default Log
> >
> > In this case ReconstructMasterDomainCommand finished with success,
> > which means InitVdsOnUp succeeds. That is a bug.
>
> i am not sure how it's possible that the reconstruct has succeeded; the
> storage is disconnected
> (and there is only one domain in the pool)
>
The storage was disconnected because a wrong parameter was passed to ReconstructMasterDomainCommand:
ReconstructMasterParameters.isInActive == false. (I think this is wrong; this is my mistake.)
> if the reconstruct was really successful then maybe the host really
> should be up?
For the command, ReconstructMasterDomainCommand.isSuccessed == true because it is the last master, and it will usually be true.
This is the way it works now, after the refactoring made by the storage team.
> > Thanks guys, it is mine. Can you please open a bug or, if you have
> > already opened one, assign it to me.
> >
> > ----- Original Message -----
> > From: "Arik Hadas" <ahadas>
> > To: "Michael Kublin" <mkublin>
> > Sent: Wednesday, January 16, 2013 12:37:15 PM
> > Subject: log for bug
> >
> > at 12:27 I blocked the connection to the storage from all 3 hosts (knight,
> > honda, bamba)
> > knight was the SPM before the disconnection
> >
> > knight moved to status 'connecting'
> >
> > the storage domain moved to maintenance
> >
> > knight moved to status 'up'
> >
> > honda became SPM (the storage is not activated..)
> >
> > honda is cleared from being SPM
> >
> > -> all three hosts are in status 'up'
> >
>