Bug 784033

Summary: Wrong flow of storage status in case of storage connections issues
Product: [Retired] oVirt
Reporter: Jakub Libosvar <jlibosva>
Component: ovirt-engine-core
Assignee: lpeer <lpeer>
Status: CLOSED WONTFIX
QA Contact:
Severity: low
Docs Contact:
Priority: unspecified
Version: unspecified
CC: acathrow, amureini, iheim, ykaul
Target Milestone: ---
Target Release: 3.3.4
Hardware: x86_64
OS: Linux
Whiteboard: storage
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-02-25 07:25:17 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Attachments: backend, vdsm logs

Description Jakub Libosvar 2012-01-23 16:01:52 UTC
Created attachment 557001 [details]
backend, vdsm logs

Description of problem:
I have two storage domains (each on a different server) and one host in an iSCSI data center. I drop the connection from the host to the master domain. In the backend log you can see that both storage domains are reported as being in trouble, which is not true; only domain 4f98b552-3f04-4d03-af74-45cca0884ec9 had its connection dropped:
2012-01-23 16:49:27,147 WARN  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-30) domain cc1ff6aa-3320-4f18-a5e9-a68b6db70f23 in problem. vds: srh-03.rhev.lab.eng.brq.redhat.com
2012-01-23 16:49:27,147 WARN  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-30) domain 4f98b552-3f04-4d03-af74-45cca0884ec9 in problem. vds: srh-03.rhev.lab.eng.brq.redhat.com

Then the domain that had no problems at all recovers from the problem state and is elected as the new master:
2012-01-23 16:49:37,652 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-50) Domain cc1ff6aa-3320-4f18-a5e9-a68b6db70f23 recovered from problem. vds: srh-03.rhev.lab.eng.brq.redhat.com
2012-01-23 16:49:37,652 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-50) Domain cc1ff6aa-3320-4f18-a5e9-a68b6db70f23 has recovered from problem. No active host in the DC is reporting it as problematic, so clearing the domain recovery timer.

As part of this action, even the problematic storage domain is taken to Up status, and it stays there for 5 minutes. Only then is it moved to Inactive:

2012-01-23 16:54:27,161 WARN  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-87) domain 4f98b552-3f04-4d03-af74-45cca0884ec9 was reported by all hosts in status UP as problematic. Moving the Domain to NonOperational.
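
To make the reported flow easier to follow, below is a minimal Python sketch of the engine-side monitoring behaviour as it appears in the log excerpts. It is not the actual IrsBrokerCommand code; the DomainMonitor class, its method names and the 5-minute grace constant are assumptions made purely for illustration.

# Minimal, hypothetical model of the monitoring flow described above.
# NOT the real IrsBrokerCommand implementation; names and the grace-period
# constant are assumptions made for illustration only.
import time

DOMAIN_RECOVERY_GRACE_SEC = 5 * 60   # matches the ~5 minutes observed in the log


class DomainMonitor:
    """Tracks which storage domains each host reports as problematic."""

    def __init__(self, domains):
        self.status = {d: "Up" for d in domains}   # engine-side domain status
        self.problem_since = {}                    # domain -> first problem report time
        self.reports = {}                          # host -> set of problematic domains

    def report(self, host, problem_domains):
        """Record the set of domains a host currently sees as problematic."""
        self.reports[host] = set(problem_domains)
        now = time.time()
        for d in problem_domains:
            self.problem_since.setdefault(d, now)

    def evaluate(self):
        """Re-evaluate statuses, mimicking the flow seen in the backend log."""
        now = time.time()
        reported = set().union(*self.reports.values()) if self.reports else set()
        for d in self.status:
            if d not in reported:
                # No active host reports the domain -> clear its recovery timer.
                self.problem_since.pop(d, None)
                self.status[d] = "Up"
            elif now - self.problem_since[d] >= DOMAIN_RECOVERY_GRACE_SEC:
                # Problematic for the whole grace period -> NonOperational.
                self.status[d] = "NonOperational"
            else:
                # Still inside the grace period: the domain is kept Up even
                # though vdsm keeps reporting it as problematic, which is the
                # behaviour this bug questions.
                self.status[d] = "Up"
        return dict(self.status)


# Example mirroring the log: the host first flags both domains, then only
# the domain whose connection was actually blocked.
if __name__ == "__main__":
    mon = DomainMonitor(["cc1ff6aa-3320-4f18-a5e9-a68b6db70f23",
                         "4f98b552-3f04-4d03-af74-45cca0884ec9"])
    mon.report("srh-03", ["cc1ff6aa-3320-4f18-a5e9-a68b6db70f23",
                          "4f98b552-3f04-4d03-af74-45cca0884ec9"])
    mon.report("srh-03", ["4f98b552-3f04-4d03-af74-45cca0884ec9"])
    print(mon.evaluate())   # blocked domain is still "Up" until the grace period expires

The point of contention is the last branch: during the grace period the engine keeps reporting the domain as Up even though vdsm keeps flagging it as problematic.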



Version-Release number of selected component (if applicable):
ovirt-engine-3.0.0_0001-1.2.fc16.x86_64
vdsm-4.9.3.1-0.fc16.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Have two storage domains, each on a different server, and one host
2. Block the connection from the host to the master storage domain (an example follows below)
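
For step 2, one way to block the connection, assuming the master domain is served over the default iSCSI port 3260; the address and the iptables approach are only an illustration, not necessarily how the original test was done:

# Hypothetical helper for step 2: drop host-to-master-storage iSCSI traffic
# with iptables, then restore it. The address is a placeholder; adjust it to
# the server that hosts the master storage domain.
import subprocess

MASTER_STORAGE_IP = "192.0.2.10"   # placeholder address of the master storage server

def block_master_storage():
    subprocess.check_call(["iptables", "-A", "OUTPUT", "-d", MASTER_STORAGE_IP,
                           "-p", "tcp", "--dport", "3260", "-j", "DROP"])

def unblock_master_storage():
    subprocess.check_call(["iptables", "-D", "OUTPUT", "-d", MASTER_STORAGE_IP,
                           "-p", "tcp", "--dport", "3260", "-j", "DROP"])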
  
Actual results:
After the new master is selected, the problematic storage domain goes to Up as well

Expected results:
The problematic storage domain should not be moved to Up (vdsm keeps reporting problems with it the whole time)

Additional info:
The corresponding vdsm log is attached (note there is a clock skew of roughly -2 minutes)

Comment 1 Itamar Heim 2013-02-25 07:25:17 UTC
Closing old bugs. If this issue is still relevant/important in the current version, please re-open the bug.