Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 918958

Summary: [engine-backend] engine is reporting that data center is UP while master storage domain is inactive [false positive]
Product: Red Hat Enterprise Virtualization Manager Reporter: Elad <ebenahar>
Component: ovirt-engine Assignee: Liron Aravot <laravot>
Status: CLOSED WORKSFORME QA Contact: Elad <ebenahar>
Severity: high Docs Contact:
Priority: unspecified    
Version: 3.2.0 CC: abaron, acathrow, dyasny, hateya, iheim, lpeer, Rhev-m-bugs, yeylon, ykaul
Target Milestone: --- Keywords: Regression
Target Release: 3.2.0 Flags: abaron: Triaged+
Hardware: x86_64   
OS: Unspecified   
Whiteboard: storage
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2013-03-14 08:24:42 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Storage RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
logs none

Description Elad 2013-03-07 10:06:53 UTC
Created attachment 706482 [details]
logs

Description of problem:
When vdsm gets stuck by a negative flow, causing the master storage domain to become inactive (as in bug #918915), the engine reports that the data center is UP.

Version-Release number of selected component (if applicable):

rhevm-backend-3.2.0-10.10.beta1.el6ev.noarch

How reproducible:
100%

Steps to Reproduce:
1. Run one or more LVM operations on VDSM (like CreateVolume or DeleteVolume)
2. Run the following script on the VDSM host:

while true; do kill -STOP `pgrep lvm` && sleep 10 && kill -CONT `pgrep lvm`; done
3. vdsm will get stuck because the LVM operation was stopped, and supervdsm will not return to normal once the LVM operation continues.
4. The master storage domain will become inactive.
  
Actual results:
Engine will report that the data center is UP.

Expected results:
Engine should not report that the data center is UP while the master storage domain is inactive.

Additional info:
See attached logs.

Comment 2 Ayal Baron 2013-03-08 20:47:13 UTC
Why is this bug marked as regression?

I'm not sure what your definition of inactive is since the monitoring shows that the domain is still accessible (i.e. VMs will go on working properly since I/O works):
Thread-117784::INFO::2013-03-06 18:00:33,300::logUtils::39::dispatcher::(wrapper) Run and protect: repoStats, Return response: {'ad3962a4-30b8-47b1-a3df-cf3bd852cb20': {'delay': '0.0418319702148', 'lastCheck': '6847.2', 'code': 0, 'valid': True}}

Any reason not to close this as notabug?

Comment 3 Ayal Baron 2013-03-12 22:42:51 UTC
> Thread-117784::INFO::2013-03-06
> 18:00:33,300::logUtils::39::dispatcher::(wrapper) Run and protect:
> repoStats, Return response: {'ad3962a4-30b8-47b1-a3df-cf3bd852cb20':
> {'delay': '0.0418319702148', 'lastCheck': '6847.2', 'code': 0, 'valid':
> True}}

reviewed with Haim, I missed that lastCheck is greater than 60.
Liron, please check why the DC did not move to non-operational (and determine whether this is indeed a storage issue or not).
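The staleness condition Ayal points out (lastCheck greater than 60 seconds) can be sketched as a simple check over the repoStats response. A minimal sketch in Python, assuming the response format from the log excerpt above; the 60-second threshold is taken from comment 3, and the helper function itself is hypothetical, not engine or vdsm source:

```python
# Hypothetical helper: evaluates one domain's repoStats entry for staleness.
# Field names ('delay', 'lastCheck', 'code', 'valid') come from the log
# excerpt in this bug; the 60-second threshold is assumed from comment 3.

STALE_THRESHOLD_SECONDS = 60.0

def is_domain_stale(domain_stats):
    """Return True if the domain's monitoring data is stale or invalid."""
    last_check = float(domain_stats['lastCheck'])
    return (not domain_stats['valid']
            or domain_stats['code'] != 0
            or last_check > STALE_THRESHOLD_SECONDS)

# Example using the values from the attached log: valid=True and code=0,
# but lastCheck is 6847.2 seconds, so the domain should count as stale.
repo_stats = {
    'ad3962a4-30b8-47b1-a3df-cf3bd852cb20': {
        'delay': '0.0418319702148',
        'lastCheck': '6847.2',
        'code': 0,
        'valid': True,
    },
}

for sd_uuid, stats in repo_stats.items():
    if is_domain_stale(stats):
        print("domain %s monitoring is stale (lastCheck=%ss)"
              % (sd_uuid, stats['lastCheck']))
```

This illustrates why the `valid: True` flag alone is misleading here: the check passed long ago, and only the age of that check (lastCheck) reveals that monitoring has stalled.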

Comment 4 Haim 2013-03-14 08:24:42 UTC
No reproduction; we will re-open if reproduced.