Bug 969121 - [engine-backend] host stuck in non-operational and SDs remain active while Data center is Non-responsive
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.2.0
Hardware: x86_64
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 3.3.0
Assignee: Nobody's working on this, feel free to take it
QA Contact: Elad
URL:
Whiteboard: storage
Depends On:
Blocks:
Reported: 2013-05-30 17:17 UTC by Elad
Modified: 2016-02-10 17:08 UTC (History)

Fixed In Version: is2
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:
oVirt Team: Storage
Target Upstream Version:
Embargoed:


Attachments
logs (10.21 MB, application/x-gzip)
2013-05-30 17:17 UTC, Elad
logs (804.69 KB, application/x-gzip)
2013-07-08 07:34 UTC, Elad


Links
System        ID     Private  Priority  Status  Summary  Last Updated
oVirt gerrit  14767  0        None      None    None     Never

Description Elad 2013-05-30 17:17:34 UTC
Created attachment 754989 [details]
logs

Description of problem:
After an interruption during reconstruct, the SPM tries to connect to the pool and fails. After that, the host is stuck in non-operational and the domains remain active.

Version-Release number of selected component (if applicable):
vdsm-4.10.2-22.0.el6ev.x86_64
rhevm-3.2.0-11.29.el6ev.noarch

How reproducible:
50%

Steps to Reproduce (on 1 host with 2 SDs from different storage servers):
1. Move the master domain to maintenance and, while that is in progress, stop vdsmd
2. Reconstruct will fail
3. Start vdsmd

Actual results:
1) The host becomes non-operational and stays stuck in that state.
2) The 2 SDs remain active even though there are no active hosts in the setup.

Expected results:
1) The host should not get stuck in non-operational
2) The storage domains should become inactive
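
The expected results above amount to a consistency rule: no storage domain should be reported Active while no host in the data center is Up. A minimal sketch of that check follows. All class, field, and function names are hypothetical and used only for illustration; they are not the engine's data model.

```python
from dataclasses import dataclass
from enum import Enum


class HostStatus(Enum):
    UP = "Up"
    NON_OPERATIONAL = "NonOperational"


class DomainStatus(Enum):
    ACTIVE = "Active"
    INACTIVE = "Inactive"


@dataclass
class DataCenter:
    # Hypothetical model: name -> status maps, not engine classes.
    hosts: dict
    domains: dict


def inconsistent_domains(dc):
    """Return the names of domains reported Active while no host is Up."""
    if any(status is HostStatus.UP for status in dc.hosts.values()):
        return []
    return sorted(
        name for name, status in dc.domains.items()
        if status is DomainStatus.ACTIVE
    )
```

With one non-operational host and two active SDs, as in the actual results, the check returns both domain names; once the host is Up it returns an empty list.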


Additional info: logs

Comment 1 Elad 2013-06-02 06:27:27 UTC
CORRECTION to the reproduction steps: this happened to me with 2 storage domains from the same server

Comment 2 Liron Aravot 2013-07-07 17:09:19 UTC
Elad,
1. The vdsm and engine logs do not match: the engine logs run until 27/5, while the vdsm logs start on 29/5. Please try to reproduce and attach the correct logs, covering a shorter timeframe if possible. Thanks.

2. Please point to the time in the logs at which the scenario you described happened; I didn't see it in the logs. I think the best option is to reproduce the issue.
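
The log mismatch in point 1 can be checked mechanically before attaching logs. The sketch below tests whether two log time ranges overlap. Only the engine end date (27/5) and the vdsm start date (29/5) come from this comment; the year is taken from the report date, the other two endpoints are placeholders, and the function name is hypothetical.

```python
from datetime import datetime


def ranges_overlap(start_a, end_a, start_b, end_b):
    """True if the closed intervals [start_a, end_a] and [start_b, end_b] intersect."""
    return start_a <= end_b and start_b <= end_a


# Engine log ends 27/5 and vdsm log starts 29/5 (from the comment);
# the engine start and vdsm end below are assumed placeholder values.
engine_range = (datetime(2013, 5, 20), datetime(2013, 5, 27))
vdsm_range = (datetime(2013, 5, 29), datetime(2013, 5, 30))

print(ranges_overlap(*engine_range, *vdsm_range))  # False: the logs cannot cover the same event
```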

Regardless, there were two possibly related issues that remind me of the issue you described:
1. This bug (from the text): domain statuses aren't changed. (Recent)
 https://bugzilla.redhat.com/show_bug.cgi?id=977169 

2. When deactivating a domain, its saved status in the compensation is set to UNKNOWN instead of Active (the compensation doesn't appear in the engine log)
https://bugzilla.redhat.com/show_bug.cgi?id=920694#c6

#2 was merged, while #1 wasn't.
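
To illustrate why the saved status in #2 matters, here is a minimal sketch of a compensation record that snapshots a domain's status before the operation changes it, so a failed deactivation restores Active rather than Unknown. This is not the engine's implementation: the class, the function, and the use of Unknown as a transitional status are assumptions made for illustration.

```python
class Compensation:
    """Hypothetical compensation record: statuses to restore if an operation fails."""

    def __init__(self):
        self.saved = {}

    def snapshot(self, statuses, name):
        # Record the status as it is right now; keep only the first snapshot per domain.
        self.saved.setdefault(name, statuses[name])

    def rollback(self, statuses):
        # Put every snapshotted domain back to its saved status.
        statuses.update(self.saved)
        self.saved.clear()


def deactivate(statuses, name, comp, fail=False):
    """Deactivate a domain; on failure, roll back to the status saved beforehand."""
    comp.snapshot(statuses, name)   # snapshot BEFORE the status changes
    statuses[name] = "Unknown"      # assumed transitional status during the operation
    if fail:
        comp.rollback(statuses)     # restores the pre-operation status
        return False
    statuses[name] = "Maintenance"
    comp.saved.clear()              # success: nothing left to compensate
    return True
```

If the snapshot were taken after the status had already been set to Unknown, the rollback would restore Unknown, which matches the symptom described in #2.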

Comment 3 Elad 2013-07-08 07:34:52 UTC
Created attachment 770314 [details]
logs

Managed to reproduce on 3.2: rhevm-3.2.1-0.39.el6ev.noarch

attached engine.log and vdsm.log

Comment 4 Liron Aravot 2013-07-08 08:27:58 UTC
The issue in the new log should be solved by #2; moving to MODIFIED

Comment 5 Elad 2013-07-15 12:08:59 UTC
Does not reproduce on RHEVM 3.3 (is5):
rhevm-3.3.0-0.6.master.el6ev.noarch
vdsm-4.11.0-121.git082925a.el6.x86_64

After an interruption in reconstruct, the host becomes active and reconstruct ends successfully.

Comment 6 Itamar Heim 2014-01-21 22:28:20 UTC
Closing - RHEV 3.3 Released

Comment 7 Itamar Heim 2014-01-21 22:28:24 UTC
Closing - RHEV 3.3 Released

Comment 8 Itamar Heim 2014-01-21 22:31:15 UTC
Closing - RHEV 3.3 Released

