Bug 969121
Summary: | [engine-backend] host stuck in non-operational and SDs remain active while Data center is Non-responsive | |
---|---|---|---
Product: | Red Hat Enterprise Virtualization Manager | Reporter: | Elad <ebenahar>
Component: | ovirt-engine | Assignee: | Nobody's working on this, feel free to take it <nobody>
Status: | CLOSED CURRENTRELEASE | QA Contact: | Elad <ebenahar>
Severity: | high | Docs Contact: |
Priority: | unspecified | |
Version: | 3.2.0 | CC: | acanan, acathrow, amureini, ebenahar, iheim, jkt, laravot, lpeer, Rhev-m-bugs, scohen, yeylon
Target Milestone: | --- | |
Target Release: | 3.3.0 | |
Hardware: | x86_64 | |
OS: | Unspecified | |
Whiteboard: | storage | |
Fixed In Version: | is2 | Doc Type: | Bug Fix
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | Storage | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Attachments: |
|
CORRECTION to the reproduction steps: this happened to me with 2 storage domains from the same server.

Elad,
1. The vdsm and engine logs do not match: the engine logs run until 27/5, while the vdsm logs start on 29/5. Please try to reproduce and attach the correct logs, covering a shorter timeframe if possible. Thanks.
2. Please point to the time in the logs at which the scenario you described happened; I didn't see it in the logs.
I think the best option is to reproduce the issue.

Regardless, there were two possibly related issues that remind me of the issue you described:
1. This bug (from the text): domain statuses aren't changed. (Recent) https://bugzilla.redhat.com/show_bug.cgi?id=977169
2. When deactivating a domain, its saved status in the compensation is set to UNKNOWN instead of Active (compensation doesn't appear in the engine log) https://bugzilla.redhat.com/show_bug.cgi?id=920694#c6
#2 was merged, while #1 wasn't.

Created attachment 770314 [details]
logs
Managed to reproduce on 3.2: rhevm-3.2.1-0.39.el6ev.noarch
attached engine.log and vdsm.log
The issue in the new log should be solved by #2 - moving to MODIFIED.

Does not reproduce on RHEVM 3.3 (is5): rhevm-3.3.0-0.6.master.el6ev.noarch vdsm-4.11.0-121.git082925a.el6.x86_64
After an interruption in reconstruct, the host becomes active and reconstruct ends successfully.

Closing - RHEV 3.3 Released |
Created attachment 754989 [details]
logs

Description of problem:
After an interruption in reconstruct, the SPM tries to connect to the pool and fails. After that, the host is stuck in non-operational and the domains remain active.

Version-Release number of selected component (if applicable):
vdsm-4.10.2-22.0.el6ev.x86_64
rhevm-3.2.0-11.29.el6ev.noarch

How reproducible:
50%

Steps to Reproduce:
On 1 host and 2 SDs from different storage servers:
1. Move the master domain to maintenance and, while that is in progress, stop vdsmd
2. Reconstruct will fail
3. Start vdsmd

Actual results:
1) The host becomes non-operational and stays stuck in that state.
2) The 2 SDs remain active even though there are no active hosts in the setup.

Expected results:
1) The host should not get stuck in non-operational.
2) The storage domains should become inactive.

Additional info: logs
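
One comment on this bug notes that the attached engine and vdsm logs covered different days (engine logs ending 27/5, vdsm logs starting 29/5). A minimal sketch of a check for that, to run before attaching logs, is shown here. It assumes each relevant log line contains a timestamp of the form `YYYY-MM-DD HH:MM:SS` somewhere in the line; that format and the helper names `log_span` and `spans_overlap` are illustrative assumptions, not part of vdsm or ovirt-engine.

```python
import re
from datetime import datetime

# Assumption: log lines carry a "YYYY-MM-DD HH:MM:SS" timestamp somewhere in the line.
TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")


def log_span(lines):
    """Return (earliest, latest) timestamp found in the lines, or None if there is none."""
    stamps = []
    for line in lines:
        match = TIMESTAMP.search(line)
        if not match:
            continue
        try:
            stamps.append(datetime.strptime(match.group(0), "%Y-%m-%d %H:%M:%S"))
        except ValueError:
            # Digits matched the pattern but do not form a valid date; skip the line.
            continue
    if not stamps:
        return None
    return min(stamps), max(stamps)


def spans_overlap(a, b):
    """True if two (start, end) spans share at least one instant."""
    return a[0] <= b[1] and b[0] <= a[1]
```

To use it, open each log file, pass the file object to `log_span`, and compare the two results with `spans_overlap`; a `False` result suggests the logs do not cover the same reproduction run.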