Red Hat Bugzilla – Bug 969121
[engine-backend] host stuck in non-operational and SDs remain active while Data center is Non-responsive
Last modified: 2016-02-10 12:08:29 EST
Created attachment 754989
Description of problem:
After an interruption during reconstruct, the SPM tries to connect to the pool and fails. After that, the host is stuck in Non-operational and the storage domains remain active.
Version-Release number of selected component (if applicable):
Steps to Reproduce (on 1 host with 2 SDs from different storage servers):
1. Move the master domain to maintenance and, while that is in progress, stop vdsmd
2. Reconstruct will fail
3. Start vdsmd
Actual results:
1) The host becomes Non-operational and stays stuck in that state.
2) The 2 SDs remain active even though there are no active hosts in the setup.
Expected results:
1) The host should not get stuck in Non-operational.
2) The storage domains should become inactive.
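The second expectation can be phrased as a simple check over the reported statuses. The sketch below is a hypothetical helper, not engine code: the function name and the status strings ("Up", "Active") are assumptions made for illustration.

```python
def violations(host_statuses, domain_statuses):
    """Return the storage domains still Active although no host is Up.

    Hypothetical helper: the status names are assumed for illustration
    and are not taken from the engine source.
    """
    if any(status == "Up" for status in host_statuses.values()):
        return []  # at least one host is available, nothing to flag
    return sorted(name for name, status in domain_statuses.items()
                  if status == "Active")
```

With the state described in this report (one Non-operational host, two active SDs), such a check would flag both domains.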
Additional info: logs
CORRECTION to the reproduction steps: this happened to me with 2 storage domains from the same server.
1. The vdsm and engine logs do not match: the engine logs end on 27/5 while the vdsm logs start on 29/5. Please try to reproduce and attach matching logs covering a shorter timeframe if possible. Thanks.
2. Please point to the time in the logs at which the scenario you described happened; I didn't see it in the logs. I think the best option is to reproduce the issue.
Regardless, there were two possibly related issues that remind me of the one you described:
1. This bug (from the description): domain statuses aren't changed. (Recent)
2. When deactivating a domain, its saved status in the compensation is set to UNKNOWN instead of Active (the compensation doesn't appear in the engine log)
#2 was merged, while #1 wasn't.
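To illustrate issue #2: compensation only helps if the snapshot records the status the domain actually had before the operation started. The following is a minimal sketch under assumptions, not the engine's implementation; the class, function, and status names are hypothetical.

```python
class Domain:
    """Hypothetical stand-in for a storage domain record."""

    def __init__(self, name, status):
        self.name = name
        self.status = status


def deactivate(domain, fail=False):
    # Snapshot the status *before* changing it, so a rollback restores
    # the real previous value (e.g. "Active") rather than a placeholder.
    saved_status = domain.status
    domain.status = "Locked"
    try:
        if fail:
            # Simulates an interruption in the middle of the operation.
            raise RuntimeError("deactivation interrupted")
        domain.status = "Maintenance"
    except RuntimeError:
        # Compensation: revert to the snapshot taken above.
        domain.status = saved_status
    return domain.status
```

If the snapshot held UNKNOWN instead, an interrupted deactivation would leave the domain in a state that no longer reflects what it was before the operation.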
Created attachment 770314
Managed to reproduce on 3.2: rhevm-3.2.1-0.39.el6ev.noarch
Attached engine.log and vdsm.log.
The issue in the new log should be solved by #2; moving to MODIFIED.
Does not reproduce on RHEVM 3.3 (is5):
After an interruption during reconstruct, the host becomes active and reconstruct ends successfully.
Closing - RHEV 3.3 Released