Bug 1390960
Summary: | when one of the node goes to non responsive state status of vms residing on that host goes to UNKNOWN | ||
---|---|---|---|
Product: | [oVirt] ovirt-engine | Reporter: | RamaKasturi <knarra> |
Component: | BLL.Infra | Assignee: | Martin Perina <mperina> |
Status: | CLOSED INSUFFICIENT_DATA | QA Contact: | Pavel Stehlik <pstehlik> |
Severity: | unspecified | Docs Contact: | |
Priority: | unspecified | ||
Version: | 4.1.0 | CC: | bugs, knarra, michal.skrivanek, mperina, oourfali, sabose, sasundar |
Target Milestone: | --- | Flags: | sabose: ovirt-4.1? rule-engine: planning_ack? rule-engine: devel_ack? rule-engine: testing_ack? |
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2017-02-03 07:28:51 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | Infra | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 1277939 |
Description
RamaKasturi
2016-11-02 09:57:27 UTC
sos reports can be found at the link below:
http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/HC/1390960/

That is intentional. An unresponsive host is not responding, hence the status of its VMs cannot be known. Why is it a problem?

Hi Michal,
AFAIU, when a VM is marked highly available and the host on which the VM resides goes down, the VM should be automatically restarted on another host. Please correct me if I am wrong.
Thanks
kasturi.

I have re-tested this with the latest 4.1 bits by running the steps below.
1. Install HC cluster.
2. Configure power management.
3. Apply new gluster fencing policies.
4. Create a new VM and mark it as highly available.
5. Bring down the ovirtmgmt NIC on one of the hosts and wait for the host to move to the non-responsive state.
Actual results: status of VMs residing on that host goes to 'UNKNOWN' until the host becomes responsive.
Expected results: VMs residing on that host should not go to UNKNOWN state and I/O stops happening on the VM.

1. First, the host goes to non-responsive.
2. Engine checks the status of the host through PM.
3. Engine thinks the host is rebooting.
But I don't think it's correct. For some reason, fencing is skipped and the engine thinks fencing is executed.
Note: the 'Host <host> is rebooting' message is logged when fencing is executed on the host.

This is by definition, so perhaps the gluster logic to skip fencing isn't working well? Can you give the relevant time, as the logs span a lot of time?

This should be between 12.00 P.M - 2.00 P.M IST.

Please ignore comments 5, 6, 7 & 9. They were supposed to go to another bug; it was my bad that I updated here.

(In reply to RamaKasturi from comment #9)
> This should be between 12.00 P.M - 2.00 P.M IST.
Hi Oved,
I ran this test case a long time back and am not sure exactly what the time would be. But based on when the bug was logged, I think it should be between 3.30 P.M - 4.30 P.M.
Thanks
kasturi

Can you retest this with the fix for Bug 1413928?

(In reply to Sahina Bose from comment #12)
> Can you retest this with fix for Bug 1413928
Any news?

Sas, can you check this?

Closing. If the issue occurs again, please reopen.

(In reply to Sahina Bose from comment #14)
> Sas, can you check this?
I will retest this scenario with RHV 4.3 and RHGS 3.4.4
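
For anyone reading this thread later, the disagreement above can be summarized as a small toy model. This is only a sketch of the reasoning in the comments, not ovirt-engine code: the function name, the `Vm` class, and the state strings are hypothetical, and the "Down" result for a non-HA VM after fencing is an assumption that the thread does not discuss. The point it illustrates is that, as long as the host is non-responsive and fencing has not actually been confirmed (for example because a gluster fencing policy skipped it), the engine cannot know the VM state, so even a highly available VM stays in Unknown rather than being restarted elsewhere.

```python
from dataclasses import dataclass


@dataclass
class Vm:
    # Hypothetical minimal VM record, for illustration only.
    name: str
    highly_available: bool


def expected_vm_state(host_responsive: bool, fencing_confirmed: bool, vm: Vm) -> str:
    """Toy model of the behaviour described in the comments (not engine code)."""
    if host_responsive:
        # Host answers the engine, so the VM status is reported normally.
        return "Up"
    if not fencing_confirmed:
        # Host is non-responsive and was not (or not yet) fenced:
        # the VM may still be running there, so its status is unknown.
        return "Unknown"
    # Host was fenced, so the VM is certainly not running on it any more.
    # Assumption: an HA VM is restarted on another host, a non-HA VM is Down.
    return "RestartOnOtherHost" if vm.highly_available else "Down"


if __name__ == "__main__":
    ha_vm = Vm(name="vm1", highly_available=True)
    # Scenario from this bug: host non-responsive, fencing skipped.
    print(expected_vm_state(False, False, ha_vm))
    # Scenario the reporter expected: host fenced, HA VM restarted elsewhere.
    print(expected_vm_state(False, True, ha_vm))
```

Under this model, the reported 'UNKNOWN' status is consistent with fencing having been skipped, which matches the question raised above about whether the gluster skip-fencing logic behaved as intended.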