Bug 921521
| Summary: | VM status is still green when the hypervisor was brought down, which causes the Data centre to be non-responsive. | ||||||||
|---|---|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Virtualization Manager | Reporter: | Rahul Hinduja <rhinduja> | ||||||
| Component: | ovirt-engine-webadmin-portal | Assignee: | Omer Frenkel <ofrenkel> | ||||||
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Lukas Svaty <lsvaty> | ||||||
| Severity: | medium | Docs Contact: | |||||||
| Priority: | high | ||||||||
| Version: | 3.1.0 | CC: | acathrow, cpelland, dron, ecohen, iheim, jkt, michal.skrivanek, mkalinin, ofrenkel, Rhev-m-bugs, scohen, sputhenp, yeylon | ||||||
| Target Milestone: | --- | Keywords: | Regression, ZStream | ||||||
| Target Release: | 3.3.0 | ||||||||
| Hardware: | x86_64 | ||||||||
| OS: | Linux | ||||||||
| Whiteboard: | virt | ||||||||
| Fixed In Version: | is1 | Doc Type: | Bug Fix | ||||||
| Doc Text: | Story Points: | --- | |||||||
| Clone Of: | |||||||||
| : | 984943 (view as bug list) | Environment: | |||||||
| Last Closed: | Type: | Bug | |||||||
| Regression: | --- | Mount Type: | --- | ||||||
| Documentation: | --- | CRM: | |||||||
| Verified Versions: | Category: | --- | |||||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||
| Embargoed: | |||||||||
| Bug Depends On: | |||||||||
| Bug Blocks: | 984943 | ||||||||
| Attachments: |
Created attachment 709978 [details]
Data centre
the VMs should go to pause state only on EIO issue. are you sure they tried to perform any write activity?

(In reply to comment #3)
> the VMs should go to pause state only on EIO issue. are you sure they tried
> to perform any write activity?

they should go to unknown state, adding additional question:

1) could be related to UI refresh time? we had similar issue with data-center status.
2) did eventually they turned to unknown?

(In reply to comment #4)
> (In reply to comment #3)
> > the VMs should go to pause state only on EIO issue. are you sure they tried
> > to perform any write activity?
>
> they should go to unknown state, adding additional question:
>
> 1) could be related to UI refresh time? we had similar issue with
> data-center status.
> 2) did eventually they turned to unknown?

The host was shut down while it had running VMs, and the VM statuses did not update properly. This has nothing to do with storage. The VMs are not and should not be in PAUSED, as they are no longer running at all (the processes are dead since the host was shut down). They should indeed be in UNKNOWN state, as Haim stated above, since the engine cannot know whether it lost communication with the host or the host shut down.

(In reply to comment #4)
> (In reply to comment #3)
> > the VMs should go to pause state only on EIO issue. are you sure they tried
> > to perform any write activity?
>
> they should go to unknown state, adding additional question:
>
> 1) could be related to UI refresh time? we had similar issue with
> data-center status.
> 2) did eventually they turned to unknown?

They did not turn to unknown state. I also tried refreshing the UI, as well as signing out and logging in again. The VMs were always green (up) in status.

When I stop the VDSM, the communication is not available...
2013-04-08 10:29:20,903 WARN [org.ovirt.engine.core.vdsbroker.VdsManager] (DefaultQuartzScheduler_Worker-3) [6c279b77] Failed to refresh VDS , vds = 6c8ab668-3cab-4af4-a111-4b02dc694393 : XXXXXX.redhat.com, VDS Network Error, continuing.
java.net.ConnectException: Connection refused
...
2 possible log items detected after a while:
- VdsNotRespondingTreatmentCommand (handles migrating VMs in current code only and sets them to Unknown status)
2013-04-08 10:30:18,783 INFO [org.ovirt.engine.core.bll.VdsEventListener] (pool-10-thread-49) ResourceManager::vdsNotResponding entered for Host 6c8ab668-3cab-4af4-a111-4b02dc694393, 10.34.63.178
2013-04-08 10:30:18,976 WARN [org.ovirt.engine.core.bll.VdsNotRespondingTreatmentCommand] (pool-10-thread-49) [3de1b0d2] CanDoAction of action VdsNotRespondingTreatment failed. Reasons:VDS_FENCE_DISABLED
- VdsManager (different test)
2013-04-08 11:07:46,843 ERROR [org.ovirt.engine.core.vdsbroker.VdsManager] (DefaultQuartzScheduler_Worker-10) Server failed to respond, vds_id = 6c8ab668-3cab-4af4-a111-4b02dc694393, vds_name = XXXXXX.redhat.com, error = java.net.ConnectException: Connection refused
2013-04-08 11:07:46,892 INFO [org.ovirt.engine.core.bll.VdsEventListener] (pool-10-thread-49) ResourceManager::vdsNotResponding entered for Host 6c8ab668-3cab-4af4-a111-4b02dc694393, 10.34.63.178
Proposed solution:
- when the host is set to NON-RESPONSIVE after a timeout, all running VMs on the host are set to UNKNOWN status (class VdsManager) and a new message about the VM status transition to Unknown is inserted into the audit log
- when the VDSM is available again, the host is Up after a short while and a new message is logged into the audit log - 'VM {} status is restored to {}' (class: VdsUpdateRunTimeInfo)
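
The two steps of the proposed solution above can be illustrated with a minimal Java sketch. It is not the ovirt-engine code: `HostMonitorSketch`, `VmStatus`, and all method names are hypothetical stand-ins for the behaviour attributed to VdsManager and VdsUpdateRunTimeInfo, and the wording of the "set to UNKNOWN" audit message is a placeholder (only the 'VM {} status is restored to {}' text comes from this report).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; none of these types are real ovirt-engine classes.
final class HostMonitorSketch {
    enum VmStatus { UP, DOWN, UNKNOWN }

    // VM name -> status currently shown by the engine.
    private final Map<String, VmStatus> shown = new LinkedHashMap<>();
    private final List<String> auditLog = new ArrayList<>();

    void addVm(String name, VmStatus status) {
        shown.put(name, status);
    }

    // Step 1: the host has been unreachable for longer than the timeout.
    // Every VM that was running on it can no longer be trusted to be UP.
    void onHostNonResponsive() {
        for (Map.Entry<String, VmStatus> e : shown.entrySet()) {
            if (e.getValue() == VmStatus.UP) {
                e.setValue(VmStatus.UNKNOWN);
                // Placeholder wording; the real audit message is not given in the report.
                auditLog.add("VM " + e.getKey() + " status was set to UNKNOWN");
            }
        }
    }

    // Step 2: VDSM answers again and reports the actual VM statuses.
    void onHostUp(Map<String, VmStatus> reported) {
        for (Map.Entry<String, VmStatus> e : reported.entrySet()) {
            if (shown.get(e.getKey()) == VmStatus.UNKNOWN) {
                shown.put(e.getKey(), e.getValue());
                auditLog.add("VM " + e.getKey() + " status is restored to " + e.getValue());
            }
        }
    }

    VmStatus statusOf(String name) {
        return shown.get(name);
    }

    List<String> auditLog() {
        return auditLog;
    }
}
```

In this sketch the status change is tied to the host being marked non-responsive rather than to any fencing step, so a host with fencing disabled would still have its VMs moved to UNKNOWN.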
The problem is that the VdsNotRespondingTreatment command fails for hosts with disabled PM, and doesn't call the HandleError method that handles this situation. Since this is a regression, and severe in my opinion, raising priority.

*** Bug 953546 has been marked as a duplicate of this bug. ***

Merged u/s: a32eb72e62d076393e82fe1e7c908cfd98271ba0

*** Bug 974297 has been marked as a duplicate of this bug. ***

Tested on rhevm3.3 is2: VM moved to status Unknown after the hypervisor was brought down.

Closing - RHEV 3.3 Released
Created attachment 709977 [details]
virtual machine tab

Description of problem:
=======================
VM status is still green when the hypervisor was brought down, which causes the Data centre to be non-responsive.

Version-Release number of selected component (if applicable):
=============================================================

RHEVH:
======
Red Hat Enterprise Virtualization Hypervisor release 6.4 (20130306.2.el6_4)

Setup:
======
1. rhs-client43.lab.eng.blr.redhat.com (RHEV-H PosixFS datacenter)
2. rhs-client17.lab.eng.blr.redhat.com
3. rhs-client18.lab.eng.blr.redhat.com

Setup Involves:
1. client43: Installed rhev-h
2. Created DataCentre and added hypervisor
3. Formatted client17 and 18 with rhel 6.4
4. Created 2 RHS VMs (10.70.37.147 and 10.70.37.219), one on client17 and one on client18
5. Created a 1*2 volume from 10.70.37.147 and 10.70.37.219
6. Added the above volume in DC
7. Rhevm: mario.lab.eng.blr.redhat.com

Steps Carried:
===============
1. Powered off the hypervisor
2. Data centre went to non-responsive state. Storage domain was in unknown state.
3. VMs were not accessible, but the VM status was still green on the RHEVM virtual machines tab.

Actual results:
===============
VM status is Up though the VMs are not accessible

Expected results:
=================
VM status should go to paused state